By Frank Takes.
A talk at the IPA-days. Frank works on pattern matching in graphs. Amid the data science hype, his angle is to gain insight from very large data sets from a network perspective: network science. The premise is that you can only understand a global system by looking at the system as a whole, rather than at the individuals in it. As an example, Frank uses a large dataset of corporations that have shared board members over the last few years. The result is basically one large network in which everybody is within about 7 steps of everybody else: a large graph without any apparent structure.
Community detection
A graph may contain clusters of nodes that together form communities. The idea is based on high cohesion and low coupling: there are many links within a community, but few(er) between communities. Finding the optimal partition into communities is, of course, computationally hard (in the NP-spectrum). Applying community detection algorithms to the dataset recovers country borders: when the nodes are plotted at their geographic location and colored by community, the communities largely coincide with countries.
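To make "many links inside, few between" concrete, here is a minimal sketch that scores a given partition with modularity, one common quality measure for communities. The talk did not state which measure or algorithm was used, so modularity, the function name `modularity`, and the toy graph are assumptions for illustration only.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping each node to a community label.
    """
    m = len(edges)
    intra = defaultdict(int)       # edges with both ends in the community
    degree_sum = defaultdict(int)  # sum of node degrees per community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            intra[cu] += 1
    # Q = sum over communities of (fraction of edges inside)
    #     minus (expected fraction if edges were placed at random)
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

# Toy graph (hypothetical): two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {n: "A" for n in range(6)}

q_two = modularity(edges, two_groups)  # 5/14, about 0.357
q_one = modularity(edges, one_group)   # 0.0
```

Splitting the toy graph into its two triangles scores higher than lumping everything together, which matches the cohesion/coupling intuition. Searching over all partitions for the best score is the hard part, and is where heuristic algorithms come in.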
Value chains
Offshore Financial Centers (OFCs) are jurisdictions (countries) that attract financial activities from abroad through low taxes and lenient regulation. They are not only problematic from a moral point of view, but also create unnecessary complexity and lower the accountability of organizations. So, how can you discover OFCs? One way is to calculate the FDI/GDP ratio: if a country receives more Foreign Direct Investment than expected given the size of its economy, it is an OFC. However, countries have to report those numbers themselves, so the data is biased. Another problem is that, for political reasons, some jurisdictions are not recognized as countries.
So, can network science help? The dataset contains 71M links and about 90M nodes. Links indicate ownership or value flow, given as a value in the interval [0,1]. In this way, a path shows how revenue flows between organisations and individuals. An ownership chain is then a maximal, simple path with a path value larger than some threshold. These chains can be found using simple DFS techniques, resulting in a new dataset. Next, the chains are aggregated to the country level (country chains), which yields 11M transnational ownership chains (out of 108M ownership chains). The analysis then only looks at chains of length 2 or 3 [1].
A country is called a sink if less value leaves the country than enters it. This can be calculated from these short chains, resulting in a set of sink-OFCs. Next, conduits are measured using chains of length 3: in A->B->C, who is in the center? The conduit measure counts how often a country is in the middle. By this measure, the Netherlands ends up in first place! This becomes especially visible when node size represents the conduit value and color represents the sink value.
[1] Javier Garcia-Bernardo, Jan Fichtner, Frank W. Takes & Eelke M. Heemskerk. Uncovering Offshore Financial Centers: Conduits and Sinks in the Global Corporate Ownership Network. Scientific Reports 7, Article number: 6246 (2017)