The digital exhaust left by flows of physical and digital commodities provides a rich measure of the nature, strength and significance of relationships between countries in the global network. With this work, we examine how these traces and the network structure can reveal the socioeconomic profile of different countries. We take into account multiple international networks of physical and digital flows, including the previously unexplored international postal network. By measuring the position of each country in the Trade, Postal, Migration, International Flights, IP and Digital Communications networks, we are able to build proxies for a number of crucial socioeconomic indicators such as GDP per capita and the Human Development Index ranking along with twelve other indicators used as benchmarks of national well-being by the United Nations and other international organisations. In this context, we have also proposed and evaluated a global connectivity degree measure applying multiplex theory across the six networks that accounts for the strength of relationships between countries. We conclude by showing how countries with shared community membership over multiple networks have similar socioeconomic profiles. Combining multiple flow data sources can help understand the forces which drive economic activity on a global level. Such an ability to infer proxy indicators in a context of incomplete information is extremely timely in light of recent discussions on measurement of indicators relevant to the Sustainable Development Goals.
Citation: Hristova D, Rutherford A, Anson J, Luengo-Oroz M, Mascolo C (2016) The International Postal Network and Other Global Flows as Proxies for National Wellbeing. PLoS ONE 11(6): e0155976. doi:10.1371/journal.pone.0155976
Editor: Daniele Marinazzo, Ghent University, BELGIUM
Received: January 21, 2016; Accepted: May 6, 2016; Published: June 1, 2016
Copyright: © 2016 Hristova et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The manuscript makes use of five open data sources. The following data sources were used in our analysis: World Trade Network available from the MIT Atlas Project (https://atlas.media.mit.edu/about/data/sources/); Global Migration Network available from the Global Migration Project (http://www.global-migration.info/); IP Traceroute Network available from the DIMES Project (http://www.netdimes.org/new/?q=node/65); Digital Communications Network available from the Mesh of Civilizations Project (https://sites.google.com/site/meshofcivilizations/density-measure); Flight Network data available from ICAO (http://www.icao.int/Pages/default.aspx); Postal Network data as used in the analysis is available as a Supporting Information file.
Funding: Desislava Hristova received funding from the Project LASAGNE, Contract No. 318132 (STREP), funded by the European Commission and EPSRC Grant GALE (EP/K019392). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
The vast streams of data that are produced by the use of automated digital services such as social media, email and mobile phones, also known as ‘Big Data’, have for some time been leveraged in the private sector to assist in tasks as diverse as logistics, targeted advertising and offering personalised multimedia content. More recently, these same data sources and methodologies have begun to be used to assist humanitarian and development organisations, allowing new ways to use data to implement, monitor and evaluate programs and policies. The ability of such novel data sources to complement traditional data collection techniques such as household surveys and focus groups is clear. The data is collected passively without the need for costly and potentially dangerous active data collection, which also avoids inaccuracies due to human error, bias or dishonesty.
However, the use of Big Data for development is still relatively nascent and questions remain over the ability of such sources to measure or approximate metrics of interest. Invariably, data sources such as social networking applications enjoy deeper penetration in developed economies and rely on expensive technologies such as smart phones and robust communications infrastructure. It has been noted that measurements of human dynamics based on such recent platforms can lead to strong biases, with worse implications for those with limited access to these digital platforms.
In this paper we present analysis of a data source which is undoubtedly ‘Big’ yet represents one of the most established and pervasive long-distance communications networks in the history of mankind. The international postal network (IPN) established in 1874 is administered by a dedicated United Nations specialised agency: the Universal Postal Union (UPU). Due to regulatory reporting requirements and the capabilities of automated data capture technologies such as RFID tags, the records of individual postal items maintained by UPU represent a rich record of human activity with unparalleled penetration, which can be expected to reflect individual level behaviour, local, regional and national economic activity and international economic relations.
Network representations have emerged as an extremely powerful and general framework for analysing and modeling systems as diverse as transportation, biological processes, academic authorship and logistics, among others. Network science provides powerful tools for understanding such systems with large sets of coupled components with emergent behaviours, more generally known as complex systems. Previous work has extensively studied physical flows of goods and people [6–12] and digital flows of information and communication [13–17] in order to better understand how they affect the wealth, resilience and function of social systems on global, regional, national and sub-national scales. With our work we aim to address the general question of whether structural network properties of different flow networks between countries can be used to produce proxy indicators for the socioeconomic profile of a country.
Methodology and Data
In this work, we explore over four years of daily postal data records between all countries by comparing them to other global flow networks, such as the trade, migration and digital networks. We show how the network properties of global flow networks can approximate critical socioeconomic indicators and how network communities formed across physical and digital flow networks can reveal socioeconomic similarities.
Real-time measurements of international flow networks can ultimately act as global monitors of wellbeing with positive implications for international development efforts. Using knowledge about the way in which countries interact through flows of goods, people and information, we use the principles of multiplexity theory to understand the strength of international ties and the network communities they form. In this section, we will detail the methods used to perform our analysis and the various datasets with focus on the international postal network (IPN), which has previously not been described.
Multiplexity, or the multiple layers of interactions between the same entities, has been explored in a wide range of systems from global air transportation to massive online multiplayer games. One study examined the implications of multiple media usage on social ties in an academic organisation and discovered that multiplex ties (those which use multiple media) indicate a stronger bond. This has recently been empirically evaluated on networks with both geographical and social interactions, where it was found that people share a stronger bond when observed to communicate through many different media. These findings support the intuition that a pair of nodes enjoys a stronger relationship if they are better connected across several diverse network layers. The multichannel exchange of information or goods offers a simple and reliable way of estimating tie strength but has not been applied to international networks of flows until now.
Multiplex network model.
A natural extension of a network in which edges between pairs of nodes represent a single kind of flow between those nodes is a multiplex network, which includes several qualitatively different kinds of flows, each of which may be understood as a distinct layer. The advantage of a multiplex model is that the presence of several different network layers has been consistently shown to be more informative than a single layer [23–26].
A comprehensive review of multiplex network models is available in the literature; in this work, however, we apply a simple multiplex model to capture the multiple flow interactions, which we describe in the following section. A multiplex network is one where multiple connections exist between the same entities yet a different set of neighbours exists for a node in each layer. Although many possible representations of multiplex networks exist, in our model we consider all six networks in our study as a collection of graphs, similar to previous works on aggregated multiplex graphs [29, 30]:
$$\mathcal{G} = \{G_1(V_1, E_1), \ldots, G_\alpha(V_\alpha, E_\alpha), \ldots, G_m(V_m, E_m)\} \tag{1}$$
where each graph $G_\alpha$ contains a set of edges $E_\alpha$ and nodes $V_\alpha$, and $m$ is the total number of networks. This allows us to define the multiplex neighbourhood of a node $i$ as the union of its neighbourhoods on each single graph:
$$N_{\mathcal{G}}(i) = \bigcup_{\alpha=1}^{m} N_\alpha(i) \tag{2}$$
where $N_\alpha(i)$ is the neighbourhood of nodes to which node $i$ is connected on layer $\alpha$. The cardinality of this set can be considered as the node’s global multiplex degree, or in other words the total number of countries with which a country has exchanges in any of the layers (post, trade, etc.), similarly to previous work on aggregated multiplex graphs [29–31]:
$$k_{glob}(i) = |N_{\mathcal{G}}(i)| \tag{3}$$
We can also compute the weighted global degree of a node $i$ as:
$$k^{w}_{glob}(i) = \sum_{j \in N_{\mathcal{G}}(i)} \sum_{\alpha=1}^{m} w^{\alpha}_{ij} \tag{4}$$
where $N_{\mathcal{G}}(i)$ is the multiplex neighbourhood of node $i$ and $w^{\alpha}_{ij}$ is the weight of edge $e_{ij}$ on layer $\alpha$. This is the sum of the weights of edges in the multiplex neighbourhood, over each graph layer they appear on. We add an edge weight if $e_{ij}, e_{ji} \in G_\alpha$ for each network $G_\alpha$ in the collection $\mathcal{G}$. We only consider edges present in both directions because the global degree is ultimately a measure of tie strength and we want to consider well-established flows between countries only. This is common practice in other contexts where tie strength is of importance, such as in social networks. We then normalise the weighted global degree by the number of possible edges $n \cdot m$, where $n$ is the total number of nodes and $m$ is the number of networks in the multiplex collection. We plot the cumulative degree distribution of both the weighted and unweighted global degrees in Fig 1.
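The global degree and its weighted counterpart can be illustrated with a minimal Python sketch. The layer format (a dictionary from directed country pairs to normalised weights), the function names and the toy values are assumptions made for illustration only. The sketch also assumes that an edge in either direction counts towards the neighbourhood and that only the outgoing weight of a reciprocated edge is summed; the text does not settle either choice.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layer format: {(origin, destination): normalised weight in [0, 1]}.
Layer = Dict[Tuple[str, str], float]


def multiplex_neighbourhood(layers: List[Layer], node: str) -> Set[str]:
    """Union of the node's neighbours over all layers (edges in either direction)."""
    neighbours: Set[str] = set()
    for layer in layers:
        for origin, destination in layer:
            if origin == destination:
                continue  # ignore self-loops
            if origin == node:
                neighbours.add(destination)
            elif destination == node:
                neighbours.add(origin)
    return neighbours


def global_degree(layers: List[Layer], node: str) -> int:
    """Number of distinct countries the node exchanges with on any layer."""
    return len(multiplex_neighbourhood(layers, node))


def weighted_global_degree(layers: List[Layer], node: str, n_nodes: int) -> float:
    """Sum of outgoing weights over reciprocated edges, normalised by n * m."""
    total = 0.0
    for layer in layers:
        for (origin, destination), weight in layer.items():
            reciprocated = (destination, origin) in layer
            if origin == node and origin != destination and reciprocated:
                total += weight
    return total / (n_nodes * len(layers))


# Toy two-layer multiplex over three countries (values are made up).
post = {("A", "B"): 0.5, ("B", "A"): 0.2, ("A", "C"): 0.1}
trade = {("A", "B"): 1.0, ("B", "A"): 0.4, ("C", "A"): 0.3}
layers = [post, trade]

k_glob = global_degree(layers, "A")
k_glob_w = weighted_global_degree(layers, "A", n_nodes=3)
```

In this toy case country A reaches B and C, but only the A-B tie is reciprocated within a layer, so only its weights enter the weighted degree.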
The average global degree is 110 and the average global weighted degree is 250, which means that each country connects with an average of 110 other countries through two or more layers. In terms of unweighted degree (number of unique connections globally in the multiplex) in Fig 1A, we notice a substantial curvature, indicative of the moderately stable degree approaching $10^{2}$ but a sudden decline after, indicative of the few countries, $10^{-0.5}$ (32%), having a degree higher than 130. A steeper decline can be observed in the weighted distribution in Fig 1B, where the majority of countries have a weighted degree of 0.25 or less ($10^{-0.6}$), signifying that they have realised 25% or less of their connectivity in the global multiplex. Although many empirical measurements of networks are noted to follow a power law distribution, this appears as a straight line in a log-log degree distribution plot, which is clearly not the case in our data. However, the distribution is right-skewed, with a small number of countries being observed to have high global degrees.
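A cumulative degree distribution of the kind discussed here can be computed as in the following sketch. The function name and the toy degree values are hypothetical, and the sketch assumes the convention P(X ≥ x); other works use the strict inequality.

```python
from typing import List, Sequence, Tuple


def ccdf(values: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (x, P(X >= x)) for every distinct value x, in increasing order."""
    n = len(values)
    return [
        (x, sum(1 for v in values if v >= x) / n)
        for x in sorted(set(values))
    ]


# Toy degree sequence (made up); plotting both columns on log axes
# gives a log-log view in which a power law would appear as a straight line.
toy_degrees = [1, 2, 2, 4]
toy_ccdf = ccdf(toy_degrees)
```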
Community multiplexity index.
Networks are powerful representations of complex systems with a large degree of interdependence. However, in many such systems, the network representing it naturally partitions into communities made up of nodes that share dependencies between each other, but share fewer with other components. In the present context, communities are composed of groups of countries that share higher connectivity than the rest of the network. If two countries appear in the same community across many network layers, this can be considered a greater level of connectivity and an indicator of greater socioeconomic similarity, otherwise not visible from the single network perspective. We formalise this idea as the community multiplexity index of a pair of countries $(i, j)$:
$$cm(i, j) = \sum_{\alpha=1}^{m} \delta(c_i^{\alpha}, c_j^{\alpha}) \tag{5}$$
where $c_i^{\alpha}$ is a discrete variable indexing the cluster of which country $i$ is a member on layer $\alpha$, and $m$ is the number of network layers. If the two are equivalent for a given network $G_\alpha$, the level of community multiplexity increases by one, represented by the Kronecker delta function $\delta$, which evaluates membership equivalency of the two nodes. Prior work has explored information similarity in terms of community structure between layers [6, 33] and many novel ways of community detection in multilayer networks [34–36]. Although we use a community detection approach on each layer separately, our goal is not to obtain community clusters of countries in the multiplex but to observe the strength of connectivity between countries across layers as a measure of their similarity in order to build a proxy for exploring the socioeconomic similarity of pairs of countries.
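The community multiplexity index reduces to counting the layers in which two countries share a community label. A minimal sketch follows; the membership format (one dictionary per layer mapping a country to a community label), the function name and the toy labels are assumptions for illustration, as is the choice to treat a country absent from a layer as sharing no community there.

```python
from typing import Dict, List


def community_multiplexity(memberships: List[Dict[str, int]], i: str, j: str) -> int:
    """Count the layers in which countries i and j fall in the same community."""
    count = 0
    for layer in memberships:
        # Assumption: a country missing from a layer shares no community on it.
        if i in layer and j in layer and layer[i] == layer[j]:
            count += 1
    return count


# Toy community labels on three layers (made up).
memberships = [
    {"A": 0, "B": 0, "C": 1},
    {"A": 1, "B": 1, "C": 1},
    {"A": 0, "B": 2},
]

cm_ab = community_multiplexity(memberships, "A", "B")
cm_ac = community_multiplexity(memberships, "A", "C")
```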
Having described our methodology using multiplex networks, which has not been previously applied to international networks of flows, we will proceed to describe the six networks and fourteen global socioeconomic indicators which we use in the core of our analysis next.
The International Postal Network
Although postal flows are understood to follow a distance based gravity model, similar to other networks describing flows, little is understood about the network properties of the postal network and how they relate to those of other global flow networks. The International Postal Network (IPN) is constructed using electronic data records of origin and destination for individual items sent between countries, collected by the Universal Postal Union (UPU) from 2010 to the present. Items are recorded on a daily basis amounting to nearly 14 million records of items sent between countries. As one of the most developed communication networks on a global scale, it is a dense network with 201 countries and autonomous areas, and 23K postal connections between them, with 64% of all possible postal connections established. The global volume of post has seasonal peaks observable in Fig 2. Notably, since 2010 postal activity is on the rise and this can be accounted for by the parallel growth of e-commerce. This positions postal flows as a sustainable indicator of socioeconomic activity.
In terms of daily activity, we can observe the mean relative number of daily items sent and received by countries during the period in Fig 3. This can be highly dependent on the size of the population of a country so we have normalised the volume per country’s population. We use annual population statistics provided by the World Bank and collected by the United Nations Population Division. From the distribution of volume it becomes clear that the majority of countries send and receive a similar amount of post per capita, however with a number of exceptions on both ends where a few countries send and receive exceptionally low or high number of items.
Volume is proportional but does not represent the actual number of items exchanged due to data sensitivity.
Next we report on the degree distributions of both the weighted and unweighted global postal graphs. The unweighted postal graph simply contains all directed edges present in the network regardless of flow volume. The weighted graph on the other hand also includes the weight of connections in the graph. We weight the network by summing the total annual volumes of directed flow between two countries, averaged over years and normalised over the population of the country of origin. We then further normalise by the maximum weight in the network, resulting in a value between 0 and 1, allowing us to compare values between networks. The weighted adjacency matrix of the top quartile of countries in terms of degree can be seen in Fig 4 with the US and UK having the largest numbers of postal partners. Prominent postal network countries have relatively high interaction with most of their partners, including interactions with lower ranked countries. This is related to the degree assortativity within the postal network, discussed in the following section. Further, both weighted and unweighted degree distributions are shown in Fig 5, as the complementary cumulative probability function (CCDF). We can see in Fig 5A that the in and out degrees are relatively balanced in both instances and that about 50% of countries have more than 100 postal partners. The weighted degree in Fig 5B follows a similar pattern, which means that countries tend to interact equally proportional to the number of their postal partners. In the following section, we will compare the postal network properties to other flow networks.
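The weighting procedure described above can be sketched as follows. The input formats, the function name and the toy figures are hypothetical. The sketch also assumes one population figure per country and an average taken over all years in the data, even where an edge is missing in some year; the analysis itself uses annual population statistics.

```python
from collections import defaultdict
from typing import Dict, Tuple

Edge = Tuple[str, str]


def normalise_weights(
    annual_flows: Dict[int, Dict[Edge, float]],
    population: Dict[str, float],
) -> Dict[Edge, float]:
    """Average annual volume per capita of the origin, rescaled to [0, 1]."""
    n_years = len(annual_flows)
    totals: Dict[Edge, float] = defaultdict(float)
    for flows in annual_flows.values():
        for edge, volume in flows.items():
            totals[edge] += volume
    # Average over years, then divide by the population of the origin country.
    per_capita = {
        (origin, dest): (total / n_years) / population[origin]
        for (origin, dest), total in totals.items()
    }
    # Rescale by the maximum weight so that layers become comparable.
    max_weight = max(per_capita.values())
    return {edge: w / max_weight for edge, w in per_capita.items()}


# Toy data (made up): two years of flows between two countries.
annual_flows = {
    2011: {("A", "B"): 100.0, ("B", "A"): 10.0},
    2012: {("A", "B"): 300.0},
}
population = {"A": 100.0, "B": 10.0}
weights = normalise_weights(annual_flows, population)
```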
Other global flow networks
This work builds upon previous efforts using global flow networks to present novel data sources for international development efforts such as the IPN and to demonstrate a holistic view of several distinct flow networks. We consider five networks, which have been previously studied independently, along with the IPN. We will now describe these networks and compare their network properties in the following section.
The World Trade Network.
The trade network is constructed from records maintained by the UN Statistics Division in the Comtrade Database and provided by the Atlas Project and contains the number and value of products traded between countries classified by commodity class.
The Global Migration Network.
This is compiled from bilateral flows between 196 countries as estimated from sequential stock tables. It captures the number of people who changed their country of residence over a five-year period. This reflects migration transitions and not short term movements. This data is provided by the Global Migration Project.
The International Flights Network.
The flights data is collected by 191 national civil aviation administrations and compiled by the International Civil Aviation Organisation (ICAO). These tables detail, for all commercial passenger and freight flights, country of origin and destination and the number of flights between them.
The IP Traceroute Network.
This city to city geocoded dataset is built from traceroutes in the form of directed IP to IP edges collected in a crowdsourced fashion by volunteers through the DIMES Project. The project relies on data from volunteers who have installed the measurement software which collects origin, destination and number of IP level edges which were discovered daily. We aggregate this data on a country to country basis and use it to construct an undirected Internet topology network, weighted by the number of IPs discovered and normalised by population as for all other networks. The data collection methods are described in detail in the founding paper of the project. The global mapping of the Internet topology provides insight into international relationships from the perspective of the digital infrastructure layer.
The Social Media Density Network.
This network is constructed from aggregated digital communication data from the Mesh of Civilizations project, where Twitter and Yahoo email data is combined to produce an openly available density measure of the strength of digital communication between nations. This measure is normalised by the population of Internet users in each country and thus is well aligned with the rest of the networks we use. It also blends data from two distinct sources and thus provides greater independence from service bias. Because the study considers tie strength, it only includes bi-directed edges in the two platforms where there has been a reciprocal exchange of information and therefore this network is undirected.
In the following analysis we compare these networks and use multiplexity theory to extract knowledge about the strength of connectivity across them. We will distinguish between single layer and multiplex measures, which will allow us to observe to a deeper extent the international relationships and the potential for using global flow networks to estimate the wellbeing of countries in terms of a number of socioeconomic indicators (summarised in Table 1).
In order to understand the multiplex relationships of countries through flows of information and goods in context, we first compare all flow networks. We then present their respective and collective ability to approximate crucial socioeconomic indicators and finally perform a network community analysis of individual networks and their multiplex communities where the most socioeconomically similar countries can be found.
Although each of the five networks previously described apart from the International Postal Network (IPN) has been studied separately, there has not been a comparative analysis of all. In Table 2, we list the network properties of all six networks separately. The number of nodes or countries exceeds 195(6) due to differing lists of member states providing statistics to each authority. Although weights are distinct for each network, they always represent a volume of flow between areas. While there are small discrepancies between the years of each network, most networks cover a five year period, with the exception of the Social Media network which is from a single year. The volume of interaction between two countries is therefore averaged over the number of years for each network.
We weight all networks by normalising the raw volume of interaction described above by the population of each respective country of origin and rescaling all weights across networks within the same range [0, 1] by dividing by the maximal weight, as we did for the postal network in the previous section. We compute the average out degree for each directed network in a standard way as for the postal network, as well as the degree assortativity (Pearson correlation between the degrees of all pairs of connected countries), the network density and clustering coefficient. The assortativity coefficient determines to what extent nodes in the network have mixing patterns that are determined by their degree. Positive assortativity means that nodes with high degree tend to connect to other nodes with high degree, whereas a negative assortativity means that nodes with high degree tend to connect with others with lower degree, which is the case for all of the six networks as seen in Table 2. Although all networks differ in size and average degree, they have relatively high clustering coefficients, reflecting a general tendency for countries to cluster together in global networks. This clustering however is not based on the importance of a node (its degree) since the assortativity coefficients for all networks are low or negative, suggesting that global networks are disassortative and therefore higher degree nodes tend to connect to lower degree nodes.
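Two of the properties above, density and degree assortativity, can be computed directly from an edge list, as in this sketch. The function names and the toy star graph are hypothetical. For simplicity the assortativity function treats edges as undirected and counts each edge in both orientations, which may differ from the directed variant used for Table 2.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def density(n_nodes: int, directed_edges: List[Edge]) -> float:
    """Fraction of the n * (n - 1) possible directed edges that are present."""
    return len(directed_edges) / (n_nodes * (n_nodes - 1))


def degree_assortativity(undirected_edges: List[Edge]) -> float:
    """Pearson correlation between the degrees at the two ends of each edge."""
    degree: Dict[str, int] = {}
    for u, v in undirected_edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Each edge contributes both orderings so that the measure is symmetric.
    xs: List[int] = []
    ys: List[int] = []
    for u, v in undirected_edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    mean = sum(xs) / len(xs)  # xs and ys share the same mean and variance
    covariance = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    variance = sum((x - mean) ** 2 for x in xs)
    if variance == 0:
        return float("nan")  # undefined when all nodes have the same degree
    return covariance / variance


# Toy star graph: one hub linked to three leaves, the extreme disassortative case.
star = [("hub", "a"), ("hub", "b"), ("hub", "c")]
star_assortativity = degree_assortativity(star)
star_density = density(4, star)
```

A star graph yields a coefficient of -1 because every edge joins the highest-degree node to a lowest-degree node, which is the pattern the text describes as disassortative.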
We now turn to Fig 6 for a comparative analysis between the six networks. We refer to them for short as: post, trade, ip, mig, sm and fly. We use the Jaccard coefficient to compute the overlap of edges between pairs of networks in Fig 6A, where we divide the number of edges that exist on both networks over the number of edges that exist in any of the two networks. The highest Jaccard overlap is between the postal and trade networks, the two densest networks. The rest of the networks however are not strongly overlapping in terms of edges, which implies that each distinct network layer provides a non-trivial and complementary view of how countries connect. On the other hand, the Spearman rank correlation between weighted edges in Fig 6B reveals that the volume of flow of goods, people, and information is correlated for those edges between countries, which exist on both networks. A notable exception is the digital communications network (sm), which is entirely uncorrelated with any other network. This means that countries likely connect in unexpected ways on social media and email.
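The two comparisons in this paragraph, edge overlap and rank correlation of weights on shared edges, can be sketched in plain Python. The layer dictionaries, function names and toy weights are hypothetical. Ties receive average ranks here, which is a common convention and not necessarily the one used for Fig 6.

```python
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str]


def jaccard_overlap(layer_a: Dict[Edge, float], layer_b: Dict[Edge, float]) -> float:
    """Edges present on both layers divided by edges present on either layer."""
    a, b = set(layer_a), set(layer_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for k in order[pos:end + 1]:
            ranks[k] = (pos + end) / 2 + 1
        pos = end + 1
    return ranks


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")  # undefined for constant input
    return cov / (sx * sy)


def edge_weight_correlation(layer_a: Dict[Edge, float], layer_b: Dict[Edge, float]) -> float:
    """Spearman rank correlation of weights, restricted to edges on both layers."""
    shared = sorted(set(layer_a) & set(layer_b))
    ranks_a = average_ranks([layer_a[e] for e in shared])
    ranks_b = average_ranks([layer_b[e] for e in shared])
    return pearson(ranks_a, ranks_b)


# Toy layers (made-up weights).
post = {("A", "B"): 0.9, ("A", "C"): 0.5, ("B", "C"): 0.1}
trade = {("A", "B"): 0.8, ("A", "C"): 0.3, ("C", "A"): 0.2, ("B", "C"): 0.05}

overlap = jaccard_overlap(post, trade)
rho = edge_weight_correlation(post, trade)
```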
When considering the degree of a country as an indicator of its position in the network, we find that there are high correlations between the in and out positions of countries in Fig 6C and 6D. Although lower, the social media network is also correlated with the others. We should note that this is likely due to the smaller overlap between edges but for the nodes present across networks, we find that there is a strong correspondence between their positions in the different networks. Next we will explore how well different degree metrics approximate the socioeconomic indicators described above.
Timely statistics on key metrics of socio-economic status are essential for provision of services to societies, in particular marginalised populations. The motivation for this measurement varies from social resilience in the event of natural or man-made disasters to ensuring social rights such as education and access to information. While national governments typically administer their territories and allocate resources in terms of sub national divisions, international organisations such as the United Nations and the World Bank, as well as regional organisations and blocs such as the Economic Commission for Latin America and the Caribbean and the African Union, invariably partition populations under nation states. In this context, the nation state is the primary geographical entity considered for funding, planning and allocation of resources for development. Despite the importance of accurate statistics to quantify the state of a country and progress towards favourable socio-economic outcomes, regular and reliable measurement is difficult and costly, particularly in low income countries.
With this in mind, in this section we compare the positions of countries within the different networks discussed previously to the values of several socioeconomic indicators. Fig 7 shows the Spearman rank correlation between the network degrees of the six networks (in and out degree, and weighted in and out degree) and various socio-economic indicators: GDP, Life expectancy, Corruption Perception Index (CPI), Internet penetration rate, Happiness index, Gini index, Economic Complexity Index (ECI), Literacy, Poverty, CO2 emissions, Fixed phone line penetration, Mobile phone users, and the Human Development Index. These indicators and their significance for the international development agenda are described in detail in the data section (see Table 1).
For each of the six networks, we compute the network degree, defined as the sum of the neighbours for both incoming and outgoing connections where directed. This reflects how well connected a country is in a particular network. We also take into account the amount of connectivity by computing the weighted incoming and outgoing degrees on each network, defined as the sum of the normalised flows from all neighbours and reflecting the volume of incoming and outgoing flows. In addition to these standard single-layer network metrics, we define and compute the global degree of a country, which takes into account connectivity across all networks.
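The single-layer degree metrics defined above can be gathered in one pass over a layer's edges. The sketch below is illustrative only: the layer format, the function name, the dictionary keys and the toy weights are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple

Edge = Tuple[str, str]


def layer_degrees(layer: Dict[Edge, float]) -> Dict[str, Dict[str, float]]:
    """Per-country in/out degree and weighted in/out degree on one layer."""
    stats: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"out": 0, "in": 0, "w_out": 0.0, "w_in": 0.0}
    )
    for (origin, dest), weight in layer.items():
        stats[origin]["out"] += 1       # one more outgoing partner
        stats[origin]["w_out"] += weight  # normalised outgoing flow
        stats[dest]["in"] += 1          # one more incoming partner
        stats[dest]["w_in"] += weight   # normalised incoming flow
    return dict(stats)


# Toy directed layer (made-up weights).
toy_layer = {("A", "B"): 0.5, ("A", "C"): 0.25, ("B", "A"): 1.0}
degrees = layer_degrees(toy_layer)
```

Each of the four resulting columns can then be rank-correlated against an indicator across countries.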
All degrees of single networks and the global degree appear vertically in Fig 7 and all indicators appear horizontally. In general, weighted outgoing degrees on the single networks perform best for the postal, trade, ip and flight networks. An exception from the physical flow networks is the migration network, where the incoming migration degree is more correlated with the various indicators. The best-performing degree, in terms of consistently high performance across indicators, is the global degree (for 7 of the indicators). This suggests that looking at how well connected a country is in the global multiplex can be more indicative of its socioeconomic profile as a whole than looking at single networks. A detailed correlation matrix including correlation coefficients is supplied in S1 Fig.
The GDP per capita and life expectancy are most closely correlated with the global degree, closely followed by the postal, trade and ip weighted degrees. This shows a relationship between national wealth and the flow of goods and information. The perception of corruption index (CPI), however, is most positively correlated with the out weighted degrees of the postal and trade networks, followed by the IP network, similar to their relationship with the happiness index. This signifies that less corrupt and happier countries have greater outflows in those respects. On the other hand, the Gini Index of inequality is distinctly most negatively correlated with the flight network, which means that countries with greater inequality have fewer incoming and outgoing flight connections. The ECI index is equally highly correlated with most network degrees, and especially the global degree, trade, ip and post degrees. Literacy, Education and mobile phone users per capita were more weakly correlated across networks than other indicators, which means that there may be better predictor variables beyond the scope of this work for those indicators. Fixed phone line households, Internet penetration and CO2 emissions, however, are positively correlated with the global degree, followed by the postal and ip degrees. This indicates the importance of global connectivity across networks with respect to these factors.
Similarly to GDP, the rate of poverty of a country is best represented by the global degree, followed by the postal degree. The negative correlation indicates that the more impoverished a country is, the less well connected it is to the rest of the world. Finally, one of the most strongly correlated indicators with the various degrees is the Human Development Index (HDI): low human development (high rank) is most highly negatively correlated with the global degree, followed by the postal, trade and ip degrees. This shows that high human development (low rank) is associated with high global connectivity and activity in terms of incoming and outgoing flows of information and goods. One notable observation is that the ip, postal and trade weighted out network degrees all have similar correlation patterns with the various indicators; the commonality between these networks is that they express the flow of resources from a country. Another observation is that weighted social media and migration outflow are weak predictors of the explored indicators. Because most indicators are related to each other, e.g., high GDP indicates low Poverty or high HDI indicates Happiness, when a degree is a predictor of one, it tends to be a good predictor of the others.
In this section we have shown that network science can provide reliable and easy to compute approximations of various indices and that connectivity between countries determines their position in global flow networks which relate to the success of their socioeconomic properties. Isolation of causative relationships between effects is notoriously difficult and the question of why some countries are prosperous while others are not is no exception (see Why Nations Fail (2012) D. Acemoglu and J. Robinson, Crown Publishing). Put simply, there are a myriad of confounding factors such as historical legacy, conflict and environmental factors that could lead countries with otherwise similar profiles to have wildly divergent economic outcomes.
Although our results do not provide insight into the cause of the socioeconomic circumstances of a country, our hypothesis is that network measures derived from global flow networks are a proxy of socioeconomic activities and therefore highly correlated with the explored indicators. It is an open question as to whether a highly central position in the network leads to favourable socio-economic outcomes or vice versa. The structural connectedness of a country in the global network represents the number of opportunities a country has to exchange goods, information and resources with other countries: the more opportunities, the higher the exchange and therefore socioeconomic benefit. An analogous relation between the social network of an individual and that individual’s poverty score supports this hypothesis. A broad longitudinal study would be necessary to assert whether a country’s growth in connections precedes its economic growth or vice versa, which is beyond the scope of this work. Next, we will look at the community structure of countries across networks and evaluate their community multiplexity to show that countries with similar socioeconomic profiles tend to cluster together, much like in social networks.
Global Community Analysis
In the previous section we related network measures to various socioeconomic indicators, showing that metrics such as the network degree can be used to estimate wellbeing at a national level. In this section, we further examine the connectedness between pairs of countries through community structure across network layers as a form of socioeconomic similarity. We use the Louvain modularity optimisation method for community detection in each individual network, which takes into account the tie strength of relationships between countries and seeks the partition that maximises modularity, that is, dense connections within communities and sparse connections between them. This returns between 4 and 6 communities for each network, the geographical distribution of which is shown in Fig 8.
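To make the objective of modularity optimisation concrete, the following minimal sketch scores a given partition with weighted modularity, Q = sum over communities c of [W_c / m - (S_c / 2m)^2], where m is the total edge weight, W_c the weight of edges inside c and S_c the summed weighted degree of the nodes in c. It only evaluates a partition and does not implement the Louvain search itself. It also assumes an undirected, symmetrised network without self-loops, which is a simplification; the toy graph and the function name are hypothetical.

```python
def modularity(edges, community_of):
    """Weighted modularity Q of a partition of an undirected graph.

    edges: list of (u, v, weight), each undirected edge listed once, no self-loops.
    community_of: dict mapping each node to a community label.
    """
    total = sum(weight for _, _, weight in edges)  # m: total edge weight
    internal = {}   # weight of edges inside each community
    strength = {}   # summed weighted degree of each community
    for u, v, weight in edges:
        cu, cv = community_of[u], community_of[v]
        strength[cu] = strength.get(cu, 0.0) + weight
        strength[cv] = strength.get(cv, 0.0) + weight
        if cu == cv:
            internal[cu] = internal.get(cu, 0.0) + weight
    return sum(internal.get(c, 0.0) / total - (s / (2 * total)) ** 2
               for c, s in strength.items())


# Hypothetical toy network: two tightly knit triangles joined by one weak tie.
edges = [("A", "B", 5.0), ("B", "C", 5.0), ("A", "C", 5.0),
         ("D", "E", 5.0), ("E", "F", 5.0), ("D", "F", 5.0),
         ("C", "D", 1.0)]

two_groups = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
one_group = {node: 0 for node in two_groups}

q_split = modularity(edges, two_groups)
q_single = modularity(edges, one_group)
print(f"Q with two communities: {q_split:.3f}; Q with one community: {q_single:.3f}")
```

Splitting the toy network into its two triangles scores higher than keeping all nodes together, which is the kind of partition a modularity-maximising method would favour.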
Although communities naturally seem to be strongly driven by geography in physical flow networks, this is not the case in digital networks, where communities are geographically dispersed. This indicates a difference between the way countries connect through post, trade, migration and flights and the way they connect on the IP and social media networks. However, what does it mean for two countries to both be members of the same network community? Common community membership indicates a level of connectedness between two countries that is beyond what would be expected at random for the network. It is often observed that nodes in the same community share many similar properties; it can therefore be expected that pairs of nodes which share multiple communities across networks are even more similar. In this work, we measure the overlap in community membership between pairs of countries across our six networks, that is, the number of networks in which the two countries belong to the same community, as the community multiplexity index, a measure of socioeconomic similarity.
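A minimal sketch of how such a pairwise index could be computed is given below, assuming that the index is simply the count of network layers in which two countries carry the same community label. The layer names are taken from the text, while the placeholder countries, their community labels and the function name are hypothetical.

```python
from itertools import combinations


def community_multiplexity(memberships):
    """Count, for every pair of countries, the layers in which they share a community.

    memberships: dict mapping layer name -> {country: community label}.
    Returns a dict mapping (country_a, country_b), sorted alphabetically, to a count.
    """
    counts = {}
    for layer in memberships.values():
        for a, b in combinations(sorted(layer), 2):
            counts.setdefault((a, b), 0)
            if layer[a] == layer[b]:
                counts[(a, b)] += 1
    return counts


# Hypothetical community labels for three placeholder countries in three layers.
memberships = {
    "postal":  {"A": 0, "B": 0, "C": 1},
    "trade":   {"A": 0, "B": 0, "C": 0},
    "flights": {"A": 1, "B": 1, "C": 0},
}

multiplexity = community_multiplexity(memberships)
for pair, shared in sorted(multiplexity.items()):
    print(pair, shared)
```

With six layers, as in this work, the index of a pair ranges from 0 (never in the same community) to 6 (in the same community in every network).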
Our hypothesis is that countries that are paired together in communities across more networks are more likely to be socioeconomically similar. We measure similarity here as the absolute difference between the values of each indicator from the previous section for two countries and plot that against their community multiplexity. For example, the United States has an average life expectancy of 70 years, whereas Afghanistan has an average life expectancy of 50. The absolute difference between the two is 20, which represents low similarity for this indicator when compared with the difference of 2 between the United States and the United Kingdom, whose life expectancy is 72. In Fig 9, we can observe the variations in similarity for countries with different levels of community multiplexity. What is immediately striking is that countries that share the maximal number of communities, and therefore exhibit the greatest community multiplexity, have the smallest margin of difference across all indicators. This suggests that countries with the highest community multiplexity have a very similar socioeconomic profile. This is confirmed by a two-sample Kolmogorov-Smirnov (KS) test between the distributions of differences in each indicator for pairs sharing different numbers of communities. Although the KS statistic is low between groups sharing 0 and 1 communities (approx. 0.1 for all indicators, p-value <0.01), it is very high between groups sharing 1 and 6 communities (0.4 and above, p-value <0.01), except for mobile phone penetration (detailed KS test results are presented in S1 Table).
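The comparison described above can be sketched as follows: indicator differences are grouped by the community multiplexity of each pair, and the two-sample KS statistic is the largest gap between the empirical distribution functions of two groups. This is an illustrative standard-library implementation that returns the statistic only, without a p-value (a library routine such as `scipy.stats.ks_2samp` would provide both). The placeholder countries, indicator values and multiplexity values are hypothetical.

```python
from bisect import bisect_right


def differences_by_multiplexity(indicator, multiplexity):
    """Group absolute indicator differences of country pairs by shared communities."""
    groups = {}
    for (a, b), shared in multiplexity.items():
        if a in indicator and b in indicator:
            groups.setdefault(shared, []).append(abs(indicator[a] - indicator[b]))
    return groups


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)  # share of sample_a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # share of sample_b that is <= x
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical life expectancy values (years) for four placeholder countries.
life_expectancy = {"A": 70, "B": 72, "C": 50, "D": 52}
# Hypothetical community multiplexity for each pair of countries.
multiplexity = {("A", "B"): 6, ("C", "D"): 6,
                ("A", "C"): 1, ("A", "D"): 1, ("B", "C"): 1, ("B", "D"): 1}

groups = differences_by_multiplexity(life_expectancy, multiplexity)
ks = ks_statistic(groups[1], groups[6])
print(f"KS statistic between pairs sharing 1 and 6 communities: {ks:.2f}")
```

In this toy example the two groups of differences do not overlap at all, so the statistic takes its maximum value; on real data the statistic would fall somewhere between 0 and 1.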
Further to this observation, for most indicators the level of community multiplexity is strongly significant: the higher the community multiplexity index between two countries, the smaller the difference between their socioeconomic profiles. There are notable exceptions to this, such as the mobile phone penetration ratio, where it appears that, apart from the highest level of multiplexity, all other pairs of countries are relatively similar in this respect, with low variation even for those pairs which share no communities. For all other indicators, such as GDP, literacy ratio, HDI and Internet penetration, there is a dramatic increase in similarity past a community multiplexity of 3. Ultimately, these similarities can be used to estimate the wellbeing of a country for which it is unknown from the values of its community neighbours.
Big data is often related to real-time data captured through the Internet or social networks. However, the digital divide makes access to big data insights for development more challenging in the least developed and many developing and emerging countries. Can we rely on other networks to overcome these critical data gaps with a view to better measuring and monitoring developmental progress? This is particularly important following the United Nations' adoption of the Sustainable Development Goals (SDGs) in September 2015, comprising 17 goals, 169 targets and almost 200 universal indicators, each of them calling for regular and increasingly disaggregated monitoring in every country during the 2016–30 period. This commitment invites a nuanced discussion on the nature and importance of measurement, inference and triangulation of data sources. This discussion is particularly pertinent in the face of complex intertwined developmental challenges in an age of increased globalisation, economic interdependence and climate change.
The work presented above has shown the value of measuring, comparing and combining metrics of global connectivity across six different global networks in order to approximate socioeconomic indicators and to identify network communities with similar connectivity profiles. We have shown how both digital and physical global network flows can support better monitoring of SDG indicators, as illustrated by the high correlation of Internet and postal flows with a broad range of socioeconomic indicators.
We also note the considerable potential, exposed here, for future applications of postal flow data. While we have restricted our analysis here to country-level relations, postal flows allow for socioeconomic mapping at a sub-national level, which can inform development programmes at a practical level. An additional dimension to be explored, which is beyond the scope of this paper, is temporal analysis which, combined with the multiplex network model presented above, could provide early warning of economic shocks and their propagation.
Interestingly, despite the ease of digital interactions and subsequent evidence that ‘distance is dead’, physical networks, particularly the global postal, flight and migration networks, are still stronger candidates for proxy variables in case of missing data than digital networks such as the Internet or social media. These networks not only reach populations excluded from access to digital communications, but are also associated with the highest number of country pairs sharing relatively similar socioeconomic patterns, in turn opening numerous ways of completing missing data with proxy variables. In the digital era, greater granularity and frequency of analysis and monitoring of SDGs can, paradoxically, be achieved through data from global physical networks. We expect that the value of digital communication networks as proxies will increase as they mature, expand and become more accessible. In the near future, both physical and digital networks will need to be combined to optimise monitoring efforts. In that sense, the emergence of the Internet of Things (IoT) could play a critical role by further blurring the boundaries between the digital and physical worlds.
S1 Fig. Correlation matrix augmented with correlation coefficients for each cell.
All results are statistically significant with p<0.05.
S1 Table. Two-sample Kolmogorov-Smirnov test statistic results and p-values for socioeconomic indicator differences between pairs of countries with minimal and maximal community multiplexity values (1 and 6).
S1 File. International postal network edges, where Source is the sending country, Target is the receiving country and Weight is the volume of post sent, normalised over the Source country population and scaled.
Desislava Hristova was supported by the Project LASAGNE, Contract No. 318132 (STREP), funded by the European Commission and EPSRC through Grant GALE (EP/K019392). We are grateful to Andrei Bejan for the statistics consultation and Noa Zilberman for advice on the DIMES Project data.
Conceived and designed the experiments: DH AR JA MLO. Performed the experiments: DH. Analyzed the data: DH AR JA. Contributed reagents/materials/analysis tools: AR JA MLO. Wrote the paper: DH AR JA MLO CM.
- 1. United Nations Global Pulse. Big data for development: Challenges & opportunities; 2012.
- 2. United Nations. A World That Counts: Mobilising the Data Revolution for Sustainable Development; 2014.
- 3. Arnulf JK, Larsen KR, Martinsen ØL, Bong CH. Predicting survey responses: How and why semantics shape survey statistics on organizational behaviour. PloS one. 2014;9(9):e106361. doi: 10.1371/journal.pone.0106361. pmid:25184672
- 4. Tufekci Z. Big Questions for Social Media Big Data: Representativeness, Validity and Other Methodological Pitfalls. In: Eighth International AAAI Conference on Weblogs and Social Media; 2014.
- 5. Barabási AL. The network takeover. Nature Physics. 2013;8:14–16.
- 6. Barigozzi M, Fagiolo G, Mangioni G. Community structure in the multi-network of international trade. In: Complex Networks. Springer; 2011. p. 163–175.
- 7. Acemoglu D, Ozdaglar A, Tahbaz-Salehi A. The network origins of large economic downturns. National Bureau of Economic Research; 2013.
- 8. Schich M, Song C, Ahn YY, Mirsky A, Martino M, Barabási AL, et al. A network framework of cultural history. Science. 2014;345(6196):558–562. doi: 10.1126/science.1240064. pmid:25082701
- 9. Kaluza P, Kölzsch A, Gastner MT, Blasius B. The complex network of global cargo ship movements. Journal of the Royal Society Interface. 2010;7(48):1093–1103. doi: 10.1098/rsif.2009.0495.
- 10. Hidalgo CA, Klinger B, Barabási AL, Hausmann R. The product space conditions the development of nations. Science. 2007;317(5837):482–487. doi: 10.1126/science.1144581. pmid:17656717
- 11. Guimera R, Mossa S, Turtschi A, Amaral L. The worldwide air transportation network: Anomalous centrality, community structure, and cities' global roles. PNAS. 2005;102(22). doi: 10.1073/pnas.0407994102. pmid:15911778
- 12. Llorente A, Garcia-Herranz M, Cebrian M, Moro E. Social media fingerprints of unemployment. PloS one. 2015;10(5):e0128692. doi: 10.1371/journal.pone.0128692. pmid:26020628
- 13. Rutherford A, Cebrian M, Rahwan I, Dsouza S, McInerney J, Naroditskiy V, et al. Targeted social mobilization in a global manhunt. PloS one. 2013;8(9):e74628. doi: 10.1371/journal.pone.0074628. pmid:24098660
- 14. Eagle N, Macy M, Claxton R. Network diversity and economic development. Science. 2010;328(5981):1029–1031. doi: 10.1126/science.1186605. pmid:20489022
- 15. Ugander J, Karrer B, Backstrom L, Marlow C. The anatomy of the Facebook social graph. arXiv preprint arXiv:1111.4503. 2011.
- 16. Magno G, Weber I. International gender differences and gaps in online social networks. In: Social Informatics. Springer; 2014. p. 121–138.
- 17. State B, Park P, Weber I, Macy M, et al. The mesh of civilizations in the global network of digital communication. PloS one. 2015;10(5):e0122543. doi: 10.1371/journal.pone.0122543. pmid:26024487
- 18. Cardillo A, Zanin M, Gómez-Gardeñes J, Romance M, del Amo AJG, Boccaletti S. Modeling the multi-layer nature of the European Air Transport Network: Resilience and passengers re-scheduling under random failures. The European Physical Journal Special Topics. 2013;215:23–33. doi: 10.1140/epjst/e2013-01712-8.
- 19. Szell M, Lambiotte R, Thurner S. Multirelational organization of large-scale social networks in an online world. Proceedings of the National Academy of Sciences. 2010;107(31):13636–13641. doi: 10.1073/pnas.1004008107.
- 20. Haythornthwaite C. Social networks and Internet connectivity effects. Information, Community & Society. 2005;8(2):125–147. doi: 10.1080/13691180500146185.
- 21. Hristova D, Musolesi M, Mascolo C. Keep Your Friends Close and Your Facebook Friends Closer: A Multiplex Network Approach to the Analysis of Offline and Online Social Ties. In: Eighth International AAAI Conference on Weblogs and Social Media; 2014.
- 22. Boccaletti S, Bianconi G, Criado R, Del Genio CI, Gómez-Gardeñes J, Romance M, et al. Structure and Dynamics of Multilayer Networks. Physics Reports. 2014;544. doi: 10.1016/j.physrep.2014.07.001.
- 23. Menichetti G, Remondini D, Bianconi G. Correlations between weights and overlap in ensembles of weighted multiplex networks. Physical Review E. 2014;90(6):062817. doi: 10.1103/PhysRevE.90.062817.
- 24. De Domenico M, Nicosia V, Arenas A, Latora V. Structural reducibility of multilayer networks. Nature communications. 2015;6. doi: 10.1038/ncomms7864.
- 25. De Domenico M, Solé-Ribalta A, Omodei E, Gómez S, Arenas A. Ranking in interconnected multilayer networks reveals versatile nodes. Nature communications. 2015;6. doi: 10.1038/ncomms7868.
- 26. Nicosia V, Latora V. Measuring and modeling correlations in multiplex networks. Physical Review E. 2015;92(3):032805. doi: 10.1103/PhysRevE.92.032805.
- 27. Kivelä M, Arenas A, Barthelemy M, Gleeson JP, Moreno Y, Porter MA. Multilayer networks. Journal of Complex Networks. 2014;2(3):203–271. doi: 10.1093/comnet/cnu016.
- 28. De Domenico M, Solé-Ribalta A, Cozzo E, Kivelä M, Moreno Y, Porter MA, et al. Mathematical Formulation of Multilayer Networks. Physical Review X. 2013;3. doi: 10.1103/PhysRevX.3.041022.
- 29. Battiston F, Nicosia V, Latora V. Structural measures for multiplex networks. Physical Review E. 2014;89(3):032804. doi: 10.1103/PhysRevE.89.032804.
- 30. Bianconi G. Statistical mechanics of multiplex networks: Entropy and overlap. Physical Review E. 2013;87. doi: 10.1103/PhysRevE.87.062806.
- 31. Bródka P, Skibicki K, Kazienko P, Musiał K. A degree centrality in multi-layered social network. In: International Conference on Computational Aspects of Social Networks; 2011.
- 32. Kwak H, Lee C, Park H, Moon S. What is Twitter, a Social Network or a News Media? In: WWW; 2010. p. 591–600.
- 33. Iacovacci J, Wu Z, Bianconi G. Mesoscopic structures reveal the network between the layers of multiplex data sets. Physical Review E. 2015;92(4):042806. doi: 10.1103/PhysRevE.92.042806.
- 34. De Domenico M, Lancichinetti A, Arenas A, Rosvall M. Identifying modular flows on multilayer networks reveals highly overlapping organization in interconnected systems. Physical Review X. 2015;5(1):011027. doi: 10.1103/PhysRevX.5.011027.
- 35. Peixoto TP. Inferring the mesoscale structure of layered, edge-valued, and time-varying networks. Physical Review E. 2015;92(4):042807. doi: 10.1103/PhysRevE.92.042807.
- 36. Mucha PJ, Richardson T, Macon K, Porter MA, Onnela JP. Community structure in time-dependent, multiscale, and multiplex networks. Science. 2010;328(5980):876–878. doi: 10.1126/science.1184819. pmid:20466926
- 37. Anson J, Helble M. A gravity model of international postal exchanges. In: Reforming the postal sector in the face of electronic competition. 2013; p. 36.
- 38. Universal Postal Union EPSP. Measuring postal e-services development; 2012. Available from: http://www.upu.int/uploads/tx_sbdownloader/studyPostalEservicesEn.pdf.
- 39. Shavitt Y, Shir E. DIMES: Let the Internet measure itself. ACM SIGCOMM Computer Communication Review. 2005;35(5):71–74. doi: 10.1145/1096536.1096546.
- 40. Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. Journal of Statistical Mechanics. 2008;10.
- 41. Harmon D, Lagi M, de Aguiar MA, Chinellato DD, Braha D, Epstein IR, et al. Anticipating Economic Market Crises Using Measures of Collective Panic. PloS one. 2015;10(7):e0131871. doi: 10.1371/journal.pone.0131871. pmid:26185988
- 42. Smith R. Distance is dead: the world will change. BMJ: British Medical Journal. 1996;313(7072):1572. doi: 10.1136/bmj.313.7072.1572. pmid:8990988