The authors have declared that no competing interests exist.
Conceived and designed the experiments: DH AR JA MLO. Performed the experiments: DH. Analyzed the data: DH AR JA. Contributed reagents/materials/analysis tools: AR JA MLO. Wrote the paper: DH AR JA MLO CM.
The digital exhaust left by flows of physical and digital commodities provides a rich measure of the nature, strength and significance of relationships between countries in the global network. With this work, we examine how these traces and the network structure can reveal the socioeconomic profile of different countries. We take into account multiple international networks of physical and digital flows, including the previously unexplored international postal network. By measuring the position of each country in the Trade, Postal, Migration, International Flights, IP and Digital Communications networks, we are able to build proxies for a number of crucial socioeconomic indicators such as GDP per capita and the Human Development Index ranking along with twelve other indicators used as benchmarks of national well-being by the United Nations and other international organisations. In this context, we have also proposed and evaluated a global connectivity degree measure applying multiplex theory across the six networks that accounts for the strength of relationships between countries. We conclude by showing how countries with shared community membership over multiple networks have similar socioeconomic profiles. Combining multiple flow data sources can help understand the forces which drive economic activity on a global level. Such an ability to infer proxy indicators in a context of incomplete information is extremely timely in light of recent discussions on measurement of indicators relevant to the Sustainable Development Goals.
The vast streams of data that are produced by the use of automated digital services such as social media, email and mobile phones, also known as ‘Big Data’, have for some time been leveraged in the private sector to assist in tasks as diverse as logistics, targeted advertising and offering personalised multimedia content. More recently, these same data sources and methodologies have begun to be used to assist humanitarian and development organisations, allowing new ways to use data to implement, monitor and evaluate programs and policies [
However, the use of Big Data for development is still relatively nascent and questions remain over the ability of such sources to measure or approximate metrics of interest. Invariably, data sources such as social networking applications enjoy deeper penetration in developed economies and rely on expensive technologies such as smart phones and robust communications infrastructure. It has been noted that measurements of human dynamics based on such recent platforms can lead to strong biases [
In this paper we present analysis of a data source which is undoubtedly ‘Big’ yet represents one of the most established and pervasive long-distance communications networks in the history of mankind. The international postal network (IPN) established in 1874 is administered by a dedicated United Nations specialised agency: the Universal Postal Union (UPU). Due to regulatory reporting requirements and the capabilities of automated data capture technologies such as RFID tags, the records of individual postal items maintained by UPU represent a rich record of human activity with unparalleled penetration, which can be expected to reflect individual level behaviour, local, regional and national economic activity and international economic relations.
Network representations have emerged as an extremely powerful and general framework for analysing and modeling systems as diverse as transportation, biological processes, academic authorship and logistics among others [
In this work, we explore over four years of daily postal data records between all countries by comparing them to other global flow networks, such as the trade, migration and digital networks. We show how the network properties of global flow networks can approximate critical socioeconomic indicators and how network communities formed across physical and digital flow networks can reveal socioeconomic similarities.
Real-time measurements of international flow networks can ultimately act as global monitors of wellbeing with positive implications for international development efforts. Using knowledge about the way in which countries interact through flows of goods, people and information, we use the principles of multiplexity theory to understand the strength of international ties and the network communities they form. In this section, we will detail the methods used to perform our analysis and the various datasets with focus on the international postal network (IPN), which has previously not been described.
Multiplexity, or the multiple layers of interactions between the same entities, has been explored in a wide range of systems from global air transportation [
A natural extension of a network in which edges between pairs of nodes represent a single kind of flow between those nodes, is to a
A comprehensive review of multiplex network models can be found in [
We can also compute the weighted global degree of a node
The average global degree is 110 and the average global weighted degree is 250, which means that each country connects with an average of 110 other countries through two or more layers. In terms of unweighted degree (number of unique connections globally in the multiplex) in
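As a rough illustration of the two global measures reported above, the sketch below counts a country's unique neighbours across all layers and, as a weighted variant, the number of layers linking it to each neighbour. The formula for the weighted global degree is not reproduced in this section, so the layer-count weighting, the function names and the toy layers are all assumptions made for illustration, not the authors' definition.

```python
def neighbours_by_layer(country, layers):
    """Map each neighbour of `country` to the set of layers linking them.

    `layers` maps a layer name to a set of directed (origin, destination)
    edges. Edge direction is ignored here, which is an assumption.
    """
    links = {}
    for name, edges in layers.items():
        for origin, dest in edges:
            if origin == country and dest != country:
                links.setdefault(dest, set()).add(name)
            elif dest == country and origin != country:
                links.setdefault(origin, set()).add(name)
    return links


def global_degree(country, layers):
    """Number of unique neighbours across all layers of the multiplex."""
    return len(neighbours_by_layer(country, layers))


def layer_count_degree(country, layers):
    """Assumed weighted variant: each neighbour counts once per shared layer."""
    return sum(len(names) for names in neighbours_by_layer(country, layers).values())


# Hypothetical three-layer multiplex over four countries.
toy_layers = {
    "post": {("A", "B"), ("A", "C")},
    "trade": {("B", "A"), ("A", "D")},
    "flights": {("A", "B")},
}
print(global_degree("A", toy_layers))       # unique neighbours of A
print(layer_count_degree("A", toy_layers))  # neighbour-layer pairs of A
```

Under this reading, the ratio of the weighted to the unweighted value gives the average number of layers shared with each neighbour.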
Networks are powerful representations of complex systems with a large degree of interdependence. In many such systems, however, the network naturally partitions into communities of nodes that share many dependencies with each other but fewer with the rest of the network. In the present context, communities are groups of countries that are more densely connected to each other than to the rest of the network. If two countries appear in the same community across many network layers, this can be considered a greater level of connectivity and an indicator of greater socioeconomic similarity that is not visible from the single network perspective. We formalise this idea as the
Having described our methodology using multiplex networks, which has not previously been applied to international flow networks, we next describe the six networks and fourteen global socioeconomic indicators at the core of our analysis.
Although postal flows are understood to follow a distance based gravity model [
In terms of daily activity, we can observe the mean relative number of daily items sent and received by countries during the period in
Volumes are shown in relative terms: they are proportional to, but do not equal, the actual number of items exchanged, because the underlying data are sensitive.
Next we report on the degree distributions of both the weighted and unweighted global postal graphs. The unweighted postal graph simply contains all directed edges present in the network regardless of flow volume. The weighted graph on the other hand also includes the weight of connections in the graph. We weight the network by summing the total annual volumes of directed flow between two countries, averaged over years and normalised over the population of the country of origin. We then further normalise by the maximum weight in the network, resulting in a value between 0 and 1, allowing us to compare values between networks. The weighted adjacency matrix of the top quartile of countries in terms of degree can be seen in
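The weighting scheme described in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout `(year, origin, destination, volume)`, the function name and the toy figures are hypothetical, and the average is taken over the years in which a pair has records, which the text does not specify.

```python
from collections import defaultdict


def normalise_flows(records, population):
    """Return edge weights in [0, 1] for directed country-to-country flows.

    records: iterable of (year, origin, destination, volume) tuples.
    population: dict mapping country to its population.
    """
    # Total volume per directed pair and year.
    yearly = defaultdict(lambda: defaultdict(float))
    for year, origin, dest, volume in records:
        yearly[(origin, dest)][year] += volume

    # Average the annual totals, then normalise by origin population.
    per_capita = {}
    for (origin, dest), by_year in yearly.items():
        mean_annual = sum(by_year.values()) / len(by_year)
        per_capita[(origin, dest)] = mean_annual / population[origin]

    # Rescale by the maximum weight so values are comparable across networks.
    max_weight = max(per_capita.values())
    return {edge: w / max_weight for edge, w in per_capita.items()}


# Hypothetical toy data: two countries, two years.
toy_records = [
    (2010, "A", "B", 100), (2011, "A", "B", 300),
    (2010, "B", "A", 50), (2011, "B", "A", 50),
]
toy_population = {"A": 1000, "B": 100}
print(normalise_flows(toy_records, toy_population))
```

In the toy data, country B sends fewer items than A in absolute terms but more per inhabitant, so the B to A edge receives the maximum weight.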
This work builds upon previous efforts using global flow networks to present novel data sources for international development efforts such as the IPN and to demonstrate a holistic view of several distinct flow networks. We consider five networks, which have been previously studied independently, along with the IPN. We will now describe these networks and compare their network properties in the following section.
The trade network is constructed from records maintained by the UN Statistics Division in the Comtrade Database and provided by the Atlas Project and contains the number and value of products traded between countries classified by commodity class.
The migration network is compiled from bilateral flows between 196 countries as estimated from sequential stock tables. It captures the number of people who changed their country of residence over a five-year period. This reflects
The flights data is collected by 191 national civil aviation administrations and compiled by the International Civil Aviation Organisation (ICAO). These tables detail, for all commercial passenger and freight flights, country of origin and destination and the number of flights between them. [
This city to city geocoded dataset is built from traceroutes in the form of directed IP to IP edges collected in a crowdsourced fashion by volunteers through the DIMES Project. The project relies on data from volunteers who have installed the measurement software which collects origin, destination and number of IP level edges which were discovered daily. We aggregate this data on a country to country basis and use it to construct an undirected Internet topology network, weighted by the number of IPs discovered and normalised by population as all other networks. The data collection methods are described in detail in the founding paper of the project [
The social media (SM) network is constructed from aggregated digital communication data from the Mesh of Civilizations project, where Twitter and Yahoo email data are combined to produce an openly available density measure of the strength of digital communication between nations [
In the following analysis we compare these networks and use multiplexity theory to extract knowledge about the strength of connectivity across them. We will distinguish between single layer and multiplex measures, which will allow us to observe to a deeper extent the international relationships and the potential for using global flow networks to estimate the wellbeing of countries in terms of a number of socioeconomic indicators (summarised in
Abbreviated | Full name | Description | Source |
---|---|---|---|
GDP | Gross Domestic Product | Aggregate measure of production on a per capita basis | The World Bank |
LifeExp | Life Expectancy | Life expectancy at birth in years | The World Bank |
CPI | Corruption Perception Index | Perceived levels of corruption, as determined by expert assessments and opinion surveys | Transparency International |
Happiness | Happiness Score | Survey of the state of global happiness perceptions | Gallup World Poll |
Gini.Idx | Gini Index | Income inequality on a national level | The World Bank |
ECI | Economic Complexity Index | Holistic measure of the production characteristics of large economic systems | The Observatory of Economic Complexity |
LitRate | Adult Literacy Rate | Percent of adult population who are literate | UNESCO |
PovRate | Poverty Rate | Percent of population living below national poverty threshold | The World Bank |
EdRate | Education Rate | Percent of population who have completed primary school | The World Bank |
CO2 | Emissions of carbon dioxide | Carbon dioxide in billions of metric tonnes per capita | Carbon Dioxide Information Analysis Center |
FxPhone | Fixed Phone Rate | Percent of population living in households with a fixed phone line | Int Telecommunication Union |
Inet | Internet penetration | Percent of population who have accessed the Internet in the past 12 months | Int Telecommunication Union |
Mobile | Mobile cellular subscriptions | Percent of population who have a mobile cellular subscription | Int Telecommunication Union |
HDI | Human Development Index | Composite statistic of life expectancy, education, and income per capita indicators | UNDP |
In order to understand the multiplex relationships of countries through flows of information and goods in context, we first compare all flow networks. We then present their respective and collective ability to approximate crucial socioeconomic indicators and finally perform a network community analysis of individual networks and their multiplex communities where the most socioeconomically similar countries can be found.
Although each of the five networks previously described apart from the International Postal Network (IPN) has been studied separately, there has not been a comparative analysis of all. In
Network | Weight | Years | Nodes | Edges | Mean degree | Assortativity | Density | Clustering coefficient |
---|---|---|---|---|---|---|---|---|
Post | postal items | 2010–15 | 201 | 22,280 | 110.85 | -0.26 | 0.55 | 0.79 |
Trade | export value | 2007–12 | 228 | 30,235 | 132.6 | -0.39 | 0.58 | 0.84 |
Migration | migrants | 2005–10 | 193 | 11,431 | 59.22 | -0.33 | 0.31 | 0.68 |
Flights | flights | 2010–15 | 223 | 6,425 | 28.81 | -0.1 | 0.13 | 0.49 |
IP | IPs | 2007–11 | 225 | 9,717 | 43.19 | -0.42 | 0.19 | 0.6 |
SM | density | 2009 | 147 | 10,667 | 145.13 | -0.02 | 0.98 | 0.99 |
We weight all networks by normalising the raw volume of interaction described above by the population of each respective country of origin and rescaling all weights across networks within the same range [0, 1] by dividing by the maximal weight, as we did for the postal network in the previous section. We compute the average out degree for each directed network in a standard way as for the postal network, as well as the degree assortativity (Pearson correlation between the degrees of all pairs of connected countries), the network density and clustering coefficient. The assortativity coefficient determines to what extent nodes in the network have mixing patterns that are determined by their degree. Positive assortativity means that nodes with high degree tend to connect to other nodes with high degree, whereas a negative assortativity means that nodes with high degree tend to connect with others with lower degree, which is the case for all of the six networks as seen in
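Two of the summary statistics in the table can be checked directly from the node and edge counts, and degree assortativity can be illustrated on a toy graph. The sketch below is a simplified illustration: the function names are hypothetical, and the assortativity function treats edges as undirected, which is an assumption since the text does not say how direction was handled.

```python
def mean_out_degree(n_nodes, n_edges):
    """Average number of outgoing edges per node in a directed graph."""
    return n_edges / n_nodes


def directed_density(n_nodes, n_edges):
    """Fraction of possible directed edges (no self-loops) that are present."""
    return n_edges / (n_nodes * (n_nodes - 1))


def degree_assortativity(edges):
    """Pearson correlation between the degrees at the two ends of each edge.

    Edges are treated as undirected; each is counted in both orientations
    so that the measure is symmetric. Undefined if all degrees are equal.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Postal network counts from the table: 201 nodes, 22,280 directed edges.
print(round(mean_out_degree(201, 22280), 2))
print(round(directed_density(201, 22280), 2))

# A star graph is maximally disassortative: the hub links only to leaves.
print(degree_assortativity([("hub", "a"), ("hub", "b"), ("hub", "c")]))
```

The star graph shows the extreme case of the negative assortativity discussed above, where a high-degree node connects only to low-degree nodes.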
We now turn to
When considering the degree of a country as an indicator of its position in the network, we find that there are high correlations between the in and out positions of countries in
Timely statistics on key metrics of socio-economic status are essential for the provision of services to societies, in particular marginalised populations. The motivation for this measurement varies from social resilience in the event of natural or man-made disasters to ensuring social rights such as education and access to information. While national governments typically administer their territories and allocate resources in terms of sub-national divisions, international organisations such as the United Nations and the World Bank, as well as regional organisations and blocs such as the Economic Commission for Latin America and the Caribbean and the African Union, invariably partition populations by nation state. In this context, the nation state is the primary geographical entity considered for funding, planning and allocation of resources for development. Despite the importance of accurate statistics to quantify the state of a country and progress towards favourable socio-economic outcomes, regular and reliable measurement is difficult and costly, particularly in low income countries.
With this in mind, in this section we compare the positions of countries within the different networks discussed previously to the values of several socioeconomic indicators.
For each of the six networks, we compute the network degree, defined as the number of neighbours, counting both incoming and outgoing connections where the network is directed. This reflects how well connected a country is in a particular network. We also take into account the amount of connectivity by computing the weighted incoming and outgoing degrees on each network, defined as the sum of the normalised flows to or from all neighbours and reflecting the volume of incoming and outgoing flows. In addition to these standard single-layer network metrics, we define and compute the
All degrees of single networks and the global degree appear vertically in
GDP per capita and life expectancy are most closely correlated with the global degree, closely followed by the postal, trade and IP weighted degrees. This shows a relationship between national wealth and the flow of goods and information. The Corruption Perception Index (CPI), however, is most positively correlated with the weighted out-degrees of the postal and trade networks, followed by the IP network, and the happiness index shows a similar pattern. This signifies that less corrupt and happier countries have greater outflows in those respects. On the other hand, the Gini Index of inequality is distinctly most negatively correlated with the flight network, which means that countries with greater inequality have fewer incoming and outgoing flight connections. The Economic Complexity Index (ECI) is highly correlated with most network degrees, especially the global, trade, IP and postal degrees. Literacy, education and mobile phone users per capita were more weakly correlated across networks than other indicators, which suggests that there may be better predictor variables for those indicators beyond the scope of this work. Fixed phone line households, Internet penetration and CO2 emissions, however, are positively correlated with the global degree, followed by the postal and IP degrees. This indicates the importance of global connectivity across networks with respect to these factors.
Similarly to GDP, the rate of poverty of a country is best represented by the global degree, followed by the postal degree. The negative correlation indicates that the more impoverished a country is, the less well connected it is to the rest of the world. Finally, one of the indicators most strongly correlated with the various degrees is the Human Development Index (HDI): low human development (high rank) is most strongly negatively correlated with the global degree, followed by the postal, trade and IP degrees. This shows that high human development (low rank) is associated with high global connectivity and activity in terms of incoming and outgoing flows of information and goods. One notable observation is that the IP, postal and trade weighted out-degrees all have similar correlation patterns with the various indicators; what these networks have in common is that they express the flow of resources from a country. Another observation is that weighted social media and migration outflows are weak predictors of the explored indicators. Because most indicators are related to each other (for example, high GDP indicates low poverty and high HDI indicates happiness), a degree that is a good predictor of one tends to be a good predictor of the others.
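The kind of comparison discussed here, between a network degree and an indicator across countries, can be sketched with a rank correlation. This section does not state which correlation coefficient was used, so Spearman's rank correlation is an assumption, and the degree and poverty values below are invented toy numbers rather than data from the study.

```python
def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical values for four countries (not taken from the paper).
toy_global_degree = [10, 50, 120, 180]
toy_poverty_rate = [40.0, 35.0, 12.0, 5.0]
print(spearman(toy_global_degree, toy_poverty_rate))
```

A rank-based coefficient is a natural fit when an indicator such as the HDI is expressed as a ranking, since it depends only on the ordering of countries and not on the scale of the values.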
In this section we have shown that network science can provide reliable and easy-to-compute approximations of various indices, and that connectivity between countries determines their position in global flow networks, which in turn relates to their socioeconomic outcomes. Isolating causal relationships is notoriously difficult, and the question of why some countries are prosperous while others are not is no exception (see D. Acemoglu and J. Robinson, Why Nations Fail, Crown Publishing, 2012). Put simply, there are myriad confounding factors, such as historical legacy, conflict and environmental factors, that could lead countries with otherwise similar profiles to have wildly divergent economic outcomes.
Although our results do not provide insight into the cause of the socioeconomic circumstances of a country, our hypothesis is that network measures derived from global flow networks are a proxy for socioeconomic activities and are therefore highly correlated with the explored indicators. It is an open question whether a highly central position in the network leads to favourable socio-economic outcomes or vice versa. The structural connectedness of a country in the global network represents the number of opportunities a country has to exchange goods, information and resources with other countries: the more opportunities, the greater the exchange and therefore the socioeconomic benefit. An analogous relation between the social network of an individual and that individual’s poverty score supports this hypothesis [
In the previous section we related network measures to various socioeconomic indicators, showing that metrics such as the network degree can be used to estimate wellbeing at a national level. In this section, we further examine the connectedness between pairs of countries through community structure across network layers as a form of socioeconomic similarity. We use the Louvain modularity optimisation method [
Although communities appear to be strongly driven by geography in physical flow networks, this is not the case in digital networks, where communities are geographically dispersed. This indicates a difference between the way countries connect through post, trade, migration and flights and the way they connect through the IP and social media networks. However,
Our hypothesis is that countries that are paired together in communities across more networks are more likely to be socioeconomically similar. We measure similarity here as the absolute difference between the values of each indicator from the previous section for two countries, and plot that against their community multiplexity. For example, the United States has an average life expectancy of 70 years, whereas Afghanistan has an average life expectancy of 50; the absolute difference between the two is 20 years, which represents low similarity for this indicator compared with the difference of 2 years between the United States and the United Kingdom, whose life expectancy is 72. In
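The pairwise comparison described in this paragraph can be sketched as below. The life expectancy figures are the illustrative ones quoted in the text; the community assignments, layer names and function names are hypothetical, and counting shared layers is one plain reading of community multiplexity rather than a reproduction of the formal definition.

```python
def community_multiplexity(a, b, memberships):
    """Number of layers in which countries a and b share a community.

    memberships: dict mapping layer name to {country: community id}.
    Layers in which either country is missing are not counted.
    """
    return sum(
        1
        for communities in memberships.values()
        if a in communities and b in communities
        and communities[a] == communities[b]
    )


def indicator_gap(a, b, indicator):
    """Absolute difference in an indicator; smaller means more similar."""
    return abs(indicator[a] - indicator[b])


# Life expectancy values as quoted in the example in the text.
life_expectancy = {"US": 70, "AF": 50, "UK": 72}

# Hypothetical community assignments on three layers.
toy_memberships = {
    "post": {"US": 0, "UK": 0, "AF": 1},
    "trade": {"US": 0, "UK": 0, "AF": 0},
    "flights": {"US": 2, "UK": 2, "AF": 3},
}

for pair in [("US", "UK"), ("US", "AF")]:
    print(pair,
          community_multiplexity(*pair, toy_memberships),
          indicator_gap(*pair, life_expectancy))
```

In this toy setting the pair that shares communities on more layers also has the smaller indicator gap, which is the pattern the hypothesis predicts.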
Further to this observation, in most indicators there is a very strong significance in the level of community multiplexity—
Big data is often associated with real-time data captured through the Internet or social networks. However, the digital divide makes access to big data insights for development more challenging in the least developed and many developing and emerging countries. Can we rely on other networks to overcome these critical data gaps in view of better measuring and monitoring developmental progress? This is particularly important following the United Nations adoption of the Sustainable Development Goals (SDG) in September 2015, comprising 17 goals, 169 targets and almost 200 universal indicators, each of them calling for regular and increasingly disaggregated monitoring in every country during the 2016–30 period. This commitment invites a nuanced discussion on the nature and importance of measurement, inference and triangulation of data sources. This discussion is particularly pertinent in the face of complex intertwined developmental challenges in an age of increased globalisation, economic interdependence and climate change.
The work presented above has shown the value of measuring, comparing and combining metrics of global connectivity across six different global networks in order to approximate socioeconomic indicators and to identify network communities with similar connectivity profiles. We have shown how both digital and physical global network flows can support better monitoring of SDG indicators, as illustrated by the high correlation of Internet and postal flows with a broad set of socioeconomic indicators.
We also note the considerable potential, exposed here, for future applications of postal flow data. While we have restricted our analysis here to country-level relations, postal flows allow for socio-economic mapping at a sub-national level, which can inform development programmes on a practical level. An additional dimension to be explored, beyond the scope of this paper, is temporal analysis which, combined with the multiplex network model presented above, could provide early warning of economic shocks and their propagation [
Interestingly, despite the ease of
All results are statistically significant with p<0.05.
Desislava Hristova was supported by the Project LASAGNE, Contract No. 318132 (STREP), funded by the European Commission and EPSRC through Grant GALE (EP/K019392). We are grateful to Andrei Bejan for the statistics consultation and Noa Zilberman for advice on the DIMES Project data.