Community Evolution in International Migration Top1 Networks

Focusing on each country's topmost destination/origin migration relation with other countries, this study builds top1 destination networks and top1 origin networks in order to understand their skeletal construction and community dynamics. Each top1 network covers approximately 50% of the complete migrant network stock for each decade between 1960 and 2000. We investigate the community structure by implementing the Girvan-Newman algorithm and compare the number of components and communities to illustrate their differences. We find that (i) both top1 networks (origin and destination) exhibited communities with a clear structure and a surprising evolution, although 80% of edges persist from one decade to the next; (ii) top1 destination networks focused on developed countries, exhibiting shorter paths and a preference for more advanced countries, while top1 origin networks focused on both developed and more substantial developing nations, presenting longer paths and more stable groups; (iii) only a few countries have a decisive influence on the community evolution of both top1 networks. The USA took the leading position as a destination country in top1 destination networks, while China and India were the main Asian emigration countries in top1 origin networks; European countries and the Russian Federation played an important role in both.


Introduction
Cross-border migration is regarded as a fundamental characteristic of human life and has turned into a significant force all over the world with important economic, social, and political implications [1]. Moreover, after constant growth throughout the last five decades of the twentieth century from 93 to 167 million [2], global migration reached 232 million people living outside their country of origin in 2013, and this figure is forecast to double by 2050 [3]. Even more essential is the fact that the increasing complexity of migratory patterns (global mobility affecting individuals, families, industries, and countries, along with the possibility of redesigning the world in which we live) has led migration to become a priority for international communities [4].
relationships and exhibit communities with a clear structure but with a surprising evolution. The remaining (20%) are strong enough to promote changes in the communities; they show an opposite trend concerning the number of clusters. In other words, featuring a decrease in the number of communities between 1960 and 2000, Top1D focused on developed countries, exhibited shorter paths, and preferred more advanced countries. Over the same time period, with an increasing number of communities, Top1O focused on both developed and more substantial developing nations; this presents longer paths and more stable groups. Furthermore, by exploring the composition of the communities, we notice that countries with traditions in receiving immigrants (e.g., USA) experienced a change in terms of the composition and number of migrants compared with the more stable situation in Australia and New Zealand. World War II was followed by independence for most former colonies of the European and Japanese empires [23]. Europe, previously the dominant source of migrants, was replaced by Africa, Asia, Latin America, and the Caribbean.
Instead of their empires, societies like Britain and France created links with those former colonies, which fostered migration among them and changed their position into immigrant-receiving countries. Japan, with a declining birth rate and aging symptoms, was in a position to import immigrants from poorer countries of Asia and South America in order to meet the workforce needs [1]. Taking into consideration the effects generated, comprehending the evolution of international migration communities can reveal more about the migration patterns.

Data and Complete International Migration Network (CIMN)
The source for data employed in this paper is United Nations Population Division's Global Migration Database [2], which consists of an origin-destination square matrix tracking bilateral migration between 231 countries for each decade between 1960 and 2000 (e.g., 1975 through 1984 is assigned to 1980). Just one standard list of countries is chosen for the entire time span of this database, regarding both origin and destination countries; this allowed comparison of the migration figures over time. For example, the 15 new sovereign states created after the break-up of the Soviet Union were treated as separate countries at every decade between 1960 and 2000 (that is, the internal migrants during the years of the Soviet Union in this database were considered as international migrants) [2]. This unique dataset comprises 3500 individual census and population records and provides information on international migrant bilateral stocks [2]. Preferentially, country of birth is used to define country of origin, and migration data are firstly provided by the destination country [2].
Bilateral data included in this dataset refer to immigrant stock instead of flows to facilitate interpretation [24]. Furthermore, data for bilateral flows are only available for OECD countries, which limits the overall coverage considerably [25]. Therefore, seeking to provide an overview of the migration phenomenon covering a larger number of countries, we chose to use the migrant stock [14,16,17,19,20]. An important note is that this database does not include two important aspects of migration, that is, illegal and within-border (internal) migration.
Nearly all systems analysed today are constructed from many elements; each has an independent role but contributes to the whole. By considering only the degree of a node, we ignore the proven fact that even nodes with small degrees can play an important role in connecting different regions of the network by serving as bridges. Another neglected aspect is the loss of the possibility of emphasizing the relationships between countries and groups of countries.
Starting from the idea that relationships and connections are some of the most significant components that characterize the shape and the conduct of the physical and social world as we comprehend it, we begin our analysis with the fact that international migration can be treated as a network of nodes (i.e., countries) that are connected via links that represent the migrant stock. Inside the origin-destination matrix, the Channel Islands, Isle of Man, Kosovo, Montenegro and Serbia do not share any relationships with other nodes, so we reduce the database to 226 countries [14,16,17,19,20].
Given these issues, we define the complete international migration network (CIMN) as a weighted, directed network: $M^t = \{w^t_{ij}\}_{N \times N}$, where $M^t$ represents the matrix, the time is year $t \in \{1960, 1970, 1980, 1990, 2000\}$, the number of countries is $N = 226$, and $w^t_{ij}$ is the stock of migrants born in country $i$ living in destination country $j$ at time $t$.
Accordingly, we define the binary projection of the international migration network as $a^t_{ij} = 1$ if $w^t_{ij} > 0$ and $a^t_{ij} = 0$ otherwise, so that $a^t_{ij}$ expresses the presence of migrants born in country $i$ living in destination country $j$ at time $t$. International migration provides an impressive network that encompasses countries connected by several links of cross-border movements. Compared with the initial facts (e.g., the total number of immigrants living in each country), the analysis of entire network characteristics gives an integrated knowledge of human migration and shows how changes in the behaviour of a node can influence other apparently unrelated nodes.

Top1 networks
CIMN is represented by a huge cluster with a large number of links established between network nodes. For each country, we rank its migration relationships with other countries by migrant stock because some migration relationships are more important than others, especially the ones ranked first, which are called top1 [22].
The top1 network comprises each country's topmost migration relationship (the strongest link) with other countries, the top2 network comprises each country's top two migration relationships (the strongest and second strongest links), and so on. The edges included in the top1 network cover approximately 50% of the complete migrant network stock, while the top2 and top3 networks cover around 61% and 69%, respectively. These percentages are generally stable over time. The impressive percentage of the top1 network captured our attention. For example, consider the data for international migration in 1960. When only the strongest tie for each country is kept, the top1 network includes only 226 links (from a total of 16485 links) but covers a migrant stock of around 46.5 million (that is, around 50% of the total 93 million). Therefore, starting from the weighted directed matrix $M^t$ and considering each country's topmost migrant stock born in country $i$ and living in destination country $j$, we build two top1 networks. The first is the international migration top1 destination network (Top1D), defined as follows: $dw^t_{ij} = w^t_{ij}$ if $w^t_{ij} = \max_k w^t_{ik}$ and $dw^t_{ij} = 0$ otherwise, where $dw^t_{ij}$ is the maximum migrant stock born in $i$ living in destination $j$ at time $t$, taken over all destinations of the country of origin $i$.
Accordingly, we define the binary projection of Top1D as follows: $da^t_{ij} = 1$ if $dw^t_{ij} > 0$ and $da^t_{ij} = 0$ otherwise. We define the second by extracting the international migration top1 origin network (Top1O): $ow^t_{ij} = w^t_{ij}$ if $w^t_{ij} = \max_k w^t_{kj}$ and $ow^t_{ij} = 0$ otherwise, where $ow^t_{ij}$ represents the maximum migrant stock born in $i$ living in destination $j$ at time $t$, taken over all origins of the country of destination $j$.
The binary projection of Top1O is defined as follows: $oa^t_{ij} = 1$ if $ow^t_{ij} > 0$ and $oa^t_{ij} = 0$ otherwise. One crucial attribute of the top-ranked networks refers to the out- and in-degree. Specifically, in top destination networks, all nodes have an out-degree of the selected standard but with varying in-degrees across countries. As an example, in the top1 destination network, all the countries have out-degrees equal to 1, while the in-degrees vary across the nodes and can be greater than 1. This means that a country can have only one biggest destination but can be the topmost destination for many other countries. Of course, if the country is not selected as a destination, its in-degree will equal 0. On the other hand, in top origin networks, all the countries have an in-degree of the selected standard but their out-degree varies across the countries. Fig 2 shows these networks using international migration data for the year 2000. To construct this figure, we began with the complete international migration network. Next, we kept each country's strongest migration link and extracted Top1D (the strongest outgoing link) and Top1O (the strongest incoming link). In the same way, Top2D and Top2O networks can be extracted by keeping each country's top two migration ties (the strongest and second strongest links), etc.
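The extraction step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `top1_destination` and `top1_origin`, the dictionary layout, and the toy stocks are hypothetical and are not taken from the paper or its dataset; ties are broken by keeping the first link encountered, which the paper does not specify.

```python
from typing import Dict, Tuple

# (origin i, destination j) -> migrant stock w_ij; layout is an assumption
Weights = Dict[Tuple[str, str], float]


def top1_destination(w: Weights) -> Weights:
    """Keep, for each origin i, only its strongest outgoing link (Top1D)."""
    best: Dict[str, Tuple[str, float]] = {}
    for (i, j), stock in w.items():
        if stock > 0 and (i not in best or stock > best[i][1]):
            best[i] = (j, stock)
    return {(i, j): stock for i, (j, stock) in best.items()}


def top1_origin(w: Weights) -> Weights:
    """Keep, for each destination j, only its strongest incoming link (Top1O)."""
    best: Dict[str, Tuple[str, float]] = {}
    for (i, j), stock in w.items():
        if stock > 0 and (j not in best or stock > best[j][1]):
            best[j] = (i, stock)
    return {(i, j): stock for j, (i, stock) in best.items()}


# Toy stocks (hypothetical numbers, not from the Global Migration Database).
toy = {
    ("A", "B"): 50.0, ("A", "C"): 20.0,
    ("B", "C"): 30.0, ("B", "A"): 10.0,
    ("C", "A"): 5.0,
}
top1d = top1_destination(toy)  # one outgoing edge per origin
top1o = top1_origin(toy)       # one incoming edge per destination
```

In the toy network every node keeps exactly one outgoing edge in `top1d` and exactly one incoming edge in `top1o`, which mirrors the degree property stated in the text.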

Network measures
By computing the basic information for all the nodes of the network, the topology can be characterized through selected metrics [14,26,27]. The most widespread and accepted of these metrics include density (i.e., the ratio of the number of edges to the number of possible edges), diameter (i.e., the maximum shortest path length in a network), average path length (APL, i.e., the average length of all the shortest paths from or to the vertices in the network, considering directed paths in directed graphs), and degree (ND, i.e., the number of adjacent edges of a node). In directed graphs, the degree of a node is defined as the sum of its in-degree ($ND_{in}$) and its out-degree ($ND_{out}$), $ND_i = ND_{in,i} + ND_{out,i}$, where the in-degree $ND_{in,i}$ of node $i$ is defined as the number of edges going to $i$, and its out-degree $ND_{out,i}$ is defined as the number of edges exiting from $i$. In terms of the adjacency matrix, we can write the following: $ND_{in,i} = \sum_{j} a_{ji}$ and $ND_{out,i} = \sum_{j} a_{ij}$. For the original non-symmetrical matrices, the strength or weighted vertex degree is calculated by summing the weights of the adjacent edges for each vertex (i.e., country). Mode is defined as "out" for out-degree and "in" for in-degree.
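As a minimal sketch of these measures, assuming edges are stored as (origin, destination) pairs, the following Python helpers compute in- and out-degree, strength with a `mode` argument, and density for a directed graph without self-loops. The helper names and the toy data are illustrative assumptions, not the authors' code.

```python
from collections import Counter


def degrees(edges):
    """Return (in_degree, out_degree) counters for directed edges (i, j)."""
    indeg, outdeg = Counter(), Counter()
    for i, j in edges:
        outdeg[i] += 1  # edge exits from i
        indeg[j] += 1   # edge goes to j
    return indeg, outdeg


def strength(weights, mode="out"):
    """Sum edge weights per node; mode is "out" (by origin) or "in" (by destination)."""
    s = Counter()
    for (i, j), w in weights.items():
        s[i if mode == "out" else j] += w
    return s


def density(n_nodes, n_edges):
    """Ratio of existing edges to the n(n-1) possible directed edges."""
    return n_edges / (n_nodes * (n_nodes - 1))


# Toy directed network (hypothetical values).
toy_weights = {("A", "B"): 4.0, ("A", "C"): 1.0, ("B", "C"): 2.0}
indeg, outdeg = degrees(toy_weights)
out_strength = strength(toy_weights, mode="out")
in_strength = strength(toy_weights, mode="in")
toy_density = density(3, len(toy_weights))
```

The total degree of a node is then `indeg[v] + outdeg[v]`, matching the definition of ND given above.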
Considering these metrics and their evolution will reveal how the composition of migration changed due to world events and progressively more selective immigration regulations in developed countries. As an overview, the migration process is influenced at every decade by the globalization phenomenon.
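To compare the two top1 networks and to emphasize their importance in the CIMN, we built a global-level index at every decade: $pTop1D^t = \sum_{i,j} dw^t_{ij} / \sum_{i,j} w^t_{ij}$ and $pTop1O^t = \sum_{i,j} ow^t_{ij} / \sum_{i,j} w^t_{ij}$, where $pTop1D^t$ and $pTop1O^t$ represent the proportion of each top1 network in the CIMN at time $t$.
Furthermore, in order to check the proportion of persistent edges in top1 networks, we built the following index: $pSETop1D^t = |E(Top1D^t) \cap E(Top1D^{t-1})| / |E(Top1D^t)|$ and $pSETop1O^t = |E(Top1O^t) \cap E(Top1O^{t-1})| / |E(Top1O^t)|$, where $E(\cdot)$ denotes the set of edges of a network and $t-1$ the previous decade; $pSETop1D^t$ and $pSETop1O^t$ emphasize the stability of edges in each top1 network and represent the proportion of edges persisting from the previous phase in the total number of edges.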
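A small Python sketch of the two indices follows, assuming that the first is the share of total migrant stock carried by a top1 network and the second is the share of current edges that already existed in the previous decade. The function names and toy values are hypothetical and only illustrate the arithmetic.

```python
def stock_share(top1_weights, full_weights):
    """Proportion of the total migrant stock that is carried by the top1 edges."""
    return sum(top1_weights.values()) / sum(full_weights.values())


def persistent_share(edges_now, edges_prev):
    """Proportion of current edges that were also present in the previous decade."""
    edges_now = set(edges_now)
    return len(edges_now & set(edges_prev)) / len(edges_now)


# Toy data (hypothetical, not from the dataset).
full = {("A", "B"): 60.0, ("A", "C"): 40.0}
top1 = {("A", "B"): 60.0}
share = stock_share(top1, full)

prev_edges = [("A", "B"), ("B", "C"), ("C", "A")]
now_edges = [("A", "B"), ("B", "C"), ("C", "B")]
persistence = persistent_share(now_edges, prev_edges)
```

Here two of the three current edges were already present one decade earlier, so `persistence` is two thirds.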

Communities
Despite the significant role of human migration as a major contributor to globalization, community structure and evolution inside the top1 networks remain poorly understood.
In most real-world networks, especially in social networks, nodes tend to form tightly knit groups with a relatively high density of ties [28,29]. The literature provides several definitions and methods to detect communities. Most algorithms fall into one of three categories: divisive [30], agglomerative [31], and optimization-based [32]. Linkage-based approaches exploit the topological information of a network to identify dense sub-graphs.
Among the available detection algorithms for a network with directed, weighted edges, community detection based on edge betweenness is the most suitable [30]. Thus, using the Girvan-Newman algorithm [30,33], we detect the communities in both top1 networks. The betweenness centrality of an edge is defined analogously to node betweenness, as the number of shortest paths among all possible node pairs that pass through that edge. The edges with the maximum score are assumed to be the most important for keeping the graph connected. Granovetter called these edges "weak ties" that interconnect clusters of nodes [34]. The algorithm computes the edge betweenness of every edge in the graph, removes the edge with the highest score, recalculates the edge betweenness of the remaining edges, again removes the one with the highest score, and so on.
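The removal loop described above can be illustrated with a simplified, self-contained Python sketch. It is not the implementation used in the paper: it treats the graph as unweighted and undirected, assumes no self-loops, and stops at the first split, whereas a full run would continue removing edges and then select a partition.

```python
from collections import deque


def edge_betweenness(adj):
    """Brandes-style edge betweenness for an unweighted, undirected graph."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = {v: 0.0 for v in adj}
        sigma[s] = 1.0
        preds = {v: [] for v in adj}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies from the farthest nodes back to s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += c
                delta[v] += c
    # Each node pair is visited from both ends, so scores are doubled;
    # the ranking of edges is unaffected.
    return score


def components(adj):
    """Connected components of an undirected graph given as {node: set(neighbours)}."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def girvan_newman_split(adj):
    """Remove highest-betweenness edges until the graph gains one component."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    start = len(components(adj))
    while len(components(adj)) == start:
        score = edge_betweenness(adj)
        if not score:  # no edges left to remove
            break
        u, v = max(score, key=score.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Toy graph: two triangles joined by a single bridge (3-4).
toy = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
communities = girvan_newman_split(toy)
```

In the toy graph the bridge between the two triangles carries every shortest path between them, so it is removed first and the two triangles emerge as separate communities.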

CIMN and Top1 Networks: Descriptive statistics
Because migration occurs inside the network, examining its characteristics is essential for comprehending migration patterns. Changes in dynamics, through the appearance and disappearance of some links or through shortcuts that avoid a longer path, affect the architecture of the networks between consecutive decades. The top1 networks extract the most important links from the overall international migration network. Accordingly, the migrant stock in the top1 destination network and the top1 origin network makes up around 50% of the total global migrant stock. The percentage for Top1D varies from 53.6% in 1960 to 48% in 2000, while that for Top1O varies from 54.1% in 1960 to 35.4% in 2000; this once again shows the impact of globalization on the phenomenon of migration.
Considering its distinct relevance, we have taken the step of investigating the structure and evolution of the complete international migration network and both top1 networks. Table 1 presents basic descriptive statistics about the complete international migration network in ten-year intervals spanning the years 1960-2000. With a constant number of nodes (equal to 226) across the five networks, the CIMN features both extensive and intensive growth. First, the number of links between countries grew 45% from 16485 in 1960 to 23718 in 2000. On average, at every decade, around 1446 new links were established between pairs of countries. This has resulted not only in an increase in density from 0.324 to 0.466 but also in an increase in the mean node degree from 145.9 to 209.9, while the average path length decreased from 1.749 to 1.535. Further, the maximal in-degree and out-degree show the same pattern, reaching the highest level in 2000 when a node received migrants from almost all the countries of the network (i.e., 223) and sent migrants to 216. Second, the migrant stock increased remarkably. The mean node strength (expressed in thousands) increased from 5.646 to 7.044 migrants, recording a maximal value of 34814.064 for a destination country in the year 2000 and 13244.244 for an origin country in the year 1990.
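The density and mean node degree quoted above follow directly from the number of nodes and links. The short Python check below recomputes them, assuming density is the number of links divided by N(N-1) possible directed links and the mean degree counts both in- and out-degree (2E/N); the variable names are illustrative.

```python
n = 226                                # countries in the CIMN
links_1960, links_2000 = 16485, 23718  # directed links reported in the text

possible = n * (n - 1)                 # possible directed links, no self-loops

density_1960 = links_1960 / possible
density_2000 = links_2000 / possible

# Every link adds one out-degree and one in-degree, hence the factor 2.
mean_degree_1960 = 2 * links_1960 / n
mean_degree_2000 = 2 * links_2000 / n

print(round(density_1960, 3), round(density_2000, 3))
print(round(mean_degree_1960, 1), round(mean_degree_2000, 1))
```

Rounded to the precision used in the text, these reproduce the reported densities (0.324 and 0.466) and mean node degrees (145.9 and 209.9).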
The diameter of a growing random network can differ considerably from, for example, that of a Poisson random network. This growing network has large-degree nodes that emerge and may work as hubs to reduce the overall distance between countries [35]. Given a small, almost constant diameter, a decreasing average path length, and an increasing clustering coefficient [36], we conclude that the CIMN demonstrates "small-world" behaviour [37,38].
Top1 migrant relations are by definition the most important migrant ties for countries. In the Top1D network, each country can have an out-degree of only 1, but countries can have different in-degrees. In this way, the in-degree determines a country's position in the network. Similarly, in a Top1O network, each country can have an in-degree of only 1 but can have several out-degrees, which determine how central a country is. As a result, the number of nodes equals the number of edges. The only exception to this rule is found in the Top1O network, where in the first four decades some countries are not considered as destinations. For example, in the 1960's and 1970's, those countries were Norfolk Island and Taiwan (reducing the number of edges to 224) and in the 1980's and 1990's, those countries were Taiwan and Belize (reducing to 225).
Both top1 networks share the same number of countries (i.e., 226) and show a small density (i.e., 0.0004) caused by the reduced number of ties between countries and a mean node degree maintained at a constant value of 2. Table 2 presents basic descriptive statistics about the top1 destination networks for each decade between 1960 and 2000; the five networks are disconnected graphs with a number of components that varies over time from 14 to 10 and an infinite diameter [39].
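Counting the components of such a disconnected graph can be sketched with a union-find structure that ignores edge direction and weight. This is a hedged illustration, assuming that components are weakly connected components; the helper name and toy edges are hypothetical.

```python
def weak_components(nodes, edges):
    """Group nodes into weakly connected components (edge direction ignored)."""
    parent = {v: v for v in nodes}

    def find(v):
        # Follow parent links to the root, halving the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj  # merge the two groups

    groups = {}
    for v in nodes:
        groups.setdefault(find(v), set()).add(v)
    return list(groups.values())


# Toy top1-style network: every node has exactly one outgoing edge.
toy_nodes = ["A", "B", "C", "D", "E"]
toy_edges = [("A", "B"), ("B", "A"), ("C", "B"), ("D", "E"), ("E", "D")]
toy_components = weak_components(toy_nodes, toy_edges)
```

The toy network has five nodes and five edges yet splits into two components, which is the kind of fragmentation the component counts in the text describe.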
Moreover, with a maximal node in-degree growing throughout the five networks from 37 in 1960 to 60 in 2000, the first migrant receiving country in the world is USA, which presents an increasing immigrant stock from 9274.493 in 1960 to 25270.152 in 2000. The maximal node out-degree, presented in Table 3

Communities vs. Components
In addition to the analysis of migrant stock dynamics, information about the connections of each node can be used to identify the community structure, i.e., the existence of clusters. When determining the number of components, we ignore a key feature of our networks, namely, the edge weights. Therefore, in order to overcome this limitation, we add the weights and then apply the Newman-Girvan modularity algorithm for community detection to explore the community structure. Specifically, we use the igraph package in R [6,40] to detect the communities for each decade between 1960 and 2000 and to investigate the underlying behaviour of the network.
Afterwards, we analyse the dynamics of these communities to understand their evolution over time as well as the possible gradual disappearance of the legacy of old communities in the following decades. The number of communities identified in each network is presented in Table 4. There is a noticeable fluctuation in the number of communities in the top1 networks. For example, in Top1D, the number of communities fluctuates irregularly, starting with 30 in 1960 and ending with 20 in 2000; this implies that globalization makes the architecture of Top1D less fragmented, with modules that are more interconnected. On the contrary, in Top1O the number of communities increases from 15 in 1960 to 23 in 2000, driven by world events as well as increasingly selective immigration laws in developed countries, which tend to generate significantly diversified migrant stocks. Moreover, when we compare the number of communities with the number of components, we notice that considering the weight of the links reveals more about the inside of the network and the way the countries form communities. In both top1 networks, the number of components is smaller than the number of communities, showing again the importance of weight in characterizing the network. For the Top1D networks, the number of components shows a downward trend (from 14 to 10), whereas for the Top1O networks it grows from 13 to 19.

Community evolution
In the last part, we characterize the time evolution of in-degrees for Top1D and out-degrees for Top1O, highlighting the way the nodes interact with each other across the five networks. Table 5 shows the size of the identified communities in terms of the number of nodes. There is an increased level of concentration in terms of the number of countries associated with a reduced number of communities. For example, in 1960, the top ten communities of the Top1D network include 133 countries (that is, 59% of the total number of countries), of which the first five communities are made up of 91 countries. Meanwhile, in the Top1O network, the first ten communities cover an impressive 212 countries (94% of the total number of countries), of which the first five have 160 countries. Furthermore, at the longitudinal level, this situation emphasizes again the role played by international migration in the first phase of the globalization phenomenon, involving an opposite trend regarding the concentration of communities in the two top1 networks. To support this argument, we look at the level of concentration in the first five communities of each top1 network. With an increasing trend (i.e., evolving from 91 countries in 1960 to 139 countries in 2000), the Top1D network shows its highest concentration of nodes in 1980, reaching 186 countries (that is, 82% of the total number of countries). On the contrary, the Top1O network shows a negative trend, evolving from the peak of 160 countries in 1960 (that is, 70.8% of the total number of countries) to 116 countries in 2000. Both networks start in 1960 with around 40 nodes and evolve to 76 in the Top1D network compared with the Top1O network, which evolves to 31 nodes in 2000. We conclude that in terms of the evolution and structure of communities, the situation is complex.
While top1 networks include communities that grow from decade to decade and are able to absorb or dissolve small communities, they also include communities that remain more stable over time. Furthermore, once the community structure is determined, we can investigate the evolution of node degrees as the potential cause of the progressive disappearance of the legacy of old communities in the succeeding decades.
In Table 6, we summarize the degree statistics for both top1 networks. Needless to say, more connected nodes tend to be more central. We notice that only a few countries (an average of 4.86% for Top1D and 3.54% for Top1O) have a high degree (that is, 5 or more). The number of these so-called central countries differs between the two top1 networks. For example, in the Top1D network, this number fluctuates between 13 and 9 compared with the Top1O network, where it fluctuates between 9 and 6. Both top1 networks registered a peak in 1970 with 14 and 10 central countries, respectively. Furthermore, it is interesting that more than half of the total number of countries (around 70% for Top1D and 56% for Top1O) do not share any degrees over time (i.e., Degree = 0). Also, in this case, the evolution over time is different in the two top1 networks. For example, with a positive evolution from 147 to 158 countries in Top1D compared with a negative trend from 134 to 122 countries in Top1O, the Top1D network shows a higher concentration than the Top1O network.
In the final analysis, we identify the 25 most central countries that present the highest number of in- and out-degrees in each top1 network at every decade between 1960 and 2000 (see Table 7). It is interesting to note that among these 25 most central countries, 9 are developed and 16 are developing, and they exhibit different roles in each top1 network. Specifically, the Top1D network has more developed countries and shows an increasing trend over time, while Top1O includes some large developing countries with high populations, like China and India. In the top1 destination networks, all these countries have at least one in-degree in at least 3 of the 5 years. The in-degree shows how many countries have these countries as their topmost destination according to migrant stock. In top1 origin networks, all the countries have more than 1 degree in all of the 5 years. The out-degree shows how many countries have this country as their topmost origin according to migrant stock. As we can see, the countries with higher in-degrees in the Top1D network show a less stable evolution over time than the out-degrees in the Top1O network. Next, we focus our attention on the 5 topmost central countries in top1 networks. The top tier includes 4 of the largest countries with the highest populations (USA, Russian Federation, China, and India) but also West European countries including France, Germany, and the United Kingdom. More precisely, the countries addressed are the most central countries in the top five communities in both top1 networks. To understand the position played by the most central country in the communities, we take the example of the USA. In 1960, the USA brought 37 of the community's 43 nodes. In 1980, the size of the community grew to 88, of which the USA brought 56 nodes, even though it was the main destination for 58 countries. The remaining two countries (i.e., Germany and France) formed their own communities.
The connection with their former colonies was essential for Britain and France; at every decade between 1960 and 2000 this was characterized by a fluctuating evolution in Top1D networks and a relatively stable one in Top1O networks, with a larger number of destination countries for British emigrants. The largest number of British emigrants went to Canada in 1960 and to Australia in the remaining years. Meanwhile, the largest number of French emigrants went to Morocco in 1960, the Democratic Republic of the Congo in 1970, and the USA in 1980, 1990, and 2000. In top1 destination networks, the second most central country is France, with the highest overall in-degree coming from Algeria, whereas the United Kingdom received the highest number of immigrants from India in 1960 and from Ireland in the rest of the decades under consideration. Meanwhile, Germany played an important role mostly in the top1 destination network, receiving the highest number of immigrants from Poland in the first four decades and from Turkey in 2000.
On the other hand, compared with the Western European countries, large-scale migration to the USA developed later due to restrictive legislation enacted in the 1920's [41]. USA's  Asian migration is not new. Even so, the discriminatory rules that some countries applied against Asians [41] have made migration within Asia a more popular trend, which is clearly captured in the Top1O network. In the nineteenth century, China's main destination country in 1960 and 1970 was Indonesia followed by Hong Kong in the remaining years. China's out-degree decreased from 11 countries in 1960 to 8 in 2000, and all of the destination countries during that time were inside Asia. India, which presented a stable evolution during that time, was the topmost origin country for 11 nodes (except in 1980 and 1990, when it was the topmost origin for only 10 nodes). With the highest number of emigrants going to Pakistan in all 5 years, India displayed the shortest path to countries located in southern Asia (i.e., Bhutan, Nepal, Sri Lanka, Bangladesh) but also in the Persian Gulf (i.e., Oman, United Arab Emirates, Saudi Arabia).
The fall of the Berlin Wall in 1989, followed by the collapse of the Soviet Union in 1991 and of the Eastern European socialist states, led to instability in Central Europe and created a threat to Western Europe in terms of migration [41]. Millions of people have moved in and between the successor states of the former Soviet Union, making the Russian Federation the central country of origin for 13 countries in 1960 and 15 in 2000. It is interesting to note that over the five decades, the largest number of immigrants in the Russian Federation were from Ukraine, and the largest number of Russian emigrants had Ukraine as their destination country.
All this helps to infer that the Top1D network changed more than the Top1O network due to world events and increasingly selective immigration policies in developed countries, which led to a higher diversification of migration stocks. This contrasts with the Top1O network, where the communities are not so highly concentrated and present longer paths and more stable groups around both developed and more substantial developing nations. Also, our analysis revealed that only a few countries have a central role in community evolution, with patterns becoming more skewed toward migration from an increasingly diverse array of origin countries concentrating on a shrinking pool of destination countries, which are mostly developed (e.g., USA and United Kingdom) but also include developing countries such as the Russian Federation and India. The dissolution of the USSR into fifteen ethnically based national republics led many Russians, who suddenly became minorities in the new states, to migrate to Russia. The community structure also reveals the relation between the United Kingdom and France and their former colonies all over the world. Concerning the origin of the immigrants, both developed and developing countries were shown to participate in the migration process, with a decreasing evolution during the period of study for Western European countries and the USA and a stable evolution for the countries with large populations, such as Russia, India, and China.
This analysis can be extended in several ways by finding answers to the following questions: Why are top1 selections stable for some countries, and why do some change? What causes stability? Moreover, we can build a model to investigate the factors that influence the trend in top1 networks. Future research should extend the extraction methodology to determine how migrants change their preferences in choosing the country of destination by considering not only the highest link weight (e.g., in top1) but the first two or three (e.g., generating top2 and top3 networks). Furthermore, we can explore the communities to determine the economic and social effects. In addition, the methodology presented in this paper can also be applied to other directed weighted networks, such as the international trade network and the international investment network.