Bow-tie structure and community identification of global supply chain network

We study the topological properties of the global supply chain network in terms of its degree distribution, hierarchical structure, and degree-degree correlation. The global supply chain data were constructed by collecting company data from the website of the Standard & Poor's Capital IQ platform in 2018. The in- and out-degree distributions are characterized by a power law with an in-degree exponent of 2.42 and an out-degree exponent of 2.11. The clustering coefficient decays as a power law with an exponent of 0.46. The nodal degree-degree correlation indicates the absence of assortativity. The bow-tie structure of the giant weakly connected component (GWCC) reveals that the OUT component is the largest, comprising 41.1% of all firms. The giant strongly connected component (GSCC) comprises 16.4% of all firms. We observe that the firms on the upstream or downstream sides are mostly located a few steps away from the GSCC. Furthermore, we uncover the community structure of the network and characterize the communities according to their location and industry classification. We observe that the largest community consists of firms in the consumer discretionary sector, mainly based in the US. These firms belong to the OUT component in the bow-tie structure of the global supply chain network. Finally, we confirm the validity of propositions S1 (short path length), S2 (power-law degree distribution), S3 (high clustering coefficient), S4 ("fit-gets-richer" growth mechanism), S5 (truncation of the power-law degree distribution), and S7 (community structure with overlapping boundaries) in the global supply chain network.


Introduction
National economies are linked by international trade, and economic globalization consequently forms a giant complex economic network whose links, i.e., interactions, are strengthened by increasing trade. Viewed at high resolution, that is, at the microscopic level, this giant economic network is a global supply chain consisting of a huge number of firms. It is also known that various collective motions exist in natural and social phenomena, and that these collective motions arise from strong interactions between constituent elements. Thus, various collective motions are expected to emerge in the globalized world economy under trade liberalization.
Several review papers have been published on the study of supply chain networks. M. A. Bellamy et al. [1] categorized the field into three themes: network structure (i.e., system architecture), network dynamics (i.e., system behavior), and network strategy (i.e., system policy and control). They listed the important factors that characterize a supply chain network. For instance, the factors of network structure are node-level, network-level, and link-level properties. The factors of network dynamics are stimuli, phenomenon, and sustainability. The factors of network strategy are scope, intent, and governance. S. Perera et al. [2] surveyed the methodologies used to model topology and robustness. They pointed out the limitations of the preferential attachment growth model in explaining the characteristics of supply chain networks and stressed the importance of fitness-based growth models [3] for explaining the observed topological characteristics. Notable phenomena on supply chain networks include not only resilience against random failure and targeted attack but also collective motion such as cascading failure or chain bankruptcy. Y. Fujiwara studied chain bankruptcy by analyzing supply chain and bankruptcy data, and Y. Ikeda developed an agent-based model and ran a realistic simulation of the chain bankruptcy caused by the failure of a single firm [4]. K. J. Mizgier et al. [5] studied the dynamics of the default process in a supply chain network using an agent-based model. Based on the simulation, they discussed implications for risk management and policy making. L. Tang et al. [6] developed a theoretical cascading failure model that considers the interdependence of firms in a supply chain network. They observed a sudden collapse of the interdependent supply chain network. T. Mizuno et al. [7] analyzed a large set of global supply chain data. They investigated three different types of networks: a customer-supplier network, a licensee-licensor network, and a strategic alliance network. The degree distributions of all three networks show scale-free properties characterized by a power-law tail. They also observed that all three networks have an average path length of around six. They further studied the community structure of the undirected versions of the networks using the modularity maximization technique [8].
In addition to these studies, E. J. S. Hearnshaw et al. [9] studied the supply chain network using a complex network approach and proposed the following nine propositions:
• S1: Efficient supply chain systems show a short characteristic path length
• S2: The nodal degree distribution of efficient supply chain systems follows a power law as indicated by the presence of hub firms
• S3: Efficient supply chain systems demonstrate a high clustering coefficient
• S4: The growth of efficient supply chain systems follows "fit-gets-richer" mechanism
• S5: The power law degree distribution of efficient supply chain systems is truncated
• S6: The link weight distribution of efficient supply chain systems follows a power law
• S7: Efficient supply chain systems demonstrate a pronounced community structure with overlapping boundaries
• S8: The fitness of hub firms determines the resilience of supply chain systems against both random disturbances and targeted attacks
• S9: Resilient supply chain systems demonstrate a power law distribution for link-weights
The nine propositions are related to path length, power-law degree distribution, clustering coefficient, preferential attachment growth mechanism, truncated power-law connectivity distribution, power-law distribution of node strength, community structure with overlapping boundaries, resilience against random failure and targeted attack, and core-periphery structure, respectively. The authors tried to explain various functions of the supply chain by the structural characteristics of the supply chain network.
In order to understand the globalized world economy and to make effective policy recommendations, it is indispensable to study the global supply chain, international trade, the business cycle, and economic growth by analyzing global big data with the methodology of network science. In this paper, we focus on the topological properties of the global supply chain network. Studying these properties is the first step toward understanding the globalized world economy from a microscopic view. We study the degree distribution, the hierarchical structure, and the degree-degree correlation of the global supply chain network. We uncover the community structure of the network using the map equation method and characterize the communities according to their location and industry classification. Furthermore, we analyze the composition of the communities in terms of the bow-tie components. Finally, we investigate the validity of the nine propositions on the supply chain network [9] based on the results obtained for the topological properties of the global supply chain network.
Our paper is organized as follows. In section Data, we briefly describe the global supply chain network data used in this study. The data were collected from the Standard & Poor's Capital IQ platform in 2018. In section Methods, we explain the methodologies for the identification of the bow-tie structure, the community detection, and the over-expression of bow-tie components. In section Results, we present the results of the analysis of the global supply chain network data, namely the basic structural properties, the bow-tie structure, the community structure, and the over-expression of bow-tie components, using figures and tables. We then investigate the validity of the nine propositions based on the results obtained for the topological properties of the global supply chain network. Section Conclusions concludes the paper.

Data
The global supply chain data were constructed by collecting firm data from the website of the Standard & Poor's (S&P) Capital IQ platform in 2018. As node information, the data include firm ID, firm name, country and location of the firm, primary industry, and sector. The industrial classification is based on the Global Industry Classification Standard (GICS), developed by Morgan Stanley Capital International and S&P. The data cover 206 countries as firm locations, 11 sectors, and 158 primary industries, as listed in Tables S1-S3 of Appendix S1.
The data also include the type of business relationship between supplier and customer as link information. The business relationships grouped under suppliers are supplier, creditor, franchisor, licensor, landlord, lessor, auditor, transfer agent, investor relations firm, and vendor, but the majority of the relationships are of the supplier and creditor types. Here, a supplier is a firm that provides products or services, and a creditor is a private, public, or institutional entity that makes funds available for others to borrow.
Table 1 summarizes the types of business relationship for all firms. We note that the links in the data set are dominated by supply chain relationships. Table 2 summarizes the types of supplier for all firms. We note that the suppliers are dominated by private and public firms. Therefore, the data set as a whole reflects the nature of the global supply chain network.
The total numbers of firms and directed links are 437,453 and 948,247, respectively. The number of firms and the total revenue of firms for each country are listed in Table S1 of Appendix S1. The distribution of firms across sectors is listed in Table S2 of Appendix S1. The aggregated revenue is compared with the Gross Domestic Product (GDP) of each country in Fig 1. This comparison provides evidence of the good coverage of our global supply chain data. The GDP data were collected from https://data.worldbank.org/, which is in the public domain.

Identification of bow-tie structure
The bow-tie structure [10] is uncovered from the GWCC based on the flow of goods and services (money flows in the opposite direction) along the directed links. The different regions of the bow-tie structure are defined as follows:
• The giant strongly connected component (GSCC): The largest region in which any two nodes are mutually reachable through directed paths.
• IN components: The nodes from which GSCC is reachable through directed paths.
• OUT components: The nodes that are reachable from the GSCC through directed paths.
• Tendrils (TE): The rest of the nodes in the GWCC.
We use a breadth-first search algorithm to detect the different components of the bow-tie structure.
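The definitions above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in this study: the function names `reachable` and `bow_tie` are hypothetical, the input is assumed to be the edge list of the GWCC, and the strongly connected components are found by intersecting forward and backward breadth-first searches, which is simple but would be too slow for a network of this size (a linear-time algorithm such as Tarjan's would be preferable there).

```python
from collections import deque


def reachable(adj, sources):
    """Breadth-first search: nodes reachable from `sources` along `adj`."""
    seen = set(sources)
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def bow_tie(edges):
    """Split the nodes of a directed graph into GSCC, IN, OUT and TE.

    Assumption: `edges` is an iterable of (supplier, customer) pairs
    describing a weakly connected graph (the GWCC).
    """
    succ, pred, nodes = {}, {}, set()
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
        pred.setdefault(v, set()).add(u)
        nodes.update((u, v))
    if not nodes:
        return {"GSCC": set(), "IN": set(), "OUT": set(), "TE": set()}

    # The strongly connected component of a seed node is the intersection
    # of its forward-reachable and backward-reachable sets; keep the largest.
    gscc, unassigned = set(), set(nodes)
    while unassigned:
        seed = next(iter(unassigned))
        scc = reachable(succ, [seed]) & reachable(pred, [seed])
        unassigned -= scc
        if len(scc) > len(gscc):
            gscc = scc

    seed = next(iter(gscc))
    out_part = reachable(succ, [seed]) - gscc  # reachable from the GSCC
    in_part = reachable(pred, [seed]) - gscc   # nodes that can reach the GSCC
    tendrils = nodes - gscc - in_part - out_part
    return {"GSCC": gscc, "IN": in_part, "OUT": out_part, "TE": tendrils}
```

For example, for the toy edge list `[("a", "b"), ("b", "c"), ("c", "a"), ("i", "a"), ("c", "o"), ("i", "t")]`, the cycle `a`, `b`, `c` forms the GSCC, `i` is in IN, `o` is in OUT, and `t` is a tendril.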

Community detection
Empirical networks are generally non-homogeneous, with a high local link density. Community detection captures highly connected groups of nodes as modules and thus provides a coarse-grained description of very large networks. Modularity maximization [11] is one of the popular methods for detecting communities. In this method, one maximizes the modularity index, which is defined as the fraction of intra-community links minus the fraction expected if the links were distributed at random. However, this method suffers from the resolution limit problem [12] when applied to large networks: modularity optimization fails to detect well-defined small communities in large networks. Moreover, this technique yields similar partitions for the undirected and directed versions of a network, so it cannot capture the dynamic behaviour of the network. The map equation method [13], in contrast, detects communities using the flow dynamics on the network. We use the map equation method for our analysis because the supply chain is a directed network of suppliers and customers in which a link represents a flow of goods. This method is one of the best-performing community detection techniques [14]. It minimizes the per-step average description length $L(C)$ of a random walker on the network, defined as

$L(C) = q\,H(C) + \sum_i p_i\,H(P_i)$ (1)

Here $q$ and $H(C)$ are the probability and the Shannon entropy of the inter-community movement of the random walker, respectively, $p_i$ is the probability that the random walker leaves the node $i$, and $H(P_i)$ is the entropy of the intra-community movement.

Over-expression of node attributes within communities
Communities are ubiquitous in empirical networks. They form on the basis of similarities in some attributes of the nodes. For example, locations and sectors are key attributes for the formation of communities in the Japanese supply chain network [15,16], while in protein-protein interaction networks biological functions form the basis of the community structure [17].
To measure the over-expression of attributes in a community, we follow the method of Tumminello et al. [18]. In this method, the probability that $X$ randomly selected nodes in a community $C$ of size $N_C$ have the attribute $A$ is given by the hypergeometric distribution

$H(X \mid N, N_C, N_A) = \binom{N_C}{X}\binom{N - N_C}{N_A - X} \Big/ \binom{N}{N_A}$

where $N$ is the total number of nodes in the network and $N_A$ is the total number of nodes in the network with attribute $A$. The p-value $p(N_{C,A})$ for the $N_{C,A}$ nodes with attribute $A$ in the community $C$ is obtained from

$p(N_{C,A}) = 1 - \sum_{X=0}^{N_{C,A}-1} H(X \mid N, N_C, N_A)$

The attribute $A$ is over-expressed when $p(N_{C,A})$ is lower than a threshold value $p_c$. As this is a multiple-hypothesis test, $p_c$ has to be chosen appropriately to exclude false positives. We set $p_c = 0.01/N_A$ as in [18], which accounts for the Bonferroni correction [19]. In this threshold, $N_A$ denotes the total number of distinct attributes over all the nodes of the network, not the number of nodes with attribute $A$ used above.
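As a rough illustration of this test, the sketch below evaluates the hypergeometric p-value for one community and one attribute using only the Python standard library. The function and parameter names are hypothetical, and `n_total` (the total number of nodes in the network) is an input that the hypergeometric distribution requires. The upper tail is summed directly, which equals one minus the cumulative sum up to N_{C,A} - 1 but avoids the loss of precision that the subtraction causes for very small p-values.

```python
from math import comb


def hypergeom_pmf(x, n_total, n_comm, n_attr):
    """Probability that exactly x of the n_comm community nodes carry the
    attribute, given n_attr carriers among n_total nodes in the network."""
    if x < 0 or x > n_comm or x > n_attr or n_attr - x > n_total - n_comm:
        return 0.0
    return comb(n_comm, x) * comb(n_total - n_comm, n_attr - x) / comb(n_total, n_attr)


def overexpression_p_value(n_ca, n_total, n_comm, n_attr):
    """Probability of finding at least n_ca carriers in the community by chance."""
    upper = min(n_comm, n_attr)
    return sum(hypergeom_pmf(x, n_total, n_comm, n_attr) for x in range(n_ca, upper + 1))


def is_overexpressed(n_ca, n_total, n_comm, n_attr, n_distinct_attrs, alpha=0.01):
    """Bonferroni-corrected decision with threshold alpha / n_distinct_attrs."""
    return overexpression_p_value(n_ca, n_total, n_comm, n_attr) < alpha / n_distinct_attrs
```

For instance, in a toy network of 10 nodes in which 5 nodes carry the attribute, a community of 5 nodes that contains all 5 carriers has a p-value of 1/252, the probability of drawing exactly that set by chance.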

Basic structural properties
As the supply chain network is directed, one can define in- and out-degrees for the nodes. The nodal in-degree is the number of incoming links to a node, and the out-degree is the number of outgoing links from that node. We observe that the probability density distributions of both the nodal in- and out-degrees are heavy-tailed, with the tail characterized by a power law of the form $P(k_{\mathrm{in/out}}) \sim k^{-\gamma_{\mathrm{in/out}}}$ with $\gamma_{\mathrm{in}} = 2.42$ and $\gamma_{\mathrm{out}} = 2.11$, as shown in Fig. 2 (a-b). A power-law tail of the degree distribution has also been observed in past investigations of empirical supply chain network data [7,20-22]. The degree distribution plays a pivotal role in shock propagation among nodes. A highly asymmetric degree distribution can result in system-wide aggregate fluctuations due to idiosyncratic shocks to large firms [23]. It has been argued in the literature that such a heavy-tailed distribution of nodal degrees arises from the rich-get-richer mechanism [24,25]. Consistent with the rich-get-richer principle, large firms here have more customers and suppliers than small firms.

Fig 2. Probability density distributions $P$ of (a) the nodal in-degrees $k_{\mathrm{in}}$ and (b) the nodal out-degrees $k_{\mathrm{out}}$. Variation of (c) the clustering coefficient $C(k)$ as a function of degree $k$ and (d) the average nearest-neighbor degree $k_{nn}(k)$ as a function of degree $k$. Logarithmic binning of the horizontal axis is used in (a) and (b). Red lines represent the best power-law fit to the data. Blue lines in (c) and (d) represent the results for the degree-preserved random network, averaged over 100 such uncorrelated networks.
The clustering coefficient, a measure of three-point correlation, reflects the cliquishness among the neighbours of a node. For most real-world networks, the average clustering coefficient is a decaying function of degree of the form $C(k) \sim k^{-\beta_k}$ with $\beta_k \le 1.0$. We observe that the clustering coefficient in the supply chain network decays with an exponent $\beta_k = 0.46$, as shown in Fig. 2 (c), which indicates the presence of a hierarchical structure.
The average degree of the neighbors of a node $i$, which captures the nodal degree-degree correlation, is defined as $k_{nn,i} = \sum_j k_j / k_i$, where $j$ runs over all $k_i$ neighbours of $i$. For the nodes with degree $k$, $k_{nn}(k) = \sum_{i:\,k_i = k} k_{nn,i} / N_k = \sum_{k_1} k_1 P(k_1 \mid k)$, where $N_k$ is the number of nodes with degree $k$. $k_{nn}(k)$ increases with $k$ for an assortative network and decreases for a disassortative network. In the absence of nodal degree-degree correlation, $k_{nn}(k)$ remains constant. As can be seen from Fig. 2 (d), $k_{nn}(k)$ remains more or less constant with $k$, indicating the absence of nodal degree-degree correlation. Further, the statistical significance of these results is tested by comparing them with the results for the randomized degree-preserving network [26]. The clustering coefficient of the randomized network shows $C(k) \sim \text{constant}$, as expected. The variation of $k_{nn}(k)$ with $k$ closely matches that of the degree-preserving randomized network, which further supports the absence of nodal degree-degree correlation in the empirical network.
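The quantity k_nn(k) can be computed directly from an edge list, as in the following minimal sketch. It assumes that the network is treated as undirected and that a node's degree is its number of distinct neighbours, which the text does not specify; the function name is hypothetical.

```python
from collections import defaultdict


def average_neighbor_degree(edges):
    """Return {k: k_nn(k)} for an undirected edge list.

    Assumption: direction is ignored and parallel links are merged,
    so the degree of a node is its number of distinct neighbours.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbors[u].add(v)
            neighbors[v].add(u)
    degree = {node: len(nbrs) for node, nbrs in neighbors.items()}

    sums = defaultdict(float)
    counts = defaultdict(int)
    for node, nbrs in neighbors.items():
        # k_nn,i: mean degree of the neighbours of node i
        k_nn_i = sum(degree[j] for j in nbrs) / degree[node]
        sums[degree[node]] += k_nn_i
        counts[degree[node]] += 1
    # k_nn(k): average of k_nn,i over the N_k nodes with degree k
    return {k: sums[k] / counts[k] for k in sums}
```

On a star with one hub and three leaves, the leaves (degree 1) see an average neighbour degree of 3 and the hub (degree 3) sees 1, so k_nn(k) decreases with k, which is the signature of a disassortative network.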
We study the connected components of the network when it is viewed as an undirected network. The largest connected component of the network is known as the giant weakly connected component (GWCC). As can be seen from Fig. 3, the network consists of a very large GWCC with N = 407,527 nodes and L = 927,316 links. Using a breadth-first search, we compute the shortest paths between all pairs of nodes in the GWCC and obtain an average path length of 5.370, which reflects the small-world nature of the global supply chain network. While the GWCC contains 93.16% of the nodes of the network, the remaining components are very small. In the subsequent sections, we investigate only the GWCC of the network.
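A breadth-first search from every node yields the average shortest-path length, as sketched below. This is an illustration under stated assumptions rather than the code used for the analysis: links are treated as undirected, the function name is hypothetical, and the all-pairs computation costs on the order of N(N + L) operations, so sampling of source nodes may be needed for very large graphs.

```python
from collections import deque


def average_path_length(edges):
    """Mean shortest-path length over all ordered pairs of distinct,
    mutually reachable nodes of an undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    if pairs == 0:
        raise ValueError("graph has no pair of connected nodes")
    return total / pairs
```

For the three-node path a-b-c, the pairwise distances are 1, 1, and 2, so the function returns 4/3.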

Bow-tie structure
We detect the bow-tie components in the GWCC of the global supply chain network. The number of firms in each component is shown in Table 3. The OUT component, which consists of the nodes reachable from the GSCC through directed paths toward the downstream side, is the largest and comprises 41.1% of all firms. The GSCC (in which any two nodes are mutually reachable through directed paths), IN (the nodes from which the GSCC is reachable through directed paths, on the upstream side), and TE (the rest of the nodes in the GWCC) are approximately similar in size, and the GSCC comprises 16.4% of all firms. In the Japanese supply chain network [15], the GSCC occupies half of the system, meaning that most firms are interconnected by small geodesic distances, or shortest-path lengths, in the economy. This contrasts with the global supply chain network observed in our study. However, by examining the shortest-path lengths from the GSCC to IN and OUT shown in Table 4, one can observe that the firms on the upstream or downstream sides are mostly located a few steps away from the GSCC. This feature of the economic network differs from the bow-tie structure of many other complex networks [27].

Community structure
Communities are detected in the largest weakly connected component of the network. We employ the map equation method [13] to uncover the communities in the GWCC of the global supply chain network. The detected communities vary in size. The probability density distributions $D(s)$ of community sizes $s$ for the empirical network and its degree-preserving randomized network are shown in Fig. 4 (a). The distribution for the empirical network is wider than that for the randomized network.
The bias in the direction of flow between a pair of communities is measured by the polarization ratio $P_{ij} = |w_{ij} - w_{ji}| / (w_{ij} + w_{ji})$, where $w_{ij}$ is the total number of links from the $i$-th community to the $j$-th community. $P_{ij} = 1$ if the flow is totally biased from one community to the other, and $P_{ij} = 0$ if the flow is evenly balanced between the communities. The total flow between a pair of communities is $L_{ij} = w_{ij} + w_{ji}$. Under the null hypothesis that there is no bias in the flow direction between any pair of communities, the values of $P_{ij}$ fluctuate around 0 with the standard deviation $\sigma = 1/\sqrt{L_{ij}}$. As can be seen from Fig. 4 (b), most of the values of the polarization ratio $P_{ij}$ are significantly higher than the $2\sigma$ level, which is indicated by the dashed curve.
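The polarization test for a single pair of communities can be written as a short function. The sketch assumes that the null-hypothesis standard deviation is 1/sqrt(L_ij), which is what a model with each link pointing in either direction with probability 1/2 gives; the function name and the returned tuple layout are hypothetical.

```python
from math import sqrt


def polarization(w_ij, w_ji):
    """Return (P_ij, sigma, exceeds_2_sigma) for a pair of communities.

    w_ij, w_ji: number of links from community i to j and from j to i.
    """
    total = w_ij + w_ji  # total flow L_ij
    if total == 0:
        raise ValueError("no links between the two communities")
    ratio = abs(w_ij - w_ji) / total
    sigma = 1 / sqrt(total)  # spread expected with no directional bias
    return ratio, sigma, ratio > 2 * sigma
```

With 90 links in one direction and 10 in the other, the ratio is 0.8 against a 2σ level of 0.2, so the flow is significantly polarized; with 52 against 48 links the ratio of 0.04 is well within the noise.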

Overexpression within communities
We study the significant overexpression of different attributes, namely primary industry, sector, firm location, and bow-tie component, within the communities. The detailed overexpression results for the 10 largest communities are shown in Table 5. Several interesting features can be observed from these results. The largest community consists of firms in the consumer discretionary sector based in the US. Further analysis shows that these are private firms, mainly from automotive retail, which belong to the OUT component in the bow-tie structure of the global supply chain network. In the second largest community, we observe firms of the consumer discretionary sector based in China, the UK, France, Germany, Japan, Malaysia, and New Zealand. These firms belong to the IN component of the bow-tie structure. The firms of the third largest community are from the consumer discretionary, industrials, and materials sectors and are mainly based in Japan, China, and Thailand. These firms mostly belong to the TE component of the bow-tie structure. To show the inter-relations between countries, we construct a weighted and undirected network of countries from their overexpression in communities of size larger than 100. A link of weight 1 is placed between two countries if they are over-expressed simultaneously within a community. Furthermore, we visualize the community structure of this network in Fig 5. It shows that each community is formed by geographically close countries.
Similarly, we also construct a weighted undirected network of over-expressed primary industries, in which a link of weight w is placed between two primary industries if they are over-expressed simultaneously in w communities. As can be seen from Fig A and Fig S1 of Appendix S1, the clusters among primary industries are formed according to their sector classification.
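The construction of such a co-overexpression network reduces to counting, for every pair of attributes, the number of communities in which both are over-expressed. A minimal sketch follows, assuming the over-expressed attributes of each community are already available as lists; the function name and the industry labels in the example are purely illustrative.

```python
from collections import Counter
from itertools import combinations


def co_overexpression_weights(overexpressed_by_community):
    """Map each unordered attribute pair to the number of communities
    in which both attributes are over-expressed (the link weight w)."""
    weights = Counter()
    for attributes in overexpressed_by_community:
        # sorted() fixes the orientation of each undirected pair
        for a, b in combinations(sorted(set(attributes)), 2):
            weights[(a, b)] += 1
    return weights


# Illustrative input: one list of over-expressed attributes per community.
example = [["Steel", "Autos"], ["Autos", "Steel", "Retail"], ["Retail"]]
links = co_overexpression_weights(example)
```

In this example the pair ("Autos", "Steel") receives weight 2 because both labels appear together in two communities, while the pairs involving "Retail" receive weight 1.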
We show the frequency of over-expression of the different bow-tie components within the communities in Fig. 7. Here, we select the communities with at least 10 firms. Communities are most often composed of a combination of the GSCC and IN (the G-I component), which is also observed in the Japanese supply chain network [15]. This indicates that the flow of goods in the supply chain network is more often confined within the GSCC and IN components than within any other combination of bow-tie components. Surprisingly, a large fraction of the communities are located in the TE component. By the definition of the TE component, the firms in these communities neither supply products and services to the GSCC nor procure them from it.

Discussion on the nine propositions
E. J. S. Hearnshaw et al. [9] studied the supply chain network using a complex network approach and proposed the nine propositions. In this section, we investigate the validity of the nine propositions based on the results obtained for the topological properties of the global supply chain network.

Proposition S1
S1: Efficient supply chain systems demonstrate a short characteristic path length
The average path length in the GWCC of the global supply chain was found to be 5.370. The average path length in a small-world network, $L_s$, is known to be similar to the average path length in a random graph, $L_s \sim L_r$. The latter is approximately $L_r = \log N / \log \langle k \rangle = 6.77$. Here, the number of nodes in the GWCC is $N = 407{,}527$ and the average degree is $\langle k \rangle = (\langle k_{\mathrm{in}} \rangle + \langle k_{\mathrm{out}} \rangle)/2 = 6.74$. Assuming that the degree distributions follow a power law over the entire range of the degree with $\gamma_{\mathrm{in}} = 2.42$ and $\gamma_{\mathrm{out}} = 2.11$, the average in-degree and out-degree are $\langle k_{\mathrm{in}} \rangle = k^{\min}_{\mathrm{in}} (\gamma_{\mathrm{in}} - 1)/(\gamma_{\mathrm{in}} - 2) = 3.38$ and $\langle k_{\mathrm{out}} \rangle = k^{\min}_{\mathrm{out}} (\gamma_{\mathrm{out}} - 1)/(\gamma_{\mathrm{out}} - 2) = 10.1$, where $k^{\min}_{\mathrm{in}} = 1$ and $k^{\min}_{\mathrm{out}} = 1$. The estimated value $L_r = 6.77$ is close to the observed value 5.370, which reflects the small-world nature of the global supply chain network. Therefore, the estimate of the average path length validates proposition S1.
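The arithmetic behind this estimate can be reproduced with a few lines of Python. The sketch simply evaluates the formulas quoted above with the reported exponents and node count; the function and variable names are hypothetical, and the mean-degree formula is the one for a pure power law extending over the whole degree range.

```python
from math import log


def mean_degree_power_law(gamma, k_min=1.0):
    """Mean of a pure power law with exponent gamma and lower cutoff k_min."""
    if gamma <= 2:
        raise ValueError("the mean diverges for gamma <= 2")
    return k_min * (gamma - 1) / (gamma - 2)


N_GWCC = 407_527                          # nodes in the GWCC
k_in = mean_degree_power_law(2.42)        # about 3.38
k_out = mean_degree_power_law(2.11)       # about 10.09
k_avg = (k_in + k_out) / 2                # about 6.74
l_random = log(N_GWCC) / log(k_avg)       # about 6.77
```

The ratio of logarithms does not depend on the base, so natural logarithms are used here.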

Proposition S2
S2: The nodal degree distribution of efficient supply chain systems follows a power law as indicated by the presence of hub firms
We observe that the probability density distributions of both the nodal in- and out-degrees are heavy-tailed, with the tail characterized by a power law of the form $P(k_{\mathrm{in/out}}) \sim k^{-\gamma_{\mathrm{in/out}}}$ with $\gamma_{\mathrm{in}} = 2.42$ and $\gamma_{\mathrm{out}} = 2.11$, as shown in Fig. 2 (a-b). A network whose degree distribution is characterized by a power law possesses hub firms. The hub firms are known as channel leader firms, which are said to control performance and provide system-wide coordination of the supply chain [28,29]. The channel leader firms can exert their influence and provide opportunities and motivation for other firms to align themselves with their own specific objectives [30]. Table S3 of Appendix S1. The power-law distributions characterized by $\gamma_{\mathrm{in}} = 2.42$ and $\gamma_{\mathrm{out}} = 2.11$ validate proposition S2.

Proposition S3
S3: Efficient supply chain systems demonstrate a high clustering coefficient
The clustering coefficient, a measure of three-point correlation, reflects the cliquishness among the neighbours of a node. For most real-world networks, the average clustering coefficient is a decaying function of degree of the form $C(k) \sim k^{-\beta_k}$ with $\beta_k \le 1.0$. We observe that the clustering coefficient in the supply chain network decays with an exponent $\beta_k = 0.46$, as shown in Fig. 2 (c), which indicates the presence of a hierarchical structure. The observed moderate clustering coefficient indicates that proposition S3 (a high clustering coefficient) is only weakly valid.

Proposition S4
S4: The growth of efficient supply chain systems follows "fit-gets-richer" mechanism
It has been argued in the literature that the heavy-tailed distribution of nodal degrees arises from the rich-get-richer mechanism [24,25]. Consistent with the rich-get-richer principle, large firms here have more customers and suppliers than small firms. Preferential attachment in the rich-get-richer mechanism assumes that the acquisition of new links by a firm is determined solely by the number of its existing links. Under this assumption, the number of links of a firm is proportional to the time it has spent in the supply chain. However, one often observes that older firms are outstripped by new entrants. There is therefore a need to include the "fitness" of the firms to account for new entrants that can quickly dominate supply chains. In the "fit-gets-richer" mechanism [31], fitter nodes acquire links at a greater rate, and the resulting network possesses a scale-free property. The heavy-tailed distribution of nodal degrees and the overtaking of older firms by new entrants validate proposition S4.

Proposition S5
S5: The power law degree distribution of efficient supply chain system is truncated
The power-law distributions $P(k_{\mathrm{in/out}}) \sim k^{-\gamma_{\mathrm{in/out}}}$ with $\gamma_{\mathrm{in}} = 2.42$ and $\gamma_{\mathrm{out}} = 2.11$ are observed in the middle region of the distributions, as shown in Fig. 2 (a-b). The tail regions of both distributions appear to be truncated or exponentially cut off. This tendency is especially evident for $P(k_{\mathrm{out}})$. Four reasons have been given for this phenomenon [9]. First, the finite size of marketplaces generates a truncated power-law degree distribution. Second, practical constraints in the operation of firms limit their ability to form and maintain exchange relationships indefinitely. Third, when new links are to be formed with a hub firm, incomplete information generates uncertainty, whose cost might exceed the transaction costs. If these costs are unacceptable, the firm will abandon the deal with the hub firm. Finally, the aging and depreciation of firms limit their growth. The observed truncation or cut-off in the tail region of the degree distribution validates proposition S5.

Proposition S7
S7: Efficient supply chain systems demonstrate a pronounced community structure with overlapping boundaries
We employ the map equation method [13] to uncover the communities in the GWCC of the global supply chain network. The detected communities vary in size. The probability density distributions $D(s)$ of community sizes $s$ for the empirical network and its degree-preserving randomized network are shown in Fig 4 (a). The distribution for the empirical network is wider than that for the randomized network. Table 5 shows the over-expression of sectors and countries in the ten largest communities. Communities in a supply chain are bound together in clusters predominantly connected by horizontal relationships among firms with similar interests and functions. However, we empirically observe that not all firms within a community are entirely cooperative, as shown in Table 5. Therefore, the communities formed in the supply chain possess overlapping boundaries. These results validate proposition S7.

Remaining Propositions
The supply chain data have no weights on the links. Therefore, the following two propositions are not applicable to the analyses of this paper:
• S6: The link weight distribution of efficient supply chain systems follows a power law
• S9: Resilient supply chain systems demonstrate a power law distribution for link-weights
In addition, we concentrated on the topological properties of the supply chain network, and therefore the proposition on the resilience of the system is outside the scope of our current study:
• S8: The fitness of hub firms determines the resilience of supply chain systems against both random disturbances and targeted attacks

Conclusions
We studied the topological properties of the global supply chain network in terms of its degree distribution, hierarchical structure, and degree-degree correlation. The global supply chain data were constructed by collecting company data from the website of the Standard & Poor's Capital IQ platform in 2018. The total numbers of firms and directed links in our data were 437,453 and 948,247, respectively.
The degree distributions are characterized by a power law of the form $P(k_{\mathrm{in/out}}) \sim k^{-\gamma_{\mathrm{in/out}}}$ with $\gamma_{\mathrm{in}} = 2.42$ and $\gamma_{\mathrm{out}} = 2.11$. The clustering coefficient decays as $C(k) \sim k^{-\beta_k}$ with an exponent $\beta_k = 0.46$. This indicates the presence of a hierarchical structure in the supply chain network. We observed that $k_{nn}(k)$ remains more or less constant with $k$, indicating the absence of nodal degree-degree correlation. The bow-tie structure of the GWCC revealed that the OUT component is the largest, comprising 41.1% of all firms. The GSCC comprises 16.4% of all firms. We observed that the firms on the upstream or downstream sides are mostly located a few steps away from the GSCC.
Furthermore, we uncovered the community structure of the network using the map equation method and characterized the communities according to their location and industry classification. We observed that the largest community consists of private firms, mainly from automotive retail, based in the US. These firms belong to the OUT component in the bow-tie structure of the global supply chain network. This indicates that retail firms generally belong to the OUT component of the bow-tie structure.
Finally, we investigated the validity of the nine propositions on the supply chain network based on the obtained results on the topological properties. We confirmed the validity of propositions S1 (short path length), S2 (power-law degree distribution), S3 (high clustering coefficient), S4 ("fit-gets-richer" growth mechanism), S5 (truncation of power-law degree distribution), and S7 (community structure with overlapping boundaries) in the global supply chain network. However, the propositions related to link weight and resilient nature of the network were not confirmed due to the limitation of our data and the scope of our current study. This will be left for future study.
Our study provides a detailed topological characterization of the global supply chain network. These topological properties are of utmost importance for understanding the dynamics of international trade. It is well known that community structure plays an important role in spreading phenomena. Our characterization of the community structure will be helpful for understanding widespread economic crises. The study further shows the inter-relationships among the countries and among the industrial sectors.

Fig A. A different color code of the nodes for the overexpression network of primary industries, which is shown in Fig 6 in the main text. Here the nodes are colored according to their sector classification. From Fig 6 of the main text and Fig A, we observe that the clusters among primary industries are formed according to their sectors.