Global Value Trees

The fragmentation of production across countries has become an important feature of globalization in recent decades and is often conceptualized by the term “global value chains” (GVCs). When empirically investigating the GVCs, previous studies have mainly been interested in how global the GVCs are rather than what the GVCs look like. From a complex networks perspective, we use the World Input-Output Database (WIOD) to study the evolution of the global production system. We find that the industry-level GVCs are indeed not chain-like but are better characterized by a tree topology. Hence, we compute the global value trees (GVTs) for all the industries available in the WIOD. Moreover, we compute an industry importance measure based on the GVTs and compare it with other network centrality measures. Finally, we discuss some future applications of the GVTs.

Section III explores some basic properties of the GVTs, proposes an industry importance measure based on the GVTs, and compares it with other network centrality measures. Finally, Section IV discusses some future applications of the GVTs and concludes the paper.

II. METHODS
The complex networks approach has been widely used in economics and finance in recent years [17][18][19][20][21][22][23][24]. Designed to keep track of the inter-industrial relationships, the input-output system is an ideal test bed for network science. In particular, the global MRIO system can be viewed as an interdependent complex network, where the nodes are the individual industries in different countries and the edges are the input-output relationships between industries [24]. This paper takes one step further and uses the WIOD database to construct the global value networks (GVNs), where the nodes are the individual industries in different countries and the edges are the value-added contribution relationships (The call for a network analysis of the GVCs has existed for years [25][26][27][28].). Moreover, based on the GVNs, the global value trees (GVTs) can be computed in a straightforward manner.

A. Data Description
We use the World Input-Output Database (WIOD) [16]. (The WIOD tables are valued at basic prices, which are also called the producers' prices and represent the amount receivable by the producers. An alternative is the purchasers' prices, which represent the amount paid by the purchasers and often include trade and transport margins. The former is preferred by the WIOD because it better reflects the cost structures underlying the industries [16].) Table 1 shows an example of a global MRIO table with two economies and two industries. The 4 × 4 inter-industry table is called the transactions matrix and is often denoted by Z. The rows of Z record the distribution of the industry outputs throughout the two economies while the columns of Z record the composition of inputs required by each industry. Notice that in this example all the industries buy inputs from themselves, which is often observed in real data. Besides intermediate industry use, the remaining outputs are absorbed by the additional columns of final demand, which includes household consumption, government expenditure, and so forth (In Table 1 we only show the aggregated final demand for the two economies.). Similarly, production necessitates not only inter-industry transactions but also labor, management, depreciation of capital, and taxes, which are summarized as the additional row of value-added. The final demand matrix is often denoted by F and the value-added vector is often denoted by v. Finally, the last row and the last column record the total industry outputs, whose vector is denoted by x.
Insert Table 1 here.

B. Construct the Global Value Networks
If we use i to denote a summation vector of conformable size, i.e., a vector of all 1's with the length conformable to the multiplying matrix, and let Fi = f , we then have Zi + f = x.
Furthermore, dividing each column of Z by its corresponding total output in x gives the so-called technical coefficients matrix A (The ratios are called technical coefficients because they represent the technologies employed by the industries to transform inputs into outputs.). Replacing Zi with Ax, we rewrite the above equation as Ax + f = x. It can be rearranged as (I − A)x = f. Then we can solve x as follows:
$$x = (I - A)^{-1} f,$$
where the matrix (I − A)^{-1} is often denoted by L and is called the Leontief inverse [29,30].
Dividing each element of v by its corresponding total output in x gives the value-added share vector, which we denote by u. Moreover, if we use û to denote a diagonal matrix with u on its diagonal, then the value-added contribution matrix can be computed as follows:
$$G = \hat{u} L,$$
where G is the value-added contribution matrix and its element 0 ≤ G_ij ≤ 1 is industry i's share of the value-added contribution in industry j's final demand, f_j.
Finally, the GVNs can be constructed by using G as the adjacency matrix. Notice that the GVNs are both directed and weighted (We don't consider the self-loops so that we replace the diagonal of G with zeros. Meanwhile, we don't consider the rest of the world (RoW) and focus our attention on the 40 countries available in the WIOD.).
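The construction described in this subsection can be illustrated numerically. The following is only a minimal sketch, assuming NumPy is available: the numbers in `Z` and `f` are made up for a two-economy, two-industry example and are not WIOD data, the function name `value_added_contribution` is hypothetical, value-added is assumed to be the residual of total output over intermediate inputs, and the value-added contribution matrix is assumed to take the form G = ûL.

```python
import numpy as np

def value_added_contribution(Z, f):
    """Return (x, A, L, G) for a transactions matrix Z and a final demand vector f."""
    Z = np.asarray(Z, dtype=float)
    f = np.asarray(f, dtype=float)
    x = Z.sum(axis=1) + f                    # total output: Zi + f = x
    A = Z / x                                # column j of Z divided by x_j (broadcasting)
    L = np.linalg.inv(np.eye(len(x)) - A)    # Leontief inverse (I - A)^{-1}
    v = x - Z.sum(axis=0)                    # assumption: value-added is the residual
    u = v / x                                # value-added share vector
    G = np.diag(u) @ L                       # assumed form: G = u-hat L
    return x, A, L, G

# Illustrative toy numbers (hypothetical, not taken from the WIOD).
Z = [[10, 5, 2, 1],
     [4, 8, 1, 3],
     [3, 1, 12, 6],
     [2, 2, 5, 9]]
f = [30, 20, 40, 25]

x, A, L, G = value_added_contribution(Z, f)

# Adjacency matrix of the network: drop the self-loops.
W = G.copy()
np.fill_diagonal(W, 0.0)
```

Under these assumptions each column of `G` sums to one, which is consistent with reading G_ij as a share of industry j's final demand.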

C. Compute the Global Value Trees
Based on the GVNs, the GVTs can be obtained by a modified breadth-first search algorithm. First, we choose an industry as the root of the GVT and the tree grows as we add the most relevant industries to the root industry in terms of the value-added contribution.
Second, since the GVNs are almost completely connected (This is a general feature of the input-output networks due to the aggregated industry classification [24].), we search the GVTs based on a threshold of the edge weight, which we denote by α, in order to separate the most relevant industries from the less relevant ones. Third, we limit the breadth-first search to a fixed number of rounds, which we denote by γ. Again, this is to ensure that only the most relevant industries with respect to the root industry are included in the GVTs.
Our benchmark GVTs are based on α = 0.01 and γ = 3. The tree topology requires that γ ≥ 2 because the tree would reduce to a star topology if γ = 1. We choose γ = 3 to ensure that the nodes included in the GVTs are economically relevant to the root industries. To choose a proper value of α, we gather some statistics on the number of nodes across the GVTs by holding γ = 3 constant and varying only the value of α. Second, α = 0.01 provides us with a much more manageable size of GVTs (around 40 nodes on average) than α = 0.001 (around 800 nodes on average). In other words, only the nodes most relevant to the root industry will be present in the GVTs if α = 0.01. Last but not least, the coefficient of variation is the highest when α = 0.01, which means that α = 0.01 provides us with a more diverse set of GVTs than the other two parameter choices. This is very helpful if we want to examine the different topological properties of the different GVTs.
Insert Table 2 here.
Insert Figure 1 here.
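The search procedure of this subsection can be sketched as follows. This is a minimal illustration rather than the exact implementation behind the reported results: the function name `global_value_tree` and the toy weights are hypothetical, an edge is assumed to qualify only if its weight is strictly greater than α, and a node that qualifies under several parents is assumed to keep the first parent that reaches it.

```python
def global_value_tree(W, root, alpha=0.01, gamma=3):
    """Thresholded, depth-limited breadth-first search.

    W[i][j] is the weight of the edge from contributor i to industry j.
    Returns a dict mapping every node of the tree to its parent (root -> None).
    """
    n = len(W)
    parent = {root: None}
    frontier = [root]
    for _ in range(gamma):                      # at most gamma rounds
        next_frontier = []
        for j in frontier:
            # Contributors of j, strongest first.
            suppliers = sorted(range(n), key=lambda i: W[i][j], reverse=True)
            for i in suppliers:
                if W[i][j] <= alpha:            # assumption: strict threshold
                    break
                if i not in parent:             # assumption: first parent is kept
                    parent[i] = j
                    next_frontier.append(i)
        frontier = next_frontier
    return parent

# Toy network with five industries (hypothetical weights).
W = [[0.0] * 5 for _ in range(5)]
W[1][0] = 0.30    # industry 1 contributes to the root
W[2][0] = 0.20    # industry 2 contributes to the root
W[4][0] = 0.005   # below the benchmark threshold
W[3][1] = 0.10
W[0][2] = 0.50    # points back to the root, so it is ignored
W[4][3] = 0.05

tree = global_value_tree(W, root=0)
```

With γ = 1 the same call returns only the root and its direct contributors, i.e., a star.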

III. RESULTS
Once we have computed the GVTs, some basic properties of the tree topology can be explored. Subsection III A quantifies the allometric scaling pattern of the GVTs. We estimate the allometric scaling exponents and find that the GVTs are topologically more similar to a star than to a chain. However, the GVTs have become more and more hierarchical over time. Subsection III B proposes a tree-based industry importance measure and compares it with other network centrality measures. We find that the tree-based measure performs the best in terms of the correlation with the industry total value-added. Therefore, the GVTs still retain the essential information of the GVNs and can be viewed as a reasonable simplification of the latter.

A. Allometric Scaling Pattern
The allometric scaling pattern refers to the power law relationship between size and other physical or behavioral variables. Previous studies have documented the ubiquitous existence of the allometric scaling pattern in systems as diverse as river networks, cellular metabolism, population dynamics, and food webs [31,32].
For a directed tree topology, if we denote the total number of nodes in the sub-tree rooted at node i by X_i and the sum of all X_i's in the sub-tree rooted at node i by Y_i, then an allometric scaling relationship is observed between Y_i and X_i and can be described by a power law, i.e., Y_i ∼ X_i^η, where η is called the allometric scaling exponent. Figure 2 shows examples of a chain, a star, and a tree, respectively. The numbers inside the node circles are the X_i's whereas those next to the circles are the Y_i's. The allometric scaling exponent η of a tree is lower-bounded by that of a star (η = 1) and upper-bounded by that of a chain (η = 2). As a result, η can be interpreted as a measure of hierarchicality, as a star is the "flattest" topology and a chain is the most hierarchical topology given the same number of nodes.
Insert Figure 2 here.
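The quantities X_i and Y_i can be computed in one bottom-up pass over a tree. The sketch below is illustrative only: the function name `subtree_sizes` and the parent-map representation of a tree are assumptions, and X_i is taken to include node i itself.

```python
def subtree_sizes(parent):
    """Compute X_i and Y_i for every node of a tree.

    parent maps each node to its parent, with the root mapped to None.
    X[i] is the number of nodes in the sub-tree rooted at i (i included);
    Y[i] is the sum of X[k] over all nodes k in that sub-tree.
    """
    children = {node: [] for node in parent}
    root = None
    for node, p in parent.items():
        if p is None:
            root = node
        else:
            children[p].append(node)

    # Breadth-first order from the root; reversing it visits children first.
    order = [root]
    for node in order:
        order.extend(children[node])

    X, Y = {}, {}
    for node in reversed(order):
        X[node] = 1 + sum(X[c] for c in children[node])
        Y[node] = X[node] + sum(Y[c] for c in children[node])
    return X, Y

# A chain and a star with four nodes each (node 0 is the root).
chain = {0: None, 1: 0, 2: 1, 3: 2}
star = {0: None, 1: 0, 2: 0, 3: 0}

X_chain, Y_chain = subtree_sizes(chain)
X_star, Y_star = subtree_sizes(star)
```

For a root with n nodes in its tree, this gives Y = n(n + 1)/2 for a chain and Y = 2n − 1 for a star, which is why the chain grows roughly quadratically and the star roughly linearly in X.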
To examine the hierarchicality of the GVTs, we estimate the η's based on the root-node Y_i-X_i pairs across all the GVTs for each year. Figure 3 shows the estimation results for η. Panel (a) shows the log-log plot of the root-node Y_i-X_i pairs in 2011, where the horizontal axis is the X_i of the root node, i.e., the total number of nodes in a given GVT (the tree size), and the vertical axis is the Y_i of the root node, which we call the accumulative tree size. The gray crosses are the observed data points. The thick blue dashed line is fitted to the observed data and has slope η. The fitted lines for the star and the chain based on the same set of X_i's are the green dashed line and the red dashed line, respectively. It is straightforward to see that in 2011 the GVTs are more similar to a star than to a chain. Panel (b) plots the estimated η's over time. Again, the values of η are all closer to 1 than to 2. However, there is a clear upward trend, which means that the GVTs have become more and more hierarchical over time (Shi et al. [33] also estimate the allometric scaling exponent to understand the hierarchicality of the global production system. However, they consider the directed tree as a flow network. Furthermore, their paper differs from ours in both data source and research strategy. They use the United Nations COMTRADE database to construct the product-specific trade networks while we use the WIOD database to construct the GVNs with both country and industry dimensions.).
Insert Figure 3 here.
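One simple way to obtain η from the root-node pairs is an ordinary least squares fit in log-log space. The text does not state which estimator is used, so the following is a sketch under that assumption; the function name `allometric_exponent` and the tree sizes are hypothetical.

```python
import math

def allometric_exponent(sizes, cumulative_sizes):
    """OLS slope of log(Y) on log(X), with an intercept.

    sizes are the root-node X values (tree sizes) and cumulative_sizes
    are the matching root-node Y values (accumulative tree sizes).
    """
    log_x = [math.log(x) for x in sizes]
    log_y = [math.log(y) for y in cumulative_sizes]
    mean_x = sum(log_x) / len(log_x)
    mean_y = sum(log_y) / len(log_y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    s_xx = sum((a - mean_x) ** 2 for a in log_x)
    return s_xy / s_xx

# Reference cases built from the closed forms for a star and a chain.
sizes = [5, 10, 20, 40, 80]
eta_star = allometric_exponent(sizes, [2 * n - 1 for n in sizes])
eta_chain = allometric_exponent(sizes, [n * (n + 1) // 2 for n in sizes])
```

On finite trees the fitted slopes for these reference cases lie slightly above 1 and slightly below 2, respectively, and approach the two bounds as the trees grow.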

B. A Tree-Based Importance Measure
The GVTs are the subgraphs of the GVNs. Unlike the GVNs, the GVTs reveal the local importance of the industries. Previous studies have shown that the subgraph centrality measure can be used to complement the global centrality measures [34]. Hence, we compute a simple industry importance measure based on the GVTs and compare it with other network centrality measures.
First, we denote a tree with the root r by T(r). Furthermore, we denote the total number of nodes in the sub-tree rooted at industry i by X_i(r) and the total number of nodes in the tree T(r) by N(r). If industry i is present in k trees all over the world and we denote the set of roots of the k trees by S_i, then the importance of industry i is defined as follows:
$$TI_i = \sum_{r \in S_i,\, r \neq i} \frac{X_i(r)}{N(r)} \cdot \frac{FD(r)}{WGDP},$$
where TI_i is the tree-based importance measure of industry i, FD(r) is the final demand in the root industry r, and WGDP is the world GDP. Notice that when calculating TI_i, we don't consider the role played by industry i in its own GVT (i.e., r = i), although the input-output network has strong self-loops [24].
The economic interpretation of the importance measure is that more important industries are more closely attached to the root and are able to "pull" a larger portion of the GVTs (measured by X_i(r)/N(r)) and are associated with more important roots (measured by FD(r)/WGDP). Moreover, since each T(r) where industry i is present has a score of importance, i.e., [X_i(r)/N(r)] · [FD(r)/WGDP], we can identify the GVTs where industry i has the highest importance score. For instance, Figure 4 shows the GVTs where China's electrical equipment industry has the highest importance score for domestic and foreign roots respectively in 2011.
Insert Figure 4 here.
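The tree-based importance measure can be sketched in a few lines. This is an illustration under stated assumptions rather than the implementation behind the reported rankings: the function names, the parent-map representation of a tree, and the toy final demand and world GDP figures are all hypothetical, and each tree contributes the product of X_i(r)/N(r) and FD(r)/WGDP, with an industry's own tree skipped.

```python
def subtree_size(parent, node):
    """Number of nodes in the sub-tree rooted at `node` (node included)."""
    count = 0
    for start in parent:
        # Walk up from `start`; it belongs to the sub-tree if the walk hits `node`.
        k = start
        while k is not None and k != node:
            k = parent[k]
        if k == node:
            count += 1
    return count

def tree_importance(trees, final_demand, world_gdp):
    """Sum the per-tree scores of every industry over all trees it appears in.

    trees maps each root r to the parent map of T(r);
    final_demand maps each root r to FD(r).
    """
    ti = {}
    for r, parent in trees.items():
        n_r = len(parent)                        # N(r)
        root_weight = final_demand[r] / world_gdp
        for i in parent:
            if i == r:
                continue                         # skip the industry's own tree
            score = subtree_size(parent, i) / n_r * root_weight
            ti[i] = ti.get(i, 0.0) + score
    return ti

# Two toy trees rooted at industries 0 and 3 (hypothetical numbers).
trees = {
    0: {0: None, 1: 0, 2: 1},
    3: {3: None, 1: 3},
}
final_demand = {0: 50.0, 3: 30.0}
ti = tree_importance(trees, final_demand, world_gdp=100.0)
```

In this toy case industry 1 scores (2/3)(0.5) in the first tree and (1/2)(0.3) in the second, so it ranks above industry 2, which appears only as a leaf of the first tree.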
To examine the tree-based importance measure in a more systematic way, we compare it with other network centrality measures. Table 3 shows the top-20 industries identified by different measures for the selected years. Again, TI is the tree-based importance measure. We also provide the results based on some network centrality measures. In particular, CC is the closeness centrality, BC is the betweenness centrality, and PR is the PageRank centrality. Finally, we include a measure of the economic size of the industries, the industry total value-added, which is denoted by VT. Some interesting patterns can be seen from this table. (In Table S3 in Supplementary Information, we also report the country rankings by summing up the measures of the industries in the same country.)
Insert Table 3 here.
Moreover, Table 4 reports the correlation coefficients between the measures. We find that TI performs the best in terms of the correlation with VT. Nevertheless, this is not to say that we should abandon the other measures and solely use TI to understand the importance of a given industry. After all, we only consider the intermediate value-added flows when calculating CC, BC, and PR, whereas we also take into account the final demand in the root industry, i.e., FD(r), when calculating TI, which gives more power to TI in explaining VT. However, the strong correlation between log(TI) and log(VT) at least shows that the GVTs retain the essential information of the GVNs and can be viewed as a reasonable simplification of the latter. That is, TI can be considered as a measure of an industry's positional advantage. An industry holds an advantageous position either by attaching to big industries (i.e., a big FD(r)/WGDP) or by affecting a big portion of the GVTs (i.e., a big X_i(r)/N(r)). As a result, the better-positioned industries are more competitive in the world production system and hence are able to extract more value-added across the GVTs. Moreover, since the component X_i(r)/N(r) of TI measures how closely the given industry is attached to the roots (i.e., a bigger X_i(r)/N(r) implies a smaller distance to the roots), it can be considered as a measure of downstreamness. That is, the higher TI is, the more downstream the industry is in the GVTs. Therefore, the strong correlation between log(TI) and log(VT) supports Stan Shih's theory of the "smiling curve", which states that most value-added potentials are concentrated at the beginning (upstream) and the ending (downstream) parts of the supply chains.
Insert Table 4 here.
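The comparison between log(TI) and log(VT) can be illustrated with a plain correlation computation. The type of correlation coefficient is not stated in the text, so the sketch below assumes the Pearson coefficient; the function name `log_correlation` and the sample values are hypothetical, and all inputs must be strictly positive for the logarithms to exist.

```python
import math

def log_correlation(a, b):
    """Pearson correlation between log(a) and log(b) (assumed estimator)."""
    log_a = [math.log(v) for v in a]
    log_b = [math.log(v) for v in b]
    mean_a = sum(log_a) / len(log_a)
    mean_b = sum(log_b) / len(log_b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(log_a, log_b))
    var_a = sum((p - mean_a) ** 2 for p in log_a)
    var_b = sum((q - mean_b) ** 2 for q in log_b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical values for four industries: VT follows an exact power law in TI.
ti_values = [0.001, 0.004, 0.02, 0.1]
vt_values = [2000.0 * t ** 0.8 for t in ti_values]
rho = log_correlation(ti_values, vt_values)
```

Because the toy value-added figures are an exact power law of the toy importance scores, the log-log correlation here is one up to rounding; real data would give a weaker relationship.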

IV. DISCUSSION
Once we have the GVTs computed for all the industries available in the WIOD, many interesting questions can be proposed and answered. For instance, does a tree with a fixed root grow over time? This question can be answered by fixing the root industry and examining the GVTs over time. As an example, Figure 5 shows the evolution of the GVTs rooted at China's electrical equipment industry over time. A simple way of measuring the growth of the trees is to count the number of nodes over time.
Insert Figure 5 here.
We can also examine the different structures of the GVTs for the same industry and the same year but for different countries. Figure 6 compares the transport equipment industry between Indonesia and Japan in 1995. The immediate conclusion from this comparison is that the transport equipment industry has a more international GVT in Indonesia than in Japan. More interestingly, Japan's industries actually play important roles in Indonesia's GVT, i.e., three Japanese industries (JPN 12, JPN 15, and JPN 20) are direct neighbors of the root in Indonesia's GVT (This observation coincides with the increased foreign direct investment from Japan to Indonesia's car industry in 1995.). In this simple comparison, JPN 15 is clearly more competitive than IDN 15, according to the above TI measure.
Insert Figure 6 here.
In summary, previous studies of the GVCs have mainly been interested in how global the GVCs are rather than what the GVCs look like. To fill this gap in the literature, our paper is the first attempt to investigate the topological properties of the industry-level GVCs. Based on the GVNs, the global value trees (GVTs) can be obtained by a breadth-first search algorithm with a threshold on the edge weight and a limit on the number of rounds.
We compute the GVTs for all the industries available in the WIOD and explore some basic properties of the GVTs. In particular, we estimate the allometric scaling exponents and find that the GVTs are topologically more similar to a star than to a chain. However, the GVTs have become more and more hierarchical over time. We also develop an industry importance measure based on the GVTs and compare it with other network centrality measures of the industries. We find that the tree-based measure performs the best in terms of the correlation with the industry total value-added. Therefore, the GVTs still retain the essential information of the GVNs and can be viewed as a reasonable simplification of the latter.
Finally, we discuss some future applications of the GVTs, such as examining the evolution of the GVTs for a certain industry and comparing the GVTs of the same industry in different countries.

ADDITIONAL INFORMATION
The authors declare no competing financial interests.

TABLE 3. The top-20 industries identified by the tree-based importance measure and other network centrality measures for the selected years. TI is the tree-based importance measure, CC is the closeness centrality, BC is the betweenness centrality, PR is the PageRank centrality, VT is the industry total value-added. The full names of the corresponding industries of the 3-letter codes can be found in Table S2 in Supplementary Information.

TABLE 4. The size of the sample is in the parentheses next to the corresponding years. TI is the tree-based importance measure, CC is the closeness centrality, BC is the betweenness centrality, PR is the PageRank centrality, VT is the industry total value-added. ** means that the coefficient is significant at 1% level. * means that the coefficient is significant at 5% level.

TABLE S3. The country rankings based on the tree-based importance measure and other network centrality measures for the selected years. TI is the tree-based importance measure, CC is the closeness centrality, BC is the betweenness centrality, PR is the PageRank centrality, VT is the industry total value-added. The full names of the corresponding industries of the 3-letter codes can be found in Table S2.

(Only ranks 11 to 37 are available here; the header row and the remaining ranks are missing, and "..." marks missing entries.)

Rank
11  ...  ...  ...  ...  ...  ...  IND  IND  CAN  KOR  BRA  IND  JPN  CAN  ESP
12  CHN  AUS  ROM  RUS  NLD  AUS  AUS  TUR  MEX  IND  IND  AUS  FRA  IND  AUS
13  KOR  BRA  DNK  AUS  IND  KOR  NLD  AUS  GRC  BRA  KOR  KOR  CAN  TUR  RUS
14  IND  NLD  IDN  TUR  AUS  IND  BRA  IDN  AUS  AUS  MEX  ESP  ITA  KOR  MEX
15  IDN  RUS  KOR  NLD  RUS  BRA  TWN  BRA  RUS  NLD  ESP  TUR  IDN  AUS  KOR
16  NLD  IND  BEL  IND  MEX  NLD  RUS  DNK  IND  RUS  TWN  TWN  TUR  GRC  IDN
17  MEX  BEL  IND  CAN  BEL  TWN  BEL  POL  NLD  TWN  IDN  NLD  MEX  MEX  NLD
18  SWE  DNK  TUR  SWE  TWN  IDN  TUR  HUN  SWE  BEL  TUR  BEL  GRC  POL  TUR
19  BEL  SWE  CHN  AUT  IDN  TUR  SWE  BGR  BRA  SWE  NLD  POL  LVA  IDN  SWE
20  TUR  POL  MEX  BEL  SWE  SWE  POL  ROM  POL  TUR  SWE  FIN  CZE  NLD  BEL
21  TWN  FIN  BGR  DNK  AUT  BEL  GRC  CAN  AUT  IDN  POL  GRC  HUN  SWE  POL
22  AUT  AUT  GRC  PRT  TUR  DNK  DNK  KOR  BEL  AUT  BEL  PRT  POL  BEL  TWN
23  DNK  PRT  NLD  TWN  DNK  AUT  PRT  SVK  IDN  POL  DNK  AUT  IND  FIN  AUT
24  GRC  GRC  CAN  MEX  POL  POL  AUT  GRC  FIN  DNK  AUT  DNK  SVK  AUT  DNK
25  POL  CZE  LVA  FIN  GRC  GRC  FIN  TWN  DNK  GRC  GRC  IRL  CYP  ROM  GRC
26  FIN  IRL  HUN  IDN  FIN  FIN  IRL  PRT  PRT  FIN  PRT  ROM  SWE  DNK  FIN
27  PRT  CHN  PRT  POL  PRT  PRT  CHN  EST  IRL  IRL  FIN  CZE  ROM  PRT  PRT
28  CZE  IDN  POL  CZE  IRL  IRL  CZE  BEL  CZE  PRT  ROM  HUN  PRT  CZE  IRL
29  ROM  ROM  SWE  HUN  CZE  CZE  IDN  CZE  HUN  CZE  CZE  SVK  DNK  CYP  CZE
30  HUN  TUR  TWN  ROM  HUN  HUN  HUN  LVA  CYP  HUN  HUN  BGR  SVN  HUN  ROM
31  IRL  HUN  AUT  CYP  ROM  ROM  ROM  SWE  TWN  ROM  IRL  CHN  BEL  TWN  HUN
32  SVN  SVN  CYP  IRL  LUX  SVN  SVK  CYP  ROM  SVK  SVK  IDN  BGR  IRL  SVK
33  SVK  BGR  CZE  SVN  SVN  SVK  SVN  MEX  SVN  LUX  SVN  SWE  MLT  BGR  LUX
34  BGR  SVK  LTU  BGR  SVK  BGR  LTU  LTU  LTU  SVN  BGR  LTU  LUX  SVK  BGR
35  CYP  LUX  SVN  LVA  BGR  LTU  BGR  MLT  LVA  BGR  CYP  SVN  LTU  LVA  SVN
36  LUX  LTU  IRL  LTU  CYP  CYP  LUX  NLD  SVK  LTU  LTU  EST  EST  LTU  LTU
37  LTU  CYP  EST  SVK  LTU  LUX  CYP  SVN  BGR  CYP  LVA  LVA  TWN  SVN  LVA