Scaling properties of food flow networks

Food flows underpin the complex food supply chains that are prevalent in our increasingly globalized world. Recently, much effort has been devoted to evaluating the resources (e.g. water, carbon, nutrients) embodied in food trade. Now, research is needed to understand the scientific principles of the food commodity flows that underpin these virtual resource transfers. How do food flows vary with spatial scale? To address this question, we present an empirical analysis of food commodity flow networks across the full spectrum of spatial scales: global, national, and village. We discover properties of both scale invariance and scale dependence in food flow networks. The statistical distributions of node connectivity and mass flux are consistent across scales. Node connectivity follows a generalized exponential distribution, while node mass flux follows a Gamma distribution across scales. Similarly, the relationship between node connectivity and mass flux follows a power law across scales. However, the parameters of the distributions change with spatial scale. Mean node connectivity and mass flux increase with increasing scale. A core group of nodes exists at all scales, but node centrality increases as the spatial scale decreases, indicating that some households are more critical to village food exchanges than countries are to global trade. Remarkably, the structural network properties of food flows are consistent across spatial scales, indicating that a universal mechanism may underpin food exchange systems. In future research, this understanding can be used to develop theoretical models of food flow networks and to model food flows at resolutions for which empirical information is not available.


Introduction
We live in an increasingly global society, in which food commodity transfers enable production and consumption activities to be separated in space via complex supply chains [1,2]. Here, we refer to the movement of food commodities from one location to another as 'food flows', reserving the term 'food trade' for the international exchange of food commodities between countries. Recently, much research has evaluated food trade [3][4][5][6], particularly the resources embodied in food trade, such as water, carbon, and nutrients [7][8][9]. However, we know relatively little about food flows at smaller spatial scales, such as within nations or cities. This is largely due to a lack of available data on food flows at smaller spatial scales.

Commodity flow data across scales
We obtain empirical information on commodity flows at three spatial scales: 'global', 'national', and 'village'. 'Global' data refers to international commodity trade between 240 countries for the year 2009. International trade data comes from COMTRADE [23] and is mapped in Fig 1A. 'National' commodity flow data is for the United States and is obtained from the Commodity Flow Survey (CFS) for the year 2007 [20]. The CFS dataset breaks the United States into 132 CFS Areas. A map of commodity flows within the United States is provided in Fig 1B. 'Village' scale data on commodity flows are available for all households for three Alaskan villages: Wainwright, Kaktovic, and Venetie (locations shown in Fig 1C). Data on village scale commodity exchanges are available for the years 2009 and 2010 [15] and are mapped in Fig 1C. Importantly, each dataset provides information on bilateral transfers between all nodes within the spatial domain, eliminating selection bias. Unfortunately, the data across these three scales are not uniform. First, the time period is different. Both village and global data are available for the year 2009. However, the national data comes from CFS, which is only collected every 5 years. We have selected 2007, which is the closest available year to 2009. Second, the commodity categories are different across spatial scales. To handle this issue, we lump commodities into two broad categories: 'food' and 'non-food' commodities. This broad categorization of commodities should alleviate some of the inconsistencies across datasets.
Categorizing commodity flow data as food and non-food commodities enables us to distinguish the unique aspects of food commodity flows across scales. At the global scale, Harmonized System (HS) codes 1 to 24 are food commodities, while non-food items are HS codes greater than 24. For the national scale, the Standard Classification of Transported Goods (SCTG) codes 2, 3, 4, 5, and 7 are food commodities and non-food commodities are all other SCTG codes. Village scale data on commodity donations are all food, with the exception of reports of equipment, cash, gas, ammunition, and donations of labor [15]. Global and national commodity flows are provided in both mass [kg] and value [$], while village flows are only available in units of mass. We weight commodity flows in these primary units, as they are most likely to be linked to the underlying heterogeneity and driving mechanisms.
The average area of a country in the global trade system is 5 × 10^11 m^2 [32]. The area of the United States is 9,147,420 km^2 [32], and there are 132 CFS Areas. So, we estimate that the average area of a CFS Area in the United States is 7 × 10^10 m^2. The average size of an American house is 222 m^2 [33], which we assume is similar to the size of households in the three Alaskan villages. So, node size in the three systems varies across roughly 10 orders of magnitude.
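The node-size comparison above is simple arithmetic and can be checked directly. The sketch below uses only the figures quoted in this section; the variable names are illustrative, and the household footprint carries the same assumption the text makes (a village household is taken to be the size of an average American house).

```python
import math

# Figures quoted in the text (areas in square metres).
country_area_m2 = 5e11            # average country in the global trade system [32]
us_area_m2 = 9_147_420 * 1e6      # United States: km^2 converted to m^2 [32]
n_cfs_areas = 132                 # number of CFS Areas stated in the text
house_area_m2 = 222.0             # average American house [33], assumed for village households

# Average CFS Area size, obtained by dividing the national area evenly.
cfs_area_m2 = us_area_m2 / n_cfs_areas

# Orders of magnitude separating the largest and smallest node types.
orders_of_magnitude = math.log10(country_area_m2 / house_area_m2)

print(f"average CFS Area: {cfs_area_m2:.1e} m^2")
print(f"country vs household: {orders_of_magnitude:.1f} orders of magnitude")
```

With these inputs the ratio of country to household area falls between nine and ten orders of magnitude, in line with the rough figure quoted above.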

Network analysis of commodity exchanges
For each commodity flow network, the nodes are the units exchanging commodities and the links are the bilateral commodity flows. The full specification of the data allows us to characterize weighted and directed networks. From these networks, we can also evaluate their simpler unweighted and undirected counterparts. Here, we consider both unweighted/weighted and undirected/directed networks in order to consistently assess flow directionality and flow intensity across spatial scales. The weighted, directed matrix (\(W_D\)) contains elements (\(w_{i,j}\)) that provide the weighted, link-level flows from node i to j. Each element \(w_{i,j}\) has a value >0 when a commodity flow exists from node i to node j. The unweighted version of \(W_D\) is referred to as an adjacency matrix (\(A_D\)), which is a binary matrix in which only connectivity information is present. Each element (\(a_{i,j}\)) is equal to 0 when no connection exists from node i to j and equal to 1 when there is a connection from node i to j. When direction is not taken into account, \(A_D\) reduces to a symmetric matrix (\(A_U\)) in which each element \(a_{i,j}\) is equal to 0 when no connection exists between node i and j and equal to 1 when there is a connection between node i and j. Note that information on the flow direction between node i and j is not available in \(A_U\) [34].
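The three matrices described above can be derived from one another mechanically. The following is a minimal sketch, assuming the flows are held as a nested list and that an undirected link exists whenever a flow runs in either direction; the function names and the 3-node example are hypothetical and are not taken from the study's data.

```python
def adjacency_directed(W):
    """Binary directed adjacency matrix A_D: 1 where a flow from i to j exists."""
    return [[1 if w > 0 else 0 for w in row] for row in W]


def adjacency_undirected(A):
    """Symmetric matrix A_U: 1 if a link exists in either direction (assumed rule)."""
    n = len(A)
    return [[1 if (A[i][j] or A[j][i]) else 0 for j in range(n)] for i in range(n)]


# Hypothetical weighted, directed flow matrix W_D (e.g. kg shipped from node i to node j).
W_D = [
    [0.0, 12.5, 0.0],
    [3.0, 0.0, 0.0],
    [0.0, 7.0, 0.0],
]

A_D = adjacency_directed(W_D)      # keeps direction, drops weights
A_U = adjacency_undirected(A_D)    # drops direction as well
```

In this example node 2 ships to node 1 but receives nothing back, so the link appears once in `A_D` and symmetrically in `A_U`, which is exactly the directional information that the undirected matrix loses.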
Network density refers to the number of links that exist in the network as a fraction of the total potential number of links. Density is a global network property and is measured by \(M / (N(N-1))\), where M is the number of links and N(N − 1) is the number of possible links [34]. First order network properties examine the attributes of individual nodes in the network. Node degree (k) is a fundamental network property that measures the connectivity of each node. For directed networks, there are two kinds of degree: out-degree (\(k_i^{out}\)) and in-degree (\(k_i^{in}\)). The number of outgoing links is defined as \(k_i^{out} = \sum_j a_{i,j}\) [34]; correspondingly, the number of incoming links is \(k_i^{in} = \sum_j a_{j,i}\). Higher order network properties examine attributes of the neighborhood of nodes in the network. Node clustering (c) describes the propensity of nodes in the network to form closed triangles [35]. This is a classic measure of the 'cliquishness' of a social network. Node betweenness centrality (B) is calculated as \(B = \sum_{i,j} \sigma(i,u,j) / \sigma(i,j)\), where σ(i, u, j) is the number of shortest paths between nodes i and j that pass through node u, and σ(i, j) is the total number of shortest paths between i and j; the sum runs over all pairs i, j of nodes [34]. B is normalized by 1/((N − 1)(N − 2)) such that \(B \in [0, 1]\) [36]. Directed paths are used to calculate directed B and undirected paths for undirected B. B measures the importance of a node to the overall network structure.
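The definitions above can be made concrete with a small, dependency-free sketch. It assumes a binary adjacency matrix with a zero diagonal and counts hop-based shortest paths by breadth-first search; the function names and the 4-node network are hypothetical, and this brute-force approach is meant for illustration rather than as the procedure used in the study.

```python
from collections import deque


def density(A):
    """Realized links M divided by possible links N(N-1) for a directed network."""
    n = len(A)
    m = sum(A[i][j] for i in range(n) for j in range(n) if i != j)
    return m / (n * (n - 1))


def out_degree(A):
    """k_out[i] = sum over j of a[i][j] (assumes a zero diagonal)."""
    return [sum(row) for row in A]


def in_degree(A):
    """k_in[i] = sum over j of a[j][i] (assumes a zero diagonal)."""
    n = len(A)
    return [sum(A[j][i] for j in range(n)) for i in range(n)]


def shortest_path_counts(A, source):
    """BFS from source: hop distance and number of shortest paths to every node."""
    n = len(A)
    dist = [None] * n
    sigma = [0] * n
    dist[source], sigma[source] = 0, 1
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in range(n):
            if A[v][w]:
                if dist[w] is None:          # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                    sigma[w] += sigma[v]
    return dist, sigma


def betweenness(A):
    """Sum of sigma(i,u,j)/sigma(i,j) over ordered pairs, scaled by 1/((N-1)(N-2))."""
    n = len(A)
    results = [shortest_path_counts(A, s) for s in range(n)]
    B = [0.0] * n
    for u in range(n):
        dist_u, sigma_u = results[u]
        for i in range(n):
            dist_i, sigma_i = results[i]
            for j in range(n):
                if len({i, j, u}) < 3:
                    continue
                if dist_i[j] is None or dist_i[u] is None or dist_u[j] is None:
                    continue
                if dist_i[u] + dist_u[j] == dist_i[j]:
                    B[u] += sigma_i[u] * sigma_u[j] / sigma_i[j]
    return [b / ((n - 1) * (n - 2)) for b in B]


# Hypothetical directed network with links 0->1, 1->2, 1->3, 2->3.
A = [
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
rho = density(A)
k_out, k_in = out_degree(A), in_degree(A)
B = betweenness(A)
```

Passing a symmetric matrix treats every link as bidirectional, so the same functions give the undirected quantities; because ordered pairs are then counted twice, the 1/((N − 1)(N − 2)) factor still keeps B within [0, 1].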

Results and discussion
Here, we provide the results of our empirical analysis of both food and non-food flows at global, national, and village spatial scales. First, we map and determine the global properties of flow networks. Second, we determine the statistical distributions that best fit the networks across spatial scales. Third, we characterize the parameters of the statistical distributions across all scales.

Summary statistics
Fig 1 provides a map of food flows for each spatial domain of analysis. Panel A maps international food trade between countries. Note that world regions are color coded so that regional food trade can be more clearly observed. Panel B maps sub-national food flows within the United States. Food flows between the 132 CFS areas are depicted. Bubbles in Panel A and B are scaled by the total mass flux of food for each node. Panel C illustrates food flows between households in three Alaskan villages: (i) Wainwright, (ii) Kaktovic, and (iii) Venetie. However, geographical information is not available for households at the village spatial scale. So, maps of village food flow networks are provided for illustration purposes only and are not geographically accurate. Table 1 provides summary statistics for the three spatial domains. There are 240 nodes (i.e. countries) in the global food trade network, 123 nodes (i.e. CFS areas) in the national food flow network, and 163 nodes (i.e. households) in the village food flow network. Note that the 'village' provided in Table 1 is Kaktovic for comparison purposes. Kaktovic is representative of the other two Alaskan villages as shown in Table 2. Note that the number of nodes is constant across food and non-food flow networks. This indicates that all nodes trade both food and non-food. However, the number of links varies between commodity classes. Global and national domains have more non-food links, while the village domain has more food links. Network density decreases with spatial scale for food commodities. This means that the fraction of realized to potential links declines as the spatial scale decreases. This relationship hints at scale dependence in food flow networks. Yet, density does not follow this clear pattern for non-food commodities, in which the density of the national scale is actually greater than it is at the global scale. 
Density of the village scale is dramatically lower than it is for the global and national scale. This is true for both food and non-food flows.
The mass and value of global trade are on the same order of magnitude for both food and non-food. This is quite surprising given the relatively low value of food commodities. Interestingly, national commodity flows in the United States are higher in value than they are in mass for both food and non-food commodities. In fact, value flows within the United States are comparable to the entirety of the global trade system. This indicates that roughly the same value of commodities flows within a country as across all national borders, highlighting the importance of considering sub-national commodity fluxes.

Network connectivity.
Fig 2A, 2D and 2G present the degree distributions of undirected food flows across spatial scales. We fit a generalized exponential distribution to the node degree (k) distribution at each scale. The generalized exponential probability density function has three shape parameters (a, b, and c), a location parameter (d), and a scale parameter (e). The function can also be expressed in a 'standardized' form in terms of a variable y that treats the discrete degree data as continuous [37,38]. So, the generalized exponential distribution used here has 5 parameters. Fig 2A, 2D and 2G illustrate that undirected food flow node degree distributions are well fit by a generalized exponential distribution across all spatial scales. However, Fig 3A, 3D and 3G indicate that non-food flows are not well fit by the generalized exponential distribution. In particular, the right tails of the histograms for global and national connectivity exhibit higher values than can be captured by the generalized exponential distribution. This indicates that food and non-food commodity flows exhibit different network structures, likely due to differences in the underlying driving mechanisms. The generalized exponential distribution parameters for undirected food and non-food networks are provided by spatial scale in Table 3.
The generalized exponential distribution also provides a good fit to the connectivity of food flow networks with direction and value weights. When direction is taken into account, food flows are still well fit by the generalized exponential distribution. To see this, refer to Figs 4A, 4D, 4G, 5A, 5D and 5G. This indicates that food flow network connectivity can be modeled with the same distribution with or without consideration of flow directionality. This is despite the fact that the tails of the in- and out-degree distributions are quite different. The distributions of out-degree are more right skewed than the in-degree distributions across all scales. This pattern has also been pointed out by Konar et al (2011) [27]. This indicates that countries have many export trade partners, while they tend to import from just a few trade partners. This makes sense under the principle of economic specialization, in which a location that is efficient at producing a given commodity will specialize in its production and export. Conversely, if a nation imports a given commodity, it will likely be able to meet its demand for this commodity in a few trade relationships.

Table 3. Parameters for undirected food and non-food commodity flow networks. Note that a generalized exponential distribution is fit to node degree, Gamma distribution is fit to node strength, and a power law is fit to node degree versus strength.

Fig 6A, 6D and 6G illustrate that the generalized exponential distribution fits the connectivity structure of food flows well when value [$] weights are assigned rather than mass fluxes [kg]. This is what we would expect, since link weights are not considered in the connectivity structure, but it is good to empirically determine this. Additionally, all three Alaskan villages exhibit the same network properties (refer to Fig 7). However, Fig 8A illustrates that mean node connectivity decreases with spatial scale, despite the fact that the statistical distribution remains the same.
The generalized exponential distribution is an extension of the exponential distribution, much like the Gamma or Weibull distributions [39]. A more parsimonious distribution (such as the Gamma or Weibull distribution) would be desirable to fit to our data. However, we prioritized goodness of fit over parsimony and selected the model that fit the data best. The generalized exponential distribution also has qualities that are beneficial for our setting. The generalized exponential distribution is an extension of the bivariate exponential distribution (BVE). So, in order to understand the generalized exponential distribution, we first need to understand the BVE model. The BVE model has been used to ascertain the failure time of two plants that use common components. For example, both plants use engines produced by the same company. This means that any failure related to the common component will be correlated across plants. At the same time, these two plants also use different components, which introduces potential independent reasons for failure. In the BVE model, the waiting time for failure does not change with the passing of time (due to the Poisson process assumption). This gives a memoryless property to the BVE model. The generalized exponential distribution makes sense as a representation of trade systems. Replace one plant in the model with one trading node, and the other plant with all other nodes in the trading system. Then, assume that these two entities will keep producing one new link per time unit. A failure occurs when a link in the trade system is broken. In our commodity flow systems, failure could arise either from a common component (for example, correlated disaster in the production of a commodity in both nodes due to weather) or from different components. The generalized exponential model is an adaptation of the BVE model such that failure becomes more likely as time goes by. In other words, the generalized exponential distribution is not memoryless, such that shocks accumulate in time [37]. This makes sense, as disruptions to trade would likely be taken into account by trading partners going forward. This feature of the generalized exponential distribution leads to an increased failure rate when exposed to external shocks. Additionally, the generalized exponential distribution enables the shock arrival rates to be separately identified from their impacts [37]. For this reason, the generalized exponential distribution, which provides the best fit to our data, also possesses properties that likely capture important aspects of trade systems. This feature will likely prove useful in future research that aims to model the impact of shocks to the global food trade system.
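The loss of memory described above can be illustrated numerically. The exact density used in the study is not preserved in this text, so the sketch below assumes the three-shape-parameter form used by SciPy's `genexpon` distribution, f(y) = (a + b(1 − exp(−c y))) exp(−a y − b y + (b/c)(1 − exp(−c y))) with y = (x − d)/e, which matches the parameter count given in the text. The function names and parameter values are hypothetical.

```python
import math


def genexp_pdf(x, a, b, c, d=0.0, e=1.0):
    """Density in the assumed SciPy-style form, with location d and scale e."""
    y = (x - d) / e
    if y < 0:
        return 0.0
    g = 1.0 - math.exp(-c * y)
    return (a + b * g) * math.exp(-(a + b) * y + (b / c) * g) / e


def genexp_survival(x, a, b, c, d=0.0, e=1.0):
    """Probability that no failure has occurred before x."""
    y = (x - d) / e
    if y < 0:
        return 1.0
    return math.exp(-(a + b) * y + (b / c) * (1.0 - math.exp(-c * y)))


def genexp_hazard(x, a, b, c, d=0.0, e=1.0):
    """Failure rate pdf/survival; it rises from a/e toward (a + b)/e."""
    return genexp_pdf(x, a, b, c, d, e) / genexp_survival(x, a, b, c, d, e)


# Hypothetical shape parameters, chosen only for illustration.
a, b, c = 0.5, 1.0, 2.0
hazards = [genexp_hazard(x, a, b, c) for x in (0.0, 0.5, 1.0, 2.0, 5.0)]
print(hazards)
```

Under this assumed form the failure rate increases with elapsed time, which is the sense in which the distribution is not memoryless. Setting b = 0 recovers a constant rate a, that is, the ordinary memoryless exponential distribution.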

Network mass fluxes.
The distributions of mass transfers for undirected food flow networks are provided in Fig 2B, 2E and 2H. We fit a Gamma distribution to the node strength (s), with probability density function \(f(s; \alpha, \theta) = \frac{1}{\Gamma(\alpha)\,\theta^{\alpha}}\, s^{\alpha - 1} e^{-s/\theta}\), where Γ is the Gamma function, α is the shape parameter, and θ is the scale parameter of the Gamma distribution. Fig 2B, 2E and 2H show that node mass flux, or strength (s), distributions for undirected food flow networks are fit well by a Gamma distribution across all spatial scales. This highlights that these networks are much more heterogeneous in terms of their mass fluxes than connectivity structure. The Gamma(shape, success rate) distribution is generated from a Poisson process. Conceptually, the Gamma distribution can be explained as commodity shipments to meet a "shape" amount of need. The chance that a unit of transported commodity will be successful and meet one unit of need is the "success" rate. The Gamma distribution parameters (α, θ) are provided for undirected food and non-food commodities in Table 3. Graphs of the Gamma distribution fit to non-food flow networks are provided in  Fig 5B. However, note that the statistical fit still performs well (i.e. see the KL-divergence value in Table 4).
The outlier in Fig 5B is the country that exports the most food in mass, which is China. In 2009, China exported 5.11 × 10^11 kg of food. The country that exports the second largest quantity of food is the U.S., with a total food export of 1.74 × 10^11 kg. It is possible that 2009 is an anomalous year (for example, due to the Great Recession in the U.S.). Note that, here, 'food' encapsulates more than raw agricultural commodities, of which the U.S. has been shown to export more than China (e.g. shown in Dalin et al., 2012). The value of the food export of the U.S. is larger than that of China (i.e. U.S. food export value is $1.02 × 10^11; China food export value is $3.91 × 10^10), although the value of non-food export is larger in China (i.e. China non-food export value is $1.35 × 10^12; U.S. non-food export value is $9.18 × 10^11). China and U.S. food export values are broken down by HS code in Table 5.
Node strength distributions in value [$] weights are provided in Fig 6 for the global and national spatial scales. These figures all indicate that the Gamma distribution provides a good fit to the intensity of commodity flows across commodity types, with or without directionality, and for both mass and value weighting schemes. Fig 8B shows that nodal mass flux decreases with spatial scale, as we would expect, even though a Gamma distribution provides a suitable model across scales. Here, we have shown that the Gamma distribution captures nodal strength distributions across all flow networks reasonably well. However, the Gamma distribution underestimates outliers in some instances; for example, it misses the export dominance of China in 2009. Yet, the Gamma distribution is a flexible statistical model that is broadly representative of commodity mass fluxes. The Gamma distribution also has well-known properties that future efforts to model commodity fluxes may be able to exploit. For example, the Gamma distribution has known reliability, lifetime, and hazard functions [40]. These statistical properties can be taken into account to model commodity fluxes in future research.
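To make the Gamma model of node strength concrete, the sketch below evaluates the standard Gamma density with shape α and scale θ and estimates both parameters by the method of moments. The estimation procedure used in the study is not stated in this text, so the method of moments is a simple stand-in rather than the authors' approach; the function names and the strength values are hypothetical.

```python
import math


def gamma_pdf(s, alpha, theta):
    """Gamma density with shape alpha and scale theta, defined for s > 0."""
    if s <= 0:
        return 0.0
    log_pdf = ((alpha - 1.0) * math.log(s) - s / theta
               - math.lgamma(alpha) - alpha * math.log(theta))
    return math.exp(log_pdf)


def fit_gamma_moments(strengths):
    """Method-of-moments estimates: alpha = mean^2 / var, theta = var / mean."""
    n = len(strengths)
    mean = sum(strengths) / n
    var = sum((s - mean) ** 2 for s in strengths) / (n - 1)
    return mean ** 2 / var, var / mean


# Hypothetical node strengths (e.g. kg of food handled by each node).
strengths = [2.0, 3.5, 1.2, 8.0, 4.4, 0.9, 15.0, 2.7]
alpha, theta = fit_gamma_moments(strengths)
print(f"alpha = {alpha:.3f}, theta = {theta:.3f}")
```

By construction the product α·θ equals the sample mean, which is one way to see why mean nodal mass flux can shift with spatial scale while the distributional family stays the same.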

Relationship between connectivity and flows.
We examine the relationship between node degree and strength for undirected food flows in Fig 2C, 2F and 2I, in which s is plotted against k for all spatial scales. The straight line relationship in log-log scale indicates that there is scale invariance between mass flows and network connectivity. Specifically, a power law relationship between nodal mass flux (s) and connectivity (k) is evident for food flows, and it is consistent across village, national, and global food networks.
A linear relationship is fit to log(s) and log(k) such that \(\log(s) = \log(a) + b\,\log(k)\), which is equivalent to \(s = a\,k^{b}\), where b is the power law exponent (the slope in log-log space). The parameters of the power law relationship for undirected food and non-food flows are provided in Table 3. The statistical distribution parameters change with the scale of analysis, indicating that there is scale dependence, despite the fact that the power law exists across scales. The power law exponent is the highest for global trade (slope = 2.7; see Table 3), but is similar for national and village scales (slope = 1.5 and 1.6, respectively). The power law relationship is less clear for non-food flows. Note that the points exhibit more scatter in Fig 3 than in Fig 2. Likewise, the exponents are consistently smaller for non-food flows than for food flows. Again, this indicates that food and non-food flow networks exhibit different properties, which may be due to their underlying unique attributes.
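A straight-line fit in log-log space, as described above, amounts to ordinary least squares on the logged values. The sketch below assumes the fitted form is s = a·k^b, where b is the exponent discussed in the text and a is a hypothetical name for the prefactor; the function name and the synthetic data are illustrative only.

```python
import math


def fit_power_law(k, s):
    """Least-squares fit of log(s) = log(a) + b*log(k); returns (a, b)."""
    x = [math.log(ki) for ki in k]
    y = [math.log(si) for si in s]
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx                          # slope = power law exponent
    a = math.exp(y_mean - b * x_mean)      # intercept mapped back to the prefactor
    return a, b


# Synthetic degrees and strengths generated exactly from s = 2 * k^1.5.
degrees = [1, 2, 4, 8, 16]
strengths = [2.0 * ki ** 1.5 for ki in degrees]
a, b = fit_power_law(degrees, strengths)
print(f"a = {a:.3f}, b = {b:.3f}")
```

An exponent above 1 means that doubling a node's degree more than doubles its strength; with b = 2.7, for example, doubling k multiplies s by 2^2.7, or roughly 6.5.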
What are the implications of a power law relationship for node strength versus degree? Particularly for global trade, the high b value indicates that there is a strong relationship between the mass that each nation trades and its number of trade partners. In other words, node strength grows faster than node degree, so each additional trade connection is associated with a disproportionately larger exchange of commodity mass. This relationship is highly nonlinear. As a result, shocks to trade relationships may prove highly disruptive to national access to the mass of food commodities, unless trade patterns are allowed to adapt. This is another statistical attribute that future efforts to model food flows may endeavor to incorporate.

Table 4. Parameters for in-food and out-food commodity flow networks. Note that a generalized exponential distribution is fit to node degree, Gamma distribution is fit to node strength, and a power law is fit to node degree versus strength.

Network clustering and centrality.
The clustering coefficient enables us to evaluate the tendency of nodes in the network to form tightly connected groups. In Fig 8D, it is clear that node clustering decreases with spatial scale. In other words, nations are more likely to form 'cliques' than are households in a village [34]. However, node clustering decreases in a much less consistent manner than do degree and strength. This can be seen by the fact that the whiskers in the box-whisker plot overlap for clustering. Scale dependency in network parameters indicated by Fig 8D likely arises as a result of the aggregation process in food fluxes from smaller to larger scales of analysis.
Fig 9 presents the relationship between betweenness centrality (B) and degree (k) for food fluxes by spatial scale. Here, an exponential function is used to fit the relationship between B and k. Specifically, we fit the following function: \(k = a \exp(b B)\). The shape of the B versus k relationship is consistent across scales and between food and non-food commodity groups (shown in Fig 9). The shape of the relationship is relatively similar with or without the incorporation of directionality. However, directed networks exhibit higher centrality (i.e. note the red points have larger x-axis values). In other words, certain nodes play a more central role in networks in which direction is taken into account. Interestingly, this pattern is reversed for village networks. In Fig 9E and 9F, the red and black lines flip and the black points have larger x-axis values. This means that direction is less important to node centrality at the village scale than it is at the national and global scales.
The nodes that have both high k and B values are referred to as the network 'core'. A core has been shown for global [3] and national [17] food flows. We confirm that this relationship exists for both undirected and directed food and non-food commodities (refer to Fig 9), although a core group of nodes is more pronounced for food networks. We also show that a core group of nodes exists at the smallest spatial scale. In fact, core households at the village scale exhibit the highest centrality of all food flow networks. Note that the scale on the x-axis is an order of magnitude larger for the village scale than for the global scale. The relationship between connectivity and centrality is similar across spatial scales. Yet, scale dependence also exists since the statistical attributes of B vary with spatial scale. This result is shown in Fig 8C, in which B increases with decreasing spatial scale. At the village scale, B is more concentrated amongst some households that are dramatic outliers. These B outliers are not evident in the global and national scales. So, some households in the village scale are more central to its food flow network than any CFS areas are to the national network or countries are to the global network. In this way, the village is more vulnerable to the removal of its core households.

Table 5. Total export quantities for China and the United States in 2009. Quantities are shown by HS code for the food commodities (i.e. HS codes 1-24). HS codes 25-97 are non-food commodities. Note that the U.S. exports an order of magnitude more cereals and oil seed crops than does China, but that China exports an order of magnitude more in other food categories, such as beverages, spirits, and vinegar (with a high mass). Total values are shown for both food and non-food exports.

Conclusion
Food flow networks may prove to be an important adaptation measure to cope with future climate and economic shocks. For example, if extreme climate events increase in frequency as projected under a changing climate, the ability to transfer commodities in both space and time may help those production locations that experience shocks to maintain consumption by importing from external sources. As such, it is essential to understand the scaling properties of food commodity flow networks so that we can understand how to model food flows and evaluate opportunities and roadblocks to the spatial and temporal redistribution of goods. Information on the scaling properties of food flow networks will also enable prediction of flows for the many locations and settings for which food transfer data is not available.
We have examined the empirical network structure of food commodity exchanges across the full spectrum of spatial scales. Village scale donations of food between households are the smallest spatial scale at which social food fluxes can occur; global scale international food trade between countries represents the largest possible spatial domain. Empirical evidence suggests that both scale dependent and invariant properties exist for food flow networks. Network parameters such as mean node connectivity, mass flux, and centrality vary with spatial scale, likely due to the aggregation process of food fluxes. Yet, we find that the statistical distribution functions of node connectivity and mass transfers are invariant across scales. Likewise, the relationship between node connectivity and mass flux exhibits a power law relationship for each spatial domain. These relationships hold for commodity fluxes weighted in both mass [kg] and value [$] units and for both undirected and directed networks. However, non-food commodities are not well fit by the same statistical distributions across spatial scales. This highlights that there are unique attributes of food transfers that give them properties distinct from those of non-food transfers.
The network structures of food flow systems provide a signature of their vulnerability and resiliency to disturbance. Extensive research has explored the implications of certain network structures for vulnerability and resiliency. For example, networks with a power law node degree distribution have been shown to be vulnerable to targeted attack, but resilient to random attack. Future research is needed to explore the implications of the statistical network distributions of food flows presented here. Scale invariant properties indicate that similar governing mechanisms are likely influencing food flows across scales, despite the fact that these systems are typically thought to arise from starkly different generative processes. We hypothesize that universal signatures of human behavior may lead to the similarities in food network statistical distributions across scales. For example, the human tendency for risk-sharing and cooperation may be an important mechanism generating the emergent food exchange patterns. Future research can build upon this work by modeling network formation processes and estimating food flows at resolutions lacking data.