Thermodynamics of Random Reaction Networks

Reaction networks are useful for analyzing reaction systems occurring in chemistry, systems biology, or Earth system science. Despite the importance of thermodynamic disequilibrium for many of those systems, the general thermodynamic properties of reaction networks are poorly understood. To circumvent the problem of sparse thermodynamic data, we generate artificial reaction networks and investigate their non-equilibrium steady state for various boundary fluxes. We generate linear and nonlinear networks using four different complex network models (Erdős-Rényi, Barabási-Albert, Watts-Strogatz, Pan-Sinha) and compare their topological properties with real reaction networks. For similar boundary conditions the steady state flow through the linear networks is about one order of magnitude higher than the flow through comparable nonlinear networks. In all networks, the flow decreases with the distance between the inflow and outflow boundary species, with Watts-Strogatz networks showing a significantly smaller slope compared to the three other network types. The distribution of entropy production of the individual reactions inside the network follows a power law in the intermediate region with an exponent of circa −1.5 for linear and −1.66 for nonlinear networks. An elevated entropy production rate is found in reactions associated with weakly connected species. This effect is stronger in nonlinear networks than in the linear ones. Increasing the flow through the nonlinear networks also increases the number of cycles and leads to a narrower distribution of chemical potentials. We conclude that the relation between distribution of dissipation, network topology and strength of disequilibrium is nontrivial and can be studied systematically by artificial reaction networks.


Introduction
Connecting network theory with thermodynamics was an idea already present more than 40 years ago under the term network thermodynamics [1]. Despite the fact that the terms were used in combination, the theory was merely a graphical representation of conservation equations and did not make any statements about complex networks, as they are known today. In 2006 Cantú and Nicolis [2] studied thermodynamic properties of linear networks, but limited themselves to small networks, which they were able to handle analytically. Here, we extend their study by generating big random linear and nonlinear reaction networks and simulating them to a thermodynamically constrained steady state. This might contribute to a framework that allows to test methods for reconstructing thermodynamic data of reaction networks [3,4] and lead to a better thermodynamic understanding of reaction networks in general. Possible applications of this approach include the thermodynamic investigation of reaction models in biology [3][4][5], origin of life [6] and also Earth system and planetary science [7,8].
We look at reaction networks as thermodynamic systems that transforms two chemical species into one another [2]. The environment is driving the network to thermodynamic disequilibrium by keeping the concentration of two species constant. In the following, we will call the chemical species that are kept constant 'boundary species', because they are the species to which the boundary conditions are applied to.
Our basic assumption is that the network is able to transform the two boundary species into each other. This is not always possible in real reaction networks where the transformations are constrained by stoichiometry of chemical constituents. For example, any chemically sound reaction model will implicitly forbid pathways that transform N 2 O into H 2 O. Even if the artificial networks we create are comparable in density, they are not created with this constraint. This is due to the implications this constraint would have on the complexity of the boundary conditions. Omitting it leads to the existence of many transformation pathways between most pairs of randomly chosen boundary species, otherwise almost all pairs of boundary species would just have a steady state flow of zero between them.
We study different quantitative properties of the networks at steady state. In particular, because cycles have been reported to have important functions in networks [9][10][11], we look at the cycles that appear in the flow pattern. These cycles depend on the direction of the flow of each reaction, which in turn depends on the strength of the thermodynamic disequilibrium caused by the boundary condition.
In the next section we describe our method for generating reaction networks so they resemble different complex network models and how we simulate them to find their non-equilibrium steady state. We then present our results concerning the flow through the networks, the distribution of entropy production of individual reactions, and the dependency of cycle number from flow through the nonlinear networks.

Reaction Networks
Reaction networks [12] consist of a set of species M combined with a set of reactions R. They contain information on the connection of chemical species through reactions and include the stoichiometric constraints given by the reactions. Mathematically, a reaction network can be described by two stoichiometric matrices L and R. L ij is the coefficient of the i-th species on the left side of the j-th reaction and R ij is the coefficient of the i-th species on the right side of the jth reaction. Combining both matrices gives the stoichiometric matrix N = R−L, for which the element N ij in i-th row and j-th column gives the effective change of species i by reaction j. Given a relation v = v(x) between reaction rates v and species concentrations x, one can associate the reaction network with the dynamics of an ordinary differential equation (ODE): In complex network science instead of looking at a bipartite graph, where reactions and species are represented by different types of nodes which are connected by edges, often the substrate graph is used. In this simplified view the nodes represent the species and an edge between two species is present if and only if there is a reaction having those two species on different sides of the reaction equation (Fig. 1, (B')) [13,14].

Network Construction
Our artificial reaction networks are generated in three steps. We first generate a simple directed network (graph) consisting out of a set V of N nodes and a multiset E [ V×V of M edges. These networks are generated following the models of Erdős-Rényi [15], Barabási-Albert [16] (scalefree), Watts-Strogatz [17] (small-world, clustering) and Pan-Sinha [18,19] (hierarchicallymodular). We are always using variants of these network models that allow formation of self loops and multiple edges between the same nodes. Also, we generate networks with a fixed number of edges. From these complex networks the reaction network is constructed.
Simple reaction networks are created by translating each edge into a reaction of the form X⇋Y with X being the first and Y being the second node. In the rate equation of mass action kinetics this leads to a linear dependence of the reaction rates from the concentrations and thus we are calling these networks "linear" reaction networks. Nonlinear reaction networks are generated out of directed networks by combining pairs of edges to second order reactions of the form X+Z⇋Y+W. The selection of pairs is done with a probability distribution that maintains the characteristic properties of the substrate graph as much as possible. This is done by considering the probability of newly introduced edges in the substrate graph in the originally used network model. For example, consider the combination of the reactions A⇋B and C⇋D to create the reaction A+C⇋B+D. This leads to two new edges in the substrate graph between A and D as well as between C and B. The probabilities of these two edges in the original network model are then used to calculate the probability of the combined reactions.
Finally, the thermodynamic data is generated and assigned to species and reactions. In the following, the generation process of nonlinear networks specific to the different network models is explained before the generation of thermodynamic data is specified in detail.
Erdős-Rényi (ER). In the Erdős-Rényi network model [15] all possible edges have the same probability. We create these networks by simply drawing the nodes of every edge from the set of all nodes with uniform probability. For the construction of nonlinear reaction networks, second order reaction equations are then chosen from the set of pairs of linear equations with uniform distribution. Note that linear equations that are used as part of a nonlinear equation are not returned leading to the probability of all pairs of linear equations containing it being set to zero for subsequent couplings.
Barabási-Albert (BA). For generating scale free networks, the Barabási-Albert models are used [16]. In this model nodes are added consecutively. Newly added nodes are connected to the network by introducing edges between it and already existing nodes. The selection of nodes to attach to is done with probability scaling with their node degree (preferential attachment).
The coupling probability of linear reactions is calculated from the product of the node degrees of the chemical products. In principle, other functional dependencies are possible, but for simplicity we choose this one and check that it maintains the power law distribution of the node degree in the associated substrate graph (Fig. 2 (A)).
Watts-Strogatz (WS). Networks having a comparable average path length to Erdős-Rényi but with a higher clustering are generated with the Watts-Strogatz model [17]. From a circular lattice like structure, a fraction α is randomly reordered. For the size of our networks we choose a value of α = 0.1 (see Table 1). Cumulative degree scaling of real network's substrate graphs shows pronounced scale free property in comparison with their null models (randomized counterparts) for Earth's photochemistry [22] and a kinetik model of Yeast's metabolism [25]. (C) Cumulative distribution of the standard change of Gibbs energy of formation (Dm 0 j ¼ P i N ij m 0 i ) for artificial networks and respective thermodynamic reference data for glycolysis (see [28], For the creation of nonlinear networks we only form those couplings between linear reactions which lead to two new close edges in the substrate graph. Here "close" means that their distance in terms of the circular lattice is not larger than the largest distance of non reordered edges in it. It would have be possible to use a more sophisticated approach and use the parameter α as the probability of introducing a far edge in the substrate graph while coupling. But because even our simple method does not achieve a clustering coefficient as high as equivalent linear networks (Table 1) we use this simple method.
Pan-Sinha (PS). Hierarchically-modular networks are generated starting with uniformly partitioning the nodes into 2 h elementary modules, with h being the number of hierarchical levels of the network. On the first level two pairs of modules on the elementary level are joined to form a new module, leading to 2 (h−1) modules on the first level. Analogous, for all other levels modules of the level below are joined pairwise, up to the h-th level where there is just one module consisting out of the entire network. When edges are added to the network this happens with a probability proportional to the lowest level l in which the two nodes to be connected share a module. Two nodes that share an elementary module are connected with the probability p 0 whose value is given by normalization. Nodes whose lowest common level is l are connected with probability p 0 p l . For our networks we choose p = 0.5 and h = 8.
When creating nonlinear reactions we assign each possible coupling a probability proportional to the product of the probability of the two newly introduced edges in the original model. Assuming a coupling leads to new edges in the substrate graph between nodes with lowest common module on levels l 1 and l 2 , then the probability of choosing this coupling is scaled with p l 1 p l 2 .
Parameters. For network construction we generate linear reaction networks with N = 1000 species and M = 5000 first-order reactions. Nonlinear networks are build by generating a linear network with M = 3000 reactions and connecting C = 1000 of them to second-order reactions. To compare linear and nonlinear networks directly we also generate linear networks Table 1. Network properties. Properties of the substrate graphs of artificially generated networks as well as of examples of real networks. Table contains the number of vertices (jVj) and edges (jEj), the mean shortest path length (< L >) and the clustering coefficient (< C >) of the respective undirected network. The modularity is calculated using the walktrap community finding algorithm [38]. Data for real networks is taken from a database for Earth's photochemical reactions [22], models for the combustion of Ethanol [23] and Dimethyl ether [24] and a kinetic model of Yeast's metabolism [25]. For the artificial networks and the randomizations of the real networks mean values and standard deviations are calculated from 10 samples. from the substrate graph of the nonlinear networks. This comparison is not possible with the generated linear reaction networks because their substrate graph is not as clustered. An overview of all generated networks is shown in Table 1.

Thermodynamics of Reaction Networks
Thermodynamic properties of reaction networks can be described by non-equilibrium thermodynamics [20,21]. For simplicity and due to the artificial nature of our simulations we use unitless equations with the Boltzmann constant k B and the temperature T set to one in this work (we do not consider variations in T).
The change of entropy dS can be separated into the exchange of the system with the environment d e S and change through processes in the system d i S: The entropy exchange with the environment (with constant temperature and pressure) is given by with d e x k being the change of concentration due to interaction with the environment and μ k being the chemical potential. The entropy change through internal processes d i S is given accordingly: From this equation the rate of entropy production can be calculated. By rewriting we see that the entropy production of the network s tot ¼ d i S dt is merely the sum of the entropy production σ i of all individual reactions with σ i = ∑ m μ m N mi v i : In steady state the entropy production of individual reaction i can be also written as function of forward and backward reaction rates v i,+ and v i, − [21]: This relation can be applied to calculate the total entropy production rate σ tot = ∑ i σ i of a reaction network acting between two boundary species b 1 and b 2 kept at concentrations c 1 and c 2 . As the entropy production rate in steady state only depends on the boundary conditions (c 1 ,c 2 ,v = v + −v − ) we can replace the entire network with one imaginary linear reaction b 1 ⇋b 2 . If we assume the Gibbs energies of formation of boundary species to be zero, the forward and backward rate coefficients are equal and we obtain the equation Alternatively one could also get this result by calculating the boundary species entropy exchange with the environment, because in steady state 0 = dS = d e S+d i S.
Generating Thermodynamic Data. The Gibbs energies of formation of the species m 0 i are drawn from a normal distribution N(0,1). Reaction rates are calculated using the Arrhenius equation (with the prefactor A set to 1): Here, E a is the activation energy which is sampled from the distribution for every reaction. We have chosen this distribution, which resembles the Planck-distribution, because it has an effective non-zero lower bound while still having a large tail to the right (Fig. 1). We simulate all reactions reversibly. Forward and backward reaction are energetically constrained by the Gibbs energies of the species. Thus, we sample E a just once for every reaction and assign it to that reaction direction which respective products have a higher Gibbs energy of formation, either E e;i ¼ The second reaction directions activation energy is then given by the This expression is a reflection of the fact that in equilibrium, forward and backward reaction rates need to balance.

Network Simulation
As we are interested in the steady state of the network under thermodynamic boundary conditions we solve the reaction equation while keeping the concentration of two selected chemical species b 1 , b 2 at fixed concentration c 1 , c 2 . To remove the effects of the energy difference between the boundary species on the flow, we set their Gibbs energy of formation m 0 i to zero and recalculate reaction rates before the simulation. For solving the reactions' ODE, the integrator of C++'s boost library is used. The selected algorithm is "Dormand-Prince 5". Concentrations are initialized normally distributed with c 1 þc 2 2 taken as mean and jc 1 −c 2 j as standard deviation. Dynamics are simulated up to a time t of 50000 or up to the time when the mean square change of concentration (per species and time-step size) is smaller than 10 −20 .
We assume that the greatest topological factor influencing flow through the reaction network is the shortest path distance between the boundary species. Because we cannot perform simulation and analysis for one million pairs of boundary species, we sample 50 pairs of boundary species for all values of the shortest path occurring in the network.
A simple investigation of network flow and entropy distribution is done with boundary species concentrations set to c 1 = 0.1 and c 2 = 1. To get an error estimate, we generate 10 independent samples of every network type.
To investigate the response of the nonlinear networks to an increase in thermodynamic disequilibrium we vary the boundary conditions accordingly. For this we keep c 1 at 0.1 while varying c 2 from 0.2 up to 60. With higher values of boundary concentrations we notice an extreme increase of computational time needed to solve the individual ODEs. Thus, we are only able to simulate one network sample of every type for this setup.

Results
For our study we generate various random networks for each network model (Table 1). We simulate these networks for different boundary conditions and analyze the resulting steady state. In the following, we first compare the artificial networks with real networks and then show in detail how the flow and and energy dissipation depend on network structure and boundary condition.

Network Structure
We compare the topological features of our artificially generated networks with real world networks (Table 1). For this we use a compilation of chemical reactions in Earth's atmosphere [22] and models for the combustion of Methane [23] and Dimethyl ether [24]. Also a kinetic model of the metabolic network of Yeast [25], available through the BioModels Database [26], is investigated. To avoid that the representation of networks as substrate graphs biases our results [27] we compare each network with a randomized version of itself. When randomizing an artificial network we would obtain an Erdős-Rényi network with the same density and the same types of reactions. Thus, rows for randomized BA, WS, and PS networks are omitted in Table 1.
The power law scaling for Earth's atmospheric reaction network and the metabolic network of Yeast are clearly pronounced in comparison with their respective null models (Fig. 2 (B)). This is not true for the Ethanol combustion chemistry whose size of 57 species (nodes) does not allow to unambiguously decide on the scale-free property. The substrate graphs of the two combustion chemistries show the properties of small world networks, they have a small mean shortest path length and a high clustering coefficient. As their null models show the same properties, this can be attributed to their high density. All reaction networks have more cycles than their randomized counterparts. With the exception of the network from Yeast's metabolism all real networks also have a higher clustering coefficient or a higher value for modularity.
For a comparison of the artificial reaction networks with real thermodynamic data we use a table of reaction free energies (Δ r G 0 ) of reactions in glycolysis [28]. In our networks this corresponds to Dm 0 j ¼ P j N ij m 0 i . Because there is no way to assign a unique reaction direction to the reactions in the artificial networks, we are only comparing the distributions of absolute values jm 0 j j. The normalized (mean set to one) cumulative distributions show a more localized distribution with a wider tail for the data from glycolysis. The distribution for the artificial networks is over all more regular. The bimodal distribution for the data from glycolysys might be related to the fact that it describes two distinct processes, the tricarboxylic acid cycle and the pentose phosphate pathway.

Distance Dependency of Flow
To characterize the strength of the steady state flow for different network types, we start with the intuitive assumption that the main factor determining the flow is the distance between the two boundary species in the reaction network, measured by shortest path length d in the substrate graph. The dependency of the mean flow on shortest path length is shown in Fig. 3.
The flow through reaction networks created with small-world and clustering topology (Watts-Strogatz model) shows to be especially weakly dependent on boundary species distance d. In the linear as well as the nonlinear case these networks have a lower mean flow for small d ( 4) while for larger values of d, they have generally a larger flow than the other networks. We hypothesize that the flow for boundary points whose distance is close to the diameter is limited by the sparse connection of those boundary species to the network. The high clustering of Watts-Strogatz networks (cf. Table 1) apparently leads to their exceptional high flow for boundary points with a large distance d. This also agrees with the low sensitivity to boundary species distance that the Watts-Strogatz networks show.
The linear networks generated out of the Erdős-Rényi model and those generated with the Pan-Sinha model show a strikingly similar behavior. This may be due to their similar degree distribution (not shown).

Varying Flow through Nonlinear Networks
Unlike in linear networks, the flow and dissipation distribution in nonlinear networks depend on the absolute concentrations of the boundary species. For the variation of boundary concentration flow dependency of the concentration difference is in an intermediate regime (Fig. 4  (A)) and the slope in log-log plot takes a value between 1 and 2. This is plausible since the network consists of a mix of linear reactions and nonlinear reactions with at best quadratic behavior. Theoretically a stronger than quadratic dependency of flow from concentration difference would be possible for a specific boundary condition and a specific concentration range, but this possibility seems not to influence the mean behavior.
We look at the distribution of chemical potentials μ = μ 0 +ln(x i ) inside the reaction network for different strengths of the boundary condition. In Fig. 4 (B) the distributions P(μ) are shown for the simulated Barabási-Albert (BA) reaction networks with boundary species distance d of 3. The distributions in general are localized between the chemical potentials of the boundary species μ b 1 = ln(x b 1 ) and μ b 2 = ln(x b 2 ) (remember that μ 0 for boundary species is set to zero). While the distributions are almost uniform in this range for low flows, at higher flows the distributions are more shifted towards the upper part. Normalizing the standard deviation σ μ by Δμ = jμ b 2 −μ b 1 j confirms this finding (Fig. 4 (D)) and shows a narrower distribution relative to the chemical potentials of the boundary species.
The distributions of dissipation values of the reactions are to noisy to find out if they also get narrower for higher flows. Thus, we calculate the fraction of the dissipation explained by the 10% of reactions with the highest dissipation, f σ (0.1). We see that with higher flows the fraction of dissipation explained by these 10 percent of the network decreases ( Fig. 4 (C)). The networks generated from the Watts-Strogatz (WS) and the Pan-Sinha (PS) networks show an increase of f σ (0.1) for lower values, but above a flow of around 5 they also decrease. Explained differently, for higher flows one needs a larger part of the network to explain a given fraction of its dissipation. Together with the narrower distribution of chemical potentials we interpret this as the thermodynamic disequilibrium leading to a tighter coupling of the reaction network. This coupling leads to the chemical potential of different species to be closer and to the dissipation being more evenly distributed among reactions.

Flow Dependency of Cycle Number in Nonlinear Networks
There are many indicators that cycles have an important function in networks [9][10][11]. Cycles function as feedback mechanisms and stabilize the dynamics of the system against perturbations. Also cyclicity has been related to thermodynamic efficiency [29]. To check if there is a The number of small cycles is dependent on local topological properties of the network models. Thus, for evaluation we subtract the number of cycles found in networks with randomly chosen reaction directions (Table 1). For all network types we find a clear increase in the number of cycles with increasing flow (Fig. 5). This formation of additional cycles can be understood as the network self-organizes in thermodynamic disequilibrium to increase its flow and dynamic stability. Note that this supports the idea of the previous section of a closer coupling of the network with higher degree of disequilibrium.

Distribution of Entropy Production Rates
To see how dissipation is distributed inside of the networks, we calculate the entropy production rate for the individual reactions σ (Eq. 5) and look at their distribution for specific network topologies and boundary conditions. To better see the power law dependency, we plot the cumulative distribution 1 À R s À1 Pðs 0 Þds 0 , which describes the probability of the entropy production rate being higher than σ [30] instead of P(σ).
The distributions show no large qualitative differences between the different network models (Fig. 6). The power law in the intermediate regime is differently pronounced in its extent for different network types but the greatest difference is clearly seen between the slopes of linear and nonlinear networks. Assuming that P(σ) follows a power law, we get an exponent of about −1.5 for linear networks and of −1.66 for nonlinear networks. The steeper slope of the nonlinear networks can be interpreted as an effect of their reactions being better coupled. This can be seen by the fact that nonlinear (A+B⇋C+D) reactions are not depleting a potential between two species directly but there is always the probability that they increase the potential between two other species. The coupling implies a stronger connection of the flow between individual reactions and by this a stronger connection with the magnitude of dissipation.

Connectivity Dependence of Dissipation
To evaluate how the dissipation of a reaction depends on the connectivity of the involved species, for every species we calculate the mean dissipation of all reactions connected to it. Plotting the mean dissipation depending on the degree centrality of the species (in the substrate graph) shows a relatively high dissipation for reactions adjacent to lowly connected species (Fig. 7). This effect is more pronounced for nonlinear networks. When looking for reactions with high dissipation we should search in the vicinity of lowly connected species. This can be explained by the stronger connection between reactions generating and consuming the species. When the rate of a reaction that produces a species is increased, the additional flow has to be distributed over the consuming reactions. If there are many consuming reactions, there are more potential pathways to forward the flow while keeping the mean dissipation rate low.

Conclusions
We have simulated random reaction networks under thermodynamic constraints in order to provide insight into how energy is dissipated in complex reaction networks in thermodynamic disequilibrium. We observe a clear difference between linear and nonlinear networks. The power law has an exponent of % −1.5 for linear and a slightly lower exponent of % −1.66 for nonlinear networks. However, there are no qualitative differences between the distributions of entropy production rate for different complex network models like Erdős-Rényi, Barabási-Albert, etc. (Fig. 6). The differences between thermodynamic reaction networks of different topologies are more pronounced in the flow (Fig. 3) than in the other properties investigated.
We found that a greater disequilibrium in nonlinear reaction networks is associated with a more tightly coupled network. For a greater flow, the network self-organizes and maintains a greater number of cycles (Fig. 5). A greater flow also leads to a narrower distribution of chemical potentials (Fig. 4 (B), (D)). This is associated with results that suggest that for higher flows, a larger fraction of the network is necessary to explain a given fraction of its dissipation (Fig. 4  (C)). We interpret this as an increase in the system's complexity that comes along with a higher thermodynamic disequilibrium.
Finally, we found that reactions involving lowly connected species tend to dissipate more energy, which is more pronounced in nonlinear networks, but is also found in linear networks  (Fig. 7). This might help to identify reactions that play central roles in the energy dissipation of a complex reaction network.
We also showed how our artificial networks share topological properties with real reaction networks. The artificial networks are toplogically more similar to Earth's atmospheric chemistry and Yeast's metabolism than to the two investigated combustion chemistries. The main discriminating factor here is the high density of those two combustion chemistries. The distribution of thermodynamic parameters in the artificial networks only roughly matches data from reactions of glycolysis (Fig. 2 (B)). Obviously, the amount of thermodynamic data (37 reactions) is quite limited. Current progress in bioinformatic methods to reconstruct thermodynamic data [31][32][33] may improve the availability of such data in future and allow a better analysis.
Nevertheless, a fundamental problem of such a comparison remains. It is the way the data of reaction networks is obtained. In networks from chemical models, experimentalists and modelers have made a decision on which reactions are relevant. Experimentalists only find reactions that are occurring and are measurable in the systems they investigate. Also the modelers might just decide to exclude reactions with low reaction rates from their models. Hence, the reaction network taken from a model is already biased with respect to the model's intention. Our approach with artificial networks, however, assumes the artificial network is a set of (hypothetically) possible reactions; which reactions become important emerges from the dynamics and can be different depending on the boundary conditions. Thus, we suggest to investigate such emergent phenomena in the future. This is possible by taking smaller artificial reaction networks and then looking at their reaction pathways using elementary flux modes [34]. This would also allow to test the relationship between the rate of an elementary mode and its entropy production [35,36]. The thermodynamics of reaction networks and of cycling processes therein may also provide insight into the origins of life. Revealing how thermodynamics constraints the behavior of complex reaction networks will be an important ingredient in understanding the role of thermodynamics in domains like prebiotic chemistry [11,37], biogeochemistry, and cellular systems.