Construction and Analysis of the Model of Energy Metabolism in E. coli

Genome-scale models of metabolism have only been analyzed with the constraint-based modelling philosophy and there have been several genome-scale gene-protein-reaction models. But research on the modelling for energy metabolism of organisms just began in recent years and research on metabolic weighted complex network are rare in literature. We have made three research based on the complete model of E. coli’s energy metabolism. We first constructed a metabolic weighted network using the rates of free energy consumption within metabolic reactions as the weights. We then analyzed some structural characters of the metabolic weighted network that we constructed. We found that the distribution of the weight values was uneven, that most of the weight values were zero while reactions with abstract large weight values were rare and that the relationship between w (weight values) and v (flux values) was not of linear correlation. At last, we have done some research on the equilibrium of free energy for the energy metabolism system of E. coli. We found that (free energy rate input from the environment) can meet the demand of (free energy rate dissipated by chemical process) and that chemical process plays a great role in the dissipation of free energy in cells. By these research and to a certain extend, we can understand more about the energy metabolism of E. coli.


Introduction
Since various 'Omics' datasets are becoming available, biology has transited from a data-poor to a data-rich environment. Systems biology has become a rapidly growing field as well [1]. Genome-scale models of metabolism have only been analyzed with the constraintbased modelling philosophy [2,3]. Genome-scale network models of diverse cellular processes have been generated and there have been several genome-scale GPR (gene-protein-reaction) models [4][5][6][7][8][9][10]. An extensive set of methods for analyzing these genome-scale models have also been developed and have also been applied to study a growing number of biological problems [12,13]. But research on the energy metabolism of organisms just begin in recent years, such as FVA (flux variability analysis) [14,15] and EBA (energy balance analysis) [16][17][18], and so on. All these methods are depended on the modelling of energy metabolism system of organisms. Data of Gibbs free energy of formation of every compound and Gibbs free energy change of every reaction is the core in this kind of modelling. Up to now, the most detailed genome-scale GPR model is the iAF1260 version of E. coli [5], but the modelling of it's energy metabolism is still incomplete [5]. There are 2381 reactions (not including the reaction defined for growth) and 1039 metabolites in E. coli_iAF1260, and apart from 304 EX_ & DM_ reactions (The text 'EX_' denotes an exchange reaction for a metabolite that can enter or leave the extracellular compartment. 'DM_' reactions are similar and signify compounds that the degradation pathway is unknown), the reconstructed reaction number is 2077 [5]. By the newest group contribution method (GCM), D r G of 1996 reactions (96%) and D f Gof 872 compounds (84%) can be estimated [19][20][21][22]. There leaves Gibbs free energy change (D r G) of 81 reactions (4%) and Gibbs free energy of formation (D f G) of 167 compounds (16%) unknown for E. coli_iAF1260 [5,19]. We have complemented, by computational method, the remaining unknown D r G (including the standard Gibbs free energy change D r G' 0 and the free energy change of reaction at 1 mM concentrations for all species D r G' m ) and D f G (just the standard Gibbs free energy formation D f G' 0 ). Energy metabolism models of other organisms, as we know, do not exist up to now. Research on metabolic weighted complex network are also rare in literature except that Almaas has used flux value as the weight of metabolic network [23].
In this paper, we have done three research on the complete model of E. coli's energy metabolism. First, we construct a metabolic weighted network using the rates of free energy consumption in metabolic reactions as the network weights. Then we did some research on some structural characters of the metabolic weighted network we constructed. At last, we did some research on the equilibrium of free energy for the energy metabolism system of E. coli.

Materials and Methods
Before constructing the metabolic weighted network, we complement the remained unknown D r G and D f G in E. coli _iAF1260. Then we construct the model using the rates of free energy consumption in metabolic reactions as the network weights.
At first, we draw the metabolic unweighted network of E. coli; we then calculate the flux distribution of E. coli_iAF1260; the third, we calculate the weights of the metabolic weighted network of E. coli that we will construct; at last, we calculated the input and output of free energy about E. coli.

Complement the Remained Unknown Free Energy
Change of E. Coli_iAF1260 1) Infer the unknown standard Gibbs free energy of formation for 167 compounds. The stoichiometric matrix, S, is the center-piece of a mathematical representation of genomescale metabolic networks. It represents each reaction as a column and each metabolite as a row, where each numerical element is the corresponding stoichiometric coefficient. In the calculation of D f G of compounds or D r G of reactions, we should use ''reaction with marvin charges (pH7)'' as the stoichiometry [5].
There are 1039 metabolites in the iAF1260 model, and if distinguishing the different compartments in the cell, i.e. [c] (cytoplasm), [e] (extracellular space), and [p] (periplasm), the total number of metabolites in the model is 1668. So there are 1668 rows in the stoichiometric matrix S. For a certain compound even in different compartments, the free energy change of formation is the same. In these 1039 metabolites, Gibbs free energy of formation of 872 compounds (84%) can be estimated by group contribution method [19][20][21][22], while 167 compounds (16%) remained unknown for E. coli_iAF1260. There are 2381 reactions in the iAF1260 model, so there are 2381 columns in the stoichiometric matrix S. For those 2077 reconstructed reactions, Gibbs free energy change of 1996 reactions (96%) can be estimated by group contribution method [19][20][21][22], while 81 reactions (4%) remained unknown for E. coli_iAF1260.
From the equation (1) and (2) of the paper [19], we can infer that Where D r G' 0 est is the estimated D r G' 0 ; v i is the stoichiometric coefficient of species i in the reaction, and m is the number of species involved in the reaction; D gr G' 0 j is the contribution of group j; n j is the number of instances of group j in the molecular structure; N gr is the number of groups for which D gr G' 0 j is known; est is the estimated Gibbs free energy of formation of i-th species. Equation (3) reflects the relationship between D f G' 0 and D r G' 0 for a reaction.
From those 167 compounds with unknown D f G' 0 , we seek out their involved reactions. Gibbs free energy change of one reaction can not be calculated, if a compound with unknown free energy change of formation appears in it. But the case is not for all. In the reactions involving the structural group with unknown energy, the structural group appears on both sides of the reaction, which means it cancels out of ''group energy change'' for the reaction. That is to say, while the compounds contain a structural group with unknown energy (such as ''R'' group, a pseudoatom) and appear in a reaction and the reaction does involve a change in the group, we can still calculate and estimate the energy change of the reaction, but we cannot estimate the formation energies of the compounds. Because a reaction with unknown D r G may include several compounds with unknown D f G, from equation (3), the values of those 81 unknown D r G and the values of those 167 unknown D f G may be interdependent. So we cannot infer the unknown D f G of compounds in a reaction just from the value of known D r G of this reaction, but we may infer the 167 unknown D f G of compounds from all of the values of known D r G by solving their simultaneous equations. Now we infer these unknown D f G' 0 from the known D r G' 0 data of those involved reactions. We use vector X  (3), we can obtain the following equation Where T (with dimension 16686167) is the transfer matrix from vector X to the vector indicating the values of D f G' 0 of entire 1668 compounds in the model of E. coli_iAF1260; S (with dimension 166862381) is the stochiomatrix of E. coli_iAF1260, S T is its transpose. Further we obtain Write it as Where S S~S T : T, F F~F{S T : P The dimension of vector F F is 2381, and there are 1996 F F(i) with known D r G' 0 . But there are 244 rows of S S : X with unknown D r G' 0 in the corresponding 1996 rows of F F(i), for the product of the corresponding rows of S S and X includes the sub-variables of X and while the remained 1752 rows of S S : X do not include the subvariables of X. So we can use these 244 rows from S S : X~ F F to get a new equationŜ Where the dimensions of matrixŜ S andF F are respectively 2446167 and 24461. By solving equation (7), we can obtain the D f G' 0 values of those 167 compounds which are unknown previously. Although equation (7) is not an exact equation (the row rank of matrixŜ S is not equal to the column rank of matrixŜ S), its solution is of least-squares.
2) Calculate the unknown Gibbs free energy change of 81 reactions. Conversely, we now use the obtained D f G' 0 data to calculate the 81 unknown D r G' 0 . The method is to substitute the solution value (defined as X 0 ) ofX X which we got from equation (7) to the equation (4), and by a simple calculation, we got the vector Now all the sub-variables of F 0 are known, so we can now obtain the D r G' 0 values of those 81 reactions which are unknown previously.
3) Adjust D r G' 0 (the standard Gibbs free energy change) to D r G' m (the free energy change of reaction at 1mM concentrations). The 1M reference state for the metabolite concentrations on which D r G' 0 is based does not accurately reflect the metabolite concentrations found in the cell (approximately 1 mM). Thus, we should computationally adjust all estimated D r G' 0 to the free energy change of reaction at 1 mM concentrations for all species, D r G' m . The relationship between D r G' m and D r G' 0 is as follows [5,19].
Where R is the universal gas constant; T is the temperature assumed to be 298 K; n i is the stoichiometric coefficient of compound i in the reaction (n i is negative for reactants and positive for products); PR is the set of products and reactants in this reaction. Note also that for H 2 , we should substitute 0.000034 for 0.001; For O 2 , we should substitute 0.000055 for 0.001; For H 2 O and H + , we should not include these compounds in the concentration portion of the calculation at all [5,19]. Here, all of the D r G' m values reported in our work have included the energy contribution of the transmembrane electrochemical potential and proton gradient for all reactions involving transport across the cytoplasmic membrane.

Unweighted Network of E. Coli_iAF1260
The general features of E. coli_iAF1260 are given in Ref. [5]. Two SBML (systems biology markup language) format files to the model E. coli_iAF1260 can be downloaded from the supplementary information of Ref. [5]. The in silico model that we used is E. coli_iAF1260_flux1.xml. SBML file properties are also given in Ref. [5]. The dimensions of rxns, mets, and genes are respectively 2382, 1668, 1261. The minimal media of in silico model is also an important aspect. The computational minimal   [5]. Column A -reactions with unknown D r G; . Column B -reactions with known D r G but involving compounds with unknown D f G; . Column C -reactions with known D r G but not involving compounds with unknown D f G; . Column D -total reactions with knownD r G. doi:10.1371/journal.pone.0055137.t001 media of E. coli_iAF1260 is also included in the supplementary information of Ref. [5]. In the method of constraint-based analysis, the biomass objective function (BOF) should be defined. The BOF was generated by defining all of the major and essential constituents that make up the cellular biomass content of E. coli [5]. Gene-protein-reaction associations embodied in rxnGene-Mat matrix, which is a matrix with as many rows as there are reactions in the model and as many columns as there are genes in the model. The ith row and jth column contains a one if the jth gene in genes is associated with the ith reaction in rxns and zero otherwise. The simulation condition (the nutrients and the uptake rates of the nutrients) of this paper is the same as in the file.

Flux Distribution of E. Coli_iAF1260
We now calculate the flux distribution of E. coli_iAF1260. The computational method we use is flux balance analysis (FBA) [11], one of the fundamental genome-scale phenotypic calculations, which can simulate cellular growth. FBA is based on linear optimization of an objective function, which typically is biomass formation. Given an uptake rate for key nutrients and the biomass composition of the cell (usually in mmol component gDW 21 and defined in the biomass objective function), the maximum possible growth rate of the cells can be predicted in silico.

max Z~vgrowth ð9Þ
Subject to Where S is the stoichiometric matrix, and a i and b i define the bounds through each reaction v i . The flux range was set arbitrarily high for all internal reactions so that no internal reaction restricted the network, with the exception of irreversible reactions, which have a minimum flux of zero. The inputs to the system were restricted to a minimal media. We use the COBRA toolbox [11] to carry out this computation of FBA. The flux distribution of E. coli_iAF1260 is illustrated in Fig. 1.

Metabolic Weighted Network Construction for E. Coli_iAF1260
By the newest group contribution method, D r Gof 1996 reactions (96%) and D f G of 872 compounds (84%) can be estimated [19][20][21][22]. We have complemented, by computation method, the remained unknown D r G (including the standard Gibbs free energy change D r G' 0 and the free energy change of reaction at 1 mM concentrations for all species D r G' m ) and D f G (just the standard Gibbs free energy formation D f G' 0 ). So a complete set of the data of free energy changes for reactions in E. coli can be obtained. . Now we can construct a metabolic weighted network. There is not a standard manner of determining reaction edge weights and Almaas has used flux value as the weight of metabolic network [23]. Here we use the rates of free energy dissipation in metabolic reactions as the network weights. For D r G' m of a reaction is the free energy dissipation in unit mol while flux is the passed mol number in unit time (as second). So the rate of free energy dissipation in every reaction is the multiplying product of the flux in this reaction and the free energy change of this reaction.
Where w i is the weight of i-th edge (i.e. reaction) of metabolic network, D r G' m i is the free energy change of i-th reaction and v i is the flux value of i-th reaction.

Calculation of Input and Output of Free Energy in E. Coli_iAF1260
For an open system at nonequilibrium steady state, from the theory of system science, its free energy rate dissipated by the system, E in , is in absolute value equal to the free energy rate input by the environment, E out .
We also distinguish the free energy rate E in to E ch in and E ph in respectively dissipated by chemical process and by physical process that take place in the cell (eq. 14a), while the free energy rate input from environment and through physical process can be neglected(eq. 14b).
For E. coli, we have known all of its reactions, the values of corresponding D r G' m and the values of corresponding flux, so we can calculate its E ch in and E out using the following equation (15) and (16).
Where R in is the set of reactions of the metabolic network excluding EX_ & DM_ reactions (The text 'EX_' denotes an exchange reaction for a metabolite that can enter or leave the extra-cellular compartment. 'DM_' reactions are similar and signify compounds that the degradation pathway is unknown), D r G' m i is the free energy change of i-th reaction in R in and v i is the flux value of i-th reaction inR in .
Where R out is the set of EX_ & DM_ reactions of the metabolic network, D r G' m j is the free energy change of j-the reaction in R out and v j is the flux value of j-th reaction in R out .

Complement the Remained Unknown Free Energy Change of E. Coli_iAF1260
With our method, we obtain Gibbs free energy change (D r G) of 81 reactions (see Table S1) and Gibbs free energy of formation (D f G) of 167 compounds (see Table S2) which are previously unknown for E. coli_iAF1260. We add our computed D f G' 0 of those 167 compounds to the former known D f G' 0 of 872 compounds, and obtain a complete set of D f G' 0 of E. coli_iAF1260. The entire D f G' 0 values of E. coli_iAF1260 are consistent with the known D r G' 0 of 1996 reactions (see Table S2 and Table 1). So we conclude that our computed D f G' 0 of those 167 compounds can also agree with the unknown D r G' 0 of 81 reactions. Up to now, there is no experimental data in literatures to test the D f G' 0 values of those 167 compounds and the D r G' 0 values of those 81 reactions.
It is important to know free energies for all metabolites and reactions in E. coli by using our method. First of all, the reason why GCM can't calculate the free energies for all metabolites and reactions in E. coli or other organisms is that the free energies of some molecular substructures are present in organic-inorganic complexes involving iron, nickel, or cobalt for which the new group contribution method has not been designed [19]. So if we use large scale free energy datasets, not confined to E. coli, such as free energies for reaction of KEGG [19], we will get free energies for more metabolites which can't be calculated by the GCM in ref. [19]. Even more, we can estimate some of the free energies of some molecular substructures in organic-inorganic complexes. So the method in our paper will directly contribute to GCM. At the same time, free energies for reactions are useful reference in determine the directions of reactions in cell [5,17] and can also be used as constraints in FBA [18], so if we know all the free energies of reactions for an organism, we can better carry out these tasks.

Some Structural Characters of the Metabolic Weighted
Network of E. Coli_iAF1260 1) Uneven distribution of the weight values of the metabolic weighted network of E. coli_iAF1260. We can calculate the weight values of the metabolic network of E. coli_iAF1260 using the above equation (12) (see Table S2). We can easily find that the distribution of the weight values is uneven and that most of the weight values are near zero while reactions with abstract large weight values are rare, illustrating in Fig. 2 and Table 2. The reason for the uneven distribution of weight values maybe lies in the uneven distribution of fluxes and the uneven distribution of Gibbs free energy change of reactions. From the uneven distribution of weight values, we can learn that there just are some main channels of free energy dissipation in the physiological process of E. coli. Table 3 has illustrated some reaction channels which have large weight values and Table 4 gives the functions of these reactions.
2) Reactions of large weight values and their related genes. Table 3 shows high w scopes, corresponding reaction number within these scopes and these reaction names, and we call these reactions the highly-dissipative reactions in the energy metabolism of E. coli. We examined into these large weights and found that they were the result of joint action from flux and free energy dissipation, while their D r G' m values were not the highest level. Table 4 gives all of the genes related to these reactions and the rules among genes in these reactions (rules are defined as the relationship among genes catalyzing a reaction such as ''AND, OR, NOT'' and these rules can be found in Supplementary   Information 1 of [5]), and we find that all of them are not essential genes from the literature [5]. This is important in the energy metabolism of E. coli, the deletion or loss of one gene will not result in death, and this may be due to the result of evolution.

3) Correlation between the weight values and the flux
values. By comparing the distribution of the weight values ( Fig. 2) with the distribution of fluxes (Fig. 1), we can also find that they are not consistent. Fig. 3 is the scatter diagram (w, v), 2077 data pairs in total. Many data pairs are superposition and locate at the same place. From the diagram, we can easily find that the relationship between w and v is not of linear correlation. So we can't say that a reaction with high flux has a corresponding high weight and vice verse, and in other words, we can't say that a reaction with high flux will dissipate more free energy. In fact, many different flux values correspond to the same weight value.
Although there is no consistency between flux values and energetic weights, energetic weights are very useful and important in determine the distribution of reaction fluxes. We have defined energetic weights as the rates of free energy dissipations, i.e. the multiplying product of the reaction fluxes and the free energy changes of reactions. Free energy dissipation can be regarded as the counterpart of entropy increase. Based on maximum entropy production principle (MEPP), the authors of the paper have developed a method to improve the prediction accuracy of flux balance analysis [24].

Equilibrium of Free Energy in the Energy Metabolism of E. Coli_iAF1260
There are 2077 reactions in R in and 304 reactions in R out of E. coli_iAF1260. The values of E ch in and E out which we calculated are respectively 21424.7 and 1890.1, see Table 5. So E out can meet the demand of E ch in and E out is a little more than E ch in . The absolute difference between E out and E ch in is an estimation of E ph in , and we can find that it takes about a quarter of E in , so we can conclude that chemical process plays a great role in the dissipation of free energy in cells while physical process can not be ignored.

Conclusions
In this paper, we constructed a metabolic weighted network by using the rates of free energy consumption within metabolic reactions as the network weights. We found several important and interesting results: 1) the distribution of the weight values was uneven; 2) the relationship between w (weight values) and v (flux values) was not of linear correlation; 3) by analyzing of the freeenergy equilibrium for the energy metabolism system of E. coli, we found that it is chemical process other than physical process that plays a great role in the dissipation of free energy in cells. By these research and to a certain extend, we can understand more about the energy metabolism of E. coli.
In our next step, we will conduct a similar type of analysis for different organisms using some of the other readily available constraint-based models and run the baseline simulations for growth in different carbon substrate environments. We will also do FBA analysis including energetic weighting as an additional constraint to bias flux distributions.

Supporting Information
Table S1 Standard Gibbs free energy formation D f G' 0 of 167 compounds.