Cellular determinants of metabolite concentration ranges

Cellular functions are shaped by reaction networks whose dynamics are determined by the concentrations of underlying components. However, cellular mechanisms ensuring that a component’s concentration resides in a given range remain elusive. We present network properties which suffice to identify components whose concentration ranges can be efficiently computed in mass-action metabolic networks. We show that the derived ranges are in excellent agreement with simulations from a detailed kinetic metabolic model of Escherichia coli. We demonstrate that the approach can be used with genome-scale metabolic models to arrive at predictions concordant with measurements from Escherichia coli under different growth scenarios. By application to 14 genome-scale metabolic models from diverse species, our approach specifies the cellular determinants of concentration ranges that can be effectively employed to make predictions for a variety of biotechnological and medical applications.


Author summary
We present a computational approach for inferring concentration ranges from genomescale metabolic models. The approach specifies a determinant and molecular mechanism underling facile control of concentration ranges for components in large-scale cellular networks. Most importantly, the predictions about concentration ranges do not require knowledge of kinetic parameters (which are difficult to specify at a genome scale), provided measurements of concentrations in a reference state. The approach assumes that reaction rates follow the mass action law used in the derivations of other types of kinetics. We apply the approach with large-scale kinetic and stoichiometric metabolic models of organisms from different kingdoms of life to show that we can identify a proportion of metabolites to which our approach is applicable. By challenging the predictions of concentration ranges in the genome-scale metabolic network of E. coli with real-world data sets, we further demonstrate the prediction power and limitations of the approach.

Introduction
Advances in systems biology studies have been propelled by the availability of high-quality genome-scale metabolic reconstructions for many organisms across all kingdoms of life [1]. Metabolic network reconstructions contain information about metabolites and reactions through which they are transformed to support different cellular processes [2,3]. Alongside enzyme concentrations and phenomenological constants, reaction rates and metabolite concentrations-as two faces of the metabolic phenotype-characterize key aspects of the metabolic capabilities of an organism. Since metabolic concentrations are important determinants of reaction rates [4], understanding what controls their physiological ranges can point to cellular mechanisms that control phenotypic plasticity to ensure viability of organisms under changing conditions [5].
A naïve approach to determine a concentration range for a given component is to assume that it is present with a single molecule or that the entire cell dry weight under an investigated scenario is composed solely of that component. This derivation requires only knowledge of the component's molecular weight, which is readily available. However, the derived ranges are vast and largely invariant under different scenarios; therefore, they are not informative. Here we ask whether large-scale metabolic models can be used for accurate prediction of concentration ranges. Resolving this question is tantamount to identifying a cellular mechanism underlying the control of concentration range for given cellular component.
The change in concentration of metabolites can be described by a system of coupled ordinary differential equations (ODEs), dxðtÞ dt ¼ Nv t ð Þ, where v(t) = (v 1 (t), � � �, v n (t)) T denotes reaction rates and x(t) = (x 1 (t), � � �, x m (t)) the metabolite concentrations at time t, and N represents the stoichiometric matrix. The rows of the stoichiometric matrix correspond to metabolites, columns stand for reactions, and its entries denote the stoichiometric coefficients with which metabolites participate in reactions as substrates or products [6]. Reaction rates are modeled according to a kinetic law, v(t) = f(x(t), θ), which often leads to nonlinearities and involves multiple parameters, denoted by θ [7]. As a result, the coupled nonlinear ODEs are often not analytically tractable and their simulations are challenging. These issues arise since parameters remain poorly specified at a genome scale for the majority of model organisms [8,9] and the nonlinear ODEs may lead to numerical issues [10]. In addition, determining the steady-state concentration ranges by characterizing the solutions to the system of non-linear equations Nf (x(t), θ) = 0 is intractable for large-scale networks even when the equations have a simplified mass action form often used in metabolic modeling [11].
Feasible steady-state reaction rates, v, for which Nv = 0, can be predicted based solely on the structure of the network with computational approaches from the constraint-based modeling framework [12]. However, since intracellular reaction rates cannot be measured directly, the validation of these predictions requires laborious labeling experiments and model fitting procedures [13]. By neglecting the effect of concentrations on reaction rates, constraint-based approaches do not facilitate the usage of metabolic network reconstructions to predict concentrations of metabolites, which are becoming more accessible by quantitative metabolomics technologies [14].
The existing constraint-based approaches that can make predictions of metabolite concentrations and their ranges are based on consideration of thermodynamics constraints. Thermodynamics-based metabolic flux analysis (TMFA) produces flux distributions that do not contain any thermodynamically infeasible reactions or pathways, and it provides information about the free energy change of reactions and the range of metabolite concentrations in addition to reaction fluxes [15]. However, due to uncertainty in the estimation of the standard Gibbs free energies, TMFA usually predicts unconstrained ranges for metabolite concentrations (see Discussion in Henry et al. [15]). Metabolic Tug-of-War (mTOW) extends TMFA but is based on a non-convex optimization approach which comes at a cost of local optima and lack of robustness of predictions (validated by correlation [16]). A method to predict metabolite concentration ranges with limited knowledge about the underlying kinetic laws and parameter values would allow direct integration and validation of genome-scale models with experimental data from metabolomics technologies, enabling systems biology applications, from engineering of intervention strategies to design of new drugs [17][18][19].
Here we provide an approach which relies on the structure of the network, encoded in the stoichiometric matrix, to provide simulation-free prediction of steady-state concentration ranges by employing mass action kinetics. We focus on mass action kinetics since it underlies the derivation of more involved types of kinetics for different reaction mechanisms [20], allows for consideration of enzyme concentration if enzymes appear as reaction substrates, and provides a simple mathematical form that may be amenable to analytical treatment. The usage of mass action was here also favored due to lack of information about reaction mechanisms at a genome-scale level. The approach expands on the well-established concept of full coupling of reactions [21] to consider pairs of reactions whose ratio of mass-action-compatible fluxes depends only on the respective rate constants. Thus, this flux ratio is invariant at any, not necessarily steady, state of the system. The approach is also refined to predict concentration ranges for unseen cellular scenarios provided concentration data from a reference experiment. Our method complements the constraint-based modeling framework, focused on analysis of steady-state reaction rates, and thus enables a comprehensive characterization of feasible concentrations in genome-scale metabolic networks under specified conditions.

Metabolites with structurally constrained concentrations (SCC)
Consider a metabolic network composed of m metabolites that participate in n reactions. The (m × n) stoichiometric matrix, N, can be written as a difference of two non-negative matrices, N = N + − N − , where N + includes the stoichiometry of the products and N − comprises the stoichiometry of the substrates of each reaction. For instance, the stoichiometry of substrates and products given in Fig 1b describes the metabolic network on Fig 1a. We assume that the rate of reaction R i is modeled according to mass action kinetic, where θ i > 0 is the reaction constant and the concentration of each substrate molecule appears in v i as a multiplicative factor.
To state our main result, we require some concepts and terminology which we next introduce and illustrate. We will say that a reaction R k lacks one substrate molecule of X i in comparison to reaction R l , if N À il À N À ik ¼ 1 and for every i Under the assumption of mass action kinetic, if a reaction lacks one substrate molecule in comparison to another, the reactions differ in their orders by one. As a result, the ratio of fluxes for such reactions at any state of the system depends only on the rate constants and the concentration of the substrate in which the reactions differ.
Furthermore, two reactions R k and R l are fully coupled if there exists λ > 0, such that v l = λv k for any positive steady-state reaction rate v, i.e., Nv = 0 [21]. Therefore, fully coupled reactions have an invariant ratio over all positive steady states that the network admits, and full coupling is a transitive relation. For the network in Fig 1a, reaction R 3 is fully coupled to R 4 and R 5 is fully coupled to R 6 . Such reactions, which are fully coupled irrespective of the kinetic law, can be efficiently determined based on the stoichiometry of large-scale networks by linear programming [21,22] (see Materials and methods).
Under the assumption of mass action kinetic, two reactions that share the same substrates of same stoichiometry are also fully coupled [23]. In this case, the coupling holds for any, not necessarily steady, state of the system. Therefore, the consideration of mass action kinetic expands the set of fully coupled reactions. For instance, this is the case for reactions R 3 and R 5 that have the substrate components of same stoichiometry in Fig 1a, whereby v 3 v 5 ¼ y 3 y 5 . Consider now a metabolite X j with an ODE given by where P j is the set of reactions with X j as one of their products and S j is the set of reactions which have metabolite X j as one of their substrates. A metabolite X i , not necessarily different from X j , has structurally constrained concentration (SCC), if the following conditions hold: (i) for each reaction R l in S j , there exists a non-empty subset Q À i l of reactions lacking one substrate (a) Reaction diagram that includes seven reactions, R 1 -R 7 , and three components, A-C. (b) stoichiometric matrices associated with substrates, N − , and products, N + , for the network in (a). Reaction R 7 lacks one substrate molecule of B in comparison to R 2 , since N À 22 À N À 27 ¼ 1 and N À i2 À N À i7 ¼ 0 for every i 6 ¼ 2. Reactions R 3 and R 5 share the same substrate components with same stoichiometry, and hence their fluxes are fully coupled under the assumption of mass action kinetic. Reaction R 3 is fully coupled to reaction R 4 , as are reactions R 5 and R 6 . (c) Component B exhibits structurally constrained concentration from the ODEs of components A and C. The network exhibits different positive steady states with changing rate of reaction R 1 . molecule of X i in comparison to R l ; the union of all Q À i l yields the set of reactions S À i j ; (ii) all reactions in S À i j are mutually fully coupled; and (iii) all reactions in P j are mutually fully coupled. A similar argument can be made with respect to condition (i) in terms of reactions in the set P j (Materials and methods). A metabolite X i that satisfies the conditions above will be referred to as a SCC metabolite.
In the following, we use the ODE for metabolite X j to derive the concentration bounds for a metabolite X i with SCC. Let Q be a subset of S À i j that contains one and only one reaction from each of Q À i l . Under mass action, for the flux of every reaction R l 2 S j , it then holds that v l ¼ x i y l y À i l v À i l (see Materials and methods), where y À i l is the reaction constant and v À i l the flux of a reaction R À i l 2 Q. The expression for dx j dt above then becomes At any positive steady state, it then holds that Due to the conditions (iii), above, the sum s p ¼ v p is a constant which, in the simplest case, when all reactions in P j are fully coupled irrespective of the kinetic rate law, depends only on the network structure. In addition, due to condition (ii), above, the value of s À i s ¼ is also a constant which depends on both the network structure and a subset of rate constants. The rate constants which appear in the expression for s À i s and σ p for any Q � S À i j will be referred to as relevant rate constants, while the flux ratio v p v À i s will be called relevant flux ratio. Therefore, given a steady-state flux distribution, v, a set Q � S À i j , and two reactions R p 2 P j and R À i s 2 Q, we have that . This derivation establishes a direct relation between a flux distribution, under specified inputs from the environment, and the concentration of a SCC metabolite. We can also use the derived expression to obtain the concentration bounds for x i over any set, F, of steady-state flux distributions and subset Q (per definition above), yielding the following: For instance, component B in Fig 1a is SCC, derived from the ODE of component A, whereby the relevant flux ratio is v 4 v 7 and the relevant rate constants are θ 3 and θ 7 (Fig 1c). Similarly, one can show that component B is SCC from the ODE of component C.
Let the lower and upper bounds for the concentration of metabolite X i derived from the ODE of metabolite X j in Eq (1) be denoted by L j i ¼ min fQ;Fg respectively. If there are r metabolites X d , 1 � d � r for which Eq (1) applies, then the lower and upper bounds for the concentration of X i are given by the intersection of the ranges derived from the ODEs of X d , i.e..
Therefore, the lower bound is the minimum of the maxima, while the upper bound is the maximum of the minima derived from the individual ODEs. In case that the SCC of a metabolite can be derived from multiple ODEs, Eq (2) provides more constrained predictions about metabolite concentration ranges than Eq (1) alone. For instance, component B in Fig 1a is SCC not only from the ODE of component A but also from that of C, whereby the relevant flux ratio is v 6 v 7 and the relevant rate constants are θ 5 and θ 7 (Fig 1c). In case that the upper bound is smaller than the lower bound in Eq (2) then the system of ODEs does not have a positive solution for X i , which implies that the network does not allow a positive steady state. Therefore, the approach can also be used to check for existence of positive steady state with respect to a SCC metabolite under mass action kinetics.

Validation of the approach with a large-scale kinetic model of E. coli
The proposed approach can be employed to determine metabolite concentration ranges by using information about full coupling of reactions, fluxes entering relevant flux ratios, and the relevant reaction rate constants. To validate the predictions, we employ a detailed kinetic model of elementary metabolic reactions of E. coli [8] from which these inputs are readily available. Of the 830 metabolites interconverted by 1,474 elementary reactions in the model, our approach determines that 23 metabolites exhibit SCC. The ranges for these SCC metabolites are fully determined by 67 relevant rate constants (4.6% of all rate constants) and fluxes of 67 reactions (4.6% of all reactions) which enter in the relevant flux ratios. We use the kinetic model to simulate 100 steady states from different initial conditions (Supplementary S1 Table). We determined the Euclidean distance between the predicted and simulated lower and upper bounds to demonstrate their quantitative agreement. Since metabolite concentrations vary over several orders of magnitude, the results based on Euclidean distance will be biased by the presence of very large metabolic pools; therefore, we also considered two variants of relative Euclidean distance (see Materials and methods). Our results from the quantitative comparison demonstrate a very good agreement between the predicted and simulated bounds (Supplementary S2 Table, Supplementary S1 Fig). We also employ the Pearson correlation to assess if the predicted and simulated bounds agree qualitatively across the metabolites with SCC. We determine that there is a perfect qualitative match between the predicted and simulated lower (1, p-value < 10 −6 ) and upper bounds (1, p-value < 10 −6 ) of the SCC metabolites (Supplementary S2 Table).
It has been recently proposed that the shadow prices of metabolites can be used to quantify the ranges of metabolite concentrations, under the assumption that the cellular system optimizes an objective [24]. To compare the performance of shadow prices as a measure of metabolite concentration ranges, we employ the stoichiometric matrix of the analyzed kinetic model by using the maximization of metabolic exchange fluxes as cellular objective, shown to outperform yield as a predictor of growth rate [25]. We used the kinetic model, since it provides full control in the comparison of simulated and predicted concentration ranges. We did not use optimization of yield, most widely used in flux balance analysis, since the model has been parameterized without consideration of a biomass reaction. We observe that for the analyzed model and the physiologically relevant objective, the calculated shadow prices for the 23 SCC metabolites cannot be used as indicators of concentration variability due to the weak negative correlation with the concentration ranges as well as with the coefficients of variation of the SCC metabolites (Supplementary S2 Table). These findings point out that our approach, in absence of a cellular objective but with knowledge about a few rate constants and selected flux ratios, outperforms the existing contender for quantifying concentration ranges in large-scale metabolic networks.

Effects of missing information about rate constants
While the full reaction couplings considered by our approach can be readily obtained given the structure of the network and flux ratios are increasingly available from labeling approaches [26], the resulting predictions can be affected by missing information about rate constants. To assess the effect of missing information on the accuracy of predictions, we consider the cases that 10-90% of rate constants used in the derivation of the ranges for the metabolites with SCC are known (see Methods). We consider three scenarios whereby the missing ratios of rate constants, appearing in Eq (1), are substituted by: (i) a value of one, simulating a scenario in which all relevant rate constants are of the same value, (ii) the mean, or (iii) the median of the ratios of relevant rate constants that are present (i.e., known) in the model equation from which the conditions for SCC are established. We note that the units of the rate constants are not relevant since rate constants enter Eq (1), above, as ratios.
We find that the substitutions for the missing ratios of rate constants according to the three scenarios, as expected, decrease the Pearson correlation between predicted and simulated ranges over 100 instances of models in which relevant rate constants were removed at random (Fig 2). Nevertheless, even when only 30% of the relevant rate constants are known for the cases (i) and (iii), we obtain a median Pearson correlation coefficient between the predicted and simulated ranges of at least 0.6 ( Fig 2). Substituting the missing ratio of rate constants with the mean of the ratios shows the largest variability over the 100 instances of models with partial knowledge of rate constants. The reason for this finding is that the distribution of rate constants and their ratios are highly left-skewed (Supplementary S2 Fig). Therefore, we conclude that even in the absence of information about rate constants that matches the current state-ofthe-art of knowledge about E. coli (Supplementary S3 Table), our approach provides qualitatively reliable estimates of concentration ranges in large-scale models. The ordering of lower and upper bounds between metabolites can be predicted well (median significant Spearman correlation above 0.75 at significance level of 0.05 for all scenarios). However, we observe that the median over relative and log-transformed Euclidean distances between predicted and simulated lower as well as upper bounds over the 23 SCC metabolites are small (<0.71 and <0.08,

Fig 2. Effect of missing information about relevant rate constants on the accuracy of concentration range predictions for a large-scale kinetic model of E. coli.
We consider 10-90% of the relevant rate constants to be unknown by random removal. We consider three scenarios for the substitution of missing ratios of rate constants: (i) equality (i.e., kinetic rate constants are assumed to be the same), (ii) the mean, or (iii) the median of the ratios of relevant rate constants that are still present in the model. Shown are the boxplots (red lines inside each box denote the corresponding medians) of the resulting Pearson correlation coefficients between the predicted and simulated ranges over the SCC metabolites in the kinetic model of E. coli.
https://doi.org/10.1371/journal.pcbi.1006687.g002 respectively) when more than 50% of the relevant rate constants are known (Supplementary S3-S6 Figs). Therefore, the approach can be used for the frequently employed comparison of metabolite concentration ranges within and between conditions.

Effect of missing information about flux ratios
We also investigate the accuracy of the predictions of concentration ranges when full information about relevant rate constants is available and relevant flux ratios are obtained from constraint-based modeling approaches. To obtain physiologically relevant predictions, we constrain the model with the simulated exchange fluxes (Supplementary S1 Table), since they can be readily obtained from experiments (e.g. by following substrate depletion). As the employed kinetic model does not specify a biomass reaction, we optimize a weighted average of ATP production and total flux, known to lead to predictions in line with flux estimates from labeling experiments [2]. To this end, we determine the range for the relevant flux ratios at the optimal value for the objective and used them together with Eq (2) to obtain concentration ranges for the 23 SCC metabolites (Materials and methods). We find that for 13 out of 23 SCC metabolites the predicted concentration range reside inside the respective simulated range. For additional 6 metabolites the ranges overlap, while the remaining metabolites show no overlap in the predicted and simulated range using the objective of optimized ATP production and total flux (Fig 3). Since the approach provided accurate quantitative and qualitative predictions with perfect information in the case of kinetic modeling, the discrepancy is due to the objective used to constrain the physiologically reasonable fluxes.

Concentration ranges in a genome-scale metabolic model of E. coli
Arguably the most interesting scenario for application of our approach is with genome-scale metabolic networks. We find 199 SCC metabolites in the cytosol and 168 in the periplasm and extracellular space of the most recent genome-scale metabolic network of E. coli [27] (Supplementary S8 Table). However, for this model, we observe that there are data available for only 28% of relevant rate constants (Supplementary S3 Table), and we have no estimates of the relevant flux ratios available from labeling experiments [28][29][30]. Therefore, the approach cannot be used without extensions. Given a steady-state flux distribution, v, the concentration of a SCC metabolite X i is given by If we have data on concentration of SCC metabolites and flux predictions from the constraint-based modeling framework, we can readily obtain estimates for the ratio Metabolite concentration data set of Ishii et al. [28]. We use the measurements of steady-state concentrations of 182 metabolites from E. coli under different growth scenarios [28]. This data set includes 15 of the 199 cytosolic SCC metabolites found in the genome-scale model. We also have access to rates of glucose and oxygen uptakes, carbon dioxide release as well as growth from the same experiments [28], which we use as constraints to a genome-scale metabolic network of E. coli. It has been shown that E. coli does not optimize a single objective (e.g., growth), but its steady-state flux distributions result from the trade-off between tasks of optimizing growth, ATP synthesis, and total flux [2]. Since growth rate is fixed from measurements, we optimize the weighted average of ATP synthesis and total flux, with a weighting factor of 0.1 on ATP synthesis to reduce the effect of the order difference in the respective optimum observed when ATP production and total flux are optimized individually. Here, too, at the obtained optimum we can efficiently estimate ranges for the relevant flux ratios (Materials and methods). In addition, we compare obtained concentration ranges with those predicted when maximization of ATP is used as the only objective. To obtain estimates for s p s À i s ; we use three replicates for the concentration data and predictions of ranges for relevant flux ratios at growth rate of 0.2h −1 (Supplementary S4 Table). Eq (2) can then be applied to determine concentration ranges based on s p s À i s for a combination of replicates, to investigate the effect of outliers. We predict in turn the concentration ranges for three other growth rates (i.e., 0.4, 0.5, and 0.7h −1 ).
For the objective of optimizing ATP synthesis and total flux, our results demonstrate that measurements for 9, 10, and 6 of the 15 SCC metabolites fall in the predicted concentration range for the three growth rates, respectively (Fig 4). Nevertheless, the Spearman correlation between the measured values and the predicted lower and upper bounds is significant and larger than 0.57 and 0.56, respectively (Supplementary S5 Table). Therefore, the approach can be used to compare the ordering of lower or upper bounds between different experimental scenarios (Supplementary S7 Fig). In addition, this analysis highlights the effect of the replicates of metabolite concentrations used in calculating the values of s p s À i s , since estimates for some of the replicates may be outliers (Fig 4). In contrast, we find that 4, 5 and 2 of the 15 SCC metabolites fall in the measured range for the three growth rates when maximization of ATP is used as objective (Supplementary S9 Fig). Moreover, we cannot predict concentrations for 8 out of the For metabolites with grey background, there is no access to measurements. The colored bars denote the 15 SCC metabolites due to numerical instabilities arising when using this objective under the additionally imposed constraints on growth. The reasons for the discrepancy between the predicted and measured values under both objectives include the combination of at least three factors: the inability to distinguish the concentrations of free metabolites from those bound to macromolecules experimentally [31], model (and objective) inaccuracies, and the simplifying assumption of mass action kinetic. Nevertheless, the approach can be extended to consider networks with kinetic laws derived from mass action which involve enzyme forms (e.g., Michaelis-Menten, see Discussion) at cost of increased data requirements for application.
Metabolite concentration data set of Gerosa et al. [32]. We use the measurements of steady-state concentrations of 43 metabolites from E. coli grown in eight different carbon sources [32]. This data set includes ten of the 199 cytosolic SCC metabolites found in the genome-scale model. We also have access to rates of carbon uptake, some secretion rates, as well as growth from the same experiments (see Supplementary S10 Table), which we use as constraints to a genome-scale metabolic network of E. coli. Since growth rate is fixed from measurements, as above, we optimize the weighted average of ATP synthesis and total flux, with weighting factors 0.001 for ATP synthesis and 1000 for total flux to reduce the effect of the order difference and make the comparison to optimization of ATP synthesis. Different weighting factors are used in comparison to the analysis of the data set from Ishii et al., above, since different constraints are used that affect the optimal values of the individual objectives. Here, too, at the obtained optimum we can efficiently estimate ranges for the relevant flux ratios (Materials and methods). To obtain estimates for s p s À i s ; we use the metabolite concentrations from growth on acetate (Supplementary S10 Table). We then predict the concentration ranges for the ten SCC metabolites for the seven other carbon sources (Supplementary S10 and S11 Figs).
In Supplementary S10 and S11 Figs measured concentration ranges are denoted by red bars and predicted concentration ranges are shown in black. In case of succinate as only carbon source we obtain a model with no feasible solution, so no concentrations could be predicted for that case without further model adaptations. In the remaining growth conditions, depending on the objective and growth condition analyzed, three to five predictions of concentrations resulted in minimum values larger than the respective maximum (missing black bars). This observation is a result of numerical instabilities occurring if flux values v p and v À i s in Eq (1) differ by several orders of magnitude. The Spearman correlation between the average measured and predicted concentrations (Fig 5) when optimizing ATP synthesis is 0.63 (p-value 3 � 10 −4 ), while it is only 0.33 (p-value 0.03) when ATP synthesis and total flux are optimized. In addition, the Spearman correlation between the measured and predicted upper and lower bounds when maximization of ATP is used results in higher correlation values (upper bounds 0.61 (pvalue 4.3 � 10 −4 ), lower bounds 0.85 (p-value 5.9 � 10 −9 )) than those when optimization of ATP synthesis and total flux are employed (upper bounds 0.21 (p-value 0.17), lower bounds 0.54 (pvalue 1.6 � 10 −4 )). These findings imply that the usage of different objectives to estimate flux ratios and through them concentrations of metabolites can also be used to discern importance of optimized objectives in a particular experiment. predicted ranges from each of the three different replicates, while the black bar represents the prediction over all replicates. The red cross denotes the measured value at the respective GR. For some metabolites there is no overlap between the colored bars, indicating poor reproducibility over the replicates in the reference scenario. The nomenclature of the metabolites is provided in Supplementary S5 Table. https://doi.org/10.1371/journal.pcbi.1006687.g004

Changes in metabolite concentrations in knock-out mutants
The fully parameterized kinetic model of E. coli can be used to test the applicability of the approach to predict changes in metabolite concentrations in metabolic engineering scenarios. Here, we test the performance of the approach with knock-out mutants based on the following procedure: We make use of the model parameterization to simulate a steady-state concentration and flux distribution from initial physiologically reasonable values for metabolite concentrations. The resulting steady-state concentrations and fluxes yield a wild type reference. We then knock-out each reaction and predict positive steady state flux distribution closest to the wild type reference, following the Minimization of Metabolic Adjustment (MOMA) approach [33]. The resulting flux distribution is used to calculate the concentrations of the 23 SCC metabolites following our approach (Eq (1)). In the last step, the predicted changes in concentration of the SCC metabolites with respect to the reference are compared to the changes from kinetic simulations of the knock-out with the wild-type reference specifying the initial conditions. We observe similar ranges for the predicted and simulated fold-changes in SCC concentration over all 23 SCC metabolites and knock-outs of 929 reactions for which we were able to simulate a steady-state knock-out flux distribution (Fig 6, fold changes for individual SCC metabolites are shown in Supplementary S12 Fig). We grouped the fold-changes into 12 bins, given in the x-axis of Fig 6. For ten SCC metabolites, the predicted fold change of at least 29% of the knock-outs is in the same bin as the simulated fold change. The highest overlaps are observed for AMP (39%), phosphoenolpyruvate (38%) and isocitrate (37%). In contrast, the fold changes in concentration for metabolites like succinyl-CoA, acetyl-CoA, oxaloacetate, malate and pyruvate are in the same class as simulated for at most 1% of the knock-outs. The lack of correspondence between simulated and predicted concentrations for some SCC

Metabolites with SCC across species
We next apply Eq (1) to 14 large-scale metabolic networks which differ in complexity due to the number of considered metabolites and reactions as well as their organization in subcellular compartments (Supplementary S6 Table). The investigated metabolic networks are mass-and charge-balanced and support positive steady-state reaction rates (see Methods). Since reliable kinetic information and measurements of absolute concentration measurements are currently missing across diverse species, we report only the number of the metabolites with SCC across the analyzed large-scale networks.
We find that the percentage of metabolites with SCC ranges from 7.74% and 8.02% in the models of N. pharaonis and C. reinhardtii to 33.66% and 36.53% in the models of A. thaliana and Y. pestis (Fig 7a). Interestingly, the number of metabolites with SCC scales linearly with the total number of metabolites (Fig 7b, R 2 = 0.82) and the number of reactions in the examined networks (Fig 7c, R 2 = 0.76). This finding indicates that the proposed approach is not limited to networks of a particular size.
Different reasons can be used to explain the observation that larger networks contain more metabolites with SCC. For instance, larger networks may include more linear pathways, whereby the number of reactions which are fully coupled due to structure is expected to increase. Yet, in denser networks, which include more reactions on the same set of metabolites, it is more likely to identify reactions which share substrates of same stoichiometry, which then leads to full coupling due to mass action kinetics, as considered in our approach. To investigate the reasons for the scaling of the number of metabolites with SCC, we determine the number of: (i) metabolites which are synthesized and used by one reaction, respectively (in support of the linear pathway explanation), (ii) fully coupled reactions only due to structure, (iii) coupled reactions due to mass action (in support of the network density explanation), (iv) the combination of (ii) and (iii), to assess the couplings due to both structure and kinetics (Supplementary S7 Table). We calculate the Pearson correlation coefficient between each of these properties and the number of reactions over the analyzed networks, as a measure of network size (Supplementary S7 Table). Larger networks indeed contain a bigger number of metabolites synthetized and used by a single reaction, respectively, and more reactions which are fully coupled due to both structure and kinetics. Therefore, both the linear pathway and the network density explanations contribute to the observed scaling in the analyzed networks.
Due to the derivation of Eq (1), it may be expected that the approach is not applicable to metabolites which participate in a large number of reactions, since they may be less likely to be fully coupled. Nevertheless, our findings show that between 28.89% and 62.95% of the SCC metabolites in the analyzed networks are involved in more than two reactions (see Supplementary S6 Table). One reason is that a SCC metabolite may also be determined by applying Eq (1) to the ODE of another metabolite (see Eq (2) and Fig 1c).
Since changes in relevant fluxes directly affect the concentration of a SCC metabolite, they can be used to tightly control the concentration range. For essential metabolic processes to be carried out efficiently, metabolites that serve as coenzymes and energy currency of biological systems, namely, the oxidized and reduced version of NAD and NADP as well as the adenosine phosphates (i.e. AMP, ADP, ATP), are maintained within certain concentration ranges that can be readily controlled, as is the case for SCC metabolites. Despite the many biochemical reactions in which these ubiquitous metabolites participate (Supplementary S8 Table), all of which must satisfy our conditions in order to invoke Eq (1), we find that the (sub)cellular concentrations of ATP and NAD are indeed structurally constrained in twelve and ten of the analyzed networks, respectively. This implies that the network structure, alongside the relevant rate constants and relevant flux ratio, imposes boundaries on and facilitates simple control over their concentrations. In addition, we find that NADP shows SCC in four of the investigated networks, including A. thaliana and C. reinhardtii (Table 1 and Supplementary S8  Table). In these photosynthetic organisms, NADPH is produced by ferredoxin-NADP+ reductase in the last step of the electron transport chain which constitutes the light reactions of photosynthesis [34]. The produced NADPH provides reducing power for the biosynthetic reactions in the Calvin cycle to fix carbon dioxide as well as in the reduction of nitrate into ammonia for plant assimilation in the nitrogen cycle. Therefore, precise and simple control of NADPH will provide uninterrupted functionality of these key metabolic pathways and maintenance of carbon and nitrogen balance [35]. In addition, for ten models, we find that H+ is SCC, ensuring maintenance of the specific functions of individual organelles [36]. Altogether, our findings indicate that the concentration ranges for coenzymes and other components essential for fueling metabolism can be established by controlling few ratios of fluxes, despite their involvement in hundreds of reactions. Moreover, they imply that the network architecture may be organized such that the concentrations of these metabolites are intrinsically constrained and easy to control.

Discussion
Genome-scale metabolic networks have propelled the understanding of the metabolic capabilities for a wide variety of organisms across all kingdoms of life. The existing large-scale modelling approaches examine the space of feasible fluxes, but cannot be used to infer the metabolite concentrations driving these fluxes without extensively relying on largely unknown kinetic parameters. Hence, the direct usage of large-scale metabolic networks to make predictions about concentrations that are directly testable from high-throughput metabolomics data is not possible with the existing modelling approaches.
Here we derive a condition that pinpoints that the structure of a metabolic network, ratios of relevant rate constants, and ratios of relevant reaction fluxes constitute the determinant of concentration ranges for selected metabolites. This link is based on the well-known concept of full coupling of reactions [21,23] which we expand under the assumption of mass action kinetics to include reactions that share substrates of same stoichiometry. These concepts allow us to efficiently determine the admissible concentration ranges in large-scale metabolic networks endowed with mass action kinetics across all kingdoms of life. The derivation of Eq (1) can be generalized by considering reactions which differ in order larger than one with respect to a single metabolite. For a given flux distribution this approach results in a polynomial equation in a single variable which can be efficiently solved with the Newton's method.
Our approach is also applicable to networks with kinetic laws derived from mass action which involve enzyme forms (e.g., Michaelis-Menten). This can be achieved by augmenting the network to include reactions which model substrate-enzyme complex formation as well as the synthesis and degradation of enzymes. However, these extensions come at a cost of substantially larger data sets which are not yet readily available. In addition, our analyses demonstrate that the casting of a kinetic rate law in terms of mass action mechanisms may affect the findings regarding the SCC metabolites. For instance, we find that there are many more SCC metabolites in comparison to other SCC components (i.e., enzymes and enzyme-substrate complexes) in each of the analyzed models (Supplementary S9 Table). With exception of the network of C. reinhardtii, the usage of enzymatic forms explicitly in mass action mechanisms leads to a decrease in the number of metabolites with SCC (Supplementary S9 Table), due to the decrease in the number of reaction pairs which differ in their order by one. Applications of the approach to other forms of kinetics will be subject in future investigations and extensions.
Our approach provides a links between metabolite concentrations, relevant rate constants, and relevant flux ratios; therefore, information on two of these can be used to predict the third. Our analyses demonstrate that there is a good quantitative agreement between predicted and simulated concentration ranges based on full knowledge of rate constants from a kinetic model of E. coli. Rate constants of elementary reactions are expected to become increasingly available for model organisms, largely due to the development of computational methods coupled with high-throughput data [8,9]. In addition, by examining the scenario where flux ratios are estimated from the constraint-based modeling framework, we observe that the approach can be used to select which objective function (or a combination thereof) is optimized by a biological system for which metabolite concentration measurements are available.
Most importantly, we show that even in the absence of data on relevant rate constants and relevant flux ratios, we can apply the approach to successfully predict concentration ranges in E. coli under different growth conditions, provided measurements of concentrations for SCC metabolites in one reference condition. Therefore, the proposed approach represents an important step in complementing genome-scale metabolic networks with metabolite concentrations, widening the applicability of large-scale models to a range of biotechnological and medical applications.

Components with structurally constrained concentrations
A metabolic network can be represented by the stoichiometric matrix, N = N + − N − , where N + includes the stoichiometry of the products and N − comprises the stoichiometry of the substrates of each reaction. In the following, we derive the conditions for structurally constrained robustness of component X i based on the ordinary differential equation (ODE) for the component X j (not necessarily different from X i ) under the assumption that the reaction rates, v(t), satisfy mass action kinetics, whereby v i ðtÞ ¼ y i Q j x N À ji j ðtÞ. Let the ODE be specified by where P j is the set of reactions with X j as one of their products and S j is the set of reactions which have metabolite X j as one of their substrates.
We consider the following two cases: (i) the concentration of X i appears in every v k (t) for which N þ jk 6 ¼ 0 and for every v k (t) there exist a set P À i j of reactions R À i and (ii) the concentration of X i appears in every v l (t) for which N À jl 6 ¼ 0 and for every v l (t) there exist a set of reactions

Case I
The rates of a reaction R k and a reaction from the set R À i k are given by From rewriting the equation of v À i k ðtÞ above we have that The ODE for component X j revealing structurally constrained concentration of component X i is then given by: Let p and s bet two reaction indices such that N þ jp 6 ¼ 0 and N À js 6 ¼ 0. In any positive state v(t), we have that In a steady state then is constant because either reactions R À i k and R À i p are fully coupled or share the same substrates, then p is a constant that only depends on a subset of rate constants and the network structure. Moreover, if for every N À jl 6 ¼ 0; v l v s is constant because either reactions R l and R s are fully coupled or share the same substrates, then P l2S j N À jl v l v s ¼ s s is a constant, too, which in the simplest case when all reactions in S j are fully coupled irrespective of the kinetic rate law, only depends on the network structure. Therefore, For each reaction R k in S j , there exists a non-empty subset Q À i k of reactions lacking one substrate molecule of X i in comparison to R k ; the union of all Q À i k yields the set of reactions S À i j . Let Q be a subset of P À i j that contains one and only one reaction from each of Q À i k . Since the reaction indices p and s are arbitrarily chosen, the concentration range of metabolite X i for a given subset Q over a given set of flux distributions, F, is given as

Case II
The rates of a reaction R l and a reaction from the set R À i l are given by From rewriting the equation of v À i l ðtÞ above we have that The ODE for component X j revealing structurally constrained concentration of component X i is then given by: Let p and s bet two reaction indices such that N þ jp 6 ¼ 0 and N À js 6 ¼ 0. In any positive state v(t), we have that In a steady state then If for every N þ jp 6 ¼ 0; v k v p is constant because either reactions R k and R p are fully coupled or share the same substrates, then P k2P j N þ jk v k v p ¼ s p is a constant that, in the simplest case when all reactions in P j are fully coupled irrespective of the kinetic rate law, depends only on the network structure. Moreover, if for every N À is constant because either reactions R À i l and R À i s are fully coupled or share the same substrates, then The constant s À i s then only depends on a subset of rate constants and the network structure. Therefore, For each reaction R l in P j , there exists a non-empty subset Q À i l of reactions lacking one substrate molecule of X i in comparison to R l ; the union of all Q À i l yields the set of reactions P À i j . Let Q be a subset of P À i j that contains one and only one reaction from each of Q À i l . Since the reaction indices p and s are arbitrarily chosen, the concentration range of metabolite X i for a given subset Q over a given set of flux distributions, F, is given as As a result, the ranges for steady-state concentration x i can be expressed as a function of a set of given flux distributions, ratios of specific fluxes and constants that depend only on the structure of the network and values for a subset of rate constants. Since fluxes are the integrated outcome of transcription, translation, and post-translational modifications and their interplay with the environment and nutrient availability, our derivation provides a direct relation between concentration ranges, flux ratios, and rate constants.

Flux coupling
Let CðNÞ ¼ fv 2 R n jNv ¼ 0; v � 0g be the steady-state flux cone for a given stoichiometric matrix N with n reactions, under the assumption that every reaction is irreversible. Here, we restrict our analysis to the subspace F � C(N) by bounding the fluxes: where lb and ub are lower and upper flux bounds. We will refer to v 2 F as the feasible flux distributions. A reaction R i is called blocked if for every v 2 F, v i = 0. A pair of reactions R i and R j is called fully coupled, if there exists λ > 0, such that for every v 2 F, v i = λv j .
The minimum and maximum value for the ratio v i v j over the flux distributions in F can be determined by the linear-fractional programming: which can be rewritten following the Charnes-Cooper transformation [37] to the following linear program: If the minimum and maximum values for the linear program are the same, then the reactions R i and R j are fully coupled. Such reactions can be efficiently computed for large-scale networks [4,21].
In addition, under the mass action kinetics, two reactions are fully coupled in any state of the system if they share the same substrates with the same stoichiometry. This leads to additional full couplings due to the transitivity of the relations, as demonstrated in the main text.

Metabolites with structurally constrained concentrations in mass action networks
In the following, we present an algorithm determining SCC metabolites under the assumption of mass action kinetics: x i has SCC end if end for end for

Correlation analysis
Using a large-scale kinetic model of E. coli we simulate 100 steady-state flux distribution and steady-state concentrations from different initial concentrations. Initial concentrations were obtained by perturbation of the original initial concentration of a metabolite by 1, 5, 10 or 20%. We run the model until a steady-state was reached. Using the simulated steady-state flux distributions we can predict concentration ranges for 23 metabolites using Eq (2) (Supplementary S1 Table). The Pearson correlation was then calculated for (i) simulated and predicted upper bounds, (ii) simulated and predicted lower bounds, and (iii) the absolute range over simulated and predicted concentrations. In addition, we also determined the correlation between shadow price for the respective metabolites and the simulated range, as well as, to the coefficient of variation obtained over simulated concentrations (Supplementary S2 Table). Moreover, we calculated the Euclidean distance between upper and lower bound from prediction and simulation, respectively. Due to the high difference in the order of magnitude over the analyzed metabolites we also calculated Euclidean distance after normalizing the data. We considered the Euclidean distance of log-transformed concentration vectors, and the Euclidean distance between the concentration vectors normalized by the respective maximum value.

Effect of missing information on rate constants
To assess the effect of missing information about rate constants on the accuracy of the predicted concentration range, we simulated missing knowledge about parameters by removing 10, 30, 50, 70 or 90% of the relevant rate constants uniformly at random. We consider only removing information about relevant rate constants to avoid bias due to removal of information in parts of the network that have no effect on the predictions of the concentration ranges. We compare the Pearson and Spearman correlation coefficient between predicted and simulated concentration ranges as well as the two versions of Euclidean distance for each percentage obtained over 100 random removals of rate constants.

Effect of missing information on flux ratios
To assess the effect of missing information about flux ratios on the accuracy of the predicted concentration range, we obtained relevant flux ratios from constraint-based modeling. Therefore, we solve the following linear program optimizing a weighted average of ATP production and total flux: In addition, the flux through exchange reactions is constrained by the respective minimum, v sim_min , and maximum value, v sim_max , obtained over 100 simulations (Supplementary S1 Table) to obtain a physiologically reasonable flux distribution. The weighting factor of 0.01 was chosen to reduce the effect of three orders of magnitude difference in the respective optimum observed when ATP production and total flux are optimized individually.
Next, we determine the range for the relevant flux ratios v p v À i s at the optimum z � using a transformed linear-fractional program: We then used the obtained ranges for v p v À i s together with Eq (2) to calculate concentration ranges for SCC metabolite X i .

Extension of the approach based on available concentration measurements
Using the most recent genome-scale metabolic network of E. coli [27] together with measurements of steady-state concentrations from E. coli under different growth scenarios [28] we predict concentration ranges for 15 SCC metabolites using the following procedure. We first use the concentration measurements from three replicates at a growth rate of 0.2h −1 (reference state) together with flux ratios obtained from constraint-based modelling to estimate the ratio For each replicate we solve the following linear programs in order to obtain ranges for the relevant flux ratios The linear program above constrains rates of glucose and oxygen uptakes, carbon dioxide release as well as growth by values β i,j (which differ between replicates j, 1 � j � 3) available from measurements [28]. We optimize the weighted average of ATP synthesis and total flux. The weighting factor of 0.1 and 0.001 for ATP synthesis, for the data set of Ishii et al. [28] and Gerosa et al. [32], respectively, is chosen to reduce the effect of the order difference in the respective optimum observed when ATP production and total flux are optimized individually. In addition, we use weighting factors of 1 and 1000 for optimization of total flux in the case of Ishii et al. [28] and Gerosa et al. [32], respectively. To obtain ranges for the relevant flux ratios From Eq (2)  The latter can be obtained following the same procedure as described above using rates of glucose and oxygen uptakes, carbon dioxide release as well as growth for E. coli cells grown at rates of 0.4, 0.5, and 0.7h −1 .

Fold changes in SCC metabolite concentrations in knock-out mutants
We use a large-scale kinetic model of E. coli [8] to simulate a steady-state concentration and flux distribution from initial physiologically reasonable values for metabolite concentrations provided in the original publication. The simulated steady-state concentrations and fluxes yield a wild type reference. Next, we simulate single reaction knock-outs and predict positive steady state flux distribution closest to the wild type reference, following the Minimization of Metabolic Adjustment (MOMA) approach [33] for each mutant. The resulting flux distribution is used to calculate the concentrations of the 23 SCC metabolites following Eq (1). In addition, we simulate steady-state flux distributions and concentrations for knock-out mutants from the kinetic model using the wild type reference as initial concentrations. For 929 out of 1474 reaction knock-outs we could simulate steady-state values. Based on these knock-out mutants we then compare fold changes in concentration of the SCC metabolites with respect to the reference obtained from kinetic model simulations and predictions using MOMA.

Concentration ranges from thermodynamics-based metabolic flux analysis (TMFA)
We use the genome-scale kinetic model of E. coli [27] and the implementations of TMFA available from https://github.com/EPFL-LCSB/matTFA [38]. Concentration range for the 199 cytosolic SCC metabolites in the analysed E. coli model are obtain by thermodynamics-based variability analysis for solutions ensuring biomass to be at least 95% of the optimum obtained without thermodynamic constrains. In addition, minimum and maximum bounds on metabolic activities of 10 −10 and 1, as well as minimum and maximum bounds of 0 and 1000 on reaction net flux are used. Thermodynamic information are taken from the database provided within the TMFA implementation. We consider 10-90% of the relevant rate constants to be unknown by random removal. We consider three scenarios for the substitution of missing ratios of rate constants: (i) equality (i.e., kinetic rate constants are assumed to be the same), (ii) the mean, or (iii) the median of the ratios of relevant rate constants that are still present in the model. The colored bars denote the predicted ranges from each of the three different replicates, while the black bar represents the prediction over all replicates. For some metabolites no value could be predicted due to numerical instabilities. The red cross denotes the measured value at the respective GR. For metabolites with missing red cross, there is no access to measurements. The nomenclature of the metabolites is provided in Supplementary S5 Table. (EPS) Comparison of the predicted concentration ranges for 10 intracellular metabolites in E. coli with absolute concentrations measured at seven different carbon sources. The red bars denote the measured ranges over three different replicates, while the black bar represents the predicted concentration. For some metabolites no value could be predicted due to numerical instabilities. For the model simulating growth on succinate no steady-state solution could be obtained without further model adaptation, therefore, no SCC concentration could be predicted.  Table. List of rate constants for reactions in the genome-scale model iJO1366 of E. coli. In addition to the used rate constants and the related organism in BRENDA, the table reports the reaction abbreviation used in the model and the enzyme EC number related to each reaction. In case more than one rate constant is known per reaction we consider the average value.  Table. Structurally constrained metabolites across the 14 analyzed metabolic networks. In addition, the in-and out-degree for these metabolites are provided. Metabolites marked in red correspond to energy metabolism (see Table 1 in the main text) and metabolites marked in green exhibit absolute concentration robustness. Metabolite names and their abbreviations are used as provided in the original models. (XLSX) S9 Table. Number of metabolites with structurally constrained concentrations metabolic networks analyzed including enzyme information. The numbers of reactions and metabolites correspond to the number after rewriting in Michaelis-Menten format, reaction splitting into irreversible reactions and removal of blocked reactions. Model components correspond to metabolites, enzymes and enzyme-substrate-complexes. (XLSX) S10