Physicochemical and metabolic constraints for thermodynamics-based stoichiometric modelling under mesophilic growth conditions

Metabolic engineering in the post-genomic era is characterised by the development of new methods for metabolomics and fluxomics, supported by the integration of genetic engineering tools and mathematical modelling. Particularly, constraint-based stoichiometric models have been widely studied: (i) flux balance analysis (FBA) (in silico), and (ii) metabolic flux analysis (MFA) (in vivo). Recent studies have enabled the incorporation of thermodynamics and metabolomics data to improve the predictive capabilities of these approaches. However, an in-depth comparison and evaluation of these methods is lacking. This study presents a thorough analysis of two different in silico methods tested against experimental data (metabolomics and 13C-MFA) for the mesophile Escherichia coli. In particular, a modified version of the recently published matTFA toolbox was created, providing a broader range of physicochemical parameters. Validating against experimental data allowed the determination of the best physicochemical parameters to perform the TFA (Thermodynamics-based Flux Analysis). An analysis of flux pattern changes in the central carbon metabolism between 13C-MFA and TFA highlighted the limited capabilities of both approaches for elucidating the anaplerotic fluxes. In addition, a method based on centrality measures was suggested to identify important metabolites that (if quantified) would allow to further constrain the TFA. Finally, this study emphasised the need for standardisation in the fluxomics community: novel approaches are frequently released but a thorough comparison with currently accepted methods is not always performed.


Introduction
Metabolic engineering aims to improve microbial strains by considering comprehensive metabolic pathways in their entirety rather than overexpressing a single gene [1]. To improve the strains, hypothesis-driven studies have attempted to rationally identify gene targets and to evaluate the effects of those changes in the network [2,3]. However, the complex nature of cellular metabolism and its regulation demands a holistic understanding, i.e. a data-driven approach [1][2][3]. Combining metabolic engineering with systems biology and mathematical modelling allows for an optimisation of entire cellular networks considering further downstream processes at early stages [4].
This systematic framework exploits information regarding the metabolic state, which comprises the metabolome (complete set of low-molecular-weight metabolites (<1.5 kDa)) and the fluxome (or metabolic activity, distribution of rates of conversion/transport in the metabolic network) [5,6]. Kinetic modelling can yield metabolic fluxes from metabolomics data, but lack of high-quality enzymatic parameters and computational limitations (e.g. time-consuming processes) hinder its application [7][8][9]. Performing an elementary flux mode analysis (EFMA) to decompose the metabolic network into minimal subsets allowing to maintain the steady state provides useful information [10]. However, the combinatorial explosion makes the algorithm computationally expensive and therefore limits the size of the network that can be analysed [10,11]. Alternatively, stoichiometric modelling can provide a flux distribution for larger networks without any kinetic or metabolomics information [12]. Briefly, a metabolic (quasi) steady state for intracellular concentration values (C) is assumed, so that the stoichiometric matrix (S) (including the stoichiometric coefficients of metabolites in each reaction of the metabolic network) constrains the set of metabolic fluxes (υ) [13]: Two main approaches to solve this equation can be found: (i) flux balance analysis (FBA), normally applied to large models (genome-scale model, GSM) [14] or (ii) metabolic flux analysis (MFA), used for smaller metabolic networks (mainly the central carbon metabolism) ( Table 1). FBA solves the underdetermined system represented in Eq 1 by maximising or minimising the value of an assumed objective function [14]. A plethora of different objectives has been described in the literature [15]. Three of them can be highlighted: maximisation of biomass yield (Y X/S , equal to the ratio growth rate/substrate uptake rate), maximisation of ATP yield, and minimisation of sum of fluxes, which have been suggested to compete in the regulation of bacterial metabolism [16]. Hence, selecting an adequate one/multi-dimensional objective function when analysing a GSM will depend on the growth conditions to be simulated in FBA. In general, measured extracellular metabolic rates (e.g. substrate uptake) are insufficient to properly constrain the intracellular metabolic fluxes [13]. In contrast, MFA is based on a least-squares-regression problem, normally solved by exploiting experimental mass isotopomer distribution (MID) of proteinogenic amino acids ( 13 C-MFA) [13]. Since this approach requires fewer assumptions and uses more experimental information than FBA, 13 C-MFA is considered to be the gold standard in fluxomics [17]. However, current applicability (central carbon metabolism), and technical/computational complexity (particularly for autotrophic growth [18]) limit its usage.
The set of constraints characterising stoichiometric modelling approaches (Eq 1) is insufficient to guarantee thermodynamically feasible results in the flux solution space [19,20]. Both FBA and 13 C-MFA assume most reactions to be reversible [13,21]: in the first case directionalities are dictated by the optimal flux distribution (which depends on the a priori chosen objective function [14]), whereas in 13 C-MFA they are determined by the MIDs [22]. The flux-force relationship (thermodynamic displacement from the equilibrium [23]) links thermodynamic potentials and fluxes (Eq 2): where Δ r G 0 and Δ r G 0o are the Gibbs free energies of reactions (the latter referring to adjusted standard conditions), Q and k eq are the ratio of products to reactant concentrations or activities (the latter at equilibrium) and (J + /J − ) is the relative forward-to-backward flux [22]. Four main approaches exploiting thermodynamics data can be highlighted: (i) energy balance analysis (EBA), where pre-selecting Δ r G 0 bounds leads to biased results [24], (ii) networkembedded thermodynamic (NET) analysis, that needs pre-assigned directionalities (e.g. obtained by FBA) and evaluates the thermodynamic consistency [25], (iii) max-min driving force (MDF), which needs a flux distribution as input data to predict metabolite concentration values [26], and (iv) thermodynamically-constrained FBA. Two methods were developed in the latter approach: thermodynamics-based flux analysis (TFA), and an optimization problem allowing to obtain a thermodynamically flux-minimised (TR-fluxmin) solution. TFA directly yields a thermodynamically feasible FBA solution (e.g. by maximising Y X/S ) and simulates metabolomics data [20,27]. In contrast, TR-fluxmin is based on the minimisation of sum of fluxes in the system whilst applying a penalty score for in silico metabolite concentration values [21]. Other recent approaches are based on alternative constraints, such as setting an upper limit on the Gibbs energy dissipation rate [28], or only provide information regarding reaction directionalities [29]. With regards to EFMA, even though using thermodynamics reduces the aforementioned limitations due to combinatorial explosion, the network size is still a limiting factor [30].
MDF and TFA are generally performed using eQuilibrator [26] and matTFA [20], respectively. Since matTFA can be directly used to analyse a GSM, it was selected for this study. Three features should be highlighted: (i) unique values for temperature (25˚C) are considered, (ii) salinity (S) is not taken into account when calculating parameter A, and (iii) Gibbs free energy values are adjusted for ionic strength (I) using the extended Debye-Hückel equation (Table 1). It should be noted that in [20], I = 0.25 M and no salinity were assumed to study E. coli (with a cytosol in the interval 0.15-0.20 M [27]), where the extended Debye-Hückel is only valid for I < 0.1 M [31].
This study was based on determining the impact of varying and adjusting the physicochemical parameters (t, I and S) on the predictive capabilities of TFA under mesophilic growth conditions. In order to do so, a modified matTFA was developed by increasing the number of parameters and parameter values that were originally considered [20]. To validate the results, a comparison with published 13 C-MFA and metabolomics data was performed. In particular, flux pattern changes between in vivo and in silico fluxes in the central carbon metabolism were analysed, with a focus on the anaplerotic reactions. In addition, a method based on centrality measures was suggested to identify important metabolites that (if quantified) would allow to further constrain the TFA.

Metabolic network, mapping of metabolic fluxes and experimental data
Mesophilic growth conditions were studied by selecting a GSM for Escherichia coli (str. K-12 substr. MG1655): iJO1366, which has proven to predict phenotypes in a wide range of growth conditions [34]. For the sake of consistency, metabolomics and fluxomics data were obtained from the same experiment (S1 Dataset and S1 Table) [35]. Briefly, cells were grown in glucoselimited chemostats at 37˚C with minimal medium and a fixed specific growth rate (μ) of 0.20 h -1 . The experimental glucose uptake rate (2.93 mmol gDCW -1 h -1 ) was used as a constraint, leaving the default lower and upper bounds for transport reactions. Maximisation of the biomass yield was selected as the objective function, and no flux value was forced through the biomass reactions (v biomass ). Directionalities of resulting flux values from TFA were compared on a reaction-by-reaction case against in vivo fluxes from 13 C-MFA, for which a mapping and directionality correction step was needed (S1 Table).

Generation of experimental design
The original matTFA toolbox uses unique values for t and I [20], and S is not taken into account (Table 1). To explore their potential impact in the predictive capabilities, a modified matTFA (mod-matTFA) allowing to consider alternative parameters values and methods was created ( Table 2). For the sake of reproducibility [36], the complete list of files used in this study was collected in S2 Table, and are publicly available in Nottingham SBRC's GitHub profile (https://github.com/SBRCNottingham/Impact-of-Physicochemical-Parameters-onthermodynamics-based-FBA). Analyses were performed using the COBRA toolbox [37] in MATLAB R2016b with the solver CPLEX 12.8.0 to ensure compatibility.
Since I affects the Gibbs energy of formation, an adjustment from the reference state (D f G o j ) was needed to obtain the standard transformed Gibbs energy of formation (D f G 0o j ) [32]. In the original matTFA [20] and other studies [26,28] the extended Debye-Hückel equation was used to adjust the Gibbs free energy values, with a proven validity for I < 0.1 M [31](Eq 3). The parameter B was assumed to be constant, with a value of 1.6 mol -1/2 L 1/2 [27,32]. Mod-matTFA also explored the impact of using the Davies equation (β = 0.3) (Eq 4) as an alternative adjustment approach, with a tested validity for I < 0.5 M [31].
Both formulas include terms correcting the pH and I, where N H (j) is the number of hydrogen atoms in species j, R is the gas constant, T is the absolute temperature and z j refers to the charge of the species [32]. Applying the Gibbs-Helmholtz equation would be necessary to account for temperature different from standard conditions, i.e. 25˚C, but the lack of measured changes in enthalpy (ΔH o ) for all the metabolites prevents from doing so [38]. Hence, variations from 25˚C to 37˚C were assumed to be small, as shown elsewhere [39]. The parameter A is normally assumed to be constant [27]or calculated using a temperature-dependent function Table 2. Factors considered in mod-matTFA. Values 0/1 refer to the binary codification for the full factorial design (S3 Table). In total, 2 6 combinations were tested. (Eq 5) [20,26], and the impact of using a temperature/salinity-dependent function (Eq 6) [38] was also tested in this study (Fig 1).
where the first term in (Eq 6) includes physical constants (Faraday's constant (F), vacuum permittivity (ε 0 ), gas constant (R) and Avogadro's constant (N A )), and the second is the temperature (T in K and t in˚C), and salinity (S) dependent functions to calculate the density (ρ sw ) [40]and the relative permittivity (ε sw ) [41]for seawater (S2 Table).
In general, consistency in units between parameters A (mol -1/2 kg 1/2 ) and B (mol -1/2 L 1/2 ) is achieved by assuming 1 kg = 1 L. In this study, an expression for seawater (Eq 7) [42]was used to estimate a salinity value by considering a buoyant density (ρ) for bacterial cells of 1.11 kg/L [43]. For I, a value of 0.25 M was used ( Table 2).

Assessment of fluxomics and metabolomics predictive capabilities
Mesophilic growth conditions for E. coli were selected as a case study to explore the impact of metabolic and physiochemical constraints on the predictive capabilities of TFA at the fluxomics and metabolomics level. Accordingly, 64 different factor combinations ( Table 2) were tested using mod-matTFA. It is important to note that not all test yielded a solution where cell growth was achieved (i.e. v biomass > 0 mmol gDCW -1 h -1 ). Since different factor combinations converged into the same set of solutions, tests were characterised at the fluxomics and PLOS COMPUTATIONAL BIOLOGY metabolomics levels by considering either the full set of values, or the subset with an experimental counterpart. Results yielding feasible solutions were also compared against 13 C-MFA flux values (S1 Table) and experimental metabolomics data (S1 Dataset), respectively. A goodness-of-fit analysis based on the Pearson correlation coefficient (r) was performed, as shown in [44]. In order to identify the test(s) with the best predictive capabilities at both levels, they were separately ranked according to two criteria: (i) correlation coefficient at the fluxomics level, and (ii) correlation coefficient at the metabolomics level. The concordance between results was assessed by the Kendall's W statistics (S2 Table), where a value of 0 means no agreement of ranking position with respect to each criterion, and a value of 1 indicates total agreement. This statistics is a normalisation of the Friedman test, which simply tests whether samples are from the same population or not [45]. Finally, a joint ranking after weighting the ranking position according to each criterion was considered (the higher the score, the better the correlation in both the fluxomics and metabolomics levels).

Thermodynamics-enriched network analysis
The constraining capacity of metabolites is not uniform, and depends on their connectivity in the network [20,46]. To further constrain the model, a priority list of metabolites to be quantified should be considered when designing the metabolomics protocol. In this study, the suitability of the selected dataset for this purpose was analysed (S1 Dataset). The importance of each metabolite in the network was measured by means of PageRank as implemented in MATLAB. This algorithm was developed by Google [47]and has been recently applied to metabolic networks [48]. In this sense, the presence of over-represented metabolites (e.g. proton donor) biases centrality measures [48]. Therefore, a removal of these currency [49], side [48]or pool [50]metabolites from the network was performed (S1 Appendix).
Non-redundant flux distributions from TFA were selected and subjected to network simplification and correction. Briefly, only active metabolites and reactions were kept, and stoichiometric coefficients were corrected so that they reflected the flux direction of each reaction. Centrality measures require a graph G, defined as a pair G = (V, E), where the vertices (or nodes) V are the metabolites, and the edges E the reactions connecting them. The stoichiometric matrix was converted into an adjacency matrix using an in-house script (S1 Appendix), which was later used to generate a G ready for the PageRank analysis. The final lists of metabolites were ranked by their centrality score, and the top 50% compared against the list of available experimental values.

Results and discussion
In the last two decades, biotechnology and systems biology have benefitted from the development of 13 C-MFA and FBA to measure and estimate intracellular metabolic fluxes in industrially relevant bacteria. Although the influence of thermodynamics in living systems has been considered for several decades, its application to study biochemical networks has been only recently enabled [24,32]. In this sense, a multitude of different approaches constraining wellestablished modelling approaches with thermodynamics have been suggested. Given its relevance, this study focused on analysing TFA (performed by matTFA toolbox [20]). This study aimed at: (i) assessing and improving TFA's reliability of predicting metabolic fluxes and metabolite concentration values, and (ii) identifying important metabolites to further constrain the model. In order to do so, (i) the published matTFA toolbox was modified to include a broader range of parameters (and parameter values) as well as alternative equations and constraints ( Table 2), and (ii) an in-house script was developed to perform a GSM-wide network analysis exploiting TFA-derived reaction directionalities.

Evaluation of the reliability of predicted flux and concentration values
A full factorial design comprising 2 6 tests ( Table 2) was applied in TFA to constrain the GSM iJO1366 [34], selecting the maximisation of biomass yield as the objective function. An experimental glucose uptake rate was set (2.93 mmol gDCW -1 h -1 ), reaching a μ � 0.28 h -1 (the experimental was 0.20 h -1 ) for all FBA and TFA tests. Overall, 26/64 tests were unsuccessful (no cell growth), and the remaining 38/64 converged into common optimal solutions (S4 Table). At the fluxomics level, a single flux distribution was achieved in FBA for all tests, whereas for TFA a different number of non-redundant solutions were found: 5 (when considering all reactions) or 4 (only those with an experimental counterpart). Likewise, at the metabolomics level, the 38 tests were reduced to 9 optimal solutions. Results were tested against available experimental data ( 13 C-MFA [35,51]and metabolomics [35]) by calculating the Pearson correlation coefficient. Therefore, each successful test was characterised by the optimal solutions it achieved and the correlation coefficients at both the fluxomics and metabolomics levels.
The importance of each factor was assessed by means of decision trees (CART in Minitab 19) (Table 3). Briefly, models were built considering categorical predictors (the factors after the codification (S3 Table)) and responses: the importance of a factor measured the improvement on the model when using it to split the data. Accordingly, the relative importance was calculated with respect to the best predictor ( Table 3). The I (M) was the top one for all responses except for TFA (full), where it equalized t (˚C) at 95.7% and was second to the adjustment method. In all cases, using either default concentrations values for AMP, ADP and ATP (as included in the original matTFA), or experimental data made no difference. As a result, tests only differing in this factor showed the same correlations with experimental data (Table 4).
Correlation coefficients for FBA in all tests was r � 0.02, whereas for TFA it varied within the range from 0.90 to 0.95. A reaction-by-reaction comparison of flux directionalities in central metabolism showed inherent differences between 13 C-MFA and FBA/TFA, as discussed in the last section of this study. At the metabolomics level, it ranged from 0.08 to 0.18 (S4 Table). Tests were ranked independently by both criteria, showing a notable agreement in their positions (Kendall's W � 0.81). Scoring the position according to each criterion allowed creating a joint ranking to identify the test(s) with the best predictive capability at both levels (Table 4). Four tests held the first position, since they all converged into the same optimal solutions (S4 Table 3. Relative factor importance. The type of analysis depended on the nature of the response: classification was selected for TFA (full), TFA (match 13 C-MFA), concentration values (full) and concentration values (match experimental), and regression for r (fluxomics) and r (metabolomics). The former was suited for categorical responses (i.e. which solution is achieved, as shown in S4 Table), and the latter for continuous responses (for Pearson's r, from -1 to +1).  Table). Specifically, t = 37˚C, I = 0.25 M and the Davies equations as adjustment method were used. Following the relative factor importance (Table 3), correlation coefficients were not affected by S and the selection of concentration values. This analysis showed that adjusting the physicochemical parameters to the experimental conditions did improve the predictive capabilities of TFA, but certain technical limitations at both levels need to be discussed. The nature of 13 C-MFA only allows determining the flux distribution in the central carbon metabolism by considering amino acid synthesis [13], which has been noted to be very robust against changes in the intermediate metabolite concentrations [52,53]. The recent discovery of non-enzymatic metabolism-like reactions suggests that current metabolic networks evolved from prebiotic reaction sequences. Therefore, a wellestablished flux distribution in the central pathways can be expected [54]. In order to discern among tests, focus on highly variable flux values should be promoted, but the variance among them was low (S2 Dataset). In fact, only 36/1679 showed a variance greater than zero, where 6 reactions had an experimental counterpart to compare against. Optimal solutions for all tests were similar (reducing the discerning capacity), which explained the overall high correlation coefficients for all tests. Therefore, results from the comparison of predicted and experimental metabolite concentration values are paramount to better understand the impact of varying the physicochemical parameters.

Concentration values (match
Regarding the metabolomics level, the 9 non-redundant solutions were subjected to a similar analysis. Likewise, only 46/972 metabolites had a variance among tests greater than zero (S3 Dataset), out of which 7 were quantified: L-aspartate, phosphoenolpyruvate, ATP, L-valine, pyruvate, NADP + , and FAD. Reliable quantitation of energy-carrying molecules and redox cofactors is not easily achievable, given the inherent cell dynamics (e.g. cell cycle and cell size variations) and degradation during extraction [55][56][57][58][59][60][61][62][63]. Since the correlation coefficients were calculated using a dataset blind to highly variable metabolites (e.g. 3-phosphohydroxypyruvate ranged four orders of magnitude), resulting values were similar for different factor Table 4. Tests with the highest score in the joint ranking. The full list is available in (S4 Table). � (run #3) reflects the conditions used in the original matTFA.   Table).

PLOS COMPUTATIONAL BIOLOGY
Thermodynamics-based stoichiometric modelling under mesophilic growth conditions combinations (Table 4). Thus, said metabolites should be quantified to deconvolute the impact of using default or experimental concentration values in the predictive capabilities.
Other limitations refer to the design of the tool itself. This method does not consider other complex phenomena affecting the thermodynamic feasibility of metabolic pathways, such as Mg complexation with metabolites, or compound dissociation into more than two protonated species [19,20](as shown in the file calcDGspecies.m). In addition, Gibbs free energy values are relaxed when no feasible solution is found, so the constraining power of experimental metabolite concentration values is reduced [20]. Related to this, an approach allowing to identify metabolites to further constrain the model was developed in this study (next section). Finally, it should be noted that to apply matTFA to thermophilic species (e.g. Thermus thermophilus, a potential non-model metabolic engineering platform [64]), recent methods to adjust Gibbs free energies to high temperatures should be considered [65].

Identification of central metabolites to further constrain the model
Successful tests converged into 5 solutions at the fluxomics level (S4 Table), which are structurally equivalent. Therefore, a single stoichiometric matrix was considered for further analysis. After the simplification step (removal of inactive metabolites and reactions, as well as side compounds) 622/1805 metabolites were left in the network. The experimental dataset included information about 44 metabolites (S1 Dataset), out of which 34 were also considered in the simplified network, and the rest was discarded as side compounds.
PageRank scores were calculated, allowing to identify metabolites in the top 50% for which experimental data was available (Table 5). Overall, 18/34 quantified metabolites were in the top 50%, with only 7 in the top 10%. The lack of high centrality for most metabolites explains the aforementioned result, where tests only differing in the set of concentrations values used as a constraint (default ATP/ADP/AMP or experimental) led to the same optimal solution (e.g. tests 20 and 52, Table 3). Table 5. Quantified metabolites in the top 50% of PageRank (PR) based analysis. The last position in the ranking (#622) was L-Tyrosine (PR score = 0.0004), which had been quantified. The full list can be found in (S4 Dataset).

Quantile
Ranking position Metabolite Node PR score The priority list is led by L-glutamate, pyruvate, 2-oxoglutarate (not quantified), D-fructose-6-phosphate and glyceraldehyde 3-phosphate (not quantified). Both L-glutamate and 2-oxoglutarate participate in the assimilation of nitrogen in E. coli, where the former also plays a role as nitrogen donor in the biosynthesis of nucleic acids [66]. The latter along with the rest (except for glyceraldehyde 3-phosphate), and acetyl-CoA are important biosynthetic precursors used in modelling [49]. Accordingly, other metabolites participating in central pathways such as glycolysis (glyceraldehyde 3-phosphate, dihydroxyacetone phosphate, etc.) and protein biosynthesis (amino acids) were also identified. Important metabolites highlighted here agree with results from the seminal work by Wagner et al. [49], where they used a smaller network (317 vs. 931 reactions). Due to computational costs, other attempts specifically focusing on the constraining capacity with regards to TFA (Thermodynamics-based Metabolite Sensitivity Analysis, TMSA) are also limited by the network size (156 reactions in [46]). In particular, this approach identified pyruvate as the most significant metabolite in terms of reducing the variability in the thermodynamic properties of reactions, and attributed it to its high connectivity in the network. Other important compounds included phosphate, NAD + , NADH, CO 2 , menaquinol-8, menaquinone-8 and D-lactate. All but the latter were classified as side compounds for this study (and therefore excluded (S1 Appendix)), since the centrality measures are biased by ubiquitous metabolites [48].
The impact of the inherent dynamics (cell cycle and cell ageing) has been pointed out as a source of metabolic heterogeneity in clonal microbial populations [55]. In a chemostat, cells are maintained at the exponential growth phase, but the cell cycle is not synchronised across single cells unless forced [56,57]. In E. coli, concentration values for NAD(P)H oscillate along the cell cycle [58], and ATP concentration values show an asymmetric distribution across single cells in a continuous culture [59]. From a metabolomics point of view, an unbiased extraction and quantitation method is yet to be developed [60]. Particularly, ATP/ADP/AMP quantitation require specific culture conditions [61], and nicotinamides parallel protocols to avoid degradation. Overall, the method developed here generated a priority list to be considered when selecting a metabolomics protocol aiming at providing data to further constrain a model in TFA.

Reaction directionalities in the central carbon metabolism
Finally, flux pattern changes between in vivo and in silico fluxes in the central carbon metabolism were analysed, with a particular focus on the anaplerotic reactions. The 'anaplerotic node' (Fig 2) consists of carboxylation/decarboxylation reactions including intermediates participating in the tricarboxylic acid (TCA) cycle that are used for biosynthesis of amino acids [67]. Given the fact similar MIDs (from proteinogenic amino acids) can be obtained from different precursors, 13 C-MFA has been noted to show a limited capability to elucidate fluxes around the anaplerotic node [52,68,69]. In order to evaluate changes in reaction directionalities, the available in vivo fluxes were tested against their equivalents in the simulated TFA flux distributions (S1 Table). Overall, 13/40 flux directions disagree between approaches (Table 6).
Discrepancies in flux pattern between methods are caused by both differences in the structure of the metabolic networks and the way the problem is defined (Table 1). On the one hand, iJO1366 includes 8 reactions concerning the anaplerotic node and the glyoxylate shunt: PPC and PPCK (between phosphoenolpyruvate and oxaloacetate), PYK and PPS (between phosphoenolpyruvate and pyruvate), ME1 and ME2 (between pyruvate and malate) (Fig 2), and finally ICL and MALS (from isocitrate to malate, via glyoxylate). In contrast, the metabolic network used for the 13 C-MFA did not consider PPCK and PPS (S1 Table), which could affect the determination of fluxes to/from phosphoenolpyruvate. Since 13 C-MFA is based on lumped reaction, branched pathways are not taken into account [13]. Thus, having a smaller range of alternative pathways than FBA/TFA may affect the estimation of flux values.
On the other hand, in silico flux distributions are the result of optimising the system according to the chosen objective function. Accordingly, when maximising the biomass production (which requires ATP), FBA and TFA promote pathways that reduce wasting ATP in the optimal solution [14]. For instance, PPCK (ATP-consuming reaction) carried no flux. In contrast, experimental data from E. coli grown on glucose has proven that both PPC and PPCK (which constitute a futile cycle) are active and play a role in metabolic regulations [70]. However, Anaplerotic node for E. coli. Set of carboxylation/decarboxylation reactions including phosphoenolpyruvate, pyruvate, oxaloacetate, and malate. Arrows indicate the expected direction of carbon fluxes. Boxes refer to reactions: blue when they are defined in both the GSM and the metabolic network used for 13 C-MFA, and orange when they are exclusively considered in the GSM. In the latter case no mapping was possible (S1 Table). https://doi.org/10.1371/journal.pcbi.1007694.g002 given the fact that ICL and ME1/ME2 do not generate any ATP, fluxes are shut down in the simulated flux distributions (as shown in [52]). In this sense, it should be noted that stochastic events or regulatory processes have been suggested to provoke a variation of the fluxes through PPCK and ME1/ME2 [71]. FBA/TFA also faced problems regarding the overflow metabolism: acetate was predicted to be produced (PTAr and ACKr), as opposed to the lack of flux according to 13 C-MFA.
Even though flux pattern changes between predicted and experimentally determined intracellular fluxes were present, TFA offered a reliable prediction of intracellular fluxes (Table 4). This overall consistency has been noted in the literature by comparing an array of different objective functions and constraints (based on split ratios rather than on mapping on a reaction-by-reaction case) [15]. A combination of both approaches to overcome their limitations and different flux space solutions has also been suggested [72,73]. However, fluxes concerning the TCA cycle, the glyoxylate shunt and acetate secretion have proven to be difficult to predict [15], as also shown in this study. Similarly, other reactions are also affected by the substrate uptake rate: ALCD2x becomes unidirectional at high glucose levels [28].
In addition, the nonlinear dependency of the anaplerotic fluxes on the growth rate has been reported in the literature, limiting the reliability of conclusions from experiments using single dilution rates [70,71]. Particularly, metabolic fluxes through the aforementioned futile cycle are expected to be active under glucose-limited growth conditions [74], rather than being totally shut down (Fig 2). In this sense, a higher degree of consistency between predicted and experimental flux distributions could have been achieved by (i) focusing on data from cultures with high dilution rates, so that futile cycle activity is lowered and the flux distribution becomes closer to the optimal solution, or (ii) applying further constraints to properly model the anaplerotic reactions [75]. The first option is limited by the lack of published data at both Where +, flux in the forward direction; -, flux in the reverse direction; 0, no flux. Corrected direction, refers to the adjustments due to differences in the definition of the reaction between 13 C-MFA and GSM (S1 Table). For example the case of ALCD2x: in vivo flux ( 13 C-MFA) suggests production of ethanol, whereas the in silico one (GSM/TFA) predicts consumption of ethanol. Since reactions are defined in opposite directions, a correction becomes necessary. Discrepancy between corrected directions and predicted ones allowed an automated identification of flux pattern changes.
https://doi.org/10.1371/journal.pcbi.1007694.t006 the metabolomics and fluxomics levels from the same experiment, and the second one by the lack of implementation. Consequently, it was assumed that the high correlation coefficient achieved for TFA against in vivo fluxomics data (r � 0.9) was high enough to enable the analyses on the impact of varying the physicochemical parameters in the predictive capabilities. Studying flux pattern changes on a reaction-by-reaction basis also allowed to confirm previously reported limitations from both 13 C-MFA and FBA/TFA with regards to the anaplerotic node [68,69,75]. Thus, metabolites in the node are expected to be directly affected.

Conclusions
This study showed that the predictive capabilities of TFA can be potentially improved by using physicochemical parameters closer to the experimental conditions and adequate equations. In addition, we proposed a method based on centrality measures to identify important metabolites allowing to further constrain the TFA. In contrast to previous attempts, our strategy is not limited by the size of the network and is computationally cheap. Therefore, a preliminary TFA could be considered when designing a metabolomics protocol to maximise the constraining power of the experimental concentration values. Overall, our study stressed the necessity of performing an in-depth assessment of available methods in the fluxomics field. For instance, interesting published potential solutions to known problems (e.g. elucidation of the anaplerotic fluxes) should be integrated with the widely used approaches. This should increase the degree of standardisation in the community, allowing to cross-validate novel strategies and improving the reliability of the simulated data.
Supporting information S1 Appendix. Generation of directed graphs and side compounds.