E-Flux2 and SPOT: Validated Methods for Inferring Intracellular Metabolic Flux Distributions from Transcriptomic Data

Background
Several methods have been developed to predict system-wide and condition-specific intracellular metabolic fluxes by integrating transcriptomic data with genome-scale metabolic models. While powerful in many settings, existing methods have several shortcomings, and it is unclear which method has the best accuracy in general because of limited validation against experimentally measured intracellular fluxes.

Results
We present a general optimization strategy for inferring intracellular metabolic flux distributions from transcriptomic data coupled with genome-scale metabolic reconstructions. It consists of two different template models called DC (determined carbon source model) and AC (all possible carbon sources model) and two different new methods called E-Flux2 (E-Flux method combined with minimization of l2 norm) and SPOT (Simplified Pearson cOrrelation with Transcriptomic data), which can be chosen and combined depending on the availability of knowledge on carbon source or objective function. This enables us to simulate a broad range of experimental conditions. We examined E. coli and S. cerevisiae as representative prokaryotic and eukaryotic microorganisms respectively. The predictive accuracy of our algorithm was validated by calculating the uncentered Pearson correlation between predicted fluxes and measured fluxes. To this end, we compiled 20 experimental conditions (11 in E. coli and 9 in S. cerevisiae) of transcriptome measurements coupled with corresponding central carbon metabolism intracellular flux measurements determined by 13C metabolic flux analysis (13C-MFA), which is the largest dataset assembled to date for the purpose of validating inference methods for predicting intracellular fluxes. In both organisms, our method achieves an average correlation coefficient ranging from 0.59 to 0.87, outperforming a representative sample of competing methods. Easy-to-use implementations of E-Flux2 and SPOT are available as part of the open-source package MOST (http://most.ccib.rutgers.edu/).

Conclusion
Our method represents a significant advance over existing methods for inferring intracellular metabolic flux from transcriptomic data. It not only achieves higher accuracy, but it also combines into a single method a number of other desirable characteristics including applicability to a wide range of experimental conditions, production of a unique solution, fast running time, and the availability of a user-friendly implementation.
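
The validation metric named in the abstract, the uncentered Pearson correlation, is the dot product of two vectors divided by the product of their Euclidean norms, with no mean subtraction. A minimal Python sketch follows; the function name and the flux values are illustrative assumptions, not data from the study.

```python
import math


def uncentered_pearson(predicted, measured):
    """Uncentered Pearson correlation: dot(x, y) / (||x|| * ||y||)."""
    if len(predicted) != len(measured):
        raise ValueError("flux vectors must have the same length")
    dot = sum(p * m for p, m in zip(predicted, measured))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_m = math.sqrt(sum(m * m for m in measured))
    if norm_p == 0.0 or norm_m == 0.0:
        raise ValueError("correlation is undefined for an all-zero vector")
    return dot / (norm_p * norm_m)


# Hypothetical fluxes for four reactions (arbitrary units).
predicted = [10.0, 8.0, 1.5, 0.0]
measured = [9.0, 7.0, 2.0, 0.5]
r = uncentered_pearson(predicted, measured)
```

Because the means are not subtracted, the value depends only on the direction of the two flux vectors, so rescaling either vector by a positive constant leaves it unchanged.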

to simulate the unknown carbon source situation. As listed in Additional file 5, a total of 108, 154, and 158 potential carbon sources were allowed to be taken up by the cell for the iND750, iMM904, and Yeast 5 models of S. cerevisiae, respectively. To test whether Yeast 5 performs worse than the older models because it has more carbon source uptake reactions (leading to more incorrect carbon sources to confound the prediction method), we performed SPOT again after blocking the uptake of model-specific carbon sources, leaving 106 exchange reactions that are common across all three models (see Additional file 7 for details). In other words, we updated the three yeast AC models so that they all have the same set of 106 possible carbon source exchange reactions, which we call the AC common (All possible common Carbon sources) model in the figure to distinguish it from the original AC model.
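
Building the AC common model amounts to intersecting the carbon source exchange reactions of the three models and blocking whatever is left over in each one. The sketch below illustrates that set operation only; the reaction identifiers and the dictionary layout are hypothetical stand-ins for the lists in Additional files 5 and 7.

```python
def common_carbon_sources(uptake_by_model):
    """Return the exchange reactions that are open for uptake in every model."""
    sets = [set(rxns) for rxns in uptake_by_model.values()]
    return set.intersection(*sets)


def block_model_specific(uptake_by_model):
    """Map each model to the uptake reactions that must be blocked
    so that only the common set remains open."""
    common = common_carbon_sources(uptake_by_model)
    return {model: set(rxns) - common for model, rxns in uptake_by_model.items()}


# Hypothetical, heavily truncated uptake lists
# (the real models allow 108, 154 and 158 carbon sources).
uptake = {
    "iND750": ["EX_glc", "EX_fru", "EX_etoh"],
    "iMM904": ["EX_glc", "EX_fru", "EX_etoh", "EX_xyl"],
    "Yeast 5": ["EX_glc", "EX_fru", "EX_etoh", "EX_xyl", "EX_arab"],
}
common = common_carbon_sources(uptake)
blocked = block_model_specific(uptake)
```

In this toy input the common set has three reactions, and Yeast 5 has two model-specific uptake reactions to block.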

As shown in the AC common + SPOT case of the figure above, although the average correlation of Yeast 5 was improved from 0.5313 to 0.5791 after reducing the number of carbon source uptake reactions to 106, the older yeast models still outperform Yeast 5, which suggests that the lower correlation achieved by SPOT on the Yeast 5 AC model is not simply due to having a greater number of possible carbon sources.

Interestingly, unlike the older models, the performance of Yeast 5 in AC+SPOT was much improved by limiting the uptake of well-known by-products of yeast such as ethanol and glycerol [1]. As shown in the (AC common -

This lemma can be proved by contradiction.

Let us assume that this statement is false. Then its negation, i.e. if $v^*$ optimizes (i.e. maximizes) problem 2, then $v^*$ does not optimize problem 1, should be true. In other words, writing $g$ for the expression vector and $v$ for a flux vector, we assume that another vector $v'$ exists such that

$$\frac{g \cdot v'}{\lVert g \rVert \, \lVert v' \rVert} > \frac{g \cdot v^*}{\lVert g \rVert \, \lVert v^* \rVert},$$

where $\lVert v' \rVert = 1$.

Substituting $\lVert v' \rVert = 1$ into the term on the left side of the inequality above gives $\frac{g \cdot v'}{\lVert g \rVert}$.

In addition, since $v^*$ optimizes problem 2 by assumption, $\lVert v^* \rVert \le 1$. Note that no optimal solution to problem 2 has $\lVert v \rVert < 1$ because $g$ is a non-negative vector and we assume that $g \cdot v^* > 0$ (for otherwise, the problem is trivial). Thus, if $\lVert v \rVert < 1$, the objective of problem 2 can always be increased by scaling up $v$ until $\lVert v \rVert = 1$. Hence $\lVert v^* \rVert = 1$.

The right side of the inequality, thus, is equal to $\frac{g \cdot v^*}{\lVert g \rVert}$. Therefore,

$$g \cdot v' > g \cdot v^*.$$

This contradicts our assumption that $v^*$ is a vector that optimizes problem 2. Since the negation is impossible (false), the original statement is true. Note that the lemma is still valid if the number used to limit $\lVert v \rVert$ in problem 2 is any constant $c > 0$.

Lemma. If the optimal value of the SPOT problem is strictly positive, then its solution is unique.
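Step 1. First we prove that the optimal solution $v^*$ satisfies $\lVert v^* \rVert = 1$. If $\lVert v^* \rVert < 1$, then for sufficiently small $\varepsilon > 0$ the vector $(1+\varepsilon)v^*$ satisfies $\lVert (1+\varepsilon)v^* \rVert \le 1$ (and the other constraints) and $g \cdot (1+\varepsilon)v^* > g \cdot v^*$ because $g \cdot v^* > 0$, contradicting the maximality of $g \cdot v^*$.

Step 2. Because of the maximality of $g \cdot v^*$, the affine plane $\{v : g \cdot v = g \cdot v^*\}$ is a supporting plane for the convex set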
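
The scaling argument of Step 1 can be checked numerically: stretching a vector with norm below 1 to unit norm raises its dot product with the expression vector while leaving the uncentered correlation unchanged. This is a small sketch under assumed notation (`g` for expression, `v` for fluxes) with made-up numbers; it illustrates the argument and is not part of the proof.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def uncentered_corr(a, b):
    return dot(a, b) / (norm(a) * norm(b))


def scale_to_unit_norm(v):
    """Rescale v so that its Euclidean norm is 1."""
    n = norm(v)
    if n == 0.0:
        raise ValueError("cannot rescale the zero vector")
    return [x / n for x in v]


g = [3.0, 1.0, 0.0, 2.0]   # hypothetical non-negative expression vector
v = [0.3, 0.1, 0.2, 0.1]   # hypothetical flux vector with norm < 1 and g.v > 0
v_unit = scale_to_unit_norm(v)

objective_before = dot(g, v)       # value of the simplified objective at v
objective_after = dot(g, v_unit)   # strictly larger after scaling up
```

Since `norm(v)` is below 1 and `dot(g, v)` is positive, `objective_after` exceeds `objective_before`, which is why a vector strictly inside the unit ball cannot be optimal when the optimal value is positive.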

The average correlation of standard FBA in Table 3 was calculated using the solutions obtained in Step 1 under our computational settings (see Methods in the main manuscript for the detailed settings). Step 2 allows us to find a metabolic flux distribution that achieves the theoretically maximal or minimal correlation with the measured fluxes while maintaining the optimal biomass flux obtained in Step 1. The nonlinear optimization problem in Step 2 was solved using the sequential quadratic programming (SQP) algorithm provided by the MATLAB function fmincon.
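
The idea behind Step 2 can be illustrated on a toy case in which the alternate optima form a one-parameter family. The sketch below replaces the SQP solve with a brute-force scan over that parameter, so it is not the procedure used here; the network, the optimal biomass value, and the measured fluxes are all hypothetical.

```python
import math


def uncentered_corr(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def correlation_range(flux_at, measured, steps=1000):
    """Scan the family flux_at(t), t in [0, 1], and return
    (min_corr, max_corr) against the measured fluxes."""
    corrs = [uncentered_corr(flux_at(i / steps), measured) for i in range(steps + 1)]
    return min(corrs), max(corrs)


BIOMASS_OPT = 1.0  # hypothetical optimal biomass flux from Step 1


def flux_at(t):
    # Two parallel routes carry fractions t and 1 - t of the flux;
    # the biomass flux is held at its optimum for every t.
    return [t * BIOMASS_OPT, (1.0 - t) * BIOMASS_OPT, BIOMASS_OPT]


measured = [0.7, 0.3, 1.0]  # hypothetical 13C-MFA fluxes
low, high = correlation_range(flux_at, measured)
```

Every flux vector in the scanned family has the same biomass flux, yet the correlation with the measured fluxes varies across the family, which is the spread that the lower and upper bounds quantify.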

Importantly, the maximum possible correlation can be calculated only when the measured flux datasets are already known. There is no way to force a method to produce the metabolic flux distribution that achieves the best correlation with the measured fluxes. Our methods were developed in the course of searching for such a way and of rigorously testing various strategies.

In the same way, the lower and upper bounds of the correlation for E-Flux can be calculated as follows:

Step 1. E-Flux
Step 2. Calculation of the possible range of correlation