Accurate prediction of flux distributions compatible with metabolite concentration effects in genome-scale metabolic networks

doi:10.1371/journal.pcbi.1014066

Fig 1.

Schematic overview of KineFlux.

A. A toy example of a metabolic network with two reactions, R₁ and R₂. Reaction R₁ converts two substrates, S₁ and S₂, into two products, M₃ and M₄; it is catalyzed by an enzyme with abundance E₁ and carries a reaction flux. The flux-sum of M₃ is defined as the total flux producing this metabolite. B. The apparent turnover number of R₁ is determined by dividing the reaction flux by the corresponding enzyme abundance for each experimental scenario (e.g., strain, condition), denoted by a flask of a different color. The maximum apparent catalytic rate across all strains is used to calculate the metabolite concentration effects, η, on the reaction flux for each experiment and reaction with available data. C. The features used to predict the metabolite concentration effects, η, for reaction R₁ are the metabolite flux-sums. D. Model selection identifies the optimal combination of flux-sums for a single metabolite, a pair, or a triplet of metabolites to be used as additional predictors together with the flux-sums of the reaction substrates. For reaction R₁ in the toy example, the two substrates and four additional metabolites yield C(4,0)+C(4,1)+C(4,2)+C(4,3)=15 combinations, of which seven are displayed. The best model is chosen based on the adjusted R², identifying the flux-sums of the substrates alongside the pair (M₂, M₃). E. A logit regression model is then trained to predict η as a function of metabolite flux-sums. F. Finally, a constraint-based optimization problem that includes the logit regression models as constraints is solved to predict a flux distribution, which is evaluated against estimated fluxes.
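Panels B and D can be illustrated with a minimal Python sketch. It assumes that η is the ratio of the apparent turnover number to its maximum across conditions; the caption does not state the formula, so this is an assumption. All numbers and the metabolite labels `M1` to `M4` are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical fluxes and enzyme abundances for one reaction in three conditions.
fluxes = [2.0, 3.0, 1.5]
abundances = [1.0, 1.0, 1.5]

# Panel B: apparent turnover number = flux / enzyme abundance, per condition.
k_app = [v / e for v, e in zip(fluxes, abundances)]
k_max = max(k_app)

# Assumption: eta is the apparent turnover relative to its maximum, so 0 < eta <= 1.
eta = [k / k_max for k in k_app]


def logit(p, eps=1e-9):
    """Logit transform, clipped so that eta = 1 does not give an infinite value."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


# Panel D: candidate sets of additional predictors (none, single, pair, triplet)
# drawn from four hypothetical metabolites; substrates are always included.
additional = ["M1", "M2", "M3", "M4"]
candidate_sets = [c for size in range(4) for c in combinations(additional, size)]

print(eta)
print(len(candidate_sets))  # C(4,0)+C(4,1)+C(4,2)+C(4,3)
```

The enumeration reproduces the count of 15 candidate models given in the caption; each candidate would then be fitted and ranked by adjusted R².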


Fig 2.

Performance of logit regression models for metabolite concentration effects and their implication on flux predictions in E. coli.

A. The histogram illustrates the performance of the logit regression models in predicting metabolite concentration effects, based on their adjusted R², for 339 reactions, each with more than 10 values corresponding to different E. coli knock-out strains. Among these, 92 reactions achieved an adjusted R² greater than 0.5. The respective logit models were in turn used in the constraint-based optimization problem. B. Comparison of the predicted flux from the optimization problem with the estimated flux for the phosphoglycerate kinase (PGK_b) reaction, resulting in a Pearson correlation coefficient of 0.90 (p-value = ). C. The histogram presents the number of reactions based on the Pearson correlations between their predicted and estimated fluxes.
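The two scores used in this figure, adjusted R² and the Pearson correlation, can be computed as in the sketch below. It uses the standard textbook definitions; the function names and the example values are illustrative and are not taken from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def adjusted_r2(observed, fitted, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


# Hypothetical observed and fitted values for one reaction, two predictors.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
fitted = [1.1, 1.9, 3.2, 3.8, 5.0]
print(adjusted_r2(observed, fitted, n_predictors=2))
print(pearson(observed, fitted))
```

Adjusted R² penalizes additional predictors, which matters here because the candidate models differ in how many flux-sums they include.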


Table 1.

Comparison of KineFlux performance with established constraint-based modeling approaches.


Fig 3.

Enrichment analysis for reactions with well-predicted fluxes.

A. Comparison of the predicted and estimated flux distributions for a representative knock-out strain, pgi6. The fluxes are logarithmically transformed, with a small constant () added to all values to avoid taking the logarithm of zero. The Pearson correlation between the predicted and estimated fluxes is 0.87 (p-value = 0.0). A prediction interval band, corresponding to a 90% confidence level, is depicted in light blue. The reactions inside this interval band are considered to have well-predicted fluxes. Highlighted reactions outside of the confidence region include: TPI_b (Triose-phosphate isomerase), ADK1 (Adenylate kinase), PPM_b (Phosphopentomutase), GSNK (Guanosine kinase), NTD9 (5’-nucleotidase (GMP)), ADK3 (Adenylate kinase (GTP)), NACODA (N-acetylornithine deacetylase), and EDA (2-dehydro-3-deoxy-phosphogluconate aldolase). B. The mean and standard deviation of the proportions of reactions with well-predicted fluxes across subsystems for all knock-out strains. The value above each bar indicates the number of knock-out strains in which the corresponding metabolic subsystem is significantly enriched with reactions exhibiting well-predicted fluxes, determined using a hypergeometric test with Bonferroni-corrected p-values below the 0.02 significance threshold.
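The enrichment test in panel B can be sketched as a one-sided hypergeometric test followed by a Bonferroni correction. The sketch assumes the correction multiplies each p-value by the number of subsystems tested; the caption does not say which family of tests was corrected, and the subsystem names and counts are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n items from N, of which K are 'successes'."""
    upper = min(K, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


def enriched_subsystems(counts, n_total, n_well, alpha=0.02):
    """counts maps subsystem -> (reactions in subsystem, well-predicted among them).

    Assumption: Bonferroni correction over the number of subsystems tested.
    """
    m = len(counts)
    hits = []
    for name, (size, well) in counts.items():
        p = hypergeom_sf(well, n_total, size, n_well)
        if min(1.0, p * m) < alpha:
            hits.append(name)
    return hits


# Hypothetical strain: 100 reactions, 20 of them inside the prediction band.
counts = {"subsystem_A": (10, 9), "subsystem_B": (10, 2)}
print(enriched_subsystems(counts, n_total=100, n_well=20))
```

Repeating this per strain and counting how often each subsystem passes the threshold would give the kind of number shown above each bar.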


Fig 4.

Performance of logit regression models for metabolite concentration effects and evaluation of predicted flux distributions in S. cerevisiae.

A. The histogram illustrates the performance of the logit regression models in predicting metabolite concentration effects, η, based on their adjusted R². The data set comprises 281 reactions, each with more than 10 entries corresponding to different S. cerevisiae conditions. Among these, 73 reactions achieved an adjusted R² greater than 0.6; their models were used in the constraint-based optimization problem. B. The plot compares the predicted flux from the optimization problem with the estimated flux for the reaction r_0569 (inorganic diphosphatase), resulting in a Pearson correlation coefficient of 0.95 (p-value = ). C. The histogram presents the number of reactions based on the Pearson correlations between their predicted and estimated fluxes. In total, there are 418 reactions with at least 80% non-zero estimated fluxes across different conditions. More than 80% of these reactions have a Pearson correlation greater than 0.8 between estimated and predicted fluxes. D. Comparison of the predicted flux distribution with the estimated flux distribution for a representative condition, Yu2021_N30_035R2, which corresponds to the second biological replicate of nitrogen-limited chemostat growth at a dilution rate of 0.35 and a carbon-to-nitrogen (C/N) ratio of 30 [39]. The fluxes are logarithmically transformed, with a small constant () added to all values to avoid taking the logarithm of zero. The Pearson correlation between the predicted and estimated fluxes is 0.86 (p-value = 0.0). A prediction interval band, corresponding to a 90% confidence level, is included. The reactions inside the prediction interval band are considered to have well-predicted fluxes. Highlighted reactions outside of the confidence region include: r_1021_f (succinate dehydrogenase (ubiquinone-6)), r_0815_b (O-succinylhomoserine lyase (L-cysteine)), r_0326_f (dCMP deaminase), r_3533_b (NAD transport, cytoplasm-ER membrane), r_1128_f (citrate transport), r_3534_f (glycerol 3-phosphate transport, cytoplasm-ER membrane), and r_1112_b (AKG transporter). E. The mean and standard deviation of the proportions of reactions with well-predicted fluxes across subsystems for all conditions. The analysis is limited to subsystems with more than 30 reactions. The value above each bar indicates the number of conditions in which the subsystem is significantly enriched with reactions exhibiting well-predicted fluxes, determined using a hypergeometric test with Bonferroni-corrected p-values below the 0.02 threshold.
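The reaction filter described in panel C (at least 80% non-zero estimated fluxes across conditions) can be written as a short helper. This is a sketch under the assumption that a flux counts as non-zero when it differs from exactly 0; the source does not state a tolerance, and the reaction identifiers below are hypothetical.

```python
def mostly_nonzero(fluxes_by_reaction, min_fraction=0.8):
    """Return reactions whose share of non-zero fluxes is at least min_fraction."""
    kept = []
    for reaction, values in fluxes_by_reaction.items():
        nonzero = sum(1 for v in values if v != 0)
        if nonzero / len(values) >= min_fraction:
            kept.append(reaction)
    return kept


# Hypothetical estimated fluxes for three reactions across five conditions.
estimated = {
    "rxn_a": [1.0, 2.0, 3.0, 4.0, 5.0],  # 100% non-zero, kept
    "rxn_b": [0.0, 0.0, 1.0, 2.0, 3.0],  # 60% non-zero, dropped
    "rxn_c": [0.0, 1.0, 1.0, 1.0, 1.0],  # 80% non-zero, kept
}
print(mostly_nonzero(estimated))
```

Filtering first avoids computing correlations for reactions whose estimated flux is almost always zero, where the coefficient is poorly defined.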


Fig 5.

Performance of KineFlux to predict flux distribution for unseen conditions in E. coli.

A. Histogram showing the distribution of Pearson correlation coefficients between predicted and estimated fluxes of the reactions across various growth conditions, each associated with a different carbon source [1]. B. Comparison of the predicted and estimated fluxes under the condition GLC_CHEM_mu = 0.21_V from Davidi et al. [1], which corresponds to a chemostat culture with a growth rate of 0.21 using glucose as the carbon source [40]. All flux values are log-transformed, with a small constant () added to prevent taking the logarithm of zero. Highlighted reactions located farthest from the diagonal, with non-zero predicted fluxes, include: MALS (Malate synthase), ADD (Adenine deaminase), FORtppi (Formate transport via diffusion), FDH4pp (Formate dehydrogenase (quinone-8)), ENO_f (Enolase), TPI_b (Triose-phosphate isomerase), PUNP5_f (Purine-nucleoside phosphorylase (Inosine)), FADRx (FAD reductase), GAPD_f (Glyceraldehyde-3-phosphate dehydrogenase), and PGK_b (Phosphoglycerate kinase).
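One way to pick the reactions farthest from the diagonal in panel B is to rank them by their perpendicular distance to the line y = x in log space. The sketch below is an illustration only: the base-10 logarithm, the use of absolute flux values, and the constant `eps` are assumptions, since the caption does not give the value of the added constant or the selection rule.

```python
import math


def farthest_from_diagonal(predicted, estimated, eps, top=3):
    """Rank reactions with non-zero predicted flux by distance to y = x in log space.

    eps is the small constant added before taking logarithms; its value is a
    caller-supplied assumption, not the one used in the study.
    """
    distance = {}
    for reaction, p in predicted.items():
        if p == 0:
            continue  # only reactions with non-zero predicted flux are ranked
        log_p = math.log10(abs(p) + eps)
        log_e = math.log10(abs(estimated[reaction]) + eps)
        # Perpendicular distance from the point (log_e, log_p) to the line y = x.
        distance[reaction] = abs(log_p - log_e) / math.sqrt(2.0)
    return sorted(distance, key=distance.get, reverse=True)[:top]


# Hypothetical fluxes for four reactions.
predicted = {"rxn_a": 1.0, "rxn_b": 10.0, "rxn_c": 0.0, "rxn_d": 1.0}
estimated = {"rxn_a": 1.0, "rxn_b": 0.1, "rxn_c": 5.0, "rxn_d": 2.0}
print(farthest_from_diagonal(predicted, estimated, eps=1e-6, top=2))
```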
