IGM: Integrated Gene-expression Modeling for multi-condition flux-preserving genome-scale metabolic models

doi:10.1371/journal.pone.0342294

Fig 1.

Workflow of the IGM framework.

(A) Gene expression data are transformed into a table of normalized relative expression values, ranging from 0 to 1. (B) Uptake rates and the genome-scale metabolic model (GEM) are provided as inputs for flux balance analysis (FBA) and flux variability analysis (FVA). (C) Relative gene expression values are mapped to expression variables and processed through gene–protein–reaction (GPR) rules to associate gene expression with reactions. (D) The mixed-integer linear programming (MILP) formulation of IGM determines the flux distribution for each condition by minimizing the difference between relative fluxes and relative gene expression while maximizing biomass production.
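
Step (C) can be illustrated with a minimal sketch of how a GPR rule might turn gene-level values into a single reaction-level value. The caption does not state how IGM combines genes, so the sketch assumes a widely used convention (AND takes the minimum, OR takes the maximum); the function name `gpr_expression`, the nested-tuple rule format, and the gene identifiers are hypothetical.

```python
# Hypothetical helper: evaluate a GPR rule on relative expression values (0 to 1).
# Assumption (not confirmed by the caption): AND, an enzyme complex, is limited
# by its least expressed subunit (min); OR, isozymes, takes the best one (max).
def gpr_expression(rule, expr):
    """rule is a gene id (str) or a tuple ("and" | "or", sub_rule, ...)."""
    if isinstance(rule, str):
        return expr[rule]
    op, *parts = rule
    values = [gpr_expression(part, expr) for part in parts]
    if op == "and":
        return min(values)
    if op == "or":
        return max(values)
    raise ValueError(f"unknown operator: {op}")

# Toy relative expression values for three made-up genes.
expr = {"g1": 0.8, "g2": 0.3, "g3": 0.6}

# (g1 and g2) or g3
rule = ("or", ("and", "g1", "g2"), "g3")
reaction_level = gpr_expression(rule, expr)
```

Here the complex `g1 and g2` is capped at 0.3 by `g2`, and the isozyme `g3` raises the reaction-level value to 0.6.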

Fig 2.

Performance comparison of IGM, FBA, and regularized variants.

(A–B) Predictive accuracy of IGM, FBA, and their L1/L2 regularized variants, evaluated by correlation coefficient (A) and normalized root mean square error (NRMSE) (B) between predicted fluxes and experimentally measured fluxes across three E. coli datasets (Data-A, Data-B, and Data-C). IGM consistently shows higher correlation and lower NRMSE than FBA. Regularization further improves performance, with L1 yielding the most stable and accurate predictions. (C) Reaction flux flexibility ratio between IGM and FBA across metabolic subsystems. Ratios ≤ 1 indicate that IGM reduces or maintains the solution space relative to FBA, thereby refining flux predictions and improving reliability.
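
The three quantities in this figure can be sketched in a few lines. The caption does not say how the NRMSE is normalized or how the flexibility ratio is computed, so the code below assumes normalization by the range of the measured fluxes and a ratio of FVA interval widths (IGM over FBA). All function names and numbers are illustrative, not taken from the paper.

```python
import math

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def nrmse(predicted, measured):
    """RMSE divided by the range of the measured values (assumed normalization)."""
    mse = sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(measured)
    return math.sqrt(mse) / (max(measured) - min(measured))

def flexibility_ratio(igm_range, fba_range):
    """Width of the IGM flux interval over the FBA one (assumed definition).

    Each argument is a (min_flux, max_flux) pair, e.g. from FVA.
    Undefined when the FBA interval has zero width.
    """
    (lo_i, hi_i), (lo_f, hi_f) = igm_range, fba_range
    return (hi_i - lo_i) / (hi_f - lo_f)

# Toy values: predictions are offset from measurements by a constant 0.5.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]

r = pearson(predicted, measured)
err = nrmse(predicted, measured)
ratio = flexibility_ratio((2.0, 4.0), (0.0, 8.0))
```

The toy data show why both metrics are reported: a constant offset leaves the correlation at 1 but still produces a nonzero NRMSE. A ratio of 0.25 would mean the IGM interval is a quarter as wide as the FBA interval for that reaction.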

Fig 3.

Comparison of IGM, extended IGM, and single-condition integration methods.

(A) Correlation coefficients between predicted and measured fluxes within each condition for GIMME (various thresholds), E-flux, E-flux2, IGM, IGM + L1, and IGM + L2 across three E. coli datasets. Single-condition methods show variable performance, with E-flux2 outperforming GIMME and E-flux but remaining less consistent than IGM-based methods. (B) Correlation between measured fluxes and predicted fluxes across conditions. IGM and its regularized variants achieve higher cross-condition consistency than all single-condition methods, demonstrating their strength in capturing dynamic flux changes. Overall, IGM + L1 provides the best balance of accuracy and stability across datasets.
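
The difference between panels (A) and (B) is the axis along which fluxes are correlated. The sketch below shows one plausible reading, assuming a flux table with reactions as rows and conditions as columns: within-condition correlation compares all reactions in one condition, and cross-condition correlation compares one reaction across all conditions. The numbers are made up and the paper's exact procedure may differ.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy flux tables: rows are reactions, columns are conditions.
measured = [
    [1.0, 2.0, 3.0],
    [4.0, 3.0, 1.0],
    [2.0, 2.0, 5.0],
]
predicted = [
    [1.2, 1.9, 3.1],
    [3.5, 3.2, 0.8],
    [2.4, 1.8, 4.6],
]

def column(table, j):
    return [row[j] for row in table]

n_conditions = len(measured[0])

# Panel (A) reading: one coefficient per condition, computed over reactions.
within_condition = [
    pearson(column(predicted, j), column(measured, j)) for j in range(n_conditions)
]

# Panel (B) reading: one coefficient per reaction, computed over conditions.
across_conditions = [pearson(p, m) for p, m in zip(predicted, measured)]
```

A method can score well on the first measure while scoring poorly on the second if it reproduces the overall flux pattern but misses how individual reactions shift between conditions.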

Fig 4.

The impact of varying the objective coefficient of the biomass reaction flux and of different gene expression transformation functions in Data-A, Data-B, and Data-C.

The first and second rows present boxplots of the predicted biomass levels and of the correlation coefficients between predicted and measured fluxes, respectively. These predictions are obtained using IGM with three gene expression transformation functions: maximum gene expression as the reference, min-max scaling, and average gene expression as the reference. Each transformation is evaluated across six objective coefficient values for the biomass flux: 1, B/1000, B/100, B/10, B, and 2B. The third row displays line plots of the average correlation coefficient, with one subplot per dataset (Data-A, Data-B, and Data-C), illustrating how the biomass objective coefficient affects predictive performance. In each subplot, the blue, orange, and grey lines represent maximum gene expression as the reference, min-max scaling, and average gene expression as the reference, respectively.
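
For orientation, the three transformation functions can be sketched for a single gene measured across several conditions. The caption names the transformations but not their formulas, so the definitions below are the most direct reading and should be treated as assumptions; the function names are hypothetical.

```python
def max_reference(values):
    """Divide by the gene's maximum across conditions; the largest value maps to 1."""
    top = max(values)
    return [v / top for v in values]

def min_max(values):
    """Min-max scaling to [0, 1]; undefined if all values are equal."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def mean_reference(values):
    """Divide by the gene's mean across conditions.

    Note: the result can exceed 1. Whether the paper rescales or clips
    such values is not stated in the caption, so none is applied here.
    """
    avg = sum(values) / len(values)
    return [v / avg for v in values]

# Toy expression of one gene in four conditions (arbitrary units).
raw = [2.0, 4.0, 6.0, 8.0]

by_max = max_reference(raw)    # [0.25, 0.5, 0.75, 1.0]
by_minmax = min_max(raw)       # lowest condition maps to 0, highest to 1
by_mean = mean_reference(raw)  # values are centered around 1
```

The choice matters because max-reference scaling preserves ratios between conditions, min-max scaling stretches every gene over the full interval regardless of how much it actually varies, and mean-reference scaling centers each gene around 1.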

Fig 5.

Relative gene expression values and gene expression variable values of Data-A.

(A) Heatmap showing 30 randomly selected relative gene expression profiles (left panel) and the corresponding gene expression variable values in the IGM formulation (right panel) across eight conditions: reference (RF), WT0.2 (wild type at 0.2 per hour), WT0.5 (wild type at 0.5 per hour), WT0.7 (wild type at 0.7 per hour), and single-gene deletions (pgm, pgi, gapC, zwf, and rpe). The heatmap values range from 0 to 1, with blue indicating values near 0 and yellow indicating values near 1. (B) Histogram showing the distribution of correlation coefficients between relative gene expression values and gene expression variable values. (C) Box plot of correlation coefficients between relative gene expression values and gene expression variable values.

Fig 6.

Flux solution change analysis for Data-B.

Heatmap showing the row-normalized average flux values in each metabolic subsystem across conditions C1 to C8, representing eight carbon sources: acetate, fructose, galactose, gluconate, glucose, glycerol, pyruvate, and succinate, respectively. The heatmap values range from 0 to 1, with blue tones indicating values near 0 and red tones indicating values near 1.
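
Row normalization of this kind can be sketched as follows. The caption does not give the formula, so the code assumes non-negative subsystem averages and divides each row by its maximum, which puts every row on a 0 to 1 scale; per-row min-max scaling would be an equally plausible alternative. The subsystem names and values are illustrative only.

```python
def row_normalize(table):
    """Divide each row by its own maximum (assumes non-negative entries).

    Rows that are all zero are returned unchanged to avoid division by zero.
    """
    normalized = {}
    for name, row in table.items():
        top = max(row)
        normalized[name] = [v / top if top > 0 else 0.0 for v in row]
    return normalized

# Toy average fluxes per subsystem (rows) in three conditions (columns).
subsystem_means = {
    "Glycolysis": [2.0, 4.0, 8.0],
    "TCA cycle": [5.0, 0.0, 2.5],
}

heatmap_values = row_normalize(subsystem_means)
```

Because each row is scaled independently, colors are comparable across conditions within a subsystem but not between subsystems.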

Fig 7.

(A–D) Scatter plots of reaction flux changes comparing the glucose carbon source condition (C5) with four other conditions: acetate (C1, Fig 7A), fructose (C2, Fig 7B), galactose (C3, Fig 7C), and gluconate (C4, Fig 7D).

The x-axis represents the flux values (log scale) under the glucose condition, and the y-axis represents the flux values (log scale) under the other condition. The black dashed line indicates the line y = x. The green and red nodes highlight the top 10 upregulated and downregulated relative reaction fluxes, respectively, with reaction names labeled; all other reactions are shown as blue nodes. (E–H) Horizontal bar plots of the top 10 upregulated (green) and downregulated (red) reactions, along with their relative flux scores, for the same comparisons: C5 vs. C1 (Fig 7E), C5 vs. C2 (Fig 7F), C5 vs. C3 (Fig 7G), and C5 vs. C4 (Fig 7H).
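
A ranking of this kind can be sketched as below. The caption does not define the relative flux score, so the code substitutes a log2 fold change of absolute fluxes with a small pseudocount; this is a stand-in, not the paper's score. The function names, the pseudocount `eps`, and the reaction identifiers and flux values are all hypothetical.

```python
import math

def log2_fold_change(flux_ref, flux_other, eps=1e-6):
    """Stand-in score: log2 of |other| over |reference|, with a pseudocount."""
    return {
        rxn: math.log2((abs(flux_other[rxn]) + eps) / (abs(flux_ref[rxn]) + eps))
        for rxn in flux_ref
    }

def top_changes(scores, k=10):
    """Return the k most upregulated and k most downregulated reactions."""
    increasing = sorted(scores, key=scores.get)
    down = [rxn for rxn in increasing if scores[rxn] < 0][:k]
    up = [rxn for rxn in reversed(increasing) if scores[rxn] > 0][:k]
    return up, down

# Toy fluxes for a glucose reference and an acetate comparison (made-up values).
glucose = {"PGI": 8.0, "PFK": 4.0, "ACS": 1.0, "ICL": 0.5}
acetate = {"PGI": 1.0, "PFK": 4.0, "ACS": 8.0, "ICL": 2.0}

scores = log2_fold_change(glucose, acetate)
up, down = top_changes(scores, k=2)
```

In the scatter plots, reactions with a positive score lie above the y = x line and reactions with a negative score lie below it; unchanged reactions such as `PFK` in this toy example sit on the line and appear in neither list.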
