Fig 1.
Flowchart for the proposed nsPCE surrogate modeling method.
The model response function can be freely chosen by the user. The singularity time function should be specified implicitly as a function of the DFBA model states. This function can be identified by simulating the DFBA model with nominal parameters and locating the time points at which the active set of the FBA solution switches. The PCE coefficients are fit using the basis-adaptive version of the hybrid least angle regression (LAR) method, while the experimental design (ED) is sequentially enriched until the target accuracy level is achieved.
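A minimal sketch of the singularity-time identification step, assuming the DFBA integrator can report the FBA active set (here a frozenset of active-constraint labels) at each stored time point. The helper `find_switch_times` and the constraint labels are hypothetical and do not belong to any particular DFBA package.

```python
from collections.abc import Hashable, Sequence


def find_switch_times(times: Sequence[float],
                      active_sets: Sequence[frozenset[Hashable]]) -> list[float]:
    """Return the stored time points at which the FBA active set changes.

    Each switch is reported at the first stored time whose active set differs
    from the previous one, so its accuracy is limited by the time grid.
    """
    if len(times) != len(active_sets):
        raise ValueError("times and active_sets must have the same length")
    switches = []
    for k in range(1, len(times)):
        if active_sets[k] != active_sets[k - 1]:
            switches.append(times[k])
    return switches


# Hypothetical nominal trajectory: a glucose-related constraint ("glc") is
# active until glucose runs out, after which a xylose constraint ("xyl") is.
times = [0.0, 2.0, 4.0, 6.0, 7.0, 8.0]
active = [frozenset({"glc"})] * 4 + [frozenset({"xyl"})] * 2
print(find_switch_times(times, active))  # [7.0]
```

In a real workflow the switch would typically be refined by event detection in the integrator rather than read off a fixed output grid.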
Fig 2.
Monte Carlo simulation of E. coli DFBA model.
The genome-scale model is integrated from 0 to 8.5 hours for 100 different parameter realizations that are independently drawn from the uniform prior density. Xylose is consumed only after glucose is fully exhausted, and the time at which glucose is exhausted depends strongly on the parameters.
Fig 3.
Accuracy of singularity time surrogate models.
RMSE versus the number of model evaluations (i.e., size of the experimental design) used to train the PCE model for (a) the glucose singularity tg and (b) the xylose singularity tz. The RMSE was estimated empirically from a validation set of 10,000 full DFBA simulations.
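The validation procedure behind this figure can be mimicked on a toy problem. The sketch below fits a one-dimensional Legendre PCE by ordinary least squares (not the hybrid LAR method used in the paper) to a hypothetical smooth stand-in for tg, then estimates the RMSE on 10,000 validation points for several ED sizes. The toy model, ED sizes, and degree schedule are all assumptions.

```python
import numpy as np
from numpy.polynomial import legendre


def fit_legendre_pce(x, y, degree):
    """Ordinary least-squares fit of a 1D Legendre PCE for x ~ U(-1, 1)."""
    psi = legendre.legvander(x, degree)      # (n, degree + 1) regression matrix
    coef, *_ = np.linalg.lstsq(psi, y, rcond=None)
    return coef


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def toy_model(x):
    # Hypothetical smooth stand-in for a singularity time t_g(x) in hours.
    return 7.0 + 0.5 * np.tanh(2.0 * x)


rng = np.random.default_rng(0)
x_val = rng.uniform(-1.0, 1.0, 10_000)       # validation set of 10,000 points
y_val = toy_model(x_val)

errors = {}
for n_ed in (10, 20, 40, 80):
    x_ed = rng.uniform(-1.0, 1.0, n_ed)      # random experimental design
    degree = min(n_ed // 4, 12)              # keep the regression well overdetermined
    coef = fit_legendre_pce(x_ed, toy_model(x_ed), degree)
    errors[n_ed] = rmse(y_val, legendre.legval(x_val, coef))
    print(f"ED size {n_ed:3d}: validation RMSE = {errors[n_ed]:.2e}")
```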
Fig 4.
Uncertainty propagation with singularity time surrogate models.
The estimated global sensitivity indices of (a) tg and (b) tz with respect to the uncertain parameters. The estimated PDFs of (c) tg and (d) tz, based on 10⁶ surrogate model evaluations, which require only about 1 second of CPU time in total.
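For an orthonormal PCE, first-order and total Sobol indices can be read off the coefficients (a standard PCE post-processing result); whether this figure was produced exactly this way is an assumption. The multi-index set and coefficients below are hypothetical. Note that `numpy.polynomial.legendre` polynomials are not normalized, so coefficients from such a basis would need rescaling before applying this formula.

```python
import numpy as np


def pce_sobol_indices(multi_indices, coefs):
    """First-order and total Sobol indices from an orthonormal PCE.

    multi_indices: (P, d) integer array, one row per basis term alpha.
    coefs: (P,) coefficients c_alpha of the orthonormal basis polynomials.
    """
    alpha = np.asarray(multi_indices)
    c2 = np.asarray(coefs, dtype=float) ** 2
    order = alpha.sum(axis=1)
    variance = c2[order > 0].sum()         # the constant term carries no variance
    d = alpha.shape[1]
    first, total = np.zeros(d), np.zeros(d)
    for i in range(d):
        involves_i = alpha[:, i] > 0
        only_i = involves_i & (order == alpha[:, i])
        first[i] = c2[only_i].sum() / variance
        total[i] = c2[involves_i].sum() / variance
    return first, total


# Hypothetical 2-parameter PCE: y = 1 + 2*Psi_(1,0) + 1*Psi_(0,1) + 1*Psi_(1,1)
alpha = [[0, 0], [1, 0], [0, 1], [1, 1]]
c = [1.0, 2.0, 1.0, 1.0]
first, total = pce_sobol_indices(alpha, c)
# first ≈ [0.667, 0.167], total ≈ [0.833, 0.333]
print(first, total)
```

The PDFs in panels (c) and (d) would instead come from evaluating the surrogate at 10⁶ prior samples and summarizing the outputs, e.g., with a histogram.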
Fig 5.
Parameter space decomposition over time.
The decomposition of the parameter support into two non-overlapping elements at (a) 6.5 hr, (b) 7.0 hr, and (c) 7.25 hr using a sparse PCE model of the glucose singularity time tg. The blue and red regions represent parameters for which tg(x) > t and tg(x) ≤ t, respectively, projected onto the two most sensitive parameters. The green line represents the decision boundary learned using an SVM classifier trained with the same set of data as the sparse PCE model, while the dashed green lines represent the corresponding 95% confidence limits. (d) Parity plot of the sparse PCE model of tg on 10⁴ validation points.
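The decomposition only requires evaluating the singularity-time surrogate on parameter samples and thresholding at the current time t. The sketch below assumes a hypothetical linear surrogate `tg_toy` in two scaled parameters on [-1, 1]; in the paper's setting this role is played by the sparse PCE of tg.

```python
import numpy as np


def split_parameter_space(samples, tg_surrogate, t):
    """Split samples into the elements {tg(x) > t} and {tg(x) <= t}."""
    tg = tg_surrogate(samples)
    before = tg > t          # singularity not yet reached at time t (blue region)
    return samples[before], samples[~before]


def tg_toy(x):
    # Hypothetical linear surrogate of the glucose singularity time [hr].
    return 7.0 + 0.4 * x[:, 0] - 0.2 * x[:, 1]


rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=(5000, 2))
for t in (6.5, 7.0, 7.25):
    blue, red = split_parameter_space(x, tg_toy, t)
    print(f"t = {t:4.2f} hr: {len(blue)} samples with tg > t, {len(red)} with tg <= t")
```

The boolean labels `tg > t` produced this way are the kind of training targets a classifier such as the SVM in panels (a) to (c) would be fit to.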
Table 1.
Relative mean square error estimates for glucose singularity time surrogate models under multiple experimental design sizes.
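The caption does not restate the error definition. One common choice, assumed here, normalizes the mean squared error by the empirical variance of the reference outputs, so that a value of 1 corresponds to always predicting the sample mean. The reference and predicted values below are hypothetical.

```python
import numpy as np


def relative_mse(y_true, y_pred):
    """Mean squared error normalized by the empirical variance of y_true."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = np.mean((y_true - y_pred) ** 2)
    return float(mse / np.var(y_true))


y_ref = np.array([6.8, 7.0, 7.1, 7.3])      # hypothetical reference tg values [hr]
y_hat = np.array([6.82, 6.99, 7.1, 7.28])   # hypothetical surrogate predictions [hr]
print(relative_mse(y_ref, y_hat))
```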
Fig 6.
Convergence properties of nsPCE surrogate models.
(a,b) Glucose concentration at time 7 hours. (c,d) Xylose concentration at time 8 hours. (e,f) Biomass concentration at time 8 hours. Left plots show the validation RMSE versus the specified error tolerance. Right plots show the total number of model evaluations based on a sequential ED construction, with a maximum of 1000 samples allowed. The global sparse basis-adaptive PCE results are also shown for comparison purposes.
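The sequential ED construction can be sketched as an enrichment loop that stops once an error estimate falls below the tolerance or the evaluation budget (1000 runs here) would be exceeded. This sketch uses a fixed-degree 1D Legendre basis fit by ordinary least squares, with the leave-one-out (LOO) RMSE from the hat-matrix shortcut as the error estimate. The batch size, initial ED size, and toy model are assumptions, and the paper's basis-adaptive hybrid LAR fit is not reproduced.

```python
import numpy as np
from numpy.polynomial import legendre


def loo_rmse(psi, y):
    """LOO RMSE of an OLS fit via residual_i / (1 - h_ii)."""
    coef, *_ = np.linalg.lstsq(psi, y, rcond=None)
    h = np.einsum("ij,ji->i", psi, np.linalg.pinv(psi))   # diagonal of the hat matrix
    residuals = (y - psi @ coef) / (1.0 - h)
    return float(np.sqrt(np.mean(residuals ** 2))), coef


def sequential_ed(model, tol, degree=10, n_init=20, batch=20, n_max=1000, seed=0):
    """Enrich a random ED until the LOO RMSE <= tol or n_max would be exceeded."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, n_init)
    y = model(x)
    while True:
        err, coef = loo_rmse(legendre.legvander(x, degree), y)
        if err <= tol or x.size + batch > n_max:
            return coef, x.size, err
        x_new = rng.uniform(-1.0, 1.0, batch)        # new batch of model runs
        x = np.concatenate([x, x_new])
        y = np.concatenate([y, model(x_new)])


def toy_model(x):
    # Hypothetical smooth model output standing in for an expensive DFBA run.
    return 7.0 + 0.5 * np.tanh(x)


for tol in (1e-2, 1e-3, 1e-4):
    _, n_used, err = sequential_ed(toy_model, tol)
    print(f"tol = {tol:.0e}: {n_used} model runs, LOO RMSE = {err:.1e}")
```

Tighter tolerances generally demand more model runs, which is the trade-off the right-hand plots summarize; a tolerance the basis cannot reach simply exhausts the budget.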
Fig 7.
Parity plots for nsPCE surrogate models.
(a,b,c) Target RMSE level ε_target = 10⁻². (d,e,f) Target RMSE level ε_target = 10⁻³. (g,h,i) Target RMSE level ε_target = 10⁻⁴. The left, middle, and right columns correspond to glucose concentration at 7 hours, xylose concentration at 8 hours, and biomass concentration at 8 hours, respectively. The parity plots for global sparse basis-adaptive PCE are overlaid for comparison purposes. The global PCE has considerably larger error than nsPCE.
Fig 8.
Posterior distribution of the estimated model parameters.
The diagonal subplots represent marginal densities while the off-diagonal subplots represent two-dimensional projections of samples from the joint density. Blue denotes the posterior density while green denotes the prior density. The red line represents the true parameter values used to generate synthetic data for estimation purposes.
Fig 9.
Evolution of the posterior marginal densities for the observable model parameters over time.
Each subplot shows the histogram of parameter posterior samples estimated using the sequential Monte Carlo method. The x-axis represents the range of values of the parameters and the y-axis represents frequencies. The red line represents the true parameter values.
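The idea of updating the posterior as each new measurement time arrives can be sketched with a resample-move SMC scheme, one common variant; the paper's exact SMC algorithm may differ. Everything problem-specific here is hypothetical: a one-parameter exponential-decay `surrogate`, a U(0, 2) prior, Gaussian noise with known sigma, and synthetic data generated from a chosen true value.

```python
import numpy as np


def surrogate(theta, t):
    # Hypothetical cheap surrogate of one measured concentration: 10*exp(-theta*t).
    return 10.0 * np.exp(-theta * t)


def log_lik(theta, t_obs, y_obs, sigma):
    """Gaussian log-likelihood (up to a constant) for each particle in theta."""
    pred = surrogate(theta[:, None], t_obs[None, :])
    return -0.5 * np.sum((y_obs[None, :] - pred) ** 2, axis=1) / sigma**2


def smc_over_time(t_obs, y_obs, sigma, n=2000, lo=0.0, hi=2.0, n_moves=5, seed=0):
    """Resample-move SMC that assimilates one observation time at a time."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(lo, hi, n)               # particles from the U(lo, hi) prior
    logw = np.full(n, -np.log(n))
    history = []
    for k in range(len(t_obs)):
        # Reweight using only the likelihood of the newly arrived observation.
        logw = logw + log_lik(theta, t_obs[k:k + 1], y_obs[k:k + 1], sigma)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        if 1.0 / np.sum(w ** 2) < n / 2:         # effective sample size below n/2
            theta = theta[rng.choice(n, size=n, p=w)]
            logw = np.full(n, -np.log(n))
            w = np.full(n, 1.0 / n)
            for _ in range(n_moves):
                # Random-walk Metropolis move targeting p(theta | y_0..y_k).
                ll = log_lik(theta, t_obs[:k + 1], y_obs[:k + 1], sigma)
                prop = theta + 2.38 * theta.std() * rng.standard_normal(n)
                ll_prop = np.full(n, -np.inf)    # zero prior density outside bounds
                inside = (prop >= lo) & (prop <= hi)
                ll_prop[inside] = log_lik(prop[inside], t_obs[:k + 1], y_obs[:k + 1], sigma)
                accept = np.log(rng.uniform(size=n)) < ll_prop - ll
                theta[accept] = prop[accept]
        history.append(float(np.sum(w * theta)))  # posterior mean after k + 1 data
    return theta, w, history


true_theta, sigma = 0.7, 0.2
t_obs = np.linspace(0.5, 5.0, 10)                # hypothetical measurement times [hr]
noise = sigma * np.random.default_rng(42).standard_normal(t_obs.size)
y_obs = surrogate(true_theta, t_obs) + noise     # synthetic data from the true value
theta, w, history = smc_over_time(t_obs, y_obs, sigma)
print(np.round(history, 3))                      # posterior mean after each time point
```

Because every likelihood evaluation goes through the surrogate, the repeated reweighting and Metropolis moves stay cheap; with the full DFBA model each move would require new simulations.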
Fig 10.
Predicted probability distributions of extracellular concentrations.
(a) Model predictions using parameter samples from the prior. (b) Model predictions using parameter samples from the posterior. Each subplot shows the histogram of samples of the model output obtained by substituting i.i.d. samples from the parameter distribution into the corresponding ME-PCE surrogate model. The x-axis represents the range of values of the model outputs and the y-axis represents frequencies.
Fig 11.
Monte Carlo simulation of the synthetic metabolic network.
The synthetic DFBA model with twenty uncertain parameters is integrated from time 0 to 40 hours for 100 different parameter realizations drawn independently from the uniform prior density. The time profiles are shown for (a) biomass and lipids, (b) the substrates and products, and (c) the penalty state.
Fig 12.
Global sensitivity indices for the quantities of interest in the synthetic metabolic network.
(a)-(g) Global sensitivity indices for extracellular substrate and product concentrations at various time points with respect to the twenty uncertain parameters.
Fig 13.
Comparison of model predictions and data for the synthetic metabolic network.
The model predictions for (a) biomass and lipids and (b) the substrates and products, shown with solid lines, were obtained by integrating the DFBA model with the MAP estimates of the parameters. The ‘□’ marks represent the data, which were obtained by adding randomly generated noise to the model predictions computed with the true (unknown) parameters. The dotted lines represent the model predictions based on the initial parameter guess that was used to initialize the optimizer.
Fig 14.
Estimated posterior distribution for the parameters of the synthetic metabolic network.
The diagonal subplots represent the estimated marginal densities, while the off-diagonal subplots represent the two-dimensional projections of the 95% confidence regions. Black ‘x’ marks represent the true parameter values, while the modes of the marginal densities signify the MAP estimates.