Fig 1.
The basic steps in 13C MFA and the model selection problem.
(A) New substrates containing 13C (dark circles) are fed to the cells. (B) These substrates are consumed and converted to end products in the cells, according to the cells' biochemical reactions. (C) The 13C label appears in various proportions in each of the mass isotopomers, and these proportions are summarized in distribution bar charts for each detected metabolite. (D) The iterative modelling cycle in which a hypothesized model structure is fitted to MID data. The model fit is evaluated, usually with a χ2-test, and either rejected or not. If the model structure is rejected, it is revised and evaluated again. If the model structure is not rejected, it is used for flux determination. (E) The iterative model development in (D) results in a model selection problem. Different approaches for solving this model selection problem might result in different model structures being selected. This paper evaluates how the uncertainty in measurement data affects uncertainty in model selection.
Fig 2.
Example of MID sample standard deviation.
(A) Example of estimated mass isotopomer distribution (MID) of citrate from epithelial cells, as described in Section 2.5. M+i indicates the fractional abundance of the ith mass isotopomer. (B) Difference between the assumed magnitude of the standard deviations and the measured magnitudes.
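As a minimal sketch of the quantities in this figure, the mean MID and its sample standard deviation can be computed per mass isotopomer from replicate measurements. The replicate values and the assumed standard deviation `assumed_sd` below are hypothetical placeholders, not data from the paper, and the sign convention of the difference is an assumption.

```python
from statistics import mean, stdev

# Hypothetical triplicate MIDs for citrate (M+0 ... M+6); each replicate sums to 1.
replicates = [
    [0.50, 0.20, 0.15, 0.08, 0.04, 0.02, 0.01],
    [0.52, 0.19, 0.14, 0.08, 0.04, 0.02, 0.01],
    [0.48, 0.21, 0.16, 0.08, 0.04, 0.02, 0.01],
]

# Regroup the data so that each tuple holds the replicates of one mass isotopomer.
columns = list(zip(*replicates))

mid_mean = [mean(col) for col in columns]   # mean fractional abundance per M+i
mid_sd = [stdev(col) for col in columns]    # sample standard deviation (n - 1 denominator)

# Hypothetical "assumed" standard deviation, compared against the measured one.
# The sign convention (assumed minus measured) is an assumption of this sketch.
assumed_sd = 0.01
sd_gap = [assumed_sd - s for s in mid_sd]

for i, (m, s, g) in enumerate(zip(mid_mean, mid_sd, sd_gap)):
    print(f"M+{i}: mean={m:.3f}  sd={s:.3f}  assumed-measured={g:+.3f}")
```

With only three replicates, the sample standard deviation is itself a noisy estimate, which is one reason the assumed and measured magnitudes can differ.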
Table 1.
A summary of the different model selection approaches considered in this paper.
Fig 3.
Example of how model selection is affected by σb, for the polynomial model.
Error bars indicate data sampled from a 7th order polynomial y = h7(x, u0) + ϵ, where ϵ is N(0, σr), σr = 0.2. Colors indicate estimation data Dest (blue) and validation data Dval (red) used by the “Validation” method. Solid curves in (A–C) indicate the polynomials chosen by an estimation-based method for different “believed” standard deviations σb. (A) σb = 2, chosen model h1. (B) σb = 0.2 (the true value), chosen model h7 (the correct model). (C) σb = 0.02, chosen model h14.
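The dependence on σb described in this caption can be illustrated with a small sketch. It is a simplified stand-in under several assumptions, not the paper's implementation. The true curve `true_curve` is a hypothetical 7th order polynomial (the paper's parameters u0 are not given here), and the sample sizes and x-ranges are invented. The estimation-based rule is taken to be "lowest order that passes a χ2-test on the σb-weighted residuals", and the validation rule is taken to be "smallest residual sum of squares on Dval". The χ2 quantile uses the Wilson-Hilferty approximation so that only NumPy is needed.

```python
import numpy as np

Z95 = 1.6448536269514722  # 95% quantile of the standard normal distribution


def chi2_threshold_95(dof):
    """Approximate 95% chi-square quantile (Wilson-Hilferty approximation)."""
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + Z95 * np.sqrt(c)) ** 3


def true_curve(x):
    # Hypothetical 7th order polynomial standing in for h7(x, u0).
    return 64 * x**7 - 112 * x**5 + 56 * x**3 - 7 * x


def ssr(order, x_fit, y_fit, x_eval, y_eval):
    """Fit a polynomial of the given order, return its residual sum of squares."""
    poly = np.polynomial.Polynomial.fit(x_fit, y_fit, order)
    return float(np.sum((y_eval - poly(x_eval)) ** 2))


def select_first_chi2(x, y, sigma_b, orders=range(1, 15)):
    """Lowest order whose weighted SSR passes the chi-square test, else None."""
    for order in orders:
        dof = len(x) - (order + 1)  # data points minus fitted coefficients
        if ssr(order, x, y, x, y) / sigma_b**2 <= chi2_threshold_95(dof):
            return order
    return None  # every candidate rejected; how to proceed is left open here


def select_by_validation(x_est, y_est, x_val, y_val, orders=range(1, 15)):
    """Order with the smallest SSR on the validation data; sigma_b is not used."""
    return min(orders, key=lambda k: ssr(k, x_est, y_est, x_val, y_val))


rng = np.random.default_rng(0)
sigma_r = 0.2  # true noise standard deviation, as in the caption
x_est = np.linspace(-1.0, 1.0, 20)    # hypothetical estimation design
x_val = np.linspace(-0.95, 0.95, 10)  # hypothetical validation design
y_est = true_curve(x_est) + rng.normal(0.0, sigma_r, x_est.size)
y_val = true_curve(x_val) + rng.normal(0.0, sigma_r, x_val.size)

for sigma_b in (2.0, 0.2, 0.02):
    print("sigma_b =", sigma_b, "->", select_first_chi2(x_est, y_est, sigma_b))
print("validation ->", select_by_validation(x_est, y_est, x_val, y_val))
```

Because the weighted residuals scale as 1/σb², a smaller believed σb can only shrink the set of models that pass the test, so the selected order never decreases as σb decreases. The validation-based choice does not involve σb at all.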
Fig 4.
Model selection results for the polynomial model example.
(A–D) Heatmaps represent results from the indicated selection methods, where rows represent different values of σb and columns represent the polynomial models h1,…,h14. For each row, color indicates the fraction of times a model is selected for the given σb, out of 10,000 samples, as indicated by the color scale (right).
Fig 5.
Six different model structures for the linear model.
This example is chosen as a simple representation of a mass flow model. The top row shows the model names A1,…,A6. The second row shows the matrices that constitute the model structures. The third row illustrates how the corresponding matrices connect the inputs xi and the outputs yi via the parameters a1,…,a6.
Fig 6.
Model selection results for the linear model example.
(A–D) Heatmaps represent results from the indicated selection methods, where rows represent different values of σb and columns represent the linear models A1,…,A6. For each row, color indicates the fraction of times a model is selected for the given σb, out of 1000 samples, as indicated by the color scale (right).
Fig 7.
Seven different model structures included in the simulated EMU 13C MFA example.
For each model structure, the component added relative to the previous, slightly less complex model is marked by the red circle. The true model used to simulate the data is model 4. Detailed descriptions for each model can be found in the supplementary material (S1 Table).
Fig 8.
Model selection results for the simulated 13C MFA model example.
(A–D) Heatmaps represent results from the indicated selection methods, where rows represent different values of σb and columns represent the MFA models. For each row, color indicates the fraction of times a model is selected for the given σb, out of 100 samples, as indicated by the color scale (right).
Fig 9.
Comparison of estimated flux solutions for the simulated 13C MFA example.
The resulting flux values with 95% confidence intervals for seven of the fluxes that are shared by all model structures in the simulated 13C MFA example. The confidence intervals correspond to the estimated fluxes for model (Blue), model with all data available (Green) and model with the data split into Dest and Dval (Red). The figure illustrates that selecting the wrong model structure may result in incorrect flux estimates.
Fig 10.
How prediction uncertainty can be used to assess the novelty in the validation data.
(A) If there is too little novelty in the validation data, differences between estimation data and validation data will typically be smaller than the prediction and measurement uncertainty. (B) If there is too much novelty in the validation data, there is no information about the corresponding MIDs, and the prediction uncertainty will be large, approaching [0,1]. (C) An ideal design of validation data thus has well-determined predictions that differ from the estimation data. To be sure that there really is new information, one should also check that the new fluxes generate linearly independent EMU basis vectors (Section 2.4).
Fig 11.
Usage of prediction uncertainty to demonstrate that the validation data has neither too little, nor too much, novelty, compared to the estimation data.
This analysis shows the result from the simulated 13C MFA example (Figs 7–9). The model was trained on estimation data corresponding to three tracers: Tracer 1 = 1,2-13C-glutamine (dark red), Tracer 2 = 3-13C-pyruvate (red), and Tracer 3 = U-13C-glutamine (light red). The validation data (dark blue) came from the tracer U-13C-pyruvate. For the experimental data, the error bars represent standard deviation, and for the model predictions (light blue), the error bars represent model uncertainty (Section 4.4).
Fig 12.
Model selection results for the cultured epithelial cell example.
(A–D) Heatmaps represent results from the indicated selection methods, where rows represent different values of σb and columns represent the MFA models. For each row, color indicates the fraction of times a model is selected for the given σb, out of 1000 samples, as indicated by the color scale (right).
Fig 13.
Validation of lipid synthesis in HMEC cultures.
(A) Schematic of the model for lysophosphatidylcholine (LPC) 16:0 synthesis from acetate (ac). (B) Predicted MID of ac from the model selected by the “Validation” method. (C) Measured MID of glycerol-3-phosphocholine (g3pc). (D) Fitted (gray) and measured (black) MID of LPC 16:0. Mean values of biological triplicates are shown in (C, D). Error bars indicate standard deviation.