Validation-based model selection for ¹³C metabolic flux analysis with uncertain measurement errors

doi:10.1371/journal.pcbi.1009999

Fig 1.

The basic steps in ¹³C MFA and the model selection problem.

(A) New substrates containing ¹³C (dark circles) are fed to the cells. (B) The cells consume these substrates and convert them to end products through their biochemical reactions. (C) The ¹³C label appears in varying proportions across the mass isotopomers of each detected metabolite; these proportions are summarized as mass isotopomer distributions (MIDs) in the bar charts. (D) The iterative modelling cycle, in which a hypothesized model structure is fitted to MID data. The model fit is evaluated, usually with a χ²-test, and either rejected or not. A rejected model structure is revised and evaluated again; a model structure that is not rejected is used for flux determination. (E) The iterative model development in (D) results in a model selection problem: different approaches to solving it may select different model structures. This paper evaluates how uncertainty in the measurement data affects uncertainty in model selection.
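
Step (D) hinges on the χ²-test. The sketch below is a minimal illustration, assuming independent, normally distributed measurement errors with given ("believed") standard deviations, and degrees of freedom equal to the number of MID data points minus the number of free parameters (a common convention, not necessarily the paper's exact accounting). The function names `chi2_test` and `chi2_quantile` and all MID values are hypothetical, and the χ² quantile uses the Wilson–Hilferty approximation so that no SciPy dependency is needed.

```python
from statistics import NormalDist


def chi2_quantile(p, k):
    # Wilson-Hilferty approximation of the chi-square quantile with k degrees of freedom
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * k)
    return k * (1.0 - a + z * a ** 0.5) ** 3


def chi2_test(measured, simulated, sigma, n_params, alpha=0.05):
    """Return (statistic, threshold, rejected) for a weighted least-squares fit."""
    stat = sum(((m - s) / sd) ** 2 for m, s, sd in zip(measured, simulated, sigma))
    dof = len(measured) - n_params
    threshold = chi2_quantile(1.0 - alpha, dof)
    return stat, threshold, stat > threshold


# Two hypothetical MIDs with four mass isotopomers each (M+0..M+3), concatenated
measured = [0.50, 0.30, 0.15, 0.05, 0.60, 0.25, 0.10, 0.05]
good_fit = [0.505, 0.295, 0.155, 0.045, 0.595, 0.255, 0.095, 0.055]
bad_fit = [0.53, 0.27, 0.18, 0.02, 0.57, 0.28, 0.07, 0.08]

good = chi2_test(measured, good_fit, [0.01] * 8, n_params=2)
bad = chi2_test(measured, bad_fit, [0.01] * 8, n_params=2)
# The same good fit is rejected if the believed standard deviation is 10x smaller
strict = chi2_test(measured, good_fit, [0.001] * 8, n_params=2)
print(good, bad, strict)
```

The `strict` case shows the sensitivity that motivates the paper: the verdict of the χ²-test depends directly on the assumed magnitude of the measurement errors.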

Fig 2.

Example of MID sample standard deviation.

(A) Example of an estimated mass isotopomer distribution (MID) of citrate from epithelial cells, as described in Section 2.5. M+i indicates the fractional abundance of the i-th mass isotopomer. (B) Difference between the assumed and the measured magnitudes of the standard deviations.
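
To make the quantities in this figure concrete, the sketch below normalizes raw isotopologue intensities into MIDs for a few replicates and compares the per-isotopomer sample standard deviation with a single assumed value. It is a minimal sketch under stated assumptions: the intensities, the number of isotopomers, and the assumed standard deviation of 0.01 are all hypothetical, and correction for natural isotope abundance is omitted.

```python
import numpy as np


def mid_from_intensities(intensities):
    """Normalize raw isotopologue intensities (M+0, M+1, ...) to fractional abundances."""
    values = np.asarray(intensities, dtype=float)
    return values / values.sum(axis=-1, keepdims=True)


# Hypothetical raw intensities: three replicates of a metabolite with M+0..M+3
replicates = np.array([
    [1000.0, 400.0, 300.0, 100.0],
    [950.0, 420.0, 310.0, 120.0],
    [1050.0, 380.0, 290.0, 90.0],
])

mids = mid_from_intensities(replicates)   # one MID per replicate (rows sum to 1)
mean_mid = mids.mean(axis=0)              # estimated MID
sample_sd = mids.std(axis=0, ddof=1)      # sample standard deviation per isotopomer
assumed_sd = 0.01                         # hypothetical "believed" standard deviation
difference = assumed_sd - sample_sd       # analogous to panel (B)
print(mean_mid, sample_sd, difference)
```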

Table 1.

A summary of the different model selection approaches considered in this paper.

Fig 3.

Example of how model selection is affected by σ_b for the polynomial model.

Error bars indicate data sampled from a 7th-order polynomial y = h₇(x, u₀) + ϵ, where ϵ ~ N(0, σ_r) with σ_r = 0.2. Colours indicate the estimation data D^est (blue) and validation data D^val (red) used by the “Validation” method. Solid curves in (A–C) indicate polynomials chosen by an estimation-based method for different “believed” standard deviations σ_b. (A) σ_b = 2: chosen model h₁. (B) σ_b = 0.2 (the true value): chosen model h₇ (the correct model). (C) σ_b = 0.02: chosen model h₁₄.
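
The mechanism in this figure can be reproduced in a few lines. The sketch below assumes a stand-in estimation-based rule (choose the lowest-degree polynomial that passes the χ²-test with the believed σ_b, falling back to h₁₄ if none passes) and a stand-in validation rule (fit on D^est, choose the degree with the smallest squared prediction error on D^val). Neither is claimed to be the paper's exact method from Table 1; the coefficients `u0`, the 40-point grid, and the alternating D^est/D^val split are hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial
from statistics import NormalDist

DEGREES = list(range(1, 15))  # candidate models h1, ..., h14 (polynomial degree)


def chi2_quantile(p, k):
    # Wilson-Hilferty approximation of the chi-square quantile (avoids a SciPy dependency)
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * k)
    return k * (1.0 - a + z * a ** 0.5) ** 3


def ssr(x_fit, y_fit, degree, x_eval, y_eval):
    # Least-squares polynomial fit on one data set, sum of squared residuals on another
    model = Polynomial.fit(x_fit, y_fit, degree)
    return float(np.sum((y_eval - model(x_eval)) ** 2))


def select_estimation(x, y, sigma_b, alpha=0.05):
    # Hypothetical estimation-based rule: smallest model that passes the chi-square test
    for d in DEGREES:
        stat = ssr(x, y, d, x, y) / sigma_b ** 2
        if stat <= chi2_quantile(1 - alpha, len(x) - (d + 1)):
            return d
    return DEGREES[-1]  # assumption: fall back to the largest model if none passes


def select_validation(x_est, y_est, x_val, y_val):
    # Fit on D_est, pick the smallest prediction error on D_val; sigma_b does not enter
    return min(DEGREES, key=lambda d: ssr(x_est, y_est, d, x_val, y_val))


rng = np.random.default_rng(0)
u0 = [0.5, -1.0, 0.8, 1.2, -0.6, -1.5, 0.3, 1.0]        # hypothetical coefficients of h7
x = np.linspace(-1.0, 1.0, 40)
y = Polynomial(u0)(x) + rng.normal(0.0, 0.2, x.size)    # sigma_r = 0.2
is_est = np.arange(x.size) % 2 == 0                      # hypothetical D_est / D_val split

chosen = {sb: select_estimation(x, y, sb) for sb in (2.0, 0.2, 0.02)}
chosen_val = select_validation(x[is_est], y[is_est], x[~is_est], y[~is_est])
print(chosen, chosen_val)
```

Because all points share one σ_b, the least-squares fits themselves do not depend on σ_b; only the χ² statistic scales as 1/σ_b². Overestimating σ_b therefore pushes the rule toward too-simple models and underestimating it toward too-complex ones, while the validation rule never uses σ_b at all.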

Fig 4.

Model selection results for the polynomial model example.

(A–D) Heatmaps represent results from the indicated selection methods, where rows represent different values of σ_b and columns represent the polynomial models h₁,…,h₁₄. For each row, color indicates the fraction of times a model is selected for the given σ_b, out of 10,000 samples, as indicated by the color scale (right).
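
Each heatmap row is a selection frequency estimated by Monte Carlo sampling. The sketch below shows how such a matrix could be tallied, assuming the same hypothetical "smallest model passing the χ²-test" rule as a stand-in for an estimation-based method. To keep it fast it uses a hypothetical cubic ground truth, candidate degrees 1 to 6, and 200 samples instead of the 10,000 used in the figure.

```python
import numpy as np
from numpy.polynomial import Polynomial
from statistics import NormalDist


def chi2_quantile(p, k):
    # Wilson-Hilferty approximation of the chi-square quantile (avoids a SciPy dependency)
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * k)
    return k * (1.0 - a + z * a ** 0.5) ** 3


def select_first_passing(x, y, sigma_b, degrees, alpha=0.05):
    # Hypothetical estimation-based rule: smallest degree passing the chi-square test
    for d in degrees:
        residual = np.sum((y - Polynomial.fit(x, y, d)(x)) ** 2)
        if residual / sigma_b ** 2 <= chi2_quantile(1 - alpha, len(x) - d - 1):
            return d
    return degrees[-1]


def selection_heatmap(sigma_bs, degrees, n_samples, sigma_r=0.2, seed=0):
    """Rows: believed sigma_b; columns: candidate degrees; entries: selection fraction."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-1.0, 1.0, 30)
    y_true = Polynomial([0.2, -0.5, 0.0, 1.0])(x)   # hypothetical cubic ground truth
    counts = np.zeros((len(sigma_bs), len(degrees)))
    for _ in range(n_samples):
        y = y_true + rng.normal(0.0, sigma_r, x.size)   # one new noisy data set per sample
        for i, sb in enumerate(sigma_bs):
            counts[i, degrees.index(select_first_passing(x, y, sb, degrees))] += 1
    return counts / n_samples


H = selection_heatmap([2.0, 0.2, 0.02], degrees=list(range(1, 7)), n_samples=200)
print(np.round(H, 2))
```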

Fig 5.

Six different model structures for the linear model.

This example is chosen as a simple representation of a mass flow model. The top row shows the model names A₁,…,A₆. The second row shows the matrices that constitute the model structures. The third row illustrates how the corresponding matrices connect the inputs x_i and the outputs y_i via the parameters a₁,…,a₆.
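
A structure of this kind can be encoded as a matrix pattern in which each entry either holds a parameter or is a structural zero, so that y = A(a)x is linear in the parameters and can be fitted by ordinary least squares. The sketch below is a minimal illustration under that assumption; the 2×2 pattern, the parameter values, and the helper names `design_matrix` and `fit_structure` are hypothetical and do not reproduce the matrices A₁,…,A₆ of the figure.

```python
import numpy as np


def design_matrix(structure, X):
    """Map the parameter vector a to stacked outputs, so that Y.ravel() == Phi @ a.

    structure[i][j] is the index of the parameter in entry (i, j) of A, or -1 for a
    structural zero; X has shape (n_samples, n_inputs).
    """
    S = np.asarray(structure)
    n_out, n_in = S.shape
    Phi = np.zeros((X.shape[0] * n_out, S.max() + 1))
    for k, x in enumerate(X):
        for i in range(n_out):
            for j in range(n_in):
                if S[i, j] >= 0:
                    Phi[k * n_out + i, S[i, j]] += x[j]
    return Phi


def fit_structure(structure, X, Y):
    """Least-squares parameter estimate and sum of squared residuals for one structure."""
    Phi = design_matrix(structure, X)
    a_hat, *_ = np.linalg.lstsq(Phi, Y.ravel(), rcond=None)
    residual = float(np.sum((Y.ravel() - Phi @ a_hat) ** 2))
    return a_hat, residual


# Hypothetical structure A = [[a1, 0], [a2, a3]] with parameter indices 0, 1, 2
structure = [[0, -1], [1, 2]]
a_true = np.array([1.0, 0.5, -2.0])
A_true = np.array([[a_true[0], 0.0], [a_true[1], a_true[2]]])
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(20, 2))   # 20 hypothetical input vectors
Y = X @ A_true.T                          # noise-free outputs, y = A x for each sample

a_hat, ssr_true = fit_structure(structure, X, Y)
_, ssr_wrong = fit_structure([[0, -1], [-1, 1]], X, Y)   # structure missing a2
print(a_hat, ssr_true, ssr_wrong)
```

A structure that lacks a needed connection cannot reproduce the data even without noise, which is what a χ²-based or validation-based comparison of candidate structures ultimately detects.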

Fig 6.

Model selection results for the linear model example.

(A–D) Heatmaps represent results from the indicated selection methods, where rows represent different values of σ_b and columns represent the linear models A₁,…,A₆. For each row, color indicates the fraction of times a model is selected for the given σ_b, out of 1000 samples, as indicated by the color scale (right).

Fig 7.

Seven model structures included in the simulated EMU-based ¹³C MFA example.

In each model structure, the component added relative to the preceding, slightly less complex model is marked by a red circle. The true model used to simulate the data is model 4. Detailed descriptions of each model can be found in the supplementary material (S1 Table).

Fig 8.

Model selection results for the simulated ¹³C MFA model example.

(A–D) Heatmaps represent results from the indicated selection methods, where rows represent different values of σ_b and columns represent the seven MFA model structures (Fig 7). For each row, color indicates the fraction of times a model is selected for the given σ_b, out of 100 samples, as indicated by the color scale (right).

Fig 9.

Comparison of estimated flux solutions for the simulated ¹³C MFA example.

The resulting flux values with 95% confidence intervals for seven of the fluxes shared by all model structures in the simulated ¹³C MFA example. The confidence intervals correspond to the estimated fluxes for model (Blue), model with all data available (Green) and model with the data split into D^est and D^val (Red). The figure illustrates that selecting the wrong model structure may result in incorrect flux estimates.

Fig 10.

How prediction uncertainty can be used to assess the novelty in the validation data.

(A) If there is too little novelty in the validation data, the differences between estimation data and validation data will typically be smaller than the prediction and measurement uncertainty. (B) If there is too much novelty in the validation data, the estimation data contain no information about the corresponding MIDs, and the prediction uncertainty will be large, approaching [0,1]. (C) Ideally, the validation data should therefore have well-determined predictions that differ from the estimation data. To be sure that the validation data truly contain new information, one should also check that the new fluxes generate linearly independent EMU basis vectors (Section 2.4).
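
The linear-independence check in (C) amounts to a rank comparison: the validation experiment adds new information only if appending its basis vectors raises the rank of the set spanned by the estimation experiments. The sketch below assumes the EMU basis vectors are available as rows of numeric arrays; the vectors and the helper name `adds_new_information` are hypothetical, and how such vectors are constructed (Section 2.4) is not reproduced here.

```python
import numpy as np


def adds_new_information(est_basis, val_basis, tol=None):
    """True if some validation basis vector lies outside the span of the estimation ones."""
    E = np.atleast_2d(np.asarray(est_basis, dtype=float))
    V = np.atleast_2d(np.asarray(val_basis, dtype=float))
    rank_est = np.linalg.matrix_rank(E, tol=tol)
    rank_all = np.linalg.matrix_rank(np.vstack([E, V]), tol=tol)
    return bool(rank_all > rank_est)


# Hypothetical basis vectors (one per row) from the estimation experiments
est_basis = [[1, 0, 0, 1], [0, 1, 1, 0]]
dependent = adds_new_information(est_basis, [[1, 1, 1, 1]])   # sum of the two rows
independent = adds_new_information(est_basis, [[0, 0, 1, 1]])
print(dependent, independent)
```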

Fig 11.

Usage of prediction uncertainty to demonstrate that the validation data has neither too little, nor too much, novelty, compared to the estimation data.

This analysis shows the results from the simulated ¹³C MFA example (Figs 7–9). The model was trained on estimation data from three tracers: Tracer 1 = 1,2-¹³C-glutamine (dark red), Tracer 2 = 3-¹³C-pyruvate (red), and Tracer 3 = U-¹³C-glutamine (light red). The validation data (dark blue) were obtained with the tracer U-¹³C-pyruvate. For the experimental data, the error bars represent standard deviation; for the model predictions (light blue), the error bars represent model uncertainty (Section 4.4).

Fig 12.

Model selection results for the cultured epithelial cell example.

(A–D) Heatmaps represent results from the indicated selection methods, where rows represent different values of σ_b and columns represent the candidate MFA model structures. For each row, color indicates the fraction of times a model is selected for the given σ_b, out of 1000 samples, as indicated by the color scale (right).

Fig 13.

Validation of lipid synthesis in HMEC cultures.

(A) Schematic of the model for lysophosphatidylcholine (LPC) 16:0 synthesis from acetate (ac). (B) Predicted MID of ac from the model selected by the “Validation” method. (C) Measured MID of glycerol-3-phosphocholine (g3pc). (D) Fitted (gray) and measured (black) MID of LPC 16:0. Mean values of biological triplicates are shown in (C, D). Error bars indicate standard deviation.
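
A common way to relate precursor MIDs to the MID of an assembled product is convolution, which holds when the subunits are labelled independently. The sketch below applies that idea to LPC 16:0 under explicit assumptions: the palmitoyl chain is built from eight two-carbon acetyl units drawn from one acetyl MID, and the g3pc backbone is combined with it independently. This is an illustrative simplification, not the model used in the paper; it ignores natural isotope abundance and other acyl sources, and the MID values and helper names (`convolve_mids`, `lpc16_mid`) are hypothetical.

```python
from functools import reduce

import numpy as np


def convolve_mids(*mids):
    """MID of a product assembled from independently labelled subunits."""
    return reduce(np.convolve, [np.asarray(m, dtype=float) for m in mids])


def lpc16_mid(acetyl_mid, g3pc_mid, n_acetyl=8):
    # Assumption: palmitoyl (C16) chain = 8 acetyl (C2) units labelled independently
    palmitoyl = convolve_mids(*([acetyl_mid] * n_acetyl))   # M+0..M+16
    return convolve_mids(g3pc_mid, palmitoyl)


# Hypothetical MIDs: acetyl M+0..M+2 and a truncated g3pc MID M+0..M+2
lpc = lpc16_mid([0.5, 0.1, 0.4], [0.9, 0.05, 0.05])
unlabelled = lpc16_mid([1.0, 0.0, 0.0], [1.0, 0.0, 0.0])
fully_labelled_chain = lpc16_mid([0.0, 0.0, 1.0], [1.0, 0.0, 0.0])
print(np.round(lpc, 4))
```

Under this assumption, a fully labelled acetyl pool shifts the whole LPC 16:0 signal to M+16, and comparing such predictions with the measured MID in (D) is one way a predicted acetyl MID (B) can be checked.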
