## Figures

## Abstract

Metabolic fluxes, the number of metabolites traversing each biochemical reaction in a cell per unit time, are crucial for assessing and understanding cell function. ^{13}C Metabolic Flux Analysis (^{13}C MFA) is considered to be the gold standard for measuring metabolic fluxes. ^{13}C MFA typically works by leveraging extracellular exchange fluxes as well as data from ^{13}C labeling experiments to calculate the flux profile which best fit the data for a small, central carbon, metabolic model. However, the nonlinear nature of the ^{13}C MFA fitting procedure means that several flux profiles fit the experimental data within the experimental error, and traditional optimization methods offer only a partial or skewed picture, especially in “non-gaussian” situations where multiple very distinct flux regions fit the data equally well. Here, we present a method for flux space sampling through Bayesian inference (BayFlux), that identifies the full distribution of fluxes compatible with experimental data for a comprehensive genome-scale model. This Bayesian approach allows us to accurately quantify uncertainty in calculated fluxes. We also find that, surprisingly, the genome-scale model of metabolism produces narrower flux distributions (reduced uncertainty) than the small core metabolic models traditionally used in ^{13}C MFA. The different results for some reactions when using genome-scale models vs core metabolic models advise caution in assuming strong inferences from ^{13}C MFA since the results may depend significantly on the completeness of the model used. Based on BayFlux, we developed and evaluated novel methods (P-^{13}C MOMA and P-^{13}C ROOM) to predict the biological results of a gene knockout, that improve on the traditional MOMA and ROOM methods by quantifying prediction uncertainty.

## Author summary

^{13}C MFA practitioners know that modeling results can be sensitive to minor modifications of the metabolic model. Certain parts of the metabolic model that are not well mapped to a molecular mechanism (*e.g*. drains to biomass or ATP maintenance) can have an inordinate impact on the final fluxes. The only way to ascertain the validity of the model is by checking that the result does not significantly differ from previously observed flux profiles. However, that approach diminishes the possibility of discovering truly novel flux profiles. Because of this strong dependence on metabolic model details, it would be very useful to have a systematic and repeatable way to produce these metabolic models. And indeed there is one: genome-scale metabolic models can be systematically obtained from genomic sequences, and represent all the known genomically encoded metabolic information. However, these models are much larger than the traditionally used central carbon metabolism models. Hence, the number of degrees of freedom of the model (fluxes) significantly exceeds the number of measurements (metabolite labeling profiles and exchange fluxes). As a result, one expects many flux profiles compatible with the experimental data. The best way to represent these is by identifying all fluxes compatible with the experimental data. Our novel method BayFlux, based on Bayesian inference and Markov Chain Monte Carlo sampling, provides this capability. Interestingly, this approach leads to the observation that some traditional optimization approaches can significantly overestimate flux uncertainty, and that genome-scale models of metabolism produce narrower flux distributions than the small core metabolic models that are traditionally used in ^{13}C MFA. Furthermore, we show that the extra information provided by this approach allows us to improve knockout predictions, compared to traditional methods. Although the method scales well with more reactions, improvements will be needed to tackle the large metabolic models found in microbiomes and human metabolism.

**Citation: **Backman TWH, Schenk C, Radivojevic T, Ando D, Singh J, Czajka JJ, et al. (2023) BayFlux: A Bayesian method to quantify metabolic Fluxes and their uncertainty at the genome scale. PLoS Comput Biol 19(11):
e1011111.
https://doi.org/10.1371/journal.pcbi.1011111

**Editor: **Christos A. Ouzounis,
CPERI, GREECE

**Received: **April 18, 2023; **Accepted: **September 27, 2023; **Published: ** November 10, 2023

**Copyright: ** © 2023 Backman et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability: **We provide BayFlux, an open source implementation of the software and algorithms used for this analysis. BayFlux is a Python library for both genome scale and two-scale ^{13}C Metabolic Flux Analysis (MFA). It is built to be used in conjunction with the COBRApy library for Flux Balance Analysis (FBA), and provides compatibility with all COBRApy features for analyzing genome scale models. Our AcMet algorithm is implemented as an extension to the existing metabolic sampler infrastructure included in COBRApy. BayFlux can be downloaded from https://github.com/JBEI/bayflux, and includes extensive documentation, as well as automated testing for correctness (unit testing). The git repository also contains Jupyter notebooks with all of the analysis presented in this paper, which can serve as a template for performing a new analysis with BayFlux. BayFlux also works directly with models pre-processed through the Limit Flux To Core (lftc) software library provided at https://github.com/JBEI/limitfluxtocore. This enables BayFlux to perform 2S-^{13}C MFA, a simplifying assumption that reduces compute requirements, and makes model curation easier by only requiring atom transitions for central carbon metabolism. All data used in this manuscript was previously published, and is also packaged together with the BayFlux software provided at the GitHub url listed above, to allow readers to easily reproduce these results.

**Funding: **This work was part of the DOE Agile BioFoundry (http://agilebiofoundry.org) (to H.G.M.), and the DOE Joint BioEnergy Institute (http://www.jbei.org) (to J.D.K. and H.G.M.), supported by the U. S. Department of Energy, Bioenergy Technologies Office, and the Office of Science, through contract DE-AC02-05CH11231 between Lawrence Berkeley National Laboratory and the U.S. Department of Energy. This work was also supported by the Basque Government through the BERC 2022–2025 program and by the Spanish Ministry of Science and Innovation MICINN (AEI): BCAM Severo Ochoa excellence accreditation CEX2021-001142-S and PID2019-104927GB-C22 grants (both to E.A.). J.J.C. was supported by the U.S. Department of Energy, Office of Science, Office of Workforce Development for Teachers and Scientists, Office of Science Graduate Student Research (SCGSR) program. The SCGSR program is administered by the Oak Ridge Institute for Science and Education (ORISE) for the DOE. ORISE is managed by Oak Ridge Associated Universities (ORAU) under contract number DE-SC001464. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Competing interests: ** I have read the journal’s policy and the authors of this manuscript have the following competing interests: J.D.K. has financial interests in Amyris, Ansa Biotechnologies, Apertor Pharma, Berkeley Yeast, Cyklos Materials, Demetrix, Lygos, Napigen, ResVita Bio and Zero Acre Farms.

## Introduction

Synthetic biology enables us to bioengineer cells for synthesis of novel valuable molecules such as renewable biofuels or medical drugs [1–5], but its full potential is hindered by our inability to predict biological behavior [6, 7]. We can engineer DNA changes faster than ever, and we can measure the impact of these genetic changes in more detail than ever through an increasing amount of functional genomics data. But the availability of all these advances does not necessarily translate into better predictive capabilities for biological systems: converting the collected data into actionable insights to achieve a given goal (*e.g*., higher bioproduct yields) is far from trivial or routine.

Metabolic fluxes (*i.e*., the number of metabolites traversing each biochemical reaction per unit time per a given amount of biomass) are crucial to predict and understand biological systems because they map how carbon and electrons flow through metabolism to enable cell function. Flux analysis, for example, has been used to improve biofuel production [8], contextualize multiomics data [9], and provide insights into multi-species relationships [10].

Although there are several popular methods for studying metabolic fluxes [11–13]^{13}C MFA is considered to be the gold standard to measure metabolic fluxes. Metabolic Flux Analysis (MFA [11], as opposed to ^{13}C MFA) works by using measurements of the exchange fluxes (fluxes coming in and out of the cell) to fully constrain a small core metabolic network (with less degrees of freedom than measurements). Flux Balance Analysis (FBA [12]) uses a comprehensive metabolic network (a genome-scale metaboic model, or GSMM) that encompasses all reactions encoded in the genome, constrains it using the exchange fluxes, and then finds the fluxes corresponding to the highest growth rates. Traditional ^{13}C MFA [13] works by using exchange fluxes as well as data from ^{13}C labeling experiments to find which flux profile (*i.e*., set of fluxes for every reaction in the model) best describes the data for a small metabolic model. Genome-scale ^{13}C MFA uses the same data types but finds the flux profile for a (much larger) genome-scale metabolic model, often containing thousands of reactions [14, 15]. FBA can produce good results when the cells are under selection for maximum growth [16] but is less useful when that is not the case (*e.g*. human cells or engineered bacterial strains). ^{13}C MFA is considered the most accurate approach to measure metabolic fluxes but relies on expensive and time consuming ^{13}C experiments which are nontrivial to do in a high-throughput fashion. Both methods have been used successfully to guide bioengineering processes [17, 18].

The optimization approach used to date in ^{13}C MFA to determine fluxes [19–22] shows several limitations, particularly in characterizing the full distribution of fluxes compatible with the data. For example, the results from small core metabolic models can be very sensitive to the modification of apparently innocuous components of the model. Also, if genome-scale models are being used, the system shows more degrees of freedom (fluxes) than experimental data and we expect many flux profiles to be compatible with the experimental data. Furthermore, errors in determining ^{13}C labeling for a single metabolite can completely derail flux calculation. This optimization approach also often depends on expensive commercial solvers, and is hard to parallelize. Finally and most importantly, uncertainty quantification relies on confidence intervals estimated with help of frequentist statistics. Such intervals depend on a given experimental outcome and may result in misinterpretation of the flux uncertainty, as has been noted by others before us [23]. For example, the solution space may have an area of poor fit to the data between two distinct regions of excellent fit (*e.g*. non-gaussian fitness distribution), such that a single point does not meaningfully represent the experimental data.

Here, we present a method (BayFlux) that rigorously identifies the full distribution of flux profiles compatible with experimental data for a genome-scale model (Fig 1). The result provided to the end user is a “probability distribution” of possible fluxes, which faithfully reports the full uncertainty due to experimental error, and any potential model data incompatibilities. To achieve this, we combine *both* Bayesian inference and Markov Chain Monte Carlo (MCMC) methods to sample flux space, as informed by ^{13}C labeling and flux exchange data. Just Monte Carlo sampling by itself provides all fluxes compatible with experimental data instead of just the fluxes that best fit the available data [24–27]. However, while this approach offers much needed flux uncertainty quantification [28], it fails when the data used is inconsistent (*e.g*. fluxes coming in and out of the cell are not mass balanced). Bayesian inference can address this complication because it is based on a probabilistic interpretation that ensures a systematic approach to manage data inconsistencies and update flux probability distributions as more data becomes available [29–32]. Combining Monte Carlo flux sampling with Bayesian statistics hence provides reliable flux uncertainty quantification in a way that scales efficiently as more data become available [23]. Unlike previous approaches [23, 30, 33], our approach enables flux uncertainty quantification for genome-scale models rather than only for small central carbon metabolism models [23], and helps trace the origin of flux uncertainty directly to the physical measurements of metabolite labeling. Using genome-scale models for modeling metabolism provides a comprehensive understanding of all metabolic fluxes in a cell [14, 15], and standardizes the application of ^{13}C MFA. Furthermore, MCMC Bayesian inference is easier to parallelize, use with large datasets, and integrate with heterogenous data sources (*e.g*., multiomics data [31]).

The state of the art in measuring metabolic fluxes (^{13}C MFA) involves using extracellular metabolite concentration data (top left) and ^{13}C experimental data (bottom left) to find the fluxes that best fit the data (optimization approach) for a core metabolic model. The extracellular metabolite concentration data is converted into exchange fluxes, and the ^{13}C experimental data involves the metabolite labeling patterns or Mass Distribution Vector, MDV, or Mass Isotopomer Distribution, MID (e.g., for cytosolic ribose-5-phosphate, r5p_c, among others) after labeled glucose is metabolically transformed. However, core metabolite models only represent a small fraction of all possible reactions and involve simplifications that can have an inordinate impact on the calculated fluxes. Genome-scale models can be systematically derived from the organism genome and represent a comprehensive description of metabolism, but also display more degrees of freedom (reactions) than measurements. This mismatch results in several flux profiles being compatible with the experimental data, which are badly represented by a single flux solution, even if coupled with a confidence interval. Whereas the optimization approach (top right, where each point in the x axis is a different flux, and the y axis its value and confidence interval) only provides a best estimate for the flux profile (e.g., **v**_{CS} for citrate synthase, cs) and a confidence interval, BayFlux uses Bayesian Inference and Monte Carlo sampling to provide the full distribution of fluxes compatible with the experimental data (bottom right, where the x axis is the flux value, and the y axis represents *P*(**v**_{CS}): the probability of the flux being **v**_{CS}).

We showcase BayFlux by performing the first flux sampling for a genome-scale model as constrained by ^{13}C data, and demonstrating more informative flux profiles than can be achieved with non-Bayesian, optimization-based approaches. Using an *E. coli* model and data set [34], we show that BayFlux produces results that are compatible with optimization results, and also offer valuable uncertainty quantification information. This uncertainty quantification allows us to show that optimization approaches can overestimate flux uncertainty by representing it through only two numbers: the upper and lower confidence intervals. We find that genome-scale models of metabolism result in narrower flux distributions than the small core metabolic models that are traditionally used in ^{13}C MFA. Based on BayFlux, we develop and evaluate novel methods (P-^{13}C MOMA and ROOM) to predict the biological results of a gene knockout, that improve on FBA-based MOMA and ROOM methods by quantifying prediction uncertainty. Finally, we find that BayFlux scales well as more reactions are added, but efficiency improvements will be needed to sample the very large metabolic models required for microbiomes or human metabolism. We also discuss ideas for future improvement.

## Results and discussion

### A Bayesian approach to sampling fluxes

BayFlux uses Bayesian rather than frequentist statistics to determine fluxes through ^{13}C MFA. Bayesian and frequentist statistics represent two different paradigms to deal with probability and inference.

Frequentist statistics, by far the most often used approach in ^{13}C MFA, assume the existence of a true vector of fluxes **v**, use Maximum Likelihood Estimators (MLE) to find such a vector, and rely on confidence intervals to reflect uncertainties in flux estimates. The MLE determines **v** as the most likely to produce the measured experimental data: i.e., it is a point estimator that generates a single result even when many fluxes produce the same experimental data. The uncertainty in the result is reflected through a confidence interval, which can be computed in different ways that do not necessarily lead to the same outcome [23]. This approach struggles to deal with situations where many fluxes can equally best represent the experimental data, particularly if they are not adjacent.

Bayesian inference assumes a hypotheses-driven point of view, where the goal is to estimate the posterior *p*(**v**|**y**) representing the probability that a certain value of **v** is realized, given a certain prior knowledge and experimental data **y** available to the observer. Providing full probability distributions allows for a more accurate description in the case where many fluxes can equally best represent the experimental data. Bayesian inference calculates the posterior *p*(**v**|**y**) through the Bayes formula [35]:
(1)
where *p*(**v**|**y**) is the probability of obtaining experimental data **y** given fluxes **v**, *p*(**y**) is the probability of observing data **y**, and *p*(**v**) is the probability of observing **v** (representing prior knowledge). Monte Carlo sampling is used to calculate the posterior *p*(**v**|**y**) by “jumping” from point to point in the phase spaces of fluxes **v** with a probability given by the numerator *p*(**y**|**v**)*p*(**v**) (since the denominator is the same for all **v**) [36]. This process is very different from the Monte Carlo approach used in the past in, e.g., Antoniewicz *et al* [37] to calculate confidence intervals. This latter Monte Carlo process is still a frequentist approach to provide a single best flux estimate and a confidence interval by means of repeatedly adding noise to the Mass Isotopomer Distribution (MID) and recalculating the fluxes anew.

Our novel algorithm, Artificial centering Metropolis sampling (AcMet, Fig 2, algorithm 1), forms the basis of BayFlux by allowing us to sample from the posterior probability distribution, and is based on the commonly used Artificial Centering Hit-and-Run (ACHR) algorithm [33]. The ACHR algorithm is widely used to sample the genome scale metabolic models (GSMMs) flux space (a polytope) as constrained by stoichiometry only (i.e., the approach that forms the basis of FBA). The ACHR algorithm works by randomly jumping around the flux polytope, and keeping a running average of all samples called the “center.” Jumps always move in direction vectors from the center to known edges of the polytope, allowing the sampler to efficiently move around the polytope, even in directions with unusually squished or elongated dimensions. This method of sampling works well to collect uniform samples for the case where the only constraints are stochiometric, since all points inside the polytope are equally valid. We modify the ACHR algorithm and combine it with the Metropolis algorithm to produce a Markov Chain Monte Carlo (MCMC) sampler that can also take into account ^{13}C experimental data. We achieve this by sampling in two phases: an initial ACHR phase which finds the center of the flux polytope, and a second Bayesian inference phase in which the center is locked, and samples are accepted or rejected based on how well they fit the available ^{13}C experimental data. Briefly, for each flux point proposed in the Bayesian inference phase, the corresponding mass isotopomer distributions (MIDs) can be calculated using Elementary Metabolite Units (EMUs, [38]). The flux point is then more likely to be accepted as a sample the closer the calculated MIDs are to the measured MIDs. Locking the center is necessary to make the proposed jumps reversible- a necessary condition to guarantee that the sampler converges to the correct probability distribution. In order to make this approach work, we also need to add the capability to sample from cyclic fluxes (which are irrelevant in FBA, since it only considers net fluxes).

The AcMet algorithm is used to sample the phase space and find the probability for each flux profile (see Eq (1), and Algorithm 1). Each frame illustrates a step in the AcMet algorithm, shown in only two dimensions for simplicity. The black outline represents the feasible flux phase space (a polytope), as determined by the genome scale model stoichiometric matrix. **1. Center identification**. Initial ‘edge points’ are identified on the edges of the flux space by minimizing and maximizing each reaction. A running average of all samples is maintained as the ‘center’ and a series of samples are taken, always moving the current point in a direction determined by the current center and one of the edge points. Once a direction is determined, a sample is chosen from the uniform distribution within the allowable bounds, and all samples are accepted. Sufficient samples are collected to obtain a stable center. **2. Metropolis sampling**. Once a stable center is identified, the center is locked, and all previous samples are discarded. New proposed samples are collected in the same manner as step 1., but without updating the center. **3. Reject low probability samples**. Samples are accepted or rejected probabilistically based on the ratio of the likelihood of the data given the new sample, divided by the likelihood of the data given the current sample, *L*(*data*|*new sample*)/*L*(*data*|*current sample*). All higher likelihood samples are accepted. **4. If a sample is rejected, back up a step**. If a sample is rejected, it is discarded, and the sampler is moved back to the previous sample location, and records an additional sample at the previous location. **5. If a sample is accepted, continue**. If a sample is accepted, it is recorded and more samples are collected, just as in step 1, but without updating the center. **6. Halt and report posterior probability**. After a sufficient number of samples are collected, they are used to describe the posterior probability distribution. See Materials and methods for further details.

### BayFlux Monte Carlo sampling results are compatible with optimization results

We obtain compatible results when calculating fluxes through the sampling and optimization approaches for the same core metabolic model (Figs 3 and 4, A in S1 Text, B in S1 Text, and C in S1 Text). We compare our new BayFlux sampling approach with the optimization approach via a classical ^{13}C MFA tool: 13CFLUX2 [34], which leverages the IPOPT (www.coin-or.org/ipopt) and NAG C (www.nag.co.uk) mathematical optimization libraries. For this comparison, we use the *E. coli* data and core metabolic model employed in the demonstration of 13CFLUX2: measurements of glucose uptake, growth rate, and the labeling of eleven central carbon intracellular metabolites for an *E. coli* MG1655 strain grown in glucose-limited continuous culture, and a model comprised of 66 reactions and 37 metabolites describing central carbon metabolism (see Core Metabolic Model 1 below). Flux profiles for the best fit and best sample (*i.e*. highest likelihood) from both methods are very similar (Fig 3), and the 13CFLUX2 fluxes are always enclosed by the BayFlux probability distributions and close to the BayFlux best sample (Fig 4). These results hold for fifteen instances of flux profiles that were obtained through both methods (Figs B in S1 Text and C in S1 Text). Furthermore, the corresponding metabolite labeling patterns (mass distribution vectors or MDVs) are virtually identical between 13CFLUX2 and BayFlux (Fig A in S1 Text). BayFlux, however, provides more information on flux uncertainty, since it reports the full flux probability distribution, which is rarely uniform over the optimization confidence interval (Fig 4). BayFlux is best used as a probabilistic tool, and while it can report a single “best sample” (e.g. highest likelihood) point solution, we recommend considering the entire distribution when interpreting results.

The best sample (*e.g*. highest posterior probability) from BayFlux (for ten million samples) is here compared with the best fit obtained from 13CFLUX2. Results for best fits and samples are similar: while there are differences for some TCA cycle fluxes (e.g. tca3, tca4), the credible intervals for these fluxes overlap with the 13CFLUX2 best fit and its confidence interval (Fig 4 and C in S1 Text), indicating that the difference is not significant given the current data. In general, BayFlux credible intervals overlap with the 13CFLUX2 best fit and its confidence interval for all different inputs (Figs B in S1 Text and C in S1 Text). All the fluxes are in units of mmol/gDW/h. Reaction names correspond to Core Metabolic Model 1 (see Materials and methods).

Whereas the optimization approach only provides the best fit and confidence intervals, BayFlux supplies the probability distribution of all fluxes compatible with the experimental ^{13}C data (Fig 1). Probability densities (blue), best sample (vertical magenta line), and mean (vertical green line) from BayFlux for ten million flux samples are shown vs. 13CFLUX2 best fit with confidence intervals (in orange) for 5 out of 66 fluxes (see Fig 3 for best fits and best samples for a greater number of reactions). Reaction names correspond to Core Metabolic Model 1 (see Materials and methods). The credible intervals for, e.g., fluxes tca3 and tca4 (see Fig 3) overlap with the 13CFLUX2 best fit and confidence intervals. This shows that the difference is not significant, given the current data, and highlights the importance of quantifying flux uncertainty.

The rigorous uncertainty quantification provided by BayFlux shows that the 13CFLUX2 confidence intervals often grossly overestimate flux uncertainty (Fig 4). For example, fluxes ‘upt’ and ‘emp3’ show a probability density that extends over a range that is an order of magnitude smaller than the corresponding confidence interval. Hence, these fluxes can be determined more accurately through BayFlux than the optimization approach used in 13CFLUX2. The linearized statistics used as a default in 13CFLUX2 to identify confidence intervals have been shown in the past to provide only approximate confidence intervals, which can be highly inaccurate [37].

### Genome-scale models produce narrower flux distributions than core metabolic models

Flux sampling (via BayFlux) for the genome-scale model of metabolism generally produces narrower distributions of fluxes compatible with the experimental data than the small core metabolic models that are traditionally used in ^{13}C MFA (Fig 5). This means that fluxes are more accurately determined by using genome-scale models than small core models. For this comparison, we use the *E. coli* data and core metabolic model previously published by Toya *et al*. [39] (63 reactions and 47 metabolites describing central carbon metabolism, see Core Metabolic Model 2 below), and as the genome scale model we use the *E. coli* genome-scale model combining iAF1260 [40] and imEco726 [14] models (see genome-scale *E. coli* atom mapping model below). This finding is surprising because one would expect that the extra degrees of freedom provided by the several thousand reactions in the genome-scale model (as compared to the ∼60 in the core model) would allow for many more distinct flux profiles to meet stoichiometric constraints and labeling data. However, it seems that central metabolic fluxes are well constrained by metabolite labeling, and the addition of several hundreds of extra reactions only add draws of cofactors that further constrain core fluxes. This is consistent with the known bow tie structure of metabolism [41]. Exceptions to this observation involve reactions F6PA (fructose 6-phosphate aldolase), DHAPT (dihydroxyacetone phosphotransferase), and PYK (Pyruvate kinase), which show greater uncertainty (*e.g*. broader probability distrbution peaks) for the genome-scale model than the core model (Fig 5).

The results obtained from BayFlux with a core metabolic model (blue) are compared with those obtained from a genome-scale model (orange). Using a genome-scale model produces a narrower flux distribution (higher certainty posterior probability distributions), as informed by a greater amount of biological knowledge encoded in the genome-scale model. Notice too, that certain reactions display very different averages. For example, GLUDY shows very different averages for the genome-scale and core metabolic models, advising caution in assuming strong inferences from ^{13}C MFA since the results may depend significantly on the model used. Additionally, several of the probability distributions are non-Gaussian, which can only be meaningfully represented as a full distribution rather than a point or interval. We show here only reactions which occur in both models, and which show convergence across 4 repeated BayFlux runs (, Gelman-Rubin statistic [42], see main text). Reaction names correspond to Core Metabolic Model 2 (see Materials and methods).

To systematically quantify the uncertainty in the posterior probability distributions between the genome scale and core metabolic models, we computed the absolute value of the coefficient of variation for each of the 27 reactions which were present in both models, and which show convergence across 4 repeated BayFlux runs (, Gelman-Rubin statistic [42]). The posterior flux distributions for the core model showed a mean coefficient of variation of 0.359, whereas the genome scale model showed a mean coefficient of variation of 0.302. Overall, this shows a trend of greater information, and reduced uncertainty in the flux results for the genome scale metabolic model vs the core model.

Interestingly, whereas flux distributions for both genome-scale models and core models are mostly concordant in their means, there are a few instances in which flux measurements present very different averages (Fig 5). For example, PPCK (phosphoenolpyruvate carboxykinase), PPC (phosphoenolpyruvate carboxylase), and GLUDY (Glutamate dehydrogenase) offer very different flux estimates depending on which model we use (lower values for the genome-scale model). Among those are PPC and PPCK, that catalyze opposite reactions, forming a cycle/cyclic flux, so it is only the net flux that is biologically meaningful, and the net flux is approximately the same (Fig 5). GLUDY, however, catalyzes the conversion of 2-oxoglutarate into glutamate and is negative for the genome-scale model, but hovers around zero for the core model. This example advises caution in assuming strong inferences from ^{13}C MFA since the results may depend significantly on the model used. Whereas all mathematical models require validation, sensitivity analysis, and statistical testing in order to be applied effectively, the traditional use of small scale ^{13}C MFA is particularly fraught with modeler’s choices that can have an inordinate impact on the final fluxes, (see, e.g., the biomass fluxes in the 13CFLUX2 *E. coli* core model [19]). Genome-scale models, on the other hand, can be produced in a more systematic manner [12, 43, 44], so we suggest that the most consistent and repeatable way to proceed is to derive metabolic models systematically from the genome.

In any case, we can see that the idea of narrowly constraining all fluxes is, often, misleading. For example, for the cases of F6PA, DHAPT and PYK, the genome-scale model shows very flat probability distributions over large ranges of flux values (Fig 5).

### Monte Carlo sampling enables probabilistic knockout predictions

By leveraging the full probability distribution provided by BayFlux, we develop and evaluate two novel methods to predict fluxes after a knockout (Figs 6 and 7, and E in S1 Text): Probabilistic ^{13}C Minimum of Metabolic Adjustment (P-^{13}C MOMA), and Probabilistic ^{13}C Regulatory On/Off Minimization (P-^{13}C ROOM). Unlike traditional (FBA-based) MOMA [45] and ROOM [46] (or their ^{13}C versions [15]), this approach yields a predicted distribution of flux profiles after a knockout that aims to capture the uncertainty inherent in the initial wild type (WT) flux distribution, and represent that in the prediction. Traditional MOMA and ROOM work by first computing a base (wild type, WT) flux profile through FBA and then applying MOMA or ROOM to that FBA-derived flux profile, generating an single predicted flux profile. P-^{13}C MOMA and P-^{13}C ROOM, on the other hand, work by computing the base flux profile distribution using BayFlux, then choosing a representative set of flux profiles through subsampling, and finally computing the MOMA [45] and ROOM [46] knockout prediction for each of the flux profiles in this set (Fig D in S1 Text).

The x axis represent the Euclidean distances for the flux profiles predicted by MOMA, ROOM, P-^{13}C MOMA and P-^{13}C ROOM to the original fluxes calculated by Toya *et al*. [39] using a core metabolic model (ground truth fluxes, the smaller the Euclidean distance the better the prediction). Since P-^{13}C MOMA and P-^{13}C ROOM yield distributions of predicted flux profiles, the y axis represents the density of distances for these methods. MOMA and ROOM yield a single predicted flux profile, so we plot a single line to represent them. **A**. Distribution of euclidean distances to Toya *et al*. pyk5h flux predictions. 23.5% of the P-^{13}C MOMA and 36.5% of the P-^{13}C ROOM prediction distribution were more accurate than the traditional (FBA-based) MOMA and ROOM results, respectively. **B**. Distribution of euclidean distances to Toya *et al*. pgi16h flux predictions. 18.5% of the P-^{13}C MOMA and 38.5% of the P-^{13}C ROOM predicted distribution flux profiles were more accurate than the traditional MOMA and ROOM results, respectively.

Here we show the knockout prediction performance for four methods as judged by the distance of the prediction to the experimentally measured flux profile distribution (computed with BayFlux from ^{13}C experimental data). Rather than using single fluxes to determine which method performs better (Fig E in S1 Text), distances between full flux profile distributions comprising all fluxes are calculated through a classical measure of how two probability distributions differ from each other: the multivariate Kullback-Leibler divergence [47] (higher value → worse prediction, lower value → better prediction). The distance between the WT base profile distribution and the KO experimentally observed flux profile distribution is provided for reference. Notice how P-^{13}C MOMA and P-^{13}C ROOM produce smaller distances to the experimental results as compared with MOMA and ROOM, indicating improved predictions. The improvement is particularly pronounced for P-^{13}C MOMA, whereas it is marginal for P-^{13}C ROOM. All distances are shown as relative to the knockout strains (on the left) but flux profiles inhabit a multidimensional space, so similar distances do not mean the distance among them is small (e.g., the fact that the wild type and the P-^{13}C MOMA have a similar distance to the pyk5h knockout does not mean that these two flux distributions are similar to one another).

We compare the predictions of P-^{13}C MOMA and P-^{13}C ROOM and those of traditional MOMA and ROOM, with flux profiles measured through ^{13}C MFA (which we will take as ground truth) using data previously published in Toya *et al*. [39] for wild type and two gene knockouts (*pyk* and *pgi*). Flux profile distributions for P-^{13}C MOMA and P-^{13}C ROOM are obtained by using the Toya *et al*. [39] WT ^{13}C labeling data to calculate a base flux profile distribution through BayFlux and use it (Fig D in S1 Text) to yield P-^{13}C MOMA and P-^{13}C ROOM predictions for the *pyk* and *pgi* KOs. Flux profiles for traditional MOMA and ROOM are obtained through FBA using the same genome scale model used with BayFlux to obtain the base flux profile (maximizing growth after removing the growth rate constraint, but keeping extracellular exchange constraints). We then apply MOMA and ROOM to the resulting flux profiles to predict the knockout flux profiles for *pyk5h* and *pgi16h* knockouts. We compare this four predicted genome-scale flux profiles and distributions (for P-^{13}C MOMA, P-^{13}C ROOM, MOMA and ROOM) to the flux profile for the corresponding knockout calculated in two different ways: the original result through ^{13}C MFA using a core metabolic model by Toya *et al*., and through BayFlux. The comparison to the original flux profile from Toya *et al*. is meant to provide a comparison to a ground truth that is not influenced by BayFlux, and the comparison to a BayFlux profile is meant to provide a comparison to a more comprehensive genome-scale flux profile.

The comparison of predicted fluxes to the original fluxes calculated by Toya *et al*. using a core model shows how P-^{13}C MOMA and P-^{13}C ROOM provide uncertainty quantification for MOMA and ROOM predictions, and yield some solutions that improve on MOMA and ROOM results (Fig 6). The comparison is made by calculating the Euclidean distances of the predicted flux profiles to the ground truth fluxes for the 21 or 22 reactions (for *pyk* and *pgi* respectively) included in the core metabolic model, and that converged to a stable final result across 4 repeated BayFlux runs (, Gelman-Rubin statistic [42]). This approach results in a single distance for MOMA and ROOM, and a distribution for P-^{13}C MOMA and P-^{13}C ROOM. The distributions yielded by P-^{13}C MOMA and P-^{13}C ROOM provide a quantification of the predicted flux profiles, and shows that a large portion (18.5% to 38.5%) of the flux profiles predicted by P-^{13}C MOMA and P-^{13}C ROOM prediction are closer to the ground truth fluxes than traditional MOMA and ROOM predictions (Fig 6).

The comparison of predicted fluxes to the genome-scale fluxes obtained through BayFlux suggest that knockout predictions for a full set of genome-scale reactions improve by leveraging BayFlux flux probability distributions as the base flux distribution for MOMA and ROOM (Fig 7, and E in S1 Text). For both knockouts, P-^{13}C MOMA outperforms MOMA and P-^{13}C ROOM outperforms ROOM. The best predictions for all methods are provided by P-^{13}C MOMA, in terms of minimal distance from experimentally measured fluxes (Fig 7).

### Evaluation of convergence and scaling performance shows that faster or more efficient sampling will be required for very large systems

The number of required samples to reach convergence using BayFlux seems to scale linearly with the number of reactions in the model (Fig 8). Convergence, *i.e*. a stable distribution for the flux probabilities, is here defined as at least 80% of fluxes achieving a net flux Gelman-Rubin statistic [42] (the difference between estimated variance within the same chain and between different chains), across 4 independently run sampler chains. We test three models: the toy model from Antoniewicz *et al*. [38] (5 reactions), the example *E. coli* core metabolism model from 13CFLUX2 [19] (66 reactions), and the imEco726 genome scale model used throughout this paper [14] (487 reactions with statistical variability, *e.g*. not fixed by stoichiometry). This last, genome-scale, model requires 5.86 days to reach convergence after ≈ 33*M* samples using two Intel Xeon Gold 6154 (3 to 3.7 Ghz) cpu cores per sampler, for a sampler performance of approximately 65.4 samples per second. In contrast, the 13CFLUX2 optimization approach takes 2–5 minutes for the *E. coli* core metabolism model using two Intel(R) Xeon(R) CPUs E7–8870 v3 (2.10GHz), whereas BayFlux takes 8–9 hours using two Intel Xeon Gold 6154 (3 to 3.7 Ghz) cpu cores.

Shown are the number of samples required to reach convergence across four parallel chains, for three different size models and the fit to a linear model. We define convergence as having at least 80% of reactions with a net flux Gelman-Rubin statistic across 4 parallel chains, and exclude reactions with no sampling variance, *e.g*. reactions that are fully constrained and have only a single possible flux value (allowing for small amounts of numerical error) [42]. The data used here for the two largest models are the wild type 5 hour data from Toya *et. al*. [39].

Novel, faster, sampling algorithms will be required to apply BayFlux to the large metabolic models involved in microbial communities and human metabolism. A quick back-of-the-envelope calculation shows that a community of ≈ 200 species (3000 * 200 = 600, 000 reactions) would require over 19 years to achieve convergence based on the linear slope (65, 881) and intercept (1, 174, 530) shown in Fig 8, assuming the time per sample were roughly the same 65.4 samples per second as in our genome-scale *E. coli* model (a reasonable approximation because larger sparser matrices present in larger models are fundamentally better suited to parallelization, making the per-core runtime similar despite increased computing demands):

Similarly the recent human metabolic model by Thiele *et al* [48] (≈ 80, 000 reactions) would require over 2.5 years to converge:

Hence, if we are to tackle these problems in a reasonable amount of time, parallelization or more efficient sampling algorithms are required. BayFlux may be particularly informative for these applications since the large number of reactions for microbial communities and human metabolism (∼ 80, 000) is likely to produce many flux profiles compatible with the experimental data. For this reason, we believe that a probabilistic approach such as Bayflux would yield flux distributions that are flat and extended rather than peaked and concentrated, for which a single flux profile would be a poor representation of all the possible flux profiles.

Preliminary tests show that we can get approximately an order of magnitude speedup just by using sparse matrix solvers. We find that we can get a ≈ 8.10*x* speedup by using the popular sparse matrix solver algorithm SuperLU in place of the Numpy dense matrix solver from the Intel Distribution for Python that we are currently using, based on the mean of seven repeated matrix solving steps in our *E. coli* genome scale model [49]. We intend to support this alternative solver in future versions of BayFlux, after incorporating sparse matrix representations into the software.

## Conclusion

We have presented here a method (BayFlux) that provides all fluxes compatible with ^{13}C experimental data for a genome-scale metabolic model (Fig 1). BayFlux works by combining Bayesian inference and Markov Chain Monte Carlo (MCMC) sampling (Figs 1, and 2), and produces results compatible with the traditional optimization approach to estimating fluxes through ^{13}C MFA (Fig 3). However, BayFlux results provide extra information in the form of the full flux probability distribution, which allows for a more nuanced understanding of which flux values are possible given the current experimental data set (Fig 4). Moreover, BayFlux’s rigorous quantification of uncertainty shows that optimization models can overestimate flux uncertainty by representing it through only two numbers: the upper and lower confidence intervals. This is certainly the case for the local linearized estimates of uncertainty that are the default in 13CFLUX2 (Fig 4).

Surprisingly, the genome-scale model of metabolism produces narrower flux distributions than the small core metabolic models that are traditionally used in ^{13}C MFA (Fig 5). We hypothesize that the cause of this increased precision in flux determination for more complex models despite the apparent increase in degrees of freedom is due to the bow tie structure of metabolism [41]. Due to this structure, central metabolic fluxes are well constrained by metabolite labeling, and the inclusion of several hundreds of extra reactions only adds draws of cofactors that further constrain core fluxes. An additional surprising finding is that whereas flux distributions for both genome-scale models and core models are mostly concordant in their means, there are a few instances in which flux measurements present very different averages (Fig 5). This finding advises caution in assuming strong inferences from ^{13}C MFA since the results may depend significantly on the model used. We believe that the most systematic way to obtain a model for ^{13}C MFA is to derive it methodically from the genome annotation, as we have done here, by creating a genome-scale model from the genome annotation and augmenting it with the corresponding atom transitions for each reaction. However, this assumes that reactions and atom transitions for a genome-scale model can be obtained accurately. Because of its larger size, errors are more likely to be made when assembling a genome-scale set of reactions and atom transitions than a core metabolic model. While this problem can be partially offset by using automated annotation of GSMs (e.g. model SEED [50], EcoCyc [51], or CarveMe [44]), it will be complicated to completely eliminate. The question then arises: which is more accurate? a GSM with possible errors in annotation or a well-known core model that does not consider all known reactions and metabolites? We believe the answer can only be obtained by testing predictions using both approaches, in a similar way as shown in Figs 6 and 7. However, a final conclusion can only be achieved by comparing a statistically significant set of predictions (∼ 100 knockouts).

Based on BayFlux, we developed and evaluated novel methods (P-^{13}C MOMA and ROOM) to predict the biological results of a gene knockout, that improve on traditional FBA-based MOMA and ROOM methods (Fig E in S1 Text). P-^{13}C MOMA and P-^{13}C ROOM leverage the full flux probability distributions measured through BayFlux to provide probability distributions of fluxes after a gene knockout in a way that captures the uncertainty inherent in the initial flux. These probability distributions provide uncertainty quantification for these predictive methods (Fig 6).

The scaling properties for BayFlux with respect to model size (Fig 8) indicate that significant efforts in improving sampling and parallelization will be required to apply this method to large models such as microbiome or human metabolism models (displaying 80,000–200,000 reactions). However, preliminary results suggest that upgrades, such as a sparse solver, can increase speed by orders of magnitude.

In summary, BayFlux provides a rigorous way to find all flux profiles compatible with a given set of ^{13}C experimental data, opening the door to an improved understanding of metabolism and more effective predictions for strain metabolic engineering.

## Materials and methods

### Overview of physical problem

The overarching goal of this work is to leverage diverse and noisy experimental data to infer the metabolic fluxes in a cell: *e.g*. the number of chemical species per unit time through every chemical reaction in a living organism. A single cell can encompass thousands to tens of thousands of chemical reactions. We divide these reactions into internal, and external (*i.e*. exchange) reactions, where at least one product or reactant is extracellular. Exchange reactions can be observed directly, by measuring the rate of change in extracellular concentration for a given chemical species: *e.g*. the exchange flux for glucose will match the rate at which the glucose concentration falls in the culture media. Internal or intracellular fluxes can only be inferred indirectly from other types of experimental data, as the chemical species they consume and produce are immediately also consumed and produced by other reactions. For example, a reaction ‘A’ may have a very high flux, but its chemical product can still be almost absent from a cell if reaction ‘B’ consumes its product at the same or a higher rate.

Metabolic flux is a coarse-grained concept, comprising a large number of heterogeneous chemical reactions in a living cell. For very simple systems, it is possible to treat individual models, atoms, and enzymes as distinct events, *e.g*. using stochastic chemical kinetics [52]. However, these types of kinetic simulations are infeasible for a full cell. When a large number of cells are growing exponentially under stable environmental conditions (*e.g*. constant cellular doubling time) the random effects of individual chemical reaction events are cancelled out, allowing us to apply a steady state approximation. This assumes that the quantity of each chemical species has reached a quasi-stable equilibrium, and therefore one can regard each reaction as having a specific flux, where the total flux of reactions generating a chemical species always equals the sum of fluxes consuming it. In addition, applying the simplifying approximation that each cellular compartment or organelle is ‘well mixed,’ allows a single flux value to represent each reaction in each compartment.

### Key challenges

An important challenge involves integrating together diverse data sources with fundamentally different types of error, and underlying relationships to metabolic flux. Most common among these data types are exchange fluxes and mass isotopomer distributions (MIDs). Extracellular exchange fluxes account for mass entering and exiting the cell, and include measurements such as nutrient consumption and metabolic byproduct production rate, as estimated by a time series of extracellular concentration measurements and biomass or growth rate measurements. Exchange fluxes are represented directly as metabolic fluxes for specific reactions, and can be either applied as stoichiometric constraints to specific reactions, or as probablistic constraints on those fluxes. Mass isotopomer distributions (MIDs) are the result of feeding the organism with defined isotopomer nutrient substrates: *e.g*. glucose with extra neutrons on specific atoms. Since different metabolic flux vectors result in shuffling of atoms in different manners, they produce distinct MIDs. Using the Elementary Metabolic Unit (EMU) method [38], we can calculate the MID for any given flux vector, and compare this to the experimental data.

Perhaps the most important challenge is that, given the incredible complexity of metabolism and the paucity of (noisy) experimental data, the metabolic flux system is severely underdetermined, with substantial uncertainty about the true metabolic flux profile. As shown in the examples below, the number of fluxes for a genome-scale model is typically on the order of ∼ 3000, whereas the available data to constrain is usually on the order of 1–5 exchange fluxes and ∼ 60 metabolite measurements. Traditional ^{13}C MFA tackles this issue by considering only the reactions in central carbon metabolism and ignoring the rest. This approach conveniently, and artificially, reduces the number of degrees of freedom of the system below the number of measurements. This simplification can be justified by the “bow tie” structure of metabolism [41] (i.e., that metabolic flux from peripheral metabolism into central “core” carbon metabolism is minimal) and works well in the sense that good fits to experimental data can be obtained. However, it is still a simplification: just because we decide to ignore all reactions outside of central metabolism, that does not mean that they do not exist.

### Mathematical problem formulation

Mathematically, we represent metabolism as a directed bipartite graph, with *m* metabolites and *n* reaction vertices (See notation in Table 1). Metabolites represent unique chemical species in distinct compartments or areas of the organism. They participate as reactants in chemical reactions, and produce new chemical species, *e.g*. products, as a result. Reactants are represented as directed edges from the metabolite vertices to a reaction vertex, and products are represented as directed edges from the reaction vertices, to new metabolite vertices. This is the standard method used for genome scale models, as it encodes the chemical stoichiometry information for each reaction.

We represent the metabolic flux through this system with the unknown vector of fluxes associated to each reaction, with units of molecules per cell biomass unit per time. For example, the flux *v*_{PDH} = 0.16 mmol/gDW/h represents that the pyruvate dehydrogenase reaction is converting 0.16 mM of pyruvate per gram of cell dry weight per hour into Acetyl-CoA. Our system is constrained by conservation of mass, as described by the stoichiometric matrix *S* where:
(2)
at steady state [53], enforcing that the mass consumed by each reaction matches the mass produced.

Our goal is to characterize fluxes probabilistically, *i.e*. find the joint distribution of fluxes given data:
(3)
where **y** represents the experimental data:
(4) is the function of the simulated MIDs (details in the “Software Implementation” section below), and (optionally) any other relevant experimental data. For the MID data, we assume a Gaussian error model with a zero mean and standard deviation *σ*_{k} for each measurement *k* in the likelihood function.

For the purpose of finding the joint distribution, we use a Bayesian inference approach (Eq 1). The likelihood function follows from Eq (4), assuming that measurements are i.i.d, and normally distributed:
(5)
For simplicity, we use a uniform prior *p*(**v**) (*i.e*. a priori probability that the flux vector is **v**) on the polytope defined by *S***v** = 0: *i.e*. . The assumption that the MID measurements are statistically independent is not strictly true (e.g. for metabolites coming from the same pathway). However, the covariances are not explicitly known, and are left for future refinements of the technique. It is straightforward to modify this approach to incorporate any additional prior knowledge. When extracellular exchange fluxes are measured with high accuracy they can be added directly as stoichiometric constraints such that the prior is a uniform probability distribution on the set , where is defined by experimentally measured exchange fluxes, and zero probability outside. Alternatively, if there is substantial experimental uncertainty, the extracellular exchange fluxes could be added to the likelihood function.

The marginal likelihood (normalizing constant in Eq (1)) is difficult to compute, is only known for a small class of distributions, and becomes intractable with a high number of dimensions [54]. Therefore, we can only compute *p*(**v**|**y**) up to a normalizing constant, *i.e*. *p*(**v**|**y**) ∝ *p*(**y**|**v**)*p*(**v**), for which we use Markov Chain Monte Carlo (MCMC). MCMC allows us to draw samples from *p*(**v**|**y**) by creating a Markov chain that converges to the target distribution *p*(**v**|**y**) [54]. Our proposals are generated by choosing a random direction, and jumping along that direction to a uniformly distributed sample within the stoichiometric bounds on *S* (Fig 2). A generated proposal **v**′ is accepted according to the Metropolis probability [54, 55]:
(6)

The normalizing constant cancels out, and so does the prior, since it is uniform. In the current implementation we use the AcMet algorithm, described below to compute the proposal.

### Sampling

We created a new algorithm (Artificial centering Metropolis sampling, or AcMet, algorithm 1, Fig 2) for sampling from the posterior probability distribution, based on the commonly used Artificial Centering Hit-and-Run (ACHR) algorithm [33]. The ACHR algorithm is widely used to collect pseudo-uniform samples from the stoichiometrically feasible flux polytope of genome scale metabolic models (GSMMs). We modified the ACHR algorithm and combined it with the Metropolis algorithm, to produce a Markov Chain Monte Carlo (MCMC) sampler for Bayesian inference of metabolic fluxes (AcMet). AcMet can sample from the posterior probability distribution obtained by updating priors with an array of experimental evidence that includes ^{13}C isotopomer mass distribution vectors for metabolites, as well as extracellular exchange fluxes computed from time series measurements of extracellular metabolite concentrations. Other experimental data can be added in a similar fashion, facilitating integration of diverse omics data into a single coherent model.

The ACHR algorithm presented two characteristics that made it unsuitable for Bayesian inference with ^{13}C experimental data: center updates and the need to sample from net fluxes rather than both directional components of reversible reactions. Center updates involve finding the center of the accessible flux volume, which is used to direct the next sampling point [33]. These center updates happen in every step of the ACHR algorithm, and are not a problem in practice when doing uniform sampling of genome scale models. However, this ‘moving center’ makes the sampler non-reversible, eliminating the ability of the MCMC process to sample from the posterior. Moreover, it is critical for efficient sampling that the ‘center’ remains in the true center of the flux polytope. However, when sampling a non-uniform distribution, *e.g*. for Bayesian inference, samples no longer average around the center of the flux polytope, which would make using these samples to compute the center impossible. Therefore, maintaining a fixed center is critical to enable Markov-Chain Monte Carlo sampling. Sampling from net fluxes only is a problem because of cyclic reaction fluxes, which involve simultaneous forward and backwards fluxes for the same reaction [56]. While cyclic reaction fluxes play no role in the stoichiometric FBA simulations that originated the ACHR algorithm, they are critical for determining metabolite labeling in ^{13}C MFA [56].

AcMet overcomes the center updates problem by sampling in two phases. First we collect a large number of uniform samples from the genome scale model using the ACHR algorithm, until the center converges into a stable position. Next, we lock the center while collecting samples with the Metropolis algorithm, which accepts or rejects samples taking into account the likelihood function derived from experimental data. By keeping the center locked, the chain becomes reversible, permitting Metropolis sampling. The ACHR algorithm begins with finding ‘edge points’ (see Glossary of terms in S1 Text) that are coordinates on the most extreme bounds of the feasible flux space, which are points the sampler can move towards, and are used to navigate around the unusual shape of the flux polytope. Typically these represent the highest and lowest points in each dimension (*e.g*. reaction flux), as identified by linear optimization. Here, we use the term ‘edge points’ instead of the more commonly used term ‘warm-up points’ to avoid confusion with the Markov Chain Monte Carlo (MCMC) concept of warm-up samples. These ‘edge points’ are coordinates in the feasible flux space, and should not be confused with the concept of edges in a mathematical graph.

In order to effectively sample within-reaction cyclic fluxes, we added extra edge points to the sampler, allowing the sampler to explore along these dimensions, in addition to exploring net fluxes. For each reversible reaction, two additional edge points were added: one which maximizes forward flux and minimizes reverse flux, and another which minimizes forward flux and maximizes reverse flux, as computed from the allowable bounds of each reaction.

To conserve memory and data storage resources, the AcMet sampler provides the option of “thinning” where only a fraction of samples are retained, and saved. For all BayFlux results in this paper we set the thinning parameter such that only 100,000 samples were retained. For example, all runs with our genome-scale *E. coli* atom mapping model described below were continued for 120M samples, with a thinning parameter of 1,200, for a total of 100,000 samples saved. We recommend to users to limit final samples for all runs to 100,000 to prevent subsequent plotting and analysis steps from becoming overloaded.

**Algorithm 1** Artificial centering Metropolis sampling (AcMet, see Fig 2 and Table 1).

1: Create a set of “edge points” *W* = {**w**^{1}, …, **w**^{2n}} by maximizing and minimizing each dimension

2: Initialize

3: Set *k* = 0 and **c** = **v**^{0}

4: Draw a sample , and set direction **d**^{k} = (**w**^{j} − **c**)/‖**w**^{j} − **c**‖

5: Select a step size , such that *S*(**v**^{k} + λ_{k,x}**d**^{k}) = 0, *x* ∈ {min, max}

6: Generate a candidate **v**′ = **v**^{k} + λ_{k,x}**d**^{k}

7: Calculate acceptance ratio *α* = *L*(**v**′)/*L*(**v**^{k}) where *L* is the likelihood *p*(**y**|**v**).

8: Draw

9: If *u* ≤ *α* or if *k* < *q* (warmup phase) accept the candidate by setting **v**^{k+1} = **v**^{k} + λ_{k,x}**d**^{k}

10: If *u* > *α* and *k* ≥ *q* reject the candidate by setting **v**^{k+1} = **v**^{k}

11: Set *k* = *k* + 1

12: If *k* < *q* (warmup phase) set **c** = (*k***c** + **v**^{k})/(*k* + 1) to update the center running average as per ACHR [33]

13: Go to Step 4

The likelihood *L* is computed for each step according to the normally distributed likelihood function (Eq (5)). A uniform prior is implicitly included when calculating the acceptance ratio (Eq (6)).

### Models for generating metabolite labeling

#### Core metabolic model 1.

To compare our method with existing ^{13}C MFA software, we use the *E. coli* data and core metabolic model employed in the demonstration of 13CFLUX2 (the slightly adapted version of [57] used in [34]). The experimental data involve measurements of glucose uptake, growth rate, and the labeling of eleven central carbon intracellular metabolites for an *E. coli* MG1655 strain grown in glucose-limited continuous culture. The model comprises 66 reactions and 37 metabolites describing central carbon metabolism (selected reactions are shown in Fig 3). In order to provide enough instances for a comparison, we randomize the exchange fluxes fifteen times and determine fluxes through both 13CFLUX2 and BayFlux (Fig B in S1 Text).

#### Core metabolic model 2.

To compare genome scale and core metabolic models, we use the previously published Toya *et al*. wild type 5 hour (wt5h) data, which includes measurements of glucose uptake, acetate excretion, growth rate, and the labeling of nine central carbon intracellular metabolites for an *E. coli* BW25113 strain [39]. As the core metabolic model, we use a simplified *E. coli* core model with 63 reactions taken from previous literature [15], and as the genome scale model the *E. coli* genome scale model described in this paper which combines iAF1260 and imEco726 models. Note that both models have a built in biomass composition, which we leave intact as originally described in each model, and are not directly comparable because they merge and separate different metabolites.

#### Genome-scale *E. coli* atom mapping model.

To evaluate the method described by this paper using real world data, with a genome scale model, we merge a genome scale ^{13}C MFA model with a genome scale stoichiometric model, and use this to evaluate previously published *in vivo*^{13}C MFA data.

We utilize atom transitions from the imEco726 genome scale isotope mappings, together with the latest version of the iAF1260 metabolic reconstruction of *Escherichia coli* [14, 40]. We update both the iAF1260 and imEco726 models to account for the experimental conditions used by Toya *et al*., and accommodate recent updates to the latest version of iAF1260, including new reactions and changes to some reaction and metabolite identifiers [39].

This process was facilitated by the fact that imEco726 was initially based on iAF1260. Merging these models and adjusting them to be suitable for a specific experimental context required numerous refining steps, as described below. The full source code for this model creation and curation process is included with the BayFlux software package as a Python Jupyter notebook, such that it can serve as a template for users to repeat the general process for other organisms.

#### Mapping metabolite and reaction identifiers.

The iAF1260 genome scale model contains 2382 reactions, and the imEco726 isotope mapping model contains carbon transitions for only 686 reactions. We find that 70 of the transitions in imEco726, and all of the metabolites participating in transitions (595 total) can not be mapped to any identical identifiers in the latest version of iAF1260. To address this issue, we performed a text based alignment using the Levenshtein distance between the unmapped identifiers in both models, which we then manually review for correctness [58]. By inspecting the alignment output, we created a set of regular expressions which correctly map all reaction and metabolite identifiers in imEco726 to the corresponding identifiers in iAF1260.

For some reactions with the same or similar names we found that the products and reactants are swapped between the two models. For these reactions, we reorient the transitions from imEco726 to match those of iAF1260. Also, for this analysis, we are assuming steady state labeling, so we omit the “dil” transitions from imEco726, which account for metabolites which have not yet reached steady state ^{13}C labeling.

#### Metabolite symmetry.

Some metabolites in the genome scale model (*e.g*. succinate) exhibit structural symmetry, such that there is no single unique way of numbering the atoms. For reactions which act on symmetric metabolites, we duplicate the reactions atom transition(s) such that equal flux flows through the model with all possible orientations of the symmetric metabolite. Our software allows for an unlimited number of atom transitions for each reaction, over which the total flux for the reaction is equally divided.

#### Setting flux bounds and extracellular exchange fluxes.

After incorporating measured extracellular exchange fluxes for biomass, glucose uptake, and acetate production to iAF1260 we set a maximum absolute value flux for each reaction to 5× the glucose uptake rate, in order to constrain the search space, and accelerate sampler convergence.

#### Applying cell culture media constraints.

As an initial step to simplify the model, which contains a large number of infrequently used extracellular exchange reactions, we apply an automated method to pair down these reactions, which we then manually refine for correctness. After incorporating the experimentally measured extracellular exchange fluxes to iAF1260, we compute the smallest set of essential media components using Mixed Integer Programming (MIP) as implemented in the COBRApy function with the option set to [59]. This results in 15 essential nutrients that must be included in the media. We apply this minimal media constraint to the model, by setting the lower bound of all non-essential exchange reactions to zero.

Next, to manually refine these for correctness, we obtain the actual experimentally used media composition as published by Toya et al., and identify the set of exchange reactions that correspond to the molecules in this media formulation. From this, we find that two additional nutrients (H_{2}O and Na^{+}) were provided in the media that are not identified as essential in the analysis described above, and we enable the uptake of these.

### Model reduction

#### Pruning non-essential reactions with unknown atom transitions.

Next, we utilize Parsimonious Flux Balance Analysis (pFBA) to identify non-essential reactions in the iAF1260 model which had also been omitted from imEco726. We find 1655 non-essential intracellular reactions, of which 1310 can both carry carbon and do not have provided transitions in imEco726, so we remove them from the model, along with 499 (now unused) metabolites.

#### Pruning unused reactions.

Next, we remove all reactions in iAF1260 which cannot carry flux under the experimentally derived extracellular exchange flux conditions, even if they can carry carbon, or have transitions from imEco726. This removes an additional 380 reactions and 371 metabolites from the model.

#### Inferring unknown atom transitions.

After removing unused reactions from the iAF1260 model as described above, we identify 35 reactions which are both essential under our extracellular exchange flux conditions, do carry carbon, but do not contain transitions in imEco726. For these reactions, it was necessary to define atom transitions in order to obtain a complete genome scale isotope mapping model.

As a first approximation for computing atom transitions without using chemical structures, we identify the number of carbons in each metabolite based on the chemical formula, and sort the reactants and products by increasing carbon count. For reactions where no two reactants have the same carbon count, and both the reactants and products have an identical set of carbon counts per molecule, we assume direct 1:1 transitions between the identically sized reactants and products.

After applying the assumptions described above to estimate transitions for some reactions, we still have 8 reactions flagged as ambiguous, with either equal sized reactants, or a different distribution of sizes between reactants and products suggesting carbon exchange. For these reactions, we manually look up the corresponding reaction on MetaCyc, and manually write transitions that match the atom ordering used in imEco726.

After all of the steps described above including removing reactions, removing metabolites, and adding transitions our final genome scale isotope mapping model contains 692 reactions, and 798 metabolites. Of these 692 reactions, 633 carry carbon and have one or more atom transition mapping.

Please refer to the Jupyter Notebook entitled “imEco726_genome_scale” distributed with our BayFlux software, which outlines the full model curation process described above. All steps can be automatically reproduced, and used as a template for recreating this process with other species and/or experimental conditions.

### Software implementation

#### Simulating metabolite labeling.

During the process of Bayesian inference from a metabolic model, our software simulates the mass isotopomer distributions (MIDs) for each flux sample and plugs them into the likelihood equation (Eq (5)). Because this complex simulation must be performed for each sample, developing a high performance Elementary Metabolic Unit (EMU) simulation method is an essential technology to make our method feasible.

We have developed a high performance implementation of the Elementary Metabolic Unit (EMU) method capable of simulating genome scale mass isotopomer distributions for millions of flux vectors in just a few hours on standard computer hardware [38]. Three main techniques were utilized to achieve this performance. First, highly repetitive vector and matrix computations are performed directly via low level Fortran code and calls to the Linear Algebra PACKage (LAPACK) and The Netlib Basic Linear Algebra Subprograms (BLAS). Second, we implement ‘EMU pruning’ where long branch free metabolic pathways are automatically identified, and collapsed into a single EMU transition without loss of information. Third, we implement ‘transition merging’ where functionally equivalent atom transitions (*e.g*. from parallel reactions with identical transitions) are automatically identified and merged, also without loss of information.

This Elementary Metabolic Unit (EMU) code also provides the option of defining extra metabolites to simulate beyond those measured experimentally, *e.g*. to cross validate a model with experimental data by leaving it out and re-inferring it. Additionally, it opens the possibility of generating simulated ^{13}C experimental data for experimental design purposes. For example, one can simulate different sets of mass isotopomer distribution (MID) data for a wide array of labeled substrates, and measured metabolites. Next, one can perform Bayesian inference on these simulated data with BayFlux to predict the uncertainty surrounding key reactions of interest, and then select an experimental design that provides maximum information on these specific reactions. Additional metabolites to simulate must be defined while processing a model with BayFlux, because, for performance reasons BayFlux automatically avoids computing portions of a model unnecessary to simulate a given set of experimental data.

## Supporting information

### S1 Text. This file contains supplementary figures and a glossary of terms.

https://doi.org/10.1371/journal.pcbi.1011111.s001

(PDF)

## Acknowledgments

This research used computing resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

We would like to thank Christian Diener, Niko Sonnenschein, and the rest of the COBRApy team for meaningful discussions, and support in integrating BayFlux with COBRApy. We thank Xiaoye Sherry Li for feedback and meaningful discussions. We would also like to thank Mark Kulawik for providing computer support.

## References

- 1. Cameron DE, Bashor CJ, Collins JJ. A brief history of synthetic biology. Nature Reviews Microbiology. 2014;12(5):381. pmid:24686414
- 2. Beller HR, Lee TS, Katz L. Natural products as biofuels and bio-based chemicals: fatty acids and isoprenoids. Natural product reports. 2015;32(10):1508–1526. pmid:26216573
- 3. Chubukov V, Mukhopadhyay A, Petzold CJ, Keasling JD, Martín HG. Synthetic and systems biology for microbial production of commodity chemicals. npj Systems Biology and Applications. 2016;2:16009. pmid:28725470
- 4. Ajikumar PK, Xiao WH, Tyo KE, Wang Y, Simeon F, Leonard E, et al. Isoprenoid pathway optimization for Taxol precursor overproduction in Escherichia coli. Science. 2010;330(6000):70–74. pmid:20929806
- 5. Voigt CA. Synthetic biology 2020–2030: six commercially-available products that are changing our world. Nature Communications. 2020;11(1):1–6. pmid:33311504
- 6. Carbonell P, Radivojević T, Martin HG. Opportunities at the Intersection of Synthetic Biology, Machine Learning, and Automation. ACS Synth Biol. 2019;8:1474–1477. pmid:31319671
- 7. Gardner TS. Synthetic biology: from hype to impact. Trends in biotechnology. 2013;31(3):123–125. pmid:23419209
- 8. Ghosh A, Ando D, Gin J, Runguphan W, Denby C, Wang G, et al. 13C metabolic flux analysis for systematic metabolic engineering of S. cerevisiae for overproduction of fatty acids. Frontiers in bioengineering and biotechnology. 2016;4:76. pmid:27761435
- 9. Chechik G, Oh E, Rando O, Weissman J, Regev A, Koller D. Activity motifs reveal principles of timing in transcriptional control of the yeast metabolic network. Nature biotechnology. 2008;26(11):1251. pmid:18953355
- 10. Stolyar S, Van Dien S, Hillesland KL, Pinel N, Lie TJ, Leigh JA, et al. Metabolic modeling of a mutualistic microbial community. Molecular systems biology. 2007;3(1). pmid:17353934
- 11.
Stephanopoulos G, Aristidou AA, Nielsen J. Metabolic engineering: principles and methodologies. Elsevier; 1998.
- 12. Thiele I, Palsson BØ. A protocol for generating a high-quality genome-scale metabolic reconstruction. Nature protocols. 2010;5(1):93. pmid:20057383
- 13. Wiechert W. 13C metabolic flux analysis. Metabolic engineering. 2001;3(3):195–206. pmid:11461141
- 14. Gopalakrishnan S, Maranas CD. 13C metabolic flux analysis at a genome-scale. Metabolic engineering. 2015;32:12–22. pmid:26358840
- 15. Martín HG, Kumar VS, Weaver D, Ghosh A, Chubukov V, Mukhopadhyay A, et al. A method to constrain genome-scale models with 13C labeling data. PLoS computational biology. 2015;11(9). pmid:26379153
- 16. Fong SS, Palsson BØ. Metabolic gene–deletion strains of Escherichia coli evolve to computationally predicted growth phenotypes. Nature genetics. 2004;36(10):1056–1058. pmid:15448692
- 17. d’Espaux L, Ghosh A, Runguphan W, Wehrs M, Xu F, Konzock O, et al. Engineering high-level production of fatty alcohols by Saccharomyces cerevisiae from lignocellulosic feedstocks. Metabolic engineering. 2017;42:115–125. pmid:28606738
- 18. Yim H, Haselbeck R, Niu W, Pujol-Baxley C, Burgard A, Boldt J, et al. Metabolic engineering of Escherichia coli for direct production of 1, 4-butanediol. Nature chemical biology. 2011;7(7):445. pmid:21602812
- 19. Weitzel M, Nöh K, Dalman T, Niedenführ S, Stute B, Wiechert W. 13CFLUX2—high-performance software suite for 13C-metabolic flux analysis. Bioinformatics. 2013;29(1):143–145. pmid:23110970
- 20. Antoniewicz MR, Kelleher JK, Stephanopoulos G. Elementary metabolite units (EMU): a novel framework for modeling isotopic distributions. Metabolic engineering. 2007;9(1):68–86. pmid:17088092
- 21. Shupletsov MS, Golubeva LI, Rubina SS, Podvyaznikov DA, Iwatani S, Mashko SV. OpenFLUX2: 13C-MFA modeling software package adjusted for the comprehensive analysis of single and parallel labeling experiments. Microbial cell factories. 2014;13(1):1–25.
- 22. Rahim M, Ragavan M, Deja S, Merritt ME, Burgess SC, Young JD. INCA 2.0: A tool for integrated, dynamic modeling of NMR-and MS-based isotopomer measurements and rigorous metabolic flux analysis. Metabolic Engineering. 2022;69:275–285. pmid:34965470
- 23. Theorell A, Leweke S, Wiechert W, Nöh K. To be certain about the uncertainty: Bayesian statistics for 13C metabolic flux analysis. Biotechnology and bioengineering. 2017;114(11):2668–2684. pmid:28695999
- 24. Schellenberger J, Palsson BØ. Use of randomized sampling for analysis of metabolic networks. Journal of biological chemistry. 2009;284(9):5457–5461. pmid:18940807
- 25. Kadirkamanathan V, Yang J, Billings SA, Wright PC. Markov Chain Monte Carlo Algorithm based metabolic flux distribution analysis on Corynebacterium glutamicum. Bioinformatics. 2006;22(21):2681–2687. pmid:16940326
- 26. Megchelenbrink W, Huynen M, Marchiori E. optGpSampler: an improved tool for uniformly sampling the solution-space of genome-scale metabolic networks. PloS one. 2014;9(2). pmid:24551039
- 27. Haraldsdóttir HS, Cousins B, Thiele I, Fleming RM, Vempala S. CHRR: coordinate hit-and-run with rounding for uniform sampling of constraint-based models. Bioinformatics. 2017;33(11):1741–1743. pmid:28158334
- 28. Mairinger T, Wegscheider W, Peña DA, Steiger MG, Koellensperger G, Zanghellini J, et al. Comprehensive assessment of measurement uncertainty in 13 C-based metabolic flux experiments. Analytical and bioanalytical chemistry. 2018;410(14):3337–3348. pmid:29654338
- 29.
Szallasi Z, Stelling J, Periwal V. System modeling in cell biology from concepts to nuts and bolts, chapter 4. 2006;. https://doi.org/10.7551/mitpress/9780262195485.003.0004
- 30. Heinonen M, Osmala M, Mannerström H, Wallenius J, Kaski S, Rousu J, et al. Bayesian metabolic flux analysis reveals intracellular flux couplings. Bioinformatics. 2019;35(14):i548–i557. pmid:31510676
- 31. John PCS, Strutz J, Broadbelt LJ, Tyo KE, Bomble YJ. Bayesian inference of metabolic kinetics from genome-scale multiomics data. PLoS computational biology. 2019;15(11).
- 32.
de Bastos PM. How to use Bayesian Inference for predictions in Python; 2022. Available from: https://towardsdatascience.com/how-to-use-bayesian-inference-for-predictions-in-python-4de5d0bc84f3.
- 33. Kaufman DE, Smith RL. Direction Choice for Accelerated Convergence in Hit-and-Run Sampling. Operations Research. 1998;46(1):84–95.
- 34. Weitzel M, Nöh K, Dalman T, Niedenführ S, Stute B, Wiechert W. 13CFLUX2—high-performance software suite for 13C-metabolic flux analysis. Bioinformatics. 2012;29(1):143–145. pmid:23110970
- 35. Laplace PS. Mémoire sur les probabilités. Mémoires de l’Académie Royale des sciences de Paris. 1781;1778:227–332.
- 36.
Brooks S, Gelman A, Jones G, Meng XL. Handbook of markov chain monte carlo. CRC press; 2011.
- 37. Antoniewicz MR, Kelleher JK, Stephanopoulos G. Determination of confidence intervals of metabolic fluxes estimated from stable isotope measurements. Metabolic engineering. 2006;8(4):324–337. pmid:16631402
- 38. Antoniewicz MR, Kelleher JK, Stephanopoulos G. Elementary metabolite units (EMU): a novel framework for modeling isotopic distributions. Metabolic Engineering. 2007;9(1):68–86. pmid:17088092
- 39. Toya Y, Ishii N, Nakahigashi K, Hirasawa T, Soga T, Tomita M, et al. 13C-metabolic flux analysis for batch culture of Escherichia coli and its pyk and pgi gene knockout mutants based on mass isotopomer distribution of intracellular metabolites. Biotechnology progress. 2010;26(4):975–992. pmid:20730757
- 40. Feist AM, Henry CS, Reed JL, Krummenacker M, Joyce AR, Karp PD, et al. A genome-scale metabolic reconstruction for Escherichia coli K-12 MG1655 that accounts for 1260 ORFs and thermodynamic information. Molecular Systems Biology. 2007;3:121. pmid:17593909
- 41. Backman TW, Ando D, Singh J, Keasling JD, García Martín H. Constraining genome-scale models to represent the bow tie structure of metabolism for 13C metabolic flux analysis. Metabolites. 2018;8(1):3. pmid:29300340
- 42. Gelman A, Rubin DB. Inference from Iterative Simulation Using Multiple Sequences. Statistical Science. 1992;7(4):457–472.
- 43. Seaver SM, Liu F, Zhang Q, Jeffryes J, Faria JP, Edirisinghe JN, et al. The ModelSEED Biochemistry Database for the integration of metabolic annotations and the reconstruction, comparison and analysis of metabolic models for plants, fungi and microbes. Nucleic acids research. 2021;49(D1):D575–D588. pmid:32986834
- 44. Machado D, Andrejev S, Tramontano M, Patil KR. Fast automated reconstruction of genome-scale metabolic models for microbial species and communities. Nucleic acids research. 2018;46(15):7542–7553. pmid:30192979
- 45. Segre D, Vitkup D, Church GM. Analysis of optimality in natural and perturbed metabolic networks. Proceedings of the National Academy of Sciences. 2002;99(23):15112–15117. pmid:12415116
- 46. Shlomi T, Berkman O, Ruppin E. Regulatory on/off minimization of metabolic flux changes after genetic perturbations. Proceedings of the national academy of sciences. 2005;102(21):7695–7700. pmid:15897462
- 47.
Polani D. Kullback-leibler divergence. Encyclopedia of systems biology. 2013; p. 1087–1088.
- 48. Thiele I, Sahoo S, Heinken A, Hertel J, Heirendt L, Aurich MK, et al. Personalized whole-body models integrate metabolism, physiology, and the gut microbiome. Molecular systems biology. 2020;16(5):e8982. pmid:32463598
- 49. Li XS. An Overview of SuperLU: Algorithms, Implementation, and User Interface. 2005;31(3):302–325.
- 50. Devoid S, Overbeek R, DeJongh M, Vonstein V, Best AA, Henry C. Automated genome annotation and metabolic model reconstruction in the SEED and Model SEED. Systems metabolic engineering: methods and protocols. 2013; p. 17–45. pmid:23417797
- 51.
Karp PD, Paley S, Caspi R, Kothari A, Krummenacker M, Midford PE, et al. The EcoCyc Database (2023). EcoSal Plus. 2023; p. eesp–0002.
- 52. Rao CV, Arkin AP. Stochastic chemical kinetics and the quasi-steady-state assumption: Application to the Gillespie algorithm. The Journal of chemical physics. 2003;118(11):4999–5010.
- 53. Orth JD, Thiele I, Palsson B. What is flux balance analysis? Nature Biotechnology. 2010;28(3):245–248. pmid:20212490
- 54. Hastings WK. Monte Carlo sampling methods using Markov chains and their applications. Biometrika. 1970;57(1):97–109.
- 55. Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E. Equation of state calculations by fast computing machines. J Chem Phys. 1953;21(6):1087.
- 56. Wiechert W. The thermodynamic meaning of metabolic exchange fluxes. Biophysical Journal. 2007;93(6):2255–2264. pmid:17526563
- 57. Weitzel M, Wiechert W, Nöh K. The topology of metabolic isotope labeling networks. BMC bioinformatics. 2007;8(1):1–27. pmid:17727715
- 58. Levenshtein V. Binary Codes Capable of Correcting Deletions, Insertions and Reversals. Soviet Physics Doklady. 1966;10:707.
- 59. Ebrahim A, Lerman JA, Palsson BO, Hyduke DR. COBRApy: COnstraints-Based Reconstruction and Analysis for Python. BMC Syst Biol. 2013;7:74. pmid:23927696