Efficient Modeling of MS/MS Data for Metabolic Flux Analysis

Metabolic flux analysis (MFA) is a widely used method for quantifying intracellular metabolic fluxes. It works by feeding cells with isotopic labeled nutrients, measuring metabolite isotopic labeling, and computationally interpreting the measured labeling data to estimate flux. Tandem mass-spectrometry (MS/MS) has been shown to be useful for MFA, providing positional isotopic labeling data. Specifically, MS/MS enables the measurement of a metabolite tandem mass-isotopomer distribution, representing the abundance in which certain parent and product fragments of a metabolite have different number of labeled atoms. However, a major limitation in using MFA with MS/MS data is the lack of a computationally efficient method for simulating such isotopic labeling data. Here, we describe the tandemer approach for efficiently computing metabolite tandem mass-isotopomer distributions in a metabolic network, given an estimation of metabolic fluxes. This approach can be used by MFA to find optimal metabolic fluxes, whose induced metabolite labeling patterns match tandem mass-isotopomer distributions measured by MS/MS. The tandemer approach is applied to simulate MS/MS data in a small-scale metabolic network model of mammalian methionine metabolism and in a large-scale metabolic network model of E. coli. It is shown to significantly improve the running time by between two to three orders of magnitude compared to the state-of-the-art, cumomers approach. We expect the tandemer approach to promote broader usage of MS/MS technology in metabolic flux analysis. Implementation is freely available at www.cs.technion.ac.il/~tomersh/methods.html

MFA is based on the key observation that metabolite isotopic labeling patterns are uniquely determined by the distribtuion of metabolic flux in the network [10]. It is typically implemented as a non-convex optimization problem, searching for the most likely distribution of fluxes that would give rise to metabolite isotopic labeling that optimally matches experimental measurements. The running time of MFA methods becomes a major bottleneck when analyzing large-scale metabolic networks, consisting of hundreds of reactions, when repeatedly applied to compute accurate flux confidence intervals [11,12], and in experimental design of isotopic labeling experiments [13]. The major factor that affects the performance of MFA implementations is the time required to simulate metabolite isotopic labeling for a candidate flux distribution.
A distinct labeling pattern of a certain metabolite is called an isotopomer, while the distribution of abundances of all isotopomers is reffered to as, isotopomer distribution. A metabolite with n carbons has 2 n distinct isotopomers; for example, as shown in Table 1, a metabolite having 4 carbons has 16 possible isotopomers. Previous studies have suggested the cumomers [14] and fluxomers [15] approaches for efficiently simulating the isotopomer distributions of all metabolites in a metabolic network given a flux vector. However, as the number of distinct isotopomers of a metabolite is exponentially dependent on the number of carbons that is has, these methods require a huge number of variables and may become computationally intractable.
Measuring the complete isotopomer distribution of metabolites is technically infeasible. Instead, mass-spectrometry is typically used to measure the relative abundance of a given metabolite having different number of labeled atoms (i.e. zero labeled atoms, one, two, etc). A set of isotopomers of a certain metabolite having the same mass is referred to as massisotopomers, while the relative abundance of mass-isotopomers denoted mass-isotopomer distribution. Notably, a mass-isotopomer distribution provides limited information on positional isotopic labeling, as isotopomers with the same number of labeled atoms have the same mass regardless of their position. Mass-isotopomer distributions can be calculated given the complete isotopomer distributions (by summing the abundances of isotopomers having the same number of labeled atoms). Alternatively, they can be directly and efficiently computed via the EMU approach [1]. Information on the positional labeling of metabolites can be obtained by tandem mass-spectrometry (i.e. MS/MS) and was previously shown to significantly improve quantification of metabolic fluxes via MFA [12,[16][17][18]. It works by isolating a single parent ion from the full spectrum and measuring its mass, followed by a collision that yields product ions whose mass is also measured. It can hence be employed to derive the mass-isotopomer distribution of a metabolite of interest and that of a collisional fragment. Most importantly, MS/MS can further measure the abundance of specific transitions from certain parent to product mass-isotopomers, referred to as tandem mass-isotopomers (also denoted here as tandemers, for short). We denote the number of labeled atoms of a parent ion by M+0 (having no labeled atoms), M+1 (having one labeled atom), etc., and the number of labeled atoms of a product ion by m+0, m+1, etc. We denote a transition from parent mass-isotopomer M + i to product mass-isotopomer m + j, by . This provides additional information on positional labeling beyond that available via the mass-isotopomer distribution of the parent and product fragments separately (describing the abundance of a metabolite having various combinations of specific mass-isotopomers for the parent and product molecules). The relative abundance of all tandemers is referred to as a tandem mass-isotopomer distribution (or tandemer distribution, for short).
Currently, there is no method for efficiently simulating tandemer distributions. Previous applications of MFA given MS/MS data have inefficiently computed the complete isotopomer distributions for all metabolites in the network (for example, via cumomers [18]) in order to simulate tandemer distributions. Here, we present the tandemer method for efficiently simulating MS/MS measurements (i.e. tandemer distributions) of metabolites in a metabolic network. It builds upon and extends ideas set forward by the EMU method which efficiently simulates mass-isotopomer measurements [1]. f g is, by definition, zero.

MFP A N
K is defined as a set of isotopomers of A having 0 i |N| labeled atoms within the parent fragment N, and 0 j |K| labeled atoms within the product fragment K. A tandemer is considered feasible if it does not represent an empty set of isotopomers, i.e. when j is no larger than i (as the product fragment K is enclosed within the parent fragment N), and no smaller than i − (|N| − |K|) (when all atoms that are in the parent but not in the product fragment are labeled). The number of feasible tandemers for A N K is hence (|N| − |K| + 1)(|K| + 1) [18]. The entire tandemer distribution of A, with respect to the MFP A N K , can be represented by a tandemer distribution matrix ½A N K with |N| + 1 rows (representing the number of labeled atoms in the parent fragment; from zero to |N|), and |K| + 1 columns (representing the number of labeled atoms in the product fragment), such that ½A We refer to ½C M N as being equal to the Cauchy product between matrices A m 1 Ân 1 and B m 2 Ân 2 , denoted A B (extending the definition of Cauchy product between vectors to matrices).
For example, let us consider the bi-substrate reaction shown in Fig 2b, in which A (having 3 carbons) is condensed with B (having 2 carbons) to make C, with carbons from A mapped to the first 3 carbons in C and atoms from B mapped to the last 2 carbons in C. In this case, the tandemer distribution matrix for C 2;3;4;5 f g 1 f g . Under isotopic steady-state, a tandemer distribution matrix for the MFP A N K is determined based on the corresponding tandemer distributions of all of its substrate MFPs according to the following balance equation: where v i is the flux through reaction i.

An algorithm for simulating tandemer distributions
Our goal is to efficiently simulate the tandemer distribution for a pre-defined set of metabolites (for which corresponding experimental data might be available). We assume that a metabolic network model with reaction atom-mappings (describing the mapping of substrate to product metabolite atoms in each reaction) and candidate fluxes are given. To address this problem, we present the tandemers approach, whose outline is shown in Fig 3. A detailed explanation of the various steps of the algorithms is provided below, while a Matlab implementation is available at: www.cs.technion.ac.il/~tomersh/methods.html. 2.3.1. Identifying a minimal set of MFPs whose tandemer distributions are needed to simulate MS/MS measurements. The identification of a minimal set of MFP's in the metabolic network whose tandemer distributions would enable simulating the tandemer distribution of a given set of metabolites is done using a recursive procedure, in a similar manner to that presented in the EMU approach [1]. Specifically, starting with a list of MFPs for which tandemer distributions are available, we traverse the metabolic network to iteratively add substrate MFPs (via the definition of substrate MFPs provided above). This procedure ends when no more new MFPs can be added to the list.
The total number of MFPs depends on many factors, including the structure of the metabolic network, number of atoms per metabolite, reaction atom mappings, and number of metabolites for which experimental MS/MS data is available as input. In theory, for each Hence, theoretically, the number of MFPs for a certain metabolite found by the recursive procedure described above may exceed the number of possible isotopomers for that metabolite (which is 2 n ). In practice, as we show below, applying this method on various metabolic networks, the number of MFPs found per metabolite is markedly lower than the number of isotopomers, resulting in improved running time compared to existing methods. 2.3.2. MFP's clustering and ordering. In a uni-substrate reaction producing MFP B N K , the size of the parent fragment in B N K equals that of its substrate A N 0 K 0 ¼ SðB N K ; iÞ (i.e. |N'| = |N|), while in a bi-substrate reaction, the size of the parent fragment is larger than that of both its substrate MFPs (noting that a bi-substrate reaction in which one of the substrate MFP has a parent fragment of size zero can be regarded as a uni-substrate reaction). Hence, a tandemer distribution with parent fragment of a certain size is linearly dependent upon tandemer distributions with parent fragment of the same size, and non-linearly dependent (via Cauchy product) only upon tandemer distributions with smaller parent fragments (see Eq 2). Therefore, all tandemer distributions can be computed by sequentially solving linear balance equations for sets of tandemer distributions with increasing sizes (as Cauchy product of tandemer distribution matrices of smaller parent fragments can be computed before reaching tandemer distributions of larger parent fragments; similarly to the EMU [1] and cumomer [19] approaches).
Following a method proposed by [20], we divided the linear balance equations into smaller sets of equations that can be sequentially solved. Specifically, we construct a directed graph whose nodes represent the identified MFPs and edges connect an MFP A N K with its substrate SðA N K ; iÞ for uni-substrate reactions, and with both its substrates S 1 ðA N K ; iÞ and S 2 ðA N K ; iÞ for bisubstrate reactions. Notably, for a given MFP in this graph, its parent fragment size is not smaller than those of all MFPs having a directed edge towards it (see Fig 2). Hence, decomposing the graph into strongly connected components [21] and applying topological sorting on the identified clusters [22] leads to a cascade of MFP clusters, each consisting of MFPs having the same parent fragment size, with clusters ordered according to a non-decreasing fragment parent size (see Example below in Section 2.4). Iterating through the list of clusters, all tandemer distributions in a given cluster can be calculated via a set of linear equations based on tandemer distributions inferred in previous clusters (via Eq (2); where non-linear terms associated with bi-substrate reactions are calculated based on tandemer distributions inferred in previous clusters) [22].
2.3.3. Formulating and solving a series of linear balance equations for MFPs in each cluster. Tandemer distributions for MFPs in a cluster are linearly dependent on each other, given the corresponding tandemer distributions for MFPs in previous clusters. Notably, the set of balance equations for tandemer distribution matrices for MFPs in the i'th cluster can be formulated as following: where X i is a matrix whose rows represent the tandemer distributions for MFPs in the i'th cluster (i.e. each tandemer distribution represented as a row vector; removing infeasible tandemers), Y i is a matrix whose rows represent tandemer distributions and Cauchy product of tandemer distributions computed for previous clusters, and A i and B i consist of corresponding fluxes.
Solving a set of balance equations for MFPs in the i'th cluster requires calculating the inverse of A i , whose number of rows (and columns) is equal to the number of MFPs in the cluster. As the cumomers approach also involves grouping cumomers in clusters and inverting flux matrices (similar to A i , here) whose size depends on the number of cumomers in each cluster, the running time of cumomers and tandemers approaches can be compared in terms of cumomers and MFPs cluster sizes, and the time needed to invert the corresponding flux matrices. Theoretically, an n fold reduction in the number of MFPs versus cumomers in a cluster should result in n 3 improvement in running time. Notably, the number of non-feasible tandemers in each tandemer distribution matrix (represented by the number of column of X i has a negligible effect on the time require to solve Eq (4), which is dominated by the n 3 time required to calculate the inverse of A i ).

An example of the tandemers approach on a toy metabolic network
In this section we describe the application of the tandemers approach on a toy metabolic network shown in Fig 4a, where atom mappings are given in Fig 4b. We assume that metabolic fluxes as well as the labeling pattern of A are known (considering that isotopically labeled A is given the growth media) and aim to compute the tandemer distribution of a E 1;2;3;4 f g 2;3 f g (assumed to be measured via MS/MS). Applying the recursive procedure described in the previous section, starting from E 1;2;3;4 f g 2;3 f g , we identify a total of 10 MFPs (for metabolites other than A) whose tandemer distribution is needed to compute that of E 1;2;3;4 f g 2;3 f g . The MFP graph and its three connected components are shown in Fig 4c. Notably, cluster (III) depends on clusters (I) and (II), while the latter clusters are mutually independent.
For example, the isotopic balance equation for tandemer distribution matrices in cluster (I) is formulated as following, according to Eq (4): Considering that the tandemer distribution ½B 3;4 f g 4 f g is calculated as part of cluster (I) and ½C 1;2 f g 1 f g as part of cluster (II), enables to calculate the Cauchy product between the two matrices prior

Applying the tandemers method on a small-scale model of mammalian methionine metabolism
To demonstrate the applicability of the tandemers method for efficiently computing experimental MS/MS data in 13 C labeling experiments, we applied it on a simplified metabolic network model of mammalian cellular metabolism of methionine (Fig 5, S1 File). Methionine metabolism involves two partially overlapping cyclic pathways for transmethylation (of protein and DNA) and propylamine transfer (for polyamine biosynthesis). The metabolic donor of both the methyl and propylamine groups is S-adenosylmethionine (SAM), which has 15 carbons (hence, a high-carbon metabolite). The product metabolites S-adenosylhomocysteine (SAH) and methylthioadenosine (MTA) also have high number of carbons, 14 and 11 carbons respectively (as one additional SAM carbon is oxidized to CO 2 prior to the propylamine transfer). Considering that the number of isotopomers of a metabolite with n carbons is 2 n , explicitly modeling the entire isotopomer distribution of all five metabolites in this network (as done in the cumomers method) would require 52,306 variables.
To apply the tandemers method, we utilized experimentally determined fluxes in this network as input [23]. We assume that the carbon labeling pattern of metabolites that are outside the scope of the model, including that of media methionine, ATP, and 5-methyl-tetrahydrofolate (i.e. the labeling of the methyl group in 5-methyl-THF) are known. First, we applied the tandemers method to compute the tandemer distribution of all 5 metabolites in the network, assuming that parent fragments are the intact metabolites, and that product fragments are the adenine group in SAM, SAH, and MTA, and the four methionine carbons other than the methyl group for methionine and HCys. The resulting number of MFPs whose tandemer distributions are found to be required to compute the tandemer distributions of the given MFPs is 35. These are divided into clusters, such that the largest cluster has only 4 MFPs. In comparison, applying the cumomers method given the same input data would require a total of 52,306 cumomers (the same as the number of isotopomers), divided into clusters whose largest one has 10,197 cumomers.
Next, we ran both the tandemers and cumomers methods multiple times, choosing a different subset of metabolites to calculate their tandemer distribution in each run. The parent fragment was assumed to be the intact metabolite and the product fragment was chosen to be the ribose, adenine, propylamine, or the four methionine carbons other than the methyl group. We find an average number of only 33 MFPs per run of the tandemers method, while the cumomers method requiring of 52,306 cumomer variables regardless of assumed input. The average running time of the tandemers method was found to be 0.00046 seconds, while that of the cumomers method was 0.68 seconds,~1500-fold higher (Table 2). Notably, the significant reduction in running time will be especially important for designing optimal isotope tracing experiments, requiring numerous (many thousands for large-scale networks) repeated simulations of isotope labeling patterns for possible flux distributions [13].

Applying the tandemers approach in a large-scale metabolic network model of E. coli
To further demonstrate the applicability of the tandemers approach, we applied it on a largescale metabolic network model of E. coli [13]. This isotopomer model accounts for glycolysis, TCA cycle, pentose phosphate pathway, oxidative phosphorylation, pyruvate metabolism, anaplerotic reactions and other central and biosynthetic pathways, with a total of 206 metabolites and 405 reactions. Notably, metabolites with a high number of carbons were not included in this network reconstruction to facilitate the application of the cumomers approach (by lumping surrounding reactions together) [13]. The total number of isotopomers in this large-scale network is hence surprisingly low, reaching 19,404 isotopomers (much lower than that in the small-scale methionine network applied above). Hence, even though this network model is substantially larger than the methionine network, we do not expect the tandemers approach to demonstrate the same level of improvement compared to the cumomers method.
We applied the tandemers method 1000 times to compute the tandemer distribution of randomly chosen sets of metabolites (having between 1 to 20 metabolites). The average number of resulting MFPs was 695, with a maximal MFP cluster size of 32. In comparison, applying the cumomers approach resulted in 19,404 cumomers, and a maximal cluster size of 4,016. The average running time of the tandemers and cumomers methods are 0.01 and 3.3 seconds, respectively, representing a~300-fold improvement by the tandemers method (Table 2).
For the tandemers methods, the number of variables represents the number of MFP whose tandemer distribution is calculated; for the cumomers approach, it represents the number of cumomers. The number of these variables corresponds to the size of the flux matrices whose inverse is calculated by each method, and is hence proportional to overall running time (see Section 2.3.3). We report the average variable count, clustersize, and running time for the cumomers and tandemers methods in multiple simulations given different sets of metabolites and collisional fragments, as described above. Notably, considering that MFA applications and especially experimental design of isotope tracing experiments require thousands of repeated simulations of metabolite isotopic labeling, the~1500-fold and 300-fold improvement in

Discussion
Tandem MS holds great promise for metabolic flux analysis as it provides information on metabolite positional labeling [12,[16][17][18]. However, a major limitation in using MFA with MS/MS data is the lack of a computationally efficient method for simulating isotopic labeling data measurable via MS/MS. State-of-the-art methods such as cumomers that enable to simulate MS/MS data requires simulating the abundance of all distinct isotopomers, whose number is exponentially dependent on the number of atoms in each metabolite. Here, we described the tandemers approach that is specifically designed for efficiently computing tandem massisotopomer distributions measurable via MS/MS, demonstrating a roughly two to three orders of magnitude improvement in running time compared to the cumomers approach. The tandemers approach is especially useful when analyzing metabolic networks with metabolites having a high number of carbons, where modeling the entire isotopomer distribution may become computationally intractable. In our application of the tandemers method on a metabolic network of mammalian methionine metabolism and for E. coli, we computed tandemer distributions for MFPs in which the parent fragment was the entire metabolite. This represents the case where no in-source fragmentation occurs during MS ionization, which is typically the situation with LC-MS. However, in-source fragmentation can occur (mostly with GC-MS), leading to measurement of tandemer distributions with parent fragment that is not the entire metabolite. Such in-source fragmentation can provide further useful information on positional labeling, for example, as was recently used to infer all distinct isotopomers of aspartate [24]. Obviously, the tandemers approach may also be used to compute tandemer distributions for MFPs with parent fragments that are not the entire metabolite.
In a recent study, we described a method, Metabolic Flux Analysis/Unknown Fragments (MFA/UF), capable of using MS/MS data to improve flux inference even when the positional origin of fragments is unknown [12]. MFA/UF extends upon standard MFA and jointly searches for the most likely metabolic fluxes together with the most plausible position of collisional fragments that would optimally match measured MS/MS data. To simulate MS/MS data given candidate fluxes and candidate collisional fragments, MFA/UF utilized the cumomers approach to simulate MS/MS labeling data. Considering that the tandemers approach was shown here to outperform the cumomers method, integrating it within MFA/UF is expected to lead to a significant improvement in running time.
Considering that a major current complication in utilizing MS/MS data in metabolic flux analysis involves the lack of computationally efficient methods for simulating such experimental measurements, we expect the tandemers approach to promote broader usage of this technology.