Identification of Conserved Moieties in Metabolic Networks by Graph Theoretical Analysis of Atom Transition Networks

Conserved moieties are groups of atoms that remain intact in all reactions of a metabolic network. Identification of conserved moieties gives insight into the structure and function of metabolic networks and facilitates metabolic modelling. All moiety conservation relations can be represented as nonnegative integer vectors in the left null space of the stoichiometric matrix corresponding to a biochemical network. Algorithms exist to compute such vectors based only on reaction stoichiometry but their computational complexity has limited their application to relatively small metabolic networks. Moreover, the vectors returned by existing algorithms do not, in general, represent conservation of a specific moiety with a defined atomic structure. Here, we show that identification of conserved moieties requires data on reaction atom mappings in addition to stoichiometry. We present a novel method to identify conserved moieties in metabolic networks by graph theoretical analysis of their underlying atom transition networks. Our method returns the exact group of atoms belonging to each conserved moiety as well as the corresponding vector in the left null space of the stoichiometric matrix. It can be implemented as a pipeline of polynomial time algorithms. Our implementation completes in under five minutes on a metabolic network with more than 4,000 mass balanced reactions. The scalability of the method enables extension of existing applications for moiety conservation relations to genome-scale metabolic networks. We also give examples of new applications made possible by elucidating the atomic structure of conserved moieties.


Introduction
Conserved moieties give rise to pools of metabolites with constant total concentration and dependent individual concentrations.These constant metabolite pools often consist of highly connected cofactors that are distributed throughout a metabolic network.Representative examples from energy metabolism include the AMP and NAD moieties [3,30].Changes in concentration ratios within these cofactor pools affect thermodynamic and mass action kinetic driving forces for all reactions they participate in.Moiety conservation therefore imposes a purely physicochemical form of regulation on metabolism that is mediated through changes in concentration ratios within constant metabolite pools.Reich and Sel'kov likened conserved moieties to turning wheels that are "geared into a clockwork" [30].They described the thermodynamic state of energy metabolism as "open flow through a system closed by moiety conservation".Identification of conserved moieties in metabolic networks has helped elucidate complex metabolic phenomena including synchronisation of glycolytic oscillations in yeast cell populations [6] and the function of glycosomes in the African sleeping sickness parasite Trypanosoma brucei [5].It has also been shown to be relevant for drug development [5,9].
Identification of conserved moieties has been of interest to the metabolic modelling community for several decades [31,40].It is particularly important for dynamic modelling [17] and metabolic control analysis [16] where metabolite concentrations are explicitly modelled.Moiety conservation relations provide a sparse, physically meaningful description of concentration dependencies in a metabolic network.They can be used to eliminate redundant metabolite concentrations as the latter can be derived from the set of independently varying metabolite concentrations.Doing so facilitates simulation of metabolic networks and is in fact required for many computational modelling methods [31,40].Mathematically, moiety conservation gives rise to a stoichiometric matrix with linearly dependent rows.The left null space of the stoichiometric matrix therefore has nonzero dimension (see Section 2.2).Vectors in the left null space, hereafter referred to as conservation vectors, can be divided into several interrelated sets based on their numerical properties and biochemical meaning (Fig. 1).Moiety vectors constitute a subset of conservation vectors with a distinct biochemical interpretation.Each moiety vector represents conservation of a particular metabolic moiety.Elements of a moiety vector correspond to the number of instances of a conserved moiety in metabolites of a metabolic network.As moieties are discrete quantities, moiety vectors are necessarily nonnegative integer vectors.
Methods exist to compute conservation vectors based only on the stoichiometric matrix of a metabolic network.These methods compute different types of bases for the left null space of the stoichiometric matrix (see S1 Appendix for mathematical definitions).Each method draws basis vectors from a particular set of conservation vectors (Fig. 1).There is a tradeoff between the computational complexity of these methods and the biochemical interpretability of the basis vectors they return.At the low end of the computational complexity spectrum are linear algebraic methods such as singular value decomposition.Other methods, such as Householder QR factorisation [40] or sparse LU factorisation [13] are more efficient for large stoichiometric matrices.These methods construct a linear basis for the left null space from real-valued conservation vectors.Though readily computed, these vectors are also the most difficult to interpret as they generally contain negative and noninteger elements.
Schuster and Höfer [35] introduced the use of vertex enumeration algorithms to compute the extreme rays of the positive orthant of the left null space.They referred to these extreme rays as "extreme semipositive conservation relations".Famili and Palsson [11] later referred to them as "metabolic pools" and the set of all extreme rays as "a convex basis for the left null space".Like moiety vectors, extreme rays are nonnegative integer vectors.They are therefore readily interpreted in terms of constant metabolite pools.However, extreme rays can currently only be computed for relatively small metabolic networks due to the computational complexity of vertex enumeration algorithms [4].Moreover, the set of extreme rays is not identical to the set of moiety vectors (Fig. 1).Schuster and Hilgetag [34] presented examples of extreme rays that did not represent moiety conservation relations, as well as moiety vectors that were not extreme rays.
Moiety vectors are a property of a metabolic network while extreme rays are a property of its stoichiometric Figure 1.Sets of conservation vectors for metabolic networks.The set of real-valued conservation vectors consists of all vectors in the left null space of a stoichiometric matrix.Real-valued basis vectors can be computed using efficient linear algebra algorithms but are difficult to interpret as they generally contain negative and noninteger elements.Nonnegative integer vectors are easier to interpret but more difficult to compute.Existing algorithms have exponential worst case time complexity.Algorithms exist to compute extreme rays, the set of all nondecomposable nonnegative integer vectors, and a maximal set of linearly independent nonnegative integer vectors.These vector sets intersect with the set of moiety vectors but are not equivalent to it.Moiety vectors represent conservation of an identifiable group of atoms in network metabolites.They are a property of the specific set of metabolites and reactions that constitute a metabolic network whereas other conservation vectors are a property of the network's stoichiometric matrix.The method presented here computes moiety vectors in polynomial time.matrix.Multiple metabolic networks could in theory have the same stoichiometric matrix, despite consisting of different sets of metabolites and reactions.These networks would all have the same set of extreme rays, but could have different sets of moiety vectors.Schuster and Hilgetag [34] published an extension to the vertex enumeration algorithm in [35] to compute the set of all nondecomposable nonnegative integer vectors in the left null space of a stoichiometric matrix.This set is guaranteed to contain all nondecomposable moiety vectors for a particular metabolic network as subset (Fig. 1).However, it is impossible to identify the subset of moiety vectors without information about the atomic structure of metabolites.
Alternatives to vertex enumeration have been proposed to speed up computation of biochemically meaningful conservation vectors, e.g., [10,25,37].Most recently, De Martino et al. [10] published a novel method to compute a nonnegative integer basis for the left null space of a stoichiometric matrix.This method [10] relies on stochastic algorithms, without guaranteed convergence, but that were empirically shown to perform well even on large networks.Like extreme rays, the nonnegative integer vectors computed with this method are not necessarily moiety vectors (Fig. 1).In general, methods to analyse stoichiometric matrices are not suited to specifically compute moiety vectors.Computation of moiety vectors requires information about the atomic composition of metabolites.To our knowledge, only one method has previously been published to specifically compute moiety vectors for metabolic networks [28].This method was based on nonnegative integer factorisation of the elemental matrix; a numerical representation of metabolite formulas.Nonnegative integer factorisation of a matrix is at least NP-hard [41] and no polynomial time algorithm is known to exist for this problem.Moreover, only the chemical formula but not the atomic identities of the conserved moieties can be derived from this approach.Identifying the atoms that belong to each moiety requires additional information about the fate of atoms in metabolic reactions.This information is not contained in a stoichiometric matrix.
Here, we propose a novel method to identify conserved moieties in metabolic networks.Our method is based on the premise that atoms within the same conserved moiety follow identical paths through a metabolic network.Given data on which substrate atoms map to which product atoms in each metabolic reaction, the paths of individual atoms through a metabolic network can be encoded in an atom transition network.Until recently, the necessary data were difficult to obtain but relatively efficient algorithms have now become available to predict atom mappings in metabolic reactions [12,22,23].These algorithms have made it possible to construct atom transition networks for large metabolic networks.Unlike metabolic networks, atom transition networks are amenable to analysis with efficient graph theory algorithms.Here, we take advantage of this fact to identify conserved moieties in metabolic networks in polynomial time.Furthermore, starting from atom transition networks allows us to associate each conserved moiety with a specific group of atoms in a subset of metabolites in a metabolic network.
This work combines elements of biochemistry, linear algebra and graph theory.We have made an effort to accommodate readers from all fields.The main text consists of informal descriptions of our methods and results, accompanied by illustrative examples and a limited number of mathematical equations.Formal definitions of italicised terms are given in supporting file S1 Appendix.We precede our results with a section on the theoretical framework for this work, where we introduce key concepts and notation used in the remainder of the text.

Theoretical framework 2.1 Metabolic networks
A metabolic network consists of a set of metabolites that interconvert via a set of metabolic reactions.Metabolic networks in living beings are open systems that exchange mass and energy with their environment.For modelling purposes, the boundary between system and environment can be defined by introducing a set of metabolite sources and sinks collectively known as exchange reactions.Unlike internal reactions, exchange reactions are artificial constructs that do not conserve mass or charge.The topology of a metabolic network can be represented in several ways.Here, we use metabolic maps and stoichiometric matrices.A metabolic map for a small example metabolic network is shown in Fig. 2.This example will be used throughout this section to demonstrate key concepts relevant to this work.
A stoichiometric matrix for an open metabolic network with m metabolites and n reactions is denoted by S ∈ R m×n .Each row of S represents a metabolite and each column a reaction such that element S ij is the stoichiometric coefficient of metabolite i in reaction j.Coefficients are negative for substrates and positive for products.Substrates and products in reversible reactions are defined by designating one direction as forward.The stoichiometric matrix can be written as where N ∈ Z m×u consists of columns representing internal (mass balanced) reactions and B ∈ R m×(n−u) consists of columns representing exchange reactions (mass imbalanced).Note that N represents a metabolic network that is closed to the environment.In what follows we will refer to N as the internal stoichiometric matrix, B as the exchange stoichiometric matrix, and S as the total stoichiometric matrix.The total stoichiometric matrix for the example metabolic network in Fig. 2 is given in Table 1.
Stoichiometric matrices are incidence matrices for generalised graphs known as hypergraphs [20].Hypergraphs contain hyperedges that can connect more than two nodes.The metabolic map in Figure 2 is a planar visualisation of a hypergraph with one hyperedge, connecting four metabolites.A graph edge that only connects two nodes is a special instance of a hyperedge.Apart from the occasional isomerisation reaction, metabolic reactions involve more than two metabolites.As a result, they cannot be represented as graph edges without loss of information.Metabolic networks are therefore represented as hypergraphs where nodes represent metabolites and hyperedges represent reactions.Since reactions have a designated forward direction, they are directed hypergraphs.Representing Table 1.The total stoichiometric matrix S = [N, B] for the example metabolic network.
Rows are labelled with the corresponding metabolite identifier from Fig. 2. The internal stoichiometric matrix N ∈ Z 4×1 is row rank deficient, with rank (N ) = 1.The dimension of its left null space is therefore dim N N T = 4 − 1 = 3.The total stoichiometric matrix S ∈ Z 4×5 is full row rank.Its left null space is therefore zero dimensional.
metabolic networks as hypergraphs has the advantage of conserving basic structure and functional relationships.The disadvantage is that many graph theoretical algorithms are not applicable to hypergraphs [20].

Moiety vectors
An internal stoichiometric matrix N ∈ Z m×u for a closed metabolic network is always row-rank deficient, i.e., rank (N ) < m [35].The left null space of N , denoted by N N T , therefore has finite dimension given by dim N N T = m − rank (N ).The left null space holds all conservation vectors for a stoichiometric matrix [17].
The number of linearly independent conservation vectors for a closed metabolic network is dim N N T .The total stoichiometric matrix S for an open metabolic network has a greater rank than the internal stoichiometric matrix N for the corresponding closed metabolic network (e.g., Table 1), i.e., rank (N ) < rank (S).Consequently, for an open metabolic network S, such that S T z = 0, then z is also a conservation vector for the corresponding closed network N , and N T z = 0, since S = [N, B].The set of conservation relations for an open network is therefore a subset of all conservation relations for the corresponding closed network, i.e., N S T ⊆ N N T .In what follows we will mainly be concerned with the larger set of conservation relations for a closed metabolic network.Schuster and Hilgetag [34] defined a moiety vector l 1 as a nonnegative integer vector in the left null space of a stoichiometric matrix, i.e., In addition, they defined l 1 to be a maximal moiety vector if it cannot be decomposed into two other vectors l 2 and l 3 that satisfy Eq. 2 and 3, i.e., if where α 2 , α 3 ∈ N + .We propose a more specific definition.The properties above define increasingly small sets of conservation vectors (Fig. 1).Equation 2defines the set of all conservation vectors.Addition of Eq. 3 defines the set of nonnegative integer conservation vectors and addition of Eq. 4 defines the set of nonnegative integer conservation vectors that are nondecomposable.Although this set includes all nondecomposable moiety vectors as subset it is not equivalent (Fig. 1).To define the set of moiety vectors we require a fourth property.We define l 1 to be a moiety vector if it satisfies Eq. 2 and 3 and represents conservation of a specific metabolic moiety, i.e., an identifiable group of atoms in network metabolites.Element l 1,i should correspond to the number of instances of the conserved moiety in metabolite i.We define l 1 to be a nondecomposable moiety vector if it satisfies condition 4 and a composite moiety vector if it does not.Nondecomposable moiety vectors for the DOPA decarboxylase reaction from the example metabolic network in Fig. 2 are given in Table 2a.For comparison, conservation vectors computed with existing methods for conservation analysis of metabolic networks are given in Tables 2b-2d.In general, these vectors do not represent moiety conservation.
Table 2. Different types of conservation vectors for the DOPA decarboxylase reaction. (a) Moiety vectors are denoted l k .(a) Moiety vectors computed with the method presented here.Each column represents the conservation of a particular metabolic moiety.l 1 represents conservation of the dopamine moiety (blue background in Fig. 2), l 2 the CO 2 moiety (red background), and l 3 the hydrogen moiety (orange background).(b) The elemental matrix.Each column represents conservation of a particular element.Elemental conservation vectors generally do not span the left null space of a stoichiometric matrix.(c) Real-valued conservation vectors computed with singular value decomposition of the internal stoichiometric matrix N in Table 1.Real-valued conservation vectors cannot generally be interpreted in terms of conserved moieties as they contain negative and noninteger values.(d) Extreme rays of the left null space N N T .The first three belong to the intersection between the sets of extreme rays and moiety vectors in Fig. 1.The fourth belongs to the set difference.It cannot represent moiety conservation as no atoms are exchanged between H + and CO 2 .Without information about atom mappings between metabolites it would be impossible to determine which extreme rays correspond to conserved moieties.The full set of all nondecomposable nonnegative integer vectors includes 13 additional vectors (not shown), none of which represent moiety conservation.

Atom transition networks
Metabolic reactions conserve mass and chemical elements.Therefore, there must exist a mapping from each atom in a reactant metabolite to a single atom of the same element in a product metabolite.An atom transition is a single mapping from a substrate to a product atom.An atom transition network contains information about all atom transitions in a metabolic network.It is a mathematical structure that enables one to trace the paths of each individual atom through a metabolic network.An atom transition network can be generated automatically from a stoichiometric matrix for a metabolic network and atom mappings for internal reactions.The atom transition network for the DOPA decarboxylase reaction from the example metabolic network in Fig. 2 is shown in Fig. 3. Unlike metabolic networks, atom transition networks are graphs since every atom transition (edge) connects exactly two atoms (nodes).They are directed graphs since every atom transition has a designated direction that is determined by the directionality of the parent metabolic reaction, i.e., the designation of substrates and products.
Because atom transition networks are graphs, they are amenable to analysis with efficient graph algorithms that are not generally applicable to metabolic networks due to the presence of hyperedges [20].3 Results

Identification of conserved moieties in the dopamine synthesis pathway
We will demonstrate our method by identifying conserved moieties in the simple dopamine synthesis network DAS in Fig. 4.This network consists of 11 metabolites, four internal reactions and seven exchange reactions.The total stoichiometric matrix S = [N, B] is given in Table 3.The internal stoichiometric matrix N is row rank deficient with rank (N ) = 4.The dimension of the left null space is therefore dim N N T = 7, meaning that there are seven linearly independent conservation vectors for the closed metabolic network.Our analysis of an atom transition network for DAS will conclude with the computation of seven linearly independent moiety vectors that span N N T .To compute these vectors we require the internal stoichiometric matrix in Table 3 and atom mappings for the four internal reactions.Here, we used algorithmically predicted atom mappings [12].These data are required to generate an atom transition network for DAS (see Section 5.2).By graph theoretical analysis of this atom transition network we derive the first of two alternative representations of moiety conservation relations which we term moiety graphs.Nodes in a moiety graph represent separate instances of a conserved moiety.Each node is associated with a specific set of atoms in a particular metabolite.The second representation of moiety conservation relations are the moiety vectors which can be derived from moiety graphs in a straightforward manner.
Moiety vectors computed with our method are therefore associated with specific atoms via moiety graphs.
To identify all conserved moieties in DAS we require an atom transition network for all atoms regardless of element but for demonstration purposes we will initially focus only on carbon atoms.A carbon atom transition network for DAS is shown in Fig. 5a.Our working definition of a conserved moiety is a group of atoms that follow identical paths through a metabolic network.To identify conserved moieties, we therefore need to trace the paths of individual atoms and determine which paths are identical.The paths of individual atoms through the carbon Table 3.The total stoichiometric matrix S = [N, B] for DAS.
Rows and columns are labelled, respectively, with the corresponding metabolite and reaction identifiers from Fig. 4. The hydrogen ion (H + ) exchange reaction E7 was omitted from Fig. 4 for simplification.The first four columns of S correspond to the internal stoichiometric matrix N and the last seven columns correspond to the exchange stoichiometric matrix B.
atom transition network for DAS can be traced by visual inspection of Fig. 5a.For example, we can trace a path from C1 in L-phenylalanine to C7 in dopamine via C3 in L-tyrosine and C8 in levodopa.This path is made up of atom transitions in reactions R1, R2, and R3 from Fig. 4. In graph theory terms, these four carbon atoms and the atom transitions that connect them constitute a connected component [14] or, simply, a component of the directed graph representing the carbon atom transition network for DAS.A directed graph is said to be connected if a path exists between any pair of nodes when edge directions are ignored.A component of a directed graph is a maximal connected subgraph.In total, the carbon atom transition network for DAS in Fig. 5a consists of 18 components.The paths of the first eight carbon atoms (C1-C8) in L-phenylalanine are identical in the sense that they include the same number of atoms in each metabolite and the same number of atom transitions in each reaction.In graph theory terms, the components containing C1-C8 in L-phenylalanine are isomorphic.An isomorphism between two graphs is a structure preserving vertex bijection [14].The definition of isomorphism varies for different types of graphs as they have different structural elements that need to be preserved.An isomorphism between two simple graphs is a vertex bijection that preserves the adjacency and nonadjacency of every node, i.e., its connectivity.An isomorphism between two simple directed graphs must also preserve edge directions.We define an isomorphism between two components of an atom transition network as a vertex bijection that preserves the metabolic identity of every node.The nature of chemical reactions ensures that all other structural elements are preserved along with metabolic identities, including the connectivity of atoms and the number, directions and reaction identities of atom transitions.The 18 components of the carbon atom transition network for DAS in Fig. 5a can be divided into three sets, where every pair of components within each set is isomorphic.An isomorphism between two components of an atom transition network is a one-to-one mapping between atoms in the two components.For example, the isomorphism between the two left-most components in Fig. 5a maps between C1 and C2 in L-phenylalanine, C3 and C2 in L-tyrosine, C8 and C7 in L-DOPA, and C7 and C8 in dopamine.We say that two atoms are equivalent if an isomorphism maps between them.We note that our definition of isomorphism only allows mappings between atoms with the same metabolic identity.Two atoms can therefore only be equivalent if they are in the same metabolite.Equivalent atoms follow identical paths through a metabolic network and therefore belong to the same conserved moiety.In general, we define a conserved moiety to be a maximal set of equivalent atoms in an atom transition network.To identify conserved moieties, we must therefore determine isomorphisms between components of an atom transition network to identify maximal sets of equivalent atoms.
The first eight carbon atoms (C1-C8) in L-phenylalanine are equivalent.They are therefore part of the same conserved moiety, which we denote λ 1 .The last eight carbon atoms (C2-C9) in L-tyrosine are likewise part of the same conserved moiety.They make up another instance of the λ 1 moiety.The λ 1 moiety is conserved between L-phenylalanine and L-tyrosine in reaction R1, between L-tyrosine and levodopa in reaction R2, and between levodopa and dopamine in reaction R3.Each of the four metabolites contains one instance of the λ 1 moiety.The path of this moiety through DAS defines its conservation relation.This brings us to our first representation of moiety conservation relations, which we term moiety graphs.Moiety graphs are obtained from atom transition networks by merging a set of isomorphic components into a single graph.Moiety graphs for the three carbon atom moieties in DAS are shown in Fig. 5b.Four additional moieties were identified by analysis of an atom transition network for DAS that included all atoms regardless of element.All seven moiety graphs are shown in Fig. 6.Atoms belonging to each node in the moiety graphs are highlighted in Fig. 4.
The second way to represent moiety conservation relations is as moiety vectors.Above we defined a moiety vector as a conservation vector l k where element l k,i corresponds to the number of instances of moiety k in metabolite i of a metabolic network (see Section 2.2).We can now make this definition exact by relating moiety vectors to moiety graphs.Each instance of a conserved moiety is represented as a node in its moiety graph.Element l k,i of a moiety vector therefore corresponds to the number of nodes in moiety graph λ k that represent moieties in metabolite i. Moiety vectors are readily derived from moiety graphs by counting the number of nodes in each metabolite.Moiety vectors for DAS were derived from the moiety graphs in Fig. 6.The seven moiety vectors are given as columns of the moiety matrix L ∈ Z 11×7 in Table 4.These seven vectors are linearly independent and therefore span all seven dimensions of N N T .The moiety matrix L is therefore a moiety basis for the left null space.Table 4. Moiety vectors for DAS.
The seven moiety vectors, denoted l 1 -l 7 are written as columns of the moiety matrix L. Note that L 3,6 = L 4,6 = 2 because levodopa (i = 3) and dopamine (i = 4) each contain two instances of the l 6 moiety (see moiety graph λ 6 in Fig. 6).

Effects of variable atom mappings between recurring metabolite pairs
Atom transition networks are generated from atom mappings for internal reactions of metabolic networks.However, atom mappings for metabolic reactions are not necessarily unique.Computationally predicted atom mappings, as used here, are always associated with some uncertainty.In addition, there can be biochemical variability in atom mappings, in particular for metabolites containing symmetric atoms.All reactions of the O 2 molecule, for example, have at least two biochemically equivalent atom mappings since the two symmetric oxygen atoms map with equal probability to connected atoms.Different atom mappings give rise to different atom transition networks that may contain different moiety conservation relations.For the most part, we found that varying the set of input atom mappings did not affect the number of computed moiety conservation relations, only their atomic structure.An important exception was when atom mappings between the same pair of metabolites varied between reactions in the same metabolic network.
The same pair of metabolites often exchange atoms in multiple reactions throughout the same metabolic network.Common cofactors such as ATP and ADP, for example, exchange atoms in hundreds of reactions in large metabolic networks [39].In the dopamine synthesis network, DAS in Fig. 4, O 2 and H 2 O exchange an oxygen atom in two reactions, R1 and R2.Since the two oxygen atoms of O 2 are symmetric, there are four possible combinations of oxygen atom mappings for these two reactions.Each combination gives rise to a different oxygen transition network as shown in 7. Two of these oxygen transition networks, shown in Figures 7a and 7b, contain two moiety conservation relations each, λ 6 and λ 7 , which are shown in Fig. 7c.The other two oxygen transition networks, shown in Figures 7d and 7e, contain only one moiety conservation relation each, λ 8 , which is shown in Fig. 7f.
The DAS atom transition network considered in the previous section was generated with the oxygen atom mappings in Fig. 7a and thus contained the two moiety conservation relations λ 6 and λ 7 (see Fig. 6).An atom transition network generated with the atom mappings in Fig. 7d or 7e would contain the single moiety conservation relation λ 8 instead of these two.What distinguishes the oxygen transition networks in Figures 7d and 7e is that the oxygen atom in O 2 that maps to H 2 O varies between the two reactions R1 and R2.The atom transition network for DAS therefore contains one less moiety conservation relation if the atom mapping between this recurring metabolite pair varies between reactions.The moiety matrix for these alternative atom transition networks, only contains six linearly independent columns and is therefore not a basis for the seven dimensional left null space of N .
The vector representation of moiety graph λ 8 is We note that l 8 = l 6 + l 7 where l T 7 = 0 0 0 0 0 0 0 0 1 1 0 , from Table 4.The moiety vector l 8 therefore represents a composite moiety.It does not meet the definition of a nondecomposable moiety vector in Eq. 4. This example shows that variable atom mappings between recurring metabolite pairs may cause multiple nondecomposable moiety conservation relations to be joined together into a single composite moiety conservation relation.We formulated an optimisation problem, described in Section 5.5, to decompose composite moiety vectors.Solving this problem for the composite moiety vector l 8 yields the two nondecomposable components l 6 and l 7 .

General properties of identified moieties
We applied our method to identify conserved moieties in three metabolic networks of increasing size.The networks, listed from smallest to largest, were the dopamine synthesis network, DAS in Fig. 4, the E. coli core metabolic network, iCore [26], and an atom mapped subset of the generic human metabolic reconstruction, Recon 2 [39] which we refer to here as subRecon.The dimensions of the three networks are given in Table 5a.Further descriptions are provided in Section 5.1.There are seven linearly independent conservation relations for the closed DAS network, 11 for iCore, and 351 for subRecon.
Atom transition networks were generated using algorithmically predicted atom mappings [12] as described in Section 5.2.Seven, ten and 345 moiety conservation relations were identified in the predicted atom transition network for DAS, iCore and subRecon, respectively (Table 5b).Characterisation of identified moieties revealed some trends (Fig. 8).We found a roughly inverse relationship between the frequency of a moiety, defined as the number of instances, and the size of that moiety, defined as the number of atoms per instance.We also found a relationship between moiety size, frequency and classification.Internal moieties tended to be large and infrequent, occurring in a small number of closely related secondary metabolites, e.g., the 35 atom AMP moiety found in the  were derived by decomposing the columns of L as described in Section 5.5.(c) Carbon isotopomers (see Section 3.5).Comparison between the number of carbon atom and carbon moiety isotopomers.(d) Computation times (in seconds) for the graph-based method presented here, in comparison to the vertex enumeration algorithm described in [4] (see Section 3.8).three iCore metabolites AMP, ADP and ATP.Integrative moieties were usually small and frequent while transitive moieties were intermediate in both size and frequency.The smallest moieties consisted of single atoms.These were often highly frequent, occurring in up to 62/72 iCore metabolites and 2,472/2,970 subRecon metabolites.These results indicate a remarkable interconnectivity between metabolites at the atomic level.Due to their frequency, single atom moieties accounted for a large portion of atoms in each metabolic network.Single atom moieties accounted for nearly half (791/1,697) of all atoms in iCore, and approximately two thirds (104,268/153,298) of all atoms in subRecon.
Moiety matrices derived from the predicted atom transition networks for iCore and subRecon did not span the left null spaces of their respective stoichiometric matrices, indicating that they might contain composite moiety vectors.Using the method described in Section 5.5, we found two composite moiety vectors in the moiety matrix for iCore, and 10 in the one for subRecon.Decomposition of these vectors yielded three new nondecomposable moiety vectors for iCore and 18 for subRecon (Table 5b).The 11 nondecomposable moiety vectors for iCore were linearly independent.They therefore comprised a basis for the 11 dimensional left null space of N for iCore.The 353 nondecomposable moiety vectors for subRecon, on the other hand, were not linearly independent and only spanned 347 out of 351 dimensions in the left null space of S for subRecon.This indicated that there existed conservation relations for subRecon that were independent of atom conservation.
Schuster and Höfer, citing earlier work by Aris [2] and Corio [8], noted the importance of considering electron conservation in addition to atom conservation [35].Unfortunately, it is not as straightforward to map electrons as atoms and no formalism currently exists for electron mappings.As a result, electron conservation relations cannot be computed with the current version of our algorithm.We therefore computed electron conservation relations for subRecon by decomposing the electron vector with the method described in Section 5.5.An electron vector for a metabolic network with m metabolites is a vector e ∈ N m where e i is the total number of electrons in metabolite i. Decomposition of e for subRecon yielded 11 new conservation vectors.When combined, the 11 electron vectors and the 353 fully decomposed moiety vectors for subRecon (Table 5b) spanned the left null space of the subRecon stoichiometric matrix.

The gearwheels of metabolism
Internal moieties define pools of metabolites with constant total concentration and dependent individual concentrations.In the small dopamine synthesis network DAS in Fig. 4, the biopterin moiety (l 3 ) is classified as internal.
This moiety is conserved between the metabolites BH 2 and BH 4 .The total concentration of BH 2 and BH 4 is therefore fixed at a constant value in DAS.If the concentration of BH 2 increases, the concentration of BH 4 must decrease by the same amount and vice versa.
The concentration dependency between BH 2 and BH 4 couples all reactions that interconvert the two metabolites.Assume that DAS is initially at a steady state when there is a sudden increase in flux through reactions R1, R2, R3 and associated exchanges such that the concentrations of all primary metabolites remain constant.This would lead to net consumption of BH 4 accompanied by net production of BH 2 .The increased BH 2 /BH 4 concentration ratio would increase thermodynamic and mass action kinetic driving forces through R4, while simultaneously decreasing driving forces through R1 and R2.The system would eventually settle back to the initial steady state or a new one depending on reaction kinetic parameters and substrate availability.Conservation of the biopterin moiety therefore imposes a purely physicochemical form of regulation on dopamine synthesis that is mediated through mass action kinetics and thermodynamics.This statement can be generalised to all internal moieties, as Reich and Sel'kov did in their 1981 monograph on energy metabolism [30].
Reich and Sel'kov's gearwheel analogy [30] is appropriate for the five internal moieties we identified in iCore.These five moieties define five well known cofactor pools (Table 6).Each pool is coupled to a set of reactions that interconvert metabolites within that pool.The five pools are also coupled to each other through shared reactions, forming a gearwheel-like mechanism (Fig. 9).A change in concentration ratios within any pool will affect the driving forces that turn the wheels.The central wheel in iCore is the NAD moiety (l 6 ).A change in concentration ratios within one pool will therefore be propagated to other pools via the NAD/NADH concentration ratio (Fig. 9).This example shows how local changes in the state of a metabolic network can be propagated throughout the network via coupled cofactor pools defined by internal moieties.The majority of moieties identified in subRecon were classified as internal (237/345).Most of these internal moieties were artefacts of the way the subset of reactions from Recon 2 were selected, i.e., based on the availability of atom mapping data (see Section 5.1).Many reactions in subRecon were disconnected from the rest of the network and therefore could not carry any flux.To identify reactions capable of carrying flux, we computed the flux consistent part of subRecon [42], which consisted of 3,225 reactions and 1,746 metabolites.We identified 118 moiety conservation relations for this part of subRecon, 33 of which were classified as internal.The metabolite pools defined by these moieties consisted of between 2 and 9 metabolites and were distributed across five cell compartments; the cytosol, mitochondria, nucleus, endoplasmic reticulum, and peroxisomes.Some moieties were compartment specific while others were distributed amongst metabolites in two different compartments.As in iCore, the internal moiety pools were not independent of each other but were coupled by shared reactions.

Application of moiety graphs to stable isotope assisted metabolic flux analysis
Atoms in the same instance of a conserved moiety all follow the same path through a metabolic network.In an atom transition network these atoms are represented as separate nodes and their atom transitions as separate edges.
A moiety graph encodes the paths of all atoms in an atom transition network in a reduced number of nodes and edges.In effect, they are reduced representations of atom transition networks that can be used in many of the same applications.Atom transition networks arise most frequently in the context of stable isotope assisted metabolic flux analysis where they underpin the ability to model the flow of isotopically labelled atoms through metabolic networks [44].Stable isotope assisted metabolic flux analysis (MFA) deals with estimation of internal reaction fluxes in a metabolic network based on data from isotope labelling experiments [44].Internal fluxes are estimated by fitting a mathematical model to measured exchange fluxes and isotopomer distributions.
A basic MFA model consists of nonlinear flux balance equations formulated around isotopomers of metabolites in the metabolic network of interest [43].A metabolite with n carbon atoms has 2 n carbon atom isotopomers.Therefore, the number of isotopomer balance equations grows exponentially with the number of metabolites in the metabolic network.More sophisticated MFA modelling frameworks have been developed to reduce the complexity of the problem, notably the cumomer [45] and elementary metabolite unit (EMU) [1] frameworks.Cumomer models consist of flux balance equations formulated around transformed variables called cumomers.They are the same size as isotopomer models but have a simpler structure that makes them easier to solve.EMU models have a similar structure as cumomer models but are significantly smaller.They consist of flux balance equations formulated around transformed variables known as EMU species.The number of EMU species for a given metabolic network is much smaller than the number of isotopomers and cumomers.
MFA models can be derived from moiety graphs instead of atom transition networks without loss of predictive capacity.We say that a moiety is labelled if any of its atoms are labelled and define moiety isotopomers as different labelling states of a metabolite's moieties.The eight carbon containing metabolites in DAS (Fig. 4) have 2,820 possible carbon atom isotopomers.Their 55 carbon atoms can be grouped into 11 carbon moieties (Fig. 5b) with only 22 possible carbon moiety isotopomers.The reduction in number of isotopomers is even more pronounced for the two larger metabolic networks (Table 5c), reaching 12 orders of magnitude for iCore.It was less for subRecon where a greater proportion of moieties consist of a single atom (Figure 8).However, it was still substantial.Deriving MFA models from moiety graphs can therefore reduce the number of model equations by several orders of magnitude.
Isotopomer and cumomer models, in particular, can be simplified with this approach.The algorithm to generate EMU species from atom transition networks ensures that atoms in the same instance of a conserved moiety are always part of the same EMU species.EMU models derived from moiety graphs will therefore be identical to those derived from atom transition networks (see supporting file S2 Figure ).Regardless of the MFA modelling framework, moiety graphs can be used to simplify design of isotope labelling experiments, by reducing the number of options for labelled substrates.

Application of moiety vectors to decomposition of metabolic networks
Moiety vectors can be used to decompose a metabolic network into simpler moiety subnetworks [29].An open metabolic network with total stoichiometric matrix S can be decomposed into t moiety subnetworks where t is the number of moiety conservation relations for the corresponding closed network N .Each moiety vector l k ∈ N (N ) defines a stoichiometric matrix for one moiety subnetwork as Stoichiometric matrices for moiety subnetworks (S (k) ) are generally more sparse than the stoichiometric matrix for the full metabolic network (S).Each moiety subnetwork only includes the subparts of metabolites and reactions that involve a particular moiety.Moiety subnetworks of DAS are shown in Fig. 10a.In addition to being more sparse than the full metabolic network (Fig. 4), these subnetworks have simpler topologies.Of the seven moiety subnetworks of DAS only one (S (6) ) was a hypergraph.All other DAS subnetworks were graphs.Four of 11 iCore subnetworks and 342 of 365 subRecon subnetworks were also graphs.We note that, although metabolic networks could in theory be decomposed with other types of conservation vectors, only moiety vectors are guaranteed to result in mass balanced subnetworks (see Fig. 10b).

Instantaneous moieties
The results above were for moieties identified for metabolic network reconstructions where we assume each reaction is active.These moieties will only be relevant if all reactions in those reconstructions are actually active in practice, i.e., carrying nonzero flux.In general, not all reactions in a metabolic network are active simultaneously, e.g., oxidative phosphorylation reactions in iCore are only active in the presence of oxygen.The set of instantaneous conserved moieties, their conservation relations, and their classification depend on which reactions are active at any point in time.All steady state flux distributions v ∈ R n are in the right null space N (S) of the total stoichiometric matrix S for a metabolic network [27].A convex basis for N (S) gives all extreme pathways of a metabolic network [33].
Extreme pathways are analogous to extreme semipositive conservation relations in the left null space N S T (see Section 1).They are a maximal set of conically independent steady state flux distributions.Any steady state flux distribution can be written as a conical combination of extreme pathways.
To see how instantaneous conserved moieties vary depending on what reactions are active we computed the extreme pathways of iCore with the vertex enumeration algorithm from [4].Computation of the extreme pathways of subRecon with the same algorithm was not tractable.The algorithm returned 1,421 extreme pathways for iCore.The number of instantaneous moiety conservation relations for these pathways ranged from 4 to 11 and the total number of moieties (i.e., instances) ranged from 18 to 520. Figure 11 shows an example of instantaneous moieties in an extreme pathway that corresponds to glycolysis.We found that moieties classified as transitive or integrative in the entire iCore network, were often classified as internal in individual extreme pathways.In particular, the inorganic phosphate moiety (P i ) was classified as internal in all except one extreme pathway.The constant metabolite pool defined by the P i moiety varied between pathways, consisting of P i , ATP, AMP and 9 to 17 phosphorylated intermediates of glycolysis and the pentose phosphate pathway.The ammonia moiety (NH 4 + ) was also classified as internal in many extreme pathways (266/1,421) where it defined a constant metabolite pool consisting of NH 4 + , glutamine and glutamate.

Computational complexity
The computational complexity of the method presented here is largely determined by the following two steps: 1) finding connected components of an atom transition network, and 2) determining isomorphisms between components.
We used an implementation of Tarjan's Algorithm [38] to find connected components of atom transition networks (see Section 5.3).The worst case time complexity of this algorithm is O (p + q) where p is the number of atoms (nodes) and q is the number of atom transitions (edges) in the input atom transition network.We apply Tarjan's algorithm to the simple graph underlying the input atom transition network, which generally contains significantly fewer edges.Algorithms to determine isomorphisms between two general graphs are an active research area.Atom transition networks are specialised graphs where every node is associated with a metabolite and every edge is associated with a reaction in the parent metabolic network.These additional structural elements of atom transition networks make it possible to determine isomorphisms between their components by pairwise comparisons (see Section 5.3).Since every atom must be connected to at least one other atom, the number of components is bounded from above by p /2.The number of components in the atom transition networks treated here was much lower.There were 57 components in the atom transition for DAS, 391 in the one for iCore, and 16,348 in the one for subRecon.If no component is isomorphic to any other component, we need to compare the first component to all other components, the second component to all others except the first, etc.The maximum number of comparisons is therefore The overall worst case time complexity of our method is therefore O p 2 + q .In practice, however, computation time scales much better (Table 5d).Identification of conserved moieties in subRecon took under five minutes with our method.We compared this performance with an implementation of a vertex enumeration algorithm [4] to compute the extreme rays of the left null space of a stoichiometric matrix (Table 5d).The two algorithms performed similarly on the two smaller networks but computation of extreme rays proved intractable for subRecon.The vertex enumeration algorithm did not complete after running for a week, at which point we terminated the process.
It may be of interest to know how our method scales with the size of metabolic networks, instead of the size of atom transition networks.The number of atoms per metabolite varies greatly but is bounded from above.So is the number of atom transitions per reaction.The largest metabolite in the three metabolic networks treated here was the subRecon metabolite neurotensin (Recon 2 ID C01836), with 241 atoms.The largest reaction was the subRecon reaction peroxisomal thiolase 2 (Recon 2 ID SCP2x), with 1,791 atom transitions.This is a composite reaction with large stoichiometric coefficients.Such large reactions are anomalous.The average number of atom transitions per metabolic reaction was much lower.The average (±standard deviation) was 44 (±16) for DAS, 81 (±72) for iCore, and 105 (±90) for subRecon.The number of atoms and atom transitions scales approximately linearly with the number of metabolites and internal reactions, respectively (Table 5d).We can therefore approximate the worst case time complexity of our method as O m 2 + u .

Discussion
Moiety conservation relations are a subset of nonnegative integer conservation relations for a metabolic network.
In principle, the latter can be computed using only a stoichiometric matrix, but the computational complexity of existing algorithms [10,11,25,34,35] has prohibited their application to large networks.Computation of moiety conservation relations requires information about the paths of atoms through metabolic networks in addition to reaction stoichiometry (see Section 2.2).Here, we incorporated this information in the form of atom transition networks.Doing so allowed us to formulate the problem of computing moiety conservation relations as a graph theory problem that is solvable in polynomial time.We related atom paths to connected components of atom transition networks and conserved moieties to equivalent nodes of isomorphic components.We provided a novel definition of isomorphism that is specific to the structure of atom transition networks.This definition enabled us to determine isomorphisms and identify conserved moieties in a fast and reliable way.The relationship between conservation relations and metabolite substructures has long been known [3,28,30].A relationship between conservation relations and graph theoretical properties of atom transition networks has not, to our knowledge, been demonstrated prior to this work.This is also, to our knowledge, the first polynomial time method to compute nonnegative integer conservation relations for metabolic networks.
Our method requires data on reaction stoichiometry and atom mappings for internal reactions of a metabolic network.Reliable data on reaction stoichiometry are readily available from high quality, manually curated metabolic network reconstructions that have been published for hundreds of organisms over the past couple of decades or so.These reconstructions are accessible in a standardised format [18], e.g., through the BioModels database [24].Atom mapping data are increasingly becoming accessible through biochemical databases but are still largely algorithmically generated [22,23].KEGG [15,21] and BioPath (Molecular Networks GmbH, Erlangen, Germany) provide manually curated atom mappings but the data are not freely accessible.No database currently provides mappings for hydrogen atoms or electrons which are required to compute all conserved moieties in a metabolic network.Data formats vary between databases as there is currently no agreed standard.However, the availability and quality of atom mapping data are rapidly increasing and we expect these issues will be remedied in the near future.
We chose to use the DREAM algorithm [12] to predict atom mappings for this work.Advantages of DREAM include ease of use, the ability to map hydrogen atoms, and use of the information-rich rxnfile format.A disadvantage of DREAM is that it uses mixed integer linear programming (MILP) which has exponential worst case time complexity.Kumar and Maranas recently published the first polynomial time atom mapping algorithm, called canonical labelling for clique approximation (CLCA) [22].An implementation of this algorithm has not yet been released but should further speed up the process of obtaining atom mapping predictions.CLCA predictions for 27,000 reactions are already accessible through the MetRxn database [22].These predictions were not yet suitable for this work, however, as they do not include hydrogen atoms.Conserved moieties identified with our method depend on input atom mappings (see Section 3.2).We showed how variable atom mappings between recurring metabolite pairs could give rise to a non-maximal set of composite moiety vectors.Note that composite moieties are a biochemical reality, not just an artefact of the atom mapping algorithm used.Many metabolite pairs do have multiple biochemically equivalent atom mappings, each of which is realised in a living organism.For modelling purposes, however, it is desirable to identify a maximal number of linearly independent moiety conservation relations.We therefore formulated an MILP algorithm for decomposition of composite moiety vectors ( Section 5.5).It would be preferable to construct the atom transition network with minimal variability in atom mappings between recurring metabolite pairs to avoid composite moieties altogether.
Doing so would be relatively straightforward if input data included all alternative atom mappings for reactions.Prediction of alternative atom mappings with the DREAM algorithm is possible but time consuming, both due to the longer running times required, and because DREAM outputs each alternative atom mapping in a separate rxnfile.Some effort is therefore required to integrate alternative predictions.The CLCA algorithm outputs alternative atom mapping predictions in a single file by default and should therefore facilitate identification of nondecomposable moiety conservation relations.Ultimately, however, predicted atom mappings need to be manually curated for alternatives.
To span the left null space of Recon 2 we needed to decompose the electron vector e ∈ N m 0 ( Section 3.3) with the MILP algorithm described in Section 5.5.We note that this MILP algorithm can also be used to decompose the elemental matrix for a metabolic network.This is in fact a method for nonnegative integer factorisation of the elemental matrix, similar to the algorithm presented in [28].However, this method has exponential worst case time complexity.Also, while MILP decomposition of the elemental matrix returns the chemical composition of moieties it cannot be used to pinpoint the exact group of atoms in a metabolite that belong to each moiety.Empirically, we found that MILP decomposition of the elemental matrices for the three metabolic networks treated here completed in a reasonable amount of time although it scaled much worse than analysis of atom transition networks (3.4 × 10 −1 s for DAS, 1.8 × 10 0 s for iCore, 4.7 × 10 3 s for subRecon, compare to Table 5d).In the absence of atom mapping data, MILP decomposition of the elemental matrix provides an alternative way to compute moiety conservation relations for metabolic networks.For the most part, decomposition of elemental matrices gave the same set of vectors as analysis of atom transition networks.The only exception was that decomposition of the elemental matrix for DAS returned the vector l T 9 = 0 1 2 0 2 2 0 0 1 0 0 , (11) in place of the oxygen moiety vector l 6 in Table 4.We note that l 9 = l 6 + 2 (l 2 − l 1 ) does not correspond to a conserved moiety in DAS.
Here, we highlighted three potential applications of our method; to identify constant metabolite pools (Section 3.4), to model isotope labelling experiments for metabolic flux analysis (Section 3.5), and to decompose metabolic networks (Section 3.6).These applications take advantage of our method's unique ability to identify the exact group of atoms that correspond to each conserved moiety.As we alluded to in the introduction, another clear application area is metabolic modelling.A nonnegative integer basis for the left null space can be used to simplify metabolic models and to compute a full rank Jacobian which is required for many computational modelling methods [31,40].
Other applications would include minimisation of intermediate metabolite concentrations [36], and computation of minimal cut sets [19].We also believe our method may be of value to theoretical biologists.For example, the ability to decompose metabolic networks into simpler subnetworks may facilitate research on physical and mathematical properties that are otherwise obscured by topological complexity.

Metabolic networks
We tested our method on three metabolic networks of increasing sizes (see Table 5a), two human and one E. coli network.The E. coli network consisted of core metabolic pathways including glycolysis, the pentose phosphate shunt, the TCA cycle, oxidative phosphorylation and fermentation [26].We refer to this network as iCore for abbreviation.
The two human networks were derived from the generic human metabolic reconstruction Recon 2 [39].The smaller of the two consisted of four internal reactions from the dopamine synthesis pathway and seven metabolite exchange reactions.We refer to this network as DAS, and its four internal reactions as R1, R2, R3, and R4.R1 corresponds to Recon 2 reaction r0399, R2 is a composite of reactions TYR3MO2 and THBPT4ACAMDASE, R3 corresponds to reaction 3HLYTCL, and R4 is a composite of reactions DHPR and FDH.
The larger human network, which we refer to as subRecon, included approximately two thirds (4,261/6,691) of internal reactions in Recon 2. This was the largest subset of Recon 2 reactions for which atom mappings could be predicted at the time of our analysis.For most of the remaining reactions (2,380/2,430), we were unable to generate rxnfiles for input to the DREAM server [12].For other reactions (50/2,430), the DREAM algorithm timed out or failed to parse input rxnfiles.Rxnfiles could not be generated for 1,871/2,380 due to lack of information about metabolite structures, and for 509/2,380 reactions because they were not mass or charge balanced.

Generation of atom transition networks
Atom transition networks were generated based on atom mappings for metabolic reactions.Atom mapping predictions were obtained through the web interface to the mixed integer linear programming method DREAM [12].The objective was set to minimise the number of bonds broken and formed in each reaction.Reactions were input to DREAM in rxnfile format (Accelrys, San Diego, CA).Rxnfiles were written from data on reaction stoichiometry and metabolite structures in molfile format (Accelrys, San Diego, CA).All hydrogen atoms were explicitly represented to obtain mappings for hydrogen atoms in addition to other elements.Care was taken to ensure that hydrogen and charge balancing of reactions was the same in rxnfiles as in the parent stoichiometric matrix.This was essential to ensure that computed moiety vectors were in the left null space of the stoichiometric matrix.

Identification of conserved moieties
We denote the internal stoichiometric matrix of a metabolic network by N ∈ Z m×u .Conserved moieties in the metabolic network were identified by analysis of an atom transition network that was generated as described in 5.2.
We denote the incidence matrix of the input atom transition network by A ∈ {−1, 0, 1} p×q where p is the number of atoms and q the number of atom transitions.The first step in our analysis is to find connected components of A. To this end, we used an implementation of Tarjan's algorithm [38] (see Section 5.6).We denote the incidence Each atom in a component belongs to a particular metabolite in the metabolic network.We define a mapping matrix M (h) ∈ {0, 1} m×x that maps atoms to metabolites.It is defined such that M i,g = 1 if the atom represented by row g in C (h) belongs to the metabolite represented by row i in N .Otherwise, M (h) i,g = 0.The component C (h) represents conservation of a single atom throughout the metabolic network.We define its atom conservation vector as i.e., it is the column sum of M (h) .Element a h,i is therefore the number of atoms in metabolite i that are in component C (h) .We define two components C (h) and C (d) to be isomorphic if they include the same number of atoms from each metabolite.It follows that the two components are isomorphic, with A moiety graph λ k is obtained by merging a set K of isomorphic components into a single graph.The incidence matrix of λ k is given by We note that G (k) = C (h) ∀h ∈ K except that the rows of G (k) represent separate instances of a conserved moiety instead of atoms.A moiety vector l k is derived from the incidence matrix G (k) of a moiety graph in the same way that the atom conservation vector a h was derived from the incidence matrix C (h) of a component in Eq. 12.This is equivalent to setting l k = a h ∀h ∈ K.

Classification of moieties
We classified moieties according to the schema presented in [11].Briefly, moieties were grouped into three categories termed transitive, integrative and internal.These categories were referred to as Type A, Type B, and Type C, respectively, in [11].A moiety with conservation vector l k was classified as internal if it was conserved in the open metabolic network represented by the total stoichiometric matrix S, i.e., if S T l k = 0. Metabolites containing internal moieties were defined as secondary metabolites, while all other metabolites were defined as primary metabolites.
Moieties that were only found in primary metabolites were classified as transitive moieties, while those that were found in both primary and secondary metabolites were classified as integrative moieties.

Decomposition of moiety vectors
Our method for analysing atom transition networks returns r moiety vectors {l k ∈ N m 0 | k ∈ [1, r]} as the columns of the moiety matrix L ∈ N m×r 0 .As described in Section 3.2, our method may return composite moiety vectors if the input atom transition network was generated from variable atom mappings between recurring metabolite pairs.Any composite moiety vector can be written as l k = x k + y k , where x k and y k are nonzero moiety vectors.To decompose a composite moiety vector l k , we solved the mixed integer linear programming (MILP) problem We denote this problem by P k .The constraint in Eq. 15 defines the solution vectors x k and y k as components of l k .The constraints in Eq. 16 and 17 correspond to Eq. 2 and 3 defining nonnegative integer conservation vectors (see Section 2.2).These constraints are implicit for y k due to Eq. 15.The constraint in Eq. 18, when combined with Eq. 15, ensures that x k and y k are both greater than zero.We chose to minimise the sum of elements in x k       6 are coupled into a gearwheel-like mechanism.An increase in the NAD/NADH concentration ratio would affect driving forces in the direction shown.(a) Any reactions that interconvert NAD and NADH would be driven in the direction of increased NAD consumption.These include reactions of glycolysis and the TCA cycle, reactions converting malate and lactate to pyruvate, and reactions converting pyruvate, ethanol, and acetaldehyde to acetyl CoA.In short, NAD/NADH coupled reactions would be driven in the direction of increased acetyl CoA production from available carbon sources.(b) The increased NAD/NADH concentration ratio would also affect driving forces through reactions that couple the NAD pool to other cofactor pools.Altered flux through these reactions would in turn affect concentration ratios within those pools which are coupled to their own sets of reactions.(c) An increased NADP/NADPH ratio would drive flux through the pentose phosphate pathway and conversion of glutamate to alpha-ketoglutarate.An increased Q8/Q8H2 ratio would inhibit flux through the electron transport chain.Increased acetyl-CoA/CoA and succinyl-CoA/CoA ratios would drive acetate production and TCA cycle reactions, respectively, which are coupled to ATP production from ADP.(d) An increase in the ATP/ADP ratio resulting from increased flux through these reactions would drive ATP consuming reactions.In iCore, ATP consuming reactions are mainly found in gluconeogenesis so the increased ATP/ADP ratio would counteract the effects of an increased NAD/NADH ratio to some extent.4) were used to decompose the stoichiometric matrix for DAS (Table 3) into five subnetworks.Colours match the corresponding moieties in Fig. 4 and 6.The two hydrogen atom moiety subnetworks (l 4 and l 5 ) were omitted to simplify the figure.(b) A subnetwork derived from an extreme ray that did not represent moiety conservation.This subnetwork is not mass balanced as there is no mass transfer between Phe and BH 2 , Tyr and BH 2 , or BH 2 and CO 2 in the full metabolic network (Fig. 4 ).

Figure 2 .
Figure 2. A metabolic map for an example metabolic network.The network consists of one internal reaction and four exchange reactions.The internal reaction is the DOPA decarboxylase reaction (KEGG Reaction ID: R02080) that produces dopamine (DA, KEGG Compound ID: C03758) and CO 2 from levodopa (L-DOPA, C00355) and H + .The open network includes source reactions for the two substrates and sink reactions for the two products.Arrowheads indicate reaction directionality.Metabolite structures were rendered from molfiles (Accelrys, San Diego, CA) with MarvinView (ChemAxon, Budapest, Hungary).Atoms are numbered according to their order in each metabolite's molfile.Atoms of different elements are numbered separately, in colours matching their elemental symbol.The internal reaction conserves three metabolic moieties.Atoms belonging to the same moiety have identically coloured backgrounds.Levodopa and dopamine each contain one instance of a dopamine moiety (blue background).Implicit hydrogen atoms on both metabolites are also part of this moiety.Levodopa and CO 2 each contain one instance of a CO 2 moiety (red background).Finally, the hydrogen ion and dopamine each contain one instance of a hydrogen moiety (orange background).

Figure 3 .
Figure 3.A graphical representation of an atom transition network for the DOPA decarboxylase reaction.Nodes (open circles) represent atoms.Atoms can be matched to metabolite structures in Fig. 2 on their metabolite identifiers, colours and numbers.Directed edges (arrows) represent atom transitions.All except one hydrogen atom are omitted to simplify the figure.

( a )
Dimensions of stoichiometric matrices.The number of linearly independent conservation relations is dim N N T = m − rank (N ) in a closed network with stoichiometric matrix N ∈ Z m×u .(b) Dimensions of moiety matrices.Initial moiety matrices L ∈ N m×r 0 were computed directly from predicted atom transition networks.Decomposed moiety matrices D ∈ N m×t 0

Figure 5 . 4 .
Figure 5. Identification of conserved carbon moieties in DAS.(a) The carbon atom transition network.Numbering of atoms and line styles of atom transitions refer to Fig. 4. The directed graph consists of 18 components, one for each of the nine carbon atoms in L-phenylalanine, and one for each of the nine carbon atoms in tetrahydrobiopterin.The single carbon atom (C1) in formate is in the same component as C9 in L-phenylalanine, since a path can be traced between the two atoms when directionalities of atom transitions are ignored.Isomorphic components have matching colours.A single instance of a conserved moiety consists of all equivalent atoms in a set of isomorphic components.(b) Moiety graphs for the three carbon moieties in DAS.Each graph was obtained by merging a set of isomorphic components in (a) into a single directed graph.Each node represents an instance of a conserved moiety.Each edge represents conservation of a moiety between two metabolites in a particular reaction.Colours match the background colours of the corresponding moieties in Fig.4.Analysis of the full atom transition network for DAS yielded four additional conserved moieties (Fig.6).

Figure 6 .
Figure 6.Moiety graphs for all seven conserved moieties in DAS.The seven moieties were identified by analysis of the full atom transition network for DAS in Fig. 4. Colours match the background colours of the corresponding moieties in Fig. 4. The chemical composition of each moiety is given below its graph.

Figure 7 .
Figure 7. Effects of variable atom mappings between O 2 and H 2 O in DAS.The recurring metabolite pair exchanges an oxygen atom in two reactions, R1 and R2 in Fig. 4. Since the two oxygen atoms of O 2 are symmetric, there are four possible combinations of oxygen atom mappings for these two reactions.Each combination gives rise to a different oxygen transition network.(a) The first oxygen atom (O1) in O 2 maps to the single oxygen atom (O1) in H 2 O in both R1 and R2.(b) O2 in O 2 maps to O1 in H 2 O in both R1 and R2.(c) Moiety graphs obtained from the oxygen atom transition networks in (a) and (b).Two nondecomposable moiety conservation relations were identified in each atom transition network where the same atom mapped from O 2 to H 2 O in both R1 and R2.(d) O1 in O 2 maps to O1 in H 2 O in R1 while O2 in O 2 maps O1 in H 2 O in R2.(e) O2 in O 2 maps to O1 in H 2 O in R1 while O1 in O 2 maps to O1 in H 2 O in R2.(f) The single moiety graph obtained from the oxygen atom transition networks in (d) and (e).Only one composite moiety conservation relation was identified in each atom transition network where a different atom mapped from O 2 to H 2 O in R1 than R2.

Figure 8 .
Figure 8. Characteristics of conserved moieties identified in the three metabolic networks treated here.The total number of instances of a moiety is plotted against the number of atoms per instance.Classification of moieties as transitive, internal, or integrative is described in Section 5.4.

Figure 9 .
Figure 9. Coupling between internal moiety pools in iCore.The five pools from Table6are coupled into a gearwheel-like mechanism.An increase in the NAD/NADH concentration ratio would affect driving forces in the direction shown.(a) Any reactions that interconvert NAD and NADH would be driven in the direction of increased NAD consumption.These include reactions of glycolysis and the TCA cycle, reactions converting malate and lactate to pyruvate, and reactions converting pyruvate, ethanol, and acetaldehyde to acetyl CoA.In short, NAD/NADH coupled reactions would be driven in the direction of increased acetyl CoA production from available carbon sources.(b) The increased NAD/NADH concentration ratio would also affect driving forces through reactions that couple the NAD pool to other cofactor pools.Altered flux through these reactions would in turn affect concentration ratios within those pools which are coupled to their own sets of reactions.(c) An increased NADP/NADPH ratio would drive flux through the pentose phosphate pathway and conversion of glutamate to alpha-ketoglutarate.An increased Q8/Q8H2 ratio would inhibit flux through the electron transport chain.Increased acetyl-CoA/CoA and succinyl-CoA/CoA ratios would drive acetate production and TCA cycle reactions, respectively, which are coupled to ATP production from ADP.(d) An increase in the ATP/ADP ratio resulting from increased flux through these reactions would drive ATP consuming reactions.In iCore, ATP consuming reactions are mainly found in gluconeogenesis so the increased ATP/ADP ratio would counteract the effects of an increased NAD/NADH ratio to some extent.

Figure 10 .
Figure 10.Moiety subnetworks of DAS.(a) Moiety vectors l 1 , l 2 , l 3 , l 6 , and l 7 (Table4) were used to decompose the stoichiometric matrix for DAS (Table3) into five subnetworks.Colours match the corresponding moieties in Fig.4 and 6.The two hydrogen atom moiety subnetworks (l 4 and l 5 ) were omitted to simplify the figure.(b) A subnetwork derived from an extreme ray that did not represent moiety conservation.This subnetwork is not mass balanced as there is no mass transfer between Phe and BH 2 , Tyr and BH 2 , or BH 2 and CO 2 in the full metabolic network (Fig.4).
meaning that there are fewer linearly independent conservation vectors for an open metabolic network than the corresponding closed network.This is consistent with physical reality, since mass can flow into and out of open networks but is conserved within closed networks.All quantities that are conserved in an open metabolic network are also conserved in the corresponding closed network.That is, if z is a conservation vector

Table 5 .
Results for the three metabolic networks treated here.