
Conceived and designed the experiments: WJR PLK SR DS. Performed the experiments: WJR. Analyzed the data: WJR PLK SR DS. Wrote the paper: WJR DS.

The authors have declared that no competing interests exist.

Metabolic networks perform some of the most fundamental functions in living cells, including energy transduction and building block biosynthesis. While these are the best characterized networks in living systems, understanding their evolutionary history and complex wiring constitutes one of the most fascinating open questions in biology, intimately related to the enigma of life's origin itself. Is the evolution of metabolism subject to general principles, beyond the unpredictable accumulation of multiple historical accidents? Here we search for such principles by applying to an artificial chemical universe some of the methodologies developed for the study of genome scale models of cellular metabolism. In particular, we use metabolic flux constraint-based models to exhaustively search for artificial chemistry pathways that can optimally perform an array of elementary metabolic functions. Despite the simplicity of the model employed, we find that the ensuing pathways display a surprisingly rich set of properties, including the existence of autocatalytic cycles and hierarchical modules, the appearance of universally preferable metabolites and reactions, and a logarithmic trend of pathway length as a function of input/output molecule size. Some of these properties can be derived analytically, borrowing methods previously used in cryptography. In addition, by mapping biochemical networks onto a simplified carbon atom reaction backbone, we find that properties similar to those predicted for the artificial chemistry hold also for real metabolic networks. These findings suggest that optimality principles and arithmetic simplicity might lie beneath some aspects of biochemical complexity.

Metabolism is the network of biochemical reactions that transforms available resources (“inputs”) into energy currency and building blocks (“outputs”). Different organisms have different assortments of metabolic pathways and input/output requirements, reflecting their adaptation to specific environments, and to specific strategies for reproduction and survival. Here we ask whether, beneath the intricate wiring of these networks, it is possible to discern signatures of optimal (i.e., shortest and maximally efficient) pathway architectures. A systematic search for such optimal pathways between all possible pairs of input and output molecules in real organic chemistry is computationally intractable. However, we can implement such a search in a simple artificial chemistry, which roughly resembles a single atom (e.g., carbon) version of real biochemistry. We find that optimal pathways in our idealized chemistry display a logarithmic dependence of pathway length on input/output molecule size. They also display recurring topologies, including autocatalytic cycles reminiscent of ancient and highly conserved cores of real biochemistry. Finally, across all optimal pathways, we identify universally important metabolites and reactions, as well as a characteristic distribution of reaction utilization. Similar features can be observed in real metabolic networks, suggesting that arithmetic simplicity may lie beneath some aspects of biochemical complexity.

The prominent role of metabolism in any biological process and the fact that a large portion of the environmental factors shaping living systems are ultimately metabolic in nature, suggest that strong selective forces have been acting on metabolic networks throughout the history of life. In laboratory evolution experiments

In a 1961 review, Baldwin and Krebs suggested that biochemical network topologies may reflect the adaptation toward optimally efficient metabolic strategies, and that manifold use of certain molecules may be a crucial element of this adaptation, as “it is indeed a general principle of evolution that multiple use is made of given resources.”

In the present work, we combine the study of an extremely simple artificial chemistry

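Our artificial chemistry consists of a set of N metabolites a_{1}, a_{2}, a_{3}, …, a_{N}, connected by reactions of the form a_{i} + a_{j} ↔ a_{k}, where i + j = k. The resulting network, R_{N}, contains approximately N^{2}/4 reactions. Within R_{N}, we search for Minimal Balanced Pathways (MBPs): sets of reactions that produce a_{j}, with output flux v_{out}, from a_{i}, with input flux v_{in}, such that v_{out} = v_{in}·j/i, using the minimal number of reactions. We indicate the MBP between a_{i} and a_{j} as a_{i} ⇒ a_{j}.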

The R_{4} artificial chemistry network is composed of metabolite “strings” a_{1} through a_{4}. In the R_{4} network, there are four reactions that represent the exchange of mass with the environment (r_{1}-r_{4}), one for each metabolite, and four reactions between the metabolites (r_{5}-r_{8}).
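As a rough illustration of how such a network can be enumerated, the sketch below lists the internal reactions of R_{N} under the assumption that every reaction has the form a_{i} + a_{j} ↔ a_{k} with i ≤ j and k = i + j ≤ N. The function name `build_network` and the tuple encoding `(i, j, k)` are illustrative choices, not part of the original work. Exchange reactions are not included.

```python
from itertools import combinations_with_replacement


def build_network(n_max):
    """List internal reactions (i, j, k), read as a_i + a_j <-> a_k.

    Assumption: a reaction exists for every unordered pair (i, j)
    whose product length k = i + j does not exceed n_max.
    """
    return [
        (i, j, i + j)
        for i, j in combinations_with_replacement(range(1, n_max + 1), 2)
        if i + j <= n_max
    ]


if __name__ == "__main__":
    # Compare the reaction count with the N^2/4 estimate.
    for n in (4, 10, 19):
        reactions = build_network(n)
        print(n, len(reactions), n * n / 4)
```

For R_{4} this yields the four internal reactions mentioned in the caption, and the count stays close to N^{2}/4 for larger networks.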

Despite the simplicity of our artificial chemistry, identifying the MBPs between all possible input-output pairs in a given artificial chemistry _{N}

(_{2 +} _{2} ↔ _{4} and is the most used reaction. MBPs on a yellow background are autocatalytic cycles. For a larger image with more examples, and a more detailed view of the properties observed in these networks, see _{8} from _{7} is also used to produce _{1}, _{2}, and _{4}. (_{10} from _{9}. Each breaks down _{10} into _{1} in different but equally optimal ways. Each of these sub-pathways (shaded metabolites) is an MBP in itself, showing the modularity of use of each of these metabolic tools.

Behind the apparent complexity of the topologies encountered in each of the different pathways, it is possible to observe the recurrence of three fundamental categories: each MBP functions either as a pure “addition chain”

This modular architecture of recurring graph types provides a topological signature of optimally efficient pathways in our idealized chemistry. Since these pathways are chosen based on their minimal length, one may expect that a systematic analysis of all MBP lengths will display additional distinctive properties. Indeed, pathway lengths increase roughly logarithmically with the size of the input (or output) molecule (_{9} ⇒ _{6} can be performed in 2 steps, but the neighbor task _{9} ⇒ _{7} requires a minimum of 6 steps. Moreover, while most MBPs have only one or a few optimal realizations, selected instances display a peak in possible redundant solutions (_{x}_{x}_{+1}), or to the inherent complexity of a specific molecule (e.g., _{7} ⇒ _{j}

Each plot has a single starting value _{i}_{i}_{1}/_{1} input. (_{2}/_{2} input. (_{5}/_{5} input.

A similar search for patterns associated with minimal steps had been previously encountered in the mathematics of addition-subtraction chains, of high importance in cryptography. For example, computing x^{128} can either be performed in 127 multiplications (x × x = x^{2}, x^{2} × x = x^{3}, …, x^{127} × x = x^{128}) or in a chain of 7 multiplications (x × x = x^{2}, x^{2} × x^{2} = x^{4}, x^{4} × x^{4} = x^{8}, …, x^{64} × x^{64} = x^{128}). The latter can be further simplified by tracking the sums of the exponents in each calculation, which form an addition chain (1, 2, 4, 8, 16, 32, 64, 128). Shortest addition-subtraction chains are commonly used to calculate very large numbers in the fewest steps, thus speeding up computation time. These are often applied to methods in cryptography where the calculated exponents can have on the order of thousands to tens of thousands of bits.
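The difference between the two strategies can be checked directly. The following minimal sketch counts multiplications for both approaches; the function names and the base value are hypothetical and chosen only for illustration.

```python
def power_naive(x, n):
    """Compute x**n (n >= 1) by repeatedly multiplying by x.

    Returns (value, number_of_multiplications).
    """
    value, count = x, 0
    for _ in range(n - 1):
        value *= x
        count += 1
    return value, count


def power_by_squaring(x, k):
    """Compute x**(2**k) with k successive squarings.

    Returns (value, number_of_multiplications, exponent_chain), where
    exponent_chain is the addition chain traced by the exponents.
    """
    value, chain = x, [1]
    for _ in range(k):
        value *= value                # x^e * x^e = x^(2e)
        chain.append(chain[-1] * 2)   # exponents add: e + e
    return value, k, chain


if __name__ == "__main__":
    print(power_naive(3, 128)[1])        # multiplications, naive
    print(power_by_squaring(3, 7)[1:])   # multiplications and chain
```

Both functions return the same value for the exponent 128; only the number of multiplications differs.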

The pathways explored in our model resemble optimal addition-subtraction chains. For example, the problem of obtaining a_{128} from a_{1} is formally equivalent to the addition chain example described above. However, while typical addition-subtraction chains start with the number 1, in our MBPs we explore minimal paths starting from any molecule a_{i}, that is, the general task a_{i} ⇒ a_{j}.

We can now ask whether similar minimal pathway length signatures are discernible in real metabolic networks. To cope with the gap in complexity between our model and real chemistry, we mapped real metabolic networks onto a single atom backbone. For example, the reaction that splits fructose-1,6-bisphosphate (C_{6}H_{14}O_{12}P_{2}) into dihydroxyacetone phosphate (C_{3}H_{7}O_{6}P) and glyceraldehyde-3-phosphate (C_{3}H_{7}O_{6}P) can be mapped onto a carbon atom backbone, becoming simply C_{6} ↔ C_{3} + C_{3}, which is equivalent to the a_{6} ↔ a_{3} + a_{3} reaction in the idealized chemistry. Upon performing this mapping onto a carbon atom backbone, we ask whether the structure of real metabolic networks allows interconversions that use the minimal, logarithmic number of steps found for the artificial chemistry.
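The mapping onto a single-atom backbone can be sketched as follows. This is only an illustration under simplifying assumptions: formulas are flat strings without parentheses or charges, and compounds lacking the chosen element are dropped from the reduced reaction. The names `atom_count` and `to_backbone` are hypothetical.

```python
import re

# One element symbol (capital letter plus optional lowercase letter)
# followed by an optional count, e.g. "C6", "H14", "P".
ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")


def atom_count(formula, element="C"):
    """Count atoms of `element` in a flat molecular formula string."""
    total = 0
    for symbol, digits in ELEMENT.findall(formula):
        if symbol == element:
            total += int(digits) if digits else 1
    return total


def to_backbone(substrates, products, element="C"):
    """Reduce a reaction to the atom counts of one element on each side."""
    left = sorted(n for n in (atom_count(f, element) for f in substrates) if n)
    right = sorted(n for n in (atom_count(f, element) for f in products) if n)
    return left, right


if __name__ == "__main__":
    # Fructose-1,6-bisphosphate <-> DHAP + glyceraldehyde-3-phosphate
    print(to_backbone(["C6H14O12P2"], ["C3H7O6P", "C3H7O6P"]))
```

Applied to the example in the text, the reduced reaction has a single C_{6} compound on one side and two C_{3} compounds on the other.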

Each plot has a single starting value _{i}_{i}_{2}/_{2} input. Correlation coefficient = 0.92, p-val = 0.003. (_{3}/_{3} input. Correlation coefficient = 0.94, p-val = 0.001. (_{5}/_{5} input. Correlation coefficient = 0.78, p-val = 0.04.

The second method is aimed at identifying all shortest pathways between any two carbon compounds in the whole genome-scale metabolic network of _{5} as an input), the specific peaks and valleys of the predicted function are closely followed by the

So far, we have analyzed the properties of individual MBPs in our idealized chemistry, as well as analogous minimal length pathways in

Here, we build a meta-metabolome for our idealized chemistry by considering the collection of MBPs. One could imagine that each task a_{i} ⇒ a_{j} corresponds to a different organism, which has filled a specific metabolic niche (availability of a_{i}, requirement for a_{j}).

For this analysis we used the set of MBPs calculated on the R_{19} network using the MILP method. A first result of this analysis is that metabolites of even length are used in many more MBP reactions than their odd-length neighbors, relative to the underlying chemistry. For example, producing a_{8} from a_{1} requires only three doubling reactions (a_{1} + a_{1} → a_{2}, a_{2} + a_{2} → a_{4}, and a_{4} + a_{4} → a_{8}). In addition, this same pathway, with one additional reaction, can also be used to optimally produce a_{9} and a_{10}. Reaction usage is similarly uneven: the most used reaction is a_{2} + a_{2} ↔ a_{4}, and the ranked reaction usage follows a power law (R^{2} = 0.99) whose exponent is close to our theoretically predicted value of −1.

The MBPs for the R_{19} network were calculated using the MILP method while the others (R_{30} through R_{100}) were estimated using the iterative algorithm. We calculated the reaction usage by counting the number of MBPs that use each reaction. These were then ranked in descending order, yielding curves that follow a power law with an average exponent of −1.14 (± 0.03) (R^{2} = 0.99). The reaction usage in the KEGG-derived carbon dataset was calculated by counting the number of times each equivalent reaction appears, and follows a power law tail distribution, with exponent −0.89. The curve predicted by the analytical model, with exponent −1, is shown as a solid line. Metabolite usage is shown for the MBPs of the R_{19} model, together with the frequency of use of each metabolite in the R_{19} network itself, and in a randomly chosen set of reactions (control). In the inset, the metabolite usage was sorted by rank and plotted on a semilog axis.
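One simple way to estimate such a rank-usage exponent is an ordinary least-squares fit on log-log axes. The sketch below is an assumption about how a fit of this kind could be done, not a description of the original analysis. The function name `rank_usage_exponent` is hypothetical and the input in the usage example is synthetic.

```python
import math


def rank_usage_exponent(usage_counts):
    """Fit log(usage) = slope * log(rank) + c by least squares.

    Returns (slope, r_squared). Zero counts are discarded, and at
    least two distinct usage values are required.
    """
    ranked = sorted((u for u in usage_counts if u > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(u) for u in ranked]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Synthetic usage decaying exactly as 1/rank (illustration only).
    synthetic = [1000.0 / rank for rank in range(1, 51)]
    print(rank_usage_exponent(synthetic))
```

On the synthetic 1/rank input the fitted slope is −1 up to rounding, which corresponds to the exponent predicted by the analytical model.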

Rank | Model reaction | KEGG carbon reaction
1 | a_{2} + a_{2} ↔ a_{4} | C_{6} + C_{6} ↔ C_{12}
2 | a_{1} + a_{1} ↔ a_{2} | C_{1} + C_{5} ↔ C_{6}
3 | a_{4} + a_{4} ↔ a_{8} | C_{1} + C_{3} ↔ C_{4}
4 | a_{3} + a_{3} ↔ a_{6} | C_{1} + C_{4} ↔ C_{5}
5 | a_{2} + a_{4} ↔ a_{6} | C_{5} + C_{5} ↔ C_{10}
6 | a_{6} + a_{6} ↔ a_{12} | C_{1} + C_{7} ↔ C_{8}
7 | a_{1} + a_{2} ↔ a_{3} | C_{1} + C_{8} ↔ C_{9}
8 | a_{8} + a_{8} ↔ a_{16} | C_{3} + C_{3} ↔ C_{6}
9 | a_{5} + a_{5} ↔ a_{10} | C_{4} + C_{5} ↔ C_{9}
10 | a_{1} + a_{4} ↔ a_{5} | C_{1} + C_{2} ↔ C_{3}

As in the case of the artificial chemistry network, we can now search for patterns of metabolites and reactions usage in the collective set of all metabolic reactions known in living systems, obtained from the KEGG database ^{−6}; see also _{N}

In addition to a preference for specific reactions, we can ask whether the spectrum of metabolite usage across the whole KEGG metabolism reflects the possible optimality criteria encountered in the model. One possible contributor to the C_{5} periodicity in metabolite usage is the profuse usage of adenine and nicotinamide adenine dinucleotide compounds as energy carriers and redox balance molecules, although the removal of such compounds has little effect on the observed periodicity.

We have explored the potential existence of general principles underlying the evolution of metabolic network architecture. Specifically, we studied the properties of pathways (the MBPs) that perform elementary metabolic tasks with maximal yield and minimal length in an idealized chemistry. Using the results from the model chemistry, we asked whether similar signatures of optimally efficient organization could be found in real metabolic networks.

In computing possible MBPs, we have focused mostly on identifying modular features, on predicting their lengths, and on the statistics of usage of metabolites and reactions. In the future, it may be interesting to characterize the full spectrum of degenerate MBPs for large artificial chemistries. This would allow us to assess, for example, the density of specific topologies (such as autocatalytic cycles), or the dependence of degeneracy on the numerical properties of input/output pairs. One of our algorithms (the elementary modes one) can find a large number of degenerate solutions, including autocatalytic cycles. This algorithm is currently not scalable, because of the difficulty of computing elementary flux modes, especially in the highly connected artificial chemistry network we have used, though very recent improvements in elementary flux mode calculations _{100} network, though in this case the ensuing pathways are not of minimal length (see

Among the recurrent MBP topologies identified, we encountered numerous autocatalytic cycles. The properties of autocatalytic cycles have been studied previously _{7}_{8}_{7}_{8}

Along with the structural details of MBPs, we also used analytical methods to estimate the length of MBPs as a function of the length of input and output molecules. This estimate closely matched the lengths of the artificial chemistry pathways computed with numerical algorithms. These calculations establish a new link between two apparently unrelated disciplines, namely the mathematics of addition-subtraction chains and biochemistry. It will be interesting to explore in the future whether extensions to more realistic artificial chemistries can be formalized in a similar fashion. Conversely, the MBP length estimate obtained for biochemical pathways may suggest new avenues in applied mathematics.

To determine whether predictions of minimal MBP length in our idealized chemistry could have implications in real biochemistry, we searched for pathways of minimal length between compounds with different counts of carbon atoms in the

Previous work had addressed the question of optimality in specific metabolic pathways. For example, Meléndez-Hevia and Torres _{6} compound to C_{5}, with a yield of 5/6 ∼83% and an additional C_{1} byproduct; yet, obtaining maximal yield in this transformation, would cost at least 3 more reactions. One could argue that the combination of all of these criteria, including the maximization of ATP production and optimization of enzymatic catalysis may have played a key role in the evolution of modern metabolism, leading to compromise solutions. Exploring pathways that produce multiple compounds from multiple inputs with the addition of thermodynamic constraints might constitute an interesting model extension for further investigation.

We found that the statistical properties of the usage of reactions across MBPs recapitulate the statistics of reaction usage in the union of all known metabolic pathways (represented by the KEGG metabolic database). Both across the set of all MBPs for the idealized chemistry, and in the KEGG metabolic map, we observed that a few reactions are used far more often than many of the others in the set. Another way of determining the importance of individual reactions in the context of the global functionalities of a meta-metabolome would be to perform perturbation experiments. We implemented such an experiment in our idealized chemistry, by progressively removing reactions and checking how many metabolites can still be produced. Depending on whether reactions are removed in random order or in the order determined by their usage across MBPs, the outcome is quite different.

Determining to what extent real metabolic networks obey optimality principles like the ones described here will take additional effort. Even if an underlying arithmetic simplicity governs idealized optimal pathways, deviations from ideal behavior should be expected. For example, parallel selection pressures for energy production and biochemical stability would likely sacrifice pathway minimality. However, guiding principles such as the ones we are proposing could serve as reference points for future research, including circumstances in which metabolism can be different from what we are used to. Using synthetic biology techniques, for example, it might be possible to redesign metabolic pathways so as to approach predicted ideal efficiencies and minimal enzyme cost.

We define an artificial chemistry inspired by previous string-based artificial chemistries (see also main text and _{N}_{N}_{N}_{N}_{i}_{N}_{i}_{j}_{k}_{N}_{i}_{N}^{2}/4.

Flux Balance Analysis (FBA) is a steady state constraint-based approach to study the flow of mass through metabolic networks _{ij}_{j}

Additional constraints (such as availability of nutrients, experimentally observed irreversibility, maximal or minimal rates, etc.) can be imposed on the fluxes as inequalities of the form_{j}_{j}

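Minimal Balanced Pathways (MBPs) are defined as sets of reactions in the R_{N} network that produce a_{j}, with output flux v_{out}, from a_{i}, with input flux v_{in}, such that v_{out} = v_{in}·j/i, using the minimal number of reactions. The MBP between a_{i} and a_{j} will be indicated as a_{i} ⇒ a_{j}.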

We have developed three different algorithms for computing MBPs, as described below:

We use a modified FBA approach to formulate the MBP problem in a constrained optimization framework. Specifically, we impose the same constraints used in an FBA problem, and further require that the maximal yield condition _{out} = v_{in}·y/x_{x}_{y}_{j}_{j}_{j}_{j}_{i}b_{i}_{x}_{y}

The optimal solution for this problem will give the flux distribution ^{−1}·h^{−1}, and the production of the target metabolite to the known maximal yield _{out}/v_{in} = j/i

Given a metabolic network defined by a stoichiometric matrix S, an elementary flux mode (EFM) is a flux vector v with the following properties:

It satisfies the steady state condition (S·v = 0).

It must be feasible within the conditions of the model: if there are known boundaries for the fluxes, then every flux in v must lie within them.

It must be non-decomposable: there are no two smaller EFMs that can be linearly combined to form the one in question.

Because of these constraints, those EFMs that use the minimal number of reactions satisfy the requirements for being an MBP. We used the METATOOL software package to compute the EFMs of the R_{10} network, and then identified all of those EFMs that are also MBPs.

We designed and implemented an algorithm to produce most MBPs iteratively from smaller ones. We start from the simplest cases: a_{1} ⇒ a_{1} (which requires no reactions), and a_{1} ⇒ a_{2} (requiring one reaction, a_{1} + a_{1} → a_{2}). To compute a_{1} ⇒ a_{3}, we identify all the ways in which we can decompose 3 into two smaller addends (in this case, only one: 3 = 2+1). Next we combine the previously computed MBPs that progress from a_{1} to each of these two addends, giving a new putative MBP for the desired new task (a_{1} + a_{1} → a_{2}, and a_{1} + a_{2} → a_{3}). This procedure can then be iterated to give a prediction of the MBP for any task a_{i} ⇒ a_{j}.

This algorithm is fast and efficient compared to the previous methods, allowing us to apply it to even the R_{100} network. However, it has two main drawbacks. First, it will miss pathways that “overshoot” the target value and then subtract down to it. Second, it may miss MBPs that are not built modularly from smaller ones. A comparison of the MBPs predicted by the different algorithms shows that the approximations introduced in this algorithm cause the pathway length to be overestimated by one reaction for 18 out of 361 MBPs (5%) in R_{19}. Also, this algorithm correctly identifies 204 of the 384 degenerate MBPs that the EFM algorithm finds in R_{10}. The reaction usage obtained with this method is highly correlated with that of the MILP method applied to R_{19} (Pearson correlation = 0.96, p-val = 10^{−51}), and the EFM method applied to R_{10} (Pearson correlation = 0.98, p-val = 2·10^{−17}).
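The iterative construction can be sketched in a few lines for the special case of tasks that start from a_{1}. This is a simplified illustration rather than the implementation used in the paper. It keeps only one candidate pathway per target, uses additions only, and breaks ties by taking the first decomposition found. The function name and the `(p, q, n)` encoding of the reaction a_{p} + a_{q} → a_{n} are assumptions.

```python
def iterative_pathways(n_max):
    """Heuristic pathways for a_1 => a_n, n = 1..n_max.

    Each pathway is a frozenset of reactions (p, q, n) with p <= q,
    read as a_p + a_q -> a_n. For every target n, all splits
    n = p + q are tried, and the stored pathways for p and q are
    merged with the final joining reaction.
    """
    best = {1: frozenset()}  # a_1 => a_1 needs no reactions
    for n in range(2, n_max + 1):
        candidate = None
        for p in range(1, n // 2 + 1):
            q = n - p
            # Shared reactions are counted once because sets are merged.
            pathway = best[p] | best[q] | {(p, q, n)}
            if candidate is None or len(pathway) < len(candidate):
                candidate = pathway
        best[n] = candidate
    return best


if __name__ == "__main__":
    pathways = iterative_pathways(10)
    for n in range(1, 11):
        print(n, len(pathways[n]), sorted(pathways[n]))
```

Because only addition steps are combined, a sketch like this shares the first drawback described above: it cannot find pathways that overshoot the target and subtract back down.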

Data used for the comparison between the _{N}_{3} ↔ C_{3}). These reactions were ignored as well, without consequences on the results (data not shown).

We counted how often each metabolite and reaction was used in the artificial chemistry pathways as well as in the KEGG-derived single-atom networks. In the model pathways, reaction usage was calculated by counting how many times each reaction was used across all pathways. Metabolite usage was similarly calculated by counting the occurrence of reactions in which each metabolite participates. For example, in the pathways that convert a_{9} to a_{10}, a_{9} participates in only one reaction, but a_{10} participates in two.

In the KEGG-derived networks, a similar counting scheme was used. The reaction usage was calculated by counting how many times each reduced reaction appears, and the metabolite usage was calculated by counting how many times each metabolite appears across all reactions.
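A minimal sketch of this counting scheme is given below. It assumes each pathway is a collection of reactions encoded as `(i, j, k)` tuples for a_{i} + a_{j} ↔ a_{k}, and that a metabolite is counted once per reaction in which it participates, even if it appears twice in that reaction. Both the encoding and the function name are illustrative assumptions.

```python
from collections import Counter


def usage_counts(pathways):
    """Count reaction and metabolite usage across a set of pathways.

    Returns (reaction_usage, metabolite_usage) as Counters.
    """
    reaction_usage = Counter()
    metabolite_usage = Counter()
    for pathway in pathways:
        for reaction in set(pathway):
            reaction_usage[reaction] += 1
            # set() makes a_1 count once in a_1 + a_1 <-> a_2.
            for metabolite in set(reaction):
                metabolite_usage[metabolite] += 1
    return reaction_usage, metabolite_usage


if __name__ == "__main__":
    toy_pathways = [
        [(1, 1, 2), (2, 2, 4)],  # a_1 => a_4
        [(1, 1, 2), (1, 2, 3)],  # a_1 => a_3
    ]
    reactions, metabolites = usage_counts(toy_pathways)
    print(reactions.most_common())
    print(metabolites.most_common())
```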

The first method used EFMs to find all shortest pathways in the central carbon metabolism of _{4}). Also, we effectively ignored reactions involving transport and exchange. We then used the EFM method described above to find the number of reactions in each of the MBPs for this reduced network.

For each input compound, we listed the lengths of the MBPs for all output compounds containing a number

The larger, genome-scale metabolic network of

As described above, we are only interested in the connections between carbon compounds, so we removed any non-carbonaceous metabolites (water, phosphate, ammonia, etc.). Also, we removed the following cofactors that are used in many reactions, but do not participate in the transformation of carbons: ATP, ADP, AMP, NAD+, NADH, NADP+, NADPH, coenzyme A, acetyl-CoA, and the acyl carrier protein.

Next, we used Johnson's all-pairs shortest paths algorithm (available as a Matlab function) to find shortest pathways between any two carbon compounds in
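For a graph in which every reaction step has the same cost, all-pairs shortest path lengths can also be obtained by running a breadth-first search from every node. The sketch below uses that simpler technique in place of Johnson's algorithm and treats links as undirected, which is an assumption. The toy edge list and compound labels are hypothetical and serve only to show the effect of removing a highly connected cofactor.

```python
from collections import deque

# Hypothetical toy network: pairs of compounds linked by a reaction.
TOY_EDGES = [
    ("glucose", "G6P"), ("G6P", "F6P"), ("F6P", "FBP"),
    ("FBP", "DHAP"), ("FBP", "G3P"), ("DHAP", "G3P"),
    ("ATP", "glucose"), ("ATP", "FBP"),
]


def all_pairs_shortest_lengths(edges, excluded=()):
    """Return {source: {target: steps}} using one BFS per source.

    Compounds listed in `excluded` (e.g. cofactors) are removed
    before the search.
    """
    graph = {}
    for u, v in edges:
        if u in excluded or v in excluded:
            continue
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    lengths = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        lengths[source] = dist
    return lengths


if __name__ == "__main__":
    with_cofactor = all_pairs_shortest_lengths(TOY_EDGES)
    without_cofactor = all_pairs_shortest_lengths(TOY_EDGES, excluded={"ATP"})
    print(with_cofactor["glucose"]["G3P"], without_cofactor["glucose"]["G3P"])
```

In the toy network, keeping the cofactor creates a shortcut between compounds that does not correspond to a carbon transformation, which is why such compounds are removed before computing path lengths.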

We developed an analytical approximation for the expected numbers of reactions to be found in any MBP _{i}_{j}_{j}_{1}

Our artificial chemistry represents a generalization, in which a metabolite of any length _{i}_{j}_{1} input and _{j}_{/i} output. Therefore, in the irreversible case, we can assume that inputs consist of monomers without loss of generality. Let

Sometimes the shortest chain can be found easily. For instance, {2^{0}, 2^{1}, …, 2^{k}^{k}^{m}

The above bounds give precise values in some cases and act as bounds in others. For instance,

There are various conjectures regarding ^{25}. Two other conjectures

While algorithms for generating the shortest addition chains are discussed by Thurber _{1}.
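For small targets, the shortest addition chain length l(n) can be found by exhaustive search. The sketch below uses iterative deepening and compares the result with two standard bounds: the lower bound ⌈log_{2} n⌉ and the upper bound ⌊log_{2} n⌋ + ν(n) − 1 given by the binary method, where ν(n) is the number of ones in the binary expansion of n. These are textbook bounds and are assumed, not confirmed, to correspond to the ones discussed in this section. The function names are hypothetical, and the search is not one of the algorithms cited above.

```python
def shortest_chain_length(n):
    """Exact length of the shortest addition chain for n (small n only)."""
    if n < 1:
        raise ValueError("n must be a positive integer")

    def search(chain, limit):
        last = chain[-1]
        if last == n:
            return True
        steps_left = limit - (len(chain) - 1)
        # Even doubling at every remaining step cannot reach n.
        if steps_left <= 0 or (last << steps_left) < n:
            return False
        sums = {a + b for a in chain for b in chain}
        # Only strictly increasing chains are explored; any chain can
        # be reordered this way without changing its length.
        for value in sorted((s for s in sums if last < s <= n), reverse=True):
            if search(chain + [value], limit):
                return True
        return False

    limit = (n - 1).bit_length()  # ceil(log2(n)), the lower bound
    while not search([1], limit):
        limit += 1
    return limit


def binary_method_bound(n):
    """Upper bound floor(log2 n) + (number of ones in binary n) - 1."""
    return n.bit_length() - 1 + bin(n).count("1") - 1


if __name__ == "__main__":
    for n in range(1, 21):
        print(n, (n - 1).bit_length(), shortest_chain_length(n),
              binary_method_bound(n))
```

The bounds coincide for powers of two, where repeated doubling is optimal, and leave a gap for numbers such as 15, where the exact search finds a chain shorter than the binary method.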

We are interested in the general case involving both addition and subtraction, and specifically the lengths _{i}_{j}_{1} as an input. Sometimes, in these cases,

Note also an inequality:

All of these features explain the growth law in equation (6).

The quantity

Recalling (6) we finally arrive at an approximation for the number of reactions in an MBP that uses _{i}_{j}

The approximation in (19) can also be used to estimate the rank distribution of reaction usage. Consider all possible MBPs producing _{j}_{i}_{pq}_{pq}_{pq}_{j}_{j}

From (19) it is clear that the average length 〈_{j}^{−1}. Thus we predict the power-law decay in (21).

A version of

(2.81 MB TIF)

Heatmaps describing MBP length and degeneracy

(0.02 MB PDF)

Logarithmic trend of metabolic pathway length

(0.04 MB PDF)

Metabolite and reaction usage frequencies for KEGG hydrogen, nitrogen, and oxygen sets

(0.09 MB PDF)

Fast Fourier transforms showing periodicity of metabolite usage

(0.02 MB PDF)

KEGG carbon reaction and metabolite usage with cofactors removed

(0.04 MB PDF)

Pathway length comparison between iterative algorithm and algorithm minimizing absolute values of fluxes

(0.06 MB PDF)

Reaction usage is similar between R10 EFM and MILP results

(0.02 MB PDF)

Removing reactions from the R19 network affects the system's ability to yield balanced pathways

(0.04 MB PDF)

Comparison of the three different MBP search algorithms

(0.04 MB DOC)

List of MBPs found in the R19 network

(0.07 MB XLS)

List of all degenerate MBPs found in the R10 network

(0.07 MB XLS)

Metabolite and reaction usage from MBPs in the R19 network

(0.03 MB XLS)

Metabolite and reaction usage from MBPs in the R10 network

(0.02 MB XLS)

Metabolite and reaction usage from the KEGG carbon dataset

(0.08 MB XLS)

Metabolite and reaction usage from the KEGG hydrogen dataset

(0.17 MB XLS)

Metabolite and reaction usage from the KEGG nitrogen dataset

(0.02 MB XLS)

Metabolite and reaction usage from the KEGG oxygen dataset

(0.09 MB XLS)

Metabolite and reaction usage from the KEGG phosphorus dataset

(0.02 MB XLS)

Metabolite and reaction usage from the KEGG sulfur dataset

(0.01 MB XLS)

We are grateful to Oliver Ebenhöh, Tzachi Pilpel, Scott Mohr and members of the Segrè lab for helpful feedback and discussions.