The number of active metabolic pathways is bounded by the number of cellular constraints at maximal metabolic rates

Growth rate is a near-universal selective pressure across microbial species. High growth rates require hundreds of metabolic enzymes, each with different nonlinear kinetics, to be precisely tuned within the bounds set by physicochemical constraints. Yet, the metabolic behaviour of many species is characterized by simple relations between growth rate, enzyme expression levels and metabolic rates. We asked if this simplicity could be the outcome of optimisation by evolution. Indeed, when the growth rate is maximized (in a static environment under mass-conservation and enzyme expression constraints), we prove mathematically that the resulting optimal metabolic flux distribution is described by a limited number of subnetworks, known as Elementary Flux Modes (EFMs). We show that, because EFMs are the minimal subnetworks leading to growth, a small active number automatically leads to the simple relations that are measured. We find that the maximal number of flux-carrying EFMs is determined only by the number of imposed constraints on enzyme expression, not by the size, kinetics or topology of the network. This minimal-EFM extremum principle is illustrated in a graphical framework, which explains qualitative changes in microbial behaviours, such as overflow metabolism and co-consumption, and provides a method for identification of the enzyme expression constraints that limit growth under the prevalent conditions. The extremum principle applies to all microorganisms that are selected for maximal growth rates under protein concentration constraints, for example the solvent capacities of cytosol, membrane or periplasmic space.

Author summary

The microbial genome encodes a large network of enzyme-catalyzed reactions. The reaction rates depend on concentrations of enzymes and metabolites, which in turn depend on those rates. Cells face a number of biophysical constraints on enzyme expression, for example due to a limited membrane area or cytosolic volume. Considering this complexity and nonlinearity of metabolism, how is it possible that experimental data can often be described by simple linear models? We show that it is evolution itself that selects for simplicity. When reproductive rate is maximised, the number of active independent metabolic pathways is bounded by the number of growth-limiting enzyme constraints, which is typically small. A small number of pathways automatically generates the measured simple relations. We identify the importance of growth-limiting constraints in shaping microbial behaviour, by focussing on their mechanistic nature. We demonstrate that overflow metabolism, an important phenomenon in bacteria, yeasts, and cancer cells, is caused by two constraints on enzyme expression. We derive experimental guidelines for constraint identification in microorganisms. Knowing these constraints leads to increased understanding of metabolism, and thereby to better predictions and more effective manipulations.

The rate at which a microorganism produces offspring (specific growth rate) is a near universal selective pressure across microbial species. Growth requires metabolic enzymes, whose properties and expressions are moulded by evolution, within the bounds set by physical and chemical constraints. We asked therefore, if basic (bio)chemistry can determine the outcome of evolution. We found an evolutionary extremum principle that dictates that specific growth rate maximisation requires minimisation of metabolic complexity. This principle is a mathematical consequence of maximising specific growth rate under mass-conservation and protein-concentration constraints. We prove that the number of growth-limiting constraints bounds the number of active metabolic subnetworks (defined by Elementary Flux Modes) at maximal growth rate. Therefore, the complexity of metabolic behaviour is determined by the number of active protein constraints, not the size of the network. The consequences of the fundamental principle can be visualised in a graphical framework that provides a unified understanding of metabolic behaviours conserved across microbial species 1 , such as diauxic growth 2 , mixed substrate usage 3 and overflow metabolism 4 . This work therefore provides a biochemical basis for the fundamental limits of evolution, and the driving forces of evolutionary change under growth-enabling conditions. Fitter microorganisms drive competitors to extinction by synthesising more viable offspring in time 5,6 . The rate of offspring-cell synthesis per cell, i.e., specific growth rate, is a common selective pressure across microbial species 5 . High growth rate requires high metabolic rates, which in turn requires high enzyme concentrations 7 . Due to limited biosynthetic resources, such as ribosomes, polymerases, energy and nutrients, the expression of any enzyme is at the expense of others 8,9 . 
Consequently, proper balancing of enzyme benefits and costs results in optimally-tuned enzyme expression that maximises growth rate 10-12 .
We asked whether the evolutionary endpoint of this optimisation can be predicted from general principles of metabolism, i.e., from (i) mass conservation: steady-state reaction-stoichiometry relations, and (ii) enzyme biochemistry: the proportionality of an enzyme's activity to its concentration. Such an endpoint can indeed be found, in the form of an evolutionary extremum principle: growth-rate maximisation drives microorganisms to minimal metabolic complexity. We provide the mathematical proof of this principle in the supplemental information. Minimal metabolic complexity can be unambiguously defined in terms of 'Elementary Flux Modes' (EFMs). An EFM is a mathematical definition of a metabolic subnetwork at steady state: it is a minimal set of thermodynamically feasible reactions that form a network from external sources to sinks 13 . An EFM is elementary (or minimal, nondecomposable) because none of its reactions can be removed without halting flux. An EFM is defined purely in terms of reaction stoichiometry; enzyme kinetics are not required. EFMs are known to maximise specific flux 14,15 . For example, if pure respiratory growth and pure fermentative growth are each EFMs, then respirofermentative growth is not, as it can be decomposed into these two. In general, any steady-state flux distribution can be decomposed into positive linear combinations of EFMs 13 .
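The decomposition of a mixed flux distribution into EFMs can be sketched on a toy network. Everything in the block below (the four reactions, the ATP yields, the flux values) is an illustrative assumption and is not taken from the paper or its SI; it only shows that a respirofermentative steady state is a weighted sum of a respiratory and a fermentative EFM, and that its set of active reactions strictly contains that of either EFM.

```python
# Toy stoichiometry (hypothetical numbers, for illustration only).
# Reactions: 0 uptake (-> G), 1 respiration (G -> 10 ATP),
#            2 fermentation (G -> 2 ATP + excreted acetate),
#            3 cell synthesis (10 ATP -> biomass)
N = [
    [1, -1, -1, 0],    # mass balance of internal metabolite G
    [0, 10, 2, -10],   # mass balance of internal metabolite ATP
]

def is_steady_state(N, v, tol=1e-9):
    """True if N v = 0, i.e. every internal metabolite is balanced."""
    return all(abs(sum(n * f for n, f in zip(row, v))) < tol for row in N)

def support(v, tol=1e-9):
    """Indices of reactions that carry flux."""
    return {i for i, f in enumerate(v) if abs(f) > tol}

# The two EFMs of this toy network, each scaled to one unit of cell synthesis.
efm_resp = [1.0, 1.0, 0.0, 1.0]
efm_ferm = [5.0, 0.0, 5.0, 1.0]

def decompose(v):
    """Weights of the two EFMs in v; each branch flux occurs in only one EFM."""
    return v[1] / efm_resp[1], v[2] / efm_ferm[2]

# A respirofermentative flux distribution producing one unit of biomass.
mixed = [3.0, 0.5, 2.5, 1.0]
w_resp, w_ferm = decompose(mixed)
rebuilt = [w_resp * a + w_ferm * b for a, b in zip(efm_resp, efm_ferm)]

print("weights:", w_resp, w_ferm)
print("mixed state is elementary:", not (support(efm_resp) < support(mixed)))
```

Reading the weights off the branch fluxes only works because each branch belongs to a single EFM in this toy network; in a genome-scale network the decomposition is generally not unique.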
A cell requires around 250 reactions to make all its cellular components from basic nutrients 16 . This size is indeed comparable with the size of a calculated EFM sufficient for cell synthesis 17 . Because of the many combinations of parallel, alternative metabolic routes in metabolic networks, the total number of such EFMs in one such network is in the hundreds of millions; the exact number depends on the network and the growth-medium conditions 17 . However, when we estimate, from experimental data (SI 8), the number of EFMs that cells actually use at a particular condition, then this number is small, in the order of 1 to 3. That microorganisms choose only a handful of EFMs out of millions of alternatives suggests that these alternatives are not evolutionarily equivalent, and only a small number has been selected. This can be explained by the extremum principle, which in terms of EFMs states: when the rate of a particular metabolic reaction in a metabolic network is maximised, the number of flux-carrying EFMs is at most equal to the number of active constraints on protein concentrations. We define the constraints as sums of selected (weighted) protein concentrations. A constraint is 'active' when it limits the cell in increasing its growth rate, indicating that the corresponding protein pool is fully used. The principle also applies to the cell-synthesis reaction, which makes all cellular components in the right proportions according to the biomass composition, and is hence also known as the biomass reaction 18 . The extremum principle demands that if the number of constraints is low, so is the number of active EFMs at optimal growth rate.
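In symbols, the statement above can be sketched as follows. The notation is introduced here for illustration only and is an assumption, not the notation of the SI: $e_i$ is the concentration of enzyme $i$, $w_{k,i} \ge 0$ is its weight in constraint $k$, and $C_k$ is the size of the $k$-th constrained protein pool.

```latex
% K protein-pool constraints, each a weighted sum of enzyme concentrations
\sum_{i} w_{k,i}\, e_i \;\le\; C_k , \qquad k = 1, \dots, K .

% A constraint k is active when it holds with equality at the optimum.
% Extremum principle: at maximal objective flux,
\#\{\text{flux-carrying EFMs}\}
  \;\le\; \#\Bigl\{\, k : \sum_{i} w_{k,i}\, e_i = C_k \Bigr\}
  \;\le\; K .
```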
We focus on protein constraints because the properties and expression of proteins are the major biological variables that evolution acts on. In addition, recent work indicates that many aspects of microbial growth can be understood from a protein allocation perspective 8,9,19-22 . Moreover, other studied constraints such as the solvent capacity of cellular compartments 23 or cellular membranes 24 or of the entire cell 20 can be reformulated in terms of constrained protein pools, e.g. a limited cytosolic protein pool or a limited membrane protein pool (SI 3). The exact nature of these pools could vary per organism and environmental condition, but the effects described by the extremum principle will not.
Constraints on protein concentrations limit the cell synthesis rate. Thus, the EFM, or the combination of EFMs, that uses the smallest fractions of the constrained protein pools for reaching one unit of specific growth rate is optimal. A graphical representation of the optimisation problem in 'constraint space' illustrates this in an intuitive manner (Figure 1). In the case of multiple constrained protein pools, different EFMs vary in their usage of such pools, making the cost of implementing an EFM a multidimensional variable. Each EFM is therefore assigned a 'cost vector' in constraint space: a cost vector quantifies, for each constrained protein pool, the fraction that an EFM requires for producing one unit of objective flux, here the cell synthesis flux. The direction of a cost vector thus denotes which pool is used most by this EFM. Specific growth rate maximisation now becomes a geometric problem in constraint space: fit the largest possible multiple of cost vectors within the constraint space. Shorter vectors, corresponding to EFMs with lower enzymatic costs, will fit more often and will produce a higher flux. This is illustrated for a 2-constraint problem in Figure 1: both constraints can be fully used with only two vectors (EFMs). However, an EFM that sits on the diagonal can make full use of both constrained protein pools: hence, the extremum principle is that the number of EFMs that maximise flux is equal to or less than the number of active constraints. We have derived the necessary and sufficient conditions under which it is optimal to use EFMs in combinations (SI). Note that the length and direction of the cost vector can depend on the metabolite concentrations. We study the consequences in the SI.

Figure 1: [...] The cost vector $d^i$ of the $i$-th EFM denotes the fractions of the first and second protein pool that this EFM needs to produce one unit of objective flux. The usage of a certain combination of EFMs $i$ and $j$ corresponds to a weighted sum of the cost vectors: $\lambda_i d^i + \lambda_j d^j$. The combination is possible as long as none of the constraints is exceeded: $\lambda_i d^i + \lambda_j d^j \le 1$ (componentwise). The optimal objective flux is achieved by maximising the sum of weights ($\lambda_i + \lambda_j$). This optimal sum is shown by the dashed vectors. The sole use of an off-diagonal cost vector leads to underuse of one constraint, while diagonal cost vectors can exhaust both constrained pools. A mixture of EFMs will always be a combination of an above-diagonal and a below-diagonal vector. All EFMs, and combinations thereof, can be ranked by a dot on the diagonal which denotes the average cost per unit cell synthesis flux. Above-diagonal cost vectors should be projected horizontally, below-diagonal vectors vertically, and for combinations we should follow the connecting line. The (mixture of) EFM(s) with the lowest average cost reaches the highest growth rate (see Lemma 4 in the SI for details). The shaded regions indicate alternative positions for the cost vectors under different intracellular metabolite concentrations. We have drawn two of these alternative concentrations for two EFMs. The blue and orange cost vectors are calculated at those concentrations that would lead to the highest growth rate when using only that EFM. The green vectors are vectors that would lead to a mixture of the orange and blue EFM. Upon a change of environmental conditions, the mixture of EFMs becomes better than either single EFM. This would lead to a qualitative change in metabolic behaviour. Although the shown cost vectors reflect only few choices of metabolite concentrations, our mathematical results are valid for all possible concentrations.
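The geometric problem in constraint space can be tried out numerically for two constraints. The sketch below is a minimal illustration under strong assumptions: the three cost vectors are invented numbers, metabolite concentrations (and hence cost vectors) are held fixed, and both pools are normalised to size 1. It enumerates every single EFM and every pair that exhausts both pools, and then checks by random sampling that mixtures of all three EFMs do not do better.

```python
import itertools
import random

def single_rate(d):
    """Objective flux with one EFM only: the most-used pool runs out first."""
    return 1.0 / max(d)

def pair_weights(di, dj):
    """Weights that exhaust both pools exactly, or None if no non-negative mix exists."""
    det = di[0] * dj[1] - dj[0] * di[1]
    if abs(det) < 1e-12:
        return None
    lam_i = (dj[1] - dj[0]) / det
    lam_j = (di[0] - di[1]) / det
    if lam_i < 0 or lam_j < 0:
        return None
    return lam_i, lam_j

def best_strategy(costs):
    """Best (rate, {EFM name: weight}) over all single EFMs and all pairs."""
    best = (0.0, {})
    for name, d in costs.items():
        cand = (single_rate(d), {name: single_rate(d)})
        best = max(best, cand, key=lambda t: t[0])
    for (ni, di), (nj, dj) in itertools.combinations(costs.items(), 2):
        w = pair_weights(di, dj)
        if w is not None:
            best = max(best, (sum(w), {ni: w[0], nj: w[1]}), key=lambda t: t[0])
    return best

def random_feasible_rate(costs, rng):
    """Objective flux of a random mixture of all EFMs, rescaled to fill the tightest pool."""
    w = [rng.random() for _ in costs]
    usage = [sum(wi * d[k] for wi, d in zip(w, costs.values())) for k in range(2)]
    return sum(w) / max(usage)

# Hypothetical cost vectors: (fraction of pool 1, fraction of pool 2) per unit flux.
costs = {"A": (0.5, 0.2), "B": (0.2, 0.45), "C": (0.6, 0.6)}
rate, weights = best_strategy(costs)
rng = random.Random(0)
sampled = max(random_feasible_rate(costs, rng) for _ in range(10000))

print("best strategy:", weights, "rate:", rate)
print("best of 10000 random 3-EFM mixtures:", sampled)
```

With two constraints, restricting the search to singles and pairs loses nothing for fixed cost vectors, because the problem is then a linear programme whose optimum lies at a vertex; the dependence of cost vectors on metabolite concentrations, treated in the SI, is not captured by this sketch.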
The extremum principle is independent of the complexity of the metabolic network, i.e., of its kinetics and its structure. Rather, the metabolic complexity is determined by the number of active constraints; the kinetics and structure will subsequently determine which EFMs will be optimal and selected by evolution, as illustrated by in silico evolution of metabolic regulation towards only one active EFM 25 . The fact that constraints determine metabolic complexity is the important insight that our work produces. Genome-scale metabolic models, which contain all the annotated metabolic reactions that a microorganism's genome encodes 26 , will therefore behave qualitatively similarly to simplified models, and coarse-grained models can be used without loss of generality. This greatly facilitates our understanding of metabolic behaviour as a consequence of the extremum principle.
Several widely occurring metabolic behaviours can be understood through the extremum principle. A well-known recurrent behaviour is overflow metabolism: the apparently wasteful excretion of metabolic products. Examples are the aerobic production of ethanol by yeasts (Crabtree effect), lactate by cancer cells (Warburg effect) or acetate by Escherichia coli 8,27,28 . The onset of overflow metabolism is generally studied as a function of growth rate (e.g., in chemostats). When the growth rate is increased above some critical value, respiratory flux decreases and the flux of overflow metabolism emerges. Below this critical growth rate, the respiratory flux is proportional to the growth rate (see Figure 2), which is one of the distinctive characteristics of the usage of a single EFM (see SI 8.1). Moreover, the continuously decreasing respiratory flux and increasing overflow flux indicate that two EFMs must be active and resources are re-allocated from respiration to fermentation proteins. Thus, according to our theory, at least two constraints must be active.
Figure 2: Overflow metabolism can be explained by growth rate maximisation under protein-concentration constraints. a) A core model with two EFMs (orange: respiration and blue: acetate overflow) that lead to cell synthesis. All considered reactions have an associated enzyme, whose activity depends on kinetic parameters and the metabolite concentrations. We varied growth rate by changing the external substrate concentration. Given this external condition, the growth rate was optimised under two enzymatic constraints (limited total enzyme, $\sum_i e_i \le 1$, and limited membrane area, $e_{\mathrm{transport}} \le 0.3$). b) The resulting substrate uptake flux directed towards respiration and fermentation is in agreement with experimental data scaled with respect to the growth rate ($\mu_{\mathrm{crit}}$) and uptake rate ($q_{\mathrm{crit}}$) at the onset of overflow 8,27,28 (see SI 8). Panel b axes: normalised growth rate ($\mu/\mu_{\mathrm{crit}}$) and substrate uptake ($q_S/q_{S,\mathrm{crit}}$); curves: total uptake rate ($q_S$), $q_S$ directed at respiration, and $q_S$ directed at fermentation.

For illustration purposes, we constructed a core model of overflow metabolism that includes enzyme kinetics for each step, a respiration and an acetate overflow branch, and imposed constraints on two protein pools: total cytosolic protein, and total membrane protein. Figure 2 shows that overflow metabolism can be the outcome of the cell's strategy to maximise its growth rate under these two protein-concentration constraints. At low glucose concentrations, the constrained membrane pool limits substrate uptake and therefore favours efficient use of glucose via respiration. In this respiratory phase, the cytosolic pool could be considered in excess, but in the model the expression of extra cytosolic proteins can nonetheless reduce product inhibition of the transporter pools. Consequently, for a large range of external substrate concentrations pure respiration leads to the highest growth rate by fully exploiting the two available enzyme pools.
As glucose concentrations increase, so does the saturation level of the enzymes, as observed experimentally 29,30 .
At higher glucose concentrations, however, transport requires relatively less protein and the respiration cost vector becomes below-diagonal: pure respiration will leave the membrane protein pool underused, because the cytosolic pool limits respiration. A better strategy is to respire less and make some of the cytosolic pool available for another EFM that can exploit the underused membrane pool. The net outcome is that a mixture of EFMs attains a higher growth rate than either of the two EFMs alone. This explains the onset of overflow metabolism from a basic principle: optimisation of growth rate under a set of constrained protein pools.
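The switch from pure respiration to a respiration/overflow mixture can be reproduced in a deliberately crude sketch. This is not the core model of Figure 2: the cost coefficients, the saturation constant `K_M`, and the assumption that only the membrane (transport) cost depends on the external substrate concentration `s` are all hypothetical choices made for illustration.

```python
K_M = 1.0  # hypothetical transporter saturation constant (arbitrary units)

def cost_vectors(s):
    """(membrane cost, cytosolic cost) per unit cell synthesis flux at substrate level s."""
    inv_saturation = 1.0 + K_M / s        # less saturated transporter -> more protein needed
    resp = (0.2 * inv_saturation, 1.0)    # efficient in substrate, expensive in cytosol
    ferm = (0.8 * inv_saturation, 0.5)    # needs more uptake, cheaper in cytosol
    return resp, ferm

def optimal_mix(s):
    """Return (flux via respiration EFM, flux via overflow EFM) maximising their sum."""
    resp, ferm = cost_vectors(s)
    candidates = [(1.0 / max(resp), 0.0), (0.0, 1.0 / max(ferm))]
    det = resp[0] * ferm[1] - ferm[0] * resp[1]
    if abs(det) > 1e-12:
        # Mixture that exhausts both pools (both pool sizes normalised to 1).
        lam_r = (ferm[1] - ferm[0]) / det
        lam_f = (resp[0] - resp[1]) / det
        if lam_r >= 0 and lam_f >= 0:
            candidates.append((lam_r, lam_f))
    return max(candidates, key=sum)

for s in (0.05, 0.1, 0.2, 0.3, 1.0, 10.0):
    lam_r, lam_f = optimal_mix(s)
    print(f"s={s:5.2f}  growth={lam_r + lam_f:.3f}  respiration={lam_r:.3f}  overflow={lam_f:.3f}")
```

In this sketch the overflow EFM appears once the respiration cost vector crosses the diagonal, i.e. once respiration alone would be limited by the cytosolic pool and leave membrane capacity unused.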
Explanations for overflow metabolism offered by other modelling methods, such as coarse-grained whole cell models 8,9,21 , genome-scale M 23,31-33 and ME models 34 , are discussed in the supplemental information. There we show that mathematically all of them are instances of the same constrained optimisation problem and thus follow the extremum principle.
The constraint plane formalism can predict how experimental perturbations influence overflow metabolism. These perturbations can often be reinterpreted as either reducing a constrained protein pool or increasing the resource requirement of an EFM. In both cases, this can be visualized by a reduction of the size of the constraint box. Experimental data from a perturbation experiment can be used to deduce which protein pools were affected in the perturbation. This mechanistic insight informs us about the origins of overflow metabolism, as is illustrated in Figure  3a-d where we predict the effect of the reduction of both pools or only the first (x-axis) constrained pool (see SI 6 for a mathematical analysis).
Overexpression of the unneeded protein LacZ (Figure 3e) seems to reduce both enzyme pools equally. To find the most straightforward explanation, recall that the cost vectors denote the needed fraction of the constrained protein pools for one unit cell synthesis flux. The synthesis of LacZ requires a part of both pools that cannot be invested in useful proteins. Since LacZ can be considered an average protein in terms of resource requirements, and metabolism was already tuned to make proteins, the reduced parts of both pools will be equal.
The addition of chloramphenicol inhibits translation and the cell therefore needs a larger amount of ribosomes per unit flux. This again adds a cost for cell synthesis, thereby reducing both pools. The experiment however shows that chloramphenicol has a more dominant effect on the first pool (x-axis) than on the second pool (Figure 3f). This means that the increased number of ribosomes has an additional effect on this first pool, which could well be related to the large cytosolic volume that the ribosomes take up.
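The two kinds of perturbation discussed above can be compared in a small numerical sketch. The cost vectors `D_CO2` and `D_AC` and the pool sizes are hypothetical numbers chosen for illustration, not values fitted to the experiments; the point is only the qualitative pattern: shrinking both pools equally rescales both fluxes, whereas shrinking only the pool most used by the respiratory EFM first raises and then lowers the overflow flux.

```python
# Hypothetical cost vectors: (fraction of pool 1, fraction of pool 2) per unit flux.
D_CO2 = (0.8, 0.2)   # respiratory EFM, uses pool 1 most
D_AC = (0.3, 0.9)    # overflow EFM, uses pool 2 most

def optimal_fluxes(c1, c2):
    """Return (flux via CO2 EFM, flux via Ac EFM) maximising their sum for pool sizes c1, c2."""
    a, b = D_CO2, D_AC
    candidates = [
        (min(c1 / a[0], c2 / a[1]), 0.0),   # CO2 EFM only
        (0.0, min(c1 / b[0], c2 / b[1])),   # Ac EFM only
    ]
    det = a[0] * b[1] - b[0] * a[1]
    lam_co2 = (c1 * b[1] - b[0] * c2) / det  # Cramer's rule: both pools exhausted
    lam_ac = (a[0] * c2 - a[1] * c1) / det
    if lam_co2 >= 0 and lam_ac >= 0:
        candidates.append((lam_co2, lam_ac))
    return max(candidates, key=sum)

print("shrinking both pools equally (LacZ-like perturbation):")
for size in (1.0, 0.85, 0.7):
    print(f"  pools={size:.2f}  fluxes={optimal_fluxes(size, size)}")

print("shrinking pool 1 only:")
for c1 in (1.0, 0.8, 0.6, 0.4, 0.2, 0.1):
    lam_co2, lam_ac = optimal_fluxes(c1, 1.0)
    print(f"  pool1={c1:.1f}  CO2={lam_co2:.3f}  Ac={lam_ac:.3f}")
```

Comparing measured flux responses with these two patterns is one way to read off which pool a perturbation acts on, under the assumption that the cost vectors themselves are not changed by the perturbation.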
Figure 3: [...] We predict that perturbations that tighten the protein pool most used by one EFM (here denoted by CO2) first lead to an increase in flux through the other EFM (Ac), and a subsequent decrease as its flux also becomes constrained by the perturbation. f) This behaviour is observed, among others, for translation inhibitor experiments using chloramphenicol (SI 7).

The co-consumption of substrates is another universal phenomenon that can be analysed with our EFM-based perspective. A comparison of the costs for mixed and single substrate usage determines whether a growth-optimised microorganism shows catabolite repression or not. In E. coli, transport-mediated mixed substrate usage was observed in medium containing combinations of excess carbon sources 35 . Combinations of substrates that enter upstream of glycolysis with substrates that enter downstream often gave rise to a higher growth rate than can be reached on the substrates individually. Hermsen et al. showed that growth rates on these combinations can be accurately predicted with a core model. We performed a genome-scale EFM-analysis (SI 10) and found Elementary Flux Modes that use combinations of an upper- and a lower-glycolytic substrate. However, we also found EFMs that combine two upper-glycolytic substrates, indicating that co-consumption of two of these substrates could be optimal. We therefore confirmed and extended the growth experiments from Hermsen et al. and indeed found co-consumption of all combinations of mannose, maltose, xylose and succinate (SI 11). In Figure 4 we analyse our experimental results, by estimating the position of the cost vectors from the experimental data. The positions of these vectors determine whether xylose is not consumed (+glucose), co-consumed without a measurable growth rate advantage (+maltose) or co-consumed with a growth rate advantage (+succinate). Combining these vector positions with the metabolic network allows for a new perspective: the differences between cost vectors can only be caused by the non-overlapping parts of the EFMs, since the enzymatic costs in the overlapping parts are equal. Thus, larger gains in growth rate can be expected if the network distance between the substrates is larger.
This analysis can also be applied to co-consumption of glucose and ethanol by S. cerevisiae. Its metabolic network includes several EFMs that simultaneously take up glucose and ethanol. When external glucose levels decrease, the costs of the glycolytic reactions increase, making it increasingly favorable to stop these reactions and use a new EFM that co-consumes glucose and ethanol, but makes a larger part of the cell components from ethanol. The experimental finding 3 of a sequential use of several EFMs, can therefore be readily explained using the constraint space, see SI 9.2.
The extremum principle that we derived and illustrated in this work, predicts that evolution of metabolic regulation proceeds by fixation of mutations that influence enzyme kinetics to increase the rate per unit enzyme -i.e., evolution shortens cost vectors of EFMs. Resources are reallocated to those 'efficient' enzymes at the expense of others that are less active per unit enzyme -i.e., evolution reduces the number of active EFMs. This shows a fundamental limit in microbial evolution: under constant conditions, a metabolic state is selected that uses only a small number of EFMs, or even only one, that use the available resources from all constrained enzyme pools.
Figure 4 (preliminary data): The (dis)advantage of co-consuming xylose in terms of cost vectors. Left: We show, in an illustration of the metabolic network of E. coli, EFMs that co-consume xylose with another carbon source. The reactions in the shaded regions can be switched off when xylose is co-consumed, but are otherwise essential. The enzymatic costs for these reactions can thus be traded off for the costs of xylose uptake. This will change the cost vector positions and determine whether co-consumption increases the growth rate. Right: The positions of the cost vectors can be estimated from our experimental data: the growth rate determines the place of the projection on the diagonal (big dots), and acetate excretion determines the relative angles with the diagonal of the respiration and fermentation cost vector.

The extremum principle determines the evolutionary direction of microbial metabolism. Even if growth-rate maximisation at constant conditions is at best a crude approximation of the selective pressure at particular instances of evolutionary history, we expect that it nonetheless provides an 'evolutionary arrow of time' over long time scales. When conditions change frequently, other aspects might come into play and fitness will be captured by the mean growth rate over environments, i.e., the geometric growth rate 5 . Whether extremum principles hold for the maximisation of geometric growth rate is an open problem for future theoretical work.
The extremum principle is a null hypothesis about the course of a particular evolutionary process 36 . This has direct operational implications for evolutionary engineering strategies, for example in industrial biotechnology where co-consumption of different sugars from biomass hydrolysates is pursued, or prevention of overflow metabolism during heterologous protein production is attempted. Our extremum principle provides a species-overarching molecular, constraint-based perspective on the systemic capabilities of biological networks and their contributions to microbial fitness. We hope that it provides explanatory and predictive power like other extremum principles in science have done, such as the second law of thermodynamics.