
## Abstract

In this paper we try to describe all possible molecular states (phenotypes) for a cell that fabricates itself at a constant rate, given its enzyme kinetics and the stoichiometry of all reactions. For this, we must understand the process of cellular growth: steady-state self-fabrication requires a cell to synthesize all of its components, including metabolites, enzymes and ribosomes, in proportions that match its own composition. Simultaneously, the concentrations of these components affect the rates of metabolism and biosynthesis, and hence the growth rate. We here derive a theory that describes all phenotypes that solve this circular problem. All phenotypes can be described as a combination of minimal building blocks, which we call Elementary Growth Modes (EGMs). EGMs can be used as the theoretical basis for all models that explicitly model self-fabrication, such as the currently popular Metabolism and Expression models. We then use our theory to make concrete biological predictions. We find that natural selection for maximal growth rate drives microorganisms to states of minimal phenotypic complexity: only one EGM will be active when growth rate is maximised. The phenotype of a cell is only extended with one more EGM whenever growth becomes limited by an additional biophysical constraint, such as a limited solvent capacity of a cellular compartment. The theory presented here extends recent results on Elementary Flux Modes: the minimal building blocks of cellular growth models that lack the self-fabrication aspect. Our theory starts from basic biochemical and evolutionary considerations, and describes unicellular life, both in growth-promoting and in stress-inducing environments, in terms of EGMs.

## Author summary

A factory that produces identical factories with all of its machinery in the right proportions would be a wondrous thing. This is exactly what growing cells do: they self-fabricate their constituents (metabolites, enzymes, lipids, DNA) to form new cells. The production of these compounds leads to the expansion of cellular volume, and gives rise to a growth rate. To self-fabricate sustainably, also called “balanced growth”, all compounds should be produced at exactly this growth rate. How to design such a cellular factory, given a list of all machines (enzymes encoded on the genome) and their properties (enzyme kinetics)? This is a difficult circular problem, since the expressed enzymes both determine what is produced (because they catalyse the production), and what should be produced (since all expressed enzymes should be produced to fabricate an identical cell). Using a mathematical approach, we identified the minimal enzyme expression patterns needed for cellular growth: Elementary Growth Modes (EGMs). The EGMs form the basic units of all sustainable phenotypes, and we prove that usually only one of them will be selected by evolution in static conditions. Our theory of self-replication therefore forms a quantitative, biochemically-rooted, basis of cellular growth phenotypes in all unicellular life forms.

**Citation:** de Groot DH, Hulshof J, Teusink B, Bruggeman FJ, Planqué R (2020) Elementary Growth Modes provide a molecular description of cellular self-fabrication. PLoS Comput Biol 16(1): e1007559. https://doi.org/10.1371/journal.pcbi.1007559

**Editor:** Christoph Kaleta, Christian Albrechts Universitat zu Kiel, GERMANY

**Received:** May 9, 2019; **Accepted:** November 22, 2019; **Published:** January 27, 2020

**Copyright:** © 2020 de Groot et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability:** All relevant data are within the manuscript and its Supporting Information files.

**Funding:** This work was supported by NWO VICI grant 865.14.005 and by Era-Industrial Biotechnology project nr. 053.80.772. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Competing interests:** The authors have declared that no competing interests exist.

## Introduction

One of the defining aspects of living cells is that they fabricate themselves from simple chemical compounds, a process often referred to as cell growth or replication. Microorganisms can do this in a multitude of different environments, using various food sources and while facing various stresses. Understanding the molecular players and mechanisms of self-fabrication is the realm of (molecular) microbiology; understanding the design principles underlying the regulatory adaptation processes, and how phenotype emerges from the molecular level, is the key question in systems biology. In evolutionary biology, the impact of phenotypic strategies on fitness is evaluated. Connecting these fields, and providing what is sometimes called the genotype-to-phenotype map, is a grand challenge in biology.

The aim of this paper is to make a start with just that: we provide a theory to predict the molecular mechanistic description of how a cell maintains itself and how it grows, in terms of all its biosynthetic processes and stress-managing systems. This may seem an impossible task, but when we restrict ourselves to steady state exponential, or balanced, growth [1], this becomes a feasible endeavour, as we will show. In such a balanced growth state, all intrinsic properties of a population of cells, such as distributions of molecule concentrations and reaction rates, are time-invariant. Under these circumstances, a molecular mechanistic description of a cellular phenotype amounts to listing all reaction rates and all the concentrations of cellular components. The theory we develop allows us to make concrete predictions of such phenotypes, on the basis of the chemical properties of the compounds (metabolites, enzymes, etc.) involved.

Arguably the most important phenotypic trait of growing microbial cells is their growth rate, since it directly determines the number of offspring cells synthesised per unit time. In growth-promoting conditions, the vast majority of a cell’s metabolic energy and resources is invested into growth and the production of new cells. There is also a direct evolutionary premium on fast growth: under constant conditions, specific growth rate is the direct determinant of fitness. We are therefore interested in understanding how a certain cellular phenotype gives rise to its steady state exponential growth rate.

To attain higher growth rates, the rates of enzyme synthesis need to be higher, both to achieve higher enzyme concentrations and thus higher reaction rates, but also to counter higher dilution rates by cell volume growth. Higher enzyme synthesis rates require higher numbers of ribosomes [2], but since the ribosomes also produce these additional ribosomes, even more ribosomes are needed. The relative abundance of ribosomes and enzymes will therefore inevitably change with increasing growth rate [2], turning cellular growth into a nonlinear phenomenon. Since it is the ribosome that is responsible for the synthesis of protein, including ribosomes, the “control space” of the cell can ultimately be viewed as a ribosomal allocation space, with different fractions of the ribosome allocated to synthesise the different enzymes [3]. These fractions might be determined by passive competition for mRNA molecules produced by gene expression. The aim of this paper is to classify all possible balanced growth states of a cell in terms of those ribosomal fraction variables.

There is a long tradition of trying to model growth of (microbial) cells, and genome-scale metabolic network models in particular have come close to providing a comprehensive molecular description of cellular phenotypes [4]. Such models neglect enzyme kinetics and the synthesis of enzymes and ribosomes, which greatly facilitates the analysis.

In order to predict growth phenotypes in such models, growth rate is assumed to be proportional to the rate of a so-called biomass reaction: a virtual reaction with fixed stoichiometry that is used to model the average demand for biomass components. This biomass reaction rate is maximised at steady state under constraints on fluxes, usually on input fluxes. This constitutes a linear optimisation problem with the individual fluxes as the optimisation (or control) variables, and is well known as Flux Balance Analysis (FBA; [5]). We understand the solution space of an FBA problem very well from a mathematical point of view [6–8]. This understanding was facilitated by the identification of a set of invariant building blocks of the solution space: the Elementary Flux Modes (EFMs) [9, 10].
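As a minimal sketch of the FBA formulation just described, the snippet below maximises the flux through a biomass reaction of a hypothetical toy network (not taken from this paper): uptake produces metabolite A, A is converted to B by either a one-to-one route (v1) or a higher-yield route A → 2 B (v2), and B is drained by the biomass reaction (v3). It assumes SciPy 1.6 or later, for the `highs` method of `scipy.optimize.linprog`.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (illustration only, not from the paper).
# Columns: v0 uptake (-> A), v1 (A -> B), v2 (A -> 2 B), v3 biomass (B ->)
S = np.array([
    [1, -1, -1,  0],   # steady-state balance of metabolite A
    [0,  1,  2, -1],   # steady-state balance of metabolite B
])
bounds = [(0, 10), (0, None), (0, None), (0, None)]  # uptake flux capped at 10

# linprog minimises, so maximise the biomass flux v3 by minimising -v3
c = np.array([0, 0, 0, -1])
res = linprog(c, A_eq=S, b_eq=np.zeros(2), bounds=bounds, method="highs")
fluxes = res.x
```

In this toy case the optimum routes all uptake through the higher-yield route v2 and leaves v1 unused, so the optimal flux vector is a single EFM.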

In recent years, cellular resource management has become an important concept for improving our understanding of cellular physiology [2, 3, 11, 12], even hinting at true ‘bacterial growth laws’ [13]. The additional insight offered by cellular resource management has been driving the field from classical FBA to “resource”-FBA type approaches [14–16]. In these approaches a virtual biomass reaction is still optimised, but by introducing enzyme kinetics (either the full nonlinear enzyme saturation function, or only a maximal rate ($k_{\mathrm{cat}}$) per enzyme) the constraints now act on the enzyme concentrations instead of on the fluxes. This creates an optimisation problem with the enzyme concentrations as the optimisation variables, a problem that is nonlinear when nonlinear enzyme kinetics are introduced. The solution space, and the optimal solutions, can still be understood in terms of EFMs [17–19].
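The $k_{\mathrm{cat}}$-only variant of resource-FBA can be sketched on a hypothetical toy network (uptake v0 producing A, two routes A → B (v1) and A → 2 B (v2), and a biomass drain v3 on B) by adding enzyme concentrations as variables, with $v_j \le k_{\mathrm{cat},j} e_j$ and a total enzyme budget. The turnover numbers, the budget, and the choice to treat the biomass reaction as uncatalysed are assumptions made for illustration; SciPy 1.6 or later is assumed for `linprog(method="highs")`.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network: v0 uptake (-> A), v1 (A -> B), v2 (A -> 2 B), v3 biomass (B ->)
S = np.array([[1, -1, -1,  0],
              [0,  1,  2, -1]])
kcat = np.array([10.0, 10.0, 1.0])   # assumed turnover numbers of enzymes e0, e1, e2
E_total = 1.0                        # assumed total enzyme budget

# Decision variables: [v0, v1, v2, v3, e0, e1, e2]; v3 is a virtual, uncatalysed reaction
n_v, n_e = 4, 3
A_eq = np.hstack([S, np.zeros((2, n_e))])   # steady state: S v = 0
A_ub = np.zeros((n_e + 1, n_v + n_e))
for j in range(n_e):
    A_ub[j, j] = 1.0                        # v_j - kcat_j * e_j <= 0
    A_ub[j, n_v + j] = -kcat[j]
A_ub[n_e, n_v:] = 1.0                       # e0 + e1 + e2 <= E_total
b_ub = np.r_[np.zeros(n_e), E_total]

c = np.zeros(n_v + n_e)
c[3] = -1.0                                 # maximise the biomass flux v3
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=np.zeros(2),
              bounds=[(0, None)] * (n_v + n_e), method="highs")
v, e = res.x[:n_v], res.x[n_v:]
```

Here the enzyme budget, rather than an uptake bound, limits the biomass flux, and the optimum switches to the lower-yield but enzymatically cheaper route v1 (v1 = v3 = 5, v2 = 0); the optimal flux vector is again a single EFM.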

The concepts of cellular resource management have been further exploited in a number of modelling approaches in which metabolic flux is explicitly coupled to the synthesis of enzymes, which in turn is coupled to the presence of ribosomes and their required substrates (amino acids and a source of free energy). In these approaches the demand for biomass components is a variable that is calculated in the model, rather than a fixed biomass reaction. So far, such models, be it core models that include kinetics [3, 11], or augmented genome-scale metabolic and expression (ME) models [20–22], have been used for simulation studies and produce biologically relevant regulatory behaviours, such as overflow metabolism [3] or catabolite repression [23]. However, they differ in the nature of the imposed constraints, and still ignore (different) parts of the self-fabrication process. More importantly perhaps, there is a lack of understanding of the mathematical structure of the solutions to all these optimal resource allocation problems, an understanding we do have for FBA-type problems.

In this paper, we develop theory for this next generation of growth models. Our theory identifies a set of invariant building blocks for the steady-state solution space of a self-fabricating biochemical system, including all metabolites, enzymes, ribosomes, and their synthesis rates. We have called them Elementary Growth Modes, as they play a role similar, though not identical, to that of EFMs in the metabolic network optimisation problem.

The structure of this paper is as follows. After introducing the relevant notation, we focus on a population of growing cells in balanced growth, and start by giving a definition of the growth rate in terms of metabolic rates inside these cells. Then we introduce the class of whole-cell models studied in this paper, and derive a set of relations that have to hold in balanced growth and which feature the control variables of the cell most transparently. The main remaining aim of the paper is then to study the solution space of these relations in terms of the control variables.

We first introduce Elementary Growth States (EGSs) and show that all balanced growth states, at a given growth rate and a fixed set of metabolite concentrations, can be formed by taking suitable convex combinations of such states. EGSs with the same participating reactions, but at different growth rates and metabolite concentrations, may be identified with each other, and such a class of EGSs is termed an Elementary Growth Mode (EGM). We then show that, if no additional biophysical constraints are introduced, balanced growth rate is maximised in exactly one such EGM. If multiple biophysical constraints are active, a mixture of EGSs may arise as the growth-rate maximiser.

Finally, it is a natural question to ask how the new EGSs and EGMs relate to the older EFMs. We show that under one additional biological assumption, each EGM can be mapped to a unique growth-sustaining EFM. We indicate using experimental data that this biological assumption is indeed borne out.

At the end of each section, we give a brief description of the biological interpretation and consequences of the results derived. We have tried to keep these separate from the mathematical theory, to aid both the more mathematically and the more biologically inclined reader. A fully worked out example of the theory may be found in the S1 Text.

## Methods

### Notation

Vectors are denoted by boldface, e.g., $\mathbf{v}$. The vector $\mathbf{u}_j$ denotes the $j$-th elementary vector, filled with zeros except for the $j$-th element, which is 1. The dot product between vectors $\mathbf{v}$ and $\mathbf{w}$ is denoted by $\mathbf{v}\cdot\mathbf{w}$, and $|\mathbf{v}|$ denotes $\sum_i |v_i|$, which in this paper always simplifies to $\sum_i v_i$ because the vectors in question have nonnegative entries. Time derivatives are denoted by overdots, e.g., $\dot{x} = \mathrm{d}x/\mathrm{d}t$. The inequality $\mathbf{v} \geq \mathbf{0}$ should be interpreted as $v_j \geq 0$ for all $j$. Modelling assumptions are labelled as A1, A2, etc. An overview of all variables and parameters used in this paper can be found in S1 Table.

### Linking metabolic activity to the growth rate

The following is closely related, and in part identical, to the exposition in [24].

Let $\mathbf{n}(t) = (n_1(t), \dots, n_K(t))$ be the vector of copy numbers (in moles) of $K$ compounds in a population of cells with total volume $V(t)$ at time $t$. The concentration of compound $k$ is defined as $c_k(t) := n_k(t)/V(t)$. The modelling assumption is that changes in copy numbers of the compounds are due to chemical reactions which take place in a well-stirred cell [25],

$$\dot{\mathbf{n}} = V N \mathbf{v}(\mathbf{c}). \tag{1}$$

Here $N$ denotes the stoichiometry matrix, and $v_j(\mathbf{c})$ is the $j$-th reaction rate, as a function of concentrations $\mathbf{c}$. Since $n_k = V c_k$, we have

$$\frac{\mathrm{d}(V\mathbf{c})}{\mathrm{d}t} = \dot{V}\mathbf{c} + V\dot{\mathbf{c}} = V N \mathbf{v}(\mathbf{c}). \tag{2}$$

In balanced growth, all concentrations are constant over time [26], $\dot{\mathbf{c}} = \mathbf{0}$, which means that

$$\frac{\dot{V}}{V}\,\mathbf{c} = N\mathbf{v}(\mathbf{c}). \tag{3}$$

To relate the growth rate of the volume of a cell to all the metabolic reactions making new cell material, we have to relate cell volume to the molecule copy numbers. We consider experimental conditions such that the volume is a function of all the copy numbers only, $V(t) = V(\mathbf{n}(t))$. In particular, we assume (see S1 Text for justification and detail) that $V$ depends only linearly on copy numbers,

$$V(t) = \sum_{k=1}^{K} \rho_k n_k(t), \tag{4}$$

which is equivalent to

$$\sum_{k=1}^{K} \rho_k c_k(t) = 1. \tag{5}$$

The (instantaneous) growth rate is now defined as

$$\mu := \frac{1}{V}\frac{\mathrm{d}V}{\mathrm{d}t}. \tag{6}$$

It directly follows (see S1 Text) that at balanced growth

$$\mu = \sum_{l=1}^{K} \rho_l \sum_{j} N_{lj}\, v_j(\mathbf{c}). \tag{7}$$

(This derivation differs from [24] only in the exact form of the “constant density” assumption (A2). In our case, each type of molecule contributes its own specific volume, instead of an overall constant biomass concentration being assumed. As explained in the S1 Text, (A2) is not directly based on data, but on the requirement that the steady state Eq (3) have solutions at all.)

Since *ρ*_{l} equals the molar volume of molecule *l*, relation (7) simply states that the growth rate of a cell equals the total volume synthesis rate of all its reactions per unit volume. This total volume synthesis rate is calculated by summing the volumes of all synthesized metabolites in all reactions per unit time.

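Eq (2) is equivalent to

$$\dot{\mathbf{c}} = N\mathbf{v}(\mathbf{c}) - \mu\,\mathbf{c}, \tag{8}$$

which is the familiar form showing reactions and dilution [3, 24]. If we substitute the balanced growth rate (7) into (8) at steady state conditions, we obtain

$$0 = \sum_{j} N_{kj}\, v_j(\mathbf{c}) - c_k \sum_{l=1}^{K} \rho_l \sum_{j} N_{lj}\, v_j(\mathbf{c}), \qquad k = 1, \dots, K. \tag{9}$$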

This equation shows that at steady-state, the net production of molecule *k* (first term) should balance the dilution of molecule *k* by cellular growth, growth that is due to the volume contribution of all produced molecules in the cell (second term). We could not analyse these equations further to understand the structure of steady state solutions, without additional mathematical assumptions.
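Eqs (7) and (9) can be checked numerically for given rate values. The sketch below treats the rates $v_j$ as fixed numbers rather than as functions of $\mathbf{c}$, and uses a hypothetical two-compound system (uptake of a precursor p, then polymerisation 2 p → q) with assumed molar volumes; `growth_rate` and `balance_residual` are illustrative helper names, not functions from the paper.

```python
import numpy as np


def growth_rate(N, v, rho):
    """Eq (7): mu = sum_l rho_l (N v)_l, the volume synthesis rate per unit volume."""
    return float(rho @ (N @ v))


def balance_residual(N, v, rho, c):
    """Right-hand side of Eq (9): net production minus dilution, N v - mu c."""
    return N @ v - growth_rate(N, v, rho) * c


# Hypothetical example: reaction 1 imports p, reaction 2 converts 2 p -> q.
N = np.array([[1.0, -2.0],
              [0.0,  1.0]])
rho = np.array([0.1, 0.3])   # assumed molar volumes of p and q
c = np.array([1.0, 3.0])     # concentrations satisfying rho . c = 1, cf. Eq (5)
v = np.array([3.5, 1.5])     # rates chosen so that N v is proportional to c

mu = growth_rate(N, v, rho)
res = balance_residual(N, v, rho, c)
```

With these numbers $\mu = 0.5$ and the residual vanishes. Changing $v_1$ to 4.0 gives a nonzero residual of (0.45, -0.15): the rates no longer produce every compound in proportion to its concentration, so this is not a balanced growth state.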

#### Biological interpretation and consequences.

We related the growth rate of a population of cells to the volume changes due to the activities of metabolic reactions. As such, we showed that growth is due to the production of cell components, such as DNA, RNA, lipids and proteins, from extracellular nutrients. Our definition of growth rate is closely related to how growth rate is measured, using for instance optical density measurements.

In our derivation, we assumed that $V(t) = V(\mathbf{n}(t))$, which means that the contents of a cell are approximated by an ideal solution [27]. We realise that this is likely an oversimplifying assumption [28, 29], but one that underlies many models of cell growth [24, 26]. We note in the S1 Text that balanced growth becomes much harder to rationalise when the assumption $V(t) = \sum_k \rho_k n_k(t)$ is invalid, since then the balanced growth equations generally have no solutions.

The theory so far concerns the description of copy number and volume changes in a population of cells in balanced growth, i.e., growing at a fixed rate such that all extensive properties increase exponentially and all intensive properties are constant [26]. The relation between the properties of a population of cells and those of its individual cell members can be found in [30].

### Introducing a whole-cell model of self-fabricating cells

To obtain a better understanding of how the molecular state of a cell, its ‘phenotype’, relates to its growth rate, we have to add more biological information to the general model.

We distinguish concentrations of metabolites $\mathbf{x}$, enzymes $\mathbf{e}$ and ribosomes $r$. We ignore several biological entities, especially mRNA and genes. Genes are implicitly defined by the reaction stoichiometries occurring in the stoichiometric matrix; they are neither synthesized nor degraded. DNA itself may be synthesized as a macromolecule in our description. Messenger RNA is not synthesized in our model, but it does play a role, as will become clear later. We note that the model can be extended with mRNA without qualitative changes. We left it out for simplicity and because, in terms of volume, it is a minor cellular component, in contrast to rRNA, which can be considered as a regular macromolecule.

We note that in all population perspectives on cell growth, including ours, the division and birth process is generally not incorporated as a molecular process. All cells are growing and dividing asynchronously, but in the model description they are considered in the same state of producing cell components. At each moment in time, a fixed fraction of cells is dividing in a population of cells at balanced growth [30]; considering their activities would require a different modelling formalism than the one pursued in this paper. We have chosen this formalism because it is the overarching description of all genome-scale modelling formalisms [4, 22] that are currently in use in systems biology.

The dynamical system for all concentrations $\mathbf{c} = (\mathbf{x}, \mathbf{e}, r)$ is still given by (8). We subdivide the stoichiometric matrix $N$ into parts, corresponding to the two levels of “metabolism” and “enzymes and ribosome synthesis”, as follows,

$$N = \begin{pmatrix} P & -M \\ 0 & I \end{pmatrix}. \tag{10}$$

Here $P$ is an $m \times n$ matrix (the usual stoichiometric matrix in a metabolic pathway), with entries generally of different signs. The (stoichiometric) number of metabolites needed for the synthesis of each enzyme is recorded in $M$, an $m \times (n+1)$ matrix with mostly positive (or zero) entries. The only negative entries in $M$ correspond to metabolites that are produced when enzymes are synthesized, such as ADP. The $(n+1) \times (n+1)$ identity matrix, $I$, denotes that each enzyme, and the ribosome, is made by exactly one synthesis reaction. The vector of reaction rates is also split, into metabolic rates $\mathbf{v} = (v_1, \dots, v_n)$ and synthesis rates $\mathbf{w}$, consisting of $n$ enzyme synthesis rates $(w_1, \dots, w_n)$ and a ribosome synthesis rate $w_{n+1}$.
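The block structure of Eq (10) can be assembled directly with NumPy. In this sketch, `whole_cell_stoichiometry` and the small $P$ and $M$ matrices are hypothetical; the final lines confirm that the metabolite rows of $N(\mathbf{v}, \mathbf{w})$ give $P\mathbf{v} - M\mathbf{w}$ and that the last $n+1$ rows simply return the synthesis rates $\mathbf{w}$.

```python
import numpy as np


def whole_cell_stoichiometry(P, M):
    """Assemble N = [[P, -M], [0, I]] as in Eq (10).

    P: (m, n) metabolic stoichiometry.
    M: (m, n+1) metabolite cost of each enzyme (columns 0..n-1) and the ribosome (column n).
    Rows of N: m metabolites, then n enzymes and the ribosome.
    Columns of N: n metabolic reactions, then n+1 synthesis reactions.
    """
    P, M = np.asarray(P, float), np.asarray(M, float)
    m, n = P.shape
    if M.shape != (m, n + 1):
        raise ValueError("M must have shape (m, n+1)")
    return np.block([[P, -M],
                     [np.zeros((n + 1, n)), np.eye(n + 1)]])


# Hypothetical example: m = 2 metabolites, n = 2 enzymes (numbers are illustrative)
P = [[1, -1],
     [0,  1]]
M = [[0, 0, 0],
     [3, 5, 20]]                   # amino-acid cost of enzyme 1, enzyme 2, ribosome
N = whole_cell_stoichiometry(P, M)
v = np.array([2.0, 1.0])           # metabolic rates
w = np.array([0.1, 0.2, 0.05])     # enzyme and ribosome synthesis rates
dc = N @ np.concatenate([v, w])    # production terms of Eq (8), before dilution
```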

The assumption underlying (10) is that each individual metabolic reaction $v_j(\mathbf{c})$ has a unique enzyme associated with it. Since enzyme $j$ catalyses metabolic reaction $j$, we assume, in agreement with basic enzyme kinetics [31], that

$$v_j(\mathbf{c}) = e_j\, f_j(\mathbf{x}), \qquad j = 1, \dots, n. \tag{11}$$

The ribosome catalyses the synthesis of all enzymes and of itself, and amino acids (which form part of the metabolites $\mathbf{x}$) are consumed in the process. We therefore choose

$$w_j(\mathbf{c}) = r\,\alpha_j\, g_j(\mathbf{x}), \qquad j = 1, \dots, n+1. \tag{12}$$

The rate laws $f_j(\mathbf{x})$ are assumed to be known nonlinear functions, and contain the catalytic rate parameter $k_{\mathrm{cat},j}$ for the enzyme $e_j$ in this notation; moreover, they incorporate the thermodynamic constraints imposed by the substrates and products of each enzyme. The $g_j(\mathbf{x})$ are likewise assumed to be known and contain $k_{\mathrm{cat},r}$. The linear dependence of the metabolic rates $v_j$ on enzyme concentration $e_j$ follows generally from quasi-steady-state type derivations of rate laws [31], and is a good approximation when enzymes are at concentrations that are much lower than the substrate or product concentrations. The linear dependence of enzyme and ribosome synthesis rates on ribosome level in (12) is assumed for the same reason. Both linear dependencies are crucial for the development of the theory.

The coefficient $\alpha_j$ is the fraction of the total ribosome pool that is allocated to produce enzyme $j$ (with $\alpha_{n+1}$ the fraction allocated to ribosome synthesis), and is the result of gene expression. A possible mechanism is that the different mRNA molecules passively compete to be translated by the ribosome. The fraction of the ribosome allocated to produce enzyme $j$ is then assumed to be proportional to the fraction of mRNA$_j$ over total mRNA (but not necessarily equal to it, since a certain fraction of the ribosome may, in principle, not be allocated to anything). Our theory does not depend on the specific mechanism that is used in cells, but is based solely on the assumption that the $\alpha_j$-factors can be influenced by gene expression. We have

$$\alpha_j \geq 0 \quad (j = 1, \dots, n+1), \qquad |\boldsymbol{\alpha}| = \sum_{j=1}^{n+1} \alpha_j \leq \lambda, \tag{13}$$

where $\lambda$ signifies the fraction of all the ribosomes that are actively involved in synthesis of new protein and ribosomes. We emphasise that the $\alpha_j$'s are the main control parameters of the cell in the perspective taken in this paper. By changing them, the cell changes phenotype.

Let us lastly introduce the molar volume parameters for the compounds. We denote these by $\rho_1, \dots, \rho_m$ for the $m$ metabolites, $\sigma_1, \dots, \sigma_n$ for the $n$ enzymes, and $\sigma_{n+1}$ for the ribosome. Assumption (A2) now reads, analogously to (5),

$$\sum_{k=1}^{m} \rho_k x_k + \sum_{j=1}^{n} \sigma_j e_j + \sigma_{n+1} r = 1. \tag{14}$$

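In summary, assumptions (A1)–(A6) give the following dynamical system for a whole-cell model,

$$\begin{aligned}
\dot{x}_k &= \sum_{j=1}^{n} P_{kj}\, e_j f_j(\mathbf{x}) - r \sum_{j=1}^{n+1} M_{kj}\, \alpha_j g_j(\mathbf{x}) - \mu x_k, \qquad k = 1, \dots, m,\\
\dot{e}_j &= r\, \alpha_j g_j(\mathbf{x}) - \mu e_j, \qquad j = 1, \dots, n,\\
\dot{r} &= r\, \alpha_{n+1} g_{n+1}(\mathbf{x}) - \mu r,\\
\mu &= \sum_{j=1}^{n} \Big(\sum_{k=1}^{m} \rho_k P_{kj}\Big) e_j f_j(\mathbf{x}) + r \sum_{j=1}^{n+1} \Big(\sigma_j - \sum_{k=1}^{m} \rho_k M_{kj}\Big) \alpha_j g_j(\mathbf{x}).
\end{aligned} \tag{15}$$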

External environmental concentrations, such as nutrients, are incorporated as parameters in the relevant *f*_{j}(** x**). An overview of the main ingredients of the model is given in Fig 1.

The top figure shows how the synthesis of cellular building blocks leads to cell growth, with rate *μ*. In a balanced growth state, all currently present cellular building blocks (denoted by the coloured shapes on the left side) should be synthesized in the right proportions (depicted on the right side). Our theory is as general as possible: it identifies all cellular make-ups that could give rise to a balanced growth state; the specific coloured shapes are for illustration purposes only. The bottom part shows three layers of nonlinearity that have to be incorporated in a self-fabrication model. 1) The rate of a reaction depends nonlinearly on the concentrations of substrates, products, and often on different compounds via (allosteric) regulation. 2) The dilution by growth of each compound is proportional to its concentration and to the growth rate. Since the growth rate depends on the synthesis rate of volume, this again depends on the concentrations of compounds that contribute to volume synthesis, such as the ribosome. Thus, the dilution of such compounds depends nonlinearly on their own concentrations. 3) *Self*-fabrication brings an inherent nonlinearity: the cellular composition should be tuned to produce the exact demand for precursors, but the demand for precursors is again determined by the cellular composition.

#### Biological interpretation and consequences.

The growth rate given in (15) has a natural biological interpretation. The overall growth rate of cells is a sum of the net relative volume increases due to metabolic reactions (including transport reactions, and internal reactions in which volumes of substrate molecules are replaced by volumes of product molecules), and the net volume change as the result of the conversion of metabolites into proteins and ribosomes. Since for many reactions the summed volume of the products will be approximately equal to the summed volume of the substrates, the extracellular exchange reactions will be the main driver of volume change (see also the worked-out example in S1 Text). The quantity $\sum_{k=1}^{m} \rho_k P_{kj}$ is the net change in volume due to the conversion of substrates into products in metabolic reaction $j$, and $\sigma_j - \sum_{k=1}^{m} \rho_k M_{kj}$ is the net volume change due to the production of one mole of enzyme $j$ from the relevant amino acids. These quantities will be called $a_j$ and $b_j$ below. For different choices of $\boldsymbol{\alpha}$, different sets of enzymes are synthesised, resulting in different growth rates.

Note that the dependence of *μ* on the concentrations forms an important aspect of our theory. If only the first three equations of (15) were to be solved, then we would model the behaviour of a set of concentrations in an exponentially growing volume. The expression for *μ* however indicates that the growth rate of this volume is dependent on this same set of concentrations. Cellular growth is thus really coupled to the chemical reactions in the cell.
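To illustrate this coupling, the sketch below integrates Eq (15) with forward Euler for a hypothetical toy model with one metabolite $x$ (an amino-acid pool), one enzyme $e$ (a transporter that imports $x$) and the ribosome $r$. All parameter values and the saturating form of $g_j$ are assumptions, not taken from the paper. The molar volumes are chosen so that every $b_j = 0$, so growth is driven entirely by the uptake reaction. Starting from a state that satisfies Eq (14), the trajectory keeps satisfying it and settles into a balanced growth state.

```python
# Hypothetical toy instance of Eq (15): one metabolite x, one enzyme e (transporter,
# so P = [[1]]) and the ribosome r. All values below are illustrative assumptions.
rho = 0.01                          # molar volume of x
M = [10.0, 50.0]                    # x consumed per enzyme / per ribosome
sigma = [rho * Mj for Mj in M]      # chosen so that every b_j = 0
k_t, k_r, K = 20.0, 10.0, 1.0       # uptake rate, ribosome rate, saturation constant
alpha = [0.5, 0.5]                  # ribosome fractions: enzyme, ribosome

a1 = rho * 1.0                                  # a_1 = rho * P_11
b = [sigma[j] - rho * M[j] for j in range(2)]   # b_j = sigma_j - rho * M_1j


def f1(x):
    """Uptake rate per transporter; constant external nutrient (assumed)."""
    return k_t


def g(x, j):
    """Synthesis rate per ribosome for product j (0 = enzyme, 1 = ribosome)."""
    return k_r / M[j] * x / (K + x)


def growth_rate(x, e, r):
    """Eq (16) for this toy model."""
    return a1 * e * f1(x) + r * sum(b[j] * alpha[j] * g(x, j) for j in range(2))


def volume(x, e, r):
    """Left-hand side of Eq (14)."""
    return rho * x + sigma[0] * e + sigma[1] * r


def simulate(x, e, r, dt=0.01, t_end=500.0):
    """Forward-Euler integration of the differential equations in Eq (15)."""
    for _ in range(int(round(t_end / dt))):
        mu = growth_rate(x, e, r)
        w = [r * alpha[j] * g(x, j) for j in range(2)]          # Eq (12)
        dx = e * f1(x) - (M[0] * w[0] + M[1] * w[1]) - mu * x
        de = w[0] - mu * e
        dr = w[1] - mu * r
        x, e, r = x + dt * dx, e + dt * de, r + dt * dr
    return x, e, r


x, e, r = simulate(x=10.0, e=1.0, r=1.6)   # initial state with volume(...) = 1
mu = growth_rate(x, e, r)
```

For these parameters the balanced state has $x = (89 + \sqrt{8321})/2 \approx 90.1$ and $\mu = \alpha_2 g_2(x) \approx 0.099$. The very large free metabolite pool is an artefact of the arbitrary parameter choice, not a prediction.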

## Results

### The balanced growth equations

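To make notation more concise, set

$$a_j := \sum_{k=1}^{m} \rho_k P_{kj} \quad (j = 1, \dots, n), \qquad b_j := \sigma_j - \sum_{k=1}^{m} \rho_k M_{kj} \quad (j = 1, \dots, n+1),$$

so that $\mu$ in (15) simplifies to

$$\mu = \sum_{j=1}^{n} a_j e_j f_j(\mathbf{x}) + r \sum_{j=1}^{n+1} b_j \alpha_j g_j(\mathbf{x}). \tag{16}$$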
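The definitions of $a_j$ and $b_j$ can be checked numerically. The sketch below draws random, hypothetical values for $P$, $M$, the molar volumes, and the values of $f_j(\mathbf{x})$ and $g_j(\mathbf{x})$ at one state, and confirms that Eq (16) gives the same growth rate as Eq (7) applied to the full stoichiometry of Eq (10). The identity is purely algebraic, so the numbers need not describe a real cell.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 2                                            # hypothetical network size
P = rng.integers(-1, 2, size=(m, n)).astype(float)     # metabolic stoichiometry
M = rng.integers(0, 4, size=(m, n + 1)).astype(float)  # synthesis costs
rho = rng.uniform(0.01, 0.1, size=m)                   # metabolite molar volumes
sigma = rng.uniform(0.5, 2.0, size=n + 1)              # enzyme and ribosome molar volumes

e = rng.uniform(0.1, 1.0, size=n)                      # enzyme concentrations
r = 0.3                                                # ribosome concentration
alpha = np.array([0.3, 0.3, 0.4])                      # ribosome allocation (n + 1 entries)
f = rng.uniform(1.0, 5.0, size=n)                      # f_j(x) evaluated at some state x
g = rng.uniform(1.0, 5.0, size=n + 1)                  # g_j(x) evaluated at the same x

# Net volume change per unit flux, as defined above Eq (16)
a = rho @ P                   # a_j = sum_k rho_k P_kj
b = sigma - rho @ M           # b_j = sigma_j - sum_k rho_k M_kj
mu_16 = a @ (e * f) + r * (b @ (alpha * g))            # Eq (16)

# The same growth rate from Eq (7), using N from Eq (10) and rates from (11)-(12)
N = np.block([[P, -M], [np.zeros((n + 1, n)), np.eye(n + 1)]])
rates = np.concatenate([e * f, r * alpha * g])
mu_7 = np.concatenate([rho, sigma]) @ (N @ rates)
```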

We now derive a set of identities that need to hold at balanced growth, i.e., when $\dot{\mathbf{x}} = \mathbf{0}$, $\dot{\mathbf{e}} = \mathbf{0}$ and $\dot{r} = 0$. The aim is to write them in such a way that the variables under control of the cell, the ribosomal allocation parameters $\alpha_j$, stand out.

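Since $\dot{e}_j = 0$, $\mu e_j = r \alpha_j g_j(\mathbf{x})$, and the definition of $\mu$ in (16) may be rewritten as the following equation for $\mu$,

$$\mu = r \left[ \sum_{j=1}^{n} \frac{a_j f_j(\mathbf{x})}{\mu} \alpha_j g_j(\mathbf{x}) + \sum_{j=1}^{n+1} b_j \alpha_j g_j(\mathbf{x}) \right], \tag{17}$$

which allows us to compute a steady state value for $r$ in terms of $\mathbf{x}$, $\alpha_1, \dots, \alpha_{n+1}$ and $\mu$,

$$r = \frac{\mu}{\displaystyle \sum_{j=1}^{n} \frac{a_j f_j(\mathbf{x})}{\mu} \alpha_j g_j(\mathbf{x}) + \sum_{j=1}^{n+1} b_j \alpha_j g_j(\mathbf{x})}. \tag{18}$$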

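Demanding $\dot{x}_k = 0$ for $k = 1, \dots, m$ in Eq (15) gives another $m$ identities for $\mu$,

$$\mu = \frac{r}{x_k} \left[ \sum_{j=1}^{n} \frac{P_{kj} f_j(\mathbf{x})}{\mu} \alpha_j g_j(\mathbf{x}) - \sum_{j=1}^{n+1} M_{kj} \alpha_j g_j(\mathbf{x}) \right]. \tag{19}$$

With Eqs (17) and (19), we now have two different expressions for $\mu/r$ that should be equal whenever $x_k \neq 0$. This yields

$$\sum_{j=1}^{n} \frac{(P_{kj} - x_k a_j)\, f_j(\mathbf{x})}{\mu} \alpha_j g_j(\mathbf{x}) - \sum_{j=1}^{n+1} (M_{kj} + x_k b_j)\, \alpha_j g_j(\mathbf{x}) = 0 \tag{20}$$

for $k = 1, \dots, m$. These are $m$ equations from which $\mathbf{e}$ and $r$ have been eliminated: besides the allocation $\boldsymbol{\alpha}$, they involve only $\mathbf{x}$ and $\mu$.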

Lastly, we still need to require $\dot{r} = 0$, which yields

$$\alpha_{n+1}\, g_{n+1}(\mathbf{x}) = \mu. \tag{21}$$

It is tempting to use this relation straight away in (20), but it is much better not to submit to this temptation as it would destroy the geometric structure of the solution space in the remaining fractions *α*_{1}, …, *α*_{n}. This is a vital ingredient later in the construction of Elementary Growth States and Modes.

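For future reference, we collect the pertinent Eqs (20) and (21), together with the assumptions on $\boldsymbol{\alpha}$ in (13), and collectively call these the Balanced Growth equations,

$$\sum_{j=1}^{n} \frac{(P_{kj} - x_k a_j)\, f_j(\mathbf{x})}{\mu} \alpha_j g_j(\mathbf{x}) - \sum_{j=1}^{n+1} (M_{kj} + x_k b_j)\, \alpha_j g_j(\mathbf{x}) = 0, \qquad k = 1, \dots, m, \tag{22}$$

$$\alpha_{n+1}\, g_{n+1}(\mathbf{x}) = \mu, \tag{23}$$

$$\boldsymbol{\alpha} \geq \mathbf{0}, \tag{24}$$

$$|\boldsymbol{\alpha}| \leq \lambda. \tag{25}$$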

Note that when a specific ribosome allocation is chosen (by fixing $\boldsymbol{\alpha}$), we get $m + 1$ equations that should be met. In principle, this is enough to fix the $m$ metabolite concentrations and the growth rate, $\mu$. Subsequently, the enzyme and ribosome concentrations, $\mathbf{e}$ and $r$, can also be calculated. The $\alpha_j$-factors can thus be seen as controlling all other variable quantities in the model. However, not all choices for $\boldsymbol{\alpha}$ will yield a solvable system of equations, and thus not all ribosome allocations lead to balanced growth.

#### Biological interpretation and consequences.

The phenotypic space of cells growing at a steady state rate consists of all possible sets of molecule concentrations that admit balanced growth. Each environmental condition, together with a particular ribosome allocation that allows growth, yields one phenotypic state. The complete phenotypic space is typically very large: many cells are able to attain balanced growth in many different conditions, often using different metabolic subnetworks [32]. Even on the same food substrates, metabolism may change when the concentrations of these substrates change, e.g., from pure respiration to respirofermentation. The cells can adapt to conditions by changing enzyme concentrations through gene expression control.

The control space of the cell is thus defined in terms of the proteins that are being expressed; these are the variables that allow phenotypic changes, and the ones that need to be tuned to maximise fitness. We have rewritten the steady-state equations of the whole-cell model to allow an investigation of the solution space in terms of these control variables, the *α*’s. In other words, we can now start to explore the possible phenotypes a cell may express in terms of the biochemical properties (encoded in rate laws *f*_{j} and *g*_{j}) of the catalysing molecules, the enzymes and ribosomes, and the extracellular nutrient concentrations.
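For a model with a single metabolite ($m = 1$), the $m + 1$ balanced growth equations for a fixed allocation $\boldsymbol{\alpha}$ can be solved with a simple root finder. The sketch below assumes a hypothetical toy model (a transporter $e$ importing metabolite $x$, with the enzyme and the ribosome built from $x$); the parameter values and rate laws are illustrative assumptions. It eliminates $\mu$ with Eq (21), finds $x$ from Eq (20) by bisection, and then recovers $r$ from Eq (18) and $e$ from $\mu e_j = r \alpha_j g_j(\mathbf{x})$.

```python
# Hypothetical one-metabolite toy model (m = 1, n = 1): a transporter e imports
# metabolite x (P = [[1]]); the enzyme and the ribosome are built from x.
# All parameter values and rate laws are illustrative assumptions.
rho, P11 = 0.01, 1.0
M = [10.0, 50.0]                    # x consumed per enzyme / per ribosome
sigma = [rho * Mj for Mj in M]      # molar volumes of enzyme and ribosome
k_t, k_r, K = 20.0, 10.0, 1.0
alpha = [0.5, 0.5]                  # fixed ribosome allocation (enzyme, ribosome)

a1 = rho * P11
b = [sigma[j] - rho * M[j] for j in range(2)]


def f1(x):
    return k_t                      # uptake rate per transporter (assumed constant)


def g(x, j):
    return k_r / M[j] * x / (K + x)  # synthesis rate per ribosome (assumed form)


def mu_of_x(x):
    return alpha[1] * g(x, 1)       # Eq (21)


def balance(x):
    """Left-hand side of Eq (20) for k = 1, with mu eliminated through Eq (21)."""
    mu = mu_of_x(x)
    metabolic = (P11 - x * a1) * f1(x) / mu * alpha[0] * g(x, 0)
    synthesis = sum((M[j] + x * b[j]) * alpha[j] * g(x, j) for j in range(2))
    return metabolic - synthesis


def bisect(fun, lo, hi, iters=200):
    """Plain bisection; assumes fun(lo) and fun(hi) have opposite signs."""
    f_lo = fun(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = fun(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


x = bisect(balance, 1e-6, 1.0 / rho)   # Eq (14) forces rho * x < 1
mu = mu_of_x(x)
denom = (a1 * f1(x) / mu * alpha[0] * g(x, 0)
         + sum(b[j] * alpha[j] * g(x, j) for j in range(2)))
r = mu / denom                         # Eq (18)
e = r * alpha[0] * g(x, 0) / mu        # from mu * e_j = r * alpha_j * g_j(x)
```

The volume constraint (14) is not imposed anywhere in this calculation, so evaluating it afterwards is a useful consistency check on the solution.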

### Definition of Elementary Growth States (EGSs)

The balanced growth Eq (22) may be succinctly written as
(26)
where
(27)
and for *k* ∈ {1, …, *m*} and *j* ∈ {1, …, *n*} we have
(28)

This is a system of linear equations in $\boldsymbol{\alpha}$. Because Eq (23) is also linear in $\alpha_{n+1}$, we can incorporate it in the system by introducing a new matrix, (29)

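With these definitions, the balanced growth Eqs (22)–(25) read

$$B(\mathbf{x}, \mu)\, \boldsymbol{\alpha} = \mu\, \mathbf{u}_{m+1}, \qquad \boldsymbol{\alpha} \geq \mathbf{0}, \qquad |\boldsymbol{\alpha}| \leq \lambda, \tag{30}$$

where $\mathbf{u}_{m+1}$ is the unit vector $[0, \dots, 0, 1]^{T}$.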

**Remark**. *The rank of the matrix $B(\mathbf{x}, \mu)$ may in principle depend on the choice of $\mathbf{x}$ and $\mu$. Certainly, rows corresponding to a metabolite with zero concentration, $x_k = 0$, are not valid constraints. If these rows are left out, it is clear, considering the definition of the elements of $B(\mathbf{x}, \mu)$ in (28), that the set of $\mathbf{x}$ and $\mu$ for which $B(\mathbf{x}, \mu)$ does not have full rank has measure zero, i.e., represents negligible exceptions, even if the original stoichiometry matrices may have lower rank. We therefore assume from now on that the rank does not depend on the choice of $\mathbf{x}$ and $\mu$.*

*Most arguments that follow are for fixed $\mathbf{x}$, and so we will require that the rank does not depend on $\mu$.*

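To introduce Elementary Growth States, we fix $\mathbf{x}$ and $\mu$, and ignore the ribosome allocation inequality $|\boldsymbol{\alpha}| \leq \lambda$ for now. Denote the row vectors of $B = B(\mathbf{x}, \mu)$ by $\mathbf{b}_k^{T}$, $k = 1, \dots, m+1$. The set formed by all nonnegative solutions to the first $m$ equations,

$$\mathcal{C}(\mathbf{x}, \mu) = \{ \boldsymbol{\alpha} \in \mathbb{R}^{n+1} : \mathbf{b}_k \cdot \boldsymbol{\alpha} = 0 \text{ for } k = 1, \dots, m, \ \boldsymbol{\alpha} \geq \mathbf{0} \},$$

is a pointed polyhedral cone. The cone becomes a convex polytope if we restrict to those vectors $\boldsymbol{\alpha}$ in $\mathcal{C}(\mathbf{x}, \mu)$ such that $\alpha_{n+1} g_{n+1}(\mathbf{x}) = \mu$. Fig 2a gives the reader a sketch of this construction. Let us define

$$\mathcal{P}(\mathbf{x}, \mu) = \{ \boldsymbol{\alpha} \in \mathcal{C}(\mathbf{x}, \mu) : \alpha_{n+1} g_{n+1}(\mathbf{x}) = \mu \}. \tag{31}$$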

a) For fixed metabolite concentrations **x** and growth rate *μ*_{1}, the balanced growth Eqs (22) and (24) define a cone. Intersecting this cone with (23) defines a polytope containing all balanced growth solutions. Its vertices are the EGSs at this growth rate and metabolite concentrations. b) Changing the growth rate from *μ*_{1} to *μ*_{2} changes both the cone and the polytope, because (22) depends on *μ*. (A similar change would be seen if **x** had been changed.) c) The EGSs change continuously with *μ* (and also with **x**, not shown), but their support does not. The green lines, connecting the vertices in the different polytopes, together form the EGMs.

**Definition 1**. *For a given set of metabolite concentrations* **x** *and growth rate μ, consider the corresponding convex polytope defined by* (31). *Then the* Elementary Growth States *with metabolite concentrations* **x** *and growth rate μ are the vertices of this polytope*.

Each EGS is thus a vertex of this polytope and therefore a basic feasible solution of the Linear Program (30). It has a corresponding feasible basis *D* such that *B*_{D}(**x**, *μ*) *α*_{D} = *μ* *u*_{m+1} and *α*_{j} = 0 for all *j* ∉ *D*. Moreover, for each *D*, *B*_{D}(**x**, *μ*) is square (after restricting to a set of linearly independent rows) and nonsingular. The support supp **α** of an EGS satisfies supp **α** ⊆ *D*.
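The vertex characterisation suggests a direct, if brute-force, way to compute EGSs for very small models. The sketch below is an illustration under stated assumptions, not the authors' method: `toy_B` is a hypothetical *B*(**x**, *μ*) at fixed **x** with one metabolite row, two enzymes and a ribosome (it is not the matrix defined in (28)), and the hypothetical helper `enumerate_egs` tries every candidate basis *D*, solves *B*_{D}*α*_{D} = *μ* *u*_{m+1}, and keeps the nonnegative solutions, i.e. the basic feasible solutions. The number of candidate bases grows combinatorially, so this is only practical for toy examples.

```python
import itertools
import numpy as np

def toy_B(mu, k=(2.0, 1.0), c=1.0, g=1.0):
    """Hypothetical B(x, mu) at fixed x; columns: enzyme 1, enzyme 2, ribosome.

    Row 1: toy metabolite balance k1*a1 + k2*a2 - (c + mu)*a3 = 0.
    Row 2: ribosome row g*a3 = mu, playing the role of Eq (23).
    """
    return np.array([[k[0], k[1], -(c + mu)],
                     [0.0, 0.0, g]])

def enumerate_egs(B, mu, tol=1e-12):
    """Brute-force vertices of {alpha >= 0 : B alpha = mu * u_{m+1}}."""
    n_rows, n_cols = B.shape
    rhs = np.zeros(n_rows)
    rhs[-1] = mu                              # right-hand side mu * u_{m+1}
    rank = np.linalg.matrix_rank(B)
    vertices = []
    for D in itertools.combinations(range(n_cols), rank):
        B_D = B[:, list(D)]
        if np.linalg.matrix_rank(B_D) < rank:
            continue                          # columns in D do not form a basis
        alpha_D, *_ = np.linalg.lstsq(B_D, rhs, rcond=None)
        if not np.allclose(B_D @ alpha_D, rhs):
            continue                          # inconsistent system
        if np.all(alpha_D >= -tol):           # feasible: a basic feasible solution
            alpha = np.zeros(n_cols)
            alpha[list(D)] = np.clip(alpha_D, 0.0, None)
            if not any(np.allclose(alpha, v) for v in vertices):
                vertices.append(alpha)
    return vertices

mu = 0.5
egss = enumerate_egs(toy_B(mu), mu)
for alpha in egss:
    print(alpha)
```

In this toy, each of the two EGSs found uses exactly one enzyme plus the ribosome; with a single independent metabolite row, this is the count predicted by Theorem 2.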

EGSs are quintessentially objects that depend on the chosen metabolite concentration vector **x** and growth rate *μ*. For each choice, the polytope will be different, with different vertices. This is because the constraint equations that define this polytope include the stoichiometry, the kinetics of enzymatic reactions, and the metabolite concentrations of the cell.

With the definition of EGSs, we immediately have the following fundamental result.

**Theorem 1**. *Any balanced growth solution with growth rate μ is a convex combination of EGSs with the same growth rate and the same metabolite concentrations*.

*Proof*. For fixed **x** and growth rate *μ*, any balanced growth solution satisfies *B*(**x**, *μ*)**α** = *μ* *u*_{m+1} and **α** ≥ 0. The vectors that satisfy these relations form precisely the polytope defined in (31). This polytope is the convex hull of its vertices, the EGSs.

A convex combination of EGSs with different growth rates does not automatically satisfy the balanced growth equations, i.e., if *α*_{1}, *α*_{2} are two EGSs with growth rates *μ*_{1} and *μ*_{2}, then the convex combination *t* *α*_{1} + (1 − *t*) *α*_{2} does not necessarily satisfy *B*(**x**, *μ*_{t})(*t* *α*_{1} + (1 − *t*) *α*_{2}) = *μ*_{t} *u*_{m+1} with *μ*_{t} = *t* *μ*_{1} + (1 − *t*) *μ*_{2}. The main reason is that *B*(**x**, *μ*) changes with *μ*. For fixed **x**, each *μ* defines a different polytope, see Fig 2b.
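This can be checked numerically on a small example. The sketch below assumes a hypothetical toy matrix `toy_B` whose only *μ*-dependence is a dilution-like term −(*c* + *μ*) in the metabolite row (an illustrative choice, not Eq (28)); `egs_single_enzyme` and `residual` are hypothetical helpers. Mixing two EGSs computed at the same *μ* gives a zero residual, whereas mixing EGSs from two different growth rates leaves a nonzero residual at the mixed growth rate *μ*_{t}.

```python
import numpy as np

def toy_B(mu, k=(2.0, 1.0), c=1.0, g=1.0):
    # Hypothetical B(x, mu) at fixed x; columns: enzyme 1, enzyme 2, ribosome
    return np.array([[k[0], k[1], -(c + mu)],
                     [0.0, 0.0, g]])

def egs_single_enzyme(j, mu, k=(2.0, 1.0), c=1.0, g=1.0):
    # Toy EGS that uses only enzyme j (0 or 1) plus the ribosome
    alpha = np.zeros(3)
    alpha[2] = mu / g                       # ribosome row: g * a3 = mu
    alpha[j] = (c + mu) * alpha[2] / k[j]   # metabolite row balanced by enzyme j
    return alpha

def residual(alpha, mu):
    # Norm of B(x, mu) alpha - mu * u_{m+1}; zero for balanced growth solutions
    rhs = np.array([0.0, mu])
    return np.linalg.norm(toy_B(mu) @ alpha - rhs)

t = 0.5
# Two EGSs at the same growth rate: their mixture is again a solution
same = t * egs_single_enzyme(0, 0.4) + (1 - t) * egs_single_enzyme(1, 0.4)
# Two EGSs at different growth rates, checked at the mixed growth rate mu_t
mu_t = t * 0.2 + (1 - t) * 0.6
mixed = t * egs_single_enzyme(0, 0.2) + (1 - t) * egs_single_enzyme(1, 0.6)
print(residual(same, 0.4), residual(mixed, mu_t))
```

In this toy the second residual is 0.04: the ribosome row is still satisfied, but the metabolite row is not, because the toy demand term (*c* + *μ*)*μ* is strictly convex in *μ*.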

We may assume without loss of generality that *B*(**x**, *μ*) has maximal rank *m* + 1. If not, we can, without consequence, delete rows from *B* that are linearly dependent. This would mean that in balanced growth, not all metabolite concentrations are independent: there are ‘moieties’, but not exactly in the classical stoichiometric fashion, because there is always dilution by growth. When selecting a feasible basis *D*, the last column in *B* should always be kept: the last equation enforces *α*_{n+1}*g*_{n+1}(**x**) = *μ*, which for *μ* > 0 requires *α*_{n+1} > 0 and ensures that the ribosome is made; without ribosomes, there are no enzymes.

**Theorem 2**. *If a vector* **α** *is an EGS, then its support* supp **α** *corresponds to a subnetwork that has a number of metabolic reactions that is less than or equal to the number of independent metabolites. Each set D that is a feasible basis for an EGS, and for which D is also its support, has exactly as many metabolic reactions as independent metabolites*.

*Proof*. We fix **x** and *μ* throughout, and thereby also the polytope defined in (31). The EGSs are the vertices of this polytope, and equivalently the basic feasible solutions of the LP. (A standard LP has an objective function, but that does not concern us here; for now, we only need to focus on the basic feasible solutions.)

Each basic feasible solution has a corresponding feasible basis *D*, so that *B*_{D}(**x**, *μ*) is square and invertible. Note that |*D*| must therefore be equal to the number of independent rows of *B*. Independent rows are given by the balanced growth equations of independent metabolites with non-zero concentrations, together with the ribosome equation. The vector *α*_{D} has length |*D*|. For all *j* ∉ *D* we have *α*_{j} = 0, but even for *j* ∈ *D* we can have *α*_{j} = 0, so that we have at most |*D*| non-zero *α*_{j}’s. Certainly *n* + 1 ∈ *D*, because it corresponds to the ribosome, without which enzymes cannot be synthesised. That leaves |*D*| − 1 elements *j* ∈ *D* \ {*n* + 1}, each of which corresponds to a unique *α*_{j}, and thus to a unique enzyme. The number of active reactions, corresponding to non-zero *α*_{j}’s, is thus less than or equal to the number of independent metabolites (in the sense described above) with non-zero concentrations.

If *D* is also the support of *α*, then the number of non-zero *α*_{j}’s is equal to |*D*|, so that the number of active reactions is exactly equal to the number of independent metabolites with non-zero concentration.

#### Biological interpretation and consequences.

Not all protein expression profiles lead to balanced growth: at least the essential enzymes needed for growth on a set of nutrients need to be made. There also exist protein profiles in which the same metabolites are produced in multiple different ways, or in which different modes of metabolism are active at the same time, such as in overflow metabolism. Such networks show redundancy.

We have identified Elementary Growth States as the simplest, non-decomposable, cellular states that allow balanced growth. Any balanced growth state may be decomposed into EGSs that share the same growth rate and metabolite concentrations, but have different ribosomal allocations and use a different set of reactions. EGSs, as the simplest cellular phenotypes that include the concentrations and synthesis rates of all cellular components, are therefore the fundamental building blocks of balanced cellular growth.

Note that this set of building blocks is not yet invariant. For example, changes in the growth rate, the metabolite concentrations, or the kinetic parameters will change the EGSs. Since a set of invariant building blocks has proven very useful in FBA-theory (EFMs), we will now define Elementary Growth Modes that will indeed be invariant.

### Elementary Growth Modes are equivalence classes of Elementary Growth States

The support of an EGS, defined as the set of nonzero *α*_{j} (see S1 Text), induces a natural equivalence class of EGSs which all share the same support and hence constitute steady state networks with the same enzymes. This equivalence class does *not* depend on the *μ* and ** x** for which the individual EGSs were calculated.

**Definition 2**. *Two EGSs*, *α*_{1}(*x*_{1}, *μ*_{1}) *and* *α*_{2}(*x*_{2}, *μ*_{2}), *are said to be equivalent if they have the same support. The equivalence class to which* *α*_{1} *belongs is denoted by* [*α*_{1}(**x**, *μ*)], *and is called an Elementary Growth Mode*.

The set of EGMs contains all essentially different minimal networks that can sustain balanced growth, each with its unique set of participating reactions. In mathematical terms, the set of all EGMs is the quotient set of all EGSs under identification of EGSs with the same support.

Note that for each **x** and *μ* individually, a representative EGS for each EGM equivalence class can be calculated with standard algorithms: it is just a matter of finding the vertices of the polytope defined in (31), a task for which computational algorithms are already available. These representatives will change with **x** and *μ*, but their support (and thus the EGM that they belong to) will not. It is, however, possible that an EGM has a feasible representative at one set of metabolite concentrations and growth rate, but not at another.
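Because EGMs are identified by support alone, representatives computed at several growth rates can be grouped directly. The following sketch assumes a hypothetical two-enzyme toy matrix (`toy_B`, an illustration and not Eq (28)) and a brute-force vertex enumeration; `group_into_egms` is a hypothetical helper that uses the support, as a `frozenset` of column indices, as the dictionary key, so each key stands for one EGM and its value lists that EGM's representatives.

```python
import itertools
from collections import defaultdict
import numpy as np

def toy_B(mu, k=(2.0, 1.0), c=1.0, g=1.0):
    # Hypothetical B(x, mu) at fixed x; columns: enzyme 1, enzyme 2, ribosome
    return np.array([[k[0], k[1], -(c + mu)],
                     [0.0, 0.0, g]])

def enumerate_egs(B, mu, tol=1e-12):
    # Brute-force basic feasible solutions of {alpha >= 0 : B alpha = mu u_{m+1}}
    rhs = np.zeros(B.shape[0])
    rhs[-1] = mu
    rank = np.linalg.matrix_rank(B)
    found = []
    for D in itertools.combinations(range(B.shape[1]), rank):
        B_D = B[:, list(D)]
        if np.linalg.matrix_rank(B_D) < rank:
            continue
        alpha_D, *_ = np.linalg.lstsq(B_D, rhs, rcond=None)
        if np.allclose(B_D @ alpha_D, rhs) and np.all(alpha_D >= -tol):
            alpha = np.zeros(B.shape[1])
            alpha[list(D)] = np.clip(alpha_D, 0.0, None)
            if not any(np.allclose(alpha, v) for v in found):
                found.append(alpha)
    return found

def group_into_egms(mus, tol=1e-12):
    """Map each support (frozenset of column indices) to its (mu, EGS) pairs."""
    egms = defaultdict(list)
    for mu in mus:
        for alpha in enumerate_egs(toy_B(mu), mu):
            support = frozenset(np.flatnonzero(alpha > tol).tolist())
            egms[support].append((mu, alpha))
    return egms

egms = group_into_egms([0.1, 0.3, 0.5])
for support, reps in egms.items():
    print(sorted(support), len(reps))
```

Here both EGMs have a representative at every growth rate in the grid; in a realistic model some supports may have feasible representatives on only part of the grid.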

We here prove that an EGS can be continuously extended both in an open neighbourhood around *μ* and in a neighbourhood around **x**. This shows that if the whole-cell model allows a steady state at some fixed **x** and *μ*, then steady states with the same participating reactions also exist for metabolite concentrations and growth rates close to the fixed values, see Fig 2c. In other words, given a representative of an EGM, this EGM also has a feasible representative in a small neighbourhood.

**Theorem 3**. *For a given growth rate μ*_{0} *and set of metabolite concentrations* *x*_{0}, *there exists an open neighbourhood U of* (*x*_{0}, *μ*_{0}) *such that for all* (**x**, *μ*) ∈ *U*, *each EGS with support equal to its feasible basis*, **α**(*x*_{0}, *μ*_{0}), *can be continuously extended to a vector* **α**(**x**, *μ*) *that solves the balanced growth equations and belongs to the same EGM*.

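*Proof*. Choose an arbitrary EGS with support equal to its feasible basis, *α*_{0} = **α**(*x*_{0}, *μ*_{0}). We know that it solves *B*(*x*_{0}, *μ*_{0}) *α*_{0} = *μ*_{0} *u*_{m+1}.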

The EGS has a feasible basis *D*, and without loss of generality we restrict *B*(*x*_{0}, *μ*_{0}) to the columns indexed in *D*. As discussed before, we can select a set of rows corresponding to independent metabolites with non-zero concentrations such that the resulting matrix is square and invertible. We may therefore choose *n* = *m* and the dimension of *B* to be (*m* + 1) × (*m* + 1).

We would like to apply the Implicit Function Theorem to show that there are continuously differentiable functions *α*_{j}(**x**, *μ*), *j* = 1, …, *m* + 1, such that in an open neighbourhood of (*x*_{0}, *μ*_{0}) the balanced growth equations are still met: *B*(**x**, *μ*)**α**(**x**, *μ*) = *μ* *u*_{m+1}. Since the *α*_{j}(**x**, *μ*) are continuous, and since no *α*_{j}(*x*_{0}, *μ*_{0}) is equal to zero by the assumption in the theorem, we can then also choose a neighbourhood in which *α*_{j}(**x**, *μ*) > 0 for all *j*. This would thus indeed be a continuous extension of the EGS that belongs to the same EGM, since its support does not change.

Let *s* be the number of components in **x**. For the Implicit Function Theorem we need a function *F*: ℝ^{m+1} × ℝ^{s} × ℝ → ℝ^{m+1} that is zero at (*α*_{0}, *x*_{0}, *μ*_{0}). For this function, we can use the balanced growth equations as components: the first *m* are given by
(32)
where *k* = 1, …, *m*. The last component is given by
(33)

Let us check the conditions for the Implicit Function Theorem:

- The function *F* with components given by (32) and (33) is continuously differentiable in **α**, **x** and *μ* in a neighbourhood of (*α*_{0}, *x*_{0}, *μ*_{0}).
- By assumption, (*α*_{0}, *x*_{0}, *μ*_{0}) is a solution of the balanced growth equations: *F*(*α*_{0}, *x*_{0}, *μ*_{0}) = 0.
- The entries of the Jacobian of *F* at (*α*_{0}, *x*_{0}, *μ*_{0}) with respect to **α** are, for *k* = 1, …, *m*, given by ∂*F*_{k}/∂*α*_{j} = *B*_{kj}(*x*_{0}, *μ*_{0}), and the last row of the Jacobian is zero except for the last entry, which is *g*_{m+1}(*x*_{0}). Note that this Jacobian is exactly the matrix *B*(*x*_{0}, *μ*_{0}), which is invertible by construction.

The IFT may therefore be invoked, which shows that for each EGS at (*x*_{0}, *μ*_{0}) with support equal to its feasible basis there is an open neighbourhood around (*x*_{0}, *μ*_{0}) in which the EGS can be continuously extended. Since a polytope has finitely many vertices, there are finitely many such EGSs, and the intersection of their neighbourhoods is again an open neighbourhood in which all of them can be extended.
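Numerically, the extension in Theorem 3 amounts to keeping the basis *D* fixed and re-solving the square system *B*_{D}(**x**, *μ*)*α*_{D} = *μ* *u*_{m+1} as *μ* varies; the solution moves continuously as long as *B*_{D} stays invertible, and the EGM loses its representative once a component stops being positive. The sketch below assumes a hypothetical square matrix `toy_B_D` whose enzyme column loses efficiency as *μ* grows (an illustrative choice, not Eq (28)), so that the continuation performed by the hypothetical helper `continue_egs` eventually breaks down.

```python
import numpy as np

def toy_B_D(mu, k1=2.0, a=4.0, c=1.0, g=1.0):
    # Hypothetical square B_D(x, mu) for D = {enzyme 1, ribosome}; the enzyme
    # column loses efficiency as mu grows (k1 - a*mu), purely for illustration
    return np.array([[k1 - a * mu, -(c + mu)],
                     [0.0, g]])

def continue_egs(mus, det_tol=1e-12):
    """Follow alpha_D(mu) solving B_D(mu) alpha_D = mu * u_{m+1} along mus."""
    path = []
    for mu in mus:
        B_D = toy_B_D(mu)
        if abs(np.linalg.det(B_D)) < det_tol:
            break                     # B_D singular: the IFT argument fails here
        alpha_D = np.linalg.solve(B_D, np.array([0.0, mu]))
        if np.any(alpha_D <= 0):
            break                     # a component is no longer positive
        path.append((mu, alpha_D))
    return path

path = continue_egs(np.linspace(0.05, 0.95, 19))
for mu, alpha_D in path:
    print(round(mu, 2), alpha_D)
```

For this toy the continuation stops at *μ* = 0.5, where *B*_{D} becomes singular; beyond that point the enzyme component would be negative.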

The previous theorem shows that the set of EGMs is largely conserved when the growth rate or the metabolite concentrations are changed. Indeed, the functions that were found in the previous proof in principle give rise to balanced growth vectors at a wide range of growth rates *μ* and concentrations **x**. However, these vectors are not necessarily all EGSs, because the positivity constraint **α** ≥ 0 may be broken. If one of the components becomes negative, then the corresponding EGM no longer has a feasible representative. We will use this fact in Theorem 4.

#### Biological interpretation and consequences.

If a cell is able to sustain itself at particular fixed metabolite concentrations and at a certain growth rate, then it is still able to maintain itself at slightly perturbed concentrations or a slightly different growth rate, using the same enzymes and hence the same metabolic network. This network structure is defined to be the Elementary Growth Mode. These EGMs therefore form the set of invariant building blocks that we were after, analogous to the EFMs in FBA models.

Indeed, an EGM may be used at different metabolite concentrations and balanced growth rates as long as none of the constraints is violated. These constraints will eventually be violated, however: in particular, a higher growth rate may either require a ribosomal fraction to become negative (which is physically impossible), or make the total ribosomal allocation so large that it violates |**α**| ≤ λ.

### Maximal growth rate is attained in EGMs

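We now prove that maximal growth rate is attained in an EGS, and hence in an EGM. Until now we have ignored the upper bound on the ribosome allocation, expressed in |**α**| ≤ λ. From now on, we will consider this bound, and study the optimisation problem of finding the maximal balanced growth rate for the whole-cell model (15), phrased here as

(P1): maximise *μ* over **x**, **α** and *μ*, subject to the balanced growth equations *B*(**x**, *μ*)**α** = *μ* *u*_{m+1}, **α** ≥ 0, and |**α**| ≤ λ.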

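Since *μ* is part of the construction of the solution polytope, we formulate a different optimisation problem, in which the metabolite concentrations are fixed, and we do not maximise the growth rate but minimise the total fraction of the ribosome necessary to attain a given growth rate,

(P2): for fixed **x** and *μ*, minimise |**α**| over all **α** ≥ 0 with *B*(**x**, *μ*)**α** = *μ* *u*_{m+1}, i.e., over the polytope defined in (31).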

Problem (P2) should be considered for growth rates *μ* that are relevant for (P1); for instance, for the growth rate *μ*_{max} that maximises (P1). Also note that (P2) may be reformulated as

Fig 3 gives some intuition behind the proof of the following theorem.

As *μ* is increased, the polytope cuts through the ribosome allocation constraint, with |**α**| = λ shown in blue. Exactly at the maximal growth rate, the intersection of the polytope with |**α**| = λ is reduced to one vertex (black dot), i.e., one EGS.

**Theorem 4**. *For each choice of* λ, *(P1) is maximised in an Elementary Growth State*.

*Proof*. Let us consider the maximisation problem for an arbitrary fixed set of metabolite concentrations **x** (inspired by the proofs from [17, 19]). For this fixed **x** we will prove that the problem is always maximised in an EGS. This is enough to prove the theorem, because it implies that the growth rate is also maximised in an EGS if we had picked the optimal metabolite concentrations **x**.

We only need to consider those **x** for which (P1) has a nonempty set over which to maximise. The guiding insight for this proof is that, for each fixed value of *μ* such that the polytope is nonempty, the minimum of (P2) is attained in a vertex of this polytope, i.e., in an EGS with growth rate *μ*. (This is true for any linear objective function of **α**.)

Let *μ*_{max} be the value of the growth rate that maximises (P1), with maximiser *α*^{max}, and |*α*^{max}| ≕ λ^{max} ≤ λ. It is clear that *α*^{max} lies in the polytope for *μ* = *μ*_{max}. For *μ* > *μ*_{max}, the polytope is either empty or nonempty, so we distinguish two cases.

Case 1: If the polytope is not empty for some *μ* > *μ*_{max}, then |**α**| > λ for all **α** in it; if not, those vectors would satisfy all the constraints of (P1), so *μ*_{max} could not have been maximal. Moreover, vertices of the polytope change continuously as a function of *μ* in a neighbourhood of *μ*_{max} (Theorem 3), and since every point of the polytope, in particular *α*^{max}, is a convex combination of those vertices, these points also change continuously. Finally, because |**α**| is a continuous function of **α**, the change in |*α*^{max}| will also be continuous. Combining this with the observation that |**α**| > λ for all **α** in the polytope with *μ* > *μ*_{max}, we can conclude that |**α**| ≥ λ for all **α** in the polytope at *μ* = *μ*_{max}.

Since also |*α*^{max}| ≤ λ, we must have |*α*^{max}| = λ, and *α*^{max} is thus a minimiser of (P2) for *μ* = *μ*_{max}. The minimum of (P2) is attained in a vertex of the polytope, so we can choose *α*^{max} such that it is an EGS.

Case 2: If the polytope is empty for all *μ* > *μ*_{max}, then all **α** that solve *B*(**x**, *μ*)**α** = *μ* *u*_{m+1} for *μ* > *μ*_{max} must have at least one index *j* such that *α*_{j} < 0. However, we also know that the polytope was nonempty for *μ* = *μ*_{max}, and a nonempty polytope must have at least one vertex. In particular, the minimum of (P2) at *μ* = *μ*_{max} is attained in a vertex, and this minimum is at most |*α*^{max}| ≤ λ. We can thus choose *α*^{max} to be such a vertex, and the maximal growth rate is thus indeed attained in an EGS.

Note that in both cases there could, in principle, be more than one choice for *α*^{max}: in Case 1 there could be an entire edge of the polytope that minimises (P2) for *μ* = *μ*_{max}, and in Case 2 the polytope at *μ* = *μ*_{max} could have more than one vertex. This, however, does not contradict the theorem, since the theorem does not state that (P1) is *only* maximised in an EGS, but rather that it *is* maximised in some EGS. This is of course a standard situation in Linear Programming theory.

The proof shows that there are two situations in which the growth rate is maximised. Either the constraint |**α**| ≤ λ is hit, or one of the *α*_{j} reaches zero and would become negative at larger growth rates, causing the network to lose an enzymatic reaction and thereby ‘disintegrate’ (it can no longer support balanced growth).
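The strategy of the proof, solving (P2) at fixed **x** and *μ* and increasing *μ* until the minimal ribosome allocation reaches λ, can be turned into a small numerical recipe. This sketch assumes a hypothetical toy matrix (`toy_B`, not Eq (28)), takes |**α**| to be the sum of the components (an assumption that is natural for **α** ≥ 0), and uses `scipy.optimize.linprog`; the bisection in the hypothetical helper `max_growth_rate` is only justified because the minimal allocation of this particular toy increases monotonically with *μ*, which need not hold in general.

```python
import numpy as np
from scipy.optimize import linprog

def toy_B(mu, k=(2.0, 1.0), c=1.0, g=1.0):
    # Hypothetical B(x, mu) at fixed x; columns: enzyme 1, enzyme 2, ribosome
    return np.array([[k[0], k[1], -(c + mu)],
                     [0.0, 0.0, g]])

def solve_p2(mu):
    """(P2): minimise sum(alpha) s.t. B alpha = mu u_{m+1}, alpha >= 0."""
    B = toy_B(mu)
    rhs = np.zeros(B.shape[0])
    rhs[-1] = mu
    res = linprog(c=np.ones(B.shape[1]), A_eq=B, b_eq=rhs,
                  bounds=[(0, None)] * B.shape[1], method="highs")
    return (res.fun, res.x) if res.success else (np.inf, None)

def max_growth_rate(lam, lo=0.0, hi=5.0, iters=60):
    # Bisection on mu; valid here because min |alpha| increases with mu in this toy
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if solve_p2(mid)[0] <= lam:
            lo = mid
        else:
            hi = mid
    return lo, solve_p2(lo)[1]

mu_max, alpha_max = max_growth_rate(lam=1.0)
print(mu_max, alpha_max)
```

At the optimum of this toy the ribosome allocation constraint is active and only enzyme 1 (the cheaper route here) is expressed, so the maximiser is a single EGS.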

We noted that the maximiser of (P1) does not have to be unique. It is, however, very unlikely to be non-unique if enough information about the enzyme saturation functions, *f*_{j}(**x**), and the volumetric parameters, *ρ*_{k} and *σ*_{j}, is provided. Having a non-unique maximiser means that there are at least two EGMs, *α*_{1} and *α*_{2}, that have exactly the same maximal growth rate; thus these EGMs either both have |*α*_{1}| = |*α*_{2}| = λ, or both become infeasible, at *μ* = *μ*_{max}. However, the two EGMs must by definition have a different support, such that there is at least one reaction *j* that is only used in the first EGM, and a reaction *j*′ that is only used in the second. The balanced growth equations for an EGM, and thus the maximal growth rate for that EGM, depend on the parameters of all participating reactions. The different parameters (e.g., different stoichiometry, volumetric constants, and enzymatic parameters) of the reactions *j* and *j*′ will thus influence the maximal growth rate, making it very unlikely that the maximal growth rates of the two EGMs are exactly the same.

We also remark that (P1) is a problem over all metabolite concentrations and all ribosome allocations, whereas (P2) is a problem only over the ribosome allocation. We can also consider an optimisation problem for the metabolite concentrations alone: if we pick an EGM, we can maximise the growth rate in that EGM by varying **x**. At this stage we do not know whether this optimum is unique, or what the convexity properties of this problem are.

The proof of Theorem 4 also shows that increasing λ can never decrease the *maximal* growth rate.

**Corollary**. *Consider (P1) for two values of* λ, 0 < λ_{1} ≤ λ_{2}, *and denote by* *μ*_{i} *the maximal growth rate attained with* λ = λ_{i}, *i* = 1, 2. *Then* *μ*_{1} ≤ *μ*_{2}.

The converse of this corollary does not necessarily hold: if *μ*_{1} ≤ *μ*_{2} and for *i* = 1, 2, it does not follow that |*α*_{1}| ≤ |*α*_{2}|.

#### Adding additional linear enzymatic constraints results in minimal convex combinations of EGMs.

In a cell, protein concentrations are bounded. This is a direct consequence of cells having membranes with finite areas, compartments with finite volumes, and proteins occupying volume and space. Accordingly, each membrane and compartment in a cell has a finite capacity to contain proteins, effectively constraining the protein concentrations in cells. These protein constraints are often modelled as upper bounds on weighted sums of enzyme (and ribosome) concentrations. To include these bounds in the optimisation problem, we need to find expressions for the enzyme concentrations *e*_{i} in terms of the *α*_{j}’s.

The steady state value of the enzyme concentrations is (note that in steady state, this identity also holds for the ribosome concentration, *j* = *n* + 1, so we don’t have to treat it separately). Using the steady state ribosome concentration (18), which reads
this becomes

We can use this to rewrite a linear constraint on the enzyme concentrations into a linear constraint on the *α*_{j}’s, by rewriting to

These are inequalities with the same type of dependency on **α** as before, and may be added to the optimisation problem. In a similar way, a linear combination of enzymes and the ribosome will produce inequalities in the *α*_{j}’s of the same type.

**Theorem 5**. *The optimisation problem (P1), supplemented with K linear constraints* *has a solution that is a convex combination of at most K* + 1 *Elementary Growth States*.

*Proof*. In the proof of Theorem 4 we showed that the maximal growth rate *μ* will be attained in a vertex of the polytope defined in (31). Here we show that this proof can still be used for a slightly different polytope, defined by
(34)

Following the proof of Theorem 4, we can separate two cases: in Case 1 there is a *μ* > *μ*_{max} for which the polytope (34) is nonempty, and in Case 2 the polytope is empty for all *μ* > *μ*_{max}. The two cases can be handled exactly as in the proof of Theorem 4, so that the maximal growth rate under the additional constraints will be attained in a vertex of (34). Therefore, if we can describe the vertices of the new polytope, we are done.

Vertices are 0-dimensional faces of the polytope. In an (*n* + 1)-dimensional space, the vertices should therefore, just as before, satisfy *n* + 1 linearly independent equalities. Of these equalities, at most *K* can come from the newly added constraints. That means that a vertex of the new polytope satisfies at least *n* + 1 − *K* of the equalities that define faces of the old polytope. In other words, the vertex lies on a face of the old polytope of dimension at most *K*, and therefore, by Carathéodory’s theorem, it is a convex combination of at most *K* + 1 of the 0-dimensional EGSs.
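To see Theorem 5 at work with *K* = 1, a toy problem can be given one extra linear constraint. The sketch below assumes a hypothetical capacity bound *α*_{1} ≤ 0.2 (`CAP`), standing in for the enzyme-concentration constraints derived above, together with a hypothetical toy matrix `toy_B` (not Eq (28)) and |**α**| taken as the sum of components. Because the polytope of this toy at fixed *μ* is the segment between its two EGSs, a solution with both enzymes active is a convex combination of *K* + 1 = 2 EGSs.

```python
import numpy as np
from scipy.optimize import linprog

def toy_B(mu, k=(2.0, 1.0), c=1.0, g=1.0):
    # Hypothetical B(x, mu) at fixed x; columns: enzyme 1, enzyme 2, ribosome
    return np.array([[k[0], k[1], -(c + mu)],
                     [0.0, 0.0, g]])

CAP = 0.2  # hypothetical extra linear constraint alpha_1 <= CAP (K = 1)

def min_allocation(mu):
    # (P2) plus the extra linear constraint; |alpha| taken as sum(alpha)
    B = toy_B(mu)
    rhs = np.zeros(B.shape[0])
    rhs[-1] = mu
    res = linprog(c=np.ones(3), A_ub=[[1.0, 0.0, 0.0]], b_ub=[CAP],
                  A_eq=B, b_eq=rhs, bounds=[(0, None)] * 3, method="highs")
    return (res.fun, res.x) if res.success else (np.inf, None)

lo, hi = 0.0, 5.0
for _ in range(60):            # bisection on mu, monotone for this toy
    mid = 0.5 * (lo + hi)
    if min_allocation(mid)[0] <= 1.0:
        lo = mid
    else:
        hi = mid
mu_max, alpha = lo, min_allocation(lo)[1]
print(mu_max, alpha)           # both enzymes active, alpha_1 at its cap
```

Compared with the unconstrained toy, the optimum moves to a lower growth rate and onto the edge between the two EGSs.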

#### Biological interpretation and consequences.

EGMs are the self-fabricating networks that maximise balanced growth rate. By minimising the number of enzymatic reactions, the investment in each of the remaining reactions may be maximised, leading to the highest possible overall synthesis rates, and thus the maximal growth rate. While maximising the growth rate, biophysical constraints may be hit. Such constraints may often be formulated as linear combinations of enzyme concentrations, such as the total enzyme concentration in the cytosol, the total enzyme concentration in the membrane, and so on. When constraints are hit, the maximiser is not necessarily an EGM, but may become a convex combination of EGMs. The number of such EGMs found in the optimum is less than or equal to the number of constraints. Constant average density, assumed in (14), is the first constraint. Since the number of possible physical constraints is limited, the theory predicts that optimal steady state behaviour is simple, with a small number of EGMs active at any time.

### The relation between EFMs and EGMs

Elementary Flux Modes (EFMs) are the fundamental building blocks of genome-scale stoichiometric models [6]. EFMs are minimal in the sense that all the reactions in an EFM are essential for a steady state flux. The EFMs can be found as the extreme rays of the flux cone (see Fig 4), if reversible reactions are split into two separate irreversible reactions. The set of all feasible flux vectors, given the reaction stoichiometry, can then be described as the set of all nonnegative (conic) combinations of EFMs.

In case (a) the cone of possible steady state flux solutions is a plane through the origin, bounded by *v*_{1} ≥ 0 and *v*_{2} ≥ 0. Restricting to solutions with a prescribed biomass flux *v*_{BM} (orange horizontal plane) gives a line segment bordered by two EFM vertices. In EFM_{1}, reaction *v*_{1} participates; in EFM_{2}, reaction *v*_{2}. In (b), for given metabolite concentration *x* and growth rate *μ*_{1}, the balanced growth solutions for the corresponding self-replicating network also form a line segment of a cone, again spanned by two vertices, the EGSs at these fixed values of *x* and *μ*. Only this line segment contains balanced growth solutions; the rest of the cone does not. In EGS_{1} only enzyme 1 and the ribosome are expressed, while in EGS_{2}, enzyme 2 and the ribosome are expressed.
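For small networks the EFMs can be enumerated by brute force from their defining property: a set *S* of reactions is the support of an EFM exactly when the null space of the stoichiometric columns in *S* is one-dimensional and spanned by a strictly positive vector. The sketch below assumes all reactions are irreversible and applies this to a hypothetical network shaped like Fig 4a: two alternative reactions *v*_{1}, *v*_{2} producing one internal metabolite that is consumed by a biomass reaction *v*_{BM}. The helper `elementary_flux_modes` is illustrative only; practical EFM enumeration relies on dedicated tools (typically based on the double description method) rather than this exponential loop.

```python
import itertools
import numpy as np
from scipy.linalg import null_space

def elementary_flux_modes(N, tol=1e-9):
    """Brute-force EFMs of N v = 0, v >= 0 (all reactions irreversible)."""
    n_rxn = N.shape[1]
    efms = []
    for size in range(1, n_rxn + 1):
        for S in itertools.combinations(range(n_rxn), size):
            K = null_space(N[:, list(S)])
            if K.shape[1] != 1:
                continue                  # support must leave a 1-D null space
            k = K[:, 0]
            if np.all(k < -tol):
                k = -k                    # fix the arbitrary sign of the basis vector
            if not np.all(k > tol):
                continue                  # needs strictly positive fluxes on S
            v = np.zeros(n_rxn)
            v[list(S)] = k / k.max()      # scale so the largest flux equals 1
            efms.append(v)
    return efms

# Hypothetical network: v1: S -> X, v2: S -> X, v_BM: X -> biomass (X internal)
N = np.array([[1.0, 1.0, -1.0]])
efms = elementary_flux_modes(N)
for v in efms:
    print(v)
```

The two modes found correspond to EFM_{1} (using *v*_{1}) and EFM_{2} (using *v*_{2}) in Fig 4a.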

To relate EFM theory to EGM theory, we start with the stoichiometric matrices *P* and *M* of the whole-cell model (15). Choosing an EGS requires first choosing metabolite concentrations **x** and a growth rate *μ*, and then choosing a feasible basis *D*, which gives rise to a square invertible matrix *B*_{D} corresponding to a vertex of the polytope defined in (31). We may therefore also restrict the stoichiometry matrices to *P*_{D} and *M*_{D} accordingly.

In genome-scale metabolic networks that only model the metabolic part and not the enzymatic and ribosomal part of self-fabrication, a virtual biomass reaction is added. This reaction consumes all components necessary for cell synthesis in (approximately) the right proportions. The growth rate is assumed to be proportional to the rate of the biomass reaction. In the following theorem we find, for each EGS, a suitable biomass reaction, such that the metabolic part of the network (*P*_{D}) appended with this virtual reaction has one EFM. The flux values of this EFM approximate the flux values of the EGS. The correction is of order *μ*, which is usually several orders of magnitude smaller than the flux values.

**Theorem 6**. *Let* **α** *be an EGS for metabolite concentrations* **x** *and growth rate μ, with feasible basis D. Let* **v** *be the corresponding flux vector and* **w** *the corresponding enzyme/ribosome synthesis flux vector. Let* **ϕ** = *M* **w**, *where M is the enzyme synthesis stoichiometry matrix, and assume that* [*P*_{D} −**ϕ**] *has full rank. Then we may write*
(35)
*where the leading term is the unique, appropriately scaled EFM of the stoichiometric matrix* [*P*_{D} −**ϕ**].

*Proof*. Let *P* = *P*_{D}. At balanced growth, the flux vectors **v** and **w** satisfy

where

These vectors satisfy the steady state equations, i.e.,

Setting **ϕ** = *M* **w** and *P*_{ϕ} = [*P* −**ϕ**], we can see that **v** is given by the linear set of equations
(36)

We analyse this problem for fixed **x** and **ϕ**; it must have solutions, since **v** is already one such solution. We denote by **u** a solution to *P*_{ϕ}**u** = *μ* **x**. Since we have assumed that *P*_{ϕ} has full rank, the solution space must be one-dimensional. We further exploit that *P*_{ϕ} has full rank by using row-reduction to write . Row-reduction does not change the solution space, so, if we let *u*_{n} denote the first *n* entries of **u**, solves if and only if it also solves . The latter is solved by , so the first must be too, such that this gives us a particular solution. The one-dimensional solution space is then found by adding a scalar multiple of a vector, **u**\*, from the null space of *P*_{ϕ}. So,

The vector **u**\* is the EFM of the matrix *P*_{ϕ}. Let *t*\* be chosen such that this solution coincides with the known solution **v**. (This is always possible, since we already have a solution **v**.) With this choice of *t*\*, we deduce

The theorem states that [*P*_{D} −*M* **w**] should have maximal rank. This requires at least that rank *P*_{D} ≥ *m* − 1, since appending a single column can raise the rank by at most one. This does not follow directly from the fact that *B*_{D}(**x**, *μ*) has full rank; one can easily construct a counterexample. We do expect that *P*_{D} has full rank in many cases.
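Computationally, the EFM in Theorem 6 is the one-dimensional null space of *P*_{ϕ} = [*P*_{D} −**ϕ**]. As an illustration, the sketch below takes a hypothetical two-step pathway (uptake S → A, then A → B) with an assumed precursor demand **ϕ** = (0.3, 0.7) for A and B, computes the null vector, and scales it so that its last entry (the virtual biomass reaction) equals 1, which is one possible choice of "appropriate scaling". The pathway, the numbers and the helper `biomass_efm` are invented for this example.

```python
import numpy as np
from scipy.linalg import null_space

def biomass_efm(P, phi):
    """Null vector of [P, -phi], scaled so the virtual biomass flux equals 1."""
    P_phi = np.column_stack([P, -phi])
    K = null_space(P_phi)
    if K.shape[1] != 1:
        raise ValueError("[P -phi] must have a one-dimensional null space")
    u = K[:, 0]
    return u / u[-1]                       # last entry: virtual biomass reaction

# Hypothetical pathway: rows = metabolites A, B; columns = v1 (S -> A), v2 (A -> B)
P = np.array([[1.0, -1.0],
              [0.0,  1.0]])
phi = np.array([0.3, 0.7])                 # assumed precursor consumption M w
V = biomass_efm(P, phi)
print(V)
```

Here the first two entries are the metabolic fluxes of the EFM and the last one is the virtual biomass flux.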

#### The approximating EFM is constant if the precursor consumption rates change proportionally.

The relevant EFM for an EGS is thus the vector **V** spanning the null space of the matrix *P*_{ϕ}, excluding the last element *V*_{n+1}. The total precursor consumption rate, **ϕ** = *M* **w**, is part of the EGS solution and is therefore not known a priori. In this section, however, we assume that the relative precursor consumption rates are constant, making **ϕ** a fixed vector, up to a proportionality constant. We show that under this assumption, the EGS is approximated by the same EFM for all metabolite concentrations and growth rates for which *D* remains a feasible basis.

**Theorem 7**. *Let* *α*^{0} *be an EGS for metabolite concentrations* *x*^{0} *and growth rate μ*^{0}, *with feasible basis D. Let* **v** *be the corresponding flux vector and* **w** *the corresponding enzyme synthesis flux vector. Let* *ϕ*^{0} = *M* **w**, *where M is the enzyme synthesis stoichiometry matrix, and assume that* [*P*_{D} −*ϕ*^{0}] *has full rank. Let* *V*_{n} *be the EFM that approximates the flux values of* *α*^{0}, *according to Theorem 6. Moreover, assume that* **ϕ**(**x**, *μ*) = *h*(**x**, *μ*) *ϕ*^{0} *with h some scalar function. Then, for all μ and* **x** *such that D is a feasible basis, the flux values* **v**(**x**, *μ*) *are approximated by the same EFM*:
(37)

*Proof*. Following the proof of Theorem 6, we know that the flux values of an EGS are approximated by the first *n* coordinates of a vector that is in the null space of *P*_{ϕ} = [*P* −**ϕ**]. Since **ϕ**(**x**, *μ*) = *h*(**x**, *μ*) *ϕ*^{0}, we can set **V**_{ϕ(x, μ)} ≔ (*h*(**x**, *μ*)*V*_{1}, …, *h*(**x**, *μ*)*V*_{n}, *V*_{n+1})^{T}. We have *P*_{ϕ}**V**_{ϕ(x, μ)} = *h*(**x**, *μ*) [*P* −*ϕ*^{0}] **V** = 0, so **V**_{ϕ(x, μ)} lies in the null space of *P*_{ϕ}.

Note that the first *n* entries of ** V** are all multiplied by the same scalar. So, the ratios of all metabolic reaction rates (not including the virtual biomass reaction) are constant.
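The scaling argument is easy to check numerically: multiplying **ϕ** by a scalar *h* multiplies the metabolic entries of the biomass-normalised null vector of [*P* −**ϕ**] by *h*, so the ratios between metabolic fluxes do not change. The sketch below uses a hypothetical two-step pathway and reference precursor vector invented for illustration, a hypothetical helper `biomass_efm`, and arbitrary values of *h*.

```python
import numpy as np
from scipy.linalg import null_space

def biomass_efm(P, phi):
    # Null vector of [P, -phi], scaled so the virtual biomass flux equals 1
    u = null_space(np.column_stack([P, -phi]))[:, 0]
    return u / u[-1]

P = np.array([[1.0, -1.0],     # hypothetical: metabolite A
              [0.0,  1.0]])    # hypothetical: metabolite B
phi0 = np.array([0.3, 0.7])    # assumed reference precursor consumption

base = biomass_efm(P, phi0)
for h in (0.5, 2.0, 3.0):
    V = biomass_efm(P, h * phi0)
    # metabolic entries scale by h, so their ratios stay those of the base EFM
    print(h, V[:-1] / base[:-1])
```

Each printed ratio vector has both entries equal to *h*, which is the statement of Theorem 7 for this toy.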

In Fig 5 we provide evidence that the amino acid composition of cells growing in different media is practically the same. With the above theorem, this indicates that flux values of EFMs likely closely resemble those of EGSs, provided that these EFMs were calculated for an accurately estimated biomass reaction.

Bar plots give relative amino acid frequencies in minimal (left column) and rich medium (middle column). Right column illustrates the cumulative frequency distributions used in the Kolmogorov-Smirnov (KS) test. KS statistic values (which measure the maximal difference between these two cumulative distributions) for the three studies are 3.5 ⋅ 10^{−4}, 3.6 ⋅ 10^{−3} and 9.3 ⋅ 10^{−3}, indicating that these distributions can hardly be distinguished (as is evident from the plots). Data (available at proteomaps.net) from [45] (top row; *E. coli*), [46] (middle row; *S. cerevisiae*) and [47] (bottom row; *E. coli*). See [48] for another experimental example from *Lactobacillus plantarum*.
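The KS statistic quoted in the caption is the largest absolute difference between two cumulative frequency distributions. The sketch below computes it for two frequency vectors with a hypothetical helper `ks_statistic`; the three-category frequencies are made-up numbers for illustration, not the proteomaps data, and for categorical data such as amino acids the value depends on the (assumed) order in which the categories are listed.

```python
import numpy as np

def ks_statistic(freq_a, freq_b):
    """Max |F_a - F_b| between cumulative distributions of two frequency vectors.

    Both vectors must list the categories (e.g. amino acids) in the same order.
    """
    pa = np.asarray(freq_a, dtype=float)
    pa = pa / pa.sum()
    pb = np.asarray(freq_b, dtype=float)
    pb = pb / pb.sum()
    return np.max(np.abs(np.cumsum(pa) - np.cumsum(pb)))

# Hypothetical relative frequencies for three categories in two media
minimal = [0.50, 0.30, 0.20]
rich = [0.45, 0.35, 0.20]
print(ks_statistic(minimal, rich))
```

With these made-up numbers the largest cumulative difference occurs at the first category.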

#### The dependence of EGSs on growth rate.

The approximation of EGS flux values by a fixed EFM, up to a correction of order *μ*, can be used to investigate the dependence of an EGS, **α**, on the growth rate. If we make the additional assumption that the dilution of metabolites is negligible, we can even make the *μ*-dependence explicit.

**Theorem 8**. *Let* *α*^{0} *be an EGS for metabolite concentrations* *x*^{0} *and growth rate μ*^{0}, *with feasible basis D. Let* **V** *be the EFM that approximates the flux values of* *α*^{0}, *according to Theorem 6. Moreover, assume that* **ϕ**(**x**, *μ*) = *h*(**x**, *μ*) *ϕ*^{0} *with h some scalar function, and that the dilution of metabolites is negligible compared to metabolic fluxes,* *μx*_{k} = 0. *Then there is an upper bound for the growth rate attained in this subnetwork,* *μ*_{ub}, *and the dependence of the EGS* **α** *on μ* ∈ [0, *μ*_{ub}] *is given by*
(38)
*where*

*Proof*. According to Theorem 7, the reaction rates are given by **v**(**x**, *μ*) = (*h*(**x**, *μ*)*V*_{1}, …, *h*(**x**, *μ*)*V*_{n})^{T}. Consequently, the enzyme concentrations can be calculated from these reaction rates (via the enzyme kinetics), and also from the steady-state expression for the enzyme concentrations in terms of the *α*_{j}’s derived above. If we isolate *α*_{j} from these two expressions for the enzyme concentrations, we get:
(39)

Note that the *μ*-dependent part of this expression is equal for all metabolic reactions *j* ≤ *n*. This shows that the first *n* coordinates of the EGSs change proportionally when the growth rate changes at fixed metabolite concentrations.

Given this information we can make the *μ*-dependence even more explicit, by reconsidering the balanced growth equations given in (27) and (28). Let us start from an EGS *α*^{0}(*x*_{0}, *μ*_{0}). We know that the EGS satisfies *A*(*x*_{0}, *μ*_{0})*α*^{0}(*x*_{0}, *μ*_{0}) = 0. For general *μ* these equations, under the assumption that the dilution of metabolites is negligible, are given by

A metabolite for which *M*_{k(n+1)} = 0 yields no information on *H*(*x*_{0}, *μ*), but we know that there is at least one *k* such that *M*_{k(n+1)} > 0, because the synthesis of ribosomes needs at least one building block. We pick such a *k* to isolate *H*(*x*_{0}, *μ*):

Note that *H*(*x*_{0}, *μ*) is an increasing function of the growth rate, with a vertical asymptote at *μ* = *μ*_{ub}. We thus see that, under the assumptions that metabolite dilution is negligible and that the biomass composition remains relatively constant, the specific index set *D* gives rise to a valid EGS for the range [0, *μ*_{ub}]. However, the ribosome allocation constraint |**α**| ≤ 1 will already constrain growth before *μ*_{ub} is approached. For each EGS we thus get an upper bound on, but not an accurate estimate of, the maximum achievable growth rate.

This upper bound is due to the following mechanism: expressing an enzyme at a higher concentration leads both to a higher demand for its building blocks (because more enzyme is diluted per unit time) and to a higher supply of these building blocks (since the enzymes catalyse the production of building blocks). However, the demand for building block *k*, , scales with *μ*, while the supply, , does not. The upper bound on the growth rate, *μ*_{ub}, is the growth rate at which expressing more protein increases the demand for building blocks more than it increases their production: . Beyond this growth rate, the building blocks can no longer be kept in steady state.

#### Biological interpretation and consequences.

The results in this section have several important implications. First of all, we can now understand more deeply why Flux Balance Analysis has been such a good predictor of microbial metabolism. The main reason is that, when relative amino acid usage (i.e., the relative rates of amino acid consumption for enzyme and ribosome synthesis) is constant across conditions, much of the nonlinearity in self-fabrication disappears. A linear model that disregards enzyme synthesis, but replaces it by an accurately estimated biomass reaction, will predict flux values that are close to those of the corresponding nonlinear model in which enzyme and ribosome synthesis are included.

Second, these constant relative amino acid consumption rates indicate why growth in microbes is regulated at the amino acid level. The cell must maintain constant amino acid consumption rates, and has immediate end-product inhibition in place, whereby an overabundance of a particular amino acid shuts down the synthesis of that amino acid by allosteric inhibition.

Third, we have shown that under this ‘amino-acid assumption’, all the enzyme allocation fractions *α*_{1}, …, *α*_{n} change by the same factor when the growth rate changes at fixed metabolite concentrations. In other words, not only is the EGM kept the same (the cell keeps using the same reactions), but the relative investments in the different enzymatic reactions also remain the same! This could explain why microbes such as *E. coli* have the alarmone ppGpp, which signals the depletion of amino acids and shifts allocation from protein synthesis capacity (i.e., more ribosomes) to metabolism (i.e., more metabolic enzymes) [33, 34]. Since the *relative* enzyme synthesis rates do not change with growth rate, the cell needs only to control *α*_{n+1} versus the rest. This dramatically simplifies the overall optimisation problem.

Fourth, under the additional assumption that the dilution rates of metabolites are negligible compared to their metabolic production and consumption rates, we identified an upper bound on the growth rate. As this upper bound is approached, the ribosomal allocation fractions *α*_{j} increase without bound. Therefore, the upper bound will in reality never be reached, because the ribosomal allocation constraint |**α**| ≤ 1 becomes limiting first. However, the upper bound still shows that the costs of growth increase nonlinearly with the growth rate, and that there is a fundamental limit to self-fabrication rates.
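The third and fourth points can be made concrete with a toy calculation. The Python sketch below is purely illustrative and does not reproduce Eq (38). It assumes, hypothetically, that every enzymatic fraction has the form *α*_{j} = *c*_{j} *G*(*μ*), with a made-up increasing function *G* that has a vertical asymptote at *μ*_{ub}, and that the ribosomal fraction is *α*_{n+1} = *μ*/*κ*. It also reads |**α**| as the sum of the non-negative fractions. The constants `C`, `KAPPA` and `MU_UB` are invented. Bisection then finds the largest growth rate compatible with this sum being at most 1, which lies strictly below *μ*_{ub}, while the ratios between the enzymatic fractions stay fixed.

```python
# Purely illustrative toy: the functional forms and constants are invented
# and are NOT the paper's Eq (38).
MU_UB = 1.0               # hypothetical upper bound mu_ub of the subnetwork
KAPPA = 2.0               # hypothetical constant: alpha_{n+1} = mu / KAPPA
C = [0.10, 0.05, 0.15]    # hypothetical mu-independent factors c_j, j = 1..n


def G(mu):
    """Made-up increasing function of mu with a vertical asymptote at MU_UB."""
    return mu / (MU_UB - mu)


def allocation(mu):
    """Allocation fractions [alpha_1, ..., alpha_n, alpha_{n+1}] at growth rate mu.

    All enzymatic fractions share the same mu-dependence G(mu), so their
    ratios are fixed; the ribosomal fraction is linear in mu.
    """
    return [c * G(mu) for c in C] + [mu / KAPPA]


def max_growth_rate(tol=1e-12):
    """Largest mu in [0, MU_UB) with sum(alpha) <= 1, found by bisection.

    sum(allocation(mu)) is increasing, equals 0 at mu = 0 and diverges as
    mu -> MU_UB, so the allocation constraint binds strictly below MU_UB.
    """
    lo, hi = 0.0, MU_UB
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(allocation(mid)) <= 1.0:
            lo = mid
        else:
            hi = mid
    return lo


mu_max = max_growth_rate()
print(f"mu_max = {mu_max:.4f} < MU_UB = {MU_UB}")
for mu in (0.2, 0.5):
    alpha = allocation(mu)
    ratios = [round(a / alpha[0], 3) for a in alpha[:-1]]
    print(f"mu = {mu}: enzyme ratios {ratios}, alpha_ribosome = {alpha[-1]:.3f}")
```

The enzyme ratios printed for the two growth rates are identical, and only the ribosomal fraction and the common factor *G*(*μ*) change. In this sense, the cell in this toy only has to tune the ribosomal fraction against the total enzymatic fraction.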

## Discussion

### A biochemical theory of balanced, unicellular self-fabrication

Self-fabrication, self-repair and phenotypic adaptation to new environments are defining characteristics of autonomously living organisms. Understanding them in terms of the underlying biochemistry is a key challenge in cell biology. In this work, we focused on the average cell in a population that grows in a balanced fashion (arguably the only well-defined state in microbiology [35]) in a static environment. We defined the phenotype of such an average cell as the complete set of concentrations of all biochemical components, and all rates of chemical reactions. An expression for cellular growth was derived in terms of the production rates of cellular components. We then asked which phenotypes could give rise to steady-state balanced growth.

To get a mechanistic understanding of how these different phenotypes can be sustained by a cell, we have derived a theory in which the quantities that are directly controlled by gene expression, the allocation fractions of the ribosome, are the free variables (inspired by [3]). We ignored the precise mechanism by which gene expression gives rise to these allocation fractions, but this does not influence the main findings of our theory. In currently used modelling approaches, the free variables are often quantities that the cell controls only indirectly, such as reaction rates or enzyme concentrations. By taking this new perspective, we obtain a full description of the problem that cells need to solve by means of gene expression, instead of only a description of the possible solutions to this problem.

### EGMs: The minimal phenotypes that lead to balanced growth

The key modelling assumption that turns the inaccessible balanced growth system (9) for general chemical systems into one with sufficient structure (22)–(25) for biological systems is that enzymes and ribosomes catalyse reactions in a linear fashion: doubling the concentration of these catalysts doubles the reaction rates. This introduces just enough linearity into the balanced growth equations to make linear programming techniques applicable.

To describe all balanced growth phenotypes, we identified EGMs: the minimal modes of gene expression that lead to balanced growth; all possible phenotypes are convex combinations of these minimal modes. The EGMs can thus be seen as the regulatory degrees of freedom of the cell. One could instead regard the expression of single enzymes as the most important degrees of freedom, but an enzyme alone can never lead to a self-fabricating system. Even an entire pathway that carries a steady-state flux does not lead to a self-fabricating system, unless it synthesises all cellular building blocks that are needed for the enzymes catalysing this pathway. Regulation should therefore not involve numerous decisions about independent enzyme expression levels; rather, the decision should be how and when to express one or a few EGMs.

In one specific environment, we can calculate all minimal growth-supporting enzyme expression states, which we called Elementary Growth States. However, when the environment changes, the EGSs change with it, i.e., the specific allocation fractions of the ribosome change: changing nutrient levels cause changes in enzyme saturation and reaction rates, leading to new enzyme concentrations and a new growth rate. Since we wanted to find the regulatory degrees of freedom of the cell, the EGSs were not very useful: biological environments are noisy, so these degrees of freedom would fluctuate constantly. That is why we defined the Elementary Growth Modes as equivalence classes of Elementary Growth States: two EGSs with the same set of expressed enzymes belong to the same EGM. Under small fluctuations in the environment, the EGSs change, but the set of EGMs is constant. The set of growth-supporting EGMs *can* change due to a larger change in the environment, e.g., the depletion or appearance of nutrients.

The EGMs are analogous to EFMs, objects defined in models in which no explicit synthesis of enzymes and the ribosome is considered. EFMs are defined as the minimal (i.e., non-decomposable) combinations of reaction rates that support a steady-state flux [9]. So, whereas EGMs are the minimal building blocks of balanced growth, EFMs are the building blocks of metabolism. While EGMs are defined as equivalence classes of EGSs, EFMs are defined up to a normalization factor and are therefore also equivalence classes of steady-state flux vectors. Moreover, we showed that, as long as dependent metabolites (for example, those arising from so-called moieties) are removed, an EGM has exactly as many active metabolic reactions as non-zero metabolites; the ribosome-catalyzed synthesis adds another reaction (Theorem 2). EFMs, also after removing dependent metabolites, have one more active reaction than the number of used metabolites. The additional reaction in the EFM is the (virtual) biomass reaction, which thus plays a role similar to that of the ribosome-catalyzed reaction.
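The reaction-counting statement of Theorem 2 mirrors a standard property of linear programs: a basic feasible solution of a system $B\boldsymbol{\alpha} = \mathbf{b}$, $\boldsymbol{\alpha} \geq 0$ with $m$ independent rows has at most $m$ non-zero entries. The Python sketch below illustrates this under loudly stated assumptions. The matrix `B` is a made-up numerical stand-in (two metabolite-balance rows plus one normalisation row, at fixed metabolite concentrations and growth rate) and is not the paper's balanced-growth system (27)–(28). The helper `basic_feasible_solutions` is hypothetical and enumerates candidate bases by brute force, which is only practical for very small networks.

```python
import itertools

import numpy as np

# Made-up stand-in for the balanced-growth equations at fixed x and mu.
# Columns 0-3: hypothetical metabolic enzymes; column 4: the ribosome.
B = np.array([
    [1.0, 2.0, -1.0, 0.0, -1.0],  # balance of metabolite 1
    [0.0, 0.0,  1.0, 1.0, -1.0],  # balance of metabolite 2
    [1.0, 1.0,  1.0, 1.0,  1.0],  # normalisation row (assumed)
])
b = np.array([0.0, 0.0, 1.0])


def basic_feasible_solutions(B, b, tol=1e-12):
    """Return all vertices of {alpha >= 0 : B @ alpha = b} by brute force.

    Every set of m linearly independent columns is tried as a basis D;
    the basic solution is kept if all its entries are non-negative.
    Degenerate vertices may be returned more than once.
    """
    m, n = B.shape
    vertices = []
    for basis in itertools.combinations(range(n), m):
        cols = list(basis)
        B_D = B[:, cols]
        if np.linalg.matrix_rank(B_D) < m:
            continue  # these columns do not form a basis
        alpha_D = np.linalg.solve(B_D, b)
        if np.all(alpha_D >= -tol):
            alpha = np.zeros(n)
            alpha[cols] = alpha_D
            vertices.append(alpha)
    return vertices


vertices = basic_feasible_solutions(B, b)
for alpha in vertices:
    support = [int(j) for j in np.flatnonzero(alpha > 1e-12)]
    print("active columns:", support, "alpha:", np.round(alpha, 3))
```

In this toy system, each of the four vertices uses two metabolic columns plus the ribosome column, i.e., as many metabolic reactions as metabolites, plus one. Any other feasible allocation, such as the average of two vertices, uses more columns and is a convex combination of these minimal solutions.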

### Growth rate is maximised by using a small number of active EGMs

Evolutionary theory suggests that, in stable environments, selection favours microorganisms that express the phenotype that maximises the growth rate. One of our aims was therefore to describe, in any given condition, the phenotype with the maximal growth rate.

We proved in Theorem 4 that the highest possible growth rate in any fixed environment is attained in an EGM. This makes intuitive sense: by reducing the number of active reactions, the redundancy in the reaction network is reduced. When fewer different proteins are used, the remaining ones can be expressed to a higher level. To maximise the growth rate, the cell should thus express only the pathways that are most efficient in terms of resources. This ultimately leads to cellular states in which no redundant reactions are active, i.e., with only one active EGM, such that none of the reactions can be removed without stopping growth. Again, EGMs prove to be analogous to EFMs, since it was proven that the metabolic flux through a proteome-constrained metabolic network is maximised in an EFM [17, 18].

Because cells and their compartments have a limited size, and because each molecule takes up a certain volume, molecule concentrations in a cell are bounded. As a consequence, biochemical constraints arise for weighted sums of molecule concentrations, in particular on enzyme concentrations, since most cellular biomass consists of protein. These constraints might change the optimal solution. For example, an EGM that is very efficient in terms of cytosolic volume might be very inefficient with respect to the limited membrane area. The first of these constraints was already incorporated when we assumed a constant average density (Eq (4)), but additional constraints may be active and can be imposed on the model. We proved (Theorem 5) that a small number of EGMs still constitutes the optimal solution when additional constraints are added. However, at most one EGM can be added for each constraint that is imposed. Given that the constant average density constraint was already active, the number of active EGMs is thus bounded by the number of constraints on enzyme expression. We hereby generalized yet another existing result for EFMs [19] to their self-fabricating counterparts.
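The bound on the number of active modes rests on the same linear-programming argument that was used for EFMs in [19]. As a hedged illustration, and not the proof of Theorem 5, the Python sketch below assumes four hypothetical modes with made-up objective values `c` and made-up usage of two limited resources (the rows of `R`, standing in for, e.g., cytosolic volume and membrane area). It maximises the objective over non-negative mode weights by checking every vertex of the feasible region. With two constraints, the optimum can use at most two modes. The helper `best_vertex` is illustrative only.

```python
import itertools

import numpy as np

# Hypothetical data: four candidate modes (columns) and two limited
# resources (rows). All numbers are invented for illustration.
c = np.array([1.0, 1.2, 1.5, 0.9])       # objective gained per unit of each mode
R = np.array([
    [1.0, 1.0, 1.4, 0.8],                # use of resource 1 (e.g. cytosolic volume)
    [0.5, 1.2, 1.0, 0.2],                # use of resource 2 (e.g. membrane area)
])
cap = np.array([1.0, 0.6])               # available amount of each resource


def best_vertex(c, R, cap, tol=1e-12):
    """Maximise c @ lam subject to R @ lam <= cap and lam >= 0.

    Slack variables turn the K inequalities into equalities; every choice
    of K linearly independent columns gives a basic solution, and the LP
    optimum is attained at one of the feasible ones.
    """
    K, p = R.shape
    A = np.hstack([R, np.eye(K)])        # [R | I] @ [lam; slack] = cap
    best_val, best_lam = -np.inf, None
    for basis in itertools.combinations(range(p + K), K):
        cols = list(basis)
        if np.linalg.matrix_rank(A[:, cols]) < K:
            continue                     # not a basis
        z = np.zeros(p + K)
        z[cols] = np.linalg.solve(A[:, cols], cap)
        if np.any(z < -tol):
            continue                     # basic but infeasible
        lam = z[:p]
        if c @ lam > best_val:
            best_val, best_lam = float(c @ lam), lam
    return best_val, best_lam


val, lam = best_vertex(c, R, cap)
print(f"optimum {val:.4f} using modes {np.flatnonzero(lam > 1e-12).tolist()}")
```

With these numbers, the optimum mixes two modes (indices 1 and 3), because both resource constraints are binding; with only one binding constraint, a single mode would suffice. For anything beyond toy size, an LP solver such as `scipy.optimize.linprog` would replace the enumeration of bases.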

Evolution of metabolism proceeds via mutations that either affect the kinetics of enzymes, increasing the activity per unit enzyme, or affect the regulation of protein expression. The first type of evolution changes the properties of the EGMs, while the second changes the number or identity of the EGMs that are expressed. The above theorems concern the second type of evolution; they predict that, regardless of the environment or the specific properties of the growth-supporting EGMs, only a small number of EGMs should eventually be selected. How close microorganisms currently are to this optimal state is unclear and awaits experimental investigation, although indirect evidence is mounting that cells operate close to it [19, 36, 37].

### EGMs are the basic building blocks of cellular growth models

In order to fully describe the phenotypes that can sustain balanced growth, we needed to incorporate 1) a direct coupling between reaction rates and growth rate, 2) explicit synthesis of enzymes and the ribosome, and 3) nonlinear and metabolite-dependent enzyme kinetics. All three aspects added nonlinearity to our theory; the first makes it impossible to prescribe the growth rate and solve for all remaining variables in steady state, the second makes the solutions depend nonlinearly on the growth rate, and the third introduces a nonlinear dependence on metabolite concentrations. In the resulting theory, all phenotypes and their corresponding growth rates can in principle be calculated, but these computations are currently not feasible for a genome-scale model. We thus developed a quantitative framework that led to general theoretical results, but no quantitative predictions can be made for a specific organism at present.

Fortunately, our theory can be simplified in various ways, resulting in computationally feasible models (see [24] for a more extensive review of different modelling approaches). Elementary Growth Modes remain the minimal building blocks, and the above-mentioned maximiser theorems still hold. A commonly used simplification is to consider a small coarse-grained whole-cell model [3, 11, 13, 24]. These models are computationally feasible because of their small size, so no further simplifications are needed; EGM theory is directly applicable to them. Another popular modelling approach ignores the explicit synthesis of enzymes and the ribosome, replacing it by a constant biomass reaction as a sink for cell components. These approaches consider either medium-scale models with nonlinear enzyme kinetics [38, 39], genome-scale models without metabolite concentrations and with only a maximal catalytic rate (*k*_{cat}) as enzyme kinetics [40], or genome-scale models without enzyme kinetics (FBA [5]). In all these approaches, the solutions can be written as convex combinations of EFMs, which can be seen as the linear approximations of EGMs (Theorem 6). The recently developed Metabolism and Expression (ME) models currently make the fewest simplifying assumptions in modelling cellular growth [22, 41, 42]: enzyme and ribosome synthesis are explicitly modelled, but enzyme kinetics are replaced by a single catalytic rate. The minimal building blocks of these models are thus EGMs in which the saturation function is replaced by a constant and the dilution of metabolites is ignored. Very recently, theory was also developed for kinetic models with explicit synthesis of one type of protein [43]. This approach replaces the constant density assumption by an upper bound on this density, and assumes that the ribosome (i.e., the one protein) is always fully occupied. This enabled the authors to further analyse the optimal growth state, which was indeed an EGM.

### A constant amino acid composition greatly simplifies self-fabrication

Experimental data suggest that the amino acid composition of cells is constant across different growth conditions (Fig 5). This can only be maintained when amino acids are consumed at rates with constant ratios. Under that assumption, EGMs simplify considerably and become much more “linear” objects: the flux values of the EGM can be approximated by an EFM. Under the additional assumption that the dilution of metabolites is negligible, and within one EGM at fixed metabolite concentrations, the fractions of the ribosome allocated to producing enzymes, the *α*_{j}’s, all have the same dependence on the growth rate (Theorem 8). In other words, the relative allocation among these enzymatic *α*_{j}’s remains unchanged. The fraction of the ribosome allocated to producing ribosomes, *α*_{n+1}, scales differently: it changes linearly with the growth rate. As a result of this simplification, the cell needs to tune only two variables to control enzyme and ribosome synthesis: the overall fraction of the ribosome allocated to making enzymes, and the fraction allocated to ribosome production. The ppGpp mechanism in *E. coli*, which acts on amino acid abundances and controls the balance between metabolism and ribosome synthesis, fits perfectly with this structure [33, 34].

### Stresses and non-enzymatic reactions can be considered in terms of EGMs

Although the theory is presented for whole-cell models that are geared solely towards growth and self-fabrication, it does not preclude non-enzymatic (i.e., diffusive) reactions, stress responses, and maintenance or homeostasis activities. Processes that are not directly related to growth, such as heat shock responses or the removal of toxins, may all be incorporated. As long as the proteins involved in those processes act linearly on the reaction rates, growth rate maximisers will again be EGMs of the corresponding model. In Fig 6 (code in S1 Text) we provide an example of a situation where it is optimal to invest in a stress response, in this case the removal of a toxin. The effect of the toxin was modelled by adding an inhibitory effect of the toxin on all metabolic enzymes.

**(A)** The example network contains five metabolites; *x*_{1} to *x*_{4} are precursor molecules and *x*_{3} and *x*_{4} play the role of amino acids. An external toxin can diffuse over the membrane and, when inside the cell, inhibits the catalytic activity of all proteins. A fifth protein, in purple, destroys the toxin. **(B)** Without toxin present, investment is divided over the four proteins and the ribosome to maximise steady state growth rate. With toxin present, the toxin-degrading protein is also expressed, even though it does not directly contribute to growth. However, without expressing this stress protein, the maximal growth rate would have been lower. **(C)** Investment in proteins and ribosomes to attain maximal balanced growth rate as a function of toxin concentration. Note that in this example, a switch is observed, from an EGM with four metabolites to one with five. Full detail and code may be found in S1 Text.

### Open problems and outlook

Clearly, many open theoretical problems remain. For example, we have described the (optimal) phenotypes that lead to steady-state self-fabrication in static environments, but we do not know whether the same phenotypes are evolutionarily favourable in dynamic environments. Moreover, even if we know that the optimum is attained in one, or a combination of a few, EGMs, we do not know how a cell should find these optimal protein expression states with its regulatory circuitry. Further, it is unclear to us whether more existing results for EFMs can be generalised to EGM theory. It is known, for example, that finding the metabolite concentrations that maximise the specific flux in an EFM is a strictly convex optimisation problem [37, 44], but we do not know if an analogous result for EGMs can be proved.

We believe that a sound fundamental theory of microbial growth could help to better understand the common denominators underlying qualitatively similar physiological behaviours of evolutionarily distinct microorganisms. Systems biology is, however, remarkably short of experimentally testable theories, in contrast to other systems sciences, such as statistical physics and population genetics. This is somewhat surprising, since it is apparent, even from the history of microbiology itself, that abstract theory can aid in the understanding of concrete phenomena. The understanding of single-cell physiology has profited greatly, for instance, from theories on stochastic fluctuations in molecular circuits, and the introduction of enzyme kinetics theories revolutionised enzyme biochemistry.

The EGM theory presented here should at least improve our understanding of the biomass composition of cells and of the relation between growth rate and reaction rates, and it should enable better prediction of the proteome constraints that limit growth rate and of transcription levels, and hence also enzyme levels.

We hope that this paper contributes to a growing body of ‘biomathematical’ theories that eventually provide basic answers about the molecular basis of life, starting with microorganisms. We are convinced that such a theory is within reach. The next frontier we foresee is understanding the metabolic behaviour of microorganisms in terms of growth rate maximisation and constrained protein expression: a general biomathematical theory that merges enzyme biochemistry, metabolic network reconstructions and evolutionary theory.

## Supporting information

### S1 Text.

Supplementary text with 4 main parts: **1. A detailed derivation of the growth rate in terms of reaction rates, at balanced growth.** Here we give a more detailed exposition of the growth rate in terms of reaction rates, and in particular discuss the assumption that the cellular volume is the combined volume of all metabolites, enzymes and ribosomes. **2. Some basic Linear Programming.** The essential notions from Linear Programming necessary to follow the proofs in the main text. **3. A single pathway toy model showing the three inherent nonlinearities of self-fabrication.** A fully worked out example of the theory. We contrast a small self-fabricating model with its simpler non-self-fabricating counterpart, and explore how the additional nonlinear effects present in the self-fabricating models appear, and how they change fluxes, concentrations, growth rates, and their interdependence. **4. Extensions to EGM theory.** We discuss extensions to the theory: adding non-enzymatic reactions, adding a positivity constraint for the ribosomal constraint, and adding stress responses.

https://doi.org/10.1371/journal.pcbi.1007559.s001

(PDF)

### S1 Table. Overview of all variables and parameters used in this paper.

https://doi.org/10.1371/journal.pcbi.1007559.s002

(PDF)

### S1 Fig. Results for the self-fabrication model.

All results reflect optimal allocation for maximal growth rate.

https://doi.org/10.1371/journal.pcbi.1007559.s003

(TIF)

### S2 Fig. Results for the non self-fabricating model.

All results reflect optimal allocation for maximal biomass production rate.

https://doi.org/10.1371/journal.pcbi.1007559.s004

(TIF)

### S3 Fig. Enzyme concentrations and fluxes are no longer always proportional when nonlinear enzyme kinetics are modelled.

Values shown correspond to optimal growth rate solutions of the model.

https://doi.org/10.1371/journal.pcbi.1007559.s005

(TIF)

### S4 Fig. The demand for precursors as fractions of the total demand for biosynthesis.

The left figure shows the results for our toy model of self-fabrication, while the right figure shows the results for the corresponding conventional model.

https://doi.org/10.1371/journal.pcbi.1007559.s006

(TIF)

### S5 Fig. The demand for precursors as fractions of the total demand for biosynthesis.

Compared to the model results presented in S4 Fig, there is more variation in the precursor demand for protein and ribosomes. The left figure shows the results for our toy model of self-fabrication, while the right figure shows the results for the corresponding conventional model.

https://doi.org/10.1371/journal.pcbi.1007559.s007

(TIF)

### S6 Fig. The total enzyme synthesis rate is not always proportional to the growth rate.

If the dilution rate of metabolites is no longer negligible compared to their metabolic turnover, the growth rate can increase without a proportional increase in enzyme synthesis.

https://doi.org/10.1371/journal.pcbi.1007559.s008

(TIF)

### S7 Fig. A model of self-fabrication that shows overflow metabolism.

https://doi.org/10.1371/journal.pcbi.1007559.s009

(TIF)

### S1 Code. Matlab and Mathematica code to reproduce Fig 6 in main text.

https://doi.org/10.1371/journal.pcbi.1007559.s010

(ZIP)

### S2 Code. Matlab code to reproduce S1–S7 Figs in SI, corresponding to the fully worked out example model in SI Section 3.

https://doi.org/10.1371/journal.pcbi.1007559.s011

(ZIP)

## Acknowledgments

We thank Martin Lercher for helpful discussions and comments on a preliminary version of our manuscript. We are grateful to our colleagues Douwe Molenaar and Sieze Douwenga for their helpful opinion and constructive feedback, and to students Thomas Rooijakkers and Martino Pitruzzella for providing insight with their numerical explorations.

## References

- 1. Fishov I, Zaritsky A, Grover NB. On microbial states of growth. Mol Microbiol. 1995;15(5):789–794. pmid:7596281
- 2. Scott M, Gunderson CW, Mateescu EM, Zhang Z, Hwa T. Interdependence of Cell Growth and Gene Expression: Origins and Consequences. Science. 2010;330:1099–1102. pmid:21097934
- 3. Molenaar D, van Berlo R, de Ridder D, Teusink B. Shifts in growth strategies reflect tradeoffs in cellular economics. Mol Syst Biol. 2009;5:323. pmid:19888218
- 4. Price ND, Reed JL, Palsson BØ. Genome-scale models of microbial cells: evaluating the consequences of constraints. Nature Rev Microbiol. 2004;2:886–897.
- 5. Orth JD, Thiele I, Palsson BØ. What is flux balance analysis? Nature Biotechnol. 2010;28(3):245–248.
- 6. Schuster S, Fell DA, Dandekar T. A general definition of metabolic pathways useful for systematic organization and analysis of complex metabolic networks. Nature Biotechnology. 2000;18:326–332. pmid:10700151
- 7. Gagneur J, Klamt S. Computation of elementary modes: a unifying framework and the new binary approach. BMC Bioinf. 2004;5:175.
- 8. Bordel S, Nielsen J. Identification of flux control in metabolic networks using non-equilibrium thermodynamics. Metab Eng. 2010;12(4):369–377. pmid:20302968
- 9. Schuster S, Hilgetag C. On elementary flux modes in biochemical reaction systems at steady state. J Biol Systems. 1994;2:165–185.
- 10. Schuster S, Hilgetag C, Woods JH, Fell DA. Reaction routes in biochemical reaction systems: algebraic properties, validated calculation procedure and example from nucleotide metabolism. J Math Biology. 2002;45:153–181.
- 11. Weiße A, Oyarzún DA, Danos V, Swain PS. Mechanistic links between cellular trade-offs, gene expression, and growth. Proc Nat Acad Sciences USA. 2015; p. E1038–E1047.
- 12. Basan M, Hui S, Okano H, Zhang Z, Shen Y, Williamson JR, et al. Overflow metabolism in E. coli results from efficient proteome allocation. Nature. 2015;528:99–104. pmid:26632588
- 13. Scott M, Hwa T. Bacterial growth laws and their applications. Curr Opin Biotechnol. 2011;22:559–565. pmid:21592775
- 14. Müller S, Regensburger G, Steuer R. Resource allocation in metabolic networks: kinetic optimization and approximations by FBA. Biochem Soc Trans. 2015;43(6):1195–1200. pmid:26614660
- 15. Mori M, Hwa T, Martin OC, Martino AD, Marinari E. Constrained Allocation Flux Balance Analysis. PLoS Comp Biol. 2016;12(6):e1004913.
- 16. Goelzer A, Fromion V. Resource allocation in living organisms. Biochem Soc Trans. 2017;45(5):945–952. pmid:28687715
- 17. Wortel MT, Peters H, Hulshof J, Teusink B, Bruggeman FJ. Metabolic states with maximal specific rate carry flux through an elementary flux mode. FEBS Journal. 2014;281:1547–1555. pmid:24460934
- 18. Müller S, Regensburger G, Steuer R. Enzyme allocation problems in kinetic metabolic networks: Optimal solutions are elementary flux modes. J Theor Biology. 2014;347:182–190.
- 19. de Groot DH, Planqué R, van Boxtel C, Bruggeman FJ, Teusink B. The number of active metabolic pathways is bounded by the number of cellular constraints at maximal metabolic rates. PLoS Comp Biol. 2019;15(3):e1006858.
- 20. Goelzer A, Fromion V, Scorletti G. Cell design in bacteria as a convex optimisation problem. Automatica. 2011;47:1210–1218.
- 21. Lerman JA, Hyduke DR, Latif H, Portnoy VA, Lewis NE, Orth JD, et al. In silico method for modelling metabolism and gene product expression at genome scale. Nature Comm. 2012;3:929.
- 22. O’Brien EJ, Utrilla J, Palsson BO. Quantification and classification of *E. coli* proteome utilization and unused protein costs across environments. PLoS Comp Biol. 2016;12:e1004998.
- 23. You C, Okano H, Hui S, Zhang Z, Kim M, Gunderson CW, et al. Coordination of bacterial proteome with metabolism by cyclic AMP signalling. Nature. 2013;500:301–306. pmid:23925119
- 24. de Jong H, Casagranda S, Giordano N, Cinquemani E, Ropers D, Geiselmann J, et al. Mathematical modelling of microbes: metabolism, gene expression and growth. J Roy Soc Interface. 2017;14:20170502.
- 25. Heinrich R, Schuster S. The Regulation of Cellular Systems. Chapman and Hall (New York); 1996.
- 26. Schaechter M. A brief history of bacterial growth physiology. Frontiers in microbiology. 2015;6:289. pmid:25954250
- 27. Dill K, Bromberg S. Molecular Driving Forces: Statistical Thermodynamics in Biology, Chemistry, Physics, and Nanoscience. Garland Science; 2012.
- 28. McGuffee SR, Elcock AH. Diffusion, crowding & protein stability in a dynamic molecular model of the bacterial cytoplasm. PLoS computational biology. 2010;6(3):e1000694. pmid:20221255
- 29. Parry BR, Surovtsev IV, Cabeen MT, O’Hern CS, Dufresne ER, Jacobs-Wagner C. The bacterial cytoplasm has glass-like properties and is fluidized by metabolic activity. Cell. 2014;156(1-2):183–194. pmid:24361104
- 30. Painter PR, Marr AG. Mathematics of microbial populations. Ann Rev Microbiol. 1968;22:519–548.
- 31. Cornish-Bowden A. Fundamentals of Enzyme Kinetics. 4th ed. Wiley-Blackwell; 2004.
- 32. Kelk SM, Olivier BG, Stougie L, Bruggeman FJ. Optimal flux spaces of genome-scale stoichiometric models are determined by a few subnetworks. Sci Rep. 2012;2:580. pmid:22896812
- 33. Scott M, Klumpp S, Mateescu EM, Hwa T. Emergence of robust growth laws from optimal regulation of ribosome synthesis. Mol Syst Biol. 2014;10:747. pmid:25149558
- 34. Bosdriesz E, Molenaar D, Teusink B, Bruggeman FJ. How fast-growing bacteria robustly tune their ribosome concentration to approximate growth-rate maximisation. FEBS Journal. 2015;282(10):2029–44. pmid:25754869
- 35. Schaechter M, Ingraham JL, Neidhardt FC. Microbe. ASM Press, Washington DC; 2006.
- 36. Keren L, Hausser J, Lotan-Pompan M, Slutskin IV, Alisar H, Kaminski S, et al. Massively parallel interrogation of the effects of gene expression levels on fitness. Cell. 2016;166(5):1282–1294.e18. pmid:27545349
- 37. Planqué R, Hulshof J, Hendriks JC, Teusink B, Bruggeman FJ. Maintaining maximal metabolic rate using gene expression control. PLoS Comp Biol. 2018;14(9):e1006412.
- 38. Wortel M, Noor E, Ferris M, Bruggeman FJ, Liebermeister W. Metabolic enzyme cost explains variable trade-offs between microbial growth rate and yield. PLoS Comp Biol. 2018;14(2):e1006010.
- 39. Khodayari A, Maranas CD. A genome-scale *Escherichia coli* kinetic metabolic model k-ecoli457 satisfying flux data for multiple mutant strains. Nature Comm. 2016;7:13806.
- 40. Beg QK, Vazquez A, Ernst J, de Menezes MA, Bar-Joseph Z, Barabási AL, et al. Intracellular crowding defines the mode and sequence of substrate uptake by *Escherichia coli* and constrains its metabolic activity. Proc Nat Acad Sciences USA. 2007;104(31):12663–12668.
- 41. Goelzer A, Fromion V, Scorletti G. Cell design in bacteria as a convex optimization problem. In: Joint 48th IEEE Conference on Decision and Control and 28th Chinese Control Conference. Shanghai, P.R. China, 16–18 December 2009; 2009. p. 4517–4522.
- 42. O’Brien EJ, Lerman JA, Chang RL, Hyduke DR, Palsson BO. Genome-scale models of metabolism and gene expression extend and refine growth phenotype prediction. Mol Syst Biol. 2013;9:693. pmid:24084808
- 43. Dourado H, Lercher MJ. An analytical theory of cellular growth; 2019.
- 44. Noor E, Flamholz A, Bar-Even A, Davidi D, Milo R, Liebermeister W. The Protein Cost of Metabolic Fluxes: Prediction from Enzymatic Rate Laws and Cost Minimization. PLoS Comp Biol. 2016;12(11):e1005167.
- 45. Li GW, Burkhardt D, Gross C, Weissman JS. Quantifying absolute protein synthesis rates reveals principles underlying allocation of cellular resources. Cell. 2014;157(3):624–635. pmid:24766808
- 46. Lu P, Vogel C, Wang R, Yao X, Marcotte EM. Absolute protein expression profiling estimates the relative contributions of transcriptional and translational regulation. Nature Biotechnol. 2007;25(1):117–124.
- 47. Valgepea K, Adamberg K, Seiman A, Vilu R. *Escherichia coli* achieves faster growth by increasing catalytic and translation rates of proteins. Mol Biosyst. 2013;9(9):2344–2358. pmid:23824091
- 48. Teusink B, Wiersma A, Molenaar D, Francke C, de Vos WM, Siezen RJ, et al. Analysis of growth of *Lactobacillus plantarum* WCFS1 on a complex medium using a genome-scale metabolic model. J Biol Chem. 2006;281(52):40041–8. pmid:17062565