Models of protein production along the cell cycle: An investigation of possible sources of noise

In this article, we quantitatively study, through stochastic models, the effects of several intracellular phenomena, such as cell volume growth, cell division, gene replication as well as fluctuations of available RNA polymerases and ribosomes. These phenomena are indeed rarely considered in classic models of protein production and no relative quantitative comparison among them has been performed. The parameters for a large and representative class of proteins are determined using experimental measures. The main important and surprising conclusion of our study is to show that despite the significant fluctuations of free RNA polymerases and free ribosomes, they bring little variability to protein production contrary to what has been previously proposed in the literature. After verifying the robustness of this quite counter-intuitive result, we discuss its possible origin from a theoretical view, and interpret it as the result of a mean-field effect.

1 Impact of Random Partitioning 1 Each of the three different models are analyzed following the scheme presented in 2 Figure 7A of the main article. We begin with a theoretical analysis to derive the average 3 production of mRNAs and proteins along the cell cycle. Using these results, we then fit 4 the parameters of the model to make them correspond to the experimental data of [1]. 5 Then simulations are performed to predict protein variance for each gene considered. 6 We present now results for the model with volume growth, constant gene 7 concentration and partitioning at division which has been presented in Figure   Here is shown the Proposition 1 that describes the average number of mRNAs at any 13 instants of the cell cycle. 14 For any time s ∈ R + , denote by M s the number of mRNAs at this instant. We 15 suppose that the initial time s = 0 is a time of division; in this case, at each time i · τ D 16 with i ∈ N are moments of division. For any i ∈ N , M iτ D denotes the number of 17 mRNAs at the beginning of i-th cell cycle and M iτ D − the number of mRNAs in the 18 (i − 1)-th cell cycle just before division. 19 We suppose that a lot of cell divisions have already occurred even before time t = 0, 20 and hence the considered cell cycle takes place when the embedded Markov chain 21 (M iτ D ) i has already reached its steady state: the distribution M iτ D is the same as the 22 distribution of M (i+1)τ D . If the steady state is already reached at time 0, it implies 23 that the distribution of any M iτ D +t for any i ∈ N and s ∈ [0, τ D [ is equal to the 24 distribution of M t . As a consequence, we can only consider the first cell cycle t ∈ [0, τ D [ 25 to fully characterize the behavior of M s at any time s ∈ R + . 26 We propose here to describe the evolution of (M t ) between times 0 and τ D (during 27 this period of time, the number of mRNA approximately doubles). We first divide 28 mRNAs into two categories, 29 -First group: mRNAs which were present at the birth of the cell. Each mRNA i of 30 the first group is characterized by E i σ1 , its lifetime given by an exponential following the MPPP N , whose distribution is of intensity ν). The random variable x i represents the time at mRNA creation and y i its lifetime, hence this mRNA exists from volume x i up to volume x i + y i ; that is to say that mRNA is still present at time t, if and only if the point (x i , y i ) is in the set with ∆ t = (x, y) ∈ R 2 + , 0 < x < t, y > t − x .
-Second group: mRNAs which have been created since the birth of the cell. The Chapter of [2] or [3] for the main results concerning MPPP. 42 We use this tool to characterize the number of mRNAs of the second group. In ∆ t = (x, y) ∈ R 2 + , 0 < x < t, y > t − x .
One can refer to S5 Fig Since all E i σ1 i are i.i.d. and independent of M 0 , the first term is given by For the second term, one has to remark that as N is a MPPP, N (∆ τ D − ) is a Poisson 60 random variable (Proposition 1.13.a of [2]). The parameter of this Poisson random 61 variable is given by As a consequence, one gets that, for any time t in the cell cycle, We still have to specify the mean number of mRNAs at birth M 0 . At the end of The mean number of mRNAs is now determined for any moment of the cell cycle. Each 73 of the mRNAs potentially produces proteins at rate λ 2 . As for the mRNAs, we describe 74 the number of proteins at time t by grouping them into two categories.

75
PLOS 3/36 -The P 0 proteins that were present at birth and which remain in the bacteria 76 during all the cell cycle (as said in the main article the proteolysis is not 77 considered in this model).

78
-The proteins that have been created during the current cell cycle. The rate of 79 production is depending on the current number of mRNAs. We consider N i λ2 (for 80 i ∈ N and i ≥ 1) independent Poisson Point Processes of intensity λ 2 . If the i-th 81 mRNA exists at time t (that is to say if i ≤ M t ), then the number of proteins 82 produced by this mRNA between t and t + dt is N i λ2 (dt). 83 To sum up, the number of proteins at a time t of the cell cycle is given by The first term is the number of proteins at birth, and the second take into account all 85 the proteins created between times 0 and t. One can then determine the mean number 86 of proteins at any time t of the cell cycle.
Proof. By taking the average of Equation (1.5), one gets As we know the mean number of mRNAs M u at time u of the cell cycle with 90 Proposition 1, Since the system is at steady state, we have for time τ D −, P τ D − = 2 P 0 ; so Consequently, for any time t of the cell cycle, hence the result.

94
In particular, as the mean P t /V (t) does not change across the cell cycle, the global 95 protein average is given by, (1.6)

97
For each gene measured in [1], we want to identify the set of corresponding parameters 98 λ 1 , σ 1 and λ 2 . We also need to determine the "global" quantities τ D and V 0 . We first 99 determine the parameters common to all genes. The division time τ D is set to 150min 100 in the article and the volume at birth V 0 is taken equal to 1.3 µm 3 . 1

101
Then we have to determine for each gene the three gene-specific parameters λ 1 , σ 1 102 and λ 2 . We consider the genes of the article for which was measured the empirical mean 103 of messengers µ m and proteins µ p concentrations, as well as the mRNA half-life time τ m . 104 First we determine the rate of mRNA degradation for each gene with the measured 105 mRNA half-life time τ m . A half-life τ m indicates that a mRNA has a probability 1/2 to 106 disappear within a duration τ m , hence e −σ1τm = 1/2. From that, we can compute the 107 rate σ 2 (specific for each type of mRNA), A summary of the different parameters can be seen in Figure 7C of the main article. 113 Having determined all the parameters allows to perform simulations of the model using 114 stochastic algorithm in order to assess the variability of every protein and compare 115 them with those experimentally obtained in [1]. 116

117
When performing simulations, one needs to take care of the non-homogeneity of the 118 Poisson processes describing mRNA creation times: the rate of protein production 119 λ 1 V (t) is not a homogeneous rate as it changes with time. That does not allow a direct 120 application of Gillepsie method [4], an extension for non-homogenous processes has to 121 be used.

122
Reference [4] describes an algorithm to simulate stochastic trajectories such as the 123 quantities of different chemical species interacting together. The main idea is to 124 consider the state of a system (for instance the number of each chemical compounds) 125 and to compute the first reaction to occur, as well as the time when it happens. Once 126 both pieces of information computed, one change the current state of the system 127 accordingly with the reaction, and update the time.

128
One important hypothesis is that all reactions occur at exponential times (even if 129 the rates of these exponential times may depend on the current state of the system). In 130 the current intermediate model, at any time t, the state is described by (M t , P t )

131
(respectively, the number of mRNAs and proteins), and the rate of mRNA production is 132 Λ(t) = λ 1 V (t) with λ 1 a parameter and V (t) the non-constant volume of the cell. The 133 parameter Λ(t) does not depend on the state (M t , P t ) but is time dependent through 134 V (t); for this reason, it is not an exponential time.

135
In this case, the duration of time T until the next mRNA production is 1 The value of V 0 , even if it is not explicitly given in [1] can be deduced from the typical width given in its supplementary materials.

5/36
which is not an exponential distribution as Λ is non-constant. To compute T , we 138 consider that Λ(t) is strictly positive for any t ∈ R + , as a consequence 139 F (x) := x 0 Λ(t) dt is strictly increasing. Let E be an exponential random variable with 140 parameter 1. We have hence If we consider the case of y = F (x), since F is strictly increasing, hence and As a consequence the random variable F −1 (E) has the same distribution as T .

143
Based on that we can propose a new version of the algorithm of Gillespie that can 144 take into account non-exponential times such as T .
In order to be applied here, we need to specify what does the gene environment Z refers 163 to. In our model, two similar genes (with the same parameters λ 1 , σ 1 ,λ 2 ) in the same 164 cell would undergo the same volume growth. But each gene would undergo a specific 165 partition at division (for instance, in the case of random partitioning, the partitions of 166 the proteins of one of the gene, is uncorrelated with the partition of the proteins of the 167 other gene).

168
With only considering the volume as the gene environment, we end up with the following decomposition,

6/36
As in this model, the P t /V (t) remains constant during the cell cycle, the second 169 term of the decomposition remains null.

170
It has been confirmed by a simulation of the dual reporter technique, where the 171 expression of two identical promoters in the same cell has been compared (we took the 172 example of the protein Adk). The concentrations of the two proteins, respectively P 1 /V 173 and P 2 /V , have been compared. In particular, we measured their covariance which is 174 much smaller than their respective variances (see S1 Fig (D)). 175 1.5 Simplified model for the random partitioning 176 Here we explain the simplified model used to make the prediction of protein noise ratio 177 in blue dash line of Figure 2D of the main article.

178
For a given quantity P associated with the gene, the partitioning can be performed 179 in two ways, either exact or random. The result in each case will be denoted 180 respectively by P e and P r . During division, the volume is divided by two, changing 181 from 2V 0 to V 0 . In order to be plotted in Figure 2D  Proposition 3. The Variance ratio η as a function of x is given by Proof. We have the quantity before division P . Since by definition, we have that For the effect of binomial division, see Lemma 11, it describes the effect of the binomial division on the means and on the variances of several quantities. By the volume in order to observe the concentrations, one gets that As a consequence, this gives the relation We present here results relative to the model presented in Figure   If time t = 0 is the beginning of a new cell cycle and if the system is already at 204 steady state in the same sense as the previous models (see Section 1.1.1). We consider 205 that M 0 , the number of mRNAs at birth is known. As in Section 1.1.1, we assort 206 mRNAs in independent groups; here we consider three categories.

207
-mRNAs which were present at the birth of the cell. Each of them is characterized 208 by its lifetime given by an exponential time of rate σ 1 . The i-th mRNA is still  As in Section 1.1.1, one can represent the number of mRNAs of the second and the third group as two independent MPPPs N and N . The first variable x of each of these MPPPs will represent the time. The intensity of each of the MPPP is the same, The only difference between N and N is the fact that they begin at time 0 for N and 220 at time τ R for N (see S6 Fig). As a consequence, if we consider an mRNA of either 221 group, the conditions of its existence at time t ∈ [0, τ D [ are respectively,

222
-if it is in the second group: Hence, we can describe the number of mRNAs at any time t ∈ [0, τ D [ as follows, Each term corresponds to each group of mRNAs previously described.

226
At first we want to characterize the distribution of M 0 , the number of mRNAs at 227 the birth of the cell. To do so, we use the steady state hypothesis that implies that

229
Proposition 4. At steady state, the number of mRNAs at birth M 0 follows a Poisson 230 distribution of parameter: Proof. When s=τ D −, by Relation (2.1), The first term corresponds to initial messengers not degraded after the time τ D .

233
Suppose that M 0 is distributed according to a Poisson distribution with parameter x 0 , 234 then the random variable follows also a Poisson distribution with parameter x 0 e −τ D σ1 . Operation of thinning of 236 Poisson processes, see [3] for example.

237
The second term corresponds to mRNAs that were created by the first copy of the 238 gene and which are still present at division. Since N is a MPPP, N (∆ τ D − ) is a Poisson 239 random variable (Proposition 1.13 of [2]) with parameter The third term corresponds to mRNAs that were created by the second copy of the 241 gene (replicated at time τ R ) and which are still present at division. As before, As M τ D − is the sum of three independent Poisson random variables, one gets that Between τ D − and τ D , with the random sampling, each mRNA has an equal chance 244 to stay or to disappears, therefore Since the system is at steady state, one has M 0 which gives Since the steady state distribution is unique, the number of mRNAs at birth follows a 251 Poisson distribution of parameter x 0 at steady state.
In particular, the mean and the variance of mRNA concentration are known at any time t of the cell cycle, Proof. At a moment t of the cell cycle, the moment-generating function of M t at ξ < 0 258 is given by

10/36
With Proposition 4, M 0 is known to be a Poisson random variable of parameter x 0 , 261 hence, with the probability generating function of a Poisson random variable, holds. For the second factor, one can recall that N (∆ t ) is a Poisson random variable.

263
As in Proposition 4, its parameter can be calculated Identically for the third factor, N (∆ t ) is a Poisson random variable of parameter As a consequence, the moment generating function of M t is which is the moment-generating function of a Poisson random variable of parameter

268
As for the previous analysis of the mRNA number, we search an expression for protein 269 production through the cell cycle. This case is more complicated than the mRNA case 270 and we will only calculate analytical expressions only for the first two moments of P t .

271
Propositions 8 and 10 are the main theoretical results of this section: for any time t 272 of the cell cycle, it gives explicit expressions for the mean P t and the variance Var [P t ] 273 of the protein number. This result is important as it will be used to directly calculate 274 the mean P/V and variance Var [P/V ] of the protein concentration averaged across 275 the cell cycle without using simulations: only with the parameters of the model (λ 1 , σ 1 , 276 λ 2 , τ R and τ D ), we will be able to know the behavior of the protein concentration in 277 terms of variance.

278
In order to prove the Propositions 8 and 10, we will characterize P t and Var [P t ] in 279 the two following cases: 280 1. First, we consider the case before replication (t < τ R ). We begin by considering 281 that the state of the cell at birth (M 0 , P 0 ) is known and we calculate the first two 282 moments of P t for any time t < τ R (Corollary 7). Then, we integrate over all the 283 possible initial states (M 0 , P 0 ) to determine expressions for P t and Var [P t ] for Then we consider the case after replication (t ≥ τ R ). Similarly the first case, we 288 will consider that the state of the cell at replication (M τ R , P τ R ) is known and we 289 calculate the first two moments of P t for any time τ R ≤ t < τ D (Proposition 10). 290 After integration, expressions for P t and Var [P t ] for any time t after replication 291 are determined, these expressions depend on and Cov [M τ R , P τ R ] (Proposition 10).

293
In the end, in Propositions 8 and 10, are presented the mean and variance of protein 294 number at any time t of the cell cycle, only depending on the first moments of (M 0 , P 0 ) 295 and (M τ R , P τ R ). Additional results then determine explicitly the first moments of 296 (M 0 , P 0 ) and (M τ R , P τ R ) so that the mean and variance of protein number will be fully 297 characterized.

298
Description of the Process of the Number of Proteins Before beginning, we 299 describe the number of proteins P t at any time t. We will use this description in the 300 following proofs. Similarly to mRNA case Equation (2.1), we group them into two 301 categories.

302
-The P 0 proteins that were there at birth and which remain in the cell during all 303 the cell cycle (as said in the main article the proteolysis is not considered in this 304 model).

305
-The proteins that were created during the cell cycle. The rate of production 306 depends on the current number of mRNAs. For that we consider N i λ2 i∈N , a 307 sequence of i.i.d. Poisson Point Processes of intensity λ 2 ; if the i-th mRNA exists 308 at time t (that is to say if i ≤ M t ), then the number of proteins produced by this 309 mRNA between t and t + dt is N i λ2 (dt). Hence, the total number of proteins 310 produced between t and t + dt is then

311
To summarize, the number of proteins at a time t of the cell cycle is The first term is the number of proteins at birth, and the second takes into account all 313 proteins created between times 0 and t.

314
Protein Number Before Replication We begin with the case before replication, 315 t < τ R . We use the notation · M0,P0 as the conditional expectation given (M 0 , P 0 ), i.e. 316 · M0,P0 = ·| (M 0 , P 0 ) . We first characterize the first two moments of P t conditionally 317 on (M 0 , P 0 ). As for the mRNAs, we determine at first the moment-generating function 318 of P t .

319
Proposition 6. For any t ∈ [0, τ R [, the conditional moment generating function of P t 320 can be expressed as for any ξ < 0 and such as h t is the moment generating function of expression of h t is given by We then consider the Laplace functional of the Poisson process N i λ2 , By making the product for i from 1 to infinity, one gets As a consequence, it indeed follows that Using the expression (2.1) of M t , integrated between time 0 and t < τ R gives the result. 327 For more details of the calculations, see Chapter 3 of [7]).

328
As the moment generating function of P t has been characterized, it is possible to 329 deduce, by derivation, the first two moments of P t knowing (M 0 , P 0 ) for any time t 330 before the gene replication.

331
Corollary 7. At steady state, for t ∈ [0, τ R [, the first two conditional moments of P t are given by Proof. The first two moments of P t can be obtained by derivation of the moment 332 generating function of Proposition 6, and The calculations of h t (0) 2 , h t (0) 2 allow to show the result (see Chapter 3 of [7] for 334 the details of the calculation).

335
The previous corollary gives expressions for P t M0,P0 and P 2 t M0,P0 . In the next 336 proposition, we integrate these expressions over all birth states (M 0 , P 0 ) to find 337 formulas for P t and Var [P t ] for any time t < τ R before replication. These expression 338 depends on joint moments of M 0 and P 0 .

13/36
Proposition 8. At any time t ∈ [0, τ R [ before replication, the mean and the variance of P t are given by where x 0 is defined in Proposition 4.

340
Proof. By considering the mean of the random variable P t |(M 0 , P 0 ) in Corollary 7, the result for P t is easy to get. For the variance, consider the expression of P 2 t |(M 0 , P 0 ) Finally, one just has to remark that due to Proposition Protein Number After Replication For a time t such as τ R ≤ t < τ D . We adopt 342 a similar approach as for the previous case, the state just after replication (M τ R , P τ R ) is 343 known, and we want to determine the first two moments of P t for any time t after the 344 replication.

14/36
Proposition 9. At steady state, for a time t ∈ [τ R , τ D [, conditionally on the state of 346 the cell at replication (M τ R , P τ R ), the first two moments of P t are given by Proof. After the replication, the rate of mRNA production is doubled, but otherwise, 348 the dynamic is identical as it was before the replication. One can hence easily adapt the 349 proofs of Proposition 6 and Corollary 7, by replacing the initial state by the state at 350 replication (M τ R , P τ R ), by considering that the mRNA production rate is 2λ 1 , and that 351 the time spent since the initial state is t − τ R .

352
We can then integrate the previous expressions on all possible states at replication Proposition 10. At any time t ∈ [τ R , τ D [ after replication, depending on joint moments of P τ R and M τ R , the mean and the variance of P t are given by Proof. It is similar to the proof of Proposition 8.

15/36
Indeed, between times τ D − and τ D , the proteins undergo a random partitioning, 363 and since the system is at steady state, the distribution of the number of proteins after 364 division P τ D is the same as the distribution of proteins at birth P 0 . As a consequence: 365 with B i,1/2 being independent Bernoulli random variables of parameter 1/2 and being 366 all independent of P τ D − .

367
Lemma 11. The mean and the variance of P 0 depend on the mean and the variance of 368 P τ D − in the following way Proof. With the moment-generating function of P 0 , one gets As ξ goes to 0, one getss The lemma is proved. 372 We then use this Lemma to calculate the means P 0 and P τ R and the variances with τ = 0 in the case of η = 1 (before replication) and τ = τ R for the case η = 2 (after 376 replication). In that case, we have that: Proof. With Propositions 8 and 10, one gets We conclude with the Lemma 11.

379
Proposition 13. For η = 1, 2, define with τ = 0 in the case of η = 1 (before replication) and τ = τ R for the case η = 2 (after replication). In that case, we have that: Proof. By considering the expressions of Proposition 10 for t = τ D −, Similarly, the expression of Proposition 8 for t = τ R − gives the expression of Var [P τ R ] by continuity. We have Lemma 11 describes the effect of the binomial sampling between τ D − and τ D on the mean and the variance of P . Since, we are at steady state of cell cycles, one has The expression of Var [P τ R ] can then be deduced from Proposition 8.

17/36
with τ = 0 in the case of η = 1 (before replication) and τ = τ R for the case η = 2 (after 383 replication). In that case, the covariances can be expressed as Proof. The proof follows the same arguments as in the proofs of the previous 386 propositions. Details of the calculations can be found in Chapter 3 of [7].   The article [8] investigates the replication initiation. It is shown that the initiation 400 occurs at a fixed volume per replication origin, and thus independently from the time 401 since the previous division. Furthermore, this volume seems to be constant for different 402 conditions. For slow growing bacteria (with only one DNA replication per cell cycle), 403 such as those in [1], the volume at which DNA replication initiation occurs is 404 V I = 1.8 µm 3 . As in our model, the volume is considered as growing exponentially, we 405 define the time of replication initiation τ I as The initiation of DNA replication occurs at τ I , the remaining delay to gene replication 407 of each gene is considered as deterministic (we consider the speed of DNA replication as 408 constant). The whole chromosome is replicated in around 40 min [9], therefore the 409 distance of the gene from the origin of replication is sufficient to determine the time it 410 takes for the DNA-polymerase to replicate it. The position of each gene was determined 411 with Ecogene database [10]. population, one have to explicit the age distribution u of the population. We consider at first that the distribution is homogeneous between age 0 and τ D (see Section 2.2.3 for a more realistic distribution). Then, the global averages are known through the integration over the cell cycle of the mean formulas of Theorem 5 and the Propositions 8 and 10 we can write the global average of mRNA and protein concentrations as As a consequence, parameters λ 1 and λ 2 can be expressed as follows: For each gene, all parameters can be hence determined.  The unit of production of one particular protein is presented in S7 Fig. We recall that, 440 for any time t, the copy number of the i-th gene is G i (t), the number of mRNA is M i (t) 441 and the number of proteins is P i (t), the number of RNA-polymerases sequestered on the 442 i-th gene is E Y,i (t) and the number of ribosomes sequestered on an mRNA of type i is 443 E R,i (t). The number of non-sequestered RNA-polymerases and ribosomes are 444 respectively denoted as F Y (t) and F R (t).  Transcription In the current model, the process of mRNA production is considered as 446 taking part in two steps: first, the binding of the RNA-polymerase and initiation; 447 and second, the elongation and termination of the mRNA. For the first step, 448 inside a unit volume, the rate at which an RNA-polymerase binds on the promoter 449 of the i-th gene is given by the law of mass action with λ 1,i accounts for the specificity of the promoter (its affinity for the 451 RNA-polymerase, the chromosome conformation, etc.). As we are interested in the 452 rate of reactions inside the whole cell of volume V (t), the rate of reaction is then 453 The elongation time is given by an exponential random variable of rate µ 1,i . Once 454 the elongation terminates, the RNA-polymerase is released in the cytoplasm 455 (increasing the number of free RNA-polymerases F Y by one unit). A messenger is 456 considered created as soon as its elongation begins: the reason for it is that in 457 bacteria (unlike eukaryotes), since transcriptions and translations happen in the 458 same medium, a translation can begin on an mRNA on which the transcription is 459 not finished. As for the previous models, each messenger of type i has a lifetime 460 given by an exponential random variable of rate σ 1,i .

461
Translation Similarly to the transcription, the rate at which a ribosome encounters an 462 mRNA of type i and initiate translation is λ 2,i M i (t)F R (t)/V (t) where λ 2,i will 463 account for mRNA specific aspects (RBS affinity for ribosomes, etc.). The total 464 number of ribosomes sequestered on messengers of type i is E Y,i (t) and each 465 elongation time follows an exponential distribution of rate µ 2,i . Here we consider 466 that the protein is created after the termination (since the protein is usually fully 467 functional once its translation is completed); the number of proteins P i (t) is then 468 increased by one unit. As previously we do not consider protein proteolysis since 469 it usually occurs at much longer timescale than cell cycle. For the division, we considered at first that, like in the previous models, the 484 division occurs when the cell reaches exactly the volume 2V 0 (with V 0 = 1.3 µm 3 485 as it was the case for the intermediate models considered earlier). We will consider 486 in Section 3.6.4 the case where the division timing is not as precise. As before, the 487 effect of septation is a random sampling of messengers and proteins: each of them 488 has an equal chance to be in the next considered cell or not. Moreover, at division, 489 all genes have only one copy. considered as proportional to the current total mass of proteins in the cell. We 493 denote by β P represents ratio mass-volume and by w i the mass of a type i protein. 494 In that case, we have by definition Thus each protein of type i created increases the total volume of the cell with 496 respect to the factor w i /β P . The mass w i of a protein is determined according to 497 its gene length.

498
Production of RNA-polymerases and ribosomes The total number of 499 RNA-polymerases and ribosomes (whether allocated or not) are respectively 500 denoted by N Y (t) and N R (t). In a first step, we consider that the both these 501 quantities are in constant concentration, that is to say with β Y and β R constant parameters and where is the notation for the floor 503 function. As the cell grows, new RNA-polymerases and ribosomes are added to 504 the system in the corresponding proportion. When division occurs, ribosomes and 505 RNA-polymerases will be set accordingly to the new volume. In Section 3.6 we 506 will consider the more complex case where both RNA-polymerases and ribosomes 507 are directly produced through a gene expression process.

509
This complete model is more complex than the previous ones. It is due in part to the 510 feedback loop that proteins have on their own production: the more proteins, the more 511 the volume increases, thereby increasing the total amount of ribosomes and hence the 512 translation rates. This complicates the complete analytical description of mRNA and 513 protein mean productions. In this section, we propose a description that mimics the 514 average behavior of our stochastic model: the goal is to be able to fit parameters to real 515 measures and use them for stochastic simulations.

517
The description chosen to reflect the average behavior of the stochastic model previously 518 described is a system of ordinary differential equations (ODEs) that describes the 519 kinetics of each compound concentration of the system. 520 We consider K genes, each of them has a corresponding type of mRNA and protein. 521 For a gene of type i, the concentration of gene copies is given by g i (t); mRNAs and 522 protein concentrations are denoted by m i (t) and p i (t). Similarly, f Y (t) and f R (t) 523 respectively represent the concentrations of free RNA-polymerases and free ribosomes; 524 while e Y,i (t) and e R,i (t) denote the concentrations of RNA-polymerases and ribosomes 525 currently sequestered to produce type i proteins. All these quantities correspond to 526 concentrations and not numbers of entities (their stochastic counterparts would be the 527 The reactions between different compounds are given by the law of mass action, that 529 is to say that the rate of chemical reaction is proportional to the reactants abundance. 530 We will study the evolution of m i , the concentration of mRNAs of type i . The creation 531 of a type i mRNA is the result of a reaction between a free RNA-polymerase (whose 532 concentration is f Y (t)) and the gene i (whose concentration is g i (t)); λ 1,i is interpreted 533 as the affinity constant of the reaction. The type i mRNA degradation is the result of a 534 reaction that occurs at rate σ 1,i .

535
As in the usual description of the cell (see [14] for instance), one also must consider 536 the dilution: without any molecule creation, the concentration of the compound still 537 decreases as the volume grows due to dilution. If we consider that cell volume is 538 growing exponentially, doubling of volume in a time τ D , then the rate of dilution is 539 log 2/τ D . The exponential growth corresponds to the volume dynamics of real 540 bacteria [15], and we will see in Section 3.4 that it is a good approximation of the 541 growth of cells in stochastic simulations.

542
All these aspects considered altogether, the kinetics of the concentration of mRNAs 543 of type i is given by the ODE: The first term represents the mRNA creation; the second, the mRNA degradation; and 545 the last, the dilution.

546
For the other reactions, for i ∈ {1, ..., K}, one has dp i dt As for the stochastic model of the previous section, assume that the concentration of 547 RNA-polymerases (allocated or not) is constant and equal to β Y , i.e. (3.6) since i e Y,i and f Y represent the concentrations of respectively the allocated and 549 non-allocated RNA-polymerases. It is similar to the ribosomes as we have: shift between the stochastic protein concentration and the one that shoule be obtained. 557 We have described the cell during one cycle with a non-constant gene concentration. 558 The instant of replication of gene i replication within the cycle is denoted by τ R,i . in 559 particular, at time t, the i-th gene copy number is known: The dynamic of h is given by summing the Equations (3.4) for i from 1 to K, and by 583 using Equation (3.6): The h is simply a weighted sum of the e Y,i allocated RNA-polymerases. We decided 585 to consider that such weighting has little influence, and that h does not greatly differ 586 from the uniform sum i e Y,i , that is to say: It would be in particular true if all elongation rates µ 1,i are identical for all genes (i.e. if 588 µ 1,i ≡ µ 1 for all i).

589
With this simplification, from Relation (3.9), one obtains a differential equation for 590 (3.10) One can remark that the concentrations of free RNA-polymerases is on a quick 592 timescale. Indeed, as there are of the order of 1.4 × 10 3 mRNAs in the cell, see [17] that 593 last approximately 4 minutes, see [1], it gives of the order of 6 translations per second. 594 As a consequence, one can expect that that f Y quickly reaches its steady state during 595 the cell cycle. This consideration will be justified a posteriori by the agreement with 596 stochastic simulations.

597
With these considerations, we set the derivative term of Equation (3.10) to be null, 598 hence In the next section it will be shown that log 2/( µ 1 × τ D ) ∼ 10 −3 1, we will 600 therefore neglect the contribution of this term. With a similar argument for free 601 ribosomes, we get .
With global quantities f Y and f R known, we are able to give expression for find that: With the boundary conditions of Equation (3.8), it is easy to deduce that: (3.11) Since the quantities g i , f Y are known, we have an explicit solution for m i .

607
Similarly for e Y,i (t) and e R,i (t), Consider now the type i protein concentration. By integrating the Equation As in the previous models, we are interested in average concentrations over the cell 610 cycle. Since, in the system of ODEs, we define average concentrations over the cell cycle 611 of free RNA-polymerases and ribosomes respectively as dt.
(3.13) We defined similarly the concentrations m i and p i averaged over the cell cycle. By 613 integrating Equations (3.12) and (3.12), it follows: (3.14) Now we have expressions of the average concentrations of m i , p i , f R and f Y for any 615 time t in the cell cycle that will be used in the next subsection to determine the 616 parameters.

618
The stochastic model of this section are used to describe the production of all proteins 619 of the cell. Recall that [1] has only considered 1018 genes, out of which only 841 have 620 their mRNA production measured. In a first step, we only take into account the 841 621 genes with protein and mRNA production measured and consider that it would 622 represent the whole genome; in Section 3.6.1 we will study the case of a simulation with 623 a complete set of genes representing a full genome of about 2000 genes.

624
The determination of the model parameters σ 1,i of mRNA degradation of type i , of 625 the doubling time τ D and the time τ R,i of gene replication is the same as for the

627
We still need to determine all reaction rates for every protein type (λ 1,i , µ 1,i , λ 2,i 628 and µ 2,i for i ∈ {1, ..., K}) as well as concentration parameters of RNA-polymerases, 629 and ribosomes (respectively β Y and β R ), the proportion between the volume and the 630 protein mass β P , the mass of each proteins w i and the copy number g i of any gene.

631
Reference [1] does not give the quantities of non-allocated RNA-polymerases or 632 ribosomes. To determine the set of parameters, we fix the average concentration of free 633 RNA-polymerases and ribosomes. Note that we can have multiple sets of parameters 634 depending on this choice. In the simulations, we will examine several simulations with 635 different values for average free RNA-polymerase and ribosome concentrations to see 636 their impact on the dynamic of the model (Section 3.5).

637
The rates µ 1,i , µ 2,i of mRNAs and protein elongation rates can be deduced from the 638 gene length of the i-th gene. In the description of the model, we have considered that 639 the length of the mRNA is characterized by its length; so a rate the parameter µ 1,i is 640 given by the mRNA elongation speed (39 Nucl/s in [18] for slowly growing cells) divided 641 by the length of the i-th gene. Similarly µ 1,i is given by the protein elongation speed λ 2,i ) in each unit of production i ∈ {1, ..., K}. To do so, we interpret the mRNA and 649 protein concentration of each type measured in [1] as the average concentration of each 650 mRNA and proteins over the cell cycle of this model (respectively m i and p i ).

651
Moreover, as previously said, the average concentrations of free RNA-polymerases f Y 652 and free ribosomes f R are fixed. 653 We want now to compute β P , β Y , β R , λ 1,i and λ 2,i based on known values for f Y , 654 f R , m i and p i . We determine the parameter β P . In the description of the stochastic 655 model, Equation (3.1) states that at any moment, the volume is considered to be 656 proportional to the total mass of proteins. Interpreting p i as the average concentration 657 of the protein of type i leads by integration of Equation (3.1) to We continue with the parameters relevant to the transcription: λ 1,i and β Y . With 659 Equations (3.13) and (3.15), β Y , λ 1,1 , ..., λ 1,K are solution of the system Since f Y , m i and g i (t) have already been settled, we can use a fixed point optimization 661 procedure to determine β Y and all λ 1,i . Then, as these parameters are determined, we 662 now have an explicit expression for f Y (t) for any time t of the cell cycle. 663 We have to determine the parameters relevant to translation, namely λ 2,i and β R .

664
Here again, we use a fixed point optimization procedure to deliver the result. With     Here, we present the results of a particular simulation, whose parameters are 677 presented in S3 Fig(A).Its average behavior will be compared with the expressions 678 derived from the system of ODEs. The simulation presented here takes into account the 679 841 genes with protein and mRNA production described in [1], and we have fixed the 680 number of free RNA-polymerases and ribosomes in order to compute the parameters.

681
The system of ODEs assumes volume growth is exponential with rate log 2/τ D . In 682 S3 Fig(A), the volume of the cell indeed seems to grow exponentially in the simulations; 683 the growth rate corresponds to the expected a doubling time of τ D .

684
For each type of gene, S3 Fig(B), shows the ratio between protein production 685 observed in the simulations divided by protein production expected (and similarly for 686 the mRNAs in inset). It appears that the correspondence is correct, especially for the 687 highly expressed proteins. It is less precise for the protein less expressed but, globally, 688 the correspondence seems good enough. even quicker for the free ribosomes (insets of S3 Fig(C) and S3 Fig(D)).

697
All these results support the idea that the expressions derived from the system of 698 ODEs are accurate to describe the average behavior of the stochastic model.  The first simulation, corresponding to S4 Fig(A) and S4 Fig(B), considers a low 705 concentration of free ribosomes and a high concentration of free RNA-polymerases. This 706 situation seems to reasonable in this biological setting. Consequently, free ribosomes 707 and free polymerases are subject to a large competition between transcripts (see [19] in 708 the case of the yeast). At the same time, the parameters are fixed in such a way that 709 most of RNA-polymerases are non-allocated, i.e. not bound on the DNA, see [20], see replication and random partitioning); it shows that for 90% of the genes, protein 713 variance ratio is above 0.9 (the mean of the ratio is 0.96).

714
In these simulations, we look at the distributions of free RNA-polymerases and 715 ribosomes. In S4 Fig(B   In this section, based on the set of parameters of the simulation Section 3.5.1 (with few 776 free ribosomes and many free RNA-polymerases), we make variations on some modeling 777 replication processes, etc. We will show that protein variability is quite robust to any of 781 these changes: as for the results presented in Section 3.5.1, protein variance is still 782 increased by at most 10% compared to the gene-centered model.  For each new gene, we have sampled its average protein and mRNA concentration, 791 an mRNA lifetime and gene position. By studying the data of [1], we have investigated 792 the possible statistical correlations between these quantities; it appears that only the 793 mRNA and protein concentration are correlated (as it is shown in the Figure 7B of the 794 main article). We therefore have sampled the mRNAs lifetime and the gene position 795 and length independently from the two other quantities.

796
As in the dataset, the genes are evenly distributed in the DNA, the gene position is 797 assumed to be uniformly distributed. The empirical mRNA lifetime distribution fitted a 798 log-normal distribution; we have chosen the mRNA lifetime accordingly.

799
For the mRNA and the protein expression, we have taken into account their 800 correlation. The first step of the procedure is to sample protein production of the new 801 gene according protein empirical distribution of the dataset. We then obtain a realistic 802 protein production for the new gene. In a second step, the mRNA production is chosen 803 depending on its protein production already determined. We have subdivised the 804 dataset in 10 classes according to protein production (see the different colors of 805 Figure 5A of the main article); we then consider the classes in which protein production 806 of the new gene falls in. We then sample its mRNA production according to the 807 empirical distribution of the mRNA productions of this specific class only. Thus, the 808 mRNA production of the new gene is sampled with a specific distribution that depends 809 on its protein production With this procedure, the newly created genes have a protein 810 and mRNA productions that seem to be in adequation with the original dataset (see 811

812
Simulations with the completed genome show no significant difference in terms of 813 protein variability. In particular, the variance ratio between protein concentration of the 814 gene-centered model and the multi-protein model is not different as in Section 3.5.1. In the complete stochastic model as it was presented in Section 3.1, all ribosomes and 817 all RNA-polymerases are supposed to have constant concentrations (respectively β R and 818 β Y ). In reality, both RNA-polymerases and ribosomes are composed of different 819 subunits, each subunit is either a protein or, in the case of ribosomes, a functional RNA. 820 The variability of the production of these subunits can have an overall impact on the 821 global production. 822 We have performed a preliminary simulation that takes into account this aspect: the 823 goal is not to have a precise description of mechanisms of RNA-polymerase and 824 ribosome productions, but rather to have an insight in the magnitude of additional 825 variability it can induce. In this version of the model, the expression of one gene 826 represents RNA-polymerase production and the expression of another gene represents 827 the ribosome production. It refers to a case where the RNA-polymerases and ribosomes 828 would be composed of only one proteic subunit. 829 We therefore created two genes, whose protein production was fixed to correspond to 830 the measured concentration of RNA-polymerases and ribosomes. The mRNA 831 production and lifetime, the gene position and length have been chosen by the same 832 procedure as described in the previous subsection.

833
This simulation brings an additional variability in the growth rate: volume growth is 834 more variable. These fluctuations are directly correlated with the number of ribosomes 835 in the cell ( Figure 5B of the main article, above). But surprisingly, these additional 836 variability has no significant impact in protein variability. The Figure 5B   We can propose a possible interpretation of these results. The fluctuations in the 841 total number of ribosomes seems to influence primarily the speed of growth as shown in 842 Figure 5B of the main article. When ribosomes are produced, it accelerates the global 843 production of all types of proteins thus increasing the volume. As a consequence, both 844 the production of each type of protein and the volume are co-regulated. Fluctuations in 845 the total number of ribosomes affect volume growth and the production of the i-th 846 protein in the same way such as in a cell of a given volume, the i-th protein distribution 847 is relatively unchanged. In the complete stochastic model as it was described in Section 3.1, RNA-polymerases 850 are either on the DNA involved in a transcription process, or is among the F Y free 851 RNA-polymerases that freely evolve in the cytoplasm. But it has been experimentally 852 shown that large portion of RNA-polymerases can bind non-specifically on the DNA, 853 without initiating transcription. For instance, [20] estimated that around 90% of the 854 RNA-polymerases are non-specifically bound to the DNA. 855 We have created an alternative version of the stochastic model to introduce a third 856 possible class for RNA-polymerases. RNA-polymerases can also bound non-specifically 857 on the DNA. The binding rate is modeled as follows: at any time t, a free 858 RNA-polymerase bind on the DNA at a rate that depends on the number of free 859 RNA-polymerases F Y (t) and on the DNA concentration G(t)/(K · V (t)) (where 860 PLOS 30/36 ; the global rate is hence λ + F Y (t)G(t)/(K · V (t)) where λ + is a 861 parameter that represents the natural affinity of RNA-polymerases for the DNA. Once 862 an RNA-polymerase is bound, it is released in a time represented by an exponential 863 random variable of rate λ − (see S8 Fig). 864 We performed a simulation where the parameters λ + and λ − are chosen such that 865 around 90% of the RNA-polymerases are sequestered on the DNA at any time, as it was 866 the case in [20]. The variability of protein concentration does not seem to be impacted 867 in this case. In the complete stochastic model as it was initially described, replication initiation and 870 division occur when the cell reaches the respective volumes of V I and 2V 0 . In practice it 871 does not occur in this way. We propose here a modification of the stochastic model to 872 take into account this aspect.

873
The way replication and division occur is still a disputed topic, see for 874 example [15,[21][22][23]. For the division, one hypothesis (referred as "sizer model") is that 875 the division decision depends on the current size of the cell (the size can refer to the 876 mass or the volume, but as explained in Section 3.1, the density constraint, see [24], 877 ensures a proportionality relation between these two quantities). With this hypothesis, 878 at any instant, the instantaneous probability to divide depends only on the current cell 879 size. In a first approximation, the cell size distributions observed experimentally can be 880 explained by this "sizer model", see [13,23]. It is therefore this framework that we have 881 considered to represent the cell division decision.   In order to perform the environmental state decomposition, one has to specify the 904 environment Z with which the conditioning is made in Equation (8)   where the maximum number of ribosomes on one single mRNA is limited. The system 919 also evolves in a fixed volume as it is the case for classic models.

920
In order to have a prediction for the number of respectively free RNA-polymerases 921 and free ribosomes; we consider two analog models that are slightly simplified versions 922 of the model of [25]. They respectively represent the transcription and translation part 923 and they are completely independent.

924
The goal is, for each of these models, to provide the equivalent of the first results 925 of [25] and we will show that the expected distribution of free RNA-polymerases (or free 926 ribosomes) is binomial in these simplified cases. In a pool of K genes, denote by N Y the constant total number of polymerases. We 936 consider the random variables E Y,i for i ∈ {1, ..., K} be the number of 937 RNA-polymerases attached to the i-th gene. As a consequence, the random variable is the number of free RNA-polymerases in the system. The process X(t) = (E Y,i (t), i ∈ {1, ...K}) takes place in the state place t the subset 940 S of N K such as There are at most N Y RNA-polymerases that can be attached to genes at the same time. 942 We can describe the Markov process transition by the following for any x ∈ S and with Z > 0 the normalization constant.

952
Proof. We only need to check that π satisfies Equation (3.8). For a gene i ∈ {1, ..., K}, we take x ∈ S such that x − e i ∈ S .
So π satisfies Equation (3.8). 953 We can now derive from the previous proposition the steady state distribution of 954 F Y , the number of free RNA-polymerases of the process.

960
Since the independent random variables C p,k are Poisson, their sum is also Poisson 961 with parameter Λ := K i=1 G i λ 1,i /(V µ 1,i ). By summing up the previous relation, one 962 gets that

964
A Model for Translation

965
The model for translation considered here is analogous to the transcription case. We 966 still consider that the volume V is fixed and that for each gene, the number M i of 967 mRNAs of type i is known and constant (because of these, the process describes here is 968 independent of transcription). As in the complete stochastic model, and contrary to the 969 model of [25], we consider that there is no limiting number of elongating ribosomes on 970 one mRNA.

971
Similarly to the transcription, we can define N R (the total number of ribosomes), 972 E R,i (the number of ribosomes elongating an mRNA of type i) and F R (the number of 973 free ribosomes) such as The rate at which a ribosome is sequestered on a type i mRNA is therefore M i λ 2,i /V , 975 and the rate at which an elongation terminates on a type i mRNA is µ 2,i E R,i .

976
As this model is analog to the transcription case, we can also prove that