Intercellular Variability in Protein Levels from Stochastic Expression and Noisy Cell Cycle Processes

Inside individual cells, expression of genes is inherently stochastic and manifests as cell-to-cell variability or noise in protein copy numbers. Since proteins half-lives can be comparable to the cell-cycle length, randomness in cell-division times generates additional intercellular variability in protein levels. Moreover, as many mRNA/protein species are expressed at low-copy numbers, errors incurred in partitioning of molecules between two daughter cells are significant. We derive analytical formulas for the total noise in protein levels when the cell-cycle duration follows a general class of probability distributions. Using a novel hybrid approach the total noise is decomposed into components arising from i) stochastic expression; ii) partitioning errors at the time of cell division and iii) random cell-division events. These formulas reveal that random cell-division times not only generate additional extrinsic noise, but also critically affect the mean protein copy numbers and intrinsic noise components. Counter intuitively, in some parameter regimes, noise in protein levels can decrease as cell-division times become more stochastic. Computations are extended to consider genome duplication, where transcription rate is increased at a random point in the cell cycle. We systematically investigate how the timing of genome duplication influences different protein noise components. Intriguingly, results show that noise contribution from stochastic expression is minimized at an optimal genome-duplication time. Our theoretical results motivate new experimental methods for decomposing protein noise levels from synchronized and asynchronized single-cell expression data. Characterizing the contributions of individual noise mechanisms will lead to precise estimates of gene expression parameters and techniques for altering stochasticity to change phenotype of individual cells.


Introduction
The level of a protein can deviate considerably from cell-to-cell, in spite of the fact that cells are genetically-identical and are in the same extracellular environment [1][2][3]. This intercellular variation or noise in protein counts has been implicated in diverse processes such as corrupting functioning of gene networks [4][5][6], driving probabilistic cell-fate decisions [7][8][9][10][11][12], buffering cell populations from hostile changes in the environment [13][14][15][16], and causing clonal cells to respond differently to the same stimulus [17][18][19]. An important source of noise driving random fluctuations in protein levels is stochastic gene expression due to the inherent probabilistic nature of biochemical processes [20][21][22][23]. Recent experimental studies have uncovered additional noise sources that affect protein copy numbers. For example, the time take to complete cell cycle (i.e., time between two successive cell-division events) has been observed to be stochastic across organisms [24][25][26][27][28][29][30][31][32]. Moreover, given that many proteins/mRNAs are present inside cells at low-copy numbers, errors incurred in partitioning of molecules between two daughter cells are significant [33][34][35]. Finally, the time at which a particular gene of interest is duplicated can also vary between cells [36,37]. We investigate how such noise sources in the cell-cycle process combine with stochastic gene expression to generate intercellular variability in protein copy numbers (Fig 1).
Prior studies that quantify the effects of cell division on the protein noise level have been restricted to specific cases. For example, noise computations have been done in stochastic gene expression models, where cell divisions occur at deterministic time intervals [33,38,39]. Recently, we have analyzed a deterministic model of gene expression with random cell-division events [40]. Building up on this work, we formulate a mathematical model that couples stochastic expression of a stable protein with random cell-division events that follow a general class of probability distributions. Moreover, at the time of cell division, proteins are randomly partitioned between two daughter cells based on a framework that allows the partitioning errors to be higher or lower than as predicted by binomial partitioning. For this class of models, we derive an exact analytical formula for the protein noise level as quantified by the steadystate squared Coefficient of Variation (CV 2 ). This formula is further decomposed into individual components representing contributions from different noise sources. A systematic investigation of this formula leads to novel insights, such as identification of regimes where increasing randomness in the timing of cell-division events decreases the protein noise level.
Next, we extend the above model to include genome-duplication events that increase the gene's transcription rate [36,41]. To our knowledge, this is the first study integrating randomness in the genome-duplication process with stochastic gene expression. An exact formula for the protein noise level is derived for this extended model and used to investigate how the timing of duplication affects different noise components. Counter intuitively, results show that doubling of the transcription rate within the cell cycle can lead to smaller fluctuations in protein levels as compared to a constant transcription rate through out the cell cycle. Finally, we discuss how formulas obtained in this study can be used to infer parameters and characterize the gene expression process from single-cell studies.

Coupling gene expression to cell division
We consider the standard model of stochastic gene expression [42,43], where mRNAs are transcribed at exponentially distributed time intervals from a constitutive gene with rate k x . For the time being, we exclude genome duplication and the transcription rate is fixed throughout the cell cycle. Assuming short-lived mRNAs, each transcription event results in a burst of proteins [43][44][45]. The corresponding jump in protein levels is shown as where x(t) is the protein population count in the mother cell at time t, B is a random burst size drawn from a positively-valued distribution and represents the number of protein molecules synthesized in a single-mRNA lifetime. Motivated by observations in E. coli and mammalian cells, where many proteins have half-lives considerably longer than the cell-doubling time, we assume a stable protein with no active degradation [46][47][48]. Thus, proteins accumulate within Stochastically expressed proteins accumulate within the cell at a certain rate. At a random point in the cell cycle, gene duplication results in an increase in production rate. Stochastic cell-division events lead to random partitioning of protein molecules between two daughter cells with each cell receiving, on average, half the number of proteins in the mother cell just before division. The steady-state protein copy number distribution obtained from a large number of trajectories is shown on the right. The total noise in the protein level, as measured by the squared coefficient of variation (CV 2 ) can be broken into contributions from individual noise mechanisms.
follows an arbitrary positively-valued probability distribution with the following mean and squared coefficient of variation (CV 2 ) where h.i denotes expected value through out this paper. The random change in x(t) during cell division is given by where x(t s ) denotes the protein levels in the mother cell just before division and x + (t s ) denotes the protein levels in one of the daughter cells just after division. Conditioned on x(t s ), x + (t s ) is assumed to have the following statistics The first equation implies symmetric partitioning, i.e., on average each of the daughter cells inherits half the number protein molecules just before division. The second equation in Eq (5) describes the variance of x + (t s ) and quantifies the error in partitioning of molecules through the non-negative parameter α. For example, α = 0 represents deterministic partitioning where x + (t s ) = x(t s )/2 with probability equal to one. A more realistic model for partitioning is each molecule having an equal probability of being in the each daughter cell [49][50][51]. This results in a binomial distribution for x + (t s ) Probabilityfx þ ðt s Þ ¼ jjxðt s Þg ¼ xðt s Þ! j!ðxðt s Þ À jÞ! 1 2 xðt s Þ ; j 2 f0; 1; . . . ; xðt s Þg; ð6Þ and corresponds to α = 1 in Eq (5). Interestingly, recent studies have shown that partitioning of proteins that form clusters or multimers can result in α > 1 in Eq (5), i.e., partitioning errors are much higher than as predicted by the binomial distribution [33,39]. In contrast, if molecules push each other to opposite poles of the cell, then the partitioning errors will be smaller than as predicted by Eq (6) and α < 1. The model with all the different noise mechanisms (stochastic expression; random cell-division events and partitioning errors) is illustrated in Fig 2A and referred to as the full model. We also introduce two additional hybrid models [52,53], where protein production and partitioning are considered in their deterministic limit (Fig 2B and 2C). Note that unlike the full model, where x(t) takes non-negative integer values, x(t) is continuous in the hybrid models. We will use these hybrid models for decomposing the protein noise level obtained from the full model into individual components representing contributions from different noise sources.

Modeling the cell-cycle time
In order to quantify the steady-state protein mean and noise, we need to define the stochastic process that governs the timing of cell division. Variations in the duration of cell cycle can result from a variety of factors, such as cell physiology, growth rate, cell size and expression of genes that affect cell-cycle time such as FtsZ [24][25][26][27][28][29][30][31][32]. Given these complexities, we take a phenomenological approach to modeling cell-cycle time, and assume it to be an independent and identically distributed random variable that is drawn from a mixture of Erlang distributions (also known as phase-type distribution). The motivation for choosing this distribution is two fold: 1. Mixture of Erlang distributions can be represented via a continuous-time Markov chain, allowing mathematical tractability in terms of deriving/analyzing time evolution of moments.
2. This class of distribution is fairly general, in the sense that, any positively-valued distribution with CV 1 can be modeled via a mixture of Erlang distributions [54].
Consider a mixture of n Erlang distributions with mixing probabilities p i , i = {1, . . ., n}. Recall that an Erlang distribution of order i is the distribution of the sum of i independent and identical exponential random variables. The cell-cycle time is assumed to have an Erlang distribution of order i with probability p i and can be represented by a continuous-time Markov chain with states G ij , j = {1, . . ., i}, i = {1, . . ., n} (Fig 3). Let Bernoulli random variables g ij = 1 if the system resides in state G ij and 0 otherwise. The probability of transition G ij ! G i(j+1) in the next  (1) and (4). The differential equation within the circle represents the time evolution of x(t) in between events. A) Model with all the different sources of noise: proteins are expressed in stochastic bursts, cell division occurs at random times, and molecules are partitioned between the two daughter cells based on Eq (5). The trivial dynamics _ x ¼ 0 signifies that the protein level is constant in-between stochastic events. B) Hybrid model where randomness in cell-division events is the only source of noise. Protein production is modeled deterministic through a differential equation and partitioning errors are absent, i.e., α = 0 in Eq (5). C) Hybrid model where noise comes from both cell-division events and partitioning errors. Protein production is considered to be deterministic as in Fig 2B. Since x(t) is continuous here, x + (t s ) has a positively-valued continuous distribution with same mean and variance as in Eq (5) doi:10.1371/journal.pcbi.1004972.g002 infinitesimal time interval [t, t + dt) is given by ikg ij dt, implying that the time spent in each state G ij is exponentially distributed with mean 1/ik. To summarize, at the start of cell cycle, a state G i1 , i = {1, . . ., n} is chosen with probability p i and cell division occurs after transitioning through i exponentially distributed steps. Based on this formulation, the probability of a celldivision event occurring and a new cell cycle obtained from an Erlang distribution of size i starting in the next time interval [t, t + dt) is given by kp i P n j¼1 ðjg jj Þdt, and whenever the event occurs, the protein level changes as per Eq (4). Finally, the mean, the squared coefficient of variation, and the skewness of the cell-cycle time in terms of the Markov chain parameters are given by [55], where hT 3 i is the third-order moment of the cell-cycle time. An important property of this class of distributions is that increasing CV 2 T also makes the distribution highly skewed, because from Eq (7) both the CV and skewness are linear combinations of p i , albeit with different linear coefficients that decrease with i. Considering that P n i¼1 p i ¼ 1, the only way to increase CV 2 T is by increasing smaller-index components and decreasing larger-index components of the distribution (i.e. increasing p i and decreasing p j , where i < j). Since higher values of i are more penalized in the skewness equation, this would correspond to making the distribution more positively skewed. Hence high values of CV 2 T also means high values of skewness, thus occurrences of longer cell cycles are more probable. As we will shortly see, this property leads to mean protein levels being dependent on CV 2 T .

Results
Computing the average number of protein molecules We refer the reader to [52,56,57] for an introduction to moment dynamics for stochastic and hybrid systems. Analysis in Appendix A in S1 Text shows Note that the time-derivative of the mean protein level (first-order moment) is unclosed, in the sense that, it depends on the second-order moment hxg ij i. Typically, approximate closure methods are used to solve moments in such cases [52,[57][58][59][60][61][62]. However, the fact that g ij is binary can be exploited to automatically close moment dynamics. In particular, since g ij 2 {0, 1} hg n ij x m i ¼ hg ij x m i; n 2 f1; 2; . . .g ð 9Þ for any non-negative integer m. Moreover, as only a single state g ij can be 1 at any time Using Eqs (9) and (10), the time evolutions of hg ij i and hxg ij i are obtained as and only depend on hg ij i and hxg ij i (see Appendix A in S1 Text). Thus, Eqs (8) and (11)-(14) constitute a closed system of linear differential equations from which moments can be computed exactly.
To obtain an analytical formula for the average number of proteins, we start by performing a steady-state analysis of Eq (8) that yields where h:i denotes the expected value in the limit t ! 1. Using Eq (15), hxg i1 i is determined from Eq (13), and then all moments hxg ij i are obtained recursively by performing a steadystate analysis of Eq (14) for j = {2, . . ., i}. This analysis results in Using Eqs (7), (16) and the fact that P n i¼1 P i j¼1 g ij ¼ 1 we obtain the following expression for the mean protein level It is important to point that Eq (17) holds irrespective of the complexity, i.e., the number of states G ij used in the phase-type distribution to approximate the cell-cycle time distribution. As expected, hxi increases linearly with the average cell-cycle time duration hTi with longer cell cycles resulting in more accumulation of proteins. Consistent with previous findings, Eq (17) shows that the mean protein level is also affected by the randomness in the cell-cycle times ðCV 2 T Þ [40,63]. For example, hxi reduces by 25% as T changes from being exponentially distributed ðCV 2 T ¼ 1Þ to periodic ðCV 2 T ¼ 0Þ for fixed hTi. Next, we determine the noise in protein copy numbers, as quantified by the squared coefficient of variation.

Computing the protein noise level
Recall that the full model introduced in Fig 2A has three distinct noise mechanisms. Our strategy for computing the protein noise level is to first analyze the model with a single noise source, and then consider models with two and three sources. As shown below, this approach provides a systematic dissection of the protein noise level into components representing contributions from different mechanisms.
Contribution from randomness in cell-cycle times. We begin with the model shown in Fig 2B, where noise comes from a single source-random cell-division events. For this model, the time evolution of the second-order moment of the protein copy number is obtained as and depends on third-order moments hx 2 g jj i (see Appendix B in S1 Text). Using the approach introduced earlier for obtaining the mean protein level, we close moment equations by writing the time evolution of moments hx 2 g ij i. Using Eqs (9) and (10) Note that the moment dynamics for hxi and hxg ij i obtained in the previous section (Eqs (8), (13) and (14)) are identical for all the models in Fig 2, irrespective of whether the noise mechanism is modeled deterministically or stochastically. Eqs (8), (11)-(14) and (18)-(20) represent a closed set of linear differential equations and their steady-state analysis yields From Eq (21) Using Eq (22) and the mean protein count quantified in Eq (17), we obtain the following squared coefficient of variation where CV 2 E represents the noise contribution from random cell-division events. Since cell division is a global event that affects expression of all genes, this noise contribution can also be referred to as extrinsic noise [49,[64][65][66][67]. In reality, there would be other sources of extrinsic noise, such as, fluctuations in the gene-expression machinery that we have ignored in this analysis.
Note that CV 2 E ! 1=27 as T approaches a delta distribution, i.e., cell divisions occur at fixed time intervals. We discuss simplifications of Eq (24) in various limits. For example, if the time taken to complete cell cycle is lognormally distributed, then and extrinsic noise monotonically increases with CV 2 T . If fluctuations in T around hTi are small, then using Taylor series Substituting Eq (26) in Eq (24) and ignoring CV 4 T and higher order terms yields where the first term is the extrinsic noise for CV 2 T ! 0 and the second term is the additional noise due to random cell-division events.
Contribution from partitioning errors. Next, we consider the model illustrated in Fig 2C  with both random cell-division events and partitioning of protein between the two daughter cells. Thus, the protein noise level here represents the contribution from both these sources. Analysis in Appendix C in S1 Text shows that the time evolution of hx 2 i and hx 2 g ij i are given by Note that Eqs (28) and (29) are slightly different from their counterparts obtained in the previous section (Eqs (18) and (19)) with additional terms that depend on α, where α quantifies the degree of partitioning error as defined in Eq (5). As expected, Eqs (28) and (29) reduce to Eqs (18) and (19) when α = 0 (i.e., deterministic partitioning). Computing hx 2 g ij i by performing a steady-state analysis of Eqs (28)-(30) and using a similar approach as in Eq (22) we obtain Finding CV 2 of the protein level and subtracting the extrinsic noise found in Eq (24) yields where CV 2 R represents the contribution of partitioning errors to the protein noise level. Intriguingly, while CV 2 R increases with α, it decrease with CV 2 T . Thus, as cell-division times become more random for a fixed hTi and hxi, the noise contribution from partitioning errors decrease. It turns out that this dependence of CV 2 R on CV T is a direct result of the second equation in Eq (5), where stochasticity in the partitioning process increases linearly with x(t s ), the number of protein molecules just before division. Based on Eq (17), one needs to reduce k x or hBi to maintain a fixed hxi for increasing randomness in cell-division times. Since the average number of protein molecules just before division is 2k x hBihTi (see Appendix D in S1 Text), a reduction in k x or hBi results in a lower number of protein molecules before division, and hence, lesser noise from partitioning as per Eq (5) and a smaller CV 2 R . This reasoning is supported by the fact that if we redefine the noise in the partitioning process to make it independent of x(t s ), i.e. modify Eq (5) as then the noise contribution from partitioning errors is given by and the dependency of CV 2 R on CV T disappears (Appendix D in S1 Text). Contribution from stochastic expression. Finally, we consider the full model in Fig 2A  with all the three different noise sources. For this model, moment dynamics is obtained as (see Appendix E in S1 Text) ðjhxg jj iÞ À ikhx 2 g i1 i; ð36Þ Compared to Eqs (28)-(30), (35)-(37) have additional terms of the form k x hB 2 i, where hB 2 i is the second-order moment of the protein burst size in Eq (1). Performing an identical analysis as before we obtain which yields the following total protein noise level that can be decomposed into three terms. The first term CV 2 E represents the contribution from random cell-division events and is given by Eq (24). The second term CV 2 R is the contribution from partitioning errors determined in the previous section (partitioning noise), and the final term CV 2 P is the additional noise representing the contribution from stochastic expression (production noise). A common approach to study gene expression noise is to decompose it into intrinsic and extrinsic components. These components are obtained experimentally using the dual-color assay that measures the correlation in the expression of two identical copies of the gene [49]. As per this definition, CV 2 E represents the extrinsic noise as random cell-division events are common to all genes and makes expression levels more correlated in individual cells.
In contrast, the contributions from noisy production and partitioning represent the intrinsic noise as they are specific to an individual gene and make expression levels less correlated. An interesting observation from Eq (39) is that CV 2 T has opposite effects on CV 2 R and CV 2 P (for fixed mean protein level). While CV 2 R monotonically decreases with increasing CV 2 T , CV 2 P increases with CV 2 T . Thus, if hxi is small and α is large, then the noise contributed from partitioning dominates the total noise, and making cell-cycle duration more random will reduce the total noise. However, since both CV 2 E and CV 2 P are monotonically increasing functions of CV 2 T , the total noise will begin to increase with CV 2 T once these noise sources become dominant. It turns out that in certain cases the intrinsic noise becomes invariant of CV 2 T . For example, when B = 1 with probability one, i.e., proteins are synthesized one at a time at exponentially distributed time intervals and α = 1 (binomial partitioning) In this limit the intrinsic noise is always 1/Mean irrespective of the cell-cycle time distribution T [33]. Note that the average number of proteins itself depends on T as shown in Eq (17). Another important limit is CV 2 T ! 0, in which case Eq (39) reduces to and is similar to the result obtained in [38] for deterministic cell-division times and binomial partitioning. Fig 4 shows how different protein noise components change as a function of the mean protein level as the gene's transcription rate k x is modulated. The extrinsic noise is primarily determined by the distribution of the cell-cycle time and is completely independent of the mean. In contrast, both CV 2 R and CV 2 P scale inversely with the mean, albeit with different scaling factors (Fig 4). This observation is particularly important since many single-cell studies in E. coli, yeast and mammalian cells have found the protein noise levels to scale inversely with the mean across different genes [68][69][70][71]. Based on this scaling it is often assumed that the observed cellto-cell variability in protein copy numbers is a result of stochastic expression. However, as our results show, noise generated thorough partitioning errors is also consistent with these experimental observations and it may be impossible to distinguish between these two noise mechanisms based on protein CV 2 versus mean plots unless α is known.

Quantifying the effects of gene duplication on protein noise
The full model introduced in Fig 2 assumes that the transcription rate (i.e., the protein burst arrival rate) is constant throughout the cell cycle. This model is now extended to incorporate gene duplication during cell cycle, which increases the burst arrival (transcription) rate by f times (Fig 5). Note that due to gene dosage compensation, doubling the genome does not always correspond to f = 2 [72][73][74]. If f > 1, then accumulation of proteins will be bilinear as illustrated in Fig 1. As before, we again take a phenomenological approach to model the timing of gene duplication. The cell-cycle time T is divided into two intervals: time from the start of cell cycle to gene duplication (T 1 ), and time from gene duplication to cell division (T 2 ). T 1 and T 2 are independent random variables, each drawn from a mixture of Erlang distributions (see Fig. B in S1 Text). The mean cell-cycle duration and its noise can be expressed as where CV 2 X denotes the squared coefficient of variation of the random variable X. An important variable in this formulation is β, which represents the average time of gene duplication normalized by the mean cell-cycle time. Thus, β values close to 0 (1) imply that the gene is duplicated  T ) and magnitude of partitioning errors quantified by α (see Eq (5)). With increasing mean level the total noise first decreases and then reaches a baseline that corresponds to extrinsic noise. For this plot, B is assumed to be geometrically-distributed with mean hBi = 1.5, CV 2 T ¼ 0:05 and α = 1 (i.e., binomial partitioning). doi:10.1371/journal.pcbi.1004972.g004 early (late) in the cell-cycle process. Moreover, the noise in the gene-duplication time is controlled via CV 2 T 1 . We refer the reader to Appendix F in S1 Text for a detailed analysis of the model in Fig 5  and only present the main results on the protein mean and noise levels. The steady-state mean protein count is given by and decreases with β, i.e., a gene that duplicates early has on average, more number of proteins. When β = 1, then the transcription rate is k x throughout the cell cycle and we recover the mean protein level obtained in Eq (17). Similarly, when β = 0 the transcription rate is fk x and we obtain f times of the amount as in Eq (17). As per our earlier observation, more randomness in the timing of genome duplication and cell division (i.e., higher CV 2 T 1 and CV 2 T 2 values) increases hxi. Our analysis shows that the total protein noise level can be decomposed into three components where CV 2 E is the extrinsic noise from random genome-duplication/cell-division events, and the sum of the contributions from partitioning errors (CV 2 R ) and stochastic expression (CV 2 P ) is the intrinsic noise. We refer the reader to Appendix F in S1 Text for noise formulas for any f, and only present formulas for f = 2 here. In this case, the intrinsic noise is obtained as Note that for β = 0 and 1, we recover the intrinsic noise level in Eq (39) from Eq (45). Interestingly, for B = 1 with probability 1 and α = 1, the intrinsic noise is always 1/Mean irrespective of the values chosen for CV 2 T 1 , CV 2 T 2 and β. For high precision in the timing of cell-cycle events where mean protein level is given by We investigate how different noise components in Eq (46) vary with β as the mean protein level is held fixed by changing k x . Fig 6 shows that CV 2 P follows a U-shaped profile with the optima occurring at b ¼ 2 À ffiffi ffi 2 p % 0:6 and the corresponding minimum value being % 5% lower than its value at β = 0. An implication of this result is that if stochastic expression is the dominant noise source, then gene duplication can result in slightly lower protein noise levels. In contrast to CV 2 P , CV 2 R has a maxima at b ¼ 2 À ffiffi ffi 2 p which is % 6% higher than its value at β = 0 (Fig 6). Analysis in Appendix F5 in S1 Text reveals that CV 2 R and CV 2 P follow the same qualitative shapes as in Fig 6 for any CV 2 T 1 and CV 2 T 2 . Interestingly, when CV 2 T 1 ¼ CV 2 T 2 , the maximum and minimum values of CV 2 R and CV 2 P always occur at b ¼ 2 À ffiffi ffi 2 p albeit with different optimal values than Fig 6 (see Fig. C in S1 Text). For example, if CV 2 T 1 ¼ CV 2 T 2 ¼ 1 (i.e., exponentially distributed T 1 and T 2 ), then the maximum value of CV 2 R is 20% higher and the minimum value of CV 2 P is 10% lower than their respective value for β = 0. Given that the effect of changing β on CV 2 P and CV 2 R is small and antagonistic, the overall affect of genome duplication on intrinsic noise may be minimal and hard to detect experimentally.
While the above analysis is for a stable protein, a natural question to ask is how would these results change for an unstable protein? Consider an unstable protein with half-life considerably shorter than the cell-cycle duration. This rapid turnover ensures that the protein level equilibrates instantaneously after cell-division and gene-duplication events. Let γ x denote the protein decay rate. Then, the mean protein level before and after genome duplication is hxi ¼ k x hBi=g x and hxi ¼ 2k x hBi=g x , respectively. Note that in the limit of large γ x there is no noise contribution from partitioning errors since errors incurred at the time of cell division would be instantaneously corrected. The extrinsic noise, which can be interpreted as the protein noise level for deterministic protein production and decay is obtained as (for analysis on general f see Appendix G in S1 Text)  (46) are plotted as a function of β, which represents the fraction of time within the cell cycle at which gene duplication occurs. The mean protein level is held constant by simultaneously changing the transcription rate k x . Noise levels are normalized by their respective value at β = 0. The noise contribution from partitioning errors is maximized at β % 0.6. In contrast, the contribution from stochastic expression is minimum at β % 0.6. The extrinsic noise contribution from random gene-duplication and cell-division events is maximum at β % 0.2 and minimum at β % 0.8. For this plot, the mean of the protein is 170 molecules per cell; and the bursts are geometrically distributed with hBi = 10.
doi:10.1371/journal.pcbi.1004972.g006 which is similar to noise level reported in [75]. When β = 0 or 1, the transcription rate and the protein level are constant within the cell cycle and CV 2 E ¼ 0. Moreover, CV 2 E is maximized at β = 2/3 with a value of 1/8. Thus, in contrast to a stable protein, extrinsic noise in an unstable protein is strongly dependent on the timing of gene duplication. Next, consider the intrinsic noise component. Analysis in Appendix G in S1 Text shows that the noise contribution from random protein production and decay is While the mean protein level is strongly dependent on β, the intrinsic noise Fano factor ¼ CV 2 P Â hxi is independent of it. Thus, similar to what was observed for a stable protein, the intrinsic noise in an unstable protein is invariant of β for a fixed hxi.

Discussion
We have investigated a model of protein expression in bursts coupled to discrete gene-duplication and cell-division events. The novelty of our modeling framework lies in describing the size of protein bursts B, the time between cell birth and gene duplication T 1 , the time between gene duplication and cell division T 2 , and partitioning of molecules during cell division through general statistical distributions. Exact formulas connecting the protein mean and noise levels to these underlying distributions were derived. Furthermore, the protein noise level, as measured by the squared coefficient of variation, was decomposed into three components representing contributions from gene-duplication/cell-division events, stochastic expression and random partitioning. While the first component is independent of the mean protein level, the other two components are inversely proportional to it. Some important insights are as follows: • The mean protein level is affected by both the first and second-order moments of T 1 and T 2 .
In particular, randomness in these times (for a fixed mean) increases the average protein count. This increase can be attributed to the fact that increasing cell-cycle time variations leads to positively skewed distributions, making longer cell cycles (and hence higher protein accumulation) more likely.
• Random gene-duplication/cell-division events create an extrinsic noise term which is completely determined by moments of T 1 and T 2 up to order three. Interestingly, noise in the timing of these events also critically affects the intrinsic noise contributions from stochastic expression and partitioning. Hence, ignoring the effect of cell-cycle time variations, may lead to erroneous estimation of intrinsic noise.
• The noise contribution from partitioning errors decreases with increasing randomness in T 1 and T 2 . Thus, if hxi is sufficiently small and α is large compared to B in Eq (45), increasing noise in the timing of cell-cycle events decreases the total noise level.
A key limitation of our approach is to model timing of gene-duplication/cell-division events through independent random variables. There is always non-zero correlation in the cell-cycle durations of mother and daughter cells [76][77][78]. Moreover, in the same cell cycle, times T 1 and T 2 could be dependent [79]. While our assumption on independence of timing maybe unrealistic, it played an important role in deriving exact analytical formulas for the protein mean and noise levels. We have used Monte Carlo simulations to investigate scenarios where T 1 and T 2 , or successive cell-division events, have memory and are dependent random variables (see Appendix H in S1 Text). Our analysis reveals that the results presented in Figs 4 and 6 hold even when the assumption of independent timing is perturbed over biologically meaningful parameters.

Effect of gene duplication on noise level
In this first-of-its-kind study, we have investigated how discrete f-fold changes in the transcription rate due to gene duplication affect the intercellular variability in protein levels. Not surprisingly, the timing of genome duplication strongly affects the mean protein level-hxi can change up to f folds depending on whether the gene duplicates early (β = 0) or late (β = 1) in the cell cycle. Results show that genome duplication has counter intuitive effects on the protein noise level (Fig 6). For example, if stochastic expression is the dominant source of noise, then doubling of transcription due to duplication results in lower noise, as compared to constant transcription throughout the cell cycle. This is because for the same mean protein level, there are more burst (transcription) events in the case of genome duplication (f = 2) than constant transcription (f = 1). For example, consider deterministic timing (CV 2 T 1 ¼ CV 2 T 2 ¼ 0) and gene duplication in the middle of the cell cycle (β = 0.5). Then, for the case β = 1, there are on average k x hTi burst events per cell cycle. For the same hxi, there are 1.05k x hTi production events in the case of gene duplication (β = 0.5). This slight increase in the number of transcription events leads to better averaging of bursty protein synthesis and lower noise levels. Overall, the effect of β on different noise component is quite modest: as β varies, noise components deviate at maximum % 20% from their values at β = 0 (Fig 6). These results are in contrast to the case of an unstable protein, where noise from the cell-cycle process is strongly dependent on β as shown in Eq (48).

Noise in synchronized cells
The mathematical framework introduced for modeling timing of cell division can be easily used to compute noise in synchronized cells. For example, let the cell-cycle duration be an Erlang distribution with shape parameter n and rate parameter nk (i.e., p n = 1 in Fig 3), which can be biologically interpreted as cells moving through n cell-cycle stages G n1 , G n2 , . . ., G nn . Statistical moments conditioned on the cell-cycle stage G nj can be obtained using Using Eq (50) and moments hg nj x m i obtained from Eqs (16) and (35)- (37), yields the following conditional mean which increases with cell-cycle stage (i.e., higher values of j). The protein noise level given that cells are in stage G nj is given by Note that if n is large then the first term, which represents the noise contribution from the cellcycle process, is negligible and can be dropped. Interesting, while the noise contribution from partitioning errors CV 2 R decreases with cell-cycle stage, the noise contribution from stochastic expression CV 2 P increases with j. Moreover, for B = 1 with probability 1 and α = 1, the intrinsic noise is always 1/Mean irrespective of j. Assuming high n, the noise at cell birth (j = 1) and division (j = n) are obtained as respectively. Thus, measurements of Eqs (53) and (54) by synchronizing cells (or by using cell size as a proxy for cell-cycle stage) can be used to quantify α and hB 2 i/hBi, providing a novel way to separate these noise contributions. Next, we discuss how noise in asynchronous cell can be used to quantify these parameters.

Parameter inference in asynchronous cells
Simple models of bursty expression and decay predict the distribution of protein levels to be negative binomial (or gamma distributed in the continuous framework) [80,81]. These distributions are characterized by two parameter-the burst arrival rate k x and the average burst size hBi, which can be estimated from measured protein mean and noise levels. This method has been used for estimating k x and hBi across different genes in E. coli [47,82]. Our detailed model that takes into account partitioning errors predicts (ignoring gene-duplication effects) Using CV 2 T ⪡1 and a geometrically distributed B [50,[83][84][85], Eq (55) reduces to Intrinsic noise ¼ 4a 9 1 hxi þ 5 9 Given measurements of intrinsic noise and the mean protein level, hBi can be estimated from Eq (56) assuming α = 1 (i.e., binomial partitioning). Once hBi is known, k x is obtained from the mean protein level given by Eq (17). Since for many genes hBi % 0.5-5 [47], the contribution of the first term in Eq (56) is significant, and ignoring it could lead to overestimation of hBi. Overestimation would be even more severe if α happen to be much higher than 1, as would be the case for proteins that form aggregates or multimers [33]. One approach to estimate both hBi and α is to measure intrinsic noise changes in response to perturbing hBi by, for example, changing the mRNA translation rate through mutations in the ribosomal-binding sites (RBS). Consider a hypothetical scenario where the Fano Factor (intrinsic noise times the mean level) is 6. Let mutations in the RBS reduces hxi by 50%, implying a 50% reduction in hBi. If the Fano factor changes from 6 to 4 due to this mutation, then hBi = 3.6 and hαi = 3.25.
Our recent work has shown that higher-order statistics of protein levels (i.e., skewness and kurtosis) or transient changes in protein noise levels in response to blocking transcription provide additional information for discriminating between noise mechanisms [86,87]. Up till now these studies have ignored noise sources in the cell-cycle process. It remains to be seen if such methods can be used for separating the noise contributions of partitioning errors and stochastic expression to reliably estimate hBi and α.
Integrating cell size and promoter switching An important limitation of our modeling approach is that it does not take into account the size of growing cells. Recent experimental studies have provided important insights into the regulatory mechanisms controlling cell size [88][89][90][91]. More specifically, studies in E. coli and yeast argue for an "adder" model, where cell-cycle timing is controlled so as to add a constant volume between cell birth and division [78,[91][92][93]. Assuming exponential growth, this implies that the time taken to complete cell cycle is negatively correlated with cell size at birth. More importantly, cell size also affects gene expression-in mammalian cells transcription rates linearly increase with the cell size [94]. Thus, as cells become bigger they also produce more mRNAs to ensure gene product concentrations remains more or less constant. A main direction of future work would be to explicitly include cell size with size-dependent expression and timing of cell division determined by the adder model. This formulation will for the first time, allow simultaneous investigation of stochasticity in cell size, protein molecular count and concentration.
Our study ignores genetic promoter switching between active and inactive states, which has been shown to be a major source of noise in the expression of genes across organisms [95][96][97][98][99][100][101][102][103][104]. Taking into account promote switching is particularly important for genome-duplication studies, where doubling the number of gene copies could lead to more efficient averaging of promoter fluctuations. Another direction of future work will be to incorporate this additional noise source into the modeling framework and investigate its contribution as a function of gene-duplication timing.
Supporting Information S1 Text. Supplementary Material. Additional proofs and simulations. (PDF)

Author Contributions
Conceived and designed the experiments: MS DA AS.