Gene expression in individual cells is highly variable and sporadic, often resulting in the synthesis of mRNAs and proteins in bursts. Such bursting has important consequences for cell-fate decisions in diverse processes ranging from HIV-1 viral infections to stem-cell differentiation. It is generally assumed that bursts are geometrically distributed and that they arrive according to a Poisson process. On the other hand, recent single-cell experiments provide evidence for complex burst arrival processes, highlighting the need for analysis of more general stochastic models. To address this issue, we invoke a mapping between general stochastic models of gene expression and systems studied in queueing theory to derive exact analytical expressions for the moments associated with mRNA/protein steady-state distributions. These results are then used to derive noise signatures, i.e. explicit conditions based entirely on experimentally measurable quantities, that determine if the burst distributions deviate from the geometric distribution or if burst arrival deviates from a Poisson process. For non-Poisson arrivals, we develop approaches for accurate estimation of burst parameters. The proposed approaches can lead to new insights into transcriptional bursting based on measurements of steady-state mRNA/protein distributions.
One of the fundamental problems in biology is understanding how phenotypic variations arise among individuals in a population. Recent research has shown that phenotypic variations can arise due to probabilistic cell-fate decisions driven by inherent randomness (noise) in the process of gene expression. One of the manifestations of such stochasticity in gene expression is the production of mRNAs and proteins in bursts. Bursting in gene expression is known to impact cell-fate in diverse systems ranging from latency in HIV-1 viral infections to cellular differentiation. Recent single-cell experiments provide evidence for complex arrival processes leading to bursting, however an analytical framework connecting such burst arrival processes with the corresponding higher moments of mRNA/protein distributions is currently lacking. We address this issue by invoking a mapping between general models of gene expression and systems studied in queueing theory. The framework developed and the results derived lead to new approaches for testing commonly used assumptions in modeling gene expression and for accurate estimation of burst parameters. Notably, the phenomenon of stochastic bursting has been observed in a wide range of disciplines ranging from neuroscience and finance to cell biology. The approaches developed and results obtained in this work will thus contribute towards quantitative characterization of burst processes in diverse systems of current interest.
Citation: Kumar N, Singh A, Kulkarni RV (2015) Transcriptional Bursting in Gene Expression: Analytical Results for General Stochastic Models. PLoS Comput Biol 11(10): e1004292. https://doi.org/10.1371/journal.pcbi.1004292
Editor: Alexandre V. Morozov, Rutgers University, UNITED STATES
Received: February 20, 2015; Accepted: April 16, 2015; Published: October 16, 2015
Copyright: © 2015 Kumar et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited
Data Availability: All relevant data are within the paper and its Supporting Information files.
Funding: This work was funded by: NSF PHY 1307067 URL: http://www.nsf.gov; NSF DMS 1413111 URL: http://www.nsf.gov; and NSF DMS-1312926 URL: http://www.nsf.gov. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
The cellular response to fluctuating environments requires adjustments to cellular phenotypes driven by underlying changes in gene expression. Given the inherent stochasticity of cellular reactions, biological circuits controlling gene expression have to operate in the presence of significant noise [1–15]. While noise reduction and filtering are essential for several cellular processes, cells can also amplify and utilize intrinsic noise to generate phenotypic diversity that enables survival under stressful conditions. Recent studies have demonstrated the importance of such bet-hedging survival strategies in diverse processes ranging from viral infections to bacterial competence. Quantifying the kinetic mechanisms of gene expression that drive variations in a population of cells will thus contribute towards a fundamental understanding of cellular functions with important applications to human health.
Recent experiments focusing on gene expression at the single-cell level have revealed striking differences from the corresponding population-averaged behavior. In particular, it has been demonstrated that transcription in single cells is sporadic, with mRNA synthesis often occurring in bursts followed by variable periods of inactivity [7, 18–28]. Such transcriptional bursting can give rise to high variability in gene expression products and to phenotypic variations in a population of genetically identical cells [29–32]. Furthermore, dynamical parameters that characterize transcriptional bursting of key genes can significantly influence cell-fate decisions in diverse processes ranging from HIV-1 viral infections to stem-cell differentiation. Correspondingly, there is significant interest in developing approaches for quantifying parameters related to transcriptional bursting such as frequency and mean burst size.
In recent years, multiple studies have provided evidence for bursty synthesis of mRNAs [20–25, 33, 34] and proteins [35, 36]. Experimental approaches in such studies include both steady-state measurements and time-dependent measurements of the mean and variance of gene expression products at the single-cell level. While obtaining time-lapse measurements of bursts at the single-cell level can be challenging, steady-state measurements at the single-cell level are now carried out routinely. It would thus be desirable to develop approaches for making inferences about burst parameters in gene expression using steady-state measurements at the single-cell level.
As noted previously, steady-state measurements of the mean and variance alone cannot be used for estimating burst parameters for general models of gene expression, e.g. when burst arrival is governed by complex promoter-based regulation. Additional insights into processes leading to transcriptional bursting can potentially be obtained using measurements of higher moments. However, analytical results for higher moments of steady-state mRNA and protein distributions in general models of expression have not been obtained so far. The derivation of the corresponding analytical expressions will elucidate how measurement of higher moments can potentially lead to quantification of burst parameters. To address these issues, it is essential to develop and analyze a general class of stochastic models of gene expression.
A simple stochastic model that is widely used in analyzing bursting in gene expression is the random telegraph model that takes into account the switching of promoter between transcriptionally active (ON) and inactive (OFF) states [39–41]. This model has been used as the basis for several studies focusing on inferring gene expression parameters based on observations of the mean and variance of mRNA/protein distributions [13, 27, 42]. In this model, in the limit that we have transcriptional bursting, the arrival of bursts is a Poisson process. Correspondingly, the waiting-time distribution between arrival of mRNA bursts is assumed to be exponential. In general, this assumption is not valid as there are multiple kinetic steps involved in promoter activation [37, 43, 44]. Recent experiments on mammalian genes [7, 45, 46] have demonstrated that the waiting-time for arrival of bursts does not have an exponential distribution. In view of these experimental observations, it is natural to ask: Using steady-state measurements, can we infer if the burst arrival process is not a Poisson process? If so, how can we estimate the corresponding burst parameters?
Furthermore, in estimating burst size it is commonly assumed that mRNA/protein bursts are geometrically distributed. This assumption, which has been validated by experimental observations for some genes, is derived from the corresponding distribution of bursts in the random telegraph model. However, given the complexity and diversity of gene expression mechanisms, it is possible that several promoters involve multiple rate-limiting steps in the transition from the ON state to the OFF state. In such cases, the transcriptional burst size distribution will not be a geometric distribution. This observation leads to the following question: Can we use steady-state measurements of moments to determine if the burst distribution deviates from a geometric distribution?
The aim of this paper is to address the above questions by considering models with general arrival processes for mRNA creation. The paper is organized as follows. First, we introduce a class of gene expression models with general arrival processes leading to mRNA/protein bursts with arbitrary burst distribution. Then we review the mapping from gene expression models to systems studied in queueing theory [43, 47, 48] and use this mapping to derive steady-state moments for mRNA/protein distributions. The analytical expressions obtained for the steady-state moments are used to develop approaches for estimating burst parameters for general arrival processes. Finally, we use the results obtained to derive conditions relating experimentally measurable quantities that determine if the arrival of mRNA bursts deviates from a Poisson process and if the distribution of mRNA bursts deviates from a geometric distribution.
Model and preliminaries
We consider a general model of gene expression as outlined in Fig 1. In the model, mRNAs are produced in bursts, with f(t) representing a general arrival time distribution for mRNA bursts. The mRNA burst distribution can be arbitrary. Each mRNA then produces proteins with rate kp, and finally, both mRNAs and proteins decay with rates μm and μp, respectively. Note that the model also allows for post-transcriptional regulation since the protein burst distribution from each mRNA can be arbitrary; the only assumption is that each mRNA produces proteins independently.
Bursts of mRNAs arrive with a general arrival time distribution f(t). Each mRNA produces proteins with rate kp, and mRNAs and proteins decay with rates μm and μp, respectively.
In the limit μp ≪ μm, we can use the bursty synthesis approximation for analyzing protein dynamics. This approximation consists of two steps: 1) obtaining the distribution of proteins produced from each mRNA and 2) assuming that the proteins are produced in instantaneous bursts. The corresponding distribution for the number of proteins created is referred to as the protein burst distribution. A detailed justification of the validity of this approximation has been provided in previous work [40, 49].
Let Ap(z) denote the generating function of the protein burst distribution p(n) produced by a single mRNA, and let AP(z) denote the generating function of the protein burst distribution P(n) produced by all the mRNAs in a burst. If we denote by Am(z) the generating function of the mRNA burst distribution, then we have the following relation between the generating functions: AP(z) = Am(Ap(z)). (1) The above relation follows from the observation that the number of proteins produced in a burst is a compound random variable: the sum of m independent, identically distributed random variables, each of which corresponds to the number of proteins produced from a single mRNA in the burst, where m is itself a random variable denoting the number of mRNAs produced in the burst.
While the analytical results that we derive are valid for general mRNA and protein burst distributions, we will primarily focus on a specific class of burst distributions. Simple kinetic models and the results from multiple experiments indicate that mRNA burst distributions are geometric. Similarly, the burst distribution of proteins produced from a single mRNA is a geometric distribution with mean ⟨pb⟩ = kp/μm. For a geometric distribution with mean ⟨pb⟩, the generating function is given by 1/[1 + ⟨pb⟩(1 − z)].
If we condition the geometric distribution on the production of at least 1 mRNA, then the generating function for the corresponding conditional geometric distribution is given by Am(z) = z/[1 + ⟨mb⟩(1 − z)], with (1 + ⟨mb⟩) as the mean mRNA burst size. Note that in the limit ⟨mb⟩ → 0, this distribution reduces to exactly 1 mRNA produced in each burst. Thus the conditional geometric distribution provides a unified representation of both the Poisson arrival process for mRNAs (⟨mb⟩ → 0) and processes leading to transcriptional bursting (⟨mb⟩ > 0).
Consider now the protein burst distribution produced by an underlying conditional geometric mRNA burst distribution with mean (1 + ⟨mb⟩). Using Eq (1), we see that the corresponding generating function of the protein burst distribution is given by 1/[1 + (1 + ⟨mb⟩)⟨pb⟩(1 − z)]. This is the generating function for a geometric distribution with mean b = (1 + ⟨mb⟩)⟨pb⟩, where ⟨pb⟩ = kp/μm represents the mean protein burst size from a single mRNA.
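The composition of a conditional geometric mRNA burst with geometric protein bursts can be checked by direct sampling. The following is a minimal sketch, not part of the original analysis: the function name `sample_protein_bursts` and the parameter values are illustrative assumptions, and NumPy is assumed to be available. It relies on the fact that a sum of n independent geometric variables on {0, 1, ...} is negative binomial.

```python
import numpy as np

def sample_protein_bursts(mean_mb, mean_pb, size, rng):
    """Total proteins per burst: conditional geometric mRNA count, geometric proteins per mRNA."""
    # NumPy's geometric has support {1, 2, ...}; with p = 1/(1 + mean_mb) its mean is
    # 1 + mean_mb, i.e. it is the conditional geometric mRNA burst distribution.
    n_mrna = rng.geometric(1.0 / (1.0 + mean_mb), size=size)
    # Each mRNA yields a geometric number of proteins on {0, 1, ...} with mean mean_pb.
    # The sum over n_mrna mRNAs is negative binomial (number of failures before n successes).
    p = 1.0 / (1.0 + mean_pb)
    return rng.negative_binomial(n_mrna, p)

rng = np.random.default_rng(0)
mean_mb, mean_pb = 4.0, 3.0            # illustrative values only
bursts = sample_protein_bursts(mean_mb, mean_pb, 200_000, rng)

b = (1.0 + mean_mb) * mean_pb          # predicted mean of the combined burst
print("mean:      ", bursts.mean(), "predicted", b)
print("variance:  ", bursts.var(), "predicted", b * (1.0 + b))
print("P(0):      ", (bursts == 0).mean(), "predicted", 1.0 / (1.0 + b))
```

If the combined burst is geometric with mean b, its variance should be close to b(1 + b) and the probability of an empty burst close to 1/(1 + b), which is what the three printed comparisons probe.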
Single-cell experiments have demonstrated that the protein burst mean b can be directly measured in some cases. However, if the protein production rate kp is not known, the preceding analysis implies that measurements of protein burst distributions (which determine b) cannot be used to determine the degree of transcriptional bursting (1 + ⟨mb⟩). Since the mean transcriptional burst size is an important parameter characterizing bursting, it is of interest to develop approaches for estimating it based on available experiments. Previous work has argued that the mean transcriptional burst size cannot be determined using measurements of protein burst distributions alone or by using only protein steady-state distributions. It was suggested that combining such measurements can potentially provide a way of determining the mean transcriptional burst size. To explore this possibility, it is necessary to derive analytical results connecting moments of burst and steady-state distributions for general kinetic schemes.
Mapping to queueing theory: Results for moments
To obtain steady-state moments for the model outlined in Fig 1, we invoke the mapping of this gene expression model to systems studied in queueing theory [43, 48, 51, 52]. Broadly speaking, queueing theory is the mathematical theory of waiting lines formed by customers who, arriving according to some random protocol, stay in the system until they receive service from a group of servers. Such queues are typically characterized by specifying a) the stochastic process governing the arrival of customers, b) distribution of number of customers in each arrival, c) the stochastic process governing departure of customers, and d) the number of servers. When the gene expression model in Fig 1 is expressed in the language of queueing theory, individual mRNAs/proteins are the analogs of customers in queueing models. The production of mRNAs/proteins in bursts corresponds to the arrival of customers in batches. Just as the customers leave the queue after receiving service, mRNAs/proteins exit the system upon degradation. Thus the waiting-time distribution for mRNA/protein decay is the analog of service time distribution for customers in queueing models. For the model in Fig 1, their decay time distribution is the exponential distribution. Also, since mRNAs/proteins are degraded independently of each other, the corresponding number of servers in queueing models is ∞ (which ensures that presence of a customer in the system does not affect the service time of any other customer in the system).
Based on the above mapping, the queueing system corresponding to the model outlined in Fig 1 is the GI^X/M/∞ system [43, 48]. In this notation, the symbol G refers to a general waiting-time distribution for the arrival process, I^X denotes customers arriving in batches of independently distributed random sizes X, M stands for a Markovian (i.e. exponential) service-time distribution for customers and ‘∞’ stands for infinite servers.
For the GI^X/M/∞ model, exact results for iteratively obtaining the moments of the steady-state distribution of the number of customers have been derived. Using these results, explicit expressions for the first four moments of the steady-state distribution are provided in the Supplementary S1 Text. Applying the mapping discussed above, these results can be translated into exact expressions for the moments of mRNA/protein steady-state distributions, as discussed below.
Let us first examine the expressions for steady-state means of mRNAs, ⟨ms⟩, and proteins, ⟨ps⟩, which are given by ⟨ms⟩ = kb⟨mb⟩/μm, ⟨ps⟩ = kb b/μp, (2) where kb stands for the mean arrival rate of mRNA bursts, ⟨mb⟩ here denotes the mean of a general mRNA burst distribution, and b = ⟨mb⟩⟨pb⟩ is the mean of the protein burst distribution (including contributions from all the mRNAs). Although Eq (2) has been derived by assuming that the arrival of mRNAs/proteins is a renewal process, it is valid for arbitrary arrival processes. This is because Eq (2) is a direct consequence of Little’s Law [47, 53], which is valid for general arrival processes.
The above equations, Eq (2), can be used to determine the mean transcriptional burst size, provided the protein burst distribution can be measured experimentally. To see this, we note that dividing the expressions for the mean mRNA and protein levels leads to ⟨ps⟩/⟨ms⟩ = (μm/μp)(b/⟨mb⟩). (3) Since the steady-state means ⟨ms⟩ and ⟨ps⟩ as well as the degradation rates μm and μp are parameters that can be measured experimentally, the above equation implies that the ratio b/⟨mb⟩ can be determined experimentally. Given b/⟨mb⟩ = kp/μm, this implies that the mean protein production rate kp can also be determined experimentally. This is an important result since it provides an approach for determining the mean protein production rate kp that is valid for arbitrary arrival processes for mRNAs. Furthermore, the above equation implies that, if the mean of the protein burst distribution b can be measured, then the mean transcriptional burst size ⟨mb⟩ can also be determined. Thus, if we have measurements for mean mRNA and protein numbers and also the mean of the protein burst distribution, then these measurements can be used to determine the degree of transcriptional bursting ⟨mb⟩ as well as the parameters ⟨pb⟩ and kp. It is noteworthy that this procedure for estimating the burst parameters is valid for arbitrary stochastic processes corresponding to mRNA transcription.
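The estimation procedure described above amounts to a few lines of arithmetic. The sketch below is illustrative only: the function name `burst_parameters_from_means` and the numerical values are assumptions. It takes the steady-state means as ⟨ms⟩ = kb⟨mb⟩/μm and ⟨ps⟩ = kb b/μp, with b = ⟨mb⟩⟨pb⟩ and ⟨pb⟩ = kp/μm, as stated in the text.

```python
def burst_parameters_from_means(mean_m, mean_p, mu_m, mu_p, b):
    """Recover <p_b>, k_p, <m_b> and k_b from steady-state means and the protein burst mean b."""
    # Ratio of the means gives b/<m_b> = <p_b>, the mean number of proteins per mRNA.
    mean_pb = (mean_p / mean_m) * (mu_p / mu_m)
    k_p = mean_pb * mu_m                 # from <p_b> = k_p / mu_m
    mean_mb = b / mean_pb                # mean transcriptional burst size
    k_b = mean_m * mu_m / mean_mb        # burst arrival rate from the mRNA mean
    return {"mean_pb": mean_pb, "k_p": k_p, "mean_mb": mean_mb, "k_b": k_b}

# Synthetic "measurements" generated from known parameters (illustrative values).
true = {"k_b": 0.5, "mean_mb": 4.0, "k_p": 6.0, "mu_m": 2.0, "mu_p": 0.1}
mean_pb_true = true["k_p"] / true["mu_m"]
b_true = true["mean_mb"] * mean_pb_true
mean_m = true["k_b"] * true["mean_mb"] / true["mu_m"]
mean_p = true["k_b"] * b_true / true["mu_p"]

estimate = burst_parameters_from_means(mean_m, mean_p, true["mu_m"], true["mu_p"], b_true)
print(estimate)
```

None of these steps uses the waiting-time distribution of the bursts, which reflects the statement that the procedure holds for arbitrary arrival processes.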
We next turn to expressions for higher moments of mRNA and protein steady-state distributions. The noise in mRNA steady-state distributions is given by σ²ms/⟨ms⟩² = (1/⟨ms⟩){1 + (1/2)[Kg(μm)⟨mb⟩ − 1 + σ²mb/⟨mb⟩]}, (4) where σ²ms is the variance of the mRNA steady-state distribution, σ²mb is the variance of the mRNA burst distribution and Kg(μm) is the so-called gestation factor, Kg(s) = 1 + 2[fL(s)/(1 − fL(s)) − kb/s], (5) with fL(s) denoting the Laplace transform of the arrival time distribution of mRNA bursts. The function Kg(μm) encodes information about the arrival process. Specifically, for Poisson arrivals, fL(s) = kb/(kb + s), so that Kg(μm) = 1.
For proteins (in the burst limit μm ≫ μp), we obtain (6) where Kg(μp) is given by Eq (5) and σ²pb is the variance of the protein burst distribution produced by a single mRNA. The expression for protein noise is composed of the noise term for the basic two-stage model of gene expression and additive noise contributions due to: a) deviations from an exponential waiting-time distribution for the arrival process, b) deviations from conditional geometric distributions for mRNA burst distributions and c) deviations from geometric distributions for protein burst distributions. For both mRNAs and proteins, the noise in steady-state distributions depends on all the moments of the burst arrival time distribution through the term Kg. Therefore, arrival processes corresponding to different kinetic schemes for transcription will make different contributions to the overall noise, even if they have identical means and variances for the burst arrival time distribution.
We note from Eq (4) that, for Poisson arrivals, i.e. Kg = 1, and geometrically distributed bursts (for which the burst variance is fixed by the burst mean), the equations for the noise and mean have only two unknown burst parameters, kb and ⟨mb⟩. In this case, experimental measurements of the first two moments of the steady-state distribution are sufficient to estimate the burst parameters, as has been done in multiple studies. However, when the arrival process is non-Poisson or if the burst distribution deviates from a geometric distribution, measurements of the first two steady-state moments are not sufficient for estimating the burst parameters. This observation motivates the need for analytical expressions for the higher moments, which we turn to next.
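To illustrate how a non-Poisson arrival process changes the noise at fixed mean, the sketch below compares a formula-based prediction with a direct simulation of the batch-arrival, infinite-server picture. Both formulas used here are assumptions: the gestation factor is taken as Kg(s) = 1 + 2[fL(s)/(1 − fL(s)) − kb/s], and the steady-state Fano factor (variance over mean) as 1 + (1/2)[Kg(μ)⟨X⟩ − 1 + σ²X/⟨X⟩] for bursts of mean size ⟨X⟩ and variance σ²X. The function names, the gamma (Erlang-2) waiting times and all parameter values are illustrative choices.

```python
import numpy as np

def gestation_factor(f_L, k_b, s):
    """Assumed form K_g(s) = 1 + 2 [f_L/(1 - f_L) - k_b/s]; equals 1 for Poisson arrivals."""
    fl = f_L(s)
    return 1.0 + 2.0 * (fl / (1.0 - fl) - k_b / s)

def fano_factor(K_g, mean_burst, var_burst):
    """Variance/mean of the steady-state copy number under the assumed noise formula."""
    return 1.0 + 0.5 * (K_g * mean_burst - 1.0 + var_burst / mean_burst)

def simulate_counts(shape, k_b, m_b, mu, T, reps, rng):
    """Copy number at time T: gamma waiting times between bursts, conditional geometric sizes."""
    n_max = int(3 * k_b * T) + 50                       # comfortably more bursts than expected in [0, T]
    gaps = rng.gamma(shape, 1.0 / (shape * k_b), size=(reps, n_max))
    times = np.cumsum(gaps, axis=1)                     # burst arrival times in each replicate
    sizes = rng.geometric(1.0 / (1.0 + m_b), size=(reps, n_max))  # support {1, 2, ...}, mean 1 + m_b
    sizes = np.where(times <= T, sizes, 0)              # drop bursts that arrive after T
    survive = np.exp(-mu * np.clip(T - times, 0.0, None))  # survival probability of each molecule
    return rng.binomial(sizes, survive).sum(axis=1)     # molecules still present at time T

rng = np.random.default_rng(0)
shape, k_b, m_b, mu = 2, 1.0, 4.0, 1.0
f_L = lambda s: (1.0 + s / (shape * k_b)) ** (-shape)   # Laplace transform of the gamma density
K_g = gestation_factor(f_L, k_b, mu)
mean_burst, var_burst = 1.0 + m_b, m_b * (1.0 + m_b)
predicted_fano = fano_factor(K_g, mean_burst, var_burst)
counts = simulate_counts(shape, k_b, m_b, mu, T=30.0, reps=20_000, rng=rng)
print("K_g:", K_g, "predicted Fano:", predicted_fano)
print("simulated mean:", counts.mean(), "simulated Fano:", counts.var() / counts.mean())
```

For these values the assumed formulas give Kg = 0.6 and a Fano factor of 4, whereas Poisson arrivals with the same burst statistics (Kg = 1) would give 5. The mean is unchanged, so mean and variance alone cannot separate the arrival statistics from the burst size.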
We now derive analytical expressions for the third moment, specifically the skewness parameter. For mRNAs, the exact expression for skewness γms is given by (7) where we have defined (8) For proteins, we obtain in the burst-limit (μm ≫ μp), (9) where the functions , , are given by Eq (8), is defined by and, using Eq (1) , we obtain the parameters as: (10) Similarly, expressions for higher order moments of protein and mRNA steady-state distributions can be obtained iteratively. The corresponding expressions for the kurtosis are provided in the S1 Text.
The analytical results derived above for proteins are exact in the burst limit, which assumes that proteins are produced instantaneously from all the mRNAs in a burst. Going beyond the burst limit (i.e. not limited to μm ≫ μp), exact results for the higher moments of the protein steady-state distribution will, in general, depend on the details of the kinetic scheme for gene expression. However, we can derive approximate analytical expressions for general schemes by requiring that: a) the results reduce to the exact results in the burst limit and b) they match the exact results for the two-stage model of gene expression. For the two-stage model, exact results for the first four moments have been derived by Bokes et al. Comparing these exact results with our results derived in the burst limit, we observe that the results of Bokes et al. can be reproduced by a suitable scaling of the burst-size parameters. For example, the exact expression for the noise is obtained by the following scaling: (11) Similarly, for the expression for skewness, the corresponding parameters are scaled as: (12) As shown in Fig 2 (for the random telegraph model), the analytical expressions using this approach are in good agreement with results from simulations.
(a) Kinetic scheme for the two-state random telegraph model. For this model, the steady-state variance (scaled by 10^−5) and third central moment ν3 (scaled by 10^−6) of proteins as a function of μm/μp are plotted in (b) and (c), respectively: lines represent analytic estimates and points correspond to the simulation results. Parameters are: α = 0.5, β = 0.25, km = 2, ⟨mb⟩ = 5, kp = 0.5.
It is noteworthy that the results derived are valid for a general class of kinetic schemes of gene expression. For a specific kinetic scheme, we can determine the corresponding waiting-time distribution for the arrival process and the burst distributions for mRNA and proteins. Substituting these results in the equations derived leads to the corresponding expressions for the moments of the steady-state distribution. The results obtained can thus provide insight into how specific kinetic schemes of gene expression (e.g. combining promoter-based regulation and post-transcriptional regulation) can be used to impact the noise and higher moments of steady-state distributions.
Estimation of burst parameters
The results derived for the steady-state moments indicate that, if the burst arrival process is not a Poisson process, then it is no longer accurate to estimate burst parameters based on measurements of mean and variance only, as has been done in previous studies. In the following, we present approaches for estimating burst parameters in the general case.
We begin by considering the general kinetic scheme shown in Fig 3. This form for the kinetic scheme is supported by recent experiments in mammalian cells which suggest the presence of multiple rate-limiting steps in the transition of the promoter from the OFF to the ON state [45, 57]. However, as observed in these experiments, a promoter in the ON state switches to the OFF state by a single rate-limiting step. We model the promoter switching from the OFF to the ON state by a general waiting-time distribution, g(t). The switching rate from the ON to the OFF state is given by β.
Thick line from inactive state D0 to active state Da represents a general kinetic scheme with g(t) as the waiting-time distribution for the promoter to switch to the ON state.
Burst parameters from the sequence-size function.
To extract burst parameters for the general scheme considered above, we first note that bursts are generated due to the interplay of two time scales: one corresponds to the production of mRNAs (when the gene is active), while the other corresponds to the waiting-time between production events (when the gene is in the inactive state). For bursty gene expression, we expect a clear separation of time-scales between the characteristic time periods for these two cases. Following previous work, it is convenient to define a sequence-size function, ϕ(τ) = 1/∫_τ^∞ f(t) dt, (13) where f(t) is the waiting-time distribution for the arrival of a single mRNA starting with the promoter in the ON state. For a fixed τ, the sequence-size function can be used to categorize time intervals larger than τ as separating bursts. Correspondingly, the term ∫_τ^∞ f(t) dt represents the fraction of all mRNA arrivals that are separated from the previous arrival by more than τ, i.e. that initiate a new burst; its inverse, ϕ(τ), thus provides the corresponding mean burst size. For bursty gene expression with a separation of time-scales, for a specific choice of τ = τx, the sequence-size function can be related to the actual mean burst size. If f(t) can be measured, then determination of τx can result in accurate estimates of the burst parameters such as mean burst size and frequency. In the following, we discuss how to determine τx for the general class of arrival processes considered in Fig 3.
The key insight is based on the observation that, due to the separation of time scales within bursts and between consecutive bursts, determination of τx can be done by using a simple two-state model as shown in Fig 2a. Even though the actual waiting time distribution between bursts (g(t)) may differ from the exponential distribution for the two-state model, the short-time behavior of the sequence-size function will be indistinguishable between the two cases (given separation of time-scales). If τx can be connected to the short-time behavior, then analytical expressions for the sequence-size function ϕ(τ) for the two-state model can be used to estimate τx and thereby the mean burst size.
For the two-state model, we find that the burst size can be determined using a specific τx, which corresponds to an inflexion point where the curvature of ϕ(τ) changes its sign. Specifically, for the two-state model, we obtain f(t) by taking the inverse Laplace transform of fL(s) given by Eq (23). In the burst limit, i.e. α/β → 0, we find that the sequence-size function, using Eq (13), is given by ϕ(τ) = (km + β)/[β + km e^{−(km + β)τ}], (14) and the value of τ at which ϕ(τ) exhibits inflexion is τx = ln(km/β)/(km + β). (15) The sequence-size function ϕ(τ) at this point (τ = τx) is given by: ϕ(τx) = (km + β)/(2β) = (1 + ⟨mb⟩)/2, with ⟨mb⟩ = km/β. (16)
Thus, the procedure for determination of the mean burst size (1 + ⟨mb⟩), given f(t), is as follows:
- Obtain the sequence-size function ϕ(τ) from f(t). For bursty synthesis, ϕ(τ) will have an inflexion point.
- The mean burst size (1 + ⟨mb⟩) is simply twice the value of the sequence-size function ϕ(τ) at the inflexion point, τx.
This approach has been validated using stochastic simulations for multiple promoter models with correspondingly complex waiting-time distributions between bursts.
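The two-step procedure can be sketched numerically for the two-state model. This is a minimal illustration, not the validation referred to above. It assumes the sequence-size function is the inverse of the survival function of f(t), and that for the two-state model fL(s) = km(s + α)/[s² + (α + β + km)s + α km], which follows from a promoter switching ON with rate α, OFF with rate β, and transcribing at rate km while ON. Function names and parameter values are illustrative.

```python
import numpy as np

def survival_two_state(tau, alpha, beta, k_m):
    """Survival function of the time between successive mRNAs for the two-state model."""
    # Decay rates are minus the roots of s^2 + (alpha + beta + k_m) s + alpha k_m.
    lam = -np.roots([1.0, alpha + beta + k_m, alpha * k_m]).real
    l1, l2 = lam.max(), lam.min()
    # Partial-fraction weights of the survival function; they sum to 1.
    w1 = k_m * (alpha - l1) / ((l2 - l1) * l1)
    w2 = k_m * (alpha - l2) / ((l1 - l2) * l2)
    return w1 * np.exp(-l1 * tau) + w2 * np.exp(-l2 * tau)

def estimate_burst_size(tau, phi):
    """Locate the first inflexion of phi (curvature turning negative) and return (tau_x, 2*phi(tau_x))."""
    curvature = np.gradient(np.gradient(phi, tau), tau)
    crossing = np.where((curvature[:-1] > 0) & (curvature[1:] <= 0))[0]
    ix = crossing[0] + 1
    return tau[ix], 2.0 * phi[ix]

alpha, beta, k_m = 0.5, 100.0, 500.0        # illustrative rates with alpha << beta
tau = np.linspace(0.0, 0.02, 4001)
phi = 1.0 / survival_two_state(tau, alpha, beta, k_m)   # sequence-size function on the grid
tau_x, burst_size = estimate_burst_size(tau, phi)
print("tau_x:", tau_x, "estimated burst size:", burst_size, "expected:", 1.0 + k_m / beta)
```

With measured waiting times, ϕ(τ) would instead be built from the empirical survival function, and some smoothing would likely be needed before taking second differences.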
Estimation of f(t) from steady-state moments.
The procedure outlined in the previous section assumes that the waiting-time distribution f(t) can be determined. However, this can be challenging experimentally; it is thus desirable to develop approaches for estimating f(t) based on measurements of steady-state distributions.
To proceed in this direction, let us first obtain a relation connecting the two waiting-time distributions f(t) (for single mRNA arrival) and g(t) (for burst arrival). In Fig 3, whenever the gene is in the active state Da, it can either produce an mRNA (with rate km) or switch back to the D0 state (with rate β). Starting from Da, the gene can therefore produce the next mRNA either directly, i.e. without switching back to D0, or after making multiple trips to D0. Denoting the number of trips made before producing the mRNA by q, we obtain that the Laplace transform of the waiting-time distribution f(t) is given by fL(s) = Σ_{q=0}^{∞} [β gL(s)/(s + km + β)]^q km/(s + km + β), (17) which leads to: fL(s) = km/[s + km + β(1 − gL(s))]. (18)
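The sum over the number of trips q is a geometric series, and its closed form can be checked numerically. The sketch below assumes the closed form fL(s) = km/[s + km + β(1 − gL(s))], obtained by summing over excursions in which the promoter leaves Da with rate β and returns after a time distributed as g(t). The function names and the choice of an exponential g(t) with rate α are illustrative.

```python
def f_L_from_g_L(s, g_L, k_m, beta):
    """Closed form of the geometric series over the number of OFF excursions q."""
    return k_m / (s + k_m + beta * (1.0 - g_L(s)))

def f_L_series(s, g_L, k_m, beta, q_max=200):
    """Truncated sum over q = 0..q_max excursions to the OFF state before an mRNA is produced."""
    total_rate = s + k_m + beta
    detour = beta * g_L(s) / total_rate      # one ON -> OFF -> ON round trip
    produce = k_m / total_rate               # mRNA produced before the next switch
    return sum(detour ** q for q in range(q_max + 1)) * produce

alpha, beta, k_m = 0.5, 0.25, 2.0            # illustrative two-state parameters
g_L = lambda s: alpha / (s + alpha)          # single exponential OFF -> ON step
s = 0.7

closed = f_L_from_g_L(s, g_L, k_m, beta)
series = f_L_series(s, g_L, k_m, beta)
two_state = k_m * (s + alpha) / (s**2 + s * (alpha + beta + k_m) + alpha * k_m)
print(closed, series, two_state)
```

For an exponential g(t) the closed form reduces to the familiar two-state expression, and at s = 0 it equals 1, as required for a normalized waiting-time distribution.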
In order to determine fL(s), we will assume a specific functional form for gL(s). We consider that gL(s) is given by the following rational function, gL(s) = (1 + a1 s + … + am s^m)/(1 + b1 s + … + bn s^n). (19) This form for the Laplace transform of the waiting-time distribution is consistent with known waiting-time distributions for phase-type processes and thus is valid quite generally.
Once we have an explicit form for fL(s), the next step is to determine the parameters, km, β, a1…am, and b1…bn. Thus, in general, we need m+n+2 measurements to estimate these parameters if we use the general form in Eq (19). The simplest case, gL(s) = 1/(1 + b1 s), implies the presence of one kinetic step from the inactive state to the active state, with rate 1/b1, and so it corresponds to the standard two-state random telegraph model. For this simple kinetic scheme, we can find the parameters, km, β, and b1, and hence fL(s) and the sequence-size function, by using three measurements associated with either mRNAs or proteins.
The form gL(s) = 1/(1 + b1 s) is exact for the two-state random telegraph model. Using the expressions obtained for the first four steady-state moments, we can derive an analytic condition that determines whether the underlying mechanism can be represented by this form (see Supplementary S2 Text). However, if the arrival process is complex and involves multiple rate-limiting steps, then this form will not be an accurate representation of the underlying kinetic process. In such cases, we need to use gL(s) of higher order. The next step in this iterative process is to take gL(s) = 1/(1 + b1 s + b2 s²). This form of gL(s) is valid if there are only two rate-limiting steps in the promoter transition from the OFF to the ON state. For kinetic schemes that involve more than two steps, it will serve as an approximate reduced representation. Interestingly, it turns out that even if this second-order form is not a correct representation of the underlying kinetic process, this reduced representation works very well as far as estimating burst size is concerned. In Fig 4, we have illustrated the effectiveness of this approach for a complex kinetic scheme for the promoter transition from the OFF to the ON state. The figure also illustrates the effectiveness of the approach outlined in the previous subsection for determining the mean burst size using the sequence-size function ϕ(τ).
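As a concrete instance of the rational form, two sequential exponential steps map onto a second-order denominator. The sketch below assumes gL(s) = 1/(1 + b1 s + b2 s²) for this case and shows how the coefficients relate to the step rates. The function names and the rates α1, α2 are illustrative.

```python
def two_step_coefficients(alpha1, alpha2):
    """Coefficients (b1, b2) of 1/(1 + b1 s + b2 s^2) for two sequential steps."""
    b1 = 1.0 / alpha1 + 1.0 / alpha2      # also the mean OFF -> ON waiting time
    b2 = 1.0 / (alpha1 * alpha2)
    return b1, b2

def g_L_rational(s, b):
    """Rational Laplace transform 1 / (1 + b[0] s + b[1] s^2 + ...)."""
    return 1.0 / (1.0 + sum(b_j * s ** (j + 1) for j, b_j in enumerate(b)))

def g_L_two_step(s, alpha1, alpha2):
    """Product of two exponential-step transforms (sequential steps add waiting times)."""
    return (alpha1 / (s + alpha1)) * (alpha2 / (s + alpha2))

alpha1, alpha2 = 1.0, 0.5                 # illustrative step rates
b = two_step_coefficients(alpha1, alpha2)
for s in (0.0, 0.3, 2.0):
    print(s, g_L_rational(s, b), g_L_two_step(s, alpha1, alpha2))
```

Setting b2 = 0 recovers the single-step form 1/(1 + b1 s), with 1/b1 as the activation rate.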
For the transcriptional scheme shown in (a), the variations of ϕ″(τ) and ϕ(τ) as a function of time τ (scaled by 10^3) are shown in (b) and (c), respectively. The three lines correspond to three different values of β: 50 (dashed line), 100 (dotted line) and 200 (dashed-dotted line), with km = 500 kept fixed. The exact burst sizes for these three cases are 11, 6 and 3.5, respectively. The estimated mean burst sizes are indicated by filled symbols and the inflexion points of the sequence-size function by empty symbols. Other parameters: α1 = 1, α2 = 0.5, α3 = 0.25, α4 = 0.75, β1 = 0.1, β2 = 0.2, β3 = 0.5.
While the reduced representation, gL(s) = 1/(1 + b1 s + b2 s²), works reasonably well for estimating burst size, it is possible to extend the process further when additional data are available. The iterative procedure we propose is as follows:
- Start with the simplest form, gL(s) = 1/(1 + b1 s), and use three moments associated with either mRNAs or proteins (or both) to find fL(s) as discussed above. This fL(s) can then be used to obtain analytic predictions for higher moments.
- If these analytic predictions are consistent with the corresponding experimental observations, then the chosen gL(s) provides a reasonable representation of the underlying kinetic scheme; otherwise, a representation using more complex kinetic schemes is required.
- To address more complex kinetic schemes, we iteratively change gL(s) from 1/(1 + b1 s) to 1/(1 + b1 s + b2 s²) … and so on, and repeat the steps outlined above to determine the underlying fL(s). However, we note that uncovering more complex kinetic schemes requires additional measurements to estimate fL(s). If moment measurements are possible at different mRNA/protein degradation rates, then these additional measurements can be used to estimate fL(s) and hence the corresponding mean transcriptional burst size.
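The first step of the procedure can be illustrated for the two-state random telegraph model. The sketch below is not the authors' Supplementary derivation: it treats each mRNA as a single arrival and uses the classical binomial-moment formula for the GI/M/∞ queue, B_r = (λ/(rμ)) ∏ f(iμ)/(1 − f(iμ)) for i = 1…r−1, which is an assumption of this example. The names `telegraph_moments` and `fit_telegraph` are hypothetical, and α here plays the role of 1/b1.

```python
def telegraph_moments(k_m, alpha, beta, mu):
    """First three binomial moments (B1, B2, B3) of the steady-state mRNA count.

    B1 = <m>, B2 = <m(m-1)>/2, B3 = <m(m-1)(m-2)>/6.
    """
    def g(s):
        # f_L(s) / (1 - f_L(s)) for mRNA arrivals from a two-state promoter
        return k_m * (s + alpha) / (s * (s + alpha + beta))

    lam = k_m * alpha / (alpha + beta)  # mean arrival rate, 1/<T>
    bm1 = lam / mu
    bm2 = lam / (2.0 * mu) * g(mu)
    bm3 = lam / (3.0 * mu) * g(mu) * g(2.0 * mu)
    return bm1, bm2, bm3


def fit_telegraph(bm1, bm2, bm3, mu):
    """Invert three measured binomial moments into (k_m, alpha, beta)."""
    lam = bm1 * mu
    g1 = 2.0 * bm2 / bm1          # g(mu)
    g2 = 3.0 * bm3 / (2.0 * bm2)  # g(2 mu)
    # The two equations for g1 and g2 are linear in c = alpha + beta.
    c = 2.0 * mu**2 * (g1 - 2.0 * g2) / (2.0 * mu * (g2 - g1) + lam)
    k_m = g1 * mu + (g1 * mu - lam) * c / mu
    alpha = lam * c / k_m
    beta = c - alpha
    return k_m, alpha, beta


# Round trip with illustrative (not experimental) parameter values.
true_params = (10.0, 0.5, 2.0)  # k_m, alpha, beta
mu_m = 1.0
moments = telegraph_moments(*true_params, mu_m)
print("recovered (k_m, alpha, beta):", fit_telegraph(*moments, mu_m))
```

With real data, the three binomial moments would be replaced by their sample estimates, and the fitted parameters would then be used to predict a fourth moment for the consistency check in the second step.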
Effect of extrinsic noise on burst estimation.
The burst estimation approach discussed in the preceding section assumes that the dominant contribution comes from intrinsic sources of fluctuations. However, extrinsic noise [1, 12], arising for example from cell-to-cell differences in the concentrations of cellular components such as RNA polymerase, can also contribute significantly to the observed variations. It is thus of interest to examine how the proposed burst parameter estimation procedure performs when sources of extrinsic noise are also present.
To explore the effects of such fluctuations, we consider the model shown in Fig 5. In this kinetic scheme, the activation of the gene from the OFF to the ON state involves two sequential steps, with rates α1 and α2. To include extrinsic fluctuations in the model, we take the transcription rate km to be a log-normally distributed random variable with mean ⟨km⟩ and standard deviation σkm. For a given value of σkm, we determine the mean burst size following the procedure outlined above, i.e. by taking gL(s) = 1/(1 + b1 s + b2 s²) and then using the simulation values for the first four steady-state moments of mRNAs to estimate the unknown parameters (b1, b2, km, β), and hence the burst size. By varying σkm we study how the estimated burst size ⟨mb⟩σ deviates from the one without extrinsic noise, ⟨mb⟩0. As can be seen in Fig 5, for smaller values of σkm the estimated burst size ⟨mb⟩σ is reasonably close to ⟨mb⟩0; however, as expected, ⟨mb⟩σ deviates monotonically from ⟨mb⟩0 for larger values of σkm.
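To make the extrinsic-noise model concrete, the sketch below draws per-cell transcription rates from a log-normal distribution with a prescribed mean and standard deviation. The conversion to the parameters of the underlying normal distribution is standard; the numerical values and the helper names `lognormal_params` and `sample_km` are illustrative assumptions, not values from the paper.

```python
import math
import random


def lognormal_params(mean, std):
    """Parameters (mu, sigma) of the underlying normal for a log-normal
    variable with the given mean and standard deviation."""
    sigma2 = math.log(1.0 + (std / mean) ** 2)
    mu = math.log(mean) - 0.5 * sigma2
    return mu, math.sqrt(sigma2)


def sample_km(mean, std, n_cells, seed=0):
    """Draw one transcription rate per cell (extrinsic variability)."""
    rng = random.Random(seed)
    mu, sigma = lognormal_params(mean, std)
    return [rng.lognormvariate(mu, sigma) for _ in range(n_cells)]


# Hypothetical values: <k_m> = 2.0 with 20% cell-to-cell variation.
rates = sample_km(mean=2.0, std=0.4, n_cells=20000)
sample_mean = sum(rates) / len(rates)
print(f"sample mean of k_m: {sample_mean:.3f}")
```

Each sampled rate would then be used as the fixed km of one simulated cell, and the steady-state moments pooled over cells would be passed to the estimation procedure.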
Signatures for non-Poisson arrivals
The analytical expressions derived for the steady-state moments for mRNAs and proteins can also be used to make inferences about the burst arrival process based on steady-state measurements. Since multiple studies assume that the burst arrival process is characterized by an exponential waiting-time distribution, it would be useful to determine if this assumption is invalid using measurements of steady-state distributions. As shown below, we can obtain conditions for the same using the results derived for higher moments.
In the following, we first focus on the case in which the mRNA burst distribution is conditional geometric and the protein burst distribution is geometric, which is consistent with multiple experimental observations. As discussed, choosing the conditional geometric distribution for mRNAs allows us to consider both single mRNA arrivals and geometric mRNA bursts in one framework. Since experiments can provide measurements of both mRNA and protein steady-state distributions, it is useful to have conditions for the arrival process that use mRNA data alone, protein data alone, or both. Corresponding to these three possibilities, we present three different conditions in the following.
Using moments of mRNA steady-state distributions.
Let us first consider the case where we have only measurements of the mRNA steady-state distribution. We note that for Poisson arrivals Kg(μm) = Kg(2μm) = 1, and using the expressions for mean and noise from Eqs (2) and (4) we get Fm = ⟨mb⟩, where Fm is the mRNA Fano factor. Further, using this in the equation for skewness, Eq (7), we derive the following condition that must be satisfied if the arrival of mRNA bursts is a Poisson process: (20) Thus a violation of the condition in Eq (20) is a signature of non-Poisson arrival processes. Since the above prescription is based on experimentally measurable quantities, namely steady-state moments of the mRNA distribution and μm, it can be used to determine if the assumption of a Poisson arrival process is invalid.
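The role of the gestation factor can be illustrated numerically. The sketch below assumes the form Kg(s) = 1 + 2[fL(s)/(1 − fL(s)) − 1/(s⟨T⟩)], where ⟨T⟩ is the mean waiting time between arrivals; this form is an assumption of the example, chosen because it gives Kg = 1 for Poisson arrivals, and Eq (5) remains the authoritative definition. The Fano factor is then obtained from the relation Fm = ⟨mb⟩(1 + Kg(μm))/2, which is equivalent to the recast form of Eq (31) quoted later in the text. All function names and parameter values are hypothetical.

```python
def gestation_factor(f_L, mean_wait, s):
    """Assumed form of K_g(s) for a renewal arrival process with
    waiting-time Laplace transform f_L and mean waiting time mean_wait."""
    f = f_L(s)
    return 1.0 + 2.0 * (f / (1.0 - f) - 1.0 / (s * mean_wait))


def fano_mrna(mean_burst, k_g):
    """mRNA Fano factor for conditional geometric bursts."""
    return 0.5 * mean_burst * (1.0 + k_g)


k_b = 2.0          # Poisson arrival rate (illustrative)
a1, a2 = 1.0, 1.0  # two sequential activation steps (illustrative)
mu_m = 1.0         # mRNA degradation rate
mean_burst = 5.0   # <m_b>


def f_poisson(s):
    return k_b / (s + k_b)


def f_two_step(s):
    return a1 * a2 / ((s + a1) * (s + a2))


kg_poisson = gestation_factor(f_poisson, 1.0 / k_b, mu_m)
kg_two_step = gestation_factor(f_two_step, 1.0 / a1 + 1.0 / a2, mu_m)

# Poisson arrivals give K_g = 1 and hence F_m = <m_b>; the two-step
# arrival process gives K_g != 1, so F_m differs from <m_b>.
print("Poisson :", kg_poisson, fano_mrna(mean_burst, kg_poisson))
print("two-step:", kg_two_step, fano_mrna(mean_burst, kg_two_step))
```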
Using moments of protein steady-state distributions.
We next consider the case where we have access to only the protein steady-state distribution. The steps followed are similar to those outlined for the mRNA case. For Poisson arrivals, Kg(μp) = 1, and using Eqs (2) and (11) we get a closed-form expression for the protein Fano factor. Substituting this in the expression for protein skewness, Eq (9), with the scaled quantities given by Eq (12), we arrive at the following condition for Poisson arrivals: (21) Again, a violation of the condition in Eq (21) is a signature of non-Poisson arrivals.
Using both mRNA and protein steady-state distributions.
Finally, if we have both mRNA and protein steady-state distribution measurements available, then the condition for Poisson arrivals can be obtained by combining measurements of second moments of the mRNA and protein distributions as follows. Using Eqs (2), (4) and (11), we get a quantity, (22) which vanishes for Poisson arrival of mRNA bursts. Thus non-zero values of this quantity indicate non-Poisson arrival of mRNA bursts. Interestingly, for this condition there is no need to assume that the mRNA burst distribution is geometric. That is, the condition holds true for arbitrary mRNA burst distributions. Also, the condition does not require measurement of third moments.
Signatures for a simple kinetic scheme.
To illustrate the prescription derived for determining non-Poisson arrival processes, we consider a specific kinetic scheme, Fig 2a. For this kinetic scheme, the mRNA arrival time distribution in the Laplace domain is given by (Eq (S3–9), Supplementary S3 Text) (23) Using this in Eq (5) we find the gestation factor, Kg, and hence the mean, Fano factor and skewness for both mRNAs and proteins. Finally, we derive exact analytic expressions for the three signatures defined in Eqs (20), (21) and (22). The expression for the mRNA-based signature of Eq (20) reads (24) where (25) and we have set μm = 1 for simplicity. As expected, we note that this signature vanishes for Poisson arrival processes, i.e., either when β is zero, or when the switching rates α and β are very large compared to the rate of transcription, km. The general expression for the protein-based signature of Eq (21) is complicated. However, to gain insight about the arrival process, we can write down a simpler expression for it in the burst limit, μm = 1 ≫ μp: (26) where (27) Again, this signature vanishes for Poisson arrival processes. Finally, we obtain an analytic expression for the combined mRNA and protein signature of Eq (22), which is given by (28) and, as expected, we note that it vanishes for Poisson arrivals and is negative for μp < μm. In Fig 6, we have plotted the three analytic expressions together with simulation results as a function of β.
Fig 6. The three signatures defined in Eqs (20), (21) and (22) are plotted for the model shown in Fig 2a as a function of the off rate β. Analytic estimates are shown by lines, whereas points correspond to the simulation results with parameters: α = 0.25, km = 2, ⟨mb⟩ = 5, kp = 0.5, μm = 1, μp = 0.01.
Signatures for non-geometric bursts
As discussed in the previous section, it is widely assumed that the mRNA burst distribution can be represented by a conditional geometric distribution (i.e. including both single mRNA arrivals and geometrically distributed burst arrivals). While this assumption is consistent with multiple experimental observations, for general kinetic schemes the possibility of non-geometric mRNA burst distributions has to be considered. Thus, it is of interest to examine if the results obtained can be used to determine if the mRNA burst distribution deviates from a conditional geometric distribution.
To address the possibility of non-geometric mRNA burst distributions, let us first consider that the random variable corresponding to the mRNA burst distribution (mb) has a conditional geometric distribution. That is, the probability that a burst produces n mRNA molecules is given by P(mb = n) = (1 − p)^(n−1) p, (29) where 0 < p ≤ 1 and n = 1, 2, 3, …, ∞. This distribution leads to (30) Using Eqs (2) and (30) in Eq (4), and denoting Fm as the Fano factor of the mRNA copy number, Eq (4) can be rewritten as Fm(μm) = ⟨mb⟩[1 + Kg(μm)]/2. (31) Similarly, using the burst size distribution from Eq (29), the skewness in Eq (7) is given by (32)
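As a numerical illustration of the conditional geometric distribution, the sketch below assumes the parameterization P(mb = n) = (1 − p)^(n−1) p for n ≥ 1, which reduces to single mRNA arrivals at p = 1. It evaluates the first two burst moments by direct summation and checks that, for Poisson burst arrivals, the standard compound-Poisson result Fm = 1 + (⟨mb²⟩ − ⟨mb⟩)/(2⟨mb⟩) gives Fm = ⟨mb⟩. Function names and the truncation `n_max` are choices of this example.

```python
def conditional_geometric_pmf(n, p):
    """P(m_b = n) for n = 1, 2, 3, ... (assumed parameterization)."""
    return (1.0 - p) ** (n - 1) * p


def burst_moments(p, n_max=2000):
    """First and second moments of the burst size by truncated summation."""
    m1 = 0.0
    m2 = 0.0
    for n in range(1, n_max + 1):
        w = conditional_geometric_pmf(n, p)
        m1 += n * w
        m2 += n * n * w
    return m1, m2


def fano_poisson_arrivals(m1, m2):
    """Steady-state Fano factor for Poisson burst arrivals with
    first-order degradation (compound-Poisson result)."""
    return 1.0 + (m2 - m1) / (2.0 * m1)


p = 0.2
m1, m2 = burst_moments(p)
print("mean burst size      :", m1)  # close to 1/p
print("second moment        :", m2)  # close to (2 - p)/p**2
print("Fano, Poisson arrival:", fano_poisson_arrivals(m1, m2))
```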
We note that Eq (32) connects experimentally measurable moments of the steady-state distribution to the parameters Kg(μm), Kg(2μm) and ⟨mb⟩. Furthermore, note that Eq (31) can be recast as Kg(μm) = (2Fm(μm)/⟨mb⟩) − 1. Now, considering a change in the degradation rate from μm to 2μm (keeping the mean burst size ⟨mb⟩ invariant), we obtain Kg(2μm) = (2Fm(2μm)/⟨mb⟩) − 1. (33) Using the above in Eq (32), we get an expression connecting experimentally measurable quantities associated with moments of the mRNA steady-state distribution. The resulting expression is: (34)
We note that the above expression has been derived by making just one assumption, namely, the mRNA burst distribution is a conditional geometric distribution. The derived expression thus indicates that a combination of experimentally measurable quantities has to deviate from 1 if the mRNA burst distribution deviates from a conditional geometric distribution. Thus the analytical results derived provide a signature for deviation from conditional geometric mRNA bursts using measurements of the first three moments of the mRNA steady-state distribution.
The main requirement for using the above relation is that measurements of the mRNA steady-state distribution can be carried out at two different mRNA degradation rates, μm and 2μm. Given that mRNA degradation rates can be tuned experimentally, a straightforward strategy to ensure that the degradation rate is tuned to twice the original value (2μm) is to compare the mean mRNA levels at μm and 2μm: since the mean is inversely proportional to the degradation rate, it should halve when the degradation rate is doubled. Given these measurements, a value of Gm ≠ 1 implies that bursts are not distributed geometrically. The strength of this result lies in the fact that it holds for general arrival processes for mRNA bursts with arbitrary waiting-time distributions.
Let us consider a specific simple model to illustrate the condition derived above. First, let the arrival process for mRNA bursts be a Poisson process. For this, the arrival time distributions of mRNA bursts in the time domain, t, and in the Laplace domain, s, are given by f(t) = kb exp(−kb t) and fL(s) = kb/(s + kb), (35) where kb is the rate of arrival of mRNA bursts. For the mRNA burst distribution, let us assume that it is given by the negative binomial distribution, i.e. (36) where 0 < p ≤ 1, r ≥ 1, and n = 0, 1, 2, 3, …, ∞. For r = 1, the above reduces to the geometric distribution and therefore we expect Gm = 1 in this limit. Using the expressions for the moments derived in the previous section, we obtain an explicit expression for Gm (Supplementary S3 Text): (37) Notice that for geometric bursts (r = 1) we get Gm = 1, as expected. However, for non-geometric bursts, deviations of Gm from 1 are observed (also see Fig S3–1 in Supplementary S3 Text). Two additional examples of microscopic models for non-geometric bursts (the two-state random telegraph model and a model with three promoter states where mRNAs are produced from two states) are discussed in the Supplementary S3 Text.
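The reduction of the negative binomial burst distribution to the geometric distribution at r = 1 can be checked directly. The sketch below assumes the common parameterization P(n) = Γ(n + r)/(Γ(r) n!) p^r (1 − p)^n for n ≥ 0, which may differ from the convention used in Eq (36); it is restricted to 0 < p < 1 to avoid taking the logarithm of zero. The function name is hypothetical.

```python
import math


def neg_binomial_pmf(n, r, p):
    """Negative binomial pmf on n = 0, 1, 2, ... (assumed parameterization).

    Computed in log space so that non-integer r >= 1 is also handled.
    Requires 0 < p < 1.
    """
    log_coeff = math.lgamma(n + r) - math.lgamma(r) - math.lgamma(n + 1)
    return math.exp(log_coeff + r * math.log(p) + n * math.log1p(-p))


p = 0.25

# r = 1: the pmf coincides with the geometric distribution p (1 - p)^n.
for n in range(5):
    print(n, neg_binomial_pmf(n, 1.0, p), p * (1.0 - p) ** n)

# r = 3: a non-geometric burst distribution with mean r (1 - p) / p.
r = 3.0
mean_r3 = sum(n * neg_binomial_pmf(n, r, p) for n in range(2000))
print("mean burst size for r = 3:", mean_r3)
```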
The preceding analysis can be extended to protein steady-state distributions to derive a similar condition for deviations from geometric burst distributions in terms of steady state moments associated with proteins (see Supplementary S4 Text).
In this paper we study stochastic gene expression models with a general renewal-type arrival process for mRNAs. By mapping such a generic model of gene expression to systems studied in queueing theory, we derive analytical expressions for the moments for mRNA and protein steady-state distributions. While the focus of this work is on using approaches drawn from queueing theory, it is noteworthy that the kinetic scheme defined in Fig 1 can also be analyzed using the general theory of branching processes with immigration. In future work, it would be of interest to explore potential connections between complementary approaches to such models based on branching processes and queueing theory.
While previous studies [37, 43] have focused on protein noise, in the present work we derive analytic expressions for higher order moments of both mRNA and protein steady-state distributions. For arbitrary kinetic schemes, the results obtained determine how the moments of steady-state distributions depend on model parameters. They elucidate how different sources (promoter-based regulation, transcriptional bursting, post-transcriptional regulation) combine to determine the overall noise and higher moments. Furthermore, the results derived show how parameters of interest (such as mean protein production rate kp) can be estimated for general models (i.e. without making any assumptions about specific features of the models).
The expressions derived for the moments can also be used to infer if the arrival process for mRNAs is non-Poisson or if the mRNA burst distribution deviates from the geometric distribution. Correspondingly, we obtain analytic conditions that provide signatures for non-Poisson arrivals of mRNA bursts and for non-geometric mRNA burst distributions. These conditions involve relations between combinations of experimentally measurable quantities and can thus be tested by using measurements of either mRNA steady-state distributions or protein steady-state distributions or both. Apart from obtaining insights into the statistics of the arrival process, we can use the results derived for steady-state moments to estimate burst parameters accurately using an iterative approach. Notably, the results and the approaches developed in this work are valid for general models of gene expression, i.e., given the general assumptions made, they do not depend on the specifics of the kinetic schemes.
It is important to note that the burst parameter estimation approaches presented in this paper rely on accurate measurements of higher order moments, such as skewness or kurtosis. This, in turn, requires relatively large sample sizes. For example, simulations of the two-state random telegraph model (see Supplementary S5 Text) indicate that for the standard error in skewness to be below 10%, the sample size should be ∼ 1000. Current experimental limitations on measurements of mRNA distributions (e.g. using RNA FISH) do not allow for such large sample sizes and thus do not lead to accurate computation of skewness or kurtosis. While accurate measurements of higher moments are not readily available in the existing data, it is hoped that our results will provide motivation for carrying out the corresponding experiments in future. The combination of these experimental results with our theoretical approaches can be used to obtain accurate representations of the arrival process and burst parameters for a wide range of cellular systems.
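The dependence of the skewness estimate on sample size can be explored with a simple resampling experiment. The sketch below is not the simulation of Supplementary S5 Text: as a stand-in for mRNA counts it draws samples from a geometric distribution, and it estimates the standard error of the sample skewness as the spread over repeated synthetic experiments. The distribution, its parameter, the number of replicates and all function names are assumptions of this example.

```python
import math
import random
import statistics


def sample_skewness(xs):
    """Moment-based sample skewness m3 / m2^(3/2)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def draw_geometric(rng, p):
    """Inverse-transform sample of a geometric variable on {1, 2, ...}."""
    u = 1.0 - rng.random()  # uniform on (0, 1]
    return 1 + int(math.log(u) / math.log(1.0 - p))


def skewness_standard_error(n_cells, p=0.2, n_rep=200, seed=1):
    """Spread of the skewness estimate over n_rep synthetic experiments,
    each measuring n_cells cells."""
    rng = random.Random(seed)
    estimates = [
        sample_skewness([draw_geometric(rng, p) for _ in range(n_cells)])
        for _ in range(n_rep)
    ]
    return statistics.stdev(estimates)


for n_cells in (100, 500, 2000):
    print(n_cells, round(skewness_standard_error(n_cells), 3))
```

The standard error shrinks as the number of cells grows, which is the practical reason why third and fourth moments demand far larger samples than the mean or the Fano factor.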
S1 Text. Derivation of steady-state moments for mRNAs and proteins.
S2 Text. Condition for the two-state random telegraph model.
S3 Text. Illustrative examples for condition identifying non-geometric bursts.
S4 Text. Condition for non-geometric bursts using protein steady-state moments.
Conceived and designed the experiments: NK AS RVK. Performed the experiments: NK. Analyzed the data: NK AS RVK. Contributed reagents/materials/analysis tools: NK RVK. Wrote the paper: NK RVK. Conceived and designed the research: NK AS RVK. Developed the analytical approaches: NK RVK. Carried out stochastic simulations: NK. Discussed the results at all stages and also their implications: NK AS RVK. Wrote the main paper: NK RVK. Wrote the Supplementary Information: NK. Read, commented on and edited the manuscript: NK AS RVK.
- 1. Elowitz MB, Levine AJ, Siggia ED, Swain PS. Stochastic gene expression in a single cell. Science. 2002;297(5584):1183–1186. pmid:12183631
- 2. Kærn M, Elston TC, Blake WJ, Collins JJ. Stochasticity in gene expression: from theories to phenotypes. Nature Reviews Genetics. 2005;6(6):451–464. pmid:15883588
- 3. Raser JM, O’Shea EK. Noise in gene expression: origins, consequences, and control. Science. 2005;309(5743):2010–2013. pmid:16179466
- 4. Sanchez A, Choubey S, Kondev J. Regulation of noise in gene expression. Annual review of biophysics. 2013;42:469–491. pmid:23527780
- 5. Eldar A, Elowitz MB. Functional roles for noise in genetic circuits. Nature. 2010;467(7312):167–173. pmid:20829787
- 6. Raj A, van Oudenaarden A. Nature, nurture, or chance: stochastic gene expression and its consequences. Cell. 2008;135(2):216–226. pmid:18957198
- 7. Larson DR. What do expression dynamics tell us about the mechanism of transcription? Current opinion in genetics & development. 2011;21(5):591–599.
- 8. Junker JP, van Oudenaarden A. Every Cell Is Special: Genome-wide Studies Add a New Dimension to Single-Cell Biology. Cell. 2014;157(1):8–11. pmid:24679522
- 9. Munsky B, Neuert G, van Oudenaarden A. Using gene expression noise to understand gene regulation. Science. 2012;336(6078):183–187. pmid:22499939
- 10. Golding I. Decision making in living cells: lessons from a simple system. Annual review of biophysics. 2011;40:63–80. pmid:21545284
- 11. Bar-Even A, Paulsson J, Maheshri N, Carmi M, O’Shea E, Pilpel Y, et al. Noise in protein expression scales with natural protein abundance. Nature genetics. 2006;38(6):636–643. pmid:16715097
- 12. Newman JR, Ghaemmaghami S, Ihmels J, Breslow DK, Noble M, DeRisi JL, et al. Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise. Nature. 2006;441(7095):840–846. pmid:16699522
- 13. Weinberger L, Voichek Y, Tirosh I, Hornung G, Amit I, Barkai N. Expression noise and acetylation profiles distinguish HDAC functions. Molecular cell. 2012;47(2):193–202. pmid:22683268
- 14. Kumar N, Platini T, Kulkarni RV. Exact distributions for stochastic gene expression models with bursting and feedback. Physical Review Letters. 2014;113(26):268105. pmid:25615392
- 15. Tsimring LS. Noise in biology. Reports on Progress in Physics. 2014;77(2):026601. pmid:24444693
- 16. Hinczewski M, Thirumalai D. Cellular Signaling Networks Function as Generalized Wiener-Kolmogorov Filters to Suppress Noise. Phys Rev X. 2014 Oct;4:041017. Available from: http://link.aps.org/doi/10.1103/PhysRevX.4.041017.
- 17. Balázsi G, van Oudenaarden A, Collins JJ. Cellular decision making and biological noise: from microbes to mammals. Cell. 2011;144(6):910–925. pmid:21414483
- 18. Suter DM, Molina N, Naef F, Schibler U. Origins and consequences of transcriptional discontinuity. Current opinion in cell biology. 2011;23(6):657–662. pmid:21963300
- 19. Coulon A, Chow CC, Singer RH, Larson DR. Eukaryotic transcriptional dynamics: from single molecules to cell populations. Nature Reviews Genetics. 2013;14(8):572–584. pmid:23835438
- 20. Golding I, Paulsson J, Zawilski SM, Cox EC. Real-Time Kinetics of Gene Activity in Individual Bacteria. Cell. 2005;123(6):1025–1036. Available from: http://www.sciencedirect.com/science/article/pii/S0092867405010378. pmid:16360033
- 21. Chubb JR, Trcek T, Shenoy SM, Singer RH. Transcriptional pulsing of a developmental gene. Current biology. 2006;16(10):1018–1025. pmid:16713960
- 22. Raj A, Peskin CS, Tranchina D, Vargas DY, Tyagi S. Stochastic mRNA synthesis in mammalian cells. PLoS biology. 2006;4(10):e309. pmid:17048983
- 23. So Lh, Ghosh A, Zong C, Sepúlveda LA, Segev R, Golding I. General properties of transcriptional time series in Escherichia coli. Nature genetics. 2011;43(6):554–560. pmid:21532574
- 24. Taniguchi Y, Choi PJ, Li GW, Chen H, Babu M, Hearn J, et al. Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science. 2010;329(5991):533–538. pmid:20671182
- 25. Zong C, So Lh, Sepúlveda LA, Skinner SO, Golding I. Lysogen stability is determined by the frequency of activity bursts from the fate-determining gene. Molecular systems biology. 2010;6(1). pmid:21119634
- 26. Sanchez A, Golding I. Genetic determinants and cellular constraints in noisy gene expression. Science. 2013;342(6163):1188–1193. pmid:24311680
- 27. Dar RD, Razooky BS, Singh A, Trimeloni TV, McCollum JM, Cox CD, et al. Transcriptional burst frequency and burst size are equally modulated across the human genome. Proceedings of the National Academy of Sciences. 2012;109(43):17454–17459.
- 28. Singh A, Razooky BS, Dar RD, Weinberger LS. Dynamics of protein noise can distinguish between alternate sources of gene-expression variability. Molecular systems biology. 2012;8(1).
- 29. Gefen O, Gabay C, Mumcuoglu M, Engel G, Balaban NQ. Single-cell protein induction dynamics reveals a period of vulnerability to antibiotics in persister bacteria. Proceedings of the National Academy of Sciences. 2008;105(16):6145–6149.
- 30. Weinberger LS, Burnett JC, Toettcher JE, Arkin AP, Schaffer DV. Stochastic gene expression in a lentiviral positive-feedback loop: HIV-1 Tat fluctuations drive phenotypic diversity. Cell. 2005;122(2):169–182. pmid:16051143
- 31. Zeng L, Skinner SO, Zong C, Sippy J, Feiss M, Golding I. Decision Making at a Subcellular Level Determines the Outcome of Bacteriophage Infection. Cell. 2010;141(4):682–691. Available from: http://www.sciencedirect.com/science/article/pii/S0092867410003521. pmid:20478257
- 32. Wernet MF, Mazzoni EO, Çelik A, Duncan DM, Duncan I, Desplan C. Stochastic spineless expression creates the retinal mosaic for colour vision. Nature. 2006;440(7081):174–180. pmid:16525464
- 33. Ochiai H, Sugawara T, Sakuma T, Yamamoto T. Stochastic promoter activation affects Nanog expression variability in mouse embryonic stem cells. Scientific reports. 2014;4.
- 34. Senecal A, Munsky B, Proux F, Ly N, Braye FE, Zimmer C, et al. Transcription factors modulate c-Fos transcriptional bursts. Cell reports. 2014;8(1):75–83. pmid:24981864
- 35. Cai L, Friedman N, Xie XS. Stochastic protein expression in individual cells at the single molecule level. Nature. 2006;440(7082):358–62. pmid:16541077
- 36. Yu J, Xiao J, Ren X, Lao K, Xie XS. Probing gene expression in live cells, one protein molecule at a time. Science. 2006;311(5767):1600–1603. pmid:16543458
- 37. Pedraza JM, Paulsson J. Effects of molecular memory and bursting on fluctuations in gene expression. Science. 2008;319(5861):339–343. pmid:18202292
- 38. Zhang J, Zhou T. Promoter-mediated Transcriptional Dynamics. Biophysical journal. 2014;106(2):479–488. pmid:24461023
- 39. Peccoud J, Ycart B. Markovian modeling of gene-product synthesis. Theoretical population biology. 1995;48(2):222–234.
- 40. Shahrezaei V, Swain PS. Analytical distributions for stochastic gene expression. Proceedings of the National Academy of Sciences. 2008;105(45):17256–17261.
- 41. Dobrzyński M, Bruggeman FJ. Elongation dynamics shape bursty transcription and translation. Proceedings of the National Academy of Sciences. 2009;106(8):2583–2588.
- 42. Skupsky R, Burnett JC, Foley JE, Schaffer DV, Arkin AP. HIV promoter integration site primarily modulates transcriptional burst size rather than frequency. PLoS computational biology. 2010;6(9):e1000952. pmid:20941390
- 43. Jia T, Kulkarni RV. Intrinsic Noise in Stochastic Models of Gene Expression with Molecular Memory and Bursting. Phys Rev Lett. 2011 Feb;106:058102. Available from: http://link.aps.org/doi/10.1103/PhysRevLett.106.058102. pmid:21405439
- 44. Xu X, Kumar N, Krishnan A, Kulkarni RV. Stochastic modeling of dwell-time distributions during transcriptional pausing and initiation. In: Decision and Control (CDC), 2013 IEEE 52nd Annual Conference on. IEEE; 2013. p. 4068–4073.
- 45. Suter DM, Molina N, Gatfield D, Schneider K, Schibler U, Naef F. Mammalian genes are transcribed with widely different bursting kinetics. Science. 2011;332(6028):472–474. pmid:21415320
- 46. Harper CV, Finkenstädt B, Woodcock DJ, Friedrichsen S, Semprini S, Ashall L, et al. Dynamic analysis of stochastic transcription cycles. PLoS biology. 2011;9(4):e1000607. pmid:21532732
- 47. Elgart V, Jia T, Kulkarni RV. Applications of Little’s Law to stochastic models of gene expression. Physical Review E. 2010;82(2):021901.
- 48. Liu L, Kashyap BRK, Templeton JGC. On the GIX/G/Infinity system. Jour Appl Prob. 1990;27(3):671–683.
- 49. Bokes P, King JR, Wood AT, Loose M. Multiscale stochastic modelling of gene expression. Journal of mathematical biology. 2012;65(3):493–520. pmid:21979825
- 50. Ingram PJ, Stumpf MPH, Stark J. Nonidentifiability of the Source of Intrinsic Noise in Gene Expression from Single-Burst Data. PLoS Comp Biol. 2008;4(10).
- 51. Cookson NA, Mather WH, Danino T, Mondragón-Palomino O, Williams RJ, Tsimring LS, et al. Queueing up for enzymatic processing: correlated signaling through coupled degradation. Molecular systems biology. 2011;7(1). pmid:22186735
- 52. Mather WH, Cookson NA, Hasty J, Tsimring LS, Williams RJ. Correlation resonance generated by coupled enzymatic processing. Biophysical journal. 2010;99(10):3172–3181. pmid:21081064
- 53. Little JD. A proof for the queuing formula: L = λ W. Operations research. 1961;9(3):383–387.
- 54. Ross SM. Introduction to Probability Models, Ninth Edition. Orlando, FL, USA: Academic Press, Inc.; 2006.
- 55. Bokes P, King JR, Wood AT, Loose M. Exact and approximate distributions of protein and mRNA levels in the low-copy regime of gene expression. Journal of mathematical biology. 2012;64(5):829–854. pmid:21656009
- 56. Gillespie DT. Exact stochastic simulation of coupled chemical reactions. The journal of physical chemistry. 1977;81(25):2340–2361.
- 57. Daigle BJ, Soltani M, Petzold LR, Singh A. Inferring Single-Cell Gene Expression Mechanisms using Stochastic Simulation. Bioinformatics. 2015;p. btv007.
- 58. Karlin S. A first course in stochastic processes. Academic press; 2014.