The Low Noise Limit in Gene Expression

Protein noise measurements are increasingly used to elucidate biophysical parameters. Unfortunately, noise analyses are often at odds with directly measured parameters. Here we show that these inconsistencies arise from two problematic analytical choices: (i) the assumption that protein translation rate is invariant for different proteins of different abundances, which has inadvertently led to (ii) the assumption that a large constitutive extrinsic noise sets the low noise limit in gene expression. While growing evidence suggests that transcriptional bursting may set the low noise limit, variability in translational bursting has been largely ignored. We show that genome-wide systematic variation in translational efficiency can, and in the case of E. coli does, control the low noise limit in gene expression. Therefore, constitutive extrinsic noise is small and only plays a role in the absence of a systematic variation in translational efficiency. These results show the existence of two distinct expression noise patterns: (1) a global noise floor uniformly imposed on all genes by expression bursting; and (2) high noise distributed to only a select group of genes.


Distinguishing between an extrinsic and burst noise floor
Eqns. 2 and 4 in the main text show that [equation missing] where $C_2$ represents a noise floor. Since [equation missing] where $f_B$ is the frequency of transcriptional bursts and $\gamma_p$ is the protein decay/dilution rate, [equation missing] If, at the larger values of $\langle P_i \rangle$, increasing protein abundance is primarily driven by increasing values of $b_i$ and $B_i$, and $\gamma_p$ is controlled by a constant cell growth rate, then the transcriptional burst frequency must remain relatively constant, and [equation missing] where $C_B$ is a constant value (the ratio of the protein decay rate to the maximum transcriptional burst frequency) that defines the burst noise floor. Eqn. S-1 describes the noise floor as the combination of burst and constitutive extrinsic noise floors. If $C_B$ is large enough, the constitutive extrinsic noise floor must be small.
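The burst noise floor described above can be illustrated numerically. The Python sketch below assumes the standard steady-state relation $\langle P \rangle = b B f_B / \gamma_p$ (an assumption made here for illustration), so that $bB/\langle P \rangle = \gamma_p / f_B$. The function names and all parameter values are hypothetical and are not taken from the data.

```python
import math

def protein_decay_rate(doubling_time_min):
    """Decay/dilution rate (1/min), assuming dilution by cell growth dominates."""
    return math.log(2) / doubling_time_min

def burst_noise_floor(gamma_p, f_b_max):
    """C_B: ratio of the protein decay rate to the maximum burst frequency."""
    return gamma_p / f_b_max

def mean_protein(b, B, f_b, gamma_p):
    """Assumed steady-state mean abundance: <P> = b * B * f_B / gamma_p."""
    return b * B * f_b / gamma_p

gamma_p = protein_decay_rate(150.0)  # hypothetical 150 min doubling time
f_b_max = 0.05                       # hypothetical maximum burst frequency (1/min)
c_b = burst_noise_floor(gamma_p, f_b_max)

# With f_B held at its maximum, increasing b or B increases <P>,
# but b*B/<P> stays equal to C_B.
for b, B in [(2.0, 1.0), (10.0, 3.0), (40.0, 8.0)]:
    p = mean_protein(b, B, f_b_max, gamma_p)
    print(f"b={b:5.1f}  B={B:4.1f}  <P>={p:9.1f}  bB/<P>={b * B / p:.4f}  C_B={c_b:.4f}")
```

Under these assumptions the burst term does not fall below $C_B$ however large $\langle P \rangle$ becomes, which is the sense in which a constant burst frequency produces a floor.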

Forcing an extrinsic noise floor
We tested various models of gene expression noise with significant levels of constitutive extrinsic noise to determine whether they could parsimoniously represent the noise data of Taniguchi et al. (Taniguchi et al., 2010) and the transcriptional bursts described by the experimentally based model of So et al. (So et al., 2011). We tested the following models: Model 1 (Equation 1 from the main text): [equation missing] Model 2 (two-state model): [equation missing] We obtained values of $b$ from Eqn. 5 in the main text to apply to each of the two models. Average values of $B$ were assumed to be related to protein expression through a power law of the form: [equation missing] where $B_{min} = 1$ for Model 1 and $B_{min} = 0$ for Model 2. Values of $q$ and $r$ were adjusted to obtain a maximum likelihood fit of each model to the noise data of Taniguchi et al. (Taniguchi et al., 2010). Log transformations of each model were used to obtain residuals that were near-normally distributed and had a magnitude independent of $\langle P \rangle$.
Both models were evaluated for values of the extrinsic noise floor, $E$, ranging from 0 to 0.1. To assess the ability of each model to describe the data of Taniguchi et al. (Taniguchi et al., 2010), we used the Akaike information criterion (AIC) (Akaike, 1974): $\mathrm{AIC} = 2k - 2\ln(\mathcal{L})$, where $k$ is the number of parameters in the model and $\mathcal{L}$ is the likelihood of the model given the observed data. The AIC characterizes the information that is lost when a model is used to represent the underlying process that generates the data. The probability that a given model $j$ has minimized the information loss compared to the model with $\mathrm{AIC}_{min}$ is given by (Akaike, 1974): $\exp\left((\mathrm{AIC}_{min} - \mathrm{AIC}_j)/2\right)$, and can be considered as a relative comparison of model quality. The likelihood was calculated as $\mathcal{L} = \prod_i \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{r_i^2}{2\sigma^2}\right)$, where $r_i$ are the residuals from the fit of the log-transformed model to the data of Taniguchi et al. (Taniguchi et al., 2010), $\sigma^2$ is the variance of the residuals, and the mean of the residuals is assumed to be zero.
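The model comparison described above can be sketched in a few lines of Python. This is a minimal illustration, assuming zero-mean Gaussian residuals with variance estimated from the residuals themselves, the standard definition AIC = 2k - 2 ln(L), and the relative likelihood exp((AIC_min - AIC_j)/2). The function names, the residual values, and the choice k = 2 are hypothetical.

```python
import math

def gaussian_log_likelihood(residuals):
    """Log-likelihood of residuals under a zero-mean Gaussian.

    The variance is estimated about an assumed mean of zero.
    """
    n = len(residuals)
    sum_sq = sum(r * r for r in residuals)
    sigma2 = sum_sq / n
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - sum_sq / (2.0 * sigma2)

def aic(log_likelihood, k):
    """Akaike information criterion for a model with k parameters."""
    return 2.0 * k - 2.0 * log_likelihood

def relative_likelihoods(aic_values):
    """exp((AIC_min - AIC_j) / 2) for every model j."""
    aic_min = min(aic_values)
    return [math.exp((aic_min - a) / 2.0) for a in aic_values]

# Hypothetical log-space residuals for two candidate models.
residuals_by_model = {
    "small extrinsic floor": [0.05, -0.10, 0.08, -0.02, 0.04, -0.06],
    "large extrinsic floor": [0.30, -0.25, 0.40, -0.35, 0.28, -0.33],
}
k = 2  # hypothetical parameter count (e.g. q and r)

aics = {name: aic(gaussian_log_likelihood(r), k) for name, r in residuals_by_model.items()}
for name, weight in zip(aics, relative_likelihoods(list(aics.values()))):
    print(f"{name}: AIC={aics[name]:.2f}, relative likelihood={weight:.3g}")
```

The model with the lowest AIC receives a relative likelihood of 1, and every other model is scored against it.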
Maximum likelihood estimates of the model parameters $q$ and $r$ for each model are summarized below. Power law parameters from Eqn. 7 in the main text and from the power law function determined by So et al. (So et al., 2011) are provided for reference. [table missing] For Model 2, even moderate levels of constitutive extrinsic noise ($E = 0.05$) result in unlikely models. For the highest level of extrinsic noise ($E = 0.1$), the optimum fit was obtained for $q = 0$ and $r = 0$, corresponding to strictly Poissonian mRNA expression across all expression levels, contrary to known transcriptional behavior (So et al., 2011). Therefore, our conclusion that bursty expression plays a major role in establishing the observed noise floor, and that the noise floor cannot be the result of extrinsic noise acting alone, does not depend upon a particular model of gene expression noise (Fig 4).

Transcriptional bursting in mammalian cells
Using high-throughput time-lapse imaging, we previously measured transcriptional burst size and frequency for over 2000 integration sites of a polyclonal population of human T-cells harboring diverse integrations of a single HIV-LTR promoter driving a de-stabilized d2GFP reporter with a 2.5 hour half-life (Dar et al., 2012). Using these data and the reported equations for transcriptional burst size and burst frequency, Bs, BF, and $k_{off}$ were calculated for the HIV LTR-d2GFP polyclonal sub-clusters, that is, groups of single cells with unique integration sites and similar mean expression levels (Dar et al., 2012). $k_{off}$ was calculated using an assumed low "on fraction" range of O < 0.2. In addition, the reported mRNA FISH measurement of 110 mRNA was assumed to be equivalent to O = 0.1 and was used as a benchmark to calculate the O and $k_{off}$ values (using BF or $k_{on}$) for each polyclonal sub-cluster by scaling by their <GFP>. Finally, a = Bs*$k_{off}$ was calculated, and a 5 sub-cluster moving average across abundance levels was applied before plotting the results (Figure B and Fig 6).
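The final smoothing step can be sketched as follows. This is a minimal illustration, assuming that sub-clusters are first sorted by mean abundance and that only full 5-point windows are averaged; how window edges were handled in the original analysis is not stated, so that choice is an assumption. The function name and all input values are hypothetical.

```python
def smoothed_a_by_abundance(mean_gfp, burst_size, k_off, window=5):
    """Compute a = Bs * k_off per sub-cluster, then smooth across abundance.

    Returns (window-averaged abundance, window-averaged a), one pair per
    full window of `window` neighbouring sub-clusters.
    """
    if not (len(mean_gfp) == len(burst_size) == len(k_off)):
        raise ValueError("inputs must have equal length")
    if window < 1 or window > len(mean_gfp):
        raise ValueError("window must be between 1 and the number of sub-clusters")

    # Sort sub-clusters by abundance and compute a = Bs * k_off for each.
    rows = sorted(zip(mean_gfp, (bs * k for bs, k in zip(burst_size, k_off))))
    gfp_sorted = [g for g, _ in rows]
    a_sorted = [a for _, a in rows]

    n_windows = len(rows) - window + 1
    gfp_avg = [sum(gfp_sorted[i:i + window]) / window for i in range(n_windows)]
    a_avg = [sum(a_sorted[i:i + window]) / window for i in range(n_windows)]
    return gfp_avg, a_avg

# Hypothetical sub-cluster values, deliberately unsorted.
gfp = [300.0, 100.0, 700.0, 500.0, 200.0, 600.0, 400.0]
bs = [3.0, 1.0, 7.0, 5.0, 2.0, 6.0, 4.0]
koff = [1.0] * 7

gfp_avg, a_avg = smoothed_a_by_abundance(gfp, bs, koff)
print(gfp_avg)
print(a_avg)
```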

Expression burst analysis
If an expression burst (combined transcriptional and translational) occurs in a relatively short time period (i.e. if we consider $k_{OFF} \gg k_{ON}$), then we can approximate this as the product of three random processes: Process A (transcriptional initiation), composed of a Poissonian pulse train of impulse functions of weight = 1 and average value $\bar{A}$; Process B (transcriptional bursting), which is uncorrelated with process A and has a mean value of $\bar{B}$ and a variance of $\sigma_B^2$; and Process b (translational bursting), which is uncorrelated with processes A and B and has a mean value of $\bar{b}$ and a variance of $\sigma_b^2$. The autocorrelation functions of these three processes are [equations missing] The autocorrelation function of the expression burst is given by the product of the autocorrelation functions of these three processes, or

$$\phi_{burst}(\tau) = \phi_A(\tau)\,\phi_B(\tau)\,\phi_b(\tau) = \bar{A}\,\sigma_B^2\,\sigma_b^2\,\delta(\tau) + \bar{A}\,\bar{B}^2\,\sigma_b^2\,\delta(\tau) + \bar{A}\,\bar{b}^2\,\sigma_B^2\,\delta(\tau) + \bar{A}\,\bar{B}^2\,\bar{b}^2\,\delta(\tau)$$

where we have neglected all the $\bar{A}^2$ terms because $\bar{A} \ll 1$. From this we get

$$\sigma^2 = \bar{A}\,\sigma_B^2\,\sigma_b^2 + \bar{A}\,\bar{B}^2\,\sigma_b^2 + \bar{A}\,\bar{b}^2\,\sigma_B^2 + \bar{A}\,\bar{B}^2\,\bar{b}^2$$

and the Fano factor (which would be the Fano factor of the protein abundance) is

$$FF = \frac{\sigma^2}{\bar{A}\,\bar{B}\,\bar{b}} = FF_B\,FF_b + \bar{B}\,FF_b + \bar{b}\,FF_B + \bar{B}\,\bar{b}$$

where $FF_b$ and $FF_B$ are the Fano factors of translational and transcriptional burst sizes, respectively.
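As a consistency check on this kind of product-of-processes argument, the Python sketch below compares two routes to the Fano factor. The first is the four-term expansion FF = FF_B·FF_b + B̄·FF_b + b̄·FF_B + B̄·b̄ that follows from dividing the variance by the mean Ā·B̄·b̄. The second uses the compound Poisson result that a Poisson train of independent impulses of random weight W has Fano factor E[W²]/E[W], applied to W = B·b with B and b independent. Function names and the numerical moments are hypothetical.

```python
import math

def fano_four_term(mean_B, var_B, mean_b, var_b):
    """FF = FF_B*FF_b + mean_B*FF_b + mean_b*FF_B + mean_B*mean_b."""
    ff_B = var_B / mean_B
    ff_b = var_b / mean_b
    return ff_B * ff_b + mean_B * ff_b + mean_b * ff_B + mean_B * mean_b

def fano_compound_poisson(mean_B, var_B, mean_b, var_b):
    """E[W^2] / E[W] for W = B*b, with B and b independent."""
    second_moment = (var_B + mean_B ** 2) * (var_b + mean_b ** 2)
    return second_moment / (mean_B * mean_b)

# Hypothetical burst-size moments.
cases = [
    (3.0, 2.0, 10.0, 20.0),
    (1.0, 0.0, 5.0, 30.0),    # deterministic single-mRNA "burst"
    (8.0, 72.0, 0.5, 0.75),
]
for mean_B, var_B, mean_b, var_b in cases:
    ff1 = fano_four_term(mean_B, var_B, mean_b, var_b)
    ff2 = fano_compound_poisson(mean_B, var_B, mean_b, var_b)
    print(f"B={mean_B}, b={mean_b}: four-term FF={ff1:.3f}, compound-Poisson FF={ff2:.3f}")
```

The two routes agree because the four terms are simply the expansion of $(\sigma_B^2 + \bar{B}^2)(\sigma_b^2 + \bar{b}^2)/(\bar{B}\bar{b})$.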
This equation points out that in the two-state model a transcriptional burst size of $\bar{B} = 1$ produces a different Fano factor (FF = 2: 1 for the value of $\bar{B}$ and an additional 1 for the Fano factor of B) than Poissonian expression of single mRNA molecules.
To overcome this apparent discrepancy in the Fano factor in the two-state model, we introduce a model in which the first mRNA synthesis event begins the burst, and the number of synthesis events that follow the initiating event ($B_E$) is a random variable. In that case, [equation missing] where the 1 term stems from the Poissonian process of initiation events and $B_E$ is the randomized process contributing to the variance in the burst size. Therefore, to recover $B$, $B_E$ must equal $B - 1$, and the variance in $B$ comes exclusively from $B_E$: $\sigma_{B_E}^2 = \sigma_B^2 =$ [expression missing]. From this it follows that [equation missing] and, at the low end of expression where [symbol missing] $\approx 1$, [equation missing] (S-2)
In contrast to the two-state model, this model provides a smooth transition from Poissonian expression of single mRNA molecules to bursts of multiple mRNA production.
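The contrast between the two burst-size conventions can be illustrated at the mRNA level. The Python sketch below uses the compound Poisson result that Poisson-initiated bursts of random size B give an mRNA Fano factor of B̄ + σ_B²/B̄, and writes the burst as B = 1 + B_E. Taking B_E to be Poisson-distributed (so that its variance equals its mean) is purely an assumption for illustration, since the distribution is not specified here. Function names are hypothetical.

```python
import math

def mrna_fano(mean_B, var_B):
    """Fano factor of mRNA made in Poisson-initiated bursts: mean_B + var_B / mean_B."""
    return mean_B + var_B / mean_B

def initiated_burst_moments(mean_BE, var_BE):
    """Moments of B = 1 + B_E: the initiating event adds 1 to the mean and
    nothing to the variance."""
    return 1.0 + mean_BE, var_BE

# Burst with an initiating event: FF -> 1 (Poissonian) as mean_B -> 1.
for mean_BE in [0.0, 0.1, 0.5, 1.0, 5.0, 20.0]:
    var_BE = mean_BE  # ASSUMPTION: Poisson-distributed B_E
    mean_B, var_B = initiated_burst_moments(mean_BE, var_BE)
    print(f"mean_B={mean_B:5.1f}  FF={mrna_fano(mean_B, var_B):.3f}")

# For comparison: a burst of mean size 1 whose own Fano factor is 1
# (var_B = 1) gives FF = 2 rather than the Poissonian value of 1.
print(f"mean_B=1 with FF_B=1: FF={mrna_fano(1.0, 1.0):.3f}")
```

Under this assumption the Fano factor rises continuously from 1 as the mean burst size grows above 1, rather than starting at 2.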
Note that the model presented here is based on a burst of protein expression where the average size of the burst is b*B and the frequency of the burst is driven by the random process A as described above. These conditions can be violated when $b \ll 1$, where, almost regardless of the value of $B$, protein expression is nearly Poissonian. In such cases, which all occur at low values of $\langle P \rangle$, $CV^2$ goes as $1/\langle P \rangle$ (the Poissonian regime in Eqn. 8 of the main text). Since noise behavior is so insensitive to transcriptional burst size in this regime, it is difficult to extract accurate values of $B$ from the protein noise for the lowest protein populations.