Nonidentifiability of the Source of Intrinsic Noise in Gene Expression from Single-Burst Data

Over the last few years, experimental data on the fluctuations in gene activity between individual cells and within the same cell over time have confirmed that gene expression is a “noisy” process. This variation is in part due to the small number of molecules taking part in some of the key reactions that are involved in gene expression. One of the consequences of this is that protein production often occurs in bursts, each due to a single promoter or transcription factor binding event. Recently, the distribution of the number of proteins produced in such bursts has been experimentally measured, offering a unique opportunity to study the relative importance of different sources of noise in gene expression. Here, we provide a derivation of the theoretical probability distribution of these bursts for a wide variety of different models of gene expression. We show that there is a good fit between our theoretical distribution and that obtained from two different published experimental datasets. We then prove that, irrespective of the details of the model, the burst size distribution is always geometric and hence determined by a single parameter. Many different combinations of the biochemical rates for the constituent reactions of both transcription and translation will therefore lead to the same experimentally observed burst size distribution. It is thus impossible to identify different sources of fluctuations purely from protein burst size data or to use such data to estimate all of the model parameters. We explore methods of inferring these values when additional types of experimental data are available.

and the probability that the next state is 2 is $\beta/(\alpha + \beta)$.

Figure S1: If the system is in state 0 at a given time, it can transit to state 1 at a rate $\alpha$ or to state 2 at a rate $\beta$. The probability that the system will transit from state 0 to state 1 in a sufficiently short time-step $h$ is approximately $\alpha h$.
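The competition between the two transitions out of state 0 can be checked numerically. The sketch below assumes exponentially distributed waiting times with rates α and β, for which the chance that the 0 → 2 transition fires first is β/(α + β). The function name and the numerical values of the rates are illustrative and do not come from the paper.

```python
import random

def next_state_fraction(alpha, beta, trials=100_000, seed=0):
    """Estimate the probability that state 0 is left via the transition to state 2."""
    rng = random.Random(seed)
    to_state_2 = 0
    for _ in range(trials):
        t1 = rng.expovariate(alpha)  # waiting time for the 0 -> 1 transition
        t2 = rng.expovariate(beta)   # waiting time for the 0 -> 2 transition
        if t2 < t1:                  # whichever transition fires first wins
            to_state_2 += 1
    return to_state_2 / trials

alpha, beta = 2.0, 3.0               # illustrative rates only
estimate = next_state_fraction(alpha, beta)
exact = beta / (alpha + beta)
print(estimate, exact)
```

The Monte Carlo estimate should agree with the exact ratio to within sampling error.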
The Joint Distribution

In the main paper we have given the overall protein burst size distribution $P(n)$. It is also possible to derive the more detailed joint distribution $P(m, n)$ that exactly $m$ mRNA and $n$ protein molecules are produced. We may think of this as

$$P(m, n) = P(n \mid M = m)\, P(M = m),$$

where $P(n \mid M = m)$ is the conditional probability that $n$ proteins are produced if there are $m$ mRNA molecules. If we assume that each transcript produces copies of the protein independently, then the generating function $P^*(z \mid m)$ is just the product of the $m$ generating functions for the protein produced by one mRNA molecule,

$$P^*(z \mid m) = \left[ P^*(z \mid 1) \right]^m.$$

Hence to compute the probabilities $P(n \mid M = m)$, we calculate

For the case $n = 1$, we may easily compute

We now prove the more general result using the case $n = 1$ as a basis for induction. Assuming that for the case $n = i$:

then for $n = i + 1$:

which completes the inductive step. Therefore

Figure S2: Distribution of the number of proteins which will be produced during a gene expression burst with one mRNA molecule and with twenty mRNA molecules.
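The product of generating functions corresponds to a convolution of the single-transcript distributions, which can be checked directly. The sketch below assumes that one mRNA molecule yields a geometrically distributed number of proteins, $(1 - q) q^n$ for $n = 0, 1, \dots$; the parameter `q` and all function names are hypothetical and are not the paper's notation. Under that assumption, the sum over $m$ independent transcripts follows a negative binomial distribution.

```python
from math import comb

def geometric_pmf(q, n_max):
    """Assumed single-transcript distribution: (1 - q) * q**n for n = 0..n_max."""
    return [(1 - q) * q**n for n in range(n_max + 1)]

def convolve(a, b, n_max):
    """Distribution of the sum of two independent counts, truncated at n_max."""
    out = [0.0] * (n_max + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= n_max:
                out[i + j] += ai * bj
    return out

def conditional_pmf(q, m, n_max):
    """P(n | M = m) obtained by convolving m single-transcript distributions."""
    pmf = [1.0] + [0.0] * n_max      # m = 0: no proteins with certainty
    single = geometric_pmf(q, n_max)
    for _ in range(m):
        pmf = convolve(pmf, single, n_max)
    return pmf

def negative_binomial_pmf(q, m, n):
    """Closed form for the sum of m independent geometric counts (m >= 1)."""
    return comb(n + m - 1, n) * (1 - q)**m * q**n

q, n_max = 0.4, 30                   # illustrative values only
for m in (1, 20):                    # the two transcript numbers shown in Figure S2
    numeric = conditional_pmf(q, m, n_max)
    closed = [negative_binomial_pmf(q, m, n) for n in range(n_max + 1)]
    print(m, max(abs(x - y) for x, y in zip(numeric, closed)))
```

The printed discrepancies should be at the level of floating-point rounding.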
Thus the joint probability may now be calculated as

This is illustrated for two different numbers of mRNA molecules in Figure S2. Finally, by summing over $m$ we can recover the overall burst size distribution $P(n)$, which was derived using generating functions (but only the conditional distribution for $n > 0$ was explicitly stated). Special consideration is needed for the case $n = 0$, as the probability that no transcripts are produced must be added to the probability that transcripts are produced but no proteins are produced. Thus

and for $n > 0$

Conditioning on $n > 0$ and defining $\hat{A}_2 = A_2 (1 + A_1)$ recovers $\hat{P}(n)$ as in the main article. Similar calculations can be carried out for the various extensions to the standard model considered above, though the details become quite lengthy for the more complex cases.
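The summation over $m$ can be illustrated numerically. The sketch below rests on assumptions that are not taken from the paper: the number of transcripts is geometric with hypothetical parameter `p`, each transcript yields a geometric number of proteins with hypothetical parameter `q`, and $A_1$ and $A_2$ are read as the odds $p/(1-p)$ and $q/(1-q)$. Under that reading, the distribution conditioned on $n > 0$ has a constant ratio between successive probabilities, consistent with a geometric form governed by the single combined parameter $\hat{A}_2 = A_2(1 + A_1)$.

```python
from math import comb

def joint_pmf(p, q, m, n):
    """P(m, n) = P(n | M = m) * P(M = m) under the geometric assumptions above."""
    p_m = (1 - p) * p**m
    if m == 0:
        return p_m if n == 0 else 0.0    # no transcripts means no proteins
    return comb(n + m - 1, n) * (1 - q)**m * q**n * p_m

def burst_pmf(p, q, n, m_max=200):
    """Marginal P(n), summing the joint distribution over m (truncated at m_max)."""
    return sum(joint_pmf(p, q, m, n) for m in range(m_max + 1))

p, q = 0.6, 0.7                          # illustrative values only
A1, A2 = p / (1 - p), q / (1 - q)        # assumed reading of A_1 and A_2 as odds
A2_hat = A2 * (1 + A1)
ratio_expected = A2_hat / (1 + A2_hat)

probs = [burst_pmf(p, q, n) for n in range(1, 8)]
ratios = [probs[i + 1] / probs[i] for i in range(len(probs) - 1)]
print(ratios, ratio_expected)
```

Because every successive ratio equals the same value, different pairs `(p, q)` that share the same $\hat{A}_2$ produce the same conditional burst size distribution, which is the nonidentifiability discussed in the abstract.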

Alternative generalisation
A different generalisation is to add additional loops with the same structure as the current transcription and translation loops (Figure S3). We prove below that if we have $k - 1$ such loops, the final conditional protein burst size distribution $\hat{P}_k(n)$ will still be geometric

with the parameter $\hat{A}_k$ given by

Figure S3: Diagram of the generalised situation with $k - 1$ serially coupled loops of the type considered. If $k = 3$ then we have a system with two loops which we have used to model transcription and translation in gene expression.
Iterating this with initial condition $\hat{A}_1 = A_1$ gives the expression in Equation S2.
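The iteration can be sketched as follows. The recursion used here, $\hat{A}_j = A_j (1 + \hat{A}_{j-1})$, is an assumption: it extends the two-stage relation $\hat{A}_2 = A_2 (1 + A_1)$ from the joint-distribution section to further stages. The closed form it is compared against follows from unrolling that assumed recursion and is not claimed to reproduce Equation S2. All function names are hypothetical.

```python
from math import prod

def iterate_hat_A(A):
    """Given A = [A_1, ..., A_k], return [hat_A_1, ..., hat_A_k].

    Assumed recursion: hat_A_1 = A_1 and hat_A_j = A_j * (1 + hat_A_{j-1}).
    """
    hats = [A[0]]
    for a in A[1:]:
        hats.append(a * (1 + hats[-1]))
    return hats

def closed_form(A):
    """Unrolled recursion: sum over j of the product A_j * A_{j+1} * ... * A_k."""
    return sum(prod(A[j:]) for j in range(len(A)))

A = [1.5, 2.0, 0.5]                      # illustrative values only
hats = iterate_hat_A(A)
print(hats[-1], closed_form(A))
```

Under this assumed recursion, any set of stage parameters giving the same final $\hat{A}_k$ yields the same conditional distribution.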