Protein Dynamics in Individual Human Cells: Experiment and Theory

A current challenge in biology is to understand the dynamics of protein circuits in living human cells. Can one define and test equations for the dynamics and variability of a protein over time? Here, we address this experimentally and theoretically, by means of accurate time-resolved measurements of endogenously tagged proteins in individual human cells. As a model system, we choose three stable proteins displaying cell-cycle–dependant dynamics. We find that protein accumulation with time per cell is quadratic for proteins with long mRNA life times and approximately linear for a protein with short mRNA lifetime. Both behaviors correspond to a classical model of transcription and translation. A stochastic model, in which genes slowly switch between ON and OFF states, captures measured cell–cell variability. The data suggests, in accordance with the model, that switching to the gene ON state is exponentially distributed and that the cell–cell distribution of protein levels can be approximated by a Gamma distribution throughout the cell cycle. These results suggest that relatively simple models may describe protein dynamics in individual human cells.


General properties of cell cycle behavior of PRC1 and YB1
The cell-cycle time is 21.6 ± 2 hrs for PRC1 and 17.2 ± 2 hrs for YB1 clones. We measured the correlation between fluorescence levels of the tagged proteins PRC1 and YB1 at the end of the cell cycle and several cellular factors such as cell cycle length, initial protein levels and cell size. For PRC1 we found low correlation with cell size (R = 0.21 p<0.05) and no correlation with cell cycle length (R = 0.02 p = 0.8) or with initial fluorescence levels (R = 0.09 p = 0.2). However, YB1 showed opposite tendencies: no correlation with cell size (R = 0.08 p = 0.3) and moderate correlation with cell cycle length (R = 0.44 p<0.05) and initial fluorescence levels (R = 0.5 p<0.05) ( Figure S8A-F). When no correlation is found it suggests that different aspects of the biological phenomena are independent of each other. On the contrary, when correlation is found, it may suggest causative connections between processes at different stages of the cell cycle and the protein level at the end of the cell cycle.
We found strong correlation (R=0.99 for PRC1 and R=0.9 for YB1, p<0.05) between initial protein levels inherited by each daughter cell and the absolute amount of protein degraded at the beginning of the cell cycle ( Figure S9A,B). In the case of PRC1 this correlation is due to the fact that almost all initial protein is degraded. respectively. For PRC1 no significant correlation was found with any of the above four parameters. YB!, showed significant correlation between protein levels at end of cell cycle and cell cycle age (D) and fluorescence at beginning of cell cycle (F).

Fitting parameters to the transcription inhibition experiment:
A transcription inhibitor is added to the cell clones expressing YB1 at 0 hr t = . If the inhibitor starts taking affect at time 0 t then the differential equations for mRNA levels ( ) r t and protein levels ( ) p t are: The solution to these equations is detailed in Table S1: Table S1: Analytical solution for the dynamics of the mean mRNA and protein levels for the inhibition experiment done on YB1 reporter clones.
A good fit is achieved for 0 2.33hr t = , -1 0.77 hr R α = as shown in Figure S8. This conforms to previous observations which indicate that the inhibitor starts taking affect approximately 2 hours after introduction. Figure S8: Fitting the mean protein levels for the transcription inhibition experiment. Shown are the YB1 protein levels in individual cells (a), their mean (orange circles) (b). Fitting to the analytical solution (blue line) indicates that the inhibitor starts taking affect roughly 0 2.33 t = hours after introduction into the medium in line with Nyugen et al. (Nguyen, Giannoni et al. 1996). Both PRC1 and YB1 can be described by a simple stochastic model for gene expression. The model assumes that mRNA is produced at rate R k , and degraded with degradation rate R α (for PRC1 we assume that the mRNA is stable and take Protein is produced at rate P k from each molecule of mRNA, and is taken to be stable throughout the cell cycle. The cell cycle is approximately 20 hours. : State diagram of the stochastic model for transcription and translation. 6 arrows correspond to 6 terms: 3 incoming (+) and 3 outgoing (-). Note that r cannot be negative due to the fact that the degradation from state 0 r = is proportional to 0 R r α = .
The master equation can be solved analytically for the mean and variance (and all higher moments) of the protein and mRNA levels. Using the generating function ( ) , F x y and its derivatives with respect to x and y (as detailed in Table S2) the master equation can be written as: p . This is actually done by differentiating the master equation with increasing orders of x and y , and substituting 1 x y = = , see Table S3.

Analytical solution for time-dependent moments:
Solving the differential equations in Table S3 for YB1 ( 0  Table S4. It can be seen that asymptotically (i.e., for 1/ R t α ), the mean protein level of YB1 is increasing linearly with time (p t ), whereas for PRC1 it grows quadratically ( 2 p t ).
For Poissonian processes, the variance is equal to the mean, and Noise strength is equal to 1. In our case we define a parameter b to quantify the deviation from Poisson such that 2 / 1 The CV however, is asymptotically decreasing with time for both cases: Using the expressions for the mean protein levels p , the Noise strength, and the CV, it is possible in principle to estimate the biochemical parameters R k , R α , and P k from the experimental data of protein levels in single cells. It is important to note that when measuring the protein levels using fluorescent reporters, the mean number of proteins and the noise strength are multiplied by a proportionality factor which represents the number of fluorescent units per protein molecule. This factor has to be measured independently. However, the CV is invariant to measurement units.
For long times: Noise strength (asymptotically):

Simulation:
In order to simulate the stochastic process we use the Gillespie algorithm: 1. Define the species: r , p . b. Calculate the time T for next reaction, and the next reaction m . The basic assumption is that the probability that reaction m will occur after time interval (i) Choose two random numbers:

General description:
A more complete description of gene expression in eukaryotic cells should take into account the stochastic process of gene activation and deactivation. These fluctuations occur due to random events of binding and unbinding of the DNA to the chromatin ("chromatin remodeling"). The gene is active and mRNA can be produced only when the DNA is "open", i.e. not bound tightly to the chromatin -see figure S11. We assume that the transitions between active and inactive states occur at rates on k and off k , and are rather slow (a few events per hour). Mathematically this process is known as a "Telegraph process". is the probability to be in state ( ) The transitions between the different states are shown in Figure S12.
These coupled equations can be solved analytically for the mean and variance (and all higher moments) of the protein levels p and mRNA levels r , as well as for the gene activity d .
For this model we use two generating functions (Peccoud and Ycart 1995) that correspond to the two states , ,1 r p f and , ,0 r p f (see Table S6): Using the above formalism, the two master equations can be written as: and comparing coefficients of r p x y from both sides of the equation, it is possible to obtain differential equations for the time-dependent moments d , 2 d , r , p , rp , 2 r , and 2 p -see Table S7.  ] is possible only when the gene is in active state. Generating function Coefficient of r p x y  (1) 1 Differentiate the master equation with respect to x , and substitute 1 x y = = . (5)

Analytical solution for time-dependent moments:
Solving the differential equations in Table S8 Table   S8. Again, it can be seen that asymptotically (i.e., for 1/ R t α ), the mean protein level of YB1 is increasing linearly with time (p t ), whereas for PRC1 it grows quadratically ( 2 p t ).
As in the previously described model, the non-Poissonian parameter 2 / 1 b p p ≡ Δ − is asymptotically constant for YB1, whereas for PRC1 it grows linearly with time.
The CV is asymptotically decreasing with time for both cases: /~1/ p p t Δ . also non-Poissonian (i.e., Noise strength larger than 1) in this case due to the telegraph process of gene activation and inactivation. Note also that the Noise strength of the telegraph process is smaller than 1. The results of the simple model previously summarized in Table S4 correspond to the case of 1 1 P ≈ , which occurs when the gene inactivation rate is very small compared to the activation rate: on off k k . The meanings of symbols 1 P , R b , and P b are given in Table S9.

YB1
The asymptotic probability to be active/inactive state.
The mRNA burst size: If on off k k this is the average number of mRNA's produced per single gene activation event.
The protein burst size: The average number of proteins translated from a single mRNA molecule during its lifetime.

Simulations:
Again we simulate the stochastic model using the Gillespie algorithm with the species r , p , d . The reactions and their respective probabilities are detailed in Table S10. the probability is a product of the velocity of the reaction and the numbers of ways the molecules of the reactants participating in this reaction can interact. For example, for the reaction "DNA produces one mRNA molecule" ( 1 m = ) the probability is the product of the velocity of the reaction ( R k ) and the active DNA concentration ( d ).

Protein-level distribution:
In order to find the time-dependent distribution of the protein levels ( ) p t we solved analytically for higher moments (up to moment 6 n = ). We find asymptotically (i.e. for times longer than 1/ R α , 1/ on k , and 1/ off k ) that the behavior of moments corresponds to that of the negative binomial and the Gamma distributions, as detailed in Tables S11 and S12. Note that it does not matter if we take into account the activation-inactivation process or not -the Gamma distribution fits the stochastic model for both cases, with the difference being captured by the non-Poissonian parameter b .  the Negative binomial and the Gamma distributions are alike.
x e x σ + = ⋅ In bacteria the distribution was found to be Negative binomial when the lifetimes of the mRNA were taken to be much shorter than those of the protein / 1 P R η α α = (This is usually the case, as bacterial protein are considered stable and their lifetimes are assumed to be roughly 1 cell cycle). The parameters of the distribution have a simple physical meaning: . This result is demonstrated in Figure S13.
If the telegraph process is initially in equilibrium, than a fraction state. Thus the histogram of onset-times for such a case is composed of a peak at 0 0 t = (whose underlying surface is 1 P ) followed by an exponentially decaying tail (whose underlying surface is 0 P ).
-24 - We can thus use the histogram of onset times to extract the gene activation rate on k from experimental data. This method is relatively robust to changes in the threshold used to calculate the onset time 0 t , as demontrated in Figure S14. Furthermore, using the cumulative data function (CDF) gives an estimation that is more robust when the number of cells is small (the PDF is sensitive to empty bins that arise when only 150-200 cells are taken). The number of proteins produced per mRNA molecule per second is: 2 This rate, also refered to as the "elongation rate", was measured for bacteria and is assumed to be on the same order of magnitude for yeast and mammalian cells Young, R. and H. Bremer (1976). "Polypeptide-chain-elongation rate in Escherichia coli B/r as a function of growth rate." Biochem J 160(2): 185-94.. 3 Usually ribosome density is taken to be 0.03 to 3 ribosomes per 100 base-pairs. The maximal density corresponds to roughly 30 base-pairs per ribosome. Ribosome density is a roughly decreasing function of the length of the gene Arava, Y., Y. Wang, et al. (2003). "Genome-wide analysis of mRNA translation profiles in Saccharomyces cerevisiae." Proc Natl Acad Sci U S A 100 (7)    give approximately the same low fit-error to the experimental data. The "potential well" is quite shallow, as can be seen by the bootstrap (i.e. any minima seen inside the well is not robust to addition/deletion of single cells).
However, all values of R k , P k inside the "potential well" are biochemically feasible. Cell-to-cell variability can stem from either extrinsic or intrinsic sources In order to model the cell-to-cell variability two different models may be suggested. The first model assumes that the observed variability has 'extrinsic' origins, meaning that each cell has different rate constants which are conserved for a relatively long time period. Such differences were suggested to rise for example from amounts of transcriptional and translational machinery that differ between cells (Elowitz, Levine et al. 2002;Pedraza and van Oudenaarden 2005). The second model assumes that the observed variability arises from stochastic processes due to low copy molecular interactions and has been termed 'intrinsic'. In order to try and characterize the sources of variability in protein accumulation of PRC1 and YB1, we ran intrinsic and extrinsic simulations of the process. Intrinsic simulations were run using the Gillespie algorithm (Gillespie 1977) taking into account DNA opening rate -computed from onset times, and protein and mRNA half lives -computed from the transcription inhibition experiment (as previously described in the supplementary text). Extrinsic simulations were run by fitting linear and quadratic curves to each single cell profile of YB1 and PRC1 respectively, and adding the fit-to-experiment difference as experimental noise. We find that all measured parameters of the experiment on PRC1 and YB1, including, mean, noise strength, auto correlation and the ergodicity metric all can be reproduced (within the experimental error limit) using feasible biochemical parameters in both extrinsic and intrinsic simulations (see Figures S21-S23). Mixing occurs when a cell lineage, given enough time, reaches the different states found in a snapshot of a cell population. Mixing was measured using two approaches (detailed by Sigal et al. (Sigal, Milo et al. 2006)). The first is the auto-correlation function A(τ) of the protein levels (PRC1 at plot B and YB1 at plot D). The second, an 'ergodicity' metric, ranked the cells according to tagged protein fluorescence at the beginning of the movie, and followed the fraction of the total ranks that each cell traversed as a function of time (PRC1 at plot A and YB1 at plot C). All graphs (A-D) denote measurements of mixing between 3-13 hours following beginning of protein accumulation for PRC1 (A,B) and YB1 (C,D). In each graph, red line denotes average mixing measured in the experiment, brown line denotes mixing measured in simulation with extrinsic origins of noise and blue line denotes mixing measured in intrinsic simulation of a 3 stage molecular process as discussed in the text. - Mixing is measured as detailed in figure S20. All graphs (A-D) denote distributions of mixing after 13 hours of protein accumulation. Mixing measured from extrinsic simulations is brown and from intrinsic simulations is blue. Red circle denotes experimental data A,C) 'Ergodicity' metric measured for PRC1 and YB1 respectively. B,D) Auto correlation measured for PRC1 and YB1 respectively. Figure S23: Gamma distribution consists of a scale parameter a, and shape parameter b, which are both related to the mean, μ, and standard deviation, σ, by α=μ 2 /σ 2 and b=σ 2 /μ. Bold blue and red circles denote a,b measurements for YB1 and PRC1 respectively for the first 12 hours of protein accumulation. Cyan and orange diamonds denote a,b measurements from intrinsic simulations that best fit the data. Gray and pink squares denote a,b measurements from extrinsic simulations based on parameters fitted from the experimental data. Error bars denote standard errors.

Supporting movies
Supporting Movie 1: Time-lapse movie of transmitted light images of the clone with YFP CD-tagged PRC1. Movie duration is 92 hours, each frame is played for 0.1 second (time-lapse: 1 frame per 20 minutes).

Supporting Movie 2:
Time-lapse movie of yellow fluorescence images of the clone with YFP CD-tagged PRC1. Movie duration is 92 hours, each frame is played for 0.1 second (time-lapse: 1 frame per 20 minutes).

Supporting Movie 3:
Time-lapse movie of yellow fluorescence images overlaid on transmitted light images of the clone with YFP CD-tagged PRC1. Movie duration is 92 hours, each frame is played for 0.1 second (time-lapse: 1 frame per 20 minutes).

Supporting Movie 4:
Time-lapse movie of transmitted light images of the clone with YFP CD-tagged YB1. Movie duration is 92 hours, each frame is played for 0.1 second (time-lapse: 1 frame per 20 minutes).

Supporting Movie 5:
Time-lapse movie of yellow fluorescence images of the clone with YFP CD-tagged YB1. Movie duration is 92 hours, each frame is played for 0.1 second (time-lapse: 1 frame per 20 minutes).

Supporting Movie 6:
Time-lapse movie of yellow fluorescence images overlaid on transmitted light images of the clone with YFP CD-tagged YB1. Movie duration is 92 hours, each frame is played for 0.1 second (time-lapse: 1 frame per 20 minutes).

Supporting Movie 7:
Time-lapse movie of transmitted light images of the clone with YFP CD-tagged ANLN. Movie duration is 92 hours, each frame is played for 0.1 second (time-lapse: 1 frame per 20 minutes).

Supporting Movie 8:
Time-lapse movie of yellow fluorescence images of the clone with YFP CD-tagged ANLN. Movie duration is 92 hours, each frame is played for 0.1 second (time-lapse: 1 frame per 20 minutes).

Supporting Movie 9:
Time-lapse movie of yellow fluorescence images overlaid on transmitted light images of the clone with YFP CD-tagged ANLN. Movie duration is 92 hours, each frame is played for 0.1 second (time-lapse: 1 frame per 20 minutes).