The authors have declared that no competing interests exist.

Conceived and designed the experiments: JS MD GC EG. Performed the experiments: JS MD. Analyzed the data: EG GC DR. Contributed reagents/materials/analysis tools: JS MD. Wrote the paper: EG GC DR.

The amount of cellular proteins is a crucial parameter that is known to vary between cells as a function of replicative passages, and can be important during physiological aging. Protein degradation is known to be performed by a series of enzymatic reactions, ranging from an initial step of protein ubiquitination to final fragmentation by the proteasome. In this paper we propose a stochastic dynamical model of nuclear protein concentration resulting from a balance between constant protein production and degradation by a cooperative enzymatic reaction. The predictions of this model are compared with experimental data obtained by fluorescence measurements of the amount of nuclear proteins in murine tail fibroblasts (MTF) undergoing cellular senescence. Our model provides a three-parameter stationary distribution that is in good agreement with the experimental data even during the transition to the senescent state, where the nuclear protein concentration changes abruptly. Estimating the three parameters (cooperativity, saturation threshold, and maximal velocity of the reaction) and following their evolution across replicative passages shows that only the maximal velocity varies significantly. Based on our modeling, we speculate that the reduced functionality of the protein degradation mechanism may result from competitive inhibition of the proteasome.

Modeling changes in cellular states, such as differentiation, is one of the primary goals of systems biology. A large amount of effort is spent modeling such processes in ways that are complete enough to be useful while being simple enough to be understood. Efforts of this kind have to balance the accuracy of the modeling of each single process with the complexity, both mathematical and algorithmic, of bringing these models together. One common practice is to employ a simple description of each process and subsequently combine them into a single macro-model.

In recent years much interest has centered on borrowing stochastic techniques from other fields and applying them in systems biology. It has become clear that the biochemical fluctuations of individual reactions in the cell are not a secondary effect, but rather a driving force that the cell has to circumvent, or sometimes exploit, to survive.

Simple stochastic models that can be easily understood and verified with

We develop here a model that describes protein degradation as an active process, in agreement with the large literature on this subject [

We generated a large volume of high-resolution fluorescence microscopy data, based on single-cell image analysis. These experiments followed the replicative senescence of murine tail fibroblasts (MTF) until complete senescence ensued. We performed fluorescent staining of the nuclear proteins of the cells, which are known to vary with cellular senescence and can be characterized experimentally with a more robust procedure. These observations have been used to validate the ability of the model to describe the protein distribution and to evaluate how this distribution changes when cells approach senescence.

In the next section we discuss the experiment performed and the model, showing the exact solution for the simplest case and defining how to obtain a numerical solution for the general case. We also show that the model is capable of reproducing the experimental data. We compare a null model, obtained from the production of a single protein, with our model, showing that the latter performs better. Using the estimated parameters we provide some insights into the cellular senescence process as a reduction of the efficiency of protein degradation, which can be interpreted in the framework of enzyme inhibition.

There is, indeed, considerable agreement on the substantial age-associated accumulation of nuclear protein in cultured cells [

We first confirmed previous reports that senescent cells contain more nuclear protein, using mouse tail fibroblasts (MTF) passaged as previously reported ([

In order to study the kinetics of nuclear protein accumulation without biases, we used single-cell quantitative microscopy at every cellular passage, corresponding to two population doublings in our passaging regime (see

Even if the protein production process is a rather complex one, recent experimental [

Here the DNA quantity is assumed constant and mRNA production is low. The reaction constants k_{1} and k_{2} represent the production rates of mRNA and protein respectively, and γ_{1} and γ_{2} their degradation rates. For each mRNA molecule, several proteins are produced, generating the so-called protein production burst. These bursts have been experimentally observed [_{1} greater than _{1}).

The above model has been solved in the continuous limit by Friedman et al [

They have shown that, under the hypothesis of independent exponential bursts of RNA, this model has a stationary distribution described by a Gamma distribution

$$p(x) = \frac{x^{a-1} e^{-x/b}}{b^{a}\,\Gamma(a)}$$

where a = k_{1}/γ_{2} represents the average number of production bursts per cycle and b = k_{2}/γ_{1} is the average number of proteins produced by each burst.

The Gamma distribution is commonly used to describe overdispersed distributions, with a Fano factor (ratio between the variance and the mean of the distribution) greater than 1. A Fano factor of 1 is characteristic of a very simple process of independent creation and destruction of a protein. For a single enzyme

We will use this as the null model to be compared with ours.
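As a rough numerical illustration of the over-dispersion of this null model, the sketch below draws samples from a Gamma distribution and estimates its Fano factor. The names `a` (shape, read here as bursts per cycle) and `b` (scale, read here as proteins per burst), as well as the numerical values, are illustrative assumptions and not quantities estimated in this work.

```python
import random
import statistics

def gamma_fano(a, b, n_samples=50_000, seed=0):
    """Estimate the mean and Fano factor of Gamma(shape=a, scale=b) by sampling."""
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale). The exact mean is a*b and the
    # exact variance is a*b**2, so the exact Fano factor is simply b.
    samples = [rng.gammavariate(a, b) for _ in range(n_samples)]
    mean = statistics.fmean(samples)
    fano = statistics.variance(samples, mean) / mean
    return mean, fano

# Hypothetical parameters: 5 bursts per cycle, 20 proteins per burst.
mean, fano = gamma_fano(a=5.0, b=20.0)
print(f"mean ~ {mean:.1f}, Fano factor ~ {fano:.1f}")
```

Under these assumptions the distribution is over-dispersed whenever the burst size `b` exceeds 1, independently of the burst frequency `a`.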

In these models the predicted protein levels include all the proteins that have been produced and not completely destroyed, including the ubiquitinated ones. This is compatible with the experimental setting used in this work, which measures the signal from all the proteins in the nucleus without distinguishing their ubiquitination state.

This crude approximation of the protein production rate can be justified by the known weak correlation between transcript levels and the corresponding protein abundance [

Our model will be described and investigated in the framework of the Chemical Master Equation [

The master equation model describes discrete valued processes, so we will refer not to the Gamma distribution, but to its discrete equivalent, the Negative Binomial distribution. This change in the model does not change the validity of the results. The relationship between the two has been addressed by Paulsson et al [

We aim to describe the amount of proteins in the cell nucleus as a coarse-grained process of generation and degradation, without differentiation between individual protein species. Considering the total production of proteins as the sum of many weakly correlated processes, the total effect can be seen as a quasi-stationary process with a mean value greater than its standard deviation, so we will approximate it as a constant production.

The degradation process, on the other hand, is driven by a much smaller number of reactions, each of which is strongly correlated with the others: the target protein is first ubiquitinated, then moved to a different location and finally degraded by the proteasome (a large degradation complex that binds to the target protein and fragments it). As a first approximation we will consider all these processes as an enzymatic process performed in a single step.

This hypothesis is based on the observation that in mammalian cells protein degradation is an active process. In this model we ignore the effect of dilution due to cellular division, because the time scale of the process is much shorter than the cell division time: several weeks can elapse between two divisions in the late stages of cellular senescence.

We can express this process with a one-dimensional master equation of the form:
_{n} of observing

The operators

The terms g_{n} and r_{n} are the generation and recombination [

The term g_{n} represents the constant production rate due to the translation of the RNA into proteins, while r_{n} represents the active degradation of the protein by means of the degradation mechanisms.

This master equation reaches a stationary distribution under the convergence condition that

For all practical purposes we can normalize all the kinetic coefficients to the value of

We can find the stationary solution by a recurrence relation: in one-dimensional systems with one-step processes the stationary solution satisfies the detailed balance condition:

The obtained equation for the occupancy probability of each state of the system is:

By expanding the product we can factor the formula in terms of exponentials and factorials of n. Here p_{0} is a normalization constant, and the formula can be recognized as a Negative Binomial distribution in which all the terms that do not depend on n are absorbed into p_{0}. The use of the Negative Binomial distribution as a basic model for cellular processes has been proposed in the past on the basis of stochastic properties of the biochemical regulatory circuits [
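For orientation, the one-step master equation and its stationary solution can be written out in the standard textbook form. The symbols below (p_n, g_n, r_n, the step operators, the ratio ρ of production rate to maximal degradation rate, and the saturation threshold K) are names assumed for this sketch, and the Michaelis-Menten form of r_n is an assumption consistent with the single-step enzymatic degradation described above, not a transcription of the original equations.

```latex
\begin{align}
\frac{dp_n}{dt} &= (\mathbb{E}^{-} - 1)\, g_n p_n + (\mathbb{E}^{+} - 1)\, r_n p_n,
  \qquad \mathbb{E}^{\pm} f(n) = f(n \pm 1), \\
g_n &= \rho, \qquad r_n = \frac{n}{K + n}
  \quad \text{(rates in units of the maximal degradation rate)}, \\
g_{n-1}\, p_{n-1} &= r_n\, p_n
  \quad \text{(detailed balance at stationarity)}, \\
p_n &= p_0 \prod_{i=1}^{n} \frac{g_{i-1}}{r_i}
     = p_0\, \rho^{n} \prod_{i=1}^{n} \frac{K + i}{i}
     = p_0\, \rho^{n}\, \frac{\Gamma(K + n + 1)}{\Gamma(K + 1)\, n!}.
\end{align}
```

In this notation the last expression has the form of a Negative Binomial distribution in n, with all n-independent factors collected in p_0.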

Summing from

For the distribution to exist the

The resulting distribution is unimodal, and its mode can be evaluated with very good accuracy from the solution of the deterministic system, where the production and degradation terms balance out.
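The relation between the mode and the deterministic balance point can be checked numerically. The sketch below assumes a Michaelis-Menten degradation rate n/(K + n) and a constant production rate `rho`, both in units of the maximal degradation rate; the names `rho` and `K` and the values used are hypothetical and chosen only for illustration.

```python
import math

def nb_stationary_pmf(n, rho, K):
    """Closed form p_n = (1-rho)^(K+1) * rho^n * Gamma(n+K+1) / (Gamma(K+1) * n!)."""
    log_p = ((K + 1) * math.log1p(-rho) + n * math.log(rho)
             + math.lgamma(n + K + 1) - math.lgamma(K + 1) - math.lgamma(n + 1))
    return math.exp(log_p)

rho, K = 0.7, 10.0  # illustrative values only, not fitted parameters
pmf = [nb_stationary_pmf(n, rho, K) for n in range(2000)]

total = sum(pmf)                                  # should be very close to 1
mode = max(range(len(pmf)), key=pmf.__getitem__)  # most probable protein number
mean = sum(n * p for n, p in enumerate(pmf))
# Deterministic balance: production rho equals degradation n / (K + n).
fixed_point = rho * K / (1.0 - rho)

print(f"sum={total:.6f} mode={mode} fixed point={fixed_point:.2f} mean={mean:.2f}")
```

Under these assumptions the mode is the integer part of the deterministic fixed point, while the mean lies above it because of the long right tail.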

The protein degradation chain is a complex mechanism composed of several steps, carried out by specific cellular machinery, that need to be performed in a specific order. Given that the proteins responsible for the degradation chain are very dilute with respect to their target, the whole proteome, it is not far-fetched to hypothesize a pseudo-stationary dynamics like the one underlying the single-protein dynamics.

We use the Hill kinetics [

This hypothesis leads to a change in the degradation term as follows:

To obtain a partial solution it is necessary to decompose the term ^{α} + ^{α}, and this is possible only for integers valued _{r}:

This allows us to obtain the following form for the stationary solution:
_{0} term has the form of a generalized HyperGeometric function:

The resulting distribution is still unimodal but depending on the value of the Hill cooperativity parameter

The constraints of

We will refer to this distribution as the generalized Negative Binomial distribution.
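For a real-valued cooperativity no closed form is needed to evaluate the distribution numerically: the detailed balance recurrence can be iterated directly and normalized at the end. The following is a minimal sketch under stated assumptions, namely a Hill degradation rate n^α / (K^α + n^α) and a constant production rate `rho`, both in units of the maximal degradation rate. The function names, the truncation `n_max`, and the parameter values are hypothetical.

```python
import math

def hill_stationary(rho, K, alpha, n_max=5000):
    """Stationary p_n for constant production and Hill-type degradation."""
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie in (0, 1) for the normalization to converge")
    log_w = [0.0]  # log of the unnormalized weights, with w_0 = 1
    for n in range(1, n_max + 1):
        # Detailed balance: w_n = w_{n-1} * rho / r_n, with r_n the Hill rate.
        log_r = alpha * math.log(n) - math.log(K ** alpha + n ** alpha)
        log_w.append(log_w[-1] + math.log(rho) - log_r)
    top = max(log_w)  # subtract the maximum to avoid overflow in exp
    weights = [math.exp(x - top) for x in log_w]
    total = sum(weights)
    return [w / total for w in weights]

def deterministic_fixed_point(rho, K, alpha):
    """Solve rho = n^alpha / (K^alpha + n^alpha) for n."""
    return K * (rho / (1.0 - rho)) ** (1.0 / alpha)

p_coop = hill_stationary(rho=0.7, K=10.0, alpha=2.0)  # illustrative values
mode = max(range(len(p_coop)), key=p_coop.__getitem__)
print(mode, deterministic_fixed_point(0.7, 10.0, 2.0))
```

With `alpha=1.0` the same function reduces to the non-cooperative case, which gives a simple consistency check against the standard Negative Binomial normalization.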

Using a bootstrap method we evaluate the goodness of fit of the two models of active degradation (as described in the

The two distributions to be tested are the Negative Binomial distribution and the Generalized Negative Binomial, which allows for cooperativity. The results are shown in

Each row represents a different experiment, which was evaluated individually. The row labeled as

STANDARD NEGATIVE BINOMIAL: | ||||||

0.34 | ||||||

0.33 | ||||||

0.64 | ||||||

0.17 | ||||||

0.17 | ||||||

GENERALIZED NEGATIVE BINOMIAL: | ||||||

0.17 | 0.07 | |||||

0.22 | 0.21 | |||||

0.85 | 0.11 | |||||

0.25 | 0.55 | 0.3 | 0.57 | 0.18 | ||

0.44 | 0.25 | 0.55 | 0.3 | 0.57 | 0.16 |

The Generalized Negative Binomial distribution is compatible with the observed data at all times for all the available data points (p-value > 0.05), while the standard Negative Binomial distribution does not satisfy these criteria (see

The upper graph is linearly scaled, the lower one is logarithmically scaled to show the distribution at high n. The black line is the best estimated distribution, while the gray area represents the uncertainty in the distribution.


To account for the different number of parameters in the two model distributions we computed the AIC (Akaike Information Criterion) and the BIC (Bayesian Information Criterion). The results in

A negative value implies a preference for the generalized negative binomial. The generalized negative binomial is preferred in all cases by both criteria, except for the BIC value of the tenth passage, where the difference is close to 0 (so the two models are equivalent). The larger datasets (passages 3 and 13) show a strong preference for the generalized negative binomial. These results are robust under correction for small sample size (which gives a correction of order 10^{−1}). As the BIC strongly penalizes the higher number of parameters of the generalized negative binomial, its differences are smaller in magnitude than those of the AIC, but follow the same general trend.

passage | ΔAIC | ΔBIC | dataset size |
---|---|---|---|
03 | -21.31 | -16.70 | 744 |
09 | -24.00 | -20.46 | 255 |
10 | -3.24 | 0.03 | 195 |
11 | -3.70 | -1.06 | 103 |
12 | -4.36 | -1.16 | 182 |
13 | -25.69 | -21.16 | 684 |
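The two columns of the table are linked by the definitions of the criteria, which offers a quick consistency check. The sketch below assumes that the generalized model has exactly one more free parameter than the standard one (three versus two) and that both criteria are evaluated on the same maximized likelihoods; the function names are hypothetical.

```python
import math

# (passage, dAIC, dBIC, dataset size), copied from the table above
rows = [
    ("03", -21.31, -16.70, 744),
    ("09", -24.00, -20.46, 255),
    ("10", -3.24, 0.03, 195),
    ("11", -3.70, -1.06, 103),
    ("12", -4.36, -1.16, 182),
    ("13", -25.69, -21.16, 684),
]

def delta_bic_from_delta_aic(d_aic, n, extra_params=1):
    """AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, hence
    dBIC = dAIC + extra_params * (ln n - 2) for the same likelihoods."""
    return d_aic + extra_params * (math.log(n) - 2.0)

def aicc_correction(n, k):
    """Small-sample (AICc) correction term 2k(k+1) / (n - k - 1)."""
    return 2.0 * k * (k + 1) / (n - k - 1)

for passage, d_aic, d_bic, n in rows:
    predicted = delta_bic_from_delta_aic(d_aic, n)
    # Assumed parameter counts: 3 for the generalized model, 2 for the standard one.
    shift = aicc_correction(n, 3) - aicc_correction(n, 2)
    print(f"{passage}: dBIC table={d_bic:+.2f} predicted={predicted:+.2f} "
          f"AICc shift={shift:+.3f}")
```

Under these assumptions the predicted ΔBIC values agree with the tabulated ones to within rounding, and the small-sample shift stays at or below order 10^{−1} even for the smallest dataset.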

From now on the analysis will refer only to the Generalized Negative Binomial distribution.

In

We can see that the parameters

The Hill threshold concentration

These observations are compatible with the hypothesis that the degradation chain is qualitatively the same during cellular senescence, and does not undergo structural changes, thus the protein accumulation in the nucleus is due to the variation of balance between protein creation and degradation.

In

All of them exhibit a transition in the observed value around the eleventh passage. The gray areas are the 50% and 95% confidence intervals for the fit with a four-parameter logistic function: maximum and minimum value, steepness, and transition point.

These results are compatible with an increase in the amount of nuclear proteins during cellular senescence. It is important to note that this increase is not gradual but steep, even though the underlying parameters vary in a smooth, almost linear manner. The cells undergo a transition from a state of high protein degradation efficiency to a lower one within a few replicative steps. The most relevant change occurs between the eleventh and twelfth passages, in a way that is compatible with biological markers for the onset of cellular senescence, like the fraction of SA-

We proposed a model describing the amount of nuclear proteins as a production/degradation process, in which the degradation is a cooperative enzymatic reaction. This process is characterized by three parameters: the proportion between the rate of production and the maximum potential degradation rate

From a kinetic point of view this can be interpreted as a competitive inhibition mechanism, in which the enzyme active site is blocked by an inhibitor, similar to the usual substrate, that prevents it from properly working. The presence of the inhibitor reduces the capability of the enzyme to convert the substrate into the final product (the

Recent biochemical studies support our result that proteasome activity in the cell might be affected upon ageing by the accumulation of inhibitors rather than by a degeneration of proteasome activity [

The results from our analysis, combined with the preexisting experimental and theoretical knowledge, suggest that the accumulation of proteins in the cell nucleus can be described to a good approximation as a reduction of proteasome activity due to the accumulation of inhibitors. These inhibitors are probably incorrectly degraded protein debris that drive a vicious cycle, preventing the degradation machinery from working properly and leading to the observed protein accumulation.

We believe that our approach is innovative for the following reasons:

We used a CME (Chemical Master Equation) to describe this process and characterized the stationary distribution as a generalized negative binomial distribution.

Variations in the parameter distributions are able to discriminate between different stages of cellular senescence.

Our modeling, which is also verified by experimental data, supports the hypothesis of molecular clogging over that of the cellular clock.

In conclusion we think that stochastic modeling of biological processes is a very informative approach, especially if compared with experimental data, because it can shed new light on the complexity of biological processes.

Primary adult mouse tail fibroblasts (MTF) were obtained from tail biopsies of 8–12 week old C57 mice as described [^{Ink4a}[

MTFs were routinely sub-cultured at 1:4 dilution upon reaching 80% confluence: under these conditions cells reached senescence at passage 12. At each passage cells were seeded onto glass cover slips and fixed as previously described [

The data used in the analysis were obtained from a software evaluation of the integral fluorescence signal and the surface extension of the cellular nucleus.

The stationary distribution of the proposed model cannot be written in a closed form for a generic, real valued

The value _{0} used for the normalization in

It is possible to normalize the value of

The parameters for each passage were evaluated independently of the others by estimating the likelihood function of the master equation given the data. The likelihood function was explored with a Markov Chain Monte Carlo (MCMC) method, and the local maximum was taken as the best value (the maximum likelihood estimate). The likelihood function was normalized as a probability density function to evaluate the uncertainties of the parameters and their correlations. This is equivalent to a Bayesian analysis of the data with a flat, improper prior on all the parameters.

The Markov Chain Monte Carlo sampling was performed with an adaptive Metropolis algorithm to account for the correlation between parameters. The simulation was run for 2 ⋅ 10^{5} steps, with a burn-in period of 10^{5} steps and a thinning factor of 10 (9 out of 10 steps are discarded to reduce sampling correlation).
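The roles of burn-in and thinning can be illustrated with a plain random-walk Metropolis sampler. This is a simplified stand-in and not the adaptive variant used in the analysis: the proposal width `step` is fixed by hand, and the function name, its arguments, and the toy target density are all hypothetical.

```python
import math
import random

def metropolis(log_post, x0, n_steps, burn, thin, step, seed=0):
    """Random-walk Metropolis that keeps every `thin`-th state after `burn` steps."""
    rng = random.Random(seed)
    x, lp = list(x0), log_post(x0)
    kept = []
    for i in range(n_steps):
        proposal = [xi + rng.gauss(0.0, step) for xi in x]
        lp_new = log_post(proposal)
        # Accept with probability min(1, exp(lp_new - lp)).
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            x, lp = proposal, lp_new
        # Discard the burn-in, then keep one state out of every `thin`.
        if i >= burn and (i - burn) % thin == 0:
            kept.append(list(x))
    return kept

# Toy target: a standard normal log-density in one dimension.
draws = metropolis(lambda x: -0.5 * x[0] ** 2, [0.0],
                   n_steps=20_000, burn=10_000, thin=10, step=1.0)
print(len(draws))
```

If the 2 ⋅ 10^{5} steps quoted above are read as the total chain length including burn-in (an assumption), the same bookkeeping leaves 10^{4} retained draws per passage.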

The resulting distribution of the individual parameters and their joint distribution for each passage in time can be seen in

The first row shows the distribution of each parameter. The second row shows the joint probability distribution of each pair of parameters. This shows that the values are correlated, but without strong non-linearity.


For each parameter a linear trend, as a function of the replicative passages, was estimated with a weighted linear regression, represented by the dashed gray line. The light gray region represents the 95% uncertainty margin for the linear trend, and gives a graphical representation of the significance of the linear trend.

To perform the Bayesian estimation the library pymc [

We tested the predictive power of our model by fitting the experimentally measured distribution of nuclear protein amount and comparing the goodness of fit of the resulting distribution with that of the Negative Binomial. For each distribution we evaluated a p-value for the null hypothesis that the distribution family is able to fit all the experimental data. The p-value was evaluated with a bootstrap method. This method fits a given distribution to the experimental data with a maximum likelihood method and computes a χ^{2} statistic to assess the goodness of fit; the fitted distribution is then sampled to generate a new set of data of the same size as the original one, on which a new fit and χ^{2} estimate are performed.

By iteratively repeating this procedure we can estimate the expected distribution of the χ^{2} statistic and compare the observed value to it, obtaining the probability of observing the given value of the test statistic. This can be used as an indicator of the overall goodness of fit of the distribution family, avoiding test distortion due to the fit procedure and data transformation [
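The bootstrap procedure described above can be sketched in a few lines. To keep the example self-contained, a Poisson family (whose maximum likelihood fit is just the sample mean) is swapped in for the Negative Binomial families actually tested; all function names and the test data are hypothetical, and the code assumes a strictly positive sample mean.

```python
import math
import random

def poisson_pmf(k, lam):
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def sample_poisson(lam, rng):
    # Knuth's multiplicative method, adequate for moderate lam.
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def chi2_statistic(data, lam):
    n, k_max = len(data), max(data)
    counts = [0] * (k_max + 1)
    for x in data:
        counts[x] += 1
    expected = [n * poisson_pmf(k, lam) for k in range(k_max + 1)]
    stat = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    # Values above k_max were never observed: they contribute (0 - e)^2 / e = e.
    return stat + max(n - sum(expected), 0.0)

def bootstrap_p_value(data, n_boot=200, seed=0):
    rng = random.Random(seed)
    lam = sum(data) / len(data)           # maximum likelihood fit on the data
    observed = chi2_statistic(data, lam)
    exceed = 0
    for _ in range(n_boot):
        fake = [sample_poisson(lam, rng) for _ in data]
        fake_lam = sum(fake) / len(fake)  # refit on each synthetic sample
        if chi2_statistic(fake, fake_lam) >= observed:
            exceed += 1
    return exceed / n_boot

# Clearly non-Poisson (bimodal) data, used only to exercise the procedure.
p_value = bootstrap_p_value([0] * 50 + [20] * 50)
print(p_value)
```

Because every synthetic sample is refitted before its statistic is computed, the reference distribution already includes the effect of the fitting step, which is why no analytical χ^{2} distribution or bin pooling is needed in this sketch.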

Here we show how the normalization constant _{0} for the standard negative binomial is obtained from

Starting from the recurrence relation we obtain that:
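A sketch of this derivation is given here under assumed notation: ρ is the ratio between the production rate and the maximal degradation rate, K is the saturation threshold, and the degradation rate has the Michaelis-Menten form n/(K + n). These symbols are chosen for illustration and may differ from those of the original equations.

```latex
\begin{align}
p_n &= p_{n-1}\, \rho\, \frac{K + n}{n}
     = p_0\, \rho^{n} \binom{n + K}{n},
  \qquad \binom{n + K}{n} = \frac{\Gamma(n + K + 1)}{\Gamma(K + 1)\, n!}, \\
1 &= \sum_{n=0}^{\infty} p_n
   = p_0 \sum_{n=0}^{\infty} \binom{n + K}{n} \rho^{n}
   = p_0\, (1 - \rho)^{-(K + 1)} \qquad (0 \le \rho < 1), \\
p_0 &= (1 - \rho)^{K + 1}.
\end{align}
```

The middle step uses the generalized binomial series, which converges only for ρ < 1; this is the same condition required for the stationary distribution to exist.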

Each row represents the data from a single cell in each one of the performed experiments. The first column contains the size of the cellular nucleus, estimated with fluorescent probing. The second column contains the raw value of the fluorescence signal returned by the microscope. The third column represents the replicative passage of the observation. The fourth column indicates which replicate of the experiment was used for the observation.

(PDF)