
OL, JMN, ZW, and SCP conceived and designed the experiments. OL, JMN, and ZW performed the experiments. OL developed the theoretical model and analyzed the data. ZW and LH contributed reagents/materials/analysis tools. OL, JMN, and SCP wrote the paper.

The authors have declared that no competing interests exist.

In many biological systems, the interactions that describe the coupling between different units in a genetic network are nonlinear and stochastic. We study the interplay between stochasticity and nonlinearity using the responses of Chinese hamster ovary (CHO) mammalian cells to different temperature shocks. The experimental data show that the mean value response of a cell population can be described by a mathematical expression (empirical law) which is valid for a large range of heat shock conditions. A nonlinear stochastic theoretical model was developed that explains the empirical law for the mean response. Moreover, the theoretical model predicts a specific biological probability distribution of responses for a cell population. The prediction was experimentally confirmed by measurements at the single-cell level. The computational approach can be used to study other nonlinear stochastic biological phenomena.

Complex biological systems are built out of a huge number of components. These components are diverse: DNA sequence elements, mRNA, transcription factors, etc. The concentration of each component changes over time. One way to understand the functions of a complex biological system is to construct a quantitative model of the interactions present in the system. These interactions are usually nonlinear in terms of the concentrations of the components that participate in the interaction process. For example, the concentration of a dimer is proportional to the product of the concentrations of the molecules that dimerise. Besides being nonlinear, the interactions are also stochastic. The production process of a molecule is not deterministic, and it is governed by a probability rate of production. In what follows, a nonlinear stochastic model for the response to heat shocks in CHO mammalian cells will be developed. Heat stress is just one example of the many ways a molecular system can be perturbed. From a general perspective, the structure of a molecular system is uncovered by imposing different perturbations (input signals) on the system under study, and then the responses of the system (output signals) are measured. From the experimental collection of pairs of input–output signals, laws that describe the system can be uncovered. This is the fundamental idea in Systems and Synthetic Biology [

To acquire the experimental data, we elected to use a system using a reporter gene where the expression of the green fluorescent protein (GFP) is under the control of the promoter region of the mouse ^{4} cells were analyzed per sample. Detailed protocols and experimental conditions are available in the

First, we describe the time course of the mean response to a heat shock. At elevated temperatures (39 °C to 47 °C), the heat shock promoter HSP70 is active and GFP starts to be synthesized. The input signals were chosen in the form of a pulse at a temperature (

(A) The accumulation of GFP is monitored for 18 h after heat shock. The fold induction is defined as the ratio of the mean value of GFP at different times (mean GFP) over the mean value at 30 min after the shock (mean GFP_{0}).

(B) The logarithm of the fold induction saturates exponentially in time. The last 15 samples were predicted by the fit on the first 25 points.

(C) The formula

The fold induction of GFP with respect to a reference (GFP_{0}) was then determined: fold induction = mean GFP / mean GFP_{0}.

The reference is the first measured sample away from the end of the heat shock (30 min after the shock in

The time _{0}.The initial fold induction at _{0} after the end of heat shock) is 1. This value of 1 for the initial fold induction is consistent with the entire time evolution if a fit with the expression

The empirical law for the response of the cells to the heat pulse can be thus cast into the form:

The same law appeared in repeated measurements of pulses at 42 °C for 30 min duration (unpublished data). Parameter ^{a}
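To illustrate how a law of this kind can be fitted, the following minimal sketch assumes that the logarithm of the fold induction saturates exponentially, ln(GFP/GFP_{0}) = a(1 − e^{−bt}), with a the saturation level and b the decay rate. The symbol names, the parameter values, the sampling grid, and the function names are illustrative assumptions, not measured values or the tools used for the reported fits. In the spirit of the prediction test described for the time course, the fit uses the first 25 of 40 synthetic samples and is then compared with the remaining 15.

```python
import math

def fit_double_exponential(times, log_fold, b_grid):
    """Least-squares fit of log_fold ~ a * (1 - exp(-b * t)).

    For each trial b, the best a follows from linear least squares through
    the origin; the (a, b) pair with the smallest squared error is kept.
    """
    best = None
    for b in b_grid:
        g = [1.0 - math.exp(-b * t) for t in times]
        a = sum(y * gi for y, gi in zip(log_fold, g)) / sum(gi * gi for gi in g)
        sse = sum((y - a * gi) ** 2 for y, gi in zip(log_fold, g))
        if best is None or sse < best[2]:
            best = (a, b, sse)
    return best

def predict(a, b, times):
    """Logarithm of the fold induction under the assumed law."""
    return [a * (1.0 - math.exp(-b * t)) for t in times]

# Synthetic, noise-free data from hypothetical parameters (not measured values).
A_TRUE, B_TRUE = 3.0, 0.25
TIMES = [0.5 * k for k in range(1, 41)]       # 40 samples, in hours
LOG_FOLD = predict(A_TRUE, B_TRUE, TIMES)
B_GRID = [0.01 * k for k in range(1, 101)]    # trial decay rates, 1/h

# Fit on the first 25 samples, then predict the last 15.
a_fit, b_fit, sse = fit_double_exponential(TIMES[:25], LOG_FOLD[:25], B_GRID)
predicted = predict(a_fit, b_fit, TIMES[25:])
max_err = max(abs(p - y) for p, y in zip(predicted, LOG_FOLD[25:]))
print(f"a = {a_fit:.3f}, b = {b_fit:.3f}, max prediction error = {max_err:.2e}")
```

The grid search over b keeps the example dependency-free; a nonlinear least-squares routine would be the more usual choice for noisy data.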

These findings suggest that the same law is valid for other heat shock pulses, parameters

To find the range of validity for the empirical law, measurements were taken for the responses to heat shocks at various heat pulse parameters

For each heat shock pulse (T, D), 13 time samples were taken. At each time sample, the intensity of GFP in at least 10,000 cells was recorded. The groups A, B, and C represent weak, moderate, and strong heat shocks, respectively.

The law was again present in all responses for temperatures between 41.5 °C and 42.5 °C, (examples selected in

(A) For weak shocks (39.5 °C to 40.5 °C), the fits are less tight than they are for moderate shocks (41.5 °C and 42.5 °C).

(B) For strong heat shocks (duration greater than 15 min in this figure), the response starts at a slow pace. Later, the response grows faster, overtaking the responses produced by weaker shocks. The time origin is 2 h after the shock, and the reference value for fold induction, GFP_{0}, is the mean response at that time.

In the following, a theoretical model is developed to explain the experimentally discovered law. The exponential accumulation of GFP shows that the time derivative of the mean GFP is proportional to the mean GFP itself:

There must be thus a molecular process, described by the exponential term ^{−bt}_{1}) represents the first phase of the heat shock response and includes components like HSF1–DNA binding activity. _{1} will increase during the duration of the heat shock and then, after the shock, will decrease with a lifetime proportional to parameter

The accumulation rate of _{2} is controlled by _{1}. For weak and moderate shocks, the activation component _{1} reaches high values at the end of the heat pulse. The degradation rate of _{1} after the heat pulse is proportional with parameter _{2} at the end of the heat pulse depends on parameter

The “accumulation” variable (_{2}) includes the products of transcription and translation. This second variable, at low levels before the shock, will gain momentum after the shock. To connect the model with the experimental data, the GFP will be considered to be proportional with _{2}. The speed of accumulation of _{2}, that is, _{2}/_{1}_{2}. Immediately after the shock, _{1} has a big value (the activation is high), and thus the speed of _{2} is high (the accumulation is in full thrust). This will trigger an initial fast accumulation of GFP, which is proportional with _{2}. Later on, the activity _{1} disappears, nullifying the product _{1}_{2} and thus the speed of _{2}. The process is then terminated (the accumulation stops) (_{1}(0) and _{2}(0) at a zero time reference _{0} = 0, the solution to this system of differential equations is
_{2}(

The theoretical model contains two parameters: _{1}(0) which equals the product of
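The two-variable description can be checked numerically. The sketch below assumes the deterministic system dX_{1}/dt = −bX_{1}, dX_{2}/dt = X_{1}X_{2} suggested by the text, and compares a fourth-order Runge-Kutta integration with the closed-form solution X_{1}(t) = X_{1}(0)e^{−bt}, X_{2}(t) = X_{2}(0)exp[(X_{1}(0)/b)(1 − e^{−bt})] that follows from it. The symbol names and all numerical values are illustrative assumptions.

```python
import math

def rhs(x1, x2, b):
    """Right-hand side of the assumed activation-accumulation system."""
    return -b * x1, x1 * x2

def rk4(x1, x2, b, t_end, steps):
    """Integrate from t = 0 to t_end with the classical Runge-Kutta scheme."""
    h = t_end / steps
    for _ in range(steps):
        k1 = rhs(x1, x2, b)
        k2 = rhs(x1 + 0.5 * h * k1[0], x2 + 0.5 * h * k1[1], b)
        k3 = rhs(x1 + 0.5 * h * k2[0], x2 + 0.5 * h * k2[1], b)
        k4 = rhs(x1 + h * k3[0], x2 + h * k3[1], b)
        x1 += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        x2 += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return x1, x2

def closed_form(x1_0, x2_0, b, t):
    """Closed-form solution of the same system."""
    decay = math.exp(-b * t)
    return x1_0 * decay, x2_0 * math.exp(x1_0 / b * (1.0 - decay))

# Hypothetical values: X1(0) = a * b with a = 3 and b = 0.25 per hour.
numeric = rk4(0.75, 1.0, 0.25, 18.0, 2000)
exact = closed_form(0.75, 1.0, 0.25, 18.0)
print(numeric, exact)
```

With these values the accumulation variable saturates near X_{2}(0)e^{3}, which is the double exponential behavior of the mean response.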

It is interesting to note that the above time evolution can be re-expressed as a conservation law that is independent of any reference time. For any two time points t_{1} and t_{2}, the following holds
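One way to see why a conserved quantity exists is sketched below, under the assumption that the deterministic model is dX_{1}/dt = −bX_{1}, dX_{2}/dt = X_{1}X_{2}. The symbols X_{1}, X_{2}, b, t_{1}, and t_{2} are used as in that assumed model, and the particular form of the conserved combination is a derivation from those equations rather than a quotation.

```latex
\begin{aligned}
\frac{d}{dt}\ln X_2 &= X_1, \qquad \frac{dX_1}{dt} = -b\,X_1
\;\;\Longrightarrow\;\;
\frac{d}{dt}\left[\ln X_2(t) + \frac{X_1(t)}{b}\right] = X_1 - X_1 = 0, \\
\ln X_2(t_1) + \frac{X_1(t_1)}{b} &= \ln X_2(t_2) + \frac{X_1(t_2)}{b}
\quad \text{for any } t_1,\ t_2 .
\end{aligned}
```

Because the combination ln X_{2} + X_{1}/b does not change in time, no particular reference time is singled out.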

At this point, there is no more information in the activation–accumulation description above than is in the empirical law. However, one can search for more information hidden in the above two-component description by turning attention to the full data available, not only to the mean value of GFP. For each sampled time, the full data available consists of measured GFP levels for at least 10,000 single cells. These 10,000 single-cell measurements are typically distributed as in _{1} is the mean value of a stochastic activation variable which will be denoted by _{1}, _{1} = 〈_{1}〉. After the heat shock, _{1} will decrease with a probabilistic transition rate _{1}. The activation–accumulation stochastic model is based on the same relation as before (compare _{1} with _{1}), but now it describes the probabilistic transition rate and not a deterministic speed of attenuation. By the same token, _{2} is the mean value of _{2} and its probabilistic accumulation rate is _{1}_{2}. One notices that the transition probability rate _{1}_{2} is nonlinear in the variables _{1} and _{2}. The stochastic two-component description is thus a mirror image of the deterministic two-component system. However, the probabilistic system is more powerful as it predicts that the histograms of GFP (proportional with _{2}) obtained from the flow cytometry measurements follow a Gamma distribution

The experimental GFP fluorescence intensities are Gamma-distributed, as predicted by the activation–accumulation model.

The fact that the levels of proteins in gene networks tend to follow a Gamma distribution, which is a continuum version of a discrete negative-binomial distribution, was presented in [_{2}, predicted by the stochastic activation–accumulation model, is the negative-binomial distribution. This distribution appeared in earlier theoretical studies of genetic networks [_{2} and appears in measurements as a decimal number and not as a pure integer. Thus, to describe the probability distribution of the GFP intensity, a continuous version of the discrete negative-binomial distribution is necessary. This continuous version is the Gamma distribution observed experimentally in _{0}, immediately after the heat shock, there will be at least one cell from the entire cell population which contains the minimum number of molecules _{2}. Denote this number by _{0}. As the time passes, the molecule number _{2} will grow, following the described stochastic process. However, there is a nonzero probability, though extremely small, that the process of accumulation in one cell does not start even after 24 h. This can happen in one of those cells that contain the minimum number of molecules _{2} at the initial time _{0}. Thus, at any later time _{0}, the lowest possible number of molecules _{2} in a cell is _{0} as it was at the initial time _{0}. It can be shown (see the section Analysis of the Theoretical Model) that _{0}. This explains the time independence of the experimental values of _{1} and _{2}. The above relation

As time develops, the biological heterogeneity increases. At all times, the heterogeneity is Gamma-distributed. Gamma distribution parameters

To further check the reality of the Gamma distribution for heat shock response, a comparison of the Gamma fit with the lognormal fit is presented in

For 37 °C, the lognormal fits the data better than the Gamma distribution. As the heat shock is increased from low to moderate, the Gamma distribution becomes a better fit. For strong heat shocks (at 44.5 °C for 30 min), there is no clear separation between a Gamma distribution and a lognormal one.
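A comparison of this kind can be sketched by computing the log-likelihood of each candidate distribution on the same sample. The example below is a simplified stand-in, not the fitting procedure used for the figures: the Gamma parameters are obtained by moment matching (an assumption made to avoid special functions), the lognormal parameters by maximum likelihood, and the data are synthetic draws with hypothetical parameters.

```python
import math
import random

def gamma_loglik_moments(data):
    """Log-likelihood of a Gamma distribution fitted by moment matching."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    shape = mean * mean / var
    scale = var / mean
    ll = sum((shape - 1.0) * math.log(x) - x / scale for x in data)
    return ll - n * (math.lgamma(shape) + shape * math.log(scale))

def lognormal_loglik_mle(data):
    """Log-likelihood of a lognormal distribution at its maximum-likelihood fit."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    mu = sum(logs) / n
    s2 = sum((v - mu) ** 2 for v in logs) / n
    return sum(-v - 0.5 * math.log(2.0 * math.pi * s2) - (v - mu) ** 2 / (2.0 * s2)
               for v in logs)

# Synthetic intensities drawn from a Gamma distribution (hypothetical shape and scale).
rng = random.Random(0)
sample = [rng.gammavariate(3.0, 2.0) for _ in range(5000)]
print("Gamma     :", gamma_loglik_moments(sample))
print("lognormal :", lognormal_loglik_mle(sample))
```

The distribution with the larger log-likelihood describes the sample better; when the two values are close, as reported for strong shocks, the data do not discriminate between the candidates.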

The law

(A) The contours for

(B) In the lower left region, the contours for ^{−1}. The instability of the saddle configuration can be related to a need for sensitivity with variation in temperature and duration of the stress. Because of the double exponential law, a small variation in

The conclusion of this section will be rephrased using a control theory perspective. The end result of this paper is an input–output relation for the response of the CHO cells to heat shocks, together with a theoretical model that explains it. The input signals are pulses of a precise time duration

Parameters _{1}). We associated this theoretical factor with the heat shock factor HSF1-DNA binding activity.

The theoretical model is based on an activation variable _{1} and an accumulation variable _{2}. The state of this two-component model is thus (_{1}, _{2}), and any pair of positive integer numbers can be a possible state. The main goal is to find the mean value and standard deviation for the activation and accumulation variable, respectively. These quantities will be obtained from the equation for the probability _{1}, _{2}, _{1}, _{2}) at the time _{1}, _{2}, _{1}, _{2}). The experimental results suggest that two possible transitions change the state (_{1}, _{2}). One transition represents the decreasing of the activation variable from _{1} to _{1} − 1. On the state (_{1}, _{2}), this attenuation appears as (_{1}, _{2}) → (_{1} − 1, _{2}), with an unaffected accumulation variable _{2}. The second transition will describe the accumulation of the accumulation variable from _{2} to _{2} + 1. On the state (_{1}, _{2}), this accumulation appears as (_{1}, _{2}) → (_{1}, _{2} + 1), with the activation variable _{1} now being unaffected. A notation for the transition direction can be introduced: _{−1} = (−1,0). The degradation transition can thus be written as (_{1}, _{2}) → (_{1}, _{2}) + _{−1}. The negative sign in the index −1 is just a reminder of the fact that the transition reduces the number of molecules; the 1 in the subscript tells us that the transition is on the first variable. Likewise, the accumulation transition can be expressed as (_{1}, _{2}) → (_{1}, _{2}) + _{2} and _{2} = (0,1). The index 2 is positive (accumulation) and is associated with the second component. To find the probability _{1}, _{2},

The components _{1} and _{2} are represented by ovals and the transitions by squares. The lines that start from the center of a transition square represent the sign of that transition and point to the component on which the transition acts. The transition _{−1} is negative, so the line ends in a bar and acts on _{1}. The transition _{2} is positive and so the line ends with an arrow; it acts on _{2}. The lines that stop on the edges of the transition squares represent the transition probability rates. The line that starts from _{1} and ends on _{−1} represents the transition probability rate _{1}. In other words, the transition _{−1} is controlled by _{1}. The lines that start on _{1} and _{2} and merge together to end on _{2} represent the product _{1}_{2}, (the merging point represents the mathematical operation of taking the product).
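The two transitions in the diagram can be simulated directly with a Gillespie-type algorithm. The sketch below assumes that the attenuation transition (x_{1}, x_{2}) → (x_{1} − 1, x_{2}) occurs with rate b·x_{1} and that the accumulation transition (x_{1}, x_{2}) → (x_{1}, x_{2} + 1) occurs with rate x_{1}·x_{2}. The lowercase symbol names, the rate constant b, the initial state, and the time scale are illustrative assumptions.

```python
import math
import random

def gillespie(x1, x2, b, t_end, rng):
    """Simulate one trajectory and return the state (x1, x2) at time t_end."""
    t = 0.0
    while x1 > 0:                     # both rates vanish once x1 reaches zero
        r_decay = b * x1              # (x1, x2) -> (x1 - 1, x2)
        r_accum = x1 * x2             # (x1, x2) -> (x1, x2 + 1)
        total = r_decay + r_accum
        t += rng.expovariate(total)   # waiting time to the next transition
        if t > t_end:
            break                     # next transition falls after t_end
        if rng.random() * total < r_decay:
            x1 -= 1
        else:
            x2 += 1
    return x1, x2

# Hypothetical initial state and rate constant, chosen only for illustration.
rng = random.Random(0)
runs = [gillespie(5, 10, 5.0, 0.2, rng) for _ in range(500)]
mean_x1 = sum(r[0] for r in runs) / len(runs)
mean_x2 = sum(r[1] for r in runs) / len(runs)
print(mean_x1, 5 * math.exp(-5.0 * 0.2), mean_x2)
```

Because the attenuation transition does not depend on x_{2}, the simulated mean of x_{1} decays exponentially, while x_{2} can only grow and stops changing once the activation variable is exhausted.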

At this point, the theoretical model is fixed and what comes next is a sequence of computations to extract information out of it. This information will be compared with the experimental results. Given the transition rates, the equation for the probability _{1},_{2},

The above equation for _{1},_{2},_{1},_{2},

The equation for the function _{1},_{2},_{1},_{2},

The goal is to find the time variation of the mean value and standard deviation for the activation and accumulation variable: 〈_{1}〉, 〈_{2}〉, _{1},_{2}〉, etc. Here 〈〉 is a notation for the mean value with respect to the probability distribution _{1},_{2},_{1},_{2},_{1},_{2},_{1} =1, _{2} = 1. These partial derivatives are actually the _{1},_{2},
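The link between the generating function and the moments can be illustrated in one variable. For F(z) = Σ_n z^n P(n), the derivatives at z = 1 give the factorial moments F′(1) = 〈n〉 and F″(1) = 〈n(n − 1)〉. The sketch below assumes the standard definition of the second factorial cumulant, 〈n(n − 1)〉 − 〈n〉², and uses a Poisson distribution as a hypothetical test case because its second factorial cumulant vanishes. The function names are illustrative.

```python
import math

def factorial_moments(pmf):
    """Return <n> and <n(n-1)>, i.e. F'(1) and F''(1) for F(z) = sum_n z**n * pmf[n]."""
    first = sum(n * p for n, p in enumerate(pmf))
    second = sum(n * (n - 1) * p for n, p in enumerate(pmf))
    return first, second

def factorial_cumulant_2(pmf):
    """Second factorial cumulant: <n(n-1)> - <n>**2."""
    first, second = factorial_moments(pmf)
    return second - first ** 2

def variance(pmf):
    """Variance recovered as second factorial cumulant plus the mean."""
    first, _ = factorial_moments(pmf)
    return factorial_cumulant_2(pmf) + first

def poisson_pmf(lam, n_max):
    """Poisson probabilities for n = 0 .. n_max (truncated tail is negligible here)."""
    return [math.exp(-lam) * lam ** n / math.factorial(n) for n in range(n_max + 1)]

pmf = poisson_pmf(3.0, 60)
print(factorial_moments(pmf), factorial_cumulant_2(pmf), variance(pmf))
```

The same bookkeeping extends to two variables, where mixed derivatives of the generating function give the cross terms.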

The equations for _{1}(_{2}(_{11}(_{1},_{2},

The activation–accumulation model being nonlinear, the equations for the factorial cumulants cannot be reduced to a finite system of equations, unless some approximation technique is employed. All third-order cumulants were discarded to obtain the above system of equations. In [_{1}, _{2}, _{12}, _{11}, and _{22} as variables. Although it can be solved for _{1} and _{2}, we found that the influence of the correlation term _{12} is small and cannot be experimentally detected in the GFP response. Taken thus, _{12} = 0, and the system of equations is reduced to:

The solution to _{22} from the four-equation system is
_{2}(_{0}) at some time _{0} after the heat shock. The solution can be restated in terms of the variance, Var, of the variable _{2}. The transformation from the factorial cumulants to Var is
_{2} is _{22}, it follows that
_{1}, _{2}, _{11}, and _{22}. However, for the case of negligible _{12}, the stochastic process is decoupled in two stochastic processes, each of which is exactly solvable. It is thus useful to solve directly for the probability distribution of _{2} at this point. The transition probability rate for the first stochastic process (for the activation component _{1}) is the same as before: _{1} and _{2} is through the _{1} now). This simplifies the problem of finding the distribution of _{2}. Denote the mean value of _{1} with _{2} [_{2} now has an accumulation transition rate

The origin of time, _{1}(0) represents the mean value of the activation variable at the end of the heat shock.

The probability _{2},_{2} number of molecules at time

To find the solution, an initial condition _{2},_{0}) must be specified. The time _{0} is some time taken after the heat shock pulse (_{0} > 0), when the effects of the shock start to be detectable; it can be, for example, 30 min or 2 h after the pulse. The probability distribution _{2},_{0}) can be obtained, in principle, from the experimental values of GFP since GFP = _{2}. There is an obstacle though: the proportionality factor _{2} into the laser intensity which is the output of the flow cytometry machine. The conversion from the molecule numbers to the laser intensity can be more complicated than the proportionality relation GFP = _{2}. For example, a background _{2} + _{2} to connect the flow cytometry readings with the number of molecules. To conclude this initial condition discussion, in a perfect setting we would know the scaling factor _{2},_{0}) from the measured data. Because the scaling factor _{2},_{0}) is based on a simple assumption: all cells have the same number of molecules _{2} = N at the time _{0}. That is _{2},_{0}) = _{2},N) where

Here _{2} can take only values greater than N, _{2} = _{2} − _{2} in any cell. This physical interpretation of _{2}:

The mean and variance for _{2} are

Although the assumption that all the cells contain the same number of molecules at _{0} is unreal, it produces a valuable outcome. The negative-binomial distribution implies a Gamma distribution for the GFP intensity (through the scaling relation GFP = _{2}), a fact to be discussed shortly. Because the Gamma distribution is a good fit for the experimental data, we conclude that the negative-binomial is the correct solution for the distribution of the accumulation variable _{2}.
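The negative-binomial outcome can be checked by simulation. The sketch below assumes the decoupled accumulation process in which x_{2} increases by one with rate X_{1}(t)·x_{2}, where X_{1}(t) is the deterministic mean activation. Measured on the rescaled clock τ = ∫X_{1}(t)dt, this is a pure birth (Yule) process. Starting from N molecules in every cell, the standard results for that process are 〈x_{2}〉 = N e^{τ} and Var(x_{2}) = N e^{τ}(e^{τ} − 1). The values of N and τ and the function name are hypothetical.

```python
import math
import random

def yule_sample(n0, tau, rng):
    """Molecule number after rescaled time tau, starting from n0 molecules."""
    n, clock = n0, 0.0
    while True:
        clock += rng.expovariate(n)   # waiting time when the total rate is n
        if clock > tau:
            return n
        n += 1

def sample_mean_var(values):
    """Sample mean and (population) variance of a list of numbers."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / len(values)
    return m, v

N0, TAU = 5, 1.0                      # hypothetical initial number and rescaled time
rng = random.Random(1)
cells = [yule_sample(N0, TAU, rng) for _ in range(2000)]
mean, var = sample_mean_var(cells)
print(mean, N0 * math.exp(TAU))
print(var, N0 * math.exp(TAU) * (math.exp(TAU) - 1.0))
```

No simulated cell ever falls below N, which is the lower bound on the molecule number discussed in the text.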

The second step in choosing the probability distribution _{2}, _{0}) will be guided by the experimental results. The experimental results show that the biological system passes through a chain of events from an unknown distribution of GFP before the heat shock, to a Gamma distribution at some time _{0} after the heat shock (2 h, for example). Also, the experiment shows that the distribution of GFP is Gamma at later times _{0}. In other words, the distribution of _{2} becomes a negative-binomial at some time _{0} after the heat shock and then afterward remains negative-binomial. These experimental observations are mathematically explained by showing that a solution to _{0} remains negative-binomial for all later times _{0}. Indeed, the solution to _{0}.

The number _{0} is the minimum number of molecules _{2} to be found in a cell at _{0} and also at all later times _{0} (because _{2} cannot decrease).

The time evolution of the mean 〈_{2}〉 is
_{0} but different parameters

To connect the theory with the experimental results, the probability distribution for the GFP intensity is needed. This distribution is the continuum limit of the distribution for _{2}. It is a well-known fact that the continuum limit of a negative-binomial distribution is the Gamma distribution. This continuum limit is presented here in order to find parameters

The change from the integer variable _{2} to the real variable _{2} is simple if advantage is taken of the fact that the common parameter _{0} is a small number. Parameter _{0} is less than any possible molecule number _{2} present in the system after the time _{0}, _{2} ≫ _{0}. Then, writing for simplicity

In the last step, we used the approximation 1 − ^{−y} for small values of

To go from the discrete variable _{2} to the continuous variable GFP, we write the above relation as an equation for the probability density
_{2} = 1; then scale to GFP, (GFP = _{2}). The probability density
P℘ for GFP is then

This is a Gamma distribution for GFP ≡

The mean value of the Gamma distribution is
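The quality of the continuum limit can be checked numerically. The sketch below compares a negative-binomial probability for the molecule number, written with an assumed minimum number n0 and an assumed success probability p, with a Gamma density of shape n0 and rate p. The parametrization, the symbol names, and the numerical values are illustrative assumptions; the agreement is expected when p is small and the molecule number is much larger than n0.

```python
import math

def shifted_negbin_pmf(x, n0, p):
    """P(x) for x = n0, n0 + 1, ...: C(x-1, n0-1) * p**n0 * (1-p)**(x-n0)."""
    return math.comb(x - 1, n0 - 1) * p ** n0 * (1.0 - p) ** (x - n0)

def gamma_pdf(x, shape, rate):
    """Gamma density rate**shape * x**(shape-1) * exp(-rate*x) / Gamma(shape)."""
    log_pdf = (shape * math.log(rate) + (shape - 1.0) * math.log(x)
               - rate * x - math.lgamma(shape))
    return math.exp(log_pdf)

N0, P = 3, 0.01                       # hypothetical values with P << 1
for x in (100, 300, 600):
    discrete = shifted_negbin_pmf(x, N0, P)
    continuous = gamma_pdf(x, N0, P)
    print(x, discrete, continuous, abs(discrete - continuous) / continuous)
```

The mean of the Gamma density in this parametrization is shape/rate, here N0/P, which can be compared with the mean of the discrete distribution.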

The way the material is organized and presented in this paper is an outcome of a series of guiding principles imposed upon the project. These guiding principles were formulated to keep the experimental data in balance with both the mathematical and biological models. The guiding principles are: 1) start from experimental measurements and discover an empirical law from data using signal generators as input into the system; 2) build a simple mathematical model with as few parameters as possible to explain the empirical law; 3) check the mathematical model using additional experimental information; 4) use a general mathematical technique, likely to be applicable to other experimental designs; 5) keep the biological model and the mathematical model to a level of complexity commensurate with the richness of the experimental data.

These guiding principles filtered out other possible presentation formats. For example, the fifth principle will prevent the development of a complex mathematical model built on a complex biological model, although many molecules involved in the heat shock response are known. One outcome of the strategy outlined above is the discovery of a new variable, _{1}, brought about by a mathematical necessity from the empirical law. The behavior of this variable matches the behavior of the HSF1-DNA binding activity, experimentally described in [

At a deeper level, the double exponential law and the activation–accumulation model need to be extended by simultaneously measuring the GFP production and the HSF1 activity. Through successive rounds of modelling and data acquisition, more and more molecules can be reliably added into a quantitative description of the heat shock response.

Narrowing the discussion from general views to the specifics of this project, a natural question arises: why would cells evolve such a double exponential response? We can only speculate and say that cells need a very fast response immediately after the shock. Moreover, cells cannot bear for a long time such a fast exponential accumulation, so this initial exponential growth must be stopped. A compromise between these two requirements is the double exponential law for the mean heat shock response

Another aspect to be noted is the time evolution of the stochastic process that describes the heat shock response. Not only can the time evolution of the mean value be modelled mathematically, but so can the time evolution of the probability distribution.

The time evolution of GFP distribution can be well-explained by a negative-binomial with a time-dependent parameter. This behavior is obtained by neglecting the statistical correlation between the activation and the accumulation variable in the stochastic activation–accumulation model. It will be interesting to reach a level of experimental accuracy at which the statistical correlation becomes detectable, and then measure the deviation of the probability distributions from the negative-binomial.

From a mathematical point of view, we choose to work with the discrete master equation because it is simple to relate it to a biological model. The transition probability rates can be easily connected with biological phenomena at the molecular level. The ease of building the model is counterweighed by the difficulty of solving the discrete master equation. To overcome this difficulty, we employ the method outlined in [

The biological significance of the approach can also be expressed using a control theory perspective. The structure of an unknown physical system is uncovered by perturbing the system with a series of input signals. The response to these perturbations is measured as output signals. The mathematical relation between the input and the output signals then constitutes a model for the system. As much as possible, this theoretical model must also incorporate the molecular components of the system. The activation–accumulation model belongs to the category of input–output models. It is possible that other biological systems can be described by other simple models. A classification of molecular networks can thus be devised using their input–output functional relation. Moreover, if the biological system is decomposed into subsystems, there is hope that the global properties of each subsystem can also be described by a coarse-grained model. In this way, a hierarchy of models can be built to explain more and more details of a complex system.

A 5.3-kilobase DNA containing promoter and 5′-untranslated region of the mouse

CHO-K1 cells (ATCC) were grown in MEM-alpha (Cellgro) containing penicillin, streptomycin, and amphotericin (Cellgro) and complemented with 10% FBS (Gemini Bio-Products). Cells were transfected by lipofection using Lipofectamine (Invitrogen) as previously described. After 10 d of selection in hygromycin (500

The cells were detached with trypsin and allowed to recover in suspension in complete growth medium for 3 to 4 h at 1 × 10^{6} cells/mL at 37 °C in a CO_{2} incubator. The cells were then aliquoted in 50 mL conical tubes, one for each experimental condition (temperature and duration of heat shock). Up to five different temperatures were tested simultaneously, one water-bath being used for each temperature. The temperature of each water-bath was accurately monitored with a precision Hg thermometer (accuracy ±0.1 °C). Then the cells were centrifuged, the medium was aspirated, and the heat shock was initiated by resuspending the cell pellet quickly at 5 × 10^{5} cells/mL in a medium prewarmed to the temperature selected for the heat shock. The tube was then placed in the same water-bath for the remainder of the heat shock, after which the tube was placed in ice-cold water and agitated for the amount of time that had previously been determined to be necessary to bring the temperature back to 37 °C (from 2 to 14 s). The tube containing the cells was then placed in a water-bath set at 37 °C. From that point on, samples were taken every 30 min or every 2 h for up to 26 h. In all experiments, a control in which the cells were kept at 37 °C for the whole time was included. The exact duration of each heat shock was monitored with a stopwatch. This protocol allowed very strict control over the amount of input applied to the cells. The cells were kept in suspension in the 50 mL tubes in a CO_{2} incubator at 37 °C for the rest of the experiment.

At each time point, 1 mL of cell suspension was removed from each tube and placed in a 5 mL tube. The cells were centrifuged for 2 min at 300 g, the supernatant was aspirated, and the cell pellet was resuspended in 500

The samples were analyzed by flow cytometry on an LSR II (Becton-Dickinson) equipped with a 488 nm solid state laser. The performance of the system was routinely checked with fluorescent beads (8-peak beads, Sphero Rainbow, Spherotech), and the same instrument settings were used in all experiments, yielding almost identical fluorescence intensities every time for the cells kept at 37 °C. The cells were gated based on their forward scatter (FSC) and side scatter (SSC), and the same gate was used for all the samples. The fluorescence of each cell was measured based on the area of the corresponding pulse. The data were analyzed with the Diva software (Becton-Dickinson) for the mean fluorescence. The flow cytometry binary FCS files were converted to an ASCII text format with the FCSExtract utility (Stowers Institute for Medical Research). The data were subsequently analyzed with cftool and dfittool from MATLAB (MathWorks).

The time evolution of the mean GFP expressed with respect to a reference initial time t_{0} = 0 is

The above time evolution can be re-expressed as a conservation law that is independent of any reference time. For any two time points t_{1} and t_{2} we have
_{0} we get

As the promoter is activated by increasing temperature pulses, 41.5 °C to 43.5 °C, the Gamma distribution becomes a better description of the biological variation (^{−ηt}. A few hours after the heat shock, when the effect of the exponential ^{−ηt}

The response at strong shocks can also be explained with the help of the activation–accumulation two-component model, by the following scenario. At the beginning of the heat shock, the activation component _{1} will start to accumulate irrespective of how strong the heat shock will be. A cell does not know at the beginning of the heat shock about the duration of the shock. For a high temperature, if the duration of the shock is too long, after an initial accumulation, the activation component _{1} will drop to low values. At the end of a strong shock, the activation component _{1} will thus have low values. This is contrary to the case of moderate shocks, when at the end of the shock _{1} has high values (_{1} will accumulate again after the shock. Because _{1} accumulates after the end of a strong shock, the speed of _{1} is no longer described by _{1})/_{1} in this time interval. Then, a few hours after the shock, it reaches a maximum value, from which it will decrease in the subsequent hours following _{1})/_{1}. The accumulation of _{1} after the shock is responsible for the slow response in the first hours. The decrease of _{1}, which follows, imposes a response similar with the one observed at low and moderate shocks. The mathematical model for strong shocks during the time period when _{1} decreases (5–6 h after the shock) is the same with the model for moderate shocks, _{1})/_{1} and _{2})/_{1} _{2.}. After the slow response ends, the empirical law

In view of the above discussion, for strong shocks the mean GFP is given by a modification of

OL is grateful to W. H. Wong for helpful comments on the manuscript and continuous encouragement. Many thanks go to F. Vaida, Y. Zhang, B. L. Adam, and to E. F. Glynn for FCSExtract software.

CHO, Chinese hamster ovary

GFP, green fluorescent protein