Design Principles of a Genetic Alarm Clock

Turning genes on and off is a mechanism by which cells and tissues make phenotypic decisions. Gene network motifs capable of supporting two or more steady states and thereby providing cells with a plurality of possible phenotypes are referred to as genetic switches. Modeled on the bases of naturally occurring genetic networks, synthetic biologists have successfully constructed artificial switches, thus opening a door to new possibilities for improvement of the known, but also the design of new synthetic genetic circuits. One of many obstacles to overcome in such efforts is to understand and hence control intrinsic noise which is inherent in all biological systems. For some motifs the noise is negligible; for others, fluctuations in the particle number can be comparable to its average. Due to their slowed dynamics, motifs with positive autoregulation tend to be highly sensitive to fluctuations of their chemical environment and are in general very noisy, especially during transition (switching). In this article we use stochastic simulations (Gillespie algorithm) to model such a system, in particular a simple bistable motif consisting of a single gene with positive autoregulation. Due to cooperativety, the dynamical behavior of this kind of motif is reminiscent of an alarm clock – the gene is (nearly) silent for some time after it is turned on and becomes active very suddenly. We investigate how these sudden transitions are affected by noise and show that under certain conditions accurate timing can be achieved. We also examine how promoter complexity influences the accuracy of this timing mechanism.


Introduction
Genetic circuits bear resemblance to human-made (e. g. electrical) circuits, in that both types perform a specific function or functions and are optimized to be robust against stochastic fluctuations and, in the former case, genetic mutations. However, the natural optimization of the genetic circuits seems yet incomplete and in constant flux. Using such naturally occurring circuits, synthetic biologists make improvements where nature fell short as well as devise new and novel motifs previously unseen.
Designing genetic circuits has been a major preoccupation by researchers working in the field of synthetic biology. Thus far, the record of successfully designed and implemented biological systems is noteworthy and still growing. Examples of synthetically constructed systems include: the toggle switch [1], positive autoregulation motifs [2], gene networks for tuning protein degradation [3], complex promoters [4] and many others (see [5,6] and references therein). In order for this trend of success to continue, it is imperative that both, theoretical modeling and experimentation, continue to refine existing designs as well as invent and test new ones.
Network motifs with positive autoregulation have been studied extensively [7][8][9][10] and their functions are well-known: (i) they slow the response time to stimuli, (ii) they increase the intrinsic noise and hence variability among a cell population, and (iii) those capable of supporting more than one steady state can function as bistable switches. In some cases these functions work together as, for example, during an epigenic differentiation where the intrinsic noise can trigger a random transition from low to high protein concentration, hence giving rise to two different populations of cells [11][12][13]. In other cases the delayed response serves the purpose of filtering short noisy bursts.
Longer delays -several hundred minutes or more -have been observed in real biological systems as in, for example, certain genetic circuits that control cell death [14]. Such delay-generating circuits usually involve motifs containing several genes, which makes them less ideal as systems to emulate by synthetic biology. On the other hand, due to greater degree of freedom and parameter space, circuits comprised of several genes tend to be more robust against external fluctuations and genetic mutations. Somewhere between this practical drawback and functional advantage lies an optimal design for generating controlled delayed responses.
Our aim here is to model, using stochastic simulation, a bistable gene switch capable of behaving like an alarm clock and discover general design principles that would facilitate its construction. More specifically, we want to know what makes the time of switching predictable to a high degree of accuracy. Nature gives us examples of breathtaking accuracy, e. g. in multicellular organisms which, during gestation, follow a temporal and spacial pattern so predictable ''it could be used to set a watch'' [15]. This observation inspired us to hope that accuracy in the system at hand was not asking too much.
To narrow the focus of our study, we set out to answer these three specific questions: (i) Is accurate switching at all possible in this type of system? (ii) What effects, if any, does the length of the delay have on this accuracy? (iii) What are the conditions under which this accuracy is possible? Results 0.1 Intrinsic properties of positive feedback 0.1.1 Delayed response to an external input. In systems with autoregulation, the proteins encoded by a gene themselves regulate their production rate; they are referred to as transcription factors (TF). The dynamical behavior of TF concentration, from now on denoted as y, is shaped by two forces: production and degradation. In most cases, the degradation term is linearly dependent on y, whereas the rate of production is generally a more complicated function of y, e. g. a Hill function, f (y)~y n =(kzy n ), where k is called the Hill coefficient and is related to y 1=2 , the TF concentration at which f~1=2; n is the Hill exponent, an integer determined by the promoter complexity. Figure (1) is an illustration of a promoter with two transcription activation sites (TAS) to which only these specific TFs can bind; this gives rise to n~2. Figure (2a) shows graphically the dependence of degradation on the TFs, ky, and several curves representing different production rates, f (y). The important feature to notice is the reduced difference between f (y) and ky as compared to the case of constant production rate, f (y)~1. This means that systems with positive feedback will always take longer to reach their steady state than those without, provided their steady states are the same. Figure (2b) shows the TF concentration dynamics for each case of f (y) in (2a) governed by the equation _ y y(t)~f (y){ky(t). 0.1.2 Bistability and long delays. When more than one TF is required for transcription initiation, the system may have more than one stable steady state and can be induced to evolve from one to another by changing one or more of its parameters (reaction rates). Among these, the one that most commonly occurs in nature is the bistable system [16]. In figure (1d), the curve on the right represents a bistable configuration: the system rests indefinitely in either of the two stable steady states, 1 or 3 (point 2 is unstable), corresponding to _ y y~0. If the system starts out at point 1, it will remain there until an external input, e. g. external chemical, change in environmental conditions (temperature, pH, light) or a TF from a different gene, modifies one or more of the system parameters and hence the curve in such a way that point 3 becomes the only available steady state. Figure (2d) illustrates the dynamics of this arrangement (the point-dashed curve). The initial rate of net TF production depends on the difference along the vertical between the curve f (y) and the degradation line, ky. In principle, this difference can be arbitrarily small, making the system linger near point 1 for an arbitrarily long time.
We should point out the fact that multi-stability can only be achieved for n §2. This can be seen in the rectangle within the graph of Fig. (2c), showing f (y), for which n~1, and ky. Notice that only points 2 and 3 are present, 3 being the only stable one. Dynamics of y for this system are shown in the rectangle within Fig. (2d). 0.1.3 Switching in the presence of noise. When noise is taken into account, the situation becomes more complicated. If we define the delay as the time when the TFs reach one third of their final steady state concentration (this definition is arbitrary), we should expect to find a distribution of delays centered near the value predicted by deterministic models.
Though noise will always be present, there can be significant differences in delay uncertainty between systems with similar averaged dynamics but different parameter values. Figure (3a-b) shows a simulation of two such systems, using the Gillespie algorithm (see section ''Methods''). At time t = 0 the promoter binding rate of each system was increased by such amount that the averaged delay (as defined above) was very similar &250 min. One can see that while their average profiles are similar, apart from their final steady state values, their delay distribution is quite different. In the following sections we investigate the source of this difference.
0.2 Taming the noise: how to construct a switch with predictable delays 0.2.1 Deterministic Model. To go beyond the qualitative description of the system at hand and understand its dynamical behavior in terms of chemical reactions, we must include in our model the interactions of the TFs with their TAS (transcription activation sites). The rate equations then read: The variables S, x and y represent the concentrations of the free TAS, mRNA and TF respectively; the other variables, z i , signify the concentrations of complexes made up of i TFs and S. The quantity N stands for total DNA copy number; here, it is set to 1. The parameters a i and b i denote the rate of association and dissociation between TF and the activation site respectively. One can write a 2~C a 1 where C is the cooperativity factor. When no cooperativity is present, C~1, we have a 1~a2 and b 1~b2 ; in all that follows (unless noted otherwise), we will consider only this case, setting a i~a and b i~b . The other parameters, r 0 , r, K, k, q denote, in the respective order, the rates for: basal transcription (when z 2~0 ), maximal transcription (when z 2~1 ), translation, mRNA degradation, and TF degradation. The factor of 2 appearing in Eqs. (1)-(4) comes from the fact that formation of z 1 and dissociation of z 2 can happen in two distinctive ways: TF can bind to one or the other TAS; and, similarly, when two TFs are bound to S, each has an opportunity to escape. In writing Eq.
(3) we assumed that only by forming complex z 2 does the transcription rate increase from its basal value r 0 to r. For the purpose of unifying our graphical representations of Fig. (2a-d) and the model we have just defined, one may decouple Eqs. (1) and (2) from the others by setting Eqs. (2) and (3) to zero and solving for z 1 and z 2 in terms of y: What allows this approximation is the (experimental) observation that mRNA and TFs take several orders of magnitude longer to reach equilibrium than z 1 and z 2 . Further steps can be taken by expressing x and _ x x in terms of y and _ y y, differentiating Eq. (4), and   inserting to it the newly expressed x and _ x x. This leads to: where f (y)~c 1 c 2 y 2 1z2c 1 yzc 1 c 2 y 2 , c 1~a With the second derivative in Eq. (8) it is now easy to interpret the right hand side as a force. For the case c 1 %c 2 , the function f (y) acquires its Hill form discussed earlier with k~1=(c 1 c 2 ). Note that while the profile of y as a solution of Eq. (8) is only approximately equal to that which is a solution of the full system, (1-4)), their fixed points y 1 , y 2 and y 3 , and x 1 , x 2 and x 3 are identical.
0.2.2 Exploring the parameter space. Once a deterministic model is defined, its dynamical properties can be explored as a function of its parameters. Here we are interested in a specific dynamical behavior, namely, delayed response to an external input. We imagine that the switch is in an off position, i. e. the lowest fixed point, when the input is introduced. The input may be a signaling molecule which can, in principle, depending on its type, change the value of any system parameter, or even several of them. For our study we chose a to be the control parameter. This is a reasonable choice as in the real systems, e. g. E. coli, many types of signaling molecules can change the affinity between TFs and their TAS. Thus, when the input is zero, the affinity of transcription factor y to bind its activation region is such that the system has three fixed points (see Figure (2c)). Once the input is introduced the TFs undergo a conformational change that allows them to bind more strongly to their activation sites. This shifts the curve in Fig. (2c) to the left, leaving only the third fixed point available.
In order to understand how the delay uncertainty depends on the system parameters, we first selected 275 different parameter sets, each satisfying the following constraints: 1) the parameter values were restricted to a realistic range (see the ''Methods section''); 2) only those parameter sets for which f (y) had three positive roots (fixed points) were considered; 3) a lower bound was placed on the possible values of the first fixed point x 1 (no such bound was imposed on y 1 ), and was increased incrementally after every parameter set selection; this ensured that large values of x 1 were also selected; in the present case the range we chose x 1 [½6,30 (the reason for this choice is explained later in the section) 4) the distance between points y 1 and y 2 was kept smaller than that between y 2 and y 3 ; without this constraint the dynamical behavior would not resemble the switch-like profile of Fig. (2d) (dot-dashed line); and, lastly, 5) only those parameter sets for which the distance between x 1 and x 2 was larger than 4 ffiffiffiffiffi x 1 p were kept; this constraint served as insurance that the tail of the probability distribution of mRNA falls off to zero before it reaches the second fixed point, i. e. its variance is four times smaller than the distance between x 1 and x 2 , where the variance is taken as ffiffiffiffiffi x 1 p , the true value for constant transcription rate [17]; this, of course, is only an approximation, as the dependence of f on y is negligible only up to certain values of y.
Next, in each set we increased a by such a factor that the numerical solution of Eqs. (1-5) yielded a delayed response of 300 (+5) minutes.    Once the parameter sets were selected in this manner, we performed a stochastic simulation, using the Gillespie algorithm [18], 100 times for each set. In each run, we started with S~1, z 1~0 , z 2~0 , x~½x 1 and y~½y 1 at time t~0, where ½::: means rounded to the nearest integer, and let the system evolve to its equilibrium. When the input was introduced, we set the time to zero again. Each run was terminated when the protein number exceeded one half of its value at the final steady state; the time of termination -the delay -was recorded.
We repeated the above procedure with parameter sets in which all but the value of as were the same. The new as were chosen so as to obtain delays of 400 minutes. In order to determine what effect, if any, does promoter complexity have on the accuracy of delays, we derived equations similar to Eqs. (1)-(5) for three TAS, instead of two, and repeated the procedure described above.
Based on our results we observed several trends. First, the delays predicted by Eqs. (1)-(5) are almost always less than the averaged delays obtained from the stochastic simulations. Second, the relative delay shifts and delay uncertainties, defined by where d i is the delay of the ith run and t~300 or 400 minutes, approximately follow a linear trend as shown in Fig. (4); this implies that the more the averaged delay differs from the deterministic delay, the greater the delay uncertainty. Another trend can be seen in Fig. (5a-b) where we plotted the fraction of those cases whose s r falls between the range of values indicated on the x-axis for both, the two and three TAS. From these, one can draw two conclusions: transcription initiation requiring high number of activator sites tends to lead to less accurate delays; and, the longer the delay is the less accurate it is.

Correlation between the delay uncertainties and system parameters
Since the system parameters completely define a model, any stochastic quantity can, in principle, be calculated in terms of them, whether one uses a stochastic algorithm or the master's equation. This is also true for d r and s r . However, obtaining an analytical expression for, say, s r as a function of the system parameters is not feasible. The most one can do is find a trend between some function of the parameters and s r . Earlier we hypothesized that the initial mRNA number, which is approxi-mately given by r 0 =k, is a factor in determining s r . Figure (6a-d) shows that this trend indeed exists; however, the fact that many points fall far off the fitting line implies that the story is more complicated and requires cooperation of the other parameters.
To obtain a more accurate relation between s r and the system parameters we took an ansatz with fl 1 :::l M g being the set of system parameters and fn 1 :::n M g a set of integers, such that P i n i~0 (since zero is the dimension of s r ). The best fit is given by Figure (7a-d) shows the plot of the correlation between F (l) and s r . As a final step, we searched for an optimal linear combination of k=r 0 and F (l) and found that with c 2~0 :0724 for two TAS and c 3~0 :177776 for three TAS, gives the best fit, which can be seen in Fig. (8a-b). While Eq. (13) does not provide an exact relation, it does shed light on the condition that needs to be satisfied in order to construct a switch with accurate delays, namely that G(l) must be small. A more general approach, one that does not depend on the coefficient c, would be to require that F be as small as possible while keeping r 0 =k large. To put this into a test, we generated one hundred parameter sets for two TAS and kept the first five for which F was the smallest; the same was done on the opposite end, corresponding to the largest F. For all ten sets, r 0 =k was kept above 20. Performing the same stochastic simulations as before showed that the first group, with small F s, all had s r below 0.1 (10%), while the latter averaged at 18%. We repeated this procedure, again for two TAS with cooperativity, C,equal to 2,3,4 and 5 (a 2~C a 1 ). Table 1 presents the average scores and deviations for all ten cases. Regardless of the cooperativity, all five cases show significant discrepancy in s r between low and high values of F .