## Abstract

Modeling the stochastic behavior of chemical reaction networks is an important endeavor in many areas of chemistry and systems biology. The chemical master equation (CME) and the Gillespie algorithm (GA) are the two most fundamental approaches to such modeling; however, each has its own limitations: the GA may require long computing times, while the CME may demand unrealistic memory storage capacity. We propose a method that combines the CME and the GA and allows one to simulate stochastically a part of a reaction network. First, a reaction network is divided into two parts. The first part is simulated via the GA, while the solution of the CME for the second part is fed into the GA in order to update its propensities. The advantage of this method is that it avoids the need to solve the CME or stochastically simulate the entire network, which makes it highly efficient. One of its drawbacks, however, is that most of the information about the second part of the network is lost in the process. Therefore, this method is most useful when only partial information about a reaction network is needed. We tested this method against the GA on two systems of interest in biology, the gene switch and the Griffith model of a genetic oscillator, and have shown it to be highly accurate. Comparing this method to four different stochastic algorithms revealed it to be at least an order of magnitude faster than the fastest among them.

**Citation:** Albert J (2016) A Hybrid of the Chemical Master Equation and the Gillespie Algorithm for Efficient Stochastic Simulations of Sub-Networks. PLoS ONE 11(3): e0149909. https://doi.org/10.1371/journal.pone.0149909

**Editor:** Ramon Grima, University of Edinburgh, UNITED KINGDOM

**Received:** November 16, 2015; **Accepted:** February 5, 2016; **Published:** March 1, 2016

**Copyright:** © 2016 Jaroslav Albert. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability:** All data are available in the paper.

**Funding:** This work was supported by The Interuniversity Attraction Poles program of the Belgian Science Policy Office, under grant IAP P7-35 (www.belspo.be) and The Onderzoeksraad of the Vrije Universiteit Brussel (www.vub.ac.be) through the strategic research program "Applied Physics and Systems Biology".

**Competing interests:** The authors have declared that no competing interests exist.

## Introduction

In a network of chemical reactions, the molecular concentrations at any given time cannot be predicted with certainty; they can only be anticipated with a certain probability. This probability can in principle be determined by solving (analytically or numerically) the chemical master equation (CME). Attempting to do so, however, is more often than not a frustrating exercise: except for a handful of simple cases, the CME cannot be solved analytically, and for many interesting cases even a numerical solution can be nearly impossible to attain. One way around this obstacle is an algorithm proposed by Doob [1] and later presented and popularized by Gillespie [2]. The authors showed that the information stored in the CME can be extracted through a series of relatively simple steps coupled with the help of a pseudo-random number generator. Known today as the Gillespie algorithm (GA) (also known as kinetic Monte Carlo or the stochastic simulation algorithm (SSA)), this procedure guarantees an exact solution to the CME, provided these steps are repeated sufficiently many times so as to build a statistically significant ensemble of data points. The solution to the CME can thus be reconstructed step by step without the need for the enormous memory storage capacity that is usually required to solve the CME directly. The one drawback of the GA is that the number of steps required scales with the number of reactions and the magnitude of their rates. Consequently, for large reaction networks the running time may become impractical.

Since the GA first appeared, researchers have devised faster versions of it, some of which are exact [3, 4], in the sense that they give answers statistically identical to those of the CME, while others rely on approximations [5–8]. In conjunction with the Langevin approximation [9], these algorithms comprise a library of methods to choose from when simulating reaction networks. The Dizzy package [10], for instance, is one such library containing four stochastic simulators of various speed and accuracy.

Despite the advent of stochastic algorithms, the appeal of solving the CME directly has not diminished. Having an analytical solution to the CME, even an approximate one, is extremely useful and can provide insight into the stochastic properties of chemically interacting networks. It is not surprising, then, that many methods for finding approximate solutions to the CME exist and continue to appear [11–16].

There exists a class of methods that involve partitioning a reaction network in a way that facilitates either a solution to the CME [17, 18] or a faster stochastic simulation [19–21] of one part of the reaction network, while yielding only partial information about the rest of the network. Other partitioning methods rely on large differences among the values of the reaction rates [22, 23]. Such methods take advantage of the fast rates by considering the chemical species affected by these rates to be in a quasi-steady state.

In this paper, we present another method where synergy between the CME and the GA is exploited for the purpose of simulating one part of a reaction network. While previous methods of this kind rely on *a priori* approximations of either the CME or the GA, we begin by deriving the exact equations from which the next reaction time and the next reaction probability can be computed, as well as the exact CME for the non-simulated part of the network. Once these equations take form, they may be solved by virtue of approximations, either as a matter of necessity or in order to speed up the simulation process. We show on two biologically relevant examples how this method can be applied and discuss what its limitations are.

## Materials and Methods

### Chemical Master Equation

The time evolution of the joint probability distribution *P*(**n**, *t*) for a chemically reacting system comprising *N* molecular species and *J* reactions is governed by the chemical master equation (CME):
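$$\frac{\partial P(\mathbf{n},t)}{\partial t} = \sum_{\mu=1}^{J}\left[a_\mu(\mathbf{n}-\boldsymbol{\upsilon}_\mu)\,P(\mathbf{n}-\boldsymbol{\upsilon}_\mu,t) - a_\mu(\mathbf{n})\,P(\mathbf{n},t)\right] \qquad (1)$$
where **n** is short for the set {*n*_{1}, *n*_{2}, …, *n*_{N}}, ***υ***_{μ} is short for the set {*υ*_{1μ}, *υ*_{2μ}, …, *υ*_{Nμ}}, and *a*_{1}(**n**), …, *a*_{J}(**n**) are the reaction propensities, which, for our purposes here, depend on time only implicitly, i.e. through the variables **n**. The matrix elements *υ*_{iμ} specify the change in *n*_{i} due to the *μ*th reaction. It will be useful later on to represent Eq (1) in another way [9, 24–26]. Let us define a vector state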
(2)
and operators , and the action of which on the state |*n*_{i}〉 is
(3)
where, by definition, . The index *i* means that the operators act only on the *i*th vector state, leaving the rest of them untouched. The vector state |*n*_{i}〉, and its transpose 〈*n*_{i}| are simply the orthogonal unit column and row vectors respectively:
(4)
so that 〈0| = (1, 0, 0, …), 〈1| = (0, 1, 0, …), etc. The operators and have the form
(5)

The vector state product in Eq (2) can be thought of as a vector whose elements are the individual vector states, |*n*_{i}〉:
(6)

A product of any operators, e. g. , could then be represented by an *N* × *N* matrix:
(7)
where is the identity operator: .

With this notation, the master equation can be written in the form
(8)
where
(9)
is an operator acting on the vector state and *θ*(.) is a step function centered around zero.

Let us check that Eqs (8) and (1) are indeed identical. To see how the operator in Eq (9) acts on the state |*ψ*〉, let us first look at how the operators within it act on the individual vector states |*n*_{j}〉. We have
(10)
and hence,
(11)

Since the second term is merely the identity operator, it leaves the state unchanged. The operator acting on the product state ∏_{j}|*n*_{j}〉 also leaves it unchanged and itself becomes a number, *a*_{μ}(**n**). Putting the above relations together, we can write
(12)
where in the last line we used the fact that *υ*_{jμ} = *θ*(*υ*_{jμ})|*υ*_{jμ}|−*θ*(−*υ*_{jμ})|*υ*_{jμ}|. The left hand side of Eq (8) states that
(13)

The only way expressions Eqs (12) and (13) can be equal is if the coefficients of the product vector state ∏_{j = 1}|*n*_{j}〉 on both sides are equal. This leads to Eq (1).

The formal solution to Eq (8) is
(14)
where |*ψ*(0)〉 is the initial vector state specified entirely by *P*(**n**, 0). The operator is called the evolution operator. Multiplying both sides of Eq (14) by 〈**n**| and invoking the orthogonality relation we obtain the probability distribution
(15)
where for brevity . With this formalism it is easy to write down quantities such as the transition probabilities. For example,
(16)
means the probability to find the system in the state **n**′ at time *t*′ if at time *t* it was in the state **n**.

Lastly, let us also write down the identity operator in the form
$$\sum_{\mathbf{n}} |\mathbf{n}\rangle\langle\mathbf{n}| = 1 \qquad (17)$$
as it will be useful in later sections.

### Gillespie Algorithm

The idea behind the GA is to simulate a chain of Markov processes by sampling the probability distribution of the time elapsed since the last reaction, *τ*, and the probability that a specific reaction, *μ*, will occur at *τ*, such that any reaction occurring at *τ* has probability 1. The steps are as follows:

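1. At some initial time *t* (e.g. *t* = 0), select your initial state **n** and compute the propensities *a*_{μ}(**n**).
2. Select two random numbers *r*_{1} and *r*_{2}, uniformly distributed between 0 and 1.
3. Compute *τ* using the formula
   $$\tau = \frac{\ln(1/r_1)}{\sum_{\mu=1}^{J} a_\mu(\mathbf{n})} \qquad (18)$$
   where the sum in the denominator runs over all *J* propensities.
4. Find the smallest integer *j* that satisfies
   $$\sum_{\mu=1}^{j} a_\mu(\mathbf{n}) \geq r_2 \sum_{\mu=1}^{J} a_\mu(\mathbf{n}) \qquad (19)$$
   and set *μ* = *j*.
5. Update the system according to *n*_{i} → *n*_{i} + *υ*_{iμ} and set *t* = *t* + *τ*.
6. Return to step 1, recomputing the propensities for the updated state.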
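The steps above can be sketched in Python as follows. This is a minimal illustration of the direct method rather than the implementation used in this work; the function name `gillespie`, its argument layout, and the pure-decay usage example are assumptions introduced here.

```python
import math
import random


def gillespie(n0, propensities, stoich, t_end, rng=random):
    """Direct-method GA sketch (hypothetical interface).

    n0           -- initial copy numbers (list of ints)
    propensities -- list of functions a_mu(n) -> float
    stoich       -- stoich[mu][i] is the change in n_i caused by reaction mu
    t_end        -- final time
    Returns one realization as a list of (t, state) pairs.
    """
    t, n = 0.0, list(n0)
    path = [(t, tuple(n))]
    while True:
        a = [f(n) for f in propensities]      # step 1: propensities
        a_tot = sum(a)
        if a_tot <= 0.0:                      # no reaction can fire any more
            break
        r1 = 1.0 - rng.random()               # step 2: r1 in (0, 1]
        r2 = rng.random()                     #         r2 in [0, 1)
        tau = math.log(1.0 / r1) / a_tot      # step 3: Eq (18)
        if t + tau > t_end:
            break
        # Step 4: smallest j whose cumulative propensity passes r2 * a_tot.
        # A strict inequality is used so a zero-propensity channel is never
        # picked when r2 happens to be exactly 0.
        mu, cum = len(a) - 1, 0.0
        for j, a_j in enumerate(a):
            cum += a_j
            if cum > r2 * a_tot:
                mu = j
                break
        n = [x + v for x, v in zip(n, stoich[mu])]   # step 5: update state
        t += tau
        path.append((t, tuple(n)))            # step 6: loop again
    return path


# Usage: pure decay n -> n - 1 with propensity 0.5 * n, starting from n = 5.
decay_path = gillespie([5], [lambda n: 0.5 * n[0]], [(-1,)], t_end=1e9)
```

Each call produces a single realization; an ensemble is obtained by calling the function repeatedly and collecting statistics at the times of interest.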

Repeating these steps until *t* reaches some final time leads to a particular path, or realization, for **n**. In order to obtain the same information within this time interval as is contained in the CME, one must compute this realization infinitely many times. It is in this sense that the GA and the CME are exactly equivalent. Of course, in practice one only needs to compute a finite number of realizations to extract meaningful information about the system. Depending on how many realizations are considered “sufficient” and how long it takes to compute each realization, the GA may be a fast route to solving the CME, or it may be a very slow one. In the next section we will show how one can combine the CME and the GA in order to simulate stochastically a part of a system.

### CME-GA hybrid

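Consider a reaction network of *J* reactions with propensities {*a*_{1}, …, *a*_{J}}, comprising two sets of molecular species: **m** = {*m*_{1}, …, *m*_{N_1}} and **n** = {*n*_{1}, …, *n*_{N_2}} (*N*_{1} + *N*_{2} = *N*). The propensities are some functions of **m** and **n**, but not time. Let us arrange the reactions into two groups, *G*_{1} = {*a*_{1}, …, *a*_{K−1}} and *G*_{2} = {*a*_{K}, …, *a*_{J}}, such that the reactions in group *G*_{1} can affect both **m** and **n**, while the reactions in group *G*_{2} can only affect **n**. During a time interval, *τ*, in which no reaction occurs in *G*_{1}, the CME for *G*_{2} can be written as follows:
$$\frac{\partial P_{\mathbf{m}}(\mathbf{n},t)}{\partial t} = \sum_{\mu=K}^{J}\left[a_\mu(\mathbf{m},\mathbf{n}-\boldsymbol{\upsilon}_\mu)\,P_{\mathbf{m}}(\mathbf{n}-\boldsymbol{\upsilon}_\mu,t) - a_\mu(\mathbf{m},\mathbf{n})\,P_{\mathbf{m}}(\mathbf{n},t)\right] \qquad (20)$$
where ***υ***_{μ} denotes the change in **n** due to the *μ*th reaction.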

The subscripts **m** in *P*_{m}(**n**, *t*) serve as a reminder that the solution of *P*_{m}(**n**, *t*) will depend on their value. Note that by definition **n** may change during *τ* only via the reaction channels in *G*_{2}, but not in *G*_{1}. Let us now see how one can sample *τ* and the next reaction in *G*_{1} from their respective probability distributions in a manner similar to the one described in the previous section. We begin by deriving the probability distribution for *τ*.

Let us divide time into *L* discrete infinitesimal intervals, Δ*t*, such that Δ*tL* = *τ*. Using the notation introduced earlier, the probability that no reaction in *G*_{1} occurs in the time *τ* can be expressed as
(21)
where and *k* refers to the *k*th time interval. The exponential terms, exp[−Σ_{k} Δ*t*], are the probabilities that, starting at time *t*_{k} = *k*Δ*t*, no reaction occurs in the time Δ*t*. Since the Σs are numbers, not operators, we can move them wherever we want within the total product. Especially useful is to move each exp[−Σ_{k} Δ*t*] just left of the state |**n**〉 for each *k*:
(22)
where is the initial state. Now, thanks to the fact that
(23)
and the relation for arbitrary operators and , we may write
(24)
where . Recalling Eq (17), we can set all the terms in the parentheses to unity. The expression for *Q*(*τ*) can now be written as
(25)
where, setting *t*_{L} to *τ*,
(26)

This expression is identical in structure to Eq (15) and as such can be expressed in a differential form:
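$$\frac{\partial Q_{\mathbf{m}}(\mathbf{n},t)}{\partial t} = \sum_{\mu=K}^{J} a_\mu(\mathbf{m},\mathbf{n}-\boldsymbol{\upsilon}_\mu)\,Q_{\mathbf{m}}(\mathbf{n}-\boldsymbol{\upsilon}_\mu,t) - \sum_{\mu=1}^{J} a_\mu(\mathbf{m},\mathbf{n})\,Q_{\mathbf{m}}(\mathbf{n},t) \qquad (27)$$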
with the initial conditions that *Q*_{m}(**n**, 0) = *P*_{m}(**n**, 0). This equation resembles Eq (20) except that now we have all propensities, from *G*_{1} and *G*_{2}, appearing in the second term.
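As a consistency check, one can verify that this structure reduces to the ordinary GA when the propensities in *G*_{1} do not depend on **n**. The short derivation below is a sketch that assumes Eq (27) has the form described in the text (gain terms from *G*_{2} only, loss terms from all *J* propensities) and that *Q*(*τ*) is the sum of *Q*_{m}(**n**, *τ*) over **n**; the symbol *A* for the total *G*_{1} propensity is introduced here for illustration only.

```latex
% Summing Eq (27) over n: the G_2 gain and loss terms cancel,
% leaving only the G_1 loss terms.
\frac{dQ(\tau)}{d\tau}
  = \sum_{\mathbf{n}} \frac{\partial Q_{\mathbf{m}}(\mathbf{n},\tau)}{\partial \tau}
  = -\sum_{\mathbf{n}} \sum_{\mu=1}^{K-1} a_\mu(\mathbf{m},\mathbf{n})\, Q_{\mathbf{m}}(\mathbf{n},\tau)

% If a_\mu(\mathbf{m},\mathbf{n}) = a_\mu(\mathbf{m}) for all \mu < K, define
% A = \sum_{\mu=1}^{K-1} a_\mu(\mathbf{m}). Then, with Q(0) = 1,
\frac{dQ}{d\tau} = -A\,Q
  \quad\Longrightarrow\quad
  Q(\tau) = e^{-A\tau}
  \quad\Longrightarrow\quad
  \tau = \frac{1}{A}\ln\frac{1}{\xi_1}
  \quad\text{when } \xi_1 = Q(\tau).
```

The last expression has the same form as Eq (18), with the sum restricted to the reactions in *G*_{1}.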

Next we need to compute the probability *p*_{μ} that the *μ*th reaction in *G*_{1} occurs at time *τ*. This is given by the probability that the *μ*th reaction occurs given a specific set **n**, multiplied by the probability of having **n**, and then summing over all **n**. In symbols:
(28)

We now have everything we need to simulate the evolution of **m** via the CME-GA. Here are the steps:

1. Select your initial set **m** and initial probability *P*_{m}(**n**, 0), and hence *Q*_{m}(**n**, 0) (since *P*_{m}(**n**, 0) = *Q*_{m}(**n**, 0)).
2. Solve Eqs (20) and (27) and compute *Q*(*τ*) and *p*_{μ} for *μ* = 1, …, *K* − 1 according to Eqs (25) and (28).
3. Compute *τ* and select the next reaction *μ* according to:
   - i. Generate a random real number *ξ*_{1} in the range [0, 1] and solve *ξ*_{1} = *Q*(*τ*) for *τ*.
   - ii. Generate another random real number *ξ*_{2} in the range [0, 1] and select the smallest integer *k* that satisfies the condition $\sum_{\mu=1}^{k} p_\mu \geq \xi_2$. Set *μ* = *k*.
4. Update **m** and let *P*_{m}(**n**, *τ*) be the new initial condition for Eqs (20) and (27) if and only if the selected reaction does not effectuate a change in *n*_{i}. If the selected reaction changes an *n*_{i} by ±*w*_{i} (*w*_{i} = 1, 2, …), the new initial probability must be modified by shifting it along *n*_{i}: *P*_{m}(…, *n*_{i}, …, *τ*) → *P*_{m}(…, *n*_{i} ∓ *w*_{i}, …, *τ*). If more than one species of **n** is affected by a reaction in *G*_{1}, this shift must be applied to each of them.
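Steps 1 to 3 above can be sketched for a toy case in which the non-simulated set contains a single species *n* that is produced at a constant propensity and degraded at a rate proportional to *n*. Everything specific in the code is an assumption made for illustration: the function names, the truncated grid for *n*, the forward-Euler integration of Eqs (20) and (27), and the toy *G*_{2} reactions. The reaction probabilities follow the verbal description of Eq (28), with *P*_{m}(**n**, *τ*) taken as the probability of having **n**. Step 4 (shifting the distribution after a reaction that changes *n*) is not shown.

```python
import math


def poisson_pmf(lam, n_max):
    """Poisson probabilities for n = 0..n_max (tail beyond n_max is dropped)."""
    p = [math.exp(-lam)]
    for n in range(1, n_max + 1):
        p.append(p[-1] * lam / n)
    return p


def cme_ga_cycle(P0, g1, birth, q, xi1, xi2, dt=1e-3):
    """One CME-GA cycle for a single non-simulated species n = 0..len(P0)-1.

    P0       -- initial distribution P_m(n, 0) on the truncated grid
    g1       -- list of functions n -> propensity of each G1 reaction (m fixed)
    birth    -- constant production propensity of n (toy G2 reaction)
    q        -- per-molecule degradation rate of n (toy G2 reaction)
    xi1, xi2 -- the two random numbers of step 3, with 0 < xi1 < 1
    Returns (tau, mu, P) where P approximates P_m(n, tau).
    """
    size = len(P0)
    a1 = [sum(f(n) for f in g1) for n in range(size)]   # total G1 propensity
    if max(a1) <= 0.0:
        raise ValueError("no G1 reaction can occur, so Q never decays")

    def rhs(X, loss):
        # Right-hand side of Eq (20) (loss = 0) or Eq (27) (loss = a1).
        out = []
        for n in range(size):
            gain = birth * X[n - 1] if n > 0 else 0.0
            if n + 1 < size:
                gain += q * (n + 1) * X[n + 1]
            leave = (birth if n + 1 < size else 0.0) + q * n  # reflecting edge
            out.append(gain - (leave + loss[n]) * X[n])
        return out

    zero = [0.0] * size
    P, Q, tau = list(P0), list(P0), 0.0
    while sum(Q) > xi1:                 # step 3.i: advance until Q(tau) <= xi1
        dP, dQ = rhs(P, zero), rhs(Q, a1)
        P = [x + dt * d for x, d in zip(P, dP)]
        Q = [x + dt * d for x, d in zip(Q, dQ)]
        tau += dt

    # Probability of each G1 reaction: its share of the G1 propensity at a
    # given n, weighted by P(n) and summed over n.
    p = [sum(f(n) / a1[n] * P[n] for n in range(size) if a1[n] > 0.0)
         for f in g1]
    total = sum(p)
    mu, cum = len(p) - 1, 0.0
    for k, p_k in enumerate(p):         # step 3.ii
        cum += p_k / total
        if cum >= xi2:
            mu = k
            break
    return tau, mu, P


# Usage with two hypothetical G1 reactions, one of which depends on n.
tau, mu, P = cme_ga_cycle(poisson_pmf(5.0, 40),
                          [lambda n: 0.1 * n, lambda n: 0.5],
                          birth=2.5, q=0.5, xi1=0.37, xi2=0.8)
```

The time resolution of *τ* is limited to `dt` in this sketch; a production code would use an adaptive integrator or a closed-form solution where one exists.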

This procedure allows one to simulate stochastically a part of a reaction network, i.e. **m**, at the expense of losing information about the rest of the network, i.e. **n**. However, the tractability of this algorithm will depend on the system of interest and on the way in which it is partitioned. If, for instance, a particular choice of partition leads to Eqs (20) and/or (27) being too complicated to solve efficiently, the speed of this algorithm may end up being inferior to that of other stochastic algorithms. Another obstacle to efficiency is having to solve *ξ*_{1} = *Q*(*τ*) for *τ*, which, depending on the particulars of *Q*(*τ*), might be a difficult task. One way to solve for *τ* is to use a minimization algorithm that optimizes the measure (*ξ*_{1} − *Q*(*τ*))^{2} with respect to *τ*. This, however, will likely require a number of steps, calling into question the efficiency of this algorithm. The same can be said of Eq (28), in which the summation(s) may or may not have a closed form. These potential difficulties require not only that the system be partitioned wisely, but also that some of the steps above be simplified or approximated. In the next section we will test the CME-GA on two biological systems and see how it may be applied effectively and accurately.
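Because *Q*(*τ*) is a survival probability, it decreases monotonically from *Q*(0) = 1, so the equation *ξ*_{1} = *Q*(*τ*) can also be solved by bracketing and bisection instead of minimizing (*ξ*_{1} − *Q*(*τ*))^{2}. The sketch below swaps in bisection for that reason; the function name `solve_tau` and the example survival function with coefficients `b1` and `b2` are hypothetical and are not taken from the equations of this paper.

```python
import math


def solve_tau(Q, xi, tau_hi=1.0, tol=1e-10, max_iter=200):
    """Solve Q(tau) = xi for tau by bisection.

    Assumes Q is continuous and decreasing with Q(0) = 1, and 0 < xi < 1.
    """
    lo, hi = 0.0, tau_hi
    for _ in range(200):              # grow the bracket until Q(hi) <= xi
        if Q(hi) <= xi:
            break
        lo, hi = hi, 2.0 * hi
    else:
        raise ValueError("could not bracket the root; Q may not decay to xi")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if Q(mid) > xi:
            lo = mid                  # root lies to the right of mid
        else:
            hi = mid                  # root lies at or to the left of mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Usage with a made-up survival function whose exponent is quadratic in tau.
b1, b2 = 2.0, 0.3
tau = solve_tau(lambda s: math.exp(-b1 * s - b2 * s * s), xi=0.4)
```

Each bisection step halves the bracket, so the number of evaluations of *Q* grows only logarithmically with the requested tolerance.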

## Results

### The genetic switch

Let us consider a single-gene motif with positive autoregulation and a promoter cooperativity of 2. This system can exhibit very large noise [27] due to its positive feedback, and is therefore of interest in systems biology. The simplest yet realistic version of this gene motif is the one described by the following reactions (see Fig 1A):
(29)
where *m* stands for the copy number of mRNA molecules, *n* for the copy number of proteins, and *S*_{0}, *S*_{1} and *S*_{2} label the promoter states: unoccupied, occupied by one protein, and occupied by two proteins, respectively. The reaction propensities appear above each arrow. The parameters *α*_{i}, *β*_{i}, *r*, *r*_{0}, *K*, *k*, *q* are the reaction frequencies per molecule. One can get a sense for the dynamics of this system by looking at the evolution of its averaged variables, , , , and , given by the set of ordinary differential equations (ODE)
(30)
where . Fig 1B shows the dynamics of , , , and .

**A**) A schematic of the genetic switch. B) Dynamics of average mRNA, *m*, protein, *n*, and the three states of promoter, *S*_{0}, *S*_{1} and *S*_{2}. The chosen reaction frequencies in inverse minutes were: *α*_{1} = *α*_{2} = 0.001, *β*_{1} = *β*_{2} = 1, *r*_{0} = 0.1, *r* = 10, *K* = 1, *k* = 0.05, *q* = 0.01.

Let us now employ the CME-GA to study the stochastic properties of a part of this system. Notice that the reactions were organized into two columns. The reactions in the left column effectuate a change in either *m* or in both *S*_{i} and *n* together, but not exclusively in *n*; the reactions that change only *n* appear in the right column. Hence, referring to the notation in the previous section, the left column represents the set **m** = {*S*_{0}, *S*_{1}, *S*_{2}, *m*}, while the right column represents the set **n** = {*n*}. The two equations, Eqs (20) and (27), then reduce to:
(31)
and
(32)

It is easy to check that, provided the initial probability is a Poisson distribution, the solution to both, Eqs (31) and (32), is also a Poisson distribution. Thus
(33)
and
(34)
where for notational simplicity the indexes **m** were omitted. Inserting *P*(*n*, *t*) and *Q*(*n*, *t*) into Eqs (31) and (32) respectively yields
(35) (36) (37)
where and . With the initial conditions *Q*(*n*, 0) = *P*(*n*, 0), and hence *g*(0) = *h*(0) = *λ*(0), we obtain
(38) (39) (40)
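The statement that a Poisson initial distribution remains Poisson can be checked numerically for a generic birth-death process of the kind that changes *n* only: production at a constant propensity and degradation at a rate proportional to *n*. The sketch below is illustrative: the names `birth`, `q`, `lam0` and their values are assumptions, not the parameters of Eq (31), and the closed form for the mean used in the comparison is the standard result for this process.

```python
import math


def poisson_pmf(lam, n_max):
    """Poisson probabilities for n = 0..n_max."""
    p = [math.exp(-lam)]
    for n in range(1, n_max + 1):
        p.append(p[-1] * lam / n)
    return p


def birth_death_rhs(P, birth, q):
    """dP/dt for production at constant propensity `birth` and decay q * n."""
    size = len(P)
    out = []
    for n in range(size):
        gain = birth * P[n - 1] if n > 0 else 0.0
        if n + 1 < size:
            gain += q * (n + 1) * P[n + 1]
        leave = (birth if n + 1 < size else 0.0) + q * n   # reflecting edge
        out.append(gain - leave * P[n])
    return out


def integrate_rk4(P, birth, q, t_end, steps):
    """Classical fourth-order Runge-Kutta integration of the truncated CME."""
    dt = t_end / steps
    for _ in range(steps):
        k1 = birth_death_rhs(P, birth, q)
        k2 = birth_death_rhs([p + 0.5 * dt * k for p, k in zip(P, k1)], birth, q)
        k3 = birth_death_rhs([p + 0.5 * dt * k for p, k in zip(P, k2)], birth, q)
        k4 = birth_death_rhs([p + dt * k for p, k in zip(P, k3)], birth, q)
        P = [p + dt / 6.0 * (a + 2 * b + 2 * c + d)
             for p, a, b, c, d in zip(P, k1, k2, k3, k4)]
    return P


birth, q, lam0, t_end, n_max = 4.0, 0.5, 3.0, 2.0, 60
numeric = integrate_rk4(poisson_pmf(lam0, n_max), birth, q, t_end, steps=400)
# Mean of the birth-death process started from mean lam0.
lam_t = lam0 * math.exp(-q * t_end) + birth / q * (1.0 - math.exp(-q * t_end))
max_diff = max(abs(x - y) for x, y in zip(numeric, poisson_pmf(lam_t, n_max)))
print("largest pointwise deviation from a Poisson distribution:", max_diff)
```

The deviation printed at the end reflects only the integration error and the truncation of the grid, which supports using a single time-dependent parameter to describe the whole distribution.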

The probability distribution for *τ* acquires a closed form:
(41)

Referring to Eq (28), we can readily compute the probabilities for each reaction in the left column to occur. In the order in which they appear in Eq (29), they read:
(42)
where
(43)
and 〈…〉 stands for . To work out the exact expressions for all the *p*s we can employ the formula
(44)
with Γ(.) being the Gamma function. Thus, we have
(45)

We now have all the ingredients to run the CME-GA. However, before we do, let us consider the computational expense involved in performing all the steps. In particular, solving Eq (41) for *τ* may slow down the algorithm considerably, as it requires an optimization algorithm of some kind. One way to avoid this is to solve Eq (41) approximately by assuming that the solution, i.e. *τ*, is small and expanding ln *Q*(*τ*) up to *τ*^{2}. The equation for *τ* then becomes
(46)
where
(47)

Writing *τ* = *τ*_{0} + *τ*_{1}, where *τ*_{0} satisfies ln(1/*ξ*) = *b*_{1}*τ*_{0} and *τ*_{1} is a correction, we obtain the ratio
(48)

Thus, as long as *τ*_{1}/*τ*_{0} < *ϵ*, where *ϵ* is some small number, e. g. 0.001, we may write
(49)

This equation is identical in structure to Eq (18), especially when we see that *b*_{1} = (*α*_{1} *S*_{0} + *α*_{2} *S*_{1})*λ*(0) + *β*_{1} *S*_{1} + *β*_{2} *S*_{2} + *r*_{0} + *rS*_{2} + *km*, which is just the sum of the propensities of all reactions in the left column but with *n* replaced by its average, *λ*(0). The condition *τ*_{1}/*τ*_{0} < *ϵ* must be incorporated into the CME-GA and checked for each cycle; if it fails, Eq (41) must be solved by some other means, e.g. an optimization algorithm.
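The per-cycle check described above might be organized as in the following sketch. It assumes, for illustration only, that the expansion has the form ln(1/*ξ*) ≈ *b*_{1}*τ* + *b*_{2}*τ*^{2}, so that to leading order |*τ*_{1}/*τ*_{0}| ≈ |*b*_{2}|*τ*_{0}/*b*_{1}; the coefficient `b2`, the function name `sample_tau`, and the `fallback` argument are hypothetical and do not reproduce Eqs (46) to (48).

```python
import math


def sample_tau(xi, b1, b2, eps=1e-3, fallback=None):
    """First-order waiting time with an accuracy check.

    xi       -- random number in (0, 1)
    b1, b2   -- coefficients of the assumed expansion
                ln(1/xi) ~ b1 * tau + b2 * tau**2
    eps      -- tolerance on the estimated ratio |tau1 / tau0|
    fallback -- callable xi -> tau used when the check fails, for example
                a root finder applied to the full expression for Q(tau)
    """
    tau0 = math.log(1.0 / xi) / b1
    # Leading-order estimate of the relative size of the correction tau1.
    ratio = abs(b2) * tau0 / b1
    if ratio < eps:
        return tau0
    if fallback is None:
        raise ValueError("first-order approximation rejected; supply a solver")
    return fallback(xi)


# Usage: the correction is negligible here, so the first-order value is kept.
tau = sample_tau(xi=0.9, b1=2.0, b2=1e-3)
```

Keeping the exact solver behind a callable means the cheap branch is taken whenever the check passes, and the expensive one is paid for only in the cycles that need it.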

Similarly for the reaction probabilities, we may simplify them by expanding 1/(*a*_{i} + *n*) in powers of *n* − 〈*n*〉 and then averaging each term:
(50)
where we used the fact that for a Poisson distribution 〈*n*〉 = 〈(*n* − 〈*n*〉)^{2}〉 = 〈(*n* − 〈*n*〉)^{3}〉, which in the present case is equivalent to *λ*(0). Here again we need to keep in mind that this approximation may become inaccurate (depending on the number of terms), as for instance when *λ*(0), *a*_{1} < 1.
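The accuracy of such an expansion can be gauged numerically. The sketch below compares the exact Poisson average of 1/(*a* + *n*), obtained by direct summation, with the Taylor expansion about *n* = 〈*n*〉 truncated after the third-order term, using the Poisson central moments quoted above. The function names and the two parameter pairs are illustrative choices, and the truncation order is an assumption that need not match the number of terms kept in Eq (50).

```python
import math


def poisson_mean_inverse(a, lam, n_max=400):
    """Exact <1/(a + n)> for Poisson-distributed n, by direct summation."""
    p, total = math.exp(-lam), 0.0
    for n in range(n_max + 1):
        total += p / (a + n)
        p *= lam / (n + 1)            # P(n + 1) from P(n)
    return total


def poisson_mean_inverse_taylor(a, lam):
    """Expansion of <1/(a + n)> about n = <n> through third order.

    Uses <n - lam> = 0 and <(n - lam)^2> = <(n - lam)^3> = lam.
    """
    s = a + lam
    return 1.0 / s + lam / s**3 - lam / s**4


for a, lam in [(5.0, 20.0), (0.5, 0.5)]:
    exact = poisson_mean_inverse(a, lam)
    approx = poisson_mean_inverse_taylor(a, lam)
    print(a, lam, exact, approx, abs(approx - exact) / exact)
```

The first parameter pair lies in the regime where the expansion is reliable, while the second has both *a* and the mean below 1 and shows the loss of accuracy mentioned in the text.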

Finally, before running the CME-GA, we need to address its fourth step. Remember that Eqs (33) and (34) are the solutions to Eqs (31) and (32) only if their initial distributions are Poissonian. However, when reactions 1–4 in the left column occur, we must add to or subtract from the system one copy of *n*, and then modify the new initial probability *P*(*n*, *τ*) according to step 4 of the CME-GA. This, however, will render Eqs (33) and (34) incorrect. There may be ways to overcome this problem; however, in the present case, with *α*_{1}, *α*_{2} ≪ 1, we are justified in ignoring it. Running the CME-GA with the parameters of Fig 1 leads to the results of Fig 2.

**A**) A superposition of 100 realizations generated by the GA. The black solid curve represents the average of 500 realizations, while the white curve is the solution of Eq (30) for . B) Probability distributions for *m* at *t* = 150min and *t* = 300min, showing the match between the CME-GA (asterisk) and the GA (bar) constructed from an ensemble size of 10000. C) Comparison between the CME-GA and the GA of the averages and standard deviations of *m*, *S*_{0}, *S*_{1} and *S*_{2}. The ensemble size was 1000.

### The Griffith model of a genetic oscillator

Consider now a larger network consisting of a promoter with five states, *S*_{0}, *S*_{1}, *S*_{2}, *S*_{3} and *S*_{4}, an mRNA, and a protein that can be in several conformations, e.g. when undergoing a multi-step phosphorylation [28], such that in its final conformation the protein can bind to its promoter and repress it (see Fig 3A). The reactions for this system are as follows:
(51)

**A**) A schematic of the Griffith model. B) Dynamics of average mRNA, *m*, protein, *n*, and the five states of promoter, *S*_{0}, *S*_{1}, *S*_{2}, *S*_{3} and *S*_{4}. The chosen reaction frequencies in inverse minutes were: *α*_{1} = *α*_{2} = *α*_{3} = *α*_{4} = 0.01, *β*_{1} = *β*_{2} = *β*_{3} = *β*_{4} = 1, *r* = 10, *K* = 1, *k* = 0.05, *q* = 0.05, *a* = 0.1. The number of protein conformations, *d* was set to 10.

Here again *m* refers to the mRNA copy number, *n*_{i} to the protein copy number, where the indexes *i* = 1, …, *d* label different protein conformations. The differential equations for , , , , , and read:
(52)

For some values of its reaction rates, this system can exhibit sustained oscillations, as shown in Fig 3.

The solutions to Eqs (20) and (27) for the right column are products of Poisson distributions,
(53) (54)
provided again that their initial distributions are also Poisson. Inserting these into Eqs (20) and (27) leads to differential equations for the *λ*s:
(55)
and another almost identical set for the *h*s but with *q* replaced with , and
(56)
with . The solutions for the *λ*s, *h*s and *g* are
(57)

Following the steps detailed in the previous section, we arrive at the same approximation for *τ*:
(58)
with the same error parameter as in Eq (48), *τ*_{1}/*τ*_{0}, but now with .

The reaction probabilities *p*_{1} to *p*_{10} are given by
(59)
with
(60)

Finally, making the approximation that reactions *p*_{1}-*p*_{8} do not alter *P*_{m}(**n**, 0) significantly, and expanding the terms 〈…〉 in *n*_{d} − *λ*_{d}(*τ*), we can run the CME-GA. The results are shown in Fig 4.

**A**) Graphs 1–3 show individual realizations generated by the GA. The fourth graph shows a superposition of 50 realizations. B) Probability distributions for *m* at *t* = 350min and *t* = 500min, showing the match between the CME-GA (asterisk) and the GA (bar) constructed from an ensemble size of 10000. C) Comparison between the CME-GA and the GA of the averages and standard deviations of *m*, *S*_{0}, *S*_{1}, *S*_{2}, *S*_{3} and *S*_{4}. The ensemble size was 1000.

## Discussion

The method presented herein provides a means of stochastically simulating a reaction sub-network. Because most of the information about the rest of the network is lost, its usefulness is limited to cases where partial information about a network is sufficient. The two examples discussed above illustrate the accuracy of this method. In terms of efficiency, Table 1 shows the computation times of the CME-GA and five other stochastic simulation algorithms that were used to simulate the two models. It is clear that the CME-GA is significantly more efficient than any of the other algorithms.

It is important to notice the relation between the speed of the CME-GA and the abundance of those molecular species that appear in the CME, Eq (20). Since the speed of the GA scales with the number of species and their abundance, and that of the CME does not (at least when it can be solved exactly), different partitions will lead to different speeds. If, for instance, we had chosen to partition either of the example systems such that the mRNA appeared in the CME, Eq (20), instead of the protein, and it was the protein that was simulated via the GA, the computational time would have increased drastically. Therefore, a network must be partitioned wisely. This of course may be in conflict with the user’s desire to simulate a particular set of species. Consequently, if the CME-GA is to remain superior in speed to other algorithms, it must be limited to partitions in which the species with large molecular numbers are placed in the non-simulated sub-network.

Although Eqs (20) and (27), which are necessary for the CME-GA to run, were derived exactly, without any assumptions, in practice they may not always be tractable and will require approximations. However, bisecting a system into two groups as prescribed above necessarily renders the CME less complex (Eq (1) vs. Eq (20)) and hence more manageable. It should be noted that as the simulated reaction network grows larger, the time between reactions becomes shorter. This means that Eqs (20) and (27), given a large sub-network (i.e. *G*_{1}), will only need to be solved for very short times. Another relief may come from moment closure methods [29–31]: since the propensities in Eq (28) can be expressed as a sum of statistical moments (see Eq (50)), one needs only to solve the set of equations for a few moments, instead of the full CME for the sub-system *G*_{2}; and the same goes for *Q*_{m}(**n**, *t*). This last approach might in fact be the most promising way of extending our algorithm to more complex systems.

Lastly, it bears mentioning that although the information about the sub-system *G*_{2} is lost, it may not always be completely lost for all types of systems. In both examples discussed above, when the original CME, Eq (1), is multiplied by the variables in *G*_{2} and summed over all variables, one ends up with
(61)
for the genetic switch, and
(62)
for the Griffith model. And because is computed via the CME-GA, all the above equations have a closed form.

## Acknowledgments

This research was supported by: the Interuniversity Attraction Poles program of the Belgian Science Policy Office, under grant IAP P7-35 (www.belspo.be); and the Onderzoeksraad of the Vrije Universiteit Brussel (www.vub.ac.be) through the strategic research program “Applied Physics and Systems Biology”. I would like to thank Ekaterina Ejkova for her technical support.

## Author Contributions

Conceived and designed the experiments: JA. Performed the experiments: JA. Analyzed the data: JA. Contributed reagents/materials/analysis tools: JA. Wrote the paper: JA.

## References

- 1. Doob JL. Topics in the Theory of Markoff Chains. Transactions of the American Mathematical Society 1945; 58 (3), 455–473
- 2. Gillespie DT. Exact Stochastic Simulation of Coupled Chemical Reactions. J. Phys. Chem. 1977; 81(25), 2340–2361
- 3. Gibson MA, Bruck J. Efficient Exact Stochastic Simulation of Chemical Systems with Many Species and Many Channels. J. Phys. Chem. 2000; 104(9), 1876–1889
- 4. Cao Y, Li H, Petzold L. Efficient formulation of the stochastic simulation algorithm for chemically reacting systems. J. Chem. Phys. 2004; 121, 4059 pmid:15332951
- 5. Gillespie DT. Approximate accelerated stochastic simulation of chemically reacting systems. J. Chem. Phys. 2001; 115(4), 1716
- 6. Cao Y, Gillespie DT, Petzold LR. Avoiding negative populations in explicit Poisson tau-leaping. J. Chem. Phys. 2005; 123(5), 054104 pmid:16108628
- 7. Cao Y, Gillespie DT, Petzold LR. Efficient step size selection for the tau-leaping simulation method. J. Chem. Phys. 2005; 124(4), 044109
- 8. Rathinam M, El Samad H. Reversible-equivalent-monomolecular tau: A leaping method for "small number and stiff" stochastic chemical systems. J. Comp. Phys. 2007; 224(2), 897–923
- 9. Walczak AM, Mugler A, Wiggins CH. Analytic methods for modeling stochastic regulatory networks. Methods Mol Biol. 2012; 880, 273–322 pmid:23361990
- 10. Ramsey S, Orrell D, Bolouri H. Dizzy: stochastic simulation of large-scale genetic regulatory networks. J. Bioinform. Comput. Biol. 2005; 3(2), 415–36 pmid:15852513
- 11. Mugler A, Walczak AM, Wiggins CH. Spectral solutions to stochastic models of gene expression with bursts and regulation. Phys. Rev. E 2011; 80, 041921
- 12. Wolf V, Goel R, Mateescu M, Henzinger TA. Solving the chemical master equation using sliding windows. BMC Systems Biology 2010; 4, 42 pmid:20377904
- 13. Albert J, Rooman M. Probability distributions for multimeric systems. J. Math. Biol. 2011; 10.1007/s00285-015-0877-0.
- 14. Munsky B, Khammash M. A multiple time interval finite state projection algorithm for the solution to the chemical master equation. J. Comp. Phys. 2007; 226(1), 818–835
- 15. Munsky B, Khammash M. Transient analysis of stochastic switches and trajectories with applications to gene regulatory networks. IET Syst Biol. 2008; 2(5), 323–33 pmid:19045827
- 16. MacNamara S, Burrage K, Sidje RB. Multiscale Modeling of Chemical Kinetics via the Master Equation. Multiscale Model. Simul. 2007; 6(4), 1146–1168
- 17. Jahnke T. On Reduced Models for the Chemical Master Equation. Multiscale Model. Simul. 2011; 9(4), 1646–1676
- 18. Smith S, Cianci C, Grima R. Model reduction for stochastic chemical systems with abundant species. J. Chem. Phys. 2015; 143, 214105 pmid:26646867
- 19. Alfonsi A, Cances E, Turinic G, Ventura BD, Huisinga W. Adaptive simulation of hybrid stochastic and deterministic models for biochemical systems. ESAIM: Proc. 2005; 14, 1–13
- 20. Jahnke T, Altintan D. Efficient simulation of discrete stochastic reaction systems with a splitting method. BIT Num Math 2010; 50(4), 797–822
- 21. Hellander A, Lotstedt P. Hybrid method for the chemical master equation. J. Comp. Phys. 2007; 227(1), 100–122
- 22. Burrage K, Tian T, Burrage P. A multi-scaled approach for simulating chemical reaction systems. Progress in Biophysics & Molecular Biology 2004; 85, 217–234
- 23. Salis H, Kaznessis Y. Accurate hybrid stochastic simulation of a system of coupled chemical or biochemical reactions. J. Chem. Phys. 2005; 122, 054103
- 24. Doi M. Stochastic theory of diffusion-controlled reaction. Journal of Physics A: Mathematical and General 1976; 9, 1465
- 25. Zel’Dovich YB, Ovchinnikov AA. The mass action law and the kinetics of chemical reactions with allowance for thermodynamic fluctuations of the density. Soviet Journal of Experimental and Theoretical Physics 1978; 47, 829
- 26. Peliti L. Renormalisation of fluctuation effects in the A+A to A reaction. Journal of Physics A: Mathematical and General 1986; 19, L365
- 27. Albert J, Rooman M. Design Principles of a Genetic Alarm Clock. PLoS ONE 2012; 7(11), e47256 pmid:23144809
- 28. Griffith JS. Mathematics of Cellular Control Processes. I. Negative feedback to one gene. J. Theor. Biol. 1968; 20, 202–208 pmid:5727239
- 29. Lee CH, Kim K-H, Kim P. A moment closure method for stochastic reaction networks. J. Chem. Phys. 2009; 130, 134107
- 30. Barzel B., Biham O. Stochastic analysis of dimerization systems. Phys. Rev. E. 2009; 80, 031117
- 31. Grima R. A study of the accuracy of moment-closure approximations for stochastic chemical kinetics. J. Chem. Phys. 2012; 136, 154105