A Hybrid of the Chemical Master Equation and the Gillespie Algorithm for Efficient Stochastic Simulations of Sub-Networks

Modeling the stochastic behavior of chemical reaction networks is an important endeavor in many aspects of chemistry and systems biology. The chemical master equation (CME) and the Gillespie algorithm (GA) are the two most fundamental approaches to such modeling; however, each has its own limitations: the GA may require long computing times, while the CME may demand unrealistic memory storage capacity. We propose a method that combines the CME and the GA and allows one to simulate stochastically only part of a reaction network. First, a reaction network is divided into two parts. The first part is simulated via the GA, while the solution of the CME for the second part is fed into the GA in order to update its propensities. The advantage of this method is that it avoids the need to solve the CME or stochastically simulate the entire network, which makes it highly efficient. One of its drawbacks, however, is that most of the information about the second part of the network is lost in the process. Therefore, this method is most useful when only partial information about a reaction network is needed. We tested this method against the GA on two systems of interest in biology, the gene switch and the Griffith model of a genetic oscillator, and have shown it to be highly accurate. Comparing this method to four different stochastic algorithms revealed it to be at least an order of magnitude faster than the fastest among them.


Introduction
In a network of chemical reactions, the molecular concentrations at any given time cannot be predicted with certainty; they can only be anticipated with a certain probability. This probability can in principle be determined by solving (analytically or numerically) the chemical master equation (CME). Attempting to do so, however, can more often than not be a frustrating exercise: except for a handful of simple cases, the CME cannot be solved analytically, and for many interesting cases even a numerical solution can be nearly impossible to attain. One way around this obstacle is an algorithm proposed by Doob [1] and later presented and popularized by Gillespie [2]. These works showed that the information stored in the CME can be extracted through a series of relatively simple steps coupled with a pseudo-random number generator. Known today as the Gillespie algorithm (GA) (also known as kinetic Monte Carlo or the stochastic simulation algorithm (SSA)), this procedure guarantees an exact solution to the CME, provided the steps are repeated sufficiently many times to build a statistically significant ensemble of data points. The solution to the CME can thus be reconstructed step by step, without the enormous memory storage capacity that is usually required to solve the CME directly. The one drawback of the GA is that the number of steps required scales with the number of reactions and the magnitude of their rates. Consequently, for large reaction networks the running time may become impractical.
Since the GA first appeared, researchers have devised faster versions of it, some of which are exact [3,4], in the sense that they give statistically identical answers to the CME, while others rely on approximations [5][6][7][8]. Together with the Langevin approximation [9], these algorithms comprise a library of methods to choose from when simulating reaction networks. The Dizzy package [10], for instance, is one such library, containing four stochastic simulators of various speed and accuracy.
With the advent of stochastic algorithms, the appeal for solving the CME directly has not diminished however. Having an analytical solution to the CME, even an approximate one, is extremely useful and can provide insight into the stochastic properties of chemically interacting networks. It is not surprising then that many methods for finding approximate solutions to the CME exist and continue to appear [11][12][13][14][15][16].
There exists a class of methods which involve partitioning a reaction network in a way that facilitates either a solution to the CME [17,18] or a faster stochastic simulation [19][20][21] of one part of the reaction network, while yielding only partial information about the rest of the network. Other partitioning methods rely on large differences among the values of the reaction rates [22,23]. Such methods take advantage of the fast rates by considering the chemical species affected by them to be in a quasi-steady state.
In this paper, we present another method in which synergy between the CME and the GA is exploited for the purpose of simulating one part of a reaction network. While previous methods of this kind rely on a priori approximations of either the CME or the GA, we begin by deriving the exact equations from which the next reaction time and the next reaction probability can be computed, as well as the exact CME for the non-simulated part of the network. Once these equations take form, they may be solved by means of approximations, either as a matter of necessity or in order to speed up the simulation process. We show on two biologically relevant examples how this method can be applied and discuss its limitations.

Chemical Master Equation
The time evolution of the joint probability distribution P(n, t) for a chemically reacting system comprising N molecular species and J reactions is governed by the chemical master equation (CME):

∂P(n, t)/∂t = Σ_{μ=1}^{J} [ a_μ(n − υ_μ) P(n − υ_μ, t) − a_μ(n) P(n, t) ],   (1)

where n is short for the set {n_1, n_2, . . ., n_N}, υ_μ is the vector of changes {υ_1μ, . . ., υ_Nμ}, and a_1(n), . . ., a_J(n) are the reaction propensities, which, for our purposes here, will depend on time only implicitly, i.e. through the variables n.
The matrix elements υ_iμ specify the change in n_i due to the μth reaction. It will be useful later on to represent Eq (1) in another way [9, 24-26]. Let us define the vector state

|ψ(t)⟩ = Σ_n P(n, t) ∏_{i=1}^{N} |n_i⟩,   (2)

and operators Â_i, Â_i† and n̂_i, whose action on the state |n_i⟩ is

Â_i|n_i⟩ = |n_i − 1⟩,   Â_i†|n_i⟩ = |n_i + 1⟩,   n̂_i|n_i⟩ = n_i|n_i⟩.   (3)

The index i means that an operator acts only on the ith vector state, leaving the rest untouched. The vector state |n_i⟩ and its transpose ⟨n_i| are simply the orthogonal unit column and row vectors, respectively, with a 1 in the position labeled by n_i. The vector state product in Eq (2) can be thought of as a vector whose elements are the individual vector states |n_i⟩. A product of operators acting on different species, e.g. Â_1Â_2, is then understood as the tensor product Â_1 ⊗ Â_2 ⊗ 1_3 ⊗ · · · ⊗ 1_N, where 1_i is the identity operator: 1_i|n_i⟩ = |n_i⟩. With this notation, the master equation can be written in the form

d|ψ(t)⟩/dt = Ĥ|ψ(t)⟩,   (8)

where

Ĥ = Σ_{μ=1}^{J} [ ∏_{j=1}^{N} (Â_j†)^{θ(υ_jμ)|υ_jμ|} (Â_j)^{θ(−υ_jμ)|υ_jμ|} − 1 ] a_μ(n̂)   (9)

is an operator acting on the vector state ∏_{i=1}^{N} |n_i⟩, and θ(·) is a step function centered around zero.
Let us check that Eqs (8) and (1) are indeed identical. To see how the operator Ĥ in Eq (9) acts on the state |ψ⟩, let us first look at how the operators within it act on the individual vector states |n_j⟩. We have

(Â_j†)^{θ(υ_jμ)|υ_jμ|} (Â_j)^{θ(−υ_jμ)|υ_jμ|} |n_j⟩ = |n_j + υ_jμ⟩,   (10)

where we used the fact that υ_jμ = θ(υ_jμ)|υ_jμ| − θ(−υ_jμ)|υ_jμ|. The second term in the brackets of Eq (9) is merely the identity operator, and leaves the state unchanged. The operator a_μ(n̂) acting on the product state ∏_j |n_j⟩ also leaves it unchanged and itself becomes a number, a_μ(n). Putting the above relations together, we can write

Ĥ|ψ(t)⟩ = Σ_{μ=1}^{J} Σ_n a_μ(n) P(n, t) [ ∏_j |n_j + υ_jμ⟩ − ∏_j |n_j⟩ ].   (12)

The left hand side of Eq (8) is

d|ψ(t)⟩/dt = Σ_n [∂P(n, t)/∂t] ∏_j |n_j⟩.   (13)

The only way expressions (12) and (13) can be equal is if the coefficients of each product vector state ∏_j |n_j⟩ on both sides are equal; relabeling n → n + υ_μ in the first term of Eq (12) then leads to Eq (1).
The formal solution to Eq (8) is

|ψ(t)⟩ = e^{Ĥt}|ψ(0)⟩,   (14)

where |ψ(0)⟩ is the initial vector state, specified entirely by P(n, 0). The operator e^{Ĥt} is called the evolution operator. Multiplying both sides of Eq (14) by ⟨n| and invoking the orthogonality relation ⟨n′|n⟩ = δ_{n′_1 n_1} δ_{n′_2 n_2} · · · δ_{n′_N n_N}, we obtain the probability distribution

P(n, t) = ⟨n|e^{Ĥt}|ψ(0)⟩,   (15)

where for brevity |n⟩ = ∏_{i=1}^{N} |n_i⟩. With this formalism it is easy to write down quantities such as transition probabilities. For example,

W(n′, t′|n, t) = ⟨n′|e^{Ĥ(t′−t)}|n⟩   (16)

is the probability to find the system in the state n′ at time t′ if at time t it was in the state n.
Lastly, let us also write down the identity operator in the form

1 = Σ_n |n⟩⟨n|,   (17)

as it will be useful in later sections.
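To make the formalism concrete, the generator Ĥ can be written as an ordinary matrix for a one-species example and exponentiated numerically. The sketch below is not part of the paper: the rates k and g and the truncation n_max are toy choices. It builds the CME generator of a birth-death process and evaluates P(n, t) = ⟨n|e^{Ĥt}|ψ(0)⟩ as in Eq (15):

```python
# Sketch: the evolution-operator picture of Eqs (14)-(15) for a birth-death
# process, 0 -(k)-> 1 molecule, n -(g*n)-> n-1, truncated at n_max.
import numpy as np
from scipy.linalg import expm

def birth_death_generator(k, g, n_max):
    """Matrix H such that dP/dt = H @ P on the truncated state space {0..n_max}."""
    H = np.zeros((n_max + 1, n_max + 1))
    for n in range(n_max + 1):
        if n < n_max:
            H[n + 1, n] += k          # gain: birth takes n to n+1
            H[n, n] -= k              # loss: birth out of n
        if n > 0:
            H[n - 1, n] += g * n      # gain: death takes n to n-1
            H[n, n] -= g * n          # loss: death out of n
    return H

k, g, n_max = 5.0, 1.0, 60
H = birth_death_generator(k, g, n_max)
P0 = np.zeros(n_max + 1); P0[0] = 1.0    # start with zero molecules
P = expm(H * 10.0) @ P0                  # P(n, t) = <n| e^{Ht} |psi(0)>, Eq (15)
mean = np.dot(np.arange(n_max + 1), P)
```

At t = 10 the distribution has essentially reached its Poisson steady state with mean k/g; the truncation n_max merely needs to sit well above that mean.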

Gillespie Algorithm
The idea behind the GA is to simulate a chain of Markov processes by sampling the probability distribution of the time elapsed since the last reaction, τ, and the probability that a specific reaction, μ, is the one that occurs at τ (given that some reaction occurs then). The steps are as follows:

1. At some initial time t (e.g. t = 0), select the initial state n and compute the propensities a_μ(n).

2. Select two random numbers r_1 and r_2, uniformly distributed on [0, 1].

3. Compute τ using the formula

τ = (1/a_0) ln(1/r_1),   (18)

where a_0 = Σ_μ a_μ(n).

4. Find the smallest integer j that satisfies

Σ_{i=1}^{j} a_i(n) > r_2 a_0,   (19)

and set μ = j.

5. Update the time, t → t + τ, and the state, n → n + υ_μ, and return to step 2.
Repeating these steps until t reaches some final time leads to a particular path, or realization, for n. In order to obtain the same information within this time interval as is contained in the CME, one must compute this realization infinitely many times. It is in this sense that the GA and the CME are exactly equivalent. Of course, in practice one only needs to compute a finite number of realizations to extract meaningful information about the system. Depending on how many realizations are considered "sufficient" and how long it takes to compute each realization, the GA may be a fast route to solving the CME, or it may be a very slow one. In the next section we will show how one can combine the CME and the GA in order to simulate stochastically a part of a system.
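The steps above can be sketched in a few lines; the stoichiometry and propensity functions below are toy stand-ins (a birth-death process), not a system from the paper:

```python
# A minimal direct-method Gillespie simulation (illustrative sketch).
import numpy as np

def gillespie(n0, stoich, propensities, t_final, rng):
    """stoich: (J, N) array of change vectors; propensities: n -> length-J array."""
    t, n = 0.0, np.array(n0, dtype=int)
    times, states = [t], [n.copy()]
    while t < t_final:
        a = propensities(n)
        a0 = a.sum()
        if a0 == 0:                      # no reaction can fire
            break
        r1, r2 = rng.random(), rng.random()
        tau = np.log(1.0 / r1) / a0      # time to next reaction
        mu = np.searchsorted(np.cumsum(a), r2 * a0)  # which reaction fires
        t += tau
        n += stoich[mu]
        times.append(t); states.append(n.copy())
    return np.array(times), np.array(states)

# Toy example: birth at rate 5, degradation at rate 1 per molecule
stoich = np.array([[1], [-1]])
props = lambda n: np.array([5.0, 1.0 * n[0]])
rng = np.random.default_rng(0)
times, states = gillespie([0], stoich, props, 200.0, rng)
```

The `searchsorted` call implements the selection of the smallest j whose cumulative propensity exceeds r_2 a_0.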

CME-GA hybrid
Consider a reaction network of J reactions with propensities {a_1, . . ., a_J}, comprising two sets of molecular species: m = {m_1, . . ., m_{N_1}} and n = {n_1, . . ., n_{N_2}} (N_1 + N_2 = N). The propensities are functions of m and n, but not of time. Let us arrange the reactions into two groups, G_1 = {a_1, . . ., a_{K−1}} and G_2 = {a_K, . . ., a_J}, such that the reactions in G_1 can affect both m and n, while the reactions in G_2 can affect only n. During a time interval τ in which no reaction occurs in G_1, the CME for G_2 can be written as follows:

∂P_m(n, t)/∂t = Σ_{μ=K}^{J} [ a_μ(m, n − υ_μ) P_m(n − υ_μ, t) − a_μ(m, n) P_m(n, t) ],   (20)

where υ_μ now collects the changes in n only. The subscript m in P_m(n, t) serves as a reminder that the solution depends on the values of m. Note that, by definition, n may change during τ only via the reaction channels in G_2, not in G_1. Let us now see how one can sample τ and the next reaction in G_1 from their respective probability distributions, in a manner similar to the one described in the previous section. We begin by deriving the probability distribution for τ.
Let us divide time into L discrete infinitesimal intervals Δt, such that ΔtL = τ. Using the notation introduced earlier, the probability that no reaction in G_1 occurs in the time τ can be expressed as

Q(τ) = Σ_{n^1, n^2, . . ., n^L} ⟨n^L| e^{ĤΔt} e^{−S_{L−1}Δt} |n^{L−1}⟩ ⟨n^{L−1}| e^{ĤΔt} e^{−S_{L−2}Δt} |n^{L−2}⟩ · · · ⟨n^2| e^{ĤΔt} e^{−S_1Δt} |n^1⟩ ⟨n^1| e^{ĤΔt} e^{−S_0Δt} |φ⟩,   (21)

where Ĥ is the evolution operator of Eq (20), S_k = Σ_{i=1}^{K−1} a_i(m, n^k), k refers to the kth time interval, and |φ⟩ = Σ_{n^0} P_m(n^0, 0)|n^0⟩ is the initial state. The exponential terms exp[−S_kΔt] are the probabilities that, starting at time t_k = kΔt, no reaction in G_1 occurs in the time Δt. Since the S_k are numbers, not operators, we can move them wherever we want within the total product; especially useful is to place each exp[−S_kΔt] just to the left of the state |n^k⟩, as was done above. Now, thanks to the fact that

e^{−S_kΔt}|n^k⟩ = e^{−ŜΔt}|n^k⟩,   with   Ŝ = Σ_{i=1}^{K−1} a_i(m, n̂),   (22)

and the relation lim_{Δt→0} exp[ΔtB̂] exp[ΔtĈ] = exp[Δt(B̂ + Ĉ)] for arbitrary operators B̂ and Ĉ, we may write

Q(τ) = Σ_{n^1, . . ., n^L} ⟨n^L| e^{Ĥ′Δt} |n^{L−1}⟩ ⟨n^{L−1}| e^{Ĥ′Δt} |n^{L−2}⟩ · · · ⟨n^1| e^{Ĥ′Δt} |φ⟩,   (23)

where Ĥ′ = Ĥ − Ŝ. Recalling Eq (17), we can set all the intermediate sums Σ_{n^k} |n^k⟩⟨n^k| to unity. The expression for Q(τ) can now be written as

Q(τ) = Σ_n Q_m(n, τ),   (25)

where, setting t_L to τ,

Q_m(n, τ) = ⟨n| e^{Ĥ′τ} |φ⟩.   (26)

This expression is identical in structure to Eq (15) and as such can be expressed in a differential form:

∂Q_m(n, t)/∂t = Σ_{μ=K}^{J} a_μ(m, n − υ_μ) Q_m(n − υ_μ, t) − Σ_{μ=1}^{J} a_μ(m, n) Q_m(n, t),   (27)

with the initial condition Q_m(n, 0) = P_m(n, 0). This equation resembles Eq (20), except that now all the propensities, from both G_1 and G_2, appear in the second (loss) term.
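Eqs (25)-(27) can be checked numerically on a truncated state space. In the sketch below (illustrative only; all rates are toy values), G_2 is a birth-death process for n and a single G_1 reaction has propensity αn; integrating the analogue of Eq (27) and summing over n yields the survival probability Q(τ):

```python
# Numerical check of Eqs (25)-(27) on a truncated lattice (not the paper's code).
# G2: birth at rate c, death at rate g*n; one G1 reaction with propensity alpha*n.
import numpy as np
from scipy.linalg import expm

c, g, alpha, n_max = 3.0, 1.0, 0.2, 80
Hp = np.zeros((n_max + 1, n_max + 1))        # generator of Eq (27): H' = H - S
for n in range(n_max + 1):
    if n < n_max:
        Hp[n + 1, n] += c                    # G2 gain terms only
        Hp[n, n] -= c
    if n > 0:
        Hp[n - 1, n] += g * n
        Hp[n, n] -= g * n
    Hp[n, n] -= alpha * n                    # G1 propensity enters only the loss

Q0 = np.zeros(n_max + 1); Q0[10] = 1.0       # start with exactly 10 copies of n
Q_tau = expm(Hp * 0.5) @ Q0                  # Q_m(n, tau) at tau = 0.5
Q = Q_tau.sum()                              # Eq (25): survival probability
```

Because the G_1 propensity enters only the loss term, probability leaks out of the lattice at exactly the rate at which the first G_1 reaction would fire, so Σ_n Q_m(n, τ) decreases monotonically from 1.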
Next we need to compute the probability p_μ that the μth reaction in G_1 occurs at time τ. This is the probability that the μth reaction occurs given a specific set n, multiplied by the probability of having n at time τ, summed over all n. In symbols:

p_μ = Σ_n [ a_μ(m, n) / Σ_{j=1}^{K−1} a_j(m, n) ] Q_m(n, τ)/Q(τ).   (28)

We now have everything we need to simulate the evolution of m via the CME-GA. Here are the steps:

1. Select the initial set m and the initial probability P_m(n, 0), and hence Q_m(n, 0) (since P_m(n, 0) = Q_m(n, 0)).

2. Solve Eqs (20) and (27), and compute Q(τ) and p_μ for μ = 1, . . ., K − 1 according to Eqs (25) and (28).

3. Compute τ and select the next reaction μ as follows:

i. Generate a random real number ξ_1 in the range [0, 1] and solve ξ_1 = Q(τ) for τ.

ii. Generate another random real number ξ_2 in the range [0, 1] and select the smallest integer k that satisfies the condition Σ_{j=1}^{k} p_j > ξ_2. Set μ = k.

4. Update m, and let P_m(n, τ) be the new initial condition for Eqs (20) and (27), provided the selected reaction does not effectuate a change in any n_i. If the selected reaction changes an n_i by ±w_i (w_i = 1, 2, . . .), the new initial probability must first be modified according to

P_m(n, τ) → L̂_i P_m(n, τ),

where L̂_i is an operator that shifts the variable n_i, replacing P_m(. . ., n_i, . . ., τ) with P_m(. . ., n_i ∓ w_i, . . ., τ) and assigning zero probability to values of n_i that would become unreachable. If more than one species of n is affected by a reaction in G_1, the transformation must be applied to all of them, e.g. L̂_1 L̂_2 · · · P_m(n, τ).
This procedure allows one to simulate stochastically a part of a reaction network, i.e. m, at the expense of losing information about the rest of the network, i.e. n. However, the tractability of this algorithm will depend on the system of interest and on the way in which it is partitioned. If, for instance, a particular choice of partition leads to Eqs (20) and/or (27) being too complicated to solve efficiently, the speed of this algorithm may end up being inferior to that of other stochastic algorithms. Another obstacle to efficiency is having to solve ξ_1 = Q(τ) for τ, which, depending on the particulars of Q(τ), might be a difficult task. One way to solve for τ is to use a minimization algorithm that optimizes the measure (ξ_1 − Q(τ))² with respect to τ. This, however, will likely require a number of iterations, calling into question the efficiency of the algorithm. The same can be said of Eq (28), in which the summation(s) may or may not have a closed form. These potential difficulties require not only that the system be partitioned wisely, but also that some of the steps above be simplified or approximated. In the next sections we will test the CME-GA on two biological systems and see how it may be applied effectively and accurately.
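As a minimal runnable illustration of steps 1-4 (a sketch with toy rates, not the paper's code), take G_1 = {mRNA production, propensity r; mRNA degradation, propensity km} and G_2 = {translation, propensity Km; protein degradation, propensity qn}. Since no G_1 propensity depends on n here, Q(τ) is a simple exponential, and Eq (20) is solved by a Poisson distribution whose mean λ obeys dλ/dt = Km − qλ; the hybrid then simulates m exactly as the GA would, while carrying the protein distribution along analytically:

```python
# Minimal CME-GA loop for the simplest partition (toy rates assumed).
import numpy as np

rng = np.random.default_rng(1)
r, k, K, q = 2.0, 0.1, 10.0, 1.0   # mRNA prod./deg., translation, protein deg.
t, m, lam = 0.0, 0, 0.0            # time, mRNA copy number, Poisson mean of n
history = []

while t < 100.0:
    a = np.array([r, k * m])       # G1 propensities (n-independent in this toy)
    a0 = a.sum()
    tau = np.log(1.0 / rng.random()) / a0        # step 3i: solve xi1 = Q(tau)
    # step 4 bookkeeping: propagate the G2 CME solution over [t, t + tau],
    # i.e. lam(t + tau) from d(lam)/dt = K*m - q*lam with m held fixed
    lam = (lam - K * m / q) * np.exp(-q * tau) + K * m / q
    mu = np.searchsorted(np.cumsum(a), rng.random() * a0)   # step 3ii
    m += 1 if mu == 0 else -1
    t += tau
    history.append((t, m, lam))
```

The payoff is visible in the state: m is a stochastic trajectory, while the entire distribution of n is summarized by the single number lam.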

The genetic switch
Let us consider a single-gene motif with positive autoregulation and a promoter cooperativity of 2. Owing to its positive feedback, this system can exhibit very large noise [27], and is therefore of interest in systems biology. The simplest yet realistic version of this gene motif is described by the following reactions (see Fig 1A), with the propensity of each reaction given in parentheses:

S_0 + n → S_1 (α_1S_0n)        m → m + n (Km)
S_1 → S_0 + n (β_1S_1)         n → ∅ (qn)
S_1 + n → S_2 (α_2S_1n)
S_2 → S_1 + n (β_2S_2)                              (29)
∅ → m (r_0 + rS_2)
m → ∅ (km)

where m stands for the copy number of mRNA molecules, n for the copy number of proteins, and S_0, S_1 and S_2 label the promoter states: unoccupied, occupied by one protein, and occupied by two proteins, respectively. The parameters α_1, α_2, β_1, β_2, r, r_0, K, k and q are the reaction frequencies per molecule. One can get a sense for the dynamics of this system by looking at the evolution of its averaged variables, S_0, S_1, S_2, m and n, given by the set of ordinary differential equations (ODE)

Ṡ_0 = β_1S_1 − α_1S_0n,
Ṡ_1 = α_1S_0n + β_2S_2 − β_1S_1 − α_2S_1n,
Ṡ_2 = α_2S_1n − β_2S_2,                             (30)
ṁ = r_0 + rS_2 − km,
ṅ = Km − qn + β_1S_1 + β_2S_2 − α_1S_0n − α_2S_1n,

where the bars denoting averages have been omitted for brevity. Fig 1B shows the dynamics of S_0, S_1, S_2, m and n. Let us now employ the CME-GA to study the stochastic properties of a part of this system. Notice that the reactions in Eq (29) are organized into two columns. The reactions in the left column effectuate a change in either m or in both S_i and n together, but not exclusively in n; the reactions that change only n appear in the right column. Hence, referring to the notation of the previous section, the left column represents the set m = {S_0, S_1, S_2, m}, while the right column represents the set n = {n}. The two equations, Eqs (20) and (27), then read

∂P_m(n, t)/∂t = Km[P_m(n−1, t) − P_m(n, t)] + q[(n+1)P_m(n+1, t) − nP_m(n, t)],   (31)

∂Q_m(n, t)/∂t = Km Q_m(n−1, t) + q(n+1)Q_m(n+1, t) − [Km + qn + (α_1S_0 + α_2S_1)n + β_1S_1 + β_2S_2 + r_0 + rS_2 + km] Q_m(n, t).   (32)

It is easy to check that, provided the initial probability P_m(n, 0) is a Poisson distribution, the solutions to Eqs (31) and (32) are

P(n, t) = e^{−λ(t)} λ(t)^n / n!,   (33)

Q(n, t) = g(t) e^{−h(t)} h(t)^n / n!,   (34)

where for notational simplicity the indexes m were omitted.
Inserting P(n, t) and Q(n, t) into Eqs (31) and (32), respectively, yields

λ̇ = Km − qλ,   (36)

ḣ = Km − q̄h,   (37)

ġ = −[c + (q̄ − q)h] g,   (38)

where q̄ = q + α_1S_0 + α_2S_1 and c = β_1S_1 + β_2S_2 + r_0 + rS_2 + km. With the initial conditions Q(n, 0) = P(n, 0), and hence g(0) = 1 and h(0) = λ(0), we obtain

h(t) = (λ(0) − Km/q̄) e^{−q̄t} + Km/q̄,   (39)

and the same expression for λ(t) with q̄ replaced by q. The probability distribution for τ then acquires a closed form:

Q(τ) = g(τ) = exp{ −cτ − (q̄ − q)[ (Km/q̄)τ + (λ(0) − Km/q̄)(1 − e^{−q̄τ})/q̄ ] }.   (41)

Referring to Eq (28), we can readily compute the probabilities for each reaction in the left column to occur; in the order in which they appear in Eq (29), they involve averages of terms of the form 1/(a_i + n), where the a_i are combinations of the rate parameters.

We now have all the ingredients to run the CME-GA. However, before we do, let us consider the computational expense involved in performing all the steps. In particular, solving Eq (41) for τ may slow down the algorithm considerably, as it requires an optimization algorithm of some kind. One way to avoid this is to solve Eq (41) approximately by assuming that the solution τ is small and expanding ln Q(τ) up to order τ². The equation for τ then becomes

ln(1/ξ_1) = b_1τ + (b_2/2)τ²,   (47)

where

b_1 = c + (q̄ − q)λ(0),   b_2 = (q̄ − q)[Km − q̄λ(0)].

Writing τ = τ_0 + τ_1, where τ_0 satisfies ln(1/ξ_1) = b_1τ_0 and τ_1 is a correction, we obtain the ratio

τ_1/τ_0 = −(b_2/2b_1²) ln(1/ξ_1).   (48)

Thus, as long as τ_1/τ_0 < ε, where ε is some small number, e.g. 0.001, we may write

τ = (1/b_1) ln(1/ξ_1).   (49)

This equation is identical in structure to Eq (18), especially when we note that b_1 = (α_1S_0 + α_2S_1)λ(0) + β_1S_1 + β_2S_2 + r_0 + rS_2 + km, which is just the sum of all the propensities in the left column but with n replaced by its average, λ(0). The condition τ_1/τ_0 < ε must be incorporated into the CME-GA and checked in each cycle; if it fails, Eq (41) must be solved by some other means, e.g. an optimization algorithm.
Similarly for the reaction probabilities: we may simplify them by expanding 1/(a_i + n) in powers of n − ⟨n⟩ and then averaging each term, where we used the fact that for a Poisson distribution ⟨n⟩ = ⟨(n − ⟨n⟩)²⟩ = ⟨(n − ⟨n⟩)³⟩, which in the present case equals λ(0). Here again we need to keep in mind that this approximation may become inaccurate (depending on the number of terms retained), for instance when λ(0), a_1 < 1. Finally, before running the CME-GA, we need to address its fourth step. Recall that expressions (33) and (34) are the solutions to Eqs (31) and (32) only if their initial distributions are Poissonian. However, when reactions 1-4 in the left column occur, we must add to or subtract from the system one copy of n, and then modify the new initial probability P(n, τ) according to step 4 of the CME-GA. This would render Eqs (33) and (34) incorrect. There may be ways to overcome this problem; however, in the present case, with α_1, α_2 ≪ 1, we are justified in ignoring it. Running the CME-GA with these approximations in place yields results in excellent agreement with the full GA.
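The τ-sampling logic described above (accept the linear solution τ_0 when the correction τ_1 is negligible, otherwise solve ξ_1 = Q(τ) numerically) can be sketched as follows. The closed form used for ln Q(τ) mirrors the structure derived above, with β playing the role of q̄ − q; all numerical values in the example call are toy assumptions:

```python
# Sketch of the tau-sampling step: quadratic expansion of ln Q(tau) when the
# correction tau1/tau0 is small, root finding otherwise.
import numpy as np
from scipy.optimize import brentq

def ln_Q(tau, c, beta, Km, qbar, lam0):
    """ln Q(tau) = -c*tau - beta * integral_0^tau h(t) dt for a relaxing mean h."""
    integral_h = (Km / qbar) * tau \
        + (lam0 - Km / qbar) * (1.0 - np.exp(-qbar * tau)) / qbar
    return -c * tau - beta * integral_h

def sample_tau(xi, c, beta, Km, qbar, lam0, eps=1e-3):
    L = np.log(1.0 / xi)
    b1 = c + beta * lam0                 # sum of G1 propensities at n = lam(0)
    b2 = beta * (Km - qbar * lam0)       # coefficient of tau^2/2 in -ln Q
    tau0 = L / b1
    tau1 = -0.5 * (b2 / b1) * tau0 ** 2  # first-order correction
    if abs(tau1 / tau0) < eps:
        return tau0 + tau1
    # fall back: solve xi = Q(tau) exactly; expand the bracket until sign change
    hi = tau0
    while ln_Q(hi, c, beta, Km, qbar, lam0) > -L:
        hi *= 2.0
    return brentq(lambda s: ln_Q(s, c, beta, Km, qbar, lam0) + L, 0.0, hi)
```

Because −ln Q(τ) is monotonically increasing (its derivative is c + βh(τ) with h ≥ 0), the bracket-doubling loop always finds a sign change and `brentq` converges.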

The Griffith model of a genetic oscillator
Consider now a larger network consisting of a promoter with five states, S_0, . . ., S_4, an mRNA, and a protein that can exist in several conformations, e.g. when undergoing a multi-step phosphorylation [28], such that in its final conformation the protein can bind to its promoter and repress it (see Fig 3A). Here again m refers to the mRNA copy number and n_i to the protein copy numbers, where the index i = 1, . . ., d labels the different protein conformations. The reactions mirror those of the genetic switch: the left column (G_1) contains the promoter binding and unbinding steps (rates α_1, . . ., α_4 and β_1, . . ., β_4), transcription, and mRNA degradation, while the right column (G_2) contains translation (propensity Km), the conformational conversions n_1 → n_2 → · · · → n_d (rate a per molecule), and the degradation of n_d (rate q). The deterministic equations for the averages S_0, . . ., S_4, m and n_i are constructed in the same way as for the genetic switch; for some values of its reaction rates, this system exhibits sustained oscillations, as shown in Fig 3.

The solutions to Eqs (20) and (27) for the right column are products of Poisson distributions, provided again that their initial distributions are also Poissonian. Inserting these into Eqs (20) and (27) leads to differential equations for the λs:

λ̇_1 = Km − aλ_1,
λ̇_i = a(λ_{i−1} − λ_i),   i = 2, . . ., d − 1,
λ̇_d = aλ_{d−1} − qλ_d,

an almost identical set for the hs, but with q replaced by q̄ = q + α_1S_0 + α_2S_1 + α_3S_2 + α_4S_3, and

ġ = −[c + (q̄ − q)h_d] g,   (56)

where c denotes the sum of the n-independent propensities in the left column. The solutions for the λs, hs and g follow by integrating the linear chain sequentially. Following the steps detailed in the previous section, we arrive at the same approximation for τ, τ = (1/b_1) ln(1/ξ_1), with the same error parameter as in Eq (48), τ_1/τ_0, but now with b_2 = (q̄ − q)[aλ_{d−1}(0) − q̄λ_d(0)].
The propensities p_1 to p_10 are computed from Eq (28) in the same manner as for the genetic switch. Finally, making the approximation that reactions p_1-p_8 do not alter P_m(n, 0) significantly, and expanding the averaged terms ⟨. . .⟩ in powers of n_d − λ_d(τ), we can run the CME-GA. The results again show excellent agreement with the GA.
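Between G_1 events, the Poisson means of the conformation chain obey the linear cascade λ̇_1 = Km − aλ_1, λ̇_i = a(λ_{i−1} − λ_i), λ̇_d = aλ_{d−1} − qλ_d with m held fixed. When sequential closed-form integration becomes unwieldy, a direct numerical integration is a simple alternative; the values of K, a, q, d and m below are toy assumptions:

```python
# Integrating the Poisson-mean cascade of the conformation chain numerically
# (illustrative sketch with toy parameters; requires d >= 2).
import numpy as np
from scipy.integrate import solve_ivp

K, a, q, d, m = 4.0, 2.0, 0.5, 4, 3   # translation, conversion, degradation, chain length, mRNA

def lam_dot(t, lam):
    dlam = np.empty(d)
    dlam[0] = K * m - a * lam[0]              # production from mRNA, conversion out
    for i in range(1, d - 1):
        dlam[i] = a * (lam[i - 1] - lam[i])   # pass-through conformations
    dlam[d - 1] = a * lam[d - 2] - q * lam[d - 1]  # last conformation degrades
    return dlam

sol = solve_ivp(lam_dot, (0.0, 50.0), np.zeros(d), rtol=1e-8)
lam_end = sol.y[:, -1]
# steady state: lam_1 .. lam_{d-1} tend to K*m/a, lam_d tends to K*m/q
```

The steady-state check (intermediates at Km/a, the final conformation at Km/q) gives a quick sanity test of the integration.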

Discussion
The method presented herein provides a means of stochastically simulating a reaction sub-network. Because most of the information about the rest of the network is lost, its usefulness is limited to cases where partial information about a network is sufficient. The two examples discussed above illustrate the accuracy of this method. In terms of efficiency, Table 1 shows the computation times of the CME-GA and four other stochastic simulation algorithms that were used to simulate the two models. It is clear that the CME-GA is significantly more efficient than any of the other algorithms.
It is important to notice the relation between the speed of the CME-GA and the abundance of those molecular species that appear in the CME, Eq (20). Since the speed of the GA scales with the number of species and their abundance, while that of the CME does not (at least when the CME can be solved exactly), different partitions will lead to different speeds. If, for instance, we had chosen to partition either of the example systems such that the mRNA appeared in the CME, Eq (20), instead of the protein, and it was the protein that was simulated via the GA, the computational time would have been drastically increased. Therefore, a network must be partitioned wisely. This, of course, may be in conflict with the user's desire to simulate a particular set of species. Consequently, if the CME-GA is to remain superior in speed to other methods, it must be limited to partitions in which the species with large molecular numbers are placed in the non-simulated sub-network.
Although Eqs (20) and (27), which are necessary for the CME-GA to run, were derived exactly, without any assumptions, in practice they may not always be tractable and will require approximations. However, bisecting a system into two groups as prescribed above necessarily renders the CME less complex (Eq (1) vs. Eq (20)) and hence more manageable. It should be noted that as the simulated reaction network grows larger, the time between reactions becomes shorter. This means that Eqs (20) and (27), given a large sub-network G_1, will only need to be solved for very short times. Another relief may come from moment closure methods [29][30][31]: since the propensities in Eq (28) can be expressed as a sum of statistical moments (see Eq (50)), one needs only to solve the set of equations for a few moments, instead of the full CME for the sub-system G_2; and the same goes for Q_m(n, t). This last approach might in fact be the most promising way of extending our algorithm to more complex systems.
Lastly, it bears mentioning that although the information about the sub-system G_2 is lost, it may not always be completely lost for all types of systems. In both examples discussed above, when the original CME, Eq (1), is multiplied by the variables in G_2 and summed over all variables, one ends up with

dn̄/dt = Km̄ − qn̄

for the genetic switch (neglecting the small binding and unbinding contributions, which is justified here since α_1, α_2 ≪ 1), and

dn̄_1/dt = Km̄ − an̄_1,   dn̄_2/dt = a(n̄_1 − n̄_2),   . . .,   dn̄_d/dt = an̄_{d−1} − qn̄_d

for the Griffith model. And because m̄ is computed via the CME-GA, all the above equations have a closed form.

Table 1. Comparison of computing times (rounded to the nearest second) of the four standard algorithms with the CME-GA hybrid. For both the genetic switch and the Griffith model, the ensemble size was 1000. The stopping time was 500 for the former and 2000 for the latter.