Stochastic Simulation of Biomolecular Networks in Dynamic Environments

Simulation of biomolecular networks is now indispensable for studying biological systems, from small reaction networks to large ensembles of cells. Here we present a novel approach for stochastic simulation of networks embedded in the dynamic environment of the cell and its surroundings. We thus sample trajectories of the stochastic process described by the chemical master equation with time-varying propensities. A comparative analysis shows that existing approaches can either fail dramatically, or else can impose impractical computational burdens due to numerical integration of reaction propensities, especially when cell ensembles are studied. Here we introduce the Extrande method which, given a simulated time course of dynamic network inputs, provides a conditionally exact and several orders-of-magnitude faster simulation solution. The new approach makes it feasible to demonstrate—using decision-making by a large population of quorum sensing bacteria—that robustness to fluctuations from upstream signaling places strong constraints on the design of networks determining cell fate. Our approach has the potential to significantly advance both understanding of molecular systems biology and design of synthetic circuits.

where a j (X(t), I(t + u)) gives the propensity of the j-th reaction channel at time t + u provided that no channel has fired since time t. Once some channel {R 1 , ..., R M +1 } does fire, the bound B is reset according to Eq. 4 (with t the time of the most recent firing). We use this approach in Fig. 1 B&D. Finding B according to Eq. 4 involves finding the supremum of I over the same interval, which can account for the majority of the CPU time needed to implement Extrande (see also Fig. 2). We also explored a second type of implementation of Extrande in which either: (a) the input I(t + u) ≤ C with probability 1 for all u < L, for some (deterministic) constant, C; or (b) the input I is unbounded ex ante (that is, no such constant exists), but we can find an approximate ceiling, C, such that P r[I(t + u) ≤ C] = 1 − for a small, predetermined and all u < L. In case (b), the Extrande algorithm is then applied using as input 1 the processĨ(t) = min{I(t), C}. In both cases (a) and (b), the bound B in Step 3 is computed using the ceiling, C.
Examples of (a)-in which case Extrande is an exact simulation method unconditionally provided exact trajectories of the input can be simulated-are given in Fig. 1A and in computing the exact result shown in Fig. 4 below. Examples of (b) are given in Fig. 1C (where we set = 1.35 × 10 −3 and find C using the relevant quantile of the distribution of the underlying OU process) and Fig. 4. In Fig. 4, we demonstrate the convergence to the exact Extrande result of the case (b) implementation with approximate bound when is small. The advantage of the implementation in (b) is the potential for further large reductions in CPU time when (b) can be implemented but (a) cannot, by avoiding CPU time spent finding local ceilings of a pre-simulated input trajectory.
When the input I is a function of a univariate OU process, the transition density of the OU process is Gaussian and can be used to obtain I(t) in Step 9 of the Extrande algorithm, avoiding pre-simulation of the OU trajectory on a fine grid when used in combination with case (b). We use this approach in Fig. 1C. We verify the accuracy of the 'case (b)' implementation of Extrande for a fluctuating but saturating transcription rate given by k(t) = 2K exp(ξ(t))/(K + exp(ξ(t))), with K = 10. Here ξ(t) is a mean-zero Gaussian (OU) process with autocovariance ξ(t)ξ(t ) = (5/4)e −γ|t−t | , and γ = 10 −4 . (A) compares the resulting distributions of the input, k(t) = min{k(t), C( )}, for different values of (using a bin size of 0.2). (B) We compare the distributions of protein numbers obtained for the gene expression model (k dm = 4, k dp = 1, ks = 100) for decreasing values of . For large (red line), the tail of the distribution is substantially underestimated compared to the result from the exact, 'case (a)' Extrande algorithm (black diamonds). We find quantitative agreement with the exact Extrande result for = 1.35 × 10 −3 (black line). Decreasing the magnitude of fluctuations in the k(t) helped to reduce the error in the protein distributions corresponding to larger (not shown). We speculate this is because intrinsic noise dominates protein fluctuations in the latter case.
Relation of Extrande with small look-ahead horizon, L, to the direct integral method We write a j [X(t), I(t)] for the propensity of the jth reaction (conditional on {H X t , I}, for j = 1, ..., M , and a 0 (t) = M j=1 a j [X(t), I(t)] for their sum. First, we analyse the behaviour of the Extrande method in the limit where the look-ahead horizon L is small and the input I(t) can therefore be taken as constant over the horizon L. In this limit, B ≈ I(t) and therefore the extra, virtual channel never fires. We write W i = T i −T i−1 for the waiting time to the firing of the ith reaction. In this limit, Extrande finds W i by drawing a sequence of exponential random variables, {E k ; k = 1, 2, ...}, with means m(k) = [a 0 (T i + (k − 1)L)] −1 , and stopping the first time one of these random variables does not exceed L: for n = 1, 2, ..., which is the probability that n evaluations of a 0 in Step 3 of Extrande are used to draw W i . By a direct integral method we mean one that finds the waiting time, W i , by numerical integration in order to solve where Exp(1) denotes an exponential random variable with mean 1. The reaction occurrence time is then allocated to a reaction channel in the usual way for a direct method. Suppose the numerical integration uses a timestep equal to L. Then it is straightforward to show (again taking I(t) constant over each small time-step of length L) that the probability that W i = (n − 1)L is again given by the right-hand side of Eq. 5; this is the probability that n integration steps and hence n evaluations of a 0 are used to draw W i . We have shown that the Extrande method with small L and the direct integral method with step-size equal to that value of L give the same distribution for the number of evaluations of the propensity sum (number of iterations) used to draw each waiting time W i , and therefore have the same expected CPU time for the propensity evaluations involved in drawing the sequence of W i 's. As L → 0, the expected number of propensity evaluations diverges (E[n] → ∞) for both methods, as does the expected CPU time for those propensity evaluations. The two methods are identical in the way they allocate reaction times to reaction channels. (The difference between them is that with small L, Extrande draws as many exponential random variables as evaluations of the propensity sum, n, for each waiting time (Eq. 5) whereas the direct integral method draws only one for each waiting time.) The above analysis establishes that the expected CPU time of Extrande for some other choice of horizon, L * , will be below the expected CPU time of the direct integral method (with small timestep L)-which includes the expected CPU time for propensity sum evaluation-whenever the expected CPU time for propensity evaluation in Step 3 (Box 1) for small L-Extrande is equal to (or exceeds) the total expected CPU time of Extrande using L * . This is the case (to an adequate approximation) in Fig. 2A, setting, for example, L * = 10h).

The Slow Input Approximation method
We present below the Slow Input Approximation (SIA) method used here for stochastic simulation over the interval [0, T ]. The SIA method is based on Gillespie's direct method [14].
The reaction network has M reaction channels {R 1 , ..., R M } with associated stoichiometries {v 1 , ..., v M }, and network state, X. The algorithm takes as input a function that simulates the dynamic input, I, and outputs the time t and state of the system whenever a reaction event occurs. 0. Initialise time t ← 0 and network state to X ← X 0 .
3. If t + τ > T end the algorithm; Else proceed to step 4. 4. Draw independently a Uniform (0,1) random variable, U ; let j be the smallest positive integer less than or equal to M satisfying 5. Advance the time by replacing t ← t + τ , evaluate I(t) and update the system's state with the occurrence of reaction R j at time t, X ← X + ν j .
6. Output (t, X) and return to step 1. Relating to Figure 1 (A) Average protein number is based on 10, 000 realizations, T = 1000, and the integral was discretised using ∆t = 0.01. Average protein numbers were varied via the translation rate, k s , and the remaining parameters given by k dm = 4 and k dp = 1 (where the the unit of time is the inverse protein degradation rate). Median or mean protein degradation rates were estimated from proteomics data (O. tauri -5h −1 [51], S. crescentus -0.12h −1 [52], S. cerevisiae -0.9h −1 [53], human cancer cells -0.1h −1 [54], fibroblast -0.01h −1 [55]). (B) Average protein numbers were varied as above and the same parameter values. Autocorrelation times of the transcription rate, k(t), were chosen according to typical cell cycle durations t 1/2 = ln 2/γ from the literature (O. tauri -24h [56], S. cerevisiae -1.6h, human cancer cells -23h [54]). The average protein numbers were obtained from 100 independent realizations with T = 10, 000 sampled every ∆t = 100. The resulting 100 data points are then used to compute the error | n −n ex |/n ex (as obtained from Eq. (1) of the main text in the stationary case) and the sampling error given by one standard deviation of the bootstrap distribution.
In order to understand better the source of the error of the SIA method shown in Fig. 1A, we considered the fraction of the 10 day simulated trajectories during which both mRNA and protein levels are zero (the 'off-time'), comparing the SIA result to the one obtained using Extrande (Fig. B). The relative error of the SIA method is well-explained by the over-estimation of the off-time by the SIA method. We have verified, throughout Fig. 1A close agreement of the results generated using Extrande with the corresponding analytical results (Fig. C).  Fig. 1A [Eq. 7 with circadian transcription rate, k(t) = 4(1+sin(2πf t)), f −1 = 24h], for a range of protein degradation rates (k dp ) and average protein levels (10-10000 molecules). (B-H) We calculate the proportional root mean square error (RMSE) between the (sample) average protein number and the exact, time-dependent solution of the model (derived as in [10] from datasets with different sample sizes. In all cases as the sample size is increased the proportional RMSE diminishes in agreement with the relative standard error of the mean (SEM) obtained from the model. We vary the average protein numbers using the translation rate (ks) and set k dm = 4k dp .  Fig. 3), with an OU input process. Note the conspicuous relative error (> 60 %) of the MN integral method using an integration time-step of 1s. We were unable to report the relative error for integration time-steps < 10 −4 h owing to the CPU times needed to obtain averages across 1000 cells. Figure 3 (A) Table 1 lists the reactions and parameter values used in our models of the competence modules of the wild-type B. subtilis and the Synthetic Decision-Making (SynDM) networks. The wild-type module is based on the stochastic model presented in [57]; here, the ComK and ComS promoters are modelled using massaction reactions kinetics (rather than Hill functions) and a group of rate constants is rescaled in order to ensure that the model yields a fraction of competent cells (17%) and an average time of entering competence (12h) in rough agreement with those reported in the literature [45, 46] (when pComA is constant at 1000 molecules). Table 2 lists the reactions and parameter values used in our model of the upstream signaling module determining the dynamics of pComA. The model is based on the descriptions of the signaling given in [58][59][60] and parameterised using biophysically realistic parameter values.

Relating to
(B-G) ComK is the master transcription factor that drives gene expression and the physiological changes required for competence [46] (migration of ComGA and other proteins to the cell poles, assembly of DNA uptake and processing machinery at the poles, etc.). We therefore prefer not to model entry into competence as occurring the first time the ComK level exceeds a threshold, but to use a different model of ComK-driven progress and entry into functional competence: where k is an effective rate of ComK-driven differentiation. A cell is taken to enter (functional) competence at the time when Progress(t) = 1. The value of the parameter k is set so that the wild-type and SynDM To study the effect of extrinsic fluctuations from upstream signaling on the competence decision we simulated the wild-type and SynDM networks using Gaussian, Ornstein-Uhlenbeck (OU) input processes to model pComA fluctuations. To obtain the parameters of the OU process we first apply the linear noise approximation (LNA) [43] to the upstream signaling module in Table 2 to find the mean (1000 molecules), CV (0.35) and autocorrelation function (ACF) of steady-state pComA levels. We find that the autocorrelation function (ACF) of pComA is well fitted by a single exponential function. Therefore we use a single OU process (with mean and variance obtained from the LNA, and lifetime obtained from the fit) to describe pComA fluctuations. We verified the adequacy of our single-OU approximation by checking that modeling the pComA input as a sum of two independent OU processes yields quantitatively similar fractions of competent cells (this yielded 0.39 cf 0.40 for SynDM, and 0.17 cf 0.17 for the wild-type). We obtain the other OU input process specifications examined by keeping the mean constant (µ = 1000) and varying the lifetime (τ ) and variance (σ 2 ) of the process. OU input trajectories used were presimulated using time-step of 1s.
To confirm the importance of modeling synthesis and degradation of the signaling proteins in the upstream signal transduction module, we repeated the LNA analysis using a variant of the signaling module in which ComA, ComP and RapC species were kept constant at their average levels (by removing all corresponding gene expression and degradation reactions from the model). This formulation of the model yielded an OU process with the same mean (1000 molecules), but with greatly reduced CV (0.07) and autocorrelation time (28 sec). All fractions reported in panels (E-G) were based on 1000 independent cells. (H) The autocorrelation function of pComA that was obtained from the LNA analysis of the signaling module was fitted to: 1. the ACF of an OU process, exp (−t/τ ), yielding τ = 5h; and 2. the ACF of the sum of two independent OU processes, p · exp (−t/τ 1 ) + (1 − p) · exp (−t/τ 2 ), yielding τ 1 = 5.4h, τ 2 = 6.0 · 10 −3 h, and p = 0.941.
Least-squares fits were performed in Matlab using the fit function. Model of inter-cellular quorum sensing. We assume a population of N identical cells, each with a copy of the upstream quorum signaling module listed in Table 2. Communication between cells is achieved via the molecular species CSF and ComX, which are produced by cells and released to the extracellular space with volume V e . The macroscopic dynamics of the extracellular concentration of these two species are governed by the following equations: where index i denotes the concentration of the intracellular species in the ith cell (having volume V c ). Since the cells are identical, the sum simplifies and we can rescale the appropriate rates by N/V e . Then we solve the resulting ODE system (describing the macroscopic dynamics of one cell and the two extracellular species) to obtain the steady-state levels of all species in the signaling module. The LNA is then applied to the model in No.