Universally valid reduction of multiscale stochastic biochemical systems using simple non-elementary propensities

Biochemical systems consist of numerous elementary reactions governed by the law of mass action. However, experimentally characterizing all the elementary reactions is nearly impossible. Thus, over a century, their deterministic models that typically contain rapid reversible bindings have been simplified with non-elementary reaction functions (e.g., Michaelis-Menten and Morrison equations). Although the non-elementary reaction functions are derived by applying the quasi-steady-state approximation (QSSA) to deterministic systems, they have also been widely used to derive propensities for stochastic simulations due to computational efficiency and simplicity. However, the validity condition for this heuristic approach has not been identified even for the reversible binding between molecules, such as protein-DNA, enzyme-substrate, and receptor-ligand, which is the basis for living cells. Here, we find that the non-elementary propensities based on the deterministic total QSSA can accurately capture the stochastic dynamics of the reversible binding in general. However, serious errors occur when reactant molecules with similar levels tightly bind, unlike deterministic systems. In that case, the non-elementary propensities distort the stochastic dynamics of a bistable switch in the cell cycle and an oscillator in the circadian clock. Accordingly, we derive alternative non-elementary propensities with the stochastic low-state QSSA, developed in this study. This provides a universally valid framework for simplifying multiscale stochastic biochemical systems with rapid reversible bindings, critical for efficient stochastic simulations of cell signaling and gene regulation. To facilitate the framework, we provide a user-friendly open-source computational package, ASSISTER, that automatically performs the present framework.


Supplementary Methods
Validity of the stochastic model reduction using the stochastic QSSA Here, we show that an accurate approximation for a stochastic model containing a rapid reversible binding (Eq. (1)) can be obtained by replacing the fast variables (A, B, and C) representing the numbers of A, B, and C in Eq. (1) with their stochastic QSSAs ( A , B , and C ) [1][2][3][4].
Let P (n; t) represent the probability of being in state n at time t. If a system contains some reactions that are faster than the others, n can be partitioned into n f and n s denoting the states of fast species whose copy numbers are changed by the fast reactions and slow species whose copy numbers are not changed by the fast reactions, respectively. Then the chemical master equation (CME) for this system is − a f j (n f , n s )P (n f , n s ; t) − a s i (n f , n s )P (n f , n s ; t), where N f and N s are the numbers of fast and slow reactions, respectively, and a f j and a s i are the propensity functions of the jth fast and the ith slow reactions, respectively. Furthermore, v f f,j , v s f,i , and v s s,i are the stoichiometric vectors associated with the fast species of the jth fast reaction, the fast species of the ith slow reaction, and the slow species of the ith slow reaction, respectively. Note that there is no v f s,j in Eq. (S1) because the number of slow species is not affected by the fast reactions.
When a multiscale stochastic model contains the rapid reversible binding between A and B forming complex C and all the other reactions are slower than the reversible binding, n f consists of the variables A, B, and C, the numbers of A, B, and C, respectively, and n s consists of the variables that correspond to the other species.
Note that the rapid reversible binding reactions does not change the total numbers (i.e., both the free form and the complex form) of A species and B species (A T = A + C and B T = B + C).. Thus, by changing the variables, A = A T − C, B = B T − C, we can eliminate A and B in n f while adding A T and B T in n s . Then, because n f contains only C, the CME (Eq. (S1)) becomes where Ω is the system volume.
Since P (C, n s ; t) = P (C|n s ; t)P (n s ; t), where P (C|n s ; t) represents the probability of C at time t conditioned on n s , Eq. (S2) can be represented as − a s i (C, n s )P (C|n s ; t)P (n s ; t)].
(S3) In a short timescale, we can assume that only the fast binding and unbinding reactions occur, so the slow variables do not change (i.e., dP (ns) dt = 0). Then we can rewrite Eq. (S3) as Then, while n s is fixed, C evolves following Eq. (S4) and thus equilibriate to its conditional stationary distribution P (C|n s ), which satisfies October 6, 2021 S2/S12 In a long timescale, C has already reached the conditional stationary distribution P (C|n s ) (i.e., P (C|n s ; t) ≈ P (C|n s ) and thus dP (C|ns;t) dt = 0). Then by summing both sides of Eq. (S3) over all possible states of C, we get the following: In Eq. (S6), the summation of P (C|n s ) in the LHS is one as it is a probability distribution, and the first summation term (i.e., four lines) in the RHS becomes zero by Eq. (S5). Then we get the following reduced CME that consists of only the slow variables (n s ) and the slow reactions: where a s i (n s ) = C a s i (C, n s )P (C|n s ), i.e., the stationary conditional expectation of the propensity functions for the slow reactions. When the slow reactions are uni-or bi-molecular reactions, each a s i is a linear function in terms of the fast variables. Then where C|n s denotes the stationary average number of C conditioned on n s . Then the reduced CME (Eq. (S7)) becomes Since the conditional stationary average number of C ( C|n s ) solely depends on A T and B T , C|n s is the same as the stochastic QSSA, C , defined in the main text. Thus, Eq. (S8), where the fast variables (A, B, C) are replaced with their stochastic , is a faithful approximation of a stochastic model containing a rapid reversible binding in a long timescale.
Exact bound for the Fano factor of A In this section, we show that the Fano factor of A (F A ) is always less than 1. We first define A n |a, b as the stationary n th power moment of A conditioned on A T = a and B T = b. Then for any positive integer a and b, When a = 1, A can be only 0 and 1, so Here, F 0 is a generating function defined by is any positive deterministic steady state of the reversible binding reaction [6]. Here, we set (λ 1 , λ 2 , λ 3 ) = (K d , 1, 1). Then for r = 1, and for r = 2, We can rewrite Eq. (S12) as Then by replacing A 2 |a, b in Eq. (S9), we get Since A|a − 1, b < A|a, b , we finally proved F A < 1. While A|a − 1, b < A|a, b seems trivial, for the rigorousness, we provide the proof for this as follows. Since F A is always positive, from Eq. (S13), for all a ≥ 2 and b > 0. From the recurrence relation derived in the previous work [5], October 6, 2021 S4/S12 which can be re-expressed as By multiplying K d to both sides, we have Then we can show A|a − 1, b < A|a, b , as follows: Manual for the computational package, ASSISTER The computational package ASSISTER (Adaptive Simplification of StochastIc SystEm with Reversible binding; https://github.com/Mathbiomed/ASSISTER) contains three codes implemented in MATLAB: LQSSA, QSSA Threshold, and Gillespie Reduction. LQSSA calculates the L-state slQSSA for given A T , B T , K d , and L. QSSA Threshold is the function that determines which of the stQSSA and the slQSSA is valid, and calculates the number of needed states (L) for the slQSSA to ensure the smaller relative error than a tolerance for given K d value (Fig 6). The function Gillespie Reduction performs accurate stochastic simulations for any values of the parameters with the adaptive choice of the valid approximation method determined by using QSSA Threshold. While LQSSA and QSSA Threshold are auxiliary functions for the comprehensive code, Gillespie Reduction, they can be used independently by users.
To perform Gillespie Reduction, users need to enter the inputs for the Gillspie algorithm (the propensity function and stoichiometric vector for each reaction, the initial condition, and the time points) and the additional two inputs (the indices specifying rapid reversible binding reactions and the error tolerance). Then Gillespie Reduction automatically constructs the reduced model with adaptive choice of the approximation strategy (i.e., the non-elementary propensities) within the given error tolerance.
The file test in the repository contains the codes for a running example (Fig 2). The variable Stoi is the d × R stoichiometric matrix where d is the number of species and R is the number of reactions. x init is the initial condition for the species, and tspan is the vector of increasing time points at which simulated trajectories are measured. rev idx is the n × 2 matrix where n is the number of rapid reversible bindings to be reduced, and each column contains two indices that represent the pair of rapid reversible binding reactions. err tol is the error tolerance of the approximation for the stochastic QSSA.
prop functions = @propensity functions; Output contains the simulated trajectories of the species measured at each time point in tspan from the reduced model. Since the first (D) and second (P) species represent the binding molecules for the rapid reversible binding, they are not simulated in the reduced model. Instead, the trajectories of their corresponding total variables (D + D:P and P + D:P ) are stored in the corresponding rows (i.e., the first and second rows in this example) of Output. The zero vector is stored in the row corresponding to the bound complex (i.e., the third row in this example) for simplicity.    Table D. Propensity functions of the reduced models for the biological oscillator obtained using the stQSSA (Fig 3a bottom) and the slQSSA (Fig 5d) Reactions The Fano factor of A (F A ) is between 0 and 1. As K d decreases, F A becomes closer to the step function, which is 0 when A T > B T and 1 when A T ≤ B T .
Since the relative error of the stQSSA for A to the stochastic QSSA (R A ) satisfies (7) in the main text), R A is almost 0 when B T < A T while R A is between S A and 2S A when B T ≥ A T . Therefore, when A tightly binds with B and B T ≥ A T , R A mainly depends on S A (Fig 1d and 1e). Here, A T = 100. See Supplementary Methods for the proof that F A is always between 0 and 1.
10 -2 10 -2 10 -1 10 0 10 1 10 2 10 -2 10 -1 10 0 10 1 10 2 10 2 10 -2 10 -1 10 0 10 1 for each A T K d and K d when B T varies. As more states are used (i.e., k increases), A k lq accurately approximates A in the wider regions. Specifically, when A T K d is less than 2, 5, and 10 (i.e., left side of the dashed lines in the heat maps), R lq, 3 A , R lq,4 A and R lq,5 A are less than 0.1, respectively. This indicates that one can accurately approximate A with relative error of less than 0.1 regardless of the values of A T , B T , and K d by adaptively using both A tq and A 5 lq because the relative error of A tq is less than 0.1 when A T K d > 10.
October 6, 2021 S11/S12  (Figs 2a and 5c), using GillesPy2, one of the major, standard software suites for Gillespie type simulations [9]. The lines with colored ranges represent the mean ± standard deviation of 10 4 trajectories. The results are consistent with Fig 5c generated using MATLAB. All parameters are the same as in Fig 5c. (b) Execution times for the 10 4 repetitive simulations with the full model (182.5s) and the slQSSA model (24.0s). The simulation with the reduced model was approximately 7.5 times faster than the full model. The simulations were conducted with Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz.