Long-time analytic approximation of large stochastic oscillators: Simulation, analysis and inference

In order to analyse large complex stochastic dynamical models such as those studied in systems biology there is currently a great need for both analytical tools and also algorithms for accurate and fast simulation and estimation. We present a new stochastic approximation of biological oscillators that addresses these needs. Our method, called phase-corrected LNA (pcLNA) overcomes the main limitations of the standard Linear Noise Approximation (LNA) to remain uniformly accurate for long times, still maintaining the speed and analytically tractability of the LNA. As part of this, we develop analytical expressions for key probability distributions and associated quantities, such as the Fisher Information Matrix and Kullback-Leibler divergence and we introduce a new approach to system-global sensitivity analysis. We also present algorithms for statistical inference and for long-term simulation of oscillating systems that are shown to be as accurate but much faster than leaping algorithms and algorithms for integration of diffusion equations. Stochastic versions of published models of the circadian clock and NF-κB system are used to illustrate our results.

1 Drosophila circadian clock system A schematic representation of the Drosophila circadian clock system, as provided in [1], is displayed in Figure A. The oscillations are driven by the negative feedback exerted on the per and tim genes by the complex formed from PER and TIM proteins following phosphorylation. The per and tim mRNA, M P and M T , respectively, are transported into the cytosol where they are degraded and translated into protein (P 0 and T 0 ). These proteins are multiply phosphorylated (PER: P 0 → P 1 → P 2 ; TIM: T 0 → T 1 → T 2 ) and these modifications can be reversed by a phosphatase. The fully phosphorylated form of the proteins is targeted for degradation and forms a complex, C, which is transported into the nucleus in a reversible manner where the nuclear form of the PER-TIM complex, C N , represses the transcription of per and tim genes.  [1] The variables of the model along with the initial conditions (in nanomolar concentrations) used in our implementation are provided in Table A The parameter values used to derive the ODE solution of the system are provided in Table B. parameter description value measurement unit v sP M P transcription 1.10 Hill power 4.00 NA Table B: The parameters of Drosophila circadian clock system and the values used to derive their ODE solution.
The ODE system for the Drosophila circadian clock iṡ The SSA trajectories are derived using the rates provided in Table 1 of [1].
In the simulations of the Drosophila circadian clock provided in the next section and in I, we consider five values of the system size Ω = 200, 300, 500, 1000 and 3000. In Table C, we provide the approximate ranges of the molecular numbers for each species observed in our simulations. Notice the increase in the maximum number of molecules with increasing Ω, but also the decrease in the length of the range in terms of concentrations, X(t) = Y (t)/Ω, that reflects the lower levels of stochasticity for increasing Ω.
However, the precision of the approximation seems to stabilise after the third round with KS distances much smaller compared to standard LNA. The median CPU times under this correction frequency are 0.45secs for Ω = 300 and 0.22secs for Ω = 1000, compared to 0.45secs and 0.42secs, respectively, for correction frequency 6h. The reason for the absence of speed improvement for Ω = 300 is because at this system size and with less corrections negative populations appear more frequently and therefore SSA simulation are used more frequently, as explained in Sec. 13 in S1, and therefore any improvements due to applying correction less frequently are counterbalanced. For Ω = 1000, where negative populations are much less likely the speed improvement, is almost twofold.   Figure G: Comparison between pcLNA, tau-leap and CLE simulation algorithms for the Drosophila circadian clock. This figure has the same form as Fig. 9 of I, but (A), (B) and (C) panels contain results for Ω = 1000. As in Fig. 9 of I, the parameter values = 0.002 and ∆t = 0.002 for the tau-leap and the CLE approximation are the largest values to achieve precision similar to the pcLNA simulation, and the results displayed here are simulated using these values.
pcLNA v SSA  for Ω = 3000. As in Fig. 9 of I, the parameter values = 0.002 and ∆t = 0.002 for the tau-leap and the CLE approximation are the largest values to achieve precision similar to the pcLNA simulation, and the results displayed here are simulated using these values.

Light entrained Drosophila Circadian Clock
The Light entrained Drosophila circadian clock system first proposed in [2] is exactly the same with the Drosophila circadian clock without entrainment, apart from the degradation rate, v dT , of the phosphorylated TIM protein, T 2 , which instead of being equal to 3.4, it is equal to 4 during the day-time (6-18h) and equal to 2 during night-time (18h-6h).
We first compare the LNA distributions at fixed times with the empirical distributions derived using SSA simulations. As we can see in Fig. I, despite that the empirical distributions of the exact stochastic model do not spread along the curved limit cycle as much as the corresponding distributions for the system without entrainment (see Fig. 3 of I) and that the spread of the distributions in the light entrained system appears to stabilise even after the second cycle, the standard LNA distributions still do not approximate well these distributions.  k distributions under the pcLNA and the SSA, k = 2, 3, . . . , 10, r = 1, 2, 3, 4. The system size is Ω = 300.