Constructing the Energy Landscape for Genetic Switching System Driven by Intrinsic Noise

Genetic switching driven by noise is a fundamental cellular process in genetic regulatory networks. Quantitatively characterizing this switching and its fluctuation properties is a key problem in computational biology. With an autoregulatory dimer model as a specific example, we design a general methodology to quantitatively understand the metastability of a gene regulatory system perturbed by intrinsic noise. Based on the large deviation theory, we develop new analytical techniques to describe and calculate the optimal transition paths between the on and off states. We also construct the global quasi-potential energy landscape for the dimer model. From the obtained quasi-potential, we can extract quantitative results such as the stationary distributions of mRNA, protein and dimer, the noise strength of the expression state, and the mean switching time starting from either stable state. Finally, we apply this procedure to a transcriptional cascades model. Our results suggest that the quasi-potential energy landscape and the proposed methodology provide a general framework for understanding metastability in other biological systems with intrinsic noise.

I. SINGULAR PERTURBATION ANALYSIS IN THE LARGE VOLUME LIMIT

In this section we apply singular perturbation analysis [1] to study the large system size limit of the stochastic equations as K goes to infinity. This result clarifies in what sense the deterministic dynamics is related to the stochastic dynamics.
First, the backward operator L, i.e. the infinitesimal generator of the considered chemical reaction process (also the adjoint of the operator arising in the chemical master equations (CMEs)), has the form Taking the adjoint of L, we get the CMEs for this system. Denote by P_mnd and Q_mnd the probability distribution functions (PDFs) of the inactive/active DNA state with m mRNAs, n proteins and d dimers at time t. The CMEs have the form: where A* is the adjoint of A. That is, To investigate the evolution of the concentration variables, we consider the variables on the rescaled lattice N^3/K. Correspondingly, the infinitesimal generator L̃ on the rescaled lattice has the form where the rescaled operator Define u(x, y, z) = h(0, x, y, z), v(x, y, z) = h(1, x, y, z) and ε = K^{-1}; then through a Taylor expansion we obtain the backward equations for the rescaled dynamics where γ̃ The operator A is the differentiation form of Ã, as From (7), the slow-scale equation can be obtained by a weighted summation. By a perturbation expansion we find that the leading O(ε^{-1}) term of (7) gives and (10) gives This equation exactly corresponds to the backward equation of the deterministic process Transforming back to the M, N, D variables according to (4), we obtain These are exactly the mean-field ODEs satisfied by the numbers of mRNAs, proteins and dimers. The steady states of Eq. (14) satisfy It is easy to find that the system (15) has three positive solutions, which we denote as A linearized stability analysis tells us that these three states have different stability properties. The Jacobian of the ODEs (14) is Evaluating the Jacobian at the intermediate state yields both stable and unstable eigendirections; thus it is a saddle.
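The linearized stability analysis can be illustrated with a minimal sketch. The model below is a hypothetical one-dimensional reduction (Hill self-activation plus linear degradation, with illustrative rates of our own choosing, not the dimer model's): we locate the fixed points of the mean-field ODE and classify them by the sign of the derivative, which plays the role of the Jacobian.

```python
# Hypothetical 1D reduction: Hill self-activation plus linear degradation
# (illustrative rates, not the dimer model's).
a0, a, K, gamma = 1.0, 30.0, 15.0, 1.0

def f(x):
    """Right-hand side of the mean-field ODE dx/dt = f(x)."""
    return a0 + a * x * x / (K * K + x * x) - gamma * x

def fprime(x, h=1e-6):
    """Numerical derivative, playing the role of the Jacobian."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def bisect(lo, hi, tol=1e-10):
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Bracket roots by scanning for sign changes, then refine by bisection.
roots = []
grid = [0.1 * i for i in range(1, 600)]
for lo, hi in zip(grid, grid[1:]):
    if f(lo) * f(hi) < 0.0:
        roots.append(bisect(lo, hi))

stability = ["stable" if fprime(r) < 0.0 else "unstable" for r in roots]
print([round(r, 2) for r in roots], stability)
```

For these illustrative rates the scan finds three positive fixed points with the expected stable/unstable/stable pattern, mirroring the off state, the saddle, and the on state of the dimer model.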
The parameter values used above are based on Refs. [2] and [3]. Their biological relevance is summarized in Table I below (we assume the volume of a prokaryotic cell to be 3 µm^3 and the number of amino acids of one protein to be 500). In addition, we choose the values of the parameters k_1 and γ_1 based on Ref. [8] (the number of dimers per exponentially growing cell is around 113).

II. COMPARISON OF HAMILTONIANS
In this section, we compare the Hamiltonian obtained by our approach with the Hamiltonian derived from WKB asymptotics, e.g., the approach taken in [2].
Recall that our Hamiltonian derived from the large deviation theory [9,10] has the form
To obtain the Hamiltonian from the WKB approach, we utilize the chemical master equation (3). In the steady state, we have So we get Denote (x, y, z) = (m/K, n/K, d/K) and P(x, y, z) ≡ P_mnd; we obtain the continuous approximation of the CMEs by applying the same scaling as in Section I, where A* is the adjoint of the A that appeared in Section I. Plugging the WKB ansatz P(x, y, z) ∼ e^{−KS(x,y,z)} into (20) and keeping the leading terms, we get a Hamilton-Jacobi equation for S(x, y, z), H_WKB(x, ∇_x S) = 0, with Hamiltonian Here A(x, p) is the same as that contained in (17). At first sight one may find that the Hamiltonian H_WKB is quite different from the H in (17). Furthermore, H_WKB is cumbersome to use. We have proved in the main text that our Hamiltonian H is strictly convex, and pointed out that H_WKB does not guarantee such a property. Indeed, in our parameter setup one easily finds, for example, that for (x, y, z) = (27, 755, 5) and (p_x, p_y, p_z) = (0.01, 0.01, 0.01), the eigenvalues of ∇²_p H_WKB are (−653.5176, 1645.6911, 2143.6604). The negative eigenvalue means that H_WKB is not convex even when x, y, z all lie in the physically meaningful domain. Of course, this choice of (x, p) is based on random sampling in phase space; the region where H_WKB fails to be convex is much larger than a single point. We also found in our computations that this non-convexity leads to breakdown or very slow convergence of the algorithm.
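To make the convexity contrast concrete, the following sketch (a toy two-reaction network with rates and stoichiometries of our own choosing, not the dimer model) computes the p-Hessian of the large-deviation Hamiltonian H(x, p) = Σ_j a_j(x)(e^{ν_j·p} − 1) by finite differences; since this Hessian is a nonnegative sum of rank-one terms a_j e^{ν_j·p} ν_j ν_j^T, both eigenvalues come out positive.

```python
import math

a = [2.0, 3.0]                 # propensities a_j(x) at a fixed state (toy)
nu = [(1.0, 0.0), (-1.0, 1.0)] # stoichiometric vectors nu_j

def H(p):
    """Large-deviation Hamiltonian H(x, p) = sum_j a_j (exp(nu_j . p) - 1)."""
    return sum(aj * (math.exp(v[0] * p[0] + v[1] * p[1]) - 1.0)
               for aj, v in zip(a, nu))

def hess(p, h=1e-4):
    """2x2 Hessian of H in p by central finite differences."""
    def d2(i, j):
        def shift(si, sj):
            ei = [si * h if m == i else 0.0 for m in range(2)]
            ej = [sj * h if m == j else 0.0 for m in range(2)]
            return [p[0] + ei[0] + ej[0], p[1] + ei[1] + ej[1]]
        return (H(shift(1, 1)) - H(shift(1, -1))
                - H(shift(-1, 1)) + H(shift(-1, -1))) / (4.0 * h * h)
    return [[d2(0, 0), d2(0, 1)], [d2(1, 0), d2(1, 1)]]

# Eigenvalues of the symmetric 2x2 Hessian from its trace and determinant.
M = hess((0.3, -0.2))
tr = M[0][0] + M[1][1]
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
disc = math.sqrt(tr * tr / 4.0 - det)
eigs = (tr / 2.0 - disc, tr / 2.0 + disc)
print(eigs)
```

The same finite-difference check applied to an H_WKB-type Hamiltonian would expose the negative eigenvalues reported above.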

III. SCALE INDEPENDENCE ON THE CHOICE OF SYSTEM SIZE
In this section we prove the scale independence with respect to the choice of system size for general chemical kinetic systems; specializing to our concrete model is straightforward. The overall analysis shows that different choices of the system size V (denoted K in the main body of the paper) lead to equivalent results under the correct parameter rescaling, which is essential for the analysis.
Let us consider the classical set-up for chemical reactions with infinitesimal generator where θ denotes the parameters appearing in the system. Suppose the propensity functions satisfy the scaling where α ∈ R corresponds to the scaling relation of the parameters with the system size V. In realistic situations, α usually takes values in {1, 0, −1, . . .}. For a fixed system and parameter θ_0 with different choices of system size V_i (i = 1, 2), we define where the superscripts indicate the dependence on V_i rather than the power of an exponent. It is not difficult to show that the large volume limit of this process as V_i → ∞ is The fixed points of the above reaction rate equations satisfy It is straightforward to see from (23) that the fixed points {y^{V_i}_{*,k}}_{k=1}^{K} for the two choices of the system size are related by The Hamiltonian in the large deviation theory has the form The transition path connecting different fixed points in time T satisfies the Hamiltonian dynamics with boundary conditions It is easy to observe that if is a solution, where the last three variables in the parentheses represent the related boundary points and reaction rates, then is also a solution by the scaling (23) and the identity ∇_y a_j(V y; V^α θ) = ∇_y ã_j(y; θ), obtained by direct differentiation of both sides of (23) with respect to y. This explains the scale independence of the system with respect to the choice of system size V. The key point is that the particular choice of V does not matter, but the scaling relation (23) is crucial for the considered system. The above analysis can be generalized to systems with multiple parameters and the condition or more generally such that ã_j satisfies the relation (35) [11].
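The scaling relation can be illustrated on a minimal birth-death example (our own toy rates, not the genetic switching model): with birth propensity kV (α = 1) and death propensity γn (α = 0), the concentration fixed point y* = n*/V is the same for every choice of V.

```python
# Minimal illustration of scale independence: with propensities obeying
# a_j(V*x; V^alpha * theta) = V * a_tilde_j(x; theta), the rescaled fixed
# point y* = n*/V does not depend on the choice of V.
k, gamma = 5.0, 0.5   # toy birth parameter (alpha = 1) and death rate (alpha = 0)

def fixed_point(V):
    # Propensities: birth k*V (constant in n), death gamma*n.
    # Mean-field balance k*V = gamma*n* gives n* = k*V/gamma,
    # so the concentration fixed point is y* = n*/V = k/gamma.
    n_star = k * V / gamma
    return n_star / V

y1 = fixed_point(100.0)
y2 = fixed_point(200.0)
print(y1, y2)  # identical for both system sizes
```

The cancellation of V in y* = kV/(γV) is exactly the one-dimensional instance of relation (23).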
All of the analysis generalizes to the considered genetic switching case without any difficulty.

IV. INTRODUCTION OF THE GMAM
For the reader's convenience, we give here a brief synopsis of the geometric minimum action method (gMAM); a more detailed treatment can be found in Ref. [12].
The geometric minimum action method (gMAM) is a variant of the minimum action method (MAM) [13]. Both methods minimize the action functional required to compute the quasi-potential V(x_1, x_2) in Freidlin-Wentzell theory and find the minimizer.
In the presence of small random perturbations in dynamical systems, the behavior of the system cannot always be described by deterministic models, and the effect of the noise becomes ubiquitous. When an event with very small likelihood occurs, large deviations theory gives a rough estimate for the probability that the trajectory X^ε(t), t ∈ [0, T], T < ∞, of the random dynamical system lies in a small neighborhood around a given path ψ ∈ C(0, T), where C(0, T) denotes the space of all continuous functions mapping [0, T] into R^n. The theory asserts that, for σ and ε sufficiently small, where P_x denotes the probability conditioned on X^ε(0) = x, and we assume ψ(0) = x. The action can be written as S_T(ψ) = ∫_0^T L(ψ, ψ̇) dt if ψ is absolutely continuous and the integral converges, and +∞ otherwise, where the Lagrangian L(x, y) is given by the Legendre transform L(x, y) = sup_θ [⟨y, θ⟩ − H(x, θ)]. Here ⟨·, ·⟩ denotes the Euclidean scalar product in R^n and H(x, θ) is the Hamiltonian.
One can obtain the quasi-potential of a system by minimizing its action functional. Here, the function θ̂(x, y) is implicitly defined for all x ∈ D and y ∈ R^n \ {0} as the unique solution (θ̂, λ) ∈ R^n × [0, ∞) of the system H(x, θ̂) = 0, H_θ(x, θ̂) = λy, λ ≥ 0.
As to our problem, Freidlin-Wentzell theory provides only the form of the Hamiltonian. Though for some special cases, such as SDEs,
(4) Repeat until some stopping criterion is fulfilled.

B. Evaluating the Action
After the outer loop above, one can easily compute the value of the action by adding the following steps. In order to compute θ̂(ϕ, ϕ′) from (44), we use h(·) to denote the strictly convex and twice-differentiable function H(ϕ, ·). The quadratically convergent routine is as follows. For p ≥ 0: where w It is shown in [12] that this algorithm guarantees local quadratic convergence, so when applying it we need a proper initial guess of θ̂. However, we usually have little knowledge about θ̂. If one finds that the inner loop does not converge in a particular problem, we suggest the following improvement.
Equation (44) can be viewed as a Lagrange multiplier method that solves a constrained optimization problem: So we can first apply a quadratic penalty method to this optimization problem. That is, we solve the unconstrained problem where µ is a penalty factor. We can simply use Newton's method with a proper line search to solve this problem. We know from [14] that Newton's method with exact line search guarantees global convergence. Although this step may be somewhat expensive computationally, only a few steps are needed to provide the original inner loop with a proper initial value. It is also easy to show that this strategy guarantees global convergence.
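The globalization strategy can be sketched on a toy one-dimensional Hamiltonian of our own choosing, H(θ) = −θ + θ² (i.e. drift b = −1, diffusion a = 2), whose nonzero root is the maximizer θ̂ = 1: a few damped Newton steps on the penalized objective F(θ) = −θy + (µ/2)H(θ)² give a good initial guess, which is then handed to a plain Newton refinement of H(θ) = 0.

```python
# Toy 1D sketch of the penalty-then-refine strategy (our own Hamiltonian,
# not the dimer model's): quadratic penalty + damped Newton, then Newton.
b_, a_ = -1.0, 2.0          # hypothetical drift and diffusion coefficients
y = 1.0                     # direction phi' along the path

def H(t):   return b_ * t + 0.5 * a_ * t * t
def dH(t):  return b_ + a_ * t
def d2H(t): return a_

def penalty_newton(t, mu=50.0, iters=20):
    """Minimize F(t) = -t*y + (mu/2) H(t)^2 by damped Newton with backtracking."""
    F = lambda s: -s * y + 0.5 * mu * H(s) ** 2
    for _ in range(iters):
        g = -y + mu * H(t) * dH(t)                 # gradient of F
        h = mu * (dH(t) ** 2 + H(t) * d2H(t))      # Hessian of F
        step = -g / h if h > 0.0 else -0.1 * g     # fall back if not convex
        alpha = 1.0
        while F(t + alpha * step) > F(t) and alpha > 1e-8:
            alpha *= 0.5                           # backtracking line search
        t += alpha * step
    return t

t0 = penalty_newton(2.0)    # penalty phase: lands near the constrained optimum
t = t0
for _ in range(30):         # inner-loop refinement: Newton on H(theta) = 0
    t -= H(t) / dH(t)
print(round(t, 6))
```

For this toy problem the penalty phase lands close to θ̂ = 1, and the refinement converges to it quadratically; in the actual inner loop the refinement is the coupled system (44) rather than a scalar root-find.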

V. STOCHASTIC SIMULATION
A. Switching Criteria and Transition Trajectories

To check our theoretical predictions, we performed Monte Carlo (MC) simulations using the Gillespie algorithm. There are two typical criteria for a switch in MC simulations. One is to check whether the system has passed through the stable manifold of the saddle point; the other is to define two small boxes around the stable points and check whether the system has jumped from one box to the other. Between two switches, most of the time is spent randomly walking around one of the two stable points, and after passing the stable manifold of the saddle point the system converges quickly to the corresponding stable point; therefore the two criteria are almost equivalent. For convenience, we adopted the second criterion in Fig. 2 and the first in Fig. 4. The two boxes are defined as: {off: m_off ± 2.5, n_off ± 20.0, d_off ± 3.5} and {on: m_on ± 5.0, n_on ± 70.0, d_on ± 10.0}. We use the following method to record each transition trajectory: (1) Determine which box the system is in.
(2) Wait until the system leaves this box, and record the trajectory in a memory unit.
(3) If the system comes back to the same box, free the memory unit and go back to step (2); if the system enters the other box, output the transition trajectory in the memory unit and then free it.
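The recording procedure above can be sketched as follows for a toy one-dimensional self-activating birth-death process (the rates and box boundaries are our own illustrative choices, not the dimer model's boxes):

```python
import random

random.seed(1)
a0, a, K, deg = 1.0, 30.0, 15.0, 1.0   # hypothetical kinetic parameters

def birth(n): return a0 + a * n * n / (K * K + n * n)
def death(n): return deg * n

def ssa_step(n, t):
    """One Gillespie step: exponential waiting time, then pick a reaction."""
    r1, r2 = birth(n), death(n)
    rtot = r1 + r2
    t += random.expovariate(rtot)
    n += 1 if random.random() < r1 / rtot else -1
    return n, t

def which_box(n):
    """Boxes around the two stable states (roughly 1 and 21 for these rates)."""
    if n <= 4:
        return "off"
    if n >= 16:
        return "on"
    return None

n, t = 1, 0.0
box = "off"        # step (1): determine which box the system starts in
traj, transitions = [], []
for _ in range(200000):
    n, t = ssa_step(n, t)
    cur = which_box(n)
    if cur == box:
        traj = []                      # step (3): came back, free the memory
    elif cur is None:
        traj.append((t, n))            # step (2): outside the box, record
    else:
        transitions.append(list(traj)) # step (3): reached the other box
        box = cur
        break
print(len(transitions), box)
```

The three-dimensional version used in the paper is identical in structure, with the interval boxes replaced by the (m, n, d) boxes defined above.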

B. Mean Switching Time (MST) from MC simulation
Here we use the Mean First-Passage Time (MFPT) to approximate the MST. Because the system is three-dimensional, for each parameter set we first place the system at the corresponding initial state and run the MC simulation until the system crosses the saddle point (and thus the stable manifold of the saddle). For each parameter set, we run the MC simulation 1000 times.
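As a minimal illustration of MFPT estimation by repeated MC runs (a toy one-dimensional birth-death chain with rates of our own choosing, not the switching model), one can cross-check the simulated average first-passage time against the exact ladder formula for birth-death chains:

```python
import random

random.seed(7)
b, N = 2.0, 6          # constant birth rate and target state (toy choices)

def d(n):
    return float(n)    # linear death rate

def exact_mfpt():
    """Exact T(0 -> N) via the birth-death ladder formula:
    T = sum_k (1/(b*pi_k)) * sum_{i<=k} pi_i, with pi_k = prod b/d(i)."""
    pi = [1.0]
    for i in range(1, N):
        pi.append(pi[-1] * b / d(i))
    total, partial = 0.0, 0.0
    for k in range(N):
        partial += pi[k]
        total += partial / (b * pi[k])
    return total

def simulate_once():
    """One first-passage run of the Gillespie dynamics from 0 to N."""
    n, t = 0, 0.0
    while n < N:
        rtot = b + d(n)
        t += random.expovariate(rtot)
        n += 1 if random.random() < b / rtot else -1
    return t

runs = 2000
mc = sum(simulate_once() for _ in range(runs)) / runs
print(round(exact_mfpt(), 2), round(mc, 2))
```

The two estimates agree to within Monte Carlo error; in the paper the same averaging over repeated runs is applied to the three-dimensional model, where no closed-form reference is available.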

C. Local Property: Fluctuation around Stable States
Regarding the discrepancy between the analytical and simulation results (the left part of the line with slow promoter transition rates in Fig. 5A), the reason is as follows. In Fig. 5, from right to left the high stable state loses its stability. During this process, the standard deviation increases while the distance between the high stable point and the saddle point decreases. Our analytical results are based on local information around the high stable state, while the MC simulation results are constrained by the stability of the high stable state; therefore the inconsistency becomes apparent when the noise is high.

VI. ANALYSIS OF UPHILL PATH
We now give further analysis of the uphill path for both the diffusion process and the chemical jump process. Recall that the uphill path has the form ẋ = ∇_p H(x, ∇_x S(x)). For the diffusion process ẋ = b(x) + σ(x)·ẇ, where the ẇ_j are independent temporal Gaussian white noises with Eẇ_j(t) = 0 and Eẇ_j(t)ẇ_k(s) = δ_jk δ(t − s), the Hamiltonian is H(ϕ, p) = b(ϕ)·p + (1/2) p^T·a·p with diffusion matrix a(ϕ) = σ·σ^T. With this specific form we obtain the uphill path ẋ = b(x) + a(x)·∇_x S(x). If the drift b has the decomposition b(x) = −a·∇_x U(x) + l(x) such that l(x)·∇_x U = 0, then we have the result S(x) = 2U(x) [16], which gives the rationale why S(x) is called a quasi-potential [16]. This orthogonal decomposition also tells us that the uphill path is given by ẋ = a(x)·∇U(x) + l(x).
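The decomposition argument can be verified numerically. The sketch below (a toy two-dimensional system with quadratic U, rotational l and constant diffusion matrix, all our own choices) checks that S = 2U annihilates the Hamiltonian and that the two forms of the uphill drift coincide:

```python
# Toy check of the decomposition b = -a grad(U) + l with l . grad(U) = 0:
# then S = 2U solves H(x, grad S) = 0 and the uphill path is a grad(U) + l.
a = [[2.0, 0.0], [0.0, 2.0]]          # diffusion matrix a = sigma sigma^T

def gradU(x):  return [x[0], x[1]]    # U(x) = (x1^2 + x2^2) / 2
def l(x):      return [-x[1], x[0]]   # rotational field, orthogonal to grad U

def matvec(M, v):
    return [M[0][0]*v[0] + M[0][1]*v[1], M[1][0]*v[0] + M[1][1]*v[1]]

def b(x):                             # drift b = -a grad(U) + l
    ag = matvec(a, gradU(x))
    return [-ag[0] + l(x)[0], -ag[1] + l(x)[1]]

x = [0.7, -1.2]
gS = [2.0 * g for g in gradU(x)]      # grad S = 2 grad U

# Hamilton-Jacobi residual H(x, grad S) = b . grad S + (1/2) grad S . a grad S
hj = sum(bi * gi for bi, gi in zip(b(x), gS)) + 0.5 * sum(
    gi * agi for gi, agi in zip(gS, matvec(a, gS)))

# Uphill drift two ways: b + a grad S  versus  a grad U + l
up1 = [bi + agi for bi, agi in zip(b(x), matvec(a, gS))]
up2 = [agi + li for agi, li in zip(matvec(a, gradU(x)), l(x))]
print(round(hj, 12), up1, up2)
```

Both checks reduce to the algebraic cancellation −2∇U·a∇U + 2∇U·a∇U = 0, which holds for any U and any l orthogonal to ∇U.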
For general chemical jump processes, a similar picture still holds, but the argument based on the orthogonal-type decomposition of the drift is no longer valid once we recall that the uphill dynamics is ẋ = ∇_p H(x, ∇_x S). In general, this form does not permit specifying some matrix a and making a meaningful decomposition, because of the full nonlinearity of H. However, it is instructive to carry out the analysis under the condition that p is small. Recall that the Hamiltonian of a chemical jump process has the form H(x, p) = Σ_{j=1}^{M} a_j(x)(e^{ν_j·p} − 1); thus the uphill path is ẋ = b(x) + Σ_{j=1}^{M} a_j(x) ν_j (e^{ν_j·∇_x S} − 1), where b(x) = Σ_{j=1}^{M} a_j(x) ν_j is exactly the right-hand side of the deterministic mean-field dynamics. If p is small, we can make the approximation e^{ν_j·p} − 1 ≈ ν_j·p, and then (56) leads to ẋ ≈ b(x) + a(x)·∇_x S with a(x) = Σ_{j=1}^{M} a_j(x) ν_j ν_j^T. This is exactly the uphill path for the chemical Langevin dynamics. This analysis tells us that near the critical points, where ∇_x S is small, the optimal uphill path of a chemical jump process can be well approximated by the uphill path of a chemical Langevin dynamics. The situation with fast DNA switching is discussed in the main text.
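A quick numerical illustration of the small-p approximation (a toy one-species birth-death network with propensities of our own choosing): the jump-process uphill drift Σ_j a_j ν_j e^{ν_j p} approaches the chemical-Langevin form b + a p with an error that shrinks quadratically as p → 0.

```python
import math

aj = [3.0, 1.0]    # propensities at a fixed state (toy choices)
nu = [1, -1]       # birth and death stoichiometries

def jump_drift(p):
    """Uphill drift of the jump process: sum_j a_j nu_j exp(nu_j p)."""
    return sum(a * v * math.exp(v * p) for a, v in zip(aj, nu))

def langevin_drift(p):
    """Langevin form b + a*p with b = sum a_j nu_j and a = sum a_j nu_j^2."""
    b = sum(a * v for a, v in zip(aj, nu))
    a_diff = sum(a * v * v for a, v in zip(aj, nu))
    return b + a_diff * p

# Discrepancy between the two drifts at decreasing momenta p.
errs = [abs(jump_drift(p) - langevin_drift(p)) for p in (0.2, 0.1, 0.05)]
print(errs)
```

Halving p roughly quarters the error, consistent with the O(p²) remainder of the exponential expansion.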

VII. APPLICATION IN TRANSCRIPTIONAL CASCADES
The model we used is based on the previous work of Sara Hooshangi, Ron Weiss, and co-workers [17]. In their work, they synthesized transcriptional cascades comprising one, two, and three repression stages and analyzed the sensitivity and noise propagation as a function of network complexity. The original gene network they used is shown in Fig. S1.
FIG. S1: The network design of the three synthetic transcriptional cascades [17]. In all circuits, TetR is expressed constitutively from the P_lacIq promoter. aTc, which freely diffuses into the cell, binds TetR and prevents the repression of P_Ltet-O1. (A) In the one-stage cascade (circuit 1), eyfp expression is under the control of the TetR repressor. (B) Circuit 2 has an additional repression stage, where the expression of eyfp is controlled by the LacI protein, which can be repressed by the TetR repressor. (C) In circuit 3, eyfp expression is controlled by the CI repressor; cI expression is controlled by the LacI protein, which is under the control of TetR.

A. Step 1: Deterministic Model Described by the ODEs
To further illustrate the powerful visual effect of the quasi-potential energy landscape and the abundant quantitative information it contains, here we simplify the network by modeling each stage as a Gillespie birth-death process, with the reaction rate of each stage governed by a Hill function. The simplified model is the cascade x → y_1 → y_2 → y_3. Here, x denotes the concentration of the original inducing signal aTc, and y_1, y_2, y_3 correspond to the output concentration in each cascade, i.e. the expression levels of the proteins LacI, CI and Eyfp, respectively. This model involves 9 reactions, which we state in Table II below. Here x is a parameter denoting the concentration of aTc, a_0 the basal production rate of each protein, a_i the maximum production rate of each protein, y_i the output concentration in circuit i, K_i the microscopic dissociation constant, i = 1, 2, 3, and n the Hill coefficient. The values of the parameters are a_0 = 1, n = 2, a_i = 101, K_i = 10, i = 1, 2, 3. The corresponding deterministic equations are given by
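A minimal sketch of the deterministic cascade is given below. It uses the stated parameter values, but the repression Hill form production_i = a_0 + a_i/(1 + (u/K_i)^n) (with u the upstream species) and the unit degradation rates are our own reading of the model, not a verbatim copy of Table II.

```python
# Euler integration of the simplified three-stage repression cascade.
# Assumptions: unit degradation rates and the repression Hill form
# a0 + ai / (1 + (u/Ki)^n) with u the upstream repressor-like species.
a0, ai, Ki, n = 1.0, 101.0, 10.0, 2
x = 50.0                       # aTc-like input level (arbitrary choice)

def production(u):
    """Assumed repression Hill function: basal plus repressible term."""
    return a0 + ai / (1.0 + (u / Ki) ** n)

def rhs(y):
    y1, y2, y3 = y
    return (production(x) - y1,      # stage 1 driven by the input x
            production(y1) - y2,     # stage 2 repressed by y1
            production(y2) - y3)     # stage 3 repressed by y2

y = [0.0, 0.0, 0.0]
dt, steps = 0.01, 5000           # integrate to t = 50 (about 50 lifetimes)
for _ in range(steps):
    f = rhs(y)
    y = [yi + dt * fi for yi, fi in zip(y, f)]
print([round(v, 2) for v in y])
```

With these assumptions the alternating low/high/low steady-state pattern of the repression cascade appears directly in the integrated values, which is the deterministic backbone on which the quasi-potential landscape of the stochastic model is built.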