JW conceived and designed the experiments. KYK performed the experiments. KYK and JW analyzed the data, contributed reagents/materials/analysis tools, and wrote the paper.

The authors have declared that no competing interests exist.

Finding a multidimensional potential landscape is the key to addressing important global issues, such as the robustness of cellular networks. We have uncovered the underlying potential energy landscape of a simple gene regulatory network: a toggle switch. This was realized by explicitly constructing the steady state probability of the gene switch in the protein concentration space in the presence of the intrinsic statistical fluctuations due to the small number of proteins in the cell. We explored the global phase space of the system. We found that when the protein synthesis rate and the unbinding rate of proteins from the gene are small relative to the protein degradation rate, the gene switch is monostable with only one stable basin of attraction. When both the protein synthesis rate and the unbinding rate of proteins from the gene are large compared with the protein degradation rate, two global basins of attraction emerge for a toggle switch. These basins correspond to the biologically stable functional states. The potential energy barrier between the two basins determines the time scale of conversion from one to the other. We found that as the protein synthesis rate and the protein unbinding rate from the gene became larger relative to the protein degradation rate, the potential energy barrier became larger. This also corresponded to systems with less noise, that is, smaller fluctuations in the protein numbers. This leads to the robustness of the biological basins of the gene switches. The technique used here is general and can be applied to explore the potential energy landscape of other gene networks.

In the post-genome era, with a wealth of data on genomic sequences, the crucial question becomes how to understand the organization of these sequences in nature and how genes function [

The underlying nature of cellular networks has been explored by many experimental techniques [

Theoretical models of cellular networks have often been formulated with a set of chemical reaction equations in bulk. These averaged dynamical descriptions are inherently local. To probe the global properties, one often has to explore different parameters. Since the parameter space is huge, the issue of global robustness is hard to address directly from these approaches.

Here we will explore the nature of the network from another angle: we formulate the problem in terms of the potential energy function or potential energy landscape. If the potential landscape of the cellular network is known, the global properties can be explored [

There is another intriguing factor controlling the gene expression patterns. In the cell, there are a finite number of molecules (typically on the order of several hundreds or thousands). The intrinsic statistical fluctuations, usually not encountered in bulk due to the large-number averaging, can be significant and play an important role in the dynamics of gene expression. This gives the source of intrinsic statistical fluctuations or noise. On the other hand, the fluctuations from highly dynamical and nonhomogeneous environments of the interior of the cell give the source of the external noise for the networks [

In general, instead of studying the averaged chemical reaction network equations in bulk, we should use statistical descriptions to model the cellular process. This can be realized by constructing a master equation for the evolution of probability instead of average concentration for the corresponding chemical reaction network equations [

There are three aims of this paper. Our first aim is to develop a time-dependent Hartree approximation scheme [

As our goal is to uncover the potential energy landscape, we first studied the chemical reaction network involved in gene regulations. In particular we need to take into account the intrinsic statistical fluctuations due to the finite number of molecules in the cells. The statistical nature of the chemical reactions can be captured by the corresponding master equations. We established master equations for the gene regulations that describe the evolution of the networks probabilistically. The master equation is almost impossible to solve due to its inherent huge dimensions. We therefore used the Hartree approximation to reduce the dimensionality [

Gene expression is regulated in various and complex ways, and can be represented by many coupled biochemical reactions. In this report, our goal was not just to explain some specific gene network system as accurately as possible, but to illustrate mathematical tools for exploring the general mechanisms of transcriptional regulatory gene networks. We therefore took abstractions of some essential biochemical reactions from complicated reactions of diverse systems.

Let us start by explaining some terminology used in this manuscript: an “activator” is a regulatory protein that increases the level of transcription, and a “repressor” is a regulatory protein that decreases the level of transcription. By “operator” we mean the DNA site or the gene where regulatory proteins (either an activator or a repressor) bind. First, we are interested in the effect of “operator fluctuation,” by which we mean the biochemical reactions that change the state of the operator. The operator is said to be in an occupied state if a regulatory protein is bound to it, and in an unoccupied state if no protein is bound to it. For the repressor we include the following reaction.
_{β}_{αβ}_{AB}

Notice that the superscript 1(0) in

Next we include the transcription and translation steps. Here we ignore mRNA and consider only one step combining transcription and translation:
_{α}_{α}_{1(0)} is a protein synthesis probability per unit time, and _{α}

We can say that

The master equation is the equation for the time evolution of the probability of some specific state _{A}_{B}_{C}_{A}_{B}_{C}_{A}_{B}_{C}_{A}_{B}_{N}^{N}_{A}_{B}_{α}, 1) and one for _{α}, 0). (In fact, these are not just two equations because _{α} varies from 0 to hundreds.) With the two component vector notation,

Notice that _{αβ}_{αβ}_{β}_{β}_{αβ}. All network interactions can be determined by assigning every _{αβ}_{α}
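To make the two-component master equation concrete, the following minimal sketch integrates it numerically for a single gene until the steady state is reached. It is an illustration under stated assumptions, not the equations of this paper: the binding rate `h` is taken as a constant (in the text it depends on the regulating protein), the parameter names and values are hypothetical, the protein number is truncated at `n_max`, and a plain explicit Euler step is used.

```python
def single_gene_steady_state(g1=20.0, g0=0.0, k=1.0, h=0.5, f=1.5,
                             n_max=60, dt=0.01, t_end=20.0):
    """Integrate a two-component birth-death master equation to steady state.

    p[s][n] is the probability that the operator is in state s
    (1 = unoccupied/active, 0 = occupied/repressed) with n proteins present.
    All rates are hypothetical illustration values.
    """
    g = (g0, g1)          # synthesis rate in operator state 0 and 1
    leave = (f, h)        # rate of leaving state 0 (unbinding) and state 1 (binding)
    p = [[0.0] * (n_max + 1) for _ in range(2)]
    p[1][0] = 1.0         # start with a free operator and no protein
    for _ in range(int(round(t_end / dt))):
        new = [row[:] for row in p]
        for s in (0, 1):
            for n in range(n_max + 1):
                x = p[s][n]
                if x == 0.0:
                    continue
                if n < n_max:                 # synthesis: n -> n + 1
                    flow = g[s] * x * dt
                    new[s][n] -= flow
                    new[s][n + 1] += flow
                if n > 0:                     # degradation: n -> n - 1
                    flow = k * n * x * dt
                    new[s][n] -= flow
                    new[s][n - 1] += flow
                flow = leave[s] * x * dt      # operator switching: s -> 1 - s
                new[s][n] -= flow
                new[1 - s][n] += flow
        p = new
    return p


p = single_gene_steady_state()
prob_active = sum(p[1])
mean_protein = sum(n * (p[0][n] + p[1][n]) for n in range(len(p[0])))
print(prob_active, mean_protein)
```

With a constant binding rate, the active-state probability should approach f/(f + h) and the mean protein number should approach (g1·f + g0·h)/((f + h)·k), which gives a simple check on the integration.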

The techniques of quantum field theory can be used to solve the master equation [

For each protein concentration, a creation and an annihilation operator are introduced, such that $a^{\dagger}|n\rangle = |n + 1\rangle$ and $[a, a^{\dagger}] = 1$. The generalization to include activating proteins is straightforward. While the state vector is a simple product of individual genes, the operator product form of Ω is chosen deliberately to reproduce the original master equation. The first term of Ω is the birth–death part and plays a role in the diffusion and drift terms in the context of the Fokker–Planck equation. The second term and third term involve the couplings indexed by αβ, which encode the interactions between genes.

To complete the mean-field approximation, we need to average all interaction effects by doing an inner product with some reference state, which is a two-component generalization of the Glauber state [

We will use the Rayleigh–Ritz variational method to obtain an approximate solution of a non-Hermitian Hamiltonian system (nonequilibrium system) like _{t}_{α}_{1}, _{α}_{0}, _{α}_{1}, _{α}_{0}, α_{α}_{1}, α_{α}_{0}, λ_{α}_{1}, λ_{α}_{0} are time-dependent parameters to be determined by the variational principle. The ket ansatz is chosen as coherent state, which corresponds to a Poisson distribution. _{α}_{1} and _{α}_{0} are the probabilities of the two DNA-binding states, _{α}_{1} and _{α}_{0} as well as by means of the protein concentrations at _{α}_{1} and _{α}_{0} (from the Poisson distribution ansatz). With the following notation

Here, 〈Φ|(α^{L}^{L}^{R}_{β}| is defined by

The mean-field approximation approach should inherently provide information on moments (_{α1}, _{α0}, 〈_{α1}〉, 〈_{α0}〉,

Moment equations are more exact than the variational approach, but they cannot be used to obtain exact solutions for a system with self-interaction, for which the equations are not closed. Even in a closed system, an ansatz reduces the degrees of freedom significantly and makes the problem easier to handle. Mathematically, using an ansatz is equivalent to imposing specific relations between moments. We may, therefore, not need to track higher moments, because an infinite number of them is fixed automatically once a specific ansatz is assumed. In practice, an ansatz can be useful; the issue then is how faithful the chosen ansatz is. In this paper we used both the moment equation and the Poisson ansatz. Notice that

The final output we get from these equations is basically moments. From these moments we need to construct the total probability. There are several important features to be pointed out. We start with the single gene case.

First, notice that the total probability does not have the structure of $C_{1}P_{1} + C_{0}P_{0}$, where $C_{1}$ and $C_{0}$ are the probabilities of the two DNA-binding states and $P_{1}$ and $P_{0}$ are the protein distributions in each state. We started with a two-component column vector, and to extract the physical observables we need to take the inner product with a two-component row state vector. (In quantum mechanics, we never add the spin-up and spin-down components directly.) The total probability should therefore not be obtained by constructing $P_{1}$ and $P_{0}$ first and then weighting them by $C_{1}$ and $C_{0}$. The correct procedure is the following. With the moments, which are the solutions of the equations, we construct new moments:

In principle, we can get arbitrary order of moments and construct the corresponding probability if the equations are closed. In practice, however, we may choose one of two probability distributions: Poisson or Gaussian distributions.
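As a small illustration of this choice, the sketch below builds a Poisson distribution from the first moment alone and a Gaussian distribution from the first two moments. The function names and the example values (mean and variance of 50) are hypothetical and chosen only for illustration.

```python
import math


def poisson_pmf(n, mean):
    """Poisson probability of n, fixed entirely by the first moment (mean > 0)."""
    # The log form avoids overflow in mean**n and n! for large n.
    return math.exp(n * math.log(mean) - mean - math.lgamma(n + 1))


def gaussian_pdf(n, mean, var):
    """Gaussian density at n, fixed by the first two moments."""
    return math.exp(-(n - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


mean, var = 50.0, 50.0   # hypothetical moments; a Poisson law has var == mean
for n in (30, 50, 70):
    print(n, poisson_pmf(n, mean), gaussian_pdf(n, mean, var))
```

The Gaussian form has the extra freedom of an independent variance, so it can represent a distribution that is broader or narrower than Poisson when the second moment from the moment equations differs from the mean.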

Second, the probability obtained above corresponds to one limit point or basin of attraction. One solution of the equations determines one of the limit points and also gives the variation around the basin of attraction, so it is intrinsic. If the system allows multistability, then there are several probability distributions localized at each basin of attraction, but with different variations. Thus, the total probability is the weighted sum of all these probability distributions. The weighting factors (^{a}^{b}

Notice that the steady state solution is not enough to describe the total probability. It says nothing about the volume of the basin; it only gives the limit point. Any effort to derive an effective potential energy from the steady state solution on general grounds therefore needs to take into account the volume of the basin of attraction. One simple exception is the symmetric toggle switch, where the weighting factors are simply (0.5, 0.5) by symmetry.

Third, the total probability of many genes is simply the product of each gene based on our basic assumption, the mean field approximation. For example, the probability of a toggle switch can be written as
^{a}^{b}

Finally, once we have the total probability, we can construct the potential energy (or potential energy landscape) by the relationship with the steady state probability:

$$U(\mathbf{n}) = -\ln P(\mathbf{n})$$

This is the reverse order of the usual statistical mechanics of first obtaining the potential energy function, Boltzmann weighting it exponentially, and then studying the partition function or probability of the associated system. Here we look for the inherent potential energy function from the steady state probability. In the gene-network system, every chemical parameter, such as the protein production/decay rates and binding/unbinding rates, will contribute to the fluctuation of the system. All these effects are encoded in the total probability distribution, and, consequently, in the underlying potential energy landscape.
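The construction of a landscape from a steady state probability can be sketched as follows. This is a hedged illustration, not the calculation of this paper: it assumes that each basin of a symmetric toggle switch is a product of two Poisson distributions (one protein high, the other low), that the two basins are combined with weights (0.5, 0.5), and that the landscape is the negative logarithm of the probability. The means 40.5 and 2.5 and all function names are hypothetical.

```python
import math


def poisson_pmf(n, mean):
    """Poisson probability of n for a given mean (mean > 0)."""
    return math.exp(n * math.log(mean) - mean - math.lgamma(n + 1))


def toggle_landscape(mean_high=40.5, mean_low=2.5, w_a=0.5, w_b=0.5, n_max=80):
    """Return U[n_A][n_B] = -ln P(n_A, n_B) for an assumed two-basin mixture."""
    high = [poisson_pmf(n, mean_high) for n in range(n_max + 1)]
    low = [poisson_pmf(n, mean_low) for n in range(n_max + 1)]
    u = []
    for n_a in range(n_max + 1):
        row = []
        for n_b in range(n_max + 1):
            # Basin a: A high, B low. Basin b: A low, B high.
            p = w_a * high[n_a] * low[n_b] + w_b * low[n_a] * high[n_b]
            row.append(-math.log(p))
        u.append(row)
    return u


u = toggle_landscape()
u_min, n_a_min, n_b_min = min(
    (u[a][b], a, b) for a in range(len(u)) for b in range(len(u))
)
print("deepest point:", (n_a_min, n_b_min), "U =", u_min)
print("U on the diagonal at (20, 20):", u[20][20])
```

The two basins sit at mirror-image positions and have equal depth, while points on the diagonal between them lie much higher, which is the double-well picture described in the text.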

We looked at an important example of two genes interacting with each other. The interactions are through the proteins synthesized by the genes, which act back to regulate the gene switch. Bacteriophage lambda is a good biological example of a toggle switch. Its lysogenic and lytic states are both stable and robust. It has been a long-standing problem to explain why the lambda phage is so stable [

All applications to specific network systems start with

For the symmetric switch, we first solved the equations of motion determining the amplitude, the mean, and the higher order moments of the probability distribution of the protein concentrations of the corresponding genes. These are given below.

We solved the master equation with two methods. One is the Poisson ansatz, mentioned above, by assuming the inherent Poisson distribution, and the other is the exact method, using the moment equation. For the inherent Poisson distribution, we can write down the equations of motion for the amplitude and mean.

Here we used the normalization of the probabilities of the two DNA-binding states ($C_{\alpha 1} + C_{\alpha 0} = 1$), and recollected terms.

For the exact solution with moment equations, we also wrote down the equations of motion of the moment of protein concentration of the corresponding genes.

By giving some initial conditions and taking the long time limit, we obtained the steady state solution. We fixed all parameters except the protein synthesis rate $g_{A1} (= g_{B1})$. We looked at the probability that the genes are in the active state as a function of the synthesis rate relative to the degradation rate. By increasing the synthesis rate, $g_{A1}$, we could observe the bifurcation from the monostable state to the bistable state after passing a certain critical point. The control parameter is $X_{ad} \equiv g_{1}/2k$ in our choice, where $k$ is the protein degradation rate.
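The qualitative bifurcation described here can be reproduced with a much simpler stand-in model. The sketch below is not the mean-field equations of this paper: it assumes textbook deterministic rate equations for a symmetric toggle switch with Hill-type repression of cooperativity 2, in dimensionless units, and integrates them with an explicit Euler step from a slightly asymmetric start. The function name and parameter values are hypothetical.

```python
import math


def toggle_steady_state(g, k=1.0, x0=1.0, y0=0.0, dt=0.01, t_end=100.0):
    """Long-time limit of dx/dt = g/(1+y^2) - k*x, dy/dt = g/(1+x^2) - k*y."""
    x, y = x0, y0
    for _ in range(int(round(t_end / dt))):
        dx = g / (1.0 + y * y) - k * x   # x is repressed by y
        dy = g / (1.0 + x * x) - k * y   # y is repressed by x
        x, y = x + dt * dx, y + dt * dy
    return x, y


for g in (1.0, 4.0):
    x, y = toggle_steady_state(g)
    print("g =", g, "steady state:", (x, y), "|x - y| =", abs(x - y))
```

For this stand-in model with k = 1, the symmetric state x = y is the only stable state below g = 2; above it, the trajectory settles into one of two asymmetric states with x·y = 1 and x + y = g. The critical value is specific to the assumed Hill form and is not a result of the paper.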

Exact moment equation solutions are compared with Poisson ansatz solutions for $0 < X_{ad} (\equiv g_{1}/2k) < 100$, with $g_{A0} = g_{B0} = 0$.

The other parameters are the same as

In the parameter range in which the bistability occurs, we found two limit points (named

For the symmetric toggle switch case, the weight factor was simply (0.5, 0.5) due to symmetry. The change of the probability distribution shape in terms of the adiabatic parameter of the relative importance of the protein synthesis rate compared with the degradation rate is shown in

As we discussed, the steady-state distribution function

The other parameters are the same as

We can see that when the protein synthesis rate is small relative to the degradation rate, only a single basin of attraction exists for the underlying potential energy landscape. For a large enough protein synthesis rate relative to the degradation rate, two basins of attraction emerge. Once we have the potential energy landscape, we can discuss the global stability of the gene regulatory networks. The time scale of the transition between the two stable basins of attraction can be estimated by $\tau \sim \tau_{0}\exp[U^{\neq} - U_{min}]$, where $\tau_{0}$ is the pre-factor and $\tau$ is the time scale of transition from one basin of attraction to the other. $U^{\neq}$ is the potential energy at the saddle point between the two stable basins of attraction, $U_{min}$ is the potential energy at the minimum of a basin, and $U^{\neq} - U_{min}$ is the barrier height.
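The barrier estimate can be illustrated on a one-dimensional cut of a bimodal landscape. The sketch below is an assumption-laden example rather than the calculation of this paper: it takes the steady state probability to be an equal-weight mixture of two Poisson distributions, defines the landscape as the negative logarithm of that probability, and reads off the barrier as the highest point between the two minima. The means and function names are hypothetical, and the pre-factor is left out, so only the ratio of the transition time to the pre-factor is reported.

```python
import math


def poisson_pmf(n, mean):
    """Poisson probability of n for a given mean (mean > 0)."""
    return math.exp(n * math.log(mean) - mean - math.lgamma(n + 1))


def barrier_heights(mean_low, mean_high, n_max=120):
    """Barriers seen from the low and the high basin of U(n) = -ln P(n)."""
    u = [-math.log(0.5 * poisson_pmf(n, mean_low) + 0.5 * poisson_pmf(n, mean_high))
         for n in range(n_max + 1)]
    mid = int((mean_low + mean_high) / 2.0)
    n_low = min(range(0, mid + 1), key=lambda n: u[n])           # low-basin minimum
    n_high = min(range(mid + 1, n_max + 1), key=lambda n: u[n])  # high-basin minimum
    u_saddle = max(u[n_low:n_high + 1])                          # top of the barrier
    return u_saddle - u[n_low], u_saddle - u[n_high]


for mean_high in (20.5, 40.5):
    from_low, from_high = barrier_heights(2.5, mean_high)
    # tau / tau_0 ~ exp(barrier), following the estimate in the text
    print(mean_high, from_low, from_high, math.exp(from_high))
```

Moving the two basins further apart deepens the valley between them, so the barrier and the estimated transition time both grow, in line with the trend described in the text.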

The other parameters are the same as

This illustrates how biological robustness is realized for the toggle switch. As the protein synthesis rate and the unbinding rate of protein from DNA increase relative to the degradation rate, more proteins are synthesized. These proteins are strong repressors. This leads to smaller fluctuations. Furthermore, the associated barrier height between the two basins of attraction becomes large, and the two basins of attraction become more stable, since it is harder to go from one well to the other. Small fluctuations and large barrier heights thus both serve as sources of the robustness and stability of the gene toggle switch: the system is less likely to change from one basin of attraction to the other. The robustness issue is not yet well understood for cellular networks in general. Here we explored the robustness of the switches against the intrinsic statistical fluctuations coming from the finite number of protein and DNA molecules. This is clearly very important and has potential applications to the robustness problem of lambda phage in bacteria.

We also studied the time evolution of the probability and the potential energy landscape with dynamic equations. We chose the specific parameters and initial conditions to illustrate the idea. The results are shown in


Finding the multidimensional potential energy landscape is the key to addressing important global issues such as the robustness of cellular networks. We have uncovered the underlying potential energy landscape of a simple gene network: the toggle switch. We found that as the protein synthesis rate and the rate of protein unbinding from DNA change from small to large relative to the degradation rate, the underlying potential energy landscape changes from having one basin of attraction (monostable) to having two (bistable). These basins correspond to stable, biologically functional states. The potential barrier between the two basins determines the time scale of conversion from one to the other. We found that as the protein synthesis rate and the rate of protein unbinding from DNA became greater relative to the degradation rate, the potential energy barrier became greater and the statistical fluctuations were more strongly suppressed. This leads to the robustness of the biological basins of the gene switches.

In principle, our approach can be generalized to more realistic networks involving multiple genes as well as additional levels of regulation. This could be realized by averaging the interactions among genes in the corresponding master equations, which effectively reduces the dimensionality of the problem from an exponential to a polynomial number of degrees of freedom. It is worthwhile to note the limitation of this approach: when the interactions among genes are very strong, our approach is less effective.

Recently, synthetic biology became an important part of systems biology [

The adaptive landscape idea was first introduced into biology by S. Wright in the 1930s [

This model can be modified to include more biochemical reactions. To investigate the role of mRNA, we can consider the transcription and translation process separately. To focus on the statistical fluctuations of genes turning on and off, it is possible to generalize the formalism to compute the statistical fluctuations quantitatively. We also can take into account the spatial variation of the state variables, such as the number of proteins.

JW would like to thank Professor Peter G. Wolynes, Professor Ping Ao, and Dr. Xiaomei Zhu for helpful discussions.