Critical Dynamics in Genetic Regulatory Networks: Examples from Four Kingdoms

The coordinated expression of the different genes in an organism is essential to sustain functionality under the random external perturbations to which the organism might be subjected. To cope with such external variability, the global dynamics of the genetic network must possess two central properties. (a) It must be robust enough as to guarantee stability under a broad range of external conditions, and (b) it must be flexible enough to recognize and integrate specific external signals that may help the organism to change and adapt to different environments. This compromise between robustness and adaptability has been observed in dynamical systems operating at the brink of a phase transition between order and chaos. Such systems are termed critical. Thus, criticality, a precise, measurable, and well characterized property of dynamical systems, makes it possible for robustness and adaptability to coexist in living organisms. In this work we investigate the dynamical properties of the gene transcription networks reported for S. cerevisiae, E. coli, and B. subtilis, as well as the network of segment polarity genes of D. melanogaster, and the network of flower development of A. thaliana. We use hundreds of microarray experiments to infer the nature of the regulatory interactions among genes, and implement these data into the Boolean models of the genetic networks. Our results show that, to the best of the current experimental data available, the five networks under study indeed operate close to criticality. The generality of this result suggests that criticality at the genetic level might constitute a fundamental evolutionary mechanism that generates the great diversity of dynamically robust living forms that we observe around us.


Introduction
There is evidence that many complex dynamical systems found in nature are critical; namely, they operate close to a phase transition between two different dynamical regimes [1]. Avalanches [2], atmospheric phenomena [3,4], financial markets [5,6], earthquakes [7,8], granular matter [9], and the brain [10][11][12], are typical examples. Critical systems exhibit remarkable properties which would be difficult to explain without the assumption of criticality. For instance, they can integrate, process and transfer information faster and more reliably than non critical systems [13]. Or they can detect and respond to external stimuli whose intensities span several orders of magnitude, like the brain [11]. These remarkable properties are mainly a consequence of the long-range correlations that emerge close to the critical point, producing collective behaviors and coordinated responses of the entire system. Thus, criticality confers on the system the ability to collectively respond and adapt to an often rapidly changing environment.
In the context of genetic regulatory networks (GRN), which are recognized as the main component in charge of cellular control [14], recent theoretical studies have shown that robustness and adaptability, two central properties of living organisms [15][16][17][18][19][20], exist simultaneously with the highest probability only in GRN operating at or close to criticality [21]. Thus, criticality is a property that can help us understand how the coordinate expression of the different genes in an organism is achieved under external perturbations, either to sustain cell functionality or to generate new phenotypes in order for the organisms to change and adapt to new environments [15][16][17][18][19][20][21]. Therefore, it is important to determine whether the GRN of real organisms are dynamically critical. Although some attempts have recently been made in order to answer this question [22][23][24][25], the definite answer has remained elusive for several decades. Here we present direct evidence that the GRN of five different organisms indeed exhibit critical dynamics. We do so by simulating in the computer the avalanche of perturbations in the gene expression profile of the genetic networks of these organisms. This allows us to compute the Derrida mapping M(x) for the five networks under consideration [26]. We will formally introduce the Derrida map in a further section. For the time being, it suffices to say that M(x) relates the size x(t) of the perturbation avalanche at time t, with the size x(t+1) of the avalanche at the next time step t+1. In other words: x(t+1) = M(x(t)).
Therefore, M(x) contains all the information of the perturbation dynamics and can be used to directly measure the dynamical regime in which the network operates. Using this technique, we show that the dynamics of these avalanches are critical within numerical accuracy for the five different organisms studied.
However, in computing M(x) for the large networks of E. coli, S. cerevisiae and B. subtilis, we face the problem that the overwhelming majority of the regulatory functions (also termed regulatory phrases) that determine the combined effect of the regulators on their target genes are unknown. To circumvent this difficulty, we used random functions to model the dynamics of these networks. Although random, these functions were constructed in accordance with the fraction of positive regulatory phrases inferred from real gene expression profiles. Thus, the internal structure of the regulatory functions that we used for E. coli, S. cerevisiae and B. subtilis is statistically consistent with the one observed in microarray experiments.
Another important aspect that determines the dynamical regime in which the network operates (ordered, critical or chaotic) is the fraction of canalizing functions [27][28][29][30][31], which will be defined in a further section. Intuitively, these functions take into account the existence of dominant regulators such that, when present, override the effect of the other regulators. From the microarray experiments that we analyze it was not feasible to infer the fraction of canalizing functions present in the regulatory networks of E. coli, S. cerevisiae and B. subtilis. However, for these networks we varied in our simulations the fraction of canalizing functions around the statistically expected values. Interestingly, we observe a significant robustness of the critical dynamics under the addition or elimination of canalizing functions. This suggests that the critical behavior observed in the dynamics of the genetic networks of the organisms under study, is mainly produced by the network architecture rather than by the specific nature of the regulatory functions.
In the sections that follow we first present the Boolean network model that we use to implement the dynamics of the genetic networks, and the well known mean-field results that predict the existence of a phase transition from ordered to chaotic dynamics in this model. Then, we go beyond the mean-field theory by implementing the Boolean approach in the networks of real organisms, and show that in all the cases the Derrida map M(x) is consistent with critical dynamics. We then analyze how this map changes under the addition and removal of canalizing functions. In the last section we summarize and discuss our results.

Boolean Models of Genetic Networks
Several models have been proposed to analyze the dynamics of GRN [32][33][34][35]. Although the details of the dynamics might change from one description (e.g. continuous) to another (e.g. discrete), we expect the general properties of the dynamics, such as criticality, to be model independent. In fact, recent work shows that continuous and discrete descriptions of GRN exhibit similar dynamical properties under very general conditions [36]. Here we use the Boolean approach [37][38][39], in which every gene is represented by a discrete variable g that can take two values: g = 1 if the gene is expressed and g = 0 otherwise. The genome is thus represented by a set of N binary variables, g 1 , g 2 , …, g N . The expression of each gene g n changes in time according to the equation where g n1 ,g n2 , . . . ,g nk n È É are the k n regulators of g n , and F n is a Boolean function (also known as a logical rule), which is constructed according to the inhibitory or activatory nature of the regulators. The value acquired by the Boolean function for each configuration of the regulators is termed a regulatory phrase. For instance, if F(g 1 , g 2 ) is a function of the two regulators g 1 and g 2 such that F(1,1) = 1, F(1,0) = 1, F(0,1) = 0, and F(0,0) = 0 , then this  function consists of the four regulatory phrases {1,1}R1,  {1,0}R1, {0,1}R0, and {0,0}R0. We will refer to the regulatory phrases for which F = 1 as activatory, and those for which F = 0 as inhibitory. The fraction p of activatory phrases in the entire network, called the gene expression probability, is an important parameter that controls the dynamical regime in which the network operates (i.e. ordered, critical or chaotic). Recent work shows that the Boolean approach does capture the main aspects of the gene regulation dynamics, for it is able to reproduce gene expression patterns observed experimentally for several organisms [35,[40][41][42][43].

Phase Transition in the Boolean Network Model
In this section we present the mean-field theory results that show the existence of a dynamical phase transition from ordered to chaotic dynamics in the Boolean network model [26,38,44]. This allows us to introduce the tools that we use to characterize the dynamical regime in which the network operates. Although the phase transition was first obtained within the context of the mean-field approximation, we will show in the next section the remarkable result that the phase transition also occurs, almost identically, in networks with realistic topologies, for which the mean-field assumption does not necessarily apply. The phase transition is characterized by the temporal evolution of the Hamming distance x(t) between two different dynamical trajectories produced by two slightly different initial conditions. From a biological point of view, the Hamming distance x(t) is the average normalized size at time t of the avalanche of perturbations in the gene expression profile, produced by the perturbation (e.g. gene knockout or gene over expression) of a small fraction x(0) of genes at time t = 0. The temporal evolution of x(t) is given by a dynamical mapping x(t+1) = M(x(t)) which relates the size of the perturbation avalanche at two consecutive time steps [26,38,44]. Given an initial perturbation x(0) at time t = 0, successive iterations of this mapping will eventually converge to a stable fixed point x ?~l im t??
x t ð Þ, which is the final size of the perturbation avalanche. The value x ' is the order parameter that characterizes the dynamical regime in which the network operates. Thus, if x ' = 0 (ordered regime), all the initial perturbations die out over time. On the contrary, if x ' .0 (chaotic regime), the initial perturbation of even a small fraction of genes propagates across the entire system, finally altering the expression of a finite fraction x ' of genes in the genome. It turns out that M(x) is a continuous convex monotonically increasing function of x (in the mean field theory M(x) is a polynomial), with the property that M(0) = 0 and M(1),1. Therefore, there is only one parameter that controls the phase transition, the so-called average network sensitivity S, which is given by S = [dM(x)/dx] x = 0 . If S,1 then the only fixed point is x ' = 0, whereas if S.1 then x ' .0. The phase transition occurs at S = 1, for which the fixed point x ' = 0 is only marginally stable. In general, S depends on p and on the topology of the network. The dynamical mapping M(x) contains all the information about the dynamical regime in which the network operates. This is true even if M(x) cannot be obtained through a mean-field computation, which is the case for real networks. In the Supporting Information (Text S1) we provide a Java applet with the animation of the perturbation dynamics in networks with homogeneous random topology.
Beyond the mean-field theory: Existence of the Phase Transitions in Networks with Realistic Topologies The mean-field theory that predicts the existence of the phase transition controlled by the average network sensitivity S, is based on the assumption that all the genes in the network are statistically independent and statistically equivalent [26,44]. However, this assumption is certainly not true for real GRN, due to the existence of global regulators which correlate the expression of a large fraction of genes. Indeed, recent large-scale analysis [21] of the transcription regulatory networks of E. coli, S. cerevisiae, and B. subtilis indicate that these networks exhibit a Poisson-like input topology and a scale-free output topology (see Figure 1a). The output scale-free topology correlates the expression of a large fraction of the genes, and the assumption of statistical independence is not longer satisfied. Therefore, we do not expect the mean-field theory to be applicable for real networks because of their topological characteristics. Nonetheless, the phase transition predicted by the mean-field theory is identical to the one observed in randomly constructed Boolean networks with topologies statistically equivalent to the ones observed in real GRN. This is shown for the first time in Figure 1b. This result is quite remarkable, for we know that in many other systems the phase transition strongly depends on the network topology, and can even disappear for topologies that induce strong correlations between the elements (such as the scale-free topology, [45]).

Critical Dynamics in Real Genetic Networks
To determine the dynamical regime in which the genetic network of a real organism operates, we have to compute the dynamical mapping M(x) directly from experimental data, without any meanfield assumptions. The actual form of M(x) depends on both the network topology and the particular set of Boolean functions. We computed numerically M(x) for five genetic networks: The network of flower morphogenesis in A. thaliana (15 genes; [41]); the network of segment polarity genes in D. melanogaster (60 genes; [42]); and the gene transcription networks of E. coli (1481 genes; [46]), S. cerevisiae (3459 genes; [47]), and B. subtillis (830 genes; [48]). In the first two cases the Boolean functions are already known. However, the overwhelming majority of regulatory phrases for the gene transcription networks of E. coli, S. cerevisiae and B. subtilis are still unknown. Due to this lack of information, to implement the Boolean dynamics on the GRN of these three organisms we used biased random Boolean functions generated with a gene expression probability p inferred from microarray experiments. (In the next section we show that the map M(x) does not change significantly for networks with a large fraction of canalizing functions.) Given the network topology, p can be estimated from microarray experiments by standard Bayessian parametric inference with two states (see Methods). Using 223 microarrays to sample the gene expression space in S. cerevisiae [49], 152 microarrays for E. coli [49][50][51], and 69 microarrays for B. subtillis [52], we inferred several regulatory phrases for each of these organisms. All the experiments used for this analysis are listed in the Supporting Information accompanying this article (Text S1). We only used regulatory phrases for which the average of the a posteriori probability distribution was greater than 70% (activatory phrase) or smaller than 30% (inhibitory phrase). The inference technique is explained below in the Materials and Methods section. With this technique, we obtained the following gene expression probabilities: p = 0.57660.038 for E. coli (from 264 inferred phrases), p = 0.49560.055 for S. cerevisiae (from 196 phrases), and p = 0.053160.035 for B. subtilis (from 307 phrases). We then constructed random Boolean functions with internal bias given by these probabilities.
We report in Figure 2 the Derrida curves (i.e. the graphs of M(x)) for the five genetic networks under consideration. It is apparent from this figure that in all five cases the Derrida curves arrive almost tangent to the identity close to the origin. The above is consistent with critical behavior at the genetic level. A polynomial regression analysis allows us to estimate the average network sensitivity S = [dM(x)/dx] x = 0 by computing the slope at the origin of the best-fit polynomial with a degree equal to the maximum number of regulators per gene in each case (this is the polynomial predicted by the mean-field theory). In all five cases the Regression Sum of Squares is below 10 24 , and the average network sensitivities obtained in this way are: S = 1.028 for E. coli; S = 1.036 for S. cerevisiae; S = 0.826 for B. subtilis; S = 0.914 for D. melanogaster; and S = 1.127 for A. thaliana. Within numerical accuracy, these sensitivities show that the dynamics of these For this network the probability for a given gene to have K regulators is P(K) = e 2z z K /K! (Poisson input topology), whereas the probability to be a regulator to n other genes is P(n) = Cn 2c (scale-free output topology). (b) Phase transition diagram showing x ' as a function of S. The solid line in black is the theoretical result predicted by the mean-field theory. The dashed line in red was computed numerically for randomly constructed Boolean networks with N = 1000, Poisson input topology and scale-free output topology. The Boolean functions were randomly generated with a probability of gene expression p = 0.5. Remarkably, the phase transition predicted by the mean field theory is identical to the one obtained for random Boolean networks with topologies statistically equivalent to the real ones. doi:10.1371/journal.pone.0002456.g001 networks are very close to criticality. Note that this is particularly true for the two largest and most complete networks of E. coli and S. cerevisiae.

Criticality and its robustness revealed by varying the degree of canalization
We have used random Boolean functions in our simulations because the overwhelming majority of the logical rules for E. coli, S. cerevisiae and B. subtilis, are still unknown. The caveat is that real biological functions are not random. It has been pointed out that canalizing functions are more realistic from a biological point of view [27,28]. A formal definition of canalizing functions can be found in Ref. [31]. Here, it suffices to define a canalizing function as follows. Let F(g 1 , g 2 , …, g k ) be a Boolean function of k arguments. We will say that F is canalizing on one of its arguments g i , if the value of F is determined by fixing the value of g i either to 0 or to 1. To illustrate this concept, Table 1 shows a function F(g 1 , g 2 , g 3 ) that depends on three arguments. In this example, F = 0 whenever g 2 = 1, regardless of the values of g 1 and g 3 . Therefore, F is canalizing on g 2 . The biological significance of canalizing functions is based on the existence of dominant regulators. Thus, in the example shown in Table 1, g 2 could represent a dominant repressor (like crp in E. coli) which, when present, blocks the transcription of the target gene regardless of the presence or absence of the activators.
It is known that the amount of canalizing functions in the system can change the dynamical regime in which the network operates [29,30]. Therefore, it is important to determine if the dynamics of the large networks of E. coli, S. cerevisiae and B. subtilis are still critical when more realistic Boolean functions are used. However, from the microarray experiments that we analyzed it is impossible to know the fraction of canalizing functions present in these organisms, or if such functions have one, two or more canalizing inputs. For instance, according to the regulonDB, which gives the topology of the transcription regulatory network of E. coli [46],  [46], S. cerevisiae [47], and B. subtilis [48], D. melanogaster [42] and A. thaliana [41]. Each point in the Derrida curve is the average over 20000 initial perturbations and the error bars indicate one standard deviation around this average. Additionally, the error bars in the curves of E. coli, B. subtilis and S. cerevisiae incorporate the uncertainty in the estimation of the p bias inferred from the microarray experiments. For comparison we show in the upper left corner three Derrida curves of randomly constructed networks with Poisson input topology operating in three dynamical regimes: Ordered (green), critical (black) and chaotic (blue). Note that criticality is characterized by the tangency of the Derrida curve to the identity close to x = 0. doi:10.1371/journal.pone.0002456.g002 Table 1. Example of canalizing function. there are at least 817 genes with two or more regulators. These 817 genes are all regulated by a subset of 160 genes. Therefore, to determine whether or not the Boolean functions associated with these genes are canalizing, one would have to analyze microarray experiments probing at least 2 160 different configurations for these genes. This is impossible for many reasons, but mainly because not all the 2 160 configurations of the regulators are biologically attainable. (Consider for instance the configuration 00000….00 in which all the regulators are ''off'', or the configuration 11111…11 in which all the regulators are ''on''. Clearly, these two configurations are not attainable under any biological condition-without killing the organism.) Therefore, the 152 microarrays used to sample the gene expression space in E. coli represent a very tiny fraction of all the possible configurations necessary to determine the whole set of Boolean functions. However, these 152 experiments are absolutely relevant because they represent the observable and biologically attainable gene expression configurations of the organism. Thus, it might be irrelevant if the whole Boolean function of a given gene is canalizing because neither us nor the organism are sampling its whole set of (mathematically possible) configurations. For this reason, we do believe that the important quantity is the observed gene expression probability for the biologically attainable configurations. Nonetheless, Random Boolean functions already contain canalizing functions. Table 2 gives the probability P c (K) for a randomly generated Boolean function with K inputs to be canalizing on at least one of its inputs (data taken from [31]). As we can see from Table 2, the probability for a randomly generated Boolean function to be canalizing is high for K = 1, 2 and 3 (Boolean functions with K = 1 are canalizing, by definition). On the other hand, Table 3 shows the distribution P E (K) of the number of genes with K inputs in the genetic network of E. coli, according to the last update of the regulonDB [46].
From Tables 2 and 3 Tables 2 and 3 we obtain that the fraction f c of canalizing functions present just by chance for the genes with K$2 is given by f c <421/817<0.51. Therefore, in the simulations with random Boolean functions there is already a large fraction of canalizing functions. To further investigate the effect that canalizing functions might have on the dynamics of the genetic networks, we have added and removed canalizing functions to and from the ones already present just by chance. Our results indicate that the critical behavior observed in the genetic dynamics still exists even when the fraction of canalizing functions substantially deviates from the value expected for fully random functions. Figure 3 shows the map M(x) for the E. coli network for several values of the fraction f c of genes with K$2 regulated by canalizing functions. Figure 3a corresponds to the case in which there are more canalizing functions than the ones present just by chance, whereas Figure 3b shows the opposite case in which there are less canalizing functions. To compute f c we have ignored the genes with only one regulator, taking into account only the 817 genes with two or more regulators. We have already mentioned that in the E. coli network, f c = 0.51 for fully random Boolean functions. Additionally, from Table 2 we also see that for K = 2 and K = 3 the probability P c (K) for a random Boolean function to be canalizing is relatively high (P c (2) = 0.875 and P c (3) = 0.468), whereas for K$4 the probability is extremely low. Therefore, what we did to increase the value of f c was to add canalizing functions only to the genes with K$4 in such a way as to preserve the overall value p = 0.57660.038 of the gene expression probability observed in microarray experiments. There are 230 genes with K$4, and we can consider that none of them are regulated by canalizing functions just by chance (the probability is very low). Therefore, to increase f c we forced a fraction q of these 230 genes to be regulated by canalizing functions. In Figure 3a     two or more inputs are regulated by canalizing functions. Only for f c .0.7 significant deviations from criticality are observed (dotteddashed curves in Figure 3). Analogously, to decrease the value of f c we removed a fraction q of the canalizing functions from the 421 genes which originally are regulated by canalizing functions just by chance. In other words, we forced a fraction q of these 421 genes to be regulated by noncanalizing functions. In this case, f c is given by f c = 421(12q). Figure 3b shows the Derrida maps for q = 0.1, 0.3, 0.5, 0.7 and 0.9.
From Figure 3 it is clear that the Derrida map is less robust to the removal than to the addition of canalizing functions. However, in the region 0.4#f c #0.66 the Derrida map does not seem to change substantially.
To indicate the significance of the above results we present in Figure 4 the map M(x) for homogeneous random networks with different fractions f c of canalizing functions. In these networks each gene has exactly K = 2 regulators chosen randomly from anywhere in the system, and the bias p of the Boolean functions is p = 0.5. With fully random functions the network operates in the critical regime. Figure 4(a) corresponds to the addition and Figure 4(b) to the removal of canalizing functions. In both cases we used networks with N = 1481 (the same number of genes as in the E. coli network). In a homogeneous random network with N genes and connectivity K, just by chance there are N c = N6P c (K) genes regulated by canalizing functions, and N nc = N6(12P c (K)) genes regulated by non-canalizing functions. By forcing a fraction q of these N nc genes to be regulated by canalizing functions, the overall fraction f c of canalizing functions in the network increases according to f c = (N c +q6N nc )/N. Analogously, by removing a fraction q of the N c canalizing functions present just by chance, the fraction f c of remaining canalizing functions is f c = N c (12q). In Figures 4(a) and 4(b) we present results for q = 0.1, 0.3, 0.5, 0.7 and 0.9, which produce the corresponding values of f c displayed in the Figure. As we can see from Figure 4, for homogeneous random networks the addition or removal of even a small fraction of canalizing functions, on top of the ones that are already present by chance, has a much bigger effect on the dynamics than for the E. coli network.  These results strongly suggest that the critical behavior observed in the dynamics of the genetic networks of real organisms, and its robustness to changes in the fraction of canalizing functions, are most probably due to the network architecture rather than to the logical rules that regulate the expression of the genes. If, for instance, there were a large fraction of genes with 5 inputs in the E. coli network, then the presence of canalizing functions would certainly affect the dynamical regime in which the network operates. But given that most of the genes have 1, 2 or 3 inputs, there is already a large fraction of canalizing functions just by chance. Adding more does not substantially affect the dynamical behavior.

Discussion
Our results show that, to the best of the current experimental data available, and within numerical accuracy, the Boolean dynamics of the GRN of five organisms from four different kingdoms are critical. We have used two small well documented networks for specific patterning processes in plants (A. thaliana) and animals (D. melanogaster), for which both the topology and the Boolean functions are known and correspond to thoroughly studied processes at the molecular level. We also tested larger GRN for unicellular organisms (E. coli, B. subtilis and S. cerevisiae) in which the logical rules are not known. Thus, in the absence of other knowledge, we used random regulatory phrases constructed with gene expression probabilities inferred from microarray experiments. How ever, the results are essentially the same when the fraction of canalizing functions with the same gene expression probabilities varies considerably. The fact that all these GRN, constructed with completely different approaches, for distinct organisms and of different sizes exhibit critical Boolean dynamics is of outmost interest and strongly suggest that this might be a generic characteristic with far reaching consequences. For there is evidence that criticality confers clear evolutionary advantages to living organisms, because it is only close to criticality that robustness and adaptability can coexist. It remains an open problem to determine how criticality has emerged throughout evolution, i.e. to devise biologically relevant models of network growth that generate critical dynamics. Such models must take into account not only the evolution of the network topology, but also the emergence of the regulatory phrases through which the genes interact. If this critical behavior is corroborated as more and better experimental data become available, and with more detailed dynamic models, criticality at the genetic level may become a fundamental evolutionary mechanism that renders the stability and diversity that we observe in living organisms.

Phase transition in networks with realistic topology
We computed the phase transition displayed in Fig. 2(b) by implementing the Boolean dynamics in networks with scale-free output topology and Poisson input topology. Such networks are easily generated in the computer by firs assigning to each gene g n its number of outputs l n , taken from a scale-free probability distribution P o (l) = Cl 2c , where C is the normalization constant and c is the scale-free exponent. Once every gene has been assigned with a number of outputs, the l n outputs of each gene g n are chosen randomly from anywhere in the network. By this process, the inputs of each gene are automatically set with a Poisson distribution P i k ð Þ~e {z z k k! whose average z depends on the scale free exponent c. There is a fraction P i (0) = e 2z of genes which do not have inputs and hence remain frozen throughout the temporal evolution of the system. Those genes were not perturbed. Only genes with a nonzero number of inputs were perturbed. We used networks with N = 1000 genes since this is the order of magnitude of the gene transcription networks available in the databases. We run the dynamics for 1000 time steps, starting out from two initial conditions differing in 20 genes (2%). After these 1000 time steps we computed the Hamming distance between the resulting states. Each point in Fig. 2(b) is the average of this Hamming distance over 10000 different pairs of initial conditions.
Once we know the regulators of every gene in the network, the Boolean functions are assigned as follows. A Boolean function of k inputs has 2 k values, one for each of the 2 k configurations of the k inputs. For each of these 2 k configurations we generate a random number z uniformly distributed in the interval [0,1]. If z#p we set the value of the Boolean function equal to 1 for the corresponding configuration of the inputs. If z.p, we set the value of the Boolean function equal to 0. We repeat this process for all the configurations of the Boolean function and for all the Boolean functions in the network. In this algorithm the parameter p is the gene expression probability inferred from microarray experiments.

Data Normalization
The data were retrieved from SMD as Log Ratios (base 2). These data were already background corrected and mean normalized by SMD itself. Only features with no flags were selected. Data from (Gosset et al, 2004;Zhang et al, 2005) were background corrected and normalized by Affymetrix Microarray Suite 5.0. Log Ratios (base 2) were then calculated between the wild type experiments and the mutants with and without glucose. The data from the KEGG Expression Database were already normalized when retrieved. They were only background corrected by us; Log ratios (base 2) were obtained.

Inference Algorithm
Only Log Ratios greater than a certain threshold T 0 were considered (see Assessing Inference Success). With the filtered data we performed standard Bayesian parametric inference with variables of two states, inhibited (repressed) and induced (activated). The Equivalent Sample Size was set equal to 4. The induction or repression of a gene were established if the average of the a posteriori distribution was equal to or greater than 70% for induction, and equal to or lower than 30% for repression. A detailed example of the inference algorithm is presented below.

Assessing inference success
The nature of the regulation (activation or repression) of many of the established regulations in the networks of E. coli and B. subtilis is reported in the corresponding databases (Salgado et al, 2006;Makita et al, 2004). From this information it is possible to generate regulatory phrases only for those genes with one regulator. Thus we compared the one-regulator phrases already reported with our inferred one-regulator phrases. The inference success was established as the percentage of coincidences between our inferred phrases and the reported ones. The inference success increases as the threshold T 0 increases. However, the total number of phrases that can be inferred diminishes as T 0 increases. A good compromise between low percentage of false positives (high inference success) and a good statistics was achieved at T 0 = 1.50 for E. coli, and T 0 = 1.30 for B. subtilis. From these results we found that a good threshold for S. cerevisiae, for which the nature of the regulations is not reported, is T 0 = 1.50. Note that a threshold of 1.5 in Log 2 Ratios is equivalent to almost a three fold change in expression intensities.

Example of inference
We illustrate the inference technique to obtain the value of the parameter p from microarray experiments with a specific example. Suppose that gene A is regulated by genes B and C, and that we want to determine the regulatory phrases, i.e. how A changes its expression due to the joint combinatorial changes of B and C. In order to do so, we need a set of microarray experiments in different conditions that have been already normalized and their background noise subtracted. We use the data shown in Table 4 for the gene expression level of three genes (log ratios base 2), obtained from two color spotted microarrays. In this table, ''Cond i '' refers to one microarray experiment in the i th condition (not necessarily all the conditions have to be different). For instance, Cond 1 may be the wild type condition, whereas Cond 2 may be a condition in which a global regulator has been knocked out. In general, the level of expression of a given gene is different from one condition to another. Namely, there may be over expressions or under expressions across the different conditions. To filter out only the biggest changes in the level of expression of the genes we use a threshold T 0 , and indicate the positive changes (those greater than the threshold) with an arrow pointing upwards. Analogously, we indicate the negative changes (those lower than the threshold) with an arrow pointing downwards. When no change is detected we use the symbol (-). Using the data shown in Table 4 and setting the threshold value to T 0 = 1.5, we get the discrete representation shown in Table 5.
By counting how many times A changes for the different combinations of B and C given in Table 5, we obtain the data shown in Table 6. Some table entries are equal to zero, indicating that there are no data for that particular combination of B and C. Note that we have considered all the cases with three symbols (Q,-,q), and two regulators (B and C). Now we reduce Table 6 by considering only the entries where a change can be detected (i.e., we eliminate all entries with ''-''). This gives the results displayed in Table 7. For reasons that will be clear in a moment, it is necessary to add one unit of a priori evidence to every entry of Table 7. After this unit has been added, we obtain Table 8. Observe from this last table that, for the first combinatorial change If the change of the level of expression reported in Table 4 is larger than a given threshold T 0 , we write an arrow pointing upwards (q), whereas if the change of the level of expression is smaller than 2T 0 then we write an arrow pointing downwards (Q). The symbol (-) indicates that no significant change was detected. In this example we used a threshold T 0 = 1.5 to discretize the numbers given in Table 4. doi:10.1371/journal.pone.0002456.t005 Table 6. Occurrence count of evidences.
Counting of how many times A changes positively (q), negatively (Q), or it does not change (-) for every combined instance of changes of B and C, according to the results displayed in Table 5. doi:10.1371/journal.pone.0002456.t006 Table 7. A priori and a posteriori evidence I.
After removing all the instances in Table 6 where there is no change, we end up with the data shown in this of B and C (first row), there are three evidences where the level of expression of A decreases and one in which it increases. If these were all the evidences, we could say that from 100% of the cases (3+1), in 75% of them the expression level of A increases and in 25% of them it decreases. Thus, the a priori evidence that we have added has the effect of changing the importance the a posteriori evidence. There are several ways of distributing the a priori evidence; in this work we have used an Equivalent Sample Size S = 4, which means that the 4 units of a priori evidence are distributed equally among all the possibilities. Finally, given the a posteriori evidence, in order to decide whether the expression level of A increases or decreases we use a second threshold T 1 . If the percentage of a posteriori evidence is greater or equal than T 1 , then a regulatory phrase has been established. Using a threshold T 1 = 75% we obtain the results shown in Table 9. We have used the symbol ''-'' to state that the evidence does not support a decision. To infer the regulatory phrases from the microarray experiments analyzed in this work we used the value T 1 = 75%.
Once a set of phrases has been successfully established, we determine the probability of gene expression as the quotient of activatory phrases between the total of inferred phrases. In the case shown in Table 9, two phrases have been successfully established, one activatory and the other inhibitory. Thus, the estimated probability of gene expression in this example would be p = K.
To validate our methodology for inferring regulatory phrases we use the nature of the regulation already reported in the data bases for the case of simple regulations, namely, when only one regulator determines the expression of the regulated gene (gene A is only regulated by gene B). For example, in E. coli the gene alaW is regulated (positively) only by fis. In the Regulon Data Base this regulation is represented as:

Fis?AlaWz
The plus sign at the end indicates that alaW is positively retulated by fis. The table with the two phrases follows immediately: Note that this information in the RegulonDB cannot be used to validate our methodology for genes with more than one regulator. This is because the regulation is combinatorial. To validate our methodology we compared only the inferred phrases for simple regulations with the ones reported in the RegulonDB or the DBTBS. Figure 5 shows the inference success (the fraction of regulatory phrases form simple regulations that matched the phrases obtained from the curated databases) as a function of the threshold T 0 . In the same graphs we have plotted the total number of inferred phrases (including the phrases from simple regulations). As we can see, the inference becomes better as the threshold T 0 increases. However, the total number of inferred phrases decreases with the threshold. In order to have a good statistics (more than To every instance of A shown in the previous table, we add an a priori evidence of 1, which results in the data displayed here. This has the effect of changing the importance of the a posteriori evidence. doi:10.1371/journal.pone.0002456.t008 Table 9. Inferred table of regulation. With a threshold T 1 = 75% for the a posteriori evidence in Table 8, we inferred the most probable effect of the combined changes of B and C on the expression of A. With the information available, only two regulatory phrases could be inferred in this example. doi:10.1371/journal.pone.0002456.t009 250 inferred phrases) and a high inference success (around 90%), in this work we have chosen T 0 = 1.5 in E. coli and T 0 = 1.3 in B. subtilis.

Supporting Information
Text S1 Java Applet of the dynamics and list of ID numbers for the microarray experiments used in this work. Found at: doi:10.1371/journal.pone.0002456.s001 (0.05 MB DOC)