A Publish-Subscribe Model of Genetic Networks

We present a simple model of genetic regulatory networks in which regulatory connections among genes are mediated by a limited number of signaling molecules. Each gene in our model produces (publishes) a single gene product, which regulates the expression of other genes by binding to regulatory regions that correspond (subscribe) to that product. We explore the consequences of this publish-subscribe model of regulation for the properties of single networks and for the evolution of populations of networks. Degree distributions of randomly constructed networks, particularly multimodal in-degree distributions, which depend on the length of the regulatory sequences and the number of possible gene products, differed from simpler Boolean NK models. In simulated evolution of populations of networks, single mutations in regulatory or coding regions resulted in multiple changes in regulatory connections among genes, or alternatively in neutral change that had no effect on phenotype. This resulted in remarkable evolvability in both number and length of attractors, leading to evolved networks far beyond the expectation of these measures based on random distributions. Surprisingly, this rapid evolution was not accompanied by changes in degree distribution; degree distribution in the evolved networks was not substantially different from that of randomly generated networks. The publish-subscribe model also allows exogenous gene products to create an environment, which may be noisy or stable, in which dynamic behavior occurs. In simulations, networks were able to evolve moderate levels of both mutational and environmental robustness.


Introduction
Models of genetic regulatory networks hold the promise of a deeper understanding of two fundamental processes in biology. First, the relationship between genotype and phenotype in each individual depends on the dynamic behavior of genes interacting with each other and their environment. Second, natural selection acts on the resulting phenotypes produced by this interaction, thus the response to selection and the long-term course of evolution depend on how variation in network properties can be altered by mutation and recombination. Of particular interest is understanding the connection between these two processes, as our assumptions about how these networks are formed affect how they operate at a time, and simultaneously how they can change over time. As with all modeling efforts, constructing these models requires a balance between simple, general, and easily interpreted models on the one hand, and more complex, specific, and predictive models on the other. Here we present what we call a publish-subscribe model of gene regulation. This model adds a layer of complexity to an existing simple model, Kauffman's NK networks [1,2]. Our model produces networks that operate similarly to those in the NK model-a number of regulatory genes affect each other, producing a series of activation states that stabilizes to a point or cyclic attractor. What differs is the fashion in which the regulatory connections are made, and hence how they can evolve. The changes we introduce allow for independently mutable regulatory and transcribed regions of a gene, and for regulatory connections to be made via intermediary products. This enables significantly different evolutionary dynamics (for example, significant neutral change can take place) and allows the network dynamics to change in different environments, as the intermediary products can be exogenously introduced. The ''environment'' of the network may be the external environment or neighboring cells in a multicellular organism. The focal network may also be a module within the total genetic network of an organism [3], in which case its environment includes other components of that larger network. We explore some consequences of these changes for the properties of single networks and the evolution of populations of networks.
The NK model has been used to explore the properties and dynamic behavior of genetic networks (e.g. [2,4,5]). This model represents a set of N genes, where the activation of these genes is represented by a binary state that is expressed (1) or not expressed (0). Each gene is influenced by K other genes. Whether or not a gene is expressed at time t is decided by a Boolean operation on the previous expression state (at time t21) of the K other genes that influence it. In the absence of stochasticity or perturbation, the activation of these N genes moves through a series of expression states depending on the initial conditions, ending up in either a stable state or periodic attractor. The entire state space can be described, and each possible attractor enumerated, by starting the network in each of its 2 N possible states and constructing a directed graph in which the nodes are possible states of the network and the edges are transitions among them. These transitions depend only on the connections between the genes and the specific Boolean rules associated with each gene.
The use of discrete, Boolean rules for gene regulation appears justified as a first approximation to data from living organisms [6][7][8]. In a real network, the interactions among genes are mediated by gene products, transcription factors, signaling pathways, cellular machinery, and diffusion processes [9]. In the NK network model, all of these processes are collapsed into the edges linking one gene to another. This may be a good assumption in part because biological networks must be somewhat environmentally robust, i.e. buffered against perturbations and stochasticity [10,11]. This may preclude, for example, dependence on sensitively fine-tuned levels of gene expression. Thus simple NK networks seem to capture many of the fundamental dynamics of genetic networks.
However, the assumption of these simple gene-to-gene connections may affect our understanding of the two basic questions raised above. Consider the first issue, the relationship between genotype and phenotype. We wish to know, for example, combinations of parameters for which networks exhibit a certain behavior (e.g. [9]). Randomly generated NK networks can provide an estimate, but models including other parts of the genetic regulatory process may widen the volume of parameter space in which solutions are found [12], or change our understanding of the effect on network properties of processes such as gene duplication [13].
Consider also the second issue, the evolution of populations of networks. Evolution is often envisaged by constructing a fitness landscape, a multidimensional surface defined by fitness as a function of genotype (or phenotype), where a single ''step'' on the surface is equivalent to a one locus mutation of the genotype [14]. Our assessment of the ruggedness of the landscape, and therefore the ability of populations to evolve toward global optima rather than remain on isolated local peaks, depends on the details of the model. In particular, what constitutes a single mutational step determines the structure of variation available to evolution. So we must consider not just how these networks operate, but also how changes in the genotype affect fitness, for this will be crucial to constructing the statistical properties of the fitness landscape. The simplest type of mutation in NK networks is the addition or removal of a single connection (or ''edge'') between genes in the network (e.g. [2]). Changes in such regulatory influence are often represented as changes to values in a connection matrix (e.g. [9]). Of course, simulated evolution of a population of NK networks can proceed by multiple such changes in a single generation or by other types of mutation, such as gene duplication or loss (e.g. [4]), and this has been a productive avenue for research on network evolution. Nonetheless, our view of the landscape of possible network configurations-whether it has a single or multiple adaptive peaks [2] or how connected is the ''metagraph'' of networks possessing some quality like robustness [8]-depends on which networks are connected to each other by a single mutational step.
In our model, we explicitly consider the process of gene regulation by introducing gene products that mediate the regulatory connections among genes. These gene products may represent proteins, or they may be any of a variety of non-protein regulatory molecules whose role is just beginning to be understood [15]. Each gene is separated into a coding region, which produces a gene product, and a regulatory region, to which gene products may bind. The coding region of each gene acts as a binary switch, either expressed or not in each time step. Whether a gene is expressed-whether the coding region produces its productdepends on the products that are bound to its regulatory region and a set of Boolean rules that translates the binding state of the regulatory region into the expression state of the coding region.
The regulatory connections are therefore not specified directly, but rather are an upshot of the correspondence between coding regions and regulatory regions.
For instance, the coding region of a particular gene might produce some product w. Any gene that has the binding site for w in its regulatory region will then be regulated by that gene, and also any other gene that produces product w. This has an effect on the range of variation in network behavior and on what constitutes a single mutational step. We can think of coding regions that contain a conserved DNA motif [16] as transmitting or publishing a signal on a certain channel, and regulatory binding sites which bind this motif as subscribing on that same channel. If a publisher (coding region) stops transmitting on a channel, then all subscribers (regulatory binding sites) tuned to that channel will be affected. Likewise, if a subscriber is tuned to a channel over which multiple publishers are sending signals, it will be affected by each of these multiple signals. In this way, the equivalent of several connections among genes in the network can be created or destroyed by a single genetic mutation. What constitutes a single step on the adaptive landscape is now significantly different than a model that directly connects or disconnects the regulatory interactions by adding or removing an edge or changing the weight in a connection matrix.
It is worth noting that what we have described here as publishsubscribe has a relevant parallel in the area of modern software construction (indeed, that is where we derived the name) [17,18]. The move from directly connecting two interacting parts of a software application to connecting them via this more indirect manner has an important result. The two processes are now decoupled, as new upstream processes may influence any processes subscribed to the right message, and likewise new downstream processes can react to a message by simply subscribing to it. This particular kind of ''design pattern'' [18] ensures that, although the system remains operationally equivalent to one with direct connections, it is far easier to implement changes that re-use the available structure. We might say that implementing the system in this way makes it more ''evolvable'', in the sense that modifications are easier to make, and have less chance of having a catastrophic effect. In a similar manner, moving from a model where connections are made directly, to one where the interactions occur indirectly through such a publish-subscribe paradigm, will have important implications in how the system may evolve.
Below we describe the model formally. We derive some basic properties of the structure and dynamic behavior of the networks, both by sampling randomly constructed networks and by analytic means. We then consider how this publish-subscribe view of gene regulatory interactions drives the potential of populations of networks to evolve in response to different regimes of selection.

Methods
We consider a conceptually simple model of a genetic regulatory network consisting of N genes, each of which includes a regulatory region and a coding region (Fig. 1). The regulatory region consists of a number of binding sites, to which specific gene products may bind. Let us denote the regulatory region of the ith gene by r i and its coding region by p i . We define r i and p i as sequences of length l and 1, respectively: Each element x ik (where k = 1,…,(l+1)) for the sequence of a gene is chosen from an alphabet P containing r letters with uniform probability 1/r. So if our alphabet P = {0,1,2,3}, and length l = 3, then one possible gene would be (1,1,3,2). Here r = (1,1,3) and p = 2. A network consists of N such genes.
Interaction between two genes is mediated by gene products. If the ith gene produces a product x that matches a binding site in the jth gene, then the ith gene may regulate the expression of the jth gene. We denote the possible interaction (adjacency) matrix by w, with elements The in-and out-degree of a gene are calculated by summing the elements of the adjacency matrix, respectively: Below we show both numerical and analytic estimations of in-and out-degree distributions [19]. Note that in estimating in-and outdegree distributions, we do not consider the particular set of Boolean rules governing the activity state of each gene. For instance, a given letter x may occur in the regulatory sequence of gene i. All genes containing x in their coding region (i.e., genes that may produce the corresponding product) are considered to be connected to gene i in the calculations of degree below. As with NK networks, this is the case even if the particular Boolean rules for gene i imply that the presence of that product has no effect on the activity state of gene i.
In each time step a set of products R(t) is present, where R(t),P. Each binding site in the regulatory region r i is bound if the matching gene product is present (i.e. if x ik ,R(t)). Products are not consumed when they bind; thus the product from a single gene is sufficient for binding the regulatory regions of several genes (effectively, we ignore quantities of gene products). We denote the entire binding state for a gene at time t as the vector where b ik denotes the binding state (either bound or not) of the kth site in the binding region, Note that this state B i (t) may be equivalent to some other binding state B j (t) if the jth gene has the same values in its binding region. It may also be equivalent to B i (t2s) if the same products were present at time t2s. This binding state is used to determine whether or not the gene is active, and whether the corresponding value in the coding region will produce a product at time t+1.
The value B i (t) locates a unique entry in a Boolean table that returns a value representing whether the corresponding gene is active or not. This table is common to all genes in the network (and all networks if there is a population of networks evolving). We will denote the table as Y. This table contains all possible combinations of values in a binding region and their possible bound state. Providing a global table of each particular Boolean response to a combination of bound products provides a realistic degree of stability to the system: two genes with identical regulatory regions presented with the same set of intermediary products will always do the same thing. The activity state of gene i at the following time step is read from this table as The activity state s i is binary, taking values of either 1 or 0. If s i (t+1) = 1, the product x i will be produced by gene i, so that . Arrows represent possible interactions, i.e. directed edges in the network. Below one gene is the Boolean rule set specific to that gene. A ''2'' indicates that the binding site is not bound by the corresponding product, and a ''+'' indicates that it is bound. The gene is then either expressed (''on'') or not (''off''). In this case, if product 7 but not product 0 is present at time t, the binding state of the regulatory region of this gene corresponds to the second row of the Boolean table. As a result, the gene is expressed and product 0 is present at time t+1. Because 0 occurs in both the regulatory region and the coding region of this gene, it is self-regulating and will not be expressed at time t+2. doi:10.1371/journal.pone.0003245.g001 x i ,R(t+1). If s i (t+1) = 0, then x i will not be produced by gene i, but the identical product x may be produced by another gene. The activation state of the network at time t is given by S(t) = (s 1 (t),…, s N (t)). In constructing the table Y, the value 1 is assigned to each s i with probability p, so that p gives a measure of the overall probability of gene activity. This model results in several possible regulatory patterns: for instance, multiple genes with the same product have identical regulatory effects, genes may regulate themselves (e.g. genes A and F in Fig. 2), and products can have either inhibitory or activating effects (e.g. the effect of product 6 on gene A versus gene B in Fig. 2). Because there is a finite number of genes and gene activity is binary, there is a finite number of states of the network. Therefore, given a set of starting conditions and no stochasticity, the network reaches a stable attractor. The attractor can be a single state (i.e. the same set of gene products in each time step) or a cycle (the same sets of products produced at regular intervals). Fig. 2 illustrates a period-3 attractor over the entire network, with some genes (D and F) in a stable state.
Because each gene's activation s i (t) is binary, the dynamics of any particular model network is much like the NK model, in that many binary states determine a single downstream gene's state by a set of Boolean operations. What differs is how the regulatory connections are constructed, and thus how they might evolve. How we initiate the network also differs. Rather than setting it into a particular state, its initial conditions are defined by the introduction of an initial set of products R(0). Note that this means that, although there are 2 N possible states of the network, not all of these states may be strictly reachable. There may be no combination of products that can produce a particular activation state S. In the course of simulations, we may activate any state and see what products it produces. But driving the dynamics of the network purely by introducing gene products already places a constraint on possible states that the network can enter. Finally, mediating connections among genes by using gene products means that a network can operate in an ''environment'' of exogenous gene products that influence its dynamic behavior. This environment may be stable or temporally variable, as we illustrate below.

Basic Properties of the Network
The interactions between a set of genes in the model described above can be represented as a directed graph, where the nodes represent genes and the edges represent connections among genes in the publish-subscribe model. The edges are directed because of the way we define the regulatory and coding regions of our genes. For instance, the product x i of gene i may affect the activity state of gene j at the next time step, but not vice versa. Thus each gene may affect ''downstream'' genes and simultaneously be affected by ''upstream'' genes. The number of upstream and downstream genes connected to a particular gene is the in-degree and outdegree (respectively) of that gene. Each network can be Figure 2. Diagram of four time steps in a 6-gene network. In the initial conditions, products 9 and 0 are present. Filled boxes represent expressed genes, dotted arrows represent binding of products to regulatory regions, and solid arrows represent production of gene products. From these initial conditions, this network enters a stable period-3 cyclic attractor. Boolean tables are not shown. doi:10.1371/journal.pone.0003245.g002 characterized by its in-and out-degree distribution-the frequency distribution of in-and out-degree across all genes, or nodes in the network.
Degree distributions are important indicators of the organizational principles underlying networks and have been the focus of network theory approaches to gene regulation. The in-and outdegree distributions of real transcriptional regulatory networks exhibit different functional forms. In-degree typically displays an exponential decay and is restricted to a narrow interval, while the out-degree distribution typically has a broad tail [20][21][22][23][24][25]. It has been shown [25] that in-and out-degree distributions together are sufficient to reproduce most of the global topological properties of genetic regulatory networks such as degree-degree correlation [26] and clustering coefficient [27]. Degree distributions are also considered to be important in determining the resistance of networks to perturbations (robustness) and the ability of populations of networks to evolve (evolvability) [5]. With these motivations, here we derive the directed degree distributions to provide better insight on the properties of our model networks. We have calculated these distributions both numerically and analytically. Numerical results were calculated from frequency distributions of a large number of networks, each generated by randomly and independently assigning letters from the alphabet P to each regulatory and coding site, while keeping the alphabet size, regulatory region length, and total network size constant. Below we present results for relatively small values of alphabet size (r = 10), regulatory region size (l = 3), and total network size (N). These parameter values, particularly alphabet size and total network size, are likely to be much smaller than those measured in actual genetic networks [16,21,28], but they provide a starting point for exploring the behavior of the publish-subscribe model. Our goal is to compare our results to the more basic NK model, so that our conclusions can be tied to the addition of explicit gene products. Because of the modularity found in empirical gene networks, we can envision smaller networks as modules operating in the context of a larger organismal network; in this context, the ''environment'' of exogenous gene products that we consider below represents other interacting modules of the overall network.
For relatively small values of N, both in-degree and out-degree distributions shift to the right as N increases (Fig. 3). In other words, as N increases, the number of genes with products corresponding to a binding site of gene i increases (in-degree), and the number of genes with binding sites corresponding to the product of gene i increases (out-degree). To explore the large-N limit, Fig. 4 shows in-degree and out-degree distributions for large networks (N = 1000). In the large-N limit, such that all sequences of length l are likely to be realized, the out-degree distribution approaches a single binomial distribution (Fig. 4A). In contrast, the in-degree distribution approaches a superposition of binomial distributions, with separate peaks corresponding to the number of different letters contained in a sequence of length l = 3 randomly sampled with replacement from the finite alphabet P (Fig. 4B). For example, the smallest peak in Fig. 4B is the result of genes whose three binding sites contain the same letter x, and the largest peak is the result of genes with a different letter at each of the three binding sites.
To calculate the out-degree distribution analytically, first we determine the probability of finding a given letter x in a randomly  chosen sequence of length l, which is given by This equals the probability of the product of gene i occuring in the regulatory sequence of gene j. Thus in the large-N limit, out-degree k out is binomially distributed: The mean and variance of this distribution are given by k k out~N p 1:l ð Þ and s 2 This analytic solution for out-degree distribution closely matches the numerical estimate (Fig. 4A). An analytic solution for the in-degree distribution is more complex, being in fact a superposition of binomial distributions. This is because a regulatory sequence of length l, chosen from a finite alphabet of size r, may contain duplicate letters. Let I be the number of different letters x occurring in a regulatory sequence, so that 1#I#min(l,r), and let v(I) be the number of possible sequences containing exactly I different letters x. The total number of possible regulatory sequences is v~P can be directly calculated in terms of the parameters r and l.
Denote the multiplicity of letter x i in a sequence of length l by n(x i ). Given I and l there are two constraints on n(x i ): For a set of I different letters with multiplicities {n(x i )}, the number of possible sequences is a multinomial coefficient Combining equations (9) and (10) we get the number of regulatory sequences containing exactly I different letters: If we sum over the multiplicities in equation (11), we get Note that v(I) also gives us the number of possible tuples B i (t) in the table Y: For regulatory sequences with I different letters, the in-degree distribution is where I/r is the probability that a randomly selected gene product x matches one of the I different letters in the regulatory sequence.
The mean and variance of this distribution are The total in-degree distribution is thus: where v(I)/v is the probability that a randomly selected regulatory sequence contains I different letters. This analytical solution closely matches the numerical estimate (Fig. 4B).

State Space
Although a graph representing the regulatory interactions between genes tells us something about the structure of possible interactions in the network, the full dynamics of a particular network-what that network does-can be represented by exploring its state space. A network activation state space contains all possible activation states that the network can take, and the transitions between each of them.
For a given number of genes N, there is a total of V = 2 N possible activation states of the network. For a finite network size N, the state space is also finite. Starting from an initial state, the system will eventually return to a previously visited state. Thereafter it will follow stable or cyclic behavior, if no stochasticity or exogenous gene products are introduced. The set of states that constitutes a cycle is called an attractor, and the number of states it contains is the attractor length. All the states converging to an attractor constitute its basin of attraction, and the number of states in a basin of attraction is the basin size. The state space of a network can be represented as a graph (Fig. 5), just as the possible regulatory links among genes can be. But these two graphs are very different things. For example, the in-degree of a gene is the number of other genes that may regulate it; the in-degree of a particular state of the network is the number of states at time t that will end up at that state at time t+1. We call the in-degree of a network state the precursor number of that state. Below we consider these characteristics of the state space of networks of size N = 10.
In a randomly constructed network, the vast majority of network states have no precursor (Fig. 5). Such states are unreachable by the network, unless they are used to initiate the network in a simulation. An immediate consequence of this fact is that the average transient time that it takes to reach an attractor starting from an arbitrary state is very short compared to the state space size V. To make these statements clear we have calculated the probabilities P p (n p ) and P t (t) that an arbitrary state has n p precursors and transient time t, respectively. These quantities are displayed in Fig. 6 for a network of size N = 10. It should be noted that P p (0) increases as N increases (not shown) and P p (n p ) may have any value between 0 and V. Note also that the mode of the transient time distribution shifts to the right as N increases (not shown).
We consider also the basin size distribution, P s (n s ), which is the probability of having a basin of attraction of size n s (Fig. 6C). P s (n s ) is concentrated on values n s = V/2 m , m = 0,1,…,', and decreases dramatically as mR'. This means that in an arbitrary realization we may observe only the peaks at n s = V or n s = V/2. The case of n s = V corresponds to a network with a single attractor whose basin of attraction encompasses the entire network. This pattern is similar to that found in NK networks when K is relatively small (e.g., K = 1) [29]. Fig. 7 shows the distribution of number of attractors, P a (n a ), and the probability that a given attractor has length l a , P l (l a ), in randomly constructed networks of size N = 10. Note that P a (1) and P l (1) decrease as N increases (not shown). Below we evolve populations of networks using selection on attractor number and attractor length. The distributions shown in Fig. 7 for randomly constructed networks illustrate the range of variation in these properties available to evolution in a randomly generated population, and they provide a benchmark against which to measure the efficacy of evolution to find relatively small regions of network space where fitness is maximized. Fig. 8 shows the mean values for attractor number (n a ), attractor length (l a ), transient time (t), and attractor basin size (n s ), over a range of small values of N. Note that the first three of these measures increase roughly linearly with N, while basin size increases exponentially. Thus basin size increases roughly proportional to state space size V, which itself is an exponential function of N. It had been believed that the average number of attractors of NK networks increased as the square root of system size [29], but recent numerical studies [30] have shown that this quantity increases linearly with N, as it does in our model.

Evolution of the Networks
In this section we use simulations to explore what sort of networks can be produced by selecting for a particular property in a population of networks. In the following simulations we restricted the changes to point mutations (changes in single letters in either the regulatory or coding regions of genes), and modeled the evolution of an asexual population. The model could also be extended to include recombination among genomes, and other types of mutations such as gene duplications and deletions (e.g. [13]), but we leave this for a later time. To begin, we selected on two network properties: attractor length and number of attractors. Given that attractors form the basis of any subsequent control of gene expression, it is important to show the lability these properties have under a simple selective regime. Such network traits may also relate to fitness in biological systems by corresponding to the identity and behavior of different cell types in multicellular organisms [2], or alternative states of a genetic network module [31]. Here they provide a simple first test of how the networks might evolve, and the resulting evolved networks provide an interesting comparison with the randomly sampled networks studied above.
In both cases we generated an initial population of 100 networks, analyzed the state space of each network and assigned it a fitness equal to either the number of states in the largest attractor or the total number of attractors in the state space. We then generated a new, non-overlapping generation of 100 networks. Each network in the new generation was produced, without recombination, from a single parent drawn randomly from the previous generation. The probability that a network was selected as a parent was directly proportional to its fitness. Each reproductive event included a single random point mutation in the network's genome, with each site in either regulatory or coding regions having an equal probability of mutation. We repeated this procedure for 100 generations. The state space of the fittest networks resulting from selection for attractor size and attractor number are shown in Fig. 9.
Selecting on these particular properties resulted in some very atypical networks. The results of these simulations were strikingly different from randomly generated networks, such as those depicted in [2] or in Fig. 5. For instance, the maximum attractor length in a sample of 40,000 randomly generated networks was 31 (Fig. 7A). In contrast, simulated evolution was able to produce an attractor length of 254 in less than 100 generations. Similarly, selection for attractor number produced a network with 112 attractors, far greater than the maximum of 17 in the sample of 40,000 networks shown in Fig. 7B. The large number of possible graphs in this network model means that random sampling to estimate distributions of network properties  may fail to capture evolutionarily important parts of the space of all possible networks. Furthermore, it appears that such atypical networks can reliably be reached in relatively few generations, even when the range of variation available to selection is constrained to single point mutations as it was in these simulations.
Surprisingly, despite their rapid evolution in the character subject to selection (attractor length and number, respectively), these evolved networks did not seem atypical in other respects. Their in-degree and out-degree distributions, shown in Fig. 10, were very close to the expectation for randomly generated networks of their size (N = 10; Fig. 3). The dramatic changes in attractor length and number were not the result of concomitant changes in degree distribution. This independence of network properties is further illustrated in Fig. 11. Fitness did not increase  smoothly, but rather made occasional large jumps. In contrast, genotypic change occurred more steadily over the course of the simulation. Many genetic changes were neutral with respect to attractor length. In addition, length or number of the other attractors in the network's state space changed without affecting length of the longest attractor; these are phenotypic changes that were also neutral with respect to fitness. Neither genotypic change nor change in other phenotypic traits was a reliable predictor of change in fitness in these simulations, despite the relative simplicity of the trait being selected.

Evolution in an Environment
We have been treating networks as though they operated in isolation, subject only to the gene products produced by the network itself. Because intermediary products control the activation of genes in our model, the introduction of any exogenous products can influence the downstream activation and resultant attractor of the network. This gives our model an important additional property over the NK model: the state space, including the number and type of attractors, is a property of a particular network combined with a particular environment.
A network with no exogenous input has a single state space. However, if we assume that our environment provides a constant set of products, not produced by the focal network itself, but still able to bind and regulate the functioning of the network, the state space for any single network now depends on the particular environment of exogenous products in which the network operates (Fig. 12). Under constant environmental conditions the network will settle into one attractor, depending on the starting point. When environmental conditions change, the state that was previously in an attractor may shift to the edge of a basin, and the network may move to a new state. The introduction or removal of different products can have many effects on the state space, such as changing the number of attractors, the size of their basins, or the set of expression states contained in their basins. The maximum number of possible environments is 2 r , where r is the number of possible letters in the alphabet P. Thus the number of state space graphs corresponding to a single network may be as large as 2 r .  One property of a network is the degree to which these state spaces are similar, or fall into broad groups. This similarity may be considered a measure of the environmental robustness of the network. If the network continues to act relatively unchanged (the attractors remain constant) in various environments (differing exogenous inputs), then the network operation is robust to these changes. Although robustness in Boolean networks can be thought of in this way, our model permits us to explore a much more dynamic sense of robustness (in contrast with [9], for example). The environments in which genetic networks operate are both sources of noise and sources of important signals, either from the external environment, from other parts of a multicellular organism, or from other modules in the organism's overall genetic network [31]. Fitness depends on responding appropriately to the signals and ignoring the noise. Viewed in this way, what must be robust is the reaction norm of the network-its ability to react in a plastic and appropriate manner under various environments by distinguishing signal from noise.
We simulated evolution in a series of simple environments, in which fitness was determined by their ability to respond ''appropriately.'' If some indicator product was present in the environment, a network had high fitness if it produced some other functional product. If another indicator product was present, the network was fit if it produced a second, different, functional product. A network had high fitness by doing the right thing at the right time: in environment A, produce product a, and in environment B, produce product b. Doing the right thing implies not doing the wrong thing also-producing product b in environment A reduced fitness, and a network that simply produced a and b constitutively did not have high fitness. We selected on networks' ability to respond correctly to two different environments that alternated over time.
We evolved a population of 100 networks of size N = 10. Each network was exposed to the first environment for 10 time steps, and then switched to the second environment for another 10 time steps. The networks were then returned to the original environment. This environment switching continued until the network had been exposed to each environment 5 times. Fitness was calculated as the number of correct functional products produced, minus the number of incorrect functional products, Figure 12. State space of a single network subject to different environmental conditions. Shown are the state spaces of a single network of size N = 10 under four different environments. Each environment represents a different set of gene products that are constantly present (e.g. exogenously produced) and available to bind to regulatory regions in the network. Note that a single network can vary in both the number and size of attractors depending on the environment. doi:10.1371/journal.pone.0003245.g012 summed across all time steps. Gene products that were not the functional product in either environment did not affect fitness.
In addition to this alternation of environmental signals, we tested the ability of networks to evolve robustness to environmental noise. In the stable environment, the only exogenous products were the indicator products. In this simulation the evolved networks quickly behaved exactly as required, changing their required output in the presence of different indicator products. In the noisy environment, the indicator product was present with 2 other products, randomly chosen at each time step. Achieving a high fitness under noisy conditions was more difficult to evolve, and the networks remained at lower fitnesses throughout the simulation under noisy conditions. However, we found that a network that had evolved in a noisy environment would often perform perfectly in a stable environment.
What sort of difference is there between a network evolved in a stable environment and one evolved in a noisy environment? We tested this by subjecting the fittest network from each simulation to a number of trials (10,000) in a noisy environment. The sample distributions generated are shown in Fig. 13. We assessed both the original network (steps = 0) and a sample of 1-, 2-, and 3-step mutants from this network. This gives us some idea of the fitness of the networks in the local mutational neighborhood, and thus an indication of the ruggedness of the fitness landscape close to the peak on which the evolved network sits. Evolving the networks in a noisy environment did indeed produce a more consistently environmentally robust network, shown by both the relative positions and the widths of the peaks in the frequency distributions in Fig. 13. The decline in fitness with increasing numbers of mutations away from the original network is similar for the networks evolved in both stable and noisy environments. Thus these networks have roughly equivalent mutational robustness. In terms of the fitness landscape, the fitness peaks to which the networks have evolved in both stable and noisy environments are somewhat intermediate between broad plateaus and precipitous spires, which would allow for some near-neutral variation to persist in mutation/selection balance.

Discussion
Simple models of genetic networks have led to general conclusions about the properties of network architecture and how they affect network evolution [1,2,4,5,32]. At the same time, a growing number of technological and analytical tools allow the direct measurement of regulatory networks in natural systems [33][34][35][36][37][38][39], so that a number of empirical networks have been described in detail [21,28,40,41]. In seeking to connect these growing fields, modeling efforts can proceed by adding layers of complexity and assessing the degree to which features of the model better approximate empirical results. Here we have added a degree of complexity to simple NK networks, using a publish-subscribe view of gene regulation. Although our model shares some basic similarities with the NK model, we have found some tantalizing differences in both the properties of single networks and in the evolution of populations of networks.
First, the pattern of degree distributions from randomly constructed networks in our model is substantially different from that of previous models. In Kauffman's [1] original NK model, each gene has exactly K inputs and in-degree distribution is therefore a Dirac delta function. In randomly constructed networks under the ''standard'' NK model [4], regulatory inputs to each gene are assigned independently with a given probability, resulting in unimodal binomial (or equivalently for large N, Poisson) distributions for in-and out-degree. In scale-free networks, in-degree distribution follows a power law P(k),k 2c while out-degree follows a Poisson distribution, or vice versa [4]. In contrast, our publish-subscribe model produces an in-degree distribution that is multimodal due to the superposition of binomial distributions with different mean values. The fact that in-degree and out-degree distributions differ in form from each other in our model also contrasts with the standard NK model. This qualitatively different pattern is a consequence of the matching rule between the different nodes, i.e. between the coding and regulatory sequences. Thus, although the networks in our model exhibit similar dynamics to those of Boolean NK networks, the distributions of basic network properties differ as a result of the publish-subscribe regulatory framework. A network model based on a similar matching rule was able to reproduce global topological properties of the yeast gene regulatory network [25]. These properties include not only degree distributions, but also other network descriptors such as clustering coefficient, richclub coefficient, degree-degree correlation, and k-core decomposition.
This divergence from previous models is echoed as well by the networks evolved in our simulations. Generally, degree distribution is believed to be a central feature of a network and a key predictor of its dynamic behavior in other respects [5]. For example, the importance of scale-free degree distributions for other properties like robustness and evolvability has been established in several studies of NK networks [4,42]. However, in our publish-subscribe model, it appears that dynamic behavior may be to some extent uncoupled from degree distribution. In the simulations above, attractor length and number evolved far outside the distribution expected from randomly generated networks, but degree distribution remained remarkably similar to random. The degree distributions of the evolved networks give us no clue to the general principles by which length and number of attractors may evolve. Conversely, degree distribution may be a poor predictor of other network properties in this model. Other topological properties (e.g. [25]) may be more relevant to the evolutionary dynamics of our publish-subscribe model, and this issue should be explored further. However, additional metrics that are directed toward specific tasks, such as robustness to various types of change, may be necessary to fully compare across networks and predict evolutionary dynamics.
In the broader context of dynamic behavior and evolution of genetic regulatory network models, two issues have received particular attention: evolvability and robustness. A critical component of evolvability is the presence of neutral variation in a population [43][44][45][46]. Evolution in our network model produces neutral variation in genotype, as seen in Fig. 11, that has no immediate effect on either phenotype or fitness. From an adaptive landscape perspective, this neutral change can be seen as meanderings along neutral ridges in the landscape [47,48]. The importance of this neutral variation is its effect on the fitness of subsequent mutations. In our model, as in natural systems [49], genes often interact epistatically, so that the fitness effect of a single mutation depends on the allelic states of other loci. Thus the genetic background against which a mutation arises may determine whether it is favored by selection, and therefore whether it sweeps to fixation and increases the average fitness of the population as a whole. Neutral mutations change the genetic background that determines both the sign and the magnitude of the fitness effects of subsequent mutations.
Our network model illustrates the mechanism of neutral variation in the publish-subscribe view of gene regulation. For example, regulatory binding sites may mutate to a state for which there is not currently a matching gene product being produced. At the time, this mutation may be neutral, with no effect on the phenotype of the network. However, this mutation has created a new subscriber, ready to receive a signal from a publisher, or coding region. Such a mutation in a publisher may occur in the future, and thus a new connection is made between two genes. In addition, the number of transcriptional regulators (gene products) is limited in our model [15,50]. As a result, multiple neutral mutations in the form of publishers (or subscribers) tuned to the same signal can accumulate as neutral changes with no effect on fitness. When a single mutation in a subscriber (or publisher) shifts to the matching signal, multiple new connections are formed. The effect on phenotype, and perhaps fitness, as a result of this single mutation is magnified by the presence of existing variation. In fact the ability of mutations to have broader effects on phenotype in this way may be an important component of evolvability [51].
In our simulations we explored the evolution of environmental robustness, which is the ability of a network to perform (i.e., maintain high fitness) in the face of a noisy environment. Incorporating the ability for networks to react to the local environment enables us to explore a number of possibilities. Here, we have emphasized that robustness can be a dynamic, rather than a static, property of networks. The publish-subscribe model allows us to evolve networks whose reaction norm is robust under noisy environments. The shift from a static to a dynamic conception of robustness may have important implications. Consider an idea introduced by Kauffman [2], in which the attractors in genetic networks are viewed as analogous to cell types in a multicellular organism [52]. For the NK model, the attractor into which a network falls is fixed for a particular genetic network and the starting conditions. In multicellular development, however, the environment is, in part, other cells, and the process of differentiation may be driven by dynamic interactions between cells rather than the isolated properties of a single cell [53]. The evolution of this plastic response to the local cellular environment, and the evolution of its subsequent robustness, may be a key element in understanding the emergence of multicellularity [54]. Alternatively, the focal network may be a module of a larger genetic network, and organismal fitness may depend on the network's ability to respond appropriately to signals from other modules.
A large number of issues could be explored further with the publish-subscribe model. First, in our estimates of degree distribution, we considered two genes to be connected if the coding region of one gene matched a site in the regulatory region of the other. However, this ignores the particular Boolean rules of expression for the second gene, whose expression state may not actually depend on the first gene's product; in fact, whether this dependence is present may itself depend epistatically on the expression states of yet other genes [2]. Calculation of degree distribution in this expanded sense soon gets quite complicated, although it may be necessary for more direct comparisons to empirical data, such as gene co-expression networks or expression time series [34].
Second, we assumed here that a single coding region produces a sufficient concentration of gene product to bind any number of matching regulatory sites. The consequences of this assumption, or alternatively of competition among binding sites for limited gene product copies, could be explored further. Relaxing this assumption would not change the observed patterns of degree distribution of networks, according to the rules by which we calculated it. However, it would introduce an element of stochasticity into the activation of genes at each time step if single gene products were to bind to either one or another regulatory site with some probability between 0 and 1. As a result, our conception of the state space of a network would also change. Under the current assumption, the out-degree of any node in the state space network is one, but relaxing this assumption would produce some states with probabilistic edges connecting to multiple other states. This would result in an additional concept of robustness that could be explored: the robustness of attractors to stochastic shifts outside of their attractor basin as a result of the stochastic binding of gene products.
Third, one could explore the consequences of variation in several of the parameters. Our goal here was to explore the properties of the simplest publish-subscribe model, so in our evolutionary simulations we held alphabet size, regulatory region length, and total network size constant. Varying these parameters across networks may have implications for measures of network topology and for the evolutionary dynamics of populations of networks. Regulatory region length could also vary across nodes within a network; in a network model similar to ours such variation produced similar qualitative behavior but improved the fit to empirical data on topological descriptors from yeast networks [25]. Change in this parameter has also been implicated in the evolution of organismal complexity [55]. Among other effects, longer regulatory regions would provide a larger mutational target for regulatory versus coding regions. It remains an outstanding question to what extent changes in regulatory versus coding regions play different roles in phenotypic evolution [56,57], and the publish-subscribe model explicitly separates the two. We plan to address this issue in future work. In our simulations, we used networks of relatively small size (N = 10), which can be thought of as modules within a larger network. However, simulations of larger networks, particularly in the noisy or fluctuating environments that we described, could be used to address the evolution of modularity itself; that is, do networks evolve some degree of internal separation of components that partition the response to environmental signals? Alternatively, can the behavior of larger networks be adequately represented by studies of smaller networks?
Finally, we addressed network evolution solely in the context of single-step mutations. The publish-subscribe model could easily be extended to address other types of mutations, such as gain or loss of binding sites in regulatory regions, gene duplication and divergence [13], or whole genome duplication. Nonetheless, our results suggest that the publish-subscribe model holds remarkable evolutionary potential even when mutation is restricted to single steps.
Our publish-subscribe model of genetic regulatory networks adds a layer of complexity to the common NK networks by making the gene regulation process more explicit, and by using a rule system for matching gene products to regulatory sites that affect the expression state of other genes. In this way it is similar to yet more complex models. Examples include the Artificial Genome class of models [58][59][60][61], which create an information sequence analogous to DNA, and content-based networks [19,25,62], where the focus is on the topological properties of the networks rather than their dynamics. The production of new, more complex, variants on well-studied models in biology can often aid in two ways. First, the introduction of new parameters might suggest that there is behavior outside the scope of the simpler model. Second, the introduction might allow us to ask different questions. The publish-subscribe model appears to do both.