Beyond the French Flag Model: Exploiting Spatial and Gene Regulatory Interactions for Positional Information

A crucial step in the early development of multicellular organisms involves the establishment of spatial patterns of gene expression which later direct proliferating cells to take on different cell fates. These patterns enable the cells to infer their global position within a tissue or an organism by reading out local gene expression levels. The patterning system is thus said to encode positional information, a concept that was formalized recently in the framework of information theory. Here we introduce a toy model of patterning in one spatial dimension, which can be seen as an extension of Wolpert’s paradigmatic “French Flag” model, to patterning by several interacting, spatially coupled genes subject to intrinsic and extrinsic noise. Our model, a variant of an Ising spin system, allows us to systematically explore expression patterns that optimally encode positional information. We find that optimal patterning systems use positional cues, as in the French Flag model, together with gene-gene interactions to generate combinatorial codes for position which we call “Counter” patterns. Counter patterns can also be stabilized against noise and variations in system size or morphogen dosage by longer-range spatial interactions of the type invoked in the Turing model. The simple setup proposed here qualitatively captures many of the experimentally observed properties of biological patterning systems and allows them to be studied in a single, theoretically consistent framework.


Introduction
Shape and size are global properties of organisms and of their constituent parts. Yet organisms develop and grow by processes that are intrinsically local: cell division, fate commitment and differentiation, migration, and death. To coordinate these processes appropriately, cells must reproducibly activate different gene expression programs in a manner that is positionally specified, or patterned, within a tissue or a whole organism. Here we are interested in essential conditions for pattern-forming systems to support rich and robust positional specification of cells. Rather than focusing on any particular organism, we analyze a minimal tractable model that determine its location with roughly 1% relative precision and consistent with the measured precision of downstream positional markers [25]. Furthermore, this patterning system exhibited signatures suggesting that positional "gap gene code" might be optimally organized. This suggests an interesting theoretical program: look for regulatory network architectures that maximize encoded positional information [36][37][38][39][40] and compare these ab initio predictions to Drosophila gap gene data.
Taking a step back from concrete systems that necessarily involve an overwhelming amount of biological detail, there are a number of basic yet still unresolved questions about patterning systems and positional information: How do optimal patterns (i.e., patterns that maximize positional information) look like and what determines their shape? How are efficient patterning strategies different if patterning cues are distributed throughout the domain or are present solely at domain boundaries? In systems where multiple outputs are simultaneously driven by the same patterning cues, how should these outputs be coupled amongst themselves and across space? Can reliable patterns emerge from very noisy patterning cues, that is, can the readout network actually "create" positional information? And finally, what is the interplay between positional information and various aspects of robustness-to noise, to systematic changes in patterning cue levels, or to small variations in system size-that have been extensively discussed in particular biological systems [41][42][43]?
To address these questions as clearly as possible in a rigorous information-theoretic framework, we follow the methodological approach taken by Wolpert in describing his French Flag model. We start with the simplest toy model of patterning, where smoothly varying patterning cues, e.g., morphogens, drive the expression of "binary genes" that only can have two states, ON or OFF. Clearly, this is not an appropriate assumption for many real patterning systems that rely on intermediate levels of gene expression. Conceptually, however, this assumption has three major advantages: first, it will provide us with basic theoretical insights that generalize to more complex setups; second, we will be able to easily visualize binary gene expression patterns; and third, we will be able to count the number of distinguishable gene expression states. The latter property is essential to gain an intuitive interpretation of positional information, which is generally measured in an abstract "currency" of bits.
In the following, we start by introducing our minimal 1D model of patterning, which is closely linked to Ising models in statistical physics, where magnetic spins (analogous to our binary ON/OFF genes) respond to spatially inhomogeneous magnetic field (analogous to our smoothly varying morphogen profile). We briefly review the information-theoretic foundations of positional information relevant to the proposed model. We then systematically explore optimal patterns and how they depend on the shape of the input gradient, noise level, the strength of gene-gene and spatial interactions, etc; as a result, we will be able to give a full account of how these factors affect positional information within our model class.

A minimal model for a pattern-forming system
We look for a model patterning system in which we can systematically explore the effects of gene-gene interactions, spatial interactions, and noise. To keep the task conceptually clean and computationally tractable, we sought for the simplest possible model: we focused on binary (ON/OFF) patterning genes in the established framework of Ising-like spin models. These paradigmatic systems of statistical physics are minimal, i.e., able to generate the relevant phenomenology while having the smallest number of parameters [44,45]. Our use of such models does not imply that all patterning genes should be viewed as binary (e.g., previous analysis suggests otherwise for Drosophila gap genes [25]), or that patterning happens at thermal equilibrium.
Nevertheless, conclusions obtained in the simple Ising model framework often do generalize qualitatively to more complex systems and focus attention on important quantities. At the same time, the Ising framework we introduce below can be seen as a direct extension of Wolpert's original model of binary genes responding to smooth morphogen gradients.
We start with a discrete one-dimensional lattice of N sites, x = 1, . . ., N. At every site x, the expression pattern is described by a binary variable σ(x), with σ(x) = 1 denoting that the patterning gene at location x is ON, and σ(x) = −1 denoting that the gene is OFF (see Fig 1A). Central to our analysis is the fact that the patterning system is noisy, due to, e.g., intrinsic stochasticity in gene regulation or extrinsic variability in system parameters. To capture the probabilistic nature of the patterning outcomes, we think of the patterning system as generating different spatial patternss ¼ fsðxÞg with different probabilities, and in the Ising model framework the probability of each pattern can be written as follows: Here, Z simply ensures that the distribution Q is normalized, η sets the intrinsic noise in the system, and the "energy, " H y ðsÞ, describes the effect of morphogens and gene-gene interactions on the resulting gene expression pattern. This distribution over all spatial patterns, Q y ðsÞ, is parameterized by θ, a set of parameters that we will explicitly identify for our proposed model later. For the "energy" H, we write: Here, spatial interaction is modeled by coupling of nearest neighbor sites with a coupling strength J. Positive J favor neighboring genes to have equal expression states, whereas negative J favor neighboring genes to have opposing expression states. Biologically, positive J could be realized by diffusion or active transport of gene products between neighboring cells or nuclei; negative J, corresponding to repressive spatial interactions, could be mediated by cell-cell signaling networks, e.g., the Delta-Notch pathway [46,47]. The first term in Eq (2) contains the "bias, " h. The bias favors each individual gene to be either ON, whenever h(x) > 0 at that gene's location, or OFF, whenever h(x) < 0. This term depends explicitly on the coordinate, x, and thus models the effect of a morphogen at each location. We can gain biological realism and allow later extensions of the model to multiple patterning genes if we assume that at every location there is a particular value of an abstract morphogen "signal, " m(x), which determines the bias in a linear fashion, h(x) = n(m(x) − E). If the signal, m(x), is interpreted as the logarithm of the morphogen concentration, m(x) = log(c(x)), there is an exact mathematical relation between the probability that the patterning gene is ON, P(σ(x) = 1), and the Hill-type thermodynamic model of regulation for the gene σ. Suppose that the gene σ is regulated in a strongly cooperative manner byñ binding sites with equal affinities,K . Then and it is easy to show that the parameters n and E relating the bias h(x) to the morphogen signal m(x) in our model correspond, up to a multiplicative factor, toñ and log ðK Þ in the thermodynamic model of regulation. Furthermore, observed morphogen concentration gradients that commonly have an exponential profile, c(x) = c 0 exp(−λx), map to linear morphogen signals, . Indicated in gray is the corresponding exponential concentration gradient, c (x), when m(x) is interpreted in the context of thermodynamic models of gene regulation (see text). The morphogen signal m(x) regulates a binary patterning gene σ(x), whose expression state also depends on spatial interaction with strength J. Levels of gray denote the mean value hσ(x)i. Hence, σ = +1 (white) and σ = −1 (black) mark positions where the gene expression state is deterministic, while levels of gray correspond to noise-induced fluctuations in expression state, with σ = 0 (medium gray) marking positions at which the two expression states are equiprobable. (B) For two or more genes per lattice site the model is extended with pairwise local interactions between genes at each lattice site. Spatial interactions are considered only between the same genes at different lattice sites.
m(x) = −λx + log(c 0 ), in our framework. Linear m(x) is thus our baseline profile, although we will subsequently also define and explore more localized morphogen signals. In our model, η controls intrinsic fluctuations in the system. For η ! 0, the gene σ responds deterministically to the morphogen and the expression state at neighboring locations; for example, if there were no spatial interactions (J = 0), the gene would be ON whenever the local morphogen signal exceeds the threshold, m(x) > E, and OFF otherwise, following Wolpert's original idea. In contrast, for η ! 1 the genes respond completely randomly, with equal probability of being ON or OFF, irrespective of the relevant signals. In addition to this intrinsic noise, we also consider extrinsic noise [48]. To that end, we assume that there can be fluctuations in the morphogen signal, m, that are additive and Gaussian with constant variance ν, i.e., var(m) = ν. This maps to fluctuations in morphogen concentration that are proportional to the mean concentration, std(c) = νhci, a dependence that is biophysically plausible and has previously been discussed in the literature [48,49].
An extension of the model from a single patterning gene to multiple genes is straightforward (see Fig 1B). Let there be K distinct patterning genes, such that at each location σ(x) {σ α (x)}, for α = 1, . . ., K. To write down the function H y ðσÞ and compute the probability of every pattern, we simply reproduce the "bias" and spatial interaction terms for each one of the K genes and sum them up in the energy function. Next, we add a qualitatively new term that couples the K genes amongst themselves at every location x, which models activating (J αγ > 0) or repressive (J αγ < 0) interactions between the genes α and γ (where α, γ 2 {1, . . ., K} and α 6 ¼ γ). The complete energy function reads: This model of K genes responding to a morphogen signal is thus fully specified by 3K + K(K + 1)/2 interaction parameters θ = {n α , E α , J αα , J αγ }, the intrinsic noise η, the extrinsic noise ν, and the shape of the morphogen gradient, m(x). These parameters, together with Eqs (1 and 4), fully specify the resulting distribution over gene expression patterns, Q y ðσÞ. Next, we formally define positional information and discuss its computation and behavior within our model. (5) is the entropy of the distribution of expression states across the lattice, with P σ ðσÞ ¼ 1 N P N x¼1 PðσjxÞ, which favors diverse use of expression states across the spatial pattern. The second term is a penalty term that quantifies the average variability in gene expression which is uncorrelated with position. Because it can induce confusion in the expression-position mapping, this "noise entropy" can only reduce the information and is zero in a noiseless system.

The first term in Eq
Considering a single patterning gene without noise, gene expression achieves maximum positional information by partitioning all N locations into two equally sized sets, one where the gene is OFF and one where the gene is ON. Without noise, the second term in Eq (5) vanishes and the positional information is given by the entropy of expression states. For a balanced assignment of expression states to positions we have P s ðs ¼ þ1Þ ¼ P s ðs ¼ À 1Þ ¼ 1 2 and thus S[P σ (σ)] = 1. Such a pattern is said to convey "one bit" of positional information, an amount sufficient to reliably separate the anterior from the posterior, or odd from even rows of cells; generally, one bit corresponds to the amount gained by an unambiguous answer to an optimally posed yes/no question. In the case of multiple patterning genes, a combination of gene expression states, e.g., {ON, OFF, ON, ON} or {+1, −1, +1, +1}, can be seen as a "codeword" for some particular position. The capacity of this code, independently of how the gene expression patterns are set up or read out, is again given by the first term in Eq (5). For four binary genes this cannot exceed 4 bits; as in the case of a single gene, the bound is achieved when each of the 2 4 = 16 distinct gene expression combinations is used equally often across all locations x in the tissue.
Mutual information is symmetric in its arguments, so that we can rewrite Eq (5) as where S[P x (x)] = log 2 N for cells that are uniformly distributed over coordinate x, as we assume in our model. This way of writing positional information emphasizes that log 2 (N) bits is the upper bound on the positional information in a patterning system, which would correspond to an unambiguous identification of every location x based on the expression pattern σ. Positional information is decreased from this bound by the second term in Eq (6) which measures the uncertainty in position that remains even when one knows gene expression levels.
One can be more explicit about the error in trying to estimate the position, x, given the expression levels σ. It can be shown that the expected error of any estimatorx for position is bounded by positional information. To find that bound, we think of P x (x) as the prior distribution of an estimator at position x. The best choice for an estimator of position is the expectation value EðxÞ estimated from a distribution encoding all and not more information about its position than the estimator can access. Therefore, for any estimator it holds Eðx ÀxÞ 2 ! Eðx À EðxÞÞ 2 ¼ varðxÞ.
Without further information the estimator has to rely on the prior distribution P x (x). For the uniform P x (x), the variance of x can be expressed by the entropy S[P x (x)]: The second inequality holds since the additional information about the expression state σ can only decrease the entropy (S[P x (x)] ! hS[P(x|σ)]i σ ). We assume that P(x|σ) = P(σ|x)P x (x)/P(σ) is the Bayesian inversion of the correct expression state of the system P(σ|x This bound on the error can be made vanishingly small if positional information approaches its upper bound, I ! log 2 N bits. This relationship is analogous to the relationship between positional error and positional information for continuous systems [25,26]. Importantly, when cells need to make decisions appropriate to their position within an organism, they are also subject to the estimation limits of Eq (8), irrespective of how complex the molecular readout mechanism for σ is. How is the pattern-forming process Q y ðσÞ related to P(σ|x), which determines positional information? While the pattern-forming process Q yields a global (joint) distribution over all possible patterns, the positional information is local and thus is only concerned with the expression state at a given location x; thus, P(σ|x) is obtained by summing (marginalizing) the joint distribution over gene expression states everywhere but at x: Therefore, we can use Eqs (1, 5 and 9) to compute positional information, I(σ;x), which we will refer to as "PI" in the text, for any patterning system with parameters θ. Using standard approaches from statistical physics (transfer matrices) these mathematical manipulations are all doable exactly when the extrinsic noise, ν, is zero. When ν 6 ¼ 0, analytical techniques coupled with tractable Monte Carlo sampling can be used to compute PI as a function of parameters; see S2 Appendix for details. Our framework clearly separates the pattern-forming process whose outcome is described by Q y ðσ Þ and that depends on mechanistic parameters θ, from the resulting pattern, which carries a certain amount of positional information I(σ;x). Such a distinction is important and clarifies a number of conceptual issues with positional information. Consider, for example, the case of a single, noiseless gene responding to a smoothly varying morphogen gradient. As explained above, the maximal positional information achievable in this system is 1 bit. On the other hand, one could argue that by changing the threshold at which the readout gene is activated, the system can position the pattern boundary in any of N + 1 possible locations (including the system boundaries), generating N + 1 possible patterns: shouldn't the information then be I = log 2 (N + 1) ! 1 bits? The apparent contradiction is that log 2 (N + 1) bits is not positional information; rather, it is a measure of how many different joint patterns,σ , can be generated by changing the parameters θ of the pattern forming system (in this case the readout threshold), i.e., the mutual information Iðσ ; yÞ, which is an intrinsic property of the pattern-forming process, Q. This quantity certainly affects positional information, I(σ;x), yet it is not identical to it. In particular, positional information can be different for each separate value of parameters θ. As we will see in our toy model, one can find patterning systems that have a high positional information while simultaneously having either a high or low value for Iðσ; yÞ, and vice versa. The two information-theoretic quantities therefore describe independent aspects of the system, and should not be confused with each other.

Optimal patterning with a single gene depends on noise level and morphogen profile
We start by studying a system with one patterning gene, σ, on a lattice of N = 50 sites. We choose a linearly decaying morphogen signal, m(x), which favors the ON state of the patterning gene in the anterior and OFF state in the posterior. In absence of any noise and spatial interactions, we have a clear expectation for the pattern that yields the maximally achievable PI of 1 bit: this will be a symmetric partition of the lattice into equally sized anterior (ON) and posterior (OFF) halves, with a single sharp boundary at x = N/2. In terms of the parameters of our energy function, Eq (2), this corresponds to choosing E = J = 0, n = 1 and η, ν ! 0.
How does the spatial interaction J affect the ability of a single gene σ to encode positional information when noise is not zero? Fig 2 shows I(σ; x) as a function of the coupling strength J at fixed levels of intrinsic (η) and extrinsic (ν) noise. In the absence of spatial interactions (J = 0) the response of the gene σ is uncoordinated across positions and gets largely destroyed by random fluctuations (Fig 2B(i)). Consequently, PI is low, in this case below 0.1 bits.
If spatial interaction is increased to positive values of J, PI increases steeply to a maximum. The emergent pattern is qualitatively consistent with the expected optimal pattern: it contains a single boundary that divides the lattice into anterior and posterior halves (Fig 2B(ii)). The effects of noise on the patterning gene are restricted to the area around the boundary. Therefore, the maximally achievable PI is determined by the accuracy with which the boundary is positioned and depends on the noise level. For this optimal boundary pattern, additional extrinsic noise, ν, diminishes PI only mildly.
The large increase in PI can be ascribed to the well-studied effect of spatial averaging [39,50]. With increasing coupling strength J, different lattice sites do not respond independently anymore but are spatially correlated. Consequently, fluctuations at individual lattice sites are overcome by a concerted response to the input field integrated over a range, which greatly sharpens the resulting pattern. As J is increased further, PI decreases again and eventually drops to zero because the lattice is forced into a spatially uniform (all ON or all OFF) configuration. Fig 2C summarizes the benefit of positive spatial interactions. The solid curve is the PI without spatial interactions plotted as a function of intrinsic noise η, whereas for the dashed curve J has been optimized separately for each value of η. The values of the optimized J as a function of η are shown in the inset. The spatially coupled system is therefore capable of retaining PI above 0.5 bits for more than an order of magnitude higher internal noise relative to the system without spatial interaction. In sum, positive spatial coupling J has the ability to stabilize the near optimal pattern against effects of intrinsic and extrinsic noise.
A qualitatively different pattern is formed if J is decreased to negative values. Initially, for small negative J, PI decreases almost to zero as negative spatial interaction disturbs the readout of the morphogen signal. Beyond the minimum, PI increases again and the system generates a pattern in which neighboring lattice sites alternate between ON and OFF states (Fig 2B(iii)). This is an alternative strategy for encoding PI: the system now distinguishes between even and odd lattice positions instead of an anterior and a posterior segment. While in principle both patterns can encode a full bit of PI, we will see that the alternating pattern is less robust against noise.
For the alternating pattern to form correctly it is necessary that the ON/OFF sequence is faithfully propagated through the bulk and that the morphogen signal can reliably break the symmetry between the two possible alternating patterns (if both patterns are equally likely, PI goes to zero). The condition for robust propagation depends on intrinsic noise. Breaking the symmetry is, on the other hand, highly susceptible to extrinsic noise, i.e., fluctuations in the morphogen signal. Consider, for example, the case of the linear gradient, which exerts the strongest bias at the anterior-and posterior-most sites. The ability to reliably break the symmetry depends on the anterior site consistently experiencing a stronger bias towards ON than the second site (which in an alternating pattern should be OFF), i.e., the mean difference in the morphogen signal between the first two sites needs to be larger than the typical strength of extrinsic fluctuations in the signal, hm(x = 1) − m(x = 2)i > ν, otherwise PI of the alternating pattern will be severely impaired. A similar argument can be made for the influence of intrinsic noise. Even if spatial interaction is strong enough to allow only strictly alternating patterns, it is still possible that the entire pattern flips to its inverse due to intrinsic fluctuations. Again, the ability of the system to select between the two possible patterns depends on the difference of the mean values at neighboring lattice site compared to the intrinsic noise level.
How does the optimal strategy for a single patterning gene depend on the shape of the morphogen signal? To investigate this, we consider a set of exponential shapes for the morphogen signal m(x), parametrized by a a decay parameter, χ. For χ ( 1, we recover the linear morphogen signal discussed above, while for larger χ the morphogen signal is increasingly concentrated at the anterior, as shown in Fig 3A. Fig 3B shows PI carried by a system with one gene as a function of χ and coupling strength J. Because localized morphogen signals can reliably break the symmetry of an alternating pattern, high PI can be achieved with a combination of negative J and large χ. In contrast, patterns that form a boundary (with J > 0) are efficient if the morphogen signal extends throughout the system. Consistent with our first observation, a systematic survey in Fig 3C shows that small additions of extrinsic noise severely lower PI carried by the alternating pattern while leaving the boundary pattern almost unaffected.
These observations can be summarized in a phase diagram, shown in Fig 3D. The diagram, constructed for zero extrinsic noise (ν = 0), divides the plane spanned by intrinsic noise η and morphogen shape parameter χ into a region where negative spatial interaction is optimal and a region where positive spatial interaction is optimal. For very low intrinsic noise, the alternating pattern generally outperforms the boundary pattern. As intrinsic noise increases, the optimal patterning strategy depends on the form of the gradient: for a spatially extended gradient the boundary pattern is optimal, whereas for a gradient concentrated at the boundary the alternating pattern is optimal. For sufficiently high intrinsic noise, the boundary pattern always outperforms the alternating pattern, even for gradients concentrated at the boundary. Adding extrinsic noise generally shifts the boundary in the phase diagram to favor positive spatial interactions.
Optimal patterning with multiple interacting genes can establish a stable combinatorial code for position Which patterns optimally encode positional information in systems with multiple patterning genes? The French Flag model proposes a cascaded activation of the genes in response to the Enhancing Positional Information by Spatial and Gene Regulatory Interactions morphogen signal. For two binary genes this would lead to a pattern with three separate states (the "Tricolore" of the French Flag) and a maximal PI of I(σ; x) = log 2 (3) % 1.59bits. In contrast, we know from the arguments made above that an optimal pattern with two binary genes should have PI of 2 bits, at least with vanishing intrinsic and extrinsic noise. In such a pattern all possible expression states would be realized and evenly distributed throughout the lattice. For two binary genes, (σ 1 , σ 2 ), there are four different possible states: (ON,ON), (ON,OFF), (OFF,ON) and (OFF,OFF); note that one of the mixed states is missing in the French Flag, which is why it has lower PI. We will refer to PI-maximizing binary patterns for K genes, where all 2 K states occur with equal probability in the pattern, as "Counter" patterns. A Counter pattern is an example of a combinatorial code, where position can only be decoded properly when the readout mechanism has simultaneous and complete access to the local expression states of all K genes.
We can ask two fundamental questions about Counter patterns. First, can such patterns be generated in a model where genes interact locally in a pairwise fashion and are spatially coupled, as assumed by Eq (4)? Second, Counter patterns are clearly optimal when noise is vanishing; are they optimal also when noise is present? If not, what are the optimal patterns in that case?
To investigate these questions we optimize the parameters of our model for two and three genes to find patterns that maximize PI for different levels of noise. Specifically, we vary all interactions in the system, both spatial (J αα ) and regulatory (J αγ ), as well as the parameters that prescribe how each gene couples to the morphogen signal ({n α , E α }). For the case of two (three) patterning genes, this amounts to a total of 9 (15) parameters; we use stochastic optimization to carry out PI maximization (see S2 Appendix for details). As with the single gene case, we assume a linear morphogen signal, m(x). The only remaining dependence of our results is thus on the strength of the intrinsic (η) and extrinsic (ν) noise.
When the noise level is low enough, we find that one of the genes always takes on strongly negative spatial interaction, J αα < 0, to generate an alternating pattern. This gene does not interact with the others, and contributes one bit (in the low noise limit) to the total PI. There can only be one such gene in the pattern, as any subsequent alternating gene (with the same or opposite polarity) would be redundant with the first one and thus provide no further increase in PI. As this strategy to increase PI by one bit is trivially available to any patterning system at low noise and does not occur at high noise, we restricted our subsequent search only to cases where spatial interactions are restricted to be positive, J αα > 0, mimicking spatial averaging induced by diffusion or active transport of patterning gene products. Fig 4A shows the maximal PI carried by two binary genes as a function of intrinsic noise, η. Characteristic examples of corresponding output patterns are shown in Fig 4B. As expected, the optimal pattern at low noise is a Counter, realizing each of the four possible expression states in an equally sized spatial fraction. A network schematic illustrating the optimized parameters is depicted in Fig 4C. As noise increases, the values of optimal parameters undergo an abrupt change and the optimal pattern changes from a Counter to a French Flag, encoding only three states (region (ii)). At the boundary between the two regions there is a visible kink in the PI curve. Continuations of the two coding strategies into the respective other noise regime are depicted as dotted curves in Fig 4A. If the noise level is increased even further, the optimal network changes its coding strategy again and generates a pattern in which both genes redundantly form a boundary in the center (region (iii)).
What are the respective influences of spatial and local gene-gene interactions on the formation of patterns and encoding of positional information? To study this question, Fig 4D compares the curve of Fig 4A with curves for PI carried by systems which are optimized without spatial interactions (dashed curve) and without local interactions (dotted curve). Without local interaction between genes the maximally achievable PI is about 1.59 bits, corresponding to the three-state French Flag pattern. This observation can be generalized: if a monotonous input gradient acts on the target genes independently via a monotonous response function (e.g., a sigmoid, as here), then each gene can form only a single transition or boundary. In that case, the French Flag pattern indeed encodes the maximally achievable PI. If, however, the genes interact with each other, their response functions receive multiple inputs and complex patterns are possible. For instance, the system generating the Counter pattern (Fig 4C(i)) has strong mutual repression between the two genes, which switches the lower gene to the inverse of the upper gene where the strength of the morphogen signal is low. Systems without spatial interactions, in contrast, can and do carry as much PI as the fully interacting system when the noise is vanishing. As the noise level increases, however, the fully interacting system always outperforms the system without spatial coupling, demonstrating the important role of spatial noise averaging. At high noise, the optimal boundary pattern is identical for both genes, providing two redundant read-outs of the morphogen signal. In this case, the local gene-gene interaction is Enhancing Positional Information by Spatial and Gene Regulatory Interactions positive and strong, providing further noise averaging (across the two readout genes), beyond that due to spatial interactions.
Maximization of PI in a system with three genes corroborates our results for two genes. Again, as intrinsic noise increases, the optimal strategy switches abruptly at particular noise values, marking a transition to a code that specifies one less distinct expression state. Between transitions, the number of distinct states that the network can generate is held constant, but information nevertheless decreases smoothly as increasing noise leads to more ambiguity in the mapping between position and gene expression state (Fig 4E). Three binary genes can generate a maximum of eight distinct expression states, which is achieved by a three gene Counter pattern (Fig 4F(i)). Between the optimal Counter pattern and the fully "redundant" pattern encoding two states, we find a variety of other strategies, including French Flag, that specify an intermediate number of states (Fig 4F(ii)-4F(iv)). The Ising-model-based framework for binary interacting genes is clearly sufficiently rich to generate this variety, including the theoretically optimal Counter. It is possible that even richer models, e.g., models allowing threeway interactions between genes in addition to pairwise interactions, would lead to higher PI values, possibly by being more robust to noise, or by allowing the Counter pattern to be easily generated in systems with K > 3 genes.
These results are not changed significantly by the addition of extrinsic noise. Generally, increasing extrinsic noise decreases the maximum achievable PI and can, analogously to intrinsic noise, lead to a change of the optimal patterning strategy. The effects of extrinsic noise can be effectively attenuated by spatial interaction of the patterning genes. This raises the question of how much positional information loss due to fluctuations in the input morphogen can be avoided by a spatially interacting network of patterning genes.

Positional information of spatially interacting patterning genes can exceed that of the morphogen signal
Can patterning genes encode more positional information than the morphogen signal itself? In other words, can a network generate PI starting with a noisy signal? Intuitively, both Turing patterns and cellular automata would suggest that the answer to this question is affirmative. In a Turing model at steady state, spatial locations are assignable to (at least) two expression states, so that there is positional information where initially there was none. Similarly, consider the simplest cellular automaton that proceeds along one discrete spatial dimension with a simple rule: "Read the value in the current position, increment the value by one, move one cell to the right, write the value. " Such a cellular automaton would generate a separate cell fate (i.e., a unique numerical value) in each position, providing maximal PI. In both cases, PI before patterning appears to be zero, and after patterning has some nonzero value.
More careful thought reveals, however, that the PI was established by transforming information that must have been present already at the beginning of the patterning process. Turing patterning is a deterministic mechanism defined by a set of partial differential equations, whose steady state solution therefore depends on the initial condition and the shape of the boundary. While the mechanism will generically produce domains with a typical lengthscale separated by sharp borders, the positions of the borders will shift with the initial and boundary conditions. Specifying initial and boundary conditions, however, requires information: more bits for more precise specification. Although the formal link between this information and the resulting PI depends on the system, it is clear that an ensemble of Turing systems with arbitrary initial / boundary conditions will not yield an ensemble of patterns with appreciable PI. This is even more obvious in the cellular automaton example. To apply the proposed rule and generate the pattern, one needs to specify the initial condition: the numerical value in the cell at position one. Suppose that there is N possible lattice positions. Specifying the initial value, θ 0 , for position one then amounts to providing I 0 = log 2 N bits of information for the automaton to start working. The PI of this initial pattern (with the first cell specified, and all others unspecified) is low. After the automaton finishes, all N positions are uniquely specified, yielding PI of I(σ; x) = log 2 N bits. The automaton has therefore taken the initial information I 0 and "spread it over space" to generate PI of equal amount. If, however, the initial value is specified poorly (i.e., probabilistically, with close to uniform distribution), the resulting ensemble of patterns will have very small PI.
These restrictions trace back to the Data Processing Inequality (DPI) [27], a central result in information theory. If initial/boundary conditions for a deterministic patterning process are specified with finite precision (e.g., they are noisy across repetitions of the same pattern generation), then the resulting patterns can again be seen as draws from the distribution Q y ðσÞ, where θ are now interpreted as the true boundary / initial conditions, which, however, enter the patterning dynamics with some noise. This is much like the extrinsic noise that corrupts the morphogen signal in our Ising-like model. In the case of the simple cellular automaton, it is easy to convince oneself that the final PI must be equal to the initial information, I 0 . The corresponding patterning can be seen as a Markov chain: y 0 !ỹ 0 !s, where θ 0 is the "true" initial condition,ỹ 0 is its corrupted version, which corresponds to the initial condition not being specified with perfect precision, ands is, as before, the resulting pattern. In this case, DPI states that Iðy 0 ;sÞ Iðy 0 ;ỹ 0 Þ-the precision by which the initial state is specified limits the reproducibility of the resulting pattern. Since in this example Iðy 0 ;sÞ ¼ Iðs; xÞ, PI is limited by the specification of the initial state. The same type of argument applies generally, although the bounds for more complex systems may be difficult to derive.
What limits does the Data Processing Inequality imply for our model system? First, unlike the cellular automaton and the Turing mechanism examples above, our patterning takes place at equilibrium, so that the dependence on initial conditions is lost. When spatial interactions in our model are set to zero, the expression states at individual locations of the lattice become independent of each other. The patterning can then be seen as a Markov chain, x ! m ! σ, and DPI requires that I(σ; x) I(m; x), i.e., that PI of the patterning genes must be smaller or equal to PI of the morphogen signal. In this case, the network, no matter how complicated, cannot provide more PI than the morphogen signal already has.
How does this picture change when we allow spatial interactions? Fig 5 compares PI carried directly by the morphogen signal with PI carried by an optimized network of three patterning genes responding to that morphogen signal. For steep gradients or high extrinsic noise, PI of the patterning genes can indeed exceed PI in the morphogen signal itself, which appears to be in violation of DPI.
To explain this observation, we turn to the theoretical question of whether a patterning network downstream of the morphogen signal can contribute to encoding of PI beyond the information already present in the signal. More specifically, given a morphogen signal m and several patterning genes σ responding to it, is it possible that I(σ, m; x) > I(m; x)? Here, I(σ, m; x) is PI jointly carried by the simultaneous state of both σ and m about position x. Joint information can be split up as follows: where the last term is the conditional mutual information In this context, I(σ; x|m) is PI carried by the pattern σ additional to that in the morphogen signal m. It is zero if and only if P(σ|m, x) = P(σ|m). It is easy to see that this condition is met precisely when spatial interactions are zero. Then, σ(x) depends only on its local input, m(x), and not directly on its location within the lattice. In other words, position, the morphogen signal, and the patterning genes form a Markov (dependency) chain, x ! m ! σ. If, in contrast, patterning genes are spatially coupled, σ(x) additionally depends on the state of σ at neighboring lattice sites, which in turn respond also to their respective local morphogen signals. In that case, position, the morphogen signal, and the patterning genes do not form a Markov chain and DPI does not apply. Our derivation and numerical results show that a spatially coupled network of genes downstream of the morphogen signal can extract PI in excess of that carried by the morphogen signal itself. Without spatial interactions, this is impossible in our setup, regardless of the complexity of local gene-gene interactions. Note that spatial interactions do not necessarily lead to an increase in PI, but can do so if optimally chosen, in the regime where they permit spatial noise averaging.
Long-range interactions enable Turing-like pattern formation, and make patterns robust to system size and morphogen signal variations Is there a role for spatial interactions beyond noise averaging? To address this question, we start by studying a seemingly unrelated problem of whether a Turing-like pattern generating mechanism also exists in our discrete setup. After establishing that is indeed the case, we proceed to show that in an appropriate limit Turing-like pattern generating capability also confers two biologically desired properties onto our system: the ability of the resulting patterns to automatically scale with the system size, and robustness to systematic perturbations of the morphogen signal, a special case of "canalization" that has been discussed in the biological literature [51][52][53][54].
A distinguishing characteristic of Turing patterning is the emergence of patterns with an intrinsic length scale, which is set by the diffusion ranges of the two reacting species of the model. Until now, our model did not exhibit this property: spatial scale was either slaved to the external morphogen signal, as in the French Flag model, giving rise to boundary patterns; or the scale was one lattice spacing, as in the alternating pattern, where genes switched from ON to  . Fig 2A). The computation of Iðm; xÞ is described in S3 Appendix. Enhancing Positional Information by Spatial and Gene Regulatory Interactions OFF at neighboring sites. Is it possible to generate patterns in which blocks of sites with a controllable intrinsic length scale alternate between ON and OFF states? To test if this behavior can arise in our model, we introduce long range spatial interactions. In particular, we assume that the spatial interaction strength between a gene α at lattice site i and at lattice site j decreases exponentially with distance: J aa ði; jÞ ¼J aa exp ðÀ ðji À jj À 1Þ=rÞ, as in Fig 6A. Here,J defines the amplitude of the interaction, which can also be negative. The parameter r defines the interaction range, such that r ! 0 leads to nearest-neighbor interactions and r ! 1 leads to a uniform, all-to-all coupling.
To find a Turing-like pattern, we considered two mutually repressing genes, one of which interacts in a strong positive nearest-neighbor fashion, whereas the other has weak long range repression (Fig 6B). The resulting patterns for several interaction ranges r are shown in Fig 6C. The system is clearly able to generate a blockwise alternating pattern with a length scale that is controlled by the interaction range, r. We emphasize that the only morphogen signal in this case is a strong positive bias at the anterior boundary, experienced by the first lattice site, provided to break the symmetry between the resulting pattern and its inverse. This explicitly demonstrates that the block length scale is not inherited from the shape of the morphogen signal, but is intrinsic to the interactions between the patterning genes. The same effect can be generated with a single gene, if we allow spatial interactions to change sign with distance (e.g., positive interaction at the nearest-neighbor range, and negative interaction with lattice sites at greater distance).
Can we combine the ability of the Turing-like mechanism to generate intrinsic patterns with the network architecture that yields high-PI Counter patterns? The Turing mechanism has an attractive property which makes use of the morphogen signal only to break the symmetry between two possible intrinsically stable patters that are inverses of each other; in all other respects, the pattern is invariant, or robust, to changes in the morphogen signal magnitude or shape. The notion that external signals simply serve to select one of the few stable patterns (attractors) of the system, while the properties of the patterns are generated by intrinsic interactions, is known as "canalization" in the biological literature, but also has a very long history in neuroscience and statistical physics. Another feature of patterning that appears beneficial in a biological context is the ability of the system to translate small variations in the overall system size into proportional variations in the resulting gene expression pattern, i.e., to scale the pattern system size. If the system exhibits such "scaling, " gene expression features, for instance boundaries between ON and OFF states, will occur at constant fractional coordinates in the system, rather than at constant absolute coordinates. Here we test whether elements of Turinglike patterning can provide "scaling" and "canalization" properties to our Counter networks.
Before proceeding, we give an intuitive account as to why negative long range spatial interactions can be of benefit. A particular property of Counter patterns is their balanced use of ON and OFF expression states. Upon perturbations to the morphogen signal or shifts in pattern position because of system size changes, this balance will be broken. We are thus looking for a modification to the energy function, Eq (2) (alternatively, Eq (4) for multiple genes), that would penalize any deviation away from such balance. The simplest way to implement this is to add a term of the form ÀJ ð X N x¼1 sðxÞÞ 2 (or equivalent term for multiple genes). The term in parenthesis will be zero if the ON and OFF states are exactly balanced, and will be positive on any deviations in balance. WhenJ is chosen to be negative, such deviations are disfavored. If one analytically expands the square and takes into account that σ 2 [−1, 1], one finds that the term corresponds to a global, all-to-all (long range) negative interaction between any pair of locations in the lattice. Crucially, such a term is an additive addition to an optimized Counter network: because the optimized Counter exhibits the ON / OFF balance already, the new term doesn't change the Counter state itself, but only makes states deviating from it less likely. Realistically, the strength and range of such interaction cannot be made infinite; the relevant question is therefore whether canalization and scaling can be obtained with aJ of finite magnitude and range. We took the optimized two-gene Counter network and equipped it with long range negative spatial interactions, as shown in Fig 7D. We first tested whether this addition confers the scaling property to our Counter network. To this end, we consider a morphogen signal m(x) which spans its dynamic range over a default system size of N = 60 lattice sites. We model variations of the system size without scaling of the gradient by cutting the system and the gradient short by 10 lattice sites or extending it by 10 lattice sites at constant value of the morphogen signal in the posterior (Fig 7A). The corresponding gene expression patterns of the two-gene Counter network are depicted in Fig 7B. Without new long range spatial interactions, variations in system size do not result in the scaling of the pattern, as expected; instead, boundaries are fixed to their absolute positions. In contrast, when we add negative long-range interactions to each gene in our network, the pattern scales approximately, as shown in Fig 7C. Specifically, the central boundary is preserved with large precision, while the boundaries at 1/4 and 3/4 shift marginally.
Next, we examine the robustness to systematic perturbations in the morphogen signal in detail. We introduce an additive offset to the morphogen signal,mðxÞ ¼ mðxÞ þ , and ask about the resulting gene expression patterns with or without negative long-range spatial interactions. Note that additive perturbations in the morphogen signal map, in the thermodynamic model of gene regulation, to multiplicative perturbations in the morphogen concentration, c(x). Such perturbations can be interpreted as variations in the morphogen dosage, and robustness to such perturbations has been a focus of several experimental studies. Fig 7E demonstrates that strong long-range spatial interactions indeed successfully make the pattern robust to (large) changes in , while in the absence of such interactions the patterns experience large shifts with .  Example patterns generated by the optimal Counter two-gene network, at different strengths of long range interactions,J, and different morphogen signal perturbation magnitudes, . At = 0, irrespective ofJ, the system generates the optimal pattern. When || increases, the patterns shift (providing less PI) with weakJ, but whenJ is strong, the pattern is robust to such perturbations. (F) To quantify the robustness to perturbations, we compute the overlap, S, of the resulting pattern with the optimal Counter pattern. The overlap is shown as a function of andJ; for strong negativeJ, the overlap is high irrespective of the perturbation strength, . (G) Susceptibility to small perturbations as a function ofJ shows transition into a robust regime, χ m ! 0, asJ increases in magnitude. doi:10.1371/journal.pone.0163628.g007

Enhancing Positional Information by Spatial and Gene Regulatory Interactions
To quantify the stability of the pattern, we compute the overlap, SðÞ ¼ 1 N hσi ¼0 Á hσi , as a function of the perturbation strength. An overlap of 1 means that under the perturbation the resulting pattern is identical-therefore fully robust-to the (Counter) pattern without perturbation; an overlap 0 means that on average half the expression states are inverted, while an overlap of -1 indicates that the perturbed pattern is the exact inverse of the unperturbed one. Mapping out the overlap as a function of perturbation strength, , and the long range interaction magnitude,J , in Fig 7E, we see that with sufficient, but still finite,J , the patterns can be made almost completely robust to large morphogen signal perturbations. A similar differential analysis in Fig 7F shows how the susceptibility of the gene expression pattern to small morphogen signal perturbations vanishes as the strength of the long range interactions increases.
In sum, we have shown that "canalization" and "scaling" in our toy model system can be provided by the addition of long range negative spatial interactions. Curiously, these interactions can be added to a previously optimized Counter network without any need to change the optimized parameters, because they do not affect the energy or the identity of the Counter pattern ground state. As in the Turing model, these interactions stabilize the ground state. Unlike the Turing model, however, the Counter pattern itself is generated solely by a joint action of a French Flag-like morphogen signal readout and local gene-gene interactions.
Should the long-range negative interactions that confer robustness also follow from an information optimization principle? Recalling our introductory remarks, robustness to, e.g., morphogen dosage could be measured directly as a low value of information Ið;σÞ, implying that changes in would not affect the expression patterns. The question is whether to gain robustness one needs to explicitly maximize PI while jointly minimizing Ið;σÞ, or whether it is sufficient to maximize PI only and get robustness as an automatic consequence. Due to the numerical complexity of the problem we did not perform such a large scale joint optimization here, but at non-zero intrinsic or extrinsic noise long-range interactions will stabilize the Counter pattern and thus maintain high PI beyond what could be achieved by local interactions alone. In the PI-maximization framework which we are proposing here, it thus seems that maximization of PI alone will also yield robustness, so long as the noise statistics under which the optimization is carried out contain the kinds of perturbations to which the system should be robust. As a conjecture, we suggest that, were we capable of carrying out large-scale optimization numerically, we would find optimal solutions with long-range negative couplings that yield canalization if our extrinsic noise also consisted of correlated additive fluctuations (that mimic the overall additive shifts in the morphogen signal considered above).

Discussion
We introduced a tractable toy model of patterning that extends Wolpert's French Flag model, in which several two-state genes respond to a morphogen signal based on a fixed set of thresholds. Our extension allows these genes to cross-regulate each other, to interact spatially (e.g., due to diffusion or cell-cell signaling), and to include the effects of both intrinsic noise (e.g., due to stochasticity in gene expression) and extrinsic noise (e.g., due to variability in the morphogen signal). A physics equivalent of our model is a set of coupled 1D Ising chains responding to inhomogeneous external field and this link to statistical physics allows us to perform most of our computations exactly. In this well-defined setup we ask which patterns of gene expression encode the maximal amount of positional information; our enquiry is set in an optimization framework, where for each choice of noise magnitude and morphogen signal profile we look for the optimal pattern of gene-gene and spatial interactions, and examine the resulting spatial patterns of gene expression.
We find that with vanishing noise magnitude, the optimal gene expression pattern is the socalled Counter, where each of the 2 K binary patterns that can be realized by K genes appears equally often along the spatial coordinate; this combinatorial code for position is, on information-theoretic grounds, the best achievable solution, independently of the patterning model. The Counter pattern can be realized if each patterning gene can be activated at a different threshold, and if the local gene-gene interactions can be adjusted; the optimal interactions are predominantly repressive. Increasing the noise quickly perturbs the Counter pattern, until the optimal pattern switches from Counter to French-Flag-like set of stripes, and finally, to a set of redundant genes that are all activated at the same threshold. Addition of short-range positive spatial interactions of intermediate strength makes the optimal patterns substantially more stable against increasing noise; qualitatively, this effect is analogous to noise averaging due to diffusive coupling in continuous systems. Strong short-range negative interactions generate an alternating pattern of gene expression which robustly separates odd from even rows of cells, a strategy that can yield one additional bit of PI when noise is low.
Qualitatively new effects emerge if the spatial interactions can be long-ranged. We observe that our discrete, Ising-model-based model exhibits Turing-like patterns whose spatial scale is set by the range of the repressive interactions. Surprisingly, we find that the energy function, optimized to yield Counter patterns with high positional information, can be modified by the addition of strong long-range repressive interactions. This modification does not perturb the Counter pattern, but makes it robust to changes in the morphogen dosage and to changes in system size by producing patterns that approximately scale with system size.
Taken together, our analysis allows us to identify elements-the basic building blocks-of positional information: adjustable thresholds as in the French Flag model to differentially drive gene activation; local repressive interactions to generate combinatorial codes for position; short-range positive spatial couplings to enable noise averaging and stabilize the patterns; and long-range negative spatial couplings to provide scaling and robustness via canalization. The optimal patterning system thus combines elements from both the French Flag model with the elements inherent to the Turing mechanism. Furthermore, these elements need not be identified and combined by hand, but emerge from a single information-theoretic optimization principle, and their contribution towards encoding of positional information can be individually quantified.
The explicit purpose of this work has been to provide conceptual clarity and computational tractability rather than a detailed model of any particular patterning system. In order to achieve our goals, we had to sacrifice several crucial aspects of biological realism. First, real patterning does not happen at equilibrium, but is rather a driven dynamical process evolving from some initial state. Second, as the real patterning mechanisms are dynamic and might involve multiple timescales, noise mitigation mechanisms beyond spatial averaging, e.g., temporal averaging, should be available. Third, the assumption that expression states of patterning genes are binary might be poor. For example, in the gap gene system in Drosophila, intermediate levels of expression are of crucial importance [25]. On the other hand, regulatory circuits where individual genes have strong positive self-interactions and thus exhibit bistability could be well captured by our model. Fourth, the Ising framework assumes a particular (Boltzmann) distribution over expression states and thus leaves no degree of freedom to describe intrinsic stochasticity beyond its magnitude; in contrast, gene expression noise in real regulatory networks has a complicated relation to the mean expression [55]. Furthermore, because our model is a statistical physics model at equilibrium, interactions between the genes are necessarily symmetric, which does not need to be the case in realistic gene regulatory networks. Last but not least, we here follow previous models of embryonic development in that we consider patterns along a single (one-dimensional) axis of the embryo, assuming independence from the patterns along the other axes [2,16]. An interesting direction of future research would be to study patterns with high positional information in more than one dimension and in different geometries.
Despite these approximations, our model qualitatively recapitulates many aspects of the optimal patterning solutions that have been reported previously in more realistic setups, where the required computations are substantially more complicated [36][37][38][39][40]50]. This is in line with our previous observations that many mechanistic details that define the pattern-forming system Q do not matter so long as the system has the ability to access patterns that achieve high positional information (which can be found by optimization). Pattern-forming processes can be complicated and can include spatial interactions between nearby cells or even long-range or global interactions. Ultimately, however, these systems generate patterns whose power is quantified by a local quantity, the positional information I, which is sufficient to summarize the limits to readout error. For optimal solutions to the patterning problem, this locality of positional code is crucial: there are many possible pattern-forming systems, but only in a restricted subset do we expect high information about global position to be available locally.