Optimal Signal Processing in Small Stochastic Biochemical Networks

We quantify the influence of the topology of a transcriptional regulatory network on its ability to process environmental signals. By posing the problem in terms of information theory, we do this without specifying the function performed by the network. Specifically, we study the maximum mutual information between the input (chemical) signal and the output (genetic) response attainable by the network in the context of an analytic model of particle number fluctuations. We perform this analysis for all biochemical circuits, including various feedback loops, that can be built out of 3 chemical species, each under the control of one regulator. We find that a generic network, constrained to low molecule numbers and reasonable response times, can transduce more information than a simple binary switch and, in fact, manages to achieve close to the optimal information transmission fidelity. These high-information solutions are robust to tenfold changes in most of the networks' biochemical parameters; moreover they are easier to achieve in networks containing cycles with an odd number of negative regulators (overall negative feedback) due to their decreased molecular noise (a result which we derive analytically). Finally, we demonstrate that a single circuit can support multiple high-information solutions. These findings suggest a potential resolution of the “cross-talk” phenomenon as well as the previously unexplained observation that transcription factors that undergo proteolysis are more likely to be auto-repressive.


Introduction
Genetic regulatory networks act as biochemical computing machines in cells, measuring, processing, and integrating inputs from the cellular and extracellular environment and producing appropriate outputs in the form of gene expression.The behavior of these networks is not deterministic; many of the molecules involved in genetic regulation (e.g., DNA, mRNA, transcription factors) are found in low copy numbers, and are thus subject to severe copy number fluctuations.In living cells, the origins and consequences of stochasticity are well-studied (27; 30; 36; 43; 52; 70); one can analyze propagation of noise through cellular networks (46) and disambiguate noise from different sources (e.g., intrinsic vs. extrinsic (16; 49; 61)).Surprisingly, cells function in the presence of noise remarkably well, often performing close to the physical limits imposed by the discreteness of the signals and the signal processing machinery (4; 6).
From an information-theoretic perspective (58), noise intrinsic to the gene network presents an obstacle for signal transduction and biochemical computation: with too much noise, the information about the state of the environment (the signal) may be lost.While it may be possible to build stable biochemical switches even in the presence of just tens of copies of a transcription factor (5), networks often need to exhibit a more detailed response.As an example, the well-studied p53 module responds to ionizing radiation in a "digital" manner (35; 37), initiating a number of disparate cellular responses, including cell cycle arrest, apoptosis, and induction of cellular differentiation, among others (66).The p53 module must not only transduce a simple binary answer (was there DNA damage or not?), but also more specific information (What was the damage?How severe?What should be done about it?)It is not evident that a few tens of molecules, whose abundance is subject to intrinsic copy number fluctuations, can successfully perform this task.Of note, a series of recent papers studying the effect of single allele loss in various tumor supressor genes, including p53, challenge the classic two-hit model of tumorigenesis (32) by demonstrating dosage-dependent modulation of phenotype (see (17; 21; 28) and references therein).
The above example is just one of many instances of "cross-talk"-a perplexing dilemma observed across many cellular signaling systems in which a single noisy biomolecular species, presumably existing in just two states (active/inactive), is able to transmit complicated information.Perhaps the most well-studied example of cross-talk occurs in the protein signaling mitogen-activated protein kinase (MAPK) pathways.MAPK cascades transduce multiple stimuli from the environment into distinct genetic programs.Many of these signals are transmitted by common components (39).Specificity can be established by sequestration including cell type, subcellular localization, temporal, or with scaffold proteins (7; 38; 39; 53; 57).In some cases sequestration mechanisms are not available and specificity is achieved via signaling kinetics.For example, in mammal pheochromoctyoma cells, ligands triggering distinct programs (proliferate or differentiate) activate the same receptor tyrosine kinase pathway but with different amplitudes (42).In fact, by increasing or decreasing receptor expression, the wrong program may be also be initiated (55) implying that tight control of kinetics may be required and that uncontrolled noise may be disastrous.Cross-talk can be exploited by cancer cells to initiate uncontrolled cell growth (19) even in the presence of chemotherapeutic agents targeting individual signaling pathways.
In this Chapter we present an information-theoretic measure of circuit quality which is independent of an assumed network's function and demonstrate that generic small networks under biological constraints can transduce more information than a simple binary switch, often coming close to the optimal transmission fidelity, which we calculate numerically and analytically from physical constraints.In choosing a general informationtheoretic quality measure, we obviate the problem of requiring prior knowledge of the function of the network (1; 3; 15; 33; 34; 59; 65; 68), which is obviously network-specific and often unknown, and the related problem that a given network may perform multiple functions (31; 67; 69).We also demonstrate that the presence of an odd number of negative regulators in a feedback loop confers an advantage to the circuit in terms of noise regulation and thus information transmission.Finally, we show that the ability to transduce information reliably is insensitive to most large (tenfold) deviations of a network's kinetic parameters.

Measure of quality of a biochemical computation
To motivate our approach, consider the experimental set-up of Guet et al. (25).Probing experimentally the relationship between structure and function in transcriptional networks, Guet and coworkers built a combinatorial library of 3-gene circuits and looked at the steady-state expression G of a reporter gene (GFP), coupled to one of the genes in the circuit, in response to four different chemical inputs C, namely two binary states of two different chemicals.The circuits acted as transducers, converting chemical signals into genetic response.They found that some networks could perform different behaviors (that is, behave as different logic gates), while others could achieve only one particular function.Of note, while some circuits responded differently to different inputs, for other circuits, the reporter expression did not depend on the chemical input state.The latter are clearly "broken circuits," transducing no information about the inputs.
The actual number of GFP reporters in each cell clearly is not repeatable due to the stochastic nature of the involved cellular machinery.For this reason, the inputoutput relation for a circuit should be described in terms of the probability distribution P (c, g) ≡ P (C = c, G = g), where c stands for particular chemical states, and g measures the number of reporter molecules.Then a natural measure of a circuit's quality is the mutual information between its inputs and outputs ( 58) where log is taken with base 2.This dimensionless, nonmetric quantity measures in bits the extent to which C and G are dependent (complete independence implies p(C, G) = p(C)p(G), and thus where H(X) is the entropy, H(X) = − x p(x) log p(x).In (25), there were ||C|| = 4 possible input states c ∈ {1, 2, 3, 4} = {c i } and two possible output states, GFP on or off.For a circuit with a constant g, H(G) = 0, and then I(G, C) = 0.At the other extreme, if the reporter gene is on for exactly two of the four equiprobable chemical inputs, then each reporter state has p = 1/2, and I(C, G) = 1 bit.Similarly, for multinomial distributions of g, the mutual information seamlessly takes into account all possible relations between g and c.
A crucial advantage in adopting mutual information as a quality measure is that it can be evaluated independently of the function of the circuit.For steady state responses considered here, the only reasonable way to define a qualitative function of the circuit, or to characterize the computation performed by it, is to consider how g(c i ) are ordered.
As long as all ||C|| responses are sufficiently resolved, the mutual information will be ∼ log ||C||, irrespective of the ordering.Thus the mutual information-based circuit quality measure is insensitive to the type of computation performed by the circuit, but only to whether the computation assigns a different output to each input.Furthermore, due to the data processing inequality (58), high I(C, G) is a sufficient condition for a high-quality realization of any computational function that depends (stochastically or deterministically) on P (c, g).High I(C, G) is especially important for sensory stages in biochemical signaling, where the same biomolecular species may control responses of many different biochemical modules, requiring high quality information about many different properties of the signal at the same time.

Proposal
We propose to investigate how the topology of a regulatory circuit affects its computational and information transmission properties, as measured by the steady state signal-response mutual information, Eq. ( 1).While the results of (25) may be interpreted as revealing that some circuits may perform better than others, this effect can be caused in part by operating at suboptimal kinetic parameters, some of which are biologically easy to adjust to improve the information transmission fidelity.In fact in (25), several networks identical in topology but different in their kinetic properties, performed markedly different functions.To avoid the problem, we study instead the maximum mutual information attainable by the circuit under realistic conditions.Specifically, for a regulatory circuit t, with a set of kinetic parameters ϑ = {ϑ 1 , ϑ 2 , . . .}, which responds to inducer (input) concentrations C = {c i } = {c 1 , c 2 , . . .} with different genetic (output) expression levels P (g|c, ϑ), we propose to investigate numerically ϑ * t = arg max ϑ I t (C, G).As in (25), we limit ourselves to 3-gene topologies where each gene is regulated by exactly one transcription factor (see Tables 4 and 5 for the list of these topologies, and Sec.2.1 for their more detailed description).We measure the output of the circuit in terms of the steady-state expression of the reporter gene, which is always downregulated by another gene denoted Z (see Tables 4 and 5).This limits us to 24 possible circuits, cf.Sec.2.1.The kinetics associated with these topologies are described in Sec.2.2.Note in particular that even though we use the genetic regulation terminology throughout the paper, the kinetic model is general enough to account for protein signaling and other regulatory mechanisms as well.
For each of the chosen topologies, we need to find stable fixed points of the dynamical systems that describe the circuit, cf.Sec.2.3, evaluate the fluctuations around them, P (g|c), estimate the corresponding mutual information, and then optimize it with respect to the kinetic parameters.Note that all of the parameters of the system that we treat as variable can, in fact, be easily adjusted by the cell over its lifetime by means of many biological mechanisms, cf.Sec.2.2.
Rather than discretizing the reporter output, as in (25), we take into account the actual numbers of the reporter molecules.Assuming mesoscopic copy numbers, we use the linear noise approximation, cf.Sec.2.4, to derive the reporter gene distribution as a sum of Gaussians with means at the stable fixed points.Under this assumption, the mutual information between the two random variables, C representing the discrete chemical (input) states, and G measuring the continuous reporter expression (output), is Here M is the total number of fixed point calculations performed for the circuit, and M c is the number of those done with C = c; N (g c i , σ c i ) denotes the output response for the i'th calculation with C = c, which is a Gaussian distribution with mean g c i and variance (σ c i ) 2 .Many calculations at each point in the parameter space, ϑ, are needed to explore multiple stable fixed points of the dynamical system (see Sec. 2.3).Finally, we choose each chemical state with equal probability, P (c) = const.
When optimizing Eq. ( 2) with respect to ϑ (see Sec. 2.6), we need to consider two computationally trivial (and biologically unrealistic) ways of achieving high I(C, G).First, given discrete c and an infinite range of g, achieving the upper bound I(C, G) = H(C) is easy: as the number of molecules of the reporter g c increases, the magnitude of its fluctuations, as measured by its standard deviation σ c , grows slower as σ c ∼ √ g c , so the responses to all c's can be well-separated if we allow for an infinite number of molecules.This, however, is not biologically relevant, since producing many copies of a molecule takes time and energy, both of which are limited.In fact, here we are interested only in solutions that involve low copy numbers as this is precisely the regime in which gene regulatory networks function.We note also that many apparently deterministic, high copy number systems may actually fall into this regime if the threshold of the system can be overcome with only a few molecules (9; 29; 41; 56).
Second, and perhaps less obvious, a trivial solution can also be obtained if we allow for multi-scale (stiff) systems.For example, if the response time of the reporter τ G is very large relative to the upstream regulators τ Z , then all of the noisy upstream fluctuations will be filtered (4; 6): effectively, the reporter takes N Z τ G /τ Z ≫ 1 samples of Z per its response time (here N Z is the mean number of Z molecules), and fluctuations are small.However, living cells must respond in a timely manner to changes in the environment, so infinite response times are also not biologically relevant.
These observations suggest that our objective function to be maximized requires some biologically reasonable constraints.For this reason, we have investigated many different realizations of the constraints, and, instead of maximizing the mutual information, we chose to maximize the following constrained mutual information where λ and γ are chosen such that the average number of molecules of all of the components in the system N = where N c ij is the average number of molecules of species i for fixed point j given C = c and N s = 4 is the total number of species in the system) is on the order of 100, and the average stiffness of the system q = i r i /r G (where r i are the decay rates of the transcription factors, and r G is the decay rate of the reporter) is approximately 3 orders of magnitude.

Topologies
As in the experimental set-up in (25), we consider 3-node circuits, in which genes are regulated by exactly one gene (including the possibility of auto-regulation).This also reduces the assumptions we would otherwise need to make about the dynamics associated with combinatorial regulation.The 3 genes (X, Y , and Z) in each circuit are interconnected by exactly 3 edges.There are only 3 such non-redundant topological structures, which, when we include the possibility of either excitatory or inhibitory interactions, results in 2 3 = 8 possible configurations per structure, for a total of 24 topologies (see Tables 4 and  5).The fourth (reporter) gene G is always down-regulated by Z, as in (25).Extensions to other topology classes are easily implemented.

Model and parameters
The dynamics of transcription and translation have been modeled with remarkable success for small circuits by avoiding the translation step completely and coupling the genes to each other directly by means of simple rational functions α j (15; 20; 26).In general, each of the species X = {X, Y, Z, G} in the circuit is subject to a degradation and a creation process The macroscopic, deterministic description of the system is governed by the system of ordinary differential equations where {φ 1 , . . ., φ N } is the concentration vector of the N chemical components, r j is the degradation rate of φ j , and α j is a production term that depends on the concentration of a regulator (parent) molecule of j, namely π j .The production is modeled as a constitutive expression (the leak) plus a Hill activation or inhibition, or where a 0 describes the leakiness of the promoter, a specifies its dynamic range, K is the concentration of the regulator at half-saturation (the Michaelis constant), n is the Hill coefficient, and s i is the modulating effect of the i'th input molecule on the regulator protein.s i can be modeled equivalently by rescaling K.One can think of this as the chemical signal binding to the protein, changing its conformation, and influencing various affinities.The two equations correspond to an excitatory and inhibitory regulation, respectively.For this dynamics, there is no distinction between the protein and the mRNA of a gene species, and we use the terms interchangeably.As in (25), we allow each input to take two binary states (either the input molecule is present or not).We have a total of 3 inputs and 2 3 input states, and each input modulates the expression of one of the three transcription factors.For a chemical state c where an input molecule i is not present, we set s i = 1.We set the units of measurements such that volume Ω of a cell is 1, so that concentration of 1 is equivalent to 1 molecule per cell.In all we have 1. 3 decay rates r X , r Y , r Z corresponding to decay rates for the 3 transcription factors.
We set r G corresponding to a response time of approximately a half hour.
2. 4 Michaelis constants K X , K Y , K Z , K G and 4 range parameters a X , a Y , a Z , a G describing the regulation function for each component of the circuit.
3. 3 input parameters s X , s Y , s Z modulating the effect of each input on the 3 transcription factors 4. 1 leak parameter a 0 .
For simplicity we assume n = 2.In total we have 15 parameters.Notice that all of these parameters can be easily adjusted by the cell by means of a variety of biological mechanisms, thus validating our proposal to study the dependence of the signal transduction, optimized with respect to the parameter values.Below is a non-exhaustive list of such regulatory mechanisms.
1.All protein/mRNA decay rates can be adjusted independently of each other by microRNA expressions or by regulated proteolysis, such as using ubiquitin tagging.
2. Michaelis constants depend on structural properties of proteins and the DNA, as well as on the abundance of the proteins near a DNA binding site compared to the overall protein concentration.Thus they can be adjusted by chromatin rearrangement, or by controlling the nuclear pore transport.
3. Effects of chemical inputs on transcription factors depend on the chemical-protein affinity and on the abundance of the chemicals near the relevant proteins.The former can be changed by modulating chemical-protein binding reaction by means of expression of various enzymes, while the latter can be achieved by controlling transport processes.
4. The leak depends on the concentration of the RNA polymerase, ribosome, as well as the DNA accessibility.All are easy to adjust in a living cell.

Determining Stable Fixed Points
All of our circuits incorporate some feedback mechanism (e.g., the "feedback dyad" (71)) and, therefore, may have multiple stable steady state solutions.We find these by numerically solving the macroscopic chemical kinetics system (6) describing the circuit using MATLAB's ode15s with the parameters as described in section 2.2.We randomly sample different initial conditions for the time-evolution to obtain a set of (almost all) fixed points for each chemical state and each topology.Additionally, since in vivo the system will be flipping between different input states, the steady-state solution of one input state is the potential initial condition for the time-evolution of the other inputs.To include these potential initial conditions, we first randomly choose 10 initial conditions for each c i , and then we take the resulting stable solutions and use them as the initial conditions for each c j =i .When a time-evolution of the system results in oscillations or chaotic behaviors, we neglect these solutions since, under our assumptions, they will result in multiple genetic outputs corresponding to the same chemical input and hence in a low mutual information.

Linear Noise Approximation (LNA)
For excellent reviews and discussions of the linear noise technique, we refer the reader to (13; 14; 45; 64).Here we briefly review one particular formulation that simplifies the analysis.
Given a system with volume Ω and N different particles, we denote the particle concentrations as φ = {φ 1 , . . ., φ N }, and the copy numbers as n = Ωφ.The state of the system is defined by n, and it changes when an elementary reaction j, j = 1, . . ., R takes place.When reaction j occurs, the copy number n i changes by S ij , which is the N × R stochiometric matrix.Then the evolution of the joint probability distribution P (n, t) is given by the following master equation where E −S ij is the step operator, which acts by removing S ij molecules from n i , and f j is a rate for j.While this equation is usually mathematically intractable, a Monte Carlo algorithm exists to solve it numerically (the Gillespie algorithm) (22; 23).To generate a particular stochastic trajectory, this method draws random pairs (τ, e) from the joint probability density function P (τ, e|n), where τ is the time to the next elementary reaction, and e is its index.Multiple trajectories allow to estimate the necessary moments of P (n, t).However, this approach is computationally intensive, and quickly becomes infeasible if one wants to explore multiple system parameterizations, or if f j span multiple scales.
Alternatively, one can expand the master equation in orders of Ω −1/2 .Introducing ξ, such that n i = Ωφ i + Ω 1/2 ξ i and treating ξ as continuous, the first two terms in the expansion yield the macroscopic rate and the linear Fokker-Plank equations, respectively: where The steady-state solution of Eq. ( 11) is a multivariate Gaussian where the covariance matrix Ξ is given by the matrix Lyapunov equation This system is solved using the standard matrix Lyapunov equation solvers (MATLAB's lyap).In order to assess the validity of LNA for our system we compared the steady state solutions to multiple Gillespie runs.We found that, even at very low copy numbers (∼ 10), LNA performed well as measured by the Jensen-Shannon divergence (see Sec. 2.5 for details).Based on these results, we approximate the steady-state distribution as a sum of multivariate Gaussians with means at the stable fixed points of Eq. ( 10), and with covariances as in Eq. ( 13).
We note that both the LNA and the Gillespie algorithm are derived assuming that the reactions j are truly elementary.In our system, a single particle creation, α, encapsulates all processes, starting from the protein-DNA binding and ending with the translation.Justification for using "elementary complex" reactions is provided in (2; 8; 11; 14; 44; 48)).However, the complex nature of the reactions has acomparatively small influence on the low frequency components of thestochastic response (? ), which is our focus here.For this reason we believe that approximating terms in Eq. ( 6) as elementary and using LNA is a less important approximation than merging transcription and translation into a single step.Generalizations to LNA with elementary reactions is straight-forward, provided the reaction system is known (which is more complicated).

LNA Performance
Here we quantify LNA accuracy as a function of the mean molecule numbers in the system.We approximate the steady-state probability distribution p l using LNA and compare this to the steady-state distribution p g obtained with multiple trajectories of the Gillespie algorithm.As a measure of the similarity between the two probability distributions, we use the symmetric Jensen-Shannon divergence where D KL [pl|q] = i p i log 2 (p i /q i ) is the Kullback-Leibler divergence and we set π 1 = π 2 = 1/2.We note that D KL ≥ 0 and in the case that the two distributions are equal p l = p g , then D KL = JS Π = 0. We calculated JS Π for multiple circuits and multiple parameterizations and found excellent consistency between LNA and Gillespie (see Fig. 1).Below 10 molecules, the two distributions were easily distinguishable, but above 10 molecules we consistently found large overlap.

Optimization
We employ a simplex optimization (using MATLAB's fminsearch) to maximize L = I(G, C) − λ N − γ q over the log 10 of the 15 parameters where λ and γ are chosen to accommodate biologically relevant molecule numbers and stiffness.For example, for an average of approximately 100 molecules for each transcription factor and a stiffness  Figure 1: Jensen-Shannon divergence JS Π between distributions obtained by LNA and the Gillespie algorithm for multiple circuits and multiple parameterizations plotted as a function of mean copy number.At JS Π = 0, the distributions are identical.There appears to be a sharp threshold at 10 molecules, below which the LNA does poorly, but above which, LNA does well.
of order 3 we choose λ = 0.01 and γ = 0.001.To explore the parameter space for each topology, we uniformly randomly select biologically relevant starting points (protein half-life near 10 minutes, promoter leakiness near 0.01 proteins/sec, promoter range near 10 proteins/sec, regulator at half-saturation near 100 proteins/sec, and input molecule modulation of regulator near 2).To make the search for maxima more efficient we only maximize random points that start already above a certain threshold (L ≥ 0).

Maximum mutual information for a fixed copy number
Suppose a molecular species G with concentration g, dgP (g)g = N G is used as a reporter species for a cascade of biochemical computations, so that the species is not allowed to participate in any feedback loops.Then its stochasticity is limited from below by a Poisson noise.That is, if ḡ is the deterministic value of g produced by some biochemical reaction kinetics, and ḡ ≫ 1, then Furthermore, ḡ itself is distributed probabilistically according to P (ḡ), dḡP (ḡ)ḡ = N Ḡ, due to stochasticity of inputs to and of the internal dynamics of the biochemical system.We are interested in the maximum number of bits that can be transmitted reliably by this reporter species (that is, is its channel capacity) at fixed N G .
Intuitively, the noise in this system is ∼ √ N Ḡ ≈ √ N G , so the number of distinguishable states of the reporter is also ∼ √ N G , and one should be able to transmit about 1/2 log 2 N G bits reliably.This argument has been used extensively (e.g., (10)).However, it fails (a) to establish the correct constant of proportionality in front of the number of distinguishable states and (b) to take into the account the ḡ dependence of the noise variance (which leads to a higher resolution at smaller ḡ).Both of these effects are likely to contribute only O(1) bits to the channel capacity, but, for N G 100 considered in this work, this might be an important correction.We are unaware of a prior derivation of the channel capacity for this system up to o(1), and we present it here.
To find the channel capacity of the reporter species, we maximize I(G, Ḡ) with respect to P (ḡ) subject to This results in where ≈ is due to the approximation involved in replacing H(G) by H( Ḡ).Plugging P ( Ḡ) into the equation for I, we get the channel capacity Thus, for the optimal distribution of inputs, as in Eq. 20, the naive estimate of I 0 = 1/2 log 2 N Ḡ for a biochemical reporter is correct up to terms non-vanishing with N −1 Ḡ .For the distribution of inputs analyzed in this work (up to 8 discrete input states), the maximum possible I(G, Ḡ) is clearly less than this channel capacity.One can obtain the maximum information for such input distributions by numerical optimization of I with respect to the values of the ḡ input states, assuming a Poisson distribution of g around ḡ.

Network noise analysis using LNA
Consider a regulatory network of N transcription factors indexed by i ∈ {1, 2, . . ., N}.The macroscopic behavior of the system is determined by where φ i is the concentration of the i'th transcription factor.Let n = Ωφ be the vector of molecule copy numbers with volume Ω.Using LNA (13), we can calculate the covariance matrix C = (n − n )(n − n ) T = ΞΩ by solving Eq. 13: This suggests that the topology or structure of the network can also play a role in controlling noise.Specifically, the variance of the i'th transcription factor C ii can be reduced by decreasing the product ∂f i ∂φ j C ij < 0, where j ∈ π i .
The covariance C ij is a more complicated function of the other covariances: If j ∈ π i , then from Eq. 28 we see that C ij is a function of the covariances between i and the regulators of its regulators (C ik , where k ∈ π j , and j ∈ π i ).We can write these covariances in turn as functions of covariances between i and the regulators of regulators of regulators of i, and so on.This implies a recursion which will end when we either reach a regulator which has no other regulators or, in the case of a cycle, we reach i again.
In the latter case, the recursion will end back with C ii , and the last term in Eq. 28 will have the form Since C ii ≥ 0, this implies that one way to reduce C ij (and hence C ii itself) is to have the product in Eq. 29 be negative.Crucially, the only way to achieve this is if the cycle contains an odd number of negative regulators.

Some simple examples of sub-Poisson noise
The transcription factors in the network may participate in various feedback loops.In some cases, this allows the usual Poisson noise lower bound to be overcome, resulting in a sub-Poisson noise (C ii < φi ).Below we give some simple examples for 1-,2-,and 3-cycles.
The set-up of ( 25), which we use in this work, simplifies the analysis since we only consider one promoter transcription factors, so that f i = −r i φ i +α i (φ π i ), where π i includes just one gene.In steady-state, φi = α i /r i .Finally, all of our reactions are enzymatic, so the diffusion matrix B will only have diagonal nonzero elements.Then, since B ii = r i φi + α i , we use the expression for φi to find B ii = 2r i φi .

Auto-repression
For the auto-repressive case there are no covariance terms and ∂f i ∂φ i = −r i + α ′ i , so we can rewrite Auto-repression implies α ′ i < 0. Thus C ii < φi , resulting in a sub-Poisson noise.A similar derivation using LNA is given for regulated degradation in (12) and regulated synthesis in (62).

A 2-cycle
In this case, π i = i − 1, and π i−1 = i.Assuming no auto-regulation, let ∂f i ∂φ i−1 = α ′ i and . Now we write, To reduce C ii we can reduce the magnitude of C i,i−1 .One way to achieve this is to have opposite signs for α ′ i and α ′ i−1 .Moreover, the sub-Poisson noise is possible if Thus the presence of a negative and positive regulator in a 2-cycle is a necessary, but not sufficient condition for achieving sub-Poisson noise.For sufficiency, we also need |α

A 3-cycle
In this case, π i = i − 1, π i−1 = i − 2, and π i−2 = i.The variance equation stays the same However, now we have and Combining the above into a single expression for C ii , we have The last term gives us a product of the derivatives, If this product is negative (that is, if we have an odd number of repressors in the cycle) then we can reduce the overall magnitude of the variance C ii .Note here that we have two extra terms in the variance.One, (α ′ i ) 2 C i−1,i−1 , is always positive, while the other can be of either sign.Thus the overall negative regulation is not a guarantee of a sub-Poisson noise in this case.
Ultimately, noise regulation can be improved with cycles with odd number of negative regulators.However, as cycles get larger and the network becomes more complex, the achievability of sub-Poisson noise becomes more limited.This may be related to the observation that, whereas small cycles are over-represented in a metabolic network, large cycles occur less frequently than one would expect given several different possible null models (24).

Transmitting more than 1 bit at low copy number
We tested the ability of each of the 24 different circuits to reliably transduce input signals.
For each circuit, we numerically optimized Eq. 3 at different λ and γ.The results of a single optimization thus give us a local maximizer ϑ * (λ, γ) of L. For each numerically obtained ϑ * we then plot the corresponding mutual information I * (as calculated by Eq. 2) as a function of the actually observed average number of reporter molecules For example, in Fig. 2 we show the results of multiple maximizations for two typical circuits.Each point on the plot corresponds to a ϑ * (λ, γ).The blue squares and the red diamonds correspond to the two different γ values and the solid lines correspond to the "best" solutions which we determine by finding the convex hull of the set of all maxima.
Not surprisingly, as the λ constraint is weakened, and higher molecule numbers are allowed, more information is transduced on average (blue and red curves always increase monotonically), Similarly, as the γ constraint is increased, and the stiff solutions are constrained, less information is transduced (red curve is always less than or equal to blue curve).We report that all 24 topologies can pass more than 1 bit of information with molecule numbers far smaller than 100.In fact, at 25 molecules, most circuits can pass nearly 2 bits of information.In short, generic topologies under biological constraints of response time and molecule numbers can still transduce more information than a simple binary switch.

Determining optimal bounds
To determine how well the circuits performed compared to the optimal solution, we first note that all solutions are upper bounded by the entropy of the input distribution, which in our case is H(C) = 3 bits.Next, recall that the reporter protein, G, must be at least subject to its own intrinsic noise, and the variance of this noise must be at least that of a Poisson distribution (p(x) = exp(−λ)λ x /x!) with mean λ = g c (since the reporter does      not have any feedback) (45).Given this noise lower bound and a probability distribution over inputs C (in this case a uniform distribution of 8 delta functions), we can numerically calculate an optimal transduction curve.That is, we optimize where gc and we denote gc to emphasize that the means g c are optimally separated and are described by a Poisson distribution.For different values of λ, we can define an optimal curve Ĩ vs. N G , where Ĩ is the mutual information at the maximum of L. All 24 topologies are upper bounded by the resulting curve, since the reporter gene must be at least subject to its own g c noise plus any noise translated from upstream factors.Finally we note that Ĩ is always bounded by the channel capacity I 0 which is defined to be the maximum over all input distributions and can be approximated analytically as in Eq. 22 (see section 2.7).For N G = 25 molecules, I 0 = 2.32 bits and for N G = 100 molecules I 0 = 3.32 bits.

(Almost) optimal circuits
We find that all 24 circuits are able to achieve close to the optimal transmission fidelity, implying that they are able to tune the noise from the upstream factors to almost negligible values (see Figs. 2-4 and Table 3).To quantify how well the circuits perform compared to the optimal bound and to each other, we define the statistic where I is the linearly interpolated convex hull and a and b are set to 25 and 120 molecules, respectively.Note that for our discrete input distribution we can upper bound I ≤ 2.75 bits, where we use the linearly interpolated curve derived numerically Ĩ, as described in Sec.3.2; similarly, for any input distribution we can upper bound I ≤ 3.03 bits, where we use the analytic approximation I 0 derived in section 2.7.
Since the convex hull area can only grow with the number of optimizations we run, there is a bias in our calculated statistic I .That is, with k optimization runs, I k ≥ I k−1 .We are interested in I = I ∞ , but this is clearly unattainable.Moreover, for different topologies, I may be approached with different speeds as a function of k, making comparisons between topologies suspect.We use jackknifing to estimate the bias.That is, in the spirit of (60; 63), instead of the total number of optimization runs N opt , we use only N opt /m of them to calculate I .Then one can estimate I ∞ by fitting where A i are some constants.In the insets of Figs.2-4 we show the dependence of I on m, the inverse fraction of data included.We see that for the most part I(m) is well fit by a straight line, and contributions from the higher order corrections are insignificant.
The results of extrapolating I to m = 0 for each circuit are reported in Tables 4 and  5.The average I over all circuits is 2.48 ± 0.05 and 2.32 ± 0.09 bits for γ = 0.001 and γ = 0.01 respectively.We find the circuits are within 10% of the optimal transduction capacity.

Ranking circuits
The optimality measure I provides a ranking of the topologies (see Tables 4 and 5).While, strikingly, all of the circuits perform close to the optimal bound, systematic differences are revealed.Consider for example the linear chains with autoregulation (circuits 1, 19, 14, and 4 with negative feedback and circuits 21, 15, 16, and 11 with positive feedback).We note that the negative feedback circuits all have higher I values than their positive feedback counterparts.Moreover, the difference (gap) between the γ = 0.001 and γ = 0.01 curves is smaller for the negative feedback circuits than for the positive feedback circuits.That is, even when the stiffness is constrained, these circuits still do well, whereas the other circuits are more reliant on stiff dynamics.These results are consistent with findings in (3) that autorepressive circuits can help regulate noise.Interestingly, this trend can be generalized to the circuits with longer cycles as well.For example, we also find that for circuits with 2-cycles, those that perform best are those that have opposite regulations (one repressive, one activating) rather than two activating or two repressing regulators.For the case of 3-cycles, those circuits with 1 or 3 negative regulators have on average higher values of I .In Figs.2-4 we display curves for typical 1-, 2-, and 3-cycles, respectively with both odd (left column) and even (right column) number of negative regulators.These findings imply that there are some structural constraints that impart small, but measurable limitations to the circuit's transduction capacity.In particular, those circuits with an odd number of regulators (an overall negative feedback) in their cycles are generally ranked higher than those circuits with an even number of regulators (an overall positive feedback), see Tables 4 and 5.In Fig. 5 we show a bar graph of the values of I for the two classes of circuits (odd and even number of regulators in the cycle) for different γ values and for different length cycles.The average mutual information for the circuits with an odd number of negative regulators is 2.51 ± 0.03 and 2.39 ± 0.05 for the two γ values, whereas for the circuits with an even number of negative regulators it is 2.44 ± .03 and 2.26 ± .05 for the two γ values.Between the two classes, these values are more than one standard deviation apart.To test the significance of this observation, we perform the non-parametric Mann-Whitney U (Wilcoxon) Test (40; 72), which measures the difference in medians between two samples.We find for γ = 0.001, U = 8 and p = 0.0002, and for γ = 0.01, U = 10 and p = 0.0003.That is, the null hypothesis that the optimality measures for the two classes of circuits (odd and even number of regulators, or, alternatively, overall negative and positive feedback) are drawn from the same distribution and, therefore, have the same medians, is highly unlikely.

Noise analysis
Since circuits containing cycles with an odd number of negative regulators are better signal transducers, we might expect that they are able to control the noise variance better.In fact, using LNA (cf.Sec.2.4), we prove this assertion for a generic transcriptional network in Sec.2.8.Furthermore, for simple networks we demonstrate that the overall negative feedback is a necessary and, in one case, even a sufficient condition to achieve sub-Poisson noise (variance less than the mean).
For example, let dφ i /dt = f i ≡ −r i φ i + α i (φ π i ) describe the deterministic dynamics of gene i (see Sec. 2.2 for explanation of the notation).In a steady-state φi = α i (φ π i )/r i .Then, for a 1-cycle where π i ≡ i, Eq. 25 for a species variance reduces to where α ′ is the derivative of the gene expression function, and the above is evaluated at the deterministic steady state.In the case of an auto-repression, α ′ < 0, and the variance C ii is less than the mean (3; 18).
Similarly, Eq. 25 can be reduced for a 2-cycle, i, j = {1, 2} Since C ii > 0, here a necessary (but not sufficient) condition for sub-Poisson noise is This analysis (as well as the derivation in Sec.2.8) also illustrates that it is easier to obtain smaller variance (and hence larger mutual information), for cycles of shorter length.This is in agreement with (24) where it was found that short cycles are over-represented in a metabolic network, but large cycles occurred less frequently than one would expect given several different possible null models.

Reliance on large q
The difference (gap) between the two γ curves also suggests a statistic to compare the circuits.Presumably, a large difference implies that the circuit relies on large stiffness q to regulate noise.Indeed, for large q , the objective function L decreases, though this decrease is moderated by the value of γ such that smaller γ values allow larger q values.Stiff solutions have the advantage of allowing the reporter protein to effectively act as a low-pass filter, slowly sampling and responding to fluctuations in the circuit components.
Figure 5: Bar graphs for I values for the two classes of circuits: odd (blue) includes circuits with cycles containing an odd number of regulators and even (green) includes circuits with cycles containing an even number of regulators.Top γ 1 = 0.001, middle γ 2 = 0.01, and bottom γ 1 − γ 2 .For all 3 measures, there is a statistically significant difference between the two classes of circuits as calculated by the U Test (top p = 0.0002, middle p = 0.0003 and bottom p = 0.01).
A reliance on small values of γ implies that the circuit has more difficulty regulating noise.We therefore expect the circuits with an odd number of negative regulators to have smaller gaps.Consistent with this prediction, while the average gap over all circuits was 0.16 ± 0.05, the average gap for the negative feedback circuits was 0.13 ± 0.04, and for the rest it was 0.18 ± 0.05 (see Fig. 5).The U Test using the gap measure gives U = 28 and p = 0.01, indicating a statistically significant difference.
Evidence from a database of transcription factors in prokaryotes supports the finding that circuits with negative feedback can suppress noise (54).In E. coli, many transcription factors do not undergo active degradation via proteolysis, but are instead only passively degraded via dilution.The half-lives of such proteins are on the order of the lifetime of the cell, allowing them to respond only slowly to fluctuations in the mRNA concentrations.As in the case of the stiff solutions with large q in our circuits, these slowly responding transcription factors have an advantage in noise control (47).Therefore we might expect that transcription factors that do not undergo proteolysis will have no auto-repression, or even positive auto-regulation.On the other hand, transcription factors that do undergo proteolysis and cannot, therefore, filter mRNA fluctuations as well would be more likely to require negative auto-regulation.To test this hypothesis, we analyzed 145 transcription factors of the E. coli regulatory network (see supplemental material in ( 73)).For each transcription factor we correlated whether the factor is auto-repressive (54) with whether it potentially undergoes proteolysis (by noting if the peptide sequence had any known cleavage sites) (50).We found that of the 13 transcription factors which are likely to undergo proteolysis, 9 are negative auto-repressors, and out of the 132 transcription factors which do not, 88 are not auto-repressors (see Table 1).A Fisher exact probability test revealed a statistically significant positive association between proteolysis and negative feedback (p = 0.01).

Negative Feedback No Negative Feedback Proteolysis 9 4
No Proteolysis 44 88 Table 1: Table comparing presence or absence of proteolysis to presence or absence of negative auto-regulation.Fisher exact probability test reveals signficant (p = 0.01) positive association.This supports our prediction that transcription factors which undergo proteolysis, and therefore have faster response times, are less able to regulate noise using the slow response, filtering solution and require the presence of negative auto-regulation to help control their noise.The five labeled peaks correspond to 5 distinct behaviors or unique signal encodings (cf.Fig. 7 and Table 2).

Robust, adaptive maxima
An important consideration in further assessing the quality of our circuits is the extent to which these high information maxima are robust to perturbations in the system.Qualitatively, we define a maximum as robust if, in its vicinity, the cost function L does not change significantly in response to perturbation of the parameters R, K, a, a 0 , and s (see Sec. 2.2 for parameter definitions).Relatedly, we would also like to consider the ability of our circuits to adapt their behavior to such perturbations.An adaptive maximum does not significantly decrease its function value L in response to parameter perturbation, but does alter its behavior.Here we characterize the behavior of the circuit at each constrained mutual information maximum by ordering g c .That is, two maxima have different behaviors if the permutation yielding the sorted sequence of g c is different.As a preliminary investigation, we analyzed the functional L of circuit 2 near one of its randomly selected maximum.In addition to the original maximum, we found four other distinct nearby peaks as displayed in Fig. 6.The circuit alters its behavior as a result of changes along the 2 displayed dimensions (cf.Eq. 6), input 1 (s X ) and input 2 (s Y ), so that at each maximum the ordering of responses is distinct, and the signal is encoded in a different way (see Table 2).Note that 4 of the maxima are separated by valleys no deeper than 2.3 bits.In other words, by a change in s X and s Y only, the circuit can alter its behavior, while maintaining a high transmission fidelity.In this sense, we consider these maxima to be adaptive.
We next numerically calculated the Hessian at each of the 5 peaks.In Fig. 7, we plot  the Hessian spectra along with the corresponding eigenvectors.By treating L as locally quadratic near each maximum we use the Hessian (evaluated with respect to log 10 of the parameters) to analyze how sensitive the maximum is to deviations in the parameters.For example, for an eigenvalue equal to −1, moving 10-fold in the corresponding eigendirection would result in a loss of 0.5 for the objective function.Alternatively, an eigenvalue equal to −0.1 means that we can move 10-fold in that direction while decreasing the objective function only by 0.05.This should be compared with the typical values near maxima of L, I ∼ 2 bits.We find that for most directions for all 5 peaks eigenvalues are less than −0.1.In this sense, we consider these maxima to be robust.We can identify three different regimes for the spectra: an extremely "soft" regime corresponding to the first two modes, a second soft regime, where modes 3 to 9 are basically equivalent, and a third regime (modes 10 through 15), where the eigenvalues become more negative.We note that the spectra for peaks 1 and 5 overlap almost entirely, as do the ones for 2, 3 and 4, and that the latter appear to be more robust (eigenvalues are closer to 0).Interestingly, all five spectra in Fig. 7 are similar, due to the fact that the ϑ * are themselves quite similar -that is, the maxima are closely arranged not just in the 2dimensions displayed in Fig. 6, but over all 15 dimensions.This underscores the circuit's adaptability.
In Fig. 7, we have also displayed the contributions from each parameter to each eigenvector for all 5 peaks.It is clear that the first mode corresponds entirely to the leak parameter, which for all 5 peaks is being driven to 0 as the optimization proceeds.The second mode is also consistent for all 5 peaks, and it corresponds to the parameters a Y and K Y (cf.Sec.6), governing creation for the transcription factor Y .Essentially the range of the gene activation function, a Y , is increased while the Michaelis constant K Y is decreased, so that Y is squeezed to low copy numbers.This is a reasonable strategy since maximizing the information in the output requires that most of the energy spent on building molecules is expended on the reporter protein.
Another consistent finding for all of the peaks is that inputs 1 (s X ) and 3 (s Z ) are more critical (less robust) than input 2 (s Y ).That is, in general, s X and s Z contribute to the highest 3 modes, whereas s Y contributes mostly to modes in the "soft" regime.Qualitatively, this is consistent with Fig. 6, where most of the peaks are ellipses with major axes roughly in line with s Y .Note that this is most evident for peaks 1 and 5 for which s X and s Y correspond almost exclusively to modes 14 and 4, respectively.

Discussion
We have presented an information-theoretic, function-independent measure of circuit quality.We have demonstrated that generic, small networks can transduce more information than a simple binary switch; moreover, such generic topologies can achieve close to optimal transmission fidelity, even under low copy number and fast response time (non-stiff) constraints.High information solutions can be robust to 10-fold deviations in parameters.That such simple, stochastic systems can act as high-fidelity signal transducers suggests a possible explanation for the cross-talk dilemma, in which multiple ligands trigger the same signaling pathway, and yet reliably produce distinct genetic outputs.We have demonstrated that multiple discrete input states can be transduced by the same signaling pathway and even the same molecule if the encoding is in molecule numbers.Moreover, when the trivial solutions to this problem are constrained (high copy number and slow response time) the input state can still be transduced reliably, implying that these simple circuits have enough flexibility to regulate noise.To our knowledge, this is the first demonstration that a simple, stochastic system can overcome cross-talk, without invoking the traditional sequestering argument (57) (where signal specificity is achieved by spatially or temporally sequestering pathways with shared components).
It may be possible to correlate some of the solutions of the circuits with experimental data to investigate to what extent is transduction optimality an essential goal of these systems.For example, a common generic solution of our circuits was to decrease the decay rate and increase the average molecule number of the reporter protein or the proteins that appear near the end of the circuit.The slower decay rate meant the reporter protein could temporally average the fluctuations of the proteins at the beginning of the circuit.Similarly, since the total number of molecules is constrained, it is best to expend energy building reporter molecules which need to encode the entire input signal, rather than wasting molecules on proteins in the beginning of the circuit.One well-known example of this is in the transcription-translation cascade from DNA to protein.Typically, mRNA degradation rates are faster than protein degaradation rates, and mRNA molecule numbers are smaller than protein molecule numbers.
Other more subtle correlations may also be identified.Proteins that do not undergo proteolysis undergo decay by dilution; the characteristic time scale for these proteins is on the order of an hour.This implies that one can have noise filtering due to mRNA-protein scale mismatch without requiring negative feedback.One would expect to find a positive association between proteolysis and negative feedback.Consistent with this prediction, a Fisher exact probability test revealed a significant positive association (p = 0.01) based on transcription factors from the E. coli gene regulatory network.Rosenfeld et al. (51) have argued that an auto-repressive circuit averts the need for proteolysis, since the negative auto-regulation shortens the rise time of the circuit.Closer analysis of transcription factor databases for other organisms may help distinguish between these two putative roles of auto-repression.Interestingly, in prokaryotes the percent of transcription factors undergoing proteolysis is less than in eukaryotes, and the percent of transcription factors which are auto-repressive is less than in eukaryotes.
In their analysis of the phototransduction cascade, Detwiler et al. (10) emphasize that the signal processing characteristics of the cascade can be tuned simply by altering the concentrations of proteins, rather than changing the genetic sequence.That is, the parameters of the system can be optimized on a time scale far shorter than evolution.So too, in our simple circuits, all of the kinetic constants can be regarded as functions of concentrations of proteins extrinsic to the circuit, meaning the parameters may also be tuned, even independently, on a time scale shorter than the response time of the system.We highlight that circuits supporting multiple, distinct maxima should be able to flip between behaviors in response to different stimuli, and that theoretically such adaptation can be as rapid as changes in protein concentration.Importantly, based on our findings, such adaptation can be smooth and still occur without significant loss in transduction capacity.
The fact that the 5 peaks we analyzed collapsed onto two categories of spectra underscores a somewhat paradoxical finding.Namely, the maxima are robust in that they can withstand 10-fold perturbations in most of their parameters without significant loss in transmission fidelity, and yet they are adaptive in that the circuit can flip between the different maxima (and different behaviors), again without significant loss in transmission fidelity.Intuitively, one might expect a tradeoff between robustness and adaptability.Our findings suggest that the circuits can avert this tradeoff by clustering the maxima in a general region of high transmission fidelity.Certainly a closer and more quantitative analysis of this tradeoff is warranted.For example, it is now well-established that a single circuit can support multiple functions (69).In this vein, one interesting research direction would be to enumerate the functions that a particular circuit can achieve and quantify how easily the circuit can flip between these functions.Whereas our circuits can all be regarded as "optimal" in the sense that they can tune their parameters to transduce the optimal amount of input information, it is evident that subtle distinctions in information processing exist among them.Our setup is well-suited to systematically explore these distinctions (e.g., varying the input distribution, quantifying the mutual information between time-varying input and output signals, and quantifying other statistics of the mutual information landscape rather than optimality).).Extrapolated average mutual information over range of 25 to 120 molecules at γ = 0.001 and γ = 0.01.

Figure 2 :
Figure 2: We ran multiple optimizations ϑ * = argmax ϑ L. For each optimization run, we plot the mutual information I * = I(C, G|ϑ * ) vs. the mean number of molecules of the reporter protein N G .Input distribution p(c) = 1/||C|| and ||C|| = 8 so thatI(C, G) ≤ H(C) = 3 bits.Blue squares and red triangles are for γ = 0.001 and γ = 0.01, respectively.The blue and red linearly interpolated lines correspond to the convex hull for each respective γ value.The black solid curve gives the numerically evaluated optimal bound (cf.Sec.3.3) and dashed curve gives analytic bound for any input distribution (cf.Sec.2.7).Inset: I as a function of a fraction of data m included in the analysis (cf.Sec.3.3).Blue and red correspond to two different γ values.Linear regression extrapolated to case of infinite data (y-intercept).Results for two typical circuits with 1-cycles: (a) Circuit 19 with odd number of negative regulators in cycle and (b) Circuit 11 with even number of negative regulators in cycle.Note here as in Fig.3and Fig.4circuits on left have larger I values as well as smaller differences between the two γ values than circuits on right.

Figure 3 :
Figure 3: Same as in Fig. 2 for two circuits with 2 cycles: (a) Circuit 23 with odd number of negative regulators in cycle and (b) Circuit 5 with even number of negative regulators in cycle.

Figure 4 :
Figure 4: Same as in Fig. 2 for two circuits with 3 cycles: (a) Circuit 13 with odd number of negative regulators in cycle and (b) Circuit 17 with even number of negative regulators in cycle.

Figure 6 :
Figure 6: (a) The objective function L and (b) the mutual information I as a function of the first two input parameters (input 1 is s X and input 2 is s Y ) for circuit 2. The rest of the parameters are held constant for this figure.The five labeled peaks correspond to 5 distinct behaviors or unique signal encodings (cf.Fig. 7 and Table2).

Figure 7 :
Figure7: Top-left: Spectra for the numerically calculated Hessian at each of the corresponding 5 peaks labeled in Fig.6.Soft modes (→ 0) are directions in which L has small curvature, hard modes (→ −∞) are directions in which L has large curvature.Many eigendirections exhibit small curvature (greater than −10 −2 for peaks 2-4 and −10 −1 for peaks 1 and 5), demonstrating that the maxima are robust to large deviations in parameter space.Colored panels: Magnitude of contribution from each parameter to each eigenvector for each of the 5 Hessians.Mode index is sorted as in top-left figure (from least curvature to greatest curvature).Row labeled leak corresponds to parameter a 0 .Paired rows labeled X, Y , Z, and G correspond to the two parameters, K and a, describing the gene regulation function for each transcription factor (X, Y , Z) and reporter protein (G).Rows labeled r correspond to the decay rates of each of the 3 transcription factors.Rows labeled s correspond to the input parameters modulating the three transcription factors.For all 5 peaks, the two most soft modes correspond to a 0 and a mixture of K Y , a Y , respectively.s X and s Z contribute mostly to the hard modes.

Table 3 :
Table of mutual information I as a function of mean reporter copy number N G .Insets: I as a function of inverse data m (cf.3.3).Note that circuits 5, 11, 13, 17, 19, 23 are shown in Figs. 2

Table 2 :
Table of behaviors corresponding to the five peaks shown in Fig.6.Behavior is defined as the ordering of g c , where g c is the deterministic steady-state solution for given chemical input c and c ∈ {000, 001, 010, 011, 100, 101, 110, 111}.Each row describes the behavior of the circuit at one of the five maxima.

Table 4 :
Table of Circuits (top 12).Extrapolated average mutual information over range of 25 to 120 molecules at γ = 0.001 and γ = 0.01.40

Table 5 :
Table of Circuits (bottom 12