The Impact of Different Sources of Fluctuations on Mutual Information in Biochemical Networks

Stochastic fluctuations in signaling and gene expression limit the ability of cells to sense the state of their environment, transfer this information along cellular pathways, and respond to it with high precision. Mutual information is now often used to quantify the fidelity with which information is transmitted along a cellular pathway. Mutual information calculations from experimental data have mostly generated low values, suggesting that cells might have relatively low signal transmission fidelity. In this work, we demonstrate that mutual information calculations might be artificially lowered by cell-to-cell variability in both initial conditions and slowly fluctuating global factors across the population. We carry out our analysis computationally using a simple signaling pathway and demonstrate that in the presence of slow global fluctuations, every cell might have its own high information transmission capacity but that population averaging underestimates this value. We also construct a simple synthetic transcriptional network and demonstrate using experimental measurements coupled to computational modeling that its operation is dominated by slow global variability, and hence that its mutual information is underestimated by a population averaged calculation.


Introduction
To survive in challenging conditions, cells need to detect, transduce, and process signals from their environment. A cell's ability to precisely process environmental signals is limited by intrinsic fluctuations and variability of its cellular processes. This variability takes root in the stochastic nature of biochemical reactions. For a given pathway, this includes the stochastic steps involved in transcription and translation [1][2][3][4] as well as diffusion-reactions, dissociations, allosteric changes, and degradation of biological molecules. A signal propagates across cellular networks through molecules undergoing these various reactions, and gets distorted and altered by their probabilistic nature. Therefore, metrics for quantifying the limits of faithful information propagation (signaling fidelity) in biological pathways are crucial for understanding their information processing and transduction capabilities.
Mutual information [5] (MI) is a natural metric for characterizing information transmission between the inputs of a stochastic network and its nodes. MI quantifies the level of precision with which a given node(s) in a network estimates and responds to an input(s) by accounting for both the mean and variability in the response. Recent studies have used MI to characterize information transmission between environmental inputs and transcription factors in a number of genetic circuits [6][7][8][9][10]. In these studies, steady-state MI was computed for a variety of in silico networks to assess their stationary response as a function of input dose. More recently, these ideas were extended to optimize time-dependent MI in delay circuits with binary inputs, and MI was used to discuss maximally informative network topologies in these contexts [11]. In addition, time dependent MI calculations were used to obtain fundamental limits on the suppression of molecular fluctuations for different network topologies [12].
Several experimental studies have also used MI to assess signaling fidelity. MI was used as a metric to argue that negative feedback enables dose-response alignment and enhances information transmission in the pheromone pathway in yeast [13]. Similarly, MI was used to estimate time-dependent information transfer in tumor necrosis factor (TNF) signaling, and to assess transmission bottlenecks in this system [14]. Recently, robustness and compensation of information transmission in different pathways and pharmacological perturbations were attempted in PC12 cells using similar measurements [15]. These experimental studies relied on driving isogenic cell populations with various inputs, and then calculating the mutual information based on the overall variability in the population response. Such calculations mostly found low MI values, suggesting that cellular pathways might have on average low information transmission capacity. In this work, we argue that these calculations often under-estimate MI of a pathway in a single cell, since they do not account for 1) variability in initial conditions and 2) variability that is extrinsic to the pathway. The overall effect of these two sources of variability is that any single cell has a quantitatively distinct input-output relationship [3,4,16] and that calculations that take this into account are needed for more accurate estimation of MI from experimental data. By assuming that extrinsic variability manifests as cell-to-cell differences in a global parameter, such as translation capacity, we demonstrate in a simple in silico circuit that mixing cells with different parameters sets (and/ or different initial conditions) reduces the value of the computed MI. We also argue this point experimentally by building a simple synthetic circuit that exhibits strong extrinsic variability, and then demonstrating with the help of computational modeling that single cells within the population have a larger mutual information than that exhibited by the averaged population. These results indicate that cells might possess higher capacity for information transmission than previously appreciated.

Results
To compute mutual information in a given biological network, we apply simple step functions [14] of the appropriate environmental input to N populations of the same isogenic cells. The step function is mathematically defined as x(t) = 0 for t < 0 and x(t) = X + for t ! 0, where X + is a constant within a given population. For each of the N populations, X + is sampled from a discrete uniform distribution, p u (x + ), over the range of interest (0 to X max ). The uniform distribution represents an unbiased distribution (other than the choice of X max ) that has been routinely applied to steady-state mutual information calculations [6]. Experimentally, one can implement this scheme by growing replicas of the same culture in an N-well plate and stimulating each well with a different step function as defined above (Fig 1a). For a given population n and sampled input amplitude X + (n), the stochastic time-dependent response of measurable proteins y = [y 1 . . . y m ] of the population at t will be p(y, tjX + (n)). For a general x + between 0 and X max , we interpolate the central moments of adjacent sampled distributions to construct p(y, tjx + ). The time-dependent mutual information is then given by IðxðtÞ; y; tÞ ¼ X xðtÞ X y pðy; tjxðtÞÞpðxðtÞÞ log 2 pðy; tjxðtÞÞ pðy; tÞ The value of N (that is the number of experiments) can be chosen based on well established methods [14] to approximate the MI in Eq (1) (see Materials and Methods for further details). We first illustrate MI calculations using an in silico model of a simple signaling cascade (Fig  1b). Here the input X n (t) causes the transformation of the inactive molecule Y Ã 1 to its active form Y 1 . Y 1 could be a receptor or transcription factor responsive to the given input. Y 1 in turn activates transcription of Y 2 , whose mRNA is denoted as M y 2 (chemical equations are detailed in Materials and Methods with the parameter values listed in Table 1). The uniform input distribution is between 0 nM and X max = 250 nM. We found that for this system and its corresponding parameters, N ! 20 was a conservatively large number of experiments to approximate MI.

Contribution of initial condition variability to time-dependent mutual information
We first assumed that this circuit is isolated from the rest of the cell, and that any stochasticity it exhibits is only the result of its chemical reactions (intrinsic variability). When this system is unstimulated (t 0), its molecular species assume a joint steady-state distribution, pðy Ã 1 ; y 1 ; M y 2 ; y 2 ; t 0Þ. As an example, we show the marginal distribution of Y Ã 1 in Fig 1c. This distribution represents the range of initial conditions in Y Ã 1 that a population containing this network would exhibit before any input is applied.
We will first compute the MI of the network while ignoring this initial distribution of states, assuming that all cells in the population start from the same initial condition (for state S1, this is the mean of the initial joint distribution, see Fig 1c) which we refer to as a homogeneous initial condition. This could be thought about as the mutual information of one cell in that population. We plot the time-dependent mutual information between the input and the different species of the circuit: I(x + ; y 1 , tjS1), I(x + ; M y 2 , tjS1), I(x + ; y 2 , tjS1) (Fig 1d). The MI from the input to y 1 , I(x + ; y 1 , tjS1), has rapid dynamics, peaking initially and decaying with time to a Time-dependent mutual information transmission in a simple biochemical circuit. a) Every well contains the same type of isogenic cell population where each well receives a different step input sampled from a uniform distribution. The resulting time series data is used to compute timedependent mutual information. b) Schematic of pathway. c) Initial distribution of Y Ã 1 before any input is given, reflecting cell-cell variability in initial conditions. In this case, a single parameter set is used and the distribution is the result of intrinsic variability of the circuit. S1 denotes the mean of the distribution. d) I(x + , y 1 , tjS1), I(x + , M y 2 , tjS1), and I(x + , y 2 , tjS1). e) Time-dependent dose-response relationships at t = 0, 75 and t = 750 minutes between y 2 and the input. The vertical black dashed line is where x + = 150 nM. f) Mutual information between y 2 and the input computed for heterogenous initial conditions (full distribution in panel c, black) and for homogeneous initial conditions (S1, red). Inset: same data but with time plotted to 2750 minutes and the MI plotted between 2.5 to 3 bits. steady-state. The initial peak in this MI is due solely to the activation and inactivation of y 1 , while the subsequent decrease to steady-state is due to the fluctuations in the synthesis and degradation of y 1 . By contrast, the MI from the input to M y 2 (I(x + ; M y 2 , tjS1)) is slower and on the order of tens of minutes, while that of the protein y 2 (I(x + ; y 2 , tjS1)) is on the order of hours. This is not unexpected, as the mutual information signals for each species follow the causality of the circuit where y 2 shows the largest delay.
The increase of I(x + ; y 2 , tjS1) as a function of time has an intuitive explanation in terms of y 2 dynamics. To visualize this, we plot y 2 ðn; tÞ, the mean of y 2 as well as y 2 ðn; tÞ AE s y 2 ðn; tÞ versus X + (n) for t = 0, 75, and 750 minutes (Fig 1e, red lines), where σ y 2 (n, t) is the standard deviation in y 2 . We will refer to these plots as the time-dependent dose response relationships. For t = 750, more values of x + are resolvable from measurement of y 2 than at time t = 75. For example, for x + > 150, steps in the dose response curves constrained between the standard deviations ( Fig  1e, black lines for t = 75 and 750 minutes) approximate how well a measurement in y 2 can infer the value of x + . At time 750, about 2 steps are resolvable allowing for two distinct ranges of x + to be inferred. While for t = 75 minutes, only one distinct range of x + is inferrable. The larger the number of resolved states, the higher the value of the mutual information.
When mutual information is calculated between the input and a given node over the entire time duration of the signals, the mutual information between the input and each successive node has an upper bound equal to that of the prior node. This is known as the data processing inequality [5]. However, since we are evaluating the time-dependent MI at a given time t, the instantaneous value y 2 can have more information about the input than y 1 . Indeed, at t approximately greater than 150 minutes, we find that the MI I(x + ; y 2 , tjS1) is greater than I(x + ; y 1 , tjS1) or I(x + ; M y 2 , tjS1) (Fig 1d).This is because for the particular parameter set used in this example, the noise propagated from y 1 onto y 2 is averaged out, and the only variability in y 2 stems from its own production and degradation. As a result, I(x + ; y 2 , tjS1) can be modulated to be higher or lower than I(x + ; y 1 , tjS1) by changing the rates of y 2 production and degradation [17]. On the other hand, increasing the number of y 1 molecules would increase its mutual information as this would reduce the noise in the y 1 signal. Therefore, the mutual information at each node of this pathway can be modulated through choice of kinetic parameters. Similar observations that filtering can improve time-dependent MI between success nodes have been discussed in the context of other types of pathways [18].
Next, we examined mutual information while accounting for the fact that cells assume a distribution of initial states across the population upon receiving the input stimulus. We do so by incorporating the pre-stimulus steady state initial joint distribution into the MI calculations. This variability in initial conditions transiently reduces the MI (Fig 1f). At steady-state, the mutual information curves computed for a single or a distribution of initial states eventually converge onto each other at approximately t = 2750 minutes (Fig 1f, inset). This convergence at longer times occurs because a population in which every cell assumes the same exact initial conditions will eventually produce a heterogeneous distribution of states due to the intrinsic Table 1. Parameters for circuit in numerical example.
.01 .0000067 .001 1.5 3 85 stochasticity of the biochemical reactions. For the values of parameters used in this example, the convergence of the two MI curves proceeds very slowly. Therefore, even when only intrinsic fluctuations are present, with no extrinsic contributions to variability, and for a given distribution of initial conditions, a single cell still transiently assumes, on average, a higher time-dependent mutual information than the whole population. In our case, this difference is very modest.

Time-dependent mutual information transmission with global parameter variation
Thus far, in our MI calculations, we have only accounted for variability in initial conditions given a single parameter set for the pathway. More realistically, any given pathway in a cell is subjected to variability through coupling to other cellular activities. This is known as extrinsic noise to distinguish it from intrinsic noise generated by the pathway itself. There are many extrinsic sources of variability that cellular pathways experience. For example, different cells may contain different numbers of polymerases or ribosomes, and hence have different capacities for transcription and translation [3]. This extrinsic variability can be accounted for in many ways, the simplest is to assume that the transcription or translation rate constants themselves can assume different values in different cells across the population.
To demonstrate the contribution of extrinsic variability to MI calculations, we consider a simple case where cells in the population have different translation rates. To do so, we add a stochastic global variable, G, which affects the protein creation rates such thatb y Ã and β y 2 are the nominal values for the parameters used above. In this way, the protein creation rates keep their mean value, but fluctuate because of their coupling to G. For this example, G follows a memoryless birth/death process such that the mean of G is G ¼ b g =g g (β g , γ g are the birth and death rates). It follows that G has a coefficient of variation given by First, setting G ¼ 50, we chose β g = 1.5 × 10 −6 mol-s −1 and γ g = 3 × 10 −8 s −1 . These values establish a stationary distribution of states, which we use as an initial distribution for the MI calculations. Fluctuations in the translation rate induce extra variability in the pathway components (compare the initial distributions of Y Ã 1 in Fig 2a to Fig 1c). As a result, mutual information calculations with this added extrinsic variability (and using the population distribution of initial conditions) show that I(x + ; y 2 , t) is now drastically reduced compared to the case when a single parameter set is used to represent the lack of global variability (compare black line in Fig  2b with value in Fig 1d). Here also, as expected, MI calculations from a single initial state corre- , generate high transient values (red (S1), blue (S2) and green (S3) curves in Fig 2b). This discrepancy between single cell and population MI is further highlighted by examining the time-dependent dose response relationship between y 2 and x + at t = 750 (Fig 2d (full population) and Fig 2c (S1, S2, and S3). Again, the sub-populations generated from S1, S2, S3 each have little variability (high mutual information) relative to the full population.
While constraining G ¼ 50, we investigated the time-dependent MI for different values of β g and γ g . Our original choice of γ g = 3 × 10 −8 forces G, and hence the translation ratesb y Ã 1 and b y 2 , to fluctuate very slowly. Therefore, the convergence of the MI values computed from a single initial condition versus the full distribution also proceeds slowly. As γ g increases, this convergence proceeds faster (Fig 2e). Therefore, these results indicate that the mutual information of a pathway can be severely underestimated by population-based measurements if the Global variability has large impact on mutual information. a) Initial distribution of Y Ã 1 before any input is given, reflecting cell-cell variability in initial conditions. In this case, the protein synthesis parameter is stochastic, reflecting a globally varying source of noise. S1 denotes the mean of the distribution, while S2 and S3 denote cells that are one standard deviation away from the mean. b) Time-dependent mutual information computed for heterogeneous initial pathway is subjected to global fluctuations that proceed on a slower timescale than the pathway itself.
Probing the mutual information of a simple synthetic circuit Next, we sought to probe the major determinants of mutual information for a simple synthetic transcriptional circuit (Fig 3a). In this circuit, a constitutively expressed transcription factor Y Ã 1 interacts with a small molecule X + , leading to the activation of the transcription factor. The active transcription factor Y 1 translocates into the nucleus and activates expression of a gene Y 2 .
In our implementation, Y Ã 1 is an estradiol (input X + ) responsive chimeric transcription regulator (TR) consisting of three fused elements: an activation domain (from MSN2), a lipid-binding domain (from the human estradiol receptor, hER-LBD), and a DNA binding domain (from GAL4). When estradiol binds to the LBD, the activated TR Y 1 translocates to the nucleus and controls the expression of promoters containing Gal4-binding-sites. Therefore, the protein Y 2 (in this case a fluorescent protein) is produced from a Gal4-responsive GAL10 promoter (See Materials and Methods for more details). At the same time, Y Ã 1 is produced from an altered version of the promoter of the alcohol dehydrogenase 1 gene (ADH1). We constructed two strains for measurement purposes. Strain 1 contains the circuit in addition to two copies of the GAL10 promoter, one driving YFP and the other driving mCherry. Strain 2 contains the circuit, but this time with two copies of the ADH1 promoter, one driving the production of Y Ã 1 and the other driving the production of YFP (which we will refer to as Y 1r ). The same strain also contains a GAL10 promoter driving the production of the mCherry (Y 2 ) protein. These strains were useful for two reasons. First, we wanted to establish how mutual information computations depend on the ability to simultaneously measure different quantities in a circuit (e.g. Y 1 and Y 2 versus Y 2 alone). Given that this necessitates the use of two fluorescent proteins, in this case YFP and mCherry, we wanted to ascertain that the results we obtain are qualitatively independent of the choice of fluorophores, given that mCherry has lower dynamic range than YFP with higher background fluorescence and hence increased noise at low concentrations.
For each strain, we subjected 12 exponentially growing populations (wells) of cells cultured in non-repressive media to input concentrations of estradiol (x + ) log-sampled between 0 and 100 nM. The 12 measurement points sufficiently sampled the dose response relationships. The number of cells measured from each well was greater than 3000, ensuring good statistics for approximating the MI [14]. All cultures were started from zero estradiol concentrations. Samples were taken at t = 0, 65, 165, 330 and 580 minutes. Fig 3b shows the time-dependent doseresponse relationships of estradiol versus y 2 (in this case YFP, strain 1) for these timepoints, where fluorescence values were normalized with respect to side scatter in order to minimize the effects of cellular volume and shape dependent differences.
The dose-response relationships of y 2 normalized by their respective maximum mean values (Fig 3d) exhibit an interesting trend: for the last 3 time points, the traces for the mean and variability are very similar to each other. The only outlier to this trend is the time point at t = 65 minutes after stimulation (Fig 3d). For this timepoint, fluorescence is weak and strongly conditions (the whole distribution in panel a, black), or homogeneous initial conditions (S1: red, S2: blue, and S3: green) c) Time-dependent dose-response relationships between y 2 and input at t = 750 minutes for a population starting from homogeneous initial conditions corresponding to S 1 , S 2 , and S 3 . d) Timedependent dose-response relationship between y 2 and input for heterogeneous initial conditions corresponding to full distribution in panel a. Calculations for panels a-d correspond to a slowly fluctuating global variable with γ g = 3 × 10 −8 . e) Mutual information for different timescales of the fluctuations in the global variable.
doi:10.1371/journal.pcbi.1004462.g002 'slow' global fluctuations experimental data overlaps with autofluorescence and folding delays, and therefore the true signal cannot be accurately estimated. Autofluorescence and folding delay also contributes, albeit less dramatically, to the measurement at the t = 165 minutes timepoint (Fig 3d). The mCherry measurements (strain 1 or strain 2) generated the same trend (S1a and S1b Fig) albeit with a noisier outcome than YFP due to the limited dynamic range of mCherry. As a consequence, the y 2 (YFP) and y 1r (YFP) experimental measurements from the two strains can be used in combination for comparison of modeling with data. The fact that variability in the y 2 data irrespective of the fluorescent protein does not decrease with increasing mean values suggests that dominant fluctuations are unlikely to be intrinsic to the pathway.
Fig 3c plots measurements of y 1r (YFP, strain 2). Unexpectedly, despite the common assumption that the ADH1 promoter has constitutive and constant expression, we found that it exhibits a modest dependence on estradiol. We do not know the root of this dependence, but it is likely to reflect the influence of the circuit itself on the metabolic state of the cell, hence affecting ADH1 promoter activity. Overall, the growth rate of these strains is independent of estradiol for concentrations under 100 nM over the duration of the experiment (S1c and S1d Fig), and therefore this effect can be compensated for in the mutual information calculations. It is worth noting here that we are making the assumption that despite the fact that YFP (Y 1r ) and Y Ã 1 are different proteins sharing only the same transcription rate (since both are driven by the ADH1 promoter), they share the same dominant noise characteristics. This would be the case if their intrinsic noise, which can be different, is insignificant compared to a dominant source of extrinsic noise affecting both. Next, we present data and modeling demonstrating that, indeed, noise in both Y 1 and Y 2 is most likely dominated by the same extrinsic global component.
Since the measured distributions are approximately gaussian for the majority of estradiol concentrations (S2 Fig) and the synthetic circuit (Fig 3a) follows the same basic chemical equations as the simple pathway we have studied in Fig 1b, we used this already established model to computationally explore different noise scenarios (see Materials and Methods for a more technical justification of the model). Specifically, we simulated the model (parameter values listed in Table 2) with both intrinsic variability and added global extrinsic parameter variability as sources of stochasticity. The data we collected are in fluorescent units, therefore we set our model to arbitrarily yield maximum y 2 protein expression levels of about 2500 molecules, likely b-g) Dose-response relationships for t = 65 (blue), 165 (green), 330 (red), and 580 (black) minutes. Insets in d and g represent the ratio of the noise at time t to that at time t = 580 where for green: t = 165, red: t = 330, and black: t = 580. Noise is defined as standard deviation over the mean. b) Experimental y 2 data. c) Experimental y 1r data. d) Normalized experimental y 2 data. Each curve is normalized by its maximum mean value in panel b. e) y 2 data generated by a computational model of the circuit with slow fluctuations in protein synthesis rates. f) y 1r data generated by a computational model of the circuit with slow fluctuations in protein synthesis rates. g) Normalized y 2 data generated by a computational model of the circuit with slow fluctuations in protein synthesis rates. Each curve is normalized by its maximum mean value in panel e. doi:10.1371/journal.pcbi.1004462.g003 Table 2. Parameters for model of synthetic circuit (global model).
parameter: an underestimation of the actual system. However, this choice constitutes a scaling factor and does not affect any of our results. We also accounted for the estradiol dependence of ADH1 (Fig 3c) by adding to the model a term depicting the modest estradiol dependent repression of this promoter. For global parameter variability, we again chose to focus on the parameters affecting protein expression. We potentially could model the global parameter variability with cell-to-cell heterogeneity in the protein degradation rates. However, given that our experimental data does not measure the expression of genes involved in either of these processes, i.e. no way to experimentally distinguish the source(s) of global parameter variability, we chose to model global variability in the protein creation rates. Following the same procedure as in the previous section, we added a stochastic global parameter, G, which affects the protein creation rates for Y 1 Ã and Y 2 such thatb y Ã The noise in the experimental Y 1r data is approximately .155, therefore modeling Y 1 and Y 1r using Poissonian statistics sets the mean of the global noise variable to G % 42. We first assumed that global parameter fluctuations are slow relative to the circuit timescales (γ g = 3 × 10 −6 ). Simulating the model with this slow global source of fluctuations (SGF model, (Fig 3e-3g)) generated profiles for normalized y 2 (Fig 3g) that recapitulated the highly similar variance envelopes of the experimental time-dependent dose responses (Fig 3d). This behavior was a characteristic feature of the model for any γ g < = 3 × 10 −6 . By contrast, as the global fluctuating variable assumes a faster timescale (γ g > 3 × 10 −6 ), the variability envelopes in the normalized time-dependent dose response of y 2 started to diverge from each other (S3a and S3b Fig, γ g = 3 × 10 −4 ). As expected, the system modeled with intrinsic variability only ( G ¼ 1, parameter values listed in Table 3) shows a normalized time-dependent dose response in which variability decreases as a function of time as the protein levels increase (S3c and S3d Fig). Given the data in Fig 3b, if the fluctuations were purely intrinsic, the ratio of the standard deviation to the mean between times 165 and 580 minutes would decrease by a factor of approximately 1.7. This is a change we should be able to detect in our data since for the number of cells sampled, the error in estimating the means and standard deviations in the dose response relationships are .5 percent and 2 percent, respectively. However, as previously discussed, the experimental data shows that this ratio is relatively invariant for the last 3 time points (Fig 3d, inset) while increasing for the both the fast global fluctuations model (S3b Fig, inset) and the intrinsic variability model (S3d Fig,  inset). Our argument is further strengthened by the fact that in order to capture the noise observed in y 1r with the intrinsic variability model for the first timepoint, we had to set the y 1r mean copy number in the model to an unrealistically low value for a strong promoter such as ADH1 (approximately 40 proteins), further indicating that variability is unlikely to be intrinsic. The results for the SGF model for γ g < = 3 × 10 −6 are not an artifact of the estradiol dependence of ADH1 since an SGF model without this effect yields indistinguishable results (S3e and S3f  Fig). We therefore conclude that the dominant source of variability in this synthetic circuit is likely to be due to a globally slow fluctuating variable. This is consistent with previous results, which also indicated that global parameters play a dominant role in cell to cell variability and Table 3. Parameters for model of synthetic circuit (intrinsic model).
parameter: Different Fluctuation Sources and Information in Biochemical Networks that these parameters exhibit fluctuations at a slower timescale than fluctuations of processes involved in gene expression [4]. In terms of mutual information, the fact that the normalized time-dependent dose responses coincide in terms of their variability (Fig 3d) implies that the experimentally computed mutual information I(x + , y 2 , t) at t = 165, 330 and 580 minutes should be similar. This is indeed the case (Fig 4a, solid black). Importantly, I(x + , y 2 , t) peaks and reaches a plateau at approximately 1 bit, at an earlier time than when y 2 reaches its steady-sate. This further lends credence to the idea that the variability in the population is dominated by global parameter variability. Gratifyingly, the model with 'slow' global parameter fluctuations (with γ g = 3 × 10 −6 ) also captures the Mutual information modeling predictions and experimental measurements of the transcriptional synthetic circuit. a) Mutual information I(x + ; y 2 , t) from experimental data (black solid, YFP in Strain 1), I(x + ; y 2 , t) from the SGF model (blue solid), and I(x + ; y 2 , tjSi) from the SGF model conditioned on S1 (red dashed), S2 (blue dashed), S3 (green dashed), and Sg (black dashed). b) Dose response of y 2 as a function of estradiol at time 580 minutes for the SGF model with initial condition S1, S2, and S3. c) Mutual information I(x + ;[y 1r y 2 ], t) based on joint measurement of y 1r and y 2 computed for the SGF model (black dashed), same calculations for equalized mean of y 1r (magenta dashed). Also shown for comparison are the mutual informations I(x + ; y 2 , t) (SGF, blue solid) and I(x + ; y 2 , tjS1) (SGF, red dashed). d) Measured joint mutual information I(x + ;[y 1r y 2 ], t) (black dashed) including the mean equalized case (magenta dashed). The measured mutual information I(x + ; y 2 , t) (mCherry, Strain 2) is also shown for comparison (blue solid). time-dependent mutual information seen in the data without any further parameter tuning (Fig 4a, solid blue).
Since slow global fluctuations seem to dominate in this circuit, our analysis above indicates that the population mutual information might be under-estimating the fidelity of a single cell. To illustrate this point, we used the model to computationally isolate and compute the mutual information I(x + ; y 2 , tjSi) for single cells S1, S2 and S3 as defined in the computational example above. These calculations yield a substantially higher MI value than the population MI for the time span simulated (Fig 4a). Evidently, and as explained above, the MI for S1, S2, or S3 will eventually converge back to the whole population MI, but here it will do so on a much slower timescale than that of the system. For example, the dose response and distribution of y 2 at time 580 minutes when the system is started from S1, S2 and S3 (Fig 4b) still shows tighter variability than that of the full population. Allowing for intrinsic variability in the initial conditions, i.e. starting with cells with G ¼ G at time zero (state S g ) yields a similar MI value to that of I(x + ; y 2 , tjS1) for γ g = 3 × 10 −6 (Fig 4a).
Finally, we explored how simultaneous measurement of y 1r and y 2 affects mutual information calculations. Calculations using the model indicate that as expected, knowledge of y 1r improves the estimate of mutual information. For a slow globally fluctuating variable (γ g = 3 × 10 −6 ), the joint mutual information I(x + ; [y 1r y 2 ], t) is larger than I(x + ; y 2 , t). It can be shown that I(x + ; [y 1r y 2 ], t) = I(x + ; y 1r , t) + E[I(x + ; y 2 , tjy 1r )] where E[I(x + ; y 2 , tjy 1r )] is the expected value of I(x + ; y 2 , tjy 1r ). Since the influence of estradiol on y 1r adds (albeit very slightly (Fig 3c (data) and Fig 3f (model)) to the mutual information, i.e. I(x + ; y 1r , t) > 0, we normalized for this effect. To do so, at a given time, we set the y 1r mean at each estradiol value to the value of the y 1r mean at zero estradiol while adjusting the variance to preserve the noise in y 1r at each estradiol value. Importantly, this operation does not affect correlation between y 2 and y 1r at each estradiol value, but enforces I(x + ; y 1r , t) = 0. We confirm that this does not change our conclusions that knowledge of y 1r improves the estimate of mutual information (compare Fig  4c blue, dashed black and dashed magenta).
For comparison, we can carry out mutual information from the data obtained using Strain 2 in which both y 1r and y 2 are measured. In this strain, y 2 is the fluorescent protein mCherry which has a limited dynamic range. Importantly, at the highest estradiol values and peak mCherry signal (time 580 minutes), the measured correlation between y 1r and y 2 (greater than .79) is less than ten percent below the model predictions. Even at these signal levels the noise in the mCherry signal still deteriorates the correlation. For decreasing values of estradiol the correlations become increasingly inaccurate. Therefore, the values of the MI cannot be quantitatively compared to the model which was fitted to YFP data. However, the qualitative trend of increased MI due to measurement of y 1r relative to computing the MI with no knowledge of y 1r should also hold. This is indeed seen to be the case (Fig 4d). This insight is in agreement with recent work [19] that studied mutual information in the RAS/ERK pathway. Nuclear ERK (erk nuc ) was used as a readout of pathway information transmission. The MI at time t between this readout and the input x was conditioned for single cell ERK levels, using measurement of total ERK (erk tot ). It was also shown that I(x; [erk tot erk nuc ], t) is greater than I(x; erk nuc , t). Therefore, simultaneous measurements of different cellular variables improve estimates of mutual information capabilities of single cells.

Discussion
In this work, we illustrated how variability in initial conditions across a population, as well as slow-fluctuating extrinsic (global) variables can generate low values for the population mutual information in response to an input. We also demonstrated that when subpopulations of cells that have similar parameters or initial states are isolated, their mutual information values are transiently much higher than those of the whole population. These findings are important in light of the fact that many previous studies have found that extrinsic variability is a substantial contributor to pathway fluctuations. Indeed, our own experimental data using a synthetic circuit also implicated extrinsic fluctuations as a major source of variability. As a result, cells in a population cannot be considered to be the same noisy channel for mutual information calculations. Rather, each cell is a different noisy channel possessing its own parameters. Recent work [20] using light-inducible input signals [21,22] to a mammalian RAS/MAPK pathway observed that different isogenic single cells have quantitatively different dose-response relationships. Interestingly, for the RAS/MAPK mammalian system, the dose-repsonse relationships were repeatable for hours within a given single cell [20], suggesting slow global parameters that affect that pathway for that duration.
A direct assessment of mutual information requires repeated time-resolved measurements in single cells. Another strategy to better approximate mutual information is to simultaneously measure a large number of interconnected variables, including global states. This might be increasingly feasible with breakthrough technologies such as mass-cytometry (a.k.a. CyTOF) [23] as well as improvements in fluorescent reporter technologies. In the mean time, however, we have demonstrated that computational modeling, especially with respect to the patterns of time-dependent variability, can generate valuable insights into whether intrinsic or extrinsic fluctuations dominate variability in a circuit. These results produce a more accurate quantification of mutual information, and therefore promise to generate a more realistic assessment of signaling fidelity in cellular circuits.
Our results and those from [20] support a view in which individual cells have distinct transfer functions over relevant signaling timescales, and have superior signaling fidelity (> 1.5 bits) than estimated from pooled measurements of a population. From this perspective, it could be the case that a diversity of high fidelity but different single cell signaling transfer functions across the population is a beneficial trait. However, some situations might arise where variability in population signal transmission capacity is not desirable. In this case, cells might use strategies such as negative feedback to constrain this variability. In either case, cells might also capitalize on the integration of signals from many pathways that respond to a given input(s) in order to generate a desired population response. In this view, each such pathway will add to the mutual information of the desired cellular output (e.g. level or activity of a transcription factor), allowing the population to further circumvent in this way any information fidelity bottleneck. Researchers of the subject are likely to encounter both situations, and perhaps a revised form of population mutual information might be needed to quantify these effects, along with the formulation of new information theoretic metrics. As an example, for any given input x(t), the mutual information Iðy 2 ; y Ã 1r ; t j xðtÞÞ gives us a sense of the diversity (or spread) in responses in y 2 given the cell-to-cell variability encoded in y Ã 1 . S4 Fig shows the results of this metric applied to the simple signaling cascade (Fig 1b) for different input step function amplitudes x + and for different times. We envision these kind of metrics to reflect the different subpopulations with similar parameters within a given population and to serve as a potential tool to quantify how cell-to-cell variability across a population might change in structure due to various time-dependent inputs.
Finally, most studies to date have focused on variability in populations of non-communicating cells. Information fidelity in cells that communicate, for example through quorum sensing for bacterial communities or cell-to-cell mechanical coupling for tissues, is still largely unstudied. How cell-to-cell communication modulates global variability and variability in initial conditions across a population, and hence mutual information of cellular pathways, is a topic that should be explored in order to determine whether and when multicellularity offers a beneficial strategy in terms of signaling fidelity.

Approximating mutual information with N experiments
Because we are using a finite number of experiments, the input distribution p u (x + ) is sampled with N discrete points. In practice, these points are spaced to accurately sample the input-output transfer function p(y, tjx + ) for x + ranging from 0 to X max . The time-dependent mutual information is then calculated with this data. For values of x + between the sampled values, p(y, tjx + ) is approximated by linearly interpolating the moments of the adjacent sampled distributions. Since the distributions generated by systems in this paper are approximately gaussian (and approximately negative binomial at very low x + for the synthetic circuit data), only the means and covariances are required. A larger number of experiments (N) generates a more accurate approximation of mutual information. However, we observed that convergence to accurate MI values does not increase monotonically with N for the logarithmic sampling of the doses response that we have adopted. Rather, convergence proceeds exponentially, followed by marginal gains in accuracy as N increases. Therefore, for every N, we examine the last three sample number values, N, N−1 and N−2. Given their measured convergence rates, we can extrapolate an upper bound on the MI at an infinite number samples. We choose N whose calculated MI at N is within 1 percent of the extrapolated upper bound.

Chemical equations for the simple in silico network
The chemical equations for the circuit in Fig 1b are ; ; À ! f 1 ðy 1 Þ M y 2 À! g m 2 M y 2 ; ð 2dÞ . The propensities of the reactions appear above the reaction arrows. The system is a simple cascade of reactions where the input X activates Y 1 , and subsequently the Y 1 -dependent transcription of Y 2 . The parameter values are tabulated in Table 1.
Here the mean total number of Y 1 molecules, active and inactive, is b y Ã 1 =g y Ã tively. This system has only a single stationary solution. This allows us to approximate and efficiently calculate the master equation with a local affine assumption using the first two moments Eqs (6) and (7) taken from [24].
Inclusion of global parameter variability within chemical equations. In this work, we have assumed that the stochastic global parameter, G, manifests itself in variation in translation rates. To incorporate G we define our new protein creations ratesb y Ã and β y 2 are the nominal values in Eq (2). For simplicity we define G to follow a memoryless birth/death process through the reaction equations where G ¼ b g =g g and the noise in G, η g , is given as the standard deviation over the mean: Chemical equations for the synthetic circuit. The chemical equations for the synthetic circuit in Fig 3a are the same as the simple circuit except that the production of Y Ã 1 now involves an mRNA step, which does not directly affect any of our results. Also, we have added a YFP reporter of Y Ã 1 that has a half-life of 6 hours which we set the transcription factor itself to be the same. We define the estradiol dependence in Y Ã 1 mRNA as b m Ã a x x n x e =ðx n x e þ x n x Þ The parameters for the circuit are given in Table 2 (global fluctuation model) and Table 3 (intrinsic fluctuation model).

Computation of first two moments using affine assumption
The formulation that we assume in our model and data consists of a system of well-stirred chemical reactions with N molecular species. For some environmental input X(t), we define the pathway state Y(t) to denote the vector whose integer elements Y i (t) are the number of molecules of the ith species at time t. If there are M elementary chemical reactions that can occur among these N species, then we associate with each reaction r j (j = 1, . . ., M) a non-negative propensity function defined such that a j (Y(t)) τ+o(τ) is the probability that reaction r j will happen in the next small time interval [t, t+τ], as τ ! 0. The polynomial form of the propensities a j (y) may be derived from fundamental principles under certain assumptions [25]. The occurrence of a reaction r j leads to a change of ν j 2 Z N (the set of nonnegative integers) for the state Y. ν j is therefore a stoichiometric vector that reflects the integer change in reactant species due to a reaction r j .
This set of well-stirred chemical reactions can be represented by the joint probability density function P(y, tj X(t)) which describes the probability of the system being in state y at time t, given the environmental signal X(t). The evolution of P(y, tjX(t)) is given by @Pðy; tjXðtÞÞ @t ¼ X M j¼1 a j ðy À n j ÞPðy À n j ; tjXðtÞÞ À a j ðyÞPðy; tjXðtÞÞ ð4Þ h Eq (4) is the so-called chemical master equation (CME) [26,27].
To approximate the CME with moment equations, we approximate the propensity function a j (y) with a locally affine Taylor series expansion [24] about the mean of the distribution, z(t), to get a j ðyÞ % a j ðzðtÞÞ þ X N i¼1 @a j ðyÞ @y i j y¼zðtÞ ½y i À z i ðtÞ ð5Þ From the time dependent mean equation for the kth species is n jk a j ðzðtÞÞ ð6Þ and the time dependent covariance equation for the kth and k 0 th species is The calculation of the mutual information requires probability distributions. Given that we solve the first two moments, we constrain our distributions to be either a negative binomial distribution or a normal distribution. For cases when m k < 3 ffiffi ð p C kk Þ, we apply the negative binomial distribution since it only requires the first two moments and is non-negative. The negative binomial is very close to a normal distribution for m k > 3 Ã ffiffi ð p C kk Þ and we therefore apply the normal distribution in these regions. The value of 3 used is heuristic, but the tail of the normal distribution at negative values is negligible at this point. For linear transcriptional systems, the negative binomial is a natural steady state solution [28] which was our motivation for applying it. Importantly, our data never violated any constraints required by the negative binomial distribution, for example, μ k C kk . Note that the negative binomial distribution is only required for our modeling of the synthetic circuit data. Our theoretical example in the first half of the paper has large enough basal levels at zero input which always keeps it in the normal distribution regime. As a demonstration of the validity of the moment approach, S5 Fig shows very good agreement in the distributions derived from stochastic simulations (SSA) and the moment equations for the synthetic circuit model (γ g = 3 × 10 −6 , initial condition S1).

Synthetic circuit constructs
Strains. All plasmids used in this study were derived from a set of yeast single integration vectors containing selectable markers and targeting sequences for the LEU2, HIS3, TRP1 and URA3 loci. These vectors were linearized by digestion with PmeI and transformed using standard yeast transformation techniques. All strains were derived from haploid W303a and were deleted for GAL4 to eliminate competition between the endogenous Gal4p and the previously described estradiol-inducible Gal4 chimera (Gal4DBD-ER) for binding to the GAL10 promoter [29]. The sequences for the ADH1 and GAL10 promoters were 658 and 646 bp upstream from the start codons for these genes, respectively. The genotypes for these strains are listed in Table 4.
Growth conditions and flow cytometry. Cells were grown in YPD at 30°C. Prior to the experiment, cells were grown and maintained in exponential phase (optical density <. 15) for approximately twenty-four hours in the absence of estradiol and then diluted to an optical density (OD600) of 0.05 at the beginning of the experiment and periodically diluted to stay under 0.15. Using a deep-well 96-well plate (Thermo), estradiol concentrations decreased from columns one to eleven in logarithmically spaced points according to the following equation: 100 (2/3) c−1 nM where c denotes the column number. Well 12 did not receive estradiol. Estradiol did not change the growth rate of the cell population. Two replicates were performed. Measurements were taken at t = 0, 65, 165, 330, and 580 minutes. Fluorescence measurements were performed on a LSRII analyzer (BD Biosciences). A blue (488 nm) laser was used to excite YFP (Venus) and a green (561 nm) laser was used to excite RFP (mCherry). Emission was detected using a 530/30-nm bandpass filter for Venus (Chroma) and a 610/20 bandpass filter for mCherry (Chroma). Greater than three thousand cells were collected for each measurement. Flow cytometry data was analyzed in MATLAB (Mathworks).
Supporting Information S1 Fig. Comparison of Y 2 mCherry measurements show that Strains 1 and 2 are equivalent in Y 2 output. Strains 1 and 2 have similar growth rates that are independent of estradiol concentrations less than 100 nM. a-b) Dose-response relationships for t = 65 (blue), 165 (green), 330 (red), and 580 (black) minutes. The top panel is the normalized experimental y 2 data. Each curve is normalized by its maximum mean value in the corresponding lower panel. The lower panel is the un-normalized experimental y 2 data. a) Experimental y 2 (mCherry) data for Strain 1. b) Experimental y 2 (mCherry) data for Strain 2. c)-d) Single well growth curves accouting for dilution of 1/2 at t = 165 and 1/2 at t = 330 (single wells (red), 0 nM estradiol well (blue), 100 nM estradiol well (green), average (black)). The flow cytometer sample volume changed at t = 165 minutes, hence the jump in cell count. There was also pipetting error from the deep well plate to the sample plate at t = 0 and 65 minutes resulting in error in cell count across wells. For t = 165, 330 and 580 the pipetting error was minimized. c) Growth curves for Strain 1. d) Growth curves for Strain 2. the synthetic circuit model (γ g = 3 × 10 −6 and initial condition S1). a) Plot of dose response from moment equations at time 580 minutes. b) Plots of the distribution p(y 2 , t = 580jx(t))/max(p(y 2 , t = 580jx(t))) derived from the SSA and moment equations for select estradiol values up to 8.5 nM. c) Plots of the distribution p(g, t = 580jx(t))/max(p(g, t = 580jx(t))) derived from the SSA and moment equations. Analytical steady-state distribution shows that the moment equations and SSA results have evolved the global variable distribution p(g, t = 580jx(t)) about halfway to steady state. (EPS)