Under-Dominance Constrains the Evolution of Negative Autoregulation in Diploids

Regulatory networks have evolved to allow gene expression to rapidly track changes in the environment as well as to buffer perturbations and maintain cellular homeostasis in the absence of change. Theoretical work and empirical investigation in Escherichia coli have shown that negative autoregulation confers both rapid response times and reduced intrinsic noise, which is reflected in the fact that almost half of Escherichia coli transcription factors are negatively autoregulated. However, negative autoregulation is rare amongst the transcription factors of Saccharomyces cerevisiae. This difference is surprising because E. coli and S. cerevisiae otherwise have similar profiles of network motifs. In this study we investigate regulatory interactions amongst the transcription factors of Drosophila melanogaster and humans, and show that they have a similar dearth of negative autoregulation to that seen in S. cerevisiae. We then present a model demonstrating that this stiking difference in the noise reduction strategies used amongst species can be explained by constraints on the evolution of negative autoregulation in diploids. We show that regulatory interactions between pairs of homologous genes within the same cell can lead to under-dominance — mutations which result in stronger autoregulation, and decrease noise in homozygotes, paradoxically can cause increased noise in heterozygotes. This severely limits a diploid's ability to evolve negative autoregulation as a noise reduction mechanism. Our work offers a simple and general explanation for a previously unexplained difference between the regulatory architectures of E. coli and yeast, Drosophila and humans. It also demonstrates that the effects of diploidy in gene networks can have counter-intuitive consequences that may profoundly influence the course of evolution.


Introduction
Negative autoregulation is a network motif in which a transcription factor inhibits its own expression. Theoretical work has shown that this type of regulation reduces intrinsic noise and quickens the response time to environmental perturbations [4,6,7] and experiments using artificial gene regulatory circuits in E. coli have confirmed these predictions [6]. Negative autoregulation therefore represents a simple yet powerful mechanism to maintain cellular homeostasis in the face of environmental and metabolic perturbations and reduce the often substantial fitness costs that noise can incur [3]. Different organisms, however, vary a great deal in their use of the motif. In E. coli, close to 50% of transcription factors (82 out of 182) [8][9][10]14] have been shown to negatively autoregulate. In contrast, negative autoregulation is almost entirely absent from the transcription factors of S. cerevisiae (3 out of 169) [9][10][11][12][13].
How can we account for this striking discrepancy? In order to answer this, we looked at the extent to which negative autoregulation is used in other species. We interrogated systematic datasets on the regulatory interactions amongst the transcription factors of D. melanogaster and humans and found a similar pattern to that observed in yeast: in D. melanogaster 3 out of 87 [15][16][17] and in humans 5 out of 301 [16][17][18] transcription factors autoregulate (see SI, Table S1-S3). Currently, there is no obvious way to account for this striking discrepancy between these organisms, despite widespread interest in the strategies they employ to tackle noise [1][2][3][4][5][6][7]. Here we develop a model, founded in biophysics, for the evolution of negative autoregulation in diploid species. We use it to argue that a dearth of autoregulating genes in yeast, flies and humans can be explained by constraints on the evolution of negative autoregulation that arise due to diploidy.

Gene expression under negative autoregulation
Previous theoretical work on the dynamics of gene expression under negative autoregulation has considered single genes and so is implicitly haploid [4][5][6][7]. Such models exclude the more complex interactions that occur due to cross-regulation between homologous gene copies within a diploid cell (Fig. 1). Here we characterise the expression dynamics and regulatory evolution of homologous pairs of negatively autoregulating genes, taking into account the cross-talk between alleles.   Figure 1: Cross-talk in diploid autoregulators (a) Schematic representation of negative autoregulation when one (left) and two (right) copies of a gene are present in a cell. In the haploid the amount of negative autoregulation the gene experiences depends on on its own expression level. In the diploid, two gene copies are present (shown as light gray and dark gray), and the amount of negative autoregulation experienced by each gene depends on the expression level of both genes combined. If the two gene copies differ from one another in the strength of their transcription factor binding sites, complex dynamics can arise that are not observed in haploids. (b) IIllustration of variation in the repression function, φ(p), with protein concentration for different Hill coefficients, n = 1 (sold line), n = 2 (small dashes) and n = 5 (large dashes).
We model negative autoregulation in a diploid using a set of ordinary differential equations that track changes in the mRNA and protein concentrations for each of a pair of alleles (labelled with subscripts i and j), r i , r j , p i and p j . The total concentration of mRNA and protein in the diploid cell are given by the summed output of the two alleles r = r i + r j and p = p i + p j . Changes in mRNA and protein concentrations for the pair of alleles over time are given by According to these equations, mRNA is transcribed at a (usually low) constant background rate k l , plus a rate φ(p) due to negative autoregulation, that decreases as the total cellular protein level p = p i + p j increases. Protein is produced from mRNA at the rate of translation k p , whilst protein and mRNA degrade with rates γ p and γ r , respectively. As in previous work [4,6], we model the repression function φ(p) in Eqs. 1 as a Hill function where K i is the dissociation constant associated with the autoregulating transcription factor binding site. Smaller values of K (lower rates of dissociation) indicate a steeper slope and stronger regulation. The Hill coefficient n governs the steepness of the function at the inflection point and hence determines how step-like regulation will be. In systems where transcription is regulated by a single binding site, φ(p) has a Michaelis-Menten-like form, corresponding to a Hill coefficient of n = 1 [6,19,20]. A single binding site is the simplest and most relevant case for evolving negative autoregulating, and it is the one we focus on here. For completeness, we analyse the more general case of arbitrary Hill coefficient in the Appendix and in the Supporting Information we show that our results also hold for different values of n.
In the absence of negative autoregulation (i.e., φ(p) = k 0 ), mRNA is produced at the maximum rate of transcription k l + k 0 . In this case, concentrations of mRNA and protein reach equilibrium values of r max = 2(k 0 +k l ) γr and p max = r max kp γp . Starting from these values, equilibrium mRNA and protein levels decrease with increasing autoregulatory binding strength (decreasing K). The minimum mRNA and protein levels are reached when negative autoregulation is strongest (i.e. φ(p) → 0 as K → 0). The resulting minimum equilibrium concentrations are r min = 2k l γr and p min = r min kp γp .

Evolution of negative autoregulation for homeostasis and faster response times
In order to analyse the evolution of autoregulatory binding sites we consider two separate but related functions of negative autoregulation: faster response times and maintaining mRNA and protein homeostasis. First, to study the evolution of negative autoregulation for faster response times, we simply equate the fitness of a system with its response time (i.e the time taken to return to equilibrium following a perturbation). We use Eqs. 1 to infer selection pressures on the strength of autoregulation, i.e., the dissociation constant K, by analysing how quickly genotypes with different autoregulatory binding strength return to equilibrium following a perturbation in protein level. To do this we calculate a genotype's "response time": the time taken for cellular protein concentration to return to equilibrium following a perturbation. We model perturbations as a reduction of the protein level to a fraction α of the equilibrium level.
The value of α varies continuously between 0 and 1 to encompass both small perturbations, for example those resulting from intrinsic noise in transcription and translation (α ≈ 1), and larger perturbations, for example those resulting from resource deprivation in the environment or following cell division [2]. We present results derived from numerical analysis of Eqs. 1 that are applicable to perturbations of any size. These are complemented with an analytical treatment of the response time of the system to small perturbations, based on its maximal eigenvalue, |λ| (see Appendix), which allows us to develop an intuition for how autoregulating genes in diploids respond to perturbations.
To study the evolution of negative autoregulation for homeostasis, we turn to stochastic simulations of negatively autoregulating genes, which allow us to assess the amount of intrinsic noise associated with gene expression. Previous work has shown that negative autoregulation can help maintain homeostasis in gene expression by reducing the amount of intrinsic noise in negatively autoregulating genes, compared to other genes [7]. In fact, reducing the response time of a gene to very small perturbations away from equilibrium, also decreases the intrinsic noise in gene expression. Therefore, the two functions of negative autoregulation we consider (producing faster response times and reduced intrinsic noise) are highly interrelated. To study the evolution of negative autoregulation for lower intrinsic noise, we equate the fitness of the system with the amount of intrinsic noise it displays (i.e the ratio of the variance in gene expression to the mean gene expression level). We infer selection pressures on the strength of autoregulation, i.e., the dissociation constant K, by the intrinsic noise of genotypes with different autoregulatory binding strengths. These are determined by performing Monte Carlo simulations for a full, molecular model of transcription, translation and autoregulation (see Materials and Methods).

Response time in homozygotes
We first compare the response times of two homogozyotes whose alleles are identical in every respect except for the dissociation constant. One homozygote carries two copies of a resident allele with dissociation constant K 1 , the other carries two mutant alleles that have a decreased dissociation constant K 2 = K 1 exp[− ] (with > 0) and hence stronger autoregulatory binding. Numerical analysis of the system shows that homozygotes for the more strongly autoregulating allele (with K 2 ) respond more quickly than homozygotes for the more weakly autoregulating allele (with K 1 , Fig. 2a). This is true up to a value of K ≈ K opt , which provides the fastest response time attainable by the system and hence provides the optimal binding strength. Further increases in regulation beyond this value are not favoured because they lead to overshooting the optimal binding strength. These results for diploid homozygotes mirror those obtained for haploids [6] (see Appendix) and show that regulatory interactions between pairs of identical alleles do not, in themselves, diminish the beneficial effects of negative autoregulation. Negative autoregulation can therefore, in principle, function as a mechanism to produce faster response times in diploids just as it does in haploids.  4,6]. Response times were calculated by numerically integrating Eq. 1 from zero protein concentration to 90% of the equilibrium. The optimal binding strength in these graphs is p max /K = 1250, corresponding to a background transcription rate k l /γ p = 10 −3 .

Response time in heterozygotes
The results above depend on comparing homozygotes for alleles with different dissociation constants, K 1 and K 2 . The evolution of negative autoregulation, however, must occur through the stepwise accumulation of new mutations that are initially rare and found only in heterozygotes. In order to assess whether autoregulation can evolve in diploids, we therefore need to determine whether a mutant allele with a stronger binding site (K 2 ) will confer a selective advantage to a heterozygote that also carries a resident allele with a weaker binding site (K 1 ). A mutation will be favoured and increase in frequency if a heterozygote is able to respond more quickly to perturbations than a homozygote carrying two copies of the more weakly binding resident allele.
Numerical analysis of Eqs. 1 reveals that heterozygotes often have greater response times than homozygotes with the more weakly binding resident allele. Fig. 2b shows that heterozygotes only have improved response times when the resident allele binding strength is weak (K p max [4,6]) or if the effect of a mutation that increases binding strength is small ( is small). As the resident allele binding strength increases (i.e. p max /K increases) an ever larger range of mutation sizes result in increased heterozygote response times (Fig. 2b), resulting in under-dominance (i.e. heterozygote disadvantage). Typicaly mutation sizes for transcription factor binding sites are in the range 1 < < 3, [20][21][22]. In this range regulatory mutations are subject to under-dominance and increasingly so as the binding strength of the resident allele increases. As a consequence, the maximum binding strength that can evolve is likely to be significantly lower than in haploids (Fig. 2). Based on these results, we expect under-dominance to pose a significant barrier to the evolution of negative autoregulation in diploids.
To better understand why under-dominance arises in this system, we calculated the eigenvalues associated with Eqs. 1. These provide a measure of the rate at which the system returns to equilibrium following a small perturbation, and allow us to elucidate the relative contributions of the different alleles to the response dynamics of the gene pair. The maximal eigenvalue of Eqs. 1 for a heterozygote, |λ het |, can be expressed as (see Appendix) where V is the squared difference of the mean steady state expression levels of the two alleles in the heterozygote and |λ hom | is the maximal eigenvalue of a homozygote with protein concentration equal to that of the heterozygote at equilibrium, p * het (see Appendix). Eq. 2 says that, even if increasing autoregulatory binding strength leads to a faster response time in a homozygote, this advantage is offset in the heterozygote by an amount V /p * het , which measures how different the expression levels of the two alleles are (it is analogous to the Fano factor, a measure of the spread in a probability distribution [7]). As the difference in the expression of the alleles increases, V /p * het increases from 0 to a maximum p * het /2. We can understand why increasing the difference in allelic expression results in increased response time by considering the contribution of the individual alleles to the response time of the gene pair (Fig. 3). The level of negative autoregulation at each allele depends on the strength of its binding site and the amount of protein product present in the cell. In a heterozygote, the allele with the stronger binding site is more strongly suppressed (compared to the same allele in a homozygote), since there is more protein available to bind to it. At the same time, the allele with the weaker binding site is less strongly suppressed compared to the same allele in a homozygote. As a result, the allele with the stronger binding site has a faster response time than in a homozygote, whilst the allele with the weaker binding site has a slower response time than in a homozygote. However, the overall effect tends to be to increase the response time of the heterozygote, because the dynamics of protein expression in the heterozygote are dominated by the allele with the weaker binding site (Fig. 3).  , and the optimal binding strength in these graphs is p max /K = 1250, corresponding to a background transcription rate k l /γ p = 10 −3 .

Evolution of faster response times
Under-dominance for response time occurs across a wide range of parameter values, but can be avoided if mutations have small effects on binding site strength (Fig. 2b). To determine whether a series of mutations with small effect could offer a feasible way for genes to evolve strong negative autoregulation in diploids, we carried out simulations of binding site evolution that incorporated established properties of real binding sites.
Transcription factor binding sites in eukaryotes vary between 5 and ∼ 30 nucleotides in length, with an average of 10 nucleotides [23]. They have a small number of optimal sequences that bind the transcription factor with maximum affinity [20][21][22]24,25]. The binding strength of a site can be expressed as a function of the total binding energy E of its sequence, so K = exp[−E]. This total binding energy is generated by the additive contributions of individual nucleotides to overall binding, E = i i . Individual contributions are set to i = 0 for nucleotides that do not match the optimal sequence and i > 0 for matched nucleotides [20][21][22].
Based on these properties, we performed simulations of the evolution of an autoregulatory binding site under selection for decreased response time. These took into account the empirical distribution of binding site length in model eukaryotes and the variation in contributions to binding strength i across the binding site sequence (see Materials and Methods). The values of i were drawn from a uniform distribution in the interval (0, 3). This sampling covers the empirically estimated range 1 < i < 3 [20][21][22]. It also ensures that mutations of small effect ( < 1) occur frequently and so allows for the possibility that autoregulation could evolve via the accumulation of mutations with small effect. Evolution was started from a state of minimum affinity (all nucleotides non-optimal) and proceeded through a series of single nucleotide substitutions. A mutant was assumed to go to fixation if it resulted in a response time less than or equal to that of the resident. Simulations were carried out for both haploids and diploids (for which the response time of mutants was evaluated in the heterozygote state).
The results (Fig. 4) confirm that under-dominance strongly constrains the evolution of negative autoregulation in diploids. Haploids readily evolved binding sites with dissociation constants close to K opt . In contrast, the average binding strength in diploids was around 100 times weaker than K opt and only a small proportion of sites reached binding strengths comparable to those of haploids. This shows that under realistic conditions, diploids will rarely be able to evolve the level of autoregulation observed in haploids.

Intrinsic noise in diploids
In order to investigate the evolution of negative autoregulation as a mecahnism to reduce intrinsic noise in diploids, we turned to stochastic simulations. Intrinsic noise in gene expression occurs because transcription and translation are inherently noisy processes. As a result of this noise, all genes experience constant fluctuations in their mRNA and protein levels. The greater intrinsic noise associated with a particular gene, the higher the variance in its expression level relative to the mean. Therefore, a natural way to characterise the amount of intrinsic noise associated with a gene is to measure the ratio of the variance to the mean expression level at equilibrium (known as the Fano factor) [7]. We performed molecular simulations that capture transcription, translation and degredation in the presence of negative autoregulation (see Materials and Methods). Just as in our analysis of response times, we compared a resident allele with dissociation constant K 1 , to a mutant allele with dissociation constant K 2 = K 1 exp[− ]. We compared the intrinsic noise (as measured by the Fano factor) in the resident homozygote to that of the heterozygte and the mutant homozygote, and thus determined whether under-dominance occurs in the evolution of negative autoregulation as a mechanism to reduce intrinsic noise. The results are shown in Fig. 5. We find once again that under-dominance occurs. Whereas the optimal binding strength for a single negatively autoregulating binding site is found to be p max /K ∼ 10, the maximum evolvable binding strength (i.e that which can evolve without encountering under-dominance) is found to be p max /K ∼ 1, an order of magnitude weaker. A similar pattern occurs when steeper Hill coefficients are considered (Fig. 5). Therefore we conclude that under-dominance poses a barrier to the evolution of strong negative autoregulation both as a mechanism to speed response times and to reduce intrinsic noise.

The effects of mutations to other parameters
To test the generality of our findings, we also considered variation in other parameters (see SI Fig. S1-S5 and Text). We first relaxed our assumption of a single binding site and explored the case of Hill coefficients n > 1, implying regulation through multiple, cooperatively acting binding sites. In line with the effect of increasing binding strength through changes in K, we find that mutations increasing the Hill coefficient are subject to under-dominance (see SI Fig. S1-S2 and Text). Therefore, a mutation that increases the strength of negative autoregulation are subject to the same evolutionary constraints, independent of whether they increase regulation by changing the dissociation constant K or the Hill coefficient n.
We also considered variation in the rates of mRNA and protein degradation (γ r and γ p ) to see whether they provide conditions in which the effects of under-dominance on autoregulatory binding strength can be avoided (see SI Fig. S4 and Text). Variation in the rate of mRNA or protein degradation did not remove the tendency for mutations that increase autoregulatory binding strength to be subject to underdominance. However, it has been pointed out in other work [2,26], faster rates of protein degradation result in faster response times, and regulation of protein degradation can reduce noise. As might be expected, the constraints we describe on the evolution of response times through stronger negative autoregulation do not preclude the evolution of response times through other mechanisms, such as changes in protein degradation rates.

Discussion
Negative autoregulation is found to occur in 46% of E. coli transcription factors [4][5][6][7], but is rare in other species for which systematic data on transcriptional regulation is available, occurring in <2% of the transcription factors of yeast, Drosophila and humans (see SI, Table S1-S3). We have argued that, at least in part, this difference can be understood by considering the different evolutionary dynamics of autoregulating genes in haploids and diploids: selection for genes to have a decreased response time to perturbations favours negative autoregulation in haploids, but under-dominance tends to prevent the evolution of stronger autoregulatory binding sites for this purpose in diploids. This constraint on the evolution of negative autoregulation in diploids is compelling because it offers a simple and general explanation for the near absence of the motif across yeast, humans and flies. Furthermore, it is important to note that under-dominance is not built into our model but arises as an emergent property of our analysis of regulatory evolution -an analysis that simply extends to diploids previous models that have been shown to provide a good description of regulatory behaviour in haploids [6,7].
The empirical patterns we present are striking, however it is important to ask weather they can be explained by other means than those proposed in this paper. In particular we asked whether negative autoregulation is truly under-represented in the yeast, human and Drosophila data sets, as compared to E. coli, or whether the apparent reduction in the number of negative autoregulators is due to underrepresentation of genes with repressive function generally. To address this we interrogated each dataset to find the number of transcription factors with documented repressor activity. These account for 58 factors in humans, 37 in Drosophila, 54 in yeast and 82 in E. coli. If we include only transcription factors with known repressor function in our analysis, we find that 5 out of 58 (8.6%) genes negatively autoregulate in humans, 3 out of 37 (8.1%) in Drosophila, 3 out of 54 (5.6%) in yeast and 82 out of 138 (59%) in E. coli. Thus, the relative rarity of negative autoregulation in eukaryotes is not due to a general underrepresentation of repressive transcription factor effects among the genetic interactions described for these species. Instead, they appear to be a true property of their regulatory networks.
The species for which we were able to collate data show stark differences in the frequency of negative autoregulation. However, it would clearly be desirable to extend the scope of the empirical analysis to include examples that span the prokaryote-eukaryote divide. Two types of data are of particular interest in this respect: haploid genes in diploid species and duplicate genes in haploid species. The constraints on the evolution of autoregulation predicted by our model only affect genes for which two homologous copies are expressed in the same cell. Haploid genes in a diploid organism should therefore escape the evolutionary constraint on negative autoregulation. Unfortunately, the data on genetic regulation are currently too sparse to test this prediction with any degree of rigor. The only candidate for a haploid gene in our dataset is the human Y-linked transcription factor Sry (see SI Table. S3). However, its mode of regulation (positive or negative) is unknown. Duplicate genes in haploids offer a better prospect. Just as single copy genes in diploids are predicted to escape under-dominance, multi-copy genes in haploids may be subject to evolutionary constraints similar to those we have described for diploids. Interestingly, this appears to be supported by data on duplicate genes in E. coli. Although the use of negative autoregulation is widespread amongst E. coli transcription factors, duplicates of negative autoregulating genes are no more common than among other genes [10]. This is despite the fact that negative autoregulation is expected to reduce the deleterious effects of increased dosage following duplication [10]. Although the evolutionary dynamics of duplication and divergence are complex [27], it is interesting to note that our model predicts that any divergence in the expression levels of a pair of negatively autoregulating duplicate genes will tend to slow the response time of the pair. This is because expression divergence will tend to increase the difference in expression across the two genes and thus will increase the response time in exactly the same way as we have described for heterozygotes in diploid cells.
The frequency of autoregulating duplicates in E. coli offers some support for our model by showing that, as we would predict, the evolution of multi-copy genes in bacteria is constrained. However, it could be argued that the relative dearth of negative autoregulation in yeast, fruit flies and humans is not primarily caused by under-dominance, but is rather due to the fact that these organisms are eukaryotes. Eukaryotes may simply experience different types of noise, and, accordingly, have different mechanisms for dealing with it, making negative autoregulation unnecessary. Although we cannot dismiss this limitation of our model entirely, several points are worth noting. The use of response time as a measure of fitness makes our model very general. First, any cell has to deal with large perturbations, such as occur following cell division. The speed with which the concentration of a transcription factor returns to its equilibrium, and the regulatory dynamics allowing it to do so, are important across all levels of biological complexity. Second, as we have noted, although our model captures the response time to perturbations and the amount of intrinsic noise associated with a gene [7], it does not capture other, extrinsic sources of noise. In particular, eukaryotes tend to be affected by "input noise" that results, for example, from the stochastic ON-OFF switching occuring in eukaryotic cells [2,28], and it is positive autoregulation, not negative autoregulation, that is best for dealing with such noise [2,28,29]. However, positive autoregulation does not feature any more prominently than negative autoregulation within the regulatory networks of the three eukaryotes we analysed, with 9 instances in yeast, 16 in humans and 11 in Drosophila. These figures are not comparable to the frequency of negative autoregulation in E. coli, indicating that we are not simply observing a straightforward shift in the importance of different types of perturbations.
It seems unlikely that eukaryotes are completely exempt from dealing with the kind of perturbations that would require negative autoregulation. It is possible, however, that they implement autoregulation differently, and in a way that would not appear in our systematic screen of existing datasets. When using transcription regulation, eukaryotes may be able to achieve some degree of negative autoregulation through multiple, weak autoregulatory binding sites, along with cooperation (see Figs. S1-S2). Although we are able to show that the evolution of strong cooperative autoregulation is subject to under-dominance (Fig. S1), we find that the evolution of multiple, weak autoregulatory binding sites (Fig. S2) are less constrained. Since weak binding sites would likely be under-represented or absent from systematic datasets, we cannot rule out the possibility that diploids achieve negative autoregulation in this way, and some studies based on sequence conservation do suggest that autoregulatory binding sites in humans may be quite widespread [30]. Eukaryotes may also achieve negative autoregulation through mechanisms other than direct transcription regulation, for example, through changes in local chromatin structure or covalent changes in the protein structure of transcription factors. As these regulatory mechanisms are less likely to generate the phenomena of cross-regulation that occur in diploid transcription regulation, they may not to be subject to under-dominance. Finally, it is important to reiterate that our study is only concerned with the evolution of negative autoregulation for noise reduction and faster response times. However, genes can achieve noise reduction through other means than autoregulation, and autoregulation can be used for other purposes than noise reduction [31][32][33]. We do not suggest that eukaryotes are exempt from the problem of noise. We do suggest that diploid gene networks, in contrast to those of haploids, must seek a different solution to the same problem.

Conclusion
Our work shows that regulatory interactions between homologous genes can generate deleterious effects that constrain the evolution of negative autoregulation. The predictions of our model show that the high incidence of autoregulation in E. coli and the dearth of negatively autoregulating genes in yeast, flies and humans can be reconciled by taking into account a simple biological attribute-ploidy. Importantly, the difference between haploid and diploid regulation is not a mere correlate of the prokaryote-eukaryote divide. This was already suggested by the finding that the genetic networks of E. coli and yeast are-with the exception of their use of autoregulation-very similar [13] and is further corroborated by the the fact that the predictions of our model are met by data on single and duplicated genes within E. coli.
More generally, our work demonstrates that regulatory evolution can be considerably complicated by the presence of multiple copies of a gene in a cell, as is typically the case for eukaryotes. By explicitly considering the evolution of regulatory interactions, we have highlighted constraints that would not be evident from an analysis of the functional properties of an existing regulatory interaction in isolationstrong negative autoregulation quickens the response of genes to perturbation, but it is hard to evolve for this purpose due to under-dominance. This evolutionary perspective needs to be absorbed into attempts at unravelling the function of regulatory networks in higher organisms, a key problem for systems biology.

Monte-Carlo simulations
We used simulations of the molecular dynamics within a cell to determine the amout of intrinsic noise of autoregulating genes in diploids. A model that tracks the number of mRNA and protein molecules for a negatively autoregulating gene within a haploid cell is described in [7]. We generalised this to account for diploidy. The state of the system is described by the number of mRNA molecules r i , and the number of protein molecules p i produced from the two alleles i ∈ {1, 2}. The probability of a state {r 1 , r 2 , p 1 , p 2 } is specified by the joint probability distribution n r 1 ,r 2 ,p 1 ,p 2 (t). The transition probabilities for the system to move between states due to changes in r 1 and p 1 (and, analogously, due to changes in r 2 and p 2 ) are given by where p = p 1 + p 2 , k l + φ 1 (p) is the rate at which mRNA molecules are transcribed from allele 1, γ r is the rate of mRNA degradation, k p is the rate at which mRNA is translated into protein and γ p is the rate of protein degradation. As in the ODE model, φ 1 (p) is a function of the number of proteins p present in the cell, such that where k 0 is the maximum rate of mRNA transcription, and K 1 is the dissociation constant of the binding site of allele 1.
To calculate response times we first determined the equilibrium expression level of the system from the average of 10 5 replicate Monte-Carlo simulations. We then reduced mRNA and protein levels to a fraction α of the equilibrium level. The time for each replicate to return to equilibrium was measured and the average across the ensemble used as an estimate of the response time of the system. In order to determine how response times vary with the level of perturbation, simulations were run for values of α between 0 and 1 in steps of 0.01.

Simulations of binding site evolution
Binding site evolution was modelled by generating a transcription factor binding motif with a length n nucleotides and an optimal base associated with each nucleotide. As in other models of TF-DNA binding, when a given nucleotide i was matched for for the optimal base it contributed an amount i to binding energy, otherwise it contributed 0 [20][21][22]. Binding site lengths were drawn from an empirical distribution generated from the binding motifs of 454 eukaryotic transcription factors contained in the JASPAR CORE database [23]. The value of i for each nucleotide was drawn from a uniform distribution in the interval (0, 3). The optimal binding strength K opt was determined numerically (see Appendix), using the values for the system parameters that are given in the legend of Figure 4. We excluded from our analysis any binding sites for which the total binding strength of the optimal sequence was too low to achieve the fastest response time (i.e., those sequences for which K = exp[− i i ] > K opt ). Evolution started from a state of minimum affinity (all nucleotides non-optimal) and proceeded through a series of single nucleotide substitutions. At each time step, a random mutation was introduced into the binding site sequence, switching one nucleotide from the non-optimal to the optimal state. If the mutation resulted in a response time less than or equal response time of the resident, the mutant sequence was assumed to go to fixation in the population. Deleterious mutations that increased response times were assumed to be lost. The simulation was ended when no further advantageous mutations were available. Simulations were carried out for both haploids and diploids (for which response time of mutants was evaluated in the heterozygote state).

Derivation of response times in haploids
Here we derive results for the response time of a haploid autoregulating gene. We derive results for the general case in which autoregulation is described by a Hill function with arbitrary coefficient n (the analyses in the main text assumes n = 1).
The set of ODEs describing transcription and translation of mRNA and protein at a single autoregulating gene are analogous to those given for one allele in Eqs. 1 for a pair of autoregulating genes in a diploid. In order to simplify the analysis of the system we make the change of variables with L = K pmax−p min and γ = γp γr . The dynamics of the system can then be rewritten as where β = r min rmax−r min = p min pmax−p min = k l k 0 . In general β 1 since k l k 0 and is the rescaled form of the repression function φ(p) described in the main text. Assuming that mRNA decays much faster than protein [6,7] γ p γ r , then γ 1, it follows that γ ds dτ is small relative to dq dτ and we can assume that transcription output goes to equilibrium rapidly. That is, we can take γ ds dτ ≈ 0 and hence that the quasi equilibrium condition s ≈ β + k s (q) holds. Substituting into Eqs. 3, generates a 2-dimensional system that is well approximated by the 1-dimensional system

Small Perturbations
The Lyapunov exponent associated with Eq. 5 at equilibrium gives the rate at which the system returns to equilibrium following a small perturbation. It is given by Eq. 6 is always negative. In what follows we will discuss only the magnitude of the Lyapunov exponent |λ| with the understanding that this quantity is always negative and therefore describes the rate at which the system returns to equilibrium. From Eq. 6 it is clear that a mutation which increases dks dq will always serve to decrease the Lyapunov exponent and thus increase the rate at which the system converges to equilibrium.

Evolution of a new binding site
We compare a wild-type binding site, with dissociation constant L 1 , to a mutant binding site with dissociation constant L 2 such that L 1 > L 2 -meaning that the mutant has a stronger binding site than the wild-type. At equilibrium, the protein concentrations satisfy It is simple to show that q * 1 > q * 2 by differentiating, with respect to L. Thus, strengthening the autoregulatory binding site (i.e., decreasing L) will lead to a decrease in the equilibrium protein concentration, and so with L 1 > L 2 we always have q * 1 > q * 2 . To calculate the value of L opt for which |λ| is maximum, we note that dk s dq = n q k s (q)(1 − k s (q)).
At equilibrium q = q * = k s (q * ) + β, and the Lyapunov exponent can be written as and we can find the value of q * that results in the largest Lyapunov exponent. This is given by Translating this back into units of protein concentration, this means that the fastest response to small perturbations about equilibrium occurs when Thus, mutations which increase the strength of negative autoreguation, (and therefore decrease K), will decrease response time provided the equilibrium protein concentration is p * > √ p max p min , as discussed in the main text. The optimal binding site strength K opt can be determined by calculating the value of K which gives the optimal equilibrium protein concentration of Eq. 8. In the general case of arbitrary n, K opt cannot be found analytically, but it can always be found numerically. The derivation of K opt presented here is based on the assumption that perturbations of the system are small, in which case the dynamics of the system are well captured by its Lyapunov exponent. The optimal binding strength under perturbations of arbitrary size can be obtained by numerical integration of the system. As might be expected, the valuesK opt obtained in this way are similar to those calculated for small perturbations above.

Derivation of response times in diploids
For diploids we proceed in the same way as for a single gene, and obtain a 1-D system for expression of a pair of alleles (with dissociation coefficients L i and L j ) where k s i (q ij ) = 1

Evolution of a new binding site
We now consider the response time of a pair of autoregulating alleles in a diploid. When an organism is homozygous, both binding sites have the same dissociation constant, L 1 and Eq. 9 is of the same form as Eq. 5 for a haploid, and the results for response time in haploids can be applied. When an organism is heterozygous however, the results for haploids do not hold. We compare the Lyapunov exponents of a heterozygote with dissociation constants L 1 and L 2 , where L 2 < L 1 , to a resident homozygote in which both binding sites have strength L 1 . At equilibrium the total protein concentrations satisfy q * 11 = 2β + where q * 11 is the equilibrium protein concentration of the (resident) homozygote and q * 12 is the equilibrium expression of the (mutant) heterozygote. It is simple to show that q * 11 > q * 12 . by differentiating Eq. 10 with respect to L 2 .

Small perturbations
Following a small displacement from equilibrium, under-dominance will occur if the heterozygote has a smaller Lyapunov exponent than the homozygote. The maximal Lyapunov exponent of the system is given by for the homozygote, and |λ 12 | = 1 + n q * 12 q * 1,12 − β 1 − q * 1,12 + β + n q * 12 q * 2,12 − β 1 − q * 2,12 + β , for the heterozygote, where q i,ij referes to allele i in a diploid carrying alleles i and j. We can observe that the squared difference in the mean allele expression, V , is given by V = q * 1,12 − Substituting this expression for V in Eq. 12 we find |λ 11 | = 1 + n 1 − q * 11 2 + 2β − 2β q *

11
(1 + β) , Note that Eq. 14 is of the same form as Eq. 13, with an additional term that depends on the ratio of the squared difference in allele expression, V to the total expression. We can define λ hom to be the Lyapunov exponent associated with a homozygote of a given equilibrium expression and λ het to be the Lyapunov exponent associated with a heterozygote of the same equilibrium expression and obtain Eq. 2 of the main text (with n = 1).