
Quantifying how constraints limit the diversity of viable routes to adaptation


Convergent adaptation occurs at the genome scale when independently evolving lineages use the same genes to respond to similar selection pressures. These patterns of genetic repeatability provide insights into the factors that facilitate or constrain the diversity of genetic responses that contribute to adaptive evolution. A first step in studying such factors is to quantify the observed amount of repeatability relative to expectations under a null hypothesis. Here, we formulate a novel index to quantify the constraints driving the observed amount of repeated adaptation in pairwise contrasts based on the hypergeometric distribution, and then generalize this for simultaneous analysis of multiple lineages. This index is explicitly based on the probability of observing a given amount of repeatability by chance under a given null hypothesis and is readily compared among different species and types of trait. We also formulate an index to quantify the effective proportion of genes in the genome that have the potential to contribute to adaptation. As an example of how these indices can be used to draw inferences, we assess the amount of repeatability observed in existing datasets on adaptation to stress in yeast and climate in conifers. This approach provides a method to test a wide range of hypotheses about how different kinds of factors can facilitate or constrain the diversity of genetic responses observed during adaptive evolution.

Author summary

How many ways can evolution solve the same adaptive problem? While convergent adaptation is evident in many organisms at the phenotypic level, we are only beginning to understand how commonly this convergence extends to the genome scale. Quantifying the repeatability of adaptation at the genome scale is therefore critical for assessing how constraints affect the diversity of viable genetic responses. Here, we develop probability-based indices to quantify the deviation between observed repeatability and expectations under a range of null hypotheses, and an estimator of the proportion of loci in the genome that can contribute to adaptation. We demonstrate the usage of these indices with individual-based simulations and example datasets from yeast and conifers and discuss how they differ from previously developed approaches to studying repeatability. Because these indices are unitless, they provide a general approach to quantifying and comparing how constraints drive convergence at the genome scale across a wide range of traits and taxa.


If different species encounter the same selection pressure, will adaptive responses occur via homologous genes or follow distinct genetic routes to the same phenotype? What factors limit the diversity of viable genetic routes to adaptation and how does variation translate into evolution? Empirical studies have identified different amounts of convergent adaptation at the genome scale across a range of species, traits, timescales, and levels of developmental-genetic hierarchy [1–3]. When evolution uses the same genes repeatedly to generate a given trait value, is this because of constraints acting on the genetic and developmental pathways limiting the production of variation (i.e., there is only a limited number of ways to generate a given trait value), or because of fitness constraints acting on genotypes that yield the same trait value (i.e., only some genotypes are selectively optimal)? Note that “constraint” in the evolutionary literature is commonly invoked to refer to factors that limit an adaptive phenotypic response in general (e.g., [4,5]). Here, we use it to refer to the factors that limit the diversity of genes used in independent bouts of adaptation and use the term “diversity constraint” hereafter for clarity.

As a case study to examine the differing types of diversity constraint, Mc1r provides perhaps the most well-known example of convergent local adaptation at the gene scale, and has been implicated in driving colour pattern variation in mice, lizards, mammoths, fish, and a range of other organisms [6–10]. Extensive studies in mice have revealed that over 50 genes can be mutated to give rise to variation in colour pattern [11], yet Mc1r consistently tends to be one of the main contributors to locally adapted colour polymorphisms. Mc1r has minimal pleiotropic side effects [8,11] and it can mutate to a similar trait value through numerous different changes in its protein sequence [10,11], and therefore may have a higher rate of mutation to beneficial alleles than other genes. As such, it seems to be driven by a combination of both types of constraint: more ways to mutate via Mc1r implies that developmental-genetic constraints limit the contribution of other genes, while limited pleiotropy in Mc1r implies that fitness costs constrain which genes can yield mutations that provide a viable route to adaptation.

Are the diversity constraints acting on melanism representative of the kinds of constraints that shape patterns of genome-scale convergence and non-convergence across the tree of life? To answer this question, it is necessary to quantify the extent of genome-scale convergence in a wide variety of organisms and traits and ascertain what kind of diversity constraints are operating, which requires the development of an appropriate statistical framework. To this end, it is helpful to frame the above questions based on the flexibility of the mapping from genotype to trait to fitness: does repeatability occur because of low redundancy in the mapping of genotype to trait (hereafter low GT-redundancy [4]: only a few ways to make the same trait value; Fig 1A), or because of low redundancy in the mapping of genotype to fitness (hereafter low GF-redundancy: only a subset of the genotypes yielding the same trait value are optimal; Fig 1B)?

Fig 1. Scenarios with low GT-redundancy and no additional GF-redundancy (A) or high GT-redundancy and low GF-redundancy (B) can both result in high repeatability of adaptation (adapted from [22]).

GT-redundancy is determined by two factors: 1) the difference between the number of genes that need to mutate to yield a given trait value and the number of genes that could mutate to give rise to variation in the trait, and 2) the extent to which different genes have interchangeable vs. uniquely important effects on the phenotype. High GT-redundancy means that many different combinations of alleles can yield the same trait value, so if all else is equal, then independent bouts of adaptation are likely to occur via different sets of mutations and repeatability will be low [12,13]. The standard quantitative genetic model implicitly assumes complete GT-redundancy with fully interchangeable allelic effects, while the recently proposed omnigenic model assumes high but incomplete redundancy, with “core” vs. “peripheral” genes having different potential to affect variation [14].

GF-redundancy is determined by differences in fitness among genotypes that produce the same trait value and can increase the diversity constraints driving repeatability above the level incurred by GT-redundancy. Such differences in fitness can occur when mutations cause correlated effects on other traits that also affect fitness, such that not all mutations are equally suitable for adaptation (i.e., pleiotropy), or when interactions among particular mutations have negative fitness effects, such that only particular combinations of mutations tend to contribute to adaptation (i.e., epistasis). It is also possible for effects on fitness to arise independent of a phenotypic effect, because architectures with different allele effect sizes and linkage relationships can have different fitness depending on the interaction between migration, selection, and drift. For example, if a given phenotype is coded by many small unlinked alleles, this architecture would be less fit than a similar phenotype coded by a few large or tightly linked alleles, in the context of migration-selection balance [15] or negative frequency dependence [16,17]. Similarly, the increased drift that occurs in small populations may prevent alleles of small effect from responding to natural selection [18,19], resulting in such genotypes being effectively neutral and therefore lower in realized fitness than those made up of large-effect alleles. In terms of standard models, polygenic models of directional selection (e.g., [20]) assume no GT- and GF-redundancy, while traditional quantitative genetic models of Gaussian stabilizing selection assume high GT- and GF-redundancy (e.g., [21]).

In addition to GT- and GF-redundancy, other factors also impact signatures of convergence, such as differences among genes in recombination rate or propensity to retain standing variation. There has been considerable discussion in the literature about the effects of these and other factors on convergence [3,23–30], and various indices have been previously used to quantify repeatability in empirical contexts (e.g. Jaccard index, Proportional Similarity; [1,2]). These existing indices provide a useful description of how often the same gene is used in adaptation, but as we will show below, they are not well suited to testing hypotheses that discriminate between these different kinds of constraint. They do not incorporate information about the genes that could contribute to adaptation but don’t, which is necessary to evaluate what kinds of diversity constraints are operating, and they are not explicitly tied to the probability of repeatability occurring under a null model.

Here, we develop novel statistical approaches for quantifying the diversity constraints that drive repeatability in genomic data from studies of local adaptation and experimental evolution. To study these constraints, we formulate an explicit probability-based representation of the deviation of observed repeatability from expectations under different null hypotheses. This approach can be used after standard tests have been applied to identify the putative genes driving adaptation and uses as input either binary categorization of genes as “adapted” or “non-adapted” or any continuous index representing the relative amount of evidence for a given gene being involved in adaptation (e.g. FST, p-values, Bayes factors). We begin by formulating an analytical model for a contrast of two lineages with binary data, and then generalize this model for contrasts of multiple lineages using either binary or continuous data. We also propose a novel index estimating the proportion of genes in the genome that can potentially give rise to adaptation. In all cases, these models can be used to successively test null hypotheses that incorporate different amounts of information about the constraints that could shape repeatability. The simplest null hypothesis is that there are no constraints and all genes have equal probability of contributing to adaptation. If more repeatability is observed than expected under this null model, then two inferences can be made: natural selection is driving patterns of convergence (implying that the observed signatures are not all false positives), and some diversity constraints are operating to increase the repeatability of adaptation. We then consider how other null hypotheses can be formulated to represent the various kinds of constraints discussed above. 
We focus mainly on the effect of low GT-redundancy, where the number of genes that could potentially contribute to adaptation is much smaller than the total number of genes in the genome, but also discuss how constraints arising from GF-redundancy, standing variation, or mutation rate could be modeled. Because this method quantifies repeatability in terms of probability-scaled deviations from expectations, it can be applied across any trait or species of interest, allowing contrasts to be made on the same scale of measurement.


Quantifying diversity constraints in pairwise contrasts

Suppose there are two lineages, x and y, that have recently undergone adaptation to a given selection pressure, resulting in convergent evolution of the same trait value within each lineage. This adaptation could be global, with new mutations fixed within lineages (e.g., in experimental evolution studies with multiple replicate populations), or local, with mutations contributing to divergence among populations within each lineage (e.g., in observational studies of natural adaptation to environmental gradients). In either case, we assume that adaptation can be reduced to a binary categorization of genes as “adapted” or “non-adapted” to represent which genes contribute to fitness differences (either relative to an ancestor or to a differently adapted population). We use the following notation to represent different properties of the genomic basis of trait variation: the number of loci in the genome of each species is $n_x$ and $n_y$, with $n_s$ orthologous loci shared by both species; the adaptive trait is controlled by $g_x$ and $g_y$ loci in each species, with $g_s$ shared loci (i.e., the loci in which mutations will give rise to phenotypic variation in the trait, hereafter the “mutational target”); of the $g$ loci that give rise to variation, only a subset have the potential to contribute to adaptation due to the combined effect of all constraints, represented by $g_{ax}$ and $g_{ay}$, with $g_{as}$ shared loci (the “effective adaptive target”); in a given bout of adaptation, the number of loci that contribute to adaptation in each lineage is $a_x$ and $a_y$, with $a_s$ orthologous loci contributing in both lineages. For simplicity, we assume that there is complete overlap in the genomes ($n_s = n_x = n_y$), mutational targets ($g_s = g_x = g_y$), and loci potentially contributing to adaptation ($g_{as} = g_{ax} = g_{ay}$) in both species (see supplementary materials and S1 Fig for set notation). 
These assumptions are most appropriate for lineages that are relatively recently diverged, where most orthologous genes are retained at the same copy number and the developmental-genetic program is relatively conserved, so that the same genes potentially give rise to variation in both lineages. Lineages separated by greater amounts of time would be expected to have reduced $n_s$ due to gene deletion, duplication, and pseudogenization in either lineage, and reduced $g_s$ and $g_{as}$ due to evolution and divergence of the developmental-genetic program, through sub- and neo-functionalization, and divergence in regulatory networks.
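As a minimal sketch of this notation, assuming the adapted genes of each lineage are available as sets of shared orthologue identifiers (so that the simplifying assumption of complete genome overlap holds), the pairwise quantities `a_x`, `a_y`, and `a_s` reduce to two set sizes and an intersection. The function name `overlap_counts` and the gene IDs are hypothetical.

```python
def overlap_counts(adapted_x: set, adapted_y: set) -> tuple[int, int, int]:
    """Return (a_x, a_y, a_s) for two lineages.

    Assumes both sets hold identifiers of shared orthologous genes,
    i.e. the simplifying case n_s = n_x = n_y described in the text.
    """
    a_x = len(adapted_x)              # loci contributing to adaptation in lineage x
    a_y = len(adapted_y)              # loci contributing to adaptation in lineage y
    a_s = len(adapted_x & adapted_y)  # orthologous loci contributing in both lineages
    return a_x, a_y, a_s


# Hypothetical example: gene IDs are illustrative only.
print(overlap_counts({"g01", "g02", "g05", "g09"}, {"g02", "g05", "g07"}))  # (4, 3, 2)
```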

Under the assumption that all $g_{as}$ genes have equal probability of contributing to adaptation (i.e., no diversity constraints are operating), the amount of overlap in the complement of genes that are adapted in both lineages ($a_s$) follows a hypergeometric distribution with expected overlap $\bar{a}_s = a_x a_y / g_{as}$ (e.g. [31]). In practice, we typically have little prior knowledge about which genes have the potential to contribute to either adaptation ($g_{as}$) or standing variation in the trait ($g_s$), but we can draw inferences about how these parameters constrain the diversity of adaptive responses by testing hypotheses and comparing the observed amount of overlap ($a_s$) to the amount expected under a given null hypothesis ($\bar{a}_s$). To test different hypotheses about how diversity constraints give rise to repeated adaptation, we represent the total number of genes included in the test set as $g_0$. The simplest null hypothesis is that there are no diversity constraints and all genes potentially give rise to variation and contribute to adaptation ($g_0 = g_{as} = g_s = n_s$), so by rejecting this null, we can infer that $g_{as} < n_s$, and calculate an effect size that represents the magnitude by which all types of constraints contribute to repeatability (see Eq 1, below). Note that the null hypothesis could also be falsified in the opposite direction, with less overlap in the loci contributing than expected under the null, which might occur if evolution had proceeded towards a different optimum in each lineage. Without independent lines of evidence about which genes potentially contribute to variation in the trait ($g_s$), it is not possible to evaluate the relative importance of GT- vs. GF-redundancy using the framework here. 
In model systems where independent information is available for the magnitude of $g_s$ (based on mutation accumulation or GWAS; see Discussion), a more refined null hypothesis can be tested, where $g_0 = g_s$, allowing some inferences to be made about the relative importance of GT- and GF-redundancy (Table 1). By rejecting this null, we can infer that $g_{as} < g_s$, which could occur due to low GF-redundancy or differences among genes in mutation rate or standing variation. Alternatively, if we fail to reject this null hypothesis, then it suggests that $g_s \approx g_{as}$, which would imply that GF-redundancy makes no additional contribution to repeatability beyond the contribution of GT-redundancy. We can also reverse the direction of inquiry and estimate $g_{as}$ directly from the data by calculating $\hat{g}_{as} = a_x a_y / a_s$, such that an index representing the effective proportion of the genome that can potentially contribute to adaptation can be calculated as $\hat{P}_a = \hat{g}_{as} / n_s$.
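The expectation `a_x * a_y / g0` and its inversion can be checked numerically. The sketch below assumes the simplest null (every one of `g0` genes equally likely to be used), compares the analytical mean overlap with a Monte Carlo average of random draws, and then inverts the expectation to estimate the effective adaptive target and its proportion of the `n_s` shared genes. All function names and numbers are hypothetical illustrations, not the authors' scripts.

```python
import random


def expected_overlap(a_x: int, a_y: int, g0: int) -> float:
    """Hypergeometric mean overlap when all g0 genes are equally likely to be used."""
    return a_x * a_y / g0


def simulated_mean_overlap(a_x: int, a_y: int, g0: int, n_rep: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo check: draw a_x and a_y genes at random from g0 and average the overlap."""
    rng = random.Random(seed)
    genes = range(g0)
    total = 0
    for _ in range(n_rep):
        total += len(set(rng.sample(genes, a_x)) & set(rng.sample(genes, a_y)))
    return total / n_rep


def estimate_g_as(a_x: int, a_y: int, a_s: int) -> float:
    """Invert the expectation: g_as_hat = a_x * a_y / a_s (undefined when a_s == 0)."""
    if a_s == 0:
        raise ValueError("g_as_hat is undefined when no loci are shared (a_s = 0)")
    return a_x * a_y / a_s


def estimate_p_a(a_x: int, a_y: int, a_s: int, n_s: int) -> float:
    """Effective proportion of the n_s shared genes that can contribute to adaptation."""
    return estimate_g_as(a_x, a_y, a_s) / n_s


print(expected_overlap(10, 20, 200))        # 1.0
print(simulated_mean_overlap(10, 20, 200))  # close to 1.0
print(estimate_p_a(20, 30, 6, 10_000))      # 0.01
```

The `a_s == 0` guard mirrors the limitation noted later in the text: the pairwise estimator breaks down when two lineages share no adapted loci.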

Table 1. Drawing inferences about the nature of constraints to diversity that drive repeatability.

For any value of $g_0$, an effect size representing the excess in overlap due to convergence relative to the null hypothesis can be expressed by standardizing the observed overlap: subtracting the mean ($\bar{a}_s = a_x a_y / g_0$) and dividing by the standard deviation of the hypergeometric distribution:

$$C_{\mathrm{hyper}} = \frac{a_s - \bar{a}_s}{\sqrt{\dfrac{a_x a_y (g_0 - a_x)(g_0 - a_y)}{g_0^2 (g_0 - 1)}}} \qquad (1)$$

This index provides a quantitative representation of how much more overlap occurs than expected under the null hypothesis, scaled by how much a given bout of evolution would be expected to deviate from this expectation if the null hypothesis were true. Similarly, the exact probability of observing $a_s$ or more shared loci contributing to adaptation can be calculated from the hypergeometric distribution (see Supplementary Information for sample R-script), which provides a p-value.
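A minimal Python sketch of Eq 1 and the exact upper-tail p-value is given below, using only the standard library. It treats the `a_x` adapted genes of lineage x as the "successes" in a population of `g0` genes and the `a_y` adapted genes of lineage y as the draws; the function names `c_hyper` and `p_hyper` are hypothetical and this is not the authors' R script. The example values are illustrative.

```python
from math import comb, sqrt


def c_hyper(a_x: int, a_y: int, a_s: int, g0: int) -> float:
    """Eq 1: observed overlap standardized by the hypergeometric mean and SD (requires g0 > 1)."""
    mean = a_x * a_y / g0
    var = a_x * a_y * (g0 - a_x) * (g0 - a_y) / (g0 ** 2 * (g0 - 1))
    return (a_s - mean) / sqrt(var)


def p_hyper(a_x: int, a_y: int, a_s: int, g0: int) -> float:
    """Exact upper-tail probability P(overlap >= a_s) under the hypergeometric null."""
    # Sum the hypergeometric probability mass from a_s up to the largest possible overlap.
    tail = sum(
        comb(a_x, k) * comb(g0 - a_x, a_y - k)
        for k in range(a_s, min(a_x, a_y) + 1)
    )
    return tail / comb(g0, a_y)


# Hypothetical pairwise contrast: 10 and 20 adapted genes out of 200, 5 shared.
print(round(c_hyper(10, 20, 5, 200), 2))  # 4.32
print(p_hyper(10, 20, 5, 200))
```

Working with exact integer binomial coefficients keeps the tail sum free of rounding until the final division, so `p_hyper(..., a_s=0, ...)` returns exactly 1.0.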

Quantifying diversity constraints in multiple lineages

While pairwise contrasts are the most straightforward statistically, they have considerably lower power than comparisons among multiple lineages. If one gene (such as Mc1r) tends to drive adaptation repeatedly in a large number of lineages, this may go undetected in an approach using multiple pairwise comparisons but would be readily detected in a simultaneous comparison of multiple lineages. Unfortunately, while the hypergeometric distribution provides an exact analytical prediction for the amount of overlap in a pairwise comparison, which can be used to calculate a p-value and the probability-based effect size (Chyper), it cannot easily be generalized to simultaneously analyze multiple lineages. While it is possible to conduct pairwise analyses and average the results across multiple comparisons, p-values from this approach might fail to detect cases where a single gene contributes repeatedly to adaptation in more than two lineages, as information does not transfer among the pairwise comparisons. We now develop an alternative, approximate approach to assess repeatability in multiple lineages by calculating Pearson’s χ2 goodness-of-fit statistic and comparing this to a null distribution of χ2 statistics simulated under the null hypothesis, calculating a p-value as the proportion of replicates in the null that exceed the observed test statistic. The p-value obtained by this approach represents the probability of observing a test statistic as extreme or more extreme under the null hypothesis, considering all lineages simultaneously. While the p-value is calculated from simultaneous analysis of all lineages, the effect size is instead calculated as an average across all pairwise comparisons among the k replicate lineages, because this represents the increase in repeatability relative to expectations under the null for a given bout of adaptation in any single lineage. 
This difference is important because the effect size should not depend on sampling effort in terms of the number of lineages, while the p-value should reflect the statistical power gained from multiple lineages.

Consider the case where $g_0$ genes can potentially contribute to adaptation in the given trait and each lineage has some complement of genes that have mutated to drive adaptation, with $\alpha_{i,j}$ representing the binary score for gene $i$ in lineage $j$ (1 = adapted, 0 = non-adapted). The sum for gene $i$ across all lineages provides the observed counts ($o_i = \sum_j \alpha_{i,j}$), while the expected counts ($e_i$) can be set based on the null hypothesis being tested. Under null hypotheses where all genes in $g_0$ have equal probability of contributing to adaptation, the expected counts are equal to the mean of the observed counts ($e = \sum_i o_i / g_0$), and Pearson’s χ2 statistic can be calculated in the usual way: $\chi^2 = \sum_i (o_i - e)^2 / e$. Under ideal conditions, Pearson’s χ2 would approximate the analytical χ2 distribution, with mean and standard deviation equal to the degrees of freedom ($df$) and $\sqrt{2\,df}$, respectively. While this could be used to make an analytical hypothesis test (as above), in practice there will often be large deviations between Pearson’s χ2 and the analytical distribution, because its assumptions are violated when expected counts are low (see Supplementary Materials, S2 Fig). Instead, we simulate a null distribution of χ2 values under the null hypothesis by permuting genes within each lineage and recalculating χ2 for each replicate. The p-value is then equal to the proportion of null χ2 values that exceed the observed χ2 (using all lineages simultaneously), while the effect size is calculated as the mean C-score across all pairwise contrasts (simulating a separate null distribution, $\chi^2_{0,jl}$, for each pairwise contrast of lineages $j$ and $l$):

$$C_{\mathrm{chisq}} = \frac{2}{k(k-1)} \sum_{j<l} \frac{\chi^2_{jl} - \overline{\chi^2_{0,jl}}}{\mathrm{sd}\left(\chi^2_{0,jl}\right)} \qquad (2)$$

The magnitude of Cchisq therefore represents deviation between the observed amount of repeatability and that expected under the null hypothesis, which will vary as a function of the diversity constraints affecting the trait evolution, but not the number of lineages being compared. While Cchisq relies upon simulation of a null distribution, it can be calculated relatively quickly. Importantly, the magnitude of Cchisq varies linearly with Chyper (Fig 2A & 2B), showing that it represents the extent of diversity constraints in the same way as the analytically precise Chyper. While this approach provides a more accurate p-value for comparisons of multiple lineages, there is no particular reason to use Cchisq rather than Chyper for binary input data, as both effect sizes are calculated on a pairwise basis. The main reason that we develop this approach is to extend it to continuously distributed data, which can allow greater sensitivity and avoid arbitrary choices necessary to categorize the commonly used indices of local adaptation (e.g. FST or p-values) into “adapted” or “non-adapted”.
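The permutation procedure behind the multi-lineage p-value and Eq 2 can be sketched as follows, assuming the data are a genes-by-lineages matrix of 0/1 scores and that each lineage's column is shuffled independently (which preserves each lineage's number of adapted genes). Null values equal to the observed statistic are counted as "as extreme", following the wording used for the p-value above. The function names, the 1000-permutation default, and the toy matrix are hypothetical choices, not the authors' implementation.

```python
import random
from itertools import combinations
from statistics import mean, stdev


def pearson_chisq(matrix):
    """Pearson's chi^2 on per-gene totals; matrix[i][j] is the score of gene i in lineage j."""
    totals = [sum(row) for row in matrix]  # o_i = sum_j alpha_ij
    e = sum(totals) / len(totals)          # equal-probability null: e = sum_i o_i / g0
    if e == 0:
        raise ValueError("no adapted genes in any lineage")
    return sum((o - e) ** 2 / e for o in totals)


def permute_within_lineages(matrix, rng):
    """Shuffle each lineage's column independently, keeping its number of adapted genes."""
    columns = [list(col) for col in zip(*matrix)]
    for col in columns:
        rng.shuffle(col)
    return [list(row) for row in zip(*columns)]


def null_distribution(matrix, n_perm, rng):
    return [pearson_chisq(permute_within_lineages(matrix, rng)) for _ in range(n_perm)]


def chisq_p_value(matrix, n_perm=1000, seed=1):
    """Proportion of permuted chi^2 values at least as large as the observed one (all lineages)."""
    rng = random.Random(seed)
    observed = pearson_chisq(matrix)
    null = null_distribution(matrix, n_perm, rng)
    return sum(value >= observed for value in null) / n_perm


def c_chisq(matrix, n_perm=1000, seed=1):
    """Eq 2: mean standardized chi^2 over all pairwise contrasts of lineages."""
    rng = random.Random(seed)
    k = len(matrix[0])
    scores = []
    for j, l in combinations(range(k), 2):
        pair = [[row[j], row[l]] for row in matrix]
        null = null_distribution(pair, n_perm, rng)  # separate null for each pair
        scores.append((pearson_chisq(pair) - mean(null)) / stdev(null))
    return mean(scores)


# Hypothetical data: 50 genes x 4 lineages, with genes 0-4 adapted in every lineage.
data = [[1, 1, 1, 1] if i < 5 else [0, 0, 0, 0] for i in range(50)]
print(chisq_p_value(data), round(c_chisq(data), 1))
```

Because the p-value uses one permutation null over all lineages while the effect size averages pairwise C-scores, adding lineages sharpens the p-value without inflating Cchisq, which is the distinction the text draws between the two quantities.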

Fig 2. Cchisq and Chyper provide approximately equal estimates of the magnitude of the diversity constraints driving repeatability, while $\hat{P}_a$ provides an estimate of the proportion of all genes that could potentially contribute to adaptation, which is not collinear with the C-scores.

Plots show values calculated for simulated datasets generated by randomly drawing two arrays with $g_s$ genes, with $a_i$ loci adapted in one array and $a_i + 20$ in the other, and then sorting a proportion of the rows in each array to artificially generate more repeatability than would occur by chance (with a different proportion sorted in each replicate). In panels A & C, $g_s = 200$; in panels B & D, $a_i = 10$; $\hat{P}_a$ was calculated using Eq 4.

Quantifying diversity constraints with continuous data

In many empirical contexts, genome scans for selection yield continuously distributed scores representing the strength of evidence for each locus contributing to adaptation (e.g., FST, p-values, Bayes factors). Using the same notation as above, but with $\alpha_{i,j}$ representing the continuous score for the $i$th gene in the $j$th lineage, the total score for each gene can be calculated as a sum across lineages, $\alpha_i = \sum_j \alpha_{i,j}$, while the mean total score per gene (summed over lineages and averaged over genes) is $\bar{\alpha} = \sum_i \alpha_i / g_0$. A statistic analogous to the above χ2 can then be calculated as $\chi^2 = \sum_i (\alpha_i - \bar{\alpha})^2 / \bar{\alpha}$, and the same approach for calculating the null distribution of this statistic can then be used to calculate Cchisq according to Eq 2.
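A sketch of the continuous-score statistic for a single pair of lineages is shown below. It assumes the analogue takes the same form as Pearson's χ2, with per-gene totals compared to their mean over genes, and uses within-lineage shuffling for the null as in the binary case. The names `continuous_chisq` and `pairwise_c_score` and the toy scores are hypothetical.

```python
import random
from statistics import mean, stdev


def continuous_chisq(scores):
    """Chi^2 analogue on continuous scores; scores[i][j] is alpha_ij for gene i, lineage j."""
    totals = [sum(row) for row in scores]  # alpha_i: total score of gene i across lineages
    alpha_bar = sum(totals) / len(totals)  # mean total score per gene
    if alpha_bar == 0:
        raise ValueError("all scores are zero")
    return sum((a - alpha_bar) ** 2 / alpha_bar for a in totals)


def pairwise_c_score(x, y, n_perm=1000, seed=1):
    """Standardized deviation of the observed statistic from a within-lineage permutation null."""
    rng = random.Random(seed)
    observed = continuous_chisq(list(zip(x, y)))
    null = []
    for _ in range(n_perm):
        xs, ys = list(x), list(y)
        rng.shuffle(xs)  # permute genes within lineage x
        rng.shuffle(ys)  # permute genes within lineage y
        null.append(continuous_chisq(list(zip(xs, ys))))
    return (observed - mean(null)) / stdev(null)


# Hypothetical standardized scores for 50 genes in two lineages with identical rankings.
x = [i / 49 for i in range(50)]
print(round(pairwise_c_score(x, x), 1))
```

With 0/1 input, `continuous_chisq` gives the same value as Pearson's χ2 on the binary counts, which is why the binary and continuous versions of Cchisq can share one permutation scheme.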

With continuous data, additional complexities arise depending on the distribution of the particular dataset being used and how its magnitude represents evidence for a gene’s involvement in adaptation. One approach, which we used in all examples here, is to transform data so that values scale positively and approximately linearly with the weight of evidence for adaptation, by standardizing data within each lineage: subtracting the observed within-lineage minimum and dividing by the resulting maximum (i.e., the within-lineage range), such that the values within each lineage are bounded from 0 to 1. This reduces differences among lineages in the absolute magnitude of indices representing adaptation, which can be desirable when they vary across many orders of magnitude (e.g. p-values from GWAS of $10^{-10}$ and $10^{-20}$ both provide strong evidence of adaptation). However, if some lineages actually have stronger signatures of adaptation at more loci, then this kind of standardization should not be used, as it would obscure these true differences among lineages. In this case, it would be preferable to apply the same standardization to all lineages, using the minimum and maximum values observed across all lineages. While Pearson’s χ2 statistic was designed for discrete data, the above approach using continuous data represents the variability among lineages in the same way, as a variance among genes in the sum of their scores representing putative adaptation. The Cchisq statistic on continuous data behaves similarly to the Chyper statistic across wide ranges of parameter space, as both are formulated in terms of deviations from the null distribution (see below).
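The two standardization options can be sketched as min-max scaling, either per lineage or with global bounds. Because p-values shrink as evidence grows, the example first converts them with −log10(p); that transform is an assumption made for illustration here, not a choice specified in the text, and the function names and p-values are hypothetical.

```python
import math


def standardize_within(scores):
    """Min-max scale one lineage's scores to [0, 1] using that lineage's own min and max."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        raise ValueError("cannot standardize a lineage whose scores are all equal")
    return [(s - lo) / (hi - lo) for s in scores]


def standardize_across(lineages):
    """Min-max scale every lineage with the global min and max, preserving between-lineage differences."""
    lo = min(min(scores) for scores in lineages)
    hi = max(max(scores) for scores in lineages)
    if hi == lo:
        raise ValueError("cannot standardize when all scores are equal")
    return [[(s - lo) / (hi - lo) for s in scores] for scores in lineages]


# Hypothetical p-values for two lineages; -log10(p) makes larger values mean stronger evidence.
lineage_a = [-math.log10(p) for p in (1e-10, 1e-3, 0.5, 1.0)]
lineage_b = [-math.log10(p) for p in (1e-20, 1e-6, 0.5, 1.0)]
print(standardize_within(lineage_a)[0], standardize_within(lineage_b)[0])  # 1.0 1.0
print(standardize_across([lineage_a, lineage_b])[0][0])  # about 0.5
```

Within-lineage scaling gives the strongest locus in each lineage the same top score, whereas global scaling keeps lineage a's strongest signal at roughly half of lineage b's.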

What proportion of the genome can potentially contribute to adaptation?

While the number of genes that potentially contribute to adaptation ($g_{as}$) can be estimated from the hypergeometric expectation as $\hat{g}_{as} = a_x a_y / a_s$, it is difficult to apply this to comparisons of multiple lineages, as some pairwise contrasts may have no overlap in the genes contributing to adaptation ($a_s = 0$), making the estimate undefined. To estimate $\hat{g}_{as}$ from all lineages simultaneously, we can instead formulate a likelihood-based approach where the probability that we observe locus $i$ adapted in $o_i$ lineages is:

$$P(o_i) = \begin{cases} P_a \, \mathrm{Bin}(k, \bar{o}, 0) + (1 - P_a), & o_i = 0 \\ P_a \, \mathrm{Bin}(k, \bar{o}, o_i), & o_i > 0 \end{cases} \qquad (3)$$

where $\mathrm{Bin}(n, y, x)$ is the probability under the binomial distribution of getting $x$ successes in $n$ trials, each with probability $y$. As above, $o_i$ is the number of the $k$ lineages in which gene $i$ is adapted ($o_i = \sum_j \alpha_{i,j}$), $P_a$ is the proportion of $g_0$ that can actually contribute to adaptation ($P_a = g_{as}/n_s$), and $\bar{o}$ is the probability of each of these genes contributing to adaptation in a given lineage, $\bar{o} = \sum_i o_i / (g_{as} k)$. The estimate $\hat{g}_{as}$ is then the value at which the log-likelihood function

$$\ln L = \sum_{i=1}^{g_0} \ln P(o_i) \qquad (4)$$

is maximized. Once the maximum-likelihood value $\hat{g}_{as}$ is estimated, it can be expressed either as an absolute number representing the effective number of genes that can contribute to adaptation or as a proportion of the total number of shared genes in the genome: $\hat{P}_a = \hat{g}_{as}/n_s$. This approach implicitly assumes that all genes that have the potential to contribute to adaptation ($g_{as}$) have approximately equal probabilities of actually contributing to adaptation. In very extreme cases, such as when one gene is very highly repeatable while other genes contribute to adaptation in only a single lineage, $\hat{P}_a$ will tend to represent the contribution of the repeatable genes and discount the contribution of the idiosyncratic genes (see Supplementary Materials). Multi-class models could be developed to estimate $g_{as}$ for different classes of genes in such scenarios by accounting for their different probabilities of contributing to adaptation (See for scripts containing functions for the above calculations).
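The likelihood can be maximized numerically. The sketch below assumes a zero-inflated binomial form, in which genes outside the effective adaptive target can only be observed with zero adapted lineages, as our reading of the definitions of P_a, ō, and Bin(n, y, x) given in the text. It also assumes g0 = n_s and uses a simple grid search over P_a. The function names, grid resolution, and toy counts are hypothetical, and this is not the authors' script.

```python
from collections import Counter
from math import comb, log


def binom_pmf(n: int, p: float, x: int) -> float:
    """Bin(n, p, x): probability of x successes in n trials with success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def log_likelihood(p_a: float, counts: list[int], k: int) -> float:
    """Log-likelihood of P_a; counts[i] = o_i, the number of lineages where gene i is adapted."""
    g0 = len(counts)  # assumes g0 = n_s
    g_as = p_a * g0
    o_bar = sum(counts) / (g_as * k)
    if o_bar > 1:
        return float("-inf")  # too few potentially adaptive genes to explain the observed total
    ll = 0.0
    for o, n_genes in Counter(counts).items():  # group genes sharing the same o_i
        prob = p_a * binom_pmf(k, o_bar, o)
        if o == 0:
            prob += 1 - p_a  # genes outside g_as can only be observed with o_i = 0
        if prob <= 0:
            return float("-inf")
        ll += n_genes * log(prob)
    return ll


def estimate_p_a(counts: list[int], k: int, grid: int = 1000) -> float:
    """Maximize the log-likelihood over a grid of P_a values in (0, 1]."""
    candidates = [i / grid for i in range(1, grid + 1)]
    return max(candidates, key=lambda p_a: log_likelihood(p_a, counts, k))


# Hypothetical data: 1000 genes, 10 lineages; 100 genes adapted in 3 lineages each, the rest never.
counts = [3] * 100 + [0] * 900
p_hat = estimate_p_a(counts, k=10)
print(p_hat, p_hat * len(counts))  # P_a_hat and the implied g_as_hat
```

The estimate comes out slightly above 0.1 in this toy case, because the model allows some potentially adaptive genes to go unused in all k lineages by chance.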


Comparison between indices for quantifying convergence

The Chyper, Cchisq, and $\hat{P}_a$ estimators capture different aspects of the biology underlying convergence than other previously used estimators of repeatability. To estimate the repeatability of evolution, Conte et al. [1] used the additive and multiplicative Proportional Similarity (PSadd and PSmult) indices of [32] in a meta-analysis of QTL and candidate gene studies, while Bailey et al. [2] used the Jaccard index to quantify patterns in bacterial evolution experiments. The PS indices are defined as $PS_{add} = \sum_i \min(\alpha_{ix}, \alpha_{iy})$ and , where $\alpha_{ix}$ and $\alpha_{iy}$ are the relative contributions of gene $i$ to adaptation in lineages x and y [33], while the Jaccard index is defined as $J = |A_x \cap A_y| / |A_x \cup A_y|$, where $A_x$ and $A_y$ are the sets of adapted genes in each lineage [2]. Both of these indices standardize the number of overlapping adapted loci by the total number of adapted loci, and neither includes information about non-adapted genes that could potentially have contributed to adaptation.
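For comparison, the Jaccard index and PSadd can be written in a few lines. The sketch assumes that, for binary data, each adapted gene in a lineage receives an equal relative contribution of 1/a, which is a hypothetical convention for the helper `binary_contributions`, not necessarily the one used in [1,33]. Note that neither function needs the total number of genes that could have contributed.

```python
def jaccard(adapted_x: set, adapted_y: set) -> float:
    """J = |A_x ∩ A_y| / |A_x ∪ A_y|; non-adapted genes never enter the calculation."""
    union = adapted_x | adapted_y
    return len(adapted_x & adapted_y) / len(union) if union else 0.0


def ps_add(contrib_x, contrib_y) -> float:
    """PS_add = sum_i min(alpha_ix, alpha_iy) for per-gene relative contributions."""
    return sum(min(a, b) for a, b in zip(contrib_x, contrib_y))


def binary_contributions(adapted: set, genes: list) -> list:
    """Hypothetical helper: equal relative contribution 1/a for each adapted gene, 0 otherwise."""
    return [1 / len(adapted) if g in adapted else 0.0 for g in genes]


genes = ["g1", "g2", "g3", "g4"]
ax, ay = {"g1", "g2"}, {"g2", "g3"}
print(jaccard(ax, ay))  # 0.3333333333333333
print(ps_add(binary_contributions(ax, genes), binary_contributions(ay, genes)))  # 0.5
```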

To illustrate the differences between these various indices of convergence, we generated four example datasets showing either randomly drawn complements of genes with adapted mutations (Fig 3A) or highly convergent datasets drawn from a smaller (Fig 3B) or larger (Fig 3C & 3D) pool of genes that potentially contribute to trait variation ($g_s$), with differing numbers of loci contributing to adaptation. Scenario C is the most constrained, as it exhibits the same amount of overlap as B, but this overlap is drawn from a larger pool of genes, so it is less likely to occur by chance. While neither the Jaccard index nor the PS indices distinguish among scenarios B, C, and D (as the same proportions of genes are being used for adaptation, so repeatability is the same), both the Cchisq and Chyper indices show the highest scores for scenario C, because it has the smallest probability of occurring by chance if all genes had equal probabilities of contributing to adaptation. The $\hat{P}_a$ index also identifies scenario C as the most constrained, in terms of the smallest proportion of genes potentially contributing to adaptation. The $\hat{P}_a$ index also shows that this proportion is equal for scenarios B & D, despite differences in the probability of the observed repeatabilities occurring by chance (as per the C-scores). More generally, while $\hat{P}_a$ tends to decrease with increasing C-score, these indices differ in magnitude (Fig 2C & 2D), as they represent different aspects of diversity constraints. In summary, the Jaccard and PS indices quantify the proportion of genes used for adaptation that are used repeatedly, the C-score indices increase as the probability of the observed repeatability occurring without constraints decreases, and $\hat{P}_a$ represents the proportion of genes in the genome that are available for adaptation, given the existing diversity constraints (also see S3 Fig for further comparisons).
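The contrast between scenarios B and C can be reproduced with hypothetical numbers (not the Fig 3 data): two lineages each with 10 adapted genes, 5 of them shared, drawn from a pool of either 40 or 200 genes. The Jaccard index depends only on the adapted genes and is identical in both cases, whereas Chyper (Eq 1) grows with the pool size because the same overlap becomes less likely by chance. Function names are illustrative.

```python
from math import sqrt


def jaccard_from_counts(a_x: int, a_y: int, a_s: int) -> float:
    """Jaccard index from counts: shared / (union of adapted genes)."""
    return a_s / (a_x + a_y - a_s)


def c_hyper(a_x: int, a_y: int, a_s: int, g0: int) -> float:
    """Observed overlap standardized by the hypergeometric mean and SD."""
    mean = a_x * a_y / g0
    var = a_x * a_y * (g0 - a_x) * (g0 - a_y) / (g0 ** 2 * (g0 - 1))
    return (a_s - mean) / sqrt(var)


# Hypothetical: 10 adapted genes per lineage, 5 shared, drawn from pools of different sizes.
for g0 in (40, 200):
    print(g0, round(jaccard_from_counts(10, 10, 5), 3), round(c_hyper(10, 10, 5, g0), 2))
```

Both rows report the same Jaccard value (0.333), while Chyper rises from about 2.08 to about 6.68.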

Fig 3. Four example datasets showing different levels of convergent adaptation and a comparison of different indices assessing overlap among adapted genes.

Scenario A is unconstrained and exactly equal to the mean expectation under a random draw; scenarios B & C show the same amount of overlap (as) and number of adaptively mutated genes (ai), but scenario C is drawn from a larger number of potential genes (gs). Scenario D has the same proportion of overlap as B & C, but twice as many adapted genes.

Simulating convergence using individual-based simulations

To further explore the effect of population genetic parameters on the behaviour of the above indices of repeatability and constraint, we used Nemo (v2.3.45; [34]) to simulate two scenarios of a two-patch model under migration-selection balance: (i) a constant size of mutational target with variable proportions of small- and large-effect loci; and (ii) a constant number of large-effect loci and a variable number of small-effect loci, resulting in a variable size of mutational target. For scenario (i), simulations had $n = g_s = 100$ loci, of which u loci had alleles of size +/- 0.1, while (100 − u) loci had alleles of size +/- 0.01 (with subsequent mutations causing the allele sign to flip from positive to negative or the reverse). For scenario (ii), simulations had 10 large-effect loci with alleles of size +/- 0.1 and v small-effect loci with alleles of size +/- 0.01, resulting in a variable size of mutational target. In all simulations, migration rate was set to 0.005 and the strength of quadratic phenotypic selection was 0.5, so that an individual perfectly adapted to one patch would suffer a fitness cost of 0.5 in the other patch (patch optima were +/- 1; similar to [13]). Simulations were run for 50,000 generations and censused every 100 generations. For binary categorization of the input data, loci were considered to be “adapted” if FST > 0.1 for >80% of the last 25 census points (these cut-offs are somewhat arbitrary, but qualitative patterns were comparable under different cut-offs); for continuous input data, raw FST values were used. Results are averaged across 20 runs, each with 20 replicates, with Cchisq calculated across the 20 replicates within each run.
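The binary "adapted" call used for the simulations can be sketched as a small rule over each locus's FST census series. The threshold (0.1), window (last 25 census points), and strict ">80%" criterion come from the text; the function name, argument names, and example trajectories are hypothetical.

```python
def is_adapted(fst_census, threshold=0.1, last_n=25, min_fraction=0.8):
    """Return True if FST exceeds `threshold` in more than `min_fraction` of the last `last_n` census points."""
    if len(fst_census) < last_n:
        raise ValueError("need at least last_n census points")
    recent = fst_census[-last_n:]
    fraction = sum(f > threshold for f in recent) / last_n
    return fraction > min_fraction


# Hypothetical FST trajectories: 50,000 generations censused every 100 -> 500 points.
locus_a = [0.05] * 479 + [0.2] * 21  # 21 of the last 25 points above 0.1
locus_b = [0.05] * 480 + [0.2] * 20  # 20 of the last 25 points above 0.1
print(is_adapted(locus_a), is_adapted(locus_b))  # True False
```

Because the criterion is a strict "more than 80%", a locus above the threshold in exactly 20 of 25 census points is not called adapted.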

These scenarios further illustrate the difference between the Jaccard and PSadd indices of repeatability on the one hand, and the C-score indices of constraint and the proportion of adaptation-effective loci on the other. In both scenarios, small-effect loci tend to contribute little to adaptation because large-effect loci are more strongly favoured under migration-selection balance [35], which results in low GF-redundancy. In scenario (i), all indices show qualitatively similar patterns: as the number of large-effect loci increases, GT- and GF-redundancy increase, constraints weaken, and repeatability declines (Fig 4A). By contrast, in scenario (ii), the Jaccard and PSadd indices indicate roughly the same amount of repeatability regardless of the number of small-effect loci and the total size of the mutational target (Fig 4B). Over this same range of parameter space, however, the C-score indices show that constraint increases as the total mutational target grows. This occurs because, although the number of potential routes to an adaptive phenotype increases with the number of small-effect loci, only the same small set of loci (the large-effect loci) is actually involved in adaptation, as reflected by the decline in the proportion of adaptation-effective loci. In other words, there are many potential genetic routes to adaptation involving the small-effect loci (high GT-redundancy), but the large-effect loci tend to be favoured and repeatedly involved in adaptation (low GF-redundancy). Thus, as the size of the mutational target increases in scenario (ii), repeatability stays about the same (Jaccard and PSadd) but constraint is higher (C-scores), because a smaller proportion of the available routes to adaptation is being used.
The continuous and binary Cchisq indices are broadly similar across these parameters because there is very little variation in FST among loci within the same size class (see Supplementary Materials for additional simulations under varying allele effect sizes).
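To make the contrast in scenario (ii) concrete, the sketch below holds the numbers of adapted genes and their overlap fixed while enlarging the pool of candidate genes. It assumes that Chyper (Eq 1) is the observed overlap minus its hypergeometric mean, divided by the hypergeometric standard deviation, as described in the comparison with the Raup-Crick index; this form is consistent with the value Chyper = 5.6 given in the conifer example. The counts are illustrative, not simulation output, and the function names are hypothetical.

```python
import math


def chyper(g0, ax, ay, a_s):
    """Overlap a_s in standard deviations above its expectation when ax and
    ay adapted genes are drawn independently from g0 candidate genes."""
    mean = ax * ay / g0
    var = ax * (ay / g0) * (1 - ay / g0) * (g0 - ax) / (g0 - 1)
    return (a_s - mean) / math.sqrt(var)


def jaccard(ax, ay, a_s):
    """Shared adapted genes divided by the union of adapted genes."""
    return a_s / (ax + ay - a_s)


# Two lineages each adapt via 10 genes, 8 of them shared; only the size of
# the candidate pool g0 changes, mimicking the addition of small-effect loci.
for g0 in (20, 50, 100, 200):
    print(g0, round(jaccard(10, 10, 8), 3), round(chyper(g0, 10, 10, 8), 2))
```

The Jaccard index stays at 0.667 for every g0, whereas Chyper rises from about 2.6 at g0 = 20 to about 11 at g0 = 200, because the same overlap becomes increasingly unlikely by chance as the candidate pool grows.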

Fig 4. C-score indices of constraint are qualitatively similar to Jaccard and PSadd indices of repeatability when simulations have a constant size of mutational target (A), but differ when simulations vary in the size of mutational target (B).

The proportion of adaptation-effective loci shows qualitatively similar patterns to the C-scores, with a smaller proportion of the genome accessible to adaptation in scenarios with higher C-scores and stronger constraints. In panel A, all runs have ns = gs = 100 loci, with u large-effect loci and (100 − u) small-effect loci. In panel B, there are 10 large-effect loci and v small-effect loci. In both scenarios, simulations were run with N = 10,000 individuals in each patch, a recombination rate of r = 0.5 between loci, and a per-locus mutation rate of 10⁻⁵. The calculation of Chyper is based on categorizing genes as adapted when FST > 0.1, while the calculation of Cchisq is based on FST standardized within each lineage by subtracting the minimum value and dividing by the maximum.

Adjusting for incomplete sampling of the genome

The amount of constraint quantified by the C-score depends on the proportion of the mutational target (gs) that is sampled by the sequencing approach, which should be proportional to the proportion of all genes in the genome (ns) that is sampled. Some approaches, such as targeted sequence capture, sample only a subset of the genes in the genome, which biases the estimation of constraint even if the genes included are a random subset of gs. This is most clearly seen in the calculation of Chyper: multiplying all the variables in Eq 1 by a common factor changes the magnitude of the effect size. By contrast, the Jaccard and PS measures of repeatability are not affected by incomplete sampling. If binary input data are used and the proportion of gs that has been sampled (q) can be accurately estimated, then Chyper can be corrected by dividing all input variables by q prior to calculation, yielding a corrected score Chyper-adj. If continuously distributed input data are used, then the dataset can be adjusted by adding g0(1 − q) new entries, sampled with replacement from the genes in the existing dataset, and then applying Eq 2 to this extended set.
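Both corrections can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: the helper names are hypothetical, q is assumed to be known, g0 is taken to be the full (unsampled) candidate pool so that the padding step adds g0(1 − q) genes as described above, and each gene is represented as a tuple of scores (one per lineage) so that resampling keeps a gene's scores together. Eq 2 itself is not reproduced here.

```python
import random


def adjust_binary_inputs(g0_sampled, ax, ay, a_s, q):
    """Inputs for Chyper-adj: scale every count from the sampled data by 1/q."""
    return g0_sampled / q, ax / q, ay / q, a_s / q


def pad_continuous(rows, g0_total, q, rng):
    """Append g0_total * (1 - q) genes drawn with replacement from `rows`,
    where each row holds one gene's scores across lineages."""
    n_extra = round(g0_total * (1 - q))
    return rows + [rng.choice(rows) for _ in range(n_extra)]


rng = random.Random(1)
# Four sampled genes with scores in two lineages; assume they are a
# fraction q = 0.5 of a candidate pool of g0 = 8 genes.
sampled = [(0.9, 0.8), (0.1, 0.2), (0.0, 0.1), (0.4, 0.0)]
extended = pad_continuous(sampled, g0_total=8, q=0.5, rng=rng)
print(len(extended))                                   # 8
print(adjust_binary_inputs(5000, 40, 60, 5, q=0.5))    # (10000.0, 80.0, 120.0, 10.0)
```

Resampling whole rows, rather than each lineage's scores separately, preserves any correlation between lineages that is present among the sampled genes.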

To explore the effect of incomplete sampling of the genome on the calculation of C-scores and the impact of these corrections, we constructed a test dataset by concatenating 5 replicates from the simulations in Fig 4A with 10 large-effect loci, yielding a dataset with 500 loci in total and a high amount of repeatability. We then sampled a proportion q of this dataset to simulate incomplete representation of the genome and used the above approach to calculate uncorrected and corrected C-scores. Although incomplete sampling can cause considerable bias in C-scores, these corrections yield relatively accurate estimates as long as q is not too small (Fig 5). At very low values of q, the variance in estimation among replicate subsets increases because of sampling effects when only a small number of adapted loci are included, but on average the magnitude of the corrected C-score is independent of q.
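The subsampling experiment can be mimicked with a toy binary dataset. This sketch is not the Nemo-based analysis behind Fig 5: it assumes a hypothetical 500-locus dataset in which each of two lineages has 50 adapted loci, 40 of them shared, keeps a random fraction q of loci without replacement, and assumes Chyper is the overlap minus its hypergeometric mean, divided by the hypergeometric standard deviation. The helper names are hypothetical.

```python
import math
import random
import statistics


def chyper(g0, ax, ay, a_s):
    """(observed overlap - hypergeometric mean) / hypergeometric SD."""
    mean = ax * ay / g0
    var = ax * (ay / g0) * (1 - ay / g0) * (g0 - ax) / (g0 - 1)
    return (a_s - mean) / math.sqrt(var) if var > 0 else float("nan")


def subsample(adapted_x, adapted_y, q, rng):
    """Keep a random fraction q of loci; return (uncorrected, corrected) Chyper."""
    keep = rng.sample(range(len(adapted_x)), round(q * len(adapted_x)))
    ax = sum(adapted_x[i] for i in keep)
    ay = sum(adapted_y[i] for i in keep)
    a_s = sum(adapted_x[i] and adapted_y[i] for i in keep)
    g0 = len(keep)
    # Correction: divide every input by q before computing the score.
    return chyper(g0, ax, ay, a_s), chyper(g0 / q, ax / q, ay / q, a_s / q)


# Hypothetical full dataset: lineage x adapted at loci 0-49, y at loci 10-59.
n_loci = 500
adapted_x = [i < 50 for i in range(n_loci)]
adapted_y = [10 <= i < 60 for i in range(n_loci)]
full = chyper(n_loci, 50, 50, 40)

rng = random.Random(42)
for q in (0.25, 0.5, 0.75):
    pairs = [subsample(adapted_x, adapted_y, q, rng) for _ in range(200)]
    raw = statistics.mean(p[0] for p in pairs)
    adj = statistics.mean(p[1] for p in pairs)
    print(f"q={q}: full={full:.1f} uncorrected={raw:.1f} corrected={adj:.1f}")
```

In this toy setting the uncorrected score shrinks roughly in proportion to the square root of q, while the corrected score stays close to the full-data value, mirroring the qualitative pattern in Fig 5.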

Fig 5. Incomplete sampling of the genome causes a bias in the estimation of C-scores (Chyper and Cchisq), but this can be adjusted by using a correction factor (Chyper-adj) or resampling from the existing dataset up to the estimated genome size (Cchisq-adj).

These approaches yield unbiased C-scores, although the variance of the estimates increases due to sampling effects when the proportion of sampled genes (q) is small. The figure shows estimates for 10 replicate subsamples performed for each value of q.

Example: Stress resistance in yeast

Experimental evolution studies provide a controlled framework to test theories on the genetic basis of adaptation under a diversity of scenarios. Gerstein et al. (2012) previously conducted experiments to examine the diversity of first-step adaptive mutations that arose in different lines initiated with the same genotypes, in response to the antifungal drug nystatin [36] and to copper [37]. This design allowed them to directly test how many different first-step solutions were accessible to evolution when the same genetic background adapted to the same environmental stressor. In the nystatin-evolved lines they identified 20 unique and independently evolved mutations in only four different genes, all of which act in the ergosterol biosynthesis pathway: 11 unique mutations in ERG3, seven in ERG6, and one in each of ERG5 and ERG7 [36]. The genotypic basis of copper adaptation was broader, and both genomic (SNPs, small indels) and karyotypic (aneuploidy) mutations were identified. Considering just the genomic mutations, mutations were found in 28 different genes, with multiple mutations identified in four genes (12 unique mutations in VTC4, four in PMA1, and three in each of MAM3 and VTC1). If we assume that all genes in the genome could potentially contribute to adaptation (i.e. g0 = 6604), then Chyper-nystatin = 32.5 and Chyper-copper = 12.3, with p < 0.00001 in both cases.

If we assume much lower GT-redundancy, such that only the observed genes could possibly contribute to the trait (i.e. g0-nystatin = 4, g0-copper = 28), we can test whether the mutations are still more clustered than expected within these sets. Using the methods outlined above, we find Chyper-nystatin = 0.35, p = 0.002, and Chyper-copper = 0.43, p < 0.0001, indicating that even under the severe developmental-genetic constraints to diversity represented by this model, these data are slightly more overlapping than expected at random, likely due to low GF-redundancy and potentially gene-specific differences in mutation rate (because these experiments were initiated using isogenic strains, standing variation was precluded).

Experimental evolution studies lend themselves well to future hypothesis testing about the impact of constraint on the genetic basis of adaptation, and they provide hypotheses about differences between the genes that were and were not observed in the screen. For example, we parsed the Saccharomyces Genome Database for genes annotated as “resistance to nystatin: increased”, where this phenotype is conferred by the null mutation. This should be a conservative dataset, as mutations in additional genes that do not produce a loss-of-function phenotype could also confer tolerance to nystatin (although we expect that the mutations we recovered in ERG3, ERG5 and ERG6 are all similar to loss-of-function mutations, ERG7 null mutants are inviable [36]). This search identified an additional five genes (KES1, OSH2, SLK19, VHR2, YEH2). We can test whether these five genes without an observed mutation have a negative pleiotropic effect when null, or lie in regions of the genome with a lower mutation rate, compared to the ERG genes (particularly ERG3 and ERG6). Similar experiments could also be conducted with different Saccharomyces cerevisiae genetic backgrounds, with closely related species, or under slightly different environmental conditions (e.g., increased or decreased concentration of stress) to directly examine how different aspects of the genomic and ecological environments influence the observed level of constraints acting on adaptation.

Example: Cold tolerance in conifers

Lodgepole pine and interior spruce both inhabit large ranges of western North America and display extensive local adaptation, with large differences in cold tolerance between northern and southern populations of each species. Recent work studied the strength of correlations between population allele frequencies and a number of environmental variables and phenotypes in each species [38]. Taking one representative environmental variable as an example, a total of 50 and 121 single-copy orthologs showed strong signatures of association with Mean Coldest Month Temperature (MCMT) in pine and spruce, respectively, with 5 of these genes shared between species (based on binary categorization using the binomial cutoff “top candidate” method, as per [38]). This study included a total of 9891 one-to-one orthologs with sufficient data in both species (i.e. at least 5 SNPs per gene), so observing 5 overlapping genes corresponds to Chyper = 5.6 and p = 0.00034 under the null hypothesis that all genes had equal potential to contribute to adaptation. Alternatively, Cchisq can be estimated on continuously distributed data by calculating a top candidate score for each gene as the binomial probability of seeing u outliers when there are v SNPs in that gene, given an overall rate of w outliers per SNP (as per [38], this yields an index rather than an exact probability, due to linkage among SNPs). This approach is more sensitive to weak signatures of adaptation that fall below the binary categorization cutoff, yielding Cchisq = 5.1 and p < 0.00001. Assuming that the 9891 studied genes represent a random sample from approximately 23,000 genes in the whole genome, and ignoring divergence in gene content between species (ns = nx = ny), the adjusted C-scores are Chyper-adj = 8.6 and Cchisq-adj = 7.8 (with resampling of 50 replicates and 10,000 permutations per replicate), providing a very rough estimate of the total diversity constraints driving repeatability.
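The binary pine-spruce calculation can be reproduced approximately from the counts given above. This sketch assumes the hypergeometric form of Chyper (overlap minus its null mean, divided by the null standard deviation) and computes the one-sided hypergeometric p-value exactly with integer binomial coefficients; all function names are hypothetical. `top_candidate_score` is our assumed reading of the per-gene score as an upper-tail binomial probability of at least u outlier SNPs among v SNPs at outlier rate w; [38] should be consulted for the exact definition, and the permutation-based Cchisq is not reproduced here.

```python
import math


def chyper(g0, ax, ay, a_s):
    """(observed overlap - hypergeometric mean) / hypergeometric SD; accepts
    non-integer inputs so that it can also be used for Chyper-adj."""
    mean = ax * ay / g0
    var = ax * (ay / g0) * (1 - ay / g0) * (g0 - ax) / (g0 - 1)
    return (a_s - mean) / math.sqrt(var)


def overlap_p_value(g0, ax, ay, a_s):
    """Exact P(overlap >= a_s) when ax and ay genes are drawn at random from g0."""
    hits = sum(math.comb(ay, k) * math.comb(g0 - ay, ax - k)
               for k in range(a_s, min(ax, ay) + 1))
    return hits / math.comb(g0, ax)


def top_candidate_score(u, v, w):
    """Assumed form: P(at least u outlier SNPs among v SNPs | outlier rate w)."""
    return sum(math.comb(v, k) * w**k * (1 - w) ** (v - k) for k in range(u, v + 1))


g0, ax, ay, a_s = 9891, 50, 121, 5   # orthologs, pine hits, spruce hits, shared
print(round(chyper(g0, ax, ay, a_s), 2))     # 5.66 (reported as 5.6)
print(overlap_p_value(g0, ax, ay, a_s))       # close to the reported 0.00034

q = 9891 / 23000                             # assumed sampled fraction of genome
print(round(chyper(g0 / q, ax / q, ay / q, a_s / q), 2))  # 8.63 (reported as 8.6)
```

Dividing every input by q leaves the ratio ay/g0 unchanged but scales the expected and observed overlaps by 1/q, which is why the corrected score is larger.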

Because some factors, such as conservation of the genomic landscape of recombination cold- and hot-spots, could spuriously drive signatures of convergence (see Discussion), a basic control is to compare the above results for pine-MCMT vs. spruce-MCMT with the overlap between top candidates for different environmental variables in each species, where convergence would not be expected. Examining the variable least strongly correlated with MCMT (annual heat-moisture index; AHM), we find 23 top candidates in spruce, with one overlap with the pine-MCMT top candidates, and 25 in pine, with no overlap with the spruce-MCMT top candidates; these correspond to p = 0.11 and p = 1, respectively. Thus, there was no significant excess of overlap among the top candidates for different variables, where signatures of convergence are unexpected but could have been generated spuriously by some combination of demography and low recombination in some regions of the genome. However, because this is a negative control, the lack of a significant result does not prove that such effects are absent, so caution is necessary when drawing inferences from these data.
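The two negative-control p-values can be checked with a one-sided hypergeometric tail, assuming that all 9891 orthologs are equally likely to appear in each candidate list; the function name is hypothetical.

```python
import math


def overlap_p_value(g0, ax, ay, a_s):
    """Exact P(overlap >= a_s) when ax and ay genes are drawn at random from g0."""
    hits = sum(math.comb(ay, k) * math.comb(g0 - ay, ax - k)
               for k in range(a_s, min(ax, ay) + 1))
    return hits / math.comb(g0, ax)


# Spruce-AHM (23 candidates) vs pine-MCMT (50): one shared gene.
print(round(overlap_p_value(9891, 23, 50, 1), 2))   # 0.11
# Pine-AHM (25 candidates) vs spruce-MCMT (121): no shared genes.
print(overlap_p_value(9891, 25, 121, 0))           # 1.0
```

An overlap of at least zero is certain, which is why the second control returns p = 1 exactly.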

These diversity constraints correspond to an effective adaptive target of gas = 1462 genes that could potentially contribute to adaptation in these species (out of 9891 studied genes), which defines the corresponding proportion of adaptation-effective loci. However, many of the 50 and 121 genes identified using the “top candidate test” were likely false positives, because the association test did not control for population structure; this was instead accounted for by the among-species comparison. Thus, if we assume a 50% false positive rate for ax and ay, then gas declines to 370 genes, with a correspondingly smaller proportion of adaptation-effective loci. In their analysis, Yeaman et al. [38] used another, more sensitive test (null-W) to identify loci with signatures of convergence that were not detected based on overlap in the top candidate lists, which suggests the true amount of repeatability may be higher than inferred here. This example illustrates how these kinds of statistics may be used to make inferences about constraints, but also highlights the sensitivity of the results to small changes in parameters.


The methods developed here provide a way to estimate the effective number of loci that can potentially contribute to adaptation, and an index (the “C-score”) to quantify the total contribution of all constraints to repeatability relative to the null hypothesis of no constraints. Importantly, these statistics can be used to contrast the total constraints affecting convergence in highly divergent traits and species. The comparison between antimicrobial and copper resistance in yeast vs. cold tolerance in conifers suggests that the latter trait is less constrained than the former (Cantimicrobial = 32.5; Ccopper = 12.3; Cclimate = 7.8), but that in all cases considerably more repeatability in adaptation is seen than under a model with no constraints. As C-scores are scaled by the standard deviation of the null distribution, their magnitude increases as the probability of the observed repeatability occurring by chance under the null model decreases. In conjunction with other information about the number of loci that could potentially give rise to standing variation (gs), this method can be used to test hypotheses about whether observed repeatability is due to GT- or GF-redundancy, or to other confounding factors. We now review how these methods can be used to draw inferences and discuss potential problems that should be considered in their implementation.

Hypothesis testing to identify the factors that constrain diversity

Under the simplest null hypothesis that there are no diversity constraints, all genes can give rise to potentially adaptive variation in the trait (g0 = gs = gas = ns). While simplistic, this approach provides an intuitive method to assess whether the amount of convergence observed is more than expected from pure randomness. But what do we learn if we reject such a simple null hypothesis? Two inferences can be drawn in this case: many of the genes flagged by our tests for selection are likely evolving by natural selection (i.e., they are not all false positives), and some kind of constraint is involved in shaping this adaptation. The former inference means that analyzing comparative data for convergence can provide a powerful tool for identifying the genes involved in local adaptation, which is often a significant methodological hurdle in evolutionary biology (e.g., [38]). The latter inference may seem a straw man, as few molecular biologists would advocate a model where every gene can mutate to give rise to adaptively useful variation in a given trait. However, different forms of the “universal pleiotropy” model have been assumed in theoretical quantitative genetics [39], and the recently proposed “omnigenic model” advocates extensive pleiotropy [14]. Regardless of the true number of genes involved, this null hypothesis provides a benchmark against which we can quantify how all factors constraining the diversity of forms combine to drive repeatability, which is useful for interpreting patterns of repeatability among species and traits.

In order to make inferences about the potential importance of different kinds of diversity constraints driving repeatability, it is necessary to specify more realistic models for the evolution of local adaptation that incorporate different assumptions about the size of the mutational target of the trait, the extent of shared standing variation, differences in mutation rate among genes, the distribution of mutation effect sizes, and species demography. The simplest modification to the above null model is to represent the extent of GT-redundancy by specifying the number of loci that potentially contribute to trait variation as a subset of the total number of loci in the genome (g0 = gs < ns). In the context of the Chyper index (Eq 1), reducing g0 increases both the mean and standard deviation of the hypergeometric distribution and therefore decreases Chyper and the inferred level of residual (unexplained) constraints. If empirical estimates of gs result in Chyper ~ 0, then it is reasonable to conclude that low GT-redundancy is mainly responsible for the observed amount of convergent adaptation. This would not discount the importance of natural selection overall, as selection on the phenotype is still responsible for adaptation, but it would suggest that the gs individual loci are more or less interchangeable and that GF-redundancy contributes no additional constraints beyond those imposed by GT-redundancy (Table 1). However, as we have few (if any) conclusive estimates of gs in highly polygenic traits [40,41], the extent of constraint arising through low GT-redundancy will be difficult to assess without further directed study.
Although they are by no means simple experiments to conduct, it should be possible to estimate gs from QTLs identified in multiple mutation accumulation experiments, as the number of loci detected across all experiments should asymptote towards gs; rarefaction designs could also be used to estimate gs from the overlap between QTLs detected in two experiments (although such estimates would likely still be biased by failure to detect loci of small effect). An approach similar to the one taken here for adaptive loci could be applied to multiple GWAS of standing variation for a given trait, conducted independently in different species, to assess the proportion of shared loci. However, in this case the loci that contribute to standing variation could have been shaped by previous selection and might therefore be more convergent than those identified using mutation accumulation, especially if long-term balancing selection is operating (e.g. [42]).
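One way to turn the two-experiment idea into a number is to borrow the Lincoln-Petersen estimator from mark-recapture, treating the loci detected in each experiment as two independent "captures" from gs. This is a swapped-in technique offered for illustration, not a method proposed in the text: it assumes every locus in gs is equally detectable, an assumption that undetected small-effect loci would violate (biasing the estimate downwards). The Chapman variant is a standard bias-corrected form, and the QTL counts below are hypothetical.

```python
def lincoln_petersen(n1, n2, m):
    """gs estimate from n1 and n2 loci detected in two experiments, m shared."""
    if m == 0:
        raise ValueError("no shared loci: the estimate is unbounded")
    return n1 * n2 / m


def chapman(n1, n2, m):
    """Bias-corrected variant that remains finite when m = 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical: 30 and 40 genes with QTLs detected, 10 detected in both.
print(lincoln_petersen(30, 40, 10))   # 120.0
print(round(chapman(30, 40, 10), 1))  # 114.5
```

The Lincoln-Petersen value n1·n2/m is also the pool size at which the hypergeometric expectation of the overlap equals the observed m, i.e. the value of g0 at which the numerator of Chyper vanishes.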

In order to draw inferences about the importance of these types of redundancy, it is critical to account for other factors unrelated to GT- and GF-redundancy that might drive repeatability, mainly differences among genes in mutation rate or standing variation. The simplest approach to controlling these factors is to design studies that preclude shared standing variation, either through experiments founded from isogenic strains (e.g., [2,36]) or through comparisons of distantly related lineages (divergence time >> 4Ne) in which lineage sorting has been completed (as per [38]). While repeatability could still be driven by differences among genes in mutation rate, this can be seen as a component of GT-redundancy and therefore as a factor that can also constrain diversity. By contrast, shared standing variation arises mainly through historical contingency and is therefore a bias affecting the estimation of C-scores, rather than a constraint. As such, parsing the contribution of mutation rate to C-scores and to the proportion of adaptation-effective loci is less critical than parsing the contribution of standing variation when using these as overall indices of constraint. Unfortunately, in studies of recently diverged natural populations it is not possible to preclude shared standing variation, so C-scores and the proportion of adaptation-effective loci could be strongly driven by this factor and therefore not particularly representative of diversity constraints. The recently developed likelihood-based method for discriminating between convergence via de novo mutation, migration, or shared standing variation [24] may provide a means to parse these contributions to repeatability and refine the inference of constraint. While testing the null hypothesis of no constraints is relatively straightforward, discriminating among other potential factors constraining diversity is much more complicated.
Although it is possible to make very intricate models with variable mutation rates, selection coefficients, indices of pleiotropy, shared standing variation, and/or other factors determining the likelihood of each gene contributing to adaptation [3,25,28,43], it may be very difficult to actually confidently discriminate between such models.

Practical considerations in implementation

The accuracy of the indices developed here will critically depend on the correct identification of the genes contributing to adaptation. Studies of local adaptation are particularly prone to false positives when population structure is oriented on the same axis as adaptive divergence, and it is unclear how extensively methods that correct for population structure induce false negatives or fail to accurately control for false positives [44,45]. Assuming false positives are distributed randomly throughout the genome in each lineage, failure to remove them will cause the C-scores derived here to be biased downwards. Failure to identify true positives (i.e. false negatives) will impair the accuracy of Cchisq, with the direction dependent upon the underlying biology. Assuming false negatives are randomly distributed in the genome, they could also reduce the magnitude of C-scores due to lower information content. On the other hand, if large-effect loci are more likely than small-effect loci to be both detected and convergent, false negatives will tend to bias C-scores upwards. As it is typically necessary to set arbitrary cutoffs for statistical significance to identify putatively adapted loci, we might expect Cchisq to increase with increasing stringency of these cutoffs, as this would be expected to reduce false positives (but also increase false negatives). However, as there are many potential contingencies and interactions between the factors that affect these two types of error, there is a clear need for both theoretical studies on how the repeatability of local adaptation is affected by the interplay between demography and selection (e.g. [28]), and refinement of these methods to derive confidence intervals taking into account likely error rates.
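The downward bias from randomly placed false positives can be illustrated with expected counts. This is a rough sketch that assumes the hypergeometric form of Chyper, a candidate pool much larger than the candidate lists, and false positives drawn uniformly from the genome in each lineage, so that they add only chance overlaps; the function names and numbers are hypothetical.

```python
import math


def chyper(g0, ax, ay, a_s):
    """(observed overlap - hypergeometric mean) / hypergeometric SD."""
    mean = ax * ay / g0
    var = ax * (ay / g0) * (1 - ay / g0) * (g0 - ax) / (g0 - 1)
    return (a_s - mean) / math.sqrt(var)


def add_random_false_positives(g0, ax, ay, a_s, fp):
    """Expected inputs after adding fp random false positives to each lineage.
    Chance overlaps (approximate, for g0 >> ax, ay): false positives in x that
    hit y's true hits, false positives in y that hit x's true hits, and false
    positives that coincide in both lineages."""
    extra_overlap = (fp * ay + fp * ax + fp * fp) / g0
    return g0, ax + fp, ay + fp, a_s + extra_overlap


true_counts = (10000, 50, 50, 10)
print(round(chyper(*true_counts), 1))                                        # 19.6
print(round(chyper(*add_random_false_positives(*true_counts, fp=50)), 1))    # 9.8
```

Doubling each candidate list with random false positives roughly halves the C-score here, because the lists grow while the excess overlap does not.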

A particularly important problem to address in implementing this method is that false positives may be non-randomly distributed throughout the genome in a similar way in different lineages. As local variations in the rate of mutation or recombination can drive genome-wide patterns in some indices used to identify selection and adaptation [46–48], this could lead to signatures of convergence among distantly-related species if such patterns are conserved over long periods of evolutionary time. For example, genome-wide patterns of variation in nucleotide diversity, FST, and dxy were all significantly correlated across three distantly related bird species, likely driven in part by conservation of local recombination rate coupled with linked selection [49]. The extent of convergence of local recombination rates appears to vary considerably among species [50–53], so it will be important to consider this factor as a potential driver of similarity in the genomic signatures used to identify selection. Methods for identifying signatures of adaptation that are explicitly linked to a phenotype or environment of interest across multiple pairs of populations may be less likely to be affected by such factors, as recombination and linked selection are unlikely to drive a pattern of repeated correlation between allele frequency and phenotype/environment. However, such methods are still vulnerable to potential biases that arise from the complex interplay between genomic landscape, selection, and recombination, and further study in both theoretical and empirical contexts will be important to test the robustness of different methods to this important source of bias. One potential approach to estimate the contribution of such confounding factors would be to compare signatures of the repeatability for adaptation in two different traits (which are not phenotypically convergent) to those for a phenotypically convergent trait.

While studying adaptation across multiple pairs of populations can greatly increase the power to detect signatures of selection when all populations are adapting via the same loci, such methods are inherently unable to detect idiosyncratic patterns where different populations of a given species are adapting via different loci. By its very nature, local adaptation in traits with high GT- or GF-redundancy may be very difficult, if not impossible, to detect, as each pair of populations may be differentiated via a different set of loci [13]. If local adaptation is much more readily detected when it arises repeatedly within a lineage, then it will be difficult to identify conclusive cases with low C-scores, causing an overestimation of the prevalence of highly repeated adaptation.

If patterns of genomic convergence are compared among multiple lineages of differing relatedness, it is important to consider their phylogeny when testing the importance of phylogenetically shared factors affecting the propensity for gene reuse [54]. The ability to resolve orthology relationships also decreases with increasing phylogenetic distance, which can affect the estimation of ns. Similarly, the set of genes in a trait’s mutational target (gi) is expected to evolve over time, so the set of shared genes should shrink with phylogenetic distance (so that gs becomes smaller relative to gi as divergence time increases), leading to decreased repeatability over time [33]. When studies include multiple lineages of differing relatedness, it is probably useful to estimate C-scores both pairwise and as a mean across all lineages, to more clearly describe cases where convergence is high within pairs of closely related lineages but low among more distantly related lineages.

Finally, physical linkage is a factor that could critically affect the measurement of repeatability, as neutral alleles in other genes linked to a causal allele will tend to respond to indirect selection, causing spurious signatures of selection/local adaptation. If the same causal gene is driving adaptation in two lineages, this will tend to overestimate repeatability on a gene-by-gene basis, whereas the opposite will occur if different causal genes are driving adaptation. Yeaman et al. [38] found significantly elevated levels of linkage disequilibrium (LD) among candidate genes for local adaptation, which may have arisen due to physical linkage (with or without selection on multiple causal loci) or statistical associations driven by selection among physically unlinked loci. In this case, the fragmented genome and lack of suitable genetic map precluded a comprehensive analysis of the impact of LD. If genome/genetic map resources permit, it may be possible to analyse repeatability on haplotype blocks rather than individual genes, which could minimize the biases due to physical linkage.

Comparison to other indices of repeatability

A large number of indices have been developed to characterize similarity among ecological communities, which can be broadly grouped based on binary vs. quantitative input data and whether they account for joint absence of a given type (reviewed in [55]). In most cases, these indices are not derived from a probability-based representation of expectations, though Raup and Crick [56] quantified an index of similarity based on the p-value of a hypergeometric test (see also [57]). The Chyper index that we have developed here uses the same underlying logic as the Raup-Crick index but quantifies the effect size as a deviation from the expectation under the null hypothesis in units of the standard deviation of the null distribution. The C-score and the proportion of adaptation-effective loci developed here complement the indices of repeatability that have been used in previous studies of convergence at the genome scale (e.g. [2,33]). Whereas the Jaccard, PSadd, and other similar indices represent how commonly a given gene tends to be used in adaptation, the C-score indices quantify how much constraint is involved in driving this observed repeatability, and the proportion of adaptation-effective loci estimates how much of the genome is effectively available for adaptation. In some cases, these indices will be qualitatively similar in quantifying patterns of convergence (e.g. Fig 4A), but in other cases they will diverge considerably, because the C-scores are explicitly aimed at representing the importance of genes that could contribute to adaptation but do not.

What does convergence tell us about the basis of trait variation?

The repeated observation of convergent adaptation at the genome scale violates a fundamental assumption of the infinitesimal model of quantitative genetics: that the mutations responsible for adaptation have small effects and are essentially interchangeable [58]. If there are infinitely many interchangeable loci that could contribute to adaptation, the chance of the same gene playing a causal role in independent bouts should be vanishingly small. Molecular-genetic studies show that some traits are causally generated by only a few genes in a specific pathway, presumably limiting the mutational target and increasing the potential for convergence. For example, only the small number of genes that are directly involved in terpene production [59] would likely contribute to large variations in the amount of terpene produced by a plant. However, a second category of mutations in other non-pathway genes could also indirectly contribute to variation in terpene production through perturbations of regulatory networks. The recently proposed “omnigenic model” posits that genes can be categorized into “core” vs. “peripheral” function for a given trait, as a way to distinguish between those with larger direct effects vs. smaller indirect effects [14], although this model has also been criticized [60]. The majority of evidence that has been considered in the context of the omnigenic model has come from Genome-Wide Association Studies (GWAS) of standing variation, but it is unclear whether this represents the “stuff” of long-term adaptation. Indeed, it appears that in humans there are pronounced differences in the distributions of alleles that contribute to standing vs. adaptive genetic variation, as GWAS of standing variation find mainly small-effect variants [61,62], whereas studies of local adaptation have found a number of large-effect loci (e.g., for lactase persistence [63], diving [64], and high altitude [65]).
If GT-redundancy is typically high and GF-redundancy commonly low, then there will be little correlation between the loci that can give rise to standing variation and the smaller subset of those responsible for long-term adaptation. Studying whether adaptation is commonly repeatable at the genome scale will therefore make an important complement to GWAS studies of standing variation, providing a window into the factors that constrain the diversity of viable routes to adaptation, and informing our broader understanding of how variation translates into evolution.


We present a method to quantify the constraints that drive genomic repeatability of adaptation, enabling tests of hypotheses about the nature of these constraints. Contrasting the repeatability of adaptation with studies of standing variation will deepen our understanding of evolution and of the factors that affect how it gives rise to diversity. Comparative approaches examining C-scores and the proportion of adaptation-effective loci () for the same trait across different branches of the phylogeny may allow us to infer rates of evolution in constraints and potential differences between rapidly vs. slowly radiating lineages, and to study whether adaptation drives such changes. Comparisons across traits within lineages will illuminate how different kinds of traits are constrained, and whether low GT- and GF-redundancy constitute important constraints at different levels of biological organization. Similarly, this approach could be used to examine whether the types of constraint that predominate depend on critical population genetic parameters such as effective population size, which affects the long-term efficiency of selection on the developmental-genetic program. While we have focused on repeatability at the gene level, this framework could be applied at other levels of organization, such as gene network, protein domain, or individual nucleotide (reviewed by [3]), and could include the contribution of intergenic regulatory regions if orthology can be identified. These methods therefore provide a first step towards comprehensive quantification and understanding of evolutionary constraints and of the role that different factors play in the rise of diversity during adaptation.

Supporting information

S1 Data. Additional analysis and model description.


S1 Fig. Schematic representation of the overlap between the genes that potentially contribute to variation (G) and adaptation (GA) and those observed to contribute to adaptation (A) within two lineages (x and y) undergoing local adaptation to a similar selection pressure.

Note that for simplicity $N_x$ and $N_y$ are not shown, and this figure is drawn under the assumption that all members of $G_x$ and $G_y$ are also members of $N_s$, and all members of $GA_x$ and $GA_y$ are also members of $G_x$ and $G_y$ (i.e., $G_x \subseteq N_s$ and $G_y \subseteq N_s$ and $GA_x \subseteq G_x$ and $GA_y \subseteq G_y$).


S2 Fig. Deviations between the distribution of Pearson’s estimator for χ2 and the exact χ2 distribution for the same degrees of freedom, for simulations with varying numbers of genes with adaptive mutations per replicate (A), and varying numbers of genes underlying the trait (B).

Solid vertical bars show analytical means; dashed vertical lines show simulated means. In all cases k = 10; in panel A, gs = 100; in panel B, ai = 10.
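The gap between analytical and simulated means in S2 Fig can be illustrated with a short null simulation. This Python sketch rests on our own assumptions, not on the authors' code: the statistic is Pearson's goodness-of-fit χ2 computed on per-gene counts of lineages carrying an adaptive mutation, tested against a uniform expectation with gs − 1 degrees of freedom, and under the null each of k lineages picks ai distinct genes at random. Because each lineage samples without replacement, each per-gene count has binomial rather than Poisson variance, so the expected statistic is gs − ai rather than gs − 1 (90 vs. 99 for k = 10, gs = 100, ai = 10). Function names are illustrative.

```python
import random
import statistics


def pearson_chisq(counts):
    """Pearson's goodness-of-fit statistic for per-gene counts vs. a uniform expectation."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def simulate_null_chisq(g_s, k, a_i, n_reps, seed=1):
    """Null model: each of k lineages picks a_i distinct genes uniformly from g_s genes."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_reps):
        counts = [0] * g_s
        for _ in range(k):
            for gene in rng.sample(range(g_s), a_i):  # sampling without replacement
                counts[gene] += 1
        stats.append(pearson_chisq(counts))
    return stats


g_s, k, a_i = 100, 10, 10  # values from the S2 Fig caption
sims = simulate_null_chisq(g_s, k, a_i, n_reps=2000)
print("chi-square mean (df):", g_s - 1)
print("simulated mean:", round(statistics.mean(sims), 1))
print("expected under sampling without replacement:", g_s - a_i)
```

Increasing ai in this sketch widens the gap, in line with the direction of deviation in panel A.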


S3 Fig. Comparison of Chyper with Cchisq (A, B), Jaccard (C, D), and PSadd indices (E, F) for quantifying convergence under scenarios with different numbers of adaptive mutations (A, C, E) and causal loci (B, D, F).

In all cases, scenarios were simulated for two replicate lineages with ai adaptive mutations in one and ai + 20 adaptive mutations in the other. In panels A, C & E gs = 200; in panels B, D & F ai = 10. For each parameter set, a number of simulations were run, each with a different proportion of the rows in each lineage sorted numerically, to introduce different amounts of repeatability into each run (i.e., the same procedure as in Fig 3).
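A minimal Python sketch can show why Chyper and the Jaccard index diverge in S3 Fig. It assumes that Chyper is the observed overlap standardized by the mean and standard deviation of the hypergeometric distribution; the formulas below are the standard hypergeometric moments, not code taken from the authors. The parameters mirror the caption (gs = 200, with ai and ai + 20 adaptive mutations), and the ai values are chosen so that the expected overlap is an integer. When the overlap equals its chance expectation, Chyper stays at zero while the Jaccard index still rises with ai.

```python
import math


def c_hyper(a_x: int, a_y: int, a_xy: int, g_s: int) -> float:
    """Standardized deviation of the observed overlap a_xy from its hypergeometric
    expectation, for a_x and a_y adaptive genes among g_s shared genes."""
    mean = a_x * a_y / g_s
    var = a_x * (a_y / g_s) * ((g_s - a_y) / g_s) * ((g_s - a_x) / (g_s - 1))
    return (a_xy - mean) / math.sqrt(var)


def jaccard(a_x: int, a_y: int, a_xy: int) -> float:
    """Size of the intersection over size of the union of the two adaptive gene sets."""
    return a_xy / (a_x + a_y - a_xy)


g_s = 200
for a_i in (20, 40, 80):
    a_x, a_y = a_i, a_i + 20
    at_chance = a_x * a_y // g_s  # exact integer expectation for these a_i values
    print(f"a_i={a_i:>2}  C_hyper={c_hyper(a_x, a_y, at_chance, g_s):.2f}  "
          f"Jaccard={jaccard(a_x, a_y, at_chance):.3f}")
```

Because Jaccard is not scaled against a null expectation, it conflates the number of adaptive genes with repeatability. Chyper measures the deviation from chance.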


S4 Fig. The indices of convergence in individual-based simulations exhibit a non-monotonic pattern with increasing values of the gamma shape parameter, which changes GF-redundancy (panel A).

Example plots show divergence at individual loci contributing to local adaptation for four of the replicate simulation runs plotted in the left-hand panel. Panels B, D, F, & H show binned FST values for each locus, averaged over the last 25 census points; panels C, E, G & I show histograms of the allele effect sizes in the simulation.
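The effect of the gamma shape parameter on the spread of allele effect sizes (histogrammed in panels C, E, G & I) can be illustrated apart from the full simulation. This Python sketch draws hypothetical per-locus effect sizes from a gamma distribution with mean 1 (scale = 1/shape, an assumption chosen so that different shapes are comparable). It then reports the share of the total effect held by the largest 10% of loci. Small shape values concentrate most of the effect in a few loci, and large values make loci more nearly equal. The sketch shows only this distributional property. It does not reproduce the non-monotonic pattern in the indices, which arises from the full individual-based simulation.

```python
import random


def top_share(shape, n_loci=10_000, top_frac=0.1, seed=42):
    """Fraction of the summed effect held by the largest `top_frac` of loci when
    effects are drawn from Gamma(shape, scale=1/shape), i.e. with mean effect 1."""
    rng = random.Random(seed)
    effects = sorted(
        (rng.gammavariate(shape, 1 / shape) for _ in range(n_loci)), reverse=True
    )
    n_top = int(n_loci * top_frac)
    return sum(effects[:n_top]) / sum(effects)


for shape in (0.1, 0.5, 1.0, 5.0):
    print(f"shape={shape:<4}  share of top 10% of loci={top_share(shape):.3f}")
```

For shape = 1 (an exponential distribution), the top 10% of loci are expected to hold about (1 + ln 10)/10 ≈ 0.33 of the total effect.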



We would like to thank J. Mee, D. Schluter, and the Yeaman lab for comments and discussion during the preparation of this manuscript. Special thanks to L. Harris for suggesting Pearson’s χ2 when S.Y. & A.G. were mired in combinatorics.


  1. Conte GL, Arnegard ME, Peichel CL, Schluter D. The probability of genetic parallelism and convergence in natural populations. Proc R Soc B Biol Sci. 2012;279: 5039–5047.
  2. Bailey SF, Rodrigue N, Kassen R. The Effect of Selection Environment on the Probability of Parallel Evolution. Mol Biol Evol. 2015;32: 1436–1448. pmid:25761765
  3. Storz JF. Causes of molecular convergence and parallelism in protein evolution. Nat Rev Genet. 2016;17: 239–250.
  4. Chevin LM. Genetic constraints on adaptation to a changing environment. Evolution. 2013;67: 708–721. pmid:23461322
  5. Connallon T, Hall MD. Genetic constraints on adaptation: A theoretical primer for the genomics era. Ann NY Acad Sci. 2018;1422: 65–87. pmid:29363779
  6. Nachman MW, Hoekstra HE, D’Agostino SL. The genetic basis of adaptive melanism in pocket mice. Proc Natl Acad Sci. 2003;100: 5268–5273. pmid:12704245
  7. Rosenblum EB, Hoekstra HE, Nachman MW. Adaptive Reptile Color Variation and the Evolution of the Mc1R Gene. Evolution. 2004;58: 1794. pmid:15446431
  8. Mundy NI. A window on the genetics of evolution: MC1R and plumage colouration in birds. Proc R Soc B Biol Sci. 2005;272: 1633–1640.
  9. Gross JB, Borowsky R, Tabin CJ. A novel role for Mc1r in the parallel evolution of depigmentation in independent populations of the cavefish Astyanax mexicanus. PLoS Genet. 2009;5. pmid:19119422
  10. Manceau M, Domingues VS, Linnen CR, Rosenblum EB, Hoekstra HE. Convergence in pigmentation at multiple levels: mutations, genes and function. Philos Trans R Soc B Biol Sci. 2010;365: 2439–2450.
  11. Hoekstra HE. Genetics, development and evolution of adaptive pigmentation in vertebrates. Heredity. 2006;97: 222–234. pmid:16823403
  12. Goldstein DB, Holsinger KE. Maintenance of polygenic variation in spatially structured populations: Roles for local mating and genetic redundancy. Evolution. 1992;46: 412–429. pmid:28564040
  13. Yeaman S. Local adaptation by alleles of small effect. Am Nat. 2015;186: S74–S89. pmid:26656219
  14. Boyle EA, Li YI, Pritchard JK. An Expanded View of Complex Traits: From Polygenic to Omnigenic. Cell. 2017;169: 1177–1186. pmid:28622505
  15. Yeaman S, Whitlock MC. The Genetic Architecture of Adaptation under Migration-Selection Balance. Evolution. 2011;65: 1897–1911. pmid:21729046
  16. van Doorn GS, Dieckmann U. The long-term evolution of multilocus traits under frequency-dependent disruptive selection. Evolution. 2006;60: 2226–2238. pmid:17236416
  17. Kopp M, Hermisson J. The evolution of genetic architecture under frequency-dependent disruptive selection. Evolution. 2006;60: 1537–1550. pmid:17017055
  18. Kimura M. Evolutionary rate at the molecular level. Nature. 1968;217: 624–626. pmid:5637732
  19. Ohta T. Slightly deleterious mutant substitutions in evolution. Nature. 1973;246: 96–98. pmid:4585855
  20. Flaxman SM, Feder JL, Nosil P. Genetic Hitchhiking and the Dynamic Buildup of Genomic Divergence During Speciation With Gene Flow. Evolution. 2013;67: 2577–2591. pmid:24033168
  21. Turelli M. Heritable genetic variation via mutation-selection balance: Lerch’s zeta meets the abdominal bristle. Theor Popul Biol. 1984;25: 138–193. pmid:6729751
  22. Schuster P. A testable genotype-phenotype map: modeling evolution of RNA molecules. Biological evolution and statistical physics. Berlin: Springer-Verlag; 2002. pp. 55–81.
  23. Ralph PL, Coop G. The Role of Standing Variation in Geographic Convergent Adaptation. Am Nat. 2015;186: S5–S23. pmid:26656217
  24. Lee KM, Coop G. Distinguishing among modes of convergent adaptation using population genomic data. Genetics. 2017;207: 1591–1619. pmid:29046403
  25. MacPherson A, Nuismer SL. The probability of parallel genetic evolution from standing genetic variation. J Evol Biol. 2017;30: 326–337. pmid:27801996
  26. Orr H. The probability of parallel evolution. Evolution. 2005;59: 216–220. pmid:15792240
  27. Arendt J, Reznick D. Convergence and parallelism reconsidered: what have we learned about the genetics of adaptation? Trends Ecol Evol. 2008;23: 26–32. pmid:18022278
  28. Chevin LM, Martin G, Lenormand T. Fisher’s model and the genomics of adaptation: Restricted pleiotropy, heterogenous mutation, and parallel evolution. Evolution. 2010;64: 3213–3231. pmid:20662921
  29. Rosenblum EB, Parent CE, Brandt EE. The Molecular Basis of Phenotypic Convergence. Annu Rev Ecol Evol Syst. 2014;45: 203–226.
  30. Bailey SF, Blanquart F, Bataillon T, Kassen R. What drives parallel evolution?: How population size and mutational variation contribute to repeated evolution. BioEssays. 2017;39: 1–9.
  31. Holliday JA, Zhou L, Bawa R, Zhang M, Oubida RW. Evidence for extensive parallelism but divergent genomic architecture of adaptation along altitudinal and latitudinal gradients in Populus trichocarpa. New Phytol. 2015;209: 1240–1251. pmid:26372471
  32. Whittaker RH. A study of summer foliage insect communities in the Great Smoky Mountains. Ecol Monogr. 1952;22: 1–44.
  33. Conte GL, Arnegard ME, Peichel CL, Schluter D. The probability of genetic parallelism and convergence in natural populations. Proc R Soc B Biol Sci. 2012;279: 5039–5047.
  34. Guillaume F, Rougemont J. Nemo: An evolutionary and population genetics programming framework. Bioinformatics. 2006;22: 2556–2557. pmid:16882649
  35. Yeaman S, Otto SP. Establishment and maintenance of adaptive genetic divergence under migration, selection, and drift. Evolution. 2011;65: 2123–2129. pmid:21729066
  36. Gerstein AC, Lo DS, Otto SP. Parallel genetic changes and nonparallel gene-environment interactions characterize the evolution of drug resistance in yeast. Genetics. 2012;192: 241–252. pmid:22714405
  37. Gerstein AC, Ono J, Lo DS, Campbell ML, Kuzmin A, Otto SP. Too much of a good thing: The unique and repeated paths toward copper adaptation. Genetics. 2015;199: 555–571. pmid:25519894
  38. Yeaman S, Hodgins KA, Lotterhos KE, Suren H, Gray LK, Liepe KJ, Hamann A, Holliday JA, Whitlock MC, Rieseberg LH, Aitken SN. Convergent local adaptation to climate in distantly related conifers. Science. 2016;353: 1431–1433. pmid:27708038
  39. Paaby AB, Rockman MV. The many faces of pleiotropy. Trends Genet. 2013;29: 66–73. pmid:23140989
  40. Barton NH, Turelli M. Evolutionary quantitative genetics: how little do we know? Annu Rev Genet. 1989;23: 337–370. pmid:2694935
  41. Mackay TFC, Stone EA, Ayroles JF. The genetics of quantitative traits: challenges and prospects. Nat Rev Genet. 2009;10: 565–577. pmid:19584810
  42. Schluter D, Conte GL. Genetics and ecological speciation. Proc Natl Acad Sci. 2009;106: 9955–9962. pmid:19528639
  43. Bailey SF, Guo Q, Bataillon T. Identifying drivers of parallel evolution: A regression model approach. bioRxiv. 2017.
  44. de Villemereuil P, Frichot É, Bazin É, François O, Gaggiotti OE. Genome scan methods against more complex models: when and how much should we trust them? Mol Ecol. 2014;23: 2006–2019. pmid:24611968
  45. Lotterhos KE, Whitlock MC. The relative power of genome scans to detect local adaptation depends on sampling design and statistical method. Mol Ecol. 2015;24: 1031–1046. pmid:25648189
  46. Renaut S, Grassa CJ, Yeaman S, Moyers BT, Lai Z, Kane NC, et al. Genomic islands of divergence are not affected by geography of speciation in sunflowers. Nat Commun. 2013;4.
  47. Roesti M, Moser D, Berner D. Recombination in the threespine stickleback genome—Patterns and consequences. Mol Ecol. 2013;22: 3014–3027. pmid:23601112
  48. Burri R, Nater A, Kawakami T, Mugal CF, Olason PI, Smeds L, et al. Linked selection and recombination rate variation drive the evolution of the genomic landscape of differentiation across the speciation continuum of Ficedula flycatchers. Genome Res. 2015;25: 1656–1665. pmid:26355005
  49. Dutoit L, Vijay N, Mugal CF, Bossu CM, Burri R, Wolf J, et al. Covariation in levels of nucleotide diversity in homologous regions of the avian genome long after completion of lineage sorting. Proc R Soc B Biol Sci. 2017;284: 20162756.
  50. Ptak SE, Hinds DA, Koehler K, Nickel B, Patil N, Ballinger DG, et al. Fine-scale recombination patterns differ between chimpanzees and humans. Nat Genet. 2005;37: 429–434. pmid:15723063
  51. Manzano-Winkler B, McGaugh SE, Noor MAF. How Hot Are Drosophila Hotspots? Examining Recombination Rate Variation and Associations with Nucleotide Diversity, Divergence, and Maternal Age in Drosophila pseudoobscura. PLoS One. 2013;8: 1–6.
  52. Tsai IJ, Burt A, Koufopanou V. Conservation of recombination hotspots in yeast. Proc Natl Acad Sci. 2010;107: 7847–7852. pmid:20385822
  53. Smukowski CS, Noor MAF. Recombination rate variation in closely related species. Heredity. 2011;107: 496–508. pmid:21673743
  54. Felsenstein J. Phylogenies and the comparative method. Am Nat. 1985;125: 1–15.
  55. Anderson MJ, Crist TO, Chase JM, Vellend M, Inouye BD, Freestone AL, et al. Navigating the multiple meanings of β diversity: A roadmap for the practicing ecologist. Ecol Lett. 2011;14: 19–28. pmid:21070562
  56. Raup DM, Crick RE. Measurement of Faunal Similarity in Paleontology. J Paleontol. 1979;53: 1213–1227.
  57. Chase JM, Kraft NJB, Smith KG, Vellend M, Inouye BD. Using null models to disentangle variation in community dissimilarity from variation in α-diversity. Ecosphere. 2011;2.
  58. Falconer DS, Mackay TFC. Introduction to Quantitative Genetics. 1996.
  59. Chen F, Tholl D, Bohlmann J, Pichersky E. The family of terpene synthases in plants: A mid-size family of genes for specialized metabolism that is highly diversified throughout the kingdom. Plant J. 2011;66: 212–229. pmid:21443633
  60. Wray NR, Wijmenga C, Sullivan PF, Yang J, Visscher PM. Common Disease Is More Complex Than Implied by the Core Gene Omnigenic Model. Cell. 2018;173: 1573–1580. pmid:29906445
  61. Visscher PM, Brown MA, McCarthy MI, Yang J. Five years of GWAS discovery. Am J Hum Genet. 2012;90: 7–24.
  62. Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, et al. 10 Years of GWAS Discovery: Biology, Function, and Translation. Am J Hum Genet. 2017;101: 5–22. pmid:28686856
  63. Tishkoff SA, Reed FA, Ranciaro A, Voight BF, Babbitt CC, Silverman JS, et al. Convergent adaptation of human lactase persistence in Africa and Europe. Nat Genet. 2007;39: 31–40. pmid:17159977
  64. Ilardo MA, Moltke I, Korneliussen TS, Cheng J, Stern AJ, Racimo F, et al. Physiological and Genetic Adaptations to Diving in Sea Nomads. Cell. 2018;173: 569–580.e15. pmid:29677510
  65. Bigham A, Bauchet M, Pinto D, Mao X, Akey JM, Mei R, et al. Identifying signatures of natural selection in Tibetan and Andean populations using dense genome scan data. PLoS Genet. 2010;6.