Fig 1.
Scenarios with low GT-redundancy and no additional GF-redundancy (A) or high GT-redundancy and low GF-redundancy (B) can both result in high repeatability of adaptation (adapted from [22]).
Table 1.
Drawing inferences about the nature of constraints to diversity that drive repeatability.
Fig 2.
Cchisq and Chyper provide approximately equal estimates of the magnitude of the diversity constraints driving repeatability, whereas the estimated proportion of all genes that could potentially contribute to adaptation is not collinear with the C-scores.
Plots show values calculated for simulated datasets generated by randomly drawing two arrays with gs genes, with ai loci adapted in one array and ai + 20 in the other, and then sorting a proportion of the rows in each array to artificially generate more repeatability than would occur by chance (with a different proportion sorted in each replicate). In panels A and C, gs = 200; in panels B and D, ai = 10. The estimated proportion of genes that could potentially contribute to adaptation was calculated using Eq 4.
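The simulation procedure described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `simulate_pair` and `count_overlap` are hypothetical, and "sorting a proportion of the rows" is interpreted here as sorting the first `prop_sorted * gs` rows of each array so that adapted genes line up at the top.

```python
import numpy as np

def simulate_pair(gs, ai, extra=20, prop_sorted=0.5, seed=None):
    """Draw two 0/1 arrays of gs genes and partially sort them.

    One array has ai adapted genes, the other ai + extra.
    Assumption: the first round(prop_sorted * gs) rows of each array
    are sorted in descending order, which aligns adapted genes and
    inflates overlap beyond the random expectation.
    """
    rng = np.random.default_rng(seed)
    a = np.zeros(gs, dtype=int)
    b = np.zeros(gs, dtype=int)
    a[:ai] = 1
    b[:ai + extra] = 1
    rng.shuffle(a)  # random placement of adapted genes
    rng.shuffle(b)
    k = int(round(prop_sorted * gs))
    a[:k] = np.sort(a[:k])[::-1]  # adapted genes moved to the top of the block
    b[:k] = np.sort(b[:k])[::-1]
    return a, b

def count_overlap(a, b):
    """Number of genes adapted in both arrays."""
    return int(np.sum((a == 1) & (b == 1)))
```

With `prop_sorted=0` the overlap reflects chance alone, and with `prop_sorted=1` every adapted gene in the smaller set is shared, so varying this value across replicates spans the range from unconstrained to fully repeatable.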
Fig 3.
Four example datasets showing different levels of convergent adaptation and a comparison of different indices assessing overlap among adapted genes.
Scenario A is unconstrained and exactly equal to the mean expectation under a random draw; scenarios B & C show the same amount of overlap (as) and number of adaptively mutated genes (ai), but scenario C is drawn from a larger number of potential genes (gs). Scenario D has the same proportion of overlap as B & C, but twice as many adapted genes.
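To make the comparison between scenarios concrete, the sketch below computes a Jaccard index, the mean overlap expected under a random draw, and a standardized excess overlap. It is an illustration only: the helper names are hypothetical, and the standardized score uses the textbook hypergeometric mean and variance, which may differ in detail from the paper's definition of Chyper.

```python
import math

def jaccard(set_a, set_b):
    """Shared adapted genes divided by all adapted genes in either lineage."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0

def expected_overlap(a1, a2, gs):
    """Mean overlap when a1 and a2 adapted genes are drawn at random from gs."""
    return a1 * a2 / gs

def standardized_overlap(observed, a1, a2, gs):
    """(observed - expected) / sd under a hypergeometric random draw.

    Assumption: this is a generic effect-size-style score, not
    necessarily the exact formula used for Chyper in the paper.
    """
    p = a1 / gs
    var = a2 * p * (1 - p) * (gs - a2) / (gs - 1)
    return (observed - expected_overlap(a1, a2, gs)) / math.sqrt(var)
```

Holding the overlap and the number of adapted genes fixed while increasing `gs` raises the standardized score, which mirrors the contrast between scenarios B and C: the same overlap is more surprising when drawn from a larger pool of potential genes.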
Fig 4.
C-score indices of constraint are qualitatively similar to Jaccard and PSadd indices of repeatability when simulations have a constant size of mutational target (A), but differ when simulations vary in the size of mutational target (B).
The estimated proportion of the genome accessible to adaptation shows qualitatively similar patterns to the C-scores, decreasing in scenarios with higher C-scores and higher constraints. In panel A, all runs have ns = gs = 100 loci, with u large-effect loci and (100 − u) small-effect loci. In panel B, there are 10 large-effect loci and v small-effect loci. In both scenarios, simulations were run with N = 10,000 individuals in each patch, a recombination rate of r = 0.5 between loci, and a per-locus mutation rate of 10^-5. The calculation of Chyper is based on categorizing genes as adapted when FST > 0.1, while the calculation of Cchisq is based on FST standardized by subtracting the minimum value and dividing by the maximum within each lineage.
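The two preprocessing steps named in this caption can be sketched as below. This is a hedged illustration: the function names are hypothetical, the input is assumed to be a lineages-by-genes array of FST values, and "dividing by the maximum" is interpreted as dividing by the maximum of the shifted values so that each lineage is rescaled to the range 0 to 1.

```python
import numpy as np

def categorize_adapted(fst, threshold=0.1):
    """Binary call per gene: adapted when FST exceeds the threshold."""
    return np.asarray(fst, dtype=float) > threshold

def standardize_within_lineage(fst):
    """Rescale FST within each lineage (row).

    Assumption: subtract the row minimum, then divide by the maximum
    of the shifted row. Rows with no variation are returned as zeros.
    """
    fst = np.asarray(fst, dtype=float)
    shifted = fst - fst.min(axis=1, keepdims=True)
    peak = shifted.max(axis=1, keepdims=True)
    return np.divide(shifted, peak, out=np.zeros_like(shifted), where=peak > 0)
```

The binary calls would feed an overlap-based score, while the standardized values keep the continuous signal, which is why the two C-scores start from different inputs.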
Fig 5.
Incomplete sampling of the genome causes a bias in the estimation of C-scores (Chyper and Cchisq), but this can be adjusted by using a correction factor (Chyper-adj) or resampling from the existing dataset up to the estimated genome size (Cchisq-adj).
These approaches yield unbiased C-scores, although the variance of the estimates increases due to sampling effects when the proportion of sampled genes (q) is small. The figure shows estimates for 10 replicate subsamples performed for each value of q.