Quantifying how constraints limit the diversity of viable routes to adaptation

doi:10.1371/journal.pgen.1007717

Fig 1.

Scenarios with low GT-redundancy and no additional GF-redundancy (A) or high GT-redundancy and low GF-redundancy (B) can both result in high repeatability of adaptation (adapted from [22]).

Table 1.

Drawing inferences about the nature of constraints to diversity that drive repeatability.

Fig 2.

C_chisq and C_hyper provide approximately equal estimates of the magnitude of the diversity constraints driving repeatability, while a third index estimates the proportion of all genes that could potentially contribute to adaptation and is not collinear with the C-scores.

Plots show values calculated for simulated datasets generated by randomly drawing two arrays of g_s genes, with a_i loci adapted in one array and a_i + 20 in the other, and then sorting a proportion of the rows in each array to artificially generate more repeatability than would occur by chance (a different proportion was sorted in each replicate). In panels A and C, g_s = 200; in panels B and D, a_i = 10. The estimated proportion of genes that could potentially contribute to adaptation was calculated using Eq 4.
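
The simulation procedure described in this caption can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes that "sorting a proportion of the rows" means sorting the first k = prop_sorted × g_s entries of each lineage's 0/1 vector so that adapted genes line up at the same positions. The function names (simulate_lineage, simulate_pair, shared_adapted) and the seed argument are hypothetical.

```python
import random

def simulate_lineage(g_s, n_adapted, prop_sorted, rng):
    """Return a 0/1 list of length g_s with n_adapted ones (adapted genes).

    Assumption: a proportion prop_sorted of the rows (the first k) is sorted
    so that adapted genes are pushed to the top of that block.
    """
    genes = [1] * n_adapted + [0] * (g_s - n_adapted)
    rng.shuffle(genes)                       # random draw of adapted genes
    k = round(prop_sorted * g_s)             # number of rows to sort
    genes[:k] = sorted(genes[:k], reverse=True)
    return genes

def simulate_pair(g_s, a_i, prop_sorted, seed=0):
    """Two lineages: a_i adapted genes in one, a_i + 20 in the other."""
    rng = random.Random(seed)
    x = simulate_lineage(g_s, a_i, prop_sorted, rng)
    y = simulate_lineage(g_s, a_i + 20, prop_sorted, rng)
    return x, y

def shared_adapted(x, y):
    """Count genes adapted in both lineages."""
    return sum(1 for xi, yi in zip(x, y) if xi and yi)

if __name__ == "__main__":
    for prop in (0.0, 0.5, 1.0):
        x, y = simulate_pair(g_s=200, a_i=10, prop_sorted=prop)
        print(prop, shared_adapted(x, y))
```

Under this reading, sorting every row (prop_sorted = 1.0) forces the smaller adapted set to be fully shared, while prop_sorted = 0.0 leaves the overlap at whatever the random draw produced.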

Fig 3.

Four example datasets showing different levels of convergent adaptation and a comparison of different indices assessing overlap among adapted genes.

Scenario A is unconstrained and exactly equal to the mean expectation under a random draw; scenarios B & C show the same amount of overlap (a_s) and number of adaptively mutated genes (a_i), but scenario C is drawn from a larger number of potential genes (g_s). Scenario D has the same proportion of overlap as B & C, but twice as many adapted genes.
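
To see why an overlap index can fail to separate scenarios like B and C, the sketch below compares the Jaccard index with the overlap expected under a random (hypergeometric) draw. The numbers are illustrative only and are not taken from the figure. The standardized score is one plausible way to express excess overlap relative to chance; it is an assumption made for illustration and should not be read as the paper's exact definition of C_hyper.

```python
import math

def jaccard(a_x, a_y, a_s):
    """Jaccard index: shared adapted genes over the union of adapted genes."""
    return a_s / (a_x + a_y - a_s)

def expected_overlap(g_s, a_x, a_y):
    """Mean overlap when a_y genes are drawn at random from g_s genes,
    of which a_x are adapted in the other lineage (hypergeometric mean)."""
    return a_x * a_y / g_s

def overlap_sd(g_s, a_x, a_y):
    """Standard deviation of the overlap under the same hypergeometric draw."""
    p = a_x / g_s
    var = a_y * p * (1 - p) * (g_s - a_y) / (g_s - 1)
    return math.sqrt(var)

def standardized_overlap(g_s, a_x, a_y, a_s):
    """Illustrative score (assumption): (observed - expected) / sd."""
    return (a_s - expected_overlap(g_s, a_x, a_y)) / overlap_sd(g_s, a_x, a_y)

if __name__ == "__main__":
    # Same overlap and same number of adapted genes, different gene pool size.
    for g_s in (100, 1000):
        print(g_s, jaccard(10, 10, 5), standardized_overlap(g_s, 10, 10, 5))
```

With the same a_s and a_i, the Jaccard index is identical for both pool sizes, whereas the chance-corrected score grows with g_s because the same overlap is less likely when there are more potential genes.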

Fig 4.

C-score indices of constraint are qualitatively similar to Jaccard and PS_add indices of repeatability when simulations have a constant size of mutational target (A), but differ when simulations vary in the size of mutational target (B).

The estimated proportion of genes that could contribute to adaptation shows qualitatively similar patterns to the C-scores, with a decreasing proportion of the genome accessible to adaptation in scenarios with higher C-scores and stronger constraints. In panel A, all runs have n_s = g_s = 100 loci, with u large-effect loci and (100 − u) small-effect loci. In panel B, there are 10 large-effect loci and v small-effect loci. In both scenarios, simulations were run with N = 10,000 individuals in each patch, a recombination rate of r = 0.5 between loci, and a per-locus mutation rate of 10⁻⁵. The calculation of C_hyper is based on categorizing genes as adapted when F_ST > 0.1, while the calculation of C_chisq is based on F_ST standardized by subtracting the minimum value and dividing by the maximum within each lineage.
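
The two data preparations named in this caption (a binary adapted/non-adapted call for C_hyper and a standardized continuous score for C_chisq) could look roughly like the sketch below. The function names are hypothetical. The caption does not say whether "the maximum" is taken before or after subtracting the minimum; this sketch assumes it is taken after, so that scores within each lineage span 0 to 1.

```python
def categorize_adapted(fst, threshold=0.1):
    """Binary call per gene: 1 if F_ST exceeds the threshold, else 0."""
    return [1 if value > threshold else 0 for value in fst]

def standardize_within_lineage(fst):
    """Subtract the lineage minimum, then divide by the maximum.

    Assumption: the maximum is that of the shifted values, which maps
    the lineage's scores onto the interval [0, 1].
    """
    lowest = min(fst)
    shifted = [value - lowest for value in fst]
    highest = max(shifted)
    if highest == 0:                 # all genes identical: no signal to scale
        return [0.0 for _ in shifted]
    return [value / highest for value in shifted]

if __name__ == "__main__":
    lineage_fst = [0.02, 0.15, 0.08, 0.40]   # made-up values for illustration
    print(categorize_adapted(lineage_fst))
    print(standardize_within_lineage(lineage_fst))
```

The binary call discards effect-size information that the standardized score retains, which is one reason the two C-scores are computed from different inputs.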

Fig 5.

Incomplete sampling of the genome biases the estimation of C-scores (C_hyper and C_chisq), but this can be adjusted by using a correction factor (C_hyper-adj) or by resampling from the existing dataset up to the estimated genome size (C_chisq-adj).

These approaches yield unbiased C-scores, although the variance of the estimates increases due to sampling effects when the proportion of sampled genes (q) is small. The figure shows estimates for 10 replicate subsamples performed for each value of q.
