Figure 1.
The effect of divergence-based ascertainment of the site frequency spectrum.
Coalescent simulations were performed to generate 10^6 unlinked genomic regions, each with 20 ingroup individuals and a single outgroup sequence, with a species divergence time of 5.0 (in units of 4Ne generations) and a fixed value of and
. Levels of divergence were then calculated in these regions by selecting a single, random ingroup sequence and comparing it to the outgroup. From this comparison, those regions within the observed upper and lower 1% of divergence were retained. Shown are the unfolded site frequency spectra from the upper (blue) and lower (orange) 1% regions compared to the expected site frequency spectrum under the standard neutral model (black). Ascertainment based on low levels of divergence biases the recovered site frequency spectrum towards rare alleles, relative to the standard neutral model, whereas ascertainment based on elevated levels of divergence biases the site frequency spectrum towards intermediate and high frequency alleles.
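The ascertainment step described in this caption can be sketched in a few lines. This is a minimal illustration, not the simulation code used for the figure: the function names, the `tail` parameter, and the input arrays (`divergence`, one value per region, and `sfs_counts`, one unfolded spectrum per region) are all assumptions introduced here, and the coalescent simulation itself is assumed to have been run elsewhere.

```python
import numpy as np

def ascertain_by_divergence(divergence, sfs_counts, tail=0.01):
    """Pool unfolded SFS counts over regions in the lower and upper divergence tails.

    divergence : shape (R,), divergence between one randomly chosen ingroup
                 sequence and the outgroup, one value per region (hypothetical input).
    sfs_counts : shape (R, n - 1), where sfs_counts[r, i - 1] is the number of
                 sites in region r whose derived allele is carried by i of the
                 n ingroup sequences (hypothetical input).
    tail       : fraction of regions retained in each tail (0.01 for 1%).
    """
    divergence = np.asarray(divergence, dtype=float)
    sfs_counts = np.asarray(sfs_counts, dtype=float)
    lo_cut, hi_cut = np.quantile(divergence, [tail, 1.0 - tail])
    # Ties at the cutoffs are all retained, so slightly more than `tail`
    # of the regions can be kept when divergence takes few distinct values.
    lower = sfs_counts[divergence <= lo_cut].sum(axis=0)
    upper = sfs_counts[divergence >= hi_cut].sum(axis=0)
    return lower, upper

def neutral_sfs_proportions(n):
    """Expected unfolded SFS under the standard neutral model, as proportions.

    The expected number of sites with derived allele count i is proportional
    to 1 / i for i = 1 .. n - 1.
    """
    i = np.arange(1, n)
    weights = 1.0 / i
    return weights / weights.sum()
```

Dividing each pooled spectrum by its sum gives proportions that can be compared directly with `neutral_sfs_proportions(20)`, which is the comparison shown in the figure.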
Figure 2.
The effect of species divergence time on divergence-based ascertainment bias.
Coalescent simulations were performed as in Figure 1, except that the species divergence time was varied between 5 and 100 (in units of 4Ne generations; see Figure 1 caption for simulation details). Shown are values of Tajima's D from the upper (blue) and lower (orange) 1% regions. A dotted line at D = 0 is shown for reference to the neutral expectation. The strength of the divergence-based ascertainment bias decreases as a function of species divergence time, but does so very slowly, such that an appreciable effect remains at a species divergence time of 100.
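For readers who want to reproduce the summary statistic, Tajima's D can be computed directly from an unfolded site frequency spectrum using the standard constants from Tajima (1989). The sketch below is illustrative only: the function name and input layout are assumptions, and the caption does not state how spectra were pooled across regions before D was calculated.

```python
import numpy as np

def tajimas_d(sfs):
    """Tajima's D from an unfolded SFS.

    sfs[i - 1] is the number of segregating sites whose derived allele is
    carried by i of the n sampled sequences, for i = 1 .. n - 1.
    Returns NaN when there are no segregating sites.
    """
    sfs = np.asarray(sfs, dtype=float)
    n = sfs.size + 1
    i = np.arange(1, n)
    S = sfs.sum()
    if S == 0:
        return float("nan")
    # Mean pairwise difference: a site with derived count i differs in
    # i * (n - i) of the n * (n - 1) / 2 sequence pairs.
    pi = (i * (n - i) * sfs).sum() / (n * (n - 1) / 2.0)
    a1 = (1.0 / i).sum()
    a2 = (1.0 / i**2).sum()
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n**2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Numerator: pairwise diversity minus Watterson's estimator S / a1.
    return (pi - S / a1) / np.sqrt(e1 * S + e2 * S * (S - 1))
```

A spectrum skewed towards rare alleles gives D < 0, and one skewed towards intermediate frequencies gives D > 0, which matches the directions of bias described for the lower and upper 1% regions. The variance term assumes a single non-recombining locus, so for spectra pooled over unlinked regions the sign of D is more informative than its magnitude.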
Figure 3.
ML estimates of the site frequency spectrum from divergence ascertained data.
Data used in the top and bottom panels correspond to the lower and upper 1% divergence-ascertained data, respectively, simulated via a coalescent method (see Figure 1 caption for details). Shown are the unfolded site frequency spectra from the divergence-ascertained data (orange), ascertainment-corrected data (blue), and the expected standard neutral model spectrum (black).
Figure 4.
ML estimates of the strength of selection with and without divergence ascertainment correction for lower 1% data.
The strength of selection, α, was estimated from those loci identified as belonging to the lower 1% divergence group from the 10^6 simulated regions (see Figure 1 caption for details). The left box shows MLEs from an estimation routine that does not account for divergence-based ascertainment (uncorrected); the right box shows MLEs from the same data, estimated in a fashion that accounts for ascertainment (corrected). Note that uncorrected estimates show spurious evidence for negative selection even though the data were generated from a neutral model.
Figure 5.
ML estimates of the strength of selection from simulations with selection with and without divergence ascertainment correction for lower 1% data.
The strength of selection, α, was estimated from those loci identified as belonging to the lower 1% divergence group from the 10^6 regions simulated from a deleterious alleles model with α = −5.0. Simulations were performed using a model closely related to Gillespie's exponential shift model [15], but rather than drawing from a distribution of selection coefficients, only a single coefficient is assigned to new mutations (simulation details can be found in Kern et al. [16]). This method uses a forward-time population genetic simulation approach. Population size for these simulations is 10^5; all other parameters are identical to those in Figure 1. The left box shows MLEs from an estimation routine that does not account for divergence-based ascertainment (uncorrected); the right box shows MLEs from the same data, estimated in a fashion that accounts for ascertainment (corrected). As in the neutral setting, correction restores estimates close to their true value, and evidence for negative selection that remains after correcting for ascertainment can be regarded as conservative.