An Approximate Bayesian Estimator Suggests Strong, Recurrent Selective Sweeps in Drosophila

doi:10.1371/journal.pgen.1000198

Figure 1.

A cartoon representation of the difference between models of common weak and rare strong selection.

On the X-axis is distance along a chromosome in kilobases (kb), and on the Y-axis is variability. The dotted line represents the average heterozygosity, and the solid bars represent loci sequenced for polymorphism data. As shown, under the weak selection model each individual selective fixation impacts a small genomic region, though sweeps occur frequently. The combination results in a homogenizing effect across the chromosome. Alternatively, under the strong selection model each fixation impacts a large genomic region. However, because selection is rare, other regions will appear to be at equilibrium. Thus, when loci are sampled under these models, the mean level of variation among loci may be identical, but the variance between loci will be far greater under the strong selection model, with some loci falling in regions of severely reduced variation and others in neutral regions.

Table 1.

Definitions of commonly used symbols.

Figure 2.

The ratio of the coefficient of variation (CV) of π under four recurrent selection models to the CV of π under equilibrium neutrality, for four selection coefficients (s = 1E−02, 1E−03, 1E−04, and 1E−05).

n = 25. (A) Drosophila-like parameters, ρ/θ = 10, ρ = 0.1/site, θ = 0.01/site. (B) Drosophila-like parameters, ρ/θ = 20, ρ = 0.2/site, θ = 0.01/site. (C) Human-like parameters, ρ/θ = 1, ρ = 0.002/site, θ = 0.002/site. The selection coefficient, s, and rate of advantageous substitution, 2Nλ, differ among selection models, though their product remains the same for each given value of ρ/θ (sλ = 2.5E−13 for ρ/θ = 10, 20; sλ = 5E−11 for ρ/θ = 1 and N = 10⁶). 1000 replicates were generated under each model for each data point. As seen, the models begin to differentiate from one another as the size of the sampled region increases, suggesting greater power to distinguish weak and strong selection models at larger physical scales.
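The quantity plotted in Figure 2 can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the caption does not say which estimator was used.

```python
import statistics


def coefficient_of_variation(values):
    """Return SD / mean for a list of per-replicate values of pi.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean


def cv_ratio(pi_selection, pi_neutral):
    """Ratio of the CV of pi under a selection model to the CV under neutrality."""
    return coefficient_of_variation(pi_selection) / coefficient_of_variation(pi_neutral)
```

A ratio near 1 would mean the selection model is no more variable among replicates than the neutral model; larger values correspond to the greater among-locus variance described for strong selection.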

Figure 3.

Distributions of Fay and Wu's H-statistic [5] and Tajima's D-statistic [45] under common weak and rare strong selection models.

(A) The distribution of Fay and Wu's H for 500 bp regions. (B) The distribution of Fay and Wu's H for 100 kb regions. (C) The distribution of Tajima's D for 500 bp regions. (D) The distribution of Tajima's D for 100 kb regions. 1000 replicates were generated under each model and the following parameters were fixed: ρ = 0.1/site, θ = 0.01/site (thus, ρ/θ = 10), and n = 25. The selection coefficient, s, and rate, 2Nλ, differ among models, though their product is the same (2Nλs = 5.0E−07). As shown in [9], the mean H is positive under a recurrent sweep model. However, while we confirm that the means are positive and nearly identical for 2Nλs = constant, we find that previous attempts to differentiate these models have likely been hampered by the scale of the regions considered. Specifically, while the distributions for both statistics appear similar for 500 bp regions, they are quite distinct at larger physical scales (i.e., 100 kb).
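For readers who want to compute the two statistics in Figure 3, the sketch below derives them from an unfolded site frequency spectrum using the standard definitions: Tajima's D as the normalized difference between π and Watterson's estimator, and Fay and Wu's H as the unnormalized difference π − θ_H. The function name and the input layout are assumptions made for illustration, and the values are totals for the region rather than per-site quantities.

```python
import math


def sfs_statistics(sfs):
    """Compute pi, theta_H, Tajima's D and Fay and Wu's H for one region.

    sfs[i - 1] is the number of segregating sites at which the derived
    allele is carried by i of the n sampled chromosomes (i = 1 .. n - 1).
    """
    n = len(sfs) + 1
    pairs = n * (n - 1)
    seg_sites = sum(sfs)

    # Pairwise diversity and the estimator weighted toward high-frequency derived alleles.
    pi = sum(x * 2 * i * (n - i) / pairs for i, x in enumerate(sfs, start=1))
    theta_h = sum(x * 2 * i * i / pairs for i, x in enumerate(sfs, start=1))

    # Constants from Tajima's variance approximation.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    if variance > 0:
        tajima_d = (pi - seg_sites / a1) / math.sqrt(variance)
    else:
        tajima_d = float("nan")  # undefined without segregating sites

    return {
        "S": seg_sites,
        "pi": pi,
        "theta_H": theta_h,
        "tajima_D": tajima_d,
        "fay_wu_H": pi - theta_h,
    }
```

Applying the function to each simulated replicate and collecting the results would give distributions of the kind shown in the four panels.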

Figure 4.

Approximate Bayesian estimation of the strength and rate of selection as well as the neutral θ, when estimation is based upon the means and SDs of π, S, θ_H and ZnS.

The model is one in which s and 2Nλ are fixed. For the strong selection case, s = 1.0E−02 and 2Nλ = 2.0E−05; for the weak selection case, s = 1.0E−04 and 2Nλ = 2.0E−03. ρ = 0.1/site and θ = 0.01/site. Shown are the distributions of 1000 MAP estimates. The dotted lines indicate the true values. The distributions for the 10×50 kb datasets are given in black, and those for the 1000×500 bp datasets in gray. As shown, the use of these multiple summary statistics improves estimation relative to π alone (Figure S1), reducing the RMSEs (Table S1).
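The general idea of estimating a parameter from the means and SDs of summary statistics across loci can be illustrated with a plain rejection-sampling ABC on a toy model. This sketch is not the authors' estimator: the simulator is a hypothetical placeholder that draws per-locus values from an exponential distribution (a real analysis would simulate polymorphism data under recurrent hitchhiking), only one parameter is inferred, and the MAP estimate is taken crudely as the midpoint of the fullest histogram bin.

```python
import math
import random
import statistics


def summarize(locus_values):
    """Mean and SD across loci, mirroring the 'means and SDs' used as summaries."""
    return (statistics.fmean(locus_values), statistics.stdev(locus_values))


def toy_simulator(theta, n_loci, rng):
    """Hypothetical placeholder: per-locus diversity drawn with mean theta."""
    return [rng.expovariate(1.0 / theta) for _ in range(n_loci)]


def abc_rejection(observed, prior_low, prior_high, n_loci, n_draws, keep_fraction, rng):
    """Keep the prior draws whose simulated summaries lie closest to the observed ones."""
    obs = summarize(observed)
    scored = []
    for _ in range(n_draws):
        theta = rng.uniform(prior_low, prior_high)  # uniform prior (an assumption)
        sim = summarize(toy_simulator(theta, n_loci, rng))
        # Euclidean distance on summaries, each scaled by its observed value.
        dist = math.sqrt(sum(((s - o) / o) ** 2 for s, o in zip(sim, obs)))
        scored.append((dist, theta))
    scored.sort()
    n_keep = max(1, int(n_draws * keep_fraction))
    return [theta for _, theta in scored[:n_keep]]


def map_estimate(samples, n_bins=20):
    """Crude posterior mode: midpoint of the histogram bin with the most samples."""
    low, high = min(samples), max(samples)
    if low == high:
        return low
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for x in samples:
        counts[min(int((x - low) / width), n_bins - 1)] += 1
    best = counts.index(max(counts))
    return low + (best + 0.5) * width
```

Repeating this procedure over many simulated datasets with known parameters, and recording one MAP estimate per dataset, would produce a distribution of estimates comparable in spirit to those plotted in the figure.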

Figure 5.

Approximate Bayesian estimation of the strength and rate of selection as well as the neutral θ, when estimation is based upon the means and SDs of π, S, θ_H and ZnS.

The true model is one in which s and 2Nλ for each locus are drawn from exponential distributions. The mean s = 1.0E−02, and the mean 2Nλ = 2.0E−05 (given by dotted lines). Shown are the distributions of 1000 MAP estimates. ρ is drawn from a Normal(0.1, 0.05) distribution, and θ is fixed at 0.01/site. Results are given for estimation when priors are constructed under a distributed parameter model, as well as under a fixed parameter model (see Methods), for 10×50 kb and 1000×500 bp regions. As shown, falsely assuming fixed selection parameters leads to consistent biases in estimation, whereas appropriately constructing the priors reduces the bias (see also Table S1).
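The difference between the fixed and distributed parameter models can be made concrete with a short sketch of how per-locus parameters might be drawn. The function and key names are hypothetical, and two choices are assumptions not stated in the caption: non-positive draws of ρ are rejected and redrawn, and ρ varies among loci under both models.

```python
import random


def draw_locus_parameters(n_loci, mean_s, mean_rate, rho_mean, rho_sd, rng, distributed=True):
    """Draw s, 2N*lambda and rho for each locus.

    distributed=True  -> s and 2N*lambda are exponential with the given means.
    distributed=False -> s and 2N*lambda are fixed at the given means.
    """
    loci = []
    for _ in range(n_loci):
        if distributed:
            s = rng.expovariate(1.0 / mean_s)
            rate = rng.expovariate(1.0 / mean_rate)
        else:
            s, rate = mean_s, mean_rate
        # Assumption: redraw until rho is positive (truncated normal).
        rho = rng.gauss(rho_mean, rho_sd)
        while rho <= 0:
            rho = rng.gauss(rho_mean, rho_sd)
        loci.append({"s": s, "two_N_lambda": rate, "rho": rho})
    return loci
```

For example, `draw_locus_parameters(10, 1.0e-2, 2.0e-5, 0.1, 0.05, random.Random(0))` would give one replicate of ten loci under the distributed model with the means listed in the caption.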

Figure 6.

Marginal posterior distributions of s, 2Nλ, and θ, for the 137-locus dataset of [11], when estimation is based upon the means and SDs of π, S, θ_H and ZnS.

Results are given when the priors are constructed assuming fixed selection parameters, as well as when parameters for each locus are drawn from distributions (see Methods). In order to model the dataset under consideration, priors are constructed such that each replicate consists of 137 loci each of the observed length. n = 12, ρ = 0.121, and N_e = 1.87 × 10⁶ (in accord with the estimates of [11]). Consistent with the simulation results, assuming a model in which selection coefficients are fixed leads to larger estimates of ŝ, and reduced estimates of .

Table 2.

Comparing empirical estimates with estimated demographic models^a.

Table 3.

Comparing estimates of recurrent hitchhiking model parameters in Drosophila.
