Identifying adaptively important loci in recently bottlenecked populations – be it natural selection acting on a population following the colonization of novel habitats in the wild, or artificial selection during the domestication of a breed – remains a major challenge. Here we report the results of a simulation study examining the performance of available population-genetic tools for identifying genomic regions under selection. To illustrate our findings, we examined the interplay between selection and demography in two species of Peromyscus mice, for which we have independent evidence of selection acting on phenotype as well as functional evidence identifying the underlying genotype. With this unusual information, we tested whether population-genetic-based approaches could have been utilized to identify the adaptive locus. Contrary to published claims, we conclude that the use of the background site frequency spectrum as a null model is largely ineffective in bottlenecked populations. Results are quantified both for site frequency spectrum and linkage disequilibrium-based predictions, and are found to hold true across a large parameter space that encompasses many species and populations currently under study. These results suggest that the genomic footprint left by selection on both new and standing variation in strongly bottlenecked populations will be difficult, if not impossible, to find using current approaches.
Citation: Poh Y-P, Domingues VS, Hoekstra HE, Jensen JD (2014) On the Prospect of Identifying Adaptive Loci in Recently Bottlenecked Populations. PLoS ONE 9(11): e110579. https://doi.org/10.1371/journal.pone.0110579
Editor: Paul Hohenlohe, University of Idaho, United States of America
Received: May 13, 2014; Accepted: September 16, 2014; Published: November 10, 2014
Copyright: © 2014 Poh et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The authors confirm that all data underlying the findings are fully available without restriction. Raw sequence data are available at the NCBI Sequence Read Archive (accession numbers: SRA050092.2 and SRP017939).
Funding: The work was funded by grants from the Swiss National Science Foundation and a European Research Council (ERC) Starting Grant to JDJ. Work in the Hoekstra Lab is funded by the Howard Hughes Medical Institute. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Identifying the genes driving speciation or adaptation following the colonization of novel habitats is a major focus of both ecological and evolutionary genetics. The rapid fixation of a favorable allele by directional selection results in reduced genetic variability  and well-described skews in the frequency spectrum at linked loci via genetic hitchhiking (; and see review ). However, demographic factors alone may also produce similar patterns, particularly reductions in population size that subsequently lead to an increased rate of genetic drift. Exploring this issue analytically, Barton  demonstrated that a selective sweep had similar effects on neutral diversity as a founder event. In particular, the coalescence events induced by the size reduction, followed by population growth, result in a scenario in which the distribution of neutral genealogies matches that expected under a selective sweep model (for further discussion, see review ). Despite this important result, it has nonetheless been proposed that because demographic events affect the entire genome, whereas selective events have only locus-specific effects (e.g., ), it may be possible to take a simple outlier approach to identify recently selected loci . However, consistent with the analytical results, it subsequently has been demonstrated via simulation that such outlier-based genomic scans based upon neutral equilibrium null models are prone to high false positive rates , , , owing to an inability to distinguish neutral non-equilibrium models from non-neutral equilibrium models.
To circumvent these difficulties, Nielsen and colleagues  proposed the idea of utilizing the background site frequency spectrum (SFS) as a null model in a statistic termed Sweepfinder. In brief, rather than depending upon comparison with the standard neutral model, this class of tests simply would identify putatively adaptive loci that are unusual relative to the background level of genomic variation. With the same notion, but utilizing patterns of linkage-disequilibrium (LD) instead of the SFS as with Sweepfinder, the ωmax  and iHS  statistics also have been proposed. Particularly with the emergence of next-generation sequencing, an ever-increasing number of studies have relied on these promising ‘background-effect-based’ approaches – utilizing huge amounts of data to construct the background SFS/LD (thus controlling for demography, in principle) – to identify loci contributing to a local adaptive response (e.g., –).
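To make the notion of a background SFS concrete, the following minimal sketch tabulates an unfolded spectrum from per-site derived-allele counts. It illustrates only the background spectrum that serves as the null, not the Sweepfinder composite-likelihood calculation itself; the function name and the input format (one derived-allele count per SNP) are assumptions made for illustration.

```python
from collections import Counter


def background_sfs(derived_counts, n):
    """Return the unfolded SFS as proportions for derived counts 1..n-1.

    derived_counts: derived-allele count at each site (hypothetical input format).
    n: number of sampled chromosomes.
    """
    # Keep only segregating sites; counts of 0 or n carry no frequency information.
    segregating = [k for k in derived_counts if 0 < k < n]
    if not segregating:
        raise ValueError("no segregating sites in the background")
    tally = Counter(segregating)
    total = len(segregating)
    return [tally.get(k, 0) / total for k in range(1, n)]


# Toy example: 9 sites scored in n = 6 chromosomes (two sites are not segregating).
counts = [1, 1, 1, 2, 3, 1, 5, 0, 6]
sfs = background_sfs(counts, n=6)
```

A region whose local spectrum departs strongly from this genome-wide vector is what a background-based test would flag. The argument developed in this paper is that, after a severe bottleneck, the background itself may already resemble a swept region.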
Because a great majority of these applications seek to identify adaptively significant loci in severely bottlenecked populations (e.g., populations that have recently colonized novel habitats or domesticated species), and in light of Barton's  important analytical results suggesting that the background SFS may not in fact be distinct relative to a swept region in bottlenecked populations, here we revisit the notion that the background SFS may be used to distinguish adaptively important loci in non-equilibrium populations. Thus, building on the results of Pavlidis et al. , we directly evaluate the ability of these approaches to (1) identify selected loci within recently bottlenecked populations (rather than considering neutral bottleneck models vs. equilibrium selection models) across a wide range of bottleneck scenarios, and (2) localize the site of the beneficial fixation.
To test the utility of these approaches, we first focused on two particularly illustrative examples. First, we used the oldfield mouse (Peromyscus polionotus) from Florida's Gulf Coast, in which the selected phenotype (cryptic camouflage; ), and its underlying genotype (a single non-synonymous mutation [Arg65Cys] in the melanocortin-1 receptor [Mc1r]; ) are well documented. In addition, both the geological age of the islands  and the time and severity of the colonization bottleneck have been estimated . Specifically, the derived Mc1r allele contributes to lighter camouflaging pigment of the Santa Rosa Island beach mice (P. p. leucocephalus) relative to the darkly pigmented, ancestral mainland subspecies (P. p. subgriseus) (Figure 1A; , ). Thus, it is reasonable to expect an identifiable selective sweep signal around the Mc1r gene using the aforementioned population-genetics approach. However, we were unable to detect any significant signal in Mc1r or its surrounding regions by either SFS-based or linkage disequilibrium (LD)-based methods (Figure 1; also see ), despite the unusually precise knowledge of recent selection acting on genotype/phenotype.
Geographic location and photos of the derived light and ancestral dark mouse populations from (A) Florida (photos by J. Miller and S. Carey) and (B) Nebraska (photos by C. Linnen) (top panel). Cartoon representation of the inferred demographic model of the two species (middle panel). Both models include selection acting on the bottlenecked population (with effective population size reduced to fNe, where Ne is the ancestral population size) immediately after the divergence from the ancestral population at time d, and the selected allele becomes fixed at time τ. Likelihood ratio (LR) profile of Sweepfinder in both populations of light-colored mice (bottom panel), where the horizontal line indicates the significance cutoff. Stars indicate the approximate location of causal mutations conferring light pigmentation. Because there are multiple Agouti alleles, we here polarize (into “light” or “dark” class) based on the SNP most strongly associated with pigment variation (as described in ).
In a second example, populations of P. maniculatus in Nebraska have recently evolved cryptic coloration in a novel light substrate habitat as a result of the formation of the Sand Hills approximately 10,000 years ago , . The Nebraska Sand Hills mice have accumulated multiple adaptive mutations within the pigmentation locus Agouti (Figure 1B). Unlike in beach mice, however, Sweepfinder detected large and strong selective footprints around SNPs associated with different pigmentation traits (Figure 1; ). This Sand Hills population has experienced a recent population reduction similar in both timing and severity to that of beach mice. For reference, both of these bottlenecks are more extreme than that of human populations out of Africa , but comparable to the population reduction associated with dog breed formation .
Here, we explore the major factors contributing to this difference in performance between the Florida and Nebraska mouse populations – and more broadly explore the parameter space over which population-genetic approaches may be expected to be successful via simulation. While this study is motivated by the results observed in Peromyscus (as this is in many ways a ‘best-case scenario,’ in which selective pressure, phenotype, and underlying genotype are all well described), our results are broadly applicable across systems as the field continues to maintain a strong focus upon identifying locally adaptive loci in strongly bottlenecked populations that are associated with recent colonization (e.g., , ), domestication (e.g., –), and infection (e.g., –).
Materials and Methods
Empirical data analysis
To evaluate the performance of commonly used statistics to detect selective sweeps, we used two well-studied populations of Peromyscus mice (one in which signatures of selection were absent and a second in which they were strong) as a starting point. We first utilized the Santa Rosa Island population of beach mice (P. polionotus) in which a Mc1r variant contributing to cryptic coloration has been fixed . Nineteen individuals from Santa Rosa Island were sampled. The SureSelect capture array (Agilent Technologies, Santa Clara, CA) based on a Peromyscus Mc1r-containing BAC clone was designed to enrich the templates for the Mc1r locus, and then the capture library was sequenced on an Illumina HiSeq 2000 (Illumina Inc., San Diego, CA) (see ). Raw sequence data are available at the NCBI Sequence Read Archive (accession number: SRA050092.2). We used the Burrows-Wheeler Aligner (BWA) to perform mapping and alignment, and used GATK software to call the SNPs and identify genotypes.
The ancestral mainland population size was estimated to be Na ∼ 2500, representing a 99.9% population size reduction associated with the colonization of Florida's Gulf Coast approximately 3,000 years ago (∼7000 generations ago). We then applied Sweepfinder to the Mc1r genomic region and determined the significance level by ms simulation  based on the estimated demographic parameters above.
Similarly, 91 individuals of the Nebraska Sand Hills mice (P. maniculatus) were collected, and ∼180 kb encompassing the Agouti locus was sequenced. The sequence data were deposited in NCBI Sequence Read Archive (accession number SRP017939). The SureSelect capture array based on a Peromyscus Agouti-containing BAC clone was designed to enrich the templates for the Agouti locus. The sequencing and mapping strategies were identical to those used above for P. polionotus and Mc1r, and further details can be found in .
The Sand Hills mice likely colonized the novel light dunes approximately 3,000 years ago at which time they also experienced a severe bottleneck (∼99.6% reduction in population size; , ). Thus, the timing (denoted as d) and severity (denoted as f) of the bottleneck are remarkably similar in both populations. However, the size of the Nebraska population (Ne = ∼50,000; Table 1) is estimated to be 20 times greater than that of the Florida population (Ne = ∼2500; Table 1). In addition, because the derived light phenotype in the Nebraska population is not fixed in the sampled population [see 26] and Sweepfinder and ωmax are only applicable to complete sweeps, we divided the entire dataset into “light” and “dark” alleles at the Agouti locus based on the SNP most strongly associated with pigmentation, in this case, the tail stripe phenotype. Thus, the light alleles represent a population in which the selected allele has been recently fixed, while the dark alleles are used as a reference population characterized by a shared demographic history.
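The partitioning of chromosomes into “light” and “dark” classes can be sketched as follows. This is an illustrative outline only: the haplotype encoding (strings of '0'/'1'), the function name, and the focal SNP index are hypothetical and do not reproduce the scripts used in the study.

```python
def split_by_focal_snp(haplotypes, focal_index, light_allele="1"):
    """Partition phased haplotypes by the allele carried at one focal SNP.

    haplotypes: list of equal-length strings of '0'/'1' (hypothetical encoding).
    focal_index: column of the SNP most strongly associated with the trait.
    light_allele: the character that marks the derived (light) allele.
    """
    light, dark = [], []
    for hap in haplotypes:
        if hap[focal_index] == light_allele:
            light.append(hap)
        else:
            dark.append(hap)
    return light, dark


# Toy data: five chromosomes, four SNPs; the focal SNP is column 2.
haps = ["0010", "1010", "0100", "0011", "1000"]
light, dark = split_by_focal_snp(haps, focal_index=2)
```

In the light subset the focal site is fixed by construction, so a test designed for complete sweeps can be applied to it, with the dark subset serving as the reference sharing the same demographic history.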
Simulated data analysis
To parameterize both demographic and selection estimates, we performed coalescent simulations using the code of Thornton and Jensen , and the parameters used here follow their definitions. In addition to the Florida and Nebraska population models, we performed general simulations to better describe the performance of these statistics. Both comparable (severity of bottleneck, f = 0.01) and less severe bottlenecks (f = 0.1 and f = 0.5) were evaluated in three different population sizes (Ne = 10⁴, 10⁵ and 10⁶). For the simulations, we used the mutation and recombination rates estimated from Mus domesticus (μ = 3.7×10⁻⁸, ; r = 5.6×10⁻⁷, ). The sample size (n; number of chromosomes) = 40 (see Figure S1 for comparisons of different sample sizes) and region length (L) = 180 kb (see Figure S2 for a comparison of different region lengths) are fixed in all simulated datasets to match the data obtained in the Peromyscus populations, thus representing realistic empirically-based parameters.
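As a rough guide to the scale of these simulations, the sketch below enumerates the nine Ne × f combinations and computes region-wide scaled parameters. It assumes the conventional scalings θ = 4NeμL and ρ = 4NerL and treats μ and r as per-site, per-generation rates; these scalings and units are assumptions for illustration and may differ from the conventions of the simulation code actually used.

```python
from itertools import product

MU = 3.7e-8          # mutation rate quoted in the text (assumed per site per generation)
R = 5.6e-7           # recombination rate quoted in the text (units assumed, see lead-in)
REGION_BP = 180_000  # simulated region length L


def scaled_parameters(ne, f):
    """Return region-wide theta and rho plus the bottlenecked size f * Ne."""
    return {
        "Ne": ne,
        "f": f,
        "theta": 4 * ne * MU * REGION_BP,  # assumed scaling: 4 * Ne * mu * L
        "rho": 4 * ne * R * REGION_BP,     # assumed scaling: 4 * Ne * r * L
        "bottleneck_size": f * ne,
    }


# Three population sizes crossed with three bottleneck severities.
grid = [scaled_parameters(ne, f)
        for ne, f in product((10**4, 10**5, 10**6), (0.01, 0.1, 0.5))]

for row in grid:
    print(row)
```

Under these assumptions θ grows linearly with Ne, which is one way to see why the number of SNPs available to any statistic differs by orders of magnitude across the simulated population sizes.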
In our demographic simulations, we considered a selective sweep on a single de novo mutation at position 90 kb (i.e., the middle of the simulated region) with selection coefficients of s = 0.001, s = 0.01 or s = 0.1 arising immediately after the divergence from the ancestral population. These models result in fixation times (τ) ranging from 0.01 to 0.3 2N generations in the past. The population size reduction occurs immediately after divergence from the ancestral population, and recovers 0.01 2N generations prior to sampling. Finally, 100 replicates for each model were generated and analyzed using the commonly used background SFS approach (Sweepfinder; ) as well as the sliding window LD method (ωmax; ). Significance cutoffs were determined via neutral simulation in ms , with the demographic model and θ fit to each case. Following Nielsen and colleagues , the 95th percentile of the statistic ΛSF denotes the threshold value. Given that the expected size of the sweep region can be approximated as 0.01 s/r base pairs , the footprints of selection should be captured within a 10 kb window surrounding the selected site (see Figure S3 for the empirically observed LD decay). We thus considered the rejections of neutrality within 10 kb as true positives (TP), and those outside the targeted region as false positives (FP). The TP and FP rates were used as the major indicator for the performance of Sweepfinder and ωmax to identify selective sweeps.
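The bookkeeping described above can be outlined in a few lines. This is a sketch under stated assumptions rather than the pipeline used in the study: the cutoff is taken as the 95th percentile of per-replicate maxima from neutral simulations, “within 10 kb” is read as at most 10 kb on either side of the selected site, and all function names and the profile format (a list of position/statistic pairs) are hypothetical.

```python
def percentile_95(values):
    """95th percentile by linear interpolation between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("empty null distribution")
    pos = 0.95 * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def sweep_footprint_bp(s, r):
    """Approximate sweep size in base pairs, 0.01 * s / r, as quoted in the text."""
    return 0.01 * s / r


def classify_replicate(profile, cutoff, selected_site=90_000, window=10_000):
    """Return (has_true_positive, has_false_positive) for one replicate.

    profile: list of (position_bp, statistic) pairs (hypothetical format).
    """
    tp = fp = False
    for pos, stat in profile:
        if stat > cutoff:
            if abs(pos - selected_site) <= window:
                tp = True   # rejection close to the selected site
            else:
                fp = True   # rejection elsewhere in the region
    return tp, fp


def tp_fp_rates(replicates, cutoff):
    """Fraction of replicates with at least one TP and with at least one FP."""
    calls = [classify_replicate(p, cutoff) for p in replicates]
    n = len(calls)
    return sum(tp for tp, _ in calls) / n, sum(fp for _, fp in calls) / n


# Toy null distribution and two toy replicates.
cutoff = percentile_95(list(range(101)))
replicates = [
    [(88_000, 120.0), (30_000, 10.0)],   # peak near the selected site
    [(150_000, 99.0), (91_000, 50.0)],   # peak far from the selected site
]
tp_rate, fp_rate = tp_fp_rates(replicates, cutoff)
```

With the rates quoted in Materials and Methods, `sweep_footprint_bp(0.1, 5.6e-7)` is under 2 kb, so even the strongest simulated sweep is expected to fall inside the 10 kb window.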
Results and Discussion
The likelihood profiles of Sweepfinder in both the Florida beach mice and Nebraska Sand Hills mice are given in Figure 1, highlighting a significant result only in the Nebraska population. To investigate this finding, we first performed a series of simulations using demographic models mimicking the population history of the Florida and Nebraska mice (Table 1), accompanied by a single hard sweep. Here we assume that selection began at the time of the split from the ancestral population, and the selected allele was fixed at 0.1 or 0.3 2N generations ago, with strengths ranging from 0.001 to 0.1. The observed median values of polymorphism in the replicates range from π = 6.5×10⁻⁵ to 6.9×10⁻⁵ for Florida-model mice and from π = 6.6×10⁻⁵ to 1.1×10⁻⁴ for Nebraska-model mice, with the SFS skewed towards rare alleles (with observed median values of Tajima's D ranging from −1.80 to −2.13).
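For readers wishing to compute the two summary statistics reported here, the sketch below evaluates π and Tajima's D from per-site derived-allele counts using the standard formulas. The function name and input format are assumptions for illustration; π is returned per region, so it must be divided by the region length to be comparable with the per-site values quoted above.

```python
from math import sqrt


def pi_and_tajimas_d(derived_counts, n):
    """Return (pi per region, Tajima's D) for n sampled chromosomes.

    derived_counts: derived-allele count at each site (hypothetical input format).
    """
    counts = [k for k in derived_counts if 0 < k < n]
    seg = len(counts)  # number of segregating sites, S
    if seg == 0:
        raise ValueError("no segregating sites")
    # Mean pairwise differences, summed over sites.
    pi = sum(2 * k * (n - k) for k in counts) / (n * (n - 1))
    # Constants from Tajima's variance derivation.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    d = (pi - seg / a1) / sqrt(e1 * seg + e2 * seg * (seg - 1))
    return pi, d


# An excess of rare alleles (all singletons) yields a negative D.
pi_rare, d_rare = pi_and_tajimas_d([1] * 10, n=10)
# An excess of intermediate-frequency alleles yields a positive D.
pi_mid, d_mid = pi_and_tajimas_d([5] * 10, n=10)
```

Strongly negative values such as those reported above indicate an excess of rare variants, a pattern expected both after a sweep and after a bottleneck followed by recovery.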
For small population sizes (i.e., the Florida population), we found that Sweepfinder has very limited power to detect recent selective fixations (Figure 2), while TP improves for larger population sizes (i.e., the Nebraska population) – though this increase in power is also associated with an increased FP rate (Table 2). Similarly, the ωmax statistic is not able to clearly discriminate the selected loci from the neutral background when the population size is small, but again TP improves as population size increases. As has been previously described (e.g., ), power diminishes quickly as the time since fixation (τ, given in units of 2N generations) increases – with Sweepfinder failing to detect any rejections of neutrality for τ = 0.3. At τ = 0.1, the power of Sweepfinder and that of ωmax are comparable. In general, the rejection rate (TP and FP) of Sweepfinder is lower than that of ωmax in both examples, though higher FP in many ways presents a greater concern. Thus, the successful empirical identification of the signature of selection in the Nebraska Sand Hills mice, relative to the Gulf Coast population, by Sweepfinder likely is attributable to the larger population size.
The simulations with demographic models mimic the history of (A) Florida beach mice (Ne = 2500) and (B) Nebraska Sand Hills mice (Ne = 50,000). The time of the bottleneck (tr = 0.1) and time since fixation (τ = 0.1) are fixed, but selection strength varies from s = 0.001 to 0.1. Ideal performance would be indicated by all replicates showing a significant signal at very small window sizes, suggesting an ability to localize the target.
To consider more generalized parameters, we examined performance across simulations of varying Ne and f (Figure 3). In general, the TP rate of Sweepfinder is higher than that of ωmax for populations of small Ne (though both approaches perform poorly), but ωmax performs better when Ne ≥ 10⁵ (i.e., for Ne = 10⁵, TP∼50%; for Ne = 10⁶, TP∼60%), regardless of the severity of the bottleneck. The improved performance of ωmax is related to SNP density, which increases with Ne. However, the TP rate of Sweepfinder remains relatively constant across varying Ne or f (TP∼10%). To further explore the effect of the timing of selection, we compared the Sweepfinder and ωmax results for a range of times since fixation (Figure 4). As suggested by the mouse examples above, Sweepfinder has no power to reject neutrality when a beneficial fixation is older than 0.1 2N generations. Similarly for ωmax, power is maximized when the sweeps are recent and occur in large populations.
Simulations with ancestral population size equal to (A) 10⁴, (B) 10⁵ and (C) 10⁶. Selection strength (s = 0.01), time since fixation (τ = 0.1), and time since bottleneck (tr = 0.1) are fixed, but bottleneck severity (f) varied from 0.01 to 0.5.
Simulations with ancestral population size equal to (A) 10⁴, (B) 10⁵ and (C) 10⁶. Selection strength (s = 0.01) and time since bottleneck (tr = 0.01) are fixed, but the time of selected allele fixation (τ) varied from 0.01 to 0.3.
The results from both the empirical examples and the more general simulations together highlight two fundamental lessons. First, the skew in the SFS associated with a selected region is not unusual relative to the background genomic patterns under a variety of bottleneck models owing to the fact that the coalescent processes underlying both the selected locus as well as the surrounding neutral loci are similar, as described by Barton . Second, LD-based expectations generally outperform SFS-based expectations under these models (particularly for large population sizes), supporting the theoretical predictions of Stephan et al.  in describing the advantages of this specific post-fixation LD expectation (i.e., elevated LD flanking the beneficial mutation, but reduced LD spanning the site), further highlighting the value of generating linkage information, rather than simply SNP frequencies, in future genomic studies . Importantly however, even this LD pattern is not exclusive to selective sweeps, and also may be generated under certain neutral bottleneck models.
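The post-fixation LD pattern described above (high LD within each flank, low LD across the selected site) is what the ω statistic is designed to capture. The sketch below assumes the commonly used definition of ω, the mean r² within the two flanks divided by the mean r² between them, maximized over the split point. It is an illustration only; it omits the sliding-window and grid details of the ωmax implementation used in the study, and all function names are hypothetical.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared correlation between two biallelic sites coded as 0/1 lists."""
    n = len(x)
    px, py = sum(x) / n, sum(y) / n
    pxy = sum(a * b for a, b in zip(x, y)) / n
    d = pxy - px * py
    return d * d / (px * (1 - px) * py * (1 - py))


def omega_max(sites):
    """Return (best_split, omega) over all splits with >= 2 SNPs per flank.

    sites: list of polymorphic SNP columns, ordered by position.
    """
    s = len(sites)
    if s < 4:
        raise ValueError("need at least four SNPs")
    r2 = [[0.0] * s for _ in range(s)]
    for i, j in combinations(range(s), 2):
        r2[i][j] = r2[j][i] = r_squared(sites[i], sites[j])
    best_split, best_omega = None, float("-inf")
    for l in range(2, s - 1):  # l SNPs on the left flank, s - l on the right
        left, right = range(l), range(l, s)
        within = (sum(r2[i][j] for i, j in combinations(left, 2))
                  + sum(r2[i][j] for i, j in combinations(right, 2)))
        n_within = l * (l - 1) // 2 + (s - l) * (s - l - 1) // 2
        between = sum(r2[i][j] for i in left for j in right)
        if between == 0:
            continue  # ratio undefined when there is no LD across the split
        omega = (within / n_within) / (between / (l * (s - l)))
        if omega > best_omega:
            best_split, best_omega = l, omega
    return best_split, best_omega


# Toy data: three identical SNPs on each flank, weak LD between flanks.
a = [1, 1, 1, 1, 0, 0, 0, 0]
b = [1, 1, 1, 0, 1, 0, 0, 0]
split, omega = omega_max([a, a, a, b, b, b])
```

In this toy example the statistic peaks at the boundary between the two blocks. The caveat in the text still applies, since neutral bottleneck genealogies can generate similar block structure.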
The ability to detect the footprint of a selective sweep in genomic data from bottlenecked populations remains an important and largely unresolved challenge. The results presented here strongly suggest that the widely utilized approach of employing the background SFS as a null model has not much improved our ability to identify true selective sweeps for much of the parameter space of interest to biologists. Troublingly, the false positive rate found by these models is often in excess of power, suggesting that the majority of significant results in such populations are likely erroneous. In the extreme case of beach mice – in which the target of selection has been functionally validated – we have not identified any existing population-genetic-based statistic capable of identifying this causal variant. In comparison, the successful identification of beneficial mutations in Nebraska mice can be attributed to its larger population size as well as the recurrent and recent selective events still ongoing in the Sand Hills population. Thus, these data underscore both a need for great caution when interpreting results from selection studies in recently bottlenecked populations and a need for continued methodological and theoretical development, specifically inference procedures capable of jointly estimating selection and demography.
The fraction of simulated replicates rejecting the neutral model by Sweepfinder, with varying sample size. The simulations use demographic models mimicking the history of Nebraska Sand Hills mice (Ne = 50,000). The time of the bottleneck (tr = 0.1), time since fixation (τ = 0.1), and selection strength (s = 0.1) are fixed, but sample size varies from n = 20 to 80. Ideal performance would be indicated by all replicates showing a significant signal at very small window sizes, suggesting an ability to localize the target.
The fraction of simulated replicates rejecting the neutral model by Sweepfinder, with and without another 180 kb simulated neutral region. The simulations of the 180 kb region use demographic models mimicking the estimated history of Nebraska Sand Hills mice (Ne = 50,000). The time of the bottleneck (tr = 0.1) and the time since fixation (τ = 0.1) are fixed, but selection strength varies from s = 0.001 to 0.1. The right panel shows the Sweepfinder performance with another 180 kb simulated neutral region, with the same demographic parameters, added. The results suggest that Sweepfinder could gain more efficacy in identifying sweeps with more neutral SNPs to build the background SFS, but the improvement is modest.
Decay of linkage disequilibrium (LD) as a function of physical distance between variable sites. In all panels the solid lines represent medians for each X axis category (physical spacing bin) centered on the plotted X coordinate, dashed lines represent means of spacing bins, and dotted lines represent 95th percentiles of spacing bins.
We would like to thank Kevin Thornton, Pavlos Pavlidis and Jessica Crisci for valuable discussion and sharing the scripts. We also thank Joanna Kelley and an anonymous reviewer for constructive comments, which have greatly improved the paper.
Conceived and designed the experiments: YPP VSD HEH JDJ. Performed the experiments: YPP. Analyzed the data: YPP. Contributed reagents/materials/analysis tools: YPP VSD HEH JDJ. Wrote the paper: YPP VSD HEH JDJ.
- 1. Maynard-Smith J, Haigh J (1974) The hitch-hiking effect of a favourable gene. Genet Res 23: 23–35.
- 2. Braverman JM, Hudson RR, Kaplan NL, Langley CH, Stephan W (1995) The hitchhiking effect on the site frequency spectrum of DNA polymorphisms. Genetics 140: 783–796.
- 3. Barton NH (2000) Genetic hitchhiking. Philos Trans R Soc Lond B Biol Sci 355: 1553–1562.
- 4. Thornton KR, Jensen JD, Becquet C, Andolfatto P (2007) Progress and prospects in mapping recent selection in the genome. Heredity 98: 340–348.
- 5. Hudson RR, Kreitman M, Aguade M (1987) A test of neutral molecular evolution based on nucleotide data. Genetics 116: 153–159.
- 6. Harr B, Kauer M, Schlotterer C (2002) Hitchhiking mapping: a population-based fine-mapping strategy for adaptive mutations in Drosophila melanogaster. Proc Natl Acad Sci U S A 99: 12949–12954.
- 7. Teshima KM, Coop G, Przeworski M (2006) How reliable are empirical genomic scans for selective sweeps? Genome Res 16: 702–712.
- 8. Crisci JL, Poh YP, Mahajan S, Jensen JD (2013) The impact of equilibrium assumptions on tests of selection. Front Genet 4: 235.
- 9. Nielsen R, Williamson S, Kim Y, Hubisz MJ, Clark AG, et al. (2005) Genomic scans for selective sweeps using SNP data. Genome Res 15: 1566–1575.
- 10. Pavlidis P, Jensen JD, Stephan W (2010) Searching for footprints of positive selection in whole-genome SNP data from nonequilibrium populations. Genetics 185: 907–922.
- 11. Voight BF, Kudaravalli S, Wen X, Pritchard JK (2006) A map of recent positive selection in the human genome. PLoS Biol 4: e72.
- 12. Molina J, Sikora M, Garud N, Flowers JM, Rubinstein S, et al. (2011) Molecular evidence for a single evolutionary origin of domesticated rice. Proc Natl Acad Sci U S A 108: 8351–8356.
- 13. Williamson SH, Hubisz MJ, Clark AG, Payseur BA, Bustamante CD, et al. (2007) Localizing recent adaptive evolution in the human genome. PLoS Genet 3: e90.
- 14. Xia Q, Guo Y, Zhang Z, Li D, Xuan Z, et al. (2009) Complete resequencing of 40 genomes reveals domestication events and genes in silkworm (Bombyx). Science 326: 433–436.
- 15. Yoder JB, Stanton-Geddes J, Zhou P, Briskine R, Young ND, et al. (2014) Genomic Signature of Adaptation to Climate in Medicago truncatula. Genetics 196: 1263–1275.
- 16. Chavez-Galarza J, Henriques D, Johnston JS, Azevedo JC, Patton JC, et al. (2013) Signatures of selection in the Iberian honey bee (Apis mellifera iberiensis) revealed by a genome scan analysis of single nucleotide polymorphisms. Mol Ecol 22: 5890–5907.
- 17. Tsumura Y, Uchiyama K, Moriguchi Y, Ueno S, Ihara-Ujino T (2012) Genome scanning for detecting adaptive genes along environmental gradients in the Japanese conifer, Cryptomeria japonica. Heredity 109: 349–360.
- 18. Barton NH (1996) Natural selection and random genetic drift as causes of evolution on islands. Philos Trans R Soc Lond B Biol Sci 351: 785–794.
- 19. Vignieri SN, Larson JG, Hoekstra HE (2010) The selective advantage of crypsis in mice. Evolution 64: 2153–2158.
- 20. Hoekstra HE, Hirschmann RJ, Bundey RA, Insel PA, Crossland JP (2006) A single amino acid mutation contributes to adaptive beach mouse color pattern. Science 313: 101–104.
- 21. McNeil FS (1950) Pleistocene shorelines in Florida and Georgia. US Geol Survey 221: 59–107.
- 22. Domingues VS, Poh YP, Peterson BK, Pennings PS, Jensen JD, et al. (2012) Evidence of adaptation from ancestral variation in young populations of beach mice. Evolution 66: 3209–3223.
- 23. Mullen LM, Vignieri SN, Gore JA, Hoekstra HE (2009) Adaptive basis of geographic variation: genetic, phenotypic and environmental differences among beach mouse populations. Proc Biol Sci 276: 3809–3818.
- 24. Ahlbrandt TS, Fryberger SG (1980) Eolian deposits in the Nebraska Sand Hills. Geological Survey Professional Paper 1120: 1–24.
- 25. Linnen CR, Kingsley EP, Jensen JD, Hoekstra HE (2009) On the origin and spread of an adaptive allele in deer mice. Science 325: 1095–1098.
- 26. Linnen CR, Poh YP, Peterson BK, Barrett RD, Larson JG, et al. (2013) Adaptive evolution of multiple traits through multiple mutations at a single gene. Science 339: 1312–1316.
- 27. Gutenkunst RN, Hernandez RD, Williamson SH, Bustamante CD (2009) Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genet 5: e1000695.
- 28. Li H, Durbin R (2011) Inference of human population history from individual whole-genome sequences. Nature 475: 493–496.
- 29. Freedman AH, Gronau I, Schweizer RM, Ortega-Del Vecchyo D, Han E, et al. (2014) Genome sequencing highlights the dynamic early history of dogs. PLoS Genet 10: e1004016.
- 30. Grandcolas P, Murienne J, Robillard T, Desutter-Grandcolas L, Jourdan H, et al. (2008) New Caledonia: a very old Darwinian island? Philos Trans R Soc Lond B Biol Sci 363: 3309–3317.
- 31. Juan C, Guzik MT, Jaume D, Cooper SJ (2010) Evolution in caves: Darwin's ‘wrecks of ancient life' in the molecular era. Mol Ecol 19: 3865–3880.
- 32. Boyko AR (2011) The domestic dog: man's best friend in the genomic era. Genome Biol 12: 216.
- 33. Groeneveld LF, Lenstra JA, Eding H, Toro MA, Scherf B, et al. (2010) Genetic diversity in farm animals—a review. Anim Genet 41 Suppl 1: 6–31.
- 34. Tian F, Stevens NM, Buckler ES (2009) Tracking footprints of maize domestication and evidence for a massive selective sweep on chromosome 10. Proc Natl Acad Sci U S A 106 Suppl 1: 9979–9986.
- 35. Denoeud F, Roussel M, Noel B, Wawrzyniak I, Da Silva C, et al. (2011) Genome sequence of the stramenopile Blastocystis, a human anaerobic parasite. Genome Biol 12: R29.
- 36. Ocana-Macchi M, Ricklin ME, Python S, Monika GA, Stech J, et al. (2012) Avian influenza A virus PB2 promotes interferon type I inducing properties of a swine strain in porcine dendritic cells. Virology 427: 1–9.
- 37. Rottschaefer SM, Riehle MM, Coulibaly B, Sacko M, Niare O, et al. (2011) Exceptional diversity, maintenance of polymorphism, and recent directional selection on the APL1 malaria resistance genes of Anopheles gambiae. PLoS Biol 9: e1000600.
- 38. Van Tyne D, Park DJ, Schaffner SF, Neafsey DE, Angelino E, et al. (2011) Identification and functional validation of the novel antimalarial resistance locus PF10_0355 in Plasmodium falciparum. PLoS Genet 7: e1001383.
- 39. Hudson RR (2002) Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18: 337–338.
- 40. Thornton KR, Jensen JD (2007) Controlling the false-positive rate in multilocus genome scans for selection. Genetics 175: 737–750.
- 41. Lynch M (2010) Evolution of the mutation rate. Trends Genet 26: 345–352.
- 42. Jensen-Seaman MI, Furey TS, Payseur BA, Lu Y, Roskin KM, et al. (2004) Comparative recombination rates in the rat, mouse, and human genomes. Genome Res 14: 528–538.
- 43. Kaplan NL, Hudson RR, Langley CH (1989) The "hitchhiking effect" revisited. Genetics 123: 887–899.
- 44. Przeworski M (2002) The signature of positive selection at randomly chosen loci. Genetics 160: 1179–1189.
- 45. Barton NH (1998) The effect of hitch-hiking on neutral genealogies. Genetical Research 123: 123–134.
- 46. Stephan W, Song YS, Langley CH (2006) The hitchhiking effect on linkage disequilibrium between linked neutral loci. Genetics 172: 2647–2663.
- 47. Cutler DJ, Jensen JD (2010) To pool, or not to pool? Genetics 186: 41–43.