Abstract
A common geographical pattern of genetic variation is the one-dimensional cline. Clines may be maintained by diversifying selection across a geographical gradient but can also reflect historical processes such as allopatry followed by secondary contact. To identify loci that may be undergoing diversifying selection, we examined the distribution of geographical variation patterns across the range of the killifish (Fundulus heteroclitus) in 310 loci, including microsatellites, allozymes, and single nucleotide polymorphisms. We employed two approaches to detect loci under strong diversifying selection. First, we developed an automated method to identify clinal variation on a per-locus basis and examined the distribution of clines to detect those that exhibited significantly steeper slopes. Second, we employed a classic FST-outlier method as a complementary approach. We also assessed the performance of these techniques using simulations. Overall, latitudinal clines were detected in nearly half of all loci genotyped (all eight microsatellite loci, 12 of 16 allozyme loci, and 44% of the 285 SNPs). With the exception of a few outlier loci (notably mtDNA and malate dehydrogenase), the positions and slopes of Fundulus clines were statistically indistinguishable. The high frequency of latitudinal clines across the genome indicates that secondary contact plays a central role in the historical demography of this species. Our simulation results indicate that accurately detecting diversifying selection using genome scans is extremely difficult in species with a strong signal of secondary contact; neutral evolution under this history produces clines as steep as those expected under selection. Based on these results, we propose that demographic history can explain all clinal patterns observed in F. heteroclitus without invoking natural selection to either establish or maintain the pattern we observe today.
Citation: Strand AE, Williams LM, Oleksiak MF, Sotka EE (2012) Can Diversifying Selection Be Distinguished from History in Geographic Clines? A Population Genomic Study of Killifish (Fundulus heteroclitus). PLoS ONE 7(9): e45138. https://doi.org/10.1371/journal.pone.0045138
Editor: William J. Etges, University of Arkansas, United States of America
Received: April 24, 2012; Accepted: August 14, 2012; Published: September 26, 2012
Copyright: © Strand et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was supported by the National Science Foundation (DEB-0919064 and IOS-1052262; www.nsf.gov). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Clinal variation is a common feature of terrestrial, freshwater [1], [2] and marine species [3]–[5]. This list includes Homo sapiens, whose geographic patterns of polymorphism are largely clinal [6]. In a review of John Endler’s influential monograph on geographic clines, David Woodruff quipped that clines and hybrid zones are “taxonomist’s nightmares and evolutionist’s delights” because they inform several vexing issues in evolution, including natural selection, gene flow, historical demography, and the definition and generation of species [7], [8].
Historically, clinal patterns were considered to be the consequence of diversifying selection [9], especially in cases in which a cline coincides with geographic shifts in abiotic or biotic factors (or an ecotone [7]). It is now broadly recognized that diversifying selection is not a necessary condition for the establishment of clinal patterns. Restricted dispersal can create seemingly adaptive clines at single loci [10], and historical geographic subdivision may establish clinal patterns that persist for long periods of time [7], [11], especially when population density or gene flow is low [12]. Despite a long history of both theoretical and empirical investigation into clinal variation, the extent to which selection, gene flow, history, and their interactions shape a particular cline remains a difficult and largely unanswered question. The challenge today is to understand the extent to which particular clinal patterns are due to these neutral and selective processes [5]. For a well-developed theory of clines to explain these commonly observed patterns, it will be critically important to distinguish the role of selection from that of other evolutionary factors.
With the recent explosion of genomic techniques for performing genome scans, evolutionary geneticists working with non-model organisms find themselves on the cusp of discovering loci under diversifying selection at relatively modest cost [13], [14]. Arguably, however, advances in the analysis of genomic data lag advances in their generation, and this limits our ability to fully capitalize on genomic information. Genomic clinal patterns illustrate one such disparity well.
One of the most successful approaches to identifying candidate loci under diversifying selection is the FST-outlier technique [15], [16], which requires measurement of geographic differentiation (typically as FST) among two or more populations at multiple loci. Loci under strong diversifying selection are those whose FST values are larger than expected from coalescent simulations of neutral evolution in an island model [15]. In the case of strong clines, however, both ‘neutral’ and ‘non-neutral’ loci show similar patterns of geographic variation [5], and FST values between edge populations are often uniformly high for all loci. As a consequence, it is challenging to distinguish neutral from non-neutral patterns while also accounting for underlying clinal patterns. Simply ignoring the underlying clinal pattern may dramatically increase the rate of false identification of loci under diversifying selection. An additional limitation of FST analyses in clines is that they ignore potentially useful information encoded in characteristics of the cline, such as its location and slope [17].
One of the most widely cited and well-characterized genetic clines occurs within the killifish Fundulus heteroclitus, a common estuarine fish whose biology and ecology have been intensively studied for over a century [18]–[20]. Over this period, F. heteroclitus has become an economically important species, constituting a significant proportion of the bait sold by commercial fish bait operations [21]. Perhaps more importantly, F. heteroclitus has been developed as a model system for experimental biology [18], and especially as a model for environmental toxicology in estuarine systems [20].
Fundulus heteroclitus also became an important model organism for the study of adaptive clinal variation [22]–[24] for several reasons. There are coincident allozyme clines centered at approximately 40°N along the New Jersey shoreline [25]–[27] (figure 1). These allozyme clines correlate with latitudinal changes in air and water temperature: the average yearly water temperature decreases by approximately 1°C per degree of latitude [25]. A number of functional studies, especially those examining lactate dehydrogenase (Ldh), have indicated that protein variants differ in their performance (at the enzymatic, cellular, and organismal levels) across these temperatures [25], [28], [29], suggesting that selection plays a role in maintaining F. heteroclitus allozyme clines. At the same time, non-adaptive explanations for Fundulus clines have strong empirical support. Most putatively neutral loci that have been examined (mitochondria [30]–[32], microsatellites [23], [33], and SNPs [34]) display clinal variation that mirrors that of many allozyme loci (figure 1). Moreover, at microsatellite loci and mitochondrial SNPs, there is substantial endemic diversity at both the northern and southern ends of the cline [35]. These patterns are consistent with a secondary contact zone between historically distinct populations and suggest that neutral non-equilibrial processes may play a role in generating and maintaining F. heteroclitus clines. While authors have consistently recognized that allozyme clines may reflect neutral processes in F. heteroclitus [19], [23], [25], [32], a comprehensive, simultaneous analysis of all publicly available allelic clines has yet to be pursued. Such an analysis has the potential to discern the degree to which putative loci are under selection or reflect neutral processes (gene flow, drift, non-equilibrial secondary introgression).
A) Location of sampled populations. B) Allele frequencies of clinal loci used in the Genomic Co-Co plot analysis. mtDNA and Mdh allozyme clines are highlighted in black, and Ldh allozyme and Ldh SNP clines are highlighted in red.
Here, we apply two general analytical approaches to distinguish the spatial distributions of F. heteroclitus alleles undergoing diversifying selection from those of alleles evolving neutrally. One approach is the FST-outlier technique [15], [16]. The second approach focuses upon landscape-level parameters of the clines themselves, in particular their location and slope. Szymura and Barton [12] pioneered the use of distributions of cline locations and slopes at multiple loci to simultaneously examine both the demographic and the selection history contained in a cline between two species of fire-bellied toads. In this study, we implement an analysis to visualize the distribution of cline location (coincidence) and slope (concordance) for large numbers of loci. We call the resulting plot a genomic Co-Co plot. An analogous approach has been used successfully to show that clines in morphological traits had different midpoints [36] and steeper slopes [37] than the ‘neutral’ expectation, suggesting that these morphological traits were under selection. Finally, we simulate neutral loci under a likely demographic history for F. heteroclitus. We then use the simulated genotypic distributions to examine the performance of both outlier and Co-Co analyses in the presence of secondary contact. Our results caution that inferring selection using genome scans is very difficult when a strong signal of secondary contact remains.
Results
Summary of Genetic Data
We assembled a combined dataset for F. heteroclitus that consists of 310 loci (1 mtDNA, 8 microsatellites, 16 allozymes, and 285 SNPs) from the available sources listed in Table 1. Not all loci were sampled at each location; Figure 1a indicates the spatial locations sampled for each category of locus (mtDNA, allozyme, microsatellite, and SNP). Close examination of Figure 1a reveals fine-scaled spatial sampling for allozymes and mtDNA, and slightly less dense sampling for microsatellites. Although the majority of the loci examined in this study were SNPs, these data were the least densely sampled, with a total of 11 populations included; nevertheless, SNP sampling spanned the cline located at approximately 40°N.
Comparison among Types of Loci
Of the 310 loci used in this study, 146 (47%) exhibited a clinal pattern of variation based upon likelihood ratio tests between intercept only models and broken-stick models assessed at the 0.05 level. In the SNP dataset alone, 44% of the loci genotyped in F. heteroclitus exhibit a detectable clinal pattern (Table 2).
Table 2 also illustrates the difference in the distribution of clinal patterns of variation among marker types. While the mtDNA locus and the majority of microsatellite and allozyme loci exhibited clinal variation, slightly less than one-half of SNPs did. This difference in clinal frequency between SNP and non-SNP loci is strongly supported by contingency-table analysis (Fisher’s exact test, p = 0.0001).
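The contingency-table comparison can be sketched as follows. This is a minimal illustration under stated assumptions: the 2 × 2 counts are reconstructed from figures quoted elsewhere in this article (the mtDNA locus, all 8 microsatellites, and 12 of 16 allozymes were clinal, versus 125 of 285 SNPs) rather than copied from Table 2, and SciPy's `fisher_exact` is used, which may not be the software the authors used.

```python
from scipy.stats import fisher_exact

# Rows: marker class; columns: (clinal, non-clinal).
# Counts are reconstructed from the text (assumption, not read from Table 2):
#   non-SNP: 1 mtDNA + 8 microsatellites + 12 allozymes clinal, out of 25 loci
#   SNP:     125 clinal out of 285 loci
non_snp_clinal = 1 + 8 + 12
snp_clinal = 125
table = [
    [non_snp_clinal, 25 - non_snp_clinal],
    [snp_clinal, 285 - snp_clinal],
]

# Two-sided Fisher's exact test of independence between marker class and clinality
odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"table={table} odds ratio={odds_ratio:.2f} p={p_value:.2g}")
```

The reconstructed totals (21 + 125 = 146 clinal loci out of 310) match the overall count reported above, which is a useful sanity check on the assumed table.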
Outlier Analyses
The Co-Co plot indicates that the average midpoint of clinally distributed loci is centered at approximately 40°N, with a broad confidence interval for both midpoint and slope (Figure 2, left panel). Outlier loci in the Co-Co plot tend to have a steeper slope, and most have a midpoint near 40°N. The fdist2 analysis (Figure 2, right panel) indicates that over most values of heterozygosity, FST values average 0.3. This reflects the high degree of differentiation across F. heteroclitus noted previously (see Introduction). In our fdist2 analysis, outlier loci are those that show greater differentiation than expected. No loci with weaker differentiation were significantly distinct, but this is not unexpected given that fdist2 is known to have greater statistical power to detect divergent than stabilizing selection [15].
Contours of 75, 95 and 99% are shown on the co-co plot. Mean FST is dotted and 99% C.I. lines are in black on the fdist2 panel. Outlier loci are in red (95% C.I.), non-outlier loci are in grey, and Ldh allozyme and SNPs are blue.
Two loci (mtDNA and malate dehydrogenase, or Mdh) were significant outliers in both the FST-outlier and Co-Co plot analyses. Twenty-five SNP loci were significant outliers only in the FST-outlier analysis, and three SNPs were significant outliers only in the Co-Co plot analysis (Table S1).
Neither the FST-outlier nor the Co-Co analysis indicates that alleles at Ldh are under diversifying selection. In fact, both analyses place Ldh in the center of the joint distributions: of heterozygosity and FST in the FST-outlier analysis, and of cline slope and midpoint in the Co-Co analysis (Figure 2, Figure 1B). This pattern is consistent between Ldh genotypes derived from allozyme analysis and genotypes based on two SNPs found within Ldh [38]. The fact that allozyme and SNP characterizations of Ldh show the same pattern, while anecdotal, also supports our decision to combine datasets to develop the large number of markers needed to identify outlier loci.
Simulations
To assess how well both outlier approaches detect loci under diversifying selection along clines, we simulated neutral loci in the Fundulus system along the western North Atlantic. Figure 3 outlines the design of these simulations. Figure 4 summarizes simulation results under two different dispersal rates (0.4 and 4.0). Our simulations indicate that 10,000 years after the introgression of historically separated populations of Fundulus, a large proportion (42%) of neutrally simulated loci maintained a clinal pattern. Dispersal had a small but discernible effect on cline width (Figure 4; for higher dispersal rates, see Figure S2 in supporting information). In addition, the percentage of loci exhibiting a clinal pattern increased somewhat with dispersal rate (mean percentages of clinal loci were 36%, 41%, and 50% for dispersal rates of 0.4, 4, and 40, respectively). As expected under neutrality, equal numbers of clines exhibited positive and negative slopes (in the A panels of Figure 4, the complementary allele frequency was plotted for clines with positive slopes).
For all simulations presented, the ancestral population is split into two relictual populations 100,000 generations in the past. At 10,000 generations before present, each of these daughter populations splits into 20 linearly arranged populations (only 8 of the resulting 40 populations are shown). Gene flow between neighboring populations continues at the same rate throughout the final 10,000 generations.
Subjecting simulation results to the two outlier analyses yielded overall distributions of cline steepness and of FST among populations that were largely indistinguishable from those obtained from the empirical data (Figure 2 vs. Figure 4). As in the empirical data, large numbers of neutrally simulated loci were identified as undergoing diversifying selection. In addition, the peak in FST values at intermediate heterozygosities seen in the empirical data was also present in the simulated datasets, regardless of the strength of gene flow.
In the sub-figure on the left, the stepping-stone dispersal rate is 0.4; on the right, it is 4. In each sub-figure, panel A illustrates the distribution of allele frequencies along the cline for simulated loci. All clines are adjusted so that higher allele frequencies are on the left. Bold lines indicate loci deemed to be under diversifying selection by fdist2. Panel B shows the distribution of cline midpoints and slopes estimated using broken-stick models applied to the clines in panel A. Panel C shows the results of fdist2 applied to these clines.
Discussion
To our knowledge, this study represents the first time that the relative contribution of neutral and non-neutral processes has been quantified in the classic genetic clines of the killifish Fundulus heteroclitus. In addition, our computer-based coalescent simulations replicated the general characteristics of empirical F. heteroclitus clines in the absence of selection. This result raises real concern about the power of SNP genome scans to detect ‘loci that matter’ when a large portion of clines across the genome are generated by secondary contact.
Clinal Frequencies among SNPs versus Other Loci
Fewer SNP loci show clinal variation than the other marker types (44% versus 75–100%; Table 2). Barring a fundamental difference in the evolutionary dynamics of SNPs versus other types of marker loci, this result could reflect bias associated with polymorphism screening, variability in the number of segregating alleles per locus, or both. In the former case, it is likely that published microsatellite and allozyme loci were chosen non-randomly from a larger pool of loci based upon their pattern of geographic variation. This could easily have occurred if new markers were screened for polymorphism by comparing a few individuals collected from the margins of the range of F. heteroclitus. Under this common approach to assessing the utility of genetic markers, loci that appear polymorphic are those that differ in allele frequency between the ends of the cline, a necessary condition for clinal variation. SNP loci, on the other hand, were chosen from the F. heteroclitus genome based upon the location of restriction endonuclease recognition sites [39] and were not subsequently filtered based on patterns of geographic variability prior to this study. Alternatively, the typically bi-allelic nature of SNP loci means that the pattern of geographic variation for one allele is the complement of the other. In multi-allelic marker systems like allozymes, and in particular microsatellites, there are k−1 distinct geographic patterns possible among the alleles segregating across the species range, where k is the number of alleles segregating at a locus. Therefore, when considering geographic variation on a per-locus basis as we have done here, multi-allelic loci have more chances to exhibit a clinal pattern, even if some alleles at that locus are distributed non-clinally.
Simulations of F. heteroclitus Demographic History
Simulating the likely history of F. heteroclitus populations and analyzing simulation output with our approach yielded patterns that were remarkably similar to those observed in the empirical dataset (Figure 2 vs. Figure 4, B and C panels). Furthermore, the pattern revealed in the empirical data by the Co-Co and fdist2 analyses persisted under different rates of gene flow (Figure 4 left and right panels; Figure S2).
The proportion of loci that exhibited clinal patterns in the simulations bracketed the proportions observed in SNPs and in the entire genetic dataset (44% and 47%, respectively; Table 2), depending on dispersal rate. At the lower dispersal rates simulated in Figure 4 (0.4 and 4), the percentage of neutral loci exhibiting a clinal pattern was smaller than in the entire empirical dataset (one-sample t-tests; p = 0.0001 at a dispersal rate of 4). At the highest dispersal rate simulated (40), the proportion of clinal loci was higher than observed overall.
It is clear from both empirical and simulated fdist2 and Co-Co analyses that no signal of diversifying selection could be teased apart from the strong demographic signal in this system, despite the initially suggestive large numbers of outliers in these analyses. An interesting artifact of the fdist2 approach emerges from both empirical and simulated datasets and suggests that the coalescent model employed by fdist2 may always be problematic in clines with a history of secondary contact. A distinctive peak in FST values occurs at an intermediate system-wide heterozygosity: the heterozygosity that results when each end of the cline is fixed for alternative alleles. This is the pattern exhibited by mtDNA and Mdh in the empirical dataset. In this situation, most populations are fixed for one allele or the other, depending on their location along the linear cline, so almost all genetic variability occurs among populations (high FST). This artifact was predicted by Beaumont and Nichols [15], who examined the performance of fdist under different demographic scenarios and observed that clusters of similar populations effectively reduce the number of populations providing information. In the extreme case, a strong linear cline represents two clusters of populations. Other authors have noted this issue under ancient vicariance followed by local range expansion [40]. The population history of F. heteroclitus through the Pleistocene and Holocene exemplifies this scenario [35].
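The arithmetic behind this peak can be illustrated with a simple heterozygosity-based FST, (HT − HS)/HT, computed with equal weights across populations. This is a hedged sketch: it is a textbook Nei-style estimator, not the estimator implemented in fdist2, and the allele frequencies are hypothetical. When each half of the range is fixed for alternative alleles, pooled heterozygosity is 0.5 while within-population heterozygosity is zero, so FST reaches its maximum; a gentler cline with polymorphic central populations has the same pooled heterozygosity but lower FST.

```python
import numpy as np

def nei_fst(pop_freqs):
    """Heterozygosity-based FST = (HT - HS) / HT for one biallelic locus,
    weighting all populations equally."""
    p = np.asarray(pop_freqs, dtype=float)
    hs = np.mean(2 * p * (1 - p))   # mean within-population heterozygosity
    p_bar = p.mean()
    ht = 2 * p_bar * (1 - p_bar)    # heterozygosity of the pooled populations
    return ht, hs, (ht - hs) / ht

# Hypothetical stepped cline: each half of the range fixed for alternative alleles
step = [1.0] * 10 + [0.0] * 10
# Hypothetical gentler cline: populations near the centre remain polymorphic
gradual = np.linspace(1.0, 0.0, 20)

ht_step, hs_step, fst_step = nei_fst(step)
ht_grad, hs_grad, fst_grad = nei_fst(gradual)
print(ht_step, fst_step)   # 0.5 1.0
print(ht_grad, fst_grad)   # pooled heterozygosity ~0.5, smaller FST
```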
Independent Evidence of Selection at F. heteroclitus Loci
mtDNA.
Both analyses find that mitochondrial RFLP haplotypes have a significantly steep clinal slope. Moreover, fdist2 identified a mitochondrial SNP (Table S1) that had significantly greater FST than expected. These results weakly support the hypothesis that maintenance of the mitochondrial cline since secondary contact may result from diversifying selection. Independently, environmentally-mediated selection has been proposed previously for mitochondria [32], [41]. It is also possible that cytonuclear co-adaptation may minimize the introgression of mitochondrial alleles across the cline [42]–[44], but this remains to be tested in Fundulus.
MDH.
Both FST-outlier and Co-Co plot analyses consistently highlight the significantly steeper cline at cytosolic malate dehydrogenase (Mdh). It was previously recognized that Mdh “exhibits a substantially steeper cline” than the better understood Ldh, and ‘that selection may be even stronger at the locus’ [19]. However, ours represents the first effort to assess this uniquely strong cline among a large sample of coincident clines. Very little functional work on Mdh isozymes has been published for Fundulus, even though Mdh is known to mediate temperature tolerance in other marine organisms [45]. Furthermore, the sensitivity of Mdh isozymes to water temperature has been shown in F. heteroclitus, where the frequency of the southern allele increased in the warm-water outfall of a nuclear power plant compared to neighboring Long Island populations [46].
LDH.
One of the initially surprising aspects of our analysis of published data is that the Ldh cline shape is not significantly distinct from neutral expectation, given that much of the functional genetic work pursued by Dennis Powers and colleagues suggested strong diversifying selection [19], [20], [24]. The lack of evidence for diversifying selection when applying fdist2 to Ldh can be explained by examining the full distribution of clines available (Figure 1, panel B): Ldh clearly does not exhibit the steepest slope observed. This gentler slope determines the central location of Ldh in the Co-Co analysis, but it also reduces FST over the full geographic range, because populations near the center of the cline are polymorphic for the two Ldh alleles.
Analysis of Genomic Clines
In this study we introduce a likelihood-based framework that allows partially unsupervised identification of genetic loci exhibiting clinal patterns of spatial variation. Other analytical methods exist to analyze clinal variation across hybrid zones, though we argue that these methods are not appropriate for the F. heteroclitus cline. For example, recent regression-based methods [47] require generation of a hybrid index and are thus inappropriate for geographic clines in which the cline is old, introgression is extensive, or generation of hybrid indices is otherwise impossible. Likewise, models based on primary intergradation [10] are inappropriate for stepped clines generated by secondary contact. Unsupervised approaches such as ours may become increasingly important in species other than F. heteroclitus as more and more large population-genomic datasets are generated through the use of next-generation sequencing techniques.
Both type I and type II error are troublesome when searching for patterns of selection in a population genomic dataset. Type I error, the situation in which a neutral locus is considered to be under selection, can cause a waste of resources as a research team subjects the mis-identified locus to cost- and labor-intensive laboratory methods in an attempt to identify the nature of selection on the molecule. The converse, type II error, where loci under selection are not identified in a population genomic scan, can also waste resources and may also have a more insidious impact.
Use of population genomic scans has been suggested as a means to ensure that all loci in a dataset are behaving neutrally so that they can be used for typical molecular ecological inference [48]; type II error will be positively misleading in this case. Thus, an inability to show that loci are under selection using the distribution of geographic variation may indicate that these loci will not strongly mislead inference on population structure; however, this does not ensure that misidentified loci will not reduce the accuracy of estimates of within-population parameters (for example estimation of family size).
Roles of history versus selection in F. heteroclitus.
Though diversifying selection has been demonstrated for a few loci in F. heteroclitus using independent means, demographic history can explain all clinal patterns observed in F. heteroclitus without invoking natural selection to either establish or maintain the pattern we observe today. This alternative hypothesis has long been acknowledged but not quantitatively addressed by previous authors [19], [23], [25], [32]. In this study, two lines of evidence support this conclusion. First, in a large sample of loci examined along the cline, nearly half exhibit a significantly clinal pattern. We believe it is unlikely that 44–47% of the loci assayed happened to be under the same diversifying selection pressure. Second, we simulated neutral loci under a likely demographic history of F. heteroclitus and found that, compared to empirical data, simulated loci showed similar proportions of loci exhibiting clinal patterns and similar results using both Co-Co and fdist2 approaches.
Though demography can explain allele frequencies, we are asserting only that detecting diversifying selection in this system is difficult, if not impossible, given the strong historical signal. We do not rule out the possibility of diversifying selection in this system, only that geographical analysis is a very low-power technique to detect it.
Materials and Methods
Sources of genetic data are summarized in Table 1. Allele frequency data for each population were reported for allozyme and mitochondrial loci. Individual-level genotypes for microsatellite and SNP data were converted to allele frequencies before analysis. We removed all SNP loci that were invariant across populations included in this study and also those not assayed in each of the 11 populations surveyed for SNP variation. This reduced our initial set of 310 SNP loci to 285. In order to make analysis consistent across loci, we analyzed all data while assuming Hardy-Weinberg equilibrium within all locus-population combinations and linkage equilibrium among all loci.
Co-Co Plots
At loci in which more than two alleles were observed, we used an FST-based approach to identify a focal allele for clinal analysis. Our aim was to identify the allele with the strongest differentiation between northern and southern portions of the sampled range. We first divided the range of F. heteroclitus into northern (north of 42°N), central, and southern (south of 34°N) populations. We then estimated FST for each multiallelic locus across the northern and southern regions while ignoring the central region. For each locus, we chose the allele that contributed most to the total genetic variance observed across populations. Analyses were performed with the R package hierfstat [49].
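A minimal sketch of the focal-allele step follows. It assumes that an allele's "contribution to the total genetic variance" can be approximated by the among-population variance of its frequency across the retained northern and southern populations; this is a simplified stand-in for the variance components that hierfstat reports, not a reimplementation of hierfstat. The function name and the example frequencies are hypothetical.

```python
import numpy as np

def choose_focal_allele(latitudes, allele_freqs, north=42.0, south=34.0):
    """Pick the allele whose frequency varies most among edge populations.

    latitudes: length-n sequence of population latitudes (degrees N).
    allele_freqs: (n, k) array; row i holds the k allele frequencies of population i.
    Populations between `south` and `north` (the central region) are ignored.
    """
    lat = np.asarray(latitudes, dtype=float)
    freqs = np.asarray(allele_freqs, dtype=float)
    edge = (lat > north) | (lat < south)
    if edge.sum() < 2:
        raise ValueError("need at least two populations outside the central region")
    # Among-population variance of each allele's frequency (assumed proxy for
    # that allele's contribution to among-population genetic variance)
    var_among = freqs[edge].var(axis=0)
    return int(np.argmax(var_among)), var_among

# Hypothetical three-allele locus: allele 1 differs most between south and north;
# the central population at 38°N is excluded from the calculation.
lats = [30.5, 31.5, 32.5, 38.0, 44.0, 45.0, 46.0]
freqs = [[0.1, 0.8, 0.1]] * 3 + [[0.2, 0.4, 0.4]] + [[0.3, 0.1, 0.6]] * 3
focal, variances = choose_focal_allele(lats, freqs)
print(focal, np.round(variances, 4))
```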
Once focal alleles were identified, we fit a three-segment “twice-broken-stick” model to the relationship between collection site latitude and focal allele frequency. This model assumes that allele frequencies do not vary with space outside of the cline and that they vary linearly within the cline. Four parameters are fit: the allele frequencies North and South of the cline and the locations of the transitions from non-clinal to clinal portions of the range. These parameters correspond to the coordinates of the “breaks” in latitude/allele frequency space and were estimated using the R function optim. From these four parameters, we calculated the midpoint and slope of each cline.
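The broken-stick fit can be sketched in Python as below. This is a hedged illustration, not the authors' R code: it assumes a least-squares objective (Gaussian errors), uses SciPy's Nelder-Mead minimizer (the default method of R's `optim`) with several starting points followed by one restart from the best solution, and the example data are hypothetical and noise-free.

```python
import numpy as np
from scipy.optimize import minimize

def broken_stick(lat, p_south, p_north, b1, b2):
    """Frequency is flat at p_south below the cline, linear inside it, and flat
    at p_north above it. Breakpoints are sorted, so b1 > b2 is tolerated."""
    lo, hi = min(b1, b2), max(b1, b2)
    width = max(hi - lo, 1e-9)                    # guard against a zero-width cline
    t = np.clip((np.asarray(lat, dtype=float) - lo) / width, 0.0, 1.0)
    return p_south + t * (p_north - p_south)

def fit_broken_stick(lat, freq, starts=((0.25, 0.75), (0.33, 0.67), (0.1, 0.9), (0.4, 0.6))):
    """Least-squares fit; returns parameters, RSS, midpoint and slope
    (change in allele frequency per degree of latitude)."""
    lat = np.asarray(lat, dtype=float)
    freq = np.asarray(freq, dtype=float)
    order = np.argsort(lat)
    q = max(1, len(lat) // 4)
    p_s0, p_n0 = freq[order[:q]].mean(), freq[order[-q:]].mean()

    def rss(theta):
        return np.sum((freq - broken_stick(lat, *theta)) ** 2)

    opts = {"xatol": 1e-8, "fatol": 1e-12, "maxiter": 10000}
    best = None
    for a, b in starts:                           # breakpoints start at latitude quantiles
        x0 = [p_s0, p_n0, np.quantile(lat, a), np.quantile(lat, b)]
        res = minimize(rss, x0, method="Nelder-Mead", options=opts)
        if best is None or res.fun < best.fun:
            best = res
    best = minimize(rss, best.x, method="Nelder-Mead", options=opts)  # restart to polish
    p_s, p_n, b1, b2 = best.x
    lo, hi = min(b1, b2), max(b1, b2)
    return {"p_south": p_s, "p_north": p_n, "lower": lo, "upper": hi,
            "rss": best.fun, "midpoint": 0.5 * (lo + hi),
            "slope": (p_n - p_s) / max(hi - lo, 1e-9)}

# Usage with hypothetical noise-free data: cline from 38.2 to 41.8°N
lats = np.linspace(30, 46, 33)
freqs = broken_stick(lats, 0.9, 0.1, 38.2, 41.8)
fit = fit_broken_stick(lats, freqs)
print(round(fit["midpoint"], 2), round(fit["slope"], 3))
```

Multiple starts are used because the residual sum of squares has kinks wherever a breakpoint crosses a sampled latitude, which can stall a single simplex search.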
Modeling clinal variation as three straight line segments represents a significant departure from traditional analyses. It is more common for single-locus clines to be described by a four-parameter logistic model [12] whose parameters describe the frequencies of the focal allele at either end of the geographic transect, the steepness of the cline, and the location of the cline. However, numerical estimation of the logistic model can be difficult when applied in the unsupervised manner required to screen large numbers of loci. In some cases when no clinal pattern exists, failure to converge on the parameter estimates is expected. Unfortunately, in cases in which visual inspection indicates an obvious cline but the difference in allele frequency at the ends of the cline is moderate or the ends of the cline are poorly defined, non-linear regression packages that implement logistic models (SAS; R-package nlme) also fail to converge. Broken-stick models are much less sensitive to these effects. For most loci for which we were able to fit a logistic model, logistic and broken-stick fits were both coincident and concordant (See Figure S1, for example).
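For reference, the two model forms can be written side by side. The symbols are ours and illustrative: x is latitude, p_S and p_N are the southern and northern focal-allele frequencies, c and s are the location and scale of one common parameterization of the four-parameter logistic (hyperbolic-tangent forms are equivalent up to rescaling), and x_1 < x_2 are the broken-stick breakpoints.

```latex
\hat p_{\mathrm{logistic}}(x) = p_S + \frac{p_N - p_S}{1 + \exp\{-(x - c)/s\}},
\qquad
\left.\frac{d\hat p_{\mathrm{logistic}}}{dx}\right|_{x = c} = \frac{p_N - p_S}{4s}

\hat p_{\mathrm{stick}}(x) =
\begin{cases}
p_S, & x \le x_1,\\
p_S + (p_N - p_S)\,\dfrac{x - x_1}{x_2 - x_1}, & x_1 < x < x_2,\\
p_N, & x \ge x_2,
\end{cases}
\qquad
\text{midpoint} = \frac{x_1 + x_2}{2},
\qquad
\text{slope} = \frac{p_N - p_S}{x_2 - x_1}
```

Both models have four parameters; the broken-stick midpoint plays the role of c, and its slope is constant across the clinal segment rather than peaking at the center.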
We included a locus in the Co-Co analysis when it met two criteria. First, the locus showed a significant signal of genetic isolation (i.e., FST) by geographic distance, as determined by a Mantel test [50]. Second, a broken-stick fit of the locus represented a significant improvement over the fit of a horizontal line to the data, as assessed by a likelihood-ratio test. These criteria identified 125 of the 285 available SNP loci (44%) as varying clinally across the broad geographic range of the species. Note that all 285 SNP loci were included in the FST-outlier analysis (below).
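The two inclusion criteria can be sketched as follows. Both functions are hypothetical illustrations under stated assumptions: the likelihood-ratio test assumes independent Gaussian residuals with a fitted variance, so that −2 log LR = n log(RSS_flat / RSS_stick) with 3 degrees of freedom (four mean parameters for the broken stick versus one for the horizontal line); the Mantel test is a generic one-sided permutation test of a positive correlation between a geographic and a genetic (for example pairwise FST) distance matrix.

```python
import numpy as np
from scipy.stats import chi2

def lrt_broken_stick_vs_flat(rss_flat, rss_stick, n, extra_params=3):
    """Likelihood-ratio test assuming Gaussian residuals with fitted variance."""
    stat = n * np.log(rss_flat / rss_stick)
    return stat, chi2.sf(stat, df=extra_params)

def mantel_test(d_geo, d_gen, permutations=999, seed=0):
    """One-sided permutation Mantel test for positive association between two
    symmetric distance matrices."""
    rng = np.random.default_rng(seed)
    d_geo = np.asarray(d_geo, dtype=float)
    d_gen = np.asarray(d_gen, dtype=float)
    n = d_geo.shape[0]
    iu = np.triu_indices(n, k=1)                  # each population pair once
    x = d_geo[iu]
    r_obs = np.corrcoef(x, d_gen[iu])[0, 1]
    hits = 0
    for _ in range(permutations):
        perm = rng.permutation(n)                 # relabel populations in one matrix
        r = np.corrcoef(x, d_gen[np.ix_(perm, perm)][iu])[0, 1]
        hits += int(r >= r_obs - 1e-12)
    return r_obs, (hits + 1) / (permutations + 1)

# Hypothetical usage: genetic distance perfectly proportional to latitude difference
lats = np.linspace(30, 46, 10)
d_geo = np.abs(lats[:, None] - lats[None, :])
d_gen = 0.01 * d_geo
r, p_mantel = mantel_test(d_geo, d_gen)
stat, p_lrt = lrt_broken_stick_vs_flat(rss_flat=1.0, rss_stick=0.1, n=20)
print(round(r, 3), p_mantel, round(stat, 2), p_lrt)
```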
To identify allelic clines that differed significantly in location or slope, we estimated a two-dimensional kernel density from the coordinates of each locus's cline in location/slope space. To account for right-skewness in the distribution of cline slopes, we first log-transformed the absolute value of each slope. We then estimated confidence envelopes that encompassed 95 and 99% of the volume of the two-dimensional kernel density. Bandwidths for two-dimensional density estimation and creation of envelopes were estimated from the data using cross-validation [51], as implemented in the R package ks.
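A Python sketch of the Co-Co envelope step is given below, with one explicit substitution: SciPy's `gaussian_kde` uses Scott's-rule bandwidths, not the cross-validated bandwidths from the R package ks described above. Following a common highest-density-region approach (an assumption here), the envelope containing a given fraction of loci is defined by the corresponding quantile of the density evaluated at the loci themselves. The function name and the simulated cline parameters are hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde

def coco_outliers(midpoints, slopes, level=0.95):
    """Flag clines outside the `level` highest-density envelope in
    (midpoint, log|slope|) space."""
    xy = np.vstack([np.asarray(midpoints, dtype=float),
                    np.log(np.abs(np.asarray(slopes, dtype=float)))])
    kde = gaussian_kde(xy)            # Scott's-rule bandwidth, NOT cross-validation
    dens = kde(xy)                    # density evaluated at each locus
    # Threshold chosen so that about `level` of the loci lie at or above it
    threshold = np.quantile(dens, 1.0 - level)
    return dens < threshold, threshold

# Hypothetical data: 400 coincident clines near 40°N plus one very steep cline
rng = np.random.default_rng(0)
mids = np.append(40 + 0.5 * rng.standard_normal(400), 40.0)
slopes = np.append(-np.exp(-1.5 + 0.5 * rng.standard_normal(400)), -np.exp(6.0))
flags, thr = coco_outliers(mids, slopes, level=0.95)
print(flags[-1], flags.mean())
```

By construction roughly 5% of loci fall outside a 95% envelope, which is one reason a Co-Co outlier should be treated as a candidate rather than as evidence of selection.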
FST-based Outlier Analysis
We summarized all raw data into an 82-population × 310-locus matrix of allele frequencies. When a particular type of locus (microsatellite, SNP, etc.) was not assayed in a population, that population was excluded from the analysis of those loci only. We then generated 20 individual genotypes for each population-locus combination, assuming Hardy-Weinberg equilibrium and no linkage disequilibrium. These synthetic genotypes were produced using a custom R script (available from the authors upon request) and output in GenePop format. The GenePop file was then converted to FDist format using PGDSpider (http://www.cmpg.unibe.ch/software/PGDSpider/) and analyzed using fdist2 [15]. Although alternatives to fdist2 have been developed, we chose this approach because it is frequently used in the literature and because it has recently been shown to exhibit more reasonable error rates than other methods [52]. Because the median number of populations assayed per locus was 11 (range = 7–41), we simulated 100 demes and sampled 11 demes across 50,000 replicate simulations, assuming infinite alleles and using FST = 0.33 as a starting value. Changing the number of sampled demes to 13 (the mean number of populations assayed per locus) or altering the starting FST value did not qualitatively affect the results.
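The genotype-synthesis step can be sketched as follows. The custom R script is not reproduced here, so this Python version is a stand-in under explicit assumptions: both alleles of each of the 20 diploid individuals are independent draws from the population's allele frequencies (Hardy-Weinberg equilibrium), loci are drawn independently (no linkage disequilibrium), and a locus not assayed in a population is written as missing (`0000`). The output follows the common GenePop layout (title line, one locus name per line, a `Pop` line before each population, two-digit allele codes), which limits allele indices to 99. Function names and individual labels are hypothetical.

```python
import numpy as np


def synth_genotypes(freqs, n_ind=20, rng=None):
    """Diploid genotypes for one population at one locus under HWE (allele codes start at 1)."""
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(freqs, dtype=float)
    p = p / p.sum()
    return rng.choice(len(p), size=(n_ind, 2), p=p) + 1


def to_genepop(freq_table, locus_names, n_ind=20, seed=1, title="synthetic genotypes"):
    """freq_table[pop][locus] is an allele-frequency vector, or None if not assayed."""
    rng = np.random.default_rng(seed)
    lines = [title] + list(locus_names)
    for pop_idx, pop in enumerate(freq_table):
        lines.append("Pop")
        # loci are simulated independently of one another (no linkage disequilibrium)
        per_locus = [None if f is None else synth_genotypes(f, n_ind, rng) for f in pop]
        for i in range(n_ind):
            calls = ["0000" if g is None else f"{int(g[i, 0]):02d}{int(g[i, 1]):02d}"
                     for g in per_locus]
            lines.append(f"p{pop_idx + 1}_{i + 1} , " + " ".join(calls))
    return "\n".join(lines) + "\n"


# Usage: two populations, two loci; locus L2 was not assayed in the first population
txt = to_genepop([[[0.5, 0.5], None], [[1.0, 0.0], [0.2, 0.8]]], ["L1", "L2"], n_ind=3)
print(txt)
```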
Simulations
Historical biogeography and demography.
Our simulations were intended to reflect a historical scenario in which an ancestral population was divided in the late Pleistocene into two descendant populations, representing relict, isolated populations. At the start of the Holocene (90,000 generations after isolation), each of these two populations underwent an instantaneous range expansion into 20 descendant populations. The timing of these events is consistent with existing estimates of divergence times along the Fundulus cline [35]. All 40 descendant populations were linked to their immediate neighbors through gene flow under a strict stepping-stone model with a constant pairwise rate of exchange, , until the present. Figure 3 illustrates the demographic history used in the simulations. Total effective population size was held constant throughout the simulation (the Ne of an ancestral population equals the sum of the Ne values of its descendants). We chose a present-day Ne of 2500 for each population, based on previously determined estimates of [23] and an assumed microsatellite mutation rate of .
All simulations were implemented with msms [53]. This software performs coalescent simulation of neutral loci using the same algorithm as ms [54], [55].
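To make the demographic history concrete, the helper below assembles an argument list in the syntax of Hudson's ms, which msms is designed to accept (consult the msms documentation for the exact invocation). It uses the ms options `-t`, `-I`, `-m`, `-eM`, `-ej` and `-en`, with time in units of 4N0 generations and N0 = 2500 per deme as stated in the text. The mutation rate, the neighbor migration rate (4N0m) and the time since the Holocene expansion are hypothetical placeholders, because those values are not recoverable from this section; the 90,000 generations of isolation and the 2 × 20 deme layout follow the text. Reading backwards in time, each group of 20 demes collapses into its refugium at the expansion, migration stops, and the two refugia merge 90,000 generations earlier.

```python
def ms_secondary_contact_args(n_per_deme=2, demes_per_refugium=20, n0=2500,
                              mu=5e-4,                     # HYPOTHETICAL mutation rate per locus
                              big_m=1.0,                   # HYPOTHETICAL 4*N0*m between neighbors
                              gens_since_expansion=10_000,  # HYPOTHETICAL
                              gens_isolated=90_000, nreps=100):
    """ms-style arguments: two isolated refugia, each expanding into a chain of demes."""
    n_demes = 2 * demes_per_refugium
    scale = 4.0 * n0                                   # ms time unit: 4*N0 generations
    t_exp = gens_since_expansion / scale
    t_split = (gens_since_expansion + gens_isolated) / scale
    theta = 4.0 * n0 * mu
    args = [str(n_per_deme * n_demes), str(nreps), "-t", f"{theta:g}",
            "-I", str(n_demes)] + [str(n_per_deme)] * n_demes
    # strict stepping-stone: each deme exchanges migrants only with its neighbors
    for k in range(1, n_demes):
        args += ["-m", str(k), str(k + 1), f"{big_m:g}", "-m", str(k + 1), str(k), f"{big_m:g}"]
    # at the expansion time (backwards): no migration, demes collapse into refugia
    west, east = 1, demes_per_refugium + 1
    args += ["-eM", f"{t_exp:g}", "0"]
    for k in range(2, demes_per_refugium + 1):
        args += ["-ej", f"{t_exp:g}", str(k), str(west)]
        args += ["-ej", f"{t_exp:g}", str(east + k - 1), str(east)]
    # each refugium's size is the summed size of its descendants (relative to N0)
    args += ["-en", f"{t_exp:g}", str(west), str(demes_per_refugium),
             "-en", f"{t_exp:g}", str(east), str(demes_per_refugium)]
    # further back, the two refugia merge into one ancestor of size n_demes * N0
    args += ["-ej", f"{t_split:g}", str(east), str(west),
             "-en", f"{t_split:g}", str(west), str(n_demes)]
    return args


args = ms_secondary_contact_args()
print(" ".join(args))
```

Keeping total Ne constant, as described above, is what fixes the ancestral sizes at 20 and 40 times the present deme size in this sketch.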
Supporting Information
Figure S1.
Examples of three model fits to clinal data for two loci. Red corresponds to a logistic fit, green to a broken-stick fit, and blue to a linear fit. The left panel illustrates the similarity between the logistic and broken-stick models for well-behaved clinal data with a large difference in allele frequency between the ends of the cline. The right panel shows a case in which a clinal pattern appears in the data but a logistic fit would not converge using the nlme package available from the R package repository (cran.r-project.org).
https://doi.org/10.1371/journal.pone.0045138.s001
(PDF)
Figure S2.
Simulation results for: , other conditions as in the main text. Symbols and notation are identical to those in Figure 4 of the main text. Note that although clinally varying loci have less steep slopes, they have the same midpoint, and the proportion of loci exhibiting clinal variation is comparable to that in the cases illustrated in the main text.
https://doi.org/10.1371/journal.pone.0045138.s002
(PDF)
Table S1.
Loci identified as outliers using Co-Co and fdist2 analyses.
https://doi.org/10.1371/journal.pone.0045138.s003
(PDF)
Author Contributions
Conceived and designed the experiments: AES LMW MFO EES. Performed the experiments: AES EES LMW. Analyzed the data: AES EES. Contributed reagents/materials/analysis tools: AES. Wrote the paper: AES EES LMW MFO.
References
- 1. Barton N, Hewitt G (1985) Analysis of hybrid zones. Annual Review of Ecology and Systematics 16: 113–148.
- 2. Arnold M (1997) Natural hybridization and evolution. New York, NY: Oxford University Press.
- 3. Gardner JPA (1997) Hybridization in the sea. Advances in Marine Biology 31: 1–78.
- 4. Sotka E, Palumbi S (2006) The use of genetic clines to estimate dispersal distances of marine larvae. Ecology 87: 1094–1103.
- 5. Schmidt P, Serrão E, Pearson G, Riginos C, Rawson P, et al. (2008) Ecological genetics in the North Atlantic: environmental gradients and adaptation at specific loci. Ecology 89: 91–107.
- 6. Handley LJL, Manica A, Goudet J, Balloux F (2007) Going the distance: human population genetics in a clinal world. Trends in Genetics 23: 432–9.
- 7. Endler JA (1977) Geographic variation, speciation, and clines, volume 10 of Monographs in Population Biology. Princeton, New Jersey: Princeton University Press.
- 8. Woodruff D (1978) Review: Mechanisms of Speciation. Science 199: 1329–1330.
- 9. Haldane J (1948) The theory of a cline. Journal of Genetics 48: 277–84.
- 10. Vasemägi A (2006) The adaptive hypothesis of clinal variation revisited: single-locus clines as a result of spatially restricted gene flow. Genetics 173: 2411–4.
- 11. Slatkin M (1973) Gene flow and selection in a cline. Genetics 75: 733.
- 12. Szymura JM, Barton NH (1986) Genetic analysis of a hybrid zone between the fire-bellied toads, Bombina bombina and B. variegata, near Cracow in southern Poland. Evolution 40: 1141–1159.
- 13. Vasemägi A, Primmer CR (2005) Challenges for identifying functionally important genetic variation: the promise of combining complementary research strategies. Molecular Ecology 14: 3623–3642.
- 14. Willing EM, Bentzen P, van Oosterhout C, Hoffmann M, Cable J, et al. (2010) Genome-wide single nucleotide polymorphisms reveal population history and adaptive divergence in wild guppies. Molecular Ecology 19: 968–84.
- 15. Beaumont M, Nichols R (1996) Evaluating loci for use in the genetic analysis of population structure. Proceedings of the Royal Society B 263: 1619–1626.
- 16. Beaumont M, Balding D (2004) Identifying adaptive genetic divergence among populations from genome scans. Molecular Ecology 13: 969–980.
- 17. Barton N, Gale K (1993) Genetic analysis of hybrid zones. In: Harrison R, Price J, editors, Hybrid Zones and the Evolutionary Process, Oxford University Press, USA.
- 18. Atz J (1986) Fundulus heteroclitus in the laboratory: a history. American Zoologist 26: 111–120.
- 19. Powers DA, Schulte PM (1998) Evolutionary adaptations of gene structure and expression in natural populations in relation to a changing environment: a multidisciplinary approach to address the million-year saga of a small fish. The Journal of Experimental Zoology 282: 71–94.
- 20. Burnett K, Bain L, Baldwin W, Callard G (2007) Fundulus as the premier teleost model in environmental biology: opportunities for new insights using genomics. Comp Biochem Physiol Part D: Genomics Proteomics 2: 257–286.
- 21. Kircheis F (1998) Species Composition and Economic Value of Maine’s Winter Baitfish Industry. North American Journal of Fisheries Management 18: 175–180.
- 22. Eanes W (1999) Analysis of selection on enzyme polymorphisms. Annual Review of Ecology and Systematics 30: 301–326.
- 23. Duvernell DD, Lindmeier JB, Faust KE, Whitehead A (2008) Relative influences of historical and contemporary forces shaping the distribution of genetic variation in the Atlantic killifish, Fundulus heteroclitus. Molecular Ecology 17: 1344–60.
- 24. Dalziel AC, Rogers SM, Schulte PM (2009) Linking genotypes to phenotypes and fitness: how mechanistic biology can inform molecular ecology. Molecular Ecology 18: 4997–5017.
- 25. Powers D, Place A (1978) Biochemical genetics of Fundulus heteroclitus (L.). I. Temporal and spatial variation in gene frequencies of Ldh-B, Mdh-A, Gpi-B, and Pgm-A. Biochemical Genetics 16: 593–607.
- 26. Cashon R, Beneden R, Powers D (1981) Biochemical genetics of Fundulus heteroclitus (L.). IV. Spatial variation in gene frequencies of Idh-A, Idh-B, 6-Pgdh-A, and Est-S. Biochemical Genetics 19: 719–728.
- 27. Ropson I, Brown D, Powers D (1990) Biochemical genetics of Fundulus heteroclitus (L.). VI. Geographical variation in the gene frequencies of 15 loci. Evolution 44: 16–26.
- 28. Schulte P, Gomez-Chiarri M, Powers D (1997) Structural and functional differences in the promoter and 5′ flanking region of Ldh-B within and between populations of the teleost Fundulus heteroclitus. Genetics 145: 759.
- 29. Schulte PM (2001) Environmental adaptations as windows on molecular evolution. Comparative Biochemistry and Physiology Part B, Biochemistry & Molecular Biology 128: 597–611.
- 30. González-Vilaseñor L, Powers D (1990) Mitochondrial-DNA restriction-site polymorphisms in the teleost Fundulus heteroclitus support secondary intergradation. Evolution 44: 27–37.
- 31. Smith M, Chapman R, Powers D (1998) Mitochondrial DNA Analysis of Atlantic Coast, Chesapeake Bay, and Delaware Bay populations of the teleost Fundulus heteroclitus indicates temporally unstable distributions over geologic time. Molecular Marine Biology and Biotechnology 7: 79–87.
- 32. Whitehead A (2009) Comparative mitochondrial genomics within and among species of killifish. BMC Evolutionary Biology 9: 11.
- 33. Adams SM, Lindmeier JB, Duvernell DD (2006) Microsatellite analysis of the phylogeography, Pleistocene history and secondary contact hypotheses for the killifish, Fundulus heteroclitus. Molecular Ecology 15: 1109–23.
- 34. Williams LM, Oleksiak MF (2011) Ecologically and Evolutionarily Important SNPs Identified in Natural Populations. Molecular Biology and Evolution 28: 1817–26.
- 35. Haney RA, Dionne M, Puritz J, Rand DM (2009) The comparative phylogeography of east coast estuarine fishes in formerly glaciated sites: Persistence versus recolonization in Cyprinodon variegatus ovinus and Fundulus heteroclitus macrolepidotus. The Journal of Heredity 100: 284–96.
- 36. Gay L, Crochet PA, Bell DA, Lenormand T (2008) Comparing clines on molecular and phenotypic traits in hybrid zones: a window on tension zone models. Evolution 62: 2789–806.
- 37. Storz JF (2005) Using genome scans of DNA polymorphism to infer adaptive population divergence. Molecular Ecology 14: 671–88.
- 38. Powell M, Crawford D, Lauermann T (1992) Analysis of cryptic alleles of Fundulus heteroclitus lactate dehydrogenase by a novel allele-specific polymerase chain reaction. Molecular Marine Biology and Biotechnology 1: 391–6.
- 39. Williams LM, Ma X, Boyko AR, Bustamante CD, Oleksiak MF (2010) SNP identification, verification, and utility for population genetics in a non-model genus. BMC Genetics 11: 32.
- 40. Bonhomme M, Chevalet C, Servin B, Boitard S, Abdallah J, et al. (2010) Detecting Selection in Population Trees: The Lewontin and Krakauer Test Extended. Genetics 186: 241–262.
- 41. Flight PA, Nacci D, Champlin D, Whitehead A, Rand DM (2011) The effects of mitochondrial genotype on hypoxic survival and gene expression in a hybrid population of the killifish, Fundulus heteroclitus. Molecular Ecology 20: 4503–20.
- 42. Arnold J (1993) Cytonuclear disequilibria in hybrid zones. Annual Review of Ecology and Systematics 24: 521–554.
- 43. Ballard JWO, Whitlock MC (2004) The incomplete natural history of mitochondria. Molecular Ecology 13: 729–744.
- 44. Abe TA, Spence JR, Sperling FA (2005) Mitochondrial introgression is restricted relative to nuclear markers in a water strider (Hemiptera: Gerridae) hybrid zone. Canadian Journal of Zoology 83: 13.
- 45. Somero GN (2010) The physiology of climate change: how potentials for acclimatization and genetic adaptation will determine ‘winners’ and ‘losers’. The Journal of Experimental Biology 213: 912–20.
- 46. Mitton J, Koehn R (1975) Genetic organization and adaptive response of allozymes to ecological variables in Fundulus heteroclitus. Genetics 79: 97.
- 47. Gompert Z, Buerkle CA (2009) A powerful regression-based method for admixture mapping of isolation across the genome of hybrids. Molecular Ecology 18: 1207–24.
- 48. Luikart G, England PR, Tallmon D, Jordan S, Taberlet P (2003) The power and promise of population genomics: from genotyping to genome typing. Nature Reviews Genetics 4: 981–94.
- 49. Goudet J (2005) HierFstat, a package for R to compute and test hierarchical F-statistics. Molecular Ecology Notes 5: 184–186.
- 50. Mantel NA (1967) The detection of disease clustering and a generalized regression approach. Cancer Research 27: 209–220.
- 51. Jones M, Marron J, Park B (1991) A simple root n bandwidth selector. The Annals of Statistics 19: 1919–1932.
- 52. Narum SR, Hess JE (2011) Comparison of FST outlier tests for SNP loci under selection. Molecular Ecology Resources 11 Suppl 1: 184–94.
- 53. Ewing G, Hermisson J (2010) MSMS: a coalescent simulation program including recombination, demographic structure and selection at a single locus. Bioinformatics 26: 2064–5.
- 54. Hudson R (2002) Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18: 337–338.
- 55. Hudson RR (1990) Gene genealogies and the coalescent process. Oxford Surveys in Evolutionary Biology 7: 1–44.