Ecological Niche Models and Coalescent Analysis of Gene Flow Support Recent Allopatric Isolation of Parasitoid Wasp Populations in the Mediterranean

Background The integration of multiple complementary approaches is a powerful way to understand the processes of diversification and speciation. The parasitoid wasp Aphidius transcaspicus Telenga (Hymenoptera: Braconidae) is a parasitoid of Hyalopterus aphids across a wide geographic range. This species shows a remarkable degree of genetic structure among western, central, and eastern Mediterranean population clusters. In this paper we attempt to better characterize this genetic structure. Methodology/Principal Findings We use a Bayesian coalescent analysis of gene flow under the Isolation with Migration model using mitochondrial and microsatellite markers together with climate-based ecological niche models to better understand the genetic structure of A. transcaspicus in the Mediterranean. The coalescent analysis revealed low levels of migration among western and eastern Mediterranean populations (Nm<1) that were not statistically distinguishable from zero. Niche models showed that localities within population clusters each occupy areas of continuously high environmental suitability, but are separated from each other by large regions of completely unsuitable habitat that could limit dispersal. Overall, environmental characteristics were similar among the population clusters, though significant differences did emerge. Conclusions/Significance These results support contemporary allopatric isolation of Mediterranean populations of A. transcaspicus, which together with previous analyses indicating partial behaviorally mediated reproductive isolation, suggest that the early stages of cryptic speciation may be in progress.


Introduction
Working to understand the processes that contribute to reproductive isolation and speciation is among the most active and challenging areas in evolutionary biology. Over the last several decades the use of genetic markers to identify evolutionarily distinct populations has become common practice [1], with phylogeographic studies having demonstrated that any number of biotic and abiotic components-including geological, climatic, and ecological processes-can generate population structure and eventually lead to reproductive isolation and speciation. Studies of recent divergences are particularly attractive because the signatures of such events have not been fully erased by time, and it can be more straightforward to infer process from patterns in genetic data [2]. As phylogeographic studies accumulate, however, it is becoming clear that complete explanations of evolutionary history often require the integration of multiple complementary sources of information [3].
One powerful way to improve genetic inferences about gene flow and recent speciation is through the incorporation of environmental data [3][4][5][6][7]. Environmental variables determine, in part, the geographic distribution of a species as well as how selective pressures vary within that distribution, shaping the potential for and mechanism behind reproductive isolation. For instance, allopatric genetic differentiation occurs when populations become spatially separated by regions of environmentally unsuitable habitat, resulting in an interruption to gene flow and the gradual accumulation of reproductive isolation mechanisms unless the organisms adapt to the intervening inhospitable conditions [5]. Alternatively, for differentiation in sympatry or parapatry, geographic ranges of diverging populations may overlap or abut, but divergent ecological pressures within the overall distribution are more likely to drive the formation of reproductive isolation. However, even under allopatric conditions ecology can play a role; if isolated populations experience unique selective pressures then local adaptation and niche diversification may ensue, helping to reinforce reproductive isolation even if physical barriers to dispersal eventually disappear [8]. Understanding how environmental variables contribute to the geographic distributions of species is thus of considerable interest.
Geographic information systems based ecological niche modeling is a promising tool for predicting species distributions and identifying factors that have contributed to their divergence [5,9,10]. Such models divide the Earth's surface into a series of grid cells and, provided with coordinates of sampling localities and a set of spatially explicit environmental data layers, statistically assessing the suitability of each grid cell for the taxa of interest. These values are then used to generate a map predicting areas likely to be suitable for a species given the environmental variables included in the model. Results from ecological niche models (ENMs) can be integrated with hypotheses from molecular data to assess whether populations that show strong genetic structure are also likely to be geographically discrete (i.e., each occupying areas of high environmental suitability but with intervening areas of low suitability). The combination of such evidence would suggest that gene flow is likely to be effectively negligible and lineages may be in the early stages of speciation [5,11,12].
Once populations are shown to be structured both genetically and environmentally, a rigorous assessment of gene flow levels is needed to determine their evolutionary status and potential for speciation [1,3]. While phylogenetic analyses or strong differentiation observed using population genetic statistics (e.g. F ST ) may allow identification of evolutionary significant units, these approaches can be problematic for quantifying gene flow under recent divergence because they do not adequately separate the historical effects of shared ancestry from contemporary gene flow [13,14]. However, new computational methods that explicitly model scenarios involving population divergence and ongoing gene flow using the coalescent process [15] can estimate such demographic parameters independently using multilocus genetic data. These methods can thus provide insights into both past and present processes and allow users to statistically distinguish the probability of current gene flow versus complete isolation [16][17][18].
We undertook this study in an attempt to explain the remarkably high levels of genetic differentiation previously observed among population clusters of the wasp Aphidius transcaspicus Telenga (Hymenoptera: Braconidae) [19]. A. transcaspicus is a specialist parasitoid of aphids in the genus Hyalopterus Koch and is distributed throughout much of the Mediterranean and parts of central Asia. Bayesian cluster analyses of microsatellite markers have shown that individuals in the Mediterranean region are grouped into three main population clusters (K12K3; Figure 1) that show striking levels of divergence, as measured using hierarchical F-statistics [20] (among-cluster F CT = 0.45) [19]. Plots of isolation by distance show that over the geographic areas covered by each population cluster there is no increase of genetic divergence with geographic distance, but genetic distances for inter-cluster locality pairs are much greater [19]. Mitochondrial DNA (mtDNA) sequences from the same regions are not reciprocally monophyletic among population clusters, but still have very high levels of differentiation (overall F ST = 0.68) and a unique set of haplotypes is restricted to parasitoids from K3 localities in the eastern Mediterranean, supporting the distinctness of this population [19]. Furthermore, laboratory mating experiments conducted using individuals collected from Spain and Israel (representing K1 and K3, respectively) suggest that this genetic differentiation may be correlated with limited reproductive compatibility between parasitoids from the different regions [21] (Lozier & Mills in prep.). While these experiments identified no obvious post-mating isolation effects in terms of offspring production, reciprocal inter-population mating either took substantially longer than intra-population crosses or were avoided entirely, suggesting that pre-zygotic isolation barriers, perhaps involving mate-recognition or courtship behavior [e.g. 22], have begun to evolve. Together, the genetic and behavioral information indicate that A. transcaspicus populations in the Mediterranean may be in the early stages of speciation.
Sampling data and field observations suggest that A. transcaspicus is most abundant in coastal Mediterranean areas, and becomes rare or absent above 42-43uN latitude in Europe and away from the coast in northern Africa. In contrast, its Hyalopterus hosts appear to have a much broader distribution, especially in Europe, where Hyalopterus pruni occurs as far north as England, Germany, Finland, and Sweden [23] (N Mills, personal observation). In this paper we use ENMs to investigate whether climatic variables may be limiting the geographic distribution of A. transcaspicus in the  Mediterranean region and thus contributing to genetic differentiation at the region-wide scale through environmental barriers to gene flow. At the same time, the broad geographic area inhabited by A. transcaspicus may provide opportunities for ecological specialization in subdivided populations, and we assess this possibility by testing for differences in climatic variables experienced by parasitoids from each genetic cluster. To supplement this analysis, we reanalyze mitochondrial sequence and microsatellite data from Mediterranean populations [19] by fitting them to a coalescent-based Isolation with Migration (IM) model [16,17], which estimates demographic parameters in recently diverged populations and allows the use of likelihood ratios to test the significance of nested models that include or exclude gene flow.

Genetic Analyses
We used genetic data from a previous study [19] to extend our analyses of the population structure of A. transcaspicus in the Mediterranean. Briefly, these data consist of a 432 bp segment of the mitochondrial cytochrome oxidase I gene (COI) and nine microsatellite loci [see 19, 24 for details and GENBANK accession numbers]. In our previous study, we genotyped parasitoids from most localities at which A. transcaspicus was observed ( Figure 1) and found that Mediterranean A. transcaspicus are organized into three highly differentiated population clusters based on microsatellite genotypes: a western cluster comprising parasitoids from Spain, Morocco, and Tunisia (K1), a cluster from Italy (K2), and an eastern cluster comprising parasitoids from Greece, Cyprus, Egypt, Turkey, and Israel (K3). The COI data provided less resolution, but one set of haplotypes was found only in localities within the K3 cluster, highlighting the uniqueness of parasitoids in the eastern Mediterranean.
Our present analyses focus on quantifying the degree of differentiation among these three clusters by fitting our data to the two-population Isolation with Migration (IM) coalescent model as implemented in the IMa software [17]. The model assumes a panmictic ancestral population of effective size N A that, at time t in the past, diverged into two daughter populations of sizes N 1 and N 2 that have undergone regular gene flow at a rate of m 1 and m 2 since divergence. IMa utilizes a Bayesian Markov Chain Monte Carlo method to estimate posterior distributions of the six demographic parameters in the IM model, scaled by the mutation rate m, including the population size parameters (for haplodiploid organisms h A = 3N A m, h 1 = 3N 1 m, and h 2 = 3N 2 m), the divergence time parameter (t = tm), and the post-divergence migration parameters (m 1 = m 1 /m and m 2 = m 2 /m, where, here, m equals the migration rate per generation into the specified population, considered forward in time). Upper bounds for the prior parameters were set at h 1,2 = 4, h A = 100, t = 4, m 1 = 10, and m 2 = 10. These bounds were selected following preliminary runs with very large upper bounds followed by modification in subsequent trial runs depending on the resulting posterior distributions, as suggested in the IMa manual. For the final analysis, the program was run using 50 chains for a total of 5610 7 steps (sampling every 100) following a burn-in period of 500,000 steps. Following multiple trial runs, geometric heating parameters were chosen to achieve sufficient mixing (h1 = 0.99, h2 = 0.70) such that no obvious trends were apparent in posterior distribution scatter-plots. The IMa analysis is computationally intensive, and can thus only analyze a two population model. Therefore, we attempted to perform three analyses, one for each pairwise comparison between K1, K2, and K3. We used the full set of mtDNA sequences from each cluster (n = 51, 12, and 96, respectively) under an infinite sites evolution model, and a subsample of 100 randomly selected individuals each from K1 and K3 (all 17 genotypes for K2 were used) from each of five loci (At001, At003, At004, At009, At016) that did not obviously deviate from the stepwise mutation model (i.e. contained no allele sizes deviating from expected given the repeat unit). Unfortunately, convergence could not be achieved for comparisons involving K2 despite multiple attempts and so we focus here on the distinctiveness of the larger K1 and K3 clusters. The analysis was repeated for three short runs using different random seed values and visually assessed for consistency before performing the final run described above.
To ensure that posterior parameter distributions were scaled properly, inheritance scalars were assigned to each locus to account for the haplodiploidy of A. transcaspicus (diploid females, haploid males). Because IMa is designed to return estimates given a value of h equal to twice the effective number of nuclear gene copies (in this case 1.56N e ), multiplied by m, the nuclear microsatellite loci were given an inheritance scalar of 1.0, while the maternally inherited haploid mtDNA locus was given an inheritance scalar of 0.333 (J. Hey, personal communication). We report estimates for all parameters scaled by m (but see Discussion).
We used 90% highest probability marginal-density (HPD) to examine the variance of parameter estimates. We also used the nested models approach implemented in the ''L-mode'' of IMa (using 5610 5 trees) to statistically test the hypothesis of zero gene flow for both m-parameters as compared to the full model containing all parameters using log-likelihood ratio tests. Because in our alternative hypothesis m is bounded by zero, to test the significance of no gene flow we used the following approach, recommended by J. Hey (internet communication on 26.iii.2007; http://groups.google.com/group/Isolation-with-Migration). First we tested whether the hypothesis of equal gene flow could be rejected (H 0 : m 1 = m 2 versus H A : m 1 ?m 2 , with all other parameters in the model free to vary) where the 22LLR statistic was tested by comparing maximum posterior probabilities for each model using a x 2 distribution with 1 d.f. The second test compared the 22LLR of H 0 : m 1 = m 2 = 0 versus H A : m 1 = m 2 , using a mixed distribution of 1=2x 2 1 z1 2x 2 0 [17] implemented in the R 2.6.2 [25] package emdbook [26]. If neither test returns a significant result, then the hypothesis of m 1 = m 2 = 0 cannot be rejected, providing no support for gene flow levels greater than zero.

Ecological Niche Model
Aphidius transcaspicus locality data. Museum and literature records can be useful tools for obtaining locality data for niche modeling [4], though are reliable only if specimens have been accurately identified. Because A. transcaspicus has a long history of taxonomic confusion [27][28][29], we suspected that many older historical records were potentially error-prone and chose to rely largely on our own extensive collections of the target species. Between 2002 and 2007-largely in May and June-field trips to Spain, Morocco, Tunisia, Italy, Greece, and Israel were conducted to obtain A. transcaspicus samples for importation to California, USA as part of a biological control program for the aphid Hyalopterus pruni. As both A. transcaspicus and its host aphids are sporadically distributed among host plants throughout the region samples were taken opportunistically by looking for Hyalopterusinfested Prunus trees or Phragmites plants (both both Hyalopterus host plants) and collecting any parasitized aphids found. In some countries samples were collected in more than one year during this period, for others only a single year of sampling was conducted. In total, A. transcaspicus was collected or observed at 70 locations throughout these regions ( Figure 1). We also obtained specimens confirmed as A. transcaspicus from Cyprus, Egypt, Turkey, Iran, and Pakistan from collaborators (M. Hadjystilli, University of California at Berkeley; P. Starý, Czech Academy of Sciences, Czech Republic; E. Rakhshani, Tarbiat Modarres University, Iran). The four localities from Iran and Pakistan were used as training data for the ENM but are not discussed further. Finally, we used three localities in Greece obtained from a recent review of European Aphidius species [30], leading to a total of 81 records (Table S1). All localities were georeferenced from label data or field notes using regional maps and Google Earth. We note that areas where Hyatlopterus occurs outside of our 'presence localities' have been searched on several occasions for A. transcaspicus (e.g. France, Germany, northern Italy, Switzerland, northern Greece and coastal Baltic states), but no evidence of the parasitoid was found, despite an abundance of their aphid hosts (Figure 1; N. Mills, personal observation). Other recent surveys of parasitoids by Aphidius taxonomists familiar with the collection and identification of A. transcaspicus have also suggested a restriction to the ''Mediterranean'' faunistic complex, as opposed to boreal, steppe, coniferous, or deciduous habitats [29].
Modeling approach. We generated an ENM for A. transcaspicus using the robust modeling approach implemented in the program Maxent v3.1 [31]. Maxent uses presence-only sampling data and environmental data layers to generate a probability distribution of species occurrence over a given area using the principle of maximum entropy [31]. Our intention is to generate a distribution of geographic areas that are likely to be climatically suitable for A. transcaspicus in a typical contemporary environment, rather than predict on a fine-scale the occurrence in a given community or landscape. Although presence-only data is collected from a geographic area that represents the contemporary realized niche (a-niche) of a species, which may be reduced in size by biotic interactions from the area representing the true fundamental niche (b-niche) defined by climatic tolerance, this is of less concern for A. transcaspicus. As noted above, the distribution of A. transcaspicus is not limited by the distribution of its food resources (also see Discussion), and as both aphidophagous competitors and hyperparasitoids are also more widely distributed than this parasitoid (N. Mills, personal observation), climatic tolerance is likely to be an important driver of its current distribution. We thus used the spatially explicit climatic data available from the WorldClim data set [32] at 5-arcminute resolution, which comprises 19 bioclimatic variables representing annual trends, seasonality and extremes of temperature and precipitation. These variables are collected and interpolated from globally distributed weather stations and are averaged over a period ranging from ,1950-2000. In general, temperature and precipitation have been used extensively to model the distribution of insects [33][34][35][36], and the WorldClim data set has been a central source of climate data for modeling species distributions [37,38].
We first attempted to minimize model over-fitting by calculating Pearson's correlation coefficient (r) between each pair of variables for 1,000 randomly selected points from throughout the geographical extent selected for bioclimatic modeling. Climate data for each point were extracted using DIVA-GIS v5.4 (http:// www.diva-gis.org). Variables with r.0.80 were considered as highly correlated and we selectively removed one variable from each of these pairs. Whenever possible, we retained variables that represent climatic seasonality or extremes rather than annual averages of precipitation and temperature, as they seem more likely to influence the range limits of a seasonal species like A. transcaspicus. The final model included 10 variables: BIO2 -mean diurnal temperature range [(mean of monthly (maximum -minimum temperature)]; BIO3 -isothermality [(mean monthly temperature range/annual temperature range)6100]; BIO4 -temperature seasonality (SD6100); BIO5 -maximum temperature of the warmest month; BIO6 -minimum temperature of the coldest month; BIO8 -mean temperature of the wettest quarter; BIO9mean temperature of the driest quarter; BIO13 -precipitation of the wettest month; BIO14 -precipitation of the driest month; and BIO15 -precipitation seasonality (coefficient of variation). In comparison to the ENM from the complete set of variables (not shown) this reduced set did not alter the results to a great extent, although it generated an ENM with a slightly broader distribution. Thus the model presented here is likely to have a lower rate of falsely omitted cells and should be a more conservative distribution against which to evaluate barriers to gene flow in A. transcaspicus based on predictions of climatic unsuitability.
We used logistic output from Maxent which generates a probability of presence (range 0-1) to each grid cell [39]. During initial runs we randomly selected 70% of sampling localities as training data and 30% as test data [31]. Thus, from the 81 records, we had 45 training points and 19 test points, after excluding multiple occurrences per grid cell. We ran Maxent 10 times with default parameters to evaluate performance across runs. For each we used the area under the receiver operating characteristic curve (AUC) to evaluate model performance for both training and test data. The AUC statistic is a thresholdindependent measure of model performance; theoretically, an AUC score of 1.0 indicates optimal performance, while AUC = 0.5 indicates a model that predicted occurrences no better than at random [40]. We also used a threshold dependent one-tailed binomial omission test to test the hypothesis that test points are predicted no better than at random for each of 11 thresholds implemented in Maxent v3.1. Following evaluation, we performed a single final run of the model using all non-duplicate sample coordinates, following [31]. This final full model is presented below (referred to as the 'full ENM') as a best estimate for the potential niche-based distribution of A. transcaspicus in the Mediterranean.
For visualization of ENM predictions, unsuitable habitat was defined as grid cells with a logistic probability less than the lowest value for any sampled locality (after exclusion of the Turkish locality as an outlier, see below). The remaining grid cells are presented continuously from light to dark shades representing the lowest to highest probability of occurrence, respectively. Finally, the importance of each variable to the final model was analyzed using the jackknife procedure implemented in Maxent, where separate models are analyzed that include each variable alone; the resulting effect on training gain can be viewed as a measure of the information contained by the particular variable.
Among-Cluster Climatic Differences. To further examine whether different genetic clusters occupied distinct ecological niche space, in addition to the 'full ENM,' we created two 'reduced ENMs' using only localities from either K1 or from K3 following the procedures above. K2 was not considered due to the small number of observations.
We also investigated how the climate experienced by A. transcaspicus differed among the three genetic clusters. We extracted values for the 10 bioclimatic variables used in the full niche model for each unique coordinate belonging to K1, K2, and K3 (making the assumption that ungenotyped localities in the vicinity of genotyped localities belonged to the same genetic cluster). We performed a principal components analysis (PCA) on correlations between variables using JMP (SAS Institute) followed by analysis of variance (ANOVA) among clusters on each of the first three principal components (PC1, PC2, PC3). We also tested the significance of differences between K1 and K3 localities separately for each climate variable using Students t-tests (once again K2 was excluded because of sample size). Bonferroni correction was applied so that a = 0.005 for 10 comparisons.

Isolation with Migration Model
We attempted to compare analyses for all three pairwise population cluster comparisons, though we were unable to achieve MCMC convergence for those including K2, so only the K12K3 comparison is presented. Multiple runs of IMa converged on very similar posterior probability distributions over multiple short analyses (not shown), and below we present a final analysis consisting of 5610 5 sampled genealogies (Table 1). All posterior distributions rose and fell from zero, with a clearly defined unimodal peak. Estimates of h for K1 and K3 were very similar and were both much smaller than the h A . The migration estimates were small in both directions, and the 90% HPDs for these parameters overlapped the smallest histogram bin (0.005). The highest posterior probabilities for migration were 0.305 and 0.585 for m K1 and m K3 , respectively. The effective rate at which individuals enter a population per generation (Nm; assuming equal sex ratios) was found by multiplying m K1 and m K2 by the highest posterior probability estimates of h K1 and h K3 , respectively and then dividing the resulting values by three. This results in mean estimates of 0.036 effective migrants into K1 and 0.078 into K3 per generation. The likelihood ratio tests showed that the full IM model was not significantly better than the equal migration model, which in turn was not significantly better than the zero migration model ( Table 1). Neither of the shorter preliminary runs was significant for either test (not shown). Together these results demonstrate that considering gene flow between K1 and K3 is not necessary to explain patterns of genetic variation in these populations, and that a pure isolation model is sufficient.

Ecological niche model
Maxent appeared to perform well for the full ENM, with average training and test AUC values of 0.96160.004 SD and 0.94460.015 SD, respectively, across 10 replicate runs. Threshold-based tests also suggest that the model predicted test localities significantly better than at random, with binomial probabilities of P,5610 24 for all omission thresholds examined. In the model generated using all localities as training data, the only locality omitted from the predicted distribution was from Kahramanmaras, Turkey. The climatic variables that contributed most to the model are summarized in Table 2 indicating that the minimum temperature of the coldest month and temperature seasonality contributed most to the full ENM.
The spatial prediction generated for the full ENM was largely congruent with our prior expectations for the distribution of A. transcaspicus in the Mediterranean (Figure 2). Grid cells with the highest occurrence probabilities were distributed in coastal areas in the southern-most parts of Europe and northern-most parts of Africa, as well as on most Mediterranean islands, and suitability gradually declined with increasing distance from the coast. Localities in the region occupied by the K1 cluster were connected by large, continuous areas of high suitability, including the Iberian Peninsula, Morocco, Algeria, and Tunisia, as well as coastal areas to the east of Tunisia from which we had no samples. The only break in the predicted distribution within the K1 region was caused by the Straits of Gibraltar, a distance of ,15 km over water.
Predicted habitat suitability between K1 and K2 was not continuous, however. While much of Spain was predicted to be at least moderately suitable for A. transcaspicus, habitat suitability in the K1 region declined strongly along the coast in northern Spain at about 40-41uN latitude, with only a narrow strip of habitat with low probability of occurrence along the coast of France and a lack of suitable habitat for about 75-100 km along the Italian coast to the east of France. This prediction of low suitability is supported by our failure to collect A. transcaspicus in France and northern Italy, despite the presence of their hosts (e.g., Figure 1).
The most striking feature of the full ENM, however, is the substantial area of unsuitable habitat predicted to occur between the K1+K2 and the K3 regions. Even coastal areas fall outside of the predicted distribution in this intervening region surrounding the Adriatic Sea, including northern Italy, Slovenia, Croatia, Montenegro, as well as inland portions of Albania and northern Greece. Greece, however, did show relatively high connectivity with other K3 countries including Turkey, Israel, and Egypt, either over land (e.g. the Turkish coast) or over islands of high predicted suitability in the Aegean Sea. Furthermore, a distinct break in the ENM prediction is apparent in northern Africa (Libya), which could help explain the genetic discontinuity between K1 and K3 in this part of the A. transcaspicus range.

Ecological similarity among genetic clusters
The reduced ENMs developed using localities from either K1 (average training and test AUC values of 0.99460.001 SD and 0.99260.002 SD, respectively) or K3 (average training and test AUC values of 0.98160.004 SD and 0.98260.018 SD, respectively) alone also performed well in predicting the range-wide distribution of A. transcaspicus (Figure 2). Each of these predictions is broadly similar in scope to the full ENM, particularly the K1only ENM. As in the full model, both reduced models predicted a coastal restriction for A. transcaspicus and also identified breaks in bioclimatic suitability found along the central Libyan coast and along the northern coast of the Adriatic Sea. Overall this suggests that there is some level of niche similarity between the genetic clusters, though there are several important differences. For the K1-only ENM none of the localities in K1 or K2 were omitted completely from the predicted distribution. Most of the K3 localities were also predicted successfully, with the exception of 5  localities in Israel, the Turkey locality, and a single locality from Pilion-Magnisia, Greece that was georeferenced from [30]. The omissions from the K3-only model were more pronounced, with regions containing the Turkey and Pilion-Magnisia [30] localities (K3), the three Italian localities (K2), and the southernmost Tunisian and all eastern Spanish localities (K1) classified as unsuitable. Comparison of logistic probabilities of occurrence among clusters and models shows clear declines for localities not included in reduced models relative to the full model ( Figure 3). While, on average all sampled localities were predicted with high probability in the full ENM, in the K1-only ENM, the K2 and K3 localities were dramatically reduced, and likewise, for the K3-only ENM the K1 and K2 localities had a low probability of occurrence ( Figure 3). This suggests that localities in each cluster contain unique bioclimatic niche information and thus a model generated using data from only a single cluster does a poor job of predicting the detailed geographic distributions of parasitoids in the other clusters, even though, at least qualitatively, the K1-only ENM does a fairly good job of reproducing the distribution map from the full ENM.
The first three axes of the PCA on bioclimatic variables extracted from sampling localities (explaining 35%, 32%, and 15% of the variance respectively) show that the three Mediterranean A. transcaspicus clusters broadly overlap in multidimensional niche space ( Figure 4). However, ANOVAs were significant among parasitoid clusters for all three principal components (PC1: F 2. 60 = 8.49, P,0.001; PC2: F 2. 60 = 4.64, P,0.05; PC3: F 2. 60 = 12.11, P,0.001 ), indicating some differences in the bioclimatic niche space occupied by parasitoids in each region, as suggested by locality omissions in the reduced ENMs. When compared individually between K1 (n = 30) and K3 (n = 28) localities, five of the 10 bioclimatic variables showed significant differences ( Figure 4): mean monthly temperature range (BIO2; t 56 = 2.14, P = 0.036, although this test is not significant following Bonferroni correction to a = 0.005); minimum temperature of the coldest month (BIO6; t 56 = 2.96, P = 0.005); mean temperature of the driest quarter (BIO9; t 56 = 3.86, P,0.001); precipitation of the wettest month (BIO13; t 56 = 4.21, P,0.001); and precipitation seasonality (BIO15; t 56 = 6.81, P,0.001). As in the ENMs, the Turkish locality was a noticeable outlier in most plots, again suggesting that this point falls outside the bioclimatic zone typical for A. transcaspicus in the Mediterranean. However, repeating the PCA and t tests on bioclimatic variables after exclusion of this locality did not change any patterns of significance (not shown).

Discussion
The power of integrating genetic, ecological, and distributional data when studying the process of diversification is being increasingly recognized, with such approaches providing insights into the mechanisms of evolutionary divergence (e.g. allopatry versus sympatry; drift versus natural selection and adaptation) and also aiding practical taxonomic decisions [3,7,9,41]. In the Mediterranean region, the parasitoid A. transcaspicus is subdivided into three major population clusters that encompass the Iberian Peninsula and the Maghreb region of northern Africa (K1), Italy (K2), and the eastern Mediterranean (K3; Figure 1) [19]. Given the high degree of genetic structure among these clusters [19] and experimental data suggesting the possibility of some pre-mating reproductive barriers [21] (Lozier & Mills in prep.), we wanted to better understand how populations of A. transcaspicus are connected by gene flow. In the present study we quantified gene flow using a coalescent-based analysis under a model of isolation with migration and, using an ecological niche modeling approach, tested whether predicted distributions support patterns of strong population structure in the Mediterranean. Our results show that the spatial distribution of A. transcaspicus is likely limited, at least in part, by climatic tolerances. The prediction of three main geographic regions of high environmental suitability separated by regions of poor suitability is a plausible explanation for the nearly complete genetic isolation of the previously identified population clusters, and suggests that environmental variables may provide the gene flow barriers necessary in the early stages of allopatric speciation.

Distinguishing Isolation from Migration
Previous estimates of high genetic differentiation [19] were supported by our analyses of gene flow using the coalescent IM model. Modal values of the posterior probability distributions for the two migration parameters were consistent with very low levels of ongoing post-divergence migration between western (K1) and eastern (K3) Mediterranean populations, well below that often considered necessary for ecological or evolutionary trajectories to remain correlated [42]. Furthermore, the IM models including symmetrical or asymmetrical migration parameters did not have significantly greater likelihood than the nested model excluding gene flow, indicating that contemporary migration is not necessary to explain the observed patterns of genetic variation (Table 1). While we cannot completely rule out low levels of ongoing migration with only six loci, given the present data we conclude that gene flow between these two A. transcaspicus populations is likely to be negligible and that they have remained effectively reproductively isolated since divergence. This suggests that mtDNA haplotypes shared between the two populations are likely the result of incomplete lineage sorting [13] rather than ongoing dispersal.

Effects of Unsampled Populations on Migration
Inferences. It is unfortunate that models comparing K2 were not possible given the failure of MCMC convergence in IMa, likely due to the more limited sample sizes for Italian populations. It thus becomes important to consider how the absence of IMa results for K2 might influence our interpretation of results for K1 and K2. One possibility is that the K2 population, due to its intermediate geographic location (Figure 1), might act as a stepping-stone for dispersal between K1 and K3. Using the five microsatellite loci evaluated in the present study, an estimate of F ST is, in fact, smaller between K1 and K2 (F ST = 0.487) than between K1 and K3 (F ST = 0.657), suggesting that there may be some slightly higher level of gene flow between western Mediterranean and Italian A. transcaspicus populations. Although this level of divergence is still very high for microsatellite data. The F ST observed for K2 and K3 (F ST = 0.601), however, is nearly as high as for K1 and K3, suggesting that genetic isolation for Italian and eastern Mediterranean populations is likely to be as strong as that observed here for western and eastern Mediterranean populations. Furthermore, any indirect gene flow between the eastern and western Mediterranean populations through Italy, or some other unsampled population, should still have been inferred as direct post-divergence migration in the IM model [43,44] rather than no migration. We thus expect that the high degree of reproductive isolation observed between the western and eastern Mediterranean A. transcaspicus populations is unlikely to be affected by the absence of results for K2. We anticipate that additional sampling of parasitoids from Italy will reveal that K2 does not act as a major stepping-stone population in the Mediterranean, although the degree to which this population is connected by gene flow to either K1 or K3 remains to be assessed.

Ecological Niche Models
Parasitoids belonging to the K1 genetic cluster are broadly distributed in Spain and the Maghreb region of Africa (represented here by Morocco and Tunisia). These regions were well connected in the ENM, with only the Straits of Gibraltar as a potential barrier to dispersal. While previous studies have identified the Straits of Gibraltar as a strong phylogeographic barrier between Iberia and Africa for some organisms [45,46], movement across the straits is known for several insect species [47,48]. Such a narrow stretch of water may not be an effective dispersal barrier for a flying insect like A. transcaspicus. At the northern border of the K1 cluster in Europe the full and K1-only ENMs predicted that habitat suitability declines between northern Spain into France, with only a small strip of coastal habitat predicted as marginally suitable for A. transcaspicus, corroborating our failure to collect A. transcaspicus in northern Spain and southern France. The models also suggest that parasitoids in the K2 cluster are likely to be restricted largely to southern and eastern coastal Italy. Again, this is consistent with absences during field observations ( Figure 1). Finally, the ENMs predicted two major over-land breaks in environmental suitability-one between Italy and Greece and another along the central Libyan coast-that correspond strongly with the genetic distinctness of A. transcaspicus in the eastern Mediterranean (K3). Within the eastern Mediterranean, connectivity of predicted suitable habitat was high, both along the mainland coast as well as for islands groups. While we only had samples for the islands of Crete and Cyprus, our results suggest that A. transcaspicus is also likely to be present on many other islands in the region.
In summary, the ENMs show that A. transcaspicus is likely to be restricted largely to the coastal Mediterranean, with climatic suitability generally declining inland. This supports the prediction that A. transcaspicus prefers ''Mediterranean'' environments to other habitat types found in the region [30] (N Mills personal observation). Together with results from previous analyses [19], these data suggest that regions of environmental unsuitability may be effective barriers to dispersal for A. transcaspicus, even more so than short stretches of water, and may explain the patterns of genetic structure present in the Mediterranean.
One potential problem with identifying breaks in species distributions using ENMs is the difficulty in determining if such breaks are due to true environmental unsuitability or false omissions due to insufficient sampling effort. This is most problematic for the K2-K3 gap in the eastern Balkans, for which we have no observations of A. transcaspicus. Although a recent review of aphid parasitoids of southeastern Europe reported no evidence of A. transcaspicus from these countries [30], additional sampling to verify its absence in this region would be ideal.
Another anomaly that suggests model under-prediction is the omission of the Turkish locality of Kahramanmaras from all three ENMs. However, because this locality was an obvious outlier for most environmental variables, we favor an alternative explanation. Based on the label data included with the Turkish A. transcaspicus specimens (supplied by P. Starý), this locality was georeferenced to the city of Kahramanmaras, but we suspect that this may not be an accurate representation of the site of collection. The Turkish province Kahramanmaras extends well to the south of the city into an area predicted to be highly suitable for A. transcaspicus in our ENMs, and may represent the true source of these samples. This single locality did not affect our results, but does highlight a potential concern for generating ENMs if accurate georeferencing is not available for many data points [but see 49].
Based on our own collecting experience, there is even some reason to believe that the ENM for A. transcaspicus may actually over-predict the true geographic range of this species. This is a general concern when evaluating ENM predictions; these methods generate potential distributions based only on the environmental variables provided by the user, and will not accurately predict aspects of a distribution that are influenced by biotic or abiotic factors not included in the model. For example, ENMs will overpredict a species' range if climatic conditions in an area are suitable, but there is some external factor that prevents colonization, such as the presence of a competitor, absence of prey, or physical barrier such as a mountain range. Beyond the climatic conditions included in our ENMs, the distribution of A. transcaspicus will be affected by the distribution and abundance of Hyalopterus aphids, which in turn will be affected by the distribution of suitable primary and secondary host plants. Such factors might explain the high suitability predicted for areas along the Atlantic coast of Morocco despite our failure to detect A. transcaspicus in this region in 2006, as Hyalopterus aphids were also rare. However, in other areas, such as southern France, Hyalopterus are abundant, and we have never observed A. transcaspicus in this region. Hyalopterus species are, in fact, broadly distributed in Europe and Africa [23] and in many parts of the world where they are invasive [50], suggesting that the presence of hosts is not the most important factor limiting the distribution of A. transcaspicus. Indeed, localities where Hyalopterus was collected but A. transcaspicus was absent generally had low probability of occurrence in the ENM (Figures 1, 2), suggesting that the climatic variables included in the model predict unsuitable habit with good reliability.
Niche Similarity Among Populations. The results of our ENM analyses suggest that the niche space occupied by parasitoids belonging to different populations is similar, but certainly not identical. The ENMs generated using only a subset of localities from either the K1 or K3 regions resulted in range predictions with boundaries that broadly agreed with the full model, suggesting that parasitoids in each population may experience many of the same environmental conditions. All three ENMs also showed gaps in habitat suitability in the same geographic regions, suggesting that these breaks are climatically unsuitable for parasitoids from each population. These observations are consistent with niche conservatism driving genetic isolation in A. transcaspicus, where diverging lineages possess adaptations to a recently shared environment and will tend to occur in ecologically similar areas separated by unsuitable habitat [5]. However notable differences in the reduced ENMs, multivariate analysis, and direct comparisons of climatic variables also suggest a potential for niche evolution within the areas inhabited by the different genetically distinct populations, where natural selection causes niche parameters to diverge in association with lineages [51]. Bioclimatic variables that significantly differed between the clusters K1 and K3 indicated that, on average, K1 localities appear to experience smaller seasonal precipitation fluctuations, colder winter temperatures, cooler temperatures in dry summer months, and less precipitation in the wettest winter month. While we did not statistically compare variables for K2 because of the small number of localities available, the trends apparent in Figure 4 suggest that in some cases southern Italy shares climatic characteristics with either K1 or K3 localities, but in others appears distinct (e.g. greater precipitation in the driest month, less precipitation seasonality).
The potential importance of these climatic differences for niche evolution in A. transcaspicus populations merits further research. Biological control practitioners have long recognized the need to consider climate when selecting natural enemy strains for pest management because parasitoid species often appear to exhibit adaptations to local climate in different geographic regions [52][53][54]. For example, overwintering success of aphid parasitoids can be influenced by their level of cold tolerance [55,56], and while there have been no studies of fungal infection of overwintering parasitoid mummies, infections of overwintering aphidophagous predators can be much greater under conditions of higher precipitation [57]. At the opposite extreme, in summer, parasitoids have a maximum temperature tolerance for survival which can also vary according to geographic region [58,59]. Given the ease of rearing A.transcaspicus in the laboratory, it would be relatively straightforward to conduct experiments to test the effects of different ambient temperatures and moisture levels on colonies collected from different geographic regions. Studies on the climatic limitations for A. transcaspicus would also act as useful validation for the ENM presented here.

Synthesis
Given results from the IM analysis and ENMs, we can begin to speculate on the recent evolutionary history of A. transcaspicus in the Mediterranean. We were clearly able to identify a lack of contemporary gene flow between environmentally isolated populations in the eastern and western Mediterranean, although our ENM only applies to a typical 20 th century climate, and it is more difficult to make inferences regarding demographic processes that occurred prior to this period or how changes in climate may affect future patterns of dispersal and gene flow. And while our genetic results suggest that populations have been isolated since they diverged, estimating the divergence time itself is problematic with the present data. Problems in conversion of coalescent parameters to demographic values can arise, for example, from unpredictable scaling associated with intra-population substructure (although within-population structure is minimal for A. transcaspicus [19]) or due to uncertainty in mutation rates (m) for molecular markers. However, obtaining a broad idea of divergence times can still be valuable for developing hypotheses about demographic history. By applying a naïve estimate of the geometric mean m for the six markers employed in our IMa runs based on the geometric mean of 10 24 mutations per generation for microsatellite loci [60] and 10 28 substitutions per site per year for COI [61], we obtain a geometric mean m = 4.04610 25 per gene per generation, assuming 10 generations per year (estimation based on laboratory observations and probable activity period in the field). The use of this rate leads to a divergence time estimate of ,5,000 generations, or ,500 (90% HPD: 143-1,145) years. However, microsatellite loci are known to have much slower rates of evolution for some insects, which would make true divergence times much older. For instance the use of a lower microsatellite mutation rate such as that observed in Drosophila melanogaster (5610 26 per generation) [62] leads to a divergence time of 6,076 (90% HPD: 1,745-13,896) years. In reality mutation rates will differ across microsatellite loci [60] so the accuracy of either estimate cannot really be assessed, though population divergence at some point within the last 15-20,000 years seems a reasonable upper bound.
Given such uncertainty in the divergence time, there are a number of possible hypotheses for the origins of genetic structure in A. transcaspicus, although resolution among these will require further study. One possibility is that the present distribution of isolated populations originated during the last glacial maximum in Europe. Climatic oscillations have led to repeated expansion and contraction of Mediterranean species, with allopatric isolation in Pleistocene refugia, in particular, having been implicated in shaping the present genetic structure of much of the European biota [63][64][65]. Interestingly, our ENM results suggest that the distribution of A. transcaspicus today differs little from that which might be expected for Pleistocene refugia of many European species [e.g. see Figure 1 of Ref. 65]. If the observed genetic differentiation is indeed indicative of Pleistocene isolation, why might A. transcaspicus not have expanded further northward into Europe as suitable habitat became available? It is possible that Pleistocene populations were distributed as they are today, but remained trapped in their refugia, perhaps due in part to an inability to cross the east-west oriented mountain ranges dominant in Mediterranean refuges. This hypothesis would suggest that A. transcaspicus would have had to evolve rapidly in response to changing conditions, and could account for the somewhat different climatic niches occupied by the different genetic population clusters. Alternatively, A. transcaspicus may have been isolated in much smaller refugia with appropriate climatic conditions, perhaps at the southern limits of European peninsulas or in northern-most Africa, and subsequently expanded into the present distribution as favorable conditions spread. This latter hypothesis makes more sense in light of the low variation observed at COI (seven total haplotypes dominated by two sequences present in 94% of individuals) and microsatellite loci (average expected heterozygosity across all localities of 0.37), as well as the absence of isolation by distance within population clusters [19], a pattern typical of non-equilibrium demographic histories [66].
A second scenario could involve even more recent processes consistent with dates associated with the faster (10 24 per generation) microsatellite mutation rate. For example, it is plausible that A. transcaspicus spent the last glaciation completely outside of the Mediterranean (possibly in Asia, the suspected origin of the sibling species A. colemani) [27], and was introduced rapidly through the Mediterranean by human action associated with the spread of cultivated Prunus and their associated Hyalopterus aphids [67,68]. Climatically unsuitable regions located between those currently occupied by the three contemporary Mediterranean population clusters would have reduced the likelihood of establishment in these areas, thereby preventing gene flow and allowing for the development of genetic differentiation. Recent introduction followed by rapid isolation due to climatic barriers is also consistent with the low within-population genetic variation and lack of isolation by distance discussed above. Inter-population cluster differentiation would be exacerbated on this recent timescale if, as the IMa result suggests, founding populations were small relative to the ancestral population, allowing drift to more effectively change allele frequencies in the absence of gene flow (although we note that estimates of h A can be inflated in an IM model by the presence of 'unsampled' contemporary populations, such as K2 and Middle Eastern populations not considered here). This hypothesis is particularly intriguing in light of partial reductions in mating success observed in reciprocal crosses between A. transcaspicus collected in Spain and Israel [21] (Lozier & Mills, in prep.), and would indicate extremely rapid evolution of pre-mating reproductive isolation behaviors. This might not be particularly surprising for an insect with many generations per year, but would still be an interesting example of rapid divergence in response to recently imposed allopatric isolation. Paleodistributional modeling could be a useful tool for determining where A. transcaspicus could have been historically distributed given its current climatic tolerances [69,70], and thus for distinguishing these different hypotheses. This approach may not be suitable, however, if recent niche evolution has occurred because conditions favorable to A. transcaspicus today may not match those experienced by historical populations. However, a better understanding of how Hyalopterus and their Prunus host plants were distributed during the Pleistocene may provide important clues as to where A. transcaspicus could have survived this period.
In conclusion, our results highlight the power of integrating multiple approaches for understanding patterns of divergence in closely related populations. By incorporating climatic niche-based models of geographic distribution and statistical analysis of isolation versus migration in situations where reciprocal monophyly is unlikely, researchers can not only more effectively detect cryptic taxonomic diversity but can better understand the mechanisms that generate this diversity. Our results suggest that A. transcaspicus may warrant further attention by Aphidiine systematists to determine if any morphological differences correlate with patterns of geographic genetic structure. While we have found evidence for environmentally mediated geographic isolation in contemporary populations, questions persist as to the historical events that may have resulted in the current distribution of A. transcaspicus. More data from A. transcaspicus and its close relatives, together with better estimates of intra-specific substitution rates and more complex population genetic models, will allow us to further resolve divergence times and colonization processes for these Mediterranean populations.

Author Contributions
Conceived and designed the experiments: JDL. Performed the experiments: JDL. Analyzed the data: JDL. Contributed reagents/materials/ analysis tools: NM. Wrote the paper: JDL NM.