Single Nucleotide Polymorphisms Reveal Genetic Structuring of the Carpathian Newt and Provide Evidence of Interspecific Gene Flow in the Nuclear Genome

Genetic variation within species is commonly structured in a hierarchical manner which may result from superimposition of processes acting at different spatial and temporal scales. In organisms of limited dispersal ability, signatures of past subdivision are detectable for a long time. Studies of contemporary genetic structure in such taxa inform about the history of isolation, range changes and local admixture resulting from geographically restricted hybridization with related species. Here we use a set of 139 transcriptome-derived, unlinked nuclear single nucleotide polymorphisms (SNP) to assess the genetic structure of the Carpathian newt (Lissotriton montandoni, Lm) and introgression from its congener, the smooth newt (L. vulgaris, Lv). Two substantially differentiated groups of Lm populations likely originated from separate refugia, both located in the Eastern Carpathians. The colonization of the present range in north-western and south-western directions was accompanied by a modest loss of variation; admixture between the two groups has occurred in the middle of the Eastern Carpathians. Local, apparently recent introgression of Lv alleles into several Lm populations was detected, demonstrating increased power for admixture detection in comparison to a previous study based on a limited number of microsatellite markers. The level of introgression was higher in Lm populations classified as admixed than in syntopic populations. We discuss the possible causes and propose further tests to distinguish between alternatives. Several outlier loci were identified in tests of interspecific differentiation, suggesting genomic heterogeneity of gene flow between species.


Introduction
Most species are genetically structured, and genetic structure is often observed at multiple spatial scales [1,2]. Genetic structure is the result of a complex interplay of drift, gene flow, and natural selection acting on standing variation and new mutations [3][4][5]. The relative importance of these evolutionary forces is contingent on biological features of the organisms [6], and has also been affected by large-scale historical events, such as the Pleistocene climatic oscillations [2,7]. Identification of factors responsible for the observed spatial structuring of genetic diversity is a major goal of population genetics [8]. The quantification and understanding of genetic structure within species are of fundamental importance for inferential studies of population history, population ecology and biodiversity conservation [3,9,10]. Analyses of genetic structure are also essential for several aspects of the study of adaptation [5,[11][12][13][14][15][16][17][18].
Genetic variation within species is commonly structured in a hierarchical manner which may result from superimposition of processes acting at different spatial and temporal scales. For example the impact of major climatic oscillations is clearly visible in the patterns of genetic differentiation observed currently in temperate and boreal species [2,19]. This is believed to reflect mainly secondary contact and partial admixture of populations derived from separate refugia with a contribution of processes related to the expansion itself, such as allele surfing [4,7]. Within these major geographic groups, populations are differentiated due to limited dispersal producing isolation by distance [20,21].
In species with limited dispersal capabilities, signatures of past subdivision are detectable for a long time [2,22]. This may be due to a combination of limited dispersal per se and differential adaptation in refugia superimposed on contemporary ecological gradients [23]. Admixture may also be delayed or prevented by the accumulation of intrinsic incompatibilities between populations [24,25] and poor dispersers appear to speciate on smaller geographic scales [26]. Thus studying contemporary genetic structure in taxa characterized by limited dispersal is likely to provide ample information about historical and demographic events. Amphibians and in particular salamanders are ideal for such inferences [27,28]. Another advantage of such taxa is that they retain historical information about spatial variation of genetic exchange with related, incompletely reproductively isolated species ( [29] and references therein).
Detecting, quantifying and interpreting genetic structure requires appropriate tools. Single nucleotide polymorphisms (SNPs) are powerful markers well suited for assessing genetic structure [30,31]. They are amendable to high throughput, costeffective and reliable genotyping through array-based [32] and genotyping by sequencing [33] approaches. If SNP discovery and genotyping are performed separately, the researcher has control over the location and other characteristics of the SNPs selected for genotyping. For instance, if polymorphism data from transcriptome sequencing are available, SNPs located in known protein coding genes can be selected for genotyping, providing information about variation in functionally important regions. On the other hand, a discover-then-genotype approach introduces ascertainment bias, which distorts the picture of variation obtained from a larger sample [34]. However, initial discovery of SNPs in a smaller sample may be desirable for some applications, such as detection and quantification of population structure [35,36]. This is because the discovery process is biased towards more variable SNPs, thus increasing the per-marker information content, especially if SNP discovery is performed in a random or geographically diverse sample [37]. A distinct advantage of SNPs over microsatellites is that orders of magnitude more locations in the genome, can be easily interrogated. Thus SNPs offer a truly genome-wide perspective, essential if the biological processes of interest affect portions of the genome differentially [38][39][40].
Here we investigate the genetic structure of populations of the Carpathian newt (Lissotriton montandoni, Lm), a species which has apparently survived the glacial period in the Carpathians [41], an important refugial area [42,43]. Two processes appear to have profoundly affected this species and shaped the genetic structure currently observed. The first includes climatic oscillations during the Pleistocene, likely responsible for the observed regional-scale genetic structuring. The second involves hybridization with and introgression from its widely distributed congener, the smooth newt (L. vulgaris, Lv). Previous studies [29,41,44,45] demonstrated that Lm is genetically differentiated across its range in both mitochondrial and nuclear (microsatellites) genome. Patterns of genetic differentiation and species distribution modeling performed by Zieliński et al. [29] suggest several glacial refugia in the Carpathians. While multiple, spatially and temporally distinct introgression events from Lv resulted in complete mtDNA replacement in Lm, very little recent interspecific nuclear gene flow was suggested by microsatellite markers [29]. However, interspecific gene flow in some parts of the nuclear genome has been extensive, as evidenced by data from the Major Histocompatibility Complex genes [45].
A set of transcriptome-derived SNPs and extensive sampling are used herein to address the following issues. First, we compare genetic structure inferred from the genome-wide sample of SNP markers with that estimated previously [29] from a much smaller number of microsatellites. Specifically we wanted to determine the number of genetic clusters (which may correspond to glacial refugia) supported by SNP markers, delineate their distribution and estimate the extent of admixture between them. To this end we use a comprehensive, uniform sampling, including previously undersampled Ukrainian Carpathians where admixture between genetic clusters was expected. Second, we test whether introgression from Lv is detectable in the nuclear genome of Lm with an increased number of markers, and if so, whether introgression varies geographically. Populations in which both species co-occur were also sampled across the range to estimate the admixture in syntopy. Third, we apply outlier analysis to test heterogeneity of gene flow within and between species and identify genes departing from the genomic average; such genes may be involved in population or species-specific adaptations.

Sample collection
Altogether we analyzed 473 individuals from 40 populations: 25 populations of Lm (298 individuals), 7 syntopic populations in which both species co-occur (83 individuals) and 8 populations of Lv (92 individuals) ( Fig. 1; Table 1). Sampling sites were selected to cover the Lm range uniformly and to reflect Lv diversity in the surrounding areas. The average per population sample size, 12, might be considered low for some of the population genetic analyses, however we decided to rather maximize the number of markers as it was shown that this might be beneficial for robust landscape genetic inferences [46]. Throughout the text, we use terms population or locality interchangeably to refer to a particular breeding site consisting of one or more closely located water bodies. Adult newts were sampled by dip-netting during breeding season. Animals were released after tailtips were collected. Tissue samples were stored in 95% ethanol until DNA extraction. DNA was extracted using the Wizard Genomic DNA Purification Kit (Promega). Ethical Committee on Animal Testing at the Jagiellonian University in Krakow. Tissue samples were collected according to the requirements of these institutions: adult newts were captured by dip netting and tissue samples from tail tips were taken under anesthesia. Immediately after recovery from anesthesia, newts were released at the collection site. The sampling locations were not privately owned or protected in any way.

SNP discovery, assay development, and genotyping
SNPs were identified in liver transcriptomes of six Lm individuals sampled to encompass the genetic diversity of the species (Fig. 1). Transcriptomes were sequenced using Illumina technology and de novo assembled with Trinity [47]. Details of transcriptome sequencing and assembly will be provided elsewhere (Stuglik et al. in prep). We used a custom bioinformatic pipeline [48] to construct transcriptome-based gene models (TGM) from the Trinity output. Reads were mapped to this reference transcriptome with Bowtie2 [49] and SNPs were called with SAMTools 0.1.18. [50]. Next, we used blast searches against Xenopus tropicalis transcripts to identify TGM representing protein coding genes. To be included in the design of the genotyping assay, the SNP had to fulfill the following criteria: i) occur in a TGM which produced an unambiguous hit to a single Xenopus gene and to not exhibit high similarity to other TGMs in the newt reference transcriptome; this criterion was applied to minimize the incidence of false ''SNPs'' derived from paralogous regions; ii) have a minimum sequencing depth of 15 x and minimum genotype quality of 30 phred; iii) be located at least 60 but not more than 1000 bp from the exon boundary; the latter filter was used because last exons of many genes are long and consist mostly of 39 untranslated regions (UTR) which are poorly conserved between species; thus the length of such exons in the newt could not be reliably determined and particularly long last ''exons'' may be artifacts of misassembly [51]. Filtering was performed with a custom Python script. A total of 251 SNPs and their flanking sequences were scored with Illumina Assay Design Tool (ADT) and the Illumina VeraCode GoldenGate Assay was designed for 192 best scoring SNPs. GoldenGate provides codominant genotype data for polymorphic positions with two segregating variants [32]. Genotyping, primary visualization, quality assessment and filtering were performed with Illumina GenomeStudio Data Analysis Software. All loci with cluster separation score and gen train score lower than 0.2 and 0.7, respectively, were excluded from further analysis. We also excluded loci with minor allele frequency (MAF),1% and less than 90% genotyped individuals.

Population genetics analyses
Allele frequencies for each locus, tests of Hardy-Weinberg equilibrium and tests of linkage disequilibrium (LD) were calculated in GENEPOP 4.1.2 [52]; the type I error was controlled using the false discovery rate (FDR) approach implemented in QVALUE 1.0 [53,54]. Expected heterozygosity (H E ), was calculated in the R package adegenet [55]. Allelic richness (AR) was calculated in FSTAT 2.9.3.2 [56]. We interpolated geographic gradients in H E and AR using inverse distance weighting (IDW) in ArcGIS v 10.0 (ESRI, Redlands, CA, USA). Pairwise F ST values between populations and their significance were computed in Arlequin 3.5 [57]. Multidimensional scaling (MDS) was used for visualization of the F ST matrix. Principal component analysis (PCA) was performed in R using adegenet. The significance of correlations between genetic and geographical distances was calculated using the Mantel test implemented in IBDWS [58]. A population tree was constructed in POPTREE2 [59] from the pairwise F ST [60] matrix using the neighbor-joining method. The number of genetic clusters was determined and assignment of individuals to clusters was performed using the Bayesian approach implemented in STRUC-TURE 2.3.3 [61][62][63]. We ran Structure on two separate datasets: Lm, which included only morphologically pure Lm populations and Lm&Lv comprising Lm, Lv and syntopic populations. We ran 10 analyses for each K 1-15 for Lm and 10 replicate runs for each K 1-20 for Lm&Lv dataset. In each case, the admixture model was applied and the runs consisted of 250 000 MCMC burnin steps followed by 1 000 000 post-burnin iterations. We performed inferences under the model of correlated allele frequencies for Lm, whereas the uncorrelated model was used for Lm&Lv dataset, because Lm and Lv populations were expected to be more divergent on average. To determine the most likely number of genetic clusters supported by the data, we calculated DK, a measure of second order rate of change in the likelihood of data [64], using the online software Structure Harvester [65]. Analysis of molecular variance (AMOVA) in Arlequin was used to partition SNP variation into hierarchical levels. Two groupings of populations were used: i) suggested by Structure; ii) supported by the methods based on genetic distances between populations. Significance levels for variance components were estimated using 10 000 permutations.
To identify markers departing significantly from the genomewide average of differentiation among populations a scan for F ST outliers was performed. In order to minimize the number of false positives the outlier detection was performed under a hierarchical island model [66] in Arlequin. We performed separate scans for differentiation within Lm and for interspecific differentiation. In each case 50 000 coalescent simulations with 2 groups of 100 demes were performed to obtain the null distribution of Fstatistics. We selected candidate loci based on F ST or F CT values falling into the 1% upper and lower quantile as suggested by Excoffier et al. [67]. F CT allows for identification of outlier loci between the groups of populations whereas F ST identifies outlier loci among populations after accounting for higher-level structure [67]. Genes containing outliers were annotated by similarity blastx search against the nr protein database.

Variation
Genotyping was successful for 139 out of 192 markers (72%) and these were used for population genetic analyses (Table S1)  (Table S1). No significant deviations from Hardy-Weinberg expectations were detected at the FDR 0.05 which indicates that null alleles are very rare in our markers (Table S2). Tests for linkage disequilibrium across populations detected significant LD at the FDR 0.05 for 12 pairs of loci, however significant results were found only in three syntopic populations. Thus significant LD resulted not from physical linkage but from local admixture, and we consider all markers as segregating independently. The expected heterozygosity (H E ) ranged from 0.05 (locality 33, Lv) to 0.29 (locality 29, syntopic) with a mean of 0.19 (SD = 0.07). H E was significantly higher in Lm than Lv (U-test, P = 2.9610 25 ) most likely due to ascertainment bias, as only markers known to be polymorphic in Lm were assayed. Within Lm H E was lowest in population 6, isolated in the Ukrainian Podolian Upland and highest in population 20 located in the Romanian Transylvanian Plateau, and ranged from 0.19 to 0.27 respectively, with a mean of 0.23 (SD = 0.01) (Fig. S1). Within Lv H E ranged from 0.05 in the northernmost locality 33 to 0.10 in locality 36 with a mean of 0.08 (SD = 0.01). H E in syntopic populations spanned a broad range from very low (0.06, locality 31) to the highest overall (0.29, locality 29), most likely depending on the frequency of both species in the population (Fig. S1b).

Genetic structure and diversity in Lissotriton montandoni
Genetic differentiation between Lm populations varied from negligible (F ST ,0.001 P = 0.39 between 14 and 17 located in the northern part of the Romanian Carpathians), to strong (F ST = 0.408 P,10 23 between populations 1 and 24 at the opposite limits of the species distribution) ( Table S3). The MDS plot of pairwise F ST revealed two major, genetically differentiated groups of populations with distinct geographic distributions in the northern and southern part of the species range (Fig. 2a). Pairwise F ST values within groups were similar (averages of 0.100 and 0.098 in the northern and southern group, respectively), and overlapped only slightly with the distribution of pairwise F ST between populations from different groups (mean 0.271; randomization test P,0.001; Fig. S2). Within the northern group two populations appeared distinct from the rest. Notably both are isolated from the continuous part of the species range (Fig. 1). The westernmost locality 1 in the Sudetes Mountains is separated from the main range by the Moravian Gate and locality 6 in the Ukrainian Podolian Upland by the Dniester river. The two groups of Lm populations are strongly supported also by the population tree (Fig. 3).
The presence of two genetic clusters is also evident from individual-based analyses.
In principal component analysis (PCA) PC1 (15.8% of variance explained) separated newts from northern and southern populations (Fig. 2b). The Evanno [64] method also supported K = 2 as the most likely number of clusters in the Structure analysis (Fig.  S3). Structure detected some admixture between the two clusters in the Ukrainian Carpathians and the northern part of the Romanian Carpathians. Admixture was strongest in population 10 which was therefore excluded from the AMOVA analysis (Fig. 4). AMOVA attributed 19.5% of total variation to differentiation between clusters and 8.2% to differentiation between populations within clusters ( Table 2). Whereas no alleles were private to any population, 8 and 20 alleles were private for the northern and southern group, respectively. No significant differences between groups were detected in H E (U-test, P = 0.37), but allelic richness was higher in the southern group (1.62 vs, 1.57; U-test, P = 0.0042) (Fig. S4). As could have been expected from the significant amongpopulation differentiation, there was a strong, highly significant isolation by distance observed both at the level of the entire species (Mantel test, r = 0.89, P,0.001) and within genetic clusters (North: r = 0.78, P,0.001; South: r = 0.78, P,0.001) (Fig. S5).

Differentiation within Lissotriton vulgaris
Genetic structure among Lv populations around the Carpathians is stronger and presumably deeper than that within Lm. A deep split between populations outside the Carpathian belt and those in the Carpathian Basin is visible in the population tree (Fig. 3), MDS (Fig. 5a) and PCA plot (PC3, Fig. 5c). Structuring within the Carpathian Basin is also pronounced, as substantial genetic distance separates the single analysed population of L. vulgaris ampelensis (locality 38) from two populations of the nominal subspecies L. vulgaris vulgaris (Fig. 3, 5a, 5c).

Genetic differentiation and gene flow between species
Strong differentiation between Lm and Lv was detected by the population-based analyses (Table S3, Fig. 3, 5a). AMOVA revealed that 43.5% variation was distributed between species and 13.3% between populations within species ( Table 2). Three of seven syntopic populations occupied intermediate positions in the MDS plot (Fig. 5a). However, an overwhelming majority of newts in syntopic populations fell within the range of variation of either one or the other species; only a handful of substantially admixed individuals and possibly a single F1 hybrid were detected (Fig. 5b,  6). K = 2 was strongly supported by Structure when the two species were analyzed together (Fig. 6, S6). Structure confirmed that in syntopic populations admixture is limited, genotypes of two parental species dominate and significantly admixed individuals are rare. Structure also provided an important insight which was not visible in PCA results: a clearly detectable (.3%) admixture of Lv genes in four Lm populations. Three of these were in the southern part of the Romanian Eastern Carpathians (localities 19,20,25) and one was the isolated locality 1 in the Sudetes Mountains; the average admixture in these populations was 8.5%. No admixture of Lm genes was detected in Lv populations. The comparison of the average proportion of admixture on both genetic backgrounds in syntopic populations demonstrated that the mean admixture was very low, ca. 2% and that the proportion of admixture did not differ (U-test, P = 0.24) between Lv and Lm backgrounds.
Screening for outlier loci between Lm and Lv was performed excluding syntopic populations (26)(27)(28)(29)(30)(31)(32). A total of nine (6.5%) F CT outlier loci were identified: eight (15,72,79,112,116,117,128,129) were more differentiated than expected under the neutral model and one (126) was less differentiated (Fig. 7c). Only one of the interspecific outliers was a nonsynonymous polymorphism. Locus 72 located within gene SRSF1, involved in splicing regulation, was classified as a candidate for balancing selection between the northern and southern Lm groups and at the same time as a candidate for divergent selection between species.

Discussion
Isolation in glacial refugia and limited dispersal determine the genetic structure of Lm Two clearly differentiated genetic units were identified in Lm by SNP data: the northern group in the Western Carpathians and the western part of the Eastern Carpathians, and the southern group across the rest of the species range. Admixture between them occurs around the Romanian-Ukrainian border. Zieliński et al. [29] identified three units in microsatellite data: our southern group combines their eastern and southern units. Can the discrepancy between the two studies be reconciled? Below we argue that SNPs reflect the history and differentiation better than microsatellites and offer several explanations of the apparent discrepancy between the datasets. We consider the alternative explanation that SNP data fail to detect true differentiation unlikely on several grounds. First, an overwhelming majority of pairwise F ST values calculated from SNPs were significant, demonstrating substantial power to detect differentiation. Second, remarkably strong isolation by distance is observed in Lm and the apparent break between the eastern and southern groups of Zieliński et al. [29] coincides with a gap in their sampling. This gap has been filled by the present study. Thus, isolation by distance and non-uniform sampling may have resulted in delineation of the apparently distinct unit [68,69]. Third, the population tree based on SNPs shows a remarkable pattern consistent with colonization from two refugia, but it is difficult to explain under the assumption of expansion from three refugia. While in both groups relationships between some populations are poorly resolved, in each group several populations are related in a nested fashion, with progressively longer branches at the deeper nesting levels. Nesting involves populations distributed from the center to the periphery of the species range -to the west in the Western Carpathians and to the south-west in the Romanian Carpathians. We hypothesize that the populations with poorly resolved relationships are those inhabiting the refugial areas and sharing most variation retained there. Populations related to each other in a nested fashion would be those which colonized the present range through serial events [4,70]. Taking the evidence together, we propose that the location of the refugium for the northern group was in the Eastern Carpathians close to the Polish-Ukrainian border, and that the refugium for the southern group was in the central part of the Eastern Carpathians in Romania. Species distribution models for the LGM reported by Zieliński et al. [29] are broadly consistent with the proposed location of  refugia. The role of the Carpathians as a major refugium for European biota has recently been well documented [42,43,71,72]. Multiple species show genetic differentiation between the Western, Eastern and Southern Carpathians, pointing to the presence of several refugia (reviewed in [29,72]). So far, a refugium in the western part of the Eastern Carpathians has to our knowledge not been proposed. Expansion from refugia is commonly accompanied by loss of variation [4,73]. Reduction of genetic variation along the postulated expansion routes is visible in our data, but the signal is not strong, and may be distorted locally by introgression from Lv (see below). Hence expansion was apparently not accompanied by severe bottlenecks and thus only a minor fraction of variation has been lost. The strongest reduction in genetic variation occurred in locality 6 in the Podolian Upland isolated from the main portion of the range. This population is a remnant of a geographically remote group of Lm populations [74] which has been hypothesized to be isolated from the main part of the range for several thousand years [75].
Salamanders often exhibit low individual mobility and strong philopatry [27,76]. Genetic differentiation between salamander populations appears to reflect these features although the geographic scale of subdivision differs among species [77,78], which may be related to habitat characteristics [79,80] and to lifehistory traits [81]. In continuous habitats limited dispersal abilities are likely to generate isolation by distance patterns with a gradient of genetic differentiation among sites, on which larger-scale, hierarchical differentiation reflecting geographic or environmental barriers may be superimposed [80,82]. A comparison of our results with those of a study [83] on fine-scale genetic differentiation in L. vulgaris graecus suggests that a combination of isolation by distance, probably due to limited dispersal, and spatial clustering due to historical fragmentation and/or landscape barriers occurs in Lissotriton newts at both micro-and macroscales.
Local introgression of Lv alleles into the Lm nuclear genome is detectable with SNP markers A major finding of the present study is substantial introgression of Lv nuclear alleles into some Lm populations. This is contrary to the findings of Zieliński et al. [29] who detected very little recent nuclear introgression in either direction. One likely explanation for the difference between the studies is the number of markers employed [84]. While we analyzed 139 unlinked SNPs, inference about introgression in the previous study was based on only 10 microsatellites. The observed discrepancy does not result from differences in sampling because three of four admixed populations were analysed in both studies. As SNP markers were discovered in a sample of Lm individuals, our study did not use diagnostic markers. This could be considered a weakness if viewed from the perspective of classical studies of hybrid zones which usually employed a limited number of diagnostic markers. However, because of the widespread genomic heterogeneity of interspecific gene flow [38,40,85,86], such diagnostic markers may constitute a highly nonrandom sample of the genome, enriched in genomic regions strongly differentiated between species. In our opinion randomly selected polymorphisms are better suited for an unbiased assessment of introgression. We acknowledge that ideally both species should be included in the discovery panel; this would however limit the number of polymorphic loci useful for the assessment of genetic structure within Lm. The current study demonstrates two peculiar features of Lm x Lv hybridization. First, appreciable (.3%) introgression was detectable only locally, in four of 25 sampled Lm localities. In these populations most individuals were introgressed and the average admixture of Lv genes was 8.5%. Second, in the introgressed Lm populations, admixture was stronger than in seven syntopic localities, where it was barely detectable. Thus current syntopy, even if it leads to occasional hybridization, as shown by a single putative F1 hybrid, does not necessarily cause introgression. This is somewhat surprising because a study of a Lm/Lv hybrid zone at microscale detected strong assortative mating but also found that syntopy was almost universally accompanied by some admixture [44]. As the four admixed Lm populations testify, nuclear introgression of Lv alleles into Lm populations extends beyond syntopy, but does not permeate into the core of the Lm range.
Local differences in the extent of introgression may be explained by several mechanisms. The introgressed populations may be simply located at the tails of local hybrid zones, and would thus be sampled entirely by chance. However other potential explanations deserve consideration. Local ecological conditions may either favor introgression or delay removal of introgressed alleles by selection [87]. Differences in abundance of species in a breeding locality may force the rarer species to hybridize due to scarcity of conspecific mates, but we have not observed this effect in syntopic populations. If species are genetically structured, as in our case, introgression may be easier between some genetic groups if their genomes harbor fewer incompatible alleles and thus intrinsic selection against hybrids is weaker or ecological/sexual adaptations are similar [24,25]. Lv is strongly differentiated genetically [41,88,89] and various Lv groups come into contact with Lm populations in the Carpathian Basin, and outside the Carpathian belt. If introgression is neutral, the observed pattern may result from expansion-related phenomena [4]. Under the scenario modeled by Currat et al. [90], when one species invades the range of another, neutral introgression occurs almost exclusively from the resident to the invading species. Thus, local expansion of Lm would bring Lv genes onto its genetic background. A comparison of the two isolated Lm populations may be instructive in this respect. Population 1 at the western margin of the species range, probably the result of postglacial or more recent expansion, has recently introgressed Lv mtDNA and shows clear evidence of nuclear introgression. Another isolated population (6), close to the postulated refugial area of the northern Lm group and possibly surviving in situ for a long time, shows no trace of nuclear introgression. Scenarios related to the Currat et al. [90] model were favored as the explanation of mtDNA introgression and replacement in Lm by Zieliński et al. [29].
In addition to laboratory experiments which are difficult to perform in this system due to logistic reasons, two other kinds of analyses would be informative with respect to the causes of the apparent differentiation in the extent of introgression. Examination of several transects through hybrid zones in the context of local environmental conditions and relative species abundance could be informative as demonstrated in multiple systems [87,[91][92][93]. Another important way forward would be to use multilocus sequence data [94] to construct and test multipopulation models of gene flow between Lm and Lv. Models distinguishing two groups within Lv, inside and outside of the Carpathian basin, as well as two groups within Lm can be evaluated and hypotheses regarding the timing and extent of gene flow may be tested within an Approximate Bayesian Computations framework [95,96]. This approach would provide a longer-scale perspective on gene flow between species and its spatial and temporal variation.

Genomic heterogeneity of gene flow within and between species
Outlier loci were detected both within Lm and between Lm and Lv. Such candidate loci may signal various forms of selection acting on the markers themselves or at linked sites [24,38]. Alternatively their apparent outlier status may result from violation of the model assumptions, to which the available methods are very sensitive [13]. We do not attempt a formal functional analysis of the identified outliers but rather emphasize that the outliers detected in the Lm-Lv comparison indicate heterogeneity of interspecific gene flow in nuclear protein coding genes. Dramatic discordance in the propensity for interspecific gene flow occurs between the mitochondrial and nuclear genome ( [29]; this study). Within the nuclear genome the genes of MHC class II introgress easily between the two species [45]. The present study suggests that heterogeneity of gene flow is widespread in the nuclear genome. Some genomic regions, typically linked to genes involved in intergenomic incompatibilities or underlying speciesspecific adaptations, i.e. genes which may cause reduced hybrid fitness, acquire reproductive isolation earlier than other regions [38,97,98]. The size of such regions and mechanisms responsible for maintenance of genomic differentiation have been a subject of ongoing controversy and intense recent research [85,[98][99][100][101]. It is expected that the shape of the heterogeneity in gene flow will evolve over time and a comparison of the extent of heterogeneity at various stages of divergence is of great interest for the understanding of the buildup of genomic divergence as differentiation progresses [40,86]. Transcriptome data, such as those used here for the development of SNP markers, are being applied to study genomic heterogeneity of gene flow in the Lm/Lv system (Stuglik et al. in prep.).

Conclusions
Using a panel of transcriptome-derived SNP markers, our study has demonstrated that isolation in glacial refugia and limited dispersal have been the main factors determining the genetic structure of Lm. Two substantially differentiated groups of Lm populations likely originated from separate refugia, both located in the Eastern Carpathians. The colonization of the present range in north-western and south-western directions was accompanied by a modest loss of variation. Local introgression of Lv alleles into several Lm populations was detected. Introgression was higher in Lm populations classified as admixed than in syntopic populations. We discuss the possible causes of this discrepancy and propose further tests to distinguish between alternatives. Several outliers were identified in tests of interspecific differentiation, suggesting genomic heterogeneity of gene flow between species. The shape of genomic heterogeneity at various stages of species divergence is of major interest for the understanding of the buildup of differentiation across the genome and Lm/Lv is a promising study system in this respect.