Speciation in the Deep Sea: Multi-Locus Analysis of Divergence and Gene Flow between Two Hybridizing Species of Hydrothermal Vent Mussels

Background Reconstructing the history of divergence and gene flow between closely-related organisms has long been a difficult task of evolutionary genetics. Recently, new approaches based on the coalescence theory have been developed to test the existence of gene flow during the process of divergence. The deep sea is a motivating place to apply these new approaches. Differentiation by adaptation can be driven by the heterogeneity of the hydrothermal environment while populations should not have been strongly perturbed by climatic oscillations, the main cause of geographic isolation at the surface. Methodology/Principal Finding Samples of DNA sequences were obtained for seven nuclear loci and a mitochondrial locus in order to conduct a multi-locus analysis of divergence and gene flow between two closely related and hybridizing species of hydrothermal vent mussels, Bathymodiolus azoricus and B. puteoserpentis. The analysis revealed that (i) the two species have started to diverge approximately 0.760 million years ago, (ii) the B. azoricus population size was 2 to 5 time greater than the B. puteoserpentis and the ancestral population and (iii) gene flow between the two species occurred over the complete species range and was mainly asymmetric, at least for the chromosomal regions studied. Conclusions/Significance A long history of gene flow has been detected between the two Bathymodiolus species. However, it proved very difficult to conclusively distinguish secondary introgression from ongoing parapatric differentiation. As powerful as coalescence approaches could be, we are left by the fact that natural populations often deviates from standard assumptions of the underlying model. A more direct observation of the history of recombination at one of the seven loci studied suggests an initial period of allopatric differentiation during which recombination was blocked between lineages. Even in the deep sea, geographic isolation may well be a crucial promoter of speciation.


Introduction
Reconstructing the history of divergence and gene flow between closely-related organisms is a difficult task of evolutionary genetics that has recently received much attention [1]. The analysis of DNA sequence polymorphism at multiple loci associated with new statistical coalescence-based inferences now provides an elegant approach to test for the existence of gene flow during the divergence process [2]. This approach has been applied to a number of closely related species [3], [4]. Somewhat unanticipated, gene flow has been detected in most cases [5]. Taken at face value, these observations would suggest parapatric speciation as a more common mode of speciation than previously thought. However, it is not clear that the detection of gene flow necessarily means genes continuously exchanged during the whole divergence period. Secondary introgression can be difficult to separate from ongoing primary differentiation with gene flow (i.e. parapatry), because the high stochasticity of the coalescence process often leads to a full range of situations, from reciprocal monophyly to extensive allele sharing [6]. Furthermore, in both cases natural selection is expected to prevent gene flow in regions of the genome where species-specific adaptations operate while gene flow can still be substantial in regions lacking isolation genes, resulting in socalled semi-permeable barrier to gene flow [7].
We here address the issue of the history of gene flow during differentiation in an original environment, hydrothermal vents in the deep ocean. Oceanic ridges are motivating places to study mechanisms by which speciation occurs. Ridges are nearly all connected into a one-dimensional network that has spread over the globe following plate tectonics. Contrary to what prevails in surface, quaternary climatic oscillations should not have perturbed so much the deep ocean and it is unclear whether long episodes of geographic isolation could have occurred in the recent history of species inhabiting ridges. In addition, the high variability of hydrothermal vent conditions (steep thermal gradients and chemistry heterogeneity) could induce perfect conditions for ecological speciation. For example, because of the ultramafic nature of the oceanic crust beneath the Mid-Atlantic Ridge [8], hydrothermal vent emissions from the deepest sites are naturally enriched in methane but nearly depleted in sulphides whereas it is the reverse in the shallower sites [9]. Depth, hydrothermal fluid composition in heavy metals, sulphides and methane, temperature and symbiotic associations are highly fluctuant parameters, which are likely to induce local adaptation and possibly ecological speciation. On the other hand, the subduction of ancient rifts and the emergence of new ones might have led to either isolation of vent fields at a large spatial scale on the long term or to transient events of spatial isolation on the short term, both phenomena being thought to have deeply affected the hydrothermal vent fauna [10], [11]. The constant modification of the hydrothermal vent landscape due to the fluid displacement led to recurrent 'crash and flush' in populations that might likely promote speciation and cause species crypticism [12], [13].
Mussels from the genus Bathymodiolus [14] are one of the most widespread mollusc dominating deep-sea hydrothermal vents and cold-seeps communities throughout the world. Interestingly, Bathymodiolus mussels still possess a dispersive larval stage, a biological feature that has long questioned the possibility of geographic isolation in marine species [15]. The genus Bathymodiolus has diversified a few millions of years ago following an explosive radiation leading to a large number of geographic species [16], [17]. Two distinct species of Bathymodiolus, B. azoricus and B. puteoserpentis, were morphologically described from the northern part of the Mid-Atlantic Ridge. Previous studies using both rDNA ITS2 intergenic spacer [18] and a combination of mitochondrial DNA sequences and allozymes [19] demonstrated the hybridizing potential of these two species on a very restricted area where the distribution of the two species meet. Concordant narrow clines in allele frequencies have been described [19] together with strong cytonuclear disequilibria in intermediate populations [20]. The history of divergence and gene flow, however, remains to be settled. Indeed, the hydrothermal vent complex of Bathymodiolus species in the northern Atlantic extends over more than 3000 km along the ridge axis, with fields separated by large transform faults, which offset the rift over tens to hundreds kilometres [21], [22]. The distribution of B. azoricus is limited to the north because of the rise of the rift up to Iceland and its southern limit is the hybrid zone centred on the Broken Spur site (29uN). Conversely, the northern range limit of B. puteoserpentis is the hybrid zone while the southern limit is unknown. Bathymodiolus phylogeny clearly indicates that B. azoricus and B. puteoserpentis are sister species of the Bathymodiolus boomerang species complex that inhabit the deep Atlantic cold seeps situated both on the American and African margins [17]. Won et al. [20] therefore proposed that the Broken Spur vent field (29uN) hybrid population corresponds to the dual arrival of migrants from more extreme source populations. Actually, this situation may suggest that either (i) B. azoricus and B. puteoserpentis have recently colonised the Mid-Atlantic Ridge via two independent entrance points and subsequently met again to hybridise in a secondary contact zone, or (ii) that ongoing speciation is occurring between the two species, in which case one species would derive from the other along an environmental gradient (parapatric speciation).
DNA sequence polymorphism data from one mitochondrial and seven nuclear loci were here gathered in order to assess genetic diversity, population subdivision and gene flow within and between both Atlantic vent species, and to perform divergence/ polymorphism analyses using the outgroup species Bathymodiolus thermophilus (Pacific Ocean). The goals of this study were: (1) to infer the existence of gene flow in the process of divergence by using the Isolation with Migration (IM) model [23], (2) to estimate the genome-wide level of gene flow and attempt to analyse its possible variation among genes, and (3) to determine whether genetic variation follows the neutral mutation-drift expectations.  (Figure 1). These two first species were collected using the telemanipulated arm of the ROV Victor6000. The third species, Bathymodiolus thermophilus, was sampled at 7u25S vent field (2450 m; 7u259S-107u479W) on the East-Pacific Rise (EPR) using the telemanipulated arm of the manned submersible Nautile, and on the Foundation hot spot (37uS on the Pacific Antarctic Ridge) (2200 m, 37u70S-110u879W) using a dredge and a chain bag. Samples were brought back to the surface alive, subsequently dissected on board. Foot tissue and gills were immediately frozen and kept in liquid nitrogen until nucleic acid extraction back in our laboratory. DNA was extracted using the classical phenolchloroform protocol [24]. A total of 44 and 52 individuals were used for the Rainbow and Menez Gwen B. azoricus populations respectively, 33 individuals for the Logatchev population of B. PCR amplification, cloning and sequencing DNA sequences were obtained for 7 nuclear loci and one mitochondrial gene ( Figure 2). Six are protein coding genes, Elongation Factor 1a (EF1a) (FJ767126 to FJ767244); Sadenosylhomocysteine hydrolase (SAHH) (FJ767612 to FJ767709); Collagen type XIV (COL1) (FJ766982 to FJ767125), Lysozyme (LYZO) (FJ767466 to FJ767611), Ferritin GF1 (with two distinct paralogues: GF1A and GF1B) (FJ767245 to FJ767353 and FJ767354 to FJ767407). The two other loci are the mitochondrial Cytochrome-C Oxydase subunit 1 (CO1) (FJ766849 to FJ766981) and the ribosomal internal spacer 2 (ITS2) (FJ767408 to FJ767465). Targeted nuclear genes were obtained from a collection of B. azoricus ESTs (full-length cDNA library, [25]).

Data collection
PCR amplifications were done with specific sets of primers designed from the conserved exonic regions of B. azoricus EST sequences, for which an intron was known to occur in other taxa. The sequences of the primer sets are summarised in Table 1. Primer sets and conditions used for CO1, EF1a and for Lysozyme have been described elsewhere [26]. For new loci, genes were amplified and cloned following the Mark-Recapture (MR) method [27] (see also [26], [28]) with 4-nucleotides 59-tailed primers (4 bp polynucleotides tag at their 59 ends). Reactions were performed in

Data analyses
For all genes, sequences were checked and edited using the Chromas 2.22 computer program (Technelysium Pty. Ltd., Helensvale, Australia). The sequences from each dataset were initially aligned with the program ClustalW [29] in BioEdit Sequence Alignment Editor [30], and improved manually. Artefactual mutations and in vitro recombinants were discarded when detected from alleles sequenced several times in multiplerecaptures. In order to prevent from sampling bias, only one sequence per individual was kept. In the specific case of heterozygote individuals, when both allelic forms were captured, the most frequently captured allele was retained. We used the two populations of B. azoricus as distinct samples (never pooled) in order to test whether a ''site effect'' occurred between Menez Gwen (850 m) and Rainbow (2500 m) and to get partial replicates when comparing B. azoricus and B. puteoserpentis.
As intragenic recombination was often detected, gene trees do not necessarily reflect gene genealogies. To represent an overview of the genetic relationships among alleles we inferred a network with the NeighborNet algorithm [31] in the software SplitsTree4 [32]. In addition, we paid a special attention to reconstructing gene genealogies for two loci which hold useful information. For the SAHH locus, recombination was not detected and the single possible gene tree was reconstructed with Genetree [33]. For the GF1B locus, in which a few recombination breakpoints and a few recombinant alleles were easily identified, we reconstructed the more parsimonious genealogy without recombinants and added recombinants afterwards with the help of the ancestral recombination graph reconstruction made possible with the Beagle software [34].
For each gene, the number of segregating sites (S), nucleotidic diversities (h w [35] and h p [36]), Tajima's D statistic [37], Ramos-Onsins and Rozas's R 2 statistic [38] minimum number of recombination events (Rm), [39] and absolute divergences (D XY ) were estimated using the DnaSP version 4.10 software package [40]. Genetic differentiation between populations was estimated with the wst statistic analysis carried out with the Arlequin software [41], using a matrix of Tamura-Nei's distances between individuals; with 1000 permutations. Statistical significance of Tajima's D tests for each locus and sample was assessed by coalescent simulations with the software HKA (J. Hey's web page: http://lifesci.rutgers.edu/~heylab/HeylabSoftware.htm#HKA). Although coalescence simulations are conditioned on Theta and not on the number of segregating sites, they are based on the HKA protocol [42], a multilocus framework which incorporates the information contained in the divergence with the outgroup (e.g. mutation rate). We nonetheless verified whether the results were different when conditioning simulations on S, and they were not. According to Ramirez-Soriano et al. [43], the R 2 statistic was tested by coalescence simulations conditioned on S. A multilocus version of the HKA test [42], a neutrality test that compares divergence and polymorphisms at several loci, was performed on the 8 loci with the HKA software in order to test whether differences in tree topologies reflect differing levels of evolutionary rates or selective processes. In this latter test, a locus under positive selection is supposed to have a polymorphism/divergence ratio smaller than that of a neutral one. Finally, when possible we also used the coalescence-based maximum-likelihood method of Galtier, Depaulis, and Barton [44]. This method is designed to detect a distortion in the shape of gene genealogies generated by a diversity-reducing event (hitchhiking or bottleneck). The likelihood of a model in which a drop in effective size of strength S occurred at time T in the past is compared to the likelihood of a constantsize model.

Fitting an Isolation-with-Migration model
To discriminate between the relative effects of divergence and gene flow on the speciation process, we analysed our data set under the Isolation with Migration model [45]. The model assumes that an ancestral population splits into two descendant populations that may continue to exchange genes after separation. The method estimates posterior probability distributions for both ancestral and actual population sizes, directional migration rates between the two populations, and the time elapsed since population splitting. A MCMC approach is used to draw a sample from the posterior distribution of genealogies and to estimate population parameters. The posterior distributions of migration rates and population sizes are derived analytically from the sampled genealogies [46], [23]. Convergence by the Markov chain simulations toward the true stationary distribution is checked by monitoring multiple independent chains started at different starting points and by assessing the autocorrelation of the parameter values over the course of the run. We also used the procedure for swapping among the cold primary chain and multiple heated chains (Metropolis coupling) to further ensure that the distributions we obtained actually reflected the stationary distributions [47]. Each locus was assigned an inheritance scalar, to adjust for its relative expected effective population size: 1.0 for autosomal locus, 0.25 for mtDNA. The method assumes no recombination and allows two mutation models: infinite sites (IS) and HKY with back mutation [48]. We chosed the HKY model and used the largest fragment without recombination for the analyses.
We first conducted a multi-locus analysis with the IMa program [45] and secondly analysed the migration rates of each gene separately with both the IMa and the IM program [46], [23]. Using the IM program allowed us a parameterisation of individual migration rates for different loci while theta and divergence time remained multi-locus estimates. The 90% highest posterior density (HPD) interval, that is the shortest span that includes 90% of the probability density, was recorded as. Conversion into effective population size estimates was done from the relationship N e = h/ 4u. Migration parameters of the model were used to calculate the number of migrants exchanged between the two hybridising species (i.e., M = 2Nm, the product of the effective number of gene copies and the per gene copy migration rate). Thus, M = 2N e m = (4N e u 6 m/u)/2 = h 6 m/2 [23].
When the method reveals nonzero migration rate estimates, the posterior distribution of the mean time of migration events was estimated for trees sampled from the Markov chain that had at least one migration event (given by the IM program) following Won and Hey [3].

Genetic variation
To avoid a sampling bias in estimates and tests that assume random sampling, we used one multiple captured allele per individual, this allele being the most frequently captured in our mark-recapture cloning procedure [27]. From both Atlantic species and the outgroup B. thermophilus, a total of 559 sequences have been analysed at one mitochondrial locus and 7 nuclear loci, leading to a global number of 437 756 bases sequenced (745 bp per sequence in average). Allelic and nucleotide diversities as well as divergence to the outgroup are summarized in Table 2 for the three Bathymodiolus species. The weighted mean of Watterson's h estimators [35] across autosomal loci (i.e. without CO1), were 0.0076, 0.009 and 0.0101 for B. azoricus (Rainbow), B. azoricus (Menez Gwen), B. puteoserpentis, respectively. The average diver- gence to the Pacific species B. thermophilus at autosomal loci was 3.6%. The divergence observed at the mtDNA locus was much higher (11.8%) while the diversity was lower than the nuclear average, in accordance with a higher rate of evolution and a smaller effective population size of the mitochondrial genome. Pairwise Wst estimates between the two B. azoricus populations were never significantly different from zero. As a consequence the two azoricus samples were pooled in following analyses. The between-species comparison (B. azoricus/B. puteoserpentis) detected significant differences at five loci: CO1 (P = 0.01), SAHH (P = 0.01), LYSO (P = 0.01), EF1a (P = 0.05), and ITS2 (P = 0.05).

Contrast of the phylogenetic signal across gene trees
All loci displayed a large number of parsimony informative sites (77 per locus on average when excluding the outgroup). Networks inferred with the NeighborNet algorithm depict the genetic relationship among alleles for the 8 genes in Figure 3. The mitochondrial locus CO1 is the only one that displayed complete reciprocal monophyly of alleles sampled in both species. For the nuclear genes SAHH and LYSO, two well defined groups of alleles were observed, each of which preponderantly composed of genes sampled in the same species with only a few genes sampled in the other species. To a lesser extent, the ribosomal spacer ITS2 and EF1a tended to fit fine with the species dichotomy B. azoricus and B. puteoserpentis. The three other genes COL1, GF1A and GF1B displayed highly reticulated networks and an absence of association with geography (species). Although one could have attempted to define two groups of alleles at loci GF1A and GF1B, there was no possibility to attribute these groups to a given species. The position of the outgroup and the shape of the network obtained at locus COL1 did not produce phylogenetic relationships which easily fit with a long period of divergence between two gene pools.
The five loci which retained a probable footprint of differentiation were used to define two groups of alleles (ellipses in Figure 3) and the frequency of such defined synthetic alleles was reported in Table 3.
Variations in phylogenetic patterns across loci can be readily summarized by comparing the levels of species divergence across loci. We computed the net nucleotide divergence d, which is the average divergence between B. azoricus and B. puteoserpentis minus the average sequence divergence within both species (p puteo +p azo )/2). Without inter-species gene flow, d should be positive, while under a model in which B. azoricus and B. puteoserpentis have exchanged genes since the separation from their common ancestry, d may fall down to zero and even become negative. The quantity d was positive for loci CO1, SAHH and close to zero for the remaining loci ( Figure 4). d can then be compared to the divergence obtained between the two Atlantic species and their Pacific outgroup. In the absence of gene flow, a positive relationship is expected between evolutionary rates. However, the relationship between the B. azoricus/B. puteoserpentis divergence and the Pacific/Atlantic divergence was not significant. Five loci (SAHH, COL1, LYSO, GF1A, ITS2) out of the eight displayed similar ratios ( Figure 5). Three genes seemed to depart from the genomic trend. CO1 and GF1B exhibited higher values, while EF1a displayed a lower ratio.
Tajima's D tests [37] were used to measure the deviation from the mutation/drift equilibrium under neutrality (Table 2). In the three mussel populations, Tajima's D was mostly negative and sometimes significantly so (CO1, EF1a, ITS2). However, the LYSO locus displayed a D close to zero and the GF1B locus displayed a significantly positive D value. As population structure has an important impact on Tajima's D, the average D was plotted against W ST in Figure 6. The higher Tajima's D observed at loci LYSO and GF1B could likely be the consequence of secondary introgression of divergent alleles while low gene flow at the CO1 and SAHH loci prevented divergent alleles to segregate within the same population. However, it remained puzzling why the D was so low at the remaining four loci. To complement Tajima's D tests, a multi locus HKA test has been performed. This test indicates that the ratio of polymorphism to divergence did not vary significantly among loci, however, the result for B. puteoserpentis was nearly significant: sum of deviation = 3.9587 (df = 7; P = 0.7845); 7.7511 (df = 7; P = 0.3550); 13.3811 (df = 7; P = 0.0633), for B. azoricus Rb vs B. thermophilus 7S; B. azoricus MG vs B. thermophilus 7S; and B. puteoserpentis vs B. thermophilus 7S respectively. We also computed the Ramos-Onsins and Rozas' R 2 statistic [38]. R 2 is a statistic based on the difference between the number of singletons per sequence and the average number of nucleotide differences and is especially well designed and powerfull to detect population expansion. R 2 was significant for CO1, COL1, (for all the populations), GF1B, ITS2 (for the B. puteoserpentis_Logatchev population), and SAHH (for the B. azoricus_Menez Gwen population).
The analysis of patterns of genetic diversity revealed a strong contrast among loci. Although some loci have well registered a history of divergence, exhibiting two distinct groups of alleles that were sometimes mixed up (e.g. GF1B) and sometimes not (e.g. CO1), other genes did not (e.g. COL1). Tajima's D was mostly negative suggesting population expansion, especially in B. azoricus. However, the homogenisation of divergent alleles was strong enough at loci GF1B and LYSO to restore positive D values. Possibly, selection could have superposed to history and demography and might explain that 3 genes (CO1, SAHH, and EF1a) departed from the averaged evolutionary rate estimated from the divergence with the Pacific species.

Isolation-with-Migration model
Repeated runs of the IMa program revealed unambiguous marginal posterior probability distributions of the parameters for the two actual species and the ancestral one. The peaks of the six parameter estimates were confined to fairly narrow ranges with corresponding posterior distribution illustrated in the Figure 7. In order to convert parameter estimates in more easily interpreted units, we estimated substitution rates for each locus from the divergence between the species of B. thermophilus from the EPR and its sister-species from 37uS, under the hypothesis that deep-sea species have been separated during the formation of the Eastern Island microplate (at 25uS) about 5.9 millions years ago [49]. The Eastern Island microplate was created on the axis of the ancestral Pacific ridge, and is thus responsible of the split of the South Pacific hydrothermal fauna north and south of this island. A geometric mean of the B. thermophilus EPR -37uS DNA sequence divergence across loci was calculated and subsequently used to convert the time since separation parameter, t, to divergence in years. The mutation rate can be estimate as r = D/2T, where D is the pairwise sequence divergence and T is the time of divergence, multiplied by 2 to account for the age of each lineage. The geometric divergence for the mitochondrial and the seven nuclear genes is 1.07%. This divergence gave a substitution rate of 0.09% per MY. These values are in average twice as low as previous evolution rates estimated by Chevaldonné et al. [50] and Johnson et al. [51] for other hydrothermal invertebrates using the mtCO1 gene. However, this substitution rate is nearly equal to the rate of 0.10% per MY found by Johnson et al. [51] on a nuclear locus. Taking into account the length of each gene (Table 2), we estimated the number of mutations per locus per year to be from 0.25610 26 (ITS2) to 2.07610 26 (CO1). The geometric mean, u = 0.63610 26 was used to rescale the IMa parameter estimates from the combined analysis. We used this value to directly convert the estimate t (the number of substitutions) to a number of years since population splitting.
To convert estimates of the population mutation rate parameters (h 1 , h 2 , and h A ) to estimates of effective population size (N 1 , N 2 , and N A , respectively), we estimated the generation time of Bathymodiolus from previous estimations of the mussel individual growth rate: 0.6 to 1.1 cm per year [52], [53] and the size at which individuals reach their sexual maturity (2 years: the minimum size observed for individuals able to reproduce is 19.7 mm for male and 31.  Estimates of the effective population size are shown in Table 4. The estimate of the B. azoricus population size was five times greater than that of B. puteoserpentis and the ancestral one, which suggests population expansion in B. azoricus. The marginal posterior probability distribution of the divergence parameter, t, revealed a sharp peak at 0.477 (Figure 7), which converted into a divergence time of 0.758 MYA with the 90% HPD interval ranged from 0.599 to 0.953 MYA ( Table 4). The multi-locus estimates of migration parameters revealed an asymmetric gene flow from B. azoricus to B. puteoserpentis with a mode around 2Nm = 4.3 ( Figure 7, Table 4), while the reverse flow from B. puteoserpentis to B. azoricus did not appear to significantly depart from zero.
Although the multi-locus estimates benefited from the statistical power of the full information contained in the eight loci the single locus analysis of population differentiation (Table 5) suggested that gene-flow could vary between genome regions as expected in the model of semi-permeable barriers to gene flow [7]. A single locus analysis of divergence with gene-flow under the Isolation with Migration model was therefore conducted. For this analysis we compared the outputs obtained with the IM [46] [23] and the IMa [45] programs. As results were very similar, we here present the results obtained with the IMa program that retrieved graphically clearer posterior distributions. Results are presented in Table 5 as a series of single-locus probabilities and in Figure 8 where posterior distributions are drawn. The mode, 90% HPDs and shapes of the posterior distributions allowed us to distinguish three categories of loci: (1) one locus without any evidence of migration (max at 0 in both direction), CO1; (2) four loci with a maximum of the marginal posterior distribution different from 0 at least in one    (Table 5). Despite the lack of power of the single locus analyses, differences between extreme values showed up. For instance, loci ITS2 and EF1a showed distinguishable probability distribution to locus CO1 (Figure 8), with the former suggesting evidence of gene flow while not the latter. Estimates of the mean time of migration events have been obtained for the six loci with no-null migration, even if not significant (EF1a, SAHH, COL1, LYSO, GF1B and ITS2) and are provided in Table 5. This analysis suggested the existence of old migration events, which averaged at the half of the divergence period (mtime/t ratio in Table 5) around 0.4 MYA.

Evidence of a selective sweep at the SAHH locus
SAHH is the only nuclear locus exhibiting a strong differentiation between the two species and a high d/D ATL/PAC ratio. The hypothesis that selection could have played a role in the evolution of this locus was therefore hypothesised. Although Tajima's D and HKA tests were not significant, the Fay and Wu's H test which is designed to be sensible to an excess of high frequency derived variants was significantly different from zero (H = 28.52, P = 0.04). The SAHH data were compatible with the infinite mutation model in B. azoricus and the single gene tree was reconstructed in Figure 9. The gene tree shows the partially starshaped genealogy observed at this locus in B. azoricus which is characteristic of selective sweeps [54]. We therefore used the coalescence-based maximum-likelihood method of Galtier, Depaulis, and Barton [44]. Results of the runs indicated that the model with a selective sweep was significantly better supported

Evidence for secondary introgression from interclade recombinant alleles at the GF1B locus
The GF1B locus displayed positive Tajima's D values (Table 2), a fast rate of evolution in the Atlantic species when compared to the divergence from the outgroup B. thermophilus (Table 2), and evidence of recombination between divergent alleles. Although highly reticulated, a closer look at the phylogenetic network in Figure 3 revealed two groups of alleles and intermediate sequences with small terminal branches. One expects ancestral lineages to exhibit long terminal branches and these sequences seemed better explained by recombination. We conducted a detailed analysis of recombination the results of which are synthesized in the ancestral recombination graph presented in Figure 10. Four distinct recombination events between well distinct and divergent alleles have been identified. The interesting result of this analysis was that the diversity observed at this locus allowed to infer with little ambiguity that recombination between divergent alleles occurred recently after a long period during which recombination did not happened. This finding is in agreement with the hypothesis of secondary introgression of divergent alleles after a period of divergence in allopatry.
Allele frequency spectra: searching for the footprint of interrupted gene-flow in the gene history As the GF1B locus provided compelling evidence of secondary introgression, we searched for the footprint of interrupted gene flow at other loci. The idea was that intragenic recombination can erase the divergence produced by a period of allopatry as it was partly the case for the GF1B locus. Substitutions that were fixed during allopatry would then be dispatched among chromosomes by recombination but if introgression was balanced enough and the number of substitutions was high enough, an excess of mutations in intermediate frequency should be observed. In Figure 11 is presented the distribution of the average frequency of variant nucleotidic sites (when compared with the outgroup sequences) in the two Atlantic species. Logically, the CO1 and SAHH loci showed an excess of mutations with an intermediate average frequency which are indeed the fixed differences between the two species. The GF1B locus also displayed a clear excess of mutations with an intermediate average frequency which correspond to the substitutions of the internal branch in Figure 10. For the other five genes, the main effect was an excess of rare variants which can be explained by demography or a slightly deleterious effect of polymorphic mutations. However, once the excess of rare variants was accounted for, an excess of mutations with an intermediate frequency was still detected at the locus LYSO (Fisher's exact test: p = 0.01).

Discussion
The two species of Bathymodiolus discovered along the Mid-Atlantic Ridge [55], [56] were first described from their morphologies and considered to be distinct species named B. azoricus and B. puteoserpentis [57]. O'Mullan et al. [19] and Won et al. [20], however, reported that these two species were able to    hybridise at the Broken Spur vent field (29u109N/43u109W; 3350 m depth) situated between the two species' ranges. These authors have interpreted local hybridisation between the two Atlantic species as the result of an accidental secondary contact between two divergent species [20] which is the traditional interpretation in the hybrid zone framework [58]. However, it is also well-known in the hybrid zone literature that it is extremely difficult to tell apart secondary contact zone from parapatric primary differentiation [59] and the former hypothesis was preferred because the latter was thought more unlikely to happen. Recently, a new burst of data relevant to this question lead to a reevaluation of the easiness of speciation with gene flow [5]. In the case of B. azoricus and B. puteoserpentis, the geographic context is not easily reconcilable with the opportunity for a secondary contact as the Northern Mid-Atlantic Ridge is a dead-end limited to the north by Iceland. This is why our eight-locus analysis of DNA sequence polymorphism was aimed to reconstruct the history of divergence and gene flow between these two closely-related organisms in the original environment of hydrothermal vents, in an to attempt to accumulate evidence for one of the two scenarios, secondary contact or parapatric speciation.

Inferences from the isolation with gene-flow model
As recently done for an increasing number of pairs of closely related species [2], [4], [5], the multi-locus fit to an isolation with gene flow model produced useful inferences to understand how isolation happened between B. azoricus and B. puteoserpentis. First, divergence was estimated to date back to more than half a million years. Secondly, we estimated that the effective population size of B. azoricus was 2-5 times greater than that of B. puteoserpentis and the ancestral population suggesting population expansions in B. azoricus. This finding is in agreement with census population sizes observed for both species -dense mussel beds in the case of B. azoricus [60] versus scattered individuals in the case of B. puteoserpentis [61]. Finally, evidence for gene-flow was unambiguous although mainly asymmetrical from the B. azoricus to B. puteoserpentis. The asymmetry of gene flow can have several explanations. (i) Hybrid zone theory predicts asymmetry of a genetic barrier to gene flow according to the density of populations [62]. Both population genetics inferences and observations in the field are in accordance for a denser and bigger population of B. azoricus, which would agree well with the asymmetry of the barrier to gene flow. (ii) Hybrid zone theory also predicts that moving hybrid zones are expected to leave a footprint in the form of asymmetrical introgression [58], [63]. The inference of an asymmetric gene flow could therefore also be in accordance with a northward displacement of the hybrid zone. A recent review [63] suggests that hybrid zone movements could be more frequent than previously thought. The direction of introgression observed here would be in accordance with a northward displacement of the Bathymodiolus hybrid zone. Providing a displacement of the zone was indeed the explanation of asymmetric introgression, one might hypothesised it could have been favoured by abyssal currents along the Mid-Atlantic Ridge. Indeed, the Brazil basin represents a vortex zone, from which the AABW (Antarctic Bottom Water) is subdivided into deep-sea water jets that move northward against the western flank of the MAR to reach the Florida Escarpment [64] and, enters the Mid Atlantic Ridge using the Vema Fracture Zone located near 10uN/MAR. At this point, the residual jet favours larval dispersal further north [65]. (iii) Finally, asymmetric gene flow could be a consequence of an asymmetry of hybridisation, either genome-wide or depending on the isolation genes principally responsible for the barrier to gene-flow generated on the chromosomal region a locus belongs to. Distinguishing among these various explanations however is difficult and would require experiments in the lab hardly achievable with Bathymodiolus mussels.
The history of gene-flow: what evidence for a period of no geneflow?
When we undertook the analysis of DNA sequences in Bathymodiolus we had in mind that we might be able to distinguish secondary contact from parapatric differentiation. However, it was not that simple and although evidence of gene flow was straightforward, it was unclear whether it meant genes were continuously exchanged during the whole divergence period or whether it remained compatible with a period without gene flow before secondary contact. Taking at face value, the result from the posterior distribution of the mean time of migration events could suggest migration mainly occurred around 0.4 MYA, half of the divergence period (Table 5). However, the statistical behaviour of this parameter has not been extensively studied and it is difficult to understand its true meaning. For instance, our data are enriched in rare variants (e.g. singletons) which can be explained in various ways (demography, selection, …) that are not compatible with the IM model assumptions and which could therefore result in an absence of gene flow in the recent past under the IM framework. The divergence being short relative to the coalescence time, this recent period freed from coalescence events, and therefore from migration events, in gene trees can contribute to artificially overestimate the mean time of migration events. It is unclear also how this parameter would react when the true story implies several periods of gene flow interrupted by periods of isolation. As the parameter is the mean time of all the migration events, the multimodality should not be detected. One might look more closely at the branching of easily identified introgressed alleles in observed gene trees in order to verify whether observed introgression times concur with the posterior estimates from a predefined model. Introgressed alleles are visible only when divergence is strong enough and gene flow low enough. This was the case for locus SAHH which exhibited fairly clear examples of introgression of B. azoricus alleles in the B. puteoserpentis background. We would like here to emphasise that this observation cannot be the result of the sampling of rare B. azoricus individual migrants in a B. puteoserpentis population because we sampled far away from the identified hybrid zone [19] and none of the introgressed individuals displayed a hybrid signature. We identified two alleles which are likely results of introgression at the SAHH locus, (see the two arrays in Figure 3), one (labelled 1 in Figure 3) coalesced at the tip of the network and was in accordance with recent secondary introgression while the other (labelled 2 in Figure 3) coalesced more deeply and could well indicate ancient gene flow. To a lesser extent, locus LYSO also exhibited strong divergence together with restricted gene flow and both kinds of branching were also observed ( Figure 3).
Another useful information, although less studied (but see [66]), is intragenic recombination, because recombination can only occur between alleles that are segregating together within the same populations and genomes. The reconstructed ancestral recombination graph of the GF1B locus ( Figure 10) allowed us to identify the presence of recent recombinant alleles between two divergent clades of alleles. The recent take again of recombination at the GF1B locus is a strong argument for the hypothesis of secondary introgression. However, it also suggests that secondary introgression operates since a sufficiently long period of time to allow newly formed recombinants to reach a high enough frequency to be sampled. According to Figure 10, recombination acted during roughly the 10-20% end of the global divergence period which would result in a guesstimate of the time of the secondary contact of 50-200 thousands of years (from the estimate of the divergence time in Table 4). This guesstimate is not in accordance with the mean time of migration events estimated with IMa. The true value is probably in-between. The important point is the relevant evidence for the existence of a period without gene flow that prevented recombination.
Secondary contact and subsequent introgression would therefore explain our results. Because no species can enter the Atlantic ridge further north of Iceland, the only scenario of the MAR colonisation that allows the formation of a secondary contact zone is a first colonisation by the species B. azoricus. Its geographic distribution was then challenged by B. puteoserpentis located further south. Such a scenario implies that the Mid Atlantic Ridge would have been independently colonised twice. As both vent species seem to be related to the B. boomerang complex of species (sister group) located at cold seeps from the deep-sea Caribbean region [17], geographic isolates originated from either Barbados accretion prism or the Florida escarpment were likely to colonise the rift separately and met subsequently to hybridise.

The impact of selection on the differentiation process
The effect of a possible selective sweep at the most discriminating nuclear locus SAHH has been detected. In addition, negative and significant values of the Tajima's D and Ramos-Onsins and Rozas's R 2 were found at the mitochondrial gene CO1 which remains the only locus differentially fixed between the two species. Recurrent selective sweeps are indeed likely to rejuvenate the coalescence time of the mitochondrial genome in species with big populations such as marine molluscs [67]. One can therefore argue that selection helped to accelerate the evolution toward monophyly at these two loci, explaining their status of (semi-) diagnostic markers. However, at least two other loci, LYSO and GF1B, also registered a strong divergence between two groups of alleles that are interpreted to have introgressed secondarily. Therefore, SAHH and CO1 also need to be in genomic regions influenced by isolation genes responsible for the genetic barrier to gene-flow. One can then hypothesise that the sweeps detected were the consequence of positive selection at speciation genes linked to SAHH and CO1 [68]. Some other ecological elements like the symbiont composition of the Bathymodiolus gills can also act as selective constraints. Both species indeed yield two kinds of symbionts, one being a sulfo-oxidizing bacteria and the other a methanogen coccus. Both symbionts coexist in the mussel gills [69], [70] but can be easily lost depending on the fluid composition [71] or selectively acquired by each species in the environment [72].
Another possible effect of selection we already discussed in a previous paper is slight purifying selection on polymorphic mutations. We suspected such a deleterious effect of polymorphic mutations at the EF1a locus in Bathymodiolus [26]. Purifying selection at some locus can inflate the contrast observed between loci with some having registered well the period of allopatry with multiple fixations (e.g. CO1, SAHH, LYSO) while other loci having evolved too slowly and exhibiting an excess of rare, slightly deleterious, variants (e.g. EF1a, ITS2; see Figure 11). However, despite low evolutionary rates, the species structure was still observable and alleles segregate into two distinct groups. For some other loci (COL1, GF1A), the situation seemed more complex. Other confounding effects can be a high rate of intragenic recombination that artificially makes younger the divergence [73] such as observed at GF1A, or the complete replacement of one lineage by introgression. Such processes were reported in Drosophila to explain complex genealogies between D. permilis, D. pseudoobscura and D. p. bogotana. [74]; [75].

Conclusion
The two closely-related species of Bathymodiolus from the Mid-Atlantic Ridge were known to hybridise locally within the narrow hybrid zone of Broken Spur [19], [20]. The analysis of populations located at both extremities of the mussel range by using a multilocus approach indicated that the divergence started more than half a million years ago and that gene flow occurs or has occurred, although mainly asymmetrically from B. azoricus to B. puteoserpentis. Finally, we obtained some evidences that gene-flow was due to secondary introgression after a period of allopatry. The main argument is the recent restart of intragenic recombination after a period where it proved to be blocked at the GF1B locus.