Combined Mitochondrial and Nuclear Markers Revealed a Deep Vicariant History for Leopoldamys neilli, a Cave-Dwelling Rodent of Thailand

Background Historical biogeography and evolutionary processes of cave taxa have been widely studied in temperate regions. However, Southeast Asian cave ecosystems remain largely unexplored despite their high scientific interest. Here we studied the phylogeography of Leopoldamys neilli, a cave-dwelling murine rodent living in limestone karsts of Thailand, and compared the molecular signature of mitochondrial and nuclear markers. Methodology/Principal Findings We used a large sampling (n = 225) from 28 localities in Thailand and a combination of mitochondrial and nuclear markers with various evolutionary rates (two intronic regions and 12 microsatellites). The evolutionary history of L. neilli and the relative role of vicariance and dispersal were investigated using ancestral range reconstruction analysis and Approximate Bayesian computation (ABC). Both mitochondrial and nuclear markers support a large-scale population structure of four main groups (west, centre, north and northeast) and a strong finer structure within each of these groups. A deep genealogical divergence among geographically close lineages is observed and denotes a high population fragmentation. Our findings suggest that the current phylogeographic pattern of this species results from the fragmentation of a widespread ancestral population and that vicariance has played a significant role in the evolutionary history of L. neilli. These deep vicariant events that occurred during Plio-Pleistocene are related to the formation of the Central Plain of Thailand. Consequently, the western, central, northern and northeastern groups of populations were historically isolated and should be considered as four distinct Evolutionarily Significant Units (ESUs). Conclusions/Significance Our study confirms the benefit of using several independent genetic markers to obtain a comprehensive and reliable picture of L. neilli evolutionary history at different levels of resolution. The complex genetic structure of Leopoldamys neilli is supported by congruent mitochondrial and nuclear markers and has been influenced by the geological history of Thailand during Plio-Pleistocene.


Introduction
Subterranean habitats and their singular fauna have fascinated biologists and biogeographers for centuries. Historically, there has been a long debate over the hypotheses regarding the evolution of subterranean animals and particularly the relative role of vicariance and dispersal and the influence of potential isolating barriers to explain biogeographic patterns of these taxa [1,2]. According to the vicariant hypothesis, the ancient range expansion of a large ancestral population is followed by its fragmentation due to the appearance of isolating barrier [3][4][5]. In contrast, according to the dispersalist hypothesis, active colonization of new habitats from a small centre of origin happens across pre-existing barriers [2,5]. It is now suggested that a combination of both vicariance and dispersal better explains the biogeography of several subterranean taxa rather than a single model (see [1,2,6] for a review).
The modes of speciation of subterranean animals have also been debated and several models proposed: the Climatic Relict Hypothesis (CRH) [7] and the Adaptive Shift Hypothesis (ASH) [8,9]. The CRH was suggested for temperate ecosystems where unfavourable environmental conditions during Pleistocene climatic fluctuations divided an ancestral surface population into subterranean populations which retreated beneath the surface, and surface populations which may have become extinct or have migrated. To explain the evolution of subterranean fauna in the tropics, an alternative model, the ASH, was proposed. This model assumes an active colonization of subsurface habitats to exploit new resources coupled with an adaptive differentiation of surface and subterranean populations and possible gene flow between them for some time before genetic isolation and parapatric speciation occurred. Support for both models has been obtained for diverse species in different regions of the world using phylogenetic analysis, molecular dating and analysis of the current geographical distributions of taxa [10][11][12].
Due to their discontinuous distribution and the peculiarity of their fauna, limestone karsts constitute an interesting model to explore these evolutionary processes. Limestone karsts are sedimentary rock outcrops consisting of calcium carbonate sculpted by solutional erosion into the typical landscape made of cone-shaped or sheer-sided hills riddled with caves and sinkholes [13]. Karst biodiversity is characterized by high levels of endemic species adapted to this extreme environment and include a wide variety of organisms among which some species are permanent subterranean resident in caves while others are associated with karst surface [14]. Most of these species are rare and confined to small geographic areas [13,14]. Studies of karst and cave organisms available in the literature have mainly focused on temperate ecosystems [10,13] while karst biodiversity remain largely unexplored in Southeast Asia, even though this region encompassing four biodiversity hotspots [15] has some of the largest limestone karst regions of the world which cover an area of 460000 km 2 [16].
In order to improve our knowledge of karst biodiversity in southeast asian limestone karsts and to better understand the biogeographic processes operating in these habitats, we studied the phylogeography and evolutionary history of Neill's rat Leopoldamys neilli (Marshall, 1976), a murine rodent species endemic to limestone karsts of Thailand, within the Indo-Burmese biodiversity hotspot. The species was discovered in karsts of the Saraburi province, central Thailand and has also been recorded in few locations in northern and western Thailand ( Figure S1) but the limits of its geographical range are not clearly resolved [17][18][19][20]. Following the classification of Sket [21], L. neilli is a subtroglophilic species, i.e., a resident of subterranean habitat associated with karst surface to complete some parts of its life cycle [22]. This species was listed as ''Endangered'' on the IUCN Red List but is now classified as ''Data deficient'' due to the lack of data about its distribution and ecological requirements [23]. Two additional Leopoldamys species, L. edwardsi and L. sabanus, both semi-arboreal rodents living in forest ecosystems, also occur in Thailand. A recent molecular phylogeny suggested that L. neilli and L. edwardsi are sister species [24].
A preliminary study of the phylogeography of L. neilli based on one nuclear and two mitochondrial fragments revealed a strong geographic structure of the mitochondrial DNA (mtDNA) genetic diversity for this species [25]: six highly divergent and allopatric genetic lineages that were isolated for a long time were observed in Thailand. However, the nuclear fragment used in this study (the intron 7 of the b-fibrinogen gene) showed a very weak genetic variability within the study area and presented no phylogeographic signal. Therefore conclusions of this previous study were mainly derived from mtDNA. Due to its numerous advantages such as high rate of evolution, lack of recombination and haploidy, mtDNA has been widely used as a classical phylogeographic marker. However, due to its maternal inheritance and risks of introgression, mtDNA also presents some inconveniences and could yield to biased phylogenies that are not representative of the actual species tree (e.g. [26][27][28]). Therefore, to avoid this trouble and to refine the rather conflicting results obtained from a single nuclear fragment and two mitochondrial markers in Latinne et al. [25], the combined analysis of multiple independent loci with various evolutionary rates was required to corroborate or reject the hypotheses previously established in this last study. Moreover, the sample size of this previous study was excessively small for some regions of northern Thailand and needed to be expanded with more samples and more localities to yield reliable conclusions.
Using a much larger sampling and various nuclear markers, the present study was conducted to address the following questions: (i) How are L. neilli populations genetically structured throughout Thailand according to mitochondrial and nuclear markers? (ii) Is the phylogeograhic structure of mtDNA representative of the evolutionary history of L. neilli? Are molecular signatures of mitochondrial and nuclear markers congruent? (iii) When did the divergence between L. neilli and L. edwardsi occur and what can we infer about the speciation process of L. neilli? (iv) Where were L. neilli ancestors distributed? What are the relative roles of vicariance, dispersal and potential isolating barriers in the evolutionary history of L. neilli? (v) According to its distribution and genetic structure, which conservation status should be given to this species?

Ethics statement
Sample collection was carried out in collaboration with researchers located at the Thailand Institute of Scientific and Technological Research (TISTR) and directors of Wildlife Research Stations (under the administration of the Department of National Park, Wildlife and Plant Conservation in Thailand) having the required permits. This study is part of the ''CERoPath project'' (Community Ecology of Rodents and their Pathogens in South-East Asia: effects of biodiversity changes and implications in health ecology), ANR Biodiversity ANR07 BDIV012. CERoPath protocols (available at http://www.ceropath.org/references/ rodent_protocols_book) were evaluated by the ethics committee of Mahidol University in Bangkok.

Sampling and DNA extraction
A total of 225 samples of L. neilli collected in 28 localities ( Figure 1, Table S1) during a two-year survey of 122 limestone karsts distributed in all karstic regions of Thailand were used in this study ( Figure S1). These 225 samples include 115 specimens from a previous study [25] and 110 new individuals, among which 50 come from eight localities (CHAI1, CHAI2, CHR, KK1, KK2, PET, SARA5, UT) unsampled in Latinne et al. [25]. Field identifications were made based on morphological criteria according to [17,29]. Animals were live-trapped and released after taking a tissue biopsy from the ear. Skin samples were stored in 96% ethanol. Genomic DNA was extracted from skin samples using the DNeasy Tissue Kit (Qiagen Inc.) following the manufacturer's protocol.

Mitochondrial and nuclear DNA amplification
Two mitochondrial markers, the cytochrome b gene (cytb) and the cytochrome oxidase I gene (COI), were used in this study in addition to two nuclear fragments; the intron 7 region of the bfibrinogen gene (bfibr) and the intron 1 region of the X-linked glucose-6-phosphate dehydrogenase gene (G6pd). Sequences of cytb (n = 115), COI (n = 115) and bfibr (n = 65) were available from a previous study [25] and were associated to newly amplified sequences. The final dataset included 225 sequences of mitochondrial genes (cytb, COI) and 104 sequences of nuclear introns (bfibr, G6PD) from a subset of 104 individuals representative of the main sampling localities. Primers sets used to amplify the cytb, COI, bfibr and G6pd genes and their annealing temperatures are listed in Table S3. PCRs were carried out in 50 ml volume containing 12.5 ml of each 2 mM primers, 1 ml of 10 mM dNTP, 10 ml of 56 reaction buffer (Promega), 0.2 ml of 5 U/ml Promega Taq DNA Polymerase and approximately 30 ng of DNA extract. Amplifications were performed in thermal cycler VWR Unocycler using one activation step (94uC for 4 min) followed by 40 cycles (94uC for 30 sec, 50-58.7uC (depending on the primers, Table S3) for 60 sec, and finally 72uC for 90 sec) with a final extension step at 72uC for 10 min. Sequencing reactions were performed by Macrogen Inc. (Seoul, Korea) using sequencing on an ABI 3730 automatic sequencer.
For the nuclear genes, heterozygous states were identified as strong double peaks of similar height in both forward and reverse strands, or when the particular base corresponding to the dominant peak alternated on the two chromatograms [30].

Microsatellite genotyping
The 225 samples used in this study were genotyped at twelve variable microsatellite loci using the multiplex sets and PCR conditions reported in [31]. Amplified DNA was analyzed for length variations on an ABI 3700 sequencer using GENEMAP-PER 4.0 (Applied Biosystems).

Mitochondrial and nuclear DNA analysis
Alignment, phasing and recombination. Sequences were aligned in BIOEDIT 7.0.9.0 [32] using the ClustalW algorithm. Haplotypes were identified using ARLEQUIN 3.11 [33]. For nuclear sequences, two alignments were created: first, heterozygous sites were coded using the IUPAC ambiguity codes (unphased dataset); and second, gametic phases of nuclear sequences were inferred using the Bayesian algorithm implemented in PHASE 2.1 [34] (phased dataset). Two runs were conducted for 1610 4 iterations with the default values.
Recombination in our nuclear genes was tested using the PHI test implemented in SPLITSTREE 4.11.3 [35] and the tree-based SBP and GARD methods [36] implemented online via the Datamonkey webserver [37].
Phylogenetic analysis. Phylogenetic reconstructions were performed using the maximum likelihood (ML) and Bayesian inference (BI) approaches. Analyses were first run independently on mitochondrial and nuclear loci and then on combined datasets (cytb/COI, bfibr/G6pd (both phased and unphased datasets) and cytb/COI/bfibr/G6pd). The most suitable model of DNA substitution for each locus and dataset was determined using MODELTEST 3.0 [38] according to the Akaike Information Criterion (AIC). PhyML 3.0 [39] was used to perform ML analyses with default parameters as starting values. Robustness of the tree was assessed by 1000 bootstrap replicates. Bayesian analyses were performed with MRBAYES 3.1.1. [40]. Metropoliscoupled Markov chain Monte Carlo (MCMC) sampling was performed with 5 chains run for 3610 6 generations with one tree sampled every 1000 generations. All trees obtained before the Markov chain reached stationary distribution (empirically determined by checking of likelihood values) were discarded as burn-in values. A 50% Majority-rule consensus tree was generated in PAUP 4.0b10 [41]. Median joining networks were performed with NETWORK 4.5.1.6 [42] to explore relationships among haplotypes for the mitochondrial (cytb/COI) and nuclear (bfibr/G6pd) datasets.
Genetic diversity and population differentiation. Haplotype (h) and nucleotide (p) diversities of the main lineages identified by phylogenetic analysis were estimated for the four loci independently using ARLEQUIN. Tables of nuclear allele frequency were computed with GENE-POP 4.0.11 [43] and frequency differences (genic differentiation) were tested for each nuclear locus and across all nuclear loci for all pairs of lineages containing more than three individuals with GENEPOP.
The net genetic distance between lineages was computed in MEGA 4.1 [44] under the Kimura two parameters (K2P model) for the cytb dataset. Genetic differentiation among lineages containing more than three individuals was quantified by computing pairwise differentiation W ST for the mitochondrial dataset and F ST for the nuclear dataset using ARLEQUIN.
A Mantel's test performed in ARLEQUIN on mitochondrial and nuclear datasets independently was used to assess the hypothesis of isolation by distance (IBD) among the four groups of populations by comparing pairwise geographic distance (log transformed) with pairwise (F ST /[12F ST ]).
Divergence time estimates. Divergence times of L. neilli and L. edwardsi, and of the main lineages of L. neilli (approximation of the time to the most recent common ancestor-TMRCA, [45]), were estimated using Bayesian inference, as implemented in the program BEAST 1.6.1 [46] on the cytb dataset. Three fossil-based calibration points were used: (i) the split between the tribe Phloemyini and the other tribes of Murinae at 12.3 Myr (we considered Progonomys [47] as the MRCA of extant murines and assigned the oldest record of Progonomys at 12.3 Myr to the split between the tribe Phloemyini and the other tribes of Murinae rather than the Mus/Rattus split [48,49]) (ii) the divergence between Apodemus mystacinus and all the species of the subgenus sylvaemus (A. flavicollis and A. sylvaticus) at 7 Myr [50,51]; and (iii) the Otomyini/Arvicanthi split at 5.7 Myr [52]. Sequences of Batomys granti (AY324458) (Phloemyini) and Phloemys cumingi (DQ191484) (Phloemyini), Apodemus mystacinus (AF159394), Apodemus sylvaticus (AB033695), Apodemus flavicollis (AB032853), Otomys angoniensis (AM408343) (Otomyini) and Arvicanthis niloticus (AF004569) (Arvicanthi) were added to our dataset to calibrate the tree and constrain the age of some nodes. Analyses were performed under the TN93+G substitution model (previously estimated by MODELTEST), a strict molecular clock and a constant size coalescent population model. These priors were selected because they better fit the data than any other molecular clock and population models according to the Bayes factors calculated to compare the models [53]. However, all tested models (strict or relaxed molecular clock combined to Bayesian Skyline coalescent population model, exponential growth coalescent population model or Yule process speciation model) produced similar divergence time estimates. Three independent runs with MCMC chain length of 1.5610 8 were performed, sampling every 1610 4 generations. Convergence of the chains to the stationary distribution was checked using TRACER 1.5 [54]. All BEAST computations were performed on the computational resource Bioportal at the University of Oslo (http://www.bioportal.uio.no).

Microsatellites analysis
Genetic diversity. Genetic diversity was assessed by calculating expected (He) and observed (Ho) heterozygosities with ARLEQUIN and confirmation of Hardy-Weinberg equilibrium (HWE) was tested using GENEPOP for each locus separately and over all loci for each sampling site. The allelic richness (AR) was calculated using the rarefaction procedure implemented in FSTAT 2.9.3.2 [55]. Multi-locus Fis was calculated for each population and adjusted for multiple tests using a Bonferroni's correction with FSTAT. The proportion of null alleles (NA) at each locus and for each population was estimated with FREENA [56] and genotypes were corrected using MICRO-CHECKER 2.2.3 [57]. Tests for linkage disequilibrium between loci for each sampling site were performed with GENEPOP.
Population structure. STRUCTURE 2.3.1 [58] was used to infer the number of populations (K) and assign individuals to genetic clusters independently of spatial sampling. Ten iterations were run for each value of K from 1 to 28 using an admixture model with a burn-in of 1610 5 and MCMC values of 1610 6 . We used CLUMPP 1.1 [59] to average the results of multiple iterations for a given K. GENELAND 3.3.0 [60] was used to perform a spatial genetic analysis by integrating geographic and genetic information and to determine the most probable K. Ten runs of 1610 5 MCMC iterations were performed for K from 1 to 28 with a spatial coordinates uncertainty of 1 km. The posterior probability of population membership for each individual was determined using a burn-in of 3610 4 iterations. A visual output of the STRUCTURE and GENELAND results was generated using DISTRUCT [61].
Genetic differentiation among clusters containing more than three individuals was quantified by computing pairwise F ST using ARLEQUIN.
IBD among the four groups of populations and within groups including at least four clusters (northeast and west) was tested by comparing pairwise geographic distance (log transformed) with pairwise (F ST /[12F ST ]) and (R ST /[12R ST ]) using ARLEQUIN and SPAGeDI 1.3 [62], respectively.

Biogeographic analysis
Spatial Analysis of Vicariance. We used VIP (Vicariance Inference Program) [63] to localize the main isolating barriers within L. neilli distribution range. VIP uses a phylogenetic tree (ML tree of the combined dataset) and direct geographic information data to search for the disjunctions/barriers among sister groups.
Ancestral range reconstruction analyses. To infer ancestral areas of L. neilli populations and investigate the relative role of vicariance and dispersal in the evolutionary history of this species, we used a dispersal-extinction-cladogenesis (DEC) model of range evolution implemented in LAGRANGE [64]. The DEC model describes ancestor-descendant transitions between geographic ranges by processes of dispersal (range expansion), local extinction (range contraction), and cladogenesis (range subdivision/inheritance) and estimates likelihoods of ancestral states (range inheritance scenarios) [65]. This analysis was carried out on the chronogram inferred with BEAST limited to the ten lineages with four geographic areas (west, centre, north, northeast). We first performed an analysis without constraints to infer ancestral areas of each node of the tree. Then, to optimize the results obtained for the ancestral node of L. neilli, we constrained successively the area of this node to each combination of one, two, three or four areas (except for the combination centre+north because these two areas are not adjacent) and compared the global likelihood scores obtained for each area or combination.
ABC computations. Approximate Bayesian computation implemented in the software DIYABC 1.0.4.45beta [66] was used to infer the evolutionary history of L. neilli combining our mitochondrial and microsatellite datasets. Several biogeographic scenarios were compared to test whether the current groups of populations originated from the fragmentation of a common ancestral population or from one of the current groups and whether admixture events occurred among populations. According to the phylogenetic trees and Bayesian clustering analysis, four groups of populations were used (west, centre, north, northeast) and the origin of each group was tested independently in a fourstep analysis using 40 scenarios (Table 1). For each group, a set of ten scenarios was designed to test its origin (see Figure 2 for a schematic representation of these ten scenarios for the NE group, similar scenarios were used for the three other groups): either from a hypothetical ancestral population (AP) (scenario NE.1), from one of the current groups of populations (scenarios NE.2 to NE.4), or a mix of two populations (admixture, scenarios NE.5 to NE.10). Range and distribution of priors for parameters used to describe these scenarios (effective population size, time of splitting or merging events, and rates of admixture in the case of merging events) are found in Table 2. For the microsatellite dataset, all onesample summary statistics available were used in addition to F ST and classification index as two-sample statistics to compare observed and simulated datasets. For the mitochondrial dataset, we used all one-sample statistics and the number of segregating sites, the mean of pairwise differences (W) and F ST as two-sample summary statistics.
For each scenario, 5610 5 datasets were simulated to build a reference table. To check if the combination of scenarios and prior distributions of their parameters were able to generate datasets similar to the observed one, a principal component analysis (PCA) was performed on the first 10000 simulated datasets of the reference table in the space of summary statistics. To determine the most likely scenario, the normalized Euclidean distances between each simulated dataset of the reference table and the observed dataset was then computed and 1% of the closest simulated datasets were used to estimate the relative posterior probability (with 95% confidence intervals) of each scenario with a logistic regression [67]. For each set of scenarios, the most likely scenario was the one with the highest posterior probability value and non-overlapping 95% confidence intervals. Finally, the posterior density distributions of the effective population size of each group were estimated from 1% of the closest datasets simulated according to the most likely scenario obtained for each step.

Mitochondrial and nuclear DNA analysis
Sequence variation. Fragments of 900 and 713 bp were obtained for cytb and COI genes, respectively. No stop codons, indels or heterozygous sequences were observed in these mitochondrial coding genes, supporting the mitochondrial origin of the sequences amplified. A total of 37 cytb haplotypes and 25 COI haplotypes were identified among our datasets. The cytb gene contained 158 (231 in the dataset including outgroup sequences) variable sites, whereas the COI gene contained 85 (138) variables sites.
For the bfibr gene, fragments of 740 bp were obtained and no indels were observed in the alignment. The G6pd sequences obtained varied in length from 678 to 690 bp and the final alignment included three indels of 1, 2 and 10 bp. A total of 31 bfibr alleles and eight G6pd alleles were identified among our datasets. The bfibr gene contained 27 (49) variables sites, whereas the G6pd gene contained 17 (29) variables sites.
Recombination was detected via a PHI test (p = 0.004) and GARD/SBP tests at position 386 for the bfibr gene. G6pd sequences did not present evidence of recombination, according to the PHI test (p = 1.0) and GARD/SBP tests. Therefore, we performed all phylogenetic trees including the bfibr gene with alignment containing only positions 1-385 of the bfibr gene.
Phylogenetic analyses. Mitochondrial loci were first analyzed separately; they yielded low-supported but congruent phylogenies (data not shown). They were then combined in a single matrix. The mitochondrial phylogenetic tree ( Figure 3) clearly indicated that the sister species of L. neilli is represented by  Schematic representations of these scenarios are given in Figure 2. The best scenario for each step is emboldened. Due to the low variability of our nuclear genes they were combined in a single matrix. For both phased and unphased datasets, the nuclear phylogenetic tree (Figure 4) supported the monophyly of the three Leopoldamys species; however, intraspecific relationships within L. neilli were less clear than for mitochondrial dataset. Only one lineage is well-supported (BS = 81%-BP = 1.0) and includes all samples from western Thailand (UT, KAN1-2-3-4-5-6-7-8).
The ML and Bayesian trees combining nuclear and mitochondrial datasets ( Figure 5) recovered exactly the same topology as the mitochondrial tree with better support for some nodes.

Genetic diversity and population
differentiation. Haplotype diversities within lineages are quite low (Table 3), ranging from 0.0 and 0.775 for mitochondrial markers and from 0.0 to 1.0 for nuclear markers. Nucleotide diversities (expressed as percentages) varied from 0.0 to 0.33 and from 0.0 to 0.73 for mitochondrial and nuclear markers respectively ( Table 3).
The table of nuclear allele frequency showed a high genetic heterogeneity among lineages (Table S4). Exact tests of genic differentiation computed across all pairs of lineages containing more than three individuals were highly significant at each locus (except for the pair Loei/Khon Kaen-Chaiyaphum (p = 0.10) for the bfibr gene and for the pair Uthai Thani-Kanchanaburi (p = 0.12) for the G6pd gene), as well as over all nuclear loci.
Moreover, the Phrae and Nan lineages shared a single G6pd allele and the Centre1 and Centre2 lineages also.
The analysis of pairwise W ST and F ST values confirmed a high population differentiation (Table 4) (Table S5). Three levels of genetic distance are observed among our dataset: around 7.8% of K2P distance (7.4-8.8%) between western lineages and all the other lineages, around 3.4% (2.8-4.3%) between central and northern/northeastern lineages and finally around 1.5% (0.7-2.3%) between northern and northeastern lineages.
The Mantel's test was nonsignificant for both mitochondrial (r 2 = 0.09; p = 0.66) and nuclear datasets (r 2 = 0.56; p = 0.17). This result is consistent with no relationship between geographic distance and genetic distance.
Divergence time estimates. Divergence time estimates are given in Table 5 and Figure 7. The TMRCA of L. neilli and L. edwardsi was estimated to occur at around 3.8 Myr. The TMRCA of L. neilli was dated around 2.4 Myr. The TMRCA of (centre+north+northeast) was estimated around 1.3 Myr while the TMRCA of (north+northeast) was dated around 0.9 Myr.

Microsatellites analysis
Genetic diversity. Analysis of microsatellite genetic diversity showed that observed heterozygosity ranged from 0.43 to 0.93, and allelic richness (based on one diploid individual) ranged from 1.41 to 1.79 (Table 6). Tests for HWE showed deviation from the expected frequencies for three populations: KAN2, KAN4 and KAN5. In the KAN2 and KAN5 populations, the homozygote excess could be due to NA at single loci (32% at locus mLn05 in Kan2 and 29% at locus mLn06 in Kan5) but NA were not detected in KAN4 population. None of the inbreeding coefficient (Fis) was significant at population level. Only one pair of loci   Evolutionary History of Leopoldamys neilli (mLn04 and mLn12) showed significant linkage disequilibrium in two populations (NAN, KK2) after Bonferroni's correction. Population structure. The STRUCTURE output was interpreted using the DK method described by Evanno et al. [68]. The highest DK was found at K = 11 with a second peak at K = 4 ( Figure S2). At K = 4, the four clusters correspond broadly to the four sub-groups observed in phylogenetic trees (west, centre, north, northeast) with the exception of UT and CHR samples included in the north and northeast clusters respectively (Figure 8). At K = 11, three of these clusters (west, north, northeast) are subdivided into ten clusters and inconsistencies with phylogenetic trees were still observed for UT and CHR samples which clustered with PHR and LO1+CHAI2 samples respectively (Figure 8). Few individuals were identified as possessing apparent admixed genotypes between central and northeastern clusters (in LOP1 and NKR2 populations) and between western and northern clusters (in KAN2 population).
Nine out of ten GENELAND runs gave a K of 13 clusters; while the other one gave a K of 12 clusters. As the mean posterior densities of the K = 13 runs were the highest, we selected K = 13 as the most probable number of clusters. These 13 ''geneland clusters'' were slightly different from the 11 ''structure clusters''. Most of them were congruent with the ten lineages identified in the phylogenetic trees and revealed a finer population structure within Kanchanaburi and Loei/Khon Kaen lineages ( Figure 8).
All pairwise F ST values among ''geneland clusters'' were significantly higher than zero, indicating a strong genetic differentiation (Table S6).

Biogeographic analysis
Spatial Analysis of Vicariance. VIP identified three main barriers within the studied area (Figure 7). The oldest barrier, isolating the western populations from the others, coincides exactly with the Central Plain of Thailand.
Ancestral range reconstruction analyses. LAGRANGE estimated the likelihoods of all possible ancestral range reconstructions for each node of the L. neilli tree. The most likely area of L. neilli ancestral population encompasses the four regions (west+centre+north+northeast) (relative probability = 1) (Figure 7). The likelihood scores of other areas or combinations of areas for the ancestral range of L. neilli are significantly different (more than 3-log likelihood units between the most likely area and the remaining likelihood scores). Several vicariant events progressively fragmented this large population and isolated the four regions. The ancestral range of the node containing the central, northern and northeastern lineages includes the three regions (centre+-north+northeast) (relative probability = 1). Finally, the ancestral Evolutionary History of Leopoldamys neilli range of the node containing the northern and northeastern lineages encompasses these two regions (north+northeast) (relative probability = 1).
ABC computations. The PCA performed on the first 10000 simulated datasets of the reference table in the space of summary statistics revealed that the observed dataset is clearly surrounded by simulated datasets, indicating that our model was able to produce datasets similar to the observed one (data not shown).
The logistic regressions performed on 1% of the closest simulated datasets allow distinguishing unambiguously the most likely of all tested scenarios at each step and revealed the origin of the four groups of populations. The most likely scenarios (highest posterior probability) of each step clearly indicated that all of the four groups originate from a common ancestral population and not from one of the current populations and that admixture events did not occur (Table 1 and Figure S3). These results corroborated the hypothesis of the fragmentation of a widespread ancestral population (vicariant hypothesis). The posterior parameter estimation indicated that the effective population size of this ancestral population was several times higher than those of the current groups of populations. Moreover we consistently observed that the western group has the largest effective population size, followed by the northeastern group, the central group and finally the northern group has the smallest effective population size (N W .N NE .N C .N N ) (Table S7).

Mitochondrial vs. nuclear phylogeographic structures of L. neilli
Three kind of molecular markers with various evolutionary rates and different modes of inheritance were used in this study and depicted an exhaustive picture of the phylogeographic structure of L. neilli.
Due to their slow evolutionary rate and their higher coalescent time, the use of nuclear DNA sequences in phylogeographic studies has been limited. However the present study confirms the interest to combine several nuclear introns to gain more resolution. Indeed the intron 7 of the bfibr gene used alone in Latinne et al. [25] did not show any phylogeographic signal but the phylogenetic tree combining this intron with the intron 1 of the G6pd gene ( Figure 4) clearly confirmed the separation of the western populations from all the others. Furthermore, the nuclear haplotype network ( Figure 6B) supported the high divergence among western, central and northern/northeastern populations. These results indicate the ability of nuclear introns to detect old The mtDNA markers corroborated the results obtained with nuclear introns and supported a large-scale population structure of four main groups (west, centre, north, northeast). Moreover they also detected more recent isolations among populations and indicated a strong finer structure within each of these groups: ten well-supported and geographically well-structured lineages of L. neilli are observed within Thailand (Figures 3, 6A).
Finally the microsatellite markers also detected the two levels of population structure using the DK method (Figures 8, S2). Microsatellites and mtDNA gave broadly congruent results and due to their high levels of polymorphism, microsatellites revealed a micro-fragmentation of the genetic diversity within western and northeastern regions that was not detected by mtDNA. Only two minor discrepancies were observed between these markers. The microsatellite-based clustering of samples from Uthai Thani (UT) with those from Phrae (PHR) (northern group) in the STRUC-TURE analysis conflicts with the mitochondrial and nuclear phylogenetic analyses indicating a strong link between samples from Uthai Thani and Kanchanaburi (western group). The fact that mitochondrial and nuclear genes sequences gave a similar result points towards an accuracy problem of microsatellite markers (e.g. size homoplasy) rather than a real cytonuclear conflict. Moreover, the GENELAND clustering analysis that integrates the spatial distribution of samples did not recover the UT+PHR cluster (Figure 8). An additional discordance was observed for samples from Chiang Rai (CHR), which clustered with northeastern samples (LO1+CHAI2) in STRUCTURE analysis (Figure 8), but were nested within the northern group in phylogenetic trees (Figures 3, 5). The table of nuclear allele frequency (Table S4) also indicates a link between CHR and northeastern lineages. However, due to the low sample size for this locality, it is not possible to draw a certain conclusion on this possible cytonuclear conflict.
The admixed genotypes between central and northeastern clusters identified in LOP1 and NKR2 populations and between western and northern clusters in KAN2 (Figure 8) most probably reflect the co-ancestry shared by samples of these clusters. Few bfibr alleles are also shared among some of the four groups of populations (Table S4) and they also probably represent polymorphism maintained from the common ancestral population.
In conclusion the present study confirms the benefit of using several independent and complementary genetic markers to obtain a comprehensive and reliable picture of L. neilli evolutionary history at several levels of resolution.
The congruent patterns of mitochondrial and nuclear markers indicated a strong population fragmentation for L. neilli. The absence of geographical overlap in the mtDNA haplotype distribution among lineages (except for Centre1 and Centre2 lineages) coupled with the high F ST values and genetic distances among them revealed a high genetic divergence and low gene Evolutionary History of Leopoldamys neilli flows among lineages. This high degree of genetic differentiation is confirmed by the old divergence times estimated among lineages as the TMRCA of L. neilli was approximated around 2.4 Myr. Such deep genealogical divergence among geographically close lineages is quite surprising and denotes a high population fragmentation related to the patchy distribution of limestone karsts and a close association of L. neilli with this habitat. Similar phylogeographic patterns are rare among rodents but have been observed for a few other rodent species strongly dependent on rocky and karstic habitats in various regions of the world, such as the Laotian rock rat Laonastes aenigmamus in Central Lao PDR [69], the Martino's vole Dinaromys bogdanovi in the western Balkans [70,71] and the Allegheny woodrat Neotoma magister in northeastern America [72]. Our study suggests that the spatial isolation of karsts prevents extensive migration among lineages of L. neilli. These results confirm the endemicity of L. neilli to limestone karsts previously described in the literature [17,19,25].

Evolutionary history of L. neilli
The divergence between L. neilli and L. edwardsi was estimated to occur during Pliocene, around 3.8 Myr. Several splits subsequently occurred within L. neilli population throughout Quaternary. Due to the potential inaccuracy of temporal estimates based on molecular data across different evolutionary scales [73,74], these estimates should be considered as a general approximation rather than as exact dating. These dates are consistent with fossil evidence attesting the widespread presence of the genus Leopoldamys in Indochina during Plio-Pleistocene. The oldest occurrence of Leopoldamys (L. minutus n. sp., an extinct species) has been reported in cave deposits from Late Pliocene to Early Pleistocene in the Kanchanaburi and Saraburi provinces, Thailand [75]. Fossils of L. edwardsioides, a predecessor of L. edwardsi, from Early Pleistocene have also been identified in southern China [76].
The evolutionary context of the divergence between L. neilli and L. edwardsi is a matter of debate. Two main alternative models of speciation of cave animals have been proposed in the literature: the Climatic Relict Hypothesis (CRH) [7], an allopatric speciation Evolutionary History of Leopoldamys neilli model; and the Adaptive Shift Hypothesis (ASH) [8,9], a divergence with gene flow speciation model (see Introduction for more details). Even if these two modes of speciation have been proposed for troglobionts, i.e., permanent residents of subterra-nean habitat without link with the surface [21], it could be interesting to attempt to use it to explain the speciation of a mammal subtroglophilic species. The old divergence between L. neilli and L. edwardsi that occurred a long time before Pleistocene environmental instability and the present parapatric/partially sympatric distribution of these two sister species in northern and northeastern Thailand [77] could support the ASH. According to this model, populations of a forest Leopoldamys ancestor could have invaded cave ecosystems to exploit the novel habitat or food resources and this ecological shift between the two populations could have been the driving force for genetic divergence and speciation [78]. Due to the considerable lack of ecological data on L. neilli, we can only propose some hypotheses on this point but we suggest that this ecological shift could affect the reproductive behaviour of these rodents and we suspect L. neilli to use caves for breeding and nursing as young individuals and babies have been observed within caves during our survey. The ecological specialization of L. neilli to limestone karsts could also be associated to the particular xerophytic flora of karsts or to specific food resources available in caves such as insects.
However, an allopatric speciation of L. neilli and L. edwardsi followed by a more recent sympatry cannot be excluded. The two species could have evolved independently in different regions of Southeast Asia during Pliocene and subsequently dispersed throughout Indochina to establish the current parapatry in northern Thailand. Additional data about the past (and present) distribution of L. neilli, L. edwardsi and their ancestor are needed to  infer the speciation process leading to the origin of a cave-dwelling rodent.
The relative role of vicariance (the fragmentation of populations due to isolating barriers) and dispersal (the colonization of new habitats across potential pre-existing barriers) to explain biogeographic patterns of cave and karstic endemic taxa have also been widely discussed during decades and the debate is still ongoing [1,2]. The first step to differentiate these two hypotheses is to identify potential dispersal barriers. Three main barriers associated with the Central Plain of Thailand have been detected within the studied area by VIP (Figure 7). The Central Plain of Thailand, a rift valley 500 km long and up to 200 km wide [79,80], is the result of the tectonic deformation of this region due to the collision of the Indian and Eurasian continental plates throughout the Cenozoic Era. As a consequence, Quaternary sediments have been accumulated on the valley floor and progressively buried the basement rocks of the Central Plain [79,81]. These deposits, with thickness of about 2000 m, represent a complex sequence of alluvial, fluvial, and deltaic sediments and were overlaid in the Lower Central Plain with thick marine clay during Holocene sea transgression [82].
The role of these barriers in the evolutionary history of L. neilli has been tested using ancestral range reconstruction analysis and ABC methods and they clearly supported the vicariant hypothesis. All four of the current groups of L. neilli originate from a common ancestral population widely distributed within Thailand during Pliocene and whose effective population size was several times higher than those of the current groups. Then the formation of the Central Plain of Thailand during Quaternary has played a significant role in progressively fragmenting the ancestral population and shaping the phylogeographic pattern of this species. The progressive burying and disappearing of limestone karsts of the Central Plain and of the main river valley basins would constitute a barrier to L. neilli dispersal and lead to the vicariant events isolating populations. The western populations were the first to have been isolated around 2.4 Myr, followed by the central populations around 1.3 Myr and finally the northern and northeastern groups around 0.9 Myr.
The non significant Mantel's tests performed on both mitochondrial and nuclear datasets supported the hypothesis of the presence of isolating barriers within this region. Geographic distance alone does not seem to influence the genetic structure of L. neilli and even short distances of lowland areas among karsts seem to constitute important dispersal barriers. The high F ST values for both mitochondrial and nuclear datasets between western and central populations (Table 4) highlighted the lack of gene flow between these two regions and the influence of the formation of the Central Plain of Thailand on L. neilli dispersal.
Implication for the conservation of L. neilli and limestone karsts L. neilli is currently classified as ''Data deficient'' on the IUCN Red List due to the lack of data about its distribution and ecological requirements [23], but our study provides valuable information for the re-evaluation of its status. Our survey of the rodent diversity in Thai limestone karsts has shown that the distribution range of this species is wider and its populations more numerous than indicated by previous records. Therefore, we believe that L. neilli does not satisfy the criteria of the IUCN threatened categories now but is likely to qualify for a threatened category in the near future for the following reasons.
First, populations of L. neilli are small and highly fragmented. Six Evolutionarily Significant Units (ESUs) have been defined in Latinne et al. [25] on the basis of mtDNA diversity. However, the larger sampling of the present study and additional information obtained from nuclear genes indicate that the delineation of four ESUs is more appropriate. The western, central, northern and northeastern groups of populations were historically isolated and evolved independently. Consequently, their preservation is of major importance to protect the intraspecific diversity of this species [83].
Second, the habitat requirement of L. neilli seems to be highly specific and the species is closely associated with limestone karsts. This habitat is currently strongly threatened through quarrying, hunting, and urbanization [13,14]. More than 122 limestone quarries are operating in Thailand in all the provinces where limestone is available [84]. More than 20% of limestone karsts in Thailand have already been quarried for cement, lime and hard core, and many have completely disappeared from the landscape [85]. As a result of these activities, some karst endemic species became dramatically endangered. The long-term survival of L. neilli is thus strongly dependent on the preservation of karstic habitats.
Third, L. neilli is intensively trapped for human consumption in some regions of northeastern Thailand. This could threaten the long-term preservation of the species in this region.
Therefore, we strongly recommend classifying this species at least as ''Near Threatened'' on the IUCN Red List. If the four ESUs that we described were assessed separately, they would qualify as ''Vulnerable'' because of their small geographic ranges and population sizes. Particular attention should be paid to the western lineages, highly divergent for both mitochondrial (more than 7% of net K2P distance for cytb) and nuclear genes, which could be considered as a distinct sub-species of L. neilli under the phylogenetic species concept [25] and therefore requires separate management and conservation plans.
We also suggest, following the recommendations of Vermeulen & Whitten [14], for limestone quarrying management, and   specially to avoid the quarrying of isolated limestone hills and to locate quarries in the largest limestone areas and let a considerable part of it remain intact.

Conclusions
This study reveals a complex genetic structure for Leopoldamys neilli supported by both mitochondrial and nuclear markers. The evolutionary history of this species has been influenced by the geological history of Thailand during Plio-Pleistocene and vicariance has played a significant role in shaping the phylogeographic pattern of L. neilli. The western, central, northern and northeastern groups of populations were historically isolated and they evolved independently. To our knowledge this is the first phylogeographic study of karst endemic taxa in this region. Further phylogeographic studies of other co-distributed taxa will allow the determination of whether the phylogeographic pattern recovered for L. neilli and the strong isolation of western populations could be generalized to other species living in Thai limestone karsts and whether they share a similar evolutionary history.