Integrative Taxonomy of Southeast Asian Snail-Eating Turtles (Geoemydidae: Malayemys) Reveals a New Species and Mitochondrial Introgression

Based on an integrative taxonomic approach, we examine the differentiation of Southeast Asian snail-eating turtles using information from 1863 bp of mitochondrial DNA, 12 microsatellite loci, morphology and a correlative species distribution model. Our analyses reveal three genetically distinct groups with limited mitochondrial introgression in one group. All three groups exhibit distinct nuclear gene pools and distinct morphology. Two of these groups correspond to the previously recognized species Malayemys macrocephala (Chao Phraya Basin) and M. subtrijuga (Lower Mekong Basin). The third and genetically most divergent group from the Khorat Basin represents a previously unrecognized species, which is described herein. Although Malayemys are extensively traded and used for religious release, only few studied turtles appear to be translocated by humans. Historic fluctuations in potential distributions were assessed using species distribution models (SDMs). The Last Glacial Maximum (LGM) projection of the predictive SDMs suggests two distinct glacial distribution ranges, implying that the divergence of M. macrocephala and M. subtrijuga occurred in allopatry and was triggered by Pleistocene climate fluctuations. Only the projection derived from the global circulation model MIROC reveals a distinct third glacial distribution range for the newly discovered Malayemys species.


Introduction
Snail-eating Turtles of the genus Malayemys inhabit a variety of natural and anthropogenic freshwater habitats across the lowlands of Southeast Asia [1]. The genus was long considered  Chao Phraya Basin (Thailand), the Khorat Basin (north-eastern Thailand) and the Tonlé Sap Lake (part of the Lower Mekong Basin, central Cambodia) and cover major parts of the distribution range of Malayemys (Figs 1 and 2; Table 1). All study sites are located outside of designated protected areas, except for the Prek Toal area of the Tonlé Sap Biosphere Reserve, Cambodia. While in Thailand live turtles were primarily obtained from resident fishermen and local markets, the majority of turtles from Cambodia were caught during our own field research and supplemented by few specimens obtained from local markets. For genetic analyses of living turtles, samples of either 100 μl of blood (drawn from the coccygeal vein) or, in few cases, a small section of tissue clipped from the tail tip was collected by FI and preserved in 900 μl analytical ethanol (98%). Tissue sample size varied between 2-3mm (depending on the size of the turtle) and was collected from the distal-most tip of the tail. The collection of blood and tail tip samples represent non-lethal standard methods commonly applied to turtles and no deleterious effects were observed in response to the procedure. Additional samples were acquired by extracting muscle or bone tissue from freshly dead (road kills) or preserved specimens in the Natural History Museum, Pathum Thani, Thailand. Except for four specimens retained as vouchers, living turtles were released at or near the site of capture after data collection. Voucher specimens constituting the type series were euthanized by intravenous injection of pentobarbital sodium, fixed in 90% ethanol and deposited at the following collections: Natural History  Laboratory procedure and phylogenetic analyses of mitochondrial DNA Two mitochondrial DNA fragments, that have been successfully used for phylogenetic and phylogeographic purposes in geoemydid turtles (e.g. [8,9,18,24]) were utilized, namely: the nearly complete cytochrome b gene (cyt b) plus adjacent DNA coding for tRNA-Thr and the 3' half of the NADH dehydrogenase subunit 4 gene (ND4) plus adjacent DNA coding for tRNAs. Total genomic DNA of fresh tissue and blood samples was extracted using the innuPREP DNA Mini Kit and the innuPREP Blood DNA Mini Kit (Analytik Jena AG, Jena, Germany), respectively. The mitochondrial DNA blocks were amplified using PCRs having a final volume of 25 μl. The reaction mix for both genes contained 1 unit Taq polymerase (Bioron, Ludwigshafen, Germany) with PCR buffer 10× including MgCl 2 , 0.2 mM of each dNTP (Thermo Scientific, St. Leon-Rot, Germany), 0.4 mM of each primer, ultrapure H 2 O, and 10-50 ng of total DNA. The cyt b fragment was amplified using the primer pair CytbG [8] and mt-f-na [25]. To amplify the ND4 fragment, the primer pair L-ND4 and H-Leu [9] was used. Thermocycling protocols are given in S1 Table. PCR products were purified using the ExoSAP-IT enzymatic clean-up (USB Europe GmbH, Staufen, Germany; modified protocol: 30 min at 37°C; 15 min at 80°C) and sequenced on an ABI 3730 Genetic Analyzer (Applied Biosystems, Foster City, CA, USA) using the BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems) and either the primers CytbG, mt-C-For2, and mt-f-na for the cyt b fragment; or the primers L-ND4 and H-Leu for the ND4 fragment. Cycle sequencing reactions were purified using Sephadex™ (GE Healthcare, München, Germany) and 25 cycles were run with denaturing at 96°C for 10 s, annealing at 50°C for 5 s and elongation at 60°C for 4 min. Cyt b and ND4 sequences were obtained from 89 and 90 samples, respectively (Table 1). These were checked using the original electropherograms and subsequently manually aligned in BIOEDIT 7.0.5.2 [26]. Homologous sequences of Heosemys annandalii were downloaded from GenBank (accession number JF742646) and added as outgroup. The final concatenated dataset consisted of 1863 bp (1078 bp of cyt b plus adjacent tRNA and 785 bp of ND4 plus adjacent tRNAs) and was partitioned by gene and codon position. Partitions were tested with PARTITION-FINDER v1.1.1 [27], which also selects the best-fitting nucleotide substitution model using the Bayesian Information Criterion. Chosen models and partitions were HKY+G for the first codon positions of cyt b and ND4, and the tRNA; HKY+I for the second codon positions of cyt b and ND4; and HKY for the third codon positions of cyt b and ND4.
Phylogenetic trees were inferred with MRBAYES 3.2.2 [28]. Model parameters were estimated separately for each of the three partitions by unlinking them. Four independent runs were performed with 10 million generations each, sampling every 1,000 trees. However, the analysis was stopped when the average standard deviation of split frequencies fell below 0.01. Results of the MCMC were summarized and the initial 1,000 trees of each run discarded as burn-in after checking for convergence and sufficient effective sample sizes in TRACER v1.6 [29].
Relationships of DNA sequences were further assessed by building haplotype networks using a statistical parsimony approach as implemented in TCS 1.21 [30]. Networks were constructed separately for the DNA fragments containing the cyt b and ND4 genes. In addition, uncorrected p-distances were calculated for cyt b and ND4 using MEGA 6.06 [31] and the pairwise deletion option.
Microsatellite DNA was amplified using individual PCRs for each locus or multiplex PCR (S2 Table), each in a final volume of 10 μl containing 0.5 units Taq polymerase (Bioron) together with the buffer recommended by the supplier, 0.375 mM of MgCl 2 (Bioron), 0.2 mM of each dNTP (Thermo Scientific), 2 μg of bovine serum albumin (Thermo Scientific), 0.25 mM of each primer and 20-40 ng of total DNA. Cycling conditions were as follows: 35 cycles with denaturation at 94°C for 45 s, preceded by an initial denaturation step of 5 min, annealing at primer-specific temperature (S2 Table) for 60 s and extension at 72°C for 60 s, followed by final elongation of 10 min. PCR products were diluted with water in a ratio of 1:100. Fragment lengths were determined on an ABI 3730 Genetic Analyzer (Applied Biosystems) using the GeneScan-600 LIZ Size Standard (Applied Biosystems) and the software PEAK SCANNER 1.0 (Life Technologies, Carlsbad, CA).
The 12 loci were analysed with an unsupervised Bayesian clustering approach as implemented in STRUCTURE 2.3.4 [39,40] using the admixture model and correlated allele frequencies.
The advantage of such unsupervised analyses is that the software clusters the samples strictly according to the genetic information, but without any presumptions about population structuring (e.g. geographical distances, sampling sites). STRUCTURE searches the data set for partitions which are, as far as possible, in Hardy-Weinberg equilibrium and linkage equilibrium. For all analyses, the upper bound for calculations was set arbitrarily to K = 10, and the most likely number of clusters (K) was determined using the ΔK method [41] with the software STRUCTURE HARVESTER [42]. Calculations were repeated 10 times for each K using a MCMC chain of 750,000 generations for each run, including a burn-in of 250,000 generations. Population structuring and individual admixture were visualized using the software DISTRUCT 1.1 [43]. Individuals below a threshold of 80% for cluster membership were treated as having mixed ancestries [44].
Diversity and divergence parameters were estimated for all three clusters with microsatellite data. For comparing the number and size of microsatellite alleles, a frequency table was produced using CONVERT 1.31 [45]. Genetic differentiation between clusters was examined with ARLEQUIN 3.11 [46] using F ST values and analyses of molecular variance (AMOVAs; 10,000 permutations). In addition, ARLEQUIN was used to estimate locus-specific observed (H O ) and expected heterozygosities (H E ). The software FSTAT 2.9.3.2 [47] was used for determining locusspecific excess or deficiency of heterozygotes as expressed by the inbreeding coefficient F IS [48] and also the statistical significance of F IS . Finally, values for locus-specific allelic richness were calculated with the same software.

Morphology
A total of 94 specimens of Malayemys was examined for 28 morphometric and 13 colorationrelated characters (meristic n = 3, categorical n = 10). Metric characters of carapace and plastron were measured to the nearest mm by the same person (FI) using a digital calliper for straight-line characters and a measuring tape for curved measurements. The area of dark plastron pigmentation was assessed as percentage of the total plastron area using a colour threshold method in IMAGEJ [49]. The threshold was evaluated using a binary image and set to the value that captured the dark pigmentation completely. Subsequently, all images were processed with this threshold. The number of nasal stripes (NasS) and the shape of the infraorbital stripe (InfLorb) were determined following Brophy [6]. For the complete set of examined characters, see S1 Text. As turtles are known to be sexually dimorphic regarding their shell size and shape [50][51][52][53][54], all analyses were conducted for males and females separately. Sexes were determined according to tail morphology as described by Brophy [6]. To avoid size-dependent intercorrelation effects in the morphometric data, we calculated regression residuals on the log-transformed metric variables using straight carapace length as a covariable.
A principal component analysis (PCA) was conducted using mixed variables as implemented in the ADE4 package [55] for CRAN R [56]. This method can handle quantitative and categorical variables [57], allowing for the inclusion of coloration data. Such categorical characters can be important to distinguish taxa, and therefore should not be neglected in multivariate analyses [58]. The first four principal components (PCs; with Eigenvalues > 1) were used for subsequent analyses to capture major parts of the total variation. Restriction to four PCs was necessary as a trade-off between the number of observations per group and dimensionality of the morphological space.
Subsequently, a multivariate kernel density estimation (KDE) was used to estimate the ndimensional hypervolume of each of the different clades of Malayemys using the HYPERVOLUME package in R [59]. The KDE produces random points with a uniform distribution in morphospace defined by a minimum convex polytope enclosing the observations. For each species pair, we determined the total volume (union of both hypervolumes), the overlap (intersection of hypervolumes), and the unique part of each hypervolume. Furthermore, the Soerensen index based on the unique and shared parts was calculated as a measure of pairwise overlap in multidimensional morphological space.

Species distribution models
The potential distribution of Malayemys was assessed through a correlative species distribution model using MAXENT version 3.3.3k [60,61], which performs comparatively well even with a restricted number of records [62]. A set of 29 genetically verified occurrence records was used to build the model. In addition, six uncorrelated bioclimatic variables with a spatial resolution of 2.5 arc minutes were obtained from WorldClim (www.worldclim.org): annual mean temperature and precipitation (bio 1, bio 12), mean temperature of the warmest quarter (bio 10), mean temperature of the coldest quarter (bio 11), and precipitation of the wettest and driest quarter (bio 16, Bio 17). As training area, an area defined by a polygonal bounding box enclosing all species records was selected.
To predict the impact of historical climate fluctuations and related eustatic sea level changes on the distribution of Malayemys, we obtained two projections for the Last Glacial Maximum (LGM, 21,000 years ago) from global circulation models (GCM) through the Paleoclimate Modelling Intercomparison Project Phase II (PMIP2) [63], namely the Community Climate System Model (CCSM) [64], and the Model for Interdisciplinary Research on Climate (MIROC) [65]. Both GCMs were statistically down-scaled to a spatial resolution of 2.5 arc minutes using the delta method [66] and downloaded from www.worldclim.org.
In MAXENT, we applied a bootstrapping approach with 100 replicates splitting the occurrence data set randomly with 80% used for model training and 20% for model evaluation using the area under the receiver operating characteristic curve [67]. Only linear, quadratic and product features were allowed in order to restrict model complexity and to avoid spurious effects in response curves. The average projections across all 100 replicates were used for further processing, wherein the "balance training omission, predicted area and threshold value logistic threshold" were applied as non-fixed presence-absence threshold. Areas requiring extrapolation beyond the training area of the SDM were identified via multivariate environmental similarity surfaces (MESS) [68].

Nomenclatural Acts
The electronic version of this article corresponds to the requirements of the amended International Code of Zoological Nomenclature [69]. Therefore, the new name contained herein is available under that Code from the electronic edition of this article. This published work and the nomenclatural act it contains have been registered in ZooBank, the online registration system for the ICZN. The ZooBank LSIDs (Life Science Identifiers) can be resolved and the associated information viewed through any standard web browser. The LSID for this publication is: urn:lsid:zoobank.org:pub:97BA99D1-13EA-4390-8A2D-2112CFA7D6B8. The electronic edition of this work was published in a journal with an ISSN, has been archived and is available from the following digital repositories: PubMed Central, LOCKSS.

Mitochondrial phylogeny and geographic distribution of clades
Analysis of the concatenated mtDNA revealed three well-supported clades (Fig 2). With a single exception, Clade B comprises turtles from the Chao Phraya Basin. This clade is, with high support, sister to another clade (Clade C) containing mostly individuals from the Lower Mekong Basin. However, this clade also contains a number of individuals from the Chao Phraya Basin. The successive sister, again with high support, is Clade A, comprising mainly turtles from the Khorat Basin. Thus, the geographic distribution of Clade B and C roughly correspond to the ranges of the two previously recognized species: Malayemys macrocephala (Chao Phraya Basin) and M. subtrijuga (Lower Mekong Basin). Yet haplotypes of different clades were found in few instances at the same sites, especially in central and eastern Thailand (Fig 2).
While Clade B and C differed by uncorrected p-distances of 1.45% for cyt b and 1.09% for the ND4 fragment, Clade A was much more divergent. It differed from Clade B by 6.92% for cyt b and 5.30% for the ND4 fragment, and from Clade C by 6.81% for cyt b and 5.38% for the ND4 fragment (S5 Table).

Genetic structuring according to microsatellite data
The ΔK method [41] revealed K = 3 as the optimal number of clusters. Cluster B corresponded to turtles primarily from the Chao Phraya Basin (M. macrocephala), Cluster A referred to turtles from the Khorat Basin, and Cluster C represented turtles from the Lower Mekong Basin (M. subtrijuga; Fig 3).
There was no evidence for gene flow across these three clusters. However, two misplaced individuals were identified in the second cluster (Fig 3; Site 18: Sakhon Nakhon), and a single one within the Lower Mekong cluster (Fig 3; Site 24: Tonlé Sap Lake). The latter misplacement is in concordance with the mitochondrial phylogeny.
However, individuals at two locations within the Lower Mekong Basin (Fig 3; Sites 18 and 24) were placed in the microsatellite cluster associated with M. macrocephala (Cluster B), rather than M. subtrijuga (Cluster C). Two of these individuals (Fig 3; at Site 18: Sakhon Nakhon) had discordant mitochondrial haplotypes corresponding to M. subtrijuga (Clade C). In addition, a single individual (Fig 3; Site 24: Tonlé Sap Lake) was identified as M. macrocephala by both microsatellite (Cluster B) and mitochondrial (Clade B) data.

Diversity within and divergence among clusters
For the following diversity analyses, turtles were assigned to the three distinct clusters revealed by STRUCTURE. Numbers of alleles per locus ranged from 2 to 22. Of a total of 134 alleles, 65 private alleles were found in only one of the three clusters (Table 2). With exception of the F IS value, the highest genetic diversity indices were found in the Chao Phraya cluster (Cluster B). The highest F ST value was found between the Khorat cluster (Cluster A) and the Lower Mekong cluster (Cluster A) (S3 Table, F ST = 0.55) for microsatellites and the AMOVA indicated that 43% of the observed global variation occurred among and 57% within clusters.

Morphology
The four-dimensional hypervolumes of the three genetically delineated groups showed no overlap at all (Soerensen overlap = 0.0 between all pairs), neither in males nor in females ( Fig  4). Complete differentiation in morphological space is observed among the genetically defined clades (Table 3). For variable contribution to the respective principal components, see S1 Text, for properties of the hypervolumes, see S4 Table.

Species distribution model
The predictive model performed well with high area under the receiver operating characteristic curve values (AUC training = 0.86, AUC test = 0.85), indicating the model to discriminate well between suitable and unsuitable environmental space. Mean temperature of the warmest quarter (bio 10) was the variable with the highest importance across all 100 replicates (44.3%), followed by annual mean temperature (bio 1, 39.9%) and precipitation of the driest quarter (bio 17, 5.12%). All other variables contributed less than 5% (S6 Table).
Under present climatic conditions, the model reflects the known distribution range of Malayemys. Areas of higher altitude such as the Dangrek Mountain range (separating the Khorat Basin from the Northern Plains of Cambodia) and the Cardamom Mountains (separating eastern Thailand from the south-west of Cambodia) are not predicted to have suitable Genotypic structuring of 91 Malayemys specimens for K = 3 using 12 microsatellite loci (top). Shown is the STRUCTURE run with the best probability value. Within each cluster an individual turtle is represented by a vertical segment that reflects its ancestry. Mixed ancestries are indicated by differently coloured sectors corresponding to inferred genetic percentages of the respective cluster. The mitochondrial lineage of each sample is shown above the STRUCTURE diagrams. Colours of pooled sampling sites in the map correspond to STRUCTURE clusters; slices represent turtles with conflicting cluster assignment (percentages). Numbers refer to population IDs (Table 1). doi:10.1371/journal.pone.0153108.g003 Integrative Taxonomy of Southeast Asian Snail-Eating Turtles environmental conditions (Fig 5). Projections onto paleoclimatic conditions derived from two different global circulation models, however, suggest that suitable environmental space contracted to largely disjunct ranges during the last LGM. The first of these areas, corresponding to the current distribution range of M. macrocephala, was predominantly confined to the Chao Phraya Basin, but extended southwards onto the Sunda Shelf. A second suitable area, which was perhaps intermittently connected to the first via a narrow corridor exhibits lower occurrence probabilities and stretched from eastern Thailand across the Lower Mekong Basin to the Mekong Delta in southern Vietnam. This area matches the current distribution range of M. subtrijuga. Only the MIROC projection predicts extensive suitable environmental conditions   within the Khorat Basin during the LGM, wherein the potential distribution in this region was much more restricted in CCSM but still present.

Taxonomic Conclusions
Our analyses revealed substantial phylogeographic structure within the genus Malayemys, with three mitochondrial clades that largely correspond to genetic clusters identified by STRUCTURE analyses of 12 microsatellite loci and to drainage basins. Two of these entities, from the Chao Phraya Basin and from the Lower Mekong Basin, can be identified with the species Malayemys    , 1845), respectively. Based on morphological differences alone, the validity of these two taxa have been recognized since the studies of Brophy [1,6]. According to our data, mitochondrial sequence divergence is relatively low between M. macrocephala and M. subtrijuga (Fig 2), with uncorrected p-distances of 1.45% for the DNA fragments comprising the cyt b gene and 1.09% for the one comprising the partial ND4 gene (S1 Fig, S1 FASTA). Despite this weak differentiation, our microsatellite analyses revealed distinct nuclear genomic gene pools, suggesting reproductive isolation and corroborating their species status. However, mismatches of mitochondrial haplotypes and STRUCTURE clusters (Figs 2 and 3) indicate old mitochondrial introgression into M. macrocephala. The populations from the Khorat Basin constitute another morphologically distinct group (Fig 4), harbouring the most divergent mitochondrial lineage (see also S1 Fig). As in M. macrocephala and M. subtrijuga, microsatellite data indicate its reproductive isolation. Based on the congruence of multiple lines of evidence, we conclude that the Khorat populations represent a distinct third species for which no name is available (cf. [70,71]). This species will be formally described below.
Diagnosis: Malayemys species with a straight carapace length below 20 cm and a tricarinate, blackish-brown carapace, and a blackish-brown head with distinct yellowish facial stripes. Malayemys khoratensis differs from M. macrocephala by the following characters: (1) first vertebral scute roughly square and not tapered posteriorly (V1W/V1L: 0.83±0.09 vs. 0.74±0.19 in females, 0.83±0.12 vs. 0.69±0.09 in males); (2) lower marginal scutes 8-12 with a pattern of diagonal to cone-shaped blackish brown blotches originating from the outer posterior corner vs. narrow blackish-brown bars at the posterior sutures; (3) infraorbital stripe not or rarely reaching the loreal seam, not broadened at the suture vs. always reaching the loreal seam and usually distinctly broadened at the suture; (4) infraorbital stripes only slightly curved below anterior edge of eyes vs. distinctly curved or angled; (5) short yellowish postocular stripe (between supraorbital and infraorbital stripe) lacking or reduced vs. postocular stripe always present and distinct (Fig 7).
Genetically, M. khoratensis differs from both congeners by the presence of tyrosine (T) instead of cytosine (C) at positions 45,48,117,144 and by the presence of cytosine (C) instead  Description of the holotype (Fig 6): Straight carapace length 106 mm; carapace moderately domed, somewhat depressed; carapace with a prominent dorsal keel from first to fifth vertebral and two nearly parallel lateral keels from the first to forth costal, most prominent on the second and third costal, feeble on forth costal; nuchal scute rather triangular, nearly as broad as long and not projecting beyond front border of adjacent marginals; first vertebral almost squarish (22x18 mm), anterior of scute slightly convex; second vertebral almost square (18x21 mm); third vertebral faintly hexangular and slightly broader than long (17x22 mm); forth vertebral hexangular (17x23 mm); fifth vertebral trapezoid with a convex posterior scute (25x33 mm); first costal quadrant, second and third costal broader than long, forth costal obliquely quadrangular; plastron shorter than carapace; slightly rounded anteriorly, posterior edges of gulars form an angle of 110°; a broad bridge, plastral edge slightly angular; measurements of median sutures of plastral scutes: gular 12 mm; humeral 9 mm, pectoral 14 mm, abdominal 22 mm, femoral 20 mm, anal 10 mm.
Toes fully webbed, claws moderate; posterior region of thighs and area above tail with a few very small conical and pointed scales; two rows of flat scales under the tail.
Head large, skin of top smooth with a large undivided prefrontal scute. Supraocular scute very large, bordering the upper jaw scute. Upper jaw notched.
Carapace dark brown; marginal scutes exhibit blackish area along the posterior edge; submarginal scutes: 1st almost entirely yellowish, 2nd and 3rd with blackish area near sutures; 4th to 7th (next to bridge) almost entirely blackish, 8th to 12th (posterior to bridge) triangular to cone-like blackish area emanating from outer posterior corner of scutes; plastron pale yellow, each scute except anal scutes with a large blackish area emanating from the lateral posterior corner; head blackish brown; one ivory-white supraorbital stripe (pale yellowish in life) running from tip of snout above eye and ear to base of neck; one ivory infraorbital stripe (pale yellowish in life) from about anterior lower corner of eye, crossing jaw angle, rather straight running continuously below ear to neck; two parallel ivory lines (pale yellowish in life) from nostrils to upper jaw, more or less continuous with two pale washed-out median lines crossing lower jaw; soft parts of posterior neck and shoulder greyish beige; limbs darker on upper surfaces.
Variation: For variation in colour traits and morphological measurements, see Table 3. For genetic variation, see GenBank accession numbers (Table 1).
Etymology: The species epithet refers to the Khorat Basin, the watershed to which the range of the new species appears to be restricted. The proposed English common name is Khorat snail-eating turtle.
Distribution: M. khoratensis is so far only known from the Khorat Basin of northeastern Thailand. More precisely from Sikhio District, Nakhon Ratchasima Province, a location drained by the Mun River, and the provinces of Udon Thani (type locality) and Nong Bua Lamphu, in areas associated with the Chi River, a tributary of the Mun River.

Discussion
Different data sets (morphology, mtDNA, and microsatellite loci) support our hypothesis that Malayemys consists of three clearly distinct taxa. Quantitative analyses yielded a complete morphological differentiation of all three clusters. This lack of overlap in morphological space is a strong argument for specific distinctiveness, as one would not expect such discrete clusters if there was consistent exchange between sampled populations. Furthermore, all three species can be identified by qualitative, diagnostic characters (see above). While M. macrocephala and M. subtrijuga have already been granted species status based on morphology [5,6], the third species described herein is new to science. At least for the new species M. khoratensis, mitochondrial and microsatellite markers showed largely congruent patterns, both supporting its species status. According to our phylogeny, the initial-comparatively deep-split within Malayemys separated M. khoratensis from the last common ancestor of its two congeners. Observed p-distances between M. khoratensis and the other two species resemble those of many well-established turtle species (e.g. [17,18]), whereas divergences between M. macrocephala and M. subtrijuga are close to the lowermost divergence values observed between distinct turtle species [72,73]. However, population genetic analyses revealed no signs of gene flow between any of the three Malayemys species (no admixture among STRUCTURE clusters; high F ST values and a high number of private alleles and haplotypes; see e.g. [74]) and thus corroborate to treat all three taxa as full species under the restrictive biological species concept [75,76]. However, considering the shared mitochondrial haplotypes in M. macrocephala and M. subtrijuga, some historical introgression seems to have occurred between these two species.
Although Malayemys is excessively traded and used for religious release, our study found well-separated distribution ranges and identified only a few genetically mismatching individuals that may result from translocation by humans. This could indicate that turtles used for traditional religious release are mostly locally caught and released locally. Yet, for three mismatching turtles translocation seems likely: a single individual from within the range of M. subtrijuga, offered for religious release in Siem Reap, Cambodia (Site 24; Lower Mekong Basin), was morphologically and genetically allocated to M. macrocephala, suggestive of introduction from Thailand. Another two individuals from the Khorat Plateau (site 18; Sakhon Nakhon) genetically as well as morphologically represented M. macrocephala and not, as expected, M. khoratensis. In addition, a single M. macrocephala has been reported from the Nam Lik Valley, Vientiane Province, northern Laos [77] and M. macrocephala was found on food markets in and around Vientiane city [78]. It remains unclear whether these turtles have been introduced or whether M. macrocephala occurs naturally in the region of the northern Middle Mekong and its tributaries.
Snail-eating turtles occur in a variety of natural and anthropogenic freshwater habitats, including rice fields and canals across the Southeast Asian lowlands but are absent from higher elevations [1,6]. In accordance, our SDM identifies temperature-related variables as the limiting parameters for geographic distribution, restricting the occurrence of Malayemys to lower elevations. The present distribution pattern has to be understood as the result of paleogeographic events, historical climate fluctuations and eustatic sea level changes.
During the last glacial maximum, the sea level fell and exposed a vast portion of the Sunda Shelf as swampy plains that might have served as suitable habitat for the semi-aquatic Malayemys. However, rising sea levels led to inundation of these potentially suitable areas during the Pleistocene and post-Pleistocene period, making vast areas of the Asian continental shelf uninhabitable for Malayemys once more.
In addition, tectonic activity caused significant rearrangement of the Southeast Asian drainage systems leading to alterations of evolutionary and ecological trajectories [79][80][81][82]. Uplifting along the southern and western edges of the Khorat Plateau disconnected the Mun River, nowadays a tributary of the Mekong, from the Chao Phraya Basin [83][84][85][86]. The predictive SDM under current climatic conditions highlights geographic barriers (i.e., mountain ranges) that separate suitable areas in the Khorat Basin from the remaining areas suitable for Malayemys. Past projections of the SDM suggest that environmentally suitable space for Malayemys was restricted to two disjunct areas during the LGM. These two potentially suitable glacial distribution ranges largely match the current ranges of M. macrocephala and M. subtrijuga, proposing that the divergence of these two species occurred in allopatry and was triggered by Pleistocene climate fluctuations. Yet, the older split, separating M. khoratensis, is difficult to explain. Theoretically, it should be assumed that for this species a third distinct glacial distribution range existed. However, the LGM model revealed no suitable areas within the current range of the new species, which suggests either a niche shift (i.e., the realized niche of M. khoratensis has changed through time) or that M. khoratensis could have co-existed during the LGM with M. macrocephala and M. subtrijuga in the same areas.
The presence of Pleistocene (or older) fossils belonging to the freshwater turtle genera Chitra and Batagur as well as sediments obtained from Khok Sung, Nakhon Ratchasima Province, Thailand suggest that the Mun River was historically draining into the Chao Phraya instead of flowing from west to east into the Mekong system [85]. Now extirpated from east of the Chao Phraya, this historical drainage system arrangement likely facilitated gene flow between western and southern Batagur populations such as the B. trivittata and its closest relative B. borneoensis [85]. Therefore, the Khorat Basin might have served as a historical migration route for freshwater turtles between the Lower Mekong and the Chao Phraya Basin. These hydrogeographical events influenced geographic distribution patterns and genetic diversity of freshwater fishes (Henicorhynchus), gharials (Gavialis), and semi-aquatic mud snakes (Enhydris) [85][86][87][88].
Supporting Information S1 FASTA. Reference alignment of the nearly complete mitochondrial cytochrome b gene (plus adjacent DNA coding for tRNA-Thr) of Malayemys.