The Historical Speciation of Mauremys Sensu Lato: Ancestral Area Reconstruction and Interspecific Gene Flow Level Assessment Provide New Insights

Mauremys sensu lato was divided into Mauremys, Chinemys, Ocadia, and Annamemys based on earlier research on morphology. Phylogenetic research on this group has been controversial because of disagreements regarding taxonomy, and the historical speciation is still poorly understood. In this study, 32 individuals of eight species that are widely distributed in Eurasia were collected. The complete mitochondrial (mt) sequences of 14 individuals of eight species were sequenced. Phylogenetic relationships, interspecific divergence times, and ancestral area reconstructions were explored using mt genome data (10,854 bp). Subsequent interspecific gene flow level assessment was performed using five unlinked polymorphic microsatellite loci. The Bayesian and maximum likelihood analyses revealed a paraphyletic relationship among four old genera (Mauremys, Annamemys, Chinemys, and Ocadia) and suggested the four old genera should be merged into the genus (Mauremys). Ancestral area reconstruction and divergence time estimation suggested Southeast Asia may be the area of origin for the common ancestral species of this genus and genetic drift may have played a decisive role in species divergence due to the isolated event of a glacial age. However, M. japonica may have been speciated due to the creation of the island of Japan. The detection of extensive gene flow suggested no vicariance occurred between Asia and Southeast Asia. Inconsistent results between gene flow assessment and phylogenetic analysis revealed the hybrid origin of M. mutica (Southeast Asian). Here ancestral area reconstruction and interspecific gene flow level assessment were first used to explore species origins and evolution of Mauremys sensu lato, which provided new insights on this genus.


Introduction
Mauremys sensu lato belonging to Geoemydidae was proposed as a new genus based on the recent molecular research. For this genus, nine species are generally recognized: M. annamensis from Vietnam, M. caspica from Turkey to Iran, M. rivulata from Northeast Mediterranean, M. leprosa from Southern Europe to Northern Africa, M. mutica from Southeast China and Vietnam, M. reevesii from China, Korea and Japan, M. sinensis from Southeast China, Laos and Vietnam, M. nigricans from Southern China, M. japonica from Japan [1].
In recent years, more attention has been focused on the phylogeography and population genetic structure of Mauremys sensu lato. Previous studies of gene flow in this group mainly focused on Western Palearctic species. The existence of intraspecific gene flow was confirmed in M. leprosa, M. caspica, and M. rivulata [8][9][10]. Based on these results, Fritz et al. (2006) surmised the population of M. leprosa may be affected by glacial period bottlenecks, resulting in a decline in population diversity [9]. This inference has been confirmed in subsequent research about the population genetic structure of M. japonica [11]. However, the interspecific gene flow of East and Southeast Asian species has rarely been reported.
Eight species of Mauremys sensu lato were collected in this study. M. nigricans was not included because purebred M. nigricans is hardly found in the wild or turtle market. The phylogenetic relationships, interspecific divergence times, and ancestral area reconstruction of this group were explored using mt data. Subsequently, interspecific gene flow levels were assessed using five unlinked polymorphic microsatellite loci. Ancestral area reconstruction and interspecific gene flow level assessment were first used to explore species origins and evolution of Mauremys sensu lato, which provide new insights on the phylogeny of this genus.

Ethics statement and Sample collection
Procedures involving animals and their care were consistent with NIH guidelines (NIH Pub. No. 85-23, revised 1996) and approved by the Animal Care and Use Committee of Anhui Normal University under approval number #20130710.
Thirty-two individuals of eight species included 18 living turtles and 14 specimens. No endangered or protected species were involved in this study. Twenty-five samples were collected from China and boundary areas adjacent to Vietnam. No permission was necessary for accessing areas where turtles were collected. All M. japonica and three West Asian species (M. caspica, M. rivulata, and M. leprosa) were purchased from the pet market; i.e., Yihe market in Guangdong (Table 1). Tissue samples were collected from the tails (3-5 mm from the tip) of living turtles at a sampling location using procedures that minimized pain. Before tissue collection, we used 5% lidocaine ointment to anesthetize the tails to alleviate pain and 70% alcohol to clean the tails to avoid infection. A low dose of antibiotic was applied on the wound after tissue collection. During the healing period, wound were kept dry.
Most turtles were immediately released into the local habitat and others were fed in Anhui Normal University due to being an alien species. Specimens were deposited in the Provincial Key Laboratory of the Conservation and Exploitation Research of Biological Resources in Anhui, China. M. mutica were collected from two populations; i.e., an eastern China population and Vietnamese population. M. mutica from the two regions had obvious differences in morphology.

Laboratory protocols
Total genomic DNA was extracted from tail muscle tissue by a standard phenol/chloroform procedure via proteinase K digestion [12], and then kept at -20°C for PCR amplification.
Sixteen pairs of universal primers were designed for the mt DNA of Mauremys sensu lato (S1 Table). PCR reactions were conducted in 50 μL reaction mixtures containing 200 ng template DNA, 5 μL 10 × buffer (TaKaRa, Dalian, China), 4.0 μL MgCl 2 (2.5 mol/L), 3.0 μL dNTP (2.5 mM), 2 μL of each primer (5 μmol/L), and 0.5 U Taq DNA polymerase (25 U/μL, TaKaRa). PCR conditions were as follows: initial denaturation (95°C, 1 min), then 35 cycles of denaturation (94°C, 50 s), primer annealing (50°C-58°C, 50 s), and elongation (72°C, 1 min) and a final extension (72°C, 10 min). The mt DNA fragments of intended sizes were recovered using a Gel Extract Purification Kit (TaKaRa). Purified PCR products were cloned into pMD19T vectors (TaKaRa) and all fragments were sequenced in both directions with an ABI3730 automated sequencer (Invitrogen Biotechnology Co., Ltd, USA). Cross species microsatellite amplification was performed across 10 primer pairs developed for M. reevesii in earlier work of our laboratory (patent number: ZL201110026152.5) and five loci were chosen for amplification in this study. PCR conditions were as follows: 95°C for 5 min, 94°C for 30 s, 57°C for 60 s and 72°C for 90 s, followed by 32 cycles of step 2 to step 4 and final extension at 72°C for 5 min. SSR analysis was detected by ABIPRISM 3730. The results were read with GeneMarker software. Twelve protein-coding genes [cytochrome c oxidase (COX) subunits 1, 2, and 3, cytochrome b (Cyt b), NADH dehydrogenase (ND) subunits 1, 2, 3, 4, 4L, and 5, and ATP synthase F0 (ATP) subunits 6 and 8] were chosen for analyses. Other genes [22 tRNA genes, 2 rRNA genes, a highly variable control region (CR) and ND6 gene] were excluded from the analyses due to potential saturation, alignment problems and different evolutionary rates, which influenced replacement patterns at the amino acid sequence level [13,14].

Genetic distance analysis
Twelve protein-coding genes were aligned separately with Mega 6.06 software [15] by ClustW (codon). Then, the genetic distances for all species were calculated by analysis of the combined data (10,854 bp) of mt 12 protein-coding genes using Mega v6.06.

Phylogenetic analysis
In order to resolve the current ambiguous phylogeny of Mauremys sensu lato, a total of 35 complete mt genome sequences of Cuora and Mauremys sensu lato, and out-group Manouria emys were selected (S2 Table).
Considering different evolutionary rates, twelve protein-coding genes were partitioned by codon positions and a best-fit substitution model was selected using PartitionFinder [16] (Table 2).
Phylogenetic relationships were inferred with maximum likelihood (ML) and Bayesian analysis (BI) under a best-fit partitioning scheme. ML tree was calculated with Raxml version 7.2.6 [17]. For each partition scheme, we performed a rapid (-f a -x option) with 1,000 replications to assess support on different nodes [17,18]. We regard bootstrap values of ! 70% as strong support and values of < 70% as weak support [19]. BI analysis was constructed using MrBayes v.3.1.2 [20]. BI was run with four Markov chains for 5 × 10 7 generations and sampled every 1000 generations. The stationary point was reached when the potential scale reduction factor (PSRF) equaled 1, and when -log likelihood (-lnL) scores plotted against generation time reached a stationary value, and 25% of the generations were discarded. Trees from sample points following the burn-in were combined into a 50% majority rule consensus tree; the percentage of samples recovering a given clade reflected the clade's posterior probability (PP). Posterior probabilities ! 95% are regarded as strong support.

Ancestral area reconstruction and divergence time estimation
Ancestral area reconstructions were inferred by the program RASP 3.0 [21] for speciational evolution in phylogenetic trees, using the Bayesian binary method (BBM) and the statistical dispersal-vicariance method (S-DIVA). All eight species were allocated to five areas where these turtles still existed: (1) East Asia (A); (2) Southeast Asia (B); (3) West Asia (C); (4) West Europe (D); (5) South Europe and North Africa (E).
The divergence times of these species have been estimated using BEAST 1.8.0 under a model of uncorrelated rates drawn from a log normal distribution for 1.5 × 10 8 generations, assigning a Yule prior to rates of cladogenesis [22]. In order to ensure the time estimates to be as accurate as possible, all existing mt 12-protein sequences of Testudinidae, Cuora and Mauremys were selected from the NCBI database and Pelomedusa subrufa was used as the out-group (S3 Table). Two calibration points were selected to calibrate the divergence times: the fossil record of earliest divergence between Testudinidae and Geoemydidae, 55.0-66.4 million years ago (ma) [23]; the fossil record in Testudinidae, 45.6-55.8 ma [24].

Gene flow level assessment
Interspecific gene flow levels were assessed with Popgene 1.32 [25] using five unlinked polymorphic microsatellite data from 29 individuals except three Western Palearctic species (M. rivulata, M. caspica and M. leprosa). The interspecific genetic differentiation coefficients (Fsts) were calculated and 1,000 bootstrap resampling were carried out to verify confidence intervals (P < 0.05). Subsequently, the number of individual migrations in each generation was estimated by the effective number of migrants (Nm), which was calculated by the formula Nm = 0.25 Ã (1-FST)/FST [26].

Characteristics of the mitochondrial gene data
Fourteen individuals of eight species were determined from 16,443 bp (GU938833 and NC_016951) to 17,067 bp (KP100055) in length. All complete mt genomes encoded for 2 rRNA, 22 tRNA, 13 protein-encoding genes and a highly variable control region (CR). Similar to other vertebrates, CRs of all species were located between tRNA pro and tRNA phe . The following three parts of CR were also identified in these fourteen turtles: central conserved sequence block (CD), termination associated sequence (TAS), and conserved sequence blocks (CSB). And CSB contains a variable number of tandem repeat (VNTR) sequences. The arrangement of mt genes accorded with the general features of vertebrates.
Twelve protein-coding genes encoded by the H-strand were chosen for subsequent analyses because of the suitable evolutionary rate. The entire dataset was 10,854 bp in length, which contained 6,771 conserved sites, 4,080 variable sites, 3,072 parsim-informative sites and 1,008 singleton sites. The nucleotide compositions were slightly biased toward A and the average value was 32.3%.

Genetic distance comparative analyses between Cuora and Mauremys sensu lato
We calculated the genetic distance of Cuora and Mauremys sensu lato, respectively. Results for genetic distance showed that the largest genetic diversity in Mauremys sensu lato existed between M. reevesii and M. leprosa as well as M. reevesii and M. annamensis and the value was 0.086 (S4 Table). However, the largest value of Cuora was 0.093 and it existed between C. mouhotii, and C. amboinensis (S5 Table). Compared with Cuora, the genetic distance results suggested the largest divergence in Mauremys sensu lato was still at the species level within the genus.

Phylogenetic analyses
Phylogenetic relationships were estimated for the first time with ML and BI using mt twelve protein-coding genes. Trees derived from ML and BI analyses showed identical topology (Fig  1). Phylogenetic trees clearly indicated a sister relationship between Cuora and Mauremys sensu lato, while Manouria emys came out as the outgroup. The

Divergence time estimation and ancestral area reconstruction
The results were read combining the trend of temperature change and climatic events (Fig 2) [27]. Divergence time estimation was calibrated by two fossil records (node 1 and node 2)  [23,24]. The mean and 95% confidence interval (CI) of the ages of major nodes in the phylogeny are listed in Table 3 [27]. The red zone represents the warming period. The blue zone represents the glacial period. The earliest divergence of Mauremys sensu lato occurred in two Southeast Asian species (node 4; mean: 23.55 ma with a 12.26-32.12 ma 95% CI), and then species of the Eastern and Western Palearctic region diverged in 22.12 ma (node 5).
The historical evolution of Mauremys sensu lato is clearly shown by the results of ancestral area reconstruction in Fig 3. The slight difference between the results of BBM and S-DIVA analyses were a consequence of assumptions underlying different methods. We preferred the results of BBM analysis compared to S-DIVA because BBM calculates the probability of each area based on the distribution of terminal taxa [21].
All results supported Southeast Asia (area B) as the original area of living members of Mauremys sensu lato and current distributions were formed by multiple diffusion. Diffusion began in Southeast Asia (area B), and then expanded step by step from East Asia (area A) to Western Palearctic region (areas C, D and E). Most species of Mauremys sensu lato were distributed in East and Southeast Asia, and only three species (M. caspica, M. rivulata, and M. leprosa) were distributed in Western Palearctic region.

The assessment of gene flow
Five microsatellite loci amplified unambiguous and repeatable products in the size range expected. We assessed the interspecific gene flow level of six populations of East and Southeast Asian species. The results are represented by Fst and Nm values (Table 4).
Strong gene flow existed among most all of the populations (Nm > 1), except for between M. annamensis and M. reevesii (Nm = 0.96). The maximum Nm and the minimum Fst occurred between the Southeast Asian population and East Asian population of M. mutica and this suggested no differentiation between the two populations of M. mutica.

Interspecific phylogenetic relationships of Mauremys sensu lato
In both ML and BI trees, species of old Mauremys (purple in Fig 1) were discovered in all three clades (M. annamensis + M. mutica clade, M. japonica + M. sinensis + M. reevesii clade and M. rivulata + M. caspica + M. leprosa clade). The paraphyletic relationship among four old genera; i.e., Mauremys, Annamemys, Chinemys, and Ocadia, was revealed clearly, so it suggested that the four old genera should be merged into the genus; i.e., Mauremys, which was consistent with previous studies based on different molecular data [4][5][6][7]. Compared with Cuora, the most diverse species in Mauremys sensu lato should remain in the same genus, which was clearly reflected in the genetic distance. The largest diversity recognized in Cuora was discovered between C. mouhotii and C. amboinensis and the value was 0.093. However, the interspecific maximum value in Mauremys sensu lato was 0.083 and this existed between M. reevesii and M. leprosa as well as M. reevesii and M. annamensis. The interspecific maximum genetic distance of Mauremys sensu lato was less than Cuora. However, there is still controversy in recent molecular studies. Barth et al. (2004) and Spinks et al. (2004) suggested a sister relationship between M. reevesii and M. japonica + M. sinensis clade [4,6]. Also, Feldman et al. (2004) revealed M. sinensis was the sister to M. japonica + M. reevesii clade [5]. However, our results strongly supported a sister relationship between M. japonica and M. sinensis + M. reevesii clade.
Based on our phylogenies, an obvious paraphyletic relationship between M. reevesii and M. megalocephala was revealed. We inferred the reason for this paraphyletic relationship was that M. megalocephala is a "diet variant" of M. reevesii. This inference was in congruence with Barth et al.'s results [28].

The hypothesis for the origin and evolution of species in Mauremys sensu lato
Species of Mauremys sensu lato have a very wide distribution all over the Palearctic region. Some species are far apart and there is no gene exchange among them. However, they have an ultra-close phylogenetic relationship. This disjunction pattern had been reported for many species, e.g., softshell turtles, plants, fishes, amphibians, birds, and mammals [29,30].
We sampled extensively and traced the origins using mt genes. Combining divergence time and ancestral area reconstruction, we propose all species of Mauremys originated from a common ancestry in Southeast Asia. The earliest divergence of Mauremys occurred during the period of Late Oligocene Warming (25-16 Ma) (Fig 2, node 4). The warming climate provided a prerequisite for geographic radiations of Mauremys. Then, the divergence of species between the Eastern and Western Palearctic region was detected (Fig 2, node 5 and 6), while the turtles of Mauremys were diffusing from the East Asia to Western Palearctic region (Fig 3).
M. japonica diverged in approximately 15.19 Ma (Fig 2, node 7) while Japan was separating as an island arc by the Miocene back-arc opening of the Japan Sea during 15-25 Ma [31]. Considering that M. japonica is an endemic species in Japan, we suggest the divergence of this turtle may have been caused by the effects of the creation of the island of Japan.
The warm period broke in the late Miocene with the following glacial age [27] and turtles gathered in each refuge again. During long term evolution, genetic drift plays an important role that may lead to speciation [32]. Then permanent isolation between the species of Mauremys occurred by the emergence of deserts, oceans and mountains until now [33].

Interspecific gene flow level assessment for Southeast Asian and East Asian species
We assessed the interspecific gene flow level based on five unlinked polymorphic microsatellite loci. The results indicated the presence of extensive gene flow among East Asian and Southeast Asian species. Interestingly, phylogenetic analyses based on mt genes showed obvious paraphyletic relationship between M. annamensis and M. mutica. The Southeast Asian population of M. mutica was closer to M. annamensis than the East Asian population of M. mutica. However, extremely extensive gene flow has been detected between the two populations of M. mutica and the value of Fst revealed that the two populations were hardly differentiated. The inconsistency in the two different data sets suggested the hybrd origin of Southeast Asian M. mutica. Compared with the study of Qi [34], we inferred sex-biased dispersal of M. annamensis may have occurred. The females of M. annamensis mated with the males of M. mutica. Then hybrids mated with M. mutica. The nuclear proportion of M. annamensis was diluted for multi-generations and mitochondria were preserved. Nevertheless, further evidence is required.
The extensive gene flow indicated the possibility of interspecific hybridization. More hybridization has been reported in Mauremys sensu lato, such as O. glyphistoma being a hybrid of M. sinensis and M. cf. annamensis [35] and M. pritchardi being hybrids of M. reevesii and M. mutica, respectively [36,37]. Therefore, the interspecific gene flow may have been increased by the turtle trade, escape from farms and release activities, which led to the genetic diversity of wild populations being lost.