DNA Barcodes and Species Distribution Models Evaluate Threats of Global Climate Changes to Genetic Diversity: A Case Study from Nanorana parkeri (Anura: Dicroglossidae)

Anthropogenic global climate changes are one of the greatest threats to biodiversity. Distribution modeling can predict the effects of climate changes and potentially their effects on genetic diversity. DNA barcoding quickly identifies patterns of genetic diversity. As a case study, we use DNA barcodes and distribution models to predict threats under climate changes in the frog Nanorana parkeri, which is endemic to the Qinghai-Tibetan Plateau. Barcoding identifies major lineages W and E. Lineage W has a single origin in a refugium and Lineage E derives from three refugia. All refugia locate in river valleys and each greatly contributes to the current level of intraspecific genetic diversity. Species distribution models suggest that global climate changes will greatly influence N. parkeri, especially in the level of genetic diversity, because two former refugia will fail to provide suitable habitat. Our pipeline provides a novel application of DNA barcoding and has important implications for the conservation of biodiversity in southern areas of the Qinghai-Tibetan Plateau.


Introduction
Climatic changes influence organisms and an understanding of how this occurs is important for conservation. More than one line of evidence documents the impact anthropogenic global climate change (GCC) exerts on organisms [1]. Explorations into how past climate changes influenced organisms may serve to predict future impacts of GCC. Genetic diversity is important in conservation because higher levels maintain the evolutionary potential of species. However, the distribution of genetic diversity is often uneven across the range of a species and many factors may contribute to this. Environmental changes during glacial-interglacial cycling in the Quaternary is one of the most important historical drivers of genetic patterns [2,3]. For example, by retaining suitable habitat over several glacial cycles, refugia hold higher levels of genetic diversity compared with recently occupied areas [2,3]. Refugia also drive genetic distinctiveness within species owing to providing long-term geographic isolation. A clear understanding a species' evolutionary history and its drivers is important for planning conservation [4].
Genetic analyses form the cornerstone of conservation planning, especially in defining objective evolutionarily significant units (ESUs) and management units (MUs) [5]. DNA barcoding [6], which uses a short, universal genetic marker (COI in eukaryotes) to identify matrilines and species, may serve to efficiently identify ESUs and MUs.
Species distribution models (SDMs) also provide information useful for conservation planning [7][8][9]. Comparisons between SDMs for current and the Last Glacial Maximum (LGM) may provide a chance to explore the impacts of past climate changes to organisms. As an extension of this application, SDMs can compare the distributions of current and future habitats. This allows assessments of the risk of local extirpation and extinction caused by future habitat degradation. Whereas both DNA barcoding and SDMs provide valuable insights, their synthesis may serve to evaluate threats of GCC to organisms.
The Qinghai-Tibetan Plateau (QTP), which covers more than 2.5 million km 2 and has an average elevation of about 4000 m above sea level [10], is the largest and highest plateau on Earth. Unlike North America and Europe, no unified ice sheet formed on the QTP during the LGM, yet its environment changed substantially. The average temperature was from 6uC to 9uC lower than today and precipitation decreased by from 30% to 70% [11].
In addition to climatic drivers, geographic features also play a role in the formation of patterns of genetic diversity. In the southern QTP, two mountains stretch from east to west and the Yarlung Zangpo River (YZR; Brahmaputra River) (Fig. S2) occurs between them. The Nianqingtanggula Mountains (NM; Nyainqentanglha Mountains) in the east and Gangdisi Mountains (GM; Kailas Range) in the west form the northern mountains and the Himalayas form the southern boundary. Several rivers flow southward across the latter. The complex geography may have offered refugia during dramatic Quaternary climatic changes. Analyses of matrilineal dispersal are likely to reveal such history. Driven by the GCC, future severe temperature changes of the QTP may exceed those of lower elevations [12]. Thus, the GCC may impose enormous impacts on organisms living in the QTP. We assess this possibility by synthesizing DNA barcoding and SDMs analyses.
Because of their relatively low mobility and physiological requirements, amphibians are sensitive to environmental changes, especially temperature and precipitation. Further, amphibians retain high-resolution signals of historical responses to environmental perturbations [13]. The environment on the QTP is harsh for amphibians; suitable environment is rare except in the eastern and southern edges. A previous study on Rana kukunoris, a common frog on eastern QTP, revealed a unique genetic pattern compared to other taxa in the region [14]. Comparative amphibian studies from the southern QTP will yield insights into how environmental changes affect the biota fauna of the QTP.
Nanorana parkeri, a median-sized frog belonging to the family Dicroglossidae, is endemic to southern and southeastern Tibet. It occurs at elevations ranging from 2800 to 5000 m in valleys of the YZR and north of the NM [15]. Climate changes during LGM greatly affected this area. Our study evaluates samples from across the entire range of N. parkeri. Using the species as model system, we test the effectiveness of a pipeline that combines DNA barcoding and SDMs to identify future threats to genetic diversity in the face of GCC.

Ethics Statement
All the species included in our study (Table S1)  For egg masses, we separated 5-10 eggs from egg mass and stored them in 95% alcohol after removing egg jelly. For adult, within each locality only five individuals were euthanized using clove oil firstly. Following euthanization, tissues dissected from adult specimens were preserved in 95% ethanol. More other adults were just cut two toe tips and then released. Tadpoles were also euthanized using clove oil firstly, then stored in 95% alcohol after removing gut. Table S1 lists our samplings information, including species name, locality, GPS coordinates, and accession nos. in Genbank etc.

Population sampling and laboratory techniques
Our samplings covered the entire documented distribution range of N. parkeri (Fig. 1, Table S1). One individual of N. pleskei, N. ventripunctata and N. liebigii was used as an outgroup representative [16,17].
We extracted genomic DNA using standard phenol-chloroform extraction protocol [18]. Partial sequences of cytochrome c oxidase subunit I (COI) were sequenced for all individuals using universal primers [19]. PCR products were purified and used as the template DNA for cycle sequencing reactions performed using BigDye Terminator Cycle Sequencing Kit (v.2.0, Applied Biosystems, Foster City, USA) and an ABI PRISM 3730 DNA Analyzer.

Sequence alignment and phylogenetic analyses
Nucleotide sequences were checked by eye using LASERGENE 7.0 and aligned using CLUSTALX 1.81 [20] with default parameters. Subsequently, the aligned sequences were checked and optimized in MEGA 4.0 [21]. Identical haplotypes for mtDNA were collapsed using DNASP 5.10 [22]. The overall value of nucleotide diversity (p) and haplotype diversity (H) were also estimated using DNASP.
Phylogenetic analyses of the COI data were conducted using Bayesian inference (BI), maximum likelihood (ML) and maximum parsimony (MP). BI analyses were performed using MrBayes 3.1.2 [23]. We tested three different partition strategies based on codon positions (no partition; 1+2, 3; and 1, 2, 3). The best strategy was chosen based on the Bayes factor test [24] (Table 1), as it represented a robust method testing partitioning strategies [25]. Nucleotide substitution models were selected for each data partition using the Akaike information criterion in MrModeltest v2.3 [26]. BI analyses for each partition strategy were run 3 million generations while sampling trees every 1000 generations. The first 50% of the sampled trees were discarded as burn-in. The final analyses employing the best partition strategy were run for 10 million generations. AWTY [27] was used to confirm satisfactory convergence of the topological split frequencies. MP analyses were implemented using PAUP* 4.0b10 [28]. Heuristic searches with TBR were executed for 1000 random pseudoreplicates with all characters treated as unordered and equally weighted. Bootstrap analyses were conducted using 1000 replicates to assess nodal reliabilities. ML analyses were conducted using RaxML 7.0.4 [29] and the GTR+I+G model was implemented for each data partition.
We built a median-joining network (MJN) using NETWORK 4.5 [30] to visualize the frequencies of the haplotypes. To remove excessive links and median vectors, we used the MP option [31].

Divergence time estimation
Different demographic and molecular clock models were compared using path sampling and stepping-stone sampling ( Table 2) [32]. Employing the best models, time to most recent common ancestor was estimated using BEAST 1.7.5 [33]. Due to the absence of a reliable fossil record and an established substitution rate of COI for N. parkeri, we used the divergence time between N. parkeri and N. pleskei (8.962.7 Ma) [16] as a secondary calibration point. Model selection and partition scheme were the same as used in the BI analyses. The final analyses involved two independent runs of 30 million generations each, while sampling trees every 1000 th generation. Burn-in and convergence of the chains were determined with Tracer 1.5 [34]. The measures of effective sample sizes were used to determine the Bayesian statistical significance of each parameter.

Population structure and demographic analyses
We explored population structure and genetic diversity landscape using SPADS 1.0 [35]. Groups of populations were defined as for SAMOVA [36]. We explored values of K (number of groups) ranging from 2 to 10 with 100 simulated annealing processes. The optimum value of K was identified by exploring the behavior of the indices F CT and F SC . Spatial patterns of genetic diversity based on allelic richness (Ar) and p across the landscape were explored using the GDivPAL function in SPADS.
To test for the influences of mountains and rivers on the genetic structure of N. parkeri, we measured the population structure by three independent analyses of molecular variance (AMOVA) [37]: populations north and south of the YZR; populations north of NM, west of the boundary of GM and NM, and the remaining ones; and four groups according the results of the SAMOVA. Analyses were performed using Arlequin 3.5 [38] and significance was assessed by 10000 permutations.
We investigated past changes of effective population size using the neutral test and mismatch distributions. Tajima's D [39] and Fu's Fs statistics [40] with 10000 coalescent simulations were calculated using Arlequin. Pairwise mismatch distributions [41] were calculated for each group in Arlequin. The expected distribution under a model of sudden demographical expansion was generated using 10000 permutations. The significance of deviations from this model was tested using the sum of squared deviation (SSD) and raggedness index (Rag). All analyses were performed for each lineage separately because population subdivisions could have masked the effects of expansions.

Species distribution modeling
We inferred the potential geographic range of N. parkeri using the maximum entropy model implemented in MAXENT 3.3.3 [42,43]. Environmental variables from the WorldClim database with resolutions of 2.5 arc-minutes [44] were downloaded as environmental layers. Because of the controversy about whether correlated variables should be removed or not, we used all of the 19 bioclimatic layers [45]. All layers were cropped to span from 83uE to 99uE and from 26uN to 33uN.
Random null distributions were built to test for the significance of our SDM. For this test, we built a new SDM using 39 random points (as same number as our sampling localities). This process was repeated 100 times and the areas under the curves (AUCs) were used as null distributions [46].
Assuming niche conservatism over time [47][48][49], we predicted the former distribution of N. parkeri by projecting our model on  LGM climatic layers. Predicted distributions during the LGM were generated by downloading both the community climate system model (CCSM) [50] and model for interdisciplinary research on climate (MIROC) [51] from the WorldClim database.
To predict the influences of GCC to this species, we also projected the model to climate data of the 2080 s based on MIROC model under the A1b scenario. In this way, we predicted the distribution changes in future.

Sequence information
We obtained 549 partial COI sequences from 39 localities of N. parkeri, plus one sequence each from N. pleskei, N. ventripunctata and N. liebigii. The fragment consisted of 539 base pairs, of which 35 positions exhibited variation and 27 were potentially parsimony informative, resulting in 23 haplotypes in N. parkeri. All sequences were deposited in GenBank (Table S1). The overall value of p was 0.0174660.00055 and H was 0.67860.016.

Genealogical analyses of COI
Bayes factor test showed a preference for the partition strategy of 1+2, 3 ( Table 1). The best fit substitution model for the first and second codon was HKY and GTR+G model was the best fit for third codon.
Matrilineal genealogies obtained from BI, ML and MP analyses were nearly identical ( Fig. 1 & Fig. S1) and all methods recovered lineages East (E) and West (W). The boundary of GM and NM separated these two lineages. Lineage W was comprised of 6 haplotypes, which had no clear structure. Lineage E was comprised 17 haplotypes of which 12 formed highly supported sublineage E1. The MJN depicted patterns similar to those of the gene tree (Fig. 2). Haplotypes H1 and H11, which were more common than the other haplotypes, occupied central positions.

Divergence time estimates
Results of model comparisons were shown in Table 2. Both path sampling and stepping-stone sampling suggested that the BSP demographic model under strict clock outperformed the alternative models. The average divergence time estimations were shown in Fig. 3

Genetic structure and demographic history
Results of population clustering were illustrated in Figure 4. F CT plateaued when K was 4 (Table S2). Samples from Lineage W grouped together (Fig. 4, dark blue). Lineage E contained 3 groups, of which populations in the southern edges of the QTP formed two small groups (localities 2, 16 and 26; and locality 15 and 19); the remaining localities comprised the third group (Fig. 4, light blue). Spatial patterns of genetic diversity based on Ar and p were shown in Figure 5 and listed in Table S3. Several peaks indicated high levels of genetic diversity across the distribution of N. parkeri. This was congruent with the pattern of lineage divergence.
When forming two groups based on the YZR, among-group diversity accounted for 16.04% of the overall variation and amongpopulations within groups accounted for 80.76%. Dividing populations according to the mountains, among-group diversity accounted for 87.90% of the overall variation and among-populations within  groups accounted for 9.67%. Finally, after dividing populations based on results of the SAMOVA, among-group diversity accounted for 96.88% of the overall variation and among-populations within groups accounted for 0.97% (Table 3). For Lineage W, the values of Tajima's D and Fu's Fs were significantly negative (Table 4), which indicated population expansions. The hypothesis of sudden expansion was not rejected by mismatch distribution analyses (Fig. 6) as the SSD and Rag were insignificant (P SSD = 0.381 and P RAG = 0.658). In sublineage E1, significantly negative values of both Tajima's D and Fu's Fs supported population expansions. The mismatch distribution analyses did not reject the sudden expansion model (P SSD = 0.545 and P RAG = 0.609). In sublineages E2, Tajima's D and Fu's Fs were negative, but insignificant. In sublineage E3, Tajima's D was negative but Fu's Fs was positive, yet all values were not significant. The null hypothesis of sudden expansion model was not rejected by mismatch distribution analyses.

Species distribution modeling
The AUCs from random data ranged from 0.605 to 0.800 (0.73360.033; Table S4). For the SDMs of N. parkeri, the AUC of our data was 0.904 and this was significant better than that of a random model (P,0.001).
The MIROC model predicted that the distribution of N. parkeri at the LGM was much smaller than that of today (Fig. 7). The results based on CCSM model suggested a similar pattern, but with larger areas than those from MIROC model. Both models suggested range fluctuations after the LGM in northern, high altitude areas. The boundary area between GM and NM, where lineages W and E divided, has been unsuitable for N. parkeri since the LGM. The predicted distribution under GCC assuming the A1b scenario was shown in Figure 7. Nanorana parkeri is predicted to have a larger distribution in the northwestern QTP. However, the species' distribution will contract in the southeastern QTP.

Lineage divergence and ESUs
ESUs, which can be determined by genetic differentiation at neutral markers caused by isolation, are important when considering conservation actions. Our study builds on the framework for delineating ESUs of N. parkeri by reconstructing the patterns of lineage divergence and its drivers. Our analyses suggest the recognition of ESUs that correspond to lineages W, E1, E2 and E3. Our research demonstrates that COI can quickly delineate ESUs.
The gene trees and MJN depict a clear east-west split for N. parkeri. SDMs suggest that unsuitable habitat between the GM and NM, both now and during the LGM, drives this pattern. The boundary is the demarcation line of two QTP climatic zones that are largely congruent with the 400 mm precipitation line. East of this line, the region is semi-humid, and to the west semi-arid [52]. Similar pattern was also found in Hippophae tibetana [53,54].
Geographic features, such as rivers or mountains, appear to contribute little to the patterns of genetic divergence. The YZR, which would impose a north-south split, does not drive population divergence. The AMOVA explains only 16.04% of the genetic variation in this scenario. Elevations of NM range from 5000 m to 6000 m, but no substantial genetic divergence distinguishes frogs in south and north of NM. Lineages E and W are separated by the boundary of GM and NM. However, habitat barriers and environmental differences more likely generate this pattern than geographical features. Two observations support this hypothesis. First, populations of lineages W and E both occur south of the GM and NM. Second, N. parkeri occurs along the YZR valley, which flows across the boundary and connects lineages W and E.
The pattern of genetic divergence within each major lineage differs. The 6 haplotypes in Lineage W do not yield a clear pattern (Fig. 1). The network depicts a star-like haplogroup (Fig. 2). SAMOVA also suggests populations from W comprise a single group. In contrast, Lineage E differs in having three groups (Figs. 1, 2). Results from SAMOVA present the same pattern. Most northern populations form a group as do frogs from localities 15, 19 (E2) and localities 2, 16, 26 (E3). The difference in patterns between E and W may owe to topography. The intricate topography of the East Himalayas, which is a global biodiversity hotspot, provides suitable habitats and generates barriers that limit dispersal for Lineage E. Other species exhibit a similar pattern; populations in the fringes or southern slope of Himalayas are genetically different from those on the QTP [54,55]. Expansion and contraction of distribution ranges caused by climate changes provided opportunity for populations divergence in QTP [54][55][56][57].
The genetic pattern within N. parkeri suggests recognition of four ESUs, W, E1, E2 and E3. Certainly gene flow occurs within and between lineages E and W, as lineages or sublineages mix at localities 8, 12, 16 and 26. However, matrilineal uniqueness occurs in populations far from the boundary.

Glacial refugia and genetic diversity of N. parkeri
Refugia in the QTP prevented extinction during the LGM. In doing so, they conserved high levels of genetic diversity. In contrast, genetic diversity in recently occupied areas is much lower because of founder effects [2,3]. The identification of refugia is an important part of conservation because these areas preserve genetic diversity. Synthesizing DNA barcoding and SDMs, we quickly and effectively identify refugia for N. parkeri. Populations in refugia retain higher levels of diversity than population in newly occupied places (Fig. 5 and Table S3).
The multiple refugia hypothesis corresponds with patterns of mtDNA divergence. The matrilineal genealogy, MJN and SAMOVA analyses indicate the absence of genetic structure within Lineage W. Thus, Lineage W likely originates from a single refugium. The SDMs indicate the presence of suitable habitat in the river valley near locality 27 (Fig. 7) and the area retains a much higher level of genetic diversity (Fig. 5) than other sites. The neutral test and mismatch distribution analyses clearly detect a population expansion in Lineage W.
Our analyses identify several refugia in Lineage E. Sublineage E1 may originate from a northern refugium near localities 7 and 8, which occur in the valley at the confluence of the Lhasa River and the YZR (Fig. 7). During the LGM the suitable environment harbored a high level of genetic diversity. The absence of genealogical structure in sublineage E1 is congruent with a sudden population expansion, as are significantly negative values of Tajima's D and Fu's Fs, and the mismatch distribution analyses. Further, our analyses identify two microrefugia in the river valleys along southern edges of the QTP. Localities 2, 16 and 26, which  Populations from sampling localities north of the NM also retain a high level of genetic diversity (Fig. 5). Private haplotypes H4 (locality 38), H2 and H3 (locality 33), and H5 (locality 37) occur here (Table S1). The SDMs suggest that suitable habitat existed in northern areas of NM. This area appears to harbor another microrefugium. However, these private haplotypes fail to cluster together and the most common haplotype in the area is H1 (Fig. 2 & Table S1). Thus, we cannot reject the hypothesis that these private haplotypes originated during population expansions after the LGM.

Potential threats of GCC to N. parkeri
Our study suggests that the combination of DNA barcoding and SDMs can detect threats of GCC to the survival of species. This pipeline facilitates conservation. SDMs predict that suitable habitat for N. parkeri may experience great shifts in the near future. Whereas the Northwest QTP will offer suitable habitat for N. parkeri, the Southeast QTP will become unsuitable. Although suitable habitat will experience an overall expansion, populations in the Southeast QTP may experience sharp decreases in population size or become extirpated. Given that amphibians have poor dispersal abilities, the latter scenario may be an unfortunate consequence of GCC.
Our pipeline suggests that many populations of Lineage E may suffer from developing unsuitable habitats. The microrefugium near Nyingchi and Medog will become unsuitable under this prediction (Fig. 7). Fortunately, suitable habitat may persist in the two other refugia. Thus, genetic variation will decrease yet all may not be lost. For Lineage W, suitable habitat will disappear near the refugium at locality 27. Loss of this refugium will greatly decrease the amount of genetic diversity.
Our analyses suggest that genetic diversity of N. parkeri may greatly decrease. Although this facilitates effective conservation planning, we urge caution. Our SDMs for the future assume A1b scenarios, which involve rapid global economic growth with a balance of fossil and non-fossil energy sources. Each of possible scenarios yields different predictions about the future climate. All scenarios require constant adjustment according to global economic conditions. The complex landscape in the southern QTP may supply suitable microhabitats for N. parkeri which cannot be detected by our SDMs. Accordingly, we suggest the urgent development of an effective monitoring program, especially for populations in refugia that may lose suitable habitats.

Conclusion
DNA barcoding detects the genetic structure of N. parkeri and serves to define ESUs. Our analyses recover major lineages E and  Two microrefugia occur in river valleys along the Himalayas. Our analyses detect a population mixture after the LGM in some localities but these occurrences do not influence our designations of ESUs. Our study highlights the importance of valleys along the Himalayas for biodiversity conservation. Based on climate models under GCC, we predict the potential distribution changes and threats to genetic diversity of N. parkeri. Our pipeline, which combines DNA barcoding and SDMs, is an effective approach in conservation. Figure S1 Matrilineal genealogy of Nanorana parkeri based on BI analyses of COI sequence data. Bootstrap proportions $70% and Bayesian posterior probabilities $95% were treated as strongly supported (.) and bootstrap proportions $70% and Bayesian posterior probabilities $90% were treated as being moderately supported. Bootstrap proportions,70% and Bayesian posterior probabilities ,90% were treated as being unsupported (*). (TIF)