Genetic diversity and population structure of Prunus mira (Koehne) from the Tibet plateau in China and recommended conservation strategies

Prunus mira Koehne, an important economic fruit crop with high breeding and medicinal values, and an ancestral species of many cultivated peach species, has recently been declared an endangered species. However, basic information about genetic diversity, population structure, and morphological variation is still limited for this species. In this study, we sampled 420 P. mira individuals from 21 wild populations in the Tibet plateau to conduct a comprehensive analysis of genetic and morphological characteristics. The results of molecular analyses based on simple sequence repeat (SSR) markers indicated moderate genetic diversity and inbreeding (A = 3.8, Ae = 2.5, He = 0.52, Ho = 0.44, I = 0.95, FIS = 0.17) within P. mira populations. STRUCTURE, GENELAND, and phylogenetic analyses assigned the 21 populations to three genetic clusters that were moderately correlated with geographic altitudes, and this may have resulted from significantly different climatic and environmental factors at different altitudinal ranges. Significant isolation-by-distance was detected across the entire distribution of P. mira populations, but geographic altitude might have more significant effects on genetic structure than geographic distance in partial small-scale areas. Furthermore, clear genetic structure, high genetic differentiation, and restricted gene flow were detected between pairwise populations from different geographic groups, indicating that geographic barriers and genetic drift have significant effects on P. mira populations. Analyses of molecular variance based on the SSR markers indicated high variation (83.7% and 81.7%), whereas morphological analyses revealed low variation (1.30%–36.17%) within the populations. Large and heavy fruits were better adapted than light fruits and nutlets to poor climate and environmental conditions at high altitudes. Based on the results of molecular and morphological analyses, we classified the area into three conservation units and proposed several conservation strategies for wild P. mira populations in the Tibet plateau.

Introduction comprehensively reveal the genetic diversity and population structure of P. mira; (2) estimate morphological variation among and within populations; and (3) propose effective conservation strategies for the studied populations.

Materials
The leaf samples, fruits, and nutlets of 420 individuals were collected from 21 wild P. mira populations along the Yarlung Zangbo Grand Canyon and its tributary basin in the Tibet plateau (S1 Fig, Table 1). The longitude, latitude, and altitude of each individual were detected using a global position system. For molecular analyses, 20 individuals were sampled from each population, with a minimum distance of 50 m between any two individuals. Healthy and young leaves were collected and immediately preserved with silica gel for DNA extraction. For morphological analyses, 20 fruits and 20 nutlets were randomly selected from each individual. Finally, 8400 fruits and 8400 nutlets were collected to analyze the morphological variation within and among P. mira populations.
In this study, the 21 sampled populations covered the entire distribution of P. mira in the Tibet plateau. Among these populations, P1, P2, P3, P4, and P5 were located in Yarlung Zangbo Grand Canyon National Nature Reserve, and P14 was distributed in the Qomolangma National Nature Reserve. Our study was permitted and approved by these authorities. Furthermore, P8, P11, P16, P17, P18, P19, P20, and P21 were distributed in the wild, and P6, P10, P13, and P15 were distributed around roadsides, which were taken for the wild species. No specific permits were required for collecting these samples from the wild, and no specific permissions were required for these locations or activities. Furthermore, P7, P9, and P12 were distributed around rural houses, and were collected with the permission of private land owners. No Population structure strategies for Prunus mira from Tibet, China endangered or protected species were sampled. All populations could be divided into two groups according to their geographic distribution. The Linzhi group (G1) contained 16 populations (P1-P16). Among these populations, P1, P2, P3, P4, P5, and P14 were distributed in  low-altitude areas (2921-2979 m), and P6, P7, P8, P9, P10, P11, P12, P13, P15, and P16 were  located in medium-altitude areas (3111-3394 m; Table 1). The Shannan group (G2) included the five remaining populations (P17-P21), which were distributed in the high-altitude areas (3661-3844 m).
DNA concentration, SSR analysis, and polymerase chain reaction amplification The total genomic DNA of all dried leaves was extracted using the CTAB method [24], and DNA concentration and quality were tested using a 1% agarose gel. In all, 25 microsatellite markers, developed from three species related to P. mira, were selected to determine the genetic diversity and population structure of P. mira, including the 22 loci from P. persica [25][26][27][28][29][30], one locus from P. cerasus [31], and two loci from P. avium [32] (S1 Table). The forward primers of all SSR markers were assembled using M13 (5 0 -TGTAAAACGACGGCCAGT-3 0 ), which contained one of four fluorescent dyes-FAM, NED, VIC, and PET. Genetic analysis GenAlEx version 6.501 [33] was used to convert size data into various formats for genetic analysis. Based on the Bayesian method, we used STRUCTURE version 2.3.4 [34] to determine the population structure of the 21 P. mira populations. Twenty independent runs were performed for each set, with a K value between 1 and 21, length of burn-in period of 1 × 10 5 iterations, and 1 × 10 5 MCMC iterations. No prior information about populations was used. The method described by Evanno et al. [35] was used to calculate the distribution of delta K (ΔK) using the online website STRUCTURE HARVESTER [36] (http://taylor0.biology.ucla.edu/struct_ harvest/). Permutations of the most likely results among all runs for each K were performed in CLUMPP [37], and the final figure was visualized using Distruct software [38]. GENELAND [39,40] is a powerful way to detect genetic boundaries in R (2011). This method was used to estimate the number of clusters and their spatial patterns. The analysis was based on an uncorrelated frequency model, and was conducted over 10 replicates for each K value (1-10). The null allele models were selected to infer the number of clusters using 1,000,000 MCMC iterations, of which every 1000 th was retained. The replicates with the highest mean logarithm of posterior probability were used to compute the posterior probabilities of population membership for each pixel of the spatial domain with a burn-in of 500.
P. mira genetic variation was estimated by conducting an analysis of molecular variance (AMOVA) in Arlequin version 3.5 [41]. This analysis subdivided all individuals into 21 populations and K genetic clusters. Three hierarchical divisions were identified based on the genetic variance: within populations, among populations within genetic clusters, and among genetic clusters using a nonparametric permutation procedure that incorporated 10,000 iterations. An unrooted unweighted pair-group method with arithmetic means (UPGMA) tree was constructed using Nei's genetic distance [42] in NTSYS PC version 2.10 [43]. Mantel tests were conducted between the genetic distance (F ST /(1-F ST )) and geographic distance (km) in GenAlEx version 6.501. GenAlEx version 6.501 was used to calculate the following genetic indices for loci, populations, and genetic clusters identified in the STRUCTURE analysis: allele number (A), effective allele number (Ae), private allele number (Np), Shannon's information index (I), expected heterozygosity (He), observed heterozygosity (Ho), gene flow (Nm), and F-statistics indices (F IS , F IT , and F ST ). The Hardy-Weinberg equilibrium (HWE) was tested for each locus and population using Arlequin version 3.1 with 100,000,000 steps in the Markov chain and 100,000 dememorization steps. Polymorphism information content (PIC) was calculated using the following formula [44]: where P i and P j are the frequency of the amplified alleles, and n is the number of alleles at each SSR locus.

Morphologic analysis
In all, 20 fruits and 20 nutlets were randomly selected per individual to measure and calculate the following 11 morphological characteristics: fruit weight, fruit length, fruit width, fruit diameter, fruit shape index (fruit length/fruit width), nutlet weight, nutlet length, nutlet width, nutlet diameter, nutlet shape index (nutlet length/nutlet width), and nutlet surface. The weights of fruits and nutlets were tested using an electronic balance (accuracy of 0.001 g). The lengths, widths, and diameters of fruits and nutlets were measured using a digital Vernier caliper (accuracy of 0.001 mm). The mean values of all morphological characteristics were used to construct an unrooted UPGMA tree in NTSYS PC version 2.10 based on Nei's genetic distance (1978). IBM SPSS Statistics 19 software was used to conduct Duncan's multiple comparison analysis, calculate coefficients of variation (CV, %), and draw the boxplots of the 11 morphological characteristics. For clustering and statistical analyses, the three types of nutlet surfaces were classified as follows: 1 (smooth), 2 (shallow groove), and 3 (deep groove).

Characteristics of SSR markers
Twenty-five SSR markers were selected to identify genotype 420 P. mira individuals. A total of 214 alleles were detected across all loci, ranging from four (UDP96-019) to 15 (BPPCT-025 and PMS67), with an average of 8.6 alleles per locus (S3 Table). The effective number of alleles (Ae) was between 1.5 (UDP96-019) and 7.5 (BPPCT-025), with an average of 3.6. The mean expected heterozygosity (He) was 0.52, and it exceeded the observed heterozygosity (Ho = 0.46).

Genetic structure and differentiation
The genetic structure of 21 P. mira populations was tested using Bayesian clustering methods in STRUCTURE version 2.3.4. The classification of populations was highly correlated with their geographic altitudes. When K = 2, 16 low-and medium-altitude populations (P1-P16) from Linzhi group and five high-altitude populations (P17-P21) from Shannan group were assigned to two different clusters with high ancestry values (Q ! 0.80; Fig 1, S4 Table). When K = 3 and Q ! 0.60, 21 populations could be clearly assigned to three genetic clusters (Fig 1,  S4 and S5 Tables). Cluster I consisted of six low-altitude populations (P1, P2, P3, P4, P5, and P14) from the Linzhi group. Of the 120 individuals from these populations, 88 individuals (73.33%) were assigned to Cluster I; 25 individuals (20.84%), to Cluster II; and seven individuals (5.83%) were mixed (Q < 0.60). Cluster II included 10 medium-altitude populations in the Linzhi group (P6, P7, P8, P9, P10, P11, P12, P13, P15, and P16). Among these populations, only 11 individuals (5.50%) were assigned to Cluster I, and the Q values of six individuals (3.00%) were below 0.60. Cluster III included five high-altitude populations (P17, P18, P19, P20, and P21) from the Shannan group. Among these populations, only 13 individuals (13.00%) were assigned to two other clusters, and 15 (15.00%) were mixed. The maximum Each individual is shown as a vertical line divided into segments representing the estimated membership proportion; different genetic clusters were inferred using STRUCTURE. At K = 2, high-altitude populations (P17-21) were separated from the medium-and low altitude populations. Furthermore, medium-altitude (P6-13, P15, P16) and low-altitude (P1-5, P14) populations were assigned to two distinct clusters at K = 3. value of ΔK was observed at K = 3 (S2 Fig), indicating that the 21 populations were potentially assigned to three clusters. We further investigated the spatial patterns among the 21 P. mira populations based on an uncorrelated frequency model in GENELAND. This analysis inferred three distinct clusters that were the same as those identified in the STRUCTURE analysis (Fig 2). Spatial patterning indicated substantial genetic boundaries among low-, medium-and high-altitude populations.
The unrooted UPGMA tree also denoted three clusters that were consistent with those of STRUCTURE and GENELAND analyses (Fig 3). Therefore, we divided the 21 P. mira populations into three genetic clusters for subsequent analyses.
The Mantel test revealed a moderately positive correlation between genetic and geographic distance (r = 0.50, P = 0.01) across the entire distribution of P. mira populations (Fig 4A). This result indicated significant isolation-by-distance (IBD) among all populations. We also tested the pattern of IBD within each genetic cluster. Three separate Mantel tests indicated that a highly positive correlation between geographic and genetic distance still existed within Cluster III (r = 0.81, P = 0.02; Fig 4B), but it was absent within Cluster I (r = -0.64, P = 0.12) and Cluster II (r = 0.13, P = 0.19; Fig 4C and 4D).
To investigate genetic differentiation in P. mira, we conducted an AMOVA and calculated pairwise differentiation values (F ST ) and pairwise gene flow values (Nm) among all populations and the three genetic clusters. Genetic differentiation between the 21 populations was significant and high as evidenced by a global F ST value of 0.16 (P = 0.01; Table 2). The pairwise F ST and pairwise Nm values ranged from 0.01 to 0.36 and from 0.44 to 20.33, respectively (S6 Table). Low genetic differentiation and frequent gene flow were detected between pairwise populations from the same genetic cluster. Populations in Cluster III were significantly and greatly differentiated from the populations in the other two clusters. Among the three genetic clusters, genetic differentiation was also significant and high (F ST = 0.18, P = 0.01; Table 2). The differentiation between Clusters I and II was significant, but very slight (F ST = 0.03, P = 0.01), whereas the genetic differentiation between Cluster III and the other clusters was significant and slightly high (F ST = 0.14, P = 0.01 and F ST = 0.13, P = 0.01; S7 Table). Furthermore, an AMOVA based on the 21 populations revealed 83.7% of variation within populations, and that based on the three genetic clusters detected 81.7% of variation within populations, 13.6% among the three clusters, and 4.7% among the populations within each cluster (Table 2). These results supported high genetic variation within the P. mira populations.

Genetic diversity
The mean A, Ae, He, Ho, and I values of the 21 populations were 3.8, 2.5, 0.52, 0.44, and 0.95, respectively, indicating moderate genetic diversity in P. mira populations ( Table 3). The highest genetic diversity was detected in P17, and the lowest diversity was observed in P18. In addition to P4, P17, and P20, the remaining 18 populations had private alleles (Np), which ranged from one to four alleles per population. Interestingly, P18 had the highest number of private alleles (Np = 4), although it contained the lowest genetic diversity. Genetic diversity was also estimated within each genetic cluster (Table 3). Cluster II contained the highest genetic diversity, followed by Cluster I, and Cluster III had the lowest genetic diversity.

Morphological clustering and variation
We measured and calculated 11 morphological characteristics of 8400 fruits and 8400 nutlets from the 21 populations, and we used these morphological data to analyze morphological variation among and within P. mira populations. An unrooted UPGMA tree was constructed  When the 21 populations were divided into three clusters based on the morphological data, with the exception of the fruit shape index, the remaining 10 morphological characteristics exhibited a wide range of variation and significant differences among the different clusters (S8 Table). The nutlet surface of most individuals was smooth in Cluster I, shallow groove in Cluster II, and deep groove in Cluster III (S4 Fig, S9 Table). Most of the morphological  Population structure strategies for Prunus mira from Tibet, China characteristics increased as the geographic altitudes of populations increased. For instance, low-altitude populations in Cluster I had the lightest and smallest fruits and nutlets, and highaltitude populations from Cluster III contained the heaviest and largest fruits and nutlets. Furthermore, all characteristics exhibited no or low differences among the populations within each cluster (S8 Table, Fig 5). Among the six populations in Cluster I, five characteristics (fruit weight, fruit diameter, nutlet width, nutlet diameter, and nutlet surface) exhibited no significant differences, and the remaining showed low morphological variations. Among the 10 populations from Cluster II, no significant variation was detected for four characteristics (fruit width, fruit diameter, nutlet weight, and nutlet surface), but a low degree of differences was observed for seven characteristics. In Cluster III, only four characteristics (fruit width, fruit shape index, nutlet shape index, and nutlet diameter) indicated low differences among the five populations. The coefficients of variation (VC) within populations ranged from 1.30% to 36.17% (S8 Table). The F values among populations were higher than those within populations for all morphological characteristics. These results indicated that most of the variation occurred among populations, which was not consistent with our molecular analysis results.

Discussion
Knowledge of genetic diversity and morphological variation can provide basic information that can aid our understanding of plant evolutionary history and the protection of extant wild resources. In this study, we sampled 420 P. mira individuals from 21 wild populations to study their genetic diversity, population structure, and morphological variation by conducting molecular analyses and morphological evaluations.

Population structure and differentiation
The STRUCTURE and GENELAND analyses based on SSR markers revealed three genetic clusters, which were highly correlated with geographic altitudes. The most likely reason for this result is that altitude differences led to significant climatic and environmental differences in water, soil, light, and temperature among the populations distributed at different altitude ranges. Climatic and environmental factors can significantly influence plant geographic distributions, evolutionary patterns, and phenological phases [45]. According to the results of our field investigation, P. mira populations at high altitudes usually flower five to 14 days later than populations at low altitudes. This phenomenon can significantly reduce the gene flow through pollen, and it can shape clear structures among populations that were distributed at different altitudinal ranges. Mantel et al. [46] suggested that large geographic distances would limit pollen and seed dispersal among populations. In this study, moderately positive IBD was detected among all P. mira populations. However, although we conducted separate Mantel tests within clusters, different results were revealed. All of these results indicated that geographic distance is not a crucial factor for the current genetic composition in partial small-scale areas. Regarding Population structure strategies for Prunus mira from Tibet, China populations from Clusters I and II, we inferred that altitude gradients might have more significant effects on population structure than geographic distance. For instance, P14 was not assigned to Cluster II based on pairwise geographic distances, but was instead grouped with Cluster I because of its geographic altitude.
Our findings indicated that Cluster III was significantly and highly differentiated from the other two clusters. Five populations from Cluster III and 16 populations from Cluster I and Cluster II are distributed in Shannan and Linzhi, respectively. In the central region of the Tibet plateau, numerous mountains, rivers, and valleys shape geographic barriers between the two geographic groups, thereby leading to the observed high genetic differentiation. However, low gene flow values (Nm < 1) were observed between pairwise populations from different geographic regions, thus indicating that the gene flow cannot combat genetic differentiation caused by genetic drift [16,18]. According to Slaktin et al. [47], the founder effect is an important form of genetic drift in isolated populations, so the genetic drift caused by "founder effects" might have affected the genetic structures of P. mira populations.  [49,50]. Theoretically, the genetic diversity of endangered species should be low, but this was not the case for P. mira. Moderate or even high levels of genetic diversity have been reported in some endangered species [51,52]. Regarding P. mira, high adaption to various environments and wide distributions might have helped shape and preserve the genetic diversity of populations under poor living conditions [53,54]. For example, more than 60% of Tibetan areas are mountains, and the impact of topography and elevation led to significant climatic and environmental differences in water, soil, light, and temperature, thus resulting in a number of distinct and natural geographical areas. Overall, the natural ecological conditions of the Tibet region are poor because the plateau climate is characterized by thin air, low air pressure, reduced oxygen, thin soil, low rainfall, stronger solar radiation, perennial ice and snow, and frost weathering [55]. The effective population size of P. mira is relatively large, but more recent periods of excessive logging and fragmentation due to habitat destruction may have led to the initial loss of genetic diversity, despite the short period of fragmentation. Therefore, this factor might be also responsible for the moderate genetic diversity in P. mira.

Genetic diversity of populations
Positive inbreeding coefficients were detected within each population and at each locus, indicating the significant excess of homozygotes in P. mira. Deforestation and geographic isolation have led to the heavy fragmentation of P. mira habitats [6,48,55]. Habitat fragmentation can limit genetic exchange among populations, and it can increase inbreeding within a population [56][57][58]. Thus, moderate inbreeding in P. mira could be attributed to habitat fragmentation. Previous studies [55] revealed that human activities such as over-harvesting and deforestation had significant effects on wild P. mira resources near rural areas in the Tibetan plateau. In this study, we found 7 populations near rural areas that deviated from HWE, thus indicating that recent human activities might have significant effects on these P. mira populations.

Genetic and morphological variation
Perennial woody plant species are expected to preserve high variation within populations because of their long life cycles [59][60][61]. An AMOVA based on SSR markers revealed that most of the total genetic variation (83.7% and 81.7%) occurred within P. mira populations, which is higher than that reported previously based on SSR (81.0%), ISSR (71.0%), and AFLP (74.48%) data [7,48]. This discrepancy can be attributed to the fact that the number of individuals analyzed in this study was considerably greater than that in previous studies. For P. mira, an outcrossing system and long life cycle (over 1000 years) can improve gene flow and preserve shared ancestral genetic information among populations, thus significantly improving genetic variation within populations.
Both molecular and morphological data revealed three clusters that were correlated with geographic altitudes. Interestingly, our morphological analyses showed fewer differences within populations, and this was contrary to our molecular analysis results. The morphological variation among the three clusters was higher than that within clusters. P. mira individuals are distributed over a narrow altitude range within each population, although they are widely distributed in the Tibet plateau. Combined with our molecular results, we concluded that the morphological differences may be caused by the relatively isolated distributions and various climates and environments associated with P. mira populations from different geographic altitudes. For instance, high-altitude populations contained larger and heavier fruits and nutlets than the medium-and low-altitude populations. These results were consistent with the findings of Pluess et al. [62] in that large seeds could be retained at high-altitude areas because of their high seedling rates under poor environmental conditions.

Conservation strategy
Studying genetic diversity, population structure, and morphological variations could provide basic information about plant conservation [63]. Although the levels of genetic diversity of P. mira populations were moderate, some populations still exhibited high risk of genetic variation loss, and human activities and climates had significant effects on most P. mira populations. Therefore, there is also an urgent need to conduct P. mira resource conservation.
The analyses based on SSR markers and morphological characteristics revealed three completely consistent clusters among the 21 wild P. mira populations. Each cluster contained unique genetic and morphological variations and had different evolutionary histories. Therefore, the conservation efforts should be divided into three units that correspond to the three genetic and morphological clusters. Because the six populations in Cluster I were found in national nature reserves, more attention needs to be paid to the other clusters. Furthermore, in situ conservation should prioritize populations with high genetic diversity in Clusters II and III such as P11 and P17. Because human activities have had significant effects on many P. mira populations (P6, P7, P9, P10, P12, P13, and P15), we propose the strengthening of conservation awareness in local farmers and the hanging of placards to reduce deforestation and prevent continuing habitat deterioration. We also suggest ex situ conservation for individuals with unique traits and populations with high inbreeding to prevent the potential loss of genetic variation. Furthermore, improving the genetic diversity within populations requires the monitoring of crossing among populations.

Conclusions
This study comprehensively analyzed the genetic diversity, population structure, and morphological variation of 21 wild P. mira populations in the Tibet plateau. Moderate levels of genetic diversity were detected in P. mira populations. Significant homozygote excess was observed within each population, and this might be the result of sibling mating caused by habitat fragmentation. Furthermore, human activities might be responsible for the deviation of numerous P. mira populations from the HWE. Both molecular and morphological analyses assigned the 21 populations to three clusters, which were significantly correlated with the geographic altitudes. Because of the presence of geographic barriers and genetic drift, populations in Shannan were highly differentiated from populations in Linzhi. Lastly, we formed three conservation units and proposed several conservation strategies for these studied populations.  Table. Data associated with 11 morphological characteristics of the 420 P. mira individuals. (XLSX)