High-Level Genetic Diversity and Complex Population Structure of Siberian Apricot (Prunus sibirica L.) in China as Revealed by Nuclear SSR Markers

Siberian apricot (Prunus sibirica L.), an ecologically and economically important tree species with a high degree of tolerance to a variety of extreme environmental conditions, is widely distributed across the mountains of northeastern and northern China, eastern and southeastern regions of Mongolia, Eastern Siberia, and the Maritime Territory of Russia. However, few studies have examined the genetic diversity and population structure of this species. Using 31 nuclear microsatellites, we investigated the level of genetic diversity and population structure of Siberian apricot sampled from 22 populations across China. The number of alleles per locus ranged from 5 to 33, with an average of 19.323 alleles. The observed heterozygosity and expected heterozygosity ranged from 0.037 to 0.874 and 0.040 to 0.924 with average values of 0.639 and 0.774, respectively. A STRUCTURE-based analysis clustered all of the populations into four genetic clusters. Significant genetic differentiation was observed between all population pairs. A hierarchical analysis of molecular variance attributed about 94% of the variation to within populations. No significant difference was detected between the wild and semi-wild groups, indicating that recent cultivation practices have had little impact on the genetic diversity of Siberian apricot. The Mantel test showed that the genetic distance among the populations was not significantly correlated with geographic distance (r = 0.4651, p = 0.9940). Our study represents the most comprehensive investigation of the genetic diversity and population structure of Siberian apricot in China to date, and it provides valuable information for the collection of genetic resources for the breeding of Siberian apricot and related species.


Introduction
Siberian apricot (Prunus sibirica L.), an ecologically and economically important tree species, is widely distributed across the mountainous areas of northern and northeastern China, eastern Siberia, and Mongolia [1]. It can adapt to a variety of harsh environmental conditions, including cold stress, drought stress, and reduced soil fertility, making it one of the primary choices for controlling desertification in northern and northwestern China. The Siberian apricot almond is not only a traditional dry food, but also an important raw material for food, cosmetics, and biodiesel manufacturing. Thus, Siberian apricot is important to the income of farmers in these areas [2,3].
In recent decades, almond products have become increasingly popular on the domestic and international markets. Consequently, many almond processing plants have been established around the major areas of production in China. However, Siberian apricot resources are declining due to outdated management practices and deterioration of the natural environment [4]. Furthermore, diseases and insect pests such as tent caterpillar (Malacosoma neustria testacea Motsch) and leaf roller (Adoxophyes honmai) have further degraded an already fragile natural environment [5]. Despite the hardiness of Siberian apricot, its flowers will wither if a late frost hits during flowering, and this can cause a serious reduction in yield or no yield at all. Therefore, there is an urgent need to develop a Siberian apricot cultivar with increased tolerance to both abiotic and biotic stresses. The success of breeding programs is based on the knowledge and availability of genetic variability for efficient selection [6]. However, Siberian apricot, as a building block for breeding programs, has not been extensively studied in China until now.
Increased knowledge of the genetic diversity and population structure of Siberian apricot in China will provide the basis for protecting, utilizing, and improving our resources. Therefore, an assessment of the extent and nature of the genetic variation in Siberian apricot is important for breeding and genetic resource conservation programs. Traditionally, genetic diversity has been assessed based on morphological characteristics, which are often influenced by environmental conditions. With the advent of molecular markers, including restriction fragment length polymorphisms, amplified fragment length polymorphisms, simple sequence repeats (SSRs), and single nucleotide polymorphisms, much progress has been made in understanding the genetic diversity and population structure of various species [7][8][9][10]. Among these markers, SSRs have been the first choice for the study of genetic diversity and population structure owing to their desirable genetic attributes, including high numbers of polymorphisms, wide genomic distribution, co-dominant inheritance, and high degree of reproducibility [11,12]. Nuclear SSR markers have also proven to be very useful for the evaluation of genetic diversity in apricot [13,14], and they have been employed to investigate the genetic diversity of Siberian apricot in the Yan Mountains of China [15]. However, a comprehensive analysis of Siberian apricot genetic diversity and its population structure in China at the DNA level is lacking.
In this study, 31 nuclear SSR loci developed previously for this species [16] were used to analyze the genetic diversity and structure of Siberian apricot populations in China. The objectives of the study were to provide a complete picture of the organization of genetic diversity of Siberian apricot populations in China and to reveal the origin of the genetic variation in Siberian apricot populations.

Materials and Methods
Sampling
A total of 672 individuals of Siberian apricot representing 22 populations were collected throughout the areas of distribution in China (Table 1). A total of 25 to 32 individuals were sampled for each population, and the coordinates of each tree were recorded using a global positioning system. The distance between any two individuals at each location was >50 m. The 22 populations were from 21 sampling locations (P5 and P17 were from the same region) spanning 18 degrees of longitude in the east-west direction and 6 degrees of latitude in the north-south direction. The highest altitude of the locations was 1,334 m (P20), while the lowest altitude was 87 m (P1). Daqing (P14) had a minimum altitude gap of only 3 m, while Weichang (P20) had a maximum altitude gap of 271 m. The sampled populations were divided into six groups according to their geographical locations. The Yan Mountains group (G1) included P7, P8, P9, P10, P11, and P12; the Greater Khingan Mountains group (G2) included P18, P19, P20, P21, and P6; the Western Liaoning Hills group (G3) included P1, P2, P3, P4, P5, P17, and P22; the Northeast Plain group (G4) included P13 and P14; the Linkou group (G5) included P15; and the Daqingshan Mountain group (G6) included P16 (Figure 1). No specific permits were required for this field study. All sampling locations were public spaces where anyone can enter and collect forest products, regardless of ownership. In addition, the field study did not involve endangered or protected species.
In China, Siberian apricot has been cultivated for decades in an experimental forest. Currently, the main method of propagation is to sow seeds collected from the immediate area or near the region without selection. Three such populations were collected and designated as semi-wild type. All other populations were from the wild (Table 1). Young leaves were collected and placed immediately in Ziploc bags preloaded with colored silica gel to dry them and preserve them for DNA extraction.

Microsatellite DNA Analysis
Total genomic DNA was extracted from dry leaves collected from all localities using a modified version of the cetyl trimethylammonium bromide method [17]. The quality and concentration of the extracted DNA were determined by 1% agarose gel electrophoresis and ultraviolet spectrophotometry.
Thirty-one microsatellite loci were employed to study the genetic diversity of wild Siberian apricot accessions, including 23 recently developed in Siberian apricot [16,18], one from apricot (Prunus armeniaca L.) [19], and seven from peach (Prunus persica L.) [20][21][22] (Table S1). The forward primer of each pair was tagged with a section of the universal M13 sequence (5'-TGTAAAACGACGGCCAGT-3') during synthesis. Amplification was performed in a 10-μL reaction mixture containing 1 μL of DNA template (10 ng/μL), 5 μL of 2X Taq mix, 0.4 μL of the forward primer (1 μM), 1.6 μL of the reverse primer, 1.6 μL of M13 primer (1 μM) with a fluorescent label (FAM, HEX, ROX, or TAMRA), and 1.4 μL of ddH2O. The reaction conditions were: 94°C for 5 min, followed by 30 cycles of 94°C for 30 s, 55°C for 30 s, and 72°C for 30 s, followed by 8 cycles of 94°C for 30 s, 53°C for 40 s, and 72°C for 30 s, with a final extension at 72°C for 10 min. The products were separated in an ABI 3730xl DNA Analyzer using GeneScan-500LIZ as an internal marker (Applied Biosystems, Foster City, CA, USA). The amplicon fragments were sized using GeneMarker 1.75 software (SoftGenetics LLC, State College, PA, USA). All rare alleles and private alleles were reamplified. For the alleles from the homozygous loci, the purified PCR products were sequenced directly. For the alleles from the heterozygous loci, the targeted fragments were separated, cloned, and sequenced following the protocol of Chen et al. [23]. These sequences were compared with the target fragments to determine whether they were non-specific amplifications.

Data Analysis
FLEXBIN was used for automated binning of the microsatellite raw data [24], and the Excel Microsatellite Toolkit [25] was employed to convert the size data into various formats for further analysis. The level of genetic diversity was estimated using GENALEX software version 6.41 [26] with the following statistics: number of alleles (Na), effective number of alleles (Ne), Shannon's Information Index (I), observed heterozygosity (Ho), expected heterozygosity (He) [27], and F-statistics calculations (FIS, FIT, and FST).
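As an illustration of the per-locus statistics listed above, the following minimal sketch computes Na, Ne, I, Ho, He, and the fixation index F from diploid genotypes using the standard textbook definitions (Ne = 1/Σp², I = −Σ p ln p, He = 1 − Σp², F = 1 − Ho/He). The function name `locus_diversity` and the toy genotypes are hypothetical and are not taken from the study; the sketch is not a reimplementation of GENALEX and ignores missing data and small-sample corrections.

```python
import math
from collections import Counter

def locus_diversity(genotypes):
    """Per-locus diversity statistics for diploid genotypes.

    genotypes: list of (allele_a, allele_b) tuples, one per individual.
    Returns a dict with Na, Ne, Shannon's I, Ho, He and the fixation index F.
    """
    n = len(genotypes)
    # Count every allele copy (two per individual) to get allele frequencies.
    counts = Counter(allele for g in genotypes for allele in g)
    total = 2 * n
    freqs = [c / total for c in counts.values()]

    sum_p2 = sum(p * p for p in freqs)
    na = len(freqs)                                   # number of alleles
    ne = 1.0 / sum_p2                                 # effective number of alleles
    shannon = -sum(p * math.log(p) for p in freqs)    # Shannon's index
    ho = sum(1 for a, b in genotypes if a != b) / n   # observed heterozygosity
    he = 1.0 - sum_p2                                 # expected heterozygosity
    f = 1.0 - ho / he if he > 0 else float("nan")     # fixation index
    return {"Na": na, "Ne": ne, "I": shannon, "Ho": ho, "He": he, "F": f}

# Hypothetical fragment sizes (bp) at one SSR locus for four individuals.
toy = [(150, 150), (150, 154), (154, 154), (150, 154)]
stats = locus_diversity(toy)
print(stats)
```

A positive F indicates fewer heterozygotes than expected under Hardy-Weinberg equilibrium; in the toy data the two alleles are equally frequent and half of the individuals are heterozygous, so F is zero.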
Clustering based on a Bayesian model was used to evaluate the genetic structures of the Siberian apricot populations with the software package STRUCTURE [28] in its extended version 2.3.3 [29,30]. The admixture model and independent allelic frequencies were employed to analyze the data set without prior population information. The length of the burn-in period and the number of MCMC replications after burn-in were set to 25,000 and 100,000, respectively. These steps were used to determine the ancestry value, which estimates the proportion of an individual's genome that originated from a given genetic group. The algorithm was run ten times for each K value, from 1 to 22. The optimal K was identified using ΔK, an ad hoc quantity based on the second-order rate of change of the likelihood function with respect to K; the distribution of ΔK shows a clear peak at the true value of K [31].
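The ΔK statistic described above can be sketched as follows. This is a minimal illustration, assuming the common formulation in which the absolute second-order difference of the mean Ln P(D) across replicate runs is divided by the standard deviation of Ln P(D) at that K. The function name `delta_k` and the toy likelihood values are hypothetical and do not come from the study.

```python
from statistics import mean, stdev

def delta_k(lnp_runs):
    """Compute ΔK for each interior K.

    lnp_runs: dict mapping K -> list of Ln P(D) values from replicate runs.
    ΔK(K) = |m(K+1) - 2*m(K) + m(K-1)| / s(K), where m is the mean and
    s the standard deviation of Ln P(D) over the replicates at that K.
    ΔK is undefined for the smallest and largest K tested.
    """
    m = {k: mean(v) for k, v in lnp_runs.items()}
    s = {k: stdev(v) for k, v in lnp_runs.items()}
    out = {}
    for k in sorted(lnp_runs):
        if (k - 1) in m and (k + 1) in m and s[k] > 0:
            out[k] = abs(m[k + 1] - 2 * m[k] + m[k - 1]) / s[k]
    return out

# Hypothetical Ln P(D) values for three replicate runs at K = 1..4.
toy_runs = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-900.0, -901.0, -899.0],
    3: [-895.0, -897.0, -893.0],
    4: [-893.0, -896.0, -890.0],
}
dk = delta_k(toy_runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

Because ΔK needs both neighbours of K, it cannot be evaluated at K = 1, so the method can never select a single cluster.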
The observed genetic variation among and within the populations and genetic groups was characterized by an analysis of molecular variance (AMOVA) using ARLEQUIN version 3.5 [32]. This analysis subdivided the 22 populations into two different origin groups, six geographical groups, and K groups. Three hierarchical divisions were identified based on the genetic variance: within populations, among populations within groups, and among groups, using a nonparametric permutation procedure incorporating 10,000 iterations. In addition, we tested all of the loci for deviations from Hardy-Weinberg equilibrium (HWE) using ARLEQUIN version 3.5 [33] with 100,000,000 steps in the Markov chain [34] and 100,000 dememorization steps. We selected FST and RST to calculate the genetic differentiation of all population pairs. The values of FST and RST were calculated using FSTAT version 2.9.3 [35] and ARLEQUIN version 3.5 [33], respectively. To examine the effect of geographic distance on genetic structure, correlations between the pairwise genetic distances, represented by FST/(1 − FST) estimates [36], and pairwise geographic distances among 19 wild populations, which were calculated according to the latitude and longitude of each site with Vincenty's formula (http://www.movable-type.co.uk/scripts/latlong-vincenty.html), were tested using the Mantel test implemented by Isolation By Distance Web Service version 3.23 (http://ibdws.sdsu.edu/~ibdws/) [37,38]. We also employed Monmonier's maximum difference algorithm to highlight geographical features corresponding to pronounced genetic discontinuity using BARRIER version 2.2 [39].
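The isolation-by-distance test described above can be sketched as a simple permutation Mantel test on the linearised differentiation FST/(1 − FST). This is a minimal illustration and not the IBDWS implementation: the function names, the one-tailed p-value convention, the number of permutations, and the toy matrices are all assumptions made for the example, and the geographic distances are taken as given rather than computed with Vincenty's formula.

```python
import math
import random

def rousset(fst):
    """Linearised differentiation FST / (1 - FST)."""
    return fst / (1.0 - fst)

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def upper(mat, order):
    """Upper-triangle entries of a symmetric matrix after reordering populations."""
    n = len(order)
    return [mat[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]

def mantel(gen, geo, permutations=999, seed=0):
    """One-tailed Mantel test; returns (r, p) for a positive correlation."""
    ident = list(range(len(gen)))
    geo_vec = upper(geo, ident)
    r_obs = pearson(upper(gen, ident), geo_vec)
    rng = random.Random(seed)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        perm = ident[:]
        rng.shuffle(perm)  # permute populations (rows and columns together)
        if pearson(upper(gen, perm), geo_vec) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)

# Hypothetical pairwise FST values and geographic distances (km), four populations.
fst = [[0.00, 0.02, 0.05, 0.09],
       [0.02, 0.00, 0.03, 0.08],
       [0.05, 0.03, 0.00, 0.04],
       [0.09, 0.08, 0.04, 0.00]]
geo = [[0, 120, 300, 700],
       [120, 0, 180, 580],
       [300, 180, 0, 400],
       [700, 580, 400, 0]]
gen = [[rousset(v) for v in row] for row in fst]
r, p = mantel(gen, geo)
print(r, p)
```

Permuting population labels, rather than individual matrix cells, preserves the dependence among pairwise distances that share a population, which is why the Mantel test is used instead of an ordinary correlation test.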

Genetic Diversity Among the Loci
A total of 31 microsatellite loci were used to genotype 672 individuals of Siberian apricot ( Table 2). The genetic profiles detected 599 alleles, which ranged between 5 and 33 per locus, 207 of which were rare alleles with a frequency below 1%. The

Genetic Structure of the Siberian Apricot Samples
The genetic structure of the Siberian apricot samples was investigated by a Bayesian-based population assignment analysis using STRUCTURE [28]. Our results show a clear maximum for ΔK at K = 4 (Figure 2B), at which all individuals were classified into four different clusters. About 80% of the individuals assigned to each genetic cluster showed strong ancestry values, with an average >0.90 (Table S2). In genetic cluster 1 (C1), which included P1, P2, P3, P4, P5, and P17, only 14 individuals (7.4%) showed ancestry values <0.60. Eighteen individuals assigned to C1 came from locations belonging to other genetic clusters: two accessions from P9, two from P10, two from P11, six from P12, one from P18, and five from P19 (Figure 3 and Table S2). Genetic cluster 2 (C2) consisted of P6, P7, and P8, and only 13 individuals (13.1%) showed ancestry values <0.60. Ten individuals assigned to C2 came from locations belonging to other genetic clusters: two accessions from P2, one from P9, three from P10, one from P11, one from P19, and two from P20. Within genetic cluster 3 (C3), which contained populations P9, P10, P11, P12, P13, P14, and P15, only 25 individuals (11.6%) showed ancestry values <0.60, and 11 individuals came from locations belonging to other genetic clusters: one accession from each of P1, P2, P3, and P6, five accessions from P17, and two accessions from P19 (Figure 3 and Table S2). All remaining populations (P16, P18, P19, P20, P21, and P22) were clustered into genetic cluster 4 (C4), in which 16 individuals (9.5%) showed ancestry values <0.60. Only two individuals assigned to C4 came from a population of another cluster, namely P6, which belonged to C2 (Figure 3 and Table S2).
At the same time, the second largest ΔK, at K = 7, was much larger than the remaining values. In addition, two clear peaks were observed at K = 10 and K = 14 (Figure 2B). When K = 7, P16 and P17 were separated into two new genetic clusters from genetic cluster C1 and genetic cluster C2, while genetic cluster C3 was divided into two genetic clusters. When K = 10, the 3rd, 4th, and 7th of the seven genetic clusters were each split into two genetic clusters. When K = 14, the 1st and the 2nd of the ten genetic clusters were each further divided into two genetic clusters, and the 10th genetic cluster was divided into three genetic clusters (Figure 3).
The value of Na for the wild genotypes was significantly higher than that for the semi-wild genotypes (Table 3). The number of private alleles in the wild genotypes was far greater than that in the semi-wild genotypes. These differences could be associated with the large disparity in sample size. The values of Ho and He for the wild genotypes were almost equal to the values for the semi-wild genotypes.
The Ho and He values in genetic cluster C3 were slightly larger than those in the other genetic clusters (Table 3), whereas genetic cluster C4 had the lowest Ho value. However, regardless of whether the individuals were wild or semi-wild, and regardless of the genetic cluster to which they belonged, the Ho value was significantly lower than the He value. This result is in agreement with the high value of the fixation index, suggesting a deficit of heterozygotes relative to the expectations of HWE. A comparison of private alleles between the wild (152) and semi-wild (11) populations showed a significant difference between them (Table 3). When all populations were considered, P1 contained the most private alleles (7); no private alleles were found in P12 and P14.

AMOVA
Our AMOVA revealed that only a low percentage of the variation was partitioned among natural populations, origin groups, geographical groups, and genetic clusters (Table 4). About 94% of the variation was attributed to differences within populations in all variance partitions. A hierarchical AMOVA of the four genetic clusters identified by STRUCTURE revealed that 1.87% of the variance was distributed among them, and this grouping produced the largest FST value (0.06008). The grouping into seven genetic clusters yielded the highest percentage of variation among groups (3.48%) and the second largest FST value (0.06002). With the populations grouped according to their geographical origin, a lower percentage of variation (1.98%) could be explained by geographic factors. When the populations were grouped according to their origin, a negative percentage of variation was detected among the groups.

Genetic and Geographic Relatedness
The pairwise genetic differentiation values (FST and RST) calculated for the 22 populations showed genetic differentiation between each pair of populations (Table 5). Rousset's distance, FST/(1 − FST) [36], indicated that the most closely related Siberian apricot populations were P11 and P12, even though the geographical distance between them was not the smallest. The greatest geographic distance (1,526.5 km) was between P15 and P16; however, this pairing did not have the largest Rousset's distance (0.123). The largest Rousset's distance (0.156) was between P16 and P22. The Mantel test (Figure 4) showed that genetic distance was not significantly correlated with geographic distance (r = 0.4651, p = 0.9940).

The Identification of Genetic Barriers
A genetic barrier prediction analysis using Monmonier's maximum difference algorithm identified three putative barriers when all populations were included (Figure 5A). The first barrier separated the western peripheral population P16 from all other populations. The second predicted barrier separated population P22, which was located in the center of the distribution area. The third predicted barrier separated population P20. When only the 19 wild populations were included (Figure 5B), the first barrier separated P16, similar to the result obtained when all of the populations were included. The second predicted barrier separated P20 and P22 from the other populations. There was a gap in the barrier between P20 and P22, suggesting that the two populations could be connected with each other. The second and the third predicted barriers together separated P8 from the other populations.

Genetic Diversity of Siberian Apricot in China
Heterozygosity is an important measurement of gene diversity [40]. In our study, a relatively high level of genetic diversity was detected at microsatellite loci in Siberian apricot; the mean Ho and He values were 0.639 and 0.774, respectively. Similar values were reported for populations of Siberian apricot in the Yan Mountains (Ho = 0.668, He = 0.788) [15]. Fewer polymorphisms have been reported for apricot (Prunus armeniaca L.; Ho = 0.615, He = 0.621) [41]. The genetic diversity of Chinese wild almond (Amygdalus nana L.; Ho = 0.339, He = 0.219) is reportedly even lower [42]. Ferrer et al. [43] found that the number of loci and populations included in studies might affect estimates of genetic diversity. In our study, the number of loci and samples was larger than in the aforementioned studies. The geographic range of the species and species characteristics (e.g., long-lived, outcrossing, and wind-pollinated) also influence genetic diversity, and high heterozygosity could be favorable in long-lived plants growing in arid zones. Indeed, Siberian apricot is long-lived, wind-pollinated, self-incompatible, and distributed across a wide area with a harsh environment, which may be one cause of the high level of genetic diversity and high number of alleles per locus detected in Siberian apricot populations. We found many morphological variations in our field investigation that have not been reported previously, such as double-petal apricot, green-sepal apricot, big-flower apricot, late-flowering apricot, heart-shaped apricot, and sweet benevolence apricot. Among the populations, P16 and P21 had the lowest level of genetic diversity (Table 3). P16 is located at the western edge of the distribution area, whereas P21 is located 1,200 m above sea level at the southwestern edge of the Yan Mountains. The marginal distribution would reduce the opportunity to exchange genes with other populations and lead to a low level of genetic diversity.
Table 4 notes: the analyses of genetic subclusters included seven, ten, and fourteen subclusters. FST, variance among subpopulations relative to the total variance; FSC, variance among subpopulations within groups; FCT, variance among groups relative to the total variance. doi:10.1371/journal.pone.0087381.t004
The Ho value was lower than the He value at all 31 loci (Table 2), indicating a deficiency of heterozygotes at these loci. A heterozygote deficiency was also observed at the population level (Table 3). Similar findings related to heterozygote deficiency have been observed in other trees [44][45][46]. In Cinnamomum insularimontanum Hayata (Lauraceae) from southern Korea, a heterozygote deficiency was explained as a process of partial selfing rather than the presence of null alleles or a temporal Wahlund effect. A deficiency of heterozygotes in the tropical species Sextonia rubra (Mez) van der Werff was explained as an effect of biparental inbreeding due to limited pollen dispersal among relatives [45]. In flowering dogwood trees, a deficiency in heterozygotes was explained as the result of half-sibling mating occurring over a small geographical area [46]. The seed-setting rate by self-pollination in Siberian apricot is very low; such trees usually exhibit self-incompatibility. Thus, the deficiency of heterozygotes in Siberian apricot in our study may be the result of low levels of inbreeding. Further research on the mating system, pollen dispersal, and seeds in Siberian apricot populations is needed to infer the precise cause of the deficiency in heterozygotes.
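As a rough worked illustration of the size of this deficit, the fixation index can be computed from the mean heterozygosities reported in this study (Ho = 0.639, He = 0.774). This is a ratio of overall means, so it is only an approximation and need not match the per-locus or per-population F values in Tables 2 and 3.

```latex
F = \frac{H_e - H_o}{H_e} = 1 - \frac{H_o}{H_e}
  = 1 - \frac{0.639}{0.774} \approx 0.174
```

A value of F greater than zero corresponds to fewer heterozygotes than expected under HWE.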

Genetic Structure of Siberian Apricot
An AMOVA revealed that genetic variation within populations accounted for about 94.4% of the total (Table 4). Outcrossing woody plants tend to be more genetically diverse and have less genetic differentiation among populations [32]. The percentage of genetic variation within populations of Siberian apricot in the Yan Mountains was shown to be up to 96% [15]. The negative percentage of variation detected among the wild and semi-wild groups suggests that there is no significant difference between them. Furthermore, the values of Ne, He, Ho, and F in the semi-wild population were similar to those in the wild population. This indicates that the semi-wild populations might have been established from seeds selected randomly from wild populations, and that recent cultivation practices have had little impact on the genetic diversity of Siberian apricot. The relatively low values of Na in the semi-wild group might be due to the small sample size.
The use of R- and F-statistics when estimating genetic differentiation assumes a stepwise-mutation model (SMM) and an infinite-allele model, respectively. R-statistics were developed to take into account the high homoplasy inherent in microsatellite markers [47]. However, several analyses of population structure have reached the conclusion that many microsatellite loci do not fit an SMM process [48][49][50]. Balloux et al. [51] showed that microsatellites could mutate following a fairly strict SMM model. De Andrés et al. [52] also chose FST instead of RST when calculating the genetic differentiation among grapevine populations. Compared with RST, FST was more consistent with our other analyses. Although the genetic differentiation between all pairs of populations was significant, the lowest values of FST were observed between P1-P2, P1-P3, P1-P4, P1-P17, P2-P3, P2-P4, P4-P5, P4-P17, P5-P17, P11-P12, and P13-P14. A UPGMA dendrogram based on Nei's unbiased genetic distance showed that these population pairs had the shortest genetic distances (Figure S1). In addition, the clustering analysis showed that the population pairs with low FST values were grouped into the same genetic cluster. All population pairs with the lowest genetic differentiation were from the same region (Figure 1), except P13-P14. We did not find variation in the Siberian apricot trees around P13 and P14, indicating that they are isolated populations. The distance from P13 to P14 is about 300 km, which is far enough that the two populations have little chance to exchange genes. The FST value between P13 and P14 (Table 5) is not significantly different, suggesting a low degree of genetic differentiation between them.
It may be that a long time ago human activity severed the continuity of their distribution, but that the later development of a similar environment at the two sampling locations (the eastern edge of the Greater Khingan Mountains and western edge of the Northeast Plain) guided the evolution of the two populations in the same direction. Isolated populations cannot exchange genes with outside populations, which may increase the chance of inbreeding. The relatively high positive value of F (Table 3) supports this possibility.
STRUCTURE has been successfully used in a large variety of population genetic studies, including studies of genetic structure, the distinguishing of breeds, and the detection of hybrids between cultivated and wild assortments [53][54][55]. In general, two approaches are used to identify the true optimum number of subsets (K) in STRUCTURE. The first approach, described by Pritchard et al. [28], is based on the probability Pr(X|K) (called Ln P(D) in STRUCTURE), and the K value that provides the maximum Ln P(D) value is selected as the optimum number of subsets [56]. Evanno et al. [31] found that in many cases the estimated Ln P(D) does not help visualize the correct number of clusters (K). They recommended using an ad hoc statistic, ΔK, based on the rate of change in the log probability of data between successive K values evaluated by STRUCTURE to more accurately detect the real number of clusters [57,58]. However, Vigouroux et al. [59] pointed out that the ΔK method of Evanno et al. [31] always favored K = 2 in the main structure analysis. When large datasets are analyzed, a convergence problem for the Gibbs sampler algorithm used in STRUCTURE may occur [60,61]. Recently, Jacobs et al. [62] grouped populations by maximizing the allocation of genetic diversity among subgroups (i.e., maximizing the FST values). This provided a new means of identifying the true optimum number of subsets.
The AMOVA results showed that the maximum FST value (0.06008) was obtained when all populations were grouped into four genetic clusters (Table 4).
In this study, STRUCTURE identified four main genetic groups (clusters) (Table S2). All genetic clusters showed high average ancestry values for their own clusters. The populations from G3 were almost all clustered into genetic cluster C1, while the populations from G2 were clustered into genetic cluster C4 except P6. However, based on the ancestry values of all of the individuals (Table S2), we found that a number of individuals from P9 (2 individuals), P10 (2 individuals), P11 (2 individuals), P12 (6 individuals), P18 (1 individual), and P19 (5 individuals), populations that belonged to other genetic clusters, were assigned to genetic cluster C1. Similar results were also found in the other genetic clusters. Siberian apricot reproduces mainly by seeds from ripe and dehiscent fruits. Natural gene flow over such distances would not otherwise be possible, and one putative explanation is seed dispersal by rodents and human activity. The Korean field mouse (Apodemus peninsulae), white-bellied rat (Niviventer confucianus), striped field mouse (Apodemus agrarius), and other rodents feed on and store Siberian apricot seeds, which makes long-distance gene flow possible and improves the level of genetic diversity.
According to the results of our structure analysis, populations that were geographically close were generally grouped into the same cluster. An analysis based on the Mantel test (Figure 4) showed that the genetic distance was not significantly correlated with the geographic distance (r = 0.4651, p = 0.9940), suggesting that geographic distance is not the principal factor influencing genetic differentiation in Siberian apricot. The distance between P5 and P17 was <17 km; however, the populations were not grouped into the same cluster when K > 4. Furthermore, significant genetic differentiation was detected between them (Table 5), suggesting that the seeds of the semi-wild population in P17 were not of local origin.
P16 was separated by the first predicted barrier, regardless of whether the three semi-wild populations were included or not (Figure 5). However, the second and third predicted barriers produced different results. If semi-wild populations were excluded, P22 had an exchange with P20 (belonging to genetic cluster C4) while P8 did not exchange with other populations. It is possible that the seed resources of P7 were from P8, because they were from the same genetic cluster when K = 14, and that the seed resources of P17 were from genetic cluster C1 and genetic cluster C3. Most of the populations with low-level genetic diversity (Table 3) were separated from other populations, indicating that the barriers were an important factor influencing genetic diversity. Further investigation into how these genetic barriers are related to geographic or other factors is needed.

Conclusions
Our study shows a relatively high level of genetic diversity among Siberian apricot populations in China. However, a significant deficiency in heterozygotes was detected at the locus and population levels, which may be the result of low-level inbreeding. Our structure analysis clustered all of the populations into four genetic clusters. There was no significant difference between the wild and semi-wild groups, indicating that recent cultivation practices have had little impact on the genetic diversity of Siberian apricot. Our study represents the most comprehensive investigation of the genetic diversity and population structure of Siberian apricot in China and will provide valuable information for the collection of genetic resources for the breeding of Siberian apricot and related species.