Genetic diversity and population structure in the endangered tree Hopea hainanensis (Dipterocarpaceae) on Hainan Island, China

Hopea hainanensis Merrill & Chun (Dipterocarpaceae) is an endangered tree species restricted to Hainan Island, China and a small part of Northern Vietnam. On Hainan Island, it is an important indicator species for tropical forests. However, because of its highly valued timber, H. hainanensis has suffered from overexploitation, leading to a sharp population decline. To facilitate the conservation of this species, genetic diversity and population structure were assessed using 12 SSR markers for 10 populations sampled across Hainan Island. Compared to non-threatened Hopea species, H. hainanensis exhibited reduced overall genetic diversity and increased population differentiation (AMOVA: FST = 0.23). Bayesian model-based clustering and principal coordinate analysis consistently assigned H. hainanensis individuals into three genetic groups, which were found to be widespread and overlapping geographically. A Mantel test found no correlation between genetic and geographical distances (r = 0.040, p = 0.418). The observed genetic structure suggests that long-distance gene flow occurred among H. hainanensis populations prior to habitat fragmentation. A recent population bottleneck was revealed, which may cause rapid loss of genetic diversity and increased differentiation across populations. Based on these findings, appropriate strategies for the long-term conservation of the endangered species H. hainanensis are proposed.


Introduction
Earth's biodiversity is rapidly declining as a consequence of agricultural expansion, overexploitation, deforestation, pollution and climate change [1][2][3]. Approximately 40% of plant species are threatened with extinction [4]. Conservation genetics, a new discipline that applies the concepts and tools of population genetics to biological conservation, is aimed at preserving endangered species from extinction [1]. Endangered species are commonly characterized by small, fragmented populations and restricted gene flow among populations [3]. In small, isolated populations, mating occurs more frequently among relatives, and a shift to selfing may be observed in hermaphroditic plants. Inbreeding  recessive alleles and the consequent production of inferior offspring, a phenomenon known as inbreeding depression [5]. In addition, genetic drift is stronger in small populations, further contributing to the fixation of deleterious mutations and loss of genetic variation, which compromise the adaptive potential of a population and thereby increase its extinction risk [3,6].
With the goal of studying the genetic diversity, population differentiation, mating system and historical demography of endangered species, the field of conservation genetics provides remarkable insights into the preservation of biodiversity in the real world [2]. Tree species from a single family, the Dipterocarpaceae, are key community members in Asian tropical forests, accounting for 20-50% of forest basal area and more than 50% of canopy trees [7,8]. Many Dipterocarpaceae species represent important timber resources, and thus have been heavily exploited in tropical Asian countries. Due to massive logging for timber, as well as deforestation for agriculture, many dipterocarps are now classified as endangered or critically endangered [4,8]. However, dipterocarp forests are far more than a mere resource for timber production. They are key components of Asian tropical rainforest ecosystems, serving as a foundation on which these highly diverse ecosystems are assembled. Indeed among 25 worldwide 'biodiversity hotspots', four are located in Southeast Asia [9]. Furthermore, dipterocarp forests provide a variety of ecosystem services and play a crucial role in maintaining the equilibrium of ecological processes at both regional and global scales [8,10].
Dipterocarp forests flourish in the Malay Peninsula, Sumatra, Borneo, Java and wetter parts of the Philippines, and stretch to the northern limits of tropical Asia [8]. The species diversity of the Dipterocarpaceae is dramatically reduced at the range limits in contrast to the aseasonal equatorial zones of Malaysia and Indonesia [7,11]. Hainan Island in China is located near the edge of the Asian tropics, and only two Hopea species, H. hainanensis Merrill & Chun and H. reticulata Tardieu, are found there. H. hainanensis grows in the tropical lowland forests of Hainan Island and Northern Vietnam. It is a large evergreen tree with a height up to 20 meters, and it blooms and fruits almost every year. This tree species serves as an indicator for tropical forests on Hainan Island [11,12] and is known for its highly valued timber, which is extremely durable and suitable for building boats, bridges and houses. Polyploidy is infrequent in the family Dipterocarpaceae, but triploid and tetraploid species were recorded in genus Hopea [13,14]. Although many dipterocarps have a mixed mating system, most mature seeds are outcrossed, indicating selective abortion of selfed fruits [8]. Neither ploidy nor mating system has been reported for H. hainanensis. The size of H. hainanensis populations on Hainan Island has been greatly reduced due to the overexploitation and habitat loss for rubber tree plantations and agriculture [15,16]. The remaining populations are now severely fragmented, preserved only in a few natural reserves [11]. H. hainanensis was assessed as endangered in the IUCN Red List of Threatened Species [16] and classified as the first-class protective plant in the Information System of Chinese Rare and Endangered Plants (ISCREP) [17]. This species is quite scarce even in protected areas and the mature trees were estimated to be less than 250 [17]. However, the lack of information on the distribution of genetic variation in H. hainanensis has hampered the effective conservation and management of this endangered species.
In view of the ecological and silvicultural importance of the Dipterocarpaceae, there have been many studies of dipterocarp trees that have investigated patterns of genetic diversity, fine-scale spatial genetic structure, mating system and gene flow among populations [18][19][20][21][22] The majority of studied species have moderate to high levels of genetic variation, and low differentiation among populations, suggesting a historical pattern of outcrossing combined with large, stable populations [3,8]. In addition, the genetic diversity and population structure of endangered species in the Dipterocarpaceae have been assessed for the purposes of conservation and management of genetic resources [23][24][25][26].
In this study, population genetic analyses of H. hainanensis were performed using 12 microsatellite markers newly developed for this species. The goal of the study was to quantify the amount of genetic variation in H. hainanensis populations on Hainan Island, and compare this to the genetic diversity of non-threatened Hopea species. Then, the geographic pattern of genetic variation was revealed and discussed. The effect of habitat fragmentation on genetic differentiation across populations was also evaluated. Our studies on genetic diversity and population structure should facilitate the long-term conservation of this endangered species.

Sample collection
Totally 76 individuals were collected from 10 natural reserves for population genetic analyses, as H. hainanensis is very rare in tropical forests of Hainan Island [17] (Table 1, Fig 1). The field investigation was approved by the Administration Office of Nature Reserves, Forestry Department of Hainan Province. The habitat of H. hainanensis is primary or secondary tropical lowland forest, at altitudes ranging from 200 to 1000 meters [16]. Leaf samples were collected from mature trees with a diameter at breast height larger than 0.2 meter; all sampled trees were separated by a distance of at least ten meters. Young leaves free of disease and damage were collected and then immediately dried with silica gel. Voucher specimens for H. hainanensis were deposited in the Herbarium of Hainan University, Haikou, China.

DNA isolation, SSR amplification and genotyping
Total genomic DNA was extracted from the silica gel-dried leaves using a DNeasy Plant Mini Kit (QIAGEN, Shanghai, China) following the manufacturer's instructions, with the addition of a step to remove high levels of polysaccharides in the initial extracts. The concentration and quality of the extracted DNA were measured with a NanoDrop 2000 spectrophotometer (NanoDrop Technologies, Inc., Wilmington, DE, United States). Using second generation sequencing and computational screening, twelve polymorphic SSR primer pairs were developed to genotype H. hainanensis. PCR amplification was carried out using an Eppendorf Mastercycler ep Gradient S thermocycler (Eppendorf, Hamburg, Germany); the 20 μL final reaction volume contained 2 μL gDNA (at least 50 μg/mL), 0.2 μL of each primer (50 μM) 10 μL 2× Taq PCR MasterMix (TIANGEN Biotech, Beijing, China), and ddH 2 O. The following cycling program was used: an initial five minute denaturation at 94˚C; followed by 32 cycles of denaturing at 94˚C for 20 s, annealing at 59-63˚C for 20 s, and extension at 72˚C for 60 s; with a final extension of seven minutes at 72˚C. PCR products were mixed with Hi-Di Formamide

Data analysis
In view of the autopolyploid nature of H. hainanensis (personal communication with Rong Wang of East China Normal University, who initiated the whole genome sequencing of H. hainanensis), allelic dosage was determined based on the ratios between peak intensities following the MAC-PR method [27] (S1 File). However, in the analysis of this autopolyploid, genotypic ambiguities caused by unknown allelic dosage could not be fully resolved with the MAC-PR method. In order to account for missing dosage information for partial heterozygotes, GenoDive version 3.04 [28] was used. Standard metrics of genetic diversity, such as the number of alleles (N a ), effective number of alleles (N e ), and observed (H o ) and expected (H e ) heterozygosity were then calculated. The heterozygosity-based G is statistic was used to test for deviations from Hardy-Weinberg equilibrium (HWE). Another challenge posed by autopolyploidy is polysomic inheritance, under which double-reduction may occur, biasing the results of standard population genetic analyses [29]. A new software package, Polygene version 1.2b, was developed to analyze polyploid genetic data while taking into account both genotypic ambiguities and double-reduction [30]. Four polysomic inheritance models are implemented in the software, and the optimal model is selected based on the Bayesian information criterion (BIC). For each SSR locus, the possible genotypes and their posterior probabilities are inferred based on the allelic phenotype and inheritance model. The effects of null alleles, negative amplification and self-fertilization are also modeled by Polygene. To take advantage of these benefits, Polygene version 1.2b was implemented in this study to conduct further population genetic analyses.
The observed (H o ) and expected (H e ) heterozygosity, polymorphic information content (PIC) and inbreeding coefficient (G is ) were estimated for each population and locus. Differentiation between populations was assessed using G ST [31]. An analysis of molecular variance (AMOVA, [32]) was implemented in Polygene with modification for polysomic inheritance; AMOVA hierarchically partitions genetic variation among populations. Isolation by distance was assessed by regressing pairwise genetic distance against the natural logarithm of geographical distance (km) using the Mantel test [33] with 9,999 permutations. Slatkin's linearized F ST was adopted as the measure of genetic distance [34]. Principal coordinate analysis (PCoA) was carried out based on Cavalli-Sforza's (1967) chord distances [35], which have been shown to be the least biased distance measure when lacking dosage information [36].
A model-based clustering method, STRUCTURE v. 2.3.4 [37], was implemented to infer population structure, using an admixture model with correlated allele frequencies. Individuals were assigned to groups without prior knowledge of their population affinities [38]. The number of populations (K) varied from one to ten, and for each value of K, ten independent replicates were run with 100,000 burn-in iterations followed by 1,000,000 MCMC (Markov chain Monte Carlo) iterations. The best K was inferred using STRUCTURE Harvester [39] following recommended procedures [40]. The Greedy algorithm of Clumpp version 1.1.2 [41] was used to permute the independent repetitions for the selected value of K, and a graphical representation of the population structure was generated with Distruct version 1.1 [42]. A neighbor-joining tree based on Cavalli-Sforza's (1967) genetic distance was created for the H. hainanensis individuals using the software package MEGA 5.0 [43].
A qualitative graphical method was performed to identify potential population bottleneck in H. hainanensis [44]. Allele frequencies were estimated using Polygene under the best polysomic inheritance model. SSR alleles were grouped into each of 10 allele frequency classes and then a frequency histogram was plotted. The 10 allele frequency classes are 0.001-0.100, 0.101-0.200, 0.201-0.300, etc. The rationale of the graphical method is that compared to common alleles, rare alleles are expected to be lost rapidly during a bottleneck. As a result, alleles at low frequency (<0.1) would be less abundant than alleles at intermediate frequency (e.g. 0.101-0.200) after population bottleneck, regardless of the mutation rate and model [44]. For comparison, SSR data of three dipterocarp species, a critically endangered tree of the Seychelles, Vateriopsis seychellarum [23], and two non-threatened species in the rainforests of Southeast Asia, Shorea leprosula [45] and S. macrophylla [21], were further analyzed with the qualitative graphical method.

Results
Based on the BIC, the optimal polysomic inheritance model selected for Polygene analyses is RCS (S1 Table). For the 12 SSR markers genotyped across the ten Hopea hainanensis populations on Hainan Island, a total of 45 alleles were detected, for an average of 3.75 alleles per locus; the minimum number of alleles detected per locus was two (at loci Hha1, Hha9 and Hha10) and the maximum was six (Hha7 and Hha8). The polymorphism information content (PIC) varied from 0.205 (Hha10) to 0.707 (Hha7), with an average of 0.477. A large number of loci were found to deviate from HWE (P < 0.05) across the study populations (Table 2) (Table 3).
The overall G ST across all populations and loci was 0.229. Pairwise comparisons of genetic differentiation between populations showed that G ST ranged from 0.001, between populations BW and FJ, up to 0.271, between populations BW and DL. Some populations, for example QW, KF and DL, were highly divergent from the other populations (S2 Table). The population genetic structure of H. hainanensis on Hainan Island did not conform to the isolation by distance model. No correlation was detected (Mantel test: r = 0.040, p = 0.418) between the genetic distance (as measured by Slatkin's linearized F ST ) and the natural logarithm-transformed geographical distance among populations (S1 Fig). An analysis of molecular variance (AMOVA) revealed that 77.43% of total genetic variation was found within populations, while a small portion of variation (22.57%) was attributed to among population differentiation (Table 4).
In model-based STRUCTURE analysis, LnP(K) was found to increase from K = 1 to 10 and ΔK was maximized at K = 3, suggesting that there are three distinct genetic clusters in H. hainanensis. The results of the individual assignment indicated that only a few individuals were found to have mixed ancestry of more than one genetic cluster (Fig 2). Five populations, FJ, BW, DL, KF and MR, were dominated by one genetic subpopulation. In LM and JX, two genetic subpopulations were represented in roughly equal proportions. Lastly, in JF, QW and BL, all three genetic subpopulations were represented, with one cluster (red, III) dominated (Fig 1). Geographically, individuals assigned to the same genetic cluster were widespread, and the distributions of the three genetic clusters were overlapping (Fig 1). The genetic relationships among H. hainanensis individuals were further explored using PCoA, and the results were consistent with the STRUCTURE analysis (Fig 3). The first two principal coordinates explained 43.89% of the total genetic variance (27.69% and 16.21%, respectively). Three genetic groups with clear boundaries were identified, corresponding to the three genetic components determined in the STRUCTURE analysis. In addition, using Cavalli-Sforza's (1967) chord distances, a neighbor-joining tree was constructed. In the tree, individuals were divided into three groups, again in agreement with the three genetic groups identified by both PCoA and STRUCTURE (S2 Fig). The distribution of allele frequencies in H. hainanensis is obvious distorted and different from the other three dipterocarp species (Fig 4). The number of alleles at low frequency (<0.

Genetic diversity in Hopea hainanensis
Species in the family Dipterocarpaceae are generally characterized by moderate to high levels of genetic diversity and low differentiation among populations [8]. This pattern of genetic variation is expected for long-lived, perennial woody plants with predominantly outcrossing mating systems [46]. For example, using SSR markers, most Shorea species have been found to have high levels of genetic variation (as measured by expected heterozygosity [H e ] and the number of alleles per locus) [19,[47][48][49]. Hopea dryobalanoides is a non-threatened species widely distributed in Malaysia, Sumatra and Borneo [50]. A single population of H. dryobalanoides was collected from the Pasoh Forest Reserve in Peninsular Malaysia, and five SSR markers were used to evaluate its genetic diversity. Expected heterozygosity (H e ) was estimated at 0.678 in H. dryobalanoides, which is comparable with that found for a number of Shorea species [21,22,49]. In contrast, in this study, the average H e for H. hainanensis populations from Hainan Island was 0.409 as estimated by Polygene and 0.427 as estimated by GenoDive, which is markedly lower than the H e measured for H. dryobalanoides [50]. Furthermore, fewer alleles were discovered in H. hainanensis (N a = 2.46) versus H. dryobalanoides (N a = 5.60). In most cases, reductions in the effective population size and shifts of mating system from outcrossing to selfing cause the loss of genetic diversity within species [51,52]. The low genetic diversity within populations of H. hainanensis is most likely due to a severe demographic bottleneck, as this species has undergone approximately a 70% population reduction over the last three hundred years [16]. On Hainan Island, deforestation is greatly accelerated in the 20th century. About 80% to 95% of the primary forests were destroyed due to massive logging for timber, transitions to rubber tree and Eucalyptus plantations, as well as urban expansion [15,53]. The H. hainanensis populations would shrink proportionally, perhaps even more, due to the high quality of its timber. A population bottleneck was manifested by the qualitative graphical method based on the distorted distribution of allele frequencies in H. hainanensis populations (Fig 4). In non-bottlenecked populations near mutation-drift equilibrium, selectively neutral loci are expected to have a large proportion of alleles at low frequency. However, in recently bottlenecked populations, the expected distribution of allele frequencies would be distorted, with fewer alleles in the low frequency class (<0.1) than in one or more intermediate frequency classes [44,54]. The graphical method makes no assumption about the ploidy of the studied species [44] and thus could be used in cases where common bottleneck testing tools, such as BOTTLENECK [55], could not be applied. We found a large number of low-frequency alleles (<0.1) are maintained in populations of the two non-threatened Shorea species. The ratio of the number of alleles at low frequency (<0.1) to the number of alleles at intermediate frequency (0.1-0.2) is 11.35 for S. leprosula and 7.10 for S. macrophylla. However, the ratios are 3.86 and 1.0 for V. seychellarum and H. hainanensis, respectively. Obviously, many low-frequency alleles have been lost during recent bottlenecks in the populations of V. seychellarum and H. hainanensis. A higher proportion of low-frequency alleles are maintained in V. seychellarum than in H. hainanensis, which may arise from different sample sizes used for the two species. Since the graphical method requires only samples of 5 to 20 polymorphic loci and approximately 30 individuals, the SSR data used in this study should be adequate [44]. To conclude, the H. hainanensis populations on Hainan Island may have experienced a recent demographic bottleneck, resulting in relative fewer alleles at low frequency and lower genetic diversity detected in this species. The genetic diversity of H. hainanensis in Northeast Vietnam was recently evaluated using 10 SSR markers [56]. The populations collected in Northeast Vietnam (H e = 0.382, allelic richness = 2.08) and Hainan Island have similar levels of genetic variation (Table 3). A population bottleneck was reported and suggested to be responsible for the low genetic diversity of these populations [56].
When the size of a population is drastically reduced, genetic drift may override natural selection and act as the main evolutionary force, leading to a loss of genetic variation [3]. Moreover, inbreeding is inevitable in small populations [3,5]. The inbreeding level of a population can be quantified with the inbreeding coefficient (G is ). In this study, G is ranged from 0.138 to 0.408, higher than the G is estimates for many other dipterocarps [48][49][50]. The proposed bottleneck event in H. hainanensis and resulting population fragmentation likely produced small and isolated populations, in which enhanced inbreeding would be expected [3]. Significant deviation from HWE was detected in a large portion of the loci from the 10 study populations (Table 3). Nonrandom mating may be the main cause of the departure from HWE in those populations. In conclusion, the low genetic diversity, increased inbreeding and deviations from HWE found in the H. hainanensis populations on Hainan Island are typical of endangered species, and are likely the consequence of intensified inbreeding and genetic drift acting in small H. hainanensis populations.

Population genetic structure and differentiation
Three genetic groups were identified in H. hainanensis using both Bayesian model-based clustering and principal coordinate analysis (PCoA) (Figs 2 and 3). The assignment of individuals to genetic groups was consistent between the two methods. Additionally, the topology of a neighbor-joining tree was concordant with the clustering results from STRUCTURE and the PCoA (S1 Fig). The three genetic clusters were widely distributed across the mountainous area of Hainan Island, without any clear geographical pattern (Fig 1). In addition, a Mantel test did not detect a correlation between genetic divergence (as Slatkin's linearized F ST ) and geographical distance (S2 Fig), suggesting that genetic differentiation in H. hainanensis does not follow a pattern of isolation by distance. Such population genetic structure implies the existence of longdistance gene flow among populations before the species-wide bottleneck. Long-distance gene flow has been reported in Neobalanocarpus heimii, Dipterocarpus tempheses and several Shorea species [56,57]. Neobalanocarpus heimii is an emergent tree species endemic to the Malay Peninsula. Using paternity analysis, the average distance of pollen flow in N. heimii was estimated to be 191 m, while several pollination events were found to exceed 400 m [58]. In addition, seeds produced by N. heimii may be transported by squirrels. Interestingly, as with H. hainanensis, no positive correlation between genetic relatedness and spatial distance was detected in N. heimii. Long-distance gene flow via pollen and seed migration are likely responsible for the weak genetic structure in this species [58]. In summary, the widespread geographical distribution of genetic subpopulations and the failure to find isolation by distance in H. hainanensis are likely due to long-distance gene flow among populations before the species-wide bottleneck.
Endangered dipterocarp species are generally highly differentiated across populations based on the analysis of SSR markers. For example, the overall G ST across adult populations of the endangered V. seychellarum is 0.20 [23]. Habitat fragmentation and restricted gene flow were suggested as possible reasons for the strong genetic divergence of this species [23]. Genetic differentiation was quantified for a few Hopea species. A moderate level of differentiation between two geographically adjacent populations was revealed in H. bilitonensis (mean value of G ST is 0.116), which is an extremely rare and predominantly selfing dipterocarp in Peninsular Malaysia [24]. Two threatened Hopea species were also reported to have low to moderate levels of genetic differentiation (G ST = 0.009 for H. chinensis; G ST = 0.102 for H. odorata) [56]. Compared to the three above Hopea species, H. hainanensis showed higher overall genetic divergence (G ST = 0.229). The AMOVA analysis also revealed high differentiation among populations (F ST = 0.23). Population genetic theory predicts that under Island model of complete isolation, reduction in population size could promote divergence among populations [59], which is likely the reason of elevated differentiation observed in H. hainanensis. However, other factors, such as the number of generations after population isolation (t), geographic distance between sampled populations, mating system and evolutionary history also influence the level of population divergence [46]. In summary, further research is needed to explore the effect of habitat fragmentation on genetic differentiation across H. hainanensis populations.
Implications for conservation. Because the loss of genetic variation is a major threat to the survival of endangered species, an important conservation action is to preserve and restore genetic variation [3,60]. Three H. hainanensis populations, BW, FJ and DL, were found to have relatively lower levels of genetic variation. These three populations may therefore be more vulnerable to abiotic and biotic stresses and should be the focus of special conservation attention. Meanwhile, the BL, JF, JX, LM and QW populations had relatively higher levels of genetic diversity and contained more than one genetic subpopulation. These five populations could therefore serve as seed sources for the propagation of seedlings and saplings, to be used in the restoration of previously logged lowland rainforests on Hainan Island. The natural regeneration of H. hainanensis populations is challenging: seedlings and saplings grow very slowly and often fail to establish in the highly shaded understory. Therefore, selecting populations with high genetic diversity (e.g. BL, JF, LM and QW) to produce seedlings may facilitate the restoration of endangered H. hainanensis populations on Hainan Island.

S1 File. SSR genotypes of 76 H. hainanensis individuals used in this study.
(XLSX) S1 Table. The BIC scores of the four polysomic inheritance models implemented in Polygene version 1.2. Two options, "Consider negative PCR" and "Consider selfing", were combined along with optimal model selection.