Phylogeographic pattern suggests a general northeastward dispersal in the distribution of Machilus pauhoi in South China

Machilus pauhoi Kanehira is an important timber species in China. A provenance trial was recently set up to evaluate the growth performance of trees from different localities, with the aim of designing seed transfer guidelines. Here, we tested twelve nuclear microsatellite markers derived from other species of the Lauraceae family and investigated population genetic structure in M. pauhoi. Both the number of observed alleles per locus (Na) and the polymorphic information content (PIC) significantly decreased against the latitude, but showed an insignificant decrease against the longitude. Heterozygosity (Ho) and gene diversity (h) exhibited a weak correlation with geographic location. Private alleles were present in multiple populations, and a moderate level of population genetic differentiation was detected (Gst = 0.1691). The joint pattern of genetic diversity (Na, PIC, Ho, and h) suggests that general northeastward dispersal led to the current distribution of M. pauhoi. Significant but weak effects of isolation-by-distance (IBD) occurred, implicating the mountain ranges as the major barrier to gene flow. Both STRUCTURE and hierarchical clustering analyses showed three distinct groups of populations related to the physical connectivity among mountain ranges. A priority in designing genetic conservation should be given to the populations at the southwest side of the species’ distribution. This conservation strategy can also be combined with the pattern of adaptive genetic variation from the provenance trial for comprehensive genetic resource management of native M. pauhoi.


Introduction
Machilus pauhoi Kanehira is an important long-lived and evergreen broad-leaved tree species. It is naturally distributed in the subtropical and tropical regions in China, covering Anhui, Zhejiang, Jiangxi, Hunan, Fujian, Guangdong, and Guangxi Provinces (Fig 1)  primers from genetically related species is a low-cost, simple, rapid, and effective way to study species possessing low genetic variation. SSR primers have been developed in some other species of the Lauraceae family, such as M. thunbergii, M. pseudokobu, and Persea americana [28], [29], [30], [31]. Here, we tested the effectiveness of these SSR primers in M. pauhoi. Through analyzing population genetic structure, we inferred the potential processes responsible for the origin and evolution of the present natural populations of M. pauhoi. This background information in combination with data on the pattern of adaptive variation derived from a provenance trial can be used to produce a comprehensive genetic resource management plan for M. pauhoi.

Plant material
Mature fruits were collected from about 16 maternal parents that were located at least 50m apart in each of 24 natural populations in July 2014 (Table 1; Fig 1). Most samples were collected at 100-200m (population codes RH, SC, HT, HE, GR, HS, GT, FO, AS, GO, and NP) or below 100m (YC, JD, PY, ZP, YS, ZF, AF, RY, and JO) above sea level. Three populations were collected at 200-300m (XA, QN, and CY), and one population (LY) at an altitude of 329m above sea level. Annual mean temperature ranges from 16.1˚C to 20.5˚C (Table 1). Annual

SSR amplification and selection of optimum primers
SSR primers were tested according to previous reports on M. thunbergii, M. pseudokobu, and Persea americana from the Lauraceae family [28], [29]. Twelve pairs of SSR primers were screened to determine the sequence variability observed among 16 plants of M. pauhoi. These SSR markers are from nuclear genomes rather than organelle genomes, and the resultant amplicons are biparentally inherited. Table 2 lists the SSR primer sequences and the maximum number of observed alleles at each SSR locus. PCR amplification was carried out in a 25μL reaction volume that contained 1U Taq DNA polymerase (Takara Biotechnology Company Limited, Dalian, China), 1μL 20ngÁμL -1 template DNA, 2μL 25 mmolÁL -1 MgCl 2 , 0.5μL 10mmolÁL -1 dNTP, 0.5μL10mmolÁL -1 of each forward and reverse fluorescent primer, and 2.5μL10×PCR buffer (without MgCl 2 , 100 mmolÁL -1 Tris- HCl, pH 8.8 at 25˚C, 500 mmolÁL -1 KCl). PCR amplification was conducted using the following cycle procedure: an initial 3 minutes of denaturing at 95˚C, followed by 10 cycles of three steps (30 seconds of denaturing at 95˚C, 30 seconds of annealing at 60˚C, and 30 seconds of elongation at 72˚C). The next 20 cycles were then conducted with the following steps: 30 seconds of denaturing at 95˚C, 30 seconds of annealing at 55˚C, and 30 seconds of elongation at 72˚C. A final elongation step was set at 72˚C for 6 minutes. The PCR products were detected by capillary electrophoresis using an ABI 3730XL DNA analyzer after confirmation of PCR amplification on a 2% agarose gel.

Data analysis
Population genetic diversity was estimated using POPGENE1.32 [32], including the number of observed alleles (N a ) per locus, observed (H o ) heterozygosity, gene diversity (h), and polymorphic information content (PIC). Gene diversity (h) at a locus with k alleles was calculated as 1 À X k i¼1p 2 i (i = 1,..,k) [33]. The PIC at a single SSR locus was calculated as j wherep i is the estimate of the frequency of allele i (i = 1,..,k) [34]. GENEPOP version 4.3 [35] was used to test Hardy-Weinberg equilibrium (HWE) at each locus in each population and linkage-disequilibrium (LD) for all pairs of microsatellite loci in each population. Exact tests for HWE were conducted where the p-value in each test was the sum of the probabilities of all possible samples with the same allelic counts [33], [36], [37]. The significant level α for testing HWE and LD was adjusted by Bonferroni correction, approximately 0.05/24 = 0.0021 for HWE at each locus and 0.05/(12 Ã 11/2) = 0.0008 for LD for each pair of loci. According to the type of repeat motifs flanked by a pair of SSR primers [28], MICRO-CHECKER was applied to checking marker scoring errors due to stuttering, large allele dropout, and null alleles at each locus in each population [38]. Only those loci that were in HWE, absent of null alleles, and in linkage equilibria among them were included for later population genetic analyses. Population genetic differentiation was estimated using Weir and Cockerham's method [35], denoted by G st instead of F st since the number of alleles at a single SSR locus exceeded two alleles among individuals. The significance of G st from zero was tested through 1000 permutations of the genotype data using GENEPOP [35]. STRUCTURE2.3 [39] was used to further analyze population genetic structure based on the Bayesian method where each individual was assigned to distinct clustering groups with estimated probabilities. The optimum number of clusters (K) was identified after 10 independent runs for each K value ranging from 1 to 20, with a burn-in of 10,000 iterations followed by 100,000 iterations. We checked that the burn-in of 10,000 iterations was sufficient to ensure the stationary distributions of all estimated parameters, such as F st and log(alpha) where alpha is the parameter in the prior distribution (Dirichlet) of allele frequencies [39]. ΔK was calculated through the second-order changing rate of the likelihood function L(K) between adjacent K values, and the K value with a maximum ΔK was selected as the appropriate number of clusters [40].
Isolation by distance (IBD) was tested through the regression analysis of G st /(1-G st ) on the logarithm of the geographic distance [41]. Note that the geographical distance between two populations was calculated using the longitude and latitude coordinates (Table 1). A Mantel's test was also conducted to examine the relationship between G st and the geographic distance [42]. Pearson's correlations (r) were tested between genetic variation (means of N a , H o , h, and PIC) and geographic location (longitude, latitude, and elevation). The correlations were also tested between genetic variation and annual rainfall or annual mean temperature.
Nei's genetic distance was estimated to measure population genetic divergence [43]: 1=2 in which p lu1 and p lu2 are the frequencies of alleles u1 and u2 at the lth locus from populations 1 and 2, respectively. On the basis of gene frequency data where loci with HWD and null alleles were excluded in their resident populations, we applied pvclust package in R to examining the consensus genetic relationships (1000 bootstrapping samples) among twenty-four populations [44]. A principal coordinate analysis (PCoA) based on the correlation matrix of allele frequencies at multiple loci was conducted to investigate population genetic relationships [45]. PCoA was also conducted on the basis of the correlation matrix using longitude, latitude, elevation, annual rainfall, and annual mean temperature. The two principal coordinate analyses were then compared to view the relationship between comprehensive genetic and ecological components.

Results
The SSR primers previously reported in M. thunbergii, M. pseudokobu, and Persea americana were tested. The SSR primers from M. thunbergii and M. pseudokobu [30], [31] did not work well, but twelve SSR primers from Persea americana [28], [29] were polymorphic. These twelve SSR primer pairs produced a various number of alleles among loci, with a total of 181 alleles ranging from 7 (SHRSPa057) to 28 (LMAV15) ( Table 2; S1 Table for genotypes in detail). The alleles from each primer pair and the range for the number of observed alleles from the twelve primer pairs were unequally distributed among populations. There were 37 private alleles in total, and almost all populations had private alleles. For instance, the populations close to the east side of the Xuefeng Mountain had 4, 1, 3, 1, 4, and 2 private alleles in XA, GO, GR, GT, ZP, and HE, respectively. Populations SC, NP, LY, RH, ZF, and PY did not have private alleles. The number of observed alleles per locus (N a ) ranged from 3.333± 0.89 (AS) to 6.08± 2.81 (XA), with a mean of 5.02±2.20 alleles per locus ( Table 3). The N a values changed from 6.0 in Guangxi Province to 5.0 in Guangdong and Jiangxi Provinces, and to 3.0 or 4.0 in Fujian Province (Fig 1). No significant deviations from Hardy-Weinberg equilibrium (HWE) were found in 92.01% (265/288) of the tests in total. Only 7.99% (23/288) of the tests were in Hardy-Weinberg disequilibrium (HWD). The loci with significant HWD were in one to three populations, including SHRSPa009, SHRSPa012, LMAV03, LMAV25, LMAV34, and LMAV35. Loci SHRSPa021, LMAV15, and LMAV24 were in HWD in four, six and five populations, respectively. Population NP had HWD at four loci, while populations SC, CY, FO, LY, HE, RH, RY, JD, HS, and AS had HWE at all twelve loci. All other populations had HWD only at one to three loci.
Tests of linkage disequilibrium (LD) indicated that all SSR locus pairs were in linkage equilibrium in each population (data not shown here). These loci were independent from each other.
The results from MICRO-CHECKER analysis were generally consistent with those of HWE tests (Table 4). Twelve loci with significant HWD were detected to have null alleles, including locus SHRSPa021 in AF, ZP, RH, YS, ZF, and PY, SHRSP057 in HT, LMAV15 in GT, GR, and PY, LMAV24 in QN, and LMAV34 in YC. However, ten loci with significant HWD were detected to be absent of null alleles, including locus SHRSPa021 in GO, LAMV03 in QN, LAMV15 in JO and GO, LAMV24 in XA and GO, LAMV25 in AF, JO, and NP, and LAMV34 in AF. All these loci with significant HWD and null alleles were excluded in their resident populations in further population genetic analyses.
The average PIC was about 0.55±0.19, with the highest value in CY (0.67±0.11) and the lowest value in JO (0.41±0.23). Populations JO, YC, PY, FO, LY, and HS had the PIC values of less than 50%. Mean observed heterozygosity (H o ) in each population was close to the average gene diversity that is equal to the expected heterozygosity (H e ) under HWE ( Table 3).
Analysis of population genetic differentiation indicated that 16.91% of the total genetic variation at multiple SSR loci was distributed among populations. Genetic differentiation at individual loci was significantly different from zero (p-value<0.001; Table 5), with the G st values ranging from 0.0850 at the LMAV24 locus to 0.3146 at the SHRSPa010 locus. When the inverse number of migrants between populations inferred from G st /(1-G st ) was regressed on the logarithm of geographic distance, a significant relationship occurred at loci SHRSPa009, LMAV03, LMAV25, and LMAV35, but not at the remaining loci (Table 5). A multilocus analysis indicated that a significant but very weak IBD effect occurred (a = 0.1657 and b = 0.0536; p-value = 0.0005; R 2 = 0.0436; Fig 2). Mantel's test based on the multilocus G st and geographic distance matrices also supported the presence of weak IBD effects at multiple loci (the observed correlation r = 0.2661, p-value = 0.01).
Pearson's correlation tests indicated that genetic variation (N a , PIC, except H e and H o ) significantly decreased against latitude, but decreased insignificantly against longitude or elevation. Genetic variation did not significantly correlate with the annual rainfall or the annual mean temperature (Table 6). Analysis with STRUCTURE showed a maximum ΔK value at K = 3(ΔK = 307.66), indicating that the twenty-four populations were appropriately divided into three groups (Fig 3). Group I (in green) included JO, NP, and FO in Fujian Province. Group II (in red) included  When three groups of populations were mapped to the geographic locations (Fig 1), Group II    (Fig 4), the same as those derived from the analysis with STRUCTURE (Fig 3).  Analysis of principal coordinates supported the three distinct groups among the twentyfour populations (Fig 5A). The first two principal components jointly accounted for 70.63% of total genetic variation. Three populations in Group I (JO, FO, and NP) were clearly separated from the remaining 21 populations. Groups II and III could also be separated although they were more closely related, consist with the results in Figs 3B and 4. However, the three groups could not be delineated from the map of the first two principal components of ecological factors (jointly accounting for 41% of variation; Fig 5B), indicating discordant relationships between comprehensive genetic and ecological components.

Discussion
Analysis of genetic diversity at twelve nuclear SSR loci indicated that the number of observed alleles per locus (N a ) and genetic diversity (PIC) significantly decreased along the southwestto-northeast mountain ranges, but the average heterozygosity (H o ) and gene diversity (h) did not. These results imply that a general northeastward dispersal of M. pauhoi led to its present distribution in South China. Private alleles were present in multiple populations, and about 16.91% of genetic variation was distributed among populations. Both population structure and clustering analyses supported three groups of populations in the natural distribution of  Table 1). The first and second principal components accounted for 26.32% and 14.67% of variation, respectively.
The transferability of nuclear SSR markers may vary with species or families owing to the high mutation rate at SSR loci in eukaryote organisms. Although primer sequences are often not conserved even among genetically related species due to the high mutation rate, preliminarily screening of microsatellite primers from within the same genus or family may nevertheless yield useful marker loci. Here, we tested the effectiveness of interspecific transferability in M. pauhoi with the SSR markers previously derived from M. thunbergii, M. pseudokobu, and Persea americana. We observed that the SSR primers previously explored for M. thunbergii and M. pseudokobu [30], [31], did not work well in M. pauhoi, but some SSR primers from Persea americana worked well. Further, the absence of LD among loci implied that these loci were freely recombined if they were on the same chromosomes or on different chromosomes. Our results indicated that the SSR markers employed were available for investigating genetic variation in M. pauhoi.
The genetic diversity or heterozygosity in natural populations at the SSR loci that were different from those we examined was quite variable in Lindera glauca [46], Cinnamomum camphora [47], Ocotea species [48], Aniba rosaeodora [49], and Beilschmiedia roxburghiana [50]. As expected, the heterozygosity at microsatellite loci is higher than those at allozyme loci in woody plant species (H e = 0.148) [51]. However, the gene diversity (expected heterozygosity under HWE) in M. pauhoi was generally comparable to those in genetically distant species assayed with SSR markers, such as 0.68 for perennial plants, 0.65 for outcrossing plants, and 0.61 for wind pollinated plants [52]. The structure of perfect flowers enables the occurrence of a mixed mating system although the level of selfing is not quantified in M. pauhoi. Most populations exhibited random mating system (insignificant F is ), implying the pattern of predominant out crossing [17]. However, heterozygote deficit (F is >>0) was present at a few loci in several populations (Table 4). As was indicated before, some of these loci with HWD were detected to arise from the occurrence of null alleles [53], such as the SHRSPa021 locus in AF, ZP, RH, YS, ZF, and PY, and the LMAV15 locus in GT, GR, and PY. A few other loci with HWD were detected to be absent of null alleles, such as LMAV15 locus in JO and GO, and the LMAV25 locus in AF, JO, and NP. Reason for these variations needs a further clarification.
The routes for population origination of M. pauhoi inferred from the genetic diversity at SSR loci are complex. Both physical (mountain ranges) and non-physical factors (Quaternary glaciations and historical human activities) could shape the pattern of genetic diversity. The physical factors include the Xuefeng, Nanling, and Wuyi Mountains, and could create barriers to inter-population gene flow. The Xuefeng Mountain (1934 m above sea level at the highest peak) exhibited a southwest-northeast orientation. The Hengduan Mountains locating at the west side of the Xuefeng Mountain also exhibit a southeast-northwest orientation. The regions between the Hengduan and Xuefeng Mountains are the high plateaus, basins, and mountainous regions (1000-2000m above sea level). This mountain shelf significantly affects the climate at the west side of the Xuefeng Mountain, and produces adverse environments that block the westward dispersal of M. pauhoi. However, the east side of the Xuefeng Mountain ranges could likely provide some refugia during the Quaternary glaciation for plants that subsequently spread southward. This possibly explains the high allele richness (N a ) in the populations at the east side of the Xuefeng Mountain. Potential refugia or ancestral populations could likely exist around populations XA,GR, GO, GT, and ZP whose altitudes are about 52-226m above sea level (Table 1). These altitudes are much lower than the plateaus at the western side of the Xuefeng Mountain.
The Nanling Mountain ranges (1920m above sea level at the highest peak) exhibit a westeast orientation, and consist of three discontinuous and relatively small mountains. The gaps between discontinuous mountains allow the easy movement of plants from north to south or vice versa (Fig 1), forming a transitional region (aka, a biodiversity "hotspot", [54]) where tropical and subtropical vegetation was historically mixed [55]. However, the western section harboured more deciduous broad-leaved tree species and temperate flora species than the eastern section due to the impacts of Yuunan-Guizhou Plateau (the Xuefeng Mountain is a part of the Plateau) and the flora of Central China [55]. Some relic, archaic, primitive, and endemic floras (Cathaysian flora) were maintained along the Nanling Mountain ranges [56], suggesting the existence of likely refugia for plants, insects, and animals. Historically, human admixture moving from north to south regions or from south to north regions took place along the Nanling Mountain ranges [57]. This could include the activity of seed transference, such as seedlings of M. pauhoi for the ornamental use or seeds for producing cosmetic perfume [3], [5], [6], enhancing the spread of M. pauhoi. As a consequence, barriers to gene flow were weakened among populations in this region. Genetic diversity and the number of observed alleles per locus were relatively comparable among populations (average N a = 5.16 in Group II).
The Wuyi Mountain ranges (2158m above sea level at the highest peak) are continuously distributed in a southwest-to-northeast orientation. One viewpoint is that the Quaternary glaciation did not occur in the Wuyi Mountain [58] although the uncertainty about whether there was Quaternary glaciation in southern China remains to be clarified [59]. This is the same situation as in the Nanling Mountains There is no doubt that the Wuyi Mountain historically provided refugia for plants and animals [60]. A strong barrier to gene flow could exist across the Wuyi Mountain along the west-east route (Fig 1), resulting in low genetic diversity at the east side of the distribution of M. pauhoi. The eastward dispersal is weaker than the northward dispersal in the whole range-wide distribution (Table 6), and this is probably related to the strong barrier along the west-east route.
The preceding discussions suggest a general relationship between the genetic variation and the mountain barriers to gene flow in the distribution of M. pauhoi. A joint pattern for the correlations of N a , H o , h, and PIC with the geographic locations suggested a general northeastward dispersal ( Table 6). As in Ocotea ternra [12], seed dispersal in M. pauhoi is mainly mediated by birds although dispersal through other animals cannot be excluded. Seed dispersal by water transport could not be excluded as well because many local streams run in the southwest-northeast orientation between the Xuefeng and Wuyi Mountains. This may account for the relatively close genetic relationships among populations in Group III. Pollen flow by insects could be seriously affected by the mountain barriers from the west to east sides in the species' distribution, and hence its contribution to inter-population gene flow may be limited. As a whole, a relatively higher level of population genetic differentiation was produced in M. pauhoi than in other tropical trees [12]. The significant but weak effects of isolation-by-distance suggested that physical distance was not the main factor shaping population genetic structure. The presence of private alleles might suggest dispersal limitation. All these results supported the hypothesis that physical barriers rather than the geographic distance was the main reason that gene flow was impeded.
Populations CY, QN, LY, and YC were loosely joined with populations GO, XA, GR, and GT in Group II based on their Nei's genetic distances (Fig 4). The speciation time for M. pauhoi in the Lauraceae family is unknown since phylogenetic relationships among species are not studied. One possible explanation is that M. pauhoi could have been widely distributed in subtropical and tropical regions in ancient periods according to the fossil records of other species in the same family [9], [10], [11], [12], [13]. Population genetic differentiation could therefore have been small in M. pauhoi before glaciations occurred in the southern regions of the Nanling and Wuyi Mountains. The subsequent glaciations, especially the Quaternary glaciation, could significantly have shaped the genetic diversity in the northern regions, but not seriously affected populations CY, QN, LY, and YC in the southern regions of the species' distribution. In the later northeastward, or particularly the northward, dispersal [61], the previously close genetic relationships present before glaciations were maintained among the group of populations at the southeast side (CY, QN, LY, and YC) and the group of populations at the west side (XA, GO, GR, and GT) that were potentially saved in refugia. This explanation needs further evidence with additional genetic data collections.
Concerning the genetic management of M. pauhoi, a moderate level of population genetic differentiation (G st = 16.91%) suggests that a strategy of considering multiple populations is needed. As discussed above, complex routes of gene flow were potentially involved in maintaining contemporary population genetic structure in M. pauhoi. Previous studies indicated that seed dispersal by birds mainly contributed to inter-population gene flow in the genus Ocotea [12], [62], [63]. This could partially occur in the genus Machilus or some other genera in the Lauraceae family [29], [64]. IBD tests indicated that the physical mountains played an important role in impeding inter-population gene flow in the distribution of M. pauhoi. Furthermore, the historical human activities in the Nanling Mountain regions could facilitate gene flow. All these processes of gene flow deserve our attention in managing natural genetic resources of M. pauhoi.
Our field survey indicated that only a small proportion of the natural distribution of M. pauhoi is present in protected nature reserves. Most populations remain in a vulnerable status and are often faced with potential deforestation. Although M. pauhoi is more widely distributed than Persea chekiangensis C.B. and Persea bournei (Hemsl.) Yang in the Lauraceae family in South China, its genetic resource is rapidly eroded due to its extensive utility for timber. Our results indicate the potential evolutionary processes forming the present population genetic structure, and this provides useful reference for genetic resource management in M. pauhoi. The three groups of populations revealed by STRUCTURE and cluster analyses (Figs 3 and 4) imply a focus on these three clusters in practical management and genetic conservation. The relatively large population genetic differentiation and Nei's genetic distances between populations were partly attributable to high mutation rates at SSR loci, compared with those using allozyme markers [12], [19]. This might simplify practical manipulation in conserving genetic diversity because most populations in each of the three groups were loosely joined. From the distribution of genetic diversity and the northeastward dispersal, a priority should be given to the populations at the southwest sides of the distribution in conserving genetic variations (e.g., sampling for establishing a germplasm gene pool). Populations in the southwest sides could likely be more ancestral populations.
Note that the results from neutral microsatellite markers do not provide us with information on population adaptation across range-wide environmental sites [65]. SSR markers are usually considered as selectively neutral although strict tests were not conducted. Their neutrality might partially be reflected from the insignificant correlations between genetic and ecological (annual mean rainfall and annual mean temperature) components ( Table 6). The neutral evolutionary processes (mutation, migration, and genetic drift) can be used to interpret the observed pattern of genetic structure. Recent provenance trials from the two-year-old seedlings showed the presence of significant differences among populations [66], implicating adaptive variation among provenances. Although results for stable growth or other adaptive phenotypic traits among provenances are not available yet, the designation of seed zones is conventionally concentrated on the basis of the provenance trials [66], such as setting up breeding populations for multiple purposes or for combating climate changes. The population structure and the historical evolutionary route inferred from SSR markers can be combined with the information from provenance trials to develop a comprehensive scheme for future genetic management [67], [68].