Evolutionary History of a Desert Shrub Ephedra przewalskii (Ephedraceae): Allopatric Divergence and Range Shifts in Northwestern China

Based on two chloroplast DNA sequences, psbA-trnH and trnT-trnF, phylogeographical patterns of a desert shrub, Ephedra przewalskii, were examined across most of its geographic range in northwestern China. A total of sixteen haplotypes were detected. There was a common haplotype in each basin, that was haplotype A in Tarim Basin, haplotype G in Junggar Basin, and haplotype M in Qaidam Basin. Genetic variance mainly occurred among populations, geographic regions, and eleven geographic groups subdivided by SAMOVA analysis. E. przewalskii likely had a smaller and more fragmented geographic range during the Last Glacial Maximum, which was determined based on ecological niche modelling. Three groups of E. przewalskii populations were detected to have experience range expansion, and this was based on significant values of Fu’s FS, Tajima’s D, and unimodel mismatch distributions. The cold and dry climate during the glacial period of the Quaternary is postulated to have been a driver for significant genetic isolation and divergence among populations or groups in E. przewalskii, whereas the warmer and wetter climate during the interglacial period is speculated to have provided favourable conditions for range expansion of the species.

Our study did not concern Human Subject Research or Animal Research. We can confirm that all sample locations of field collection are public, whether national park nor other protected area of land, thus the collections are permissible, and the stem materials collected are not the endangered or protected species.

DNA extraction, amplification, and sequencing
Total genomic DNA was extracted from dried leaf tissue using a modified 2 × CTAB method [21,22]. After screening several gene segments, we chose two cpDNA non-coding spacers, trnH-psbA and trnT-trnF, to conduct our study. The two spacers were amplified and sequenced using the primers and protocols of Sang et al. (1997) and Taberlet et al. (1991) respectively [23,24]. Polymerase chain reaction (PCR) amplification for the two markers was performed using the reaction mix as follows: 2 mM MgCl 2 , 200 mM dNTP, 1 pmol primer, and 0.025 U mL -1 Taq polymerase (Takara), and the temperature profile of the amplification process is: 95°C for 1 min; 30 cycles of 94°C for 30 s; 52°C for 30 s; and 72°C for 1 min 30 s linked to 72°C for 10 min. Forward and reverse primers of the PCR amplification were also used in sequencing, conducted by the DYEnamic ET Terminator Kit (Amersham Pharmacia Biotech) at the Shanghai Sangon Biological Engineering Technology & Services Co., Ltd (Shanghai, China) with the use of an ABI-PRISM 3730 automatic DNA analyzer. Electropherograms were edited and assembled using SEQUENCHER, version 4.8. Sequences were aligned with CLUSTAL W [25] and refined by visual inspection.

Phylogeographical analysis and divergence time estimate
Phylogenetic analysis of the resulting alignments was carried out by maximum-likelihood (ML) in PAUP Ã version 4.0b10 [26], to investigate the phylogenetic relationships of the cpDNA haplotypes. Characters were weighted equally and their state changes were treated as unordered. We used Modeltest 3.7 [27] to search for the best nucleotide substitution model. The reliability of the ML trees was tested using 1000 bootstrapping replicates. We also used BEAST [28,29] to conduct Bayesian analyses to search for tree topologies and estimate divergence times between the lineages. In BEAST, we used a constant-size coalescent tree prior and the HKY substitution model in the analysis. Each indel was treated as a single mutation event and coded as a substitution in the Bayesian analysis [30]. The cpDNA substitution rates of most angiosperm species were estimated to vary between 1 and 3×10 −9 substitutions per site per year (s/s/y) [31]. Because of the uncertainty of the rates, we used normal distribution priors with a mean of 2×10 −9 and a SD of 6.080×10 −10 within the 95% range of the distribution to estimate the divergence times [32]. All parameters were sampled once every 1000 steps from 10 000 000 Markov chain Monte Carlo steps, after a burn-in of 1 000 000 steps. By visual inspection of plotted posterior estimates, convergence of the stationary distribution was examined using TRACER [29], and the effective sample size for each parameter sampled was found to be over 200. A network of all haplotypes was constructed using statistical parsimony [33] implemented in the program TCS v. 1.21, with a maximum connection limit equal to 14 steps [34]. For the phylogenetic analysis, we chose Ephedra monosperma and E. major as the outgroups of haplotypes of E. przewalskii, and for the network, we used E. monosperma as the outgroup.

Molecular variability and demographic analysis
Within-population diversity (H S ), total gene diversity (H T ), genetic differentiation (G ST ), and population subdivision of phylogenetically ordered alleles (N ST ) were estimated using HAP-LONST. Spatial analysis of molecular variance was performed using SAMOVA, version 1.0, to test the spatial genetic structure, defining groups that are geographically homogeneous and genetically differentiated [35]. The analysis was run from K = 2 to 44, with 100 random initial conditions for each run. Analyses of molecular variance (AMOVA) at different levels were performed to study the genetic structure of the populations [36].
Tajima's D, Fu's F S , and the mismatch distribution were calculated to test for range expansions [37,38,39,40]. A significant value for D or a significant, large negative value for F S , or an unimodal shape of the mismatch distribution may be the result of population expansion [38,41,42,43,44]. All expansion tests were performed in ARLEQUIN, version 3.01 [44], with 10 000 permutations to test for significance. If the sudden expansion model was detected, we then used the relationship τ = 2ut to calculate the expansion time (t) [44], where τ is the total number of mutations, and u is the mutation rate per generation for the whole analyzed sequence. The value of u was calculated as u = 2 μÁkÁg, where μ is the substitution rate per nucleotide site per year (s /s/y), k is the mean sequence length of the analyzed DNA region, and g is the generation time in years. We again used a mean substitution rate of 2.0 × 10 −9 to estimate the expansion times of SAMOVA groups. According to our initial observations, we assumed the generation time of this desert species to be 4 years.

Present and past distribution modeling
Based on the geographical locations of the Ephedra przewalskii populations included in this study (Fig 1, S1 Table), ENMs were carried out in MAXENT, version 3.2.1 [18] to predict potentially suitable past (Last Glacial Maximum; LGM) and present climate envelopes for the species. Geographical distributions was modeled for the present using the WorldClim dataset [45], and for the past, (LGM, approximately 21 kya before present), using the Community Climate System Model [46]. The climatic niche of the species was modelled using eight (out of 19) BIOCLIM variables screened by principal component analysis. These variables were the mean annual temperature, mean diurnal range, temperature seasonality, mean temperature of driest quarter, mean temperature of coldest quarter, annual precipitation, precipitation of the driest month and precipitation seasonality. This restricted dataset was used to avoid including highly correlated variables and prevent potential overfitting [47]. We performed ENMs using the default parameters of MAXENT and the user-selected features: regularization multiplier of 3.0, application of a random seed, duplicate presence records removal, and logistic probabilities used for output [48]. The area under the receiver operating characteristic curve (AUC) was calculated to evaluate model performance. We used 75% of localities to train the model and 25% to test the model. Values between 0.7 and 0.9 indicate good discrimination [49].

Haplotype geographical distribution and relationships
The geographic coordinates and the frequency of cpDNA haplotypes in each population, were presented in Fig 1 and S1 Table. Haplotype A was widespread in the north of Tarim Basin, Urumqi, and Turpan Basin, and beyond Hami Basin, it extended north to Junggar Basin and south to Hexi Corridor. Haplotype D was only found in the west of Tarim Basin. Haplotype E and F were found in the south of the basin, and merged with haplotype A in the east of the basin. Haplotype G was widespread in Junggar Basin, and some other rare haplotypes, such as H, I and J, were also found in the basin. Haplotype M was widespread in Qaidam Basin and its adjacent areas, and haplotype O was distributed in the center of the basin. Haplotypes A and M merged in the north of Hexi corridor and extended to Alxa Desert, whereas haplotype P was found 400 kms away in the south of Hexi corridor (population 45).
In the network (Fig 2), haplotypes D, E and F were in a centra position and therefore presumably ancestral. Haplotypes A was connected to D by one substitution, and haplotype G was connected to A by one substitution. Haplotype L was connected to A by one substitution, which in turn was connected to M by another substitution. Most haplotypes were arranged in the network as would be expected from their geographic arrangement. The exception was haplotype P located further in the south of Hexi corridor. Haplotype P was closely associated with haplotype D, located in the west of Tarim Basin.

Genetic diversity and genetic structure
In the spatial analysis of molecular variance, F CT values increased to a maximum of 0.8799 when the number of groups (K) was raised from 2 to 11. The subdivision pattern of populations corresponding to K = 11 was represented in Table 1.

Demography of the groups in Ephedra przewalskii
Statistics of Fu's F S , Tajiam's D, and the mismatch distribution were analysed for every subdivided group, and the results showed that significant values of Fu's F S , Tajiam's D, and unimodal distributions of the mismatch distribution were detected in groups 1, 6, and 10, and the total Evolutionary History of Ephedra przewalskii samples (S1 Fig; Table 3); thus, these groups and the total samples were speculated to have experienced recent range expansion in the past. The range expansion time of the groups was estimated to have occurred at approximately 83 Kya, and that of the total samples was approximately 53 Kya, which was consistent with the deglaciation period in the Last Interglaciation and the Interstadial of The Last Glaciation respectively [50].

Phylogenetic analysis and divergence time
Trees produced by ML and Bayesian analyses had the same topology, and thus only the ML tree was presented (Fig 3). In this tree, Ephedra przewalskii was resolved as monophyletic and included two main clades. The first clade consisted of haplotypes E and F with high support values (64%; 1.00), that was sister to the second clade including all other haplotypes except E and F, with a moderate support value (64%). The second clade consisted of an inner monophyletic clade (node 1, 51%) and two paraphyletic clades, haplotype D and P. In the inner monophyletic clade, haplotypes A, B, C, G, H, I, J, K, and L clustered into a clade with a moderate support value 0.67, and haplotypes M, N, and O clustered into a clade with a moderate support value 0.78. Divergence times at node 1 was estimated at 0.73 Mya (early-middle Pleistocene).

Past and present distribution of Ephedra przewalskii
The test AUC value for the ENM was very high (0.997), and the predicted range ( Fig 4A) represented the species' current distribution well, except for the east of Alxa Desert, an area of high habitat suitability, was absent ( Fig 4B). Range shrinkage in the LGM was showed by CCSM climate models.

Genetic variation in Ephedra przewalskii
Total gene diversity of Ephedra przewalskii was high compared with other desert shrubs in northwestern China, such as Reaumuria soongarica (H T = 0.607) and Ammopiptanthus mongolicus (H T = 0.434) [8,51], and this was consistent with the results of E. przewalskii in Qin et al. (2013) [11]. E. przewalskii probably originated in the late Tertiary [52], and the long evolution process in northwestern China has evidently accumulated the degree of genetic diversity [53,54]. E. przewalskii spread across almost the entire arid zone in northwestern China. The habitats varied in geology and topography, might harbour locally adapted genetic variations of the species. All these might help account for the high level of genetic diversity observed in the species.
High total genetic diversity but low within-population genetic diversity resulted in significant genetic differentiation among populations, which was also supported by AMOVA analyses. Several factors might account for the high level of genetic differentiation among populations and geographic groups. First, numerous geographic barriers within the    Evolutionary History of Ephedra przewalskii

Allopatric divergence in Ephedra przewalskii
As shown by the genetic variation distribution in Fig 1, there were no shared haplotypes between Xinjiang Province and Qaidam Basin. In the phylogeny, haplotypes in Xinjiang clustered into a clade, which was separate from those in Qaidam Basin (Fig 3). In addition, the main genetic variation occurred among populations, geographic regions, and the subdivided geographic groups, as was demonstrated by the AMOVA analysis (Table 2). Together, these data demonstrated a high genetic differentiation among populations, regions, and geographical groups.
In northwestern China, the generally dry climate could date back to the late-Miocene, and aridification promoted the formation of salt mineral resources in Qaidam Basin [57,58]. Since the Pliocene, uplifting of Kunlun Mountains and other ranges of the northern Tibetan Plateau has caused increased aridity in northwestern China because of the growth of rain shadows [59,60,61]. According to Qin et al. (2013) [11], divergence of lineages of Ephedra could be dated to the Middle or Late Miocene, and this was very likely linked to the uplift of the QTP and Asian aridification. The climate in northwestern China became cold during the Quaternary [62], and during this time, aridification was usually correlated with periodically cold glacial episodes [50,63]. From the early to middle Pleistocene, the dry climate developed continually, and deserts and gobi further expanded in northwestern China [64]. When it came to the middle Pleistocene, vast Taklimakan Desert has been formed. In Tarim Basin, haplotypes were distinct from the north to the south as well as from the west to the south (Fig 1). The genetic divergence between the north and south could have resulted from Taklimakan Desert in the center of the basin, a vast geographic barrier for gene flow exchanging between the populations on either side of the basin. Groups in the west rim were distant from those in the south of the basin, and the long distances between them could make it difficult for gene exchange. Northwestern China entered into the largest glaciation at approximately 0.8-0.6 Mya, and the climate was much colder and dryer beyond the extent of any former periods [50,64]. Extremely low temperatures during glacial periods was usually correlated with intraspecific differentiation for plant species in allopatric regions, such as Hippophae tibetana Schltdl. and Aconitum gymnandrum Maxim. distributed in the QTP, with the major lineages diverging in the early Pleistocene as the cooling climate drove speciation [65,66]. Aridification accompanied with cold temparatures during glacial periods was another driver on intraspecific or specific differentiation, such as Reaumuria soongarica (Pall.) Maxim. and Lagochilus Bunge ex Benth., distributed in northwestern China [8,67]. The main lineages of Ephedra przewalskii (node 1 in Fig 3) diverged approximately 0.73 Mya (early-middle Pleistocene), thus, we infer that the extremely cold and dry climate during this period have resulted in the contraction and fragmentation of the geographic range of the species. Consequently, the intensified barriers for gene flow led to genetic differentiation among Tarim Basin, Junggar Basin, and Qaidam Basin.
In addition, the extreme climate in early-middle Pleistocene might also cause extinct events of some genetic variations. Haplotype P (in the south of Hexi Corridor) was closely connected with haplotype D distributed far away in the west of Tarim Basin by two evolution steps in the network. We speculate that intermediate evolution variants between them have been extinct due to hard adaptation to the extreme climate during the long evolution process.

Range shifts in Ephedra przewalskii
Climate oscillation during the Quaternary is usually considered to have profoundly influenced the current geographic distribution patterns of plants [68]. The results of the ENMs suggested that the potential present distribution approximately included the existing sampled locations as well as others areas suitable for living, such as the north of Alxa Desert, from which the populations are currently absent. We speculate this was caused by long-term dispersal barriers over historical time. In CCSM models, the predicted distribution of the species at the LGM was much smaller than present. During the LGM, the climate was cooler and drier than it currently is, and the hostile environment would be expected to force species to suitable habitats, shrinking the distribution to a smaller range.
Although there is no palynological or macro-fossil evidence, there are signatures that support recent range expansion, including the statistically significant Fu's F S , Tajima's D, and unimodel mismatch distributions. As mentioned, the recent expansion events of the groups occurred during the interglacial period. After the glacial period, temperatures increased and the climate warmed, resulting in a greater amount of glacial ice melting from the surrounding high mountains. The increased run-off infused the basins making the habitats moister [64], which was more suitable for the species to expand their distribution areas into more suitable habitats. During the colonization into a new territory, strong genetic drift would happen in populations on the edge of expansion wave, and some alleles would spread wide areas and reach a high frequency, a process called genetic surfing [69,70]. Consequently the homogenization of the expansion populations would increased, inducing the new occupied areas into distinct sectors from the ancestral population [71,72]. Variants of haplotype A in Tarim Basin, G in Junggar Basin, and M in Qaidam basin might be such cases as this. Haplotype A spread along the north rim of Tarim Basin, and extended north to Junggar Basin and south to Hexi Corridor through Urumqi, Turpan Basin, and Hami Basin. Haplotype M, in contrast, spread into Qaidam Basin, and extended its range east to Hexi Corridor, and further to Alxa Desert. During the multiple processes of colonization, populations with genetic distinction from dispersed refugia would merge with each other, resulting in admixture of the genetic variations from distinct populations [73,74]. Examples of this colonization pattern appear to include: populations in Tarim Table. Sixteen haplotypes of Ephedra przewalskii recognized on basis of two chloroplast DNA sequences, trnH-psbA and trnT-trnF. (DOC)