Genetic diversity of the cultivated Salvia miltiorrhiza populations revealed by four intergenic spacers

For better understanding the genetic diversity and phylogeny of the cultivated Salvia miltiorrhiza populations, four intergenic spacer sequences, ETS, psbA-trnH, trnL-trnF, and ycf1-rps15 of the 40 populations collected from China were Polymerase Chain Reaction (PCR) amplified, analyzed both individually and in combination. Haplotype diversity analysis showed that the cultivated S. miltiorrhiza populations had a very rich genetic diversity and an excellent capacity to resist environmental pressure. The best-fit nucleotide substitution models for ETS, psbA-trnH, trnL-trnF, ycf1-rps15, and their combined sequences were HKY+I, T92, T92, T92+G, and T92+G, respectively; the nucleotide conversion frequency in the combined sequences was lower than the transversion, and the relatively high nucleotide substitution frequencies suggests its high genetic variability. Neutral tests showed that the spacer sequences of the populations conform with the neutral evolution model, and there has been no current expansion events occurred. Phylogeny analyses based on both the individual and the combined sequences showed that the 40 populations were clustered in two clades with a very similar topological structure. The discrimination rate of the combined sequence marker is significantly increased to 52.5% (21 populations) over the highest 35% (13 populations) by the single marker of ETS, though still inadequate but a big step forward. Further exploration of more DNA markers is needed. This study for the first time revealed the rich genetic diversity and phylogeny of the currently cultivated S. miltiorrhiza populations in China and provides novel alternative molecular markers for the genetic identification and resources evaluation of the cultivated S. miltiorrhiza populations.


Introduction
Salvia miltiorrhiza Bunge is a perennial medicinal plant of Salvia in the Labiatae family [1]. Its rhizomes have been traditionally and widely used as medicine in China, Japan, the United States and Europe to treat cardiovascular disorders, including atherosclerosis, high blood pressure, hyperlipidemia and stroke [2]. In the last decade, it has been also proved that S. miltiorrhiza has many other pharmacological effects, such as antioxidant [3], neuroprotective [4], anti-fibrosis [5], anti-inflammatory and antibacterial [6], and anti-tumor [7]; and its extensive application has been contributing greatly to patients care and human health. Although native to China, S. miltiorrhiza is also widely cultivated in other countries such as South Korea, Vietnam and Australia [8]. The pharmaceutically effective ingredients of S. miltiorrhiza mainly include fat-soluble tanshinones and water-soluble phenolic acids [9]. Genetic diversity, the genetic variability present within species that changes with time and space, is the product of recombination of genetic material by sexual reproduction, and by mutation of genes, genetic drift, and gene flow. It gives different physical attributes to the individual and capacity to adapt to stress, diseases and unfavorable environmental conditions. Low genetic diversity is an important factor affecting species survival, leading to endangerment or even to distinction. The genetic diversity at various levels provides the basis for the collection, preservation, utilization and evaluation. To ensure and maintain good levels of genetic diversity in crop populations is crucial. The genetic diversity at various levels provides the basis for the collection, preservation, utilization and evaluation of S. miltiorrhiza.
The external properties of S. miltiorrhiza cultivated in different producing areas are very similar and difficult to identify, and the yields and medicinal ingredient contents of different germplasm are also different, which affects the stability of the quality of S. miltiorrhiza medicinal materials and brings multiple difficulties to the standardized production of S. miltiorrhiza [10]. At present, traditional methods based on sources, morphological traits, physical and chemical and biological properties have been mainly used to identify the provenance of S. miltiorrhiza. They have played important roles in the identification and quality evaluation of Chinese medicinal materials, but also have their own shortcomings. Trait identification is difficult to identify materials with similar morphological traits, the microscopic identification requires the relevant professional knowledge, and the physical and chemical properties are easily affected by many environmental factors. In view of the limitations of traditional identification, new methods as supplements have been emerging to ensure more accurate and reliable identification. In recent years, molecular marker technology that reveals polymorphism at the DNA level has become a powerful tool for studying plant genetic diversity [11]. Both the chloroplast and nuclear gene markers are excellent for application for these purposes.
The nuclear ribosomal DNAs (nrDNAs) in higher plants are highly tandem repeating sequences consisted of the external transcribed spacer (ETS), the non-transcribed spacer (NTS), 18SrDNA, internal transcription spacer 1 (ITS1), 5.8SrDNA, internal transcription spacer 2 (ITS2), and 28SrDNA. The non-coding regions of nrDNA are hypervariable, due to the much less selective pressure they subjected. Sequence differences manifested in these regions have potentials for application in systematics, hybridization and population evolution researches among closely related groups [12]. Seven markers had been selected to barcode a wide range of medicinal plant species and their relatives, and found that ITS2 reached the highest discriminating rate of 92.7% at the species level among the 7 fragments [13]. The relatively less repetitive ETS evolves rapidly and has a high degree of polymorphism and variability, and is extensively used in the study of genetic variation, classification and phylogeny. For example, the phylogenetic study on the origin and spread of millet based on ETS detected both intra-and inter-species polymorphisms [14]; and analysis of the ETS of sycamore showed that all the Scandinavian populations composed of well-defined subgroups of the species' genes and had a spontaneous origin closest with other spontaneous populations from neighboring areas of the Nordic continent [15]. The genetic diversity of 39 Chinese cultivated and wild pepper populations from 8 provinces has been assessed by two nrDNA non-coding markers (ETS and ITS2) and found that despite its long-term cultivation, the genetic variation was relatively high existing mainly within each province rather than between provinces [16].
The chloroplasts of plants have higher evolving rates than mitochondria; its uniparental inheritance enables the researchers to target DNA markers in the genome [17]. Initially, functional genes for rubisco large subunit (rbcL), maturase K (matK), NAD(P)H dehydrogenase F (ndhF), the β-subunit of ATP synthase (atpB), the RNA polymerase B and C subunit (rpoB and rpoC) genes, and intergenic spacers between tRNA Leu and tRNA Phe genes (trnL-trnF), between D1 protein of PSII and tRNA His genes (psbA-trnH), and between CP47 protein and small subunit I genes of PSII (psbK-psbI) in the chloroplast genome were used individually for plant species identification. Later, it was proposed to use different combinations of these fragments. For example, the relatively conserved psbA [18] has been studied as markers in lake and marine ecosystems [19]; the second longest hypothetical chloroplast open reading frame1(ycf1) in the chloroplast genome [20] which has been proved to be essential for plant survival [21] has only recently begun to be explored in phylogenetic studies; the chloroplast rps15 gene encoding a ribosomal protein S15 (rps15), mediating the interaction between the subunits and interacting with mRNA and tRNA [22] plays certain role in the production of ribosomes [23]. It has been also used for phylogenetic studies [13].
Current researches on plant genetic geography have been under transition from individual sequences in the past to the joint analysis of the sequences from the chloroplast genome or nuclear ribosomal genes. For example, the trnH-psbA spacer and partial coding region of rbcL gene has been combined as two global terrestrial plant molecular genetic markers and provided the necessary versatility and species differentiation [24]. A higher species resolution has been obtained with the combination of matK + atpF-atpH + psbK-psbI [25]; the monophyly and placement of Lepechina in the Lamiaceae plants have been tested based on separate and combined datasets of cpDNA (ycf1, ycf1-rps15, and trnL-trnF) and the combined nrDNA (ITS and ETS) datasets [26]. The phylogeny and biogeographic history of Lamiaceae plants based on a combined sequence of 1144bp containing 519 mutation sites and 339 parsimony information sites from five plastid sequences (rbcL, rps16, rpl32-trnH, psbA-trnH and trnL-trnF) and two nuclear sequences (ITS and ETS) has been explored to interpret the disjunct distribution of Northern Hemisphere herbaceous plants [27]. The phylogeny of East Asian sage of the most comprehensive geographic location, taxonomy, and genetic sampling data has been reconstructed by two separate matrixes, one combined the two nrDNA (ITS, ETS), and the other combined the four cpDNA (psbA-trnH, ycf1-rps15, trnL-trnF, and rbcL) [28].
The chloroplast genome of S. miltiorrhiza has been sequenced and its detailed structure provides an excellent basis for systematic analyses [29]. Its more variable non-coding regions could be readily used for intra-and inter-specific phylogenetic studies.
As one of China's traditional bulk medicinal materials, S. miltiorrhiza has been playing increasingly important roles in clinical and health-care utilization. However, the random introduction of unidentified source varieties in production and nonstandard management have resulted in a chaotic variety sources and unstable quality; and studies concerning the genetic diversity and molecular identification of the germplasm resources have been also inadequate. So far as we known, no reports on the genetic diversity of the cultivated S. miltiorrhiza populations have been seen. In this study, the nuclear ETS and three chloroplast intergenic spacers of the 40 cultivated S. miltiorrhiza populations in China were for the first time analyzed in order to identify the SNP fingerprints, understand the genetic diversity and phylogeny both individually and in combination. A basis for the protection, introduction, domestication, novel variety selection, and breeding of the cultivated S. miltiorrhiza germplasm resources, was provided.

Sampling
Seeds of 40 cultivated S. miltiorrhiza populations representing more than 30 regions of China were collected from three major seed industries (Table 1). Uniform seeds preliminarily selected were used for sowing in Southwest University Agricultural Station, Chongqing; and leaves of morphologically representative single plants of each population were used for extraction of genomic DNAs.

Methods
Genomic DNA extraction. The total genomic DNA of S. miltiorrhiza was extracted by the CTAB method [30], the purity was confirmed by 1% agarose gel electrophoresis, and was stored at -20˚C.
Primers and PCR amplification. Well-documented 4 primer pairs respectively for the 4 intergenic spacers (Table 2) were adopted. The spacers were PCR amplified in a 25μl reaction system consisted of 2×SanTaq PCR Mix (Sangon, Shanghai, China) 11μl, primers (10μmol/L) each 1μl, DNA template 1μl, ddH 2 O 11μl, with the program of pre-denaturation at 94˚C for 5min, denaturation at 94˚C for 30s, annealing for 30s, extension at 72˚C for 30s, 35 cycles, and a final extension at 72˚C for 10min. Products were electrophoresed by agarose gel, purified and then bidirectionally sequenced by dideoxy chain termination (Sangon, Chengdu, China).
Data processing. The manually checked and confirmed sequence data were subject to BLAST analysis and a corresponding reference sequence with the highest identity percentage for each locus was determined and used for manual finishing and multiple alignment with BioEdit 7.09 and Vector NTI Advance 11.5.3, respectively. The best nucleotide substitution models were identified via the Find Best DNA/Protein Models in MEGA 7.0 [35], and the Table 1 displayed substitution frequencies were used to make histograms with Excel. MEGA 7.0 was used for genetic evolution analysis and phylogenetic tree construction; unrooted phylogenetic trees with a cut-off value of 50% for consensus were constructed with MEGA 7.0 based on Neighbor-Joining (NJ) with a bootstrap value of 1000. DnaSP v5 was used for analyses of polymorphic sites, genetic diversity index, and neutral detection [36].

General features of the intergenic spacers
The four intergenic spacer sequences of ETS, psbA-trnH, trnL-trnF and ycf1-rps15 of the 40 cultivated S. miltiorrhiza populations were assorted and submitted to GenBank, and the corresponding accession numbers were obtained (S1 Table).  (Table 3). Based on the matrix length of ETS, psbA-trnH, trnL-trnF, ycf1-rps15, or the combined sequence (Table 3), the parsimony informative sites accounted for 46.6, 19.5, 7.4, 19.3, and 24.7% respectively, indicating that the ETS of the cultivated S. miltiorrhiza populations had the highest in parsimony information site rate, the combined sequence, second; the chloroplast genome psbA-trnH and ycf1-rps15, lower and comparable, and trnL-trnF was the least.
In the studying of the processes of gene evolution, researchers have proposed different nucleotide substitution models to demonstrate the DNA substitution process. By MAGA 7.0 using the maximum likelihood method, the best nucleotide substitution models for the four DNA molecular markers under this study, ETS, psbA-trnH, trnL-trnF, ycf1-rps15, and the Table 2. Primers used in this study.

Nucleotide variation frequency analysis of the intergenic spacers
Statistics on the nucleotide substitution frequencies (Fig 1) showed that for the nuclear ribosomal ETS sequences, the conversion frequency (40.1%) is lower than the transversion (59.9%), while for the other three intergenic spacers of the chloroplast genome, conversion frequencies were higher than those of transversion. The conversion and transversion frequencies of the combined sequences were 43.4% and 56.8% respectively, in good accordance with the those of the three non-coding regions of the chloroplast genome.

Genetic diversity analysis of the intergenic spacers
Dna SP 5.10 was used to analyze the ploidy polymorphism and neutrality detection, and the haplotype, haplotype diversity (Hd), nucleotide diversity (π), variance of haplotype diversity (Vh) and standard deviation of haplotype diversity (Sh) of the four intergenic regions and the combined sequence. The results of neutral analysis showed that the Fu and Li's D � and F � test   Table 4). The results suggested that the three intergenic spacers, ETS, psbA-trnH, ycf1-rps15, and the combined sequence of the cultivated S. miltiorrhiza populations conform with the neutral evolution at the species level, while the trnL-trnF does not.

Evolutionary relationship of the intergenic spacers
Unrooted phylogenetic trees with a cut-off value of 50% for consensus based on the four intergenic spacers (ETS, psbA-trnH, trnL-trnF, and ycf1-rps15) of the 40 cultivated Danshen populations were constructed. The phylogenetic trees based on ETS and psbA-trnH of S. miltiorrhiza were similarly shown a two-clade structure: for ETS, 28 populations clustered as one, and the  remaining 12 populations clustered in the other clade (Fig 2A), and for psbA-trnH, 29 populations clustered in one clade, and the remaining 11 in the other (Fig 2B). The phylogenetic trees based on the trnL-trnF and ycf1-rps15 of S. miltiorrhiza showed a very similar structure in that the same 30 populations clustered in one clade, and the remaining 10 clustered in the other (Fig 2C and 2D). The phylogenetic tree based on the combined sequence of S. miltiorrhiza showed that 40 populations clustered in 2 clades, among which 29 populations are clustered in one clade, and the remaining 11 populations clustered in the other (Fig 2E), very similar to psbA-trnH (Fig 2B). The four intergenic spacers and the combined sequence of the 40 cultivated S. miltiorrhiza populations all showed a similar two-clade topological structure in the phylogenetic trees with only slight differences.

Discussion and conclusion
For phylogenetic and genetic diversity studies, the key is the selection of suitable genetic markers [37]. It has been suggested that DNA barcodes are used for species identification by sequencing the standard regions of DNA [38]. Although the ribosomal DNA contains very rich genetic information and undergoes a high rate of evolution, it is often prone to recombination and heterozygous due to the biparental inheritance. It is difficult to find single-or lowcopy nuclear genes with sufficient variation [39]. Thus it is often confronted with the problem of orthology or paralogy in pedigree geography studies. In contrast, chloroplast DNA is generally maternally inherited in most angiosperms [40] and its non-coding regions are subject to low selective pressures with no gene recombination, which can unambiguously reflect the pedigree history, and has gradually been widely used in studies of plant genetic structure evaluation, population genetic diversity, phylogeny and pedigree geography [41]. The nuclear ribosomal ETS evolve rapidly and have high polymorphism and variability, and play important roles in the studies of genetic variation, classification and phylogeny. However, it is difficult to find suitable universal primers to amplify the entire region of ETS. So there are relatively few related reports in systematic studies [42].
In this study, the four intergenic spacers of ETS, psbA-trnH, trnL-trnF and ycf1-rps15 of the 40 cultivated S. miltiorrhiza populations in China were successfully amplified by PCR. Based on the matrix length, the number of parsimony informative sites accounted for 205, 71, 24, and 88, respectively ( Table 3). The results indicate that, among single markers, the ETS of the cultivated S. miltiorrhiza populations were the highest number of parsimony information sites. The PIS rates (Table 3) were generally consistent with the study of the phylogenetic relationship and haplosystem of Lamiaceae plants [26], which showed parsimony information rates of 39.8, 23.95, and16.67%, respectively, for the ETS, ycf1-rps15, and trnL-trnF, markers. The lower PIS of the chloroplast IGS than that of ETS demonstrated both in our research and in haplosystem of Lamiaceae plants [26] indicate that ETS within the nuclear genome are more varied, more informative, and more discriminating than those IGS in the chloroplast genome (psbA-trnH, trnL-trnF, and ycf1-rps15). However, the number of parsimony informative sites of the combined sequence was even higher (392), reaching a discrimination rate of 52.5%.
It is generally accepted that for most of the common species, the more widely distributed, the higher its genetic diversity, while the genetic diversity of endemic, rare or narrowly distributed species is low, and the low genetic diversity is an important factor leading to species endangerment or even extinction [43]. In this study, the genetic diversity analysis of 40 cultivated populations of S. miltiorrhiza showed that the Hd of ETS, psbA-trnH, trnL-trnF and ycf1-rps15 sequences were all greater than 0.5, indicating that S. miltiorrhiza has very rich genetic diversity at the species level, which is consistent with the result based on EST-SSR markers for S. miltiorrhiza [44]. The Hd of the combined sequences is 0.950, much higher than that of the four individual intergenic spacers, suggesting the better reflection of genetic diversity of the cultivated S. miltiorrhiza populations by multi-sequence combination, and could be important in evaluating its genetic variation and adaptability. The best-fit nucleotide substitution models for ETS, psbA-trnH, trnL-trnF, ycf1-rps15 and the combined sequence are HKY+I, T92, T92, T92+G and T92+G respectively. The lower conversion versus transversion frequency of the joint sequences is consistent with all the three IGSs from the chloroplast genome. The relative high nucleotide substitution rates found in this research might be the reason for the high genetic variation among the cultivated S. miltiorrhiza populations, as suggested [45]. Results from Fu and Li's D � and F � test statistics, together with the Tajima's D neutral test show that the ETS, psbA-trnH and ycf1-rps15 all conform with the neutral evolution model, while trnL-trnF does not. The combined sequence is not significant at the level of P>0.10, and conforms with the neutral evolution model. It is suggested that the cultivated S. miltiorrhiza populations has not recently experienced expansion events.
It was also found, in this research, that the phylogenetic trees based on both the individual and the combined sequences of the four intergenic spacers of the 40 cultivated populations of S. miltiorrhiza exhibited topologically very similar two-clade structures. For single sequences, ETS has the highest identification rate of 35.0%, and the trnL-trnF sequence has the lowest identification rate of 10.0%. The combined DNA marker has an identification rate as high as 52.5%. But still, they could not discriminate all the tested populations. Further endeavor to explore more and effective DNA markers for the identification of more reliable composite molecular markers is needed so that convenient, fast, efficient and standard identification at the gene level could be established.
Supporting information S1