DNA barcoding for the efficient and accurate identification of medicinal polygonati rhizoma in China

Polygonati rhizoma (PR), a traditional medicinal and edible product with various bioactive components (Polygonatum polysaccharides, saponins, phenols, and flavonoids), is widely consumed in China. However, other species with morphological characteristics similar to those of the actual components are being used to replace or adulterate PR, causing issues with quality control and product safety. The morphological similarity of PR and its substitutes makes classic morphological identification challenging. To address this issue, DNA barcoding-based identification using ITS2 and psbA-trnH sequences was applied in this study to evaluate the efficiency and accuracy of this approach in identifying PR samples collected from 39 different regions in China. The identification of PR by this method was confirmed by other methods (phylogeny-based and character-based methods), and all the samples were clearly and accurately distinguished. This study highlights the efficient and accurate nature of DNA barcoding in PR identification. Applying this technique will provide a means to differentiate PR from other altered formulations, thus improving product quality and safety for consumers of PR and its products.


Introduction
Polygonati rhizoma (PR) is a medicinal and edible product recognized by the China Food and Drug Administration. PR has been used during famine [1] and by Chinese Taoists and Buddhists for sustenance during extended fasting [2]. PR is composed of the dry rhizomes of Polygonatum sibiricum F. Delaroche, Polygonatum cyrtonema Hua, and Polygonatum kingianum Coll. et Hemsl., all of which are perennial herbs belonging to the Asparagaceae family [3]. These three species contain multiple bioactive components, such as Polygonatum polysaccharides [4], saponins [5], phenols [6], and flavonoids [7], which have various biological functions. Records in the Compendium of Materia Medica [8] describe PR as sweet and non-toxic. In China, people use PR as both a healthcare product and a food ingredient in daily meals. It appears to tonify the spleen and kidney, moisten the lungs, and quench thirst [9], according to the primary lung, kidney, and spleen channel classifications in traditional Chinese medicine theories. Moreover, modern pharmacological studies have shown that PR improves immunity [10] and memory [11]; reduces blood glucose [12] and fat levels [13]; and elicits antibacterial [14], antiviral [15], anticancer [16], and anti-aging [17] effects. As it has important edible value in both raw and processed forms, PR has been used to make various foods, such as cakes, biscuits, succade, preserved fruit, pastes, bread, tea, drinks, and wine, as well as products like soaps and shampoos.
Unfortunately, numerous rhizome species with similar morphological traits distributed in the southwest region of China are misused as PR components. These include the rhizomes of Polygonatum filipes Merr., Polygonatum punctatum, Polygonatum cathcartii Baker, Polygonatum verticillatum (L.) All., Disporopsis longifolia, and Gelsemium elegans (Gardn. & Champ.) Benth. Thus, efficiently and accurately distinguishing PR from other rhizome species is essential to assure product quality and consumer safety. The components of PR have particular characteristics, including time-consuming methods for breaking the dormancy [18] and growing the seedlings [19], as well as variable growth environments, all of which make classical identification techniques using herb morphology and biochemistry inefficient and often inaccurate. Therefore, other methodologies are required to effectively identify PR components and distinguish them from non-PR herbs. Molecular identification based on DNA barcoding is a recent tool that has been used to conduct species-level identification [20]. As an effective complement to traditional identification methods, DNA barcoding identification classifies species based on standardized, relatively short DNA sequences, which differ among species and are consistent regardless of environment. To date, DNA barcoding has been used to identify various organisms, such as lichens [21], fungi [22,23], weeds [24], trees [25][26][27][28], and economically important plants such as crops [29] and medicinal and aromatic plants [30][31][32][33][34][35]. The DNA barcoding system and principles established by Chen [36] can be used to clearly identify Chinese medicinal materials found in various forms, including multi-component medicines, medicinal powders [37] and fragments [38], and the original herb [39].
This system uses the ITS2 and psbA-trnH sequences as the main and auxiliary sequences, respectively, and has successfully identified Dendrobium [40], Polygonaceae [41], Rosaceae [42], Araceae [43], and Fabaceae [44]. However, this DNA barcoding system has not been used to identify the components of PR.
In this study, we applied a DNA barcoding system with ITS2 and psbA-trnH sequences to identify PR samples collected from southern China. To validate this DNA barcoding-based method, identification using phylogeny-based and character-based methods was also conducted. To our knowledge, this is the first report demonstrating the use of this ITS2/psbA-trnH-based DNA barcoding system to efficiently and accurately identify PR. Importantly, our findings have immediate practical implications for the application of DNA barcoding to molecularly identify herbal medicines irrespective of their material state.

Plant materials
In the present investigation, 39 samples were collected from different regions in 11 different provinces of China (Tables 1 and 2). Approximately 100 rhizomes were selected from each region and were maintained in PR GAP base (Buchang Pharma, Lveyang, Shaanxi, China).

DNA extraction
Fresh rhizome tissue samples from each region were disinfected with 75% alcohol, frozen in liquid nitrogen, and preserved at −80˚C. Total genomic DNA was extracted following the Doyle and Doyle method with minor modifications [45]. DNA was isolated from 0.5 g of rhizome tissue. Purified total DNA was quantified using a NanoDrop 2000 instrument (Thermo Fisher Scientific, Waltham, MA, USA). The DNA samples were then diluted to 30 ng/μL and stored at −20˚C until ITS2 and psbA-trnH analysis.

ITS2 and psbA-trnH screening and amplification
Previously published ITS2 and psbA-trnH primers [32] were synthesized by Shanghai Sangon Biological Engineering Technology and Services (Shanghai, China). PCR amplification was performed with a Veriti 96-well Thermal Cycler (Applied Biosystems). Amplification reactions were conducted in 20 μL reaction volumes in 1.5 mL microfuge tubes with 10.0 μL of 2× Es Taq MasterMix (CWBIO, China), which contains Taq DNA polymerase, 2× Taq PCR buffer, 3 mM MgCl2, and 400 μM dNTP mix, along with 1.0 μL of 1 μM primer, 1.0 μL of 30 ng/μL DNA template, and 7.0 μL ddH2O (CWBIO). The PCR amplification procedure was set as follows: an initial denaturation of 5 min at 94˚C; 30 cycles of 1 min denaturation at 94˚C, 1 min annealing at 55˚C, and 1.5 min extension at 72˚C; and a final extension for 7 min at 72˚C. All amplification products were separated by electrophoresis on a 1.5% agarose gel using 1× TBE buffer. The gel was stained with ethidium bromide and visualized with a gel documentation system (Bio-Rad Universal Hood II).

Data collection and analysis
PCR-amplified products with high reproducibility and a clear single target band were recovered, and each product was sequenced by Shanghai Sangon Biological Engineering Technology and Services using a bidirectional sequencing method with the amplification primers. Sequences were proofread and spliced with DNAMAN 5.0 software, and the low-quality sequences and primer regions were removed. We queried each sequence with NCBI BLAST, and the top search hit was used as the reference sequence. Multiple sequence alignment using the ClustalW program, variable site analysis using the Data Explorer program, and genetic distance (GD) calculations using the Find Best DNA Models program were all conducted with MEGA 6.06 software. We used the nearest distance method to determine the "best close match" for species identification, using 95% intraspecific distance as the threshold [46]. Finally, a neighbor-joining (NJ) tree, which is closely related to the homology of the sequences, was estimated with the model selected by MEGA 6.06 using the GDs between the sampled sequences and the reference sequences calculated by the Compute Pairwise Distance program. A total of 1,000 bootstrap replicates were used to test the phylogeny for species identification. To further test the NJ tree, a maximum-likelihood (ML) tree was also estimated with the same models.
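The "best close match" step described above can be illustrated with a minimal Python sketch. It is not the MEGA 6.06 workflow used in this study: it uses a simple uncorrected p-distance instead of the model selected by MEGA, the function names are hypothetical, the sequences are toy data, and the threshold rule (the distance at or below which 95% of intraspecific distances fall) is an assumed reading of the criterion cited in [46].

```python
from typing import List, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') are skipped.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("sequences share no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def threshold_95(intraspecific: List[float]) -> float:
    """Distance at or below which 95% of intraspecific distances fall (assumed rule)."""
    if not intraspecific:
        raise ValueError("no intraspecific distances supplied")
    ordered = sorted(intraspecific)
    k = -(-95 * len(ordered) // 100)  # ceil(0.95 * n) with integer arithmetic
    return ordered[k - 1]


def best_close_match(
    query: str, references: List[Tuple[str, str]], threshold: float
) -> Tuple[str, List[str], float]:
    """Classify a query as 'match', 'ambiguous', or 'no_match'.

    references is a list of (species, aligned_sequence) pairs.
    """
    dists = [(p_distance(query, seq), species) for species, seq in references]
    best = min(d for d, _ in dists)
    if best > threshold:
        return "no_match", [], best
    nearest = sorted({species for d, species in dists if d == best})
    status = "match" if len(nearest) == 1 else "ambiguous"
    return status, nearest, best


# Toy reference set (hypothetical sequences, not data from this study).
references = [
    ("species A", "ACGTACGTAC"),
    ("species B", "ACGTTCGTAA"),
    ("species C", "TTGTTCGAAA"),
]
print(best_close_match("ACGTACGTAC", references, threshold=0.05))
print(best_close_match("ACGTACGTAA", references, threshold=0.15))
```

A query that is equally distant from two or more species is reported as ambiguous rather than forced into one species, which mirrors the equal-GD cases reported for some samples in this study.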

Phylogenetic analysis
DNA barcoding sequences of species in Polygonatum and other outgroups (Heteropolygonatum, Asparagus, and Zingiberaceae) were searched for in NCBI's GenBank, which includes deposited data from other researchers and institutions. We collected all the sequences returned by these NCBI BLAST searches and sorted them by the initial letter of their specific name. Two alignments were generated with the sequences within and among species. Sequences were aligned using ClustalW in the MEGA 6.06 software, which first performs pairwise alignments to estimate the relationships between sequences and then builds the multiple sequence alignment using a progressive approach. Thus, the GD among species precisely characterizes the genetic relationship among them.
The best-fit model of evolution and an optimal data-partitioning scheme were chosen using the Find Best DNA/Protein Models program in MEGA 6.06, with each codon position being chosen as an a priori data subset. ML was used as the statistical method using partial deletion gaps, with 95% site coverage as the cutoff for Gaps/Missing Data Treatment and moderate as the Branch Swap Filter. GDs were calculated using the Compute Pairwise Distances program with the Substitution Model selected and partial deletion gaps with 95% site coverage set as the cutoff. An ML tree was constructed in MEGA 6.06 using the models with the following settings: partitioning scheme selected, "Nearest-neighbor-interchange" chosen as the ML heuristic method, "Make initial tree automatically" as the initial tree for ML, and "Moderate" as the branch swap filter. A total of 1,000 bootstrap replicates were used to test the phylogeny illustrating phylogenetic relationships among species. The bootstrap values were used to assess the reliability of each branch of the evolutionary tree.
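As an illustration of the partial-deletion setting mentioned above, the following Python sketch drops alignment columns in which fewer than 95% of the sequences carry an unambiguous base and then computes a pairwise distance matrix. It is a simplified stand-in, not MEGA's implementation: the function names are hypothetical, the distance is an uncorrected p-distance rather than the substitution model selected in MEGA 6.06, and sequences are assumed to be uppercase and already aligned.

```python
from typing import List


def partial_deletion(alignment: List[str], coverage_pct: int = 95) -> List[str]:
    """Keep only columns where at least coverage_pct % of sequences have A, C, G, or T."""
    n = len(alignment)
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    keep = [
        i
        for i in range(length)
        if 100 * sum(seq[i] in "ACGT" for seq in alignment) >= coverage_pct * n
    ]
    return ["".join(seq[i] for i in keep) for seq in alignment]


def p_distance_matrix(alignment: List[str]) -> List[List[float]]:
    """Symmetric matrix of uncorrected p-distances; remaining gaps are skipped pairwise."""
    n = len(alignment)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            pairs = [
                (x, y)
                for x, y in zip(alignment[i], alignment[j])
                if x in "ACGT" and y in "ACGT"
            ]
            d = sum(x != y for x, y in pairs) / len(pairs) if pairs else float("nan")
            matrix[i][j] = matrix[j][i] = d
    return matrix


# Toy alignment (hypothetical, not data from this study).
toy = ["ACGT-A", "ACGTTA", "AC-TTA", "ACGTTC"]
filtered = partial_deletion(toy, coverage_pct=95)
print(filtered)
print(p_distance_matrix(filtered))
```

With only four toy sequences, any column containing a gap falls below the 95% cutoff and is removed; with larger alignments the cutoff tolerates a few gapped sequences per column.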

Topology tests
There are two types of errors in phylogenetic trees: topological errors and branch length errors. Topological difference tests are needed to determine tree reliability, and branch length errors can also be tested using bootstrap tests. When the number of sequences is large and the extent of sequence divergence is low, it is generally difficult to reconstruct the true tree by any method. However, the bootstrap consensus tree often gives a reasonably good tree. Although weakly supported interior branches might differ, the bootstrap consensus tree obtained with the NJ method is usually similar to that obtained with the ML method. Therefore, we chose the NJ method to reconstruct the NJ tree and verify the topological tree constructed with the ML method.
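One simple way to compare two tree topologies, such as an NJ tree and an ML tree built from the same samples, is to list the clades each tree contains and count those that are shared or unique. The Python sketch below assumes rooted trees written as nested tuples of hypothetical sample labels; it is an illustration in the spirit of a rooted Robinson-Foulds comparison, not the procedure run in MEGA 6.06.

```python
from typing import FrozenSet, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the set of clades (leaf sets under each internal node) of a nested-tuple tree."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def compare_topologies(tree_a: Tree, tree_b: Tree) -> Tuple[int, int]:
    """Return (number of shared clades, number of clades found in only one tree)."""
    a, b = clades(tree_a), clades(tree_b)
    return len(a & b), len(a ^ b)


# Hypothetical trees over five sample labels.
nj_tree = ((("S1", "S2"), "S3"), ("S4", "S5"))
ml_tree = ((("S1", "S3"), "S2"), ("S4", "S5"))
print(compare_topologies(nj_tree, ml_tree))
```

A result with zero unique clades means the two trees have the same rooted topology; differences confined to weakly supported interior branches would show up as a small number of unique clades.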

Character-based tests
According to our BLAST searches, Polygonatum species have low psbA-trnH diversity. This makes distance-based identification more challenging, because a query sequence can have a nearly identical distance to multiple, different reference sequences. To circumvent this issue, we used the previously published character-based identification key named "characteristic attributes" (CAs) [47], which consists of 14 nucleotide characters (Tables 3 and 4) at specific positions across the psbA-trnH barcoding region that were used to identify PR species. This CA system was developed from an alignment of 32 reference sequences with diagnostic states that are specific to each PR species.
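The character-based lookup described above can be sketched in Python as follows. The positions, sequences, and species labels in the example are hypothetical toy values, not the 14 diagnostic positions or the 32 reference sequences used in this study, and the function names are illustrative only.

```python
from typing import Dict, List, Set


def extract_ca(aligned_seq: str, positions: List[int]) -> str:
    """Concatenate the nucleotides found at the given 1-based alignment positions."""
    return "".join(aligned_seq[p - 1] for p in positions)


def build_key(references: Dict[str, List[str]], positions: List[int]) -> Dict[str, Set[str]]:
    """Map each characteristic attribute (CA) string to the species that display it."""
    key: Dict[str, Set[str]] = {}
    for species, sequences in references.items():
        for seq in sequences:
            key.setdefault(extract_ca(seq, positions), set()).add(species)
    return key


def identify(aligned_seq: str, positions: List[int], key: Dict[str, Set[str]]) -> List[str]:
    """Return the species whose CA matches the query (empty list if the CA is not in the key)."""
    return sorted(key.get(extract_ca(aligned_seq, positions), set()))


# Toy reference alignment with two diagnostic positions (hypothetical values).
positions = [2, 5]
references = {
    "species 1": ["ACGTAC"],
    "species 2": ["ATGTCC"],
    "species 3": ["ATGTCC"],
}
key = build_key(references, positions)
print(identify("GCGGAT", positions, key))  # CA shared with one species only
print(identify("TTTTCT", positions, key))  # CA shared by two species
print(identify("AAAAAA", positions, key))  # CA absent from the key
```

A CA that maps to exactly one species is diagnostic for that species, whereas a CA shared by several species, or one that is absent from the key, leaves the sample unresolved by this method.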

Sequence amplification, data collation, and preliminary analysis
In our analysis using ITS2 and psbA-trnH primers, ITS2 sequences with polymorphic bands (S1A Fig) were not favorable for PR species identification. However, psbA-trnH sequences (approximately 650 bp) were clear and formed single bands, which could be used to identify PR species (S1B Fig). After each band was recovered and amplified, the samples were labeled and sequenced. After proofreading, aligning, and removing the low-quality sequences and primer sequences, sequence length varied from 529 to 603 bp, with the G+C and A+T content

Table 3. Character-based identification for samples.

Table 3 column headings: Menu listing; Character positions (22, 27, 65, 67, 103, 104, 125, 127, 128, 129, 130, 132, 172, …).

Therefore, P. kingianum has a closer relationship with P. cyrtonema than with P. sibiricum. According to the nearest distances method, we verified that sample sequences from 18 regions are classified as P. cyrtonema; those from nine regions, as P. kingianum; those from seven regions, as P. sibiricum; and that from one region, as D. longifolia. Among these, the sample sequences from S30 had equal GDs with P. cyrtonema and D. longifolia, while sample sequences from S8 and S10 had equal GDs with P. curvistylum, P. cirrhifolium, and P. prattii. Sequences from the S9 region also have the same GD with P. curvistylum and P. cirrhifolium. Therefore, this analysis using the nearest distances method supports our results from the BLAST analysis, with the exception of sample sequences from the S30 region. To visualize these results, an NJ tree was generated with the GDs using MEGA 6.06 (Fig 2A). All regions were divided into two clusters with a support rate of 100%. One cluster contained nine regions belonging to P. kingianum, while another cluster contained the remaining regions. To further verify this NJ tree, an ML tree was also generated (Fig 2B). In this tree, all the sampled regions were clustered into two clusters with a 99% support rate. Notably, the ML tree revealed more accurate genetic relationships compared to the NJ tree.

Phylogeny-based tests
A total of 207 psbA-trnH sequences from Polygonatum were downloaded from NCBI's GenBank. After identical sequences deposited by different researchers were removed using ClustalW, 32 sequences were selected (S1 Table). Through data collection and analysis, we found

Table 5. Sequence length and codon content of psbA-trnH sequences. Column headings: Regions, T(%), C(%), A(%), G(%).

The ML tree (Fig 3A) indicates that our sample sequences can be classified into six groups, which supports our species identification with the DNA barcoding-based method. In this tree, psbA-trnH sequences of some species do not appear to be unique, such as those for P. cyrtonema (4), P. odoratum (2), P. cirrhifolium (4), and P. prattii (2). The reason for this might stem from similar sequences being uploaded by different researchers at different institutions. This would result in the sequences being slightly different (individual base changes) because of the plants' different growth environments or small interspecific divergences at the psbA-trnH region. By the ML tree method, the samples identified as P. cyrtonema showed a closer evolutionary relationship with P. cyrtonema, having a 19% support rate, compared with the DNA barcoding-based method. Samples identified as P. sibiricum appeared to have a close evolutionary relationship with both P. sibiricum and P. cyrtonema, with support rates of 26% and 14%, respectively. Furthermore, the ML tree showed 82% and 79% support rates for evolutionary relationships among the samples identified as P. kingianum, P. punctatum, P. cyrtonema, and P. cyrtonema. The topology of the ML tree (Fig 3B) also supports the conclusions drawn from our phylogeny tree.
Finally, to test the authenticity and accuracy of our ML tree, an NJ tree was constructed. The NJ tree (S3A Fig) and its topology (S3B Fig) showed the same results as those of the ML tree. Thus, the authenticity and accuracy of the ML tree constructed based on the psbA-trnH sequences is supported by conventional phylogeny-based identification methods, indicating that DNA barcoding using psbA-trnH sequences can be used to identify PR species.

Table) were used to identify Polygonatum species. The combination of these 14 sites formed 11 compound CAs used to identify 22 species distributed in the sampling areas. Apart from P. cyrtonema (3), P. cirrhifolium (2), P. odoratum (2), P. punctatum (2), and P. involucratum (2), which had multiple corresponding CAs, the other 17 species were each associated with a single CA for species identification. Among these, only seven CAs were specific solely to their own species. P. kingianum (B17) and P. sibiricum (B26) had specific CAs, whereas three CAs (B5, B6, and B7) were found for P. cyrtonema. A single CA (B15) was observed for P. involucratum, while another (B24) was found for P. punctatum. Furthermore, two nonspecific CAs (B16 and B23) were also observed for P. involucratum and P. punctatum, but this association requires further confirmation.
These constructed CAs were then used to verify the identity of our 39 experimental samples (Tables 3 and 4). We found that our samples were differentiated in a manner that is similar to that observed using the nearest distance method corresponding to the "best close match." Notably, the S11 and S22 CAs were not retrieved in this CA analysis, whereas they were identified as D. longifolia and P. cyrtonema, respectively, by the nearest distance method and phylogeny-based analysis. Thus, while we can precisely differentiate PR from other Polygonatum species and genera with this character-based method, discrepancies among the identification methods used in this study do exist.

Discussion
Consumers purchase and consume PR based on its rhizome composition, which is largely identified on the basis of morphological characteristics. However, the rhizomes of other Polygonatum species and Asparagaceae genera are similar to those constituting PR, making them difficult to differentiate and easy to substitute into this herbal formulation. This can affect the biological effectiveness and safety of PR. Unfortunately, an efficient and accurate identification method has yet to be established for these types of medicinal rhizome formulations. In this study, we used three different methodologies to identify PR. Notably, all three enabled accurate identification of the PR components. Our results are the first to report an efficient approach, combining the three methods, for the characterization of PR, which can be used to discern the identity of PR components in formulations in the market.

DNA barcoding-based identification of samples from different regions
DNA barcoding is an efficient and accurate method for product authentication that is not affected by the condition of the sample material. The barcoding gap, quantified as the difference between the intraspecific distance and the smallest interspecific distance, has been used to evaluate DNA barcodes [49,50] and define new species [51]. However, earlier studies showed that it is an artifact of insufficient sampling across taxa [52] and that no distinct or sufficiently sized global barcoding gap exists [53]. No barcoding gap was found among the PR samples in our study, so this criterion was not useful for identifying PR at the species level. This may be because the number of sequences per species was small, an explanation supported by an earlier study [54]. ITS2 and psbA-trnH, the recommended DNA barcoding regions, have been used to identify plants at the species level because of their high resolution [55] and fast evolutionary rate [56]. The ITS2 sequence has been considered an ideal DNA barcode for species identification of fungi and higher plants [57], with a reported resolution success rate of 92.7% at the species level [44,58]. Among the regions used for DNA barcoding in plants (rpoB, rpoC1, matK, trnH-psbA, rbcL, ITS, accD, ndhJ, YCF5, UPA, atpF-atpH, and psbK-psbI), psbA-trnH has demonstrated the best amplification success rate and species identification rate [59,60]. However, the success rate of ITS2 amplification is comparatively lower, and ITS2 sequences are somewhat difficult to sequence [55], which limits the application of this region. In addition, the large number of insertions/deletions in the psbA-trnH sequence makes BLAST searches among species in different genera challenging. In this study, ITS2 amplification with universal primers yielded polymorphic DNA bands, whereas psbA-trnH amplification did not. This may be because the universal ITS2 primers are not specific for PR and Polygonatum species.
A study by Li [61] reported that the success rate for the ITS2 region was very low in Taxillus chinensis because of PCR amplification failure, which is consistent with our result. The underlying reasons need further investigation. Although ITS2 sequences were unsuitable for PR identification, the higher resolving power and accurate discrimination of PR obtained using the psbA-trnH sequence in this study indicate that this DNA barcoding system can be used to differentiate PR from other Polygonatum species and genera. Our results reflect similar findings reported earlier [62]. According to this method, our 39 samples were divided into five groups: P. sibiricum (7), P. cyrtonema (19), P. kingianum (9), D. longifolia (1), and undetermined (3). Notably, the undetermined samples could not be identified because they had the same GDs to multiple species, so the "best close match" was shared among several species. This could be due to the low level of variation in the psbA-trnH sequence among these species. However, it is clear that the identity of the P. punctatum samples corresponded to their geographical origin.
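The barcoding gap discussed above can be computed from a labeled distance matrix, as in the following Python sketch. It assumes one common formulation, the smallest interspecific distance minus the largest intraspecific distance; the function name, labels, and distances are hypothetical toy values rather than data from this study.

```python
from itertools import combinations
from typing import Sequence, Tuple


def barcoding_gap(
    labels: Sequence[str], dist: Sequence[Sequence[float]]
) -> Tuple[float, float, float]:
    """Return (max intraspecific, min interspecific, gap) for a symmetric distance matrix.

    The gap is min interspecific minus max intraspecific (assumed formulation);
    a value of zero or below means the two distributions overlap.
    """
    intra, inter = [], []
    for i, j in combinations(range(len(labels)), 2):
        (intra if labels[i] == labels[j] else inter).append(dist[i][j])
    if not intra or not inter:
        raise ValueError("need at least one intraspecific and one interspecific pair")
    max_intra, min_inter = max(intra), min(inter)
    return max_intra, min_inter, min_inter - max_intra


# Toy example: two species with two sequences each (hypothetical distances).
labels = ["A", "A", "B", "B"]
dist = [
    [0.00, 0.01, 0.05, 0.06],
    [0.01, 0.00, 0.04, 0.05],
    [0.05, 0.04, 0.00, 0.02],
    [0.06, 0.05, 0.02, 0.00],
]
print(barcoding_gap(labels, dist))
```

When each species is represented by only a few sequences, the intraspecific maximum is poorly estimated, which is one reason a gap computed this way can be unreliable for small samples.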

Phylogeny-based tests of psbA-trnH sequences
Phylogenetic tree construction can reveal interrelations among different species and can be used to judge the relationships between sample sequences and reference sequences based on their psbA-trnH sequence. These relationships can then be used to accurately identify the samples. Among the multiple phylogenetic trees constructed, the ML tree was considered to be the tree closest to the true tree for our samples. In fact, the ML tree based on the psbA-trnH sequences of our samples reflected the results based on the DNA barcoding system. In the ML tree, S30 was identified as P. cyrtonema and had a close relationship with Heteropolygonatum roseolum. Because three psbA-trnH sequences were downloaded from NCBI's GenBank for P. cyrtonema, the samples identified as P. sibiricum were observed to have a close relationship with P. cyrtonema (KJ745888.1). This may again be due to the low level of variation in the psbA-trnH sequences of these two species. S8, S9, and S10 were also incompletely identified owing to their close relationships with numerous species. This indicates that the psbA-trnH sequences had a low identification efficiency for these species. The topology of the ML tree confirmed these results. Moreover, the NJ tree constructed in this study and its topology were also used to verify the reliability of the ML tree and the accuracy of our results. Our phylogeny-based tests revealed that DNA barcoding identification is an accurate method and can also be used to distinguish PR from adulterants or imitations.

Character-based tests of psbA-trnH sequences
Character-based tests showed the same results as the DNA barcoding system and phylogeny-based methods. In this study, P. cyrtonema, P. cirrhifolium, P. odoratum, P. punctatum, and P. involucratum all had more than one CA, which could lead to mistakes in identifying PR. However, three CAs of P. cyrtonema were specific and could be used for identification. Thus, character-based tests of the psbA-trnH sequences can be further used to distinguish PR. Similar findings reported in other species [63] support our results. Compared with DNA barcoding-based and phylogeny-based methods, a character-based method has advantages for identifying species with lower variation in DNA barcoding. DNA barcoding-based and phylogeny-based methods are the main and universal methods, while phylogeny-based methods can identify components at not only the genus and family level but also the species level. Thus, combining all three methods would render our results more accurate.

Application of DNA barcoding for the identification of PR
DNA barcoding has been used for identifying medicinal plants [64] and industrial quality assurance [65], such as for Smithia conferta Sm. [66], turmeric [67], Crocus sativus [68,69], Peucedanum praeruptorum [70], radix astragali [71,72], Cinnamomum verum [73], Sabia parviflora [74], Valeriana jatamansi [75], sandalwood [76], and Hippophae [30], which supports our findings. To date, it has been difficult to completely authenticate PR and its related products without relying on morphological characterization. Furthermore, the authenticity of raw materials is essential to guarantee product quality and consumer safety. DNA barcoding can efficiently and accurately identify products [23] regardless of their form. Thus, the method used in this study has immediate practical implications and can be quickly applied to molecularly identify PR.
However, for all Polygonatum species, DNA barcoding based on psbA-trnH sequences is limited by low genetic diversity, which might lead to inaccurate identification. Identification with more DNA barcodes, or with complete chloroplast genome and whole genome sequences, would provide an effective method for Polygonatum species authentication. In addition, morphological characteristics of medicinal herbs can also be used to correctly classify species when they do not vary across growing environments. Thus, more research is needed to optimize and improve the method used to molecularly identify Polygonatum species.

Conclusions
A total of five species were identified in the 39 samples we analyzed from different growing regions: P. sibiricum, P. cyrtonema, P. kingianum, D. longifolia, and P. punctatum. Samples collected from four regions, S8−S11, were misidentified based on the morphological characteristics of their rhizomes. Our study indicates that this DNA barcoding identification method based on psbA-trnH sequences can efficiently and precisely differentiate PR from other species with the same rhizome characteristics. With this technology, PR quality can be preserved and improved for consumers.