Promise and Challenge of DNA Barcoding in Venus Slipper (Paphiopedilum)

Orchidaceae are one of the largest families of flowering plants, with over 27,000 described species, and all orchids are listed in CITES. Moreover, the seedlings of orchid species from the same genus are similar. The objective of DNA barcoding is rapid, accurate, and automated species identification, which may be used to identify illegally traded endangered species from vegetative specimens of Paphiopedilum (Venus slipper), a flagship group for plant conservation with high ornamental and commercial value. Here, we selected eight chloroplast barcodes and nrITS to evaluate their suitability in Venus slippers. The results indicate that none of the tested barcodes had a barcoding gap and that the core plant barcode showed low resolution for the identification of Venus slippers (18.86%). Of the single-locus barcodes, nrITS is the most efficient for species identification in the genus (52.27%), whereas matK + atpF-atpH is the most efficient multi-locus combination (28.97%). Therefore, we recommend the combination of matK + atpF-atpH + ITS as a barcode for Venus slippers. Furthermore, there is an upper limit to the resolution of the candidate barcodes, and only half of the taxa with multiple samples were identified successfully. The low efficiency of these candidate barcodes in Venus slippers may be caused by relatively recent speciation, the upper limit of the barcodes, and/or the sampling density. Although the discriminatory power is relatively low, DNA barcoding may be a promising tool to identify species involved in illegal trade, which has broad applications and is valuable for orchid conservation.


Introduction
DNA barcoding uses short DNA sequences to identify species [1,2]. Barcoding is a practical, simple, and quick method compared to traditional methods, but it has both pros and cons [3][4][5][6][7][8][9][10]. Because of its potential application in several areas of biology, such as species identification, biodiversity assessment, plant conservation, trade control, biomedicine, and forensics, DNA barcoding has undergone significant development and growth, and hundreds of articles have been published. Therefore, many biologists and other end users have positive attitudes towards DNA barcoding.
Because of the frequent structural variation, low mutation rate, and horizontal gene transfer of the plant mitochondrial genome [11,12], greater attention has been paid to chloroplast DNA barcodes in plants. A series of chloroplast fragments have been recommended as barcodes, such as the coding regions accD, matK, ndhJ, rbcL, rpoC1, rpoB, and ycf5, and the noncoding regions atpF-atpH, psbK-psbI, trnH-psbA, and the trnL intron. Because plant chloroplast genes have a lower mutation rate than animal mitochondrial genes, a multi-locus approach is generally adopted for plant barcodes [2,13-19]. For example, Kress et al. [2] proposed that the commonly used ITS spacer and the highly variable trnH-psbA region be used in combination to identify flowering plants. Chase et al. [15] outlined two three-region options, rpoC1 + rpoB + matK and rpoC1 + matK + trnH-psbA. Finally, the CBOL Plant Working Group [18] recommended the combination of rbcL and matK as a core plant barcode.
Orchidaceae are one of the largest families of flowering plants and all orchids are listed in CITES. However, to date the barcoding of orchids has been rather limited in number and scope [43][44][45][46][47][48][49][50][51]. Lahaye et al. [44] proposed matK as a barcode for the identification of flowering plants based on data from >1,000 species of Mesoamerican and South African orchids. In addition, the species in some genera have been sparsely sampled; for example, Yao et al. [47] studied 17 species of Dendrobium, while Parveen et al. [48] sampled only eight species of Paphiopedilum. Because these studies were based on relatively sparse sampling, the question remains: when more samples are added to these large, diverse genera, will the resolution remain high? Orchid DNA barcoding is far from resolved, and more samples and genera should be tested. Paphiopedilum provides an opportunity to explore these questions.
Paphiopedilum Pfitzer (Venus slipper) is the largest genus of slipper orchids, with 96 accepted species (data collected from KBG, 01/2014), and is an ideal group to evaluate the suitability of candidate barcodes for the conservation of plants. Almost all species of the genus have showy flowers and long flowering periods, often up to several months, and have been cultivated widely since the 19th century [52,53]. However, the ornamental and commercial value of the genus has led to over-collection, illegal poaching, and illegal trade [54,55]. For example, Paphiopedilum lawrenceanum has 120 years of cultivation history, but there are no wild populations because of over-collection [56]. Paphiopedilum vietnamense was only discovered in 1997 and is critically endangered in nature [57,58], and all of the species distributed in Vietnam are disappearing rapidly [54]. In addition, customs and quarantine inspectors often cannot differentiate between rare and common species when the plants are not in flower [52]. The young seedlings of Paphiopedilum are very similar and are difficult to differentiate. Thus, morphological assessments are time-consuming, expensive, and require skilled labor [59]. Therefore, DNA barcoding might be used to solve these problems.
In this study, our objectives are as follows: 1) test the performance of the core plant barcode in Venus slippers; 2) evaluate the discriminatory power of nine single loci (accD, matK, rbcL, rpoC2, ycf1, atpF-atpH, atpI-atpH, trnS-trnfM, ITS) and multi-locus combinations with dense taxon sampling and test whether an upper limit exists in the barcodes; and 3) discuss the factors that affect barcoding success.

Plant sampling
We used the data in Guo et al. [60] for our analysis, with two unknown samples excluded. A total of 107 samples representing 77 Paphiopedilum species were used to test the species resolution; 22 of these species were represented by two or more individuals, and varieties were treated as samples within the same species. These data were supplemented with additional data from GenBank (http://www.ncbi.nlm.nih.gov/genbank/) (S1 Table) to test the upper limit of the barcodes. In total, 359 ITS sequences, 116 matK sequences, 60 ycf1 sequences, and 44 rbcL sequences were downloaded from GenBank.

Data analysis
The sequences were aligned with BioEdit [61] and refined manually. First, we analyzed the data from Guo et al. [60]. We evaluated the resolution of eight single-locus DNA regions (accD, matK, rbcL, rpoC2, ycf1, atpF-atpH, atpI-atpH, trnS-trnfM), six selected two-locus combinations (rbcL + accD, rbcL + matK, ycf1 + rpoC2, ycf1 + atpF-atpH, rpoC2 + atpF-atpH, matK + atpF-atpH), two three-locus combinations (rbcL + matK + atpF-atpH, trnS-trnfM + atpI-atpH + atpF-atpH), and the combined eight cpDNA regions. Then, we evaluated the resolution of ITS and three cpDNA sequence regions (matK, rbcL, ycf1) with the data downloaded from GenBank. The analysis was performed with the SpeciesIdentifier 1.7.7 program from the TaxonDNA software package [62]. The inter- and intra-specific genetic divergences were calculated following Meyer and Paulay [63] and were used to determine whether a barcoding gap exists. The best match/best close match criteria were used to assess the correct identification of the species [62]. To assess the haplotype accumulation in different datasets, we calculated the accumulation curves for haplotypes in the cpDNA and ITS of Paphiopedilum with the SPIDER package in R [64]. Neighbor-joining analysis of the eight combined cpDNAs was performed in MEGA6 [65], with the Kimura-2-parameter distance option and 1000 replicates.
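The best match and best close match criteria can be illustrated with a minimal Python sketch. This is not the SpeciesIdentifier implementation; it only mirrors the general logic (find the nearest sequences to a query, then compare species labels). The function names, the toy sequences, the species labels sp_A, sp_B, and sp_C, and the optional distance threshold are all assumptions made for illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned sequences."""
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared    # proportion of transitions
    q = transversions / compared  # proportion of transversions
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def best_close_match(query_id, records, threshold=None):
    """Classify one query against all other records.

    records maps an id to (species, aligned_sequence).
    With threshold=None this behaves like a plain 'best match';
    with a threshold, queries whose nearest neighbour is too
    distant are reported as 'no match'.
    """
    query_species, query_seq = records[query_id]
    dists = [(k2p_distance(query_seq, seq), species)
             for rid, (species, seq) in records.items() if rid != query_id]
    best = min(d for d, _ in dists)
    if threshold is not None and best > threshold:
        return "no match"
    nearest = {sp for d, sp in dists if math.isclose(d, best, abs_tol=1e-12)}
    if nearest == {query_species}:
        return "correct"
    if query_species in nearest:
        return "ambiguous"
    return "incorrect"


# Hypothetical toy alignment: sp_A and sp_C share one haplotype.
toy_records = {
    "A1": ("sp_A", "ACGTACGTAC"),
    "A2": ("sp_A", "ACGTACGTAC"),
    "B1": ("sp_B", "ACGTTCGTAC"),
    "B2": ("sp_B", "ACGTTCGTAC"),
    "C1": ("sp_C", "ACGTACGTAC"),
}
outcomes = {rid: best_close_match(rid, toy_records) for rid in toy_records}
success_rate = sum(v == "correct" for v in outcomes.values()) / len(outcomes)
print(outcomes, success_rate)
```

In this toy case only the two sp_B samples are identified correctly, because sp_A and sp_C share a haplotype; shared haplotypes among species are one way in which identification rates can stay low even when every sequence has a close match.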

Results
The number of sequences analyzed and the sequence lengths are listed in Table 1. The attendant datasets included approximately 70-90% of the accepted species of Venus slipper. The species were best represented by the ITS dataset (72/85), followed by matK (55/84), and ycf1 (52/79), but other datasets have lower intra-species sampling. The intra- and interspecific distance ranges overlapped, and none of the tested barcodes had a barcoding gap (Fig 1). The summary of the single- and multi-locus barcode resolution is listed in Table 2. ITS has the highest discriminatory power of the single-locus barcodes (52.27%), and approximately half the attendant sequences were identified successfully. In the single-locus analysis of the five coding cpDNA regions, rpoC2 has the highest resolution (25.74%), followed by ycf1, matK, and accD (22.42%, 15.88%, and 14.01%, respectively), whereas rbcL has the lowest discrimination rate (3.77%). Of the three intergenic regions, atpF-atpH has the highest resolution (22.42%), followed by atpI-atpH and trnS-trnfM (19.62% and 13.33%, respectively). Of the multi-locus combinations, the two two-locus combinations with rbcL have relatively lower resolutions (14.14% and 18.86%), whereas the discriminatory power of the other combinations is similar, ranging from 25.74% to 29.52%. The resolution did not increase significantly with increasing sequence length.
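The barcoding gap test described above reduces to comparing two distance ranges: a gap exists only if the smallest interspecific distance exceeds the largest intraspecific distance. The following Python sketch shows that comparison under simplifying assumptions: it uses uncorrected p-distances rather than the distances computed in this study, and the function names, toy sequences, and species labels are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b)
             if a in "ACGT" and b in "ACGT"]  # ignore gaps and ambiguities
    return sum(a != b for a, b in pairs) / len(pairs)


def barcoding_gap(records):
    """records is a list of (species, aligned_sequence).

    Returns (max_intra, min_inter, gap_exists).
    """
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    max_intra, min_inter = max(intra), min(inter)
    return max_intra, min_inter, min_inter > max_intra


# Hypothetical case with a clear gap between two species.
with_gap = [
    ("sp_A", "ACGTACGTAC"),
    ("sp_A", "ACGTACGTAT"),
    ("sp_B", "TCGTTCGAAC"),
    ("sp_B", "TCGTTCGAAC"),
]
# Hypothetical case where the ranges overlap (no gap).
overlapping = [
    ("sp_A", "ACGTACGTAC"),
    ("sp_A", "ACGTACGTAT"),
    ("sp_B", "ACGTACGTAC"),
]
print(barcoding_gap(with_gap))
print(barcoding_gap(overlapping))
```

The second toy dataset behaves like the pattern reported here: one interspecific pair is closer than an intraspecific pair, so the two ranges overlap and no threshold can separate species cleanly.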
To reduce the error induced by sampling, we recalculated the resolution with the sequences downloaded from GenBank included, and the single-locus resolution increased significantly (Table 2): the resolution of matK increased from 15.88% to 32.73% and the resolution of ycf1 increased from 22.42% to 31.13%. In addition, the accumulation curves for the haplotypes in the cpDNA and ITS indicated saturation of the candidate markers with the addition of the sequences from GenBank, which indicates the upper limit of the attendant barcodes (Fig 2). The topology of the NJ tree was congruent with that reported in previous studies [60,66]. However, several species represented by two or more individuals did not form monophyletic groups (Fig 3).
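The idea behind a haplotype accumulation curve can be sketched in a few lines of Python: sequences are added in random order, the number of distinct haplotypes seen so far is recorded, and the counts are averaged over many random orders. A curve that flattens suggests that additional sampling uncovers few new haplotypes. This is a generic illustration, not the routine used in this study; the function name, the number of permutations, the seed, and the toy haplotypes are assumptions.

```python
import random


def haplotype_accumulation(sequences, permutations=200, seed=0):
    """Mean number of distinct haplotypes after 1..n sampled sequences."""
    rng = random.Random(seed)
    n = len(sequences)
    totals = [0] * n
    for _ in range(permutations):
        order = list(sequences)
        rng.shuffle(order)            # random sampling order
        seen = set()
        for i, seq in enumerate(order):
            seen.add(seq)             # identical sequences = one haplotype
            totals[i] += len(seen)
    return [t / permutations for t in totals]


# Hypothetical sample: three haplotypes with unequal frequencies.
toy_sequences = ["AAAA"] * 6 + ["AAAT"] * 3 + ["AATT"]
curve = haplotype_accumulation(toy_sequences)
print([round(x, 2) for x in curve])
```

The curve always starts at one haplotype and ends at the total number of distinct haplotypes in the sample; how early it levels off is what indicates saturation.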

Discussion
The efficiency of the chloroplast markers in Paphiopedilum
Compared to the study of Parveen et al. [48], the identification rate decreased with denser species sampling (Table 2). The single-locus resolution ranged from 3.77% (rbcL) to 50.69% (ITS) (ITS > rpoC2 > atpF-atpH > ycf1 > atpI-atpH > matK > accD > trnS-trnfM > rbcL), although each single locus can assign these species to Venus slipper. ITS is the most efficient single-locus barcode, identifying half the attendant sequences correctly, and could be used easily as a potential barcode for Venus slipper. For the five coding cpDNA regions, the efficiency of rbcL is too low, whereas ycf1 and rpoC2 are too long to be used as barcodes (Table 1). The resolution of matK is slightly higher than that of accD, and matK is one of the most widely used phylogenetic markers with high variation. Therefore, we suggest matK as one of the coding cpDNA regions for the identification of the Venus slipper, which is consistent with the results of Lahaye et al. [44] and Parveen et al. [48]. For the three intergenic regions, atpF-atpH has the highest resolution and shortest length compared to the other two regions (Tables 1, 2), and should be selected as a potential barcode. Moreover, the resolution of the combination of matK and atpF-atpH is comparable with that of the other combinations.
For the other multi-locus combinations, the efficiency is similar, except for the two two-locus combinations with relatively lower resolution (Table 2). The core plant barcode showed low efficiency in the Venus slipper (18.86%), which is much lower than the 72% obtained by the CBOL Plant Working Group [18], and thus it is not suitable for barcoding the genus. In addition, the lengths of matK, atpF-atpH, and ITS (Table 1) are also suitable for potential barcodes, which could be sequenced with one primer. Therefore, we recommend the combination of matK + atpF-atpH + ITS as a barcode for Venus slipper during the preliminary stage.

Factors that affect species discrimination
Fazekas et al. [67] demonstrated that the resolution of the plant dataset is ~70%. The resolution of the present study is relatively low compared to other orchid barcoding studies [44,46-48,50,51] and also to studies of non-orchid plant groups [21,25,35,37,38,40,41,67,68]. Given the evolution of the Venus slipper and the sampling strategy of this study, the factors that affect the species discrimination may include the recent diversification of many species, the upper limit of the barcodes, and/or the sampling density. The common ancestor of the Venus slipper dates to the early Miocene [69] and many species are recently diverged [60]. Recently diverged species are difficult to identify [70]. For example, the rate of successful identification is 69% in Inga and 32% in Araucaria [68]. Most species of Inga originated from recent radiations [68]. In Picea, the recently diversified species distributed in the Himalayan-Hengduan Mountains and northeastern Asia are also a challenge for barcoding [25]. In young species, gene flow may blur the delimitation of closely related species. Guo et al. [60] determined that reticulate evolution plays an important role in the speciation of Paphiopedilum, and the rampant non-monophyly of the tested species [43,60] (Fig 3) indicates that the Venus slippers are a conundrum for DNA barcoding.
Table 2. Identification success of analyzed barcodes using SpeciesIdentifier 1.7.7 program under 'best match' and 'best close match' methods (Meier et al. 2006).
The upper limit of the chloroplast genes also constrains the success rate of species identification [67]. In our study, the combination of the eight cpDNAs together did not significantly improve the resolution of this genus (Table 2), which indicates that the addition of other cpDNA regions may lead to correct identification but would not improve efficiency.
In addition, the accumulation curves for the haplotypes in matK, ycf1, and ITS show saturation, which suggests that the barcode efficiency reached the upper limit with increased sampling. There is no barcoding gap in the candidate barcodes of the genus (Fig 1). The barcoding gap does not exist in some other tested plant groups [22,25,44,71-73], and its absence also affects the upper limit of resolution in the Venus slipper and other untested plant groups. In Bromeliaceae, the two-locus (matK + rbcL) species discrimination is 43.48% and the addition of a third locus (trnH-psbA) did not show a significant improvement [35].
The sampling density may also affect the efficiency. Our study covers 70-90% of the accepted species of Venus slipper. Parveen et al. [48] only sampled eight species of Paphiopedilum, which represent approximately 8% of the accepted species, and those eight species are strongly diverged; therefore, matK may identify the eight species correctly. In our study, the resolution of matK is 32.73%; once the haplotypes are saturated, the efficiency may decrease with additional sampling of this genus. With more species represented by multiple samples, the resolution may be much higher before the accumulation curve of the single-locus barcode reaches saturation, similar to the single-locus resolution of matK, rbcL, and ycf1 increasing with the addition of sequences from GenBank (Table 2). Other studies showed high resolution with relatively small sampling. For example, Yao et al. [47] collected 17 species of Dendrobium, and Holcoglossum is a relatively small genus [46]. The rate of successful identification is low in species-rich clades, and several species-rich genera, such as Pouteria, Inga, Eschweilera, and Ocotea, showed little or no variation in cpDNA [74]. Furthermore, several studies with dense sampling showed low resolution [22,72,75]. For Sisyrinchium, the study sampled 185 accessions from 98 putative species and ITS only identified 30.61-38.78% of the species included [22], whereas Sun et al. [72] collected 148 accessions from 38 species and determined that matK could discriminate only 23.26% of Dioscorea taxa.

Conclusions
The potential application of DNA barcoding promotes the development and growth of the method. In this study, we selected eight chloroplast barcodes and ITS to evaluate their suitability in Venus slippers with dense sampling. We found that ITS is the most efficient single-locus barcode, which can identify half the Venus slippers correctly, whereas the combination of matK + atpF-atpH is the most efficient multi-locus barcode. Therefore, we recommend the combination of matK + atpF-atpH + ITS as the barcode for Venus slipper. However, there is an upper limit of the barcodes tested; therefore, adding more fragments apparently cannot solve the problem. Because of recent diversification and a complex evolutionary history in the genus, low-copy nuclear genes may be used in the DNA barcoding of this genus for more precise identification.
This study sheds light on the barcoding of orchids in a more efficient manner, which can improve orchid conservation. In the future, additional horticultural forms may be cultivated, which will lessen the over-collection from the natural environment. However, based on the assessment of the markers commonly used for the standardized application of this technique, much work remains to be done.
Supporting Information S1