Refining DNA Barcoding Coupled High Resolution Melting for Discrimination of 12 Closely Related Croton Species

DNA barcoding coupled high resolution melting (Bar-HRM) is an emerging method for species discrimination based on DNA dissociation kinetics. The aim of this work was to evaluate the suitability of different primer sets, derived from selected DNA regions, for Bar-HRM analysis of species in Croton (Euphorbiaceae), one of the largest genera of plants with over 1,200 species. Seven primer pairs were evaluated (matK, rbcL1, rbcL2, rbcL3, rpoC, trnL and ITS1) from four plastid regions, matK, rbcL, rpoC, and trnL, and the nuclear ribosomal marker ITS1. The primer pair derived from the ITS1 region was the single most effective region for the identification of the tested species, whereas the rbcL1 primer pair gave the lowest resolution. It was observed that the ITS1 barcode was the most useful DNA barcoding region overall for species discrimination out of all of the regions and primers assessed. Our Bar-HRM results here also provide further support for the hypothesis that both sequence and base composition affect DNA duplex stability.


Classification of Croton and uses in ethnomedicine
Croton (Euphorbiaceae) is one of the largest genera of flowering plants, with between 1,200 and 1,300 species. It is widespread in tropical areas, with habits ranging from large woody trees through climbing lianas to simple and prostrate weeds [1,2]. In Southeast Asia and Thailand there are at least 80 species and 30 species of Croton, respectively [3]. The complex subgeneric taxonomy of Croton relies on a provisional revision of the sections of the genus from the early 1990s [1]. Several studies, based on both classical taxonomy and phylogenetic analyses, have demonstrated the complex taxonomy of the Croton genus (e.g. [1,4]). An example of this complexity is highlighted by C. acutifolius Esser in Thailand. This species is superficially similar to C. robustus Kurz, C. laccifer L. and C. caudatus Geiseler which can however be distinguished by their fruit size, and C. argyratus Blume which is characterized by silvery pubescent mature leaves [3]. Additionally, C. griffithii Hook.f. has been confused with C. oblongus Burm.f. by several authors, as the two species show similarity in their indumentum and floral characters [3].
Many species of Croton are used in traditional herbal medicine around the world. In Asia, and Thailand in particular, there has been a resurgence of interest in traditional medicine, aided by government programs to promote research into medicinal plants as a potential source of new remedies. Among the most popular remedies are those based on Croton species. Many Croton species from this region have been used in traditional medicine and have had their biological activities assessed. These include Croton caudatus Geiseler, Croton crassifolius Geiseler, C. kongensis Gagnep., C. oblongifolius Roxb., C. sublyratus Kurz., C. tiglium L., and C. tonkinensis Gagnep. [5][6][7][8][9][10]. At the same time, it is widely known from scientific and folk literature that many species of Croton are toxic and cause irritation [11], and therefor the correct identification of the species is important to avoid negative health effects.
In Thailand, C. stellatopilosus H.Ohba, known locally as 'Plao Noi', is a popular natural remedy used for stomach disorders. Commercial products containing 'Plao Noi' are sold as tea bags, capsules, and in powder form in herbal markets and are claimed to have anti-ulcer, anticancer, and anti-inflammatory effects. However, confusion has arisen because the same vernacular name is used for a number of species. At least six Croton species, C. columnaris Airy Shaw, C. delpyi Gagnep., C. longissimus Airy Shaw, C. kongensis Gagnep., C. stellatopilosus and C. thorelii Gagnep., share the same Thai common name 'Plao Noi' [3]. The different species vary in their secondary metabolite spectra, which results in variation in their potential efficacy and safety.

Identification of plant products in trade through DNA barcoding
Medicinal plant products are commonly sold in processed or modified forms such as powders, dried material, tablets, capsules and tea bags, making it almost impossible to accurately identify the constituent species using morphology [12,13]. Incorrect identification of the constituent plants may lead to the inclusion of undesirable, unrelated species, with potential health risks to end users. Substitution of the product's ingredients either intentionally or inadvertently can have negative effects on both consumers and producers. Herbal products are often perceived to be safe due to their natural origin. However, counterfeited, substituted and adulterated products can put consumers in danger [14][15][16]. There is an urgent need to find an approach that could help with the quality control of these herbal products in order to ensure consumer safety and which can also produce reliable species identifications even when morphology-based identification is impossible.
Molecular identification through DNA barcoding is a powerful method for the identification of plant species, including medicinal plants and products. Many studies have shown the potential for DNA barcoding to effectively distinguish among medicinal plants, as well as to identify constituent species in processed herbal medicines [12,[17][18][19][20]. Several reviews have highlighted the increasing and diverse applications of medicinal plant barcoding [21][22][23]. However, DNA barcoding in plants does have limitations. These include the inability to amplify marker regions due to degraded DNA in some processed samples [24], limited binding site universality [12,25,26], low rates of marker discrimination [12,27], overlapping intraspecific and infraspecific genetic variation in some groups of plants [28], and low applicability of chloroplast markers for the identification of species of hybrid origin [28]. Another limitation of DNA barcoding is the associated costs, as it requires a molecular laboratory, costly equipment, chemicals and disposables, and DNA sequencing facilities. The lack of access to DNA sequencing facilities and the high costs of sequencing often impedes the wider implementation of DNA barcoding in developing countries [29].

Bar-HRM for species discrimination
Developing and validating sequencing-free methods that are reliable, yet faster and more economical than DNA barcoding is challenging, but will be beneficial for the advancement of herbal product identification routines in developing countries. High resolution melting (HRM) is an emerging method for monitoring DNA dissociation ("melting") kinetics, and is a powerful technique for the detection of point mutations, indels, and methylated DNA [30,31]. In addition to standard PCR equipment and reagents, HRM requires a generic DNA intercalation fluorescent dye. This dye is added to previously amplified PCR products and as the doublestranded DNA samples dissociate with increasing temperature the dye is progressively released and fluorescence diminishes. These denaturation thermodynamics are based on the binding affinities of individual nucleotide pairs, which vary due to indels, mutations and methylations. These differences are inferred by fluorescent measurements collected at standard temperature increments, which are plotted as a melting curve. The curve's shape and peak are characteristic for each sample, allowing for comparison and discrimination among samples. By using HRM, single base changes between samples can be readily detected and identified [32,33].
The first study reporting the use of Bar-HRM to evaluate herbal medicine substitution came from a recent investigation of species substitution among three medicinal species of Acanthaceae [29]. Bar-HRM has also been used in a number of comparable applications [34], such as for authentication of an EU Protected Designation of Origin product made from Lathyrus clymenum [35], for the identification of olive oil and adulterants [36], for subspecies cultivar identification in eggplants [37], for the identification of closely related species of Sideritis and Helleborus [38,39], for species distinction in Mediterranean pines [40], for the detection of allergenic hazelnut contamination [41], and for the identification of processed bean crops [42][43][44].

Research questions and hypothesis
The use of Bar-HRM has been reported for species identification and detection in food and herbal and agricultural products. However, all previous studies have looked at species discrimination in complexes of limited species diversity [29,[35][36][37][40][41][42]. In this study we test the hypothesis that reduced melting discrimination resolution among twelve closely related species in the same genus could be overcome by marker optimization and combination of data from multiple Bar-HRM markers. Here, we use a dataset made up of twelve species in the genus Croton to answer the following research questions relevant to the model group: 1) Can Bar-HRM primers sets that enable universal amplification of selected plastid and nuclear markers be designed?; 2) Can these twelve related Croton species be discriminated by Bar-HRM?; 3) Which single Bar-HRM marker has the highest rate of successful discrimination?; 4) Which minimum number of combined Bar-HRM markers that together give the highest rate of successful identification?

DNA mining of barcode regions
Sequences of four selected plastid DNA regions (matK, rbcL, rpoC and trnL) and one nuclear regions (ITS1) of Croton species from the family Euphobiaceae were extracted from GenBank (at the end of November 2014) using the key phrases "the name of locus" and "the name of genus" in the annotations. Generally, sequences obtained from public databases, including GenBank, are of low quality with no known associated herbarium vouchers. For this reason, all of the sequences were subjected to critical evaluation and any low-quality sequences were removed. After processing, multiple sequence alignments were made from the selected sequences using MEGA6 [45] and variable characters were calculated in order to design primers to be used for high resolution melting (HRM) analyses. The species names and accession numbers of all analyzed sequences are listed in S1 Table. Plant material and DNA isolation Dried plant tissues for DNA extraction were kindly provided by Queen Sirikit Botanic Garden (QSBG) ( Table 1). The plant material was ground with liquid nitrogen, and 100 mg of fine powder was then used for DNA extraction with the Nucleospin Plant II kit (Macherey-Nagel, Germany) following the manufacturer's instructions. DNA concentrations of all samples were adjusted to a final concentration of 20 ng/μL. The DNA was stored at −20°C for further use.

Data mining
Sequences from all five selected barcode regions were extracted from GenBank for species in the genus Croton, and the variable characters and average %GC content were calculated for all samples using MEGA6. Sequence data were present for most markers, except for rpoC. In total, 62 sequences of matK, 148 of rbcL, 240 of trnL and 770 of ITS1 were retrieved, of which 19, 39, 136 and 293 sequences of each respective region were deemed useful for further analysis ( Table 3). The rpoC data were excluded from this study due to the absence of a sufficient number of rpoC reference sequences for the target species (only 1). A single set of primers for each of matK, trnL, and ITS1, along with three sets of primers for rbcL (rbcL1-rbcL3), were evaluated. These various primer combinations yielded amplicons ranging from 100 to 320 bp  (Table 3). Reed and Wittwer [31] found that amplicons suitable for HRM analysis should be 300 bp or less for optimal results. The trnL and ITS1 primer sets yielded amplicons of variable length, whereas the rbcL and matK primers set yielded amplicons of consistent sizes (Table 3). Both the sequence length and the nucleotide variation within sequences influence the dissociation energy of the base pairs and result in different T m values. Two hundred and thirty nine variable sites (67.7%) were observed within the analyzed fragment of the ITS1 region (353 characters) ( Table 3). The ITS1 amplicon sequences were observed to have the highest nucleotide variation, and the relative nucleotide variation within amplicons was found to be: ITS1> trnL > matK > rbcL1 > rbcL2 > rbcL3 (Fig 1A).
The forward matK primer matched the consensus sequence of the target species at the binding sites in only 8 out of 26 sites (30.77%) ( Table 3). High universality at the initial bases of the primer site is crucial for primer annealing and subsequent elongation initiation by the DNA polymerase. The matK locus is one of the most variable plastid coding regions and has high interspecific divergence and good discriminatory power. However, it can be difficult to amplify with the standard barcoding primers due to high substitution rates at the primer sites [46,47]. The average %GC content of amplicons was calculated in order to predict variation in melting curves for the different markers. The matK region had the lowest average %GC content, with 32.26%, followed by trnL, rbcL3, rbcL2, ITS1, and rbcL1 with 35.74, 43.56, 45.1, 52.17 and 53.54% respectively (Fig 1B).

Evaluation of primers and amplicons for discrimination among four Croton species
The seven HRM primer sets were used for the amplification of DNA-fragments from four Croton species, C. delpyi, C. kongensis, C. persimilis and C. thorelii, and the resulting amplicons were analyzed using HRM to define the melting temperature (T m ). These seven primer sets amplified fragments from matK, rbcL1, rbcL2, rbcL3, rpoC, trnL and ITS1 and yielded amplicons of 125 bp, 100 bp, 150 bp, 150 bp, 170 bp, 120 bp, and 300 bp, respectively. HRM analysis was performed in triplicate on each of the four taxa to establish the T m for each primer set. The shapes of the melting curves were analyzed using EcoStudy Software v 5.0 to distinguish among the different plant species. The melting profiles of all amplicons are illustrated in Fig  2A-2F. The mean of the melting temperatures obtained from each primer pair, along with the melting curve shape, was used to measure the discriminatory power of each region regarding the species tested. The discrimination power of each region ranged from 0% (rbcL1) to 100% (ITS1). The primer pairs could be divided into four classes as follows: 1) none of the tested species could be distinguished from each other (rbcL1) (Fig 2A), 2) two out of the four species could be distinguished (rpoC and trnL) (Fig 2D and 2E), 3) three out of the four species could be distinguished (rbcL2 and rbcL3) (Fig 2B and 2C), and 4) all four species could be distinguished (ITS1) (Fig 2F). Although matK has been proposed as one of the best plant barcodes in terms of species discrimination [46,48], we found that the amplification of the fragment of matK that we targeted had such a low success rate that it could not be used for further analysis. Based on these results, it was predicted that the ITS1 primer pair would perform well in HRM analyses with an expanded sampling of Croton species. Several other studies have also shown the accuracy and universality of ITS1 for DNA barcoding in plants [12,17,19,[48][49][50][51].

Bar-HRM using ITS1 primers and an expanded Croton sampling
The ITS1 primer set yielded amplicons of the expected size, approximately 300 base-pairs long, and the amplicons were analyzed using HRM to determine the T m of all 12 tested Croton species (Table 4). Fig 3 depicts the analysis by means of conventional derivative plots, which show the T m value for the ITS1 fragment from each species. However, not all amplicons from the different species yielded distinctive HRM profiles. The melting curves were reproducibly achieved from each of the tested species in triplicate analyses.
To corroborate the results of the HRM analysis, the ITS1 sequences of the 12 tested species were used for further analysis. Sequence data from six species, C. cascarilloides (CAS), C. caudatus (CAU), C. crassifolius (CRA), C. hutchinsonianus (HUT), C. kongensis (KON) and C. persimilis (PER), were obtained. Pairwise comparison of the species' respective ITS1 sequences showed the number of variable sites ranged from 4 sites for PER-HUT to 21 for CAS-CRA   (Table 5). HRM can detect differences among samples of as little as one base pair, but in order to allow for intra-specific and geographical variations within species, an arbitrary confidence interval cut-off of 90% was chosen based on previous molecular studies. The sequence variation  in ITS1 among thirty samples of Croton stellatopilosus collected from different parts of Thailand shows that these can be divided into three groups based on the occurrence of indels [52]. The melting profiles of the six Croton species (Fig 5) were compared to the pairwise sequence analyses ( Table 5). As expected from the sequence analyses, melting curves of C. caudatus and C. crassifolius were nearly the same (6 variable sites in 320 bp fragment) (Fig 6A). The melting curves of C. hutchinsonianus and C. persimilis (4 variable sites) were also expected to be similar to each other but the result showed that they are not (Fig 6B). The most surprising result emerging from the HRM analysis is that the melting curves of C. crassifolius and C. hutchinsonianus are nearly identical to each other (Fig 6C), despite the presence of 19 variable sites among the two species' ITS1 sequences. Comparison of the base composition of the amplicons for each of the nucleotides (Fig 6A-6F) yielded a possible explanation for these unexpected results. A small number of nucleotide variations were found between C. hutchinsonianus and C. persimilis (4 variable sites; Fig 6B), but differences in their base composition are as high as other pairs with different melting curves, e.g. C. cascarilloides and C. hutchinsonianus (18 variable sites; Fig 6D), C. caudatus and C. kongensis (14 variable sites; Fig 6E), and C. cascarilloides and C. crassifolius (21 variable sites; Fig 6F). It can thus be suggested that in the C. crassifolius and C. hutchinsonianus pair, where a high number of variable sites was found between the two sequences, low differences in the overall inter-sequence base  composition resulted in similar melting curve shapes. This combination of findings provides some support for the conceptual premise that it is not only the nucleotide variation but also the nucleotide composition of each sample that contributes to the melting curve shape. Our results seem to be consistent with other research which found that DNA duplex stability can be determined by both sequence and base composition as in the nearest-neighbor (NN) model [53][54][55]. The NN model for nucleic acids assumes that the stability of a given base pair depends on the identity and orientation of neighboring base pairs. The NN model has long been engaged in several works on DNA melting analysis (e.g. [56][57][58]), even before the introduction of HRM technology. Several computer programs such as OligoCalc [59], uMELT [60] and MELTING [61] that can be used to predict melting temperatures and/or generate melting curves for DNA duplexes of interest are built on the NN model, and it is one of the most frequent approaches used for predicting melting temperature.
Although the ITS1 was found to be optimal primer pair in this study, it is likely that other markers and/or combinations thereof might perform better in other plant groups as the results shown in this study are limited to the examined genus (Croton).

Conclusions
Bar-HRM has proven to be a cost-effective and reliable method for the identification of the twelve Croton species tested in this study. The hybrid method of DNA barcoding and High Resolution Melting is dependable, fast, and sensitive enough to distinguish between species. The fragment amplified using the ITS1 primer set had the highest single marker species discrimination rate out of the seven regions coming from the tested plastid and nuclear primer pairs, and it is likely that other ITS1 based primer sets will perform well in other plant groups. The Bar-HRM primer sets developed in this study are not only useful for identification of Croton plant vouchers, but can also be used for species discrimination, authentication, and detection of adulteration in samples lacking diagnostic morphological characters, such as herbal products.
Supporting Information S1 Table. Croton sequences of matK, rbcL, trnL and ITS were retrieved from GenBank (NCBI) for each of the species with accession number. (DOCX)