Rapid Plant Identification Using Species- and Group-Specific Primers Targeting Chloroplast DNA

Plant identification is challenging when no morphologically assignable parts are available. Broadly applicable methods for identifying plants in this situation are lacking, for example when roots grow in mixture or when plant material is decayed or semi-digested. These difficulties have also impeded progress in ecological disciplines such as soil and trophic ecology. Here, a PCR-based approach is presented which allows the identification of a variety of plant taxa commonly occurring in Central European agricultural land. Based on the trnT-F cpDNA region, PCR assays were developed to identify two plant families (Poaceae and Apiaceae), the genera Trifolium and Plantago, and nine plant species: Achillea millefolium, Fagopyrum esculentum, Lolium perenne, Lupinus angustifolius, Phaseolus coccineus, Sinapis alba, Taraxacum officinale, Triticum aestivum, and Zea mays. These assays allowed identification of plants based on size-specific amplicons ranging from 116 bp to 381 bp. Their specificity and sensitivity were consistently high, enabling the detection of small amounts of plant DNA, for example, in decaying plant material and in the intestine or faeces of herbivores. To increase the efficacy of identifying plant species in large numbers of samples, specific primers were combined in multiplex PCRs, allowing screening for multiple species within a single reaction. The molecular assays outlined here have many potential applications, such as root and leaf litter identification, the analysis of botanical trace evidence, and the analysis of herbivory.


Introduction
The morphological identification of plants is well established for the majority of species occurring in Europe. This approach, however, fails when no morphologically assignable parts are available [1,2]. For example, determining the composition of root samples containing multiple species by morphology-based approaches is impossible [3]. The problem becomes even more evident when soil-living herbivores are considered. Although leaf litter and below-ground plant parts represent an important food source, our knowledge of dietary choice in soil-living animals is rudimentary [4]. The semi-digested plant tissues remaining in the intestine or faeces of herbivores are likewise not identifiable based on morphological characters. Knowing the food sources and dietary preferences of soil animals, however, is vital, for example to manage soil insect pests.
Molecular methods, based on genomic differences between plant species, offer a promising means to circumvent these problems [5], [6]. In recent years, diverse molecular techniques have gained increasing importance in answering ecological questions, e.g. concerning population genetics, the assessment of invasive or endangered species, or trophic interactions based on morphologically unidentifiable remains [7]. Among these approaches, DNA barcoding, which relies on the use of a standardized DNA region for accurate and rapid species identification [8], has been used increasingly by ecologists. Over the last decade, the international initiative CBOL (Consortium for the Barcode of Life, http://barcoding.si.edu) has aimed to establish global standards for DNA barcoding. In plants, however, the situation is controversial and many strategies have been proposed. The mitochondrial cytochrome c oxidase subunit I gene (COI), which serves as the standard barcode for animals, is not suitable for species identification in plants due to its low level of variability. Previous studies on DNA-based plant identification focused primarily on the plastid genome (e.g. [9], [10], [11], [12], [13]), but there is a lack of consensus regarding the most universal, informative and technically practical DNA region(s). The suitability of a molecular marker strongly depends on the questions to be answered. For ecologists concerned with the identification of environmental samples [7], it is essential that the target DNA region exhibits highly conserved priming sites to guarantee reliable DNA amplification. Moreover, it should be short enough to allow amplification of degraded DNA. Taberlet et al. [14] promoted the trnL intron as a plant barcode whose main strength lies in ecological applications [7], i.e. when working with degraded DNA [15], [16], [17], [18].
The trnL barcode has also been adopted in studies on herbivory using next-generation sequencing techniques [19], [20], [21], [22]. This approach, however, is costly, especially when large numbers of individual samples are processed. Diagnostic PCR (polymerase chain reaction) using specific primers offers a cost-effective alternative for the molecular identification of specific plant taxa. If the primers are designed to produce amplicons of different lengths, it is possible to screen for multiple species within a single reaction [23,24]. However, unlike COI in animals, non-coding cpDNA sequences are challenging to align due to the considerable variability which can occur even between closely related taxa [25]. Consequently, only few primers are so far available for specific plant taxa, most of them targeting nuclear DNA [1], [26], [27]. The current paper describes a novel approach for identifying plant species via diagnostic PCR based on the trnT-F region.
Here we present diagnostic PCR assays, ready to use for the identification of various plant taxa common in agricultural land. Moreover, we show, step by step, how to generate these PCR assays using primers targeting the trnT-F cpDNA region, allowing the development of diagnostic assays for further plant taxa not included here. Our approach involves three consecutive steps: (i) development of specific plant primers at different taxonomic levels, (ii) combination of primers in multiplex reactions, and (iii) optimization of PCR protocols to maximize their specificity and sensitivity. This practice involves two deliberate strategies aiming to maximize screening efficacy: firstly, the development of group-specific primers allows a pre-selection, thus reducing the number of samples that need to be analysed for different species within the respective genera or families (2-step analysis); and secondly, the combination of primer pairs in multiplexes reduces the number of PCRs necessary.

Methods
The current paper is part of a comprehensive study on the feeding ecology of wireworms, the soil-living larvae of click beetles (Coleoptera: Elateridae). Wireworms of the genus Agriotes were chosen as they feed on the underground parts of a wide range of plants [28] and are amongst the most abundant soil pests in arable land [29]. The locations of plant and animal collection were not protected in any way and no specific permits were required for the described studies. We confirm that they did not involve endangered or protected species.
To identify the plant species eaten by these insect larvae we employed a PCR approach based on the use of specific primers. Each primer combination was designed to specifically target a single plant species, genus or family, resulting in a DNA fragment of distinctive size that allows the targeted taxon to be identified. The best-performing primer combinations were then combined in multiplex PCRs, and reaction conditions were optimized for maximum specificity and sensitivity.
Development and testing of the method comprised five consecutive steps: (i) compilation of a sequence database, (ii) construction of specific primers based on the sequence database, (iii) testing of primers and optimization of PCR reactions, (iv) evaluation of the developed PCR assays for specificity and sensitivity, and (v) testing of the PCR assays on various field samples.

Plant species and DNA extraction
All plants (target and non-target species, Table 1) were collected as multiple individuals in the summers of 2008 and 2009 from grasslands and maize fields in Tyrol (Austria) and stored at −80 °C.
Plant tissue was homogenized together with glass beads in 440 µL lysis buffer containing TES buffer (0.1 M TRIS, 10 mM EDTA, 2% SDS, pH 8), 10 µL Proteinase K (20 mg/mL, AppliChem, Darmstadt, Germany), and a pinch of PVP (polyvinylpyrrolidone) using a Precellys® 24 Tissue Homogenizer (Bertin Technologies, Montigny-le-Bretonneux, France). To increase the DNA yield, samples were incubated in the lysis buffer for 12 hours. The remaining DNA extraction followed a modified CTAB-based protocol described by Juen & Traugott (2005). Forty-seven plant species were sequenced for part or the whole of the cpDNA sequence of interest, and representative sequences were submitted to GenBank (accession numbers JQ041821-JQ041881).

Sequence Database
The chloroplast DNA sequence between the trnT (UGU) and the trnF (GAA) genes was selected for the development of the species-, genus- and family-specific primers. This region comprises two exons of the trnL (UAA) gene (trnL-E1 and trnL-E2) and three non-coding regions: the intergenic spacer between trnT and trnL-E1 (IS1), the trnL intron (trnL-I) and the intergenic spacer between trnL-E2 and trnF (IS2). This chloroplast region is known for its potential as a species-specific marker due to low intra- and higher inter-specific genetic variation [14]. Primer design was based on alignment of sequences from target and non-target plant species. The sequence database was built by combining published sequences from GenBank and sequencing results from specimens collected in grasslands and maize fields at the study sites. Of the 100 plant species present (Table 1), 78 species were already represented in GenBank by part or the whole of the cpDNA sequence of interest. Using general primers [14], [10], [15] (for PCR conditions see Supporting Information S1) we obtained sequences of an additional 46 species (GenBank accession numbers JQ041821-JQ041881). Altogether we relied on a final sequence database comprising 92 plants.
Since the entire trnT-F region is too long to be sequenced in a single run, several reactions need to be carried out and the resulting reads assembled to cover the entire region. However, the general plant primers [14] do not always match perfectly, resulting in incomplete DNA sequences for some species, both in our sequence database and in GenBank.
Sequence information on the intron trnL-I and the intergenic spacer IS2 was available for 91% and 80% of the investigated plants, respectively. Fewer sequences could be retrieved for the IS1 (36% of the investigated plant species). Sequence length varied from 241 to 588 bp for the trnL-I, and from 541 to 991 bp and 75 to 692 bp for the intergenic spacers IS1 and IS2, respectively. Consensus sequences for each species were constructed by combining all sequence information available using BioEdit Sequence Alignment Editor [30].

Primer design
A reliable overall sequence alignment of all study species was impossible due to the high variability within the non-coding regions and the fact that for many of the species only part of the trnT-F cpDNA was available. Therefore, we aligned (i) all sequences within families and (ii) all sequences that were available in full length, i.e. the whole sequence between trnT and trnF (30 plant species), using Clustal X (Larkin et al. 2007). Finally, the alignments were hand-edited using BioEdit. Based on these sequence alignments it was possible to define regions that were highly similar across all species and families, and we could pinpoint sequence positions that were suitable for the 3′-end of the specific primers.
Forward and reverse primers were constructed for different plant taxa using CLC DNA Workbench 4.0 (CLC bio, Aarhus, Denmark) following the rules for ARMS primer design (Hawkins 1997). We developed group- and species-specific primers to identify two plant families (Poaceae and Apiaceae), the genera Plantago and Trifolium, and nine plant species common in Central European agricultural land: Achillea millefolium, Fagopyrum esculentum, Lolium perenne, Lupinus angustifolius, Phaseolus coccineus, Sinapis alba, Taraxacum officinale, Triticum aestivum, and Zea mays. All potential primers were checked in CLC DNA Workbench for cross-amplification within target and non-target species. Only 10% of the originally selected primer positions proved reliable for specific primers, due to repeats of sequences on both strands and in different relative positions within introns. The evaluation of the primers included tests of several DNA extracts from at least five different individuals per plant species. The final primer pairs were chosen based on similarity in melting temperature and on the fragment length of the amplicons.

Optimization of PCR assays
All primers developed were initially checked in singleplex PCRs (for specific conditions see Supporting Information S2). The best-performing primer combinations were then tested in gradient PCRs to define the optimum annealing temperature. Finally, conditions for multiplex PCRs were optimized by testing different concentrations of primers (0.2-0.8 µM) and MgCl2 (3-6 mM), and by varying the duration of the annealing and extension steps (60 or 90 s). To test the efficiency of the assays in amplifying specific taxa in compound samples, mixes of the targeted plant DNA in different combinations were used. The mixed samples included DNA of different numbers and combinations of target and non-target species. PCR products were visualized on QIAxcel, an automated capillary electrophoresis system (Qiagen, Hilden, Germany), with method AL320, and results were scored using BioCalculator Fast Analysis Software version 3.0 (Qiagen). All samples showing the expected fragment length, with a signal strength above 0.1 relative fluorescent units, were deemed to be positive.
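The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the BioCalculator procedure itself: the `Peak` class, the function name `score_sample` and the 5 bp sizing tolerance are assumptions introduced here (the text states only the threshold of 0.1 relative fluorescent units), and the two amplicon lengths are those reported in this paper for Z. mays and T. aestivum.

```python
from dataclasses import dataclass

# Threshold stated in the text: signals above 0.1 relative fluorescent units count.
RFU_THRESHOLD = 0.1
# Assumption: the text gives no sizing tolerance; 5 bp is a placeholder value.
SIZE_TOLERANCE_BP = 5


@dataclass
class Peak:
    size_bp: float     # fragment length reported by capillary electrophoresis
    signal_rfu: float  # peak height in relative fluorescent units


def score_sample(peaks, expected_bp,
                 tolerance_bp=SIZE_TOLERANCE_BP, threshold=RFU_THRESHOLD):
    """Return True if any peak has the expected length and exceeds the threshold."""
    return any(
        abs(p.size_bp - expected_bp) <= tolerance_bp and p.signal_rfu > threshold
        for p in peaks
    )


# Hypothetical electropherogram with one strong and one weak peak
peaks = [Peak(182.0, 0.45), Peak(305.0, 0.05)]
print(score_sample(peaks, expected_bp=181))  # True: Z. mays-sized peak above threshold
print(score_sample(peaks, expected_bp=306))  # False: T. aestivum-sized peak too weak
```

Keeping the tolerance and the threshold as parameters makes it easy to check how sensitive the scoring is to either choice.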

Evaluation of the PCR assays
The specificity of the primer pairs finally selected was tested for cross-amplification against DNA from all other species occurring in the same habitat (i.e. grasslands and maize fields; Table 1) and against wireworm DNA.
Table 1. Plant species collected in maize fields (M) and perennial grassland (G), which were used to establish the PCR-based identification system.
For testing the sensitivity of the newly established PCR assays, DNA templates of all target species for species-specific primers and of representative species for the genus- and family-specific primers were required (Table 2). Hence, general plant primers [14] were used to amplify fragments from the trnT-F cpDNA region which covered the binding sites of the newly designed primers (PCR conditions are given in Supporting Information S3). The DNA concentrations of the purified PCR products were then determined with a VICTOR™ X4 Multilabel Plate Reader (Perkin Elmer, Waltham, USA) using the Quant-iT™ PicoGreen® dsDNA Assay Kit (Invitrogen, Paisley, UK), and the molecular weight of the PCR products was computed by summing the weights of the nucleotides within the sequences of each species (including the flanking primer sequences). Based on the DNA concentrations (ng µL⁻¹) and the molecular weight of the fragments, the number of template copies per ng DNA was calculated, which was finally used for sensitivity testing. The actual sensitivity of the optimized diagnostic PCR protocols was determined via serial dilution of template DNA (i.e. known numbers of copies). Assay sensitivity was also evaluated in the presence of wireworm DNA to test the capability for molecular gut content analysis. For the latter, for each plant species 1 µL of each of the two highest dilutions of template DNA that tested positive (Table 3) was spiked with 3.5 µL of undiluted Agriotes spp. DNA.
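The copy-number calculation described above can be illustrated with a short Python sketch. It is an approximation under stated assumptions rather than the authors' exact computation: the residue masses are commonly used average values for nucleotides within a DNA strand, end-group corrections are ignored, and the 200 bp sequence and all function names are hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

# Assumption: approximate average masses (g/mol) of deoxynucleotide residues
# within a DNA strand; end-group corrections are ignored.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def dsdna_molecular_weight(seq):
    """Molecular weight (g/mol) of a double-stranded fragment, both strands summed."""
    seq = seq.upper()
    top = sum(RESIDUE_MASS[base] for base in seq)
    bottom = sum(RESIDUE_MASS[COMPLEMENT[base]] for base in seq)
    return top + bottom


def copies_per_ng(seq):
    """Number of double-stranded template molecules in 1 ng of the fragment."""
    return 1e-9 / dsdna_molecular_weight(seq) * AVOGADRO


def copies_per_ul(seq, conc_ng_per_ul):
    """Template copies per microlitre at a measured concentration (ng/µL)."""
    return copies_per_ng(seq) * conc_ng_per_ul


# Hypothetical 200 bp fragment, used only to exercise the functions
fragment = "ACGT" * 50
print(f"{dsdna_molecular_weight(fragment):.1f} g/mol")
print(f"{copies_per_ng(fragment):.3e} copies per ng")
```

With the copies per microlitre known, a serial dilution can be prepared so that each PCR receives a defined number of template molecules.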

Applicability of the PCR assays
To evaluate the performance of the method with degraded and complex samples, DNA extracts of decayed plant material and of wireworms from both feeding experiments and field catches were tested.
For decayed samples, maize stalks and whole wheat plants were buried in an abandoned field (574 m a.s.l., Tyrol, Austria) and left there for 20 (wheat) and 24 (maize) weeks, respectively. At this time point most plant parts were almost decomposed. We then analyzed ten DNA extracts per plant species using the TZ duplex (Table 2) to test the applicability of our method for decayed plant tissues.
In addition, we tested the PCR assays on whole-body extracts of wireworms obtained from feeding experiments, which were performed similarly to those described in [31]: we offered L. perenne, T. officinale, A. millefolium, T. pratense, Plantago lanceolata, and Pimpinella major to the larvae as a food source for 24 h. Subsequently, total DNA of 10 wireworms per plant species, including any plant DNA present within their guts [31], was extracted and analyzed with the appropriate PCR assays (Table 3).
The third set of samples comprised whole-body DNA extracts of wireworms, which were collected in a maize field (574 m a.s.l., Tyrol, Austria); these samples were tested with the TZ duplex PCR (Table 2).
Table 2. Details of primers: plant species targeted, primer sequences (forward followed by reverse primer), expected amplicon length, concentration of each primer (µM), optimized annealing temperature (°C), MgCl2 concentration (mM), and affiliation to a multiplex assay.

Specific primers
Species- and genus-specific primer sites were found in all introns (Fig. 1), and the PCR products of two species-specific primer pairs also include the trnL-E1 region. The newly designed primers generate amplicons ranging between 116 bp and 381 bp.
The two family-specific primer pairs for Poaceae and Apiaceae are positioned in the IS2, and both reverse primers are placed next to the trnF gene. The length of the PCR product for Poaceae varies considerably among species, being shortest for Echinochloa crus-galli (187 bp) and longest for Z. mays (293 bp). In contrast, the amplicon length for Apiaceae is the same for all five species tested (198 bp). Likewise, the primers for the genus Plantago result in PCR products of the same length for the two species tested (P. major and P. lanceolata, 116 bp). The multiplex TAT, on the other hand, allows four different species to be distinguished within a single PCR (Table 3), because T. repens and T. pratense yield amplicons of different length (172 bp and 151 bp, respectively) with the very same primers (Tri-sp-S550 and Tri-sp-A558). For F. esculentum two primer combinations were optimized, resulting in fragments of 380 bp and 206 bp length, respectively. Assays for the remaining species-specific primers generate amplicons ranging between 181 bp (Z. mays) and 306 bp (T. aestivum) in length.
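Because taxa in a multiplex are told apart by amplicon length alone, it can be useful to check how far apart the expected fragments are. The sketch below does this for the amplicon lengths reported in this paper for the TAT multiplex (the length for T. officinale is not stated in the text and is therefore left out). The function names and the 15 bp minimum gap are assumptions chosen for illustration; the resolvable gap depends on the electrophoresis system used.

```python
from itertools import combinations

# Amplicon lengths (bp) reported in the text for three TAT targets
TAT_AMPLICONS = {"T. pratense": 151, "T. repens": 172, "A. millefolium": 222}

# Assumption: minimum size difference regarded as resolvable (not from the text)
MIN_GAP_BP = 15


def closest_pair(amplicons):
    """Return (gap_bp, taxon_a, taxon_b) for the two amplicons closest in length."""
    return min(
        (abs(len_a - len_b), a, b)
        for (a, len_a), (b, len_b) in combinations(amplicons.items(), 2)
    )


def is_resolvable(amplicons, min_gap_bp=MIN_GAP_BP):
    """True if every pair of amplicons differs by at least min_gap_bp."""
    return closest_pair(amplicons)[0] >= min_gap_bp


print(closest_pair(TAT_AMPLICONS))
print(is_resolvable(TAT_AMPLICONS))

# The 222 bp fragment reported for the non-target Medicago lupulina
# cannot be separated from A. millefolium by size.
with_cross_reaction = {**TAT_AMPLICONS, "M. lupulina": 222}
print(is_resolvable(with_cross_reaction))
```

The same check can be run on any candidate primer set before it is combined into a new multiplex.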

Diagnostic PCR assays
Each 15 µL PCR contains 4 µL of DNA extract, 7.5 µL of 2× TypeIt Mutation™ Detect PCR Kit (Qiagen), 0.5 µg bovine serum albumin (BSA), and 0.5 µL of 5× Q-solution (Qiagen). The thermocycling program is: 95 °C for 5 min, 40 cycles of 92 °C for 20 s, 51-64 °C for 90 s and 70 °C for 90 s, and finally 70 °C for 5 min. Primer concentrations, MgCl2 content and annealing temperature for specific PCRs are given in Table 2.
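As a quick planning aid, the reaction set-up and cycling program above can be written down as data and summed. The sketch assumes the volumes are in microlitres and ignores ramping between temperatures; the constant and function names are illustrative, and the "remaining volume" is an inference from the stated volumes rather than a figure given in the text.

```python
# Hold times (seconds) taken from the thermocycling program in the text
INITIAL_DENATURATION_S = 5 * 60
CYCLES = 40
CYCLE_STEPS_S = {"denaturation": 20, "annealing": 90, "extension": 90}
FINAL_EXTENSION_S = 5 * 60

# Component volumes (µL) stated in the text for one 15 µL reaction
REACTION_VOLUME_UL = 15.0
STATED_VOLUMES_UL = {"DNA extract": 4.0, "2x PCR kit": 7.5, "5x Q-solution": 0.5}


def total_hold_time_s():
    """Sum of programmed hold times; ramping between temperatures is ignored."""
    per_cycle = sum(CYCLE_STEPS_S.values())
    return INITIAL_DENATURATION_S + CYCLES * per_cycle + FINAL_EXTENSION_S


def remaining_volume_ul():
    """Volume left for primers, MgCl2, BSA and water (inferred, not stated)."""
    return REACTION_VOLUME_UL - sum(STATED_VOLUMES_UL.values())


print(total_hold_time_s() / 60)  # minutes of programmed hold time
print(remaining_volume_ul())     # µL
```

Under these assumptions the programmed holds add up to 8600 s, a little under two and a half hours before ramp times, and 3 µL of each reaction remain for primers, MgCl2, BSA and water.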
In only one case did a non-target species generate PCR products similar in size to those of the targeted plants: the multiplex designed for T. officinale, A. millefolium and the two Trifolium species (TAT) cross-reacted with DNA of Medicago lupulina, producing a 222 bp fragment, the same length as the one expected for A. millefolium.
The PCR assays are highly sensitive: in most cases amplification and visualization of the target DNA is possible down to 100 templates of target DNA per PCR (Table 3). The presence of wireworm DNA does not, or only marginally, decrease the sensitivity of the different assays (Table 3). Only two Apiaceae species exhibit a lower sensitivity: Anthriscus sylvestris at 800 templates and Pimpinella major at 1,600 templates. For F. esculentum the primers amplifying the longer fragment turned out to be more sensitive (200 copies) than the ones generating the shorter one (400 copies).

Applicability of the newly established PCR assays
In the decay experiment, all DNA extracts of the decayed maize and wheat material recovered after 20 or 24 days of exposure in the soil could be identified (detection rate = 100%). Likewise, all plant species fed to the wireworms were detectable in the whole-body DNA extracts of larvae (the mean detection rate over all plant species was 30%). The detection rates were 50% for P. major, 45.5% for P. lanceolata, 29.2% for A. millefolium, 20% for L. perenne, 18.6% for T. officinale, and 10% for T. pratense. Of the field-collected wireworms, 21% tested positive for maize DNA.

Discussion
We present optimized PCR assays based on specific primers for the identification of plant DNA. Based on a discrimination of similar vs. variable sequence regions within and among families and comprehensive in silico testing of primer cross-reactivity, we were able to generate specific primers targeting the trnT-F cpDNA region. The most challenging part of developing these assays was the design of reliable primers. This is mainly due to the highly ambiguous alignment of the selected chloroplast sequences caused by high rates of indels, a general feature of cpDNA spacer regions [25]. It appears that even among closely related taxa, great length differences in non-coding regions exist, such that at greater taxonomic distances no shared sequences remain.
Earlier attempts at molecular identification of morphologically indistinguishable plant parts employed different DNA markers and methods. Some of the methodological hurdles involved coincide with the difficulty of finding an appropriate DNA barcode for plants. The ITS, for example, has been successfully applied to distinguish plants in small-scale studies comprising a limited number of species [32], [1], [33]. However, it does not always allow plant species to be identified unambiguously [6]. Likewise, Kesanakurti et al. [34] were unable to distinguish multiple species using rbcL for the identification of plant roots. Consequently, barcoding roots requires more rapidly evolving regions or the application of a 2-locus approach, as promoted by the CBOL plant working group [35]. In addition, alternative PCR-based methods have been applied [36], [37], [16]: DNA sequencing or restriction fragment length polymorphism (RFLP) analysis of plastid genes (rbcL and trnL). Moore and Field [32] were able to identify root samples of up to four species based on RFLP keys. Despite their usability, RFLPs reveal only changes at restriction sites or length variation large enough to be detected [6]. With an increasing number of species present in a sample, the revealed patterns are more likely to blend together and overlap. Moreover, the type of organ from which the DNA is taken affects the genetic fingerprint, as pattern differences between roots and leaves were found [38]. Each of the approaches described above comprises a cascade of reactions necessary to assign PCR products to a specific plant species. In contrast, we could identify plants to species level within a single PCR. The trnT-F cpDNA used in this study has already proved to be an appropriate barcode for identifying digested plant DNA [39], [40], [41]. However, the approach presented here is also applicable to loci other than the trnT-F region.
Once specific primers are established, multiplex PCR provides a means to detect and identify several targets simultaneously [42], [43], circumventing the need for follow-up reactions such as RFLP analysis. The number of species that can be identified simultaneously in a single multiplex PCR is limited by the requirement of adequate size differences between the amplicons and, in case degraded plant DNA is targeted, by the restricted length of the PCR products [23]. As the number of target species increases, so will the time and effort needed to screen each sample for multiple plant species. Another limitation of our approach is the need to sequence and find primer sites prior to the application of a new PCR system. In time, an increasing number of both plant sequences and specific primers will become available, thus reducing these efforts. This process could also be accelerated by the use of next-generation sequencing, which is capable of sequencing many thousands of samples simultaneously [44].
Whereas our approach can only detect plant taxa for which primers have already been developed, next-generation sequencing allows an examination without a priori knowledge of the species involved. However, this approach also presupposes that the general primers used match all target species equally well and that preferential amplification of certain species does not inhibit the detection of other species [45]. Besides, the tagging of the primers, which is necessary for most next-generation sequencing techniques in order to analyse individual samples [44], can influence their reactivity in the PCR [45].
Due to these constraints, next-generation sequencing is recommended for situations where little or no a priori knowledge is available. Alternatively, when information on the population level is sufficient for addressing a study's aims, a meta-sample can be analysed [46]. However, in-depth analysis will be limited to a few individual samples only (e.g. [20], [22]) due to the cost of this approach. Moreover, it is expensive to use separate tags for potentially hundreds of individual samples. Hence, for work which requires an individual-based analysis, primers can subsequently be designed that target specific taxa, followed by mass screening of individuals using multiplexing and fragment analysis to make the task more efficient [47].
While in next-generation sequencing species are identified by comparing the obtained DNA with reference sequence information, in diagnostic PCR plant identification is based on differences in amplicon size. Hence, it is vital for the current approach that the specific amplicon sizes are obtained with target DNA only, which makes it necessary to carefully test the PCRs against a wide range of non-target taxa whose DNA might also be present in the samples [48]. Accordingly, recurrent checks of a subsample of amplicons via sequencing are advisable to confirm the identity of the target species. For the PCR protocols presented here the levels of cross-reactivity remained low, as in only one case was a size-specific PCR product obtained with a non-target species, which belonged to the same family. An application to other plant communities will require cross-reactivity testing with species that were not present in the current study.
Our PCR assays were successful in detecting as few as 100 template molecules per reaction. The sensitivity remained high even in the presence of excess non-target (wireworm) DNA, mimicking plant detection in complex mixtures of DNA, as is the case for gut content, faecal, litter or soil samples. Besides high assay sensitivity, PCR products need to be short enough to track degraded DNA [10], such as remains in decaying plant material or in the intestine and faeces of herbivores [23]. The current assays generate amplicons shorter than 400 bp, thus maximizing the likelihood of detecting degraded DNA.
We have previously demonstrated the capability of our approach to detect and identify DNA of ingested plants in whole-body extracts of wireworms for over three days post-feeding [31]. Here, we introduce two methods to increase the efficacy of diagnostic screenings. Firstly, primer pairs have been combined in multiplex PCRs to reduce the number of PCRs necessary [42]. Secondly, the application of family-specific primers allows a pre-selection of samples, which considerably reduces the number of samples that need to be analysed for different genera or species within this family. Our molecular identification system could also be applied in forensic botany to routinely and correctly identify trace botanical evidence, where the absence of an accurate identification system currently remains the major obstacle [49]. For the analysis of botanical trace evidence in criminal and civil cases, plant species identification would be reduced to a set of PCRs in a routine analysis based on the PCR technique reported here. Tsai et al. [17] established a DNA database of local plants in Taiwan from sequences comprising the trnL intron and the trnL-F intergenic spacer, which could provide an additional basis for the development of new specific primers.
The analysis of leaf litter mixtures is another example where decaying plant material is difficult to assign to species [50]. Although badly needed, to our knowledge no successful attempts at molecular litter identification currently exist. It is very difficult to estimate litter composition in natural ecosystems: many species are mixed, and they are present in different stages of decay due to species-dependent differences in rates of plant litter decomposition [51]. This causes problems when attempting to sequence litter samples. The use of short diagnostic PCR products as markers enables detection even if only traces of DNA are left. It provides a simple and cheap means of sorting litter components into species, similar to the molecular identification of detritivorous macro-invertebrates from their faecal pellets [52].
In summary, the approach outlined here is applicable for the identification of otherwise unidentifiable plants and plant parts, comprising roots, leaf litter, decaying or ingested plant material, and herbivore faeces. It offers a wide range of applications and can be tailored towards the needs of future work following the protocols described here, contributing to a better understanding of what is going on "directly under our very noses".
Figure 1. [...] the genus- and species-specific primers: The dotted lines represent the known sequence, the inner bars indicate the position of the two trnL exons and the outer bars the position of the trnT and the trnF gene. The binding sites of primers are indicated by double crosses. doi:10.1371/journal.pone.0029473.g001

Supporting Information
Supporting Information S1 PCR conditions to generate templates for sequencing.