DNA Barcoding for Identification of ‘Candidatus Phytoplasmas’ Using a Fragment of the Elongation Factor Tu Gene

Background
Phytoplasmas are bacterial phytopathogens responsible for significant losses in agricultural production worldwide. Several molecular markers are available for identification of groups or strains of phytoplasmas. However, they often cannot be used to identify phytoplasmas from different groups simultaneously, or they are too long for routine diagnostics. DNA barcoding recently emerged as a convenient tool for species identification. Here, the development of a universal DNA barcode based on the elongation factor Tu (tuf) gene for phytoplasma identification is reported.

Methodology/Principal Findings
We designed a new set of primers and amplified a 420–444 bp fragment of tuf from all 91 phytoplasma strains tested (16S rRNA groups -I through -VII, -IX through -XII, -XV, and -XX). Comparison of NJ trees constructed from the tuf barcode and a 1.2 kbp fragment of the 16S ribosomal gene revealed that the tuf tree is highly congruent with the 16S rRNA tree and has higher inter- and intra-group sequence divergence. Mean K2P inter- and intra-group divergences of the tuf barcode did not overlap and differed by approximately one order of magnitude for most groups, suggesting the presence of a DNA barcoding gap. The use of the tuf barcode allowed separation of the main ribosomal groups and most of their subgroups. Phytoplasma tuf barcodes were deposited in the NCBI GenBank and Q-bank databases.

Conclusions/Significance
This study demonstrates that DNA barcoding principles can be applied to the identification of phytoplasmas. Our findings suggest that the tuf barcode performs as well as or better than a 1.2 kbp fragment of the 16S rRNA gene and thus provides an easy procedure for phytoplasma identification. The obtained sequences were used to create a publicly available reference database that can be used by plant health services and researchers for online phytoplasma identification.


Introduction
Phytoplasmas are bacterial plant pathogens that are transmitted by hemipteran insect vectors and that cause significant losses in agricultural production worldwide [1]. They are assigned to a clade within the class Mollicutes, a branch of the Gram-positive eubacteria that lack cell walls [2]. Although phytoplasmas are relatively well studied, their identification is still challenging as they do not possess a distinctive morphology and are currently nonculturable in vitro.
Phytoplasmas infect over 200 plant species [1], and infected plants show symptoms such as virescence, phyllody, yellowing, witches' broom, and generalized decline. Apple proliferation, 'stolbur', 'flavescence dorée', 'bois noir' and coconut lethal yellowing are among the most prominent phytoplasma diseases and are considered of quarantine relevance in the EU, which means that their spread should be especially tightly regulated. No fully resistant crop varieties are currently available, and the main disease management strategies are limited to control of insect vectors (when known) and elimination of infected or symptomatic plants. Therefore, availability of reliable and efficient methods for identification is especially important for this group of pathogens.
Currently, over thirty species are recognized within the 'Candidatus Phytoplasma', mostly based on at least 97.5% sequence identity within their 16S ribosomal RNA gene, but also on biological, phytopathological, and other molecular characteristics [3]. For practical diagnostic purposes, phytoplasmas are commonly classified based on patterns of restriction fragment length polymorphism (RFLP) analysis of a 1.2 kbp fragment of the 16S rRNA gene [4,5] after amplification with R16F2n and R16R2 [6] primers, often as a nested PCR following first-round amplification of a 1.8 kb fragment using P1 [7] and P7 [8] primers. However, the 16S rRNA gene does not show much variation and may be present in two, sometimes non-identical, copies [9]. This has prompted the use of other, more variable regions of the phytoplasma genome. The 16SrI ('Ca. P. asteris'-related) group has been subdivided using the tuf gene, the ribosomal protein operon (rp) and the 16S-23S rRNA intergenic spacer region [10,11], along with the secY gene [12] and groEL gene [13]. The 16SrV group has been subdivided using secY, map and uvrB-degV [14], and rp [15] genes. The 16SrXII group has been subdivided using the tuf gene and the rp operon [16]. The majority of these studies have examined taxonomic relations within specific 16Sr groups, as these PCR primers are group-specific and do not amplify DNA from phytoplasmas in other groups. Martini et al. (2007) used primers that amplify a 1.2-1.4 kbp fragment of the genes rplV (rpl22) and rpsC (rps3) from a wide range of phytoplasmas and constructed a phylogenetic tree with finer resolution within 16S ribosomal groups [17]. Lee and coworkers (2010) used various primers for different phytoplasma groups that amplified a fragment of more than 2 kb of the secY gene and were also able to construct a phylogenetic tree with high resolution [18].
Although these studies improved knowledge of phylogenetic relationships, the length of the amplicons makes them rather impractical for routine sequencing-based identification. The first attempt to use a shorter marker for universal phytoplasma identification was undertaken by Hodgetts et al. (2008), who used a 480 bp fragment of the secA gene [19]. The resulting phylogenetic tree revealed good resolution of 'Candidatus Phytoplasma' species, and the secA fragment emerged as a promising marker for universal identification of phytoplasmas. However, the limited number of phytoplasma strains in that study (34 in total) did not allow evaluation of its full potential. Furthermore, some strains that were not tested in the original study did not amplify well using the published primers, at least in our hands.
It has been suggested that universally amplified, short, and highly variable DNA markers (DNA barcodes) may help to rapidly identify organisms to a species level with a high confidence in a cost-effective way, which would be useful in a wide array of applications, including diagnostics [20,21]. In general, DNA barcodes should contain sufficient variation to discriminate among closely related species and yet possess highly conserved regions so that the barcode region can be easily amplified and sequenced with standard protocols. Furthermore, the taxonomy to which DNA barcodes are linked should be already established by other means, as DNA barcoding is considered a method of molecular identification and not taxonomy [22]. Indeed, DNA barcoding recently emerged as a popular and convenient tool for species identification and has been successfully used and taxonomically validated for eukaryotes [23][24][25]. Several international DNA barcoding projects have been launched in recent years [26]. QBOL (Quarantine Barcode of Life), an EU project aiming at the development of a universal identification system for main groups of quarantine organisms, including phytoplasmas, was started in 2009 [27]. As a part of this initiative we attempted to develop a universal DNA barcoding-based tool for phytoplasma identification.
Here, we present the results of a study on DNA barcoding for phytoplasma identification. A set of primers amplifying a fragment of the tuf gene was designed and the potential of this fragment as a DNA barcode was examined. Elongation factor Tu is a key protein involved in the process of translation that is present in all known organisms, relatively well conserved and found as a single copy in the four phytoplasma genomes fully sequenced to date [28][29][30][31]. Successful amplification and sequencing of the 91 phytoplasma strains tested, and ability to separate various phytoplasma strains to 'Candidatus' species, 16S rRNA group and subgroup levels suggested it can be used as a DNA barcode for phytoplasma identification.

Taxon Selection and Nucleic Acid Preparation
Ninety-one phytoplasma strains collected in various geographic locations from 16Sr groups -I, -II, -III, -IV, -V, -VI, -VII, -IX, -X, -XI, -XII, -XV, -XX were used in this study. The strain names and their respective 'Candidatus Phytoplasma' species (when available), 16Sr group and subgroup, geographical origin and host plant are listed in Table S1. The majority of the strains were obtained from the phytoplasma reference collection located at the University of Bologna, Italy [32], or as DNA preparations from other researchers. The remaining strains were maintained in periwinkle (Catharanthus roseus), in napier grass (Pennisetum purpureum) for napier grass stunt, or in grapevine for 'flavescence dorée' and 'bois noir' strains. Healthy apple, aster, grapevine, lettuce, maize, tobacco, periwinkle, plum, potato and oat plants, some of which are typical hosts of phytoplasmas, were used as negative controls (Figure 1). DNA was extracted by the method described in Prince et al. 1993 [33].

Primer Design
tuf gene sequences of phytoplasmas (16Sr groups -I, -III, -IV, -V, -VII, -X and -XII), several plant species, Bacillus and Clostridium spp. were retrieved from the NCBI GenBank (http://www.ncbi.nlm.nih.gov/) and aligned in the DNA Workbench software (CLC bio, Aarhus, Denmark) with default settings, and conserved regions present in all phytoplasmas were identified. Primer sequences were designed by visual assessment of the alignment to include all phytoplasma groups and to exclude plant and bacterial DNA.
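
The primers in this study were chosen by visual inspection of the alignment. As a rough illustration of what identifying conserved regions means computationally, the following sketch scans a gapped alignment for runs of columns that are identical in every sequence. The function name `conserved_windows`, the `min_len` parameter and the toy sequences are hypothetical and are not taken from the study.

```python
def conserved_windows(alignment, min_len=18):
    """Return (start, end) column ranges (end exclusive) that are
    identical and gap-free in every aligned sequence.

    `alignment` is a list of equal-length strings; `min_len` is an
    assumed minimum primer-site length, not a value from the paper.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    windows = []
    start = None  # first column of the conserved run currently being extended
    for col in range(length):
        column = {seq[col].upper() for seq in alignment}
        conserved = len(column) == 1 and "-" not in column
        if conserved and start is None:
            start = col
        elif not conserved and start is not None:
            if col - start >= min_len:
                windows.append((start, col))
            start = None
    # Close a run that extends to the end of the alignment.
    if start is not None and length - start >= min_len:
        windows.append((start, length))
    return windows


# Invented toy alignment: only column 10 differs between the sequences.
toy = [
    "ATGGCTAAAGGTCCA",
    "ATGGCTAAAGCTCCA",
    "ATGGCTAAAGGTCCA",
]
print(conserved_windows(toy, min_len=5))  # [(0, 10)]
```

In practice, candidate windows would still need to be checked against plant and non-target bacterial sequences, as was done by eye in this study.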

PCR Amplification and Sequencing
PCR was carried out in a 25 µl reaction mixture containing 10 µM primers, Fermentas Taq polymerase (for the P1/P7 primer pair) or Promega GoTaq DNA polymerase (for all other primers) and respective reaction buffers, 25 mM MgCl2, 10 mM dNTPs mix (Fermentas) and sterile water. One µl (20 ng/µl) of DNA template was used per reaction. For amplification of the tuf barcode, two pairs of primer cocktails (Tuf340/Tuf890 for direct PCR and Tuf400/Tuf835 for nested PCR, Table 1) were used. Each primer cocktail consisted of several variants of the same primer mixed in equimolar proportions to the final concentration of 10 µM. The same primer cocktails were used for amplification of all phytoplasma strains used in this study. PCR conditions for both rounds of PCR with primer pair Tuf340/Tuf890 followed by primer pair Tuf400/Tuf835 (Table 1) were 94°C for 3 min followed by 35 cycles of 94°C for 15 sec, 54°C for 30 sec and 72°C for 1 min and a final extension step of 72°C for 7 min. Resulting PCR products from the first round were diluted 1:30 with sterile water and 1 µl of product was used in the nested PCR assay. Amplification of the 16S rRNA gene was performed in a nested PCR assay using primers P1 and P7 in the first round, followed by primer pairs P1-ATT (AAGAGTTTGATCCTGGCTCAGG)/P625 (ACTTAYTAAACCGCCTACRCACC) (this study), P4 [8]/P7, 16R758f (M1) [34]/P7 and R16F2n/R16R2 [35] in the nested PCR to allow for overlapping coverage of the 1,800 bp region.
The cycling conditions for the primer pair P1-ATT/P625 were 94°C for 3 min followed by 35 cycles of 94°C for 15 sec, 64°C for 30 sec and 72°C for 1 min and a final extension step of 72°C for 7 min; for the primer pair P4/P7, 94°C for 3 min, 35 cycles of 94°C for 15 sec, 52°C for 30 sec, 72°C for 60 sec, followed by a final extension at 72°C for 7 min; for the primer pair M1/P7, 94°C for 5 min, followed by 35 cycles of 94°C for 90 sec, 54°C for 2 min, 72°C for 3 min, followed by 72°C for 7 min; for the primer pair R16F2n/R16R2, 94°C for 2 min, followed by 35 cycles of 94°C for 1 min, 50°C for 2 min, 72°C for 3 min, followed by 72°C for 10 min. PCR products from the first round with P1/P7 primers were diluted 1:30 with sterile water and 1 µl of product was used in the nested PCR assays. Post-PCR cleanup and sequencing of the amplicons were performed by Macrogen Inc. (Seoul, Korea). All PCR products were sequenced on both strands using M13F and T7 as primers for tuf sequences and P1, P625, P4, P7, Phyt16Sr (TCCTACGGGAGGCAGCAG), M1, 16R1232r (M2) [34] and Phyt16Sr2 (TATTGTTAGTTGCCAGCACG) for the 16Sr gene.
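
The P625 primer quoted above contains the IUPAC ambiguity codes Y and R, so it stands for more than one concrete oligonucleotide. The sketch below, which is only an illustration and not part of the published protocol, expands such a degenerate sequence (written here as a continuous string) into its explicit variants. Whether the Tuf primer cocktails in Table 1 were assembled in this way is an assumption; the function name `expand_primer` is hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def expand_primer(primer):
    """List every unambiguous sequence encoded by a degenerate primer."""
    try:
        choices = [IUPAC[base] for base in primer.upper()]
    except KeyError as err:
        raise ValueError(f"not an IUPAC nucleotide code: {err.args[0]}") from None
    return ["".join(combo) for combo in product(*choices)]


# P625 as given in the text: Y = C or T, R = A or G, hence 2 x 2 variants.
variants = expand_primer("ACTTAYTAAACCGCCTACRCACC")
print(len(variants))  # 4
for variant in variants:
    print(variant)
```

An equimolar cocktail would then contain each listed variant at the same concentration.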

Sequence and Phylogenetic Analyses
Sequences were assembled and edited using DNA Workbench (CLC bio, Aarhus, Denmark) software. The resulting consensus sequences have been deposited in the NCBI GenBank and the Q-Bank (http://www.q-bank.eu/) databases. The NCBI GenBank accession numbers of the tuf barcode sequences can be found in Table S1. The NCBI GenBank accession numbers of the 16S rRNA gene sequences sequenced in this study or obtained from the NCBI GenBank for phylogenetic analyses can be found in Figure 2. The tuf gene sequence of Acholeplasma laidlawii (accession number NC010163) was retrieved from the NCBI GenBank. The 1.2 kb R16F2n/R16R2 fragment of the 16S rRNA gene from 66 phytoplasma strains and the 420-444 bp Tuf400/Tuf835 fragment of the tuf gene from 91 strains were used for construction of the 16S rRNA and tuf alignments, respectively. Sequence alignments were performed using a progressive alignment algorithm [36] implemented in the DNA Workbench package (CLC bio, Aarhus, Denmark) with the following settings: gap open cost 10, gap extension cost 1, end gap cost as any other. The alignments were exported to the MEGA 4 software [37] for distance and phylogenetic analyses. Neighbor-Joining (NJ) [38] trees were constructed using 500 replicates for bootstrap analysis [39] and A. laidlawii as an outgroup to root the tree. Average intra- and inter-group evolutionary divergences were calculated using the Kimura 2-parameter (K2P) distance model [40]. Groups for determining genetic distances were defined based on the 16Sr [4,5] and 'Ca. Phytoplasma' (when applicable) [3] classification systems. The bootstrap [39] consensus Maximum Likelihood (ML) trees were inferred from 100 replicates using the best-fitting model of 24 different nucleotide substitution models (Table S2 and Table S3). The Tamura-Nei model [41] was used to infer the 16S rRNA ML tree, and the Tamura 3-parameter model [42] was used for construction of the tuf ML tree.
In both cases a discrete Gamma distribution (+G) was used to model evolutionary rate differences among sites, assuming that a fraction of sites are evolutionarily invariable (+I).
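
For readers unfamiliar with the K2P model, the distance between two aligned sequences is d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of sites showing transitions and transversions, respectively. The following is a minimal sketch of that formula, not a reimplementation of MEGA 4; skipping sites with gaps or ambiguous bases pairwise is an assumption made here for simplicity, and the function name `k2p_distance` is hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences.

    Sites where either sequence has a gap or a non-ACGT character are
    ignored (an assumption of this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)


# Invented example: one transition in ten sites, so P = 0.1 and Q = 0.
print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

Group-level values such as those reported in this study would be obtained by averaging such pairwise distances within and between the defined groups.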

Results and Discussion
For an ideal phytoplasma DNA barcoding procedure, the barcode should be easily amplifiable using a single set of primers, be relatively short to facilitate sequencing, show non-overlapping inter- and intra-species sequence divergence (i.e. creating the DNA barcoding gap, when interspecific variation is normally greater than intraspecific variation by an order of magnitude) [22] and it should provide sufficient resolution for identification of phytoplasma 'Candidatus' species within the current taxonomy. Moreover, DNA barcoding of uncultivable plant pathogenic bacteria obviously requires that primers do not amplify plant or unrelated bacterial DNA. A DNA barcoding-based system was developed in this study that fulfilled these criteria and allowed identification of all tested phytoplasma strains to ribosomal group and/or phytoplasma 'Candidatus' species level and in some cases to subgroup level with only one set of nested primers.

Figure 3. Distribution of the pairwise inter-group mean K2P sequence divergence. Pairwise Kimura-2-parameter average distances between groups were determined for 91 and 66 phytoplasma strains from 19 groups for tuf and 16Sr, respectively. Note that inter-group sequence divergence in the 420-444 bp tuf barcode is much higher than divergence in the 1.2 kbp 16Sr gene fragment. doi:10.1371/journal.pone.0052092.g003

Tuf Barcode Primer Design, Amplification and Sequencing
Prior to this study, tuf gene sequences available in the NCBI nucleotide database were limited to a few sequences from the 16Sr groups -I, -III, -IV, -V, -VII, -VIII, -X and -XII. Alignment of these phytoplasma tuf sequences resulted in identification of conserved regions within the tuf gene. These regions were exploited for primer design in an attempt to amplify tuf gene sequences from most or all phytoplasmas, but not from plant DNA or DNA from other bacteria. As phytoplasmas can occur in low titer, two sets of primers were developed for use in a nested PCR assay, resulting in four primer cocktails to accommodate any sequence variation between phytoplasma groups: Tuf340/Tuf890 were used for the first PCR and Tuf400/Tuf835 for the nested PCR (Table 1). M13 and T7 sequences were attached to the inner primer pair to facilitate sequencing. The nested PCR resulted in products of the expected size (420-444 bp) from all 91 phytoplasma strains tested (Table S1) and no products from a range of healthy plant controls (Figure 1). The tuf gene PCR products were sequenced and the obtained sequences were deposited in the NCBI GenBank and in the QBOL project reference barcode database Q-bank (www.q-bank.eu).

Sequence Analysis
The sequences of the tuf barcode and, for comparison, the 1,240 bp R16F2n/R16R2 fragment of the 16S rRNA gene were assembled into two datasets for sequence analysis. The tuf barcode dataset consisted of sequences from 91 phytoplasma strains obtained in this study, whereas the 16S rRNA gene dataset contained sequences from 66 selected strains that were sequenced in this study or imported from NCBI GenBank. Both datasets included sequences from 16Sr groups -I, -II, -III, -IV, -V, -VI, -VII, -IX, -X, -XI, -XII, -XV, -XX.
Average inter-group K2P evolutionary divergence was calculated for nineteen phytoplasma groups (16Sr groups or 'Candidatus' species). The distribution of mean inter-group sequence divergence revealed that most of the 16Sr pairwise comparisons had 3-11% sequence divergence, whereas for tuf they ranged from 28 to 42%, suggesting that the tuf barcode has more characters that allow better separation among groups (Figure 3). However, in several instances, pairwise inter-group comparisons were as low as 1-8% for the tuf barcode. These outliers represent phytoplasma groups that are closely related to each other but are considered separate 'Candidatus' species based on distinctive biological and phytopathological properties [3], and include phytoplasmas from the 16SrV (A, B, C+D, E) and 16SrX (A, B, C) groups. Taken together, these results suggest that although the tuf barcode demonstrates a variable level of ribosomal subgroup resolution, its overall performance is comparable with or better than that of the full 16Sr sequence. Average within-group sequence divergence was determined for groups where tuf and 16Sr sequences were available for at least three strains (16SrI, -II, -III, -VI and -XI) (Figure 4). The highest variation in the tuf dataset was found in 'Ca. P. oryzae' group 16SrXI (13.1%) and 'Ca. P. aurantifolia' group 16SrII (4.9%), suggesting the presence of phytoplasma subgroups that were not identified based on 16S rRNA gene phylogeny alone, as was also observed in other studies of group 16SrII [19,43]. However, we cannot rule out the possibility that the combination of highly variable sequences and a low number of representatives in a given group could artificially inflate average intra-group sequence divergence. Comparison of the tuf barcode average inter-group (both minimal and mean values) and intra-group K2P sequence divergence revealed the presence of a barcoding gap in most groups, as ratios between inter- and intra-group divergences were greater than 1 (Table 2).
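
The inter- versus intra-group comparison described above can be sketched in a few lines. The example below is a simplified illustration that pools all groups rather than reporting values per group as in Table 2; the strain names, group labels and distances are invented, and the function name `gap_summary` is hypothetical.

```python
from itertools import combinations
from statistics import mean


def gap_summary(distances, groups):
    """Summarise intra- and inter-group pairwise distances.

    `distances` maps frozenset({strain_a, strain_b}) to a pairwise
    distance (for example K2P); `groups` maps strain name to group label.
    """
    intra, inter = [], []
    for a, b in combinations(sorted(groups), 2):
        d = distances[frozenset((a, b))]
        if groups[a] == groups[b]:
            intra.append(d)
        else:
            inter.append(d)
    if not intra or not inter:
        raise ValueError("need at least one intra- and one inter-group pair")

    mean_intra = mean(intra)
    mean_inter = mean(inter)
    ratio = mean_inter / mean_intra if mean_intra > 0 else float("inf")
    return {
        "mean_intra": mean_intra,
        "mean_inter": mean_inter,
        "min_inter": min(inter),
        "ratio": ratio,  # a value well above 1 points to a barcoding gap
    }


# Invented toy data: two groups with two strains each.
groups = {"s1": "A", "s2": "A", "s3": "B", "s4": "B"}
distances = {
    frozenset(("s1", "s2")): 0.02,
    frozenset(("s3", "s4")): 0.04,
    frozenset(("s1", "s3")): 0.30,
    frozenset(("s1", "s4")): 0.32,
    frozenset(("s2", "s3")): 0.28,
    frozenset(("s2", "s4")): 0.30,
}
summary = gap_summary(distances, groups)
print(summary)
```

With these toy numbers the mean inter-group distance is about ten times the mean intra-group distance, which is the order-of-magnitude separation the barcoding gap criterion looks for.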
The obtained tuf sequences were subjected to phylogenetic analysis to further test whether individual sequences form groups that can be used for identification. However, it should be stressed that the tuf sequences used in this study were solely intended for identification of phytoplasmas and not for phylogenetics. Neighbor-Joining trees constructed from the tuf and 16S rRNA alignments showed remarkable similarity in terminal taxa, implying that the tuf barcode is well linked to the existing 16S rRNA phytoplasma phylogeny (Figure 2). This was supported by a separate phylogenetic analysis using the Maximum Likelihood method (Figure S1 and Figure S2).
Furthermore, the tuf-based NJ phylogeny also resolved subgroups within several 16Sr groups. For example, 16SrII could be split into several subgroups (Figure 2A), as reported previously [19,43]. Another example is the 16SrX group, which was resolved into the three 'Candidatus' species 'Ca. P. pyri', 'Ca. P. mali' and 'Ca. P. prunorum', all of which are important pathogens of fruit trees in Europe. 'Ca. P. asteris' (16SrI), which has previously been divided into subgroups based on the RFLP analysis of the 16S rRNA region, could clearly be differentiated into subgroups 16SrI-A, -B and -C using the tuf barcode. Closely related 16SrV group members containing important pathogens such as 'flavescence dorée' (16SrV-C and -D) and elm yellows (16SrV-A) could also be separated using the tuf barcode (Figure 2A), as seven single nucleotide polymorphisms (SNPs) were observed between 'flavescence dorée' and elm yellows and one SNP was found between subgroups 16SrV-C/D and 16SrV-E ('Ca. P. rubi').
In conclusion, it was demonstrated that the tuf barcode, while being three times shorter than the full-length 16Sr gene, has much higher inter- and intra-group divergence than the 16Sr gene, and that its inter- and intra-group divergences did not overlap, creating a barcoding gap, one of the prerequisites for newly proposed DNA barcodes [44]. Furthermore, phylogenetic analysis and alignments showed that important groups of phytoplasmas could readily be identified. All these findings suggest that, despite being shorter, the tuf barcode provides clear resolution at both group and subgroup levels relative to the 16S rRNA gene. Finally, it should be noted that in the case of mixed infection it may not be possible to obtain good-quality sequence information. However, this is a general problem in phytoplasma research that may be solved by cloning of PCR products and subsequent sequencing of individual clones. The sensitivity of the PCR using tuf primers was not tested in this study, but the use of a small fragment likely increases sensitivity in PCR compared to the much larger fragments used previously in other phytoplasma studies.

Online Identification Tool and DNA Barcode Database
This work was a part of the QBOL initiative, which aims to adopt DNA barcoding principles for the identification of plant pests and pathogens, with a focus on quarantine organisms, and to develop a DNA barcode-based identification system [27]. This includes establishment of a free online reference sequence database. The DNA barcodes obtained in this study were deposited in the database of plant pests and pathogens and can be found at http://www.q-bank.eu/Phytoplasmas/ together with other relevant information (geographical origin of strains, original and maintenance hosts, 16Sr groups and subgroups etc.). This DNA barcoding procedure will provide plant inspection services and other diagnosticians with a robust and easily performed identification tool. By using the protocols and primers provided on the above-mentioned website, it will be possible to sequence phytoplasma DNA and, with the help of the online identification tool, to compare sequences from field-collected phytoplasmas with reference strain sequences. This DNA barcoding system will improve detection of phytoplasmas associated with plant diseases.

Supporting Information
Figure S1 The bootstrap Maximum Likelihood tree of the tuf barcode. The ML tree supports the terminal branches clustering according to the 16Sr phytoplasma group classification observed in the NJ tree (Figure 2a). The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. Numbers at the nodes indicate bootstrap values; bar, substitutions per nucleotide position; 16Sr group and subgroup are in parentheses; A. laidlawii (accession number NC010163) was used as an outgroup. (PDF)
Figure S2 The bootstrap Maximum Likelihood tree of the R16F2n/R16R2 fragment of the 16S ribosomal RNA gene. The ML tree supports the terminal branches clustering according to the 16Sr phytoplasma group classification observed in the NJ tree (Figure 2b). The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. Numbers at the nodes indicate bootstrap values; bar, substitutions per nucleotide position; asterisk, strains sequenced in this study; GenBank sequence accession number is indicated following the strain acronym; 16Sr group and subgroup are in parentheses; A. laidlawii (accession number NC010163) was used as an outgroup. (PDF)