DNA Barcodes for Nearctic Auchenorrhyncha (Insecta: Hemiptera)

Background
Many studies have shown the suitability of sequence variation in the 5′ region of the mitochondrial cytochrome c oxidase I (COI) gene as a DNA barcode for the identification of species in a wide range of animal groups. We examined 471 species in 147 genera of Hemiptera: Auchenorrhyncha drawn from specimens in the Canadian National Collection of Insects to assess the effectiveness of DNA barcoding in this group.
Methodology/Principal Findings
Analysis of the COI gene revealed less than 2% intraspecific divergence in 93% of the taxa examined, while minimum interspecific distances exceeded 2% in 70% of congeneric species pairs. Although most species are characterized by a distinct sequence cluster, members of many groups of closely related species either shared sequences or showed close similarity, with 25% of species separated from their nearest neighbor by less than 1%.
Conclusions/Significance
This study, although preliminary, provides DNA barcodes for about 8% of the species of this hemipteran suborder found in North America north of Mexico. Barcodes can enable the identification of many species of Auchenorrhyncha, but members of some species groups cannot be discriminated. Future use of DNA barcodes in regulatory, pest management, and environmental applications will be possible as the barcode library for Auchenorrhyncha expands to include more species and broader geographic coverage.


Introduction
The hemipteran suborder Auchenorrhyncha includes two large superfamilies, Fulgoroidea (planthoppers) and Membracoidea (treehoppers and leafhoppers), and two smaller superfamilies, Cercopoidea (spittlebugs or froghoppers) and Cicadoidea (cicadas). In North America north of Mexico, there may be as many as 3800 species of Auchenorrhyncha [1][2][3][4]. The Nearctic component of the Mexican fauna is poorly known but may be equally rich. Many species of Auchenorrhyncha, especially leafhoppers, are economically important either as direct plant pests or as vectors of plant pathogens [5]. Some tree-, shrub- and grass-feeding Auchenorrhyncha are host specific [6][7][8], and leafhoppers have been used as indicators of habitat quality, particularly in grasslands, where they are especially diverse [8,9].
Sequence variation in the 5′ end of the mitochondrial cytochrome c oxidase subunit I gene (COI) has been adopted as the DNA barcode for the identification of species in the animal kingdom [10,11]. DNA barcode data are already available for several groups of Hemiptera (Aphididae [12,13], Adelgidae [14], Heteroptera [15,16], Coccoidea [17]). However, very little sequence information is available for the barcode region in Auchenorrhyncha. The only broad surveys are those of Kamitani [18], who provided DNA barcodes for 45 species of Japanese Cicadellidae; Cryan and Svenson [19], who included COI sequences for 80 species in their investigation of family-level relationships among Cercopoidea; and Lin & Wood [20], in a study of tribal relationships and the evolution of maternal care in Membracinae. In addition, several genus-level or species-group phylogenetic analyses and investigations of population variation have included all or part of the barcode region [21][22][23][24][25][26][27][28][29][30][31][32], providing intensive within-species replication. Two other studies employed COI barcodes to identify cicadellid prey items [33][34], while Le Roux and Rubinoff [35] used COI sequences to help determine the source of a leafhopper adventive to Hawaii. Seabra et al. [36] examined the use of 'barcoding' in Philaenus, but used the 3′ end of the COI gene, generating results that are not directly comparable to the global standard.
The present study provides a preliminary library of COI barcodes for Nearctic Auchenorrhyncha, primarily from Canada and the northern United States, based on material in the Canadian National Collection of Insects, Arachnids and Nematodes (Agriculture and Agri-Food Canada, Ottawa) (CNC).
Materials and Methods
A total of 1482 dried, pinned specimens of Cicadidae, Cercopidae (including Aphrophoridae), Cicadellidae, Membracidae and Aetalionidae were selected from the CNC (Table 1). Most of the specimens were larger-bodied forms collected by hand, so several cicadellid subfamilies consisting mainly of small-bodied species were not represented. The majority of the species were from North America north of Mexico (including several introduced species), but material from other areas was examined for certain groups. An average of 2.5 specimens was selected per species, chosen where practical to provide geographic coverage. The age of specimens ranged from 1 to 60 years. All specimens of leafhoppers and spittlebugs, as well as many of the treehoppers, were identified, or their identification was confirmed, by KGA Hamilton. For other groups, identifications applied to the specimens by past workers were assumed to be correct. In some cases, narrow generic concepts were applied to assess the utility of COI barcodes in assigning generic names to species not represented in the data set (see Table 1). Names of genera and higher taxa of leafhoppers follow Oman [2]; higher classification of spittlebugs follows the Metcalf Catalog [37], while the generic names are based on the work of Cryan and Svenson [19] and Hamilton [38]. Names for other taxa follow the most recent checklists [3,4,39,40]. Authorship of generic and specific names is provided in these references.
Sequences, trace files, collection data, and specimen photographs are deposited in the Barcode of Life Data Systems (BOLD, http://www.boldsystems [41]) as public dataset DS-EMAUCH0 (dx.doi.org/10.5883/DS-EMAUCH0). Sequences are also available in GenBank (accession numbers KF919304-KF920463). A label was added to each specimen enabling its linkage with the corresponding record in BOLD.

COI Amplification, Sequencing and Analysis
DNA was extracted from a single leg (the left middle leg whenever possible) removed from each specimen. Extraction, PCR amplification and sequencing were performed at the Biodiversity Institute of Ontario (Guelph, Ontario, Canada) following procedures described in Hajibabaei et al. [42]. Primer names and sequences are given in Table 2. If the first pass with the primer pair LepF-t1/LepR failed, further reactions were performed using internal primers or cocktails of mixed primers.
Sequences were assembled and edited using CodonCode Aligner version 2.0.6 (CodonCode Co.). Pairwise divergences were calculated both as uncorrected values and with the Kimura two-parameter (K2P) model of base substitution [43], and several other substitution models were explored for various subsets of the data. Although K2P is not necessarily the best model to employ [44,45], values derived from this model are reported here for generic- and family-level summaries because they allow direct comparison with the existing COI barcode literature for Hemiptera [12][13][14][15][16][17][18].
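To make the two distance measures concrete, the following is a minimal Python sketch of uncorrected (p) and K2P distances for a pair of aligned sequences. It is an illustration, not the BOLD or MEGA implementation: the function names are hypothetical, and it assumes pairwise deletion (sites with a gap or an ambiguity code in either sequence are skipped), which may differ from the settings used in the study.

```python
import math

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
NUCLEOTIDES = PURINES | PYRIMIDINES


def site_differences(seq1, seq2):
    """Return (compared sites, transitions, transversions) for two aligned sequences.

    Positions where either sequence has a gap or an ambiguity code are skipped
    (pairwise deletion; an assumption of this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in NUCLEOTIDES or b not in NUCLEOTIDES:
            continue
        sites += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; purine<->pyrimidine changes are transversions
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    return sites, transitions, transversions


def p_distance(seq1, seq2):
    """Uncorrected proportion of differing sites."""
    sites, ts, tv = site_differences(seq1, seq2)
    return (ts + tv) / sites


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance: d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)."""
    sites, ts, tv = site_differences(seq1, seq2)
    P, Q = ts / sites, tv / sites
    # math.log raises ValueError if the sequences are too divergent (saturated)
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)


ref = "ACGT" * 25              # 100 bp toy sequence (hypothetical)
alt = "G" + ref[1:]            # one A->G transition
print(p_distance(ref, alt))    # 0.01
print(k2p_distance(ref, alt))  # about 0.0101
```

On this toy pair the K2P value is only slightly larger than the uncorrected one, which is the behavior expected at the low divergences typical of within-species comparisons.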
For the smaller divergence values encountered within species and among closely related species, the difference among models (including uncorrected distance) was small, often less than the reporting precision, and model choice did not affect the conclusions drawn. Both K2P and uncorrected values are given for species-level comparisons when they differ; when no model is specified, there was no difference at the reported precision. Summary statistics were calculated using the utilities available in BOLD. Additional analysis of base substitution rates and exploration of alternative substitution models were carried out using MEGA version 5 [46].
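Why model choice matters little at low divergence can be seen from a standard series expansion (offered here as supporting arithmetic, not as a derivation from the original analysis). Let P and Q be the proportions of sites differing by transitions and transversions, and p = P + Q the uncorrected distance. Expanding \ln(1-x) = -x - x^2/2 - \dots gives

$$
d_{\mathrm{K2P}} = -\tfrac{1}{2}\ln(1-2P-Q) - \tfrac{1}{4}\ln(1-2Q)
= P + Q + \tfrac{1}{4}(2P+Q)^2 + \tfrac{1}{2}Q^2 + O(p^3).
$$

For fixed p, the second-order excess over p is largest when all differences are transitions (Q = 0), where it equals p^2. At p = 1%, K2P therefore exceeds the uncorrected distance by only about 0.01 percentage points.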

Results
Summary statistics are given in Table 3. Results are reported separately for the following groups: Fulgoroidea, Cicadidae, Cercopoidea (Aphrophoridae, Cercopidae, Clastopteridae), Cicadellidae, and Membracidae plus Aetalionidae. Of the 1482 specimens analyzed, 1150 (77%, representing 471 identified species and 66 undetermined OTUs) produced sequences longer than 400 base pairs. Just over 75% of these (870 of the 1150) met the barcode standard (at least 500 base pairs, less than 1% ambiguous bases, and bidirectional sequence coverage [47]). One sequence with a single base pair deletion was considered a possible NUMT and was excluded. A second sequence was excluded as possible contamination, pending replication, because its nearest matches were to non-hemipterans. Several sequences with significant background signal from co-amplified products were also excluded. Sequences shorter than 400 base pairs were not included in subsequent analyses, but are available on BOLD (data set DS-EMAUCH1, dx.doi.org/10.5883/DS-EMAUCH1), as many of these records provide the only coverage currently available for the taxa in question.
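The two length thresholds and the barcode standard described above can be expressed as simple record filters. This Python sketch is hypothetical: the `SequenceRecord` structure and its forward/reverse flags are assumptions made for illustration and do not reflect BOLD's data model, and "ambiguous" is assumed to mean any IUPAC ambiguity code.

```python
from dataclasses import dataclass

# IUPAC nucleotide ambiguity codes (assumed definition of an "ambiguous" base)
AMBIGUITY_CODES = frozenset("NRYSWKMBDHV")


@dataclass
class SequenceRecord:
    """Hypothetical minimal record; not BOLD's actual data model."""
    sample_id: str
    sequence: str        # edited consensus sequence, gaps removed
    forward_read: bool   # a trace from a forward primer contributed
    reverse_read: bool   # a trace from a reverse primer contributed


def ambiguity_fraction(sequence):
    """Fraction of positions carrying an ambiguity code."""
    if not sequence:
        return 1.0
    return sum(base in AMBIGUITY_CODES for base in sequence.upper()) / len(sequence)


def meets_barcode_standard(rec, min_length=500, max_ambiguity=0.01):
    """At least 500 bp, under 1% ambiguous bases, and bidirectional coverage."""
    return (
        len(rec.sequence) >= min_length
        and ambiguity_fraction(rec.sequence) < max_ambiguity
        and rec.forward_read
        and rec.reverse_read
    )


def included_in_analysis(rec, min_length=400):
    """Only sequences longer than 400 bp entered the divergence analyses."""
    return len(rec.sequence) > min_length
```

With these filters a 420 bp single-direction read would enter the analyses but not meet the barcode standard, matching the distinction drawn between the 1150 analyzed sequences and the 870 standard-compliant ones.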
Although one specimen collected 60 years ago yielded a full sequence, the probability of obtaining results from dry specimens declined with age, dropping from 98% recovery in specimens analyzed within a decade of capture to 57% in specimens more than 50 years old.
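Recovery rates such as those above are proportions of successful sequences within age classes. A minimal Python sketch of that tabulation follows; the function name, the decade-wide classes, and the toy input are assumptions for illustration, not the study's data or binning.

```python
from collections import defaultdict


def recovery_by_age(specimens, bin_width=10):
    """Sequencing success rate per age class.

    specimens: iterable of (age_in_years, succeeded) pairs.
    Returns {lower bound of age class: fraction of specimens that yielded a sequence}.
    """
    attempted = defaultdict(int)
    recovered = defaultdict(int)
    for age, succeeded in specimens:
        age_class = (age // bin_width) * bin_width
        attempted[age_class] += 1
        recovered[age_class] += bool(succeeded)
    return {c: recovered[c] / attempted[c] for c in sorted(attempted)}


# Hypothetical toy input: (specimen age in years, sequence obtained?)
toy = [(3, True), (8, True), (52, True), (57, False)]
print(recovery_by_age(toy))  # {0: 1.0, 50: 0.5}
```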
The LepF-t1 primer was less effective than the other forward primers. As a result, many sequences covered only about 400 bp at the 3′ end of the barcode region (i.e., the region amplified from the internal forward primers MHemF or MLepF, which lie close to the primer 'Ron' or C1-J-1718 often used in phylogenetic studies, e.g., [19,29]). Because these truncated sequences are shorter, the proportional contribution of each base change to their divergence values is slightly inflated. In addition, analysis indicated that site changes were more frequent in this region: pairwise K2P distances in the 400 bp at the 3′ end of the barcode region averaged 1.08 times those for the full length of the same sequences.
In general, both mean (0.36%-0.64% K2P or 0.36%-0.63% uncorrected; Table 3) and maximum intraspecific divergences were low. For example, only 22 of the 304 species with more than one specimen had a maximum K2P divergence greater than 2% (Table 4). In contrast to these general patterns, in some groups between-species and within-species sequence variation at COI showed little or no discontinuity, i.e., the species lacked a barcode gap. Overall, the minimum nearest-neighbor distance was less than 1% in 10%-37% of the species (25% on average) in the various families (Table 3), and another 0%-13% (4% on average) of species showed only 1%-2% K2P sequence divergence. Table 5 lists the species groups with minimum pairwise K2P divergence of less than 2%. For genera represented by more than one species, the nearest neighbor (measured as the minimum pairwise K2P distance among specimens of each species) was usually a congeneric species, so members of a genus were cohesive. However, 47 exceptions were detected (Table 6); the membracid Heliria is particularly noteworthy, as four of its five species had nearest neighbors in another genus.
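The nearest-neighbor, barcode-gap and genus-cohesion checks described above can be sketched from a table of pairwise specimen distances (for example K2P values). This Python sketch is illustrative only: the function name and input layout are hypothetical, the genus is assumed to be the first word of each species name, and BOLD's own utilities may compute these summaries differently.

```python
from itertools import combinations


def nearest_neighbor_summary(labels, dist):
    """Per-species maximum intraspecific distance and nearest-neighbor distance.

    labels: {specimen_id: "Genus species"}
    dist:   {frozenset({id_a, id_b}): pairwise distance} for every specimen pair
    """
    max_intra = {sp: 0.0 for sp in labels.values()}
    nearest = {}  # species -> (minimum distance to another species, that species)
    for a, b in combinations(labels, 2):
        d = dist[frozenset((a, b))]
        sp_a, sp_b = labels[a], labels[b]
        if sp_a == sp_b:
            max_intra[sp_a] = max(max_intra[sp_a], d)
            continue
        for sp, other in ((sp_a, sp_b), (sp_b, sp_a)):
            if sp not in nearest or d < nearest[sp][0]:
                nearest[sp] = (d, other)

    summary = {}
    for sp, (nn_dist, nn_species) in nearest.items():
        summary[sp] = {
            "max_intra": max_intra[sp],
            "nn_dist": nn_dist,
            "nn_species": nn_species,
            # barcode gap: nearest heterospecific specimen is farther than any conspecific
            "barcode_gap": nn_dist > max_intra[sp],
            # genus taken from the first word of the name (assumption of this sketch)
            "nn_congeneric": sp.split()[0] == nn_species.split()[0],
        }
    return summary
```

Species with `nn_dist` below 0.01 correspond to the "less than 1%" category above, and entries with `nn_congeneric` set to False are the kind of exception listed for genera such as Heliria.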

Discussion
Previous studies have provided DNA barcodes for only a few species of Auchenorrhyncha. Seabra et al. [36] found that COI sequences clearly discriminated European members of the genus Philaenus (Cercopoidea), while Le Roux & Rubinoff [35] used COI barcode sequences to identify the geographic origin of populations adventive to Hawaii in the leafhopper genus Macrosteles. Kamitani [18] provided DNA barcodes for 45 Japanese species of Cicadellidae. Although cercopids and membracids are well-represented as a result of broad phylogenetic analyses [19,20], this study provides the first large-scale data release of COI barcode records for the suborder, with representation of about 8% of the known species of Auchenorrhyncha found in Canada and the United States. In fact, 8 of the 13 species of Fulgoroidea and 46% of the species of Cercopoidea present in this region were barcoded.
The mean intraspecific sequence divergence was less than 0.7% for the species examined in this study, but future work may increase this value because sample sizes are small and geographic coverage is limited for most species. These values are similar to those reported for Heteroptera (0.74% intraspecific, 10.67% interspecific [16]), but greater than those for aphids (0.24% intraspecific, 7.25% interspecific [12]). Among the 45 Japanese species treated by Kamitani [18], all had distinct barcode sequences. By comparison, we found that 24% of species showed less than 1% minimum sequence divergence from their nearest neighbor. For example, 11 closely related species in the leafhopper genus Gyponana had pairwise interspecific distances ranging from 0 to 2.46%, with a mean between-species divergence of only 1.38%. Despite their close sequence similarity, these species are distinguished morphologically by the form of the male genitalia, wing venation, and colour patterns, and exhibit biological differences [48]. These taxa likely represent instances of recent speciation through host specialization or geographic separation of populations with low vagility. The three species in the spittlebug genus Philaenarcys provide a similar example: they differ in morphology, male genitalia and host plants [49], but form only two sequence clusters, with specimens of P. killa in both. Various factors, including the retention of ancestral COI polymorphisms or mitochondrial introgression, may explain these situations, but in some cases they may also reflect the adoption of subtle inter-population differences as criteria for species delineation. Thus a reference library of DNA barcodes can motivate re-evaluation of the significance of morphological character differences among populations and species.
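A range and mean of between-species divergence within a species group, like the Gyponana figures above, can be summarized as follows. This Python sketch assumes the mean is taken over all specimen pairs belonging to different species (the text does not state whether specimen pairs or species pairs were averaged); the function name and toy values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def between_species_range(labels, dist):
    """(min, mean, max) of distances over all specimen pairs from different species.

    labels: {specimen_id: species name}
    dist:   {frozenset({id_a, id_b}): pairwise distance}
    """
    between = [
        dist[frozenset((a, b))]
        for a, b in combinations(labels, 2)
        if labels[a] != labels[b]  # skip conspecific pairs
    ]
    return min(between), mean(between), max(between)


# Hypothetical toy group: two specimens of species A, one of species B
labels = {"a1": "sp A", "a2": "sp A", "b1": "sp B"}
dist = {
    frozenset(("a1", "a2")): 0.01,
    frozenset(("a1", "b1")): 0.02,
    frozenset(("a2", "b1")): 0.03,
}
print(between_species_range(labels, dist))
```

A minimum of zero in such a summary, as reported for Gyponana, means at least one pair of specimens from different species shares an identical barcode.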
Pronotal shape has been an important character in defining some genera of Membracidae, such as Telamona. However, species of Telamona have barcodes similar to those of species of Archasia, Carynota, Glossonotus and Heliria. In fact, barcodes for specimens assigned to Glossonotus acuminatus, Heliria cristata and Telamona concava differ by less than 1%, a pattern which supports previous indications by morphological studies [50] that these genera need revision.
We detected a few examples of deep sequence divergence among specimens assigned to a single species (Table 4), suggesting the possibility of unrecognized cryptic species. For example, three specimens of Carsonus furcatus, all from Washington State, differ by up to 8.26%. Similarly, Mexican specimens currently assigned to Cephisus variolosus show divergences of up to 7.5%. This species exhibits variation in shape and coloration, and Hamilton [51] has already suggested that it represents a complex of taxa. In contrast, two specimens of Pagaronia minor, a recently introduced Japanese species, diverged by about 4% even though they were collected together. Sequences for these specimens are most similar to each other and distinct from those of any other species in the data set, so contamination is an unlikely explanation.
In general, broad geographic sampling increases observed intraspecific variation and consequently decreases minimum interspecific distances [52,53]. The magnitude of this effect varies among taxa. All of the species treated in this barcode reference library are represented by few specimens and require further sampling across their geographic ranges. These expanded data may further increase the already rather high incidence of low interspecific COI divergence among the Auchenorrhyncha. However, many species have quite restricted distributions, and additional sampling may reveal that, for at least some of the species pairs with low divergence, intraspecific variation is limited and barcodes are truly diagnostic.
Further work is also needed on many species groups to provide a more strongly validated taxonomic framework for interpreting COI sequence variation. Barcodes from type specimens would be especially valuable for correctly anchoring names to barcode clusters. Nevertheless, the utility of the method as a tool for identification and species discovery was evident during this study, as discrepancies in barcode sequences flagged errors in the original species identifications. On morphological re-examination, the original determinations for 98 specimens proved to be misidentifications, and some of these specimens represent undescribed species. More than 3000 additional specimens from the CNC have now been sequenced, and validation of these records and identification of their source specimens are in progress.