Towards a Global Barcode Library for Lymantria (Lepidoptera: Lymantriinae) Tussock Moths of Biosecurity Concern

Background Detecting and controlling the movements of invasive species, such as insect pests, relies upon rapid and accurate species identification in order to initiate containment procedures by the appropriate authorities. Many species in the tussock moth genus Lymantria are significant forestry pests, including the gypsy moth Lymantria dispar L., and consequently have been a focus for the development of molecular diagnostic tools to assist in identifying species and source populations. In this study we expand the taxonomic and geographic coverage of the DNA barcode reference library, and further test the utility of this diagnostic method, both for species/subspecies assignment and for determination of geographic provenance of populations. Methodology/Principal Findings Cytochrome oxidase I (COI) barcodes were obtained from 518 individuals and 36 species of Lymantria, including sequences assembled and generated from previous studies, vouchered material in public collections, and intercepted specimens obtained from surveillance programs in Canada. A maximum likelihood tree was constructed, revealing high bootstrap support for 90% of species clusters. Bayesian species assignment was also tested, and resulted in correct assignment to species and subspecies in all instances. The performance of barcoding was also compared against the commonly employed NB restriction digest system (also based on COI); while the latter is informative for discriminating gypsy moth subspecies, COI barcode sequences provide greater resolution and generality by encompassing a greater number of haplotypes across all Lymantria species, none shared between species. Conclusions/Significance This study demonstrates the efficacy of DNA barcodes for diagnosing species of Lymantria and reinforces the view that the approach is an under-utilized resource with substantial potential for biosecurity and surveillance. Biomonitoring agencies currently employing the NB restriction digest system would gather more information by transitioning to the use of DNA barcoding, a change which could be made relatively seamlessly as the same gene region underlies both protocols.


Introduction
Non-indigenous insects and the pathogens they harbour have an overwhelming ecological, socio-economic, and evolutionary impact on the forest ecosystems they invade [1][2][3][4][5][6]. The rapid initiation of containment and eradication programs is imperative to prevent the establishment and spread of adventive populations or individuals, which in turn relies on early detection and accurate identification of non-indigenous species as they enter a new region [7]. Surveillance and monitoring at or near sites considered high risk (e.g. ports, airports, cargo facilities) serve this function, but current practices have proven insufficient, as evidenced by the alarming increase in non-indigenous insect establishments in recent decades [8]. Species identification in particular is a vulnerable step, as suggested by studies documenting the identification rate of port interceptions (e.g. 40-100% incomplete identifications across families of Coleoptera [9]).
To aid in the inherently difficult task of species identification for biosecurity, several molecular approaches are described in the recent standard released by the International Plant Protection Convention on the diagnostic protocols for regulated pests [10]. In contrast to the many ad-hoc protocols recognized in this standard (e.g. allozymes, restriction/amplified fragment length polymorphism analysis, microsatellites-techniques are reviewed in [11]), DNA barcoding provides a standardized and generic platform while still meeting and exceeding the rigorous standards of data quality and transparency (reviewed by [12]). Several studies demonstrate the efficacy of the approach as applied to invasive species detection and determination of native provenance, including work on leeches [13], agromyzid leafminers [14], tephritid fruit flies [15,16], ants [17], siricid wasps [18], true bugs [19], the cactus moth [20], the European poplar shoot borer [21], and nocturnal moths [22]. The integration of DNA barcoding into national biosurveillance programs has been protracted, but acceptance by certain agencies is apparent (e.g. see LBAM ID, which incorporates barcoding into the diagnostics of light brown apple moth and related species in California [23]. The tussock moth genus Lymantria L. (Lepidoptera: Noctuoidea: Lymantriinae) has been the focus of numerous investigations of molecular diagnostic tools, predominantly on Lymantria dispar L. to separate the North American, European and Asian gypsy moth populations (e.g. [24][25][26][27][28][29][30][31][32]). The intensity of research effort is certainly warranted -few groups of insects rival the tussock moths in number of undesirable, potentially invasive species and the destruction they wreak on both native and non-native forests. This invasive potential is the product of several undesirable life history traits including long overwintering period in the egg stage, polyphagy in the larval stage, attraction of adults to artificial light sources (e.g. of ports and cargo vessels), and oviposition by females on inanimate surfaces [32][33][34]. Invasive species identification for this group is complicated by the propensity for immatures to be the stage of translocation, providing few characters for morphological diagnosis. Furthermore, when the adults are targeted, as is the case with traps employing synthetic sex pheromone attractants, lures may inadvertently attract more then one species (e.g. [35]), and individuals may be severely damaged by the trapping mechanism (e.g. sticky traps; see Figure 1).
Two previous studies [15,36] initiated a DNA barcode library for Lymantria (and other tussock moth genera) and results suggested potential for biosurveillance and species identification. The present study builds upon this solid foundation, to further populate the library with additional species, test that its utility remains with comprehensive sampling within species, and employ the reference library to assign intercepted specimens from quarantine and monitoring programs in British Columbia, Canada. In addition, we compare the DNA barcoding approach to a diagnostic assay based on the same gene region, herein called the 'NB system', routinely-used in surveillance programs to assess the gross geographic provenance of Lymantria dispar [24,26], and suggest a simple transition for monitoring programs to adopt the barcoding approach with its improved reliability and resolution.

Materials and Methods
An effort was made to expand the taxonomic and geographic coverage of the DNA barcode library for the tussock moth genus Lymantria, with particular attention to those at high risk of being transported in cargo or on vessels to new habitats where they could become invasive pests, recently reviewed by Pogue and Schaefer [34]. We followed the taxonomy of [34] and [37], with the exceptions of L. nebulosa Wileman and L. subpallida Okano, which are considered here as species (see Appendix S1 for taxonomic treatment), and identifications were determined using these monographs. Samples and sequences were obtained from four sources. Firstly, sequences were obtained from several previous studies that analyzed all or part of the barcode region: Armstrong & Ball [15] (N = 76), Ball & Armstrong [36] (5), Bogdanowicz et al. [26] (74), Hebert et al. [38] (12), and Yamaguchi et al. unpublished (8). Secondly, 146 samples were analysed from vouchered material in publicly accessible collections, namely the Agricultural Scientific Collections Trust (ASCT, New South Wales, Australia), Biodiversity Institute of Ontario (Guelph, Canada), Pacific Forestry Centre (Victoria, Canada), Smithsonian National Museum of Natural History (Washington D.C., United States), and University of Maryland Alcohol Tubes of Lepidoptera collection (College Park). Thirdly, 165 specimens were analyzed from the USDA Forest Service Northern Research Station (Hamden, United States) which included representatives from all 46 populations of Lymantria dispar analyzed in Keena et al. [32]. Finally, 32 specimens were acquired from surveillance programs in British Columbia, Canada. Thirty of these specimens were collected in sticky traps baited with L. dispar pheromone ( Figure 1) around the province during 2006 and 2009 and two individuals were reared from egg masses removed from a Russian cargo vessel in the port of Vancouver in 1992.
DNA extraction, amplification, and sequencing of the barcode region of the mitochondrial cytochrome oxidase I (COI) gene followed standard high-throughput DNA barcoding methods previously described [38,40]. The primers LepF1 and LepR1 [41] were used in most instances, but amplification and sequencing using the 'Lep mini primers' (MLepF1, MLepR1) [42] was necessary for a few older specimens. Molecular work performed in Australia used the COI primers and amplification methods described in Cho et al. [43]. All sequences are publicly available from the Barcode of Life Data Systems (BOLD) (www.boldsystems. org; [44]; project 'Global Lymantria' [GLYMA]) and GenBank (see Table S1 for accession numbers). The 658 base pair (bp) barcode region encompasses 360 of the 378 bp amplicon and the two restriction enzyme sites of the 'NB system' of Bogdanowicz et al. [24] that is routinely used for gypsy moth diagnostics ( Figure 2). The four possible haplotypes in this system are N+B+, N2B+, N+B2, and N2B2, where presence (+) or absence (2) is indicated for the NlaIII and BamHI restriction sites (Table 1).
To explore the utility of the barcode region for species diagnosis in Lymantria, a maximum likelihood tree was constructed in Garli 1.0 [45], using the best-fit model as determined by ModelTest 3.7 [46] under the Akaike Information Criterion. The general-timereversible substitution model was chosen, with among-site-rateheterogeneity modeled according to a gamma distribution, and an estimated proportion of DNA sites invariant. Analysis used default settings with Orgyia antiqua (BOLD ProcessID XAG647-05, GenBank Accession GU091296) included as an outgroup; node supports were estimated using 100 bootstrap replicates. The levels of genetic variation within and between species were calculated in MEGA 4 [47] using pairwise Kimura 2-parameter (K2P) distances [48] and the pairwise deletion option. The number of morphologically defined species that could successfully be differentiated by DNA barcoding were tallied to determine an overall success rate. Successful differentiation of a species required that its barcodes formed monophyletic clusters and were not shared with other species; in the case of species with a single representative, only the latter criterion was necessary.
To investigate subspecies delimitation in Lymantria dispar using the COI gene, we parsed the dataset to the 244 sequences .600 bp. Removing the significant sequence length variation permitted the use the more rapid neighbour-joining (NJ) method. The NJ tree was constructed in MEGA 4 with K2P distances and the pairwise deletion option (and resulted in a near-identical topology to an ML analysis; data not shown). Specimens were classified to subspecies based on the distribution limits provided by Pogue and Schaefer [34] where sufficient locality data were available.
For both the species and subspecies level, Bayesian assignment tests were performed using the 'segregating sites' algorithm of Abdo and Golding [49] (see also [48]). For all species with three or more individuals, one randomly-chosen COI sequence was removed from the dataset and used as the query sequence for the test, as in Taveres and Baker [51]. This procedure was subsequently repeated for all haplotypes of the 32 surveillance specimens. The number of segregating sites, the posterior probability of correct assignment, and the risk of mis-assignment were calculated with the program Assigner (available at http:// info.mcmaster.ca/TheAssigner/; [49]). The assignment of species and subspecies (and potential for determining geographic provenance and female flight ability) with the COI barcode region was compared against the commonly employed 'NB system' [24]. Results COI barcodes were obtained from 518 individuals and 36 species of Lymantria, originating from 35 countries (Table S1). Sixteen species were represented by a single sample sequence while the remaining species ranged from 2 to 308 sequences (mean excluding L. dispar = 5.8 sequences/species). The average sequence length was 597 bp despite several shorter sequences obtained from previous studies (e.g. sequences from [26] are 375-378 bp). Four species (L. singapura, L. semperi, L. todara, and L. subrosea) had sequences shorter than half the barcode region due to specimen age and/or preservation. There was no evidence of insertions, deletions or stop codons detected, suggesting pseudogenes were absent from the dataset.
The ML tree for the 36 species revealed long branches between the terminal clusters of closely related haplotypes (Figure 3, Figure  S1). In only one case did a terminal cluster not correspond to known species limits: L. sinica (n = 3) was paraphyletic with respect to the single specimen of L. nebulosa, with the deep split in L. sinica equalling 2.9% K2P sequence divergence (LYMAN070-08, locality: China split from LYMAN065-08 & LYMAN066-08, locality: Taiwan). Eighteen of the 20 species clusters with multiple individuals had high bootstrap support (.80%); the remaining two species, L. umbrosa and L. dispar, considered conspecific until recently [34] and respond to the same pheromone lure [35], had bootstrap support of 69% and ,50%, respectively. The mean K2P sequence divergence between species (x = 14.02%; SD = 3.47%; range = 2.68-37.61%) was approximately 21-fold higher than within species variation (0.66%; SD = 0.50%; range = 0-3.38%) and formed mostly disjunct distributions (Figure 4). Only 157 of 50,390 intraspecific comparisons and 2 of 630 interspecific comparisons fell within a small range of overlap, between 2 and 4%. The former cases involved L. mathura, L. sinica, L. obfuscata, and L. dispar (only those comparisons involving one or two short GenBank sequences) and the latter involved the pairs L. dispar/umbrosa and L. xylina/schaeferi. Overall, DNA barcoding was able to successfully differentiate 35 of 36 (97.2%) morphologically defined species.
The NJ tree of COI sequence divergences in L. dispar indicated a clear delineation between the L. d. dispar subspecies and the two Asian subspecies asiatica and japonica ( Figure 5). The tree also revealed all North American L. d. dispar individuals clustered together along with two sequences from France (LYMMK049-09, LYMMK054-09). The 32 specimens caught in surveillance programs in British Columbia, Canada comprised four COI haplotypes. Two of these haplotypes clustered within the North American L. d. dispar, while the other two clustered in the asiatica/ japonica group-in all 32 cases, there is perfect correspondence with the subspecies designations deduced from morphology and the origin/pathway of introduction.
The tests of assignment in a Bayesian statistical framework resulted in correct assignment to species and subspecies in all instances ( Table 2). The posterior probabilities of assignment for most species were high, and the risks of mis-assignment were low. The assignment of surveillance specimens to L. dispar and their respective subspecies displayed substantially lower posterior probabilities and higher risks, likely a product of the close association of L. dispar with L. umbrosa, as well as the nonmonophyly of L. d. asiatica and japonica.
The haplotypes of the 'NB system' were inferred based on the COI sequence present at the two restriction sites (Table S1) rather than by restriction endonuclease digests. As in Keena et al. [32], the three haplotypes N+B+, N2B2 and N+B2 were present in L. dispar. All 3 haplotypes were also found in at least one other species-one species possessed N+B+, five species with N2B2, and 16 species scored as N+B2. Currently it is unclear exactly which species will generate a COI amplicon with the primers of Bogdanowicz et al. [24,27], but previous work has successfully assayed L. albescens (L. dispar albescens in [26]), L. umbrosa (L. dispar hokkaidoensis in [27]), and L. monacha (LMH, unpublished data of male from Russian Far East trapped in greater Vancouver, Canada in 1992). This suggests that misidentification of species prior to the 'NB system' assay could have serious ramifications. For example, the heavily regulated nun moth (L. monacha), if captured in North America and inadvertently assayed as L. dispar, would be determined to possess the N+B2 haplotype (Table S1), and mistakenly diagnosed as European gypsy moth, L. dispar dispar. In respect to tracing geographic origins, the 3 NB haplotype states remain mostly informative as originally characterized [24], but with limited resolution and outliers that could also be misleading (e.g. BOGDA004-08, from Ontario, Canada has the N+B2 haplotype characteristic of Europe).
In contrast, the COI sequence data consisted of 91 haplotypes within L. dispar and 142 haplotypes across all species, none of which were shared between species. Moreover, the haplotype data recovered clusters of L. d. dispar, North American L. d. dispar, and L. d. asiatica/japonica. The presence of multiple barcode-determined haplotypes in several regions (e.g. 6 haplotypes from 7 individuals in the Indre-et-Loire department of France; 5 haplotypes from 9 individuals from the Ticino Canton in Switzerland) suggests the potential for determining the native provenance (and associated traits such as female flight ability) by analyzing haplotype frequencies across the native and introduced range (e.g. [20,52,19]). Such analyses may be compromised if humanassociated movement has altered haplotype frequencies in the native populations, but this potential exists for all methods tracing source populations. Although the sampling density in the present study was not adequate for these analyses, the presence of two haplotypes from France which clustered within the North American L. d. dispar is consistent with the well-documented introduction of the species in 19 th century Massachusetts by Leopold Trouvelot [53].

Discussion
This study has clearly demonstrated the efficacy of DNA barcodes for diagnosing species of Lymantria and has reinforced the view that the approach has substantial potential for biosecurity and surveillance [12]. The minimal overlap between mean intraand interspecific variation resulted in well-supported, cohesive clusters of sequences corresponding with named species. In only one case-the paraphyly of L. sinica relative to L. nebulosa-did the COI barcodes fail to correctly distinguish a species, providing a success rate of 97.2% among 36 morphologically defined Lymantria species. This high success rate is comparable to three recent barcode studies of Lepidoptera: 100% success observed in a temperate, local study of 190 species [22]; 97.9% in a tropical, regional study of 521 species [42]; and 99.3% in a temperate, continental study of 1,327 species [38].
It remains possible that the deep split in L. sinica is genuine and reflects the allopatric divergence between the nominal species and a misidentified or undescribed species, or that L. sinica has maintained a deep COI polymorphism following its divergence from L. nebulosa. In any case, further investigation is warranted. Interestingly, a deep COI divergence revealed within L. mathura in the study of Ball and Armstrong [36] pointed to a possible cryptic species L. flavida by Pogue and Schaefer [34], explaining previously mixed responses to a synthetic pheromone lure. Likewise, our preliminary COI data identified a deep divergence within L. mathura, which led us to revise the status of L. subpallida (previously synonymized by [37]; see Appendix S1).
In addition to these instances of deep divergence, there are two pairs of species which show ,2% sequence divergence. L. dispar and L. umbrosa were recognized as separate species by Pogue and Schaefer [34], contra Schintlmeister [37], based in large part on the DNA barcoding study of Ball and Armstrong [36], and their distributions, with L. umbrosa confined to the northern part of Hokkaido, Japan. L. xylina and L. schaeferi also show ,2% sequence divergence, and again recent taxonomic studies disagree on their species status: [34] treated L. xylina from mainland China, Taiwan and Japan as conspecific but [37] described the Chinese populations as a new species, L. schaeferi. Our data confirms that these taxa are closely related, but given that we have sampled just a single specimen of the latter species, more data are needed to resolve this question. All these cases illustrate that DNA barcoding provides secondary benefits to biosecurity applications-taxonomic insights and implications for detection and monitoring in the field. It also validates a fundamental practice in barcoding projects, the retention of voucher specimens [12], to facilitate future taxonomic examination and to avoid the proliferation of incorrect identifications or 'error cascades' [54].
The positive results of the species and sub-species assignment tests, calculated for both reference and surveillance specimens, demonstrates the future potential for the application of DNA barcoding to biosecurity. An actively curated and validated reference library is queried with the DNA barcode obtained from a surveillance or quarantine specimen; within a robust statistical framework, species assignments are calculated along with measures of confidence; where sufficient coverage exists, additional assignments are computed (e.g. subspecies identification, native provenance, characteristics related to invasiveness), which can be corroborated with additional markers; voucher specimens are retained to facilitate morphological confirmation and future taxonomic enquiry. While the scaffold is in place for this at the present time (i.e. BOLD [44]), further benchmarking of statistical assignment programs is necessary (e.g. [50]), as is the continued construction and maintenance of reference barcode libraries. For the latter, participation by multiple national biosecurity research and monitoring programs is imperative.
To this end, our comparison of the commonly used 'NB system' [24] with the COI barcode approach clearly indicates that higher resolution and generality would be achieved by Lymantria monitoring programs should they adopt DNA barcoding. We have demonstrated instances when the 'NB system' would provide misleading results that could have serious consequences, whereas COI barcodes provide reliable subspecies and broad geographic   information. Moreover, recent studies (e.g. [20,52,19]) indicate that higher sampling density should allow more precise tracing of source populations, which in turn will predict female flight ability, the key trait of interest [32]. Because the same COI gene region underlies both approaches, the transition for monitoring agencies could be seamless. Instead of amplifying the 378 bp fragment of Bogdanowicz et al. [24], the 658 bp barcode region could be amplified with the near-universal LepF1/LepR1 primers [41]. Assays of the NlaIII and BamHI enzyme digests could continue to be performed, producing slightly different banding patterns ( Figure 6), but providing a rapid initial assessment as with the 'NB system'. The amplicon could then be sequenced for definitive species and broad geographic information. The increased sampling density would continually develop a baseline from which progressively finer levels of information could be gleaned, then confirmed by other marker systems (e.g. FS1 [28] or microsatellites [24]) when necessary.

Conclusion
In this study we have shown that DNA barcoding is an effective tool for discriminating species of Lymantria. Furthermore, our comparison of this system against the NB restriction digest system-commonly employed at present by biomonitoring agencies for this purpose-suggests that the informational value provided by barcoding (species resolution, general applicability) is significantly greater. DNA barcoding also has the potential to provide novel taxonomic insights, facilitated by its rigorous standards of record-keeping and permanent archiving of voucher specimens. As the DNA barcode reference library continues to increase in coverage, both for Lymantria and other taxa of international regulatory concern, its utility as part of a diagnostic/monitoring system will continue to expand.

Supporting Information
Appendix S1 Taxonomic treatment of Lymantria nebulosa Wileman, revised status and Lymantria subpallida Okano, revised status.
Found at: doi:10.1371/journal.pone.0014280.s001 (9.23 MB PDF) Figure S1 Complete maximum likelihood tree for 36 species of Lymantria. Full tree from Figure 3 displaying haplotype variation within species clusters. BOLD process IDs and collection localities are provided for each sequence. Tree was rooted with the outgroup Orgyia antiqua. Found at: doi:10.1371/journal.pone.0014280.s002 (0.02 MB PDF) Table S1 Taxonomic samples used in study, including Barcode of Life Database (BOLD) and GenBank accessions. Also listed is the presence (+) or absence (-) of the NlaIII (N) and BamHI (B) restriction enzyme sites of the 'NB system' [24], determined from the sequence data (i.e. not by performing the assay). Found at: doi:10.1371/journal.pone.0014280.s003 (0.16 MB PDF)