Calibrating Snakehead Diversity with DNA Barcodes: Expanding Taxonomic Coverage to Enable Identification of Potential and Established Invasive Species

Detecting and documenting the occurrence of invasive species outside their native range requires tools to support their identification. This can be challenging for taxa with diverse life stages and/or problematic or unresolved morphological taxonomies. DNA barcoding provides a potent method for identifying invasive species, as it allows for species identification at all life stages, including fragmentary remains. It also provides an efficient interim taxonomic framework for quantifying cryptic genetic diversity by parsing barcode sequences into discontinuous haplogroup clusters (typical of reproductively isolated species) and labelling them with unique alphanumeric identifiers. Snakehead fishes are a diverse group of opportunistic predators endemic to Asia and Africa that may potentially pose significant threats as aquatic invasive species. At least three snakehead species (Channa argus, C. maculata, and C. marulius) are thought to have entered North America through the aquarium and live-food fish markets, and have established populations, yet their origins remain unclear. The objectives of this study were to assemble a library of DNA barcode sequences derived from expert identified reference specimens in order to determine the identity and aid invasion pathway analysis of the non-indigenous species found in North America using DNA barcodes. Sequences were obtained from 121 tissue samples representing 25 species and combined with public records from GenBank for a total of 36 putative species, which then partitioned into 49 discrete haplogroups. Multiple divergent clusters were observed within C. gachua, C. marulius, C. punctata and C. striata suggesting the potential presence of cryptic species diversity within these lineages. Our findings demonstrate that DNA barcoding is a valuable tool for species identification in challenging and under-studied taxonomic groups such as snakeheads, and provides a useful framework for inferring invasion pathway analysis.


Introduction
Invasive species are a leading cause of decline and extinction of native fishes globally [1]. One group of potential aquatic invaders are snakehead fishes (family Channidae), which have generated concern due to their wide-ranging diet, parental care, and successful establishment [2]. Snakehead species inhabit freshwater ecosystems and are divided into two geographically isolated genera. The genus Parachanna consists of three nominal species that are native to Africa, whereas Channa is represented by at least 30 species native to Asia [3] (Table 1). Three snakehead species have established populations in the United States due to humanmediated introductions [2]: the blotched snakehead, (C. maculata) became established in Hawaii prior to 1900, while the bullseye snakehead (C. marulius) has been established in Florida since 2000 [4]. Although formal risk assessment has identified habitat suitability for both species in the southern United States and parts of Mexico [5], there is very little information available on their potential invasiveness and/or ecological effects. The northern snakehead (C. argus) is of far more concern as a potential North American invader. Within the last decade, the northern snakehead (C. argus) has established multiple populations in the eastern United States [4]. Their rapid colonization and spread, as well as the species' tolerance for colder temperatures have identified C. argus as being of significant concern [2,5,6]. Correctly identifying snakehead species is therefore necessary, as they vary in their ecological requirements and potential invasive ability [5,7].
Despite the attention snakeheads have received, there are substantial difficulties for accurate species identification [8]. Existing taxonomic keys are limited to local geographic regions, and there is no comprehensive morphological key for the Channidae. At present, the most comprehensive listing of snakehead species is that of Courtenay and Williams [2], which provides summary species accounts but no keys for their identification. Moreover, several ''species'' currently circumscribed are thought to represent species complexes. While this issue has received recent attention with a number of new species descriptions [9][10][11], a clear picture of snakehead diversity remains elusive. Genetic calibration of snakehead diversity and interspecies differences could significantly aid taxonomic resolution within the group. Orell and Weight [12] identified seven distinct and locally restricted mitochondrial (mtDNA) control region haplotypes among established populations of C. argus in eastern North American waters, suggesting multiple independent introductions from different maternal sources. This was supported by King and Johnson [13], who similarly concluded that there were multiple introductions, based on microsatellite analyses. Lakra et al. [14] sequenced 16S and cytochrome c oxidase subunit I (COI) mtDNA of eight Indian snakehead species, while Bhat et al. [15] used Random Amplified Polymorphic DNA (RAPDs) in seven of the same species; both studies aimed to test the utility of the respective approaches to discriminate species. The most comprehensive coverage of snakehead diversity was assessed in a phylogenetic study by Li et al. [16], which sequenced individuals from 20 species and focused primarily on the NADH 1 and 2 mitochondrial genes (ND-1 and ND-2). Despite the substantial molecular studies on snakehead fishes, the lack of directly comparable sequence data across species between studies, limits the utility of this body of data from the literature for resolving taxonomic boundaries or identifying non-natives using a molecular approach.
As a tool for species identification and discovery, DNA barcoding uses a standardized ,650 base-pair segment of the mitochondrial 59 COI gene region to map animal diversity and identify cryptic species [17]. The Fish Barcode of Life (FISH-BOL; [18]) campaign was launched to create a barcode reference sequence library for all fishes in order to facilitate their identification at all life stages and to expand knowledge of their geographic distributions and varied life histories. Barcoding has been successfully applied to both freshwater and marine fishes on continental scales e.g. [19][20][21]. It can enhance the accuracy of species identifications e.g. [22,23] and aid in cryptic species detection [24]. Barcode records for snakeheads are beginning to appear in the literature due to a number of regionally [14,25,26] and taxonomically [27] focused efforts, but many species have yet to be characterized. The major objectives of this study were to extend the library of DNA barcode sequences derived from expert-identified reference specimens and to assess the utility of barcoding for elucidating the identity of non-native snakeheads and their entry pathways. A total of 121 snakehead specimens were sequenced for the mitochondrial 59 COI barcoding region. Specimens were sourced from various institutions worldwide, including expert-identified reference specimens derived from within their native ranges, as well as those obtained from outside the known range of snakeheads (e.g. established invaders, from the aquarium trade, or from food markets). Voucher specimen information and digital images (where applicable) were deposited in the Barcode Of Life Database (BOLD website. Available: http://www.boldsystems.org. Accessed 2014 May 21. [28]) following recommendations of the FISH-BOL collaborators protocol [29]. Vouchers were retained for all but two specimens (NRSC040-11 and NRSC042-11). A ''reference'' sequence in this study is defined as one that was obtained from a native range or a sequence that was imported from GenBank (GenBank website. Available: www.ncbi.nlm.nih.gov. Accessed 2014 May 21). All pertinent specimen information is accessible through the BOLD project DSCHA 'Family Channidae' (BOLD website. Available: http://www.boldsystems.org. Accessed 2014 May 21) or the DOI for data set: (DOI website. Available: dx.doi.org/10.5883/DS-DSCHA. Accessed 2014 May 21).

Ethics Statement
DNA was extracted using a Qiagen DNeasy Blood & Tissue Kit (QIAGEN) following the manufacturer's instructions with some exceptions: after adding AW2, spin columns were dried through a final centrifugation at 17,0006g for 5 minutes; sample DNA was eluted with 50 mL of AE buffer and centrifuged at 6,0006g for 1 minute, and the same 50 mL of AE buffer was then re-eluted with a final centrifugation at 6,0006g for 1 minute in order to increase the DNA concentration. Each 12.5 mL PCR reaction consisted of 2 mL of template DNA, 6 Each sequencing reaction consisted of 1 mL of PCR product along with 1 mL BIG DYE 3.1 reagent (Applied Biosystems, Inc), 1 mL M13F/M13R primer [31], 10 mL ddH 2 O and 1 mL 5X sequencing buffer (Invitrogen). The thermocycling profile was an initial hot start 96uC for 2 min, followed by 30 cycles of [denaturation at 96uC for 30 s, annealing at 55uC for 15 s, and an extension at 60uC for 4 min]. PCR products were bidirectionally sequenced and run on an ABI 3730 capillary sequencer (Applied Biosystems). Sequencher 4.05 (GeneCodes) was used to trim primers, assemble and manually edit bidirectional contigs from raw electropherogram ''trace'' files.
Sequence contigs (and their supporting trace files) were uploaded to BOLD [28], and combined with other published sequences from GenBank [32]. Sequences were aligned using a Hidden Markov Model alignment of translated COI amino acid sequences [28]. Aligned sequences were used to generate pairwise or p-distances [33] to infer a neighbour-joining phenogram of sequence divergences using MEGA 5 [34] to provide a visual depiction of the barcode variation among and between species, with bootstrap analysis (based on 500 replications). Sequence data were also parsed into molecular operational taxonomic units (MOTUs) using the RESL (Refined Single Linkage Analysis) algorithm and subsequently annotated with Barcode Index Numbers (BINs), as implemented on version 3 of BOLD [28]. This approach combines single linkage clustering and Markov clustering to recognize gaps in sequence space that correlate with species boundaries by optimizing MOTU partitions using the Silhouette index and uniquely labelling each MOTU with a Barcode Index Number (detailed in [35]). Concordance was assessed between BINs and specimens that were morphologically identified to species by characterizing the discordance (or lack thereof) between morphological species identifications and BIN clusters into one of four categories (MATCH, SPLIT, MERGE or MIXTURE). When members of a single species clustered within a single discrete BIN they were considered to MATCH; when they clustered into multiple BINs unique to that species they were SPLIT (e.g. revealing cryptic genetic diversity); a species placed in a single BIN together with individuals of another species found only in that BIN constituted a MERGE (e.g. revealing species indistinguishable through barcodes); and species with complex partitioning involving both a merge and a split fell into the MIXTURE category (e.g. revealing potential misidentification or hybridization issues). Identifications of non-native and invasive specimens were inferred on the basis of their BIN assignments. Identifications were considered successful when they clustered within a ''MATCH'' or ''SPLIT'' BIN that contained expertidentified reference specimens.

Results
DNA was extracted from 140 channid specimens, 121 of which generated high quality barcode sequences, yielding coverage for a total of 25 of the 36 (Table 1, Table S1, Figure S1) described species of snakeheads. Another 129 GenBank sequences were also included, for a combined analysis of 250 specimens (DOI website. Available: dx.doi.org/10.5883/DS-DSCHA. Accessed 2014 May 21) (Table S1). Each species was represented by between 1 to 35 individuals with sequences ranging from 561 to 666 base pairs in length. No indels or stop codons were detected during sequence alignment, suggesting the absence of pseudogenes. Mean nucleotide frequencies across all sequences were T = 27.9%, C = 30.2%, G = 18.2%, A = 23.7%.
Of the 25 species represented by the 250 individuals, 14 species represented a MATCH (one individual was only named to genus and not included in this number), 9 represented a SPLIT, and 2 species represented a MIXTURE ( Table 2). These 25 species partitioned into 49 BINs, 19 of which were represented by singletons. The average intraspecific variation for the Parachanna genus was 0.43% and for the Channa genus was 1.58% ( Figure 1, Table 2; using species as categories, not BINs). The species that represented MATCHes, for which more than one sequence per species was available, exhibited mean intraspecific divergences of 0.16% with a range from 0% to 0.37% (Table 2). Two species, P. africana and C. maculata represent a MIXTURE, in which P. obscura and C. argus individuals, respectively, group within them. Additionally, of the species that constituted MATCHes, four named species (individual named to genus not included since identity is unknown) were represented by singletons and could not be assessed for intraspecific diversity.
To decipher if there was a relationship between BIN partitioning and sample size, a graph of number of specimens per species against number of unique haplotypes was generated ( Figure 2). The species that were ''outliers,'' (i.e./high sampling effort and high unique haplotypes) consisted of C. argus (MATCH) and C. orientalis, C. punctata, C. gachua and C. striata (SPLIT). Three of the aforementioned SPLIT snakehead species are recognized as potentially harbouring cryptic diversity according to the literature. Five MATCH species had a high number of barcodes ($5) and low number of unique haplotypes (#3), suggesting there is no obvious relationship between BIN partitioning and sample size.

SPLITS: cryptic diversity within phenotypic species
Substantial intraspecific divergences were observed among sequences from specimens morphologically identified as C. striata, C. marulius, and C. gachua, which exhibited intraspecific divergences of 1.98%, 5.8%, and 7.43% respectively ( Figure 1; Table S1,  Table 2). These species, as currently circumscribed, are suspected of harbouring cryptic diversity according to the literature, and our results indicate high levels of intraspecific variation [2]. C. striata was comprised of three distinct genetic clusters (BINs) sourced from the following regions: China, Indonesia and Thailand (and North American introductions) (23 individuals; BIN AAB2497), North East India (6 individuals; BIN ACB7973), and South India (2 individuals; BIN AAB2498). The average internal p-distance values of the three BINs were 0.36%, and these three BINs differed from each other by a mean p-distance of 4.2% (range of 3.9%-4.4%). Sequence data from C. marulius specimens yielded two distinct clusters: the reference individuals collected from India (10 individuals; BIN AAI7187) had a mean internal p-distance of 0.12%, and the non-native specimens obtained from the established population in Florida and from a pet store in Windsor (7 individuals; BIN ABW0012) had an internal p-distance of 0%. The two C. marulius BINs differed from each other by 10.4%. C. gachua showed the highest amount of intraspecific variation of all species in this study. It consists of ten divergent BINs sourced from a wide geographic distribution spanning India, Indonesia, Myanmar, and Thailand ( Table 2), five of which occurred as singletons in the dataset. Each of these species exhibited phylogeographic structuring, as the divergent BINs within each putative taxon corresponded to a specific geographic region. Lastly, C. punctata harbours cryptic diversity with two BIN clusters (16 individuals; BIN AAE8814 and 1 individual; BIN ACG5323) with a mean internal distance of 3.2% between these two BINs. By contrast, although C. micropeltes is believed to potentially contain cryptic diversity [2], barcode sequences from Thailand formed a single distinct cluster, with mean intraspecific divergences of 0.28%. Species C. asiatica (BINs: AAW6834, ACH5880, ACH5881), C. lucius (BINs: AAW6833, ABW0051), C. orientalis (BINs: ABV9995, AAC6050, ACA9095, ABA8489), C. stewartii (BINs: AAF3764, AAF3772), P. insignis (BINs: ABW0157, ACE8403) each represented a SPLIT: although not considered as species complexes in the literature, the first four taxa exhibited moderate genetic diversity with mean internal divergence values of 2.96%, 2.01%, 3.60%, 1.80%, respectively, while that of P. insignis was more typical of intraspecific variation at 0.52%.
A graph comparing mean intraspecific divergences ( Figure 3A) and distance to nearest neighbour ( Figure 3B) of MATCH species reveals that the mean intraspecific variation is lower than the distance to the nearest neighbour. A graph comparing all snakehead species mean intraspecific divergence ( Figure 3C) and distance to nearest neighbour ( Figure 3D) does not reveal such clear separation. However, the figures make evident what may be species complexes as shown by the large amounts of intraspecific variation.

MIXTURE: challenges for taxonomic resolution
Sequence data for specimens contained in BIN ABW0048 highlighted ambiguities in morphological identification and/or potential limitations of barcoding as a maternally inherited marker. For example, BIN ABW0048 consisted of 17 individuals; nine C. sp., one C. maculata, six C. argus, one C. argus (male) x C. maculata (female) (Figure 1), and therefore result in multiple species sharing the same BIN. The presumptive reference DNA barcodes for C. argus were obtained from sequences deposited on GenBank for individuals sourced from China (Zhou et al unpublished, [36,37]), while the C. maculata reference specimens were sourced Vietnam (NRSC042-11; [38]) and China ( [36,37,39]; ) (see C. maculata; Table 3). The C. argus specimen whole mitochondrial genome sequence [40] clustered in Table 2. List of BOLD species in DS-DSCHA, their corresponding BIN categories, BIN numbers and individuals per BIN, intraspecific variations within the BIN and species, and the countries that the reference specimen and introduced specimens were sourced from for that particular BIN.  the same BIN (ABW0048) as the C. maculata and C. maculata (F) x C. argus (M) hybrid mitochondrial genome sequences of specimens [36,39,40], and therefore highlights the difficulties with identification of this species.

Discussion
This study represents the largest and most comprehensive global synthesis of sequence diversity within the family Channidae yet undertaken. In lieu of limited snakehead taxonomic expertise and inadequate morphological keys, molecular techniques provide a rapid method of identification. The substantial sequence diversity Triangle bars represent genetic diversity with Channidae. Numbers contained in bracket of each species represent number of individuals that clustered in that particular haplogroup. The species that showed phylogeogra-phical structuring had localities listed after the semi-colons within that particular haplogroup. Species C. maculata* is represented by nine C. sp., one C. maculata, six C. argus, one C. argus (male)x C. maculata (female). Species P. africana* is represented by three P. africana and one P. obscura. doi:10.1371/journal.pone.0099546.g001   identified in this study within broadly defined taxa in both channid genera highlights the need for comprehensive examination of the molecular and morphological systematics within the Channidae. It also highlights the need for integrative taxonomic resolution and delineation of species boundaries. This study is constrained by limited representation from source populations and type localities for described species e.g. [41], as well as incomplete representation of species from the genus Channa. At least nine described species have yet to be added to the BOLD database, and nine of the sensu lato species included in this study may account for undescribed cryptic taxa as revealed by the barcode data and related BIN assignments. Based on the biogeographic range and distribution of snakehead species and genera, it seems apparent that substantial genetic and evolutionary diversity within the Channidae remains to be described.
The results substantiate a growing body of work indicating that DNA barcodes can be used to discriminate between various channid species, shown to be effective in other fish studies [18,26,42]. Past studies focused on a relatively small number of snakehead species occurring within a particular region [12,14,25]. Efforts to characterize the genetic diversity of snakeheads have been restricted [16,27,38]. The present study has attempted to overcome this limitation by adding to the barcode library for snakeheads and providing novel coverage for 25 species. Importantly, the sequences compiled from this study also include additional metadata concerning collection locality, digital images of the vouchers and metadata pertaining to them, electropherogram ''trace'' files and PCR primer sequences where possible, which enhances their fitness-for-use in molecular diagnostic applications (as discussed in [18]). With this contribution, we also make a plea for other researchers to not only include the COI barcode as a common marker in their studies, but to also adhere to barcode data standards for reporting provenance data.

Delineation and species identification using Barcodes and BINS
DNA barcoding proved to be an effective tool for species-level identification of snakeheads. Of the 25 species that were presented in this study, 14 were a MATCH, reinforcing the finding that individuals of the same species tend to possess diagnostic barcode arrays [43]. Nine species were SPLIT across multiple BINs, highlighting the existence of discrete phylogeographically structured barcode clusters separated by genetic distances typically associated with different species and suggesting the presence of possible cryptic species. Two species were a MIXTURE, indicating that there are ambiguities in identification and cases of hybridization. Our efforts to expand barcode coverage for the channid contribute significantly to a synthesis that includes 70% of their described species diversity (25 of 36 spp.). The barcodes segregate into 49 BINs, suggesting that our current taxonomic framework under-represents the genetic diversity of the group, a finding consistent with other large-scale barcode surveys of freshwater fishes [21,44].
The Parachanna species have overlapping ranges and are morphologically very similar. This renders identification very difficult, as indicated by the assignment of P. africana and P. obscura under BIN AAF7843, which constitutes a MIXTURE. Future efforts should be aimed at sourcing more specimens and carefully examining the taxonomy of this group.
In addition to this study, the BIN assignment and classification system has been successfully implemented in other model organisms such as spiders [45] and moths [46]. In well-studied species and taxonomic groups, the BIN system is ideal because it allows for unambiguous classification by using MATCH catego-ries. In taxonomic groups that have been incompletely resolved, classification can be more difficult as a barcoding gap may not be present between recently diverged species, while older species could be represented by artificial splits [47]. While the BIN algorithm shows considerable promise, it should also be implemented with caution and in concert with existing taxonomic and phyletic information wherever possible.

Identification of cryptic species and hybridization
The literature suggests that several snakehead species as currently circumscribed could each constitute species complexes in their own right [2,27,38]. If true, the species in question could represent a SPLIT, and would therefore partition into multiple BINs as a result of high intraspecific variation. This is seen for C. gachua, C. marulius, C. punctata and C. striata, which are characterized by SPLITs, and partitioned into ten, two, two and three BINs respectively, suggesting that they harbour cryptic diversity ( Table 2). Of particular importance to note is that BINs show phylogeographic structuring making them particularly interesting for inferring introduction pathways involving broadly distributed morphospecies (e.g. like many of the taxon concepts currently used for snakeheads).
Hybrids cannot be detected with DNA barcoding directly because mitochondrial DNA is typically only inherited from the maternal parent. Adamson et al. [38] sequenced nuclear gene RP1 and suggested that C. striata is a product of genetic introgression rather than speciation. If C. striata specimens could be extensively sampled throughout the species range, this could be very helpful to aid in identifying invasion sources and expansion pathways, as this species is phylogeographically structured [48,49]. For example, C. striata (NRSC031-11, 032-11; Table S1) were sourced from a market in New York but based on the observed sequence similarity, we can predict that these specimens most likely originated from Southeast Asia as opposed to India, although geographic representation (sampling intensity) for this region was limited.
Genetic diversity in C. gachua, C. marulius, and C. punctata has been largely unexplored. Low amounts of intraspecific diversity were observed in prior studies [14,27,38], but this could reflect a limited scope of geographical sampling. In contrast, our study partitions C. gachua into ten BINs suggesting cryptic diversity with the further acknowledgement that half of these clusters are represented by a singleton specimen. Adamson et al. [38] suggested C. marulius might harbour cryptic diversity and observed that the C. marulius sourced from India was genetically divergent from the Channa that was only identified to genus sourced from Cambodia, but consistently grouped together in a phylogenetic analysis. The results from our study indicated that the C. marulius Indian haplogroup partitions into a separate BIN from the North American C. marulius population, ruling out the Indian population as a source of invasion. In the case of C. punctata, previous studies using RAPD and allozymes suggested that geographic distance is positively correlated with genetic diversity within this species [50,51]. Our study revealed two BIN clusters (AAE8814, ACG5323), yet all but one sequence segregated into one haplogroup. Specimen availability was limited for this species, and a broader geographic coverage would likely reveal additional haplotypes. This particular species has achieved a low-risk ''nearthreatened'' status [52]. While snakeheads are viewed as ''invasive'', this particular species is declining as a result of overexploitation, disease and habitat loss [8,48], suggesting that conservation efforts need to be focused on their preservation.
Believed to possess cryptic diversity, C. micropeltes showed very low intraspecific variation and formed only one BIN in our study.
These results are unsurprising; as we only had access to samples from Thailand, despite the fact that C. micropeltes as currently circumscribed is thought to occupy a much larger geographic range. Additional sampling for C. micropeltes is likely to reveal higher intraspecific mtDNA variation similar to that observed in other channid species [50,51].

Applications for species introductions and invasions
A key component of conservation management is the identification of non-native and potentially invasive species. Consequently one objective of this study was to resolve the taxonomic identity of introduced snakeheads. This was accomplished by examining the non-native snakehead species BINs assignments to see if they cluster with expert-identified reference specimens used to construct the barcode library and perhaps shed light on potential expansion or invasion pathways. While this was successfully assessed for most introduced snakeheads (Table 3), there remains ambiguity with respect to the identity of the United States species (C. argus, C. maculata, C. marulius). While C. marulius shows a broad distribution in southern and south-eastern Asia (Table 1), individuals sourced from their native ranges were only available from India for this study [52] and these specimens were genetically divergent from individuals found in North America. The barcode reference library remains incomplete for this species, and specimens need to be sampled throughout its native range before the C. marulius source of origin can be established.
Difficulties also lie in accurate identification between C. argus and C. maculata, as these two species names have often been interchanged [2,37]. Most recently, Wang and Yang [40] sequenced the whole mitogenome of C. argus, but typical of most mitogenome sequences deposited in GenBank, they made no reference to a voucher specimen making it difficult to corroborate their identification. A subsequent study conducted by Zhu et al. [36] in which whole mitogenomes were sequenced for C. argus, C. maculata, and a hybrid C argus (male) x C. maculata (female), showed contradictory results. Zhu et al. [36] observed that the C. argus specimen sequenced by Wang and Yang [39] shared the same DNA barcode sequence as their C. maculata and hybrid haplogroup, suggesting the putative C. argus material sequenced by Wang and Yang [40] could have been derived from a misidentified specimen of C. maculata or an unrecognized hybrid. The C. maculata reference specimen sourced for our study was obtained from Vietnam [38] and shares the same BIN with C. maculata and a hybrid [36,38] as well as a likely misidentified C. argus [40].
A second case of mistaken identity was observed with the Lake Wylie, North Carolina specimen (NRSC027-11) that was originally identified as C. argus. The morphological voucher specimen was retained and later re-identified as C. maculata as comparisons with barcode reference sequences alerted to that possibility (per comm. W.C. Starnes, NCSMNS.). The remaining northern snakehead specimens sourced from Virginia, Pennsylvania, and New York matched C. argus. Recent literature [36,37], however, suggests that C. argus (female) and C. maculata (male) have been crossed for aquaculture purposes. If such hybrids are fertile, genetic introgression could limit barcode identification to the level of species-pair for this group.
As management decisions are influenced by the perceived biological attributes of the species that are thought to be present, accurate species identification is critical. For example, C. argus is a temperate, cold-tolerant species that has a much broader environmental suitability throughout North America than the tropical C. maculata [5,53] and could therefore call for differing levels of intervention. Specimens that were purchased from a fish market in BC, Canada (NRSC007-11, DSCHA062-11-064-11) grouped with C. maculata [54], which was surprising as it was thought that C. argus represents the only species being imported into BC [52,53]. The BC wild-caught specimen (DSCHA065-12) also grouped with the C. maculata haplogroup and could potentially represent either C. maculata or a hybrid with C. maculata as maternal parent species [50]. The hybrid species could pose its own threats because it is able to grow very fast; it has higher survival rates and is better able to endure stress than its parent species [55]. Regardless, this example illustrates how a DNA barcode reference library can provide insight into past invasions and shed new light on emerging invasion threats, calling for continued efforts to populate it and perhaps extend coverage by including a nuclear marker to aid the identification of hybrids.
Although C. maculata has been established in Hawaii for over a century [2], this population is a prime example of taxonomic misidentification. Originally believed to represent C. striata, further examination [7] revealed it to be C. maculata. Courtenay et al. [7] argue there is a need for accurate snakehead identification in order to make practical predictions about their effects in non-indigenous environments. They also state that there is a need to acquire voucher specimens so that re-examination can be possible.
An emerging potent molecular tool that is being applied to aquatic invasive species monitoring is environmental DNA (eDNA) detection. This approach employs the presence of species-specific DNA sequence motifs using genomic DNA extracts from water samples [56]. Because aquatic organisms shed DNA into their environment, it can be applied to infer species presence and has been used successfully to detect invasive species, even at low abundances [57,58]. The sequence data from this work and similar studies could inform the development of speciesspecific PCR primer and probe sets for the detection of eDNA for species such as C. argus, C. marulius, and C. maculata. As a caveat, the large number of BINs documented in this study, despite the relatively low numbers of available samples, indicates that there is substantial genetic diversity within and among snakehead species that has yet to be documented. Accordingly, eDNA surveillance would only be effective for detecting haplotypes that fit within the BINs/species that are present in the BOLD reference library. Hence, while it may be possible to build primers and probes for known haplotypes, the risk of generating false negative results from targeted marker development remains substantial for at least some members of this group.

Conclusions
This study represents the most comprehensive account of mtDNA diversity within Channidae, and has contributed to the species diversity within this family. In order to better understand and delineate this fascinating group of species, efforts should be focused on more intensive sampling at hierarchical spatial scales, coupled with both morphological and genetic analysis. Similarly, this study also highlights the importance of documenting the barcode profiles of unknowns and non-native specimens. When placed in the BIN framework, these sequences provide additional evidence that the barcode reference sequence library remains incomplete as not all specimens' barcodes cluster with those of reference specimens. Moreover, their retention in the BOLD BIN schema suggests that as the reference sequence library grows, it may be possible to retrospectively identify their putative source population. Hence, we argue for continuing to not only query nonnatives against the reference library but also deposit their sequences in it because they are important for documenting the extent of haplotypic diversity in nature and can help contribute to the creation of robust tools for detecting the eDNA of invasive species. Figure S1 Neighbour Joining Tree of collapsed snakehead sequences with species name, process ID, sample ID and BIN number. (PDF)

Supporting Information
Table S1 Table of species name, process ID, sample ID, country specimen was sourced from, museum ID if applicable, if the specimen is from a native or non-native range and BIN number. (DOC)