Barcoding of Arrow Worms (Phylum Chaetognatha) from Three Oceans: Genetic Diversity and Evolution within an Enigmatic Phylum

Arrow worms (Phylum Chaetognatha) are abundant planktonic organisms and important predators in many food webs; yet, the classification and evolutionary relationships among chaetognath species remain poorly understood. A seemingly simple body plan is underlain by subtle variation in morphological details, obscuring the affinities of species within the phylum. Many species achieve near global distributions, spanning the same latitudinal bands in all ocean basins, while others present disjunct ranges, in some cases with the same species apparently found at both poles. To better understand how these complex evolutionary and geographic variables are reflected in the species makeup of chaetognaths, we analyze DNA barcodes of the mitochondrial cytochrome oxidase c subunit I (COI) gene, from 52 specimens of 14 species of chaetognaths collected mainly from the Atlantic Ocean. Barcoding analysis was highly successful at discriminating described species of chaetognaths across the phylum, and revealed little geographical structure. This barcode analysis reveals hitherto unseen genetic variation among species of arrow worms, and provides insight into some species relationships of this enigmatic group.


Introduction
Arrow worms (Phylum Chaetognatha) comprise over 120 species, all of which inhabit marine environments and exhibit hermaphroditic reproduction. Although there are fewer species in this phylum than in many others, chaetognaths can be numerically abundant in many pelagic environments [1], and their grasping hooks, rows of strong teeth, and transparent bodies make them excellent predators in many marine food webs.
Despite knowledge of chaetognaths extending back to at least the eighteenth century (the first description was by Slabber in 1778), taxonomic affinities of the phylum remain enigmatic. Although fossils are known as far back as the early Cambrian [2], the generally poor preservation of chaetognaths has frustrated attempts to reconstruct their evolutionary history. Chaetognaths appear to have a relatively simple, conserved body plan, with few complex internal structures. However, variation in morphological characters-e.g. position of lateral fins, morphology of tail fins, organization of teeth and grasping hooks-is often a matter of degree rather than of sharp contrast, making classification difficult [3]. Indeed, the seemingly simple morphology of arrow worms belies an underlying mix of features synapomorphic to chaetognaths and features shared with other phyla, complicating placement at even the most basic levels of metazoan organization.
Although fewer studies have focused on the relationships among the species and proposed families within the Chaetognatha, they too reflect a history of revision. After Tokioka's reorganization of the early chaetognath classification [11], morphological taxonomy has advanced a succession of alternative schemes [12]. For instance, the genus Sagitta, which contains some 60 species, has also been considered a family [13]; while this relative placement reflects the Linnaean classification system and is therefore somewhat arbitrary, it does highlight the current uncertainly in timing and driving forces of speciation in Chaetognatha. Morphological identification of arrow worms requires significant training and expertise, and delineating species (that is, identifying monophyletic taxa) has often been difficult, even for experienced taxonomists.
Biogeographical data further complicate our understanding of species structure in chaetognaths. Although many species exhibit large ranges, encompassing similar latitudinal bands in all major oceans [14], chaetognaths can also be accurate indicators of regional water masses and depth layers [15]. Species such as Sagitta setosa often exhibit disjunct distributions [12,16], which raise questions as to whether the morphological variation seen between populations has a genetic basis. Finally, species and/or groups of related species exhibit patterns of distribution that may reflect their history of evolution and speciation. For instance, the cold-water species Sagitta maxima exhibits submergence (i.e. a shift into deeper waters) in subtropical and tropical zones. More intriguing are the similar distributions of two groups of chaetognaths, each containing three species (S. marri, S. zetesios, S. planctonis, and S. gazellae, S. maxima, S. lyra). The first species in each triplet is found in Antarctic waters, the second shows a bipolar distribution with submergence towards the equator, and the third is subtropical [15,17]. It is not known whether this latitudinal series of distributions reflects the speciation history of the triplets either northwards or southwards, or whether it is an ecological grouping only.
The complex morphological and geographical associations of chaetognaths present a situation in which DNA barcoding [18] can offer significant insight. Analysis of the patterns of DNA sequence diversity at the mitochondrial cytochrome oxidase c subunit I (COI) gene, when combined with known morphological associations of established species, results in a fuller understanding not only of the cohesion of well-known taxa, but also the range of variation contained within them. Barcoding using COI has been effective in revealing previously unknown patterns of genetic diversity in terrestrial systems (e.g. [19][20][21]) and marine systems (e.g. fish [22], chitons [23], and crustaceans [24]). While nuclear rRNA genes are frequently used in similar investigations, their resolution is typically taxonomically deeper than the species level crucial to species discrimination with COI. Further, the ribosomal genes copies in chaetognaths appear to be split into two highly divergent ''classes'' [3;25] whose paralog vs. pseudogene status remains unclear. This possibly non-homogeneous duplication of nuclear ribosomal genes complicates their use in genetic analyses. This study presents DNA barcodes for 52 specimens of 14 species of chaetognaths collected from the Atlantic and Southern Oceans. These collections are part of an ongoing barcoding effort of the Census of Marine Zooplankton (CMarZ) and the Mid-Atlantic Ridge Ecology group (MAR-ECO), two field projects of the Census of Marine Life (CoML).

Materials and Methods
Chaetognaths sequenced in this project were collected on six cruises ( Figure 1 For some specimens, DNA extraction, PCR amplification, and sequencing took place during the cruise; other specimens were analyzed at the University of Connecticut. Procedures and equipment were the same for all specimens. Vouchered material was preserved in acetone (MAR and Southern Ocean cruises) or 95% ethanol (Northwest Atlantic, MAB, Eastern Atlantic, and Arctic cruises). The voucher consisted of at least one additional individual taken from the same net tow, or as necessary, a minimal amount of excised tissue of an individual specimen was removed for DNA extraction and the remainder retained as the voucher. All vouchers are therefore paragenophores (sensu [26]). Photographs were taken of specimens before dissection when possible. Vouchers and images are maintained by CMarZ at the University of Connecticut, USA. Collection information and species identifications are summarized in Table 1.
Forward and reverse sequences for each individual were assembled in Sequencher (GeneCodes, Inc., Ann Arbor, MI USA) and manually edited. All sequences were compared to the GenBank database using BLAST [28], and to a database of all zooplankton barcodes obtained in the laboratory (Bucklin et al. unpublished). Edited DNA sequences were exported into BioEdit and translated to inferred amino acid sequences to verify that they translated correctly. Once verified, the COI sequences were aligned as amino acids using the CLUSTAL algorithm [29] in BioEdit, and returned to DNA format. This alignment was manually edited for consistency and to remove primer sequences. The final dataset contained sequences for 52 specimens of 14 species of chaetognaths. For reference, three COI sequences of Sagitta bedoti from [16] were added from GenBank. Sequences produced in this project were deposited in the BARCODE section  of GenBank along with georeferenced metadata (Accession Numbers GQ368374-GQ368425).
To investigate the levels of genetic variation within and between chaetognath species, pairwise Kimura 2-parameter distances (K2P; [30]) were computed in MEGA 4 [31], with gap positions ignored on a pairwise basis. These distances were hierarchically tabulated within each species, and between species within each genus. Because the sequence dataset contained only two genera from the same family (Eukrohnia and Heterokrohnia), comparisons between genera within the family were not tabulated.
To investigate the evolutionary history of COI sequences in chaetognaths, a model of DNA sequence evolution was chosen using MrModeltest v2 [32] under Akaike's Information Criterion (AIC). The general time-reversible model (GST) was selected, with an estimated proportion of DNA sites invariant (I), and mutation rates among sites following a gamma distribution (G). This GTR+I+G model was then used to generate a Bayesian and a maximum likelihood (ML) gene tree. The Bayesian tree was obtained with MrBayes 3.1.2 [33], with the search conducted 100,000 iterations at a time, continuing until the average standard deviation of split frequencies approached its asymptote (roughly 0.01 after 400,000 generations). The collection of trees produced at this point was pruned heuristically by viewing the output of likelihood scores in MrBayes, and only trees near the optimum likelihood score were retained using the appropriate burn-in criterion. The final sample contained 2000 trees, on which posterior probabilities (PP) were calculated. To construct the ML tree, the hill-climbing algorithm of [34] was performed online via the PHYML web server [35], using the default options, the chosen GTR+I+G model, and a starting tree made by neighbor joining. For consistency with MrBayes, in which the form of the molecular model is specified but parameters are estimated, only the model form was specified in PHYML. Support for nodes in the tree was assessed using the approximate likelihood ratio test (aLRT, [36]) as implemented in PHYML.

Species
Voucher no.

Results
Hierarchical comparison of K2P distances at different taxonomic levels revealed disjunct distributions in sequence similarity within vs. between species (Figure 2A,B). The average proportion of difference in sequences within species was 0.014660.0193 (mean6SD), whereas mean distance between species within a genus was over an order of magnitude larger, 0.34560.100. The only overlap between these distributions results from comparisons of Eukrohnia hamata and E. bathyantarctica (K2P distances of 0.06-0.08).
The optimal gene trees produced by Bayesian and ML searches showed nearly identical topology, in which the tip branches within species were short, and species were separated by much longer branches (Figure 3). Sequences clustered strongly by species in all cases. Although the nodes separating Sagitta spp. from all others (Heterokrohnia and Eukrohnia spp.) were well supported in the Bayesian analysis (both PP = 1.00), they were not well supported by ML (71% and ,50%). Most other internal nodes were moderately supported by both analyses, or strongly supported by only the Bayesian analysis.

Discussion
Barcode analysis of chaetognaths was extremely successful in diagnosing established species based on COI gene sequence, in that sequences clustered by species in all cases. Given the difficulty in diagnosing species from morphological features, especially in ethanol-preserved material, the high accuracy of barcode analysis presents a very useful tool to aid identification of known species. The comparatively short branch (and small K2P distances) between E. hamata and E. bathyantarctica may mean these are a young species pair, or that regional variants of a single species have been mistaken for separate species.
The average K2P distance within species for chaetognaths, 0.0145, was on the high end of values computed for other taxa: recent barcoding work has reported intraspecific mean K2P distances of 0.00460 (decapods, [24]), 0.00740 (gammarid amphipods, [24]), 0.0100-0.0200 (13,000 species pairs, [37]), and 0.00390 (fish, [22]). The average distance between the species within each genus for the present dataset, 0.345, was considerably larger than for these same taxa (0.170, 0.0.250, 0.110, and 0.099 respectively), and reflects the high diversity of Sagitta. Although not directly comparable to the K2P distances reported here, uncorrected p-distances of 6.3062.74% (mean6SD) within Sagitta setosa, 2.0860.95% within S. bedoti, and maximum-likelihood corrected distances of 77.763.45% between the two species have been reported [38]. These comparisons all indicate that most chaetognath species seem to have diverged long ago, and have undergone comparatively less divergence since. The disjunct distributions of K2P distances imply that barcode analysis can also alert taxonomists to genetically distinct lineages that warrant further morphological examination.
Although all barcodes for a given species in this dataset tended to be from the same locality, the genetic variation seen within species showed little association with geography. Most species (e.g. S. bipunctata, S. helenae, E. hamata) exhibited at least one barcode separated from the others by a longer branch, even though all were from the same location. In S. lyra, there was a weak clustering of two clades, but there was no separation between the central Atlantic (i.e. MAR) and the Northeast Atlantic. The presence of significant genetic diversity without geographic structure could imply reproductive mixing across the portion of the range represented in these specimens, or insufficient time for lineage sorting in isolated populations. More thorough barcoding of species throughout their ranges will be required to address the issue of phylogeography.
Although the COI barcodes did not resolve the branching order of the ''paired triplet'' species, preliminary analysis suggests that nearly complete sequences of the nuclear large ribosomal subunit (28S) will have the power to address this question (Jennings et al. unpublished data). Existing partial Class I sequences [39] contain insufficient variation to obtain robust branching order; however, if the preliminary patterns from full Class I 28S can be confirmed by more complete sequencing, they should shed light on this interesting evolutionary history.
On the whole, the chaetognath barcodes indicate a complex history of speciation and evolution. The lack of correlation between location and genetic similarity underscores this complexity, and the potential for genetic mixing over large distances in chaetognaths. At least for the species in the present analysis, COI barcode analysis was a highly successful and accurate tool for species confirmation, in that all species barcoded to date displayed readily distinguishable COI sequences, with lower divergence within species. Given the difficulty in identifying chaetognaths, particularly from suboptimally preserved material, barcoding of uncertain specimens and comparison to known specimens should greatly assist taxonomists in morphological identifications. More complete barcoding of species across their ranges promises to further elucidate the patterns of genetic diversity of this enigmatic group.