Barcoding of Arrow Worms (Phylum Chaetognatha) from Three Oceans: Genetic Diversity and Evolution within an Enigmatic Phylum


Arrow worms (Phylum Chaetognatha) are abundant planktonic organisms and important predators in many food webs; yet, the classification and evolutionary relationships among chaetognath species remain poorly understood. A seemingly simple body plan is underlain by subtle variation in morphological details, obscuring the affinities of species within the phylum. Many species achieve near global distributions, spanning the same latitudinal bands in all ocean basins, while others present disjunct ranges, in some cases with the same species apparently found at both poles. To better understand how these complex evolutionary and geographic variables are reflected in the species makeup of chaetognaths, we analyze DNA barcodes of the mitochondrial cytochrome c oxidase subunit I (COI) gene, from 52 specimens of 14 species of chaetognaths collected mainly from the Atlantic Ocean. Barcoding analysis was highly successful at discriminating described species of chaetognaths across the phylum, and revealed little geographical structure. This barcode analysis reveals hitherto unseen genetic variation among species of arrow worms, and provides insight into some species relationships of this enigmatic group.


Arrow worms (Phylum Chaetognatha) comprise over 120 species, all of which inhabit marine environments and exhibit hermaphroditic reproduction. Although there are fewer species in this phylum than in many others, chaetognaths can be numerically abundant in many pelagic environments [1], and their grasping hooks, rows of strong teeth, and transparent bodies make them excellent predators in many marine food webs.

Despite knowledge of chaetognaths extending back to at least the eighteenth century (the first description was by Slabber in 1778), taxonomic affinities of the phylum remain enigmatic. Although fossils are known as far back as the early Cambrian [2], the generally poor preservation of chaetognaths has frustrated attempts to reconstruct their evolutionary history. Chaetognaths appear to have a relatively simple, conserved body plan, with few complex internal structures. However, variation in morphological characters (e.g. position of lateral fins, morphology of tail fins, organization of teeth and grasping hooks) is often a matter of degree rather than of sharp contrast, making classification difficult [3]. Indeed, the seemingly simple morphology of arrow worms belies an underlying mix of features synapomorphic to chaetognaths and features shared with other phyla, complicating placement at even the most basic levels of metazoan organization. Reflecting this complexity, taxonomists have variably placed chaetognaths as basal members of protostomes or deuterostomes [4]–[6] or even outside the coelomate metazoans [7]. Although molecular phylogenetic analyses tend to support placement within the protostomes [6], [8]–[10], alternative arrangements are still advanced.

Although fewer studies have focused on the relationships among the species and proposed families within the Chaetognatha, they too reflect a history of revision. After Tokioka's reorganization of the early chaetognath classification [11], morphological taxonomy has advanced a succession of alternative schemes [12]. For instance, the genus Sagitta, which contains some 60 species, has also been considered a family [13]; while this relative placement reflects the Linnaean classification system and is therefore somewhat arbitrary, it does highlight the current uncertainty in the timing and driving forces of speciation in Chaetognatha. Morphological identification of arrow worms requires significant training and expertise, and delineating species (that is, identifying monophyletic taxa) has often been difficult, even for experienced taxonomists.

Biogeographical data further complicate our understanding of species structure in chaetognaths. Although many species exhibit large ranges, encompassing similar latitudinal bands in all major oceans [14], chaetognaths can also be accurate indicators of regional water masses and depth layers [15]. Species such as Sagitta setosa often exhibit disjunct distributions [12], [16], which raise questions as to whether the morphological variation seen between populations has a genetic basis. Finally, species and/or groups of related species exhibit patterns of distribution that may reflect their history of evolution and speciation. For instance, the cold-water species Sagitta maxima exhibits submergence (i.e. a shift into deeper waters) in subtropical and tropical zones. More intriguing are the similar distributions of two groups of chaetognaths, each containing three species (S. marri, S. zetesios, S. planctonis, and S. gazellae, S. maxima, S. lyra). The first species in each triplet is found in Antarctic waters, the second shows a bipolar distribution with submergence towards the equator, and the third is subtropical [15], [17]. It is not known whether this latitudinal series of distributions reflects the speciation history of the triplets either northwards or southwards, or whether it is an ecological grouping only.

The complex morphological and geographical associations of chaetognaths present a situation in which DNA barcoding [18] can offer significant insight. Analysis of the patterns of DNA sequence diversity at the mitochondrial cytochrome c oxidase subunit I (COI) gene, when combined with known morphological associations of established species, results in a fuller understanding not only of the cohesion of well-known taxa, but also of the range of variation contained within them. Barcoding using COI has been effective in revealing previously unknown patterns of genetic diversity in terrestrial systems (e.g. [19]–[21]) and marine systems (e.g. fish [22], chitons [23], and crustaceans [24]). While nuclear rRNA genes are frequently used in similar investigations, they typically resolve relationships at deeper taxonomic levels than the species-level discrimination for which COI is used. Further, the ribosomal gene copies in chaetognaths appear to be split into two highly divergent “classes” [3], [25] whose paralog vs. pseudogene status remains unclear. This possibly non-homogeneous duplication of nuclear ribosomal genes complicates their use in genetic analyses. This study presents DNA barcodes for 52 specimens of 14 species of chaetognaths collected from the Atlantic and Southern Oceans. These collections are part of an ongoing barcoding effort of the Census of Marine Zooplankton (CMarZ) and the Mid-Atlantic Ridge Ecology group (MAR-ECO), two field projects of the Census of Marine Life (CoML).

Materials and Methods

Chaetognaths sequenced in this project were collected on six cruises (Figure 1). One cruise collected zooplankton from the waters west of the Antarctic Peninsula on board the R/V N.B. Palmer in 2002 (NBP0202). Four cruises sampled waters in the Atlantic: the R/V G.O. Sars to the northern Mid-Atlantic Ridge (MAR) in summer 2004 (SARS_2004110), the R/V R.H. Brown in April 2006 (RHB0603) to the Sargasso Sea (Northwest Atlantic), the R/V Delaware II in November 2006 (DL0616) to the Mid-Atlantic Bight (MAB), and the FS Polarstern along the eastern boundary of the Atlantic (Canary Islands to South Africa) in November 2007 (PS-ANT-XXIV/1). Finally, the FS Polarstern collected zooplankton from the Arctic Ocean north of Europe in summer 2007 (PS-ARK-XXII/2).

Figure 1. Map showing locations of cruises and material collected in this study.

For some specimens, DNA extraction, PCR amplification, and sequencing took place during the cruise; other specimens were analyzed at the University of Connecticut. Procedures and equipment were the same for all specimens. Vouchered material was preserved in acetone (MAR and Southern Ocean cruises) or 95% ethanol (Northwest Atlantic, MAB, Eastern Atlantic, and Arctic cruises). The voucher consisted of at least one additional individual taken from the same net tow; when necessary, a minimal amount of tissue was instead excised from an individual specimen for DNA extraction and the remainder was retained as the voucher. All vouchers are therefore paragenophores (sensu [26]). Photographs were taken of specimens before dissection when possible. Vouchers and images are maintained by CMarZ at the University of Connecticut, USA. Collection information and species identifications are summarized in Table 1.

Table 1. Species identity and collection information for barcoded chaetognaths.

For all preserved, identified specimens, DNA analysis proceeded as follows. Up to 25 mm³ of tissue from single arrow worms was dissected using sterile techniques and DNA was extracted with the DNeasy DNA Extraction Kit (Qiagen). PCR amplification of the COI barcode region employed primers LCO-1490 (5′-GGTCAACAAATCATAAAGATATTGG-3′) and HCO-2198 (5′-TAAACTTCAGGGTGACCAAAAAATCA-3′) from [27], in 50 µL PCRs consisting of 1x GoTaq Flexi buffer (Promega, Madison, WI USA), 2.5 mM MgCl₂, 2 pmol dNTPs, 1.2 pmol of each primer, approximately 50 ng extracted DNA template, and 1 U of Taq polymerase (Promega). The PCR protocol was as follows: initial denaturation, 95°C for 5 min; 35 cycles of (95°C for 30 sec, 50°C for 45 sec, 72°C for 1 min); final extension, 72°C for 5 min. Products were purified using the QIAquick PCR Purification Kit (Qiagen, Valencia, CA USA). Sequencing reactions were performed using BigDye Terminators v3.1, purified via ethanol precipitation, and run on an ABI 3130 Automated Sequencer.
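As an illustration only, the primers and cycling conditions above can be written down as plain data, which makes the programmed hold time easy to check. The sketch below is not part of the original workflow; the variable and function names are hypothetical, and ramping time between temperatures is ignored.

```python
# Primer sequences (5' to 3') exactly as quoted in the text.
PRIMERS = {
    "LCO-1490": "GGTCAACAAATCATAAAGATATTGG",
    "HCO-2198": "TAAACTTCAGGGTGACCAAAAAATCA",
}

# Thermocycler program from the text: (step name, temperature in deg C, seconds).
INITIAL_DENATURATION = ("initial denaturation", 95, 5 * 60)
CYCLE_STEPS = [
    ("denaturation", 95, 30),
    ("annealing", 50, 45),
    ("extension", 72, 60),
]
N_CYCLES = 35
FINAL_EXTENSION = ("final extension", 72, 5 * 60)


def total_hold_seconds():
    """Sum of all programmed hold times, excluding ramping between steps."""
    per_cycle = sum(seconds for _, _, seconds in CYCLE_STEPS)
    return INITIAL_DENATURATION[2] + N_CYCLES * per_cycle + FINAL_EXTENSION[2]


def primer_lengths():
    """Length in nucleotides of each primer."""
    return {name: len(seq) for name, seq in PRIMERS.items()}


if __name__ == "__main__":
    print(primer_lengths())
    print(total_hold_seconds() / 60.0, "minutes of programmed hold time")
```

Under these conditions the program holds for 5325 seconds in total (just under 89 minutes), so the actual run time on a given thermocycler would be somewhat longer once ramping is included.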

Forward and reverse sequences for each individual were assembled in Sequencher (GeneCodes, Inc., Ann Arbor, MI USA) and manually edited. All sequences were compared to the GenBank database using BLAST [28], and to a database of all zooplankton barcodes obtained in the laboratory (Bucklin et al. unpublished). Edited DNA sequences were exported into BioEdit and translated to inferred amino acid sequences to verify that they translated correctly. Once verified, the COI sequences were aligned as amino acids using the CLUSTAL algorithm [29] in BioEdit, and returned to DNA format. This alignment was manually edited for consistency and to remove primer sequences. The final dataset contained sequences for 52 specimens of 14 species of chaetognaths. For reference, three COI sequences of Sagitta bedoti from [16] were added from GenBank. Sequences produced in this project were deposited in the BARCODE section of GenBank along with georeferenced metadata (Accession Numbers GQ368374-GQ368425).
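The translation check described above was performed in BioEdit. A minimal sketch of the same idea is given here, assuming the invertebrate mitochondrial genetic code (NCBI translation table 5); the text does not state which code was used, and the function names are hypothetical. Because the reading frame of a trimmed barcode is not known in advance, the sketch reports every frame that translates without an internal stop codon.

```python
BASES = "TCAG"

# NCBI translation table 5 (invertebrate mitochondrial), with codons in the
# conventional order TTT, TTC, TTA, TTG, TCT, ... GGG. Using this table for
# chaetognath COI is an assumption of this sketch.
AA_TABLE5 = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"

CODON_TABLE = {
    a + b + c: AA_TABLE5[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq, frame=0):
    """Translate a DNA sequence starting at offset `frame` (0, 1, or 2).

    Codons containing gaps or ambiguity codes are reported as 'X'.
    A trailing partial codon is ignored.
    """
    seq = seq.upper()
    residues = []
    for pos in range(frame, len(seq) - 2, 3):
        residues.append(CODON_TABLE.get(seq[pos:pos + 3], "X"))
    return "".join(residues)


def frames_without_stops(seq):
    """Return the reading frames (0, 1, 2) that contain no stop codon."""
    return [frame for frame in range(3) if "*" not in translate(seq, frame)]
```

A sequence for which no frame is free of stop codons would be a candidate for re-editing, or for inspection as a possible pseudogene or sequencing error.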

To investigate the levels of genetic variation within and between chaetognath species, pairwise Kimura 2-parameter distances (K2P; [30]) were computed in MEGA 4 [31], with gap positions ignored on a pairwise basis. These distances were hierarchically tabulated within each species, and between species within each genus. Because the sequence dataset contained only two genera from the same family (Eukrohnia and Heterokrohnia), comparisons between genera within the family were not tabulated.
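For readers unfamiliar with the K2P distance, the calculation can be sketched as follows. With P the proportion of compared sites that differ by a transition and Q the proportion that differ by a transversion, the distance is d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)]. The code below is an illustrative reimplementation, not the MEGA 4 code; the function name is hypothetical, and sites where either sequence has a gap or ambiguous base are skipped for that pair, mirroring pairwise treatment of gaps.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID = PURINES | PYRIMIDINES


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in VALID or b not in VALID:
            continue  # skip gaps and ambiguity codes for this pair only
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent for the
    # correction to be defined (argument <= 0).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

For example, two ten-site sequences differing by a single transition give P = 0.1 and Q = 0, so d = -(1/2) ln(0.8), approximately 0.112, slightly more than the uncorrected proportion of 0.1.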

To investigate the evolutionary history of COI sequences in chaetognaths, a model of DNA sequence evolution was chosen using MrModeltest v2 [32] under Akaike's Information Criterion (AIC). The general time-reversible model (GTR) was selected, with an estimated proportion of DNA sites invariant (I), and mutation rates among sites following a gamma distribution (G). This GTR+I+G model was then used to generate a Bayesian and a maximum likelihood (ML) gene tree. The Bayesian tree was obtained with MrBayes 3.1.2 [33], with the search conducted 100,000 iterations at a time, continuing until the average standard deviation of split frequencies approached its asymptote (roughly 0.01 after 400,000 generations). The collection of trees produced at this point was pruned heuristically by viewing the output of likelihood scores in MrBayes, and only trees near the optimum likelihood score were retained using the appropriate burn-in criterion. The final sample contained 2000 trees, on which posterior probabilities (PP) were calculated. To construct the ML tree, the hill-climbing algorithm of [34] was performed online via the PHYML web server [35], using the default options, the chosen GTR+I+G model, and a starting tree made by neighbor joining. For consistency with MrBayes, in which the form of the molecular model is specified but parameters are estimated, only the model form was specified in PHYML. Support for nodes in the tree was assessed using the approximate likelihood ratio test (aLRT, [36]) as implemented in PHYML.
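Model choice under AIC can be illustrated with a short sketch. AIC is computed as 2k - 2 ln L, where k is the number of free parameters and ln L the maximized log-likelihood, and the model with the lowest AIC is preferred. The log-likelihood values below are invented for illustration and are not results from this study; the parameter counts cover only the substitution model (branch lengths, which are shared across models on a fixed tree, are left out), and the function names are hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood


def best_model_by_aic(candidates):
    """Pick the model name with the lowest AIC.

    `candidates` maps a model name to (log_likelihood, n_free_parameters).
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))


# HYPOTHETICAL log-likelihoods, for illustration only.
# Parameter counts: JC has none; HKY has 1 rate ratio + 3 base frequencies;
# GTR has 5 relative rates + 3 base frequencies; +I and +G add one each.
EXAMPLE_CANDIDATES = {
    "JC": (-9000.0, 0),
    "HKY+G": (-8600.0, 5),
    "GTR+I+G": (-8500.0, 10),
}

if __name__ == "__main__":
    for name, (lnl, k) in EXAMPLE_CANDIDATES.items():
        print(name, aic(lnl, k))
    print("preferred:", best_model_by_aic(EXAMPLE_CANDIDATES))
```

With these made-up numbers the richer model is preferred because its gain in likelihood outweighs the penalty for its extra parameters; with real data the outcome depends entirely on the fitted likelihoods.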


Hierarchical comparison of K2P distances at different taxonomic levels revealed disjunct distributions in sequence similarity within vs. between species (Figure 2A,B). The average proportion of difference in sequences within species was 0.0146±0.0193 (mean±SD), whereas mean distance between species within a genus was over an order of magnitude larger, 0.345±0.100. The only overlap between these distributions results from comparisons of Eukrohnia hamata and E. bathyantarctica (K2P distances of 0.06–0.08).
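The hierarchical tabulation behind these two distributions can be sketched as follows: each pair of specimens is assigned to the within-species class if both belong to the same species, to the between-species class if they share a genus but not a species, and is otherwise left out. The specimen labels and distances in the example are hypothetical and serve only to show the bookkeeping.

```python
from itertools import combinations
from statistics import mean, stdev


def tabulate_distances(specimens, dist):
    """Split pairwise distances into within-species and between-species lists.

    `specimens` is a list of (genus, species) tuples; `dist` is a square
    matrix (list of lists) of pairwise distances in the same order.
    Pairs from different genera are not tabulated.
    """
    within, between = [], []
    for i, j in combinations(range(len(specimens)), 2):
        genus_i, species_i = specimens[i]
        genus_j, species_j = specimens[j]
        if genus_i != genus_j:
            continue
        if species_i == species_j:
            within.append(dist[i][j])
        else:
            between.append(dist[i][j])
    return within, between


def summarize(values):
    """Return (mean, sample SD); SD is NaN when fewer than two values exist."""
    sd = stdev(values) if len(values) > 1 else float("nan")
    return mean(values), sd


# HYPOTHETICAL example data, not taken from the study.
EXAMPLE_SPECIMENS = [
    ("Sagitta", "lyra"),
    ("Sagitta", "lyra"),
    ("Sagitta", "maxima"),
    ("Eukrohnia", "hamata"),
]
EXAMPLE_DIST = [
    [0.00, 0.01, 0.30, 0.45],
    [0.01, 0.00, 0.32, 0.47],
    [0.30, 0.32, 0.00, 0.50],
    [0.45, 0.47, 0.50, 0.00],
]
```

Applied to the means reported above, the between-species value is roughly 24 times the within-species value, which is the basis for describing the difference as more than an order of magnitude.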

Figure 2. Hierarchical histograms of pairwise Kimura 2-Parameter (K2P) distances between specimens.

Vertical lines show mean pairwise distance at each level. Asterisks mark outlier values discussed in the Results. A, K2P distances within species. B, K2P distances between species within each genus.

The optimal gene trees produced by Bayesian and ML searches showed nearly identical topology, in which the tip branches within species were short, and species were separated by much longer branches (Figure 3). Sequences clustered strongly by species in all cases. Although the nodes separating Sagitta spp. from all others (Heterokrohnia and Eukrohnia spp.) were well supported in the Bayesian analysis (both PP = 1.00), they were not well supported by ML (71% and <50%). Most other internal nodes were moderately supported by both analyses, or strongly supported by only the Bayesian analysis.

Figure 3. Gene tree for COI, showing topology and branch lengths from Bayesian analysis.

Pairs of numbers in parentheses are support values, given as (Bayesian posterior probabilities, approximate Likelihood Ratio Test support), with asterisks indicating maximum support of (1.00, 1.00), and blanks indicating topologies not recovered in that analysis. Scale bar denotes distance along branches. Underlined sequences were obtained from GenBank. Symbols following species names depict sampling location.


Barcode analysis of chaetognaths was extremely successful in diagnosing established species based on COI gene sequence, in that sequences clustered by species in all cases. Given the difficulty in diagnosing species from morphological features, especially in ethanol-preserved material, the high accuracy of barcode analysis presents a very useful tool to aid identification of known species. The comparatively short branch (and small K2P distances) between E. hamata and E. bathyantarctica may mean these are a young species pair, or that regional variants of a single species have been mistaken for separate species.

The average K2P distance within species for chaetognaths, 0.0145, was on the high end of values computed for other taxa: recent barcoding work has reported intraspecific mean K2P distances of 0.00460 (decapods, [24]), 0.00740 (gammarid amphipods, [24]), 0.0100–0.0200 (13,000 species pairs, [37]), and 0.00390 (fish, [22]). The average distance between the species within each genus for the present dataset, 0.345, was considerably larger than for these same taxa (0.170, 0.250, 0.110, and 0.099 respectively), and reflects the high diversity of Sagitta. Although not directly comparable to the K2P distances reported here, uncorrected p-distances of 6.30±2.74% (mean±SD) within Sagitta setosa, 2.08±0.95% within S. bedoti, and maximum-likelihood corrected distances of 77.7±3.45% between the two species have been reported [38]. These comparisons all indicate that most chaetognath species seem to have diverged long ago, and have undergone comparatively less divergence since. The disjunct distributions of K2P distances imply that barcode analysis can also alert taxonomists to genetically distinct lineages that warrant further morphological examination.

Although all barcodes for a given species in this dataset tended to be from the same locality, the genetic variation seen within species showed little association with geography. Most species (e.g. S. bipunctata, S. helenae, E. hamata) exhibited at least one barcode separated from the others by a longer branch, even though all were from the same location. In S. lyra, there was a weak clustering of two clades, but there was no separation between the central Atlantic (i.e. MAR) and the Northeast Atlantic. The presence of significant genetic diversity without geographic structure could imply reproductive mixing across the portion of the range represented in these specimens, or insufficient time for lineage sorting in isolated populations. More thorough barcoding of species throughout their ranges will be required to address the issue of phylogeography.

Although the COI barcodes did not resolve the branching order of the “paired triplet” species, preliminary analysis suggests that nearly complete sequences of the nuclear large ribosomal subunit (28S) will have the power to address this question (Jennings et al. unpublished data). Existing partial Class I sequences [39] contain insufficient variation to obtain robust branching order; however, if the preliminary patterns from full Class I 28S can be confirmed by more complete sequencing, they should shed light on this interesting evolutionary history.

On the whole, the chaetognath barcodes indicate a complex history of speciation and evolution. The lack of correlation between location and genetic similarity underscores this complexity, and the potential for genetic mixing over large distances in chaetognaths. At least for the species in the present analysis, COI barcode analysis was a highly successful and accurate tool for species confirmation, in that all species barcoded to date displayed readily distinguishable COI sequences, with lower divergence within species. Given the difficulty in identifying chaetognaths, particularly from suboptimally preserved material, barcoding of uncertain specimens and comparison to known specimens should greatly assist taxonomists in morphological identifications. More complete barcoding of species across their ranges promises to further elucidate the patterns of genetic diversity of this enigmatic group.


We gratefully acknowledge the assistance of the Captains and crew of the R/V G.O. Sars (MAR-ECO 2004), R/V R.H. Brown (RHB-0603), FS Polarstern (ANT-XXIV/1), and R/V N.B. Palmer (NBP0202). Some specimens were collected by Ksenia Kosobokova and kindly contributed by her. K.T.C.A. Peijnenburg provided helpful comments during the writing of this manuscript, and kindly shared sequence data and analyses. This study is a contribution from the Census of Marine Zooplankton (CMarZ), an ocean realm field project of the Census of Marine Life.

Author Contributions

Conceived and designed the experiments: RMJ AB. Performed the experiments: RMJ. Analyzed the data: RMJ APB. Contributed reagents/materials/analysis tools: AB APB. Wrote the paper: RMJ.


  1. Bone Q, Kapp H, Pierrot-Bults AC (1991) The biology of chaetognaths. Oxford University Press, Oxford.
  2. Vannier J, Steiner M, Renvoise E, Hu S-X, Casanova J-P (2007) Early Cambrian origin of modern food webs: evidence from predator arrow worms. Proceedings of the Royal Society of London B 274: 627–633.
  3. Telford MJ, Holland PWH (1997) Evolution of 28S ribosomal DNA in chaetognaths: duplicate genes and molecular phylogeny. Journal of Molecular Evolution 44: 135–144.
  4. Ghirardelli E (1968) Some aspects of the biology of the chaetognaths. Advances in Marine Biology 6: 271–375.
  5. Marlétaz F, Martin E, Perez Y, Papillon D, Caubit X, et al. (2006) Chaetognath phylogenomics: a protostome with deuterostome-like development. Current Biology 16: R577–R578.
  6. Halanych KM (1996) Testing hypotheses of chaetognath origins: long branches revealed by 18S ribosomal DNA. Systematic Biology 45: 223–246.
  7. Telford MJ, Holland PWH (1993) The phylogenetic affinities of the chaetognaths: a molecular analysis. Molecular Biology and Evolution 10: 660–676.
  8. Telford MJ (2004) Affinity for arrow worms. Nature 431: 254–256.
  9. Papillon D, Perez Y, Caubit X, Le Parco Y (2004) Identification of chaetognaths as protostomes is supported by the analysis of their mitochondrial genome. Molecular Biology and Evolution 21: 2122–2129.
  10. Matus DQ, Copley RR, Dunn CW, Hejnol A, Eccleston H, et al. (2006) Broad taxon sampling and gene sampling indicate that chaetognaths are sister to lophotrochozoans. Current Biology 16: R575–576.
  11. Tokioka T (1965) The taxonomical outline of Chaetognatha. Publications of the Seto Marine Biological Laboratory 12: 335–357.
  12. Peijnenburg KTCA, Fauvelot C, Breeuwer JAJ, Menken SBJ (2006) Spatial and temporal genetic structure of the planktonic Sagitta setosa (Chaetognatha) in European seas as revealed by mitochondrial and nuclear DNA markers. Molecular Ecology 15: 3319–3338.
  13. Bieri R (1991) Systematics of the Chaetognatha. In: Bone Q, Kapp H, Pierrot-Bults AC, editors. The biology of chaetognaths. Oxford, England: Oxford University Press. pp. 122–136.
  14. van der Spoel S, Pierrot-Bults AC (1973) Zoogeography and Diversity of Plankton. Bunge, Utrecht, Netherlands.
  15. Pierrot-Bults AC (2008) A short note on the biogeographic patterns of the Chaetognatha fauna in the North Atlantic. Deep-Sea Research Part II 55: 137–141.
  16. Peijnenburg KTCA, Breeuwer JAJ, Pierrot-Bults AC, Menken SBJ (2004) Phylogeography of the planktonic chaetognath Sagitta setosa reveals isolation in European seas. Evolution 58: 1472–1487.
  17. Pierrot-Bults AC (1976) Zoogeographic patterns in chaetognaths and some other planktonic organisms. Bulletin Zoologisch Museum Universiteit van Amsterdam 5: 59–72.
  18. Hebert PDN, Cywinska A, Ball SL, deWaard JR (2003) Biological identifications through DNA barcodes. Proceedings of the Royal Society of London Series B: Biological Sciences 270: 313–321.
  19. Hajibabaei M, Janzen DH, Burns JM, Hallwachs W, Hebert PDN (2006) DNA barcodes distinguish species of tropical Lepidoptera. Proceedings of the National Academy of Sciences USA 103: 968–971.
  20. Clare EL, Lim BK, Engstrom MD, Eger JL, Hebert PDN (2007) DNA barcoding of Neotropical bats: species identification and discovery within Guyana. Molecular Ecology Notes 7: 184–190.
  21. Pfenninger M, Nowak C, Kley C, Steinke D, Streit B (2007) Utility of DNA taxonomy and barcoding for the inference of larval community structure in morphologically cryptic Chironomus (Diptera) species. Molecular Ecology 16: 1957–1968.
  22. Ward RD, Zemlak TS, Innes BH, Last PR, Hebert PDN (2005) DNA barcoding Australia's fish species. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences 360: 1847–1857.
  23. Kelly RP, Sarkar IN, Eernisse DJ, DeSalle R (2007) DNA barcoding using chitons (genus Mopalia). Molecular Ecology Notes 7: 177–183.
  24. Costa FO, deWaard JR, Boutillier J, Ratnasingham S, Dooh RT, et al. (2007) Biological identifications through DNA barcodes: the case of the Crustacea. Canadian Journal of Fisheries and Aquatic Sciences 64: 272–295.
  25. Papillon D, Perez Y, Caubit X, Le Parco Y (2006) Systematics of Chaetognatha under the light of molecular data, using duplicated ribosomal 18S DNA sequences. Molecular Phylogenetics and Evolution 38: 621–634.
  26. Pleijel F, Jondelius U, Norlinder E, Nygren A, Oxelman B, et al. (2008) Phylogenies without roots? A plea for the use of vouchers in molecular phylogenetic studies. Molecular Phylogenetics and Evolution 48: 369–371.
  27. Folmer O, Black M, Hoeh W, Lutz R, Vrijenhoek R (1994) DNA primers for amplification of mitochondrial cytochrome c oxidase subunit I from diverse metazoan invertebrates. Molecular Marine Biology and Biotechnology 3: 294–299.
  28. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. Journal of Molecular Biology 215: 403–410.
  29. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, et al. (2007) Clustal W and Clustal X version 2.0. Bioinformatics 23: 2947–2948.
  30. Kimura M (1980) A simple method for estimating evolutionary rate of base substitution through comparative studies of nucleotide sequences. Journal of Molecular Evolution 16: 111–120.
  31. Tamura K, Dudley J, Nei M, Kumar S (2007) MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Molecular Biology and Evolution 24: 1596–1599.
  32. Nylander JAA (2004) MrModeltest v2. Program distributed by the author. Evolutionary Biology Centre, Uppsala University.
  33. Ronquist F, Huelsenbeck JP (2003) MRBAYES 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19: 1572–1574.
  34. Guindon S, Gascuel O (2003) A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Systematic Biology 52: 696–704.
  35. Guindon S, Lethiec F, Duroux P, Gascuel O (2005) PHYML Online--a web server for fast maximum likelihood-based phylogenetic inference. Nucleic Acids Research 33(Web Server issue): W557–9.
  36. Anisimova M, Gascuel O (2006) Approximate likelihood-ratio test for branches: A fast, accurate, and powerful alternative. Systematic Biology 55: 539–552.
  37. Waugh J (2007) DNA barcoding in animal species: progress, potential and pitfalls. BioEssays 29: 188–197.
  38. Peijnenburg K, van Haastrecht E, Fauvelot C (2005) Present-day genetic composition suggests contrasting demographic histories of two dominant chaetognaths of the North-East Atlantic, Sagitta elegans and S. setosa. Marine Biology 147: 1279–1289.
  39. Telford MJ, Holland PWH (1997) Evolution of 28S ribosomal DNA in chaetognaths: Duplicate genes and molecular phylogeny. Journal of Molecular Evolution 44: 135–144.