A mega-cryptic species complex hidden among one of the most common annelids in the North East Atlantic

We investigate mitochondrial (COI, 16S rDNA) and nuclear (ITS2, 28S rDNA) genetic structure of North East Atlantic lineages of Terebellides, a genus of sedentary annelids mainly inhabiting continental shelf and slope sediments. We demonstrate the presence of more than 25 species of which only seven are formally described. Species boundaries are determined with molecular data using a broad range of analytical methods. Many of the new species are common and wide spread, and the majority of the species are found in sympatry with several other species in the complex. Being one of the most regularly encountered annelid taxa in the North East Atlantic, it is more likely to find an undescribed species of Terebellides than a described one.


Introduction
The revelation of cryptic species has increased exponentially since the use of molecular data in taxonomic studies became common practise, but our understanding of the magnitude and importance of this neglected biodiversity is still at an early stage [1][2][3]. To unravel, describe and explain this hidden and unexplored dimension of life on earth is one of the major challenges to practising taxonomists [1].
This paper is a case study on the genus Terebellides Sars, 1835 (Annelida) based on specimens collected from North East Atlantic waters, ranging from the British Isles in the south, to the Polar Basin in the north. The genus and its first member, Terebellides stroemii Sars, 1835, was described from the west coast of Norway near Bergen. Even though a few other species of a1111111111 a1111111111 a1111111111 a1111111111 a1111111111

Specimens, and study area
Specimens were collected between 2005 and 2014 on collecting trips, or by the following scientific expeditions, monitoring programs or institutes: Survey of Utsjöbankarna, SAMARIN (Marine surveys done by the Swedish Taxonomy Initiative), BIOICE (Benthic Invertebrates of All samples were collected prior to that the Nagoya protocol entered into force, thus there was no need for specific permissions. Sampling did not include endangered or protected species. We analyzed 513 specimens from 133 collecting sites, in the depth range 8-4380 m (Figs 3 and 4), with the majority of the samples and specimens coming from the continental shelf along the Swedish and Norwegian coasts.
The study area was divided into the following biogeographic regions according to topographic and oceanographic features [24][25][26] (Fig 3). Kattegat (magenta dots in Fig 3), is a rather shallow area dominated by water masses from the North Sea, and heavily influenced by the Baltic Stream; Skagerrak (dark green), also a shallow shelf area, technically a part of the eastern part of the North Sea; North Sea (light green), shallow shelf area dominated by warm North Atlantic water masses; Irish Sea, Celtic Sea (orange), shelf areas, western UK and Ireland; Norwegian coast and shelf (red), north of Egersund to Loppa, areas <600 m except in the fjords, dominated by North Atlantic water with a mix of the less saline Norwegian coastal current; Norwegian Sea (brown), off the shelf break at approximately 600 m and deeper waters. Deeper areas below 800 m with permanent sub zero temperatures with Norwegian Sea deep water; Barents Sea (dark blue), separated from the Norwegian Sea by the shelf break between Norway and Svalbard, shelf sea dominated by cold water areas, but with a strong influence of North Atlantic water in the western areas and along the Troms and Finnmark coast [27]; Arctic Ocean (rose red), proper Polar Basin with permanent sub zero temperatures; Greenland Sea (yellow), with cold water areas with inflow of water from the Arctic Ocean by the East Greenland current; South of Iceland (light blue), area south of the Scotland-Faroe-Greenland ridge. Collecting data for specimens, together with voucher and GenBank accession numbers can be found in S36 Appendix and Table 1. Specimens are deposited in one of the following museums: Department of Natural History, University Museum of Bergen (ZMBN 116171-116514, 344 specimens), The Gothenburg Museum of Natural History (GNM 14625-15137, 74 specimens), Norwegian University of Science and Technology, NTNU University Museum, Trondheim (NTNU-VM 59990-72567, 36 specimens), and Senckenberg Museum Frankfurt (SMF 24368-24693, 59 specimens). All specimens are publicly deposited and accessible in a permanent repository.
PCR mixtures contained 0.33 μl of each primer (10μM), 1 μl of DNA template, and 10 μl of RedTaq 1.1x MasterMix 2.0 mM MgCl 2 (VWR). Temperature profile was as follows: a denaturation step at 96˚C for 1 minute, 29 cycles (95˚C for 30 seconds-52˚C (for COI and 16S rDNA) or 62˚C (for ITS2 and 28S rDNA) for 30 seconds-72˚C for 60 seconds), and a final step at 72˚C for 7 minutes. PCR products were run for c. 15 minutes on a 1% agarose gel electrophoresis, containing GelRed Nuclear Acid Stain (Bioticum), and then visualized under UVlight. PCR products were purified using ExoSAP-IT PCR Product Cleanup protocol (Thermo-Scientific). Sanger sequencing was performed on both strands at Eurofins Genomics, DNA Sequencing Department in Ebersberg, Germany. Overlapping complementary strands were merged into consensus sequences using Geneious version 7.0.6 [34].

Alignments
We used MAFFT version 7.017 [36] within Geneious version 7.0.6 with the following settings: algorithm = E-INS-i, scoring matrix = 200PAM / k = 2, gap open penalty = 1.53, to align 16S rDNA and 28S rDNA. Aligning was unproblematic since the sequences were of similar length and resulting alignments had a moderate number of indels. The ITS2-region was challenging to align due to a high number of indels, and we proceeded with aligning using two approaches. In the first approach, we removed identical haplotypes with the uniqhaplo.pl script (S35 Appendix) leaving a data set with 136 unique ITS2-sequences. As we experienced problems with two sequences that were shorter due to incomplete 3'-end, these sequences were first removed (1999_13 and 2865_24), and the remaining 134 complete, or nearly complete, sequences were aligned with the X-INS-i algorithm in MAFFT that takes into account the secondary structure of the sequence. Subsequently the short excluded sequences were reincluded with the mafft-add command. The resulting alignment is referred to as ITS2x-unique. In the second approach, the sequences in the ITS2x-unique alignment were realigned using the software RNAsalsa [37], using the secondary structure of ITS2 modeled for Eumida ockelmanni Eibye-Jacobsen, 1987 (GenBank accession number HM358782) [38] as a constraint, and implementing default parameters. The resulting alignment is referred to as ITS2s-unique. Identical sequences removed in the first step with the uniqhaplo.pl script were then added back to the two alignments by hand in Geneious version 7.0.6 mimicking the gaps present in those identical sequences aligned. The two resulting alignments with all 402 ITS2-sequences are referred to as ITS2x-all, and ITS2s-all. Finally, we used the MUSCLE alignment option in Geneious version 7.0.6 to align all 462 COI-sequences (COI-all) which was trivial due to the absence of indels.
Identical COI-sequences were removed using uniqhaplo.pl script creating an alignment with 271 unique COI-sequences (COI-unique). Where relevant, aligned gene partitions were concatenated using Mesquite v. 2.75 (Maddison and Maddison 2008) [39]. For the statistical parsimony haplotype analyses, we used COI-all, and the two ITS2-all alignments as a starting point. Sequences of each haplotype network were extracted separately, and subsequently these clade data sets were pruned to remove gaps in flanking positions that was caused by incomplete sequencing. The purpose of this was to obtain the same data coverage for all included specimens in each haplotype network, and allowing for an unambiguous assessment of haplotypes. In a few instances, one, or a few of the shortest sequences were removed prior to pruning the sequence ends (Tables 3 and 4). In the choice between removing short sequences or pruning we chose the method that kept the maximum number of haplotypes. As there were a few ambiguities assessing number of haplotypes between the two ITS2-alignments, although based on the same data, we decided to realign the ITS2-data from each network separately, using the E-INS-i algorithm in MAFFT, with scoring matrix = 200PAM / k = 2, and gap open penalty = 1.53. The rational behind this is that aligning more similar sequences will result in a more accurate alignment. For the distance calculations we used COI-all, and ITS2s-all alignments. All different alignments, and data set combinations described above are available as S1-S9 Appendixes.

Data set combinations
For a robust assessment of the evolutionary relationships of the Terebellides lineages, specimens for which three or four of the genetic markers were present (i.e. COI, 16S rDNA, ITS2, 28S rDNA), were combined into a data set comprising 91 Terebellides specimens (S36 Appendix and Table 2, last column) plus three outgroups. This was done by combining COI-all with either ITS2x-all or ITS2s-all, concatenating 16S rDNA and 28S rDNA, but excluding specimens that did not meet the criteria having three or four genetic markers. This resulted in two data set combinations, referred to as concatenated-xinsi-alignment (CONCATx) and concatenated-salsa-alignment (CONCATs).
For the three types of species delimitation analyses, we used the following data sets: COI-all, ITS2x-all, and ITS2s-all for TCS; COI-unique, ITS2s-unique, and ITS2x-unique for GMYC [40,41]; the concatenated alignment of COI-all and ITS2s-all, keeping all specimens with both COI and ITS2 data present, resulting in a data set with 351 Terebellides specimens ( Table 2, 5th column) for STACEY [42].

Model selection
Best-fit models for phylogenetic analyses were selected using the Akaike information criterion in JModel [43]. The protein coding gene COI was divided into two partitions, one with the first and second codon positions, and one with the third codon positions. In the general phylogeny of Terebellides, ITS2 and the neighboring 28S rDNA were combined into a single partition.

Phylogenetic analyses
Mitochondrial (COI and 16S rDNA) and nuclear data sets (ITS2 and 28S rDNA) were analyzed separately and combined using Bayesian inference (BI), and Maximum Likelihood (ML). This means five different analyses per method; 1) mitochondrial data alone, 2) nuclear data alone with 28S rDNA combined with xinsi-, or 3) salsa-aligned ITS2 sequences, and 4) mitochondrial data combined with nuclear data with 28S rDNA combined with xinsi-, or 5) salsa-aligned ITS2 sequences (S8 and S9 Appendixes). Bayesian analyses of separate and combined data sets were run in MrBayes version 3.2 [44]. Partitions were unlinked for the parameters statefreq, revmat, shape and pinvar. Rateprior for the partition rate multiplier was set to be variable. Two independent analyses were run for 10 million generations, with four parallel chains (three hot, one cold), that were sampled every 1000th generation. One fourth of the samples was discarded as burn-in. Maximum likelihood analyses were performed in raxmlGUI [45]. In Table 3. Summary of haplotype and distance analyses for COI, with specification of excluded sequences, alignment length, number of haplotypes, and uncorrected intra-and interspecific distances. Species number to which the species is compared with, for the minimum and maximum interspecific distances, in parentheses. RAxML, we used the same partitioning as in MrBayes, and node support was assessed with 1000 bootstrap replicates.

Species delimitation analyses
Minimum spanning haplotype networks were constructed with the software program TCS 1.2.1, using a 95% connection limit with gaps = missing. The General Mixed Yule Coalescent model (GMYC) uses a likelihood ratio test to compare a null model assuming a single coalescent branching rate across a clock-like tree (i.e. intraspecific population events) with a complex model including both coalescent and Yule (interspecific diversification events) branching rate models. The later also estimates the threshold time that maximizes the transition between coalescent and Yule branching models, and hence delimiting species boundaries. Species delimitation with the GMYC algorithm was performed with the R library splits v.1.0-19 [46] using a single threshold and the required R packages ape, paran, and MASS. Ultrametric trees for species delimitation using GMYC algorithm were built in BEAST v1.8.2 [47] setting a nucleotide  [48] and widely implemented by many studies. Alternation of the GMYC algorithm permit to assess whether the branch leading to a node contains a threshold from coalescence to speciation under different coalescent models [41]. A node support value of 1 means that all coalescent models tested support the existence of a speciation event on that branch, and lower supports indicate that fewer coalescent models support such a speciation event. The number of species and so species limits would be influenced by the support cut-off selected. With lower cut-off value, the number of species will be more similar to the raw species delimitation estimated by GMYC algorithm without taking into account the support. On the other hand, higher cut-off values would reduce the number of species, generally merging closely related GMYC entities (species). We selected an arbitrary, but high, GMYC support value cut-off (0.9) to ensure that remaining species are discovered by GMYC algorithms (i.e. supported) under most of the different coalescent models tested (90%). The optimal cut-off value should be validated by simulation studies and with several empirical datasets but this is beyond the scope of our study. STACEY is a phylogenetic and a species delimitation method under a multispecies coalescent method (i.e. find the species tree and delimit species but allowing different coalescent gene trees and coalescent times). STACEY v. 1.2.0 analyses were run in BEAST2 v2.4.3 [49].

Haplotype analyses, genetic distances, maps and distribution analysis
Haplotype networks were constructed using the TCS network inference method with a 95% connection limit, and gaps treated as uninformative. Each individual network was plotted in PopART [50] including distribution information according to the geographic areas designated. Uncorrected p-distances, with gaps treated as uninformative, were calculated in PAUP Ã 4.0b10 [51], and Microsoft Excel v. 14.

Morphological analysis
The aim of the morphological work in the present study was primarily to identify our species to available species names, and to allocate these available names to the correct clade circumscribed by the molecular analysis. The detailed morphological analyses of new species derived from this study will appear in forthcoming papers.

Model selection
The selected best-fit models were a general time reversible model with a proportion of invariable sites and gamma distributed rate across sites (GTR+I+G) for the partitions 16S rDNA, ITS2, and ITS2 combined with 28S rDNA, and COI-partition with third codon sites only, while a general time reversible model with a proportion of the sites invariable (GTR+I) was selected for the COI-partition including first and second codon positions. In RAxML, the analyses were run with an independent GTRGAMMAI model for each partition, as the program do not allow the assignment of more than one model to different partitions.

Phylogenetic analyses
The combined data set of the two different combinations of COI, 16S rDNA, ITS2 and 28S rDNA (CONCATx and CONCATs) consisted of 2574/2474 aligned positions, of which 993/ 1023 were parsimony-informative, and 172/171 were variable but not parsimony-informative. The results from the separate and combined analyses are summarized on the ML-tree from CONCATx (Fig 5). The phylogenetic tree is arbitrarily divided into four major groups, A-D, to make the presentation of the results more perspicuous. The results from each analysis (S10-S19 Appendixes), are presented in pie diagrams next to each node (Fig 5). The different analyses show high level of congruence between methods (ML or BI), alignment treatment (CON-CATx or CONCATs), and data set combinations (mitochondrial, nuclear or combined). Out of the 49 nodes in Fig 5, 35 are identical among all five different analyses. There are few conflicting nodes between the topologies, most of them are related to the arrangement within group A, and most of them have low node support and therefore cannot be interpreted as incongruences. However, the analyses have recovered four well supported clades different to the topology illustrated in Fig 5: 1) clades 11 and 19 (group A) are sister taxa with BI-support of 0.97, in the separate nuclear data set with salsa-aligned ITS2 sequences (S13 Appendix); 2) clade 18 (group A) is sister taxa to a clade with 11, 12, 13, 19, 20, and 21 with BI-support of 0.95 in the separate nuclear data set with salsa-aligned ITS2 sequences (S13 Appendix); 3) clade 17 (group B) is sister taxa to a clade with 1, 4, 5, 14, 16, 26, and 27, with 0.98 in BI-support and 78 in ML-support, in the separate nuclear data set with xinsi-aligned ITS2 sequences (S14 and S15 Appendixes); 4) clades 24 and 25 (group C) are sister taxa, with 0.93/1.0 in BIsupport, and 70/95 in ML-support in both separate nuclear data sets (with xinsi-or salsaaligned ITS2 sequences) (S12-S15 Appendixes).

Species delimitation analyses: TCS, GMYC and STACEY
The statistical parsimony analysis of the COI data set, rendered 28 separate haplotype networks, while TCS analyses of ITS2x and ITS2s resulted in 24 and 23 networks respectively (S20-S22 Appendixes). GMYC analysis of the COI data set rendered 28 putative species, and GMYC of ITS2x and ITS2s resulted both in 24 putative species (S23-S31 Appendixes). In STA-CEY we treated the 28 haplotype networks from the COI data as the species to be tested, and in 98.8% of the resulting trees, all of these 28 clades were recovered and in 1.2% of the trees, clades 20 and 28 were lumped together (S32 Appendix) (see Fig 5). We used the most inclusive data sets for each species delimitation analyses, and in TCS all sequences of COI (n = 462) and ITS2 (n = 402) were included, in GMYC all unique sequences of COI (n = 271) and ITS2 (n = 136) were included, while all terminals with both COI and ITS2-data (n = 351) were included in STACEY. The outcomes from the TCS, GMYC and STACEY analyses are identical for 17 of the 28 putative species, namely clades 1, 2, 3, 6,7,9,10,11,14,15,17,18,19,22,23,24, and 25. Looking at the instances where there is disagreement among methods, and starting with group A, clades 12 and 13 are separate in all analyses except for TCS on ITSs, where the haplotypes are connected into a single haplotype network, with the closest haplotypes for clades 12 and 13 separated by eight mutations (connection limit = nine). Clade 8 is further divided in the GMYC-analysis of COI where a group with six haplotypes (1197_8, 1198_8, 1999_5, 2013_8, 2014_8, 2214_8) is found as a separate putative species. The closest haplotype of this group is seven mutations from the closest haplotype in the main group of clade 8 in the minimum-spanning haplotype network from the TCS-analysis on the same data. Clades 20 and 28 are connected in the GMYC-analysis of COI. The closest haplotypes for these clades are separated by 16 mutations in the minimum-spanning haplotype network from the TCS-analysis (using a fixed connection limit) on the same data. Clades 20 and 28 share the same haplotype in ITS2, and are thus connected in all analyses on ITS2; this haplotype is also connected to clade 21 in the GMYC-analysis of ITS2s. Haplotypes of clades 21 and 20/28 are separated by 11 mutations in the minimum-spanning haplotype network from the TCS-analysis (using a fixed connection limit) on the same data. Continuing with group B, clades 5 and 16 are connected in the TCSanalyses of ITS2x and ITS2s (where the closest haplotypes of clades 5 and 16 are separated by 6 and 5 mutations; connection limit = 9), as well as in the GMYC-analysis of ITSx. Clades 4, 26 and 27, all represented by single haplotypes in ITS2, are connected in all four analyses of the ITS2-data. The haplotypes are separated by between three and eight mutations in the minimum-spanning haplotype network in the two TCS-analyses.
In summary, we suggest that clades 12 and 13 represent different species even though they are connected in one of the ITS2-analyses. The two clades do not share any ITS2-haplotypes (Fig 6), and both lineages are fairly well sampled with 23 (clade 12) and 27 specimens (clade 13). There are also insertion/deletion events in the ITS2-sequence alignments that support the two clades, however, in the analyses presented here, we treated indels as missing data. We further conclude that the separate putative species in clade 8 found in the GMYC-analysis of COIdata could be ignored as intraspecific genetic variation (only seven mutations in the TCS-analysis), and there is neither any differences in the ITS2-data to support such a conclusion. We do think that there is evidence that clades 20 and 28 should be regarded as the same species even though they have separate haplotype networks in the TCS-analysis on the COI-data, both lineages are under-sampled with only two (clade 20) and five specimens (clade 28), and the difference between the lineages is within the variation that is found in better sampled clades (compare clades 20 and 28 in Fig 6 with clade 8 in Fig 7, and clade 16 in Fig 8), and there is a good chance that the haplotypes would be connected given a larger sample size. ITS2-data also support this conclusion as clades 20 and 28 share the same ITS2-haplotype (Fig 6). Results from STACEY also give some support to this deduction. In contrast, we believe that it is likely that clade 21 represents a separate species even though it is connected with clade 20/28 in the GMYC of the ITS2s, differences in COI between 20/28 and 21 is substantial (12.1-13.2%) ( Table 3), and there is also additional indel events in the ITS2-data alignment that suggests that they do represent different species. As was the case for clades 12 and 13, we also strongly argue  (Figs 6-8). Specimens with at least three of the genetic markers were included in the phylogenetic analyses, outgroups are not shown. Pie diagrams indicate support values for the node, left pie shows results from ML analyses, and right pie diagram results from Bayesian analyses. Upper two slices of a pie illustrate results from the combined data sets' two different alignments, with xinsi-aligned ITS2-sequences to the left, and salsaaligned ITS2-sequences to the right. The three remaining slices illustrate results from the combined mitochondrial data (lower left slice), and the combined nuclear data sets' two different alignments, where lower median slice has xinsi-aligned ITS2-sequences, and lower right slice has salsaaligned ITS2-sequences. Yellow, blue and red colour indicate low, moderate and strong support, which equals ML support in the intervals 50-74, 75-89, and 90-100, or BI posterior probabilities in the intervals 0.50-0.84, 0.85-0.94 and 0.95-1.0 respectively. White means support <50/0.50 for the node. Columns show clustering of terminals according to different methodologies performed on more inclusive data sets where all specimens with COI or ITS2 data, or specimens with both COI and ITS2 data, were included. The first columns under the headings COI, ITSx and ITSs represent the results from TCS, and the second columns represent the results from GMYC. The columns under the heading STACEY show the two different outcomes from this analysis. White means that the network or species recovered is identical to the initial haplotype network found in COI including all COI-sequences, light grey means that less inclusive networks or putative species were recovered, and dark grey means that a more inclusive network or putative species was recovered. Double-headed arrows to the right of the columns show our final judgement of species delimitation. The two small letters to the right indicate our designation of described species, st = T. stroemii, bi = T. bigeniculatus, at = T. atlantis, sh = T. shetlandica, ir = T. irinae, wi = T. williamsae, and gr = T. gracilis.
https://doi.org/10.1371/journal.pone.0198356.g005 that clades 5 and 16 represent different species, even though they are connected in three of the four ITS-analyses. The two clades do not share any ITS2-haplotypes (Fig 8), and there are also indel events and morphological data (see below) supporting their separation. Finally, clades 4, 26, and 27 is suggested to represent different species, but the lineages are poorly sampled both in numbers and in geographic distribution, and more specimens are needed. However, COIdifferences (13,3%) as well as ITS2-differences (Fig 7) in the two sympatric clades 26, 27 is comparable to the differences found between other closely related species pairs in the species complex, but as only 1 (clade 26) and 2 specimens (clade 27) were found of these clades, we are less certain in this case.
To conclude, we think we have strong evidence that we have between 25 and 27 different species of Terebellides among the sequenced specimens. In the analyses and discussion below we will proceed with the 27 species hypothesis, and the species will be referred to as species 1, 2 etc. following the original clade numbering, until the available names can be allocated to their proper clades. The clades 20 and 28 will be referred to as species 20/28.

Biogeographic and bathymetric analyses
The number of species varied rather much between the biogeographic regions (Figs 9-11). However, as the study was not designed to assess the differences in diversity for different areas, we cannot answer if certain areas are more diverse than others. Instead, the number of species strongly correlates with how many specimens that are sequenced (Fig 9), and this probably explain much of the differences found in diversity among areas. Some sort of saturation in discovering new species seems to be reached at about 100 sequenced specimens for a biogeographic area. We found more than one species in all biogeographic regions except for the two most poorly sampled regions, Arctic Ocean, and Irish and Celtic Seas (Fig 11), while the highest diversity was found in the best sampled regions with 13 species among 192 specimens in the Norwegian coast and shelf area, 13 species among 100 specimens in Barents Sea, and 10 species among 108 specimens in Skagerrak (Figs 9 and 10).
With regard to similarity in shared Terebellides species between the different biogeographic regions the following may be assumed (Fig 10), Kattegat is most similar to Skagerrak, with four out of its four species in common; Skagerrak is most similar to Norwegian coast and shelf, with eight out of its 10 species in common; North Sea is most similar to either Skagerrak, or Norwegian coast and shelf, with two out of its three species in common; the single species found in Irish and Celtic Sea is also present at the Norwegian coast and shelf, North Sea, Skagerrak and Kattegat; Norwegian coast and shelf is most similar to Skagerrak, with eight out of its 14 species in common; Norwegian Sea is most similar to either Skagerrak, Norwegian coast and shelf, Barents Sea, or Greenland Sea, with two out of its three species in common; the single species found in the Arctic Ocean is endemic for the area; Greenland Sea is most similar to Barents Sea, with four out of its four species in common; and the area South of Iceland is most similar to either Norwegian Sea, or Norwegian coast and shelf, with two out of its eight species in common. Endemic species are found in the Arctic Ocean (species 24), North Sea (species 9), Norwegian coast and shelf (species 11), Barents Sea (species 21, 25, 26, and 27), and in the area South of Iceland (species 17, 18, 19, 22 and 23).
Many of the species that that were found in the same biogeographic regions also overlapped in their bathymetric distribution (Fig 12). Yet, there is some sort of division between some of Distribution maps, depth distribution in meters, and haplotype networks for group A, species 10,11,18,19,23,21,12,13, and 20/28. All species except for species 20/28 that we refer to T. bigeniculatus (bi) are undescribed. Sites are colour coded as in Fig 3. Type locality for T. bigeniculatus indicated with yellow arrow.
https://doi.org/10.1371/journal.pone.0198356.g006 the species, e.g. species 6 and 7 are found down to about 200 meters depth, while the closely related species 8 is found below 200 meters depth. Within the same biogeographic area, up to eight different species can be found in a depth span of 100 meters, and even in the same sample from a supposedly homogenous environment from a mud bottom from 534 meters depth, in the Trondheimsfjord in Norway, using a Sneli sledge, up to five different species were found (see Table 1, siteID NCS24). We can safely conclude that a majority of the species live in sympatry with several other species in the complex.

Haplotype and distance analyses
Distance calculations (S33-S34 Appendixes), uncorrected, are summarized in Tables 3 and 4, as are the results from the haplotype analyses. The latter are also visualized in Figs 6-8 for all different species. For most species, haplotypes, or group of closely related haplotypes, are generally not restricted to a certain area. A few species show a week tendency towards geographic sorting, e.g. in species 16 (Fig 8), the haplotypes from the area South of Iceland (light blue) may to some extent be interpreted in this way. Haplotype diversity is generally high, and in a few of the well sampled species it is extreme. In species 2 there are 25 haplotypes among 32 specimens, in species 3 there are 44 haplotypes among 50 specimens, in species 8 there are 33 haplotypes among 40 specimens, and in species 16 there are 48 haplotypes among 55 specimens.

Morphological analyses
Group A comprises 13 species. For the time being we are not able to find any morphological character that unites the group, but two of the known species, T. bigeniculatus and T. stroemii, can be attributed to two of the clades found. Terebellides bigeniculatus is identified by the presence of geniculate chaetae (Figs 1B, 2A and 2C-2E) in both chaetigers 5 and 6, and this condition is found in species 21 and species 20/28, and as the latter of these two species is the only one found among our Icelandic specimens we suggest that the name T. bigeniculatus, that has its type locality north-west of Iceland, may be used for species 20/28. Terebellides stroemii on the other hand is characterized by a robust body, and relatively small branchiae, with partially fused lobes (Fig 1B) instead of unfused ones (Figs 1A and 2B). From the available diagnosis, any of the clades 6, 7, 8 and 9 are possible candidates for representing the true T. stroemii. Terebellides stroemii has a type locality from between 55-110 meters depth near Bergen in SW Norway, and species 8 is only found deeper than 200 meters and is thus excluded for being the nominal species, and with the same reasoning, we also exclude species 9 due to that it is only found in the North Sea region. However in the choice between species 6 and 7 we cannot say right now which one is more likely to be the correct T. stroemii, but our suggestion is that clade 6 could be used for the name, because in our samples it seems to be the most common and widely spread species of the two.
Group B comprises eight species. A possible morphological identifier for this group of species is that they all have small to medium sized, elongated bodies. In this group, two clades, 16 and 1, could be identified as already described taxa. Species 16 is characterized by having unfused branchial lobes, with a low number of lamellae. Due to this, and because of its distribution, found at great depths in Greenland Sea and in the area South of Iceland, congruent with the species original depth distribution, we suggest that the name T. atlantis might be Distribution maps, depth distribution in meters, and haplotype networks for group A, species 6, 7, 8, and 9, and for group B, species 1, 17, 14, 4, 26, and 27. All species except for species 6 that we refer to as T. stroemii (st), and clade 1 that we refer to as T. shetlandica (sh) are undescribed. Sites are colour coded as in Fig 3. Type localities for T. stroemii, and T. shetlandica indicated with yellow arrows.
https://doi.org/10.1371/journal.pone.0198356.g007 applicable for this species. Species 1 should be referred to T. shetlandica, it is the only species we have found that have the characteristic gills with branchial lobes of different sizes, and provided with a long posterior filament (Fig 2B), diagnostic features for T. shetlandica. Moreover, in some specimens a parasitic copepod was found, as was also described for several specimens of T. shetlandica in the original description.
Group C comprises three species, with no apparent morphological identifier. We attribute the name T. irinae to the deep-water species 24 found only in the Arctic Ocean in our analysis. It fits the original description well, and even if our collecting sites are not near the type locality we think a distribution from Beaufort Sea to the Arctic Basin is likely.
Group D also comprises three species. The group is characterized by white ventral colouration in anterior thoracic chaetigers (1 to 4) (Fig 1A). Species 2 and 3 are further characterized by having branchiae with ventral and dorsal lobes of similar shape (Fig 2A). The combination of these characters fits the diagnosis of two already described species of Terebellides, T. gracilis and T. williamsae. Terebellides williamsae is considered to be a junior synonym to T. gracilis but we prefer to withdraw it from synonymy even though, at this moment, we do not have any morphological characters that separates them. Species 2 is suggested to represent T. williamsae as it is the only one of the two occurring in the Barents Sea (the type locality for T. williamsae), and thus species 3 is suggested to represent T. gracilis even though both species 2 and 3 are found in sympatry at the type locality for T. gracilis, the Swedish coast of Skagerrak.

Discussion
Cryptic species are of paramount importance because of their commonality, and they are routinely found in genetic surveys, also in well-known taxa in well-studied areas [2,54]. It is clear that the small fraction of morphological species that has been investigated so far still only represents the tip of the iceberg as Knowlton stated in her visionary paper on sibling species almost 25 years ago [55]. Considering that cryptic species literally are everywhere, in a taxonomic as well as in a geographic context, they can in no way be neglected if we want to correctly assess species diversity, understand biogeographic patterns or keep track of natural or man-made induced changes in the marine environment.
Terebellides is one of the most regularly encountered annelid taxa in environmental monitoring programs in the North East Atlantic [56], and it is normally reported under the species names T. gracilis, and T. stroemii, and in recent years T. shetlandica and T. bigeniculatus have been added to the list. Prior to this study, we suspected there might be cryptic species hiding among Terebellides, but it came as an overwhelming surprise to find so many of them, and that in cases some of them were so common.
Having a closer look at the best sampled areas (Fig 12), starting with Kattegat and Skagerrak, a very small part of the North East Atlantic. There are so far three different species reported from this area, and we have identified these in our sequenced specimens; T. stroemii (species 6), T. gracilis (species 3), and T. shetlandica (species 1). In addition we have a new record of T. williamsae (species 2), but the remainder, that is species 4, 5, 7, 8, 12 and 13 are unknown and undescribed. Out of the 133 specimens sampled and sequenced from the area, about roughly 1/3 (47 specimens) belong to this latter category of undescribed species. Continuing with the Norwegian coast and shelf, we find the same four described species present in Kattegat/Skagerrak, and in addition T. bigeniculatus (species 20/28), but species 5,7,8,10,11,13,14, and 15 are all undescribed. These unnamed species gather more than half of the specimens sequenced (117 out of 192 = 61%), and that means in short, that the current probability of finding an undescribed species of Terebellides is larger than finding a described one! This is indeed astonishing given that we are dealing with one of the best investigated marine environments in the world, the relatively shallow waters just outside the coasts of Sweden and Norway, and one of the most frequently encountered annelid taxa in the area. The situation in the Barents Sea is similar, and we find T. williamsae (species 2), T. atlantis (species 16) and T. bigeniculatus (species 20/28), but neither T. stroemii (species 6) nor T. gracilis (species 3) among our sequenced specimens; in addition we also find the undescribed species 8, 10, 12, 13, 14, 15, 21, 25, 26, and 27, and together these undescribed species represent c. 50% of the sequenced specimens. Greenland Sea and the area South of Iceland are dominated by specimens of T. atlantis (species 16), T. gracilis (species 3), and T. williamsae (species 2), but there are also quite a few undescribed species present here as well, but as the sample size is not as large as in Skagerrak, Norwegian coast and shelf, and Barents Sea the results are not really comparable.
Looking at the depth distribution for the different species in a given geographic area, we can see that most species overlap in depth, and there is, in most cases, no clear sorting of species at different depths (Fig 12). In the depth range 150-250 meters in the Norwegian coast and shelf region, we have nine species present, and they all more or less overlap and are present at most of the localities that we have been able to sample, e.g. in the area from Trondheim in the north to Bergen in the south, 11 species are found (Figs 6-8), indicating that they do not inhabit specific habitats like fjords or the open ocean. For most of our samples we have used a sledge, a dredge or a beam trawl, all these gears sample material from an unspecified area of the sea floor. But often, at least when a sledge is used, the area sampled is an apparently flat uniform habitat of mud. In 49 out of the 89 sites from where we have sequenced more than one specimen, we found more than one species (Table 1), and in the most species-rich samples, five different species of Terebellides were found, e.g. site NCS24, a sample from a flat mud bottom from 534 meters depth in the Trondheimsfjord. There are few samples taken with a grab, but in one of them (BS5) we found two species co-occurring. Anyway, as we did not sequence all specimens from all samples, it is difficult to assess how many species of Terebellides that do co-occur at the same site. For many of the sites only one or a few specimens were sequenced, thus it is likely that diversity for each separate site is underestimated, but when looking at a slightly larger scale this should not be the case.
Apart from the fact that so many species of Terebellides still go under the radar, and that these unknown and undescribed species are so common, and even constitute a major part of the diversity both in number of species and specimens, one other thing struck us: the extreme diversity of haplotypes found in COI among some species. The most note-worthy are T. gracilis (species 3), T. williamsae (species 2), T. atlantis (species 16), and species 8, where almost all specimens sampled and sequenced have their own unique haplotype (Figs 7 and 8, Table 3). This variation rarely has led to an amino-acid substitution within the species, and in T. atlantis (species 16) all 48 haplotypes found among the 55 specimens produce the same exact aminoacid sequence. As the sample size varies a lot between different species, it is difficult to make a direct comparison in haplotype diversity, but one thing to note is that all those four species mentioned above are found at greater depths than a couple of 100 meters (Figs 7 and 8), in contrast with two other species that also are well represented in the material, i.e. T. shetlandica (species 1), and T. stroemii (species 6), that are found at more shallow depths (Fig 7). The relatively low genetic diversity among these shallow water species may be explained by that they have been more affected by the recurrent ice ages that have occurred during the last 1.8 million years [57], than the species living at greater depths have been. Even so it is hard to understand and explain the extremely high diversity of haplotypes, and how it is maintained, in these deeper-living species, but see [58] that also reported high haplotype diversity in Aonidella cf dayi, for possible explanations to this phenomenon.
Our angle on this study has been a molecular one, in order to find out how many species that occur in North East Atlantic waters, and the full morphological investigation has to await forthcoming studies. The main purpose of the morphological examination conducted in this paper has been to connect the described and known morphological species to the correct, or at least the best, molecularly recognized species. It is our hope that we in the future will be able to find morphological characters that will help in standard morphological identification down to at least a group of possible species, and in the best of worlds also down to species level. Molecular data from this study will be vital to help us to sort out when this latter task is obtainable and when it is not.
Much water has passed under the bridge since Holthe [59] published his book on Terebellomorpha in the North East Atlantic, when he discussed the supposed cosmopolitan distribution of T. stroemii. He acknowledged that the worldwide reports were due to a confusion of closely related species, but nevertheless stated that 'I do not suspect that there are more than one species in the Norwegian material'. Still in these days, most Terebellides in the North East Atlantic are routinely identified as T. stroemii, and our comprehensive study make it clear that this is a severe underestimation of the true diversity among Terebellides. We do not think that Terebellides is an unusual example of cryptic species, on the contrary, when morphospecies are properly assessed molecularly, in terms of sampling strategy and number of specimens analyzed (e.g. [60]), it is commonplace to find more than one species, sometimes several, in the material. Already Grassle [61] asked the question 'How common are cryptic polychaetes' when she and her husband had discovered six cryptic species of Capitella capitata after an oil spill in West Falmouth, Massachusetts in September 1969 [62]; we think we now have taken a small step further towards the answer to this long-held question. S8 Appendix. CONCATx.nex. Concatenated alignment, in nexus-format, of COI, 16S rDNA, ITS2s, and 28S rDNA, including specimens with data from three of the four genetic markers. (NEX) S9 Appendix. CONCATs.nex. Concatenated alignment, in nexus-format of COI, 16S rDNA, ITS2x, 28S rDNA, including specimens with data from three of the four genetic markers. (NEX) S10 Appendix. CONCATmito_ML.tre. Resulting tree with support values, using Maximum Likelihood on mitochondrial data only. (TRE) S11 Appendix. CONCATmito_BI.tre. Resulting tree with support values, using Bayesian inference on mitochondrial data only. (TRE) S12 Appendix. CONCATnucls_ML.tre. Resulting tree with support values, using Maximum Likelihood on nuclear data only, with salsa-aligned ITS2-sequences. (TRE) S13 Appendix. CONCATnucls_BI.tre. Resulting tree with support values, using Bayesian inference on nuclear data only, with salsa-aligned ITS2-sequences. (TRE) S14 Appendix. CONCATnuclx_ML.tre. Resulting tree with support values, using Maximum Likelihood on nuclear data only, with xinsi-aligned ITS2-sequences. (TRE) S15 Appendix. CONCATnuclx_BI.tre. Resulting tree with support values, using Bayesian inference on nuclear data only, with xinsi-aligned ITS2-sequences. (TRE) S16 Appendix. CONCATs_ML.tre. Resulting tree with support values, using Maximum Likelihood on the combined mitochondrial and nuclear data set, with salsa-aligned ITS2-sequences. (TRE) S17 Appendix. CONCATs_BI.tre. Resulting tree with support values, using Bayesian inference on the combined mitochondrial and nuclear data set, with salsa-aligned ITS2-sequences. (TRE) S18 Appendix. CONCATx_ML.tre. Resulting tree with support values, using Maximum Likelihood on the combined mitochondrial and nuclear data set, with xinsi-aligned ITS2-sequences. (TRE) S19 Appendix. CONCATx_BI.tre. Resulting tree with support values, using Bayesian inference on the combined mitochondrial and nuclear data set, with xinsi-aligned ITS2-sequences.  Table 1), sequence ID, and GenBank accession numbers.