Multi-Gene Analysis Reveals a Lack of Genetic Divergence between Calanus agulhensis and C. sinicus (Copepoda; Calanoida)

The discrimination and taxonomic identification of marine species continues to pose a challenge despite the growing number of diagnostic metrics and approaches. This study examined the genetic relationship between two sibling species of the genus Calanus (Crustacea; Copepoda; Calanidae), C. agulhensis and C. sinicus, using a multi-gene analysis. DNA sequences were determined for portions of the mitochondrial cytochrome c oxidase I (mtCOI); nuclear citrate synthase (CS), and large subunit (28S) rRNA genes for specimens collected from the Sea of Japan and North East (NE) Pacific Ocean for C. sinicus and from the Benguela Current and Agulhas Bank, off South Africa, for C. agulhensis. For mtCOI, C. sinicus and C. agulhensis showed similar levels of haplotype diversity (Hd = 0.695 and 0.660, respectively) and nucleotide diversity (π = 0.003 and 0.002, respectively). Pairwise FST distances for mtCOI were significant only between C. agulhensis collected from the Agulhas and two C. sinicus populations: the Sea of Japan (FST = 0.152, p<0.01) and NE Pacific (FST = 0.228, p<0.005). Between the species, FST distances were low for both mtCOI (FST = 0.083, p = 0.003) and CS (FST = 0.050, p = 0.021). Large subunit (28S) rRNA showed no variation between the species. Our results provide evidence of the lack of genetic distinction of C. sinicus and C. agulhensis, raise questions of whether C. agulhensis warrants status as a distinct species, and indicate the clear need for more intensive and extensive ecological and genetic analysis.


Introduction
The taxonomic relationships of closely related species provide vital information for accurate assessment and conservation of marine biodiversity. However, identifying diagnostic characteristics for species identification and agreeing on exact delimitation of species boundaries has remained challenging. Molecular phylogenetic analysis provides a reliable and independent means to evaluate evolutionary and taxonomic relationships and examine species boundaries among closely related and cryptic species [1,2,3]. Molecular systematic and phylogenetic studies of marine zooplankton have resulted in the revision of many pelagic marine taxa [4,5], including copepods [6,7,8,9].
The planktonic marine copepod family, Calanidae (Crustacea: Copepoda: Calanoida) includes eight genera and 29 species that share highly similar morphological characteristics and overlapping species' distributions [10]. Evolutionary relationships within the Calanidae continue to be a topic of debate [11]. The genus Calanus comprises 14 species, including 11 that have been assorted into two sibling species groups: the finmarchicus group (C. finmarchicus, C. glacialis, C. marshallae) and the helgolandicus group (C. helgolandicus, C. agulhensis, C. australis, C. chilensis, C. euxinus, C. jashnovi, C. pacificus, C. sinicus), as well as three ungrouped species (C. hyperboreus, C. simillimus and C. propinquus; [12,10]). The sibling species are discriminated in many cases by very subtle morphological and morphometric characters, primarily secondary sexual characters [13,14], and species identifications are frequently based on individual size and geographical collection location.

Taxonomy and Ecology of the Species
Calanus agulhensis was first documented by Cleve [15] as C. finmarchicus off the coast of South Africa. A distinct new species was described by De Decker et al. [16] based on subtle morphological characters and geographical isolation of the South African populations from those of C. australis and C. pacificus. In particular, De Decker et al. [16] differentiated C. agulhensis from C. australis by physical characteristics such as shorter first antennae of the females and detailed structures of the fifth thoracic leg of both males and females. De Decker et al. [16] gave the type locality as the Agulhas Bank, off the southern tip of South Africa, which he considered to be the center of distribution, where the species was observed to spawn year-round, with decreased abundance to the west. The species is also found off the east and west coasts of South Africa, with relatively high abundance off the western shelf from November through December [17]. Within this region, C. agulhensis dominates the zooplankton community, comprising up to 85% of copepod biomass on the western bank [18].
Calanus sinicus was first described by Brodsky [19], although lack of a type locality and type specimen caused it to become a nomen nudum. A new diagnostic of the species was made with the type locality identified as Tsindao, Yellow Sea [20]. The species distribution includes the South and East China Sea, Yellow Sea, Bohai Sea and the Sea of Japan [20,21,22]. Reproduction of C. sinicus occurs year-round in the Sea of Japan, reaching a maximum reproductive rate between June and August depending on the region [23,24].
Comparison of descriptions made by Hulsemann [20] and DeDecker et al. [16] reveals similarities in several morphological features. Adult females share similar averages in body length, 2.73 mm for C. agulhensis and 2.95 mm for C. sinicus. The first antenna for C. agulhensis reaches beyond the furcal rami by one segment, while C. sinicus reaches beyond the furcal rami by one or two segments. Genital segments for both are described as being as long as they are broad. Both species average 18 teeth on the inner segment of leg V for adult males. The heads also show similarities such as an anterior bulge dorsal of the rostral attachments. Other comparisons were difficult to make because of a lack of descriptive standards and physical analyses. DeDecker et al. [16] chose to do a full description of both the males and females, however leaving out pore signatures. Hulsemann [20] chose to analyze differences between C. sinicus and C. joshnovi to give recognition to C. sinicus as a species, leaving out a full description and focusing largely on pore signatures. Additional morphological comparisons may be required (B.W. Frost, Univ. Washington, pers. comm.).
The sibling species C. agulhensis and C. sinicus are continental shelf species that prosper in mid-shelf ecosystems [9]. These areas are characterized by moderate temperatures, with optimal food supply and water depth. Both species are integral members of the zooplankton because of their large size, abundance and role as secondary producers for important commercial fish species [23,25]. Calanus agulhensis is a major prey species for anchovy, pilchard, herring, hake and horse mackerel; it is estimated that the species makes up to 60% of the diet of round herring, while C. sinicus is a major prey species for anchovies, sardine, and sand eels [22].
The coastal ocean regions surrounding South Africa and Japan have similarly dynamic hydrography. The Benguela Current and western Agulhas Bank regions are characterized by seasonal winds and current-driven coastal upwelling [26,27]. The North Pacific western boundary current influences coastal areas to the east of Japan: the Kii Channel and Sagami Bay exhibit estuarine upwelling and micro-scale upwelling plumes, respectively [22,[27][28][29]. These dynamic physical processes throughout the species' ranges highlight their shared ability to resist advection and establish populations in the most advantageous regions [9,19].

Molecular systematic analysis
The taxonomic and systematic relationships among Calanus species have been examined using molecular characters [9,[30][31][32][33]. Dates of evolutionary divergence among the species, once considered to be on the order of tens or hundreds of thousands of years [13,34], have been estimated to be on the order of tens of millions of years [31].
Analysis of multiple gene regions is critical for accurate resolution of species relationships; the selection of markers with diverse evolutionary drivers is particularly important [41,[44][45][46]. In this study, we analyzed the taxonomic relationship between two sibling species, C. agulhensis and C. sinicus, based on DNA sequences for portions of three genes: mitochondrial cytochrome c oxidase subunit I (mtCOI), citrate synthase (CS), and nuclear large subunit (28S) rRNA. MtCOI -especially the so-called ''barcode region'' [47] -exhibits useful patterns of inter-and intraspecific variation for diagnostic analysis of evolutionary relationships among congeneric species of many metazoans [48,49], including calanoid copepods [41]. MtCOI has also been used to examine population genetic structure of several species of Calanus, including: C. helgolandicus [33] C. pacificus [50], and C. sinicus [51]. The nuclear gene CS was selected to provide a diploid marker for better resolution of breeding patterns, including possible interbreeding and hybridization, among the closely-related species. CS has also been used to discern significant intraspecific variation of C. finmarchicus in the N. Atlantic Ocean [46]. Although it is less reliable as a diagnostic tool at the species level, the slowly-evolving 28S rRNA gene was chosen to better resolve the deeper branches between selected species of the two sibling species groups of Calanus. This gene has previously been used as a reliable comparative and ''support'' gene for mtCOI analyses [52,53]. The combined use of DNA sequences for mitochondrial and nuclear protein-coding genes and a nuclear rRNA gene provides us with a broad genetic spectrum for analysis of evolutionary and taxonomic relationships among species of this challenging copepod genus.

Sample Collection and Processing
Samples containing C. sinicus were collected from two stations to the west of Japan, in the Sea of Japan, and one station to the east, in the NE Pacific Ocean. Samples of C. agulhensis were collected from seven stations to the south and west of South Africa, including four stations to the west in the Benguela Current System and three stations from the Agulhas Bank ( Fig. 1; Table 1 and 2). Samples were preserved in 95% ethanol and stored at 4uC. DNA from adult females was obtained using the DNeasyH Blood and Tissue Kit (Qiagen) and eluted to a final volume of 200 mL. A 507 base-pair (bp) region of mtCOI was amplified using the consensus primers LCO-1490 and HCO-2198 [54]. PCR reactions were carried out in 25 mL volume, with 3 mL template DNA, 2.5 mL 25 mM MgCL 2 , 1 mL of dNTPs (0.2 mM of each dNTP), 1 mL 10 mM each of forward and reverse primer, 0.75 units GoTaq Flexi DNA polymerase, and 5 mL 56 Green GoTaq Flexi buffer (Promega) and H 2 O to a final volume of 25 mL. Twenty C. agulhensis and 7 C. sinicus sequences were determined using a pair of universal primers that define the mtCOI barcode region; the sequences were used to design the internal speciesspecific PCR and sequencing primers: LCO-1576 59-ATTCGAT-TAGAGTTAGGTCAAGC-39 and HCO-2081 59-CAT-AAAATGTGGTGTTCAGGTTACG-39. Use of these primers was necessary to obtain clean sequences from poorly-preserved C. sinicus samples. The mtCOI PCR protocol used was: 1 step of 94uC for 3 min; 35 cycles of 94uC for 30 s, 60uC for 45 s, 72uC for 45 s; 1 step of 72uC for 7 min. A 503 bp region of CS was amplified using the primers: CS-9F 59-ATTCCGTGGGTACAC-CATCC-39 and CS-514R 59-TTGTCAAGTACAGTCTCAT-CAGC-39 (Ebru Unal, University of Connecticut, unpubl. data). The CS PCR protocol used was: 1 step of 94uC for 1 min; 35 cycles of 94uC for 20 s, 55uC for 30 s, 72uC for 1 min; 1 step of 72uC for 5 min. A 658 bp region of 28S rRNA was amplified using the primers: 28S-F1a 59-GCGGAGGAAAAGAAAC-TAAC- 39 and 28S-R1a 59-GCATAGTTTCAC-CATCTTTCGGG-39 [55]. The 28S rRNA PCR protocol used was: 1 step of 94uC for 4 min; 35 cycles of 94uC for 45 s, 50uC for 40 s, 72uC for 90 s; 1 step of 72uC for 15 min. Successful PCR products were electrophoresed in a 1% agarose gel. Products that showed a strong band of the correct size were selected and cleaned using a QIAquickH PCR purification kit (Qiagen). The PCR primers were also used for sequencing with the BigDye H Terminator Ver. 3.1 Cycle Sequencing Kit (Applied Biosystems Inc., ABI) and protocols. Sequencing was done on an ABI 3130 Genetic Analyzer 4-capillary automated DNA sequencer. Sequences were edited and aligned using the Molecular Evolutionary Genetics Analysis (MEGA, Ver. 4.0; [56]).

Data Analysis
The program DnaSP Ver. 5 [57] was used to calculate haplotype diversity (H d ) and nucleotide diversity (p) for the COI data, and also to test for neutrality. Haplotype diversity was standardized using the program RAREFACT Ver.1.0 [58]. Tajima's D [59] and Fu's F S [60] were used to test for neutrality. The diploid CS sequences were recorded using ambiguity codes to represent sites with double peaks in the chromatogram file; these were interpreted as heterozygous sites. The PHASE analysis implemented in DnaSP Ver. 5 was then used to reconstruct haplotypes from the diploid CS sequences based on a Bayesian statistical model [61,62].
Analysis of Molecular Variation (AMOVA; [63]) and F ST pairwise distances were calculated for the mtCOI and CS data independently using ARLEQUIN Ver. 3.5 [64]. Our F ST pairwise distances for CS were obtained on both phased and unphased data. For the unphased analysis, ambiguous sites were ignored; the phased analysis considered all sites. Two a priori hierarchical groupings were tested for the AMOVA analysis of mtCOI data.  Variance among groups (W CT ), among populations within groups (W SC ) and within populations (W ST ) was tested for statistical significance after 100,172 permutations. Also, F ST pairwise distances comparing genetic variation (in nucleotide bases) within and among sub-populations in relation to the entire population were calculated [65]. F ST values, which range from 0 (indicating a panmictic population) to 1 (complete separation), were calculated using models and gamma values assigned by jModelTest [66]: pairwise differences for mtCOI (c = 2.0), Jukes and Cantor for the unphased CS data (c = 2.0) and Tamura and Nei for the phased CS data (c = 0.159). Pairwise distances for mtCOI were also calculated among and between C. sinicus and C. agulhensis using the Kimura 2-Paramater (K2P; [67]) method in MEGA (c = 2.0). A K2P analysis was chosen to provide a secondary analysis of genetic variation and to adhere with the barcoding literature in which K2P is the most common metric [47,48]. AMOVA terms and a parsimony haplotype network diagram for mtCOI was constructed using the program TCS Ver. 2.1 [68]. In the diagram, haplotype frequencies are represented by size and graphics were assigned to represent the four populations. A Maximum Likelihood tree for the 28S rRNA gene sequences was computed using RAxML Ver. 7.0.3 [69,70], under the GTRGAMMA option (i.e., GTR model of nucleotide substitution with the C model of rate heterogeneity) and a complete random starting tree (option -d) for 1,000 bootstrap replicates. Phasing the data was not necessary, as we did not observe any ambiguous sites. Analysis was done for multiple alignments with additional published sequences obtained from GenBank for C. helgolandicus (GenBank Acc. No. HM997038), C. marshallae (EF460770), C. glacialis (EF460768), C. hyperboreus (EF460769), C. simillimus (EU914255) and C. finmarchicus (EU375491). GenBank sequences for Neocalanus plumchrus (AF385471) and Neocalanus cristatus (AF385470) were used as outgroups.

Cytochrome c Oxidase 1
A total of 48 mtCOI sequences for C. sinicus and 46 sequences for C. agulhensis were analyzed after trimming the multiplesequence alignment to a final length of 507 bp (Table 2). For the AMOVA analysis, two a priori groups, Japan and South Africa, were established. Each group was further divided into two populations, respectively: East and West Japan, and Benguela and Agulhas. All four populations shared three common haplotypes, with a total of 8 haplotypes for C. agulhensis (GenBank Acc. Nos. JF430012 -JF430019) and 6 for C. sinicus (JF430039 -JF430044; Fig. 2). There were 5 unique mtCOI haplotypes that occurred within one sample each; these satellite haplotypes each resulted from a single base change (Fig. 2).
The neutrality test using Tajima's D was not significant, suggesting neutral evolution. However, C. agulhensis had a negative Fu's F S value that was very significant (24.60; P = 0.001); C. sinicus was neither negative nor significant (0.005, P = 0.53). Haplotype diversity was high and comparable among the groups ( Table 3). The high and very similar observed levels of haplotype and nucleotide diversity also suggested the shared genetic composition of the two species; this was discerned based on their shared haplotypes and low to no genetic variation, despite the rapid mutational rate of mtCOI. The pairwise F ST values for mtCOI were low, with the lowest value between the West Japan and Benguela populations. The largest pairwise F ST values for mtCOI were found between East Japan and Agulhas (F ST = 0.228; P = 0.001) and between West Japan and Agulhas (F ST = 0.152; P = 0.010; Table 4). The AMOVA test showed low and non-significant variation among the groups, with significant variation within populations (Table 5). A second test to compare Japan and South Africa was run without separating the groups into populations; the resulting F ST value was low and significant (F ST = 0.083, P = 0.003). The K2P test also showed low levels of variation within each group: C. agulhensis had the lower average (0.002, 60.002) compared to C. sinicus (0.003, 60.002); variation between the two groups was also low (0.003, 60.002; Table 6).

Citrate Synthase
The CS nucleotide sequences were trimmed to a length of 457 bp for analysis. A total of 19 different sequence phenotypes, four of which are shared, were found for C. agulhensis (GenBank Acc. Nos. JF430020 -JF430038), and 16 for C. sinicus (JF430045 -JF430060; Table 2). Arlequin Ver. 3.5 was used to calculate F ST values between two groups, Japan and South Africa. The groups were not further subdivided into populations, due to a lack of analyzed samples for all four populations. Tajima's D was not significant for the comparison, but C. agulhensis had a negative and highly significant Fu's F S test (222.82, P,0.0001); C. sinicus was also negative and significant (27.29, P = 0.008). The F ST values were low and significant (F ST = 0.05, P = 0.021) for the unphased

Large subunit (28S) rRNA
Sequences were determined for the same region of 28S rRNA for C. agulhensis (GenBank Acc. No. JF703102), C. sinicus (JF703103), C. propinquus (JF703105), and C. pacificus (JF703104). No nucleotide variation was observed for a 658 bp region among six 28S rRNA sequences for each of C. agulhensis and C. sinicus (Table 2). There was sufficient genetic variation among most species to allow resolution of relationships within this genus, but no variation between the two species in question (Fig. 3).

Discussion
The absence of accurate and detailed descriptions of species can complicate the classification needed to adequately study and conserve marine species diversity. Closely related species that lack diagnostic morphological characters, yet share other traits (e.g., behavior, life history, and geographical distribution), present persistent challenges for taxonomists and ecologists. This is further complicated in the marine environment, where clear cut physical characters may be altered in collection; mating behavior and chemical signaling cannot be observed; species may inhabit variable ranges with disjunct populations; cryptic species may have overlapping ranges; and dynamic currents and human intervention may transport species into foreign ecosystems.
This study analyzed three genes to provide new genetic data for re-evaluation of the taxonomic distinctiveness of two sibling species of the copepod genus, Calanus. The results confirm very low levels of genetic variation between C. agulhensis and C. sinicus. The lack of divergence of these genes is not typical among other Calanus species.
Although the mtCOI barcode region has been shown to be a reliable molecular character for recognition and discrimination of metazoans [47], including marine metazoans [49], this gene region did not provide evidence of species distinction between C. agulhensis and C. sinicus. The high and very similar observed levels of haplotype and nucleotide diversity also suggested the shared genetic composition of the two species. The majority of mtCOI variance was found within the sampled populations (i.e., Agulhas, Benguela, etc.) and not among the designated groups (i.e., Japan and South Africa; Table 5). In fact, F ST values similar to those found here are not unusual between geographically isolated, conspecific populations that span vast distances and occupy discontinuous ranges [71,72,73]. Similar patterns were observed based on analysis of K2P values of the mtCOI gene and the diploid CS gene (using unphased data), for which low levels of differentiation were found between the two species. We were unable to adequately measure interbreeding or hybridization from the CS data because of the lack of variation observed between the two species There were no differences between C. sinicus and C. agulhensis for the 28S rRNA gene region sequenced. This gene shows sufficient variation to resolve most other species of Calanus, except for the sibling species C. marshallae and C. glacialis, and does not resolve the two groups of sibling species within the genus. Although 28S rRNA is not a conclusive diagnostic marker for such closely related species, the gene shows useful rate constancy to evaluate taxonomic relationships at higher taxonomic levels [37].
Overall, these patterns indicate a degree of geographic structure and reproductive isolation that is more characteristic of differences between populations of the same species than between different species. Several considerations necessitate caution in the strength of our conclusions. First is our moderate sample sizes and limited geographic scope of sampling across each species' range. Second, F-statistics alone are insufficient metrics for delimiting species and should be used with caution for taxonomic studies.
Several scenarios based on patterns and pathways of large-scale dispersal or migration may be useful to explain our findings of genetic cohesiveness between C. sinicus and C. agulhensis. One possibility is that migrants may survive the extensive journey via warm-water surface currents that flow through the Indian Ocean and around Africa into the South Atlantic [74], where they may  contribute to and establish populations around South Africa. The South Equatorial current flows west toward Africa from the Indonesian seas and then south into the Mozambique Channel, combining with the Agulhas Current and traveling toward the South Atlantic. The prominent Agulhas Current and Return Current also provide a convenient loop that could deposit migrants on the western and northern region of the Agulhas Bank, where recruits would find optimal living conditions [75]. Transport may well have been intermittent or episodic, since these major pathways are strongly influenced by climatic conditions and other factors [74,76]. To our knowledge, neither species has been documented or observed in the Indian Ocean; it should be noted that the proposed current system flows substantially south of any suitable coastal habitats. Another possible transport pathway is via ships' ballast water; C. sinicus was reported in 31 bulk cargo carriers traveling from Japanese ports to Australia [77]. In the Russian port of Vladivostok, C. sinicus was found in samples of ballast water coming from the port of Longkou [78]; the species had been previously unrecorded in the Peter the Great Bay. Nuwer [9] noted ballast water transport as a possibility, noting long-distance transport of the copepod Eurytemora americana, which was introduced to Argentina from the Northern Hemisphere. Our genetic data are consistent with such a scenario: the very significant Fu's F S value (222.82; P,0.001) for C. agulhensis is consistent with a recent population expansion. Also, Robinson et al. [79] note that there is little investigation into invasive species of South Africa, especially near the Indian Ocean, citing a lack of full-time professional marine taxonomists.
Feeding dynamics and life history may account for the establishment and dominance of this copepod. The center of distribution for C. algulhensis is the Agulhas Bank, which has a high concentration of small phytoplankton [25] and may be a suboptimal habitat for older stages. The species' progression from east to west during their ontogenetic maturation provides access to stage-specific preferred food and nutrient conditions, and has allowed them to prosper and retain a healthy population. [17]. Maintaining a home range south of the Benguela Current may allow C. agulhensis to prosper without competing with Calanoides carinatus and other copepods that inhabit the area. Hugget et al. [80] proposed a model whereby the European anchovy, Engraulis encrasicolus, replaced a South African population that became extinct; they suggested this was possible since E. encrasicolus spawns in the warm Agulhas Bank waters, where their larvae avoid lethal temperatures, with a westward progression of life stages following a similar path to that of C. agulhensis. Investigation of this hypothesis is problematical, because of a lack of fossil records and specimens collected before 1960.
Finally, it is possible that C. agulhensis and C. sinicus are part of a cryptic species complex that exists as two -or moregeographically isolated species with indistinguishable genetics (e.g., [2]) In this scenario, the observed genetic similarities could be attributed to plesiomorphic haplotypes (i.e., alleles inherited from a common ancestor). The defining biological characteristics may be unobservable, including chemical recognition during mating and reproduction, fertilization barriers, etc. and/or differences in reproductive behavior and synchronicity may be caused by geographic isolation [3].
Overall, our analysis of mtCOI, CS, and 28S rRNA variation within and among the analyzed samples of C. sinicus and C. agulhensis consistently showed low to zero levels of genetic divergence between the species. The great majority of molecular variation was observed within the sampled populations, rather than between the species, suggesting that we have sampled a large panmictic population that spans distinct ocean basins. Further, our results concur with earlier studies, such as that by Nuwer [9],  which have questioned whether C. agulhensis warrants status as a distinct species. Explaining this result and exploring the underlying mechanisms that link these two populations separated by large geographic distances and continental barriers will require further investigation of the species' ecology, behavior, life history, morphology, physiology, and -not least -molecular genetics.

Acknowledgments
This study was made possible by colleagues who generously collected samples and/or identified specimens for our analysis, including Hans Verheye (Marine and Coastal Management, South Africa), Jenny Huggett (Department of Environmental Affairs, South Africa), Hiroshi Ueda (University of Kochi, Japan), Ryuji Machida (Academia Sinica, Taiwan) and Nancy J. Copley (Woods Hole Oceanographic Institution, USA). Special thanks are due to Bruce W. Frost (University of Washington, USA), who provided useful advice and suggestions throughout the project and manuscript preparation, as well as helped in the sampling effort. Technical assistance was provided by Paola G. Batta-Lona and Ebru Unal (University of Connecticut).

Author Contributions
Conceived and designed the experiments: AB LB-B. Performed the experiments: RK. Analyzed the data: RK LB-B. Contributed reagents/ materials/analysis tools: AB. Wrote the paper: RK LB-B AB.