Mitochondrial genome sequencing of a vermivorous cone snail Conus quercinus supports the correlative analysis between phylogenetic relationships and dietary types of Conus species

Complete mitochondrial genome (mitogenome) sequence of a worm-hunting cone snail, Conus quercinus, was reported in this study. Its mitogenome, the longest one (16,460 bp) among reported Conus specie, is composed of 13 protein-coding genes (PCGs), 22 transfer RNA (tRNA) genes, two ribosomal RNA (rRNA) genes and one D-loop region. The mitochondrial gene arrangement is highly-conserved and identical to other reported. However, the D-loop region of C. quercinus is the longest (943 bp) with the higher A+T content (71.3%) and a long AT tandem repeat stretch (68 bp). Subsequent phylogenetic analysis demonstrated that three different dietary types (vermivorous, molluscivorous and piscivorous) of cone snails are clustered separately, suggesting that the phylogenetics of cone snails is related to their dietary types. In conclusion, our current work improves our understanding of the mitogenomic structure and evolutionary status of the vermivorous C. quercinus, which support the putative hypothesis that the Conus ancestor was vermivorous.


Introduction
Cone snails (Conus spp.), a species-rich genus of venomous marine gastropods, produce a complex of conotoxins for prey capture and defense. They are usually classified into fish-hunting (piscivorous), snail-hunting (molluscivorous) and worm-hunting (vermivorous) groups [1][2][3]. The number of piscivorous species is the least, while these snails are assessed as deadly to humans. A larger number of molluscivorous species is dangerous, although some snails have been implicated in unconfirmed fatalities. Forming the largest group, the vermivorous species account for 70% of the Conus genus, while they seem to be nonthreatening [2][3][4]. a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 There are more than 800 Conus species, and each typically contains 100~200 venom peptides; therefore, a total of over 80,000 conotoxins have been identified from various cone snails around the world [5,6].
The colors and diets, along with the composed conotoxins, in different cone snails are abundant and complicated; hence, related taxonomy, population genetics, evolutionary biology and phylogenetics have aroused the interest of scientists [7,8]. Evolutionary relationships between feeding and conotoxins have been discussed on basis of transcriptomics, proteomics and genomics [9,10]. The diversity of peptides in the venom of cone snails confirms that most species are able to produce a variety of conotoxins, as widely reported in literatures [11,12]. There is a specific hypothesis for the shift from an ancestral worm-hunting to more recent fish-hunting [13]. However, the poor performance of venom components in predicting prey taxonomic class suggests that conotoxins (gene superfamily) and traditional means of categorizing prey types (worms, mollusa, fish) do not accurately clarify the evolutionary dynamics between venom composition and diets [14][15][16].
Here, we reported the mitogenome of an additional vermivorous cone snail, C. quercinus, and described some outstanding features of its mitogenome sequence. Related mitogenomic structure and phylogentic status are going to provide more supportive evidence for the putative hypothesis about the vermivorous Conus ancestor and the correlation between traditional classification of prey types and mitogenome evolution.

Ethical statement
No specific permits were required for the collection of specimens for this study. These Conus specimens are common in China, and the field collection did not involve any endangered or protected species. Our experimental procedures complied with the current laws on animal welfare and research in China, and were specifically approved by the Animal Research Ethics Committee of Hainan Medical University.

Genomic DNA extraction and sequencing
Live C. quercinus were collected in the offshore areas of Lingshui City, Hainan Province, China. Around 150 mg of foot tissue was ground to powder using mortar and pestle under liquid nitrogen. Total genomic DNA was extracted by the Column mtDNAout kit (Tianda, Beijing, China) according to the manufacturer's instructions with minor modifications. The purified genomic DNA was quantified with a Nanodrop 2000 spectrometer (ThermoFisher Scientific, Wilmington, DE, USA).
Normalized DNA of 3 μg was employed to prepare a paired-end library using the NEB Next DNA sample libraries kit (New England Biolabs, New England) in accordance with the manufacturer's instructions. Quantification and size estimation of the library were performed on a Bioanalyzer 2100 High Sensitivity DNA chip (Agilent, Palo Alto, CA, USA). Finally, the library was normalized to 2 nM and sequenced on the Illumina HiSeq2000 (Illumina, San Diego, CA, USA).

Sequence assembly
Illumina paired-end reads were filtered on the basis of quality values, and the low-quality bases (quality < 20, p error > 0.01) at upstream and downstream were trimmed. The remained clean data were de novo assembled by SOAPdenovo2 (http://soap.genomics.org.cn/soapdenovo. html) on the basis of overlapping and paired-end relationships. All the cleaned reads were also mapped onto the assembled contigs with Bowtie 2 (2.2.5) [27] to estimate the sequencing depth. Those contigs with sequencing depth over 30× were mapped to the Conoidea mitochondrial genomes that were downloaded from the NCBI non-redundant nucleotide database (Nt) with blastn (2.2.31+) to validate mito-contigs, and the remaining genome gaps were filled with a python script.
Genome confirmation was indispensable to perform after assembling. Finally, the pairedend clean reads were mapped onto the assembled genome with 100% coverage, and the insertsize matched the information of sequenced library. The sequencing depth, coverage and relationship of the paired-end reads were the main criteria for confirmation.

Genome annotation and analysis
Preliminary gene annotation was realized through the online program Dual Organellar GenoMe Annotator (DOGMA) [28] and ORF Finder [29] with invertebrate mitochondrial genetic codes and default parameters. To verify the exact gene and exon boundaries, putative nucleotide and protein sequences were BLAST searched in the public Nt and Nr (the NCBI Non-redundant protein) databases. All tRNA genes were further confirmed through online tRNAscan-SE and ARWEN search server [30][31][32], in combination with the annotated results of ARAGORN. Graphical map of the circular plastome was drawn with Organellar Genome DRAW (OGDRAW v1.2) [33]. Relative synonymous codon usage (RSCU) value was employed to evaluate the synonymous codon bias in accordance with a previous report [34]. The skewing of nucleotide composition was calculated according to the following formulas: AT skew = (A-T)/(A+T) and GC skew = (G−C)/(G+C) [35,36]. To further analyze the evolutionary adaptation in the Conus lineage, we applied DnaSP 6 [37] to estimate the ratios of nonsynonymous (Ka) and synonymous (Ks) substitutions in the mitochondrial genomes among cone snails with the three different dietary types.

Phylogenetic analysis
Phylogenetic analysis was performed among ten taxa in the Conidae based on the nucleotide sequences of eleven protein-coding genes without ATP8 and ND6 from GenBank, and Oxymeris dimidiata (NC_013239.1), Fusiturris similis (NC_013242.1) and Lophiotoma cerithiformis (NC_008098.1) were employed as the out-groups. Before reconstructing phylogenetic trees, both nucleotide and protein sequences of eleven protein-coding genes were subjected to concatenated alignments using MUSCLE 3.8.31 (http://www.drive5.com/muscle/) [38]. The best-fit model GTR+G+I for nucleotide sequences was selected using the Akaike Information Criterion (AIC) with jModeltest [39]. Bayesian analyses of both nucleotide and protein alignments were carried out using PhyloBayes version 3.3f [40] under the best-fit model. Two independent Markov Chain Monte Carlo (MCMC) chains were run simultaneously to determine whether the searching reached stabilization, and were immediately stopped when all chains converged (maxdiff less than 0.1). The phylogenetic trees were eventually constructed using the Tree View program v.1.65 and Evolview (www.evolgenius.info/evolview/) [41].

Genome organization and nucleotide composition
The mitogenome of C. quercinus is a closed circular molecule of 16,430 bp in length (GenBank accession No. KY609509; see more details in Fig 1). It encodes a high mutation region (Dloop), and a typical set of 37 mitochondrial genes including 13 protein-coding genes (PCGs), two ribosomal RNA (rRNA) genes (12S rRNA and 16S rRNA), and 22 transfer RNA (tRNA) genes. Eight tRNA genes are encoded on the light (L) strand, whereas the other genes are located on the heavy (H) strand (Table 1).
In the C. quercinus mitogenome, gene overlapping occurred three times (the negative numbers in Table 1), spanning 1~7 nucleotides (nts), for a total of 9 nts. The intergenic spacer region occurred 20 times (the positive numbers in Table 1), spanning 1~162 bp, for a total of 421 bp. The overall base composition is estimated to be 28.15% for A, 38.32% for T, 14.82% for C and 18.71% for G, with a high A+T content at 66.47% (Table 2).
The mitogenome gene arrangement is conserved and identical to those of other reported Conus species. The intergenic sequences vary between 0 and 41 nts, and one relatively large region of 162 nts happens between COX1 and COX2 (Fig 1). The gene sequences of NADH dehydrogenase subunit 4L (NAD4l) and subunit 4 (NAD4) are overlapped by 7 nts, NAD4 and tRNA-His by 1 nt, and NAD5 and tRNA-Phe by 1 nt.

Protein-coding, tRNA and rRNA genes
The 13 PCGs of C. quercinus are similar in length and arrangement to the nine previously sequenced Conus mitogenomes. All PCGs are transcribed from the H strand in C. quercinus, with initiation of the standard start codon ATG. They also display the typical TAN termination codon, in which 8 PCGs have the complete termination codon TAA and 5 PCGs have the TAG (Fig 1 and Table 1). The RSCU values of the C. quercinus mitogenome were calculated (Fig 2), indicating that TTA (Leu), TCT (Ser), GCT (Ala), ACT (Thr), and GTT (Val) are the five most frequently used codons.

D-loop region of the C. quercinus mitogenome
The D-loop region between tRNA-Phe and COX3 in C. quercinus (Fig 1) is the longest (943 bp), which is much higher than those in other Conus species (97~698 bp; see more details in Table 3). Based on the sequence alignment of C. quercinus with other Conus species, we observed that the intergenic sequences of the C. quercinus mitogenome are also the longest in these Conus species. Usually, in animal mitochondrial genomes, the longest intergenic sequences were reported to play a key role in the initiation of replication and transcription [22,42]. Interestingly, the D-loop region in C. quercinus (among the 10 Conus species except C. californicus) also presents the higher A+T content (71.3%) with a long AT tandem repeat stretch (68 bp; Fig 3).

Synonymous and nonsynonymous substitutions
In genetics, the Ka/Ks ratio is of significance to estimate the balance between neutral mutations and is especially useful for understanding the evolutionary relations between homologous PCGs in closely related species [43]. To detect the influence of selection on the C. quercinus, Ka and Ks were estimated [44]. In all the 13 PCGs of three cone snails (Fig 4), the ratio of Ka/Ks is much less than 1 (between 0 and 0.16), indicating existence of a strong purifying or stabilizing selection.  The average Ka/Ks in ATP8 is the highest among the 13 PCGs, suggesting that this protein is under the least selective pressure among all the mitochondrial genes. Interestingly, in C. textile and C. striatus, the ratio of Ka/Ks is the least in nine protein-coding genes (except COX3, CYTB, ND4L and ND6) compared to C. quercinus, indicating that these two cone snails have a closer phylogenetic relationship than C. quercinus. These data are consistent with their dietary difference (Fig 5).

Phylogenetic relationships of Conus species with different dietary types
Molecular phylogeny of the taxonomy is a hypothesis of its evolutionary patterns and processes. The molecular-based phylogenetic tree can estimate divergence times and ancestral distributions, and provides evidence relevant to taxonomic hypotheses [8]. To further Table 2. Nucleotide composition of Conus mitogenomes. validate the mitogenome sequence of C. quercinus and understand the evolutionary history of the Conus species with different feeding ecologies, we constructed a phylogenetic tree using Bayesian inference analysis with 13 complete mitogenomes downloaded from the NCBI (Fig 5). It is obvious that three different dietary types (vermivorous, molluscivorous and piscivorous) of cone snails, except the broad C. californicus, are clustered separately. This is consistent with previous reports [43,45] and supports the putative hypothesis that the cone snail ancestor was vermivorous. However, the inclusion of C. californicus in the phylogenetic tree analysis may bias the results, because it is often regarded as an atypical member of Conidae due to its extremely broad diet and distant phylogenetic relationship to the rest of Conidae [14,26].  Venom composition across Conus species has been hypothesized to be shaped by prey type and dietary breadth. Several studies [14] proved a significant positive relationship between dietary breadth and measures of conotoxin complexity by transcriptome sequencing of venom duct from 12 Conus species. However, given the high evolutionary lability of venom toxins, it is unclear that a clear relationship between dietary preference and venom composition should be expected. The prey taxonomic class of Conus can be predicted by venom components, but the performance of prey taxonomic class in predicting venom components was poor [14][15][16][17]26]. By far, we are certain that the selective pressures driven by diets play a major role in shaping evolutionary patterns in venom across the cone snails.

Conclusion
In this present study, we sequenced and annotated the complete mitogenome of C. quercinus. We used other nine publically available Conus mitogenomes to illustrate the structure of C. quercinus mitogenome and investigated the evolutionary relationships among Conus species. Interestingly, the mitochondrial gene arrangement of C. quercinus is highly conserved and identical to other Conus species. However, the D-loop region (943 bp) of C. quercinus is the longest with the higher A+T content (71.3%) and a long AT tandem repeat stretch (68 bp). The phylogenetic tree (Fig 5) revealed that three different dietary types of cone snails are clustered separately, suggesting that the phylogenetics of cone snails is related to their dietary types. Our current work improves our understanding of the mitogenomic structure and evolutionary status of the vermivorous C. quercinus, which support the putative hypothesis that the Conus ancestor was vermivorous.