Novel gene rearrangement in the mitochondrial genome of Siliqua minima (Bivalvia, Adapedonta) and phylogenetic implications for Imparidentia

Siliqua minima (Gmelin, 1791) is an important economic shellfish species belonging to the family Pharidae. To date, the complete mitochondrial genome of only one species in this family (Sinonovacula constricta) has been sequenced. Research on the Pharidae family is very limited; to improve the evolution of this bivalve family, we sequenced the complete mitochondrial genome of S. minima by next-generation sequencing. The genome is 17,064 bp in length, consisting of 12 protein-coding genes (PCGs), 22 transfer RNA genes (tRNA), and two ribosomal RNA genes (rRNA). From the rearrangement analysis of bivalves, we found that the gene sequences of bivalves greatly variable among species, and with closer genetic relationship, the more consistent of the gene arrangement is higher among the species. Moreover, according to the gene arrangement of seven species from Adapedonta, we found that gene rearrangement among families is particularly obvious, while the gene order within families is relatively conservative. The phylogenetic analysis between species of the superorder Imparidentia using 12 conserved PCGs. The S. minima mitogenome was provided and will improve the phylogenetic resolution of Pharidae species.


Introduction
The mitochondrial DNA (mtDNA) of metazoans is generally a closed circular molecule and is the only extranuclear genome in animal cytoplasm [1]. It contains its own genetic system, with maternal inheritance, low intermolecular recombination, high copy number and high substitution rate [2]. In general, mitochondrial DNA of Bivalvia contains 22 transfer RNA genes (tRNA), two ribosomal RNA genes (rRNA), 12 protein-coding genes (PCGs) and a noncoding control region, i.e., the origin of light-strand replication (OL) region [3,4]. Complete mitochondrial genomes have become popular for phylogenetic reconstruction of animal relationships [5][6][7][8]. Molecular analysis is the most commonly method to identify species without morphological classification, which is accuracy and provides a lot of information [9]. In recent years, there were many studies on gene rearrangements and phylogenetic analysis of bivalves using the mitochondrial genomes [10][11][12]. The superorder Imparidentia is a newly defined branch of bivalves in 2014 [13]. Through the Paleozoic and Mesozoic, it showed stable diversification [13]. This superorder includes most marine bivalve families [14], and is of great significance in phylogenetic analysis of Bivalvia. Phylogenetics of bivalves has been a hot topic since ancient times, but there are still many deficiencies in previous studies, among which the analysis of Imparidentia yet has numerous uncertainties [15]. Encompassing Combosch et al. had conducted the systematics of the Imparidentia in the Bivalvia based on Sanger-sequencing approach, nevertheless, it is difficult to resolve the relationships within Imparidentia using this approach. Thus, they suggested that transcriptomic analysis of Imparidentia to resolve its position of a taxa [16]. Subsequently, Lemer et al. analyzed the phylogeny of Imparidentia through transcriptome data, and the Imparidentia puzzle in phylogeny was solved by establishing a data matrix optimized for Imparidentia [14]. Due to it is a new clade of definition, existing analysis is superficial, it is extremely necessary for taxonomic and phylogenetic in-depth investigate of the superorder clams.
The razor clams (e.g., Pharidae, Solenidae) are ecologically and economically important shellfish in the coastal areas of China. They are distributed in the tropics and temperate zones [17]. The family Pharidae is dominated by marine species, belonging to the order Adapedonta of Bivalvia, except for a single typically freshwater genus, Novaculina [18,19]. The family Solenidae is once considered to include the family Pharidae by some authorities [20]. Siliqua minima (Gmelin, 1791), belong to the family Pharidae, which lives in the benthic environment from intertidal mudflats at a water depth of more than 30 m [17,21]. It is mainly distributed in the coastal areas in the south of Zhejiang Province in China. Siliqua minima mainly feeds on plankton and organic debris in seawater through filtration [22]. It has gained attention because of it is ecologically and economically important in the coastal regions of China with high commercial and nutritional value [19,23]. Previous studies of S. minima mainly focused on nutritional value evaluation, the composition and changes of fatty acids, and the effects of various environmental factors [22][23][24]. There are few researches on molecular level about it.
In the present study, we sequenced the first complete mitogenome of S. minima to gain insights into its adaptive evolution and study the characteristics of its mitogenomes, including nucleotide composition, codon usage and secondary structure of tRNAs. Furthermore, we performed phylogenetic analysis of the 12 protein-coding genes (PCGs) (except atp8) in the S. minima mitogenome with the PCGs of 54 complete mitogenomes of the superorder Imparidentia retrieved from GenBank of NCBI in order to understand its evolutionary relationship. We also integrated the gene arrangement of mitogenomes during evolution in Adapedonta in order to obtain a more accurate evolutionary relationship. These results will help to view the phylogenetic relationship of S. minima in bivalve species.

Ethics statement
The study was conducted in accordance with the guidelines and regulations of the government. No endangered or protected species were involved. There is no special permission for this kind of razor clam, which is very common in the aquatic market. Sampling also did not require specific permissions for the location.

Sample collection and DNA extraction
Siliqua minima samples were collected in November 2018 from the coastal area of Xiapu County (E120˚24.8577 0 , N26˚93.0578 0 ), Fujian Province, in the South China Sea. Preliminary morphological identification of the specimens was carried out through the published taxonomic books [25], and a taxonomist from the Marine Biological Museum of Zhejiang Ocean University was consulted [26]. The field-collected samples were initially placed in absolute ethyl alcohol and stored at -20˚C prior DNA extraction. The total genomic DNA was extracted from adductor muscle using the rapid salting-out method [27]. The quality of DNA was detected by 1% agarose gel electrophoresis, and the DNA was stored at -20˚C before sequencing.

Sequencing, assembly, and annotation of mitochondrial genomes
Complete mitogenome sequencing of S. minima was performed on an Illumina HiSeq X Ten platform (Shanghai Origingene Bio-pharm Technology Co., Ltd., China), and an Illumina PE Library of 400 bp was constructed. Quality control, de novo assembly, functional annotation and molecular evolution analysis of the S. minima mitogenome were conducted based on bioinformatics analysis methods. The NCBI has established a large database SRA (Sequence Read Archive, https://trace.ncbi.nlm.nih.gov/Traces/sra/) to store and share original high-throughput sequencing data. Clean data without sequencing adapters were assembled de novo using NOVOPlasty software (https://github.com/ndierckx/NOVOPlasty) [28]. To ensure the accuracy of the species and the correctness of sequence, we compared the mitochondrial genomes of the assembled S. minima, and used NCBI BLAST to detect the cox1 barcode sequence for taxonomical identification [29]. The new mitogenome was annotated using the MITOS Web Server with the invertebrate genetic code (http://mitos2.bioinf.uni-leipzig.de/index.py) and then compared with its existing relatives to determine the number of genes and the position of its initial and terminal codons [30,31].

Genome visualization and comparative analysis
The circular map of the S. minima mitochondrial genome was generated by using the online server CGView (http://stothard.afns.ualberta.ca/cgview_server/index.html) [32]. The secondary structures of tRNAs were predicted initially by using MITOS WebServer, as well as tRNAscan-SE v.2.0 Webserver (http://lowelab.ucsc.edu/tRNAscan-SE/), and ARWEN (http://130. 235.244.92/ARWEN/) was used to re-identify the numbers of tRNAs and secondary structures [33,34]. The putative origin of L-strand replication (OL) was identified by the Mfold Web Server and edited in Adobe Photoshop CC [35]. Base composition and relative synonymous codon usage (RSCU) for 12 PCGs of S. minima were calculated and sorted using MEGA 7.0 [36]. The skew value denotes strand asymmetry, which was calculated according to the following formulas: AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C) [37].
The ML tree was constructed in IQ-TREE using the TVM+F+R8 model with 1000 nonparametric bootstrapping replicates and the best-fit substitution model with ModelFinder [39,40]. Bayesian inference (BI) methods were used with the program MrBayes v3.2 [41]. By associating PAUP 4.0, Modeltest 3.7 and MrModeltest 2.3 software in MrMTgui, the best-fit model (GTR+I+G) of substitution was chosen according to AIC [42,43]. BI analyses were conducted with Markov Chain Monte Carlo (MCMC) sampled every 1,000 generations each with three heated chains and one cold chain run for 2,000,000 generations, and the first 25% burn-in was discarded. Visualization of the tree was realized using FigTree v1.4.3 [44].

Genome organization and base composition
The complete mitogenome of S. minima was 17,064 bp in length, which has been deposited in GenBank under accession NO. MT375556 (Fig 1, Table 1). In the present study, there was only

PLOS ONE
The complete mitochondrial genome of Siliqua minima one referential species, S. constricta, of Pharidae, whose mitogenome was 17,225 bp in length, similar to that of S. minima [45]. S. constricta was previously classified as belonging to the family Solecurtidae but has been confirmed to belong to the family Pharidae [31,46]. The mitogenomes of Pharidae were longer than other species in Adapedonta observed, i.e. typically ranged from 15,381 bp (Panopea abrupta) to 16,784 bp (Solen grandis) ( Table 1). The circular mitochondrial genome of S. minima had 22 putative tRNA genes, 2 rRNA genes (12S rRNA and 16S rRNA), 12 PCGs and one control region (CR) including an origin of the light-strand replication (OL) region. According to our statistics, all species of Adapedonta we downloaded contained the atp8 gene, except for species of the family Pharidae [45,[47][48][49][50][51]. The gene arrangement of the mitogenome of S. minima was identical to that of S. constricta. Interestingly, all 36 mitochondrial genes were encoded on the heavy chain.
The overall base composition of the whole mitochondrial genome was 25.41% A, 41.00% T, 22.93% G, and 10.62% C, exhibiting obvious AT bias (66.41%). Due to the skewness of the S. minima mitogenome, most of them are negative, except the CR and rRNAs possessed an opposite AT skew compared with other genes ( Table 2). All GC-skews were positive, indicating that the base composition ratios were G biased to C.

Noncoding regions and gene overlapping
Generally, the mitogenome contains a non-coding region (NR), including AT-rich, hairpin structures, tandem repeats and some peculiar patterns [52][53][54]. It is supposed to play a role in the regulation of mitochondrial transcription and replication [55]. There were 25 NRs in S. minima, which is similar to S. constricta (25 NR) of the same family as in previous reports [45].
The largest NR of S. minima was identified as a putative control region (CR). In addition, the longest intergenic region of the razor clam was 273 bp and was located between trnF and cox1 (Table 3). The CR is the region with the largest length variation in the whole mitochondrial sequence and the region with the fastest evolution in the mitochondrial genome [56]. It has a high mutation rate, so it is of great value to study for population genetic analyses [57]. By comparing the gene order of bivalves, we can see that the CR regions are not conservative but are highly rearranged. The CR region was located between nad2 and trnK in the S. minima mitogenome, spanning 1,371 bp with 60.10% A+T content and showing positive AT-and GC-skew (0.075 and 0.492), indicating bias towards A and G (Fig 1; Tables 2 and 3). Simultaneously, the replication origin of the L-strand (OL) region was also found in this region. The "OL" region could form a stem-loop secondary structure with 18 bp in the stem and 16 bp in the loop, with an overall length of 34 bp (CCTTCCCCCTTCTACGATAGTTGGAGGGGGAAGG), and the secondary structure of the stem-loop, which has the potential to fold, was predicted (Fig 2).
The overlapping of neighbouring genes is common in bivalve mollusc mitochondria. There were eight overlaps of neighbouring genes in the mitochondrial genome of S. minima. The position of the largest gene overlap (16 bp) was between trnS2 and trnE.

PLOS ONE
The complete mitochondrial genome of Siliqua minima

Protein-coding genes and codon usage
Siliqua minima has 12 PCGs and lacks the atp8 gene, which is very common in bivalves. The total length of the 12 concatenated protein-coding genes was 11,262 bp, accounting for approximately 66.00% of the whole mitogenome ( Table 2). The average A+T content was 66.35%, ranging from 61.92% (cox2) to 71.86% (nad6) ( Table 2). We further compared the PCGs of the six Adapedonta species mitogenomes, and the PCGs ranged from 61.78% (Solen strictus) to 66.45% (S. constricta) ( Table 4). The AT-skew values were negative (-0.336) for PCGs, while the GC-skew values (0.377) were positive ( Table 2).
For all 12 PCGs identified in the S. minima mitogenome, two genes (nad1 and nad3) were initiated with the start codon ATA, three genes (nad5, nad6 and nad2) started with the codon ATT, and the remaining seven genes had the start codon ATG. The nad6, atp6, cox3 and nad3 genes had the termination codon TAG (Table 3). Moreover, the most common termination codon, TAA, was detected in eight PCGs.

PLOS ONE
The complete mitochondrial genome of Siliqua minima Most amino acids were used by either two or four in invertebrates, and only Leu and Ser were encoded by six and eight different codons, respectively [58]. The nucleotide relative synonymous codon usages (RSCUs) of S. minima are presented (Fig 3, Table 5). GCU (Ala), UUA (Leu2), CCU (Pro) and ACU (Thr) are the most frequently used codons, whereas CUC (Leu1), GAC (Asp) and CGC (Arg) are relatively scarce. As per the RSCU values, codons ending with an A or U were preferred, and the codons NNA and NNU were found in the majority.
In addition, we also compared the amino acid composition of two species of Pharidae ( Fig  4). The four most frequent amino acids in the PCGs of S. minima were phenylalanine (11.66%), glycine (8.64%), leucine 2 (8.52%) and valine (8.10%), whose proportions were similar to those observed in S. constricta.

Transfer and ribosomal RNA genes
The mitogenome of S. minima contained 22 tRNA genes varying in size from 63 to 76 bp, and each of them was unique and compatible with codon usage in invertebrate mitogenomes (Table 3). Two types of anticodons (TAG and TAA) determined leucine, and TCT and TGA determined serine. The average content of A+T in the entire tRNA was 66.12%. The AT-skew values were negative (-0.077), and GC-skew values were positive (0.256) ( Table 2), indicating a bias towards Ts and Gs when horizontal alignment of 22 tRNAs was performed. In addition, the tRNAs of the mitogenomes of six Adapedonta species ranged from 62.67% (Solen strictus) to 68.01% (Panopea abrupta) ( Table 4). We observed that the seven species of Adapedonta had negative AT skew and positive GC skew. To understand the functional and structural characteristics of tRNAs, we predicted the secondary structures of 22 tRNAs of S. minima (Fig 5). Except for two trnS (TCT and TGA), which have the most different structures, all the tRNAs could fold into a typical cloverleaf secondary structure. Similar to other bivalves, S. minima has no discernible DHU (dihydrouridine) stem and loop and cannot be folded into a typical cloverleaf structure [59]. Otherwise, we found twenty tRNAs (except two trnL) with at least one G-T base pairs, which formed weak bonds. This base pairs can be corrected by post-transcriptional RNA editing mechanisms [60].
The 16S rRNA subunit (rrnL) and 12S rRNA subunit (rrnS) were 1,247 bp and 829 bp in size, respectively (Table 3). Both fragments were separated by atp6 and trnM genes. The base composition of the rRNA genes was 34.97% A, 33.86% T, 18.98% G and 12.19% C, and the A +T content was 68.83% (Table 2). Notably, both AT-skew (0.016) and GC-skew (0.218) values of rRNAs were positive, which was different from other genes. This indicates that the A and G content is more prevalent in mitochondrial RNA genes.

Gene arrangement
Gene order of the mitochondrial genome can be used to research the evolution of species. It can be used to investigate the ancestral lineage of phylogeny, and to establish the mechanism of gene replication, regulation and rearrangement. Bivalves of molluscs have highly variable mitochondrial gene sequences, and are the most mutated species in metazoa [61,62]. In the study, we selected some species from four orders of the superorder Impardentia, Venerida, Cardiida, Adapedonta and Lucinida as representatives of bivalves to study mitochondrial gene rearrangement (Fig 6). Due to the great difference of gene sequence among bivalves, we excluded the tRNA genes and compared with them by 12 or 13 PCGs. Although their gene order are highly variable, we try to find out whether there are some shared gene blocks among bivalve species. The results showed that there was a mass of rearrangement in each order of bivalves, even if we deleted all tRNAs. The gene rearrangement analysis based on families or even genera is more appropriate. In addition, the sequence of genes in the four genera of Dosinia, Meretrix, Saxidomus and Cyclina were identical. Both of them contained cox1-nad1--nad2-nad4l-cox2-cytb-rrnl gene fragment. In addition, atp6-nad3-nad5 and atp8-nad4 were the same gene fragments, and cox3 and rrns gene were interchanged. Compared with the families of Vesicomyidae and Corbiculidae, the gene order of the genus Dosinia and other four genera retained the overlength gene fragment of cox2-cytb-rrnl-atp8-nad4-atp6-nad3, as well as two small fragments of nad5-nad6 and rrns-cox3. The sequence of two families Vesicomyidae and Corbiculidae, contains the same two gene blocks as that of the family Mactridae: cytbrrnl-atp8-nad4-atp6-nad3-nad1-nad5 and nad2-rrns-cox3. There is only one cox3-cytb-rrnl gene fragment in Mactridae, which is the same as Tridacnidae in Cardiida. Except for Tridacnidae, the gene sequences of the other five families in Cardiidae are identical. Donacidae, Psammobiidae, Solecurtidae, Tellinidae and Semelidae have the same arrangement as the family Tridacnidae. It also illustrates the family Tridacnidae is a very special family in Cardiidae. The gene arrangement of most families of the order Cardioidea is similar to that of the order Hiatellidae of the superorder Adapedonta in that there are four identical fragments nad4-nad3, nad1-nad5-rrnl-atp6 and cox3-nad2. There are nad2-cox1-cox2 and nad4l-atp8-nad4 gene fragments in the same gene order between two families Hiatellidae and Solenidae, while nad3--nad1-nad5-nad6-cytb-rrnl-atp6-rrns-cox3 gene fragment in the family Hiatellidae is almost the same as nad3-nad1-nad6-nad5-cytb-rrnl-atp6-rrns-cox3 gene arrangement in the family Solenidae, only nad5 and nad6 genes in the reverse order. They are the two families with the closest gene arrangement in this study. The sequence of nad5-cytb and rrnL-atp6-rrns-cox3 was the same as that of the family Pharidae. Moreover, species of Pharidae lack atp8 gene, which is also common in bivalve species. Compared with the species of the family Lucinidae, there are only three identical gene blocks: atp6-rrnS, nad2-cox2, nad4-nad3. Through above analysis we can find that although the gene sequence among bivalve species is highly variable, the same gene arrangement is longer among the species with closer genetic relationship. However, the possibility of rearrangement of the contrast between the spanning orders is greater. It shows that

PLOS ONE
there is a certain relationship between evolution and gene rearrangement, even in bivalve species with high rearrangement rate. However, in this study, there is a high degree of gene rearrangement, such as the family Tridacnidae. Thereby, the taxonomic evolution of species cannot be supported only by the study of gene sequence, but also needs the combination of phylogenetic reconstruction.
In addition, we further analyzed the species of the superorder Adapedonta using all the genes of mitogenome. Previously, gene rearrangement is rarely discussed separately in Adapedonta because of its extremely few whole mitogenome data. We propose the gene order analysis of three family species in Adapedonta first time, which can be used as a reliable phylogenetic marker for some bivalve lineages. The CR of Adapedonta species is typically only

PLOS ONE
The complete mitochondrial genome of Siliqua minima a small fragment or no obvious region. Nevertheless, we discovered that the CR was more than 1300 bp in family Pharidae. It is located between the nad2 and trnK genes, and its order different from other Adapedonta species. In the mitogenome of Pharidae, a total of 10 genes,

PLOS ONE
The complete mitochondrial genome of Siliqua minima including 4 PCGs, 4 tRNAs and all rRNA gene are rearranged (Fig 7). As shown in Fig 7, three main gene blocks are described for Adapedonta, the gene arrangement in the family is relatively conservative, and only part of the difference comes from the base content. However, the gene rearrangement among the three families differed substantially, but some of the fragments were still retained. From the observation of seven species of Adapedonta, we can see that each species contained short fragments trnL2-nad1-trnl1 (segment A), rrnL-atp6-trnM-rrnS-cox3--nad2 (B) and trnS1-nad2 (C), which were all behind cox1 gene and in the same order. In the family Hiatellidae and Pharidae, fragments B and C are connected as a long fragment, which may be related to time of divergence.

Phylogenetic relationships of Imparidentia
To research the phylogenetic implications of the S. minima mitogenome in Imparidentia, we reconstructed the order-level phylogenetic tree. The phylogenetic trees based on Bayesian inference (BI) and maximum likelihood (ML) analyses of 12 PCGs of 54 species produced identical topologies (Fig 8). The tree topologies based on two methods were basically congruent and obtained high supports in the majority of nodes. The relationships among the five orders of Imparidentia involved herein were consistently recovered as (Venerida + (Cardiida + Adapedonta)) + Myoida + Lucinida, which is slightly different from the study of Fernandez-Perez et al. [46]. However, the results are consistent with the topological structure of phylogenetic tree constructed by using transcriptional data base on the morpho-anatomical by Lemer et al. [14]. In addition, this result is also basically consistent with phylogenetic tree constructed based on mitogenome by Yuan et al. [63], but we added a large number of mitochondrial genome data species on this basis to further explore the evolutionary relationship between bivalves. In our analysis, the evolutionary differences were mainly concentrated between three orders of Venerida, Cardiida and Adapedonta. We can find that Venerida is the first to branch out of the three orders, and its branching posterior probabilities and bootstrap values are higher than those of the previous studies with adapedont as the first branch. This confirmed that phylogenetic analysis based on our data is more effective. The analysis shows that two species of the order Lucinida were the outermost species of all bivalves and formed a single clade, i.e., Lucinida is monophyletic, in accordance with previous viewpoint [63]. Moreover, BI and ML recovered each the family Pharidae, Solenidae and Hiatellidae form a monophyletic assemblage with strong support. Both ML and BI analyses of two datasets supported the sistergroup relationship of Pharidae and Solenidae species (Bayesian posterior probabilities (PP) = 1.00, and bootstrap values (BS) = 100), as previously reported [30]. In addition, the family Hiatellidae was placed as sister to Pharidae and Solenidae (PP = 1; BS = 100). The phylogenetic relationships between seven species in the order Adapedonta are (((Panopea abrupta + Panopea generosa) + Panopea globose) + Hiatella arctica) + ((S. minima + S. constricta) + (Solen grandis + Solen strictus)) (Fig 8). There has been controversy about the branch of S. constricta for a long time. It was once thought to be a member of the family Solenidae, and then it was classified into the family Tellinoidea by morphological identification and anatomical characteristics [64]. Yuan et al. used multiple PCGs to reconstruct the phylogenetic relationships and classified S. constricta within the family Solenidae [31]. In our study, it was obvious that S. minima and S. constricta of the family Pharidae form a new branch. The analysis of the mitochondrial genome in this study further strengthens the previous elevation of the order Adapedonta to the family level. In fact, at present, there has not been any special evolutionary research on the whole superorder species based on molecular data. Therefore, the study evolution and classification use molecular means base on morphological dissection, such as mitochondrial genome are still necessary to test the taxonomy of superorder Imparidentia.

Conclusions
We sequenced and assembled the mitogenome of S. minima using next-generation sequencing, and the genome was 17,064 bp in length. The gene distribution was entirely presented on the heavy chain of the S. minima mitogenome. With the skewness of the S. minima mitogenome, except for the CR and rRNAs, most AT skews were negative; moreover, all GC skews were positive. In the tRNA secondary structure, only two trnS cannot be folded into a typical cloverleaf structure because they do not have a discernible DHU stem-loop. In the analysis of PCGs rearrangement of bivalve species, the gene sequence among species is highly variable, the more consistent of the same gene arrangement is longer among the species with closer genetic relationship. Furthermore, after analysis of homologous regions between the seven Adapedonta mitogenomes, it was concluded that the gene rearrangement among families is particularly obvious, while the gene rearrangement within families is relatively conservative. The phylogenetic trees constructed by ML and BI methods had the same branches. The results show that S. minima and S. constricta are the closest relatives and both belong to the family Pharidae. At present, the complete mitochondrial genome data of Pharidae are quite limited, and this study we reconstructed phylogenetic trees using the superorder Imparidentia, thus increases the understanding of the phylogeny of Pharidae.