Comparative analyses of the complete mitochondrial genomes of Dosinia clams and their phylogenetic position within Veneridae

Mitochondrial genomes have proved to be a powerful tool in resolving phylogenetic relationship. In order to understand the mitogenome characteristics and phylogenetic position of the genus Dosinia, we sequenced the complete mitochondrial genomes of Dosinia altior and Dosinia troscheli (Bivalvia: Veneridae), compared them with that of Dosinia japonica and established a phylogenetic tree for Veneridae. The mitogenomes of D. altior (17,536 bp) and D. troscheli (17,229 bp) are the two smallest in Veneridae, which include 13 protein-coding genes, 2 ribosomal RNA genes, 22 tRNA genes, and non-coding regions. The mitogenomes of the Dosinia species are similar in size, gene content, AT content, AT- and GC- skews, and gene arrangement. The phylogenetic relationships of family Veneridae were established based on 12 concatenated protein-coding genes using maximum likelihood and Bayesian analyses, which supported that Dosininae and Meretricinae have a closer relationship, with Tapetinae being the sister taxon. The information obtained in this study will contribute to further understanding of the molecular features of bivalve mitogenomes and the evolutionary history of the genus Dosinia.


Introduction
The genus Dosinia belonging to the family Veneridae in the superfamily Veneracea are important marine bivalves inhabiting from intertidal zone to subtidal zone along shallow coasts [1]. It was recognized as one of the significant taxons both in stratigraphy and chronology [2]. Therefore, the evolution of genus Dosinia deserves a careful investigation for understanding the bivalve evolution and diversity.
Along with morphological classification, molecular analyses are essential for evaluating the diversity of metazoans because of their high informativeness and accuracy [3,4]. However, in contrast to the large number of morphological descriptions of Dosinia, there have been few attempts to perform molecular phylogenetic analysis in this genus so far [5][6][7][8][9][10]. Most present work involving Dosinia clams used different gene sequences (e.g., ITS2, H3, rrnL) to resolve the phylogenetic relationships at the level of Heterodont or Veneroida, nevertheless, the a1111111111 a1111111111 a1111111111 a1111111111 a1111111111

Sample collection and DNA extraction
A single specimen was obtained from each of the two Dosinia bivalve. D. altior was collected from Qinzhou (Guangxi province of China) and preserved frozen at -80˚C in 2017. D. troscheli was collected from Beihai (Guangxi province of China) and preserved in 95% ethanol in 2014. Specimens were taxonomically identified based on shell morphology [35] and DNA barcoding sequence (cox1 gene) from GenBank and Boldsystems. Total genomic DNA was extracted from adductor muscle following the standard phenol-chloroform procedure described by Li et al. [36] with some modifications. The DNA quality was checked on 1.0% agarose gel.
PCR reactions were performed in a 10 μl volume containing about 50 ng template DNA, 1× reaction Buffer (Mg 2+ plus, Takara), 0.02 mM of each dNTP, 1 μM of each primer, 0.25 U LA-Taq polymerase (Takara). The long-PCR reactions for long fragments refer to Yuan et al. [18] protocols : an initial denaturation for 3 min at 94˚C, followed by 35 cycles comprising denaturation at 94˚C for 30 s, annealing at 48-56˚C for 30 s and extension at 68˚C for 5 min, the  whole process was completed with a final extension at 72˚C for 10 min. PCR products were purified with EZ-10 spin column DNA gel extraction kit (Sangon Biotech), and sequenced with the primer walking method directly. The sequencing was conducted on an ABI PRISM 3730 (Applied Biosystems) automatic sequencer in Beijing Genomics Institute (BGI) using standard Sanger sequencing chemistry.

Sequence analysis and gene annotation
The DNA sequences obtained by sequencing the PCR-amplified DNA fragments were analyzed and assembled using Seqman program from DNASTAR (available at: http://www. dnastar.com/). Protein coding genes were annotated by using NCBI ORF Finder (https:// www.ncbi.nlm.nih.gov/orffinder/) and BLASTx with the invertebrate mitochondrial code. The positions of tRNA genes were determined by ARWEN v.1.2 [39] and MITOS Web Server [40]; and secondary structures of tRNAs were inferred using MITOS in default search mode. The rRNA genes were identified by BLAST searches (http://www.ncbi.nlm.nih.gov/BLAST/), the boundaries of each gene were determined with multiple alignments of other 13 published bivalve mitogenomes from NCBI website. Repeat sequence patterns in the major no-coding region (MNR) were checked using the web-based software server Tandem Repeat Finder 4.0 (http://tandem.bu.edu/trf/trf.html) [41]. Mitogenome maps were drawn using the CGview Server (http://stothard.afns.ualberta.ca/cgview_server/) [42]. The two mitogenomes have been deposited in the GenBank database under the accession numbers MG543473 for D. altior and MG543474 for D. troscheli.
The base composition and skewness analyses were performed and compared between D. altior and D. troscheli, as well as D. japonica. The A+T content values and nucleotide frequencies were computed using Editseq program from DNASTAR. The GC-and AT-skews were calculated according to the formulae by Perna et al. [43]: AT-skew = (A -T)/(A + T); GCskew = (G-C)/(G+C), where A, T, G and C are the occurrences of the four nucleotides. Codon usage analysis was performed by MEGA 6 [44]. The phylogenetic relationships were reconstructed based on nucleotide sequences of 12 PCGs (except atp8). The nucleotide sequences were aligned with Clustal X [45] using default settings, followed by manual correction.

Phylogenetic analyses
Phylogenetic trees were reconstructed using maximum-likelihood (ML) analysis and Bayesian inference (BI) approaches. For phylogenetic analyses based on nucleotide data, the most appropriate model GTR + G was selected by jMODELTEST using the Akaike Information Criterion (AIC). The ML analysis was conducted with PhyML 3.0 (http://www.atgcmontpellier.fr/phyml/) and 1000 bootstraps were used to assess the support of nodes. BI was performed using MrBayes 3.1.2 [46]. In the case of the Bayesian analysis, the Markov Chain Monte Carlo were run for 100,0000 generations with trees sampled every 100 generations. Stationarity was defined as mean standard deviation of split frequency less than 0.01.

Genome organization
The complete mt genomes of D. altior and D. troscheli are 17,536 bp and 17,229 bp in length, respectively, and the length of D. troscheli mitogenome is the smallest among the available mitogenomes of Veneridae currently. Length differences are mostly due to the size variation of the non-coding region. In the D. altior mitogenome, there are 22 non-coding regions with a total of 2385 bp long varying from 1 bp to 1865 bp, and the D. troscheli mitogenome contains 21 non-coding regions with a total of 2140 bp with various lengths of 2-1521 bp. The longest non-coding region of two Dosinia mitogenomes are both located between trnI and trnD.
Both D. altior and D. troscheli mitogenomes contain 13 protein-coding genes (PCGs), 22 transfer genes and 2 ribosomal RNA genes (Fig 1). Main structure features of three Dosinia mitogenomes are summarized in Table 2. All the genes are encoded on the (+) strand. As a whole, although gene organization is known to vary extensively, even among species from the same genus [30], the three Dosinia mitogenomes showed a same gene order within the genus.
AT content, AT-skew and GC-skew for mitochondrial sequences of two Dosinia species were shown in Table 3. The overall AT content of the D. altior mt genome is 69.57%, which is little lower than 69.67% of D. troscheli and 69.97% of D. japonica. Similar patterns appear in NCRs (72.45% vs. 74.07% vs.74.56%), tRNAs (72.04% vs. 72.72% vs. 72.23%), and rrnL (73.08% vs. 73.27% vs. 74.15%). In order to evaluate the base bias in the mitogenomes, we measured AT-skew and GC-skew in different gene regions of D. altior and D. troscheli, and compared with D. japonica. The results indicated that the values of AT-skew of three species were mostly negative except rrnS, as well as GC-skew values were all positive. Furthermore, the AT-and

Protein-coding genes
Both D. altior and D. troscheli mt genomes encode the full set of 13 proteins. The whole size of the PCGs of D. altior was 11,688 bp, higher than the length of D. troscheli which is 11,598 bp. The overall AT content of the 13 PCGs was 68.21% in the D. altior mitogenome, ranging from 65.68% (cox1) to 75.28% (nad4l), and in D. troscheli mitogenome, the overall AT content of 13 PCGs was 68.06%, which is ranging from 65.91% (nad1) to 71.79% (nad4l).
In the 13 PCGs identified in Dosinia mitogenomes, we found the atp8 which has been reported as missing in several bivalve species. Likewise, there are also many accurate searches referring to the identification of this gene, so the alleged lack of atp8 within some bivalve species is likely due to inaccurate annotation due to the extreme variability and the small size of the gene in most cases [4,13,15,18]. The atp8 was found in publicly available mitogenome sequences of Veneridae species (Table 4). In genus Dosinia, this short gene encoded a 38 amino acids protein starting with ATG and GTG (ATG in D. troscheli and D. japonica, GTG in D. altior), and end with a stop codon (TAA). The location of the atp8 within the mitogenomes is same in the three species of Dosinia genus, which is between rrnL and nad4. Amino acid sequences of these ATPases were aligned among three Dosinia clams, and the conversed motifs were found, in particularly, the N-terminus and the C-terminus (Fig 2).
Codon usages of three Dosinia species were presented in Table 5. All stop codons were excluded from the calculation. The mt genome of D. altior encoded 3883 amino acids in total, compared with 3853 in D. troscheli and 3813 in D. japonica, respectively. All codons are used in three mitogenomes but with different frequencies. The three predominant codon families are UUU (phenylalanine), UUA (leucine) and GUU (valine) in D. altior mitogenome, and there are UUU (phenylalanine), UUA (leucine) and AUU (isoleucine) in D. troscheli, as all of these codons are A+T-rich codon families. On the contrary, the least chosen codons are from G+C-rich codon families such as CGC (arginine) and CCC (proline).

Transfer RNA and ribosomal RNA genes
D. altior and D. troscheli mitogenomes contain the usual set of 22 tRNA genes typical of metazoans, two copies of trnS and trnL, and one tRNA gene for each of the other 18 amino acids. All the tRNA genes are encoded on the (+) strand. The tRNA genes are interspersed in the mt genome, and ranged from 61 bp (trnE and trnA) to 69 bp (trnK and trnQ). There are seven tRNA genes (trnS, trnP, trnF, trnW, trnG, trnT, trnM) which have identical lengths within the three species from Dosinia ( Table 2). The predicted secondary structures of tRNAs in two Dosinia clams were shown in S1 and S2 Figs. The majority of tRNAs can be folded into typical clover-leaf secondary structures except for trnS AGN and trnS UCN , which lack the DHU arm. However, absence of the DHU arm in the secondary structure of trnS AGN and trnS UCN is commonly observed in molluscs [33,47,48]. The trnY of D. altior showed no terminal TψC loop, which have previously been found in mtDNA of other bivalve species, such as genus Donax [4]. The accepter stem and anticodon stem of tRNAs in Dosinia mt genomes are 5-8 bp, while the dihydrouracil stem and TψC stem are 3-5 bp in length.
The rrnL and rrnS of D. altior and D. troscheli were identified by sequence comparison with the available 13 Veneridae species from GenBank, though an accurate delimitation of gene boundaries must await transcript mapping. All rrnS in Dosinia mt genomes were located between trnT and trnM, which was same with that in S. purpuratus and C. sinensis. The length of rrnS in three mt genomes ranged from 900 bp (D. altior) to 903 bp (D. troscheli), with A+T content between 69.18% (D. japonica) and 71.44% (D. altior). On the other hand, the rrnL was flanked by cytb and atp8 in all Dosinia mt genomes, just like in M. lyrata [34], M. lusoria [33], and M. lamarckii [32]. Its size varied from 1,200 bp (D. altior) to 1,203 bp (D. japonica), and Complete mitochondrial genomes of Dosinia and phylogenetic position within Veneridae A+T content ranged between 73.08% (D. altior) and 74.15% (D. japonica). In the mt genome of D. altior, the size of either rrnS or rrnL is the shortest within that of mitogenomes in the family Veneridae [10,27,30,[32][33][34].

Non-coding regions and repeat units
Differences of genome size are mainly due to different lengths of non-coding regions (NCRs) [30]. In the comparison of NCRs within the three mt genomes of genus Dosinia (Table 6), the mt genome of D. altior contained 23 NCRs ranging in size from 1 to 1865 bp, and the total length was 2385 bp, which was longer than D. troscheli. D. troscheli mitogenome contained 21 NCRs ranging in size from 2 to 1521 bp, and the whole length of that was 2140 bp. Both lengths of NCRs in D. altior and D. troscheli are shorter than that of D. japonica. Within these noncoding sequences, a 1521 bp and a 1865 bp nucleotide segments were putatively identified as the major non-coding region (MNR) of D. altior and D. troscheli. The MNR has a high A+T content of 71.26% in D. altior and 72.45% D. troscheli, which is higher than the average of the whole genome (69.57% in D. altior, 69.67% in D. troscheli) and PCGs (68.21% in D. altior, 68.06% in D. troscheli). The increased A+T content of MNR is thought to contain the signals for replication and transcription, and hence is referred to as the control region [49]. In addition, the MNR has been regarded as the most easily relocated segment followed by tRNA genes and PCGs [25,50,51]. For example, the gene arrangements of four Paphia genomes are identical, but locations of MNRs are obviously different [30]. However, in the present study, we observed that the MNRs of the three Dosinia clams were all located between trnI and trnD genes, although with different tandem repeats. Tandem repeats have been described in the non-coding regions of metazoan [52][53][54][55], and in family Veneridae, the tandem repeat units were regarded as a common feature for mitogenomes, such as in M. petechialis, M. lusoria, and R. philippinarum [56]. The tandem repeat units found in D. altior, D. troscheli and D. japonica were all occurring around about 450 bp from the 5' end of the noncoding region (Fig 3). There were 2.2 nearly identical copies of a 30 bp unit in the MNR of D. altior, and 2 nearly identical copies of a 30 bp sequence in D. troscheli, both the total lengths were shorter than that of D. japonica. D. japonica contained 1.9 copies of 75 bp repeat motif in MNR. Each tandem repeat motif of MNR in the Dosinia clams could form a secondary structure with a stem-loop when the sequence is folded to minimize the free energy of the structure. Great divergence in the tandem repeat region of different species which were closely related, even from same genus, was common, such as the M. lusoria and M. petechialis from Meretrix genus [33]. Further study of tandem repeats in the control region is needed, as it is important to illuminate a variety of process, including the molecular mechanisms for their generation and their possible functional implications [57].

Gene arrangement and phylogenetic analyses
Gene arrangements appear to be dramatically variable in the major groups of bivalves, even with differences in the same family or genus [19,28,29]. In this study, we compare the gene arrangement of three Dosinia mt genomes with that of other closed related species belonging Complete mitochondrial genomes of Dosinia and phylogenetic position within Veneridae to Veneridae family (Fig 4). There were 15 Veneridae clams from six different genera (Paphia, Meretrix, Ruditapes, Saxidomus, Cyclina and Dosinia), and exhibit nine different gene orders. In genus Dosinia, three Dosinia clams share the same gene order with each other among 37 genes, and at the same time, the four Paphia mitogenomes also have completely identical gene arrangements within genus. Unlike above two genera, there were four patterns of gene arrangement in the genus Meretrix, and this comparison was previously reported by Wu et al. [34] and Wang et al. [33]. M. lamarckii and M. lusoria have the identical gene order, but the other three Meretrix species have variable arrangement due to the tandem duplication of trnQ, trnI, trnN, and five different locations of tRNA genes (trnT, trnA, trnQ, trnN, trnR).
The tRNAs are more variable than either rRNAs or PCGs because of their secondary structure, which allow them to translocate more frequently [58,59]. After excluding tRNA genes from the comparison, the gene arrangements among Veneridae mitogenomes tend to be relatively conserved. In this regard, Genera Dosinia, Meretrix, Cyclina and Saxidomus show the same gene order except for translocations of genes nad3 and nad5 in Saxidomus; Dosinia and Ruditapes are almost identical except for translocations of genes rrnS and cox3. It is clear that Dosinia genus is more similar to Meretrix and Cyclina than Paphia in terms of gene arrangement. Furthermore, there is only one gene block (cox1-nad1-nad2-nad4l-cox2-cytb-rrnL) shared when comparing gene arrangement of PCGs and rRNAs in the 15 Veneridae species, and this gene block may be inherited from the common ancestor of Veneridae.
For the dramatic gene arrangement of mt genome during evolution in Veneridae, Xu et al. [27] ever explained the mt genomic rearrangement event among the R. philippinarum, P. euglypta and M. petechialis with the tandem duplication and random loss (TDRL) model, and then, the TDRL model also was used in the analyses of gene arrangements in five Meretrix clams [34]. The deduced evolutionary pathways from this model suggested that block interchange between adjacent genes might be common in the evolution of the mt genomes in venerids.
Comparative analysis of mitochondrial gene order has been proved to be a useful phylogenetic tool as reported in many other previous studies [4,18,48,[60][61][62]. In this study, the results are consistent with the conclusion from the phylogenetic analysis (see below), and support that gene order is a useful hallmark helping to clarify phylogenetic relationships within the family.
ML and BI trees based on the nucleotide sequences of 12 concatenated protein-coding genes (except atp8 gene) were performed to construct phylogenetic relationships within family Veneridae (Fig 5). The topological structures showed that the results of ML and BI trees were in almost complete agreement.
The phylogenetic analysis indicated that three species belonging to genus Dosinia, including D. japonica, D. altior and D. troscheli clustered together, supporting that Dosinia is monophyletic, which was in accordance with previous viewpoints [6,7]. At the same time, D. japonica and D. troscheli formed a single group at the Dosinia level, with D. altior being the sister taxon. This result is consistent with the morphological classification [35], in which D. altior was classified into the subgenus Bonartemis, D. japonica and D. troscheli were classified into another subgenus Phacosoma.
In this study, according to the phylogenetic tree, Dosininae and Meretricinae have a closer relationship, with Tapetinae being the sister taxon, which indicates Dosininae and Meretricinae may have a common ancestor based on the nucleotide sequences. Our results were in agreements with the previous studies discussing the phylogenetic relationships which including Veneridae family within Heterodont or Veneroida [3,13,14,16,18]. Besides, there is a disagreement about the sister relationship between Dosininae and Meretricinae, which ever Complete mitochondrial genomes of Dosinia and phylogenetic position within Veneridae reported by Chen et al. [6] and Lv et al. [10] based on short fragments of nuclear gene or mt DNA. This similar phenomenon had also happened to Cardioidea and Tellinoidea which was reported by Yuan et al. [18], and different evolutionary position of M. lyrata and M. lamarckii by Wu et al. [34] and Fernandez-Perez et al. [4]. The credibility of the results in this study include: i) the number of mt genome was increased to enhance the accuracy of evolutionary position, avoiding only one mitogenome of Dosinia considerable divergence from the other venerids; ii) mt genome contains more genetic information than short gene fragments, and short gene fragments existed the substitutional saturation of nucleotide and horizontal gene transfer; and iii) this conclusion was in accordance with the analyses of gene arrangement among 15 mt genomes in Veneridae family in this study.
The phylogenetic analysis here provided a real possible phylogenetic relationship among Dosinia species and their evolutionary position within family Veneridae, and in this sense, results of phylogenetic analyses support the idea that genome reorganization among congeneric species is not random, but is correlated with their phylogenetic relationships. In future, more detailed analyses with a larger taxon sampling and more rapidly evolving molecular markers including mt genome are still necessary in order to clarify the phylogenetic relationships within family Veneridae.

Conclusion
In this study, we sequenced the complete mt genomes of D. altior and D. troscheli, and compared the genus-specific genomic features of mitogenomes in Dosinia genus. Our study has provided insights into the evolutional relationships among Veneridae clams. Based on phylogenetic analyses of multiple protein-coding genes, we suggest that Dosinia genus would be evolutionarily closer to Meretricinae clams than Tapetinae, and support that the Dosinia is monophyletic. Nevertheless, additional analyses with mitochondrial genomic information of more diverse clam species will be required to further understand the phylogenetic relationships in family Veneridae including genus Dosinia.