First complete female mitochondrial genome in four bivalve species genus Donax and their phylogenetic relationships within the Veneroida order

Background Four species of the genus Donax (D. semistriatus, D. trunculus, D. variegatus and D. vittatus) are common on Iberian Peninsula coasts. Nevertheless, despite their economic importance and overexploitation, scarce genetic resources are available. In this work, we newly determined the complete mitochondrial genomes of these four representatives of the family Donacidae, with the aim of contributing to unveil phylogenetic relationships within the Veneroida order, and of developing genetic markers being useful in wedge clam identification and authentication, and aquaculture stock management. Principal findings The complete female mitochondrial genomes of the four species vary in size from 17,044 to 17,365 bp, and encode 13 protein-coding genes (including the atp8 gene), 2 rRNAs and 22 tRNAs, all located on the same strand. A long non-coding region was identified in each of the four Donax species between cob and cox2 genes, presumably corresponding to the Control Region. The Bayesian and Maximum Likelihood phylogenetic analysis of the Veneroida order indicate that all four species of Donax form a single clade as a sister group of other bivalves within the Tellinoidea superfamily. However, although Tellinoidea is actually monophyletic, none of its families are monophyletic. Conclusions Sequencing of complete mitochondrial genomes provides highly valuable information to establish the phylogenetic relationships within the Veneroida order. Furthermore, we provide here significant genetic resources for further research and conservation of this commercially important fishing resource.


Principal findings
The complete female mitochondrial genomes of the four species vary in size from 17,044 to 17,365 bp, and encode 13 protein-coding genes (including the atp8 gene), 2 rRNAs and 22 tRNAs, all located on the same strand. A long non-coding region was identified in each of the four Donax species between cob and cox2 genes, presumably corresponding to the Control Region. The Bayesian and Maximum Likelihood phylogenetic analysis of the Veneroida order indicate that all four species of Donax form a single clade as a sister group of other bivalves within the Tellinoidea superfamily. However, although Tellinoidea is actually monophyletic, none of its families are monophyletic.

Conclusions
Sequencing of complete mitochondrial genomes provides highly valuable information to establish the phylogenetic relationships within the Veneroida order. Furthermore, we provide here significant genetic resources for further research and conservation of this commercially important fishing resource. PLOS

Introduction
Bivalve molluscs of the genus Donax (Donacidae family) are an important constituent of the macrofauna of sandy beaches in temperate, tropical and subtropical zones, being the dominant organisms in this type of environment [1]. In the littoral of Iberian Peninsula, the five European species of Donax live sympatrically in the same beaches [2,3] [4,5,6,7]. Nevertheless, D. venustus is practically non-existent in the Iberian Peninsula as a single individual has been found between the years 2000 and 2006 along the south coast of Portugal [3]. Few species of the genus Donax are commercially exploited, but some are consumed locally or used as fishing bait. D. trunculus is exploited in many countries bordering the Mediterranean Sea and Atlantic Ocean, including Portugal [8,9], Italy [10], France [11], and Spain [12,13]. Only in Iberian Peninsula, the recorded captures since 1999 to 2014 equal 10,156 tons, with a maximum production of 1,042 tons in 2005 followed by an incessant decline reaching only 250 tons in 2014 [14]. Although this data only reflects production since fishermen were obliged to declare their captures [8], the species has been subjected to intense exploitation over the last decades and, currently, some D. trunculus populations seem to be at high long-term risk of extinction [15]. Furthermore, this species constitutes an important shellfish resource due to its high economical value. For instance, in Galicia (northwest of Spain), D. trunculus is a species with a high contribution rate, being the bivalve with greater commercial value (38.52 €/kg in the year 2016) [16] in markets during last years. Due to the similarity in size, shape and colour of the Donax clams in different species, captures of D. trunculus in natural beds may contain other species of the genus with lesser economical value and may be marketed together. However, despite their overexploitation and economic importance, relatively few genetic resources are available for this species [15,17] and the whole genus [18,19].
In order to preserve this important fishing resource, genetic tools should be employed. Molecular genetics has proven highly informative to determine the level of genetic variability, which is an essential feature to consider when defining conservation priorities, as well as to better understand the (recent) evolutionary history of species groups. Within the molecular resources, mitochondrial (mt) genome stands out to be considered a useful tool for population genetic and phylogenetic studies, not only because complete mt genomes are often more informative than single genes, but also because they reveal some genome-level details, such as the rearrangement of genes, which are valuable information for studies of evolutionary relationships among species [20,21,22,23]. Moreover, mitochondrial DNA (mtDNA) is particularly important in helping to differentiate species that are morphologically similar, contributing to the identification and authentication of commercial food species to detect and avoid fraud, to protect consumer rights and to achieve other quality objectives, such as certificate of origin.
Most metazoan mitochondrial genomes are typically closed circular molecules of~16 kb, enconding 37 genes: 13 protein-coding genes (PCGs), 22 transfer RNA (tRNA) genes and two ribosomal RNA (rRNA) genes [24]. In addition, at least one extensive non-coding sequence is present which contain elements that control the initiation of replication and transcription [25]. Mitochondrial genome has several valuable features that make it exclusive, including its small size, high evolutionary rates, limited recombination, relatively conserved gene content and organization, and maternal inheritance [22,26]. Though, an extreme exception to the paradigm of strict maternal inheritance of animal mtDNA (SMI) is found in some bivalve lineages, which possess an unusual system known as doubly uniparental inhertince (DUI) ( [27,28,29] for reviews). Species showing DUI display two different kinds of mitochondrial genomes, i.e. male (M) and female (F) mitogenomes. While females have only the F genome, males are heteroplasmic and possess F and M genomes, which the F type predominating in somatic tissues and the M one in gonads [30,31]. To date, the vast majority of species with DUI which have been reported belong to the orders Mytiloida, Nuculanoida, Unionoida and Veneroida [32], including the wedge clam D. trunculus [33].
In this study, we determine, for the first time, the complete female mitochondrial (mt) genome sequences in four species of Donax from the Iberian Peninsula, and compare them with those of other marine bivalves. In addition, the four newly sequenced mitogenomes, together with the veneroids mt genomes available in GenBank, were used to construct the phylogenetic relationships in the Veneroida order. This work should be of importance not only for better understanding the phylogenetic relationships within the Veneroida order, but also for the development of genetic markers useful in wedge clams aquaculture and restoration effects, as well as for the identification and authentication of commercial species.

Ethics statement
All clams handling was conducted in accordance with the guidelines and regulations established by the University of A Coruña and the local governments. Field sampling did not require specific permissions but was in accordance with general governmental regulations. No endangered or protected species were involved.

Samples collection and DNA extraction
Given that DUI has been described in D. trunculus [33] and we have found evidence for it in D. vittatus and D. semistriatus [34], and since the goal of our work was on female mtDNA, we used somatic cells of female specimens as the only source for mtDNA sequencing. Therefore, each of the four Donax complete mt genomes sequenced here was obtained from a single female specimen in each species, sampled at natural beds. The D. trunculus sample was collected at Corrubedo (A Coruña, northwestern Spain) while the D. semistriatus, D. variegatus and D. vittatus samples came from the Portuguese coast (Table 1). Gender determination was performed on each individual by microscopic examination of gametogenic tissue from the visceral mass, and was based on the presence of eggs or sperm. Specimens were taxonomically identified using Pereira et al. 2012 [18] and Nantón et al. 2015 [19] molecular protocols developed in our laboratory. Voucher specimens and their shells were deposited at the malacology collections of the Museo Nacional de Ciencias Naturales (MNCN), Madrid (Spain) ( Table 1).
Total genomic DNA was extracted from about 40 mg of ethanol-preserved foot muscle tissue of female specimens using DNAeasy Blood and Tissue Kit (Qiagen, Germany) following manufacturer´s instructions with only a minor modification, namely EB (10mM Tris-Cl, pH 8.5) rather than AE (10mM Tris-Cl, 0.5 mM EDTA, pH 9.0) buffer was used to avoid possible interference of EDTA with Nextera enzyme.

Mitogenome assembly and annotation
The mt genomes were reconstructed using 2x1,000,000 reads per species with the MITObim assembler [35]. We performed a first assembly with the -quick option, which resulted in a partial mt genome sequence of about 10,000 bp. In order to get the complete sequence, we extracted the sequence of the COI gene from the previous assembly to be used as starting sequence in MITObim with the -seed option. This yielded sequence of about 17,000 bp whose quality and completeness were assessed on the basis of their average coverage along their whole length, by mapping, in each species, the same 2x1,000,000 reads used in the assembly against the inferred mitogenome sequence. For this purpose, we used the SSAHA2 software [36] with a minimum score of 100. Then we extracted coverage information from these mapping using pysamstats (available at: http://github.com/alimanfoo/pysamstats).
The mt genomes were annotated using the MITOS Web Server [37] applying the invertebrate mitochondrial genetic code and followed by manual validation of the coding regions using the NCBI ORF Finder (https://www.ncbi.nlm.nih.gov/orffinder/). Based on ORF Finder result, the sqn files generated from MITOS were edited and submitted to NCBI. The annotations of PCGs were refined, while the annotations of tRNA genes were kept unchanged. tRNA genes were detected using MITOS, tRNAScan-SE v.2.0 [38] and ARWEN v.1.2 [39]; and secondary structures of tRNAs were inferred using MITOS in default search mode. Mitogenome maps were drawn using GenomeVx online tool [40] followed by manual modification. Repeat sequence patterns in the longest non-coding region (NCR) were checked using the web-based software server Tandem Repeats Finder (http://tandem.bu.edu/trf/trf.html) [41].

Phylogenetic analyses
To investigate the phylogenetic relationships between species of the Veneroida order, we used the 33 mitogenomes currently available in GenBank (last accessed 17 January 2017), in addition to the four newly determined in this work. Lucinella divaricata and Loripes lacteus, belonging to the order Lucinoida, were used as outgroups ( Table 2). Owing to the fact that a lack of the Atpase subunit 8 (atp8) gene has been reported in some bivalve species, we investigated the possibility that its presence might have gone unnoticed in these species by actively searching for atp8 sequence in an annotation with MITOS and aligning with other mitogenomes using Geneious Pro v.4.8.5 [42]. We found the atp8 gene in eight species where previous analyses had concluded the absence of this gene. The alignment of the amino acid sequences for each of the 13 mitochondrial PCGs was performed with the MUSCLE plug-in in Geneious Pro v.4.8.5 [42] with default parameters. We removed poorly aligned regions with Gblocks v.0.91b [43], with options allowing gaps for all positions and 85% of the number of sequences for flanking positions. The 13 separate amino acid sequence alignments were then concatenated into a single large dataset consisting of 2617 sites (S1 File).
Phylogenetic analyses were performed under Maximun Likelihood (ML) using RaxML [44] in a web server (http://embnet.vital-it.ch/raxml-bb/) and Bayesian inference (BI) using MrBayes v3.2.6 [45] and PhyloBayes [46]. The best fit models of amino acid evolution were LG + G + F for cox2, atp6, nad6 and atp8; and LG + G for nad4l. The ML analyses consisted of 1000 bootstrap iterations using the CAT model for each partition. BI analysis consisted of two independent Markov chain Monte Carlo (MCMC) runs, each comprising four linked chains (one cold and three heated; as default settings). They were performed for 1,000,000 generations, sampling every 100 generations to allow adequate time for convergence. The convergence of the two runs was assessed by stopping the analysis when the average standard deviation was below 0.01 (stoprule = yes and stopval = 0.01 in the mcmc command). 1,000,000 generations were enough to reach adequate average standard deviation (<0.01). By default, the first 25% trees were discarded as burn-in. BI analyses were also conducted at the amino-acid level using the CAT + GTR model in PhyloBayes [46]. Two independent MCMC analyses were run in parallel for 4,000 generations. The first 1,000 samples were discarded as burn-in. From the remaining samples, we sampled a tree every 10 cycles to compute a consensus tree. The convergence between the two chains were considered acceptable when the maxdiff parameter was below 0.3 (maxdiff = 0.218586) and the minimum effective size (MES) was >50 (MES = 64).

Genome composition
The mitogenomes of the four Donax species sequenced in this study were circular molecules, as revealed by the MITObim assembly. They are composed of 37 genes: 13 PCGs (including the atp8 gene), two ribosomal RNA genes and 22 transfer RNA genes (Fig 1). Their main structural features are summarized in Table 3 Although gene organization is known to vary extensively, even among species from the same genus [22,48,49], all four complete Donax mt genomes showed the same gene order and they are located on the "+" strand, likewise in Macoma balthica, other member of the Tellinoidea superfamily for which the whole mt genome is available [50]. The only difference was noted in the location of the longest NCR which, in M. balthica, is situated between rrnS and tRNA-Met, whereas in Donax clams it is located between cob and cox2 genes (Fig 1). Therefore, in consistency with the highly rearranged gene order in bivalves, the longest NCR is not conserved at the same position among bivalve mt genomes [51,52].

Protein coding genes
The typical 13 PCGs were identified in the four new mitogenomes analyzed here, including the atp8 gene, which had been reported as missing in several bivalve species [51,53,54,55,56,57,58], but subsequent analysis found its presence in several of them [48,50,52,59,60,61,62]. It was suggested that the short and variable length of this protein, along with its high variation in amino acid composition, might hinder the finding of this gene due to annotation difficulties [22]. However, using the same bioinformatic approach employed in Donax species, we found the atp8 gene in publicly available mitogenome sequences of most Veneroida order species available in the databases (Table 4). Moreover, we found other discrepancies with Gen-Bank annotations.    [63]. However, in Donax species atp6 and atp8 genes are physically separated by 1,917 (D. trunculus)-1,928 bp (D. vittatus). Likewise, these two genes also fail to be adjacent in the mitogenome of other heterodont bivalves, such as Hiatella arctica [59], M. balthica [50] and Meretrix lamarckii [64]. On the contrary, they are adjacent in the Unionidae [65] and Solemydae [66], as well as in basal molluscs like Chaetoderma nitidilum (EF211990) and Katharina tunicata [67]. This suggests that the association of these genes might be an example of an ancestral state that has later been lost in derived bivalves. Total length of the 13 PCGs ranged from 11,718 bp (D. variegatus) to 11,829 bp (D. trunculus), accounting for 68.1-69.2% of its total mt genome length. The longest PCG is nad5, with a size of 1,734 bp (577 aa), whereas nad2, cox1, nad4 and cob exceed 1,000 bp. However, nad3 and nad4l genes are shorter than 400 bp and atp8 gene is the shortest PGC with 126 bp (41 aa). These features are similar to those previously reported in M. balthica [50] and five other species of the Tellinoidea superfamily (Moerella iridescens, Sanguilonaria diphos, Sanguinolaria olivacea, Semele scabra and Solecurtus divaricatus) [51].
The ATN conventional start codon is used in most PCGs (ATG, N = 41; ATA, N = 2; the last codon being classically found in the invertebrate mitochondrial genetic code, particularly in bivalves [50]). However, like most invertebrate mt genomes, Donax mtDNA shows alternative start codons, and some PCGs start with NTG codons (TTG, N = 8; GTG, N = 1). In contrast, the observed stop codons are TAA (N = 32) and TAG (N = 20), and all 13 PCGs of the four mt genomes end in a full termination codon.
Twenty-two discrete nucleotide sequences (ranging from 62 to 69 bp) were predicted to fold into the typical secondary structures of tRNAs (see S2-S5 Figs). The predicted structures of tRNA genes showed cloverleaf shape with four arms in the four species, although some of them exhibited folding differences. Sixteen tRNAs showed a small supplemental stem loop (four in D. semistriatus: tRNA-Pro, tRNA-Phe, tRNA-Ile and tRNA-Leu2; two in D. trunculus: tRNA-Ile and tRNA-Thr; six in D. variegatus: tRNA-Val, tRNA-Pro, tRNA-Gln, tRNA-His, tRNA-Ile and tRNA-Arg; and four in D. vittatus: tRNA-Pro, tRNA-Phe, tRNA-Ile and tRNA-Leu2). Seven tRNAs showed no terminal TCC loop (three in D. semistriatus: tRNA-His, tRNA-Thr and tRNA-Arg; one in D. trunculus: tRNA-Asn; and three in D. vittatus: tRNA-His, tRNA-Thr and tRNA-Asp). In addition, tRNA-Ser2 in D. trunculus showed the dihydrouracil (DHU) stem replaced by a big DHU loop. Finally, the single unpaired nucleotide, which is usually present at the 5´end in other tRNAs, appeared at the 3´end in tRNA-Tyr, with the only exception of D. variegatus where this tRNA lacks this unpaired nucleotide. These features have previously been found in mtDNAs of other bivalve species, such as M. balthica [50] and M. lamarckii [64].

Non-coding regions
As in most bivalves, the four species of the genus Donax analyzed here contained a large number of NCRs. The number of intergenic sequences varied from 17 (D. trunculus and D. vittatus) to 22 (D. variegatus), with 1,679 bp (representing 9.9% of the whole mitogenome) in D. semistriatus to 1,985 bp (11.4% of the mt genome) in D. trunculus ( Table 5). The longest NCR was First complete female mitogenome in four Donax species located between cob and cox2 genes in the four species, with length ranging from 1,549 bp (D. semistriatus) to 1,863 bp (D. trunculus). The other NCRs ranged fom 1 to 21 bp. The longest NCR is thought to contain the Control Region (CR) because it presents some peculiar patterns, such as AT-rich or tandem repeats, believed to play a role in initiating and/or regulating mitochondrial transcription and replication [24,68,69]. The A+T content of the longest NCR in each mt genome was higher (D. semistriatus, D. variegatus and D. vittatus) or slightly lower (D. trunculus) than that of the whole mt genome (Table 5).
Six tandem repeats were also found in the longest NCRs of the four mt genomes, four of which were distinct tandem repeat units.  [51,55,59,62]. The study of tandem repeats in the CR is important for the light it sheds on a variety of processes, including the molecular mechanisms arising them and their possible functional implications [70].

Phylogenetic analysis in Veneroida
To further study the relationships among Donax species and its position within the Veneroida order, ML and BI trees based on amino acid sequences of 13 concatenated PCGs belonging to 37 species were performed (Fig 2). Tree topologies were congruent and received high support in most nodes, with the exception of S. scabra, which showed a less basal position in the Phylo-Bayes phylogeny ((M. balthica + M. iridescens) + S. scabra) with 0.57 posterior probability as branch support.
We perform here the first phylogeny including the species of the genus Donax from the Iberian Peninsula (D. trunculus, D. semistriatus, D. variegatus and D.vittatus). Our analysis has shown that the four species form a single clade as a sister group to other bivalves of the superfamily Tellinoidea. All ten species of this superfamily belong to five different families and form a strongly supported clade, thus corroborating the monophyly of this superfamily [71,72]. Nevertheless, our phylogenetic tree indicated, with high support by BI and ML, that S. diphos (Psammobiidae) shows closer relationship with S. divaricatus (Solecurtidae), M. balthica and M. iridescens (Tellinidae), S. scabra (Semelidae) and Donax species (Donacidae) rather than with N. olivacea (Psammobiidae), which implies that these two species (S. diphos and N. olivacea) do not form monophyletic groups. This result is also reported by Yuan et al. 2012 [51] and Ozawa et al. 2017 [73], and it is in agreement with the conclusion put forward by Taylor et al. 2007 [71] when analysed familial relationships within Tellinoidea, as Semelidae, Donacidae and Tellinidae do not form monophyletic groups. Tellinoidea is actually monophyletic, but none of its families are monophyletic [72], suggesting the need for a more exhaustive study within this commercially important marine bivalve clade. Gene arrangement within mitogenomes is highly conserved in many taxonomic groups. For instance, most vertebrates share the same gene order [74]. However, in other animal groups, like the class Bivalvia, the mitochondrial genome arrangement is more variable [51,52]. We compare here the gene arrangements of four newly sequenced mitogenomes to other closed related species belonging to Tellinoidea superfamily. This comparison was previously done by by , without taking into account the atp8 gene and without including Donax species and M. balthica, and their results supported the conclusion that comparisons of mitochondrial gene order rearrangements are, to some extent, a useful tool for phylogenetic studies. Seven out of the ten Tellinoidea mitogenomes hitherto analyzed (including the four Donax species analyzed by us, M. balthica, M. iridescens and S. divaricatus) show completely identical gene order, and S. diphos only differs in lacking a tRNA-Phe. Remarkably, the atp8 gene shows the same location within the mitogenome of these eight species of the Tellinoidea superfamily, specifically between tRNA-Met and tRNA-Ser1. This result is consistent with the main phylogenetic conclusions from the 37 mitogenomes analyzed here (see above), and remarks the interest of performing additional full mitogenome sequencing, especially including more veneroid families and subfamilies, with gene order being a useful hallmark helping to clarify phylogenetic relationships within the order.

Future implications
This is a basic research work where we describe and characterize, for the first time, the female mitochondrial genome in four bivalve molluscs belonging to the genus Donax. This has provided new interesting information for the scientific community which can be feasible for First complete female mitogenome in four Donax species application in aquaculture. In fact, the mtDNA sequences contributed here add significantly useful genetic markers for i) helping to differentiate these commercial food species being morphologically similar, ii) detecting and avoiding fraud, iii) protecting consumer rights and achieving other quality objectives, such as certificate of origin, and iv) for using in population genetics studies and aquaculture stock management in Donax species. However, this possible applicability requires a broader work, where the different markers will be tested in a higher number of individuals, not only fresh individuals but also processed, packaged or frozen ones, as well as in a high number of females and males given that male genomes are still not available.

Conclusions
In this study, we determined the complete mt genomes of four bivalve species of the genus Donax, which are the first representatives from the family Donacidae being analyzed at this respect. Not only we have increased the number of complete mt genomes sequenced within Veneroida order, but also, we have illustrated the phylogenetic relationships among Donax species and their position within this order. Our results demonstrate that the sequencing of complete mitogenomes provides highly valuable information for phylogenetic analysis in bivalves. Furthermore, the mtDNA sequences contributed here add significantly useful genetic markers for use in species identification and authentication, phylogeny, population genetics, and aquaculture stock management in species of Donax.