The Complete Mitochondrial Genome of Galba pervia (Gastropoda: Mollusca), an Intermediate Host Snail of Fasciola spp

Complete mitochondrial (mt) genomes and the gene rearrangements are increasingly used as molecular markers for investigating phylogenetic relationships. Contributing to the complete mt genomes of Gastropoda, especially Pulmonata, we determined the mt genome of the freshwater snail Galba pervia, which is an important intermediate host for Fasciola spp. in China. The complete mt genome of G. pervia is 13,768 bp in length. Its genome is circular, and consists of 37 genes, including 13 genes for proteins, 2 genes for rRNA, 22 genes for tRNA. The mt gene order of G. pervia showed novel arrangement (tRNA-His, tRNA-Gly and tRNA-Tyr change positions and directions) when compared with mt genomes of Pulmonata species sequenced to date, indicating divergence among different species within the Pulmonata. A total of 3655 amino acids were deduced to encode 13 protein genes. The most frequently used amino acid is Leu (15.05%), followed by Phe (11.24%), Ser (10.76%) and IIe (8.346%). Phylogenetic analyses using the concatenated amino acid sequences of the 13 protein-coding genes, with three different computational algorithms (maximum parsimony, maximum likelihood and Bayesian analysis), all revealed that the families Lymnaeidae and Planorbidae are closely related two snail families, consistent with previous classifications based on morphological and molecular studies. The complete mt genome sequence of G. pervia showed a novel gene arrangement and it represents the first sequenced high quality mt genome of the family Lymnaeidae. These novel mtDNA data provide additional genetic markers for studying the epidemiology, population genetics and phylogeographics of freshwater snails, as well as for understanding interplay between the intermediate snail hosts and the intra-mollusca stages of Fasciola spp..


Introduction
Many snails within the families Lymnaeidae and Planorbidae act as intermediate hosts of medically and veterinary important digenean trematodes that infect humans and domestic animals (especially sheep and cattle) [1,2]. Galba pervia (Pulmonata: Lymnaeidae) is widely distributed and is the dominant host snail for transmission of Fasciola spp. in China [3]. Fascioliasis caused by Fasciola spp. is a signifancant disease of livestock animals causing substantial economic impact [4][5][6][7]. More importantly, millions of humans have been infected by Fasciola spp. in a number of countries [8].
The metazoan mitochondrial (mt) genome, ranging in length from 14 to 18 kb, is typically circular and usually contains 36-37 genes, including 12-13 protein-coding genes, 2 ribosomal RNA (rRNA) genes and 22 transfer RNA (tRNA) genes [9]. In addition, metazoan mt genome usually contains at least one lengthy noncoding region which is essential regulatory element for the initiation of transcription and replication [9]. Mitochondrial DNA (mtDNA) has long been extensively used as genetic markers to resolve evolutionary relationships among animal species due to their maternal inheritance, higher mutation rates than nuclear genes, and relatively conserved genome structures compared to ribosomal DNA [10][11][12][13][14][15].
In coelomate animals, mt gene arrangements are usually relatively stable within each phylum [16]. However, Mollusca, the second largest animal phylum, exhibit high diversity in their mt genome structures. For example, the mt genomes of Mytilus edulis, Argopecten irradians and Chlamys farreri contain supernumerary or lost tRNA genes [17,18]. The Gastropoda are the largest class of the Mollusca and their mt gene arrangements also exhibit high levels of varibility [16]. There have been considerable controversies regarding the phylogenetic relationships of Gastropoda. Gastropoda were traditionally classified into three main subclasses based on their morphological characters: Prosobranchia, Opisthobranchia and Pulmonata. In modern taxonomies, Opisthobranchia and Pulmonata usually are clustered together in the clade Euthyneura [19,20]. When mt gene rearrangements occur, they may usually provide very powerful phylogenetic information for resolving phylogenetic relationships among taxa [21]. Compared to other metazoan animals, only 21 complete mt genome sequences of Pulmonata species have been sequenced and deposited in GenBank (Table 1) to date, and only a low quality mt genome has been determined for the family Lymnaeidae [22].
The objectives of the present study were to determine the complete mt sequence of the G. pervia, to compare the mt sequence with those of Radix balthica to infer further insights into the high variability of Gastropoda mitochondrial genomes, and to study phylogenetic relationships of Pulmonata using mt sequence dataset.

Genome content and organization
The complete mt genome of G. pervia was 13,768 bp in length (Figure 1), and the mtDNA sequence was deposited in GenBank (accession number JN564796). The G. pervia mt genome contains 13 protein-coding genes (cox1-3, nad1-6, nad4L, atp6, atp8 and cytb), a small subunit ribosomal RNA gene (rrnS), a large subunit ribosomal RNA gene (rrnL), and 22 transfer RNA genes, but without lengthy non-coding regions ( Table 2). As found in other Gastropoda species, most of these genes are coded on the heavy strand (H-strand) except for atp6, atp8, nad3, cox3, 8 tRNA genes and rrnS. The details of gene locations were given in Table 2.
The nucleotide compositions of the complete mtDNA sequence of G. pervia are biased toward A and T, with T being the most favored nucleotide and C the least favored, in accordance with the mt genome of R. balthica and Biomphalaria glabrata and B. tenagophila [22,23]. The content of A+T is 72.67% for G. pervia (32.21% A, 40.46% T, 14.58% G and 12.75% C) (  13.50% G and 10.72% C), respectively. Strand asymmetry (strand compositional bias) is usually reflected by skewness [24], which is calculated as (A%2T%)/(A%+T%) and (G%2C%)/(C%+G%), respectively. AT-skews and GC-skews of the whole mt genome were calculated for Pulmonata species to date (Table 3). This composition of full mtDNA sequence of G. pervia is strongly skewed away from A in favor of T (AT skew = 20.114), and GC skew = 0.067. The pattern of skew values of G. pervia is highly congruent with those observed in the mtDNA sequences of other pulmonate animals (Table 3). Previous studies suggested that GC skew is the best indicators of strand asymmetry [25]. Hence, all Pulmonata species in the present study show strand asymmetry (GC skew between 20.071and 0.210) ( Table 3). Interestingly, in all the mt genome sequences of Pulmonata species reported to date, only the GC skew of Rhopalocaulis grandidieri has negative value due to that the C content of R. grandidieri mt genome is relatively higher than its G content. In mammals, these asymmetrical and biased base composition of mt genomes may be due to the spontaneous deamination process of C and A in the H-strand during replication [26,27].
The gene arrangement differs among the mt genome sequences of 21 pulmonate animals, including that of R. balthica and B. glabrata and B. tenagophila (not shown). The gene arrangement in protein-coding genes seems to be stable in this group with a few exceptions. However, tRNA gene arrangement among these pulmonate animals is highly diversified, which provides further support for considerable variation in mt gene arrangement among pulmonate animals. Compared to the mt gene arrangement of R. balthica, the mt genome of G. pervia shows a novel gene arrangement ( Figure 2). All rearranged genes are tRNA genes, in which 3 tRNA genes (tRNA-His, tRNA-Gly and tRNA-Tyr) changed their positions or directions ( Figure 2). The tRNA-His arrangements represent translocation, tRNA-Gly arrangements represent shuffling, and tRNA-Tyr arrangements represent remote inversion translocations. Generally, tRNA gene rearrangements can be classified as translocations (across a protein-coding gene), local inversion (inverted but remaining in the position), remote inverted (translocated and inverted), and shuffling (on the same mt strand but in a different position) [28]. Within the Gastropoda, gene rearrangements have been particularly prevalent, and the mt gene rearrangment events in Gastropoda mt genomes have been discussed [29,30]. To date, four mechanisms have been proposed to explain mt gene rearrangement in metazoan: (i) the tandem duplication followed by random loss (TDRL) of supernumerary genes owning to selection favoring small genomes [31,32], (ii) illicit priming of replication by tRNA genes [33], (iii) tandem duplication followed by nonrandom loss of excess genes [34], and (iv) nonhomologous intergenome or intragenome recombination is presumed to be the most possible explanation for local inversion [35]. Considering all the data available so far, these mechanisms still do not explain mt gene rearrangment events in Gastropoda, because these compactly organized mt genomes of Gastropoda (with very few and short noncoding sequences) suggest strong selection against maintaining remnants of duplication events [36]. However, in G. pervia, a 6 bp and in R. balthica a 156 intergenic region were found in the location of tRNA-His, which might support TDRL as a mechanism acting in Pulmonata mt genome rearrangements.
Overlapping of adjacent genes (protein-encoding genes overlapped) is common in many animal mt genomes, although the extent of overlaps varies [37][38][39][40]. The mt genes of G. pervia overlap a total of 526 bp in 21 locations which range from 1 to 158 bp, including overlapping between protein-encoding and tRNA genes ( Table 2) which has also been found in Gastropoda mt genomes [41,42]. The G. pervia mt genes are separated by intergenic spacer sequences of a total of 179 bp in length, which are located in 13 regions and range from 1 to 42 bp in size ( Table 2). The longest intergenic region (42 bp) is located between cox3 and tRNA-Ile genes.

Protein-coding genes and codon usage patterns
The boundaries between protein-coding genes in the mt genome of G. pervia were determined by aligning their sequences and by identifying translation initiation and termination codons with comparison to those of R. balthica and Biomphalaria spp.. The predicted translation initiation and termination codons for the protein-coding genes of G. pervia mt genome were compared with those of R. balthica and Biomphalaria spp.. As shown in Table 2, the start codons of 13 protein-coding genes are ATN codon, which is typical of most metazoan mt genomes. The start codons inferred in the mt genome of G. pervia are ATA and ATG, and all readingframes of the G. pervia ended with TAG or TAA as termination codons, whereas no anomalous initiation codons and incomplete stop codons are used, although they frequently occur in proteincoding genes of most Gastropoda mtDNA genomes [29].
The pattern of codon usage in the mtDNA of G. pervia was also studied. Excluding the termination codons, a total of 3655 amino acids are encoded by the G. pervia mt genome. Codons composed of A and T are more frequently used in protein-coding genes, which seems to reflect the high content of A+T in the entire mt genome of G. pervia. For example, the most frequently used amino acids were Leu (15.05%), followed by Phe (11.24%), Ser (10.76%) and IIe (8.34%) ( Table 4).

Transfer RNA and ribosomal RNA genes
The 22 tRNA genes in G. pervia mt genome vary in length from 53 to 75 nucleotides with differences in stem and loop sizes of dihydrouridine (D) and TYC loops. The 22 tRNA genes are located on both strands. Of these, 14 tRNA are encoded on the Hstrand and 8 on the light strand (L-strand) ( Table 2). The order and orientation of the gene arrangement pattern are identical to that of R. balthica, except for the positions and directions of the tRNA-His, tRNA-Gly and tRNA-Tyr genes. All of the 22 tRNA genes can be folded into normal cloverleaf structure, except for tRNA-Ser (AGN) that lacks DHU arm. Their putative secondary structures are similar to those of R. balthica or B. tenagophila (not shown), indicating their similar functions. In mt genomes of most Gastropoda animals, tRNA-Ser (UCN) generally lacks DHU arm [30,42,43], but it has a standard cloverleaf structure in mt genome of G. pervia. It is interesting that the secondary structures of mt tRNA genes are highly variable among Gastropoda animals. Previous studies suggested that the reduction of tRNA stem was caused by a strong pressure for mt genome minimization [44].
The rrnS and rrnL genes of G. pervia were identified by sequence comparison with those of R. balthica. The rrnS is located between tRNA-Glu and tRNA-Met, and the rrnL is located between tRNA-Val and the tRNA-Leu CUN . The lengths of the rrnS and rrnL genes of G. pervia are 713 bp and 1009 bp, respectively. The lengths of the rrnS and rrnL genes of R. balthica are 713 bp and 969 bp, respectively. The A+T contents of the rrnS and rrnL of G. pervia are 72.09% and 74.93%, respectively. Sequence identities in the rrnS and rrnL genes between G. pervia and R. balthica are 83.45% and 81.97%, respectively.

Phylogenetic analyses
The phylogenetic relationships of 20 Pulmonata species based on concatenated amino acid sequence datasets, plus the mt DNA sequence of G. pervia obtained in the present study, using maximum parsimony (MP), maximum likelihood (ML) and Bayesian analyses (Bayes) analyses are shown in Figure 3. The amino acid sequences of R. balthica were not used due to its low quality mt genome. The topologies of the trees from the MP, ML and Bayes analyses were identical or similar. In the tree, two major clades were recovered within Pulmonata: clade I and clade II form monophyletic groups, respectively. Within the clade I, Siphonaria pectinata and S. gigas clustered together with high statistical support, indicating that S. pectinata is sister to S. gigas. Within the clade II, Placobranchidae+Volvatellidae and other families form monophyletic groups, respectively.
The results of the present study revealed that G. pervia and B. tenagophila are closely related with high statistical support, indicating that Lymnaeidae and Planorbidae are closely related families, consistent with previous classification based on their morphological features and molecular data [45,46]. Interestingly, MP and ML analyses revealed that families Amphibolidae and Pyramidellidae are not closely related (with very low bootstrap probability value), whereas Bayes analysis strongly supported that Amphibolidae and Pyramidellidae are closely related with high posterior probability value (Figure 3). Recent molecular phylogenetic studies revealed that the monophy of Pulmonata was clearly rejected by the phylogenetic position of a representative of the Systellommatophora (Onchidella celtica) which is more closely related to Pyramidella dolabrata than it was to any other Pulmonate [47]. This is probably due to the use of a small dataset. Thus, sampling of more taxa is needed to more accurately define phylogenetic positions of these families in further studies. In the present study, the phylogenetic status of the family Ellobiidae remains problematic, consistent with findings of recent molecular phylogenetic studies [48,49].
Here, we will not further discuss the phylogenetic relationships of these families (eg., Clausiliidae, Aplysiidae, Volvatellidae, Pleurobranchidae, Siphonariidae and Ellobiidae) since their phylogenetic relationships have been discussed in detail by a recent study using nearly 80 Pulmonata species based on analyses of 18S, 16S and CO1 sequences [50]. Some recent studies showed that analyses using whole mt genome sequences (amino acid sequences) yield more accurate results than that using singe or small sets of gene sequences [51,52]. Thus, more mt genomes of Pulmonata need to be sequenced and evolutionary relationships of Pulmonata should be reexamined.
In conclusion, the present study determined the mt genome sequence of G. pervia, which represents the first sequenced high quality mt genome of freshwater snails of the family Lymnaeidae. The G. pervia mt genome exhibits novel mt gene arrangement compared with other Pulmonata species. Phylogenetic analysis based on the mt amino acid sequence dataset revealed that Lymnaeidae and Planorbidae are closely related families, supporting previous classifications based on their morphology and molecular studies. These novel mtDNA data should be useful for further studying the population genetics and phylogeographics of this freshwater snail, which in turn would contribute to the effective control of Fasciola spp. transmitted by it. Figure 2. Comparison of the mitochondrial gene arrangement between Galba pervia and Radix balthica. All genes have standard nomenclature including the 22 tRNA genes, which are designated by the one-letter code for the corresponding amino acid, with numerals differentiating each of the two leucine-and serine-specifying tRNA (L1 and L2 for codon families CUN and UUR, respectively; S1 and S2 for codon families AGN and UCN, respectively). Underlined genes are coded on the light strand. doi:10.1371/journal.pone.0042172.g002

Sample collection and DNA extraction
An adult freshwater snail representing G. pervia was collected from Guilin City, Guangxi Zhuang Nationality Autonomous Region, China. The specimen was washed in physiological saline, identified morphologically to species according to existing keys and descriptions [53], fixed in 70% (v/v) ethanol and stored at 220uC until use.
Total genomic DNA was extracted from its thoracic muscle tissue by treatment with sodium dodecyl sulphate/proteinase K (Merck), followed by purification using Wizard TM DNA Clean-Up System (Promega) and then eluted into 60 ml H 2 O according to the manufacturer's recommendations. The DNA sample was stored at 220uC until further use.

Long-PCR amplification and sequencing
The obtained nucleotide sequences of partial cox1, nad1, cytb, cox3 and rrnL were used to design primer sets (Table 6) for long PCR amplification of the entire G. pervia mt genome. Five overlapping long PCR fragments covering the entire mt genome of G. pervia were obtained. The Long-PCR reaction volume amounted 50 ml containing 27.5 ml sterile deionized water, 5.0 ml 106LA PCR Buffer (Mg 2+ free), 5.0 ml MgCl 2 (25 mM), 8.0 ml dNTPs (2.5 mM each), 0.5 ml each primer (50 pmol/ml), 0.5 ml LA Taq DNA polymerase (5 U/ml, Takara) and 3 ml DNA template (40 ng/ml). Long-PCR cycling conditions used were 92uC for 2 min (initial denaturation), then 92uC for 10 s (denaturation), 40uC for 30 s (annealing), and 60uC for 5 min (extension) for 5 cycles, followed by 92uC for 10 s, 40uC for 30 s, and 66uC for 5 min for 20 cycles and a final extension at 66uC for 10 min. All amplifications were done on a T-Gradient thermocycler (Biometra, Germany). The 5 long-PCR fragments were sequenced using a primer-walking strategy.

Gene annotation and sequence analysis
Sequences were assembled manually and aligned against the complete mt genome sequence of R. balthica and B. tenagophila using the computer program Clustal X 1.83 [54] to identify gene boundaries. The open-reading frames and codon usage profiles of protein-coding genes were analysed by the Open Reading Frame Finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html) using the invertebrate mitochondrial code. Translation initiation and translation termination codons were identified based on comparison with the mt genome of R. balthica and B. tenagophila. The  amino acid sequences inferred for the mt genes of G. pervia were aligned with those of R. balthica and B. tenagophila by using Clustal X 1.83. Based on pairwise comparison, amino acid identity (%) was calculated for homologous genes. Codon usage was examined based on the relationships between the nucleotide composition of codon families and amino acid occurrence, where the genetic codons are partitioned into AT rich codons, GC-rich codons and unbiased codons. For analyzing ribosomal RNA genes, putative secondary structures of 22 tRNA genes were identified using tRNAscan-SE [55], of the 22 tRNA genes, 5 were identified using tRNAscan-SE, the other 17 tRNA genes were found by eye inspection, and rRNA genes were identified by comparison with the mt genome of R. balthica and B. tenagophila.

Phylogenetic analyses
Phylogenetic relationship among the 20 Pulmonata species (Table 1), plus the mt DNA sequence of G. pervia obtained in the present study was reconstructed based on amino acid sequences of 13 protein-coding genes using the 2 Opisthobranchia species (Aplysia californica, GenBank accession number NC_005827 and A. dactylomela, NC_015088) as the outgroup. Each gene was translated into amino acid sequence using the invertebrate mitochondrial genetic code in MEGA 4 [56], and aligned based on its amino acid sequence using default settings, and ambiguously aligned regions were excluded using Gblocks online server (http://molevol. cmima.csic.es/castresana/Gblocks_server.html) [57] using the options for a less stringent selection. The final amino acid sequences of the 13 protein-coding genes were then concatenated into single alignments for phylogenetic analyses. Three different inference methods, namely MP, ML, and Bayes, were used for phylogenetic analyses. MP analysis was performed using PAUP* 4.0b10 [58], with indels treated as missing character states. A total of 1,000 random addition searches using TBR were performed for each MP analysis. Bootstrap probability (BP) was calculated from 1,000 bootstrap replicates with 10 random additions per replicate in PAUP. ML analyses were performed using PhyML 3.0 [59], and the MtArt+I+G+F model with its parameter for the concatenated dataset was determined for the ML analysis using ProtTest 10.2 based on the Akaike information criterion (AIC) [60]. BP value for ML trees was calculated using 1000 bootstrap replicates. Bayesian analyses were conducted with four independent Markov chains run for 1,000,000 metropolis-coupled MCMC generations, sampling a tree every 100 generations in MrBayes 3.1.1 [61]. The first 2,500 trees were omitted as burn-in and the remaining trees were used to calculate Bayesian posterior probabilities (PP). Phylograms were drawn using the Tree View program version 1.65 [62].