The Complete Mitochondrial Genome of Aix galericulata and Tadorna ferruginea: Bearings on Their Phylogenetic Position in the Anseriformes

Aix galericulata and Tadorna ferruginea are two Anatidae species representing different taxonomic groups of Anseriformes. We used a PCR-based method to determine the complete mtDNAs of both species, and estimated phylogenetic trees based on the complete mtDNA alignment of these and 14 other Anseriform species, to clarify Anseriform phylogenetics. Phylogenetic trees were also estimated using a multiple sequence alignment of three mitochondrial genes (Cyt b, ND2, and COI) from 68 typical species in GenBank, to further clarify the phylogenetic relationships of several groups among the Anseriformes. The new mtDNAs are circular molecules, 16,651 bp (Aix galericulata) and 16,639 bp (Tadorna ferruginea) in length, containing the 37 typical genes, with the same gene order and arrangement as those of other Anseriformes. Among the protein-coding genes of the 16 Anseriform mtDNAs, ATG is generally the start codon, and three stop codons are commonly observed (TAA, TAG, and T-), of which TAA is the most frequent. All tRNAs could be folded into canonical cloverleaf secondary structures except for tRNASer (AGY) and tRNALeu (CUN), which are missing the "DHU" arm. Phylogenetic relationships demonstrate that Aix galericulata and Tadorna ferruginea are in the same group, the Tadorninae lineage, based on our analyses of complete mtDNAs and combined gene data. Molecular phylogenetic analysis suggests the 68 species of Anseriform birds be divided into three families: Anhimidae, Anatidae, and Anseranatidae. The results suggest Anatidae birds be divided into five subfamilies: Anatinae, Tadorninae, Anserinae, Oxyurinae, and Dendrocygninae. Oxyurinae and Dendrocygninae should not belong to Anserinae, but rather represent independent subfamilies. The Anatinae includes species from the tribes Mergini, Somaterini, Anatini, and Aythyini. The Anserinae includes species from the tribes Anserini and Cygnini.


Introduction
Anseriformes is a highly differentiated order of birds with worldwide distribution, containing more than 150 species [1], [2]. Anseriformes is one of the best-studied groups of birds, largely owing to the group's historic importance in hunting, domestication, and aviculture [3]. The phylogenetic relationships among the Anseriformes, especially the phylogenetic position of several important species and groups, are rather complex and controversial, and have been affected by rearrangements several times throughout history [1], [2], [3].
Mandarin duck (Aix galericulata) and Ruddy shelduck (Tadorna ferruginea) are two typical Anseriform waterfowl, yet their phylogenetic status has remained controversial [3]. Traditionally, Aix galericulata belongs in Cairinini, and Tadorna ferruginea is a member of Tadornini; both tribes are placed inside Anatinae. The two birds have a moderately large body size, contrastingly pale dorsal wing-coverts, and blunt carpal (wing) spurs, along with other shared morphological characteristics, and both also feed through a combination of wading and dabbling [3]. All of these characteristics lie between the true ducks (Anatinae) and true geese (Anserinae) in terms of anatomy and behavior [12]. Researchers have also used molecular evidence, based on combined Cyt b and ND2 gene sequences, to suggest placing Aix and Tadorna together as a tribe in the Anatinae [2]. These authors have also grouped the shelducks (Tadorna) together with the sheldgeese (Cyanochen, Alopochen, Neochen and Chloephaga), forming the tribe Tadornini, based on morphological and molecular data, but consider it a non-monophyletic grouping [2]. To add to the confusion, the traditional primitive characters used to define "perching ducks" describe a polyphyletic grouping, because similar morphological, biochemical, and behavioral characteristics occur in many differing genera, including Anseranas, Dendrocygna, Sarkidiornis, Tadorna, Cairina, Aix, and Chenonetta [13].
Table 1. Organization of the complete mtDNAs of Aix galericulata and Tadorna ferruginea.
Nonetheless, important groups in the Anseriformes remain controversial. Traditionally, the order Anseriformes has been considered to be composed of the families Anhimidae (two genera and three species) and Anatidae (approximately 41 genera and 147 species, including Anseranas semipalmata) [1]. However, some authors suggest that it should be divided into three families: Anhimidae, Anatidae, and Anseranatidae, the latter only containing one species, Anseranas semipalmata [1], [2]. The family Anhimidae is supported by both morphological and molecular data, usually without controversy [1], [3]. A major source of conflict at the family level is centered around Anseranas semipalmata, that is, whether it should be considered a member of an independent family by itself, or whether it is contained within a subfamily of Anatidae [1], [2], [4], [5], [6]. The Anatidae comprises the largest number of species in Anseriformes, and traditionally was divided into two subfamilies: Anatinae and Anserinae [1], [2], [3], [7], [8]. However, this view has been challenged by several authors, who recognize five subfamilies within Anatidae: Anatinae, Anserinae, Oxyurinae, Dendrocygninae, and Anseranatinae [2], [9], [10]. Anatinae, Anserinae, and Dendrocygninae are supported by previously published mtDNA data [10]. The stiff-tailed ducks (Oxyurinae/Oxyurini) include some of the most distinctive waterfowl species, showing the greatest sexual size dimorphism [11]. Most of its members have long stiff-tail feathers, which are erected at rest, and relatively large, swollen bills [3]. According to morphological and behavioral characteristics, stiff-tailed ducks (Oxyurinae/Oxyurini) appear to be closer to swans and true geese than they are to typical ducks [3]. Previously the group was considered to be a comparatively primitive member of the tribe Oxyurini; however, some authors do not support this view and considered it to be a subfamily, Oxyurinae [3]. 
The relationships of the stiff-tailed ducks are still enigmatic, and the validity and circumscription of the group remain subject to considerable debate [3], [11]. Systematic controversies concerning the stiff-tailed ducks (Oxyurinae/Oxyurini) have focused on whether they constitute a tribe or a subfamily; studies often place them close to Anserinae rather than within Anatinae, in agreement with morphology-based studies [8], [9], [11].
Mitochondrial DNA is a powerful, increasingly popular, and widely used molecular marker for estimating animal phylogenetic relationships. It has become a major tool of comparative genomics and plays an important role in phylogenetic studies, comparative and evolutionary genomics, and molecular evolutionary analyses, owing to its maternal inheritance, lack of recombination, and accelerated nucleotide substitution rates compared with those of nuclear DNA [14], [15]. Here, we attempt to resolve the positions of controversial Aves species using mtDNA analyses. Our newly completed mitochondrial genomes should provide new insights into the phylogenetic position of important species, and into the higher-level systematics of Anseriform birds. Early molecular work disentangling the phylogeny of the Anseriformes was mostly based on one or a few mitochondrial loci, almost always including Cyt b, ND2, and/or control region (CR) sequences [1], [2]. However, complete mtDNA sequences have become increasingly important for comprehensive evolutionary and phylogenetic studies [10], [15], [16]. Several analyses have demonstrated that complete mtDNAs provide higher levels of phylogenetic support than individual genes or partial mitogenomes [10], [15], [16], [17], [18], [19]. Complete mtDNA sequences are not only more informative than shorter sequences of individual genes, but also provide reliable information for inferring phylogenetic relationships among controversial animals [17], [19], [20], [21]. Consequently, complete mtDNA genomes are becoming a preferred marker for resolving controversial species relationships [10], [14], [15], [16], [22]. However, very few Anseriform birds are currently represented by complete mtDNAs; consequently, the phylogenetic relationships of a number of Anseriform species remain unresolved.
We sequenced the complete mtDNA of two important Anseriform birds, from the genera Aix and Tadorna, in this study. We also analyzed the nucleotide composition, codon usage, and compositional biases of the mitogenomes. Our phylogenomic analysis should shed increased light on the phylogenetic status of Aix galericulata and Tadorna ferruginea, and on the phylogenetic relationships of other important groups of Anseriformes.

Protein-coding genes
Among the 13 protein-coding genes (PCGs) of Aix galericulata, nine start with ATG, ND4 starts with ATT, and COI, COII, and ND5 begin with the nonstandard start codon GTG. In Tadorna ferruginea, ten PCGs start with ATG, ND4 alone starts with ATT, and COI and ND5 begin with GTG. The standard termination codon TAA occurs in mostly the same genes in the two birds' mtDNAs; the exception is ND2, which stops with TAG in Aix galericulata and TAA in Tadorna ferruginea. Furthermore, AGG terminates the ND1 and COI genes, TAG terminates the ND3 and ND6 genes, and the incomplete termination codon T- occurs in the COIII and ND4 genes in both birds.
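A start and stop codon summary of this kind can be tallied directly from annotated coding sequences. The following is a minimal sketch in Python, not the procedure used in this study; the function names and the toy sequences are hypothetical, and each sequence is assumed to be given 5' to 3' on its coding strand. A sequence whose length is not a multiple of three is reported with an incomplete stop codon such as T-.

```python
from collections import Counter


def start_stop_codons(cds):
    """Return (start, stop) codons of one coding sequence.

    A length that is not a multiple of three is treated as ending in an
    incomplete stop codon (for example "T-").
    """
    seq = cds.upper()
    start = seq[:3]
    leftover = len(seq) % 3
    if leftover == 0:
        stop = seq[-3:]
    else:
        stop = seq[-leftover:] + "-"
    return start, stop


def tally_codons(genes):
    """Count start and stop codon usage over a {gene_name: sequence} mapping."""
    starts, stops = Counter(), Counter()
    for cds in genes.values():
        start, stop = start_stop_codons(cds)
        starts[start] += 1
        stops[stop] += 1
    return starts, stops


# Toy sequences for illustration only (not real gene sequences).
toy_genes = {
    "geneA": "ATGGCCTAA",
    "geneB": "GTGGCCAGG",
    "geneC": "ATGGCCT",
}
starts, stops = tally_codons(toy_genes)
print(starts.most_common())
print(stops.most_common())
```

In practice the input would be the PCG sequences extracted from a GenBank annotation, with genes on the light strand reverse-complemented first.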

Ribosomal RNA, transfer RNA, and non-coding regions
In Aix galericulata and Tadorna ferruginea, the 12S rRNA (997 bp and 982 bp, respectively) and 16S rRNA (1,604 bp and 1,610 bp, respectively) genes are located between the tRNA Phe and tRNA Leu genes, separated by the tRNA Val gene. Each of the two complete mtDNAs contains 22 tRNA genes. Except for tRNA Ser (AGY) and tRNA Leu (CUN), which lack dihydrouridine (DHU) arms, all tRNAs could be folded into the typical cloverleaf structure. The longest tRNAs are tRNA Asn (78 bp) and tRNA Glu (87 bp) in Aix galericulata and Tadorna ferruginea, respectively, and the shortest is tRNA Cys (65 bp in both).
Non-coding regions in the mtDNAs include the CRs and a few intergenic spacers. The CRs are located between the tRNA Glu and tRNA Phe genes, and are 1,071 bp and 1,077 bp long in Aix galericulata and Tadorna ferruginea, respectively. Additionally, in Aix galericulata 11 gene junctions are separated by spacers totaling 31 bp, the longest being 10 bp between ND6 and tRNA Glu (Table 1). In Tadorna ferruginea, spacers at 12 gene junctions total 55 bp (Table 1).

Phylogenetic reconstructions
The trees estimated for our chosen 16 Anseriform species resolve two major branches of the Anseriformes phylogeny, with highly similar topologies and only slight differences in bootstrap support and posterior probability values (Figure 1). The first branch is Anatidae and the second is Anseranatidae. Anatidae contains Anatinae, Tadorninae, Anserinae, and Dendrocygninae; Anseranatidae contains only Anseranas semipalmata. Anatinae and Tadorninae are sister groups, and together they group with the clades Anserinae and Dendrocygninae. Mergini, Anatini, and Aythyini form Anatinae; Anserinae contains Anserini and Cygnini. Aix galericulata and Tadorna ferruginea are in the Tadorninae group.
Phylogenetic analysis was also performed on the concatenated Cyt b, ND2, and COI genes of 68 Anseriform species. The trees from the maximum likelihood (ML) and Bayesian inference (BI) analyses share identical topologies and high node support values (Figure 2). The results indicate that Anseriformes could be divided into three branches: Anatidae, Anseranatidae, and Anhimidae. Anatidae and Anseranatidae are sister branches, which then group with Anhimidae. Anatidae contains five clades: Anatinae, Tadorninae, Anserinae, Oxyurinae, and Dendrocygninae. The Anatinae includes species from the tribes Mergini, Somaterini, Anatini, and Aythyini. This subfamily is a sister group to Tadorninae, which comprises Aix, Cairina, Tadorna, and Chloephaga. These two subfamilies form a clade that is sister to Anserinae, which comprises the tribes Anserini and Cygnini. In turn, this clade is sister to the remaining Oxyurinae and Dendrocygninae. All of them are grouped with the families Anseranatidae and Anhimidae.

Mitochondrial genome annotation and features
The gene order and arrangement of the two new mtDNA sequences, along with features such as gene length, base composition, and RNA structure, are highly conserved and similar to those of other Anseriform birds [10], [16], [23], [24]. The overall base composition is similar to that of other Anseriform species; for example, A+T content is higher than C+G content, within the range reported for other Anseriform species (51.6-55.7%) [10], [16], [23], [24]. The relative abundance of nucleotides is C > A > T > G, reflecting the strong AT bias [10], [16]. Guanine is the rarest nucleotide.
Results indicate that all Anseriformes mtDNAs so far sequenced have the same gene order and arrangement, no introns, no long intergenic spacers, and only a few overlapping sequences [10], [16]. All genes are encoded in the same arrangement, and there are no missing or duplicated genes [10], [16]. Among the 16 mtDNAs the longest is that of Anseranas semipalmata (16,870 bp), and the shortest is that of Anas formosa (16,594 bp). Homologous regions comprise 12,748 bp, representing 79.68% of the complete genome (Table 2). The 16 genomes generally have the highest transition/transversion ratio in closely related species [25]. The A, T, and A+T compositions are similar, and share a strong AT bias and rare guanines (Table 2). Metazoan mtDNAs usually present a clear strand bias in nucleotide composition; this strand bias can be measured as AT-skew and GC-skew [26]. All of the Anseriformes mtDNAs exhibit a slight AT-skew (average value: 0.137), ranging from 0.125 in Dendrocygna javanica to 0.147 in Tadorna ferruginea (Table 2). The GC-skew ranges from −0.377 (Anseranas semipalmata) to −0.327 (Dendrocygna javanica), with an average value of −0.353 (Table 2).
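The skew statistics reported here are not defined in the text. A minimal Python sketch follows, assuming the commonly used definitions AT-skew = (A − T)/(A + T) and GC-skew = (G − C)/(G + C) computed on one strand; the function name and the toy sequence are hypothetical. Under these definitions a positive AT-skew means more A than T, and a negative GC-skew means more C than G.

```python
def composition_and_skew(seq):
    """Return A+T content, AT-skew and GC-skew for one DNA strand.

    Ambiguous characters (N, gaps, etc.) are ignored. The function raises
    ZeroDivisionError if the sequence has no A/T or no G/C at all.
    """
    seq = seq.upper()
    a, c, g, t = (seq.count(base) for base in "ACGT")
    total = a + c + g + t
    return {
        "AT_content": (a + t) / total,
        "AT_skew": (a - t) / (a + t),
        "GC_skew": (g - c) / (g + c),
    }


# Toy sequence for illustration only: A=3, T=1, C=3, G=1.
stats = composition_and_skew("AAATCCCG")
print(stats)
```

Applied to a complete mitogenome read from a FASTA file, the same function would give the per-genome values that a table of composition and skew summarizes.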

Comparison of protein-coding genes
We compared the total length of the 13 PCGs in Aix galericulata and Tadorna ferruginea with that of other Anseriform birds. Lengths among them are quite similar and highly conserved; the longest one is Aix galericulata (11,403 bp) and the shortest is Anser fabalis (11,328 bp). The 13 PCGs have a total length of 11,331 bp in Aix galericulata and 11,385 bp in Tadorna ferruginea, which is 68.05% and 68.42% of each entire mtDNA genome, respectively. In both species, the longest gene is ND5, located between the tRNA Leu (CUN) and Cyt b genes, and the shortest is ATP8, which is between the tRNA Lys and ATP6 genes. Most PCGs use ATG as the start codon; only a few start with GTG, GTC, or ATA. Stop codons are also similar across species, with TAA, TAG, and T- occurring most frequently. In Aix galericulata and Tadorna ferruginea, the start codons are ATG, GTG, and ATT, and the termination codons TAA, AGG, TAG, and T- occur in mostly the same genes in the two birds' mtDNAs. Among the 13 PCGs, specific examples include the following: the COI initiation codon is GTG and the termination codon is AGG in all 16 species; Cyt b starts with ATG and ends with TAA; COII starts with ATG and ends with TAG, except in Branta canadensis, where it starts with GTC; ND6 starts with ATG and ends with TAG, except in Anser fabalis, where it ends with TAA; ND1 starts with ATG and ends with AGG, except in Anser albifrons, where the stop codon is TAA; and ND2 starts with ATG and ends with TAG, except in Anseranas semipalmata, Cygnus atratus, and Tadorna ferruginea, where the stop codon is TAA (Table 3).
Some mtDNA PCGs are particularly worthy of note. Avian species generally exhibit moderate levels of sequence divergence in some mitochondrial genes, including Cyt b, ND2, and COI. These genes are of special interest, because they have been widely used to resolve the taxonomy of controversial groups in Anseriformes [2]. A combination of these three genes is often adequate and has been used for resolving phylogenetic problems at many different taxonomic levels, ranging from related species to genera and families [25]. They have been valuable for clarifying phylogenetic relationships within many controversial animal groups, especially that of Anseriform birds [16], [25], [27].

Control region comparisons
The CR is the only major non-coding segment of mtDNA, and is more variable than other regions of vertebrate mtDNA, evolving three to five times more rapidly [28]. Its primary function is thought to be the regulation of replication and transcription [29]. In Aves the CR is located between the tRNA Glu and tRNA Phe genes. Sequence variation in the CR results in length variability in [...] 11 repeats in Anser albifrons and Anser anser, and (ATCAAACG)15 elements in Anseranas semipalmata. The average genetic distance (0.376) between the CR of Anseranas semipalmata and those of the other species in our analysis is higher than the overall average (0.288) (Table 4); Anseranas semipalmata thus has the most divergent CR of the studied species (Figure 3). Comparative analysis of the structure and organization of CRs can help show relationships in the Anseriformes [8], [27], [31]. A better understanding of CR characteristics can provide insights into phylogenetics [28]. Typically, vertebrate CRs are subdivided into three domains (Domain I, Domain II, and Domain III) [28]. Within the central conserved domain (Domain II), conserved sequence blocks (CSBs) C, D, E, and F show evolutionary conservation and a rather homogeneous evolutionary rate [28], and are 31 bp, 24 bp, 20 bp and 20 bp in length, respectively (Figure 3). Domain II shows little variation among the 16 Anseriform species: there are no base insertions, and the only deletions found are of three bases in Aix galericulata and 13 bases in Tadorna ferruginea (Figure 3). These boxes are present in most avian taxa, are similar to those of other vertebrates, and are associated with regulating H-strand synthesis [8], [31]. Several CR CSB boxes exist in the 16 species of Anseriformes studied as well, which suggests that the boxes may play a key role in the replication and transcription of the mitochondrial genome [27], [31].
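Tandem repeat elements such as the ATCAAACG motif mentioned above contribute to CR length variation, and their copy number can be counted with a simple pattern search. The sketch below is illustrative only and is not the method used in this study; the function name is hypothetical, the test sequence is synthetic, and only perfect (mismatch-free) repeats are detected.

```python
import re


def longest_tandem_run(seq, motif):
    """Return the largest number of back-to-back perfect copies of motif in seq."""
    motif = motif.upper()
    # One or more consecutive copies of the motif, matched greedily.
    pattern = re.compile("(?:" + re.escape(motif) + ")+")
    best = 0
    for match in pattern.finditer(seq.upper()):
        best = max(best, len(match.group(0)) // len(motif))
    return best


# Synthetic example: 15 copies of the motif flanked by arbitrary bases.
synthetic_cr = "GGCTA" + "ATCAAACG" * 15 + "TTGCA"
print(longest_tandem_run(synthetic_cr, "ATCAAACG"))  # prints 15
```

Real control regions often contain imperfect repeats, for which a dedicated repeat finder that tolerates mismatches would be more appropriate than an exact match.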

Phylogenetic analyses
The trees from the ML and BI analyses based on the complete mtDNA of 16 Anseriform species share similar topologies and high node support values with those of the concatenated three mitochondrial gene sequences from 68 species (Figures 1 and 2). The results support the grouping of Aix galericulata and Tadorna ferruginea together in the same lineage, Tadorninae. Livezey included Aix and Cairina in the Anatini but proposed a subtribe, Cairinina, clustering these species together on the basis of a single osteological synapomorphy [9]. Dickinson considered Aix and Cairina to be members of the Anatinae [6]. Our molecular results show Aix and Cairina grouped with Tadorna and Chloephaga, supported by high bootstrap values, forming the Tadorninae, which is located between Anatinae and Anserinae. Therefore, we agree that Tadorninae should be an independent subfamily in the Anatidae, and that Aix and Cairina do not belong to Anatinae but are members of Tadorninae, which differs slightly from Dickinson's view [6]. The taxonomy and systematic relationships within the Tadorninae have been considerably debated. Our results show that Aix is closest to Cairina, and Tadorna is closest to Chloephaga. Cairina and Aix cluster together as a sister group adjacent to Tadorna and Chloephaga, a relationship that has been repeatedly claimed on the basis of morphological and molecular studies [2]. Aix and Cairina have several similar characteristics in behavior and breeding biology, and molecular phylogenetic trees also suggest that they are sister branches, showing they have a close genetic relationship as well [8]. Our mtDNA evidence also supports a close genetic relationship between Tadorna and Chloephaga, congruent with morphological studies [9].
Our mtDNA analysis suggests that the genus Anas is actually polyphyletic. Anas formosa, which is found in one of the Anatini branches, has no close relatives among living ducks and should be placed in a distinct genus; Anas discors, Anas querquedula, Anas clypeata and Anas platalea should be placed in another distinct genus, closest to the multigenus duck group (Tachyeres, Lophonetta, and Amazonetta).

Sample collection
Trace blood samples from Aix galericulata and Tadorna ferruginea were collected using non-invasive methods at the Hefei Wild Animal Park in May 2013. No animal was killed for the purpose of the experiment; the method does not affect the health of the animals, and it conforms to our animal ethics committee's guidelines. The Hefei Wild Animal Park is authorized to administer animal rescue and medical treatment by the Anhui Provincial Conservation and Management Station for Wildlife (APCMSW), a provincial government agency for wildlife conservation in the Anhui Province of China. We were authorized to study the birds by the APCMSW. The samples were stored at −20 °C at the Institute of Biodiversity and Wetland Ecology, School of Resources and Environmental Engineering, Anhui University (sample codes are AHU-WB20130522 and AHU-WB20130523, respectively).

DNA extraction, PCR amplification, and sequencing
Whole genomic DNA was isolated from blood samples using the phenol/chloroform method. Extracted DNA was examined on a 1.0% agarose/TBE gel and stored at −20 °C as templates for PCR.
Based on an alignment of complete mtDNA sequences from Cygnus atratus (NC_012843), Anas platyrhynchos (NC_009684), and Aythya americana (NC_000877), we designed three primer pairs (primer sets 9, 11, and 14) using Primer 5.0. We also used other primers developed from Anser fabalis [10], [16]. These primers were used to amplify and sequence the complete mtDNAs of Aix galericulata and Tadorna ferruginea (Table 5). Generated sequences were all less than 1,200 bp each, with each segment overlapping the next by 80-150 bp.
PCR amplifications were carried out in 50 μl volumes containing 100 ng template DNA, 5 μl of 10× reaction buffer, 2 μl of 25 mM MgCl2, 4 μl of 2 mM dNTPs, 1 μl of each 10 μM primer, 0.5 U Taq DNA polymerase (Trans Taq-T DNA Polymerase, Beijing, China), and sterile doubly-distilled water to final volume. PCR amplification conditions were as follows: denaturation for 5 min at 94 °C, followed by 30 cycles of denaturation for 30 s at 94 °C, annealing for 30 s at 49-55 °C (depending on primer combinations), and elongation for 2 min at 72 °C, with a final extension step of 10 min at 72 °C. PCR products were examined using electrophoresis on a 1% agarose/TBE gel (Figure 4), then purified and bidirectionally sequenced by Sangon Biotech Co., Ltd.
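The final concentration of each reagent in the reaction follows from the dilution relation C1 × V1 = C2 × V2. The short Python sketch below works through that arithmetic for the volumes listed above; the function name is hypothetical, and because only the ratio of stock volume to total volume matters, the result is expressed in whatever unit the stock concentration uses (the primer stock is assumed here to be 10 μM).

```python
def final_concentration(stock_conc, stock_volume, total_volume=50.0):
    """Dilution arithmetic C2 = C1 * V1 / V2, with both volumes in the same unit."""
    return stock_conc * stock_volume / total_volume


mgcl2 = final_concentration(25.0, 2.0)    # mM, from a 25 mM stock
dntps = final_concentration(2.0, 4.0)     # mM, from a 2 mM stock
primer = final_concentration(10.0, 1.0)   # same unit as the primer stock (assumed μM)
buffer_x = final_concentration(10.0, 5.0) # fold concentration, from a 10x stock

print(mgcl2, dntps, primer, buffer_x)
```

With these inputs the reaction contains 1 mM MgCl2, 0.16 mM dNTPs, 0.2 units of primer concentration per primer, and 1× buffer.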

Sequence analysis
Sequences were checked and assembled using the programs Seqman (DNASTAR, 2001), BioEdit, and Chromas 2.22, and then adjusted manually. PCGs were identified by comparison with the known complete mtDNA sequences of Anseriform birds using Sequin 11.0. The 22 tRNA genes were identified using the software package tRNAscan-SE 1.21 (http://lowelab.ucsc.edu/tRNAscan-SE), and their cloverleaf secondary structures and anticodon sequences were determined using DNASIS (Ver. 2.5, Hitachi Software Engineering). The two rRNAs were identified by comparison with complete mtDNA sequences of other Anseriformes available in GenBank. The complete mtDNA sequences have been deposited in GenBank under accession numbers KF437906 and KF684946.

Phylogenetic analyses
Phylogenetic trees were estimated using ML and BI methods to study the phylogeny of the Anseriformes. The corresponding Gallus gallus (NC_001323) sequence was used as the outgroup. Phylogenetic trees were estimated for two cases: one based on the complete mtDNA of 16 Anseriform species (Table 6), and another based on a multiple sequence alignment of three mitochondrial gene (Cyt b, ND2, and COI) sequences from 68 typical Anseriform species from GenBank (Table 7). Our previous research has shown that the combined gene sequence from Cyt b, ND2, and COI is suitable for resolving phylogenetic relationships among Anseriform species in the absence of sufficient complete mtDNA data [16]. Before phylogenetic tree estimation, all 16 complete mtDNAs and the three concatenated data sets of 68 Anseriform species were aligned using ClustalX 1.8, followed by manual adjustment. The tree estimation based on the three concatenated data sets of the 68 Anseriform species proceeded as follows: the three gene sequences were translated into their corresponding amino acid sequences and saved in .meg format, converted to .nex format, and the third position of every codon was then removed using MEGA 4.0. We then concatenated the three gene sequences under each species name, saved the file in .meg format, and converted it to .nex format.
ML analyses were performed in PAUP* 4.0b10 using TBR branch swapping (10 random addition sequences) and a general time-reversible model with invariant sites and among-site rate variation (GTR+I+Γ), which was selected as the best-fit model of evolution using Modeltest (version 3.06) based on the AIC criterion. Internal ML tree branch support was evaluated using a bootstrap test with 100 iterations. Bayesian phylogenetic inference was done using MrBayes 3.1.2, with the same best-fit substitution model as that selected for the ML analysis. MrBayes 3.1.2 simultaneously initiates two Markov Chain Monte Carlo (MCMC) runs to provide additional confirmation of convergence of posterior probability distributions. Analyses were run for one million generations until the average standard deviation of split frequencies was less than 0.01, which indicated that convergence was reached. Chains were sampled every 1,000 generations.
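The data preparation described above (removing third codon positions and concatenating the three genes per species) was carried out in MEGA 4.0. As a rough illustration of what those two steps do, here is a minimal Python sketch; it is not the MEGA procedure itself, the function names and the toy alignments are hypothetical, and each aligned gene is assumed to be in frame, starting at codon position 1.

```python
def drop_third_positions(aligned_cds):
    """Keep codon positions 1 and 2, dropping every third column."""
    return "".join(base for i, base in enumerate(aligned_cds) if i % 3 != 2)


def concatenate_genes(alignments, gene_order):
    """Build one sequence per species from per-gene alignments.

    alignments maps gene name -> {species name: aligned sequence}.
    Only species present in every gene are kept, so all rows share one length.
    """
    shared = set.intersection(*(set(alignments[g]) for g in gene_order))
    return {
        species: "".join(
            drop_third_positions(alignments[g][species]) for g in gene_order
        )
        for species in sorted(shared)
    }


# Toy alignments for illustration only (not real gene sequences).
toy = {
    "Cyt b": {"sp1": "ATGCCA", "sp2": "ATGCCG"},
    "ND2": {"sp1": "ATGAAC", "sp2": "ATGAAT"},
    "COI": {"sp1": "GTGTTT", "sp2": "GTGTTC", "sp3": "GTGTTA"},
}
matrix = concatenate_genes(toy, ["Cyt b", "ND2", "COI"])
for species, seq in matrix.items():
    print(species, seq)
```

In this toy case the two shared species become identical once third positions are removed, which illustrates why this step reduces the influence of fast-evolving, often saturated sites.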

Supporting Information
Appendix S1 Phylogenetic Classification of the Anseriformes. (DOC)