The Mitochondrial Genomes of Aquila fasciata and Buteo lagopus (Aves, Accipitriformes): Sequence, Structure and Phylogenetic Analyses

The family Accipitridae is one of the largest groups of non-passerine birds, including 68 genera and 243 species globally distributed. In the present study, we determined the complete mitochondrial sequences of two species of accipitrid, namely Aquila fasciata and Buteo lagopus, and conducted a comparative mitogenome analysis across the family. The mitogenome length of A. fasciata and B. lagopus are 18,513 and 18,559 bp with an A + T content of 54.2% and 55.0%, respectively. For both the two accipitrid birds mtDNAs, obvious positive AT-skew and negative GC-skew biases were detected for all 12 PCGs encoded by the H strand, whereas the reverse was found in MT-ND6 encoded by the L strand. One extra nucleotide‘C’is present at the position 174 of MT-ND3 gene of A. fasciata, which is not observed at that of B. lagopus. Six conserved sequence boxes in the Domain II, named boxes F, E, D, C, CSBa, and CSBb, respectively, were recognized in the CRs of A. fasciata and B. lagopus. Rates and patterns of mitochondrial gene evolution within Accipitridae were also estimated. The highest dN/dS was detected for the MT-ATP8 gene (0.32493) among Accipitridae, while the lowest for the MT-CO1 gene (0.01415). Mitophylogenetic analysis supported the robust monophyly of Accipitriformes, and Cathartidae was basal to the balance of the order. Moreover, we performed phylogenetic analyses using two other data sets (two mitochondrial loci, and combined nuclear and mitochondrial loci). Our results indicate that the subfamily Aquilinae and all currently polytypic genera of this subfamily are monophyletic. These two novel mtDNA data will be useful in refining the phylogenetic relationships and evolutionary processes of Accipitriformes.


PCR amplification and sequencing
The PCR primers and several internal primers (S1 Table) used in PCR amplification or sequencing were designed based on available mitochondrial sequences of Accipitriformes. Each primer set amplified a mtDNA fragment, including an overlap region of at least 100 bp with its adjacent amplified fragments at both the terminals. Long PCR and nested-PCR were performed as described by Kan et al. [28]. The amplified fragments were purified using TIANgel Midi Purification Kit (Tiangen Biotech Co., Ltd, Beijing, China). The purified PCR products were sequenced directly on ABI-PRISM 3730 sequencer using BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems) with their corresponding primers.

Genome assembly and annotation
DNA sequences were analyzed using software BioEdit [37] and Ugene [38]. Contig assembly was performed with the program Sequencher 4.14 (Gene Codes Corporation, Ann Harbor, USA). The boundaries of protein-coding genes and rRNA genes were initially identified via the MITOS [39] and DOGMA [40] webservers, and refined by alignment with mitochondrial genomes of other species of Accipitriformes. Transfer RNA genes were identified using tRNAscan-SE v.1.21 [41] and ARWEN v.1.2 [42]. The whole-mtgenome comparison maps were visualized using the software CGView Comparison Tool (CCT) [43]. All gene names included mitochondrial and nuclear gene are in accordance with the HUGO Gene Nomenclature Committee's database (HGNC) [44].

Sequence alignment and Rate heterogeneity
Sequence alignment was carried out using MAFFT 7.2 [45] with the default settings. The nucleotide bias, skew can be calculated as (G − C) / (G + C) or (A − T) / (A + T). The rates (number of variable sites, ratio of nonsynonymous-to-synonymous substitutions rates (dN/dS)) and patterns (Transition-to-transversion (ts/tv) ratio) of evolution for each gene were calculated in the present study. Number of variable sites was conducted using DnaSP ver. 5.10 [46]. dN/dS was performed with Datamonkey [47]. ts/tv was estimated by MEGA ver. 6.06 [48].

Phylogenetic analysis
To investigate the evolutionary relationships among A. fasciata, B. lagopus and their related species, three data sets were performed with the maximum likelihood (ML) and the Bayesian inference (BI) methods: (1) for mitogenomic phylogeny of Accipitriformes data set, 13 PCGs of 16 Accipitriformes species were used (Table 1), with two species from Strigiformes (Phodilus badius, NC_023787; Strix leptogrammica, NC_021970) as the outgroups,(2) for phylogeny of Accipitridae data set, two mitochondrial genes (MT-CYB and MT-ND2 (mitochondrially encoded NADH dehydrogenase 2)) of 148 species from Accipitridae were used (S2 Table), and (3) for phylogeny of Aquilinae data set, available multiple sequences of four mtDNA loci (MT-CYB, MT-ND2, MT-CO1 (mitochondrially encoded cytochrome c oxidase I) and CR) and five nuclear loci (RAG1 (recombination activating gene 1) coding region, LDH (lactate dehydrogenase) intron 3, MYC (v-myc avian myelocytomatosis viral oncogene homolog), AK1 (adenylate kinase 1) exon 6, FIB7 (beta-fibrinogen gene, intron 7)) of all 39 species in this subfamily from GenBank were used (S3 Table), with Morphnus guianensis and Harpia harpyja as the outgroups. Codon positions included in the analysis were the 1st, 2nd and 3rd. Sequence alignment was carried out using MAFFT 7.2 [45] with the default settings. Sequence format convertion was performed with DAMBE 5.5 [49]. To check for saturation in nucleotide codons, substitution saturation analysis [50] was performed for subsets with the first, second and third codon positions using DAMBE 5.5. According to the results, none of the substitutions from three codon positions of all protein-coding genes in our two data sets were saturated. The bestfit models were selected using Bayesian Information Criterion (BIC) as implemented in Model-Generator version 0.85 [51]. For 13 PCGs mitogenome nucleotides data set, we defined the independent mitochondrial partitions as each of the 13 loci. For combined mitochondrial and nuclear data set, we defined independent partitions as each of the 9 loci.
The ML analyses were conducted in RAxML v.8.0.26 [52], as implemented in the graphical user interface RaxML GUI v.1.3.1 [53]. We performed analyses with ML + slow bootstrap for ten runs and 1000 replicates under GTR + CAT [54] model. Due to lower computational and memory costs, CAT, which is a rate heterogeneity and fast approximation of the gamma model, appears to yield better log likelihood scores even when calculated under a real gamma model [54,55]. Under the GTR + CAT model, the default number of categories (c = 25) and random starting trees were employed. The BI analyses were performed with MrBayes 3.2.2 [56], applying independent models of evolution for each partition. For 13 PCGs nucleotides data set, GTR + G was chosen for MT-ND4 (mitochondrially encoded NADH dehydrogenase 4), HKY+I+G was chosen for MT-ND1 (mitochondrially encoded NADH dehydrogenase 1), HKY+G was chosen for MT-ATP8 (mitochondrially encoded ATP synthase 8), MT-ND3 (mitochondrially encoded NADH dehydrogenase 3) and MT-ND4L (mitochondrially encoded NADH dehydrogenase 4L), while GTR + I + G for the remaining 10 loci. For combined mitochondrial and nuclear data set, GTR + I + G was chosen for MT-CYB, MT-ND2, and MT-CO1, while GTR + G for the remaining six loci were selected as the best substitution model for the 1st, 2nd, and 3rd codon positions, respectively. Four Markov chains were run for ten million generations (sampling every 100 generations) allowing adequate time for convergence. After discarding 25% of the initial trees as burn-in, the remaining trees were used to estimate 50% majority rule consensus tree and Bayesian posterior probabilities (BPP). All MCMC runs were repeated twice to confirm consistent approximation of the posterior parameter distributions. PSRF (potential scale reduction factor) is close to 1 as runs converge, and the ESS (effective sample sizes) for all parameters are above 200.

Features of the mitogenomes of A. fasciata and B. lagopus
The complete mitochondrial genomes (mt-genomes) of A. fasciata and B. lagopus were determined to be 18,513 and 18,559 bp in length, respectively. These are close to the other Accipitriformes mt-genomes sizes reported (S4 Table). The two sequences were deposited in GenBank (A. fasciata: KP329567 and B. lagopus: KP337337). The number and order of 37 mitochondrial genes and two control regions (CR and CCR (pseuo control region)) of both A. fasciata and B. lagopus are identical ( Fig 1A, and S5 Table). The nucleotide compositions of the complete mtDNA sequences (Heavy-strand) of A. fasciata and B. lagopus are slightly biased toward A and T (S4 Table), which is similar to that from other avian species [28,29,35]. Comparative analyses of the nucleotide sequences of each mt gene (excluded tRNA genes) and non-coding regions, together with the amino acid sequences, are given in Table 2. The sequence lengths of 13 PCGs were the same for A. fasciata and B. lagopus, except for MT-ND3 gene ( Table 2). One extra nucleotide 'C' is present at the position 174 of MT-ND3 gene of A. fasciata, which is not observed at that of B. lagopus. In mitochondrial MT-ND3 for 16 species of Accipitriformes (Fig  2), 6 have the extra base, whereas 10 lack it. Additional putative frameshift sites have been proposed in the mitochondrial of many birds, turtles and ants [28,29,35,[57][58][59]. Mindell el al. [58] suggested that the site was subject to a +1 programmed frameshit during translation, allowing correct translational of functional protein. We also found that the extra base insertion appear to be dynamically gained and lost from the MT-ND3 of closely related taxa, such as the MT-ND3 of A. fasciata and A. chrysaetos. So, we infered that there was no evolutionary significance for the extra insertion site.
In addition, for both A. fasciata and B. lagopus mtDNAs, obvious positive AT-skew and negative GC-skew biases were detected for all 12 PCGs encoded by the H strand, whereas the reverses were found in MT-ND6 (mitochondrially encoded NADH dehydrogenase 6) encoded by the L strand (Fig 3). The nucleotide variation across the entire genome sequence between the two accipitrid species was 19.07%. The values of nucleotide sequence difference in 13 PCGs and two RNA genes between A. fasciata and B. lagopus ranged from 9.54% (MT-CO3 (mitochondrially encoded cytochrome c oxidase III)) to 22.02% (MT-ATP8) ( Table 2). The amino acid sequence differences ranged from 3.45 to 34.55%, with MT-ND3 being the most conserved PCG and MT-ATP8 the least conserved. Like other Accipitriformes species (expect for Cathartes aura and Sagittarius serpentarius) [34,60,61] (S4 Table), two putative control regions (CR and CCR) of A. fasciata and B. lagopus mitogenomes were found between MT-TT (mitochondrially encoded tRNA threonine) and MT-TF (mitochondrially encoded tRNA phenylalanine), and were separated by MT-TP (mitochondrially encoded tRNA proline), MT-ND6 and MT-TE (mitochondrially encoded tRNA glutamic acid) (Fig 1A). The length of CR of Accipitriformes species varies between 1,117 bp (S. serpentarius) and 2,329 bp (Accipiter nisus), with AT content ranging from 55.9% (A. monachus and Spilornis cheela) to 65.0% (A. nisus) (S4 Table).
Based on the distribution of the conserved motifs in other avian CRs [29,35,62,63], the CRs of the two accipitrid species could be divided into three domains: ETAS (extended termination-associated sequence) Domain I, Central Conserved Domain II and CSB (conserved sequence block) Domain III. Six conserved sequence boxes in the Domain II [60], named boxes F, E, D, C, CSBa, and CSBb, respectively, were recognized in the CRs of A. fasciata and B. lagopus. Nevertheless, conserved sequence box B was only found in the CR of A. fasciata (Fig  4). In the CR of B. lagopus, we observed three minisatellites, one was 23 nucleotides (5 0 -TTTATCATCATATTTTATTATTA) with six tandem repeats, and two were 11 nucleotides (5 0 -AAATTTTTACA, and 5 0 -AATTTATCATG) with 3 and 17 tandem repeats, correspondingly. Moreover, we only found one minisatellite, which was 22 nucleotides (5 0 -with two Strigiformes birds as outgroups. This analysis is based on 13PCGs. Both ML and Bayesian analyses produced identical tree topologies. The ML bootstrap and Bayesian posterior probability values for each node are indicated. TTTTTTCACAATTTTTTCACAT) with two tandem repeats, in the CR of A. fasciata. The size, repeats, and sequence composition of minisatellite in Accipitriformes CRs might be served as useful genus-and species-specific molecular markers.

Rates and patterns of mitochondrial gene evolution within Accipitridae
As can be seen in the CCT BLAST map, CR and CCR of these accipitrids are highly divergent (Fig 1A). Comparison of the evolutionary rate of different genes provides a better understanding of the patterns of molecular evolution of the mtDNA. Due to the length and sequence heterogeneity, CR and CCR sequences were not included in these analyses. Among the eleven mtDNA sequences (excluding CR and CCR), 5213 nucleotides sites (33.28%) are variable (Table 3). In the protein coding region, the most variable region of the genomes by percent variable sites is MT-ND3, followed by, MT-ATP8, MT-ND4 and MT-ND6. In contrast, the lowest one is tRNA genes (20.19%), followed by MT-RNR1 (mitochondrially encoded 12S RNA) and MT-RNR2 (mitochondrially encoded 16S RNA). The ts/tv in the concatenated nucleotide data varies from MT-CYB (2.242) to tRNAs (6.640) ( Table 3). As we know, transversions are typically rare in the slow-evolving regions, such as tRNA genes. However, the patterns (ts/tv) among protein-coding genes are variable and do not reflect differences in either synonymous or nonsynonymous rates among the genes [31,64]. For example, high ts/tv ratios were found for MT-CO1, MT-CO2 (mitochondrially encoded cytochrome c oxidase II), and MT-ND4L genes, which have intermediate rates in both these substitution categories. Furthermore, the highest dN/dS was detected for the MT-ATP8 gene (0.29896) among Accipitridae, while the lowest for the MT-CO1 gene (0.01546).

Phylogenetic analysis
To infer mitophylogenetic relationships among accipitrids, evolutionary analysis was carried out using 13PCGs of 16 species, with two species of Strigiformes as outgroups. The ML and BI trees showed identical topologies (Fig 1B) [65][66][67][68]. The monophyly of Accipitridae was strongly supported (Fig 1B). New World vultures (Cathartidae) have some characters in common with storks, and a close phylogenetic relationship with storks was suggested [69][70][71][72]. However, recent molecular studies demonstrated that the New World vultures clearly have their affinity with other raptors and not with storks [7,8,21]. Our phylogeny indicated that C. aura (Cathartidae) was basal to the remaining of Accipitriformes. Based on relatively short sequences, Wink and Sauer-Gurth [21] placed Secretarybird (S. serpentarius) with storks. In our analysis, the Secretarybird is deepest on the branch with Osprey (Pandion haliaetus) and Accipitridae (Fig 1B). This result is in congruence with more and more recent studies, such as Lerner and Mindell [17], Mahmood et al. [10] and Hackett et al. [8]. Furthermore, P. haliaetus (Pandionidae) is the sister one to Accipitridae. The relationships among the four families of Accipitriformes is consistent with Burleigh et al.' s results [7]. There were three clades in Accipitridae. One was (Aegypius + Spilornis), one was (Aquila + Nisaetus), and the other was (Accipiter + Buteo). Aegypius is a member of Aegypiinae, while Spilornis is that of Circaetinae.
To test monophyly of Aquilinae, phylogenetic analysis was performed using MT-ND2 and MT-CYB genes for 60 genera and 148 species of Accipitridae, plus two species (S. serpentarius and P. haliaetus) as outgroups. There were a number of well-resolved basal clades within the family (Fig 5); the four most basal of these included: (1) a clade composed of monotypic genus Elanus (Elaninae), (2) a clade composed of Polyboroidinae, Perninae and Gypaetinae, (3) a clade that contained Circaetinae and Aegypiinae, and (4) a very large clade consisting of all remaining taxa. Within the last large clade, all species from aquiline eagles formed a robust monophyletic clade (Fig 5). Our results strongly support previous research on monophyly of Aquilinae [13,18]. Beyond recognizing the monophyly of Aquilinae, we also analysed phyloegentic relationships among and within subfamilies of Accipitridae. Our results were largely consistent with Jetz et al.'s phylogenetic trees [73] (the trees for this are available online at http://birdtree.org/), but differed in the positions of Polyboroides typus, Accipiter striatus, Buteo nitidus, and Buteogallus lacernulatus (Fig 5).
To reassess monophyly of polytypic genera from Aquilinae, a phylogenetic analysis was carried out by combining mitochondrial and nuclear loci for 39 birds, representing all species and all genera from this subfamily. Some taxa are sampled for many loci, while other taxa are only represented by two loci (S3 Table). To a large-scale, sparse supermatrix of sequence data currently available, Burleigh et al. demonstrated that it be sufficient to produce a robust estimate of the avian tree of life [7]. So, in this analysis, we used the heterogeneous concatenated matrix data to infer phylogeny of Aquilinae. The ML and the BI methods generated identical topological trees (Fig 6). The monophyly of Clanga, Hieraaetus, Nisaetus and Spizaetus were strongly supported (>99% in ML, 1.00 in BI). The genus Nisaetus, previously the Old World birds of Spizaetus, was composed of two sister clades. Clade I included five species (N. alboniger, N. bartelsi, N. kelaarti, N. nanus and N. nipalensis), while Clade II harbored the remaining species of Nisaetus. The genus Aquila also consisted of two sister clades. Clade I was composed of four species (A. adalberti, A. heliaca, A. rapax, and A. nipalensis) (100% in ML, 1.00 in BI), while Clade II included the remaining birds of Aquila (97% in ML, 1.00 in BI). Notably, the monophyly of Aquila was weakly supported (63% in ML, 0.89 in BI). Based on above analyses, we recognize all currently polytypic genera of Aquilinae as monophyly.

Conclusion
The complete mitochondrial sequences of A. fasciata and B. lagopus were found to be 18,513 and 18,559 bp with an A + T content of 54.2% and 55.0%, respectively. The genomes consist of 37 typical genes (13 energy pathway protein-coding genes, two ribosomal RNA genes and 22 transfer RNA genes)and two putative control regions (CR and CCR). The nucleotide variation across the entire genome sequence between the two accipitrid species was 19.07%. Obvious positive AT-skew and negative GC-skew biases were detected for all 12 PCGs encoded by the H strand, whereas the reverses were found in MT-ND6 encoded by the L strand. Six conserved sequence boxes in the Domain II, named boxes F, E, D, C, CSBa, and CSBb, respectively, were recognized in the CRs of A. fasciata and B. lagopus. The highest dN/dS was detected for the MT-ATP8 gene (0.29896) among Accipitridae, while the lowest for the MT-CO1 gene (0.01546). Mitophylogenetic analysis supported the robust monophyly of Accipitriformes, and Cathartidae was basal to the balance of the order. Moreover, our results indicate that the subfamily Aquilinae and all currently polytypic genera of this subfamily are monophyletic. These two novel mtDNA data will be useful in refining the phylogenetic relationships and evolutionary processes of Accipitriformes. Supporting Information S1   Table. Localization and features of genes in the mitochondrial genomes of Aquila fasciata (AF) and Buteo lagopus (BL). (DOC)