Rapid and recent diversification patterns in Anseriformes birds: Inferred from molecular phylogeny and diversification analyses

The Anseriformes is a well-known and widely distributed bird order, with more than 150 species in the world. This paper aims to revise the classification, determine the phylogenetic relationships and diversification patterns in Anseriformes by exploring the Cyt b, ND2, COI genes and the complete mitochondrial genomes (mito-genomes). Molecular phylogeny and genetic distance analyses suggest that the Dendrocygna species should be considered as an independent family, Dendrocygnidae, rather than a member of Anatidae. Molecular timescale analyses suggests that the ancestral diversification occurred during the Early Eocene Climatic Optimum (58 ~ 50 Ma). Furthermore, diversification analyses showed that, after a long period of constant diversification, the median initial speciation rate was accelerated three times, and finally increased to approximately 0.3 sp/My. In the present study, both molecular phylogeny and diversification analyses results support that Anseriformes birds underwent rapid and recent diversification in their evolutionary history, especially in modern ducks, which show extreme diversification during the Plio-Pleistocene (~ 5.3 Ma). Therefore, our study support that the Plio-Pleistocene climate fluctuations are likely to have played a significant role in promoting the recent diversification for Anseriformes.


Introduction
Adaptive radiation, the evolution of ecological and phenotypic diversity within a rapidly multiplying lineage, has fascinated biologists over the past century [1][2][3]. Identifying the evolutionary diversification of organisms is a crucial step in understanding their evolutionary history, where the final goal is exploring the factors that might potentially affect the diversification [4]. It was known that molecular phylogenies provide robust framework to study the patterns of speciation and diversification in lineages [5][6]. The branching pattern of a phylogenetic tree a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 animal welfare and research in China, and were specifically approved by the Animal Research Ethics Committee of Anhui University.

Sample collection and DNA extraction
From 2006 to 2013, muscle samples of nine Anatidae species (Anas acuta, A. poecilorhyncha, A. crecca, A. clypeata, Aythya ferina, Aythya fuligula, Mergus merganser, Tadorna tadorna and Aix galericulata) were collected from the Anqing wetland along the Yangtze River in Anhui Province. Total genomic DNA was extracted using a standard proteinase K / phenol-chloroform protocol, as described by Sambrook & Russell [69]. An EasyPure PCR Purification Kit (TransGene) was used to purify each DNA extraction.

Phylogenetic analyses
Phylogenetic trees were reconstructed based on the mito-genomes of 30 Anseriformes species (Dataset 1, S2 Table). In this study, Dataset 1 were estimated under the Maximum Likelihood (ML) and Bayesian Inference (BI) methods, with Gallus gallus (NC_001323) used as an outgroup, which were performed with RAxML version 8 [89] and MrBayes 3.2.2 software program [90], respectively.
Before phylogenetic tree constructions, sequence alignment was carried out by using Clustal X 1.8 software [91] and examined visually. To obtain the estimated best fit evolution model for each data set, we performed analyses separately as described above using the MrModeltest 1.0b software program [92] in Paup Ã 4.0b [93], which was based on the AIC criterion. For ML analysis, node support was calculated with a GTRGAMMA model via rapid bootstrapping with ten runs and 1,000 replications to estimate the best topology. For the BI analysis, two independent runs of four Markov Chains Monte Carlo (MCMC) chains (one cold chain and three hot chains) were simultaneously run under the best fit substitution model (GTR + I + G) for 1,000,000 generations, with sampling conducted every 100 generations until the average standard deviation of split frequencies reached a value less than 0.01. The first 10% of the total trees were discarded as ''burn-in" and the remaining trees were used to calculate 50% majority rule consensus tree and Bayesian posterior probabilities.

Dating analyses
To estimate the precise divergence times within Anseriformes, we dated the divergence time between Anseriformes and Galliformes with multiple out-groups (Columba livia, NC_013978 [94]; Falco peregrinus, NC_000878 [95]; Tinamus guttatus, NC_027260 [96] and Struthio camelus, NC_002785 [96]). All calibration points used in the dating analyses were derived from Jarvis' study [66] (S5 Table). Then, we used the divergence time between Anseriformes and Galliformes as the calibration point to estimate the divergence within Anseriformes. We applied a Bayesian MCMC method based on the mito-genomes, which employs a relaxed molecular clock approach, and implemented in BEAST 1.7.4 [97,98]. A relaxed uncorrelated log normal model of lineage variation and a Yule Process prior to the branching rates based on HKY + I + G model were assumed, as recommended by MrModel test 1.0b. Four replicates were run for 1,000,000 generations with tree and parameter sampling that occurred every 100 generations for the first 10% of samples that were discarded as burn-in. All parameters were assessed by visual inspection using Tracer v. 1.6 [99]. To further estimate the divergence times within the Anseriformes, we also used concatenated sequences (ND2 and Cyt b sequences) for 128 species (Dataset 2, S3 Table), and were based on the above calibration points (S5 Table). Compared with Gonzalez's study [17], seven species (Tachyeres brachypterus, T. leucocephalus, Mergus squamatus, Dendrocygna arcuata, Anhima cornuta, Chauna torquata and Anseranas semipalmata) were added. Relaxed uncorrelated log normal models of lineage variation and the Yule Process were set by basing them on HKY + I + G model as recommended by MrModel test 1.0b.

Tempo and rate shifts of diversification analyses
To visualize the temporal accumulation of lineages, a log-transformed lineage-through-time (LTT) plot [100] was performed based on 1,000 trees randomly selected from the BEAST analysis by using APE package (version 4.1) [101] in R language [102]. In this study, the LTT plots were based on the molecular phylogeny of 128 species in Anseriformes (excluding outgroups).
BAMM is a Bayesian approach that uses reversible jump Markov chain Monte Carlo (rjMCMC) sampling to explore shifts in macro-evolutionary regimes assuming they occur across the branches of a phylogeny under a compound Poisson process, and explicitly accommodates diversification rate variation through time and among lineages [103,104]. Therefore, we used the program BAMM to estimate rates of speciation, conduct rate-through-time analysis of these rates, and identify and visualize shifts in species rates across the molecular phylogeny.
BAMM accounts for non-random and incomplete taxon sampling in the phylogenetic trees by allowing all non-sampled species to be associated with a particular tip or more inclusive clade [103,104]. Therefore, we assumed that our sampling included 80% of extant Anseriformes species diversity (4 families, 39 genera and 128 species in total; S3 Table). Furthermore, priors for BAMM were generated using setBAMMpriors function, implemented in R package BAMMtools v 2.1.6 [103], by providing the Maximum Clade-Credibility (MCC) tree from BEAST and total species numbers across the order (approximately 160) [41]. Four independent MCMC chains of 20,000,000 generations, sampling event data every 20,000 steps, were run in BAMM and convergence was assessed by computing the effective sample sizes of log likelihoods, as well as the number of shift events present in each sample using the R package coda v. 0.16-1 [105]. After removing 10% of trees as burn-in, we analyzed the BAMM output using BAMMtools and computed the 95% credible rate shift configurations using the summary of posterior distribution of the number of shifts.

Mito-genome organization
In this study, nine mito-genomes were sequenced and annotated, which contain 13 PCGs (including ATP6, ATP8, COI, COII, COIII, ND1, ND2, ND3, ND4, ND4L, ND5, ND6 and Cyt b), two rRNAs (12S rRNA and 16S rRNA), 22 tRNAs and a control region (CR), respectively. The total length of these mito-genomes ranged from 16,599 bp to 16,630 bp. The heavy DNA strand (H-strand) carried most of the genes (12 PCGs, two rRNAs and 14 tRNAs), while ND6 and eight tRNAs were located on the L-strand (S1 Fig). The arrangement of the whole mito-genome of these species were identical to known typical vertebrate patterns [106]. The total length of the 13 PCGs was 11,411 bp, which represented approximately 68.7% of the entire mito-genome in Anatidae. The overall base composition is also similar to other Anseriformes species. For example, A + T content (50.6-51.8%) (S6 Table) is higher than C + G content, which reflects the strong AT bias [107]. What's more, the relative abundance of nucleotides is C > A > T > G, which showed that Guanine (G) is the rarest nucleotide (S6 Table), similar to other vertebrate animals [48, 49, 107].

Genetic distance
To understand the sequence divergence within the Anseriformes, we calculated genetic distances among different groups based on COI gene for 54 species. In this study, the K2P distances among four families ranged from 0.158 to 0.208, while the average genetic distances were 0.117 ± 0.015 (Table 1, Dataset 3). In addition, the K2P distances among genera within Anatidae ranged from 0.037 to 0.107, while the average genetic distances were 0.103 ± 0.008 (Table 2, Dataset 3). In this study, the genetic distance between Anatidae and Dendrocygnidae is significantly different from genetic distances within the Anatidae.

Phylogenetic reconstructions
The phylogenetic tree recovered from the ML (S2  (Fig 1). In this study, the 30 species clustered into two major lineages, represented by Anatidae-Dendrocygnidae and Anseranatidae, respectively (Fig 1). There are three major lineages in the Anatidae-Dendrocygnidae group (Fig 1). Additionally, topologies recovered from the MCMC method of Cyt b and ND2 for 128 Anseriformes species obtained highly posterior probability values for most nodes (Fig 2), which shared basic topology with the above results and a previous study [18]. Our study revealed that the 128 Anseriformes species also clustered into two major lineages, represented by Anatidae-Dendrocygnidae and Anhimidae-Anseranatidae, respectively ( Fig  2). Furthermore, the Anatidae-Dendrocygnidae group comprises three major lineages (Fig 2).

Tempo and rate shifts of diversification
The empirical LTT plot displayed that the diversification rate of the Anseriformes had a significant increase during the late Miocene after a long period of constant diversification (Fig 3). In the BAMM analyses, we confirmed convergence of the MCMC chains after discarding the burn-in, according to the result of effective samples size (ESS is 900). In the Anseiformes, the 95% credible set of rate shift configurations with the highest probability included three or more core shifts (Fig 4a). The best configuration detected three shifts within the Anseriformes birds (Fig 4b), suggesting that the mean initial speciation rate was accelerated three times, and finally increased to approximately 0.3 sp/My in their evolutionary history (Fig 4a).

Classification implications based on phylogenetic relationships
In the last few decades, the notion of dividing the Anseriformes into three families (Anatidae, Anhimidae and Anseranatidae) was widely prevalent [19,[35][36][37][38][39][40]. In 1990, Sibley and Ahlquist classified Dendrocygna species into an independent family (Dendrocygnidae) based on DNA-DNA hybridization [42]. However, it was not accepted widely [43]. Based on the tree of two concatenated mitochondrial genes, the Anhimidae clustered together with Anseranatidae, which contains a limited number of species (3 species). Meanwhile, the Dendrocygnidae clustered together with Anatidae, which comprises the majority of species (approximately 125 species). In this study, both trees (mito-genomes, Fig 1 and two concatenated mitochondrial genes, Fig 2) showed that Dendrocygnidae is an independent clade, which diverged in the late Eocene at about 37.3 / 34.1 Ma BP (Figs 1 and 2). In birds, the COI gene is considered as standard for DNA barcoding [108]. For the COI gene, the mean K2P distances within-species, genus and family are 0.43%, 7.93% and 12.71%, respectively [108]. In this study, K2P distances between Dendrocygnidae and the other three families (Anatidae, Anhimade and Anseranatidae) of species ranged from 15.8% to 19.0% (Table 1), which are higher than the average genetic distances within the family. In addition, the mean K2P distance between Anatidae and Dendrocygnidae is 15.8% (Table 1), which is significantly different from the mean K2P distance within the Anatidae (10.3%, Table 2). Therefore, our study provide support for Dendrocygnidae to be an independent family, rather than as a genus within Anatidae. The crested duck (Lophonetta specularioides) and spectacled duck (Speculanas specularis) were once considered as members of the genus Anas [109][110][111]. In this study, six species from genera Amazonetta, Speculanas, Tachyeres and Lophonetta clustered in a distinct clade (Clade C , Fig 2), which is the sister group to Clade B (Fig 2). Meanwhile, Clade B is comprised of twelve Anas species (Fig 2). Besides, Clade B and C are sister groups to Clade A, which is comprised of 31 Anas species (Fig 2). Based on phylogenetic relationships, the extant Anas species (Clade A and B) do not form a monophyletic group. Therefore, in systematics, we suggest that placing these six species (Clade C) into the genus Anas would be more appropriate.
The Egyptian goose (Alopochen aegyptiacus) is the only extant species in the genus Alopochen [110,111]. Based on the phylogenetic analysis, Alopochen aegyptiacus nested tightly together with the Tadorna species in Clade D, rather than being an independent clade (Fig 2). It is obvious that Alopochen aegyptiacus is a member of the genus Tadorna. What's more, the high incidence of hybridization between Alopochen aegyptiaca and Tadorna tadorna also suggest the close genetic proximity between Alopochen and Tadorna [112]. Therefore, we propose that Alopochen aegyptiacus should be placed in the genus Tadorna.

Ancestral diversification during the Early Eocene Climatic Optimum
Compared to the members of Anhimidae and Anseranatidae, Anatidae and Dendrocygnidae species have well developed feet webs which helped them become successful swimmers [43]. They tend to exploit different ecological niches, which eventually generates ecological adaptation in their evolutionary history. For example, the Anatidae and Dendrocygnidae species exist in various types of lacustrine systems with a worldwide distribution, except for the South Pole, while the Anhimidae (found in South America) and Anseranatidae (Australia, Indonesia and Papua New Guinea) species have a narrower distribution [30, 32, 33, 35]. Our study identified the ancestral diversification between the Anhimidae-Anseranatidae group and Anatidae-Dendrocygnidae group occurred at about 58.5 / 56.3 Ma (Figs 1 and 2) in the late Paleocene. Furthermore, we compared this estimated divergence times with other studies [113][114][115] using TimeTree (http://timetree.org/) [116], which also proved that these estimates are reliable.
From the late Paleocene (~58 Ma) to the early Eocene (~50 Ma), the earth's surface experienced a long-term warming trend that culminated in an extended period of extreme warmth called the Early Eocene Climatic Optimum (EECO) [117][118][119][120]. Remarkably, at the boundary between the Paleocene and Eocene epochs (~56 Ma), a most intense and abrupt interval of global warming occurred, called the Paleocene-Eocene Thermal Maximum (PETM) [121][122][123][124]. During the EECO interval, the mean annual temperature and the mean annual rainfall increased significantly, while these climate changes promoted a major increase in floral and faunal diversity [118,125,126]. For example, they have induced diversification in many animal groups, such as insects [127], salamanders [128], bats [10] and the ruminants [12]. In this study, this ancestral diversification in Anseriformes occurred close to the PETM during the EECO. Therefore, we suggest that the warmer climatic conditions during the EECO induced this ancestral diversification in Anseriformes, which reinforced their habit of exploiting different ecological niches.

Rapid and recent diversification during the Plio-Pleistocene
Diversification patterns and other potential driving factors, together with accurate divergence time estimation can be used to predict species change [129]. Therefore, we performed a test to characterize the patterns of diversification of this group by using the molecular phylogeny as a robust framework. Based on the results of divergence time estimation and LTT plot analyses, rapid and recent diversification events can be observed to have occurred in the Anseriformes since 10.0 Ma in the Late Miocene (Figs 2 and 3). Clearly, a higher speciation rate has characterized the rapid and recent diversification in the Anseriformes during the Plio-Pleistocene, while the mean initial speciation rate accelerated three times, and finally increased to approximately 0.3 sp/My (Fig 4). Specifically, the Anatidae comprises the majority of species (approximately 148), which diversified at about 29.1 / 25.1 Ma (Figs 1 and 2). Therefore, we proposed that the Anatidae species might have experienced rapid and recent diversification during the Plio-Pleistocene.
Particularly, many waterfowls (e.g., Mesitornis unicolor, Nipponia Nippon, Pelecanus crispus and Balearica regulorum) have undergone rapid and recent diversification during the Pleistocene [155]. Besides, the Plio-Pleistocene climate fluctuation have also driven the rapid speciation events in Anser species [25]. Therefore, we proposed that the Plio-Pleistocene climate fluctuation have also played a significant role in diversification of Anseriformes birds.

Conclusion
In this study, we sequenced and annotated nine mito-genomes of Anseriformes birds, which ranged from 16,599 bp to 16,630 bp. In addition we have included two mitochondrial genes and reconstructed a strongly supported phylogeny, which covered the majority species (128 species, 39 genera) in Anseriformes. Based on the results of molecular phylogeny and genetic distances, we revised the classification of the Dendrocygna, Amazonetta, Speculanas, Tachyeres, Lophonetta and Alopochen. Furthermore, we strongly suggested that the Dendrocygna species should be considered as an independent family (Dendrocygnidae). During the EECO (from 58 to 50 Ma), the warmer climatic conditions induced the ancestral diversification in Anseriformes, which reinforced their habit of exploiting different ecological niches. At last, our study provided evidence to support that the Plio-Pleistocene climate fluctuation fostered the rapid and recent diversification in Anseriformes, especially in Anatidae.
Supporting information S1