Cytochrome b Divergence between Avian Sister Species Is Linked to Generation Length and Body Mass

It is increasingly realised that the molecular clock does not tick at a constant rate. Rather, mitochondrial mutation rates are influenced by factors such as generation length and body mass. This has implications for the use of genetic data in species delimitation. It could be that speciation, as recognised by avian taxonomists, is associated with a certain minimum genetic distance between sister taxa, in which case we would predict no difference in the cytochrome b divergence of sister taxa according to the species' body size or generation time. Alternatively, if what taxonomists recognise as speciation has tended to be associated with the passage of a minimum amount of time since divergence, then there might be less genetic divergence between sister taxa with slower mutation rates, namely those that are heavier and/or with longer generation times. After excluding non-flying species, we analysed a database of over 600 avian sister species pairs, and found that species pairs with longer generation lengths (which tend to be the larger species) showed less cytochrome b divergence. This finding cautions against using any simple unitary criterion of genetic divergence to delimit species.


Introduction
In the face of mounting evidence that the rate of molecular evolution varies between different lineages, biologists have been increasingly obliged to abandon the simple, albeit appealing, idea of a molecular clock ticking at a constant rate [1]. As knowledge of the specifics of rate variation has grown, so the emphasis has shifted towards trying to understand the basis for variation. Several factors, which may be linked to one another, for example body size, metabolic rate, generation length and population size, have been identified as correlates of the rate of molecular evolution [2][3][4].
While body size may be correlated with the rate of molecular evolution, that correlation does not of itself shed light on the underlying cause of the correlation. First, the correlation might be mediated via generation length [5]. Species that are smaller tend to have shorter generation lengths and thus, per unit time, a higher number of DNA replication rounds within the germline and, consequently, a greater probability of replication errors [4], [6].
Second, the correlation between body size and rate of molecular evolution might be mediated via metabolic rate. Among homeotherms, such as birds, whole-body metabolic rate scales with body mass to the power of approximately L and mass-specific metabolic rate to the power of -J [7][8], but see also ref [9]. That scaling could lead to a reduced rate of mitochondrial respiration in larger animals and a reduced rate of production of mutagenic free radicals, a by-product of aerobic respiration [10][11]. Whether such scaling in the somatic cells also affects the germ-line cells, from where mutations will be transmitted to the next generation, remains unclear [12][13].
A separate hypothesis holds that smaller population sizes, which are often associated with larger body size [14], may allow more slightly deleterious mutations to drift to fixation. This leads to a prediction of an inverse relation between effective population size and rate of molecular evolution, a prediction for which Woolfit and Bromham [15] obtained support from island populations. It also leads to the possibility that larger species, whose generally smaller populations will be exposed to a greater risk of fixing deleterious mutations, may invest more in DNA copy fidelity and repair mechanisms [4], [16] -which would enhance the inverse relationship between generation time and rate of molecular evolution. This enhanced reduction of mutation rate in longlived, as opposed to short-lived, species is the 'longevity hypothesis' [13], [16]. Its predictions overlap with those of the generation time hypothesis of the previous paragraph.
In the past, when the biological species concept prevailed [17], genetic distance data were not widely used to inform species delimitation. However, over the last three decades as, first, aminoacid and then nucleic acid sequence data became available, so the phylogenetic species concept has risen in prominence [18], not least because sequence data are well suited for phylogenetic reconstruction [19]. While ornithologists have sometimes struggled with how best to integrate molecular data into species diagnosis [20][21], such data are frequently used in practice [22][23][24][25]. This renders it crucial to understand factors impinging on the genetic distance between related species. If a suite of factors influences mutation rate, then this information should certainly be noted and possibly be incorporated when genetic data are used to delimit species. In fact, it has long been recognized that there is no uniform level of genetic divergence between vertebrate species, whether allozyme data [26] and/or DNA sequence data (e.g. from cytochrome b [27]) are used. This is to be expected, because the times since sister taxa first diverged and then speciated will vary. That proviso acknowledged, if speciation tended to be associated with a certain minimum genetic distance between sister taxa, then we would predict no difference in the cyt b divergence of sister taxa according to body size or generation time. Alternatively, if speciation tended to be associated with the passage of minimum amount of time since divergence, sufficient, for example, for reproductive isolation to develop, then there might be less genetic divergence between sister taxa that are heavier and/or with longer generation times [28], because, as discussed above, these species will have experienced lower rates of molecular evolution at neutral marker sites.
Our paper addresses the possibility of a relationship between the cyt b divergence of avian sister taxa, the response variable in the analyses, and the average mass, mass difference and generation length of those taxa. To our knowledge, this possibility has not hitherto been investigated. We chose cyt b as a gene widely used in avian barcoding studies [29]. Crucially, we emphasize we are taking current species as a given, and asking whether we can identify factors that are correlated with the degree of genetic difference between those species.

Genetic distance values
Gene sequences were obtained from Genbank for the mitochondrial gene cyt b. All available sequences within a particular genus were downloaded into MEGA4 [30], provided that at least half of the species within the genus had sequences available. The consequences of using a more stringent 75% cutoff, as opposed to the 50% cut-off described, are trivial (see Results). Sequences within each genus were then aligned using ClustalW [31] within MEGA4, and were adjusted by eye. Any sequences that did not align, or were much shorter than all others, were removed. The ends of sequences were then trimmed so that all were of the same length within the alignment group.
Once the sequences were aligned, molecular phylogenies were created for the various individual genera, using both the maximum parsimony and neighbour joining methods, in order to find the sister species pairs within that genus, including the cases where a given sister species pair was already known. In many cases, published phylogenies supported the pairings. Species involved in unresolved polytomies were discarded. These genus-level phylogenies were not used in the wider phylogenetic analysis discussed below and reported in the Results.
After sister species pairs were found, the genetic distance between them was calculated, using the Tamura-Nei model [32].   The Tamura-Nei model accounts for unequal nucleotide frequencies, and different rates of nucleotide transitions and transversions. Where multiple sequences were present for either species, the average value of divergence between the species was calculated. It is worth noting that, as our comparisons involved sister species, saturation of sites is unlikely to be a serious problem. The full dataset is presented in the Table S1.

Mass data
Data on species masses were obtained from [33]. Where a mass range rather than mean value was given for a species, the median of this range was taken. The mean value for each species was calculated if more than one value was given (i.e. separate masses provided for males and females). For analyses of average body mass, a single value was needed for each pair, rather than separate values for each species, so the mean value of the masses was calculated. Where mass data were only available for one of the species pair, then that value was taken to represent both species. However, for analyses involving differences in species' mass, we could only use those pairs where mass data were available for both species. The difference in mass was calculated and then expressed as a percentage of the mass of the heavier member of the pair.
Non-flying species, ratites and penguins, were excluded from the mass and generation time analysis since determinants of these species' masses are likely to be different to those of volant birds.

Generation length
Species generation lengths were obtained from a database supplied by BirdLife International (http://www.birdlife.org/ datazone/home) which, following [34], defines generation length as the average age of the parents of the current cohort.

Statistical analysis
Statistical analyses were carried out in R [35] and R commander [36]. Initial histograms of the cyt b divergences revealed that the data were positively skewed, so the Shapiro-Wilk test for normality [37] was carried out. The data were not normally distributed, and so the square roots of the Tamura-Nei values were used for subsequent analyses.
After initial investigation using ordinary least-squares (OLS) models, we analysed the effects of mass and generation length on divergence using phylogenetic generalized least-squares models (PGLS), using the pglmEstLambda function in CAICR [38] and ape 2.3 packages [39] in R. This method gives an estimate of the strength of the phylogenetic signal within the data (the l value: [40]) as well as estimating the effect of a given factor when the phylogeny is incorporated into the model. As with the linear analyses, the PGLS analysis used the square root of the Tamura-Nei distance for cyt b divergence to improve normality of residuals.
We fitted all possible combinations of our three predictor variables and their interactions, comparing models using Akaike's Information Criterion (AIC; all fitted models are shown in Table 1). Following Burnham & Anderson [41] we accepted as ''top models'' all those within a threshold of 2 AIC points of the minimum within the model set.
A phylogeny of bird species, based on genetic, behavioural and morphological data, was obtained from the Tree of Life [42]. The tree did not contain all the genera within the dataset, so the dataset was trimmed to contain only those genera that were available within the tree, and likewise the tree was trimmed to only those genera found within the dataset. Hence the dataset used in phylogenetic analyses was a subset of the original. Branch lengths on the phylogeny were unknown and were therefore arbitrarily set to 1.

Results
The mean Tamura-Nei divergence of cyt b between sister species pairs was 0.04916s.d. 0.0334 (5th-95th percentiles = 0.005-0.109: n = 633 species pairs; see Table S1). If the analysis is restricted to the smaller sample of those genera where sequence data were available for at least 75% of species within the genus (see Methods), then the mean Tamura-Nei distance barely changes (mean = 0.049260.0336: n = 512). For this reason, all results presented henceforth are based on the less stringent 50% cut-off outlined in the Methods.
All models and their associated coefficients are given in Table 1. There was a single clear top model, which included only two predictor variables: percentage mass difference and generation length (DAIC to the next best model = 2.72). Phylogenetic signal was strong in this model (l = 0.375, test of l versus 0, x 2 = 19.286, p = 1.125610 25 ). According to this model, an increase of 10 percent in the mass difference between the two members of a species pair was associated with an increase of approximately 0.009 in their square-root transformed cyt b divergence ( Fig. 1; dropping ''mass difference'' from top model, PGLS, F 1,274 = 8.892, p = 0.003). In contrast, an increase of one year in the average generation length of a species pair was associated with a decrease of 0.004 in the square-root transformed cyt b divergence between the two species ( Fig. 2; dropping ''generation length'' from top model, PGLS, F 1,274 = 6.997, p = 0.009).
Superficially, there was also what appeared to be a negative relationship between the average mass of a volant species pair and the cyt b divergence of the two species, even though this variable did not appear in the top model. Although this relationship was significant in a single-factor analysis under OLS regression (Fig. 3), incorporating phylogenetic information greatly reduced its significance (PGLS; F 1,275 = 4.339, p = 0.038, l = 0.417, test of l versus 0, x 2 = 14.749, p = 0.0001), consistent with a general pattern of strong phylogenetic conservatism in body mass [43]. When the other important predictor variables were incorporated, the explanatory power of body mass became non-significant (dropping ''body mass'' from the ''M+D+G'' model, PGLS, F 1,273 = 3.257, p = 0.072).
Taken together, we interpret these results as supporting the idea that generation length and percentage mass difference have independent effects on cyt b difference, but mass per se does not, at least when phylogeny is accounted for.

Discussion
Our principal finding was that the cyt b divergence between avian sister taxa decreased as generation length increased. Divergence also decreased as body mass increased, but there is no strong and conclusive evidence that this latter effect was either independent of generation length, to which mass is collinearly related [28], or robust to phylogenetic correction. Thus while speciation is associated with the accumulation of some minimal amount of genetic divergence in neutral markers such as cyt b, other factors bear on the genetic divergence between recognised species and must be borne in mind when genetic data are used for species diagnosis [22][23][24][25]. Therefore, our findings caution against the simple expectation that there might be a uniform genetic divergence between sister species, particularly when comparison is made between species pairs drawn from different lineages.
Cyt b divergence was positively associated with the percentage difference in body size of species (Fig. 1). This association was to be expected, as species that have diverged proportionately more in body size are likely to have speciated in the more distant past, accumulating a greater number of changes in cyt b sequence. That would remain true whether the molecular clock 'ticks' at a rate proportional to absolute time, generation time or metabolic rate.
While percentage difference in mass between sister taxa is related to their cyt b divergence, mass per se has limited effect. This suggests that mass-specific metabolic rate, allometrically related to mass, has limited effect on substitution rates and divergence [10], [11], a negative result supporting those previously obtained [6], [12]. However, our conclusion must be cautious as we did not analyse actual metabolic rates.
There was a decrease in the divergence between species as their generation length increased (Fig. 2). We therefore tentatively suggest that speciation is typically recognised by taxonomists after the passage of a certain amount of time representing, on average, fewer generations in larger species, and more generations in smaller species. Those species with shorter generation lengths may have about the same proportion of DNA changes per generation as those with longer generations, and therefore more changes will accumulate per year in the species with shorter generation times [10]. However the amount of variance explained was very low, about two percent (Fig. 2, legend). Superior DNA repair in the longer-lived species (see Introduction) could contribute to the low amount of variance explained, as could the fact that the gene used in this study, cyt b, was mitochondrial. Such mitochondrial genes may show a weaker correlation with generation time than nuclear DNA, as there is a larger and more variable number of duplications of mitochondrial genomes per generation than of nuclear genomes [44][45].
Lanfear et al. [46] reported that rates of molecular evolution of protein-coding nuclear genes were positively correlated with rates of diversification in various avian lineages. If mutation promotes speciation, then it is tempting to conclude that the difference between the cyt b of sister species might be lower in fast-mutating lineages of smaller taxa. However that need not be the case. If a certain minimum amount of time is required to elapse before diverging lineages are recognised as species by taxonomists, then the smaller species pairs with shorter generation times might show higher divergence, as we find. Implicit in our study is an assumption that the scientific behaviour of avian taxonomists is similar across the range of bird weights. Our results could be explained if cryptic species, awaiting 'splitting', were disproportionately represented among smaller species of low body mass and short generation time. Objectively excluding this possibility would be extremely taxing and we merely observe that avian taxonomy at the species level remains in a state of flux from the smallest Phylloscopus warblers (,10 g: [47]) to large albatrosses (c. 5 kg; [22], [48]).
The fact that a signal indicating effects of body mass and generation length on the extent of genetic divergence between sister taxa can be recovered at all is perhaps remarkable. It confirms speciation as an ongoing biological process that continues to create problems for species delimitation.

Supporting Information
Table S1 The Table shows the 633 sister species pairs used in the analysis, and the cytochrome b divergence between the pairs. (DOC)