Browse Subject Areas

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here.

  • Loading metrics

So many genes, so little time: A practical approach to divergence-time estimation in the genomic era

  • Stephen A. Smith ,

    Roles Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Software, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, United States of America

  • Joseph W. Brown ,

    Contributed equally to this work with: Joseph W. Brown, Joseph F. Walker

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Software, Writing – original draft, Writing – review & editing

    Affiliation Department of Animal and Plant Sciences, University of Sheffield, Sheffield, United Kingdom

  • Joseph F. Walker

    Contributed equally to this work with: Joseph W. Brown, Joseph F. Walker

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Resources, Software, Writing – original draft, Writing – review & editing

    Affiliation Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, United States of America

So many genes, so little time: A practical approach to divergence-time estimation in the genomic era

  • Stephen A. Smith, 
  • Joseph W. Brown, 
  • Joseph F. Walker


Phylogenomic datasets have been successfully used to address questions involving evolutionary relationships, patterns of genome structure, signatures of selection, and gene and genome duplications. However, despite the recent explosion in genomic and transcriptomic data, the utility of these data sources for efficient divergence-time inference remains unexamined. Phylogenomic datasets pose two distinct problems for divergence-time estimation: (i) the volume of data makes inference of the entire dataset intractable, and (ii) the extent of underlying topological and rate heterogeneity across genes makes model mis-specification a real concern. “Gene shopping”, wherein a phylogenomic dataset is winnowed to a set of genes with desirable properties, represents an alternative approach that holds promise in alleviating these issues. We implemented an approach for phylogenomic datasets (available in SortaDate) that filters genes by three criteria: (i) clock-likeness, (ii) reasonable tree length (i.e., discernible information content), and (iii) least topological conflict with a focal species tree (presumed to have already been inferred). Such a winnowing procedure ensures that errors associated with model (both clock and topology) mis-specification are minimized, therefore reducing error in divergence-time estimation. We demonstrated the efficacy of this approach through simulation and applied it to published animal (Aves, Diplopoda, and Hymenoptera) and plant (carnivorous Caryophyllales, broad Caryophyllales, and Vitales) phylogenomic datasets. By quantifying rate heterogeneity across both genes and lineages we found that every empirical dataset examined included genes with clock-like, or nearly clock-like, behavior. Moreover, many datasets had genes that were clock-like, exhibited reasonable evolutionary rates, and were mostly compatible with the species tree. We identified overlap in age estimates when analyzing these filtered genes under strict clock and uncorrelated lognormal (UCLN) models. However, this overlap was often due to imprecise estimates from the UCLN model. We find that “gene shopping” can be an efficient approach to divergence-time inference for phylogenomic datasets that may otherwise be characterized by extensive gene tree heterogeneity.


Divergence-time estimation is a complicated, but often essential, step for many phylogenetic analyses. The sources of error include the ambiguous nature of fossil placement, model mis-specification (e.g., involving significant variation in the branchwise and/or sitewise rates of evolution), uncertainty in the phylogenetic tree, topological dissonance amongst gene trees due to incomplete lineage sorting, and complexity of the model for the molecular clock [16]. While fossils give the only available information for absolute age, their placement and age carry uncertainty. Multiple fossil calibrations and complicated tree shape priors can interact to further complicate molecular dating [711]. Rate variation is common among individual branches of a phylogeny and can constitute extensive deviations from the molecular clock. As a result, complex models have been developed to accommodate for these deviations [1214]. However, these more parameter-rich models also carry with them significant uncertainty and can, when the data deviate significantly from the model, lead to biased results [15]. Despite these difficulties, researchers continue to use divergence-time estimates extensively as they remain essential for many downstream evolutionary and comparative analyses.

Datasets based on thousands of genes from genomes and transcriptomes have emerged as a major tool in addressing broad evolutionary questions including, but not limited to, phylogenetic reconstruction, gene and genome duplication, and inference of molecular evolutionary patterns and processes. And while these datasets have been used for divergence-time estimation [16, 17], their overall utility for divergence-time analyses has not been fully examined. In particular, it is unclear whether within these enormous datasets there exist nearly clock-like gene regions that may aid in producing lower error divergence-time estimates. While some authors of recent large genomic analyses, such as Jarvis et al. [16], have suggested restricting analysis to clock-like genes, a repeatable and fast procedure to identify these genes has not been explored for phylogenomics, and an examination of the frequency of these genes in empirical datasets has not yet been conducted.

Researchers can take steps to ease sources of errors for divergence-time analyses. For example, better use of fossils in temporal calibrations can dramatically improve estimations [3, 18], as does better accounting for rate variation in the molecular models by improving model fit. Several relaxed clock models have been introduced over the last few decades to accommodate rate heterogeneity because most data do not conform to a strict clock. The most commonly used relaxed clock methods include penalized likelihood [12] as implemented in r8s [19] (and more recently in treePL; [20]), and Bayesian uncorrelated rate models (e.g., the uncorrelated lognormal (UCLN) model; [13]) as implemented in BEAST [21] and MrBayes [22], although many other methods have been developed and new ones are continually released [14, 2332]. The diversity of techniques is matched with a variety of different inputs. For example, PL implementations minimally require an estimated phylogram, calibration, smoothing penalty value, and alignment size, while full Bayesian methods minimally require an alignment and priors to be set for each parameter, including any fossil calibrations.

Bayesian methods that use relaxed clock models, such as those implemented in BEAST, MrBayes, and PhyloBayes [25], simultaneously estimate phylogenetic relationships and divergence-times, and so may be preferred over other approaches as Bayesian methods incorporate uncertainty more easily and explicitly. However, the computational burden of these simultaneous reconstruction methods limit their use to smaller datasets (i.e., excluding entire genomes and transcriptomes). Fortunately, a prescient solution to this dilemma was proffered two decades ago with the concept of “gene shopping” [33, 34], wherein available genes are filtered by how well they conform to a molecular clock. [A related procedure, “taxon-shopping” [23, 35, 36], prunes taxa from an alignment until the dataset no longer rejects a molecular clock test. We do not consider this approach here.]. Using “gene shopping”, it should be possible to reduce larger datasets to alignments that are capable of being analyzed by Bayesian methods. However, despite it having been available for decades, “gene shopping” has not been widely applicable before the recent development of next-generation sequencing techniques because of the relatively small number of genes available for any single clade (outside of model organisms). However, as genomic and transcriptomic datasets have become more readily available, “gene shopping” holds tremendous promise as a tool for inferring phylogenetic timescales. Nevertheless, the utility of these large genomic datasets for divergence-time estimation and the distribution of lineage-specific rate heterogeneity has yet to be fully explored.

Next generation sequencing techniques have dramatically increased the number of gene regions available for phylogenetic analysis. This has stimulated research into questions that are specifically pertinent to datasets with hundreds or thousands of genes. What is the best method for reconstructing the species tree [3739]? How many genes support the dominant species tree signal [40, 41]? Genomic datasets also allow us to examine the extent of molecular rate variation in genes, genomes, and lineages. For example, Yang et al. [42] explored the distribution of lineage-specific rate heterogeneity throughout transcriptomes of the plant clade Caryophyllales as it relates to life history. Jarvis et al. [16], analyzing a genomic dataset of birds, explored rate heterogeneity and selection as it relates to errors in phylogeny reconstruction in a genomic dataset of birds. Recently, the clock-likeness of phylogenomic datasets has come of interest to the community. For example, Doyle et al. [43] attempt to identify strictly clock-like genes in order to avoid long branch attraction artifacts in phylogenetic inference. Jarvis et al. [16] filtered gene regions by inferred mean coefficient of rate variation (a measure of clock-likeness) from full Bayesian analyses of each gene, to identify “clock-like” genes explicitly for divergence-time estimation. However, while these authors have conducted filtering analysis on their genomic data, a thorough examination of lineage-specific rate heterogeneity across clades for divergence-time estimation has not been conducted.

Here we present one means of utilizing genomic data for estimating divergence-times by introducing a simple winnowing procedure to identify informative and low variance genes (see description below). Genes that satisfy these criteria may enable convenient and efficient divergence-time estimation for large datasets as methods such as clock partitioning are difficult when dealing with dozens, hundreds, or even thousands of genes [44]. Additionally, this procedure can be used to examine the overall distribution of evolutionary rate heterogeneity, bipartition concordance, and potential utility of genes for divergence-time analysis. It is assumed that a researcher will possess a species tree in order to determine what gene regions best conform to that hypothesis. This can be especially important given the underlying conflict present in most genomic datasets. The procedure holds promise that, while various relaxed clock models are available (conducive to accommodating different forms of rate heterogeneity), data have lower variance and may not require as extensive rate modeling will enable fast, accurate, and precise divergence-time estimates [45, see also]. Finally, we examine six genomic and transcriptomic datasets across animals and plants and with different temporal and taxonomic scopes to examine the extent of lineage-specific rate heterogeneity. We investigate the distribution of variation in the branchwise rates of evolution across thousands of genes to understand whether these new genomic resources may improve divergence-time estimation by enabling analysis using simpler models of molecular evolution.

Materials and methods

Description of procedures

Because phylogenomic datasets consist of hundreds to thousands of genes, we developed a winnowing procedure intended to mimic a “gene shopping” exercise (see Fig 1). Our approach has two requirements: (i) a species tree topology must be provided (presumed to have been inferred by some means), and (ii) individual gene trees must be rooted (this is typically performed using outgroups). The filtering procedure itself relies on three criteria relevant to divergence-time estimation: (i) a root-to-tip variance statistic (an indication of clock-likeness), (ii) a bipartition calculation to determine the similarity to the provided species tree, and (iii) the total tree length (i.e., discernible information content). Our implementation of this pipeline sorts first by the similarity to the species tree, then root-to-tip variance, and finally tree length. This procedure produces a ranked list of genes that jointly satisfy the filtering criteria. Below we demonstrate the efficacy of this approach using both simulated and empirical data. The analyses performed herein can be conducted using the SortaDate package (with source code and instructions available at This package is written in Python and available as an open source set of procedures. In some cases, external programs are used (e.g., those found in the Phyx package; [46]) that are also open source and freely available.

Fig 1. Winnowing criteria used for sorting genes for use in divergence-time inference analyses.

The order presented here is arbitrary.

Dataset processing

We analyzed six published phylogenomic datasets to explore rate and topological heterogeneity: birds (Aves, BIR) [16], carnivorous Caryophyllales (CAR) [47], the broader Caryophyllales (CARY) [42], Vitales (VIT) [48], Hymenoptera (HYM) [49], and millipedes (Diplopoda, MIL) [50]. The range in datasets spans different taxonomic groups (e.g., animals vs. plants), datasets sizes (e.g., CAR vs CARY), and age (e.g., from hundreds of millions of years to within the last hundred million years). For all datasets except BIR orthologs were identified using the Maximum Inclusion method of [51]. These data are available at For each ortholog we have an alignment and maximum likelihood (ML) gene tree inferred in RAxML v8.2.3 [52] using the GTR+G model for DNA and WAG+G for amino acids. For the BIR dataset we used the exon alignments and inferred phylograms available online ([53]; Gene trees, regardless of the source of orthologs, were then rooted and SH-like tests were performed to assess confidence in edges [54].

Gene tree analyses

Because deviation from the clock is empirically manifest in a phylogram as a departure from ultrametricity, we measured the variance of root-to-tip lengths for each ML gene tree phylogram. This was performed on each rooted ortholog, for which outgroups were removed with the pxlstr program of Phyx package [46]. We performed the standard clock test for each ortholog [55], with outgroup removed, using PAUP* v4.0a151 [56] by calculating the ML score for a gene both with and without assuming a clock, and then performing a likelihood ratio test. In addition to assessing the clock-likeness of genes, we also compared gene tree topologies to the corresponding published species tree topology. Branch lengths were not available for some species trees. To compare the individual gene trees to their corresponding species trees, we conducted bipartition comparison analyses on each gene tree using pxbp from the Phyx package [41].


We conducted simulations to examine expectations of rate variation given clock-like, noisy clock-like, and uncorrelated lognormal data. We first generated simulated clock-like data using Indelible v1.03 [57] using the WAG model with 500 characters for amino acid datasets, and JC with 1500 characters for nucleotide datasets, on each of the empirical species tree topologies. For these data simulations, because the species tree often had no branch lengths available, node heights were first simulated randomly using Indelible and then the tree was rescaled to a height of 0.25, 0.5, or 0.75. We used the trees generated by Indelible to further simulate 100 noisy clock (rate = 1.0, noise = 0.25, and rate = 1.0, noise = 0.75) and uncorrelated lognormal (UCLN) trees (mean.log = -0.5, stdev.log = 0.5, and mean.log = -0.5, stdev.log = 1.0) using NELSI v0.21 [58]. We note the ‘noise’ in NELSI corresponds to the standard deviation of a normal distribution with mean = 0. For the noisy clock, branch-specific rates are a sum of the global rate (here, 1.0) and a draw from this normal distribution. The simulations with noise = 0.75 thus are only loosely clock-like, and serve as a comparison between the more clock-like (noise = 0.25) and UCLN analyses. We used RAxML to reconstruct each of these datasets. For each simulation, we examined the rate variation and the root-to-tip length variation on the reconstructed phylograms.

While the focus of this study is not the performance of divergence-time estimation methods, we nevertheless examine an exemplar from the simulations to ascertain the variation in the results given different clock models. We used one random realization of node heights as simulated from the Indelible analyses as described above to generate two datasets with NELSI. One dataset had three genes generated from a clock rate of 1 and noise at 0.25, and the other dataset had three genes generated from a UCLN model and mean.log at -0.5 and sd.log = 1. As above, each amino acid gene consisted of 500 residues, while DNA genes consisted of 1500 nucleotides. For each simulation, all three genes shared a common topology (but with different edge lengths, as our filtering procedure involves a single focal topology). For both the noisy clock-like and UCLN datasets, we conducted BEAST analyses with both a clock model and a UCLN model. Thus we have two scenarios where the generating and inference clock models are identical, and two where they are mismatched. A birth-death tree prior was used as the prior for all node heights, and runs were conducted for 10 million generations with the first 10% discarded as burnin. Results were summarized using treeannotator from the BEAST package. Median node heights as well as 95% highest posterior density (HPD) node heights were compared between the simulated datasets and the tree used to generate these datasets.

Finally, to demonstrate the inference precision afforded by our winnowing procedure, we conducted simulations where heterogeneity was restricted to overall molecular substitution rates (i.e., tree lengths) and the degree of unclock-like noise involved. In all, 1000 Yule trees with a speciation rate of 1.0 were simulated for 12 taxa using pxbdsim from the Phyx package. For each tree, we simulated 1000 nucleotide alignments under JC, to reduce additional variation, and a strict clock model with varying degrees of noise. ML trees for each alignment were inferred using RAxML. From each pool of alignments we constructed 2 datasets: (i) three genes chosen randomly, and (ii) three genes selected from the SortaDate procedure. All datasets were analyzed in BEAST under both strict clock and UCLN models, where genes shared a clock model but had gene-specific relative rates and individual GTR+G substitution models. The tree topology was fixed to match the true tree. As above, analyses were conducted for 10 million generations with the first 10% discarded as burnin. Finally, we rescaled the root height of the simulated tree to the true height and then we compare the inferred median node ages from the resulting maximum clade credibility trees to the true simulated tree. We note that because within a simulation there is no real topological conflict, any heterogeneity involved in inference accuracy involves (i) the gene selection criteria and (ii) the clock model used for inference.

Sorting and dating analyses on real data

In addition to these analyses on simulated datasets, we conducted divergence-time analyses on a subset of the empirical datasets. For these examples we limited the results to the top three genes reported from the sorting procedure, although for empirical analyses one should choose the number of genes by carefully examining the filtering results. Because we filtered for genes that were consistent with the species tree, these genes were then concatenated and the topology was fixed to be consistent with the species tree. We applied individual substitution models to each gene within a dataset. However, given that the genes were filtered both to match the species tree and for clock-likeness, we modeled all genes with a single molecular clock (albeit with gene-specific relative rates). For each dataset, we conducted two BEAST analyses, one assuming a strict clock and the other assuming a UCLN model. Because specific dates were not the focus of this examination, the birth-death tree prior was used instead of fossil priors for nodes. The analyses were run in duplicate for 50 million generations with the first 10% discarded as burnin. Replicate MCMC logs were concatenated while removing burnin using the pxlog program from the Phyx package, and finally summarized using treeannotator as above. Median node heights and 95% HPD node heights were compared between the clock and UCLN runs as the node heights on the true phylogeny are unknown.

Results and discussion

A fundamental question for each empirical dataset examined here is whether there are clock-like gene regions present within the genome? Results were varied, from 0.4% of genes passing the strict clock test for the VIT dataset to 17% for the MIL dataset (see Table 1). Those datasets with fewer clock-like genes may have underlying models that deviate significantly from the clock. However, there are several reasons that, even with a strict clock model underlying these data, we might expect deviations from a clock. For example, we expect stochastic differences to accumulate with increased sampling (i.e., more edges) and older trees (i.e., longer edges). Within the datasets examined here, two, the CAR and CARY, overlap in sampling with the CAR dataset having far fewer taxa in total. The broader sampling within the CARY dataset, in addition to greater dataset size, also results in including more known life-history shifts [42]. These differences may account for the variation between these two datasets with the CARY dataset having significantly more variation and fewer clock-like genes. As for clade age, HYM and MIL datasets include divergence-times that are significantly older than the other datasets, which may account for the rate variation within those data. Nevertheless, each dataset had at least a few orthologs that passed a strict clock test even if these orthologs were in the small minority.

Table 1. Dataset properties and results of likelihood ratio tests for strict clock-like gene behaviour.

Because passing a clock test does not necessarily indicate that the gene would be good for phylogenetic reconstruction (e.g., if the rate of evolution is so slow as to be uninformative), we also measured tree length and root-to-tip variance for each ortholog (see Figs 2 and 3). Clock tests are stringent in their need to conform to the clock (see below) and so by examining the root-to-tip variance and lineage-specific variation (that is, deviation of individual terminals from the mean), we are more directly examining the deviation from ultrametricity. Although this is primarily descriptive and does not include a formal test, this provides an easily interpretable characterization of rate variation. We found that the datasets varied dramatically with no discernible general pattern for both root-to-tip variance and tree length. For example, the BIR dataset demonstrated very little molecular evolution as demonstrated by the short tree lengths. For this dataset, we analyzed nucleotides (rather than amino acids) to maximize tree lengths as Jarvis et al. [16] demonstrated low rates of evolution, especially deep in the phylogeny. However, the inferred rates of evolution (as determined by overall tree length) were still low. Given the difficulty in resolving the avian phylogeny, this pattern is, perhaps, to be expected [16]. This same pattern is present in the VIT dataset, though this was not explored as thoroughly in the original publication. Both the CAR and CARY datasets showed a pattern of increasing variance with greater tree length (Fig 2). This contrasts with the HYM and MIL datasets that were clock-like even with longer tree lengths (Fig 3). Lineage-specific rate variation in each dataset was idiosyncratic with most extreme variation in the outgroups. While outgroups were excluded for clock tests and in determining root-to-tip variance for “gene shopping”, we allowed outgroups to remain for lineage-specific rate variation analyses as in the right hand plots of Fig 3. The VIT dataset was an exception with several lineages other than the outgroup having high rates. In each dataset, there were genes that fell within the distribution of simulated trees that are clock-like or clock-like with low noise.

Fig 2. Gene tree properties for the BIR, CAR, and CARY datasets.

Left: relationship between root-to-tip variance and tree length for simulated (clock and ucln) and empirical (data) datasets. Each simulation condition consists of 100 simulated datasets. Contours represent densities, while grey dots represent raw empirical values. Right: tip-specific root-to-tip variance for empirical datasets. Here, 0 represents the mean root-to-tip across all genes and all lineages. Red dots indicate outgroup taxa.

Fig 3. Gene tree properties for the VIT, HYM, and MIL datasets.

See Fig 2 for a description of the plotting attributes.

One potential benefit of identifying orthologs with lower lineage-specific rate variation within phylogenomic datasets is to use these, or a subset of these, orthologs to conduct divergence-time analyses. The hope is that by using clock-like genes, we may overcome or alleviate the impact of lineage-specific rate variation on the error of divergence-time analyses. The non-identifiability of rates and dates (e.g., longer branch lengths may be the result of a long time or fast evolution) is exacerbated by lineage-specific rate heterogeneity. We used a subset of orthologs to conduct divergence-time analyses and we implemented a sorting procedure (available in SortaDate) to (i) filter the genes that best reflect the species tree (i.e., higher bipartition concordance with the species tree), (ii) have lower root-to-tip variance (i.e., most clock-like), and (iii) discernible amounts of molecular evolution (i.e., greater tree length; Fig 1). For each empirical dataset, we generated such an alignment (see Table 2). The genes that were filtered and used for divergence-time analyses for the BIR, CARY, VIT, and HYM datasets rejected the clock. The genes for the CAR and MIL datasets either didn’t or weakly rejected the clock. Resulting HPD trees were rescaled so that the root heights were equivalent to allow for easier comparisons between datasets. Specifically, the ucln median root height was scaled to the clock median root height. Typically, fossil placements would be used for scaling but because these are not intended to be runs for future use, we eliminated fossil placements as one source of variation. We found rough correspondence of node heights between the clock and UCLN analyses, especially for the four smallest datasets (see Fig 4). The UCLN analyses, as expected, had far greater variance in the 95% HPDs for node ages. We found the greatest differences in the larger BIR and CARY datasets (see Table 3) where there are major differences in tree heights. This may reflect the size of the dataset or the underlying rate variation in the datasets. In general, strict clock estimates resulted in older median node ages than analogous UCLN estimates, as well as narrower maximum and minimum 95% HPD values (see Table 3). The coefficient of rate variation statistics (an measure of clock-likeness) for UCLN runs ranged from the lowest mean values of 0.2358 (stErr = 0.0347) in HYM to the highest of 1.2464 (stErr = 0.1135) in CARY.

Table 2. Properties of the filtered genes used in the empirical dating analyses.

Variance regards the root-to-tip paths. Tree length is measured in units of expected substitutions per site across all branches. Bipartition proportion measures agreement to the species tree topology (1.0 indicates complete concordance).

Table 3. The cumulative difference across all nodes in the height, lower 95% HPD, and higher 95% HPD of each node comparing the UCLN estimates to the strict clock estimates from the individual empirical dating analyses.

A value lower than 0 results when the cumulative difference in the clock values of height or HPD are younger than the associated UCLN values.

Fig 4. A comparison of strict clock and UCLN estimates of node ages for the six curated empirical datasets.

Bars represent 95% HPD intervals and overlay the UCLN maximum clade credibility tree.

As is always the problem with real datasets, the true divergence-times are unknown. So we conducted exemplar analyses. For each empirical dataset, we simulated data for three genes under both noisy clock and UCLN models to examine the variation in the resulting divergence-time analyses where the true dates were known. For these simulated datasets, a strict clock was rejected in each case, including those datasets that were simulated under a clock with noise. We compared the resulting node heights from the divergence-time analyses under clock and UCLN models with the tree used for simulation (see Tables 4 and 5 and Fig 5). For the datasets generated under a noisy clock model, more of the true node heights were found in the 95% HPD interval when using the UCLN model for inference than the strict clock model for inference. However, the precision as measured by the total width of the 95% HPD interval for the UCLN runs were much lower than the clock runs (see Tables 4 and 5). Those nodes that were not within the interval of the 95% HPD when using the strict clock model for reconstruction, were close to the true value. So, while fewer true node ages were contained in the strict clock HPDs, the overall error rate was lower. For example, in the CARY dataset, while fewer nodes in the clock estimate were found to be within the interval (52 vs 67 for the UCLN), the distance of the interval from the estimate was lower for the clock dataset for both the high and low value for the 95% HPD. Stated another way, the UCLN intervals were large enough that the true age was often included, but this was at the cost of far lower precision. Because of this error relative to the strict clock, the UCLN perhaps should not be the preferred model, especially if the researcher is going to use a single summary tree for future analyses.

Table 4. Assessment of dating error for the clock (CL) and UCLN (UC) analyses of the simulated clock data.

All measures involve distance from the true node age, and are cumulative sums across all nodes. Height is the inferred node age. Lower and Higher regard the 95% HPD node age bounds. Nodes (%) indicates the percent of true node ages contained within the HPD interval. Error is the total error involved, equivalent to Low + High.

Table 5. Assessment of dating error for the clock (CL) and UCLN (UC) analyses of the simulated ucln data.

All measures involve distance from the true node age, and are cumulative sums across all nodes. Height is the inferred node age. Lower and Higher regard the 95% HPD node age bounds. Nodes (%) indicates the percent of true node ages contained within the HPD interval. Error is the total error involved, equivalent to Low + High.

Fig 5. A comparison of strict clock and UCLN estimates of node ages for the simulated clock and ucln datasets.

Each simulation condition consists of three genes. Red and pink are scenarios where the generating and inference models are identical, while green and blue are where the models are mismatched. Bars represent 95% HPD intervals and overlay the true simulated tree.

Our results on the empirical datasets, both with simulated and real data, suggest that a ‘gene shopping’ approach is can yield more precise estimates of divergence times. However, because the complexity of the underlying gene trees for these empirical datasets can complicate our interpretation of divergence-time results, we conducted additional simulations where the underlying true trees and true divergence-times were known. In each simulation, genes shared the same underlying species tree but varied in the rate and noise of the underlying clock model. In comparing the divergence-times estimated using the genes identified from SortaDate versus those chosen at random, it is evident that winnowed genes produced more precise estimates (Fig 6). This is expected as SortaDate aims to reduce variation in aspects of genetic data that increase variance. In addition, reducing inference model complexity to a strict clock reduced error substantially. Given that the data were generated with a clock, this was not unexpected but we note that the UCLN model did not perform like a clock when the data are essentially clock-like. Therefore, when possible, it seems prudent to restrict model complexity (i.e., when the data conform to the model). However, whether this is possible will depend entirely on the properties of the empirical dataset in question.

Fig 6. Results from a simulation comparing total difference between true and estimated ages for three random genes under a clock (RC) and UCLN (RU) and three genes chosen with SortaDate under a clock (SC) and UCLN (SU).

Results are from 100 simulations of distinct parameter values. See details in the text.

Reconsidering standard clock tests

Several gene trees from the examples discussed fail a standard strict clock test but have low root-to-tip variance. To briefly explore whether this was the result of the stringency of the clock test, we simulated strict clock amino acid and nucleotide data on orthologs from each empirical dataset and examined the frequency of incorrectly rejecting a strict clock. The false rejection rate for clock tests using amino acid data and a strict clock were between 5% and 8%. For the two nucleotide datasets, the rejection rate was much higher at 23% and 46%. This suggests that for amino acid data, the false rejection rate was near the nominal value, while for the nucleotide datasets the false rejection rate was unreliable. Both nucleotide datasets (BIR and CARY) also had the largest number of species and so the rejection rate may be a function of the number of taxa (i.e., with a greater number of sampled lineages, cumulative stochastic variation for even clock-like data can lead to the rejection of a strict clock). Sensitivity of the clock-test to nucleotide data is not the focus of this study, but should be examined in more detail. Furthermore, it would be more informative to examine the deviation from the clock instead of a boolean test of significant fit. In regard to divergence-time estimation, if a strict or stricter clock can be used, molecular phylogenies may be dated with significantly lower error. As an added benefit, fewer fossils would be necessary to calibrate nodes (and indirectly, rates). We suggest that the community explore model fit to relaxed clock models as well as potential alternatives to the prevailing strict clock test that may be more beneficial for divergence-time estimates and more informative in regard to rate heterogeneity in phylogenomic datasets.


We would like to thank Oscar Vargas, Greg Stull, Jianjun Jin, Drew Larson, Ning Wang, Caroline Parins-Fukuchi, Edwige Moyroud, Lijun Zhao, and Nat Walker-Hale for discussion of the manuscript and method. Greg Stull assisted with testing the SortaDate code. JFW and SAS were supported by NSF 1354048. JWB and SAS were supported by NSF 1207915.


  1. 1. Smith SA, Beaulieu JM, Donoghue MJ. An uncorrelated relaxed-clock analysis suggests an earlier origin for flowering plants. Proceedings of the National Academy of Sciences. 2010;107(13):5897–5902.
  2. 2. Dornburg A, Brandley MC, McGowen MR, Near TJ. Relaxed Clocks and Inferences of Heterogeneous Patterns of Nucleotide Substitution and Divergence Time Estimates across Whales and Dolphins (Mammalia: Cetacea). Molecular Biology and Evolution. 2012;29(2):721–736. pmid:21926070
  3. 3. Parham JF, Donoghue PCJ, Bell CJ, Calway TD, Head JJ, Holroyd PA, et al. Best Practices for Justifying Fossil Calibrations. Systematic Biology. 2012;61(2):346–359. pmid:22105867
  4. 4. Heath TA, Moore BR. Bayesian inference of species divergence times. In: Chen MH, Kuo L, Lewis PO, editors. Bayesian Phylogenetics: Methods Algorithms, and Applications. Boca Raton, Florida: CRC Press; 2014. p. 277–318.
  5. 5. Beaulieu JM, O’Meara BC, Crane P, Donoghue MJ. Heterogeneous Rates of Molecular Evolution and Diversification Could Explain the Triassic Age Estimate for Angiosperms. Systematic Biology. 2015;64(5):869–878. pmid:25944476
  6. 6. Kumar S, Hedges SB. Advances in time estimation methods for molecular data. Molecular Biology and Evolution. 2016;33(4):863–869. pmid:26882983
  7. 7. Zhu T, Dos Reis M, Yang Z. Characterization of the Uncertainty of Divergence Time Estimation under Relaxed Molecular Clock Models Using Multiple Loci. Systematic Biology. 2015;64(2):267–280. pmid:25503979
  8. 8. Heled J, Drummond AJ. Calibrated Birth—Death Phylogenetic Time-Tree Priors for Bayesian Inference. Systematic Biology. 2015;64(3):369–383. pmid:25398445
  9. 9. Rannala B. Conceptual issues in Bayesian divergence time estimation. Philosophical Transactions of the Royal Society of London B: Biological Sciences. 2016;371(1699):20150134. pmid:27325831
  10. 10. dos Reis M. Notes on the birth—death prior with fossil calibrations for Bayesian estimation of species divergence times. Philosophical Transactions of the Royal Society of London B: Biological Sciences. 2016;371(1699):20150128. pmid:27325826
  11. 11. Brown JW, Smith SA. The Past Sure Is Tense: On Interpreting Phylogenetic Divergence Time Estimates. Systematic Biology. 2017;syx074.
  12. 12. Sanderson MJ. Estimating absolute rates of molecular evolution and divergence times: a penalized likelihood approach. Molecular Biology and Evolution. 2002;19(1):101–109. pmid:11752195
  13. 13. Drummond AJ, Ho SY, Phillips MJ, Rambaut A. Relaxed phylogenetics and dating with confidence. PLoS Biology. 2006;4(5):699.
  14. 14. Drummond AJ, Suchard MA. Bayesian random local clocks, or one rate to rule them all. BMC Biology. 2010;8(1):114. pmid:20807414
  15. 15. Worobey M, Han GZ, Rambaut A. A synchronized global sweep of the internal genes of modern avian influenza virus. Nature. 2014;508(7495):254–257. pmid:24531761
  16. 16. Jarvis ED, Mirarab S, Aberer AJ, Li B, Houde P, Li C, et al. Whole-genome analyses resolve early branches in the tree of life of modern birds. Science. 2014;346(6215):1320–1331. pmid:25504713
  17. 17. Prum RO, Berv JS, Dornburg A, Field DJ, Townsend JP, Lemmon EM, et al. A comprehensive phylogeny of birds (Aves) using targeted next-generation DNA sequencing. Nature. 2015;526(7574):569–573. pmid:26444237
  18. 18. Ksepka DT, Parham JF, Allman JF, Benton MJ, Carrano MT, Cranston KA, et al. The Fossil Calibration Database—A New Resource for Divergence Dating. Systematic Biology. 2015;64(5):853–859. pmid:25922515
  19. 19. Sanderson MJ. r8s: inferring absolute rates of molecular evolution and divergence times in the absence of a molecular clock. Bioinformatics. 2003;19(2):301–302. pmid:12538260
  20. 20. Smith SA, O’Meara BC. treePL: divergence time estimation using penalized likelihood for large phylogenies. Bioinformatics. 2012;28(20):2689–2690. pmid:22908216
  21. 21. Drummond AJ, Rambaut A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evolutionary Biology. 2007;7(1):214. pmid:17996036
  22. 22. Ronquist F, Teslenko M, Van Der Mark P, Ayres DL, Darling A, Höhna S, et al. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Systematic Biology. 2012;61(3):539–542. pmid:22357727
  23. 23. Takezaki N, Rzhetsky A, Nei M. Phylogenetic test of the molecular clock and linearized trees. Molecular Biology and Evolution. 1995;12(5):823–833. pmid:7476128
  24. 24. Thorne JL, Kishino H. Divergence Time and Evolutionary Rate Estimation with Multilocus Data. Systematic Biology. 2002;51(5):689–702. pmid:12396584
  25. 25. Lartillot N, Philippe H. A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Molecular Biology and Evolution. 2004;21(6):1095–1109. pmid:15014145
  26. 26. Britton T, Anderson CL, Jacquet D, Lundqvist S, Bremer K. Estimating divergence times in large phylogenetic trees. Systematic Biology. 2007;56(5):741–752. pmid:17886144
  27. 27. Lepage T, Bryant D, Philippe H, Lartillot N. A General Comparison of Relaxed Molecular Clock Models. Molecular Biology and Evolution. 2007;24(12):2669–2680. pmid:17890241
  28. 28. Rannala B, Yang Z. Inferring speciation times under an episodic molecular clock. Systematic Biology. 2007;56(3):453–466. pmid:17558967
  29. 29. Tamura K, Battistuzzi FU, Billing-Ross P, Murillo O, Filipski A, Kumar S. Estimating divergence times in large molecular phylogenies. Proceedings of the National Academy of Sciences. 2012;109(47):19333–19338.
  30. 30. Heath TA, Huelsenbeck JP, Stadler T. The fossilized birth—death process for coherent calibration of divergence-time estimates. Proceedings of the National Academy of Sciences. 2014;111(29):E2957–E2966.
  31. 31. Ronquist F, Lartillot N, Phillips MJ. Closing the gap between rocks and clocks using total-evidence dating. Philosophical Transactions of the Royal Society of London B: Biological Sciences. 2016;371(1699):20150136. pmid:27325833
  32. 32. Lartillot N, Phillips MJ, Ronquist F. A mixed relaxed clock model. Philosophical Transactions of the Royal Society of London B: Biological Sciences. 2016;371(1699).
  33. 33. Hedges SB, Parker PH, Sibley CG, Kumar S. Continental breakup and the ordinal diversification of birds and mammals. Nature. 1996;381(6579):226–229. pmid:8622763
  34. 34. Kumar S, Hedges SB. A molecular timescale for vertebrate evolution. Nature. 1998;392(6679):917–920. pmid:9582070
  35. 35. van Tuinen M, Hedges SB. Calibration of avian molecular clocks. Molecular Biology and Evolution. 2001;18(2):206–213. pmid:11158379
  36. 36. van Tuinen M, Dyke GJ. Calibration of galliform molecular clocks using multiple fossils and genetic partitions. Molecular Phylogenetics and Evolution. 2004;30(1):74–86. pmid:15022759
  37. 37. Gatesy J, Springer MS. Phylogenetic analysis at deep timescales: Unreliable gene trees, bypassed hidden support, and the coalescence/concatalescence conundrum. Molecular Phylogenetics and Evolution. 2014;80:231–266. pmid:25152276
  38. 38. Mirarab S, Bayzid MS, Boussau B, Warnow T. Statistical binning enables an accurate coalescent-based estimation of the avian tree. Science. 2014;346(6215):1250463. pmid:25504728
  39. 39. Roch S, Warnow T. On the Robustness to Gene Tree Estimation Error (or lack thereof) of Coalescent-Based Species Tree Methods. Systematic Biology. 2015;64(4):663–676. pmid:25813358
  40. 40. Salichos L, Stamatakis A, Rokas A. Novel Information Theory-Based Measures for Quantifying Incongruence among Phylogenetic Trees. Molecular Biology and Evolution. 2014;31(5):1261–1271. pmid:24509691
  41. 41. Smith SA, Moore MJ, Brown JW, Yang Y. Analysis of phylogenomic datasets reveals conflict, concordance, and gene duplications with examples from animals and plants. BMC Evolutionary Biology. 2015;15(1):150. pmid:26239519
  42. 42. Yang Y, Moore MJ, Brockington SF, Soltis DE, Wong GKS, Carpenter EJ, et al. Dissecting Molecular Evolution in the Highly Diverse Plant Clade Caryophyllales Using Transcriptome Sequencing. Molecular Biology and Evolution. 2015;32(8):2001–2014. pmid:25837578
  43. 43. Doyle VP, Young RE, Naylor GJ, Brown JM. Can we identify genes with increased phylogenetic reliability? Systematic Biology. 2015;64(5):824–837. pmid:26099258
  44. 44. Duchêne S, Molak M, Ho SY. ClockstaR: choosing the number of relaxed-clock models in molecular phylogenetic analysis. Bioinformatics. 2013; p. btt665.
  45. 45. To TH, Jung M, Lycett S, Gascuel O. Fast dating using least-squares criteria and algorithms. Systematic Biology. 2015;65(1):82–97. pmid:26424727
  46. 46. Brown JW, Walker JF, Smith SA. Phyx: phylogenetic tools for Unix. Bioinformatics. 2017;33(12):1886–1888. pmid:28174903
  47. 47. Walker JF, Yang Y, Moore MJ, Mikenas J, Timoneda A, Brockington SF, et al. Widespread paleopolyploidy, gene tree conflict, and recalcitrant relationships among the carnivorous Caryophyllales. American Journal of Botany. 2017;104(6):858–867. pmid:28634254
  48. 48. Wen J, Xiong Z, Nie ZL, Mao L, Zhu Y, Kan XZ, et al. Transcriptome Sequences Resolve Deep Relationships of the Grape Family. PLoS ONE. 2013;8(9):e74394. pmid:24069307
  49. 49. Johnson BR, Borowiec ML, Chiu JC, Lee EK, Atallah J, Ward PS. Phylogenomics Resolves Evolutionary Relationships among Ants, Bees, and Wasps. Current Biology. 2013;23(20):2058–2062. pmid:24094856
  50. 50. Brewer MS, Bond JE. Ordinal-Level Phylogenomics of the Arthropod Class Diplopoda (Millipedes) Based on an Analysis of 221 Nuclear Protein-Coding Loci Generated Using Next-Generation Sequence Analyses. PLoS ONE. 2013;8(11):e79935. pmid:24236165
  51. 51. Yang Y, Smith SA. Orthology Inference in Nonmodel Organisms Using Transcriptomes and Low-Coverage Genomes: Improving Accuracy and Matrix Occupancy for Phylogenomics. Molecular Biology and Evolution. 2014;31(11):3081–3092. pmid:25158799
  52. 52. Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30(9):1312–1313. pmid:24451623
  53. 53. Jarvis E, Mirarab S, Aberer A, Houde P, Li C, Ho S, et al. Phylogenomic analyses data of the avian phylogenomics project. GigaScience. 2014;4(1):4.
  54. 54. Anisimova M, Gascuel O. Approximate Likelihood-Ratio Test for Branches: A Fast, Accurate, and Powerful Alternative. Systematic Biology. 2006;55(4):539–552. pmid:16785212
  55. 55. Muse S, Weir BS. Testing for equality of evolutionary rates. Genetics. 1992;132(1):269–276. pmid:1398060
  56. 56. Swofford DL. PAUP*: Phylogenetic Analysis Using Parsimony (* and other methods) 4.0.b5; 2001.
  57. 57. Fletcher W, Yang Z. INDELible: a flexible simulator of biological sequence evolution. Molecular Biology and Evolution. 2009;26(8):1879–1888. pmid:19423664
  58. 58. Ho SYW, Duchêne S, Duchêne D. Simulating and detecting autocorrelation of molecular evolutionary rates among lineages. Molecular Ecology Resources. 2015;15(4):688–696. pmid:25155426