Analyzed the data: HK NN YK TO TW YO. Wrote the paper: HK PW YK. Other: Developed the initial theory and developed the simulations: HK YK. Developed the simulations, added biological/theoretical interpretations and revised the manuscript: PW. Developed the Fortran programs: NN TO TW YO.

The authors have declared that no competing interests exist.

Molecular studies have reported divergence times of modern placental orders long before the Cretaceous–Tertiary boundary and far older than paleontological data. However, this discrepancy may not be real, but rather appear because of the violation of implicit assumptions in the estimation procedures, such as non-gradual change of evolutionary rate and failure to correct for convergent evolution.

New procedures for divergence-time estimation robust to abrupt changes in the rate of molecular evolution are described. We used a variant of the multidimensional vector space (MVS) procedure to take account of possible convergent evolution. Numerical simulations of abrupt rate change and convergent evolution showed good performance of the new procedures in contrast to current methods. Application to complete mitochondrial genomes identified marked rate accelerations and decelerations, which are not obtained with current methods. The root of placental mammals is estimated to be ∼18 million years more recent than when assuming a log Brownian motion model. Correcting the pairwise distances for convergent evolution using MVS lowers the age of the root about another 20 million years compared to using standard maximum likelihood tree branch lengths. These two procedures combined revise the root time of placental mammals from around 122 million years ago to close to 84 million years ago. As a result, the estimated distribution of molecular divergence times is broadly consistent with quantitative analysis of the North American fossil record and traditional morphological views.

By including the dual effects of abrupt rate change and directly accounting for convergent evolution at the molecular level, these estimates provide congruence between the molecular results, paleontological analyses and morphological expectations. The programs developed here are provided along with sample data that reproduce the results of this study and are especially applicable to studies using genome-scale sequence lengths.

Despite great progress over the past decade, the evolutionary history of placental mammals remains controversial. While a consensus is emerging on the topology of the evolutionary tree

In contrast, molecular studies have suggested markedly older origins for many superordinal groups and that some extant orders diversified before the K–T boundary. The root of living placental mammals has been reported to be in the range of 100–140 mya

The strength of molecular divergence time studies is their potential to draw information from very long aligned sequences of many species. It is widely assumed that a huge amount of sequence data and the approximate rate constancy of sequence evolution

However, molecular studies acknowledge the problem of misspecification of the model of sequence change, which may result in seriously biased estimation. The relationships among placental orders do vary according to the data used and the taxa sampled

A further problem facing molecular dating is the evolutionary rate constancy, or lack of constancy, over long periods. Since its proposal in 1965

To improve the detection of, and robustness to, abrupt rate changes, we have developed a new procedure that minimizes the local variability of the inverse of the evolutionary rate. Just as the effective size of fluctuating populations is represented by the harmonic mean over time, the mean evolutionary rate among lineages is expressed better by the harmonic mean. This approach is especially useful when branch lengths measured in the expected number of substitutions per site (the products of rates and times) are estimated accurately, and there are either rapid transient changes of rate (hence large rate heterogeneity) or a general bias towards a speed up or slow down in rates through time.

Using this new procedure, an analysis of 69 mitochondrial protein sequences (3660 amino acid sites in total) from placental mammals identified a rapid acceleration of evolutionary rate for the lineage directly leading to the common ancestor of Supraprimates and an even more marked one for the lineage leading to Laurasiatheria. This acceleration was followed closely by a strong deceleration, which persisted in nearly all lineages of Laurasiatheria. In contrast, almost all lineages of Afrotheria and Xenarthra seem to have retained rates similar to that of the root. This view is in marked contrast to that obtained with current rate-change penalty functions. The robustness of the new procedure is assessed using simulations that show the types of change that most concern biologists: speedups or slowdowns through time, transient rate changes, and rate changes that do not follow a normal or transformed normal distribution, as well as stochastic variation. A revised estimate of the origin of placental mammals is as young as 84 mya, which is much more recent than current estimates using molecular data. The inferred ages of deeper splits in the placental tree are compared with the rate of occurrence of new species from the North American fossil record. These two sources of data are far more congruent than is suggested by using current, possibly strongly misleading, dating methods.

Mitochondrial protein sequences are used widely in phylogenetic studies and have been particularly popular in studying placental mammals. A desirable feature of these data is relatively long sequences, good taxon sampling and very little missing data. Following alignment, we retained 3660 amino acid sites present in all of 62 placental mammals plus seven outgroup taxa (

We adopted a standard two-step procedure to estimate divergence times. The first step is to estimate the phylogenetic tree with unconstrained branch lengths in units of expected numbers of substitutions per site. Given the problems with convergent evolution in mitochondrial data

Three cost functions, F_{ADD}, F_{LOG}, and F_{IR}, were applied to the MVS and ML trees. These functions penalize the fluctuation of rates on a linear scale (here called the ADD function), on the log rate (LOG), or on the inverse rate (IR), respectively (see equations (1), (2), and (4)). The use of MVS branch lengths with F_{ADD} is denoted MVS-F_{ADD}, and so on (while ML-F_{ADD} indicates use of ML branch lengths).
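The three penalty scales can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `rate_penalties` is hypothetical, the penalties are plain unweighted sums of squared differences between ancestral and descendant rates, and the exact forms and weights of equations (1), (2), and (4) are not reproduced here.

```python
import math


def rate_penalties(pairs):
    """Unweighted ADD, LOG and IR penalties (illustrative assumption only).

    pairs: iterable of (ancestral_rate, descendant_rate) tuples.
    """
    pairs = list(pairs)
    # Linear scale: squared difference of the rates themselves.
    f_add = sum((rd - ra) ** 2 for ra, rd in pairs)
    # Log scale: squared difference of the log rates.
    f_log = sum((math.log(rd) - math.log(ra)) ** 2 for ra, rd in pairs)
    # Inverse scale: squared difference of the inverse rates.
    f_ir = sum((1.0 / rd - 1.0 / ra) ** 2 for ra, rd in pairs)
    return f_add, f_log, f_ir


# A strict clock versus a transient tenfold burst that returns to the base rate.
clock = [(1.0, 1.0), (1.0, 1.0)]
burst = [(1.0, 10.0), (10.0, 1.0)]

print("clock:", rate_penalties(clock))
print("burst:", rate_penalties(burst))
```

The three values are on different scales, so they are not directly comparable with each other; the point is only that the same tenfold burst contributes very differently depending on the scale on which rate change is measured.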

To calibrate these trees, we used eight fossil constraints, all taken from previous studies (see

A and B show the MVS-F_{IR} and ML-F_{LOG} trees, respectively. The numbers 1–61 denote the ancestral nodes. The red numbers 1–8 indicate the internal nodes with fossil constraints, which are as follows: 1, 49–61; 2, 52–58; 3, 45–63; 4, 43–60; 5, <63; 6, >12; 7, 36–55; 8, 54–65 (all in mya). Branch colors indicate evolutionary rates (in units of 10^{−9}/site per year) as follows: black, <0.2; dark blue, 0.2–0.3; light blue, 0.3–0.4; green, 0.4–0.5; brown, 0.5–0.6; yellow, 0.6–0.7; and red, >0.7.

The MVS-F_{IR} analysis (_{L} analysis (_{L} is giving results consistent with that reported in earlier studies _{IR} (see below) is due to the method and not the data. The difference between the F_{LOG} and the F'_{LOG} function in _{LOG} method that most closely approximates the Brownian motion assumed by Multidivtime.

Tree model | F_{IR} | F_{ADD} | F_{LOG} | F'_{LOG} | F_{ADD}(r8s) | F'_{LOG}(r8s) |
MVS | 84.2 (80.7, 88.4) | Infinity | 105.0 (97.0, 117.0) | 91.8 (84.5, 104.7) | 160.5 | 91.1 |
ML | 106.4 (102.4, 110.6) | Infinity | 122.2 (112.2, 144.7) | 112.0 (102.0, 121.2) | 122.2 | 111.6 |

The age of the root using MVS and ML branch lengths was estimated after minimizing various penalty functions with the same fossil constraints. All times are in mya, and 95% confidence intervals were estimated by the sum-of-squares method described in

_{IR} tree identified an abrupt acceleration of evolutionary rate near the common ancestor of both Supraprimates and Laurasiatheria, then a very strong acceleration in just the ancestral lineage of Laurasiatheria (

Figures A and B, respectively, trace the estimated rates along edges in analyses using MVS and ML branch lengths. The root time of the F_{ADD} analysis in Figures A and B was set at a large value (400 mya) because the numerical calculation continues towards an infinite root time.

The cost function F_{IR} detected an acceleration–deceleration pattern near the base of Laurasiatheria using both the MVS and ML branch lengths (the red lines of _{LOG} showed a far more flat prediction of generally lower evolutionary rates that led to older root times (the green lines of _{ADD} inferred gradually decreasing rates in all deep branches of the MVS and ML trees (the blue lines of

_{ADD} and F′_{LOG} (equation (3) in

_{IR} tree (

The blue squares show the rate of the appearance of new species based on the fossil record _{IR} tree (_{LOG} tree (

The MVS-F_{IR} analysis reconciles molecular with fossil data in two ways. First, the chronological distribution of the internal node density is clearly largely consistent with the rate of appearance of novel fossil species. This is despite the fact that the paleontological data assessed are limited to the well-studied North American record _{IR} tree and the fossil record suggest accelerating taxonomic diversity near the K–T boundary, rather than a longer slower buildup _{LOG} criterion seems to suggest a prolonged increase in diversity that agrees far less with the quantitative fossil record. This congruence with fossils does not automatically show that the combination of MVS and F_{IR} is the best way to analyze this data, but it does reframe the discussion of the relationship between the fossil and molecular times into one where the molecular dates are being looked at far more critically.

Second, and more indicatively, the cost function F_{IR} resolves incongruence among the fossil constraints/inferred divergence times in different parts of the molecular tree. The best fossil constraints in Laurasiatheria suggest much older times than constraints in other parts of the tree _{IR}, MVS-F_{LOG}, and MVS-F_{ADD} trees with all constraints give the root as 84.2, 105.0, and ∞ mya, respectively (_{IR} tree is insensitive to “constraint sampling”.

Finally, there is a good fossil calibration for tarsier _{IR} tree (

We highlight two distinct properties of the new function F_{IR} with the help of evolutionary simulations and worked examples. The first property is its ability to detect a transient acceleration of evolutionary rate. Such an effect might be caused by a burst of positive selection and/or a bottleneck in population size. The second is to assess the effects of both stochastic fluctuations and systematic bias on the robustness of estimated times. Here, we model bias in the form of either a general slowdown or a general acceleration of evolutionary rate across the whole tree.

We first modeled a strong instantaneous acceleration as an analogue to what is inferred by F_{IR} to have occurred on the ancestral branch leading to Laurasiatheria. We simulated a 32-taxon symmetric tree in which a molecular clock holds except for an abrupt elevation (by a factor of 10) of evolutionary rate along a short internal branch (the red line in _{IR} function and something similar appears on both the MVS-F_{IR} and ML-F_{IR} mitochondrial trees. The times at the internal nodes were set to 48, 56, 64, and 72 mya, and the root time was set to 80 mya. On this weighted tree, the cost functions were minimized with two constrained node times, which corresponded to two fossil calibration points. Only the F_{IR} function accurately estimated the true divergence times and appeared robust against the abrupt change (_{LOG} and F_{ADD} inferred gradual rate changes (

In this worked example using a symmetric 32-taxon tree (Figure A), a global molecular clock holds, except for a short-term increase in evolutionary rate along one branch (the red line in Figure A). The true root time was set to 80 mya, and the times at the internal nodes are 48, 56, 64, and 72 mya. The deep internal branch (the red line in Figure A) is given an evolutionary rate ten times that of the remaining edges. The various cost functions were minimized subject to two calibrated nodes (the red numbers 1 and 2 in Figure A), using the exact branch lengths of this example as input data. The cost functions F_{IR}, F_{LOG}, and F_{ADD} inferred the weighted trees of Figures B, C and D, respectively. Figures E–G show the trace of evolutionary rates along the lineages from the root to taxa numbers 5, 9, and 25 of Figure A, respectively. The inferred age of the root for each cost function is shown with an arrow. The function F_{IR} recovered the original pattern of rate change, whereas the other two functions inferred far more gradual changes, which resulted in a substantial overestimation of the root time.
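The input to such a worked example can be sketched for a single root-to-tip lineage. This is an illustration under stated assumptions, not the authors' Fortran code: the base rate of 1.0 is an arbitrary unit, and the position of the fast branch (`fast_branch`, here the second branch from the root) is hypothetical, because the text only says that a short deep internal branch carries the tenfold rate.

```python
# Node times (mya) along one lineage: root, four internal nodes, and the tip.
node_times = [80.0, 72.0, 64.0, 56.0, 48.0, 0.0]

base_rate = 1.0   # arbitrary rate unit (assumption)
fast_branch = 1   # hypothetical index of the branch with the tenfold rate

# Duration of each branch is the difference between successive node times.
durations = [older - younger for older, younger in zip(node_times, node_times[1:])]

# A clock holds everywhere except on the single fast branch.
rates = [base_rate * (10.0 if i == fast_branch else 1.0) for i in range(len(durations))]

# Branch lengths are the products of rate and time; only these reach the dating step.
lengths = [rate * duration for rate, duration in zip(rates, durations)]

print("durations:", durations)
print("branch lengths:", lengths)
print("root-to-tip path length:", sum(lengths))
```

Because only the products of rate and time are observed, a dating method that spreads the extra length of the fast branch over neighboring branches as a gradual rate change will tend to push the root back in time.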

True value | F_{IR} | F_{ADD} | F_{LOG} | F'_{LOG} | F_{ADD}(r8s) | F'_{LOG}(r8s) |
80 | 81.3 | 161.1 | 114.5 | 101.7 | 160.5 | 108.7 |

These estimates are from a worked example with an abrupt acceleration then deceleration (the red line of

We next simulated stochastic rate fluctuations by themselves, plus either a prevailing slowing down or acceleration of rates through time. Such simulations are distinct from a Brownian-type process, and are used to gauge the general robustness of the functions. To impose rate fluctuations on the same basic tree as _{IR} gave reasonable estimates with a bias towards deceleration, whereas F_{ADD} gave an infinite root time in 139 of 600 samples. These undefined root-time values were set arbitrarily to 200 mya before calculating the mean and standard error.

Rate range | F_{IR} | F_{LOG} | F_{ADD} |
0.25–1.0 | 111.3±11.1 | 125.1±12.7 | 172.0±33.9 |
0.5–1.5 | 100.7±4.4 | 103.3±3.9 | 103.6±5.1 |
1.0–1.75 | 93.2±2.0 | 94.0±2.1 | 94.3±2.4 |

In these simulations, a branch of the tree is selected randomly with probability proportional to its true duration in time. A rate adjustment (change) factor is then chosen randomly from a uniform distribution and all its descendant branch lengths are multiplied by this factor. A total of 25 such random rate changes were placed on the tree, then the branch lengths (measured in the product of rate and time) of the weighted tree were passed on to the time estimation procedures. The whole procedure was repeated 600 times to obtain the average root time and standard error. Three different ranges were used for the uniform distribution of rate changes. The first has a range of 0.25 to 1 and represents a strong persistent bias towards rate deceleration. A range of 0.5 to 1.5 gives minimal rate change bias, but retains stochastic fluctuations. Finally, a strong acceleration effect is achieved by the use of the range 1.0–1.75. The function F_{ADD} gave an age of the root tending to infinity on 139 of the weighted trees simulated under the deceleration model. In such cases the root time is set to 200 mya in order to allow the mean and standard error of this cost function to be calculated.
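The rate-change simulation described here can be sketched as follows. This is a hedged illustration rather than the authors' program: the function `simulate_branch_lengths`, the heap-style node indexing, and the base rate of 1.0 are assumptions, and the factor is applied to the selected branch as well as to its descendants, which is one reading of the description.

```python
import random

# Node times (mya) by depth in a symmetric 32-taxon tree: root at 80, tips at 0.
NODE_TIMES = [80.0, 72.0, 64.0, 56.0, 48.0, 0.0]


def depth(node):
    """Depth of a node in heap indexing (root = 0, children 2i+1 and 2i+2)."""
    return (node + 1).bit_length() - 1


def simulate_branch_lengths(low, high, n_changes=25, seed=0):
    """Place random autocorrelated rate changes on the tree (sketch).

    Returns a dict mapping each non-root node to the length (rate x time)
    of the branch above it.
    """
    rng = random.Random(seed)
    n_nodes = 2 ** len(NODE_TIMES) - 1          # 63 nodes, 32 of them tips
    branches = list(range(1, n_nodes))           # one branch above each non-root node
    duration = {i: NODE_TIMES[depth(i) - 1] - NODE_TIMES[depth(i)] for i in branches}
    rate = {i: 1.0 for i in branches}            # clock-like start (assumed unit rate)
    weights = [duration[i] for i in branches]

    for _ in range(n_changes):
        # Pick a branch with probability proportional to its duration.
        start = rng.choices(branches, weights=weights)[0]
        factor = rng.uniform(low, high)
        # Propagate the factor to the chosen branch and everything below it.
        stack = [start]
        while stack:
            node = stack.pop()
            rate[node] *= factor
            for child in (2 * node + 1, 2 * node + 2):
                if child < n_nodes:
                    stack.append(child)

    return {i: rate[i] * duration[i] for i in branches}


decelerating = simulate_branch_lengths(0.25, 1.0)
print(len(decelerating), "branches; total length:", round(sum(decelerating.values()), 2))
```

The resulting branch lengths would then be handed to each cost function, and the whole procedure repeated with different seeds to obtain a mean and spread of the estimated root time.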

The MVS model was shown previously to recover the correct tree in a simulation with two strongly convergent lineages ^{4} amino acid sites, following the tree depicted in

Amino acid substitutions were evolved on the shown weighted tree using the JTT model

Pairwise distances were then estimated from the terminal sequences obtained above using the same JTT model. A modified MVS core-set procedure

The branch lengths recovered by the MVS model reproduced the true values accurately (

The tree topology inferred by these methods was identical to the tree that generated the data, so the estimated branch lengths are plotted against their true values. The blue numbers show the branch index, as used on

We begin the discussion by examining why the current cost functions, F_{LOG} and F_{ADD}, overestimated the age of the root in the worked example with strong rate heterogeneity in the form of a short term highly elevated evolutionary rate. It is also important to examine the profile of the cost function around their minimal values for the age of the root. The functions F_{LOG} and F_{ADD} showed asymmetric behavior around the estimated root time, even in worked examples with a perfect molecular clock (data not shown), whereas the F_{IR} profile was symmetric and parabolic in shape. The asymmetry seen with F_{LOG} and F_{ADD} increased in response to a general bias towards deceleration of rates through time (_{ADD} showed a monotonic decrease with respect to increasing age of the root, that is, its estimate tended to infinity. Thus, asymmetric behavior of a cost function seems to be a symptom of unstable estimation of the age of the root. A similar strong asymmetry appeared in the profile of the current cost function with respect to the age of the root of the mitochondrial tree of placental mammals (

This figure shows the profile of the cost function with respect to the age of the root estimated by three different cost functions. Because the root age estimated by function F_{ADD} was going to infinity, it was set to 200 mya for illustrative purposes. Figure A is a single example from a tree simulated under the scenario of random auto-correlated changes of rate moving towards the tips, strongly biased towards a deceleration of evolutionary rates as time progresses (from the set of simulations used for

Other approaches to divergence time estimation are under active development. For example, Drummond et al.

Our refined approaches resolve apparent contradictions between the quantitative molecular and paleontological data of placental mammals. Given such agreement, there is no need for ancillary hypotheses such as the long fuse model _{IR} cost function developed in this paper will provide an improved methodology for a wide range of molecular studies because its robustness to rapid fluctuations of rate is essential to understanding events such as adaptive evolution. For example, because acquisition of new molecular functions can be achieved in a few million years after gene duplications

Because the branch lengths of a phylogenetic tree are estimated from the data as the product of the evolutionary rate of a branch and its time duration, these two factors cannot be directly estimated separately. It has become standard to use loose constraints to accommodate uncertainty and it is often wise to exclude constraints if there is no firm basis for them. Information on the age of some internal nodes may be fairly directly available from the fossil record. An example of this is the horse-rhino split

The most widely used assumption in order to model the evolutionary rate changes away from being constant (or away from a clock) is to introduce a stochastic process. For example, Sanderson _{n}_{a}_{(n)} being from the _{n}^{1/2}σ}^{−1} exp{−(_{n}_{a}_{(n)})^{2}/2σ^{2}} implies that this function can be interpreted assumes _{n}

However, the estimated rates based on any of the above cost functions may overly smooth the change of evolutionary rates when a pronounced transient change of rate has occurred. In turn, biased estimates of evolutionary rates will lead to biased estimates of divergence times. Here we propose a new cost function, which penalizes the local rate deviation from the harmonic mean. For simplicity, we ignore the stochastic variance of the branch length B_{n} = r_{n}t_{n}, where r_{n} and t_{n} are the rate and time duration of branch n, and write w_{n} = t_{n}/B_{n} for the inverse rate. The local cost (F_{IR,n}) of two successive branches can be expressed as the variance of w_{n} and w_{a(n)} around their average value, \bar{w}_{n} = (w_{n}B_{n} + w_{a(n)}B_{a(n)})/(B_{n} + B_{a(n)}), which is equal to the inverse of the harmonic mean of r_{n} and r_{a(n)}, with the weights B_{n} and B_{a(n)}. That is, F_{IR,n} = {(w_{n} − \bar{w}_{n})^{2}B_{n} + (w_{a(n)} − \bar{w}_{n})^{2}B_{a(n)}}/(B_{n} + B_{a(n)}). Because F_{IR,n} is rewritten as F_{IR,n} = (w_{n} − w_{a(n)})^{2}B_{n}B_{a(n)}/(B_{n} + B_{a(n)})^{2}, the inverse-rate cost function, F_{IR}, is defined as the weighted average of (w_{n} − w_{a(n)})^{2} over all ancestor–descendant branch pairs in the tree (equation (4)).
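The local cost for one ancestor–descendant pair can be checked numerically. In this sketch the names are assumptions introduced for illustration: `b` stands for a branch length (rate times time), `t` for its duration, and the inverse rate is taken as t/b. The code computes the branch-length-weighted variance of the two inverse rates and compares it with the equivalent product form, (difference of inverse rates)^2 times the product of the two branch lengths divided by the square of their sum.

```python
def local_ir_cost(b_desc, t_desc, b_anc, t_anc):
    """Weighted variance of two inverse rates, in two equivalent forms (sketch)."""
    w_desc = t_desc / b_desc          # inverse rate of the descendant branch
    w_anc = t_anc / b_anc             # inverse rate of the ancestral branch
    total = b_desc + b_anc

    # Branch-length-weighted mean inverse rate; equals total time / total length,
    # i.e. the inverse of the weighted harmonic mean of the two rates.
    mean_w = (w_desc * b_desc + w_anc * b_anc) / total

    variance_form = ((w_desc - mean_w) ** 2 * b_desc
                     + (w_anc - mean_w) ** 2 * b_anc) / total
    product_form = (w_desc - w_anc) ** 2 * b_desc * b_anc / total ** 2
    return mean_w, variance_form, product_form


# Ancestral branch at ten times the rate of its descendant, both lasting 8 time units.
mean_w, variance_form, product_form = local_ir_cost(
    b_desc=8.0, t_desc=8.0, b_anc=80.0, t_anc=8.0
)
print(mean_w, variance_form, product_form)
```

The two forms agree up to floating-point rounding, which is the algebraic step that lets the cost be written in terms of the squared difference of inverse rates alone.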

Because F_{IR} places a smaller penalty on abrupt rate changes than do previous models, it can confine an abrupt change close to where it occurred on the tree. In contrast, F_{ADD} and F_{LOG} _{n}

At first glance, it seems most intuitive to use a penalty such as (rate_{1}−rate_{2})^{2}; this has been the implicit assumption until now. However, when traveling from one point to another at velocity v_{1}, then back at velocity v_{2}, the average velocity (v_{av}) is equal not to the standard arithmetic mean, but to the harmonic mean, 1/v_{av} = (1/v_{1}+1/v_{2})/2. Further, going from point (node) a to point b at velocity v_{1}, and from point b to point c at velocity v_{2}, then the distances traveled (d_{ab} and d_{bc}) may differ, and v_{av} = (d_{ab}+d_{bc})/(t_{1}+t_{2}), where t_{1} = d_{ab}/v_{1} and t_{2} = d_{bc}/v_{2}. As a result, 1/v_{av} = (d_{ab}/v_{1}+d_{bc}/v_{2})/(d_{ab}+d_{bc}), which is the distance-weighted average of the inverse velocities and corresponds to the branch-length-weighted average of inverse rates used in F_{IR} (equation 4). The use of the harmonic mean becomes important when v_{1} differs greatly from v_{2}.
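The travel analogy can be made concrete with a small numerical example. The function name `average_velocity` and the numbers are hypothetical; the calculation is simply total distance divided by total time.

```python
def average_velocity(distances, velocities):
    """Average velocity over consecutive legs: total distance / total time."""
    total_time = sum(d / v for d, v in zip(distances, velocities))
    return sum(distances) / total_time


# Out and back over the same distance at two different velocities.
there_and_back = average_velocity([60.0, 60.0], [30.0, 60.0])
arithmetic_mean = (30.0 + 60.0) / 2

print("average velocity:", there_and_back)      # harmonic mean of 30 and 60
print("arithmetic mean:", arithmetic_mean)

# Unequal legs: the inverse of the average is the distance-weighted mean of 1/v.
unequal = average_velocity([10.0, 90.0], [1.0, 10.0])
print("unequal legs:", unequal)
```

In the second case the slow leg covers only a tenth of the distance yet takes more than half of the total time, so the average velocity is far below the arithmetic mean of 1 and 10. The same asymmetry applies to rates on successive branches.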

The Bayesian approach estimates the divergence times and evolutionary rates in the form of a posterior distribution, which is summarized approximately as

In the MVS method, the additivity of evolutionary distances is converted using the expected orthogonality among branch vectors in a multidimensional Euclidean space

The first step of this approach is to divide the set of taxa into subgroups and to correct the distances between pairs in each subgroup. We decompose the taxa based on partitions with both strong biological support and high bootstrap support. The deviation from additivity of pairwise distances is much smaller within each subgroup than across the whole distance matrix. If an anomalously large deviation is observed associated with a single taxon within a subgroup, that taxon is removed temporarily from the analysis. When the only deviations between distances within a subset of taxa are judged to be due to stochastic noise, the pairwise distances are modified by solving the equation of motion (a method using a many body kinetic equation in physics), which uses the index of the deviation from additivity as the potential energy

The second step is a clustering of subgroups. The pairwise distances between two core-set groups are corrected by assuming attractions between them (that is, some distances are underestimated) and by minimal enlargement of the distances between the two groups to satisfy additivity. In this way, the core-set approach used in this article is in the direction opposite to that used in an earlier procedure that proceeds from the whole tree down to subtrees

Conceptually, the first step of MVS core set is to decompose the whole tree into putative monophyletic groups that show minimal bias internally. The approach developed as part of this particular study resembles

When analyzing the mitochondrial protein sequences, we estimated pairwise distances initially under the JTT+Gamma model with α = 0.5. Given moderately high bootstrap support, the placental tree was decomposed into four major groups: Laurasiatheria, Supraprimates, Xenarthra, and Afrotheria. Laurasiatheria was decomposed into five subgroups, namely Cetartiodactyla, Perissodactyla, Carnivora, Chiroptera, and Eulipotyphla. Supraprimates was decomposed into two subgroups: Primates and Glires. The core-set analysis began by generating additive distances within these subgroups, after which the clustering procedure described above was applied to connect these subgroups. In this analysis, we analyzed sequences of Xenarthra and Afrotheria together in a single core-set. This is because a relatively short internal branch separates them and Xenarthra contains but a single sequence in this analysis. Finally, we tried to modify the distances between placental mammals and the outgroups, but we could not determine the position of the root with confidence because of very strong attractions between many placental mammals and the outgroups. The consensus

Further details of the current MVS procedure of distance modification are documented in

We note that least-squares estimators can be interpreted as ML estimators when errors in the data follow a multi-normal distribution. Accordingly, the least-squares residual can be treated as twice the log-likelihood ratio for the three cost functions G_{J}_{root}) (_{J}_{root}) = _{J}_{root})/F_{J}_{M}_{J}_{root}) gives the minimum residual value with all internal node times reestimated, except for the root time _{root}, and it becomes truly minimal at _{root} = _{M}_{J}_{root}) = 3.84. Here, the minimum for criterion F_{ADD} was arbitrarily set to F_{ADD}(_{root}) at _{root} = 200 because the profile decreased monotonically for all greater values of the root time. In all evaluations, we avoid putting arbitrary constraints near the root of the tree (for example, that the rate at the root is the same as that of one of its descendants) because while arbitrary constraints may bound the minimum, they will also create unknown biases. Note that the MVS-F_{IR} profile is symmetric and parabolic around the estimated root time, whereas those of F_{LOG} and F_{ADD} are not. We see the same thing even in worked examples using a perfectly clock-like tree.
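Reading a 95% interval off a profile at the 3.84 threshold can be sketched as below. Everything specific here is an assumption for illustration: the function name `profile_interval`, the grid of root times, and the parabolic profile with its minimum at 100 are made up and are not results from this study. The sketch only shows the mechanics of locating where a profile crosses the threshold on either side of its minimum.

```python
def profile_interval(times, g_values, threshold=3.84):
    """Locate threshold crossings of a profile by linear interpolation (sketch).

    times and g_values are equal-length lists sampled on an increasing grid.
    Returns (lower, upper), or None if fewer than two crossings are found,
    as would happen for a monotonically decreasing profile.
    """
    crossings = []
    points = list(zip(times, g_values))
    for (t0, g0), (t1, g1) in zip(points, points[1:]):
        # A sign change of (g - threshold) marks a crossing inside the interval.
        if (g0 - threshold) * (g1 - threshold) < 0:
            crossings.append(t0 + (threshold - g0) * (t1 - t0) / (g1 - g0))
    if len(crossings) < 2:
        return None
    return min(crossings), max(crossings)


# Hypothetical symmetric, parabolic profile with its minimum at 100.
grid = [80.0 + 0.5 * k for k in range(81)]
profile = [((t - 100.0) / 5.0) ** 2 for t in grid]

print(profile_interval(grid, profile))

# A monotonically decreasing profile yields no bounded interval.
print(profile_interval(grid, [1000.0 / t for t in grid]))
```

A symmetric parabolic profile gives a bounded, roughly symmetric interval, whereas a profile that keeps decreasing as the root age increases never crosses the threshold on the upper side, which is why no finite interval can be reported in that case.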

Our novel way of estimating CIs jointly takes into account some, but not all, of the sources of error noted in a previous publication

Fortran source code and executable versions of the programs used in this study are downloadable from 〈

The procedure of MVS distance modification for placental mammals

(0.04 MB PDF)

The operation manual of programs for divergence time and MVS analyses

(0.04 MB PDF)

Trace of evolutionary rates along seven lineages in the MVS-F_{IR} and ML-F_{IR} analyses. The ML-F_{IR} analysis (B) showed a flatter peak rate than did the MVS-F_{IR} tree (A) (the black lines of F in the whale lineage) and inferred a longer period at a lower rate of evolution, producing an older root time. If lineages appear to merge with each other, zoom in to follow their exact path.

(0.68 MB TIF)

Sequences and accession numbers used in this paper. The aligned amino acid sequences of the mitochondrial proteins ND1-ND6, ND4L, CO1-CO3, CYTB, and ATP6 are concatenated, giving a total length of 3660 sites.

(0.09 MB DOC)

Divergence times of the MVS-F_{IR}, ML-F_{IR}, and ML-F_{LOG} analyses. The numbers 1–61 denote the ancestral nodes in

(0.09 MB DOC)

We thank Jeffrey L Thorne for his constructive criticism and theoretical insights.