
The authors have declared that no competing interests exist.

Conceived and designed the experiments: AMW DMH. Performed the experiments: AMW. Analyzed the data: AMW DMH. Contributed reagents/materials/analysis tools: AMW DMH. Wrote the paper: AMW DMH.

Despite the introduction of likelihood-based methods for estimating phylogenetic trees from phenotypic data, parsimony remains the most widely used optimality criterion for building trees from discrete morphological data. However, it has been known for decades that there are regions of solution space in which parsimony is a poor estimator of tree topology. Numerous software implementations of likelihood-based models for the estimation of phylogeny from discrete morphological data exist, especially for the Mk model of discrete character evolution. Here we explore the efficacy of Bayesian estimation of phylogeny, using the Mk model, under conditions that are commonly encountered in paleontological studies. Using simulated data, we describe the relative performance of parsimony and the Mk model under a range of realistic conditions that include common scenarios of missing data and rate heterogeneity.

For many decades, parsimony methods have been the most widely used approaches for estimation of phylogeny from discrete phenotypic data, despite the availability of likelihood-based methods for phylogenetic analysis. Maximum likelihood and Bayesian methods are commonly used in data sets combining molecules and morphology

At present, the most widely implemented model (in both pure likelihood and Bayesian contexts) for estimating phylogenetic trees from discrete phenotypic data is the Mk model proposed by Lewis
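As a point of reference, the transition probabilities of a k-state Mk model can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in any of the software discussed here; the function name `mk_transition_prob` is hypothetical, and the sketch assumes equal rates between all states with branch lengths measured in expected changes per character.

```python
import math


def mk_transition_prob(k, t, same_state):
    """Transition probability under a k-state Mk model (illustrative sketch).

    k          -- number of character states (assumed >= 2)
    t          -- branch length in expected changes per character
    same_state -- True for the probability of ending in the starting state,
                  False for the probability of ending in one particular
                  different state
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    # With all off-diagonal rates equal to 1 / (k - 1), one unit of branch
    # length corresponds to one expected change per character.
    decay = math.exp(-k * t / (k - 1))
    if same_state:
        return 1.0 / k + (k - 1) / k * decay
    return 1.0 / k - decay / k
```

On a very long branch both probabilities approach 1/k, which is the sense in which the model accommodates superimposed changes that parsimony cannot count.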

Sampled characters within data sets typically evolve under different rates, developmental processes, and modes of evolution

The ability to estimate branch lengths in numbers of changes per site or character is also useful for estimating divergence times. The Mk model, for example, is implemented in the software packages BEAST

Though there are many positive aspects of the Mk model (statistical consistency, ability to accept superimposed changes, explicit modeling of rate heterogeneity with a gamma distribution), paleontologists have been slow to adopt model-based approaches. Comparisons between the Mk model and parsimony analyses have provided interesting and illuminating results. For example, Xu et al.

Here, we investigate the relative performance of parsimony and Bayesian analyses using the Mk model, under a variety of conditions applicable to paleontological investigations. We based simulations on empirically estimated trees so that we could sample realistic branch lengths and tree topologies. We then designed the simulations to investigate a range of factors associated with accuracy of phylogenetic estimation, including missing data, rate heterogeneity, and overall character change rate.

To investigate the efficacy of the Mk model for phylogenetic estimation, we simulated data sets in the R package GEIGER

We simulated data sets of two sizes. The first comprised 350 characters, a number representative of phenotypic data sets, as many published data sets are this size or smaller. We also simulated larger data sets of 1000 characters to investigate the effects of character sample size. The empirical tree along which data were simulated was based on the tree presented by Pyron

This tree was obtained from a combined molecular–phenotypic data set analyzed by Pyron

Phenotypic data are often filtered by an observer-defined scheme. Characters that do not vary, or that vary in a parsimony-uninformative way (such as autapomorphies), are usually excluded from analysis. In contrast to molecular sequence data, this means that there are rarely invariant sites in paleontological data sets. This bias can inflate the estimated rate of evolutionary change in the data set, increasing the estimated branch lengths on the tree
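The filtering described above can be illustrated with a short Python sketch. It is an assumption-laden example, not the scripts distributed with this paper: the function names, the dictionary-based matrix layout, and the treatment of `?` and `-` as missing are all hypothetical choices made for illustration. A character is treated as parsimony-informative when at least two states are each observed in at least two taxa.

```python
from collections import Counter

# Symbols treated as missing or inapplicable (an assumption for this sketch).
MISSING = {"?", "-"}


def classify_character(column):
    """Label one character column as constant, uninformative, or informative."""
    counts = Counter(state for state in column if state not in MISSING)
    if len(counts) <= 1:
        return "constant"
    # Count the states that are shared by two or more taxa.
    shared = sum(1 for n in counts.values() if n >= 2)
    return "informative" if shared >= 2 else "uninformative"


def filter_characters(matrix, keep=("informative",)):
    """Return the indices of characters whose class is listed in `keep`.

    matrix -- dict mapping taxon name to a list of character states
    """
    rows = list(matrix.values())
    n_chars = len(rows[0])
    return [
        i
        for i in range(n_chars)
        if classify_character([row[i] for row in rows]) in keep
    ]
```

Passing `keep=("informative", "uninformative")` retains every variable character, which corresponds to a scheme in which only invariant characters are excluded.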

Each character filtration scheme was parameterized appropriately in MrBayes. We did not explore the effects of model misspecification or incorrectly accounting for acquisition bias in this study. Data files can be found in the online supporting material, along with scripts for assembling MrBayes and PAUP blocks.

To assess the effects of missing data on phylogenetic estimation, we used several schemes for character deletion. We sorted the characters by rate of change, and divided them into three categories: fast-, intermediate-, and slow-evolving sites. Within each class of sites, we created data sets in which we removed between 10% and 100% of sites to investigate the effects of underrepresentation of certain classes of characters. Missing data were concentrated in fossil taxa, as seen in

Columns represent characters. In the taxon-names column, an asterisk represents fossil taxa. Characters with the slowest rate of change are represented in light grey; intermediate-rate characters are represented in medium grey; characters with highest rate of change are represented in dark grey. In the top matrix, all characters are present for all taxa. The bottom matrices illustrate the missing data conditions that we simulated in this paper.
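The deletion scheme can be sketched as follows. This is a hedged illustration of the general idea (sort characters by rate, split them into three classes, and replace a chosen fraction of one class with missing data in fossil taxa only), not the code used to generate the data sets in this study. The function names, the equal-thirds split, the random choice of which characters to delete, and the use of `?` for missing data are assumptions.

```python
import random


def rate_classes(rates):
    """Split character indices into slow, intermediate, and fast thirds."""
    order = sorted(range(len(rates)), key=lambda i: rates[i])
    third = len(order) // 3
    return {
        "slow": order[:third],
        "intermediate": order[third:2 * third],
        "fast": order[2 * third:],
    }


def delete_in_fossils(matrix, rates, fossils, rate_class, fraction, seed=0):
    """Replace a fraction of one rate class with '?' in fossil taxa only.

    matrix     -- dict mapping taxon name to a list of character states
    rates      -- per-character rates of change, in matrix column order
    fossils    -- set of taxon names treated as fossils
    rate_class -- "slow", "intermediate", or "fast"
    fraction   -- proportion (0 to 1) of that class to delete
    """
    rng = random.Random(seed)
    members = rate_classes(rates)[rate_class]
    n_remove = round(fraction * len(members))
    removed = set(rng.sample(members, n_remove))
    result = {}
    for taxon, states in matrix.items():
        if taxon in fossils:
            result[taxon] = ["?" if i in removed else s for i, s in enumerate(states)]
        else:
            # Extant taxa keep complete data.
            result[taxon] = list(states)
    return result
```

Calling the function repeatedly with `fraction` values from 0.1 to 1.0 for each rate class would produce a series of matrices analogous to the missing-data conditions described above.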

We estimated Bayesian phylogenetic trees in MrBayes 3.2.2

We used PAUP* for parsimony analyses. In PAUP*

There are many ways to categorize how well a tree has been estimated. Given that these data were simulated under a tree, we can compare the estimated phylogenetic trees to the true phylogenetic tree. We used a script written in Python, making use of the Dendropy library

In a Bayesian analysis, the posterior sample of trees does not consist of equally optimal solutions. Instead, each tree in the sample typically has a different likelihood score. A majority-rule consensus tree can be used to summarize the variation across the posterior sample, and this consensus tree is often taken as a summary estimate of the phylogeny. Therefore, we used the symmetric distance from the majority-rule consensus tree of the posterior sample to the model tree to evaluate the performance of the Bayesian analyses. In contrast, under the parsimony criterion, equally parsimonious trees are each considered optimal alternative solutions. Therefore, in parsimony analyses, we calculated the symmetric distance from each equally parsimonious solution to the model tree, and then averaged these scores within each data set to obtain an average symmetric distance score. We also used a majority-rule consensus tree to evaluate the parsimony analyses, and found the results were almost identical with the two measures (
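The symmetric distance counts the bipartitions found in one tree but not the other. The sketch below is a dependency-free Python illustration of that calculation and of the averaging applied to sets of equally parsimonious trees. It is not the Dendropy-based script used in this study: trees are represented as nested tuples of taxon names, they are compared as unrooted topologies, and all function names are hypothetical.

```python
def leaf_set(tree):
    """Return the set of taxon names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Return the nontrivial bipartitions of a tree, treated as unrooted."""
    taxa = leaf_set(tree)
    ref = min(taxa)  # reference taxon used to orient every bipartition
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        # Store the side of the bipartition that excludes the reference taxon.
        side = taxa - clade if ref in clade else clade
        if 2 <= len(side) <= len(taxa) - 2:
            found.add(side)
        return clade

    visit(tree)
    return found


def symmetric_distance(tree_a, tree_b):
    """Number of bipartitions present in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must contain the same taxa")
    return len(splits(tree_a) ^ splits(tree_b))


def mean_symmetric_distance(trees, model_tree):
    """Average distance from a set of trees to the model tree."""
    return sum(symmetric_distance(t, model_tree) for t in trees) / len(trees)
```

For a Bayesian analysis, `symmetric_distance` would be applied once to the majority-rule consensus tree; for a parsimony analysis, `mean_symmetric_distance` would be applied to the full set of equally parsimonious trees. An unresolved consensus tree simply contributes fewer bipartitions to the comparison.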

Sampling bias does not affect Bayesian estimation when appropriate corrections are implemented. Correcting for ascertainment bias in MrBayes

As seen in

Bayesian-Mk outperforms parsimony most strongly when the rate of character evolution (and hence homoplasy) is high.

As the amount of missing data increases in these data sets, the amount of error also increases. With 75% of data missing, as seen on

In data sets with rate heterogeneity among the characters, the Mk model continues to outperform parsimony, as shown in

Note that, unlike

This figure compares the effect of deleting one-third of the characters from three different rate classes. (A) Comparisons of Bayesian-Mk analyses. (B) Comparisons of parsimony analyses.

Both Bayesian-Mk and parsimony analyses show degraded performance when characters of different rate classes are removed from the analysis, although the negative effects of missing data are much greater for parsimony than for the Bayesian analyses (especially for deletion of the slowest-evolving characters). Part of this effect is related to the reduction in the overall number of characters available for analysis. Increasing the total number of characters improves the performance of both Bayesian and parsimony analyses, although the Bayesian analyses continue to exhibit higher accuracy than parsimony in the 1000-character analyses (

Our results suggest that Bayesian methods of analysis are likely to exhibit lower error rates compared to parsimony analyses in phylogenetic analyses of morphological and paleontological data sets. Moreover, researchers should carefully consider character-sampling design, as error rates can increase if characters are evolving too rapidly (

However, it is unlikely that empirical data sets will have only one rate of evolution across the whole data set. Rather, they are likely to be made up of characters that have been subjected to different selective pressures, different developmental constraints, and different evolutionary processes

Increasing the size of the data set improves estimation for both parsimony and Bayesian methods. However, even in large data sets with no missing data, the Bayesian analyses using a simple likelihood model of character change typically outperform parsimony analyses (

The benefits of adding fossil taxa to a data set are numerous. Earlier research has argued that fossil taxa can alleviate the issue of long-branch attraction (LBA), particularly when additional extant taxa cannot be added to break up long branches

In addition to exhibiting lower error rates, model-based methods offer another important advantage over parsimony: the ability to estimate time based on branch lengths of the phylogenetic tree. The Mk model, for example, is implemented in the software packages BEAST

Our results demonstrate that Bayesian methods are more accurate than parsimony for estimating trees from discrete morphological data under a wide set of realistic conditions. Even when there are large amounts of missing data (as is common in paleontological studies), a simple likelihood model consistently produces less error in tree estimation compared to parsimony. Although there is considerable room for models of morphological character evolution to be improved, even simple model-based methods can result in considerable improvement of phylogenetic analyses of morphological data sets.


We thank Craig Dupree and Ming Cheng for technical support on this project. The Texas Advanced Computing Center (TACC) at The University of Texas at Austin provided HPC resources in support of the research results reported within this paper. Luke Harmon and Joseph Brown also provided very useful discussion on this project in its early stages, and Nicholas Matzke and Paul Lewis made numerous helpful suggestions in reviews of this manuscript.