Accurate Bayesian phylogenetic point estimation using a tree distribution parameterized by clade probabilities

doi:10.1371/journal.pcbi.1012789

Fig 1.

A CCD graph ((B), a forest network with clade split probabilities) based on a tree sample (A) smooths the probabilities across all trees it displays.

(A) Posterior sample of size seven consisting of three different trees sampled thrice, twice, and twice. Only the clades ABCDE and ABC are split in multiple ways. The resulting probabilities of the trees in the CCD1 are thus 9/49, 8/49, and 8/49. (B) Truncated CCD graph (cherry splits and singletons omitted) based on the sample trees above; it also displays the unsampled trees below. (C) Unsampled trees with CCD1 probabilities 12/49, 6/49, and 6/49, respectively.
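
The arithmetic in this caption can be reproduced with a small sketch. It assumes that a CCD1 assigns a tree the product, over its internal nodes, of the observed frequency of each clade split given its parent clade. The three topologies below are hypothetical: the figure itself is not reproduced here, so they were chosen only so that the sample (multiplicities 3, 2, 2; only ABCDE and ABC split in multiple ways) yields the probabilities stated in the caption. All function names are illustrative and not taken from any published implementation.

```python
from collections import Counter
from fractions import Fraction


def leaves(tree):
    """Return the leaf set of a nested-tuple tree (leaves are strings)."""
    if isinstance(tree, str):
        return frozenset([tree])
    left, right = tree
    return leaves(left) | leaves(right)


def splits(tree):
    """Yield (parent clade, clade split) for every internal node."""
    if isinstance(tree, str):
        return
    left, right = tree
    yield leaves(tree), frozenset([leaves(left), leaves(right)])
    yield from splits(left)
    yield from splits(right)


def build_ccd1(sample):
    """Count clades and clade splits over (tree, multiplicity) pairs."""
    clade_counts, split_counts = Counter(), Counter()
    for tree, n in sample:
        for clade, split in splits(tree):
            clade_counts[clade] += n
            split_counts[(clade, split)] += n
    return clade_counts, split_counts


def ccd1_probability(tree, clade_counts, split_counts):
    """Product of conditional clade split frequencies along the tree."""
    p = Fraction(1)
    for clade, split in splits(tree):
        if clade_counts[clade] == 0:  # clade never observed in the sample
            return Fraction(0)
        p *= Fraction(split_counts[(clade, split)], clade_counts[clade])
    return p


# Hypothetical sample matching the caption's counts (3, 2, 2).
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = (((("A", "C"), "B"), "D"), "E")
t3 = (((("B", "C"), "A"), "D"), "E")
sample = [(t1, 3), (t2, 2), (t3, 2)]
clade_counts, split_counts = build_ccd1(sample)

# Unsampled trees that recombine observed clade splits.
u1 = (((("A", "B"), "C"), "D"), "E")
u2 = ((("A", "C"), "B"), ("D", "E"))
u3 = ((("B", "C"), "A"), ("D", "E"))

for tree in (t1, t2, t3, u1, u2, u3):
    print(ccd1_probability(tree, clade_counts, split_counts))
# prints 9/49, 8/49, 8/49, 12/49, 6/49, 6/49 (one per line)
```

Under these assumptions the six probabilities sum to one, which shows how the CCD1 moves probability mass from the sampled trees to unsampled trees built from observed clade splits.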

Fig 2.

For this sample of trees, the CCD graphs of CCD0 and CCD1 differ because AB and CD can form an unobserved clade split.
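
A minimal sketch of this difference, assuming that the CCD1 graph contains only clade splits observed in the sample while the CCD0 graph contains every split of an observed clade into two observed clades. The two four-taxon trees are hypothetical stand-ins for the sample in the figure, and the helper names are illustrative.

```python
def leaves(tree):
    """Return the leaf set of a nested-tuple tree (leaves are strings)."""
    if isinstance(tree, str):
        return frozenset([tree])
    left, right = tree
    return leaves(left) | leaves(right)


def clades(tree):
    """Return all clades of a tree, including singletons and the root."""
    if isinstance(tree, str):
        return {leaves(tree)}
    left, right = tree
    return {leaves(tree)} | clades(left) | clades(right)


def observed_splits(tree):
    """Return the (parent clade, clade split) pairs present in a tree."""
    if isinstance(tree, str):
        return set()
    left, right = tree
    here = (leaves(tree), frozenset([leaves(left), leaves(right)]))
    return {here} | observed_splits(left) | observed_splits(right)


def ccd0_splits(observed_clades):
    """All ways to split an observed clade into two observed clades."""
    result = set()
    for parent in observed_clades:
        for child in observed_clades:
            other = parent - child
            # child must be a proper subset and its complement observed
            if child < parent and other in observed_clades:
                result.add((parent, frozenset([child, other])))
    return result


# Hypothetical sample: AB and CD are both observed clades,
# but ABCD is never split into AB and CD in any sampled tree.
sample = [((("A", "B"), "C"), "D"), ("A", ("B", ("C", "D")))]

observed_clades = set().union(*(clades(t) for t in sample))
ccd1 = set().union(*(observed_splits(t) for t in sample))
ccd0 = ccd0_splits(observed_clades)

extra = ccd0 - ccd1
for parent, split in extra:
    print(sorted(parent), sorted(sorted(c) for c in split))
# prints ['A', 'B', 'C', 'D'] [['A', 'B'], ['C', 'D']]
```

In this toy sample the only split present in the CCD0 graph but absent from the CCD1 graph is ABCD into AB and CD, so the CCD0 additionally displays the tree ((A,B),(C,D)).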

Fig 3.

Truncated extended CCD graph (cherry splits and singletons omitted, sibling clades in brackets) based on the sample trees from Fig 1A that represents a CCD2.

Note that it contains only the three sampled trees. (While the clade vertices might seem redundant here, in general they have higher in- and outdegree.)

Fig 4.

Heatmap showing the majority wins based on MAE with simulations in five entropy categories (higher means noisier/harder); more saturated colors mean a larger winning margin for the respective distribution (CCD0, CCD1, CCD2, or the sample distribution).

Fig 5.

Median MRE for trees and clades in the golden distribution per sample size for Yule20.

Trees are separated into 50% and 95% credible sets.

Fig 6.

Mean rank, in each of the other distributions, of the top tree (rank 1) of the golden distribution, per sample size for Yule20.

Fig 7.

To evaluate the precision of the distributions, we computed the mean of the mean absolute differences between the tree probabilities that each distribution assigns in the two replicates, per sample size for Yule20.

Table 1.

Percentage of cases in which the true tree is contained in a distribution for Yule20 and Yule50 (out of 250 and 100 simulations, respectively, with 2 replicates each).

Fig 8.

The accuracy of the point estimates measured in terms of the mean relative RF distance to the true tree for different sample sizes of the large datasets.
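
As a reminder of the metric, here is a minimal sketch of a rooted Robinson-Foulds (RF) distance, computed as the number of nontrivial clades found in exactly one of two trees. The normalization by 2(n - 2), the maximum for two rooted binary trees on n taxa, is an assumption about how "relative" is meant here; the trees and function names are illustrative only.

```python
def leaves(node):
    """Return the leaf set of a nested-tuple tree (leaves are strings)."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def nontrivial_clades(tree):
    """Clades with at least two taxa, excluding the root clade."""
    all_taxa = leaves(tree)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        clade = leaves(node)
        if clade != all_taxa:
            found.add(clade)
        for child in node:
            visit(child)

    visit(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Number of nontrivial clades present in exactly one of the trees."""
    return len(nontrivial_clades(tree_a) ^ nontrivial_clades(tree_b))


def relative_rf_distance(tree_a, tree_b):
    """RF distance divided by 2(n - 2); assumes rooted binary trees."""
    n = len(leaves(tree_a))
    return rf_distance(tree_a, tree_b) / (2 * (n - 2))


# Illustrative trees on five taxa; they share only the clade ABC.
estimate = ((("A", "B"), "C"), ("D", "E"))
truth = (((("A", "C"), "B"), "D"), "E")

print(rf_distance(estimate, truth))  # 4
print(rf_distance(estimate, estimate))  # 0
```

Averaging such relative distances between each point estimate and its true tree over all simulations would give a mean of the kind plotted in the figure, under the stated assumptions.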

Fig 9.

The precision of the point estimates in terms of the RF distance, that is, the mean RF distance between the point estimates of the two replicates of each simulation.
