Table 1.
Prior distributions for the effective population size of the constant-size coalescent (known as θ in the constant-size coalescent and different to the scale parameter of the Γ distribution).
Fig 1.
Relative log marginal likelihoods of empirical data sets.
The polygons represent the relative log marginal likelihoods of each microbe data set under a different effective population size (θ) prior, analysed with four different configurations of sampling times and molecular clocks. The corners correspond to models (a combination of molecular clock model and the inclusion or exclusion of sampling times). The outermost dashed lines mark the highest marginal likelihood recorded, and the extent to which the polygons are deformed denotes relative model support (e.g. a perfect square would imply that all models are equally well supported). A corner that falls close to the centre corresponds to a model that has low support. Het (heterochronous) includes sampling times, while Iso (isochronous) does not include any sampling times. SC is the strict clock and UCLD is the uncorrelated log-normal relaxed clock. Red represents an exponential prior on the effective population size, blue is a Γ prior, and green is a log-normal prior.
Table 2.
Log Bayes factors between isochronous and heterochronous models for each data set, separated by prior on effective population size, θ.
Fig 2.
Relative log marginal likelihoods of simulations.
The polygons represent the relative log marginal likelihood under three possible priors on the effective population size (θ) parameter of the constant-size coalescent tree prior. The top row is for heterochronous simulations, where temporal signal is present. The bottom row is for isochronous simulations that do not have temporal signal. Within each panel the corners correspond to a combination of model and sampling times, either a strict (SC) or relaxed molecular clock with an underlying log-normal distribution (UCLD), and with (heterochronous) or without (isochronous) sampling times. The correct model used to generate the data is the SC heterochronous (SC/het) for the top row and the SC isochronous (Iso/SC) for the bottom row. Each polygon is for one simulation replicate (a total of ten) and the colours denote whether we employed a hard bound on the root height of the form Uniform(0.0, 5.0), as shown in the legend.
Table 3.
Correctly classified simulation replicates under heterochronous and isochronous trees.
A total of twenty simulations were generated in each configuration: ten under heterochronous trees, which are expected to display temporal signal, and ten under isochronous trees, which are not expected to support temporal signal. A value of ten represents perfect classification according to the Bayesian evaluation of temporal signal (BETS). For heterochronous trees, the threshold is a log Bayes factor of at least 3 (strong evidence for temporal signal). For isochronous trees, the threshold is a log Bayes factor of at most -3 (strong evidence against temporal signal). The rows correspond to three possible priors on the effective population size of the constant-size coalescent, θ. The ‘Best clock model’ is a situation where we consider the best heterochronous and the best isochronous model, take their log Bayes factor, and count the replicate as correctly classified if the log Bayes factor favours the correct model by at least 3. The value to the left of / is the number of correctly classified heterochronous simulations, and the value to the right of / is the number of correctly classified isochronous simulations.
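The classification rule described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the dictionary layout, and the convention that the log Bayes factor is the heterochronous log marginal likelihood minus the isochronous one are assumptions made here for clarity.

```python
def log_bayes_factor(log_ml_het, log_ml_iso):
    """Log Bayes factor of the heterochronous over the isochronous model (assumed sign convention)."""
    return log_ml_het - log_ml_iso


def correctly_classified(log_ml_het, log_ml_iso, truth, threshold=3.0):
    """Return True if the replicate is classified correctly at the given threshold.

    truth is "het" (data simulated with sampling times) or "iso" (without).
    """
    log_bf = log_bayes_factor(log_ml_het, log_ml_iso)
    if truth == "het":
        return log_bf >= threshold   # strong evidence for temporal signal
    if truth == "iso":
        return log_bf <= -threshold  # strong evidence against temporal signal
    raise ValueError("truth must be 'het' or 'iso'")


def best_clock_classified(log_mls, truth, threshold=3.0):
    """'Best clock model' rule: compare the best het model with the best iso model.

    log_mls maps (clock, sampling) pairs, e.g. ("SC", "het"), to log marginal
    likelihoods. This layout is hypothetical.
    """
    best_het = max(v for (_, sampling), v in log_mls.items() if sampling == "het")
    best_iso = max(v for (_, sampling), v in log_mls.items() if sampling == "iso")
    return correctly_classified(best_het, best_iso, truth, threshold)
```

Counting the replicates for which `correctly_classified` returns `True`, separately for heterochronous and isochronous simulations, would give the two values on either side of the / in each cell.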
Fig 3.
Phylogenetic tree extension for a simulation replicate with no temporal signal.
Highest clade credibility trees from a data set simulated with no sampling times (isochronous) and under a strict molecular clock model (SC). The prior on effective population size (θ) is a Γ(κ = 0.001, θ = 1000), which resulted in high classification errors using BETS. The y-axis is the time from the present. Tip nodes have solid grey circles. Including sampling times that span 0.5 units of time, about 1/4 of the true root height, induces dramatic overestimation of the root height compared to the true model (SC, isochronous). This effect occurs under both molecular clock models, the SC and the relaxed molecular clock with an underlying log-normal distribution (UCLD), but it is markedly less pronounced in the UCLD. Note that the y-axis is on a logarithmic scale (log10).
Fig 4.
Phylogenetic trees from a simulation replicate with temporal signal.
Highest clade credibility trees from a data set simulated with sampling times (heterochronous) and under a strict molecular clock model (SC). The isochronous trees are inferred by fixing the molecular clock rate to the true value, such that the timescale is in units comparable to those of the heterochronous analyses. Unlike the estimates for isochronous trees (e.g. Fig 3), the heights of the trees under all scenarios are comparable. Axes and labels are the same as those of Fig 3.
Table 4.
Correctly classified simulation replicates under heterochronous and isochronous trees using hard bounds on the root height.
Rows and columns are identical to those of Table 3, but here the analyses include an explicit prior on the root height, via a uniform distribution between 0 and 5.0.
Fig 5.
Marginal prior distributions and pairs plots.
The grey histograms correspond to the parameter labelled at the bottom of each column: effective population size (pop. size, θ), tree length, root height, and the evolutionary rate (evol. rate). The prior for θ here is a Uniform distribution between 0 and 10^3, while that for the evolutionary rate is a CTMC-rate reference prior. Note that the tree length and root height have units of time, the evolutionary rate is in subs/site/time, and θ is proportional to units of time.
Fig 6.
Prior predictive simulations and marginal priors of root heights, given the prior on the effective population size, θ.
Each panel corresponds to a different prior on θ, as described in Table 1 (Exponential(μ = 1.0), Log-normal(μ = 1.0, σ = 5.0), and Γ(κ = 0.001, θ = 1000)). We show five simulated trees from our analysis using sampling times (heterochronous analyses), overlaid on one another (similar to densitree plots [56]). The violin plots show the prior densities of the root height and the hollow circles denote 100 randomly drawn samples from the prior. The y-axis is the time from the present, but note that the scales are different. Tip nodes are shown with solid grey circles. The densities and trees on the left, in orange, do not include an explicit prior on the root height (Th), while those to the right, in purple, have a hard bound on the root height in the form of a uniform prior between 0.0 and 5.0 units of time.
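To give a sense of how the prior on θ shapes the implied prior on the root height, the following is a minimal prior predictive sketch in Python. It is a simplification and not the analysis behind the figure: it assumes isochronous tips (the figure uses sampling times), a coalescence rate of k(k-1)/(2θ) while k lineages remain, a log-scale parametrisation of the Log-normal prior, a shape and scale parametrisation of the Γ prior, and rejection of draws above the bound as a stand-in for the Uniform(0.0, 5.0) hard bound. All function names and the number of tips are hypothetical.

```python
import random


def draw_theta(prior, rng):
    """Draw theta from one of the three priors named in the caption."""
    if prior == "exponential":
        return rng.expovariate(1.0)             # mean 1.0
    if prior == "lognormal":
        return rng.lognormvariate(1.0, 5.0)     # assumed log-scale mu and sigma
    if prior == "gamma":
        return rng.gammavariate(0.001, 1000.0)  # shape 0.001, scale 1000
    raise ValueError("unknown prior: " + prior)


def simulate_root_height(theta, n_tips, rng):
    """Root height of an isochronous constant-size coalescent tree.

    Waiting times with k lineages are Exponential(rate k(k-1)/(2*theta));
    equivalently, theta times an Exponential(rate k(k-1)/2) draw.
    """
    total = 0.0
    for k in range(n_tips, 1, -1):
        total += rng.expovariate(k * (k - 1) / 2.0)
    return theta * total


def prior_predictive_root_heights(prior, n_tips, n_draws, rng, max_height=None):
    """Sample root heights from the prior; optionally reject heights above max_height."""
    heights = []
    while len(heights) < n_draws:
        h = simulate_root_height(draw_theta(prior, rng), n_tips, rng)
        if max_height is None or h <= max_height:
            heights.append(h)
    return heights


if __name__ == "__main__":
    rng = random.Random(42)
    for prior in ("exponential", "lognormal", "gamma"):
        free = sorted(prior_predictive_root_heights(prior, 10, 100, rng))
        bounded = sorted(prior_predictive_root_heights(prior, 10, 100, rng, max_height=5.0))
        print(prior, "median unbounded:", free[50], "median bounded:", bounded[50])
```

Under these assumptions the expected root height for a fixed θ is 2θ(1 - 1/n) for n tips, so a heavy-tailed prior on θ carries over directly to the root height unless a bound is imposed.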