Fig 1.

Branch rate parameterisations.

Top left: the prior density of a branch rate r under a Log-normal(−0.5σ², σ) distribution (with its mean fixed at 1). The function for transforming into branch rates is depicted for real (top right), cat (bottom left), and quant (bottom right). For visualisation purposes, only 10 bins/pieces are displayed; in practice we use 2N − 2 bins for cat and 100 pieces for quant. The first and final quant pieces are equal to the underlying function (solid lines), whereas the pieces in between use linear approximations of this function (dashed lines).
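
The three transformations in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the BEAST 2 implementation: the function names are hypothetical, and mapping each cat bin to the quantile at its midpoint is an assumption that the caption does not state.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the log-normal inverse CDF


def lognormal_icdf(q, sigma):
    """Inverse CDF of Log-normal(-0.5*sigma^2, sigma), whose mean is 1."""
    mu = -0.5 * sigma ** 2
    return math.exp(mu + sigma * _STD_NORMAL.inv_cdf(q))


def rate_real(r):
    """real: the parameter is the branch rate itself."""
    return r


def rate_cat(k, n_bins, sigma):
    """cat: integer bin k in 0..n_bins-1 (assumed to map to the bin midpoint)."""
    return lognormal_icdf((k + 0.5) / n_bins, sigma)


def rate_quant(q, sigma, n_pieces=100):
    """quant: quantile q in (0, 1), piecewise-linear between piece boundaries."""
    i = min(int(q * n_pieces), n_pieces - 1)
    if i == 0 or i == n_pieces - 1:
        # First and final pieces use the exact function.
        return lognormal_icdf(q, sigma)
    lo, hi = i / n_pieces, (i + 1) / n_pieces
    r_lo, r_hi = lognormal_icdf(lo, sigma), lognormal_icdf(hi, sigma)
    return r_lo + (r_hi - r_lo) * (q - lo) / (hi - lo)
```

With 100 pieces the interior linear approximation stays very close to the exact inverse CDF, while the exact end pieces preserve the unbounded tails.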

Table 1.

Summary of pre-existing BEAST 2 operators.

Fig 2.

Clock standard deviation scale operators.

The two operators above propose a new clock standard deviation σ → σ′. Then either the new quantiles are chosen such that the rates remain constant (“New quantiles”, above) or the new rates are chosen such that the quantiles remain constant (“New rates”). In the real parameterisation, these two operators are known as Scale and CisScale, respectively, whereas in quant they are known as CisScale and Scale.
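
The two options can be illustrated with the Log-normal(−0.5σ², σ) prior from Fig 1. The sketch below shows only the deterministic recomputation that follows a proposed σ′. It omits the proposal of σ′ itself and the Hastings ratio, and the function names are hypothetical rather than taken from BEAST 2.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def rate_from_quantile(q, sigma):
    """Inverse CDF of Log-normal(-0.5*sigma^2, sigma)."""
    return math.exp(-0.5 * sigma ** 2 + sigma * _STD_NORMAL.inv_cdf(q))


def quantile_from_rate(r, sigma):
    """CDF of Log-normal(-0.5*sigma^2, sigma)."""
    return _STD_NORMAL.cdf((math.log(r) + 0.5 * sigma ** 2) / sigma)


def new_quantiles(rates, sigma_new):
    """'New quantiles': rates stay fixed, quantiles are recomputed under sigma_new."""
    return [quantile_from_rate(r, sigma_new) for r in rates]


def new_rates(quantiles, sigma_new):
    """'New rates': quantiles stay fixed, rates are recomputed under sigma_new."""
    return [rate_from_quantile(q, sigma_new) for q in quantiles]
```

Which of the two behaviours a given parameterisation gets "for free" depends on what it stores: real stores rates, so leaving the state untouched keeps rates constant, whereas quant stores quantiles, so leaving the state untouched keeps quantiles constant.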

Fig 3.

Traversing likelihood space.

The z-axes above are the log-likelihoods of the genetic distance r × τ between two simulated nucleic acid sequences of length L, under the Jukes-Cantor substitution model [40]. Two possible proposals from the current state (white circle) are depicted. These proposals are generated by the RandomWalk (RW) and ConstantDistance (CD) operators. In the low signal dataset (L = 0.1kb), both operators can traverse the likelihood space effectively. However, the exact same proposal by RandomWalk incurs a much larger likelihood penalty in the L = 0.5kb dataset by “falling off the ridge”, in contrast to ConstantDistance which “walks along the ridge”. This discrepancy is even stronger for larger datasets and thus necessitates the use of operators such as ConstantDistance which account for correlations between branch lengths and rates.
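
The ridge can be reproduced with a small sketch. Under the Jukes-Cantor model, the probability that a site differs between two sequences separated by distance d = r × τ is 0.75(1 − exp(−4d/3)), which gives a binomial log-likelihood (up to an additive constant). The two proposal functions below are simplified single-branch stand-ins for RandomWalk and ConstantDistance, written for illustration only. The actual operators act on trees, and all names and numbers here are hypothetical.

```python
import math


def jc69_log_likelihood(rate, time, n_sites, n_diff):
    """Log-likelihood (up to a constant) of n_diff mismatches in n_sites under JC69."""
    d = rate * time  # genetic distance
    p = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))  # P(site differs)
    return n_diff * math.log(p) + (n_sites - n_diff) * math.log(1.0 - p)


def random_walk(rate, time, delta):
    """Simplified RandomWalk: shift the time only, so the distance changes."""
    return rate, time + delta


def constant_distance(rate, time, delta):
    """Simplified ConstantDistance: shift the time and rescale the rate
    so that rate * time is unchanged."""
    new_time = time + delta
    return rate * time / new_time, new_time


# Hypothetical example: the same step applied to a short and a long alignment.
r, t, step = 1.0, 0.1, 0.05
for n_sites, n_diff in [(100, 9), (500, 47)]:
    current = jc69_log_likelihood(r, t, n_sites, n_diff)
    rw = jc69_log_likelihood(*random_walk(r, t, step), n_sites, n_diff)
    cd = jc69_log_likelihood(*constant_distance(r, t, step), n_sites, n_diff)
    print(n_sites, "RW penalty:", current - rw, "CD penalty:", current - cd)
```

Because the likelihood depends on r and τ only through their product, the constant-distance move leaves it unchanged, while the penalty for the time-only move grows with the number of sites.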

Table 2.

Summary of AdaptiveOperatorSampler operators and their parameters of interest (POI).

Fig 4.

The Bactrian proposal kernel.

The step size made under a Bactrian proposal kernel is equal to sΣ where Σ is drawn from the above distribution and s is tunable.
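
A draw of Σ can be sketched as follows. The caption does not give the density, so this assumes the standard Bactrian form: an equal mixture of two normal distributions centred at ±m, each with variance 1 − m², which yields zero mean and unit variance overall. The default m = 0.95 is likewise an assumption, and the function names are hypothetical.

```python
import math
import random


def bactrian_draw(m=0.95, rng=random):
    """Draw from an equal mixture of N(-m, 1 - m^2) and N(+m, 1 - m^2)."""
    centre = m if rng.random() < 0.5 else -m
    return rng.gauss(centre, math.sqrt(1.0 - m * m))


def bactrian_random_walk(x, s, m=0.95, rng=random):
    """Propose x' = x + s * Sigma, where s is the tunable step size."""
    return x + s * bactrian_draw(m, rng)
```

The two humps place little mass near zero, so proposals rarely land almost on top of the current state, which is the motivation for preferring this kernel over a uniform or single-normal one.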

Fig 5.

Depiction of NarrowExchange and NarrowExchangeRate operators.

Proposals are denoted by . The vertical axes correspond to node heights t. In the bottom figure, branch rates r are indicated by line width and therefore genetic distances are equal to the width of each branch multiplied by its length. In this example, the and constraints are satisfied.

Fig 6.

Screening of NER and NERw variants by acceptance rate.

Top left: comparison of NER variants with the null operator NER{} = NarrowExchange. Each operator is represented by a single point, uniquely encoded by the point stylings. The number of times each operator is proposed and accepted is compared with that of NER{}, and one-sided z-tests are performed to assess the statistical significance between the two acceptance rates (p = 0.001). This process is repeated across 300 simulated datasets. The axes of each plot are the proportion of these 300 simulations for which there is evidence that the operator is significantly better than NER{} (x-axis) or worse than NER{} (y-axis). Top right: comparison of NER and NERw acceptance rates. Each point is one NER/NERw variant from a single simulation. Bottom: relationship between the acceptance rates α of and NER{} with the clock model standard deviation σ and the number of sites L. Each point is a single simulation.
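
The per-dataset comparison could look roughly like the sketch below. The caption only says "one-sided z-tests", so the pooled two-proportion form used here is an assumption, as are the function names. The threshold of 0.001 is taken from the caption.

```python
import math
from statistics import NormalDist


def one_sided_z_test(acc_a, n_a, acc_b, n_b):
    """p-value for H1: acceptance rate of A exceeds that of B (pooled z-test)."""
    p_a, p_b = acc_a / n_a, acc_b / n_b
    pooled = (acc_a + acc_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if se == 0.0:
        return 1.0  # no variation, hence no evidence either way
    z = (p_a - p_b) / se
    return 1.0 - NormalDist().cdf(z)


def classify(acc_op, n_op, acc_null, n_null, alpha=0.001):
    """Label an operator relative to the null operator on one dataset."""
    if one_sided_z_test(acc_op, n_op, acc_null, n_null) < alpha:
        return "better"
    if one_sided_z_test(acc_null, n_null, acc_op, n_op) < alpha:
        return "worse"
    return "no evidence"
```

Repeating `classify` over all simulated datasets and taking the fraction labelled "better" and "worse" would give the x and y coordinates of one point in the top-left panel.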

Table 3.

Summary of clock model operators introduced throughout this article.

Table 4.

Operator configurations and the substitution rate parameterisations which each operator is applicable to.

Fig 7.

Protocol for optimising clock model methodologies.

Each area (detailed in Models and methods) is optimised sequentially, and the best setting from each step is used when optimising the following step.

Table 5.

Benchmark datasets, sorted in increasing order of taxon count N.

Fig 8.

Round 1: Benchmarking the AdaptiveOperatorSampler operator.

Top left, top right, bottom left: each plot compares the ESS/hr (±1 standard error) across two operator configurations. Bottom right: the effect of sequence length L on operator weights learned by AdaptiveOperatorSampler. Both sets of observations are fit by logistic regression models. The benchmark datasets are displayed in Table 5. The cat and quant settings are evaluated in S1 Fig.

Fig 9.

Round 2: Benchmarking substitution rate parameterisations.

Top left, top right, bottom left: the adapt (real), adapt (cat), and adapt (quant) configurations were compared. Bottom right: comparison of the mean tip substitution rate ESS/hr as a function of alignment length L.

Fig 10.

Comparison of runtimes across methodologies.

The computational time required for a setting to sample a single state is divided by that of the nocons (cat) configuration. The geometric mean under each configuration, averaged across all 9 datasets, is displayed as a horizontal bar.
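
The normalisation and summary described here amount to the following sketch. The function names and the dictionary layout (dataset name mapped to time per sampled state) are hypothetical.

```python
from statistics import geometric_mean


def relative_runtimes(times, baseline):
    """Divide each dataset's time per state by the baseline configuration's time."""
    return {dataset: times[dataset] / baseline[dataset] for dataset in times}


def summarise(times, baseline):
    """Geometric mean of the relative runtimes across datasets."""
    return geometric_mean(list(relative_runtimes(times, baseline).values()))
```

A geometric mean is the natural summary for ratios: a configuration that is twice as slow on one dataset and twice as fast on another averages to a relative runtime of 1.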

Fig 11.

Round 3: Benchmarking the Bactrian kernel.

The ESS/hr (±1 s.e.) under the Bactrian configuration, divided by that under the uniform kernel, is shown on the y-axis for each dataset and relevant parameter. Horizontal bars show the geometric mean for each parameter.

Fig 12.

Round 4: Benchmarking the NER operators.

Top: the learned weights (left) behind the two NER operators (NER{} and ), and the relative difference between their acceptance rates α (right), are presented as functions of sequence length. Logistic and logarithmic regression models are shown, respectively. Bottom: maximum clade credibility tree of the bony fish dataset by Broughton et al. 2013 [53]. This alignment received the strongest boost from NER, likely due to its high topological uncertainty and branch rate variance. Branches are coloured by substitution rate, the y-axis shows time units, and internal nodes are labelled with posterior clade support. Tree visualised using UglyTrees [59].

Fig 13.

Round 5: Benchmarking the LeafAVMVN operator.

See Fig 11 caption for figure notation.
