Fig 1.

Example genome search tree and its use.

An example genome search tree for the ancestral genome ACGGT. Blue nodes are terminal and red nodes are internal. Each node shows, at the top, the genome positions it covers and, at the bottom, its total rate; terminal nodes also show their allele at the center. Under each terminal node we show the relevant example mutation rates. The path highlighted in orange shows an example sampling of one mutation event. A parameter R is initialized with a random number sampled uniformly between 0 and the total rate 8.1; in this case R = 4.7. As we move downward, the value of R can decrease, as described in Algorithm 2, determining which site will mutate and how. Here, an initial R = 4.7 results in the sampling of a G→T mutation at genome position 4.
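The descent described in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the phastSim implementation or a transcription of Algorithm 2: the `RATES` table is hypothetical (chosen so that the total rate for ACGGT is 8.1, but not taken from the figure), the tree is split at the midpoint of each position range, and the names `Node`, `build` and `sample_mutation` are made up for this example. At an internal node, R is compared with the rate of the left child; if R is smaller we go left, otherwise we subtract that rate from R and go right. At a terminal node the same comparison is repeated over the possible mutations.

```python
import random

# Hypothetical mutation rates per current allele (NOT the values in Fig 1).
RATES = {
    "A": {"C": 0.2, "G": 0.6, "T": 0.2},
    "C": {"A": 0.2, "G": 0.1, "T": 1.2},
    "G": {"A": 0.2, "C": 0.1, "T": 1.5},
    "T": {"A": 0.2, "C": 1.6, "G": 0.2},
}


class Node:
    """A node covering genome positions lo..hi (1-based, inclusive)."""

    def __init__(self, lo, hi, allele=None, left=None, right=None):
        self.lo, self.hi = lo, hi
        self.allele = allele              # set only for terminal nodes
        self.left, self.right = left, right
        if allele is not None:
            self.rate = sum(RATES[allele].values())
        else:
            self.rate = left.rate + right.rate


def build(genome, lo, hi):
    """Recursively build a binary search tree over positions lo..hi."""
    if lo == hi:
        return Node(lo, hi, allele=genome[lo - 1])
    mid = (lo + hi) // 2
    return Node(lo, hi,
                left=build(genome, lo, mid),
                right=build(genome, mid + 1, hi))


def sample_mutation(root, r):
    """Descend from the root using r, drawn uniformly in [0, root.rate)."""
    node = root
    while node.allele is None:
        if r < node.left.rate:
            node = node.left
        else:
            r -= node.left.rate           # r decreases when we go right
            node = node.right
    targets = list(RATES[node.allele].items())
    for new_allele, rate in targets:
        if r < rate:
            return node.lo, node.allele, new_allele
        r -= rate
    # Guard against floating-point rounding at the upper boundary.
    return node.lo, node.allele, targets[-1][0]


root = build("ACGGT", 1, 5)
r = random.uniform(0, root.rate)
position, old_allele, new_allele = sample_mutation(root, r)
```

With these made-up rates, `sample_mutation(root, 4.7)` returns `(4, "G", "T")`, matching the outcome shown in the figure; with other rate values the same R could select a different event. Each draw costs a number of comparisons proportional to the tree depth rather than to the genome length.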

Fig 2.

Example of multi-layer genome search tree and its evolution.

We track the evolution of the multi-layer genome search tree starting from the genome search tree of Fig 1. Colors for the genome search tree are the same as in Fig 1 (right side of each panel). On the left side of each panel, we show an extract of the phylogenetic tree containing three nodes (“P” for parent, which in this example is the root of the phylogeny, and “L” and “R” for the left and right child nodes). “L” has further descendants, which we do not show here; we focus only on this triplet of nodes as an example. The green arrow along the phylogenetic tree shows the step of the preorder traversal considered in the given panel. Black arrows show past steps. Vertical dashed lines in the multi-layer genome search tree connect nodes that represent the same portions of the genome but are in different layers. “L0” stands for “Layer 0”, “L1” for “Layer 1”, etc. (A) At the phylogenetic root “P” we initialize the genome search tree for layer 0. (B) As we move to child “L”, a new substitution is sampled (as in Fig 1) and 3 corresponding genome nodes are created in layer 1. These nodes correspond to the nodes in the original genome search tree whose rate is affected by the new mutation. (C) As we traverse the subtree of the descendants of L, new nodes and mutations might be added in the layers below. (D) Once we have finished traversing the subtree of the descendants of L, we return to L, at which point all nodes in layers below layer 1 have either been removed or have become irrelevant. (E) We return to P, at which point the genome search tree nodes previously added to layer 1 are also ignored or deleted. (F) We move from P to R, and in doing so new mutation events might be sampled and the corresponding genome nodes might be added to layer 1 (new genome search tree nodes corresponding to 1 new substitution are shown in the new layer 1).
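The layering idea in panels A to F can be illustrated with a much simplified Python sketch. It is an assumption-laden toy, not the data structure used by phastSim: each layer here is a plain dictionary of position-to-allele overrides rather than a set of genome search tree nodes with rates, mutations are supplied in a hypothetical `branch_mutations` table instead of being sampled, and the names `simulate`, `tree` and `L_child` are invented for this example. What it does show is the bookkeeping: a layer is added when the preorder traversal descends to a child and discarded when the traversal returns to the parent, so the root genome is never copied.

```python
def simulate(tree, root_genome, branch_mutations):
    """Preorder traversal that keeps one override layer per tree depth."""
    layers = [{}]   # layer 0: no overrides, i.e. the root genome itself
    genomes = {}

    def allele(pos):
        # Search from the deepest layer upward, then fall back to the root.
        for layer in reversed(layers):
            if pos in layer:
                return layer[pos]
        return root_genome[pos - 1]

    def visit(node):
        genomes[node["name"]] = "".join(
            allele(p) for p in range(1, len(root_genome) + 1))
        for child in node.get("children", []):
            # Moving down a branch adds a new layer with its substitutions.
            layers.append(dict(branch_mutations.get(child["name"], {})))
            visit(child)
            # Returning to the parent discards that layer.
            layers.pop()

    visit(tree)
    return genomes


# Hypothetical phylogeny: P is the root, with children L and R.
tree = {"name": "P", "children": [
    {"name": "L", "children": [{"name": "L_child"}]},
    {"name": "R"},
]}
# Hypothetical substitutions per branch (1-based position -> new allele).
branch_mutations = {"L": {4: "T"}, "L_child": {1: "C"}, "R": {2: "T"}}

genomes = simulate(tree, "ACGGT", branch_mutations)
```

In this toy run, the G→T substitution at position 4 is visible at L and at its descendant, but not at R, because the layer holding it is removed before the traversal moves from P to R.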

Table 1.

Indel length distribution options.

Table 2.

A comparison of features of different sequence evolution simulation software packages.

Fig 3.

Comparison of running times of different simulators in a scenario similar to SARS-CoV-2 data.

On the Y axis we show the number of seconds it takes to perform simulations using different software. On the X axis is the number of tips simulated. Each point represents ten replicates. We do not run the most demanding simulators when each replicate would take substantially more than 1 minute to run. In red is the time to run phastSim with a concise output, and in orange is the time for phastSim when additionally generating a FASTA format output. In green is the time demand of pyvolve, and in purple that of Seq-Gen. In yellow and brown are, respectively, the times for running INDELible with method 1 (matrix exponentiation) and method 2 (Gillespie approach).

Fig 4.

Comparison of running times of different simulators in a scenario similar to E. coli outbreak data.

On the Y axis we show the number of seconds it takes to perform simulations using different software. On the X axis is the number of tips simulated. Each point represents ten replicates. We do not run Seq-Gen for more than 1000 tips due to high computational demand. In red is the time to run phastSim, and in orange is the time for phastSim with the simple, non-hierarchical approach. In purple is the time demand of Seq-Gen.

Fig 5.

Comparison of running times of different simulators in a SARS-CoV-2 scenario using different evolutionary models.

On the Y axis we show the number of seconds it takes to perform simulations using different software. On the X axis is the model used for simulations: “nucleotide” is a nucleotide substitution model without variation; “nuc+10cat” is a nucleotide model with 10 rate categories; “nuc+alpha” is a nucleotide model with continuous variation in rate (each site has a distinct rate sampled from a Gamma distribution); “codon” represents a codon substitution model; “codon+10cat” represents a codon substitution model with 10 categories for ω; “codon+alpha” is a codon model with continuous rate variation in mutation rate and in ω (only allowed in phastSim). Each value represents ten replicates. Seq-Gen does not allow codon models. Colors are as in Fig 3. Here we used alignments of 1000 tips.

Fig 6.

Comparison of running times of INDELible and phastSim simulators in a SARS-CoV-2 scenario with indels.

In this scenario we compare phastSim against INDELible-m1 and INDELible-m2 (the only other methods considered here that model indels). Each value represents ten replicates.

Fig 7.

Comparison of running times of different simulators in a SARS-CoV-2 scenario after rescaling the tree branch lengths by different factors.

On the Y axis we show the number of seconds it takes to perform simulations using different software. On the X axis is the rescaling factor we use to make the phylogenetic tree branch lengths longer or shorter. Colors are as in Fig 3. Here we used alignments of 5000 tips.
