PLOS Computational Biology

Fig 1.

Example of a phylogenetic network.
The top node represents the origin, and its child node is called the root of the network. Time flows from the origin node to the leaves (here A, B, C, D), so branches are directed from the top to the leaves. Each branch x is associated with a length t_x and a population size θ_x. Additionally, each branch x directly above a reticulation node has an inheritance probability γ_x, the probability that it contributed to any given individual at the top of the branch just below.
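
The quantities attached to each branch can be pictured with a small data structure. The following is a minimal sketch, not SnappNet's actual representation: the class `Branch`, its field names, and the helper `check_reticulation` are hypothetical, the numeric values are illustrative only, and it assumes the usual convention that the inheritance probabilities of the two branches entering a reticulation node sum to 1.

```python
from dataclasses import dataclass
from typing import Optional
import math


@dataclass
class Branch:
    """One branch x of a phylogenetic network (hypothetical layout)."""
    name: str
    length: float                  # t_x, branch length
    theta: float                   # theta_x, population size
    gamma: Optional[float] = None  # gamma_x, set only above a reticulation node


def check_reticulation(left: Branch, right: Branch) -> bool:
    """Return True if both branches carry inheritance probabilities
    in [0, 1] that sum to 1 (assumed convention)."""
    if left.gamma is None or right.gamma is None:
        return False
    in_range = 0.0 <= left.gamma <= 1.0 and 0.0 <= right.gamma <= 1.0
    return in_range and math.isclose(left.gamma + right.gamma, 1.0)


# Illustrative values only; they are not taken from the paper.
x = Branch("x", length=0.01, theta=0.005, gamma=0.3)
y = Branch("y", length=0.02, theta=0.005, gamma=0.7)
z = Branch("z", length=0.03, theta=0.005)  # not above a reticulation node
```

Here `check_reticulation(x, y)` holds because 0.3 and 0.7 sum to 1, whereas `check_reticulation(x, z)` fails because `z` carries no inheritance probability.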

Fig 2.

Illustration of the concepts and notation employed to describe likelihood computations.
The species network topology is the same as that in Fig 1, but branches (populations) are now represented as grey parallelograms. A gene tree is drawn inside the species network (green and red lines). One mutation occurs in the branch above D. We focus on three branches: x, y and z. Colored horizontal bars represent the population interfaces , , and . Note that (blue) is a vector of incomparable population interfaces, while (orange) is not, as is a descendant of . Here, n_A = n_B = n_C = n_D = 2, r_A = 2, r_B = 1, r_C = 0, r_D = 2 are known, whereas the values of and are not observed, and depend on the gene tree generated by the MSNC process. For the gene tree shown, and . Since z is incident to leaf B, we have and . Now note . Then, .

Fig 3.

Illustration of Rule 2.
Given (a) the partial likelihoods for the (red) vector of population interfaces and the partial likelihoods for the (blue) vector of population interfaces, Rule 2 allows us to compute the partial likelihoods for the (green) vector (b).

Fig 4.

Illustration of Rule 3.
Given (a) the partial likelihoods for the (red) vector of population interfaces, Rule 3 allows us to compute the partial likelihoods for the (green) vector (b).

Fig 5.

Illustration of Rule 4.
Given (a) the partial likelihoods for the (red) vector of population interfaces, Rule 4 allows us to compute the partial likelihoods for the (green) vector (b).

Fig 6.

Example of a phylogenetic network where the level ℓ is equal to 6 (the reticulation nodes are depicted in grey), while , depending on the traversal algorithm (not shown).
A traversal ensuring that remains close to the lower end of this interval (the scanwidth of the network [66]) will be several orders of magnitude faster than algorithms whose complexity depends exponentially on ℓ. Increasing the number of reticulation nodes while keeping a “ladder” topology as above can make ℓ arbitrarily large, while the scanwidth remains constant. This topology may seem odd but it is intended as the backbone of a more complex and realistic network with subtrees hanging from the different internal branches of the ladder, in which case the complexity issue remains.

Fig 7.

The three phylogenetic networks used for simulating data.
Networks A and B are taken from [54]. Branch lengths are measured in units of expected number of mutations per site (i.e. substitutions per site). Displayed values represent inheritance probabilities.

Fig 8.

The networks from the C family, with either 3 or 4 reticulation nodes, and with or without outgroup O.

Table 1.

Average posterior probability of the correct topology (for networks A and B, see Fig 7) obtained by running SnappNet on simulated data.
Results are given as a function of the number of sites and as a function of the hyperparameter values α and β for the prior on θ (θ ∼ Γ(α, β) and ). Here, one lineage was simulated per species. Constant sites are included in the analysis, the rates u and v are considered as known, and 20 replicates are considered for each simulation set up (criterion ESS > 200; , r ∼ Beta(1, 1), for the network prior).
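
As a rough illustration of the reported quantity, the sketch below computes, for each replicate, the fraction of posterior samples whose topology matches the true one, and then averages over replicates. It is a simplified assumption rather than SnappNet's procedure: the function names are hypothetical, and topologies are reduced to plain string labels, whereas comparing real networks requires a topology-equality test.

```python
from statistics import mean
from typing import Sequence


def posterior_probability(samples: Sequence[str], true_topology: str) -> float:
    """Fraction of posterior samples whose topology label equals the true one."""
    if not samples:
        raise ValueError("no posterior samples")
    return sum(s == true_topology for s in samples) / len(samples)


def average_posterior_probability(
    replicates: Sequence[Sequence[str]], true_topology: str
) -> float:
    """Mean of the per-replicate posterior probabilities."""
    return mean(posterior_probability(r, true_topology) for r in replicates)


# Toy labels only: "A" is the true topology, "T" some other sampled topology.
replicates = [["A", "A", "T", "A"], ["A", "T"]]
avg = average_posterior_probability(replicates, "A")
```

With these toy replicates the per-replicate values are 0.75 and 0.5, so `avg` is 0.625.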

Fig 9.

The proportions of trees (black), 1-reticulation networks (dark grey), and 2-reticulation networks (light grey) sampled by SnappNet under the different simulation settings studied in Table 1.
Recall that networks A and B contain 1 and 2 reticulations, respectively.
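
The proportions shown in such a plot can be tallied from the number of reticulations in each sampled network. This is a minimal sketch under stated assumptions: `reticulation_ratios` is a hypothetical helper, and the input is assumed to be a list with one reticulation count per posterior sample (0 for a tree).

```python
from collections import Counter
from typing import Dict, Sequence


def reticulation_ratios(counts: Sequence[int]) -> Dict[int, float]:
    """Map each reticulation count to its share of the posterior samples."""
    if not counts:
        raise ValueError("no sampled networks")
    tally = Counter(counts)
    return {k: tally[k] / len(counts) for k in sorted(tally)}


# Toy sample: one tree, five 1-reticulation and two 2-reticulation networks.
ratios = reticulation_ratios([0, 1, 1, 2, 1, 1, 2, 1])
```

For this toy sample, `ratios` is `{0: 0.125, 1: 0.625, 2: 0.25}`.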

Fig 10.

Estimated height and length for network A (see Fig 7), as a function of the number of sites.
Heights and lengths are measured in units of expected number of mutations per site. True values are given by the dashed horizontal lines. Two lineages per species were simulated. Constant sites are included in the analysis, and 20 replicates are considered for each simulation set up (criterion ESS > 200; θ ∼ Γ(1, 200), , r ∼ Beta(1,1), for the priors, number of reticulations bounded by 2 when exploring the network space).

Fig 11.

Estimated inheritance probability and instantaneous rates for network A (see Fig 7), as a function of the number of sites. True values are given by the dashed horizontal lines.
Same framework as in Fig 10.

Fig 12.

Estimated node heights of network A (see Fig 7), as a function of the number of sites.
Heights are measured in units of expected number of mutations per site. True values are given by the dashed horizontal lines. Same framework as in Fig 10. The initials MRCA stand for “Most Recent Common Ancestor”.

Fig 13.

Estimated population sizes θ for each branch of network A (see Fig 7), as a function of the number of sites.
True values are given by the dashed horizontal lines. Same framework as in Fig 10. The initials MRCA stand for “Most Recent Common Ancestor”.

Table 2.

Average posterior probability (PP) of the topology of network C obtained by running SnappNet on data simulated from network C.
Results are given as a function of the number of sites and as a function of the number of lineages sampled in hybrid species B and C (either both 1 or both 4). Only one lineage was sampled in every other species. Constant sites are included in the analysis and the rates u and v are considered as known. Posterior probabilities are computed on the basis of replicates for which the criterion ESS > 100 is fulfilled. The sampler efficiency (SE) is also indicated (true hyperparameter values for the prior on θ, i.e. θ ∼ Γ(1, 200); as a network prior , r ∼ Beta(1, 1), ; number of reticulations bounded by 2 when exploring the network space).

Table 3.

Average posterior probability (PP) of the topology of network C obtained by running MCMC_BiMarkers on data simulated from network C.
Results are given as a function of the number of sites and as a function of the number of lineages sampled in hybrid species B and C (either both 1 or both 4). Only one lineage was sampled in every other species, constant sites are included in the analysis, and the rates u and v are considered as known. 1.5 × 10⁶ iterations are considered. is the average ESS over the different replicates, and SE stands for the sampler efficiency.

Fig 14.

Frequency of trees (black), 1-reticulation networks (dark grey), and 2-reticulation networks (light grey) sampled by SnappNet and MCMC_BiMarkers when data were simulated from network C (see Tables 2 and 3).
Recall that network C contains 2 reticulations.

Table 4.

Computational efficiency of calculating a single likelihood value in SnappNet and MCMC_BiMarkers for networks C, C(3) and C(4).
Ten lineages are sampled in species C and one lineage in each of the other species. The average and standard deviation are reported.

Fig 15.

The two networks obtained for data set 1 with only one variety per subpopulation.
Each network corresponds to the posterior mean of the distribution sampled by SnappNet. Inheritance probabilities are reported above reticulation edges and branch lengths are given in units of expected number of mutations per site (see the scale at the top left).

Fig 16.

The two networks obtained for data set 2 with two varieties per subpopulation.
Each network corresponds to the posterior mean of the distribution sampled by SnappNet. Inheritance probabilities are reported above reticulation edges and branch lengths are given in units of expected number of mutations per site (see the scale at the top left).

Fig 17.

The MAP phylogenetic network obtained for data set 3 with two varieties per subpopulation.
Inheritance probabilities are reported above reticulation edges and branch lengths are given in units of expected number of mutations per site (see the scale at the top left).

Fig 18.

The three topologies sampled by SnappNet when data set 3 was considered.
The inheritance probabilities reported for each topology are averages over the sampled observations.
