Fig 1.

Example of a phylogenetic network.

The top node represents the origin, and its child node is called the root of the network. Time flows from the origin node to the leaves (here A, B, C, D), so branches are directed from the top to the leaves. Each branch x is associated with a length t_x and a population size θ_x. Additionally, each branch x directly above a reticulation node has an inheritance probability γ_x, which is the probability that it contributed to any individual at the top of the branch just below.
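
As a minimal sketch of the quantities named in this caption, the following Python snippet stores one record per branch (length, population size, optional inheritance probability) and checks that the inheritance probabilities entering each reticulation node sum to 1. The `Branch` class, the helper names and the toy network are hypothetical illustrations; the toy topology and its values are not the network of Fig 1.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Branch:
    parent: str
    child: str
    length: float                  # t_x
    theta: float                   # population size θ_x
    gamma: Optional[float] = None  # γ_x, set only on branches entering a reticulation node


def reticulation_nodes(branches):
    """Return the nodes that have more than one incoming branch."""
    indegree = defaultdict(int)
    for b in branches:
        indegree[b.child] += 1
    return {node for node, d in indegree.items() if d > 1}


def inheritance_is_consistent(branches, tol=1e-9):
    """Check that the γ values entering every reticulation node sum to 1."""
    incoming = defaultdict(list)
    for b in branches:
        incoming[b.child].append(b)
    for node in reticulation_nodes(branches):
        gammas = [b.gamma for b in incoming[node]]
        if any(g is None for g in gammas):
            return False
        if not math.isclose(sum(gammas), 1.0, abs_tol=tol):
            return False
    return True


# Hypothetical toy network with a single reticulation node "h" (illustrative values).
toy = [
    Branch("origin", "root", 0.01, 0.005),
    Branch("root", "u", 0.02, 0.005),
    Branch("root", "v", 0.02, 0.005),
    Branch("u", "A", 0.03, 0.005),
    Branch("u", "h", 0.01, 0.005, gamma=0.3),
    Branch("v", "h", 0.01, 0.005, gamma=0.7),
    Branch("v", "C", 0.03, 0.005),
    Branch("h", "B", 0.02, 0.005),
]
```

Calling `reticulation_nodes(toy)` identifies `h` as the only node with two incoming branches, and `inheritance_is_consistent(toy)` confirms that its two inheritance probabilities are complementary.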

Fig 2.

Illustration of the concepts and notation employed to describe likelihood computations.

The species network topology is the same as that in Fig 1, but branches (populations) are now represented as grey parallelograms. A gene tree is drawn inside the species network (green and red lines). One mutation occurs in the branch above D. We focus on three branches: x, y and z. Colored horizontal bars represent the population interfaces , , and . Note that (blue) is a vector of incomparable population interfaces, while (orange) is not, as is a descendant of . Here, n_A = n_B = n_C = n_D = 2, r_A = 2, r_B = 1, r_C = 0, r_D = 2 are known, whereas the values of and are not observed, and depend on the gene tree generated by the MSNC process. For the gene tree shown, and . Since z is incident to leaf B, we have and . Now note . Then, .

Fig 3.

Illustration of Rule 2.

Given (a) the partial likelihoods for the (red) vector of population interfaces and the partial likelihoods for the (blue) vector of population interfaces, Rule 2 allows us to compute the partial likelihoods for the (green) vector (b).

Fig 4.

Illustration of Rule 3.

Given (a) the partial likelihoods for the (red) vector of population interfaces, Rule 3 allows us to compute the partial likelihoods for the (green) vector (b).

Fig 5.

Illustration of Rule 4.

Given (a) the partial likelihoods for the (red) vector of population interfaces, Rule 4 allows us to compute the partial likelihoods for the (green) vector (b).

Fig 6.

Example of a phylogenetic network where the level is equal to 6 (the reticulation nodes are depicted in grey), while , depending on the traversal algorithm (not shown).

A traversal ensuring that remains close to the lower end of this interval (the scanwidth of the network [66]) will be several orders of magnitude faster than algorithms whose complexity depends exponentially on . Increasing the number of reticulation nodes while keeping a “ladder” topology as above can make arbitrarily large, while the scanwidth remains constant. This topology may seem odd but it is intended as the backbone of a more complex and realistic network with subtrees hanging from the different internal branches of the ladder, in which case the complexity issue remains.

Fig 7.

The three phylogenetic networks used for simulating data.

Networks A and B are taken from [54]. Branch lengths are measured in units of expected number of mutations per site (i.e. substitutions per site). Displayed values represent inheritance probabilities.

Fig 8.

The networks from the C family, with either 3 or 4 reticulation nodes, and with or without outgroup O.

Table 1.

Average posterior probability of the correct topology (for networks A and B, see Fig 7) obtained by running SnappNet on simulated data.

Results are given as a function of the number of sites and as a function of the hyperparameter values α and β for the prior on θ (θ ∼ Γ(α, β) and ). Here, one lineage was simulated per species. Constant sites are included in the analysis, the rates u and v are considered as known, and 20 replicates are considered for each simulation set up (criterion ESS > 200; , r ∼ Beta(1, 1), for the network prior).
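
To make the role of the hyperparameters concrete, the sketch below draws θ values from a Gamma prior and compares the empirical mean with the analytical one. It assumes that Γ(α, β) is parameterized by shape α and rate β, so that the prior mean is α/β; this parameterization is an assumption and is not stated in the caption. The function names are hypothetical, and the values α = 1, β = 200 are the ones quoted in later captions.

```python
import random


def gamma_prior_mean(alpha, beta):
    """Mean of Gamma(alpha, beta) under the assumed shape/rate parameterization."""
    return alpha / beta


def sample_theta(alpha, beta, rng):
    """Draw one θ; random.gammavariate expects a scale, so the rate is inverted."""
    return rng.gammavariate(alpha, 1.0 / beta)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
draws = [sample_theta(1.0, 200.0, rng) for _ in range(20000)]
empirical_mean = sum(draws) / len(draws)

print(f"analytical mean: {gamma_prior_mean(1.0, 200.0)}")
print(f"empirical mean:  {empirical_mean:.5f}")
```

Under this assumption, Γ(1, 200) places the prior mean of θ at 0.005, and changing α or β shifts the prior away from that value.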

Fig 9.

The proportions of trees (black), 1-reticulation networks (dark grey) and 2-reticulation networks (light grey) sampled by SnappNet under the different simulation settings studied in Table 1.

Recall that networks A and B contain 1 and 2 reticulations, respectively.

Fig 10.

Estimated height and length for network A (see Fig 7), as a function of the number of sites.

Heights and lengths are measured in units of expected number of mutations per site. True values are given by the dashed horizontal lines. Two lineages per species were simulated. Constant sites are included in the analysis, and 20 replicates are considered for each simulation set up (criterion ESS > 200; θ ∼ Γ(1, 200), , r ∼ Beta(1,1), for the priors, number of reticulations bounded by 2 when exploring the network space).

Fig 11.

Estimated inheritance probability and instantaneous rates for network A (see Fig 7), as a function of the number of sites. True values are given by the dashed horizontal lines.

Same framework as in Fig 10.

Fig 12.

Estimated node heights of network A (see Fig 7), as a function of the number of sites.

Heights are measured in units of expected number of mutations per site. True values are given by the dashed horizontal lines. Same framework as in Fig 10. The abbreviation MRCA stands for “Most Recent Common Ancestor”.

Fig 13.

Estimated population sizes θ for each branch of network A (see Fig 7), as a function of the number of sites.

True values are given by the dashed horizontal lines. Same framework as in Fig 10. The abbreviation MRCA stands for “Most Recent Common Ancestor”.

Table 2.

Average posterior probability (PP) of the topology of network C obtained by running SnappNet on data simulated from network C.

Results are given as a function of the number of sites and as a function of the number of lineages sampled in hybrid species B and C (either both 1 or both 4). Only one lineage was sampled in every other species. Constant sites are included in the analysis and the rates u and v are considered as known. Posterior probabilities are computed on the basis of replicates for which the criterion ESS > 100 is fulfilled. The sampler efficiency (SE) is also indicated (true hyperparameter values for the prior on θ, i.e. θ ∼ Γ(1, 200); as a network prior , r ∼ Beta(1, 1), ; number of reticulations bounded by 2 when exploring the network space).
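
The ESS criterion and the sampler efficiency mentioned here can be illustrated with a short sketch. It uses a simple autocorrelation-based ESS estimator (summing autocorrelations up to the first non-positive lag), which is a generic textbook estimator and not necessarily the one used by the software. It also assumes that SE is the ESS divided by the number of MCMC iterations, a common convention that this caption does not spell out. All function names are hypothetical.

```python
def autocorrelation(chain, lag):
    """Sample autocorrelation of a chain at the given lag."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((x - mean) ** 2 for x in chain) / n
    if var == 0.0:
        return 0.0
    cov = sum((chain[i] - mean) * (chain[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var


def effective_sample_size(chain):
    """ESS = n / (1 + 2 * sum of autocorrelations up to the first non-positive lag)."""
    n = len(chain)
    rho_sum = 0.0
    for lag in range(1, n):
        rho = autocorrelation(chain, lag)
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


def sampler_efficiency(ess, n_iterations):
    """Assumed definition: effective samples obtained per MCMC iteration."""
    return ess / n_iterations


# Two toy chains of equal length: one that mixes well and one that is strongly autocorrelated.
well_mixed = [0.0, 1.0] * 50
sticky = [0.0] * 50 + [1.0] * 50
```

Comparing `effective_sample_size(well_mixed)` with `effective_sample_size(sticky)` shows how strong autocorrelation reduces the number of effectively independent samples, which is why replicates are filtered on an ESS threshold before posterior probabilities are averaged.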

Table 3.

Average posterior probability (PP) of the topology of network C obtained by running MCMC_BiMarkers on data simulated from network C.

Results are given as a function of the number of sites and as a function of the number of lineages sampled in hybrid species B and C (either both 1 or both 4). Only one lineage was sampled in every other species, constant sites are included in the analysis, and the rates u and v are considered as known. 1.5 × 10^6 iterations are considered. is the average ESS over the different replicates, and SE stands for the sampler efficiency.

Fig 14.

Frequency of trees (black), 1-reticulation networks (dark grey) and 2-reticulation networks (light grey) sampled by SnappNet and MCMC_BiMarkers when data were simulated from network C (see Tables 2 and 3).

Recall that network C contains 2 reticulations.

Table 4.

Computational efficiency of calculating a single likelihood value in SnappNet and MCMC_BiMarkers for networks C, C(3) and C(4).

Ten lineages are sampled in species C and one lineage in each of the other species. Averages and standard deviations are reported.

Fig 15.

The two networks obtained for data set 1 with only one variety per subpopulation.

Each network corresponds to the posterior mean of the distribution sampled by SnappNet. Inheritance probabilities are reported above reticulation edges and branch lengths are given in units of expected number of mutations per site (see the scale at the top left).

Fig 16.

The two networks obtained for data set 2 with two varieties per subpopulation.

Each network corresponds to the posterior mean of the distribution sampled by SnappNet. Inheritance probabilities are reported above reticulation edges and branch lengths are given in units of expected number of mutations per site (see the scale at the top left).

Fig 17.

The MAP phylogenetic network obtained for data set 3 with two varieties per subpopulation.

Inheritance probabilities are reported above reticulation edges and branch lengths are given in units of expected number of mutations per site (see the scale at the top left).

Fig 18.

The three topologies sampled by SnappNet when data set 3 was considered.

The inheritance probabilities reported for each topology are averages over the sampled observations.
