On the inference of complex phylogenetic networks by Markov Chain Monte-Carlo

For various species, high quality sequences and complete genomes are nowadays available for many individuals. This makes data analysis challenging, as methods need not only to be accurate, but also time efficient given the tremendous amount of data to process. In this article, we introduce an efficient method to infer the evolutionary history of individuals under the multispecies coalescent model in networks (MSNC). Phylogenetic networks are an extension of phylogenetic trees that can contain reticulate nodes, which allow to model complex biological events such as horizontal gene transfer, hybridization and introgression. We present a novel way to compute the likelihood of biallelic markers sampled along genomes whose evolution involved such events. This likelihood computation is at the heart of a Bayesian network inference method called SnappNet, as it extends the Snapp method inferring evolutionary trees under the multispecies coalescent model, to networks. SnappNet is available as a package of the well-known beast 2 software. Recently, the MCMC_BiMarkers method, implemented in PhyloNet, also extended Snapp to networks. Both methods take biallelic markers as input, rely on the same model of evolution and sample networks in a Bayesian framework, though using different methods for computing priors. However, SnappNet relies on algorithms that are exponentially more time-efficient on non-trivial networks. Using simulations, we compare performances of SnappNet and MCMC_BiMarkers. We show that both methods enjoy similar abilities to recover simple networks, but SnappNet is more accurate than MCMC_BiMarkers on more complex network scenarios. Also, on complex networks, SnappNet is found to be extremely faster than MCMC_BiMarkers in terms of time required for the likelihood computation. We finally illustrate SnappNet performances on a rice data set. SnappNet infers a scenario that is consistent with previous results and provides additional understanding of rice evolution.


A closer look at the rules
Here, we first provide proofs of correctness for the rules to compute the partial likelihoods introduced in the main text (Sec. 1.1). Then we explain the rationale behind the ranges used for the summation terms in Rules 2 and 4 (Sec. 1.2).

Correctness of the rules for partial likelihoods.
Recall the definition of the partial likelihoods, which will be used in each of the proofs below: where L(x) is a vector of population interfaces (VPI) containing exactly once each leaf that descends from any element of x.
Finally, we note that many statements of conditional independence that we require in our proofs depend on the fact that the involved VPIs are incomparable.
Rule 0. Let x be a branch incident to a leaf. Then, Recall that the number of lineages sampled from species x is known and equal to n x . Then, applying definition (1) above with x = (x), we have: F (x) ((n); (r)) = P R x = r x | N x = n, R x = r × P N x = n = 1{r x = r} × 1{n x = n}.
Rule 1. Let x, x be a vector of incomparable population interfaces. Then, F x,x (n x , n x ; r x , r x ) = mx n=nx n r=0 F x,x (n x , n; r x , r) exp(Q x t x ) (n,r);(nx,rx) Proof. First, note that, because R L(x,x) is independent of N x , R x , when given N x , R x , and because L(x, x) = L(x, x): Writing down the definition of F x,x , then summing over all possible values of N x and R x , and then using the identity above, we obtain: Moreover, where we have used that R x is independent of N x and R x , when given N x , N x , R x .
We then have: Using the fact that N x is independent of N x , when given N x , the last term in the product can be rewritten as follows: P(N x = n, N x = n x , N x = n x ) = P(N x = n x | N x = n) × P(N x = n, N x = n x ) Using Equation (2), we finally obtain: F x,x (n x , n x ; r x , r x ) = mx n=nx n r=0 P R L(x,x) = r L(x,x) | N x = n x , R x = r x , N x = n, R x = r × P(N x = n, N x = n x ) × exp(Q x t x ) (n,r);(nx,rx) = mx n=nx n r=0 F x,x (n x , n; r x , r) × exp(Q x t x ) (n,r); (nx,rx) In the following proofs, to make the mathematics more readable, we denote each event A = a inside a probability simply as a, whenever the left-hand side of A = a is unambiguously determined by the right-hand side. For example: We will still write the full version in those cases where the left-hand side cannot be inferred from the right-hand side.
Rule 2. Let x, x and y, y be two vectors of incomparable population interfaces, such that L(x, x) and L(y, y) have no leaf in common. Let x, y be the immediate descendants of branch z. Then, F x,y,z n x , n y , n z ; r x , r y , r z = nx rx F x,x (n x , n x ; r x , r x ) F y,y n y , n z − n x ; r y , r z − r x n x r x The ranges of n x and r x in the summation terms are defined by max(0, n z − m y ) ≤ n x ≤ min(m x , n z ) and max(0, n x + r z − n z ) ≤ r x ≤ min(n x , r z ).
Proof. By definition, F x,y,z n x , n y , n z ; r x , r y , r z = P r L(x,y,z) | n x , n y , n z , r x , r y , r z × P n x , n y , n z We then sum over all possible realizations of N x and R x , and obtain: F x,y,z n x , n y , n z ; r x , r y , r z = nx rx P r L(x,y,z) | n x , n y , n z , r x , r y , r z , n x , r x × P n x , r x | n x , n y , n z , r x , r y , r z × P n x , n y , n z , where the ranges in the summation terms are the same as those in the statement.
Now recall that L(x, x) and L(y, y) are disjoint vectors and note that their concatenation is equivalent to L(x, y, z). This means that r L(x,y,z) can also be written as r L(x,x) , r L(y,y) . Moreover, N z = n z and N x = n x implies N y = n z − n x , and similarly R z = r z and R x = r x implies R y = r z − r x . We can then write: P r L(x,y,z) | n x , n y , n z , r x , r y , r z , n x , r x = P r L(x,x) , r L(y,y) | n x , n y , n z , r x , r y , r z , n x , r x , N y = n z − n x , R y = r z − r x = P r L(x,x) | n x , r x , n x , r x × P r L(y,y) | n y , r y , N y = n z − n x , R y = r z − r x .
In the last equality above, we used the fact that R L(x,x) and R L(y,y) are independent random variables, given N x,x , R x,x and N y,y , R y,y , respectively.

Moreover,
P n x , r x | n x , n y , n z , r x , r y , r z = P r x | n x , n x , n y , n z , r x , r y , r z × P n x | n x , n y , n z , r x , r y , r z = P r x | n x , n z , r z × P n x | n x , n y , n z , where in the last equality we have used the fact that R x is independent of N x , N y , R x , R y , when given N x , N z , R z , and the fact that N x is independent of R x , R y , R z , when given N x , N y , N z .
Putting all this together, we get: F x,y,z n x , n y , n z ; r x , r y , r z = nx rx P r L(x,x) | n x , r x , n x , r x × P r L(y,y) | n y , r y , N y = n z − n x , R y = r z − r x × P r x | n x , n z , r z × P n x , n x , n y , n z .

Now note that
P n x , n x , n y , n z = P n x , n y , n x , N y = n z − n x = P (n x , n x ) × P n y , N y = n z − n x , where the last equality is due to the independence between the lineages from L(x, x) and those from L(y, y).
Finally, R x , given N x = n x , N z = n z , R z = r z follows a hypergeometric distribution: which allows us to conclude: x be a vector of incomparable population interfaces, such that branch x's top node is a reticulation node. Let y, z be the branches immediately ancestral to x. Then, F x,y,z n x , n y , n z ; r x , r y , r z = F x,x n x , n y + n z ; r x , r y + r z n y + n z n y γ ny y ·γ nz z Proof. First note that P r L(x,y,z) | n x , n y , n z , r x , r y , r z = P r L(x,x) | n x , N x = n y + n z , r x , R x = r y + r z .
Then, using the definitions of F x,y,z and F x,x : F x,y,z n x , n y , n z ; r x , r y , r z F x,x n x , n y + n z ; r x , r y + r z = P n x , n y , n z P n x , N x = n y + n z But P n x , n y , n z P n x , N x = n y + n z = P n y , n z | n x , N x = n y + n z = n y + n z n y γ where the first equality applies the definition of conditional probability, and the second equality uses the fact that N y and N z are binomially distributed, when given N x . The Rule trivially follows.
Rule 4. Let z, x, y be a vector of incomparable population interfaces, and let x, y be immediate descendants of branch z. Then, The ranges of n x and r x in the sums are the same as those in Rule 2.
Proof. Use the definition of F z,z and then sum over all possible realizations of N x and R x : Now note that L(z, z) = L(z, x, y), and that meaning that P r L(z,z) | n z , n z , r z , r z , n x , r x = P r L(z,x,y) | n z , r z , n x , r x , N y = n z − n x , R y = n z − n x .
Moreover, P(n x , r x | n z , n z , r z , r z ) = P(r x | n x , n z , n z , r z , r z ) × P(n x | n z , n z , r z , r z ) where in the last equality we have used that r x is independent of n z , r z , when given n x , n z , r z , and the fact that n x is independent of r z , r z , when given n z . (3) to express P(r x | n x , n z , r z ) and conclude:

About ranges
We start this section with a general discussion about the values that the random variables N x , N x , R x , R x can take for any population interface in the network. As usual, we will use lower-case letters for their realizations, i.e. n x , n x , r x , r x . Our remarks will allow us to derive the ranges used in our rules as simple consequences of a few equations.

Observable number of lineages across the network
The number of lineages n x , n x , r x , r x observed at any population interface in the network must satisfy a few simple and obvious constraints, which we list below: • For any branch x, the number of lineages at the top of the branch is at least 1, unless there were no lineages at the bottom of the branch, and at most equal to the number of lineages at the bottom. That is, • At any population interface, the number of red and green lineages cannot exceed the total number of lineages. That is, for any branch x: • For any internal node u, the numbers of red and green lineages entering u are the same as the numbers of red and green lineages exiting u. That is, if u is a tree node with ingoing branch z and outgoing branches x, y: (Note that these two equations also imply that the numbers of green lineages entering and exiting u are the same.) If u is a reticulation with ingoing branches x, y and outgoing branch z: • A simple consequence of Equations (4), (7) and (9) is that the number of lineages in any branch x cannot exceed the total number of lineages at the leaves that descend from x, that is: (This can easily be proven by induction on the height of x.) Constraints (4)-(10) above are not only necessary, but also sufficient to describe all possible values of n x , n x , r x , r x across the network. In theory they could be used to infer the precise ranges for these variables, starting from the leaves and moving up the network.
In practice, however, this is unnecessary. SnappNet only ensures that for any population interface x or x, the following two equations are satisfied: These equations also specify the ranges for which F x (n x ; r x ) is defined and stored in memory.
Note that equations (12) and (13) permit a few more values for the n arguments than are actually possible. For example n x is allowed to be 0, even when this is not possible (e.g. when x lies on all paths from a leaf with sampled individuals to the root). Whenever this occurs, the probability term within F x (n x ; r x ) equals 0. As a result, the partial likelihood itself is 0 and does not contribute to the calculation of any partial likelihood higher up in the network.

Ranges of the sums in Rules 2 and 4
It is now easy to justify the ranges in the sums in Rules 2 and 4. Recall that both these rules describe the behavior of the algorithm when traversing a tree node with ingoing branch z and outgoing branches x, y. Also recall that these rules sum over the possible values for n x and r x . Note that, because conservation constraints (7) and (8) must hold here, these values also determine the values of n y = n z − n x and r y = r z − r x .
Let's first consider the range for n x . By applying constraint (13) to n x and then n y , we must ensure: The second equation is equivalent to n z − m y ≤ n x ≤ n z and therefore we get: As for r x , by applying constraint (13) to r x and then r y , we must ensure: The second equation is equivalent to n x + r z − n z ≤ r x ≤ r z and therefore we get: max(0, n x + r z − n z ) ≤ r x ≤ min(n x , r z ).

Likelihood computation in detail
SnappNet uses Algorithm 1 to compute the full likelihood of a network Ψ with respect to D i , the data from marker i. The algorithm starts by initializing the data structures that will subsequently be used and then processes all nodes of the network Ψ using the rules presented in the main text. Rules 2, 3 and 4 are applied respectively in Algorithm 3, 4 and 5, together with suitable modifications of data structures.
The data structures are the following: ReadyNodesQ, a queue storing the nodes that are ready to be processed; Processed, which stores whether a node has already been processed or not; and CurrF, a dictionary that associates any branch x to the F x having x in x. In this pseudocode, F x represents a data structure holding all the relevant values of F x (n x , r x ), as well as the vector of population interfaces x. We also note that, to reduce memory usage, we only store the F x associated to branches that separate an unprocessed node to a processed node, as these are the only ones that will be used in future computations. Note that unlike in the main text, nodes are denoted u, u and u p in S1 Text.
if all children of u are leaves then Enqueue (ReadyNodesQ,u) end Algorithm 3: Apply Rule 2(u) // u is a tree node of Ψ Let x, y be u's outgoing branches and let z be u's incoming branch Apply Rule 2 to obtain F x,y,z from F x,x and F y,y if u is the root node of Ψ then return Apply Rule 1 to obtain F x,y,z from F x,y,z foreach branch w with an interface in x, y, z do

Other computational complexity results
In this section, we shall use the weak definition of connectivity in a directed graph: we say that two nodes in Ψ are connected is there is an undirected path between them in Ψ. The same holds for the notion of biconnected, see below.
3.1 Time complexity of the algorithm by Zhu et al. [1] Although the time complexity stated by Zhu and coauthors is O(sn 4r+4 ), where r is the number of reticulation nodes in the network, they also note that all labelled partial likelihoods (LPLs) at a lowest articulation node can be merged into a single LPL, thus avoiding carrying forth all that information [1]. This means that, as we stated in the main text, the time complexity to process a node with their algorithm is actually O(n 4ru+4 ), where r u is the number of reticulation nodes which descend from u, and for which there exists a directed path from u that does not pass via a lowest articulation node. Note that r u is potentially much smaller than r. We refer to the original paper by Zhu and coauthors for the definition of LPL and the full description of their algorithm [1].
Here we prove that, since the time complexity to process a node is O(n 4ru+4 ), then the whole algorithm runs in O(sn 4 +4 ) time, where is the level of the network [5,6].
Let us first recall some definitions from the theory of phylogenetic networks that are fundamental to analyse the complexity of the algorithm by Zhu et al. [1]. A subgraph G of Ψ is biconnected if the removal of any one node in G leaves the remainder of G connected. A biconnected component of Ψ is a maximal biconnected subgraph of Ψ. The nodes of Ψ that belong to two or more biconnected components are called articulation nodes. (Equivalently, articulation nodes are the nodes in Ψ whose removal cause the network to become disconnected.) An articulation node is said to be a lowest articulation node if all of its children are not articulation nodes. The level of a phylogenetic network is the maximum number of reticulation nodes in one of its biconnected components.
It is easy to see that a phylogenetic network has two kinds of biconnected components: those that only consist of two adjacent nodes -which we call trivial biconnected components -and more complex ones -which we call nontrivial biconnected components or blobs. Every articulation node of Ψ is found at the root of a biconnected component. The lowest articulation nodes of a network coincide with the roots of the network's blobs.
Recall that r u is defined as the number of reticulation nodes which descend from u, and for which there exists a directed path from u that does not pass via a lowest articulation node. Now note that every directed path that ends in a reticulation node v and does not pass via a lowest articulation node can only be from a node u in the same blob as v. Then, r u is at most equal to the number of reticulation nodes in the same biconnected component as u. In turn, the number of reticulation nodes in the same biconnected component as u is at most equal to , the level of Ψ. We can then conclude that r u ≤ and that each node is processed in at most O(n 4 +4 ) time, giving a total running time of O(sn 4 +4 ).

SnappNet's K and the level of the network
Here we prove that for any traversal of the network Ψ, we have K ≤ +1, where is the level of Ψ (Proposition 1 below).
We let B(x) denote the set of branches x for which there exists a population interface x or x in the VPI x. Moreover we let G Ψ x denote the subgraph of Ψ induced by all the descendant nodes of the branches in B(x).
The intuition behind the proof is that, for any VPI activated by the traversal algorithm, the branches in B(x) must all belong to the same biconnected component of Ψ. Moreover, |B(x)| cannot exceed 1 + the number of reticulations within that biconnected component, which implies K ≤ + 1.
Lemma 1. Let x be a VPI activated by any traversal algorithm using Rules 0-4. Then, G Ψ x is connected.
Proof. If x = (x) is activated by Rule 0, then G Ψ x consists of a single leaf and is trivially connected. Thus, we just need to prove that every subsequent application of Rules 1-4 can only activate a VPI x with connected G Ψ x , assuming that this property is satisfied by the VPI or VPIs that the rule uses as input.
For Rule 1, this is trivially true as G Ψ x,x = G Ψ x,x . For Rule 2, let's assume that G Ψ x,x is connected and that G Ψ y,y is connected. This implies that G Ψ x,y,z is connected, as x and y appear in G Ψ x,y,z and ensure that all nodes in G Ψ x,x are connected to all nodes in G Ψ y,y . For Rule 3 and 4, the thesis is again trivial, because G Ψ x for the newly active VPI only differs from the one for the input VPI by inclusion of a single new vertex, which is easily seen to be connected to the rest of G Ψ x .
Corollary 1. Let x be a VPI activated by any traversal algorithm using Rules 0-4. Then, all the branches in B(x) belong to the same biconnected component of Ψ.
Proof. If |B(x)| = 1, this is trivial. If B(x) contains at least two branches x and y, it is now easy to see that x and y belong to a cycle obtained by attaching the following two disjoint paths: (1) the path within G Ψ x from the bottom of x to the bottom of y -which exists because of Lemma 1 -and (2) the path from the bottom of x to the bottom of y, going via x and y and only using branches that are ancestral to x and y. The existence of this cycle implies the thesis. Lemma 2. Let x be a VPI activated by any traversal algorithm using Rules 0-4, and let R(x) be the set of reticulation nodes that descend from any branch in B(x) and belong to the same biconnected component as the one of B(x). Then, Proof. To make notation light, let b(x) = |B(x)| and r(x) = |R(x)|. As in the proof of Lemma 1, we start by noting that if x = (x) is activated by Rule 0, then the thesis trivially holds, as b((x)) = 1 and r((x)) = 0.
We then consider the other rules, and show that if the thesis holds for the VPIs that have already been activated, then it must hold for the newly activated thus proving the thesis for VPI x, y, z.
Finally, for Rule 4, we assume b(z, x, y) ≤ r(z, x, y)+1. Now distinguish between two cases. Either (i) z is nonempty, in which case B(z, x, y) and B(z, z) are in the same biconnected component and In this case we therefore have b(z, z) ≤ r(z, z), which implies the thesis.
We now have all we need to prove the main result of this section: Proposition 1. For any traversal algorithm using Rules 0-4 to process a network of level , K ≤ + 1.
Proof. Note that K = max{|B(x)| such that x is activated by the given traversal algorithm}.
Thus, using Lemma 2, and the definition of the level : K ≤ max{|R(x)| + 1 such that x is activated by the given traversal algorithm} ≤ + 1. Next, the following commands, were successively used to run MCMCBiMarkers. The first step consists in a pre-burnin phase relying on 3 chains of different temperatures. Note that the "-snet" option refers to the starting network obtained from the pre-burnin phase. Besides, the options "-mr" and "-pp" allow to specify the network prior: the maximum number of reticulations was set to 2, and the prior Poisson distribution on the number of reticulation nodes was centered on 2.

MCMC_BiMarkers
6 Supplementary results for the simulation study Table A. Table linked to Table 1 of the main manuscript. Trees inferred by SnappNet when m=1,000 sites were considered.  Table B. Average posterior probability (PP) of the topology of network C obtained by running MCMCBiMarkers on data simulated from network C. Same as Table 3 of the main manuscript except that 12 × 10 6 iterations are considered, and only one lineage is sampled in hybrid species B and C. ESS is the average ESS over the different replicates, and SE stands for the sampler efficiency.

Additional experiments on SnappNet's MCMC sampler
In the following, we describe a few experiments that were conducted to better understand the behavior of the MCMC sampler employed by SnappNet-in particular its efficiency at sampling from network space, and how this efficiency is affected by the priors on phylogenetic network and population sizes. The prior on phylogenetic networks is specified in terms of the birth-hybridization model by Zhang et al. [7].

Protocol
In the first experiment we assess whether the MCMC sampler employed by SnappNet can adequately sample from network space. We specify a posterior distribution over 5-taxon phylogenetic networks with high variance across multiple number of reticulations. We ran the MCMC sampler so that it sampled from a posterior distribution specified in terms of a birth-hybridization model prior, origin height prior and a null likelihood function (always returns zero regardless of the input data). We then compared the sampled networks with 5-taxon networks simulated directly from the birth-hybridization model. Theoretically we expect the distributions of sampled and simulated networks to match.
We studied three different cases of the birth-hybridization model prior, for each case we either specified a normal prior with mean 0.1 and standard deviation of 0.01 on the origin height or an exponential prior with mean 0.1 on the origin height (that is a total of six different scenarios): In the first case we used a birth-hybridization model with speciation rate 20 and hybridization rate 1 (mean number of reticulations close to zero). In the second case we used a birth-hybridization model with speciation rate 20 and hybridization rate 2 (mean number of reticulations close to one). In the third case we used a birthhybridization model with speciation rate 20 and hybridization rate 3 (mean number of reticulations close to two). We only kept simulated networks with 5 leaves.
In each case we simulated 1000 networks directly from the birth-hybridization model and sampled 2,000,000 networks using the SnappNet sampler (burning half the chain and logging every 1000th sample thereafter). Note that it is possible to fix the birth and hybridization rates in the prior used by SnappNet by fixing corresponding values for parameters d and r. We used Tracer to assess convergence of the MCMC chain by visually inspecting the trace and computing the ESS (effective sample size). Thereafter we compared the simulated networks with sampled networks in terms of the number of reticulations, time until first reticulation, network height and network length.

Results
In the first experiment the sampler converged to the specified prior in all three cases (for both origin height priors) based on the computed summary statistics (see        Table G. BH(birth rate, hybridisation rate) refers to the birth-hybridisation process of Zhang et al. with the specified birth and hybridisation rates. For data simulated with network A, only chains 1,2,3,4,9,10,11,12,17,18,19,20 were run. We indicate the mean number of reticulation for the Birth-Hybridization model given an exponential prior with mean 0.1 on network origin. Note that we only used the exponential prior in the experiment in Section 8.2.

Protocol
In the second experiment we assess how population size priors and network priors influence SnappNet's inferences, in particular the rate of convergence and sampling efficiency of the MCMC sampler. Recall that the network prior specifies a hybridization rate, whereas the prior on population sizes affects the probability of coalescence, and therefore that of ILS. Thus, these two priors have an important role in determining the relative probability of hybridization and ILS as causes of incongruent (non-tree-like) signals in the data.
We simulated 10,000 SNPs for network A and network B under the multispecies network coalescent using SimSnappNet. For each of these two simulated SNP datasets, we ran 12 (for network A) or 24 (for network B) MCMC chains, for 500,000 iterations each. See Table G for details on the priors specified for each chain. In this experiment we only use the exponential prior with mean 0.1 on the network origin.
Briefly, as in the experiment of Sec. 8.1, we specified a network prior using the birth-hybridisation model of Zhang et al. [7]. Again, we fixed the birth rate to 20 for all MCMC chains and chose a hybridisation rate so that the mean number of reticulations is close to zero, one or two. Furthermore we specified either a 'correct' or 'incorrect' prior on population size ('correct' implies the mean of the prior distribution corresponds to the population size parameter used to simulate the SNP dataset). The 'correct' population size prior on each branch was specified as Γ (1,200). The 'incorrect' population size prior on each branch was specified as Γ (1,20). For network B we considered two additional incorrect population size priors, namely Γ(1, 1000) and Γ (1,2000). Note that the rest of the priors of the model used the default SnappNet settings. In order to assess convergence we ran two MCMC chains for each prior setting (as specified in Table G). We randomly drew initial networks and population sizes for each MCMC chain from the prior distribution. Also note that, here we do not impose any upper bound on the number of reticulations in the sampled networks.  Table H (MCMC chains with correct population size priors) and Table I (MCMC chains with incorrect population size priors). We note that all chains with correct population size priors converged to the correct topology, network height and network length (see Figs U(c) and U(d)). We assume convergence for network topology since there was only one unique topology for each posterior distribution of the chains with correct population size priors. In each case the unique topology matched up with the topology of network A. Furthermore in Fig U(b) all chains have similar prior distributions. This could be due to the topology of network A that is very unlikely under all the specified birth-hybridization model priors (similar to sampling from a flat prior). We also note a much lower ESS under the model prior with reticulation mean close to zero (see ESS in Table H).

Results for network A
Chains with incorrect population size priors also converged to the correct topology. Similar to correct priors, there was only one unique topology for all chains. However the chains did not converge to the correct network height or network length. This is not unexpected since the length of a branch and its associated population size are correlated (see Bryant et al. [2] for more detail). Furthermore the ESS for chains with incorrect population size priors is significantly lower than chains with correct population size priors (see ESS in Table H and  Table I). There is also a difference in ESS between chains with different topology priors. In this case ESS is highest when the mean number of reticulations on the network topology prior is close to one and lowest when mean number of reticulations for the network topology prior is close to two. This seems to suggest that specifying a prior with correct mean number of reticulations can improve sampling efficiency.   1, 2000)). We also give detailed summary statistics in  Table J, Table K, Table  L and Table M).

Operator acceptance rates
To better understand the behavior of the MCMC sampler, we inspect the acceptance rates for the 5 operators acting on the network topology (AddReticulation, DeleteReticulation, FlipReticulation RelocateBranch, RelocateBranch-Narrow ), the 4 operators updating branch lengths (NodeSlider, NodeUniform, NetworkMultiplier, OriginMultiplier ) and the 2 operators updating population sizes (ChangeGamma, ChangeAllGamma).
We summarize the acceptance rates for network B in Table N, Table O and  Table P. Each table focuses on a different population size prior, while averaging across the topology priors.
We observe that MCMC moves that update topology have a much lower acceptance rate than MCMC moves that update branch lengths and population sizes. FlipReticulation moves, which flip the direction of a reticulation branch, are the least likely to be accepted. There is no clear difference in the acceptance rates between different population size priors. More work is needed to determine what the proposal weights should be in order to optimally sample from the posterior distribution.