Phylodynamics on local sexual contact networks

Phylodynamic models are widely used in infectious disease epidemiology to infer the dynamics and structure of pathogen populations. However, these models generally assume that individual hosts contact one another at random, ignoring the fact that many pathogens spread through highly structured contact networks. We present a new framework for phylodynamics on local contact networks based on pairwise epidemiological models that track the status of pairs of nodes in the network rather than just individuals. Shifting our focus from individuals to pairs leads naturally to coalescent models that describe how lineages move through networks and the rate at which lineages coalesce. These pairwise coalescent models not only consider how network structure directly shapes pathogen phylogenies, but also how the relationship between phylogenies and contact networks changes depending on epidemic dynamics and the fraction of infected hosts sampled. By considering pathogen phylogenies in a probabilistic framework, these coalescent models can also be used to estimate the statistical properties of contact networks directly from phylogenies using likelihood-based inference. We use this framework to explore how much information phylogenies retain about the underlying structure of contact networks and to infer the structure of a sexual contact network underlying a large HIV-1 sub-epidemic in Switzerland.


Author summary
Phylodynamic models relate the branching pattern of a pathogen's phylogenetic tree to the tree-like growth of an epidemic as it spreads through a host population. Such models are increasingly used to learn about the epidemiology of different pathogens. We extend current models to consider the structure of host contact networks-the web of physical interactions through which pathogens spread. By considering how local interactions among hosts shape the phylogeny of a pathogen, our models offer a "pathogen's eye view" of these networks. Our models also provide a statistical framework that can be used to infer network structure directly from phylogenies, which we use to estimate the properties of a sexual contact network in Switzerland from a HIV phylogeny. a1111111111 a1111111111 a1111111111 a1111111111 a1111111111

Introduction
From the viewpoint of an infectious pathogen, host populations are highly structured by the physical contacts necessary for disease transmission to occur. For pathogens whose transmission does not require intimate or sustained physical contact, random mixing models assuming contacts form instantaneously between individuals may offer a reasonable approximation to the true dynamics of person-to-person contact [1][2][3]. But for pathogens like sexually-transmitted infections (STIs), transmission requires contacts that are generally more limited in number, less transient in nature, and form non-randomly based on individual behavior-resulting in host populations that are highly structured locally at the level of individuals [4][5][6][7]. Even for non-STIs, a limited number or clustering among contacts may constrain transmission; potentially explaining why newly emerging infections like SARS and Ebola give rise to large outbreaks in some social settings but not others [8][9][10]. It is therefore often more reasonable to view communities as networks of individuals connected by edges that represent the physical contacts through which transmission can occur. Through the study of theoretical network models, epidemiologists now understand that contact network structure has a profound influence on epidemic dynamics and whether or not control strategies will be effective [8,[11][12][13][14]. Yet studying the structure of contact networks empirically through methods such as contact tracing is difficult and costly, meaning we often know little about the structure of contact networks underlying real-world epidemics [15,16].
New hope for the empirical study of contact networks has emerged in recent years from the widespread availability of pathogen molecular sequence data. In molecular epidemiology, sequence data is already commonly used to link individuals into probable transmission pairs or clusters based on the phylogenetic distances between their pathogens. While such approaches do not directly reveal the structure of contact networks, they can reveal paths in the contact network through which the pathogen spread and provide a useful heuristic for assessing how well connected networks are within and between different subpopulations or risk-groups [17][18][19]. Other methods in molecular epidemiology attempt to reconstruct the full details of the underlying transmission tree, the directed graph showing exactly who infected whom in an outbreak [20][21][22][23][24][25]. Essentially though, all current methods for inferring linkage and transmission trees take a bottom-up approach-they attempt to reconstruct routes of transmission by linking sampled individuals based on their phylogenetic distance. While this can be a powerful approach for studying densely sampled outbreaks where most infected individuals are sampled, bottom-up approaches may provide misleading results when applied to sparsely sampled epidemics. In this case, two infected individuals may have pathogens that are most closely related to one another phylogenetically but an unknown number of intervening infections might separate them in the true transmission tree. Thus the phylogenetic proximity of individuals may only weakly correlate with their proximity in the transmission tree, making it very difficult to reconstruct the detailed transmission history of who infected whom.
While it may not be possible to reconstruct the detailed structure of transmission networks from sparsely sampled data, it may still be possible to infer large-scale properties of contact networks. By simulating the phylogenetic history of pathogens spreading through networks, recent studies have shown that network properties can exert a strong influence on the structure of phylogenetic trees [26][27][28]. For example, increasing levels of contact heterogeneity-variation in the number of contacts individuals form-can result in increasingly asymmetric or imbalanced trees and shift the distribution of coalescent (i.e. branching) events earlier towards the beginning of an epidemic [26,27]. However, statistical measures of tree topology like imbalance may only weakly correlate with network statistics like contact heterogeneity, and may also be highly dependent on how samples are collected [28]. Moreover, in addition to network structure, population dynamics also strongly shape phylogenies and therefore potentially confound inferences of network structure drawn from phylogenies [28]. For example, clustering of samples together in phylogenetic trees has previously been assumed to indicate clustering of individuals in the underlying contact network, but phylogenetic clustering can arise naturally in epidemics even when no measurable degree of clustering exists in a population [29]. Taken together then, previous work suggests that contact network structure can shape pathogen phylogenies, but we do not yet know how to properly extract this information from trees.
In this paper, we present a new theoretical framework for relating pathogen phylogenies to contact networks using phylodynamic modeling. Our approach is quite different from bottom-up approaches in that it does not attempt to reconstruct the details of person-to-person transmission. Rather, we start with a random graph model [30] that captures the important statistical properties of real-world networks. We then use pairwise epidemic models [31][32][33] to capture the population dynamics of an epidemic on a network with the statistical properties specified by the random graph model. In addition to tracking the infection status of individuals, these pairwise models track the status of pairs of individuals and thereby correlations in the infection status of neighboring individuals, such as the depletion of susceptible hosts around infected individuals. Analogously, by shifting our focus from the level of individuals to the level of pairs, we derive a relatively simple coalescent model that captures a pathogen's phylogenetic history as a backwards-time dynamical process on a network. The pairwise coalescent model naturally takes into account incomplete sampling and how network structure and epidemic dynamics interact to shape pathogen phylogenies. By considering phylogenies in a probabilistic framework, the pairwise coalescent model also allows us to compute the likelihood of a given phylogeny evolving on a network with defined statistical properties, and therefore to estimate the structure of networks from phylogenies using likelihood-based inference.
How local contact network structure shapes pathogen phylogenies has received some attention in recent years [26][27][28], but has not been comprehensively studied. After deriving the pairwise coalescent model, we therefore begin by using simulations to explore how network properties such as overall connectivity, clustering, contact heterogeneity and assortativity shape phylogenies. Using these simulations, we demonstrate that the pairwise coalescent model captures how these network properties shape phylogenies in terms of coalescent times, how lineages move through a network, and overall tree topology. We then go on to show that the model can be used to accurately estimate network properties from phylogenies, although how precisely depends strongly on sampling effort. Finally, we have implemented the model in BEAST 2 [34] as a package called PairTree, which we use to estimate the structure of a contact network underlying a large HIV sub-epidemic among men-who-have-sex-with-men (MSM) in Switzerland.

Models and methods
Our phylodynamic modeling framework is composed of three interacting components. The first two components, random graph and pairwise epidemic models, are well described in the literature and we only briefly review the necessary concepts and notation here. Instead, we focus on the third and novel component of our framework, the pairwise coalescent model, which we derive from the pairwise epidemic model.

Random graph models
In network epidemiology, random graph models are often used to model the large-scale statistical properties of networks while treating the fine-scale details of who is connected to whom as random. Random graph models can therefore be thought of as a probability distribution on graphs constrained to take on certain statistical properties. Here, we use the configuration model [35] and extensions thereof to model network structure and generate random graphs parameterized to vary in overall connectivity, clustering, contact heterogeneity and assortative mixing.
A. Connectivity. Connectivity quantifies how well-connected individual nodes are in a network in terms of their degree, or number of contacts. In the simplest case of the configuration model, all N nodes are assigned a fixed degreek and then randomly connected to other nodes through edges, resulting in a homogenous or k-regular random graph. The parameterk therefore quantifies the overall connectivity of the network.
B. Clustering. Clustering is defined as the probability that two nodes connected to a common neighbor are also connected to one another, and therefore quantifies how locally interconnected networks are [31]. Clustering can be quantified in terms of a clustering coefficient ϕ: where a triangle refers to a closed loop of three connected nodes and a triple to three linearly connected nodes [30].
To introduce clustering into random networks, we use the triangular configuration model [36,37]. Under this model, rather than defining the degree distribution d k , we define a joint degree distribution d st on the probability that a node is connected to s neighbors not forming triangles and 2t other neighbors through triangles, and thus has total degree k = s + 2t. As shown by [37], given d st and the overall degree distribution d k , the expected clustering coefficient is C. Contact heterogeneity. Contact heterogeneity refers to variation in the number of contacts individuals form in a network and can be quantified by the variance s 2 k in the degree distribution d k . Networks with any arbitrary degree distribution can be generated under the standard configuration model by assigning each node n = 1, 2, . . ., N a degree according to a random degree sequence k 1 , k 2 , . . ., k N drawn from d k . Each individual node n is then randomly joined to k n other nodes to form the edges of the network.
D. Assortative mixing. Assortative mixing is the tendency for individuals to form connections with individuals similar to themselves, leading to correlations between the properties of adjacent nodes in a network [30]. Here, we introduce correlations in the degree of connected nodes by specifying the edge degree distribution e kl , which gives the probability of a randomly chosen edge connecting a degree k to a degree l node. The strength of assortative mixing can be quantified in terms of the assortativity coefficient r given e kl and d k : Thus, r can also be interpreted as the Pearson correlation coefficient in the degree of nodes connected by edges in the network.
To get a one-parameter random graph model that allows for the strength of assortative mixing to vary based on r, we follow [38] and constrain each entry in e kl to follow the form where x k is a normalized distribution chosen such that e kl is never negative. To our knowledge, there is no algorithm that allows for direct simulation of networks from the distribution over graphs defined by e kl . We therefore sample networks using the Metropolis-Hastings sampler also proposed by [38] that iteratively rewires networks until convergence on a target distribution defined by e kl is reached.

Pairwise epidemic model
The second component of our modeling framework consists of epidemiological models that describe the dynamics of a pathogen spreading through a network with statistical properties specified by a random graph model. As in standard SIR-type epidemiological models, we track the infection status of each host node as susceptible or infected, along with an optional recovered class. We use the notation [S k ] and [I k ] to denote the number of degree k susceptible and infected individuals. [S k I l ] denotes the the number of pairs in the network connecting susceptible individuals with degree k to infected individuals with degree l. At the level of individuals, the epidemic dynamics are described by the following differential equations: Here, τ is the per-contact rate at which infected individuals transmit to their neighbors and ν is the removal or recovery rate. The dummy variable δ 0,1 is set to either 1 or 0 depending on if recovered individuals become susceptible again (the SIS model) or are immunized (the SIR model).
As seen from Eq (5), the transmission dynamics depend on the [S k I l ] terms and thus how individuals are connected into pairs or partnerships. We therefore need to track the dynamics at the level of pairs, which in turn depends on the number of triples such as [S k S l I m ] where m is the degree of the third node: These are the pairwise network equations introduced by [31] and extended to heterogenous contact networks by [33]. By tracking the status of pairs rather than just individuals, the pairwise equations take into account local correlations that build up over time between the infection status of neighboring nodes; hence their other common name, correlation equations [32]. These local correlations arise because a node's infection status depends strongly on the status of its neighbors. For example, early on in an epidemic positive correlations develop between infected individuals, reflecting the fact that infected individuals are likely to be surrounded by other infected individuals who either infected them or became infected by them. Because these correlations can have a strong impact on epidemic dynamics, such as through the local depletion of susceptible nodes surrounding infected nodes, tracking these correlations allows pairwise models to more accurately describe epidemic dynamics on networks. The initial conditions for the pairwise epidemic model depend on the degree distribution d k and edge degree distribution e kl of the network and are described in S1 Text.
While the dynamics at the level of pairs depends on the number of triples, which in turn depends on even higher-order configurations, previous work has shown that moment closure methods can be used to approximate the number of triples based on the number of pairs without much loss of accuracy [33,39]. We thus "close" the system at the level of pairs by approximating each triple of arbitrary type [ABC] as: where l is the degree of the central node in state B. By taking into account the clustering coefficient ϕ, this moment closure takes into account additional state correlations that can arise between three nodes when there is appreciable clustering in the network [31,32]. The basic reproductive number R 0 , as usually defined, is the average number of secondary cases resulting from a single infectious individual in an otherwise susceptible population. As shown in [33], for the pairwise model on a heterogenous network, where λ max is the dominant eigenvalue of the next generation matrix M, which has elements Here, N k and N l represent the total number of individuals in the network with degree k and l.

Pairwise coalescent model
The third and novel component of our modeling framework are coalescent models that allow us to probabilistically relate the phylogenetic history of a pathogen back to the dynamics of an epidemic on a network. In essence, these coalescent models provide a probability distribution over trees, and therefore allow us to compute the likelihood of a given phylogeny having evolved on a random graph with known statistical properties. While coalescent theory has previously been extended to accommodate the nonlinear transmission dynamics of infectious pathogens [40][41][42][43], these coalescent models assume random mixing, at least within discrete subpopulations, and therefore neglect local contact network structure. Below, we extend the structured coalescent framework of [43] to include local contact network structure by shifting our focus from the level of individuals to pairs of hosts in the network.
The likelihood of a time-scaled phylogeny T under a structured coalescent model with parameters θ has the general form: For a tree containing P samples, the total likelihood is the product of the likelihood of each of the P − 1 coalescent events and the waiting times between events. The likelihood of each coalescent event depends on the rate λ ij (t p ) at which lineages i and j coalesce at time t p . These rates may depend on the location of lineages i and j, and thus this formulation of the coalescent likelihood accommodates population structure. The likelihood of the waiting time between coalescent events is given by the exponential term in Eq (10). Here, the sums are over all lineages AðsÞ present in the phylogeny at time s, which is allowed to change within coalescent intervals due to sampling.
The pairwise coalescent rates λ ij are centrally important to our model as they are required to compute the likelihood in Eq (10) and provide the main link between the epidemic dynamics and the coalescent process. To derive these rates, we begin by making the simplifying assumption common in phylodynamics that only a single pathogen lineage resides in each infected host. While this assumption ignores within-host pathogen diversity, it dramatically simplifies the relationship between transmission events and coalescent events in the pathogen phylogeny: each coalescent event in the phylogeny will represent a transmission event on the network. Below, we use this relationship to derive the pairwise coalescent rates λ ij for pairs of lineages.
Pairwise coalescent rates. To derive the pairwise coalescent rate λ ij , we first consider the probability that lineages i and j coalesce conditional on a transmission event occurring somewhere in the network. In order for two lineages to coalesce at a transmission event from an individual with l contacts to an individual with k contacts, we can reason that at the time of the event three conditions must hold: 1. The two lineages must be in two infected individuals, one with k and the other with l contacts.
2. The two lineages must reside in two nodes connected in a I k I l pair.
3. The two lineages must be in the specific I k I l pair involved in the transmission event.
We note that each of these conditions must be met in turn for the remaining conditions to be met. We will therefore consider the probability that each of these conditions is true in turn conditional on the preceding conditions having been met. In this case, Pr(2) ≔ Pr(2|1) and Pr(3) ≔ Pr(3|1, 2), such that the joint probability of all three being true Pr(1, 2, 3) = Pr(1)Pr(2|1)Pr(3|1, 2).
First, consider the probability that lineages i and j are in two infected individuals; one in a I k node and the other in a I l node. In general, we will not know the degree of the infected node in which the lineage resides (for shorthand, we will refer to this as the lineage's state). We must therefore treat the state of lineages probabilistically and will use the notation p ik to represent the probability that lineage i resides in a degree k infected node. The probability p ik will evolve over time, and we show below how we track p ik backwards through time. The probability P kl that lineages i and j reside in nodes with degrees k and l is then equal to the probability that lineage i is in state k and lineage j is in state l or vice versa, such that Second, consider the probability that lineages i and j reside in two nodes connected in a I k I l pair. The total number of possible pairs between I k and I l nodes is [ Because pairs are assumed to form randomly under our random graph models, the probability χ kl that a randomly chosen I k node is connected to a random I l node in a I k I l pair is: Given that our two lineages reside in I k and I l nodes, then the probability that our lineages reside in a I k I l pair is χ kl .
Third, given that lineages i and j are in two nodes that form a I k I l pair, the probability that it is this pair out of all I k I l pairs in the network that was involved in a given transmission event is Note that all three of these probabilities evolve over time to reflect the changing distribution of infectious contacts in the network through their dependence on the epidemic dynamics described by the pairwise epidemic model, which in turn reflects the degree distribution d k and the edge degree distribution e kl in the underlying random graph model. For example, in a model with assortative mixing, correlations in the degree of connected individuals as described by e kl will be reflected in [I k I l ], and therefore influence χ kl as [I k I l ] evolves over the course of an epidemic.
These three probabilities collectively give the probability that lineages i and j coalesce at a particular transmission event. Multiplying these probabilities by the total rate at which degree l nodes transmit to degree k nodes, the rate at which lineages i and j coalesce through l ! k transmission events is: Summing over all possible transmission events with respect to the degree of the nodes involved, we arrive at the total pairwise coalescent rate: The likelihood given in Eq (10) can then be computed by plugging in λ ij (t) for each lineage pair after numerically integrating the ODEs for [S k I l ] and [I k I l ] given in Eq (6) up to time t.
For a homogenous network where all nodes have the same degreek, Eq (14) simplifies to In a fully connected network where each node has degreek ¼ N À 1, every node is connected to every other node. In this limiting case, we expect the dynamics of an epidemic on a network to be the same as under a random mixing model with a transmission rate β scaled so that infectious contacts occur at the same rate under both models. In this case, the probability that two random infected nodes are connected in a pair χ ) 1, [SI] ) SI, and ½II ) I 2 À Á . Making these substitutions in Eq (15), we see that This is the same pairwise coalescent rate derived by [40] for a random mixing SIR-type model. We therefore see that the pairwise coalescent model and earlier coalescent models assuming random mixing converge in the limit of a fully connected network.
Tracking lineage movement. We now consider how individual pathogen lineages move through a network. Because we need to know the probabilities p ik of a lineage residing in a degree k host in order to compute the pairwise coalescent rates given in Eq (14), we probabilistically track the movement of lineages by tracking how p ik changes backwards through time along a lineage using a framework based on master equations previously developed by [43]. These master equations have the general form where γ k l is the rate at which lineages transition from degree l to degree k hosts backwards in time. How these transition rates are computed and further details about how the ancestral degree distribution p k is computed for each lineage in a phylogeny are described in S1 Text, where we also show that these master equations accurately describe how lineages move through networks (S1 Fig).

Summary of the general approach
To summarize, our approach uses a random graph model to describe the statistical properties of a network including its degree distribution. These statistical properties are then used to initialize the state variables in the pairwise epidemic model-including the degree distribution of susceptible and infected individuals in the network. The ODEs for state variables like the number of [S k I l ] pairs given in Eqs (5) and (6) are then solved forward in time. Given these forward-time dynamics, the lineage state probabilities can be solved backwards in time along each lineage in the phylogeny according to Eq (17). With both the epidemic state variables and lineage state probabilities at hand, the pairwise coalescent rates can be computed according to Eq (14). Finally, with these coalescent rates the likelihood of a phylogeny can be calculated using Eq (10), allowing us to infer network parameters in either a maximum likelihood (ML) or Bayesian framework. For ML inference, we use a numerical optimization routine (fminsearch in Matlab) to find the parameter values that maximize the likelihood of a given phylogeny. For Bayesian inference, we use a MCMC approach to sample from the posterior distribution. For the latter, the pairwise coalescent model was implemented in BEAST 2.
Details on how the MCMC analysis was performed are provided in the input XML files provided along with the source code at https://github.com/davidrasm/PairTree.

The coalescent process on random networks
To see how key network properties shape pathogen phylogenies and how well the pairwise coalescent model captures their effects, we generated networks under random graph models parameterized to obtain networks with known statistical properties. On top of these networks, we simulated the spread of an epidemic using individual-based stochastic (IBS) simulations that tracked the ancestry of each pathogen lineage forward in time so that a true phylogeny was obtained from each simulation (see S1 Text). We then compared the epidemic dynamics and phylogenies simulated under the IBS model to those expected under the pairwise epidemic and coalescent models.
As expected from earlier work [14,39,44], the pairwise epidemic model provides an excellent deterministic approximation to the mean dynamics observed in IBS simulations across a wide range of random networks, whereas random mixing models generally do not (Fig 1). Likewise, the pairwise coalescent model does an excellent job of capturing the coalescent process on these networks in terms of the temporal distribution of coalescent events over the epidemic (Fig 2, blue). In contrast, the coalescent distributions expected under a random mixing coalescent model provide a reasonable approximation on some networks but not others (Fig 2,  red). For example, on poorly connected and highly clustered networks, the expected distribution of coalescent times under random mixing deviates widely from the IBS simulations. This is the case even if we condition the random mixing model on the more accurate population trajectories predicted by the pairwise epidemic model (Fig 2, green). On better connected networks and on networks with more contact heterogeneity, the random mixing model does almost as well as the pairwise coalescent model. In the S1 Text, we additionally explore when the pairwise approximation fails due to the presence of higher-order network structure (see S2 Fig).

Network effects on tree topology
In addition to coalescent time distributions, contact network structure can shape the topology of phylogenies. In particular, pathogens residing in well connected hosts may have the opportunity to infect more hosts and therefore leave more descendent lineages, causing trees to become increasingly asymmetric or imbalanced as the amount of contact heterogeneity increases in a population [26][27][28]45]. We therefore simulated trees on random networks with different levels of contact heterogeneity using IBS simulations and under the pairwise coalescent model using backward-time simulations in order to see if the coalescent model can capture the effects of contact heterogeneity on tree imbalance. Trees were compared using three of the most widely used measures of tree imbalance [46]: Colless's index, Sackin's index and the number of cherries. How these statistics were computed and normalized for trees of different sizes is described in the S1 Text.
Trees generated by IBS simulations and under the pairwise coalescent model both grow increasingly imbalanced with increasing contact heterogeneity (Fig 3). Colless' and Sackin's index both increase with greater contact heterogeneity, while the number of cherries decreases as expected for more imbalanced trees. However, IBS trees are more imbalanced overall than coalescent trees and grow disproportionally more so with increasing contact heterogeneity. This discrepancy may be due to additional network structure above the level of pairs not accounted for in the pairwise models-leading to additional variability in transmission potential for lineages in different parts of a network. Thus, it appears that the pairwise coalescent can partially capture the effects of local contact structure on tree topology, although the effect of network structure appears to be stronger in trees evolving on actual networks.

Inference
The pairwise coalescent model allows us to compute the likelihood of a given phylogeny being generated by an epidemic on a random graph with defined statistical properties. It is therefore possible to estimate the statistical properties of a network directly from a phylogeny. However, phylogenies may retain little information about contact network structure, especially if the epidemic is sparsely sampled. To explore the information content of phylogenies regarding network structure, we simulated epidemics on random networks with known statistical properties. A variable fraction of infected nodes was then sampled upon removal to obtain To allow for greater assortativity, the mean and variance of the degree distribution was raised to six. For all simulations the network size N = 250, the transmission rate τ = 0.5 and the recovery rate ν = 0.1. The transmission rate β under random mixing was scaled so that the rate of infectious contacts was the same as under the pairwise epidemic model at t = 0, giving the two models the same intrinsic growth rate. https://doi.org/10.1371/journal.pcbi.1005448.g001 Phylodynamics on networks phylogenies with sampling fractions ρ of 10, 25, 50 and 100%. The pairwise coalescent model was then used to construct likelihood profiles for each parameter controlling local network structure while all other epidemiological parameters were fixed at their true values.
At sampling fractions at or below 10%, except for overall connectivity the simulated phylogenies contain little or no information about local network structure, as seen from the essentially flat likelihood profiles (Fig 4). At sampling fractions ! 25%, the likelihood profiles begin to show significant curvature for clustering and contact heterogeneity, and with sampling fractions ! 50% the likelihood profiles are sharply curved enough that these parameters can be estimated rather precisely with narrow 95% confidence intervals. Assortativity appears more difficult to infer from phylogenies, even if the true degree of sampled nodes is provided. Although the likelihood profiles for r do show some curvature at sampling fractions ! 50%, the credible intervals remain relatively wide even with complete sampling.  To check for potential biases in our estimates of network parameters, we simulated 100 additional phylogenies under a fixed value of each parameter using forward-time IBS simulations. We then obtained a maximum likelihood estimate (MLE) of the corresponding parameter. The MLEs appear centered around the true parameter values with little to no detectable bias (Fig 5a-5d).
Next, we simulated trees under a wider range of parameter values for each network property to check how well our estimator performs under different model parameterizations. Overall, parameter estimates appear well-calibrated with a high correlation between the true and MLE values (Fig 5e-5h). While the coverage of our confidence intervals falls below the desired 95% level, we believe the coverage achieved is very reasonable given that the epidemic dynamics in the stochastic simulations can diverge considerably from what is expected under the pairwise model. In S1 Text, we in fact show that most of the estimation error can be attributed to stochastic variation in the epidemic dynamics (see S3 Fig).

HIV-1 in Switzerland
We performed a phylodynamic analysis of a HIV-1 subtype B epidemic among men-whohave-sex-with-men (MSM) in Switzerland using the pairwise coalescent model to see if we could estimate the statistical properties of a real-world sexual contact network. HIV pol sequences from infected individuals were obtained from patients enrolled in the Swiss HIV Cohort Study [47][48][49]. To minimize the effects of spatial structure, we focus on a single large sub-epidemic identified as primarily occurring in the Zürich region in a preliminary phylogenetic analysis (see S1 Text and S4 Fig). A time-calibrated phylogeny containing 200 sequences revealed this sub-epidemic to be quite genetically diverse with many older lineages originating in the early 1980's (Fig 6a).
We fit a SIR-type pairwise epidemic model to this sub-epidemic while simultaneously inferring the tree from the sequence data in BEAST 2. We assumed a discretized gamma distribution for the degree distribution d k , which allowed us to independently estimate the mean μ k and variance s 2 k of the network's underlying degree distribution. The posterior estimates of μ k and s 2 k indicate that the network was not especially well-connected (median μ k = 2.20) but Phylodynamics on networks heterogenous in degree (median s 2 k ¼ 4:04) (Fig 6b and 6c). The basic reproductive number was estimated to be between 1.0 and 2.5, although unlike for the network parameters the posterior density of R 0 only diverged slightly from the prior (Fig 6d). R 0 values estimated under the pairwise coalescent model were however significantly lower than the values estimated under the random mixing model, even though the same prior on R 0 was used for both models.
Overall our phylodynamic analysis suggests that this particular sub-epidemic spread rapidly by way of a few highly connected individuals. This is supported by the inferred degree distribution of the network and can be seen from the expected degree of lineages computed from the inferred ancestral degree distribution of each lineage over time (Fig 6a). Most coalescent (i.e. transmission) events early in the epidemic are attributable to lineages residing in high degree individuals. Later, towards the beginning of the 2000's, a few clusters in the tree begin to grow again through new transmission events along lineages with higher than average degree, which corresponds in time to the resurgence of HIV among MSM in Switzerland [48,50].

Discussion
Recent work has suggested that the structure of local contact networks can shape pathogen phylogenies [26][27][28]. Yet it remains unclear how much information pathogen phylogenies retain about the networks through which they spread and how to best extract information about network structure from trees. As a step towards addressing these questions, we sought a simple theoretical framework to explore the relationship between contact networks, epidemic dynamics, and phylogenies. Starting with random graph and pairwise epidemic models, we derived a fairly simple coalescent model that includes local network structure by using pair approximations. By treating the coalescent process as a backwards-time dynamical process on a network, our pairwise coalescent model allows us to capture the phylogenetic history of a pathogen in terms of how lineages move through a network and the rates at which they coalesce. As we have shown, our phylodynamic modeling framework provides a very good approximation to the coalescent process on random networks and can recapitulate the major features of pathogen phylogenies simulated on different types of random graphs.
Using the pairwise coalescent model and individual-based stochastic simulations as guides, we reexamined how contact network structure shapes pathogen phylogenies. We found that local contact network structure can have a strong impact on the the coalescent process in terms of the timing of coalescent events. Network properties like overall connectivity and contact heterogeneity that increase the epidemic growth rate concentrate coalescent events towards the beginning of an epidemic, while properties like clustering that slow epidemic growth broaden the distribution of coalescent events over the epidemic. On the other hand, properties like assortativity that were observed to have no strong effect on epidemic dynamics Phylodynamics on networks likewise have little influence on the timing of coalescent events (although we may have observed a larger effect if we had considered more extreme forms of assortative mixing [51][52][53]). This suggests that local contact network structure primarily shapes the coalescent process indirectly through the network's influence on epidemic dynamics, particularly the timing of transmission events.

Phylodynamics on networks
Because the pairwise coalescent model can be used for likelihood-based inference, it also offers a means of exploring how much information phylogenies contain about contact network structure. Using simulated phylogenies, our ability to infer network properties was highly dependent on the fraction of sampled individuals. While we could estimate network properties that strongly regulate epidemic dynamics such as overall connectivity even at sampling fractions as low as 10-25%, other properties that do not strongly regulate epidemic dynamics like assortativity proved difficult to precisely estimate even with complete sampling. This observation suggests that for the parameters that can be estimated at low sampling fractions, we may largely be inferring the structure of networks not from any direct signal of network structure in the tree itself, but from the indirect effect of network structure on the epidemic dynamics reflected in the timing of the coalescent events in the tree. Thus, without additional information beyond a phylogeny, network parameters may essentially be statistically unidentifiable from other epidemiological parameters like transmission rates that control epidemic dynamics.
Although it appears difficult to estimate some network properties from phylogenies, we were able to obtain informative estimates of the degree distribution of a sexual contact network underlying a large HIV sub-epidemic in Switzerland. While we were likely helped by the high fraction of HIV infected individuals sampled in Switzerland (at least 34-38%) compared with other countries and the relatively informative priors we placed on the model's epidemiological parameters, this demonstrates that it is at least technically possible to estimate the structure of real-world contact networks from phylogenies. Our phylodynamic estimates indicated that the Swiss MSM network was not particularly well-connected with a rather low mean degree, but that the number of contacts per individual was highly variable. While a high degree of contact heterogeneity is consistent with other empirical studies of sexual contact networks [4,6,54], such low overall connectivity may be surprising in a high-risk population such as MSM. However, our estimates were based on a random graph model assuming a completely static network and therefore may better reflect the momentary degree distribution of the network in terms of the number of concurrent or nearly concurrent contacts individuals have rather than their total number of partners over time, which may grow substantially larger depending on the time they remain sexually active.
As in mathematical epidemiology, phylodynamics has made conceptual progress by developing simple models that largely assume random mixing [40,55,56]. It may now be worthwhile to consider when including network structure is likely to be important in phylodynamics. From earlier work in network epidemiology, it is well known that random mixing models can fail to provide an accurate description of epidemic dynamics because they ignore how local structure promotes or hinders transmission as well as how this local structure evolves over the course of an epidemic [14,31,57]. The timing of transmission events as regulated by local interactions on the network also appears to determine how well random mixing models can approximate the coalescent process on networks. On weakly connected or highly clustered networks where local interactions strongly limit transmission due to saturation effects, random mixing models overshoot the true transmission rate and therefore also the expected coalescent rate. In better connected networks, the effect of these local interactions on transmission is minimized by well-connected nodes and random mixing models can perform quite well [14]. The effect of local contact network structure on the coalescent process can therefore probably be safely ignored in highly connected networks, but may be important to consider in less well-connected networks. While network structure can also shape the topological characteristics of phylogenies such as tree imbalance, these effects appear to be quite weak [28], unless the distribution of contacts is extreme, as in the case of scale-free networks with power law degree distributions [27]. For multi-strain pathogen systems, network structure can have more complex effects on evolutionary dynamics depending on how different pathogens interact; ranging from a higher probability of invading strains being competitively excluded [58,59] to facilitating the spread of synergistic coinfections [60]. The importance of multi-strain interactions on networks remains to be explored in phylodynamics.
Considering network structure may also be important when estimating and interpreting key epidemiological parameters. Of perhaps most interest to epidemiologists is R 0 , which is a function of both the transmissibility of a disease and contact network structure [8]. Thus, even if two pathogens are equally transmissible per contact, they can have substantially different R 0 values based on contact patterns. Contact heterogeneity can for instance dramatically increase R 0 , while clustering can reduce R 0 [61,62]. But network structure can have an even more dramatic impact on epidemic dynamics, especially initial growth rates. For example, epidemics can spread very rapidly though heterogenous contact networks via a few highly connected individuals. In this case, R 0 would likely be overestimated under a random mixing model because fast initial growth can only be explained under such a model by uniformly high transmission rates in the population; whereas in fact the highly connected individuals infected early in the epidemic do not represent the average connectedness of the population. This most likely explains why we estimated R 0 for the Swiss HIV sub-epidemic to be more than twice as high under a random mixing model than under a model accounting for contact heterogeneity. Phylodynamic inference using random mixing models may be especially prone to large estimation errors in such settings because most phylogenetic information is concentrated in the early stages of an epidemic when most coalescent events occur just as network structure begins to shape epidemic dynamics.
While we strove for simplicity, the true complexity of real-world contact networks does highlight some deficiencies in the pairwise models. First, while we only considered perfectly static random graph models, real-world networks temporally evolve as new contacts form and dissolve. Pairwise epidemic models that allow for dynamic partner exchange have been proposed [63,64], and in theory could be merged with our pairwise coalescent model to explore contact durations that are intermediate between the infinitesimal nature assumed by random mixing models and the permanent nature assumed by static models, as in [65]. Second, the deterministic pairwise models ignore the often considerable stochastic variability of epidemics on networks, which we found to be a major source of estimation error. Particle filtering approaches similar to those in [66] could be adapted to the pairwise coalescent model, although it is not straightforward to simply add stochasticity into pairwise models [67,68]. Finally, the random graph models we employed here only consider local structure at the level of pairs in the network. Higher-order structure that subdivides networks into different communities most likely also plays a very strong role in shaping pathogen phylogenies. Developing methods that can quantify connectivity within and between communities while accounting for epidemic dynamics and incomplete sampling such as our approach does on local networks remains a challenging but highly important area of future research.
Supporting information S1 Text. Supporting information on models, simulation methods and the HIV-1 phylogenetic analysis. Includes details on initial conditions for the pairwise epidemic model, how the pairwise coalescent model tracks lineage movement through networks, how epidemics and phylogenies were simulated, when the pairwise approximation fails due to higher-order community structure, how tree imbalance statistics were computed and normalized, how stochastic variability in epidemic dynamics can lead to estimation error under the deterministic pairwise models and details about the phylogenetic analysis of the Swiss HIV-1 sequence data. (PDF)