A Simulation Study Comparing Epidemic Dynamics on Exponential Random Graph and Edge-Triangle Configuration Type Contact Network Models

We compare two broad types of empirically grounded random network models in terms of their abilities to capture both network features and simulated Susceptible-Infected-Recovered (SIR) epidemic dynamics. The types of network models are exponential random graph models (ERGMs) and extensions of the configuration model. We use three kinds of empirical contact networks, chosen to provide both variety and realistic patterns of human contact: a highly clustered network, a bipartite network and a snowball sampled network of a "hidden population". In the case of the snowball sampled network we present a novel method for fitting an edge-triangle model. In our results, ERGMs consistently capture clustering as well as or better than configuration-type models, but the latter models better capture the node degree distribution. Despite the additional computational requirements to fit ERGMs to empirical networks, the use of ERGMs provides only a slight improvement in the ability of the models to recreate epidemic features of the empirical network in simulated SIR epidemics. Generally, SIR epidemic results from using configuration-type models fall between those from a random network model (i.e., an Erdős-Rényi model) and an ERGM. The addition of subgraphs of size four to edge-triangle type models does improve agreement with the empirical network for smaller densities in clustered networks. Additional subgraphs do not make a noticeable difference in our example, although we would expect the ability to model cliques to be helpful for contact networks exhibiting household structure.

1. Simulate a length N random sequence of node degrees $\{D_u\}$ from the distribution $p(d)$.
2. Randomly assign a number of triangle corners $T_u$ to each node $u$ according to its degree $D_u$ and the conditional distribution $\{p_{j|D_u}\}$.
3. For each node $u$, assign the number of single stubs $S_u$ according to $S_u = D_u - 2T_u$.
4. Form the network by joining single stubs in random pairs to form edges and triangle corners in random triples to form triangles.
5. Remove self-loops and multiple edges.
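The generation steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `edge_triangle_network`, the dictionary inputs, and the choice to discard any leftover stub or corner that cannot be matched are all hypothetical. Triangle corners are grouped in random triples, following the usual edge-triangle construction.

```python
import random


def edge_triangle_network(degree_dist, corner_dist, n, rng=None):
    """Hypothetical sketch of an edge-triangle configuration-type generator.

    degree_dist: dict d -> p(d), the degree distribution.
    corner_dist: dict d -> {j: p_{j|d}}, triangle corners given degree d.
    Returns (degrees, edges), where edges is a set of (u, v) with u < v.
    """
    if rng is None:
        rng = random.Random()
    # Step 1: draw N node degrees D_u from p(d).
    degrees = rng.choices(list(degree_dist), weights=list(degree_dist.values()), k=n)
    stubs, corners = [], []
    for u, d in enumerate(degrees):
        # Step 2: draw the number of triangle corners T_u from p_{j|D_u}.
        cond = corner_dist.get(d, {0: 1.0})
        t = rng.choices(list(cond), weights=list(cond.values()), k=1)[0]
        # Step 3: remaining single stubs, S_u = D_u - 2 T_u.
        s = d - 2 * t
        if s < 0:
            raise ValueError(f"degree {d} cannot hold {t} triangle corners")
        stubs.extend([u] * s)
        corners.extend([u] * t)
    rng.shuffle(stubs)
    rng.shuffle(corners)

    edges = set()

    def add_edge(u, v):
        # Remove self-loops; the set silently drops repeated edges.
        if u != v:
            edges.add((min(u, v), max(u, v)))

    # Step 4: pair stubs into edges (an odd leftover stub is discarded) ...
    for i in range(0, len(stubs) - 1, 2):
        add_edge(stubs[i], stubs[i + 1])
    # ... and group corners into triangles (leftover corners are discarded).
    for i in range(0, len(corners) - 2, 3):
        a, b, c = corners[i:i + 3]
        add_edge(a, b)
        add_edge(b, c)
        add_edge(a, c)
    return degrees, edges
```

Because self-loops and duplicate edges are dropped, the realized degree of a node can be slightly smaller than its drawn $D_u$.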
We consider six estimators for the degree distribution and evaluate each in the same way, using the following four steps. (Note that the empirical network is a snowball sample from 49 seed nodes, with waves zero to two.)

1. For a network of size N, simulate a length N random sample of node degrees $\{D_1, \dots, D_N\}$ from the estimated degree distribution.
2. Use the steps above to create the node role sequence $(S_u, T_u)$, $u = 1, \dots, N$, and create the network.
3. Choose 49 nodes at random as wave 0 and construct a two-wave snowball sample.
4. Repeat steps 1-3 to simulate 100 independent snowball samples from size N networks.
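The snowball-sampling step (step 3) can be sketched as a breadth-first expansion from randomly chosen seeds. The adjacency-dictionary input and the function name `snowball_sample` are hypothetical, and we assume that each sampled node's degree is taken to be its degree in the full simulated network.

```python
import random


def snowball_sample(adj, n_seeds, n_waves, rng=None):
    """Hypothetical sketch: return a dict node -> wave index.

    adj: dict node -> set of neighbouring nodes (undirected graph).
    Wave 0 holds n_seeds nodes chosen uniformly at random; wave k + 1 holds
    every not-yet-sampled neighbour of a node in wave k.
    """
    if rng is None:
        rng = random.Random()
    seeds = rng.sample(sorted(adj), n_seeds)
    wave_of = {u: 0 for u in seeds}
    frontier = seeds
    for k in range(1, n_waves + 1):
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if v not in wave_of:  # each node is recorded in its first wave only
                    wave_of[v] = k
                    next_frontier.append(v)
        frontier = next_frontier
    return wave_of
```

For the procedure above, one would call `snowball_sample(adj, 49, 2)` on each simulated network and then read off the degrees of the sampled nodes.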
If the hypothesized degree distribution is a reasonable model for generating the simulated networks, the degree distribution of the observed snowball sample should resemble the degree distribution estimated from the 100 simulated snowball samples. This can be tested with, for example, a χ² goodness-of-fit test.
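The χ² comparison can be sketched as a Pearson goodness-of-fit statistic. Here `expected_probs` stands for the degree distribution estimated from the simulated snowball samples. Pooling adjacent degree classes until every expected count is at least 5 is a common rule of thumb and is our assumption; the text does not say how sparse degree classes were handled. The function name is hypothetical.

```python
from collections import Counter


def chi_square_gof(observed_degrees, expected_probs, min_expected=5.0):
    """Hypothetical sketch: Pearson chi-square statistic and degrees of freedom.

    observed_degrees: list of degrees in the observed snowball sample.
    expected_probs: dict degree -> probability under the hypothesised model.
    Adjacent degree classes are pooled until each expected count >= min_expected.
    """
    n = len(observed_degrees)
    obs = Counter(observed_degrees)
    bins = []  # each bin is [observed count, expected count]
    cur_o, cur_e = 0.0, 0.0
    for d in sorted(set(obs) | set(expected_probs)):
        cur_o += obs.get(d, 0)
        cur_e += n * expected_probs.get(d, 0.0)
        if cur_e >= min_expected:
            bins.append([cur_o, cur_e])
            cur_o, cur_e = 0.0, 0.0
    # Fold any remaining sparse tail into the last bin.
    if bins:
        bins[-1][0] += cur_o
        bins[-1][1] += cur_e
    else:
        bins.append([cur_o, cur_e])
    stat = sum((o - e) ** 2 / e for o, e in bins if e > 0)
    return stat, len(bins) - 1
```

A p-value can then be obtained with, for example, `scipy.stats.chi2.sf(stat, df)`, rejecting at 5% significance when it falls below 0.05.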
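Let $W_k$ denote the set of nodes in wave $k$ of the snowball sample, and let $q_A(d)$ be the empirical probability that a node in set $A$ has degree $d$,
$$q_A(d) = \frac{1}{N_A} \sum_{u \in A} \mathbb{1}\{D_u = d\},$$
where $N_A$ is the number of nodes in set $A$. Estimator one uses the empirical probabilities from only the seed nodes,
$$\hat{p}_1(d) = q_{W_0}(d).$$
If the seed nodes were an independent and identically distributed (i.i.d.) sample from the population, $W_0$ would be a random sample and this could produce a useful distribution.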
Estimator two uses the empirical probabilities from waves 1 and 2 only,
$$\hat{p}_2(d) = q_{W_1 \cup W_2}(d).$$
As suggested by Salganik and Heckathorn [2], we disregard the seed nodes because they were selected by a different, unknown mechanism. Estimator three uses the empirical probabilities from nodes in waves 0, 1 and 2,
$$\hat{p}_3(d) = q_{W_0 \cup W_1 \cup W_2}(d).$$
On the one hand, we would expect this to be a biased estimate of the degree distribution because of the snowball nature of the sample. On the other hand, if the sample is a large fraction of the population, the bias may be small. Estimator four is the degree distribution estimator of Salganik and Heckathorn [2]. It is based on the assumption that the probability a node appears in the sample is proportional to its degree, so that $q_A(d) \propto d\,p(d)$. As with estimator two, we disregard nodes in wave 0, which gives
$$\hat{p}_4(d) = \frac{q_{W_1 \cup W_2}(d)/d}{\sum_{d'} q_{W_1 \cup W_2}(d')/d'}.$$
Estimator five arises from the Volz-Heckathorn RDS estimator [3],
$$\hat{p}_5(d) = \frac{\sum_{u \in W_0 \cup W_1 \cup W_2} \mathbb{1}\{D_u = d\}/D_u}{\sum_{u \in W_0 \cup W_1 \cup W_2} 1/D_u}.$$
This estimator also arises from assuming that the probability a node is sampled is proportional to its degree. Finally, estimator six arises from making simplifying approximations to the probability that a node is included in the sample [1]; the resulting estimator includes a constant $K$ chosen so that the probabilities sum to one. This estimator requires knowledge of the network size $N$, which is often unknown. In our case we use the estimated size 524.
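Estimators one to four can be computed directly from the degrees and wave labels of the sampled nodes. This is a minimal sketch: the input format (a list of (degree, wave) pairs) and the function names are hypothetical, degrees are assumed to be at least 1 for the inverse-degree weighting, and estimators five and six are left out.

```python
from collections import Counter


def empirical_dist(degrees):
    """q_A(d): share of nodes in the set A with degree d."""
    counts = Counter(degrees)
    n = len(degrees)
    return {d: c / n for d, c in counts.items()}


def inverse_degree_weighted(degrees):
    """p(d) proportional to q_A(d) / d, undoing sampling proportional to degree."""
    q = empirical_dist(degrees)
    weights = {d: q_d / d for d, q_d in q.items()}  # assumes every d >= 1
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}


def degree_estimators(sample):
    """Hypothetical sketch of estimators one to four from (degree, wave) pairs."""
    def degrees_in(waves):
        return [d for d, w in sample if w in waves]

    return {
        1: empirical_dist(degrees_in({0})),               # seed nodes only
        2: empirical_dist(degrees_in({1, 2})),            # waves 1 and 2
        3: empirical_dist(degrees_in({0, 1, 2})),         # all three waves
        4: inverse_degree_weighted(degrees_in({1, 2})),   # degree-proportional sampling, waves 1 and 2
    }
```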
Our estimator of the conditional triangle corner distribution is simply the sample conditional triangle corner distribution over waves 0 and 1,
$$\hat{p}_{j|d} = \frac{\sum_{u \in W_0 \cup W_1} \mathbb{1}\{D_u = d,\ T_u = j\}}{\sum_{u \in W_0 \cup W_1} \mathbb{1}\{D_u = d\}}.$$
We disregard the second wave because nodes in wave 2 could have edges to nodes in wave 3, which is beyond the sample. Thus, the counts of triangle corners in wave 2 will be too small, with an unknown effect on the distribution estimate.
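The conditional triangle corner estimate can be sketched in the same style. We assume the triangle corner count $T_u$ of each sampled node has already been determined; the input format (a list of (degree, corners, wave) triples) and the function name are hypothetical. The result is a nested dictionary mapping each degree $d$ to $\{j: \hat{p}_{j|d}\}$.

```python
from collections import Counter, defaultdict


def conditional_corner_dist(sample, waves=(0, 1)):
    """Hypothetical sketch: empirical p_{j|d} from (degree, corners, wave) triples.

    Only nodes whose wave is in `waves` contribute; wave 2 is left out by
    default because its triangle corners may be undercounted.
    """
    counts_by_degree = defaultdict(Counter)
    for degree, corners, wave in sample:
        if wave in waves:
            counts_by_degree[degree][corners] += 1
    result = {}
    for degree, counts in counts_by_degree.items():
        total = sum(counts.values())
        result[degree] = {j: c / total for j, c in counts.items()}
    return result
```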
For the edge-triangle model, Table 1 shows how well each of the six estimators captures the degree distribution. Column two shows the results of using each of the six estimated degree distributions to simulate edge-triangle networks. Because the population network is unknown, we compare the observed snowball sample with a snowball sample of each simulated edge-triangle network drawn with the same number of seed nodes. The column reports the number of networks rejected (out of 100) by a χ² goodness-of-fit test at 5% significance for each estimator. Among the six estimators considered, estimator three is the only one whose number of rejections is consistent with the 5% level of the test. This is evidence that the degree distribution produced by estimator three is consistent with the empirical network, and suggests it is the best of the six choices for modeling the contact network.
To verify this result, we conducted a sub-study by simulating 100 networks with 524 nodes from our ERGM of the PWID network to serve as "known truth". From each network, we formed 100 snowball samples using 100 independent seed sets of 49 nodes (10,000 samples in total). For each sample we used a χ² goodness-of-fit test at 5% significance to compare the estimated degree distribution of the sample with the degree distribution of the corresponding network. Column three of Table 1 shows the mean number of samples rejected (with 95% confidence intervals), calculated as the mean over the 100 samples per network and then the mean over the 100 networks. We expect rejection rates higher than the level of the test because each sample covers only a fraction of the network. Estimator three (the sample degree distribution on waves 0, 1 and 2) does the best at capturing the degree distribution of the population. Although this estimator is biased by the higher likelihood of including higher-degree nodes, in this case the bias appears small compared with the effects of excluding waves, the inaccurate assumptions of estimator five, or the approximations of estimator six. Thus, we use estimator three to generate edge-triangle type networks.

Table 1: Column one identifies the estimator. Column two shows the number of simulated edge-triangle networks rejected (of 100) at 5% significance using degree distribution estimators 1-6; for each network, the sample degree distribution is compared with the sample degree distribution of the observed snowball sample using a χ² goodness-of-fit test. Column three shows results from a simulation sub-study using 100 networks simulated from the ERGM and 100 snowball samples per network (10,000 samples in total); for each snowball sample, the estimated degree distribution is compared with the degree distribution of the population network ("known truth") using a χ² goodness-of-fit test.
Both results suggest estimator three generates a degree distribution most like that of the population network.
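The nested averaging behind column three of Table 1 can be sketched as follows. The text does not state how the 95% confidence intervals were computed; the normal approximation across networks used here (mean ± 1.96 standard errors) is an assumption, and the function name is hypothetical. Rates are returned as proportions, so multiplying by 100 gives rejections per 100 samples.

```python
import math
import statistics


def mean_rejection_rate(rejected, z=1.96):
    """Hypothetical sketch: rejected[i][j] is True if sample j of network i was rejected.

    Averages over samples within each network, then over networks, and adds a
    normal-approximation confidence interval across networks (an assumption).
    """
    per_network = [sum(row) / len(row) for row in rejected]
    mean = statistics.mean(per_network)
    half_width = z * statistics.stdev(per_network) / math.sqrt(len(per_network))
    return mean, (mean - half_width, mean + half_width)
```

At least two networks are needed, since `statistics.stdev` requires two or more values.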