
A statistical framework for comparing epidemic forests

  • Cyril Geismar,

    Roles Conceptualization, Data curation, Formal analysis, Methodology, Project administration, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    c.geismar21@imperial.ac.uk

    Affiliations MRC Centre for Global Infectious Disease Analysis, Imperial College School of Public Health, London, United Kingdom, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, United States of America, NIHR Health Protection Research Unit in Modelling and Health Economics, Imperial College School of Public Health, London, United Kingdom

  • Peter J. White,

    Roles Funding acquisition, Resources

    Affiliations MRC Centre for Global Infectious Disease Analysis, Imperial College School of Public Health, London, United Kingdom, NIHR Health Protection Research Unit in Modelling and Health Economics, Imperial College School of Public Health, London, United Kingdom

  • Anne Cori,

    Contributed equally to this work with: Anne Cori, Thibaut Jombart

    Roles Conceptualization, Funding acquisition, Methodology, Supervision, Validation, Writing – review & editing

    Affiliations MRC Centre for Global Infectious Disease Analysis, Imperial College School of Public Health, London, United Kingdom, NIHR Health Protection Research Unit in Modelling and Health Economics, Imperial College School of Public Health, London, United Kingdom

  • Thibaut Jombart

    Contributed equally to this work with: Anne Cori, Thibaut Jombart

    Roles Conceptualization, Funding acquisition, Methodology, Supervision, Writing – review & editing

    Affiliations MRC Centre for Global Infectious Disease Analysis, Imperial College School of Public Health, London, United Kingdom, NIHR Health Protection Research Unit in Modelling and Health Economics, Imperial College School of Public Health, London, United Kingdom

Abstract

Inferring who infected whom in an outbreak is essential for characterising transmission dynamics and guiding public health interventions. However, this task is challenging due to limited surveillance data and the complexity of immunological and social interactions. Instead of a single definitive transmission tree, epidemiologists often consider multiple plausible trees forming epidemic forests. Various inference methods and assumptions can yield different epidemic forests, yet no formal test exists to assess whether these differences are statistically significant. We propose such a framework using a chi-square test and permutational multivariate analysis of variance (PERMANOVA). We assessed each method’s ability to distinguish simulated epidemic forests generated under different offspring distributions. While both methods achieved perfect specificity for forests with 100+ trees, PERMANOVA consistently outperformed the chi-square test in sensitivity across all epidemic and forest sizes. We provide the first statistical framework to robustly compare epidemic forests, implemented in the R package mixtree.

Author summary

Identifying who infected whom is a central part of outbreak investigation. It helps trace the source of infection, uncover missing cases, identify superspreaders, and describe broader dynamics of transmission such as its speed, pattern, and scale. With the advent of pathogen sequencing and digital contact tracing, computational models have become the standard approach for reconstructing outbreaks. These probabilistic models do not identify a single definitive history of who infected whom (i.e., a transmission tree), but a collection of plausible alternatives, which we call epidemic forests. Different modelling assumptions or data sources can produce different epidemic forests, but until now, there has been no formal way to determine whether these differences are meaningful. We present the first statistical framework designed to compare epidemic forests. We evaluate two methods: one that counts how often specific transmission pairs appear, and another that compares the structure of transmission trees. Testing these methods on simulated outbreaks, we found that both successfully identified when forests represented identical transmission dynamics, but one method outperformed the other in identifying forests representing distinct transmission dynamics. Our framework, implemented in the R package mixtree, enables epidemiologists to validate and compare outbreak reconstruction approaches, supporting more reliable investigations.

Introduction

Tracking who infected whom is central to outbreak investigations. Transmission trees, modelled as directed acyclic graphs (DAGs) where vertices represent infected individuals and directed edges indicate transmission events, delineate infector-infectee relationships [1]. These representations can assist epidemiologists in identifying introduction and superspreading events [2,3], whilst also elucidating broader transmission dynamics relevant to outbreak response. The topology of transmission trees, defined by the arrangement of vertices and edges, encodes key epidemiological parameters. The out-degree distribution of vertices represents the number of secondary infections per infected individual (i.e., the offspring distribution), revealing the extent of heterogeneity in transmission [4–7]. Branching patterns inform on transmission dynamics between groups [8], revealing group reproduction numbers [5] and transmission patterns, for example, between healthcare workers and patients in nosocomial outbreaks [9] or between children and adults in schools [10].

The inference of transmission trees is challenging and often characterised by large uncertainty, partly due to the lack of discriminatory power in choosing between possible transmission pairs [5,9,11], incomplete surveillance data and diverse, sometimes conflicting sources (e.g., contact, temporal, spatial, or genetic data) [12]. Additional complexities arise from varying methodological approaches [12] and pathogen evolution mechanisms that are difficult to model (e.g., within-host evolution and transmission bottlenecks [7,13]). Consequently, outbreak reconstruction often yields epidemic forests, which are collections of plausible transmission trees rather than a single definitive representation of who infected whom.

Without formal statistical methods to differentiate epidemic forests, it is difficult to determine whether differences between them reflect meaningful variations in transmission dynamics or merely uncertainty in tree reconstruction. Such a distinction would help validate convergence when repeated model runs produce statistically similar forests, and assess whether competing inference approaches or alternative data sources yield significantly different forests.

Bayesian inference methods have emerged as the gold standard for transmission tree reconstruction, with various approaches differing in their assumptions, data requirements, and inference strategies [12]. In this context, epidemic forests represent samples drawn from a model’s posterior distribution of transmission trees. To assess the performance of the inference process, researchers rely on general Markov Chain Monte Carlo (MCMC) diagnostics, applied to scalar parameter chains rather than the inferred trees themselves. These diagnostics evaluate convergence through trace plot inspection and the Gelman-Rubin statistic [14], assess sampling efficiency through effective sample size calculations, and check model fit using posterior predictive checks [15]. In parallel, model selection criteria such as the Deviance Information Criterion and the Bayesian Information Criterion balance goodness-of-fit with model complexity, enabling comparison of competing inference models fitted to the same data [16]. Consensus trees are also used to summarise epidemic forests, typically representing, for each case, the infector with the highest posterior support across samples [11,17–20]. However, these trees are often abstract representations rather than plausible transmission scenarios, potentially introducing cycles or multiple index cases. While algorithms such as Edmonds’ algorithm can enforce a valid tree topology, the resulting consensus tree may correspond to a combination of ancestries that was never observed as a complete tree in the posterior [20–22].

Consequently, standard MCMC diagnostics assess parameter chains rather than the inferred transmission events, model selection criteria compare model fit but not the resulting tree topologies, while consensus trees ignore the uncertainty in who infected whom and may misrepresent key epidemiological features. This underscores the need for specialised statistical methods that can differentiate epidemic forests while accounting for uncertainty and relevant topological properties.

Here, we introduce a statistical framework for testing differences between epidemic forests. We consider two alternative methods: an adaptation of a Monte Carlo chi-square (χ²) test [23] used to compare the distribution of infector-infectee pairs between forests, and a permutation-based multivariate analysis of variance (PERMANOVA) [24] which compares topological distances between trees within and between forests. Both methods are summarised in Fig 1. We evaluated the performance of each method by comparing simulated epidemic forests with varying offspring distributions, measuring their ability to correctly identify forests stemming from distinct (sensitivity) or identical (specificity) generative processes (see Methods).

Fig 1. Statistical framework for comparing epidemic forests.

Diagram illustrating the methods for comparing epidemic forests A (pink, e.g., no superspreading) and B (orange, e.g., superspreading). Coloured dots represent infected cases, and arrows indicate transmission events. Left: The χ² test compares the frequency of infector-infectee pairs between forests. Right: The PERMANOVA method first calculates pairwise standard graph distances (number of transmission events between two cases, plus one – see supplementary S1 Fig) between vertices within each tree, converts these to a Euclidean distance matrix between all trees, and tests for significant topological differences between the forests using permutation-based testing. Both methods test the null hypothesis that the compared epidemic forests stem from the same generative process.

https://doi.org/10.1371/journal.pcbi.1014271.g001

Results

We simulated pairs of epidemic forests that stemmed from either the same or different generative processes (see Methods). We systematically varied epidemic size (n: 20–200 cases), forest size (m: 20–200 trees per forest), and the parameters of the negative binomial offspring distribution (R0: 1.5–3 and k: 0.1 to Poisson-like), which determine the mean and dispersion of secondary infections. We simulated 90,000 epidemic forests (see Methods), on which we conducted 5,760,000 tests to measure the sensitivity and specificity of the χ² test and PERMANOVA.

Overall results are presented as Receiver Operating Characteristic (ROC) curves (supplementary S6 Fig), with the area under the curve (AUC) provided in supplementary S7 Fig. Both methods exhibited near-perfect specificity (> 97%), i.e., the ability to correctly identify forests drawn from identical generative processes, across all epidemic and forest sizes (Fig 2, row 4). The χ² test had a negligible advantage in specificity (+1.5%) when the number of trees in each forest was small (m ≤ 50). Across the aggregated simulation results, sensitivity was near perfect once the forest size reached 50–100 trees, with AUC nearing 1 (supplementary S7 Fig). However, these results varied between methods and simulation settings.

Fig 2. Performance of χ² and PERMANOVA for comparing epidemic forests.

This figure summarises simulation results for the χ² test (square points) and PERMANOVA (circular points) across varied parameter conditions. The y-axis shows the percentage of tests rejecting the null hypothesis (H0) of no difference between forests. The x-axis displays the epidemic sizes. Column panels refer to the forest size (i.e., the number of trees in each forest). Row panels refer to the type of differences between the two forests (with H0 for no differences).

https://doi.org/10.1371/journal.pcbi.1014271.g002

The methods differed substantially in their sensitivity, i.e., their ability to correctly identify forests drawn from different generative processes, with PERMANOVA consistently outperforming the χ² test across all scenarios (Fig 2, rows 1–3). However, the magnitude of PERMANOVA’s advantage varied considerably depending on which parameters differed between forests. A logistic regression model explained 58% of the variance in test sensitivity (pseudo R² [25] = 0.58), with method choice, forest size, epidemic size, and differences in R0 and k as key predictors (Table 1). Compared to the χ² test, PERMANOVA showed much greater sensitivity when forests differed in their dispersion parameter (k), with 51-fold higher odds of correctly distinguishing overdispersed from Poisson-like forests, and 8-fold higher odds when comparing forests with different degrees of overdispersion (Table 1). When forests differed in dispersion and contained at least 100 trees, PERMANOVA achieved near-perfect sensitivity (98.7%), irrespective of epidemic size (Fig 2, rows 1 and 3, column 3).

Table 1. Logistic regression results for the sensitivity model (Eq 10).

https://doi.org/10.1371/journal.pcbi.1014271.t001

In contrast, PERMANOVA’s advantage over the χ² test diminished when forests differed only in reproduction number (OR = 3, Table 1). When both forests shared strong overdispersion (common k ≤ 0.5), high stochastic variability in individual transmission limited the ability to detect differences in R0 of up to 1, yielding low sensitivity even with 200 trees per forest (52% across epidemic sizes; supplementary S3 Fig and S4 Fig). Sensitivity improved progressively as the common dispersion parameter approached Poisson-like transmission (k → ∞) or as epidemic size increased (supplementary S4 Fig).

In addition to higher sensitivity, PERMANOVA produced consistently narrower p-value distributions than the χ² test, with substantially smaller interquartile ranges across all scenarios (supplementary S2 Fig).

Both methods’ sensitivity increased with forest size (OR = 3, 6 and 12 for m = 50, 100 and 200, respectively) but showed opposite correlations with epidemic size: PERMANOVA’s sensitivity rose with larger epidemics (OR = 2, 4 and 5 for n = 50, 100 and 200, respectively), whereas the χ² test’s sensitivity declined (OR = 0.5, 0.3 and 0.1) (Table 1, Fig 2). Our findings establish PERMANOVA as the superior method for comparing epidemic forests when sufficient samples are available (m ≥ 100), providing excellent sensitivity and specificity regardless of epidemic size.

Discussion

We evaluated two statistical approaches, the chi-square (χ²) test and PERMANOVA, for distinguishing between collections of transmission trees (i.e., epidemic forests) originating from different generative processes, defined by the mean and dispersion of their offspring distribution (i.e., the distribution of secondary cases generated by each infected individual). The χ² test assesses differences in the frequency of infector-infectee pairs between epidemic forests, treating each pair as an isolated edge without considering its relative position within each tree. In contrast, PERMANOVA leverages customised tree-based distance metrics to quantify meaningful epidemiological differences between tree topologies, which may better signal distinct pathogen transmission dynamics.

Our simulations showed that PERMANOVA consistently outperformed the χ² test in distinguishing epidemic forests generated under different offspring distributions. It achieved near-perfect sensitivity when forests differed in their dispersion parameter across all epidemic sizes (20–200 cases), provided forests contained at least 100 trees. However, its performance declined when forests differed solely in their mean reproduction number, especially for forests with high overdispersion (common k < 0.5) (supplementary S4 Fig). In such settings, the substantial stochastic variability in individual transmission masked differences in mean transmissibility. Although the χ² test also demonstrated excellent specificity, its sensitivity was consistently lower across all scenarios and declined further as epidemic size increased. Larger epidemics produced higher forest entropy, which indicates greater variation in who infected whom across trees (supplementary S5 Fig, S1 Table). Increased entropy yielded sparse contingency tables with many low expected counts and growing degrees of freedom, which reduced the statistical power of the χ² test (see Methods, Eq 2). In contrast, PERMANOVA became more sensitive as epidemics grew, because additional transmission events reduced the variance in within-group distances, increasing the F-statistic (see Methods, Eq 9).

Computationally, both methods scale with epidemic size, although PERMANOVA incurs greater computational expense (see supplementary S1 File, S2 Table). Parallelisation, together with constrained permutations (for PERMANOVA [26]) or an adjustable number of Monte Carlo replicates (for the χ² test [27]), makes both methods applicable to most contexts. When comparing two forests, each with 100 trees and 100 vertices, the χ² test takes 0.5 seconds, while PERMANOVA takes an average of 5 seconds (supplementary S2 Table). To facilitate access to these methods, we have developed mixtree [28], a free, open-source R package available on CRAN [27], which implements both the χ² test and PERMANOVA methods described in this study.

The proposed framework addresses several needs for outbreak reconstruction. First, it provides a formal approach for assessing MCMC convergence in tree space by comparing epidemic forests sampled from independent MCMC chains, which should be statistically indistinguishable when converged. This method complements existing diagnostics that focus on scalar parameter chains, which do not fully capture the complex tree structures that form the primary output of Bayesian inference models. Second, it enables rigorous comparison between competing models with different assumptions about transmission dynamics, facilitating evidence-based model selection. Third, it can detect whether incorporating additional data sources (e.g., contact tracing [29]) into reconstruction efforts significantly alters the resulting transmission trees, helping researchers evaluate the value of supplementary data. However, it cannot independently determine which reconstruction is more accurate without additional validation measures.

Our study focused on comparing two forests of equal size for computational feasibility. However, both methods can compare any number of forests of varying sizes sharing the same set of vertices, as implemented in our mixtree package. Nonetheless, the two methods do not share identical limitations. PERMANOVA assumes full graph connectivity [30], so it cannot accommodate multiple introductions that result in disconnected trees. In contrast, the χ² test can handle transmission trees with multiple introductions by assigning each introduction a specific infector identifier, e.g., ‘Introduction A’, ‘Introduction B’, etc. However, in the presence of unobserved cases, the χ² test cannot distinguish between direct transmission and transmission through unobserved intermediates. Importantly, PERMANOVA could be extended by modelling epidemiological, spatial or genetic distances as edge weights. For example, these weights could represent the number of infection generations between pairs of cases, thus accounting for unobserved cases. The standard graph distance used here could be replaced with a more complex metric that incorporates additional edge characteristics (i.e., weights) such as the number of generations between observed cases, or the time difference between their symptom onsets [31] or infection dates. While our simulation framework assessed method performance when the forests’ generative processes differed only in their offspring distribution, other epidemic features also shape tree topology. Future work should evaluate performance under alternative assumptions about epidemic dynamics, such as group transmission patterns [8], the effects of saturation [32], vaccination or new variants of concern [31], which would require developing additional distance metrics for PERMANOVA to capture such features.
Our simulation framework focused on epidemics of 20–200 cases, reflecting the typical range for computational outbreak reconstruction, and our results show that PERMANOVA performs well once forests comprise 100 or more trees, corresponding to the typical effective sample size from Bayesian reconstruction models [12].

While alternative methods for comparing graph collections exist, they typically rely on abstract graph kernels not directly interpretable in our epidemiological context [33]. In contrast, our method employs a distance metric that is epidemiologically meaningful, as it corresponds to the number of generations of infection separating each pair of cases. Future work could also examine how the multivariable analysis capability of PERMANOVA may be used to quantify the relative contributions of the inference method, data type, and prior assumptions to the observed topological differences between epidemic forests. Beyond the application to epidemic reconstruction considered here, this work addresses a more general methodological gap across disciplines where relational structures are represented as graphs [34–39]. In practice, diverse data sources, modelling assumptions, and analytical methods typically produce not single solutions but ensembles of plausible alternatives, i.e., collections of graphs. Bayesian approaches excel at generating these collections through MCMC sampling but lack formal statistical tools for comparing the resulting posterior samples. Another such application area is phylogenetic tree reconstruction [39], where researchers encounter similar challenges that can lead to conflicting evolutionary hypotheses or taxonomic classifications. In information and network science, different network representations may likewise suggest distinct social patterns or information flow dynamics.

In conclusion, our framework enables the comparison of collections of transmission trees, a special class of graph, by distinguishing meaningful structural variations from sampling and model uncertainty. We have demonstrated its utility to epidemic reconstruction, but this approach likely extends to other fields relying on graph-based representations. We encourage researchers to adapt and validate this framework to address domain-specific challenges in their respective fields, potentially developing additional metrics that capture the unique characteristics of their data structures.

Methods

We introduce a framework for comparing collections of transmission trees, termed epidemic forests. We present two approaches: the first based on a χ² test [23] on transmission pair frequencies, and the second using PERMANOVA, a method originally developed for ecological community analysis [24], on transmission tree distances. Both methods are described below and illustrated in Fig 1. We use a simulation study to compare the respective performances of the two approaches.

Epidemic forests

Transmission trees represent the spread of a disease amongst infected individuals as directed acyclic graphs (DAGs) [1]. A transmission tree T = (V, E) consists of a set V containing n vertices (each representing an infected individual) and a set E of n − 1 directed edges. Each edge represents an infector-infectee pair, denoted as $(v_i, v_j) \in E$, with $v_i, v_j \in V$. This directed edge connects an infector $v_i$ to its infectee $v_j$, formally encoding the ‘who infected whom’ relationship. All vertices have an in-degree of 1, except the root, which represents the index case and has an in-degree of 0. In the absence of data to define meaningful edge weights, we assume all edges have a weight of 1.

We define an epidemic forest as a collection of transmission trees, each with the exact same set of vertices, but possibly different sets of edges. We consider two epidemic forests $\mathcal{F}_A$ and $\mathcal{F}_B$, where the kth tree in $\mathcal{F}_A$ is defined as $T^A_k = (V, E^A_k)$. For simplicity, we assume that the two epidemic forests have the same size ($m_A = m_B = m$), but the approaches described below can readily accommodate unequal sizes ($m_A \neq m_B$). In practice, an epidemic forest may be obtained by sampling from a posterior distribution via Bayesian inference (e.g., MCMC) or from a stochastic transmission model [12,40].
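The definitions above map onto a small data structure. The Python sketch below is illustrative only. It encodes a transmission tree as a dictionary mapping each infectee to its infector (None for the index case) and a forest as a list of such dictionaries. This encoding and the helper names (`edges`, `is_transmission_tree`, `is_forest`, `out_degrees`) are our assumptions and are not part of mixtree. The out-degree helper returns the number of secondary infections per case, i.e., an empirical offspring distribution.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

# Illustrative encoding (assumption): infectee -> infector, None for the index case.
Tree = Dict[str, Optional[str]]
Edge = Tuple[str, str]  # (infector, infectee)


def edges(tree: Tree) -> Set[Edge]:
    """Directed edge set E as (infector, infectee) pairs."""
    return {(src, dst) for dst, src in tree.items() if src is not None}


def is_transmission_tree(tree: Tree) -> bool:
    """Exactly one root (in-degree 0), every infector is a known vertex,
    and there are no cycles. Other vertices have in-degree 1 by construction."""
    roots = [v for v, src in tree.items() if src is None]
    if len(roots) != 1:
        return False
    if any(src is not None and src not in tree for src in tree.values()):
        return False
    for start in tree:
        # Following infectors upwards must reach the root without revisiting a vertex.
        seen: Set[str] = set()
        v: Optional[str] = start
        while v is not None:
            if v in seen:
                return False
            seen.add(v)
            v = tree[v]
    return True


def is_forest(forest: List[Tree]) -> bool:
    """All trees are valid and share exactly the same vertex set V."""
    vertices = set(forest[0])
    return all(is_transmission_tree(t) and set(t) == vertices for t in forest)


def out_degrees(tree: Tree) -> Counter:
    """Number of secondary infections (out-degree) of every case, including zeros."""
    counts = Counter({v: 0 for v in tree})
    counts.update(src for src in tree.values() if src is not None)
    return counts


t1 = {"a": None, "b": "a", "c": "a", "d": "b"}
t2 = {"a": None, "b": "a", "c": "b", "d": "b"}
print(is_forest([t1, t2]))  # True
print(sorted(edges(t1)))    # [('a', 'b'), ('a', 'c'), ('b', 'd')]
```

Because each infectee is a dictionary key, every non-root vertex automatically has exactly one infector. The validity check therefore only needs to confirm a single root, known infectors and the absence of cycles.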

χ² test

The χ² test compares the absolute frequencies of infector-infectee pairs (i.e., edges) between two epidemic forests $\mathcal{F}_A$ and $\mathcal{F}_B$. For each possible infector-infectee pair $(v_i, v_j)$, we count its occurrences across all m trees of a forest as:

$$N^A_{ij} = \sum_{k=1}^{m} \mathbb{1}\left[(v_i, v_j) \in E^A_k\right] \tag{1}$$

where $E^A_k$ is the edge set of the kth tree $T^A_k$ of $\mathcal{F}_A$, and $\mathbb{1}[\cdot]$ is the indicator function (yielding 1 if the pair appears in tree $T^A_k$, 0 otherwise). $N^B_{ij}$ is defined analogously for $\mathcal{F}_B$.

The χ² statistic for comparing forests $\mathcal{F}_A$ and $\mathcal{F}_B$ is:

$$\chi^2 = \sum_{(i,j) \in \mathcal{P}} \frac{\left(N^A_{ij} - N^B_{ij}\right)^2}{N^A_{ij} + N^B_{ij}} \tag{2}$$

where $\mathcal{P}$ includes only infector-infectee pairs observed in at least one forest. Under the null hypothesis that both forests stem from the same underlying frequency distribution of infector-infectee pairs, χ² approximately follows a chi-square distribution with $|\mathcal{P}| - 1$ degrees of freedom, where $|\mathcal{P}|$ denotes the number of unique infector-infectee pairs observed. To accommodate small counts, the non-parametric Monte Carlo version of the chi-square test (999 replicates) was then used [23,41]. This formulation assumes equal forest sizes ($m_A = m_B$). Under the null hypothesis that both forests are sampled from the same distribution of infector-infectee pairs, the expected count for pair (i, j) in forest $\mathcal{F}_A$ is $(N^A_{ij} + N^B_{ij})/2$, and similarly for $\mathcal{F}_B$. Substituting these expected values into the classical chi-squared formula and simplifying yields Equation 2. When forest sizes differ ($m_A \neq m_B$), the standard chi-squared formulation applies, with expected counts proportional to forest size. This generalisation is implemented in the mixtree package.
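A minimal Python sketch of Eqs 1 and 2 follows. It uses a hypothetical dictionary encoding of trees (infectee mapped to infector). It also computes the classical contingency-table statistic, which lets you check numerically that the two coincide when both forests contain the same number of edges. A Monte Carlo p-value is obtained by randomly re-splitting the pooled edge occurrences between the two forests. This keeps both the row totals (per pair) and the column totals (per forest) of the table fixed. The p-value is reported as (hits + 1)/(replicates + 1). This is our illustration of a Monte Carlo chi-square test, not mixtree’s implementation, and all function names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Optional, Tuple

Tree = Dict[str, Optional[str]]  # infectee -> infector (None for the index case)


def pair_counts(forest: List[Tree]) -> Counter:
    """Eq 1: N_ij = number of trees in the forest containing the pair (i, j)."""
    return Counter((src, dst) for tree in forest
                   for dst, src in tree.items() if src is not None)


def chi2_eq2(n_a: Counter, n_b: Counter) -> float:
    """Eq 2, valid when both forests contain the same total number of edges."""
    pairs = set(n_a) | set(n_b)  # P: pairs observed in at least one forest
    return sum((n_a[p] - n_b[p]) ** 2 / (n_a[p] + n_b[p]) for p in pairs)


def chi2_classical(n_a: Counter, n_b: Counter) -> float:
    """Classical statistic for the 2 x |P| table, expected = row * col / total."""
    tot_a, tot_b = sum(n_a.values()), sum(n_b.values())
    total = tot_a + tot_b
    stat = 0.0
    for p in set(n_a) | set(n_b):
        row = n_a[p] + n_b[p]
        for observed, col in ((n_a[p], tot_a), (n_b[p], tot_b)):
            expected = row * col / total
            stat += (observed - expected) ** 2 / expected
    return stat


def monte_carlo_chi2(forest_a: List[Tree], forest_b: List[Tree],
                     n_rep: int = 999, seed: int = 1) -> Tuple[float, float]:
    """Observed Eq 2 statistic and a Monte Carlo p-value with fixed margins."""
    n_a, n_b = pair_counts(forest_a), pair_counts(forest_b)
    size_a, size_b = sum(n_a.values()), sum(n_b.values())
    if size_a != size_b:
        raise ValueError("Eq 2 assumes equal forest sizes")
    observed = chi2_eq2(n_a, n_b)
    pooled = list((n_a + n_b).elements())  # every edge occurrence, forest labels dropped
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        rng.shuffle(pooled)  # a random split keeps row and column totals fixed
        simulated = chi2_eq2(Counter(pooled[:size_a]), Counter(pooled[size_a:]))
        if simulated >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_rep + 1)


t1 = {"a": None, "b": "a", "c": "a", "d": "b"}
t2 = {"a": None, "b": "a", "c": "b", "d": "b"}
forest_a, forest_b = [t1] * 3, [t2] * 3
stat, p_value = monte_carlo_chi2(forest_a, forest_b)
print(stat)  # 6.0
```

In this toy example, the pairs (a, c) and (b, c) each appear three times in one forest and never in the other. Each contributes 3² / 3 = 3, which gives χ² = 6.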

PERMANOVA

PERMANOVA is a generic approach for testing group differences using pairwise distances between all observations of a sample, and it makes no distributional assumptions [24]. Here, we apply it to test whether distances between transmission trees differ when the trees belong to the same epidemic forest versus different forests.

Distance between two transmission trees.

The field of phylogenetics offers a range of established methods for comparing tree structures, providing several distance metrics for quantifying topological differences between pairs of phylogenies [18,42–46]. These methods typically follow a two-step process: (i) convert trees into vectors of pairwise distances between all sampled taxa and (ii) compute Euclidean distances between these vectors.

A commonly used metric for the first step is the patristic distance [44], defined as the sum of branch lengths on the path separating two taxa, reflecting the evolutionary distance between them. Adapting this concept to transmission trees, we define the graph distance between cases (i.e., vertices) vi and vj as the sum of edge weights along their connecting path on the undirected graph. Since all edges here have a weight of 1, this distance directly corresponds to the number of transmission events between cases, carrying clear epidemiological meaning. An illustration of graph distances in a transmission tree is available in the supplementary material (S1 Fig).

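We denote by $\phi$ the function mapping a transmission tree T of size n into a vector of graph distances:

$$\phi(T) = \big(d_T(v_1, v_2),\, d_T(v_1, v_3),\, \dots,\, d_T(v_{n-1}, v_n)\big) \tag{3}$$

where $d_T(v_i, v_j)$ is the graph distance between $v_i$ and $v_j$ in T, taken over all pairs $1 \le i < j \le n$, so that $\phi(T)$ has $n(n-1)/2$ entries.

The dissimilarity between two trees $T_k$ and $T_l$ is then quantified by the Euclidean distance between the respective vectors of graph distances, calculated as the $L_2$ norm:

$$D(T_k, T_l) = \lVert \phi(T_k) - \phi(T_l) \rVert_2 = \sqrt{\sum_{1 \le i < j \le n} \big(d_{T_k}(v_i, v_j) - d_{T_l}(v_i, v_j)\big)^2} \tag{4}$$

This distance captures topological differences by evaluating how the relative positions of vertices, encoded as graph distances, diverge between the two trees. If $T_k$ and $T_l$ have identical edge sets, their graph distance matrices are equal, yielding $D(T_k, T_l) = 0$; otherwise, discrepancies in path lengths increase the distance.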
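The following Python sketch implements Eqs 3 and 4, assuming the same hypothetical dictionary encoding of trees (infectee mapped to infector). A breadth-first search on the undirected tree gives all graph distances with unit edge weights. `phi` lists these distances over pairs i < j in a fixed vertex order, and `math.dist` (Python 3.8+) returns the Euclidean distance. The function names are ours, chosen for illustration. Because distances are computed on the undirected graph, φ(T) depends only on the undirected structure of T.

```python
import math
from collections import deque
from itertools import combinations
from typing import Dict, List, Optional

Tree = Dict[str, Optional[str]]  # infectee -> infector (None for the index case)


def graph_distances(tree: Tree) -> Dict[str, Dict[str, int]]:
    """All-pairs graph distances on the undirected tree (every edge has weight 1)."""
    adjacency: Dict[str, List[str]] = {v: [] for v in tree}
    for dst, src in tree.items():
        if src is not None:
            adjacency[src].append(dst)
            adjacency[dst].append(src)
    distances = {}
    for source in tree:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            u = queue.popleft()
            for w in adjacency[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        distances[source] = dist
    return distances


def phi(tree: Tree, order: List[str]) -> List[int]:
    """Eq 3: graph distances for all pairs i < j in a fixed vertex order."""
    d = graph_distances(tree)
    return [d[a][b] for a, b in combinations(order, 2)]


def tree_distance(tree_k: Tree, tree_l: Tree) -> float:
    """Eq 4: Euclidean distance between phi(T_k) and phi(T_l)."""
    if set(tree_k) != set(tree_l):
        raise ValueError("trees must share the same vertex set")
    order = sorted(tree_k)  # the same vertex order for both trees
    return math.dist(phi(tree_k, order), phi(tree_l, order))


t1 = {"a": None, "b": "a", "c": "a", "d": "b"}
t2 = {"a": None, "b": "a", "c": "b", "d": "b"}
print(phi(t1, ["a", "b", "c", "d"]))  # [1, 1, 2, 2, 1, 3]
print(tree_distance(t1, t2))          # about 1.732, i.e. sqrt(3)
```

Moving case c from infector a to infector b changes three of the six pairwise distances by one each, so D = √3.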

Outline of the method.

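Given two epidemic forests, $\mathcal{F}_A$ and $\mathcal{F}_B$, each containing m transmission trees, we apply PERMANOVA to test whether tree topologies differ significantly between forests. Broadly, the method partitions pairwise distances between all trees into within-group ($SS_W$) and between-group ($SS_B$) components [24], based on pre-defined groups (here, the two forests). Statistical significance is assessed through permutation testing, where forest identifiers (e.g., ‘A’, ‘B’) are randomly reassigned multiple times.

We define the combined epidemic forest as $\mathcal{F} = \mathcal{F}_A \cup \mathcal{F}_B = \{T_1, \dots, T_{2m}\}$, containing all trees from $\mathcal{F}_A$ and $\mathcal{F}_B$. Writing $D(T_k, T_l)$ for the Euclidean distance between trees defined in Eq 4, the total sum of squares, $SS_T$, representing the overall variance across all trees in $\mathcal{F}$, is:

$$SS_T = \frac{1}{2m} \sum_{k=1}^{2m-1} \sum_{l=k+1}^{2m} D(T_k, T_l)^2 \tag{5}$$

The double summation computes squared pairwise distances amongst the 2m trees in $\mathcal{F}$, which decomposes into pairs within $\mathcal{F}_A$, pairs within $\mathcal{F}_B$, and pairs spanning both forests:

$$SS_T = \frac{1}{2m} \left[ \sum_{k=1}^{m-1} \sum_{l=k+1}^{m} D(T^A_k, T^A_l)^2 + \sum_{k=1}^{m-1} \sum_{l=k+1}^{m} D(T^B_k, T^B_l)^2 + \sum_{k=1}^{m} \sum_{l=1}^{m} D(T^A_k, T^B_l)^2 \right] \tag{6}$$

where $T^A_k$ and $T^B_k$ denote the kth trees of $\mathcal{F}_A$ and $\mathcal{F}_B$. The within-group sum of squares $SS_W$ measures the variance within forests:

$$SS_W = \frac{1}{m} \left[ \sum_{k=1}^{m-1} \sum_{l=k+1}^{m} D(T^A_k, T^A_l)^2 + \sum_{k=1}^{m-1} \sum_{l=k+1}^{m} D(T^B_k, T^B_l)^2 \right] \tag{7}$$

where each term sums the squared distances among all pairs within each forest, normalised by m. The between-group sum of squares ($SS_B$), capturing variability between the forests, is:

$$SS_B = SS_T - SS_W \tag{8}$$

The PERMANOVA test statistic [24] is:

$$F = \frac{SS_B / (2 - 1)}{SS_W / (2m - 2)} \tag{9}$$

The reference distribution of F under the null hypothesis of no differences between groups is generated by a Monte Carlo procedure in which forest identifiers are permuted a large number of times (999 by default). p-values are calculated as the proportion of permuted F-values exceeding the observed F [24].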
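For two groups, Eqs 5 to 9 reduce to a few lines, given a matrix of pairwise distances D(T_k, T_l) such as those from Eq 4. The Python sketch below is a generic two-group PERMANOVA written for illustration; the function names are hypothetical. It normalises each within-group sum by that group’s size, so it also runs for unequal forest sizes. It reports the p-value as (number of permuted F ≥ observed F, plus 1) divided by (permutations + 1). This is a common convention that counts the observed labelling as one of the permutations, and it differs slightly from a plain proportion.

```python
import random
from typing import Sequence, Tuple


def pseudo_f(dist: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Two-group PERMANOVA pseudo-F (Eqs 5-9) from a symmetric distance matrix."""
    n_total = len(labels)
    groups = sorted(set(labels))
    if len(groups) != 2:
        raise ValueError("this sketch compares exactly two forests")
    # Eq 5: all pairs, normalised by the total number of trees
    ss_t = sum(dist[i][j] ** 2
               for i in range(n_total) for j in range(i + 1, n_total)) / n_total
    # Eq 7: pairs within each forest, normalised by that forest's size
    ss_w = 0.0
    for g in groups:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        ss_w += sum(dist[i][j] ** 2
                    for a, i in enumerate(idx) for j in idx[a + 1:]) / len(idx)
    ss_b = ss_t - ss_w  # Eq 8
    # Eq 9 with two groups: (SS_B / 1) / (SS_W / (N - 2))
    return (ss_b / (len(groups) - 1)) / (ss_w / (n_total - len(groups)))


def permanova(dist: Sequence[Sequence[float]], labels: Sequence[int],
              n_perm: int = 999, seed: int = 1) -> Tuple[float, float]:
    """Observed pseudo-F and a permutation p-value (forest labels shuffled)."""
    observed = pseudo_f(dist, labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy example: four "trees" summarised as points on a line, two per forest.
x = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(a - b) for b in x] for a in x]
f_obs, p = permanova(dist, [0, 0, 1, 1])
print(f_obs)  # 200.0
```

In the toy example, SS_T = 404/4 = 101 and SS_W = 1/2 + 1/2 = 1. This gives SS_B = 100 and F = 100 / (1/2) = 200, which matches a one-way ANOVA on the same four values.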

Simulation study

We conducted a simulation study to evaluate the performance of the test and PERMANOVA in distinguishing between simulated epidemic forests drawn from distinct generative processes corresponding to different epidemic dynamics. The simulation framework is illustrated in Fig 3.

  1. We simulate a reference transmission tree $T_{\text{ref}}$ with n infections from offspring distribution NegBin(R0, k). This process is repeated 100 times to account for the stochasticity of epidemic dynamics.
  2. We generate reconstructed forests $\mathcal{F}_A$ and $\mathcal{F}_B$, each containing m trees, by re-assigning infector-infectee relationships from NegBin($R_0^A$, $k^A$) and NegBin($R_0^B$, $k^B$), conditional on $T_{\text{ref}}$’s dates of infection and case identifiers. Fig 3 illustrates this step for one example pair of forests.
  3. The χ² test and PERMANOVA are applied to test whether the two epidemic forests stem from the same generative process.
Fig 3. Simulation framework for assessing the performance of the χ² test and PERMANOVA.

Diagram illustrating the simulation study to assess the respective performances of the χ² test and PERMANOVA for detecting differences between pairs of epidemic forests.

https://doi.org/10.1371/journal.pcbi.1014271.g003

Simulating epidemic forests.

We generated epidemic forests through a three-stage process to systematically evaluate forest comparison methods across diverse transmission scenarios.

First, we defined the parameter space for the simulations:

  • Epidemic size: n ∈ {20, 50, 100, 200}. The number of infected individuals, corresponding to the number of vertices in the tree.
  • Basic reproduction number: R0, ranging from 1.5 to 3. The mean number of secondary infections per case in a fully susceptible population, corresponding to the mean of the negative binomial offspring distribution.
  • Dispersion parameter: k, ranging from 0.1 to Poisson-like. Controls heterogeneity in individual transmission, corresponding to the dispersion of the negative binomial offspring distribution. Lower values indicate greater overdispersion; as k → ∞, the distribution converges to a Poisson distribution.

For each epidemic size n, we defined offspring distributions NegBin(R0, k) using all pairwise combinations of basic reproduction number R0 and dispersion parameter k.

Second, we generated reference transmission trees. For each parameter combination, we simulated a reference transmission tree $T_{\text{ref}}$ using a stochastic branching process. Secondary infections per case were drawn from NegBin(R0, k), and generation times followed a gamma distribution with a mean of 12 days and a standard deviation of 6 days (see supplementary S1 File). We generated 100 replicate trees per parameter set to account for stochasticity. Simulations were initialised with 10,000 susceptible individuals, ran for a maximum of 365 days, and terminated upon reaching exactly n infections, thereby excluding saturation effects. Within each reference tree $T_{\text{ref}}$, infected individuals were assigned identifiers $v_1, \dots, v_n$, ordered by their dates of infection $t_v$.
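A simplified Python sketch of this stage follows, assuming a heap-based, event-driven branching process. This is our illustration, not the authors’ code, which is described in S1 File. Each case draws its number of secondary infections from NegBin(R0, k). Here the draw is made as a Poisson-Gamma mixture, which has mean R0 and variance R0 + R0²/k. Each child is infected after a gamma-distributed generation time with mean 12 days and standard deviation 6 days (shape 4, scale 3). Infections are processed in chronological order until exactly n have occurred. Three further choices are our own assumptions: restarting when an outbreak dies out or runs past 365 days, ignoring depletion of the 10,000 susceptibles, and the function names.

```python
import heapq
import random
from typing import List, Optional, Tuple


def negbin_offspring(rng: random.Random, r0: float, k: float) -> int:
    """NegBin(mean r0, dispersion k) drawn as a Poisson-Gamma mixture."""
    lam = rng.gammavariate(k, r0 / k)  # individual reproduction number
    # Poisson(lam): count unit-rate exponential arrivals falling in [0, lam]
    count, t = 0, rng.expovariate(1.0)
    while t <= lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_reference_tree(n: int, r0: float, k: float, gt_mean: float = 12.0,
                            gt_sd: float = 6.0, max_days: float = 365.0,
                            seed: Optional[int] = None, max_tries: int = 10_000
                            ) -> Tuple[List[Optional[int]], List[float]]:
    """Return (infector, infection_time) lists for ids 0..n-1 in infection order."""
    rng = random.Random(seed)
    shape, scale = (gt_mean / gt_sd) ** 2, gt_sd ** 2 / gt_mean  # gamma parameters
    for _attempt in range(max_tries):
        # Heap entries: (infection time, unique tie-breaker, infector id)
        pending: List[Tuple[float, int, Optional[int]]] = [(0.0, 0, None)]
        infector: List[Optional[int]] = []
        times: List[float] = []
        tie = 1
        while pending and len(times) < n:
            t, _, src = heapq.heappop(pending)  # earliest pending infection
            if t > max_days:
                break
            case_id = len(times)  # ids follow the order of infection
            infector.append(src)
            times.append(t)
            for _child in range(negbin_offspring(rng, r0, k)):
                child_time = t + rng.gammavariate(shape, scale)
                heapq.heappush(pending, (child_time, tie, case_id))
                tie += 1
        if len(times) == n:
            return infector, times
    raise RuntimeError("no simulated outbreak reached n cases")


infector, times = simulate_reference_tree(n=50, r0=2.0, k=0.5, seed=42)
print(infector[:5], [round(t, 1) for t in times[:5]])
```

Popping the earliest pending infection from the heap guarantees that cases are numbered in order of infection time. Every infector is therefore assigned a smaller identifier than its infectees.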

Third, we constructed epidemic forests by reassigning cases’ ancestries. For each reference tree, we generated forests by conditioning on the observed set of infections while resampling ancestries under each candidate offspring distribution (see supplementary S1 File). Each forest comprised m = 200 trees. This procedure yielded 15 distinct forests per reference tree (one for each offspring distribution pair), including one forest that matched the reference tree’s generative process, i.e., with the same R0 and k as the reference tree.

In total, this procedure generated 6,000 reference trees (100 replicates for each of 60 parameter combinations), each yielding 15 distinct forests and therefore 120 pairwise forest comparisons per reference tree, for a total of 720,000 forest comparisons.
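The design counts can be checked with a few lines of arithmetic. Some figures are inferred rather than stated: 60 parameter combinations (6,000 trees / 100 replicates), hence 4 epidemic sizes (60 / 15 offspring distributions), and the split of the 120 comparisons per tree into 105 pairs of forests with different offspring distributions plus 15 same-parameter comparisons. That split is our reading, chosen because it reproduces the n = 5,040,000 H1 tests used for the sensitivity regression.

```python
from math import comb

# Figures stated in the text.
REFERENCE_TREES = 6_000
REPLICATES = 100
FORESTS_PER_TREE = 15        # one forest per (R0, k) pair
FOREST_SIZES = 4             # number of values of m
METHODS = 2                  # chi-square test and PERMANOVA

# Derived figures (inferred, see lead-in).
param_combinations = REFERENCE_TREES // REPLICATES         # 60
epidemic_sizes = param_combinations // FORESTS_PER_TREE    # 4
h1_per_tree = comb(FORESTS_PER_TREE, 2)                    # 105 different-parameter pairs
h0_per_tree = FORESTS_PER_TREE                             # 15 same-parameter comparisons
per_tree = h1_per_tree + h0_per_tree                       # 120

forest_comparisons = REFERENCE_TREES * per_tree                     # 720,000
tests = forest_comparisons * FOREST_SIZES * METHODS                 # 5,760,000
h1_tests = REFERENCE_TREES * h1_per_tree * FOREST_SIZES * METHODS   # 5,040,000
forests = REFERENCE_TREES * FORESTS_PER_TREE                        # 90,000, as in S1 Table
print(forest_comparisons, tests, h1_tests, forests)
```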

Assessing statistical performance.

For each of the 720,000 forest comparisons, we performed the χ² test and PERMANOVA under 4 forest sizes m, where m denotes the number of trees sampled from each forest. This resulted in a total of 5,760,000 tests (720,000 comparisons × 4 forest sizes × 2 methods). For each parameter combination, we measured:

  • Sensitivity: The proportion of tests that correctly rejected the null hypothesis (H0) when comparing forests generated with different offspring distributions.
  • Specificity: The proportion of tests that correctly did not reject H0 when comparing forests generated with identical offspring distributions.
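As a sketch of how these two rates can be computed from test outcomes, assuming a significance level of α = 0.05 (the threshold used in S3 Fig) and a hypothetical boolean flag marking comparisons whose forests share the same offspring distribution:

```python
import numpy as np


def sensitivity_specificity(p_values, same_params, alpha=0.05):
    """Rejection rates split by whether the two forests share (R0, k).

    same_params[i] is True when comparison i involves identical offspring distributions.
    """
    p = np.asarray(p_values, dtype=float)
    same = np.asarray(same_params, dtype=bool)
    reject = p < alpha
    sensitivity = float(reject[~same].mean())    # H1 comparisons where H0 was rejected
    specificity = float((~reject[same]).mean())  # H0 comparisons where H0 was not rejected
    return sensitivity, specificity


sens, spec = sensitivity_specificity(
    [0.001, 0.2, 0.03, 0.5, 0.01, 0.9],
    [False, False, False, True, True, True],
)
print(sens, spec)   # two of three in each group: 2/3 and 2/3
```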

To quantify the factors influencing test sensitivity, we fitted a logistic regression model to all comparisons in which forests were generated under different parameter settings (H1; n = 5,040,000). The binary outcome was whether the test correctly rejected the null hypothesis (H0). We compared four nested models using the Akaike Information Criterion (AIC) and selected the model with the lowest AIC. The final model included main effects for statistical method (PERMANOVA or χ² test), forest size (m), epidemic size, and the differences in R0 and in k between forests. It also included all two-way and three-way interaction terms involving the method:

(10)

where ‘:’ denotes an interaction term and e denotes the residual error term. Results are reported as odds ratios, with the χ² test, the smallest forest size (m = 20), the smallest epidemic size, and no difference in dispersion as reference categories. The model achieved a Tjur pseudo R² (coefficient of discrimination) of 0.58 [25].
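Tjur's coefficient of discrimination [25] is the mean fitted probability of rejection among comparisons where the test rejected H0, minus the mean fitted probability among those where it did not; odds ratios are the exponentiated logistic-regression coefficients. A minimal sketch, assuming fitted probabilities are already available from whatever software fitted the model (function names are hypothetical):

```python
import numpy as np


def tjur_d(rejected, fitted_prob):
    """Tjur's coefficient of discrimination: mean fitted probability among
    observed rejections minus mean fitted probability among non-rejections."""
    y = np.asarray(rejected, dtype=bool)
    p = np.asarray(fitted_prob, dtype=float)
    return float(p[y].mean() - p[~y].mean())


def odds_ratio(coef):
    """Odds ratio for a logistic-regression coefficient (log-odds scale)."""
    return float(np.exp(coef))


# Toy example with four hypothetical comparisons.
d = tjur_d([1, 1, 0, 0], [0.9, 0.7, 0.2, 0.4])   # 0.8 - 0.3, i.e. 0.5 up to rounding
print(d, odds_ratio(np.log(2.0)))                # odds ratio of about 2
```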

Both methods achieved near-perfect specificity (>97%) across all conditions, leaving too few false positives for a meaningful regression analysis of specificity.

Supporting information

S1 File. Description of the methods for generating epidemic forests.

https://doi.org/10.1371/journal.pcbi.1014271.s001

(PDF)

S1 Fig. Calculation of graph distances in a transmission tree.

We define the graph distance between two vertices as the number of edges on the path that connects them. Epidemiologically, this corresponds to the number of transmission events that separate these two cases. A. A transmission tree with 5 cases. The coloured dashed lines show the unique paths connecting case 5 to all other cases. B. The matrix representation of graph distances between all pairs of cases. The coloured numbers correspond to the number of transmission events between case 5 and all other cases, matching the coloured paths shown in A. For illustration, only the lower triangle is shown; the matrix is symmetric.

https://doi.org/10.1371/journal.pcbi.1014271.s002

(TIF)
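Because a transmission tree is connected and acyclic, the graph distance between two cases can be obtained by a breadth-first search from each case, ignoring edge direction. A minimal sketch, assuming a tree is stored as a hypothetical list `infector` in which `infector[j]` is the index of the infector of case j (`None` for the index case); the 5-case tree in the usage example is illustrative and is not the tree drawn in panel A.

```python
from collections import deque


def graph_distances(infector):
    """Matrix of graph distances (number of transmission events) between all cases.

    infector[j] is the index of the infector of case j, or None for the index case.
    """
    n = len(infector)
    neighbours = [[] for _ in range(n)]
    for child, parent in enumerate(infector):
        if parent is not None:              # edges are treated as undirected
            neighbours[child].append(parent)
            neighbours[parent].append(child)
    dist = []
    for source in range(n):                 # one breadth-first search per case
        d = [-1] * n
        d[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in neighbours[u]:
                if d[v] < 0:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist.append(d)
    return dist


# Hypothetical 5-case tree: case 0 infects 1 and 2; case 1 infects 3 and 4.
D = graph_distances([None, 0, 0, 1, 1])
print(D[4])   # [2, 1, 3, 2, 0]
```

Each search visits n vertices over n − 1 edges, so the full matrix costs on the order of n² operations per tree.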

S2 Fig. Performance of the χ² test and PERMANOVA for distinguishing epidemic forests.

Median p-values and interquartile ranges for the χ² test (squares) and PERMANOVA (circles) across epidemic sizes (x-axis), forest sizes (columns), and parameter conditions (rows). Grey shading indicates the desired p-value ranges: below the significance threshold when forests differ in at least one parameter (rows 1–3, reject H0) and above it when forests share identical parameters (row 4, do not reject H0).

https://doi.org/10.1371/journal.pcbi.1014271.s003

(EPS)

S3 Fig. Performance of the χ² test and PERMANOVA in distinguishing epidemic forests.

Each panel shows the proportion of tests rejecting the null hypothesis (p < 0.05) when comparing two epidemic forests. The upper triangle shows χ² test results; the lower triangle shows PERMANOVA results. Outer columns refer to epidemic size, which is common to both forests. Forests can differ in their offspring distribution parameters: R0 (inner columns) and k (x and y axes; x-axis labels are omitted for clarity, values are identical to the y-axis). Red diagonal lines indicate comparisons where both forests share identical parameters (H0 true; low rejection rates indicate good specificity). The other cells compare forests with different parameters (high rejection rates indicate good sensitivity). Both methods maintain excellent specificity (diagonal), but PERMANOVA demonstrates superior sensitivity.

https://doi.org/10.1371/journal.pcbi.1014271.s004

(EPS)

S4 Fig. Sensitivity of PERMANOVA when comparing epidemic forests that differ only in R0.

Sensitivity (y-axis) is the proportion of tests correctly rejecting the null hypothesis when comparing forests of 200 trees generated with different R0 (colour) but identical k (x-axis). Columns correspond to epidemic size.

https://doi.org/10.1371/journal.pcbi.1014271.s005

(EPS)

S5 Fig. Variation in infector-infectee relationships across epidemic forests. Histogram of mean scaled entropy across epidemic forests (m = 200 trees per forest), stratified by simulation parameters.

For each infectee j, the scaled entropy $H_j$ quantifies variation in their assigned infector across all trees in a forest, computed using the normalised Shannon entropy formula [49]: $H_j = -\frac{1}{\log K_j}\sum_{i} p_{ij}\log p_{ij}$, where $p_{ij}$ is the proportion of trees in which individual i infects j, and $K_j$ is the number of distinct infectors of j observed across the forest ($H_j = 0$ when $K_j = 1$). Values range from 0 (identical infector in all trees) to 1 (all observed infectors equally frequent). The mean scaled entropy ($\bar{H}$, x axis) is obtained by averaging $H_j$ over all cases. Columns refer to epidemic sizes; rows refer to the mean reproduction number R0 and dispersion parameter k of the negative binomial offspring distribution. The average mean scaled entropy in our simulations is 77%, and it increases with epidemic size, reflecting greater variation in infector assignment (S1 Table).

https://doi.org/10.1371/journal.pcbi.1014271.s006

(EPS)
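A minimal sketch of the scaled entropy, assuming each tree in a forest is stored as a list where entry j is the infector of case j (`None` for the index case) and taking $H_j = 0$ when only one infector is ever observed. The names `scaled_entropy` and `mean_scaled_entropy` are hypothetical.

```python
import math
from collections import Counter


def scaled_entropy(infectors_across_trees):
    """Normalised Shannon entropy of one infectee's infector across trees."""
    counts = Counter(infectors_across_trees)
    k = len(counts)                      # number of distinct infectors observed
    if k <= 1:
        return 0.0                       # same infector in every tree
    total = sum(counts.values())
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(k)               # divide by the maximum entropy, log K_j


def mean_scaled_entropy(forest):
    """Average H_j over all cases; forest is a list of trees of equal length."""
    n_cases = len(forest[0])
    values = [scaled_entropy([tree[j] for tree in forest]) for j in range(n_cases)]
    return sum(values) / n_cases


# Hypothetical forest of three trees over four cases.
forest = [[None, 0, 0, 1], [None, 0, 1, 1], [None, 0, 0, 2]]
print(round(mean_scaled_entropy(forest), 4))   # 0.4591
```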

S1 Table. Linear regression results for the mean scaled entropy model.

The mean scaled entropy $\bar{H}$ of each forest was modelled as a linear function of epidemic size, reproduction number, and dispersion:

$$\bar{H} = \beta_0 + \beta_{\text{size}} + \beta_{R_0} + \beta_{k} + \varepsilon \qquad (11)$$

where $\beta_0$ is the intercept, $\beta_{\text{size}}$, $\beta_{R_0}$ and $\beta_{k}$ represent the categorical effects of epidemic size, reproduction number (R0) and dispersion parameter (k), respectively, and $\varepsilon$ is the residual error. The model explained 68.2% of the variance in mean scaled entropy (R² = 0.682) across 90,000 simulated forests; coefficient estimates are shown in this table.

https://doi.org/10.1371/journal.pcbi.1014271.s007

(TEX)

S6 Fig. ROC curves for the χ² test and PERMANOVA. Each panel shows the receiver operating characteristic (ROC) curves, plotting the true positive rate (sensitivity) against the false positive rate (1 − specificity) for the χ² test (blue) and PERMANOVA (orange) across all simulations and all possible significance thresholds.

Panels are arranged by epidemic size (columns: 20–200 cases) and forest size (rows: 20–200 trees). X-axis tick labels are omitted for clarity, as both axes share the same scale.

https://doi.org/10.1371/journal.pcbi.1014271.s008

(EPS)

S7 Fig. Area under the curve (AUC) for the χ² test and PERMANOVA. This figure shows the AUC derived from the receiver operating characteristic (ROC) curves (supplementary S6 Fig) of the two tests evaluated in our simulations.

The y-axis displays the AUC value, with higher values corresponding to better performance. An AUC of 1 corresponds to a test with perfect sensitivity and specificity. The x-axis displays the epidemic size, i.e., the number of cases in the simulated epidemics, and panels correspond to the forest size, i.e., the number of trees in each forest.

https://doi.org/10.1371/journal.pcbi.1014271.s009

(EPS)
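Since a test rejects H0 when p ≤ α, sweeping α over all observed p-values traces the ROC curve, and the area under it equals the probability that a p-value from a comparison of differing forests is smaller than one from a comparison of same-parameter forests (ties counted as one half). A sketch under those assumptions, with hypothetical function names and toy p-values:

```python
import numpy as np


def roc_from_pvalues(p_h1, p_h0):
    """ROC points (FPR, TPR) when H0 is rejected for p <= alpha, sweeping alpha."""
    p_h1 = np.asarray(p_h1, dtype=float)
    p_h0 = np.asarray(p_h0, dtype=float)
    alphas = np.concatenate([[-np.inf], np.unique(np.concatenate([p_h1, p_h0]))])
    tpr = np.array([(p_h1 <= a).mean() for a in alphas])   # sensitivity
    fpr = np.array([(p_h0 <= a).mean() for a in alphas])   # 1 - specificity
    return fpr, tpr


def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))


def auc_rank(p_h1, p_h0):
    """Equivalent area: P(p_H1 < p_H0) + 0.5 * P(p_H1 == p_H0)."""
    a = np.asarray(p_h1, dtype=float)[:, None]
    b = np.asarray(p_h0, dtype=float)[None, :]
    return float((a < b).mean() + 0.5 * (a == b).mean())


p_h1 = [0.01, 0.02, 0.3]     # hypothetical p-values, forests differ
p_h0 = [0.04, 0.5, 0.9]      # hypothetical p-values, forests share parameters
fpr, tpr = roc_from_pvalues(p_h1, p_h0)
print(auc_trapezoid(fpr, tpr), auc_rank(p_h1, p_h0))   # both about 0.889 (8/9)
```

The rank form avoids building the curve at all, which is convenient when only the AUC is needed.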

S2 Table. Benchmark execution times (in seconds) for the χ² test and PERMANOVA when comparing two epidemic forests of 100 trees, each tree having 100 vertices.

Both tests used 999 permutations (PERMANOVA) or Monte Carlo replicates (χ² test), without parallelisation, and each benchmark was replicated 100 times per method. For the χ² test, we count how often each infector-infectee pair occurs across all trees in each forest. In the worst-case scenario, where every possible infector-infectee pair (i.e., n(n − 1) pairs) appears at least once in either forest, the computational time of the χ² test increases with the number of trees in each forest (m) and with the square of the number of cases (n²), since it considers all infector-infectee pairs for every tree (see Methods, Fig 1). The overall computational time therefore increases as a function of mn². PERMANOVA, on the other hand, involves two steps. First, it computes pairwise distances between all vertices within each tree (supplementary S1 Fig), which scales with n² for a given tree. Second, it calculates pairwise distances between all trees (Fig 1), which scales with m². Therefore the overall computational time will increase as a function of .

https://doi.org/10.1371/journal.pcbi.1014271.s010

(TEX)
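The counting step described above can be sketched as a Pearson χ² test [23] on a 2 × K contingency table whose columns are the K infector-infectee pairs observed in either forest and whose cells count how many trees of each forest contain that pair, with a Monte Carlo p-value obtained by reshuffling pair occurrences between forests while keeping both table margins fixed [41]. This is an assumption-laden sketch rather than the mixtree [28] implementation, and all function names are hypothetical; in R, `chisq.test(..., simulate.p.value = TRUE, B = 999)` gives a comparable fixed-margin Monte Carlo p-value.

```python
import numpy as np
from collections import Counter


def pair_counts(forest):
    """Count, over all trees, how often each (infector, infectee) pair occurs.
    Each tree is a list where tree[j] is the infector of case j (None = index case)."""
    counts = Counter()
    for tree in forest:
        for infectee, infector in enumerate(tree):
            if infector is not None:
                counts[(infector, infectee)] += 1
    return counts


def chisq_stat(table):
    """Pearson chi-square statistic for a contingency table."""
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / table.sum()
    return float(((table - expected) ** 2 / expected).sum())


def chisq_monte_carlo(forest_a, forest_b, n_sim=999, seed=None):
    """Return (statistic, Monte Carlo p-value) comparing two forests."""
    ca, cb = pair_counts(forest_a), pair_counts(forest_b)
    pairs = sorted(set(ca) | set(cb))
    table = np.array([[ca[p] for p in pairs], [cb[p] for p in pairs]], dtype=float)
    stat = chisq_stat(table)

    # One entry per pair occurrence: its column (fixed) and its forest (shuffled).
    n_cols = len(pairs)
    col_of = np.repeat(np.arange(n_cols), table.sum(axis=0).astype(int))
    row_of = np.repeat([0, 1], table.sum(axis=1).astype(int))
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_sim):
        rows = rng.permutation(row_of)   # keeps row and column totals fixed
        sim = np.bincount(rows * n_cols + col_of, minlength=2 * n_cols)
        if chisq_stat(sim.reshape(2, n_cols).astype(float)) >= stat:
            exceed += 1
    return stat, (exceed + 1) / (n_sim + 1)


# Hypothetical forests of 20 identical trees each, differing in two ancestries.
forest_a = [[None, 0, 0, 1]] * 20
forest_b = [[None, 0, 1, 2]] * 20
stat, p = chisq_monte_carlo(forest_a, forest_b, n_sim=199, seed=3)
```

Each simulated table draws from the hypergeometric distribution of tables with the observed margins, which is the null distribution the Monte Carlo p-value targets.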

References

  1. Jombart T, Eggo RM, Dodd PJ, Balloux F. Reconstructing disease outbreaks from genetic data: a graph approach. Heredity (Edinb). 2011;106(2):383–90. pmid:20551981
  2. Wang L, Didelot X, Yang J, Wong G, Shi Y, Liu W, et al. Inference of person-to-person transmission of COVID-19 reveals hidden super-spreading events during the early outbreak phase. Nat Commun. 2020;11(1):5006. pmid:33024095
  3. Frieden TR, Lee CT. Identifying and Interrupting Superspreading Events-Implications for Control of Severe Acute Respiratory Syndrome Coronavirus 2. Emerg Infect Dis. 2020;26(6):1059–66. pmid:32187007
  4. Lloyd-Smith JO, Schreiber SJ, Kopp PE, Getz WM. Superspreading and the effect of individual variation on disease emergence. Nature. 2005;438(7066):355–9. pmid:16292310
  5. Abbas M, Cori A, Cordey S, Laubscher F, Robalo Nunes T, Myall A. Reconstruction of transmission chains of SARS-CoV-2 amidst multiple outbreaks in a geriatric acute-care hospital: a combined retrospective epidemiological and genomic study. eLife. 2022;11:e76854.
  6. Didelot X, Gardy J, Colijn C. Bayesian inference of infectious disease transmission from whole-genome sequence data. Mol Biol Evol. 2014;31(7):1869–79. pmid:24714079
  7. Didelot X, Fraser C, Gardy J, Colijn C. Genomic infectious disease epidemiology in partially sampled and ongoing outbreaks. Molecular Biology and Evolution. 2017;34(4):997–1007.
  8. Geismar C, White PJ, Cori A, Jombart T. Sorting out assortativity: When can we assess the contributions of different population groups to epidemic transmission? PLoS One. 2024;19(12):e0313037. pmid:39621588
  9. Abbas M, Robalo Nunes T, Cori A, Cordey S, Laubscher F, Baggio S, et al. Explosive nosocomial outbreak of SARS-CoV-2 in a rehabilitation clinic: the limits of genomics for outbreak reconstruction. J Hosp Infect. 2021;117:124–34. pmid:34461177
  10. Kremer C, Torneri A, Libin PJK, Meex C, Hayette M-P, Bontems S, et al. Reconstruction of SARS-CoV-2 outbreaks in a primary school using epidemiological and genomic data. Epidemics. 2023;44:100701. pmid:37379776
  11. Campbell F, Strang C, Ferguson N, Cori A, Jombart T. When are pathogen genome sequences informative of transmission events? PLoS Pathog. 2018;14(2):e1006885. pmid:29420641
  12. Duault H, Durand B, Canini L. Methods Combining Genomic and Epidemiological Data in the Reconstruction of Transmission Trees: A Systematic Review. Pathogens. 2022;11(2):252. pmid:35215195
  13. De Maio N, Wu CH, Wilson DJ. SCOTTI: efficient reconstruction of transmission within outbreaks with the structured coalescent. PLoS Computational Biology. 2016;12(9):e1005130.
  14. Gelman A, Rubin DB. Inference from Iterative Simulation Using Multiple Sequences. Statistical Science. 1992;7(4):457–72.
  15. Lambert B. A student’s guide to Bayesian statistics. SAGE Publications Ltd. 2018.
  16. Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A. Bayesian Measures of Model Complexity and Fit. Journal of the Royal Statistical Society Series B: Statistical Methodology. 2002;64(4):583–639.
  17. Felsenstein J. Confidence limits on phylogenies: an approach using the bootstrap. Evolution. 1985;39(4):783–91.
  18. Jombart T, Kendall M, Almagro-Garcia J, Colijn C. treespace: Statistical exploration of landscapes of phylogenetic trees. Mol Ecol Resour. 2017;17(6):1385–92. pmid:28374552
  19. Jombart T, Cori A, Didelot X, Cauchemez S, Fraser C, Ferguson N. Bayesian reconstruction of disease outbreaks by combining epidemiologic and genomic data. PLoS Comput Biol. 2014;10(1):e1003457. pmid:24465202
  20. Klinkenberg D, Backer JA, Didelot X, Colijn C, Wallinga J. Simultaneous inference of phylogenetic and transmission trees in infectious disease outbreaks. PLoS Comput Biol. 2017;13(5):e1005495. pmid:28545083
  21. Gibbons A. Algorithmic Graph Theory. Cambridge University Press; 1985.
  22. Hall M, Woolhouse M, Rambaut A. Epidemic Reconstruction in a Phylogenetics Framework: Transmission Trees as Partitions of the Node Set. PLoS Comput Biol. 2015;11(12):e1004613. pmid:26717515
  23. Pearson K. X. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science. 1900;50(302):157–75.
  24. Anderson MJ. A new method for non‐parametric multivariate analysis of variance. Austral Ecology. 2001;26(1):32–46.
  25. Tjur T. Coefficients of Determination in Logistic Regression Models—A New Proposal: The Coefficient of Discrimination. The American Statistician. 2009;63(4):366–72.
  26. Oksanen J, Simpson GL, Blanchet FG, Kindt R, Legendre P, Minchin PR. vegan: Community Ecology Package. 2025.
  27. R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. 2025.
  28. Geismar C. Mixtree: A statistical framework for comparing sets of trees. 2025. https://cran.r-project.org/web/packages/mixtree/index.html
  29. Campbell F, Cori A, Ferguson N, Jombart T. Bayesian inference of transmission chains using timing of symptoms, pathogen genomes and contact data. PLoS Comput Biol. 2019;15(3):e1006930. pmid:30925168
  30. Bang-Jensen J, Gutin GZ. Connectivity of digraphs. Digraphs: Theory, Algorithms and Applications. London: Springer. 2009. p. 191–226. https://doi.org/10.1007/978-1-84800-998-1_5
  31. Geismar C, Nguyen V, Fragaszy E, Shrotri M, Navaratnam AMD, Beale S, et al. Bayesian reconstruction of SARS-CoV-2 transmissions highlights substantial proportion of negative serial intervals. Epidemics. 2023;44:100713. pmid:37579586
  32. Cori A, Ferguson NM, Fraser C, Cauchemez S. A new framework and software to estimate time-varying reproduction numbers during epidemics. Am J Epidemiol. 2013;178(9):1505–12. pmid:24043437
  33. Gudmundarson RL, Peters GW. GTST: A Python Package for Graph Two-Sample Testing. Journal of Open Research Software. 2024.
  34. Gross JL, Yellen J, Anderson M. Graph theory and its applications. 3 ed. New York: Chapman and Hall/CRC. 2018. https://doi.org/10.1201/9780429425134
  35. Chen PP-S. The entity-relationship model—toward a unified view of data. ACM Trans Database Syst. 1976;1(1):9–36.
  36. Zerbino DR, Birney E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008;18(5):821–9. pmid:18349386
  37. Balaban AT. Applications of graph theory in chemistry. J Chem Inf Comput Sci. 1985;25(3):334–43.
  38. Granovetter MS. The Strength of Weak Ties. American Journal of Sociology. 1973;78(6):1360–80.
  39. Drummond AJ, Rambaut A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol. 2007;7:214. pmid:17996036
  40. Watson HW, Galton F. On the Probability of the Extinction of Families. The Journal of the Anthropological Institute of Great Britain and Ireland. 1875;4:138.
  41. Bradley DR, Cutcomb S. Monte Carlo simulations and the chi-square test of independence. Behav Res Meth Instru. 1977;9(2):193–201.
  42. Pavoine S, Ollier S, Pontier D, Chessel D. Testing for phylogenetic signal in phenotypic traits: new matrices of phylogenetic proximities. Theor Popul Biol. 2008;73(1):79–91. pmid:18022657
  43. Robinson DF, Foulds LR. Comparison of weighted labelled trees. Combinatorial Mathematics VI. Berlin, Heidelberg: Springer. 1979. p. 119–26. https://doi.org/10.1007/BFb0102690
  44. Steel MA, Penny D. Distributions of Tree Comparison Metrics-Some New Results. Systematic Biology. 1993;42(2):126.
  45. Colijn C, Plazzotta G. A Metric on Phylogenetic Tree Shapes. Syst Biol. 2018;67(1):113–26. pmid:28472435
  46. Kendall M, Ayabina D, Xu Y, Stimson J, Colijn C. Estimating Transmission from Genetic and Epidemiological Data: A Metric to Compare Transmission Trees. Statist Sci. 2018;33(1).
  47. CyGei. CyGei/mixtree_analysis: mixtree-analysis. Zenodo. 2025. https://doi.org/10.5281/zenodo.17704759
  48. Geismar C. Mixtree-analysis-data. Zenodo. 2025. https://doi.org/10.5281/zenodo.17704456
  49. Shannon CE. A Mathematical Theory of Communication. Bell System Technical Journal. 1948;27(3):379–423.