EpiFusion: Joint inference of the effective reproduction number by integrating phylodynamic and epidemiological modelling with particle filtering

Ciara Judge; Timothy Vaughan; Timothy Russell; Sam Abbott; Louis du Plessis; Tanja Stadler; Oliver Brady; Sarah Hill

doi:10.1371/journal.pcbi.1012528

Abstract

Accurately estimating the effective reproduction number (R_t) of a circulating pathogen is a fundamental challenge in the study of infectious disease. The fields of epidemiology and pathogen phylodynamics both share this goal, but to date, methodologies and data employed by each remain largely distinct. Here we present EpiFusion: a joint approach that can be used to harness the complementary strengths of each field to improve estimation of outbreak dynamics for large and poorly sampled epidemics, such as arboviral or respiratory virus outbreaks, and validate it for retrospective analysis. We propose a model of R_t that estimates outbreak trajectories conditional upon both phylodynamic (time-scaled trees estimated from genetic sequences) and epidemiological (case incidence) data. We simulate stochastic outbreak trajectories that are weighted according to epidemiological and phylodynamic observation models and fit using particle Markov Chain Monte Carlo. To assess performance, we test EpiFusion on simulated outbreaks in which transmission and/or surveillance rapidly changes and find that using EpiFusion to combine epidemiological and phylodynamic data maintains accuracy and increases certainty in trajectory and R_t estimates, compared to when each data type is used alone. We benchmark EpiFusion’s performance against existing methods to estimate R_t and demonstrate advances in speed and accuracy. Importantly, our approach scales efficiently with dataset size. Finally, we apply our model to estimate R_t during the 2014 Ebola outbreak in Sierra Leone. EpiFusion is designed to accommodate future extensions that will improve its utility, such as explicitly modelling population structure, accommodations for phylogenetic uncertainty, and the ability to weight the contributions of genomic or case incidence to the inference.

Author summary

Understanding infectious disease spread is fundamental to protecting public health, but can be challenging as disease spread is a phenomenon that cannot be directly observed. So, epidemiologists use data in conjunction with mathematical models to estimate disease dynamics. Often, combinations of different models and data can be used to answer the same questions: for example, ‘traditional’ epidemiology commonly uses case incidence data (the number of people who have tested positive for a disease during a certain time period), whereas phylodynamic models use pathogen genomic sequence data and our knowledge of the way their genomes evolve to model disease population dynamics. Each of these approaches has strengths and limitations, and data of each type can be sparse or biased, particularly during rapidly developing outbreaks or in countries with poor pathogen surveillance. An increasing number of approaches attempt to fix this problem by incorporating diverse concepts and data types together in their models. We aim to contribute to this movement by introducing EpiFusion, a modelling framework that improves the efficiency and precision with which we can monitor important changes in pathogen transmission (specifically, in the effective reproduction number). EpiFusion uses particle filtering to simulate epidemic trajectories over time and weight their likelihood according to both case incidence data and a phylogenetic tree using separate observation models, resulting in the inference of trajectories in agreement with both sets of data. Improvements in our ability to accurately and confidently model pathogen spread help us to respond to infectious disease outbreaks and improve public health.

Citation: Judge C, Vaughan T, Russell T, Abbott S, du Plessis L, Stadler T, et al. (2024) EpiFusion: Joint inference of the effective reproduction number by integrating phylodynamic and epidemiological modelling with particle filtering. PLoS Comput Biol 20(11): e1012528. https://doi.org/10.1371/journal.pcbi.1012528

Editor: Joëlle Barido-Sottani, Ecole Normale Superieure, FRANCE

Received: June 28, 2024; Accepted: October 1, 2024; Published: November 11, 2024

Copyright: © 2024 Judge et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: We make available all code and data used in this manuscript at the Github repositories 'https://github.com/ciarajudge/EpiFusion_PublicationRepo' and 'https://github.com/ciarajudge/EpiFusion'.

Funding: CJ was supported by a Bloomsbury Colleges PhD Studentship and a National University of Ireland Denis Phelan Scholarship. TWR was supported by funding from the Wellcome Trust (grant: 20650/Z/17/Z). SA was funded by the Wellcome Trust (grant: 210758/Z/18/Z). TS received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme grant agreement no. 101001077. Further, TS and TGV acknowledge funding from ETH Zurich. OJB was supported by a UK Medical Research Council Career Development Award (MR/V031112/1). SCH was supported by a Sir Henry Wellcome Postdoctoral Fellowship from the Wellcome Trust (220414/Z/20/Z) [https://welcome.org/]. For the purpose of open access, the authors have applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

The effective reproduction number (R_t) is a helpful epidemiological parameter for characterising disease transmission. R_t refers to the time-varying average number of secondary infections resulting from a primary infected individual and can vary due to factors such as population immunity, human behaviour, or changes in pathogen infectiousness. Retrospective modelling of how R_t varies over the course of an outbreak allows for evaluation of policy and intervention efficacy [1–4], and quantifying how different factors contribute to R_t can inform outbreak preparedness planning by providing the basis for modelling spread under different scenarios [5]. Classical epidemiology [3] and phylodynamics [4] often aim to infer R_t but use distinct methodologies and data to achieve this goal. Both fields face similar but non-overlapping obstacles in terms of data availability, reliability, and bias [6–9]. We investigate an approach to estimate R_t that reduces this uncertainty through linking principles of phylodynamic and epidemiological modelling using particle Markov Chain Monte Carlo (pMCMC) [10] which is scalable to large datasets.

Phylodynamic approaches allow estimation of the genealogical history of genome-sequenced sampled viruses and can therefore inform about disease spread that occurred prior to the first identified case. Phylogenetic trees frequently capture unusual population dynamics [11] that are not normally detectable using case data alone, such as long-range virus lineage movements, importations or growth in the dominance of specific variants. However, a central challenge for phylodynamics is that genomic data sampling density can be low or spatiotemporally biased relative to infection occurrence [12]. Furthermore, R_t has thus far been commonly estimated as a piecewise constant function that rarely has sufficient temporal resolution to be useful for public health decision making [13], with some exceptions [14].

Conversely, epidemiological models of R_t use case data that are often more spatiotemporally consistently sampled than genomic data, and usually have greater flexibility than phylodynamic models to accommodate additional information such as climatic or human movement data [15–18]. However, case data can be easily biased by changes in case definitions or reporting practices [7,19] which can cause artificial fluctuations in R_t estimates. Disease dynamics can only be examined once individuals with infections are detected, which may not occur until long after a pathogen starts to spread (whereas phylogenetic tree data can be used to reconstruct past pathogen dynamics prior to the sampling date of the earliest genome). Furthermore, viruses that can cause similar clinical symptoms (such as Zika, chikungunya and dengue viruses [20,21]) can be easily misreported where specific molecular or serological testing is not conducted. This can result in the inferred R_t capturing the average population dynamics of multiple cocirculating pathogens, which is then less useful to inform disease-specific control measures such as vaccination programs [22–24].

As a result of these limitations and strengths, phylodynamic and epidemiological approaches may vary in their effectiveness at different stages of an outbreak [12]. Approaches that combine principles and data from both phylodynamic and epidemiological models could improve our ability to estimate R_t, by taking advantage of the complementary strengths of each field.

Early attempts to use both phylodynamics and epidemiology to estimate disease dynamics typically employed a ‘corroborate or contradict’ strategy, where methods and data native to each field were used separately to address the same research question [25–27]. Alternatively, methods from each field have sometimes been used to address different research questions in the same study [28]. Recently, attempts have been made to develop joint inference approaches that use both phylodynamic (dated genomic sequence) and epidemiological (case incidence) data as input to a single model [29–34]. Many of these attempts have built on the principle of the particle filter [10]. Particle filtering is a sequential Monte Carlo approach that aims to approximate the posterior distribution of a state variable in a stochastic process (in this case, an epidemic trajectory). Particles move through a hidden Markov Model (the process model) and are weighted by their likelihood according to the data (the observation model). They can then be resampled according to their weights, resulting in the propagation of particles with estimated states consistent with the data under an observation model. The use of particle filtering is arguably the most straight-forward method to directly link epidemiological and phylodynamic models, as the resampling of particles through time allows the genealogical and epidemiological data to jointly influence the particle states during the state-simulation process.

Particle filtering is well established for use with epidemiological case incidence data, and there are many existing implementations of particle filtering in epidemiological modelling [35,36]. More recently, appropriate particle filtering approaches have been developed that can use genealogies obtained from sequence data. Rasmussen et al. first proposed a joint inference approach consisting of a common process model and separate observation models for a genealogy and case incidence data [30]. This methodology was later extended to allow fitting of epidemiological models that incorporate simple population structure [31], and was also used as the basis of an approach for inferring transmission heterogeneity [37]. These models were all reliant on coalescent-based phylodynamic methods and assumed independence between case incidence and events in the phylogenetic tree [38]. In 2019, Vaughan et al. proposed a method, ‘EpiInf’, that enables the use of birth-death phylodynamic methods within a particle filter to infer epidemic trajectories through time [32]. EpiInf derived a phylodynamic likelihood that explicitly models case incidence data as ‘unsequenced observations’ within the phylodynamic observation model as ‘events’ on the tree, thus overcoming the independence assumption made in earlier approaches. However, this latter approach quickly becomes intractable as the number of sequences or cases increases (even when using a tau-leaping approximation [39]). It also greatly limits the possible complexity that could be obtained using a separate epidemiological observation model, which could feasibly incorporate diverse data sources (e.g. climate or human movement data). Conversely, TimTam, proposed by Zarebski et al. in 2022 [29,40], is a (non-particle filtering) birth-death phylogenetic approach that can integrate case incidence and genomic sequence data in a computationally tractable way by approximation of the birth-death observation model density [41,42], while also eliminating the assumption of independence between tree and occurrence events. However, while it is possible to infer prevalence at user-specified times and R_t in piecewise constant intervals, it is not practical to infer continuous (here we use the term ‘continuous’ to refer to a fine grid size of a single day) epidemic trajectories with this model, which limits its ability to detect transmission fluctuations at a higher temporal resolution.

We develop a new approach, EpiFusion, that extends existing implementations that employ particle filtering or pMCMC [30,32] to reconstruct epidemic trajectories using case incidence and a phylogenetic tree either individually or together, while making the assumption that the tree and case trajectory are independent of each other. Our proposed approach addresses the limitations of previous methods by (i) introducing a birth-death based phylodynamic likelihood to a dual observation model structure, (ii) making improvements in computational efficiency, and (iii) allowing epidemic trajectories to be inferred at greater temporal resolution.

Methods

Theory

We adopt an overall structure based on the ‘common process model–dual observation model’ structure (Fig 1) used by Rasmussen et al. [30] and validated by many particle filtering methods outside of the context of infectious disease [43,44]. The data inputs (‘observations’ in Fig 1) to this model are case incidence, a time-scaled phylogenetic tree constructed from virus genomic sequences, or both data types together. The hidden particle states are the true number of individuals infected ‘I’ and any particle-specific parameters.

Fig 1.

EpiFusion particle filter structure, with the particle states per unit time (green outlined boxes) driven by the parameters of the process model, evaluated at resampling steps by epidemiological and phylodynamic observation models against case incidence and phylogenetic tree segments respectively per unit time (orange and purple circles). All models in this manuscript use daily time units.

https://doi.org/10.1371/journal.pcbi.1012528.g001

Process model.

We use the term ‘process model’ to define how particle states are incremented between resampling steps in the particle filter. n particles model the number of infected individuals (I) in discrete daily intervals driven by a process model that assumes independent Poisson-distributed infection and recovery counts (Eq 1).

$$ I_t = I_{t-1} + \mathrm{Poisson}(\beta_t I_{t-1}) - \mathrm{Poisson}(\gamma_t I_{t-1}) \tag{1} $$

We have implemented this daily discretisation rather than modelling each infection trajectory event individually in completely continuous time to improve computational efficiency. Transmission dynamics are captured by modelling the change in the infection rate β and/or recovery rate γ over time (see Table 1B legend). R_t can be derived from the process model using the formula R_t = β_t/γ_t.
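The daily update described above can be illustrated with a short sketch. It assumes that infection and recovery counts are drawn with Poisson means β_t·I and γ_t·I, that the infected count is floored at zero, and that R_t is the ratio β_t/γ_t (the usual birth-death form). The function names are hypothetical and are not taken from the EpiFusion source code.

```python
import numpy as np

def step_particles(infected, beta_t, gamma_t, rng):
    """Advance every particle's infected count I by one day.

    Infections and recoveries are independent Poisson draws with means
    beta_t * I and gamma_t * I. Flooring at zero is an assumption of this
    sketch, not something stated in the text.
    """
    infected = np.asarray(infected, dtype=np.int64)
    new_infections = rng.poisson(beta_t * infected)
    recoveries = rng.poisson(gamma_t * infected)
    return np.maximum(infected + new_infections - recoveries, 0)

def reproduction_number(beta_t, gamma_t):
    """R_t as the ratio of infection to recovery rate (assumed form)."""
    return beta_t / gamma_t

# Toy usage: five particles, each starting with 10 infected, run for 3 days.
rng = np.random.default_rng(1)
particles = np.full(5, 10)
for _day in range(3):
    particles = step_particles(particles, beta_t=0.3, gamma_t=0.15, rng=rng)
```

Drawing one pair of Poisson counts per particle per day is what makes the cost independent of the number of individual infection events.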

Table 1. A. Explanation of the data points used by EpiFusion; B. the key parameters of the EpiFusion particle filter.

Gamma, phi and psi are fit by MCMC, either as constant values over time or in epochs by either fixing or fitting change times and interval values. Beta must vary over time and can either be fit using (i) a random walk within the particle filter, (ii) linear splines within the particle filter, (iii) MCMC fitting in epochs by fixing or fitting change times and interval values, or (iv) MCMC fitting the parameters of a logistic function which defines beta over time; C. Other key terms in the EpiFusion particle filtering algorithm, in order of appearance in the text.

https://doi.org/10.1371/journal.pcbi.1012528.t001

Observation models.

At each resampling step, the particle states are evaluated against epidemiological and phylogenetic data using individual ‘observation models’; that is, models that define the weights (ω) of each particle state according to each dataset.

The provided epidemiological data, c_t, represent the number of reported cases with symptom onset between regular intervals. As the particle infection trajectory is simulated through time, it ‘emits’ daily positive cases ρ_t at a rate I_tφ_t. These positive cases are summed for the days in the interval between case incidence observations. When t is a day with observed data, the observed count c_t can be evaluated against the total summed emitted cases in the corresponding interval, ρ_interval (Table 1C), using the epidemiological observation model (Eq 2). This summation is not needed when case incidence is provided in daily intervals, in which case ρ_t can be directly compared to c_t. Examples of the fit of ρ_interval to corresponding case incidence data points in practice are available in S1 Fig. This process gives the ‘epidemiological weight’ of the particle given the case incidence data (ω^c_t, Eq 2). Currently users may choose between a Poisson probability mass function (Eq 2A) and a negative binomial probability mass function (Eq 2B) with an overdispersion parameter k for the epidemiological weight. Here we use a Poisson model as there is no overdispersion in the simulated datasets used for validation.

(2A)

(2B)
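As a hedged illustration of the two epidemiological weights, the sketch below evaluates an observed count c against an expected count ρ in log space. Treating ρ as the mean of the distribution, and parameterising the negative binomial by its mean and a dispersion k (variance ρ + ρ²/k), are assumptions made here; the exact forms of Eq 2A and 2B may differ.

```python
import math

def poisson_log_weight(observed_cases, expected_cases):
    """log P(c | rho) for a Poisson observation model with mean rho."""
    if expected_cases <= 0:
        # A particle emitting no cases can only explain a zero count.
        return 0.0 if observed_cases == 0 else -math.inf
    return (observed_cases * math.log(expected_cases)
            - expected_cases
            - math.lgamma(observed_cases + 1))

def negbin_log_weight(observed_cases, expected_cases, k):
    """log P(c | rho, k) for a negative binomial with mean rho and
    variance rho + rho**2 / k (assumed parameterisation)."""
    if expected_cases <= 0:
        return 0.0 if observed_cases == 0 else -math.inf
    c, mu = observed_cases, expected_cases
    return (math.lgamma(c + k) - math.lgamma(k) - math.lgamma(c + 1)
            + k * math.log(k / (k + mu))
            + c * math.log(mu / (k + mu)))
```

Working with log weights avoids numerical underflow when many daily weights are later multiplied together.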

The particle weight given the phylodynamic data (a one-day segment of a time-scaled phylogenetic tree; g_t) is a daily discretisation of that which was derived by Vaughan et al. for EpiInf [32] (Eq 3). This is the sum (in log space) of the probabilities of the observed events (number of observed infections of new individuals b_t; number of sampling events s_t) on the tree segment and the exponentially distributed waiting times for events that were not observed (infections with rate β_tI_t−1 and genomic samplings of infected individuals with rate ψI_t−1).

(3)

We implement an importance sampling strategy to prevent trajectory events that are impossible given the tree structure, for example recovery events that result in fewer individuals being infected than there are lineages in the tree (S1 Text).

During resampling, the particles are weighted (ω_t) by the product of their phylodynamic (ω^g_t) and epidemiological (ω^c_t) weights (Eq 4), thus facilitating the propagation of particles that are consistent with both the phylogenetic and epidemiological data.

$$ \omega_t = \omega^g_t \, \omega^c_t \tag{4} $$
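The resampling step can be sketched as follows. The combined weight is formed by adding the two log weights (equivalent to multiplying the weights), and particles are then redrawn with replacement in proportion to it. Multinomial resampling is an assumption of this sketch; the text does not state which resampling scheme EpiFusion uses, and the function name is hypothetical.

```python
import numpy as np

def resample_particles(states, log_w_epi, log_w_phylo, rng):
    """Redraw particles in proportion to their combined weight.

    states      : 1-D array of particle states (e.g. infected counts)
    log_w_epi   : log epidemiological weight of each particle
    log_w_phylo : log phylodynamic weight of each particle
    """
    states = np.asarray(states)
    # Product of the two weights, computed as a sum in log space.
    log_w = (np.asarray(log_w_epi, dtype=float)
             + np.asarray(log_w_phylo, dtype=float))
    if not np.isfinite(log_w.max()):
        raise ValueError("all particles have zero weight")
    # Subtract the maximum before exponentiating for numerical stability.
    w = np.exp(log_w - log_w.max())
    idx = rng.choice(states.size, size=states.size, replace=True, p=w / w.sum())
    return states[idx]
```

A particle that is impossible under either dataset (log weight of minus infinity) is never propagated, which is how both data sources constrain the surviving trajectories.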

Fitting with MCMC.

Following completion of the particle filter, the overall likelihood of each estimated trajectory across the whole outbreak consists of the product of the average particle weights at each resampling step (Eq 5). This is therefore the likelihood of a particle trajectory sampled from the surviving particles given the epidemiological and phylogenetic data, and the parameter set of the particle filter θ which can be concurrently fit using MCMC.

(5)
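A minimal sketch of the likelihood estimate described above: the product over resampling steps of the mean particle weight, accumulated in log space. The array layout (one row per resampling step, one column per particle) and the function name are assumptions made for illustration.

```python
import numpy as np

def log_likelihood_estimate(log_weights):
    """Sum over steps of log(mean particle weight).

    log_weights : array of shape (n_steps, n_particles) holding the log of
                  each particle's combined weight at each resampling step.
                  Every row is assumed to contain at least one finite entry.
    """
    log_weights = np.asarray(log_weights, dtype=float)
    n_particles = log_weights.shape[1]
    # Log-sum-exp per row, shifted by the row maximum for stability.
    peak = log_weights.max(axis=1, keepdims=True)
    log_sum = peak[:, 0] + np.log(np.exp(log_weights - peak).sum(axis=1))
    log_mean = log_sum - np.log(n_particles)
    return float(log_mean.sum())
```

For example, two steps with weights (0.5, 0.5) and (0.2, 0.4) have mean weights 0.5 and 0.3, so the estimate is log(0.15).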

This model is fit using Metropolis-Hastings MCMC sampling, deriving posterior samples of the number of infected individuals over time, and the rates β,γ,φ and ψ. Options are available for defining and fitting time-varying rates for the latter four parameters both within the particle filter, and by MCMC (Table 1B legend).
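The outer Metropolis-Hastings loop can be sketched as below. It assumes a symmetric proposal and a user-supplied function returning a (possibly noisy) log posterior estimate, such as a particle filter log-likelihood plus a log prior. All names are hypothetical, and the proposal and tuning used by EpiFusion are not described in this section.

```python
import math
import random

def metropolis_hastings(log_post_estimate, theta0, propose, n_iter, rng):
    """Random-walk Metropolis-Hastings with a symmetric proposal.

    log_post_estimate : callable theta -> log posterior estimate
                        (assumed finite at theta0)
    propose           : callable (theta, rng) -> candidate theta
    The estimate for the current state is stored and reused rather than
    recomputed, as is standard when the likelihood is itself a stochastic
    particle filter estimate.
    """
    theta, log_p = theta0, log_post_estimate(theta0)
    samples = []
    for _ in range(n_iter):
        candidate = propose(theta, rng)
        log_p_candidate = log_post_estimate(candidate)
        # Accept with probability min(1, exp(difference in log posterior)).
        accept_prob = math.exp(min(0.0, log_p_candidate - log_p))
        if rng.random() < accept_prob:
            theta, log_p = candidate, log_p_candidate
        samples.append(theta)
    return samples
```

In use, `log_post_estimate` would run the particle filter for the candidate parameter set θ and return the resulting log-likelihood estimate plus the log prior.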

Implementation and distribution

We include details of the implementation of the EpiFusion algorithms in S2 Text, including pseudocode for the MCMC and particle filtering algorithms. The EpiFusion model is distributed as a Java program and the model source code, executable files, tutorials, example parameter files and guidance on usage are available at the GitHub repository, https://github.com/ciarajudge/EpiFusion, under the GNU General Public License. The program takes an XML file as input, which contains the data and parameters for the model. The user does not need to define any compartmental model (i.e. SIR, SEIR etc), but parameterisation of rates β,γ,φ and ψ is necessary with a selection of options available to users for priors or to allow discrete step-changes in these rates at specific times during the outbreak (e.g., corresponding to known dates of changes in public health surveillance strategies). Code used for the models and plots in this manuscript is housed at the GitHub repository https://github.com/ciarajudge/EpiFusion_PublicationRepo.

Model validation and testing

We validated and tested the performance of EpiFusion using five different approaches: (i) comparison of the EpiFusion phylodynamic likelihood to the BEAST2 BDSky phylodynamic likelihood to validate our novel likelihood calculation, (ii) large scale (i.e., many replicates) simulation based calibration [33,45], (iii) scenario testing, (iv) noise testing, and (v) benchmarking of accuracy against existing models.

Simulated datasets.

The latter four phases involved the use of simulated epidemic datasets with SIR transmission dynamics that were generated using ReMASTER [46]. ReMASTER produces the true trajectories over time of each population compartment (S, I and R), identified cases over time (which we aggregated into weekly case incidence), and a phylogenetic tree of all identified cases under the epidemiological sampling rate, which we downsampled to give a simulated phylogeny of sequenced samples with a smaller sampling rate (Fig 2). Details of the simulated datasets used in this manuscript are provided in the Supplementary Information (S1 and S2 Tables and S2–S4 Figs).
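The two post-processing steps mentioned above (weekly aggregation of cases and downsampling of the tree) can be illustrated with a small sketch. Dropping an incomplete trailing week and thinning tips independently with a fixed retention probability are assumptions made here, and both function names are hypothetical; the downsampling procedure actually used is not detailed in this section.

```python
import numpy as np

def weekly_incidence(daily_cases):
    """Sum daily case counts into consecutive 7-day totals.

    Any incomplete trailing week is dropped (an assumption of this sketch).
    """
    daily = np.asarray(list(daily_cases))
    n_weeks = daily.size // 7
    return daily[: n_weeks * 7].reshape(n_weeks, 7).sum(axis=1)

def thin_tips(tip_labels, keep_prob, rng):
    """Keep each tip independently with probability keep_prob.

    The retained labels could then be used to prune the full tree of
    identified cases down to a tree of 'sequenced' samples.
    """
    keep = rng.random(len(tip_labels)) < keep_prob
    return [tip for tip, kept in zip(tip_labels, keep) if kept]
```

Under this thinning scheme, `keep_prob` would be the ratio of the desired sequence sampling rate to the case sampling rate.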

Fig 2. Example ReMASTER epidemic simulation and resulting data used for the EpiFusion program (specifically, according to the “Baseline Scenario” described in the section “Scenario Testing”).

(a) True number of people infected over time, from which (b) weekly reported case incidence counts and (c) a phylogenetic tree of simulated samples were derived based on given sampling rates. Plots of the other simulated datasets are provided in S2–S4 Figs.

https://doi.org/10.1371/journal.pcbi.1012528.g002

Likelihood comparison.

To validate our daily approximation of the phylodynamic likelihood we compared the EpiFusion likelihoods to those computed with the BEAST2 [47] package BDSky [13]. We examined the effect on the likelihood of varying the parameters β,γ, and ψ in turn around their true values with all other parameter values fixed to the truth. We repeated this on a range of simulated datasets with varying true values of each parameter. To evaluate the estimation of the infection or birth rate parameter (β), we used datasets generated under a constant-rate birth-death process in ReMASTER [46].

Simulation based calibration.

To assess calibration of our MCMC algorithm, we defined distributions of the EpiFusion model parameters β,γ,φ and ψ and simulated 500 unique epidemics using parameter combinations drawn randomly from these distributions. We then fit EpiFusion models with priors equal to the original distributions from which the parameters were drawn, and analysed the ability of EpiFusion to recapture the true parameter values within Highest Posterior Density (HPD) intervals of increasing credible mass. A perfectly calibrated MCMC algorithm should result in 5% of models capturing the true parameter in their 0.05 HPD intervals, 10% of models capturing the true parameter in their 0.10 HPD intervals, etc. The β parameter varies over time in both our model and the simulated data (i.e. it is modelled as β_t), as opposed to consisting of one fixed value per simulation. Thus, to calculate coverage at a given value of credible mass alpha for the β parameter, we took the average proportion of the true β_t trajectory that falls within the inferred HPD interval across all replicates.
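The coverage calculation for a scalar parameter can be sketched as follows. The HPD interval is taken as the narrowest interval containing the requested fraction of posterior samples, which is a common sample-based estimator assumed here rather than one confirmed by the text; the function names are hypothetical.

```python
import numpy as np

def hpd_interval(samples, mass):
    """Narrowest interval containing a fraction `mass` of the samples."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = max(1, int(np.ceil(mass * n)))        # samples inside the interval
    # Width of every window of k consecutive sorted samples.
    widths = x[k - 1:] - x[: n - k + 1]
    start = int(np.argmin(widths))
    return float(x[start]), float(x[start + k - 1])

def coverage(posteriors, truths, mass):
    """Fraction of replicates whose HPD interval contains the true value."""
    hits = []
    for samples, truth in zip(posteriors, truths):
        lo, hi = hpd_interval(samples, mass)
        hits.append(lo <= truth <= hi)
    return float(np.mean(hits))
```

Plotting `coverage` against `mass` for a grid of values between 0 and 1 gives a calibration curve; a well-calibrated sampler lies close to the diagonal.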

Scenario testing.

We examined the ability of EpiFusion to reconstruct infection and R_t trajectories under a range of epidemic scenarios. The parameters under which each of these scenario datasets were simulated are included in the Supplementary Information (S2 Table). To assess the advantage of combining phylodynamic and epidemiological data in this framework, models using solely the phylogenetic tree or case incidence data were compared to using a combination of both (S3 and S4 Tables). The three scenarios examined were: (i) the introduction of a novel pathogen into an immune naïve population with time-constant sampling, (ii) an introduction scenario with a step-change in sampling when the outbreak is ‘discovered’, and (iii) a step-change in transmission of an endemic pathogen that has previously circulated at stable levels. We assessed model performance according to a selection of metrics and probabilistic scoring rules (Table 2). Further details on the performance metrics used and how they were calculated are included in the Supplementary Information (S5 Table).

Table 2. Statistics used to evaluate model performance under scenarios 1, 2, and 3 for analyses using case incidence only (epi), phylogenetic tree only (phylo), and both data sources combined (combo).

The best or joint-best result for each statistic for each scenario is highlighted in bold. Trajectory RMSE: root-mean-squared error. Calibrated Trajectory Coverage: proportion of true trajectory that falls within the 95% HPD, scaled by 0.95. Scaled HPD Width: mean width of the 95% highest posterior density interval, scaled by the true value. Continuous Ranked Probability Score: mean CRPS across the trajectory time series. Brier Score: Classification accuracy for transmission phase (Rt) being above or below 1. Further details on the calculation of these statistics are included in the Supplementary Information (S5 Table).

https://doi.org/10.1371/journal.pcbi.1012528.t002
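Three of the statistics in Table 2 have standard sample-based definitions, sketched below. These are textbook forms (the CRPS uses the estimator E|X − y| − ½E|X − X′| over posterior samples) and may differ in detail from the calculations documented in S5 Table; the function names are hypothetical.

```python
import numpy as np

def rmse(estimate, truth):
    """Root-mean-squared error between an estimated and a true trajectory."""
    diff = np.asarray(estimate, dtype=float) - np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def crps_samples(samples, truth):
    """Sample-based CRPS for one time point: E|X - y| - 0.5 * E|X - X'|."""
    x = np.asarray(samples, dtype=float)
    term_obs = np.mean(np.abs(x - truth))
    term_spread = np.mean(np.abs(x[:, None] - x[None, :]))
    return float(term_obs - 0.5 * term_spread)

def brier_score(prob_above_one, truth_above_one):
    """Mean squared difference between the posterior probability that
    R_t > 1 and the true indicator (1 if R_t > 1, else 0)."""
    p = np.asarray(prob_above_one, dtype=float)
    o = np.asarray(truth_above_one, dtype=float)
    return float(np.mean((p - o) ** 2))
```

A trajectory-level CRPS, as reported in Table 2, would be the mean of `crps_samples` over all time points. Lower values are better for all three scores.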

Noise testing.

We then tested model robustness to noise, by testing scenarios with increasing transmission or observation noise and examining the effect on the inferred R_t continuous ranked probability score. Here we use the term transmission noise to mean fluctuations in R_t, and observation noise to mean fluctuations in the case and sequence sampling rate. We achieved increasing noise in the ReMASTER simulations by replacing constant transmission or sampling rates with a time series of rates drawn from Gaussian distributions with increasing standard deviations (S2 Table).
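As an illustration of how such a noisy rate series might be generated, the sketch below draws one rate per day from a Gaussian centred on the baseline rate. Truncating negative draws at zero is an assumption added here so that the rates remain valid, and the function name is hypothetical; the values actually used are listed in S2 Table.

```python
import numpy as np

def noisy_rate_series(base_rate, sd, n_days, rng):
    """Daily rates drawn from Normal(base_rate, sd), truncated at zero.

    The truncation is an assumption of this sketch, not taken from the text.
    """
    rates = rng.normal(loc=base_rate, scale=sd, size=n_days)
    return np.clip(rates, 0.0, None)

# Toy usage: the same baseline rate with increasing noise levels.
rng = np.random.default_rng(42)
series_by_sd = {sd: noisy_rate_series(0.2, sd, 100, rng) for sd in (0.0, 0.02, 0.05)}
```

With `sd = 0.0` the series reduces to the constant baseline rate, which corresponds to the noise-free scenario.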

Benchmarking against existing approaches.

For the three scenarios outlined in ‘Scenario Testing’, we benchmarked the combined EpiFusion model against the existing packages EpiNow2 [48], BDSky [13] and TimTam [29,40], which are, respectively, among the most commonly used tools for estimating R_t from epidemiological data, phylodynamic data, and both data types. The BDSky and TimTam models are usually provided with a sequence alignment as input data and subsequently infer phylogenetic trees. Here, we instead directly provided these models with the same fixed tree as was provided to EpiFusion (i.e., a phylogeny down-sampled from the tips in the simulated true transmission tree). This removed phylogenetic uncertainty to allow a fairer comparison of the model performances. Full model specifications are in the Supplementary Information (S3 Text). As the BDSky and TimTam models require specification of intervals in which to infer R_t, uniform intervals of 5 or 10 days were provided. It was necessary to use different specifications of R_t intervals for the TimTam and BDSky approaches across different scenarios due to a particular sensitivity in TimTam to the interval change times, where placing the intervals at certain points resulted in highly impractical estimates.

Ebola virus disease in Sierra Leone

Finally, we used an EpiFusion combined model with a negative binomial epidemiological observation model to infer R_t over the course of the 2014 Ebola virus outbreak in Sierra Leone. We obtained case count data from Fang et al. [49] and a maximum clade credibility tree generated from a BEAST Coalescent Skygrid analysis with an uncorrelated lognormal relaxed clock from the GitHub repository associated with Dellicour et al. [50]. The tree contained samples from Sierra Leone, Guinea and Liberia, so we selected a monophyletic clade of 980 Sierra Leone sequences (S5 Fig). We aggregated the case count data (total 8358 confirmed cases) into weekly incidence to reduce any observation noise introduced by weekly periodicity in reporting rates (S6 Fig), and used a combined EpiFusion model to estimate national R_t from March 2014 to August 2015 (78 weeks). We fit β as a series of linear splines (see Table 1B legend), and γ as a constant value over time. The model was run using 6 chains of 10,000 MCMC samples with 300 particles each.

Results

Testing on simulated data

Likelihood comparison.

Our comparison of the phylodynamic likelihood calculated by EpiFusion with that calculated in BDSky shows good agreement between the two approaches (Fig 3), though the stochastic and approximate nature of the EpiFusion likelihood means that the values are not identical. The EpiFusion likelihood curves are also less smooth due to the stochastic nature of the algorithm. As the parameter values get further from the truth for the β and γ parameters, the EpiFusion likelihood drops sharply due to the parameter values implying highly unlikely or even impossible trajectories. More extensive likelihood comparisons are available in the Supplementary information (S7 Fig).

Fig 3.

Comparison of median log likelihoods generated by EpiFusion (green) and a birth-death skyline model implemented in the BEAST2 [47] package BDSky [13] for the parameters β,γ and ψ. The true value of the parameter is marked by the blue vertical line.

https://doi.org/10.1371/journal.pcbi.1012528.g003

Simulation based calibration.

In Fig 4 we show the results of the simulation-based calibration of the combined EpiFusion model. Fig 4A shows the proportion of replicates (or ‘coverage’) that recover the true parameter with increasing credible interval mass (‘alpha’). We note that coverage increases with increasing credible interval mass; however, slight under-coverage is observed, particularly for the γ parameter. This is also demonstrated in Fig 4B, where the model appears to have limited ability to estimate the γ parameter. However, the model does appear to recapture the true values of the sampling parameters φ and ψ, with only slight underestimation for larger true values. The model was generally able to accurately infer the values of β over time.

Fig 4.

(a) Proportion of replicates that capture the true value of the parameter within their HPD intervals (y-axis) of increasing credible mass alpha (x-axis), for the parameters: β infection parameter (green), γ recovery parameter (blue), φ case sampling rate (yellow) and ψ sequence sampling rate (orange). For the infection rate parameter β (which varies over time), the y-axis reflects the average proportion of the β trajectory captured in the HPD interval across all replicates (b) Mean inferred value and 95% HPD interval of the parameter (y-axis) plotted against the true value of the parameter (x-axis). For the infection parameter β, a subset of 1000 values of β_t is shown for clarity in the plot as β varied over time in the simulations and models, so each replicate resulted in the inference of many β_t values. For both graphs the grey dotted line indicates the ‘perfect’ result: perfect calibration for (a) and perfect agreement between true and inferred parameters for (b).

https://doi.org/10.1371/journal.pcbi.1012528.g004

Scenario testing.

Next we evaluated how well EpiFusion could reconstruct trajectories of infections and R_t corresponding to simulated outbreaks reflecting three common epidemiological scenarios: (i) the introduction of a novel pathogen into an immune naïve population with time-constant sampling, (ii) an introduction scenario with a step-change in sampling when the outbreak is ‘discovered’, and (iii) a step-change in transmission of an endemic pathogen. We compared the performance of EpiFusion using as input solely case incidence data, solely a phylogenetic tree, and using both datasets combined. The metrics by which models are compared and their statistics are summarised for a single realisation of each scenario in Table 2.

We first considered a scenario in which a novel pathogen enters an immune naïve population with constant sampling: the ‘baseline scenario’. Here, each approach successfully captured the true epidemic and R_t trajectories within the 95% HPD intervals (Figs 5 and 6); however, the tree only approach underperformed compared to the case incidence only and combined approaches according to the metrics that we chose for evaluation (Table 2). The combined approach was most successful in estimating the true infection trajectory (Infection Trajectory RMSE: 41.3) compared to tree only and case incidence only models (329.8, 43.2) (see Tables 2 and S5 for a description of the statistics). These improvements in infection trajectory estimation are accompanied by a reduction in the width of the scaled HPD intervals (1.13 vs 1.41 and 1.94), a positive result indicating increased confidence, provided that coverage and accuracy are maintained (as is observed here). The Continuous Ranked Probability Score (CRPS) was used to evaluate the probability of the true infection or R_t trajectory given the posterior infection or R_t trajectories from each model, where a lower value equates to a more accurate result. Here the combined approach also performed best for both infection and R_t trajectories (41.04 vs 88.19 and 162.91 for infection trajectories and 0.129 vs 0.188 and 0.196 for R_t trajectories).
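As a rough illustration of the two trajectory metrics used here, the sketch below computes RMSE against a point estimate and a sample-based CRPS averaged over time. The CRPS estimator shown (mean absolute error minus half the mean pairwise sample distance) is a standard one, but whether it matches the exact implementation used for Table 2 is an assumption, and all function names are hypothetical.

```python
import numpy as np

def rmse(estimate, truth):
    """Root mean squared error between a point trajectory and the truth."""
    estimate = np.asarray(estimate, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean((estimate - truth) ** 2)))

def crps_samples(samples, observed):
    """Sample-based CRPS at one time point: E|X - y| - 0.5 * E|X - X'|."""
    x = np.asarray(samples, dtype=float)
    accuracy = np.mean(np.abs(x - observed))
    spread = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return float(accuracy - spread)

def trajectory_crps(posterior, truth):
    """Mean CRPS over time; `posterior` has shape (n_samples, n_times)."""
    posterior = np.asarray(posterior, dtype=float)
    scores = [crps_samples(posterior[:, t], truth[t])
              for t in range(posterior.shape[1])]
    return float(np.mean(scores))

# Toy posterior of three sampled trajectories over two time points.
posterior = np.array([[10.0, 20.0], [12.0, 22.0], [14.0, 24.0]])
truth = np.array([12.0, 21.0])
print(rmse(posterior.mean(axis=0), truth), trajectory_crps(posterior, truth))
```

Lower values are better for both metrics; unlike RMSE, the CRPS rewards posteriors that are both close to the truth and appropriately narrow.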

Fig 5.

Inferred mean infection count trajectories from EpiFusion using only case incidence (orange), only the phylogenetic tree (purple) and both data types combined (green) (columns) for the three scenarios tested (rows). The true number infected over time is represented by the black line. 95%, 80% and 66% highest posterior density intervals are represented by increasingly dark shaded regions. Times of step-changes are marked by the vertical dotted lines for the step-change in sampling and transmission scenarios: a 10-fold increase in case and genomic sequence sampling rates on day 35 for the ‘Sampling’ step-change scenario, and a 3-fold increase in transmission rate on day 100 for the ‘Transmission’ step-change scenario.

https://doi.org/10.1371/journal.pcbi.1012528.g005

Fig 6.

Inferred R_t from EpiFusion using only case incidence (orange), only the phylogenetic tree (purple) and both data types combined (green) for the three scenarios tested (rows). True R_t is represented by the solid black line. 95%, 80% and 66% highest posterior density intervals are represented by increasingly dark shaded regions. Times of step-changes are marked by vertical dotted lines: a 10-fold increase in case and genomic sequence sampling rates on day 35 for the ‘Sampling’ step-change scenario, and a 3-fold increase in transmission rate on day 100 for the ‘Transmission’ step-change scenario. An R_t of 1 is marked by the dashed horizontal line. The true R_t fluctuates at the end of the sampling step-change scenario due to very low prevalence as the outbreak ends.

https://doi.org/10.1371/journal.pcbi.1012528.g006

Each of the approaches demonstrated a slight propensity to over-cover the infection and R_t trajectories (calibrated trajectory coverages > 1). The combined approach led to a decrease in R_t trajectory RMSE (0.217 vs 0.333 and 0.356). We also used the Brier score (mean squared error between the probabilistic prediction and the true outcome) to evaluate each approach based on its ability to predict transmission phase, i.e. correctly estimating if R_t is above or below 1, where a lower Brier score indicates improved performance. We find each approach to be adept at classifying R_t as above or below 1, however the combined approach (0.011) leads to a marked improvement compared to the case incidence only or tree only approaches (0.034, 0.042).
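The Brier score for transmission phase can be sketched as follows. This example assumes that the probabilistic prediction at each time point is the fraction of posterior R_t samples above 1; that choice, and the function name, are illustrative assumptions rather than a description of the exact procedure used for Table 2.

```python
import numpy as np

def brier_rt_above_one(rt_posterior, rt_true):
    """Brier score for classifying R_t as above or below 1.

    rt_posterior: array of shape (n_samples, n_times) of posterior R_t draws.
    rt_true: array of shape (n_times,) holding the true R_t.
    """
    rt_posterior = np.asarray(rt_posterior, dtype=float)
    # Predicted probability that R_t > 1 at each time point.
    predicted = np.mean(rt_posterior > 1.0, axis=0)
    # Observed outcome: 1 if the true R_t is above 1, else 0.
    outcome = (np.asarray(rt_true, dtype=float) > 1.0).astype(float)
    return float(np.mean((predicted - outcome) ** 2))

# Two posterior draws over two time points.
posterior = np.array([[1.5, 0.5],
                      [1.5, 1.5]])
truth = np.array([1.2, 0.8])
print(brier_rt_above_one(posterior, truth))
```

A score of 0 would mean the posterior assigns full probability to the correct transmission phase at every time point.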

The second scenario consisted of a simulated outbreak with similar transmission dynamics to the introduction scenario but for which levels of both genomic and case sampling are low during the initial period of spread until more widespread surveillance is introduced (thus leading to a step-wise increase in sampling density). This was characterised in the data simulation by a spontaneous 10-fold increase of the case and genomic sequence sampling rates on day 35 of the simulation (S2 Table). Here, the date of this step-change is provided as a fixed parameter to the model under the assumption that it would be known to health authorities, but fixing this parameter is not strictly necessary to run the model as it can be co-inferred with MCMC by providing the model with an expected number of step-changes in sampling rates. The sampling rates before and after the step-change are inferred as parameters of the MCMC.
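A step-change of this kind can be represented as a piecewise-constant rate. The sketch below is a hypothetical helper, not part of EpiFusion or ReMASTER, and the rate values are placeholders chosen only to show a 10-fold increase on day 35 (the simulated values are given in S2 Table).

```python
import numpy as np

def piecewise_rate(times, change_times, rates):
    """Piecewise-constant rate: rates[i] applies from change_times[i-1]
    (inclusive) up to change_times[i] (exclusive)."""
    times = np.asarray(times, dtype=float)
    index = np.searchsorted(np.asarray(change_times, dtype=float),
                            times, side="right")
    return np.asarray(rates, dtype=float)[index]

# Placeholder rates (assumed for illustration): 0.01 before day 35, 0.1 after.
days = np.array([0, 34, 35, 60])
print(piecewise_rate(days, change_times=[35], rates=[0.01, 0.1]))
```

When the change time is fixed, only the entries of `rates` need to be inferred; when it is co-inferred, the entries of `change_times` become additional parameters.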

For this analysis, all three approaches successfully infer the R_t trajectories (Fig 6), but slightly overestimate the peak of the infection trajectories, with the case incidence only approach being the least accurate. This is further reflected in the performance metrics (Table 2), where the case incidence only approach performs the best for only one metric, scaled R_t trajectory HPD width. The combined approach demonstrates optimal scaled coverage of the true infection trajectory (1), while at the same time reducing the HPD interval width (0.96 vs 1.07, 1.26) in comparison to individual approaches (Fig 5). The combined approach also led to the best R_t trajectory CRPS results (0.123 vs 0.181 and 0.183) by a wide margin and led to a reduction of almost 50% in the Brier score (0.019 vs 0.031 and 0.032). The tree only approach demonstrated more advantages in this scenario than in the other scenarios, resulting in the best infection trajectory RMSE and CRPS (137.09 and 159.68, respectively).

The final scenario simulated a step-change in transmission, such as when a pathogen experiencing endemic transmission undergoes a rapid increase in transmission while sampling parameters remain constant. Specifically, we simulated an outbreak scenario where the transmission rate was increased 3-fold on day 100 of the simulation (S2 Table). For this analysis, the date of the transmission increase was inferred as a parameter of the MCMC (it is possible to fit any number of rate step-changes with EpiFusion; it is not currently possible to infer the number of step changes). All three analyses broadly captured the epidemic trajectories (Fig 5), with the case incidence only approach demonstrating better coverage (1.01 vs 0.95 and 1.05); however, the combined approach resulted in the lowest infection trajectory RMSE (131.41 vs 196.07 and 171.19) and CRPS (83.44 vs 109.25 and 114.23). The combined approach also resulted in a slightly improved R_t CRPS (0.15 vs 0.16 and 0.20), along with improved R_t RMSE (0.266 vs 0.291 and 0.349). The Brier score for this scenario is the only instance across all metrics and scenarios where the combined approach did not result in an improvement or perform equally to one or both individual approaches. However, the difference between all three approaches for this metric is marginal (0.105, 0.101, 0.109 for case incidence only, tree only and combined approaches, respectively).

Noise testing.

Next we examined the performance of the three approaches on scenarios with increasing observation and transmission noise, and summarise the results by examining how the R_t RMSE, CRPS, and Brier Score change (Fig 7). R_t trajectory fits for these scenarios are included in the Supplementary Information (S8 and S9 Figs). The tree only approach appears most robust to observation noise. Each metric sees a decrease in performance with increasing noise, with the exception of the Brier Score, which improves with increasing transmission noise.

Fig 7.

R_t trajectory RMSE, CRPS and Brier Score (y-axes) for case incidence only (orange), tree only (purple) and combined (green) EpiFusion approaches on scenarios with increasing noise (x-axes). For each of these metrics, a value closer to 0 reflects a better score. Noise is quantified as the standard deviation divided by the mean of the distribution from which the transmission or sampling rates were drawn. The general trend is shown by linear regression lines of the corresponding colour.

https://doi.org/10.1371/journal.pcbi.1012528.g007

Benchmarking against existing approaches.

We compared the performance of the EpiFusion combined model against existing R_t inference methods (Fig 8) on the simulated datasets from the scenario testing section. We used (i) EpiNow2 [48], (ii) a Birth-Death Skyline Serially Sampled model implemented in BEAST2 (BDSky) [13], and (iii) TimTam [29,40] implemented in BEAST2 to represent commonly used approaches for estimating R_t from only case incidence data, only molecular data, and both data types, respectively. Further information on model specifications is included in the Supplementary Information (S3 Text).

Fig 8.

Estimated mean R_t and 95% HPD intervals for the three validation scenarios from EpiFusion (green), EpiNow2 (blue), BDSky (red) and TimTam (yellow).

https://doi.org/10.1371/journal.pcbi.1012528.g008

R_t posteriors were obtained from each pre-existing tool for all three scenarios and compared to the combined EpiFusion approach. The strengths and weaknesses of the different models are apparent when examining their performance under selected scoring criteria (Table 3).

Table 3. Model Benchmarking.

https://doi.org/10.1371/journal.pcbi.1012528.t003

Each model captured the general trend of transmission for all three scenarios, with some weaknesses. Using EpiFusion resulted in improved R_t RMSE for all three scenarios. EpiFusion also led to substantially improved Brier scores compared to other methods for the introduction and sampling scenarios. For the sampling and transmission scenarios, EpiFusion resulted in improved R_t CRPS by a large margin, and the best coverage by a smaller margin. Notably EpiFusion never produced the worst performance under any scenario and metric combination. EpiNow2 performed well in the introduction scenario, yielding the best R_t CRPS, however the model somewhat struggled with identifying the sharp fluctuations in transmission in the third scenario, especially the initial step-change in transmission, possibly due to the smoothing influence of the Gaussian process. For the sampling scenario it was not possible to parameterise the large and sudden step-change in sampling in the EpiNow2 model. This is reflected by the underperformance of EpiNow2 in this scenario, where the sharp increase in case incidence due to increased sampling is instead interpreted by the model as sustained transmission of R_t > 1 (Fig 8). The BDSky approach systematically overestimated R_t towards the end of the time series, a problem which interestingly also affected the EpiFusion tree only model fits (Fig 6). However, the model generally demonstrated good coverage of the true R_t, despite inferring the parameter in piecewise constant intervals. Conversely, TimTam struggled with slight overestimation of R_t at the beginning of the time series.

Ebola virus disease in Sierra Leone

Finally, we demonstrated the use of an EpiFusion combined model on real data by retrospectively inferring the R_t of Ebola virus in Sierra Leone from March 2014 to August 2015 (Fig 9). The root of the tree was in March 2014, approximately two months prior to the first observed epidemiological case, allowing us to model the early dynamics of the outbreak. The EpiFusion analysis was completed within 9 hours on a MacBook Air M3 PC with an 8 core CPU. We expect that the long duration of the time series (>1.5 years) influenced the runtime.

Fig 9.

(a) Phylogenetic tree of Ebola virus sequences in Sierra Leone consisting of a subclade of the MCC tree obtained from Dellicour et al. [50], with tips coloured by region at the first administrative unit level. (b) Weekly case incidence of Ebola virus disease in Sierra Leone obtained from Fang et al. [49], stratified by region. (c) Inferred median effective reproduction number (solid line) of Ebola virus disease in Sierra Leone from an EpiFusion combined model. 95%, 80% and 66% highest posterior density intervals are represented by increasingly dark shaded regions. Two key dates in the epidemic are labelled: (i) declaration of a national state of emergency on August 6th 2014, and (ii) national three day quarantine beginning on September 19th 2014.

https://doi.org/10.1371/journal.pcbi.1012528.g009

We estimate the initial R_t during the first week of the study time series to be 1.33 (with lower and upper 0.95 HPDs of 1.04 and 1.61 respectively). Fig 9C shows that the R_t trajectory inferred by EpiFusion is in agreement with other estimates in the literature [47,51–53] including a birth-death phylodynamic approach implemented by Alizon et al. [52], and epidemiological models used by Towers et al. [51] and the WHO Ebola Response Team [53]. The average daily reproductive number by Wiratsudakul et al. [54] for the first year of the outbreak was comparable to our estimate over the same time period (1.03 vs 1.08), with the estimates in this paper also mirroring the small uptick in R_t we observe in early 2015. However, EpiFusion infers a slightly later time period for the decrease of R_t below 1.0 (13th October, 0.95 HPD 18th September to 5th November) than some other studies (Althaus et al. ‘late July’ [55], Nishiura et al. ‘late August’ [56]).

The trajectory also aligns well with key dates [49] during the outbreak, particularly the three day nationwide quarantine on September 19th 2014 [57] which is followed by a sharp drop in the inferred R_t of our model.

Discussion

We outline EpiFusion, a computationally tractable and flexible infrastructure for the combination of phylogenetic and epidemiological data to estimate infection and R_t trajectories. EpiFusion fills a gap in current modelling approaches at the intersection between the fields of phylodynamics and epidemiology (Table 4). We show that by combining data types with EpiFusion it is often possible to improve the accuracy of R_t or infection trajectory estimates compared to using phylogenetic or epidemiological data alone.

Table 4. Comparison of the key characteristics of EpiFusion compared to the tools and literature referred to in this manuscript.

Rasmussen’11 denotes Rasmussen et al. (2011), which was referenced in the introduction. However, the model is not distributed for use as a software or program, so we were unable to assess its computational efficiency (*). (BD–birth death).

https://doi.org/10.1371/journal.pcbi.1012528.t004

Through extensive simulations we found the EpiFusion model to be adept at recapturing the case incidence and genomic sampling parameters φ and ψ. The model was less able to accurately recapture the γ recovery parameter, but this can often reliably be obtained from empirical literature [57,58], and thus could be informed in practice with a strong prior. Given that the EpiFusion process model simulates epidemic trajectories according to the balance of the infection and recovery parameters β and γ, we suspect the flexible specification of time-varying β disincentivises accurate inference of the γ parameter. However, while we would expect β to also be biased in the opposite direction to γ under this hypothesis, Fig 4B indicates that the model is capable of accurately inferring the true value of β over time and the time-varying nature of β in the model and simulated data made it difficult to fully characterise any bias in the parameter. Nevertheless, although the model does not consistently recover the γ parameter, it does reliably reconstruct infection and R_t trajectories over time (Figs 5, 6 and S10). Future development of EpiFusion will aim to improve coverage of epidemiological parameters.

When testing the ability of EpiFusion to recover changes in R_t in different epidemiological scenarios (Scenario Testing section) the ‘baseline scenario’ aimed to represent a situation such as the emergence of a novel pathogen [59] or the expansion of an existing pathogen into a new ecological niche [60]. All three EpiFusion approaches (case incidence only, tree only, combined) were able to accurately reconstruct the epidemic trajectories of the simple, single epidemic peak, with the combined approach resulting in the best result for seven of the nine performance metrics tested. While the outbreak lasted 100 days, the inference using the phylogenetic tree is truncated at day 69 of the simulation as this is the date of the last sampling event on the tree. An advantage of the combined model is therefore that the trajectories can be jointly inferred up until the last sampling event on the tree, but after this point R_t can still be estimated using any additional case incidence data only (as is often the case in real-time outbreak response, where recent case incidence data usually precedes new genomic sequences). Conversely, where the most recent common ancestor of viruses sampled is phylogenetically estimated prior to the first observed case (as in the Ebola example we show here; Fig 9), it is possible to infer R_t for earlier time points than possible for case incidence only approaches.

We subsequently considered more complex scenarios in which the sampling or transmission rates change over time in a more realistic way. Such changes are widely acknowledged to complicate the estimation of R_t. This allowed us to examine how combining phylodynamic and epidemiological models and data could improve our ability to accurately estimate R_t under such challenging scenarios. The rationale for the step-change in sampling scenario was to emulate the transition of a disease from passive to active surveillance, perhaps due to the declaration of a Public Health Emergency of International Concern (PHEIC), resulting in a lack of data from the early stages of an outbreak and a lack of comparability in case numbers before and after detection is scaled up. This also applies for novel pathogens that do not have established means of clinical diagnoses or reporting, or where testing is initially limited. For example, during the Zika virus epidemic in Brazil in 2016, case detection rates rose sharply following the implementation of widespread PCR testing [61], compared to the beginning of the outbreak. The tree only approach demonstrated more advantages during this scenario than in the other scenarios tested, which is likely due to the additional information captured by birth events in the tree even when sampling was low. Notably the combined approach led to improved R_t continuous ranked probability scores, the probabilistic scoring rule we chose for model comparison. For both the baseline and sampling scenario the combined model greatly outperformed the individual approaches according to the Brier score metric, leading to ~4 fold and ~2 fold decreases in the baseline and sampling scenarios, respectively. This indicates that the combined approach may benefit estimation of whether an epidemic is growing or declining, which is a useful public health indicator to be able to evaluate with certainty [62,63].

The step change in transmission scenario was used to mimic a sudden increase in transmission, such as a change in human behaviour (e.g. school holidays end, non-pharmaceutical intervention ceases), or a change in the intrinsic transmissibility of a pathogen (e.g. a new variant [64]). The phylogenetic tree simulated from ReMASTER is more applicable to the former, in that all ‘active’ lineages at the time of the step-change undergo an equal increase in transmission which is not what would be observed in the case of a new, more transmissible variant. Currently, EpiFusion does not attempt to infer lineage specific transmission rates, but any future incorporation of lineage specific analyses will require this to be considered. Among the three approaches, the tree only approach detected the earliest uptick in the R_t trajectory due to the step-change in transmission rate (Fig 6) by a small margin, but all three approaches indicated the increase in a timely manner (within 1 day). The combined approach confidently inferred the time and magnitude of the increase of transmission, in both the infection and R_t trajectories (Figs 5 and 6). This approach also led to the best RMSE and CRPS scores for the infection and R_t trajectories, and a comparable Brier score to the individual approaches.

Overall, the combined model tended to reduce uncertainty compared to case-only and phylogenetic-only approaches, as observed by narrowing of the HPD intervals of the infection trajectories, while maintaining coverage (Table 2). For all three of the main scenarios, the combined approach led to the best R_t CRPS and R_t trajectory RMSE, and it consistently outperformed one or both of the individual approaches according to our other metrics. There may be some circumstances, however, where either the pure epidemiological or phylodynamic approaches are preferable, such as if one dataset is suspected to be highly biased or incomplete. This points to the benefit of the versatility of the EpiFusion program; while we emphasise the combined inference abilities of EpiFusion, it is possible to run analyses using either case incidence or the phylogenetic tree alone. Furthermore, the program is sufficiently fast for users to test tree only, incidence only, and combined approaches in a reasonable timeframe. It is also theoretically possible to specify the weight of each dataset’s contribution to the inference, allowing further customisation of the combined approach. Going forward, we aim to characterise the implementation and effect of data weighting more thoroughly.

In Fig 7 we explore the effect of increasing transmission and observation noise on the ability of the EpiFusion models to accurately infer R_t. Currently we do not explicitly model observation noise in the EpiFusion algorithm; however, the tree only approach appears particularly robust to even high levels of observation noise. This is possibly due to the extra information provided by branching events in the tree providing a smoothing effect despite noisy sampling rates, and further indicates the possible benefit of using phylogenetic data rather than solely case incidence data when estimating R_t. Interestingly, the Brier Score saw an improvement for all three approaches with increasing transmission noise. We believe that the increased transmission noise resulted in more extreme fluctuations in the R_t which provided more signal for the models to distinguish whether R_t was less than or greater than 1.0 (S8 Fig).

By benchmarking EpiFusion’s combined model against existing approaches we show that the model can achieve comparable or improved results compared to established epidemiological or phylodynamic tools. For many of the performance metrics used, the difference in scores between all models was marginal; however, EpiFusion led to improved R_t RMSE in all scenarios compared to all other models (Table 3). EpiNow2 proved difficult to parameterise for some scenarios, so it is also possible that an improved parameterisation of the model would result in better estimates. For example, it was not possible to parameterise a step-change in sampling rate in the EpiNow2 model, and the method consequently underperformed in the step-change in sampling scenario.

Finally we examined the performance of EpiFusion using data on the 2014 Ebola outbreak in Sierra Leone. The fact that the most recent common ancestor (MRCA) of the viral phylogeny (March 2014) occurs approximately two months prior to the first sampled case of Ebola in the region (May 2014) allowed modelling of R_t from an earlier time point than would have been possible using case incidence data alone. We found the model to be sensitive to the sampling parameterisation due to temporal bias in the sampling of genomic sequences compared to the case data, i.e. large fluctuations in the genomic sampling rate of cases over time would sometimes result in particle depletion (a steep drop in the number of particles inferring ‘possible’ trajectories) between particle resampling steps and a higher rejection rate of the MCMC algorithm. For this reason, it was necessary to run the model for a larger number of MCMC steps than was necessary using simulated data in order to improve the effective sample sizes of model parameters. Similarly, we found that it was necessary to run the particle filter with a greater number of particles to avoid this particle depletion, which also contributed to a slightly longer runtime than the other analyses in this paper. Despite these two caveats we found that R_t inferred from EpiFusion for this outbreak was similar to that previously obtained in the literature [50,54,55].
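Particle depletion of the kind described above can be monitored with the effective number of particles implied by the particle weights. The sketch below is a generic particle-filter diagnostic with multinomial resampling; it is an illustration under that assumption and not the EpiFusion implementation (whose algorithms are given as pseudocode in S2 Text), and all names are hypothetical.

```python
import numpy as np

def effective_particle_count(log_weights):
    """Effective number of particles, 1 / sum(w_i^2), for normalised weights.

    Particles whose trajectories are impossible carry a log weight of -inf.
    """
    lw = np.asarray(log_weights, dtype=float)
    finite = np.isfinite(lw)
    if not finite.any():
        return 0.0                      # every particle is impossible
    w = np.zeros_like(lw)
    w[finite] = np.exp(lw[finite] - lw[finite].max())  # stabilised exponent
    w /= w.sum()
    return float(1.0 / np.sum(w ** 2))

def multinomial_resample(particles, log_weights, rng):
    """Draw a new particle set with probability proportional to the weights."""
    lw = np.asarray(log_weights, dtype=float)
    w = np.exp(lw - lw[np.isfinite(lw)].max())
    w /= w.sum()
    index = rng.choice(len(particles), size=len(particles), p=w)
    return [particles[i] for i in index]

rng = np.random.default_rng(0)
healthy = [0.0, 0.0, 0.0, 0.0]                 # equal weights
depleted = [0.0, -np.inf, -np.inf, -np.inf]    # one surviving particle
print(effective_particle_count(healthy), effective_particle_count(depleted))
print(multinomial_resample(["a", "b", "c", "d"], depleted, rng))
```

A sharp fall in the effective particle count between resampling steps signals depletion; increasing the number of particles, as was done for the Ebola analysis, is one common remedy.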

Our approach retains some limitations and necessitates some assumptions that provide opportunity for future improvements. As with many models of this type, the model may underperform or exhibit convergence issues if provided with especially biased case incidence or phylogenetic tree data, for example in the early stages of an emerging outbreak where misdiagnosis as other conditions may be common and reported cases may comprise a combination of autochthonous and imported cases. Thus we advise potential users to exercise discretion when considering their data inputs. Unlike other phylodynamic approaches such as TimTam, EpiFusion does not estimate phylogenies alongside trajectories, and instead takes single phylogenetic trees as inputs. We aim to better account for phylogenetic uncertainty in the future. However, the computational trade-off of not performing tree inference means that our method may be appropriate for use in rapidly unfolding outbreaks once it has been further validated in a real-time setting, as it is highly scalable to inclusion of trees with thousands of tips. Although not yet optimised for high performance computing or able to take advantage of a GPU, the runtime of EpiFusion generally scales linearly with both tree and epidemic size (S11 Fig), making it suitable to analyse very large datasets, which may become more relevant due to the sharp increase in genomic sequencing during the recent COVID-19 pandemic. The model is therefore currently best suited as a post-hoc tool using an MCC tree generated with BEAST [50], or a time-scaled maximum-likelihood phylogeny such as that which can be generated using NextStrain [65].

The lightweight composition of this model provides the opportunity for the future introduction of additional complexity without overtly increasing computational load. This includes the introduction of population structure or vector population dynamics. The separation of the phylogenetic and epidemiological observation models in EpiFusion also lays the foundation for combining phylogenetic data with mathematical epidemiological models that previously would have been too complex to integrate into the phylodynamic likelihood, in order to jointly model epidemic trajectories.

In conclusion, we propose EpiFusion as a new addition to the small, but growing, number of tools that integrate phylodynamics and epidemiology for the modelling of infectious disease. EpiFusion builds upon the foundation laid by its predecessors to make improvements in computational efficiency, temporal resolution and flexibility.

Supporting information

S1 Text. Information on the importance sampling implementation used within EpiFusion.

https://doi.org/10.1371/journal.pcbi.1012528.s001

(DOCX)

S2 Text. Pseudocode for the two key EpiFusion algorithms: (1) the MCMC algorithm and (2) the particle filtering algorithm.

https://doi.org/10.1371/journal.pcbi.1012528.s002

(DOCX)

S3 Text. Details on the model parameterisation for the benchmarking section, where existing R_t modelling methods were used.

https://doi.org/10.1371/journal.pcbi.1012528.s003

(DOCX)

S1 Fig. The fit of the simulated incidence from the EpiFusion model to weekly incidence data as explained in the methods section.

The black dots represent case incidence data points c_t, which are compared to ρ_interval by the epidemiological observation model. We save the ρ_interval values from the model to facilitate examination of this fit. The coloured lines show the mean ρ_interval values and the shaded regions show HPD intervals of increasing credible mass. Here we show the results of this fit for the combined and case incidence-only approaches in the Scenario Testing section (the tree-only models do not have an epidemiological observation model so this fitting does not take place).

https://doi.org/10.1371/journal.pcbi.1012528.s004

(TIFF)

S2 Fig.

True infection trajectories, case incidence data, and phylogenetic trees for the step change in sampling (a, b, c) and transmission scenarios (d, e, f) in the Scenario Testing section.

https://doi.org/10.1371/journal.pcbi.1012528.s005

(TIFF)

S3 Fig. True infection trajectories, case incidence data, and phylogenetic trees for simulated outbreaks with increasing transmission noise.

Transmission noise was simulated in ReMASTER by varying the transmission rate at regular intervals drawn from a Poisson distribution with rate 6 days.

https://doi.org/10.1371/journal.pcbi.1012528.s006

(TIFF)

S4 Fig. True infection trajectories, case incidence data, and phylogenetic trees for simulated outbreaks with increasing observation noise.

Observation noise was simulated in ReMASTER by varying the sampling rate at intervals of 7 days.

https://doi.org/10.1371/journal.pcbi.1012528.s007

(TIFF)

S5 Fig. Publicly available existing MCC tree of Ebola sequences from 2014 obtained from Dellicour et al. (53).

The highlighted clade consisting of predominantly Sierra Leone sequences was subsampled for our analysis, and the small Guinea subclades and singleton nodes that represent repeated exports from Sierra Leone were removed. The origin of the highlighted clade was March 20th 2014, which preceded the first case data in Sierra Leone. We therefore modelled the outbreak from this date until the date of the last sampled sequence in the clade (August 4th 2015).

https://doi.org/10.1371/journal.pcbi.1012528.s008

(TIFF)

S6 Fig. Weekly confirmed and suspected cases of Ebola in Sierra Leone during the period of investigation obtained from Fang et al.

The first confirmed case was on May 18th 2014, two months after the root of the MCC tree that we used and the beginning of the time period we modelled. For our model, we fit to confirmed cases, but used the suspected cases to help inform our sampling rate priors by indicating what proportion of the true number of infections were being sampled as cases.

https://doi.org/10.1371/journal.pcbi.1012528.s009

(TIFF)

S7 Fig.

Comparison of EpiFusion and BDSky likelihoods on the same datasets for varying values of (a) beta, (b) gamma and (c) psi around the true values (marked by the blue vertical line). The stochastic and approximate nature of the EpiFusion likelihood means the values are not identical, though they do show good agreement in awarding the true value with the highest likelihood. As the model values of each parameter become further from the true value, the EpiFusion likelihood shows a tendency to drop sharply due to the parameter values implying very unlikely or impossible trajectories. The EpiFusion models appear to demonstrate a marginal overestimation of the sampling parameter psi here; however, this was not seen in the simulation based calibration.

https://doi.org/10.1371/journal.pcbi.1012528.s010

(TIFF)

S8 Fig.

Rt trajectory fits for EpiFusion models on datasets with increasing transmission noise. The true Rt (black line) fluctuates in intervals of ~ 6 days. The row labels (right) indicate the noise level (see Methods ‘Noise Testing’ for more information).

https://doi.org/10.1371/journal.pcbi.1012528.s011

(TIFF)

S9 Fig. Rt trajectory fits for EpiFusion models on datasets with increasing observation noise.

The true Rt (black line) is smooth; uncertainty in the fits increases as the data become noisier, with the sampling rate changing every 7 days. The row labels (right) indicate the noise level (see Methods ‘Noise Testing’ for more information).

https://doi.org/10.1371/journal.pcbi.1012528.s012

(TIFF)

S10 Fig. Trajectory fits for a random sample of 60 of the 500 models fitted in the Simulation Based Calibration section.

The true trajectory is marked by the black line, with the mean inferred trajectory represented by the green line and the HPD intervals indicated by shaded green regions.

https://doi.org/10.1371/journal.pcbi.1012528.s013

(TIFF)

S11 Fig.

(a, b, c) Runtime statistics for EpiFusion models with increasing tree size, outbreak size (peak number of individuals infected), and outbreak length (days) using data from the Simulation Based Calibration. Runtime scales linearly with tree size. Runtimes represent the time taken (in minutes) to generate 2000 MCMC samples from EpiFusion on a MacBook Air M3 with an 8-core CPU. EpiFusion has not yet been configured to run on a GPU. (d) Boxplots of the number of effective samples from the posterior generated per minute for the four key EpiFusion particle MCMC variables. Only the initial value of the infection rate beta is shown, as beta is fitted as a time-varying variable within the particle filter. At these rates, obtaining more than 100 effective samples from the posterior for each variable takes approximately 25 minutes.

https://doi.org/10.1371/journal.pcbi.1012528.s014

(TIFF)

S1 Table. Summary of the 500 replicate outbreaks modelled (with varying parameters) for the Simulation Based Calibration section.

We show characteristics of the datasets: the median epidemic peak (maximum number of individuals infected at one time), number of cases, and tree size. Next we show the ‘scaled deviation from truth’ for the gamma, phi and psi parameters. This is calculated as the difference between the model mean and the true value of the parameter, scaled by the true value of the parameter. Finally we show the runtime in minutes to generate 2000 MCMC samples.

https://doi.org/10.1371/journal.pcbi.1012528.s015

(XLSX)
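The ‘scaled deviation from truth’ reported in S1 Table is the difference between the model mean and the true parameter value, divided by the true value. A minimal Python sketch of that calculation is given below. The function name `scaled_deviation` and the example numbers are hypothetical and are not taken from EpiFusion or from the table itself.

```python
from statistics import fmean


def scaled_deviation(posterior_samples, true_value):
    """Return (posterior mean - true value) / true value."""
    if true_value == 0:
        raise ValueError("true_value must be non-zero to scale the deviation")
    return (fmean(posterior_samples) - true_value) / true_value


# Hypothetical posterior samples for a rate whose true value is 0.10.
example = scaled_deviation([0.12, 0.10, 0.11], 0.10)
print(f"scaled deviation: {example:.3f}")
```

A positive value indicates that the model mean overestimates the true parameter and a negative value indicates underestimation; dividing by the true value makes deviations comparable across parameters on different scales.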

S2 Table. ReMASTER parameters for outbreak simulations for the Scenario Testing section.

The ‘Main Scenarios’ include the Baseline, Sampling and Transmission scenarios. Here constant rates were used for each reaction. In the Sampling scenario, the rate of sampling was increased 10-fold on day 35. In the Transmission scenario, the rate of transmission was increased 3-fold on day 100. For the noise scenarios, either transmission or sampling rates were changed at regular intervals (intervals drawn from a Poisson distribution with rate 6 for the transmission noise, and every 7 days for the observation noise). We increased the noise by drawing the rate value for each interval from distributions with increasing standard deviations.

https://doi.org/10.1371/journal.pcbi.1012528.s016

(XLSX)
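The noise scenarios described for S2 Table change a rate at successive intervals and draw each interval's rate from a distribution whose standard deviation grows with the noise level. The sketch below illustrates one way such a piecewise-constant schedule could be generated in plain Python. It is not the ReMASTER configuration used for the simulations: the function names, the normal distribution truncated at zero, the handling of zero-length intervals, and all numeric values are assumptions made for illustration only.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return count
        count += 1


def rate_schedule(rng, base_rate, sd, total_days, poisson_mean=None, fixed_step=7):
    """Return (change_days, rates) for a piecewise-constant rate.

    With poisson_mean set, interval lengths are Poisson draws (transmission
    noise); otherwise the rate changes every fixed_step days (observation noise).
    """
    change_days, rates = [], []
    day = 0
    while day < total_days:
        change_days.append(day)
        # Assumption: normal draw around base_rate, truncated at zero.
        rates.append(max(0.0, rng.gauss(base_rate, sd)))
        if poisson_mean is None:
            step = fixed_step
        else:
            # Assumption: zero-length intervals are bumped to one day.
            step = max(1, poisson_draw(rng, poisson_mean))
        day += step
    return change_days, rates


rng = random.Random(42)
for sd in (0.0, 0.05, 0.1):  # hypothetical noise levels
    days, rates = rate_schedule(rng, base_rate=0.3, sd=sd, total_days=60, poisson_mean=6)
    print(sd, len(days), [round(r, 3) for r in rates[:3]])
```

Calling `rate_schedule` without `poisson_mean` gives the fixed 7-day schedule corresponding to the observation noise case, while a standard deviation of zero reproduces a constant rate.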

S3 Table. EpiFusion model parameter priors for each model in the Scenario Testing and Noise Testing sections.

For the Noise Testing section, the same priors were used for all models.

https://doi.org/10.1371/journal.pcbi.1012528.s017

(XLSX)

S4 Table. EpiFusion model results by parameter for each model in the Scenario Testing section.

https://doi.org/10.1371/journal.pcbi.1012528.s018

(XLSX)

S5 Table. Calculation methods for metrics used to assess model performance.

https://doi.org/10.1371/journal.pcbi.1012528.s019

(XLSX)

Acknowledgments

The authors would like to express their gratitude to Dr. Alex Zarebski, Prof. Oliver Pybus, Prof. Katia Koelle, Dr. David Hodgson, Dr. Alexis Robert, Antoine Zwaans, Ciara McCarthy, Gregory Barnsley and Emilie Finch for their advice and guidance during the development of this work.

References

1. Flaxman S, Mishra S, Gandy A, Unwin HJT, Mellan TA, Coupland H, et al. Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe. Nature. 2020 Jun 8;584(7820):257–61. pmid:32512579
2. Candido DS, Claro IM, de Jesus JG, Souza WM, Moreira FRR, Dellicour S, et al. Evolution and epidemic spread of SARS-CoV-2 in Brazil. Science. 2020 Sep 4;369(6508):1255–60.
3. Krämer A, Akmatov M, Kretzschmar M. Principles of Infectious Disease Epidemiology. Modern Infectious Disease Epidemiology. 2010;85.
4. Douglas J, Mendes FK, Bouckaert R, Xie D, Jiménez-Silva CL, Swanepoel C, et al. Phylodynamics reveals the role of human travel and contact tracing in controlling the first wave of COVID-19 in four island nations. Virus Evol. 2021;7(2). pmid:34527282
5. Padmanabhan R, Abed HS, Meskin N, Khattab T, Shraim M, Al-Hitmi MA. A review of mathematical model-based scenario analysis and interventions for COVID-19. Vol. 209, Computer Methods and Programs in Biomedicine. 2021. pmid:34392001
6. Krämer A, Akmatov M, Kretzschmar M. Principles of Infectious Disease Epidemiology. Modern Infectious Disease Epidemiology [Internet]. 2010;85. Available from: /pmc/articles/PMC7178878/
7. Frost SDW, Pybus OG, Gog JR, Viboud C, Bonhoeffer S, Bedford T. Eight challenges in phylodynamic inference. Epidemics. 2015 Mar;10:88–92. pmid:25843391
8. Fairchild G, Tasseff B, Khalsa H, Generous N, Daughton AR, Velappan N, et al. Epidemiological data challenges: Planning for a more robust future through data standards. Vol. 6, Frontiers in Public Health. 2018.
9. Peters R, Stevenson M. Zika virus diagnosis: challenges and solutions. Clinical Microbiology and Infection [Internet]. 2019 Feb;25(2):142–6. Available from: http://www.clinicalmicrobiologyandinfection.com/article/S1198743X18307742/fulltext pmid:30553031
10. Lourenço J, Tennant W, Faria NR, Walker A, Gupta S, Recker M. Challenges in dengue research: A computational perspective. Evol Appl [Internet]. 2018 Apr;11(4):516. Available from: /pmc/articles/PMC5891037/ pmid:29636803
11. Kitagawa G. Monte Carlo Filter and Smoother for Non-Gaussian Nonlinear State Space Models. Journal of Computational and Graphical Statistics. 1996;5(1).
12. Grenfell BT, Pybus OG, Gog JR, Wood JLN, Daly JM, Mumford JA, et al. Unifying the Epidemiological and Evolutionary Dynamics of Pathogens. Science [Internet]. 2004 Jan;303(5656):327–32. Available from: https://www.science.org/doi/abs/10.1126/science.1090727
13. Hill V, Ruis C, Bajaj S, Pybus OG, Kraemer MUG. Progress and challenges in virus genomic epidemiology. Vol. 37, Trends in Parasitology. 2021. pmid:34620561
14. Stadler T, Kühnert D, Bonhoeffer S, Drummond AJ. Birth-death skyline plot reveals temporal changes of epidemic spread in HIV and hepatitis C virus (HCV). Proc Natl Acad Sci U S A. 2013 Jan;110(1):228–33. pmid:23248286
15. Volz EM, Siveroni I. Bayesian phylodynamic inference with complex models. PLoS Comput Biol. 2018;14(11). pmid:30422979
16. Hall RJ, Brown LM, Altizer S. Modeling vector-borne disease risk in migratory animals under climate change. Integr Comp Biol [Internet]. 2016 Aug;56(2):353–64. Available from: https://academic.oup.com/icb/article/56/2/353/2240693
17. Lee SA, Economou T, Catão R de C, Barcellos C, Lowe R. The impact of climate suitability, urbanisation, and connectivity on the expansion of dengue in 21st century Brazil. PLoS Negl Trop Dis. 2021;15(12). pmid:34882679
18. mok Jung S, Endo A, Akhmetzhanov AR, Nishiura H. Predicting the effective reproduction number of COVID-19: inference using human mobility, temperature, and risk awareness. International Journal of Infectious Diseases. 2021 Dec;113:47–54. pmid:34628020
19. Kraemer MUG, Golding N, Bisanzio D, Bhatt S, Pigott DM, Ray SE, et al. Utilizing general human movement models to predict the spread of emerging infectious diseases in resource poor settings. Sci Rep [Internet]. 2019 Mar;9(1):1–11. Available from: https://www.nature.com/articles/s41598-019-41192-3
20. Moran KR, Fairchild G, Generous N, Hickmann K, Osthus D, Priedhorsky R, et al. Epidemic forecasting is messier than weather forecasting: The role of human behavior and internet data streams in epidemic forecast. Journal of Infectious Diseases. 2016;214. pmid:28830111
21. Okoror LE, Bankefa EO, Ajayi EO, Ojo SK. Misdiagnosis of Dengue Fever and Co-infection With Malaria and Typhoid Fevers in Rural Areas in Southwest Nigeria. 2021 Mar; Available from: https://www.researchsquare.com
22. Oidtman RJ, España G, Alex Perkins T. Co-circulation and misdiagnosis led to underestimation of the 2015–2017 Zika epidemic in the Americas. PLoS Negl Trop Dis [Internet]. 2021 Mar;15(3):e0009208. Available from: https://journals.plos.org/plosntds/article?id=10.1371/journal.pntd.0009208 pmid:33647014
23. Brady O. Mapping the emerging burden of dengue. Elife [Internet]. 2019 May;8. Available from: /pmc/articles/PMC6513550/ pmid:31081497
24. Hamlet A, Gaythorpe KAM, Garske T, Ferguson NM. Seasonal and inter-annual drivers of yellow fever transmission in south America. PLoS Negl Trop Dis. 2021;15(1). pmid:33428623
25. Valentine MJ, Murdock CC, Kelly PJ. Sylvatic cycles of arboviruses in non-human primates. Vol. 12, Parasites and Vectors. 2019. pmid:31578140
26. Naveca FG, Claro I, Giovanetti M, de Jesus JG, Xavier J, Iani FC de M, et al. Genomic, epidemiological and digital surveillance of Chikungunya virus in the Brazilian Amazon. PLoS Negl Trop Dis. 2018;13(3).
27. Faria NR, Kraemer MUG, Hill SC, De Jesus JG, Aguiar RS, Iani FCM, et al. Genomic and epidemiological monitoring of yellow fever virus transmission potential. Science. 2018;361(6405). pmid:30139911
28. Klitting R, Kafetzopoulou LE, Thiery W, Dudas G, Gryseels S, Kotamarthi A, et al. Predicting the evolution of the Lassa virus endemic area and population at risk over the next decades. Nat Commun. 2022;13(1).
29. Giovanetti M, Faria NR, Lourenço J, Goes de Jesus J, Xavier J, Claro IM, et al. Genomic and Epidemiological Surveillance of Zika Virus in the Amazon Region. Cell Rep. 2020;30(7). pmid:32075736
30. Zarebski AE, du Plessis L, Parag KV, Pybus OG. A computationally tractable birth-death model that combines phylogenetic and epidemiological data. PLoS Comput Biol. 2022;18(2). pmid:35148311
31. Rasmussen DA, Ratmann O, Koelle K. Inference for nonlinear epidemiological models using genealogies and time series. PLoS Comput Biol. 2011;7(8). pmid:21901082
32. Rasmussen DA, Volz EM, Koelle K. Phylodynamic Inference for Structured Epidemiological Models. PLoS Comput Biol. 2014;10(4). pmid:24743590
33. Vaughan TG, Leventhal GE, Rasmussen DA, Drummond AJ, Welch D, Stadler T, et al. Estimating Epidemic Incidence and Prevalence from Genomic Data. Mol Biol Evol. 2019;36(8). pmid:31058982
34. Andréoletti J, Zwaans A, Warnock RCM, Aguirre-Fernández G, Barido-Sottani J, Gupta A, et al. The Occurrence Birth–Death Process for Combined-Evidence Analysis in Macroevolution and Epidemiology. Syst Biol [Internet]. 2022 Oct 12 [cited 2024 May 28];71(6):1440–52. Available from: pmid:35608305
35. Gill A, Koskela J, Didelot X, Everitt RG. Bayesian Inference of Reproduction Number from Epidemiological and Genetic Data Using Particle MCMC. 2023 Nov 16 [cited 2024 May 28]; Available from: http://arxiv.org/abs/2311.09838
36. Funk S, Camacho A, Kucharski AJ, Eggo RM, Edmunds WJ. Real-time forecasting of infectious disease dynamics with a stochastic semi-mechanistic model. Epidemics. 2018;22. pmid:28038870
37. Murray LM. Bayesian state-space modelling on high-performance hardware using LibBi. J Stat Softw. 2015;67(10).
38. Li LM, Grassly NC, Fraser C. Quantifying transmission heterogeneity using both pathogen phylogenies and incidence time series. Mol Biol Evol. 2017;34(11). pmid:28981709
39. Volz EM. Complex population dynamics and the coalescent under neutrality. Genetics. 2012;190(1). pmid:22042576
40. Gillespie DT. Approximate accelerated stochastic simulation of chemically reacting systems. Journal of Chemical Physics. 2001;115(4).
41. Zarebski AE, Zwaans A, Gutierrez B, Plessis L du, Pybus OG. Estimating epidemic dynamics with genomic and time series data. medRxiv [Internet]. 2023 Aug 8 [cited 2023 Dec 16];2023.08.03.23293620. Available from: https://www.medrxiv.org/content/10.1101/2023.08.03.23293620v1
42. Manceau M, Gupta A, Vaughan T, Stadler T. The probability distribution of the ancestral population size conditioned on the reconstructed phylogenetic tree with occurrence data. J Theor Biol. 2021;509. pmid:32739241
43. Gupta A, Manceau M, Vaughan T, Khammash M, Stadler T. The probability distribution of the reconstructed phylogenetic tree with occurrence data. J Theor Biol. 2020;488.
44. Cunningham N, Griffin JE, Wild DL. ParticleMDI: particle Monte Carlo methods for the cluster analysis of multiple datasets with applications to cancer subtype identification. Adv Data Anal Classif. 2020;14(2).
45. Caron F, Davy M, Duflos E, Vanheeghe P. Particle filtering for multisensor data fusion with switching observation models: Application to land vehicle positioning. IEEE Transactions on Signal Processing. 2007;55(6 I).
46. Stolz U, Stadler T, Müller NF, Vaughan TG. Joint Inference of Migration and Reassortment Patterns for Viruses with Segmented Genomes. Mol Biol Evol [Internet]. 2022 Jan 7 [cited 2024 Apr 16];39(1). Available from: https://dx.doi.org/10.1093/molbev/msab342 pmid:34893876
47. Andréoletti J, Zwaans A, Warnock RCM, Aguirre-Fernández G, Barido-Sottani J, Gupta A, et al. The Occurrence Birth–Death Process for Combined-Evidence Analysis in Macroevolution and Epidemiology. Syst Biol [Internet]. 2022 Oct 12 [cited 2024 Apr 16];71(6):1440–52. Available from: pmid:35608305
48. ReMASTER [Internet]. [cited 2023 Dec 5]. Available from: https://tgvaughan.github.io/remaster/
49. Vaughan TG. ReMASTER: Improved phylodynamic simulation for BEAST 2.7. bioRxiv [Internet]. 2023 Oct 10 [cited 2023 Dec 16];2023.10.09.561485. Available from: https://www.biorxiv.org/content/10.1101/2023.10.09.561485v1
50. Bouckaert R, Vaughan TG, Barido-Sottani J, Duchêne S, Fourment M, Gavryushkina A, et al. BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis. PLoS Comput Biol [Internet]. 2019;15(4):e1006650. Available from: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006650 pmid:30958812
51. Estimate Real-Time Case Counts and Time-Varying Epidemiological Parameters • EpiNow2 [Internet]. Available from: https://epiforecasts.io/EpiNow2/
52. Fang LQ, Yang Y, Jiang JF, Yao HW, Kargbo D, Lou Li X, et al. Transmission dynamics of Ebola virus disease and intervention effectiveness in Sierra Leone. Proc Natl Acad Sci U S A. 2016;113(16). pmid:27035948
53. Dellicour S, Baele G, Dudas G, Faria NR, Pybus OG, Suchard MA, et al. Phylodynamic assessment of intervention strategies for the West African Ebola virus outbreak. Nat Commun. 2018;9(1). pmid:29884821
54. Kahn R, Peak CM, Fernández-Gracia J, Hill A, Jambai A, Ganda L, et al. Incubation periods impact the spatial predictability of cholera and Ebola outbreaks in Sierra Leone. Proc Natl Acad Sci U S A [Internet]. 2020 Mar 3 [cited 2024 May 10];117(9):5067–73. Available from: https://www.pnas.org/doi/abs/10.1073/pnas.1913052117 pmid:32054785
55. Althaus CL. Estimating the Reproduction Number of Ebola Virus (EBOV) During the 2014 Outbreak in West Africa. PLoS Curr [Internet]. 2014 [cited 2024 May 10];6. Available from: /pmc/articles/PMC4169395/ pmid:25642364
56. Sierra Leone to Impose 3-Day Ebola Quarantine—The New York Times [Internet]. [cited 2024 May 10]. Available from: https://www.nytimes.com/2014/09/07/world/africa/sierra-leone-to-impose-widespread-ebola-quarantine.html
57. Koutsouris DD, Pitoglou S, Anastasiou A, Koumpouros Y. A Method of Estimating Time-to-Recovery for a Disease Caused by a Contagious Pathogen Such as SARS-CoV-2 Using a Time Series of Aggregated Case Reports. Healthcare [Internet]. 2023 Mar 1 [cited 2024 Apr 16];11(5). Available from: /pmc/articles/PMC10001208/ pmid:36900738
58. Hakki S, Zhou J, Jonnerby J, Singanayagam A, Barnett JL, Madon KJ, et al. Onset and window of SARS-CoV-2 infectiousness and temporal correlation with symptom onset: a prospective, longitudinal, community cohort study. Lancet Respir Med [Internet]. 2022 Nov 1 [cited 2024 Apr 16];10(11):1061–73. Available from: http://www.thelancet.com/article/S2213260022002260/fulltext pmid:35988572
59. Zhao S, Lin Q, Ran J, Musa SS, Yang G, Wang W, et al. Preliminary estimation of the basic reproduction number of novel coronavirus (2019-nCoV) in China, from 2019 to 2020: A data-driven analysis in the early phase of the outbreak. International Journal of Infectious Diseases. 2020;92.
60. Messina JP, Brady OJ, Golding N, Kraemer MUG, Wint GRW, Ray SE, et al. The current and future global distribution and population at risk of dengue. Nat Microbiol [Internet]. 2019 Jun;4(9):1508–15. Available from: https://www.nature.com/articles/s41564-019-0476-8 pmid:31182801
61. de Araújo TVB, Rodrigues LC, de Alencar Ximenes RA, de Barros Miranda-Filho D, Montarroyos UR, de Melo APL, et al. Association between Zika virus infection and microcephaly in Brazil, January to May, 2016: preliminary report of a case-control study. Lancet Infect Dis. 2016;16(12). pmid:27641777
62. Contreras S, Villavicencio HA, Medina-Ortiz D, Saavedra CP, Olivera-Nappa Á. Real-Time Estimation of Rt for Supporting Public-Health Policies Against COVID-19. Front Public Health. 2020;8. pmid:33415091
63. Flaxman S, Mishra S, Gandy A, Unwin HJT, Mellan TA, Coupland H, et al. Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe. Nature. 2020;584.
64. Barnard RC, Davies NG, Pearson CAB, Jit M, Edmunds J. Modelling the potential consequences of the Omicron SARS-CoV-2 variant in England | CMMID Repository. Report in progress. 2021;(December).
65. Hadfield J, Megill C, Bell SM, Huddleston J, Potter B, Callender C, et al. NextStrain: Real-time tracking of pathogen evolution. Bioinformatics. 2018;34(23). pmid:29790939

[ref1] 1. Flaxman S, Mishra S, Gandy A, Unwin HJT, Mellan TA, Coupland H, et al. Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe. Nature 2020 584:7820. 2020 Jun 8;584(7820):257–61. pmid:32512579
View Article
PubMed/NCBI
Google Scholar

[2] View Article

[3] PubMed/NCBI

[4] Google Scholar

[ref2] 2. Candido DS, Claro IM, de Jesus JG, Souza WM, Moreira FRR, Dellicour S, et al. Evolution and epidemic spread of SARS-CoV-2 in Brazil. Science (1979). 2020 Sep 4;369(6508):1255–60.
View Article
Google Scholar

[6] View Article

[7] Google Scholar

[ref3] 3. Krämer A, Akmatov M, Kretzschmar M. Principles of Infectious Disease Epidemiology. Modern Infectious Disease Epidemiology. 2010;85.
View Article
Google Scholar

[9] View Article

[10] Google Scholar

[ref4] 4. Douglas J, Mendes FK, Bouckaert R, Xie D, Jiménez-Silva CL, Swanepoel C, et al. Phylodynamics reveals the role of human travel and contact tracing in controlling the first wave of COVID-19 in four island nations. Virus Evol. 2021;7(2). pmid:34527282
View Article
PubMed/NCBI
Google Scholar

[12] View Article

[13] PubMed/NCBI

[14] Google Scholar

[ref5] 5. Padmanabhan R, Abed HS, Meskin N, Khattab T, Shraim M, Al-Hitmi MA. A review of mathematical model-based scenario analysis and interventions for COVID-19. Vol. 209, Computer Methods and Programs in Biomedicine. 2021. pmid:34392001
View Article
PubMed/NCBI
Google Scholar

[16] View Article

[17] PubMed/NCBI

[18] Google Scholar

[ref6] 6. Krämer A, Akmatov M, Kretzschmar M. Principles of Infectious Disease Epidemiology. Modern Infectious Disease Epidemiology [Internet]. 2010;85. Available from: /pmc/articles/PMC7178878/
View Article
Google Scholar

[20] View Article

[21] Google Scholar

[ref7] 7. Frost SDW, Pybus OG, Gog JR, Viboud C, Bonhoeffer S, Bedford T. Eight challenges in phylodynamic inference. Epidemics. 2015 Mar;10:88–92. pmid:25843391
View Article
PubMed/NCBI
Google Scholar

[23] View Article

[24] PubMed/NCBI

[25] Google Scholar

[ref8] 8. Fairchild G, Tasseff B, Khalsa H, Generous N, Daughton AR, Velappan N, et al. Epidemiological data challenges: Planning for a more robust future through data standards. Vol. 6, Frontiers in Public Health. 2018.

[ref9] 9. Peters R, Stevenson M. Zika virus diagnosis: challenges and solutions. Clinical Microbiology and Infection [Internet]. 2019 Feb;25(2):142–6. Available from: http://www.clinicalmicrobiologyandinfection.com/article/S1198743X18307742/fulltext pmid:30553031
View Article
PubMed/NCBI
Google Scholar

[28] View Article

[29] PubMed/NCBI

[30] Google Scholar

[ref10] 10. Lourenço J, Tennant W, Faria NR, Walker A, Gupta S, Recker M. Challenges in dengue research: A computational perspective. Evol Appl [Internet]. 2018 Apr;11(4):516. Available from: /pmc/articles/PMC5891037/ pmid:29636803
View Article
PubMed/NCBI
Google Scholar

[32] View Article

[33] PubMed/NCBI

[34] Google Scholar

[ref11] 11. Kitagawa G. Monte Carlo Filter and Smoother for Non-Gaussian Nonlinear State Space Models. Journal of Computational and Graphical Statistics. 1996;5(1).
View Article
Google Scholar

[36] View Article

[37] Google Scholar

[ref12] 12. Grenfell BT, Pybus OG, Gog JR, Wood JLN, Daly JM, Mumford JA, et al. Unifying the Epidemiological and Evolutionary Dynamics of Pathogens. Science (1979) [Internet]. 2004 Jan;303(5656):327–32. Available from: https://www.science.org/doi/abs/10.1126/science.1090727
View Article
Google Scholar

[39] View Article

[40] Google Scholar

[ref13] 13. Hill V, Ruis C, Bajaj S, Pybus OG, Kraemer MUG. Progress and challenges in virus genomic epidemiology. Vol. 37, Trends in Parasitology. 2021. pmid:34620561
View Article
PubMed/NCBI
Google Scholar

[42] View Article

[43] PubMed/NCBI

[44] Google Scholar

[ref14] 14. Stadler T, Kühnert D, Bonhoeffer S, Drummond AJ. Birth-death skyline plot reveals temporal changes of epidemic spread in HIV and hepatitis C virus (HCV). Proc Natl Acad Sci U S A. 2013 Jan;110(1):228–33. pmid:23248286
View Article
PubMed/NCBI
Google Scholar

[46] View Article

[47] PubMed/NCBI

[48] Google Scholar

[ref15] 15. Volz EM, Siveroni I. Bayesian phylodynamic inference with complex models. PLoS Comput Biol. 2018;14(11). pmid:30422979
View Article
PubMed/NCBI
Google Scholar

[50] View Article

[51] PubMed/NCBI

[52] Google Scholar

[ref16] 16. Hall RJ, Brown LM, Altizer S. Modeling vector-borne disease risk in migratory animals under climate change. Integr Comp Biol [Internet]. 2016 Aug;56(2):353–64. Available from: https://academic.oup.com/icb/article/56/2/353/2240693
View Article
Google Scholar

[54] View Article

[55] Google Scholar

[ref17] 17. Lee SA, Economou T, Catão R de C, Barcellos C, Lowe R. The impact of climate suitability, urbanisation, and connectivity on the expansion of dengue in 21st century Brazil. PLoS Negl Trop Dis. 2021;15(12). pmid:34882679
View Article
PubMed/NCBI
Google Scholar

[57] View Article

[58] PubMed/NCBI

[59] Google Scholar

[ref18] 18. mok Jung S, Endo A, Akhmetzhanov AR, Nishiura H. Predicting the effective reproduction number of COVID-19: inference using human mobility, temperature, and risk awareness. International Journal of Infectious Diseases. 2021 Dec;113:47–54. pmid:34628020
View Article
PubMed/NCBI
Google Scholar

[61] View Article

[62] PubMed/NCBI

[63] Google Scholar

[ref19] 19. Kraemer MUG, Golding N, Bisanzio D, Bhatt S, Pigott DM, Ray SE, et al. Utilizing general human movement models to predict the spread of emerging infectious diseases in resource poor settings. Scientific Reports 2019 9:1 [Internet]. 2019 Mar;9(1):1–11. Available from: https://www.nature.com/articles/s41598-019-41192-3
View Article
Google Scholar

[65] View Article

[66] Google Scholar

[ref20] 20. Moran KR, Fairchild G, Generous N, Hickmann K, Osthus D, Priedhorsky R, et al. Epidemic forecasting is messier than weather forecasting: The role of human behavior and internet data streams in epidemic forecast. Journal of Infectious Diseases. 2016;214. pmid:28830111
View Article
PubMed/NCBI
Google Scholar

[68] View Article

[69] PubMed/NCBI

[70] Google Scholar

[ref21] 21. Okoror LE, Bankefa EO, Ajayi EO, Ojo SK. Misdiagnosis of Dengue Fever and Co-infection With Malaria and Typhoid Fevers in Rural Areas in Southwest Nigeria. 2021 Mar; Available from: https://www.researchsquare.com
View Article
Google Scholar

[72] View Article

[73] Google Scholar

[ref22] 22. Oidtman RJ, España G, Alex Perkins T. Co-circulation and misdiagnosis led to underestimation of the 2015–2017 Zika epidemic in the Americas. PLoS Negl Trop Dis [Internet]. 2021 Mar;15(3):e0009208. Available from: https://journals.plos.org/plosntds/article?id=10.1371/journal.pntd.0009208 pmid:33647014
View Article
PubMed/NCBI
Google Scholar

[75] View Article

[76] PubMed/NCBI

[77] Google Scholar

[ref23] 23. Brady O. Mapping the emerging burden of dengue. Elife [Internet]. 2019 May;8. Available from: /pmc/articles/PMC6513550/ pmid:31081497
View Article
PubMed/NCBI
Google Scholar

[79] View Article

[80] PubMed/NCBI

[81] Google Scholar

[ref24] 24. Hamlet A, Gaythorpe KAM, Garske T, Ferguson NM. Seasonal and inter-annual drivers of yellow fever transmission in south America. PLoS Negl Trop Dis. 2021;15(1). pmid:33428623
View Article
PubMed/NCBI
Google Scholar

[83] View Article

[84] PubMed/NCBI

[85] Google Scholar

[ref25] 25. Valentine MJ, Murdock CC, Kelly PJ. Sylvatic cycles of arboviruses in non-human primates. Vol. 12, Parasites and Vectors. 2019. pmid:31578140
View Article
PubMed/NCBI
Google Scholar

[87] View Article

[88] PubMed/NCBI

[89] Google Scholar

[ref26] 26. Naveca FG, Claro I, Giovanetti M, de Jesus JG, Xavier J, Iani FC de M, et al. Genomic, epidemiological and digital surveillance of Chikungunya virus in the Brazilian Amazon. PLoS Negl Trop Dis. 2018;13(3).
View Article
Google Scholar

[91] View Article

[92] Google Scholar

[ref27] 27. Faria NR, Kraemer MUG, Hill SC, De Jesus JG, Aguiar RS, Iani FCM, et al. Genomic and epidemiological monitoring of yellow fever virus transmission potential. Science (1979). 2018;361(6405). pmid:30139911
View Article
PubMed/NCBI
Google Scholar

[94] View Article

[95] PubMed/NCBI

[96] Google Scholar

[ref28] 28. Klitting R, Kafetzopoulou LE, Thiery W, Dudas G, Gryseels S, Kotamarthi A, et al. Predicting the evolution of the Lassa virus endemic area and population at risk over the next decades. Nat Commun. 2022;13(1).
View Article
Google Scholar

[98] View Article

[99] Google Scholar

[ref29] 29. Giovanetti M, Faria NR, Lourenço J, Goes de Jesus J, Xavier J, Claro IM, et al. Genomic and Epidemiological Surveillance of Zika Virus in the Amazon Region. Cell Rep. 2020;30(7). pmid:32075736
View Article
PubMed/NCBI
Google Scholar

[101] View Article

[102] PubMed/NCBI

[103] Google Scholar

[ref30] 30. Zarebski AE, du Plessis L, Parag KV, Pybus OG. A computationally tractable birth-death model that combines phylogenetic and epidemiological data. PLoS Comput Biol. 2022;18(2). pmid:35148311
View Article
PubMed/NCBI
Google Scholar

[105] View Article

[106] PubMed/NCBI

[107] Google Scholar

[ref31] 31. Rasmussen DA, Ratmann O, Koelle K. Inference for nonlinear epidemiological models using genealogies and time series. PLoS Comput Biol. 2011;7(8). pmid:21901082
View Article
PubMed/NCBI
Google Scholar

[109] View Article

[110] PubMed/NCBI

[111] Google Scholar

[ref32] 32. Rasmussen DA, Volz EM, Koelle K. Phylodynamic Inference for Structured Epidemiological Models. PLoS Comput Biol. 2014;10(4). pmid:24743590
View Article
PubMed/NCBI
Google Scholar

[113] View Article

[114] PubMed/NCBI

[115] Google Scholar

[ref33] 33. Vaughan TG, Leventhal GE, Rasmussen DA, Drummond AJ, Welch D, Stadler T, et al. Estimating Epidemic Incidence and Prevalence from Genomic Data. Mol Biol Evol. 2019;36(8). pmid:31058982
View Article
PubMed/NCBI
Google Scholar

[117] View Article

[118] PubMed/NCBI

[119] Google Scholar

[ref34] 34. Andréoletti J, Zwaans A, Warnock RCM, Aguirre-Fernández G, Barido-Sottani J, Gupta A, et al. The Occurrence Birth–Death Process for Combined-Evidence Analysis in Macroevolution and Epidemiology. Syst Biol [Internet]. 2022 Oct 12 [cited 2024 May 28];71(6):1440–52. Available from: pmid:35608305
View Article
PubMed/NCBI
Google Scholar

[121] View Article

[122] PubMed/NCBI

[123] Google Scholar

[ref35] 35. Gill A, Koskela J, Didelot X, Everitt RG. Bayesian Inference of Reproduction Number from Epidemiological and Genetic Data Using Particle MCMC. 2023 Nov 16 [cited 2024 May 28]; Available from: http://arxiv.org/abs/2311.09838
View Article
Google Scholar

[125] View Article

[126] Google Scholar

[ref36] 36. Funk S, Camacho A, Kucharski AJ, Eggo RM, Edmunds WJ. Real-time forecasting of infectious disease dynamics with a stochastic semi-mechanistic model. Epidemics. 2018;22. pmid:28038870
View Article
PubMed/NCBI
Google Scholar

[128] View Article

[129] PubMed/NCBI

[130] Google Scholar

[ref37] 37. Murray LM. Bayesian state-space modelling on high-performance hardware using LibBi. J Stat Softw. 2015;67(10).

[ref38] 38. Li LM, Grassly NC, Fraser C. Quantifying transmission heterogeneity using both pathogen phylogenies and incidence time series. Mol Biol Evol. 2017;34(11). pmid:28981709

[ref39] 39. Volz EM. Complex population dynamics and the coalescent under neutrality. Genetics. 2012;190(1). pmid:22042576

[ref40] 40. Gillespie DT. Approximate accelerated stochastic simulation of chemically reacting systems. Journal of Chemical Physics. 2001;115(4).

[ref41] 41. Zarebski AE, Zwaans A, Gutierrez B, du Plessis L, Pybus OG. Estimating epidemic dynamics with genomic and time series data. medRxiv [Internet]. 2023 Aug 8 [cited 2023 Dec 16];2023.08.03.23293620. Available from: https://www.medrxiv.org/content/10.1101/2023.08.03.23293620v1

[ref42] 42. Manceau M, Gupta A, Vaughan T, Stadler T. The probability distribution of the ancestral population size conditioned on the reconstructed phylogenetic tree with occurrence data. J Theor Biol. 2021;509. pmid:32739241

[ref43] 43. Gupta A, Manceau M, Vaughan T, Khammash M, Stadler T. The probability distribution of the reconstructed phylogenetic tree with occurrence data. J Theor Biol. 2020;488.

[ref44] 44. Cunningham N, Griffin JE, Wild DL. ParticleMDI: particle Monte Carlo methods for the cluster analysis of multiple datasets with applications to cancer subtype identification. Adv Data Anal Classif. 2020;14(2).

[ref45] 45. Caron F, Davy M, Duflos E, Vanheeghe P. Particle filtering for multisensor data fusion with switching observation models: Application to land vehicle positioning. IEEE Transactions on Signal Processing. 2007;55(6 I).

[ref46] 46. Stolz U, Stadler T, Müller NF, Vaughan TG. Joint Inference of Migration and Reassortment Patterns for Viruses with Segmented Genomes. Mol Biol Evol [Internet]. 2022 Jan 7 [cited 2024 Apr 16];39(1). Available from: https://dx.doi.org/10.1093/molbev/msab342 pmid:34893876

[ref47] 47. Andréoletti J, Zwaans A, Warnock RCM, Aguirre-Fernández G, Barido-Sottani J, Gupta A, et al. The Occurrence Birth–Death Process for Combined-Evidence Analysis in Macroevolution and Epidemiology. Syst Biol [Internet]. 2022 Oct 12 [cited 2024 Apr 16];71(6):1440–52. Available from: pmid:35608305

[ref48] 48. ReMASTER [Internet]. [cited 2023 Dec 5]. Available from: https://tgvaughan.github.io/remaster/

[ref49] 49. Vaughan TG. ReMASTER: Improved phylodynamic simulation for BEAST 2.7. bioRxiv [Internet]. 2023 Oct 10 [cited 2023 Dec 16];2023.10.09.561485. Available from: https://www.biorxiv.org/content/10.1101/2023.10.09.561485v1

[ref50] 50. Bouckaert R, Vaughan TG, Barido-Sottani J, Duchêne S, Fourment M, Gavryushkina A, et al. BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis. PLoS Comput Biol [Internet]. 2019;15(4):e1006650. Available from: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006650 pmid:30958812

[ref51] 51. Estimate Real-Time Case Counts and Time-Varying Epidemiological Parameters • EpiNow2 [Internet]. Available from: https://epiforecasts.io/EpiNow2/

[ref52] 52. Fang LQ, Yang Y, Jiang JF, Yao HW, Kargbo D, Lou Li X, et al. Transmission dynamics of Ebola virus disease and intervention effectiveness in Sierra Leone. Proc Natl Acad Sci U S A. 2016;113(16). pmid:27035948

[ref53] 53. Dellicour S, Baele G, Dudas G, Faria NR, Pybus OG, Suchard MA, et al. Phylodynamic assessment of intervention strategies for the West African Ebola virus outbreak. Nat Commun. 2018;9(1). pmid:29884821

[ref54] 54. Kahn R, Peak CM, Fernández-Gracia J, Hill A, Jambai A, Ganda L, et al. Incubation periods impact the spatial predictability of cholera and Ebola outbreaks in Sierra Leone. Proc Natl Acad Sci U S A [Internet]. 2020 Mar 3 [cited 2024 May 10];117(9):5067–73. Available from: https://www.pnas.org/doi/abs/10.1073/pnas.1913052117 pmid:32054785

[ref55] 55. Althaus CL. Estimating the Reproduction Number of Ebola Virus (EBOV) During the 2014 Outbreak in West Africa. PLoS Curr [Internet]. 2014 [cited 2024 May 10];6. Available from: /pmc/articles/PMC4169395/ pmid:25642364

[ref56] 56. Sierra Leone to Impose 3-Day Ebola Quarantine—The New York Times [Internet]. [cited 2024 May 10]. Available from: https://www.nytimes.com/2014/09/07/world/africa/sierra-leone-to-impose-widespread-ebola-quarantine.html

[ref57] 57. Koutsouris DD, Pitoglou S, Anastasiou A, Koumpouros Y. A Method of Estimating Time-to-Recovery for a Disease Caused by a Contagious Pathogen Such as SARS-CoV-2 Using a Time Series of Aggregated Case Reports. Healthcare [Internet]. 2023 Mar 1 [cited 2024 Apr 16];11(5). Available from: /pmc/articles/PMC10001208/ pmid:36900738

[ref58] 58. Hakki S, Zhou J, Jonnerby J, Singanayagam A, Barnett JL, Madon KJ, et al. Onset and window of SARS-CoV-2 infectiousness and temporal correlation with symptom onset: a prospective, longitudinal, community cohort study. Lancet Respir Med [Internet]. 2022 Nov 1 [cited 2024 Apr 16];10(11):1061–73. Available from: http://www.thelancet.com/article/S2213260022002260/fulltext pmid:35988572

[ref59] 59. Zhao S, Lin Q, Ran J, Musa SS, Yang G, Wang W, et al. Preliminary estimation of the basic reproduction number of novel coronavirus (2019-nCoV) in China, from 2019 to 2020: A data-driven analysis in the early phase of the outbreak. International Journal of Infectious Diseases. 2020;92.

[ref60] 60. Messina JP, Brady OJ, Golding N, Kraemer MUG, Wint GRW, Ray SE, et al. The current and future global distribution and population at risk of dengue. Nat Microbiol [Internet]. 2019 Jun;4(9):1508–15. Available from: https://www.nature.com/articles/s41564-019-0476-8 pmid:31182801

[ref61] 61. de Araújo TVB, Rodrigues LC, de Alencar Ximenes RA, de Barros Miranda-Filho D, Montarroyos UR, de Melo APL, et al. Association between Zika virus infection and microcephaly in Brazil, January to May, 2016: preliminary report of a case-control study. Lancet Infect Dis. 2016;16(12). pmid:27641777

[ref62] 62. Contreras S, Villavicencio HA, Medina-Ortiz D, Saavedra CP, Olivera-Nappa Á. Real-Time Estimation of Rt for Supporting Public-Health Policies Against COVID-19. Front Public Health. 2020;8. pmid:33415091

[ref63] 63. Flaxman S, Mishra S, Gandy A, Unwin HJT, Mellan TA, Coupland H, et al. Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe. Nature. 2020;584.

[ref64] 64. Barnard RC, Davies NG, Pearson CAB, Jit M, Edmunds J. Modelling the potential consequences of the Omicron SARS-CoV-2 variant in England. CMMID Repository. Report in progress. 2021;(December).

[ref65] 65. Hadfield J, Megill C, Bell SM, Huddleston J, Potter B, Callender C, et al. NextStrain: Real-time tracking of pathogen evolution. Bioinformatics. 2018;34(23). pmid:29790939

Supporting information

S1 Text. Information on the importance sampling implementation used within EpiFusion.

S2 Text. Pseudocode for the two key EpiFusion algorithms: (1) the MCMC algorithm and (2) the particle filtering algorithm.

S3 Text. Details on the model parameterisation for the benchmarking section, where existing Rt modelling methods were used.

S1 Fig. The fit of the simulated incidence from the EpiFusion model to weekly incidence data, as explained in the Methods section.

S2 Fig.

S3 Fig. True infection trajectories, case incidence data, and phylogenetic trees for simulated outbreaks with increasing transmission noise.

S4 Fig. True infection trajectories, case incidence data, and phylogenetic trees for simulated outbreaks with increasing observation noise.

S5 Fig. Publicly available MCC tree of Ebola sequences from 2014, obtained from Dellicour et al. (53).

S6 Fig. Weekly confirmed and suspected cases of Ebola in Sierra Leone during the period of investigation, obtained from Fang et al.

S7 Fig.

S8 Fig.

S9 Fig. Rt trajectory fits for EpiFusion models on datasets with increasing observation noise.

S10 Fig. Trajectory fits for a random sample of 60 of the 500 models fitted in the Simulation Based Calibration section.

S11 Fig.

S1 Table. Summary of the 500 replicate outbreaks modelled (with varying parameters) for the Simulation Based Calibration section.

S2 Table. ReMASTER parameters for outbreak simulations for the Scenario Testing section.

S3 Table. EpiFusion model parameter priors for each model in the Scenario and Noise Testing section.

S4 Table. EpiFusion model results by parameter for each model in the Scenario Testing section.

S5 Table. Calculation methods for metrics used to assess model performance.
