
A Bayesian Ensemble Approach for Epidemiological Projections

  • Tom Lindström,

    Affiliations Department of Physics, Chemistry and Biology, Linköping University, Linköping, Sweden, Department of Biology, Colorado State University, Fort Collins, Colorado, United States of America, US National Institute of Health, Bethesda, Maryland, United States of America, University of Exeter, Exeter, United Kingdom

  • Michael Tildesley,

    Affiliations US National Institute of Health, Bethesda, Maryland, United States of America, School of Veterinary Medicine and Science, University of Nottingham, Leicestershire, United Kingdom

  • Colleen Webb

    Affiliations Department of Biology, Colorado State University, Fort Collins, Colorado, United States of America, US National Institute of Health, Bethesda, Maryland, United States of America


Mathematical models are powerful tools for epidemiology and can be used to compare control actions. However, different models and model parameterizations may provide different predictions of outcomes. In other fields of research, ensemble modeling has been used to combine multiple projections. We explore the possibility of applying such methods to epidemiology by adapting Bayesian techniques developed for climate forecasting. We exemplify the implementation with single model ensembles based on different parameterizations of the Warwick model run for the 2001 United Kingdom foot and mouth disease outbreak and compare the efficacy of different control actions. This allows us to investigate the effect that discrepancy among projections based on different modeling assumptions has on the ensemble prediction. A sensitivity analysis showed that the choice of prior can have a pronounced effect on the posterior estimates of quantities of interest, in particular for ensembles with large discrepancy among projections. However, by using a hierarchical extension of the method we show that prior sensitivity can be circumvented. We further extend the method to include a priori beliefs about different modeling assumptions and demonstrate that the effect of this can have different consequences depending on the discrepancy among projections. We propose that the method is a promising analytical tool for ensemble modeling of disease outbreaks.

Author Summary

Policy decisions in response to emergent disease outbreaks use simulation models to inform the efficiency of different control actions. However, different projections may be made, depending on the choice of models and parameterizations. Ensemble modeling offers the ability to combine multiple projections and has been used successfully within other fields of research. A central issue in ensemble modeling is how to weight the projections when they are combined. For this purpose, we here adapt and extend a weighting method used in climate forecasting such that it can be used for epidemiological considerations. We investigate how the method performs by applying it to ensembles of projections for the 2001 UK foot and mouth disease outbreak. We conclude that the method is a promising analytical tool for ensemble modeling of disease outbreaks.


Epidemiological forecasting is inherently challenging because the outcome often depends on largely unpredictable characteristics of hosts and pathogens as well as contact structure and pathways that mediate transmission [1]. Faced with such uncertainty, policy makers must still make decisions with high stakes, both in terms of health and economics. For instance, global annual malaria mortality was recently estimated at around 1.1 million [2] and to optimize control efforts, policy makers must make seasonal predictions about spatiotemporal patterns [3]. The prospect of an emergent pandemic influenza outbreak remains a global threat and emergency preparedness must evaluate the costs and benefits of control measures such as border control, closing of workplaces and/or schools as well as different vaccination strategies [4]. Livestock diseases are major concerns for both animal welfare and economics. As an example, the United Kingdom (UK) 2001 outbreak of foot and mouth disease (FMD) involved culling of approximately 7 million animals, either in an effort to control the disease or for welfare reasons, and the total cost has been estimated at £8 billion [5]. To minimize the size and duration of future outbreaks, various strategies for culling and vaccination must be compared [6–8]. As a tool to address these challenging tasks, mathematical models offer the possibility to explore different scenarios, thereby informing emergency preparedness and response to epidemics [1,9–12].

The predictive focus of epidemiological models can either be classified as forecasting or projecting [13]. Forecasting aims at estimating what will happen and can be used for example to predict seasonal peaks of outbreaks [3,14] or to identify geographical areas of particular concern [15]. Projecting, which is the main focus of this study, instead aims at comparing different scenarios and exploring what would happen under various assumptions of transmission, e.g. comparing the effectiveness of different control actions [7,16–19].

Whilst analytical models clearly provide important insight into observed dynamics and a theoretical understanding of epidemiology [20–22], there has been a shift in recent years towards stochastic simulation models for predictive purposes [1]. Typically, dynamic models are constructed and outbreaks are simulated repeatedly, thus generating predictive distributions of outcomes [1,17,18,23]. This variability in outcomes caused by the mere stochasticity of the transmission process includes one level of uncertainty, but still only relies on a single set of assumptions about the underlying disease transmission process. However, multiple assumptions can often be justified, leading to further uncertainty in the predictions. For instance, different models may have different projections because of different assumptions about transmission or because they incorporate different levels of detail. It may also be informative to explore different projections in terms of different parameterizations of a single model, for example corresponding to worst or best case scenarios. Faced with a set of projections, an important issue is how to combine these in a manner such that they can be used to inform policy.

The issue of multiple projections is not unique to the field of epidemiology, and various techniques of ensemble modeling have been used to merge projections based on different modeling assumptions. The key concept is that rather than relying on a single set of assumptions, a range of projections is used for predictive purposes. For instance, climate forecasting has employed ensemble techniques to account for uncertainty about initial conditions, parameter values and structure of the model design when predicting climate change [24,25]. Weather forecasting has been improved by combining the results of multiple models [26,27]. Similarly, hydrological model ensembles have been demonstrated to increase reliability of catchment forecasting [28] and have been used to assess the risk of flooding events [29]. Ensemble methods have also proven to be a powerful decision tool for medical diagnostics [30,31] and ecological considerations including management [32] and prediction of future species distribution [33].

Ensemble modeling has not yet been extensively used in epidemiology. However, a few implementations exist, commonly by feeding climate or weather ensembles into disease models. Daszak et al. [34] coupled a set of climate projections to an environmental niche model of Nipah virus to predict future range distribution of the virus under climate change. Similarly, Guis et al. [35] investigated the effect of climate change on Bluetongue emergence in Europe by simulating outbreaks under different climate change scenarios. Focusing on a shorter time scale, Thomson et al. [3] used an ensemble of seasonal forecasts to predict the spatiotemporal pattern of within seasonal variation in malaria incidence. These studies all used a single disease model projection coupled to an ensemble of climate or weather forecasts, and the use of structurally different epidemiological models is to our knowledge still rare. However, Smith et al. [36] compared different malaria vaccination strategies by implementing an ensemble approach with different alterations of a base model. Also, in order to estimate global malaria mortality, Murray et al. [2] used weighted averages of different predictive models.

Given the success of ensemble methods in other fields, we expect that epidemiological implementations will increase. For that purpose however, there is a need for methods that combine multiple projections. A central issue in ensemble modeling is how to weight different projections, and we envisage four main procedures for this. Firstly, all models can be given equal weights. For instance, the IPCC 2001 report on climate change [37] used a set of climate models and gave the range of probable scenarios by averaging over different models and uncertainty by envelopes that included all scenarios. Gårdmark et al. [32] used seven ecological models for cod stock and argued that in order to prevent underestimation of uncertainty, weighted model averages are not to be used and when communicating with policy makers, it is preferable to present all included projections as well as the underlying assumptions behind them. A similar approach was also used by Smith et al. [36], who presented the prevalence of malaria under different vaccination strategies by medians of individual models and the range of the whole ensemble.

Secondly, expert opinions can be used to weight models. To our knowledge, no ensemble study has implemented weights based exclusively on expert opinion, but Bayesian model averaging can incorporate expert opinion as a subjective prior on model probabilities [38]. This approach relies on engaging stakeholders and communicating the underlying assumptions of the projections.

Thirdly, models can be weighted by agreement with other models. This approach was implemented by Räisänen and Palmer [39], who used cross-validation to weight climate models. As a more informal approach to the use of model consensus, the third IPCC report excluded two models because these predicted much higher global warming than the rest of the ensemble, thus acting as outliers [24].

Fourthly, weights can be determined by the models’ ability to replicate data. If all models are fitted to the same data using likelihood based methods, weights can be given directly by Akaike or Bayesian Information Criterion (AIC or BIC) [40,41]. In the FMD context, this may be a suitable approach if all included models are data driven kernel models that estimate parameters from outbreak data, such as those proposed by Jewell et al. [42] or Tildesley et al. [43]. However, such weighting schemes would be infeasible when including detailed simulation models that rely on a large number of parameters determined by expert opinion or laboratory experiments, such as AusSpread [44], NAADSM [45] and InterSpread Plus [46]. We propose that the future of ensemble modeling for epidemiology will benefit from combining structurally different model types, and methods of weighting need to handle both kernel type models and detailed simulation models.
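Weighting by an information criterion is commonly done with Akaike weights, in which each model's weight is proportional to exp(−Δi/2), where Δi is the difference between its AIC and the lowest AIC in the set. The following is a minimal sketch of that calculation. The AIC values are hypothetical and are not taken from any of the FMD models cited above.

```python
import math

def akaike_weights(aic_values):
    """Convert AIC scores into normalized model weights."""
    best = min(aic_values)
    # Relative likelihood of each model compared with the best-scoring one.
    rel = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(rel)
    return [r / total for r in rel]

# Hypothetical AIC scores for three models fitted to the same data.
weights = akaike_weights([1002.3, 1000.0, 1010.5])
```

The weights sum to one and the model with the lowest AIC receives the largest weight. The same calculation applies to BIC scores.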

Thus, bias assessment is often confined to the ability of models to replicate observed summary statistics of interest, in particular when the resolution of data observation is on a coarser scale than the model prediction [47]. Such methods have been developed within the field of climate forecasting. Giorgi and Mearns [48] introduced a formal framework in which model weights were assessed based on model bias compared to observed data as well as convergence, i.e. agreement with the model consensus. Tebaldi et al. [47] extended the approach to a Bayesian framework. This approach is appealing because it provides probability distributions of quantities of interest, hence uncertainty about the projected outcomes may be provided to policy makers. As such, it would be a suitable approach also for epidemiological predictions.

However, methods developed in one field might not be directly transferable to another. Tebaldi et al. [47] point out that lack of data at fine scale resolution is a limiting factor for climate forecasting. Yet, at coarser resolution climate researchers have access to long time series of climate variables to assess model bias. Comparable data may be available for endemic diseases, such as malaria [36] or tuberculosis [49], or seasonally recurrent outbreaks, such as influenza [14] or measles [50]. However, for emerging diseases, long time series would rarely be available, making the lack of data an even bigger issue for epidemiology.

In this methodological paper we aim to explore the potential of using ensemble methods based on the approach presented by Tebaldi et al. [47] for epidemiological projections. The Tebaldi et al. methodology focuses on ensembles where projections are made with different models and our aim is to provide a corresponding framework for disease outbreak projections. To investigate the potential of the framework for epidemiology, we here use variations of a single model as a proxy for different models, thus allowing us to investigate how the methodology performs under different levels of discrepancy among projections in the ensemble. We exemplify the implementation by using the UK 2001 FMD outbreak and projections modelled by different parameterizations of the Warwick model [7,9].

In the 2001 UK FMD outbreak, livestock on all infected premises (IPs) were culled. In addition, livestock on farms that were identified to be at high risk of infection were culled as either traditional dangerous contacts (DCs) or contiguous premises (CPs). CPs were defined as “a category of dangerous contact where animals may have been exposed to infection on neighboring infected premises” [5,8]. We start by focusing on ensemble prediction of epidemic duration under the control action employed during the 2001 outbreak compared with an alternative action that excludes CP culling. We investigate sensitivity to priors and explore a hierarchical Bayesian extension of the method to circumvent potential problems with prior sensitivity. We also explore the potential of including subjective a priori trust in the different modeling assumptions and extend the methodology further to allow incorporation of multiple epidemic quantities, here exemplified by adding number of infected and culled farms to the analysis. Through a simulation study, we finally explore the capacity and limitations of the proposed ensemble method, pinpointing some important features of ensemble modeling.

Materials and Methods

We apply a terminology such that control actions refer to different strategies for disease control. In the ensemble, each action is simulated under different modeling assumptions about the underlying process, expressed as either different models or, as in the example described here, different parameterizations of the same model. We refer to the combination of control action and modeling assumption as a projection. Each projection is further simulated with several replicates, which produce different outcomes merely due to the stochasticity of simulation models. We are also interested in how discrepancy among projections influences the performance of the weighting method and refer to sets of different projections as different ensembles with small and large discrepancy. A flow chart that demonstrates the relationship between different concepts and weighting schemes is presented in Fig 1.

Fig 1. Overview of analyses and methods developments.

Panel A presents a conceptual flowchart, showing that modeling assumptions are combined with control actions to simulate projections. These are then combined with observed outbreak data in the ensemble analysis. Panel B presents the methods developments in this study, indicating that we start with the Non Hierarchical Weighting (NHW) scheme, which is most similar to the original climate application. We then extend this to the Standard Hierarchical Weighting (SHW) scheme, and subsequently to Informative Hierarchical Weighting (IHW). We also extend the analysis to consider multiple epidemic quantities. The dashed lines indicate combinations for which the method and the supplied algorithm (S1 File) can be used but are not treated explicitly in the presented examples.

The Warwick model, control actions and ensembles

We focus on projections of FMD made by the Warwick model [7,9]. This model was developed in the early stages of the 2001 FMD outbreak by Keeling and coworkers to determine the potential for disease spread and the impact of intervention strategies [9]. Here, we utilized a modified version of the model used in 2001, and we briefly describe relevant aspects of the Warwick model with regard to ensemble modeling. Full details of the model can be found in [7,9]. The rate at which an infectious farm I infects a susceptible farm J is given by

$$\mathrm{Rate}_{IJ} = S_J \, T_I \, K(d_{IJ}) \qquad (1)$$

where

$$S_J = \sum_{s \in \{\mathrm{sheep},\, \mathrm{cow}\}} S_s \, Z_{s,J}^{\,p_{s,S}} \qquad (2)$$

is the susceptibility of farm J,

$$T_I = \sum_{s \in \{\mathrm{sheep},\, \mathrm{cow}\}} T_s \, Z_{s,I}^{\,p_{s,T}} \qquad (3)$$

is the transmissibility of farm I, and $K(d_{IJ})$ is the distance-dependent transmission kernel, estimated from contact tracing [9]. In this model, $Z_{s,I}$ is the number of livestock of species s (sheep or cow) recorded as being on farm I, $S_s$ and $T_s$ measure the region- and species-specific susceptibility and transmissibility, and $d_{IJ}$ is the distance between farms I and J. The parameters $p_{s,S}$, $p_{c,S}$, $p_{s,T}$ and $p_{c,T}$ (for sheep, s, and cows, c) are power law parameters that account for a non-linear increase in susceptibility and transmissibility as animal numbers on a farm increase. Previous work has indicated that a model with power laws provides a closer fit to the 2001 data than when these powers are set to unity [43,51,52].
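The structure of the transmission rate can be sketched in code, assuming the multiplicative form in which the rate is the product of the susceptibility of farm J, the transmissibility of farm I and the kernel evaluated at their distance, with each farm-level term a sum over species of a species constant times the animal count raised to a power. The `kernel` function and all parameter values below are hypothetical placeholders: the actual kernel is estimated from contact tracing [9] and the fitted parameters are tabulated in Tildesley et al. [43].

```python
import math

SPECIES = ("sheep", "cow")

def susceptibility(animals, S, p_S):
    """Farm susceptibility: sum over species of S_s * Z_s ** p_{s,S}."""
    return sum(S[s] * animals[s] ** p_S[s] for s in SPECIES)

def transmissibility(animals, T, p_T):
    """Farm transmissibility: sum over species of T_s * Z_s ** p_{s,T}."""
    return sum(T[s] * animals[s] ** p_T[s] for s in SPECIES)

def kernel(distance_km):
    """Hypothetical distance kernel, used only to make the sketch runnable."""
    return 1.0 / (1.0 + distance_km ** 2)

def infection_rate(infectious, susceptible, params):
    """Rate at which the infectious farm infects the susceptible farm."""
    d = math.dist(infectious["xy"], susceptible["xy"])
    return (susceptibility(susceptible["animals"], params["S"], params["p_S"])
            * transmissibility(infectious["animals"], params["T"], params["p_T"])
            * kernel(d))

# Hypothetical parameter values and farms (coordinates in km).
params = {
    "S": {"sheep": 1.0, "cow": 10.0},
    "T": {"sheep": 5e-7, "cow": 8e-7},
    "p_S": {"sheep": 0.2, "cow": 0.4},
    "p_T": {"sheep": 0.5, "cow": 0.4},
}
source = {"xy": (0.0, 0.0), "animals": {"sheep": 500, "cow": 100}}
near = {"xy": (1.0, 0.0), "animals": {"sheep": 200, "cow": 50}}
far = {"xy": (10.0, 0.0), "animals": {"sheep": 200, "cow": 50}}

rate_near = infection_rate(source, near, params)
rate_far = infection_rate(source, far, params)
```

With identical herds, the nearer farm has the higher rate because only the kernel term differs between the two calls.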

This version of the model has previously been parameterized to fit to the 2001 FMD outbreak [43]. Region-specific transmissibility and susceptibility parameters (and associated power laws) capture specific epidemiological characteristics and policy measures used in the main hot spots of Cumbria, Devon and the Welsh and Scottish borders. The model is therefore fitted to five regions: Cumbria, Devon, Wales, Scotland and the rest of England (excluding Cumbria and Devon). A table listing all the parameter values used in this model is given in Tildesley et al. [43].

In order to obtain multiple modeling assumptions for ensemble modeling, we specified a different transmission rate for each modeling assumption i as (4) where k1i and k2i are constants, specific for each modeling assumption, that scale the transmissibility and the spatial kernel, respectively. Setting k1i = k2i = 1 follows the parameterization of Tildesley et al. [43], and by decreasing or increasing these constants we obtain parameterizations that correspond to best or worst case expectations about the transmissibility and spatial range of transmission. We are interested in how the level of discrepancy among modeling assumptions influences the performance of the ensemble method and we therefore created two different ensembles with different k1i and k2i, as listed in Table 1. We refer to these as the large and small discrepancy ensemble, corresponding to large and small differences, respectively, in the underlying modeling assumptions used for projections.

Table 1. Scaling constants k1i and k2i used for each modeling assumption i in the small and large discrepancy ensemble.

DCs in our model were determined based upon both prior infection by an IP and future risk of infection in the same way as in previous work [8]. CPs were defined as farms that share a common boundary and were determined on an individual basis. The model was seeded with the farms that were predicted to be infected prior to the introduction of movement restrictions on the 23rd February. For each modeling assumption i and control action, 200 replicates were simulated and each simulation progressed until the epidemic died out.

Adapting the Tebaldi et al. method for emerging diseases

To demonstrate concepts and explore the potential of using the Tebaldi et al. [47] approach for epidemiological considerations we initially focus on outbreak duration. This is often considered to be the most costly aspect of FMD outbreaks due to its implication for trade [53]. In section 2.7 we extend the methodology to multiple epidemic quantities. However, the outbreak duration example offers transparent transition from the original climate analysis of Tebaldi et al. [47] that considers the ensemble estimated difference between current and future mean temperatures. In order to introduce the framework to epidemiology, we consider the difference between the implemented and an alternative control action, attempting to show whether the inclusion of CP culling was an appropriate choice given the conditions at the start of the outbreak. As this is a post outbreak analysis, we know the final outbreak duration of the observed outbreak, but that is just a single realization and due to the stochastic nature of disease transmission, the exact outcome may be quite variable. We also have no observed outbreak under the alternative control action to compare with the implemented control. Under these conditions, the most appropriate quantities to compare are the mean duration of a large number of outbreaks under the two control actions, something that can only be achieved through epidemic modeling.

We are interested in comparing projections under the implemented control action to the observed data in order to estimate model weights. Using the Bayesian method of Tebaldi et al. [47], weights as well as statistics of the outbreak, like duration, are considered unknown random variables, and we denote the mean outbreak duration under the implemented and the alternative control action as μ and ν, respectively, corresponding to the mean current and future temperature, respectively, for the climate application. In order to fit with the normal assumptions of the method, we consider log-duration in the analysis.

Weights are expressed through precision, $\lambda = \lambda_1, \lambda_2, \ldots, \lambda_n$, with $\lambda_i$ denoting the precision of modeling assumption i. The projection specific parameter $x_i$ indicates the mean of all replicates under the implemented control action (analog of current climate) for modeling assumption i. For the UK 2001 outbreak this included culling of IPs, DCs and CPs. The corresponding projection for the alternative control action (analog of future climate), which included culling of IPs and DCs only, is denoted $y_i$. The relationship between projections and ensemble parameters is expressed as

$$x_i \sim \mathrm{Normal}\left(\mu, \lambda_i^{-1}\right) \qquad (5)$$

$$y_i \sim \mathrm{Normal}\left(\nu + \beta (x_i - \mu), (\theta \lambda_i)^{-1}\right) \qquad (6)$$

with $\mathrm{Normal}(\mu, \lambda_i^{-1})$ denoting the normal distribution with mean $\mu$ and variance $\lambda_i^{-1}$. Parameter $\theta$ is included to allow for difference in overall precision of the modeling assumptions under implemented and alternative control actions. However, since projections $x_i$ and $y_i$ are based on simulations, it is fair to assume that a modeling assumption i that has a high precision for the observed control action will also do well for the unobserved action. This is incorporated by the $\lambda_i$ term in both Eqs 5 and 6. For the same reason, we may expect that a projection with a large $x_i$ also has a large $y_i$, and thus $\beta$ is included to allow for correlation between corresponding projections for the two control actions; a projection that e.g. over-predicts duration of the outbreak for the observed control action can be expected to also over-predict the alternative control action.
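To make the roles of θ, β and λi concrete, the following sketch simulates projection pairs from the relationship described above, assuming the form used by Tebaldi et al. [47]: xi is normal around μ with variance 1/λi, and yi is normal around ν + β(xi − μ) with variance 1/(θλi). All numerical values are hypothetical log-durations and precisions chosen for illustration.

```python
import random

def simulate_projections(mu, nu, beta, theta, lambdas, rng):
    """Draw one (x_i, y_i) pair per modeling assumption."""
    pairs = []
    for lam in lambdas:
        # Implemented action: centered on mu with precision lam.
        x = rng.gauss(mu, lam ** -0.5)
        # Alternative action: shifted by beta times the deviation of x from mu,
        # with precision theta * lam.
        y = rng.gauss(nu + beta * (x - mu), (theta * lam) ** -0.5)
        pairs.append((x, y))
    return pairs

rng = random.Random(1)
pairs = simulate_projections(mu=5.4, nu=5.6, beta=0.8, theta=1.0,
                             lambdas=[1e6, 4.0, 0.25], rng=rng)
```

The first modeling assumption has a very high precision, so its pair lies close to (μ, ν), whereas the low-precision assumptions can deviate substantially. With β > 0, an assumption that over-predicts under the implemented action tends to over-predict under the alternative action as well.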

The analysis of Tebaldi et al. [47] also assessed bias of projections by their ability to reproduce observed current climate by incorporating the relationship between observed current climate $x_0$, an unobserved true mean climate variable ($\mu$) and the precision of natural variability $\tau_0$ through

$$x_0 \sim \mathrm{Normal}\left(\mu, \tau_0^{-1}\right) \qquad (7)$$

In climate modeling, it is a fair assumption that τ0 is a known, fixed parameter because it can be assessed through historical data. That would rarely be the case for the corresponding epidemic considerations, at least for emerging diseases. Using a single outbreak to evaluate bias, we clearly have no way of assessing variability in outcomes. We therefore include τ0 as an unknown, random variable that is estimated in the analysis as described in the following section.

To aid the interpretation and transfer from the climate to the epidemiological interpretation, we have included Table 2 that lists the variables used in the analysis.

Table 2. Variables in the NHW analysis, with their interpretation in the original climate application and the epidemiological counterpart.

Stochasticity and variability

Our main interest in terms of outcome under the implemented control action is $\mu$ rather than $x_0$. However, it is clear that in addition to the mean duration of the outbreak, the uncertainty about the process also results in some variability in the outcomes that we need to consider. The stochastic simulations used to generate projections provide not only a mean simulated outbreak quantity, but also a range of outcomes that projects the variability. In the absence of repeated outbreaks to evaluate variability of outcomes, an obvious choice would be to use this information to inform the variability $\tau_0$. Defining the variability $\tau_i$ as the precision of projections under the implemented action for modeling assumptions i = 1,2,…,n, we include a hierarchical structure in the analysis so that for i = 0,1,2,…,n

$$\tau_i \sim \mathrm{Gamma}(a_\tau, b_\tau) \qquad (8)$$

where $\mathrm{Gamma}(a_\tau, b_\tau)$ indicates the gamma distribution with shape parameter $a_\tau$ and rate $b_\tau$, both of which are unknown parameters and are estimated in the analysis. Thus, as it would be cumbersome to elicit a fixed prior for $\tau_0$ based on our prior expectations about variability, we instead assume that $\tau_0$ comes from some unknown distribution, and make use of $\tau_1, \tau_2, \ldots, \tau_n$ to inform what this distribution should be.

Similarly, we need to model the variability of projections under the alternative control action, and denoting this $\varphi_i$ we specify, for i = 1,2,…,n,

$$\varphi_i \sim \mathrm{Gamma}(a_\varphi, b_\varphi) \qquad (9)$$

The parameters aφ and bφ are conditionally independent of all other parameters in the analysis and can be modelled separately in the Bayesian analysis. As xi, yi, τi and φi are calculated from a finite number of realizations with each modeling assumption and control action, there is some uncertainty related to this. Tebaldi et al. [47] however point out that while it is certainly possible to construct a Bayesian model that takes this uncertainty into account, the effect is minimal if the number of replicates is large. With the R = 200 replicates performed here, the uncertainty of the mean will in practice have very little effect, and we have included xi, yi, τi and φi as fixed observations.
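The fixed observations for one modeling assumption can be computed from its replicates as sketched below. This assumes that the projection mean is the mean log-duration across replicates and that the variability is the reciprocal of the sample variance of log-duration. The replicate durations are drawn from hypothetical log-normal distributions and stand in for the output of the simulation model.

```python
import math
import random
import statistics

def summarize(durations_days):
    """Return (mean, precision) of log-duration across replicates."""
    logs = [math.log(d) for d in durations_days]
    mean = statistics.fmean(logs)
    precision = 1.0 / statistics.variance(logs)
    return mean, precision

rng = random.Random(0)
R = 200  # number of replicates per projection, as in the text

# Hypothetical replicate durations (days) for one modeling assumption.
implemented = [rng.lognormvariate(5.4, 0.3) for _ in range(R)]
alternative = [rng.lognormvariate(5.6, 0.4) for _ in range(R)]

x_i, tau_i = summarize(implemented)   # implemented control action
y_i, phi_i = summarize(alternative)   # alternative control action
```

With 200 replicates the standard error of the mean log-duration is small relative to the spread among replicates, which is the rationale given above for treating these summaries as fixed.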

Priors for $a_\tau$ and $b_\tau$ were specified as gamma distributions with shapes $A_{a_\tau}$ and $A_{b_\tau}$, respectively, and rates $B_{a_\tau}$ and $B_{b_\tau}$, respectively. Similarly, the priors for $a_\varphi$ and $b_\varphi$ were specified as gamma distributions with shapes $A_{a_\varphi}$ and $A_{b_\varphi}$, respectively, and rates $B_{a_\varphi}$ and $B_{b_\varphi}$, respectively. We explored different parameter choices for the hyperpriors and found that the results were insensitive to the choice of prior for a wide range of values. In the analysis presented, we used $A_{a_\tau} = A_{b_\tau} = B_{a_\tau} = B_{b_\tau} = A_{a_\varphi} = A_{b_\varphi} = B_{a_\varphi} = B_{b_\varphi} = 0.001$. This corresponds to prior distributions with a mean of one and a variance of 1000, thus allowing for a wide range of plausible values.
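As a quick check of the stated moments, assuming the shape-rate parameterization of the gamma distribution with shape A and rate B both equal to 0.001, the mean and variance follow directly:

```latex
\[
\mathrm{E}[X] = \frac{A}{B} = \frac{0.001}{0.001} = 1,
\qquad
\mathrm{Var}[X] = \frac{A}{B^{2}} = \frac{0.001}{(0.001)^{2}} = 1000 .
\]
```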

Bayesian model

Bayesian analysis requires the specification of prior parameters. We follow Tebaldi et al. [47] with priors specified as uniform on the real line for μ, ν, and β, and λi~Gamma(aλ, bλ) for i = 1,2,…, n and θ~Gamma(aθ, bθ). We also need to specify hyperpriors for aτ and bτ, and we implement $a_\tau \sim \mathrm{Gamma}(A_{a_\tau}, B_{a_\tau})$ and $b_\tau \sim \mathrm{Gamma}(A_{b_\tau}, B_{b_\tau})$. Denoting x = x1, x2,…, xn and y = y1, y2,…, yn, the full posterior distribution under these priors is given by (10)

This posterior only differs from the one defined by Tebaldi et al. in that we include τ0 as an unknown variable and use a hierarchical structure for its prior. Using Markov Chain Monte Carlo (MCMC) techniques as described in section 2.9, we first performed the analysis with priors as specified by Tebaldi et al. [47] where applicable, i.e. aλ = bλ = aθ = bθ = 0.001, because they argue that this ensures that the prior contributes little to the posteriors.
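The following is a minimal Gibbs sampling sketch for a deliberately simplified version of the model, not the sampler used in the analysis. It assumes β = 0 and treats τ0 as a known constant (as in the original climate application), so the hierarchical prior on τ0 is omitted. Under these assumptions, all full conditionals are gamma or normal distributions. The input values are hypothetical log-durations.

```python
import random
import statistics

def gibbs_simplified(x, y, x0, tau0, a=0.001, b=0.001, c=0.001, d=0.001,
                     n_iter=3000, burn_in=500, seed=0):
    """Gibbs sampler assuming beta = 0 and a fixed, known tau0."""
    rng = random.Random(seed)
    n = len(x)
    mu, nu, theta = x0, sum(y) / n, 1.0
    draws = []
    for it in range(n_iter):
        # lambda_i | rest ~ Gamma(a + 1, rate = b + residual term).
        # random.gammavariate takes a scale, hence 1 / rate.
        lam = []
        for xi, yi in zip(x, y):
            rate = b + 0.5 * ((xi - mu) ** 2 + theta * (yi - nu) ** 2)
            lam.append(rng.gammavariate(a + 1.0, 1.0 / rate))
        sum_lam = sum(lam)
        # theta | rest ~ Gamma(c + n/2, rate = d + sum(lam_i (y_i - nu)^2) / 2)
        rate = d + 0.5 * sum(l * (yi - nu) ** 2 for l, yi in zip(lam, y))
        theta = rng.gammavariate(c + 0.5 * n, 1.0 / rate)
        # mu | rest ~ Normal(precision-weighted mean of x_i and x0)
        prec = sum_lam + tau0
        mean = (sum(l * xi for l, xi in zip(lam, x)) + tau0 * x0) / prec
        mu = rng.gauss(mean, prec ** -0.5)
        # nu | rest ~ Normal(precision-weighted mean of y_i)
        prec = theta * sum_lam
        mean = sum(l * yi for l, yi in zip(lam, y)) / sum_lam
        nu = rng.gauss(mean, prec ** -0.5)
        if it >= burn_in:
            draws.append((mu, nu))
    return draws

# Hypothetical projections (log-duration) for three modeling assumptions.
x = [5.2, 5.4, 5.7]
y = [5.5, 5.6, 6.1]
x0 = 5.45
draws = gibbs_simplified(x, y, x0, tau0=25.0)
mu_median = statistics.median(m for m, _ in draws)
nu_median = statistics.median(v for _, v in draws)
```

Medians are reported rather than means because, with only three projections and vague priors, the posterior of ν in this simplified model can be heavy tailed.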

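However, we propose that this argument is not necessarily always valid. In particular, $\lambda_i$ could be expected to be sensitive to priors because it is essentially only fitted to two data points, $x_i$ and $y_i$. Yet, based on approximations of conditional distributions, Tebaldi et al. argued that the gamma distribution with $a_\lambda = b_\lambda = 0.001$ is appropriately vague for the analysis. For transparency we here follow their approach and investigate the effect of the prior for the simplified model where $\beta = 0$. The mean of the conditional distribution of $\lambda_i$ is then approximated by

$$\mathrm{E}[\lambda_i \mid \cdot\,] \approx \frac{a_\lambda + 1}{b_\lambda + \frac{1}{2}\left[(x_i - \tilde{\mu})^2 + \theta (y_i - \tilde{\nu})^2\right]} \qquad (11)$$

where $\tilde{\mu}$ is the conditional mean of the distribution of $\mu$, given by

$$\tilde{\mu} = \frac{\sum_{i=1}^{n} \lambda_i x_i + \tau_0 x_0}{\sum_{i=1}^{n} \lambda_i + \tau_0} \qquad (12)$$

and $\tilde{\nu}$ is the corresponding value for $\nu$, given by

$$\tilde{\nu} = \frac{\sum_{i=1}^{n} \lambda_i y_i}{\sum_{i=1}^{n} \lambda_i} \qquad (13)$$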

We stress that the λi need not sum to one, as might be intuitive when using weights. As given by Eqs 12 and 13, the mean of μ and ν only depends on the relative values of λi, but the absolute values influence the width of the distribution, with the variance of the conditional distributions increasing with lower absolute values of λi (Table 3).
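The distinction between relative and absolute precision can be illustrated numerically. The sketch below assumes the simplified case β = 0, in which the conditional mean of μ is the precision-weighted average of the projections and the observation, and the conditional variance of μ is the reciprocal of the summed precisions. All values are hypothetical.

```python
def conditional_means(x, y, x0, lambdas, tau0):
    """Precision-weighted conditional means of mu and nu, assuming beta = 0."""
    total = sum(lambdas)
    mu_t = (sum(l * xi for l, xi in zip(lambdas, x)) + tau0 * x0) / (total + tau0)
    nu_t = sum(l * yi for l, yi in zip(lambdas, y)) / total
    return mu_t, nu_t

def mu_conditional_variance(lambdas, tau0):
    """Conditional variance of mu, assuming beta = 0."""
    return 1.0 / (sum(lambdas) + tau0)

x = [5.2, 5.4, 5.7]
y = [5.5, 5.6, 6.1]
x0 = 5.45

# Same relative precisions, ten times larger in absolute terms.
base = conditional_means(x, y, x0, [2.0, 8.0, 1.0], tau0=4.0)
scaled = conditional_means(x, y, x0, [20.0, 80.0, 10.0], tau0=40.0)

var_base = mu_conditional_variance([2.0, 8.0, 1.0], 4.0)
var_scaled = mu_conditional_variance([20.0, 80.0, 10.0], 40.0)
```

Multiplying all precisions by ten leaves both conditional means unchanged but reduces the conditional variance of μ by a factor of ten. Note that for μ the invariance requires τ0 to be scaled together with the λi.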

Table 3. Conditional distributions used for updating via Gibbs sampling for the single quantity example.

While a low value of $a_\lambda$ certainly ensures little contribution to the numerator in Eq (11), it is less evident that a low value for $b_\lambda$ contributes little to the denominator because if $x_i \approx \tilde{\mu}$ and $y_i \approx \tilde{\nu}$, the denominator actually approaches $b_\lambda$. Hence, to ensure that a low value of $b_\lambda$ can be considered vague such that our posterior is informed primarily by $x_0$, x and y, we must conclude that $(x_i - \tilde{\mu})^2$ or $(y_i - \tilde{\nu})^2$ is clearly separated from zero. However, if $\lambda_i \gg \lambda_j$ for all $j \neq i$ and $\lambda_i \gg \tau_0$, then $\tilde{\mu} \approx x_i$ and $\tilde{\nu} \approx y_i$, and nothing in the model prevents this relationship. In fact, if we consider the gamma prior with shape and rate parameters set to 0.001, the distribution has most of its density near zero, but with a fat tail (yet exponentially bounded) that allows for high values. In the current analysis, this corresponds to the prior belief that the majority of modeling assumptions will have very low precision while a few will have very high precision. Under this prior belief, it is expected that for some modeling assumption i, $\lambda_i \gg \lambda_j$ for all $j \neq i$. In the instance where instead $\tau_0 \gg \lambda_i$ for all i, then $\tilde{\mu} \approx x_0$ and the approximation would hold, but we cannot expect that relationship.
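The argument can be checked numerically with the approximate conditional mean of λi for β = 0, taken here to be (aλ + 1) divided by bλ plus half the sum of the squared residual of xi and θ times the squared residual of yi. The sketch compares a projection that dominates the ensemble, so that its residuals are zero, with one whose residuals are clearly separated from zero. The residual values are hypothetical.

```python
def approx_lambda_mean(x_res, y_res, theta, a, b):
    """Approximate conditional mean of lambda_i, assuming beta = 0.

    x_res and y_res are the residuals x_i - mu_tilde and y_i - nu_tilde.
    """
    return (a + 1.0) / (b + 0.5 * (x_res ** 2 + theta * y_res ** 2))

priors = (0.01, 0.001, 0.0001)  # values tried for a_lambda = b_lambda

# Dominating projection: the weighted means coincide with its own values.
dominating = [approx_lambda_mean(0.0, 0.0, 1.0, p, p) for p in priors]

# Projection with residuals of 0.3 on the log scale under both actions.
separated = [approx_lambda_mean(0.3, 0.3, 1.0, p, p) for p in priors]

ratio_dominating = dominating[-1] / dominating[0]
ratio_separated = separated[-1] / separated[0]
```

For the dominating projection the approximate mean changes by roughly two orders of magnitude across the three prior settings, whereas for the projection with separated residuals it changes by about ten percent. This is the sense in which an arbitrarily small bλ cannot be assumed to be vague.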

As we cannot a priori be sure that the choice of aλ and bλ does not influence our posterior as long as they are arbitrarily small, we performed a prior sensitivity analysis and re-ran the analysis with aλ = bλ = 0.01 and aλ = bλ = 0.0001. We could expect that the sensitivity to priors depends on the difference among modeling assumptions, and we investigate this by analyzing ensembles with small and large discrepancy between assumptions in the ensemble as given by Table 1.

We refer to this as the Non Hierarchical Weighting (NHW) method.

Standard Hierarchical Weighting model

If we cannot ensure that the analysis is insensitive to the choice of prior, it implies that our prior beliefs will influence how much different projections contribute to ensemble predictions with the current method. Using prior beliefs is sometimes desirable, and in section 2.6 we consider the situation where we trust some modeling assumptions more than others. However, it would rarely be the case that we would have substantial expectations that could inform the shape, aλ, and rate, bλ, of the prior for λ.

A potential solution might be to extend the model to include hierarchical priors such that the prior for $\lambda_i$ is estimated in the model rather than a priori fixed. We make a slight change to the parameterization of the prior distribution such that

$$\lambda_i \sim \mathrm{Gamma}\left(a_\lambda, \frac{a_\lambda}{m_\lambda}\right) \qquad (14)$$

i.e. specifying the distribution by its mean $m_\lambda$ and shape $a_\lambda$, which are estimated in the model. In that way, we move our uncertainty up a level and express our beliefs about the distribution of $m_\lambda$ and $a_\lambda$, rather than $\lambda$. Using $m_\lambda$ rather than $b_\lambda$ facilitates the specification of a prior for the mean precision parameter that corresponds to the priors previously specified on individual $\lambda_i$. This parameterization further aids the use of prior beliefs about weights in section 2.6.
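The mean and shape parameterization can be sketched as follows. A gamma distribution with a given shape and mean has rate equal to shape divided by mean. Python's `random.gammavariate` takes a shape and a scale, so the scale passed to it is the reciprocal of that rate. The numerical values are hypothetical.

```python
import random

def sample_precision(mean, shape, rng):
    """Draw from a gamma distribution specified by its mean and shape."""
    rate = shape / mean
    # random.gammavariate expects (shape, scale), with scale = 1 / rate.
    return rng.gammavariate(shape, 1.0 / rate)

rng = random.Random(42)
draws = [sample_precision(mean=4.0, shape=2.0, rng=rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
```

The sample mean is close to the specified mean of 4.0 regardless of the shape, which is what allows a prior to be placed directly on the mean precision.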

While we can never be strictly uninformative in Bayesian analysis, the hierarchical prior can allow for a wide range of plausible mλ and aλ whereas the model presented in section 2.4 requires these to be specified explicitly. This also allows for the concept of “borrowing strength” [54], such that the distribution of each λi can be indirectly informed by all other precisions via the hierarchical distribution. This is often beneficial in situations where individual parameters are fitted to a small amount of data [55,56], which clearly is the case for λi here. To extend Eq (10) to a hierarchical model, we include hyperpriors such that aλ ~ Gamma(Aaλ, Baλ) (15) and mλ ~ Gamma(Amλ, Bmλ) (16).

We performed the corresponding sensitivity analysis for the hierarchical ensemble prediction by applying hyperpriors with Aaλ = Baλ = Amλ = Bmλ set to 0.01, 0.001 and 0.0001. We refer to this as hierarchical sensitivity set-up one. Secondly, we performed a sensitivity analysis, hierarchical sensitivity set-up two, where we fixed the shape parameters Aaλ = Amλ = 0.001 and only varied Baλ = Bmλ, again set to either 0.01, 0.001 or 0.0001.

We refer to this model as the Standard Hierarchical Weighting (SHW) method.

Informative Hierarchical Weighting model

Using expert opinions may substantially improve predictions [57], and there are several instances where incorporating prior beliefs that reflect the “trust” in different modeling assumptions could be useful. For instance, a policymaker might have more trust in one model type than in another, and rather than excluding the models that are considered less reliable (i.e. giving them a priori zero weight), it could be useful to include them, yet with less contribution to the ensemble.

In the case considered here, where modeling assumptions represent most likely, best and worst case in terms of parameterization, we might want to give the “most likely” modeling assumption higher weight. For the analysis with fixed aλ and bλ, described in section 2.4, we could merely elicit a different scale parameter bλ for each λi, such that modeling assumptions with high trust are given a low value. However, with the shape parameter aλ set to a low value (“vague” shape), the prior may have little effect on the posterior λi. Eliciting a high value of aλ would instead result in a posterior that merely reflects our prior beliefs, and we have no foundation on which to elicit some intermediate value.

In order to combine the hierarchical approach with informative priors, we propose a modification of the analysis presented in section 2.5, where the assumption of exchangeability is relaxed in the hierarchical structure with λi ~ Gamma(aλ, aλ/mi) (17), where mi = wimλ and wi indicates the a priori trust in modeling assumption i. With wi = kwj, the prior distribution of λi will have a mean that is k times that of λj, and from Eqs (12) and (13) the relationship also implies that before λ is estimated (i.e. before involving the data x0, x and y), the outputs of modeling assumption i will contribute k times as much to μ and v as those of assumption j.
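The proportionality between a priori trust and contribution can be illustrated with a small sketch. It assumes, as a simplification, that the center of the ensemble estimate is a precision-weighted average of the observation and the projections, and it ignores the correlation between control actions. The function names, the value of mλ and the trust values are hypothetical.

```python
def contribution_shares(tau0, precisions):
    """Share of a precision-weighted average carried by x0 and by each projection."""
    total = tau0 + sum(precisions)
    return tau0 / total, [lam / total for lam in precisions]

def weighted_center(x0, tau0, projections, precisions):
    """Precision-weighted average of the observation x0 and the projections."""
    total = tau0 + sum(precisions)
    weighted_sum = tau0 * x0 + sum(lam * x for lam, x in zip(precisions, projections))
    return weighted_sum / total

# Hypothetical prior means of the precisions: w_i * m_lambda with m_lambda = 2
m_lambda = 2.0
trust = [1.0, 0.15, 0.021]
prior_mean_precisions = [w * m_lambda for w in trust]
_, shares = contribution_shares(tau0=1.0, precisions=prior_mean_precisions)
ratio = shares[0] / shares[1]  # equals trust[0] / trust[1], whatever m_lambda is
print(shares, ratio)
```

In this simplified setting the ratio of contributions depends only on the ratio of trust values, which is why only relative trust needs to be elicited.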

To demonstrate the effect that a priori trust in different modeling assumptions can have on the posterior estimates, we consider the case where the best, most likely and worst case scenarios for each of the two varied parameters correspond to percentiles 2.5, 50 and 97.5, respectively, of a normal distribution. Given that the density at percentiles 2.5 and 97.5 is then 0.15 of that at the mode, we specify wi = 0.15 for i = 2, 3, 4 and 7, i.e. for modeling assumptions where one of the varied parameters follows the most likely scenario, whereas the other one is set to either worst or best case. By the same rationale, we specify wi = 0.021 for i = 5, 6, 8 and 9, i.e. modeling assumptions where both parameters follow either best or worst case expectations.
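The two trust values follow directly from the normal density. The sketch below reproduces them, assuming that the trust for two extreme parameters is the product of the two single-parameter values (i.e. the parameters are treated as independent); the variable names are illustrative.

```python
import math
from statistics import NormalDist

def relative_density(percentile):
    """Normal density at a percentile, relative to the density at the mode."""
    z = NormalDist().inv_cdf(percentile)
    return math.exp(-0.5 * z * z)

w_single = relative_density(0.975)  # one parameter at best or worst case
w_double = w_single ** 2            # both parameters at best or worst case (assumed independent)

# A priori trust per modeling assumption, with assumption 1 as the reference
trust = {1: 1.0}
trust.update({i: round(w_single, 2) for i in (2, 3, 4, 7)})
trust.update({i: round(w_double, 3) for i in (5, 6, 8, 9)})
print(trust)
```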

We also investigate the case where a high weight is given to a projection xi further away from the observed data x0. Modeling assumption i = 5 consistently predicted the shortest duration for all actions and ensembles. We therefore also performed the analysis with w5 = 1 and w1 = 0.021, with all other weights as above. This allows us to investigate the performance of the informative weighting method when an outlier is up-weighted.

We refer to this method as the Informative Hierarchical Weighting (IHW) method.

Multiple epidemic quantities

In the above examples, we focused on a single epidemic quantity, allowing for a transparent transition from the original work of Tebaldi et al. [47], which focused on temperature. For epidemiology, it may however be useful to consider multiple epidemic quantities. This could be done in different ways, but here we offer a straightforward multi-quantity extension of the Bayesian model for the single epidemic quantity, based on the supposition that the relative weights are equal for all quantities. As such, we implement a single weighting parameter λ, shared among all quantities. For other parameters, we use a similar notation as for the single quantity analysis, but give many of the parameters an additional index q, indicating that the parameter is quantity specific. We expand the Bayesian model by defining Eqs (18), (19) and (20), where xi,q and yi,q are the mean projections of modeling assumption i for epidemic quantity q for the implemented and alternative control action, respectively, and x0,q is the corresponding observed value. As for the single epidemic quantity example, μq and vq are the expected values of quantity q, and because we cannot expect to have the same correlation between control actions for all quantities, βq is included as unique for each q. Parameters θμ,q and θv,q scale the precision of models between actions and quantities, and the parameters of the Bayesian model are identifiable by defining θμ,1 = 1. Similarly, we specify quantity specific parameters in Eq (21).

The conditional distributions for the multi-quantity extension are provided in Table 4. We denote the total number of quantities in an analysis as Q.

Table 4. Conditional distributions used for updating via Gibbs sampling for Q epidemic quantities.

Analysis of simulated outbreaks

The above examples focus on the UK 2001 FMD outbreak and show how the introduced framework can be applied to actual outbreak data. However, a limitation of this approach is that we are confined to investigating the behavior of the ensemble methodology for that particular outbreak. To further investigate the potential and limitations of the proposed methods, we also performed analysis of simulated outbreak data. With simulated data, we have “true” estimates of μ and v, and we want to explore the ability of the ensemble to predict these under two different conditions: when the true values lie within the range of X and Y predicted by the individual models of the ensemble and when they do not. For multi-model ensembles, this corresponds to the situation where the true behavior of the outbreak is encapsulated within the range of underlying assumptions of the individual models and when it is not.

Here we explore the outcome of these conditions by first simulating outbreaks with the parameterizations of modeling assumption 1 (k1 = k2 = 1), i.e. located in the center of both the small and large discrepancy ensemble. This simulates outbreaks where the true behavior of the outbreak is encapsulated within the range of underlying assumptions of the individual projections for both ensembles. We also simulate outbreaks with a parameterization where both k1 and k2 are set to 0.9. This produces outbreaks where the true behavior is outside of the assumptions of the projections for the small discrepancy ensemble, yet inside the range of the large discrepancy ensemble.

The exact behavior of the ensemble depends on the actual realization of the individual outbreak, because the observed values x0 are different due to the stochastic disease transmission process. We therefore apply both the small and large discrepancy ensembles to ten realizations of each of the simulation parameterizations. We implement both the single and multiple epidemic quantity analysis, thus further highlighting the effect of using multiple quantities.


We use Markov Chain Monte Carlo (MCMC) techniques to obtain samples from the full posterior distribution of the proposed Bayesian models (NHW, SHW and IHW). For many parameters, the conditional distribution belongs to a standard parametric family, thus allowing for Gibbs sampling. We list these conditional distributions in Table 3 for single quantity analysis and Table 4 for multiple quantities.

We also rely on Metropolis-Hastings (M-H) updates. Because the computation used for the multi-quantity analysis is a straightforward extension of that used for the single quantity, we start by describing the update scheme for the single quantity analysis. The conditional distribution of bτ has a known form that would allow for Gibbs sampling of bτ, whereas M-H updates need to be implemented for aτ. However, we found strong correlation between the marginal posterior estimates of aτ and bτ, and mixing was improved by performing joint M-H updates of these parameters with multivariate Random Walk (RW) proposals. Mixing can be further improved by updating parameters on a transformed scale on which the distribution more closely resembles a Gaussian, and we therefore performed updates on the log-transform, i.e. based on current values of aτ and bτ, we proposed candidate parameters from MVN([log(aτ),log(bτ)],Στ). Here MVN indicates the multivariate normal distribution and Στ the covariance matrix. Candidate values are accepted with the probability given by Eq (22), which includes the Jacobian determinant of the log-transform.
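A joint random walk update on the log scale can be sketched as follows. This is a minimal illustration, not the supplied MATLAB implementation: the target density is a hypothetical stand-in for the joint conditional of (aτ, bτ), and the proposal uses independent normal steps rather than a full covariance matrix. The Jacobian of the log-transform enters the acceptance ratio as the factor (a* b*)/(a b).

```python
import math
import random

def log_target(a, b):
    """Hypothetical stand-in target: a ~ Gamma(3, rate 1) and b ~ Gamma(5, rate 2)."""
    return 2.0 * math.log(a) - a + 4.0 * math.log(b) - 2.0 * b

def log_rw_mh(n_iter, step=0.5, seed=3):
    """Joint random walk Metropolis-Hastings on (log a, log b)."""
    rng = random.Random(seed)
    a, b = 1.0, 1.0
    samples = []
    for _ in range(n_iter):
        # Symmetric proposal on the log scale (diagonal covariance assumed)
        a_new = math.exp(math.log(a) + rng.gauss(0.0, step))
        b_new = math.exp(math.log(b) + rng.gauss(0.0, step))
        # Target ratio plus the log Jacobian of the log-transform
        log_ratio = (log_target(a_new, b_new) - log_target(a, b)
                     + math.log(a_new) + math.log(b_new)
                     - math.log(a) - math.log(b))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            a, b = a_new, b_new
        samples.append((a, b))
    return samples

kept = log_rw_mh(40000)[20000:]  # discard the first half as burn-in
mean_a = sum(a for a, _ in kept) / len(kept)
mean_b = sum(b for _, b in kept) / len(kept)
print(mean_a, mean_b)
```

Omitting the Jacobian term would make the chain target a different distribution, so the sample means would no longer approach the means of the stand-in target (3 and 2.5).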

Mixing can be improved if the covariance matrix Στ is proportional to the covariance of the marginal posterior of [log(aτ),log(bτ)], here indicated as Φ. However, this is not known prior to the analysis. We therefore implement an optimized version of the Robbins-Monro search process as presented by Garthwaite et al. [58]. This estimates the covariance during the MCMC and finds the scaling parameter ρ such that Στ = ρΦ provides a chosen long term acceptance rate, here set to 0.234 based on suggestions by Roberts et al. [59]. The method has been demonstrated to be appropriate also for RW on transformed scales of the parameters [60].
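The idea of tuning the proposal scale towards a target acceptance rate can be sketched with a generic Robbins-Monro recursion. This is a simplified one-dimensional illustration with a standard normal target and an assumed step-size schedule; it is not the specific algorithm of Garthwaite et al. [58], which also estimates the covariance and uses its own step constants.

```python
import math
import random

def adapt_rw_scale(n_iter=20000, target_rate=0.234, seed=4):
    """Random walk Metropolis on a standard normal target with an adaptive proposal scale."""
    rng = random.Random(seed)
    x = 0.0
    log_scale = 0.0
    late_accepts = 0
    burn = n_iter // 2
    for i in range(1, n_iter + 1):
        proposal = x + rng.gauss(0.0, math.exp(log_scale))
        log_ratio = 0.5 * (x * x - proposal * proposal)
        accepted = rng.random() < math.exp(min(0.0, log_ratio))
        if accepted:
            x = proposal
        # Robbins-Monro step: enlarge the scale after an acceptance, shrink it after
        # a rejection, with a step size that decays with the iteration number
        log_scale += ((1.0 if accepted else 0.0) - target_rate) / i ** 0.6
        if i > burn:
            late_accepts += 1 if accepted else 0
    return math.exp(log_scale), late_accepts / (n_iter - burn)

scale, late_rate = adapt_rw_scale()
print(scale, late_rate)
```

Because the adjustment is zero on average only when the acceptance probability equals the target, the scale settles where the long term acceptance rate is close to 0.234.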

The corresponding updates of aϕ and bϕ were also performed with M-H updates: we proposed candidate parameters from MVN([log(aϕ),log(bϕ)],Σϕ) and accepted them with the probability given by Eq (23).

We used a similar approach for updates of aλ and mλ in the hierarchical methods (SHW and IHW) and proposed from MVN([log(aλ),log(mλ)],Σλ). Candidate parameters were accepted with the probability given by Eq (24), where mi = mλ for all modeling assumptions i in the SHW method and mi = wimλ in the IHW method. As above, we used the method of Garthwaite et al. [58] to determine Σλ to obtain a long term acceptance rate of 0.234.

We also found strong correlation between μ and ν. In order to improve mixing, we repeated the updates of these parameters five times for each iteration of the MCMC.

The same update scheme was used for the multi-quantity analysis, yet with separate Στ,q, Σϕ,q and Σλ,q adaptively estimated for each quantity q.

The algorithm was implemented in MATLAB (The MathWorks, Inc., Natick, Massachusetts, United States) and code is available as supplementary information (S1 File).


Single quantity analysis

We start by presenting the results for the single quantity analysis, highlighting the behavior of the method for the NHW, SHW and IHW schemes. Fig 2, panels A and B show the estimates of outbreak duration for the two control actions for the large discrepancy ensemble using the NHW method and reveal rather large prior sensitivity. Note that we plot marginal posteriors of M = e^μ and N = e^v, respectively. As such, the posteriors represent the geometric mean outbreak duration. The corresponding arithmetic mean can be calculated as e^(μ + 1/(2τ0)) and e^(v + 1/(2ϕ0)), respectively, yet we use the geometric mean as it more clearly shows the relationship with individual projections, here presented by Xi = e^xi and Yi = e^yi, respectively. For aλ = bλ = 0.0001 (solid gray lines), the distributions are multimodal with peaks at individual model predictions, whereas a smoother shape is obtained for aλ = bλ = 0.01 (dashed black lines), and aλ = bλ = 0.001 (solid black lines) yields an intermediate result.
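The relationship between the two means can be written out in a few lines. The sketch below assumes log-scale outbreak durations that are normally distributed with mean μ and precision τ0; the numerical values are hypothetical.

```python
import math

def geometric_mean_duration(mu):
    """Geometric mean of a log-normal duration with log-scale mean mu."""
    return math.exp(mu)

def arithmetic_mean_duration(mu, tau0):
    """Arithmetic mean of a log-normal duration with log-scale mean mu and precision tau0."""
    return math.exp(mu + 1.0 / (2.0 * tau0))

# Hypothetical values: geometric mean of 200 days and log-scale precision of 4
mu, tau0 = math.log(200.0), 4.0
print(geometric_mean_duration(mu), arithmetic_mean_duration(mu, tau0))
```

The arithmetic mean always exceeds the geometric mean, and the gap shrinks as the precision τ0 increases.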

Fig 2. Ensemble prediction for the UK 2001 FMD outbreak duration.

Comparing two methods of ensemble prediction of expected outbreak duration under implemented and alternative control actions (panels A, E and B, F, respectively) as well as the corresponding duration of individual outbreaks (panels C, G and D, H, respectively). The observed duration of the UK 2001 FMD outbreak is indicated by X0 while vertical colored lines (panels A, B, E and F) and annotations X1, X2,…, X9 and Y1, Y2,…, Y9 indicate the mean outbreak duration of each projection. The total bar height (panels C, D, G, H) indicates the frequency of simulations across all projections with outbreaks ending within each 50 day interval, and colors indicate the contribution of each of the projections. Results are shown for analyses of the large discrepancy ensemble using the NHW method (panels A-D) and the SHW method (panels E-H). Marginal posterior estimates correspond to different priors, with prior parameters aλ = bλ = 0.01 (light gray), aλ = bλ = 0.001 (black) and aλ = bλ = 0.0001 (dark gray) for the NHW method (panels A-D) and Aaλ = Baλ = Amλ = Bmλ set to 0.01 (light gray), 0.001 (black) and 0.0001 (dark gray) for the SHW method (panels E-H). Note that posterior estimates are very similar for the latter, making the lines overlap.

With the SHW method, we instead obtain posteriors that are largely insensitive to the choice of hyperprior. Fig 2, panels E and F present the result of sensitivity set-up one, showing near identical posterior estimates when hyperparameters Aaλ, Baλ, Amλ and Bmλ are set to 0.01, 0.001 or 0.0001. Sensitivity set-up two produced results that were visually indistinguishable from panels E and F and are not presented. This further indicates that the hierarchical method is robust to the choice of hyperpriors.

Within epidemiology, there is clearly an interest in not just the expected outbreak duration, but also other statistics such as the probability of large outbreaks occurring. We therefore consider the posterior predictive distributions of individual outbreak durations under the two control actions. For the non-hierarchical model (Fig 2, panels C and D), there is an obvious effect of the choice of prior with higher probability of long outbreaks for lower values of aλ and bλ. For the hierarchical model (Fig 2, panels G and H), there is again little difference among posteriors corresponding to different priors.
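A posterior predictive probability of this kind can be approximated from MCMC output by simulating one outbreak per posterior draw. The sketch below assumes that the log duration of an individual outbreak is normally distributed with mean μ and precision τ0; the posterior draws used here are hypothetical placeholders rather than output of the fitted model.

```python
import math
import random

def prob_duration_exceeds(threshold_days, posterior_draws, seed=5):
    """Posterior predictive P(duration > threshold) from draws of (mu, tau0)."""
    rng = random.Random(seed)
    exceed = 0
    for mu, tau0 in posterior_draws:
        # One simulated outbreak per posterior draw
        log_duration = rng.gauss(mu, 1.0 / math.sqrt(tau0))
        exceed += 1 if math.exp(log_duration) > threshold_days else 0
    return exceed / len(posterior_draws)

# Hypothetical draws: every draw has geometric mean 200 days and precision 4
draws = [(math.log(200.0), 4.0)] * 20000
p_long = prob_duration_exceeds(300.0, draws)
print(p_long)
```

With real MCMC output, the draws would vary between iterations, so the predictive probability would also reflect the uncertainty about μ and τ0.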

When evaluating the efficiency of control actions, the difference N-M is of particular interest. In the example presented here, this estimates how much longer the outbreak would have been if culling of CPs had been excluded from the control. As shown in Fig 3, the estimates are again sensitive to the choice of prior with the NHW method, yet insensitive with the SHW method. The range of the posterior under the NHW method is less sensitive to the prior for the low discrepancy ensemble (panel B) than for the large discrepancy ensemble (panel A), where higher probability of less difference is estimated with aλ = bλ = 0.01 than for aλ = bλ = 0.0001. However, the multimodal behavior of the NHW method with low values of aλ and bλ is obtained also for the low discrepancy ensemble.
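The posterior of the difference can be computed draw by draw from the joint MCMC output, which preserves the correlation between μ and v. The following sketch uses hypothetical draws and illustrative function names.

```python
import math

def duration_difference(mu_draws, v_draws):
    """Posterior draws of N - M = e^v - e^mu, paired by MCMC iteration."""
    return [math.exp(v) - math.exp(mu) for mu, v in zip(mu_draws, v_draws)]

def prob_positive(differences):
    """Share of posterior draws in which the alternative action gives longer outbreaks."""
    return sum(d > 0 for d in differences) / len(differences)

# Hypothetical paired draws of mu and v (log of duration in days)
mu_draws = [math.log(100.0), math.log(200.0)]
v_draws = [math.log(150.0), math.log(180.0)]
diffs = duration_difference(mu_draws, v_draws)
print(diffs, prob_positive(diffs))
```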

Fig 3. Ensemble predicted difference between control actions.

Posterior predictive distributions of the difference in outbreak duration between implemented and alternative control actions for the 2001 UK FMD outbreak using the NHW method (panels A, B) and the SHW method (panels C, D). Results are shown for large (panels A, C) and small (panels B, D) discrepancy among projections in the ensemble. Marginal posterior estimates correspond to different priors, with prior parameters aλ = bλ = 0.01 (light gray), aλ = bλ = 0.001 (black) and aλ = bλ = 0.0001 (dark gray) for the NHW method (panels A, B) and Aaλ = Baλ = Amλ = Bmλ set to 0.01 (light gray), 0.001 (black) and 0.0001 (dark gray) for the SHW method (panels C, D). Due to high similarity of estimates, the plots are largely overlapping for the hierarchical method.

Fig 4 demonstrates the effect that a priori beliefs about the weights have on the predicted outbreak duration under large and small discrepancy ensembles. When using a priori higher weights for the most likely scenarios (modeling assumption one; black dotted lines), the posterior estimates are shifted and become more centered on projections of that particular modeling assumption compared to the case where a priori weights are equal (black solid line). The outcome of up-weighting the outlier (modeling assumption five; solid gray lines) is however different between the two ensembles. For the small discrepancy ensemble (panels A and B), similar results are found as for the up-weighting of the most likely scenarios; posteriors are shifted towards the projection with a priori high weight. For the high discrepancy ensemble (panels C and D), the posterior estimates of outbreak duration instead become wider for both control actions, indicating larger uncertainty about the expected duration of outbreaks.

Fig 4. Ensemble prediction of expected outbreak duration with the IHW method.

Panels A and C show ensemble estimates of mean duration for the implemented control action of the 2001 UK FMD outbreak under the small and large discrepancy ensemble, respectively. Panels B and D show the corresponding estimates for an alternative control action. Marginal posterior distributions indicate the expected outbreak duration estimated by the ensemble with a priori equal weights (solid black line), up-weighting of the most likely scenario (i = 1; dashed black line) and up-weighting of the outlier (i = 5; solid gray line). Observed outbreak duration is indicated by X0 and predicted means under the implemented and alternative control action are indicated by X1, X2,…, X9 and Y1, Y2,…, Y9, respectively.

Fig 5 shows the marginal posterior estimates of individual weights λi under different levels of discrepancy among projections and different informative weighting schemes. When using a priori equal weights, there is little difference in the estimates for the small discrepancy ensemble (top left panel) whereas moderate differences are obtained for large discrepancy (bottom left panel). Note that while the error bars are overlapping, the mean estimate for the most likely scenario (modeling assumption one) is approximately 1.7 times as large as that of the outlier (modeling assumption five), meaning that the former will contribute approximately 1.7 times as much to the posterior means of μ and v as the latter (Eqs (12) and (13)).

Fig 5. Marginal posterior estimates of weighting parameters, λ.

Posterior means are indicated with circles and error bars indicate 95% central credibility intervals under a priori equal weights (left column panels), up-weighting of modeling assumption i = 1 (middle column panels) and up-weighting of assumption i = 5 (right column panels). Results are shown for small and large discrepancy ensembles in top and bottom row panels, respectively.

When giving a priori highest weight to the most likely scenario (modeling assumption one; middle column panels), the posterior estimate of λ1 is consistently shifted upwards, meaning that the most likely scenario will contribute more to the posteriors of μ and v than other projections. For the up-weighting of the outlier (projections corresponding to modeling assumption five), the same is found when there is little discrepancy among projections (top right panel). This is however not found for the high discrepancy ensemble (bottom right panel), where the main effect is that, compared to equal a priori weights (bottom left panel), the error bars are wider; this indicates larger uncertainty about weights and consequently about the contribution of individual projections to the posterior estimates of outbreak durations.

Multiple quantity analysis and simulated outbreaks

The proposed multi-quantity method can be implemented with either NHW, SHW or IHW schemes. Here we aim to illustrate the effect of using multiple quantities and focus on the SHW scheme. Fig 6 plots the marginal posterior density of mean outbreak duration under the two control actions as estimated for the multiple quantity analysis (solid) together with the corresponding estimates for the single quantity analysis (dashed). The figure illustrates how inclusion of multiple quantities in the analysis leads to tighter distributions, centered on projections for i = 1. The multi-quantity analysis produces a probability distribution of all considered quantities, and Fig 7 further illustrates how the marginal posterior densities are located above zero for all three considered quantities.

Fig 6. Ensemble prediction of outbreak duration with Q = 1 and Q = 3.

Posterior estimates of mean outbreak duration for the implemented (Panel A) and alternative (Panel B) control action, with the dashed line indicating the posteriors corresponding to the single quantity analysis and the solid line indicating the corresponding posterior when number of infected and control culls are included. The figure shows the result of the large discrepancy ensemble.

Fig 7. Posterior estimates of difference between controls.

The figure shows the marginal posterior predictive estimates of difference in outbreak duration (Panel A), number of infected farms (Panel B) and number of control culls (Panel C) between the implemented and alternative control action. The figure shows the result of the large discrepancy ensemble.

To illustrate the performance of the method under different conditions, we also analyzed simulated outbreaks. Fig 8 shows the posteriors of mean duration for outbreaks simulated with the k1 = k2 = 1 parameterization and applying the small (triangles) and large (circles) discrepancy ensembles, represented by the median values and error bars indicating the 95% central credibility interval. Note that individual realizations, indicated by stars, are expected to frequently be outside of the credibility envelopes. Error bars are inclusive of the true mean outbreak duration (dashed lines) for all ten analyzed realizations for both the implemented and alternative control actions. However, the credibility envelopes are tighter and medians closer to the true value for the multi-quantity analysis. This indicates that the ensemble prediction is improved by including multiple quantities.
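Checking whether the error bars are inclusive of the true value amounts to computing a central credibility interval per realization and counting how often it contains the truth. A minimal sketch, using simple order-statistic quantiles and hypothetical posterior draws:

```python
import math

def central_interval(draws, level=0.95):
    """Central credibility interval from posterior draws, using order statistics."""
    ordered = sorted(draws)
    n = len(ordered)
    lower = ordered[int((1.0 - level) / 2.0 * (n - 1))]        # rounded down
    upper = ordered[math.ceil((1.0 + level) / 2.0 * (n - 1))]  # rounded up
    return lower, upper

def coverage(draws_per_realization, true_value):
    """Share of realizations whose interval contains the true value."""
    hits = 0
    for draws in draws_per_realization:
        lower, upper = central_interval(draws)
        hits += 1 if lower <= true_value <= upper else 0
    return hits / len(draws_per_realization)

# Hypothetical posterior draws for two realizations, with a true value of 2
realizations = [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
print(coverage(realizations, true_value=2.0))
```

Rounding the lower index down and the upper index up makes the interval slightly conservative when the number of draws is small.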

Fig 8. Analysis of synthetic data.

Triangles and circles indicate the median posterior estimates of mean outbreak duration after analysis with the small and large discrepancy ensemble, respectively, under the implemented (Panels A and C) and alternative (Panels B and D) control action. Results are shown for ten realizations of synthetic data simulated with k1 = k2 = 1. The error bars indicate the 95% credibility interval, the dashed lines the true values and the star the individual realization (expected to frequently lie outside the predicted mean). Panels A and B show the results of single quantity analysis (only outbreak duration) and Panels C and D the corresponding results of analyses where number of infected farms and control culls are included.

When applying the analysis to outbreaks simulated with the k1 = k2 = 0.9 parameterization (Fig 9), the large discrepancy ensemble error bars are still consistently inclusive of the true value. As with Fig 8, credibility envelopes are tighter for the multi-quantity analysis. The error bars of the small discrepancy ensemble, which relies entirely on simulations with parameterizations with higher k1 and k2 than the true value, are not inclusive of the true value, indicating that the small discrepancy ensemble fails to predict the true values of the outbreak.

Fig 9. Analysis of synthetic data.

Triangles and circles indicate the median posterior estimates of mean outbreak duration after analysis with the small and large discrepancy ensemble, respectively, under the implemented (Panels A and C) and alternative (Panels B and D) control action. Results are shown for ten realizations of synthetic data simulated with k1 = k2 = 0.9. The error bars indicate the 95% credibility interval, the dashed lines the true values and the star the individual realization analyzed (expected to often lie outside the predicted mean). Panels A and B show the results of single quantity analysis (only outbreak duration) and Panels C and D the corresponding results of analyses where number of infected farms and control culls are included.


Ensemble modeling is appealing because it offers the possibility to combine multiple projections. Within weather forecasting, the approach has given more robust predictions, and we could expect that to be the case for epidemiology as well. However, there is a need for the development of methods describing how to combine several epidemiological projections. The aim of this study was to investigate the possibility of using the Bayesian framework introduced by Tebaldi et al. [47]. We find that it is a promising approach, for primarily three reasons.

Firstly, when the methodology is implemented in a hierarchical Bayesian framework, it provides an appealing interpretation of model exchangeability. Essentially, projections and their underlying modeling assumptions are treated as random draws from a population of possible projections. By estimating the hierarchical parameters aλ and mλ jointly with individual precisions (weights) λi, the characteristics of this hypothetical population are estimated. Smith et al. [61] used a similar approach for climate ensembles and pointed out that this reduces the impact of which models are included or excluded in the ensemble. That is, we should expect to get similar results when using a different set of model assumptions if they are chosen independently. We stress that this interpretation is more valid for multi-model ensembles, however. Also, the term “random draws” should not be interpreted as arbitrary. Rather, the interpretation is that models should come from a population of well-informed, reasonable models. The analysis treats the outputs of the performed simulations under different assumptions as data (Eq 1, Table 1), and as such they are used to inform the quantities of interest (μ and v). This may seem counterintuitive, yet it only serves as a formal means to combine the results of multiple projections, and by Eqs (7) and (20), these are combined with available outbreak data.

Secondly, the framework can handle several different weighting schemes simultaneously. The original methods introduced by Tebaldi et al. [47] used convergence and bias to assess weights. Here, we further extend the framework such that informative priors can be included to inform the weights, thus relaxing the supposition that all modeling assumptions are a priori exchangeable. Epidemiological predictions suffer from lack of available data to assess model bias, and we propose that expert opinions will play a larger role than in other fields of research. With the analytical tool proposed here, a policymaker can choose to include a range of projections based on different modeling assumptions, yet give them different weights, rather than including one or a few (given a weight of one) and excluding others (given a weight of zero). When using different mechanistic models, subjective trust in the different models can be incorporated by using methods of prior elicitation based on expert opinion [38]. Importantly, our methods can incorporate these subjective beliefs in the hierarchical framework, requiring only the specification of the a priori relative confidence in the underlying assumptions of the projections. Definition of an individual, fixed prior would undoubtedly be cumbersome to elicit from expert opinion; it would not be feasible to ask policymakers to define an individual gamma prior for each modeling assumption.

Here we used ensembles based on projections of the same model with different parameterizations, demonstrating the possibility to explore parameter space, yet with unequal probabilities of different parameterizations. Uncertainty about parameters will be an issue for most epidemiological models, and we propose that multi-model ensembles should incorporate projections with different models and different parameterizations. Thus, different mechanistic assumptions as well as parameter uncertainty would be incorporated in the ensemble.

Thirdly, the framework produces easily interpretable probability distributions. It is important that communication with policymakers include uncertainties about prediction rather than just the most likely outcome [16]. In the ensemble context, these uncertainties take into account different assumptions about the transmission process. Gårdmark et al. [32] suggested that uncertainty should be communicated with policymakers by presenting the full range of predicted outcomes. However, that would give equal weights to all included projections and would require that the results be communicated with a detailed description of all assumptions made, thus allowing the policymaker to decide how much to trust each modeling assumption. This would be a cumbersome task, particularly for detailed simulation models that rely on a large number of parameters. We therefore argue that it is beneficial to communicate the aggregated and weighted result as easily interpretable probability distributions. With further modifications of the methodology, we propose that the approach could also be used as a forecasting tool during an outbreak, e.g. by letting xi and yi denote current and future numbers of infected farms. In such a situation, there is a great need for rapid and clear communication of model results to aid policy decisions. The visual manner in which uncertainty is presented using probability distributions makes them easy to understand and communicate [62].

We here show that these distributions are sensitive to the choice of priors when using the NHW method (Fig 2, panels A-D, Fig 3, panels A, B). However, the impact of the prior is heavily reduced when using the hierarchical framework (Fig 2, panels E-H, Fig 3, panels C, D). Thus, our results demonstrate that the hierarchical approach is preferred for ensemble modeling and that using the non-hierarchical approach can lead to spurious conclusions. We argue that this would also be the case for other fields, such as climate ensembles, but it is likely to be a larger concern for epidemiology, where data to modify the prior are fewer. Considering Eq (11), we could ensure that bλ has little contribution to the denominator if τ0 ≫ λi for all i, ensuring that the prior has little contribution to the posterior. For climate considerations, we envisage that the precision of natural variability, τ0, would be large relative to each λi if bias is assessed by comparing model simulations to long time series of climate data. For epidemiological considerations, this would however rarely be the case. In the proposed method, we instead inform τ0 largely by the simulation outputs, letting the projections of the ensemble determine how variable outcomes are.

Climate modeling, from which the proposed method is adapted, is primarily concerned with differences between current and future mean climate variables [24]. Epidemiology is not only concerned with mean projections but also with other quantities such as the probability of very large or long outbreaks occurring. Fig 2, panels G and H illustrate the probability of a given epidemic duration occurring for a single outbreak under the two control actions with the preferred SHW method. Compared to the density obtained by merely lumping the results of all simulations, as illustrated by the colored bars, the posterior predictive distribution of the ensemble method has a lower probability of both very long and very short outbreaks. This is because projections of such outbreaks are down-weighted when their bias is assessed in the analysis; the observed outbreak duration would be unlikely under the modeling assumptions that produce these projections. Thus, ensemble methods that give equal weights to all projections can overestimate the uncertainty about outbreaks, preventing the models from informing appropriate policy decisions.

We have further extended the methodology to allow for informative priors on the weights. Compared to climate models, epidemiology often has far less available data to assess model bias. As such, expert opinion will often play a larger role within this field. Fig 4 illustrates the behavior of the ensemble prediction under such informative priors. When up-weighting projections for i = 1, which is also likely under the observed outbreak duration, the posteriors are shifted towards these projections and produce tighter distributions. This is also found when up-weighting the outlier, i = 5, in the small discrepancy ensemble (Fig 4, panels A and B), in which no projection is particularly unlikely for the observed duration. Projection x5 is however unlikely in the large discrepancy ensemble. As a result, up-weighting the underlying modeling assumptions of this projection primarily makes the distribution wider, resulting from a larger uncertainty about individual weights (Fig 5). This should be interpreted as follows: if expert opinions a priori determine that a modeling assumption that is unlikely to predict the observed data is better than other assumptions, the conclusion should be that there is less information in the ensemble as a whole. However, when expert opinions are well informed and do not contradict observed data, they can lead to more precise predictions.
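How an informative prior on a weight interacts with the data can be sketched with the standard normal-gamma conjugate update. This is a simplification under stated assumptions: the consensus value mu is treated as known, only a single projection x_i informs the weight, and the Gamma prior is parameterized by shape a and rate b. The full model in this paper estimates all quantities jointly, so the function names and numbers below are purely illustrative.

```python
def weight_posterior(a, b, x_i, mu):
    """Conjugate update for a precision weight lambda_i.

    Assumes x_i ~ Normal(mu, 1 / lambda_i) with mu known and a
    Gamma(shape=a, rate=b) prior on lambda_i. The posterior is
    Gamma(a + 1/2, b + (x_i - mu)^2 / 2).
    """
    shape = a + 0.5
    rate = b + 0.5 * (x_i - mu) ** 2
    return shape, rate

def gamma_mean(shape, rate):
    """Mean of a Gamma(shape, rate) distribution."""
    return shape / rate

# Hypothetical up-weighting prior with mean a / b = 10.
a, b = 10.0, 1.0
mu = 0.0  # assumed known consensus value (standardized units)

# A projection close to mu keeps most of its prior weight ...
near = gamma_mean(*weight_posterior(a, b, x_i=0.2, mu=mu))
# ... whereas a projection far from mu is pulled down by the data.
far = gamma_mean(*weight_posterior(a, b, x_i=3.0, mu=mu))
print(near, far)
```

In this sketch the prior and the data agree for the nearby projection and conflict for the distant one, mirroring the distinction between the small and large discrepancy ensembles.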

It should be stressed that discrepancy among projections in the ensembles should be viewed as relative to τ0, the estimated variability in outbreak duration given the initial conditions. A crucial difference between the original method applied to climate change and the epidemiological consideration presented here is that τ0 is unknown for the latter and therefore needs to be estimated. We argue that in the absence of multiple outbreaks, it is sensible to inform this by the model simulations. Stochastic simulations are often used to estimate the range of outcomes for non-ensemble projections [1,17,18,23], and we propose that when extending the use of models to the ensemble context, they can be used to estimate this feature as well. We have therefore chosen a Bayesian model structure where τ0 is informed largely by the within-projection variability, τ1, τ2,…, τn, via the Gamma(aτ, bτ) distribution in Eq (8). All projections of the implemented control action contribute equally to this distribution in the method presented here; thus, we give equal weights to all modeling assumptions in terms of informing τ0. Estimating different weights for informing τ0 based on a single outbreak, analogous to the estimation of λ, would not be feasible. However, if policymakers believe that some modeling assumptions are more reliable in terms of capturing the variability of outcomes, we envisage that the Bayesian model structure can be altered to include this. If applied to endemic disease, τ0 could be informed similarly to the natural variability of temperature in climate applications, and the algorithm we supply is set up to handle this situation. Also, data from multiple outbreaks could be used to inform τ0 when available. Yet, data quality will rarely be comparable to climate data, which highlights one of the major challenges for epidemiological modeling.
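The idea of letting within-projection precisions inform a shared Gamma distribution can be sketched with simple moment matching. This is a stand-in for illustration only: in the method presented here, aτ and bτ are estimated within the Bayesian model rather than by moment matching, and the function name and input values below are hypothetical.

```python
import numpy as np

def gamma_from_precisions(within_sd):
    """Moment-match a Gamma(shape, rate) to within-projection precisions.

    within_sd holds the standard deviation of the simulated outcome for
    each projection; the precisions are tau_i = 1 / sd_i^2. Every
    projection contributes equally, as in the text above.
    """
    tau = 1.0 / np.asarray(within_sd, dtype=float) ** 2
    mean = tau.mean()
    var = tau.var(ddof=1)  # requires at least two distinct precisions
    shape = mean ** 2 / var
    rate = mean / var
    return float(shape), float(rate)

# Hypothetical within-projection standard deviations of duration (days).
shape, rate = gamma_from_precisions([18.0, 20.0, 22.0, 25.0, 30.0])
print(shape, rate, shape / rate)  # shape / rate is the implied mean precision
```

Replacing the unweighted mean and variance with weighted versions would be one way to express a belief that some modeling assumptions capture outcome variability more reliably, although that extension is not part of the method as presented.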

We also provide a multi-quantity extension of the Bayesian ensemble framework. Fig 5 shows that when adding the number of infected and culled farms to the analysis, the marginal posteriors of outbreak duration become narrower and centered on x1 and y1, i.e. the projections based on the most likely scenario. This illustrates that predictions can be improved by incorporating multiple quantities when assessing the weights.

The main scope of this study is to introduce ensemble methods to the field of epidemiology rather than to produce inference about the 2001 FMD outbreak. However, Fig 7 illustrates the types of conclusions the method can provide. The three quantities we include in the multi-quantity analysis are all of great concern to policy makers when assessing the impact of control actions. The probability distributions represent the ensemble-predicted difference in the outcome of the outbreak if the control action had excluded culling of CPs. The distributions all have most of their density above zero, indicating that excluding culling of CPs would most likely have resulted in a prolonged and larger outbreak. We should however point out that these results are based on a single model. To make more robust predictions, we propose that the same type of analysis be made with projections of different models.
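A statement such as "most of the density lies above zero" can be quantified directly from posterior draws of the predicted difference. The sketch below assumes that such draws are available as an array; the draws used here are synthetic placeholders and do not reproduce the results in Fig 7.

```python
import numpy as np

def prob_positive(delta_samples):
    """Posterior probability that the predicted difference exceeds zero."""
    return float(np.mean(np.asarray(delta_samples, dtype=float) > 0))

# Synthetic placeholder draws of the difference in outbreak duration
# (alternative control action minus implemented control action).
rng = np.random.default_rng(1)
delta = rng.normal(loc=30.0, scale=25.0, size=10_000)
print(prob_positive(delta))
```

The same summary can be computed separately for each quantity of interest, such as duration, number of infected farms and number of culled farms.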

We also analyzed simulated data to provide a more general depiction of the performance of the method under different conditions. Fig 8 shows the result of analysis of ten simulated outbreaks with the parameterization in the center of the ensemble, i.e. k1 = k2 = 1. As this is in agreement with both the small and large discrepancy ensembles, the true values (dashed lines) consistently lie within the 95% credibility intervals. However, when using the k1 = k2 = 0.9 parameterization, the assumptions of the model used to simulate the outbreak are only covered by the large discrepancy ensemble, and consequently only the large discrepancy ensemble error bars include the true values. Noting that we primarily use the different parameterizations as a proxy for different models, this simple simulation example illustrates some obvious but essential points. Ensemble modeling should not be interpreted as a remedy for models based on poor assumptions about the modeled process. It offers the ability to combine multiple assumptions, thus integrating uncertainty with regard to this in the predictions. However, if all models are based on similar but inaccurate assumptions, ensemble modeling will not improve predictions. Intentionally making models similar to each other increases this risk and should be avoided if the models are to be used for ensemble purposes.

Accepting these limitations, we argue that the ensemble approach will be beneficial to epidemiological risk assessment because, rather than choosing a single model for the purpose, it offers the possibility to combine projections from models that make mechanistically different assumptions about the transmission process. Thus, uncertainty with regard to this is incorporated in the predictions, which is important as projections of different models have been reported to deviate [63–65]. The use of multi-model ensembles would rely on collaboration of modeling teams, as well as overcoming confidentiality constraints in accessing outbreak data and population demographics. FMD modeling is currently seeing encouraging development in that area. The Quadrilateral Epiteam [19] has compared simulation of several outbreak scenarios in a subset of the UK demographics with five different models: NAADSM [45], Netherlands CVI [66], InterSpreadPlus [46], AusSpread [44] and ExoDis [67].

This demonstrates that potential obstacles for multi-model ensembles can be overcome, and we envisage that epidemiology will see a shift towards multi-model ensembles to inform policy decisions, as has been seen in climate research [24,25] and weather forecasting [26,27]. Combining the results of multiple models, however, requires a means of weighting them. We conclude that the presented framework is a promising approach because it provides easily interpretable probability distributions of quantities of interest. It also offers an appealing interpretation of model exchangeability, while at the same time combining several different weighting schemes, including a priori beliefs when such are available.

In this study, we introduced this framework by applying it to a simple question: how would exclusion of contiguous premises culling from the control action have affected the outcome of the UK 2001 outbreak? The aim of the study has been to introduce the methodological framework to epidemiology and solve some key issues associated with this transfer, including prior sensitivity, informing weights by expert opinion, using models to inform the variability in the outcome of individual outbreaks and extension to consider multiple epidemic quantities. We have purposely chosen the simple example because it allows for a straightforward transfer from the original climate implementation, and at the same time lets us demonstrate essential concepts and the potential of the framework. Models are however used to answer a range of different questions in epidemiology, and combining multiple projections has the potential to improve the way models are used to inform policy. We argue that the framework we introduce here has great potential, and foresee that many of the questions addressed in epidemiological modeling would require further developments of the Bayesian model, structured to fit with the specific problem. To facilitate this, we have supplied the algorithm (S1 File) and hope that it will aid further development of ensemble methods for epidemiology.

Supporting Information

Author Contributions

Conceived and designed the experiments: TL MJT CW. Performed the experiments: TL MJT. Analyzed the data: TL. Contributed reagents/materials/analysis tools: TL MJT. Wrote the paper: TL MJT CW.


References

  1. Woolhouse M (2011) How to make predictions about future infectious disease risks. Philos Trans R Soc Lond B Biol Sci 366: 2045–2054. pmid:21624924
  2. Murray CJL, Rosenfeld LC, Lim SS, Andrews KG, Foreman KJ, et al. (2012) Global malaria mortality between 1980 and 2010: a systematic analysis. Lancet 379: 413–431. pmid:22305225
  3. Thomson MC, Doblas-Reyes FJ, Mason SJ, Hagedorn R, Connor SJ, et al. (2006) Malaria early warnings based on seasonal climate forecasts from multi-model ensembles. Nature 439: 576–579. pmid:16452977
  4. Ferguson NM, Cummings DAT, Fraser C, Cajka JC, Cooley PC, et al. (2006) Strategies for mitigating an influenza pandemic. Nature 442: 448–452. pmid:16642006
  5. Anderson I (2002) Foot and Mouth Disease 2001: Lessons to be Learned Inquiry Report. London, UK.
  6. Keeling MJ, Woolhouse MEJ, May RM, Davies G, Grenfell BT (2003) Modelling vaccination strategies against foot-and-mouth disease. Nature 421: 136–142. pmid:12508120
  7. Tildesley MJ, Savill NJ, Shaw DJ, Deardon R, Brooks SP, et al. (2006) Optimal reactive vaccination strategies for a foot-and-mouth outbreak in the UK. Nature 440: 83–86. pmid:16511494
  8. Tildesley MJ, Bessell PR, Keeling MJ, Woolhouse MEJ (2009) The role of pre-emptive culling in the control of foot-and-mouth disease. Proc R Soc B Biol Sci 276: 3239–3248. pmid:19570791
  9. Keeling MJ, Woolhouse MEJ, Shaw DJ, Matthews L, Chase-Topping M, et al. (2001) Dynamics of the 2001 UK foot and mouth epidemic: stochastic dispersal in a heterogeneous landscape. Science 294: 813–817. pmid:11679661
  10. Ferguson NM, Donnelly CA, Anderson R (2001) The Foot-and-Mouth Epidemic in Great Britain: Pattern of Spread and Impact of Interventions. Science 292: 1155–1160. pmid:11303090
  11. Reiner RC, Stoddard ST, Forshey BM, King AA, Ellis AM, et al. (2014) Time-varying, serotype-specific force of infection of dengue virus. Proc Natl Acad Sci U S A 111: E2694–E2702. pmid:24847073
  12. Shea K, Tildesley MJ, Runge MC, Fonnesbeck CJ, Ferrari MJ (2014) Adaptive management and the value of information: learning via intervention in epidemiology. PLoS Biol 12: e1001970. pmid:25333371
  13. Massad E, Burattini MN, Lopez LF, Coutinho FAB (2005) Forecasting versus projection models in epidemiology: the case of the SARS epidemics. Med Hypotheses 65: 17–22. pmid:15893110
  14. Shaman J, Karspeck A (2012) Forecasting seasonal outbreaks of influenza. Proc Natl Acad Sci U S A 109: 20425–20430. pmid:23184969
  15. Hufnagel L, Brockmann D, Geisel T (2004) Forecast and control of epidemics in a globalized world. Proc Natl Acad Sci U S A 101: 15124–15129. pmid:15477600
  16. Ferguson NM, Keeling MJ, Edmunds WJ, Gani R, Grenfell BT, et al. (2003) Planning for smallpox outbreaks. Nature 425: 681–685. pmid:14562094
  17. Chao DL, Halloran ME, Obenchain VJ, Longini IM (2010) FluTE, a publicly available stochastic influenza epidemic simulation model. PLoS Comput Biol 6: e1000656. pmid:20126529
  18. Buhnerkempe M, Tildesley MJ, Lindström T, Grear DA, Portacci K, et al. (2014) The Impact of Movements and Animal Density on Continental Scale Cattle Disease Outbreaks in the United States. PLoS One 9: e91724. pmid:24670977
  19. Roche SE, Garner MG, Sanson RL, Cook C, Birch C, et al. (2014) Evaluating vaccination strategies to control foot-and-mouth disease: a model comparison study. Epidemiol Infect: 1–20.
  20. Mollison D (1977) Spatial Contact Models for Ecological and Epidemic Spread. J R Stat Soc Ser B 39: 283–326.
  21. Anderson RM, May RM (1979) Population biology of infectious diseases: Part I. Nature 280: 361–367. pmid:460412
  22. May R, Anderson R (1987) Transmission dynamics of HIV infection. Nature 326: 137–142. pmid:3821890
  23. Lindström T, Stenberg Lewerin S, Wennergren U (2012) Influence on disease spread dynamics of herd characteristics in a structured livestock industry. J R Soc Interface 9: 1287–1294. pmid:22112656
  24. Tebaldi C, Knutti R (2007) The use of the multi-model ensemble in probabilistic climate projections. Philos Trans A Math Phys Eng Sci 365: 2053–2075. pmid:17569654
  25. IPCC (2013) Summary for Policymakers. In: Stocker TF, Qin D, Plattner G-K, Tignor M, Allen SK, et al., editors. The Physical Science Basis. Contribution of Working Group I to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change. Cambridge, United Kingdom and New York, NY, USA: Cambridge University Press.
  26. Palmer TN, Doblas-Reyes FJ, Hagedorn R, Alessandri A, Gualdi S, et al. (2004) Development of a European Multimodel Ensemble System for Seasonal-To-Interannual Prediction (Demeter). Bull Am Meteorol Soc 85: 853–872.
  27. Gneiting T, Raftery AE (2005) Weather forecasting with ensemble methods. Science 310: 248–249. pmid:16224011
  28. Velázquez JA, Anctil F, Perrin C (2010) Performance and reliability of multimodel hydrological ensemble simulations based on seventeen lumped models and a thousand catchments. Hydrol Earth Syst Sci 14: 2303–2317.
  29. Cloke HL, Pappenberger F (2009) Ensemble flood forecasting: A review. J Hydrol 375: 613–626.
  30. Mangiameli P, West D, Rampal R (2004) Model selection for medical diagnosis decision support systems. Decis Support Syst 36: 247–259.
  31. West D, Mangiameli P, Rampal R, West V (2005) Ensemble strategies for a medical diagnostic decision support system: A breast cancer diagnosis application. Eur J Oper Res 162: 532–551.
  32. Gårdmark A, Lindegren M, Neuenfeldt S, Blenckner T, Heikinheimo O, et al. (2013) Biological ensemble modeling to evaluate potential futures of living marine resources. Ecol Appl 23: 742–754. pmid:23865226
  33. Maiorano L, Falcucci A, Zimmermann NE, Psomas A, Pottier J, et al. (2011) The future of terrestrial mammals in the Mediterranean basin under climate change. Philos Trans R Soc Lond B Biol Sci 366: 2681–2692. pmid:21844047
  34. Daszak P, Zambrana-Torrelio C, Bogich TL, Fernandez M, Epstein JH, et al. (2013) Interdisciplinary approaches to understanding disease emergence: the past, present, and future drivers of Nipah virus emergence. Proc Natl Acad Sci U S A 110 Suppl: 3681–3688. pmid:22936052
  35. Guis H, Caminade C, Calvete C, Morse AP, Tran A, et al. (2012) Modelling the effects of past and future climate on the risk of bluetongue emergence in Europe. J R Soc Interface 9: 339–350. pmid:21697167
  36. Smith T, Ross A, Maire N, Chitnis N, Studer A, et al. (2012) Ensemble Modeling of the Likely Public Health Impact of a Pre-Erythrocytic Malaria Vaccine. PLoS Med 9: e1001157. pmid:22272189
  37. IPCC (2001) Climate Change 2001: Synthesis Report. Houghton JT, Ding Y, Griggs DJ, Noguer M, van der Linden PJ, et al., editors. Cambridge, UK: Cambridge University Press.
  38. Ye M, Pohlmann KF, Chapman JB (2008) Expert elicitation of recharge model probabilities for the Death Valley regional flow system. J Hydrol 354: 102–115.
  39. Räisänen J, Palmer TN (2001) A Probability and Decision-Model Analysis of a Multimodel Ensemble of Climate Change Simulations. J Clim 14: 3212–3226.
  40. Burnham KP, Anderson DR (2004) Multimodel Inference Understanding AIC and BIC in Model Selection. Sociol Methods Res 33: 261–304.
  41. Gibbons JM, Cox GM, Wood ATA, Craigon J, Ramsden SJ, et al. (2008) Applying Bayesian Model Averaging to mechanistic models: An example and comparison of methods. Environ Model Softw 23: 973–985.
  42. Jewell CP, Kypraios T, Christley RM, Roberts GO (2009) A novel approach to real-time risk prediction for emerging infectious diseases: a case study in Avian Influenza H5N1. Prev Vet Med 91: 19–28. pmid:19535161
  43. Tildesley MJ, Deardon R, Savill NJ, Bessell PR, Brooks SP, et al. (2008) Accuracy of models for the 2001 foot-and-mouth epidemic. Proc R Soc B Biol Sci 275: 1459–1468. pmid:18364313
  44. Garner MG, Beckett SD (2005) Modelling the spread of foot-and-mouth disease in Australia. Aust Vet J 83: 758–766. pmid:16395942
  45. Harvey N, Reeves A, Schoenbaum MA, Zagmutt-Vergara FJ, Dubé C, et al. (2007) The North American Animal Disease Spread Model: a simulation model to assist decision making in evaluating animal disease incursions. Prev Vet Med 82: 176–197. pmid:17614148
  46. Stevenson MA, Sanson RL, Stern MW, O’Leary BD, Sujau M, et al. (2013) InterSpread Plus: a spatial and stochastic simulation model of disease in animal populations. Prev Vet Med 109: 10–24. pmid:22995473
  47. Tebaldi C, Smith RL, Nychka D, Mearns LO (2005) Quantifying Uncertainty in Projections of Regional Climate Change: A Bayesian Approach to the Analysis of Multimodel Ensembles. J Clim 18: 1524–1540.
  48. Giorgi F, Mearns LO (2002) Calculation of Average, Uncertainty Range, and Reliability of Regional Climate Changes from AOGCM Simulations via the “Reliability Ensemble Averaging” (REA) Method. J Clim 15: 1141–1158.
  49. Schmitt SM, O’Brien DJ, Bruning-Fann CS, Fitzgerald SD (2002) Bovine tuberculosis in Michigan wildlife and livestock. Ann N Y Acad Sci 969: 262–268. pmid:12381603
  50. Grassly NC, Fraser C (2006) Seasonal infectious disease epidemiology. Proc Biol Sci 273: 2541–2550. pmid:16959647
  51. Diggle PJ (2006) Spatio-temporal point processes, partial likelihood, foot and mouth disease. Stat Methods Med Res 15: 325–336. pmid:16886734
  52. Deardon R, Brooks SP, Grenfell BT, Keeling MJ, Tildesley MJ, et al. (2010) Inference For Individual-Level Models Of Infectious Diseases In Large Populations. Stat Sin 20: 239–261.
  53. Mahul O, Durand B (2000) Simulated economic consequences of foot-and-mouth disease epidemics and their public control in France. Prev Vet Med 47: 23–38.
  54. Gelman A, Carlin JB, Stern HS, Rubin DB (2004) Bayesian Data Analysis. 2nd ed. Chapman & Hall/CRC.
  55. Lindström T, Grear DA, Buhnerkempe M, Webb CT, Miller RS, et al. (2013) A Bayesian approach for modeling cattle movements in the United States: scaling up a partially observed network. PLoS One 8: e53432. pmid:23308223
  56. Amiel JJ, Lindström T, Shine R (2014) Egg incubation effects generate positive correlations between size, speed and learning ability in young lizards. Anim Cogn 17: 337–347. pmid:23922118
  57. Garthwaite PH, Kadane JB, O’Hagan A (2005) Statistical Methods for Eliciting Probability Distributions. J Am Stat Assoc 100: 680–701.
  58. Garthwaite PH, Fan Y, Sisson SA (2014) Adaptive Optimal Scaling of Metropolis-Hastings Algorithms Using the Robbins-Monro Process. Commun Stat Theory Methods In press. pmid:25089070
  59. Roberts GO, Gelman A, Gilks WR (1997) Weak Convergence and Optimal Scaling of Random Walk Metropolis Algorithms. Ann Appl Probab 7: 110–120.
  60. Lindström T, Brown GP, Sisson SA, Phillips BL, Shine R (2013) Rapid shifts in dispersal behavior on an expanding range edge. Proc Natl Acad Sci 110: 13452–13456. pmid:23898175
  61. Smith RL, Tebaldi C, Nychka D, Mearns LO (2009) Bayesian Modeling of Uncertainty in Ensembles of Climate Models. J Am Stat Assoc 104: 97–116.
  62. Wade PR (2000) Bayesian methods in conservation biology. Conserv Biol 14: 1308–1316.
  63. Dubé C, Stevenson MA, Garner MG, Sanson RL, Corso BA, et al. (2007) A comparison of predictions made by three simulation models of foot-and-mouth disease. N Z Vet J 55: 280–288. pmid:18059645
  64. Gloster J, Jones A, Redington A, Burgin L, Sørensen JH, et al. (2010) Airborne spread of foot-and-mouth disease—Model intercomparison. Vet J 183: 278–286. pmid:19138867
  65. Sanson RL, Harvey N, Garner MG, Stevenson MA (2011) Foot and mouth disease model verification and “relative validation.” Rev Sci Tech L’Office Int Des Epizoot 30: 527–540. pmid:21961223
  66. Backer JA, Hagenaars TJ, Nodelijk G, van Roermund HJW (2012) Vaccination against foot-and-mouth disease I: epidemiological consequences. Prev Vet Med 107: 27–40. pmid:22749763
  67. DEFRA (2005) D5100/R3. Cost Benefit Analysis of Foot and Mouth Disease Controls.