A Bayesian Ensemble Approach for Epidemiological Projections

Mathematical models are powerful tools for epidemiology and can be used to compare control actions. However, different models and model parameterizations may provide different prediction of outcomes. In other fields of research, ensemble modeling has been used to combine multiple projections. We explore the possibility of applying such methods to epidemiology by adapting Bayesian techniques developed for climate forecasting. We exemplify the implementation with single model ensembles based on different parameterizations of the Warwick model run for the 2001 United Kingdom foot and mouth disease outbreak and compare the efficacy of different control actions. This allows us to investigate the effect that discrepancy among projections based on different modeling assumptions has on the ensemble prediction. A sensitivity analysis showed that the choice of prior can have a pronounced effect on the posterior estimates of quantities of interest, in particular for ensembles with large discrepancy among projections. However, by using a hierarchical extension of the method we show that prior sensitivity can be circumvented. We further extend the method to include a priori beliefs about different modeling assumptions and demonstrate that the effect of this can have different consequences depending on the discrepancy among projections. We propose that the method is a promising analytical tool for ensemble modeling of disease outbreaks.


Introduction
Epidemiological forecasting is inherently challenging because the outcome often depends on largely unpredictable characteristics of hosts and pathogens as well as contact structure and pathways that mediate transmission [1]. Faced with such uncertainty, policy makers must still make decisions with high stakes, both in terms of health and economics. For instance, global annual malaria mortality was recently estimated at around 1.1 million [2] and to optimize control efforts, policy makers must make seasonal predictions about spatiotemporal patterns [3]. The prospect of an emergent pandemic influenza outbreak remains a global threat and emergency preparedness must evaluate the costs and benefits of control measures such as border control, closing of workplaces and/or schools as well as different vaccination strategies [4]. Livestock diseases are major concerns for both animal welfare and economics. As an example, the United Kingdom (UK) 2001 outbreak of foot and mouth disease (FMD) involved culling of approximately 7 million animals, either in an effort to control the disease or for welfare reasons, and the total cost has been estimated at £8 billion [5]. To minimize the size and duration of future outbreaks, various strategies for culling and vaccination must be compared [6][7][8]. As a tool to address these challenging tasks, mathematical models offer the possibility to explore different scenarios, thereby informing emergency preparedness and response to epidemics [1,[9][10][11][12].
The predictive focus of epidemiological models can either be classified as forecasting or projecting [13]. Forecasting aims at estimating what will happen and can be used for example to predict seasonal peaks of outbreaks [3,14] or to identify geographical areas of particular concern [15]. Projecting, which is the main focus of this study, instead aims at comparing different scenarios and exploring what would happen under various assumptions of transmission, e.g. comparing the effectiveness of different control actions [7,[16][17][18][19].
Whilst analytical models clearly provide important insight into observed dynamics and a theoretical understanding of epidemiology [20][21][22], there has been a shift in recent years towards stochastic simulation models for predictive purposes [1]. Typically, dynamic models are constructed and outbreaks are simulated repeatedly, thus generating predictive distributions of outcomes [1,17,18,23]. This variability in outcomes caused by the mere stochasticity of the transmission process includes one level of uncertainty, but still only relies on a single set of assumptions about the underlying disease transmission process. However, multiple assumptions can often be justified, leading to further uncertainty in the predictions. For instance, different models may have different projections because of different assumptions about transmission or because they incorporate different levels of detail. It may also be informative to explore different projections in terms of different parameterizations of a single model, for example corresponding to worst or best case scenarios. Faced with a set of projections, an important issue is how to combine these in a manner such that they can be used to inform policy.
The issue of multiple projections is not unique to the field of epidemiology, and various techniques of ensemble modeling have been used to merge projections based on different modeling assumptions. The key concept is that rather than relying on a single set of assumptions, a range of projections is used for predictive purposes. For instance, climate forecasting has employed ensemble techniques to account for uncertainty about initial conditions, parameter values and structure of the model design when predicting climate change [24,25]. Weather forecasting has been improved by combining the results of multiple models [26,27]. Similarly, hydrological model ensembles have been demonstrated to increase reliability of catchment forecasting [28] and have been used to assess the risk of flooding events [29]. Ensemble methods have also proven to be a powerful decision tool for medical diagnostics [30,31] and ecological considerations including management [32] and prediction of future species distribution [33].
Ensemble modeling has not yet been extensively used in epidemiology. However a few implementations exist, commonly by feeding climate or weather ensembles into disease models. Daszak et al. [34] coupled a set of climate projections to an environmental niche model of Nipah virus to predict future range distribution of the virus under climate change. Similarly, Guis et al. [35] investigated the effect of climate change on Bluetongue emergence in Europe by simulating outbreaks under different climate change scenarios. Focusing on a shorter time scale, Thomson et al. [3] used an ensemble of seasonal forecasts to predict the spatiotemporal pattern of within seasonal variation in malaria incidence. These studies all used a single disease model projection, coupled to an ensemble of climate or weather forecasts and the use of structurally different epidemiological models are to our knowledge still rare. However, Smith et al. [36] compared different malaria vaccination strategies by implementing an ensemble approach with different alterations of a base model. Also, in order to estimate global malaria mortality, Murray et al. [2] used weighted averages of different predictive models.
Given the success of ensemble methods in other fields, we expect that epidemiological implementations will increase. For that purpose however, there is a need for methods that combine multiple projections. A central issue in ensemble modeling is how to weight different projections, and we envisage four main procedures for this. Firstly, all models can be given equal weights. For instance, the IPCC 2001 report on climate change [37] used a set of climate models and gave the range of probable scenarios by averaging over different models and uncertainty by envelopes that included all scenarios. Gårdmark et al. [32] used seven ecological models for cod stock and argued that in order to prevent underestimation of uncertainty, weighted model averages are not to be used and when communicating with policy makers, it is preferable to present all included projections as well as the underlying assumptions behind them. A similar approach was also used by Smith et al. [36], who presented the prevalence of malaria under different vaccination strategies by medians of individual models and the range of the whole ensemble.
Secondly, expert opinions can be used to weight models. To our knowledge, no ensemble study has implemented weights based exclusively on expert opinion, but Bayesian model averaging can incorporate expert opinion as a subjective prior on model probabilities [38]. This approach relies on engaging stakeholders and communicating the underlying assumptions of the projections.
Thirdly, models can be weighted by agreement with other models. This approach was implemented by Räisänen and Palmer [39], who used cross-validation to weight climate models. As a more informal approach to the use of model consensus, the third IPCC report excluded two models because these predicted much higher global warming than the rest of the ensemble, thus acting as outliers [24].
Fourthly, weights can be determined by the models' ability to replicate data. If all models are fitted to the same data using likelihood based methods, weights can be given directly by Akaike or Bayesian Information Criterion (AIC or BIC) [40,41]. In the FMD context, this may be a suitable approach if all included models are data driven kernel models that estimate parameters from outbreak data, such as those proposed by Jewell et al. [42] or Tildesley et al. [43]. However, such weighting schemes would be unfeasible when including detailed simulation models that rely on a large number of parameters, that are determined by expert opinion or lab experiment, such as AusSpread [44], NAADSM [45] and InterSpread Plus [46]. We propose that the future of ensemble modeling for epidemiology will benefit from combining structurally different model types, and methods of weighting need to handle both kernel type models as well as detailed simulation models.
Thus, bias assessment is often confined to the ability of models to replicate observed summary statistics of interest, in particular when the resolution of data observation is on a courser scale than the model prediction [47]. Such methods have been developed within the field of climate forecasting. Giorgi and Mearns [48] introduced a formal framework in which model weights were assessed based on model bias compared to observed data as well as convergence, i.e. agreement with the model consensus. Tebaldi et al. [47] extended the approach to a Bayesian framework. This approach is appealing because it provides probability distributions of quantities of interest, hence uncertainty about the projected outcomes may be provided to policy makers. As such, it would be a suitable approach also for epidemiological predictions.
However, methods developed in one field might not be directly transferable to another. Tebaldi et al. [47] points out that lack of data at fine scale resolution is a limiting factor for climate forecasting. Yet, at courser resolution climate researchers have access to long time series of climate variables to assess model bias. Comparable data may be available for endemic diseases, such as malaria [36] or tuberculosis [49], or seasonally recurrent outbreaks, such as influenza [14] or measles [50]. However, for emerging diseases, long time series would rarely be available, making the lack of data an even bigger issue for epidemiology.
In this methodological paper we aim to explore the potential of using ensemble methods based on the approach presented by Tebaldi et al. [47] for epidemiological projections. The Tebaldi et al. methodology focus on ensembles where projections are made with different models and our aim is to provide a corresponding framework for disease outbreak projections. To investigate the potential of the framework for epidemiology, we here use variations of a single model as a proxy for different models, thus allowing us to investigate how the methodology performs under different levels of discrepancy among projections in the ensemble. We exemplify the implementation by using the UK 2001 FMD outbreak and projections modelled by different parameterizations of the Warwick model [7,9].
In the 2001 UK FMD outbreak, livestock on all infected premises (IPs) were culled. In addition, livestock on farms that were identified to be at high risk of infection were culled as either traditional dangerous contacts (DCs) or contiguous premises (CPs). CPs were defined as "a category of dangerous contact where animals may have been exposed to infection on neighboring infected premises" [5,8]. We start by focusing on ensemble prediction of epidemic duration under the control action employed during the 2001 outbreak compared with an alternative action that excludes CP culling. We investigate sensitivity to priors and explore a hierarchical Bayesian extension of the method to circumvent potential problems with prior sensitivity. We also explore the potential of including subjective a priori trust in the different modeling assumptions and extend the methodology further to allow incorporation of multiple epidemic quantities, here exemplified by adding number of infected and culled farms to the analysis. Through a simulation study, we finally explore the capacity and limitations of the proposed ensemble method, pinpointing some important features of ensemble modeling

Materials and Methods
We apply a terminology such that control actions refers to different strategies for disease control. In the ensemble, each action is simulated under different modeling assumptions about the underlying process, expressed as either different models or, as in the example described here, different parameterizations of the same model. We refer to the combination of control action and modeling assumption as a projection. Each projection is further simulated with several replicates, which produce different outcomes merely due to the stochasticity of simulation models. We are also interested in how discrepancy among projections influences the performance of the weighting method and refer to sets of different projections as different ensembles with small and large discrepancy. A flow chart that demonstrates the relationship between different concepts and weighting schemes are presented in Fig 1.

The Warwick model, control actions and ensembles
We focus on projections of FMD made by the Warwick model [7,9]. This model was developed in the early stages of the 2001 FMD outbreak by Keeling and coworkers to determine the potential for disease spread and the impact of intervention strategies [9]. Here, we utilized a modified version of the model used in 2001, and we briefly describe relevant aspects of the Warwick model with regard to ensemble modeling. Full details of the model can be found in [7,9]. The rate at which an infectious farm I infects a susceptible farm J is given by: where is the susceptibility of farm J and is the transmissibility of farm I and K(d IJ ) is the distance-dependent transmission kernel, estimated from contact tracing [9]. In this model Z s, I is the number of livestock species s (sheep or cow) recorded as being on farm I, S s and T s measure the region and species-specific susceptibility and transmissibility, d IJ is the distance between farms I and J and K(d IJ ) is the distance dependent transmission kernel. The parameters, p s, S , p c, S , p s, T and p c, T, are power law parameters that account for a non-linear increase in susceptibility and transmissibility as animal numbers on a farm increase. Previous work has indicated that a model with power laws provides a closer fit to the 2001 data than when these powers are set to unity [43,51,52]. This version of the model has previously been parameterized to fit to the 2001 FMD outbreak [43]. Region-specific transmissibility and susceptibility parameters (and associated power laws) capture specific epidemiological characteristics and policy measures used in the main hot spots of Cumbria, Devon and the Welsh and Scottish borders. The model is therefore fitted to five regions: Cumbria, Devon, Wales, Scotland and the rest of England (excluding Cumbria and Devon). A table listing all the parameter values used in this model is given in Tildesley et al. [43].
In order to obtain multiple modeling assumptions for ensemble modeling, we specified different transmission rates, i as where k 1i and k 2i are constants, specific for each modeling assumption, that scale the transmissibility and the spatial kernel, respectively. k 1i = k 2i = 1, follow the parameterizations of Tildesley et al. [43] and by decreasing or increasing these constants, we obtain parameterizations that correspond to best or worst case expectations about the transmissibility and spatial range of transmission. We are interested in how the level of discrepancy among modeling assumptions influences the performance of the ensemble method and we therefore created two different ensembles with different k 1i and k 2i , as listed in Table 1. We refer to these as the large and small Overview of analyses and methods developments. Panel A presents a conceptual flowchart, showing that modeling assumptions are combined with control actions to simulate projections. These are then combined with observed outbreak data in the ensemble analysis. Panel B presents the methods developments in this study, indicating that we start with the Non Hierarchical Weighting (NHW) scheme, which is most similar to the original climate application. We then extend this to the Standard Hierarchical Weighting (SHW) scheme, and subsequently to Informative Hierarchical Weighting (IHW). We also extend the analysis to consider multiple epidemic quantities. The dashed lines indicate combinations for which the method and the supplied algorithm (S1 File) can be used but are not treated explicitly in the presented examples.
doi:10.1371/journal.pcbi.1004187.g001 discrepancy ensemble, corresponding to large and small differences, respectively, in the underlying modeling assumptions used for projections. DCs in our model were determined based upon both prior infection by an IP and future risk of infection in the same way as in previous work [8]. CPs were defined as farms that share a common boundary and were determined on an individual basis. The model was seeded with the farms that were predicted to be infected prior to the introduction of movement restrictions on the 23 rd February. For each modeling assumption i and control action, 200 replicates were simulated and each simulation progressed until the epidemic died out.

Adapting the Tebaldi et al. method for emerging diseases
To demonstrate concepts and explore the potential of using the Tebaldi et al. [47] approach for epidemiological considerations we initially focus on outbreak duration. This is often considered to be the most costly aspect of FMD outbreaks due to its implication for trade [53]. In section 2.7 we extend the methodology to multiple epidemic quantities. However, the outbreak duration example offers transparent transition from the original climate analysis of Tebaldi et al. [47] that considers the ensemble estimated difference between current and future mean temperatures. In order to introduce the framework to epidemiology, we consider the difference between the implemented and an alternative control action, attempting to show whether the inclusion of CP culling was an appropriate choice given the conditions at the start of the outbreak. As this is a post outbreak analysis, we know the final outbreak duration of the observed outbreak, but that is just a single realization and due to the stochastic nature of disease transmission, the exact outcome may be quite variable. We also have no observed outbreak under the alternative control action to compare with the implemented control. Under these conditions, the most appropriate quantities to compare are the mean duration of a large number of outbreaks under the two control actions, something that can only be achieved through epidemic modeling.
We are interested in comparing projections under the implemented control action to the observed data in order to estimate model weights. Using the Bayesian method of Tebaldi et al. [47], weights as well as statistics of the outbreak, like duration, are considered unknown random variables, and we denote the mean outbreak duration under the implemented and the alternative control action as μ and v, respectively, corresponding to the mean current and future temperature, respectively, for the climate application. In order to fit with the normal assumptions of the method, we consider log-duration in the analysis. Weights are expressed through precision, λ = λ 1 , λ 2 ,. . ., λ n , with λ i denoting the precision of modeling assumption i. The projection specific parameter x i indicates the mean of all replicates under the implemented control action (analog of current climate) for modeling assumption i. For the UK 2001 outbreak this included culling of IPs, DCs and CPs. The corresponding projection for the alternative control action (analog of future climate), that included culling of IPs and DCs is denoted y i . The relationship between projections and ensemble parameters is expressed as with Normal(μ,λ i -1 ) denoting the normal distribution with mean μ and variance λ i -1 . Parameter θ is included to allow for difference in overall precision of the modeling assumptions under implemented and alternative control actions. However, since projections x i and y i are based on simulations, it is fair to assume that modeling assumption i that has a high precision for the observed control action also will do well for the unobserved action. This is incorporated by the λ i term in both Eqs 5 and 6. For the same reason, we may expect that a projection of a large x i also corresponds to a large value for y i and thus β is included to allow for correlation between corresponding projections for the two control actions; a projection that e.g. over-predicts duration of the outbreak for the observed control action can be expected to also over-predict the alternative control action.
The analysis of Tebaldi et al. [47] also assessed bias of projections by their ability to reproduce observed current climate by incorporating the relationship between observed current climate x 0 , an unobserved true mean climate variable (μ) and the precision of natural variability τ 0 through In climate modeling, it is a fair assumption that τ 0 is a known, fixed parameter because it can be assessed through historical data. That would rarely be the case for the corresponding epidemic considerations, at least for emerging diseases. Using a single outbreak to evaluate bias, we clearly have no way of assessing variability in outcomes. We therefore include τ 0 as an unknown, random variable that is estimated in the analysis as described in the following section.
To aid the interpretation and transfer from the climate to the epidemiological interpretation, we have included Table 2 that lists the variables used in the analysis.

Stochasticity and variability
Our main interest in terms of outcome under the implemented control action is μ rather than x 0 . However, it is clear that in addition to the mean duration of the outbreak, the uncertainty about the process also results in some variability in the outcomes that we need to consider. The stochastic simulations used to generate projections provide not only a mean simulated outbreak quantity, but also a range of outcomes that projects the variability. In the absence of repeated outbreaks to evaluate variability of outcomes, an obvious choice would be to use this information to inform the variability τ 0 . Defining the variability τ i as the precision of projections under the implemented action for modeling assumptions i = 1,2,. . .,n we include a hierarchical structure in the analysis so that for i = 0,1,2,. . .,n where Gamma(a τ , b τ ) indicates the gamma distribution with shape parameter a τ and rate b τ both of which are unknown parameters and are estimated in the analysis. Thus, as it would be cumbersome to elicit a fixed prior for τ 0 based on our prior expectations about variability, we instead assume that τ 0 comes from some, unknown distribution, and make use of τ 1 , τ 2 ,. . ., τ n to inform what this distribution should be. Similarly, we need to model the variability of projections under the alternative control action, and denoting this φ i we specify The parameters a φ and b φ are conditionally independent from all other parameters in the analysis and can be modelled separately in the Bayesian analysis. As x i, y i , τ i and φ i are calculated from a finite number of realization with each modeling assumption and control action, there is some uncertainty related to this. Tebaldi et al. [47] however points out that while it is certainly possible to construct a Bayesian model that takes this uncertainty into account, the effect is minimal if the number of replicates is large. With the R = 200 replicates preformed here, the uncertainty of the mean will in practice have very little effect, and we have included x i , y i , τ i and φ i as fixed observations. Priors for a τ and b τ were specified as a gamma distribution with shapes A aτ and A bτ , respectively, and rates B aτ and B bτ , respectively. Similarly, the priors for a φ and b φ , were specified as a gamma distribution with shapes A aφ and A bφ , respectively, and rates B aφ and B bφ , respectively. We explored different parameter choices for the hyperpriors and found that the results were insensitive to the choice of prior for a wide range of values. In the analysis presented, we used A aτ = A bτ = B aτ = B bτ = A aϕ A bϕ = B aϕ = B bϕ = 0.001. This corresponds to prior distributions with a mean of one and a variance of 1000, thus allowing for a wide range of plausible values.

Bayesian model
Bayesian analysis requires the specification of prior parameters. We follow Tebaldi et al. [47] with priors specified as uniform on the real line for μ, ν, and β, and λ i~G amma(a λ , b λ ) for i = 1,2,. . ., n and θ~Gamma(a θ , b θ ). We also need to specify hyperpriors for a τ and b τ , and we Table 2. Variables in the NHW analysis with the interpretation for the original climate interpretation and the epidemiological counterpart.

Variable Climate Interpretation
Epidemiological Interpretation x i Current temperature mean for modeling assumption i.
Mean log-duration under implemented control for modeling assumption i. y i Future temperature mean for modeling assumption i.
Mean log-duration under alternative control for modeling assumption i.
x 0 Observed current temperature. Observed outbreak log-duration. implement a τ~G amma(A aτ , B aτ ) and b τ~G amma(A bτ , B bτ ). Denoting x = x 1 , x 2 ,. . ., x n and y = y 1 , y 2 ,. . ., y n , the full posterior distribution under these priors is given by This posterior only differs from the one defined by Tebaldi et al. in that we include τ 0 as an unknown variable and use a hierarchical structure for its prior. Using Markov Chain Monte Carlo (MCMC) techniques as described in 2.9, we first performed the analysis with priors as specified by Tebaldi et al. [47] where applicable, i.e. a λ = b λ = a θ = b θ = 0.001, because they argue that this ensures that the prior contributes little to the posteriors.
However, we propose that this argument is not necessarily always valid. In particular λ i could be expected to be sensitive to priors because it is essentially only fitted to two data points, x i and y i . Yet, based on approximations of conditional distributions, Tebaldi et al. argued that the gamma distribution with a λ = b λ = 0.001 is appropriately vague for the analysis. For transparency we here follow their approach and investigate the effect of the prior for the simplified model where β = 0. The mean of the conditional distribution of λ i is then approximated by wherem is the conditional mean of the distribution of μ, given bỹ andñ the corresponding value for v, given bỹ We stress that Sλ i need not sum to one, as might be intuitive when using weights. As given by Eqs 11 and 12, the mean of μ and ν only depends on the relative values of λ i , but the absolute values influence the width of the distribution, with the variance of the conditional distributions increasing with lower absolute values of λ i (Table 3).
While a low value of a λ certainly ensures little contribution to the numerator in Eq (11), it is less evident that a low value for b λ contributes little to the denominator because if x i !m and y !ñ, the denominator actually approaches b λ . Hence, to ensure that a low value of b λ can be considered vague such that our posterior is informed primarily by x 0 , x and y, we must conclude that jx i Àmj or jy i Àñj is clearly separated from zero. However, if λ i )λ j for all i 6 ¼ j and λ i )τ 0 , thenm % x i andñ % y i and nothing in the model prevents this relationship. In fact, if we consider the gamma prior with shape and scale parameters set to 0.001, the distribution has most of its density near zero, however with a fat tail (yet exponentially bounded) that allows for high values. In the current analysis, this corresponds to the prior belief that the majority of modeling assumptions will have very low precision while a few will have very high. Under this prior belief, it is expected that for some model i, λ i )λ j for all i 6 ¼ j. In the instance where instead τ 0 )λ i for all i, thenm % x 0 and the approximation would hold, but we cannot expect that relationship.
As we cannot a priori be sure that the choice of a λ and b λ does not influence our posterior as long as they are arbitrarily small, we performed a prior sensitivity analysis and re-ran the analysis with a λ = b λ = 0.01 and a λ = b λ = 0.0001. We could expect that the sensitivity to priors depends on the difference among modeling assumptions, and we investigate this by analyzing ensembles with little and large discrepancy between assumptions in the ensemble as given by Table 1.
We refer to this as the Non Hierarchical Weighting (NHW) method.

Standard Hierarchical Weighting model
If we cannot ensure that the analysis is insensitive to the choice of prior, it implies that our prior beliefs will influence how much different projections contribute to ensemble predictions with the current method. Using prior beliefs is sometimes desirable, and in section 2.6 we consider the situation where we trust some modeling assumptions more than others. However, it would rarely be the case that we would have substantial expectations that could inform the shape, a λ , and scale, b λ , of the prior for λ.
A potential solution might be to extend the model to include hierarchical priors such that the prior for λ i is estimated in the model rather than a priori fixed. We make a slight change to the parameterization of the prior distribution such that i.e. specifying the distribution by its mean m λ and shape a λ , which are estimated in the model. In that way, we move our uncertainty up a level and express our beliefs about the distribution of m λ and a λ , rather than λ. Using m λ rather than b λ facilitates the specification of a prior for the mean precision parameter that corresponds to the priors previously specified on individual λ i . This parameterization further aids the use of prior beliefs about weights in section 2.6. While we can never be strictly uninformative in Bayesian analysis, the hierarchical prior can allow for a wide range of plausible m λ and a λ whereas the model presented in section 2.4 requires these to be specified explicitly. This also allows for the concept of "borrowing strength" [54], such that the distribution of each λ i can be indirectly informed by all other precisions via Gamma a l þ 1;b li þ ðx i ÀmÞ 2 Àyðy i ÀnÀbðx i ÀmÞÞ 2 2 forb li ¼ b l for all modeling assumptions i in the NHW method,b li ¼ a l =m l for all modeling assumptions i in the SHW method andb li ¼ a l =m li in the IHW method. μ doi:10.1371/journal.pcbi.1004187.t003 the hierarchical distribution. This is often beneficial in situations where individual parameters are fitted to a small amount of data [55,56], which clearly is the case for λ i here. To extend Eq (10) to a hierarchical model, we include hyperpriors such that a l e GammaðA al ; B al Þ ð 15Þ and m l e GammaðA ml ; B ml Þ: We performed the corresponding sensitivity analysis for the hierarchical ensemble prediction by applying hyperpriors A aλ = B aλ = A mλ = B mλ set to 0.01, 0.001 and 0.0001. We refer to this as hierarchical sensitivity set-up one. Secondly, we performed a sensitivity analysis, hierarchical sensitivity set-up two, where we fixed the shape parameters A aλ = A mλ = 0.001 and only varied B aλ = B mλ , again set to either 0.01, 0.001 or 0.0001.
We refer to this as model as the Standard Hierarchical Weighting (SHW) method.

Informative Hierarchical Weighting model
Using expert opinions may substantially improve predictions [57], and there are several instances where incorporating prior beliefs that reflect the "trust" in different modeling assumptions could be useful. For instance, a policymaker might have more trust in one model type over another, and rather than excluding the models that are considered less reliable (i.e. giving them a priori zero weigh), it could be useful to include them, yet with less contribution to the ensemble. In the case considered here, where modeling assumptions represent most likely, best and worst case in terms of parameterization, we might want to give the "most likely" modeling assumption higher weight. For the analysis with fixed a λ and b λ , described in section 2.4, we could merely elicit a different scale parameter b λ for each λ i , such that modeling assumptions with high trust are given a low value. However, with the shape parameter a λ set to a low value ("vague" shape), the prior may have little effect on the posterior λ i . Eliciting a high value of a λ would instead result in a posterior that is merely the results of our prior beliefs and we have no foundation for which to elicit some intermediate value.
In order to combine the hierarchical approach with informative priors, we propose a modification of the analysis presented in section 2.5, where the assumption of exchangeability is relaxed in the hierarchical structure with l i e Gammaða l ; a l =m li Þ; wherem li ¼ w i m l and w i indicates the a priori trust in modeling assumption i. With w i = kw j , the prior distribution of λ i will have a mean that is k times that of λ j and from Eqs (12) and (13) the relationship also implies that before λ is estimated (i.e. involving the data x 0 , x and y), the outputs of modeling assumption i will contribute k times as much to μ and v as does assumption j.
To demonstrate the effect that a priori trust in different modeling assumptions can have on the posterior estimates, we consider the case where the best, most likely and worst case scenarios for each of the two varied parameters corresponds to percentile 2.5, 50 and 97.5, respectively, of a normal distribution. Given that the density at percentiles 2.5 and 97.5 then is 0.15 of that at the mode, we specify w i = 0.15 for i = 2, 3, 4 and 7, i.e. for modeling assumptions where one of the varied parameters follows the most likely scenario, whereas the other one is set to either worst or best case. With the same rationality, we specify w i = 0.021 for i = 5, 6, 8 and 9, i.e. modeling assumptions where both parameters follow either best or worst case expectations.
We also investigate the case where a high weight is given to a projection x i further away from the observed data x 0 . Consistently, modeling assumptions i = 5 predicted the shortest duration for all actions and ensembles. We therefore also performed the analysis with w 5 = 1 and w 1 = 0.021, and all other weights are as above. This allows us to investigate the performance of the informative weighting method when an outlier is up-weighted.
We refer to this method as the Informative Hierarchical Weighting (IHW) method.

Multiple epidemic quantities
In the above examples, we focused on a single epidemic quantity, allowing for transparent transition from the original Tebaldi et al. work [47] that focused on temperature. For epidemiology, it may however be useful to consider multiple epidemic quantities. This could be done in different ways, but here we offer a straightforward multi-quantity extension of the Bayesian model for the single epidemic quantity, based on the supposition that the relative weights are equal for all quantities. As such, we implement a single weighting parameter λ, shared among all quantities. For other parameters, we use a similar notation as for the single quantity analysis, but give many of the parameters an additional index q, indicating that the parameter is quantity specific. We expand the Bayesian model by defining x i;q e Normalðm q ; y m;q l À1 i Þ; ð18Þ y i;q e Normalðn q þ b q ðx i;q À m q Þ; ðy n;q l i Þ À1 Þ; ð19Þ where x i, q and y i, q are the mean projections of modeling assumption i for epidemic quantity q for the implemented and alternative control action, respectively, and x 0, q is the corresponding observed value. As for the single epidemic quantity example, μ q and V q are the expected values of quantity q, and because we cannot expect to have the same correlation between control actions for all quantities, β q is included as unique for each q. Parameters θ μ, q and θ v, q scales the precision of models between actions and quantities and the parameters of the Bayesian model are identifiable by defining θ μ, 1 = 1. Similarly, we specify quantity specific parameters t i;q e Gammaða t;q ; b t;q Þ; i ¼ 0; 1; 2; . . . ; n; The conditional distributions for the multi-quantity extension are provided in Table 4. We denote the total number of quantities in an analysis as Q.

Analysis of simulated outbreaks
The above examples focus on the UK 2001 FMD outbreak and show how the introduced framework can be applied to actual outbreak data. However, a limitation to this approach is that we are confined to investigating the behavior of the ensemble methodology for that particular outbreak. To further investigate the potential and limitations of the proposed methods, we also performed analysis of simulated outbreak data. With simulated data, we have "true" estimates of μ and v, and we want to explore the ability of the ensemble to predict these under two different conditions; when the true values lies within the range of X and Y predicted by the individual models of the ensemble and when it does not. For multi-model ensembles, this corresponds to the situation where the true behavior of the outbreak is encapsulated within the range of underlying assumptions of the individual models and when it is not.
Here we explore the outcome of these conditions by first simulating outbreaks with the parameterizations of modeling assumption 1 (k 1 = k 2 = 1), i.e. located in the center of both the small and large discrepancy ensemble. This simulates outbreaks where the true behavior of the outbreak is encapsulated within the range of underlying assumptions of the individual projections for both ensembles. We also simulate outbreaks with a parameterization where both k 1 and k 2 are set to 0.9. This produces outbreaks where the true behavior is outside of the assumptions of the projections for the small discrepancy ensemble, yet inside the range of the large discrepancy ensemble.
The exact behavior of the ensemble depends on the actual realization of the individual outbreak, because the observed values x 0 are different due to the stochastic disease transmission process. We therefore apply both the small and large discrepancy ensembles to ten realizations of each of the simulation parameterizations. We implement both the single and multiple epidemic quantity analysis, thus further highlighting the effect of using multiple quantities.

Computation
We use Markov Chain Monte Carlo (MCMC) techniques to obtain samples from the full posterior distribution of the proposed Bayesian models (NHW, SHW and IHW). For many parameters, the conditional distribution belongs to a standard parametric family, thus allowing for Gibbs sampling. We list these conditional distributions in Table 3 for single quantity analysis and Table 4 for multiple quantities.
We also rely on Metropolis-Hastings (M-H) updates, and with the computation used for multi-quantity analysis being a straightforward extension of that used for the single quantity, we start by describing the update scheme for the single quantity analysis. The conditional distribution of b τ has a known form, Pðb t j . . .Þ ¼ GammaðA bt þ ðN þ 1Þa t ; B bt þ P N i¼0 t i Þ, that Parameter Conditional distribution λ i Gamma a l þ Q;b li þ P Q q¼1 y m;q ðx i;q Àm q Þ 2 Ày n;q ðy i;q Àn q Àb q ðx i;q Àm q ÞÞ 2 2 ! forb li ¼ b l for all modeling assumptions i in the NHW method,b li ¼ a l =m l for all modeling assumptions i in the SHW method andb li ¼ a l =m li in the IHW method.
P y m;q l i x i;q À y n;q b q P l i ðy i;q À n q À b q x i;q Þ þ t 0;q x 0;q Þ v q Normalðn À ; ðy n;q P l i Þ À1 Þ with n À ¼ P l i ðy i;q Àb q ðx i;q Àm q ÞÞ would allow for Gibbs sampling of b τ , whereas M-H updates need to be implemented for a τ . We however found strong correlation between the marginal posterior estimates of a τ and b τ , and mixing was improved by performing joint M-H updates of these parameters by multivariate Random Walk (RW) proposals. Mixing can be further improved by updating parameters on a transform that resembles a Gaussian distribution, and we therefore performed updates on the log-transform, i.e. based on current values of a τ and b τ We proposed candidate parameters ½logða Ã t Þ; logðb Ã t Þ from MVN([log(a τ ),log(b τ )],S τ ). Here MVN indicates the multivariate normal distribution and S τ the covariance matrix. Candidate values are accepted with the probability min 1; indicates the Jacobian determinant of the log-transform.
Mixing can be improved if the covariance matrix S τ is proportional to the covariance of the marginal posterior of [log(a τ ),log(b τ )], here indicated as F. However, this is not known prior to the analysis. We therefore implement an optimized method of the Robbins-Monroe search process as presented by Garthwaite et al. [58]. This estimates the covariance during the MCMC and finds the scaling parameter ρ such that S τ = ρF provides a chosen long term acceptance rate, here set to 0.234 based on suggestions by Roberts et al. [59]. The method has been demonstrated to be appropriate also for RW on transformed scales of the parameters [60].
The corresponding updates of a φ and b φ were also performed with M-H updates and we proposed candidate parameters ½logða Ã φ Þ; logðb Ã φ Þ from MVN([log(a ϕ ),log(b ϕ )],S ϕ ). and accepted them with probability We used a similar approach for updates of a λ and m λ in the hierarchical methods (SHW and IHW) and proposed ½logða Ã l Þ; logðm Ã l Þ from MVN([log(a λ ),log(b λ )],S λ ). Candidate parameters were accepted with probability whereb li ¼ a l =m l for all modeling assumptions i in the SHW method andb li ¼ a l =m li witĥ m li ¼ w i m l in the IHW method. As above, we used the method of Garthwaite et al. [58] to determine S λ to obtain a long term acceptance rate of 0.234. We also found strong correlation between μ and ν. In order to improve mixing, we repeated the updates of these parameters five times for each iteration of the MCMC.
The same update scheme was used for the multi-quantity consideration, yet with a separate S τ, q , S ϕ, q and S λ, q adaptively estimated for each quantity q.
The algorithm was implemented in MATLAB (The MathWorks, Inc., Natick, Massachusetts, United States) and code is available as supplementary information (S1 File).

Single quantity analysis
We start by presenting the results for the single quantity analysis, highlighting the behavior of the method for the NHW, SHW and IHW schemes. Fig 2, panels A and B show the estimates of outbreak duration for the two control actions for the large discrepancy ensembles using the NHW method and reveals rather large prior sensitivity. Note that we plot marginal posteriors of M = e μ and N = e v , respectively. As such, the posteriors represent the geometric mean outbreak duration. The corresponding arithmetic mean can be calculated as e μ+1/(2τ 0 ) and e v+1/  0.001 or 0.0001. Sensitivity set-up two produced results that were visually indistinguishable from panels E and F and are not presented. This further indicates that the hierarchical method is robust to the choice of hyperpriors.
Within epidemiology, there is clearly an interest in not just the expected outbreak duration, but also other statistics such as the probability of large outbreaks occurring. We therefore consider the posterior predictive distributions of individual outbreak durations under the two control actions. For the non-hierarchical model (Fig 2, panels C and D), there is an obvious effect of the choice of prior with higher probability of long outbreaks for lower values of a λ and b λ . For the hierarchical model (Fig 2, panels G and H), there is again little difference among posteriors corresponding to different priors.
When evaluating the efficiency of control actions, the difference N-M is of particular interest. In the example presented here, this estimates how much longer the outbreak would have been if culling of CPs had been excluded from the control. As shown in Fig 3, the estimates are again sensitive to the choice of prior with the NHW method, yet insensitive with the SHW method. The range of the posterior under the NHW method is less sensitive to the prior for the low discrepancy ensemble (panel B) than for the large discrepancy ensemble (panel A), where higher probability of less difference is estimated with a λ = b λ = 0.01 than for a λ = b λ = 0.0001. However, the multimodal behavior of the NHW method with low values of a λ and b λ is obtained also for the low discrepancy ensemble.  a priori beliefs about the weights have on the predicted outbreak duration under large and small discrepancy ensembles. When using a priori higher weights for the most likely scenarios (modeling assumption one; black dotted lines), the posterior estimates are shifted and become more centered on projections of that particular modeling assumption compared to the case where a priori weights are equal (black solid line). The outcome of up-weighting the outlier (modeling assumption five; solid gray lines) is however different between the two ensembles. For the small discrepancy ensemble (panels A and B), similar results are found as for the up-weighting of the most likely scenarios; posteriors are shifted towards the projection with a priori high weight. For the high discrepancy ensemble (panels C and D), the posterior estimates of outbreak duration instead become wider for both control actions, indicating larger uncertainty about the expected duration of outbreaks.  (modeling assumption one) is approximately 1.7 times as large as that of the outlier(modeling assumption five), meaning that the former will contribute approximately 1.7 times as much to the posterior means of μ and v than the latter (Eqs (12) and (13)).
When giving a priori highest weight to the most likely scenario (modeling assumption one; middle column panels), the posterior estimate of λ 1 is consistently shifted upwards, meaning that the most likely scenarios (modeling assumption one) will contribute more to the posteriors of μ and ν than other projections. For the up-weighting of the outlier, projections corresponding to modeling assumption five, the same is found when there is little discrepancy among projections (lower right panel). This is however not found for the high discrepancy ensemble (top right panel), where the main effect is that compared to equal a priori weights (top left panel), the error bars are wider; this indicates larger uncertainty about weights and consequently about the contribution of individual projections to the posterior estimates of outbreak durations.

Multiple quantity analysis and simulated outbreaks
The proposed multi-quantity method can be implemented with either NHW, SHW or IHW schemes. Here we aim to illustrate the effect of using multiple quantities and focus on the SHW scheme. Fig 6 plots the marginal posterior density of mean outbreak duration under the two control actions as estimated for the multiple quantity analysis (solid) together with the corresponding estimates for the single quantity analysis (dashed). The figure illustrates how inclusion of multiple quantities in the analysis leads to tighter distributions, centered on projections for i = 1. The multi-quantity analysis produces a probability distribution of all considered quantities, and Fig 7 further illustrates how the marginal posterior densities are located above zero for all three considered quantities.
To illustrate the performance of the method under different conditions, we also analyzed simulated outbreaks. Fig 8 shows the posteriors of mean duration for outbreaks simulated with the k 1 = k 2 = 1 parameterization and applying the small (triangles) and large (circles) discrepancy ensembles, represented by the median values and error bars indicating the 95% central credibility interval. Note that individual realizations, indicated by stars, are expected to frequently be outside of the credibility envelopes. Error bars are inclusive of the true mean outbreak duration (dashed lines) for all ten analyzed realizations for both the implemented and alternative control actions. However, the credibility envelopes are tighter and medians closer to the true value for the multi-quantity analysis. This indicates that the ensemble prediction is improved by including multiple quantities.
When applying the analysis to outbreaks simulated with the k 1 = k 2 = 0.9 parameterization (Fig 9), the large discrepancy ensemble error bars are still consistently inclusive of the true value. As with Fig 8, credibility envelopes are tighter for the multi-quantity analysis. The error bars of the small discrepancy ensemble that all rely on simulations with parameterizations with higher k 1 and k 2 than the true value, are not inclusive of the true value, indicating that the small discrepancy ensemble fails in predicting the true values of the outbreak.

Discussion
Ensemble modeling is appealing because it offers the possibility to combine multiple projections. Within weather forecasting, the approach has given more robust predictions, and we could expect that to be the case for epidemiology as well. However, there is a need for the development of methods describing how to combine several epidemiological projections. The aim of this study was to investigate the possibility of using the Bayesian framework introduced by Tebaldi et al. [47]. We find that it is a promising approach, for primarily three reasons.
Firstly, when the methodology is implemented in a hierarchical Bayesian framework, it provides an appealing interpretation of model exchangeability. Essentially, projections and their underlying modeling assumptions are treated as random draws from a population of possible projections. By estimating the hierarchical parameters a λ and m λ jointly with individual precisions (weights) λ i , the characteristics of this hypothetical population are estimated. Smith et al. [61] used a similar approach for climate ensembles and pointed out that this reduces the impact of which models are included or excluded in the ensemble. That is, we should expect to get similar results when using a different set of model assumptions if they are chosen independently. We stress that this interpretation is more valid for multi-model ensembles, however. Also, the term "random draws" should not be interpreted as arbitrary. Rather, the interpretation is that models should come from a population of well-informed, reasonable models. The analysis treats the outputs of the performed simulations under different assumptions as data (Eq 1, Table 1), and as such they are used to inform the quantities of interest (μ and v). This may seem counterintuitive, yet it only serves as a formal means to combine the results of multiple projections, and by Eqs (7) and (20), these are combined with available outbreak data.
Secondly, the framework can handle several different weighting schemes simultaneously. The original methods introduced by Tebaldi et al. [47] used convergence and bias to assess weights. Here, we further extend the framework such that informative priors can be included to inform the weights, thus relaxing the supposition that all modeling assumptions are a priori exchangeable. Epidemiological predictions suffer from lack of available data to assess model bias, and we propose that expert opinions will play a larger role than in other fields of research. With the analytical tool proposed here, a policymaker can choose to include a range of projections based on different modeling assumptions, yet give them different weights, rather than including one or a few (given a weight of one) and excluding others (given a weight of zero). When using different mechanistic models, subjective trust in the different models can be incorporated by using methods of prior elicitation based on expert opinion [38]. Importantly, our methods can incorporate these subjective beliefs in the hierarchical framework, requiring only the specification of the a priori relative confidence in the underlying assumptions of the projections. Definition of an individual, fixed prior would undoubtedly be cumbersome to elicit from expert opinion; it would not be feasible to ask policymakers to define an individual gamma prior for each modeling assumption. Here we used ensembles based on projections of the same model with different parameterizations, demonstrating the possibility to explore parameter space, yet with unequal probabilities of different parameterizations. Uncertainty about parameters will be an issue for most epidemiological models, and we propose that multi-model ensembles should incorporate projections with different models and different parameterizations. Thus, different mechanistic assumptions as well as parameter uncertainty would be incorporated in the ensemble.
Thirdly, the framework produces easily interpretable probability distributions. It is important that communication with policymakers include uncertainties about prediction rather than just the most likely outcome [16]. In the ensemble context, these uncertainties take into account different assumptions about the transmission process. Gårdmark et al. [32] suggested that uncertainty should be communicated with policymakers by presenting the full range of predicted outcomes. However, that would give equal weights to all included projections and would require that the results be communicated with a detailed description of all assumptions made, thus allowing the policymaker to decide how much to trust each modeling assumption. This would be a cumbersome task, particularly for detailed simulation models that rely on a large number of parameters. We therefore argue that it is beneficial to communicate the aggregated and weighted result as easily interpretable probability distributions. With further modifications of the methodology, we propose that the approach could also be used as a forecasting tool during an outbreak, e.g. by letting x i and y i denote current and future numbers of infected farms. In such a situation, there is a great need for rapid and clear communication of model results to aid policy decisions. The visual manner in which uncertainty is presented using probability distributions makes them easy to understand and communicate [62].
We here show that these distributions are sensitive to the choice of priors when using the NHW method (Fig 2, panels A-D, Fig 3, panels A, B). However, the impact of the prior is heavily reduced when using the hierarchical framework (Fig 2, panels E-H, Fig 3, panels C, D). Thus, our results demonstrate that the hierarchical approach is preferred for ensemble modeling and using the non-hierarchical approach can lead to spurious conclusions. We argue that this would also be the case for other fields, such as climate ensembles, but it is likely to be a larger concern for epidemiology where data to modify the prior are fewer. Considering Eq (11), we could ensure that b has little contribution to the denominator if τ 0 )λ i for all i, ensuring that the prior has little contribution to the posterior. For climate considerations, we envisage that the precision of natural variability, τ 0 , would be large relative to each λ i if bias is assessed by comparing model simulations to long time series of climate data. For epidemiological considerations, this would however rarely be the case. In the proposed method, we instead inform τ 0 largely by the simulation outputs, letting the projections of the ensemble determine how variable outcomes are.
Climate modeling, from which the proposed method is adapted, is primarily concerned with differences between current and future mean climate variables [24]. Epidemiology is not only concerned with mean projections but also with other quantities such as the probability of very large or long outbreaks occurring. Fig 2, panels G and H illustrates the probability of a given epidemic duration occurring for a single outbreak under the two control actions with the preferred SHW method. Comparing the posterior predictive distribution to the density of merely lumping the results of all simulations, as illustrated by the colored bars, the posterior predictive distribution of the ensemble method has a lower probability of both very long and short outbreaks. This is because projections of such outbreaks are down-weighted when their bias is assessed in the analysis; the observed outbreak duration would be unlikely under the modeling assumptions that produce these projections. Thus, ensemble methods that give equal weights to all projections can overestimate the uncertainty about outbreaks, preventing the models from informing appropriate policy decisions.
We have further extended the methodology to allow for informative priors on the weights. Compared to climate models, epidemiology often has far less available data to assess model bias. As such, expert opinion will often play a larger role within this field. Fig 4 illustrates the behavior of the ensemble prediction under such informative priors. When up-weighting projections for i = 1, which is also likely under the observed outbreak duration, the posteriors are shifted towards these projections and produce tighter distributions. This is also found when up-weighting the outlier, i = 5, in the small discrepancy ensemble (Fig 4, panels A and B), in which no projection is particularly unlikely for the observed duration. Projection x 5 is however unlikely in the large discrepancy ensemble. As a result, the effect of up-weighting the underlying modeling assumptions of this projection primarily makes the distribution wider, resulting from a larger uncertainty about individual weights (Fig 5). This is to be interpreted such that if expert opinions a priori determine that a modeling assumption that is unlikely to predict the observed data is better than other assumptions, the conclusions should be that there is less information in the ensemble as whole. However, when expert opinions are well informed and do not contradict with observed data, they can lead to more precise predictions.
It should be stressed that discrepancy among projections in the ensembles should be viewed as relative to τ 0 , the estimated variability in outbreak duration given the initial conditions. A crucial difference between the original method applied to climate change and the epidemiological consideration presented here is that τ 0 is unknown for the latter and therefore needs to be estimated. We argue that in the absence of multiple outbreaks, it is sensible to inform this by the model simulations. Stochastic simulations are often used to estimate the range of outcomes for non-ensemble projections [1,17,18,23], and we propose that when extending the use of models to the ensemble context, they can be used to estimate this feature as well. We have therefore chosen a Bayesian model structure where τ 0 is informed largely by the within projection variability, τ 1 , τ 2 ,. . ., τ n , via the Gamma(a τ , b τ ) distribution in Eq (8). All projections of the implemented control action contribute equally to this distribution in the method presented here, thus we are giving equal weights to all modeling assumptions in terms of informing τ 0 . Estimation of different weights in terms of informing τ 0 based on a single outbreak, analogous to the estimation of λ, would not be conceivable. However, if policymakers believe that some modeling assumptions are more reliable in terms of capturing the variability of outcomes, we envisage that the Bayesian model structure can be altered to include this. If applied to endemic disease, τ 0 could be informed similarly to the natural variability of temperature in climate application, and the algorithm we supply is set up to handle this situation. Also, data from multiple outbreaks could be used to inform τ 0 when available. Yet, data quality will rarely be comparable to climate data, which highlights one of the major challenges for epidemiological modeling.
We also provide a multi-quantity extension of the Bayesian ensemble framework. Fig 5  shows that when adding number of infected and culled farms to the analysis, the marginal posteriors of outbreak duration become narrower and centered on x 1 and y 1 , i.e. the projections based on the most likely scenario. This illustrates that predictions can be improved by incorporating multiple quantities when assessing the weights.
The main scope of this study is to introduce ensemble methods to the field of epidemiology rather than to produce inference about the 2001 FMD outbreak. However, Fig 7 illustrates the types of conclusions the method can provide. The three quantities we include in the multiquantity analysis are all of great concern to policy makers when assessing the impact of control actions. The probability distributions represent the ensemble predicted difference in the outcome of the outbreak if the control action had excluded culling of CPs. The distributions all have most of the density above zeros, indicating that excluding culling of CPs would most likely have resulted in a prolonged and larger outbreak. We should however point out that these results are based on a single model. To make more robust predictions, we propose that the same type of analysis be made with projections of different models.
We also analyzed simulated data to provide a more general depiction of the performance of the method under different conditions. Fig 8 shows the result of analysis of ten simulated outbreaks with the parameterization in the center of the ensemble, i.e. k 1 = k 2 = 1. As this is in agreement with both the small and large discrepancy ensemble, the true values (dashed lines) consistently lie within the 95% credibility intervals. However, when using the k 1 = k 2 = 0.9 parameterization, the assumptions of the model used to simulate the outbreak is only inclusive of the large discrepancy ensemble, and consequently only the large discrepancy ensemble error bars are inclusive of the true values. Noting that we primarily use the different parameterization as a proxy for different models, this simple simulation example illustrates some obvious but essential points. Ensemble modeling should not be interpreted as a remedy for models based on poor assumptions about the modeled process. It offers the ability to combine multiple assumptions, thus integrating uncertainty with regards to this in the predictions. However, if all models are based on similar but inaccurate assumptions, ensemble modeling will not improve predictions. Intentionally making models similar to each other increases this risk and should be avoided if the models are to be used for ensemble purposes.
Accepting these limitations, we argue that the ensemble approach will be beneficial to epidemiological risk assessment because rather than choosing a single model for the purpose, it offers the possibility to combine projections from models that make mechanistically different assumptions about the transmission process. Thus, uncertainty with regard to this is incorporated in the predictions, which is important as projections of different models have been reported to deviate [63][64][65]. The use of multi-model ensembles would rely on collaboration of modeling teams, as well as overcoming confidentiality constraints in accessing outbreak data and population demographics. The current development in FMD modeling is seeing encouraging development in that area. The Quadrilateral Epiteam [19] has compared simulation of several outbreak scenarios in a subset of the UK demographics with five different models: NAADSM [45], Netherlands CVI [66], InterSpreadPlus [46], AusSpread [44] and ExoDis [67].
This demonstrates that potential obstacles for multi-model ensembles can be overcome and we envisage that epidemiology will see a shift towards multi-model ensembles to inform policy decisions, as has been seen in climate research [24,25] and weather forecasting [26,27]. Combining the results of multiple models however requires means of weighting these. We conclude that the presented framework is a promising approach because it provides easily interpretable probability distributions of quantities of interest. It also offers an appealing interpretation of model exchangeability, while at the same time combining several different weighting schemes, including a priori beliefs when such are available.
In this study, we introduced this framework by applying it to a simple question: how would exclusion of contiguous premises culling from the control action have affected the outcome of the UK 2001 outbreak? The aim of the study has been to introduce the methodological framework to epidemiology and solve some key issues associated with this transfer, including prior sensitivity, informing weights by expert opinion, using models to inform the variability in the outcome of individual outbreaks and extension to consider multiple epidemic quantities. We have purposely chosen the simple example because it allows for a straightforward transfer from the original climate implementation, and at the same time lets us demonstrate essential concepts and the potential of the framework. Models are however used to answer a range of different questions in epidemiology, and combining multiple projections has the potential to improve the way models are used to inform policy. We argue that the framework we introduce here has great potential, and foresee that many of the questions addressed in epidemiological modeling would require further developments of the Bayesian model, structured to fit with the specific problem. To facilitate this, we have supplied the algorithm (S1 File) and hope that it will aid further development of ensemble methods for epidemiology.
Supporting Information S1 File. MCMC code written in MatLab. (ZIP)