Quantifying model evidence for yellow fever transmission routes in Africa

Yellow fever is a vector-borne disease endemic in tropical regions of Africa, where 90% of the global burden occurs, and Latin America. It is notoriously under-reported with uncertainty arising from a complex transmission cycle including a sylvatic reservoir and non-specific symptom set. Resulting estimates of burden, particularly in Africa, are highly uncertain. We examine two established models of yellow fever transmission within a Bayesian model averaging framework in order to assess the relative evidence for each model’s assumptions and to highlight possible data gaps. Our models assume contrasting scenarios of the yellow fever transmission cycle in Africa. The first takes the force of infection in each province to be static across the observation period; this is synonymous with a constant infection pressure from the sylvatic reservoir. The second model assumes the majority of transmission results from the urban cycle; in this case, the force of infection is dynamic and defined through a fixed value of R0 in each province. Both models are coupled to a generalised linear model of yellow fever occurrence which uses environmental covariates to allow us to estimate transmission intensity in areas where data is sparse. We compare these contrasting descriptions of transmission through a Bayesian framework and trans-dimensional Markov chain Monte Carlo sampling in order to assess each model’s evidence given the range of uncertainty in parameter values. The resulting estimates allow us to produce Bayesian model averaged predictions of yellow fever burden across the African endemic region. We find strong support for the static force of infection model which suggests a higher proportion of yellow fever transmission occurs as a result of infection from an external source such as the sylvatic reservoir. However, the model comparison highlights key data gaps in serological surveys across the African endemic region. As such, conclusions concerning the most prevalent transmission routes for yellow fever will be limited by the sparsity of data which is particularly evident in the areas with highest predicted transmission intensity. Our model and estimation approach provides a robust framework for model comparison and predicting yellow fever burden in Africa. However, key data gaps increase uncertainty surrounding estimates of model parameters and evidence. As more mathematical models are developed to address new research questions, it is increasingly important to compare them with established modelling approaches to highlight uncertainty in structures and data.


Introduction
Yellow fever (YF) is a vector-borne viral disease estimated to cause 78,000 deaths in Africa alone [1]. It is endemic to both Latin America and Africa although as much as 90% of cases are thought to occur in the latter. The disease has a wide spectrum of severity ranging from asymptomatic infection to severe disease including jaundice and haemorrhaging [2]. Particularly in milder cases, the disease is fairly non-specific and many individuals may be mistaken for having other highly prevalent infections such as influenza or malaria.
The main urban vector of yellow fever transmission is Aedes aegypti, also a carrier of Dengue and Zika [3]. However, there are a number of intermediate and sylvatic vectors which link the three main transmission cycles involving humans and non-human primates. The two main transmission cycles are sylvatic, where infection occurs mainly in non-human primates and is driven by tree-hole-breeding mosquitoes such as Aedes africanus, and urban, where transmission is mainly in humans and is driven by Aedes aegypti [3,4]. A third cycle cycle can exist in the moist Savannah region of Africa when the tree-hole-breeding mosquito densities reach high levels, infecting both humans and non-human primates. The various cycles have different dynamics. When infection reaches urban settings, explosive outbreaks can occur due to the high density of human hosts and mosquitoes. Yet, the consistent infection pressure from the sylvatic reservoir of disease leads to speculation that many yellow fever cases arise as a result of this cycle. Although, this pressure may vary temporally due to environmental factors. The third cycle, an intermediate form of transmission, affects humans living or working in the jungle border areas and may exhibit dynamics of both extremes. This naturally leads to questions concerning the optimal routes for controlling the disease.
Eradication of yellow fever is not feasible due to the sylvatic reservoir. However, there exists a human vaccine which was developed in the 1930s and is estimated to have effective protection that lasts a lifetime [5]. As such, control of the disease is possible and recent mass vaccination campaigns have been found to reduce annual burden by 57% across the targeted countries [1]. Yet assessing the impact of such an intervention will always be troubled by uncertainty in the prevalence and dynamics of the disease.
A great deal of uncertainty is inherent in yellow fever modelling. The varying level of symptoms experienced by infected individuals lead to significant under or mis-reporting. This is further complicated by the surveillance structures and remoteness of some of the communities at risk from yellow fever. Furthermore, there is little data concerning the non-human primates who act as the environmental reservoir of the disease. The relationship between different primate species, mosquito types and the human hosts can vary the severity of transmission. Tackling the uncertainty in model structures as well as parameters gives a start point from which to begin quantifying the importance of transmission routes for this virus.
The uncertainty in yellow fever understanding leads to a number of challenges. In Africa, little is known about YF in non-human primates (NHP); because of this, it is difficult to assess how dynamics in NHP may affect epizootic events. Furthermore, the relative contribution of different transmission cycles to the burden of YF in humans is unclear from the limited data available. As a result, specific transmission-targeted intervention, which could allow for effective and cost-efficient control of the disease, is difficult to justify.
There have been several approaches to modelling YF, often without explicit assumptions concerning the transmission cycles involved. Locally, Monath and Nisidi developed models to assess the benefits of YF vaccination in Nigeria, finding it to be a cost-effective addition to the enhanced programme of immunization (EPI) [6]. They assumed that infection in humans occurred purely as a result of contact with the sylvatic reservoir; however they suggested a background prevalence of immunity of 60% may preclude epidemic spread. Kraemer et al. modelled the geographic spread of the YF outbreak in Angola and Democratic Republic of Congo 2016 using human mobility data and a measure of vector suitability in order to prioritise districts for intervention. On a global scale, Shearer et al. modelled the vaccination coverage and then YF risk by incorporating distribution maps of reservoir and vector species as well as habitat suitability [7,8]. We focus on two models for transmission in Africa. The first is that of Garske et al. which is used to inform and assess vaccination strategy from GAVI and the Gates foundation [1]. This model assumes the force of infection is static resulting from a constant infection pressure from the sylvatic reservoir and uses a generalised linear model of YF occurrence to extrapolate transmission intensity to areas where data is sparse. Our second focus model is that of Jean et al. which extends the work of Garske et al. to account for a form of herd immunity [9]. In this case, infection is assumed to be human-human mediated by mosquitos thus representing only the urban transmission cycle. Both represent an extreme version of yellow fever transmission; however, within our averaging framework, we will assess the relative evidence for each model formulation and thus compare the role of each transmission cycle. This approach differs from Faria et al. who used genomic data as well as the age/ sex distributions of infected cases to find that a YF outbreak in Minas Gerais was mainly due to sylvatic transmission [10].
In this manuscript we shall detail the updated data sources and describe in brief the two highlighted model structures for yellow fever in Africa [1,9]. We shall then define a composite model incorporating both model formulations and estimate it within a Bayesian framework to evaluate key values of model selection such as Bayes factors. This will allow us to produce predictions incorporating both model approaches and the inherent uncertainty of multiple model structures and parameters. This is the first rigorous multi-model approach applied to yellow fever.

Overview
As described in [1], we estimate a generalised linear model at locations where yellow fever was reported from 1987 to 2017. This model predicts, for each location, the probability of at least one yellow fever report over the observation period based on covariates such as the temperature or land cover type. Then, given a probability of detection for each country informed by the mismatch between case reports and serology, the number of cases that would give rise to this probability of report is estimated. Finally, this number of infections is converted to either a force of infection, λ, see [1], or a basic reproduction number estimate, R 0 , see [9].

Data
The models were estimated at the resolution of the first administrative unit, often called a province. All data detailed below is resolved or aggregated at that level unless stated otherwise. This resolution was optimal for the YF occurrence data, other inputs such as environmental covariates were originally gridded.
Yellow fever occurrence. A database of the provinces where outbreaks of yellow fever have been reported was compiled from 1984 to 2017. This was informed by data from the Weekly epidemiological record (WER), the WHO disease outbreak news (DON), the published literature and internal WHO outbreak records.
Case-based surveillance for YF was established in 21 West and Central African countries and is recorded in the yellow fever surveillance database (YFSD) held by the African regional office of the World Health Organisation (AFRO). This database is designed to use a very broad case definition (jaundice or haemorrhage with fever) to ensure high sensitivity (at the cost of low specificity of suspected cases). That means that most reported cases are not due to YF and only 1-2% will be lab-confirmed as YF. In addition to using the confirmed cases to inform the occurrence of YF, we use the above data as a proxy for surveillance effort in each country.
Assuming an approximately constant incidence of the syndrome 'fever with jaundice and/or haemorrhage', the per-capita incidence of reported suspected cases yields information on the surveillance effort. We therefore aggregate this measure at country level and divide by the national population to be used as a covariate in the generalised linear model.
Vaccination coverage. We use the methodology of Garske et al. with updated data sets to estimate vaccination coverage [1]. This methodology is also described and visualised in finer detail in the Polici shiny application where vaccine coverage is visualised at province level from 1940 to 2050 for the 34 endemic countries in Africa. [11]. This data may also be downloaded for each country. The coverage is based historic data on large-scale mass vaccination activities [12,13], outbreak response campaigns as reported in the WHO weekly epidemiological record and disease outbreak news [14,15], recent preventive mass-vaccination campaigns conducted as part of the yellow fever initiative [16] and routine infant vaccination reported by the WHO/UNICEF estimates of national immunization coverage (WUENIC) [17].
Serological surveys. We use serological surveys to assess transmission intensity in specific regions of the African endemic zone, the locations are displayed in Fig 2. Theses include published and unpublished surveys [18][19][20][21][22][23]. The unpublished surveys were from various East African countries performed between 2012 and 2015 as part of the YF risk assessment process [24]. Of the published surveys, included in the estimation of Garske et al. and Jean et al. we omit that of the Central African Republic as it may capture post-outbreak dynamics and thus not represent the population serological status at steady state [9,18,25]. In the majority of surveys, individuals who were known to have been vaccinated were excluded; however in south Cameroon the vaccination status was unknown so we estimate an additional vaccination factor [19]. This factor represents the probability that a vaccinated individual is included in the study. Further summary characteristics of the surveys are included in the Table A in S1 Text.
Demographic data. For each country, the population size and age structure was obtained from the UN World Population Prospects (UN WPP) [26]. These were then disaggregated from the five-year age band into annual birth cohorts according to the method described in [25]. We combine the estimates with spatial population distributions from LandScan 2014 [27] in order to estimate the populations sizes at province level assuming the same age structure across all provinces in a given country. Landscan provides population size estimates for approximately 1 km square pixels. Combining these data sets we arrive at the total number of individuals in each age group and province over time assuming that spatial distributions are fairly static.
Environmental data. The generalised linear model includes environmental covariates to account for the dependence of transmission on factors such as temperature and land cover. These data sets specifically include the enhanced vegetation index (EVI), middle infra-red reflectance (MIR), land cover types, rainfall estimates, temperature and altitudes [28][29][30]. These are gridded data at various resolutions, ranging from approximately 1km to 10km, which we aggregated to province level by calculating the mean value weighted by the population size at each grid cell.

Models
Generalised linear model of yellow fever reports. This approach is common to both models and described in full in [1]. The generalised linear model is estimated from occurrence data which we assume to be binomially distributed. The model predictions, q i for each province or first administrative unit, i, are given by where X = X ij is the matrix of covariates used in the model, i denotes province and j denotes covariate, and β is the vector of parameters to be estimated. Garske et al. used a complementary log-log link function in order to have a more realistic interpretation in terms of surveillance quality when relating the GLM to transmission intensity, detailed later. The full list of covariates assess is detailed in [1]; however, in this paper we examine only the best-fitting model which incorporates aspects such as land cover type, temperature, surveillance quality, population size, enhanced vegetations index and longitude. This is the same for both formulations of the transmission model.

Relating occurrence to transmission intensity.
In both of the model formulations, the transmission intensity produces an estimate of the number of infections in any year. As such we may calculate the number of infections over the reporting period. If we can also calculate  Table A in S1 Text [1,9,[18][19][20][21][22][23]. Maps were produced from GADM version 2.0. https://doi.org/10.1371/journal.pcbi.1007355.g002 Quantifying model evidence for yellow fever transmission routes in Africa the probability of report, the number of infections will link transmission intensity and the occurrence of yellow fever reports. First, we relate q i , the probability of at least one yellow fever report in province i over the observation time to n inf,i , the number of infections in that province, through a Poisson process for the detection on infection, where ρ i = ρ c is the per country probability of detection which may be related to the GLM in following way n inf;i lnð1 À r c Þ ¼ À expðXbÞ: Through the repeated taking of logarithms, the probability of detection may be written in terms of the GLM coefficients and a baseline surveillance quality, b, which is calculated separately using serological survey data from the left part of Eq 2.

Transmission intensity as a static force of infection.
In the areas where serological survey data is available, it is possible to estimate the transmission intensity assuming certain information about vaccination coverage and demography. In the first instance, Garske et al. assumed transmission occurred as a static force of infection, hereby termed the λ model [1]. This formulation may be likened to the assumption that most Yellow Fever infection pressure comes from the sylvatic reservoir and there are limited instances where infection reaches urban environments to be sustained in the human and vector populations alone.
The force of infection is assumed to be the same for all age groups within a province over time, so we may model the serological status of the population in age group u with force of infection λ as the following, where p a denotes the proportion of the population of age a and v a denotes the vaccination coverage at age a. Transmission intensity related to the basic reproduction number. Jean et al. developed a formulation for the transmission dynamics of yellow fever that is based on the basic reproduction number [9]. Therefore, the model is dynamic and the force of infection may change depending on population immunity. In contrast to the the λ model, the basic reproduction number or R 0 model assumes all infections are human-to-human, mediated through the main urban vector of Yellow Fever, Aedes aegypti. The ramification of this assumption is that the herd affects of any intervention measures will be captured by the R 0 model and not by the λ model. The R 0 model assumes that each province is at endemic equilibrium. Each year the endemic equilibrium is maintained by the addition of new infections such that the fraction of susceptible population is equal to 1/R 0 . Thus, where N inf (t) is the number of infections in year t and P tot (t) is the total population size in year t.
The immunity granted through vaccination is also incorporated into the model, reducing the number of infections occurring in any one year. However, seropositivity due to vaccination is included in only some of the serological studies. In this case, the natural and vaccinated serological status must be tracked separately and the resulting seroprevalence at age a in year t may be modelled as where i − v − (a, t) denotes those individuals who are both uninfected and unvaccinated.

Model estimation
We wish to average the model predictions of disease burden whilst taking into account the uncertainty in parameter estimates and model structure. As such, we would like to calculate the probability of each model given the data and uncertain parameter estimates. We do this by estimating the model evidence using Markov Chain Monte Carlo (MCMC) on the product or composite space of the models using the method of Carlin and Chibb [31][32][33][34][35]. They wished to develop a rigorous model selection, without affecting convergence, in the situation where the dimension of the parameter space is not fixed [31]. This method not only gives us an estimate of the model evidence but also allows us to calculate the Bayes factors and marginal posterior distributions for each parameter.

Method in brief.
We examine the full space of parameters for all models, y ¼ ðy R 0 ; y l Þ. Therefore, we define a composite model containing all parameters and a model indicator, M. The model indicator dictates which parameters are informed by the data at each iteration; when the model indicator 'selects' model A, we term the parameters corresponding to model A as activated. We sample the posterior of this composite model by proposing values of θ and M, and accepting them with a probability proportional to the prior distributions of M and activated model parameters; the likelihood, and pseudopriors. Pseudopriors can be considered as linking densities which are chosen to improve mixing between models and do not contribute to the marginal likelihoods. They act as place holders for the parameters that are not currently informed by the data and therefore only apply to inactivated parameters. In our models the model-specific parameters describe the transmission intensity in the serological surveys, namely λ and R 0 estimates which may be activated or inactivated by M. The parameters that are common to both models, such as the vaccine efficacy, are always informed by the data and therefore always activated. These parameters do not have pseudopriors, see Theoretical background. We may describe our R 0 and λ models as Bayesian models defined by a joint probability distribution of data, D, and model parameters, θ: We define a mixture model of the two models where our parameter vector can take any value from the Cartesian product of the individual parameter spaces [32]. Therefore, the full model may be written as the following, where p(M i ) is the prior for model i 2 (λ, R 0 ) and pðM R 0 Þ þ pðM l Þ ¼ 1. Given the above formulation of the mixture model, the marginal likelihood for each model may be described with the following, Note that the above is independent of the prior distribution p(θ j |M i ), or pseudoprior, which means that p(θ j |M i ) may be chosen specifically to improve mixing and that we may calculate the Bayes factors as the ratio of the two marginal likelihoods [31,32]. We generate samples from the joint posterior distribution of the model parameters and indicator using Metropolis-within-Gibbs where the joint posterior is given by, pðM; yjDÞ / pðDjy i ; M i Þpðy i jM i Þpðy j jM i ÞpðM i Þ: Following Carlin and Chibb and, separately, Lodewyckx et al. [31,32], we use a Gibbs sampler. However, in both of the above, the likelihood, p(D|θ i , M i ), could be sampled easily, whereas we use a pseudo-likelihood assuming observations are independent, with form shown in Eq 6. As such, where the likelihood occurs in the Gibbs sampling algorithm, we include a Metropolis-Hastings step. This means that we propose new parameter values from a symmetric distribution which we then accept with a probability proportional to the likelihood and prior distributions at that point.
The steps for one iteration of this approach are as follows: Specific formulation. Our two model formulations retain the generalised linear model of yellow fever occurrence which is used to extrapolate transmission intensity across the entire African endemic region. To estimate this, we use binary occurrence data to produce the following likelihood, where y i denote yellow fever reports in province i over the observation period and q i is the predicted probability of a report given by Eq 1. We estimate the GLM with MCMC and a Metropolis-Hasting step based on an adaptive proposal distribution.
The difference between the λ and R 0 model formulations occurs when predicting seroprevalence. However, the serological study likelihoods have the same general form. The positive samples are assumed to be binomially distributed so we have the following pseudo-log- where S M (T, u) denotes the seroprevalence predicted by model M 2 (λ, R 0 ) for age group u and transmission type T 2 (λ, R 0 ) given by Eqs 3 and 4; n tot,u and n pos,u denote the total number of samples and positive samples in the data respectively. As we have overlapping parameters which will be activated irrespective of the value of the model indicator, we only need to define pseudopriors for the parameters that differ between models i.e., R 0 and λ. We choose the pseudopriors to be approximations of the individual model posterior predicted values from previous MCMC runs, see the appendix for all distributions. These are calculated using the package fidistr in R assuming the posteriors of individual model runs are normally distributed, see supplementary material for fits. We adjust the variance of these fitted distributions in order to ease mixing between the two models as the pseudopriors do not affect the marginal posterior distributions of the models. This was performed by adjusting a scaling factor for the standard deviation for the R 0 pseudoprior to ensure that the values of the pseudopriors applied to the median values of the posterior outputs of both models have a difference close to zero.
Each model has a prior which may affect the mixing between models and the final posterior model probability. As such, we choose this to be such that each model is activated relatively frequently to ensure efficient sampling of the parameter space. We examined the posterior probabilities given either value of the model indicator and adjusted the model prior information such that the difference was minimal. In our case, and primarily due to the difference in model likelihood for the median posterior value of each parameter taken from initial individual model estimation runs, the model prior for the λ model is π λ = exp(−10). The model prior for the R 0 is p R 0 ¼ 1 À p l . The final Bayes factors will take into account the prior model probabilities in the following way:

Bayesian model averaging
Once we have estimated the model evidence and approximated the Bayes factors, we may produce average disease burden predictions given both of our models. We aim to predict the burden of yellow fever in the African endemic zone for the year 2018 which may be calculated using the posterior predicted distribution of the risk of a yellow fever report and either the posterior distributions for λ or R 0 in the serological survey areas. We consider a pool of possible predictions from both model formulations to arrive at the average model prediction. We sample from the combined pool of model predictions proportional to each model's evidence under the assumption that the model priors are equal. To calculate this, we use our Bayes factor estimates and the following formula: wherepðM R 0 jDÞ is the R 0 model evidence under the assumption of equal model priors. We also sample the model predictions proportional to the model evidence where the model priors are as we have defined them; the results of which can be found in the supplementary material.

Results
We shall first detail the outcome of the estimation for each model and then combine the full predicted distributions to calculate the transmission intensity across the African endemic region and thus the disease burden.

Estimation of the GLM for YF occurrence
The generalised linear model provides a close quantitative and qualitative fit to the data as shown by Fig 4. Stability of the chains was assessed both visually through examination of the trace plots and comparison of the whole and partial chain densities, see S1 Text for figures. Additionally, the Raftery diagnostic was used to assess the number of iterations required to accurately estimate the quantiles of the parameters [36,37]. All estimation and analyses were performed in R version 3.4.4 [38].

Estimation of the transmission models
The transmission models were estimated together through the product space MCMC method, as such we arrive at marginal posterior distributions for all parameters. In the case of the specific transmission parameters, λ and R 0 , these posteriors are independent and so we separate the samples by which model was activated. However, the parameters for vaccine efficacy and vaccine coverage factor for south Cameroon are shared between the models; as such, we include all samples for these parameters to account for uncertainty in model structure in these estimates. The posterior distributions for these shared parameters are shown in Fig C in S1 Text and all other trace plots and distributions can be found in the appendix. Fig 5 shows the posterior predictive distributions for all serological surveys for both transmission models. In the majority of situations, both the λ and R 0 models give a similar qualitative fit to the data. However, there are notable differences in COD zones 1 to 3 where the R 0 model fails to predict the first data point.

Bayes factors
The Bayes factors are approximated using Eq 7. We defined the prior probability of the λ model as exp(−10) in our specific formulation of the inference, therefore the only additional consideration is the posterior model probability of each model. The approximate posterior probability of the R 0 model is approximately 0.0675 [0.0691, 0.0706] where the confidence interval is provided from a 1,000 bootstrap samples of the model indicator chain [32]. This leads to a log Bayes factor of -12.60 [-12.63,-12.58] for the R 0 model indicating strong evidence for the static transmission force of infection model.

Bayesian model average predictions
We sample from the posterior distributions of all models to produce estimates of the transmission intensity across the African endemic region, with median values shown in Fig 6. We take 1,000 samples of the posterior for the GLM parameters and use these with 1,000 samples of the transmission models. The transmission models are then sampled proportional to their model evidence under the assumption of equal priors, see Eq 8. In this case, the R 0 model is sampled with probability 3.37×10 −6 , so essentially all 1,000 predictions shown in Fig 7 are obtained from the λ model. We also consider the case where the model priors are as we defined, π λ = exp(−10), the resulting burden predictions are shown in Fig K in S1 Text.

Discussion
We have presented a framework to assess model evidence for two transmission models for yellow fever in Africa. We have used these two models in conjunction with a model of relative  risk of yellow fever reports to estimate transmission intensity across the African endemic region. We were then able to produce average model predictions of the burden across the region according to each model's evidence.
We found strong support for the static force of infection model with a Bayes factor of 2.95 ×10 5 (95% CrI [2.95 ×10 5 , 2.95 ×10 5 ]). This suggests that for the currently included serological surveys, there is more support for the static force of infection model. This could be due to unsuitability of the R 0 model for the type of data we have access to or the sparsity of the data itself; however, it could also indicate the relative validity of assumptions concerning yellow fever transmission in Africa.
Whilst the models we compare have some similar structure, the intrinsic assumptions in each differ. We have omitted one serological survey, from the Central African Republic, that was included in the original estimation of both transmission models as it may not represent the population level immunity at steady state [1,9,18]. When this survey was included, the support for the R 0 model decreased, suggesting that the assumption of constant transmission intensity disproportionally hampers the R 0 model. The R 0 model also assumes that each province is at endemic equilibrium. This assumption may be too strong in East Africa where very low values of R 0 are estimated. It also leads to some years where there are no new infections in certain provinces, see COD zones 1 to 3 in Fig 5 and in the burden predictions in Fig K in S1 Text. Furthermore, changes to the demography could affect the endemic equilibrium making the R 0 less flexible than the static force of infection model. Finally, by fitting the same model indicator over all locations, we implicitly assume that the relative support for each model is spatially invariant. As such, this could penalise areas where the R 0 is better supported.
The example of COD zones 1-3 suggests that the disparity between models is being driven by specific locations. In these situations, the population level immunity is close to herd immunity, leading to years of zero infections in the R 0 model when vaccine coverages are higher; a behaviour never seen in the λ model. As both models are designed to assess long-term disease burden, this behaviour is smoothed over the timespan; however, when comparing specific age groups in serological profiles, the differences can be seen. This behaviour leads to large differences in pseudo-log-likelihood for some age groups in the serology, leading to the large magnitude of the Bayes factor. It also means that the Bayes factor is dependant on the included studies.
If our model assumptions were directly comparable, then we may conclude that the λ model provides a stronger representation of yellow fever transmission in Africa. This would suggest transmission occurring as a result of a constant pressure from the sylvatic reservoir may be responsible for more occurrence reports than human to human transmission (mediated by mosquitoes). This would be consistent with the dynamics seen for the ongoing Nigerian yellow fever outbreak, but not for the 2016 Angola outbreak [9]. However, the way in which we incorporate the available data into our model may contribute to the relative support.
Untangling the relative contribution of transmission routes may not be feasible with our current data sets. Our results are limited by the number of compared transmission models and the sparse data set on which to compare them. We could counter the first issue by introducing further transmission models that bridge the two approaches. However, the second issue will limit how any new models may be estimated. We have utilised occurrence data and serological studies. To show occurrence, we take cases and outbreak reports only, discarding the relative magnitude of outbreaks, as we expect any reports to be a large underestimate of the actual prevalence of the disease. This may indirectly favour the static force of infection model as infections are assumed to be unconnected. In the serological studies, there are key gaps, particularly in West Africa where transmission is estimated to be highest. The absence of information is this region both increases the uncertainty in our estimates and may mean we omit areas where urban transmission plays a greater role. One of the benefits of the framework we use is accounting for some of the uncertainty arising from these data gaps. Yet, we can not ensure that both models will be affected equally.
Our framework for model comparison and averaging may be extended to other models given they share data sources. We have presented a method for dealing with pseudo-likelihoods within a product space estimation format. This leads to posterior estimates of model parameters and model evidence whilst accounting for uncertainty in model structure. We find strong evidence for a model with constant infection pressure from an external source, validating proposals that yellow fever is sustained in a sylvatic reservoir. However these results are limited by the small number of models compared and the sparse data available. New studies and data collection, particularly serological data in West Africa, could reduce uncertainty and alter the current model evidence. As an increasing number of models are developed for disease transmission, rigorous comparison accounting for uncertainty in parameter estimates and model structure is vital in order for us to unpack the relative importance of transmission dynamics. We present one avenue of model comparison and highlight areas of uncertainty both in yellow fever modelling and available yellow fever data.
Supporting information S1 Text. Supporting text. Additional data and estimation figures as well as further explanation of the main text. (PDF)