The authors have declared that no competing interests exist.

Upcoming vaccination efforts against typhoid fever require an assessment of the baseline burden of disease in countries at risk. There are no typhoid incidence data from most low- and middle-income countries (LMICs), so model-based estimates offer insights for decision-makers in the absence of readily available data.

We developed a mixed-effects model fit to data from 32 population-based studies of typhoid incidence in 22 locations in 14 countries. We tested the contribution of economic and environmental indices for predicting typhoid incidence using a stochastic search variable selection algorithm. We performed out-of-sample validation to assess the predictive performance of the model.

We estimated that 17.8 million cases of typhoid fever occur each year in LMICs (95% credible interval: 6.9–48.4 million). Central Africa was predicted to experience the highest incidence of typhoid, followed by select countries in Central, South, and Southeast Asia. Incidence typically peaked in the 2–4 year old age group. Models incorporating widely available economic and environmental indicators were found to describe incidence better than null models.

Recent estimates of typhoid burden may under-estimate the number of cases and magnitude of uncertainty in typhoid incidence. Our analysis permits prediction of overall as well as age-specific incidence of typhoid fever in LMICs, and incorporates uncertainty around the model structure and estimates of the predictors. Future studies are needed to further validate and refine model predictions and better understand year-to-year variation in cases.

Typhoid fever is a bacterial enteric infection that continues to pose a considerable burden to the 5.5 billion people living in low- and middle-income countries (LMICs). We developed and validated a model incorporating widely available indicators of economic and social development and the environment to estimate the burden of typhoid fever across LMICs. Our analysis uses all available data to estimate the incidence of typhoid in key age groups, which is important for the design and implementation of optimal vaccination strategies, and it identifies regions of the world that have the most uncertainty in typhoid incidence. Across all LMICs, we estimated that the expected number of typhoid fever cases per year is 17.8 million (95% CI: 6.9–48.4 million). We also present the probability that incidence surpasses the criteria for low, medium, high, and very high incidence in each country, which could help guide policy in the face of uncertainty.

Typhoid fever has been estimated to cause between 9.9 and 24.2 million cases and 75,000–208,000 deaths per year [

Data on the incidence of typhoid fever are scarce in low- and middle-income countries (LMICs). The symptoms of typhoid fever resemble those of many other significant febrile diseases, precluding straightforward estimates of typhoid incidence [

Furthermore, variation in the age distribution of typhoid fever across settings is not well understood. Cases tend to be concentrated in younger age groups in settings with higher transmission and distributed more equally among different ages in low-transmission settings [

We explored the potential contribution of demographic, environmental, and socioeconomic indicators that serve as candidate predictors of the age-specific incidence of typhoid fever. We used a data-driven approach to predict the mean and variance in age-specific incidence while accounting for uncertainty in the underlying model structure, and validated our predictions against out-of-sample data.

We carried out a literature search to identify population-based studies that reported incidence of culture-confirmed typhoid fever for the period of 1980–2014. We excluded all hospital-based or clinic-based studies that did not constitute exhaustive surveillance of typhoid in a well-defined population. Further details of the literature search are presented in the

We gathered data on possible predictors of typhoid fever incidence from publicly available databases, aiming to identify indicators of environmental characteristics and socioeconomic development for all LMICs. Predictors were chosen for their ubiquity and relevance to water-borne disease transmission in consultation with typhoid experts.

Covariate | Resolution | Mean and range in estimation sample | Mean and range in TSAP | Mean and range in prediction sample | Source |
---|---|---|---|---|---|
Population density: pop/km^{2} | 1/24 x 1/24 degree | 2,091 (17–47,024) | 1,205 (46–11,592) | 7.06 (0–98,928) | [ |
GDP per capita, 2005 international dollars | 1x1 degree | $1,780 ($500–$5,509) | 1,130 (841–1,608) | $6,909 (694–53,836) | [ |
Gini coefficient | National | 41 (31–59) | 42 (37–50) | 39 (29–57) | [ |
Access to improved source of water | Subnational, national | 35 (2–90) | 25 (3–66) | 61 (1–100) | [ |
Access to improved source of sanitation | Subnational | 38 (2–96) | 17 (3–63) | 57 (0.36–98) | [ |
Years of education for women over age 20 | Subnational | 4.7 (1.9–10.3) | 3.8 (0.5–9.8) | 1.7 (0.6–10.7) | [ |
% roads paved | National | 40 (10–87) | 4 (4–28) | 47 (1.8–91) | [ |
% population living on <$2/day | National | 59 (18–93) | 78 (52–93) | 25 (0.05–95) | [ |
Prevalence of stunting | Subnational, national | 42 (6–60) | 34 (18–51) | 17 (0.02–82.7) | [ |
Number of major floods 1985–2011 ^{2} | 50m^{2} | 21 (4–99) | 11 (1–25) | 3.7 (0–32) | [ |
HIV prevalence | National | 0.958 (0.013–6.2) | 2.46 (0.3–6) | 1.42 (0.1–28.8) | [ |
Water stress | ½ x ½ degree | 1.7 (0.0021–6.8) | 1.17 (0.07–3.46) | 1.37 (0.10–10) | [ |

* TSAP: Typhoid Fever Surveillance in Africa Program

^{†} Log transformed values were used. Geometric means were reported in this case.

^{‡} Logistic transformed values were used.

We validated our model predictions against previously unpublished data from the Typhoid Fever Surveillance in Africa Program (TSAP, see

The observed data on typhoid fever incidence result from both a disease process and an observation process. We aimed to take into account both of these processes in our modeling strategy, as illustrated in

The top of the flowchart consists of all cases of typhoid fever in the population, which is what we are ultimately interested in estimating. However, three intervening (observation) processes result in a considerable difference between the “actual” cases of typhoid fever and the observed cases of typhoid fever. In all, the observation process is made up of the type of surveillance employed in each study, the participation rate of patients seen at each of the study clinics (adjusted for age and location), and the sensitivity of blood culture (adjusted for age and location).
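As an illustration of how the three observation processes separate true from observed cases, the sketch below treats each process as a multiplicative detection probability. The multiplicative form, the function names, and the parameter names are assumptions made for this example; the model described in this paper estimates these quantities jointly rather than plugging in fixed values.

```python
def expected_observed_cases(true_incidence_per_100k, person_years,
                            surveillance_factor, participation_rate,
                            culture_sensitivity):
    """Expected number of observed cases under an assumed multiplicative
    observation process (each factor is a probability between 0 and 1)."""
    true_cases = true_incidence_per_100k / 100_000 * person_years
    detection = surveillance_factor * participation_rate * culture_sensitivity
    return true_cases * detection


def adjusted_incidence_per_100k(observed_cases, person_years,
                                surveillance_factor, participation_rate,
                                culture_sensitivity):
    """Invert the same assumed process: scale observed cases back up to an
    incidence per 100,000 person-years."""
    detection = surveillance_factor * participation_rate * culture_sensitivity
    return observed_cases / detection / person_years * 100_000


# Hypothetical inputs: 500 true cases per 100,000 person-years, 10,000
# person-years observed, full surveillance capture, 80% participation,
# 50% blood culture sensitivity.
observed = expected_observed_cases(500, 10_000, 1.0, 0.8, 0.5)
recovered = adjusted_incidence_per_100k(observed, 10_000, 1.0, 0.8, 0.5)
```

With these hypothetical inputs, fewer than half of the true cases would be expected to appear in the surveillance data, which is why unadjusted incidence rates can substantially understate the burden.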

We employed a generalized linear mixed-effects model with a log link function to characterize the true underlying incidence of typhoid (_{a,j}) in setting

First, we considered a null model in which annual incidence was based on age-group fixed effects and random effects for each age group in each location. The intercept, _{0}, represents the incidence in the referent age group (designated as 5–14 year olds) in each setting, and the slopes _{a} represent the incidence rate ratios between the three other age groups and the referent group. An offset equal to the log of person-time was included to adjust for the size of the population under observation:
_{0,j} ~ MVN(0, _{a,j} = 0 for 5–14 year olds.

To test the assumption that typhoid fever incidence correlates with indicators of environmental characteristics and socioeconomic development, we evaluated whether including additional covariates (_{j}) to model both the intercept and the slope improved our ability to predict the incidence of typhoid fever:
_{a} are the effect sizes corresponding to each predictor for the intercept and slope, respectively; again, _{0,j} ~ MVN(0, _{a,j} = 0 for 5–14 year olds. Variable selection for the predictors was based on the spike-and-slab method described below.
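A form consistent with the description of the null and covariate models could be written as follows. This is a sketch rather than the authors' exact notation, and every symbol is an assumption chosen for illustration: $y_{a,j}$ is the number of cases and $T_{a,j}$ the person-time in age group $a$ and location $j$, $\lambda_{a,j}$ is the underlying incidence, $\beta_0$ and $\beta_a$ are the age fixed effects, $u_{0,j}$ and $u_{a,j}$ are the location-specific random effects with covariance $\Sigma$, $X_j$ is the covariate vector, and $\theta_0$ and $\theta_a$ are the covariate effect sizes for the intercept and slopes. The observation process is omitted for brevity.

```latex
% Log link with a person-time offset (observation process omitted)
\begin{align}
\log \mathbb{E}[y_{a,j}] &= \log T_{a,j} + \log \lambda_{a,j} \\[4pt]
% Null model: age fixed effects plus location-specific random effects
\log \lambda_{a,j} &= \beta_0 + \beta_a + u_{0,j} + u_{a,j} \\[4pt]
% Covariate model: predictors enter both the intercept and the slopes
\log \lambda_{a,j} &= \beta_0 + X_j^{\top}\theta_0
                     + \beta_a + X_j^{\top}\theta_a
                     + u_{0,j} + u_{a,j} \\[4pt]
(u_{0,j}, u_{a,j}) &\sim \mathrm{MVN}(0, \Sigma), \qquad
\beta_a = 0,\ \theta_a = 0,\ u_{a,j} = 0 \ \text{for 5--14 year olds}
\end{align}
```

Under this parameterization, $\exp(\beta_a)$ is the incidence rate ratio between age group $a$ and the referent group when all covariates are zero.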

Three additional sub-processes that impact our observed data were taken into account (

Studies used in the estimation sample are depicted in red and the studies used in the validation sample are depicted in blue. The studies in the validation sample come from the Typhoid Fever Surveillance in Africa Program (TSAP).

We used a Bayesian framework for model estimation, which allowed us to incorporate prior information on diagnostic test sensitivity, as described above. We used a stochastic search variable selection algorithm employing spike-and-slab priors to estimate the fixed effects of the predictors while incorporating uncertainty in model structure (see

We sought to validate the predictive ability of our model in two ways: using a leave-k-out validation method and by comparing model predictions to out-of-sample data from the recent TSAP studies [

We drew 1,000 samples from the estimated model to obtain posterior predictions of the incidence rate across all countries classified by the World Bank as low income, lower-middle income, or upper-middle income at least once in the period 2011–2015. We mapped the median estimates of the predicted incidence by age using a map resolution of 0.1 degrees. We capped the predicted incidence at 10,000 per 100,000 person-years. To obtain estimates of the incidence for each region, we took the population-weighted sum of the estimated incidence over the raster surface. To characterize the uncertainty in the incidence, we calculated the proportion of the posterior predictive sample in each of four incidence categories: <10, 10-<100, 100-<500, and ≥500 cases per 100,000 person-years, designated as low, medium, high, and very high incidence.
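The post-processing steps in this paragraph can be sketched in a few lines of NumPy. The function and variable names are hypothetical, and the inputs stand in for posterior draws and raster cells; the cap and the category thresholds are the ones stated in the text.

```python
import numpy as np

CAP = 10_000.0                      # cap on incidence per 100,000 person-years
EDGES = [10.0, 100.0, 500.0]        # category boundaries from the text
LABELS = ["low", "medium", "high", "very high"]


def category_proportions(draws):
    """Share of posterior draws falling in each incidence category."""
    capped = np.minimum(np.asarray(draws, dtype=float), CAP)
    # digitize returns 0 for x < 10, 1 for 10 <= x < 100, and so on
    idx = np.digitize(capped, EDGES)
    counts = np.bincount(idx, minlength=len(LABELS))
    return dict(zip(LABELS, counts / capped.size))


def regional_burden(incidence_per_100k, population):
    """Population-weighted cases and incidence over a set of raster cells."""
    inc = np.minimum(np.asarray(incidence_per_100k, dtype=float), CAP)
    pop = np.asarray(population, dtype=float)
    cases = float(np.sum(inc / 100_000 * pop))
    rate = cases / float(pop.sum()) * 100_000
    return cases, rate


# Hypothetical posterior draws for one location
props = category_proportions([5, 10, 99.9, 100, 499, 500, 20_000, 50])
# Hypothetical two-cell region
cases, rate = regional_burden([100, 300], [1_000, 3_000])
```

Applying `category_proportions` to the draws for a country or region gives the kind of probabilities reported in the incidence-category table, while `regional_burden` applied to each draw in turn would give a posterior distribution of regional case counts.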

We identified 32 studies from 22 sites located in 14 countries (

We evaluated twelve potential predictors of typhoid fever incidence (

A) The posterior marginal probability that each variable was excluded from the model (black) or included as a predictor of the intercept (dark grey) or intercept and slope (light grey) is shown for two chains. Our stochastic search variable selection algorithm could include variables either as a predictor of the intercept (the incidence in 5–14 year olds) or as a predictor of the intercept as well as the slopes (the incidence rate ratios between the other age groups and the referent age group of 5–14 year olds). B) Distribution of the average number of covariates in the model. Chain 1 was initiated using a model that included all the covariates as predictors of the main effect, while chain 2 was initiated as the null model. The null model was never sampled, implying that the models including at least one predictor better described the data than the null model. C) Posterior distributions of age-specific incidence rate ratios between the referent age group (5–14 years of age) and other age groups: <2 years, 2–4 years, ≥15 years old.

No single covariate was included in all models. The percent of roads paved, prevalence of stunting, and percent of the population living in extreme poverty were the most sampled covariates (present in 99%, 97%, and 91% of all models in both chains); in almost all models in which they were present, the covariates were useful to predict both the overall incidence as well as the age-specific incidence rate ratios in each setting. HIV prevalence and flood risk were the next most sampled predictors (present in 85% and 45% of all models in both chains), but these indices were only useful to predict the overall incidence. Indices for income inequality (Gini coefficient), access to flush toilets, and GDP per capita were sampled in just over a third of all models, while the rest of the predictors were each included in 21–30% of all models. In total, the models had a median of six predictors, and 95% of models had between three and nine predictors; the null model was never sampled (
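Inclusion frequencies of this kind are simple summaries of the sampled model indicators. The sketch below assumes a hypothetical encoding in which each MCMC draw stores, for every covariate, 0 if it is excluded, 1 if it predicts the intercept only, and 2 if it predicts the intercept and the slopes; the actual sampler output format is not described in the text.

```python
import numpy as np


def inclusion_summary(states, names):
    """Posterior share of draws in which each covariate is excluded,
    included for the intercept only, or included for intercept and slopes."""
    states = np.asarray(states)
    summary = {}
    for k, name in enumerate(names):
        col = states[:, k]
        summary[name] = {
            "excluded": float(np.mean(col == 0)),
            "intercept_only": float(np.mean(col == 1)),
            "intercept_and_slope": float(np.mean(col == 2)),
        }
    return summary


def model_sizes(states):
    """Number of covariates included in each sampled model."""
    return (np.asarray(states) > 0).sum(axis=1)


# Hypothetical indicator draws: 4 iterations, 2 covariates
draws = [[2, 0],
         [2, 1],
         [1, 0],
         [2, 0]]
summary = inclusion_summary(draws, ["roads_paved", "floods"])
sizes = model_sizes(draws)
null_model_sampled = bool((sizes == 0).any())
```

A covariate's overall inclusion probability is one minus its `excluded` share, and the median and quantiles of `sizes` correspond to the model-size summary reported above.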

The model estimated a lower incidence rate (on average) among <2 year olds and ≥15 year olds, and a slightly higher incidence rate among 2–4 year olds compared to 5–14 year olds (

Sites are labeled by location and year, and plots are ordered by decreasing overall model-predicted incidence. The red line and regions represent the model fits: the median and 95% credible interval of the expected incidence estimated by the joint posterior distribution of the model parameters (excluding study-specific random effects and the impact of the observation process). The black symbols are the observed incidence with the 95% credible intervals after adjusting for the observation process: surveillance type (active/augmented passive versus passive surveillance), the participation rate, and blood culture sensitivity. Only studies that reported age-specific incidence are featured here.

The posterior incidence predictions are shown in

(A) Posterior predictions from the null model, which only adjusts for age and the observation process. (B) Posterior predictions from the model using fixed effects for the predictors. (C) Leave-3-out validation results. The gray markers represent the density of model-predicted posterior distributions of incidence, while the red dots represent the median posterior predicted incidence. The size of the red circular markers is proportional to the number of person-years of observation in each study. All predictions are of the mean incidence and were generated using only the fixed-effect terms of the model, and hence do not account for unmeasured location-specific differences, e.g. in healthcare-seeking behavior.

Model validation against the TSAP data performed well for some sites, but showed large variance in posterior estimates of incidence, as well as notable overestimates of incidence in several locations (

The observed versus predicted incidence of typhoid fever is plotted for studies in the Typhoid Fever Surveillance in Africa Program (TSAP) using a model estimated from previously published data identified in our literature review. The numbers represent the median posterior predicted incidence for each TSAP site: 1 – Nioko II, Burkina Faso. 2 – Polesgo, Burkina Faso. 3 – Ashanti Akim North, Ghana. 4 – Bandim, Guinea Bissau. 5 – Kibera, Kenya. 6 – Antananarivo, Madagascar. 7 – Imerintsiatosika, Madagascar. 8 – Moshi rural, Tanzania. 9 – Moshi urban, Tanzania. The gray markers represent the density of model-predicted posterior distributions of incidence. The gray horizontal lines represent 95% confidence intervals for the observed incidence.

Given the environmental and socioeconomic composition of LMICs, our model predicts that typhoid fever incidence should be highest in parts of Central Africa, Turkmenistan and Uzbekistan in Central Asia, as well as Mongolia and western China (

The median posterior predicted incidence per 100,000 person-years in each of the age groups (<2 years, 2–4 years, 5–14 years, and ≥15 years) is mapped for all low- and middle-income countries (LMICs) with a resolution of 0.1 degrees.

Approximately 6 billion people lived in all LMICs in 2015. We estimated that the expected number of typhoid fever cases per year is 17.8 million across all LMICs (95% CI: 6.9–48.4 million) (

Total cases are shown in millions and incidence is per 100,000 person-years.

Region | Cases (millions) | Incidence (per 100,000 person-years) |
---|---|---|
Central Asia | 0.05 (0.01, 0.5) | 55 (12, 541) |
Central Europe | 0.01 (0.003, 0.06) | 21 (4, 100) |
Eastern Europe | 0.03 (0.01, 0.13) | 16 (4, 65) |
Andean Latin America | 0.4 (0.04, 2.1) | 704 (80, 3751) |
Caribbean | 0.02 (0.004, 0.05) | 47 (12, 166) |
Central Latin America | 0.3 (0.07, 1.3) | 120 (30, 512) |
Southern Latin America | 0.04 (0.01, 0.2) | 61 (15, 276) |
Tropical Latin America | 0.2 (0.04, 1.1) | 89 (18, 517) |
Central Sub-Saharan Africa | 1.7 (0.4, 8.4) | 1459 (371, 6984) |
Eastern Sub-Saharan Africa | 2.4 (0.8, 11.3) | 620 (213, 2921) |
Southern Sub-Saharan Africa | 0.1 (0.04, 0.4) | 149 (57, 571) |
Western Sub-Saharan Africa | 2.8 (0.7, 11.2) | 753 (198, 3075) |
Southeast Asia | 1.3 (0.4, 5.3) | 217 (88, 571) |
East Asia | 0.5 (0.1, 1.7) | 33 (9, 122) |
Oceania | 0.4 (0.03, 0.5) | 5454 (397, 6576) |

Although South Asia has the second largest case count among the regions, the region has the third highest expected incidence rate after sub-Saharan Africa and North Africa/Middle East (

Our model allowed us to quantify the probability that each continent and its sub-regions had a total incidence that fell into one of four incidence categories: <10 cases per 100,000 person-years (“low incidence”), 10-<100 cases per 100,000 person-years (“medium incidence”), 100-<500 cases per 100,000 person-years (“high incidence”) and ≥500 cases per 100,000 person-years (“very high incidence”) (

Percent of posterior sample in each incidence category.

| Region | Low (<10 per 100,000 person-years) | Medium (10-<100 per 100,000 person-years) | High (100-<500 per 100,000 person-years) | Very high (≥500 per 100,000 person-years) |
|---|---|---|---|---|
| | 7 | 87 | 6 | <1 |
| Central Asia | <1 | 77 | 20 | 3 |
| Central Europe | 14 | 83 | 2 | <1 |
| Eastern Europe | 23 | 76 | 1 | <1 |
| | <1 | 27 | 67 | 7 |
| Andean Latin America | <1 | 5 | 37 | 59 |
| Caribbean | 1 | 90 | 9 | <1 |
| Central Latin America | <1 | 40 | 57 | 3 |
| Southern Latin America | <1 | 75 | 25 | <1 |
| Tropical Latin America | <1 | 57 | 40 | 3 |
| | <1 | 2 | 41 | 57 |
| | <1 | <1 | 26 | 74 |
| Central Sub-Saharan Africa | <1 | <1 | 7 | 93 |
| Eastern Sub-Saharan Africa | <1 | <1 | 38 | 62 |
| Southern Sub-Saharan Africa | <1 | 24 | 72 | 4 |
| Western Sub-Saharan Africa | <1 | <1 | 26 | 74 |
| | <1 | 44 | 56 | <1 |
| Southeast Asia | <1 | 12 | 79 | 10 |
| East Asia | 3 | 92 | 4 | <1 |
| Oceania | <1 | <1 | 4 | 96 |
| | <1 | 5 | 91 | 4 |

We developed a meta-regression framework incorporating widely available indicators of economic and social development and the environment to estimate the incidence of typhoid fever across LMICs, as well as the concomitant uncertainty. We identified predictors that explain a substantial amount of heterogeneity in the incidence of typhoid fever, which significantly improved predictions of incidence across all age groups and for school-aged children in particular. This analysis represents substantial progress over existing models for the incidence of typhoid fever by allowing for estimation of variation in typhoid incidence both within and between countries. Given the limited data available, the credible intervals around the model predictions are appropriately large in parts of the world where typhoid surveillance is weak or non-existent. Although considerable uncertainty remains, an additional strength of our analysis is that we calculate the probability that incidence surpasses the criteria for low, medium, high, and very high incidence in each country, which could help guide policy in the face of uncertainty.

Our results provide insights into the likely predictive power of widely available risk factors for typhoid incidence; however, these should not be interpreted as providing inference on the causes of typhoid transmission. First, most of the covariate data are available at the level of the national or sub-national administrative unit, potentially failing to capture the epidemiological dynamics that operate at smaller scales. Second, these indices might be adequate proxies for the causal factors that drive disease incidence in some locations, but not across all LMICs. For instance, although contaminated drinking water has been established as the vehicle of transmission in numerous outbreak investigations, the proportion of the population using piped water is not a particularly helpful metric for predicting typhoid incidence in LMICs compared to other indicators of development and health, e.g. percent of roads paved, the population living in extreme poverty, or the prevalence of stunting and HIV [

Our estimate of the overall incidence of typhoid fever is similar to those published previously, but it reflects considerably greater uncertainty [

Past estimates of the incidence of typhoid fever in different age groups did not account for differential diagnostic procedures amongst age groups (in particular blood volume used for culture confirmation). It is important to consider the degree to which the observed age distribution of disease, particularly the lower incidence often observed in <5 year olds, could be attributed to the relationship between test sensitivity, the amount of blood drawn for diagnosis, and age. By simultaneously estimating the overall incidence, the incidence rate ratios between different age groups, and the observation process, we have allowed the data to drive our estimates of age-specific incidence, rather than relying on assumptions derived from a subset of studies.

However, we did not adjust for all differences in diagnostic procedures across studies. For example, while most studies limited eligible participants to those with a fever of three days or more, at least two of the studies enrolled all febrile children <5 years old regardless of the duration of fever [

After carrying out a rigorous, statistically sound analysis of the available data on typhoid incidence, sizable uncertainty around our incidence estimates remains. The uncertainty in the incidence rate ratio (IRR) among <5 year olds and the covariance in the IRR between the <2 and 2–4 year olds outpace the between- and within-group variance in the other groups (

We strongly emphasize the need to consider the uncertainty in estimated incidence, in addition to the point estimates. Whereas past estimates designated countries into low, medium, and high incidence categories without any discussion of the potential for misclassification, our Bayesian framework allows us to present estimates of the probability that the incidence falls into each of four categories of incidence (

Past estimates indicated that a large number of settings are predicted to fall into the high typhoid incidence category, previously defined as ≥100 cases per 100,000 person-years [

It is clear that the incidence of typhoid fever in Africa is not yet well understood. Out-of-sample validation of the model against data from nine TSAP sites showed that the model has mixed success in predicting incidence for locations outside the estimation sample. The incidence in the <5 year age group at the rural site in Moshi, Tanzania, and in both sites in Madagascar was over-estimated. Model predictions for Kibera, Kenya and Ashanti Akim North, Ghana were more consistent with the observed data; however, previous studies from these two sites were used to estimate the model. This suggests that there was some consistency in the incidence of typhoid over time, which our model was able to capture. Moreover, we draw attention to the potential levels of typhoid fever in regions where typhoid has received limited or no attention in the literature, such as Central Africa, Central Asia, Oceania, and Latin America.

The current study modeled the long-term incidence (technically speaking, the mean annual incidence) of typhoid fever instead of predicting the number of cases in any given year. As we lack high-quality data in almost all LMIC settings, predicting year-to-year variation in the incidence of typhoid fever would be considerably more difficult, and would likely lead to even greater uncertainty in the incidence of typhoid in any given year. Due to limited surveillance capacity for typhoid fever over prolonged periods of time, our estimates of the spatial distribution of typhoid incidence reflect only the spatial distribution of risk factors, rather than temporal processes of spatial contagion suggested by phylogeographic studies [

Furthermore, we do not account for temporal variations in incidence associated with emergent properties of the pathogen or its transmission dynamics, such as the recent outbreaks of typhoid reported across eastern and southern Africa [

Our analysis achieved two main objectives: 1) to identify widely available predictors of typhoid fever incidence; 2) to point out places in the world that have the most uncertainty in typhoid incidence, thereby motivating future studies into the scale and spatiotemporal distribution of typhoid fever in these areas. However, many LMICs have limited capacity for typhoid surveillance. The model we present provides a validated means of predicting typhoid incidence in countries with limited or no typhoid surveillance data based on widely available indicators. Understanding and predicting the burden of typhoid fever is an essential first step in motivating the need for better control measures, including typhoid conjugate vaccines.

(DOCX)

Incidence rates (and 95% confidence intervals) per 100,000 person-years. Incidence rates shown are not adjusted for participation rate, surveillance type, or blood culture sensitivity.

(DOCX)

Incidence rates (and 95% confidence intervals) per 100,000 person-years. Incidence rates shown are not adjusted for participation rate, surveillance type, or blood culture sensitivity.

(DOCX)

(DOCX)

We ran a null model assuming regional hyperpriors for the location-specific random effects (in addition to a global hyperprior) using two schemes to group countries: continents and the Global Burden of Disease (GBD) Regions for 2015. We discovered that regional hyperpriors would not differ significantly from global hyperpriors. Further, we note that there are only data for two locations in the Americas, making a separate hyperprior for this region unnecessary.

(DOCX)

A) Studies that estimate blood culture sensitivity in bone-marrow culture-confirmed patients for typhoid fever [

(EPS)

Distribution of the covariate coefficients for the intercept and each of the slopes are shown. The symbol size corresponds to the proportion of the models that included that covariate. The intercept describes the incidence rate for children 5–14 years of age, slope 1 describes the incidence rate ratio (IRR) between children <2 and 5–14, slope 2 describes IRR between children 2–4 and 5–14, and slope 3 describes the IRR between adults (≥15) and children 5–14 years of age.

(EPS)

Posterior predictions from the model using fixed effects for the predictors and location- and age-specific random effects. The gray markers represent the density of model-predicted posterior distributions of incidence, while the red dots represent the median posterior predicted incidence. The size of the circular markers represents the number of person-years of observation in each study.

(EPS)

To validate the model, we re-estimated the model seven times, each time leaving out the data from three locations to assess how well the model would estimate the incidence in those locations. Leave-3-out validation allowed us to evaluate the consistency with which spike-and-slab priors (our stochastic search variable selection algorithm) would include each variable as a predictor of the intercept or of the intercept and the slopes.
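The splitting step of a leave-3-out scheme can be sketched as below. How the held-out locations were actually chosen is not stated here, so the random, non-overlapping assignment, the function name, and the seed are assumptions made for illustration; refitting the model on each training set is left out.

```python
import random


def leave_k_out_folds(locations, k=3, n_folds=7, seed=0):
    """Return (training, held_out) pairs with k locations held out per fold.
    Held-out sets are disjoint; this is an assumed design, not necessarily
    the one used in the original analysis."""
    shuffled = list(locations)
    if n_folds * k > len(shuffled):
        raise ValueError("not enough locations for the requested folds")
    random.Random(seed).shuffle(shuffled)
    folds = []
    for i in range(n_folds):
        held_out = shuffled[i * k:(i + 1) * k]
        training = [loc for loc in shuffled if loc not in held_out]
        folds.append((training, held_out))
    return folds


# 22 study locations, identified here by hypothetical integer IDs
folds = leave_k_out_folds(range(22), k=3, n_folds=7)
```

With 22 locations, seven disjoint folds of three cover 21 of them, so under this particular scheme one location is always in the training set.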

(EPS)

Validation also allowed us to evaluate the posterior number of predictors selected for the model. Chain 1 was initiated using a model that included all the covariates as predictors of the main effect, while chain 2 was initiated as the null model. The null model was never sampled, implying that the models including at least one predictor better described the data than the null model.

(EPS)

The authors would like to thank Prof. Jeroen Smits and Dr. Rutger Schilpzand of the Global Data Lab at Nijmegen Center for Economics (Radboud Universiteit Nijmegen) for providing assistance with extracting data on subnational socioeconomic and demographic indices.