Cholera forecast for Dhaka, Bangladesh, with the 2015-2016 El Niño: Lessons learned

A substantial body of work supports a teleconnection between the El Niño-Southern Oscillation (ENSO) and cholera incidence in Bangladesh. In particular, high positive anomalies during the winter (Dec-Feb) in sea surface temperatures (SST) in the tropical Pacific have been shown to exacerbate the seasonal outbreak of cholera following the monsoons from August to November. Climate studies have indicated a role of regional precipitation over Bangladesh in mediating this long-distance effect. Motivated by this previous evidence, we took advantage of the strong 2015–2016 El Niño event to evaluate the predictability of cholera dynamics for the city in recent times based on two transmission models that incorporate SST anomalies and are fitted to the earlier surveillance records starting in 1995. We implemented a mechanistic temporal model that incorporates both epidemiological processes and the effect of ENSO, as well as a previously published statistical model that resolves space at the level of districts (thanas). Prediction accuracy was evaluated with “out-of-fit” data from the same surveillance efforts (post 2008 and 2010 for the two models respectively), by comparing the total number of cholera cases observed for the season to those predicted by model simulations eight to twelve months ahead, starting in January each year. Although forecasts were accurate for the low cholera risk observed for the years preceding the 2015–2016 El Niño, the models also predicted a high probability of observing a large outbreak in fall 2016. Observed cholera cases up to Oct 2016 did not show evidence of an anomalous season. We discuss these predictions in the context of regional and local climate conditions, which show that despite positive regional rainfall anomalies, rainfall and inundation in Dhaka remained low. Possible explanations for these patterns are given together with future implications for cholera dynamics and directions to improve their prediction for the city.


Introduction
Climate variability, specifically the El Niño Southern Oscillation (ENSO), has been shown to influence the year-to-year variation of seasonal outbreaks of cholera in Bangladesh [1][2][3][4][5][6]. In particular, this interannual variation of the disease has been shown to be positively associated with Sea Surface Temperatures in the Tropical Pacific in the region that warms during El Niños [2]. Regional rainfall was identified as one of the local drivers that mediates this long-range effect of ENSO [7], with evidence for an influence of high monsoon rains and river discharge on disease levels in Bangladesh (e.g. [8][9][10][11][12][13][14]).
Both statistical analyses and process-based mathematical models for the population dynamics of the disease have been applied to the role of ENSO in rural areas of Bangladesh (e.g. [2,5]). For the city of Dhaka itself, the capital of Bangladesh, application of mechanistic models incorporating epidemiological processes is still lacking, although a more phenomenological approach has been developed to analyze the spatio-temporal transmission dynamics at the level of 'thanas' or districts within the city [15]. The statistical model of Reiner et al. (2012) specifically identified the existence of two distinct spatial regions within Dhaka based on the dynamics of the disease: the truly urban core of the city and its more rural periphery, with the former exhibiting much higher disease incidence and a stronger response to ENSO. This distinction between the core and periphery of the city was also found to be relevant for rotavirus, suggesting that the causal links between climate and disease are a more general feature of diarrheal diseases in this region, independent of their particular transmission pathways [16]. A similar conclusion is supported by comparisons between cholera and shigellosis regarding their association with flooding and SST in the Pacific [17].  The arrows denote rates of flow among these classes. The force of infection (λ) includes three components: a long-term trend, secondary transmission that depends on the levels of infection in the population, and primary transmission at a constant rate from an environmental reservoir of the pathogen. The transmission coefficient (or rate) β in secondary transmission incorporates seasonality, interannual variation as a function of the ENSO index, and environmental noise. B) Statistical spatio-temporal model. Districts of the city known as 'thanas' are grouped into two main regions (depicted in orange for the core and in blue for the periphery). Thanas within the same group follow the same dynamical rules in terms of transitions between cholera levels or states from one month to the next. Three states are considered and used to discretize the case data: no cholera (0), low cholera (1), or high cholera (2) as indicated by the different color intensity. The probability of transition between states from one month to the next depends on the season, the maximum state of neighboring districts, and the climate covariate (ENSO). For details on the models, see Methods and Supplementary Information.
Recently, new statistical approaches for inferring model parameters based on time series data have allowed flexible representations of mechanisms that more closely represent epidemiology in models that also incorporate both measurement and process noise [18,19]. Although these process-based models have been applied to glean insight into epidemiological processes in a number of infectious diseases from retrospective data, they have been used only in a few exceptions to generate and evaluate forecasts (e.g. [20,21]). Motivated by the large El Niño of 2015-2016, we examine here the ability of a process-based (mechanistic) model to predict cholera incidence in Dhaka by combining epidemiology and climate variability ( Figure 1A). We also compare the results to those obtained with the previously published statistical model for cholera in the city ( Figure 1B). The value of the SST anomaly observed in the region of the Pacific known as Nino3.4 for this January (2.60) is comparable in magnitude to the one observed for the large El Niño event of 1998 (2.56), when Dhaka suffered the largest cholera outbreak in the last 20 years and one of the worst floods in its history affecting more than 50% of the city's area [22]. We focus on cholera reported cases from the core of the city for the process-based model, based on the heightened sensitivity to ENSO in this region, the low number of cases in the periphery and its low contribution to the force of infection in the core [15,23]. The monthly surveillance data are divided into a training set from January 1995 to December 2010, used to fit the mechanistic model, and an 'out-of-fit' set from January 2011 to December 2015, used to evaluate the predictability of the model. For the statistical model, the out-if-fit set starts in 2009, based on the published fit [15]. We specifically ask whether these two models can be used as an early-warning system, able to forecast cholera outbreaks following the monsoon season, based on observed January Niño3.4 values and on previous cholera cases. We then generate a forecast for the incoming transmission season in August-December 2016 and discuss implications of the findings and possible limitations of the approach given the event variability of El Niño and of its connections to the regional precipitation over Bangladesh. We discuss what will be learnt later this year when the actual observations for the upcoming cholera season become available, whether the predictions fail or succeed. At the moment, careful monitoring of the development of the monsoon season is warranted in informing cholera predictions at a shorter lead time.

Results
Normalized incidence data exhibit strong interannual variability ( Figure 2). The seasonality in cholera incidence is bimodal with two peaks per year: in spring and in autumn, preceding and following the monsoons respectively. For the mechanistic model, the observed and the simulated data exhibit yearly peaks and troughs that fall in the same months for most years, capturing the bi-modal seasonal patterns fairly well ( Figure 2A). Such agreement is encouraging given that these simulations are not one time-step ahead predictions but instead, 15-yr long trajectories starting from estimated initial conditions. It demonstrates the hindcast performance of the model and suggests its potential for forecasting.
The predicted yearly data for the period 2011-2016 were generated by updating the initial states of the epidemiological variables for January each year, as well as the estimated values of the parameters for each year starting in 2011. Model predictability was evaluated for the years between 2011 and 2015. Because the model is stochastic (incorporating noise in the dynamics), we generated both the median value and the 10-90% confidence intervals of 1000 predictions for each month. For the most part, the median prediction appears close to the data, with the observed cases falling within the uncertainty of the confidence intervals, suggesting that cholera incidence is forecasted accurately, including the decline observed in the data until 2015 and the low levels of incidence characteristic of the past 5 years. We note a slight over-prediction of some of the fall seasons (2012 and 2015) which exhibited almost no cases those years in a somewhat uncharacteristic seasonal pattern. Similarly, simulations of the statistical model show that the predicted cases are very consistent with the observed data within the training set as already shown in Reiner et al. , the model is simulated from 1995 forward. Thus, the comparison to data is not based on the typical next-step (next-month) prediction for which it is somewhat trivial for most models to capture the fitted data. These simulations span more than a decade. The second period (light gray background) shows predictions for out-of-fit years, with simulations starting in January and spanning the whole year. We refer to these predictions that cover windows of time in the past, as hindcasts. Finally, the last period is for the current year and constitutes a true forecast (for the upcoming fall season).
not used to fit the model, which has become available since that study. For the most part, the predictions produce the observed lower levels of cholera, with an overprediction of the same fall peaks for 2012 and 2015.
Here the overprediction is larger possibly because the model does not incorporate a long-term trend.
To more formally evaluate predictability, we considered the probability of exceeding a threshold number of cases for the whole fall season (Aug -Dec), with the threshold based on the distribution of outbreak size in the data. We are interested here in evaluating the ability of the model to predict the occurrence vs. the lack of 'large' outbreaks, given their relevance to public health and in light of the high El Niño index reported for January 2016. In Table 1, the probability of cholera incidence exceeding the 50th, 75th and 95th percentiles for 2011-2015 are reported, with these different thresholds representing outbreaks of increasingly higher magnitude. Results from the mechanistic model indicate the absence of a large outbreak, consistent with the observed data for that period, with none of the years exhibiting a probability greater than 50% of exceeding the thresholds (Table 1). In other words, the model would have accurately predicted the low risk of cholera in the past five years. Likewise, the statistical model shows consistent results for the most part, with a probability higher than 50% of being above the median for only one of the years (2015). Otherwise for the 75% and 95% thresholds, the probabilities of exceeding them are consistently low. For the coming fall season of 2016, the same forecast analysis reveals probabilities of 87% and 92% for the mechanistic and statistical models respectively, of a large outbreak (defined as surpassing the 75th quantile of outbreak size). For an extreme outbreak (defined as surpassing the 95th quantile, and therefore comparable to the cholera event of 1998), these probabilities are 50% and 27% for the two models. We note that the latter model has a tendency to underpredict the actual size of large outbreaks, evident for 1998. As explained in Reiner et al. (2012), this follows from the classification into three discrete levels which tends to place all thanas with a high number of cases in the same category. This effectively reduces the tail of the distribution of monthly cases. We return to this consideration in the Discussion.
We also consider the evolution of the monsoon over Bangladesh up to this time in June, based on the Climate Prediction Center Unified (CPC UNI) global daily precipitation data set [24,25], which covers the period 1979-present and is updated daily in real time. We compare the evolution of the 2016 monsoon with previous high and low flooding years, all following El Niño events. Five high flooding (1987,1988,1998,2004,2007) and five low flooding (1983,1992,1995,2003,2005) years were selected, and the average difference in June rainfall relative to the 1979-2015 mean is shown in Figure 3. We find that for the high flooding years, June rainfall was higher than normal across much of Bangladesh ( Figure 3A), while in the low flooding years, June rainfall was below normal across the entire region ( Figure 3B). Signals from the current year are more mixed, with above normal rainfall across the Northwest India and northern Bangladesh and below normal rainfall across the center of the country ( Figure 3C). At the time of writing, the Bangladesh Flood Warning Table 1: Hindcasts for the indicated years and forecast for 2016, for the post-monsoon (Aug-Dec) season of cholera. The distribution of observed cases for this same post-monsoon period for the training data used to fit the models was used to estimate the values of the 50th (the median), 75th and 95th quantiles. These values are used as thresholds to define outbreaks of increasing size: a season that exceeds the median is considered anomalous, one that exceeds the 75% level is considered a large outbreak, and one that exceeds the 95% level, an extreme outbreak. The average observed cases for each year are shown next with an indication of whether they exceed the threshold (yes, "outbreak") or not (no, "no outbreak"). The proportion of 1000 simulations that fall above each threshold level is reported as a probability, and a probability > 50% is interpreted as a prediction of an outbreak, specified in the last column of each model.

Mechanistic temporal model
Statistical spatio-temporal model

Discussion
The El Niño Southern Oscillation (ENSO) acts as the main driver of interannual climate variability worldwide. As such, the associated anomalies in Sea Surface Temperatures in the Tropical Pacific provide a basis for predicting the interannual variability of a number of phenomena around the globe, including that of climate-sensitive infectious diseases, such as water-borne and vector-borne infections. Here, we implemented a mechanistic transmission model that incorporates both epidemiological processes and the effect of ENSO, to forecast cholera risk in the city of Dhaka for the upcoming fall season. The predictability of this model was validated with retrospective data, and our main results indicate an 87% probability to observe a large outbreak of cholera during Aug-December 2016, comparable to that of the previous large El Niño in 1998. Likewise, the previously published statistical model for cholera in Dhaka indicates a 92% probability of a large outbreak this coming fall. These results provide a call for a heightened alert and preparedness of the public health system in the city, concurrent with the monitoring of the evolution of the monsoon season in Bangladesh.
The predictive ability of the models was evaluated based on the incidence of the last five years. Given the uniformly low levels of cholera during that period, this assessment concerns only true or false negatives, that is the ability of the model to predict the lack, and not the occurrence, of an outbreak. Although we correctly predict the lack of an outbreak and the overall low risk of cholera for that period, we overpredict the actual number of monthly cases during the fall season for two of the years (2012 and 2015) whose fall peaks were almost completely absent. This pattern may result from the stochasticity inherent to low levels of the disease, and in part be accounted for the long-term trend in combination with the lack of warm anomalies in ENSO conditions. The statistical model overpredicts these recent low seasons further, possibly because it does not incorporate such a trend. We note that in general, this model also tends to underpredict the large events (like 1998) because of the transformation from actual cases to three discrete levels which reduces the tail of the distribution of cases [15]. A detailed analysis of this property of the model showed that it can be corrected by making a probability higher than 25% (rather than 50%) an indication of outbreak risk (for the 75% threshold). We note that this correction does not modify the outcome for 2016, and that the probability for the 75% quantile is 97%, well above 25%, a further indication that the model predicts a high risk of a large outbreak. Our predictions are based on the magnitude of the current El Niño condition, which is similar to that observed for 1998 when Dhaka experienced the largest cholera outbreak in the last 20 years. Given that the interannual variability associated with ENSO is superimposed on the on-going climate change, the effect of El Niño on climate-sensitive diseases could be aggravated. We acknowledge, however, that variation among El Niño events also calls for caution regarding predictions that involve long-distance connections, such as ours, since such an event is not fully characterized by the magnitude of the SST anomaly alone, and ultimately, the regional changes in the climate variability of Bangladesh are what matters to the population dynamics of the disease. The ENSO index permits however the generation of forecasts with a relatively long lead time, of 8 to 12 months, not possible for any local environmental factor. This early warning can then be followed by carefully monitoring of the monsoon especially in July and August.
Given that some strong El Niño events have resulted in low rainfall and low flooding in Bangladesh in the past, and the somewhat anomalous impact of the 2015-2016 event on rainfall in other regions (such as California), it is important to recall that no ENSO index can fully capture the impact of a given event. Significant difference in impact over Bangladesh from events characterized by similar index values have occurred in the past. These observations are a reason for caution, indicating the possibility of a false positive forecast for cholera if rainfall were to remain below average for the whole season.
We note however that the value of producing the forecast goes beyond its actual success since we can learn from comparing it to the actual incidence of cholera, once the data become available, whether or not a large outbreak occurs. Specifically, the lack of a large cholera outbreak could follow from three possible reasons: First, the current El Niño could differ from previous ones such as 1998 in its effects over Bangladesh. In this case, future research should seek further features of the evolution of such an event that can be useful to distinguish effects on cholera in the region, although this pertains to the challenging connections between temperatures in the Pacific and the monsoons. Still, the discrepancy of model predictions and observations would be an indication of the importance of concatenating an early warning with a long lead time, to a shorter prediction based on the local environment. Second, a discrepancy with our forecast could result from increased and effective intervention. In this case, it would be valuable to document the level of intervention in comparison for example to that of 1998. Model outputs, besides their potential contribution to preparedness, can serve as a point of reference to evaluate the success of intervention [20]. Finally, the low incidence observed in the last 5 years, especially for the fall peak, could be a consequence of changes in the environmental and/or socio-economic conditions in the city, which could alter the seasonality and overall risk of the disease. In this case, even in the presence of anomalous rains, one would not observe as strong a response to climate forcing as those in the past. A related intriguing possibility is that the prolonged hiatus in large El Niños has contributed to the sustained window of many successive years with low incidence, which in turn have diminished the reservoir of the bacterium Vibrio cholerae in its pathogenic form. In this case, the system may be poised at environmental conditions that make it unable to respond as strongly as in the past to a large climate event. Observations of how both the disease and the monsoons evolve this year can tell us which of these situations apply.
In brief, we provide the forecasts for the upcoming season as a warning, based on the past accuracy of the models and on the lead time afforded by observations of SST in the Tropical Pacific. Public health preparedness is called for, to be complemented by observation of the monsoon in Bangladesh in July-August, and later still, by observations of the cholera cases themselves as the transmission season develops. With the more sophisticated ability of the modeling community to fit transmission models to retrospective records, it is now of interest to take up the challenge of forecasting and learn from 'real time' implementations.

Methods
The original cholera data consist of daily cases from 1995 to 2015 obtained from the ongoing surveillance program by the International Center for Diarrheal Disease Research, Bangladesh (icddr,b), in which a systematic subsample of all patients visiting the hospital, which serves as the main treatment center for the greater Dhaka city area, is tested for cholera. Reported cases were added by month, per 'thana' or administrative subdivision. For each thana, the population was computed by interpolating the three decadal censuses beginning in year 1991, 2001 and 2011. Cases were then aggregated over the thanas comprising two regions, core and the periphery of the city, according to the partition proposed by Reiner et al. (2012).
The process-based model follows the temporal changes in the number of cases in the core of the city. It consists of a Susceptible-Infected-Recovered-Susceptible (SIRS) compartmental formulation in which the population is subdivided into the following classes: S for susceptible individuals, I for infected and infectious individuals, and R for recovered individuals who have acquired immunity to the disease. Acquired immunity is temporary and wanes at rate φ as individuals return to the S class. The recovery rate (γ) of infected individuals and other parameters are estimated using a recently developed likelihood-based inference method (see Supplementary Information for model details). The rate of transmission per susceptible individual (or force of infection λ) contains three components: a long-term trend, a representation of secondary transmission (depending on the level of infected individuals in the population), and primary transmission (at a constant rate from environmental reservoir). Secondary transmission depends on the proportion of the population that is infected (I/N ), with the coefficient or transmission rate β including a seasonal component, the interannual effect of ENSO, and environmental noise ( figure 1, figure S1). The model further incorporates measurement error with a reporting rate that is also estimated and accounts for under-reporting. For a full description of the model, see Supplementary Information.
The statistical model follows the spatio-temporal transmission dynamics of the disease at the spatial resolution of thanas or districts in the city. It categorizes the monthly thana-level case data into three states: 'no cholera'; 'low cholera'; and 'high cholera' as described in Reiner et al. (2012). The model estimates the month-to-month transitions of each thana as a set of three transition probabilities (conditioned on the value of the current state only). These probabilities are functions of temporal and spatial covariates as well as of the maximum state of the neighboring thanas ( Figure 1). The parameter values used for these predictions were fitted to data from 1995-2008 in Reiner et al. (2012). For complete details on the model formulation and fitting, including the categorization criterion, see Reiner et al. (2012).

Process-based model and parameter estimation
The population dynamics of cholera are represented by an SIRS (Susceptible-Infected-Recovered-Susceptible) model (equation 1) in which the population in subdivided into the following classes: S for naive individuals who are susceptible to disease, I for infected and infectious individuals, and R for those who have recovered and have acquired immunity to the disease. This representation allows for temporary immunity with individuals in R eventually returning to S at a given rate φ. The set of equations is given by: where P represents the population size, and S, I and R, the number of individuals in those respective classes. The force of infection λ denotes the rate of infection per individual susceptible in the population, which is given by: where the first term implements a long-term trend starting at the initial time t 0 , and in the second term, ω represents a fixed background infection due to an environmental reservoir and β refers to the transmission rate per infection: The expression for the transmission rate β includes a seasonal component implemented through six splines s j with six coefficients b j determining the weight of each component, allowing for a flexible representation of the seasonality ( Figure S1A). Climate forcing by ENSO is incorporated by considering the SST anomalies in the Niño3.4 region of the subtropical Pacific for the month of January (http://www.cpc.ncep.noaa.gov/data/ indices/sstoi.indices). Specifically, this covariate is included in the terms relevant to the fall months (fourth and fifth splines), based on previous studies of the observed correlations between ENSO and cholera cases in this region [2,3]. The transmission rate also includes a stochastic component (through a Gamma distribution Γ) to represent environmental or other sources of variation not accounted for by seasonality or ENSO.
Following Reiner et al. (2012), the El Niño index is integrated into a sigmoidal function, where EN SO Jan refers to the anomaly of the ENSO reported for January at time t, normalized between -1 and 1 ( Figure  S1B). This functional form allows for nonlinear responses to ENSO and is given by the following expression: We formally relate the model to data by assuming that only a fraction ρ of new infections are detected by the surveillance methods (ρ is estimated from data along with other model parameters, see below), and model the data as negative binomially distributed around these new infections, as follows. From equation 1, the rate of new infections is λS, which gives the total new infections ∆I k over a time interval (t k−1 , t k ) as ∆I k = ρ and variance a + a 2 b, and σ 2 obs denotes the variance parameter associated with detection uncertainty. To evaluate the forecasting abilities of the model, we first fitted the model to the data between 1995-2010. Specifically, we carried out a likelihood-based inference via an iterated particle filtering method to estimate the parameters and initial conditions, to obtain the MLE (Maximum Likelihood Estimates) [18,19,26]. Because we wish to reproduce as close as possible the conditions under which one would be generating the forecasts, we then updated the estimated parameters by extending the fit of the model to the 'new' data that would have become available each year. This resulted in refining the fit of the model over a moving window of 12 months, starting from January 2011. The updated parameters and initial values of the state variables each January were used to simulate forward for the next year.
To compare the predictions to the data for the years between 2011 and 2015, we first defined an outbreak as a season in which the total number of cases exceeds a given threshold. The threshold itself was determined based on the empirical distribution of the average number of cases between August and December for the period 1995-2010. Based on the predictions generated by 1000 simulations, we computed the predicted probability of surpassing the given threshold (top 50%, 25% and 5% of all of the data). These respective probabilities can be interpreted as an outbreak, large outbreak and extreme outbreak respectively.