Mapping climate change’s impact on cholera infection risk in Bangladesh

Several studies have investigated how Vibrio cholerae infection risk changes with increased rainfall, temperature, and water pH levels for coastal Bangladesh, which experiences seasonal surges in cholera infections associated with heavy rainfall events. While coastal environmental conditions are understood to influence V. cholerae propagation within brackish waters and transmission to and within human populations, it remains unknown how changing climate regimes impact the risk for cholera infection throughout Bangladesh. To address this, we developed a random forest species distribution model to predict the occurrence probability of cholera within Bangladesh for 2015 and 2050. The model was trained on cholera incidence data and spatial environmental raster data, then applied to environmental data for the year of training (2015) and for 2050. From our model’s predictions, we generated risk maps for cholera occurrence for 2015 and 2050. Our best-fitting model predicted cholera occurrence given elevation and distance to water. Generally, we find that regions within every district in Bangladesh experience an increase in infection risk from 2015 to 2050. We also find that although high-risk cells cluster predominantly along the coastline in 2015, by 2050 high-risk areas expand from the coast inland, concentrating around surface waters across Bangladesh and reaching all but the northwestern-most district. Mapping the geographic distribution of cholera infections given projected environmental conditions provides a valuable tool for guiding proactive public health policy tailored to areas most at risk of future disease outbreaks.


Introduction
Cholera, a waterborne bacterial disease that causes severe diarrhea and dehydration in humans, remains a significant threat to global health. Despite proposed efforts to reduce global cholera mortality by 90% by 2030 [1], researchers estimate that between 1.3 million and 4 million cholera cases occur annually, with an estimated 21,000 to 143,000 deaths [2].
The etiological agent Vibrio cholerae resides in coastal brackish water and riverine habitats and is typically seeded along coastlines [3]. Among many proposed hosts, vectors, and reservoirs of infection, zooplankton remain the largest known environmental reservoir of V. cholerae [4]. Consumption of seafood or water contaminated with an infective dose of free-floating V. cholerae or V. cholerae-harboring zooplankton causes human infections, while infection may also occur through fecal-oral transmission between human hosts. Such transmission pathways are influenced by environmental conditions in waterbodies that favor bacterial growth [5]. Changes to such waterbodies influence the epidemiology and ecology of V. cholerae by altering bacterial reproduction, transmission, and exposure risks. Climatic conditions, such as rainfall and sea surface temperature, drive epidemiological risk, with warmer, wetter environments increasing the likelihood of disease transmission and infection [6]. Specifically, increases in sea surface temperature and in photosynthetic activity, which raises salinity and pH levels, have been shown to encourage bacterial growth and hence V. cholerae infection risk and endemicity in the Bay of Bengal [7].
However, future climate conditions can also promote increased infection risk in inland populations. Heavy rainfall events (e.g., those associated with the El Niño-Southern Oscillation and summer monsoons) increase cholera infection risk by damaging sanitation systems and contaminating water sources with sewer spillage [5,8,9]. Surface water contaminated with brackish coastal waters may also serve as a source of infection after flooding events [10]. Cholera infection risk may also increase in periods of drought, during which reliance on scarce water sources increases the likelihood of contamination with V. cholerae, especially where water used for hygiene practices is also used for drinking [11].
The role of the environment in shaping disease epidemiology and ecology is not unique to cholera. Recently, researchers have found that air pollution, chemical exposures, population density, and the climate (specifically, the ambient air temperature) influence SARS-CoV-2 transmission dynamics [12-14]. For COVID-19 and cholera alike, curbing widespread infection, mortality, and social disruption requires characterizing the epidemiological risk, which in turn depends on how regional weather, land-use practices, and climate conditions influence disease epidemiology and ecology. Risk mapping, a method of associating risk values with explicit geographic areas, has become an effective tool not only for visualizing the spatial distribution of disease burden (i.e., risk) but also for guiding public health policy to reduce that burden [15,16].
One approach to estimating risk across a landscape is to use non-mechanistic correlative models that predict infection risk given disease incidence data (e.g., disease presence/absence) and environmental covariates. Predicting risk under future environmental and climate scenarios is essential for disease surveillance. While predictive studies cannot predict into the future with complete accuracy and are often subject to the limitations of global climate models that create future environmental variables, risk prediction remains a powerful tool in guiding proactive public health policy for areas most at risk of future disease outbreaks. Such a strategy is particularly critical in endemic areas, as pandemic strains of V. cholerae almost invariably emerge from endemic areas that seed epidemics abroad [6,17].
Several studies have sought to predict risk for cholera infection given climate and weather differences via risk-mapping [10,17-20]. Most risk-mapping studies restrict their analyses to present climatic conditions or limit climate projections to coastal settings only. To our knowledge, no study to date integrates long-term climate projections into risk mapping, especially for inland populations of endemic countries notoriously affected by climate change. Bangladesh is one such country. Not only is it uniquely vulnerable to coastal flooding, due to its geography and population density, but recent research also finds that by 2100, regardless of the global climate model used, Bangladesh will experience an increase in exposure risk to flooding, with lower-lying regions most at risk [21]. Cholera epidemics are also frequently seeded in the Bay of Bengal and emerge with seasonality [17,22]. Given these vulnerabilities, this study seeks to quantify current and future cholera infection risk values across Bangladesh given environmental conditions. Such an analysis is critical to lessening the burden of cholera and to sustaining and redirecting regional public health strategies as needed over the medium and long term.

Materials and methods
Here we construct risk maps for cholera infection for Bangladesh under current and future climate scenarios. We identify spatial environmental variables associated with human cholera infection and cholera incidence data from a detailed country-wide serosurvey study, and employ a fitted random forest model to predict the risk of infection across Bangladesh at a fine spatial resolution [17]. Below, we characterize our analyses in greater detail.

(a) Study area
We used the administrative boundary level 0 provided by the GADM spatial database (v. 3.6) as the extent for our study area (western limit 88.01057˚E, eastern limit 92.67366˚E, southern limit 20.74111˚N, northern limit 26.63407˚N) [48].

(b) Cholera occurrence data
We used a serosurvey dataset described in Azman et al. (2020) that identifies cholera prevalence within Bangladesh for 2015 for our disease presence data [17]. Of the 2930 surveyed individuals, the 639 predicted positive cases constituted our model's presence data while the 2291 predicted negative cases constituted absence data. The approximate coordinate location of each surveyed individual was also used by our model to extract values from our spatial covariates. Notably, multiple presence or absence points may exist at the same coordinate location, as serum samples were often taken from multiple individuals within the same household.
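As an illustration of how a survey coordinate maps to a covariate cell, the following Python sketch converts longitude and latitude to a row and column index. The analysis itself was carried out in R; this is not the authors' code. It assumes the study-area extent and 0.00214˚ cell size quoted in the text, a grid whose first row is the northernmost and whose first column is the westernmost, and entirely hypothetical survey points.

```python
from collections import Counter

# Study-area extent and cell size quoted in the text (decimal degrees).
WEST, EAST, SOUTH, NORTH = 88.01057, 92.67366, 20.74111, 26.63407
RES = 0.00214

def cell_index(lon, lat):
    """Return (row, col) of the cell containing a point, or None if outside.

    Assumption: row 0 is the northernmost row, column 0 the westernmost column.
    """
    if not (WEST <= lon < EAST and SOUTH < lat <= NORTH):
        return None
    col = int((lon - WEST) / RES)
    row = int((NORTH - lat) / RES)
    return row, col

# Hypothetical survey points (lon, lat, seropositive); not the study's data.
survey = [
    (90.4125, 23.8103, 1),
    (90.4125, 23.8103, 0),  # second member of the same household
    (89.5403, 22.8456, 1),
]

# Several individuals can fall in the same cell and share covariate values.
per_cell = Counter(cell_index(lon, lat) for lon, lat, _ in survey)
print(per_cell)
```

Points from the same household resolve to the same cell and therefore receive identical covariate values, which is why duplicated locations appear in the presence and absence data.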

(c) Spatial environmental data
To develop our model, we considered 13 spatial variables known to correlate with V. cholerae occurrence and case incidence and for which data were available for 2015 and 2050 (Table 1). Given our interest in predicting risk for the entirety of Bangladesh, we restricted our variables to those with values available for each cell in the extent used. Moreover, as V. cholerae can be found in semiaquatic and seasonally aquatic settings, we excluded environmental variables describing aquatic environments only [7,23]. All raster datasets were projected to the World Geodetic System 84 (WGS 84) projection, resampled to a 0.00214˚ (approximately 250 m) resolution, and cropped to the extent of our study area using the 'raster' package version 3.4-13 in R (see S1 Text) [24,25].
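The correspondence between 0.00214˚ and roughly 250 m can be checked with a short calculation. The Python sketch below assumes a spherical Earth with a mean radius of 6,371,008.8 m, which is an approximation introduced here and not a value taken from the study. Under that assumption a cell spans roughly 238 m north to south and is somewhat narrower east to west at Bangladesh's latitudes.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # assumed mean Earth radius (spherical approximation)

def cell_size_m(res_deg, lat_deg):
    """Return (east-west, north-south) cell edge lengths in metres."""
    metres_per_degree = math.pi * EARTH_RADIUS_M / 180.0
    north_south = res_deg * metres_per_degree
    # Meridians converge toward the poles, so east-west width shrinks with latitude.
    east_west = north_south * math.cos(math.radians(lat_deg))
    return east_west, north_south

RES_DEG = 0.00214
# Southern and northern limits of the study area quoted in the text.
for lat in (20.74111, 26.63407):
    ew, ns = cell_size_m(RES_DEG, lat)
    print(f"latitude {lat:.2f}: {ew:.0f} m east-west by {ns:.0f} m north-south")
```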

(d) Statistical analyses
We constructed a predictive model estimating cholera incidence in each 250-m raster cell in 2015 as a function of the spatial correlates using the presence-absence algorithm of the 'randomForest' package version 4.6-14 in R [42]. Briefly, the random forest (RF) algorithm uses bootstrap aggregation and resampling to create an ensemble of weakly correlated decision trees that together classify each datapoint [43,44].
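To make the idea of bootstrap aggregation concrete, the Python sketch below bags one-split decision trees ("stumps"), each fitted to a bootstrap resample and a randomly chosen feature, and averages their votes into an occurrence probability. It is a deliberately simplified stand-in and not the 'randomForest' algorithm used in the study, which grows full trees and samples candidate features at every split. The data are synthetic, and the two features are only loosely styled after elevation and distance to water.

```python
import random

def fit_stump(rows, feature):
    """Best single-threshold split on one feature (fewest training errors)."""
    best = None
    for x, _ in rows:
        thr = x[feature]
        for below in (0, 1):  # class predicted when the value is <= thr
            errors = sum((below if xi[feature] <= thr else 1 - below) != yi
                         for xi, yi in rows)
            if best is None or errors < best[0]:
                best = (errors, thr, below)
    return feature, best[1], best[2]

def predict_stump(stump, x):
    feature, thr, below = stump
    return below if x[feature] <= thr else 1 - below

def fit_bagged_stumps(rows, n_trees, rng):
    n_features = len(rows[0][0])
    trees = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # bootstrap resample
        feature = rng.randrange(n_features)        # random feature per tree
        trees.append(fit_stump(sample, feature))
    return trees

def occurrence_probability(trees, x):
    """Share of trees voting 'presence' for point x."""
    return sum(predict_stump(t, x) for t in trees) / len(trees)

# Synthetic (elevation, distance-to-water) points; not the study's data.
rng = random.Random(0)
presence = [((rng.uniform(0, 20), rng.uniform(0, 500)), 1) for _ in range(100)]
absence = [((rng.uniform(10, 100), rng.uniform(300, 5000)), 0) for _ in range(100)]
trees = fit_bagged_stumps(presence + absence, n_trees=25, rng=rng)

p_low = occurrence_probability(trees, (5.0, 100.0))     # low-lying, near water
p_high = occurrence_probability(trees, (80.0, 4000.0))  # elevated, far from water
print(p_low, p_high)
```

Because each tree sees a different resample and a different feature, individual trees disagree on borderline points, and the averaged vote behaves as a probability.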
To select the best-fitting model, we performed a stepwise model selection procedure using the variable importance measures from the RF model calibrated and evaluated with all covariates included. From this model, we selected the highest contributing variable to first create a univariate RF model using 80% of each sample group (i.e., presence and absence) as training data for model calibration and the remaining 20% for model evaluation. We ran the univariate models for 1000 iterations, computing the area under the curve (AUC) statistic from the receiver operating characteristic (ROC) curve generated for each run to create a 95% confidence interval of the AUC. From here, covariates were added individually to this model if the AUC confidence interval generated for the new model over 1000 iterations indicated improved predictive ability over the univariate model. For each iteration of the RF model, we used the algorithm's default settings in R to perform a supervised classification.
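The repeated 80/20 evaluation can be sketched as follows in Python. To stay short and dependency-free, the sketch substitutes a nearest-centroid scorer for the random forest, uses synthetic standardized covariates, and takes the 2.5% and 97.5% quantiles of the 1000 AUC values as the interval; all three are assumptions made for illustration. The text does not spell out the exact rule for deciding that an interval "indicated improved predictive ability", so the sketch only reports the intervals for a one-covariate and a two-covariate model.

```python
import random
import statistics

def auc(pos_scores, neg_scores):
    """AUC as the probability that a presence outscores an absence (ties count 0.5)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

def stratified_split(pos, neg, train_frac, rng):
    """Shuffle each group separately and keep train_frac of each for training."""
    pos, neg = pos[:], neg[:]
    rng.shuffle(pos)
    rng.shuffle(neg)
    kp, kn = int(train_frac * len(pos)), int(train_frac * len(neg))
    return (pos[:kp], neg[:kn]), (pos[kp:], neg[kn:])

def centroid(points, features):
    return [statistics.fmean(p[j] for p in points) for j in features]

def score(x, features, c_pos, c_neg):
    """Stand-in model: higher when x is closer to the presence centroid."""
    d_pos = sum((x[j] - c) ** 2 for j, c in zip(features, c_pos))
    d_neg = sum((x[j] - c) ** 2 for j, c in zip(features, c_neg))
    return d_neg - d_pos

def auc_interval(pos, neg, features, n_iter, rng):
    aucs = []
    for _ in range(n_iter):
        (tr_p, tr_n), (te_p, te_n) = stratified_split(pos, neg, 0.8, rng)
        c_pos, c_neg = centroid(tr_p, features), centroid(tr_n, features)
        aucs.append(auc([score(x, features, c_pos, c_neg) for x in te_p],
                        [score(x, features, c_pos, c_neg) for x in te_n]))
    cuts = statistics.quantiles(aucs, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return cuts[0], statistics.fmean(aucs), cuts[-1]

# Synthetic standardized covariates (two per point); not the study's data.
rng = random.Random(1)
pos = [(rng.gauss(-1.0, 1.0), rng.gauss(-1.0, 1.0)) for _ in range(60)]
neg = [(rng.gauss(0.5, 1.0), rng.gauss(0.5, 1.0)) for _ in range(200)]

uni = auc_interval(pos, neg, features=[0], n_iter=1000, rng=rng)
bi = auc_interval(pos, neg, features=[0, 1], n_iter=1000, rng=rng)
print("univariate AUC (2.5%, mean, 97.5%):", uni)
print("bivariate  AUC (2.5%, mean, 97.5%):", bi)
```

Splitting presences and absences separately keeps the class ratio the same in the training and evaluation sets, mirroring the "80% of each sample group" design described above.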
Once the relevant variables were identified, we ran the best-fitting RF model from 2015 1000 times, training and evaluating the model of each iteration with the same 80% sample or 20% sample of presence-absence data, respectively. With each iteration, the model fitted to the 2015 data predicted cholera occurrence probabilities for 2050 for each 250-m cell. From these predictions, we constructed a mean, 2.5%-, and 97.5%-quantile rasterized map for each year by determining the mean, 2.5%-quantile, and 97.5%-quantile values for each cell within Bangladesh. Using the 'arcgisbinding' package in R, we interfaced ArcGIS Pro version 2.6.3 with R to transfer the raster maps generated in R to ArcGIS to ensure our rasters were of the appropriate resolution and extent [45,46]. All code used in the analysis is publicly available on GitHub (github.com/sophiakruger/cholera_risk) and released under the GNU General Public License v3 [47].
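The per-cell summary across iterations amounts to taking a mean and two quantiles down a stack of prediction layers. The Python sketch below assumes each iteration's predictions are stored as a flat list with one value per cell and uses linear interpolation between order statistics for the quantiles; both choices, and the simulated values, are illustrative and not taken from the published code.

```python
import random
import statistics

def summarize_cells(stack):
    """Per-cell mean, 2.5% and 97.5% quantiles.

    stack[i][c] is the predicted occurrence probability for cell c in iteration i.
    """
    n_cells = len(stack[0])
    mean, q025, q975 = [], [], []
    for c in range(n_cells):
        values = [layer[c] for layer in stack]
        # n=40 gives cut points at 2.5%, 5%, ..., 97.5%.
        cuts = statistics.quantiles(values, n=40, method="inclusive")
        mean.append(statistics.fmean(values))
        q025.append(cuts[0])
        q975.append(cuts[-1])
    return mean, q025, q975

# Hypothetical predictions: 1000 iterations over 4 cells; not the study's output.
rng = random.Random(0)
centres = [0.1, 0.4, 0.6, 0.9]
stack = [[min(1.0, max(0.0, rng.gauss(m, 0.05))) for m in centres]
         for _ in range(1000)]

mean_map, lower_map, upper_map = summarize_cells(stack)
for c, (lo, m, hi) in enumerate(zip(lower_map, mean_map, upper_map)):
    print(f"cell {c}: mean {m:.3f}, 2.5% {lo:.3f}, 97.5% {hi:.3f}")
```

The lower-quantile layer corresponds to the most conservative map and the upper-quantile layer to the least conservative one.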

Results

(a) Drivers of cholera infection risk
Our random forest classification model including all predictors ("the full model") showed elevation as the most prominent predictor (S1 Table). Thus, we began our stepwise model selection from a model with elevation as the sole predictor. The random forest classification model including elevation and distance to water as predictors increased the model's predictive power compared to the full model and outperformed all other predictors that were added to the univariate model (Table 2). Model performance invariably declined when additional variables were added one at a time to the bivariate model (S2 Table). Generally, we find cholera infection risk increased with lower elevation and a shorter distance to the nearest surface water body (S1 and S2 Figs).

Table 1 (excerpt). Distance (m) from nearest surface water body; meters (0, 63750); [33], original creation based on the surface water data from [27]. The anticipated number of persons within each square kilometer; [34,35].

PLOS GLOBAL PUBLIC HEALTH

(b) Spatial predictions of cholera infection risk
We find that the distribution of cholera infection risk changes over time, with coastal and inland Bangladesh projected to experience an increase in cholera infection occurrence probability from 2015 to 2050 (Fig 1(A)-1(C)). Even under the most conservative estimate for 2050, we find risk increases along tributaries, running inland from the coast (Fig 1(B)). In 2015, cells with an average occurrence probability of 0.50 or greater cluster tightly along the coast of the Khulna and Barisal districts and are more widely distributed inland, though many follow the Padma River north into the district of Dhaka (Fig 2). Yet by 2050, clusters of cells with an occurrence probability of 0.50 or greater are predicted to increase inland in the districts of Khulna, Barisal, Chittagong, Rajshahi, Dhaka, and Sylhet (Fig 2). Notably, while cells with an occurrence probability of 0.50 or greater cluster around major river systems along district boundaries in 2015, by 2050 these risk clusters expand inland latitudinally (Fig 2).
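The 0.50 cut-off used to describe high-risk cells can be applied to a pair of mean-probability maps as in the Python sketch below. The two 3 by 3 grids are hypothetical toy values, with None marking cells outside the study area; they are not the study's rasters.

```python
def high_risk_mask(grid, threshold=0.50):
    """True where occurrence probability >= threshold; None stays None."""
    return [[None if p is None else p >= threshold for p in row] for row in grid]

def high_risk_fraction(grid, threshold=0.50):
    """Share of valid cells at or above the threshold."""
    cells = [p for row in grid for p in row if p is not None]
    return sum(p >= threshold for p in cells) / len(cells)

# Hypothetical mean occurrence probabilities; the bottom row stands in for the coast.
risk_2015 = [[0.10, 0.20, None],
             [0.30, 0.45, 0.20],
             [0.60, 0.70, 0.55]]
risk_2050 = [[0.20, 0.35, None],
             [0.55, 0.65, 0.40],
             [0.70, 0.80, 0.60]]

mask_2015 = high_risk_mask(risk_2015)
mask_2050 = high_risk_mask(risk_2050)

# Cells that cross the threshold between the two years.
newly_high = sum(
    1
    for row_15, row_50 in zip(mask_2015, mask_2050)
    for a, b in zip(row_15, row_50)
    if a is False and b is True
)

print(high_risk_fraction(risk_2015), high_risk_fraction(risk_2050), newly_high)
```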

Discussion
In this study, we predicted how changing climatic and land-use patterns can alter the risk for cholera infection at very fine spatial scales for the entirety of Bangladesh between the years 2015 and 2050. Using a species distribution modelling approach, we found areas with low elevation and shorter distances to surface water to be at highest risk. Areas at low elevations have greater potential for inundation from future rainfall events, which may compromise sanitation systems and increase risk for the spread of waterborne pathogens. In addition, projected increases in coastal vulnerability to V. cholerae and more frequent heavy rainfall events will likely increase the presence of V. cholerae in surface waters at these elevations [3,49]. Low elevation areas are also likely at greater risk for infection than those of higher elevation given human settlement patterns on low-lying arable land, along rivers and other surface water. To the extent that high population density correlates with increased risk for infection, whether through increased contact with positive cases, sanitation system strain, or under-development and poverty, these areas therefore exhibit greater potential for human-to-human cholera spread [38,50-52]. We find that although cells of high risk (designated as having a cholera case occurrence probability of 0.50 or higher) cluster predominantly along the coastline in 2015, by 2050 high-risk areas expand from the coast to inland Bangladesh, with all but the northwestern district of Rangpur seeing increased clusters around surface water. The overall increased risk for infection in inland Bangladesh indicates that coastal vulnerability to infection translates to increased inland infection risk. This is worrying given the predicted doubling of ENSO events in the future, which will only promote V. cholerae coastal suitability and increase coastal cholera incidence [3,53].
Cholera infection risk mapping studies that restrict their analyses to Bangladesh remain limited. Previous risk-mapping studies that quantified cholera infection risk on a global scale may account for global trends in the distribution of cholera incidence and its etiological agent; however, these trends may not accurately reflect the factors shaping the distribution of infection risk at the country level. For example, Escobar et al. (2015) generated a global suitability map for cholera infection, using the environmental suitability for V. cholerae as a proxy for cholera infection risk, but restricted those predictions to the coastline, globally, leaving inland risk values for cholera-endemic countries unaccounted for [3]. Recently, Azman and colleagues attempted to fill this gap by restricting their analysis to Bangladesh, quantifying relative infection risks at the grid-cell level; however, this analysis was restricted to present environmental conditions only [17]. Notably, our study addresses both issues by expanding the spatial scope of predictions under a future climate scenario to include inland Bangladesh, where approximately 70% of the population lives [54]. Given ongoing efforts to reduce global cholera mortality by 90% by 2030, our study offers valuable insight into projected high-risk areas in need of continued, if not additional, public health intervention measures to reduce the burden of disease in the coming decades.
Even in the presence of infrastructural and public health advances, predictive risk mapping studies for cholera infection risk will continue to be essential in reducing the disease burden. This is because such predictions characterize a baseline set of expectations about the distribution of infection risk if future conditions resemble current circumstances. Moreover, novel cholera strains are expected to continue to arise in Bengali waters, due in part to cholera biology in the environmental reservoir. For instance, while bacteriophage niche adaptation has allowed bacteriophages to prey on V. cholerae infecting zooplankton in fresh and estuary water, coevolution enables V. cholerae to resist bacteriophage predation [55,56]. Additionally, phages can facilitate the evolution of specific toxigenic V. cholerae biotypes through horizontal transfer of genes associated with virulence or enhanced environmental fitness [57]. This suggests that aquatic interactions between bacteriophages and strains of V. cholerae can not only select for more environmentally persistent strains, but also more virulent strains with the capacity to seed epidemics.
Climate change is likely to affect not only the distribution of waterborne diseases inland, but also socioeconomic conditions and infrastructural integrity. Thus, further modelling studies should seek to include covariates of the latter in combination with climatic variables to predict infection risk. Such models should also consider the potential for climate-associated human migration inland from vulnerable coastal regions to influence inland risk. As we developed our model, we initially found the distance from each grid cell to the coast of Bangladesh to be an important variable in predicting cholera infection occurrence, with closer distances experiencing higher cholera occurrence probabilities. However, the lack of coastline projections for 2050 prevented us from including that variable in our model. Therefore, accurate coastline data under future climate scenarios are still needed to support robust predictive studies of disease occurrence. In addition to supporting the need for accurate sociological variable data (which are difficult to project decades into the future), remote sensing data could fill this need and in turn be useful in training models that seek to consider the interplay between human hosts and their environment on the risk for cholera infection.
As with our study, to generate valid risk predictions future models must also rely on robust case incidence data that reflect actual disease prevalence. Risk predictions from correlative models may also improve with added model complexity, but potentially at the expense of explanatory power. In future infection risk forecasting studies for cholera, researchers should consider the use of hierarchical spatial models or neural networks in spatial distribution modelling, which have been shown to generate robust predictions in emerging infectious disease studies [58-61].
Mechanistic models of transmission are also needed. Species distribution models (SDMs), like that of this study, represent a key first step in developing such models, but may not include the effect of climate-sensitive ecological processes on model predictions [62]. Therefore, in the context of global change, modelling the spatial distribution of risk for cholera infection is best done using process-based models that will use our model's infection probabilities, consider the correlative components of our model, and incorporate the ecological mechanisms influencing the distribution of cholera and human transmission. Nevertheless, our study holds importance in providing robust inland climate-associated cholera infection risk predictions that can inform preventive Bengali public health strategies.