The authors have declared that no competing interests exist.

The world is rapidly becoming urban, with the global population living in cities projected to double by 2050. This increase in urbanization poses new challenges for the spread and control of communicable diseases such as malaria. In particular, urban environments create highly heterogeneous socio-economic and environmental conditions that can affect the transmission of vector-borne diseases dependent on human water storage and waste water management. Interestingly, India, as opposed to Africa, harbors a mosquito vector,

Statistical analyses and a phenomenological transmission model are applied to an extensive spatio-temporal dataset on cases of

Climate forcing and socio-economic heterogeneity act synergistically at local scales on the population dynamics of urban malaria in this city. The stationarity of malaria risk patterns provides a basis for more targeted intervention, such as vector control, based on transmission ‘hotspots’. This is especially relevant for

Urbanization and environmental change are the main driving forces of ecological and social change around the globe, specifically in developing countries and for human health. Cities in developing countries exhibit rapid and unplanned urbanization which creates heterogeneous environmental and socio-economic conditions, which can in turn lead to different risks of infection. Here we address the role of urban spatial heterogeneity in infection risk by

Addressing health problems associated with urban growth will be one of the major challenges of the 21st century, especially for the developing world [

Historically, urbanization has led to economic and social transformations associated with profound improvements in sanitation and hygiene [

The most common hypotheses for the persistence of malaria in cities include spatial variation in: 1) environmental conditions (relative humidity, temperature, precipitation), land use, and stored water, which create a favorable environment for Anopheles breeding in cities [

Importantly, these considerations are focused on Africa where endemic malaria remains a predominantly rural problem, because the main mosquito vectors are themselves rural, and in cities, largely peri-urban [

Here, we describe the spatial pattern of urban

We take advantage of a highly disaggregated dataset of monthly malaria cases collected by the Municipal Corporation of the city of Ahmedabad, the largest city in the state of Gujarat in northwest India (

Location of study area (A), and temporal patterns of incidence of

Apart from decadal census data, annual population data were provided by the Ahmedabad Municipal Corporation to approximate the population of each ward. Socio-economic data were obtained from the District Census Handbook of the concerned district for the year 2001 from the Directorate of Census Operations, Gujarat. Monthly time series (from 2002 to 2014) for mean temperature, mean rainfall and relative humidity at 8 am are those from the meteorological station of the city of Ahmedabad (and were provided by the Indian Institute of Technology in Gandhinagar).

In order to investigate the existence of a spatial pattern in malaria incidence within the city of Ahmedabad, we performed a series of statistical analyses to address whether malaria risk varied within the city and what factors explained this variation. First, we analyzed the spatial and temporal variation of malaria incidence and identified regions of high and low risk based on incidence. Second, we performed a series of statistical analyses on the role of socioeconomic and environmental factors in the spatial, and spatio-temporal patterns of malaria incidence. These analyses ranged from a simple t-test comparing socio-economic factors between the two regions of differential malaria risk, to time series models incorporating the autocorrelation in the data and the external drivers (including climatic ones), to a full spatio-temporal general linear mixed model with random effects. Third, based on these results a probabilistic dynamical model was formulated for malaria transmission at the ward level, and predictions of this model were evaluated at the city level.

To consider a measure of vivax malaria risk independent from interannual variation, we normalized malaria incidence for each ward in a given year by the total number of cases throughout the city. We then ranked these normalized values across wards to determine if high risk locations were consistently so over time. To examine the robustness of the patterns, we complemented the estimation of the intensity of infection with the Slide Positivity Rate (SPR) [
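The normalization and ranking step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the ward names and counts are hypothetical, and the SPR helper assumes the standard definition (positive slides divided by slides examined).

```python
def normalized_ranks(cases_by_ward):
    """Return each ward's share of the city-wide cases and its rank (1 = highest).

    cases_by_ward: dict mapping ward name -> case count for one year.
    """
    total = sum(cases_by_ward.values())
    if total == 0:
        raise ValueError("no cases reported in this year")
    share = {ward: count / total for ward, count in cases_by_ward.items()}
    ordered = sorted(share, key=share.get, reverse=True)
    ranks = {ward: position + 1 for position, ward in enumerate(ordered)}
    return share, ranks


def slide_positivity_rate(positive_slides, slides_examined):
    """Fraction of examined blood slides that tested positive."""
    return positive_slides / slides_examined


# Hypothetical counts for three wards in two years with very different totals.
year_a = {"ward_A": 120, "ward_B": 30, "ward_C": 50}
year_b = {"ward_A": 60, "ward_B": 10, "ward_C": 30}
share_a, ranks_a = normalized_ranks(year_a)
share_b, ranks_b = normalized_ranks(year_b)
```

Because each year is divided by its own city-wide total, a ward keeps the same rank in a low-incidence year as in a high-incidence year whenever its relative burden is unchanged, which is the property the ranking is meant to expose.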

To further characterize spatial variation in risk, we applied a k-means clustering algorithm with k = 2 to the incidence data at the ward level and examined the existence of at least two regions differing in malaria risk. We pre-determined two groups of wards to consider the hypothesis of different transmission intensity in the core and periphery of the city; two groups also allow us to consider the subdivision of wards into high and low risk regions. We hypothesized that differences in malaria risk in these two main areas are largely explained by demographic and socio-economic factors. To test whether the two regions differed significantly in those variables, we first extracted socioeconomic indicators, including slum density (number of slums/ward area), unemployment (number of unemployed people), number of marginal workers, literacy, population below 6 years, total population, number of households, and vulnerable and economically deprived communities, from the 2001 Ahmedabad census, and calculated the density of slums per ward based on cartographic information on the slums’ distribution within the city
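As a rough illustration of the two-group clustering, the sketch below runs Lloyd's k-means algorithm with k = 2 on a single summary value per ward. It is a simplification under stated assumptions: the input is one hypothetical mean-incidence value per ward rather than the full incidence data, and the initial centers are simply the minimum and maximum.

```python
def two_means_1d(values, max_iter=100):
    """Cluster 1-D values into two groups with Lloyd's k-means algorithm.

    Returns (labels, centers); label 0 is the low group, label 1 the high group.
    """
    centers = [min(values), max(values)]  # deterministic starting centers
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assignment step: each value joins the nearest center.
        labels = [
            0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            for v in values
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            new_centers.append(sum(members) / len(members) if members else centers[k])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Hypothetical mean incidence (cases per 1000) for six wards.
ward_incidence = [1.0, 1.5, 2.0, 8.0, 9.0, 10.0]
labels, centers = two_means_1d(ward_incidence)
```

Wards labeled 1 would form the high risk region and wards labeled 0 the low risk region.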

We then addressed whether the temporal variation in malaria incidence responded differentially to climate variables (rainfall, temperature and humidity) across the two regions. To determine which predictors explain the temporal variation in vivax cases, we considered models with autoregressive terms to account for serial correlation in the data. The correlation structure in the malaria time series was assessed by inspecting the autocorrelation function ACF. Then we applied a generalized linear model (GLM) framework. Because observed count data, such as reported cases in infectious diseases, often exhibit significant over-dispersion [_{i}:
^{2} controls the strength of the local dependence, and _{ij} are neighborhood weights for each ward based on distance to the river. We additionally compared the best model to a model with a different distribution (Zero Inflated Poisson) but this model was not significantly better (
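Two diagnostics mentioned in this paragraph, the autocorrelation function and over-dispersion of the counts, can be computed with a short sketch. The monthly series is hypothetical, and the variance-to-mean ratio is used here as a simple assumed indicator of over-dispersion relative to a Poisson distribution.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelation of a series at lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    return [
        sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)) / denom
        for lag in range(max_lag + 1)
    ]


def dispersion_index(counts):
    """Sample variance divided by the mean; values well above 1 suggest over-dispersion."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((x - mean) ** 2 for x in counts) / (n - 1)
    return variance / mean


# Hypothetical monthly case counts with a seasonal peak.
monthly_cases = [2, 5, 30, 60, 20, 4, 1, 3, 25, 70, 15, 2]
correlations = acf(monthly_cases, max_lag=3)
# Approximate 95% band for the autocorrelations of a white-noise series.
band = 1.96 / math.sqrt(len(monthly_cases))
overdispersion = dispersion_index(monthly_cases)
```

Autocorrelations falling outside the band motivate autoregressive terms, and a dispersion index far above 1 motivates a count distribution with extra variance rather than a plain Poisson.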

To model malaria risk within the city of Ahmedabad, an inhomogeneous Markov chain model was used, following the theoretical framework developed for cholera by Reiner et al. 2012. In this approach the monthly malaria cases are categorized into discrete states of malaria incidence, which we chose as “low malaria”, “mild malaria,” and “high malaria”. The three discrete states partitioned the distribution of monthly incidence at its 25^{th} and 75^{th} quantiles: below the 25^{th}, between the 25^{th} and 75^{th}, and above the 75^{th}. Then, the model assigns baseline probabilities _{i,j} to the transitions between these states in a defined time step as described by the following transition probability matrix P:
_{i,j,k,t} is the probability that ward k goes from state i to j from time t to time t+1. This probability is dependent on: (1) _{i,j,d}, the baseline transition probabilities of moving from state i to state j for a ward in risk region d; (2) a seasonal factor _{(t,d)} is periodic over the 12 months of the year and each group d has its own seasonality; (3) a neighborhood effect _{i,j,d} and ∝_{i,j,d} are estimated. Finally, the effects of temperature and humidity are included as sigmoidal functions, similar to the formulation for ENSO and its effect on cholera in Reiner et al 2012:
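The discretization into three states and the baseline transition probabilities can be illustrated with the sketch below. It covers only a time-homogeneous baseline estimated from one hypothetical ward series; the seasonal, neighborhood and climate terms of the model are not represented, and the handling of values exactly on a quantile boundary is an assumption.

```python
import statistics


def to_states(incidence):
    """Map incidence values to states 0 (low), 1 (mild), 2 (high) by quartile cut points."""
    q25, _, q75 = statistics.quantiles(incidence, n=4)
    return [0 if x < q25 else (2 if x > q75 else 1) for x in incidence]


def transition_matrix(states, n_states=3):
    """Row-normalized counts of observed transitions between consecutive time steps."""
    counts = [[0] * n_states for _ in range(n_states)]
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A state that is never left in the data keeps an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Hypothetical monthly incidence for one ward over two years.
incidence = [0.2, 0.3, 0.5, 0.9, 1.4, 2.8, 4.1, 6.0, 5.2, 3.0, 1.1, 0.4,
             0.3, 0.2, 0.6, 1.0, 1.8, 3.2, 4.8, 7.1, 5.9, 2.6, 0.9, 0.5]
states = to_states(incidence)
P_baseline = transition_matrix(states)
```

Each row of `P_baseline` gives the estimated probabilities of moving from one state to each of the three states in the next month.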

We considered different models obtained by including or neglecting the effect of a subset of the following factors: temperature, relative humidity, the state of the neighboring wards and the two different risk regions. We compared each of the models to a null model employing a likelihood ratio test. The most complex model has 78 parameters (

Finally, to assess prediction performance, a cross-validation approach was implemented by sequentially removing the epidemic months (August-November) that follow the monsoon in a given year, refitting the model to the remaining data, and simulating it four months ahead starting from August to predict the course of the seasonal outbreak for the omitted period. Forecasting accuracy was estimated by computing the likelihood of the observed state. To that end, we inferred the probability distribution of the predicted state by performing 5000 independent simulations. This procedure is then sequentially repeated, removing, one at a time, all the epidemic seasons available. To quantify the accuracy of our predictions, we calculate the percentage mean absolute error in our predictions, as well as a second quantity, more practical and possibly more relevant to public health, based on the definition of a ‘large’ outbreak. We defined such an event as one where the peak of the epidemic at the whole city level exceeds the 75% quantile of the distribution of this quantity. We quantify the fraction of times the model correctly predicts the observed malaria incidence state (above the 75% quantile). Then, to examine and illustrate the importance of the climate covariates to the predictions, we simulated the model using different combinations of humidity and temperature ‘data’. In particular, predictions for the epidemic months in the anomalous low-incidence years were obtained using: 1) monthly observed temperature and average humidity; 2) monthly average temperature and observed humidity; and 3) monthly average humidity and average temperature. Monthly averages were computed based on the mean of all previous years for a given month. We also examined the performance of the model by obtaining a one-step ahead prediction, where we removed 1 month of data at a time for all the wards (
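The cross-validation bookkeeping and the two accuracy measures described above might be organized along the following lines. This is a sketch under assumptions: the record layout (`year`, `month` keys) is hypothetical, model refitting is left out, and the percentage mean absolute error is taken to be the mean of absolute errors relative to the observations.

```python
def leave_one_season_out(records, epidemic_months=(8, 9, 10, 11)):
    """Yield (year, training, held_out), omitting one epidemic season at a time."""
    for year in sorted({r["year"] for r in records}):
        held_out = [r for r in records
                    if r["year"] == year and r["month"] in epidemic_months]
        training = [r for r in records
                    if not (r["year"] == year and r["month"] in epidemic_months)]
        yield year, training, held_out


def large_outbreak_probability(simulated_peaks, threshold):
    """Fraction of simulations whose epidemic peak exceeds the threshold."""
    return sum(peak > threshold for peak in simulated_peaks) / len(simulated_peaks)


def percent_mean_absolute_error(observed, predicted):
    """Mean absolute error relative to the observed values, in percent."""
    errors = [abs(o - p) / o for o, p in zip(observed, predicted)]
    return 100 * sum(errors) / len(errors)


# Hypothetical records: two years of monthly data for a single ward.
records = [{"year": y, "month": m, "cases": 10 * m}
           for y in (2002, 2003) for m in range(1, 13)]
splits = list(leave_one_season_out(records))
```

In a full analysis the model would be refitted on each `training` set, simulated forward from August, and the simulated peaks passed to `large_outbreak_probability` with the 75% quantile of observed peaks as the threshold.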

We initially addressed whether the spatial distribution of normalized

The panels show the distribution of the cases normalized by population, with the intensity of the color (from low yellows to high reds) corresponding to the ranking of incidence. There is striking consistency from one year to the next in the places exhibiting the highest burden of the disease. Some of this regularity also extends to the two parasites. See

The spatial pattern observed in

Map depicting the two groups of wards (administrative units), with high and low malaria risk respectively,

We then asked whether the differences in malaria risk between the low and high malaria risk regions are associated with differences in population density (number of slums, population size, number of households) or economic level (income, unemployment, literacy).

1.834 | 54.99 | 0.042 | 3.261 | 2.747 | |

3.059 | 29.21 | 0.004 | 10.82 | 8.828 | |

3.197 | 30.56 | 0.003 | 6.967 | 5.592 | |

3.217 | 29.18 | 0.003 | 10.884 | 8.566 | |

3.275 | 29.43 | 0.002 | 9.095 | 7.120 | |

3.237 | 29.18 | 0.003 | 11.20 | 8.808 | |

0.336 | 45.51 | 0.737 | 1.379 | 1.438 | |

3.178 | 29.29 | 0.003 | 9.565 | 7.544 | |

3.210 | 40.75 | 0.002 | 6.207 | 4.825 | |

3.934 | 36.14 | 0.000 | 8.841 | 6.682 |
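The non-integer degrees of freedom in the rows above are what an unequal-variance (Welch) t-test produces, although the text does not name the variant used. Assuming Welch's test, the statistic and its degrees of freedom can be sketched as follows, with hypothetical values for one socio-economic indicator in each region.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    n_a, n_b = len(sample_a), len(sample_b)
    se2 = var_a / n_a + var_b / n_b  # squared standard error of the difference
    t_stat = (mean_a - mean_b) / math.sqrt(se2)
    df = se2 ** 2 / ((var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1))
    return t_stat, df


# Hypothetical indicator values for wards in the high and low risk regions.
high_risk = [10, 12, 14, 16]
low_risk = [5, 6, 7, 8, 9]
t_stat, df = welch_t(high_risk, low_risk)
```

The degrees of freedom fall between the smaller group size minus one and the pooled value, which is why they are generally not whole numbers.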

Moreover, the temporal variation in malaria incidence between the two regions could be influenced differentially by the environmental covariates.

Low risk region

Term | Estimate | Std. Error | z value | Pr(>|z|) | 2.5% | 97.5%
ar1 | 0.63891 | 0.064 | 9.917 | 0 | 0.448 | 0.718
intercept | -1.257 | 2.126 | -0.591 | 0.554 | -13.702 | 9.159
temp | 0.140 | 0.069 | 2.006 | 0.044 | -0.035 | 0.725
RH | 0.034 | 0.025 | 1.329 | 0.0183 | 0.027 | 0.258

High risk region

Term | Estimate | Std. Error | z value | Pr(>|z|) | 2.5% | 97.5%
ar1 | 0.5835 | 0.0689 | 8.4637 | 0.0000 | 0.4483 | 0.7186
intercept | -2.2716 | 5.8323 | -0.3895 | 0.6969 | -13.7027 | 9.1594
RH | 0.1775 | 0.0235 | 3.2948 | 0.0010 | 0.1332 | 0.3952

Interestingly,

For a more dynamical perspective, we used an inhomogeneous Markov chain model that incorporates the effect of spatial and temporal variation on malaria risk. Results are also consistent for both parasites. Specifically, the comparison of the different models analyzed (

The best likelihood is for the model that incorporates seasonality, temperature, two regions and neighbors. The last column shows the result of a likelihood ratio test between the null model (model 1) and each of the other models.

+ | + | + | + | + | -4821.559 | 78 | 9799.119 | * | |

+ | + | + | + | -4903.554 | 72 | 9951.107 | * | ||

+ | + | + | -5670.935 | 66 | 11473.87 | * | |||

+ | + | + | + | -5894.961 | 39 | 11867.92 | * | ||

+ | + | -5932.35 | 54 | 11972.7 | * | ||||

+ | + | + | -6573.846 | 36 | 13219.69 | * | |||

+ | + | -6648.966 | 33 | 13363.93 | |||||

+ | -6656.621 | 27 | 13367.24 | -- |
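The AIC column can be checked from the log-likelihood and parameter count in each row using AIC = 2k - 2 log L, and the likelihood ratio statistic against the null follows from the same two numbers. The sketch below assumes that the 27-parameter row marked "--" is the null model, and it uses the Wilson-Hilferty approximation for the chi-square critical value to stay within the standard library.

```python
import math
from statistics import NormalDist


def aic(log_likelihood, n_params):
    """Akaike information criterion."""
    return 2 * n_params - 2 * log_likelihood


def likelihood_ratio(loglik_null, k_null, loglik_full, k_full):
    """Likelihood ratio statistic and its degrees of freedom for nested models."""
    return 2 * (loglik_full - loglik_null), k_full - k_null


def chi2_critical_approx(df, level=0.95):
    """Wilson-Hilferty approximation to the chi-square quantile at `level`."""
    z = NormalDist().inv_cdf(level)
    return df * (1 - 2 / (9 * df) + z * math.sqrt(2 / (9 * df))) ** 3


# Values taken from the first and last rows of the table above.
best_aic = aic(-4821.559, 78)
null_aic = aic(-6656.621, 27)
statistic, df = likelihood_ratio(-6656.621, 27, -4821.559, 78)
critical = chi2_critical_approx(df)
```

The statistic for the 78-parameter model exceeds the approximate 5% critical value by a wide margin, in line with the asterisk in that row.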

Finally,

In (A), the red line corresponds to the average number of cases per 1000 for the 59 wards. The blue dots correspond to predictions given by the median of 5000 simulations, and the gray bars correspond to the 5th and 95th percentiles. In (B-D), simulations of the model predict the seasonal epidemics of 2009 and 2010 starting from the end of the monsoons (August) under modifications of the observed climate covariates. The different panels show the effect of fixing temperature and/or humidity at their mean monthly values, to remove their effect on the interannual variation of these anomalous years. When the interannual effect of both is removed (B), the model clearly over-estimates the cases. Individual effects are less pronounced (C, D) although predictions are also higher than observations. Our best model has a mean absolute error of 68% for predicting the peak of the epidemic in a year with a high number of cases (2013). Fig 4 (E, left) shows the distribution of model forecasts from 5000 runs for October 2003 based on October 2002 data. Although the mean prediction differs from the observation, almost all (~84%) model simulations resulted in large events for 2003. The figure on the right repeats this hindcast analysis for October 2013 (using data from 2012). Here, we find a reduced but still large (~87%) probability of a large outbreak.

Because the model is stochastic and considers discrete states (no malaria, low and high), we simulated it repeatedly, and from this ensemble of simulations computed the mean number of cases in a given month for a given ward. Our simulations generate realizations of the stochastic process and, therefore, configurations of the discrete states. To convert the discrete states to cases, we used the mean number of cases for each class. The red line corresponds to the mean observed cases across the wards of the city. The blue dots show the median of the simulated values and the blue shaded regions correspond to the 5th–95th percentile range over 5000 simulations.
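The conversion from simulated states to cases and the ensemble summaries can be illustrated as follows. The transition matrix and class means are hypothetical, and the chain is time-homogeneous for brevity, whereas the fitted model lets the transition probabilities vary with season, neighbors and climate.

```python
import random
import statistics


def simulate_chain(P, start_state, n_steps, rng):
    """One realization of a Markov chain with transition matrix P."""
    state = start_state
    path = [state]
    for _ in range(n_steps):
        state = rng.choices(range(len(P)), weights=P[state])[0]
        path.append(state)
    return path


def ensemble_summary(P, class_means, start_state, n_steps, n_sims=5000, seed=0):
    """Per time step: (5th percentile, median, 95th percentile) of simulated cases."""
    rng = random.Random(seed)
    runs = [
        [class_means[s] for s in simulate_chain(P, start_state, n_steps, rng)]
        for _ in range(n_sims)
    ]
    summary = []
    for t in range(n_steps + 1):
        values = [run[t] for run in runs]
        cuts = statistics.quantiles(values, n=20)  # cut points at 5%, 10%, ..., 95%
        summary.append((cuts[0], statistics.median(values), cuts[-1]))
    return summary


# Hypothetical transition matrix and mean cases per 1000 for each state.
P = [[0.7, 0.25, 0.05],
     [0.3, 0.5, 0.2],
     [0.1, 0.4, 0.5]]
class_means = [0.5, 2.0, 6.0]
summary = ensemble_summary(P, class_means, start_state=0, n_steps=6, n_sims=200)
```

Each simulated state is replaced by the mean number of cases for its class before the percentiles are taken, so the spread of the ensemble reflects uncertainty in the state sequence only.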

Most transmission models of vector-borne diseases tend to aggregate the data at large scales and treat transmission homogeneously in space [

For

The two different risk regions within the city were also shown to exhibit differential temporal responses to climate forcing. This finding underscores the importance of humidity to malaria transmission, with a higher water table in the high risk region possibly increasing relative humidity and affecting vector ecology. The spread of malaria requires favorable conditions for the survival of both the mosquito and the parasite. Temperatures in the approximate range of 21°-32°C and a relative humidity of at least 60% are most conducive to transmission [

Social and economic elements such as the quality of housing can also favor the biological development of mosquitoes [

Our dynamical and stochastic model captures the seasonal pattern and the main trends in the interannual variation of the malaria cases. Interestingly, most of the models that incorporate spatial structure, namely the two regions, perform better than the models that do not. This conclusion is consistent with the results for diarrheal diseases in Dhaka, where consideration of different parts of the city also improved model performance [

The model is able to capture the interannual trends and, in particular, the smaller outbreaks of years like 2009 and 2010, based on the effect of climate covariates. These two years exhibit anomalously high temperature and low humidity (associated with low monsoon rainfall), only comparable to values in 2002, another year with low incidence

Besides prediction, the phenomenological modeling framework applied here is also useful to address the spatial scale at which to aggregate the data to consider process-based epidemiological models, in a way that balances reducing the noise with representing dominant spatial heterogeneity. Questions on the spatial scale of aggregation are specifically relevant to addressing climate forcing in the context of socio-economic heterogeneity. Given that pronounced changes in urbanization will co-occur with those in climate, these are fundamental questions for infectious disease dynamics within cities.

Although our approach is able to capture the interannual variation in the data and predict the peak of the epidemic, it could be improved in several directions. For example, one could incorporate in the model: (1) mobility fluxes derived from the spatial distribution of the population with movement models, to replace the near-neighbor effects on transition probabilities; (2) the explicit effect of population density on group-dependent parameters; (3) further analysis of the local effect of environmental heterogeneities such as river discharge and soil moisture on malaria incidence at higher resolution by increasing the number of groups in the model. Moreover, temporal changes of the city itself would be of interest, including changes in the local speed of urbanization, and their implications for mobility, population distribution and economic level.

The region has experienced strong malaria interventions in the last three decades, reflected in the pronounced negative trends in the number of reported cases from the 1980s and 1990s to the 2000s. From 2000 onwards, however, malaria prevalence in the city of Ahmedabad has remained fairly stationary. Although spatio-temporal variation in intervention efforts could influence the results of our models, this is unlikely given that the interventions within the city are largely homogeneous in space (

Our probabilistic model tends to underpredict the size of outbreaks. This bias results from the transformation of the incidence data into discrete malaria levels, which smooths out extreme events. The effect of this bias can be assessed and corrected by lowering the threshold probability (the proportion of simulations with large outbreaks) below 50% using receiver operating characteristic (ROC) curves (Reiner et al., 2012). The number of discrete classes describing the malaria levels could also be increased or estimated to balance complexity and accuracy. At the limit, one could move to stochastic models that do not require such discretization, although their parameterization would present challenges related to model complexity.
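One way to choose such a threshold is to sweep candidate values, compute the true and false positive rates at each, and keep the one that best separates large outbreaks from the rest. The sketch below uses Youden's J (true positive rate minus false positive rate) as the selection criterion, which is an assumption rather than the procedure of Reiner et al.; the forecast probabilities and outcomes are hypothetical.

```python
def roc_points(probabilities, outcomes):
    """Return (threshold, true positive rate, false positive rate) for each candidate threshold."""
    positives = sum(outcomes)
    negatives = len(outcomes) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one large and one non-large outbreak")
    points = []
    for threshold in sorted(set(probabilities)):
        tp = sum(1 for p, y in zip(probabilities, outcomes) if p >= threshold and y)
        fp = sum(1 for p, y in zip(probabilities, outcomes) if p >= threshold and not y)
        points.append((threshold, tp / positives, fp / negatives))
    return points


def best_threshold(points):
    """Threshold maximizing Youden's J = TPR - FPR."""
    return max(points, key=lambda point: point[1] - point[2])[0]


# Hypothetical forecast probabilities of a large outbreak and what was observed.
forecast_probability = [0.9, 0.6, 0.4, 0.35, 0.2, 0.1]
large_outbreak_observed = [1, 1, 1, 0, 0, 0]
points = roc_points(forecast_probability, large_outbreak_observed)
threshold = best_threshold(points)
```

In this toy example the selected threshold lies below 0.5, the kind of downward adjustment that would compensate for a model that underpredicts outbreak size.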

A better understanding of urbanization and malaria is needed, since urban environments can contribute to the persistence of the disease and frustrate elimination efforts more broadly at a regional level, by creating a reservoir for the disease in cities that contributes to transmission in rural areas. In India, the earlier National Eradication Programs focused on rural areas, with urban malaria contributing to the resurgence of disease in the 1970s [

The panels show the distribution of SPR with the intensity of the color (from low yellows to high reds for

(TIF)

The latter was generated by overlaying the slum distribution map with the wards map provided by the municipal corporation, and calculating the number of slums per ward divided by the ward area.

(TIF)

Here we show these functional forms for an arbitrary amplitude A and scale M, and different shape values: h = 0 (black line), h = 7/8pi (red line) and h = pi (green line).

(TIF)

The red line corresponds to the average monthly cases per 10000 for the 59 wards. The blue dotted line corresponds to one-month ahead predictions given by the median of the 5000 simulated values, and the light brown shaded region corresponds to the interval between the 5^{th} and 95^{th} percentiles for these simulations.

(TIF)

The top panels represent the seasonal pattern for

(TIF)

Although most of the autocorrelations fall within the confidence intervals, there is a small autocorrelation at lags of 11 and 12 months (seen in the significant spike of the ACF plot). This suggests that the model can be slightly improved by capturing the remaining seasonal variation.

(TIF)

The panels show the distribution of the cases normalized by population, with the intensity of the color corresponding to the ranking of incidence.

(TIF)

Red dots represent wards in the high risk regions, and yellow dots, those in the low risk region.

(TIF)

The table shows the effects included in the model formulation and the corresponding total number of parameters.

(DOCX)

Confidence intervals (CI) from posterior distributions (from two chains that are well mixed and have converged).

(DOCX)

This model includes an autoregressive term that accounts for serial autocorrelation, the effect of relative humidity in both regions, and the effect of temperature in the low risk region.

(DOCX)

The best likelihood is obtained for the model that incorporates seasonality, temperature, two regions and the effect of neighbors. The last column shows the likelihood ratio test between the null model (model 1) and each of the other models.

(DOCX)