Spatial representativeness matters for Climate-Driven Dengue Forecasting

Khemmanant Khamthong; Kunwithree Phramrung

doi:10.1371/journal.pntd.0014270

Abstract

Accurate forecasting of dengue incidence requires statistical models that explicitly accommodate overdispersion, temporal dependence, and delayed environmental forcing. We develop a Bayesian negative binomial dynamic regression model to generate monthly forecasts of dengue hemorrhagic fever (DHF) incidence in Kanchanaburi Province, Thailand. Transmission persistence is captured through lagged dengue incidence, while delayed climatic effects are represented using locally observed maximum temperature and relative humidity. Model adequacy and predictive performance are assessed using posterior predictive checks and leave-one-out cross-validation (LOO-CV). The negative binomial specification consistently outperforms Poisson-based alternatives under substantial overdispersion. Importantly, forecasting performance is not determined solely by the strength of marginal climate-dengue associations. Instead, it depends critically on the spatial representativeness of climatic inputs relative to the population at risk. Models informed by climatically representative observations yield more stable and robust out-of-sample forecasts, even when marginal associations are comparatively weaker. These findings underscore the distinction between explanatory association and predictive utility in climate-driven infectious disease models and provide practical guidance for the development of climate-informed dengue early warning systems in endemic settings.

Author summary

Dengue fever is a mosquito-borne disease that affects hundreds of millions of people each year, particularly in tropical regions. Because dengue transmission is strongly influenced by weather conditions such as temperature and humidity, climate data are widely used to build forecasting models that help public health authorities anticipate outbreaks. However, an important practical question is often overlooked: does the location of the meteorological station used to represent climate conditions affect the reliability of dengue forecasts? In this study, we compared dengue forecasting models that used climate data from two meteorological stations located in different environments within the same province in Thailand. One station showed stronger statistical correlations with dengue incidence, while the other was located closer to the main population center where most dengue cases occur. Using Bayesian predictive model comparison, we found that the station located nearer to the population center produced forecasting performance comparable to that of the station with the stronger marginal climate-dengue correlation. These findings suggest that strong historical correlations between climate and disease do not necessarily lead to better forecasts. Instead, the spatial representativeness of climate measurements relative to the population at risk may play a key role. Our results provide practical guidance for designing climate-informed dengue early warning systems in endemic regions.

Citation: Khamthong K, Phramrung K (2026) Spatial representativeness matters for Climate-Driven Dengue Forecasting. PLoS Negl Trop Dis 20(4): e0014270. https://doi.org/10.1371/journal.pntd.0014270

Editor: Richard A. Bowen, Colorado State University, UNITED STATES OF AMERICA

Received: January 16, 2026; Accepted: April 15, 2026; Published: April 27, 2026

Copyright: © 2026 Khamthong, Phramrung. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The dengue surveillance data analyzed in this study were provided by the Division of Epidemiology, Department of Disease Control, Ministry of Public Health, Thailand. Meteorological data were obtained from the Thai Meteorological Department, Ministry of Digital Economy and Society. Due to institutional data governance and privacy restrictions, these datasets are not publicly available. Access to the underlying data can be requested directly from the respective organizations: the Division of Epidemiology (http://doe.moph.go.th) and the Thai Meteorological Department (http://www.tmd.go.th).

Funding: This research was financially supported by Mahasarakham University to KK. The funder’s website is https://www.msu.ac.th. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

1. Introduction

Dengue remains a major cause of febrile illness and hospitalization across endemic regions of Southeast Asia, placing sustained pressure on local health systems and routine infectious disease surveillance [1,2]. In Thailand, recurrent dengue outbreaks continue to challenge timely preparedness, particularly at the provincial level, where surveillance decisions must often be made under uncertainty and resource constraints. Improving the ability to anticipate short-term fluctuations in dengue incidence is therefore not only a statistical exercise but also a core component of infection control planning and early warning system design.

Climate-sensitive infectious diseases such as dengue are commonly modeled using environmental covariates, motivated by well-established effects of temperature and humidity on mosquito survival, viral replication, and transmission efficiency [3,4]. However, for surveillance and operational forecasting, the utility of climate information depends not only on biological relevance but also on how well climatic measurements represent the conditions experienced by the population at risk. In practice, meteorological monitoring networks are often spatially heterogeneous, leading to a misalignment between climate data and the catchment areas of disease surveillance systems. This mismatch raises a critical question for infectious disease forecasting: whether stronger climate-dengue associations necessarily translate into better predictive performance for public health decision-making [5].

Routine dengue surveillance data present additional challenges for forecasting in endemic settings. Monthly case counts are typically overdispersed due to clustered transmission, unobserved heterogeneity, and variability in reporting practices [6]. At the same time, dengue incidence exhibits strong temporal dependence, reflecting transmission persistence and delayed responses to environmental forcing [7]. Forecasting models that fail to accommodate these features often yield unstable predictions, limiting their usefulness for infection control and early warning purposes.

Dynamic regression models with lagged incidence terms provide a pragmatic framework for capturing short-term transmission persistence while incorporating exogenous drivers such as climatic covariates [8,9]. When coupled with a negative binomial likelihood, such models can accommodate the extra-Poisson variability characteristic of infectious disease surveillance data, thereby improving predictive stability. Within a Bayesian framework, uncertainty can be coherently propagated from model parameters to forecasts, a critical requirement for risk-based decision-making in infectious disease control [10].

Many climate-dengue studies focus on identifying statistically significant associations between environmental variables and disease incidence [4,11]. However, from the perspective of infection surveillance, explanatory association does not necessarily imply predictive utility. In particular, climatic variables that exhibit strong marginal correlations with dengue incidence may offer limited forecasting benefit if they are spatially misrepresentative of exposure conditions relevant to the surveillance population. Despite its operational relevance, the role of spatial representativeness of climate inputs in shaping dengue forecast performance has received limited explicit attention in the infectious disease literature.

In this study, we develop a Bayesian negative binomial dynamic regression model to forecast monthly Dengue Hemorrhagic Fever (DHF) incidence in Kanchanaburi Province, Western Thailand. The model explicitly captures short-term transmission persistence through lagged dengue incidence and incorporates lagged climatic covariates derived from provincial meteorological stations. Rather than seeking to maximize climate-dengue association strength, the primary objective is to evaluate how the spatial representativeness of climatic inputs influences out-of-sample forecasting performance. By contrasting models informed by meteorological observations with differing degrees of alignment to the population under surveillance, we distinguish explanatory association from predictive value in a manner directly relevant to dengue surveillance and infection control.

Building on the concept of spatial exposure misalignment, we hypothesize that climate observations obtained from meteorological stations that are more spatially representative of the population under surveillance will provide equal or improved out-of-sample predictive performance compared with stations that exhibit stronger marginal climate-dengue correlations but are geographically less representative.

To test this hypothesis, we compare models informed by climate data from two meteorological stations in Kanchanaburi Province. The first station is located near the provincial population center, where most dengue cases are reported, while the second station is situated in the mountainous Thong Pha Phum district, which exhibits distinct microclimatic conditions. This comparison allows us to Evaluate whether spatial representativeness of climate inputs improves forecasting reliability for dengue surveillance.

2. Methods

2.1. Study design and data

A retrospective ecological time-series study was conducted to investigate the association between climatic variability and dengue incidence in Kanchanaburi Province, Western Thailand. Monthly dengue case counts were obtained from routine provincial surveillance records maintained by public health authorities. Meteorological variables, including monthly mean maximum temperature and monthly mean relative humidity, were retrieved from official meteorological monitoring conducted at two stations located within the province. The primary outcome was the monthly number of reported dengue cases. All datasets were aggregated at the monthly level and temporally aligned. Calendar years originally recorded in the Buddhist Era were converted to the Gregorian calendar to ensure temporal consistency across data sources and facilitate integration with meteorological records.

We evaluated two meteorological stations with contrasting geographical profiles. The Kanchanaburi station is located in the provincial administrative center, where the majority of the population resides and where dengue surveillance activities are concentrated. Climatic conditions recorded at this station are therefore expected to better represent the environmental conditions experienced by much of the population at risk.

In contrast, the Thong Pha Phum station is situated in a remote mountainous district characterized by forested terrain, lower population density, and distinct microclimatic conditions compared with the provincial population center.

This contrast provides a natural framework for evaluating the role of spatial representativeness in climate-disease modeling. Specifically, it allows us to examine whether meteorological data that are more closely aligned with the environmental conditions experienced by the surveillance population yield more reliable predictive inputs for dengue forecasting than data from stations located in climatically distinct areas, even when the latter exhibit stronger marginal correlations with dengue incidence.

2.2. Descriptive characteristics and exploratory analysis

To provide an initial overview of dengue transmission dynamics in the study area, we conducted an exploratory analysis of monthly dengue incidence in Kanchanaburi Province. Descriptive inspection of the time series facilitates assessment of interannual variability, outbreak clustering, and long-term temporal patterns, thereby motivating the subsequent dynamic modeling strategy.

Fig 1 illustrates pronounced interannual variability with recurrent outbreak peaks, indicating substantial temporal dependence and nonstationary behavior. These characteristics are consistent with dengue transmission dynamics and motivate the use of time-series regression models that explicitly account for lagged incidence and climatic drivers.

Download:

Fig 1. Monthly dengue cases in Kanchanaburi Province, Thailand, from 2014 to 2023.

The time series displays substantial temporal variability and recurrent outbreak peaks, supporting the application of dynamic time-series modeling approaches.

https://doi.org/10.1371/journal.pntd.0014270.g001

To further characterize the statistical properties of the observed data and inform model specification, Table 1 summarizes the distributional characteristics of dengue incidence and meteorological variables. Monthly dengue cases exhibit pronounced overdispersion (Mean = 46.44; SD = 56.40), providing a strong empirical rationale for adopting a negative binomial likelihood over a Poisson formulation. Furthermore, climatic variables demonstrate substantial inter-station variability, particularly in rainfall and relative humidity; for instance, the Thong Pha Phum station recorded a markedly higher mean rainfall (132.87 mm) compared to the Kanchanaburi station (88.88 mm). This reflects significant spatial heterogeneity in local microclimatic conditions, necessitating the standardization of predictors prior to model estimation.

Download:

Table 1. Summary statistics for monthly dengue incidence and climatic variables in Kanchanaburi Province.

https://doi.org/10.1371/journal.pntd.0014270.t001

Synthesizing these observations, the temporal persistence, pronounced overdispersion, and climatic variability suggest that dengue incidence is driven by a combination of intrinsic transmission momentum and lagged environmental forcing. We therefore hypothesize that a model integrating lagged incidence with locally representative, multi-month lagged climatic covariates will more effectively capture underlying transmission dynamics and enhance out-of-sample predictive performance.

Spearman’s rank correlation coefficients (Table 2) reveal that climatic variables at the Thong Pha Phum station exhibit consistently stronger monotonic associations with dengue incidence compared to the Kanchanaburi station. Furthermore, the strong intercorrelation between rainfall and relative humidity at both sites highlights the risk of multicollinearity in frequentist regression frameworks. This underscores the advantage of our Bayesian hierarchical approach, which inherently accommodates correlated predictors and ensures coherent uncertainty propagation during inference.

Download:

Table 2. Spearman’s rank correlation coefficients between monthly dengue cases and climatic variables from two meteorological stations in Kanchanaburi Province.

https://doi.org/10.1371/journal.pntd.0014270.t002

2.3. Feature construction and lag specification

To account for short-term temporal dependence in dengue transmission, we incorporated lagged incidence terms as predictors, a strategy commonly employed in infectious disease time-series modeling to represent intrinsic transmission persistence and reporting-related autocorrelation [6,9]. Monthly case counts were log-transformed () to linearize autoregressive effects and stabilize variance. Lagged dengue incidence at one and two months was included to capture short-term transmission momentum, delayed case reporting, and unobserved local transmission dynamics commonly observed in dengue surveillance systems [6,12].

Climatic predictors consisted of monthly mean maximum temperature and mean relative humidity. Climatic variability may influence dengue transmission through cumulative effects on the life cycle of Aedes mosquitoes, including larval development, adult mosquito abundance, and the extrinsic incubation period of the virus, together with the subsequent time required for human infection, case detection, and reporting within surveillance systems [13,14].

To identify an appropriate lag structure, an exploratory cross-correlation analysis was conducted between climatic variables and monthly dengue incidence. The analysis indicated relatively stronger and more stable associations for maximum temperature and relative humidity at a 5-month lag compared with other candidate lags. Considering both ecological plausibility and the empirical patterns observed in the data, climatic predictors were therefore lagged by five months in the primary model specification. This lag is consistent with the cumulative ecological processes linking climate conditions to dengue transmission, including mosquito population development, viral incubation within the vector, and delays associated with human infection and surveillance reporting.

All continuous predictors were standardized (zero mean, unit variance) prior to model fitting. Consequently, estimated regression coefficients represent the expected change in dengue incidence associated with a one-standard-deviation increase in each predictor, facilitating comparison of effect sizes and improving numerical stability during Bayesian estimation [10].

2.4. Model specification

Let denote the observed number of dengue cases in month t. Dengue incidence was modeled using a negative binomial distribution to accommodate overdispersion commonly observed in infectious disease surveillance data:

(1)

where denotes the conditional mean incidence and is an overdispersion parameter governing the variance structure. Under this parameterization, the variance of increases with , allowing the model to capture variability exceeding the Poisson assumption.

The conditional mean was linked to covariates through a log-linear dynamic regression structure:

(2)

where and denote standardized log-transformed dengue incidence lagged by one and two months, respectively; denotes standardized maximum temperature lagged by five months; and denotes standardized relative humidity lagged by five months.

This specification explicitly captures temporal dependence through lagged outcome terms rather than through stochastic autoregressive error components, thereby avoiding potential overparameterization or double-counting of serial correlation. The resulting framework aligns with established INGARCH-type modeling approaches for infectious disease time series while remaining parsimonious and interpretable at a monthly temporal scale [6,9].

3. Bayesian inference

3.1. Negative binomial formulation

Monthly dengue case counts often exhibit variability exceeding that expected under a Poisson assumption due to unobserved heterogeneity, reporting noise, and clustered transmission events. To explicitly accommodate this excess variation, dengue incidence was modeled using a negative binomial distribution parameterized by its mean and overdispersion parameter .

Under this parameterization, the probability mass function is given by

(3)

where denotes the gamma function. The variance of is

(4)

which exceeds the mean whenever is finite. Smaller values of correspond to greater overdispersion, while the model converges to a Poisson distribution as .

This formulation allows the model to flexibly capture observed variability in dengue incidence without imposing restrictive distributional assumptions, thereby improving both in-sample fit and out-of-sample predictive performance.

Taking the logarithm of the likelihood yields the log-likelihood contribution for observation t:

(5)

The full log-likelihood is obtained by summing over all time points. The conditional mean is linked to lagged dengue incidence and climatic covariates through a log-linear dynamic regression structure, enabling explicit modeling of short-term transmission persistence and delayed environmental effects.

Bayesian inference provides a principled probabilistic framework for synthesizing observed data with prior knowledge. The posterior distribution of the parameter vector , given the observed dengue case series , is determined by Bayes’ theorem:

(6)

where represents the likelihood function and denotes the joint prior distribution.

3.2. Prior specification

Weakly informative priors were specified to regularize parameter estimation while preserving data-driven inference:

(7)

(8)

(9)

(10)

Relatively tighter priors were assigned to lagged dengue incidence terms to reflect stronger prior expectations of short-term persistence in transmission dynamics. Slightly more diffuse priors were used for climatic covariates to allow flexibility in estimating delayed environmental effects whose magnitude and direction may vary across ecological and climatic settings.

The exponential prior on the overdispersion parameter enforces positivity and stabilizes posterior estimation while remaining weakly informative. Together, these prior choices ensure identifiability of the dynamic regression structure, mitigate overfitting, and allow posterior inference to be driven primarily by the information contained in the observed data.

3.3. Model assessment

Bayesian inference was carried out using Hamiltonian Monte Carlo implemented in Stan via the brms package. Four parallel chains were fitted, each run for 12,000 iterations, with the initial 4,000 iterations discarded as burn-in. The remaining draws were used for posterior inference. Chain behavior was examined to ensure stable sampling. Trace plots showed no evidence of poor mixing, effective sample sizes were adequate across parameters, and all values were below 1.01.

Model adequacy was evaluated by comparing observed dengue counts with data replicated from the fitted models. These posterior predictive checks focused on whether the models could reproduce the overall distribution of cases and the main temporal patterns observed in the time series. Replicated data generated from the posterior predictive distribution were compared with observed case counts to assess both the distribution of cases and the temporal structure, including seasonal patterns.

Out-of-sample predictive performance was evaluated using leave-one-out cross-validation based on Pareto-smoothed importance sampling. Models were compared using expected log predictive density, with differences in ELPD and their associated standard errors used to summarize relative predictive performance. A small number of influential observations were identified based on elevated Pareto shape parameters. These observations were examined, and moment matching was applied where necessary to improve the stability of cross-validation estimates. Sensitivity to the assumed count distribution was assessed by fitting a Poisson model with the same linear predictor and comparing its predictive performance with that of the negative binomial specification. Model comparison focused on out-of-sample predictive accuracy rather than in-sample fit.

3.4. Sensitivity and robustness analyses

To ensure the identifiability of the dynamic regression structure and mitigate potential overfitting, we employed weakly informative priors tailored to each parameter class (Eqs. 7–10). Furthermore, we conducted extensive sensitivity analyses to verify the robustness of our primary inferences across alternative lag configurations and prior specifications. Specifically, we evaluated climatic lags ranging from four to six months. While models with shorter lags remained statistically significant, the five-month lag was prioritized for its superior alignment with the biological latency of dengue transmission. Conversely, longer lags (e.g., six months) resulted in non-significant associations for key variables, such as relative humidity, where the 95% credible interval encompassed zero. These results confirm that our core findings are not driven by arbitrary modeling assumptions but reflect stable underlying associations.

4. Results and discussion

In this section, we present the empirical findings from our Bayesian negative binomial framework, evaluating the drivers of dengue in Kanchanaburi Province. The analysis begins with model convergence diagnostics, followed by an assessment of transmission persistence, climatic forcing, and predictive robustness.

Table 3 presents the posterior summaries and convergence diagnostics for the Bayesian negative binomial models. All parameters demonstrated excellent mixing and convergence, as evidenced by values of 1.00 and substantial effective sample sizes for both the bulk and tail of the posterior distributions. These metrics, alongside the results from the sensitivity analyses (3.4), confirm that our posterior inferences are robust and the Hamiltonian Monte Carlo sampling was highly efficient. Lagged dengue incidence at one and two months is positively associated with current incidence across both station-based models, reflecting the strong temporal persistence of dengue transmission. In contrast, maximum temperature and relative humidity at a five-month lag display negative associations with current dengue incidence. These effects are consistent across stations and remain robust after accounting for overdispersion and temporal dependence. The estimated dispersion parameter further confirms substantial extra-Poisson variability in monthly dengue incidence.

Download:

Table 3. Posterior summaries and convergence diagnostics of the Bayesian negative binomial model.

https://doi.org/10.1371/journal.pntd.0014270.t003

Incidence rate ratios (IRRs) derived from the posterior distributions are reported in Table 4. All covariate effects exhibit high posterior probabilities of direction, narrow credible intervals, and excellent convergence diagnostics, supporting the reliability of the inferred associations.

Download:

Table 4. Incidence Rate Ratios (IRR) and posterior diagnostics from the Bayesian negative binomial model for dengue incidence in Kanchanaburi.

https://doi.org/10.1371/journal.pntd.0014270.t004

4.1. Model adequacy and posterior predictive diagnostics

Model adequacy was evaluated using a comprehensive set of posterior predictive checks targeting both marginal distributional fit and temporal dependence across the full study period of 120 months. Density-based posterior predictive checks (Fig 2) show close agreement between the observed dengue incidence and replicated datasets drawn from the posterior predictive distribution, indicating that the negative binomial specification adequately captures the pronounced right-skewness and overdispersion characteristic of the surveillance data.

Download:

Fig 2. Posterior predictive check (PPC) density overlays for monthly dengue incidence.

Panels (A) and (B) correspond to the two study areas. The observed dengue cases are compared with 100 replicated datasets drawn from the posterior predictive distribution of the Bayesian negative binomial model. The close agreement between observed and replicated densities indicates that the model adequately captures key marginal distributional features, including overdispersion and right-skewness.

https://doi.org/10.1371/journal.pntd.0014270.g002

To assess whether the fitted model reproduces the temporal dependence observed in the data, Bayesian posterior predictive autocorrelation function (ACF) checks were conducted (Fig 3). The observed ACF (red dots) lies largely within the 90% posterior predictive envelope (gray shaded region) across lags up to 40 months. This result suggests that the inclusion of lagged dengue incidence effectively captures the dominant short-term serial dependence and the seasonal structure of the time series without requiring an additional stochastic autoregressive error component.

Download:

Fig 3. Bayesian posterior predictive autocorrelation function (ACF) checks for monthly dengue incidence.

Panels (A) and (B) correspond to the two study areas. The solid blue line represents the posterior median ACF computed from replicated datasets drawn from the posterior predictive distribution; the shaded band denotes the 90% posterior predictive envelope, and the red points indicate the observed ACF. Autocorrelations are displayed up to lag 40 months to assess whether the fitted model reproduces the short- to medium-term temporal dependence observed in the data.

https://doi.org/10.1371/journal.pntd.0014270.g003

To further evaluate longer-term temporal structure, residual diagnostics based on Bayesian-consistent Pearson residuals were examined up to 60-month lags (Fig 4). This extended lag window allows assessment of potential interannual dependence associated with the 3–5-year dengue outbreak cycles reported in Southeast Asia. Across both study areas, residual autocorrelations remain within the nominal 95% confidence bounds, indicating no substantial unexplained long-term temporal dependence.

Download:

Fig 4. Autocorrelation function (ACF) of Bayesian-consistent Pearson residuals from the negative binomial regression model with lagged dengue cases and climate covariates.

Panels (A) and (B) correspond to the two study areas. The dashed lines denote approximate 95% confidence bounds under the white-noise assumption. Residual autocorrelation is examined up to 60-month lags to evaluate potential long-term temporal structure, including the documented 3-5 year dengue outbreak cycles. The absence of substantial residual autocorrelation across lags indicates that temporal dependence is adequately captured by the lagged response structure without the need for an additional stochastic autoregressive error component.

https://doi.org/10.1371/journal.pntd.0014270.g004

Taken together, these posterior predictive and residual diagnostics suggest that the fitted model provides an adequate representation of both the distributional properties and the temporal dynamics of monthly dengue incidence, capturing short-term transmission persistence as well as longer-term interannual variability.

Table 5 presents the model comparison results based on leave-one-out cross-validation (LOO). Across all specifications, the Bayesian negative binomial model consistently demonstrates improved predictive performance relative to the Poisson alternative, as evidenced by higher expected log predictive density (ELPD) and lower LOOIC values. This underscores the importance of explicitly accounting for overdispersion in monthly dengue incidence data.

Download:

Table 5. Model comparison based on leave-one-out cross-validation (LOO) and predictive accuracy.

https://doi.org/10.1371/journal.pntd.0014270.t005

Regarding the comparison between meteorological sources, the negative binomial model informed by the Kanchanaburi station data yields the lowest nominal LOOIC and RMSE. However, its predictive performance is statistically indistinguishable from the model utilizing Thong Pha Phum data. Specifically, the difference in expected log predictive density (ELPD ) is smaller than twice its associated standard error (SE = 2.60), indicating that the two station-based models are statistically equivalent in terms of predictive accuracy based on the available data.

These findings suggest that stronger marginal correlations between climatic variables and dengue incidence do not necessarily translate into superior out-of-sample predictive performance. Although climatic variables from the Thong Pha Phum station exhibit stronger marginal associations with dengue cases, the resulting model does not demonstrate a statistically meaningful improvement in predictive accuracy.

Taken together, these results support the interpretation that spatial representativeness of climatic exposure relative to the primary population center may influence predictive performance but does not necessarily confer a statistically significant predictive advantage. In this context, meteorological observations that more closely reflect the environmental conditions experienced by the majority of the population can provide predictive information comparable to that derived from stations located in climatically distinct areas, even when the latter exhibit stronger marginal correlations with dengue incidence.

4.2. Out-of-sample forecasting performance

Out-of-sample forecasting performance was evaluated using a direct multi-step forecasting protocol with horizons ranging from one to six months ahead. Fig 5 summarizes forecasting accuracy in terms of RMSE and MAE.

Download:

Fig 5. Out-of-sample forecasting performance across k-step-ahead horizons (1-6 months).

Panels (A) and (B) report RMSE and MAE, respectively, comparing the predictive accuracy of Bayesian negative binomial (teal) and Poisson (orange) models. Climate covariates from the Kanchanaburi and Thong Pha Phum meteorological stations are used to assess model stability and spatial representativeness. Both metrics indicate that the negative binomial model consistently outperforms the Poisson model across all forecast horizons and study areas.

https://doi.org/10.1371/journal.pntd.0014270.g005

Across all forecast horizons, the Bayesian negative binomial model consistently outperforms the Poisson specification for both meteorological stations. The performance advantage of the negative binomial model becomes more pronounced at longer horizons (), indicating greater robustness to overdispersion and increasing forecast uncertainty. Forecasting errors obtained from models incorporating climatic covariates from the Kanchanaburi station are broadly comparable to those derived from the Thong Pha Phum station. Although the Kanchanaburi-based models occasionally exhibit slightly lower RMSE and MAE at longer forecast horizons, the magnitude of these differences remains small. The pattern of forecasting accuracy across horizons therefore aligns with the LOO cross-validation results, which indicate that the predictive performances of the two station-based models are statistically equivalent.

The comparison further indicates that stronger marginal associations between climatic variables and dengue incidence, observed for the Thong Pha Phum station, do not necessarily translate into improved out-of-sample forecasting accuracy.

4.3. Discussion

This study examined whether the spatial alignment between meteorological observations and the population under surveillance influences the accuracy of climate-driven dengue forecasting. The results indicate that stronger marginal climate-dengue associations do not necessarily translate into superior out-of-sample predictive performance. Although climatic variables from the Thong Pha Phum station exhibited higher statistical correlations with dengue incidence, the predictive performances of the two station-based models were statistically equivalent. Specifically, the difference in ELPD was within two standard errors, indicating that the Kanchanaburi station-despite its lower marginal correlation-provided a predictive signal of comparable quality. This finding underscores an important distinction in infectious disease modeling: strong explanatory associations in historical data do not necessarily guarantee improved predictive utility.

4.3.1. Potential mechanisms of spatial representativeness.

The observed statistical equivalence between a highly correlated distal station and a moderately correlated representative station may be explained by the mechanisms of environmental exposure. Dengue transmission is a localized process driven by the abundance and infection dynamics of Aedes mosquitoes in areas where human-vector contact occurs. The Kanchanaburi station, situated in the provincial administrative center, captures climatic conditions experienced by the majority of the population. In urbanized environments, microclimatic factors such as urban heat island effects, dense housing structures, and localized water storage practices can create breeding conditions that differ from those in forested or mountainous districts such as Thong Pha Phum. Consequently, meteorological observations from geographically distant environments may reflect broader regional climatic variability without accurately representing the microclimatic drivers of transmission in the primary population center.

4.3.2. Generalizability and regional implications.

The implications of these findings extend beyond the present study area. The influence of spatial representativeness is likely to become more pronounced in geographically large or environmentally heterogeneous regions where climatic gradients vary across relatively short distances. For example, in a large metropolitan area such as Bangkok, meteorological measurements from a single central monitoring station may not capture the diverse microclimates that characterize densely populated urban neighborhoods and suburban districts. Conversely, in environmentally homogeneous regions, the choice of meteorological station may have a smaller influence on predictive performance. These considerations suggest that the geographical correspondence between meteorological monitoring networks and disease surveillance catchment areas represents an important structural factor in climate-informed infectious disease forecasting.

4.3.3. Practical framework for dengue early warning systems.

To translate these findings into operational guidance, a systematic workflow for selecting climatic inputs in dengue early warning systems can be proposed. First, public health agencies should map the spatial distribution of the population at risk and identify the primary exposure zones within the surveillance catchment area. Second, meteorological monitoring locations should be evaluated in relation to these exposure zones, with priority given to stations located within or near densely populated areas. Third, candidate climatic predictors should be assessed using predictive validation metrics, such as leave-one-out cross-validation or out-of-sample forecasting error, rather than relying solely on historical correlation coefficients. Finally, forecasting models should explicitly account for temporal dependence and overdispersion in surveillance data to ensure realistic uncertainty estimates and robust predictive performance.

Taken together, these findings highlight an important principle for climate-informed infectious disease forecasting. The strength of historical climate-disease associations alone is not a reliable indicator of predictive utility. Instead, the spatial representativeness of environmental measurements relative to the population under surveillance appears to be a key determinant of predictive reliability.

By explicitly distinguishing explanatory correlation from predictive performance, this study provides an empirical framework for integrating meteorological observations into dengue early warning systems. Incorporating spatially representative climate inputs, together with models that account for temporal dependence and overdispersion in surveillance data, may improve the robustness of operational dengue forecasting in endemic regions.

4.3.4. Study limitations.

Despite these insights, several limitations warrant consideration. First, our analysis relied on monthly aggregated surveillance data, which may overlook finer temporal fluctuations and reporting delays occurring at weekly or daily scales. Second, as the study focused on two stations within a single province, the observed spatial representativeness effects might vary in regions with different topographies or denser monitoring networks. Third, while maximum temperature and relative humidity are key drivers of mosquito ecology, other factors, such as rainfall patterns, land-use changes, and vector control interventions, also influence local transmission but were not explicitly captured in our model.

Future research integrating higher-resolution environmental data and multi-province surveillance systems will be essential to further clarify how spatial exposure misalignment shapes climate-driven forecasting across diverse ecological contexts.

5. Conclusion

This study analyzed monthly dengue incidence in Kanchanaburi Province using a Bayesian negative binomial time-series framework that accounted for temporal dependence, overdispersion, and climatic covariates. The observed dengue counts exhibited substantial extra-Poisson variability and strong serial correlation, features that were not adequately captured by Poisson-based formulations and motivated the use of a negative binomial specification.

Lagged dengue incidence accounted for a large proportion of short-term variation in reported cases, indicating persistent transmission dynamics and unobserved local influences. After controlling for temporal dependence, lagged maximum temperature and relative humidity remained associated with dengue incidence. These associations were consistent across meteorological stations and were supported by posterior predictive diagnostics, indicating stable model estimation.

Model comparison based on leave-one-out cross-validation showed higher out-of-sample predictive accuracy for Bayesian negative binomial models relative to Poisson alternatives. Differences in predictive performance across station-specific climate inputs were small relative to posterior uncertainty, indicating that stronger marginal climate-dengue associations were not systematically associated with improved forecasts. This result suggests that correlation strength alone provides limited guidance for selecting climatic inputs in predictive models.

Assessment of multi-step-ahead forecasts showed that the negative binomial framework maintained comparable predictive performance across lead times of up to six months. Predictive behavior at longer horizons reflected explicit representation of temporal dependence and uncertainty propagation rather than reliance on short-term associations.

Overall, the findings indicate that dengue forecasting performance is influenced more by the combination of dynamic transmission structure and the spatial representativeness of climatic inputs than by the magnitude of marginal climate-dengue associations. Bayesian negative binomial time-series models provide a suitable framework for dengue surveillance and forecasting in settings characterized by overdispersed incidence data and spatially heterogeneous climate information. Extensions incorporating spatially resolved climate data, vector surveillance indicators, or mechanistic transmission components may be considered to examine their effects on predictive behavior.

5.1. Ethics Approval and Consent to Participate

This study analyzed aggregated and anonymized secondary dengue surveillance data and did not involve individual-level human subjects. Ethical approval and informed consent were therefore not required in accordance with national regulations and institutional guidelines.

Acknowledgments

The authors gratefully acknowledge the personnel involved in data collection and preliminary data processing. We also thank the Center for Epidemiology and Disease Control for providing dengue surveillance data and the Thai Meteorological Department for access to meteorological observations used in this study.

References

1. World Health Organization. Dengue: Guidelines for diagnosis, treatment, prevention and control. Geneva: WHO Press. 2009.
2. Gubler DJ. Dengue and dengue hemorrhagic fever. Clin Microbiol Rev. 1998;11(3):480–96. pmid:9665979
3. Lambrechts L, Paaijmans KP, Fansiri T, Carrington LB, Kramer LD, Thomas MB, et al. Impact of daily temperature fluctuations on dengue virus transmission by Aedes aegypti. Proc Natl Acad Sci U S A. 2011;108(18):7460–5. pmid:21502510
4. Morin CW, Comrie AC, Ernst K. Climate and dengue transmission: evidence and implications. Environ Health Perspect. 2013;121(11–12):1264–72. pmid:24058050
5. Naish S, Dale P, Mackenzie JS, McBride J, Mengersen K, Tong S. Climate change and dengue: a critical and systematic review of quantitative modelling approaches. BMC Infect Dis. 2014;14:167. pmid:24669859
6. Held L, Höhle M, Hofmann M. A statistical framework for the analysis of multivariate infectious disease surveillance counts. Stat Model. 2005;5(3):187–99.
- View Article
- Google Scholar
7. Xu L, Stige LC, Chan K-S, Zhou J, Yang J, Sang S, et al. Climate variation drives dengue dynamics. Proc Natl Acad Sci U S A. 2017;114(1):113–8. pmid:27940911
8. Held L, Meyer S, Bracher J. Probabilistic forecasting in infectious disease epidemiology. Biom J. 2017;59(2):242–58.
- View Article
- Google Scholar
9. Meyer S, Held L, Höhle M. Spatio-temporal analysis of epidemic phenomena using the R package surveillance. J Stat Softw. 2017;77(11):1–55.
- View Article
- Google Scholar
10. Gelman A. Scaling regression inputs by dividing by two standard deviations. Stat Med. 2008;27(15):2865–73. pmid:17960576
11. Liu Z, Zhang Z, Lai Z. Temperature variability and dengue fever transmission in China: A multi-city study. Environ Res. 2020;182:109115.
- View Article
- Google Scholar
12. Held L, Höhle M, Hofmann M. A statistical framework for the analysis of multivariate infectious disease surveillance counts. Biostatistics. 2006;7(4):531–48.
- View Article
- Google Scholar
13. Hii YL, Rocklöv J, Ng N, Tang CS, Pang FY, Sauerborn R. Climate variability and increase in intensity and magnitude of dengue incidence in Singapore. Glob Health Action. 2009;2:10.3402/gha.v2i0.2036. pmid:20052380
14. Lowe R, Gasparrini A, Van Meerbeeck CJ, Lippi CA, Mahon R, Trotman AR, et al. Nonlinear and delayed impacts of climate on dengue risk in Barbados: A modelling study. PLoS Med. 2018;15(7):e1002613. pmid:30016319

[ref1] 1. World Health Organization. Dengue: Guidelines for diagnosis, treatment, prevention and control. Geneva: WHO Press. 2009.

[ref2] 2. Gubler DJ. Dengue and dengue hemorrhagic fever. Clin Microbiol Rev. 1998;11(3):480–96. pmid:9665979
View Article
PubMed/NCBI
Google Scholar

[3] View Article

[4] PubMed/NCBI

[5] Google Scholar

[ref3] 3. Lambrechts L, Paaijmans KP, Fansiri T, Carrington LB, Kramer LD, Thomas MB, et al. Impact of daily temperature fluctuations on dengue virus transmission by Aedes aegypti. Proc Natl Acad Sci U S A. 2011;108(18):7460–5. pmid:21502510
View Article
PubMed/NCBI
Google Scholar

[7] View Article

[8] PubMed/NCBI

[9] Google Scholar

[ref4] 4. Morin CW, Comrie AC, Ernst K. Climate and dengue transmission: evidence and implications. Environ Health Perspect. 2013;121(11–12):1264–72. pmid:24058050
View Article
PubMed/NCBI
Google Scholar

[11] View Article

[12] PubMed/NCBI

[13] Google Scholar

[ref5] 5. Naish S, Dale P, Mackenzie JS, McBride J, Mengersen K, Tong S. Climate change and dengue: a critical and systematic review of quantitative modelling approaches. BMC Infect Dis. 2014;14:167. pmid:24669859
View Article
PubMed/NCBI
Google Scholar

[15] View Article

[16] PubMed/NCBI

[17] Google Scholar

[ref6] 6. Held L, Höhle M, Hofmann M. A statistical framework for the analysis of multivariate infectious disease surveillance counts. Stat Model. 2005;5(3):187–99.
View Article
Google Scholar

[19] View Article

[20] Google Scholar

[ref7] 7. Xu L, Stige LC, Chan K-S, Zhou J, Yang J, Sang S, et al. Climate variation drives dengue dynamics. Proc Natl Acad Sci U S A. 2017;114(1):113–8. pmid:27940911
View Article
PubMed/NCBI
Google Scholar

[22] View Article

[23] PubMed/NCBI

[24] Google Scholar

[ref8] 8. Held L, Meyer S, Bracher J. Probabilistic forecasting in infectious disease epidemiology. Biom J. 2017;59(2):242–58.
View Article
Google Scholar

[26] View Article

[27] Google Scholar

[ref9] 9. Meyer S, Held L, Höhle M. Spatio-temporal analysis of epidemic phenomena using the R package surveillance. J Stat Softw. 2017;77(11):1–55.
View Article
Google Scholar

[29] View Article

[30] Google Scholar

[ref10] 10. Gelman A. Scaling regression inputs by dividing by two standard deviations. Stat Med. 2008;27(15):2865–73. pmid:17960576
View Article
PubMed/NCBI
Google Scholar

[32] View Article

[33] PubMed/NCBI

[34] Google Scholar

[ref11] 11. Liu Z, Zhang Z, Lai Z. Temperature variability and dengue fever transmission in China: A multi-city study. Environ Res. 2020;182:109115.
View Article
Google Scholar

[36] View Article

[37] Google Scholar

[ref12] 12. Held L, Höhle M, Hofmann M. A statistical framework for the analysis of multivariate infectious disease surveillance counts. Biostatistics. 2006;7(4):531–48.
View Article
Google Scholar

[39] View Article

[40] Google Scholar

[ref13] 13. Hii YL, Rocklöv J, Ng N, Tang CS, Pang FY, Sauerborn R. Climate variability and increase in intensity and magnitude of dengue incidence in Singapore. Glob Health Action. 2009;2:10.3402/gha.v2i0.2036. pmid:20052380
View Article
PubMed/NCBI
Google Scholar

[42] View Article

[43] PubMed/NCBI

[44] Google Scholar

[ref14] 14. Lowe R, Gasparrini A, Van Meerbeeck CJ, Lippi CA, Mahon R, Trotman AR, et al. Nonlinear and delayed impacts of climate on dengue risk in Barbados: A modelling study. PLoS Med. 2018;15(7):e1002613. pmid:30016319
View Article
PubMed/NCBI
Google Scholar

[46] View Article

[47] PubMed/NCBI

[48] Google Scholar

Spatial representativeness matters for Climate-Driven Dengue Forecasting

Spatial representativeness matters for Climate-Driven Dengue Forecasting

This is an uncorrected proof.

Figures

Abstract

Author summary

1. Introduction

2. Methods

2.1. Study design and data

2.2. Descriptive characteristics and exploratory analysis

2.3. Feature construction and lag specification

2.4. Model specification

3. Bayesian inference

3.1. Negative binomial formulation

3.2. Prior specification

3.3. Model assessment

3.4. Sensitivity and robustness analyses

4. Results and discussion

4.1. Model adequacy and posterior predictive diagnostics

4.2. Out-of-sample forecasting performance

4.3. Discussion

4.3.1. Potential mechanisms of spatial representativeness.

4.3.2. Generalizability and regional implications.

4.3.3. Practical framework for dengue early warning systems.

4.3.4. Study limitations.

5. Conclusion

5.1. Ethics Approval and Consent to Participate

Acknowledgments

References