
Conceived and designed the experiments: LC. Analyzed the data: LC. Wrote the paper: MP LC.

The authors have declared that no competing interests exist.

Early warning systems (EWS) are management tools to predict the occurrence of epidemics of infectious diseases. While climate-based EWS have been developed for malaria, no standard protocol to evaluate and compare EWS has been proposed. Additionally, there are several neglected tropical diseases whose transmission is sensitive to environmental conditions, for which no EWS have been proposed, though they represent a large burden for the affected populations.

In the present paper, an overview of the available linear and non-linear tools to predict seasonal time series of diseases is presented. Also, a general methodology to compare and evaluate models for prediction is presented and illustrated using American cutaneous leishmaniasis, a neglected tropical disease, as an example. The comparison of the different models using the predictive R^{2} for forecasts of “out-of-fit” data (data that have not been used to fit the models) shows that, among the several linear and non-linear models tested, the best results were obtained for seasonal autoregressive (SAR) models that incorporate climatic covariates. An additional bootstrapping experiment shows that the relationship of the disease time series with the climatic covariates is strong and consistent for the SAR modeling approach. While the autoregressive part of the model is not significant, the exogenous forcing due to climate is always statistically significant. Prediction accuracy can vary from 50% to over 80% for disease burden at time scales of one year or shorter.

This study illustrates a protocol for the development of EWS that includes three main steps: (i) the fitting of different models using several methodologies, (ii) the comparison of models based on the predictability of “out-of-fit” data, and (iii) the assessment of the robustness of the relationship between the disease and the variables in the model selected as best with an objective criterion.

Early Warning Systems (EWS) are management tools to predict the occurrence of epidemics. They are based on the dependence of a given infectious disease on environmental variables. Although several neglected tropical diseases are sensitive to the effect of climate, our ability to predict their dynamics has been barely studied. In this paper, we use several models to determine if the relationship between cases and climatic variability is robust, that is, not simply an artifact of model choice. We propose that EWS should be based on results from several models that are to be compared in terms of their ability to predict the future number of cases. We use a specific metric for this comparison known as the predictive R^{2}, which measures the accuracy of the predictions. For example, an R^{2} of 1 indicates predictions that perfectly match observed cases. For cutaneous leishmaniasis, R^{2} values range from 72% to 77%, well above those of predictions using mean seasonal values (64%). We emphasize that predictability should be evaluated with observations that have not been used to fit the model. Finally, we argue that EWS should incorporate climatic variables that are known to have a consistent relationship with the number of observed cases.

One of the best documented patterns in the dynamics of vector-transmitted diseases is their periodicity at seasonal and interannual temporal scales

Despite the possible caveats of climate-based EWS, especially because of the complexity of human diseases for which social components can be as important as natural forces

Monthly records of ACL cases from January 1991 to December 2001 were obtained from the epidemic surveillance service Vigilancia de la Salud, of Costa Rica. The data were normalized using a square root transformation.

The temperature (

(A) Square root Transformed ACL Cases in Costa Rica. (B) Mean Temperature in Costa Rica. (C) MEI.

Several linear and non-linear models were fitted to the square root transformed case data. Brief descriptions follow of: (1) the approach to handle seasonality, (2) the types of models used, and (3) their classification as linear or non-linear.

To introduce seasonality, the strategy for all models was to include lags 12 and 13 of the transformed case data. This approach was chosen because the autoregressive treatment of seasonality is known to be the best approximation to the asymptotic cyclical structure of a time series
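The lag-based treatment of seasonality described above can be sketched as a regression on lagged values of the series. The following is a minimal illustration, not the fitting procedure used in the study: the function names `sar_design` and `fit_sar` are hypothetical, the coefficients are estimated by unconstrained least squares (an assumption), and the default lags and the covariate lag in the docstring example are taken from the figure captions only for illustration.

```python
import numpy as np

def sar_design(y, lags=(1, 12, 13), covariates=None):
    """Return (X, target) for a seasonal autoregression fitted by least squares.

    y          : monthly series (e.g. square-root transformed cases)
    lags       : lags of y used as predictors; 12 and 13 carry the seasonality
    covariates : optional dict name -> (series, lag), e.g. {"T": (temp, 4)}
    """
    y = np.asarray(y, dtype=float)
    covariates = covariates or {}
    max_lag = max(list(lags) + [lag for _, lag in covariates.values()])
    t = np.arange(max_lag, len(y))            # months with every lag available
    cols = [np.ones(len(t))]                  # intercept
    cols += [y[t - k] for k in lags]          # y_{t-k}
    for series, lag in covariates.values():   # x_{t-lag}
        cols.append(np.asarray(series, dtype=float)[t - lag])
    return np.column_stack(cols), y[t]

def fit_sar(y, lags=(1, 12, 13), covariates=None):
    """Unconstrained least-squares coefficients (assumption: a multiplicative
    SAR model would instead constrain the lag-13 coefficient)."""
    X, target = sar_design(y, lags, covariates)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return beta
```

The returned coefficients are ordered as intercept, one entry per lag of the series, then one entry per covariate in the order given.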

In this class of models, parameters have a linear relationship with the response variable

In these models, the relationship between the response and the parameters for the predictors is not constrained to be linear. Models include NLF, generalized additive models (GAM), and feed-forward neural networks (FNN). A description of the methods (linear and non-linear) and of the fitted models can be found in

For all models, forecasts were obtained for prediction time intervals of 1, 3, 6, and 12 months ahead for a total of 24 months each. Each model was refitted recurrently before computing the next prediction by including all the previous months in the series. Forecast accuracy was measured with the predictive R^{2}, which has an interpretation similar to the R^{2} of a linear regression; by definition, R^{2} = 1 - (mean square error/variance of the series). Thus, the errors are normalized by the variance of the time series; an R^{2} of 1 indicates perfect forecasts, while a value close to 0 or negative indicates poor predictability. Forecasting accuracy was tested for all the fitted models. To establish a baseline for comparison, the predictive R^{2} was also computed for predictions given by the monthly mean value of the transformed time series.
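The evaluation scheme above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: the function names are hypothetical, the variance in the denominator is taken over the held-out observations (the text says only "variance of the series"), and index 0 of the series is assumed to be a fixed calendar month so that same-month values sit 12 positions apart.

```python
import numpy as np

def predictive_r2(observed, predicted):
    """1 - (mean squared forecast error / variance of the observed values)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mse = np.mean((observed - predicted) ** 2)
    return 1.0 - mse / np.var(observed)

def monthly_mean_forecasts(y, n_test=24, horizon=1):
    """Baseline forecasts for the last `n_test` months of y.

    Each month t is predicted by the mean of the same calendar month, using
    only data up to month t - horizon (the window grows at each step).
    """
    y = np.asarray(y, dtype=float)
    preds = []
    for t in range(len(y) - n_test, len(y)):
        history = y[: t - horizon + 1]        # data available at forecast time
        same_month = history[t % 12 :: 12]    # earlier values of this calendar month
        preds.append(same_month.mean())
    return np.array(preds)
```

Any other model can be evaluated the same way by replacing the baseline with a function that refits on `history` and returns its forecast for month `t`; the resulting values are then passed to `predictive_r2` together with the held-out observations.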

Once the best modeling approach was selected, the robustness of the association between the cases and the exogenous forces R^{2} was used. The bootstrap was initially used to see the frequency (%) of times that the model from which we generated the bootstrap samples was actually selected as the best model, using the Akaike information criterion R^{2} confidence intervals.
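The idea of counting how often the generating model is re-selected by the Akaike information criterion can be illustrated with a residual bootstrap. The sketch below is an assumption-laden simplification, not the procedure of the study: it uses a first-order autoregression with one unlagged covariate, resamples centered residuals with replacement, and compares only two candidate models (with and without the covariate). The names `ols_aic` and `selection_frequency` are hypothetical.

```python
import numpy as np

def ols_aic(X, y):
    """Least-squares fit; returns (coefficients, residuals, Gaussian AIC)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    aic = n * np.log(np.mean(resid ** 2)) + 2 * (k + 1)  # +1 for the error variance
    return beta, resid, aic

def selection_frequency(y, x, n_boot=200, seed=0):
    """Fraction of bootstrap series for which the model with the covariate
    (the generating model) has a lower AIC than the model without it."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)

    def design(series, with_x):
        cols = [np.ones(len(series) - 1), series[:-1]]    # intercept, y_{t-1}
        if with_x:
            cols.append(x[1:])                            # covariate x_t
        return np.column_stack(cols), series[1:]

    beta, resid, _ = ols_aic(*design(y, True))
    resid = resid - resid.mean()                          # centered residuals
    wins = 0
    for _ in range(n_boot):
        e = rng.choice(resid, size=len(y) - 1, replace=True)
        yb = np.empty_like(y)
        yb[0] = y[0]
        for t in range(1, len(y)):                        # simulate from the fitted model
            yb[t] = beta[0] + beta[1] * yb[t - 1] + beta[2] * x[t] + e[t - 1]
        aic_full = ols_aic(*design(yb, True))[2]
        aic_null = ols_aic(*design(yb, False))[2]
        wins += int(aic_full < aic_null)
    return wins / n_boot
```

A frequency close to 1 suggests that the covariate effect is recovered consistently across resampled series; the same loop can also collect the refitted coefficients to form percentile confidence intervals.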

(A) Autoregressive (y_{t−1}) and Seasonal (y_{t−12}) components. (B) Seasonal (y_{t−12}) and Autoregressive Seasonal (y_{t−13}) components. (C) Autoregressive component (y_{t−1}) and Temperature (lag 4, T_{t−4}). (D) Autoregressive component (y_{t−1}) and MEI (lag 13, MEI_{t−13}).

| Model | 1 month | 3 months | 6 months | 12 months |
|---|---|---|---|---|
| NLF (E = 2) | 0.69 | 0.62 | 0.61 | 0.66 |
| NLF (E = 3) | 0.67 | 0.60 | 0.59 | 0.67 |
| NLF (E = 4) | 0.66 | 0.59 | 0.58 | 0.66 |
| FNN (2 Layers) | 0.55 | 0.53 | 0.44 | 0.44 |
| FNN (3 Layers) | 0.62 | 0.58 | 0.61 | 0.60 |
| SAR (null) | 0.71 | 0.64 | 0.62 | 0.57 |
| SAR (MEI) | 0.73 | 0.67 | 0.67 | 0.66 |
| SAR (MEI+T) | 0.77 | 0.73 | 0.73 | 0.72 |
| BSM | 0.69 | 0.59 | 0.52 | 0.65 |
| GAM (MEI) | 0.66 | 0.59 | 0.56 | 0.57 |
| GAM (MEI+T) | 0.73 | 0.68 | 0.67 | 0.68 |
| MEAN | 0.64 | 0.64 | 0.64 | 0.64 |

For model identification, see common abbreviations. Months indicate the number of months predicted ahead. MEAN indicates the predictive R^{2} obtained by using only the monthly average number of cases as the prediction.
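As a reading aid for the table, the comparison against the MEAN baseline can be made explicit. The short sketch below simply re-enters the tabulated values and lists the models that exceed the baseline at every forecast horizon, together with the top-scoring model at each horizon; the helper names are hypothetical.

```python
HORIZONS = (1, 3, 6, 12)  # months ahead
R2 = {
    "NLF (E = 2)":    (0.69, 0.62, 0.61, 0.66),
    "NLF (E = 3)":    (0.67, 0.60, 0.59, 0.67),
    "NLF (E = 4)":    (0.66, 0.59, 0.58, 0.66),
    "FNN (2 Layers)": (0.55, 0.53, 0.44, 0.44),
    "FNN (3 Layers)": (0.62, 0.58, 0.61, 0.60),
    "SAR (null)":     (0.71, 0.64, 0.62, 0.57),
    "SAR (MEI)":      (0.73, 0.67, 0.67, 0.66),
    "SAR (MEI+T)":    (0.77, 0.73, 0.73, 0.72),
    "BSM":            (0.69, 0.59, 0.52, 0.65),
    "GAM (MEI)":      (0.66, 0.59, 0.56, 0.57),
    "GAM (MEI+T)":    (0.73, 0.68, 0.67, 0.68),
}
BASELINE = (0.64, 0.64, 0.64, 0.64)  # MEAN row

def beats_baseline(table, baseline):
    """Models whose value is strictly above the baseline at every horizon."""
    return [name for name, values in table.items()
            if all(v > b for v, b in zip(values, baseline))]

def best_per_horizon(table):
    """Model with the highest value at each forecast horizon."""
    return {h: max(table, key=lambda name: table[name][i])
            for i, h in enumerate(HORIZONS)}

print(beats_baseline(R2, BASELINE))
print(best_per_horizon(R2))
```

Only the three models that include temperature or MEI as covariates stay above the baseline at all four horizons, and SAR (MEI+T) has the highest value in every column.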

The predictive R^{2} was highest for the SAR model with

(A) 95% confidence intervals for the parameters of the best model. AR stands for the autoregressive component of the model. R^{2} and the 95% confidence intervals, indicated by stars, for the bootstrapped best model and prediction interval.

The need for forecasts by policy makers goes well beyond the development of EWS for diseases. Due to large-scale, rapid changes, from increased average temperatures to extensive land use changes, major alterations in biogeochemical cycles, water availability, food production, biodiversity and diseases are already occurring and likely to be exacerbated in the future

In this paper, we have presented several methods to study seasonal time series, and used a simple measure, the predictive R^{2}, to compare models based on their ability to predict future dynamics rather than their goodness of fit to past data. By comparison with modeling results for other infectious diseases on the predictability of NLF methods

One of the main lessons from the study of populations is that non-linear dynamics are common in nature but often satisfactorily captured by linear approximations R^{2} for NLF with E = 3 does not vary with the prediction time step, while this value for the SAR model without covariates decreases abruptly, as expected in systems where the dynamics are non-linear

This result also highlights two open questions that need to be addressed when modeling infectious diseases transmitted by vectors: first, the appropriate functional form to introduce climate variables into the dynamics

A factor that deserves further consideration in developing EWS is the understanding of the role of space. Predictability at more local scales was not addressed here because half of the series was only available at levels below that of the whole country, and because Costa Rica encompasses a small area for which temperature variability is quite homogeneous, as seen in the very small variability between temperature grids. However, for larger spatial scales, heterogeneities in the landscape for disease transmission would need to be considered

EWS are a feasible ecological application for neglected tropical diseases, as illustrated for ACL. Available models have good levels of predictability up to one year ahead for the number of cases. Predictability strongly depends on the use of an appropriate structure for the different components of the model, including seasonality and exogenous drivers such as climatic variables. Depending on the model, predictability can range from poor, with approximately 50% accuracy, to high, with 80% accuracy, which is better than that of seasonal averages (about 65%). Forecasts can be useful in planning services for the affected populations, allowing estimates of the approximate number of hospital beds, vaccine shots, drug doses and vector control measures. If EWS need to incorporate the spatial spread of the disease, they should do so dynamically and in relation to different landscapes, such as the geopolitical unit of this study or regions with similar climatic patterns

Linear and non-linear models for time series forecasting.


We thank Justin M. Cohen, Inés Széliga, Andy Dobson and Mark Wilson for their comments and suggestions during early stages of this work. Suggestions by a referee also helped in strengthening the discussion.
