Deep learning models for forecasting dengue fever based on climate data in Vietnam

Background. Dengue fever (DF) represents a significant health burden in Vietnam, which is forecast to worsen under climate change. The development of an early-warning system for DF has been selected as a prioritised health adaptation measure to climate change in Vietnam.
Objective. This study aimed to develop an accurate DF prediction model in Vietnam using a wide range of meteorological factors as inputs to inform public health responses for outbreak prevention in the context of future climate change.
Methods. Convolutional neural network (CNN), Transformer, long short-term memory (LSTM), and attention-enhanced LSTM (LSTM-ATT) models were compared with traditional machine learning models on weather-based DF forecasting. Models were developed using lagged DF incidence and meteorological variables (measures of temperature, humidity, rainfall, evaporation, and sunshine hours) as inputs for 20 provinces throughout Vietnam. Data from 1997–2013 were used to train models, which were then evaluated using data from 2014–2016 by Root Mean Square Error (RMSE) and Mean Absolute Error (MAE).
Results and discussion. LSTM-ATT displayed the highest performance, scoring average places of 1.60 for RMSE-based ranking and 1.95 for MAE-based ranking. Notably, it was able to forecast DF incidence better than LSTM in 13 or 14 out of 20 provinces for MAE or RMSE, respectively. Moreover, LSTM-ATT was able to accurately predict DF incidence and outbreak months up to 3 months ahead, though performance dropped slightly compared to short-term forecasts. To the best of our knowledge, this is the first time deep learning methods have been employed for the prediction of both long- and short-term DF incidence and outbreaks in Vietnam using unique, rich meteorological features.
Conclusion. This study demonstrates the usefulness of deep learning models for meteorological factor-based DF forecasting. LSTM-ATT should be further explored for mitigation strategies against DF and other climate-sensitive diseases in the coming years.


Introduction
Dengue fever (DF) is a climate-sensitive, vector-borne disease caused by the dengue virus, transmitted primarily by Aedes aegypti and Aedes albopictus mosquitoes [1]. Ae. aegypti are particularly suited to urban environments, where there is an abundance of human hosts, few predators, and a wide range of potential breeding sites such as drains, tires, and water containers [2]. Symptoms of DF include flu-like symptoms such as fever, headache, joint pain, nausea, and vomiting. Severe DF (dengue haemorrhagic fever) can be fatal and may present with plasma leakage, respiratory distress, organ damage, and internal bleeding [3]. Vietnam experienced an average of 80,938 reported confirmed DF cases annually during the period from 1997-2016, representing a significant impact on public health. The burden of DF is forecast to worsen throughout the country, and temperatures in the whole country (especially southern and central regions) are predicted to become significantly more suited to DF transmission due to climate change [4]. Therefore, an effective early-warning system for DF will help to inform public health responses for outbreak prevention and has been identified as one of the prioritized health adaptation measures to climate change in Vietnam [4].
Previous studies have attempted to elucidate the relationships between meteorological factors (i.e., weather factors) and DF incidence in Vietnam and other affected countries [5][6][7][8][9][10][11][12][13]. Such research is useful in designing effective DF forecasting models. Multiple studies have found a positive correlation between precipitation and DF, with a lag time of 0-3 months between high rainfall and a rise in case numbers [5][6][7][8][9]. However, others found no significant correlation [10,11] or a negative association at a 2-month lag [12]. In the studies examined, minimum temperature was consistently reported as positively correlated with DF incidence at 1-2 month lags [8,10,12,13]. Average monthly or weekly temperature was reported as positively correlated at 0-2.5 month lags [5][6][7][9] or not significantly associated [11]. Temperature and rainfall received the most coverage, but other analyses involved humidity, evaporation, sunshine hours, wind speed, and El Niño events. Relative humidity was reported as positively associated with DF in the same month by some studies [7,9,12] and negatively correlated by others [11]. When relative or minimum humidity was lagged by 1-3 months, it was only reported as positively correlated [8,12]. Sunshine hours were reported as both positively correlated [11] and inversely correlated [7] with DF incidence. Wind speed was found to be inversely correlated with DF cases for the same month [12]. Positive associations with DF were also found for same-month average evaporation [11] and El Niño events [10]. The regular findings of significant associations between meteorological factors and DF suggest that they may be useful predictors in forecasting DF incidence. However, the differences in findings also indicate that these relationships may be location-specific.
In this study, we focused on deep learning models due, in part, to their advantages over traditional approaches. There are several limitations which traditional machine learning (ensemble and statistical) models face. Firstly, missing data can considerably decrease the performance of the models. Secondly, traditional models are not always able to discern complex patterns in the data. Thirdly, they are not able to work well in long-term forecasting applications. Finally, feature engineering in traditional models is carried out manually. In contrast, deep learning models can overcome the obstacles of traditional models through learning features directly from the data and learning much more complex data patterns in a more specific way [25,26].
As in many countries worldwide, there is no early-warning system in place for the prediction of DF in Vietnam. Such a system was identified as one of the prioritized adaptation measures of Vietnam in the "Climate change response action plan of the health sector in the 2019-2030, vision to 2050" [4]. Thus, the development of a DF early-warning system has the potential to significantly reduce national morbidity and mortality. Some existing studies have built DF prediction models for various provinces in Vietnam [8,21,27]. However, these have mainly focused on the Mekong delta area in the southern region of Vietnam. These prediction models have been either univariate, based on DF data alone, or multivariate, based on common meteorological factors: temperature, humidity, and rainfall. More recently, Colón-González et al. [28] developed a superensemble of Bayesian generalised linear mixed models for DF forecasting up to six months in advance. The model was evaluated on all 63 provinces in Vietnam, using weather and land cover variables as predictors. To the best of our knowledge, no DF forecasting models have been developed in Vietnam using advanced deep learning techniques such as LSTM. LSTM shows promising predictive accuracy when compared to other machine learning techniques in DF forecasting elsewhere [14,23] as well as in many other real-world problems [25][29][30][31]. This study aimed to develop an accurate prediction model for DF in Vietnam, using a wide range of weather factors as input variables.
Contributions. In this paper, advanced deep learning methods (CNN, Transformer, LSTM, and attention mechanism-enhanced LSTM (LSTM-ATT) models) were trained and evaluated on DF rates and 12 different meteorological variables (measures of temperature, humidity, rainfall, evaporation, and sunshine hours) from 1997 to 2016 in 20 of Vietnam's 63 provinces. Given the varying response of dengue incidence to meteorological factors observed in the literature across different locations and for different time lags, we trained the models for each province separately. To the best of our knowledge, this paper is the first to employ deep learning techniques to predict both long-term (three months ahead) and short-term (one month ahead) DF incidence and epidemic months in Vietnam. We evaluated our methods on a large number of provinces in Vietnam: 20 provinces spanning three regions with different geographical and climate conditions. From this evaluation, LSTM-ATT was found to outperform competing models and accurately forecast DF incidence throughout Vietnam.

Study design and study site
This was a retrospective ecological study conducted in Vietnam. Vietnam is located in Southeast Asia, with a high level of exposure to climate-related hazards and extreme weather events. The Global Climate Risk Index 2020 ranked Vietnam as the sixth country in the world most affected by climate variability and extreme weather events over the period of 1999-2018 [32]. Vietnam has three main regions, Northern, Central and Southern Vietnam, which have distinctive geographical, meteorological, historical, and cultural qualities. Each region consists of subregions with further cultural and climate differences. Northern Vietnam has a humid subtropical climate with a full four seasons and much cooler temperatures than the South, which has a tropical savanna climate. Winters in the North can get quite cold, sometimes with frost and even snowfall. Snow can even be found to an extent up in the mountains of the extreme Northern regions, such as in Sapa and Lang Son province in recent years. Southern Vietnam is usually much hotter and has only two main seasons: a dry season and a rainy season. Climate change is projected to increase temperatures throughout the country as well as the severity and frequency of extreme weather events, which in turn would increase the number of people at risk of climate-sensitive diseases such as DF [4]. Under Representative Concentration Pathway 4.5, more frequent severe typhoons and droughts, longer monsoon seasons, and a sea-level rise of 55 cm are projected by the end of the 21st century. Temperatures are forecast to rise by approximately 2.2˚C in northern regions and 1.8˚C in southern regions, and annual rainfall by 5-15 mm. These changes in climate conditions are projected to significantly worsen the impact of DF and other communicable diseases in Vietnam [4], thus leading to the development of early-warning systems for them.

Data
DF is one of the prioritized climate-sensitive diseases in Vietnam. Monthly incident confirmed cases and deaths for DF in 20 provinces/cities (belonging to three main regions in Vietnam: North, Central, and South) from 1997 to 2016 were provided by the National Institute of Hygiene and Epidemiology (NIHE), which was responsible for the accuracy of the information in the database. There were 1,618,767 notified cases of DF from 1997 to 2016, with on average, about 80,938 cases per year (or 110 cases per 100,000 population). There were 1389 deaths from DF in this period with most of the deaths occurring before 2000. In 1998, the death rate of DF was especially high at 0.5 per 100,000. The incidence of DF and mortality rates increased as temperature increased and the rates in June to October were higher than in other months. Average yearly DF incidence rates were lower in northern Vietnam from 1997 to 2016, and peaked in central and southern provinces where the climate is hotter, rainier, and more humid (Fig 1). These conditions are advantageous to the spread of DF. Hence, including these meteorological factors into prediction models has the potential to improve prediction accuracy as demonstrated in previous works [8,14,23,24].
For weather data, 12 meteorological factors in the same period were collected, including measures of temperature, rainfall, humidity, evaporation, and sunshine hours (Table 1). Sunshine hours refers to the number of hours with the intensity of direct solar radiation reaching the surface equal to or greater than 0.2 calories/cm²/minute. Surface is defined as 2 m above the ground. Thus, if there are thin clouds, but the solar radiation measured at the surface is greater than 0.2 calories/cm²/minute, it will still be counted as sunny time. The data were provided by the Vietnam Institute of Meteorology, Hydrology and Climate Change (IMHEN).

Forecasting models
Since the raw datasets were in various formats, they had to be pre-processed and prepared for building prediction models (Fig 2).
Data pre-processing. The first step was to clean the data to ensure data integrity before building prediction models. Our datasets contained a few missing datapoints for some provinces. Missing values were imputed using the minimum value from the same month of the previous two years. In preliminary experiments, this scheme gave better prediction performance on our data than other common methods such as zero and mean substitution. Since the data contained many different features (12 weather factors and DF incidence) with different value ranges, normalisation was required. For example, total rainfall ranged from 0 mm to 3207 mm, while average temperature ranged from 3.8˚C to 31.8˚C. We normalised each feature to the range [0, 1] using min-max scaling to ensure all features were treated equally in the prediction models. Moreover, rather than predicting the number of DF cases each month, we predicted the incidence rate per 100,000 population to avoid the effect of population changes over time, including past province expansions (e.g., the merger of Ha Noi and Ha Tay in 2008).
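The pre-processing steps described above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the fallback to 0.0 when no earlier same-month value exists, and scaling over whatever values are passed in (rather than, say, training data only) are all assumptions made for illustration.

```python
from typing import List, Optional


def impute_same_month_min(series: List[Optional[float]]) -> List[float]:
    """Fill gaps with the minimum of the same calendar month in the previous two years.

    `series` is a monthly series in chronological order; None marks a missing value.
    Falling back to 0.0 when no earlier value exists is an assumption of this sketch.
    """
    filled: List[float] = []
    for i, value in enumerate(series):
        if value is None:
            # Same month one and two years earlier (12 and 24 positions back).
            candidates = [filled[i - lag] for lag in (12, 24) if i - lag >= 0]
            value = min(candidates) if candidates else 0.0
        filled.append(value)
    return filled


def min_max_scale(values: List[float]) -> List[float]:
    """Rescale values linearly to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def incidence_per_100k(cases: float, population: float) -> float:
    """Convert a monthly case count to an incidence rate per 100,000 population."""
    return cases * 100_000 / population


# Illustrative three-year monthly series with one missing value (made-up numbers).
monthly = [float(m) for m in range(1, 37)]
monthly[30] = None
filled = impute_same_month_min(monthly)   # gap filled from positions 18 and 6
scaled = min_max_scale([0.0, 1603.5, 3207.0])
rate = incidence_per_100k(110, 100_000)
```

In this example the gap at position 30 is filled with the smaller of the values 12 and 24 months earlier, and the rainfall extremes quoted in the text (0 mm and 3207 mm) map to 0 and 1.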
Feature selection. For each province, we used a Random Forest Regressor from the Scikit-learn Python library (version 0.24.2) [33] to rank the importance of all meteorological factors using Recursive Feature Elimination (RFE) and chose the top two features as input for the prediction models. In this method, RFE was first trained on all meteorological factors as predictors of DF incidence using random forest regression, then the least important meteorological factor was removed. This process was repeated recursively until only the two most important features remained. This helped to improve the model's efficiency and effectiveness by avoiding overfitting caused by too many input features. The full list of features for each province can be found in S1 Table.

Table 1.

| Meteorological factor | Unit | Measurement methods/detailed description of climate factors |
|---|---|---|
| Average monthly temperature | ˚C | These factors were measured in a meteorological tent at a height of 2 m, with a frequency of four times per day. In the tent, 3 specialized thermometers were placed to measure the average temperature, the maximum temperature, and the minimum temperature. The average daily temperature value was calculated as the average of four measurements (1 am; 7 am; 1 pm; 7 pm). Thus, each day had an average temperature value, a maximum temperature value, and a minimum temperature value, from which monthly data were calculated. |
| Minimum average monthly temperature | ˚C | See average monthly temperature. |
| Monthly absolute maximum temperature | ˚C | See average monthly temperature. |
| Monthly absolute minimum temperature | ˚C | See average monthly temperature. |
| Monthly rainfall | mm | Rainfall was also measured by WMO's specialized meter, placed in a meteorological garden (close to the meteorological tent), with a measurement frequency of four times per day. Total rainfall per day was calculated as the sum of four measurements. Total monthly rainfall was then calculated from the daily rainfall values. |
| Highest daily rainfall per month | mm | Selected from the series of daily rainfall in a month. |
| Number of rainy days per month | Days | Calculated from the series of daily rainfall. Number of rainy days per month is the total number of days with rainfall greater than 0 mm. |
| Monthly average relative humidity | % | Humidity was also measured in a weather tent according to WMO standards with a measurement frequency of four times per day (1 am; 7 am; 1 pm; 7 pm). The average daily relative humidity value was calculated as the average of these four measurements. From the daily data series, monthly average relative humidity was calculated. |
| Monthly minimum relative humidity | % | Daily minimum relative humidity was selected from the four measurements. From the daily data series, monthly minimum relative humidity was calculated. |
| Monthly evaporation | mm | Evaporation was also measured in a meteorological tent according to WMO standards with a measurement frequency of two times per day (7 am and 7 pm). Daily evaporation was calculated as the sum of these 2 measurements. From the daily data series, monthly evaporation was calculated. |
| Total monthly sunshine hours | Hours | Similar to the other factors, sunshine hours were measured with a specialized meter according to WMO standards, placed in a meteorological garden to measure the total number of sunshine hours per day. From the series of daily data, the total monthly sunshine hours were calculated. |

Data for each factor were collected from 1997 to 2016. WMO = World Meteorological Organization. https://doi.org/10.1371/journal.pntd.0010509.t001

Performance evaluation. Models were evaluated for predictions made one to three months (steps) in advance. Multi-step prediction refers to forecasts made more than one month in advance. We split our data into a training set (1997 to 2013, 17 years in total) and a testing set (2014 to 2016, 3 years in total) for each province. The training data were used as input to fit the parameters of the prediction models. We used RMSE and MAE as the two main measures to evaluate how our forecasted incidence rates compared to the real ones in the test set for each province. In this context, MAE can be interpreted as the average absolute difference between predicted and actual DF rates over the three-year test set. MAE computes the mean of the absolute errors between predicted values and corresponding real values as follows:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|,$$

where $y_i$ is an actual value, $\hat{y}_i$ is a predicted value, and $N$ is the number of predictions. MAE weights errors in proportion to their magnitude. RMSE, in contrast, weights larger errors more heavily than smaller errors. RMSE computes the square root of the mean of squared errors between predicted values and corresponding real values.

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2},$$

where $y_i$ is an actual value, $\hat{y}_i$ is a predicted value, and $N$ is the number of predictions.
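As a small worked illustration of the two error measures, the sketch below computes MAE and RMSE for two hypothetical forecasts of the same four months. The numbers are made up for illustration and are not results from this study.

```python
import math
from typing import Sequence


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of the absolute differences between actual and predicted values."""
    return sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Square root of the mean of the squared differences."""
    squared = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical incidence rates per 100,000 (illustrative only).
actual = [10.0, 20.0, 30.0, 40.0]
forecast_a = [12.0, 18.0, 32.0, 38.0]   # four small errors of size 2
forecast_b = [10.0, 20.0, 30.0, 48.0]   # one large error of size 8

print(mae(actual, forecast_a), rmse(actual, forecast_a))
print(mae(actual, forecast_b), rmse(actual, forecast_b))
```

Both forecasts have the same MAE, but the forecast with a single large miss has a higher RMSE, which is the sense in which RMSE weights larger errors more heavily.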
Generally, lower scores in these RMSE and MAE metrics indicate a better forecasting model. As RMSE weights larger errors more than MAE, a forecast with lower RMSE and higher MAE than competing models would likely have more small-scale errors but fewer large-scale errors.
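Returning to the feature selection step described earlier in this section, recursive feature elimination with a random forest can be sketched with Scikit-learn as follows. This is a minimal sketch on synthetic data: the array shapes, the number of trees, the random seed, and the synthetic target are assumptions for illustration, not the study's configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE

rng = np.random.default_rng(0)
n_months, n_factors = 204, 12            # 17 training years x 12 months, 12 weather factors
X = rng.random((n_months, n_factors))    # synthetic stand-in for scaled weather factors
# Synthetic target driven mainly by factors 0 and 3 (purely illustrative).
y = 3.0 * X[:, 0] + 2.0 * X[:, 3] + 0.1 * rng.standard_normal(n_months)

selector = RFE(
    estimator=RandomForestRegressor(n_estimators=50, random_state=0),
    n_features_to_select=2,   # keep the two most important factors
    step=1,                   # remove one least-important factor per round
)
selector.fit(X, y)

selected = np.flatnonzero(selector.support_)   # column indices of the kept factors
print("selected factor indices:", selected)
print("elimination ranking (1 = kept):", selector.ranking_)
```

With `step=1`, the forest is refitted after each removal, so the ranking reflects importance conditional on the factors still in play rather than a single importance pass.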
Outbreak detection. The ability to correctly categorise months as either outbreak (i.e., epidemic) or non-outbreak (i.e., normal) months was assessed for the LSTM-ATT model. We set an epidemic threshold for each province by using the monthly mean and standard deviation of incidence rates for that province, as in previous works [34,35]. An outbreak month is defined by an incidence rate exceeding the mean by n standard deviation(s). We set n = 1 in our study to capture both medium and large outbreak months. Four metrics were used to assess epidemic detection, as defined below. Firstly, accuracy is defined as the ability of a model to correctly categorise future months as normal or outbreak months. Secondly, precision refers to the ratio of correctly detected outbreak months to the number of predicted outbreak months. Thirdly, sensitivity refers to the proportion of outbreak months that were correctly predicted. Finally, specificity is defined as the ratio of correctly detected normal months to the total number of normal months.
Forecasting models. Artificial neural networks (ANNs) [36] are a type of computational model that imitates the information processing achieved by neurons in the human brain by making the right connections among nodes [29]. An ANN consists of three parts: a layer of input nodes, layers of hidden nodes, and a layer of output nodes. ANNs are able to successfully map nonlinear input to output by automatically extracting subtle patterns and multiple features from a large dataset through each layer. Modern ANNs have achieved state-of-the-art results in previous DF studies in different regions with different meteorological and geographic data, such as in China [22,23] and Kuala Lumpur, Malaysia [14]. Thus, in this paper, we focus on adapting these advanced prediction models to predict DF rates for Vietnam, through the use of CNNs [29], LSTM models [37] with and without attention mechanisms [30,31], and a Transformer model [30]. Additionally, a selection of traditional machine learning models (Poisson regression [38], XGBoost [39], Support Vector Regression (SVR and SVR-L) [40], and Seasonal AutoRegressive Integrated Moving Average (SARIMA) [41]) were included for comparison. Our prediction methods take DF rate and some selected weather factors as inputs and output the forecasted DF incidence rates for the next k consecutive months (Fig 2). In this paper, we fixed k = 3 for forecasting future DF incidence up to 3 months ahead in 20 provinces. However, we also tested with k = 6 in Hanoi to provide an extended example.
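The input and output structure described above (past DF rates and selected weather factors in, the next k monthly rates out) can be sketched as a sliding-window transformation. The function name, the column layout, and the lookback length are assumptions of this sketch rather than details taken from the study's code.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[List[List[float]], List[float]]


def make_windows(
    rows: Sequence[Sequence[float]],
    target_col: int,
    lookback: int,
    horizon: int,
) -> List[Sample]:
    """Turn a monthly table into (input window, next-k targets) pairs.

    Each row holds one month: the DF rate plus the selected weather factors.
    The input window is `lookback` consecutive months; the targets are the DF
    rates of the following `horizon` months.
    """
    samples: List[Sample] = []
    last_start = len(rows) - lookback - horizon
    for start in range(last_start + 1):
        window = [list(r) for r in rows[start:start + lookback]]
        targets = [
            rows[t][target_col]
            for t in range(start + lookback, start + lookback + horizon)
        ]
        samples.append((window, targets))
    return samples


# Eight illustrative months: column 0 is the DF rate, columns 1-2 are weather factors.
rows = [[float(t), 0.5, 0.2] for t in range(8)]
samples = make_windows(rows, target_col=0, lookback=3, horizon=3)
print(len(samples), samples[0][1])
```

With eight months, a lookback of 3 and k = 3, this yields three samples; the first uses months 0 to 2 as input and months 3 to 5 as targets.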

CNNs: The development of CNNs was a breakthrough in ANNs, as they approached human performance in a wide range of domains, including pattern recognition, natural language processing, and video processing, by processing data with a grid-like topology [25,29]. We therefore adapted CNN models to cope with longitudinal data. Our CNN model consisted of a 1D convolution layer, a 1D max pooling layer, and one fully connected layer.
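A 1D CNN of the shape described above (one convolution layer, one max pooling layer, one fully connected layer) might look roughly like the following PyTorch sketch. The class name, the single kernel size of 2, the ReLU activation, and the choice of three input features are assumptions for illustration; the study's own kernel configuration is reported under Model Implementation and is not reproduced here.

```python
import torch
from torch import nn


class DengueCNN(nn.Module):
    """Conv1d -> max pooling over time -> fully connected layer (illustrative)."""

    def __init__(self, n_features: int = 3, lookback: int = 3, n_kernels: int = 100,
                 kernel_size: int = 2, horizon: int = 3, dropout: float = 0.1):
        super().__init__()
        self.conv = nn.Conv1d(n_features, n_kernels, kernel_size)
        conv_len = lookback - kernel_size + 1           # time steps left after convolution
        self.pool = nn.MaxPool1d(kernel_size=conv_len)  # pool across all remaining steps
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(n_kernels, horizon)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, lookback, n_features); Conv1d expects (batch, channels, time).
        x = x.transpose(1, 2)
        x = torch.relu(self.conv(x))
        x = self.pool(x).squeeze(-1)                    # (batch, n_kernels)
        return self.fc(self.dropout(x))                 # (batch, horizon)


model = DengueCNN()
batch = torch.zeros(16, 3, 3)    # 16 samples, 3 past months, 3 input features
forecast = model(batch)
print(forecast.shape)
```

Treating the input features as channels and the months as the convolution axis is one common way to apply a CNN to a multivariate time series; other layouts are possible.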
LSTM: Recurrent neural networks (RNNs) are another variant of ANNs specifically designed to cope with time-ordered data, where nodes are connected as a directed graph along a temporal sequence [42]. In this paper, we focused on LSTM [37], one of the most successful variants of RNNs, specifically designed to deal with longer dependencies in sequences [43] and to mitigate vanishing and exploding gradients. Unlike standard RNNs, instead of adding regular neural units (i.e., hidden layers), LSTM adds memory blocks. A common LSTM memory block consists of a cell state and three gates: an input gate, a forget gate, and an output gate.
LSTM-ATT: LSTMs can lose important information due to passing information across multiple sequence steps. To deal with this limitation, attention mechanisms were originally introduced in Machine Translation [31,33] to strengthen the power of exploiting information by generating an output at each sequence step. They have proven to be an effective approach for long input sequences. For this reason, we employed the attention technique from Luong et al. [31] to further enhance the performance of LSTM in this paper by adding an attention layer after the LSTM network, denoted as LSTM-ATT.
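The idea of placing an attention layer after the LSTM can be sketched in PyTorch as below. This is a minimal sketch assuming Luong-style global attention with a dot-product score and the last hidden state as the query; the class name, layer sizes, and the exact way the context vector is combined are assumptions, and the authors' architecture may differ in detail.

```python
import torch
from torch import nn


class LSTMAttention(nn.Module):
    """LSTM encoder followed by dot-product attention over its hidden states."""

    def __init__(self, n_features: int = 3, hidden_size: int = 32,
                 num_layers: int = 1, horizon: int = 3):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, num_layers, batch_first=True)
        self.combine = nn.Linear(2 * hidden_size, hidden_size)
        self.out = nn.Linear(hidden_size, horizon)

    def forward(self, x: torch.Tensor):
        states, _ = self.lstm(x)                  # (batch, time, hidden)
        query = states[:, -1, :]                  # last hidden state, (batch, hidden)
        # Dot-product score between the query and every time step.
        scores = torch.bmm(states, query.unsqueeze(2)).squeeze(2)     # (batch, time)
        weights = torch.softmax(scores, dim=1)
        # Context vector: attention-weighted sum of the hidden states.
        context = torch.bmm(weights.unsqueeze(1), states).squeeze(1)  # (batch, hidden)
        attn_hidden = torch.tanh(self.combine(torch.cat([context, query], dim=1)))
        return self.out(attn_hidden), weights


model = LSTMAttention()
batch = torch.zeros(16, 3, 3)    # 16 samples, 3 past months, 3 input features
forecast, attn = model(batch)
print(forecast.shape, attn.shape)
```

The returned attention weights sum to one across the input months for each sample, so they can be inspected to see which past months the forecast leaned on.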
Transformer: We also considered the Transformer model [30], a recent advanced deep learning model for natural language processing, for our task. Like RNNs, the Transformer is designed to deal with sequential data. However, it does not process sequential data in order like LSTMs. Instead, the Transformer handles the sequence data by using self-attention mechanisms to learn the complex dynamics of time series data.
Model Implementation: We implemented the deep learning prediction models (CNN, LSTM, LSTM-ATT and Transformer) in Python 3.7.10 using the PyTorch (version 1.8.1) [44] and Scikit-learn (version 0.24.2) [33] libraries. During our experiments, we tried lookback window lengths from 1 to 18 months and observed that the models performed best when the lookback window length was set to 3 months. After tuning different configurations for parameters and hyperparameters, we applied the best fitting configurations as follows. For all models, the following parameters were used: batch size = 16, learning rate = 1e-3, dropout = 0.1, number of training epochs = 300. For CNN, the following parameters were used: number of layers = 1, number of kernels of each size = 100, with kernel sizes of (1, 3), (2, 3), and (3, 3). The numbers of layers and hidden sizes for LSTM, LSTM-ATT, and Transformer were optimized for different provinces and models (S2 Table).
SARIMA models were implemented using the SARIMAX model from the statsmodels (v0.12.2) Python library [45]. Default function parameters were used with the exception of enforce_stationarity and enforce_invertibility, which were set to false, and the models were not retrained while iterating through the test set. The order, seasonal order, and trend parameters were chosen using Bayesian model-based optimisation. This was implemented with a Tree-structured Parzen Estimator (TPE) in Optuna (version 2.8.0) [46] which aimed to find the optimum combination of parameters for each province to minimise RMSE (S3 Table). There were many parameters to optimise for the SARIMA models, which can be highly time-consuming and difficult for fine-tuning. Therefore, the decision was made to automate this process, and a TPE was chosen over grid-searching as it is less computationally expensive [46].
Ethical consideration: This study was approved and managed by the Hanoi University of Public Health. The study only involved analysing secondary data on DF cases and climate factors including temperature, precipitation, humidity, evaporation, and sunshine hours. No human participants were actually involved in this study. Thus, ethical approval was not required.

Results
One step forecasting accuracy: Overall, the deep learning models outperformed traditional models in forecasting DF incidence in all 20 provinces, as measured by RMSE (Table 2) and MAE (Table 3). Colour-coded results were used to highlight this on a province-by-province basis, instead of colour-coding across the entire range of values, as RMSE and MAE values are only directly comparable where observed incidence rates are the same. Compared to the traditional models, LSTM-ATT had lower RMSEs and MAEs in all provinces, LSTM had lower MAEs in all provinces and lower RMSEs in all but one province, CNN had lower values for both error metrics in all but three provinces, and Transformer had lower error metrics in all but four provinces.
To visualize the prediction performances of different models compared to the real incidence rates, we plotted the predicted values of the best performing models-CNN, LSTM and LSTM-ATT-as well as the actual incidence rates for all provinces during the last 36 months from January 2014 to December 2016 (S1 Fig). Plots from six different provinces were provided for an overview of the forecasting results (Fig 3), as well as complete epidemic curves for all provinces across the full 20 years of the dataset (S2 Fig). As the transformer and traditional models performed poorly, they were excluded to avoid overplotting. Overall, the prediction lines of LSTM and LSTM-ATT fit very well with the actual incidence lines for most of the provinces indicating very good prediction accuracies in these provinces. On the other hand, the performances of CNN and especially Transformer were less stable than LSTM and LSTM-ATT in most provinces.

PLOS NEGLECTED TROPICAL DISEASES
The RMSE and MAE metrics for the full set of 20 provinces in Vietnam further quantify the differences in deep learning model performance initially seen in the DF incidence plots (Fig 4). LSTM and LSTM-ATT clearly outperformed CNN and especially Transformer in most cases, indicated by low RMSE and MAE values, such as in Ha Noi and Tay Ninh. LSTM-ATT had the lowest RMSE in 10 provinces, followed by LSTM in five provinces, CNN in four provinces, and Transformer in one. For RMSE, LSTM-ATT was better than LSTM in 14 out of 20 provinces. Similarly, for MAE, LSTM-ATT had the lowest score in eight provinces, followed by LSTM in five provinces, CNN in five provinces, and Transformer in two provinces. LSTM-ATT had a lower MAE than LSTM in 13 out of 20 provinces. This shows the improvement the attention mechanism brings to the prediction accuracies of LSTM in our task.
To obtain a better overall view of the performance of these models across all provinces, we ranked each model from one to nine based on the RMSEs and MAEs for each province, where one was the best method and nine was the worst. We then calculated the average ranks for all methods across all 20 provinces. LSTM-ATT outperformed all other techniques with average rankings of 1.60 for RMSE and 1.95 for MAE (Fig 5). LSTM was the second-best method with average rankings of 2.35 and 2.20 for RMSE and MAE, respectively. The CNN model placed third, with average rankings of 3.10 and 2.70 for RMSE and MAE, respectively. The other models had worse error scores overall, with Transformer ranking fourth, XGBoost fifth, Poisson regression sixth, SARIMA seventh, SVR eighth, and SVR-L ninth. Therefore, the deep learning models outperformed traditional models, and the attention mechanism improved the performance of the baseline LSTM model.
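The average-rank comparison described above can be sketched in a few lines. The province labels and error values below are made up for illustration, and breaking ties by input order is an assumption of this sketch; the source does not say how ties were handled.

```python
from typing import Dict


def rank_models(errors: Dict[str, float]) -> Dict[str, int]:
    """Rank models within one province: 1 = lowest error (ties keep input order)."""
    ordered = sorted(errors, key=errors.get)
    return {model: position + 1 for position, model in enumerate(ordered)}


def average_ranks(errors_by_province: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Average each model's per-province rank across all provinces."""
    totals: Dict[str, int] = {}
    for errors in errors_by_province.values():
        for model, rank in rank_models(errors).items():
            totals[model] = totals.get(model, 0) + rank
    n_provinces = len(errors_by_province)
    return {model: total / n_provinces for model, total in totals.items()}


# Hypothetical RMSE values for two provinces (not results from this study).
rmse_by_province = {
    "province_1": {"LSTM-ATT": 1.0, "LSTM": 2.0, "CNN": 3.0},
    "province_2": {"LSTM-ATT": 2.0, "LSTM": 1.5, "CNN": 4.0},
}
avg = average_ranks(rmse_by_province)
print(avg)
```

Ranking within each province before averaging sidesteps the fact that raw RMSE and MAE values are not comparable between provinces with different incidence levels.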
One step outbreak prediction: LSTM-ATT was selected for outbreak prediction due to its high performance relative to competing models. Overall, LSTM-ATT was able to predict epidemic months very well with a low incidence of false alarms (Fig 6A) and high levels of precision, accuracy, sensitivity, and specificity (Fig 6B). There was an average accuracy score (i.e., the ability to classify months as either outbreak or normal) of 0.99, and an average sensitivity score (i.e., the ability to detect outbreak months) of 0.70. However, the average sensitivity calculation is based on the five provinces where there were outbreaks, as sensitivity is undefined for provinces with no actual epidemic months in the evaluation period.
Fig 6. [...] (Fig 6A). Additionally, prediction metrics (precision, accuracy, sensitivity, and specificity) for each province are displayed (Fig 6B). If a province did not have any actual epidemic months in the evaluation period, the precision and sensitivity are not available. LSTM-ATT = attention mechanism-enhanced LSTM. https://doi.org/10.1371/journal.pntd.0010509.g006

Multi-step ahead prediction: The performance of LSTM-ATT was then assessed for predictions 2-3 months in advance (Fig 7A). Forecasting further ahead is inherently harder, so it is unsurprising that RMSE and MAE increased for some provinces. However, for most provinces, the changes were small (and errors even decreased in a few cases), indicating very good prediction performance of LSTM-ATT. This is also observed in the plotted incidence rates. For example, in Ha Noi, Ninh Thuan, and Binh Phuoc, there were high similarities between the predicted and observed rates (Fig 7B). In most of the 20 provinces, however, there were visible reductions in performance when forecasting more months in advance (S3 Fig). Further forecasts of up to six months in advance in Hanoi showed a continued worsening of performance (S4 Fig).
Multi-step outbreak prediction: As with 1-month ahead predictions, outbreak month detection was assessed for forecasts 2-3 months ahead (Fig 8). As expected, the performance dropped for some provinces when predicting further into the future (e.g., for Binh Phuoc, Quang Nam and Ha Noi). However, the overall performance was still approximately the same for almost all of the other provinces.

Discussion
This study found that LSTM-ATT frequently outperformed competing deep learning models in DF prediction and displayed a marked improvement over the basic LSTM model. Further exploration revealed that LSTM-ATT could accurately forecast DF incidence and predict outbreak months up to 3 months ahead, though accuracy dropped slightly compared to short-term forecasting. While other studies have applied a country-level threshold to identify epidemic months [17], the incidence of DF in Vietnam varies across regions, provinces, and cities. Therefore, a single threshold method is not appropriate. By setting the outbreak threshold as one standard deviation above the monthly mean for a province, both medium and large-scale outbreaks were detected, which may be more useful for mitigating DF epidemics.
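The province-specific threshold rule and the four detection metrics defined in the methods can be sketched as follows. This is a minimal sketch under stated assumptions: the threshold is computed from the mean and sample standard deviation of all monthly rates passed in, undefined ratios are returned as `None`, and all numbers are made up for illustration.

```python
from statistics import mean, stdev
from typing import Dict, List, Optional, Sequence


def outbreak_threshold(rates: Sequence[float], n_sd: float = 1.0) -> float:
    """Mean monthly incidence plus n_sd standard deviations for one province."""
    return mean(rates) + n_sd * stdev(rates)


def flag_outbreaks(rates: Sequence[float], threshold: float) -> List[bool]:
    """A month is flagged as an outbreak when its rate exceeds the threshold."""
    return [rate > threshold for rate in rates]


def detection_metrics(actual: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, Optional[float]]:
    """Accuracy, precision, sensitivity and specificity; None where undefined."""
    tp = sum(a and p for a, p in zip(actual, predicted))
    tn = sum((not a) and (not p) for a, p in zip(actual, predicted))
    fp = sum((not a) and p for a, p in zip(actual, predicted))
    fn = sum(a and (not p) for a, p in zip(actual, predicted))

    def ratio(numerator: int, denominator: int) -> Optional[float]:
        return numerator / denominator if denominator else None

    return {
        "accuracy": (tp + tn) / len(actual),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Hypothetical monthly incidence rates for one province (illustrative only).
history = [10.0, 12.0, 8.0, 50.0, 11.0, 9.0, 10.0, 10.0]
threshold = outbreak_threshold(history, n_sd=1.0)

observed = flag_outbreaks([10.0, 40.0, 35.0, 12.0], threshold)
forecast = flag_outbreaks([11.0, 38.0, 20.0, 9.0], threshold)
metrics = detection_metrics(observed, forecast)

# A test period with no observed outbreak months leaves sensitivity undefined.
quiet = detection_metrics([False, False], [False, False])
print(threshold, metrics, quiet)
```

The second call illustrates why sensitivity (and precision) cannot be reported for provinces without any observed epidemic month in the evaluation period: the denominator is zero.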
Meteorological factors are, in part, associated with changes in DF incidence because of their impacts on mosquito development and behaviour. An early-warning system for DF must be based on data that are widely accessible throughout Vietnam at short notice and at low cost. Weather data and case numbers satisfy these criteria, unlike other correlates of DF such as mosquito density [47]. The models in this study used a subset of rich meteorological factors including temperature, precipitation, humidity, evaporation, and sunshine hours for forecasting, as recursive feature elimination identified these as the most relevant predictors out of the 12 weather variables available. Development rates increase for Ae. aegypti eggs, larvae, and pupae from 12°C up to 30°C, then drop sharply after 40°C [48]. Additionally, biting rate may increase with temperature [49], and estimated dengue epidemic potential increases with average temperature up to 29°C for low diurnal temperature ranges but is lower with high diurnal temperature ranges [50]. Increases in rainfall have been shown to increase mosquito density and oviposition of Ae. aegypti, which can facilitate endemicity [51]. The pooling of rainwater in containers and tires can create breeding grounds for mosquitoes [4], though excessively heavy rainfall, conversely, has been proposed to flush out breeding sites [12]. Furthermore, humidity is associated with increased survival of Ae. aegypti [52], and evaporation could impact Aedes mosquitoes through its effects on humidity. Previous works in Thailand [16] and Puerto Rico [53] have found models including weather data to perform worse than those that did not. The complex mechanisms described here between DF and weather could explain why deep learning models show considerable predictive ability in forecasting DF incidence: simpler models may be unable to adequately process the non-linear biological relationships.
In our results, the SARIMA model only used previous DF incidence as a predictor, and performed worse than the deep learning models which included meteorological factors. However, an evaluation of equivalent deep learning models with and without meteorological factors would be required for a true comparison.
In general, the LSTM-ATT model frequently outperformed the other deep learning models being assessed. Moreover, LSTM-ATT outperformed LSTM in 13 and 14 provinces when measured by MAE and RMSE, respectively. In Quảng Nam, the MAE was lower for the standard LSTM model, but the RMSE was lower for LSTM-ATT. As RMSE attributes greater weight to larger errors unlike the linear weighting of MAE, this suggests the LSTM-ATT model had more small-magnitude errors but fewer large-magnitude errors than the standard LSTM model. This is likely to be preferable in DF forecasting, where the underestimation of an outbreak could be catastrophic.
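The pattern described for Quảng Nam, where one model has the lower MAE and the other the lower RMSE, can be reproduced with a small worked example. The observed and predicted values below are hypothetical and chosen only to show the arithmetic; they are not results from this study.

```python
from math import sqrt


def mae(observed, predicted):
    """Mean absolute error: every unit of error counts equally."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error: squaring gives large errors more weight."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


# Hypothetical monthly incidence with one outbreak month (value 40)
observed = [10, 12, 15, 40, 14, 11]

# Forecast A: off by 3 every month, including the outbreak month
many_small_errors = [13, 15, 18, 37, 17, 14]
# Forecast B: off by 1 in ordinary months, but underestimates the outbreak by 10
few_large_errors = [11, 13, 16, 30, 15, 12]

print(mae(observed, many_small_errors), rmse(observed, many_small_errors))
print(mae(observed, few_large_errors), rmse(observed, few_large_errors))
```

Forecast A has the higher MAE (3.0 against 2.5) but the lower RMSE (3.0 against roughly 4.18), because the single 10-unit miss on the outbreak month dominates the squared errors of forecast B.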
To the best of our knowledge, this study is the first to employ deep learning methods to forecast long-term DF incidence and outbreak months on a large scale in Vietnam. Disease incidence and epidemic detection remained relatively accurate for forecasts up to three months in advance, which further illustrates the utility of LSTM-ATT in DF forecasting. There are very few works exhibiting true long-term DF prediction. Colón-González et al. [28] recently developed a weather and land cover-based probabilistic superensemble of generalised linear mixed models (GLMMs) to forecast DF in all 63 provinces in Vietnam up to 6 months in advance. Average accuracy and sensitivity scores of 73% and 68% were obtained for outbreaks more than two standard deviations above the mean. As a different outbreak threshold was used and results were averaged across lags of 1-6 months, direct comparisons with our results are not possible. However, the cost-effectiveness analysis in that study suggests that implementing the superensemble model could improve relative value in reducing the impact of DF outbreaks, compared to not using a prediction model, in most provinces. Therefore, future work to directly benchmark GLMM superensembles and deep learning models may be useful.
Outside of Vietnam, a few long-term weather-based DF forecasting models have been developed. Hii et al. [17] reported high prediction precision for a Poisson multivariate regression model forecasting DF outbreak months in Singapore 16 weeks in advance. The model had a Receiver Operating Characteristics (ROC) area under the curve (AUC) of 0.98 for outbreak forecasting. However, case numbers were much lower than they are in Vietnam, and there was only one outbreak on which to assess performance in the one-year validation period, reducing the robustness of the analysis. Shi et al. [54] employed LASSO regression to develop models for DF forecasting up to 3 months ahead in Singapore, with a mean absolute percentage error (MAPE) of 17% for a 1-month lag and 24% for a 3-month lag. Notably, they integrated a mosquito breeding index with meteorological data for predictions. Both of these studies were on a national level, while Chen et al. [55] used LASSO regression for neighbourhood-level forecasting in major residential areas in Singapore. They reported AUC values falling from 0.88 to 0.76 as the forecast horizon lengthened from 1 to 12 weeks. Additionally, non-meteorological data were integrated in the form of cell-phone derived travel metrics, building age, and Normalised Difference Vegetation Index.
Previous studies comparing weather-based DF forecasting techniques are in agreement with our findings regarding the high accuracy of LSTM models. Xu et al. [23] found LSTM to be superior to BPNN, GAM, SVR, and GBM techniques, with transfer learning improving predictions in lower-incidence areas. Similarly, Pham et al. [14] found a genetic algorithm enhanced LSTM model to provide better accuracy than linear regression and decision tree models. Here, we present a novel implementation of the attention mechanism for LSTM models in the prediction of DF incidence from meteorological data, and demonstrate its improved accuracy over CNN, standard LSTM, and Transformer models. Notably, LSTM-ATT outperformed the basic LSTM model in 13 to 14 of the 20 provinces, depending on the error metric, suggesting LSTM-ATT could be a more robust choice for future studies on the prediction of climate-sensitive diseases.
Surprisingly, the Transformer model performed poorly throughout the study, even though it has previously been shown to outperform LSTM-based models in some other applications [56]. In most of the provinces, the Transformer performed worse than the LSTM-based models, and under-fitting was observed in many of the results. The advantage of the Transformer is that it is based on self-attention: it does not need to process sequential data in order, and it can reduce training time through parallel computation. This advantage, however, does not appear to carry over to the data in this study, which might be better handled strictly in order because of their seasonality. In other words, processing each input sequence as a whole, rather than step by step, seems ineffective here.
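The order-independence of self-attention can be made concrete with a small sketch. It is a simplified illustration, assuming identity query, key, and value projections and no positional encoding, whereas a full Transformer learns these projections and adds positional information. The function names and input values are hypothetical, and the sketch is not the Transformer configuration evaluated in this study.

```python
from math import exp, sqrt


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(sequence):
    """Scaled dot-product self-attention with identity Q/K/V projections (an assumption)."""
    dim = len(sequence[0])
    outputs = []
    for query in sequence:
        # Each position scores itself against every position at once
        scores = [sum(q * k for q, k in zip(query, key)) / sqrt(dim) for key in sequence]
        weights = softmax(scores)
        outputs.append([sum(w * value[j] for w, value in zip(weights, sequence))
                        for j in range(dim)])
    return outputs


# Hypothetical 3-month sequence with 2 features per month
months = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

out = self_attention(months)
out_reversed = self_attention(months[::-1])
```

Reversing the input months simply reverses the outputs, so without positional information the layer carries no notion of which month came first. An LSTM, by contrast, consumes the months one after another. The sketch illustrates the mechanism only and does not establish why the Transformer under-fitted in this study.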
This study had several limitations regarding alternate correlates of DF incidence, case reporting, and dengue virus serotypes. First, it did not account for various non-weather-based factors of DF transmission, such as human behaviour, travel patterns, mosquito density, dengue virus serotypes, and public health programs for DF prevention and control. These were, however, impractical to model on a national or provincial scale for an early-warning system in Vietnam. On a similar note, missing case and meteorological data may be confounding factors. Some of the differences observed between provinces may be attributable to different rates and methods of data reporting between locations. Additionally, as this was a retrospective study, complete data were already available for every month being forecast. In prospective forecasting, delays in case reporting mean that predictions must sometimes be made with incomplete case data. Reich et al. [57] found this to be the case for DF forecasting in Thailand, and reported reduced model accuracy for predictions into the future as a result. Real-world implementation of the deep learning models presented in this study, therefore, may have higher errors than presented here. Lastly, multiannual spikes in DF incidence have previously been a barrier to accurate DF prediction and have been attributed to antibody-dependent enhancement following a new serotype being introduced to a region [19]. While the models presented here were only evaluated on 36 months of data, they appear to partially overcome this limitation and accurately predict large multiannual fluctuations in cases.

Conclusion
In this study, we developed and evaluated a selection of deep learning models for the prediction of DF incidence and epidemics in Vietnam. In contrast to most existing works, which have focused on smaller study areas in Vietnam with fewer weather variables [8,21,27], our models were built upon a rich set of 12 different meteorological factors (including temperature, precipitation, humidity, evaporation and sunshine hours) and evaluated on 20 different provinces in northern, central and southern regions of Vietnam. These regions display significantly different geographical and climate conditions, allowing for a robust assessment of model performance. LSTM techniques were found to display considerable accuracy in forecasting DF incidence, with LSTM-ATT demonstrating improved prediction performance over other models in nearly all provinces. Vietnam is experiencing a digital transformation in healthcare. Digital technologies, such as AI with deep learning models for forecasting climate-sensitive diseases, are a promising means of promoting public health responses to climate change and enhancing their efficiency. The application of LSTM-ATT in forecasting other prioritised climate-sensitive diseases in Vietnam such as influenza, diarrhoea, and malaria should be further explored.

Supporting information
S1 Table. Ranked features for all provinces. Features were ranked by recursive feature elimination using a random forest regressor to rank importance. The features are listed in order from most important to least important.