Weekly dengue forecasts in Iquitos, Peru; San Juan, Puerto Rico; and Singapore

Background
Predictive models can serve as early warning systems and can be used to forecast the future risk of various infectious diseases. Conventionally, regression and time series models are used to forecast dengue incidence from dengue surveillance (e.g., case counts) and weather data. However, these models may be limited by their assumptions and by the number of predictors they can include. Machine learning (ML) methods are designed to work with a large number of predictors and thus offer an appealing alternative. Here, we compared the performance of ML algorithms with that of regression models in predicting dengue cases and outbreaks 4 to 12 weeks in advance. Because many countries lack sufficient health surveillance infrastructure, we also evaluated the contribution of dengue surveillance and weather data to the predictive power of these models.
Methods
We developed ML, regression, and time series models to forecast weekly dengue case counts and outbreaks in Iquitos, Peru; San Juan, Puerto Rico; and Singapore from 1990–2016. Forecasts were generated using available weekly dengue surveillance and weather data. We evaluated the agreement between model forecasts and observed dengue data using the Mean Absolute Error (MAE) and the Matthews Correlation Coefficient (MCC).
Results
For near-term predictions of weekly case counts made with surveillance data, ML models had 21% and 33% less error than regression and time series models, respectively. However, when using weather data only, ML models did not demonstrate a practical advantage. When forecasting weekly dengue outbreaks 12 weeks in advance, ML models achieved a maximum MCC of 0.61.
Conclusions
Our results identified 2 scenarios in which ML models are advantageous over regression models: 1) predicting weekly dengue case counts 4 weeks ahead when dengue surveillance data are available and 2) predicting weekly dengue outbreaks 12 weeks ahead when dengue surveillance data are unavailable.
Given the advantages of ML models, dengue early warning systems may be improved by the inclusion of these models.


Introduction
Dengue fever, a mosquito-borne disease, poses a significant public health concern due to its re-emergence in tropical and sub-tropical regions [1]. In many countries where dengue is present, the disease is endemic. Globally, researchers estimate that dengue infects 390 million people per year [2]; however, only 50-100 million cases are detected due to the high asymptomatic rate [1][2][3][4][5][6]. Estimating dengue burden can be problematic due to delays in case identification, strong intra- and inter-annual variation in incidence, and the majority of cases being clinically mild or asymptomatic [7][8][9][10]. As a result, implementing effective vector control operations can be challenging [11]. To overcome these issues, the development of accurate and timely early warning systems capable of predicting future dengue incidence without depending on current dengue case data remains an active area of research [5].
Several modeling approaches have been evaluated as early warning models for various infectious diseases. Time series and regression models are commonly used but have had varying levels of success [5,7,[12][13][14][15][16][17][18][19][20][21][22][23][24][25][26][27][28]. These models offer a robust and easily interpretable framework; however, they can be limited by the underlying model assumptions (e.g., linear relationships between predictors and outcome) and by the number of predictors that can be included [29,30]. Mechanistic models, which model individual components of a dynamic system, have accurately described outbreaks of influenza and mosquito-borne diseases [31][32][33][34][35][36][37]; yet, the data required to parameterize these models are difficult to obtain, and the necessary model assumptions (e.g., disease infectivity) may not be clear until after the outbreak [7]. Ensemble approaches, which integrate multiple forecasting methods, have performed well and have lately received increased interest. Using dengue and climate data from Iquitos and San [...]
[...] round and is heaviest between November and May. The mean daily temperatures of the coolest and hottest months are 25.6°C and 27.5°C, respectively. San Juan is the capital and largest city of Puerto Rico. It is located on the northeastern coast of the island and has a population of approximately 400,000 people. Rainfall primarily occurs between April and November, leaving the other months relatively dry. The mean daily temperatures of the coolest and hottest months are 25.3°C and 28.7°C, respectively. Singapore is a city-state off the southernmost tip of the Malay Peninsula and has approximately 5.6 million inhabitants. Rainfall is heaviest during the Northeast monsoon season, which typically occurs from November to March [59]. A second, drier monsoonal period occurs between June and October. The mean daily temperatures of the coolest and hottest months in Singapore are 26.5°C and 28.4°C, respectively.

Dengue surveillance data, predictors, and outcomes
Weekly dengue case counts for Iquitos were available between June 2000 and June 2013 from a passive surveillance network representing approximately 40% of the Iquitos population [57,60,61]. Weekly case counts for San Juan were available from April 1990 to April 2013 and were ascertained from a combination of active and passive surveillance systems [62]. All confirmed dengue cases, regardless of severity, were reported in Iquitos and San Juan. Further, when the number of samples exceeded local testing capacity, the number of positive cases among those not tested was estimated by multiplying the number of untested cases by the rate of laboratory-positive cases among those that were tested [57,60,62]. In both locations, all dengue and DHF cases were reported together. For Singapore, weekly dengue and DHF cases [63] were reported separately and were available between January 2000 and December 2016 from the Ministry of Health. Dengue is a nationally notifiable disease in Singapore, meaning that all clinically diagnosed and laboratory-confirmed cases must be reported to the Ministry of Health within 24 hours [28,63]. Clinically diagnosed cases were then confirmed with serologic or virologic testing by the Ministry of Health. Data from each of the 3 study locations are publicly available [61,64].
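The adjustment for untested samples is simple arithmetic. The sketch below assumes that a week's reported total is the laboratory-confirmed count plus the estimate for untested samples; the function name and its arguments are hypothetical, and the surveillance programs' own procedures may differ in detail.

```python
def estimate_weekly_positives(n_tested, n_positive, n_untested):
    """Laboratory-confirmed positives plus an estimate for untested samples.

    Assumes untested samples would test positive at the same rate as the
    samples that were tested that week (hypothetical helper).
    """
    if n_tested == 0:
        raise ValueError("positivity rate is undefined when no samples were tested")
    positivity_rate = n_positive / n_tested
    estimated_untested_positives = n_untested * positivity_rate
    return n_positive + estimated_untested_positives


# 40 samples tested, 10 positive, 20 untested: 10 + 20 * 0.25
print(estimate_weekly_positives(40, 10, 20))  # prints 15.0
```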
Using weekly case counts, we created surveillance-based predictors for our models (S1 Table). We summarized observed dengue case counts with weekly and cumulative totals starting from the beginning of the year. We also summarized the annual number of dengue cases in the past 1 to 3 years.
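As a rough illustration of these surveillance-based predictors, the sketch below builds lagged weekly counts, a year-to-date cumulative count, and the annual totals of the previous 1 to 3 years. The lag set and feature names are assumptions rather than the study's S1 Table definitions, and a year that is only partially observed (for example, the first year of a series) will produce a partial annual total.

```python
from collections import defaultdict


def surveillance_features(weeks, lags=(1, 2, 3, 4)):
    """Build hypothetical surveillance predictors for each week.

    weeks: chronologically ordered list of (year, weekly_cases) tuples.
    Returns one dict per week with lagged weekly counts, the cumulative
    count since the start of that calendar year, and the annual totals
    of the previous 1 to 3 years (None when unavailable).
    """
    annual_totals = defaultdict(int)
    for year, cases in weeks:
        annual_totals[year] += cases

    rows = []
    cumulative = 0
    current_year = None
    for i, (year, cases) in enumerate(weeks):
        if year != current_year:  # reset the running total each new year
            cumulative = 0
            current_year = year
        cumulative += cases
        row = {"cases": cases, "cases_year_to_date": cumulative}
        for k in lags:
            row[f"cases_lag_{k}w"] = weeks[i - k][1] if i >= k else None
        for back in (1, 2, 3):
            # .get() on a defaultdict returns None without inserting a key
            row[f"cases_total_{back}y_ago"] = annual_totals.get(year - back)
        rows.append(row)
    return rows
```

To forecast h weeks ahead, the features computed for week t would be paired with the outcome observed in week t + h, so that no information from after the forecast origin enters the model.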
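These data also served as the prediction outcomes, "weekly case counts" and "weekly outbreaks." We created the binary outcome variable, weekly outbreaks, to indicate whether or not weekly case counts exceeded a predefined threshold. For this study, the outbreak threshold was set at 1.5 standard deviations above the mean weekly reported cases, and the outcome is defined as:
$$\text{outbreak}_i = \begin{cases} 1, & y_i > \bar{y} + 1.5\,\sigma_y \\ 0, & \text{otherwise} \end{cases}$$
where y_i is the number of reported dengue cases in week i, and \bar{y} and σ_y are the mean and standard deviation of weekly reported cases.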
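The outbreak rule (1.5 standard deviations above the mean weekly count) can be sketched as follows. Which weeks enter the mean and standard deviation (the full series or only the training data) is not stated here, so the function simply uses whatever series it is given; the use of the sample standard deviation is also an assumption.

```python
from statistics import mean, stdev


def label_outbreaks(weekly_cases, k=1.5):
    """Flag weeks whose case count exceeds mean + k * SD of the supplied series."""
    threshold = mean(weekly_cases) + k * stdev(weekly_cases)  # sample SD
    labels = [1 if cases > threshold else 0 for cases in weekly_cases]
    return threshold, labels


threshold, labels = label_outbreaks([0, 0, 0, 0, 10])
print(round(threshold, 3), labels)  # prints 8.708 [0, 0, 0, 0, 1]
```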

Temporal predictors
Inter- and intra-annual variations in dengue cases have been observed across the globe, providing evidence for multi-year periodicity, which has been estimated to be approximately 3 years [57,[68][69][70]. To account for the temporal variation in dengue cases, we included as predictor variables the month in which the week of interest occurs and 1- to 4-year periodic components (S2 Table). The periodic components were sine and cosine functions:
$$\sin\left(\frac{2\pi t}{12a}\right), \qquad \cos\left(\frac{2\pi t}{12a}\right), \qquad a \in \{1, 2, 3, 4\}$$
where t is the number of months since the start of the study period and a is the inter-annual period length in years.
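A minimal sketch of the periodic predictors, assuming the standard harmonic form in which a period of a years spans 12a months; the feature names are illustrative only.

```python
import math


def periodic_components(t_months, periods_years=(1, 2, 3, 4)):
    """Sine/cosine predictors for 1- to 4-year cycles.

    t_months: months elapsed since the start of the study period.
    """
    features = {}
    for a in periods_years:
        angle = 2 * math.pi * t_months / (12 * a)  # one full cycle every 12*a months
        features[f"sin_{a}y"] = math.sin(angle)
        features[f"cos_{a}y"] = math.cos(angle)
    return features
```

Using both a sine and a cosine for each period lets a model represent a cycle with any phase, since any shifted sinusoid is a weighted sum of the two.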

Weather data and predictors
We ascertained daily temperature, humidity, and rainfall summaries (i.e., averages, minimums, maximums, and totals) from the National Oceanic and Atmospheric Administration and the National Environment Agency, Singapore (Table 1). We obtained weather measurements from weather stations, remotely sensed imagery, and meteorological reanalysis to account for the various strengths and limitations of each data source (see Weather data limitations in Supplemental S1 Text for a brief overview of these limitations) [12,[71][72][73][74][75][76][77]. Daily weather summaries obtained from remotely sensed imagery and meteorological reanalysis were collected from the grid cell containing the weather station used for each study area. We collected daily weather summaries from January 1999 to March 2014 for Iquitos, January 1989 to April 2013 for San Juan, and January 1999 to December 2016 for Singapore. We created weather-based predictors for our models (S3 Table) by aggregating daily weather summaries into multi-day and multi-week summaries. Temperature and humidity predictors included 7-, 14-, 21-, and 28-day moving averages and standard deviations. Because temperature alone does not account for the optimal temperature range of the Aedes mosquito and may not accurately represent the temperature-dengue relationship, we created additional temperature predictors based upon the Temperature Suitability Index (TSI) [78]. Rainfall predictors included 7-, 14-, 21-, and 28-day moving averages, standard deviations, and the total number of days with any recorded rainfall. We also summarized daily total rainfall over cumulative periods of 1 to 20 weeks. Since the effect of rainfall on mosquito abundance has been found to differ across seasons [70,79], we created additional rainfall predictors that summarized daily total rainfall for cold, warm, and hot periods, which were defined using average daily temperature and the extreme minimum and maximum TSI thresholds [70,78].
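The moving-window weather predictors can be sketched in plain Python as below. This is an illustration under stated assumptions: windows end on (and include) the day of interest, windows with insufficient history return None, and the feature names are hypothetical. Humidity would be handled exactly like temperature, and the TSI-based and seasonal rainfall predictors are not shown.

```python
from statistics import mean, stdev


def trailing_window(values, end, length):
    """The `length` daily values ending at index `end` (inclusive); None if history is too short."""
    start = end - length + 1
    return values[start:end + 1] if start >= 0 else None


def weather_features(temp, rain, end, windows=(7, 14, 21, 28), rain_weeks=range(1, 21)):
    """Hypothetical moving-window summaries of daily temperature and rainfall for day `end`."""
    feats = {}
    for w in windows:
        t = trailing_window(temp, end, w)
        r = trailing_window(rain, end, w)
        feats[f"temp_mean_{w}d"] = mean(t) if t is not None else None
        feats[f"temp_sd_{w}d"] = stdev(t) if t is not None else None
        feats[f"rain_mean_{w}d"] = mean(r) if r is not None else None
        feats[f"rain_sd_{w}d"] = stdev(r) if r is not None else None
        # number of days in the window with any recorded rainfall
        feats[f"rain_days_{w}d"] = sum(1 for x in r if x > 0) if r is not None else None
    for n in rain_weeks:
        # cumulative rainfall over the preceding n weeks (7 * n days)
        r = trailing_window(rain, end, 7 * n)
        feats[f"rain_total_{n}wk"] = sum(r) if r is not None else None
    return feats
```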

Missing weather data
We observed missing daily weather measurements in each area due to non-reporting or instrument failure (S1 Fig). We imputed missing weather data using multiple imputation by chained equations with the mice R package [80]. For this study, we created 10 imputation sets, which we then averaged to obtain a final value for each missing observation [81].
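The final pooling step, averaging the imputed values across completed datasets, is sketched below. It assumes the completed datasets already exist (in the study they came from the mice package); the chained-equation imputation itself is not reproduced, and the names are hypothetical.

```python
from statistics import mean


def pool_imputations(observed, imputed_sets):
    """Average imputed values across completed datasets.

    observed: daily series with None marking missing days.
    imputed_sets: completed copies of that series (e.g. 10 of them).
    Observed values are kept unchanged; only missing days are averaged.
    """
    pooled = []
    for i, value in enumerate(observed):
        if value is None:
            pooled.append(mean(s[i] for s in imputed_sets))
        else:
            pooled.append(value)
    return pooled
```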

Prediction approach
In our analysis, we developed models to predict dengue case counts and outbreaks based upon the temporal variation in dengue activity, regional population, and weather. Fig 1 reflects the general framework used in this study for developing predictive ML (i.e., RF, RF-UFA) and regression-based models (i.e., Poisson regression, Logistic regression) using historical and near-real-time data as input. In our approach, we trained (i.e., fit to data) models with a subset of the study data (i.e., training data) and evaluated the accuracy of model forecasts on the last 4 years' worth of data (i.e., testing data), which had been withheld during model training. We evaluated each model on 1 year's worth of testing data at a time, in chronological order. After model evaluation, the test set was added to the training data and the process was repeated for the subsequent year of test data. This resulted in each model being redeveloped and [...]
For each trained ML and regression model, we analyzed the predictor variables and assessed their importance. The variable analysis allowed us (1) to identify the strongest predictors of dengue case counts for each study area and (2) to perform variable reduction, a conventional approach to improving model accuracy. During variable reduction, we removed weak and non-informative predictors by ranking each variable according to its measure of importance (defined under Variable importance below). After ranking the variables, we removed all non-informative variables and selected the top 1%, 5%, and 10% most important variables. We then retrained each model using these 3 subsets of predictors and evaluated the predictive accuracy of the resulting models. This process was performed for each test set.
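The year-by-year evaluation and the variable-reduction step can be sketched as two small helpers. The sketch assumes splits by calendar year and treats an importance score of 0 or less as "non-informative"; both are assumptions, as is taking the top percentage of the informative variables rather than of all variables.

```python
import math


def expanding_window_splits(years, n_test_years=4):
    """Yield (training_years, test_year) pairs for year-by-year evaluation.

    Each test year is evaluated once and then joins the training data
    used for the following test year.
    """
    years = sorted(set(years))
    for j in range(len(years) - n_test_years, len(years)):
        yield years[:j], years[j]


def top_fraction(importance, fraction):
    """Most important predictors after dropping non-informative ones.

    importance: {predictor_name: importance score}. Treating scores <= 0 as
    non-informative is an assumption made for this sketch.
    """
    informative = {name: s for name, s in importance.items() if s > 0}
    ranked = sorted(informative, key=informative.get, reverse=True)
    n_keep = math.ceil(fraction * len(ranked))  # keep at least one when any are informative
    return ranked[:n_keep]


for train_years, test_year in expanding_window_splits(range(2000, 2010)):
    print(train_years[0], "-", train_years[-1], "->", test_year)
```

Within each split, the importance ranking would be computed on the training years only and the model refit with `top_fraction(scores, 0.01)`, `0.05`, and `0.10`, so that the test year never influences variable selection.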
For this study, all models and statistical analyses were implemented in the R programming environment, version 3.3.3 [82].

Predicting weekly outbreaks
We observed substantial imbalance between the proportions of outbreak and non-outbreak weeks in each study area. Class imbalance can cause a predictive model to assign every observation to the majority class in an effort to maximize accuracy, resulting in an uninformative model. To overcome this limitation [83], we trained the models on "balanced" datasets in which we under-sampled non-outbreak observations to create a 1:1 ratio of outbreak to non-outbreak observations. To account for sampling variability, we created 500 such training sets, trained each model on every set, and averaged the resulting predictions. Additionally, we optimized model performance by selecting the classification threshold (i.e., the minimum prediction value required for an observation to be classified as "outbreak") that maximized model performance.
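A minimal sketch of the balancing and threshold-selection steps follows. The model is abstracted as a hypothetical `fit_and_score` callable, the seed is arbitrary, and choosing the threshold by MCC is an assumption, since this paragraph does not name the metric. The threshold should be chosen on labeled weeks other than the test set to avoid optimistic estimates.

```python
import math
import random
from statistics import mean


def balanced_indices(labels, rng):
    """All outbreak weeks plus an equal-size random subset of non-outbreak weeks."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    return pos + rng.sample(neg, min(len(pos), len(neg)))


def averaged_scores(labels, fit_and_score, n_sets=500, seed=0):
    """Average predicted outbreak scores over many balanced training sets.

    fit_and_score(train_indices) -> list of scores for the test weeks (hypothetical).
    """
    rng = random.Random(seed)
    runs = [fit_and_score(balanced_indices(labels, rng)) for _ in range(n_sets)]
    return [mean(week_scores) for week_scores in zip(*runs)]


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for 0/1 labels (0.0 if undefined)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def best_threshold(y_true, scores, candidates):
    """Candidate cutoff whose 0/1 classification maximizes MCC."""
    return max(candidates, key=lambda c: mcc(y_true, [int(s >= c) for s in scores]))
```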

Machine learning models
In our study, we used RF to predict weekly case counts and weekly outbreaks, and RF-UFA to predict weekly outbreaks only. RF is an ensemble ML algorithm based upon decision trees and has previously been used to analyze time series data [40,45,84]. RF-UFA is an extension of the RF algorithm in which the Univariate Flagging Algorithm (UFA) is used to transform continuous predictors into binary predictors [85]. UFA transforms continuous predictors by identifying an optimal threshold that is associated with a statistically significant (p ≤ 0.01) higher ("high-risk") or lower ("low-risk") risk of the outcome. All RF models were fitted with the randomForest R package [86]. A more detailed explanation of both models is available in Supplemental S1 Text "Overview of machine learning models."
Fig 1 (partial caption). To assess how each model's predictive accuracy was affected by the lack of current dengue surveillance data, we trained models to predict dengue case counts and outbreaks using only population, temporal, and weather predictor variables. We compared the performance of these models with the performance of the same models when surveillance data inputs were included.
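The following is a simplified, hypothetical UFA-style search for a single predictor. It scans candidate cutoffs, compares the outbreak rate among flagged weeks with the overall rate using a one-sample z-test for a proportion, and keeps the most extreme cutoff that reaches p ≤ 0.01. The published UFA [85] may differ in its test statistic, candidate cutoffs, and minimum group size, so this only illustrates how a continuous predictor can be turned into a binary flag.

```python
import math


def two_sided_p_value(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def ufa_style_threshold(x, y, direction="high", alpha=0.01, min_flagged=5):
    """Return (cutoff, z) for the best flag, or None if no cutoff is significant.

    direction="high": flag weeks with x >= cutoff and look for an elevated outbreak rate.
    direction="low":  flag weeks with x <= cutoff and look for a reduced outbreak rate.
    """
    n = len(y)
    base_rate = sum(y) / n
    sign = 1 if direction == "high" else -1
    best = None
    for cut in sorted(set(x)):
        if direction == "high":
            flagged = [yi for xi, yi in zip(x, y) if xi >= cut]
        else:
            flagged = [yi for xi, yi in zip(x, y) if xi <= cut]
        m = len(flagged)
        if m < min_flagged or m == n:  # skip tiny groups and the trivial all-weeks flag
            continue
        se = math.sqrt(base_rate * (1 - base_rate) / m)
        if se == 0:
            continue
        z = (sum(flagged) / m - base_rate) / se
        if sign * z > 0 and two_sided_p_value(z) <= alpha:
            if best is None or sign * z > sign * best[1]:
                best = (cut, z)
    return best
```

Once a cutoff is found, the predictor is replaced by the binary flag `int(x >= cutoff)` (or `int(x <= cutoff)` for a low-risk flag), and these flags become inputs to the random forest.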

Regression models
We used 2 types of generalized linear regression models in our study: Poisson regression to predict weekly case counts and Logistic regression to predict weekly outbreaks. Unlike RF, regression models are not well suited for high-dimensional data analysis and require additional measures to prevent overfitting. To minimize this risk, we used the Least Absolute Shrinkage and Selection Operator (LASSO) algorithm [87][88][89]. We identified the optimal penalty parameter using 10-fold cross-validation, selecting the parameter that minimized the cross-validation mean absolute error (MAE) for Poisson regression models and the misclassification error rate for Logistic regression models [89]. All Poisson regression and Logistic regression models were implemented with the glmnet R package [90].
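For orientation, the LASSO fit that glmnet performs (elastic-net mixing parameter α = 1, glmnet's default) can be written as a penalized negative log-likelihood. This is a standard statement of the objective, not an equation reproduced from the study:

$$\hat{\beta}_0, \hat{\beta} = \arg\min_{\beta_0,\,\beta} \left[ -\frac{1}{n}\sum_{i=1}^{n} \ell\left(y_i;\ \eta_i\right) + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right], \qquad \eta_i = \beta_0 + x_i^{\top}\beta$$

$$\ell_{\text{Poisson}}(y;\eta) = y\,\eta - e^{\eta}, \qquad \ell_{\text{logistic}}(y;\eta) = y\,\eta - \log\left(1 + e^{\eta}\right)$$

where the constant −log(y!) is dropped from the Poisson term. Larger values of the penalty parameter λ shrink more coefficients exactly to zero, which is what makes the LASSO usable with many candidate predictors; 10-fold cross-validation chooses λ along this path.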

Time series model
We developed an autoregressive integrated moving average (ARIMA) model to forecast weekly dengue case counts in each study location. As ARIMA models cannot be applied to high-dimensional data, model predictions were based upon the time series of observed case counts only. We also evaluated seasonal ARIMA (SARIMA) models and found that the added seasonal component did not consistently improve model performance; as such, we do not present the results of the SARIMA model.
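For reference, a nonseasonal ARIMA(p, d, q) model can be written in backshift notation as follows. This is the standard textbook form rather than an equation from the study, and the orders selected for each location are not reported in this section:

$$\left(1 - \sum_{i=1}^{p} \phi_i B^i\right)(1 - B)^d\, y_t = c + \left(1 + \sum_{j=1}^{q} \theta_j B^j\right)\varepsilon_t$$

where y_t is the weekly case count, B is the backshift operator (B y_t = y_{t-1}), φ_i and θ_j are the autoregressive and moving average coefficients, c is a constant, and ε_t is white noise. The seasonal (SARIMA) variant adds analogous polynomials in B^m for a seasonal period m.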
We selected the ARIMA model orders that gave the best fit to the training data. To identify them, we performed a stepwise search and selected the orders that minimized the model's Akaike Information Criterion (AIC). The ARIMA model was implemented using the forecast R package [91].
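The stepwise idea can be illustrated with a greedy neighbor search over (p, q). This sketch is loosely modeled on the stepwise search of the forecast package's auto.arima function (the source does not state which function was used) but is much simpler: d is held fixed, constants and seasonal terms are ignored, and `aic` is a hypothetical callable that fits a model and returns its AIC.

```python
def stepwise_order_search(aic, d=1, start=(2, 2), max_p=5, max_q=5):
    """Greedy search over ARIMA(p, d, q) orders with d held fixed.

    aic: hypothetical callable (p, d, q) -> AIC of a model fitted to the training series.
    Each pass evaluates neighboring orders (p and/or q changed by 1) and moves to the
    best one; the search stops when no neighbor lowers the AIC.
    """
    cache = {}

    def score(p, q):
        if (p, q) not in cache:
            cache[(p, q)] = aic(p, d, q)  # fit each order at most once
        return cache[(p, q)]

    best = start
    best_aic = score(*best)
    while True:
        p, q = best
        neighbors = [(p + dp, q + dq) for dp, dq in
                     ((1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (-1, -1))]
        candidates = [(cp, cq) for cp, cq in neighbors
                      if 0 <= cp <= max_p and 0 <= cq <= max_q]
        if not candidates:
            return (best[0], d, best[1]), best_aic
        cand = min(candidates, key=lambda o: score(*o))
        if score(*cand) >= best_aic:
            return (best[0], d, best[1]), best_aic
        best, best_aic = cand, score(*cand)


# Toy AIC surface with its minimum at p = 1, q = 3 (stands in for real model fits)
print(stepwise_order_search(lambda p, d, q: (p - 1) ** 2 + (q - 3) ** 2 + 100))
```

A stepwise search evaluates far fewer models than an exhaustive grid, at the cost of possibly stopping at a local minimum of the AIC.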

Variable importance
Variable importance is a measure of how much a single variable contributes to the overall predictive accuracy of a model. For RF-based models, we ranked variables by their "percentage increase in mean squared error" when predicting weekly case counts and by their "mean decrease in accuracy" when predicting weekly outbreaks [92]. Both metrics measure how much error is introduced into the model's predictions when the variable's values are randomly permuted, which breaks the variable's relationship with the outcome. For Poisson regression and Logistic regression, we ranked variables by the absolute value of the standardized coefficient, a conventional ranking approach for regression models [93].
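Both ranking ideas can be sketched briefly. The permutation function below measures the raw increase in mean squared error on whatever data it is given; randomForest instead computes its measure on each tree's out-of-bag samples and normalizes it, so this is an approximation of the concept, not that package's calculation. Standardizing coefficients by multiplying by the predictor's standard deviation is likewise an assumption about how "standardized" was defined. The `predict` callable is hypothetical.

```python
import random
from statistics import mean, stdev


def permutation_importance(predict, rows, y, column, n_repeats=10, seed=0):
    """Average increase in mean squared error after shuffling one predictor.

    predict: hypothetical callable mapping a list of feature dicts to predictions.
    """
    def mse(data):
        return mean((p - t) ** 2 for p, t in zip(predict(data), y))

    baseline = mse(rows)
    rng = random.Random(seed)
    increases = []
    for _ in range(n_repeats):
        shuffled_values = [row[column] for row in rows]
        rng.shuffle(shuffled_values)  # break the link between this predictor and y
        permuted = [{**row, column: v} for row, v in zip(rows, shuffled_values)]
        increases.append(mse(permuted) - baseline)
    return mean(increases)


def rank_by_standardized_coefficient(coefs, columns):
    """Rank predictors by |beta_j| * SD(x_j); coefs are on the original predictor scale."""
    scaled = {name: abs(beta) * stdev(columns[name]) for name, beta in coefs.items()}
    return sorted(scaled, key=scaled.get, reverse=True)
```

A predictor the model ignores scores zero under permutation, because shuffling it leaves every prediction unchanged.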

Model evaluation
We evaluated the performance of each model with the withheld testing data. To quantify model accuracy, we selected metrics that measure how well model predictions approximate observed outcomes. When predicting weekly case counts, we used the mean absolute error (MAE), which measures how far predictions deviate from the observed outcomes. The MAE is defined as follows:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$$
where n is the number of observations, y_i is the observed number of dengue cases for week i, and \hat{y}_i is the predicted number of dengue cases for week i. The MAE is considered to be an unbiased estimator because it only considers the variance and not the magnitude of the errors [45]. Since the magnitude of reported dengue cases varied widely by study area, we also report the normalized MAE (nMAE). The nMAE provides an estimate of the prediction error relative to the average number of weekly cases in the testing data and allows for better comparisons of model accuracy between study areas and forecast horizons. We calculated the nMAE by dividing the MAE by the average weekly number of dengue cases. The nMAE is defined as follows:
$$\mathrm{nMAE} = \frac{\mathrm{MAE}}{\frac{1}{n}\sum_{i=1}^{n} y_i}$$
where n is the number of observations, y_i is the observed number of dengue cases for week i, and MAE is the mean absolute error. The best value that can be obtained for both MAE and nMAE is 0, while the worst value is unbounded. For models forecasting weekly outbreaks, we quantified how well model predictions approximated observed outcomes with the Matthews Correlation Coefficient (MCC) [94]. MCC measures the correlation between a binary outcome and prediction and, unlike other measures, is insensitive to class imbalance [95,96]. MCC is defined as follows:

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
where TP is the number of true positives; TN is the number of true negatives; FP is the number of false positives; and FN is the number of false negatives. The best value that can be obtained for MCC is +1, while the worst value is -1.
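The three accuracy metrics named in this section translate directly into code. In the sketch below, returning 0 for MCC when a marginal total is zero (so the denominator vanishes) is a common convention and an assumption here; the section does not say how that case was handled.

```python
import math
from statistics import mean


def mae(observed, predicted):
    """Mean absolute error."""
    return mean(abs(o - p) for o, p in zip(observed, predicted))


def nmae(observed, predicted):
    """MAE divided by the mean observed weekly case count."""
    return mae(observed, predicted) / mean(observed)


def mcc(observed, predicted):
    """Matthews correlation coefficient for binary (0/1) labels."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

For example, observed counts of 10, 20, and 30 against forecasts of 12, 18, and 33 give an MAE of 7/3 cases and an nMAE of (7/3)/20 = 7/60, or about 0.12.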

Results
Weekly dengue case counts for each study area are presented in Fig 2. We observed substantial inter-annual variation as well as wide ranges in the number of weekly reported cases during the observational periods by study area. Reported weekly case counts ranged from 0 to 116 in Iquitos, 0 to 461 in San Juan, and 3 to 888 in Singapore. The average number of weekly cases varied greatly by study area as well. The average number of weekly cases was 7.57, 38.84, and 115.96 for Iquitos, San Juan, and Singapore, respectively. In 2013, we observed a notable increase in the number of reported dengue cases in Singapore, which was the result of a large dengue outbreak throughout all of Southeast Asia [97][98][99][100].
In our study, we developed multiple ML, regression-based, and time series models under various data availability and forecast horizon settings. Since the objective of this study was to compare ML (i.e., RF and RF-UFA) models with conventional forecasting models, we only describe the results for models with the best performance under each data-forecast horizon scenario. In our evaluation, models with the smallest nMAE or largest MCC were defined as the best performing models.

Forecasting dengue case counts
In Iquitos (4-week-ahead forecasts: Fig 3; 12-week-ahead forecasts: S2 Fig), neither the RF nor the Poisson regression model fully captured the sharp increase in dengue cases in 2011. Interestingly, during the typical peak dengue period, predictions made by the Poisson regression model had the highest level of uncertainty, as demonstrated by the wide confidence intervals. Unlike the Poisson regression model's predictions, RF model forecasts had narrow confidence intervals regardless of the transmission period (peak or non-peak season). Forecasts made by the ARIMA model (S3 Fig) typically captured the seasonal transmission dynamics (i.e., increased cases during the peak season and fewer cases during the low dengue season); however, ARIMA forecasts varied only marginally from year to year, indicating an inability to differentiate between large and small epidemics.
In San Juan, both RF and Poisson models captured the general trend in dengue case counts regardless of whether surveillance data were included (Fig 4). When surveillance data were included, both RF and Poisson model forecasts were closer to observed case counts than when surveillance data were not included. As was observed in Iquitos, Poisson model forecasts showed [...]
In Singapore (Fig 5), when surveillance data were included in the model, RF and Poisson regression 4-week-ahead predictions did not reflect the general trend in dengue cases for the first 2 sets of testing data (2013 and 2014). In the last 2 test sets (2015 and 2016), 4-week-ahead forecasts for both RF and Poisson regression captured the general trend in dengue cases, suggesting that the training data were not representative of the first 2 test sets (2013 and 2014).
Table 2 summarizes the nMAE and MAE of the residuals between observed weekly dengue case counts and model predictions for the optimal RF, Poisson regression, and ARIMA models by study area and by the data used to make the predictions (results for all evaluated models are available in S4 Table). When the evaluated models predicted dengue cases 4 weeks ahead and surveillance data were included, RF forecasts were more accurate than both Poisson regression and ARIMA forecasts. We estimated RF nMAEs of 0.87, 0.27, and 0.40 in Iquitos, San Juan, and Singapore, respectively. On average, RF forecasts had 21% and 33% less error than Poisson regression and ARIMA models, respectively. As model performance may differ by dengue season, we also evaluated model accuracy during peak and non-peak dengue periods [102][103][104]. During peak dengue season (S5 Table), the RF model had less error than Poisson regression and ARIMA models in San Juan (RF nMAE: 0.22) and Singapore (RF nMAE: 0.37). In Iquitos, the ARIMA model had the least error (ARIMA nMAE: 0.70).
During the non-peak dengue season (S6 Table), Poisson regression had the least error in Iquitos (Poisson nMAE: 0.91), while RF had the smallest nMAE in San Juan (RF nMAE: 0.37). In Singapore, RF and Poisson regression had identical nMAEs (0.43).
We also evaluated each model's ability to make long-term (12-week-ahead) forecasts of dengue case counts. Compared with RF and Poisson regression, ARIMA had a smaller nMAE in Iquitos and Singapore (0.85 and 0.40, respectively). However, in San Juan, RF (nMAE: 0.48) had less error than Poisson regression (nMAE: 0.59) and ARIMA (nMAE: 1.16). We observed similar trends in performance during the peak dengue season (S5 Table). During the non-peak dengue season (S6 Table), RF was more accurate than Poisson regression and ARIMA in Iquitos and San Juan (Iquitos RF nMAE: 1.34; San Juan RF nMAE: 0.59). In Singapore, ARIMA performed better than both RF and Poisson regression (ARIMA nMAE: 0.43).
To understand how model accuracy is affected when current surveillance data are unavailable, we retrained models using only population, temporal, and weather data inputs. We [...]

The strongest predictors of dengue case counts
Using variable analysis, we identified the strongest RF model predictors of weekly dengue case counts (Figs 6-8). When models included surveillance inputs, previous dengue levels were the strongest predictors for near-term forecasts. When model forecasts were based upon only population, temporal, and weather data, the strongest predictors included population size, 3- and 4-year periodicity, multi-week cumulative rainfall, peak daily rainfall (Iquitos only), the average and variation in minimum daily temperature (Iquitos only), and monthly air passenger arrivals (Singapore only). Of note, these predictors were typically distributed over lag periods greater than 15 weeks. Across all study areas, we found that the inclusion of surveillance predictors had a much smaller impact on the model's long-term forecast accuracy.

Forecasting dengue outbreaks
Table 3 presents model MCCs, summarizing how well the optimal RF, RF-UFA, and Logistic regression models predicted weekly dengue outbreaks 4 and 12 weeks in advance (results for all evaluated models are available in S7 Table). When predictions were made 4 weeks in advance and based upon surveillance, population, temporal, and weather data, both RF and RF-UFA performed worse than Logistic regression in San Juan and Singapore (Logistic regression MCC, San Juan: 0.84; Singapore: 0.57). RF-UFA had the largest MCC in Iquitos (0.56). For long-term forecasts, RF-UFA outperformed all other models, with MCCs of 0.58, 0.61, and 0.30 in Iquitos, San Juan, and Singapore, respectively. On average, RF-UFA MCCs were 125% and 79% larger than RF and Logistic regression model MCCs, respectively. To evaluate RF-UFA's utility as an early warning tool, we compared the total number of high- and low-risk flags per week with weekly dengue case counts (Figs 9-11). Using Pearson's correlation, we estimated the correlation between high-risk flags and dengue cases to be 0.60, 0.69, and 0.73 in Iquitos, San Juan, and Singapore, respectively. We observed weaker negative correlations between the number of low-risk flags and dengue cases in Iquitos (-0.35) and Singapore (-0.37), but a strong negative correlation in San Juan (-0.79).
Fig 10. The number of high-risk (red) and low-risk (blue) flags per week that are met 12 weeks in advance, plotted against weekly dengue case counts (black) in the testing data. Grey regions represent observed outbreak weeks. Thresholds were identified using UFA and are associated with dengue outbreaks 12 weeks into the future. Black dashed lines indicate the beginning of a new test set.
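The flag-count analysis can be sketched as counting, for each week, how many UFA-style flags are met and then correlating those counts with case counts. The threshold dictionary format, the ≥/≤ convention for "met," and the example values are assumptions for illustration.

```python
import math
from statistics import mean


def weekly_flag_counts(rows, thresholds):
    """Count the high- and low-risk flags met in each week.

    rows: one dict of predictor values per week (values observed 12 weeks earlier).
    thresholds: {predictor: (cutoff, "high" or "low")}, e.g. from a UFA-style search.
    """
    high, low = [], []
    for row in rows:
        high.append(sum(1 for name, (cut, kind) in thresholds.items()
                        if kind == "high" and row[name] >= cut))
        low.append(sum(1 for name, (cut, kind) in thresholds.items()
                       if kind == "low" and row[name] <= cut))
    return high, low


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


thresholds = {"rain": (10, "high"), "temp": (27, "high"), "hum": (60, "low")}
rows = [{"rain": 12, "temp": 28, "hum": 70},
        {"rain": 5, "temp": 26, "hum": 55},
        {"rain": 11, "temp": 26, "hum": 65}]
high, low = weekly_flag_counts(rows, thresholds)
print(high, low)  # prints [2, 0, 1] [0, 1, 0]
```

Summing flags gives a simple risk score that rises when several independent predictors cross their high-risk cutoffs at once.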

Discussion
In this study, we developed RF, regression, and ARIMA models to predict dengue cases and outbreaks in 3 geographic locations. For near-term forecasts, we found that RF performed better than both Poisson regression and ARIMA when the model had access to prior dengue surveillance data (Table 2). On average, RF predictions had 21% and 33% less error than Poisson regression and ARIMA models, respectively. These results are consistent with other studies comparing the forecasting capabilities of RF with regression and time series models [40,45,84]. We believe that RF's better performance is due to the model's ability to capture the nonlinear dynamics that are part of dengue ecology [105] and to learn the trajectory of an outbreak from previously observed outbreaks. When forecasts were extended to 12 weeks in advance, the ARIMA model had the least error in Iquitos and Singapore. However, in San Juan, RF performed better than Poisson regression and ARIMA. The ARIMA model's advantage over the RF and Poisson models at this horizon may be due to its ability to describe key underlying factors without being overly complex [106]. Together, the short- and long-term results appear to indicate that for short-term prediction, where the outcome is more certain, models benefit from added complexity, which increases model accuracy. However, for long-term predictions, where the outcome is less certain, additional model complexity appears to hurt model accuracy.
In a forecasting challenge that used similar dengue and weather data from Iquitos and San Juan, mechanistic, statistical, and multimodel ensemble models were used to predict 3 dengue outcomes: peak incidence, week of peak incidence, and total incidence [106]. Model performance was highly variable, and models did not consistently perform well across locations and prediction targets. Similar to our study, the models did not perform well during high-incidence seasons, potentially because only a few high-incidence seasons were available for training. Further, Johansson et al. (2019) found that, on average, models that included biologically meaningful data and mechanisms had lower accuracy [106]. This result appears to support our finding that ML models can, at times, better leverage biologically meaningful data, as they use a more flexible framework and do not require a priori assumptions about the predictor-target relationship.
Due to delays in case identification, current surveillance data may not be available in real time. To evaluate this limitation, we removed model inputs related to surveillance data and reassessed model performance. We found that values predicted by both RF and Poisson regression followed the general trend in dengue case counts in Iquitos and San Juan but not in Singapore. Our results show that both models were sensitive to the lack of surveillance data and that model error increased without it. The increase in error is most likely a result of similar yearly weather patterns combined with high inter-annual variation in dengue spread. As such, these models are unable to fully anticipate whether future dengue levels will be high or low when surveillance data are unavailable.
In each study area, the random forest model had a high degree of confidence in its predictions, as evidenced by the narrow confidence intervals. Although the intervals were narrow, the observed weekly case counts typically fell outside them. This is due to the way that the random forest model estimates the standard error: as the variation in predictions among the individual trees [102]. This result indicates that there was little variation in predicted values among individual trees.
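The sketch below shows why tree-spread intervals can be narrow yet miss the observed value. It assumes a normal-approximation interval with z = 1.96 built from the standard deviation of per-tree predictions; the exact interval construction used in the study is not stated here.

```python
from statistics import mean, stdev


def tree_spread_interval(tree_predictions, z=1.96):
    """Center and normal-approximation interval from the spread of per-tree predictions."""
    center = mean(tree_predictions)
    spread = stdev(tree_predictions)  # variation among individual trees only
    return center - z * spread, center, center + z * spread
```

If every tree predicts 40 cases, the interval collapses to (40, 40, 40) even when 80 cases are observed: the spread reflects agreement among trees, not the model's distance from the truth.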
For some scenarios, such as vector control planning, the accurate prediction of outbreak periods may be sufficient to provide an early warning of an imminent dengue outbreak. The RF-UFA model was able to forecast weekly dengue outbreaks 12 weeks in advance, with MCCs ranging from 0.27 to 0.61 (Table 3). Further, the RF-UFA model was able to indicate periods of low dengue risk 12 weeks in advance (Figs 9-11). Of interest, RF-UFA performed well even when surveillance data inputs were removed from the model. In our analysis of the RF-UFA model, we found that the number of weekly high- and low-risk flags correlated well with dengue cases. Twelve weeks has been identified as the optimal lead time to enact widespread vector control efforts [11]; based upon our study results, RF-UFA could be a beneficial addition to an early warning system due to its ability to identify changes in dengue spread risk.
Another study objective was to identify the strongest predictors of dengue case counts (Figs 6-8). According to our models, the strongest predictors were previous levels of dengue cases, indicating that factors such as force of infection have a stronger influence on local transmission than weather factors. These results do not imply that weather is unimportant; rather, once suitable weather conditions are achieved, outbreak risk becomes a function of other drivers such as vector control, population immunity, and virus infectivity. Interestingly, in Johansson et al. (2019), models that incorporated weather and surveillance data typically performed worse than models based only on surveillance data, suggesting that previous levels of dengue cases are the strongest predictors [106]. The authors further hypothesized that surveillance predictors alone may contribute information equivalent to that of weather predictors regarding future dengue levels, and that adding weather data may overly complicate the model [106].
For each study area, when we removed surveillance inputs from the models and predictions were based on population, temporal, and weather data only, the strongest predictors typically described multi-week weather patterns distributed over lag periods greater than 15 weeks. The observed relationships in our study are most likely due to the phase difference between seasonal signals causing the variables to become correlated, rather than to a causal mechanistic link [57]. The strongest weather predictors showed low week-to-week variation but larger month-to-month variation. In addition, the observed lag periods are near the maximum lag over which weather variables have been observed to affect dengue transmission. In Singapore, monthly air travel patterns distributed over long lag periods were also a strong predictor of dengue cases. Though global travel has been identified as an important driver of dengue outbreaks in Singapore [107], the effect of imported cases has been observed to persist for at most 14 to 16 weeks, suggesting that this finding is also due to phase differences [108][109][110][111][112].
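The sketch below shows how two seasonal series with no causal connection can be almost perfectly correlated at a long lag purely because of a phase difference. Both series are synthetic sinusoids with a 52-week period; the 17-week offset is an arbitrary assumption chosen to mirror the >15-week lags discussed above, not a value estimated from our data.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def lagged_corr(predictor, outcome, lag):
    """Correlate the predictor at week t - lag with the outcome at week t."""
    return pearson(predictor[:len(predictor) - lag], outcome[lag:])

weeks = range(52 * 10)  # ten years of weekly data
# Hypothetical seasonal signals sharing a 52-week period; neither drives the other.
outcome = [math.sin(2 * math.pi * t / 52) for t in weeks]
predictor = [math.sin(2 * math.pi * (t + 17) / 52) for t in weeks]  # peaks 17 weeks earlier

for lag in (0, 8, 17):
    print(lag, round(lagged_corr(predictor, outcome, lag), 2))
```

The correlation at a 17-week lag is essentially 1 even though the series were generated independently, which is why long-lag weather predictors should be interpreted cautiously.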
Our study has some limitations. Data availability may have negatively affected model performance. We could not obtain vector control data, which are critical in reducing outbreak size [11][113][114][115]. Vector control may also confound the relationship between predictors and prediction outcomes; without these data, the models may have learned biased predictor-outcome relationships.
To train our models, we used dengue case counts as reported by passive surveillance systems. As such, asymptomatic and clinically mild cases were most likely missed, suggesting that model predictions are underestimates of the true number of cases [2].
Our study highlighted various limitations of each modeling approach. When predicting dengue case counts, RF consistently underestimated observed extreme values, for example, the 2011 outbreak in Iquitos and the 2013 outbreak in Singapore (Fig 5 and Fig 7). This consistent underestimation is a direct result of RF's inability to predict outside the training set's outcome distribution [92]. Despite this limitation, the RF model typically identified when dengue cases would peak. In contrast, Poisson regression occasionally overestimated peaks and predicted them with a delay, owing to the model's reliance on the previous week's reported cases and the linear relationship imposed by the model. When predicting weekly outbreaks, we found that all models performed poorly in Singapore, where dengue cases increased to unprecedented levels beginning in 2013 during a severe dengue outbreak throughout Southeast Asia [97][98][99][100]. The models were unable to account for this shift in dengue dynamics.
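The bound behind this limitation can be illustrated with a deliberately simplified tree ensemble. The sketch below bags depth-1 regression trees (stumps) in plain Python on synthetic data; it is not the random forest implementation used in our analysis, and it omits the per-split predictor subsampling of a full random forest because there is only one predictor. Because every leaf predicts an average of training outcomes, the ensemble average cannot exceed the largest training outcome, even for inputs far outside the training range.

```python
import random

def fit_stump(xs, ys):
    """Depth-1 regression tree: choose the split that minimises squared error."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= thr]
        right = [y for x, y in pairs if x > thr]
        if not left or not right:
            continue  # a split between tied x values can leave one side empty
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, thr, ml, mr)
    if best is None:  # every x identical: fall back to the sample mean
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, thr, ml, mr = best
    # Each leaf returns an average of training outcomes.
    return lambda x: ml if x <= thr else mr

def fit_bagged_stumps(xs, ys, n_trees=50, seed=0):
    """Average stumps fitted to bootstrap resamples (a bagged tree ensemble)."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(tree(x) for tree in trees) / len(trees)

# Synthetic training data: the outcome grows with x; the largest training value is 95.
xs = list(range(20))
ys = [5 * x for x in xs]
model = fit_bagged_stumps(xs, ys)

# An input far beyond the training range is still capped by the training outcomes.
print(round(model(100), 1), "<=", max(ys))
```

A true trend would give 500 at x = 100, but the ensemble stays below 95, mirroring how tree-based models underestimate outbreaks larger than any seen during training.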
In evaluating RF-UFA performance, we found that this model suffered from false positives in Iquitos and San Juan. Typically, the model predicted an earlier onset and a later end to the outbreak period and, on occasion, would incorrectly predict extended outbreak periods during the traditional peak dengue months. This is certainly problematic and requires further attention since too many false positives can lead to alarm fatigue and can rapidly deplete limited resources [116].

Conclusions
In this study, we compared the ability of ML, regression, and time-series modeling approaches to forecast dengue case counts and outbreaks. When dengue surveillance, population, temporal, and weather data were used as model inputs, RF was more accurate than both Poisson regression and ARIMA models for near-term predictions, while the ARIMA model performed best for long-term predictions. We also found that, when predicting dengue outbreaks using only population, temporal, and weather data as model inputs, RF-UFA outperformed both RF and logistic regression models. Given these potential advantages, the forecasting capabilities of dengue early warning systems may be improved by the inclusion of ML models.

S1 Text. Additional Materials and Methods.
(DOCX)
S1 Table. Optimal model performance when predicting weekly dengue case counts during the typical peak dengue season. The ARIMA model was only developed using previously observed case counts. Abbreviations: nMAE: normalized mean absolute error; MAE: mean absolute error. Iquitos peak dengue season: January to July [102]. San Juan peak dengue season: May to November [104]. Singapore peak dengue season: September to February [103].
Table. Optimal model performance when predicting weekly dengue case counts during the typical low dengue season. The ARIMA model was only developed using previously observed case counts. Abbreviations: nMAE: normalized mean absolute error; MAE: mean absolute error. Iquitos peak dengue season: January to July [102]. San Juan peak dengue season: May to November [104]. Singapore peak dengue season: September to February [103].