The association between dengue incidences and provincial-level weather variables in Thailand from 2001 to 2014

Dengue and dengue hemorrhagic fever pose significant burdens in many tropical countries. Dengue incidences have increased continually, leading to an annual (uncertain) peak. Dengue cases cause an enormous public health problem in Thailand because there is no antiviral drug against the dengue virus. Searching for means to reduce dengue incidences is a challenging and appropriate strategy for primary prevention of dengue outbreaks. This study constructs the best predictive model from past dengue incidence statistics at the provincial level and studies the relationships between dengue incidences and weather variables. We conducted experiments for 65 of the 77 provinces in Thailand, since no dengue information is available for the remaining provinces. Predictive models were constructed using weekly data from 2001-2014. The training set consists of data from 2001-2013, and the test set consists of data from 2014. Collected data were separated into two parts: current dengue cases as the dependent variable, and weather variables and previous dengue cases as the independent variables. Eight weather variables are used in our models: average pressure, maximum temperature, minimum temperature, average humidity, precipitation, vaporization, wind direction, and wind power. Each weather variable includes the current week and one to three weeks of lag time, giving a total of 32 independent weather variables for each province. The previous one to three weeks of dengue cases are also used as independent variables, for a total of 35 independent variables. Predictive models were constructed using five methods: Poisson regression, negative binomial regression, quasi-likelihood regression, ARIMA(3,1,4), and SARIMA(2,0,1)(0,2,0). The best model is determined from combinations of 1–12 variables, which yields 232,989,800 models for each province. We construct a total of 15,144,337,000 models.
The best model is selected by ranking the coefficient of determination (R2) from high to low and taking the lowest root mean square error (RMSE). From our results, the one-week-lag previous-case variable is the most frequent in 55 of the 65 provinces (coefficients of determination from a minimum of 0.257 to a maximum of 0.954, with an average of 0.6383, 95% CI: 0.57313 to 0.70355). The most influential weather variable is precipitation, which is used in most of the provinces, followed by wind direction, wind power, and barometric pressure. The results confirm the common knowledge that dengue incidences occur most often during the rainy season. They also show that wind direction, wind power, and barometric pressure influence the number of dengue cases. These three weather variables may help adult mosquitos to survive longer and spread dengue. In conclusion, the most influential factor for further cases is the number of previous dengue cases. However, weather variables are also needed to obtain better results. Predictions of the number of dengue cases should be done locally, not at the national level. The best models of different provinces use different sets of weather variables. Our model is accurate enough for real prediction of future dengue incidences, to prepare for and protect against severe dengue outbreaks.


Dengue disease
Dengue is a mosquito-borne disease, in which Aedes aegypti is the main vector. This type of mosquito is commonly found in tropical countries. Dengue fever (DF) and dengue hemorrhagic fever (DHF), a severe form of the disease, are caused by four dengue serotypes represented by DEN 1, 2, 3, and 4 [1][2][3][4]. Reinfections of different serotypes can cause severe illnesses or deaths.
Once infected, it can take 3-14 days for the virus to incubate. Dengue fever is the initial stage of dengue cases. Symptoms of dengue fever include a high fever, body and muscular aches, nausea, vomiting, skin rash, and fatigue. Dengue fever can last 5 to 7 days. Some patients may recover. However, the disease may develop into the next lethal stage called dengue hemorrhagic fever. Clinical manifestations of DHF include reduced blood pressure due to plasma leakage from capillaries and bleeding due to low platelet counts and impairment of platelet functions. If their blood pressure is extremely low, patients may enter the stage of dengue shock syndrome, lose consciousness, and pass into the last stage of shock.
The World Health Organization (WHO) reports that during the last decade, there were approximately 390 million dengue incidences per year (284-528 million, 95% CI). About 25% of those incidences, or about 96 million people, had serious symptoms [5]. Approximately 10,000 to 20,000 people die from dengue disease each year. Almost 75 percent of the global dengue cases are in Southeast Asia and the Western Pacific Region [4]. In Southeast Asia, all four serotypes of dengue virus have been reported in the Philippines, Thailand, Malaysia, Vietnam, and eastern India [6]. Previous studies proposed that host factors play an important role in the pathogenesis of severe manifestations of dengue infection [7]. Dengue incidences in Southeast Asia were uncontrolled and predominant in children during the 1970s-1990s [7][8][9]. A previous study [10] showed the global burden of dengue. The rate of incidences increased during 1990-2013, and most patients are adults. Unfortunately, there are no cures for dengue. Consequently, dengue is regarded as one of the top three deadly infectious diseases. Prevention is one of the best ways to fight the disease. In this work, we believe that understanding the relationships between the disease and other factors such as weather, together with predicting future dengue infections, is an appropriate approach to the prevention of dengue.
Dengue disease is a large economic burden for Thailand. The work by Clark et al. [12] reveals that the financial loss for dengue infections is higher than the average income in Thailand.

Dengue and weather factors
Dengue viral pathogens typically live in tropical regions with warm temperatures such as India, Africa, Brazil, Saudi Arabia, Malaysia, Sri Lanka, and other Asian countries [4,13]. Many studies show that weather factors have strong associations with dengue incidences. The study by Naish et al. [14] reports that temperature, rainfall, and relative humidity have potential effects on the transmission of dengue. In addition, spatial-temporal patterns and social ecology are associated with the severity of dengue transmission [14,15]. Weather factors can potentially have delayed effects on the number of dengue cases [16,17]. Vapor pressure with a change in geographical limits is also associated with dengue transmission [15].
Due to its severe burden and its strong association with weather variables, many works have created models to study the relationships between dengue incidences and weather variations. Many factors such as humidity, rainfall, and pressure are included in the studies. Statistical models such as the negative binomial regression model have been used in previous studies [18,19]. Non-linear models have also been used in a previous study [20]. Many models also include the lag time to study the delayed effects of weather variables [1,4,6,14,16-30]. The minimum temperature has a high association with dengue incidences in previous studies [21,22,25,28]. This study creates provincial-level models to find the relationships between weather parameters and dengue incidences. These models can also be used to predict dengue pandemics locally. Forecasting future dengue trends would aid the public health department in reducing the dengue burden of Thailand.

Geographical details of Thailand
Thailand is a tropical country north of the equator. The capital city of Thailand is Bangkok. Thailand covers an area of 513,120 square kilometers and is located at latitude 15.8700˚N (5-21˚N) and longitude 100.9925˚E (97-106˚E). Thailand has various topographies, geographies, and weather variations. Provinces are the primary local government units. Thailand is composed of 77 provinces with an average area of 6,663.90 square kilometers. Nakhon Ratchasima covers the largest area of 20,494 square kilometers, and Samut Songkhram covers the smallest area of 417 square kilometers. Due to the location and coverage area of Thailand, almost all regions of Thailand have a tropical savannah climate. Only the southern region has a tropical monsoon climate [31]. Because of this, the weather within a province does not vary greatly.
The northern region of Thailand covers an area of 170,000 square kilometers. This region contains high mountains and various types of forests, including flat river basins. This area is separated into 17 provinces: Chiang Mai, Chiang Rai, Nan, Phrae, Phayao, Mae Hong Son, Lampang, Lamphun, Uttaradit, Kamphaeng Phet, Tak, Nakhon Sawan, Phichit, Phitsanulok, Phetchabun, Sukhothai, and Uthai Thani. The general climate of the northern region consists of three seasons: summer season (March-May), rainy season (June-September), and winter season (October-February). Because the northern region is mostly on a plateau of 800-1200 square kilometers, the average temperature is approximately 25˚C. Rainfall begins in June because of the southwest monsoons. Fig 2 shows a map of the northern region of Thailand.
The northeastern region of Thailand covers an area of 172,500 square kilometers. The area consists of 20 provinces: Kalasin, Khon Kaen, Chaiyaphum, Nakhon Phanom, Nakhon Ratchasima, Bueng Kan, Buri Ram, Maha Sarakham, Mukdahan, Yasothon, Roi Et, Loei, Si Sa Ket, Sakon Nakhon, Surin, Nong Khai, Nong Bua Lam Phu, Amnat Charoen, Udon Thani, and Ubon Ratchathani. Most areas are on the Khorat plateau. The climate seasons are similar to those of the northern region except that the northeast monsoons begin in November. The average temperature is slightly higher than in the northern region, with slightly less rainfall.
The southern part of Thailand covers an area of 70,715 square kilometers and includes 14 provinces: Chumphon, Krabi, Trang, Nakhon Si Thammarat, Narathiwat, Pattani, Phang Nga, Phattalung, Phuket, Yala, Ranong, Songkhla, Satun, and Surat Thani. The population density is approximately 130 people per square kilometer. The southern region of Thailand is located between the Andaman Sea and the Gulf of Thailand. The climate of the southern region follows the tropical monsoons. The highest temperature is typically 39.7˚C in Trang and the lowest temperature is 12.1˚C in Chumphon. The rainy season is longer in the southern region than in the other regions, and the southern region rarely has winter-like weather. The average temperature and average rainfall are the highest among all regions in Thailand.
The central and eastern areas of Thailand consist of 26 provinces: Bangkok, Kanchanaburi, Chanthaburi, Chachoengsao, Chon Buri, Chai Nat, Trat, Nakhon Nayok, Nakhon Pathom, Nonthaburi, Pathum Thani, Prachuap Khiri Khan, Prachin Buri, Phra Nakhon Sri Ayutthaya, Phetchaburi, Rayong, Ratchaburi, Lop Buri, Samut Prakan, Samut Songkhram, Samut Sakhon, Sa Keao, Saraburi, Sing Buri, Suphan Buri, and Ang Thong. The central area has lowland plains where agriculture and industrial stations are located. The climate and seasonal weather are similar to those of the northern and northeastern areas, except for two different monsoons in May and November. The average temperature is higher than in the northern region but lower than in the southern region. The central region of Thailand has a high population density compared to the other regions.

Data
Dengue data. The dengue incidence data used in this study come from the Bureau of Epidemiology (BOE), Ministry of Public Health (MOPH), Thailand. The data are weekly, from the first week of January 2001 to the last week of December 2014. They are clinically suspected cases reported to the MOPH through the National Disease Surveillance Report 506. Since the highest number of dengue incidences is from 2013 (see Fig 1), dengue data from 2001 to 2013 are used as the training set, whereas 2014 data are used as the test set in this study.
Data availability: The dengue incidence data can be obtained through the BOE National Disease Surveillance website.
Weather data. The weekly provincial-level weather data used in this study are from the Thai Meteorological Department. Eight weather parameters are considered in each model: average pressure (avp), maximum temperature (maxt), minimum temperature (mint), average humidity (avh), precipitation (rain), vaporization (vapor), wind direction (dwind), and wind power (pwind). The dengue cases were plotted on different base maps, i.e., a province boundary map, a geographical map, a population density map, and a plain map. Fig 7 shows correlations between dengue cases and weather variables for Bangkok from January 2001 to December 2014. All variables, including dengue cases (patients), time series (weeks), pressure (hPa), maximum temperature (degrees Celsius), minimum temperature (degrees Celsius), humidity (percent), precipitation (ml), vaporization (ml), wind direction (direction), and wind power (kmph), are plotted horizontally from column one to column ten, and vertically from row one to row ten. The distributions of these variables are shown on the diagonal. Each scatter plot shows the correlation between its column variable and its row variable.
Our study also includes one- to three-week lags of each weather variable. Hence, with the current week and three lags for each of the eight weather variables, there are 32 independent weather variables in total. Dengue cases at lags of one to three weeks are also used to construct the models. Therefore, we use a total of 35 independent variables. Thailand consists of 77 provinces. However, dengue and weather data were recorded for only 65 provinces. Due to this data limitation, our study was conducted on 65 provinces in Thailand.
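The construction of the 35 candidate predictors can be illustrated with a minimal sketch. The weather abbreviations come from the text above; the `_lagK` naming scheme, the `cases` key, and the helper functions are hypothetical and are not taken from the study's own code (which was written in R).

```python
from typing import Dict, List, Tuple

# Weather abbreviations as listed in the Data section.
WEATHER_VARS = ["avp", "maxt", "mint", "avh", "rain", "vapor", "dwind", "pwind"]
MAX_LAG = 3  # one to three weeks of lag


def feature_names(weather_vars=WEATHER_VARS, max_lag=MAX_LAG, case_name="cases"):
    """Weather variables at lags 0..max_lag plus previous cases at lags 1..max_lag."""
    names = [f"{v}_lag{k}" for v in weather_vars for k in range(max_lag + 1)]
    names += [f"{case_name}_lag{k}" for k in range(1, max_lag + 1)]
    return names


def build_rows(series: Dict[str, List[float]], max_lag=MAX_LAG,
               case_name="cases") -> List[Tuple[Dict[str, float], float]]:
    """Turn weekly series into (features, target) rows.

    The first max_lag weeks are dropped because their lagged values are unknown.
    """
    n_weeks = len(series[case_name])
    weather_vars = [v for v in series if v != case_name]
    rows = []
    for t in range(max_lag, n_weeks):
        x = {}
        for v in weather_vars:
            for k in range(max_lag + 1):
                x[f"{v}_lag{k}"] = series[v][t - k]
        for k in range(1, max_lag + 1):
            x[f"{case_name}_lag{k}"] = series[case_name][t - k]
        rows.append((x, series[case_name][t]))  # target: current-week cases
    return rows


# Toy weekly data (made up, one weather variable only).
toy = {"rain": [0.0, 1.0, 2.0, 3.0, 4.0], "cases": [10, 11, 12, 13, 14]}
rows = build_rows(toy)
```

With the eight weather variables, `feature_names()` returns 8 × 4 + 3 = 35 names, matching the count given above.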

Statistical analysis and modeling
In this study, we use generalized linear models (Poisson, negative binomial, and quasi-likelihood regression) and time series analysis (ARIMA and SARIMA) to create the best-fitting models for weather variations and dengue incidences.
Generalized linear models. The three generalized linear models (GLMs) used in this study, Poisson, negative binomial, and quasi-likelihood regression, accommodate response data that are nonnegative integers and are not normally distributed. They are commonly used in epidemiological and biostatistical research [16,18,19,21,23,25,26,29,30,32]. Excess zero counts, which cause overdispersion of the data, are permitted as input to these three methods.
Poisson regression is usually used to model count data. The link function of Poisson regression is the logarithmic function. Poisson regression is based on the Poisson distribution and assumes that the mean and the variance are equal; data with excess zeros are typically overdispersed relative to this assumption.
Negative binomial regression is a generalization of the Poisson regression method. This method does not assume that the variance is equal to the mean, so it is used for overdispersed data. The model is based on the Poisson-gamma mixture distribution.
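A quick way to see whether the equal mean and variance assumption is plausible for a count series is to compare the sample variance with the sample mean. The sketch below is illustrative only: the weekly counts are made up, and the variance functions in the comments are the standard textbook parameterizations, which the text does not spell out.

```python
from statistics import mean, variance

# Standard variance functions (textbook forms, stated here as assumptions):
#   Poisson:            Var(Y) = mu
#   Negative binomial:  Var(Y) = mu + alpha * mu**2   (alpha > 0)
#   Quasi-Poisson:      Var(Y) = phi * mu             (phi estimated from data)


def dispersion_ratio(counts):
    """Sample variance divided by sample mean.

    A ratio close to 1 is compatible with a Poisson assumption;
    a ratio well above 1 indicates overdispersion.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero, so the ratio is undefined")
    return variance(counts) / m


# Made-up weekly case counts with many zeros and one outbreak.
weekly_cases = [0, 0, 1, 0, 3, 12, 30, 7, 2, 0, 0, 1]
ratio = dispersion_ratio(weekly_cases)
print(f"variance/mean = {ratio:.2f}")
```

For a series like this one the ratio is far above 1, which is the situation in which negative binomial or quasi-likelihood regression is usually preferred over plain Poisson regression.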
Not all 35 independent variables are pertinent to dengue incidences. Therefore, we have conducted an exhaustive search to find the most relevant subset of potential variables that results in the best-fit prediction model for each province. In previous studies [19,24-28], the prediction models use 4-9 independent variables. For prediction models with up to 9 independent variables, we have conducted an exhaustive search over all combinations. This means that for each type of GLM (Poisson regression, negative binomial regression, and quasi-likelihood regression) in a province, there are $C^{35}_{1} + C^{35}_{2} + \cdots + C^{35}_{9} = 77{,}663{,}191$ models constructed. For models with 10 to 12 independent variables, we use the Forward Selection method to test the next variables and select the best prediction model. Hence, for each province, a total of 77,663,191 + 26 + 25 + 24 = 77,663,266 models are created per GLM method.
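The two-stage search described above (exhaustive over small subsets, then greedy forward selection) can be sketched as follows. This is a minimal illustration, not the study's implementation: the `score` callable stands in for fitting a model and returning an error measure such as RMSE, and `toy_score` and all variable names are hypothetical.

```python
from itertools import combinations
from typing import Callable, Sequence, Tuple

Score = Callable[[Tuple[str, ...]], float]  # lower is better, e.g. an RMSE


def exhaustive_search(candidates: Sequence[str], score: Score, k_max: int):
    """Score every subset of size 1..k_max and return (best_subset, best_score)."""
    best_subset, best_score = None, float("inf")
    for k in range(1, k_max + 1):
        for subset in combinations(candidates, k):
            s = score(subset)
            if s < best_score:
                best_subset, best_score = subset, s
    return best_subset, best_score


def forward_extend(start: Tuple[str, ...], candidates: Sequence[str],
                   score: Score, k_target: int):
    """Greedy forward selection from `start` up to `k_target` variables.

    At each step every remaining candidate is tried once and the best one is
    added. The best subset seen along the way is returned.
    """
    current = tuple(start)
    best_subset, best_score = current, score(current)
    while len(current) < k_target:
        remaining = [c for c in candidates if c not in current]
        if not remaining:
            break
        s, current = min((score(current + (c,)), current + (c,)) for c in remaining)
        if s < best_score:
            best_subset, best_score = current, s
    return best_subset, best_score


# Toy scoring rule (hypothetical): three variables are useful, others add noise.
USEFUL = {"cases_lag1", "rain_lag0", "rain_lag3"}


def toy_score(subset):
    chosen = set(subset)
    return len(USEFUL - chosen) + 0.1 * len(chosen - USEFUL)


pool = ["cases_lag1", "rain_lag0", "rain_lag3", "maxt_lag2", "avp_lag1"]
best, best_s = exhaustive_search(pool, toy_score, k_max=3)
extended, extended_s = forward_extend(best, pool, toy_score, k_target=5)
```

Starting from a 9-variable subset of 35 candidates, `forward_extend` with `k_target=12` evaluates 26, then 25, then 24 one-variable extensions, which is where the "26 + 25 + 24" term comes from.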
Combining 1 to 12 variables in this way creates 77,663,266 models per GLM method. The exhaustive search is restricted to 1 to 9 variables because previous studies show that the best model usually uses 4 to 9 variables [19,24-28]. Because of limited computing performance, variables 10 to 12 are added by the Forward Selection method instead of by exhaustive search. The three GLM methods together give 3 × 77,663,266 = 232,989,798 models. Adding the two models from time series analysis (SARIMA and ARIMA), there are 232,989,798 + 2 = 232,989,800 prediction models constructed per province. Hence, for all 65 provinces, we have constructed 232,989,800 × 65 = 15,144,337,000 models in total for Thailand.
The computers used in this study have 2.2 GHz to 2.5 GHz CPUs. All models are created using R version 3.4.3. We use 37 computers to calculate the models at the same time. It takes approximately 232,989,800/(120 × 60 × 60 × 24) = 22.472 days of computing time to construct all the models for a province (that is, at about 120 models per second). For all provinces of Thailand, it takes approximately 2 months to complete all experiments.
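The model counts and the computing-time estimate above can be checked with a few lines of arithmetic. The exhaustive-search count is taken as reported in the text rather than recomputed, and the rate of 120 models per second is an assumption read off the denominator of the computing-time formula.

```python
GLM_SUBSETS_1_TO_9 = 77_663_191   # exhaustive-search count, as reported in the text
FORWARD_STEPS = 26 + 25 + 24      # forward selection for variables 10 to 12
N_GLM_METHODS = 3                 # Poisson, negative binomial, quasi-likelihood
N_TIME_SERIES_MODELS = 2          # ARIMA and SARIMA
N_PROVINCES = 65
MODELS_PER_SECOND = 120           # assumption inferred from the formula in the text

per_method = GLM_SUBSETS_1_TO_9 + FORWARD_STEPS
per_province = N_GLM_METHODS * per_method + N_TIME_SERIES_MODELS
total = per_province * N_PROVINCES

seconds_per_day = 60 * 60 * 24
days_per_province = per_province / (MODELS_PER_SECOND * seconds_per_day)

print(f"models per GLM method: {per_method:,}")
print(f"models per province:   {per_province:,}")
print(f"models in total:       {total:,}")
print(f"days per province:     {days_per_province:.3f}")
```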

Results and analysis
A summary of statistical analyses of the best models for all 65 provinces is shown in Tables 3 and 4.
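The two criteria used to rank models, the coefficient of determination and the RMSE between observed and predicted cases, can be computed as in the sketch below. The definition R2 = 1 - SS_res/SS_tot is an assumption, since the text does not state which definition was used, and the `pick_best` helper and its tuple layout are hypothetical.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root mean square error between observed and predicted counts."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def pick_best(models):
    """Choose from (name, r2, rmse) tuples: highest R2 first, then lowest RMSE."""
    return max(models, key=lambda m: (m[1], -m[2]))


# Toy example with made-up weekly counts.
observed = [3, 5, 7, 9]
predicted = [2, 5, 8, 9]
r2 = r_squared(observed, predicted)
err = rmse(observed, predicted)
best = pick_best([("a", 0.8, 5.0), ("b", 0.9, 7.0), ("c", 0.9, 6.0)])
```

In `pick_best`, the RMSE only breaks ties between models with equal R2; other ways of combining the two criteria are possible.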

Best-fit prediction model analysis
Our prediction models for all 65 provinces have an average coefficient of determination of 0.6339 (95% CI: 0.57802, 0.6898), with a minimum of 0.026 at Sakon Nakhon and a maximum of 0.954 at Lampang. Our experimental results reveal that the highest-accuracy model for each of the 65 provinces is one of the GLM models. Moreover, the best models of most provinces (44 out of 65) are based on quasi-likelihood regression. The best models of 15 provinces are based on negative binomial regression, and the best models of 6 provinces are based on Poisson regression. This finding suggests that the models that best describe the relationship between weather variables and dengue data should not impose distributional assumptions. In most provinces, the data do not follow a fixed parametric distribution, since most of the best models are based on quasi-likelihood rather than negative binomial or Poisson regression, both of which assume a specific distribution. In our study, time series analysis is inferior to GLM models in all provinces. This suggests that time series analysis may not be the best type of model to describe the relationships between weather variables and dengue incidences. Evidence for this inferiority can be found in Chiang Rai. In a previous study in Chiang Rai, Thailand [29], the SARIMA method was used for constructing a prediction model. That study did not report the coefficient of determination between the predicted cases and actual cases. From the Chiang Rai analysis in S1 Appendix, the quasi-likelihood prediction model matches the observed data better than the SARIMA prediction model. In fact, for the Chiang Rai data, SARIMA has the lowest coefficient of determination between the predicted cases and actual cases among all 5 techniques used in this study. Since all the provincial-level best-fit prediction models are GLMs, we can gauge how influential an independent variable is to dengue incidences from its statistical p-value.

Previous case variable analysis
From our study, the most significant variable associated with dengue incidences is the positive 1-week-lag dengue cases in most of the provinces in Thailand, except in Phichit, Sukhothai, Si Sa Ket, Chachoengsao, Trat, Pathum Thani, Prachin Buri, Rayong, Krabi, and Ranong provinces. The best-fit model for every province contains some weather factors.
The one-week-lag time variable in most provinces has a positive association with predicted dengue incidences. In time series analysis, we fitted the autocorrelation functions (ACF) and partial autocorrelation functions (PACF). An in-depth PACF analysis of each province can be found in S1 Appendix. From the results, 1-week-lag cases provide the highest correlation among all provinces. The best models that contain the 1-week-lag case variable have coefficients of determination from a minimum of 0.257 to a maximum of 0.954 (average of 0.6383, 95% CI: 0.57313 to 0.70355). However, in some provinces, the most significant variables are not the 1-week-lag cases but the 2-week-lag cases (such as Ranong, Si Sa Ket, Chachoengsao, Trat, Pathum Thani, and Rayong) or 3-week-lag cases (such as Sukhothai and Prachin Buri). These results show that the effects of previous cases on dengue incidences can last for up to 3 weeks. Our results show that the PACF indicates a strong relationship at lag times of 1-4 weeks, with the strongest relationship at a lag of one week. This indicates the significance of past cases associated with current cases as a serial relationship in Thailand. Our results show a strong relationship at short lag times of one to three weeks. Our findings for the relationships between short lag times and dengue incidences agree with the classic SIR model, which considers the small time delay that causes the threshold for epidemic diseases. Our results are also consistent with the fact that mosquitoes, as dengue carriers, can harbor the dengue virus for 1-14 days. This is also known as the extrinsic incubation period. Over a period of one month, vertical transmission of the dengue virus, due to rapid infection from adult to offspring mosquitoes, can produce over-critical dengue incidences. Dengue cases at the first lag time are a predictive factor in most of the best prediction models.
Because the stability of the dengue virus is in endemic equilibrium depending on delay times, a delay time of 7 to 14 days is taken as the incubation time for the infective virus to transform into an active viral disease. Thus, the virus could stay in the carriers through the next 1 or 2 weeks. The importance of previous cases is shown in the literature [16,17,21,24,27,33].
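The ACF and PACF referred to above can be computed from a weekly case series as in the following sketch. It uses the sample autocorrelation and the Durbin-Levinson recursion, which is one common way to obtain partial autocorrelations; the study itself used R, so this is an illustration rather than the authors' procedure, and the toy series is made up.

```python
def acf(x, max_lag):
    """Sample autocorrelations rho_0..rho_max_lag of a series."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x)
    return [
        sum((x[t] - m) * (x[t - k] - m) for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]


def pacf(x, max_lag):
    """Partial autocorrelations via the Durbin-Levinson recursion."""
    rho = acf(x, max_lag)
    pac = [1.0]
    phi_prev = []  # phi_prev[j] holds phi_{k-1, j+1}
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = rho[1]
            phi = [phi_kk]
        else:
            num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
            den = 1 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
            phi_kk = num / den
            phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
            phi.append(phi_kk)
        pac.append(phi_kk)
        phi_prev = phi
    return pac


# Made-up weekly case counts with a slow rise and fall.
cases = [1, 2, 3, 5, 8, 12, 15, 13, 9, 6, 4, 2, 1, 2, 4, 7, 11, 14, 12, 8]
rho = acf(cases, 4)
partial = pacf(cases, 4)
```

A large partial autocorrelation at lag 1 followed by small values at higher lags is the pattern that would point to the previous week's cases as the dominant predictor.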

Weather variable analysis
Even though previous cases have strong effects on dengue cases in most provinces, the best-fit models of two provinces, Phichit and Krabi, contain solely weather variables, with no previous-case variables. These best-fit prediction models yield high coefficients of determination of 0.715 (Phichit) and 0.637 (Krabi). These two provinces indicate that previous cases alone are not enough to accurately predict future dengue cases. Weather variables play non-negligible roles in highly accurate prediction models.
Although the previous dengue cases are significantly associated with dengue incidences, weather variables do have merit in the prediction models. In fact, the best models for all provinces contain some weather variables. The most frequent weather variable among all provinces is the current-week precipitation, which is used in 20 provinces. The second most frequent weather variable is the 3-week-lag precipitation, which is associated with predicted dengue incidences. In some provinces, like Bangkok, both the current-week precipitation and the 3-week-lag precipitation are used in the best-fit prediction model. Our results show that precipitation for up to three weeks has a high (positive and negative) influence on dengue incidences. Although heavy precipitation could wash mosquito larvae away in the current week, the amount of precipitation can substantially affect a mosquito's habitat by increasing water storage for the eggs, which is reflected in the positive correlation with the 3-week-lag precipitation. This duration corresponds to the life cycle of mosquitos, which take up to 14 days to reach the adult stage. High precipitation for 30 days has a substantial effect, rapidly increasing the risk of dengue infection, as shown in our pairwise plots. This may be because precipitation initially creates the offspring habitats for mosquito reproduction. More precipitation induces an increased risk of infection. In contrast, massive precipitation can decrease mosquito habitats by destroying the mosquitoes' offspring in water storage or washing mosquito larvae away. Therefore, our results present both positive and negative relationships for precipitation, depending on the location in each province. Our findings strongly show that precipitation is the most influential weather factor for all provinces of Thailand. In Fig 7, some of the highest case counts occur at low precipitation. However, these cases have higher 1- to 3-week-lag precipitation. High case counts may also correspond to the life cycle of mosquitos, in which a female mosquito takes up to 14 days to become fully grown.
Wind direction and wind power are also influential weather variables. For example, the best-fit model of Phayao province, which uses both wind direction and wind power, shows the highest dengue incidences for 5-6 knots of wind power. This suggests that wind power is significant for the dispersion of dengue by mosquitos. The positive coefficient in many provinces shows that higher wind power may affect dengue incidences. The more wind power on a water surface, the larger the region of evaporation. The increase in humidity may help adult mosquitos to survive longer and spread dengue. Wind direction variables are most influential in provinces located in northern and northeastern Thailand. These regions are mountainous areas. Human populations are not spread equally in all directions. A suitable wind direction may help distribute dengue carriers to a more populated area. This can result in a wider spread of dengue incidences.
Current and lagged pressure are minor weather variables for all provinces of Thailand. In our results, pressure has negative associations with dengue cases. Average pressure has a negative association in many provinces, especially in the southern and northern regions. Because the northern provinces are mountainous, there is a seasonal pattern leading to a decrease in local pressure and temperature. Lower pressure intensifies the water storage in the environment by increasing precipitation. The seasonal pattern in the northern provinces causes an increase in the number of mosquitoes, leading to more dengue viral cases while increasing the residence time of incubation, which indicates a pressure-controlled variable.
For the temperature and time-lag effects, our results show that the maximum temperatures from the current week to a 3-week lag are more influential in Thailand than the minimum temperatures. However, the best-fit model of a province with a high dengue mortality rate usually includes the minimum temperature. For example, in Songkhla province, the minimum-temperature variable indicates that dengue cases occur most often at approximately 24-26 degrees Celsius. Previous studies have shown that the minimum temperature has a positive association with dengue incidences. The association of dengue incidences with temperature change has been widely studied across lag times of up to 20 weeks. The minimum temperature has a significant impact on dengue epidemics. Our results show that the current-week minimum temperature is highly significant for the southern, eastern, and central regions.
Beyond the weather variables used in this study, the diurnal range of temperature is an important weather factor that influences dengue transmission. The correlation between dengue incidences and diurnal ranges of temperature has been studied in Bangladesh and Sri Lanka [34,35]. Including this variable in future models may help improve the accuracy of the models.

Regional analysis
In the northern region, the 1-week-lag case variable is the most significant factor associated with dengue incidence in most of the provinces, except Phichit and Sukhothai. For Phichit province, the best model does not include a previous-case variable. For Sukhothai province, the most significant variable is 3-week-lag dengue cases. Considering overall regional weather, the most significant weather variables are, in order: negative current-week pressure, 3-week-lag wind power, 2-week-lag vaporization, and 3-week-lag wind direction. The coefficient of determination in this region averages 0.7208 (95% CI: 0.6067, 0.8349), ranging from a minimum of 0.22 for Phitsanulok to a maximum of 0.954 for Lampang. The northern provinces are mountainous and are at higher elevations than the other provinces. The pressures are lower, so there may be an increase in the mosquito population and the dengue virus. For example, in Lampang province, the results show a high dengue incidence at low pressure. The most frequently used weather variables in the northern region are the 3-week-lag humidity and 3-week-lag wind direction. Humidity, temperature, and dengue cases have a positive correlation. In Tak province, the results show that high humidity and high temperature increase the dengue cases. The results also support the strong positive relation of the wind direction with dengue cases. The second most frequent weather variable in the northern region is the 3-week-lag wind power. As seen in Phrae province, the results show that moderate wind power leads to high dengue cases. Other frequent weather variables are the negative current-week pressure, positive 2-week-lag rainfall, and negative 2-week-lag vaporization. These findings are also supported by Mae Hong Son province. For general weather effects, the best model for fitting the northern region is the quasi-likelihood model.
In the northeastern region, the most significant previous-case variable is the positive 1-week-lag case variable, except for Si Sa Ket province. In Si Sa Ket province, the positive 2-week-lag case variable is the most significant previous-case variable. The coefficient of determination in the northeastern region ranges from 0.026 for Sakon Nakhon to 0.929 for Loei province, with an average of 0.6672 (95% CI: 0.5267, 0.8076). For the overall weather effects, the most significant weather variable is the positive 3-week-lag vaporization. The northeastern region is a high-level plain area called the Khorat plateau, in which sunlight can easily penetrate to the land, compared with other regions, leading to higher vaporization. When vaporization rapidly increases, temperature and dengue incidences also increase proportionally, as shown in the best-fit model of Chaiyaphum province. Other significant weather variables are the negative 1-week-lag pressure and the 2-week-lag wind direction. The significance of the first-lag pressure to dengue incidences was supported by Wongkoon et al., 2011. Nakhon Ratchasima and Nong Khai provinces have a high dengue incidence at low pressure. The northeastern region depends on the 1-week-lag pressure, supporting the results of Nong Khai province. For the 2-week-lag wind direction, the direction is approximately 180 to 200 degrees (from Nong Khai and Nakhon Phanom provinces), which refers to the southeast wind from the South China Sea in the local monsoon climate system [36,37]. The wind supports the pathway of dengue transmission in the northeastern area before passing through other regions. The most frequent weather variables are the negative current-week precipitation and the 3-week-lag precipitation. Our results for Udon Thani show that low precipitation induces a high number of dengue cases. The second most frequent weather variable is the 3-week-lag vaporization. The best model in the northeastern region is the quasi-likelihood model, followed by negative binomial regression, the Poisson regression model, and time series analysis. Time series analysis cannot be used to predict dengue incidences in this region. The quasi-likelihood regression method (13 provinces out of a total of 16 provinces) has a coefficient of determination from 0.026 to 0.929 with an average of 0.6626 (95% CI: 0.4856, 0.8397). The negative binomial regression method has a coefficient of determination from 0.607 for Nakhon Phanom to 0.708 for Kalasin, and the coefficient of determination of the Poisson regression is 0.746 for Nong Khai province.
In the central and eastern regions, the most significant variable is the positive 1-week-lag case variable, except in Chachoengsao, Trat, Pathum Thani, Prachin Buri, and Rayong provinces. The coefficient of determination in the central and eastern regions ranges from 0.138 for Chai Nat to 0.836 for Chanthaburi province, with an average of 0.5440 (95% CI: 0.4417, 0.6464). The most significant weather variables are the negative current-week and 1-week-lag pressure variables. Another frequent variable is the 2-week-lag maximum temperature. In Trat province, the maximum temperature is linearly associated with dengue cases, but the relation to pressure is the reverse. The central and eastern regions comprise a low-lying plain along the Gulf of Thailand (Meteorological Development Bureau, 2015). The temperature in this region favors mosquitoes; therefore, the temperature factors are important for dengue incidences. The most frequent weather variables in this region are the positive current-week and 3-week-lag precipitation and the 2-week-lag maximum temperature. This area is the central point of the rivers in Thailand. When precipitation rises, it can cause an increase in dengue incidences in the area. For Bangkok, the plot of precipitation against dengue cases shows the highest case counts at low precipitation. Another frequent weather variable is the 3-week-lag wind power. As shown for Prachuap Khiri Khan, wind power in a certain direction increases dengue incidences. The best models selected in the central and eastern regions are the quasi-likelihood model, the negative binomial regression model, and time series analysis. Poisson regression is not used in this region. For the quasi-likelihood regression model, the coefficient of determination ranges from 0.138 to 0.836, with an average of 0.5420 (95% CI: 0.4146, 0.6695).
The negative binomial regression model has a coefficient of determination from 0.225 for Pra Nakhon Sri Ayutthaya to 0.759 for Prachin Buri province (average of 0.5496, 95% CI: 0.2877, 0.8115). Time series analysis is used to fit only Bangkok, Chanthaburi, and Ratchaburi provinces, where the coefficient of determination ranges from 0.287 to 0.472. From the result of Chanthaburi, the predictions of time series analysis by SARIMA have sufficient dengue incidences.
In the southern region, the most significant previous case variable is the positive 1-week-lag case variable, except for Krabi and Ranong provinces. In Ranong, the 2-week-lag case variable is the most influential, whereas the best-fit model for Krabi province does not include the previous cases. The coefficient of determination ranges from 0.271 for Satun to 0.884 for Nakhon Si Thammarat, with an average of 0.614857 (95% CI: 0.51033, 0.7194). The most significant weather variable is the negative current-week pressure. As shown in the model of Chumphon province, low pressure leads to high incidences. Since the area is a peninsula between the Andaman Sea and the South China Sea, pressure changes have a significant effect at each individual location. The second most significant weather variable is the positive current-week maximum temperature. The southern region has a tropical rainforest climate in which the temperature is warm throughout the year. Therefore, the effects of temperature are influential, as shown in Trang province: when the temperature increases, the rate of virus infection also increases. Other significant weather variables are the negative 1-week-lag case and the 3-week-lag vaporization. This can be seen in the model of Trang province, where water vapor plays an important role in dengue incidences at moderate vaporization. The most frequent weather variables in the southern region are the negative current-week, 1-week-lag case, and 2-week-lag pressure factors, which all have the same significance. Other frequent weather variables are the 2-week-lag precipitation and 3-week-lag vaporization. This can be observed from the model of Trang province. In this region, quasi-likelihood regression is the best modeling method, followed by negative binomial regression, Poisson regression, and time series analysis. The quasi-likelihood regression model can predict the results.
The coefficients of determination range from 0.43 for Phang Nga to 0.884 for Nakhon Si Thammarat (average of 0.4630, 95% CI: 0.2196, 0.7064). Negative binomial regression yields a coefficient of determination from 0.271 for Satun to 0.637 for Krabi (average of 0.5527, 95% CI: 0.4079, 0.6975). Poisson regression yields a coefficient of determination from 0.506 for Ranong to 0.618 for Chumphon province. For Krabi province, whose best model is independent of the previous case variables, the best model consists of the negative current-week pressure, positive 1-week-lag pressure, negative 2-week-lag pressure, and positive 2-week-lag wind direction, all at high significance. These results show the substantial effects of pressure and wind direction. The wind direction effect is supported by the topography of the surrounding seas, which run parallel to the southern area [37].

High mortality, provincial analysis
In the best-fit models of the provinces with the highest dengue incidences in 2013, the most significant variable is the 1-week-lag case variable, except for Krabi province. The best predictive model for Krabi province does not include the previous case variables, as explained for the southern region. In provinces that have many patients, the most significant weather variables are the positive current-week minimum temperature and the 3-week-lag vaporization, as shown in the model for Mae Hong Son province: dengue incidences increase when both the minimum temperature and the vaporization increase. The effects of wind direction are also significant. The model of Nakhon Phanom shows that the direction of the wind passing through this region has a strong influence on the number of dengue incidences. The most frequent weather variable for the selected provinces is the 3-week-lag vaporization, as shown in the model of Mae Hong Son province. Other frequent weather variables are the positive current-week minimum temperature and the 2-week-lag wind direction. The best-fit model in these selected provinces is quasi-likelihood regression, which has coefficients of determination from 0.43 to 0.954 for Lampang (average of 0.78625, 95% CI: 0.6747, 0.8978). Negative binomial regression is used in some provinces and yields coefficients of determination from 0.51 for Songkhla to 0.909 for Mae Hong Son province (average of 0.7855, 95% CI: 0.4333, 1.13768). Time series analysis is performed only in Chiang Rai, Mae Hong Son, Chiang Mai, Phuket, and Bangkok. Chiang Rai has coefficients of determination from 0.081 to 0.277. The selection of these provinces is supported by previous studies [9,29].

Limitation
The dengue case data used in our study are clinically suspected cases collected from every state hospital in Thailand. Some of the cases may be infections with similar clinical manifestations, such as chikungunya and leptospirosis.

Conclusion
We constructed predictive models to forecast provincial-level future dengue cases based on weather and dengue incidences from 2001 to 2014 in Thailand. There are a total of 35 independent variables, which include the one- to three-week previous cases and the 0- to 3-week lags of each weather variable (average pressure, maximum temperature, minimum temperature, average humidity, precipitation, vaporization, wind direction, and wind power). By using combinations of 1-12 variables to find the best model, we create a total of 77,663,266 models for one method of the generalized linear model. We create predictive models based on three statistical regression models and two time series analysis methods: Poisson, negative binomial, and quasi-likelihood regression models, and ARIMA and SARIMA time series analysis methods. The models for each province were combined for 2-12 independent variables. In total, 232,989,800 models are built for each province, giving 15,144,337,000 models for the 65 provinces studied. For all provinces, the model with the highest coefficient of determination is the quasi-likelihood regression model, which is chosen as the predictive model. We then use the three best predictive models to further analyze the relations among weather variables and dengue cases using pairwise scatter plots, line plots, and ACF and PACF to investigate our results. The results show an average coefficient of determination of 0.6339 (95% CI: 0.57802, 0.6898) for all provinces, ranging from a minimum of 0.026 in Sakon Nakhon to a maximum of 0.954 in Lampang.
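The variable set and the model totals above can be checked with a short sketch. It builds the 35 candidate variable names from the eight weather variables listed in the study's abstract (average pressure, maximum and minimum temperature, average humidity, precipitation, vaporization, wind direction, wind power) and shows one way to enumerate variable combinations. The identifier names and the `candidate_subsets` helper are hypothetical, and the enumeration is demonstrated on a five-variable slice only; it is not the authors' implementation.

```python
from itertools import combinations

# Hypothetical short names for the eight weather variables.
WEATHER = ["pressure", "max_temp", "min_temp", "humidity",
           "precipitation", "vaporization", "wind_direction", "wind_power"]


def build_variable_names():
    """Weather variables at lags 0-3 weeks plus dengue cases at lags 1-3 weeks."""
    weather = [f"{w}_lag{k}" for w in WEATHER for k in range(0, 4)]  # 8 x 4 = 32
    cases = [f"cases_lag{k}" for k in range(1, 4)]                    # 3
    return weather + cases


def candidate_subsets(names, max_size):
    """Yield every combination of 1..max_size variable names."""
    for size in range(1, max_size + 1):
        yield from combinations(names, size)


names = build_variable_names()
print(len(names))  # 32 weather variables + 3 previous-case variables

# Enumerate combinations on a small slice to keep the example fast.
toy_count = sum(1 for _ in candidate_subsets(names[:5], 3))
print(toy_count)  # C(5,1) + C(5,2) + C(5,3)

# Totals as reported in the text: models per province times 65 provinces.
MODELS_PER_PROVINCE = 232_989_800
PROVINCES = 65
total_models = MODELS_PER_PROVINCE * PROVINCES
print(f"{total_models:,}")
```

Each yielded tuple would be passed to a model-fitting routine and scored, with the best-scoring subset kept per province.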
From our study, we found that the 1-week-lag case variable (coefficient of determination from a minimum of 0.026 to a maximum of 0.954, average of 0.6383, 95% CI: 0.57313, 0.70355) is the best predictor associated with dengue incidences for all provinces. However, some provinces are predictable without previous case variables. The 2-week-lag case (coefficient of determination from a minimum of 0.391 to a maximum of 0.714, average of 0.5485, 95% CI: 0.4384, 0.6585) and 3-week-lag case (coefficient of determination from a minimum of 0.642 to a maximum of 0.759, average of 0.7005, 95% CI: -0.0428, 1.4438) variables are the most influential variables in many provinces.
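The confidence intervals reported with each average can be illustrated with a small sketch of a Student-t interval for a mean. It assumes, as a reading of the numbers rather than a stated fact, that the 3-week-lag average of 0.7005 is taken over two provinces whose values are the reported minimum and maximum (0.642 and 0.759). With one degree of freedom the t distribution is the Cauchy distribution, so its critical value has the closed form tan(pi * (p - 0.5)) and no statistics package is needed. The function name is hypothetical.

```python
import math
from statistics import mean, stdev


def t_interval(values, t_crit):
    """Two-sided interval for the mean: mean +/- t_crit * s / sqrt(n)."""
    m = mean(values)
    half_width = t_crit * stdev(values) / math.sqrt(len(values))
    return m - half_width, m + half_width


# 97.5th percentile of Student's t with 1 degree of freedom (Cauchy quantile).
t_crit_df1 = math.tan(math.pi * (0.975 - 0.5))

# Assumed inputs: the reported minimum and maximum for the 3-week-lag case variable.
low, high = t_interval([0.642, 0.759], t_crit_df1)
print(round(low, 4), round(high, 4))
```

Under this assumption the interval is about -0.0428 to 1.4438, so it extends below 0 and above 1; with only two values the interval is too wide to be informative about the coefficient of determination.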
For weather variables, the most frequent variable is the current-week precipitation, followed by the 3-week-lag precipitation, the 3-week-lag wind direction, and the negative current-week pressure. This shows that precipitation (up to a 3-week lag) significantly affects an outbreak of dengue, especially in a tropical country like Thailand.
Our results show that a different set of weather variables should be used in the best prediction model for each province. Different geographical and spatial locations may cause local effects that lead to different dengue models. Each province has unique local factors that cannot be generalized to other provinces, so the independent weather variables needed to accurately predict future dengue cases are not fixed but locally determined. Hence, the best prediction models should be constructed at a local level, such as the provincial level, where they are appropriate and accurate, rather than on a larger scale, such as the national level. Given predicted weather parameters, the predictive models provide high accuracy. The public health institute may use these models in considering future strategies and hospital preparation. Although Thailand is a small country in Southeast Asia, our results show different independent variables in each province, and the individual geographical locations could be important factors for dengue incidences. We can also observe overlapping relations of variables (by areas) within the same province. In conclusion, the best model prediction depends on the geographical variability of each model location. With the presented accuracy, our models can be used to predict the dengue incidences at the provincial level in the future.
Supporting information
S1 Appendix. Complete provincial-level analysis of weather variables and dengue cases. (PDF)