Modeling and Predicting Hemorrhagic Fever with Renal Syndrome Trends Based on Meteorological Factors in Hu County, China

Background Hu County is a serious hemorrhagic fever with renal syndrome (HFRS) epidemic area, with notable fluctuation of the HFRS epidemic in recent years. This study aimed to explore the optimal model for HFRS epidemic prediction in Hu. Methods Three models were constructed and compared, including a generalized linear model (GLM), a generalized additive model (GAM), and a principal components regression model (PCRM). The fitting and predictive adjusted R2 of each model were calculated. Ljung-Box Q tests for fitted and predicted residuals of each model were conducted. The study period was stratified into before (1971–1993) and after (1994–2012) vaccine implementation epochs to avoid the confounding factor of vaccination. Results The autocorrelation of fitted and predicted residuals of the GAM in the two epochs were not significant (Ljung-Box Q test, P>.05). The adjusted R2 for the predictive abilities of the GLM, GAM, and PCRM were 0.752, 0.799, and 0.665 in the early epoch, and 0.669, 0.756, and 0.574 in the recent epoch. The adjusted R2 values of the three models were lower in the early epoch than in the recent epoch. Conclusions GAM is superior to GLM and PCRM for monthly HFRS case number prediction in Hu County. A shift in model reliability coincident with vaccination implementation demonstrates the importance of vaccination in HFRS control and prevention.


Introduction
Hemorrhagic fever with renal syndrome (HFRS) is a rodent-borne zoonosis caused by hantaviruses. It is characterized by varying degrees of bleeding diathesis, hypertension, and renal failure [1]. The overwhelming majority of reported cases of HFRS in Asia have occurred in China and the annual HFRS incidence of China has ranked No.1 in the world since 2000 [2]. The incidence of HFRS in Hu County ranked third of all counties in China [3]. Both Hantaan viruses (carried by Apodemus agrarius mice that thrive in the wild) and Seoul viruses (carried by Rattus norvegicus rats that thrive in residential areas) have been detected in Hu County, though Hantaan virus infections are more prevalent.
The incidence of HFRS in China has decreased significantly since 1984 owing to improved housing conditions, improved hygiene, and human migration from rural areas to cities [4]. However, the incidence of HFRS in Hu County has shown large fluctuations in the same period. It decreased greatly from 300.57/100,000 people in 1984 to 9.53/100,000 in 2005, and then increased to 48.46/100,000 in 2010 [5]. These incidence rates are 33.89 times, 5.96 times, and 68.25 times the rates of China as a whole in these sample years, respectively. The cause underlying the re-emergence of HFRS in Hu County after 2005 is not known. Elucidating the dynamic tendencies of the HFRS epidemic in Hu County will be critical for enabling China to allocate medical health resources effectively and to develop an appropriate plan for the prevention and control of the disease in HFRS re-emergent areas.
The epidemiologic profile of HFRS is influenced by numerous environmental and social factors, including meteorological factors [6,7], rodent density [6], and vaccination [8]. Meteorological factors affect the behavior and survival of rodents, and thus also influence rodent density and the likelihood of people having contact with rodent excrement. Accurate rodent density data are difficult to obtain because densities are estimated based on the numbers of rats caught by traps in fields and around houses. These catch numbers are influenced by many factors and catch frequency can be irregular.
Researchers have attempted to predict HFRS epidemics based on meteorological factors and obtained evidence supporting the notion that meteorological factors are the main factors influencing HFRS epidemic risk [9][10][11][12][13]. Although these studies have shown good predictive capacities, there has been inconsistency between models and regions in terms of predictor variables and coefficients, leaving it unclear which model is most appropriate for HFRS epidemic prediction. This variability may due, at least in part, to the fact that the meteorological factors affecting epidemics are themselves influenced by many other factors, such as elevation (relative to sea level), vaccination compliance, and which rat species live in a region. Therefore, a prediction model constructed according to data from one area would not be expected to be universally applicable. In order to predict HFRS epidemics with confidence, it is necessary to construct a prediction model according to the relationship between HFRS epidemics and meteorological factors within each specific area.
Several types of mathematical models have been used to predict HFRS epidemics, including the generalized linear model (GLM) [10][11][12], seasonal autoregressive integrated moving average model (SARIMAM) [9], and autoregressive integrated moving average model (ARIMAM) [14,15]. The prior ARIMAMs were based purely on data describing HFRS cases in prior months as predictor variables, without consideration of other potential influencing factors. The prior GLMs and SARIMAM were based on the hypothesis that HFRS cases and meteorological factors are in linear or logarithmic relationships. However, case-factor relationships are not confined to a fixed function. Therefore, some studies have attempted to describe these unfixed relationships using a generalized additive model (GAM) and a principal components regression model (PCRM) [16][17][18]. The GAM is a multiple regression model where smooth functions are used instead of pre-specifying forms for explanatory variables [19]. It provides a flexible model with which to explore the relationship between response and explanatory variables. The PCRM is a multiple regression model that involves the use of the principal components extracted from explanatory variables as predictor variables. It removes multi-colinearity between explanatory variables by dimension reduction. However, data on the accuracy of HFRS epidemic prediction using a GAM or PCRM are still scarce. Therefore, it remains unclear which model type would be better for HFRS epidemic prediction, which hinders HFRS control and prevention.
This study aimed to explore the optimal model for HFRS epidemic prediction in Hu County based on meteorological factors. To obtain accurate and comparable results, we constructed and compared a GLM, a PCRM, and a GAM, three frequently-used and well-regarded model types.

Ethics statement
This study was approved by the Ethics Committee of Fourth Military Medical University (Grant No.: 2013026). All data analyzed in this study were anonymized. All patient records were anonymized and de-identified prior to analysis.

Data collection and management
Monthly records of HFRS cases in Hu County from1971 through 2012 were obtained from the Hu Center for Disease Control and Prevention (CDC). HFRS is a national class B notifiable communicable disease in China, and Hu County is a monitor sentinels for HFRS [20]. All hospitals and clinics in Hu County are obliged to report HFRS cases to the Hu CDC through the National Infectious Disease Reporting System within 24 hours [21]. Before 1982, HFRS was diagnosed according to the national standard clinical criteria [22]. After 1982, HFRS was diagnosed first in hospitals and clinics and then confirmed by laboratory tests at the Hu CDC. Only a few cases in which sudden death occurred (3 cases in this study) were not confirmed in the Hu CDC laboratory. Annual population data for Hu County during the 1971-2012 period were determined based on information from the Hu Bureau of Statistics. Population was estimated from annual household registration records maintained by the local police departments. Monthly meteorological data for Hu County during the same period, including mean temperature (MT), mean maximum temperature (MaxT), mean minimum temperature (MinT), accumulative rainfall (R), and mean relative humidity (H), were obtained from the Hu Meteorological Bureau. Mean meteorological values were calculated by averaging data across the 18 meteorological stations in Hu County.

Cut-off of study period
An HFRS vaccination program has been in place in Hu County since 1994 and has contributed to the area's decreased HFRS incidence [8]. In order to avoid the influence of vaccination as a confounding factor, the study period was stratified into the epochs before  and after (1994-2012) implementation of the vaccine program. All data analyses and model constructions were conducted using data in each these two time periods.

Cross-correlation analysis and autocorrelation analysis
A cross-correlation analysis between monthly HFRS cases and meteorological index sequences were conducted with a lag time of six months to enable detection of the dominant meteorological factors influencing HFRS infections. Cross-correlations could be identified when the cross-correlation coefficient (CCF) was more than double the standard error (SE). The autocorrelation analysis of monthly HFRS cases was performed to explore the relationship between HFRS infections in consecutive months with the aim of accounting for short-term temporal persistence and annual trends. Autocorrelations could be identified if the autocorrelation coefficient (AC) was more than double the standard error (SE). Those variables that correlated significantly with the number of HFRS cases were selected as preliminary predictor variables.

Multi-collinearity analysis
The variance inflation factor (VIF), tolerance (T), and Spearman's rank correlation coefficient were calculated to examine the degree of multi-collinearity among the preliminary predictor variables identified in the cross-correlation analysis. Multi-collinearity was identified when the VIF was greater than 10.

Model construction
We constructed and compared three models using meteorological factors as explanatory variables. Model 1 was a GLM with a Poisson distribution and a log link performed after adjustment for autocorrelation, seasonality, and lag effects, including meteorological variables that were selected from the preliminary predictor variables by the stepwise least square method, and HFRS cases in previous months. Model 2 was a GAM employing the same response variables as model 1. For the GAM, predictor variables with smooth functions (including meteorological variables) were selected from the preliminary predictor variables by the backfitting method, and HFRS cases in previous months. The generalized cross-validation criterion was used to estimate the smoothing parameter in the GAM. Model 3 was a PCRM that examined monthly HFRS cases against predictor variables, including principal components that were extracted from the preliminary predictor variables and HFRS case numbers from previous months. A 10-fold cross-validation analysis was employed in the PCRM fitting process to determine the number of principal components that should be included in the model.
The three models were constructed as follows: Both s i (Á) and s j (Á) are smooth functions of the predictor variables in Model 2; they are fitted using a cubic regression spline. The smoothing parameters are determined using restricted maximum likelihood. The variable y t is the number of HFRS cases in time t, ŷ t is the expected value of y t , x i(t−k) represents the meteorological factors, and Z i represents the principal components. The terms α 1 , α 2 , and α 3 are constants. In these three models, the terms containing x i(t−k) denote the effect of meteorological factors, the terms containing y t−j describe the effects of auto regression, and the term X q i¼1 o i Z i represents the effect of the principal components.
The parameters m, p, n, and q represent the number of meteorological variables, the maximum lagged months of meteorological variables, the maximum lagged months of HFRS cases, and the number of principal components, respectively. The terms β i , γ i , and ω i are coefficients for each of the variables.
The three models were fit with data from the first 80% of the timeline within each of the two study epochs (1971-1988 and 1994-2008), and then we predicted numbers of HFRS cases for the remaining 20% of each of the epochs using concurrent meteorological data.

Model evaluation
The actual, fitted, and predicted values of monthly HFRS cases during each epoch of the three models were plotted with different colored lines. Fitting and predictive adjusted R 2 values were calculated, and Ljung-Box Q tests for fitted and predicted residuals were conducted to compare the fit and predictive ability of the three models. The adjusted R 2 were calculated as follows: where n is the number of months included, p is the number of response variables in the model, and R 2 is the square of the correlation coefficient between fitted or predicted values and actual values. All of these analyses were conducted using R Project 3.0.2 (R Development Core Team, 2012).

Descriptive analysis
There were 12,714 HFRS cases reported in Hu County between 1971 and 2012. The annual average incidence was 59.96 per 100,000, with the annual incidence ranging from 9.53/100,000 in 2005 (nadir) to 300.57/100,000 in 1984 (zenith). A clear seasonality phenomenon became apparent, with peaks occurring in summer and winter. The smaller of the two annual peaks occurred in summer between June and July, accounting for 9.88% of all the HFRS cases (1,256 cases). The larger peak occurred in winter between October and December and accounted for 75.49% of all HFRS cases (9,598 cases). The monthly MT, MaxT, MinT, R, and H ranged from -3.7°C to 33.3°C, from 5.3°C to 41.9°C, from -19.0°C to 20.5°C, from 0 mm to 374.9 mm, and from 39% to 90%, respectively. (Fig 1)

Cross-correlation between monthly HFRS cases and meteorological factors
The number of HFRS cases reported each month correlated significantly with meteorological factors from the current and the previous month in both epochs ( Table 1). The following variables correlated significantly with the monthly numbers of HFRS cases and thus were used as the preliminary predictor variables of the three models: MT 0 , MT 1

Autocorrelation of monthly HFRS cases
The monthly numbers of HFRS cases correlated significantly with the numbers of cases in the first and twelfth lagged months in both epochs, pointing to a clear short-term temporal persistence effect and yearly trend (Fig 2). The variables N 1 (no. cases in the immediately previous month) and N 12 (no. cases in the month 12 months prior) were thus selected for inclusion as predictor variables in the models.

Multi-collinearity among meteorological variables
There were 17 variables with variance inflation factors greater than 10 and corresponding tolerance values less than 0.1 in the early epoch, and 14 such variables in the recent epoch (Table 2). There were 193 pairs of variables that correlated with each other during the early epoch (Table 3, P < .05 or P < .01), and 226 pairs of variables that did so during the recent epoch (Table 4, P < .05 or P < .01). These results indicate that multi-collinearity of the preliminary predictor variables can be observed in this study.

Discussion
Although HFRS epidemic prediction models have been constructed for many geographic areas [9,10,12], they are difficult to apply elsewhere, because the contingencies affecting HFRS outbreak levels vary regionally. For example, the southern oscillation was found to be a risk factor favoring larger HFRS outbreaks in Changsha [10] and Inner Mongolia [12], but a protective factor dampening HFRS outbreaks in Heilongjiang [9]. Although relative humidity was found to be a risk factor in both Heilongjiang [9] and Inner Mongolia [12], the correlation coefficients and parameters of the prediction models differed between these two areas. In the present study, we construct an HFRS prediction model optimized for Hu County based on the relationship between the local epidemiological history of HFRS cases before and after the implementation of vaccinations and concurrent meteorological factors. Comparing results obtained with a GLM, a GAM and a PCRM led us to conclude that the GAM had superior predictive ability relative to the GLM and the PCRM for both the pre-and post vaccination time periods. Redundancy cased by multi-collinearity between meteorological variables in the GLM was minimized by applying the stepwise least square method, while the backfitting method was used in the GAM, and principal components were extracted from the PCRM in the model-fitting process. Both the fitted and predicted residuals of the GAM were white noise in both time periods. Therefore, we conclude that a GAM can be used to predict HFRS trends in Hu County accurately using meteorological factors. GAMs are flexible in that they allow non-parametric fits with relaxed assumptions on the relationship between response and predictor variables. This characteristic has made GAMs useful for studies examining the risk factors of respiratory diseases [23], coronary heart disease mortality [24], and autistic disorder [25], among other conditions. The GAM may be more robust than the GLM for interpreting relationships between response and predictor variables [17,26,27]. Our study provided additional evidence of this GAM advantage in the first application of these models to HFRS incidence prediction. The predictor variables included in our GAM for Hu County may not be maintained in GAMs for other locations with differing rodent host population, environmental characteristics and socio-economic factors. Nevertheless, the flexibility of the GAM in terms of setting predictor variables should be help yield high predictive accuracy.
The results of this study showed that the variables R 1 , R 2 , MT 1 and MaxT 2 were in non-linear relationship with monthly HFRS case numbers, and could be used to predict HFRS case numbers in Hu County in the 1994-2012 epoch. Consequently, epidemiologists should be able to use data describing ongoing fluctuations in these variables to predict future HFRS surges in Hu County accurately. Such predictions will enable targeted countermeasures to be prepared ahead of time and thus implemented promptly, which should improve the effectiveness of HFRS surveillance and control programs. This study provides practical evidence for the usefulness of the GAM in HFRS prediction, with the optimal predictive meteorological variables being determined for each particular locality.
It should be noted that the adjusted R 2 values of the three models in the post-vaccination time period were less than those in the preceding period, which indicates that the predictive power of all three models declined after 1994. Thus, there may be other factors that became more influential after the introduction of vaccines. It has been reported that increasing vaccination compliance plays an important role in reducing HFRS incidence in Hu County [8]. Therefore, we infer that the shift in the models' predictive abilities after 1994 is related, at least in part, to the increasing vaccination compliance in Hu County. The vaccination factor was not included in our models as an explanatory variable because annual vaccination compliance data were tracked in Hu County after 1994, whereas monthly numbers of HFRS cases and meteorological data have been recorded consistently since 1971. It will be important to monitor the GAM's predictive accuracy in Hu County in the coming years. If the adjusted R 2 values demonstrate further decreases, then some other methods should be pursued to adjust the model with respect to vaccination compliance data.
This study had a couple limitations which should be noted. Firstly, in the construction of our prediction model, we considered only meteorological factors and HFRS cases numbers in previous months. We did not account for other potential influencing factors such as rodent density, changes in land-use, and fluctuations in the non-immunized population, which may have decreased the predictive ability of the model. However, relative to data describing other environmental factors, meteorological data are highly objective, accurate, and contiguous, and are readily available. Thus a model constructed based on meteorological data will be more applicable than if it had been based on relatively subjective, inaccurate, discontinuous, or difficult to obtain data. Moreover, the goodness and stability of fit and the predictive capacity of the GAM in this study demonstrated that meteorological factors inform the HFRS epidemic pattern to a great extent and can be used to predict HFRS outbreaks accurately in Hu County. Secondly, the diagnostic criteria for HFRS changed in 1982. However, because the clinical diagnostic criteria did not change and the clinical symptoms of HFRS are easy to identify, most clinically-diagnosed HFRS cases were laboratory-confirmed after 1982. Therefore, we are not concerned that the diagnostic policy change of 1982 affects the suitability of the present GAM models.
In conclusion, this study demonstrated that the GAM is superior to the GLM and PCRM for HFRS case number prediction in Hu County. Additionally, our finding that the models' predictive power shifted the same year that hantavirus vaccination programs were implemented in Hu County provides additional evidence of the importance of vaccination in HFRS control and prevention. This study provides practical evidence for further use of GAM in the epidemiological prediction of HFRS, especially within Hu County.
Supporting Information S1