Time Series Analysis of Hemorrhagic Fever with Renal Syndrome: A Case Study in Jiaonan County, China

Exact prediction of Hemorrhagic fever with renal syndrome (HFRS) epidemics must improve to establish effective preventive measures in China. A Seasonal Autoregressive Integrated Moving Average (SARIMA) model was applied to establish a highly predictive model of HFRS. Meteorological factors were considered external variables through a cross correlation analysis. Then, these factors were included in the SARIMA model to determine if they could improve the predictive ability of HFRS epidemics in the region. The optimal univariate SARIMA model was identified as (0,0,2)(1,1,1)12. The R2 of the prediction of HFRS cases from January 2014 to December 2014 was 0.857, and the Root mean square error (RMSE) was 2.708. However, the inclusion of meteorological variables as external regressors did not significantly improve the SARIMA model. This result is likely because seasonal variations in meteorological variables were included in the seasonal characteristics of the HFRS itself.


Introduction
Hemorrhagic fever with renal syndrome (HFRS) is a zoonosis caused by different species of Hantavirus, Hantaan virus (HNTV) transmitted by the striped field mouse (Apodemus agrarius), Seoul virus (SEOV) transmitted by the Norway rat (Rattus norvegicus), resulting in high fever and varying degrees of renal damage and hemorrhaging [1]. Approximately 90% of the world's cases have been reported in China [2], with 10,000 cases annually in mainland China [3]. In Shandong, the HFRS epidemic exhibited a rebound trend, potentially due to changes associated with climate change and variations in rodent populations [4].
HFRS epidemics can be affected by environmental, population and reservoir factors, among which meteorological factors play an important role in the transmission of HFRS [5][6][7][8][9][10]. These meteorological factors, including temperature, precipitation and relative humidity, not only affect the transmission of Hantavirus but also impact the reservoir, rodents and contact chances between humans and rodents [11][12][13]. The infectivity and survival time of the Hantavirus after it leaves the host is largely dependent on the environmental temperature and humidity, and the chance of contact between humans and Hantavirus is influenced by the rainfall, temperature and humidity [11][12][13]. Few studies have investigated the impact of meteorological factors on the dynamics of HFRS in the context of the increasing trend in recent years.
In this study, we investigated seasonal HFRS variations and developed SARIMA models of the number of HFRS cases using time series analysis. The goal of this study was to characterize whether the inclusion of the affecting factors is useful in predicting epidemics with higher precision. The predictive model would be used to facilitate efficient HFRS control.

Data collection
Monthly HFRS epidemiologic data from Jiaonan were provided by the Jiaonan Center for Disease Control and Prevention, spanning from January 1992 to December 2014. Monthly

Statistical analyses
The analyses included descriptive and correlative approaches. The descriptive analysis included the number of cases, the long-term trends or seasonal variations and a summary of meteorological variables over the study period. A Spearman correlation analysis was performed between each meteorological variable and the number of cases. Moreover, given the potential lag effect of the meteorological variables on disease transmission, a cross-correlation analysis was also performed with relevant time lag values.

Temporal simulation and validation
A SARIMA model was used to predict the number of HFRS epidemics in Jiaonan in this study.
The SARIMA model was based on (p, d, q) (P, D, Q)[s], where p, d, and q are non-negative integers that indicate orders of non-seasonal AR terms, non-seasonal differencing and non-seasonal MA, respectively; P, D, Q are also non-negative integers that indicate orders of seasonal SAR terms, seasonal differencing and seasonal SMA terms, respectively; and s indicates the seasonal period (s = 12 months in this study).
The following steps were undertaken when modeling the number of HFRS epidemics and the meteorological variables. First, we used a mean range plot to determine whether the time series of HFRS and the climate variables exhibited stationary or non-stationary conditions. The autocorrelation function (ACF) and partial autocorrelation function (PACF) plots of HFRS in Jiaonan showed that it was non-stationary. Because both HFRS and the climate variables exhibited strong seasonal variations and fluctuations in their monthly means, we adjusted for seasonality by first seasonally differencing the series in the analysis. Second, the temporal structures of seasonal and non-seasonal autoregressive parameters (AR) (P, p) and moving average parameters (MV) (Q, q) in the series were determined by analysis of ACF and PACF analyses. Upon assessing the ACF and PACF results, the correlograms of the time series suggest that p and q should be <2 and d = 0. Third, parameters in the model were estimated using the maximum likelihood method, and the goodness-of-fit of each model was determined for appropriate modeling using the Ljung-Box test, measuring the ACFs and PACFs of the residuals, and checking the normality of the residuals. The significance of the parameters should be significantly different from zero. The normalized Bayesian Information Criterion (BIC) and stationary R square (R 2 ) were also used to compare the goodness-of-fit among SARIMA models. The model with the lowest BIC and the highest stationary R 2 values was considered a good model. The root mean square error (RMSE) was used to evaluate the predictive validity of the models. In this study, the data from January 1992 to December 2013 were used to construct a SARIMA model, and data from 2014 were used to validate the model.
We further evaluated whether alternative SARIMA models incorporating meteorological variables have greater predictive power. To overcome the autocorrelation within each individual series, the correlation coefficients between the number of HFRS epidemics and meteorological variables were computed after pre-whitening. Pre-whitening was performed by modeling each time series individually, using the SARIMA model to remove the trend and seasonal components and compute the correlation coefficients of the residuals of the time series. Climatic variables significantly associated with the number of HFRS cases were tested as predictors in the multivariate SARIMA model. A comparison of SARIMA with and without climatic variables was conducted. All statistical tests were 2-tailed, and a P value of 0.05 was considered statistically significant in terms of an explorative data analysis. For statistical analyses, we used SPSS software version 19 (SPSS).

Statistical analyses
There were 1868 HFRS cases in Jiaonan from 1992-2014, and the number of cases fluctuated over the study period (Fig 2). The HFRS cases first increased and then decreased, with small fluctuations. The years with the most HFRS cases were 1995 (142 cases), 1999 (262 cases), 2008 (78 cases), and 2012 (127 cases). As shown in Fig 3, the months with the highest HFRS epidemic risk were November>October>December>January, with mean HFRS cases of 23.78>16.87>14.17>5.65.

Temporal simulation using SARIMA models
Univariate SARIMA model. The data were fitted with several univariate SARIMA models, and the models in which the residuals were not likely to be white noise were excluded. Among the models, the univariate SARIMA (0,0,2)(1,1,1) 12 model had both the lowest BIC (3.628) and highest R 2 (0.599) values (Fig 5, Table 2). Analyses of residuals in ACF and PACF plots assessed the absence of persistent temporal correlations. The Ljung-Box test confirmed that the residuals of time series were statistically independent. The selected model adequately fit observed data from 1992 to 2013. Furthermore, the model was used to forecast the number of HFRS cases between January and December 2014 and was validated using actual observations. The prediction of HFRS cases from January 2014 to December 2014 yielded an R 2 value of 0.857 and RMSE of 2.708.
Multivariate SARIMA model integrating meteorological variables. Next, we determined whether HFRS-associated meteorological variables could help refine the prediction model. To include meteorological variables (time series) as external variables, a multivariate SARIMA model was applied to the time series. To find the most HFRS-associated meteorological variables, a cross-correlation analysis was used to compute the lags of meteorological variables significantly associated with HFRS cases.
The BIC and RMSE values increased and R 2 did not improve when the meteorological variables were considered external variables in the alternative SARIMA model (Table 3). Thus, the univariate SARIMA model predicted HFRS epidemics better than did the alternative SARIMA model that integrated meteorological variables.

Discussion
In this paper, monthly HFRS cases in Jiaonan, Shandong, China, from 1992 to 2014 were modeled and validated using a SARIMA (0,0,2)(1,1,1) 12 model. Furthermore, we tested whether meteorological variables, including temperature, precipitation and relative humidity, could be used to improve the prediction. In mainland China, Shandong is one of the areas most severely affected by HFRS epidemics. Additionally, Qingdao City ranked first among cities with the most HFRS cases reported, and Jiaonan reported the most HFRS cases in Qingdao [24]. Although the incidence of HFRS is stable and exhibits a general decreasing trend at the national level in China, an increasing trend has been observed in Jiaonan in recent years [24].
HFRS is a disease with typical seasonal characteristics. In Jiaonan, most HFRS cases were observed in winter months (October, November, and December) and a spring month (January), In this study, meteorological variables, including relative humidity, precipitation and temperature, were correlated with HFRS cases to different degrees. According to a study in Jiaonan by Lin et al., meteorological variables, including daily temperature, humidity and rainfall, might be important predictors of HFRS epidemics in Jiaonan County [25], which has been verified in many studies. [11][12][13]. In this study, we tested whether meteorological variables (relative humidity, precipitation and temperature) significantly associated with HFRS cases could improve the SARIMA model as external regressors. The results showed that the meteorological variables did not significantly improve the SARIMA model, which can be explained as follows: (1) the seasonal variations in the meteorological variables were included in the seasonal mode characteristics of the HFRS itself and were hidden over a relatively large time span; and (2) Other factors, including vaccines, rodent control and improvement of living and working conditions, may affect HFRS epidemics.
Some limitations of this study should be noted. First, human rural activities, landscape features, land use, etc. can also affect HFRS occurrence; however, due to data availability and regain cycles, they were not included in this study. Second, vaccine inoculation has an important impact on HFRS epidemics and may directly affect HFRS epidemics in the subsequent months and years; thus, inoculation should be included in the SARIMA model to improve the prediction precision in the future.
The SARIMA model is good at simulating the temporal variations in HFRS epidemics in China, which should aid hygienic authorities in creating effective measures that prevent and control this disease. More temporal models should incorporate powerful environmental factors to synthetically predict HFRS epidemics in the future.

Conclusion
The SARIMA model developed in this study can be used as an early and reliable monitoring system to predict annual HFRS epidemics. Climate patterns and HFRS were highly correlated; however, they did not improve the simulation results when included in the SARIMA model. The result was likely because the seasonal variations in meteorological variables were included in the seasonal mode characteristics of the HFRS itself.