Correction: Applications and Comparisons of Four Time Series Models in Epidemiological Surveillance Data

The name of a funding organization in the Funding statement is incorrect. The Funding statement should read: ''The research is funded by National science and technology major special project "Data mining and analysis of the surveillance data of five syndrome pathogen (grant no. 2012ZX10004201-006)." The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.'' Copyright: ß 2014 The PLOS ONE Staff. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


Introduction
Public health surveillance is an important way to continuously collect, analyze, interpret and disseminate health data essential to prevention and control [1]. Public health surveillance systems are designed to facilitate the detection of abnormal behavior of infectious diseases and other adverse health events. To achieve this goal, different statistical methods have been used to forecast infectious disease incidence. Time series models have long been of interest in the literature. The time series models try to predict epidemiological behaviors by modeling historical surveillance data. Many researchers have applied different time series models to forecasting epidemic incidence in previous studies. Exponential smoothing [2] and generalized regression [3] methods were used to forecast in-hospital infection and incidence of cryptosporidiosis respectively. Decomposition methods [4] and multilevel time series models [5] were used to forecast respiratory syncytial virus. Autoregressive integrated moving average (ARIMA) models have been widely used for epidemic time series forecasting including the hemorrhagic fever with renal syndrome [6,7], dengue fever [8,9], and tuberculosis [10]. Models based on artificial neural networks were also used to predict the incidence of hepatitis A [11,12] and typhoid fever [13].
The decomposition methods are generally the most traditional methods in time series analysis [14,15]. These methods try to break down the original series into a long trend pattern, a seasonal pattern and residuals. Seasonal indices are extracted to express the seasonal pattern; a regression model is established to express the long trend pattern and the residuals are ignored in the methods. Because the decomposition time series methods do not involve a lot of mathematics or statistics, they are relatively easy to explain to the end user. This is a major advantage because if the end user has recognition of how the forecast was developed, he or she may have more confidence in its use for decision making.
The ARIMA models are almost the most widely used methods [16,17]. The ARIMA models are generally derived from three basic time series models (1) autoregressive (AR), (2) moving average (MA), and (3) autoregressive moving average (ARMA). The current value of the time series is a linear function of its previous values and random noise in the AR model; whereas the current value of the time series is a linear function of its current and previous values of residuals in the MA model. The ARMA model is the combination of AR and MA, which considers both the historical values and residuals. The time series required in AR, MA, and ARMA models are stationary processes. This means that the mean and the covariance of the series do not change with time. Transformation of the series into a stationary one has to be performed first for non-stationary time series. The ARIMA model fits the time series data generally based on the ARMA model and a differencing process which effectively transforms the non-stationary data into a stationary one.
In recent years, machine learning based time series models such as artificial neural networks have been successfully applied for modeling infectious disease incidence time series [18]. Support vector machines (SVMs) are a new type of machine learning methods based on statistical learning theory [19]. They could lead to greater potential and better performance in practical applications. This is due to the structural risk minimization principle employed in SVMs, which has greater generalization ability and is superior to the empirical risk minimization principle that is adopted by traditional neural networks. SVMs have been successfully applied in different problems of time series prediction such as forecasting production value in machinery industry [20], predicating engine reliability [21] and economic time series predication [22,23]. The successful utilization of support vector machines in time series predication motivates our research work by using support vector machines for epidemic time series forecasting.
The objectives of the present paper are to compare four typical time series methods, namely, two decomposition methods (regression and exponential smoothing), ARIMA model and SVMs in theory and practice as well as their real forecasting efficacy in epidemic time series. This comparison may be helpful for the epidemiologist to choose the most suitable methodology in a given situation.

Materials
We gathered available monthly incidence of nine typical infectious diseases time series data which were reported by the Chinese Center for Disease Prevention and Control (CDC). The data were collected from the Chinese National Surveillance System established in 2004. The incidence time series of brucellosis, gonorrhea, hemorrhagic fever renal syndrome (HFRS), hepatitis A (HA), hepatitis B (HB), scarlet fever, schistosomiasis, syphilis, typhoid fever from 2005 to 2012 were collected.

Methods
Decomposition methods. The decomposition methods try to extract the underlying pattern in the data series from randomness. The underlying pattern then can be employed to predict future trends and make forecasts. The underlying pattern can also be broken down into sub patterns to identify the component factors that influence each of the values in a series. Two separate components of the basic underlying pattern that tend to characterize the infectious disease time series are usually identified in decomposition methods. They are the trend cycle and seasonal factors. The trend cycle represents long term changes, and the seasonal factor is the periodic fluctuations with constant length that is usually caused by known factors such as rainfall, month of the year, temperature, timing of the holidays, etc. The decomposition model assumes that the data has the following form: Time series = Pattern + Error = Trend cycle+ Seasonality+ error The seasonality part of the time series is usually expressed with the seasonal indices [24]. To arrive at seasonal factors, the entire incidences for the training sample are averaged first, and then the averaged incidence is divided by the mean incidence for each month. If the seasonal index is bigger than 1, it means that the incidence is usually higher than the average level. Otherwise, it means that the incidence is usually lower than the average level.
Once the Seasonal indices are calculated, one can deseasonalize data by dividing by the corresponding index.
Deseasonalized data = Raw data/Seasonal Index The long-term trend is estimated from the deseasonalized data. There are many ways to estimate the long-term trend, such as moving average, exponential smoothing, and linear regression. In simple moving average methods, the current value is calculated as the mean of its previous k values, whereas exponential smoothing assigns exponentially decreasing weights over time. When the time series x(t) begins at time t = 0, the simplest form of exponential smoothing is given by the formulae: where a is the smoothing factor and St is the output of the exponential smoothing algorithm 0vav1.
The linear regression method is another simple way to express the long term trend in which a common linear regression model is established between the incidence and time t.
ARIMA model. The ARIMA model originated from AR model, MA model, and the combination of AR and MA, the ARMA models [25]. AR models express the current value of the time series X(t) linearly in terms of its previous values (X(t21), X(t2 2)…) and the current residuals e(t), which can be expressed as: X (t)~w 1 X (t{1)zw 2 X (t{2)z:::zw p X (t{p)ze(t) ð1Þ MA models express the current value of the time series X(t) linearly in terms of its current and previous residual series e(t),e(t{1),:::. the model can be expressed as: ARMA models are a combination of AR and MA models, in which the current value of the time series is expressed linearly in terms of its previous values as well as current and previous residual series. It can be expressed as: The ARIMA model deals with non-stationary time series with differencing process based on the ARMA model. The differenced stationary time series can be modeled as ARMA model to yield ARIMA model.
The ARIMA model is usually termed as ARIMA (p, d, q)6(P, D, Q) S . In the expression, P is the seasonal order of autoregressive, p the non-seasonal order of autoregressive, Q the seasonal order moving average, q the non-seasonal order of moving average, d the order of regular differencing and D the order of seasonal differencing. The subscripted letter ''s'' indicates the length of seasonal period. For example, the incidence of infectious disease varies in the annual cycle, so s = 12 in the present study.
The ARIMA modeling procedure consists of three iterative steps: identification, estimation, and diagnostic checking. Prior to fitting the ARIMA model, an appropriate difference of the series is usually performed to make the series stationary. Identification is the process of determining seasonal and non-seasonal orders using the autocorrelation functions (ACF) and partial autocorrelation functions (PACF) of the transformed data [26]. The ACF is a  statistical tool that measures whether earlier values in the series have some relation to later values. PACF captures the amount of correlation between a variable and a lag of the said variable that is not explained by correlation at all low-order lags. Parameters in the ARIMA model(s) are estimated with the conditional least squares (CLS) method [27] after the identification step. Finally, the adequacy of the established model for the series is verified by employing white noise tests [28] to check whether the residuals are independent and normally distributed. It is possible that several ARIMA models may be identified, and the selection of an optimum model is necessary. Such selection of models is usually based on the Akaike Information Criterion (AIC) and Schwartz Bayesian Criterion (SBC) [29]. Support Vector Machine. SVMs estimate the regression using a set of linear functions that are defined in a high dimensional space. SVMs carry out the regression estimation by using Vapnik's e-insensitive loss function. SVMs use a risk function consisting of the empirical error and a regularization principle [30].
Assume that G~(xi,di) f g n i is a set of data points, where xi is the input sample, di is the desired value and n is the total number of data. The SVMs calculate the function using the following: where w(x) is the high dimensional feature space which is nonlinearly mapped from the input space Q. The coefficients v and b are calculated by minimizing In eq. (5), the first term C 1 n X n i~1 Le(di,yi) represents the empirical error risk, which is calculated by the e-insensitive loss function in eq. (6). The second term 1 2 v k k 2 is the regularization term. C is the regularized constant, which determines the trade-off  Table 3. Estimation of available ARIMA models for each disease. between the empirical risk and the regularization term. If the value of C is changed, the relative importance of the empirical risk and the regularization term will also be changed. Increasing the value of C will lead to the growth of the weight of the regularization. e is named as the tube size, which is equivalent to the approximation accuracy placed on the training data sample. Both C and e are user-prescribed parameters [31].
To estimate v and b, eq. (5) is transformed to the primal function given by eq. (7) by introducing the positive slack variableszi and z Ã i as follows: Subjected to di{vw(xi){biƒezzi, Finally, by introducing Lagrange multipliers and exploiting the optimality constraints, the decision function given by Eq. (4) has the following explicit form: In Eq. 5, ai and a Ã i are the so-called Lagrange multipliers. They satisfy the equalities ai : a Ã i~0 ,ai §0, and a Ã i §0 where i = 1,…,n, and are obtained by maximizing the dual function of eq.4 which has the following form: with the constraints 0ƒaiƒC, i~1,2,:::n  0ƒa Ã i ƒC, i~1,2,:::n K(xi,xj) is called the kernel function. The value of the kernel is equal to the inner product of two vectors Xi and Xj in the feature space w(xi) and w(xj), that is K(xi,xj)~w(xi) Ã w(xj). The elegance of using the kernel function is that one can deal with feature spaces of arbitrary dimensionality without having to compute the map w(x) explicitly. A. Typical examples of kernel function are the Gaussian kernel K(x,y)~exp({1=d 2 (x{y) 2 ) where d 2 is the bandwidth of the Gaussian kernel [32]. The kernel parameter should be carefully chosen as it implicitly defines the structure of the high dimensional feature space w(x) and thus controls the complexity of the final solution. From the implementation point of view, training SVMs is equivalent to solving a linearly constrained quadratic programming (QP) with the number of variables twice as that of the training data points. The sequential minima optimization algorithm propounded by Scholkopf and Smola [33,34] is reported to be very effective in training SVMs for solving regression problems.
Model selection criterion and evaluation indices. The contrasts between the observed value of the raw series and the predicted values obtained through the four methods were compared to determine the efficacy of the four forecasting methods used in the present study. The mean absolute error (MAE), mean absolute percentage error (MAPE), and the root mean square error (RMSE) were selected as the measures of evaluation because as empirical methods they are widely used in combining and selecting forecasts for measuring bias and accuracy of models [35].
These measures were calculated using Equations (10), (11), and (12). Pt is the predicted value at time t, Zt is the observed value at time t and T is the number of predictions. RMSE~ffi To take into account the variability of MAE, MAPE and RMSE, the block bootstrap technique [36] was adopted to calculate their standard errors. All of the incidence time series in the current research have a one-year period of seasonality (D = 1). Therefore, in our block bootstrap simulations, the block length was set to be 12 months so that the autocorrelation structure within seasonal blocks was reserved. We firstly simulated 10000 replications by block bootstrap sampling, and then calculated the MAE, MAPE and RMSE for each replication. At last, the standard errors could be obtained by the following formula: where n is number of replications (10000), index could be MAE, MAPE or RMSE. Take MAE as an example, here Index means MAE, index i represents the specific value of MAE in the i-th replication and index is the mean value of MAE for the whole replications. It is the same with MAPE and RMSE.

Decomposition Methods
Seasonal indices of different types of infectious diseases were extracted from the original time series, which are listed in Table 1 (Seasonal index of each type of infectious disease), Figure 1-2(Seasonal index of each type of infectious disease (1)). The seasonality of the  incidence behavior of each infectious disease can be seen according to the seasonal indices. All the infectious diseases selected show a seasonal trend as the occurrence of infectious disease can be more or less influenced by the temperature, rainfall and sunshine, etc. However, the extent of the seasonality is not quite similar among them. Figure 1 shows the five types of diseases whose seasonal index varies obviously through 12 months. Figure 2 shows the four types of disease whose seasonality indices do not vary seriously. Brucellosis, hemorrhagic fever, scarlet fever, schistosomiasis and typhoid fever show stronger seasonality than the others, as their variances of their seasonal indices are bigger than others. The incidence of brucellosis is higher in summer and lower in winter, with the crest in June. Hemorrhagic fever has the highest seasonal index in November and lowest in September. Scarlet fever has the higher seasonal index in May, June and December and lower index in August. The incidence of schistosomiasis is higher in summer and lower in winter, with the crest in July. The incidences of typhoid fever are higher in summer and lower in winter, with the crest in August. The other diseases, such as Hepatitis A, Hepatitis B, Gonorrhea and syphilis have relatively smooth seasonal index curves.
After the extraction of seasonal indices, linear regressions were modeled for the rest of the incidence time series. The form of the regression model is: Deseasonalized value at time t = Constant + Coefficient * t The parameters of the established models are listed in Table 2 (Regression results of each series removed seasonality). R 2 is the coefficient of determination. It ranged between 0 and 1, which is used  to describe how well a regression line fits a set of data. An R 2 near 1 indicates that a regression line fits the data well, while an R 2 closer to 0 indicates a regression line does not fit the data very well. It can be seen from Table 2 that the regression model on the seasonality-removed incidence data of brucellosis, gonorrhea, hepatitis A, Syphilis and typhoid fever generally fit well. The regression model on the seasonality removed incidence data of hepatitis B fit badly, and P value is over 0.05, however, the model is still used to forecast the incidence in the study as the model has good fitting and forecasting efficacy.
We also used exponential smoothing to extract the long term trend after the extraction of seasonal indices. Different smoothing factors were tested from 0.1 to 0.9 with 0.1 step. Smoothing factors were selected by the criterion of minimum MSE in the modeling process.

ARIMA model
ARIMA models were fitted to the nine types of infectious diseases from 2005 to 2011 and tested by predicting the incidence for the year 2012. Different ARIMA models were tested to determine the best fitting models. Table 3(Estimation of available ARIMA models for each disease) presents the results of the estimations using various ARIMA processes for the nine diseases incidence time series. The selections of the best models were performed according to the principle of AIC and SBC. The final selected ARIMA model was marked into yellow in Table 3. The parameter significance test and the white noise diagnostic check for residuals obtained by the selected model were made to ensure that the data was fully modeled.

Support Vector Machine
The training number of the SVM based time series model needed to be determined. In previous studies [13], the training number for the training of periodic series is usually the period of the series. In the present study, the period of the entire infectious disease incidence selected is twelve. Therefore, twelve was selected as the training number for SVM based models, in which the last 12 months of data were reserved as the input for forecasting the present data. Proper transition of the data series is always necessary to determine the input and the output data before the training process. Supposing that X t represents the value at time t, the input matrix and the corresponding output matrix of the training and validation sample used in our study are written as follows: The input matrix is sent into SVM for training, and its corresponding output matrix is its training goal. Once the  parameters are determined, they are used to forecast the incidence in 2012 iteratively.
Several parameters needed to be determined. They are C, e and the kernel parameter d 2 . The value of e is reportedly not sensitive to the accuracy of SVMs. In the present study, the value of e was prescribed as 0.01. Different C and d 2 were examined from 2 210 to 2 10 in 2 increments. There is no structural way to determine the optimal parameters of SVMs. In the present study, cross validation methods were applied to determine the proper SVMs. The training samples were randomly divided into k parts in the training process, each part was used for testing and the others used for training. The obtained MSE each test was recorded and the mean of the MSE acted as the selection criterion for the optimal parameters. MAPE is a relative index among the three evaluation indices. We used MAPE to evaluate the general performance for the models to forecast each disease. The MAPEs for each model obtained for each disease in both modeling process and predicating process are shown in Figure 3-6. It was shown that most of the MAPEs obtained by the decomposition (Regression) method in the modeling process are controlled within 30% except scarlet fever (42%). In the predication process, the MAPEs for all infectious disease are controlled within 30% except hemorrhagic fever (55%), and typhoid fever (51%). The decomposition (Regression) methods had bad performance in fitting scarlet fever incidence and predicating those of hemorrhagic fever and typhoid fever. All of the MAPEs obtained by decomposition (Exponential Smoothing) method in the modeling process were controlled within 15%. The method generally had a good fit in the modeling process. In the predication process, the MAPEs for all infectious diseases were controlled within 30% except scarlet fever (59%) and Hepatitis A(31%). The decomposition (Exponential Smoothing)  methods had bad performance in predicating scarlet fever incidence. The MAPEs obtained by ARIMA model in the modeling process method were controlled within 30%. In the predication process, the MAPEs for the 9 kinds of infectious diseases were controlled within 30% except scarlet fever (175%). The ARIMA model had good performance in the fitting process of all the infectious diseases selected. But it had bad performance in forecasting scarlet fever. The MAPEs obtained by SVM model in the modeling process are controlled within 15%. In the predication process, the MAPEs for the 9 kinds of infectious diseases were controlled within 20% except scarlet fever (33%) and Schistosomiasis (25%). The SVM based model had good performance in the fitting process and predicting process of all the infectious diseases selected.

Comparisons of the forecasting performance
To compare the performance the different models for different diseases, different evaluation indices were emphasized. MAPE is emphasized for lower level incidence disease (annual mean incidence ,0.1/100,000) such as Schistosomiasis (0.0245/ 100,000) and Hemorrhagic Fever (0.0814/100,000). RMSE is emphasized for higher level incidence disease (mean incidence . 1/100,000), such as Hepatitis B (7.9335/100,000) and syphilis (1.8461/100,000). MAE was emphasized for medium level incidence disease (0.1/100,000,mean incidence ,1/100,000) including Hepatitis A (0.3356/100,000), gonorrhea (0.8329/ 100,000), scarlet fever (0.2131/100,000), typhoid fever (0.1242/ 100,000) and brucellosis (0.1975/100,000). The performances of the three methods for gonorrhea, hepatitis B, Schistosomiasis and Syphilis ranked in descending order were: SVM, ARIMA, exponential smoothing and regression. The performances of the three methods for Hepatitis A ranked in descending order were: SVM, regression, exponential smoothing and ARIMA. The performances of the three methods for Brucellosis and Hemorrhagic fever ranked in descending order were: ARIMA, SVM, exponential smoothing and regression. The performances of the four models for Scarlet Fever ranked in descending order were: regression, SVM, exponential smoothing and ARIMA. The performances of the four models for typhoid fever ranked in descending order were: exponential smoothing, ARIMA, SVM and regression. SVMs performed best in forecasting gonorrhea, hepatitis A, hepatitis B, Schistosomiasis and Syphilis. ARIMA performed best in forecasting Brucellosis and Hemorrhagic Fever and performed the worst in forecasting Scarlet Fever. Exponential smoothing performed best in forecasting typhoid fever, but worst in hepatitis A. Regression method performed best in forecasting scarlet fever, however the worst in Brucellosis, Gonorrhea, Hemorrhagic Fever, Schistosomiasis, Syphilis and typhoid fever. The exponential smoothing method performs better than regres-

Discussion
The early recognition of epidemic behavior is significantly important for epidemic disease control and prevention. The effectiveness of statistical models in forecasting future epidemic disease incidence has been proved useful [37]. The surveillance system is a good way to collect and analyze infectious disease data. With high quality surveillance data, the epidemic behavior may be accurately detected and forecasted. Discussion of the forecasting techniques is very important. In the present study, we conducted a comparative study of four typical time series investigations in the forecasting of the epidemic pattern of nine types of infectious diseases, namely two decomposition methods (regression and exponential smoothing), ARIMA model, and SVMs based model. We have also compared the differences among these methods in both principle and practical aspects.
In principle, the decomposition method can break down the original into different parts. The seasonal factor can be expressed in the form of seasonal indices. The series after seasonal pattern removal can be modeled with regression methods or exponential smoothing, etc. Time series decomposition models do not involve a lot of mathematics or statistics; they are relatively easy to explain to the end user. The ARIMA model can grasp the historical information by (1) AR to consider the past values, and (2) MA to consider the current and previous residual series. The ARIMA model is popular because of its known statistical properties and the well-known Box-Jenkins methodology in the modeling process. It is one of the most effective linear models for seasonal time series forecasting. In contrast, the SVMs time series models capture the historical information by nonlinear functions. With flexible nonlinear function mapping capability, support vector machine can approximate any continuous measurable function with arbitrarily desired accuracy.
In practical matters, the building of the decomposition methods generally involves two parts: (1) extraction of the seasonal indices to express the seasonal pattern hidden in the infectious disease time series, and (2) regression methods to model the long trend pattern. The building of the ARIMA model requires the determination of differencing orders (d, D), and operators (p, q, P, Q), as well as the estimation of model parameters in the autoregressive and moving average polynomials. The construction of SVMs requires the determination of three parameters, namely, d 2 , C, e. The time series data should be transformed into the input matrix and the output matrix, and then be put into the support  Based on the three forecasting measured errors (MAE, MAPE, MSE), and the visualization of the forecasted values, the empirical evidence is that no one method completely dominated the others. However, the present study shows that support vector machine generally outperforms the conventional ARIMA model and decomposition methods. The ARIMA model has been proved an effective linear model to effectively capture a linear trend of the infectious disease series. The decomposition methods generally perform better when the series conform to the decomposition hypothesis. The linear regression hypothesis seems to be more rigid on the season moved series than exponential smoothing.
The advantage of decomposition is that decomposition models do not involve a lot of mathematics or statistics; they are relatively easy to explain to the end user. This is a major advantage because if the end user has an appreciation of how the forecast was developed, he or she may have more confidence in its use for  decision making. The disadvantage of decomposition methods is that the hypothesis may be too strong for the epidemic behavior, so that the model may not perform well sometimes. The ARIMA model has advantages in its well-known statistical properties and effective modeling process. It can be easily realized through mainstream statistical software. The model can be used when the seasonal time series are stationary and have no missing data. The disadvantage of the ARIMA model is that it can only extract linear relationships within the time series data. it may not work well for the occurrence of an infectious disease which can be affected by various factors, including many meteorological and various social factors, namely, the occurrence of the disease does not necessarily associate with the historical data in linear relationship. Our study suggested that nonlinear relationships may exist among the monthly incidences of many diseases such as scarlet fever, so that the ARIMA model did not efficiently extract the full relationship hidden in the historical data. Support vector machines are potentially useful endemic time series forecasting methods because of their strong nonlinear mapping ability and tolerance to complexity in forecasting data. SVMs have very good learning ability in time series modeling. SVMs have unique advantages compared with other machine learning methods, such as neural networks. For example, the SVMs implement the structural risk minimization principle, which leads to better generalization than neural networks that implement the empirical risk minimization principle. SVMs also have fewer free parameters than neural networks [38]. What is more, the scarlet fever incidence shown in figure 17 (Scarlet fever incidence and fitting values predicted by the four methods) indicated that the average incidence from 2011 to 2012 was higher than that in the previous six years (2005)(2006)(2007)(2008)(2009)(2010). The phenomenon that the incidence level changed greatly through time was called level shift by Tsay, R. S. in 1988. [39] Since the ARIMA model is in fact a regression of the present incidence value on the past values and residuals, it is of high risk that level shift would likely affect the forecasting performance of ARIMA model. Therefore, statisticians and time series analysts have tried to overcome the effect of level shift for many years. In our paper, it is interesting that, as presented in Table 4, the MAE, MAPE, RMSE and their standard errors of ARIMA model are larger than those of decomposition model, SVM and exponential smoothing method. This result in our paper suggests that the other three methods may serve as a better way than SARIMA model in analyzing time series in the presence of level shift.
The limitations of the study should also be acknowledged. First, only eight-years of incidence data were obtained because the Chinese National Surveillance System for Infectious Disease was established only in 2004. The relatively short length of the series may influence the forecasting efficacy of the different methods. Second, we only predicted the infectious disease incidence with the four typical forecasting methods. The findings based on a specific disease may not be repeatable when used on other cases. What is more, there are some other hypotheses on the long term trend in decomposition methods, such as generalized models which assume a nonlinear function among the time series. Many other models were developed to make up deficiencies of ARIMA, such as GARCH, etc. SVM is only one of the typical machine learning techniques. In this paper, we only choose four very typically used time series methods to make a comparison.
Infectious diseases pose a significant threat to human health. The establishment of epidemiological surveillance system greatly facilitates the implement of strategic health planning, such as vaccination costs and stocks. More research on the accurate prediction of the epidemiological events based on surveillance data should be conducted, and more sophisticated forecasting techniques should be applied and compared in practice.