Using a Negative Binomial Regression Model for Early Warning at the Start of a Hand Foot Mouth Disease Epidemic in Dalian, Liaoning Province, China

Background Hand, foot and mouth disease (HFMD) is a human syndrome caused by intestinal viruses such as coxsackievirus A16 and enterovirus 71, and it readily develops into outbreaks in kindergartens and schools. Accurate early detection of the start of an HFMD epidemic is key to planning control measures and minimizing the impact of HFMD. The objective of this study was to establish a reliable early detection model for the start of the HFMD epidemic in Dalian and to evaluate the model's performance by analyzing its detection sensitivity. Methods A negative binomial regression model was used to estimate the weekly baseline number of HFMD cases, and the optimal alerting threshold was identified among different candidate threshold values tested separately for epidemic and non-epidemic years. The circular distribution method was used to calculate the gold standard for the start of the HFMD epidemic. Results From 2009 to 2014, a total of 62,022 HFMD cases were reported (36,879 males and 25,143 females) in Dalian, Liaoning Province, China, including 15 fatal cases. The median age of the patients was 3 years. The incidence rate in epidemic years ranged from 137.54 to 231.44 per 100,000 population; the incidence rate in non-epidemic years was lower than 112 per 100,000 population. The negative binomial regression model with an AIC value of 147.28 was selected to construct the baseline level. A threshold value of 100 for epidemic years and 50 for non-epidemic years had the highest sensitivity (100%) in both retrospective and prospective early warning, and alerts were generated 2 weeks before the actual start of the HFMD epidemic. Conclusions The negative binomial regression model could provide early warning of the start of an HFMD epidemic with good sensitivity and appropriate detection time in Dalian.



Introduction
Hand, foot and mouth disease (HFMD) is a human syndrome caused by intestinal viruses such as coxsackievirus A16 and enterovirus 71 in infants and children [1][2]; it is highly infectious and readily develops into outbreaks in kindergartens and schools. In China, incidence has risen sharply since the Chinese Ministry of Health (MOH) listed HFMD as a notifiable Class C communicable disease on May 2, 2008 [3], and HFMD has become one of the major infectious diseases affecting children's health. According to reference [3], 7,200,092 probable HFMD cases were reported to the China CDC surveillance system during 2008-2012, of which 3.7% were laboratory-confirmed and 0.03% died. Accurate early detection of the start of an HFMD epidemic, based on an estimate of the baseline number of HFMD cases, is key to planning control measures and minimizing the impact of HFMD.
Serfling regression has been extensively used to estimate a baseline describing the expected pattern of a disease in non-epidemic periods, for example for tuberculosis and influenza [4][5]. The model characterizes the historical disease time series by combining a linear term with a trigonometric function describing the seasonal trend. Reference [4] presented a Serfling regression model in which the annual number of deaths attributable to influenza was calculated by summing the weekly excess over a period that only included the excluded weeks. Reference [5] developed two hidden Markov models and selected the best model using the Serfling model results as the reference. In this study, we did not use Serfling regression to estimate the baseline number of HFMD cases because it would produce too many negative values. Because HFMD case numbers are count data, we considered Poisson regression and negative binomial regression to estimate the baseline. Poisson regression is a recommended approach for analyzing count data, and negative binomial regression is an alternative when the count data are over-dispersed.
In this study, we attempted to establish a reliable early detection model for the start of the HFMD epidemic in Dalian using the Poisson or negative binomial regression model, and to evaluate the model's performance by analyzing its detection sensitivity.

Study area
Dalian is the main coastal city of Liaoning Province, China, and a major tourist city located at 38°43′-40°10′N latitude and 120°58′-123°31′E longitude. It had a population of 6.69 million in May 2011. Dalian has a warm continental monsoon climate and is in a marine temperate zone. The average annual temperature is 10.5°C, with a maximum of 37.8°C and a minimum of -19.1°C. The average rainfall is 550-950 mm and the total annual sunshine is 2500-2800 hours [6].

Data collection
HFMD has been a notifiable communicable disease in China since May 2008. Clinicians are required to report HFMD cases through the China Information System for Disease Control and Prevention within 24 hours. Weekly and monthly HFMD case counts in Dalian from 2009 to 2014 were obtained from this information system. The HFMD cases included clinically diagnosed and laboratory-diagnosed cases. A clinically diagnosed HFMD case was defined as a patient with papular or vesicular rash on the hands, feet, mouth or buttocks, with or without fever. A laboratory-diagnosed case was defined as a clinically diagnosed case with laboratory evidence of enterovirus infection detected by reverse-transcriptase polymerase chain reaction (RT-PCR), real-time RT-PCR, or virus isolation [7].

Data analysis
Depending on whether the data fit the basic Poisson assumption, we used either the Poisson or the negative binomial regression model to estimate the baseline number of HFMD cases. The Poisson model assumes that the variance equals the mean, but because of unobserved heterogeneity and clustering, HFMD data often display over-dispersion. In Dalian from 2009 to 2014, all the variances of weekly HFMD cases exceeded the corresponding means (Table 1). We therefore chose the negative binomial regression model to estimate the baseline number of HFMD cases.
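The variance-versus-mean comparison described above can be illustrated with a minimal sketch. The function name `dispersion_ratio` and the weekly counts are hypothetical and are not taken from Table 1; the sketch only shows the kind of check that motivates moving from a Poisson to a negative binomial model.

```python
from statistics import mean, variance


def dispersion_ratio(weekly_counts):
    """Sample variance divided by sample mean.

    A ratio near 1 is compatible with the Poisson assumption;
    a ratio well above 1 suggests over-dispersion.
    """
    return variance(weekly_counts) / mean(weekly_counts)


# Hypothetical weekly counts for illustration only (not study data).
example_counts = [3, 5, 8, 20, 65, 140, 210, 180, 90, 40, 12, 6]

ratio = dispersion_ratio(example_counts)
print(f"variance/mean = {ratio:.1f}")
```

A ratio far above 1, as in this made-up series, is the situation in which the negative binomial model is usually preferred over the Poisson model.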
1. Establishing a negative binomial regression model to estimate the baseline: To estimate the weekly baseline number of HFMD cases, we used the negative binomial regression model and fitted it iteratively. The model was as follows:
$$Y_t = \beta_0 + \beta_1 t + \beta_2 t^2 + \beta_3 \sin\left(\frac{2\pi t}{52}\right) + \beta_4 \cos\left(\frac{2\pi t}{52}\right) + \varepsilon_t$$
where $Y_t$ is the number of HFMD cases reported in week $t$; $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$ and $\beta_4$ are the regression coefficients to be estimated; and $\varepsilon_t$ is a normally distributed error term. In this model, $\beta_0$, $\beta_1 t$ and $\beta_2 t^2$ represent the secular trend, and $\beta_3 \sin(2\pi t/52)$ and $\beta_4 \cos(2\pi t/52)$ represent the seasonal trend. The Akaike Information Criterion (AIC) was used as the measure of goodness of fit of the model.
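As a rough illustration of the secular and seasonal terms described above, the sketch below builds the regressors for a given week and combines them with a set of coefficients. It assumes the seasonal terms are a sine and a cosine of 2πt/52, as in the fitted equation reported later in the paper; the function names and the coefficients in the usage line are hypothetical. The text does not state the link function, so the sketch only computes the right-hand side of the model and does not fit anything.

```python
import math

WEEKS_PER_YEAR = 52


def model_terms(t):
    """Regressors for week index t: 1, t, t^2, sin(2*pi*t/52), cos(2*pi*t/52)."""
    phase = 2 * math.pi * t / WEEKS_PER_YEAR
    return [1.0, float(t), float(t) ** 2, math.sin(phase), math.cos(phase)]


def linear_predictor(t, betas):
    """Sum of beta_k * term_k for week t (betas ordered beta_0 .. beta_4)."""
    return sum(b * x for b, x in zip(betas, model_terms(t)))


# Hypothetical coefficients, for illustration only.
print(linear_predictor(0, [1.0, 2.0, 3.0, 4.0, 5.0]))
```

Because the seasonal terms have a period of 52 weeks, weeks t and t + 52 share the same sine and cosine values; only the trend terms differ between years.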
In this study, we followed the calculation steps of an adjusted Serfling regression model [8]. First, we fitted a negative binomial regression model to the actual weekly observed HFMD counts. Next, we excluded the observations that exceeded the fitted values from the first round of regression. We then refitted the negative binomial regression model to the cleaned data. We repeated this process until the AIC value no longer improved. SPSS version 11.5 (SPSS, Chicago, IL) was used for data analysis. A p-value < 0.05 was considered statistically significant.
2. Determining the optimal threshold values: In this study, we tested different threshold values and chose the optimal alerting threshold among them separately for epidemic and non-epidemic years. Epidemic years were defined on the basis of consultation with local epidemiologists and consideration of the epidemiological characteristics of HFMD, namely that HFMD epidemics have been shown to occur in 2- to 3-year cycles [9], together with the criterion that all weekly case numbers exceeded 200 during the epidemic peak zone. The gold standard for the start of the HFMD epidemic was calculated using the circular distribution method. Specifically, we used the circular distribution method to calculate the central tendency of the peak time points ($r$) and the peak time zone ($\bar{\alpha} \pm s$) of HFMD in Dalian from 2009 to 2014 using the following formula, and defined the start of the peak time zone as the start of the HFMD epidemic.
$$\sin\bar{\alpha} = Y/r \qquad (6)$$
where $\alpha_i$ ($i = 1, 2, \ldots, 12$) represents the angle of each month and $\bar{\alpha}$ represents the average angle. Taking a year as 365 days and 360°, one day equals 0.9863°. For example, January has 31 days, so its class mid-value is 15.5 days and its angle is 0.9863 × 15.5 = 15.28765°. By analogy, February has 28 days, so the cumulative class mid-value of February is 31 + 14 = 45 days and its angle is 0.9863 × 45 = 44.3835° [10]. $f_i$ is the number of HFMD cases in month $i$. DPS (Data Processing System) was used for data analysis. A p-value < 0.05 was considered statistically significant.
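The month-angle arithmetic above can be sketched in a few lines. Only one of the circular distribution formulas survives in the text, so the sketch assumes the standard textbook definitions: $X = \sum f_i \cos\alpha_i / n$, $Y = \sum f_i \sin\alpha_i / n$, $r = \sqrt{X^2 + Y^2}$ and $s = \sqrt{-2\ln r}$ (in radians). These may differ in detail from what the authors computed in DPS. The function names and the monthly counts in the usage lines are hypothetical.

```python
import math

DEG_PER_DAY = 0.9863  # 360 degrees / 365 days, rounded as in the text
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def month_mid_angles():
    """Angle in degrees of each month's class mid-value."""
    angles, elapsed = [], 0
    for days in DAYS_IN_MONTH:
        angles.append(DEG_PER_DAY * (elapsed + days / 2))
        elapsed += days
    return angles


def circular_summary(monthly_cases):
    """Return (r, mean angle in degrees, angular SD s in degrees)."""
    n = sum(monthly_cases)
    rads = [math.radians(a) for a in month_mid_angles()]
    x = sum(f * math.cos(a) for f, a in zip(monthly_cases, rads)) / n
    y = sum(f * math.sin(a) for f, a in zip(monthly_cases, rads)) / n
    r = min(math.hypot(x, y), 1.0)  # guard against rounding slightly above 1
    mean_angle = math.degrees(math.atan2(y, x)) % 360
    s = math.degrees(math.sqrt(-2 * math.log(r)))
    return r, mean_angle, s


def angle_to_day(angle_deg):
    """Convert an angle in degrees back to a day of the year."""
    return angle_deg / DEG_PER_DAY


# Hypothetical monthly case counts, January to December (not study data).
r, mean_angle, s = circular_summary([1, 1, 2, 5, 20, 60, 90, 70, 30, 10, 4, 2])
print(f"r = {r:.3f}, peak day = {angle_to_day(mean_angle):.0f}")
print(f"peak zone = day {angle_to_day(mean_angle - s):.0f} "
      f"to day {angle_to_day(mean_angle + s):.0f}")
```

Under these assumptions, the mean angle converted to a day gives the peak time point, and the mean angle minus and plus $s$ gives the two ends of the peak time zone; the lower end corresponds to what the text defines as the start of the epidemic.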

Circular statistical analysis
The monthly cases of HFMD reported in Dalian during 2009-2014 are shown in Fig 1. Although HFMD cases occurred throughout the year, June, July and August were the months with the highest incidence each year. We used the circular distribution test to calculate the peak day and the peak period of HFMD for each year. The occurrence of HFMD in Dalian from 2009 to 2014 was concentrated, that is, there was a central tendency (overall r = 0.76, P < 0.01); the concentration vector r ranged from 0.692 to 0.842. The peak day fluctuated between July 14th and 29th, and the peak period of incidence ranged from May 31st to September 8th during 2009-2014 (Table 3). Taking into consideration three aspects (the consultation with local epidemiologists, the epidemiological characteristics of HFMD, and the weekly case number criterion during the epidemic peak zone), the epidemic years were defined as 2009 to 2010 and 2012 to 2013, during which the incidence rate ranged from 137.54 to 231.44 per 100,000 population; the other years were defined as non-epidemic years, with incidence rates in 2011 and 2014 of 86.92 and 111.29 per 100,000 population, respectively (Table 2).

The negative binomial regression model fitting
Since the negative binomial regression model usually requires historical data covering several years, we established the model using the weekly incidence of HFMD in Dalian from 2009 to 2012 to estimate the baseline. After the baseline was constructed, a simulation was conducted for 2013 and 2014.
In the first run, we fitted a negative binomial regression model to the actual weekly observed HFMD counts. The AIC value was 2257.51. After we excluded the observations that exceeded the fitted values from the first round of regression, the AIC value decreased to 1355.53. We continued with multiple rounds of iterative regression; the AIC value decreased to 147.28 and then to 52.45, but the parameters were then no longer statistically significant (Table 4). We therefore selected the model with the AIC value of 147.28 to construct the baseline level (S1 Database), and the equation for the model was:
$$Y_t = -13.12 - 1.57\sin\left(\frac{2\pi t}{52}\right) - 2.16\cos\left(\frac{2\pi t}{52}\right)$$

Determining the optimal threshold values
When the number of observed cases from 2009 to 2012 exceeded the baseline by a certain value (the threshold value) for two successive weeks, an alert was generated, indicating the start of the HFMD epidemic. We first calculated the difference between the observed value and the baseline. We then tested 9 candidate threshold values from 100 to 500 in steps of 50 for epidemic years, and half of each value for non-epidemic years, and calculated their sensitivity to determine the optimal threshold. We defined early warning as successful when the detection time (the median number of weeks from the start of the peak time zone to the first alert) was larger than one week. If the first alert was generated in the same week as the start of the peak time zone, early warning was considered to have failed. Fig 2 and Table 5 show the different threshold values and their sensitivity separately for epidemic and non-epidemic years. The early warning value in Fig 2 is the sum of the baseline value and the threshold value. A threshold value of 100 for epidemic years and 50 for non-epidemic years had the highest sensitivity (100%), and alerts were generated 2 weeks before the actual start of the HFMD epidemic. According to these criteria, for 2013, an epidemic year with a threshold value of 100, the first alert was generated in the 25th week. In the same way, 2014 was a non-epidemic year, so the threshold value was 50, and the first alert was generated in the 21st week. The start of the peak time zone was subsequently observed in the 26th week of 2013 and the 23rd week of 2014. This showed that the negative binomial regression model could give early warning of the start of the HFMD epidemic 1.5 weeks before the actual start of the peak time zone.
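The alerting rule described above can be sketched as follows. The function names and the weekly series in the usage lines are hypothetical, and the sketch assumes that an alert is dated to the second of the two successive weeks in which the observed count exceeds the baseline by more than the threshold; the text does not say which of the two weeks is recorded. The lead times at the end use the alert weeks and peak-zone start weeks reported for 2013 and 2014.

```python
def first_alert_week(observed, baseline, threshold, run_length=2):
    """Return the 1-based week of the first alert, or None if none fires.

    An alert fires in the week that completes `run_length` successive
    weeks in which observed - baseline exceeds `threshold`.
    """
    streak = 0
    for week, (obs, base) in enumerate(zip(observed, baseline), start=1):
        streak = streak + 1 if obs - base > threshold else 0
        if streak >= run_length:
            return week
    return None


def lead_time(alert_week, peak_start_week):
    """Weeks between the first alert and the start of the peak time zone."""
    return peak_start_week - alert_week


# Hypothetical weekly series, for illustration only (not study data).
observed = [10, 20, 160, 40, 170, 180, 300]
baseline = [10] * 7
print(first_alert_week(observed, baseline, threshold=100))

# Alert and peak-zone start weeks as reported in the text for 2013 and 2014.
leads = [lead_time(25, 26), lead_time(21, 23)]
print(sum(leads) / len(leads))
```

In the made-up series, week 3 exceeds the threshold but week 4 does not, so the run restarts and the alert is only raised once weeks 5 and 6 both exceed it. Requiring two successive weeks is what keeps a single noisy week from triggering an alert.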

Discussion
HFMD epidemics have been common in East and Southeast Asia in recent years, and the disease was first reported in 1957 [11]. From January 2009 to December 2014, 62,022 HFMD cases were reported in Dalian, and the average incidence rate was 159.50 per 100,000 population. This average incidence rate was higher than the reported average incidence of other cities in Liaoning Province and lower than that of some southern provinces in China [12][13][14][15][16]. The peak time zone was from May 31st to September 8th, with the peak time point fluctuating between July 14th and 29th. This result is consistent with a previous study in China, which concluded that the incidence peak of HFMD in northern China occurs in summer [17]. This is likely associated with the easy propagation of the virus in seasons of high temperature and high humidity [18]. In this study, the circular distribution test was used to analyze the seasonality of HFMD. Compared with the constituent ratio or relative ratio, the circular distribution method can provide an accurate peak time and peak phase for a seasonal disease [19], so the gold standard for the start of the HFMD epidemic can be defined accurately and objectively. This was a strength of our study.
Another strength of our study was the use of different optimal threshold values for different years. The number of weekly HFMD cases differed greatly between epidemic and non-epidemic years; in particular, during 2010 and 2012 to 2013 the mean weekly number of cases was over 200. If we had used the same threshold value for epidemic and non-epidemic years, either the false alarm rate would have increased or the lead time would have been too long for rational allocation of public health resources.
In this study, we aimed to establish a new, reliable early detection model for the start of the HFMD epidemic in Dalian and to evaluate its performance. Our results indicated that the negative binomial regression model had good sensitivity (100%) in detecting the start of the HFMD epidemic, with alerts generated 2.5 weeks before the actual start of the epidemic in epidemic years (2009-2010, 2012-2013) and 2 weeks before in non-epidemic years (2011, 2014). After the initial infection, the host remains in a latent stage for a period of time before becoming infectious [20]. The incubation period of HFMD is 3 to 10 days [21], so a lead time of 2 to 2.5 weeks is enough to implement prevention and control measures, such as health education, case isolation and disinfection of affected settings [1], which can decrease morbidity during the upcoming epidemic season. Once the epidemic has started, the number of HFMD cases increases sharply and soon reaches its peak.
This study had several limitations. First, we studied only one disease with a high incidence rate in Dalian, so the above results may not generalize to diseases with relatively low incidence rates, or the threshold values may need to be readjusted. Second, the definition of epidemic and non-epidemic years was empirical; in particular, the incidence rates of 2009 and 2014 were not statistically different, yet 2009 was defined as an epidemic year and 2014 as a non-epidemic year.