Comparing Observed with Predicted Weekly Influenza-Like Illness Rates during the Winter Holiday Break, United States, 2004-2013

In the United States, influenza season typically begins in October or November, peaks in February, and tapers off in April. During the winter holiday break, from the end of December to the beginning of January, changes in social mixing patterns, healthcare-seeking behaviors, and surveillance reporting could affect influenza-like illness (ILI) rates. We compared predicted with observed weekly ILI to examine trends around the winter break period. We examined weekly rates of ILI by region in the United States from influenza season 2003–2004 to 2012–2013. We compared observed and predicted ILI rates from week 44 to week 8 of each influenza season using the auto-regressive integrated moving average (ARIMA) method. Of 1,530 region, week, and year combinations, 64 observed ILI rates were significantly higher than predicted by the model. Of these, 21 occurred during the typical winter holiday break period (weeks 51–52); 12 occurred during influenza season 2012–2013. There were 46 observed ILI rates that were significantly lower than predicted. Of these, 16 occurred after the typical holiday break during week 1, eight of which occurred during season 2012–2013. Of 90 (10 HHS regions x 9 seasons) predictions during the peak week, 78 predicted ILI rates were lower than observed. Out of 73 predictions for the post-peak week, 62 ILI rates were higher than observed. There were 53 out of 73 models that had lower peak and higher post-peak predicted ILI rates than were actually observed. While most regions had ILI rates higher than predicted during winter holiday break and lower than predicted after the break during the 2012–2013 season, overall there was not a consistent relationship between observed and predicted ILI around the winter holiday break during the other influenza seasons.


Introduction
In the United States, influenza season typically begins in October or November, peaks in February, and tapers off in April, although the timing and duration vary from year to year [1]. The U.S. Centers for Disease Control and Prevention (CDC) assesses influenza activity using the National Influenza Sentinel Surveillance System for influenza-like illness (ILINet) [2]. More than 2,900 ILINet sentinel providers in all 50 states, Puerto Rico, the District of Columbia, and the U.S. Virgin Islands report weekly visits for influenza-like illness (ILI), defined as fever (temperature of 100°F [37.8°C] or greater) plus cough and/or sore throat, in the absence of another known cause of illness.
There are a number of studies [3][4][5] suggesting that school closures, which temporarily change the social contact patterns of children, can reduce total illnesses and peak incidence of pandemic and seasonal influenza among school children. Among the general population, commonly observed holidays, such as the winter holiday break in late December to early January, also disrupt normal social mixing patterns. These periods provide unique opportunities to explore the relationship between ILI and temporary, atypical social patterns.
Assessing the impact of winter break on ILI is challenging. It is difficult to identify appropriate control series, and populations may not be similarly susceptible to influenza across regions. However, models that can be trained on longitudinal data to predict weekly ILI rates may be able to identify deviations from expected ILI patterns around a time period of interest.
The auto-regressive integrated moving average (ARIMA) method was first popularized by Box and Jenkins [6] for analyzing time-series data. Since then, ARIMA models have been widely applied in fields such as engineering, economics, agriculture, meteorology, and infectious diseases, including influenza [7][8][9]. Unlike most generalized linear regression models, in which predictions are restricted to the range of the predictors, ARIMA models forecast beyond the scope of the model predictors using the recursive relationship between observations and error terms.
We apply the ARIMA method to compare observed and predicted weekly ILI around the winter holiday break period to explore patterns that may be associated with the holiday period.

Data Source
We obtained ILINet data for the 2003-2004 through 2012-2013 influenza seasons for each of 10 U.S. Health and Human Services (HHS) regions listed on the CDC website [2]. A U.S. HHS region is the geographic aggregation of 4-10 adjacent states or island areas. Each week, sentinel clinic providers in these regions report the number of clinic visits due to ILI, as well as the overall number of clinic visits [10]. The ILI rate (number of clinic visits due to ILI/total number of visits across the state's sentinel clinics) is weighted according to the state population and aggregated to the regional level.
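The population weighting described above can be sketched as follows. This is a minimal illustration, assuming each state's ILI proportion is weighted by that state's share of the regional population; the function name, dictionary keys, and numbers are hypothetical, and the exact ILINet weighting procedure may differ in detail.

```python
def regional_weighted_ili(states):
    """Population-weighted ILI percentage for one region.

    `states` is a list of dicts with hypothetical keys:
    ili_visits, total_visits, population.
    """
    total_pop = sum(s["population"] for s in states)
    weighted = 0.0
    for s in states:
        # State-level ILI proportion: ILI visits / all visits at sentinel clinics.
        state_rate = s["ili_visits"] / s["total_visits"]
        # Weight by the state's share of the regional population.
        weighted += state_rate * s["population"] / total_pop
    return 100.0 * weighted


# Hypothetical two-state region (numbers are invented for illustration).
example_region = [
    {"ili_visits": 30, "total_visits": 1000, "population": 6_000_000},
    {"ili_visits": 10, "total_visits": 1000, "population": 2_000_000},
]
region_rate = regional_weighted_ili(example_region)
```

In this invented example the larger state dominates the regional value, so the weighted rate lies closer to its 3% than to the smaller state's 1%.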

Data Analysis
Winter Holiday Break. Because the beginning and ending dates of the winter holiday break may vary by geography and year, we approximated these dates using website announcements made by 4,297 public schools in Tennessee and North Carolina during the 2012-2013 influenza season (data originally collected for a different project). We found that most school districts started the winter holiday break during the week of Christmas, or on the Thursday or Friday of the previous week, and ended the break on the first Monday after the New Year. Therefore, we approximated the disruption to normal social mixing patterns and healthcare-seeking behavior during the holiday break as the 2-week period that covered New Year's Day.
Defining ILINet Surveillance Weeks. The ILI weeks in this research were defined in the same way as they are for the CDC's Morbidity and Mortality Weekly Report (MMWR) [11]. The first day of any week is Sunday. Week 1 is the first week of the year having at least four days in the calendar year. Using this definition, years 2003 and 2008 consisted of 53 weeks in our analysis.
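The week-numbering rule above can be expressed in a few lines. The sketch below is illustrative only and uses hypothetical function names; it relies on the observation that a Sunday-to-Saturday week has at least four days in the new calendar year exactly when it contains January 4.

```python
from datetime import date, timedelta


def mmwr_week1_start(year):
    """Sunday that begins week 1 of the given surveillance year."""
    jan4 = date(year, 1, 4)
    # date.weekday() gives Monday=0 ... Sunday=6; convert to days since Sunday.
    days_since_sunday = (jan4.weekday() + 1) % 7
    return jan4 - timedelta(days=days_since_sunday)


def mmwr_week(d):
    """Return (surveillance_year, week_number) for a calendar date."""
    year = d.year + 1
    # Step back until week 1 of `year` starts on or before the date.
    while mmwr_week1_start(year) > d:
        year -= 1
    return year, (d - mmwr_week1_start(year)).days // 7 + 1


def weeks_in_year(year):
    """Number of surveillance weeks (52 or 53) in the given year."""
    span = mmwr_week1_start(year + 1) - mmwr_week1_start(year)
    return span.days // 7
```

Under this rule, `weeks_in_year(2003)` and `weeks_in_year(2008)` both give 53, consistent with the text.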
In addition to the last week and first week of the year, which encompass New Year's Day, we compared observed and predicted ILI rates for week 44 through week 8 of each influenza season from 2004-2005 to 2012-2013 to determine whether similar interruptions of the ILI trend could be observed in other weeks. Our analysis therefore included a total of 1,530 unique predictions (17 weeks x 9 seasons x 10 regions). Because years 2003 and 2008 had 53 weeks, the starting week for our data analysis in those years was week 45 rather than week 44. In addition, each 17-week period was aligned with the 17-week periods from all other seasons.
Auto-Regressive Integrated Moving Average Method. To forecast ILI rates for each week-year-region combination, we used the ARIMA (p, d, q) method, in which p represents the number of autoregressive terms (i.e., the number of previous observations on which the current observation linearly depends), d is the order of differencing, and q is the number of lagged forecast errors in the prediction equation (i.e., the number of preceding estimation errors that are taken into account when estimating the next time-series value). For example, to forecast the ILI rate for week 52 in 2004, we used all the data points in 2003, plus weeks 1-51 in 2004, as a training set to determine the proper parameters in the ARIMA model using three steps: model identification, model diagnosis, and forecasting [12]. In the identification step, we selected parameters p, d, and q using the Bayesian information criterion (BIC), which assigns a penalty according to the number of parameters in the model. In the model-diagnosis step, we used the Ljung-Box statistical test and a quantile-quantile plot to check the time-series assumptions. We also applied the test outlined by Osborn et al. [13] to check whether seasonality terms should be included in the ARIMA model. Finally, in the forecasting step, we applied the model identified in the above steps to predict the weekly ILI rate and computed 95% prediction intervals by bootstrapping the model error 5,000 times.
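The forecasting step can be illustrated with a deliberately simplified sketch. It is not the procedure used in this analysis: it assumes a first-order autoregressive model (equivalent to ARIMA(1,0,0)) fitted by least squares in place of a BIC-selected ARIMA model, and it resamples in-sample residuals without accounting for parameter uncertainty. The function names and the weekly values are hypothetical.

```python
import random


def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1] + e[t]."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    residuals = [b - (c + phi * a) for a, b in zip(x, y)]
    return c, phi, residuals


def bootstrap_interval(series, n_boot=5000, level=0.95, seed=0):
    """One-step-ahead forecast with a residual-bootstrap prediction interval."""
    c, phi, residuals = fit_ar1(series)
    point = c + phi * series[-1]
    rng = random.Random(seed)
    # Add a randomly resampled residual to the point forecast n_boot times.
    sims = sorted(point + rng.choice(residuals) for _ in range(n_boot))
    lower = sims[round((1 - level) / 2 * n_boot)]
    upper = sims[round((1 + level) / 2 * n_boot) - 1]
    return point, lower, upper


# Hypothetical weekly ILI percentages used as a training set.
weekly_ili = [1.0, 1.1, 1.3, 1.2, 1.5, 1.8, 1.7, 2.1, 2.4, 2.2,
              2.6, 3.0, 2.9, 3.3, 3.1, 3.6, 3.4, 3.9, 4.1, 3.8]
point, lower, upper = bootstrap_interval(weekly_ili)
```

An observed rate for the following week would then be compared against `lower` and `upper` to judge whether it departs significantly from the prediction.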
We repeated these steps, as described above, for each combination of the 17 weeks (weeks 44 to week 8), nine influenza seasons, and 10 HHS regions. To generate the forecasting series for a specific week in one influenza season, we replaced the observed ILI rate for this week in all previous years with the predicted ILI rate to avoid the potential carryover effect by the actual ILI data in the prediction. For example, to forecast the ILI rate for week Y in influenza season 2005, we replaced the observed week Y ILI rates in 2003 and 2004 with ARIMA model-predicted values. Because there were not enough data points to have reliable predictions for the last week of 2003 and first week of 2004, these weeks are excluded from the results presented.
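The substitution step described above can be sketched as follows. This is one possible reading of the procedure, not a reproduction of the analysis code; the function name, the dictionary layout keyed by (year, week), and the example values are all hypothetical.

```python
def build_training_series(observed, fitted, target_week, target_year):
    """Assemble the training values that precede (target_year, target_week).

    observed: dict mapping (year, week) -> observed ILI rate
    fitted:   dict mapping (year, week) -> earlier model prediction
    For previous years, the observed value of `target_week` is replaced by
    the earlier prediction whenever one is available.
    """
    # Tuples sort chronologically: first by year, then by week.
    keys = sorted(k for k in observed if k < (target_year, target_week))
    series = []
    for year, week in keys:
        if week == target_week and (year, week) in fitted:
            series.append(fitted[(year, week)])
        else:
            series.append(observed[(year, week)])
    return series


# Invented values: forecast week 52 of 2004 using the fitted week 52 of 2003.
observed = {(2003, 51): 2.0, (2003, 52): 3.0, (2004, 1): 2.5,
            (2004, 51): 2.2, (2004, 52): 3.5}
fitted = {(2003, 52): 2.4}
training = build_training_series(observed, fitted, 52, 2004)
```

In this toy example the observed week-52 value from 2003 (3.0) is replaced by its fitted value (2.4) before the 2004 forecast is made.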
All analyses were performed using the forecast package, version 4.06 [14], in R 3.0.1 [15] and PROC ARIMA in SAS 9.3 for Windows 7 (SAS Institute, Cary, NC). Because the ILI surveillance data are publicly available and include summary data only, this research was not subject to CDC institutional review board (IRB) review.

Results
We present the last-week and first-week predictions in Tables 1 through 4 of the main article. All other results from weeks 44-51 and weeks 2-8, plus model-fitting procedures, are reported in the supplemental material. We summarize our results in three categories in this section: 1) predicted ILI rates lower than observed; 2) predicted ILI rates higher than observed; and 3) model predictions at peak and after peak.

Predicted ILI Rates Lower than Observed
By season, the percentage of predicted ILI rates that were lower than observed ranged from 44% in 2009-2010 to 67% in 2004-2005. The percentage of predicted ILI rates that were lower than observed was similar across regions (from 58% in Region 10 to 63% in Region 5).
Of 1,530 predicted ILI rates, 64 were statistically significantly lower than observed (i.e., the observed weekly ILI rate was higher than the upper bound of the 95% bootstrapped prediction interval). These 64 predicted values were almost evenly distributed by HHS region. However, seasons 2007-2008 and 2012-2013 contributed 18 and 17 predictions, respectively. Of the 64 predicted values that were significantly lower than observed, the most commonly identified week numbers were weeks 52, 51, 5, 7, and 4, which contributed 12, 9, 8, 7, and 7 values, respectively. Fig 1A illustrates the number of predicted rates that were significantly lower than observed by influenza season and week. Weeks 51-52 form an apparent cluster, with a combined 21 (33% of 64) predicted values that were significantly lower than observed. Twelve of the 21 values in this cluster were from influenza season 2012-2013, and nine from week 52 (Tables 1 and 2).
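The significance rule used in this section can be written as a short classification. The sketch below uses hypothetical names and invented interval values purely to show how each week-season-region prediction would be labeled and tallied.

```python
from collections import Counter


def classify(observed, lower, upper):
    """Label one prediction relative to its 95% prediction interval."""
    if observed > upper:
        return "observed_higher"  # prediction significantly lower than observed
    if observed < lower:
        return "observed_lower"   # prediction significantly higher than observed
    return "within_interval"


# Invented records: (week, observed rate, interval lower bound, upper bound).
records = [
    (51, 3.2, 2.0, 3.0),
    (52, 4.1, 2.5, 3.8),
    (1, 2.0, 2.4, 3.9),
    (2, 2.6, 2.0, 3.1),
]
counts = Counter(classify(obs, lo, hi) for _, obs, lo, hi in records)
```

Tallying the labels by week or by season gives counts of the kind summarized in this section.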

Predicted ILI Rates Higher than Observed
Of 1,530 predicted ILI rates, 38% (579) were higher than the corresponding observed values (S2 Table). Influenza season 2009-2010 had the most predicted ILI rates that were higher than observed (93/1,530; 6%) and influenza season 2010-2011 was the least represented (48/1,530; 3%). There was regional variation in the number of ILI predictions that were higher than observed (36% in region 5 and region 9 to 42% in region 10). There were 46/1,530 (3%) predictions that were statistically significantly higher than observed (i.e., the observed weekly ILI rate was lower than the lower bound of 95% bootstrapped prediction interval). Influenza season 2009-2010 comprised most of these predictions (12/

Model Predictions at Peak and after Peak
While the timing of peak observed ILI activity often differed by HHS region, in some influenza seasons ILI activity peaked during the same week in most regions. Of 90 (10 HHS regions x 9 seasons) predictions for the week of peak ILI activity, 78 predicted ILI rates were lower than observed. In seasons in which peak ILI activity occurred in week 7 or earlier, 62 out of 73 predictions for the post-peak week (i.e., the week after the peak ILI week) were higher than observed. In 53 of 73 models, the predicted ILI rate was lower than observed at the peak and higher than observed in the post-peak week.

Discussion
From week 44 to week 8 of the influenza seasons investigated, week 52 (during the typical winter holiday break) had the most predicted ILI rates that were significantly lower than observed, and week 1 (after the typical winter holiday break) had the most predicted ILI rates that were significantly higher than observed. However, these findings were largely driven by a single influenza season (2012-2013). [...] 1,0,2). When a time-series model is used to forecast the peak value along the timeline, the predicted value will, in most cases, depend on the past week's value or a rolling average of the past two weeks' values. As a result, the forecasted ILI rate at week 52 will be smaller than the highest observed ILI rate. However, when the model forecasts the data point immediately after the peak, the peak observation exerts influence on this prediction, which is then higher than the observed value.
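The lag described in this paragraph can be shown with a toy example. The sketch below assumes the simplest possible forecaster, one that repeats the previous week's value (a random-walk forecast), as a stand-in for the fitted ARIMA models; the series and names are hypothetical.

```python
def persistence_forecasts(series):
    """One-step-ahead forecasts that repeat the previous week's value."""
    # No forecast is available for the first week.
    return [None] + series[:-1]


# Invented weekly ILI percentages with a single peak.
ili = [1.2, 1.8, 2.9, 4.5, 3.6, 2.4, 1.7]
forecasts = persistence_forecasts(ili)

peak = ili.index(max(ili))
# Forecast minus observed at the peak week: negative (under-prediction).
peak_error = forecasts[peak] - ili[peak]
# Forecast minus observed one week later: positive (over-prediction).
post_peak_error = forecasts[peak + 1] - ili[peak + 1]
```

Any forecaster that leans heavily on the most recent one or two observations would be expected to show the same sign pattern around a sharp peak, although the size of the errors would differ.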
Changing daily routines or traveling to other parts of the country during the holiday break could influence the decision to seek care for ILI symptoms. During week 52, the total number of visits reported to ILINet decreases, while the number of ILI-related visits remains the same around the peak, resulting in an elevated proportion of ILI visits (Fig 2). Because ARIMA model predictions for week 52 are based on previous weeks when the total number of visits was higher, the observed ILI rate may be higher than expected. In early January, the observed proportion of ILI may decrease because of the increase in total patient visits. The model prediction in the week after the holiday break is based on previous ILI rates, including the holiday week with the lower denominator of visits; this can lead to predicted ILI rates that are higher than observed. In addition to the changes in the denominator of patient visits during the holiday break, ILI cases requiring medical attention may be more severe than ILI cases reported during non-holiday weeks because of differences in patient care-seeking behavior. The variability we detected between predicted and observed ILI rates during the winter holiday break of certain influenza seasons may reflect several factors, including a change in social mixing patterns, healthcare-seeking behavior, specific characteristics of the circulating influenza strain, or artifacts of ILINet surveillance data. ILINet surveillance does not capture the full age-specific information; therefore, we could not explore whether our findings are related to age-specific ILI reports or other factors.
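The denominator effect described in this paragraph can be made concrete with invented numbers. In the sketch below, the visit counts are hypothetical and chosen only to show how a stable number of ILI visits combined with fewer total visits raises the reported proportion.

```python
def ili_percent(ili_visits, total_visits):
    """ILI visits as a percentage of all reported visits."""
    return 100.0 * ili_visits / total_visits


# Same number of ILI visits, fewer total visits during the holiday week.
typical_week = ili_percent(300, 10_000)
holiday_week = ili_percent(300, 7_500)
# Total visits recover in early January while ILI visits stay the same.
post_holiday_week = ili_percent(300, 10_000)
```

Here the proportion rises from 3% to 4% in the holiday week and returns to 3% afterward, even though the number of ILI visits never changes.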
We were unable to assess the role of different influenza strains on the predicted ILI rates, although such differences can impact the epidemiology of influenza [12,13]. Previous infection from one strain can provide full or partial immunity to the same strain, or similar influenza strains, during subsequent exposures. Additionally, certain strains are associated with more or less transmissibility and pathogenicity, affecting illness severity and use of healthcare services. Any of these virus-related factors, as well as co-circulation or serial circulation of different strains, could have influenced our findings.
Our time-series analysis was an ecological approach to describing the relationship between ILINet data and the winter holiday break. As a result, we cannot establish a causal relationship between winter holiday breaks and changes in ILI rates. ILINet provides weekly data on all people with ILI, and since these data do not rely on laboratory confirmation, reported ILI rates likely included people with influenza as well as other acute respiratory diseases. Repeating this analysis using confirmed influenza cases could be informative. Additionally, we divided the winter holiday break into two time frames in this analysis: the last week of the year and the first week of each year. This categorization was made based on our estimation of the typical duration of the winter holiday break, given limited information in the literature as well as the availability of surveillance data by week. Future analyses interpolating daily changes in ILI rates may be warranted. We must also note that our time-series analysis used only 51 and 52 data points for predicting ILI rates for the last week and the first week of the 2003-2004 influenza season, respectively. This could have limited the reliability of our predictions; more data points would have been beneficial.

Conclusion
In our analysis, we detected high variability in the temporal relationship between winter holiday break and weekly ILI rates across influenza seasons. While overall observed ILI rates during the last week in December were higher than predicted, and observed ILI rates during the first week of January were lower than predicted, these findings were mainly attributable to a single influenza season, 2012-2013. We demonstrated the use of the ARIMA method in conducting this time-series analysis. Additional analyses using this method and others to incorporate environmental factors or virological data may better clarify the relationship between influenza activity and the winter holiday break.

Supporting figure. Last-week ILI rate prediction for HHS region 4, based on previously observed weekly ILI rates and previously fitted last-week ILI rates. The solid blue line represents weekly ILI rates reported by CDC ILINet; the solid red line represents fitted weekly ILI rates. Blue dots and red dots represent the observed and predicted week-52 ILI rates, respectively, with ARIMA models for the following years: a) 2004 last-week prediction by ARIMA(2,0,2); b) 2006 last-week prediction by ARIMA(2,0,2); and c) 2008 last-week prediction by ARIMA(2,0,2). (TIF)
S1 Model Fitting. (DOCX)