The authors have declared that no competing interests exist.

Conceived and designed the experiments: LJM YY. Analyzed the data: LJM BX YY. Contributed to the writing of the manuscript: LJM BX YY. Drafted the manuscript: LJM.

Google Flu Trends (GFT) uses Internet search queries in an effort to provide early warning of increases in influenza-like illness (ILI). In the United States, GFT estimates the percentage of physician visits related to ILI (%ILINet) reported by the Centers for Disease Control and Prevention (CDC). However, during the 2012–13 influenza season, GFT overestimated %ILINet by an appreciable amount and estimated the peak in incidence three weeks late. Using data from 2010–14, we investigated the relationship between GFT estimates (%GFT) and %ILINet. Based on the relationship between the relative change in %GFT and the relative change in %ILINet, we transformed %GFT estimates to better correspond with %ILINet values. In 2010–13, our transformed %GFT estimates were within ±10% of %ILINet values for 17 of the 29 weeks that %ILINet was above the seasonal baseline value determined by the CDC; in contrast, the original %GFT estimates were within ±10% of %ILINet values for only two of these 29 weeks. Relative to the %ILINet peak in 2012–13, the peak in our transformed %GFT estimates was 2% lower and one week later, whereas the peak in the original %GFT estimates was 74% higher and three weeks later. The same transformation improved %GFT estimates using the recalibrated 2013 GFT model in early 2013–14. Our transformed %GFT estimates can be calculated approximately one week before %ILINet values are reported by the CDC and the transformation equation was stable over the time period investigated (2010–13). We anticipate our results will facilitate future use of GFT.

In the United States, traditional influenza and influenza-like illness (ILI) surveillance data are available from the Centers for Disease Control and Prevention (CDC) and include data from ILINet, an outpatient surveillance system that measures the percentage of physician visits related to ILI (%ILINet).

Given the inaccuracies and criticisms of GFT during the 2012–13 season, we investigated the relationship between GFT estimates and their intended target (%ILINet) and transformed GFT estimates to improve their estimation of %ILINet in the United States.

We used United States national data from GFT

In our main analysis for evaluating the prediction performance, we limited the data to a period after the 2009 GFT recalibration and before the 2013 GFT recalibration: i.e., week 39 of 2010 to week 30 of 2013 (October 3, 2010 to July 27, 2013; 147 weeks). Although the recalibration by Google was reported in October 2013, it was applied to the data retrospectively starting August 1, 2013.

This equation states that the relative change in %ILINet from the previous week is proportional to the relative change in %GFT from the previous week; in practice, the former is the relative change from the preliminary %ILINet value (i.e., the predictor) available from the previous week (week i−1) to the final %ILINet value for the current week (week i).
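The proportionality described above can be sketched in code. This is a minimal illustration, assuming the transformed estimate for the current week equals the previous week's preliminary %ILINet multiplied by one plus a parameter times the relative change in %GFT. The function name, the parameter name `beta`, its default of 1.0, and the input values are all hypothetical and are not taken from the original analysis.

```python
def transform_gft(p_ilinet_prev, gft_curr, gft_prev, beta=1.0):
    """Sketch of the transformation under the assumed form described in the lead-in.

    p_ilinet_prev: preliminary %ILINet available for the previous week
    gft_curr, gft_prev: %GFT for the current and previous week
    beta: hypothetical name for the proportionality parameter
          (1.0 is a placeholder, not a value reported by the study)
    """
    if gft_prev <= 0:
        raise ValueError("previous-week %GFT must be positive")
    # Relative change in %GFT from the previous week to the current week.
    rel_change_gft = (gft_curr - gft_prev) / gft_prev
    # Apply the (scaled) relative change to the previous preliminary %ILINet.
    return p_ilinet_prev * (1.0 + beta * rel_change_gft)


# Hypothetical inputs: %GFT rises 10% week over week, so with beta = 1
# the previous preliminary %ILINet of 4.0 is scaled up by 10%.
estimate = transform_gft(p_ilinet_prev=4.0, gft_curr=6.6, gft_prev=6.0)
print(round(estimate, 2))  # 4.4
```

Only the relative change in %GFT enters this form, so a constant multiplicative bias in %GFT cancels out of the estimate.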

In addition to our main analysis, we conducted four subanalyses. First, we assessed whether the same transformation parameter (

We compared our transformed GFT estimates to the %ILINet value using four metrics: 1) the proportion of weeks in which the relative percent difference between the transformed %GFT estimate and the f%ILINet value was within ±5% or ±10% of the f%ILINet value, 2) sum of the squared errors, 3) relative percentage difference in peak magnitude in 2012–13, and 4) difference in peak timing in 2012–13. We limited our evaluation of (1) to weeks when f%ILINet was above the CDC’s reported national baseline value for that season: 2.5% for 2010–11
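The four metrics can be illustrated with a short sketch. The function names and the weekly values below are hypothetical, and the code reflects one reasonable reading of the metric definitions given above rather than the original SAS or R code.

```python
def within_tolerance(estimates, finals, baseline, tol):
    """Count above-baseline weeks whose estimate is within +/- tol
    (a fraction, e.g. 0.10) of the final value.
    Returns (number of weeks within tolerance, number of above-baseline weeks)."""
    above = [(e, f) for e, f in zip(estimates, finals) if f > baseline]
    hits = sum(1 for e, f in above if abs(e - f) / f <= tol)
    return hits, len(above)


def sum_squared_errors(estimates, finals):
    """Sum of squared differences between estimates and final values."""
    return sum((e - f) ** 2 for e, f in zip(estimates, finals))


def peak_comparison(estimates, finals):
    """Return (relative % difference in peak magnitude,
    difference in peak timing in weeks; positive means the estimate peaks later)."""
    est_week = max(range(len(estimates)), key=estimates.__getitem__)
    fin_week = max(range(len(finals)), key=finals.__getitem__)
    rel_diff = 100.0 * (estimates[est_week] - finals[fin_week]) / finals[fin_week]
    return rel_diff, est_week - fin_week


# Hypothetical weekly values (percent of visits), not data from the study.
finals = [2.0, 3.0, 5.0, 4.0, 2.4]
estimates = [2.1, 3.2, 4.6, 5.5, 2.5]

print(within_tolerance(estimates, finals, baseline=2.5, tol=0.10))  # (2, 3)
print(within_tolerance(estimates, finals, baseline=2.5, tol=0.05))  # (0, 3)
print(round(sum_squared_errors(estimates, finals), 2))              # 2.47
print(peak_comparison(estimates, finals))                           # (10.0, 1)
```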

Ethics approval was not required because all data were publicly available. Analyses were conducted using SAS 9.4 (SAS Institute Inc., Cary, North Carolina) and R version 3.0.2.

We found that the relative change in %GFT closely approximates the relative change from p%ILINet to f%ILINet, and this relationship can inform %GFT for the current week.

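This transformation produces GFT estimates that more closely approximate %ILINet values than the original %GFT estimates do. When we used p%ILINet_{i−1} on its own as an estimate of f%ILINet_{i}, the sum of squared errors, Σ(f%ILINet_{i} − p%ILINet_{i−1})^{2}, was appreciably larger (17.0) than that of the transformed %GFT estimates (12.1). This suggests that the relative change in %GFT carries information about f%ILINet_{i} beyond that contained in p%ILINet_{i−1} alone.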

The final CDC value (f%ILINet; blue) is compared to the GFT estimate (%GFT; red) and the transformed GFT estimate using

| Estimate | 2010–13 seasons (Week 40, 2010 to Week 30, 2013) | | | | | 2013–14 season (Week 31, 2013 to Week 10, 2014) | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | No. (%) above baseline weeks within ±5% | No. (%) above baseline weeks within ±10% | Sum of squared errors | Relative % difference in peak magnitude (2012–13) | Difference in peak timing (2012–13) | No. (%) above baseline weeks within ±5% | No. (%) above baseline weeks within ±10% | Sum of squared errors | Relative % difference in peak magnitude | Difference in peak timing |
| Original 2009 %GFT | 0 (0.0) | 2 (6.9) | 177.4 | 74.1 | 3 weeks after | - | - | - | - | - |
| Recalibrated 2013 %GFT | - | - | - | - | - | 6 (43) | 8 (57) | 3.8 | 9.1 | 1 week after |
| Preliminary %ILINet from previous week | 8 (28) | 12 (41) | 17.0 | - | - | 2 (14) | 5 (36) | 5.7 | - | - |
| Transformed %GFT ( | 8 (28) | 17 (59) | 12.1 | −2.2 | 1 week after | 5 (36) | 11 (79) | 2.1 | −1.3 | 1 week after |
| Transformed %GFT ( | - | - | - | - | - | 6 (44) | 10 (71) | 1.6 | 2.4 | 1 week after |

During the 2013–14 season, 14 weeks were above baseline.

In our first subanalysis, when we applied the transformation using

We transformed the percentage of sentinel physician visits related to ILI estimated by GFT to better match the ILINet value reported by the CDC, thus reducing the overestimation produced by GFT during the 2012–13 influenza season by an absolute difference of 4.4 percentage points and shifting the peak ahead by two weeks. Relative to the peak in %ILINet, the transformed GFT peak in 2012–13 was only 2.2% lower in magnitude (an absolute difference of 0.13 percentage points) and one week later. Key features of this transformation are that it can be calculated approximately one week before the final %ILINet value is reported by the CDC and it appears stable over the time period investigated (2010–13).

The stability of our transformation equation is important because it means that the same equation with the same value of

Although our transformed GFT estimate closely matches %ILINet, the largest discrepancy between these values (1.3 percentage points) was seen during week 1 in 2012–13, when %ILINet peaked in week 52 and our transformed %GFT peaked in week 1. However, the height of the peak in %ILINet may have been somewhat artifactual due to a decrease in total physician visits (the denominator of %ILINet) during the Christmas holidays in week 52. In contrast, the absolute number of ILI-related physician visits (the numerator of %ILINet) peaked in week 2 of 2012–13, one week after our transformed %GFT estimate peaked. We observed the same phenomenon in our subanalysis of 2013–14, when %ILINet peaked in week 52 but its numerator peaked in week 1, the same week as our transformed %GFT estimate peaked.
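The denominator effect described above can be made concrete with a small arithmetic sketch. The visit counts are hypothetical and chosen only to show the mechanism; they are not ILINet data.

```python
def pct_ili(ili_visits, total_visits):
    """Percentage of physician visits that are ILI-related."""
    return 100.0 * ili_visits / total_visits


# Hypothetical holiday week: fewer total visits overall.
holiday_week = pct_ili(ili_visits=30_000, total_visits=500_000)
# Hypothetical following week: ILI visits rise, but total visits recover more.
following_week = pct_ili(ili_visits=33_000, total_visits=660_000)

print(holiday_week)    # 6.0
print(following_week)  # 5.0
```

In this illustration the number of ILI-related visits is higher in the second week, yet the percentage is lower because total visits rebounded. This is how the percentage and its numerator can peak in different weeks.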

We note several limitations to our study. First, our main analysis was conducted using the previous 2009 GFT model, which Google recalibrated in October 2013. Although our transformation appears to work in the first 32 weeks of the recalibrated 2013 GFT model, continued evaluation and monitoring are needed; our transformation needs to be more thoroughly assessed, and potentially modified, using the new 2013 GFT model and any future GFT models, which may have different relationships with %ILINet. Additionally, the estimates reported by GFT for August–October 2013 were not prospectively estimated; rather, they were retrospectively updated with the recalibrated model. Second, our approach of transforming GFT estimates using the most recent %ILINet value would somewhat delay the reporting of GFT estimates. Given that the p%ILINet value for week _{i}_{i}

Herein we provide a simple transformation that can be implemented prospectively to improve GFT estimates for the United States, approximately one week before %ILINet values are reported. We anticipate our results could help to inform future use of GFT.

We thank the anonymous reviewers for their helpful comments on the previous version of our manuscript, including the insightful recommendation to use preliminary %ILINet values in our calculations and suggestion to conduct regional analyses.