Norovirus outbreaks severely disrupt healthcare systems. We evaluated whether Websök, an internet-based surveillance system using search engine data, improved norovirus surveillance and response in Sweden. We compared Websök users' characteristics with the general population, cross-correlated weekly Websök searches with laboratory notifications between 2006 and 2013, compared the times at which Websök and laboratory data crossed the epidemic threshold, and surveyed infection control teams about their perception and use of Websök. Users of Websök were not representative of the general population. Websök correlated with laboratory data (b = 0.88–0.89) and gave an earlier signal of the onset of the norovirus season than laboratory-based surveillance. Of 21 infection control teams, 17 (81%) answered the survey, of whom 11 (65%) believed Websök could help with infection control plans. Websök is a low-resource, easily replicable system that detects the norovirus season as reliably as laboratory data, but earlier. Using Websök in routine surveillance can help infection control teams prepare for the yearly norovirus season.
Citation: Edelstein M, Wallensten A, Zetterqvist I, Hulth A (2014) Detecting the Norovirus Season in Sweden Using Search Engine Data – Meeting the Needs of Hospital Infection Control Teams. PLoS ONE 9(6): e100309. https://doi.org/10.1371/journal.pone.0100309
Editor: Vittoria Colizza, Inserm & Universite Pierre et Marie Curie, France
Received: March 26, 2014; Accepted: May 26, 2014; Published: June 23, 2014
Copyright: © 2014 Edelstein et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The authors confirm that all data underlying the findings are fully available without restriction. All data are included within the manuscript.
Funding: The authors have no support or funding to report.
Competing interests: The authors have declared that no competing interests exist.
Norovirus is a leading cause of gastroenteritis, responsible for sporadic cases as well as outbreaks. In the United States alone, it causes between 19 and 21 million cases of acute gastroenteritis annually. Norovirus outbreaks peak during winter months in temperate climates and are therefore referred to as ‘winter vomiting disease’. Other pathogens, such as rotavirus, also contribute to acute gastrointestinal illness in the winter months and can lead to several peaks of gastrointestinal illness during that season. In Northern Europe, winter vomiting disease occurs every year, although the onset of the season may vary. Norovirus activity increases with decreasing temperature and humidity. Infection with norovirus is characterized by acute onset of nausea, vomiting, abdominal cramps, myalgia, and non-bloody diarrhea. Symptoms usually resolve in 2–3 days. Almost 90% of individuals with norovirus illness do not seek healthcare, and estimates suggest that national surveillance systems capture only 1 in 1,500 community cases. During the winter vomiting disease season, community transmission of norovirus is typically followed by healthcare-associated outbreaks. Healthcare-associated outbreaks affect patients and disrupt hospital functioning because of containment procedures (e.g., closure of wards, postponement of surgery) and staff absenteeism. They are also costly: a 2011 study estimated that each nosocomial norovirus infection case cost $6,237. Early implementation of prevention measures, including hand hygiene, staff exclusion and disinfection, shortens norovirus outbreaks and reduces hospital costs.
In Sweden, the incidence of winter vomiting disease in the community is unknown, although norovirus causes 60% of all gastroenteritis outbreaks. As of 2012, surveillance relied on laboratory notification, mainly reflecting hospitalized cases. Such systems do not provide information on community transmission, leading to under-ascertainment, and are subject to reporting lags.
Analyzing patterns of words entered in online search engines is an alternative surveillance method to obtain information on outbreaks. This technique was first piloted for influenza surveillance and can be performed with syndromic and disease-specific terms. However, the ability of word pattern analysis using generic search engines to accurately predict influenza outbreaks or estimate their magnitude has been questioned. Health websites provide better epidemiological information for internet-based surveillance than generic search engines. Surveillance based on word pattern analysis has also been tested for acute gastrointestinal illness and for norovirus specifically. Search patterns for gastroenteritis-related terms correlated with laboratory notification patterns. Likewise, the volume of calls related to vomiting queries to a national health helpline provided a timely indicator of forthcoming healthcare-associated norovirus outbreaks. In 2007, the Swedish Institute for Communicable Disease Control (SMI, reorganised into the Public Health Agency of Sweden in January 2014) started ‘Websök’, a system that routinely analyzes data generated by search queries entered by the public in www.vårdguiden.se, the official health portal for Stockholm county. Since 2009, SMI has been using Websök for influenza-like illness surveillance. In 2010, a preliminary analysis of Websök use for norovirus suggested that the pattern of winter vomiting disease related searches followed the laboratory notification pattern and was not influenced by mentions of winter vomiting disease in the Swedish media or by other pathogens. In addition, Websök might detect the onset of the winter vomiting disease season earlier than laboratory-based surveillance. An early warning of the beginning of the norovirus season can help infection control teams put measures in place in a timely manner. Well-prepared institutions suffered less disruption, morbidity and cost.
Although preliminary information about Websök is encouraging, the system has not been formally evaluated. We therefore evaluated whether Websök is an accurate, timely and representative tool for the surveillance of winter vomiting disease, as a complement to laboratory surveillance, and whether its use adds public health value.
To evaluate Websök's data accuracy, timeliness, representativeness and usefulness, we determined whether: (i) Websök's data correlated with voluntary laboratory reporting data; (ii) Websök detected the winter vomiting season earlier than laboratory surveillance; (iii) vårdguiden.se users were representative of the Swedish population; and (iv) there was an added public health value to earlier detection of the winter vomiting disease season.
The query logs do not record any personal information. We could not link online searches to any identifiable information, and therefore did not seek ethical committee clearance. We used Stata version 12 for all statistical analyses.
Evaluating data accuracy
From week 27 in 2006 to week 26 in 2013, we counted the weekly number of laboratory notifications from 16 regional laboratories across Sweden, using the dates of notification to SMI, and the number of weekly searches for two norovirus related search terms: “kräk” (vomiting) and “vinterkräksjuka” (winter vomiting disease), using the dates when the searches occurred, obtained from the vårdguiden.se website logs. Searches for the term “kräk” included searches for longer words with the term “kräk” in them, such as “vinterkräksjuka”. To standardize the data, adjust for trends, and smooth, we expressed the weekly number of notifications and searches for each term as the proportion of the total number of notifications or searches for each season (from week 27 one year to week 26 the next year) and used a five week moving average. We compared Websök and laboratory surveillance data in terms of trends over time, by cross correlating smoothed laboratory notifications with smoothed weekly searches for each search term over the specified period.
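The standardisation, smoothing and cross-correlation steps described above can be sketched as follows. This is a minimal illustration in Python, not the Stata procedure used in the study: the function names are hypothetical, the moving average is assumed to be centred (the text does not say whether it was centred or trailing), and a positive lag is taken to mean that searches precede notifications.

```python
from math import sqrt

def seasonal_proportions(counts, season_ids):
    """Express each weekly count as a share of its season's total."""
    totals = {}
    for c, s in zip(counts, season_ids):
        totals[s] = totals.get(s, 0) + c
    return [c / totals[s] if totals[s] else 0.0
            for c, s in zip(counts, season_ids)]

def moving_average(values, window=5):
    """Centred moving average (assumption); edges use the weeks available."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def cross_correlation(searches, notifications, max_lag=8):
    """Correlate searches in week t with notifications in week t + lag."""
    result = {}
    for lag in range(max_lag + 1):
        x = searches[: len(searches) - lag]
        y = notifications[lag:]
        result[lag] = pearson(x, y)
    return result
```

Calling `cross_correlation` on the two smoothed series returns one coefficient per lag; the lag with the highest coefficient indicates by how many weeks the search series leads the notification series.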
We defined the onset of the yearly winter vomiting season as the week in which activity exceeded the upper prediction interval of baseline norovirus activity (the epidemic threshold), with the baseline itself defined by fitting harmonic functions to the time period with little or no activity. Based on visual inspection of the data, we used the period from June through October (weeks 23 to 44, or 43 when week 44 starts in November) as the baseline for laboratory data and weekly searches for “kräk” and “vinterkräksjuka”. We calculated 95% and 99% prediction intervals. For each season between 2006–07 and 2012–13, we determined the week in which the number of notifications or searches exceeded the epidemic thresholds. We then measured the time interval between the epidemic threshold crossings for the laboratory data and each Websök search term, both for individual seasons and as the mean for the whole study period.
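As a rough illustration of this kind of threshold, the sketch below fits a single annual harmonic to baseline weeks by ordinary least squares and flags the first week that exceeds the fitted baseline plus a multiple of the residual standard deviation. It is a simplified assumption-laden version: the exact harmonic terms used in the study are not stated, the upper limit here ignores parameter uncertainty (so it only approximates a prediction interval), and all function names are hypothetical.

```python
from math import sin, cos, pi, sqrt

def solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def design(week):
    """Intercept plus one annual sine/cosine pair (assumed model form)."""
    angle = 2 * pi * week / 52
    return [1.0, sin(angle), cos(angle)]

def fit_baseline(weeks, values):
    """Least-squares fit on baseline weeks; returns coefficients and residual SD."""
    rows = [design(w) for w in weeks]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [v - sum(b * x for b, x in zip(beta, r)) for r, v in zip(rows, values)]
    sd = sqrt(sum(e * e for e in resid) / (len(values) - k))
    return beta, sd

def threshold(beta, sd, week, z=1.96):
    """Approximate upper limit: fitted baseline + z * residual SD."""
    return sum(b * x for b, x in zip(beta, design(week))) + z * sd

def first_exceedance(weeks, values, beta, sd, z=1.96):
    """First week (in the order given) whose value exceeds the threshold."""
    for w, v in zip(weeks, values):
        if v > threshold(beta, sd, w, z):
            return w
    return None
```

Under these assumptions, `z=1.96` corresponds to the 95% limit and `z=2.576` to the 99% limit. Running `first_exceedance` separately on the laboratory series and on each search-term series, then subtracting the resulting weeks, gives the lead time for a season.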
We obtained vårdguiden.se users' characteristics from a 2012 survey of 3,000 users and the 2012 general Swedish population characteristics from the Statistics Sweden website (www.scb.se, accessed 2014 June 2). We compared vårdguiden.se users with the general population in terms of age, sex, educational attainment, and county of residence using Chi-square goodness of fit tests. Google analytics data identified the counties from which users had accessed vårdguiden.se.
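A chi-square goodness-of-fit test of this kind compares observed category counts among users with the counts expected from population proportions. The sketch below is illustrative only: the sample of 1,000 visits is hypothetical, the 46% and 22% shares for Stockholm county are taken from the results reported later in this article, and the p-value helper covers only the two-category case (one degree of freedom).

```python
from math import erfc, sqrt

def chi_square_gof(observed, expected_props):
    """Chi-square statistic for observed counts against expected proportions."""
    total = sum(observed)
    expected = [total * p for p in expected_props]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def p_value_df1(statistic):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(statistic / 2))

# Hypothetical sample of 1,000 domestic visits: 460 from Stockholm county,
# 540 from elsewhere, against population shares of 22% and 78%.
stat = chi_square_gof([460, 540], [0.22, 0.78])
p = p_value_df1(stat)
```

With more than two categories (for example age bands), the statistic is computed the same way, but the p-value requires the chi-square distribution with the number of categories minus one degrees of freedom, which a statistics package provides.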
We prepared a questionnaire using a survey generator (http://www.alstra.se/sv/enkatverktyg, accessed 2014 June 2) following in-depth interviews with two infection control teams. The questionnaire contained questions regarding experience of hospital norovirus outbreaks, usual triggers for implementing norovirus control strategies, use of norovirus surveillance information, perception of internet-based surveillance data and perception of the usefulness of an early warning of the norovirus season. All county infection control teams in Sweden received the questionnaire. Non-responding teams received up to two reminders. We analysed the survey results using MS Excel.
Between week 27 in 2006 and week 26 in 2013, laboratories reported 46,765 confirmed norovirus infections to SMI, and the terms “kräk” and “vinterkräksjuka” were searched 91,630 and 36,576 times respectively in vårdguiden.se. The total number of searches per year peaked in 2010 for both terms. After standardizing, adjusting for trend and smoothing, the number of laboratory notifications correlated with the number of searches for both “kräk” and “vinterkräksjuka” (correlation coefficient = 0.88 for “vinterkräksjuka” and 0.89 for “kräk”, with a lag of 6 weeks for both search terms). Graphically, the trends for laboratory notifications and search terms were similar. However, the number of searches increased earlier in the year than laboratory notifications (Figure 1). In addition, a double peak can be seen in the search term data (most clearly in the 2006–07, 2010–11 and 2011–12 seasons, Figure 1) but not in the laboratory notifications data.
Figure 1 represents for each week between week 27 2006 and week 26 2013, the norovirus notifications and online searches for the terms “kräk” (vomiting) and “vinterkräksjuka” (winter vomiting disease) expressed as weekly proportion of the yearly total after smoothing, standardisation and adjusting for trend. The blue line represents laboratory notifications, the red line represents online searches for “kräk” (vomiting) and the green line represents online searches for “vinterkräksjuka” (winter vomiting disease).
Compared with laboratory notifications, the number of searches for “kräk” and “vinterkräksjuka” exceeded the 99% upper prediction interval of the baseline activity an average of two weeks earlier (range 0–6, Table 1). When using the upper 95% prediction interval, the number of searches exceeded the threshold an average of two weeks earlier (range 0–8) for “kräk”, and three weeks earlier (range 0–8) for “vinterkräksjuka” (Figure 2). Of seven norovirus seasons, Websök detected the season onset earlier than laboratory data between 4 and 6 times, depending on the search term and threshold used (Table 1). Additionally, the number of searches for “kräk” exceeded the 95% threshold during low activity months for two seasons in a row (week 36 in 2009 and week 33 in 2010).
Dots represent the weeks the threshold for norovirus season onset is crossed. Figure 2 presents three separate graphs. The black graph represents laboratory notifications, the green graph represents online searches for the term “vinterkräksjuka” (winter vomiting disease) and the red graph represents online searches for the term “kräk” (vomiting). Each graph presents the number of notifications or searches per week between week 27 2006 and week 26 2013, along with a baseline and the 95% upper prediction interval line of the baseline. For each year, on each graph, the dots represent the week when the number of searches or notifications exceeded the 95% upper prediction interval line of the baseline, thus indicating when the onset of the norovirus season is detected in each case.
Compared with the general Swedish population, vårdguiden.se users were more likely to be female, university educated and aged 31–65 (p<0.001 for each, Table 2). Of the 17,323,214 visits to the website in 2012, 16,221,649 (94%) originated from Sweden. The geographical distribution of the visits differed from the population distribution (p<0.001). Stockholm county accounted for 22% of the population but for 46% of the domestic visits to the website.
Of 21 counties, 17 (81%) answered the survey. All were affected by norovirus outbreaks in the 2012–13 season, lasting up to 60 days and involving at least 2,664 patients and staff. During the 2012–13 norovirus season, 69% of infection control teams closed wards, 69% had sick members of staff, 80% restricted staff to working on specific wards and 13% redirected patients to other hospitals (Table 3). Infection control teams reported a range of preventive and reactive measures when dealing with hospital norovirus outbreaks (Table 3). Regarding timing of infection control measures, 35% of respondents began implementing infection control measures after receiving information that the winter vomiting season had started, 18% did so at a fixed date in October or November every year and 35% began once an outbreak was declared (Table 3). Overall, 12% of teams received a warning of the beginning of the 2012–13 norovirus season and 56% actively searched for information regarding the beginning of the norovirus season (Table 3). In total, 54% of infection control teams considered web-based surveillance as trustworthy as laboratory data but 38% thought it was less trustworthy (Table 3). Just over half (52%) of teams said they would trust web-based surveillance information only as a complement to laboratory data (Table 3). Most teams (88%) considered that a Websök-based early warning would constitute useful information, 65% thought it would help them direct their infection control strategy and 31% stated it would help decrease the number or size of norovirus outbreaks in hospitals (Table 3).
We evaluated Websök, an internet-based surveillance system for norovirus, and found that it was a reliable, timely and useful tool to detect the onset of the norovirus season. Where search query data are readily accessible, it is a simple system to set up and practically free to run.
Trends in norovirus-related online queries correlated with laboratory surveillance. This finding is consistent with results of investigations into other internet-based surveillance systems, both for norovirus and influenza. The six-week lag between laboratory data and Websök data in the cross-correlation analysis may be partly explained by a lag in laboratory reporting, partly by the delay between community circulation and circulation in hospitals, and partly by the fact that Websök is a high-sensitivity, low-specificity tool, potentially capturing events or episodes not attributable to norovirus and therefore occurring in a different timeframe. Since Websök is based on data from a health portal, it might be better suited to outbreak detection than systems based on generic search engine data. However, it is not possible to equate one search in Websök with one norovirus case. Also, as searches in Websök are anonymous, it is not possible to differentiate, for example, between ten individuals searching for information on winter vomiting disease and one individual searching for information ten times on different occasions. Likewise, Websök cannot differentiate between individuals who search because of their symptoms and those who are well and search out of general interest. Although the detection of the season onset was not influenced by other pathogens overall, the double peak seen in several seasons in the Websök data, but not in the laboratory data, could be caused by distinct pathogens, a phenomenon consistent with the literature. For these reasons, interpretation of Websök data should be restricted to overall trends and detection of the season onset and cannot be extended to severity or magnitude. This restriction is further warranted by the reported over-estimation of outbreak magnitude when using search engine data. In light of these limitations, Websök should be seen as a complement, rather than an alternative, to laboratory surveillance.
Compared with laboratory surveillance, Websök detected the onset of the winter vomiting season two to three weeks earlier on average, depending on the keyword and prediction interval used. This finding was consistent with studies from other countries. When evaluating different search terms, our results suggested that, compared with a generic search term such as “kräk” (vomiting), a more specific search word such as “vinterkräksjuka” (winter vomiting disease) provided better results, for three reasons. First, “kräk”, as a generic term, is more sensitive and generated alerts during the low activity months which were not confirmed by laboratory data. Second, “vinterkräksjuka” achieved early detection of the norovirus season onset for more seasons than did “kräk”. Third, when we used the 95% upper prediction interval as the epidemic threshold, the term “vinterkräksjuka” detected the onset of the winter vomiting season earlier than with a higher prediction interval. However, no combination of search term and epidemic threshold gave an early warning for all investigated seasons. When the delay between laboratory surveillance and Websök was particularly long, such as in the 2009–10 season, this was because detection by laboratory surveillance occurred later, not because Websök detection occurred earlier. In 2009–10 in France, the winter onset of the acute gastrointestinal illness season was delayed by 5 weeks. In Sweden, it was unclear whether the discrepancy in that year was caused by a false positive detection in the search term data, a late detection using laboratory data for an unknown reason, or a longer lag between community circulation and hospital circulation.
An early signal is not an end in itself. The detection signal should reflect the epidemiology of the virus, and be based on plausible data. Too early a signal, disconnected from the norovirus season, would have little public health value. Websök purports to reflect circulation of the virus in the community, which on average precedes detection by laboratory data, reflecting hospital circulation, by 2–3 weeks. This signal is close enough to the onset of nosocomial outbreaks to prompt infection control teams to act. Search engine data surveillance systems, such as Websök, have high sensitivity and low specificity by nature, with an inherent risk of false or artificially early signals. Websök mitigates this risk and increases its specificity by relying on local health portal data and by using a search term which, in Sweden, is strongly associated with norovirus.
Websök's users were not representative of the Swedish population. However, as our objective was to detect the nation-wide onset of the norovirus season rather than to precisely estimate the incidence of disease, this lack of representativeness is unlikely to bias the overall detection of the season onset. One consequence of Websök's lack of representativeness is that the early signal may not apply to all Swedish regions, since season onset is influenced by climate, which may vary in different parts of Sweden at any given time. In December 2005, for example, southern Sweden experienced increasing norovirus activity, while the rest of the country reported relatively low activity. In our evaluation, the detection of the norovirus season onset was likely biased towards Stockholm since users from that area were over-represented.
For a surveillance system to be useful, it needs to lead to public health action. The perceived usefulness by the intended end-users is of particular interest. Our survey provided important information on the public health usefulness of early detection and the current preparedness activities and attitudes. Most infection control teams believed that an early warning to the onset of the winter vomiting disease season would help them plan their infection control strategies. However, infection control teams perceived web-based data as less reliable than laboratory data. Therefore, these teams are likely to take both the Websök signal, earlier and less specific, and the laboratory-data signal, later and more specific, into consideration when planning infection control strategies. As of 2012, less than half of the infection control teams in Sweden who responded to our survey used surveillance-based evidence to plan their norovirus infection control activities. The majority started their infection control activities either at a fixed point in time or once outbreaks had occurred. If these teams were to use Websök's early warning, they would be able to implement earlier infection control measures which could reduce the number and size of outbreaks.
Conclusions and Recommendations
Websök provides surveillance data that detect the onset of the norovirus season as reliably as laboratory data, but earlier. In our survey, infection control teams viewed this early signal as useful for planning infection control measures, although they perceived web-based surveillance data as less reliable than laboratory-based data. Since this evaluation we have integrated Websök in routine norovirus surveillance, along with laboratory surveillance. We have also improved the collaboration between the national centre for communicable disease control and local infection control teams regarding norovirus season detection. Issuing regular newsletters and holding educational events could potentially increase the local infection control teams' confidence in Websök and ensure its use. Since 2013, we have sent email alerts to all infection control teams in Sweden when the number of searches for “vinterkräksjuka” first exceeds the upper limit of the 95% prediction interval of the baseline search activity for the season. The use of internet systems such as Websök may be replicated at low cost in countries where internet access is widespread. Based on our evaluation, we recommend: (i) the use of a Websök-like surveillance system as a complement to laboratory surveillance, in countries where search engine data are available and where norovirus is a public health concern; (ii) the use of search engine data for the detection of the norovirus season, using a local health-focused search engine if possible; and (iii) close collaboration with local infection control units to ensure the data are understood, trusted and used optimally for early implementation of infection control measures.
We thank the infection control teams who participated in the survey, 1177-Vårdguiden for granting access to the query logs, and EPiServer for providing this access.
Conceived and designed the experiments: ME AW IZ AH. Performed the experiments: ME IZ. Analyzed the data: ME AW AH. Contributed to the writing of the manuscript: ME AW IZ AH.
- 1. Hall AJ, Lopman BA, Payne DC, Patel MM, Gastañaduy PA, et al. (2013) Norovirus disease in the United States. Emerg Infect Dis 19(8): 1198–205.
- 2. Bartsch SM, Lopman BA, Hall AJ, Parashar UD, Lee BY (2012) The potential economic value of a human norovirus vaccine for the United States. Vaccine 30(49): 7097–104.
- 3. Mounts AW, Ando T, Koopmans M, Bresee JS, Noel J, et al. (2000) Cold weather seasonality of gastroenteritis associated with Norwalk-like viruses. J Infect Dis 181 Suppl 2: S284–7.
- 4. Monto AS, Koopman JS, Longini IM, Isaacson RE (1983) The Tecumseh study XII: Enteric agents in the community, 1976–1981. J Infect Dis 148(2): 284–91.
- 5. Koopmans M (2009) Noroviruses in healthcare settings: a challenging problem. J Hosp Infect 73(4): 331–7.
- 6. Hedlund K, Rubilar-Abreu E, Svensson L (2000) Epidemiology of calicivirus infections in Sweden, 1994–1998. J Infect Dis 181 Suppl 2: S275–80.
- 7. Loveridge P, Cooper D, Elliot AJ, Harris J, Gray J, et al. (2010) Vomiting calls to NHS Direct provide an early warning of norovirus outbreaks in hospitals. J Hosp Infect 74(4): 385–93.
- 8. Lopman B, Armstrong B, Atchison C, Gray JJ (2009) Host, weather and virological factors drive norovirus epidemiology: time-series analysis of laboratory surveillance data in England and Wales. PLoS One 4(8): e6671.
- 9. Patel MM, Hall AJ, Vinjé J, Parashar UD (2009) Noroviruses: a comprehensive review. J Clin Virol 44(1): 1–8.
- 10. Hall AJ, Rosenthal M, Gregoricus N, Greene SA, Ferguson J, et al. (2011) Incidence of acute gastroenteritis and role of norovirus, Georgia, USA, 2004–2005. Emerg Infect Dis 17(8): 1381–8.
- 11. Wheeler JG, Sethi D, Cowden JM, Wall PG, Rodrigues LC, et al. (1999) Study of infectious intestinal disease in England: rates in the community, presenting to general practice, and reported to national surveillance. The Infectious Intestinal Disease Study Executive. BMJ 318(7190): 1046–50.
- 12. Vega E, Barclay L, Gregoricus N, Williams K, Lee D (2011) Novel Surveillance network for norovirus gastroenteritis outbreaks, United States. Emerg Infect Dis 17(8): 1389–95.
- 13. Lee BY, Wettstein ZS, McGlone SM, Bailey RR, Umscheid CA, et al. (2011) Economic value of norovirus outbreak control measures in healthcare settings. Clin Microbiol Infect 17(4): 640–6.
- 14. Lopman BA, Reacher MH, Vipond IB, Hill D, Perry C, et al. (2004) Epidemiology and cost of nosocomial gastroenteritis, Avon, England, 2002–2003. Emerg Infect Dis 10(10): 1827–34.
- 15. Eysenbach G (2006) Infodemiology: tracking flu-related searches on the web for syndromic surveillance. AMIA Annual Symposium Proceedings Archive: 244–8.
- 16. Ginsberg J, Mohebbi MH, Patel RS, Brammer L, Smolinski MS (2009) Detecting influenza epidemics using search engine query data. Nature 457(7232): 1012–4.
- 17. Olson DR, Konty KJ, Paladini M, Viboud C, Simonsen L (2013) Reassessing Google Flu Trends Data for Detection of Seasonal and Pandemic Influenza: A Comparative Epidemiological Study at Three Geographic Scales. PLoS Comput Biol 9(10): e1003256.
- 18. Lazer D, Kennedy R, King G, Vespignani A (2014) The parable of Google Flu: traps in big data analysis. Science 343(6176): 1203–5.
- 19. Butler D (2013) When Google got flu wrong. Nature 494(7436): 155–6.
- 20. Hulth A, Rydevik G (2011) GET WELL: an automated surveillance system for gaining new epidemiological knowledge. BMC Public Health 11: 252.
- 21. Pelat C, Turbelin C, Bar-Hen A, Flahault A, Valleron A (2009) More diseases tracked by using Google Trends. Emerg Infect Dis 15(8): 1327–8.
- 22. Desai R, Hall AJ, Lopman BA, Shimshoni Y, Rennick M, et al. (2012) Norovirus disease surveillance using Google Internet query share data. Clin Infect Dis 55(8): 75–8.
- 23. Hulth A, Rydevik G, Linde A (2009) Web queries as a source for syndromic surveillance. PLoS One 4(2): e4378.
- 24. Hulth A, Andersson Y, Hedlund KO, Andersson M (2010) Eye-opening approach to norovirus surveillance. Emerg Infect Dis 16(8): 1319–21.
- 25. Lee BY, McGlone SM, Bailey RR, Wettstein ZS, Umscheid CA, et al. (2011) Economic impact of outbreaks of norovirus infection in hospitals. Infect Control Hosp Epidemiol 32(2): 191–3.
- 26. Pelat C, Boëlle P, Cowling B, Carrat F, Flahault A, et al. (2007) Online detection and quantification of epidemics. BMC Med Inform Decis Mak 7: 29.
- 27. Serfling RE (1963) Methods for current statistical analysis of excess pneumonia-influenza deaths. Public Health Rep 78(6): 494–506.
- 28. Crépey P, Pivette M, Desvarieux M (2013) Potential impact of influenza A/H1N1 pandemic and hand-gels on acute diarrhea epidemic in France. PLoS One 8(10): e75226.