Factors Influencing Performance of Internet-Based Biosurveillance Systems Used in Epidemic Intelligence for Early Detection of Infectious Diseases Outbreaks

Background
Internet-based biosurveillance systems have been developed to detect health threats using information available on the Internet, but system performance has not been assessed relative to end-user needs and perspectives.
Method and Findings
Infectious disease events from the French Institute for Public Health Surveillance (InVS) weekly international epidemiological bulletin published in 2010 were used to construct the gold-standard official dataset. Data from six biosurveillance systems were used to detect raw signals (infectious disease events from informal Internet sources): Argus, BioCaster, GPHIN, HealthMap, MedISys and ProMED-mail. Crude detection rates (C-DR), crude sensitivity rates (C-Se) and intrinsic sensitivity rates (I-Se) were calculated from multivariable regressions to evaluate the systems' performance (events detected compared to the gold-standard). A total of 472 raw signals (Internet disease reports) related to the 86 events included in the gold-standard data set were retrieved from the six systems. Of these events, 84 were detected before their publication in the gold-standard. The type of sources utilised by the systems varied significantly (p<0.001). I-Se varied significantly, from 43% to 71% (p = 0.001), whereas the other indicators were similar (C-DR: p = 0.20; C-Se: p = 0.13). I-Se was significantly associated with individual systems, types of system, languages, regions of occurrence, and types of infectious disease. Conversely, no statistical difference in C-DR was observed after adjustment for other variables.
Conclusion
Although differences could result from a biosurveillance system's conceptual design, findings suggest that the combined expertise amongst systems enhances the early detection of infectious diseases. While all systems showed similar early detection performance, systems including human moderation were found to have a 53% higher I-Se (p = 0.0001) after adjustment for other variables. Overall, moderation, sources, languages, regions of occurrence, and types of cases were found to influence system performance.


Introduction
Emerging and re-emerging infectious diseases continue to pose major threats to global health security [1,2]. The Internet provides information that can be used to detect health threats early [3]. Epidemic intelligence (EI) relies mainly on event-based biosurveillance, i.e. the ad hoc detection and interpretation of unstructured information originating from multiple, non-predefined sources on the Internet. Sources are varied but typically include the electronic news media and official governmental and non-governmental organisations [4]. Internet-based biosurveillance systems have been developed to monitor this large volume of information [5]. Despite substantial inherent differences, these systems all scan the Internet to detect reports related to infectious diseases that could represent potential health threats, and filter this unstructured information through complex algorithms. Selected relevant information is stored on dedicated web-based platforms and disseminated. The information collected is then further filtered, verified, and analysed by end-users (i.e. national or international institutions and stakeholders involved in EI management).
One of the limitations of event-based biosurveillance is the difficulty of applying traditional epidemiological parameters (e.g. sensitivity, specificity, positive predictive value), due to the lack of accessible data on verified outbreaks (i.e. a gold standard). The use of Internet-based biosurveillance systems is still maturing and its assessment is ongoing [6,7]. Most of the available scientific literature focuses either on the assessment of biosurveillance system performance regarding the detection and adequate classification of health-related information from informal open sources, or on the presentation of innovative functionalities. An important question not yet addressed in the literature is the performance of biosurveillance systems relative to end-user needs and expectations.
This study aims to provide a quantitative evaluation of the performance of multiple biosurveillance systems compared to a gold-standard.

Epidemic intelligence in France
France is a medium-sized country made up of metropolitan France and eleven overseas territories scattered over Africa, America, and Oceania. French surveillance has traditionally focused on the detection of unusual health events occurring in the national territory [8,9]. In 2002, the International Department of the French Institute for Public Health Surveillance (InVS) developed EI to detect internationally emerging health threats that could affect the French population living in France and abroad [10]. The process was formalized into five steps: detection of informal disease reports (e.g. using biosurveillance systems), selection of disease events (through a set of defined criteria), validation of the event (through a network of contacts, available official information, etc.), analysis, and communication [11].
Events of interest to the InVS public health network are integrated into the weekly international epidemiological bulletin (BHI), published on the InVS website every Wednesday [12]. Only verified events are reported in the BHI, and events are usually reported only once. Updates are occasionally included, but only if major epidemiological changes occur.

Table 2. Definitions of the performance indicators.
Indicator | Abbrev. | Definition | Formula
Crude Detection Rate | C-DR | Ability of a system to detect confirmed infectious disease outbreaks | $\text{C-DR} = \frac{\text{C-DET}}{\text{C-DET} + \text{NDET}_1}$
Intrinsic Detection Rate | I-DR | Ability of a system to detect confirmed infectious disease outbreaks independently from other systems | $\text{I-DR} = \frac{\text{I-DET}}{\text{I-DET} + \text{NDET}_2}$
Crude Sensitivity Rate | C-Se | Ability of a system to detect confirmed infectious disease outbreaks prior to their publication in the BHI | $\text{C-Se} = \frac{\text{CTP}}{\text{CTP} + \text{FN}_1}$
Intrinsic Sensitivity Rate | I-Se | Ability of a system to detect confirmed infectious disease outbreaks independently from other systems and before their publication in the BHI | $\text{I-Se} = \frac{\text{ITP}}{\text{ITP} + \text{FN}_2}$
FN2: event not detected by the system before its reporting in the BHI, or not detected primarily by the system.

Definitions
Events. An event was defined as a verified infectious disease occurrence resulting from the EI process (i.e. including verification and analysis). Events were characterised by the disease, the type of cases (human or animal), the country, the province or state, and the month of occurrence, regardless of the number of cases concerned. All events were considered to be independent.
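To make the event definition concrete, the sketch below encodes an event as an immutable key built from the five attributes listed above (disease, type of cases, country, province or state, month of occurrence); the number of cases is deliberately left out of the key. The names `EventKey` and `event_key`, the lower-casing of disease names and the example records ("Country A") are hypothetical illustrations, not part of the InVS procedure.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class EventKey:
    """Hypothetical key identifying one event, following the definition above."""
    disease: str
    case_type: str      # "human" or "animal"
    country: str
    province: str       # province or state; empty string if unknown
    year_month: tuple   # (year, month) of occurrence


def event_key(disease: str, case_type: str, country: str,
              province: str, onset: date) -> EventKey:
    # The day of occurrence and the number of cases are ignored on purpose:
    # an event is identified at month level, whatever its size.
    return EventKey(disease.strip().lower(), case_type, country,
                    province, (onset.year, onset.month))


# Two reports from the same month and place map to the same event...
a = event_key("Dengue", "human", "Country A", "", date(2010, 8, 3))
b = event_key("dengue", "human", "Country A", "", date(2010, 8, 27))
# ...whereas a different month yields a different event.
c = event_key("dengue", "human", "Country A", "", date(2010, 9, 2))
print(a == b, a == c)  # prints: True False
```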
Signals. Signals were defined as unverified raw infectious disease information (in relation to an event included in the gold-standard data set) collected from biosurveillance systems (informal sources). Biosurveillance systems are therefore a source of signals, but are not the source of the events included in the BHI (i.e. the gold-standard data set).
Biosurveillance systems are used only for the detection of signals. Once selected signals are fully processed (i.e. systematically verified, analysed, characterised, etc.), they are classified either as an "event" or as "discarded". Although biosurveillance systems are the main sources of signals, in the EI process they are not the "source of the event".
Gold-Standard. Ideally, infectious disease signals identified by Internet biosurveillance methods should be compared to official event reports (gold-standard) covering the same time period. The EI performed at InVS, for which the necessary information (rationale, selection and validation processes, etc.) was readily accessible, was chosen as the best source of gold-standard data for this study.
Infectious disease events (human cases and epizootics of zoonotic diseases) reported in the BHI in 2010 were included in the gold-standard data set. Although A/H5N1 influenza has long been considered a health threat, it is subject to substantial under-reporting [13]; A/H5N1 cases were therefore excluded from the study.
Non-infectious disease events, which are not systematically monitored by all systems, were also excluded from our study.
Biosurveillance Systems. Six biosurveillance systems contributing to the Early Alerting and Reporting (EAR) project launched under the Global Health Security Initiative (GHSI) [14] were used to detect raw signals (informal reports of infectious disease): Argus, BioCaster, GPHIN, HealthMap, MedISys and ProMED (Table 1). Of those, three are fully moderated (i.e. include human analysts in the selection, sorting and/or translation processes), while the remaining three systems are based mainly on automated processes for detection, sorting, and translation (Table 1).
Data. Signals relating to events included in the gold-standard data set were retrospectively searched for on all six biosurveillance systems through ad hoc queries using a keyword or a combination of keywords. Searches were performed by two InVS epidemiologists, who were independent of the six biosurveillance systems. Discordant pairs were reviewed and the most relevant signal was kept in the final database.
Rates. The crude detection rate (C-DR) was defined as the ability of a system to detect an infectious disease event, and the intrinsic detection rate (I-DR) as the ability of a system to detect outbreaks independently from other systems. The crude sensitivity rate (C-Se) was defined as the ability of a system to detect outbreaks prior to their publication in the BHI, and the intrinsic sensitivity rate (I-Se) as the ability of a system to detect outbreaks independently from other systems and before their publication in the BHI (see Table 2 for details).
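Assuming, as the definitions in Table 2 suggest, that every denominator (C-DET + NDET1, I-DET + NDET2, CTP + FN1, ITP + FN2) equals the total number of gold-standard events, the four indicators reduce to simple proportions over the same event set. The sketch below assumes a hypothetical per-event coding for one system with three flags: detected at all, first signal before the BHI publication, and first signal taken from the system's own sources rather than from a fellow system ("primary"). The class, field names and example data are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class EventOutcome:
    """Hypothetical coding of one gold-standard event for one system."""
    detected: bool     # at least one signal found on the system
    before_bhi: bool   # first signal predates publication in the BHI
    primary: bool      # first signal came from the system's own sources


def performance(outcomes: list[EventOutcome]) -> dict[str, float]:
    # Assumption: each denominator in Table 2 equals the number of events.
    n = len(outcomes)
    c_det = sum(o.detected for o in outcomes)
    i_det = sum(o.detected and o.primary for o in outcomes)
    c_tp = sum(o.detected and o.before_bhi for o in outcomes)
    i_tp = sum(o.detected and o.before_bhi and o.primary for o in outcomes)
    return {"C-DR": c_det / n, "I-DR": i_det / n,
            "C-Se": c_tp / n, "I-Se": i_tp / n}


# Four hypothetical events for a single system.
example = [
    EventOutcome(detected=True, before_bhi=True, primary=True),
    EventOutcome(detected=True, before_bhi=True, primary=False),
    EventOutcome(detected=True, before_bhi=False, primary=True),
    EventOutcome(detected=False, before_bhi=False, primary=False),
]
print(performance(example))
# prints: {'C-DR': 0.75, 'I-DR': 0.5, 'C-Se': 0.5, 'I-Se': 0.25}
```

By construction, I-Se can never exceed C-Se or I-DR, since its numerator requires both conditions at once.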
Associated factors. Bivariable and multivariable modified Poisson regressions (i.e. Poisson regressions with robust error variance, well suited to estimating rate ratios) were used to assess the associated factors [15]. Rate ratios (RR), 95% confidence intervals (CI) and p values were computed to assess the strength of these associations. The same variables were included in the different regression models. Potential interactions were tested for significance at the 0.05 level. All analyses were performed using Stata 12.1 (StataCorp LP, USA).
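For the bivariable case with a single binary covariate (e.g. moderated versus automated system), the modified Poisson model has a closed form: the rate ratio is the ratio of the two proportions, and the robust (sandwich) variance of log RR is (1 - p1)/(n1 p1) + (1 - p0)/(n0 p0). The stdlib sketch below is only an illustration of that special case, not the Stata analysis used in the study; the multivariable models require a full GLM fit. It uses the uncorrected (HC0) sandwich, so values may differ slightly from packages that apply a small-sample correction. The function name and the counts in the example are hypothetical.

```python
import math


def modified_poisson_rr(events_exposed: int, n_exposed: int,
                        events_unexposed: int, n_unexposed: int,
                        z_crit: float = 1.959964):
    """Rate ratio, 95% CI and two-sided p value for one binary covariate.

    Poisson model with log link and robust (HC0 sandwich) variance; with a
    single binary covariate this gives RR = p1 / p0 and
    Var(log RR) = (1 - p1) / (n1 * p1) + (1 - p0) / (n0 * p0).
    Both groups must have at least one event.
    """
    p1 = events_exposed / n_exposed
    p0 = events_unexposed / n_unexposed
    rr = p1 / p0
    se = math.sqrt((1 - p1) / (n_exposed * p1)
                   + (1 - p0) / (n_unexposed * p0))
    log_rr = math.log(rr)
    ci = (math.exp(log_rr - z_crit * se), math.exp(log_rr + z_crit * se))
    z = log_rr / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    return rr, ci, p_value


# Hypothetical counts: 30/50 early primary detections vs 20/50.
rr, ci, p_val = modified_poisson_rr(30, 50, 20, 50)
print(f"RR = {rr:.2f}, 95% CI {ci[0]:.2f}-{ci[1]:.2f}, p = {p_val:.3f}")
```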

Gold-Standard
In 2010, 132 events were reported in the BHI. Of these, 46 (35%) were excluded (26 global overviews or long-lasting events, 12 follow-ups of previously reported events and 8 non-infectious health occurrences). The 86 events included in the gold-standard data set occurred in 46 countries: 23 (27%) in the Americas, 22 (25%) in Africa, 17 (20%) in Europe, 14 (16%) in Asia and 10 (12%) in the Near-East and North-Africa (Figure 1). Twenty-two different diseases were represented. West Nile virus (WNV) infection and dengue represented 21% (n = 18) and 14% (n = 12) of the events, respectively. Marked variations were observed according to the region of occurrence; e.g. WNV infection represented 76% of the events reported in Europe and 50% of those in North-Africa and the Near-East, but none of the events reported in the Americas or in Asia (Table 3).

Signals Detected
A total of 472 raw signals relating to events included in the gold-standard data set were retrieved from the six biosurveillance systems. The language of the initial source was English for 53% (n = 249), Spanish for 15% (n = 72), French for 11% (n = 53) and other languages for 20% (n = 94). The remaining 4 signals were detected through several sources in different languages, and information was insufficient to rank them. Language sources varied according to the systems (p = 0.063) and to the region of occurrence of the event (p<0.001). All 86 events included in the gold-standard data set were detected by at least one of the systems, and 57 (66%) were detected by all six systems. When early detection was considered, 84 (98%) events were detected before their publication in the BHI: 30 (35%) by all six systems, 26 (30%) by five systems, 10 (12%) by four systems, 7 (8%) by three systems, 6 (7%) by two systems and 5 (6%) by a single system, while 2 events (2%) were not detected prior to their publication in the BHI. Depending on the system, media sources accounted for 44% to 73% of the sources utilised and official sources for 6% to 32%. Raw signals originating from another system included in the study ranged from 1% to 49% (p<0.001) (Table 4).
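The breakdown of events by the number of systems detecting them early can be obtained by counting, for each event, the systems whose signal predated the BHI publication. In the sketch below, the function name, the input layout (event identifier mapped to the set of systems detecting it early) and the small example with placeholder system names are hypothetical; the `reported` dictionary only restates the counts given above as a consistency check.

```python
from collections import Counter


def early_detection_distribution(early_by_event: dict[str, set[str]],
                                 n_systems: int = 6) -> dict[int, int]:
    """Number of events detected early by exactly k systems, k = n_systems..0."""
    counts = Counter(len(systems) for systems in early_by_event.values())
    return {k: counts.get(k, 0) for k in range(n_systems, -1, -1)}


# Hypothetical input with placeholder system names.
example = {
    "e1": {"S1", "S2", "S3", "S4", "S5", "S6"},
    "e2": {"S1", "S3"},
    "e3": set(),          # not detected before the BHI publication
    "e4": {"S2"},
}
dist = early_detection_distribution(example)

# Counts reported in the text, keyed by number of systems detecting early.
reported = {6: 30, 5: 26, 4: 10, 3: 7, 2: 6, 1: 5, 0: 2}
total = sum(reported.values())  # 86 gold-standard events
shares = [round(100 * v / total) for v in reported.values()]
print(dist, total, shares)
```

Rounding the reported counts over 86 events reproduces the percentages quoted in the text (35, 30, 12, 8, 7, 6 and 2%).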

Factors associated with Crude or Intrinsic detection
In the bivariable regression analysis, individual systems, moderation of the system, languages, regions of occurrence, types of disease, and types of case were significantly associated with system performance (Table 6). No significant differences in C-DR were observed across or within systems; I-DR and I-Se varied significantly across and within systems; one system (HealthMap) showed a significantly higher C-Se than the moderated systems, but the overall difference across systems was not significant (p = 0.13) (Table 5).
From the multivariable Poisson regression models, no statistical difference in C-Se was observed after adjustment for individual systems, moderation, languages, types of diseases, regions of occurrence, and type of cases. Conversely, systems, moderation, languages, regions of occurrence, and types of disease were all significantly associated with I-Se (Table 7). Potential interactions were tested, and none were found significant at the 0.05 level.

Discussion
The systems' characteristics (type of moderation, sources accessed, diseases, languages, and regions covered) were found to significantly influence disease detection performance. This highlights the differences in the conceptual designs used to develop the biosurveillance systems, and the importance of taking advantage of synergies by combining the systems' data for infectious disease detection.
C-DR was used to evaluate the global detection rate independently from the EI quality and type. With C-DR values ranging from 83% to 95%, all systems were found to have a similar ability to detect infectious disease events, a finding consistent with other studies [23][24][25][26][27][28][29][30][31][32][33][34]. However, C-DR is a very crude indicator that does not take into consideration the main EI objective of early detection.
C-Se provides a better estimation of the systems' ability to detect infectious disease outbreaks early, within the framework defined by the chosen gold-standard. Although biosurveillance systems were originally designed to detect relevant information through informal sources, they now include a noteworthy proportion of early released official information. C-Se of early published official information was 27% lower than C-Se of media sources (p = 0.001), underlining the usefulness of media sources in the detection of communicable disease outbreaks. Despite their different conceptual designs, notably the types of sources used, all systems demonstrated remarkably similar early detection capacities, as C-Se remained comparable even after adjusting for other variables (p = 0.13). A high level of cross-feeding (i.e. a system using a fellow system as a source) was documented, further emphasising the synergistic qualities of the systems. For automated systems, 39% of early detected signals were collected from another system, versus 1% for moderated systems (p<0.001).
I-Se provides an appropriate proxy to assess a system's own detection rate. Individually (data not shown) and collectively (Table 6), moderated systems were found to have a 53% higher I-Se (p<0.001) than automated systems after adjustment for the other variables included in the model. This increased I-Se of moderated systems can be attributed directly to their common characteristic: human moderation. The difficulties the systems face in developing an efficient algorithm covering the different facets of a single disease have been demonstrated by a previous study [7]. Hence, our study illustrates the significant added value of human analysts and their ability to compensate for the limits inherent to fully automated detection. Yet, I-Se should be considered a lower bound, because for each system only the first detected signal was considered. Therefore, it cannot be excluded that some signals primarily collected via a fellow system may have later been detected through another source, but still earlier than the gold-standard. All systems integrate a de-duplication module aimed at reducing the volume of redundant information, and de-duplication performance varies across systems. A consistent collection of the second detected signals was not possible across all systems, and the magnitude of this potential bias cannot be estimated.
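The first-signal rule that makes I-Se a lower bound can be sketched as follows: for a given system and event, the earliest signal on that system is kept, and the event counts towards I-Se only if that signal predates the BHI publication and was not taken from another system. The `Signal` record, its `origin_system` field and the example dates are hypothetical assumptions. Because later signals are ignored, system A below is scored as non-primary even though its own sources reported the event two days later, still before the BHI.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Signal:
    system: str                   # system on which the signal was found
    published: date               # date the signal appeared on that system
    origin_system: Optional[str]  # fellow system it was taken from, or None


def counts_for_i_se(signals: list[Signal], system: str, bhi_date: date) -> bool:
    """True if the system's FIRST signal is early and from its own sources."""
    own = [s for s in signals if s.system == system]
    if not own:
        return False  # event not detected by this system
    first = min(own, key=lambda s: s.published)
    # Later signals are ignored, hence the lower-bound behaviour.
    return first.published < bhi_date and first.origin_system is None


# One hypothetical event published in the BHI on 9 June 2010.
sigs = [
    Signal("A", date(2010, 6, 1), "B"),   # A first picks it up via B
    Signal("A", date(2010, 6, 3), None),  # A's own source, two days later
    Signal("B", date(2010, 5, 30), None),
]
bhi = date(2010, 6, 9)
print([counts_for_i_se(sigs, s, bhi) for s in ("A", "B", "C")])
```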
At the time of the study, none of the systems was able to detect all events included in the gold-standard data set early, which substantiates the need for end-users to use several systems in parallel. The purpose of cross-feeding is to increase sensitivity by utilising all available pieces of information. However, cross-feeding becomes a drawback when several systems are used in parallel. A previous study documented that major EI stakeholders routinely accessed four to seven different systems for event detection [7]. In this context, cross-feeding generates a substantial level of duplication for end-users. In the current study, 43% of signals detected by automated systems would already have been seen on a fellow system. This stresses the importance of developing a common tool that would combine the systems' outputs and specificities while reducing duplication.
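A common tool of this kind would, at a minimum, merge the outputs of several systems, group signals referring to the same event, and display each event once. The sketch below assumes hypothetical records of the form (system, event identifier, date), keeps the earliest signal per event, and reports the share of signals an end-user scanning all feeds would receive as repeats. This simple redundancy measure is an illustration only and is not how the 43% figure above was computed.

```python
from collections import defaultdict
from datetime import date


def merge_feeds(signals: list[tuple[str, str, date]]):
    """Merge (system, event_id, date) signals coming from several systems.

    Returns the earliest (date, system) pair per event (ties broken by system
    name) and the share of signals that repeat an already listed event.
    """
    by_event: dict[str, list[tuple[date, str]]] = defaultdict(list)
    for system, event_id, published in signals:
        by_event[event_id].append((published, system))
    earliest = {event: min(items) for event, items in by_event.items()}
    redundant = len(signals) - len(by_event)  # all but one signal per event
    return earliest, redundant / len(signals)


# Hypothetical signals from three placeholder systems.
feed = [
    ("A", "e1", date(2010, 3, 1)),
    ("B", "e1", date(2010, 3, 2)),
    ("C", "e1", date(2010, 3, 2)),
    ("B", "e2", date(2010, 4, 5)),
]
earliest, share = merge_feeds(feed)
print(earliest, share)
```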
Overall, 97% of detected signals were published in seven languages (English, Spanish, French, Russian, Portuguese, Arabic and Chinese), a finding consistent with another study [16]. These languages were the first to be integrated by the systems, and as such the linguistic methodology (i.e. ontology) might be better developed for them than for languages incorporated more recently. Signals in all languages incorporated in the systems were systematically considered. Yet, it cannot be formally excluded that it might have been easier for analysts to detect information published in English, French, Spanish, or Portuguese than information published in languages requiring systematic translation (e.g. Arabic, Chinese) or rarely used. Despite this potential bias, the results underline the importance of multilingual tools.
The lack of recognised and consistently available reference sources across diseases and regions represents a major challenge to the evaluation of EI and biosurveillance systems [17,18]. The choice of the BHI as the gold-standard for this study might have affected the results. The disease and the location of occurrence are among the selection criteria used for EI at InVS, and events occurring in an area close to a French territory were more likely to be retained. Arboviruses represent a risk of exportation, especially to overseas territories where competent vectors are present [19], as illustrated in 2006 by the chikungunya virus outbreak, which affected over one third of the population of both Reunion and Mayotte islands [20]. For France, the circulation of major arthropod-borne infections in previously non-endemic areas is perceived as a threat to the blood supply and is therefore a topic of high interest. This may partially explain both the high proportion (64%) of vector-borne diseases among reported events and the specific attention paid to those events.
For instance, 2010 was marked by unprecedented WNV circulation in the Mediterranean area [21], and all 18 WNV infection events reported in 2010 occurred in Mediterranean countries not previously considered endemic. Hence, other EI stakeholders, with different objectives or disease distributions, would certainly select different events, resulting in a different gold-standard data set. The performance of the EI process might also affect the results. Unfortunately, in the absence of a recognised international gold-standard, the sensitivity of the chosen gold-standard could not be assessed. This assessment should be carried out, but was beyond the current study's objectives. Although these results are not fully representative, France's large geographic distribution (spread over four continents) and its systematic and stable approach to EI suggest that the results of this study provide a broader overview that can be transposed to other contexts. Other studies using other gold-standards should be implemented to better assess and ensure the generalizability of the results.
The number of events included in the gold-standard was too limited to allow stratified analysis and may have masked potential associations. For example, it is likely that some systems have developed specific competences in certain languages, geographic regions, or diseases, but this aspect could not be explored. Because biosurveillance systems are constantly evolving, a short time frame was chosen to limit intra-system variability; an extended study period generating more disease events could be considered for a future study.
The retrospective search for information might have influenced the results. In a previous study, authors found that prospective detection rates were 17% lower than the retrospective sensitivity rate [7]. However, this bias applies equally to all systems and does not affect the overall findings. Similarly, in this study no significant difference (p = 0.53) was found between the two InVS epidemiologists, suggesting that user bias was limited.
In conclusion, infectious diseases, environmental issues, and potential bioterrorist threats will continue to pose major risks to global health security, and epidemic intelligence is now an essential component of early warning systems. Overall, the systems' disease detection capabilities are complementary (synergistic), with demonstrated timeliness and sensitivity [7]. The outputs of these systems, as well as the expertise of the public health institutions responsible for EI, should be pooled for optimal early detection. Internet biosurveillance systems have evolved substantially; sufficient data are now available to implement robust validation studies using epidemiological approaches against an official comparison data set. Moreover, larger-scope studies should be implemented that would prospectively involve major stakeholders, increase the number of epidemiologists involved, and enable the implementation of innovative strategies to pool the expertise developed by the different systems. In particular, a more robust composite gold-standard that pools information and expertise from national and international institutions in charge of EI activities should be developed.