Diagnostic accuracy of the WHO clinical definitions for dengue and implications for surveillance: A systematic review and meta-analysis

Background
Dengue is the world’s most common mosquito-borne virus but remains diagnostically challenging due to its nonspecific presentation. Access to laboratory confirmation is limited and thus most reported figures are based on clinical diagnosis alone, the accuracy of which is uncertain. This systematic review assesses the diagnostic accuracy of the traditional (1997) and revised (2009) WHO clinical case definitions for dengue fever, the basis for most national guidelines.

Methodology/Principal findings
PubMed, EMBASE, Scopus, OpenGrey, and the annual Dengue Bulletin were searched for studies assessing the diagnostic accuracy of the unmodified clinical criteria. Two reviewers (NR/SL) independently assessed eligibility, extracted data, and evaluated risk of bias using a modified QUADAS-2. Additional records were found by citation network analysis. A meta-analysis was done using a bivariate mixed-effects regression model. Studies that modified criteria were analysed separately. This systematic review protocol was registered on PROSPERO (CRD42020165998). We identified 11 and 12 datasets assessing the 1997 and 2009 definition, respectively, and 6 using modified criteria. Sensitivity was 93% (95% CI: 77–98) and 93% (95% CI: 86–96) for the 1997 and 2009 definitions, respectively. Specificity was 29% (95% CI: 8–65) and 31% (95% CI: 18–48) for the 1997 and 2009 definitions, respectively. Diagnostic performance suffered at the extremes of age. No modification significantly improved accuracy.

Conclusions/Significance
Diagnostic accuracy of clinical criteria is poor, with significant implications for surveillance and public health responses for dengue control. As the basis for most reported figures, this has relevance to policymakers planning resource allocation and researchers modelling transmission, particularly during COVID-19.


Introduction
Dengue is the most common mosquito-borne virus worldwide, with an estimated 390 million annual infections globally (last calculated in 2010) [1]. Although the majority of infections are asymptomatic, they likely contribute to viral transmission [1], similar to the ongoing COVID-19 pandemic. As healthcare systems deal with COVID-19, many countries in Latin America and Asia are reporting an increase in dengue cases [2,3], raising concerns of a 'double epidemic' that could overwhelm fragile health systems. As clearly evidenced by COVID-19, the global importance of local disease control cannot be overstated, and it is therefore essential that the current pandemic does not lead to setbacks in dengue control [4]. However, that is only possible if accurate transmission data are available, which is not the case for dengue.
Dengue lacks the robust standardisation of WHO reporting found in other infections such as malaria. Aside from high levels of underreporting [5], the diagnostic accuracy of reported cases remains unclear. Despite recent developments in dengue diagnostics, there is significant variation in accuracy between different tests and different assays of the same test [6]. Access to testing is limited and not mandated in many dengue-endemic countries [7]. Consequently, confirmatory testing is often not done, with only 43% of cases reported to the Pan-American Health Organisation in 2019 being laboratory-confirmed [8]. Equivalent reports could not be found for the Western Pacific and Southeast Asia, although research studies have found low confirmation rates in these regions [9,10]. Thus, most reported cases are likely to be based solely on clinical diagnosis, the accuracy of which has not been formally studied.
Guidelines for the clinical diagnosis of dengue were published by the WHO in 1997 and 2009. The 1997 ('traditional') definition classifies cases into dengue fever (DF), dengue haemorrhagic fever (DHF), and dengue shock syndrome (DSS) [11]; while the 2009 ('revised') definition classifies cases into dengue and severe dengue [12]. In both guidelines, laboratory confirmation is not necessary to diagnose 'probable' dengue in endemic locations.
The development of the WHO case classification has been reviewed elsewhere [13], and whilst methodologically robust, the aim was to improve early prediction of severe disease, rather than distinguish dengue from non-dengue febrile illnesses. Thus, most studies have focused on the guidelines' prognostic value. In this systematic review, we assess the diagnostic performance of the 1997 and 2009 WHO clinical definitions of 'probable dengue' in febrile patients and discuss the implications for surveillance and control.

Methods
The protocol was registered on PROSPERO on 27/01/2020 (CRD42020165998). The PICOS statement is outlined in Table 1.

Eligibility criteria for studies
Study design and participants. Studies comparing the WHO diagnostic criteria to a suitable reference standard (see below) in patients with unexplained fever were included. There were no limitations on demographics, fever duration, healthcare setting, or geographical region. Studies were excluded if they only recruited confirmed or suspected dengue patients or excluded any dengue serotypes.
Index test and reference standard. The index tests were the 1997 [11] and 2009 [12] WHO clinical definitions for dengue. Studies applying either definition without modification (Table 2) were included. Studies that modified the WHO criteria were analysed separately to determine what effect this had. With no accepted reference standard for dengue, any of the following, as per WHO guidance [12], were acceptable: IgM or IgG serology, plaque reduction neutralisation test or hemagglutination inhibition, NS1 antigen/antibody test, (RT-)PCR, or virus isolation.

Search methodology
PubMed, EMBASE, Scopus, and OpenGrey were searched using the strings outlined in S1 Table. Records published from 1997 to the last search on 19/01/2020 were included, with no restrictions on type of publication or language.
Search results were pooled and duplicates removed using EndNote X9 (Clarivate Analytics, USA). Abstracts of all articles and short notes in the annual Dengue Bulletin (published by WHO SEARO) from 1997 to 2014 (last available volume) were also included. Titles and abstracts were independently screened by two reviewers (NR and SL). This was repeated for eligible full-text articles, with the reason for exclusion recorded. Any disagreements were resolved by a third reviewer (RJM). Authors of conference abstracts were contacted to identify related peer-reviewed publications. Finally, all articles citing (from The Web of Knowledge) and cited by (from reference lists) included studies were screened. For articles not available on Web of Knowledge, Google Scholar was used. This was repeated until no more studies were identified.

Analysis
Risk of bias was assessed by two independent reviewers (NR and SL) using a modified QUADAS-2 tool (S2 Table) [14]. Study information and 2×2 table data (principal summary measure) were extracted by one reviewer and verified by a second reviewer (NR and SL). Any disagreements were resolved by a third reviewer (RJM). Authors were contacted for missing information, and if no response was received within 3 weeks this was repeated. If no response was subsequently received, it was recorded as not specified. Further detail can be found in S1 File.
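To illustrate how the extracted 2×2 table data relate to the accuracy measures reported here, the following minimal Python sketch computes sensitivity and specificity for a single study. The class name `TwoByTwo` and the counts are hypothetical and are not taken from any included study; the pooled estimates in this review came from a bivariate model, not from this per-study calculation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from one study's 2x2 table (index test vs reference standard)."""
    tp: int  # clinical definition positive, reference standard positive
    fp: int  # clinical definition positive, reference standard negative
    fn: int  # clinical definition negative, reference standard positive
    tn: int  # clinical definition negative, reference standard negative

    def sensitivity(self) -> float:
        # Proportion of reference-positive patients who met the definition
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of reference-negative patients who did not meet the definition
        return self.tn / (self.tn + self.fp)


# Hypothetical counts for illustration only
example = TwoByTwo(tp=90, fp=60, fn=10, tn=40)
print(f"sensitivity={example.sensitivity():.2f}, specificity={example.specificity():.2f}")
```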
Meta-analysis for sensitivity, specificity, and likelihood ratios for both definitions was done using the MIDAS statistical package [15] on Stata/IC 14 (College Station, TX, USA). This uses a bivariate mixed-effects regression framework to calculate average sensitivity and specificity. Deeks' funnel plot asymmetry test was used to detect publication bias for both meta-analyses.
A forest plot for sensitivity, specificity, and corresponding 95% confidence intervals was obtained. Only studies using unmodified WHO criteria were included in the meta-analysis. Heterogeneity was assessed using the I² and Chi-square statistics. A separate analysis was carried out excluding studies at high risk of bias.
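For readers unfamiliar with the I² statistic, the sketch below shows the standard calculation from Cochran's Q under inverse-variance weighting. It is an illustration only: the function name `i_squared` and the input values are hypothetical, and the exact computation performed by the MIDAS package is not described in this paper and may differ.

```python
from typing import Sequence


def i_squared(estimates: Sequence[float], variances: Sequence[float]) -> float:
    """Return I^2 (as a percentage) from study estimates and their variances."""
    if len(estimates) != len(variances) or len(estimates) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    # Cochran's Q: weighted sum of squared deviations from the pooled estimate
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1
    if q <= 0:
        return 0.0
    # I^2 is the share of total variation beyond that expected by chance
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical study estimates (e.g. on a logit scale) with equal variances
print(i_squared([0.0, 1.0, 2.0], [0.1, 0.1, 0.1]))
```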

Search results
The original search identified 1471 records. One additional record was included, identified from previous work but not found by the search as it did not mention WHO criteria in the title or abstract. After duplicates were removed, 1088 records remained. Dengue Bulletin provided 340 additional records; the 2005 and 2006 volumes could not be found online and were not screened.
A total of 119 full-text articles were assessed for eligibility, of which 16 were included. Citation analysis identified 5 additional records. In total, 21 records were included in the qualitative analysis, and 15 records using unmodified WHO criteria in the meta-analysis (Fig 1).

Fig 1. PubMed, EMBASE, Scopus, and OpenGrey were searched for articles assessing the diagnostic accuracy of clinical criteria for dengue diagnosis; Dengue Bulletin articles and short notes were also included. The additional record was identified from previous work but did not mention WHO criteria in the title or abstract. Citation network analysis used Web of Science, Google Scholar, and reference lists. Articles assessing the diagnostic accuracy of unmodified WHO clinical criteria (1997 or 2009) for dengue were included in the meta-analysis; articles using modified criteria were included in qualitative analysis only. https://doi.org/10.1371/journal.pntd.0009359.g001

PLOS NEGLECTED TROPICAL DISEASES
The diagnostic accuracy of WHO dengue clinical case definitions

Study characteristics
Study characteristics and patient flow are summarised in Tables 3 and 4, respectively. Three records were conference abstracts; the remaining 18 came from peer-reviewed journals. Two records [16,17] presented findings from independent studies in the same publication; these were thereafter treated as separate, so that the final analysis contained 23 separate datasets. Five of the 23 studies were retrospective. Eleven studies were in Asia, 6 in South America, 3 in Europe (returning travellers), 2 in Central America, and 1 in Africa. Overall, there were 11 datasets comprising 10,355 patients assessing the traditional (1997) definition, and 12 datasets comprising 9,421 patients assessing the revised (2009) definition, with 6 assessing both definitions. The most common modification to WHO definitions was not using the tourniquet test, in 3 out of 6 modified studies.

Risk of bias
Risk of bias analysis for included studies is presented in Fig 2. The most common methodological flaw, in 12 out of 23 studies, was the use of an unreliable reference standard (e.g. unpaired IgM serology). The anticipated impact of the bias on each study's estimated sensitivity and specificity, along with the rationale for this choice, is provided in S3 Table.
Overall, our meta-analysis gave similar results for the two definitions' diagnostic accuracy, with high sensitivity and low specificity for both. This echoes studies that assessed both definitions, with two finding no difference [20,26], three finding a higher sensitivity and lower specificity in the 2009 definition [16,17,33], and two finding the opposite [16,34]. However, there was significant heterogeneity between studies for both definitions (Figs 3 and 4), as reflected in the wide range of reported values, high I² (97–100%), and statistically significant Chi-squared tests (p<0.0001), even when high-risk studies were excluded.

Modified criteria
Results from studies using modified criteria are shown in Tables 5 and S6. Diagnostic accuracy for all modifications was similar to the corresponding WHO case definition. Studies that improved [35] or worsened [32,34] both sensitivity and specificity showed high risk of bias in more than one domain (Fig 2) and should be interpreted with caution. Removing the tourniquet test reduced specificity in two studies [33,34], consistent with its association with dengue (see below), although it increased specificity in another study [36].

Effects of age on diagnostic accuracy
The sensitivity of both definitions was halved in patients under 4 years presenting in the community. The reduction was less marked for hospital presentations: approximately 10% for the 1997 definition and 2% for the 2009 definition [16]. This could be due to children's inability to report symptoms such as retro-orbital pain, myalgia, and arthralgia. In theory, the 2009 definition (which combines them as 'aches and pains') should overcome this but does not appear to do so in practice. In both community and hospital settings, this fall in sensitivity was accompanied by an increase in specificity, again less marked in hospital settings [16]. At the other extreme of age, the frequency of many symptoms associated with dengue fever, such as retro-orbital pain and mucosal bleeding, decreased with increasing age, particularly over 56 years. This led to decreasing sensitivity of both definitions in older adults [33].
Dengue may present differently in adults and children. Children (but not adults) with dengue were more likely to have sore throat, fatigue, oliguria, and elevated haematocrit and transaminases compared to children with other febrile illnesses. Conversely, adults were more likely to have joint pain [36].

Discussion
In this review, we have pooled evidence from multiple regions assessing the accuracy of the 1997 and 2009 WHO clinical definitions for diagnosing dengue fever. We have shown that both definitions have high sensitivity (93%) but poor specificity (29% and 31%). No modification improved accuracy. This makes the definitions useful rule-out criteria but unreliable as the basis for diagnosis, which is concerning given they are often used as such [8][9][10].
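The rule-out property can be made concrete with likelihood ratios. The sketch below derives them directly from the pooled point estimates of sensitivity and specificity reported above; this is a simplification, since the likelihood ratios estimated by the bivariate model may differ slightly from values computed this way, and the function name `likelihood_ratios` is hypothetical.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    lr_positive = sensitivity / (1.0 - specificity)  # how much a positive result raises the odds
    lr_negative = (1.0 - sensitivity) / specificity  # how much a negative result lowers the odds
    return lr_positive, lr_negative


# Pooled point estimates from this review (confidence intervals ignored here)
pooled = {"1997": (0.93, 0.29), "2009": (0.93, 0.31)}
for definition, (sens, spec) in pooled.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{definition}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

A positive likelihood ratio close to 1 means that meeting the definition barely changes the odds of dengue, whereas a negative likelihood ratio well below 1 means that failing to meet it lowers those odds considerably.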
Clinical presentation varied with age, with diagnostic accuracy suffering at the extremes of age. As the average age of dengue cases increases, case definitions developed from paediatric studies [39] will no longer be sufficient. Overall, our findings highlight the need for an urgent reassessment of these guidelines.

Reasons for exclusion (if applicable)
Pitisuttithum 2015 [28] All patients with dengue recorded as an adverse event/severe adverse event.
All individuals without a dengue diagnosis and a severe adverse event using a preferred term that corresponded to the system organ class "Infections and infestations" or idiopathic fever (pyrexia) occurring between June and September (missed dengue cases)

Two outliers (both assessing the 1997 definition) displayed the reverse pattern with high specificity and low sensitivity [24,29]. Interestingly, they both employed an active surveillance study design that monitored a community cohort for febrile illness. Given the high expansion factors associated with dengue fever [5], healthcare systems have a low sensitivity for detecting dengue cases, and thus patients presenting to health services (the majority of included studies) may not be representative of dengue cases overall, which could explain why the case definitions perform so differently in a more representative population. However, the only other included study that used a prospective fever design found a low specificity and high sensitivity [16], so caution is needed in interpreting the conclusions from these two outliers as they were both performed consecutively in the same centres and another confounding factor may be contributing. In addition, while not representative of all dengue cases, patients presenting to healthcare services are more representative of what frontline clinicians see daily. Further research is therefore warranted to better understand this differential performance of the case definitions, as it may have opposing implications for public health surveillance and clinical practice.
Seshan et al. and Lagi et al. also found a better specificity than expected (63%), but both used single samples for IgM/IgG serology as the reference standard, which would lead to more false positives in the reference standard, thus overestimating specificity of the index test (clinical diagnosis) [21,30]. Pitisuttithum et al. also found a higher specificity (74%), which could be due to their case-control study design which may not capture all febrile presentations like the other studies [28].

Surveillance implications of inaccurate clinical diagnosis
While underreporting remains a major issue for dengue [1,5], given the low specificity of the clinical definitions it is highly likely that non-dengue viral illness is also being misreported as dengue. This makes it difficult to assess the burden and spread of dengue across regions, particularly during outbreaks. While dengue is the most common cause of acute febrile illness in Southeast Asia [38] and Latin America [40], other causative agents include the arboviruses Chikungunya [24,37,38,40] and Zika [40]; respiratory viruses (e.g. influenza) [24,33,37]; and bacteria such as rickettsia and leptospirosis [38,40]. The co-circulation of multiple pathogens causing similar clinical pictures is uncontroversial, and, as evidenced by our findings, not what the clinical definitions were developed to handle. This poses an issue to public health policy, surveillance, and response measures. A large number of (false-positive) dengue referrals to tertiary care may overwhelm healthcare systems, particularly during 'outbreaks' [23]. Chikungunya and Zika share the same vector and thus may be amenable to the same control measures. However, the inability to determine which Aedes-borne virus is responsible for a particular case cluster makes it difficult to assess the introduction of novel viruses to an area and trigger early responses.
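The scale of potential misreporting depends on how common dengue is among febrile patients. The sketch below applies Bayes' theorem to the pooled sensitivity and specificity of the 1997 definition to estimate the positive predictive value at several prevalences. The prevalence values are hypothetical and chosen only for illustration; the function name `positive_predictive_value` is likewise an assumption.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Probability that a patient meeting the definition truly has dengue."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Pooled point estimates for the 1997 definition from this review
SENSITIVITY, SPECIFICITY = 0.93, 0.29

# Hypothetical prevalences of dengue among febrile patients
for prevalence in (0.05, 0.20, 0.50):
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}")
```

Under these assumptions, when dengue accounts for one in five febrile presentations, roughly three quarters of patients meeting the definition would not have dengue.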
With the licensing of the new dengue vaccine, governments need to prioritise areas where vaccine introduction will have the most impact and thereafter measure its efficacy. This is made exceedingly difficult if they cannot ascertain which pathogen is primarily responsible for a region's disease burden. In resource-limited settings, this uncertainty, at the level of both the individual patient and the surveillance system, carries significant opportunity cost for which treatments, control measures, and vaccines to prioritise.

Impact of COVID-19
These issues have only increased in importance during the COVID-19 pandemic. There are case reports of COVID-19 being misdiagnosed as dengue [41], including one in which COVID-19 presented atypically with a rash [42] (a relatively specific feature of dengue). The studies included in this systematic review predate the COVID-19 pandemic, making it difficult to draw direct conclusions on the effect COVID-19 has on clinical dengue diagnosis. However, the overlap in syndromic definitions [43] and the prevalence of cough and respiratory symptoms in over a third of dengue patients [25,36,37] make it likely that dengue is also being misdiagnosed as COVID-19 [4]. In particular, the association of taste disorders (a cardinal symptom of COVID-19 [43]) with dengue [25,33] warrants further investigation.
Misclassification of COVID-19 as dengue and vice versa has a profound impact on public health responses due to the very different control measures. Control of dengue relies on control of mosquito vectors or reducing human-vector contact. This generally relies heavily on visits to households, workplaces, schools and other mosquito breeding sites for environmental management, and application of chemical and/or biological measures [12]. This is in stark opposition to COVID-19 control which relies on lockdown measures including restrictions on travel and social interaction. Dengue control has thus been negatively affected during the pandemic [2]. Abandoned buildings (e.g. due to school closures) and lack of maintenance of public spaces can contribute to increases in mosquito populations [3]. As countries report rises in both dengue and COVID transmission [2,3], governments need accurate transmission figures (and hence clinician access to rapid and accurate diagnostics) to prioritise region-specific control measures [4].

Implications for diagnostic testing
Mandating confirmatory testing may increase the specificity of dengue diagnosis. However, despite recent developments in diagnostics (including the highly accurate NS1 antigen detection tests) [6], rapid and accurate laboratory confirmation remains inaccessible in most dengue-endemic regions. Furthermore, cheaper tests such as IgM and IgG serology are likely to become less useful as dengue vaccination programs are rolled out; their already (relatively) low specificity has been demonstrated to fall in vaccinated individuals [44].
Once again, this is exacerbated by COVID-19, with case reports of false-positive dengue serology in COVID-19 [41] and a study in a dengue non-endemic area showing 22% false dengue positivity amongst COVID patients (albeit in a small sample) [45]. These findings suggest that the need for better clinical guidance (or cheaper diagnostics) is likely to become increasingly urgent as dengue serology, the most common and accessible laboratory test, becomes less informative.

Possible improvements to the clinical case definitions
One possibility is to use the absence of features more strongly associated with other aetiologies as supporting criteria [17]. For example, the absence of cough, lung crackles [36], and backache [23] were found to be significantly associated with dengue. However, while less common, they are still present in a significant proportion of dengue cases, and therefore their absence can only be a supporting sign.
Another possibility could be prioritising symptoms within case definitions, perhaps by splitting into 'major' and 'minor' criteria, so that symptoms more strongly associated with dengue, such as leukopenia or thrombocytopenia, carry more weight in making the diagnosis. As this was not the goal of this systematic review, further research is needed to better identify these symptom constellations.
Similar to guidance for laboratory testing [12], the case definition could be modified so that symptom criteria vary depending on timing within the illness course, where symptom associations are known to differ [37]. For example, platelet count, while reduced in dengue patients, may be normal at first, making thrombocytopenia more informative later in the illness [33,37]. Another study found that headache, myalgia, and retro-orbital pain were more sensitive earlier on, whilst rash was the opposite [32]. Modifying the definition at different timepoints may therefore improve accuracy, although current findings demonstrate inconsistencies and further research and/or systematic review is necessary.
Test positivity also varies over time with different windows for detectable PCR, NS1, IgM and IgG [12]. Some studies took this into account in deciding which reference standard to use, as outlined in Table 3. Differences in test timing may contribute to some of the variability in sensitivity/specificity found between studies. However, as most included studies did not state whether the choice of reference standard varied depending on illness duration, the impact of this could not be adequately assessed. This is a potential topic for future research.
Nonetheless, any clinical definition will remain imperfect given the variable and nonspecific presentation of dengue. Alternatively, modified case definitions could guide the allocation of limited testing resources rather than diagnosis. Specificity increases when more criteria are required (e.g. 5 instead of 2) [16,19,23], increasing diagnostic certainty. Thus, as dengue can effectively be ruled out in patients not fulfilling the criteria (due to the high sensitivity), and is highly likely in those with multiple symptoms, laboratory confirmation can be reserved for those patients with only 2 or 3 symptoms where uncertainty is greatest and testing will be most informative [23].
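The allocation strategy described above can be sketched as a simple decision rule. This is an illustration of the logic only and not clinical guidance: the function name `triage`, the category labels, and the default thresholds are assumptions derived from the "2 or 3 symptoms" example in the text, and the sketch ignores the other elements of the WHO definitions (such as fever and residence in or travel to an endemic area).

```python
def triage(criteria_met: int, rule_in_threshold: int = 4) -> str:
    """Suggest an action from the number of clinical criteria a febrile patient meets.

    Fewer than 2 criteria: the definition is not fulfilled, so dengue is unlikely
    (high sensitivity). From 2 up to rule_in_threshold - 1: uncertainty is greatest,
    so laboratory confirmation is most informative. At or above rule_in_threshold:
    dengue is highly likely (specificity rises as more criteria are required).
    """
    if criteria_met < 0:
        raise ValueError("criteria_met must be non-negative")
    if criteria_met < 2:
        return "dengue unlikely"
    if criteria_met < rule_in_threshold:
        return "laboratory confirmation"
    return "probable dengue"


# Hypothetical patients meeting 0 to 5 criteria
for n in range(6):
    print(n, triage(n))
```

Making the rule-in threshold a parameter reflects that the number of criteria at which testing stops being informative would need to be established empirically for each setting.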
Finally, local guidelines or electronic decision support tools could incorporate epidemiological information about circulating pathogens to prioritise symptoms and signs that would be most discriminatory for the region's differentials [4,46]. As clinician diagnosis at both admission and discharge was more specific (but less sensitive) than WHO criteria [36], experienced clinicians may already be taking such information into account, and this is a potential avenue for further research.

Strengths and limitations
Our study conformed to PRISMA guidelines (S7 Table) and was methodologically robust. By using two independent reviewers, researcher bias was mitigated at every stage of analysis. By searching multiple databases (including grey literature) and carrying out a thorough citation analysis we believe we have captured most, if not all, the available literature on the diagnostic accuracy of dengue case definitions. The inclusion of studies from multiple regions increases the generalizability of our findings. Only one eligible study from Africa could be found. This may be due to a lack of dengue or a lack of dengue research in this region, which could itself be a result of underrecognition and underdiagnosis.
The main limitation was the significant heterogeneity (in methods and results) of included studies and the high risk of bias. This is likely due to the use of different reference standards between studies. As diagnostic accuracy varies between and within confirmatory tests [6], and no test is perfect, this would introduce significant bias to results (especially when IgM or IgG serology alone were used for confirmation). Furthermore, the different spectra of illness presenting in different healthcare settings and age groups may also contribute to heterogeneity in clinical case definition performance.
However, except for two outliers, studies across different regions, healthcare settings, and patient ages demonstrated relatively high sensitivity and poor specificity. While the summary values should be used with caution, the need for urgent improvement in dengue diagnostic guidance and reporting practice is clear.
Nonetheless, it is worth noting that, unlike the studies included in this systematic review, frontline clinicians may not apply WHO criteria strictly without also considering contextual epidemiology (such as a local outbreak). The effect of this on the accuracy of clinical diagnosis (and subsequent reporting of global cases) remains unclear. Accuracy may improve if clinicians correctly dismiss cases that fulfil the WHO criteria when other circulating pathogens are more common (out of 'dengue season'), but the same reasoning may also lead to self-fulfilling prophecies of dengue outbreaks due to the nonspecific nature of the case definitions. This overdiagnosis could be offset by clinicians being too busy during outbreaks to report all cases, which may explain why studies do not find evidence of over- or underreporting during outbreaks. Further research would be helpful in understanding the impact of outbreaks on reporting rates in light of limited access to testing and nonspecific case definitions.

Conclusion
This review has demonstrated the poor diagnostic accuracy of the clinical definitions for dengue in the absence of confirmatory testing. This has real-world costs both for treating clinicians and for surveillance systems, magnified by COVID-19. As fragile healthcare systems prepare to cope with the possibility of double epidemics, further research into improved clinical guidance, access to diagnostic testing, and accurate quantification of dengue burden and transmission will be essential.