Validating International Classification of Disease 10th Revision algorithms for identifying influenza and respiratory syncytial virus hospitalizations

Objective: Routinely collected health administrative data can be used to efficiently assess disease burden in large populations, but it is important to evaluate the validity of these data. The objective of this study was to develop and validate International Classification of Disease 10th revision (ICD-10) algorithms that identify laboratory-confirmed influenza or laboratory-confirmed respiratory syncytial virus (RSV) hospitalizations using population-based health administrative data from Ontario, Canada.
Study design and setting: Influenza and RSV laboratory data from the 2014–15, 2015–16, 2016–17 and 2017–18 respiratory virus seasons were obtained from the Ontario Laboratories Information System (OLIS) and were linked to hospital discharge abstract data to generate influenza and RSV reference cohorts. These reference cohorts were used to assess the sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) of the ICD-10 algorithms. To minimize misclassification in future studies, we prioritized specificity and PPV in selecting top-performing algorithms.
Results: 83,638 and 61,117 hospitalized patients were included in the influenza and RSV reference cohorts, respectively. The best influenza algorithm had a sensitivity of 73% (95% CI 72% to 74%), specificity of 99% (95% CI 99% to 99%), PPV of 94% (95% CI 94% to 95%), and NPV of 94% (95% CI 94% to 95%). The best RSV algorithm had a sensitivity of 69% (95% CI 68% to 70%), specificity of 99% (95% CI 99% to 99%), PPV of 91% (95% CI 90% to 91%) and NPV of 97% (95% CI 97% to 97%).
Conclusion: We identified two highly specific algorithms that best ascertain patients hospitalized with influenza or RSV. These algorithms may be applied to hospitalized patients if data on laboratory tests are not available, and will thereby improve the power of future epidemiologic studies of influenza, RSV, and potentially other severe acute respiratory infections.


Introduction
Routinely collected health administrative data are increasingly being used to assess disease burden and aetiology [1,2]. Algorithms applied to International Classification of Disease (ICD) codes documented in hospital discharge abstracts can be used to identify cases of a disease for the purposes of disease surveillance, but it is imperative to evaluate the validity of such algorithms to limit misclassification bias in epidemiologic studies.
While several studies have assessed the validity of ICD codes for identifying influenza and respiratory syncytial virus (RSV) within health administrative data [1][2][3][4][5][6][7][8], many of those studies had limitations. Some studies could only examine correlative patterns between true cases and ICD-coded cases at an aggregate level, because they could not link data at the individual level [2,3,5,6]. Without individual-level data, there remains the risk of misclassification of individual cases, as well as challenges in characterizing the sensitivity, specificity, and predictive values of these algorithms. When individual-level data were available and validity parameters were reported, studies were generally limited by one or more of: small numbers of study centres, restricted participant age ranges, or inclusion of few respiratory virus seasons [1,4,7,8]. Consequently, the generalizability of these algorithms is uncertain.
The objective of this study was to develop and validate more generalizable ICD 10th revision (ICD-10) case-finding algorithms to identify patients hospitalized with laboratory-confirmed influenza or laboratory-confirmed RSV using population-based health administrative data from Ontario, Canada.

Ethical considerations
This study used laboratory and health administrative data from Ontario, Canada (population 13.5 million in 2016) housed at ICES. ICES is a prescribed entity under section 45 of Ontario's Personal Health Information Protection Act (PHIPA). Section 45 authorizes ICES to collect personal health information, without consent, for the purpose of analysis or compiling statistical information with respect to the management of, evaluation or monitoring of, the allocation of resources to or planning for all or part of the health system. Projects conducted under section 45, by definition, do not require review by a Research Ethics Board. This project was conducted under section 45, and was approved by ICES' Privacy and Legal Office.

Data sources
Ontario's universal healthcare system captures virtually all healthcare interactions. To identify eligible patients for this study, we used data from the Ontario Laboratories Information System (OLIS), the Canadian Institute for Health Information's Discharge Abstract Database (CIHI-DAD), and the Registered Persons Database (RPDB). These datasets were linked using unique encoded identifiers and analyzed at ICES.
OLIS is an electronic repository of Ontario's laboratory test results, containing information on laboratory orders, patient demographics, provider information, and test results. The system captures data from hospital, commercial, and public health laboratories participating in OLIS. OLIS excludes: tests performed for purposes other than providing direct care to patients; tests that are ordered for out-of-province patients or providers; and tests for patients with health cards that are recorded as lost, stolen, expired, or invalid.
Implemented in 1988, CIHI-DAD captures administrative, clinical, and demographic information on all hospital discharges. Following a patient's discharge from hospital, a trained medical coder codes the medical record with up to 25 ICD-10 diagnosis codes (1 "most responsible" diagnosis code and up to 24 additional diagnosis codes), all of which are recorded in CIHI-DAD.
The RPDB provides basic demographic information on all individuals who have ever had provincial health insurance, including birth date, sex and postal code of residence. Ontario health insurance eligibility criteria are summarized in Table A of the S1 Appendix.

Generating influenza and RSV reference standard cohorts
Influenza and RSV polymerase chain reaction (PCR) laboratory data were obtained from OLIS over 4 respiratory virus seasons ranging from 2014-15 to 2017-18. This time frame was selected to include as many seasons as possible during a period when a relatively higher and stable proportion of laboratories were reporting to OLIS. Respiratory virus seasonality was defined to create the most inclusive time frames that would capture influenza and RSV seasonal activity in Ontario between the 2014-15 and 2017-18 viral seasons according to data provided by Public Health Ontario's Respiratory Pathogen Bulletin [9]. Therefore, influenza tests were collected from November to May and RSV tests were collected from November to April. Only one test per person per season was included in the reference cohort. If an individual was tested multiple times per season, we included the first positive test, or the first negative test if all tests were negative. Tests were excluded if they were linked to an individual who: was missing information on birth date, sex, or postal code from the RPDB; was not eligible for provincial health insurance or resided out of province according to the RPDB; or had a death date registered before the specimen collection date.
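The one-test-per-person-per-season rule described above can be sketched as follows. This is an illustrative Python sketch only (the study's analyses were performed in SAS), and the record structure and field names are hypothetical:

```python
from datetime import date

def select_index_test(tests):
    """For one person in one season, keep the first positive test,
    or the first negative test if all tests were negative."""
    tests = sorted(tests, key=lambda t: t["collected"])
    positives = [t for t in tests if t["result"] == "positive"]
    return positives[0] if positives else tests[0]

# Hypothetical testing history for one person in one season
history = [
    {"collected": date(2015, 1, 3), "result": "negative"},
    {"collected": date(2015, 2, 9), "result": "positive"},
    {"collected": date(2015, 2, 20), "result": "positive"},
]
index_test = select_index_test(history)  # first positive: Feb 9 test
```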
Laboratory data were then linked to CIHI-DAD hospitalization data using patients' unique encoded identifiers. Only patients with suspected community-acquired infections, defined as specimen collection within 3 days before or after a hospital admission, were included in the analysis. This definition ensured reference hospitalizations were more likely to be associated with community-acquired influenza or RSV infection. Individuals with suspected nosocomial infections, defined as hospitalizations associated with specimens collected more than 72 hours post admission [10], were excluded from the reference cohorts for that respective season. Overall, the "true positive" influenza and RSV reference cohorts comprised all hospitalized patients who tested positive for influenza or RSV by PCR within 3 days of admission, respectively, and the "true negative" influenza and RSV reference cohorts comprised all hospitalized patients who tested negative for influenza or RSV by PCR within 3 days of admission, respectively.
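The 3-day window separating suspected community-acquired from suspected nosocomial infections reduces to a date comparison. A minimal sketch, again in illustrative Python; the "unlinked" label for specimens collected more than 3 days before admission is our own shorthand for records not linked to that hospitalization:

```python
from datetime import date

def classify_infection(collection_date, admission_date):
    """Specimens collected within 3 days before or after admission are
    suspected community-acquired; specimens collected more than 72 hours
    (3 days) after admission are suspected nosocomial."""
    delta = (collection_date - admission_date).days
    if -3 <= delta <= 3:
        return "community-acquired"
    if delta > 3:
        return "nosocomial"
    return "unlinked"  # collected >3 days before admission
```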
The validity of each algorithm was evaluated by calculating sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). First, validity parameters were calculated by evaluating the "most responsible diagnosis" code in the discharge abstract. If an ICD-10 code in the algorithm was recorded as the most responsible diagnosis in the discharge abstract, then it was classified as an algorithm-positive record. Next, validity parameters were calculated using all diagnosis codes available in the discharge abstract. If an ICD-10 code in the algorithm was recorded as any diagnosis code on the discharge abstract, then it was classified as an algorithm-positive record. Algorithms applied to the most responsible diagnosis code were consistently less accurate than the same algorithms applied to all diagnosis codes (see Tables A-D in S2 Appendix). Therefore, we present the results of the latter analyses only.
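Computing the four validity parameters from the resulting 2×2 classification of algorithm status against the laboratory reference is straightforward. A minimal sketch; the cell counts below are hypothetical and not the study's actual counts:

```python
def validity_parameters(tp, fp, fn, tn):
    """Diagnostic-accuracy measures from a 2x2 table of algorithm
    classification against the laboratory reference standard."""
    return {
        "sensitivity": tp / (tp + fn),  # proportion of true positives detected
        "specificity": tn / (tn + fp),  # proportion of true negatives detected
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

# Hypothetical cell counts for illustration only
params = validity_parameters(tp=730, fp=45, fn=270, tn=4955)
# sensitivity = 730/1000 = 0.73; specificity = 4955/5000 = 0.991
```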
To minimize false positive rates and thus minimize misclassification of algorithm-positive cases, top-performing algorithms were selected according to specificity and PPV parameters [11]. If multiple algorithms had similar specificity and PPV, we then prioritized sensitivity. Since PPV and NPV are susceptible to changes in disease prevalence [12], and thus may vary depending on patient age or month of hospital admission, we also validated the top-performing algorithms in the reference cohorts stratified by age and month of hospital admission. The algorithms with consistently high specificity and PPV were selected as top-performing algorithms.
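The selection rule (specificity and PPV first, sensitivity as a tie-breaker) amounts to a lexicographic comparison. The candidate names and validity estimates below are hypothetical, for illustration only:

```python
# Hypothetical candidate algorithms with illustrative validity estimates
candidates = [
    {"name": "A", "spec": 0.99, "ppv": 0.94, "sens": 0.73},
    {"name": "B", "spec": 0.99, "ppv": 0.89, "sens": 0.75},
    {"name": "C", "spec": 0.97, "ppv": 0.95, "sens": 0.80},
]

# Rank by specificity, then PPV, then sensitivity (lexicographic priority)
best = max(candidates, key=lambda a: (a["spec"], a["ppv"], a["sens"]))
# "A" wins: it ties "B" on specificity but has the higher PPV
```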
We calculated 95% confidence intervals using the Clopper-Pearson exact method [13]. All analyses were conducted using SAS version 9.4 (SAS Institute, Cary, NC, USA).
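The Clopper-Pearson exact interval inverts the binomial distribution rather than relying on a normal approximation. The study's intervals were computed in SAS; the following is a self-contained pure-Python sketch that finds the interval endpoints by bisection on the binomial CDF:

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) 100*(1-alpha)% CI for a binomial proportion,
    found by bisection: f is true on [0, p*) and false on (p*, 1]."""
    def solve(f):
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    # Lower bound: p at which P(X >= k | p) rises to alpha/2
    lower = 0.0 if k == 0 else solve(lambda p: 1 - binom_cdf(k - 1, n, p) <= alpha / 2)
    # Upper bound: p at which P(X <= k | p) falls to alpha/2
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p) > alpha / 2)
    return lower, upper

ci = clopper_pearson(730, 1000)  # 95% CI for 730 positives out of 1,000
```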

Influenza and RSV reference cohorts
We identified 133,422 and 96,624 PCR testing events for influenza and RSV, respectively, in OLIS during the 2014-15 to 2017-18 respiratory virus seasons (Fig 1). After exclusions, 83,638 (63%) and 61,117 (63%) events for influenza and RSV, respectively, were associated with a hospitalization within 3 days of specimen collection and thus comprised the reference cohorts. Reference cohort characteristics are summarized in Table 1. True positive cases, defined as hospitalizations associated with a positive PCR test, comprised 17.6% of the influenza cohort and 9.2% of the RSV cohort (Table 1). Patient age ranged from 0 to 105 years. In both reference cohorts, all age strata had at least 2,000 patients.

Algorithm validation by age group and month of admission
Validity of the FLU1 and FLU2 algorithms did not vary substantially by age (Table 4). Both algorithms had specificities ≥98% and PPVs ≥89% across all age strata. More variability in FLU1 and FLU2 algorithm validity was observed when assessed by month of hospital admission (Table E in S2 Appendix). Specificity of both algorithms remained ≥98% during all months, whereas sensitivity and PPV decreased in November and May. RSV1 and RSV2 algorithm validity was more variable across age strata (Table 4). Algorithm specificities were ≥94% across all age strata, while algorithm sensitivities were higher among children aged 0-4 years (e.g. RSV1 sensitivity = 76%) compared to adults (e.g. adults aged 20-49 years, RSV1 sensitivity = 49%). Further, PPVs declined among patients aged 5-19 years to lows of 85% for RSV1 and 78% for RSV2. RSV1 and RSV2 algorithm validity also varied by month of hospital admission (Table E in S2 Appendix). Algorithm specificities were ≥99% for November through April, while algorithm sensitivities and PPVs declined in April (RSV1: sensitivity = 56%, PPV = 89%; RSV2: sensitivity = 57%, PPV = 81%).
Overall, the FLU1 algorithm and the RSV1 algorithm maintained the highest specificity and PPV across all age strata and months of admission, and were therefore classified as the most valid algorithms to identify influenza and RSV hospitalizations.

Discussion
We established two highly specific ICD-10 algorithms to identify influenza and RSV hospitalizations using large, population-based reference cohorts of hospitalized patients with laboratory-confirmed infection over four respiratory virus seasons. Based on the criteria of specificity and PPV, the most valid influenza algorithm included all influenza-specific ICD-10 codes that included laboratory confirmation (FLU1), while the most valid RSV algorithm included all RSV-specific ICD-10 codes (RSV1). This finding was expected given our reference cohorts were defined using laboratory test results. Medical coding is performed at discharge when testing results may be available; thus, medical coding and laboratory data are not necessarily independent.
FLU1 and RSV1 maintained high specificity and PPV when the reference cohorts were stratified by age. Thus, the algorithms can be applied to paediatric, adult, and elderly populations with low risk of misclassification bias. The specificity of the algorithms also remained high when assessed by month of hospitalization, although PPV was more variable. The PPV of FLU1 dropped to lows of 87% in November and 86% in May, while the PPV of RSV1 dropped to a low of 89% in April. These decreases were expected, as PPV is dependent on disease prevalence, and the decreases were concordant with typical declines in respiratory virus prevalence and activity in Ontario during those months [14]. Notably, the absolute number of false positives generated during times of low viral activity made up <8% of overall FLU1 false positives and <7% of overall RSV1 false positives. Therefore, while PPV declined during months of lower viral activity, the overall algorithm validity was not impacted.
Our findings concur with previous literature indicating that ICD-10 codes have high specificity and moderate sensitivity for identifying influenza and RSV hospitalizations using health administrative data [1,4,7,8]. Where direct comparisons are possible, our quantitative measures of specificity align with previous findings, while our measures of sensitivity are lower. For example, Moore et al. found that an algorithm that included codes for influenza with or without laboratory confirmation (J10.0-J10.9, J11.0-J11.9) had a specificity of 98.6% and a sensitivity of 86.1% for children aged 0-9 years, whereas our FLU2 algorithm had a specificity of 99% and a sensitivity of 73-75% for children aged 0-19 years [4]. Furthermore, Pisesky et al. found that an algorithm comprising RSV-specific codes (J12.1, J20.5, J21.0, B97.4) had a specificity of 99.6% and a sensitivity of 97.9% for children aged 0-3 years, whereas our RSV1 algorithm had corresponding values of 95% and 76% for children aged 0-4 years [1].
Distinctions between our study populations may explain the differences in sensitivity observed. Pisesky et al. studied a population from a specialized hospital in Ottawa, Ontario [1], while Moore et al. studied a Western Australian population [4]. In contrast, our study was conducted in a larger cohort of patients using data from hospitals across the entire province of Ontario. ICD-10 codes may be used more or less frequently across jurisdictions and institutions resulting in variable algorithm sensitivity. The discrepancies highlight the importance of validating algorithms within distinct populations.
While we established two highly specific algorithms that identify influenza and RSV hospitalizations, some limitations must be considered. First, our reference cohorts only included hospitalized patients who were tested by PCR for the respective pathogens and did not include patients who were not tested. Untested patients with suspected respiratory infections may differ from the tested population of patients. They may have less severe symptoms at hospitalization, may be more likely to live in long-term care facilities where outbreaks have occurred, or may be more likely to live in less-resourced settings where testing is limited. Untested patients may have had more pressing medical concerns at hospitalization and therefore testing was not a priority, or they may have been hospitalized at an overcrowded, high-volume site. Testing may further depend on the age of the patient at admission or the protocols in place at the hospital. By including hospitals across the entire province of Ontario we aimed to mitigate hospital-specific variability that may have affected the generalizability of our results. However, variability between untested and tested patients must be considered when using the algorithms to assess certain risk factors that may be associated with propensity to receive a test. For example, risk factors such as age, symptom severity, comorbidities, and residence in long-term care facilities may be associated with propensity to receive a PCR test, and thus may have biased estimates of effect when using these algorithms. Caution must also be applied when assessing risk factors in specific settings that have differing testing practices compared to general Ontario hospitals.
Another limitation is that our top-performing algorithms were selected to maximize specificity and PPV. This approach was taken to minimize misclassification of cases rather than non-cases. Depending on future study objectives, it may be more important to maximize sensitivity. For example, our algorithms substantially underestimate the number of true influenza and RSV cases in the Ontario population, and thus would not be suitable to estimate population burden of influenza or RSV. Therefore, validity parameters have been reported for all algorithms tested to facilitate the selection of the best algorithm(s) for particular studies.
Use of these algorithms in non-Ontario-based cohorts also warrants caution. PPV and NPV are highly susceptible to changes in disease prevalence [12]. Coding and testing practices may vary across jurisdictions, affecting all validity measures reported [15,16]. Thus, it may be necessary to re-validate these algorithms when applying them to other populations.
Our findings have important implications for future studies that aim to assess the aetiology of severe outcomes for influenza and RSV hospitalizations using broad health administrative data. Not all hospitals across Ontario currently submit laboratory data to OLIS. Further, OLIS data collection was limited between 2007 and 2012 as laboratories only gradually started submitting data upon implementation of OLIS in 2007. As CIHI-DAD is available for all hospitals across Ontario, these algorithms will allow us to create larger and more representative cohorts of patients hospitalized with influenza or RSV, increasing the power of future aetiological studies. Lastly, since historical CIHI-DAD data are available as early as 1988, these algorithms could be used to assess changes in disease prevalence and aetiology over time.

Conclusion
Using a population-based cohort of patients tested for influenza and RSV, we identified two highly specific algorithms that best ascertain paediatric, adult, and elderly patients hospitalized with influenza or RSV. These algorithms will improve future efforts to evaluate prognostic and aetiologic factors associated with influenza and RSV when reporting of laboratory data is limited. The same principles may be applicable for other severe acute respiratory infections.