Classification performance of administrative coding data for detection of invasive fungal infection in paediatric cancer patients

Background Invasive fungal infection (IFI) detection requires application of complex case definitions by trained staff. Administrative coding data (ICD-10-AM) may provide a simplified method for IFI surveillance, but accuracy of case ascertainment in children with cancer is unknown. Objective To determine the classification performance of ICD-10-AM codes for detecting IFI using a gold-standard dataset (r-TERIFIC) of confirmed IFIs in paediatric cancer patients at a quaternary referral centre (Royal Children’s Hospital) in Victoria, Australia from 1st April 2004 to 31st December 2013. Methods ICD-10-AM codes denoting IFI in paediatric patients (<18 years) with haematologic or solid tumour malignancies were extracted from the Victorian Admitted Episodes Dataset and linked to the r-TERIFIC dataset. Sensitivity, positive predictive value (PPV) and F1 scores of the ICD-10-AM codes were calculated. Results Of 1,671 evaluable patients, 113 (6.76%) had confirmed IFI diagnoses according to gold-standard criteria, while 114 (6.82%) cases were identified using the codes. Of the clinical IFI cases, 68 were in receipt of ≥1 ICD-10-AM code(s) for IFI, corresponding to an overall sensitivity, PPV and F1 score of 60% each. Sensitivity was highest for proven IFI (77% [95% CI: 58–90]; F1 = 47%) and invasive candidiasis (83% [95% CI: 61–95]; F1 = 76%) and lowest for other/unspecified IFI (20% [95% CI: 5.05–72]; F1 = 5.00%). The most frequent misclassification was coding of invasive aspergillosis as invasive candidiasis. Conclusion ICD-10-AM codes demonstrate moderate sensitivity and PPV to detect IFI in children with cancer. However, specific subsets of proven IFI and invasive candidiasis (codes B37.x) are more accurately coded.


Introduction
Invasive fungal infections (IFIs) represent significant challenges in the management of paediatric cancer patients with impaired immunity [1][2][3] and are an important cause of morbidity and mortality [1,4]. Current methods for detecting IFI are manual, time consuming and often labour intensive [1,2,5,6], and are reliant on a suite of clinical, laboratory and radiological data. There is therefore limited capacity to routinely capture IFIs to assess the epidemiology, detect potential outbreaks and inform optimal antifungal use in children with cancer [7].
Uniform case definitions for IFI are widely accepted as measurable outcomes in clinical trials (i.e. European Organization for Research and Treatment of Cancer/Invasive Fungal Infections Cooperative Group and the National Institute of Allergy and Infectious Diseases Mycoses Study Group [EORTC/MSG]) [8]. However, these are complex and require detailed case review. Administrative coding data possess potentially favourable attributes for simplified surveillance [9], including standardised classification and availability of specific codes for yeast and mould infections [4,10]. In Australia, the International Statistical Classification of Diseases and Related Health Problems, Tenth Revision, Australian Modification (ICD-10-AM) is a monohierarchical, codified medical lexicon used for coding inpatient diagnoses and is a commonly used ontology to inform activity-based funding models [11].
Earlier data have suggested the sensitivity of administrative coding data for classifying invasive aspergillosis to be moderate (63%) [12], but findings were restricted to filamentous fungi in adult allogeneic and autologous haematopoietic stem cell transplantation recipients and excluded invasive candidiasis, one of the most prevalent IFIs in the paediatric haematology-oncology setting [1]. Despite the high incidence and poor survival prognoses of IFI in cancer patients [13], there is a paucity of available evidence examining the utility of administrative coding data for reliable and reproducible surveillance of IFI in vulnerable paediatric cancer populations.
The objectives of this study were to: (i) determine the sensitivity, positive predictive value (PPV) and F1 score of administrative coding data for case ascertainment of IFI; and (ii) describe the misclassification rate of ICD-10-AM in paediatric haematology-oncology patients.

Study design and population
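This was a retrospective, single-site, cohort study of paediatric patients (<18 years) diagnosed with a haematological malignancy or solid tumour neoplasm between 1st April 2004 and 31st December 2013 at the Royal Children's Hospital (RCH) in Victoria, Australia. The study is reported in accordance with the Strengthening the Reporting of Observational studies in Epidemiology (STROBE; S1 Table) [14] and the REporting of studies Conducted using Observational Routinely-collected health Data (RECORD; S2 Table) statements [15].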

Gold-standard invasive fungal infection dataset
Data collected as part of the multisite The Epidemiology and Risk Factors for Invasive Fungal Infections in Immunocompromised Children (TERIFIC) study and restricted to episodes collected at the RCH (denoted as r-TERIFIC), were used for the current study [1,2]. Detailed study methodology is available elsewhere [1,2]. Briefly, this 10-year retrospective study identified all episodes of IFI in children with cancer or haematological malignancy from hospital microbiology, pharmacy-dispensing, radiology, oncology diagnostic and clinical management records as well as diagnostic coding data. Invasive fungal infection episodes were classified as proven, probable, possible or modified possible in accordance with EORTC/MSG criteria [8] and modifications described elsewhere [1,2].

Administrative coding dataset
Episode-level, administrative coding data were sourced from the Victorian Admitted Episodes Dataset (VAED) and mapped to each patient record captured in the r-TERIFIC dataset. The VAED is Australia's largest hospital morbidity database, and consists of diagnostic ICD-10-AM and procedural Australian Classification of Health Interventions (ACHI) codes for paediatric cancer patients admitted to private and public hospitals in Victoria [16]. Patients with haematological malignancy or a solid tumour were defined using the principal diagnosis codes denoting a primary malignant neoplasm (ICD-10-AM codes listed in S3 Table, where "x" denotes any number). Invasive fungal infection was defined when an additional diagnosis code (Australian Coding Standards 0002 Additional diagnoses [17]) denoting IFI was reported in the VAED (ICD-10-AM codes: B37.x, B42.x–B50.x) (S3 Table). Hospitalisations for autologous or allogeneic haematopoietic stem cell transplantation were defined by corresponding ACHI codes 13706-00, -06, -07, -08, -09, -10 [802] (S4 Table). Updates to the ICD-10-AM and ACHI codes from the Third to Eighth Edition were elucidated. Duplicate IFI codes denoting the same IFI in the same hospitalisation, as well as those reported at the time of admission in subsequent hospitalisations, were considered the same IFI and were counted only once per patient. Multiple discrete IFI codes appearing in the same hospitalisation per patient were counted as separate IFI episodes. Accordingly, patients with ≥2 mutually exclusive gold-standard IFI diagnoses in the r-TERIFIC dataset, diagnosed in the same or in discrete hospitalisations, were counted as individual gold-standard cases for each IFI diagnosis (for example, one patient with both invasive aspergillosis and invasive candidiasis was counted as one case of invasive aspergillosis and one case of invasive candidiasis). Index hospitalisation was defined as the first admission date at the RCH.
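The counting rules above can be illustrated with a minimal sketch. It assumes, purely for illustration, that "the same IFI" can be operationalised as an identical ICD-10-AM code string within a patient; the record layout (`CodedEpisode`) and the helper names `is_ifi_code` and `distinct_ifi_codes` are hypothetical and are not taken from the VAED or from the study's own analysis code. The code range is taken as stated in the text.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class CodedEpisode(NamedTuple):
    """Hypothetical episode-level row: one diagnosis code per row."""
    patient_id: str
    admission_id: str
    icd_code: str


def is_ifi_code(code: str) -> bool:
    """True for codes in B37.x or B42.x-B50.x, the range stated in the text."""
    code = code.strip().upper()
    if len(code) < 3 or code[0] != "B" or not code[1:3].isdigit():
        return False
    block = int(code[1:3])
    return block == 37 or 42 <= block <= 50


def distinct_ifi_codes(rows: Iterable[CodedEpisode]) -> Set[Tuple[str, str]]:
    """Collapse repeated IFI codes to one (patient, code) pair.

    Assumption: a code repeated within or across hospitalisations for the
    same patient denotes the same IFI and is counted once, while different
    codes for the same patient are kept as separate IFI episodes.
    """
    seen: Set[Tuple[str, str]] = set()
    for row in rows:
        if is_ifi_code(row.icd_code):
            seen.add((row.patient_id, row.icd_code.strip().upper()))
    return seen
```

Under this simplification, a patient carrying the same candidiasis code in two admissions contributes one coded IFI, whereas a patient carrying both a candidiasis and an aspergillosis code contributes two.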

Exclusion criteria
Cancer patients with superficial fungal infections (codes B36.x), including dermatophytes (codes B35.x), and patients with no underlying malignancy were excluded.

Statistical analyses
For patient and clinical characteristic data, normality was assessed by histogram inspection and the skewness and kurtosis test [18]. The mean (±standard deviation) and median (interquartile range) were reported for normally and non-normally distributed data, respectively.

Classification accuracy.
To determine the accuracy of ICD-10-AM codes for IFI case detection, sensitivity, PPV and F1 scores were calculated, stratified by IFI type, EORTC/MSG classification and underlying cancer diagnosis [19,20]. Sensitivity and PPV of the coding data were calculated as the number of clinically-confirmed IFI patients in receipt of at least one IFI code (i.e. true positives; cases where the ICD-10-AM code agrees with the clinical label) divided by the total number of clinically-confirmed IFI cases in the r-TERIFIC dataset and the total number of patients assigned an ICD-10-AM code for IFI (code positives), respectively. Exact binomial 95% confidence intervals (CI) were calculated for all sensitivity and PPV calculations. The F1 score was used to measure the harmonic mean of the sensitivity and PPV of the coding data according to the formula [21]:

$$F_1 = 2 \times \frac{\text{sensitivity} \times \text{PPV}}{\text{sensitivity} + \text{PPV}}$$

where $F_1$ ranges in $[0,1] = \{F_1 : 0 \le F_1 \le 1\}$ and $F_1 = 1$ indicates perfect sensitivity and PPV.
To identify which coding abstraction yields the highest sensitivity, PPV and F1 score within each combination of IFI codes, the union of different ICD-10-AM code sets for IFI (represented as $A_k$) was evaluated. The union of code sets $A_1$ and $A_2$, denoted $A_1 \cup A_2$, is equivalent to the set of patients in the r-TERIFIC dataset that are correctly assigned either code $A_1$ ($\Pr(A_1)$) or code $A_2$ ($\Pr(A_2)$) or codes $A_1$ and $A_2$ ($\Pr(A_1 \cap A_2)$). Classification performance was determined according to increasing numbers of assigned code sets. Sensitivity, PPV and F1 estimates of 0% indicate IFI code sets that were not assigned to true positive cases in the r-TERIFIC dataset, denoted $A_1' \cap A_2'$. The number of different combinations ($C$) of codes ($n$) in increasing set sizes ($r$) was determined according to the following formula:

$$C(n, r) = \frac{n!}{r!\,(n - r)!}$$

Classification statistics are reported in accordance with the Standards for Reporting Diagnostic accuracy studies (STARD) statement [22] (S5 Table).

Misclassification rate.
Misclassification rate was calculated as the proportion of discordantly coded IFIs (e.g. the proportion of invasive candidiasis cases coded as invasive aspergillosis).
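The sensitivity, PPV and F1 definitions under "Classification accuracy" can be sketched as follows. This is an illustrative sketch only, not the study's Stata code: the function names are hypothetical, and the exact binomial interval is assumed here to be the Clopper-Pearson interval, solved by simple bisection so that only the Python standard library is needed.

```python
from math import comb
from typing import Callable, Dict, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(g: Callable[[float], float], target: float) -> float:
    """Find p in [0, 1] with g(p) = target, for g increasing in p (bisection)."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(k: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Clopper-Pearson (exact binomial) interval for k successes out of n."""
    # Lower bound: P(X >= k | p) = alpha / 2; upper bound: P(X <= k | p) = alpha / 2.
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper


def classification_metrics(tp: int, gold_pos: int, code_pos: int) -> Dict[str, float]:
    """Sensitivity = TP / gold-standard cases; PPV = TP / code positives."""
    sens = tp / gold_pos
    ppv = tp / code_pos
    f1 = 2 * sens * ppv / (sens + ppv) if tp else 0.0
    return {"sensitivity": sens, "ppv": ppv, "f1": f1}


# Counts reported in the abstract: 68 true positives, 113 gold-standard
# cases and 114 code-positive cases.
overall = classification_metrics(tp=68, gold_pos=113, code_pos=114)
sens_ci = exact_ci(68, 113)
```

With the counts reported in the abstract, all three metrics round to 60%, which is why a single figure can be quoted for sensitivity, PPV and F1.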
All statistical analyses were undertaken using Stata/SE v15.1 software (StataCorp LLC, College Station, Texas, U.S.A.). A two-sided p value <0.05 was considered statistically significant.
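The evaluation of unions of code sets described under "Classification accuracy" can be sketched in the same spirit. The patient identifiers and the code-to-patient mapping below are invented toy data, and `evaluate_code_sets` is a hypothetical helper rather than the authors' procedure; the sketch only shows how each combination of r codes maps to a union of flagged patients and hence to sensitivity, PPV and F1.

```python
from itertools import combinations
from math import comb
from typing import Dict, Set, Tuple


def evaluate_code_sets(
    patients_by_code: Dict[str, Set[str]],
    gold: Set[str],
    r: int,
) -> Dict[Tuple[str, ...], Tuple[float, float, float]]:
    """Sensitivity, PPV and F1 for every union of r code sets."""
    results = {}
    for combo in combinations(sorted(patients_by_code), r):
        # Patients assigned at least one code in this combination (the union).
        flagged = set().union(*(patients_by_code[c] for c in combo))
        tp = len(flagged & gold)
        sens = tp / len(gold) if gold else 0.0
        ppv = tp / len(flagged) if flagged else 0.0
        f1 = 2 * sens * ppv / (sens + ppv) if tp else 0.0
        results[combo] = (sens, ppv, f1)
    return results


# Toy data (hypothetical): which patients carry which aspergillosis code.
toy_codes = {
    "B44.0": {"p1", "p2"},
    "B44.1": {"p2", "p3"},
    "B44.7": {"p9"},
}
toy_gold = {"p1", "p2", "p3", "p4"}

pairs = evaluate_code_sets(toy_codes, toy_gold, r=2)
n_pairs = comb(len(toy_codes), 2)  # C(n, r) = n! / (r! (n - r)!)
```

In this toy example there are C(3, 2) = 3 two-code combinations, and the union of B44.0 and B44.1 captures three of the four gold-standard patients.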

Ethics
Ethics approval was granted by the Royal Children's Hospital Human Research Ethics Committee (project number: 59636) and the need for informed consent was waived in accordance with the National Statement on Ethical Conduct in Human Research 2007 (Updated May 2015) [23].

Discussion
This study is the first to describe the performance of administrative coding data to detect IFI in immunocompromised children with cancer. Overall sensitivity and PPV of ICD-10-AM codes for detection of clinically-confirmed IFI were moderate. However, sensitivity was improved for ascertainment of proven and possible IFI cases, in particular for invasive candidiasis, suggesting there is potential merit in using administrative coding data to signal medical record review for these discrete IFIs. We found that ICD-10-AM codes alone were not sufficient to accurately classify IFI cases. In keeping with earlier estimates reported in Chang et al. [12], we observed an overall sensitivity and PPV of 60%, indicating that administrative coding data alone are not sufficient to reliably detect true cases of IFI in paediatric patients. The performance of coding data for IFI case detection was enhanced when subsets of proven IFI were examined, suggesting that where confirmatory laboratory results are available, then the quality of coding may be improved. Accuracy and completeness of medical record documentation likely contributes to this variation with one study showing that 97% of fungaemia cases were assigned an IFI code when fungaemia was explicitly documented in the medical record, as opposed to only 42% of cases when only microbiology results were used [24]. While underlying malignancy is important for evaluating IFI risk [4], our findings suggest that cancer diagnosis is less relevant to understanding the classification performance of administrative coding data for IFI. In addition, we observed a difference in the performance of coding data for accurate detection of specific subsets of fungal infection. Cases of invasive candidiasis were more accurately coded than invasive aspergillosis cases. 
We propose that this may be related to readily available and simple diagnostic tests for yeast infections, in comparison to heterogeneous diagnostic testing and the requirement for interpretation of imaging and laboratory results in order to identify mould infections. These factors could impact upon coding practices, particularly where microbiology, histology and radiological findings require integration by clinicians, with documentation in medical files, to facilitate accurate coding by clinical coders.
Cases of invasive aspergillosis were most frequently misclassified as invasive candidiasis in the coded data. Although the number of invasive aspergillosis cases in the gold-standard data was small (N = 15), our findings are likely indicative of the uncertainty in discriminating between yeast and mould infections at the clinical coding level. A recent qualitative study [25] identified clinical coders' experience and awareness of IFI as a factor associated with discordant coding. Although it is a reasonable assumption that clinical coder experience is associated with our misclassification estimates, in the setting of IFI where clinical case definitions are complex, it is conceivable that other factors are at play. These include the complexity of translating clinical data indicating invasive aspergillosis into ICD-10-AM [24,26], the absence of clear definitions [27,28], subjective interpretation of existing guidelines [24,25,27], delays in diagnosis [29], and the review of multiple data sources to make a confirmatory diagnosis of mould infection [1,2,30]. This setting underscores the importance of clear, complete, legible and standardised documentation of IFI to mitigate misclassification in current coding workflows.
We noted variation in classification performance according to specific code sets. Our results indicate that algorithms including the largest combination of specific IFI code sets yield the highest probability for case ascertainment in hospitalised paediatric cancer patients. Notwithstanding, the fact that the F1 score for specific invasive aspergillosis code abstractions (B44.0 ∪ B44.1 ∪ B44.7) is still low-to-moderate (F1 = 36%, Tables 3 and 4) underscores that although these specific codes are the most sensitive starting point to signal medical chart review, existing coding rules are an unreliable indicator for invasive mould infections when used in isolation. Importantly, the sensitivity of ICD-10-AM decreases from 42% to 5.26% and 83% to 17% when comparing patients assigned one versus two codes denoting invasive aspergillosis and invasive candidiasis, respectively. Mathematically, the subset of true positive cases (numerator) diminishes as the number of assigned IFI codes increases (and the case definition therefore becomes more specific), whilst the number of gold-standard IFI cases (denominator) remains fixed. For example, true positives with 3 IFI codes is a subset of true positives with ≥2 IFI codes, which is a subset of true positives with ≥1 IFI code. Alternatively, {patients with 3 codes} ⊆ {≥2 codes} ⊆ {≥1 code}.
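The nesting argument above can be made concrete with a small sketch. The per-patient code counts below are invented for illustration and do not come from the study; `sensitivity_by_min_codes` is a hypothetical helper showing that, with a fixed denominator, requiring more codes can only keep or reduce the numerator.

```python
from typing import List


def sensitivity_by_min_codes(codes_per_gold_case: List[int], max_k: int) -> List[float]:
    """Sensitivity when a case counts as detected only if it has >= k IFI codes.

    The denominator (all gold-standard cases) is fixed, while the numerator
    is the nested subset of cases with at least k codes, for k = 1..max_k.
    """
    n = len(codes_per_gold_case)
    return [
        sum(c >= k for c in codes_per_gold_case) / n
        for k in range(1, max_k + 1)
    ]


# Hypothetical number of IFI codes assigned to each of 10 gold-standard cases.
toy_counts = [0, 0, 1, 1, 1, 2, 2, 3, 0, 1]
curve = sensitivity_by_min_codes(toy_counts, max_k=3)
```

Because each stricter definition selects a subset of the previous one, the resulting sequence of sensitivities is non-increasing in k, mirroring the drop reported when moving from one to two assigned codes.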
Methodological differences between the Australian Coding Standards and EORTC/MSG definitions likely contribute to our moderate overall F1 score of 60%. Clinical coders must adhere to rigid coding rules in accordance with Australian Coding Standards in the same way that clinicians adhere to complex and comprehensive criteria for IFI (i.e. EORTC/MSG), but these two sets of criteria may not directly match. This disconnect in clinical case definitions is a fundamental drawback in using ICD-10 codes as a reproducible proxy for IFI given cases detected according to clinical criteria may not reflect coded cases using ICD-10-AM. For example, clinical coders' reliance on microbiology and histology records to identify cases of IFI in line with current coding rules can be subject to ascertainment bias in the coded data, given many IFIs are diagnosed according to a combination of metrics, namely clinical acumen, radiological findings and serum antigen testing [12,26]. Notwithstanding, strategic imperatives to mitigate erroneous coding of IFI are likely two-fold. First, harmonisation of clinical EORTC/MSG definitions with existing Australian Coding Standards may help safeguard accurate detection of IFI in the coded data by reducing ascertainment of false positive cases (for example, our high number of false positive cases [N = 38] coded as 'candidiasis of other sites' [code B37.88]). In fact, recent qualitative research proposes the use of Systematized Nomenclature of Medicine-Clinical Terms (SNOMED-CT) codes in electronic health records as a more granular tool to standardise terminology and facilitate clinical coding of complex diseases [25,31,32]. Second, ensuring that chart documentation is complete, legible and streamlined will ensure clinical coders have sufficient access to the data required to assign the appropriate IFI code(s) [25,33,34].
Our high classification estimates for invasive candidiasis suggest that administrative coding data may be a feasible proxy to facilitate existing surveillance methods of yeast infection. Owing to the availability and easier interpretation of confirmatory diagnostic data indicating invasive candidiasis compared to invasive aspergillosis [35], the sensitivity and PPV of the administrative coding data are high (F1 = 76%). These findings substantiate potential merit in its use as a signal to trigger medical record review. Current surveillance of IFI is manual, onerous, time-consuming and resource-intensive [1,2,4,12,24]. However, use of ICD-10 codes as a readily available surrogate measure for invasive candidiasis may help restrict medical chart reviews to patients most likely presenting with yeast infection, therefore mitigating unnecessary record review. Our promising classification results also suggest there may be value in using ICD-10-AM codes for population-based monitoring of invasive candidiasis (codes B37.x) in paediatric populations.
Limitations of the current study include the fact that a single-centre experience was evaluated, and findings may not reflect clinical coding performance and differences in other paediatric haematology-oncology units [36,37]. Second, ICD-10 is an amalgamation of diagnostic information into a codified, monohierarchical, medical lexicon which does not discriminate between EORTC/MSG classifications, therefore rendering the data insufficient for fungal surveillance based on classification of proven/probable/possible IFI. Third, the wide 95% confidence intervals for our classification estimates (Table 3) are attributed to a small sample size of true positive cases stratified by type of IFI and EORTC/MSG criteria. Further, although the one-month average time lag [38] for hospital diagnoses to be coded makes ICD-10-AM unsatisfactory for real-time IFI surveillance, our data indicate potential merit in using invasive candidiasis codes (B37.x) to signal retrospective detection of potentially missed cases.

Conclusions
In conclusion, we demonstrate moderate performance of ICD-10-AM codes for detection of IFI in children with cancer. Coding of invasive fungal infections having greater diagnostic certainty according to EORTC/MSG criteria (i.e. proven IFI), as well as yeast infections, resulted in higher sensitivity for case ascertainment. Findings suggest that while administrative coding data are not an accurate reflection of overall IFI disease burden, these data may provide an acceptable reflection of relative disease burden and signal a medical chart review for specific IFI categories (namely, proven/possible IFI and yeast infections) in paediatric patients with cancer. Future studies are required to assess the utility of ICD-10-AM data for these specific infections to detect changes in disease burden and longitudinally monitor quality improvement activities.

Supporting information
S1 Table. STROBE statement: checklist of items that should be included in reports of cohort studies. (PDF)
S2 Table. The RECORD statement: checklist of items, extended from the STROBE statement, that should be reported in observational studies using routinely collected health data. (PDF)
S3 Table.