How Well Do Discharge Diagnoses Identify Hospitalised Patients with Community-Acquired Infections? – A Validation Study

Background: Credible measures of disease incidence, trends and mortality can be obtained through surveillance using manual chart review, but this is both time-consuming and expensive. ICD-10 discharge diagnoses are used as surrogate markers of infection, but knowledge of their validity for infections in general is sparse. The aim of the study was to determine how well ICD-10 discharge diagnoses identify patients with community-acquired infections in a medical emergency department (ED), overall and in relation to sites of infection and patient characteristics.
Methods: We manually reviewed 5977 patients admitted to a medical ED in a one-year period (September 2010-August 2011) to establish whether they were hospitalised with community-acquired infection. Using the manual review as gold standard, we calculated the sensitivity, specificity, predictive values, and likelihood ratios of discharge diagnoses indicating infection.
Results: Two thousand five hundred eleven patients were identified with community-acquired infection according to chart review (42.0%, 95% confidence interval [95%CI]: 40.8–43.3%) compared to 2550 patients identified by ICD-10 diagnoses (42.8%, 95%CI: 41.6–44.1%). Sensitivity of the ICD-10 diagnoses was 79.9% (95%CI: 78.1–81.3%), specificity 83.9% (95%CI: 82.6–85.1%), positive likelihood ratio 4.95 (95%CI: 4.58–5.36) and negative likelihood ratio 0.24 (95%CI: 0.22–0.26). The two most common sites of infection, the lower respiratory tract and urinary tract, had positive likelihood ratios of 8.3 (95%CI: 7.5–9.2) and 11.3 (95%CI: 10.2–12.9) respectively. We identified significant variation in diagnostic validity related to age, comorbidity and disease severity.
Conclusion: ICD-10 discharge diagnoses identify specific sites of infection with a high degree of validity, but only a moderate degree when identifying infections in general.


Introduction
Credible measures of disease incidence, trends and mortality are critical for proper public healthcare management. This information can be obtained through surveillance using manual chart review, but this is both time-consuming and expensive [1]. Surveillance of infections often depends on notifications from physicians. However, because patients are registered with diagnosis codes at discharge or on transfer between departments, it is possible to use discharge diagnoses as surrogate markers of infection. Previous studies examining the validity of discharge diagnoses for identifying infections have largely focused on specific sites of infection, with varying results. The validity depends on the infection the patient presents with, the patient population, and the setting examined [2].
Only a few studies have assessed the validity of ICD-10 codes for infections in general [3][4][5], and it is unknown if the validity changes in specific patient subgroups.
The aims of this study were to determine the degree to which discharge diagnoses of infection accurately identify community-acquired infections in an emergency department (ED) setting, and to assess whether the site of infection, baseline patient characteristics and disease severity affect the validity of the discharge diagnoses.

Materials and Methods
We conducted a cross-sectional study of all patients admitted to the medical ED at Odense University Hospital, Denmark, from 1 September 2010 to 31 August 2011. The charts of all patients were manually reviewed for the presence of an infection at admission. We then ascertained the ability of discharge diagnoses to identify the infections found by this structured manual review.

Ethics statement
In compliance with Danish law, the study was notified to and approved by the Danish Data Protection Agency (J No 2008-58-0035), and access to patient clinical records was approved by the Danish National Board of Health (J No 3-3013-35). No further ethical approval, or consent from participants, is needed for register-based studies in Denmark. Data were anonymised and de-identified prior to data analysis.

Study design and setting
The medical ED serves a population of 235,000 adults and serves as a medical admission unit for the following medical specialities: general internal medicine, infectious diseases, gastrointestinal medicine, geriatric medicine, rheumatology, endocrinology and respiratory medicine. The medical ED received all acutely admitted medical patients referred from either a primary care physician or from the open general ED where an emergency care physician found the patient in need of admission. At arrival, all patients had their vital signs registered and blood drawn for laboratory analysis, as a part of the clinical routine.

Participants
Patients eligible for the study were adults (≥15 years of age) with a first-time admission to the medical ED within the study period. Patients without a Danish civil registration number and patients discharged from a hospital up to 7 days prior to inclusion were excluded from further analysis.
We used a structured protocol to collect data regarding the presence of infections, based on the National Healthcare Safety Network criteria in combination with a predefined definition for cases where the site of infection was clinically evident [6] (Appendix S1). A manual chart review was conducted of all patients admitted to the medical ED within the study period. Physicians' notes, nurses' charts, data on microbiological cultures, biochemical data and radiographic imaging were reviewed, and infections identified during the first 48 hours of the admission were included. The manual chart review was done by an experienced clinical physician (DPH). If a patient had more than one site of infection associated with the given admission, we included all sites and did not prioritise between them.

Validation of chart review
To examine the reproducibility of identifying infection by manual chart review, we assessed the inter-observer reliability in a random sample of 2.5% of all admissions to the medical emergency ward within the inclusion period. The review was done by two experienced clinical physicians (DPH and CBL), each blinded to the other's verdict. The overall inter-rater agreement regarding the presence of any infection was 84.1%, with a kappa value of 0.68, corresponding to a substantial strength of agreement [7]. When restricting to specific sites of infection, the inter-rater agreement ranged from 92.7% (lower respiratory tract) to 100% (cardiovascular).
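The relationship between raw agreement and kappa can be illustrated with a minimal sketch. The function name `cohens_kappa` and the counts below are hypothetical and chosen only for illustration; the underlying two-by-two agreement table for this study is not reported in the text, so the example does not reproduce the figures above.

```python
def cohens_kappa(both_yes, only_a, only_b, both_no):
    """Observed agreement and Cohen's kappa for two raters and a yes/no verdict.

    both_yes: charts both reviewers classified as infection
    only_a:   charts only reviewer A classified as infection
    only_b:   charts only reviewer B classified as infection
    both_no:  charts both reviewers classified as no infection
    """
    n = both_yes + only_a + only_b + both_no
    observed = (both_yes + both_no) / n

    # Marginal proportion of "infection" verdicts for each reviewer.
    a_yes = (both_yes + only_a) / n
    b_yes = (both_yes + only_b) / n

    # Agreement expected by chance if the two reviewers were independent.
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)

    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts, NOT study data.
agreement, kappa = cohens_kappa(both_yes=40, only_a=10, only_b=10, both_no=40)
print(f"observed agreement = {agreement:.3f}, kappa = {kappa:.3f}")
```

Because kappa subtracts the agreement expected by chance, it is always lower than the raw percentage agreement whenever chance agreement is greater than zero, which is why an agreement above 80% can correspond to a kappa below 0.70.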

Data sources
Database. Trained data abstractors extracted and validated clinical details and vital signs at the time of admission from the electronic patient journal.
Using the unique Danish personal identification number [8], supplementary information on included patients was retrieved and linked from several large population-based registers.
Funen Patient Administrative System. The register comprises all hospitalisations at Odense University Hospital registered since 1974, and was used to identify all patients admitted to the medical ED within the study period, as well as the registered time of admission.
Danish National Patient Register. The register contains data on admission and discharge dates as well as discharge diagnosis for all patients hospitalised in Denmark since 1977, classified according to the International Classification of Diseases, 10th revision (ICD-10) from 1994 and onward [9]. For the included patients we extracted discharge diagnoses from the previous 10 years to generate a Charlson comorbidity score and grouped it to form the Charlson Comorbidity Index for each patient enrolled in the study, as a marker for comorbid illness [10].
Other registers and databases. Data were supplemented by information from Odense Pharmacoepidemiological Database, the laboratory information system at Department of Clinical Microbiology at Odense University Hospital, the Danish National Cancer Register, as well as the Danish National Alcohol-and Drug Treatment Register, with the aim to identify patients with immunosuppression, community-acquired bacteremia and alcoholism-related conditions [11,12]. Data on birth, deaths and migration status were obtained from the Civil Registration System in Denmark [13].

Definitions
In order to categorise the discharge diagnoses into sites of infection, we reviewed all ICD-10 diagnoses aggregated from the entire admission of all patients admitted to the medical ED within the study period. The ICD-10 diagnoses indicating the presence of infection are presented in Appendix S2. If a discharge diagnosis was associated with a specific microbe (e.g. A490 Staphylococcal infection, unspecified site) and did not have a specific organ relation, we classified it as the presence of infection and unknown site of infection. For additional definitions on immunosuppression, alcoholism-related conditions, organ dysfunction, comorbidity and systemic inflammatory response syndrome, see Appendix S3.

Analysis
We assessed the validity of discharge diagnoses indicating infection, both as a crude binary measure (infection yes/no) and stratified by site of infection, using sensitivity, specificity, likelihood ratios and predictive values.
The diagnosis of infection was extracted from chart review and compared to the discharge diagnoses. The chart review was considered the gold standard. The overlap between these two approaches, as well as the tentative diagnoses from the medical ED, was illustrated in an area-proportional Euler diagram [14].
Baseline patient characteristics were presented as the proportion of all eligible patients, and the proportion of patients with infection according to chart review and ICD-10 discharge diagnoses.
Positive likelihood ratio was defined as the probability that a patient with infection had a discharge diagnosis indicating infection, divided by the probability that a patient without infection had a discharge diagnosis indicating infection (sensitivity/[1-specificity]). Negative likelihood ratio was defined as the probability that a patient with infection had no discharge diagnosis indicating infection, divided by the probability that a patient without infection had no discharge diagnosis indicating infection ([1-sensitivity]/specificity). 95% confidence intervals for predictive values, likelihood ratios, sensitivity and specificity were calculated assuming normal approximation of the binomial distribution.
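The definitions above can be sketched in a few lines of Python. This is an illustrative sketch, not the Stata code used in the study: the function names are hypothetical, the counts in the example are invented, and the log-scale interval used for the likelihood ratios is an assumption, since the text specifies only a normal approximation.

```python
import math


def wald_ci(successes, total, z=1.96):
    """Proportion with a normal-approximation (Wald) confidence interval."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def log_ratio_ci(ratio, se_log, z=1.96):
    """Ratio with a confidence interval computed on the log scale."""
    return ratio, ratio * math.exp(-z * se_log), ratio * math.exp(z * se_log)


def validity_measures(tp, fp, fn, tn, z=1.96):
    """Validity of a discharge diagnosis against chart review (gold standard).

    tp: infection on chart review, infection code present
    fp: no infection on chart review, infection code present
    fn: infection on chart review, no infection code
    tn: no infection on chart review, no infection code
    All four cells must be non-zero for the likelihood-ratio intervals.
    """
    infected = tp + fn
    not_infected = fp + tn

    sens = wald_ci(tp, infected, z)
    spec = wald_ci(tn, not_infected, z)

    lr_pos = sens[0] / (1 - spec[0])   # sensitivity / (1 - specificity)
    lr_neg = (1 - sens[0]) / spec[0]   # (1 - sensitivity) / specificity

    # Standard errors of log(LR); the log-scale method is an assumption here.
    se_pos = math.sqrt(1 / tp - 1 / infected + 1 / fp - 1 / not_infected)
    se_neg = math.sqrt(1 / fn - 1 / infected + 1 / tn - 1 / not_infected)

    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": wald_ci(tp, tp + fp, z),
        "npv": wald_ci(tn, tn + fn, z),
        "lr_positive": log_ratio_ci(lr_pos, se_pos, z),
        "lr_negative": log_ratio_ci(lr_neg, se_neg, z),
    }


# Hypothetical two-by-two table, NOT study data.
example = validity_measures(tp=80, fp=10, fn=20, tn=90)
for name, (estimate, low, high) in example.items():
    print(f"{name}: {estimate:.3f} (95% CI {low:.3f} to {high:.3f})")
```

Each entry is returned as an (estimate, lower, upper) tuple, so the same function can be applied to the crude infection yes/no table and to each site-specific table in turn.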
Statistical analysis was performed with Stata version 13.0 (Stata Corporation, Texas, USA).

Participants
A total of 6257 patients had one or more admissions to the medical ED during the study period. Of these, 280 were excluded and 5977 were included (Figure 1). The median age of the included patients was 66 years (5-95% range: 21-91 years). Of the included patients, 2722 (45.5%) were male, and 2002 (33.5%) presented with a Charlson Comorbidity Index >2 (Table 1).
The relations between the diagnosis of infection based on chart review, tentative discharge diagnoses and accumulated discharge diagnoses from the medical emergency department are presented in Figure 2. In total, 3069 patients were identified with an infection by either manual chart review, tentative ICD-10 discharge diagnoses or accumulated ICD-10 discharge diagnoses. The figure shows that 182 (5.9%) patients were registered with a discharge diagnosis of infection after transfer from the medical emergency department to another department at the hospital.

Number and proportion of patients with infection
The most common site of infection was the lower respiratory tract (with and without pneumonia), accounting for about 54% of all registered sites of infection, followed by the urinary tract with 22% and the abdomen with 14% (Figure 3). We found no difference between the two methods in the number of patients identified with each site of infection.
When stratifying on patient baseline characteristics, chart review and ICD-10 codes showed no difference in the number and proportion of infected patients identified (Table 1).
Although the sensitivity of identifying patients with community-acquired infection by discharge diagnoses increased with old age and number of organ failures, the corresponding positive likelihood ratios decreased due to a decreasing specificity. The demographic group with the highest positive likelihood ratio was patients aged 15-39 years (9.8, 95%CI: 7.7-12.5).
Table 2. Estimation of sensitivity, specificity, predictive values and likelihood ratios of the diagnosis codes of infection in patients admitted to the medical emergency department with and without infection.

Discussion
Our study explored the possibility of using ICD-10 discharge diagnoses from health administrative data to identify patients with infections presenting at a medical ED. We found that using discharge diagnoses as surrogate markers of infection gave reliable estimates of numbers and proportions, as well as a high degree of validity when stratifying on the different sites of infection, although they had only moderate capability to identify patients with infections in general.
We found that over 40% of all patients presented to the medical ED with a community-acquired infection. The most common site of infection was the lower respiratory tract, which concurs with prior studies [15][16][17][18][19]. The cohort of infected patients identified by discharge diagnoses and that identified by chart review were nearly identical with regard to the prevalence of infection, distribution of patient characteristics, sites of infection and disease severity.
We chose to use the CDC/NHSN criteria as gold standard in our chart review. Since the criteria were primarily developed to survey healthcare-acquired infections, we had to adapt them to community-acquired infections. Some patients had clinically evident infections that did not conform to the CDC/NHSN defined criteria, so we used an alternative predefined definition in these cases, described in Appendix S1. Use of another gold standard might have produced different results, but the CDC/NHSN criteria were chosen because of their widespread use over the last 25 years [6]. A recent study assessed the inter-observer agreement of CDC/NHSN for classifying infections in critically ill patients in an ICU setting and found excellent agreement, but also found that full concordance on all aspects of the diagnosis of a specific infection was rare [20].
We found that one or more discharge diagnoses of infection, accumulated throughout the entire course of admission, were 4.9 times more likely to be registered in patients with a community-acquired infection than in patients without any confirmed infection. Since a positive likelihood ratio greater than 10 means a test is good at ruling in a diagnosis [21], our results indicate that discharge diagnoses as surrogate markers of infection are less well suited to identifying patients admitted to a medical ED with infections in general.
Only a few studies have validated the use of hospital administrative data as a surrogate marker of infections in general. These studies, like the present study, yielded moderate to high positive predictive values (54-90%) [2][3][4][5]. When restricting the diagnoses to specific sites of infection, we found increased and acceptably high positive likelihood ratios and a high degree of validity in almost every site except the lower respiratory tract. This could be due to the classification criteria we used or to a non-specific coding practice in this patient group. Prior studies show similar results of low sensitivity and positive likelihood ratios in diagnosing pneumonia by discharge diagnoses in different patient populations [22][23][24], but these studies did not include lower respiratory tract infections without pneumonia as we did in the present study. In a sub-group analysis, where we divided the lower respiratory tract infections into pneumonia and lower respiratory tract infections without pneumonia, we found a slight decrease in positive likelihood ratios in patients with pneumonia, compared to the combined group of lower respiratory tract infections (data not shown).
Table 4. Sensitivity, specificity, predictive values and likelihood ratios for sites of infection.
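How a likelihood ratio translates into a post-test probability can be shown with a short worked sketch. The function name is hypothetical; the inputs are the chart-review prevalence (42.0%) and the overall likelihood ratios (4.95 and 0.24) reported in this study, and the calculation is the standard odds form of Bayes' theorem rather than an analysis performed by the authors.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Prevalence of infection by chart review and overall likelihood ratios,
# as reported in the Results of this study.
prevalence = 0.42
with_code = post_test_probability(prevalence, 4.95)
without_code = post_test_probability(prevalence, 0.24)

print(f"probability of infection given an infection code:    {with_code:.2f}")
print(f"probability of infection given no infection code:    {without_code:.2f}")
```

The same likelihood ratio gives a different post-test probability in a population with a different prevalence of infection, which is one reason likelihood ratios are reported alongside predictive values.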
If the administrative data is used to identify patients with infection for predictive analysis, it is not only important that the administrative data identify patients with infections and rule out patients without infections, but also that the measures of validity remain constant within patient sub-groups. We found a decreasing positive likelihood ratio with increasing disease severity, older age, severe comorbidity and presence of immunosuppression. These findings indicate a high degree of differential misclassification; hence the discharge diagnoses lead to an overrepresentation of young patients without organ dysfunction and morbidity, compared to results identified by manual chart review. This information is important when planning prognostic studies using administrative data to identify patients with infection.
The observed trend in likelihood ratios could be due to the more complex clinical presentation of older and more comorbid patients at the emergency department, which makes it more difficult to distinguish signs and symptoms of an infection from those of underlying diseases. Søgaard et al. showed similar results in terms of decreasing positive predictive values when validating ICD-10 discharge diagnoses of pleural empyema [25], where younger patients had higher positive predictive values than older patients.

Strengths and limitations
The strength of the study is the large and unselected cohort of acute medical patients. Several studies have assessed the validity of distinct infections by identifying patients from discharge diagnoses and subsequently reviewing charts to confirm the diagnosis [3,25]. This method leaves all patients with an infection, but without a discharge diagnosis of infection, undetected, thus potentially underestimating the "true" prevalence and lacking the ability to report likelihood ratios and sensitivity. Although this study confirms that this method of sampling patients is acceptable when working with total numbers, trends and proportions, as well as site-specific infections, it also illustrates the need for careful case validation if ICD-10 identified patients are used in studies assessing risk factors and prognosis.
Due to the uniformly organised Danish public healthcare system, we could identify all but three patients included in the study. Another strength is the chart review to identify the reference cohort and subsequent validation of this cohort, yielding a kappa value of 0.67. This shows that even when a well-defined classification of infection is applied, it is difficult to obtain a high inter-rater agreement in retrospective studies.
Like some of the prior studies using health administrative data to identify infections, we chose to include all discharge diagnoses from each patient stay as a surrogate marker of infection [19]. Other studies have used tentative admission diagnoses [16] or the primary discharge diagnoses [26], but because of differences in coding traditions across countries, it is difficult to generalise results and incidence rates. In contrast to the United States, coding of discharge diagnoses in Denmark depends on the treating physician, and a new set of codes might be produced every time the patient is transferred between departments. As patients from the open general emergency department or the medical ED are sometimes transferred to other departments before a final diagnosis is established, the open general EDs/medical EDs tend to produce more non-specific symptom-related codes [27]. Despite differences in coding practices, we found positive predictive values comparable with those of studies conducted in the United States [2][3][4][5].
It is possible that we identified hospital-acquired infections in the accumulated discharge diagnoses. However, only 5.9% of all patients had discharge diagnoses of infection added after being transferred from the medical ED to another department, indicating either a very low infection rate or insufficient coding practice of hospital-acquired infections. The current work was a single-centre study from a medical ED at a university hospital; therefore, the results of this study may not be generalisable to other hospitals, surgical departments or intensive care units. However, the hospital serves as the primary (and only) hospital for all residents in the catchment area, which minimises selection bias and probably increases the generalisability to other primary hospitals. The most appropriate design would have been a multicentre, and possibly international, study with a larger sample size.
We chose to use ICD-10 discharge diagnoses of infection based on a review of all discharge diagnoses accumulated throughout the screened patients' course of admission. We based our choice of codes on the adapted ICD-10 version of those by Angus et al. from 2001 [26], but found that they were insufficient regarding some of the codes in our study population, and we therefore included these clinically relevant missing codes. Another definition of ICD-10 diagnoses identifying infections could possibly have affected the main results.

Conclusion
Using ICD-10 discharge diagnoses as surrogate markers of infection yields almost the same prevalence, distribution of sites of infection and distribution of demographic characteristics as chart review. Identifying patients with site-specific infections showed a high degree of validity, but identifying infections in general showed only moderate validity.

Supporting Information
Appendix S1 Definition of infections identified by chart review.