Low Completeness of Bacteraemia Registration in the Danish National Patient Registry

Bacteraemia is associated with significant morbidity and mortality and timely access to reliable information is essential for health care administrators. Therefore, we investigated the completeness of bacteraemia registration in the Danish National Patient Registry (DNPR) containing hospital discharge diagnoses and surgical procedures for all non-psychiatric patients. As gold standard we identified bacteraemia patients in three defined areas of Denmark (~2.3 million inhabitants) from 2000 through 2011 by use of blood culture data retrieved from electronic microbiology databases. Diagnoses coded according to the International Classification of Diseases, version 10, and surgical procedure codes were retrieved from the DNPR. The codes were categorized into seven groups, ranked a priori according to the likelihood of bacteraemia. Completeness was analysed by contingency tables, for all patients and subgroups. We identified 58,139 bacteraemic episodes in 48,450 patients; 37,740 episodes (64.9%) were covered by one or more discharge diagnoses within the seven diagnosis/surgery groups and 18,786 episodes (32.3%) had a code within the highest priority group. Completeness varied substantially according to speciality (from 17.9% for surgical to 36.4% for medical), place of acquisition (from 26.0% for nosocomial to 36.2% for community), and microorganism (from 19.5% for anaerobic Gram-negative bacteria to 36.8% for haemolytic streptococci). The completeness increased from 25.1% in 2000 to 35.1% in 2011. In conclusion, one third of the bacteraemic episodes did not have a relevant diagnosis in the Danish administrative registry recording all non-psychiatric contacts. This source of information should be used cautiously to identify patients with bacteraemia.

Key data included dates of draw and receipt of the BC in the DCM, and BC isolates. We retrieved data on all positive BCs and used previously published computer algorithms to exclude likely contaminants and to derive bacteraemic episodes [21,23]. For each episode we defined the best-estimate baseline date as the date of draw; for bacteraemic episodes with a missing date of draw (9.3%) we used the never-missing date of receipt. We used previously reported computer algorithms to derive incident and non-incident episodes as well as acquisition (community-acquired, healthcare-associated, nosocomial) [21].
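The baseline-date rule above (date of draw, with the date of receipt as fallback) can be sketched as follows. This is only an illustration, not the study's actual code: the function name and the two example records are hypothetical, and the published algorithms for contaminants and episodes [21,23] are not reproduced here.

```python
from datetime import date
from typing import Optional


def best_estimate_baseline(date_of_draw: Optional[date], date_of_receipt: date) -> date:
    """Return the date of draw, falling back to the date of receipt when it is missing."""
    return date_of_draw if date_of_draw is not None else date_of_receipt


# Hypothetical blood culture records: (date of draw, date of receipt).
cultures = [
    (date(2005, 3, 14), date(2005, 3, 15)),
    (None, date(2005, 6, 2)),  # date of draw missing, so the date of receipt is used
]
baselines = [best_estimate_baseline(draw, receipt) for draw, receipt in cultures]
```

Because the date of receipt is described as never missing, the function always returns a date.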

Code systems used in the Danish National Patient Registry
In Denmark, all diagnostic and surgical procedure codes are allocated by physicians when patients are discharged.
Until 1994, diagnoses were coded according to the International Classification of Diseases, version 8 (ICD-8), and thereafter according to the ICD-10, as ICD-9 was never implemented in Denmark [15]. We used the Danish ICD-10 version [24], derived from the WHO classification, version 2010 [25], with amendments that more specifically designate bacteraemia (e.g., A49.9A [Bacteraemia, unspecified], found in the Danish, but not in the WHO, version). For each hospitalization, one obligatory principal diagnosis may be supplemented with up to 20 secondary diagnoses.
For surgical procedures, the Nordic Classification of Surgical Procedures [26] (NOMESCO) has been in use since 1996.

Retrieval of diagnoses and surgical procedures related to sepsis or bacteraemia
Two authors (HCS, SLN) independently retrieved codes for diagnoses and surgical procedures that may indicate the presence of bacteraemia, either directly by codes using the term bacteraemia or septicaemia or indirectly by codes indicating focal infections. The authors' codes were combined and consensus was reached with agreement on all included codes, shown in the Appendix.

Linkage to the Danish National Patient Registry
We linked the patients in the core dataset to their DNPR inpatient data and retrieved the date of hospital admission from home that was closest to, and on or before, the best-estimate baseline date. Likewise, we retrieved the date of discharge to home that was closest to, and on or after, the best-estimate baseline date. For this hospitalization, which covered the bacteraemic episode, we retrieved all relevant diagnosis and surgical procedure codes (see Appendix).
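The bracketing of the baseline date by an admission and a discharge can be sketched as below. This is a literal reading of the rule in the preceding paragraph under simplifying assumptions: the function name and the example dates are hypothetical, and the sketch does not check whether a discharge to home lies between the chosen admission and the baseline date.

```python
from datetime import date
from typing import Iterable, Optional, Tuple


def covering_hospitalization(
    baseline: date,
    admissions_from_home: Iterable[date],
    discharges_to_home: Iterable[date],
) -> Optional[Tuple[date, date]]:
    """Return the (admission, discharge) dates that bracket the baseline date, or None."""
    on_or_before = [d for d in admissions_from_home if d <= baseline]
    on_or_after = [d for d in discharges_to_home if d >= baseline]
    if not on_or_before or not on_or_after:
        return None  # no admission/discharge pair brackets the baseline date
    # Closest admission on or before the baseline, closest discharge on or after it.
    return max(on_or_before), min(on_or_after)


# Hypothetical patient with three hospitalizations.
stay = covering_hospitalization(
    baseline=date(2005, 3, 14),
    admissions_from_home=[date(2005, 1, 2), date(2005, 3, 10), date(2005, 4, 1)],
    discharges_to_home=[date(2005, 1, 9), date(2005, 3, 20), date(2005, 4, 5)],
)
```

Using admissions from home and discharges to home, rather than every departmental contact, means that transfers between departments or hospitals fall inside one hospitalization.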
We then linked the core dataset to the DNPR to retrieve all first-time diagnoses in the Charlson comorbidity index [27] within a 6-year period prior to the best-estimate baseline date. In this index, 19 major disease categories (e.g., malignancy, cardiovascular diseases, and diabetes mellitus) are assigned a score, with higher scores given to prognostically more severe diseases.

Linkage to the Danish Civil Registration System
To obtain mortality data we linked the study population to the Danish Civil Registration System, which comprises daily updated data on the patients' vital status, as well as date of death, disappearance, or emigration, if relevant [28].

Statistical analyses
We categorized the codes for diagnoses and surgical procedures into seven groups and determined the following priority list for the likelihood of representing bacteraemia: 1) infections/bacteraemia; 2) other diagnoses/bacteraemia; 3) other diagnoses/focal infection; 4) surgical procedures/focal infection; 5) infections/systemic infection; 6) other diagnoses/systemic infection; 7) infections/focal infection (see Appendix for the specific codes and text examples for the three most common codes within each group). If group 1 occurred in the hospitalization comprising the bacteraemic episode, groups 2-7 were annulled. If group 1 did not occur we retrieved group 2 and annulled groups 3-7. If group 2 did not occur we proceeded to group 3, etc. (see Appendix for examples for two specific bacteraemic episodes).
We used the same seven groups with a different prioritization (5 > 7 > 3 > 4 > 6 > 1 > 2) to derive diagnoses/procedures that may indicate the presence of a focal infection, applying the same principles as for bacteraemia (if 5 occurred, 1-4 and 6-7 were annulled, if 5 did not occur we proceeded to 7, etc.). See Appendix for examples of two specific bacteraemic episodes.
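The annulment rule in the two preceding paragraphs amounts to keeping the first group in the priority list that occurs in the hospitalization. A minimal sketch follows; the function and constant names are hypothetical, and the mapping from individual ICD-10/NOMESCO codes to groups (given in the Appendix) is assumed to have been applied already.

```python
from typing import Iterable, Optional, Sequence

# Priority orders as stated in the text (group numbers 1-7).
BACTERAEMIA_PRIORITY = (1, 2, 3, 4, 5, 6, 7)
FOCAL_INFECTION_PRIORITY = (5, 7, 3, 4, 6, 1, 2)


def retained_group(groups_present: Iterable[int], priority: Sequence[int]) -> Optional[int]:
    """Return the highest-priority group present; all other groups are annulled."""
    present = set(groups_present)
    for group in priority:
        if group in present:
            return group
    return None  # the hospitalization has no code from any of the seven groups


# Hypothetical episode whose hospitalization has codes from groups 1, 3 and 7.
episode_groups = [1, 3, 7]
bacteraemia_group = retained_group(episode_groups, BACTERAEMIA_PRIORITY)
focal_group = retained_group(episode_groups, FOCAL_INFECTION_PRIORITY)
```

For this episode the bacteraemia ordering retains group 1 and the focal-infection ordering retains group 7, so the same hospitalization can be classified differently depending on which question is asked.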
To assess possible time-related aspects, we depicted a histogram with the proportions of groups 1 and 3-7 on the y-axis and calendar year (2000-2011) on the x-axis.
Finally, we used logistic regression analysis to compute odds ratios (ORs) with 95% confidence intervals (CIs) for 30-day mortality, a commonly used outcome in prognostic bacteraemia studies. We adjusted for the above basic patient characteristics except sepsis groups (due to missing data). The analyses covered all bacteraemic episodes, with subgroup analyses for groups 1-7, group 1, and groups 3-7, during 2000-2008, as data on speciality were incomplete from 2009 onwards. The program Stata (release 13; StataCorp) was used for all analyses.
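For reference, the relation between a logistic regression coefficient and the reported OR can be written as below. This is the standard textbook form, assuming Wald-type intervals; the text does not state how the CIs were computed, and the covariates x_j stand for the adjustment variables without specifying them.

```latex
\operatorname{logit}\,\Pr(\text{death within 30 days}) = \beta_0 + \sum_{j} \beta_j x_j,
\qquad
\mathrm{OR}_j = e^{\hat\beta_j},
\qquad
95\%\ \mathrm{CI}_j = \exp\!\bigl(\hat\beta_j \pm 1.96\,\widehat{\mathrm{SE}}(\hat\beta_j)\bigr)
```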

Ethical considerations
The study was approved by the Danish Data Protection Agency (record nos. 2007-41-0627, 2013-41-2579). Approval by an ethics committee or consent from participants (including next of kin/caregiver in the case of children) is not required for registry-based research in Denmark. Data were not anonymized prior to analysis.

Results
We

Groups of diagnoses and surgical procedures
Among the 58,139 bacteraemic episodes, 37,740 (64.9%) were related to a hospitalization with one or more of the seven diagnoses/surgery groups we defined as indicative of bacteraemia or a focal infection (Table 1). Among these, 18,786 episodes (32.3%) had a group 1 code (an "infection/bacteraemia" diagnosis, i.e., the highest priority codes representing bacteraemia) with a total of 20,433 of such codes (

Characteristics of patients with the most likely bacteraemia codes
The proportion of bacteraemic episodes with a group 1 code (32.3%, Table 3) differed between all subgroups (p < 10⁻⁴) except for gender (p = 0.24). The completeness increased with higher

Characteristics of patients with the most likely focal infection codes
The proportion of bacteraemic episodes with a group 3-7 code (45.7%, Table 3

30-day mortality in multivariate analyses
The ORs (95% CIs) for 30-day mortality generally varied little whether these were computed for all bacteraemic episodes, episodes with a group 1-7 code, episodes with a group 1 code, or episodes with a group 3-7 code (

Discussion
Even with a comprehensive inclusion of diagnostic ICD-10 codes and NOMESCO procedure codes that could indicate either sepsis/bacteraemia or a focal infection, only 64.9% of bacteraemic episodes had at least one of these codes registered in the relevant hospital contact. With restriction to codes that more likely represented bacteraemia, the completeness declined to 32.3%. Our gold standard was bacteraemic episodes derived from positive BCs recorded in electronic laboratory information systems maintained by DCMs, from which we excluded contamination episodes by generally accepted algorithms [21,23]. This, as well as the capture of the majority of positive BCs [17], indicates that our study database represents the greater part of detected bacteraemic episodes within well-defined geographic regions. Bacteraemia is a serious condition [1], which should theoretically encourage its recording in administrative registries. However, many of the diagnoses that most likely capture the aetiological entity bacteraemia actually designate sepsis (Table 2). Sepsis is a clinical entity previously defined as the presence of at least two of the four Systemic Inflammatory Response Syndrome (SIRS) criteria together with infection [5,29], and currently defined more broadly [30]. The correct coding of sepsis is complicated [31-36] and there is no consensus on which code abstraction strategy will correctly capture septic episodes [37-39]. Two prior studies, one Swedish, based on ICD-9 and ICD-10 codes, and one from the US, based on ICD-9 codes, compared code abstraction strategies used to retrieve severe sepsis hospitalizations from administrative registries [6,7]. The number of severe sepsis hospitalizations varied more than three-fold between strategies, which plausibly explains the high variation in the reported incidence of sepsis [37-41].
Most administrative data validation studies either retrieved diagnostic codes from administrative registries followed by validation in randomly sampled medical charts [10,11,42-46] or compared administrative registries to assess concordance [11-14,47-49]. Fewer studies have started from data believed to represent the gold standard and then assessed their completeness in administrative registries [8,9,50-53]. To the best of our knowledge, only the Danish 'predecessor' study that prompted this study has validated the diagnosis of bacteraemia in administrative registries [9]. That study included 406 bacteraemic episodes from 1994 recorded in a prospectively validated research database of positive BCs and clinical assessments [54]. Only 18 episodes (4.4%) were recorded with a bacteraemia/sepsis diagnosis in the DNPR. The DNPR replacement of ICD-8 by ICD-10 in 1994 [15] may be a reason for this low completeness. Analysis of data for our study population using the same 30 ICD-10 codes as in the 1994 study [9] yielded a completeness of 32.0% (data not shown), which is virtually identical to the 32.3% reported here (see Appendix, group 1 codes) and thus represents a notable improvement compared to the 4.4% reported from 1994 [9].
A few studies have validated sepsis in administrative registries, focusing on severe sepsis or septic shock [8,50,51]. Comparison to our study is difficult for several reasons: we do not know how much overlap there is between sepsis and bacteraemia; ICD-9 codes, which may differ substantially from ICD-10 codes [32,55], were used; capture of sepsis varies up to threefold depending on the algorithm [6,7]; or the study included emergency department or ICU patients only [8,51]. A low proportion of bacteraemic episodes recorded with a relevant diagnosis in the DNPR might not pose a problem if the capture were non-selective. This was not the case, however, as proportions varied up to two-fold, e.g., 17.9% for surgical vs. 36.4% for medical patients for group 1 diagnoses. Likewise, completeness was higher with increasing severity of sepsis, as also reported for severe sepsis [8], whereas it declined from community-acquired through healthcare-associated to nosocomial acquisition. The ORs varied less in the multivariate 30-day mortality analyses, although caution is still warranted for some subgroups, e.g., paediatric ward patients. A meta-analysis of 36 severe sepsis trials reported the same declining mortality trend from 1993 through 2009 whether data came from the trials or from administrative registries, although percent mortality differed considerably [56].
The Danish version of the WHO ICD-10 classification [25] is from 1993 [57] and instructions on coding and registration have not been updated since then. The codes are updated on the official Danish web site [24] and coding is facilitated by private entrepreneurs [58]. Most of the codes that cover bacteraemia include "Sepsis" in their designation, with a few exceptions, such as A49.9A [Bacteraemia, unspecified], which accounted for only 2.1% of the Group 1 codes (Table 2). One reason for this may be that A49.9A is not included in the original code book [57], so probably only physicians who are aware of the web amendments [24,58] will use this highly relevant code. A future update could alter the designations in the Group 1 codes from "Sepsis" to "Bacteraemia", as this would be more in accordance with globally accepted definitions [5] and would probably facilitate and increase the recording of bacteraemia in the DNPR.
We used a database which represented clinically important bacteraemic episodes; the study was population-based and included a high number of episodes, which enabled subgroup analyses. However, there were also limitations that warrant further discussion. The main limitation was the inability to report predictive values, as we had no information on the use of the 1,079 codes in the DNPR for patients not having bacteraemia. In the Danish 'predecessor' study, codes that roughly correspond to our group 1 codes were assessed for all patients in the DNPR, which enabled the reporting of positive predictive values (PPVs), found to be 21.7% [9]. PPVs would probably decline with the inclusion of group 1-7 or group 3-7 codes, as the denominator, comprising all patients with any of the said codes, would increase without a corresponding increase in the number of patients with bacteraemia in the numerator. Such low PPVs, and consequently many false positive patients, preclude research on bacteraemia patients based on administrative data. Secondly, we only had clinical data for 4.7% of the bacteraemic episodes. The increasing completeness with higher sepsis severity further indicated the selective recording of patient groups in administrative registries. Thirdly, the multiple ICD-10 codes prompted us to define a limited number of groups based on the likelihood of representing bacteraemia. Although we used a consensus process, the classification of codes and the ranking of groups were subjective. Still, the ordinal scale provided a working solution to the conundrum of 5,040 (7!) ordered sequences of the 7 groups. Finally, some blood cultures with common skin commensals may represent contamination, and not bacteraemia, which may be reflected in the lower completeness for CNS (Table 3). However, for 9,482 bacteraemic episodes (part of the actual study database) we previously reported 94.6% agreement for bacteraemia vs. contamination when computer algorithms were compared to physicians' individual clinical assessments [21]. We therefore believe this had minor impact on the results.
In conclusion, our study showed a low completeness of bacteraemic episodes identified in the official Danish administrative registry used for the recording of all hospital contacts. Further, there were considerable differences in completeness according to whether the acquisition was community, healthcare-associated or nosocomial. Although few studies have shown this for bacteraemia, it is in accordance with findings for sepsis [33,35] and healthcare-associated infections [55,59], for which it has been concluded that their detection should not be based solely on administrative registry data. Hence, bacteraemia studies should preferably be derived from bacteraemia databases based on positive blood cultures [54,60,61].

The three most common diagnoses from Group 1: see article.
With a group prioritization of 1 > 2 > 3 > 4 > 5 > 6 > 7, this bacteraemic episode will be recorded as DA41.0 (Sepsis due to Staphylococcus aureus; Group 1), annulling the three other codes.