Validity of cerebrovascular ICD-9-CM codes in healthcare administrative databases. The Umbria Data-Value Project

Background Validation of administrative databases for cerebrovascular diseases is crucial for epidemiological, outcome, and health services research. The aim of this study was to validate ICD-9 codes for hemorrhagic or ischemic stroke in administrative databases, so that they can be used for a comprehensive assessment of the burden of disease in terms of major outcomes, such as mortality, hospital readmissions, and use of healthcare resources.
Methods We considered the hospital discharge abstract database of the Umbria Region (890,000 residents). The source population was represented by patients aged >18 discharged from hospital with a diagnosis of hemorrhagic or ischemic stroke between 2012 and 2014, identified using ICD-9-CM codes in primary position. We randomly selected and reviewed medical charts of cases and non-cases from hospitals. For case ascertainment we considered symptoms and instrumental tests reported in the medical charts. Diagnostic accuracy measures were computed using 2x2 tables.
Results We reviewed 767 medical charts for cases and 78 charts for non-cases. Diagnostic accuracy measures were: subarachnoid hemorrhage: sensitivity (SE) 100% (95% CI: 97%-100%), specificity (SP) 96% (90–99), positive predictive value (PPV) 98% (93–100), negative predictive value (NPV) 100% (95–100); intracerebral hemorrhage: SE 100% (97–100), SP 98% (91–100), PPV 98% (94–100), NPV 100% (95–100); other and unspecified intracranial hemorrhage: SE 100% (97–100), SP 96% (90–99), PPV 98% (93–100), NPV 100% (95–100); ischemic stroke due to occlusion and stenosis of precerebral arteries: SE 99% (94–100), SP 66% (57–75), PPV 70% (61–77), NPV 99% (93–100); occlusion of cerebral arteries: SE 100% (97–100), SP 87% (78–93), PPV 91% (84–95), NPV 100% (95–100); acute, but ill-defined, cerebrovascular disease: SE 100% (97–100), SP 78% (69–86), PPV 83% (75–89), NPV 100% (95–100).
Conclusions Case ascertainment for both ischemic and hemorrhagic stroke showed good or high levels of accuracy within the regional healthcare databases in Umbria. This database can confidently be employed for epidemiological, outcome, and health services research related to any type of stroke.


Introduction
In Italy, stroke is one of the leading causes of mortality, and the first cause of long-term disability. The overall Italian population is 60,656,000; the incidence estimate is 73,116 strokes/year and the prevalence estimate is 351,820 strokes, with 75,252 deaths/year due to stroke [1]. Stroke represents a major social and economic problem: the estimated healthcare cost amounts to € 3,195.9 million, or € 53 per capita. In Europe, incidence is expected to increase by about 32% in the next twenty years, mainly due to the aging of the population [2]. Epidemiological data on stroke in Italy are in line with those of other high-income countries [3].
Trends in epidemiology and survival rates of stroke can be assessed using administrative healthcare databases. Administrative databases have the advantage that they can link different sources of information (such as discharge data, prescription and laboratory data), providing a comprehensive understanding of the burden of stroke in terms of important outcomes such as mortality, disability, hospital readmissions, and use of healthcare resources. However, these databases need to be adequately validated by comparing their main content, that is, the diagnosis represented by the International Classification of Diseases (ICD), with another source of information such as the clinical chart [4,5]. In addition, such databases can aid in monitoring adherence to drug therapies, including the use of evidence-based therapies. For these reasons, the validation of available administrative databases for cerebrovascular diseases is crucial, and it represents the first step toward conducting subsequent epidemiological, outcome, and health services research.
The objective of the present study was to evaluate the accuracy of the ICD-9-CM codes in identifying patients with hemorrhagic or ischemic stroke in the administrative database of the Regional Health Authority of Umbria.

Materials and methods
Setting and data source
Administrative database. The regional healthcare administrative database of Umbria gathers information from all hospital admission medical records for all 890,000 residents, including personal demographics, hospital admission and discharge dates, vital status, hospital department, primary and secondary diagnoses, and surgical or diagnostic procedures. In addition, the database records all drug prescriptions listed in the National Drug Formulary, and it allows identification of the prescriber. The regional administrative database has already been used for pharmacoepidemiology and drug-related outcome research [6][7][8][9]. Each resident has a unique identification code within the database, to which the various types of information corresponding to each individual are linked. In Italy, healthcare is provided almost entirely by the Italian National Health System (NHS); therefore, most of a resident's significant healthcare information can be retrieved within the healthcare databases.
Source population. The source population was represented by permanent Umbria Region residents aged 18 or above. Any resident who had been discharged from a hospital with a diagnosis of hemorrhagic or ischemic stroke was considered, with the exclusion of voluntary discharges and inter-hospital transfers. Residents who had been hospitalised outside the regional territory of Umbria were excluded from the analysis.
Case selection and sampling method. The research protocol for this study has been previously published [10]. From the discharge abstract database of Umbria, using a simple randomization method in SAS 9.4, we identified six cohorts of "cases", that is, patients discharged between 2012 and 2014 with a diagnosis of hemorrhagic or ischemic stroke coded in primary position. The six cohorts corresponded to the ICD-9 codes for subarachnoid hemorrhage (code 430), intracerebral hemorrhage (code 431), other and unspecified intracranial hemorrhage (codes 432.x), occlusion and stenosis of precerebral arteries (codes 433.x1), occlusion of cerebral arteries (codes 434.x1), and acute but ill-defined cerebrovascular disease (code 436).
We excluded prevalent cases, i.e. patients who had been discharged from hospital with any of the investigated diagnostic codes, for the same diagnosis, in the five years before the index date. From each cohort, we extracted a random sample of 130 cases (see Statistical analysis for details). At the same time, we identified a cohort of "non-cases", corresponding to patients who had been discharged in the same period with a diagnosis of cerebrovascular disease (ICD-9 codes 390-459), including transient ischemic attack, but without hemorrhagic or ischemic stroke. From this cohort of non-cases, we extracted a random sample of 80 patients. This sample of non-cases was used as the control group for each of the six conditions.
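The sampling step described above can be sketched in a few lines. This is only an illustration in Python, not the SAS 9.4 procedure the authors used; the function name `draw_sample`, the seed value, and the placeholder patient identifiers are all assumptions introduced here. The cohort size of 294 is the one reported for code 430.

```python
import random


def draw_sample(patient_ids, sample_size, seed):
    """Simple random sample without replacement, returned in sorted order."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    ids = list(patient_ids)
    if sample_size >= len(ids):
        # Cohort smaller than the target: every patient is kept.
        return sorted(ids)
    return sorted(rng.sample(ids, sample_size))


# Hypothetical identifiers standing in for the 294 patients of the code 430 cohort.
cohort_430 = [f"P{n:04d}" for n in range(1, 295)]
sample_430 = draw_sample(cohort_430, 130, seed=2015)
print(len(sample_430))
```

Fixing the seed is a design choice that lets the same 130 charts be requested again if the extraction has to be repeated.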

Chart abstraction and case ascertainment
We examined medical charts of cases and non-cases from hospital archives for case ascertainment.
We collected the following information from the medical charts: unique identification patient code, date of birth, gender, dates of hospital admission and discharge, any diagnostic procedure and treatment that contributed to the diagnosis of the disease.
The chart reviewers were physicians appropriately trained in data extraction. A pilot phase was performed in which they independently examined 30 medical charts. The level of agreement between the two reviewers was very high (κ > 0.85). To ensure a higher level of agreement, the results of the pilot phase were discussed by the working group. Disagreements were resolved through the involvement of a third reviewer (GA). Each reviewer independently completed the data extraction using standardized forms.
Symptoms and diagnostic tests were considered for case ascertainment, as described below.
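As an illustration of the agreement statistic mentioned above, the following sketch computes Cohen's kappa for two reviewers. The ratings are invented for the example (30 charts, one disagreement) and are not the pilot data of the study; the function name `cohens_kappa` is likewise an assumption.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who classified the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(ratings_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum(freq_a[label] * freq_b[label] for label in labels) / n**2
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical pilot: 1 = diagnosis confirmed, 0 = not confirmed.
reviewer_1 = [1] * 20 + [0] * 10
reviewer_2 = [1] * 19 + [0] * 11  # differs from reviewer_1 on one chart
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(round(kappa, 3))
```

With these invented ratings kappa equals 38/41, about 0.93, which would fall above the 0.85 threshold quoted in the text.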

Validation criteria
As defined in the study protocol [10], we used the criteria indicated in the published international guidelines regarding hemorrhagic and ischemic stroke to validate the related ICD-9-CM codes. To validate hemorrhagic stroke ICD-9 codes (430, 431, 432.x), we considered the criteria defined by the American Heart Association/American Stroke Association (AHA/ASA) [11,12], while to validate ischemic stroke ICD-9 codes (433.x1, 434.x1, 436) we used the criteria defined by the AHA/ASA [13] and the European Stroke Organization (ESO) [14].
According to the above-mentioned guidelines, for the validation of hemorrhagic and ischemic stroke we considered the corresponding ICD-9 codes valid when both of the following conditions were present: (1) detection of focal lesions by neurological examination; (2) imaging test (CT, MRI, or angiography).
Neuroimaging was the main discriminator between the two types of stroke: the presence of hemorrhagic lesion classified the case as hemorrhagic stroke, while negative imaging for hemorrhage classified the case as ischemic stroke.
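The two-condition rule above can be written as a small decision function. This is a simplified sketch, not the authors' adjudication form: the class `ChartFindings`, its field names, and the result labels are hypothetical, and the sketch does not attempt to distinguish between individual codes within the hemorrhagic or ischemic group.

```python
from dataclasses import dataclass
from typing import Optional

HEMORRHAGIC_PREFIXES = ("430", "431", "432")
ISCHEMIC_PREFIXES = ("433", "434", "436")


@dataclass
class ChartFindings:
    focal_signs: bool              # focal lesions detected by neurological examination
    imaging_result: Optional[str]  # "hemorrhage", "no_hemorrhage", or None if no report


def adjudicate(findings: ChartFindings) -> str:
    """Apply the two conditions, then let neuroimaging decide the stroke type."""
    if not findings.focal_signs or findings.imaging_result is None:
        return "not_confirmed"
    if findings.imaging_result == "hemorrhage":
        return "hemorrhagic"
    if findings.imaging_result == "no_hemorrhage":
        return "ischemic"
    raise ValueError(f"unknown imaging result: {findings.imaging_result!r}")


def code_confirmed(icd9_code: str, findings: ChartFindings) -> bool:
    """True when the chart review supports the stroke type implied by the code."""
    verdict = adjudicate(findings)
    if icd9_code.startswith(HEMORRHAGIC_PREFIXES):
        return verdict == "hemorrhagic"
    if icd9_code.startswith(ISCHEMIC_PREFIXES):
        return verdict == "ischemic"
    raise ValueError(f"code outside the validated set: {icd9_code}")


print(code_confirmed("431", ChartFindings(True, "hemorrhage")))
print(code_confirmed("431", ChartFindings(True, None)))
```

Under this sketch, a chart without an imaging report is never confirmed, which mirrors the false positives attributed in the Results to missing neuroimaging documentation.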

Statistical analysis
As previously published, for each ICD-9 code we calculated a sample of 121 cases in order to obtain an expected sensitivity of 80%, with a half-width of the 95% CI equal to 8% [10]. For specificity, we calculated a sample of 73 non-cases (cerebrovascular disease patients without the diseases of interest) to obtain an expected specificity of 90%, with a half-width of the 95% CI equal to 8%, according to binomial exact calculation [15]. We considered published systematic reviews to derive the expected accuracy measures [4,5]. To allow for potentially missing medical charts, we decided to review a higher number of medical charts than the calculated sample size.
For each ICD-9-CM code we calculated sensitivity and specificity, with their corresponding 95% CI. Sensitivity expressed the proportion of 'true positives' (e.g., patients with subarachnoid hemorrhage classified as positive by both the administrative database and the medical record review) relative to all cases deemed positive by the medical chart review. Specificity expressed the proportion of 'true negatives' (e.g., patients classified as negative for subarachnoid hemorrhage by both the administrative database and the medical record review) relative to all cases deemed negative by the medical chart review. Positive and negative predictive values were also calculated, along with their 95% CI.
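The four measures defined above follow directly from the cells of a 2x2 table. The sketch below is an illustration, not the authors' analysis code. It assumes that the intervals are exact binomial (Clopper-Pearson) intervals, which the text does not state explicitly for the accuracy measures. The cell counts for codes 433.x1 (89 true positives, 39 false positives, 1 false negative, 77 true negatives) are reconstructed here from the counts given in the Results (128 charts with 39 false positives; 78 non-case charts with 1 false negative) and are not quoted from Table 8.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_ci(successes, total, alpha=0.05):
    """Clopper-Pearson interval for a proportion, solved by bisection."""

    def root(increasing_fn):
        # Find the point in [0, 1] where an increasing function crosses alpha / 2.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if increasing_fn(mid) < alpha / 2:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    if successes == 0:
        lower = 0.0
    else:
        # Lower limit p solves P(X >= successes | p) = alpha / 2.
        lower = root(lambda p: 1 - binom_cdf(successes - 1, total, p))
    if successes == total:
        upper = 1.0
    else:
        # Upper limit p solves P(X <= successes | p) = alpha / 2; solved in q = 1 - p.
        upper = 1 - root(lambda q: binom_cdf(successes, total, 1 - q))
    return lower, upper


def accuracy_measures(tp, fp, fn, tn):
    """Return {measure: (estimate, ci_lower, ci_upper)} for a 2x2 table."""
    pairs = {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
    }
    return {name: (x / n, *exact_ci(x, n)) for name, (x, n) in pairs.items()}


# Reconstructed counts for codes 433.x1 (an assumption, see the lead-in).
measures = accuracy_measures(tp=89, fp=39, fn=1, tn=77)
for name, (estimate, low, high) in measures.items():
    print(f"{name}: {estimate:.1%} ({low:.1%} to {high:.1%})")
```

With these counts the point estimates round to 99%, 66%, 70%, and 99%, the values reported in the abstract for occlusion and stenosis of precerebral arteries.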

Reporting
We ensured quality of reporting following the criteria of Standards for Reporting Diagnostic Accuracy (STARD) 2015 [16] (S1 Table).

Ethics statement
Ethics approval was obtained from the Regional Ethics Committee of Umbria (CEAS), registry No 2695/15 of 16/12/2015.

Results
We randomly selected a sample of 130 medical charts for each cohort of cases, and 80 medical charts from the cohort of non-cases.
We retrieved and analysed 78 medical charts for non-cases. Tables 2-7 show the characteristics of the patients for each disease.
As an additional support information file, a minimal anonymized dataset is provided (S1 Dataset).
Diagnostic accuracy measures are reported in Table 8.

Hemorrhagic stroke
Subarachnoid hemorrhage. We identified a cohort of 294 patients with subarachnoid hemorrhage (ICD-9 430), from which we extracted a sample of 130 cases; of these, 129 clinical charts were analysed (one clinical chart was not available). Table 2 shows the basic characteristics of the patients with subarachnoid hemorrhage. The majority of patients were females (58%). Most patients (73%) were >60 years.
Instrumental examinations performed for the diagnosis included computed tomography (CT) of the head for most patients, followed by arteriography of cerebral arteries and magnetic resonance imaging (MRI). Forty percent of patients underwent surgical procedures on intracranial or neck vessels. Twenty percent of patients died during hospital stay. False positives were due to lack of neuroimaging documentation in the medical chart.
Intracerebral hemorrhage. We identified a cohort of 1,259 patients with intracerebral hemorrhage (ICD-9 431), from which we extracted a sample of 130 cases; of these, 125 clinical charts were analysed (five clinical charts were not available). Table 3 shows the basic characteristics of patients with intracerebral hemorrhage. Fifty-one percent of patients were females. Most patients (86%) were >60 years. Almost all patients underwent CT of the head, while about 10% underwent MRI of the brain or arteriography of cerebral arteries (Table 3). Few patients (10%) underwent neurosurgical procedures. Almost one third of patients died during hospitalization.
Two patients were considered false positives: one patient had a negative instrumental (CT or MRI) diagnosis for intracerebral hemorrhage; for the second patient, imaging confirmation of intracerebral hemorrhage (CT or MRI) could not be found in the medical chart (Table 9).
Validity of cerebrovascular ICD-9-CM codes in healthcare administrative databases PLOS ONE | https://doi.org/10.1371/journal.pone.0227653 January 9, 2020
Other and unspecified intracranial hemorrhage. Table 4 shows the basic characteristics of patients with other and unspecified intracranial hemorrhage. The most common ICD-9 subgroup was the code 432.1 (subdural hemorrhage) (94%). Sixty percent of patients were males. Most patients (91%) were 60 years and older. The most frequent diagnostic instrumental examination was CT of the head. Almost half of the patients underwent incision of brain and cerebral meninges.
Three patients were considered false positives: two patients were misclassified because they had a diagnosis of subarachnoid hemorrhage instead of subdural hemorrhage, while for one patient no symptoms or signs were reported in the medical history (Table 9).

Ischemic stroke
Precerebral arteries. We identified a cohort of 468 patients with occlusion and stenosis of precerebral arteries (ICD-9 433.x1), from which we extracted a sample of 130 cases; of these, 128 clinical charts were analysed (two clinical charts were not available). Table 5 shows the basic characteristics of patients with occlusion and stenosis of precerebral arteries. The most common ICD-9 subgroup was the code 433.11 (carotid artery) (93%). Sixty-five percent of patients were males. Most patients (65%) were in the age class 60-79 years. The most frequent diagnostic instrumental examination was CT of the head. Almost half of the patients (45%) underwent endarterectomy of other vessels of head and neck.
Thirty-nine patients were considered false positives. Twenty-nine patients who had been discharged with a stroke diagnosis were asymptomatic and had been admitted for planned carotid endarterectomy; for most of them (26/29), no instrumental confirmation of the disease was found in the medical chart. Seven patients had no diagnostic report of carotid artery ultrasound in the medical chart. Three patients had carotid stenosis <50% (Table 10). A less conservative algorithm that counted patients with a planned hospitalization as true positives led to an increase in specificity from 66% to 86%, and in PPV from 70% to 90%.
One patient in the non-case group was considered a false negative because the patient had carotid stenosis but the medical chart lacked an instrumental examination that could confirm the absence of cerebral infarction.
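The effect of the less conservative algorithm can be checked with simple arithmetic. The text does not state how many patients were reclassified, so the sketch below treats that number as a parameter. The baseline counts (89 true positives, 39 false positives, 77 true negatives) are reconstructed from the figures in this section and are an assumption, as is the helper name `reclassify`. Under that reconstruction, moving 26 planned-admission patients from false positive to true positive reproduces the reported 86% and 90%; this is an inference made here, not a statement from the authors.

```python
def reclassify(tp, fp, tn, moved):
    """Specificity and PPV after counting `moved` false positives as true positives."""
    if not 0 <= moved <= fp:
        raise ValueError("moved must be between 0 and the number of false positives")
    new_tp, new_fp = tp + moved, fp - moved
    specificity = tn / (tn + new_fp)
    ppv = new_tp / (new_tp + new_fp)
    return specificity, ppv


# Reconstructed baseline for codes 433.x1 (assumption, see the lead-in).
baseline = reclassify(tp=89, fp=39, tn=77, moved=0)
relaxed = reclassify(tp=89, fp=39, tn=77, moved=26)
print(f"baseline: specificity {baseline[0]:.0%}, PPV {baseline[1]:.0%}")
print(f"relaxed:  specificity {relaxed[0]:.0%}, PPV {relaxed[1]:.0%}")
```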
Occlusion of cerebral arteries. We identified a cohort of 4,152 patients with occlusion of cerebral arteries (ICD-9 434.x1), from which we extracted a sample of 130 cases; of these, 129 clinical charts were analysed (one clinical chart was not available). Table 6 shows the basic characteristics of patients with occlusion of cerebral arteries. The most common ICD-9 subgroup was the code 434.01 (cerebral thrombosis) (67%). Fifty-three percent of patients were males. Most patients (86%) were 60 years and older. Almost all patients (98%) underwent CT of the head. Few patients (9%) underwent pharmacological procedures.
Twelve patients were considered false positives: eight of them had a negative instrumental diagnosis for cerebral thrombosis or embolism or occlusion of cerebral arteries with cerebral infarction; three patients were misclassified because they had a diagnosis of acute, but ill-defined, cerebrovascular disease; one patient had no instrumental examinations in the clinical chart (Table 10).
Acute, but ill-defined, cerebrovascular disease. We identified a cohort of 278 patients with "acute, ill-defined, cerebrovascular disease" (ICD-9 436), from which we extracted a sample of 130 cases; of these, 127 clinical charts were analysed (three clinical charts were not available). Table 7 shows the basic characteristics of patients with acute, but ill-defined, cerebrovascular disease. Sixty-one percent of patients were females. Most patients (61%) were 80 years and older. The most frequent diagnostic instrumental examination was CT of the head (93%). One fifth of the patients died during hospitalization.
Twenty-two patients were considered false positives. Of these, ten were symptomatic patients with a negative instrumental diagnosis; seven were symptomatic patients for whom instrumental confirmation was not found in the medical chart; three were misclassified because they had a diagnosis of thrombosis with cerebral infarction; lastly, two were asymptomatic patients with instrumental confirmation of chronic ischemia (Table 10).

Discussion
In administrative databases the diagnosis of a given disease is associated with a specific ICD code. Despite some limitations, the ICD code is an important tool designed to map health conditions to corresponding general disease categories, along with specific variations. These codes have the advantage of being widely available and require much lower effort and cost than consulting medical charts [17].
The present study is the first to validate the main ICD-9 codes related to ischemic and hemorrhagic stroke using the Umbria healthcare database. We found an excellent level of accuracy for hemorrhagic stroke and acceptable values of specificity and PPV for ischemic stroke. This is the second study in Italy to validate ICD-9 codes at regional level. Spolaore et al. [18] assessed the accuracy of discharge diagnoses related to stroke criteria and found that the codes 430, 431, 434, and 436 in primary diagnoses had the highest PPVs, ranging from 61% to 78%. The authors used the MONICA criteria [19] and obtained similar results to ours in terms of PPVs. Other studies that measured the validity of stroke-related ICD-9 codes in Italy were limited to a hospital level [20,21], or limited to the validity of ischemic stroke only [20], but the results were substantially comparable to those of our study. Potential discrepancies in terms of accuracy might exist between different healthcare databases, and these can be explained by the validation criteria used for case ascertainment or by inaccuracies in coding the diagnosis. However, validation of administrative databases is context-specific, and the results of the present validation study can be generalized only to the setting of the Umbria population. Hence, the Umbria healthcare database can be used to perform epidemiological and clinical research on health services, in terms of assessment of efficacy, safety and appropriateness of drug prescription and use of medical devices, through cross-linkage of several databases such as hospital discharge records and prescription databases.
The diagnostic accuracy results of our study are in line with those described in the literature of validation studies using analysis of medical records as the gold standard.
A systematic review published in 2012 [4] identified 35 studies covering 1990-2010 that evaluated the validity of algorithms for identifying ischemic and hemorrhagic strokes (intracranial hemorrhage and subarachnoid hemorrhage). All included studies validated administrative coding data in primary and secondary position through abstraction of medical charts. The source population was heterogeneous (i.e. inpatient, outpatient, incident and prevalent cases).
Most studies conducted the validation process using hospitalization databases, though 16 studies evaluated stroke as a cause of death on death certificates and one study reported on outpatient data. In addition, two studies validated stroke diagnosis in a paediatric population.
Criteria for the confirmation of stroke varied widely across the studies. More than half of the studies used a specific set of diagnostic criteria based mostly on the WHO criteria (clinical criteria for stroke with CT negative for lesion) to evaluate the stroke diagnosis.
In terms of accuracy, for most studies evaluating codes 430, 431, or 434.x separately, the reported PPVs were 80% or higher. Most of the studies that validated the 436 code had a PPV ≥ 70%. In contrast, most studies reported low PPV values for the 433.x code, with the exception of a study [22] that evaluated the code 433.x1 separately from the code 433.x0 with a much higher value of PPV (71% compared to 13%). Moreover, they compared algorithms using the primary discharge diagnosis with those using diagnoses in any position (primary and secondary diagnoses) and found less than 10% higher PPVs for algorithms using the primary discharge diagnosis alone.
A more recent systematic review published in 2015 [5] identified 77 studies covering 1976-2015 that evaluated the validity of ICD-9 or ICD-10 codes related to any cerebrovascular disease and found that in more than half of the studies sensitivity was > 82%, specificity > 95% and PPV > 81%. Hospital discharge positions (primary and secondary diagnosis) and type of administrative data (i.e. inpatient, outpatient) were abstracted from the included studies. Most of the studies used chart review as a gold standard: about half used chart review, sometimes in conjunction with unspecified diagnostic criteria, while about 35% used chart review with a specific set of diagnostic criteria, most often the WHO criteria. Finally, about 15% of the studies used regional stroke registers and clinical databases. The review also separately analysed the three codes related to hemorrhagic stroke, while for ischemic stroke most studies analysed the code ICD-9 436 or ICD-10 I64 paired with the ICD-9 434 or ICD-10 I63.
Regarding subarachnoid hemorrhage codes (ICD-9 430 or ICD-10 I60) the PPV was ≥ 86% in 16 of the 26 studies where this was reported. The sensitivity of these codes, reported by 4 studies, ranged from 35% to 95%. Twenty-six studies evaluated the intracerebral hemorrhage codes (ICD-9 431 or ICD-10 I61) and the PPV was ≥ 87% in 16 of the 25 studies reporting on PPV. The sensitivity of these codes, as reported by 3 studies, ranged from 57% to 95%. The codes ICD-9 432 or ICD-10 I62 were less used and evaluated in 15 studies, in which the PPV was ≥ 67% in all but two.
Regarding ischemic stroke the ICD-9 code 433 was evaluated in 19 papers, and in 14 of the 19 the PPV was ≥ 71%. Two studies reported very low measures of sensitivity for ischemic stroke (ICD-9 433). The code for occlusion of cerebral arteries (ICD-9 434 or ICD-10 I63) had a PPV ≥ 82% in 20 of the 27 studies. The sensitivity, available from 6 papers, ranged from 2% to 80%.
Finally, 16 studies examined the validity of ICD-9 436 and 434 together, and the PPV was ≥ 82% in 10 of the 16.

Strengths and limitations
A strength of our study is that we used medical charts as the reference standard to adjudicate cases of hemorrhagic or ischemic stroke. In addition, we relied on a pre-published protocol with no deviation during the study development. To ensure the quality of reporting we followed the STARD 2015 criteria [16] for diagnostic accuracy studies. Lastly, we used detailed and explicit criteria for case ascertainment, and duplicate and independent processes for medical chart review, data abstraction and analysis. We acknowledge that a potential limitation of our study is that we did not evaluate the accuracy of ICD-9 codes located in secondary position. Another limitation of the present study concerns the generalizability of our results to other populations with different demographic characteristics and disease prevalence.
Nonetheless, the study methodology could be replicated in other settings to identify possible differences in diagnostic accuracy results.

Conclusion
In recent years, the Regional Health Authority of Umbria has started research on case definitions of several diseases [23][24][25][26], and has validated ICD-9 codes for several oncological diseases [27][28][29][30]. In the present study, we validated the ICD-9 diagnostic codes for important cerebrovascular diseases using the Regional Health database of Umbria. Findings from the present study suggest that the assessed ICD-9 codes are highly predictive of hemorrhagic stroke, while they may be considered acceptable for ischemic stroke. In conclusion, our results showed that the Umbrian healthcare administrative database, validated for these diseases, can be confidently used for epidemiological, outcome, and health services research.
Supporting information S1