The authors have declared that no competing interests exist.
Conceived and designed the experiments: DL JAAZ. Performed the experiments: NM VB DL JAAZ. Analyzed the data: NM VB DL JAAZ. Wrote the paper: NM VB DL JAAZ. Final approval of the version of the manuscript to be published: NM VB DL JAAZ.
To conduct a systematic review of studies reporting on the validity of International Classification of Diseases (ICD) codes for identifying stroke in administrative data.
MEDLINE and EMBASE were searched (inception to February 2015) for studies: (a) Using administrative data to identify stroke; or (b) Evaluating the validity of stroke codes in administrative data; and (c) Reporting validation statistics (sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), or Kappa scores) for stroke, or data sufficient for their calculation. Additional articles were located by hand search (up to February 2015) of original papers. Studies solely evaluating codes for transient ischaemic attack were excluded. Data were extracted by two independent reviewers; article quality was assessed using the Quality Assessment of Diagnostic Accuracy Studies tool.
Seventy-seven studies published from 1976–2015 were included. The sensitivity of ICD-9 430-438/ICD-10 I60-I69 for any cerebrovascular disease was ≥ 82% in most [≥ 50%] studies, and specificity and NPV were both ≥ 95%. The PPV of these codes for any cerebrovascular disease was ≥ 81% in most studies, while the PPV specifically for acute stroke was ≤ 68%. In at least 50% of studies, PPVs were ≥ 93% for subarachnoid haemorrhage (ICD-9 430/ICD-10 I60), 89% for intracerebral haemorrhage (ICD-9 431/ICD-10 I61), and 82% for ischaemic stroke (ICD-9 434/ICD-10 I63 or ICD-9 434&436). For in-hospital deaths, sensitivity was 55%. For cerebrovascular disease or acute stroke as a cause-of-death on death certificates, sensitivity was ≤ 71% in most studies while PPV was ≥ 87%.
While most cases of prevalent cerebrovascular disease can be detected using 430-438/I60-I69 collectively, acute stroke must be defined using more specific codes. Most in-hospital deaths and death certificates with stroke as a cause-of-death correspond to true stroke deaths. Linking vital statistics and hospitalization data may improve the ascertainment of fatal stroke.
Stroke imparts a substantial burden on patients, healthcare systems, and society, with stroke accounting for more than 6.6 million deaths in 2012 (11.9% of all deaths globally) [
Administrative databases are increasingly being used for stroke research. These data sources, which link longitudinal health resource utilization data for hospitalizations, outpatient care, and, in some jurisdictions, dispensed medications, to individual-level demographic and vital statistics data, allow for more efficient analyses and more generalizable findings. Unfortunately, as administrative databases are usually established for billing rather than research purposes, the diagnoses contained within tend to be coded by non-medical staff and may not reflect the final diagnosis of the treating physician. But if these databases are to be used for stroke research, the diagnostic codes used to identify stroke must be valid. This means they must be able to distinguish those who have actually experienced a stroke (according to an accepted ‘gold standard’ reference diagnosis) from those who have not. These diagnostic codes must also allow researchers to distinguish the major subtypes of acute stroke, which differ from one another with regard to their incidence rates, risk factors, and outcomes. For example, haemorrhagic stroke occurs far less frequently than ischaemic stroke [
While several validation studies of stroke codes have been conducted [
An experienced librarian (M-DW) undertook searches of the MEDLINE and EMBASE databases, from inception (1946 and 1974, respectively) for all available peer-reviewed literature. Two search strategies were used: (1) All studies where administrative data was used to identify cardiovascular diseases; (2) All studies reporting on the validity of administrative data for identifying cardiovascular diseases. Our MEDLINE and EMBASE search strategies are available as (
Two reviewers independently screened the titles and abstracts of the located records for relevance to the study objectives. In the next step, full-text publications were evaluated against the inclusion criteria. Any discrepancies were discussed until consensus was reached. When a conflict persisted, a third reviewer (JAA-Z) was consulted. No protocol for this systematic review has been published, though more information is available in the following publication [
We considered full-length, English-language, peer-reviewed articles that used administrative data and either reported validation statistics for the International Classification of Diseases (ICD) codes of interest, or provided sufficient data for their calculation. We first included studies that evaluated at least one code pertaining to a subtype of acute stroke, being ICD-8/9 430 or ICD-10 I60 for subarachnoid haemorrhage (SAH), and ICD-8/9 431 or ICD-10 I61 for intracerebral haemorrhage (ICH). For ischaemic stroke, the main codes are ICD-8 433/434 and ICD-9 434 (occlusion of the cerebral arteries), and ICD-10 I63 (cerebral infarction).
Stroke is a heterogeneous disease that is not defined consistently by clinicians or researchers [
Although our focus was on the validity of codes for acute stroke specifically (defined as SAH, ICH, or ischaemic stroke), we also included studies that evaluated a range of codes (ICD-8/9 430–438 or ICD-10 I60-I69) pertaining to a broader group of cerebrovascular diseases. Included in these ranges were the codes for acute stroke listed above, along with codes for acute but ill-defined stroke (ICD-9 436 and ICD-10 I64), other types of ill-defined stroke (ICD-9 437) and other cerebrovascular diseases (ICD-10 I67/I68), types of intracranial haemorrhage other than ICH (ICD-9 432 and ICD-10 I62), TIA (ICD-9 435), and late effects of stroke or stroke sequelae (ICD-9 438 and ICD-10 I69). It was important to include these studies because, while reviewing the literature, we observed that this broad range of codes for cerebrovascular disease is frequently used to identify cases of acute stroke.
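The code groupings described above can be illustrated with a minimal sketch. The function and constant names are hypothetical and are not part of the review; only codes explicitly named in the text are labelled, and any other code falling within the 430–438 or I60-I69 ranges is returned under a generic placeholder label.

```python
from typing import Optional

SAH = "subarachnoid haemorrhage"
ICH = "intracerebral haemorrhage"
ISCHAEMIC = "ischaemic stroke"
# Placeholder for in-range codes that the text does not itemize (e.g. I65, I66).
UNLISTED = "other cerebrovascular disease (not itemized in the text)"

# Labels follow the wording used in the review's inclusion criteria.
ICD9_CATEGORIES = {
    "430": SAH,
    "431": ICH,
    "432": "other intracranial haemorrhage",
    "433": "occlusion of precerebral arteries",
    "434": ISCHAEMIC,
    "435": "transient ischaemic attack",
    "436": "acute but ill-defined stroke",
    "437": "other ill-defined stroke",
    "438": "late effects of stroke",
}
ICD10_CATEGORIES = {
    "I60": SAH,
    "I61": ICH,
    "I62": "other intracranial haemorrhage",
    "I63": ISCHAEMIC,
    "I64": "acute but ill-defined stroke",
    "I67": "other cerebrovascular disease",
    "I68": "other cerebrovascular disease",
    "I69": "late effects of stroke",
}
# Acute stroke as defined in the review: SAH, ICH, or ischaemic stroke.
ACUTE_STROKE = {SAH, ICH, ISCHAEMIC}


def classify_code(code: str) -> Optional[str]:
    """Return the category of an ICD-9/ICD-10 code, or None if it lies
    outside the cerebrovascular ranges 430-438 / I60-I69."""
    c = code.strip().upper().replace(".", "")
    stem = c[:3]
    if len(stem) < 3:
        return None
    if stem[0] == "I" and stem[1:].isdigit():
        if "I60" <= stem <= "I69":
            return ICD10_CATEGORIES.get(stem, UNLISTED)
        return None
    if stem.isdigit() and 430 <= int(stem) <= 438:
        return ICD9_CATEGORIES.get(stem, UNLISTED)
    return None


def is_acute_stroke(code: str) -> bool:
    """True only for codes mapping to SAH, ICH, or ischaemic stroke."""
    return classify_code(code) in ACUTE_STROKE
```

Under these assumptions, a code such as ICD-10 I69 falls within the broad cerebrovascular range but is not counted as acute stroke, which mirrors the distinction drawn in this section.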
Two independent reviewers (NM and VB) examined the full text of each selected record and abstracted data using a standardized collection form (a copy is provided in
The design and methods employed in each study, including the rigour of the reference standard, and generalizability of the study population, could influence the resultant validity statistics. Hence, all studies were evaluated for quality, with the validation statistics stratified by level of study quality. An adaptation of the QUADAS tool (Quality Assessment of Diagnostic Accuracy Studies) [
All validation statistics were abstracted as reported. Where sufficient data were available we calculated 95% confidence intervals (95% CI) and validation statistics not directly reported in the original publication. Kappa values (a measure of agreement beyond that expected by chance) greater than 0.60 indicated substantial/perfect agreement, 0.21–0.60 were considered as fair/moderate agreement and those 0.20 or lower as slight/poor agreement [
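As an illustration of how such statistics can be derived from a 2×2 table of coded versus reference diagnoses, the following is a minimal sketch. The function names are hypothetical, and the Wilson score interval is an assumption made for the example; the review does not state which confidence interval method was used.

```python
import math
from typing import Dict, Tuple


def wilson_ci(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (assumed method, z=1.96 for 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def validation_stats(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV from 2x2 counts
    (code present/absent versus reference diagnosis present/absent)."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def cohen_kappa(tp: int, fp: int, fn: int, tn: int) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return (observed - expected) / (1 - expected)


def interpret_kappa(kappa: float) -> str:
    """Agreement bands as stated in the text above."""
    if kappa > 0.60:
        return "substantial/perfect"
    if kappa > 0.20:
        return "fair/moderate"
    return "slight/poor"
```

For example, with hypothetical counts of 80 true positives, 20 false positives, 10 false negatives, and 890 true negatives, `validation_stats(80, 20, 10, 890)` gives a PPV of 0.80, and `wilson_ci(80, 100)` gives an interval of roughly 0.71 to 0.87 around it.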
We identified 1,587 citations through our original searches (inception to November 2010) of the MEDLINE and EMBASE databases, and an additional 2,160 citations in our updated searches of these databases (January 2010 to February 2015). All citations were screened for relevance to our study objectives, with 198 full-text articles assessed for eligibility (
ICD = International Classification of Diseases.
Of the 77 articles evaluating stroke diagnoses that were included in the final review, 31 (40%) were from the United States (USA), 26 (34%) were from Europe, 13 (17%) were from Canada, four (5%) were from Asia, two (3%) were from Australia, and one (1%) was from Sri Lanka. Characteristics of these studies are presented in
Year(s) of Data Collection | Primary Validation Study? | Country | Records Evaluated (N) | Source Population | Type of Administrative Data | Gold Standard | |
---|---|---|---|---|---|---|---|
2004–2008 | yes | France | 903 | residents of one community hospitalized for stroke at one teaching hospital | ICD-10 inpatient records | disease registry, using WHO criteria | |
1993–2003 | no | USA (California) | 1,307 | children aged 0–19 years enrolled in the Kaiser Permanente Medical Care Program and participating in the Kaiser Paediatric Stroke Study | ICD-9 inpatient and outpatient records | CRMD | |
1999–2000 | yes | Sweden | 377 | residents of one community | ICD-10 inpatient and vital statistics records | disease registry, using WHO criteria | |
1999–2000 | yes | Canada (Ontario) | 616 | all patients discharged from one tertiary hospital with a bleeding-related or thromboembolic diagnosis | ICD-9 inpatient records | CRDC | |
1992 | yes | USA (Louisiana, Massachusetts, California, Iowa, Pennsylvania) | 649 | patients hospitalized at one of five academic medical centres and eligible for a telephone survey of persons at increased risk for major stroke | ICD-9 inpatient records | CRDC—WHO criteria | |
1998–1999 | yes | USA (national) | 23,657 | Medicare beneficiaries (aged 20–105 years) on the National Registry of Atrial Fibrillation hospitalized for atrial fibrillation | ICD-9 inpatient records | chart review | |
1998–1999 | yes | USA (national) | 1,176 | individuals regularly receiving care from one of 10 Veterans Affairs sites across the USA, random selection of 100 users from each site with hypertension and 20 without | ICD-9 outpatient records | chart review | |
1993–1995 | no | USA (Ohio, Kentucky) | 733 | all residents of one of five counties | ICD-9 inpatient and vital statistics records | CRDC–Rochester, Minnesota and National Institute of Neurological Disorders and Stroke | |
2000–2001 | yes | USA (Texas) | 186 | participants aged 44 years and older enrolled in the Brain Attack Surveillance in Corpus Christi (BASIC) Project | ICD-10 vital statistics records | CRMD | |
2003 | yes | Canada (Alberta) | 4,008 | general hospitalized population | ICD-10 inpatient records | chart review | |
1999 | yes | Taiwan | 372 | hospitalized patients aged 55 years and older | ICD-9 inpatient records | CRMD | |
n/a | yes | Scotland | 97,515 | hospitalized patients at one university teaching hospital | ICD-9 inpatient records | disease registry: Lothian Stroke Register | |
1961–73 | yes | Sweden | 1,156 | 10,000 pairs of twins enrolled in the Swedish Twin Registry and born during 1901–1925 | ICD (1965 edition) vital statistics records | CRMD | |
1980–1991 | no | USA (Rhode Island, Massachusetts) | 3,811 | residents of two communities aged 35–74 years | ICD-9 inpatient records | CRMD | |
1994–1996 | yes | Norway | 759 | hospitalized patients aged 15 years and older | ICD-9 inpatient records | disease registry, using WHO criteria | |
1977–1995 | no | Denmark | 191 | patients hospitalized at one university hospital or two other hospitals within one county | ICD-8 and ICD-10 inpatient records | CRMD | |
2003–2007 | yes | Australia | 570 | hospitalized patients admitted through the emergency department and diagnosed upon admission with TIA | ICD-10 inpatient records | chart review | |
1995–1997 | yes | USA (North Carolina) | 175 | hospitalized patients at one Veterans Affairs Medical Center | ICD-9 inpatient records | CRDC—TOAST criteria | |
1999–2004 | yes | USA (Indiana) | 663 | all inpatients and outpatients seen at one Children's Hospital | ICD-9 inpatient and outpatient records | CRMD | |
2006–2007 | yes | France | 329 | patients ≥ 18 years of age admitted to one of four university hospitals | ICD-10 inpatient records | disease registry: AVC69 cohort | |
1993 | yes | Wales | 166 | patients admitted to the Department of the Care of the Elderly at one of four hospitals within one health unit | ICD-9 inpatient records | CRMD | |
1994–2000 | yes | USA (national) | 34,016 | women participating in the Women's Health Initiative clinical and observational studies | ICD-9 inpatient records | CRDC—Women's Health Initiative criteria | |
1998–99, 2000–01 | yes | Australia | 14,635 | all hospitalized patients (excluding same-day chemotherapy and dialysis) | ICD-10 inpatient records | chart review: charts were re-coded by professional coders | |
2002–2007 | yes | Canada | 1,292 | patients hospitalized at one of four hospitals | ICD-10 inpatient records | chart review: charts were re-coded by nurses with coding experience | |
2003–2007 | no | USA | 132 | new users of atomoxetine or stimulant ADHD medications, and general population controls, identified from a health insurance database for a study assessing the association between atomoxetine and stroke in adults | ICD-9 inpatient records | CRMD | |
2006–2008 | yes | Taiwan | 1,736 | patients hospitalized at one tertiary referral centre | ICD-9 inpatient records | disease registry (Taiwan Stroke Registry) and CRMD | |
1994–1995 | yes | Canada (British Columbia) | 817 | patients hospitalized for percutaneous coronary intervention | ICD-9 inpatient records | chart review | |
1970 & 1980 | yes | USA (Minnesota) | 214 | residents of the study area aged 30–74 years who died in hospital, identified as part of the Minnesota Heart Survey | ICD-8 and ICD-9 vital statistics records | CRDC–National Survey of Stroke | |
1989–1992 | yes | USA (California, Maryland, North Carolina, Pennsylvania) | 5,201 | participants in the population-based Cardiovascular Health Study aged 65 years or older | ICD-9 inpatient and vital statistics records | CRMD | |
1993–1999 | yes | Denmark | 565 | participants in a population-based cohort study on diet and cancer development aged 50–64 years at enrollment | ICD-10 inpatient records | CRDC—WHO criteria | |
1987–2010 | yes | USA (Maryland, Minnesota, Mississippi, North Carolina) | 4,260 | members of the population-based Atherosclerosis Risk in Community (ARIC) Study cohort, aged 45–64 years at the time of study enrollment | ICD-9 inpatient records | CRDC–National Survey of Stroke, AHA/ASA | |
2002–2007 | yes | United Kingdom | 2,147 | all hospitalized patients residing in the study area | ICD-10 inpatient records | chart review: mentioned in records | |
1978–1996 | no | USA (California) | 3,441 | members of a prepaid healthcare program who supplied data on voluntary health examinations | ICD-9 inpatient records | CRMD | |
2000–2003 | yes | Canada (Alberta) | 717 | hospitalized patients at three centres | ICD-9 and ICD-10 inpatient records | chart review | |
2004 | yes | Sweden | 3,534 | residents aged 20 years and older of two Swedish counties covered by the MONICA register | ICD-10 inpatient, outpatient, and vital statistics records | disease registry—MONICA | |
1998–1999 | yes | Denmark | 236 | enrollees in the population-based Copenhagen City Heart Study | ICD-10 inpatient records | CRDC—WHO criteria | |
2003–2009 | yes | USA (national) | 15,089 | participants in the REasons for Geographic And Racial Differences in Stroke (REGARDS) study, aged ≥ 65 years with at least one month of Medicare eligibility | ICD-9 inpatient records | CRDC–WHO criteria | |
1980, 1985, 1990, 1995, 2000 | no | USA (Minnesota) | 6,032 | general population aged 30–74 years | ICD-9 inpatient records | CRDC–WHO and Minnesota Stroke Survey criteria, and neuroimaging | |
1993–2007 | yes | USA (national) | 48,877 | participants enrolled in the observational Women’s Health Initiative studies aged 50–79 years at enrollment with Medicare fee-for-service coverage | ICD-9 inpatient records | CRDC–Women's Health Initiative criteria | |
2002–2006 | yes | Canada (Quebec) | 1,982 | patients hospitalized with MI as a principal diagnosis, or who underwent PCI or CABG, at one of 13 primary, secondary, or tertiary hospitals | ICD-9 inpatient records | chart review; mentioned in records | |
1997–1999 | yes | Canada (Ontario) | 1,592 | hospitalized individuals <105 years of age coded with a primary/most responsible diagnosis of heart failure | ICD-9 inpatient records | CRDC—Charlson comorbidity index | |
1970, 1980, 1984, 1989 | yes | USA (Minnesota) | 377 | all hospitalized patients residing in the study area | ICD-A and ICD-9 inpatient hospitalizations | disease registry—Rochester Stroke Registry | |
1991–2002 | yes | USA (Missouri) | 571 | kidney transplant recipients aged ≥ 18 years who had Medicare as their primary insurer at transplant and the time of each clinical event | ICD-9 inpatient records | clinical database | |
1998 | yes | Italy | 1,126 | hospitalized patients from the Neurology, Neurosurgery, General Medicine, Cardiac Surgery, and Intensive Care departments at one centre | ICD-9 inpatient records | CRDC—WHO criteria | |
1985–1989, 1992 | yes | Finland | 593 | male smokers enrolled in a population-based, randomized controlled trial of alpha-tocopherol and beta-carotene supplementation, aged 50–69 years at registration | ICD-8 and ICD-9 inpatient and vital statistics records | CRDC—National Survey of Stroke and MONICA criteria | |
1994 | yes | Canada (Quebec) | 224 | individuals aged ≥ 65 years discharged alive with a primary diagnosis of myocardial infarction | ICD-9 inpatient records | chart review: mentioned in records | |
1977–1987 | yes | Sweden | 413 | participants in a hypertension registry (hypertensive cases, normotensive participants, randomly-selected controls), recruited from a geographical half of one Swedish county aged 40–70 years at registration | ICD-9 inpatient and vital statistics records | CRMD | |
1990–1991 | yes | Canada (Saskatchewan) | 1,494 | patients hospitalized at one of three tertiary-care, or three community, hospitals | ICD-9 inpatient records | CRDC—National Survey of Stroke criteria | |
n/a | yes | Canada (Quebec) | 96 | patients hospitalized at five teaching, affiliated, and community hospitals | ICD-9 inpatient records | CRMD | |
2001–2009 | yes | USA (Colorado) | 4,689 | enrollees in a managed care organization aged 18 years and older | ICD-9 inpatient and outpatient records | CRDC–Rochester, Minnesota Stroke Study criteria | |
1992–1995 | yes | USA (Washington) | 471 | enrollees of a Health Maintenance Organization with diabetes aged 18 years and older | ICD-9 inpatient and outpatient records | chart review | |
1997–1999 | no | Italy | 8,000 | general population aged 35–74 years residing in one of eight regions of Italy | ICD-9 inpatient and vital statistics records | CRDC—MONICA criteria | |
1988–1989 | yes | Canada | 301 | patients hospitalized at one teaching hospital | ICD-9 inpatient and vital statistics records | CRDC–WHO criteria | |
2000 | yes | USA (Texas) | 815 | as part of the Brain Attack Surveillance in Corpus Christi (BASIC) Project, patients ≥ 45 years admitted to one of six area hospitals | ICD-9 inpatient records | CRDC–MONICA criteria | |
2009 | yes | Spain | 400 | patients hospitalized at one of two public hospitals | ICD-9 inpatient records | CRMD | |
2006–2008 | yes | Sri Lanka | 648 | deaths occurring at three large hospitals in/near the capital city | ICD-10 vital statistics records | CRMD | |
2002 | yes | China | 2,917 | deaths occurring in health facilities located in one of six large cities | ICD-10 vital statistics records | CRMD | |
1998–1999 | yes | USA (national) | 671 | patients hospitalized at 11 Veterans Affairs Medical Centres | ICD-9 inpatient records | CRDC—WHO criteria | |
1985–1992 | yes | Italy | 193 | deaths occurring among residents of one municipality | ICD-9 vital statistics records | CRDC–WHO criteria | |
1999 | yes | Italy | 233 | hospitalized patients at one centre | ICD-9 inpatient records | prospective clinical examination and retrospective CRDC—WHO criteria | |
1987–1995 | no | USA (Maryland, Minnesota, Mississippi, North Carolina) | 1,185 | members of the population-based Atherosclerosis Risk in Community (ARIC) Study cohort, aged 45–64 years at the time of study enrollment | ICD-9 inpatient records | CRDC–National Survey of Stroke | |
1999–2003 | yes | USA (Tennessee) | 231 | Medicaid enrollees aged 50–84 years, identified as part of a larger retrospective cohort study on the relationship between NSAID use and stroke | ICD-9 inpatient records | CRMD | |
1980, 1985, 1990 | no | USA (Minnesota) | 2,939 | general population aged 30–74 years | ICD-9 inpatient records | CRDC–WHO, Minnesota Stroke Survey | |
2007–2009 | yes | USA (Minnesota) | 240 | retrospective cohorts of patients ≥ 18 years of age admitted to the intensive care unit | ICD-9 inpatient and outpatient records | CRMD | |
1993–2003 | yes | United Kingdom | 250 | residents in one community aged 40–79 years and enrolled in a population-based study of the determinants of chronic disease | ICD-10 inpatient records | CRDC—WHO criteria | |
2003 | yes | Canada (Alberta) | 193 | patients hospitalized for myocardial infarction | ICD-9 and -10 inpatient records | chart review | |
2003 | yes | Scotland | 3,219 | participants in a population-based cohort study of chronic kidney disease | ICD-10 inpatient records | CRMD | |
1999 | yes | Italy | 4,015 | general hospitalized population | ICD-9 inpatient records | CRDC—MONICA criteria | |
1985–1989 | yes | Sweden | 6,000 | residents of the two provinces included in the Northern Sweden MONICA study aged 25–74 years | ICD-9 inpatient and vital statistics records | disease registry—MONICA | |
1984–1986 | yes | Poland | 213 | residents of two city districts covered by the POL-MONICA Warsaw Project aged 25–64 years | ICD-9 vital statistics records | disease registry—MONICA | |
2006–2010 | yes | USA (Alabama, Massachusetts, Pennsylvania) | 1,812 | patients with atrial fibrillation hospitalized at one of three medical centres | ICD-9 inpatient records | CRDC–WHO, AHA | |
1990–1996 | yes | USA (Washington) | 206 | general hospitalized population ≥ 20 years of age | ICD-9 inpatient records | CRMD | |
1993–1998 | yes | Finland | 3,633 | general population aged 25 years and older | ICD-9 and -10 inpatient and vital statistics records | disease registry—FINMONICA/FINSTROKE register | |
2011 | yes | Canada | 5,000 | individuals aged ≥ 20 years seen by a family practice physician using the EMRALD EMR system | ICD-10 inpatient and outpatient records | CRMD | |
2002–2004 | yes | USA (national) | 200 | commercially-insured individuals in a large health claims database, identified as part of a larger retrospective observational cohort study on the risk of serious adverse events among users of selective coxibs and non-over-the-counter NSAIDs | ICD-9 inpatient records | CRMD | |
2009–2010 | yes | Denmark | 228 | Part 1: individuals ≥ 18 years admitted to hospital; Part 2: patients discharged from one of four neurologic wards | ICD-10 inpatient records | CRDC–WHO criteria | |
2004–2005 | yes | Taiwan | 15,574 | individuals aged ≥ 12 years whose households were randomly selected for participation in the 2005 Taiwan National Health Interview Survey | ICD-9 inpatient and outpatient records | patient self-report |
CRDC = Chart Review, Diagnostic Criteria–the charts of potential cases were reviewed, and a formal set of diagnostic criteria was applied when evaluating cases; CRMD = Chart Review, Medical Doctor–the charts of potential cases were reviewed by a physician, who evaluated cases using their clinical judgment or an otherwise unspecified set of criteria; AHA = American Heart Association; ASA = American Stroke Association; CABG = coronary artery bypass graft; EMRALD = Electronic Medical Record Administrative Data Linked Database; ICD = International Classification of Diseases; MONICA = MONItoring Trends and Determinants in CArdiovascular Disease; NSAID = non-steroidal anti-inflammatory drug; PCI = percutaneous coronary intervention; TOAST = Trial of ORG 10172 in Acute Stroke Treatment; TIA = transient ischaemic attack; WHO = World Health Organization.
Chart reviews, sometimes in conjunction with unspecified diagnostic criteria, formed the basis of the gold standard in 35 studies, patient self-report was used in one [
Study quality was evaluated based on the QUADAS tool [
The validation statistics reported by each of the included studies are provided in
Eight of the 16 studies [
The sensitivity of these codes for the narrower diagnosis of acute stroke (SAH, ICH, or ischaemic stroke), which was reported by ten studies [
Twenty-seven papers reported on the validity of codes for SAH (ICD-9 430 or ICD-10 I60), and the PPV was ≥ 86% in 16 of the 26 studies where this was reported (
Thirty-four studies evaluated the validity of codes for ICH (ICD-9 431/432 or ICD-10 I61/62) (
We located 39 papers that examined the validity of codes for ischaemic stroke (
The combination of ICD-9 433 and 434 (occlusion of the precerebral or cerebral arteries) was reported on by 20 studies [
Nineteen studies [
Thirty-six papers examined the validity of a set of stroke-specific codes for identifying any type of stroke (
The sensitivity of these sets of codes for stroke was ≥ 82% in 13 of 22 studies [
The 77 studies included in this review were published over a 40-year period (1976–2015), though 81% of these (n = 62) were published from 1999 onwards. Few studies reported on any longitudinal changes in sensitivity, and few longitudinal trends in PPV were observed after the 77 studies were stratified by period of publication. For instance, amongst the 27 studies reporting on the PPV of ICD-9 434/ICD-10 I63, the PPV ranged from 64% to 100% in the eight earliest studies (published from 1993 to 1998), from 72% to 100% in the ten middle studies (published from 1999 to 2004), and from 62% to 100% in the nine most recent studies (published from 2005 to 2015). Among the 26 studies reporting on ICD-9 431/ICD-10 I61, the PPV ranged from 66% to 100% in the 12 earlier studies (published from 1993 to 2002) and from 65% to 100% in the 13 more recent studies (published from 2004 to 2014). Still, several investigators collected data over ten or more years, and some improvements were observed in the PPV for stroke over time. In one study [
The accuracy of fatal and non-fatal stroke diagnoses was compared in nine studies [
Most studies examined codes from the ICD 8th and 9th revisions, though 22 studies examined codes from the 10th revision. Just one of these studies [
In performing what is (to our knowledge) the broadest systematic review ever conducted on the validity of stroke diagnoses in administrative data, we observed high PPVs for codes pertaining to the different subtypes of acute stroke. The PPV of SAH codes for an SAH diagnosis was ≥ 93% in most studies, that for the main ischaemic stroke codes was ≥ 82%, and the PPV of the main ICH codes for an ICH diagnosis was ≥ 89%. For diagnoses of fatal stroke, the PPV was ≥ 87% in most studies. The validity of the group of ICD codes corresponding to cerebrovascular disease in general (ICD-9 430–438 and ICD-10 I60-I69) was also generally good; sensitivity was ≥ 82% in half the studies where this was reported, specificity and NPV were ≥ 95%, and the PPV of these codes against the broader reference standard of ‘any cerebrovascular disease’ was ≥ 81% in most studies. However, the PPV was lower (≤ 68% in 12 of 21 studies) when the reference standard was restricted to acute stroke (defined as SAH, ICH, or ischaemic stroke). Given these findings, we conclude that most diagnoses of fatal stroke in administrative data correspond to true stroke deaths, and that the presence of any code from 430–438 or I60-I69 can be used to rule in the diagnosis of prevalent cerebrovascular disease. We also conclude that administrative data can be used to identify cases of acute stroke, as long as extraneous codes (i.e. ICD-9 432, 435, 437, and 438; ICD-10 I62, I67, I68, and I69) are excluded.
Only a few studies evaluated the sensitivity of individual codes for stroke but from these, it appears the sensitivity of the main ICD-9 code for ischaemic stroke (434) is suboptimal. However, findings from some studies included in this review suggest that adding ICD-9 code 433 (occlusion of precerebral arteries), and/or 436 (acute but ill-defined stroke) to the search algorithm can help capture more cases of ischaemic stroke at little cost to the PPV. Further support is provided by the fact that the PPV of ICD-9 436 for ischaemic stroke in most applicable studies was ≥ 75%. Cases of ischaemic stroke appear to be coded as “acute but ill-defined stroke” much more often than haemorrhagic strokes are. For example, of all strokes that were coded initially as ill-defined, Krarup
The findings of our review are consistent with those of a systematic review of algorithms for identifying acute stroke or TIA in administrative data that was published in 2012 [
The PPV of diagnostic codes for stroke appears to have improved over the decades, increasing by about 20% in relative terms in one study (from about 51% of stroke discharges identified in 1980 to about 61% of those identified in 1990 being confirmed as stroke, an absolute gain of about 10 percentage points) [
Regional differences in access to and use of neuroimaging may explain other disparities that were observed in the validity of stroke diagnoses, even amongst studies conducted more recently (i.e. 1990 or later). Tolonen
It is possible that changes in billing and reimbursement practices, including the introduction of Diagnosis-Related Groups (DRGs) in the US Medicare program in the 1980s, may also have contributed to the assignment of more precise stroke codes over time. To investigate this, Derby
In our systematic review of the validity of myocardial infarction (MI) diagnoses in administrative data [
Many of the studies that reported on the validity of ICD-9 codes 430–438, or ICD-10 codes I60-69, as a group were examining how well these codes performed for detecting preexisting cerebrovascular disease as a comorbidity. Searching for any one of the codes in this group is appropriate when seeking to identify and adjust for preexisting cerebrovascular disease in the analyses of other clinical conditions. The high sensitivity we observed for these codes, and high PPV they had when compared to the reference diagnosis of ‘any cerebrovascular disease’, provides additional support for this use. However, when studying acute stroke as a primary outcome, this broad group of codes, which includes the codes for non-acute and ill-defined cerebrovascular disease, and the late effects of stroke, should not be used. It is far more difficult to identify risk factors for stroke from this mixture of recent-onset and prevalent cerebrovascular disease because it is unclear which characteristics may have increased the risk of
We acknowledge some limitations to our systematic review. There is the potential for a language bias as we could not consider articles whose full text was not available in English. We were also conservative in our definition of acute stroke, and excluded studies that only reported on the validity of diagnostic codes for TIA. Another potential limitation stems from the fact that, even though our database searches were conducted by an experienced librarian, administrative databases are not well catalogued in MEDLINE and EMBASE (e.g. no MeSH term pertaining to “administrative database”). Although the majority of the included studies were located through database searches, our subsequent hand search turned up other relevant articles that had not been indexed under terms relating to administrative data or validation. As a result, despite our extensive hand search, we may have missed some relevant articles if they were not indexed in MEDLINE or EMBASE under a term relating to administrative data or validation. Our findings are also subject to publication bias, wherein reports of stroke codes having poor validity may have been differentially withheld from publication. We feel this is unlikely, however, given that we did locate reports of case definitions (i.e. ICD-9 432 or 433 individually) whose sensitivity and PPV for acute stroke were suboptimal.
Following our analysis of the evidence, we conclude that the diagnostic codes for acute stroke in administrative databases are valid. In fact, advances in neuroimaging and the increased availability of CT scanners may have helped improve diagnosis and coding of acute stroke subtypes over time. However, it is apparent that researchers have been using a variety of codes to identify acute stroke, some of which have suboptimal validity. Based on current evidence, we provide researchers with several recommendations for the use of diagnostic codes to capture cases of stroke in administrative data. We believe the findings of our review will help guide researchers in their efforts to better understand and decrease the burden of stroke.
1. As a group, the range of codes for cerebrovascular disease (ICD-9 430-438/ICD-10 I60-I69) has good sensitivity (≥ 82%), specificity (≥ 95%), and PPV (≥ 81%) for identifying the aggregate diagnosis of acute or preexisting cerebrovascular disease.
2. Codes that pertain to the diagnosis of acute stroke (ICD-9 430/ICD-10 I60, ICD-9 431/ICD-10 I61, ICD-9 434/ICD-10 I63, and ICD-9 436/ICD-10 I64) are highly predictive of true cases of acute stroke of any type, and of the particular subtype.
⚬ These are the codes that should be used when identifying acute stroke as an outcome, especially in pharmacoepidemiologic and other analyses of risk factors where diagnostic specificity is essential.
3. When searching for cases of ischaemic stroke, including both the code for ischaemic stroke (ICD-9 434/ICD-10 I63) and the code for acute-but-ill-defined stroke (ICD-9 436/ICD-10 I64) in the case definition should help capture more cases of ischaemic stroke with little impact on the PPV.
4. Whether identified from hospitalization or vital statistics data, diagnoses of fatal stroke generally correspond to true deaths from stroke.
5. Hospitalization and vital statistics databases should be linked and searched together in order to maximize the capture of stroke deaths.
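Recommendations 2 and 5 can be sketched together as follows. This is an illustrative example only: the record layout (pairs of person identifier and ICD code) and all function names are hypothetical, and real linkage would also need to handle diagnosis position, dates, and identifier matching.

```python
from typing import Iterable, Set, Tuple

# Acute stroke codes listed in recommendation 2 (three-character stems).
ACUTE_STROKE_ICD9 = {"430", "431", "434", "436"}
ACUTE_STROKE_ICD10 = {"I60", "I61", "I63", "I64"}

Record = Tuple[str, str]  # hypothetical layout: (person_id, icd_code)


def is_acute_stroke_code(code: str) -> bool:
    """Match on the three-character stem, ignoring decimal subdivisions."""
    stem = code.strip().upper().replace(".", "")[:3]
    return stem in ACUTE_STROKE_ICD9 or stem in ACUTE_STROKE_ICD10


def stroke_ids(records: Iterable[Record]) -> Set[str]:
    """Person identifiers with at least one acute stroke code."""
    return {pid for pid, code in records if is_acute_stroke_code(code)}


def fatal_stroke_cases(hospital_deaths: Iterable[Record],
                       death_certificates: Iterable[Record]) -> Set[str]:
    """Union of stroke deaths found in either source, so that a death
    missed by one database can still be captured by the other."""
    return stroke_ids(hospital_deaths) | stroke_ids(death_certificates)
```

In this sketch, a person whose in-hospital death record carries a non-stroke code but whose death certificate lists I64 would still be captured, which is the intent of searching the two sources together.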
The authors wish to thank the members of the CANRAD network; librarian Mary-Doug Wright (B.Sc., M.L.S.) for conducting the literature search; and Reza Torkjazi and Kathryn Reimer for administrative support and help editing the manuscript.