Accuracy of routinely-collected healthcare data for identifying motor neurone disease cases: A systematic review

Background Motor neurone disease (MND) is a rare neurodegenerative condition, with poorly understood aetiology. Large, population-based, prospective cohorts will enable powerful studies of the determinants of MND, provided identification of disease cases is sufficiently accurate. Follow-up in many such studies relies on linkage to routinely-collected health datasets. We systematically evaluated the accuracy of such datasets in identifying MND cases. Methods We performed an electronic search of MEDLINE, EMBASE, Cochrane Library and Web of Science for studies published between 01/01/1990-16/11/2015 that compared MND cases identified in routinely-collected, coded datasets to a reference standard. We recorded study characteristics and two key measures of diagnostic accuracy—positive predictive value (PPV) and sensitivity. We conducted descriptive analyses and quality assessments of included studies. Results Thirteen eligible studies provided 13 estimates of PPV and five estimates of sensitivity. Twelve studies assessed hospital and/or death certificate-derived datasets; one evaluated a primary care dataset. All studies were from high income countries (UK, Europe, USA, Hong Kong). Study methods varied widely, but quality was generally good. PPV estimates ranged from 55–92% and sensitivities from 75–93%. The single (UK-based) study of primary care data reported a PPV of 85%. Conclusions Diagnostic accuracy of routinely-collected health datasets is likely to be sufficient for identifying cases of MND in large-scale prospective epidemiological studies in high income country settings. Primary care datasets, particularly from countries with a widely-accessible national healthcare system, are potentially valuable data sources warranting further investigation.


Introduction
Motor neurone disease (MND) is a rare, rapidly progressive, neurodegenerative disease, which leads to muscle wasting, weakness and usually death within a few years of onset [1]. The aetiology is unclear and at present no cure is available. Further research that extends our current understanding of the aetiology and pathophysiology of the disease is urgently needed to bring us closer to developing effective treatment strategies.
Very large, population-based, prospective studies involving bio-sampling, detailed phenotyping and genotyping are ideal for investigating the determinants of diseases of complex aetiology, including neurodegenerative diseases such as MND. Through identifying sufficiently large numbers of incident cases of disease, such studies can provide adequate statistical power to detect associations of environmental, lifestyle, biological and genetic exposures with disease outcomes. They can also overcome the inherent limitations of retrospective case-control studies, including recall and reverse causation biases. A prominent example of such a study is UK Biobank (UKB), which recruited 500,000 participants aged 40-69 years between 2006 and 2010, and has obtained a wealth of baseline information, stored bio-samples for current and future assays, additional post-recruitment phenotyping, genome-wide genotyping and consent for long-term follow-up. Follow-up of the participants' health is chiefly via linkage to routinely-collected national health datasets such as hospital admissions, death registrations and primary care data. Data from UK Biobank are of substantial relevance to the international research community, since they are available to any bona fide researcher worldwide who wishes to conduct health-related research for the benefit of the public's health [2].
Cohort-wide linkage to routinely-collected health datasets, especially within the context of a universally-available healthcare system such as the UK's National Health Service (NHS), is a comprehensive and cost-efficient method of case identification for large prospective studies such as UKB. For aetiological research, the identification of disease cases within these cohorts must be of sufficient accuracy, with a high positive predictive value (PPV) and reasonable sensitivity. The accuracy of MND coding in these routinely-collected datasets therefore needs to be understood.
PPV refers to the proportion of cases identified by codes in routinely-collected health datasets that are true cases. Sensitivity refers to the proportion of true cases in a population that are identified by using codes in these health datasets. Specificity and negative predictive value (NPV) are less important accuracy measures for case-control comparisons nested within prospective studies as they tend to be high in these situations. In particular, NPV will be high when most individuals in the population do not have the disease in question.
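These four measures can be read directly off a 2x2 table of coded status against true disease status. The sketch below is purely illustrative; the counts are hypothetical, chosen to mimic a rare disease in a large cohort, and are not taken from any included study:

```python
def accuracy_measures(tp, fp, fn, tn):
    """Diagnostic accuracy measures from a 2x2 table.

    tp: coded as MND and truly MND   (true positives)
    fp: coded as MND but not MND     (false positives)
    fn: truly MND but not coded      (false negatives)
    tn: neither coded nor MND        (true negatives)
    """
    return {
        "ppv": tp / (tp + fp),          # proportion of coded cases that are true cases
        "sensitivity": tp / (tp + fn),  # proportion of true cases that are coded
        "specificity": tn / (tn + fp),
        "npv": tn / (tn + fn),
    }

# Hypothetical counts for a rare disease in a 500,000-person cohort.
m = accuracy_measures(tp=80, fp=20, fn=20, tn=499_880)
print(m["ppv"], m["sensitivity"])  # → 0.8 0.8
```

With only 100 of 500,000 participants carrying the code, NPV and specificity are effectively 100% even though PPV is only 80%, which is why this review concentrates on PPV and sensitivity.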
In this study we aimed to systematically review all studies that investigated the accuracy of routinely-collected health datasets in identifying MND cases by comparing coded information to a reference standard.
In this paper we use the term 'motor neurone disease' as an umbrella term for the group of diseases of which amyotrophic lateral sclerosis (ALS) is one subtype, along with others such as progressive bulbar palsy and progressive muscular atrophy. Elsewhere, particularly in North America, the term ALS is used as the overarching term for this set of disorders. This difference in the use of the term ALS should be borne in mind when interpreting the results of studies in this review.

Search strategy
We searched MEDLINE (Ovid), EMBASE (Ovid), CENTRAL (Cochrane Library) and Web of Science (Thomson Reuters) for studies published between 1/1/1990-16/11/2015 that compared MND coding in routinely-collected datasets to a reference standard (see S1 Table for search criteria). We identified additional studies by searching the bibliographies of included studies and from personal communication. Two authors (SH and TW) independently screened all titles, abstracts and potentially relevant full text articles, resolving selection discrepancies through discussion and mutual consensus, and remaining areas of uncertainty through discussion with a third, senior author (CLMS).

Eligibility criteria
Studies had to have been published in a peer-reviewed journal; to have compared routinely-collected, coded datasets using internationally recognised coding systems (e.g. International Classification of Diseases, Read) to a reference standard for MND, based on medical diagnostic review; to have reported PPV, sensitivity or both (or provided data from which these could be calculated); and to have a sample size of ≥10 MND cases (since smaller studies would have limited precision). Studies estimating sensitivity had to have used a population-based reference standard, with comprehensive MND case ascertainment (e.g. a population-based MND register or similar). We did not impose any limitations based on publication language or the country in which the study was conducted.

Data extraction & analysis
Using pre-tested data extraction forms, two authors (SH and TW) independently extracted the following information: first author, publication year, country from which the relevant coded data were obtained, enrolment period, study population characteristics, study size, routine dataset(s) assessed (hospital, deaths, primary care), coding system, codes and coding positions (primary, secondary or any position) used to identify cases, reference standard used, PPV and/ or sensitivity with their 95% confidence intervals (or data to calculate these). Any discrepancies were discussed and resolved with a third, senior author (CLMS). If data required to extract a PPV or sensitivity estimate were unclear or omitted from the published manuscripts, we contacted the original study authors for clarification.
Where appropriate, our approach used features of the methodology developed for systematic reviews of diagnostic test accuracy studies. However, there were key differences. In particular, for many studies that investigated PPV, it was not possible to also calculate sensitivity because the total number of MND cases (true positives and false negatives) in the relevant population was not known. We adapted the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool to evaluate study quality (S1 File) [4]. Two authors (SH & TW) completed the assessments of risk of bias and applicability (relevance to the study question) for the following QUADAS-2 categories: patient selection, source of coded data (including the codes used to identify cases), reference standard and study flow (e.g. whether all cases were accounted for). We assessed the risk of bias and the applicability of studies with respect to our review purpose, not on the quality of the paper in general. We did not exclude studies on the basis of quality.
Where data were available, we calculated 95% confidence intervals for PPV and sensitivity directly. We generated statistical measures of heterogeneity using the I² and chi-squared methods, but we focussed on descriptive assessments of heterogeneity based on evaluating study methodologies. We did not perform a formal meta-analysis as the substantial heterogeneity in methodologies between studies would make any summary measure of PPV or sensitivity potentially misleading. Instead, we performed a descriptive analysis, and considered factors that might influence PPV and sensitivity through visual inspection of the range of values in a forest plot. We also investigated within-study comparisons of PPV values with respect to characteristics reported by at least two studies (age, sex and coding position). We performed statistical analyses with StatsDirect 3 software.
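For readers wishing to reproduce the interval calculations, a minimal sketch follows. The review does not specify the interval method used; the Wilson score interval shown here is one standard, dependency-free choice, and the example counts (41 confirmed cases among 48 with an MND code) are hypothetical:

```python
import math

def wilson_ci(k, n, z=1.96):
    """Wilson score confidence interval for a proportion.

    k successes out of n trials: e.g. k true positives among n
    participants carrying an MND code gives an interval for the PPV;
    z = 1.96 corresponds to a 95% interval.
    """
    p = k / n
    centre = (p + z * z / (2 * n)) / (1 + z * z / n)
    half = (z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
            / (1 + z * z / n))
    return centre - half, centre + half

# Hypothetical example: PPV point estimate ~85% with a wide interval,
# reflecting the small sample sizes typical of these validation studies.
lo, hi = wilson_ci(41, 48)
print(round(lo, 3), round(hi, 3))
```

For small samples an exact (Clopper-Pearson) interval could be substituted; with the sample sizes seen in the included studies the two agree closely.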
Study characteristics
All studies were based in high income countries. Three were from the UK [5,7,9], seven from other European countries [6,8,10,12,13,16,17], two from the USA [14,15] and one from Hong Kong [11]. For studies reporting PPV, sample size (number of participants with an MND code) ranged from 48-433; for those reporting sensitivity, sample size (number of participants known to have MND in the population-based reference standard) ranged from 95-488. The studies were conducted over a range of different time periods. Eight began before 2000, and three of these began prior to 1990. The vast majority of studies included assessment of hospital and/or death certificate-derived datasets, with only one study assessing primary care data [5].
Hospital and death data were coded using various different versions of the World Health Organisation International Classification of Diseases (ICD) system [18] (see Table 3). Based on the codes chosen, studies variably investigated all-cause MND, amyotrophic lateral sclerosis (ALS) or other MND subtypes. The single study that used primary care data did not report which coding system it used [5], but, since the study was UK-based, this is likely to have been the Read coding system, used since 1985 in UK primary care [19].
The broad categories of diagnostic reference standard used were medical record review, presence in an MND patient register and direct patient assessment, although methodological details and diagnostic criteria for case confirmation varied. Four studies [6,11-13] used either the original or revised El Escorial criteria [20,21] to confirm a diagnosis of MND. These criteria require evidence of upper and lower motor neurone involvement, with a progressive spread of the regions affected. Depending on the clinical evidence obtained, the original El Escorial criteria identify five levels of diagnostic certainty: suspected, possible, laboratory-supported probable, probable and definite [20], while the revised El Escorial criteria identify three levels: possible, probable and definite [21]. Studies differed in the diagnostic certainty threshold required (Table 1).

Quality assessment
Studies generally performed well in the subjective quality assessment (S2 Table). We did not consider any to be of high risk of bias or to have substantial applicability concerns. However, we rated 12 of the 13 studies as 'unclear' for at least one category, either because there was insufficient information to assess the category, or because we could not be sure what effect the reported methodology for that category would have on bias or applicability.

Sensitivity of routinely-collected datasets in MND case identification
There were five estimates of sensitivity, two from hospital discharge data [7,8] and three from mortality data [7,16,17] (Fig 3). No studies assessed the sensitivity of primary care data. All of the sensitivities reported were ≥75% and the values were less variable (range: 75-93%) than the PPVs. All studies reporting sensitivity were of high quality in the QUADAS-2 assessment. There was no observable difference between the sensitivity measures arising from death or hospital data. The studies assessing death data reported sensitivities of 75-93% [7,16,17] and those evaluating hospital data reported sensitivities of 79-84% [7,8].

UK-based routinely-collected datasets
The UK NHS provides an ideal substrate for data linkage studies as there is a single provider of healthcare services. Of particular relevance for researchers worldwide using UK Biobank (and other large population-based UK cohorts that have linkage to routinely-collected healthcare datasets), the three UK-based studies reported some of the best performing results with respect to PPV and sensitivity [5,7,9] (Figs 2 and 3) and also scored well on the QUADAS-2 quality assessments.

Within-study comparisons
Four studies conducted within-study analyses of the effects of age, gender or coding position on PPV estimates. However, sufficient data were not available to permit a consistent assessment of the statistical significance of the differences reported. Results are displayed in Table 4. Two studies of hospital data and one of primary care data assessed the effect of age [5,8,12]. While the small primary care study found no difference in PPV between participants aged ≤70 and >70 years [5], the two larger studies of hospital data each reported a decline in PPV above the age of 70 to 75 years [8,12]. One of these reported that PPV increased with advancing age until ultimately falling in the elderly [8].
Two studies of hospital data investigated the effect of the coding position of the recorded MND diagnosis [8,14]. Both found that codes in the primary position had a higher PPV than codes recorded in other positions.

Discussion
There is no widely-agreed level of accuracy required for identifying disease cases for research using routinely-collected health datasets, and acceptable PPV and sensitivity thresholds will differ depending on the specific study purpose. In this systematic review we have shown that although reported accuracy estimates for identifying MND cases from such datasets vary widely, individual datasets often achieve PPV or sensitivity values of ≥80%, and can reach >90%.
False positive cases identified from coded data can be due to diagnostic or administrative errors. Given that, at least in many high income countries, the diagnosis of MND is usually made or confirmed by a specialist [22], we would expect diagnostic error to be low. However, clinical experience suggests that there are many patients in whom the diagnosis of MND is highly likely despite not meeting formal diagnostic criteria. Considering such patients to be 'false positives' in validation studies of coded data may result in falsely low PPV estimates. The sensitivity of coded hospital admissions data for the identification of MND cases will depend on how likely MND patients are to be admitted to hospital during the course of their illness. This is likely to vary by geographic location, with differences in healthcare access and provision. Since MND usually leads to death within a few years of diagnosis, one would expect coded death data to be a sensitive source of MND case identification, as we observed.
Primary care data appears to be a promising source of MND case ascertainment for prospective studies based in countries with universally-accessible primary health care. Primary care in the UK is a free, comprehensive and lifelong service, in which general practitioners (GPs) act as gatekeepers to more specialist services, meaning that most individuals with an active diagnosis are likely to present to primary care at least once. Furthermore, GPs hold comprehensive medical notes for their patients, including correspondence from secondary care, resulting in diagnoses made in secondary care being coded in primary care datasets. Primary care data may therefore prove to be a rich resource for the study of MND epidemiology, particularly in countries without a national MND register. However, since only one small study reported the PPV of MND codes recorded in primary care data [5], and none reported the sensitivity, future investigation of the value of coded primary care data in MND case ascertainment is warranted.
Within-study analyses minimise confounding by variation in study methodology and setting, and so should enable more reliable evaluation of factors affecting the accuracy of case identification than between-study comparisons. However, such analyses were only available for a small number of studies and factors potentially influencing accuracy. While they showed that limiting case ascertainment to those recorded at the primary coding position may increase PPV, this was at the expense of the number of cases identified. In population-based, prospective studies such as UKB, methods of identifying disease cases with a high PPV are generally prioritised over those with a high sensitivity, as the effect of any false negatives (cases that are misidentified as controls) in case-control and case-cohort studies is diluted amongst the very large number of control subjects [23]. However, sensitivity needs to be sufficient to generate large numbers of cases for adequate statistical power as well as to ensure that representative cases are ascertained across the disease spectrum. It is important to strike a balance between the comprehensiveness of case ascertainment (reflected by high sensitivity) and the proportion of the pool of cases identified that are true positives (PPV).
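The dilution argument above can be illustrated with a back-of-the-envelope calculation. All numbers below are hypothetical, chosen only to show the scale of the effect in a cohort of roughly UK Biobank's size:

```python
def odds_ratio(a, b, c, d):
    """a: exposed cases, b: unexposed cases,
    c: exposed controls, d: unexposed controls."""
    return (a * d) / (b * c)

# Hypothetical cohort: 250,000 exposed and 250,000 unexposed participants,
# with 150 and 100 true cases respectively (true OR ~1.5).
true_or = odds_ratio(150, 100, 250_000 - 150, 250_000 - 100)

# With 80% sensitivity, 20% of cases are missed and sit among the controls.
sens = 0.8
a, b = 150 * sens, 100 * sens
observed_or = odds_ratio(a, b, 250_000 - a, 250_000 - b)

# The attenuation is negligible: the misclassified cases are swamped
# by ~250,000 genuine controls in each exposure group.
print(round(true_or, 4), round(observed_or, 4))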

Heterogeneity of accuracy estimates
The wide range of reported PPV and sensitivity measures likely reflects variation both in study methodologies and in the underlying data sources themselves.
The method of case confirmation (reference standard) could influence reported estimations of accuracy. Studies differed in their application of the El Escorial criteria, while subjects that could not be traced were counted as false positives in some studies but excluded from the analysis in others.
The system used to assign codes to diagnoses could also account for some variation. Most included studies assessed data coded using ICD-9 or ICD-10, which differ with respect to coding of MND subtypes: ICD-10 lists only ALS and progressive bulbar palsy as specific subtypes of MND, whereas ICD-9 permits sub-classification of five subtypes. However, variable study methods and characteristics precluded a reliable assessment of the effect of coding system on accuracy. A further issue relates to a problem with MND subgroup coding in an early version of ICD-10 [9], in which the condition progressive supranuclear palsy (PSP) was wrongly given the code G12.2 for MND. This may affect the results of studies coded before this problem was rectified, as patients with PSP would have been given an MND code and then counted as false positives (e.g., Doyle et al. [2012] discovered that 8% of cases with the ICD-10 code G12.2 were miscoded due to this error [9]).
Variation may also arise from the specific codes chosen. Included studies variably investigated MND, ALS, and/or other specific disease sub-types. Studies that used a broad code, such as the ICD-9 335.2, would include rarer subtypes such as progressive muscular atrophy or primary lateral sclerosis (335.21 and 335.24 respectively) in addition to ALS, although the effect of including these very rare subtypes is likely to be minimal, as they are much less common than the ALS variant. More importantly, the choice of code to identify the relevant condition was sometimes inaccurate, leading to misclassification. For example, the ICD-9 code 335.2, which represents a diagnosis of MND, was often used interchangeably with code 335.20, representing ALS. Such usage may have led to the inappropriate classification of some outcomes as false positives within studies, but as clinical information for every possible case was not available, we were unable to determine the effect on results.
Although we cannot estimate their effects quantitatively, these methodological issues are likely to have caused spuriously low as opposed to falsely elevated PPV estimates, suggesting that the results generally represent minimum estimates of PPV.

Increasing the accuracy of case identification
Accurate case ascertainment may be optimised by an algorithm which draws upon multiple sources, to improve both PPV and sensitivity. For example, one study that did not meet the eligibility criteria for this review as it combined routine and non-routine data sources (insurance data, death registrations, reports from local neurologists and records from the ALS Association) achieved an improvement in PPV from 84% with single sources to 98% with combined sources [24]. An additional method of improving PPV might be to only include cases that appear more than once within a dataset or in more than one dataset. Where possible, linkage to robust, comprehensive, national disease registers such as the population-based Scottish MND Register [25] is likely to be a powerful way to increase both sensitivity and PPV.
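The "appears in more than one dataset" rule described above can be sketched as a simple filter. This is a minimal illustration only: the dataset names, code set and two-source threshold below are assumptions for the example, not an algorithm taken from any included study.

```python
# Illustrative MND codes drawn from those discussed in the text
# (ICD-10 G12.2; ICD-9 335.2 and 335.20).
MND_CODES = {"G12.2", "335.2", "335.20"}

def is_probable_case(records, min_sources=2):
    """records: list of (source, code) tuples for one participant.

    Flag a participant as a probable MND case only if an MND code
    appears in at least `min_sources` distinct linked datasets,
    trading some sensitivity for a higher PPV.
    """
    sources = {src for src, code in records if code in MND_CODES}
    return len(sources) >= min_sources

# Hypothetical participants and their linked records.
participants = {
    "A": [("hospital", "G12.2"), ("death", "G12.2")],
    "B": [("hospital", "G12.2")],  # single source: excluded
    "C": [("primary_care", "335.20"), ("hospital", "335.2"), ("death", "G12.2")],
}
cases = [pid for pid, recs in participants.items() if is_probable_case(recs)]
print(cases)  # → ['A', 'C']
```

In practice the threshold, the code list, and whether a disease-register entry alone suffices would all need validating against a reference standard, as the studies in this review did.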

Strengths and limitations
Our review benefits from rigorous methodology, including prospective protocol publication, comprehensive search criteria, and involvement of two independent authors in study screening, quality assessments and data extraction. While some relevant studies may have been missed, our extensive search criteria should minimise this possibility. We included all identified, eligible studies to retain a comprehensive, systematic approach and avoid study selection bias. While including studies of lower quality could theoretically affect our results, such studies did not have extreme PPV or sensitivity values. Publication or selective reporting biases could have influenced our results, since studies showing high accuracy might be published or reported more often than those with lower accuracy. However, such effects are difficult to assess meaningfully in this type of review, and so we did not attempt formally to estimate these potential biases. Lastly, PPV increases with the prevalence of the condition of interest in the study population, meaning that PPVs will tend to be higher for common conditions. We were unable to assess the underlying prevalence of MND across the study populations, but given that MND is generally rare, we believe this is unlikely to have substantially affected variability of our PPV estimates.

Conclusions
In general, PPV and sensitivity of routinely-collected health data in identifying MND cases are likely to be sufficient for many epidemiological studies investigating the determinants of MND. However, in view of the range of reported results, prospective studies may wish to perform their own validation studies to evaluate the PPV and/or sensitivity for their particular study setting and population. For UK Biobank, which has obtained primary care data for many participants, further studies that assess the improvements in accuracy achieved by identifying MND cases through primary care data in addition to hospital and death data will be helpful. In the meantime, scientists interested in using UK Biobank or other UK-based prospective studies with data linkage for MND-related research can be reassured that PPV and sensitivity in UK studies of hospital admissions and death registration data are among the highest reported for MND, while the one UK-based primary care study showed promising results. In view of the different advantages associated with each type of dataset and the additional factors that influence the accuracy of a coded diagnosis of MND, the development of a case identification algorithm based on multiple overlapping sources may be particularly valuable and merits further investigation.