Identifying Parkinson's disease and parkinsonism cases using routinely collected healthcare data: A systematic review

Background Population-based, prospective studies can provide important insights into Parkinson’s disease (PD) and other parkinsonian disorders. Participant follow-up in such studies is often achieved through linkage to routinely collected healthcare datasets. We systematically reviewed the published literature on the accuracy of these datasets for this purpose. Methods We searched four electronic databases for published studies that compared PD and parkinsonism cases identified using routinely collected data to a reference standard. We extracted study characteristics and two accuracy measures: positive predictive value (PPV) and/or sensitivity. Results We identified 18 articles, resulting in 27 measures of PPV and 14 of sensitivity. For PD, PPV ranged from 56–90% in hospital datasets, 53–87% in prescription datasets, 81–90% in primary care datasets and was 67% in mortality datasets. Combining diagnostic and medication codes increased PPV. For parkinsonism, PPV ranged from 36–88% in hospital datasets, 40–74% in prescription datasets, and was 94% in mortality datasets. Sensitivity ranged from 15–73% in single datasets for PD and 43–63% in single datasets for parkinsonism. Conclusions In many settings, routinely collected datasets generate good PPVs and reasonable sensitivities for identifying PD and parkinsonism cases. However, given the wide range of identified accuracy estimates, we recommend cohorts conduct their own context-specific validation studies if existing evidence is lacking. Further research is warranted to investigate primary care and medication datasets, and to develop algorithms that balance a high PPV with acceptable sensitivity.



Introduction
Despite well-established pathological features, the aetiologies of Parkinson's disease (PD) and other parkinsonian conditions remain poorly understood and disease-modifying treatments have proved elusive [1]. Large, prospective, population-based cohort studies with biosample collections (e.g., UK Biobank, German National Cohort, US Precision Medicine Initiative) provide a robust methodological framework with statistical power to investigate the complex interplay between genetic, environmental and lifestyle factors in the aetiology and natural history of neurological disorders such as PD and other parkinsonian disorders [2-4]. Linkage to routinely collected healthcare data (administrative datasets collected primarily for healthcare purposes rather than to address specific research questions [5]) provides an efficient means of long-term follow-up in order to identify large numbers of incident cases in such studies [2]. Furthermore, participant linkage to such datasets can be used in randomised controlled trials as a cost-effective and comprehensive method of follow-up for disease outcomes [6]. These data are coded using systems such as the International Classification of Diseases (ICD) [7], the Systematized Nomenclature of Medicine-Clinical Terms (SNOMED-CT) system [8], and the UK primary care Read system [9].
There are several mechanisms by which inaccuracies can arise when using routinely collected healthcare data to identify PD outcomes. Firstly, false positives (participants who receive a disease code but do not have the disorder) may arise if a clinician incorrectly diagnoses the condition. Given that PD and other parkinsonian disorders are largely clinical diagnoses made without a definitive diagnostic test, there is the potential for diagnostic inaccuracies. Clinicopathological studies have shown discrepancies between clinical diagnoses in life and neuropathological confirmation [10] and there is evidence that accuracy increases when diagnoses are made by movement disorder specialists [11-13]. Secondly, diagnoses may be incorrectly recorded in medical records, or errors may arise during the coding process. Similarly, false negatives (patients who have the condition but do not receive a code) may arise due to underdiagnosis, omission of the diagnosis from the medical records (e.g., because the condition is not the primary reason for hospital admission), or errors during the coding process.
As a result, before such datasets can be used to identify PD and parkinsonism cases in prospective studies, their accuracy must be determined. Important measures are the positive predictive value (PPV, the proportion of those coded positive that are true disease cases) and sensitivity (the proportion of true disease cases that are coded positive). Specificity and negative predictive value (NPV) are less relevant metrics in this setting. A high specificity (the proportion of those without the disease that do not receive a disease code) is important to ensure a high PPV, thereby minimising bias in effect estimates. With an appropriately precise choice of codes, the specificity of routinely collected healthcare data to identify disease cases in population-based studies is usually very high (98-100%) [14,15]. However, in a population-based cohort study where the overall prevalence of a disease is low, a high specificity does not guarantee a high PPV: a large absolute number of people without the disease can be incorrectly classified as disease cases (false positives) even when they make up only a small proportion of all those without the disease (high specificity, low PPV) [16]. NPV, like PPV, is related to disease prevalence and will therefore be high in population-based studies where most individuals do not develop the disease of interest [14].
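The dependence of PPV and NPV on prevalence can be illustrated with a short calculation. The following is a minimal sketch only: the function names and the input values (sensitivity 70%, specificity 99%, prevalences of 0.5% and 20%) are hypothetical and are not taken from any study in this review.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value from sensitivity, specificity and prevalence."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


if __name__ == "__main__":
    # Hypothetical inputs: the same coding accuracy applied at two prevalences.
    for prevalence in (0.005, 0.20):
        print(
            f"prevalence={prevalence:.3f} "
            f"PPV={ppv(0.70, 0.99, prevalence):.3f} "
            f"NPV={npv(0.70, 0.99, prevalence):.3f}"
        )
```

Under these assumed inputs, the same specificity yields a much lower PPV at the lower prevalence, while NPV remains high, which is why specificity and NPV discriminate poorly between datasets in this setting.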
Previous systematic reviews on the accuracy of routine data to identify other neurological diseases such as stroke [14], dementia [17] and motor neurone disease [18] have summarised the existing literature and identified methods by which accuracy can be improved, as well as areas for further evaluation. Here, we systematically reviewed published studies that evaluated the accuracy of routinely collected healthcare data for identifying PD and parkinsonism cases.

Study reporting
We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines for the reporting of this systematic review [19].

Search strategy
We (AS & TW) searched the electronic databases MEDLINE (Ovid), EMBASE (Ovid), CENTRAL (Cochrane Library) and Web of Science (Thomson Reuters) for relevant articles published in any language between 01.01.1990 and 23.06.2017. Our search strategy is outlined in S1 File. We chose the date limits based on our judgement that accuracy estimates from studies published prior to 1990 would have limited current applicability. We did not exclude studies based on the dates covered by the datasets. We also screened bibliographies of included studies and relevant review papers to identify additional publications.

Eligibility criteria
To be included, studies had to have compared codes for PD or parkinsonism from routinely collected healthcare data to a clinical expert-derived reference standard, and to have provided a PPV and/or a sensitivity estimate (or sufficient raw data to calculate these). We excluded studies with <10 coded cases, due to the limited precision of studies below this size [17,18]. Studies reporting sensitivity values had to be population-based (i.e. community-based as opposed to hospital-based) with comprehensive attempts to detect all disease cases. Where multiple studies investigated overlapping populations, we included the study with the larger population size. Where articles assessed more than one dataset or evaluated both PPV and sensitivity, we included these as separate studies. Hereafter, we will refer to published papers as 'articles' and these separate analyses as 'studies'.

Study selection
Two authors (AS and SH) independently screened all titles and abstracts generated by the search, and reviewed full text articles of all potentially eligible studies to determine if the inclusion criteria were met. In the case of disagreement or uncertainty, we reached a consensus through discussion and, where necessary, involvement of a senior third author (CLMS).

Data extraction
Using a standardized form, two authors (TW and ZH) independently extracted the following data from each study: first author; year of publication; time period during which coded data were collected; country of study; study population; average age of disease cases (or, if this was unavailable, the ages of participants at recruitment); study size (defined as the total number of code-positive cases for PPV [true positives plus false positives] and the total number of true disease cases for sensitivity [true positives plus false negatives]); type of routine data used (e.g., hospital admissions, mortality or primary care); coding system and version used; specific codes used to identify cases; diagnostic coding position (e.g. primary or secondary position); parkinsonian subtypes investigated; and the method used to make the reference standard diagnosis.
We recorded the reported PPV and/or sensitivity estimates, as well as any corresponding raw data. After discussion, any remaining queries were resolved with a senior third author (CLMS). When necessary, we contacted study authors to request additional information.

Quality assessment
We adapted the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) [22] tool to evaluate the risk of bias in the estimates of accuracy and any concerns about the applicability of each article to our specific research question (S2 File). Two authors (TW and ZH) independently assigned quality ratings, with any discrepancies resolved through discussion. We performed this evaluation in the context of our specific review question and not as an indication of the overall quality of the articles. We assessed risk of bias at the article level rather than study level, as the methods for each study within an article were very similar. We did not exclude studies based on their quality assessment ratings, but rather considered a given study's results in the context of the article's risk of bias and applicability concerns. Where articles deemed to be at low risk of bias and articles at high risk of bias reported PPV or sensitivity estimates on the same type of dataset, we compared the reported estimates to assess the potential effect of bias on accuracy estimates.

Statistical analysis/data synthesis
We tabulated the extracted data, and calculated 95% confidence intervals for the accuracy measures from the raw data using the Clopper-Pearson (exact) method. Due to substantial heterogeneity in study settings and methodologies, we did not perform a meta-analysis, as we considered any summary estimate to be potentially misleading. Instead, we assessed the full range of results in the context of study methodologies, populations and specific data sources. We also reported any within-study comparisons in which a single variable was changed to examine its effect on PPV or sensitivity. We performed analyses using the statistical software StatsDirect 3.
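For readers wishing to reproduce this type of interval, the Clopper-Pearson (exact) method can be sketched as follows. This is an illustrative standard-library implementation that finds the bounds by bisection on the binomial distribution function; it is not the StatsDirect routine used in the review, and the function names and example counts (17 confirmed cases among 24 code-positive participants) are hypothetical.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    return sum(
        comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(min(k, n) + 1)
    )


def _solve_p(k: int, n: int, target: float, tol: float = 1e-12) -> float:
    """Find p in [0, 1] such that binom_cdf(k, n, p) equals target.

    For fixed k < n the distribution function decreases monotonically
    from 1 to 0 as p increases from 0 to 1, so bisection is sufficient.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if binom_cdf(k, n, mid) > target:
            lo = mid  # distribution function still too large: p must increase
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided (1 - alpha) confidence interval for a proportion k/n."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("require n > 0 and 0 <= k <= n")
    # Lower bound: P(X >= k | p) = alpha / 2, i.e. P(X <= k - 1 | p) = 1 - alpha / 2.
    lower = 0.0 if k == 0 else _solve_p(k - 1, n, 1.0 - alpha / 2.0)
    # Upper bound: P(X <= k | p) = alpha / 2.
    upper = 1.0 if k == n else _solve_p(k, n, alpha / 2.0)
    return lower, upper


if __name__ == "__main__":
    # Hypothetical PPV: 17 true positives among 24 code-positive participants.
    low, high = clopper_pearson(17, 24)
    print(f"PPV = {17 / 24:.3f}, 95% CI {low:.3f} to {high:.3f}")
```

The same function applies to sensitivity by passing the number of true positives as `k` and the number of true disease cases (true positives plus false negatives) as `n`.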

Quality assessment
Only two articles were judged to be of low risk of bias or applicability concerns in the QUADAS-2 assessment [23,24] (S1 Table). Across the risk of bias domains, the most common area of concern was inappropriate or unclear code lists to identify disease cases (10/18), followed by: selection bias (8/18), patient flow (i.e. inappropriate inclusions and exclusions or patients being lost to follow-up) (5/18) and insufficiently rigorous or unclear reference standards (4/18).
One of the two articles judged to be at low risk of bias investigated the PPV of hospital admissions data to identify PD, reporting a PPV of 70.8% [24]. This value fell near the middle of the range reported by other studies (55.5-90.3%), raising the possibility that estimates from studies at the extremes of the range may be influenced by bias.
Several within-study comparisons were available from three studies identifying PD (Table 3) [24,28,29]. Two of these investigated the change in PPV for hospital data to identify PD when algorithms containing additional criteria were used [24,28]. Both showed a moderate increase in PPV if a relevant diagnosis code was recorded more than once, or if a specialist department assigned such a code. One study reported an increase in PPV when only primary position diagnoses were assessed [24]. Another showed that incorporating selected medication codes with diagnosis codes increased the PPV from 76% to 86%, although this was at the expense of reduced case ascertainment [28]. Finally, one study showed that the combination of a diagnostic code in hospital data with a relevant medication code increased the PPV when compared to using either dataset alone (94% versus 87% and 89% respectively) [29].
The two articles with low risk of bias investigated the use of hospital admissions data to identify parkinsonism cases. These articles reported PPVs of 76% [23] and 88% [24], which is consistent with the values reported by other studies judged to be at risk of bias.

Sensitivity
For PD, there were 11 sensitivity estimates in total (Fig 3) [24,37-40]. Of these, nine were sensitivity estimates for mortality data alone, consistently showing that codes in the primary position only gave low sensitivities of 11-23%, rising to 53-60% when codes from any position were included [24,37-40]. A single study reported the sensitivity of hospital data to be 73%, increasing to 83% when hospital and mortality data were combined. There were no sensitivity estimates for primary care or prescription data.
Of the two studies with low risk of bias, one investigated the sensitivity of mortality data, reporting a value of 20%. This was similar to the values reported by other studies deemed at risk of bias, suggesting that the potential bias identified did not significantly affect these estimates.
For parkinsonism, there were three sensitivity estimates, all from one study [24]. Hospital admissions and mortality data combined gave higher sensitivity (71%) compared with either mortality or hospital data alone (43% and 63% respectively).

Discussion
We have demonstrated that existing validation studies show a wide variation in the accuracy of routinely collected healthcare data for the identification of PD and parkinsonism cases. Despite this, in some circumstances, achieving high PPVs is possible. Sensitivity (range 15-73% for PD) is generally lower than PPV (range 36-90%) in single datasets, but is increased by combining data sources.
When using routinely collected datasets to identify disease cases, there will inevitably be a trade-off between PPV and sensitivity [16]. The extent to which cohorts seek to maximise one accuracy metric over another will depend on the specific study setting and research question. For example, studies that rely only on routinely collected data to identify disease cases are likely to desire a high PPV, provided sensitivity is sufficient to ensure statistical power in analyses. In contrast, for studies that use routinely collected data to identify potential cases before going on to validate these cases with a more detailed in-person or medical record review, a high sensitivity will be important. In this review, we found that the sensitivity of mortality data to detect PD using codes in the primary position alone was very low (range 11-23%); however, this markedly improved (range 56-60%) when codes were selected from any position on the death certificate [24,37-40]. No studies in this review investigated the effect of coding position on PPV, but previous studies of dementia and motor neurone disease have shown that selecting cases for whom the disease code was in the primary position consistently led to increased PPVs compared to selecting disease codes from any position [41-44]. However, as with PD, this approach led to the identification of fewer cases, thereby reducing sensitivity [17,18].
The pharmacological treatment of PD is largely focussed on improving motor function and patients are treated with a limited number of drugs. This has allowed antiparkinsonian drugs to be used as 'tracers' in epidemiological studies [45,46]. There are potential problems with using prescription data as a proxy for PD diagnosis. This approach may disproportionately under-identify patients with early stage disease who do not yet require treatment. Also, a response to a trial of dopaminergic drugs may be used as part of the diagnostic assessment in potential PD cases, meaning some patients prescribed antiparkinsonian medications will not be subsequently diagnosed with PD. Furthermore, antiparkinsonian drugs can be prescribed for indications other than PD (such as dopamine agonists for restless legs syndrome, endocrine disorders and other forms of parkinsonism). The specific drugs licensed for use in parkinsonian conditions vary between countries and may change over time. Therefore, an algorithm incorporating prescription data would need to be continually revised to match prescribing patterns. Results from our review suggest that prescription data alone have a low PPV for PD case ascertainment [33]; however, when drug codes are combined with diagnostic codes, PPV increases but with reduced case ascertainment [28,32]. Furthermore, prescription datasets appear to have a higher PPV when identifying any parkinsonian disorder rather than specifically PD [33].
This study has several strengths and limitations. Our review benefits from prospective protocol publication, comprehensive search criteria, and independent duplication of each stage by two authors. Despite this, relevant studies may still have been missed, especially if a validation study was a subsection of a paper with a wider aim. As all eligible studies were included, the results may have been influenced by studies of lower quality. Only two articles were found to be at low risk of bias or applicability concerns [23,24], and it is likely that biases in study design would have affected the results. For example, one study with the lowest PPV [35] used very broad ICD-9 codes such as 781.0 (abnormal involuntary movements) and 781.3 (lack of coordination).
Since there is no method of diagnosing PD with certainty in life, there is likely to be some misclassification of the reference standards used in the studies. The application of stringent diagnostic criteria to reference standard diagnoses, although often necessary for research purposes, may lead to some patients being misclassified as 'false positives' when they do in fact have the condition. This may lead to underestimation of the PPV in some of the studies. When considering the ideal reference standard for validation studies, there is a trade-off between the robustness of the reference standard and validating sufficient cases to produce precise accuracy estimates. For example, in-person neurological examination may have greater diagnostic certainty than medical record review but this becomes difficult as the cohort size increases. Some of the variation in the reported results, therefore, is likely to be due to differences in how stringently different studies applied their reference standards.
Many of the studies reported cases with insufficient information to meet the reference standard and the handling of these varied. Some studies excluded such cases, others classified them as false positives, while some did not specify how they handled such missing data. Excluding such cases may introduce selection bias, whereas counting them as false positives may underestimate PPV.
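The effect of these handling choices on a PPV estimate can be made concrete with a simple calculation. The sketch below is illustrative only; the function name and the counts (70 confirmed cases, 20 refuted cases and 10 cases with insufficient information) are hypothetical and do not come from any included study.

```python
def ppv_under_handling(true_pos: int, false_pos: int, indeterminate: int) -> dict:
    """PPV under three ways of handling code-positive cases that lack
    sufficient information to be judged against the reference standard."""
    total = true_pos + false_pos + indeterminate
    return {
        # Indeterminate cases dropped from numerator and denominator.
        "excluded": true_pos / (true_pos + false_pos),
        # Indeterminate cases all assumed not to have the disease.
        "counted_as_false_positive": true_pos / total,
        # Indeterminate cases all assumed to have the disease.
        "counted_as_true_positive": (true_pos + indeterminate) / total,
    }


if __name__ == "__main__":
    # Hypothetical validation sample of 100 code-positive participants.
    for handling, value in ppv_under_handling(70, 20, 10).items():
        print(f"{handling}: {value:.3f}")
```

Reporting the two extreme assumptions alongside the estimate that excludes indeterminate cases gives readers a range within which the PPV must lie for a given validation sample.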
The effect of possible publication bias on the results is difficult to estimate, but disproportionate publication of studies which report more favourable accuracy measures may lead to over-estimation of the performance of the codes. In addition, estimates of PPV are dependent upon the prevalence of the condition in the study population but it was not possible to assess the prevalence of PD within each study population.
Our review highlights several areas requiring further research. Given that the management of PD is largely delivered in outpatients or the community, primary care data may be an effective method of identifying cases. Whilst studies have suggested that PD diagnoses made in primary care are less accurate than those made in a specialist setting [47,48], primary care records combine notes made by primary care clinicians with prescription records and correspondence from secondary care. Codes from primary care should therefore include diagnoses made by specialists, thus increasing their accuracy. We found only one small study of primary care data, reporting a promising PPV of 81%, improving to 90% with the inclusion of medication codes [32]. No studies investigated the sensitivity of primary care data. Further research into the accuracy of primary care data is needed.
Two studies investigated using algorithmic combinations of codes from different sources to improve PPV [24,28]. These investigated the additional benefit of the inclusion of factors such as only including codes that appeared more than once, selecting codes in the primary position only, combining diagnostic codes with prescription data, and only including diagnoses made in specialist clinics. These methods increased PPV but at a cost to the number of cases identified. The development of algorithms that maximise PPV whilst maintaining a reasonable sensitivity (e.g., by combining multiple complementary datasets) merits further evaluation.
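A cohort evaluating such algorithms might compare candidate rules against a reference standard along the following lines. This is a minimal sketch under stated assumptions: the `Participant` fields, the three rules and the eight-person toy dataset are all hypothetical, and none reproduces an algorithm from the included studies.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class Participant:
    pid: str
    n_diagnosis_codes: int   # PD diagnosis codes found in hospital data
    has_pd_medication: bool  # any antiparkinsonian prescription on record
    true_case: bool          # reference standard diagnosis


def any_code(p: Participant) -> bool:
    return p.n_diagnosis_codes >= 1


def repeated_code(p: Participant) -> bool:
    return p.n_diagnosis_codes >= 2


def code_and_medication(p: Participant) -> bool:
    return p.n_diagnosis_codes >= 1 and p.has_pd_medication


def evaluate(
    participants: Iterable[Participant], rule: Callable[[Participant], bool]
) -> Tuple[float, float]:
    """Return (PPV, sensitivity) of a case-finding rule."""
    people = list(participants)
    tp = sum(1 for p in people if rule(p) and p.true_case)
    fp = sum(1 for p in people if rule(p) and not p.true_case)
    fn = sum(1 for p in people if not rule(p) and p.true_case)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    return ppv, sensitivity


# Hypothetical toy cohort, for illustration only.
COHORT = [
    Participant("A", 2, True, True),
    Participant("B", 3, True, True),
    Participant("C", 1, False, True),
    Participant("D", 0, True, True),
    Participant("E", 1, False, False),
    Participant("F", 1, True, False),
    Participant("G", 1, False, False),
    Participant("H", 0, False, False),
]

if __name__ == "__main__":
    for rule in (any_code, repeated_code, code_and_medication):
        ppv, sensitivity = evaluate(COHORT, rule)
        print(f"{rule.__name__}: PPV={ppv:.2f}, sensitivity={sensitivity:.2f}")
```

In this toy cohort the stricter rules raise PPV and lower sensitivity relative to accepting any single code, mirroring the trade-off described above. Real evaluations would need far larger validation samples and confidence intervals around each estimate.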
To our knowledge, no studies have evaluated the accuracy of routinely collected healthcare data for solely identifying atypical parkinsonian syndromes such as progressive supranuclear palsy (PSP) and multiple system atrophy (MSA). Further work is needed to understand whether these datasets provide a valuable resource for studying these less common diseases.
In conclusion, our review summarises existing knowledge of the accuracy of routinely collected healthcare data for identifying PD and parkinsonism, and highlights approaches to increase accuracy and areas where further research is required. Given the wide range of observed results, prospective cohorts should perform their own validation studies where evidence is lacking for their specific setting.