The diagnosis, burden and prognosis of dementia: A record-linkage cohort study in England

Objectives Electronic health records (EHR) might be a useful resource to study the risk factors and clinical care of people with dementia. We sought to determine the diagnostic validity of dementia captured in linked EHR. Methods and findings A cohort of adults was identified in linked primary care, hospital, disease registry and mortality records in England [CALIBER (CArdiovascular disease research using LInked Bespoke studies and Electronic health Records)]. The proportion of individuals with dementia, Alzheimer’s disease, vascular and rare dementia in each data source was determined. A comparison was made of symptoms and care between people with dementia and age-, sex- and general practice-matched controls, using conditional logistic regression. The lifetime risk and prevalence of dementia and mortality rates in people with and without dementia were estimated with random-effects Poisson models. There were 47,386 people with dementia: 12,633 with Alzheimer’s disease, 9,540 with vascular and 1,539 with rare dementia. Seventy-four percent of cases had corroborating evidence of dementia. People with dementia were more likely to live in a deprived area (conditional OR [cOR] = 1.26; 95%CI:1.20–1.31, most vs least deprived), have documented memory impairment (cOR = 11.97; 95%CI:11.24–12.75), falls (cOR = 2.36; 95%CI:2.31–2.41), depression (cOR = 2.03; 95%CI:1.98–2.09) or anxiety (cOR = 1.27; 95%CI:1.23–1.32). The lifetime risk of dementia at age 65 was 9.2% (95%CI:9.0%–9.4%) in men and 14.9% (95%CI:14.7%–15.1%) in women. The population prevalence of recorded dementia increased from 0.3% in 2000 to 0.7% in 2010. A higher mortality rate was observed in people with than without dementia (IRR = 1.56; 95%CI:1.54–1.58). Conclusions Most people with a record of dementia in linked UK EHR had some corroborating evidence for diagnosis. The estimated 10-year risk of dementia was higher than published population-based estimates. EHR are therefore a promising source of data for dementia research.


Introduction
Dementia is a common progressive clinical syndrome that develops slowly over years. Many of those affected are disabled not only by cognitive impairment but also by common co-morbidities of ageing such as stroke, arthritis, and heart disease. The burden of dementia on patients, carers and the health system is substantial, and might increase as populations grow older. [1] Very long follow-up is needed to study risk factors for dementia, because of the long prodrome before suspicion of diagnosis, reverse causality, and because it is likely that different factors affect disease risk at different stages over a lifetime. This means that well-conducted prospective studies with complete follow-up are important, but they are rare and costly. [2,3] More efficient methods to study dementia, on a large scale and with a low dropout rate, would improve our understanding of the pathophysiology of this condition. National electronic health records (EHR) linking primary care, hospital and death records are a potentially important source of data for dementia research. They include people with the severest manifestations of dementia or disability, who might be under-represented in bespoke recruited cohorts because they are difficult to recruit or follow up. They also capture a wide variety of important health events. However, it is uncertain whether EHRs capture dementia with sufficient accuracy and completeness.
Studies of the validity of dementia in EHRs reported positive predictive values for a dementia diagnosis of up to 90% [4,5] (i.e. when a diagnosis is recorded, people usually have dementia). These studies relied on hand searching of individual clinical charts, and therefore had modest sample sizes (<500). EHR might underestimate the proportion of people with dementia (i.e. have a low sensitivity) compared with bespoke cohorts. However, the longitudinal nature of electronic medical records provides multiple opportunities for capture of a dementia diagnosis, and therefore measurement of the lifetime risk of dementia could provide a better measure of the sensitivity of EHRs for dementia diagnosis.
In this study, we sought to determine how dementia is captured in different routinely collected medical data sources; whether characteristic dementia symptoms might improve dementia ascertainment; and to determine the lifetime risk of dementia from these records.

Study population
We studied a cohort of people registered in Clinical Practice Research Datalink (CPRD) general practices between 1st January 1998 and 31st March 2010. Eligible patients were aged 18 or above at study entry and had at least one year of up-to-standard pre-study follow-up. We used the CArdiovascular disease research using LInked Bespoke studies and Electronic health Records (CALIBER) dataset that links individuals in CPRD to national hospital admission and death records. [6] This linked dataset includes 4% of the English population and is broadly representative of the UK population in terms of age, sex, ethnicity and overall mortality. [7][8][9]

Definition of dementia
We identified dementia in the three data sources using the clinical terms (Read version 2) (67 codes), ICD-9 (12 codes) and ICD-10 (36 codes) classification systems (Figure A and Table A in S1 File). We defined dementia as the record of one or more diagnostic codes in any of the three data sources at any time and in any position (i.e. dementia was any of the recorded diagnoses in a hospital admission or death record). We defined people as having corroborating evidence of diagnosis if they had dementia with: (i) more than one record of dementia in the same data source, on different dates; or (ii) a record of dementia in 2 or 3 data sources; or (iii) a record of falls, confusion, memory problems or nursing home admission; or (iv) dementia monitoring codes in primary care; or (v) a referral to a dementia speciality (geriatrics, care of the elderly, psychiatry); or (vi) more than one prescription of rivastigmine, galantamine, donepezil or memantine, which are typically used to treat patients with Alzheimer's disease and sometimes patients with dementia in Parkinson's disease and dementia with Lewy bodies. We additionally classified people with dementia into four subtypes (Alzheimer's disease, vascular, rare and unclassified) using Read and ICD diagnostic codes. Rare dementia included fronto-temporal dementia, dementia with Lewy bodies, Parkinson's, Huntington's, Pick's or Creutzfeldt-Jakob diseases, and HIV-related dementia.
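The corroboration rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DementiaRecord` and `Patient` classes and their field names are hypothetical, the code lists themselves (Table A in S1 File) are not reproduced, and criteria (iii) to (v) are reduced to pre-computed flags.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DementiaRecord:
    source: str  # hypothetical labels: "primary_care", "hospital" or "death"
    day: date    # date of the diagnostic code


@dataclass
class Patient:
    dementia_records: list                     # list of DementiaRecord
    has_symptom_or_nursing_home: bool = False  # criterion (iii)
    has_monitoring_code: bool = False          # criterion (iv)
    has_specialist_referral: bool = False      # criterion (v)
    n_dementia_prescriptions: int = 0          # criterion (vi)


def has_corroborating_evidence(p):
    """Return True if a patient with a dementia code meets any criterion."""
    dates_by_source = {}
    for rec in p.dementia_records:
        dates_by_source.setdefault(rec.source, set()).add(rec.day)
    if not dates_by_source:
        return False  # no dementia code at all, so the rule does not apply
    # (i) more than one record in the same source, on different dates
    repeated = any(len(days) > 1 for days in dates_by_source.values())
    # (ii) a record in 2 or 3 data sources
    multi_source = len(dates_by_source) >= 2
    return (
        repeated
        or multi_source
        or p.has_symptom_or_nursing_home
        or p.has_monitoring_code
        or p.has_specialist_referral
        or p.n_dementia_prescriptions > 1
    )
```

A patient with a single hospital code and no other evidence is not corroborated under this sketch, whereas the same patient with a second code on a different date, or with two prescriptions of a dementia drug, is.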

Selection of comparison group
For each case with dementia identified in primary or hospital care, we randomly selected up to ten people without dementia (concurrent sampling), who were matched on sex, year of birth and general practice. Controls had to be alive and actively registered in the general practice at the date of diagnosis of the matched dementia case, and to have had a contact with the practice within the year before or after the matched index date. A total of 47,151 people with dementia (99.5% of the total identified) were matched. They were followed up until death, transfer out of their primary care practice, or the date of administrative censoring (March 2010).
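As a rough illustration of the control selection described above, the following Python sketch draws up to ten matched controls for one case. The `Person` class, its field names and the `select_controls` helper are hypothetical. The sketch assumes that people with a later dementia diagnosis remain eligible as controls until that diagnosis, and it omits the requirement of a practice contact within a year of the index date.

```python
import random
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Person:
    pid: int
    sex: str
    birth_year: int
    practice: str
    reg_start: date  # start of active registration
    reg_end: date    # death, transfer out or administrative censoring
    dementia_date: Optional[date] = None


def select_controls(case, population, k=10, rng=None):
    """Randomly pick up to k controls matched on sex, birth year and practice."""
    rng = rng or random.Random(0)  # fixed seed only to keep the sketch reproducible
    index = case.dementia_date     # index date = diagnosis date of the case
    eligible = [
        p for p in population
        if p.pid != case.pid
        and (p.sex, p.birth_year, p.practice)
        == (case.sex, case.birth_year, case.practice)
        and p.reg_start <= index <= p.reg_end  # alive and registered at index
        and (p.dementia_date is None or p.dementia_date > index)
    ]
    return rng.sample(eligible, min(k, len(eligible)))
```

When fewer than ten eligible people exist in the matching stratum, all of them are returned, which mirrors the "up to ten" wording of the text.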

Statistical analysis
We described the frequency and proportion of people with dementia and its subtypes, in the linked data and in each data source. We compared symptoms and management characteristics between people with dementia (cases) and matched controls using conditional logistic regression, which takes into account the matched structure of the data and consequently adjusts the results for the matching factors. We measured deprivation with the index of multiple deprivation, and divided the population into fifths based on this measure. [10] We calculated the 10-year and lifetime risks of dementia and Alzheimer's disease according to age and gender, using Kaplan-Meier methods corrected for the competing risk of death and using age as the time-scale. For this analysis we included all registered patients who were alive, registered in the cohort and without any dementia diagnosis at the beginning of follow-up (i.e. the earliest date of study eligibility), for example at their 65th, 75th and 85th birthdays. Follow-up then ended on the date of first recorded dementia diagnosis (for cases) or the earliest of the date of death, practice deregistration or last data collection date in the practice. We then estimated the point prevalence of dementia in the entire cohort on 1st July 2000 and 2005 and on 1st January 2010. We also counted as cases patients who were diagnosed with dementia only at death, when this happened within a year of the analysis time points. Finally, we compared overall and sex-specific mortality rate ratios in people with and without dementia in the matched subset. We performed this analysis using random-effects Poisson models, adjusted for age and sex as appropriate, with age as the time-scale. Cox proportional hazard models were not used because the hazard ratio for mortality was not constant over time. For this analysis the observation period began on the date of first recorded dementia diagnosis of the matched case. All analyses were performed using Stata 13.
The study was registered at www.clinicaltrials.gov (NCT02549872). Approval was granted by the Independent Scientific Advisory Committee of the Medicines and Healthcare products Regulatory Agency (protocol no. 15_138).
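To make the idea of a risk estimate "corrected for competing risk of death" concrete, the Python sketch below computes a cumulative incidence of dementia with age as the time-scale. It is an assumption that this standard cumulative incidence (Aalen-Johansen type) estimator corresponds to the method used; the analyses themselves were run in Stata, and the function name, record layout and event labels here are hypothetical.

```python
def cumulative_incidence(records, horizon):
    """Cumulative incidence of dementia by age `horizon`, with death competing.

    records: list of (entry_age, exit_age, event) tuples, where event is
    "dementia", "death" or "censored". Entry ages allow delayed entry
    (left truncation), as needed when age is the time-scale.
    """
    event_ages = sorted(
        {exit_age for _, exit_age, event in records
         if event != "censored" and exit_age <= horizon}
    )
    event_free = 1.0  # probability of being alive and dementia-free so far
    cif = 0.0         # cumulative incidence of dementia
    for age in event_ages:
        at_risk = sum(1 for entry, exit_age, _ in records
                      if entry < age <= exit_age)
        if at_risk == 0:
            continue
        n_dementia = sum(1 for entry, exit_age, event in records
                         if entry < age == exit_age and event == "dementia")
        n_death = sum(1 for entry, exit_age, event in records
                      if entry < age == exit_age and event == "death")
        # Only people still alive and dementia-free can become new cases.
        cif += event_free * n_dementia / at_risk
        event_free *= 1 - (n_dementia + n_death) / at_risk
    return cif
```

Treating deaths as ordinary censoring in a plain Kaplan-Meier analysis would overstate the risk, because people who die can no longer develop dementia; the running `event_free` term is what prevents that here.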

Role of the funding source
The funding source had no role in the preparation of the manuscript.

Diagnosis ascertainment and data source overlap
We identified 47,386 people with dementia, of whom 34,925 (74%) had corroborating evidence to support their diagnosis (Figure B in S1 File). A total of 22,184 (47%) could be classified into a dementia subtype: 12,633 (27%) had Alzheimer's disease, 9,540 (20%) had vascular dementia and 1,539 (3%) had a rare dementia (Table B in S1 File). Compared to people with unclassified dementia, a greater proportion of those with a specific dementia subtype had corroborating evidence for their diagnosis (82% of Alzheimer's disease, 71% of vascular, 69% of rare and 62% of unclassified subtype).
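The percentages quoted above follow directly from the reported counts, as the short Python check below shows. The dictionary keys are labels chosen for this illustration; the counts are those given in the text.

```python
total = 47_386  # people with any dementia code

counts = {
    "corroborated": 34_925,
    "any subtype": 22_184,
    "Alzheimer's disease": 12_633,
    "vascular": 9_540,
    "rare": 1_539,
}

# Percentage of all people with dementia, rounded to the nearest integer.
percentages = {name: round(100 * n / total) for name, n in counts.items()}
print(percentages)
# {'corroborated': 74, 'any subtype': 47, "Alzheimer's disease": 27, 'vascular': 20, 'rare': 3}

# The three subtype counts add up to more than the number classified.
subtype_sum = counts["Alzheimer's disease"] + counts["vascular"] + counts["rare"]
excess = subtype_sum - counts["any subtype"]
print(subtype_sum, excess)  # 23712 1528
```

The excess of 1,528 is consistent with some people carrying codes for more than one subtype, although that reading is an inference from the counts rather than something stated in this paragraph.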
Of the 47,386 cases of dementia, 55% were captured in primary care, 65% in hospital records, and 26% in the national death register only (Table 1). Overall, 44% of dementia cases were captured in hospital or in the mortality registry but not in primary care, and 23% were captured in primary care records only (Fig 1). In each data source, most people had corroborating evidence of dementia (88% in primary care, 76% in hospital and 83% in the death registry). Overall, the proportion of people with dementia who had corroborating evidence was 74%. Compared with other data sources, a higher proportion of dementia cases captured in primary care were prescribed dementia medication (18% versus 11% in hospital or 9% in the mortality registry), had recorded symptoms (27% versus 23% or 25%) and had evidence of dementia monitoring (40% versus 22% or 22%) or of referral to a relevant speciality (12% versus 10% or 8%).
Overall, a minority of patients with dementia had a primary care record of memory impairment, confusion or admission to a nursing home (23%), a record of dementia monitoring (27%) or had been referred to a geriatrician or care of the elderly psychiatrist (10%). Drugs typically indicated for Alzheimer's dementia, dementia with Lewy bodies or dementia with Parkinson's disease were more commonly prescribed to people with Alzheimer's disease (27%) or rare (26%) dementias than to people with vascular (6%) or with unclassified dementia subtype (5%). Dementia subtypes were broadly consistently recorded across data sources, i.e. only 9% of people with Alzheimer's disease also had vascular codes, and only 12% of those with vascular dementia also had Alzheimer's codes (Fig 2, Table B in S1 File). There were 647 people with 2 or more prescriptions for dementia medication who had no dementia diagnosis in any of the three data sources. The characteristics of this group were similar to those of people identified with dementia, i.e. 32% had dementia symptoms or nursing home admission, 14% were monitored for dementia in primary care, and 14% had been referred to a dementia-relevant speciality.
However, only a minority of people with a diagnosis of dementia had a record of any one of these factors. The proportion of patients with dementia who had missed a GP appointment was similar to patients without dementia (28% vs 27%, OR 1.38, 1.11-1.16), and the annual rate of GP consultation was similar in the two groups (proportions with >5 appointments per year were 36% vs 56%). However, the annual hospital admission rate was higher in people with than without dementia (proportions of >2 admissions per year were 3% vs <1%).

Prevalence of dementia
The prevalence of dementia increased markedly with age and over time, in both men and women (Fig 3). In 2010, in men the prevalence was much higher over 90 (8.7%) than under 50 years (0.2%) (Table C in S1 File). The estimates were substantially higher in women than in men in the oldest age group (14.2% vs. 8.7% in 2010). The prevalence of dementia gradually increased over time in men and women of all ages, for almost all dementia subtypes (Figure C in S1 File) and for diagnoses captured in primary and hospital records (Figure D in S1 File).

Lifetime risks of dementia
Estimates of lifetime risk of dementia and Alzheimer's disease are shown in

Mortality associated with dementia
In total, 159,674 deaths were recorded during a median follow-up of 1.7 years (inter-quartile range 0.6 to 3.4), 34,528 amongst people with dementia and 125,146 amongst those without dementia (Table 4). The incidence rate ratio of mortality for people with dementia compared to individuals without the disease was 1.56 (95% CI:1.54-1.58). Estimates were similar in men and in women.
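For readers less familiar with rate ratios, the Python sketch below shows how a crude (unadjusted) incidence rate ratio and an approximate 95% confidence interval are obtained from deaths and person-years. This is a simpler technique than the random-effects Poisson models used in the study, and the function name is hypothetical. The text does not report person-years, so the study estimate cannot be reproduced this way; any inputs must be supplied by the user.

```python
import math


def crude_rate_ratio(deaths_exposed, pyears_exposed,
                     deaths_unexposed, pyears_unexposed, z=1.96):
    """Crude incidence rate ratio with a Wald interval on the log scale."""
    rate_exposed = deaths_exposed / pyears_exposed
    rate_unexposed = deaths_unexposed / pyears_unexposed
    irr = rate_exposed / rate_unexposed
    # Large-sample standard error of log(IRR) for two Poisson counts.
    se_log = math.sqrt(1 / deaths_exposed + 1 / deaths_unexposed)
    lower = math.exp(math.log(irr) - z * se_log)
    upper = math.exp(math.log(irr) + z * se_log)
    return irr, lower, upper
```

With purely illustrative inputs, 30 deaths over 100 person-years against 20 deaths over 100 person-years gives a rate ratio of 1.5; the interval narrows as the death counts grow, which is why the interval in a cohort of this size is so tight.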

Discussion
Using contemporary, nationally-representative linked primary care, hospital records and the death registry from 2,524,144 people in England and Wales, we identified people with recorded diagnostic codes of dementia and dementia subtypes. The large majority of people with dementia had corroborating evidence of diagnosis, including multiple diagnostic records in one or more data sources, symptoms and care features characteristic of dementia, or prescriptions of dementia medication. The lifetime risk of dementia at 65 years was 15% in women and 9% in men, and mortality was 1.56 times higher in people with than without dementia. Our findings highlight the importance of using multiple linked data sources for defining dementia in EHRs. No individual data source analysed had complete coverage of coded dementia. Six percent were only recorded in the death registry, thirty-two percent only in hospital records and twenty-three percent only in primary care. Because data in CALIBER were anonymised, we could not validate dementia cases against patient clinical charts.
However, subgroups of dementia codes have been validated in previous EHR-based studies. [4,5,11,12] In addition, our estimated dementia lifetime risks are similar to figures reported in previous population-based cohort studies. [13] Our study suggests that Read-coded symptoms on their own cannot be used to identify unrecorded patients with dementia, because these are infrequently recorded and are insufficiently specific, even in combination, to accurately identify cases. Future work applying natural language processing methods to free text collected during the consultation would be needed to make better use of symptom data in electronic records.
In studies based on the analysis of EHRs, lifetime risk of dementia is likely to be a more suitable measure of disease risk than absolute incidence rates, given that the time of onset of dementia is difficult to define in clinical practice. Lifetime risk is probably the most important statistic for an individual when planning their future needs. The lifetime risk of dementia depends on the duration of life, and is affected by the competing risks of death from other causes and the incidence of dementia in a population. We compared the 10-year risk of dementia from the Framingham study [14] with our estimates (Table E in S1 File). Our lifetime risk estimates were slightly higher than those reported in the US in 2005. For example, the 10-year risk of all dementia found in the Framingham study at 75 was 7.6% in men and 7.4% in women, as compared with 12.8% and 16.6% in our study. The prevalence of dementia at different ages in this study in England was lower than estimates reported in a community prevalence study with participants recruited from Cambridgeshire, Liverpool and Newcastle. In that study in 2001, prevalences from age 65-69, 70-74, 75-79, 80-84, 85-90 and over 90 in men were 1.2, 3.0, 5.2, 10.6, 12.8 and 17.1%, and in women 1.8, 2.5, 6.2, 9.5, 18.1 and 35%. Our prevalence estimates were approximately half of this in 2000, but about two-thirds of this in 2010, perhaps indicating improvement in recording. Community-based incidence studies using formal instruments are likely to ascertain dementia with a lower severity, and it is possible that those in the EHRs represent only those in a later stage of illness, those with clinically evident dementia [15] or with more severe symptoms. [12] Given that the benefits of early dementia diagnosis have not been shown, and given the impact that such a diagnosis has on patients and their relatives, GPs may hesitate to formally diagnose the disease until symptoms become disabling.
Although we have examined UK linked EHR from 1998 to 2010, our conclusions may not be transportable to other EHR datasets covering different time periods. With changing UK practice, the diagnostic validity of a dementia record may change, with better ascertainment achieved in recent years after a significant effort to improve dementia diagnosis in primary care. Our results are not generalizable to other health systems, and therefore researchers working with data from these systems should aim to determine the validity of dementia diagnosis, either by linking and/or comparing information from existing or new disease cohorts to their EHR data sources, through case review, or by conducting a similar analysis to ours. [16] Although linked EHR are an efficient source of data for dementia research, there are a number of weaknesses to be considered. First, there may be variation in case ascertainment and validity of diagnosis across regions, depending on hospital or primary care diagnostic or management behaviours. Second, under-ascertainment of diagnosis during the early stages of the condition is likely, hence our recommendation for a lifetime approach. Finally, there are often limited data on important prognostic factors, such as education and family history or APOE4 allele, although these might eventually be obtained through linkage to other data sources.

Conclusion
This study provides evidence that the diagnosis of dementia in linked electronic health records has sufficient validity for large-scale epidemiological studies. The major value of current records is found in coded diagnoses, rather than additional symptoms or other care episodes, which are seldom recorded. Despite reasonable concerns that electronic health records underestimate the point prevalence of dementia compared to research studies, the calculated lifetime risk of dementia from these electronic health records is similar to population-based estimates.

Supporting information
S1 File. Supplementary data. (Table A) Dementia diagnostic codes used to identify people with dementia in hospital (ICD-10), mortality registry (ICD-9 and ICD-10) and primary care (Read codes). (Table B) Capture of diagnosis of dementia and its subtypes and overlap between data sources. (Table C) Time trends in prevalence of dementia according to age group and sex. (Table D) Median residual life expectancy in years. (Table E)