Concordance and timing in recording cancer events in primary care, hospital and mortality records for patients with and without psoriasis: A population-based cohort study

Background. The association between psoriasis and the risk of cancer has been investigated in numerous studies utilising electronic health records (EHRs), with conflicting results on the extent of the association.
Objectives. To assess concordance and timing of cancer recording between primary care, hospital and death registration data for people with and without psoriasis.
Methods. Cohort studies were delineated using primary care EHRs from the Clinical Practice Research Datalink (CPRD) GOLD and Aurum databases, with linkage to Hospital Episode Statistics (HES), Office for National Statistics (ONS) mortality data and indices of multiple deprivation (IMD). People with psoriasis were matched to those without psoriasis by age, sex and general practice. Cancer recording between databases was investigated by the proportion concordant, defined as the presence of a cancer record in both the source and comparator datasets. Delay in recording cancer diagnoses between CPRD and HES records and predictors of discordance were also assessed.
Results. 58,904 people with psoriasis and 350,592 comparison patients were included using CPRD GOLD, and 213,400 people with psoriasis and 1,268,998 comparison patients were included using CPRD Aurum. For all cancer records (excluding keratinocyte), concordance between CPRD and HES was greater than 80%. Concordance for same-site cancer records was markedly lower (<68% GOLD-linked data; <72% Aurum-linked data). Concordance of non-Hodgkin lymphoma and liver cancer recording between CPRD and HES was lower for people with psoriasis compared to those without.
Conclusions. Concordance between CPRD and HES is poor when restricted to cancers of the same site, with greater discordance in people with psoriasis for some cancers of specific sites. The use of linked patient-level data is an important step in reducing misclassification of cancer outcomes in epidemiological studies using routinely collected electronic health records.


Introduction
Psoriasis is an immune-mediated inflammatory disease, with substantial regional variation in prevalence across the globe [1]. In Western Europe, the prevalence is relatively high when compared to many other global regions, with UK-specific estimates that it affects 2.8% of the general population [2]. The importance of psoriasis has been highlighted by the World Health Organization (WHO) [3], which acknowledged not only the burden of the disease to the individual and to society, but also the consequence of associated comorbidities. These comorbidities include psoriatic arthritis [4], cardiovascular disease [5], depression [6] and cancer [7]. As in other diseases, many recent studies investigating comorbid conditions in psoriasis have been conducted by using routinely collected electronic health records (EHRs). A recent systematic review [7] identified 37 studies examining the risk of cancer development in people with psoriasis conducted using such databases, with many only utilising primary care [8][9][10][11] or hospital [12][13][14][15] data. However, estimates quantifying the degree of risk remain varied. Several explanations have been posited for this variation, including the extent of adjustment for confounding and differing severities of psoriasis in study groups [16]. Whilst these explanations may play a role, it is also important to consider, given the proportion of studies conducted using EHRs, that some of the variation may come as a result of bias in the ascertainment of cancer outcomes.
Population-based EHR databases present a number of distinct advantages for epidemiological research, including increased power and consideration of multiple exposures [17], and increasingly are being used to address questions related to dermato-epidemiology. Given that such databases are derived from routinely recorded EHRs, it is important to consider the potential for misclassification of outcomes. Recently, a cohort study using the Clinical Practice Research Datalink (CPRD) found that the use of linked primary care and hospitalisation records helped to avoid outcome misclassification, as the use of primary care data alone to ascertain hospitalisation for lower respiratory tract infection would have underestimated the incidence rate by 31% [18]. Given that a number of studies have investigated the risk of cancer occurrence in psoriasis using only primary or secondary care data, it is important to consider the extent to which misclassification may be a problem. The primary aim of this study was therefore to assess the concordance of cancer recording between primary care, hospital, and national mortality records for people with and without psoriasis. Our secondary aim assessed the delay in recording between the different data sources for same-site cancer records and risk factors influencing discordance in recording.

Data sources
Clinical Practice Research Datalink (CPRD) GOLD and Aurum. The CPRD is a UK-based research service consisting of primary care records and formed from two databases: GOLD and Aurum [19,20]. Diagnostic information is recorded through Read codes, while information on test results, ethnicity and lifestyle measures (e.g. smoking status) is also available. Data from registered patients in the contributing general practices in England may be linked to a number of other data sources [21], including hospital records, in the form of Hospital Episode Statistics (HES), and mortality data, in the form of Office for National Statistics (ONS) death registration data. Socioeconomic data are also available for linkage through the Index of Multiple Deprivation 2010 (IMD), which is a small-area measure based on patient residential postcode, providing an aggregate measure of deprivation across seven domains, including income, employment and health [22].
Hospital Episode Statistics (HES). Hospital episode statistics consist of records from inpatient, outpatient and emergency admissions in English NHS hospitals. Following patient discharge, clinicians complete a discharge summary that is then entered into an electronic patient information database. HES records contain a range of information, including patient demographics (including ethnicity), medical diagnoses at discharge (coded using ICD-10) and procedures (coded using OPCS-4) [23].
Office for National Statistics death registrations (ONS). ONS mortality records are derived from death registrations. Upon patient death, an attending doctor completes a Medical Certificate of Cause of Death (MCCD) that is then passed on to a local registrar of births and deaths. Causes of death listed as part of a patient's MCCD may be a primary or underlying cause, and are coded according to ICD-9 (pre-2001) or ICD-10 (2001-present) [24].

Study population
Psoriasis patients were identified by a diagnostic Read code for psoriasis in the primary care records within the study period (01/01/1998-30/11/2018). Patients were required to be eligible for linkage to HES, ONS and IMD; be a minimum of 18 years old; and have been in an 'up to standard' practice (those with continuity in reporting of data and expected death rates) for at least 12 months prior to study entry. Patients were excluded if they had any record of cancer (excluding keratinocyte cancer) prior to study entry in order to ensure only primary cancers were included. Additionally, patients with any diagnostic record of HIV/AIDS prior to study entry were excluded due to the associated increased cancer risk. A bridging file was applied to the GOLD cohort to exclude patients and practices that transferred from CPRD GOLD to Aurum. Cohort delineation is presented in S1 Fig. The index date for psoriasis patients was defined as the first record of psoriasis in the study period.
Comparison patients were matched to psoriasis patients at a ratio of up to 6:1 on age, sex and general practice. Restriction criteria for comparison patients were consistent with those for psoriasis patients, with the addition of having no psoriasis record prior to study entry. Pseudo-index dates for comparison patients were generated based on the index date of their matched psoriasis patient. Psoriasis and comparison cohorts were identified separately for CPRD GOLD and Aurum. All patients were followed from index date to the first occurrence of cancer diagnosis, transfer out from the general practice, last data collection date, study period end or death.
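The matching step described above can be sketched as follows. This is a minimal illustration only, not the study's implementation: the `Patient` fields, the function name `match_comparators`, exact matching on year of birth, and matching without replacement are all assumptions made for the example.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Patient:
    # Hypothetical minimal patient record used only for this sketch.
    patient_id: str
    birth_year: int
    sex: str
    practice_id: str


def match_comparators(
    cases: List[Patient],
    pool: List[Patient],
    index_dates: Dict[str, date],
    max_ratio: int = 6,
    seed: int = 0,
) -> Dict[str, List[Tuple[str, date]]]:
    """Match each case to up to `max_ratio` comparators from the same stratum.

    Assumptions: strata are exact (birth year, sex, practice), and each
    comparator is used at most once. Every matched comparator inherits the
    index date of its case as a pseudo-index date.
    """
    rng = random.Random(seed)

    # Group the comparator pool by matching stratum and shuffle within strata.
    strata: Dict[tuple, List[Patient]] = defaultdict(list)
    for person in pool:
        strata[(person.birth_year, person.sex, person.practice_id)].append(person)
    for candidates in strata.values():
        rng.shuffle(candidates)

    matches: Dict[str, List[Tuple[str, date]]] = {}
    for case in cases:
        candidates = strata[(case.birth_year, case.sex, case.practice_id)]
        n_take = min(max_ratio, len(candidates))
        chosen = [candidates.pop() for _ in range(n_take)]
        matches[case.patient_id] = [
            (c.patient_id, index_dates[case.patient_id]) for c in chosen
        ]
    return matches
```

A case with fewer than six eligible comparators simply receives fewer matches, which corresponds to the "up to 6:1" ratio.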

Data analysis
Outcomes. The primary outcome of interest was cancer diagnosis. In order to examine variation in concordance, cancer diagnoses were split into the following categories: all cancer (excluding keratinocyte cancer), bladder, brain, breast, cervical, colorectal, gallbladder, Hodgkin's lymphoma (HL), keratinocyte, kidney, laryngeal, leukaemia, liver, lung, malignant melanoma, multiple myeloma, nasal cavity, non-Hodgkin's lymphoma (NHL), oesophageal, oral cavity, ovarian, pancreatic, pharyngeal, prostate, stomach, thyroid and uterine cancer. Code lists were formed by one author (AMT) and were then separately reviewed by two clinicians (CEG and MKR), with any discrepancies rectified through discussion between both clinicians and AMT. Code lists for exposure and outcome are available for download from www.clinicalcodes.org [25]. Data analysis was carried out using Stata version 16 (StataCorp, College Station, TX, USA).
Assessment of concordance. Concordance of cancer recording was considered between one source database and a comparison database (i.e. CPRD as the source database with HES as the comparison database, and vice versa). Records in the comparison database must have occurred prior to transfer out date, last data collection date, study period end or death in order to be eligible for consideration of concordance. Concordance between the source database and the comparison database was classified into three groups: (1) same site recorded in the source and comparison database; (2) any cancer recorded in the comparison database; (3) no record in the comparison database. Additionally, for each cancer record in the source database, it was evaluated whether there was any death registration including a record of cancer. Factors associated with discordance in reporting between the source and comparison databases, for CPRD and HES, were assessed using logistic regression. The following risk factors for discordance were included in the model: age, gender, deprivation and time period. Where the same cancer-site record was found in the comparison database, the delay in recording between the source and comparison records was examined.
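The three-group classification described above can be sketched as follows. This is an illustrative outline under stated assumptions rather than the analysis code (the study used Stata): the `CancerRecord` type, the function names, the group labels and the inclusive treatment of the follow-up end date are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CancerRecord:
    # Hypothetical record type: a cancer site label and the recording date.
    site: str
    event_date: date


def follow_up_end(*censor_dates: Optional[date]) -> date:
    """Earliest of transfer out, last collection, study end and death dates.

    Dates that do not apply to a patient can be passed as None.
    """
    known = [d for d in censor_dates if d is not None]
    if not known:
        raise ValueError("at least one censoring date is required")
    return min(known)


def classify_concordance(
    source: CancerRecord,
    comparison: List[CancerRecord],
    end: date,
) -> str:
    """Assign a source-database record to one of the three concordance groups."""
    # Assumption: a comparison record dated on the end date itself is eligible.
    eligible = [r for r in comparison if r.event_date <= end]
    if any(r.site == source.site for r in eligible):
        return "same_site"
    if eligible:
        return "any_cancer"
    return "no_record"
```

Running the same function with the roles of the two databases swapped gives the "vice versa" comparison, and the proportion concordant is the share of source records labelled `same_site` (or `same_site` plus `any_cancer` for any-record concordance).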
This study was approved by the Independent Scientific Advisory Committee (ISAC) for Medicines and Healthcare products Regulatory Agency database research (ISAC approval 19_089R).

Results
In CPRD GOLD, 58,904 people with psoriasis and 350,592 matched comparison patients were included. In CPRD Aurum, 213,400 people with psoriasis and 1,268,998 matched comparison patients were included (Table 1).

Concordance in cancer recording
The concordance of cancer recording between CPRD Aurum, HES and ONS is reported separately for both people with psoriasis (Fig 1) and comparison patients (Fig 2). Concordance for CPRD GOLD-linked data is reported in S2 and S3 Figs.
CPRD-identified cancer records. In CPRD Aurum, 11,889 and 63,691 cancers (excluding keratinocyte) were identified in psoriasis and comparison patients respectively. Concordance at the same site in HES was 71.0% for people with psoriasis and 71.5% for comparison patients. Concordance for any record of cancer was higher (84.5% psoriasis; 84.8% comparison). In CPRD GOLD, 2,916 and 15,236 cancer events (excluding keratinocyte) were identified in the psoriasis and comparison groups, respectively. Same cancer-site concordance with HES was 67.9% for people with psoriasis and 67.7% for comparisons. Concordance of any cancer record (excluding keratinocyte) was higher (82.7% psoriasis; 82.2% comparison). When examining specific cancers, concordance between CPRD-identified records and HES was lowest for keratinocyte cancers and malignant melanoma in both CPRD Aurum and GOLD. In people with psoriasis, for any cancer (excluding keratinocyte) record in CPRD, 35.2% (Aurum) and 35.3% (GOLD) also had a cancer record in ONS mortality records.

HES-identified cancer records. In CPRD Aurum-linked HES, 11,777 and 63,298 cancers (excluding keratinocyte) were identified in people with psoriasis and comparisons respectively; 70.9% and 71.2% of these had a cancer record at the same site in CPRD Aurum. Any-site record concordance was higher (84.0% psoriasis; 84.2% comparison). In CPRD GOLD-linked HES, 2,911 cancers (excluding keratinocyte) were identified in patients with psoriasis and 15,124 in comparison patients. Of these, 67.3% (psoriasis) and 67.8% (comparison) had a record at the same site in CPRD GOLD. Any cancer record concordance for CPRD GOLD was 81.6% (psoriasis) and 81.6% (comparison). For site-specific cancers, concordance between HES-identified cancers and CPRD records was lowest for pancreatic, lung and kidney cancer. In people with psoriasis, for any cancer (excluding keratinocyte) record in HES, 40.7% (CPRD Aurum-linked HES) and 40.8% (CPRD GOLD-linked HES) also had a cancer record in ONS mortality records.
Variation in concordance by psoriasis status. When considering same cancer site concordance between source and comparator databases, notable differences were found between psoriasis and comparison patients for two cancers. Of the liver cancers recorded in CPRD Aurum-linked HES for people with psoriasis, 28% had no record in CPRD Aurum. In comparison, for the psoriasis-free patients, 20% of liver cancer records were only found in HES. For non-Hodgkin lymphoma (NHL), a greater proportion of cases were only found in CPRD Aurum for people with psoriasis (16%) compared to comparison patients (10%). CPRD GOLD results followed the same pattern and are included in S3 and S4 Tables.

Delays in recording cancer events between sources
There was little variation in the timing of recording of cancer events between people with psoriasis and comparison patients or between CPRD GOLD-linked data and CPRD Aurum-linked data (Table 2). For records first identified in CPRD, 75-79% were identified within 3 months in HES. The large majority of records first identified in HES were recorded in CPRD within 3 months (>90%), with less than 4% having a delay of over a year. There was notable variation in the delay between records by cancer site, particularly for records first identified in the CPRD. In records first identified in CPRD Aurum, the lowest proportion of concordant records identified within 3 months was observed for keratinocyte cancers, leukaemia, non-Hodgkin lymphoma and prostate cancer (Fig 3). Variation in delay for records first identified in HES was less apparent, with only keratinocyte cancers having a notably lower proportion of concordant records within 3 months. Results for the CPRD GOLD cohort were similar and are included in S4 Fig.
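For same-site concordant pairs, the delay can be grouped into bands such as those reported above. The sketch below is illustrative only: the function name, the 91-day and 365-day cut-offs, and the intermediate "3 to 12 months" band are assumptions, since the text only refers to records within 3 months and delays of over a year.

```python
from datetime import date


def delay_band(first_recorded: date, later_recorded: date) -> str:
    """Band the delay between the first record and the matching later record.

    Assumed cut-offs: 3 months is taken as 91 days and 1 year as 365 days.
    """
    days = (later_recorded - first_recorded).days
    if days < 0:
        raise ValueError("later_recorded must not precede first_recorded")
    if days <= 91:
        return "within 3 months"
    if days <= 365:
        return "3 to 12 months"
    return "over 1 year"
```

The database holding the earlier of the two dates is treated as the one in which the record was "first identified", so the same helper serves both directions of comparison.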

Risk factors for discordance in cancer recording
In CPRD Aurum, increased age was associated with discordance in cancer recording when either CPRD or linked HES data was the source database, regardless of whether patients had psoriasis. Only those aged over 75 were more likely than those aged less than 65 to only have a record in CPRD Aurum (OR 1.43, 95% CI: 1.35-1.51). For both people with psoriasis and comparison patients, the odds of having a record only in the primary care records reduced with increasing deprivation. In contrast, where HES was the source database, discordance was more likely in the comparison cohort for those in more deprived areas. With regard to temporality, later year of diagnosis was also associated with reduced discordance. Factors associated with discordance in the CPRD GOLD cohort were similar and are presented in S5 Table.
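To make the reported quantity concrete, the sketch below computes an odds ratio of discordance with a Wald 95% confidence interval from a single 2x2 table. It is a deliberately simplified, unadjusted stand-in: the study fitted multivariable logistic regression, and the function name and argument layout here are assumptions for illustration.

```python
import math
from typing import Tuple


def odds_ratio_wald(
    exposed_discordant: int,
    exposed_concordant: int,
    unexposed_discordant: int,
    unexposed_concordant: int,
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Unadjusted odds ratio with a Wald confidence interval from a 2x2 table.

    Returns (odds_ratio, lower_bound, upper_bound). All four cells must be
    positive; z = 1.96 gives an approximate 95% interval.
    """
    cells = (
        exposed_discordant,
        exposed_concordant,
        unexposed_discordant,
        unexposed_concordant,
    )
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive")
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

With made-up counts (not study data) of 20 discordant and 80 concordant records in an exposed group against 10 and 90 in a reference group, the odds ratio is (20 × 90) / (80 × 10) = 2.25. An adjusted estimate would instead come from a regression model including age, gender, deprivation and time period together.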

Discussion
This study examined the concordance of cancer recording between primary care, hospital and death registration data in people with and without psoriasis. Concordance of cancer records at the same site between CPRD and HES was poor, with marked variation according to cancer site. Though higher, concordance for any cancer record remained below 85%. The delay between same-site records varied according to the database in which the cancer was first identified, with older age, time period and deprivation associated with discordance in reporting.
Concordance for cancer records of the same site differed according to the cancer site in question and to the database in which the cancer was first identified. For cancers initially identified in the CPRD, same-site concordance in HES was notably lower for keratinocyte cancers and malignant melanoma. Conversely, for cancers initially identified in HES, same-site concordance in the CPRD was lowest for pancreatic, lung and kidney cancers. Previously suggested plausible explanations for lower same-site concordance include the use of non-specific cancer diagnostic codes, death shortly following hospital admission for cancer, and death from cancer prior to hospitalisation [26], with the latter two explanations supported by increased likelihood of discordance in older patients. With regard to keratinocyte cancers specifically, high discordance between CPRD and HES records likely arises as appropriately trained primary care physicians are able to excise lesions without referral to secondary care [27]. As lower concordance of same-site records suggests a potential for poor outcome ascertainment in studies utilising only one data source, these results support the need to link primary and secondary care data sources in CPRD studies of cancer occurrence, especially those considering site-specific cancers.
Differences in concordance between people with and without psoriasis were present for some site-specific cancers, with implications for studies considering associations between psoriasis and cancer. For NHL identified in HES, site-specific concordance within CPRD was lower for people with psoriasis compared to those without. As cutaneous T-cell lymphoma (CTCL), a variant of NHL, clinically manifests in a manner which mimics psoriasis [28], it is possible that people with CTCL are misdiagnosed as having psoriasis in primary care. Referral to specialist hospital services would likely result in an accurate CTCL diagnosis and a consequential discordant cancer record. Furthermore, in people with psoriasis, CPRD-identified NHL records were also more commonly discordant, likely resulting from patients receiving an incorrect CTCL diagnosis in primary care, which is then correctly identified as psoriasis in secondary care. The link between psoriasis and lymphoma has received significant focus in previous work [10,11,29] and these results suggest caution in the interpretation of these findings. Studies utilising only primary or secondary care records that do not differentiate between NHL variants are likely to over-estimate the risk of NHL through the aforementioned misclassification of CTCL cases in psoriasis patients.

Table 3. Multivariable-adjusted odds of non-concordance according to covariates in CPRD Aurum and linked HES data. Column groups: psoriasis, Aurum source; psoriasis, HES source; comparison, Aurum source; comparison, HES source.
There was also heterogeneity in the recording of liver cancer cases between people with and without psoriasis, with a greater proportion of liver cancer cases only identified in HES for people with psoriasis. It is plausible that this discordance arises as psoriasis patients have higher liver cancer mortality and therefore die before a record is made in the CPRD, an argument supported by increased heavy drinking in people with psoriasis [30] and the suggested increased mortality in alcohol-associated liver cancer [31]. Under this explanation, studies that only use primary care records may underestimate the association between psoriasis and liver cancer.
Beyond psoriasis, age, deprivation and time period were all predictors of discordance. As noted previously, increasing age was associated with an increased probability of having a discordant record, likely resulting from cancer death prior to a record being made in the comparison data source. For records first identified in the CPRD, increasing deprivation was associated with lower discordance. Rather than suggesting improved recording practices in more deprived areas, it is likely that this relationship is explained by the increased incidence of the most commonly discordant cancers, such as keratinocyte cancers [32], malignant melanoma [33] and prostate cancer [34], in less deprived areas. With regard to time, reduced discordance in later periods is suggestive of improved data recording, as noted for lifestyle and demographic factors in previous works [20].
To our knowledge, this is the first study to consider the concordance of cancer recording between CPRD Aurum and linked HES data, and the inclusion of both GOLD and Aurum within the study allows cross-validation between the two databases. A potential limitation, as in any study of concordance between EHRs, is the possibility of discrepancies between coding dictionaries. However, code lists were reviewed by multiple clinicians to minimise any such issues.
In conclusion, the use of primary care or hospital data in isolation to determine cancer events is likely to be inadequate, particularly when considering certain site-specific cancers. In addition, these inadequacies may be exacerbated through improper consideration of important predictors of discordance. As such, the use of linked electronic health records, with appropriate covariate consideration, is strongly advisable in studies of cancer risk as a means of improving outcome ascertainment.