Age and Gender Variations in Cancer Diagnostic Intervals in 15 Cancers: Analysis of Data from the UK Clinical Practice Research Datalink

Background: Time from symptomatic presentation to cancer diagnosis (diagnostic interval) is an important, and modifiable, part of the patient’s cancer pathway, and can be affected by various factors such as age, gender and type of presenting symptoms. The aim of this study was to quantify the relationships of diagnostic interval with these variables in 15 cancers diagnosed between 2007 and 2010 using routinely collected data from the Clinical Practice Research Datalink (CPRD) in the UK.
Methods: Symptom lists for each cancer were prepared from the literature and by consensus amongst the clinician researchers, which were then categorised into either NICE qualifying (NICE) or not (non-NICE) based on NICE Urgent Referral Guidelines for Suspected Cancer criteria. Multivariable linear regression models were fitted to examine the relationship between diagnostic interval (outcome) and the predictors: age, gender and symptom type.
Results: 18,618 newly diagnosed cancer patients aged ≥40 who had a recorded symptom in the preceding year were included in the analysis. Mean diagnostic interval was greater for older patients in four disease sites (difference in days per 10 year increase in age; 95% CI): bladder (10.3; 5.5 to 15.1; P<0.001), kidney (11.0; 3.4 to 18.6; P=0.004), leukaemia (18.5; 8.8 to 28.1; P<0.001) and lung (10.1; 6.7 to 13.4; P<0.001). There was also evidence of longer diagnostic interval in older patients with colorectal cancer (P<0.001). However, we found that mean diagnostic interval was shorter with increasing age in two cancers: gastric (-5.9; -11.7 to -0.2; P=0.04) and pancreatic (-6.0; -11.2 to -0.7; P=0.03). Diagnostic interval was longer for females in six of the gender non-specific cancers (mean difference in days; 95% CI): bladder (12.2; 0.8 to 23.6; P=0.04), colorectal (10.4; 4.3 to 16.5; P=0.001), gastric (14.3; 1.1 to 27.6; P=0.03), head and neck (31.3; 6.2 to 56.5; P=0.02), lung (8.0; 1.2 to 14.9; P=0.02), and lymphoma (19.2; 3.8 to 34.7; P=0.01). Evidence of longer diagnostic interval was found for patients presenting with non-NICE symptoms in 10 of 15 cancers (mean difference in days; 95% CI): bladder (62.9; 48.7 to 77.2; P<0.001), breast (115.1; 105.9 to 124.3; P<0.001), cervical (60.3; 31.6 to 89.0; P<0.001), colorectal (25.8; 19.6 to 31.9; P<0.001), gastric (24.1; 3.4 to 44.8; P=0.02), kidney (22.1; 4.5 to 39.7; P=0.01), oesophageal (67.0; 42.1 to 92.0; P<0.001), pancreatic (48.6; 28.1 to 69.1; P<0.001), testicular (36.7; 17.0 to 56.4; P<0.001), and endometrial (73.8; 60.3 to 87.3; P<0.001). Pooled analysis across all cancers demonstrated highly significant evidence of differences overall showing longer diagnostic intervals with increasing age (7.8 days; 6.4 to 9.1; P<0.001); for females (8.9 days; 5.5 to 12.2; P<0.001); and in non-NICE symptoms (27.7 days; 23.9 to 31.5; P<0.001).
Conclusions: We found age and gender-specific inequalities in time to diagnosis for some but not all cancer sites studied. Whilst these need further explanation, these findings can inform the development and evaluation of interventions intended to achieve timely diagnosis and improved cancer outcomes, such as to provide equity across all age and gender groupings.


Introduction
Rapid diagnosis of cancer after symptoms arise is believed to be important to improve outcomes [1] and the experience of patients and their carers [2,3]. It is thought that thousands of deaths may be avoided annually if cancers are diagnosed quickly and successfully treated [4-7]. Hence, prompt diagnosis of symptomatic patients has become a priority worldwide [1,7-9]. The National Awareness and Early Diagnosis Initiative (NAEDI) in England [10] and similar initiatives elsewhere in Europe [9] are trying to address this.
Most patients with cancer-related symptoms present to a primary health care practitioner, usually a GP, who then has to suspect a cancer, or other illness, and initiate an investigation or referral for diagnosis. This period between the first primary care presentation of potential cancer symptoms and eventual diagnosis, the 'diagnostic interval' [11,12], is one of the important phases in the route to diagnosis of many cancers [11,13]. A shorter diagnostic interval is generally considered to contribute to earlier stage at diagnosis and better cancer outcomes [5,6]. Suspecting a cancer diagnosis in primary care may be difficult, as many of the symptoms of cancer can arise from co-morbidities or benign causes [14]. Hence, there is both a potential for delay at this point [12,13] and an opportunity to detect a cancer earlier [15], as an estimated one in 20 consultations in primary care includes possible malignant symptomatology [16]. The speed of cancer diagnosis may vary by demographic characteristics, such as age and gender [17,18], leaving some groups vulnerable to being diagnosed and treated late [19-21], which leads to poorer survival [22].
Primary care datasets are a key resource for studying cancer diagnostic pathways and have previously been used to determine the positive predictive value of cancer symptoms [23,24]; the change in diagnostic interval over time for various cancers [12]; and to construct clinical decision support tools [25,26]. These datasets can also be used to examine the association between the time to diagnosis and demographic variables for specific cancer symptoms presented to primary care.
The aim of this study was to quantify differences in cancer diagnostic intervals across subgroups defined by age, gender and symptom type in 15 types of incident cancer diagnosed between 2007 and 2010 in England and Wales, UK using routinely collected primary care data. This could facilitate an understanding of variation in diagnostic interval and inform the development and evaluation of targeted interventions to facilitate timelier diagnosis.

Methods
This analysis was undertaken alongside a previously reported study [12], extending its scope to examine the relationship of diagnostic interval with age, gender and symptom type. A more detailed description of how cancer codes (S1 Table) and symptom codes (S2 and S3 Tables) were applied to the dataset, and of how these codes were identified and validated, is given in that report; the relevant tables are supplied as supplementary files for readers' reference.
Ethical approval for this study was obtained from the Independent Scientific Advisory Committee (ISAC), under license numbers 09_0110 and 09_0111. All patient records were anonymised and de-identified when the dataset was obtained from the Clinical Practice Research Datalink, CPRD (known as the General Practice Research Database, GPRD, at the time the data were acquired), and the analysis did not include any patient-identifiable data.

Source population dataset
We used routinely collected UK general practice data obtained from the CPRD for 15 types of incident cancer (bladder, breast, cervical, colorectal, endometrial, gastric, head & neck, kidney, lung, leukaemia, lymphoma, myeloma, oesophageal, pancreatic, testicular) with at least one year of complete records before diagnosis. The CPRD is a large, longitudinal general practice database holding anonymised records of over five million active patients registered with over 650 general practices in England and Wales in the UK. Only general practices that agree to and fulfil strict quality criteria for data entry and maintenance can contribute to this database, and the data are then periodically quality checked to ascertain and maintain their robustness. At the practice level, the GP enters the most appropriate terms related to symptoms or diagnosis based on a list of drop-down choices corresponding to the appropriate Oxford Medical Information Systems (OXMIS) and Read codes [27].
The dataset used in this study consisted of patients aged ≥40 years diagnosed between 1st January 2007 and 31st December 2010 inclusive with one of the 15 cancers of interest described earlier.

NICE cancer symptom categories (NICE status)
Lists of potential symptoms (S4 Table) of primary, local and regional disease for the cancers of interest for this study were developed from the literature, and by consensus, amongst the three clinician researchers (RN, WH, GR), and were classified into 'NICE-qualifying symptoms' (NICE) or not (non-NICE) [12] (S5 Table). These symptom categories are sometimes referred to as 'alarm symptoms' and 'vague symptoms' respectively in the literature [23]. NICE symptoms were those specifically cited in the NICE Guideline for Urgent Referral of Suspected Cancer [28] as mandating urgent investigation or specialist assessment.

Diagnostic interval
The first occurrence of a cancer code in a patient's primary care record in the CPRD dataset (S1 Table) pertaining to the cancer diagnosis was assigned to be the date of diagnosis [12,25], and the clinical record for the 12-month period preceding this date was studied. The 'diagnostic interval' was defined as the duration from the first occurrence of a symptom code in CPRD pertaining to a possible cancer to the date of cancer diagnosis, and was censored at 365 days. Hence, the diagnostic interval was calculated only for patients with identifiable symptom codes. Patients who were screen- or incidentally-detected, or who had emergency admissions without any symptom information, were excluded. Although there have been reports of patients experiencing symptoms for more than a year before diagnosis [29], it is difficult to know whether very early symptoms genuinely arise from the cancer in question, or from benign or incidental conditions. We chose 365 days as a reasonable compromise in the absence of any methodological precedent [12], and it is in keeping with recently published consensus recommendations [13].
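As an illustration of the definition above, the following is a minimal Python sketch of how a diagnostic interval could be derived from dated records. It is not the authors' code (the study used Stata), and the function name, the constant and the example dates are hypothetical. It assumes that "censored at 365 days" means that only symptom records in the 365 days up to the diagnosis date are considered, which is one reading of the text.

```python
from datetime import date

MAX_INTERVAL_DAYS = 365  # 12-month look-back window described in the text


def diagnostic_interval(symptom_dates, diagnosis_date):
    """Return days from the first in-window symptom record to diagnosis.

    Symptom records more than 365 days before diagnosis, or after it, are
    ignored. Returns None when no symptom record falls inside the window,
    i.e. the patient would not contribute a diagnostic interval.
    """
    in_window = [
        d for d in symptom_dates
        if 0 <= (diagnosis_date - d).days <= MAX_INTERVAL_DAYS
    ]
    if not in_window:
        return None
    return (diagnosis_date - min(in_window)).days


# Hypothetical patient: one symptom record outside the window, two inside it.
records = [date(2008, 1, 10), date(2009, 3, 1), date(2009, 5, 20)]
interval = diagnostic_interval(records, date(2009, 6, 1))
print(interval)
```

In this sketch a patient with no symptom code inside the window returns `None`, mirroring the exclusion of patients without identifiable symptom codes.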

Data analysis
We examined the relationships between diagnostic interval and each of age, gender and NICE status. Separate analyses were carried out for each cancer site as well as a single overall analysis that included all cancers.
Numbers and percentages of symptomatic patients in the dataset, males and females, and patients with either NICE or non-NICE symptoms [12] among the symptomatic patients for each cancer site are reported. The mean age at diagnosis is reported for each cancer site. The distribution of diagnostic interval was summarised, reporting the mean, standard deviation, median, inter-quartile range (IQR), and 90th centile. Median, IQR and 90th centiles are shown as the preferred method for describing these skewed data, but comparisons across sub-groups are based on mean diagnostic interval (using linear regression models with diagnostic interval as the outcome and age, gender, and NICE status as predictors) as this was the parameter we wanted to make inferences for. Because the diagnostic interval distributions were skewed, we validated the linear regression results by constructing bias-corrected accelerated bootstrap confidence intervals for the mean differences (regression coefficients) as these are robust to non-normality [30]. As the bootstrap confidence intervals were virtually the same as the regression model-based confidence intervals we report results from the latter analysis. The four gender-specific cancers (breast, cervical, endometrial and testicular) were omitted from the analyses of diagnostic intervals against gender.
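The bias-corrected accelerated (BCa) bootstrap mentioned above can be sketched as follows. This is an illustrative Python implementation for the simplest case, a difference in means between two groups, rather than for coefficients from the multivariable models, and it is not the authors' Stata code. The function name, the parameter defaults and the example data are all hypothetical.

```python
import random
from statistics import NormalDist


def _mean(values):
    return sum(values) / len(values)


def bca_ci_mean_diff(a, b, n_boot=2000, alpha=0.05, seed=0):
    """BCa bootstrap confidence interval for mean(a) - mean(b).

    Returns (estimate, lower, upper). Each group is resampled separately.
    """
    rng = random.Random(seed)
    norm = NormalDist()
    mean_a, mean_b = _mean(a), _mean(b)
    estimate = mean_a - mean_b

    # Bootstrap replicates of the mean difference, sorted for quantile lookup.
    reps = sorted(
        _mean(rng.choices(a, k=len(a))) - _mean(rng.choices(b, k=len(b)))
        for _ in range(n_boot)
    )

    # Bias correction z0: based on the share of replicates below the estimate.
    share = sum(r < estimate for r in reps) / n_boot
    share = min(max(share, 1 / n_boot), 1 - 1 / n_boot)  # keep inv_cdf finite
    z0 = norm.inv_cdf(share)

    # Acceleration from jackknife influence values. For a difference in means
    # these reduce to centred deviations: (x - mean_a) and (mean_b - y).
    u_a = [x - mean_a for x in a]
    u_b = [mean_b - y for y in b]
    num = sum(u ** 3 for u in u_a) / len(a) ** 3 + sum(u ** 3 for u in u_b) / len(b) ** 3
    den = sum(u ** 2 for u in u_a) / len(a) ** 2 + sum(u ** 2 for u in u_b) / len(b) ** 2
    accel = num / (6 * den ** 1.5) if den > 0 else 0.0

    def limit(p):
        # Shift the nominal quantile level by the bias and acceleration terms.
        z = norm.inv_cdf(p)
        adjusted = norm.cdf(z0 + (z0 + z) / (1 - accel * (z0 + z)))
        index = min(max(int(adjusted * n_boot), 0), n_boot - 1)
        return reps[index]

    return estimate, limit(alpha / 2), limit(1 - alpha / 2)


# Hypothetical right-skewed intervals (days) for two symptom groups.
non_nice_days = [12, 20, 25, 31, 44, 60, 75, 110, 190, 340]
nice_days = [5, 9, 14, 18, 22, 30, 41, 55, 90, 210]
print(bca_ci_mean_diff(non_nice_days, nice_days))
```

If an interval of this kind closely matches the model-based interval, as the authors report for their data, that supports using the ordinary regression confidence intervals despite the skewed outcome.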
Unadjusted (crude) linear regression models were fitted in which only one predictor was included and multivariable models in which all three of age, gender, and NICE status were included as predictors. We focus on the multivariable analyses as primary. Fractional polynomial models were used to check that the continuous predictor, age, had a linear relationship with diagnostic interval. Where the relationship was linear we reported the increase in mean diagnostic interval for every 10 year increase in age. Where the relationship was non-linear we divided the patient sample into five equal sized age categories based on the quintiles and used age as a categorical predictor in the linear regression model, comparing the mean of each of the four older categories to the youngest age category (reference category).
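To make the "per 10 year increase in age" scale concrete, the sketch below fits an unadjusted (crude) least-squares line of diagnostic interval on age and multiplies the per-year slope by 10. It is a simplified Python illustration with hypothetical function names and made-up data. It does not reproduce the multivariable models, the fractional polynomial check or the quintile-based analysis described above.

```python
def ols_slope_intercept(x, y):
    """Ordinary least-squares slope and intercept for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def change_per_10_years(ages, intervals):
    """Change in mean diagnostic interval (days) per 10-year increase in age."""
    slope_per_year, _ = ols_slope_intercept(ages, intervals)
    return 10 * slope_per_year


# Hypothetical data: ages in years and diagnostic intervals in days.
ages = [42, 48, 55, 61, 67, 73, 80, 86]
intervals = [35, 60, 41, 72, 66, 90, 84, 101]
print(round(change_per_10_years(ages, intervals), 1))
```

A positive value indicates a longer mean diagnostic interval in older patients, and a negative value a shorter one, on the same scale as the age coefficients reported in the Results.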
Where there was evidence at the 5% level of an association between diagnostic interval and the age and gender predictors, tests of interaction were undertaken to explore whether the relationships differ between categories defined by NICE status. All data manipulation and analyses were performed using Stata 11.0 software (StataCorp. 2009. Stata Statistical Software: Release 11. College Station, TX: StataCorp LP.).

Results
Demographic and symptom profile of the study sample
A total of 33,008 patients had a new diagnosis of cancer during the study period; of these, 18,618 (56.4%) had a recorded symptom in the 12 months before diagnosis and so were included in the analyses. Mean age varied among cancers, ranging from 50.8 (SD 10.5) years for testicular to 73.5 (SD 10.4) years for bladder. Because the dataset only contained patients aged 40 years or more, the mean ages for those cancers also affecting younger people are artefactually high. Percentages of patients with symptoms varied among cancer sites, with leukaemia having the lowest (19.5%) and oesophageal the highest (75.4%) percentage of symptomatic patients. Patients presenting with NICE symptoms were considerably more common than those with non-NICE symptoms for all cancers except cervical. More males than females had symptoms for all the gender non-specific cancers except lymphoma (49.9%) and pancreatic (46.6%). The general characteristics of patients with no symptoms were similar to the symptomatic population in all cancers, though these data are not presented here as the focus of this study was symptomatic cancer patients. Table 1 summarises the patient demographic characteristics regarding age, gender and percentage of symptomatic patients in each cancer group in the dataset.

Diagnostic intervals and age
The results from fitted fractional polynomial linear regression models indicated that the relationship between age and diagnostic interval was linear for all cancers except colorectal (Fig 1), in which the age was hence analysed as a categorical predictor in the linear regression model. The adjusted mean change in diagnostic interval per 10 year increase in age ranged from a 19 day increase for leukaemia to a 6 day decrease for pancreatic and gastric cancers (Table 2). There was evidence of a relationship between diagnostic interval and age for seven of the cancers in the multivariable analysis showing longer diagnostic interval with increasing age for five cancers (mean change per 10 year increase in age; 95% confidence interval; p value): bladder (10.3 days; 95% CI: 5.5 to 15.1; P<0.001), kidney (11.0 days; 95% CI: 3.4 to 18.6; P=0.004), leukaemia (18.5 days; 95% CI: 8.8 to 28.1; P<0.001), lung (10.1 days; 95% CI: 6.7 to 13.4; P<0.001) and colorectal (P<0.001; see Table 3); whereas mean diagnostic interval was shorter for two of the cancers: gastric (-5.9 days; 95% CI: -11.7 to -0.2; P=0.04) and pancreatic (-6.0 days; 95% CI: -11.2 to -0.7; P=0.03). There were no significant differences in other cancers. Pooling the patients from all cancers (Table 2) resulted in strong evidence of a relationship showing longer diagnostic interval with increasing age (7.8 days per 10 year increase; 6.4 to 9.1; P<0.001). No evidence at the 5% level of significance was found of an interaction between age and NICE status for any cancer type.
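As a reader's consistency check on estimates of this kind, the standard error and an approximate two-sided P value can be back-calculated from a reported estimate and its 95% confidence interval. The Python sketch below assumes a normal approximation, which may differ slightly from the model-based calculation. The function name is hypothetical and this is not part of the authors' analysis.

```python
from statistics import NormalDist


def se_z_p_from_ci(estimate, lower, upper, level=0.95):
    """Approximate standard error, z statistic and two-sided P value.

    Assumes a symmetric, normal-theory confidence interval.
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(0.5 + level / 2)      # about 1.96 for a 95% interval
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    p = 2 * (1 - norm.cdf(abs(z)))
    return se, z, p


# Reported age effects (days per 10 year increase in age; 95% CI).
for site, est, lo, hi in [
    ("bladder", 10.3, 5.5, 15.1),
    ("gastric", -5.9, -11.7, -0.2),
]:
    se, z, p = se_z_p_from_ci(est, lo, hi)
    print(f"{site}: SE={se:.2f}, z={z:.2f}, P={p:.4f}")
```

Because the published estimates are rounded to one decimal place, the recovered P values are only approximate and should be compared with the reported ones loosely.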

Diagnostic intervals and gender
The 11 gender non-specific cancers included 15,987 symptomatic patients. Pooled analysis showed longer diagnostic intervals for females (8.9 days; 5.5 to 12.2; P<0.001). No evidence at the 5% level of significance was found of an interaction between gender and NICE status for any cancer type.

Diagnostic intervals and NICE status
All 15 cancers were included in these analyses.

Summary of the main findings
The overall findings were that longer diagnostic intervals are associated with increased age, female gender and non-NICE symptoms. Not all cancer sites had these associations: for older age, longer diagnostic intervals were observed in five cancers (bladder, colorectal, kidney, leukaemia and lung) but shorter diagnostic intervals in two cancers (gastric and pancreatic). Gender analyses showed that females had longer diagnostic intervals than males, with significant evidence at the 5% level in six cancers (bladder, colorectal, gastric, head and neck, lung, and lymphoma). Presentation of a NICE symptom before diagnosis was associated with shorter diagnostic intervals in 10 of the 15 cancers (bladder, breast, cervical, colorectal, gastric, kidney, oesophageal, pancreatic, testicular and endometrial). Data combined from all cancers included in this study and analysed together showed that the diagnostic interval was longer for older patients, females and non-NICE symptoms.

Comparison with existing literature
This is the first study of this type to report the association between diagnostic interval and age and gender for patients with cancer. There was evidence that diagnostic interval increased with older age in five of 15 cancers. This finding is contradictory to a previous report [18] where longer diagnostic delays were reported for younger age groups, although this may be explained by methodological differences such as: their data were collected from patient surveys, whereas ours were GP-coded and collected from primary care consultations; they used different measures to analyse the data; there was a difference in the definition of diagnostic interval: number of days from first symptomatic presentation to date of diagnosis was used in our study, whereas 'primary care delay' was used in their study (derived by subtracting referral delay from the duration from noticing first symptoms to appointment by hospital doctor, based on patient recollection of these events). The findings of our study align with those of a recent report of a project piloted in five UK cancer network jurisdictions aimed, among others, at testing new methods of clinical assessment of older cancer patients. One of the main findings was that older cancer patients were being discriminated against, with care and treatment being determined based on age and not needs [21]. Other potential reasons to explain these findings include: changes in the nature, perception and presentation of symptoms with age [31], although this has not been shown in previous studies [32]; increasing age-related co-morbidity with concurrent treatment(s) masking potential cancer symptoms [14,31]; varying tumour biology and aggressiveness with age [33] and/or gender [34]; a reluctance by GPs to refer or investigate older and frailer people [35][36][37]; and differing age specific patterns in willingness to be referred for onward investigation by the patients [38]. 
Longer diagnostic interval and advanced stage at diagnosis in females have been reported before for some cancers [18,19,39] and our findings are in keeping with these; this is a useful corroboration as our data source is different. The significant relationship of female gender with longer diagnostic intervals in six of the 11 gender non-specific cancers analysed in our study supports the disparities reported in other studies, which suggest that females might delay seeking help when they detect or realise the presence of potential cancer-related symptoms [14,19], as well as symptoms of other chronic conditions such as heart disease [40], COPD [41] and others [42]. Although this trend appears to be improving [43], it still highlights the need for a deeper understanding of this multi-dimensional phenomenon [44] of gender difference in order to tailor interventions according to patients' socioeconomic and cultural background [45], especially when females are reported to be keener on seeking health-related information [46] and appear to be more receptive [47]. This finding also highlights that health care professionals should not overlook symptoms on the basis of patients' gender alone.
This study adds to previous findings that 'alarm' symptoms that qualify a patient for urgent referral (NICE) had shorter diagnostic interval than the 'vague' symptoms (non-NICE) [9,12] indicating that the symptoms that were already getting a good service are getting an even better one [31], and their prioritisation over more vague symptoms may lead to a 'slow track' for diagnosis [12,48,49].

Strengths and limitations
In the UK, over 95% of the population is uniquely registered with only one general practice. Hence, the population data derived from the GP system is highly representative of the general population. We used a large, longitudinal UK general practice dataset, which has previously been used for cancer diagnostic studies [23,24] and has been validated for diagnostic coding accuracy of up to 95% in recent systematic reviews [27,50].
Though our definition of diagnostic interval aligns with recent recommendations on the design and conduct of studies using such datasets [13], there are methodological weaknesses in measuring diagnostic interval from electronic records [12]. This study used CPRD codes to extract symptom and cancer diagnosis dates from the dataset. The cancer diagnostic codes are usually entered in the GP system by the practice staff upon receipt of the diagnostic confirmation letter from a hospital bearing the date of diagnosis. There is a possibility at this stage that the date of the letter itself or the date of coding entry might erroneously be entered as the date of cancer diagnosis. This may affect the diagnostic interval in some cases. Likewise, some cancer diagnoses will have been unrecorded or recorded incorrectly, leading respectively to such cases being excluded from our analysis or to errors in the calculated diagnostic interval. These effects, though, are unlikely to affect a large proportion of the study population, given that the CPRD databases have recently been validated to show a diagnostic coding accuracy of up to 95% [27,50]. Similarly, some symptoms might not have been recorded, or might have been recorded in a less accessible field (so-called 'free-text'), although this may not be important because a recent CPRD study indicated that free-text data usually only confirms what is entered in an accessible coded form [51], and electronic records have been found to be of similar quality to paper records [52]. Furthermore, some cancers might have presented with different or atypical symptoms not included in our defined list. Also, we assumed that all the symptoms in our list represented the symptomatic presentation of the cancer; however, some may have been co-incidental.
Although we were unable to specifically identify screen-detected patients, most would have had no symptoms, and would therefore have been correctly excluded. Low proportions of symptomatic patients in some cancers, such as breast, can be explained by the fact that between 39% and 46% of patients can be screen-detected, as reported in other UK studies using different data sources [53,54]; others could present with atypical symptoms or as emergency admissions.
The cut-off point for symptoms at 12 months prior to the date of diagnosis was based on the judgement that very few would have had a diagnostic interval longer than this. If we had extended the time cut-off we would have picked up more patients whose symptoms might not have been related to the subsequent cancer diagnosis, but we equally would have captured more patients with a genuine diagnostic interval of greater than one year. There may also be variation between cancers; however, for consistency and in the absence of any methodological precedents, we used the time period of 12 months for all the cancers as a compromise.
Patients under the age of 40 were not included. This was based on a practical decision because of the rarity of cancer diagnoses in this group; only 10% of all new cases in the UK occur in the age group 25-49 [55]; and if they do occur, may be atypical or part of a familial syndrome [56,57]. This approach is in keeping with similar primary care studies [12,25]. Apart from this, the age and symptom profile as well as male to female ratios in our datasets are similar to other national cancer surveillance systems [55,58] indicating that the sample was representative of the UK cancer population.
The authors would urge caution in interpreting and generalising the findings of this study, keeping in mind the inherent methodological limitations of analysing retrospective electronic data, such as completeness and accuracy. We would also reiterate that our results apply only to cases that had a symptomatic presentation before the date of diagnosis; hence some patients with emergency admissions who had missing symptom information in their records would have been excluded, though they might have had shorter diagnostic intervals. Similarly, screen- or incidentally-detected cancer patients would have been excluded as well. These artefacts would limit the generalisability of the findings. We also acknowledge that clinical heterogeneity within certain cancer groups in our study (e.g. leukaemia, head and neck) may also limit the generalisability of our findings.

Implications
Interventions aimed at reducing cancer diagnostic intervals should be tailored to address inequalities in certain age and/or gender groups. This study has identified specific cancer sites where such action would be of benefit. We have also provided a baseline against which future intervention effects as well as evaluation outcomes can be assessed. More work is needed to understand the complex interaction between age, gender and types of symptoms and diagnostic intervals, their effects on stage at diagnosis, and the types of interventions needed to address the inequalities.

Conclusions
Diagnostic interval has been shown to vary with age, gender and NICE status across 15 different cancers. For some, there appears to be little age and/or gender difference. However, increasing age for bladder, colorectal, kidney, leukaemia, and lung cancers; female gender for bladder, colorectal, gastric, head and neck, lung, and lymphoma cancers; and non-NICE symptoms for 10 of the 15 cancers analysed in this study were associated with longer diagnostic intervals.
Supporting Information S1