The Protective Effect of Low-Dose Aspirin against Colorectal Cancer Is Unlikely Explained by Selection Bias: Results from Three Different Study Designs in Clinical Practice

Background We conducted three differently designed nested case–control studies to evaluate whether the protective effect of low-dose aspirin against colorectal cancer (CRC) is explained by selection bias. Methods Using a large validated UK primary care database, we followed different cohorts of patients, who varied in their demographic and clinical characteristics, to identify first-ever cases of CRC. In Studies 1 and 2, two cohorts were followed: i) new users of low-dose aspirin at start of follow-up (N = 170,336 in Study 1, N = 171,527 in Study 2) and either ii) non-users of low-dose aspirin (Study 1, N = 170,336) or new users of paracetamol (Study 2, N = 149,597) at start of follow-up. In Study 3, a single cohort of individuals naïve to low-dose aspirin at the start of observation was followed. Controls were selected using incidence density sampling, and logistic regression was used to obtain an unbiased estimate of the incidence rate ratio (RR) with 95% confidence intervals (CIs). Low-dose aspirin exposure was analyzed ‘as-treated’ before the index date (CRC date for cases, random date for controls). Results In the three studies, median (maximum) follow-up was 5.1 (12), 5.8 (12) and 7.5 (13) years, respectively. 3033 incident CRC cases were identified in Study 1, 3174 in Study 2, and 12,333 in Study 3. Current use of low-dose aspirin was associated with a significantly reduced risk of 34%, 29% and 31% in the three studies, respectively; corresponding RRs (95% CIs) were 0.66 (0.60–0.73), 0.71 (0.63–0.80) and 0.69 (0.64–0.74). In each study, significantly reduced risks of CRC were seen when low-dose aspirin was used for primary or secondary cardiovascular disease prevention, in both sexes, and across all age groups evaluated. Conclusion Low-dose aspirin is associated with a significantly reduced risk of CRC. The consistency of our findings across different studies makes selection bias an unlikely explanation.


Introduction
Colorectal cancer (CRC) is an increasingly global public health problem and is the third most commonly diagnosed cancer worldwide [1]. Prognosis is highly related to stage at diagnosis [2]; in the UK, 5-year relative survival is over 90% for patients with Dukes Stage A CRC at diagnosis compared with around 50% and less than 7% for patients presenting with Stage C and D CRC, respectively [3]. Comparable estimates have been reported in the US [4].
Evidence from cardiovascular trials suggests that daily use of low-dose aspirin reduces the risk of developing CRC, and that part of this reduction in risk might be due to protection against metastatic disease [5,6]. Evidence from 'real-world' patients comes from pharmacoepidemiological studies, which require a robust and unbiased study design to obtain valid results. To investigate rigorously the relationship between low-dose aspirin and risk of CRC, we carried out three separate cohort studies with nested case-control analyses, each adopting a different study design. We hypothesized that low-dose aspirin would be associated with a reduction in risk in all three studies, which would make it less likely that the protective effect is due to selection bias. The study protocol was reviewed and approved by an independent scientific review committee (reference number 12-044V).

Data source
We used data from The Health Improvement Network (THIN), a validated primary care database of anonymized patient records in the United Kingdom (UK). Almost all of the UK population are registered with a primary care practitioner (PCP) and THIN is representative of the UK population with regards to age, sex and geographic distribution [7,8]. Primary care practitioners record information electronically as part of routine patient care, and send it to THIN for use in research projects. Currently close to 600 general practices throughout the UK have contributed data to THIN [9]. Details of the database, which has been used for a large number of observational studies, have been described previously [10]. THIN was used for each study although the study periods varied slightly owing to the data available at the time each study was conducted (Fig 1). For each study, individuals entered the source population upon meeting all eligibility criteria (S1 Text).

Source population and identification of study cohort(s) in each study design
The source population comprised individuals in THIN meeting data quality and completion standards (Fig 1). Individuals were excluded if they had a prescription for low-dose aspirin (75 or 300 mg; tablets available in the UK) or a recorded diagnosis of cancer at any time before study entry. For Study 2, individuals were also excluded if they had a prescription for paracetamol monotherapy prior to study entry; this was not an exclusion criterion in Studies 1 and 3. Identification of the study cohort(s) in the three different studies is illustrated in Fig 1, and further details can be found in the S1 Text. In the first study (Study 1), a cohort of new users of low-dose aspirin was identified, with the date of each individual's first low-dose aspirin prescription set as the start of follow-up (start date). Each member of this cohort was matched to an individual who was still free of low-dose aspirin at the start date. Matching was by age, sex and number of PCP visits in the previous year. The second study (Study 2) was similar in design to Study 1, but here the second cohort comprised new users of a 'neutral' drug (paracetamol) at the start date, in order to minimize any potential healthy user bias arising from differences between users and non-users of low-dose aspirin. Unlike in Study 1, individuals in the cohort of new users of low-dose aspirin were not matched to those in the cohort of new users of paracetamol. Paracetamol (used as monotherapy) was chosen as a suitable comparison drug to low-dose aspirin because it met the following necessary criteria: i) it has no known effect, either positive or negative, on the risk of CRC; ii) it is not a proxy for significant co-morbidity; and iii) its use in the source population is sufficiently common to sample from. In the third study (Study 3), we used a more conventional methodology, following a single cohort of all individuals in the source population rather than selecting two separate cohorts.

Follow-up to identify incident cases of CRC
In each study, members of the study cohort(s) were followed until the earliest of the following endpoints: occurrence of a Read code for CRC (S1 Table; identified through computerized searches), a recorded diagnosis of cancer other than CRC, age of 90 years, death or the end of the study period (31 December 2011 in Study 1 and 31 December 2012 in Studies 2 and 3; Fig 1). For patients with a diagnostic Read code for CRC in Study 1, 86.4% were deemed to be true incident cases following manual review of the patients' electronic medical records (EMRs) [11]. In Study 1, the index date was the date of the first CRC-related symptom, screening or diagnostic procedure, or surgery, whichever came first. Because of the high positive predictive value of the CRC Read codes in THIN in Study 1, we did not manually review the EMRs of patients with a Read code for CRC in Studies 2 and 3, and the index date in these two studies was the date of the computer-detected CRC Read code.
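The censoring rule described above (follow-up ends at the earliest of several endpoints) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the argument names and the handling of 29 February birthdays are assumptions made for illustration only.

```python
from datetime import date


def ninetieth_birthday(birth):
    """Date on which a person turns 90 (assumption: 29 Feb falls back to 28 Feb)."""
    try:
        return birth.replace(year=birth.year + 90)
    except ValueError:  # born on 29 February, target year is not a leap year
        return birth.replace(year=birth.year + 90, day=28)


def end_of_follow_up(birth, study_end, crc=None, other_cancer=None, death=None):
    """Return (end_date, reason) for the earliest endpoint that applies.

    Dates that are None mean the event was never recorded. If two endpoints
    share a date, the one listed first wins (hypothetical tie-break).
    """
    candidates = [
        ("crc", crc),
        ("other_cancer", other_cancer),
        ("age_90", ninetieth_birthday(birth)),
        ("death", death),
        ("study_end", study_end),
    ]
    observed = [(reason, d) for reason, d in candidates if d is not None]
    reason, end = min(observed, key=lambda item: item[1])
    return end, reason
```

For example, a patient born in 1940 with a CRC Read code in March 2008 would stop contributing person-time on that date, whereas a patient with no recorded event would be followed to the end of the study period.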

Selection of controls
In each study, controls were selected to perform nested case-control analyses. In Studies 1 and 2, 10,000 controls were randomly sampled and frequency-matched to CRC cases by age, sex and calendar year. In Study 3, a total of 20,000 controls were randomly sampled and frequency-matched to CRC cases by age, sex and follow-up time to CRC occurrence. We used incidence density sampling so that the likelihood of being selected as a control was proportional to the person-time at risk. To do this, we generated a random date within the study period for each member of the study cohort(s). If this date fell within the patient's eligible person-time, we used the random date as the index date and marked that person as an eligible control. Controls were subject to the same eligibility criteria applied to cases.
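The random-date procedure can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the cohort layout (a list of tuples), the function name and the fixed seed are hypothetical, and frequency matching on age, sex and calendar year or follow-up time is deliberately omitted. Because the random date is uniform over the study period, the chance that it falls inside a person's follow-up is proportional to the length of that follow-up, which is what makes selection proportional to person-time at risk.

```python
import random
from datetime import date, timedelta


def sample_controls(cohort, study_start, study_end, n_controls, seed=0):
    """Incidence density sampling via one random date per cohort member.

    cohort: list of (patient_id, follow_up_start, follow_up_end) tuples.
    Returns up to n_controls (patient_id, index_date) pairs.
    """
    rng = random.Random(seed)
    span_days = (study_end - study_start).days
    eligible = []
    for patient_id, fu_start, fu_end in cohort:
        # Uniform random date within the study period (both ends inclusive).
        random_date = study_start + timedelta(days=rng.randint(0, span_days))
        # Keep the person only if the date falls in their own person-time.
        if fu_start <= random_date <= fu_end:
            eligible.append((patient_id, random_date))
    rng.shuffle(eligible)  # random sample of the eligible controls
    return eligible[:n_controls]
```

A person followed for the whole study period is always eligible, and a person followed for half of it is eligible about half of the time.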

Identification of risk factors
Information on potential risk factors was extracted from the database, including demographics, body mass index (BMI), lifestyle factors, morbidities, medication use, and PCP visits, referrals and hospitalizations. Medication use was classified into three categories: current use, when supply of the most recent prescription lasted until the index date or ended 1-90 days before the index date; recent/past use, when supply of the most recent prescription ended 91 or more days before the index date; and non-use, when there was no recorded use at any time before the index date. Low-dose aspirin was analyzed using an 'as-treated' approach, evaluating exposure between the start and index dates; for Studies 1 and 2, exposure was therefore assessed irrespective of the study cohort to which the individual was assigned at start of follow-up (further details can be found in the S1 Text). We did not use an intention-to-treat analysis in Studies 1 and 2 (i.e. comparing incidence rates of CRC between the two study cohorts according to low-dose aspirin exposure at start of follow-up) because use of low-dose aspirin could have changed in a substantial number of patients over the lengthy follow-up; intention-to-treat analysis was not applicable in Study 3. All medication use evaluated (including clopidogrel and dual antiplatelet therapy [DAT] with clopidogrel/low-dose aspirin) was also analyzed 'as-treated', according to the three exposure categories above. Further methods on the evaluation of medications (including low-dose aspirin) and other risk factors can be found in the S1 Text.
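The three exposure categories can be expressed as a small classification function. The sketch below is illustrative only: the prescription representation, the function name and the treatment of the recent/past category as "91 or more days" are assumptions, as is the choice to count only prescriptions issued strictly before the index date.

```python
from datetime import date


def classify_use(prescriptions, index_date, window_days=90):
    """Classify exposure as 'current', 'recent/past' or 'non-use'.

    prescriptions: list of (issue_date, supply_end_date) tuples.
    Only prescriptions issued before index_date are considered (assumption).
    """
    before_index = [p for p in prescriptions if p[0] < index_date]
    if not before_index:
        return "non-use"
    # The most recent prescription determines the category.
    _, supply_end = max(before_index, key=lambda p: p[0])
    days_since_supply_ended = (index_date - supply_end).days
    # Zero or negative: supply lasted until (or beyond) the index date.
    if days_since_supply_ended <= window_days:
        return "current"
    return "recent/past"
```

With an index date of 30 June 2010, a supply ending on 1 April 2010 (90 days earlier) counts as current use, whereas one ending on 31 March 2010 (91 days earlier) counts as recent/past use.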

Statistical analysis
In each study, incidence rates of CRC were calculated using the number of cases as the numerator and the respective person-time as the denominator, stratified by sex and age group. Nested case-control analyses were performed to estimate the association between low-dose aspirin and the occurrence of CRC. Under incidence density sampling, the odds ratio is an unbiased estimator of the incidence rate ratio (RR) [12]. Rate ratios and 95% confidence intervals (CIs) were calculated using unconditional multiple logistic regression models adjusted in each study for the frequency matching factors, number of PCP visits, smoking, non-steroidal anti-inflammatory drugs and BMI. Use of insulin and use of oral steroids were also included in the fully adjusted model in Studies 1 and 2, with adjustment for paracetamol monotherapy also undertaken in Study 2. All potential confounders were treated as categorical variables, and a separate level was created for variables with missing information. Stratified analyses and tests for interaction were performed (S1 Text). Statistical analyses were carried out using STATA package version 12.0 (StataCorp LP, College Station, TX, USA).
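The two basic quantities in this section can be illustrated with a short sketch: an incidence rate per 10,000 person-years, and a crude odds ratio with a Wald 95% CI from a 2x2 table of exposure by case status. This is a simplification, not the analysis performed in the study, which used adjusted multiple logistic regression in Stata; the function names are hypothetical and any counts passed in would be the reader's own.

```python
import math


def incidence_per_10000(cases, person_years, per=10000):
    """Incidence rate: cases divided by person-time, scaled to `per` person-years."""
    return cases * per / person_years


def crude_odds_ratio(exposed_cases, unexposed_cases,
                     exposed_controls, unexposed_controls, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    Under incidence density sampling the odds ratio estimates the
    incidence rate ratio; no confounder adjustment is made here.
    """
    a, b = exposed_cases, unexposed_cases
    c, d = exposed_controls, unexposed_controls
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

A crude estimate of this kind would generally differ from the adjusted RRs reported in the study, because it ignores the matching factors and other covariates.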

Characteristics of the study cohort(s)
Characteristics of the study cohort(s) in each of the three studies are shown in Table 1. As expected, due to the more restrictive inclusion criteria in Studies 1 and 2, the mean age at start of follow-up was higher (64 years) in these two studies than in Study 3, where a greater proportion of younger individuals were present (mean age 53 years). For the same reason, follow-up to identify CRC generally started from an earlier calendar period in Study 3. The proportion of men was slightly higher in Study 1 (52%) than in Study 2 (48%) and Study 3 (49%). Patient characteristics of the study cohort(s) in each study are shown in Table 2. As expected, individuals in the study cohort of Study 3 had fewer comorbidities, were less likely to be overweight or obese or to be taking the medications evaluated, and had fewer PCP visits than individuals in Studies 1 and 2. In Study 1, the low-dose aspirin cohort was less healthy in terms of comorbidities, BMI, smoking status and previous colonoscopy/sigmoidoscopy procedures than the cohort of non-users of low-dose aspirin at start of follow-up. In Study 2, by contrast, the cohort of new users of paracetamol had a higher level of several comorbidities (chronic obstructive pulmonary disease, inflammatory bowel disease, depression, gastrointestinal conditions), more previous colonoscopy/sigmoidoscopy procedures and a similar level of current smokers compared with the low-dose aspirin cohort. In addition, the prevalence of these particular patient characteristics in the cohort of new users of paracetamol at start of follow-up in Study 2 was higher than in the cohort of non-users of low-dose aspirin at start of follow-up in Study 1.

Incidence of CRC
A total of 3033 incident CRC cases were identified in Study 1 compared with 3174 in Study 2 and 12,333 in Study 3. Of the 3033 cases in Study 1, 55.2% (n = 1674) were also cases in Study 2 (because they met the eligibility criteria for both studies), while of the 12,333 cases in Study 3, 65.2% (n = 8043) were not cases in Studies 1 or 2. Details on the clinical features of the 3033 cases in Study 1 have been described previously [13]. Median follow-up duration was 5.1 years in Study 1, 5.8 years in Study 2 and 7.5 years in Study 3. The incidence of CRC per 10,000 person-years was 16 2 and 3, respectively. In each study, incidence rates increased with age in both sexes and were higher in men than women in all age groups (Fig 2).

Characteristics of cases and controls
Cases and controls were generally comparable with regard to demographics, healthcare utilization, lifestyle characteristics and comorbidities across the three study designs (Table 3); however, cases had more healthcare visits (PCP visits, referrals and hospitalizations) and higher alcohol consumption. In both Studies 1 and 2, 41% of cases and 45% of controls were current users of low-dose aspirin, compared with 12% of cases and 13% of controls in Study 3. More than 90% of low-dose aspirin use among cases and controls in each study was at a dose of 75 mg/day (see S2 Table for further details on aspirin use among cases and controls in each study). The proportion of low-dose aspirin current users (among both cases and controls combined) who had at least 1 year's use of the drug by the index date was 68.0%, 77.3% and 73.6% in Studies 1, 2 and 3, respectively.

Low-dose aspirin and risk of CRC
Current use of low-dose aspirin was associated with a statistically significant reduction in risk of CRC in each study design; RR 0.66 (95% CI: 0.60-0.73) in Study 1, 0.71 (95% CI: 0.63-0.80) in Study 2 and 0.69 (95% CI: 0.64-0.74) in Study 3 (Fig 3). Reduced risks of CRC were seen throughout treatment duration, and no dose-response relationships were observed. The RR of CRC with current use of low-dose aspirin is shown by patient subgroups in Fig 4. Using all three study designs, significantly reduced risks of CRC were seen when low-dose aspirin was used for primary or secondary cardiovascular disease prevention, for both fatal and non-fatal cases, for colon or rectal cancer, for both sexes, and across all age groups. Age did not modify the protective effect of low-dose aspirin (p for interaction = 0.27, 0.23 and 0.53 in the three studies, respectively), and no differences were observed between men and women (p for interaction = 0.24, 0.98 and 0.98, respectively). In addition, current use of low-dose aspirin was associated with a reduced risk of CRC among patients with or without previous colonoscopy/sigmoidoscopy procedures (Fig 4).
[Figure note, opening text missing:] This is because, in these two studies, an updated version of THIN (the latest data released) was used at the start of follow-up to identify CRC cases, whereas an earlier version was used for the ascertainment of study cohorts. For these two studies, some patients included in the study cohort(s) were no longer eligible for inclusion in the follow-up to identify CRC cases, for reasons such as: they were no longer included in the database (e.g. had transferred out of the practice); they had died on the same day that they became an eligible study cohort member; or they no longer met all inclusion criteria (e.g. data incompleteness).
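The percentage risk reductions quoted for the three studies follow directly from the rate ratios as (1 − RR) × 100. The short sketch below reproduces that arithmetic from the reported RRs and CIs; the function names are hypothetical, and rounding to whole percentages is an assumption about how the published figures were presented.

```python
def percent_reduction(rr):
    """Relative risk reduction implied by a rate ratio, as a whole percentage."""
    return round((1 - rr) * 100)


def reduction_interval(ci_lower, ci_upper):
    """Convert an RR confidence interval into a risk-reduction interval.

    The upper RR bound gives the smallest reduction and the lower RR
    bound gives the largest, so the bounds swap places.
    """
    return percent_reduction(ci_upper), percent_reduction(ci_lower)


# Rate ratios and 95% CIs reported for current use of low-dose aspirin.
reported = {
    "Study 1": (0.66, 0.60, 0.73),
    "Study 2": (0.71, 0.63, 0.80),
    "Study 3": (0.69, 0.64, 0.74),
}

reductions = {study: percent_reduction(rr) for study, (rr, _, _) in reported.items()}
```

Applied to the reported RRs, this gives the 34%, 29% and 31% reductions stated for Studies 1, 2 and 3, respectively.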

Discussion
Using data from a validated UK primary care database, our results support an approximately 30% protective effect of low-dose aspirin against the development of CRC. This effect was consistent across our three studies, each adopting a different study design, supporting previous findings. Our findings also suggest this protective effect is present across patient subgroups. In each study design, a daily dose of 75 mg was effective in reducing CRC risk, in line with previous findings [5]. This is an important and potentially beneficial public health finding because aspirin doses of less than 100 mg are sufficient to reduce the risk of thrombotic cardiovascular events, while the risk of aspirin-associated gastrointestinal bleeding is dose-dependent and is lower with daily doses of 75 mg than with 300 mg [14][15][16]. We did not observe any dose-response relationships between low-dose aspirin and CRC; however, more than 90% of aspirin use in each study was at a dose of 75 mg/day, leaving insufficient power to assess the higher doses evaluated. While overall CRC incidence rates from Studies 1 and 2 were higher due to the age differences at start of follow-up, CRC occurrence was similar across our three studies when reporting age- and sex-specific rates, with a pattern of increased incidence with age in both sexes and consistently higher rates in men. Incidence rates from Study 3, which used a cohort of relatively 'unselected' patients, are, however, the most representative estimates, and are in line with national statistics from the UK during the study period [17][18][19].
Owing to the very similar results across our three studies, despite the variation in the characteristics of the study cohort(s) between the three studies, we feel it is unlikely that the observed protective effect of low-dose aspirin is explained by bias. All three studies adopted new-user study designs [20] that would have helped to minimize the survival bias present among prevalent users. The design of Study 1 would have helped minimize potential bias arising from differences between users and non-users of low-dose aspirin, which may otherwise have been difficult to control, while use of a 'neutral' drug (paracetamol) in Study 2 minimized any healthy user bias. In fact, we found that in Study 2, the cohort of new users of paracetamol at start of follow-up was less healthy in terms of several comorbidities and lifestyle factors than the cohort of new users of low-dose aspirin at start of follow-up. Moreover, the lack of association between paracetamol and CRC observed in the three studies further reinforces the validity of this drug for use as an active comparator group. Our results have high external validity because data in THIN have been shown to be representative of the UK general population [8]. In addition, the source population for each study included a broad range of real-world patients, including those with gastrointestinal disorders. Previously, we have found the Dukes Stages of CRC cases in Study 1 to be broadly consistent with national data [11], supporting their representativeness of cases in the general population. We have also previously validated cases from Study 1 through linkage to hospitalization data and through PCP questionnaires for a smaller sample [11], finding high confirmation rates.
Characteristics of cases and controls, in terms of healthcare use, comorbidities and lifestyle factors, were similar across the three studies, suggesting that cases in Studies 2 and 3 are also likely to have a high level of validity and to be representative of those in the general population.
In the UK, low-dose aspirin can be obtained over-the-counter (OTC), yet misclassification of aspirin exposure due to unrecorded OTC use is likely to be minimal, as shown previously through a survey of PCPs contributing data to THIN [21]. Any misclassification because of non-adherence would have been non-differential and would have biased the risk estimates towards the null. Paracetamol is also available as an OTC medication in the UK, and similarly any misclassification is likely to be non-differential between cases and controls. Residual confounding, however, cannot be excluded despite our being able to control for several confounders in our analyses. Some risk factors for CRC, such as a positive family history and red meat intake, are not recorded in the database and so we could not include these variables in our analyses; however, they are unlikely to be related to aspirin exposure and therefore unlikely to confound the associations found.
In conclusion, we have found that new use of low-dose aspirin is associated with a significant 30% reduced risk of CRC, consistent across patient subgroups. The similarity of our results between studies adopting different study designs makes it unlikely that the findings can be explained by selection bias.
Supporting Information S1