A Retrospective Observational Study of the Relationship between Single Nucleotide Polymorphisms Associated with the Risk of Developing Colorectal Cancer and Survival

Background There is variability in clinical outcome for patients with apparently the same stage colorectal cancer (CRC). Single nucleotide polymorphisms (SNPs) mapping to chromosomes 1q41, 3q26.2, 6p21, 8q23.3, 8q24.21, 10p14, 11q13, 11q23.1, 12q13.13, 14q22, 14q22.2, 15q13.3, 16q22.1, 18q21.1, 19q13.11, 20p12, 20p12.3, 20q13.33 and Xp22 have robustly been shown to be associated with the risk of developing CRC. Since germline variation can also influence patient outcome, the relationship between these SNPs and patient survivorship from CRC was examined.
Methods All patients enrolled into the National Study of Colorectal Cancer Genetics (NSCCG) were genotyped for the 1q41, 3q26.2, 6p21, 8q23.3, 8q24.21, 10p14, 11q13, 11q23.1, 12q13.13, 14q22, 14q22.2, 15q13.3, 16q22.1, 18q21.1, 19q13.11, 20p12, 20p12.3, 20q13.33 and Xp22 SNPs. Linking this information to the National Cancer Data Repository allowed patient genotype to be related to survival.
Results The linked dataset consisted of 4,327 individuals. The 14q22.2 genotype defined by the SNP rs4444235 showed a significant association with overall survival. Specifically, the C allele was associated with poorer observed survival (per allele hazard ratio 1.13, 95% confidence interval 1.05–1.22, P = 0.0015).
Conclusion The CRC susceptibility SNP rs4444235 also appears to exert an influence in modulating patient survival and warrants further evaluation as a potential prognostic marker.


INTRODUCTION
Colorectal cancer (CRC) is a common disease in the UK, affecting around 40,000 individuals annually and accounting for 16,000 cancer-related deaths each year [1]. Despite major advances in the medical management of CRC over the last 25 years, five-year survival remains at only around 55% [1].
A principal metric of patient prognosis in CRC is stage at presentation [2]; however, there is significant variability in overall survival (OS) among patients with apparently the same stage of disease, and understanding these differences is clinically important.
There is evidence of familial concordance for survival in a number of cancers, including CRC [3], which suggests that inherited genetic variation can contribute to CRC prognosis. Additionally, studies have reported associations between survival from CRC and genetic variants, alone or in combination with specific types of chemotherapy [4][5][6]. Hence, as a potential prognostic factor, the concept of germline variation imparting inter-individual variability in tumour development, progression and metastasis is receiving increased attention [7][8][9][10][11].
This hypothesis has been variously examined by a number of researchers but with contradictory results [17][18][19][20][21][22][23][24]. The disparity may be due to the relatively small and heterogeneous cohorts analysed, which had limited power to detect clinically important relationships between SNP genotype and outcome; hence, the prognostic significance of these CRC susceptibility variants remains controversial. To address shortcomings in previous studies we have made use of the recent linkage [10] of the large National Study of Colorectal Cancer Genetics (NSCCG) [25] with the data in the National Cancer Data Repository (NCDR) [26]. This linkage has offered an opportunity to relate genotype and outcome across a larger population than has previously been possible. Using these data, this study aimed to investigate whether 19 CRC susceptibility SNPs also exerted an influence on survival from the disease.

Patients and record linkage
Full details of the NSCCG have been published elsewhere [25] but, in brief, the study collected DNA and clinicopathological data from over 20,000 individuals with colorectal cancer and a series of spouse/partner controls with the aim of creating a unique resource for identifying low-penetrance CRC susceptibility genes. All individuals within this study for whom SNP information was available and who could be linked to the NCDR were identified and matched using the method described previously [10]. To minimise bias, cases were excluded from the analysis if there was more than a year between the diagnosis of CRC recorded in the NCDR and recruitment to the NSCCG (Fig. 1).

Statistical analysis
Statistical analyses were conducted using Stata version 13 (StataCorp, College Station, TX, USA). A P-value of 0.05 (two sided) was considered to be significant. Where commented upon, a Bonferroni correction for multiple comparisons corresponded to a threshold of 0.0026 (0.05/19 SNPs). Results are presented without correction for multiple testing to guard against type II error. Differences in patient characteristics between groups were assessed using χ² and Kruskal-Wallis tests. The study end-point was five-year overall survival calculated from date of recruitment to the NSCCG to date of death or censoring (30th June 2011). Kaplan-Meier graphs according to genotype were generated and their homogeneity evaluated using log-rank tests. Cox proportional hazards regression analysis was used to estimate hazard ratios (HR) and their 95% confidence intervals (CI) whilst adjusting for age, sex, Dukes' stage of disease at diagnosis, deprivation score, tumour site (colon, rectosigmoid junction or rectum), and year of diagnosis. The P-values presented correspond to a test of difference among all three of the genotype groups (common allele homozygote, heterozygote and rare allele homozygote).
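The Kaplan-Meier (product-limit) estimate referred to above can be illustrated with a minimal sketch. This is not the Stata code used in the study; the function name `kaplan_meier` and the toy follow-up data are hypothetical and serve only to show how deaths and censored observations enter the estimate.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct death time.

    times  -- follow-up time for each patient
    events -- 1 if the patient died at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # everyone leaving the risk set at time t
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Patients censored at t are still counted as at risk at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical toy data (years of follow-up, death indicator); not study data.
toy_times = [1, 2, 2, 3, 4]
toy_events = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(toy_times, toy_events):
    print(f"t={t}: S(t)={s:.3f}")
```

In practice one such curve would be computed per genotype group and the curves compared with a log-rank test, as described above.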
The power to demonstrate a relationship between SNP genotype and OS was estimated using sample size formulae for comparative binomial trials. To evaluate the chance of obtaining a false-positive association in our data set and to assess the robustness of previously reported associations between SNP genotype and patient outcome, we made use of the false-positive report probability (FPRP) test [27]. The FPRP value is determined by the P value, the prior probability for the association, and statistical power. For our analyses, we assumed prior probabilities of 0.05, 0.01 and 0.001; imposing an FPRP cut-off value of 0.5 as advocated [27], values less than 0.5 were considered to be noteworthy, being indicative of a robust association.
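As a rough illustration of how the three inputs combine, the FPRP is commonly written as α(1 − π) / [α(1 − π) + (1 − β)π], where α is the observed P value, 1 − β the power and π the prior probability. The sketch below evaluates this expression for the priors stated above; the function name `fprp` and the power value of 0.6 are assumptions made purely for illustration and are not taken from the study's calculations.

```python
def fprp(p_value, power, prior):
    """False-positive report probability for a single association.

    p_value -- observed significance level (alpha in the formula)
    power   -- statistical power to detect the assumed effect (1 - beta)
    prior   -- prior probability that the association is real
    """
    false_pos = p_value * (1.0 - prior)
    true_pos = power * prior
    return false_pos / (false_pos + true_pos)


ASSUMED_POWER = 0.6  # hypothetical value, chosen only for illustration
for prior in (0.05, 0.01, 0.001):
    value = fprp(0.0015, ASSUMED_POWER, prior)
    label = "noteworthy" if value < 0.5 else "not noteworthy"
    print(f"prior={prior}: FPRP={value:.3f} ({label})")
```

The smaller the prior, the larger the FPRP for a given P value and power, which is why a finding can be noteworthy under one prior and not under a more sceptical one.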
Meta-analysis of study findings with previously published data was performed using a fixed-effects model, estimating Cochran's Q statistic to test for heterogeneity and the I² statistic to quantify the proportion of the total variation between studies.
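A fixed-effects (inverse-variance) pooling of hazard ratios, together with Cochran's Q and I², can be sketched as follows. This is an illustrative reconstruction of the standard method rather than the analysis code used here; the function name `fixed_effect_meta` is hypothetical, standard errors are assumed to be recoverable from the reported 95% CIs on the log scale, and the second and third input studies are made-up placeholders rather than published estimates.

```python
import math


def fixed_effect_meta(estimates):
    """Pool hazard ratios given as (hr, ci_lower, ci_upper) tuples.

    Returns (pooled_hr, pooled_lower, pooled_upper, q, i_squared).
    """
    z = 1.959964  # 97.5th percentile of the standard normal distribution
    log_hrs, weights = [], []
    for hr, lower, upper in estimates:
        # Standard error of log(HR) recovered from the 95% CI width.
        se = (math.log(upper) - math.log(lower)) / (2 * z)
        log_hrs.append(math.log(hr))
        weights.append(1.0 / se ** 2)
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_hrs)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_hrs))
    df = len(estimates) - 1
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return (
        math.exp(pooled),
        math.exp(pooled - z * pooled_se),
        math.exp(pooled + z * pooled_se),
        q,
        i_squared,
    )


studies = [
    (1.13, 1.05, 1.22),  # per-allele HR reported in this study
    (1.02, 0.92, 1.13),  # hypothetical placeholder
    (1.06, 0.95, 1.18),  # hypothetical placeholder
]
hr, lo, hi, q, i2 = fixed_effect_meta(studies)
print(f"pooled HR={hr:.2f} (95% CI {lo:.2f}-{hi:.2f}), Q={q:.2f}, I2={i2:.0%}")
```

Under the null hypothesis of homogeneity, Q is compared with a χ² distribution on k − 1 degrees of freedom, where k is the number of studies.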

Linkage
Information on 9,229 individuals recruited to the NSCCG and with SNP information was supplied for linkage to the NCDR. The study population consisted of 4,327 (46.9%) of these individuals who both matched into the NCDR and who were recruited to the NSCCG within a year of the diagnosis of their disease (Fig. 1).

Descriptive statistics
Complete clinical and demographic characteristics of the subjects studied are provided in Table 1. The median age at diagnosis of CRC was 60 years (mean, 58.6 years; standard deviation, 8.0). A total of 2,626 cases (60.7%) had colonic, 416 (9.6%) rectosigmoid and 1,285 (29.7%) rectal tumours; the majority of patients presented with Dukes' stage B and C tumours (3,055, 70.6%).
Overall, the 5-year survival rate was 64.3% (95% CI 62.9-65.8%). There were 1,658 (38.3%) deaths across the entire cohort. Survival was strongly associated with tumour stage (P<0.0001); 5-year survival ranged from 54.9% (95% CI 50-60%) for patients diagnosed with stage D CRC to 88.4% (95% CI 83.6-91.2%) for those with stage A CRC. Since these survival rates are not significantly different from those documented in previously published studies investigating the prognosis of actively managed CRC patients [28], it was concluded that there is no evidence that 'healthy study participant' selection would bias analyses.

Relationship between SNP genotype and OS
There was no statistically significant correlation between SNP genotype and the pathological parameters of site and stage. Only one SNP showed evidence of a correlation with OS (Table 2). A significant association was identified between rs4444235 genotype and prognosis, where the hazard ratio for increasing number of variant alleles was 1.13 (95% CI: 1.05-1.22). Hazard ratios for heterozygosity, homozygosity and carrier status were: 1.18 (95% CI: 1.04-1.34) and 1.28 (95% CI: 1.11-1.48), respectively. It should be noted that the association (P trend = 0.0015) remained significant if Bonferroni correction for multiple testing was applied (P adj = 0.032) and remained noteworthy (i.e. FPRP < 0.5) provided the prior was >0.001. Kaplan-Meier estimates (Fig. 2) demonstrated that carriers had lower five-year survival than those with the wildtype genotype (P<0.01).

Commentary on previously published studies
A number of previous studies have evaluated the relationship of rs4444235 and other risk SNPs with patient prognosis (Table 3). Tenesa et al [21] analysed 10 CRC susceptibility variants but found no association with OS or CRC-specific survival (CSS). Xing et al [22] analysed six SNPs in a small cohort of patients in relation to recurrence and death and generated evidence suggesting that rs10795668 (10p14) might influence recurrence (P = 0.007, P adj = 0.042). The effect observed was strongest in those receiving chemotherapy. Phipps et al [20] studied 16 CRC SNPs (including some also analysed by Tenesa et al [21]) and survival in 2,611 CRC patients ascertained from five cohort studies. They reported that the 18q21 variant rs4939827 affected OS (P = 0.002; P adj = 0.03). More recently, Abuli and co-workers [17] reported on the relationship between 16 CRC risk SNPs and survival in CRC patients recruited to the Spanish EPICOLON consortium. Genetic variants rs9929218 at 16q22.1 and rs10795668 at 10p14 were reported to have an effect on OS (P = 0.0179 and 0.057, respectively), albeit neither was robust after adjustment for multiple testing (P adj = 0.28 and 0.91, respectively). Most recently, Hoskins et al [19] have reported on the relationship between 11 SNPs and survival. The only associations reported to be significant were for homozygosity for the 8q24 SNPs rs7013278 and rs7014346 (P = 0.01 and 0.03 respectively; P adj for number of risk loci = 0.06 and 0.18 respectively). Contemporaneously, Dai and co-workers [18] reported on the relationship between 26 SNPs in 10 of the GWAS risk loci and survival in a cohort restricted to individuals with Dukes' stage B and C cancers. rs961253 (20p12.3), rs355527 (20p12.3), rs4464148 (18q21.1), rs6983267 (8q24.21) and rs10505477 (8q24.21) were significantly associated with survival. The effects were no longer statistically significant, however, after adjustment for multiple testing. Irrespective of correction for multiple testing, assuming a prior of 0.001, none of these associations is inherently robust. Only two of the previously reported studies have investigated the influence of rs4444235 on prognosis, and they found no significant association with survival [20,21]. To further examine the association between rs4444235 genotype and OS, a meta-analysis pooling our study with these studies was undertaken (Fig. 3). Collectively, the three studies provided rs4444235 genotypes on a total of 9,686 CRC patients. Using these data, the summary HR was 1.08 (95% CI 1.02-1.14), with little evidence of heterogeneity between studies (P het = 0.34; I² = 8%).

DISCUSSION
Here we have provided evidence that variation at 14q22.2 defined by rs4444235 influences CRC outcome independent of established metrics. Although our study did not provide evidence for a relationship between the other SNPs and outcome, our analysis had only 50-70% power to demonstrate a 10% difference in prognosis according to carrier status at the 5% significance threshold. Hence it is not possible to conclusively exclude the possibility that variation at the other CRC risk loci may also be linked to outcome.
Major strengths of our study are its size, the fact that it is drawn from a representative sample of the population, and its systematic follow-up of patients. Overall survivorship is unlikely to have influenced study findings, even though case selection in NSCCG is biased to Dukes' stages A and B disease. It therefore seems unlikely that any spurious influences as a consequence of study design will have impacted significantly on our findings. Furthermore, as our analysis was restricted to UK patients with self-reported European ethnicity, our study findings are also unlikely to be confounded by population stratification. We do, however, acknowledge that a limitation of our study is that we have not addressed potential bias arising from non-uniform treatment. While this is a potentially serious confounder in studies of some tumours, the management of CRC is relatively uniform within the UK. Support for this assertion is provided by the fact that survival rates observed in our study population were not different from those expected of other unselected patients of a similar stage profile treated in the UK [2]. It is likely that the impact of risk variants will be contingent upon interaction with non-genetic risk factors. Unfortunately, such data were not available within the current study to allow such an analysis.
Mechanistically, the functional basis for the 14q22.2 association has yet to be fully elucidated. It is noteworthy that the risk allele of rs4444235 appears to be preferentially associated with the development of microsatellite stable CRC [29,30]. This is consistent with the observation that germline mutation in the TGF-β superfamily-signalling pathway genes is associated with microsatellite stable CRC, and hence may impact indirectly on patient outcome. Furthermore, reporter gene studies have demonstrated that the element to which rs4444235 maps acts as an allele-specific transcriptional enhancer. Allele-specific expression studies in CRC cell lines heterozygous for rs4444235 have shown significantly increased expression of bone morphogenetic protein-4 (BMP4) associated with the risk allele, providing evidence for a functional basis for the non-coding risk variant [31].
This analysis has provided evidence that variation in 14q22.2 plays a role in defining individual patient prognosis. However compelling this association between 14q22.2 and OS is on the basis of biological plausibility, as with all association studies, independent validation of study findings is required. While previously published studies have not provided support for the 14q22.2 association, such studies are small and hence have had limited power to demonstrate a relationship [17][18][19][20][21][22][23][24][32]. Our analysis therefore serves to highlight the statistical problem of searching for genetic associations when the impact of any variant is likely to be at best modest. Even stipulating a significance level of 0.05 for an analysis of clinical trial data is unrealistic, because to have 80% power to demonstrate a 5% difference in survival, which is clinically relevant, requires at least 4,800 patient samples to be analysed, even if the frequency of the at-risk genotype is 50%. It is therefore perhaps not surprising that previously purported associations cannot be considered robust if FPRP-type criteria are imposed.
While germline variants are unlikely to replace staging schemes and conventional markers, they have the potential to assist in distinguishing different outcome patterns among patients with the same stage of disease, where 10% differences are clinically relevant, thereby opening up the possibility of a rational, targeted approach to treatment based on a combination of the genotype and tumour characteristics of a patient.
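The dependence of required sample size on effect size and significance threshold can be illustrated with a standard normal-approximation formula for comparing two proportions. The sketch below is not intended to reproduce the figure of 4,800 quoted above, which depends on assumptions not stated in the text; the function name `patients_per_group`, the survival proportions of 65% and 60%, and the assumption of two equally sized genotype groups are all hypothetical.

```python
import math
from statistics import NormalDist


def patients_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate n per group to compare two survival proportions.

    Uses the unpooled-variance normal approximation for a two-sided
    test with two equally sized groups.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1.0 - p1) + p2 * (1.0 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Hypothetical 5-year survival of 65% versus 60% (a 5% absolute difference).
n_nominal = patients_per_group(0.65, 0.60)
# Same comparison at the Bonferroni-corrected threshold for 19 SNPs.
n_bonferroni = patients_per_group(0.65, 0.60, alpha=0.05 / 19)
print(f"per group at alpha=0.05: {n_nominal}")
print(f"per group at alpha=0.05/19: {n_bonferroni}")
```

Halving the detectable difference roughly quadruples the required sample, and a stricter significance threshold increases it further, which is the practical difficulty described above.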