Association of Caucasian-Identified Variants with Colorectal Cancer Risk in Singapore Chinese

Background Genome-wide association studies (GWAS) in Caucasians have identified fourteen index single nucleotide polymorphisms (iSNPs) that influence colorectal cancer (CRC) risk. Methods We investigated the role of eleven iSNPs or surrogate SNPs (sSNPs), the latter in high linkage disequilibrium (LD, r²≥0.8) with, and within 100 kb of, the iSNPs, in an age- and gender-matched series of 2,000 Singapore Chinese (SCH) cases and controls. Results Only iSNP rs6983267 at 8q24.21 and sSNPs rs6695584, rs11986063, rs3087967, rs2059254 and rs7226855 at 1q41, 8q23.3, 11q23.1, 16q22.1 and 18q21.1, respectively, showed evidence of association with CRC risk, with odds ratios (OR) ranging from 1.13 to 1.40. sSNP rs827401 at 10p14 was associated with rectal cancer risk (OR = 0.74, 95% CI 0.63–0.88) but not with disease prognosis (OR = 0.91, 95% CI 0.69–1.20). Interestingly, sSNP rs3087967 at 11q23.1 was associated with CRC risk in men (OR = 1.34, 95% CI 1.14–1.58) but not in women (OR = 1.07, 95% CI 0.88–1.29), suggesting a gender-specific role. Half of the Caucasian-identified variants, including the recently fine-mapped BMP pathway loci BMP4, GREM1, BMP2 and LAMA5, showed no evidence of association with CRC in SCH (OR ∼1; p-value >0.1). Comparing the results of this study with those of the Northern and Hong Kong Chinese studies, only variants at chromosomes 8q24.21, 10p14, 11q23.1 and 18q21.1 were replicated in at least two of the three Chinese studies. Conclusions The contrasting results between Caucasians and Chinese could be due to different LD patterns and allele frequencies or to genetic heterogeneity. The results suggest that additional common variants contributing to CRC predisposition remain to be identified.


Introduction
To date, GWAS in Caucasian populations have uncovered fourteen iSNPs associated with CRC risk at chromosomes 1q41, 3q26.2, 8q23.3, 8q24.21, 10p14, 11q23.1, 12q13.13, 14q22, 15q13.3, 16q22.1, 18q21.1, 19q13.1, 20p12.3 and 20q13.33 [1]. Fine mapping at several of these candidate regions has identified other SNPs that could be functional variants [2,3]. Because allele frequencies and LD patterns differ substantially across populations, these variants have to be replicated in each population to ascertain their role in CRC. More than one-third of these variants, for example, were found to have odds ratios in the opposite direction in African Americans [4].
We performed genome-wide genotyping on an age- and gender-matched case-control series of 2,000 Singapore Chinese (SCH), comprising patients from a single center and population-based healthy controls. The SCH population aged 50 years or more comprises mainly descendants of immigrants from the southern Chinese provinces of Guangdong and Fujian, and is thus representative of the Southern Han Chinese. Determining the genetic risk for CRC in SCH is pertinent because the SCH have the highest CRC incidence amongst all races in Singapore; internationally, their incidence is higher than that of residents of Shanghai, China, and comparable to that of Caucasian Whites [9].

Ethics Statement
Collection of samples and clinico-pathological information from patients and controls was undertaken with written informed consent and approval from SingHealth Centralized Institutional Review Board B.

Sample Collection
Matched specimens of mucosa and tumor are routinely collected and archived from patients undergoing resection at Singapore General Hospital (SGH), the premier public hospital, which treats about half of the CRC patients in Singapore. The matched mucosa specimens are typically collected at least 10 cm away from the tumor site. Mucosa specimens from 1,000 sporadic Chinese CRC patients (defined as aged 50 years or more at the date of operation and without a dominant family history of familial adenomatous polyposis (FAP) or hereditary non-polyposis colorectal cancer (HNPCC)) archived over the past ten years were selected as cases for the study.
Blood samples from 1,000 age- and gender-matched healthy donors from the Singapore Chinese Health Study (SCHS) (n = 931) [10] and the SGH Health Screening Unit (n = 69) constituted the controls of the study. Controls were matched to within three years of the cases' age at operation. The controls were interviewed to ensure that they had no family history of CRC.

Genome-wide Genotyping
Samples were randomized so that consecutively procured samples were not extracted consecutively. Genomic DNA was extracted using standard procedures (Methods S1). The whole-genome scan was performed with the Affymetrix GeneChip Human Mapping SNP Array 6.0, comprising 906,600 SNPs. A 600 ng genomic DNA sample was digested with the restriction enzymes NspI and StyI, amplified, fragmented, labelled and hybridised to the array for 16 h according to the manufacturer's instructions (Affymetrix, Santa Clara, CA). Arrays were scanned with the Affymetrix 3000-7G scanner and analysed with Genotyping Console v3. CHP files were generated with the Birdseed algorithm. To minimize batch effects, genotyping was performed by a single operator, and matched case and control specimens were processed and arrayed together.

Statistical Analysis
The CHP (genotype) files from the genome-wide scan were imported into Golden Helix SVS for statistical analysis. SNPs that were not in Hardy-Weinberg equilibrium in the controls (p ≤ 1E-7) were filtered out. Principal component analysis (PCA) was performed on 869,371 autosomal SNPs for all 2,000 samples, together with 270 HapMap samples (90 CEU, 45 Han Chinese in Beijing (CHB), 45 Japanese (JPT) and 90 Yoruba (YRI)) and 268 Singapore Genome Variation Project (SGVP) samples. The cases and controls clustered with the CHB and SGVP Chinese samples, indicating no major population substructure. Sixteen outliers, including two controls that probably have admixed ancestry, were removed. There was an observable difference in the clustering of cases and controls along PC1, which was no longer apparent after correction for PC1 (Figure S1).
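The Hardy-Weinberg filter can be illustrated with a minimal sketch. The text does not state which HWE test Golden Helix SVS applied, so this assumes the classical one-degree-of-freedom chi-square goodness-of-fit test on control genotype counts; the function names and the genotype counts are hypothetical.

```python
import math

HWE_P_THRESHOLD = 1e-7  # SNPs with p <= 1E-7 in controls were filtered out


def hwe_chisq_p(n_hom_ref, n_het, n_hom_alt):
    """Chi-square (1 df) test of Hardy-Weinberg equilibrium from genotype counts."""
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        # Monomorphic SNP: the test is undefined, so report no deviation.
        return 1.0
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def passes_hwe_filter(control_counts):
    """Keep a SNP only if its control genotypes do not deviate from HWE at p <= 1E-7."""
    return hwe_chisq_p(*control_counts) > HWE_P_THRESHOLD


# Hypothetical control genotype counts (AA, Aa, aa).
print(passes_hwe_filter((360, 480, 160)))  # True: exactly at HWE proportions for p = 0.6
print(passes_hwe_filter((500, 0, 500)))    # False: no heterozygotes, gross deviation
```

A stricter threshold such as 1E-7 removes mainly SNPs with genotype-calling artefacts while retaining those with modest, possibly biological, departures from HWE.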
Since the hypothesis tested in this study was whether the CEU-identified SNPs for CRC risk could be replicated in the SCH, the correction for multiple testing accounted only for the number of at-risk SNPs investigated. Thus, a Bonferroni-corrected significance threshold of 0.0031 (0.05/16) was applied. Multivariate logistic regression under an additive model was performed with adjustment for PC1. SNPs with p < 0.0031 were considered to be significantly associated with disease risk, and those with 0.0031 < p < 0.1 were considered to show a trend towards association. Subgroup analysis was performed for selected SNPs. The iSNPs were examined whenever possible. If an iSNP was not present on the SNP 6.0 platform or was non-polymorphic in SCH (MAF < 0.01), a surrogate SNP (sSNP) in high LD (r² > 0.8) within 100 kb of the iSNP was identified from the HapMap CHB individuals and examined instead. sSNPs recently identified by fine mapping in CEU were interrogated whenever possible [2,3]. The mean call rate of the eleven iSNPs and sSNPs interrogated was 0.99 (range 0.97 to 1), and the genotypes of these SNPs clustered well.
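The per-SNP test can be sketched as follows, assuming that the additive model codes each genotype as its count of risk alleles (0, 1 or 2) and that PC1 enters as a covariate. The data are simulated and hypothetical, the simulation ignores the 1:1 matched design, and the Newton-Raphson fitter stands in for whatever routine Golden Helix SVS uses internally; only the thresholds mirror those stated above.

```python
import math

import numpy as np

ALPHA = 0.05
N_SNPS_TESTED = 16  # denominator used in the text
BONFERRONI = ALPHA / N_SNPS_TESTED  # 0.003125, reported as 0.0031


def classify(p):
    """Map a p-value onto the categories used in the text."""
    if p < BONFERRONI:
        return "significant"
    if p < 0.1:
        return "trend"
    return "no evidence"


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Logistic regression by Newton-Raphson; X must contain an intercept column.

    Returns coefficient estimates and their Wald standard errors.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = mu * (1.0 - mu)
        grad = X.T @ (y - mu)
        hess = X.T @ (X * w[:, None])  # Fisher information for the logit link
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = X.T @ (X * (mu * (1.0 - mu))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se


# Simulated, hypothetical data: additive dosage (0/1/2 risk alleles) plus PC1.
rng = np.random.default_rng(0)
n = 2000
dosage = rng.binomial(2, 0.3, size=n)
pc1 = rng.normal(size=n)
true_logit = -0.2 + 0.3 * dosage + 0.1 * pc1
status = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))

X = np.column_stack([np.ones(n), dosage, pc1])
beta, se = fit_logistic(X, status)
z = beta[1] / se[1]
p_snp = math.erfc(abs(z) / math.sqrt(2))  # two-sided Wald p-value
print(f"per-allele OR = {math.exp(beta[1]):.2f}, p = {p_snp:.2g}, {classify(p_snp)}")
```

Under the additive coding, exp(beta[1]) is the OR per additional copy of the risk allele, adjusted for PC1.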
Recurrence-free survival was defined as the time from operation to local recurrence and/or distant metastasis. Patients without recurrence up to 31 January 2012 were censored. Kaplan-Meier analysis with the log-rank test was used to evaluate the relationship between genotype and recurrence-free survival. Cox proportional hazards regression was used to test the independence of the covariates and to estimate the risk of recurrence.
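The survival comparison can be sketched in plain Python. This is a minimal illustration, not the software the authors used: it assumes two genotype groups (for example, carriers versus non-carriers of a risk allele; a comparison of all three genotypes would use a log-rank statistic with 2 df), counts patients censored at an event time as still at risk, and uses hypothetical follow-up data. Cox regression is not shown.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier estimate S(t) at each distinct event time.

    times: follow-up time from operation; events: 1 = recurrence, 0 = censored.
    """
    surv, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for ti in times if ti >= t)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        surv *= 1.0 - d / at_risk
        curve.append((t, surv))
    return curve


def log_rank(times, events, groups):
    """Two-group log-rank test (groups coded 0/1); returns (chi2, p) with 1 df."""
    observed1 = expected1 = variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n, n1 = len(at_risk), sum(1 for g in at_risk if g == 1)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups) if ti == t and e and g == 1)
        observed1 += d1
        expected1 += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = (observed1 - expected1) ** 2 / variance if variance > 0 else 0.0
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical follow-up (months): group 1 recurs earlier than group 0.
times = [4, 5, 6, 1, 2, 3]
events = [1, 1, 1, 1, 1, 1]
groups = [0, 0, 0, 1, 1, 1]
print(kaplan_meier(times, events))
print(log_rank(times, events, groups))
```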

Results and Discussion
There were 14% more males than females in this cohort (Table 1). The majority of the cases and controls were aged 61-80 years. About two-thirds of the cases had colon cancer, while early (Dukes A and B) and advanced (Dukes C and D) stages of CRC were almost equally represented. Most of the CRC cases were moderately differentiated. The clinico-pathological features of the cases were representative of Singapore CRC patients. For three candidate regions, either no iSNP was present on the Affymetrix SNP 6.0 platform (14q22 and 19q13.1) or the genotypes of the iSNP clustered poorly (rs4925386 at 20q13.33). None of these three regions had an sSNP in high LD (r² ≥ 0.8) within 100 kb of the iSNP, as exemplified by the LD plot of chromosome 19q13.1 (Figure 1A). Thus, it is unlikely that these candidate regions harbor any SNP that could tag a causal variant associated with CRC risk in SCH.
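The surrogate search can be sketched as a scan over reference-panel SNPs within 100 kb of the index SNP. As an assumption, this sketch estimates r² as the squared correlation of unphased genotype dosages, a common approximation to the haplotype-based r² reported by HapMap; the positions, genotypes and function name are hypothetical. An empty result corresponds to a region with no usable sSNP.

```python
import numpy as np


def find_surrogates(index_pos, index_geno, cand_pos, cand_geno,
                    window_bp=100_000, r2_min=0.8):
    """Return (position, r2) for candidate SNPs within window_bp of the index SNP
    whose genotype-dosage r2 with it is at least r2_min, strongest first."""
    index_geno = np.asarray(index_geno, dtype=float)
    hits = []
    for pos, geno in zip(cand_pos, cand_geno):
        geno = np.asarray(geno, dtype=float)
        if abs(pos - index_pos) > window_bp:
            continue  # outside the 100 kb vicinity
        if geno.std() == 0 or index_geno.std() == 0:
            continue  # monomorphic: r2 undefined
        r = np.corrcoef(index_geno, geno)[0, 1]
        if r * r >= r2_min:
            hits.append((pos, float(r * r)))
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Hypothetical reference-panel dosages (0/1/2) for six individuals.
index_pos = 1_000_000
index_geno = [0, 1, 2, 0, 1, 2]
cand_pos = [1_050_000, 1_060_000, 1_150_000, 1_020_000]
cand_geno = [
    [0, 1, 2, 0, 1, 2],  # perfect proxy, 50 kb away
    [2, 1, 0, 2, 1, 0],  # perfect proxy with flipped allele coding
    [0, 1, 2, 0, 1, 2],  # perfect proxy but 150 kb away: excluded
    [0, 0, 0, 2, 2, 2],  # uncorrelated: excluded
]
print(find_surrogates(index_pos, index_geno, cand_pos, cand_geno))
```

Because r² squares the correlation, a proxy qualifies regardless of which allele is coded as the counted one.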
The only SNP of the eleven interrogated that was significantly associated with CRC risk in SCH was sSNP rs3087967 at 11q23.1 (Figure 1B), possibly owing to its higher minor allele frequency (MAF) and relatively larger effect size (Table 2). Contrary to GWAS in Caucasians and Japanese [11,12], we did not find this variant at 11q23.1 to be associated with greater disease risk in the rectum (OR = 1.20, 95% CI 1.02-1.42) than in the colon (OR = 1.22, 95% CI 1.06-1.41) (Table S1). We did, however, find rs3087967 to be associated with greater CRC risk in men (OR = 1.34, 95% CI 1.14-1.58; p = 0.0005) than in women (OR = 1.07, 95% CI 0.88-1.29; p = 0.4954), suggesting a gender-specific role that has not been previously reported. Interestingly, iSNP rs3802842 at 11q23.1 was replicated in the Northern Chinese but not in the Hong Kong Chinese study [5,6]. The reason is unclear, but the Hong Kong Chinese sampled could be a mixture of migrants from all over China, as Hong Kong is a cosmopolitan city.
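The contrast between men and women rests on separate stratum p-values. One way to compare the two strata formally is a z-test on the difference of the log ORs. This minimal sketch assumes the strata are independent and that the reported 95% CIs are symmetric on the log scale; it takes the sex-specific estimates reported above as inputs and is an illustration, not an analysis reported in the study.

```python
import math

Z_975 = 1.959963984540054  # standard normal quantile for a two-sided 95% CI


def log_or_se(odds_ratio, ci_low, ci_high):
    """Recover log(OR) and its standard error from a 95% CI."""
    return math.log(odds_ratio), (math.log(ci_high) - math.log(ci_low)) / (2 * Z_975)


def compare_odds_ratios(stratum_a, stratum_b):
    """z-test for the difference between two independent log ORs; returns (z, two-sided p)."""
    log_a, se_a = log_or_se(*stratum_a)
    log_b, se_b = log_or_se(*stratum_b)
    z = (log_a - log_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    return z, math.erfc(abs(z) / math.sqrt(2))


men = (1.34, 1.14, 1.58)    # OR, lower and upper 95% limits for rs3087967
women = (1.07, 0.88, 1.29)
z, p = compare_odds_ratios(men, women)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With individual-level data, a genotype-by-sex interaction term in the logistic model is a closely related and more direct alternative.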
Five SNPs, rs6687758, rs11986063, rs6983267, rs2059254 and rs7226855 at 1q41, 8q23.3, 8q24.21, 16q22.1 and 18q21.1, respectively, showed a trend towards association (0.0031 < p < 0.1) with CRC in SCH but did not reach statistical significance, probably owing to insufficient sample size and hence power (Table 2). The MAFs of these five SNPs were also lower than in CEU, although the effect sizes were comparable.
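The power argument can be made concrete with a rough calculation. This sketch assumes an allelic (allele-counting) comparison as an approximation to the additive logistic test, alleles sampled independently (Hardy-Weinberg equilibrium), 1,000 cases and 1,000 controls, and the Bonferroni threshold of 0.05/16; the frequencies and ORs in the usage loop are hypothetical, not the Table 2 values.

```python
import math
from statistics import NormalDist


def case_allele_freq(control_freq, odds_ratio):
    """Risk-allele frequency in cases implied by a control frequency and allelic OR."""
    odds = odds_ratio * control_freq / (1 - control_freq)
    return odds / (1 + odds)


def allelic_test_power(control_freq, odds_ratio, n_cases=1000, n_controls=1000,
                       alpha=0.05 / 16):
    """Approximate power of a two-sided two-proportion test on allele counts."""
    std_normal = NormalDist()
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    a1, a0 = 2 * n_cases, 2 * n_controls  # number of alleles in each group
    pooled = (a1 * p1 + a0 * p0) / (a1 + a0)
    se_null = math.sqrt(pooled * (1 - pooled) * (1 / a1 + 1 / a0))
    se_alt = math.sqrt(p1 * (1 - p1) / a1 + p0 * (1 - p0) / a0)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # The opposite rejection tail is negligible and ignored.
    return std_normal.cdf((abs(p1 - p0) - z_crit * se_null) / se_alt)


for freq in (0.1, 0.3):
    for odds_ratio in (1.2, 1.5):
        power = allelic_test_power(freq, odds_ratio)
        print(f"risk-allele freq {freq}, OR {odds_ratio}: power ~ {power:.2f}")
```

For a fixed OR, power drops as the minor allele becomes rarer, which is the direction of the MAF difference noted above.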
The iSNP rs6983267 at 8q24.21 was the first susceptibility locus to be identified in Caucasians [13,14]. It is also the most frequently replicated iSNP across different populations [5,11,15–17]. Interestingly, rs6983267 was reported to be significantly associated with CRC risk in both the Japanese and Northern Chinese under a recessive model only [5,15]. In SCH, by contrast, rs6983267 showed a larger effect size under a dominant model (OR = 1.38, 95% CI 1.13-1.69). The reason is unclear, although the Japanese have been found to be genetically closer to Northern Han Chinese than to Southern Han Chinese [8], which may explain why these two populations share the recessive pattern. The Hong Kong study, however, found evidence of association with CRC risk not for rs6983267 but for another SNP at 8q24.21, rs7014346 [6].
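The genetic models differ only in how the three genotypes are collapsed before an OR is computed. A minimal sketch with hypothetical genotype counts (not the study data), giving unadjusted ORs with Woolf (log-scale) 95% CIs; the estimates in the study were additionally adjusted for PC1.

```python
import math

Z_975 = 1.959963984540054


def odds_ratio_woolf(a, b, c, d):
    """OR with Woolf 95% CI for a 2x2 table.

    a, b: exposed and unexposed cases; c, d: exposed and unexposed controls.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, or_ * math.exp(-Z_975 * se), or_ * math.exp(Z_975 * se)


def genetic_model_or(cases, controls, model):
    """cases/controls: genotype counts with 0, 1 and 2 copies of the risk allele."""
    if model == "dominant":     # one or two risk alleles vs none
        exp_cases, exp_controls = cases[1] + cases[2], controls[1] + controls[2]
    elif model == "recessive":  # two risk alleles vs zero or one
        exp_cases, exp_controls = cases[2], controls[2]
    else:
        raise ValueError(f"unknown model: {model}")
    return odds_ratio_woolf(exp_cases, sum(cases) - exp_cases,
                            exp_controls, sum(controls) - exp_controls)


cases = (100, 200, 100)     # hypothetical counts
controls = (150, 200, 50)
for model in ("dominant", "recessive"):
    or_, lo, hi = genetic_model_or(cases, controls, model)
    print(f"{model}: OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

The same genotype data can therefore yield a stronger signal under one model than the other, depending on whether the excess risk is concentrated in heterozygotes or in risk-allele homozygotes.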
Further, sSNP rs827401 at 10p14 was associated with a decreased risk of rectal cancer (OR = 0.74, 95% CI 0.63-0.88; p = 0.0006) but not colon cancer (OR = 1.02, 95% CI 0.89-1.18; p = 0.7466) in SCH (Table S1), supporting earlier findings in the Caucasian and Northern Chinese populations [5,18]. A recent study reported that the iSNP at 10p14 was associated with a reduced risk of recurrence [19]. However, we were unable to replicate this with sSNP rs827401 in our rectal cancer patients: Kaplan-Meier analysis showed that the genotype was not significantly associated with recurrence-free survival.
Notably, iSNPs rs7136702 (12q13.13) and rs4779584 (15q13.3) and sSNPs rs12638862 (3q26.2) and rs5005940 (20p12.3) showed no evidence of association with CRC risk in SCH (Table 2; OR ∼1; p > 0.1). The Northern Chinese study found rs961253 at 20p12.3 to be significantly associated with CRC risk (OR = 1.38, 95% CI 1.19-1.60; p = 0.00002) [5]. We could not replicate this finding with sSNP rs5005940 (OR = 1.00, 95% CI 0.79-1.28; p = 0.976), even though the LD structure in SCH is similar, though not identical, to that of the HapMap CHB samples (Figure 1C and 1D), suggesting that genetic heterogeneity exists between Northern and Southern Chinese. Similarly, rs4779584 at 15q13.3, whose risk allele is the major allele in the Chinese (Table 2), was replicated in the Hong Kong Chinese but not in the Northern Chinese or SCH [5,6].
In summary, only iSNPs or sSNPs at 1q41, 8q23.3, 8q24.21, 11q23.1, 16q22.1 and 18q21.1 showed evidence of association with CRC in SCH (Table 2). rs827401 at 10p14 was associated with a decreased risk of rectal cancer only. Moreover, in contrast to the findings of a recent study [19], the 10p14 region was not associated with disease prognosis in our series. Susceptibility loci from seven other candidate regions, 3q26.2, 12q13.13, 14q22, 15q13.3, 19q13.1, 20p12.3 and 20q13.33, showed no evidence of association with the disease. It is noteworthy that none of the four BMP pathway loci highlighted in a recent study [3], BMP4 (14q22), GREM1 (15q13.3), BMP2 (20p12.3) and LAMA5 (20q13.33), replicated in SCH. Chromosome 15q13.3 has been implicated as harboring the CRAC1/HMPS locus in Ashkenazi Jewish patients with hereditary mixed polyposis syndrome (HMPS) [20]. We previously showed that the disease in Singapore Chinese HMPS patients was not linked to 15q13.3, and identified BMPR1A at 10q23 as the disease-causing gene [21]. These earlier results indicate that genetic heterogeneity can give rise to similar clinical phenotypes in different populations.
Of the fourteen CEU-identified variants for CRC, only SNPs at 8q24.21, 10p14, 11q23.1 and 18q21.1 were replicated in at least two of the three Chinese populations, suggesting that the functional variants in these regions could be important for colorectal tumorigenesis across diverse populations (Table 3). Amongst these four SNPs, only rs4939827 at 18q21.1 appears to tag a gene, SMAD7, in the TGF-β signaling pathway, an important pathway in colorectal tumorigenesis [22]. The other three SNPs lie in gene deserts. Accumulating evidence indicates that rs6983267 at 8q24.21 lies within a long-range enhancer that regulates the expression of C-MYC, an oncogene more than 300 kb downstream, by binding T cell factor 4 (TCF4) and enhancing Wnt signaling [23–25]. A recent report, however, found neither somatic loss of the risk allele nor possible functional enhancer elements in the LD regions at 10p14 and 11q23.1 [26], implying that other, unknown mechanisms may be responsible for these associations. In addition, the 8q23.3 region harboring the EIF3H and UTP23 genes could be a potentially important risk region for the Chinese as well, as sSNP rs11986063 showed the highest effect size in SCH. The 8q23.3 region was not interrogated in the other two Chinese studies because the iSNP rs16892766 lacks polymorphism in those populations. Pittman et al. showed that the region containing SNP rs16888589 at 8q23.3 binds the EIF3H promoter and represses its transcription [27]. A later eQTL analysis indicated, however, that the expression of UTP23, rather than that of EIF3H, was correlated with the risk allele of rs16888589 at 8q23.3; the authors suggested that the two genes could be functionally coordinated [2].
Not all CEU-identified variants were replicated in the Chinese. The disparity could be due to differences in allele frequencies and LD structures or to real genetic differences. Since the effect sizes of these variants are relatively small and a recent study has estimated that at least 60 common variants contribute to CRC risk [28], the results imply that other variants contributing to predisposition to CRC remain to be identified.
Figure S1 PCA plots for PC1 vs PC2 and PC2 vs PC3.

(DOC)
Methods S1 DNA Extraction from buffy coat and normal mucosa.