Replication Study in Chinese Population and Meta-Analysis Supports Association of the 11q23 Locus with Colorectal Cancer

Background A common single nucleotide polymorphism (SNP), rs3802842, located at 11q23, was identified by genome-wide association studies (GWAS) as significantly associated with the risk of colorectal cancer (CRC); however, the results of subsequent replication studies were not always concordant. We therefore performed a case-control study and a meta-analysis to clarify the effect of this variant on CRC risk. Method and Findings We determined the genotypes of rs3802842 in 641 unrelated Chinese patients with CRC and 1037 cancer-free controls. Additionally, a meta-analysis comprising the current and previously published studies was conducted. In our case-control study, significant associations between the polymorphism and CRC risk were observed in all genetic models, with an additive OR of 1.45 (95% CI = 1.26–1.67). The meta-analysis of 38534 cases and 39446 controls further confirmed the significant associations in all genetic models, but with obvious between-study heterogeneity. Ethnicity, study type and whether subjects were affected by Lynch syndrome could jointly account for the heterogeneity. In addition, the cumulative and sensitivity analyses indicated that the results were stable. Conclusion The results from our case-control study and meta-analysis provide convincing evidence that rs3802842 contributes significantly to CRC risk.


Introduction
Colorectal cancer (CRC), one of the most common malignancies, accounted for an estimated 1,230,000 new cases and 680,000 deaths worldwide in 2008 [1]. In the United States, CRC was the second most commonly diagnosed cancer and the second leading cause of cancer death, with an incidence of 51.6 and a mortality of 19.7 per 100,000 population in 2008 [2]. In China, epidemiological studies indicate that the incidence of CRC has grown rapidly, especially in urban areas. The number of affected people in Shanghai increased by as much as 4.2% every year from 1973 to 1993, a rate even higher than the global level (2%) [3]. Additionally, data from 56 cancer registries in China showed that the incidence and mortality rates of CRC ranked third (31.4/100,000) and fifth (14.8/100,000), respectively, among the cancers that affected men and women in 2008 [4]. Among the risk factors for the disease, inherited susceptibility plays a role in the development of CRC and is responsible for about 35% of the variance in CRC risk [5]. However, high-penetrance germline mutations account for only 6% of CRC cases [6], suggesting that the remaining heritability is likely to be a consequence of many common variants with low penetrance.
Genome-wide association studies (GWAS), which efficiently identify common genetic variants for complex diseases without prior knowledge of gene function, have so far uncovered multiple novel single nucleotide polymorphisms (SNPs) associated with CRC susceptibility [7][8][9][10][11][12]. Among these SNPs, rs3802842 (11q23.1), located in the intron region of C11orf93, was first identified in a GWA set of 3004 cases and 3094 controls and 8 replication sets of 14453 cases and 13259 controls [7]. Pittman AM et al. promptly validated the positive finding in pooled data from 8 independent case-control series comprising a total of 10638 cases and 10457 controls [13]. In addition, rs3802842 is the first locus reported to exhibit a population difference between the Japanese and Caucasian populations [7]. Although there is much statistical evidence linking this SNP to CRC, the results from replication studies are not always concordant [14][15][16]. This is probably due to the so-called "winner's curse" phenomenon: initial studies generally overestimate effect sizes, so replication studies whose sample size calculations are based on the overestimated effect sizes are likely to be underpowered and therefore more likely to fail [17]. The modest effect of this variant may also be an important reason for the failure of replication. Meta-analysis, however, is a powerful technique for clarifying inconsistent findings in genetic association studies by increasing the sample size [18]. Notably, although the individual associated variants identified through GWAS confer only modest risks for CRC, the population attributable risk proportions have been considered significant because of the substantial minor allele frequencies [19].
Thus, we conducted a replication study to examine the association between rs3802842 and CRC risk in a Chinese population using a case-control design. We then performed a meta-analysis combining the current and previously published studies of rs3802842 to provide a more precise estimate of this association. Finally, the population attributable risk (PAR) was calculated to evaluate the contribution of rs3802842 to CRC occurrence in the general population.

Study Population
A total of 641 CRC cases were inpatients consecutively enrolled between 2009 and 2011 at the Tongji Hospital of Huazhong University of Science and Technology, a top integrated hospital that receives the majority of cancer patients in Wuhan and the nearby region. All cases were newly diagnosed and histopathologically confirmed, and none had received any treatment prior to blood sample collection. A total of 1037 cancer-free controls were selected randomly from individuals who participated in health check-up programs at the same hospital during the same period in which the cases were enrolled. The health check-up programs also primarily involved residents of Wuhan and the nearby region. The selection criteria for controls were no individual history of cancer and frequency matching to CRC cases on sex and age (±5 years). A person suspected of having CRC through the health check-up programs and then histopathologically confirmed would have been selected as a case; however, no case was ascertained through the health check-up programs in this case-control study. All subjects were unrelated ethnic Han Chinese. At recruitment, epidemiologic data were collected by personal interview or a review of medical records, and 5 ml of peripheral venous blood was drawn from each participant. Participants provided written informed consent to participate in this study, and the study was approved by the ethics committee of Tongji Hospital of Huazhong University of Science and Technology.

Genotyping
Genomic DNA was extracted from the 5-ml blood sample using the RelaxGene Blood System DP319-02 (Tiangen, Beijing, China) according to the manufacturer's instructions. rs3802842 was genotyped using the TaqMan SNP Genotyping Assay (Applied Biosystems, Foster City, CA) on a 7900HT Fast Real-Time PCR System (Applied Biosystems, Foster City, CA). For quality control, 5% of the samples were randomly selected and genotyped in duplicate to assess reproducibility, with a concordance rate of 100%.

Statistical Analysis
Deviation of the genotype frequencies in controls from those expected under Hardy-Weinberg equilibrium was assessed using a goodness-of-fit χ² test. Differences in demographic variables and in the distribution of genotypes between cases and controls were examined by χ² test or t test, as appropriate. The association between rs3802842 and CRC risk was estimated as an odds ratio (OR) with 95% confidence interval (95% CI), computed by unconditional multivariate logistic regression with adjustment for sex and age. ORs and 95% CIs were calculated as the metrics of effect size for the allele C versus A and for the genotypes AC versus AA and CC versus AA. To avoid assuming a single genetic model, dominant, recessive and additive models were also analyzed. The Bonferroni method was applied to adjust for multiple comparisons. Additionally, stratified analyses by tumor site and tumor differentiation were carried out to further evaluate the role of rs3802842 in CRC. All statistical analyses were performed in SPSS 18.0, and all P values are two-tailed with a significance level of 0.05.
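The two basic calculations described above, the Hardy-Weinberg goodness-of-fit test and an odds ratio with its confidence interval, can be sketched as follows. This is a minimal illustration, not the SPSS procedure used in the study: the odds ratio here is a crude (unadjusted) allelic OR with a Woolf log-scale interval rather than the sex- and age-adjusted logistic regression estimate, the function names are hypothetical, and the genotype counts are invented for illustration only.

```python
import math


def hwe_chi_square(n_aa, n_ac, n_cc):
    """Goodness-of-fit chi-square (1 df) for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ac + n_cc
    p = (2 * n_aa + n_ac) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele C
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ac, n_cc)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def allelic_odds_ratio(case_counts, control_counts, z=1.96):
    """Crude OR for allele C versus A with a Woolf (log-scale) 95% CI.

    Each argument is a tuple of genotype counts (AA, AC, CC).
    """
    def allele_counts(aa, ac, cc):
        return 2 * aa + ac, 2 * cc + ac   # (A alleles, C alleles)

    case_a, case_c = allele_counts(*case_counts)
    ctrl_a, ctrl_c = allele_counts(*control_counts)
    odds_ratio = (case_c * ctrl_a) / (case_a * ctrl_c)
    se_log = math.sqrt(1 / case_a + 1 / case_c + 1 / ctrl_a + 1 / ctrl_c)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical genotype counts (AA, AC, CC); NOT the study data.
controls = (360, 480, 160)
cases = (250, 500, 250)
print(hwe_chi_square(*controls))
print(allelic_odds_ratio(cases, controls))
```

With a Bonferroni correction, each P value would then be compared with 0.05 divided by the number of comparisons performed.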

Meta-analysis of rs3802842 in Association with CRC Risk
To ensure the rigour of this current meta-analysis, we designed and reported it according to the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) statement (http://www.prisma-statement.org) and the checklist is shown in Checklist S1.
We searched the PubMed, EMBASE, and ISI Web of Science databases for studies published in any language up to April 2012 using the search terms rs3802842, C11orf53, or 11q23.1 combined with colorectal cancer, colorectal neoplasia, colorectal adenoma, colon cancer, or rectal cancer. To expand the coverage of our searches, we also searched the Chinese Biomedical (CBM) database [20] using the same strategy. References of retrieved articles and reviews were checked for additional studies. Searching was performed in duplicate by two independent reviewers (L. Zou and R. Zhong). The following criteria were applied for literature selection: (1) case-control study assessing the association between rs3802842 and CRC risk; (2) presentation of crude/adjusted OR with 95% CI or sufficient data to calculate crude/adjusted OR with 95% CI; (3) studies of humans. When there were multiple published reports from the same study population, the one with the more complete design or larger sample size was selected. If more than one ethnic population was included in one report, each population was considered separately. All data were extracted independently by two reviewers (L. Zou and R. Zhong), and any disagreement was adjudicated by a third author (Q. Wang). For each study, we recorded the first author, year of publication, geographic location, ethnicity of the study population, study type, genotyping method, numbers of cases and controls, male/female ratio, mean age, family history of cancer, source of the control group and frequencies of genotypes in cases and controls. ORs were calculated for the allele C versus A and for the genotypes AC versus AA and CC versus AA, and dominant, recessive and additive models were also assumed for rs3802842. The Bonferroni correction was applied to counteract the problem of multiple comparisons. Between-study heterogeneity was assessed by the χ²-based Cochran's Q statistic (heterogeneity was considered significant at P < 0.10).
The I² statistic was then used to quantify heterogeneity (I² < 30%, no or marginal between-study heterogeneity; I² = 30%-75%, mild heterogeneity; I² > 75%, notable heterogeneity) [21]. A fixed-effects model, using the Mantel-Haenszel method, was adopted to compute the pooled OR when no significant heterogeneity was detected [22]; otherwise, a random-effects model, using the DerSimonian and Laird method, was applied [23]. The overall meta-analysis for rs3802842 was performed first. We then conducted stratified analyses where data permitted (at least 3 studies in each subgroup), according to ethnicity (European, Asian and African), study type (GWAS and replication studies), tumor site (colon and rectal cancers) and whether subjects were affected by Lynch syndrome. Additionally, a sensitivity analysis was carried out in which the pooled ORs were recalculated after omission of each study in turn [24]. Cumulative meta-analysis was also performed by adding studies in order of publication time [25]. Potential publication bias was assessed by Egger's test [26]. If a significant association between the polymorphism and CRC risk was detected in the overall meta-analysis for any genetic model, bioinformatics analyses were further carried out to predict the function of rs3802842 using three integrated bioinformatics tools: "SNP Info" (http://manticore.niehs.nih.gov/snpfunc.htm), "FastSNP" (http://fastsnp.ibms.sinica.edu.tw/pages/input_CandidateGeneSearch.jsp) and "F-SNP" (http://compbio.cs.queensu.ca/F-SNP/).
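The pooling and heterogeneity statistics named above can be illustrated with a short sketch. It is not the Stata code used for the analysis: the function names are hypothetical, the 2×2 tables are invented, and Cochran's Q is computed here from inverse-variance weights on the log OR scale, which is an assumption about the weighting.

```python
import math


def mantel_haenszel_or(tables):
    """Fixed-effects pooled OR. Each table is (a, b, c, d): exposed cases,
    unexposed cases, exposed controls, unexposed controls."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


def log_or_and_variance(table):
    """Log odds ratio of one 2x2 table and its Woolf variance."""
    a, b, c, d = table
    return math.log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def heterogeneity(effects, variances):
    """Cochran's Q and I^2 (in percent) from inverse-variance weights."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2


def dersimonian_laird(effects, variances, z=1.96):
    """Random-effects pooled OR with 95% CI and the tau^2 estimate."""
    weights = [1.0 / v for v in variances]
    q, _ = heterogeneity(effects, variances)
    df = len(effects) - 1
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return (math.exp(pooled), math.exp(pooled - z * se),
            math.exp(pooled + z * se), tau2)


# Hypothetical 2x2 tables; NOT data from the included studies.
tables = [(120, 100, 90, 110), (300, 250, 260, 290), (80, 60, 75, 85)]
effects, variances = zip(*(log_or_and_variance(t) for t in tables))
print(mantel_haenszel_or(tables))
print(heterogeneity(effects, variances))
print(dersimonian_laird(effects, variances))
```

When the estimated tau² is zero, the DerSimonian and Laird estimate coincides with the inverse-variance fixed-effects estimate, so the choice between models matters only when heterogeneity is present.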
To examine the contribution of this polymorphism to the occurrence of CRC in the general population, the PAR was computed using the following formula: PAR = Pr(RR − 1)/[1 + Pr(RR − 1)], where Pr, the proportion of control subjects exposed to the allele of interest, is taken from the minor allele frequency (MAF) reported in the dbSNP database, and the relative risk (RR) is estimated by the pooled OR produced by the meta-analysis [27]. All statistical analyses were performed with Stata 11.0 software, and P values less than 0.05 were considered statistically significant for all tests except the Q test for heterogeneity.
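As a worked check of the formula, the sketch below plugs in the values reported later in the Results (a MAF of 31.3% for the C allele and a pooled additive OR of 1.15). The function name is hypothetical, and using the OR in place of the RR follows the approximation stated above.

```python
def population_attributable_risk(pr, rr):
    """PAR = Pr(RR - 1) / [1 + Pr(RR - 1)]."""
    excess = pr * (rr - 1.0)
    return excess / (1.0 + excess)


# Pr = MAF of the C allele, RR approximated by the pooled additive OR.
par = population_attributable_risk(pr=0.313, rr=1.15)
print(f"PAR = {par:.1%}")  # prints "PAR = 4.5%"
```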

Results of Case-control Study
Population characteristics. The characteristics of cases and controls are listed in Table 1. No significant differences were found between cases and controls for sex and age. Males accounted for 59.9% of cases compared with 59.1% of controls (P = 0.748). The mean age (± standard deviation) was 56.31 (±12.59) years for cases and 57.24 (±10.86) years for controls (P = 0.119). Of the cases, 250 had colon cancer and 391 had rectal cancer. For tumor differentiation, 80, 445 and 116 cases were classified as poorly, moderately and well differentiated, respectively.
Association analysis. The genotype data of rs3802842 for cases and controls are shown in Table 2. The genotype distribution in controls complied with Hardy-Weinberg equilibrium (P = 0.323). The genotype frequencies for this polymorphism in the case group differed from those in the control group (P < 0.001). In the multivariate logistic regression model adjusted for age and sex, individuals who carried the C allele or the AC or CC genotype had significantly elevated risks of CRC compared with those who carried the A allele or the AA genotype. We then stratified the data according to the pathological factors (Table 2). The polymorphism was associated with an increased risk of colon cancer in all genetic models except the recessive model. Significant associations between the polymorphism and rectal cancer were also observed in all genetic models. However, after adjustment for multiple testing, the association in the recessive model was no longer significant for either colon or rectal cancer. Regarding tumor differentiation, the variant was associated with a significantly elevated risk of poorly or moderately differentiated CRC in all genetic models. For well differentiated cancer, significant associations between the polymorphism and CRC risk were detected in all models except the recessive model. After correction for multiple testing, the association in the recessive model was no longer significant for poorly, moderately or well differentiated cancer.

Results of Meta-analysis
Study characteristics. Figure S1 shows the literature search and study selection procedures. After comprehensive searching, 31 potentially relevant publications were identified and screened for retrieval, of which 15 met the inclusion criteria. However, the publications reported by Hutter CM et al. [28], Mates IN et al. [29], Niittymäki I et al. [30], and Lubbe SJ et al. [31] were excluded because their cases largely overlapped with the samples of previous studies. Therefore, 11 publications plus  [32], hence the study merely participated in the pooled analysis for allelic model. In addition, 3 studies included some Lynch syndrome patients [32,33,35]. Table 3 shows the characteristics of the included studies.
Overall meta-analysis of rs3802842 in association with CRC. As shown in Table 4
Stratified analysis. The stratified analysis was first conducted by ethnicity (Table 4). Because fewer than 3 studies involved African populations, the data for rs3802842 were stratified into only two subgroups: European and Asian. After stratification by ethnicity, the significant between-study heterogeneity was effectively reduced in Europeans, whereas heterogeneity was still detected in Asians. In the European population, the polymorphism was associated with significantly increased risks of CRC in all genetic models. In the Asian population, all genetic models except the recessive model exhibited significant associations with CRC risk. However, no significant association was found for the genotypes AC versus AA and CC versus AA, the dominant model or the recessive model after the Bonferroni correction. The data were further stratified by study type into GWAS and replication studies (Table 4). Heterogeneity was not detected for any genetic model in the GWAS subgroup; however, there was significant heterogeneity in the subgroup of replication studies. Statistically significant findings were seen for all genetic models in both the GWAS and the replication studies. The data were additionally stratified into subgroups of subjects affected or not affected by Lynch syndrome (Table 4). Among subjects affected by Lynch syndrome, only the allelic OR could be pooled because of the limited number of studies; in this allelic model, a significant association between the variant and CRC risk was observed without heterogeneity. Among subjects not affected by Lynch syndrome, there was evidence of heterogeneity in all genetic models, and significant associations between the polymorphism and CRC risk were found. The stratified analysis by tumor site could not be performed owing to a lack of data.
Sensitivity analysis. Since significant between-study heterogeneity was observed in all genetic models in the overall meta-analyses, we performed sensitivity analyses to assess the effect of each individual study on the pooled OR under the random-effects model. As shown in Tables S1, S2, S3, S4, S5 and S6, none of the results was materially altered and none led to a different conclusion, suggesting that our results were robust.
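The leave-one-out procedure can be sketched as follows, assuming a DerSimonian and Laird random-effects pool on the log OR scale. The function names and the study-level values are hypothetical and serve only to show the mechanics.

```python
import math


def random_effects_pool(effects, variances):
    """DerSimonian-Laird pooled log OR."""
    weights = [1.0 / v for v in variances]
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]
    return sum(w * y for w, y in zip(w_star, effects)) / sum(w_star)


def leave_one_out(labels, effects, variances):
    """Pooled OR recomputed after omitting each study in turn."""
    results = {}
    for i, label in enumerate(labels):
        rest_effects = effects[:i] + effects[i + 1:]
        rest_variances = variances[:i] + variances[i + 1:]
        pooled = random_effects_pool(rest_effects, rest_variances)
        results[label] = math.exp(pooled)
    return results


# Hypothetical log ORs and variances; NOT the included studies.
labels = ["Study A", "Study B", "Study C", "Study D"]
effects = [math.log(1.10), math.log(1.25), math.log(1.05), math.log(1.45)]
variances = [0.004, 0.006, 0.003, 0.005]
for label, pooled_or in leave_one_out(labels, effects, variances).items():
    print(f"omitting {label}: pooled OR = {pooled_or:.3f}")
```

If every leave-one-out estimate stays on the same side of 1 and close to the overall estimate, no single study is driving the result.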
Cumulative meta-analysis. Cumulative meta-analyses were also conducted in all genetic models by adding studies in chronological order. As shown in Figure S2, although the accumulating studies narrowed the 95% CI of the summary estimates, they did not change the trend toward significant associations.
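A cumulative meta-analysis can be sketched as a running pool over studies sorted by publication year. For brevity this example uses an inverse-variance fixed-effects pool, which is a simplification and not necessarily the model used for Figure S2; the function name and the study values are hypothetical.

```python
import math


def cumulative_meta_analysis(studies, z=1.96):
    """studies: iterable of (year, log_or, variance) tuples.

    Returns one (year, pooled_or, lower, upper) row per added study,
    in chronological order.
    """
    ordered = sorted(studies, key=lambda s: s[0])
    rows = []
    sum_w = 0.0
    sum_wy = 0.0
    for year, log_or, variance in ordered:
        weight = 1.0 / variance
        sum_w += weight
        sum_wy += weight * log_or
        pooled = sum_wy / sum_w
        se = math.sqrt(1.0 / sum_w)   # shrinks as studies accumulate
        rows.append((year, math.exp(pooled),
                     math.exp(pooled - z * se), math.exp(pooled + z * se)))
    return rows


# Hypothetical studies; NOT the included studies.
studies = [(2010, math.log(1.20), 0.010),
           (2008, math.log(1.10), 0.004),
           (2012, math.log(1.15), 0.006)]
for year, pooled_or, lower, upper in cumulative_meta_analysis(studies):
    print(f"{year}: OR = {pooled_or:.3f} (95% CI {lower:.3f}-{upper:.3f})")
```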
Publication bias. Egger's test showed no evidence of publication bias in any genetic model (all P > 0.05).
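Egger's test regresses each study's standardized effect (log OR divided by its standard error) on its precision (the reciprocal of the standard error); an intercept far from zero suggests small-study asymmetry. The sketch below is a plain ordinary least squares version with a hypothetical function name and invented inputs. It returns the t statistic for the intercept, and the P value would be obtained from a t distribution with k − 2 degrees of freedom, which is not computed here.

```python
import math


def egger_regression(effects, std_errors):
    """Return (intercept, slope, se_intercept, t_stat) of Egger's regression."""
    x = [1.0 / se for se in std_errors]                  # precision
    z = [y / se for y, se in zip(effects, std_errors)]   # standardized effect
    n = len(x)
    if n < 3:
        raise ValueError("Egger's regression needs at least 3 studies")
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be equal")
    slope = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / sxx
    intercept = z_bar - slope * x_bar
    residual_ss = sum((zi - intercept - slope * xi) ** 2
                      for xi, zi in zip(x, z))
    s2 = residual_ss / (n - 2)
    se_intercept = math.sqrt(s2 * (1.0 / n + x_bar ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else math.inf
    return intercept, slope, se_intercept, t_stat


# Hypothetical log ORs and standard errors; NOT the included studies.
effects = [0.14, 0.22, 0.09, 0.31, 0.18]
std_errors = [0.05, 0.10, 0.04, 0.20, 0.08]
print(egger_regression(effects, std_errors))
```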
The bioinformatics analysis and PAR of rs3802842. All three bioinformatics tools consistently predicted that the SNP might change the transcription factor binding sites of C11orf93. The PAR was calculated to estimate the percentage of disease incidence in the population that is attributable to the exposure. The MAF of rs3802842 (C allele) was 31.3% and the pooled OR of the overall meta-analysis in the additive model was 1.15, so the PAR for the C allele was estimated to be 4.5%.

Discussion
The present study demonstrated an association between rs3802842 and increased risk of CRC in a Chinese population. The subsequent meta-analysis, based on 32 case-control studies with 38534 cases and 39446 controls, also suggested that the SNP was significantly associated with CRC risk, with an additive OR of 1.15. Accordingly, the PAR, which takes into account both the magnitude of the risk and the risk allele frequency in the general population, was 4.5%. Cumulative analysis in chronological order further confirmed the positive findings, showing that the effect of this variant became progressively more significant as the estimate became more precise. In addition, sensitivity analyses indicated the stability of the result. To the best of our knowledge, this is the first meta-analysis seeking to clarify the association between rs3802842 and risk of CRC, and the results strongly emphasize the role of this polymorphism in CRC susceptibility. rs3802842 is located at 11q23, and within 100 kb of it there are four ORFs (LOC120376, FLJ45803, C11orf53 and POU2AF1) and a SNP (rs12296076), recognized as a polymorphic binding site target for miRNAs, in high linkage disequilibrium with rs3802842 [7]. Moreover, rs3802842 is near the genes encoding POU transcription factors [7]. The complexity of the region in which rs3802842 is located therefore makes it difficult to discern the role of this polymorphism in CRC. Little is known about the function of rs3802842, but our bioinformatics analyses indicated that it is likely to alter the transcription factor binding sites of C11orf93 and thereby affect the expression of the gene, although the function of C11orf93 also remains elusive. Besides, although the important role of the 11q23 region in the pathogenesis of CRC has been confirmed [38], Pittman AM et al. inferred that the potential genomic sequence change caused by rs3802842 might affect the expression of genes mapping outside 11q23.1 through cis- or trans-regulatory mechanisms [13].
Alternatively, most of the variants identified by GWAS may not themselves be causal but instead are likely to be in linkage with the "real" causal variants [19], so it is possible that the polymorphism is in linkage disequilibrium with "real" causal loci that are as yet uncharacterized. However, progress in fine-mapping the causal variants responsible for GWAS signals, which has been largely predicated on the common disease-common variant theory, remains limited. One reason may be that a number of rare variants not identified by GWAS account for much of the remaining heritability of diseases, reflecting the important role of rare variants in disease risk [39]. Moreover, understanding the biological function of risk loci, with a focus on non-coding variants, is the greatest challenge in the post-GWAS era [40].
After the first GWAS on rs3802842, multiple replication studies were carried out in succession, with inconsistent results. Our case-control study found a significant association between the polymorphism and CRC risk in all genetic models, which was consistent with the GWAS and some replication studies [7,13,34]. Stratified analyses by tumor site or differentiation showed similar results, although the associations were not statistically significant in the recessive model, which may be due to the sample size or the inheritance pattern. Although the positive findings were corroborated in our meta-analysis, the obvious between-study heterogeneity cannot be ignored; hence, stratified analyses were conducted to identify its sources. When the data were stratified by ethnicity, heterogeneity was greatly reduced in Europeans but not in Asians, implying that rs3802842 might act differently and have different allele frequencies in the two populations. Moreover, Tenesa A et al. observed significantly different allelic effects of rs3802842 between Japanese and Europeans and further found that the population difference was site-specific: in the Japanese population the variant was associated only with an increased risk of rectal cancer, and not with the risk of colon cancer that was detected in the European population [7]. In our case-control study, however, the polymorphism was related to both colon and rectal cancer risk, suggesting that, although the Chinese and Japanese share the same broad ethnicity, the two populations may differ to some extent in the causes of CRC. Regarding study type, heterogeneity disappeared in the GWAS subgroup but remained in the replication studies, which may have resulted from the diverse genotyping methods used.
After stratification by whether subjects were affected by Lynch syndrome, heterogeneity was almost removed in the subgroup of affected subjects but did not change in the other subgroup, reflecting the different attributes of the research subjects. Taken together, ethnicity, study type and whether subjects were affected by Lynch syndrome were likely to be the sources of heterogeneity in this meta-analysis.
Despite the clear strength of this study, which combined a case-control study and a meta-analysis to substantiate the association between rs3802842 and CRC risk, some limitations should be acknowledged. First, the sample size of our case-control study was relatively small. Second, we performed only bioinformatics analyses rather than functional experiments, so whether this variant is causal remains uncertain. Third, CRC is a complex disease related to both environmental and genetic factors, but only the genetic factor was taken into consideration in this study, which prevented us from exploring gene-environment interactions.
In conclusion, our case-control study and the subsequent meta-analysis provided convincing evidence for the genetic involvement of the rs3802842 polymorphism in CRC susceptibility. However, fine-mapping of the 11q23 region and functional experiments are needed to identify the causal loci.