Single nucleotide polymorphisms within MUC4 are associated with colorectal cancer survival

Mucins and their glycosylation have been suggested to play an important role in colorectal carcinogenesis. We examined potentially functional genetic variants in the mucin genes or genes involved in their glycosylation with respect to colorectal cancer (CRC) risk and clinical outcome. We genotyped 23 single nucleotide polymorphisms (SNPs), which cover 123 SNPs through pairwise linkage disequilibrium (r² > 0.80), in the MUC1, MUC2, MUC4, MUC5AC, MUC6, and B3GNT6 genes in a hospital-based case-control study of 1532 CRC cases and 1108 healthy controls from the Czech Republic. We also analyzed these SNPs in relation to overall survival and event-free survival in a subgroup of 672 patients. Among patients without distant metastasis at the time of diagnosis, two MUC4 SNPs, rs3107764 and rs842225, were associated with overall survival (HR 1.40, 95%CI 1.08–1.82, additive model, log-rank p = 0.004 and HR 0.64, 95%CI 0.42–0.99, recessive model, log-rank p = 0.01, respectively) and event-free survival (HR 1.31, 95%CI 1.03–1.68, log-rank p = 0.004 and HR 0.64, 95%CI 0.42–0.96, log-rank p = 0.006, respectively) after adjustment for age, sex and TNM stage. Our data suggest that genetic variation, especially in the transmembrane mucin gene MUC4, may play a role in the survival of CRC patients; further studies are warranted.


Introduction
With a global incidence of 25.4/100,000 person-years, colorectal cancer (CRC) is the third most common cancer diagnosed in men and the second in women [1]. Individuals with a family history of CRC have an approximately 1.87-fold higher risk of developing CRC than those without a family history [2]. Lifetime risk for carriers of highly penetrant mutations in genes such as APC and the mismatch repair genes may reach 50%-80% [3]. In addition, 194 low-penetrance variants located in 144 loci have so far been associated with the risk of CRC by genome-wide association studies (http://www.genome.gov/gwastudies/). Individuals carrying these variants may also have a considerable risk of CRC [4]. Early identification of such individuals could provide new options for clinical intervention, leading to cancer prevention and improved treatment [3].
Genetic variation in inflammation-related genes is an attractive research target in the context of CRC because inflammation is a known risk factor for CRC and a hallmark of human cancer in general. The gut represents a unique environment for host-pathogen interactions, with a commensal microflora in direct proximity to intestinal epithelial cells. Gut homeostasis is maintained by a mucous barrier that physically separates the microbial community from the gut epithelium. Secreted mucins form this physical barrier, while transmembrane mucins contribute to the protective mucous gel through their O-glycosylated tandem repeats, which extend into the gel [5]. The human mucin family consists of at least 22 members: MUC1-MUC22 [5,6].
It has been reported that reduced synthesis and secretion of mucins and altered O-glycosylation in the mucus layer are related to the development of human ulcerative colitis [7,8]. Dysregulation of mucin biosynthesis, especially of Muc2, and loss of core 1- and core 3-derived O-glycans have been shown to induce colitis and CRC in murine models [9–11]. Intriguingly, loss of core 3 synthase, which plays an important role in the synthesis of mucin-type O-glycans in digestive organs, has also been shown to lead to the development of colon cancer in a mouse model [12].
Aberrant mucin expression has been reported to be a common feature of CRC. MUC1 and MUC5AC have been shown to be up-regulated in CRC [13], and their overexpression has been associated with disease progression [14–20]. For MUC2, down-regulation has been associated with both CRC development and progression [14,17,18,20,21]. Aberrant expression of MUC4 during CRC progression has also been reported, and MUC4 overexpression has been suggested to predict poor survival in patients with early-stage (stage I and II) but not advanced-stage (stage III and IV) tumors [19,22]. MUC6 has also been suggested to be involved in CRC development [20,23].
In addition to the mucins, the core 3 synthase (acetylgalactosaminyl-O-glycosyl-glycoprotein beta-1,3-N-acetylglucosaminyltransferase), encoded by the B3GNT6 gene in humans, has been reported to be significantly down-regulated in colorectal cancer samples [24]. Because of this expression change, it has been suggested as a marker for distinguishing benign adenomas and premalignant lesions [24].
So far, few studies have investigated the association between genetic variants in the mucin genes or in genes involved in their glycosylation and CRC. Here, we genotyped a set of potentially functional SNPs in the MUC1, MUC2, MUC4, MUC5AC, MUC6, and B3GNT6 genes in a case-control study of 1532 CRC patients and 1108 healthy controls from the Czech Republic and evaluated their association with CRC susceptibility, progression, and prognosis.

Study population
The case group contained 1532 CRC patients recruited between 2004 and 2013 by nine oncological departments in the Czech Republic as part of an ongoing study, which has been described in detail previously [25–27]. Their mean age was 63 years (range 25-91), and 61.2% were male. All patients had colonoscopic findings positive for malignancy, histologically confirmed as colon or rectal carcinoma. Patients who met the Amsterdam criteria I or II for hereditary nonpolyposis colorectal cancer were not included. Information on sex and age at diagnosis was available for all patients. For 672 consecutively recruited, incident cases diagnosed between 2003 and 2013, clinical data at the time of diagnosis were available (Table 1), including tumor location (colon/rectum) and the International Union against Cancer (UICC) TNM stage classification [size or direct extent of the primary tumor (T), degree of spread to regional lymph nodes (N), presence of distant metastasis (M)]. Data on distant metastasis, relapse, death and last contact with the treating physician were also collected.
The control group consisted of 1108 healthy individuals recruited by a blood-donor center in one hospital in Prague. These disease-free individuals represent the general population of the Czech Republic, which is genetically fairly homogeneous [28]. Their mean age was 47 years (range 18-94), and 53.3% were male. All participants were of Czech Caucasian origin. All procedures performed involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. The study was approved by the ethics committees of the participating institutes: the Institute of Experimental Medicine, Academy of Sciences of the Czech Republic; the Institute of Clinical and Experimental Medicine and Faculty Thomayer Hospital; and the General University Hospital, all in Prague, Czech Republic. Written informed consent was obtained from all individual participants included in this study.

SNP selection and functional prediction of the associated variants
The five earliest-identified and most studied mucin genes, MUC1, MUC2, MUC4, MUC5AC and MUC6, and the gene encoding the core 3 synthase, B3GNT6, which plays an important role in the synthesis of mucin-type O-glycans in digestive organs, were selected as candidate genes based on their functional role in CRC development and progression. A total of 23 SNPs in these genes were selected from the International HapMap Project (http://hapmap.ncbi.nlm.nih.gov) and the NCBI database (http://www.ncbi.nlm.nih.gov) (Table 2) according to the criteria described in [26]: location within the coding region (non-synonymous SNPs), the 3' and 5' untranslated regions (UTRs) or the promoter (up to approximately 1 kb from the transcription start site), and pairwise linkage disequilibrium (LD, r² ≥ 0.80) between the SNPs in Utah residents with Northern and Western European ancestry from the CEPH collection (CEU). To minimize statistical uncertainty in the survival analysis, we selected only SNPs with a minor allele frequency (MAF) ≥ 10% in Europeans. Owing to LD (r² > 0.80), the selected SNPs provided information on 123 SNPs altogether. For the SNPs associated with CRC risk or survival, SNPnexus (http://snpnexus.org/) was used to predict functional consequences. We also used additional web-based tools [HaploReg v2 (http://www.broadinstitute.org), Regulome DB (http://regulome.stanford.edu/) and SNPinfo Web Server (http://snpinfo.niehs.nih.gov/cgibin/snpinfo/snpfunc.cgi)] to predict their effects on potential regulatory elements.
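The LD-based selection described above can be illustrated with a short sketch. Assuming phased haplotypes coded 0/1 (for example, exported from a reference panel such as HapMap CEU), pairwise r² is D² / [pA(1 − pA)pB(1 − pB)] with D = pAB − pA·pB, and a simple greedy procedure keeps picking tag SNPs until every SNP is captured at r² ≥ 0.80. This is a generic illustration of greedy tagging, not necessarily the exact procedure behind the HapMap tools or this study; the function names and toy data are hypothetical.

```python
def r_squared(hap_a, hap_b):
    """Pairwise LD r^2 between two biallelic SNPs.

    hap_a, hap_b: equal-length lists of 0/1 alleles, one entry per phased
    haplotype (chromosome), for SNP A and SNP B.
    """
    n = len(hap_a)
    p_a = sum(hap_a) / n                                  # freq. of allele 1 at A
    p_b = sum(hap_b) / n                                  # freq. of allele 1 at B
    p_ab = sum(a & b for a, b in zip(hap_a, hap_b)) / n   # freq. of haplotype 1-1
    d = p_ab - p_a * p_b                                  # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0            # monomorphic SNP -> 0


def greedy_tags(haplotypes, threshold=0.8):
    """Greedy tag-SNP selection (hypothetical helper).

    haplotypes: dict mapping SNP ID -> list of 0/1 alleles (same haplotype order).
    Repeatedly picks the SNP that captures the most not-yet-captured SNPs at
    r^2 >= threshold; every SNP captures itself.
    """
    snps = list(haplotypes)
    captures = {
        s: {t for t in snps
            if t == s or r_squared(haplotypes[s], haplotypes[t]) >= threshold}
        for s in snps
    }
    untagged, tags = set(snps), []
    while untagged:
        best = max(snps, key=lambda s: len(captures[s] & untagged))
        tags.append(best)
        untagged -= captures[best]
    return tags


# Toy example: snp2 duplicates snp1 (r^2 = 1), snp3 is weakly linked (r^2 = 0.25).
haps = {
    "snp1": [0, 0, 1, 1, 0, 1, 0, 1],
    "snp2": [0, 0, 1, 1, 0, 1, 0, 1],
    "snp3": [1, 0, 1, 0, 1, 0, 1, 0],
}
print(greedy_tags(haps))  # ['snp1', 'snp3']
```

In practice, the MAF filter (≥ 10% in this study) would be applied before tagging, and the r² values would come from a reference panel rather than toy haplotypes.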

Genotyping
As described in detail in our previous study, whole-genome-amplified (WGA) DNA from peripheral blood leukocytes was used [29]. To confirm the accuracy of WGA, all WGA samples were genotyped for two common SNPs: for fewer than 0.1% of the genotypes, the call either could not be determined or did not agree with the corresponding genomic DNA sample. The selected SNPs were genotyped using KASP (LGC Genomics) or TaqMan (Applied Biosystems) allelic discrimination methods. DNA amplification was performed according to the LGC Genomics and TaqMan PCR conditions. Case and control samples were amplified simultaneously in 384-well format using a Hydrocycler 16 (LGC Genomics). Endpoint genotype detection was carried out on the ViiA 7 Real-Time PCR System (Applied Biosystems). The sample set contained 142 duplicated samples as quality controls, and genotype concordance between the duplicates was > 99%.
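The duplicate-sample check amounts to a concordance calculation. The sketch below is a minimal illustration, assuming genotype calls are stored as two-letter strings with `None` for failed calls; the function names and data are hypothetical, and concordance is computed only over pairs in which both replicates were called.

```python
def normalize(genotype):
    """Treat genotype calls as unordered, so 'GA' equals 'AG'."""
    return "".join(sorted(genotype))


def call_rate(calls):
    """Fraction of samples with a successful (non-missing) genotype call."""
    return sum(c is not None for c in calls) / len(calls)


def duplicate_concordance(first, second):
    """Fraction of duplicate pairs with identical genotypes, among the pairs
    in which both replicates were successfully called."""
    called = [(a, b) for a, b in zip(first, second)
              if a is not None and b is not None]
    if not called:
        raise ValueError("no duplicate pair with both replicates called")
    agree = sum(normalize(a) == normalize(b) for a, b in called)
    return agree / len(called)


# Hypothetical calls for five duplicated samples at one SNP.
first = ["AA", "AG", "GG", None, "AG"]
second = ["AA", "GA", "GG", "AG", "AA"]
print(call_rate(first))                      # 0.8
print(duplicate_concordance(first, second))  # 0.75
```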

Statistical analysis
The observed genotype frequencies in the controls were tested for Hardy-Weinberg equilibrium (HWE) using the chi-square test. Odds ratios (ORs) and 95% confidence intervals (CIs) for associations between genotypes and CRC risk were calculated by logistic regression (PROC LOGISTIC, SAS Version 9.3; SAS Institute, Cary, NC), adjusted for age and gender. Codominant, dominant and recessive genetic models were evaluated. The major-allele homozygous genotype was used as the reference; in the recessive model, major-allele homozygous and heterozygous genotypes combined were used as the reference. To account for multiple testing, the SNP Spectral Decomposition (SNPSpD) method for multilocus analyses was applied. For a polymorphism with a variant allele frequency between 10 and 50%, the study had greater than 90% power to detect an OR of 1.
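To make the genetic models concrete, the sketch below codes each genotype as its number of minor alleles (0, 1 or 2), collapses it under the dominant or recessive model, and computes an odds ratio with a Woolf (log-scale Wald) 95% CI from the resulting 2×2 table. This is a simplification: the ORs in this study come from logistic regression adjusted for age and gender, which the sketch does not perform, and the codominant model (each genotype compared with the reference separately) is not shown. All names and counts are hypothetical.

```python
import math

# Genotypes coded as minor-allele counts: 0 = major homozygote (reference),
# 1 = heterozygote, 2 = minor homozygote.
MODELS = {
    "dominant": lambda g: int(g >= 1),   # 1 or 2 vs. 0
    "recessive": lambda g: int(g == 2),  # 2 vs. 0 or 1
}


def odds_ratio(a, b, c, d, z=1.96):
    """Unadjusted OR and Woolf 95% CI for a 2x2 table:
    a = exposed cases, b = exposed controls,
    c = reference cases, d = reference controls (all counts must be > 0)."""
    est = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of log(OR)
    return (est,
            math.exp(math.log(est) - z * se),
            math.exp(math.log(est) + z * se))


def model_or(case_genotypes, control_genotypes, model):
    """Collapse genotypes under a genetic model, then compute the OR."""
    code = MODELS[model]
    a = sum(code(g) for g in case_genotypes)
    b = sum(code(g) for g in control_genotypes)
    c = len(case_genotypes) - a
    d = len(control_genotypes) - b
    return odds_ratio(a, b, c, d)


# Hypothetical genotype counts: 100 cases and 100 controls.
cases = [0] * 60 + [1] * 30 + [2] * 10
controls = [0] * 50 + [1] * 40 + [2] * 10
est, lo, hi = model_or(cases, controls, "dominant")
print(round(est, 3), round(lo, 2), round(hi, 2))
```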

Results
Altogether, 123 SNPs with MAF ≥ 10% in the CEU population were located within the regions of interest (promoter, 5' and 3' UTRs, non-synonymous SNPs) of the six genes MUC1, MUC2, MUC4, MUC5AC, MUC6 and B3GNT6. Of these, 23 SNPs were selected for genotyping based on LD (r² ≥ 0.80) (Table 2). The genotype distributions of all genotyped polymorphisms were consistent with HWE in the control group (P > 0.05), except for rs72842418 (MUC6), which was excluded from the analyses. The MAFs in the control population were similar to those reported by the HapMap project for the CEU population.
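The HWE check in the controls can be sketched as a 1-degree-of-freedom chi-square goodness-of-fit test: the allele frequency is estimated from the genotype counts, the expected counts are n·p², 2n·pq and n·q², and the upper-tail p-value of a chi-square variable with 1 df equals erfc(√(χ²/2)). The function name and counts below are hypothetical; for rare variants or small samples an exact HWE test is usually preferred.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """1-df chi-square test for Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb: observed counts of the AA, AB and BB genotypes.
    Returns (chi-square statistic, p-value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)       # frequency of allele A
    q = 1 - p                             # frequency of allele B
    if p == 0 or q == 0:
        raise ValueError("monomorphic marker: HWE test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical control genotype counts.
print(hwe_chi_square(360, 480, 160))  # exactly at HWE: chi2 ~ 0, p ~ 1
print(hwe_chi_square(400, 400, 200))  # heterozygote deficit: p << 0.05
```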
Minor allele carriers of the MUC6 5' UTR SNP rs61869016 had a decreased risk of CRC (OR 0.78, 95%CI 0.64-0.96, p = 0.02) (S1 Table). To correct for multiple testing, we used the SNPSpD approach. The study-wise effective number of independent markers, Meff, was calculated to be 18, which gave a significance threshold of 0.0028. Thus, the association with rs61869016 (MUC6) did not remain significant after correction for multiple testing (p = 0.02 > 0.0028).
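The SNPSpD correction can be illustrated with Nyholt's (2004) formula, Meff = 1 + (M − 1)(1 − Var(λ)/M), where λ are the eigenvalues of the M × M matrix of pairwise LD correlations between markers. Because these eigenvalues sum to M (mean 1) and their squares sum to the sum of all squared matrix entries, Var(λ) can be obtained without an explicit eigendecomposition. This sketch assumes the M − 1 variance denominator and a Bonferroni-type threshold α/Meff; it is not the SNPSpD software itself, and the function names and inputs are hypothetical. With Meff = 18, 0.05/18 ≈ 0.0028, matching the threshold reported above.

```python
def nyholt_meff(r):
    """Effective number of independent markers (Nyholt-style).

    r: symmetric M x M matrix (list of lists) of pairwise LD correlations
    between markers, with 1.0 on the diagonal.
    """
    m = len(r)
    if m < 2:
        return float(m)
    # Eigenvalues of r sum to trace(r) = M, so their mean is 1, and the sum of
    # their squares equals the sum of all squared entries of r.
    sum_sq = sum(x * x for row in r for x in row)
    var_lambda = (sum_sq - m) / (m - 1)   # sum((lambda_i - 1)^2) / (M - 1)
    return 1 + (m - 1) * (1 - var_lambda / m)


def meff_threshold(meff, alpha=0.05):
    """Bonferroni-type per-test significance threshold using Meff."""
    return alpha / meff


independent = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(nyholt_meff(independent))       # 3.0
print(round(meff_threshold(18), 4))   # 0.0028
```

Unlinked markers give Meff = M, and perfectly correlated markers collapse to Meff = 1, so the correction relaxes as LD among the tested SNPs increases.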
In the univariable analysis, the following parameters were associated with overall survival: gender, T, N, M and TNM stage (Table 3). Kaplan-Meier survival curves for the patients according to their rs3107764 and rs842225 genotypes are presented in Fig 1. Among patients without distant metastasis at the time of diagnosis, the differences in overall and event-free survival between carriers of the different genotypes were statistically significant for rs3107764 in the 3-genotype model (log-rank p = 0.004 for both overall and event-free survival) and for rs842225 in the recessive model (log-rank p = 0.01 and 0.006, respectively).
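The log-rank comparison can be sketched for two groups, as in the recessive model (minor-allele homozygotes versus all others). At each distinct event time, the observed number of events in group 1 is compared with its expectation under equal hazards, and the summed differences are scaled by their hypergeometric variance to give a 1-df chi-square statistic. The 3-genotype comparison reported for rs3107764 requires the k-group generalization (2 df), which is not shown; the function name and data below are hypothetical.

```python
import math


def logrank_two_groups(times, events, groups):
    """Two-sample log-rank test.

    times: follow-up times; events: 1 = event (e.g. death), 0 = censored;
    groups: 0/1 labels (e.g. 1 = minor-allele homozygote under a recessive model).
    Returns (chi-square statistic with 1 df, p-value).
    """
    data = list(zip(times, events, groups))
    o_minus_e, var = 0.0, 0.0
    for t in sorted({t for t, e, _ in data if e == 1}):
        n = sum(1 for tt, _, _ in data if tt >= t)                 # at risk
        n1 = sum(1 for tt, _, g in data if tt >= t and g == 1)     # at risk, group 1
        d = sum(1 for tt, e, _ in data if tt == t and e == 1)      # events at t
        d1 = sum(1 for tt, e, g in data if tt == t and e == 1 and g == 1)
        o_minus_e += d1 - d * n1 / n            # observed minus expected, group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = o_minus_e ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi2 (1 df)


# Hypothetical follow-up (months): group 1 has its events earlier than group 0.
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
events = [1] * 10
groups = [1] * 5 + [0] * 5
chi2, p = logrank_two_groups(times, events, groups)
print(round(chi2, 2), round(p, 4))
```

The hazard ratios in this study were estimated separately (adjusted for age, sex and TNM stage); the log-rank test only compares the survival curves without covariate adjustment.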

Discussion
Several mucin-type O-glycans have been considered as prognostic markers in CRC because of their aberrant expression [15,18,20,22,24]. Recently, their interaction with Wnt/β-catenin signaling has been shown to play an important role in CRC progression [30]. In this genetic association study, we investigated associations of 23 SNPs, capturing 123 potentially functional SNPs in the mucin genes and B3GNT6, with CRC risk and clinical outcome. The MUC6 SNP rs61869016 exhibited a nominal association with CRC risk. In the multivariable survival analysis, two SNPs located in MUC4 were associated with overall and event-free survival of non-metastatic CRC patients.
MUC6 has recently been proposed as a prognostic marker for CRC patients [23]. Although MUC6 expression was rarely observed in colorectal tumors, patients whose tumors expressed it had a very good prognosis [23]. In another study, MUC6 expression was associated with the CpG island methylator phenotype (CIMP) of CRC and with tumorigenesis via the serrated neoplasia pathway [31]. So far, most studies on MUC6 polymorphisms have focused on gastric cancer and found no significant associations [32]. The only study on CRC reported no association between a 3' UTR SNP and risk or clinical outcome [25]. In our study, this SNP was captured by rs7481521 (r² = 0.72), and, consistent with the previous study, no association was observed. However, we observed a nominally decreased risk among carriers of the minor allele of rs61869016 in the 5' UTR of MUC6. According to Regulome DB, this SNP lies in a region showing weak repressed Polycomb marks in different colorectal tissues, whereas in gastric tissue it shows enhancer marks. Thus, rs61869016 could potentially affect CRC risk through tissue-specific regulation of gene expression via changes in histone state, which may be relevant for the development of some specific types of CRC.
Previous reports have indicated that MUC4 expression is lost as CRC progresses and that this loss may be regulated by β-catenin [33]; however, a worse prognosis has been reported in a subgroup of patients with MUC4 overexpression [22,34]. Through its epidermal growth factor (EGF) domains, MUC4 may act as an intramembrane ligand for the receptor tyrosine kinase ErbB2, exerting an antiapoptotic function and thereby promoting tumor progression [35,36]. Published studies have reported that polymorphisms in MUC4 are associated with susceptibility to lung cancer, endometriosis development and endometriosis-related infertility [37,38]; homozygous G allele carriers of the MUC4 SNP rs842225 were reported to have a decreased risk of lung cancer in a Han Chinese population [35,37,38]. So far, no studies have investigated the relationship between MUC4 SNPs and CRC. In our study, two MUC4 SNPs, rs3107764 and rs842225, were associated with overall and event-free survival among patients without distant metastasis at the time of diagnosis. C allele carriers of rs3107764 had an increased risk of death, whereas homozygous A allele carriers of rs842225 had a decreased risk. Regulome DB predicts that rs842225 is likely to affect binding and the expression of a target gene (MUC20). Chromatin state data also indicated marks of strong transcription at this locus in colon and rectal tissues. Rs3107764, on the other hand, is a missense SNP (Ala41Pro) predicted by PolyPhen to be possibly damaging. It may also affect MUC20 expression, although to a lesser extent than rs842225. Overexpression of MUC20 has previously been associated with recurrence and poor survival of CRC patients [39].
Given the important role of the mucous barrier in maintaining gut homeostasis, it is surprising that only four studies have been published on potential associations between genetic variants in the mucin genes and CRC [25,40–42]. Similarly, studies on polymorphisms in mucin genes or on alterations in mucin expression affecting the risk of inflammatory bowel diseases are sparse and inconclusive [43]. The four studies on SNPs and CRC risk reported either no or only weak associations [25,38–40], similar to our study. However, the one study that also investigated clinical outcome reported three associations of SNPs located in microRNA binding sites with recurrence and survival [25]. Interestingly, the genes involved were MUC17, MUC20 and MUC21, all encoding transmembrane mucins, like MUC4, which in our study was associated with both overall and event-free survival in patients free of metastasis at the time of diagnosis. In addition to protecting the gut epithelium against microorganisms and inflammation, transmembrane mucins also play an important role in transmitting cell-cell and cell-microenvironment signals [5,44]. Moreover, they can induce cell transformation and promote tumor progression [5,44]. Thus, genetic variation in the regulatory regions of the transmembrane mucin genes may modify the function of the corresponding proteins and thereby affect colorectal tumor progression and patient survival.
Our study has both strengths and limitations. The cases and controls represent a genetically fairly homogeneous Czech population [28], which minimizes the problem of population stratification. The control group consisted of healthy blood donors, who may be more health-conscious than the general population. The control group was also younger and included a lower proportion of men than the case group. To avoid bias due to these differences, we adjusted our analyses for both age and sex. Of the 1532 CRC patients included in the study, 672 consecutively collected, incident cases diagnosed between 2003 and 2013 were available for the survival analysis. This ensured that only newly diagnosed CRC cases (enrolled within 1 year of diagnosis) were included, avoiding survival bias. For this subgroup, nearly complete clinical data were available, allowing evaluation of the SNPs as independent prognostic markers. However, although the number of individuals in the risk analysis was sufficient for this kind of study, restricting the survival analyses to newly diagnosed CRC cases decreased the power to detect associations with genotypes. For this reason, we focused on SNPs with MAF ≥ 10% in Europeans. Although the 23 genotyped SNPs covered a total of 123 SNPs in the basic regulatory and coding regions of the genes, SNPs with lower MAF or SNPs in still unknown regulatory regions of these genes, such as enhancers and silencers, might also affect CRC susceptibility or clinical outcome. As our study covered only 6 of the known mucin and mucin-type O-glycosylation genes, further studies of the other genes in this system maintaining gut homeostasis are warranted.
In summary, our results on associations of MUC4 SNPs with the survival of CRC patients support previous studies implicating genetic variants in genes encoding transmembrane mucins in the clinical outcome of CRC patients. Further studies in larger independent populations are needed to verify our findings and to investigate the potential function of the studied SNPs, as well as of SNPs in other relevant regulatory regions of the genes involved in mucin-type O-glycosylation.
Supporting information
S1 Table. Association of all evaluated SNPs with colorectal cancer susceptibility in the study population of 1532 cases and 1108 controls.