Single Nucleotide Polymorphisms in the Wnt and BMP Pathways and Colorectal Cancer Risk in a Spanish Cohort

Background
Colorectal cancer (CRC) is considered a complex disease, and thus the majority of the genetic susceptibility is thought to lie in low-penetrance variants following a polygenic model of inheritance. Candidate-gene studies have so far been one of the basic approaches taken to identify these susceptibility variants. The consistent involvement of some signaling routes in carcinogenesis supports pathway-based studies as a natural strategy for selecting genes that could potentially harbour new susceptibility loci.

Methodology/Principal Findings
We selected two main carcinogenesis-related pathways, Wnt and BMP, in order to screen the implicated genes for new risk variants. We then conducted a case-control association study in 933 CRC cases and 969 controls based on coding and regulatory SNPs. We also included rs4444235 and rs9929218, which did not fulfill our selection criteria but belonged to two genes in the BMP pathway and had consistently been linked to CRC in previous studies. Neither allelic, genotypic, nor haplotypic analyses showed any sign of association between the 37 screened variants and CRC risk. Adjustment for sex and age, and stratified analysis between sporadic and familial groups, did not yield any positive results either.

Conclusions/Significance
Despite the relevance of both pathways in the pathogenesis of the disease, and the fact that this is the first study to use these pathways as a candidate-gene selection approach, our study does not provide any evidence of low-penetrance variants among the selected markers in any of the considered genes in our cohort.


Introduction
Colorectal cancer (CRC) is one of the main forms of cancer, being the second most frequent neoplasm in both sexes and one of the most important causes of morbidity in the Western world [1]. Extensive twin studies have estimated the genetic contribution to CRC at around 35% [2]. However, highly penetrant variants that cause Mendelian predisposition syndromes account for at most 5% of cases [3]. The remaining genetic susceptibility is thought to follow a polygenic model, with an interplay of multiple low-penetrance allelic variants that are common in the general population, each conferring a modest effect on disease risk [4,5].
Candidate-gene studies have been one of the most commonly used tools in the search for new variants affecting CRC risk. Gene selection in these studies is mainly based on the functional implications of a possible association: genes have been chosen either because they were already known to carry other high- or low-risk alleles [6] or because they participate in a pathway implicated in the pathogenesis of the disease [7]. Candidate-gene studies can follow either a direct approach, in which the genotyped variants are presumed to be causal because of their location (exonic or regulatory regions), or an indirect approach, in which tag SNPs exploit the linkage disequilibrium of the human genome to capture most of the variability in a given gene.
Together with the development of high-throughput technologies, this latter approach has also enabled hypothesis-free studies (as opposed to hypothesis-based candidate-gene studies) that cover most of the genome: genome-wide association studies (GWAS). These have successfully identified several new susceptibility loci [8-14], including rs4444235 and rs9929218, which lie within or near two genes belonging to the BMP pathway. Nevertheless, these loci explain only a small proportion of disease susceptibility, and the remainder is yet to be discovered [15].
We therefore aimed to find such susceptibility variants through a candidate-gene approach, screening a selected number of variants within two cellular pathways that have consistently been linked to CRC tumorigenesis: the Wnt and BMP signaling pathways [16,17].
The Wnt pathway contains genes that have long been known to be responsible for some hereditary CRC syndromes, such as APC in familial adenomatous polyposis [18]. Moreover, somatic alterations in APC are found in almost 80% of sporadic colorectal cancers, and activation of Wnt signaling is involved in most sporadic colorectal carcinomas [19]. The BMP pathway, in turn, acts as a positive regulator of some of the Wnt proteins [17], and the tumor-suppressive role of this signaling pathway in the pathogenesis of CRC and other cancers is well established [20,21]. In addition, mutations in two of its genes, SMAD4 and BMPR1A, are responsible for juvenile polyposis syndrome, another hereditary CRC condition [22]. Considering all this information, we decided to screen some of the genetic variability within these pathways for evidence of new CRC-related variants that could explain at least part of the missing heritability. Our approach was mainly functional: only SNPs within exonic or cis-regulatory sequences (5′ and 3′ untranslated regions) were selected to analyse their relationship with CRC susceptibility.

Results and Discussion
Following our pathway-based candidate-gene selection method, we studied a total of 45 SNPs located in either exonic or regulatory regions of 21 genes from the Wnt and BMP pathways. Details of SNP features and association values for the 37 SNPs that passed quality control criteria are shown in Table 1. None of the screened SNPs was significantly associated with an altered risk of CRC, considering odds ratios and the corresponding p values for allelic and genotypic tests (trend, dominant and recessive). Logistic regression adjusting for age and sex was performed, but it did not change this result. Haplotype analysis results were consistent between Unphased and Haploview, and did not show any sign of positive association for any of the 8 genes for which this analysis was performed (AXIN1, HDAC9, BMP4, DACT1, CDH3, CDH1, BTRC, and APC) (Figure S1). Stratified analysis comparing sporadic and familial cases was also performed, but it did not provide any evidence of differences in susceptibility that could indicate an association specific to either group (Table 2).
Thus, our strategy has not managed to detect any new susceptibility loci for CRC risk.
Pathway-based approaches have proved rather discouraging in the literature as well: strong candidate pathways, such as DNA repair, have surprisingly also failed to yield any new risk variants [7,23,24]. In addition, most of the genetic variants found to be associated with the disease are located in intergenic regions, with potential functions that are as yet unknown.
Still, in light of the recent discoveries that followed the analysis of genome-wide data, both Wnt and BMP have attracted renewed interest. The susceptibility locus found on 8q24 (rs6983267) has been linked to enhanced Wnt signaling through its interaction with TCF4 [25,26], and a meta-analysis of GWAS data succeeded in associating two variants in the BMP4 and CDH1 gene regions with the disease (rs4444235 and rs9929218, respectively) [8].
Even though this is the first association study that considers the pathways as a whole for gene selection, some of the genes included in our analysis (i.e., APC, CCND1, CDH1 and TCF7) had already been screened for risk alleles [6,27-30]. Notably, there has been growing debate over some of these loci, especially the p.V1822D variant in APC (rs459552). This missense change is widely documented in the literature, with some studies finding it to be neutral (this study and others) [31] and others attributing a protective effect to its minor allele [6,28]. Insufficient study power, resulting from small sample sizes, has been a major problem in many of these studies, and thus most of them have not provided very convincing results [32].
Although our study had over 80% power to detect ORs as low as 1.21 for minor allele frequencies (MAFs) of 0.30 (57% of our SNPs) and 1.24 for MAFs down to 0.2 (78% of the SNPs), assuming a log-additive model and α = 0.05, we were unable to detect any positive associations suggesting the presence of new CRC susceptibility variants. Nevertheless, it is worth noting that, despite our failure to replicate the associations for the BMP4 and CDH1 SNPs, this is the first study to investigate any of the so-called 10 new GWAS-discovered susceptibility loci in a Southern European population.
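As a rough illustration of how power depends on the odds ratio, allele frequency and sample size, the sketch below approximates the two-sided power of a simple two-proportion z-test on allele counts. It is not a reimplementation of the CATS calculation used in this study, so its output should not be expected to match the figures quoted above exactly; the function name and its parameters are hypothetical, and the per-allele (multiplicative) odds ratio is used here as a stand-in for the log-additive model.

```python
from statistics import NormalDist


def allelic_power(odds_ratio, maf_controls, n_cases, n_controls, alpha=0.05):
    """Approximate two-sided power of a two-proportion z-test on allele counts.

    Assumption: a per-allele (multiplicative) odds ratio, with the control
    minor allele frequency taken as the baseline frequency.
    """
    std_normal = NormalDist()
    p0 = maf_controls
    # Minor allele frequency in cases implied by the per-allele odds ratio.
    p1 = odds_ratio * p0 / (1.0 + p0 * (odds_ratio - 1.0))
    # Each subject contributes two alleles to the comparison.
    se = (p1 * (1.0 - p1) / (2 * n_cases)
          + p0 * (1.0 - p0) / (2 * n_controls)) ** 0.5
    z_effect = abs(p1 - p0) / se
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # Probability of rejecting in either tail.
    return std_normal.cdf(z_effect - z_crit) + std_normal.cdf(-z_effect - z_crit)


if __name__ == "__main__":
    # Sample sizes from the study design; OR and MAF as discussed in the text.
    print(round(allelic_power(1.21, 0.30, 933, 969), 3))
```

Under this approximation, power rises with the odds ratio, the minor allele frequency (up to 0.5) and the number of subjects, which is why small effects such as those reported for GWAS loci call for larger cohorts.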
Despite our negative results, we must bear in mind that we by no means covered all possible low-penetrance variants within the selected genes. This is mainly because our strategy was purely functional, selecting variants that were a priori good candidates to be directly associated with the disease. This constitutes a limitation of the study, since most of the genetic variation within the loci was not investigated. Thus, we believe further efforts should be made to screen a wider variety of loci within these pathways, especially considering the positive associations described so far for both Wnt- and BMP-related genes.
Considering the odds ratios of the variants described so far (1.11, CI 1.08-1.15 and 0.91, CI 0.89-0.94 for rs4444235 and rs9929218, respectively), we assume that larger cohorts may be required to detect such subtle effects. On the other hand, when considering candidate-gene approaches, it would also be useful to meta-analyse previous studies and pool the information across them in search of evidence for potential new pathways linked to the pathogenesis of the disease.

Study populations
Subjects were 933 CRC patients and 969 controls from the EPICOLON project, a prospective, multicentre, population-based epidemiology survey studying the incidence and features of familial and sporadic CRC in the Spanish population [33]. Cases comprised all patients with a de novo histologically confirmed diagnosis of colorectal adenocarcinoma who attended 11 community hospitals across Spain between November 2006 and December 2007. Patients in whom CRC developed in the context of familial adenomatous polyposis or inflammatory bowel disease, and cases where patients or family refused to participate in the study, were excluded. Demographic, clinical and tumour-related characteristics of probands, as well as a detailed family history, were obtained using a pre-established questionnaire and registered in a single database. Of the cases, 592 (63%) were male and 341 (37%) female. Median age for cases was 73 (range 26-95), whereas the mean was 71 (SD ±10.7). Hospital-based controls were recruited together with cases and were confirmed to have no cancer or prior history of neoplasm, and no family history of CRC. All controls were randomly selected and matched with cases for sex and age (±5 years) in a 1:1 ratio. Both cases and controls were of European ancestry and from Spain.

Ethics statement

DNA extraction
DNA was obtained from frozen peripheral blood; extraction was performed in a CHEMAGEN robot (Chemagen Biopolymer-

Candidate-gene selection
Both Wnt and BMP pathways were initially selected after the findings of Nishanian et al. [34], who demonstrated the interaction between these two pathways. Both pathways were thoroughly investigated through the Cancer Genome Anatomy Project site [35], but we failed to find any information regarding the BMP pathway in either this or other web browsers. For that reason, Wnt genes were selected by browsing the pathway through Biocarta [36], whereas BMP genes had to be strictly selected from previous literature [17,34]. Forty-one genes were finally selected to be included in the analysis.

SNP selection and genotyping
SNP selection criteria only considered functional markers with minor allele frequencies above 0.05 and at least two independent validation criteria as established in dbSNP [37]. This included all exonic variants selected with Pupasuite [38] and variants in cis gene-regulatory regions (5′ or 3′ UTRs), as defined by the FESD web browser [39]. 5′ UTR variants were only included when they complied with the abovementioned criteria and were presumed to lie in the potential binding site of a known transcription factor. 3′ UTR variants were included because of their potential relationship with miRNA binding regions [40]. Because some of the selected genes had no SNPs of these kinds in any of the three browsers at the time of SNP selection, they ultimately had to be dropped from the study. Finally, 43 SNPs were chosen within 21 genes to be screened as potential direct modifiers of CRC susceptibility (Table 3).
rs4444235 and rs9929218 are two variants lying near BMP4 and in an intron of CDH1, respectively, that have recently been reported to be associated with the disease [8].
Considering that the SNPs we had chosen within these two genes were not good taggers for these two variants (r-squared values were 0.6 for the SNPs in BMP4 and 0.02 for those in CDH1) (Figure 1), we decided to include them in our study as well, although they did not fulfill our selection criteria, bringing the total number of interrogated SNPs to 45. Genotyping was performed with the MassARRAY (Sequenom Inc., San Diego, USA) technology at the Santiago de Compostela node of the Spanish Genotyping Center. Genotypes were called with Sequenom Typer v4.0 software using all the data from the study simultaneously.
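The r-squared values quoted above measure how well one SNP tags another. As a minimal sketch, assuming phased haplotype frequencies are available, the standard pairwise statistic can be computed as follows; the function name and the example frequencies are hypothetical and are not taken from the study data.

```python
def ld_r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """Pairwise LD r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    # D is the deviation of the haplotype frequency from independence.
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


if __name__ == "__main__":
    # Hypothetical frequencies, for illustration only.
    print(ld_r_squared(p_ab=0.25, p_a=0.30, p_b=0.40))
```

An r-squared close to 1 means the genotyped SNP is a good proxy for the untyped one, whereas values such as 0.02 indicate that the reported variant would be essentially invisible unless genotyped directly.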

Statistical analyses
Quality control was performed first by excluding both SNPs and samples with genotype success rates below 95%, with the help of the Genotyping Data Filter (GDF) [41]. Genotypic distributions for all SNPs in controls were consistent with Hardy-Weinberg equilibrium as assessed using a χ² test (1 df). All p values obtained were ≥0.05, thereby excluding the possibility of genotyping artifacts (data not shown).
Population stratification was assessed with Structure v2.2 [42]. Briefly, different scenarios were tested assuming a different number of underlying populations (k ranging from 1 to 4), allowing for a large number of iterations (25,000 in the burn-in period followed by 500,000 repetitions). The mean log likelihood of the data for a given k (referred to as L(K)) was estimated in each run. We also performed multiple runs for each value of k, computing the overall mean L(K) and its standard deviation. All results were concordant with the original assumption of a single population. Moreover, confounding variables were further examined by means of a Principal Component Analysis (PCA) using the EIGENSOFT tool smartpca [43], although the number of markers was very low. No evidence of population stratification between cases and controls was found with either STRUCTURE or the first 10 components of the PCA (Figure S2). After quality control, 1746 samples (854 cases and 892 controls) and 37 SNPs remained for further analyses.
Association tests were performed by chi-squared tests for every single SNP, and for haplotypes where possible, with both Haploview v4.0 [44] and Unphased [45]. In short, LD patterns across genes for which more than one SNP was genotyped were checked in Haploview and tested for association using Unphased (to check whether any of the haplotypes was associated) and Haploview (to see which of the haplotypes was associated).
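The two chi-squared tests described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the implementation found in GDF, Haploview, Unphased or PLINK; the function names and the genotype and allele counts used in the example are hypothetical.

```python
import math


def chi2_sf_1df(stat):
    """Upper-tail p value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-squared test (1 df) for Hardy-Weinberg equilibrium from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p <= 0.0 or q <= 0.0:
        raise ValueError("SNP is monomorphic; the test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, chi2_sf_1df(stat)


def allelic_test(case_minor, case_major, control_minor, control_major):
    """Allelic chi-squared test (1 df) with odds ratio and Wald 95% CI."""
    a, b, c, d = case_minor, case_major, control_minor, control_major
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return stat, chi2_sf_1df(stat), odds_ratio, (lower, upper)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(hwe_chi2(30, 40, 30))
    print(allelic_test(120, 80, 100, 100))
```

In the Hardy-Weinberg test a large p value is the desired outcome in controls, since a departure from equilibrium often signals a genotyping problem; in the allelic test a confidence interval that includes 1 corresponds to no detectable association.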
Genotypic association tests, logistic regression analysis for sex and age adjustment, and stratified analysis between sporadic and familial groups were performed with PLINK v1.03 [46]. ORs and 95% confidence intervals were calculated for each statistic, and permutation tests and the Bonferroni correction were used to address the issue of multiple testing. Study power was estimated with CATS software [47].

Figure S1 Haplotype structure and analysis for the 8 genes for which more than one SNP was genotyped. The table shows association values for each SNP generated by Haploview.

Note S1