Evaluation of Allele-Specific Somatic Changes of Genome-Wide Association Study Susceptibility Alleles in Human Colorectal Cancers

Background: Tumors frequently exhibit loss of tumor suppressor genes or allelic gains of activated oncogenes. A significant proportion of cancer susceptibility loci in the mouse show somatic losses or gains consistent with the presence of a tumor susceptibility or resistance allele. Thus, allele-specific somatic gains or losses at loci may demarcate the presence of resistance or susceptibility alleles. The goal of this study was to determine if previously mapped susceptibility loci for colorectal cancer show evidence of allele-specific somatic events in colon tumors.

Methods: We performed quantitative genotyping of 16 single nucleotide polymorphisms (SNPs) showing statistically significant association with colorectal cancer in published genome-wide association studies (GWAS). We genotyped 194 paired normal and colorectal tumor DNA samples and 296 paired validation samples to investigate these SNPs for allele-specific somatic gains and losses. We combined analysis of our data with published data for seven of these SNPs.

Results: No statistically significant evidence for allele-specific somatic selection was observed for the tested polymorphisms in the discovery set. The rs6983267 variant, which has shown preferential loss of the non-risk T allele and relative gain of the risk G allele in previous studies, favored relative gain of the G allele in the combined discovery and validation samples (corrected p-value = 0.03). When we combined our data with published allele-specific imbalance data for this SNP, the G allele of rs6983267 showed statistically significant evidence of relative retention (p-value = 2.06×10−4).

Conclusions: Our results suggest that the majority of variants identified as colon cancer susceptibility alleles through GWAS do not exhibit somatic allele-specific imbalance in colon tumors. Our data confirm previously published results showing allele-specific imbalance for rs6983267. These results indicate that allele-specific imbalance of cancer susceptibility alleles may not be a common phenomenon in colon cancer.


Introduction
Tumor suppressor genes and oncogenes have long been recognized to show copy number losses and gains in tumors, respectively [1,2]. Classically, the wild-type allele of a tumor suppressor gene is lost in tumors, whereas the mutated or nonfunctional allele is selectively retained. Likewise, an activating mutation or activated copy of an oncogene is frequently selected for gain or amplification in tumors. Previous studies using mouse models show that a subset of susceptibility loci for skin and colon cancer demonstrate strain-specific gains or losses consistent with these loci housing tumor-promoting or tumor-suppressing alleles [3,4]. For example, PTPRJ, a gene originally identified as a candidate tumor suppressor mapping to the mouse Scc1 locus, was shown to preferentially lose a suspected resistance allele in a subset of heterozygous human colorectal adenocarcinomas showing loss of heterozygosity at PTPRJ [3]. Allele-specific gains of a single nucleotide polymorphism (SNP) in AURKA, rs2273535, have been observed in multiple studies of colorectal tumors [5,6]. Preferential allelic gains or losses in multiple regions of the genome have been identified in genome-wide screens of individuals with multiple independent primary tumors [7] and in genomic studies of glioblastoma samples via the comparison of germline and somatic genotype data [8].
Several genome-wide association studies have revealed alleles associated with colorectal cancer (CRC) risk [9][10][11][12][13][14][15][16]. The SNP rs6983267 on 8q24 has been associated with both colorectal and prostate cancer risk at a genome-wide significance level [9,17,18]. Allele-specific copy number analyses showed that the G allele (the putative risk allele) of this variant shows preferential gains in colon tumors and myeloid leukemia [19][20][21]. To our knowledge, no other SNPs from published GWAS literature have definitively and reproducibly shown allele-specific imbalance in colorectal tumors, although individual studies have described allelic imbalance in CRC for other loci [7,22]. In the present study, we performed quantitative genotyping of 16 statistically significant variants from published GWAS (including rs6983267) in paired normal and colorectal tumor DNA. The goal of this study was to investigate these SNPs for somatic gain of the susceptibility allele or loss of the resistance allele using allelic imbalance analyses.

Human Samples
Ethics statement. This study was approved by The Ohio State University (OSU) Institutional Review Board. All study participants provided written informed consent for use of their tissues in research.
Discovery Set. Paired normal and formalin-fixed paraffin-embedded (FFPE) tumor tissue blocks were obtained through the OSU Human Tissue Research Network and the Midwest Cooperative Human Tissue Network. Tumors that exhibited microsatellite stability and/or stained positively for the Lynch syndrome proteins MSH2, MLH1, PMS2, and MSH6 by immunohistochemistry (IHC) were prioritized for inclusion in the study. When microsatellite or IHC data were unavailable, tumors that showed characteristics suggestive of Lynch syndrome such as right-sided location, poor differentiation, and a high percentage of mucin were excluded [23]. After selection, confirmation of diagnosis and DNA extraction, 194 histologically normal/tumor DNA pairs were available for study.
Validation Set. A validation set of 296 paired non-tumor/tumor DNA samples was obtained from two existing study collections. Samples from 196 individuals were acquired from a population-based study cohort of incident colon cancer diagnosed in the metropolitan Columbus area [24,25]. Blood DNA was available for all cases. An additional 100 fresh frozen paired normal and tumor tissue samples were obtained through the Cooperative Human Tissue Network at The Ohio State University Medical Center. Specimens were snap-frozen shortly after surgery and received anonymously along with a full pathology report. The 296 CRC cases were all classified as likely to be microsatellite stable: the set of 196 samples was stable by microsatellite instability testing, and the 100 fresh frozen tumors all showed intact mismatch repair proteins by immunohistochemistry staining.

DNA Extraction
Test Set. Hematoxylin and eosin stains from normal and tumor FFPE sections were evaluated by a pathologist to confirm diagnosis and to mark tissues for coring. Tissue cores of 1.6 mm diameter were prepared from regions consisting of 70% or more tumor cells for collection of tumor DNA, or from regions with normal histology for isolation of normal (non-tumor) DNA. Genomic DNA was extracted from tissue cores as previously described [26] and quantified with a Nanodrop-1000 spectrophotometer. The majority of DNAs were of good quality as indicated by A260/A280 ratios greater than 1.8.
Validation Set. Tumor DNAs from the Columbus-metropolitan area study were isolated as described [26]. Normal DNAs from these individuals were isolated from blood samples in the OSU Human Genetics Sample Bank by standard protocols. DNAs from the 100 paired normal/tumor DNA samples from the Cooperative Human Tissue Network were isolated from the fresh frozen tissue by the same extraction protocol used for the test set samples. Normal DNAs from the three sources (FFPE, blood, and fresh frozen tissue) exhibited similar frequencies of heterozygosity and similar A260/A280 ratios, suggesting comparable DNA quality across sample sources.

Inclusion of SNPs for Study
To test our hypothesis that CRC susceptibility loci would show allele-specific somatic events in tumors, we searched the recent literature to identify variants showing evidence of CRC risk from GWA studies [9,10,13–15,27–32]. Seventeen SNPs (rs10411210, rs10936599, rs11169552, rs16892766, rs3802842, rs4444235, rs4779584, rs4925386, rs4939827, rs6687758, rs6691170, rs6983267, rs7014346, rs7136702, rs719725, rs961253, rs9929218) meeting or approaching genome-wide significance (p-value < 1×10−7) for CRC risk in published GWA studies were chosen for analysis of allele-specific imbalance in the initial discovery set of tumor/normal DNA pairs (Table 1). Other inclusion criteria were identification in Caucasian populations and a sufficiently high documented minor allele frequency (MAF > 20%) to identify enough heterozygotes for statistical power. The SNP rs16892766 was the only exception to this criterion, as it has a documented MAF of 7%. rs4925386 was eliminated after genotyping of the original sample set due to a failure rate greater than 15%.

Quantitative Genotyping
Multiplexed primers for PCR amplification and allele-specific single base extension reactions were designed using the Sequenom® MassARRAY Assay Design 3.1 software and are available upon request. Mass spectrometry-based genotyping of 20 ng paired tumor and normal DNA was performed using Sequenom® MassARRAY iPLEX Gold (Sequenom Inc., San Diego, CA, USA) according to the manufacturer's protocol. Each 384-well Sequenom® plate included four negative template controls (dH2O), two samples tested in duplicate, and four positive control DNAs.

Verification of Genotyping Technique
To validate the sensitivity of Sequenom® quantitative genotyping for identifying allelic imbalance, we generated natural log-transformed N-ratios (N-ratio = normal allele 1 peak area/normal allele 2 peak area) for mixtures of known homozygous DNA samples representing 0, 20, 40, 50, 60, 80, and 100% allelic contributions. We did not have appropriate homozygous DNAs for three of the SNPs, so these were not evaluated. The majority of slopes and R-values for the remaining SNPs were very close to those of the standard curve expected for "perfect data", suggesting a high degree of sensitivity for our method of detecting allelic deviations from 50% (Figure S1).
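As a rough illustration of what a standard curve for "perfect data" looks like, the following Python sketch fits a least-squares line to the log-ratios expected from ideal mixtures. Several choices here are assumptions rather than details stated in the text: only the 20 to 80% mixtures are used (at 0% and 100% the ratio is zero or undefined), and the ratio is oriented as the other allele over allele 1 so that the sign matches the ideal curve quoted in the Figure S1 legend. With the opposite orientation the slope and intercept simply change sign. The function names are hypothetical.

```python
import math

# Mixture percentages with a finite allele ratio; 0% and 100% are left out
# because the ratio there is zero or undefined (assumption, see lead-in).
PERCENT_ALLELE_1 = [20, 40, 50, 60, 80]


def ideal_ln_ratio(percent_allele_1):
    """ln(allele 2 / allele 1) for a perfectly measured mixture.

    The orientation of the ratio is an assumption chosen so that the fitted
    slope is negative.
    """
    p = percent_allele_1 / 100.0
    return math.log((1.0 - p) / p)


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = least_squares_line(
    PERCENT_ALLELE_1, [ideal_ln_ratio(p) for p in PERCENT_ALLELE_1]
)
print(f"ln(ratio) = {slope:.4f} * (% allele 1) + {intercept:.4f}")
```

Under these assumptions the fitted slope and intercept come out close to -0.0456 and 2.2822, which suggests that an ideal curve of this form can be obtained from the mixture proportions alone, without any measured peak areas.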

Analysis of Imbalance
The Sequenom® MassARRAY iPLEX software quantifies the area under each of the allele peaks and assigns either a heterozygous or homozygous call to the SNP by calculating the ratio of the peak areas for the two alleles. As described previously [7], for all SNPs tested we scored preferential allelic imbalance by calculating the R-ratio for each DNA pair. We defined the R-ratio as the ratio of the two allele peak areas in the normal DNA divided by the ratio of the two allele peak areas in the paired tumor DNA (R-ratio = Normal (allele 1/allele 2) / Tumor (allele 1/allele 2)). Samples were scored as having imbalance, defined as the loss of either the first or second allele in the tumor sample, if the R-ratio was greater than 1.5 or less than 0.67, respectively. The R-ratio thresholds used to determine imbalance have been described previously [33,34]. A chi-squared test (df = 1) was used to assess the observed imbalances for statistically significant deviation from the expected 50:50 distribution of allele imbalances. In cases in which a tumor was heterozygous for a SNP by genotyping but the paired normal sample failed to genotype, an average of the two normal alleles for heterozygous normal samples at that SNP was used in place of the failed normal sample to calculate an R-ratio. SNPs with p-value < 0.10 were considered suggestive of preferential allelic imbalance and were therefore subjected to testing in the validation sample set to rule out false positives. Bonferroni correction was used to adjust for the number of statistical tests. In addition to qualitative determination of imbalance, we generated box plots of the distribution of R-ratios for each SNP for samples showing relative loss of allele 1, relative loss of allele 2, and no imbalance (Figure S2). Samples were excluded from the plots if they had an R-ratio greater than 10 or if an R-ratio could not be calculated because one of the two alleles in the tumor sample had an allele peak area value of 0.
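The scoring rule and the 50:50 test described above can be sketched in Python as follows. This is a minimal illustration, not the analysis code used in the study: the function names and the peak areas in the example are hypothetical, and the chi-squared p-value is computed with the closed form for one degree of freedom.

```python
import math

LOSS_ALLELE_1 = "relative loss of allele 1"
LOSS_ALLELE_2 = "relative loss of allele 2"
NO_IMBALANCE = "no imbalance"


def r_ratio(normal_a1, normal_a2, tumor_a1, tumor_a2):
    """R-ratio = Normal(allele 1/allele 2) / Tumor(allele 1/allele 2).

    Returns None when a peak area of 0 makes the ratio impossible to compute.
    """
    if normal_a2 == 0 or tumor_a1 == 0 or tumor_a2 == 0:
        return None
    return (normal_a1 / normal_a2) / (tumor_a1 / tumor_a2)


def classify(r, upper=1.5, lower=0.67):
    """Apply the thresholds from the text: >1.5 or <0.67 counts as imbalance."""
    if r is None:
        return None
    if r > upper:
        return LOSS_ALLELE_1
    if r < lower:
        return LOSS_ALLELE_2
    return NO_IMBALANCE


def chi_squared_5050(n_loss_1, n_loss_2):
    """Chi-squared test (df = 1) against an expected 50:50 split of losses."""
    total = n_loss_1 + n_loss_2
    if total == 0:
        return 0.0, 1.0
    expected = total / 2.0
    stat = ((n_loss_1 - expected) ** 2 + (n_loss_2 - expected) ** 2) / expected
    # For one degree of freedom the upper-tail probability is erfc(sqrt(stat/2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Hypothetical peak areas: (normal a1, normal a2, tumor a1, tumor a2).
pairs = [(10, 10, 4, 10), (10, 10, 10, 5), (10, 10, 9, 10), (12, 10, 5, 10)]
calls = [classify(r_ratio(*pair)) for pair in pairs]
stat, p_value = chi_squared_5050(
    calls.count(LOSS_ALLELE_1), calls.count(LOSS_ALLELE_2)
)
print(calls, stat, p_value)
```

Returning None for an uncomputable ratio keeps such samples out of the counts instead of forcing them into one of the three categories.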

Validation Studies
Following statistical analysis of allele-specific imbalance in the discovery sample set, three variants with p-values < 0.1 (rs16892766, rs6983267 and rs7136702) were genotyped by Sequenom® MassARRAY iPLEX Gold in a replication sample set of 296 paired normal/tumor DNAs. The same quantitative genotyping protocol and statistical analyses used for the discovery sample set were employed with the validation sample set. Bonferroni correction was used to adjust for the number of statistical tests (n = 3).

Compilation of Allele-Specific Imbalance Data from Multiple Studies
Allele-specific imbalance analyses have previously been performed on seven of the GWAS SNPs tested in the present study [19,35]. These studies employed manual measurement of sequencing chromatogram peaks for tumor and normal DNAs to calculate R-ratios. Both published studies used R-ratio cutoff values of <0.60 and >1.67 for allele-specific imbalance analysis. For both previously published studies, tumor DNA was isolated from fresh frozen colon tumors, and blood was used as the source of normal DNA [19,35]. In order to test the seven variants that overlapped with our study, we combined the data from the published studies with our allele-specific imbalance results for rs6983267, rs961253, rs3802842, rs10411210, rs4444235, rs4779584, and rs9929218. We combined our numbers of relative allelic losses with the numbers from the published studies and performed a chi-squared test with Bonferroni correction (n = 7) to determine the statistical significance of the combined imbalances.

Correlation Analysis of Allelic Imbalances and Age, Sex, and Tumor Stage
For each SNP successfully assessed for allelic imbalance, we investigated the association between the presence of allelic imbalance and the patient's age at diagnosis, sex, and tumor stage. A chi-squared test was used to detect association between allelic imbalance and sex. Fisher's exact test was used to detect association between allelic imbalance and tumor stage. For tumor stage, we classified tumors as TNM stage I-IV according to available tumor size, nodal spread, and metastasis information. A two-sample t-test was used to compare the average age of patients whose tumors showed allelic imbalance to that of patients whose tumors maintained heterozygosity. Correlations with corrected p-values < 0.05 were deemed statistically significant.

Discovery Set Genotyping
To determine if any of the 17 CRC-associated SNPs show evidence of allele-specific imbalance, we genotyped them in 194 normal/tumor DNA pairs. All but one SNP, rs4925386, were successfully genotyped in greater than 85% of samples in the discovery set. Due to a high rate of genotyping failures (24%), rs4925386 was excluded from further analysis. The number of heterozygous normal DNAs identified for each SNP (for which the paired tumor DNA was also successfully genotyped) ranged from 27 to 84 of the 194 samples (14-43%; Table 2). The frequency of overall relative allele loss (for both risk and non-risk alleles combined) ranged from 2% to 44%. While none of the SNPs reached statistical significance for allele-specific imbalance at α = 0.05, three SNPs (rs16892766, rs6983267, rs7136702) showed a trend for allele-specific imbalance (p-values < 0.10) prior to Bonferroni correction for multiple comparisons (n = 16). The SNP rs6983267 showed higher frequencies of relative loss of the non-risk T allele compared to the risk G allele. Interestingly, rs16892766 and rs7136702 both demonstrated higher frequencies of relative loss of the risk allele compared to the non-risk allele in the discovery set tumors. The variants rs16892766, rs6983267 and rs7136702 were prioritized for validation in a second set of samples. In addition to qualitatively scoring the SNPs as showing imbalance or no imbalance, the distributions of R-ratios for relative loss of the risk allele, relative loss of the non-risk allele and no imbalance were graphed as box plots for each SNP (Figure S2). Samples for which the R-ratio was greater than 10 or for which the R-ratio could not be calculated were excluded from the plots.

Validation Set Genotyping
The SNPs rs16892766, rs6983267 and rs7136702, which all showed a trend towards allele-specific imbalance in the original discovery set, were further tested in the validation sample set of 296 normal/tumor DNA pairs. As with the test set, these three SNPs were successfully genotyped in greater than 85% of the validation samples. With 22% of the validation set heterozygotes showing relative loss of an allele, rs6983267 showed a frequency of overall relative allele loss lower than that observed in the original test set (30%; Table 3). A lower frequency of heterozygous samples in the validation set showed relative loss of an allele of rs7136702 (11%) compared to the test set (23%; Table 3). Similarly, a lower frequency of allelic loss of rs16892766 was observed in the validation sample set (16%) compared to the original test set (26%; Table 3). rs6983267 again showed a tendency towards statistically significant preferential allelic imbalance (p-value = 0.06), favoring relative loss of the non-risk T allele and relative retention of the risk G allele in the validation sample set. However, neither rs7136702 nor rs16892766 showed a statistically significant tendency towards preferential allelic imbalance in the validation sample set (p-values = 0.59 and 1.00, respectively).

Combined Genotyping Results from Discovery and Validation Sample Sets
When the test set and validation set genotypes were combined, 48 of 192 heterozygous samples (25%) showed relative loss of an allele of rs6983267 (Table 3). For the SNP rs7136702, 31 of 208 combined heterozygotes showed relative loss of either allele (15%). When genotypes from the test set and validation set were combined for rs16892766, 13 of 65 heterozygotes (20%) showed allelic loss. By pooled analysis rs6983267 showed strong statistical evidence of preferential allelic imbalance (p-value = 0.01). After Bonferroni correction for multiple comparisons testing (n = 3), rs6983267 maintained a statistically significant adjusted p-value of 0.03. In contrast, both rs16892766 and rs7136702 failed to show any tendency towards significant allele-specific imbalance by combined analysis (unadjusted p-values = 0.17 and 0.37, respectively).

Compilation of Allelic Imbalance Data from Multiple Studies
Because others have published allele-specific imbalance data on seven variants from our study [19,35], we decided to perform combined analysis of the present study and the previously published studies to increase the power of identifying SNPs demonstrating allele-specific imbalance. When the imbalances observed in our samples at the SNPs rs6983267, rs961253, rs3802842, rs10411210, rs4444235, rs4779584, and rs9929218 were combined with those published previously [19,35], we observed a highly significant relative loss of the non-risk T allele of rs6983267 (p-value = 2.94×10−5). After Bonferroni correction (n = 7), the preferential relative loss of the T allele of rs6983267 maintained a highly significant p-value of 2.06×10−4. None of the other variants showed statistically significant evidence of preferential allelic imbalance (Table 4).

Correlation Analysis of Allelic Imbalances and Age, Sex, and Tumor Stage
To test whether samples showing allelic imbalance for the GWAS SNPs had different clinical characteristics compared to samples not showing imbalance, we performed a correlation analysis of imbalance with age, sex and tumor stage using data from our discovery sample set. The presence of allelic imbalance was significantly associated with tumor stage for rs719725 (unadjusted p-value = 0.0098), and significantly associated with younger age for rs7014346 (unadjusted p-value = 0.033). However, after adjusting for multiple comparisons (n = 16), there was no significant association between the presence of allelic imbalance and age, sex, or tumor stage (adjusted p-values > 0.05) for any of the tested SNPs.

Discussion
In this study, we investigated 16 SNPs previously associated with CRC risk for allele-specific imbalance using the Sequenom® MassARRAY iPLEX Gold genotyping platform. While 15 of the 16 tested SNPs did not show statistically significant evidence (p-value < 0.05) of preferential allelic imbalance in our discovery sample set, the SNP rs6983267 demonstrated a tendency towards statistically significant somatic loss of the non-risk T allele and retention of the risk G allele in both the original discovery set and the validation sample set (p-values = 0.07 and 0.06, respectively; Tables 2 and 3). This is consistent with previously published reports [19,20]. Interestingly, despite being in high linkage disequilibrium with rs6983267 at 8q24 (D′ = 0.99) [9,13], rs7014346 did not show evidence of preferential allelic imbalance (p-value = 0.53) in the discovery sample set. In the largest previous study to assess allelic imbalance for rs6983267, 466 heterozygous tumors from Finnish CRC patients were successfully evaluated and 101 of these heterozygous samples (22%) showed allelic imbalance [19]. Among these 101 samples, there were significantly (p-value = 0.0007) more tumors showing relative loss of the T allele (66% of tumors) versus relative loss of the G allele (34% of tumors). From our discovery and validation sets combined, we evaluated tumors from 192 individuals heterozygous for the rs6983267 variant, and 48 (25%) of these heterozygotes showed allelic imbalance. We observed a nearly identical percentage of tumors showing relative loss of the T allele (33 of 48; 69%) versus the G allele (15 of 48; 31%). This was significant even after adjusting for multiple comparisons testing (p-value = 0.03; Table 3). Thus, our data support the observation of preferential allelic imbalance for rs6983267 and validate our experimental method. Furthermore, when we combined our data with that of Tuupanen et al. [19], we observed a highly significant relative loss of the T allele and relative gain of the G allele that withstood multiple comparisons testing (p-value = 2.06×10−4; Table 4). Importantly, the finding that the risk G allele may be selectively retained or gained in colorectal tumors is consistent with a study showing that the G allele of rs6983267 demonstrates enhanced binding of the Wnt-regulated transcription factor TCF4, perhaps leading to increased responsiveness to Wnt signaling in individuals carrying the G risk allele [20]. Additionally, these data confirm that allele-specific imbalance does occur for CRC susceptibility loci, albeit at a low frequency.
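The p-values quoted for rs6983267 can be approximately reproduced from the counts given in the text, which may help readers follow how the chi-squared test and Bonferroni correction interact. The sketch below is illustrative only. The counts for the present study (33 T-allele losses, 15 G-allele losses) are taken from the paragraph above; the counts for the published Finnish series (67 and 34) are an assumption, back-calculated from the reported 66% and 34% of 101 imbalanced tumors, and the function names are hypothetical.

```python
import math


def chi_squared_p_5050(n_a, n_b):
    """Chi-squared p-value (df = 1) against an expected 50:50 split."""
    expected = (n_a + n_b) / 2.0
    stat = ((n_a - expected) ** 2 + (n_b - expected) ** 2) / expected
    # Exact upper-tail probability for one degree of freedom.
    return math.erfc(math.sqrt(stat / 2.0))


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Present study, discovery and validation sets combined (counts from the text).
ours_t_loss, ours_g_loss = 33, 15

# Published series: ASSUMED counts, rounded from 66% and 34% of 101 tumors.
published_t_loss, published_g_loss = 67, 34

p_ours = chi_squared_p_5050(ours_t_loss, ours_g_loss)
p_pooled = chi_squared_p_5050(
    ours_t_loss + published_t_loss, ours_g_loss + published_g_loss
)

print(f"present study: p = {p_ours:.3g}, adjusted (n = 3) = {bonferroni(p_ours, 3):.3g}")
print(f"pooled:        p = {p_pooled:.3g}, adjusted (n = 7) = {bonferroni(p_pooled, 7):.3g}")
```

Under these assumptions the unadjusted and adjusted values land close to the reported 0.01 and 0.03 for the present study and to the reported pooled values on the order of 10−5 and 10−4. Small differences would be expected if the published counts differ from the rounded figures used here.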
In another recent study, somatic allelic imbalance was investigated at seven low-penetrance CRC susceptibility loci [35]. The loci-tagging SNPs rs4779584, rs3802842, rs4444235, rs9929218, rs10411210, and rs961253 that were genotyped in our study were among the seven variants tested for allele-specific imbalance in the study by Niittymäki et al. [35]. While none of these SNPs showed evidence of preferential allelic imbalance in the combined analysis with our data, one of these SNPs (rs961253) demonstrated allelic imbalance trends similar to those observed in our discovery sample set, with rs961253 showing more frequent relative loss of the A allele in both studies (Table 4). Rates of heterozygosity and imbalance were very similar between the two studies, with the exception of our study showing a higher degree of allelic imbalance for rs4779584. A combined analysis of our data and the data from Niittymäki et al. [35] for the six variants in common did not reveal any SNPs with evidence of allele-specific imbalance. A caveat to combining data from the present study with published data sets is that the percentage of tumor cells in the samples, the genotyping methods, and the R-ratio cutoffs for determining allelic imbalance differ across studies. Nonetheless, our study reproduces the finding that these six loci-tagging SNPs show no evidence for preferential allelic imbalance in predominantly Caucasian study populations.
Although only one of the SNPs tested in the present study showed strong evidence of preferential allelic imbalance, the other SNPs may play a role in germline predisposition for CRC independent of somatic events in the tumor. It has been proposed that these SNPs influence the development of neoplasms but do not affect subsequent somatic neoplastic progression [35]. The functional SNPs at the GWAS-identified loci may influence neoplastic development by modifying gene expression, methylation, or splicing patterns in such a way that selection at the DNA level is not required during tumorigenesis. These SNPs could also act in non-tumor cells, such as stromal or immune cells, to modify cancer risk independently of the cancer cells themselves. Once the mechanism by which these variants act to confer risk is better understood, we may be able to deduce which variants are more likely to show selection in tumors. Inherent limitations in our study design could further mask existing preferential allelic selection. First, it is possible that normal cells were isolated with tumor cells in the tumor tissue cores from which DNA was extracted for analysis. Despite initial selection of regions of the tumor containing 70% or greater tumor cells, some normal DNA contamination of the tumor DNA sample could bias the sample towards showing no imbalance. However, our histological examination of the tissue samples should minimize the possibility of normal DNA contamination. Similarly, our histologically normal samples from FFPE colon tissue may not be truly normal and may contain somatic mutations similar to those of the tumor, which could result in a general "undercalling" of tumors with imbalance. Whenever possible, the normal colon tissue was collected from sites distant from the tumor. Second, we employed conservative data inclusion practices by discounting aggressive genotype calls made by the Sequenom® MassARRAY iPLEX software and by applying R-ratio cutoffs of >1.5 and <0.67 for determination of allelic imbalance. Our rigorous requirements for inclusion of data may limit detection of borderline significant allelic imbalance, particularly in tumor samples containing non-tumor cells. Furthermore, if a tumor is heterogeneous for allelic loss, we may not detect the imbalance in that sample. Third, our discovery sample set was limited to 194 normal/tumor DNA pairs and may have lacked statistical power for detection of preferential allelic selection in loci showing lower levels of heterozygosity or less frequent genomic aberration. Based on mouse data showing that about 40% of susceptibility loci demonstrate preferential allelic imbalance [4], we did not expect all SNPs identified through GWA studies to show preferential allelic selection in tumors. However, our results are surprising in that only one SNP, rs6983267, showed a trend towards somatic selection in the colon tumors. These results may indicate differences between species, differences between colon and skin tumors, or may be the result of the discussed study limitations.
In conclusion, our results suggest that the majority of variants identified as colon cancer susceptibility alleles through GWAS do not exhibit somatic allele-specific imbalance in colon tumors. However, our data confirm previously published results showing allele-specific imbalance for rs6983267. These results indicate that somatic allele-specific imbalance of cancer susceptibility alleles may not be a common phenomenon in colon cancer, but that for a small percentage of loci (1 of 16, or 6%, observed in the present study), somatic selection of specific alleles may be driving tumorigenesis.

Figure S1 Standard Curves for SNPs. Standard curves for 13 of the tested GWAS SNPs were generated by mixing control DNAs known to be homozygous for either allele in different proportions so as to generate mixtures of 0, 20, 40, 50, 60, 80 and 100% allele 1. DNA mixtures were quantitatively genotyped using Sequenom® MassARRAY iPLEX Gold, and the percentage of allele 1 was plotted against natural log-transformed N-ratio. The line of best fit, linear equation in the form ln(N-ratio) = m(% allele 1) + b, and correlation coefficient R² are shown for each GWAS SNP for which the appropriate control homozygote DNA was available. The expected equation for the ideal standard curve is: ln(N-ratio) = −0.0456(% allele 1) + 2.2822. (DOC)

Figure S2 Box plots of R-ratios. Box plots for each call (relative loss of allele 1, relative loss of allele 2 and no imbalance) are plotted for each of the 16 SNPs genotyped in the discovery sample set. Average R-ratio is indicated by a white line and the standard deviation within each group is denoted. Outlier samples are indicated by a dot, except for samples with R-ratios greater than 10, which were removed from the figure.