Pooled Sample-Based GWAS: A Cost-Effective Alternative for Identifying Colorectal and Prostate Cancer Risk Variants in the Polish Population

Background Prostate cancer (PCa) and colorectal cancer (CRC) are the most commonly diagnosed cancers and cancer-related causes of death in Poland. To date, numerous single nucleotide polymorphisms (SNPs) associated with susceptibility to both cancer types have been identified, but their effect on disease risk may differ among populations.
Methods To identify new SNPs associated with PCa and CRC in the Polish population, a genome-wide association study (GWAS) was performed using DNA sample pools on Affymetrix Genome-Wide Human SNP 6.0 arrays. A total of 135 PCa patients and 270 healthy men (PCa sub-study) and 525 patients with adenoma (AD), 630 patients with CRC and 690 controls (AD/CRC sub-study) were included in the analysis. Allele frequency distributions were compared with t-tests and χ²-tests. Only those significantly associated SNPs with a proxy SNP (p < 0.001; distance of 100 kb; r² > 0.7) were selected. GWAS marker selection was conducted using PLINK. The study was replicated using extended cohorts of patients and controls. The association with previously reported PCa and CRC susceptibility variants was also examined. Individual patients were genotyped using TaqMan SNP Genotyping Assays.
Results The GWAS selected six and 24 new candidate SNPs associated with PCa and CRC susceptibility, respectively. In the replication study, 17 of these associations were confirmed as significant in an additive model of inheritance. Seven of them remained significant after correction for multiple hypothesis testing. Additionally, 17 previously reported risk variants were identified, five of which remained significant after correction.
Conclusion Pooled-DNA GWAS enabled the identification of new susceptibility loci for CRC in the Polish population. Previously reported CRC and PCa predisposition variants were also identified, validating the global nature of their associations. Further independent replication studies are required to confirm the significance of the newly uncovered candidate susceptibility loci.


Introduction
Cancers are highly heterogeneous, polygenic disorders that arise in a multi-step process involving the selection of successive cellular clones and result from genetic as well as specific environmental factors. In the former case, both high-penetrance mutations and low-penetrance polymorphisms may determine a patient's defense and adaptive mechanisms against exposure to carcinogenic factors, determining susceptibility to this disease. However, the effect of common low-penetrance risk determinants is small when in isolation, increasing susceptibility only through the cumulative effect associated with the occurrence of multiple risk variants [1].
The association between allele frequency and susceptibility to disease can be studied by focusing on individually selected variants or, instead, on the position of over a million DNA variants, using single nucleotide polymorphism (SNP) microarray technology. Microarray platforms used by genome-wide association studies (GWAS) represent a relatively mature technology that allows scanning the entire genome to detect potential associations with disease without prior knowledge of their position or biological function. In theory, as a consequence of linkage disequilibrium (LD) between SNPs at a given locus, a high proportion of all diversity could be captured by genotyping a relatively smaller subset of markers (the so-called tagging SNPs) [2][3][4][5].
To date, over 1,000 susceptibility loci, usually of small or modest effect and with accuracy ranging from low to moderately high, have been identified by GWAS [6]. However, each of these studies, including over 50 GWAS performed with cancer patients, identified only a few risk variants when analyzed separately. Moreover, many studies have not been replicated [7,8]. The difficulties in the identification of genetic risk factors associated with heterogeneous and polygenic diseases, such as sporadic cancers, may be explained by the limitations of the methodology. Commercially available SNP array platforms have been optimized for studying diseases or traits based on the assumption that common diseases would be associated with common variants [9]. Since loci with a high effect size have been efficiently removed from the human population by natural selection, the identification of a common polymorphic susceptibility locus strongly associated with a disease, with an odds ratio (OR) over 2 [10], is unlikely. Even though the identification of SNPs of low minor allele (MA) frequency has improved with the use of latest-generation chips, and higher probe densities have enabled the study of variants with a low degree of heterozygosity, the detection of rare variants remains highly demanding in terms of statistical power [7,8,11-14].
Prostate cancer (PCa) and colorectal cancer (CRC) are the most common types of cancer in the Polish population, and the leading causes of cancer-related morbidity and mortality [15]. Most CRCs are sporadic, and only a small proportion occurs in the course of highly penetrant hereditary syndromes, such as Lynch syndrome, familial adenomatous polyposis and other polyposis syndromes mediated by rare germline mutations in the DNA mismatch repair genes and in the adenomatous polyposis coli (APC) gene [16]. PCa predisposition mediated through rare mutations in some candidate genes, such as BRCA2, also explains less than 10% of the relative familial risk [17]. Therefore, it is possible that a substantial proportion of heritable cancer risk is explained by a combination of common low-penetrance variants of modest effect. For example, genetic variation in 14 and 21 independent susceptibility loci, validated in unrelated populations, may explain approximately 8% and 13.5% of the heritable risk of developing CRC and PCa, respectively [16,18]. These results show, however, that most inherited variation associated with the risk of developing either cancer type remains to be determined.
A comprehensive GWAS-based analysis of variants conferring genetic susceptibility to CRC and PCa has not yet been conducted in the Polish population. A major reason for this lack of studies is the high cost of SNP microarray technology, particularly considering that new loci identified by GWAS have been associated with progressively smaller effect sizes, demanding an increase in the statistical power (namely sample size) of GWAS. An alternative approach using pooled DNA samples has been developed [19]. Although the non-standard use of SNP arrays makes it necessary to take additional precautions [19,20], this approach substantially reduces research costs. It is important to consider, however, that the higher technical variation associated with the DNA pooling approach may mask the weakest associations. Thus, researchers have to trade off the accuracy of genetic risk prediction against the cost of their research.
In this study, we describe a pooled DNA sample-based GWAS as a cost-effective alternative to identify genetic variants of moderate effect associated with CRC and PCa in the Polish population. Pooled DNA samples were processed using microarray technology, and GWAS was employed as a genetic variance filtering approach. The technical validation of the GWAS results and the replication studies on individual DNA samples were conducted using much cheaper PCR-based genotyping technology.

Ethics Statement
All enrolled patients and control subjects were Polish Caucasians recruited from two urban populations, Warsaw and Szczecin.

Allelotyping GWAS
Genomic DNA was extracted from whole blood treated with EDTA using the QIAamp DNA Mini Kit (Qiagen, Germany), following the manufacturer's protocol. Before pooling, DNA sample concentrations were measured based on their fluorescence intensity using the Quant-iT™ PicoGreen dsDNA Kit (Invitrogen, United Kingdom). To assess DNA quality, the 260 nm/280 nm absorbance ratio of each sample was also measured using a NanoDrop 1000 spectrophotometer (Thermo Fisher Scientific Inc., USA), and samples were run on a 1% agarose gel to determine DNA integrity visually.
DNA samples that passed quality control tests were combined in equimolar amounts according to patient diagnosis to obtain pools of 15 DNA samples each. Pooled DNA samples were then brought to a final concentration of 50 ng/µl in Tris-EDTA buffer (pH = 8), with concentrations of Tris and EDTA not exceeding 10 mM and 0.1 mM, respectively. In the AD/CRC sub-study, a total of 35, 42 and 47 DNA pools were prepared for AD, CRC and controls, respectively, whereas in the PCa sub-study, 19 DNA pools each were prepared for PCa patients and controls. To reduce the influence of experimental variation, each DNA pool was split into three technical replicates, which were assayed independently, using separate microarrays, on the Affymetrix Genome-Wide Human SNP Array 6.0. Microarray genotyping experiments and the extraction of probe set signal intensities were performed by ATLAS Biolabs GmbH (Berlin, Germany).

Individual genotyping
For the technical validation of GWAS findings and for the replication study, individual patients were genotyped using TaqMan SNP Genotyping Assays (Life Technologies, USA), the SensiMix™ II Probe Kit (Bioline Ltd, United Kingdom), and a 7900HT Real-Time PCR system (Life Technologies, USA).

Statistical analyses - allelotyping GWAS
The intensity of each SNP was calculated as the relative allele signal (RAS) for each microarray, such that: RAS = A/(A+B), where A and B are the probe set intensity values of alleles A and B, respectively, according to the Affymetrix coding [21,22]. The intensity of A and B was obtained from the Affymetrix Birdseed v2 algorithm. Mean RAS values were next calculated for each DNA pool to account for the three technical repeats. Prior to conducting the association tests, a principal component analysis (PCA) for all arrays was performed based on RAS values. Pools identified as outliers by plotting the first two principal components were excluded from further analyses.
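The RAS definition and the averaging over technical repeats can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the intensity values are hypothetical, and the PCA-based outlier screening is not reproduced here.

```python
from statistics import mean


def relative_allele_signal(a, b):
    """RAS = A / (A + B) for one SNP on one microarray."""
    total = a + b
    if total <= 0:
        raise ValueError("allele intensities must sum to a positive value")
    return a / total


def pool_mean_ras(replicates):
    """Mean RAS of one DNA pool over its technical repeats.

    replicates: list of (A, B) probe set intensity pairs, one per array.
    """
    return mean(relative_allele_signal(a, b) for a, b in replicates)


# Hypothetical intensities for one SNP: three technical repeats of one pool.
replicates = [(1200.0, 800.0), (1150.0, 850.0), (1250.0, 750.0)]
pool_ras = pool_mean_ras(replicates)
print(f"mean RAS for this pool: {pool_ras:.3f}")
```

The resulting per-pool mean RAS values are the quantities that enter the between-group comparisons.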
To detect significant differences in allele frequency between the PCa and control groups, a combination of two statistical approaches was used. Firstly, between-group differences in RAS were tested using Student's t-tests to take into account RAS variation among the pools representing each group [23]. Secondly, mean RAS values of all arrays in the patient and control groups were calculated, and significant differences in allele frequency were tested using a χ²-test with one degree of freedom [24]. Since the χ²-test compares mean allele frequencies between groups without taking into account the high technical complexity of the allelotyping approach, it could lead to a higher number of false positive and false negative results. Conversely, the t-test could be oversensitive when technical variation among pools is low, flagging differences in allele frequency that are too small to be validated by individual genotyping. A combined statistical approach therefore provides a more accurate means of testing for significant differences than either test alone.
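The two tests can be illustrated with a short, standard-library Python sketch. It assumes a pooled-variance Student's t statistic computed on per-pool mean RAS values, and a 2x2 allelic χ²-test in which allele counts are estimated from the group mean RAS and the number of chromosomes (2N) in each group. The function names and the numbers are hypothetical, and the t-test p-value (which requires the t distribution) is deliberately left out.

```python
import math
from statistics import mean, variance


def student_t(case_ras, control_ras):
    """Pooled-variance Student's t statistic and degrees of freedom."""
    n1, n2 = len(case_ras), len(control_ras)
    pooled_var = ((n1 - 1) * variance(case_ras)
                  + (n2 - 1) * variance(control_ras)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean(case_ras) - mean(control_ras)) / se, n1 + n2 - 2


def allelic_chi2(freq_case, freq_control, n_case, n_control):
    """1-df chi-square test on allele counts estimated from mean RAS.

    n_case and n_control are numbers of individuals (2 alleles each).
    """
    a_case = 2 * n_case * freq_case
    a_ctrl = 2 * n_control * freq_control
    rows = (2 * n_case, 2 * n_control)
    cols = (a_case + a_ctrl, sum(rows) - a_case - a_ctrl)
    observed = ((a_case, rows[0] - a_case), (a_ctrl, rows[1] - a_ctrl))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / sum(rows)
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 df.
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical per-pool mean RAS values for a single SNP.
case_pools = [0.62, 0.60, 0.64, 0.61]
control_pools = [0.55, 0.57, 0.54, 0.56]
t_stat, df = student_t(case_pools, control_pools)
chi2, p_chi2 = allelic_chi2(0.6, 0.5, n_case=100, n_control=100)
print(t_stat, df, chi2, p_chi2)
```

The t statistic reflects between-pool variability, whereas the χ²-test depends only on the group means and sample sizes, which is why the two can disagree.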
Candidate SNPs for individual genotyping were selected by combining the results from both the t-test and the χ²-test, using the clumping algorithm in the PLINK v1.06 software (http://pngu.mgh.harvard.edu/purcell/plink) [25]. Those loci for which there was an SNP (p < 0.001) and at least one correlated proxy SNP (r² > 0.7) within a 100-kb region (p < 0.001, χ²-test) were considered as positive results. Proxy SNPs were determined based on LD data obtained from 4,100 individually genotyped Caucasian subjects from West-Pomerania in the SHIP cohort, using the Affymetrix Human SNP Array 6.0 [26,27].
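The selection rule can be approximated by the following simplified Python sketch. It is not PLINK's clumping implementation: it only checks, for each SNP passing the p-value threshold, whether a correlated and significant proxy exists within the window. The assumption that the index SNP is judged by its t-test p-value and the proxy by its χ²-test p-value is ours, and all SNP identifiers, positions and r² values are hypothetical.

```python
def select_candidates(snps, r2, p_max=1e-3, r2_min=0.7, window=100_000):
    """Return index SNPs supported by at least one proxy SNP.

    snps: dict mapping SNP id -> (chromosome, position, p_ttest, p_chi2)
    r2:   dict mapping frozenset({id1, id2}) -> pairwise LD r-squared
    """
    selected = []
    for sid, (chrom, pos, p_t, _) in snps.items():
        if p_t >= p_max:
            continue  # index SNP itself must be significant
        for oid, (ochrom, opos, _, op_chi2) in snps.items():
            if oid == sid or ochrom != chrom:
                continue
            if abs(opos - pos) > window:
                continue  # proxy must lie within the 100-kb window
            ld = r2.get(frozenset((sid, oid)), 0.0)
            if op_chi2 < p_max and ld > r2_min:
                selected.append(sid)
                break
    return selected


# Hypothetical input: snp_a has a significant proxy in LD; snp_c is isolated.
snps = {
    "snp_a": ("1", 1_000, 1e-4, 1e-4),
    "snp_b": ("1", 50_000, 0.5, 5e-4),
    "snp_c": ("1", 500_000, 1e-5, 1e-5),
}
r2 = {frozenset(("snp_a", "snp_b")): 0.9}
candidates = select_candidates(snps, r2)
print(candidates)
```

Requiring a supporting proxy filters out isolated single-SNP signals, which in a pooled design are more likely to reflect technical noise.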

Statistical analyses - individual genotyping
Technical validation of the candidate SNPs selected by the pooled-DNA GWAS was performed by individual genotyping of the same experimental cohorts. TaqMan genotyping data were first subjected to quality control procedures, including thresholds for maximum individual missingness for each of the SNPs (< 0.05), maximum genotype missingness for each of the individuals (< 0.05) and Hardy-Weinberg disequilibrium in the control group (p < 0.001). GWAS candidate associations were validated using the allelic χ²-test (PLINK v1.07 software). SNPs with p-values < 0.01 were eligible for further analyses. High levels of concordance in allele frequency differences between case and control groups validated the accuracy of the GWAS screening process, including the equimolar pool construction and the statistical approach for selection of candidate SNP associations.
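As an illustration of the Hardy-Weinberg filter, the sketch below applies a 1-df χ² goodness-of-fit test to genotype counts in controls. This is an assumption made for simplicity: PLINK may use a different (exact) test, so the values are not expected to match PLINK output. The function name and genotype counts are hypothetical.

```python
import math


def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical control genotype counts for two SNPs.
chi2_ok, p_ok = hwe_chi2(25, 50, 25)    # exactly in HWE proportions
chi2_bad, p_bad = hwe_chi2(50, 0, 50)   # no heterozygotes at all
for label, p_value in (("snp_ok", p_ok), ("snp_bad", p_bad)):
    status = "exclude" if p_value < 1e-3 else "keep"
    print(label, status)
```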
Validated GWAS-derived SNPs and literature-selected SNPs (Table S1) were further analyzed by individual genotyping in the extended AD, CRC and PCa cohorts (Table 1). A binomial logistic regression model, fitted in R, was used to investigate associations under an additive gene action model for all the subjects enrolled in the study. A logistic regression analysis was also performed for PCa patients to determine whether any of the assayed SNPs was associated with early (< 65 years of age) PCa onset. The Benjamini-Hochberg correction was used for multiple comparisons. The heterogeneity among study populations was assessed with the I² statistic and the p-value of Cochran's Q statistic. For meta-analyses, pooled-OR values with 95% confidence intervals (CI) were calculated using the meta function of STATA version 11. Their significance was assessed by the Z test, and p < 0.05 was considered significant.
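The Benjamini-Hochberg adjustment mentioned above can be written compactly. The following Python sketch computes adjusted p-values in the original input order; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from four association tests.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
print(adjusted)
```

An association is then declared significant when its adjusted p-value falls below the chosen false discovery rate, for example 0.05.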

Pooled-DNA allelotyping GWAS and individual DNA validation of the GWAS findings
The GWAS was carried out using pools of 15 DNA samples and the Affymetrix Genome-Wide Human SNP Array 6.0. The following outliers, identified by the PCA results, were excluded from further analyses: 1) one pool representing 15 control male subjects in the AD/CRC sub-study and 2) 10 pools representing 150 PCa patients and one pool representing 15 controls in the PCa sub-study. The reason why so many PCa patient pools had to be rejected from further consideration is not clear. It can only be speculated that some pre-analytical variability, such as subtle changes in DNA quality and/or DNA microarray hybridization, could have affected the final results of the allelotyping experiments.
The pooled-DNA GWAS revealed 44 candidate SNPs associated with either AD, CRC or PCa, of which two were repeated in two unrelated comparisons. Considering SNP population frequencies of 0.2-0.5, our AD/CRC GWAS reached a power ranging from 98.6% to 99.8% and from 43% to 64% to detect effect sizes of OR = 2.0 and 1.5, respectively, at α = 1E-03, as estimated according to Dupont et al. [28] (Figure S1).
Next, the GWAS-selected SNPs were validated by genotyping of individual DNA samples using TaqMan SNP Genotyping Assays. Five candidate SNPs (rs2557030, rs2557227, rs2574608, rs2755895, rs7583683) were excluded from further statistical analysis due to significant deviations (p < 0.001) from the Hardy-Weinberg equilibrium detected in the healthy control group. Although TaqMan genotyping-derived MA frequencies deviated slightly from the RAS values for MA obtained in the microarray experiment, there was agreement in the direction of differences (OR) in the allele frequencies of the case and control groups, as shown by the allelic χ²-test (with p < 0.01), for 30 out of 39 candidate SNPs: 24 associated with AD or CRC (one SNP, rs6702619, was identified in two separate comparisons) and six SNPs associated with PCa (Table 2). Table 1 shows demographic details of the subjects enrolled in the replication study. When a logistic regression was used to determine the significance of the association of the 30 GWAS-selected SNPs, using case or control status as the dependent variable and appropriately coded TaqMan genotypes as independent variables, 17 SNPs were significantly (p < 0.05) associated with AD or CRC in an additive model of inheritance (Table 3). Seven of those SNPs remained significantly associated after multiple testing adjustment. The MA of three variants was associated with increased CRC susceptibility, whereas for four variants the MA was associated with a decreased risk. When allele frequencies between cases and control subjects were assessed with the χ²-test corrected p-value, significant differences were observed for 13 SNPs (Table 3).

Replication study for GWAS-selected SNPs
The statistical evidence for heterogeneity between allele frequencies across validation and replication study groups was assessed by the Q-test p-value. Of 30 GWAS-selected SNPs, 14 revealed overall low heterogeneity (p > 0.1). Among them, significant associations in replication study cohorts were apparently more frequent, regardless of the statistic used to determine the significance of association (Table 3). Lack of heterogeneity may be considered as a criterion of credible replication [29].
The association of 14 literature-selected variants with AD or CRC and four literature-selected variants with PCa was confirmed (p < 0.05) in an additive model of inheritance (Table 4). The association of the common SNP rs6983267 was confirmed for both the AD and PCa groups of patients. Strikingly, SNP rs1800894 (IL10) was associated in the opposite direction with AD and CRC susceptibility (Table 4). The MA of the remaining 10 variants was associated with an increased risk and that of six variants with a decreased risk of PCa, CRC and/or AD. Of these 17 variants, five (rs1800894, rs16892766, rs6983267, rs1859962 and rs4939827) remained significant after correction for multiple comparisons. When allele frequencies between cases and control subjects were assessed with the χ²-test corrected p-value, significant differences were observed in 11 comparisons for seven independent SNPs (Table 4).
To validate the global nature of these associations, between-dataset heterogeneity was tested. In the meta-analysis we included three SNPs associated with CRC and four SNPs associated with PCa susceptibility in our replication study for which associations were found with the same phenotype in at least four other studies. A random-effects model was used to calculate the pooled-OR values. As shown in Table 5, lack of demonstrable heterogeneity (Q p-value greater than 0.1) was noted across datasets representing three out of seven SNPs, and all pooled-ORs were significant (p < 0.001).
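The heterogeneity statistics and a random-effects pooled OR can be sketched as follows. The DerSimonian-Laird estimator of the between-study variance is assumed here for illustration; the text does not state which estimator was used in STATA. The function name and the per-study ORs and standard errors are hypothetical.

```python
import math


def random_effects_meta(log_ors, ses):
    """Cochran's Q, I-squared and a DerSimonian-Laird pooled OR.

    log_ors: per-study log odds ratios; ses: their standard errors.
    """
    w = [1.0 / se ** 2 for se in ses]
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))
    df = len(log_ors) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add the between-study variance tau2.
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * y for wi, y in zip(w_re, log_ors)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    return {
        "Q": q,
        "I2": i2,
        "tau2": tau2,
        "pooled_or": math.exp(pooled),
        "ci95": (math.exp(pooled - 1.96 * se_pooled),
                 math.exp(pooled + 1.96 * se_pooled)),
        "z": pooled / se_pooled,
    }


# Hypothetical per-study estimates for one SNP.
studies_or = [1.20, 1.35, 1.10, 1.28]
studies_se = [0.08, 0.12, 0.10, 0.09]
result = random_effects_meta([math.log(x) for x in studies_or], studies_se)
print(result)
```

When Q does not exceed its degrees of freedom, the estimated between-study variance is zero and the random-effects result coincides with the fixed-effect one.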
To check whether any of the studied variants was associated with an early age of PCa onset, we performed a logistic regression analysis including cases only, with a binary indicator for age at PCa diagnosis (below or above 65 years of age, coded as 1 and 0, respectively) as the dependent variable and the studied SNPs as independent variables. There were 171 patients diagnosed at age 65 or earlier and 247 patients older than 65. Two SNPs were significantly associated with age at PCa diagnosis (Table S2): rs1934636 and rs6983267. The former, a GWAS-selected SNP, was more frequent in the group of older patients (OR = 0.6, 95% CI 0.39-0.93, p = 2.18E-02), considering the dominant gene action model. Conversely, the rs6983267 variant was associated with a younger patient age in the age-stratified analysis (OR = 1.40, 95% CI 1.01-1.95, p = 4.44E-02).

Pooled DNA-based GWAS utility
It is generally accepted that well-designed GWAS should be conducted with groups of at least 1,000 patients and 1,000 controls, even though appropriate levels of statistical power to test for genetic associations (at p < 5E-08) often relate to higher effect sizes [14]. These GWAS significance thresholds result from the requirement to correct for multiple comparisons and are aimed at minimizing the number of false positive findings [8]. However, exceedingly restrictive statistical criteria may, in turn, produce false negative results [11-13]. Indeed, some associations found significant in independent replication studies were not ranked among the top 1,000 SNPs in the initial GWAS [46]. Thus, the use of stringent criteria may prevent the detection of subtle associations and account for missing heritability [14]. It is also recognized that there is a certain level of heterogeneity in GWAS results, which may arise from the different genetic backgrounds (population stratification) of geographically distinct populations [41,63,64], or from the bias introduced by population admixture effects [65,66]. Although a few CRC susceptibility loci (such as 8q24.21, 8q23.3 or 18q21.1) have been replicated in a number of studies [41], it is symptomatic that some of the identified associations reflect between-population differences in tumor sub-site, age of CRC/AD onset, sex or smoking status within the groups studied [41]. Thus, large cohort studies can miss some sub-population-specific risk variants, so genome-wide genotyping should also be conducted in smaller cohorts. Conversely, studies with smaller sample sizes typically reveal a smaller fraction of the heritability of a complex disease by failing to detect associations that do not reach statistical significance [7].
Since the final GWAS results depend on many factors, each associated with a different stage of the experimental procedure, their analysis and interpretation are often challenging. It is essential to realize that the GWAS results reflect, at best, the differences in the genetic material of the cases and controls used for analysis. Although this may seem obvious, it emphasizes one of the most fundamental conditions required for a successful GWAS. Therefore, precise diagnostic criteria must be employed to obtain homogeneous groups, as a nonrandom distribution of individuals with traits governed by strong genetic determinants, such as single-gene mutations, will strongly bias the final GWAS outcome. Although our pooled DNA-based GWAS represent studies with small sample sizes, they identified 30 SNPs significantly overrepresented in the studied groups (Table 2), which were further validated by TaqMan genotyping of the individual DNA samples. The replication studies selected 17 candidate risk variants associated with CRC, considering an additive model of inheritance (Table 3). These associations had not been previously reported. Seven of them remained significant after correction for multiple hypothesis testing.
Although not all GWAS-selected susceptibility SNPs will have a direct functional association with a cancer phenotype, a careful analysis of the GWAS results showed that those SNPs located in intronic regions or in LD blocks with nearby genes have the potential to influence cancer development (Table 3). Notably, several candidate susceptibility genes (PRKCA, BMP6, ADAMTS19, ARHGAP6, FUT9/8, FAM108C1, CHL1, BTBD9 and WDR52) are involved in actin cytoskeleton arrangement, cell adhesion and cell motility processes, which are important for cancer invasion and metastasis.
The rs3803820 SNP, located in the PRKCA gene (17q24.2), was selected in the CRC sub-study, showing OR = 1.27 (p = 2.24E-02). Another candidate SNP, rs13192135, which showed a strong effect size of OR = 0.47 (p = 1.07E-02) in the CRC male group, is located at 6p24.3 in the intronic region of the BMP6 gene. Similarly, a strong association of the known rs4939827 variant of the SMAD7 gene with both AD and CRC risk was indicated in the present study (Table 4). This is in agreement with several previous studies showing an association of genetic variation in BMP/Smad pathway-related genes with CRC risk [32,33,67].
The rs9848984 SNP at 3p26.3, downstream of the close homolog of L1 (CHL1) gene, is located in the LD block involving the 3′-end of the gene. CHL1 is involved in cancer growth and in the metastasis of different human cancers, including colon and breast cancers [68]. The observation that both mRNA and protein levels of ARHGAP6 were elevated in CRC tissue and cell lines suggests that it may serve as a biomarker for the development and progression of CRC [69]. Similarly, a high level of metalloprotease ADAMTS19 expression was observed in several tumor tissues and cell lines [70]. In turn, FAM108C1 activity was shown to predict the development of distant metastases [71].
The rs2799652 SNP was found in the promoter region of the alpha-(1,3)-fucosyltransferase (FUT9) gene, responsible for the biosynthesis of the Lewis X antigen, a cancer-associated antigen expressed preferentially in premalignant colon polyps [72]. FUT8, in turn, is responsible for modulation of E-cadherin function [73]. Previous studies showed that FUT8 and E-cadherin expression levels were significantly higher in primary CRC samples and that E-cadherin core fucosylation enhanced cell-cell adhesion in colon carcinoma [74]. Variations both in FUT9 and downstream of FUT8 were shown to be associated with CRC risk in this study (Table 3). Interestingly, our replication study also revealed an association between an intronic sequence variation (rs9929218) in the E-cadherin gene (CDH1) and AD risk, especially in males (Table 4).
SNPs rs1447295 and rs6983267 are located in the 8q24 region. Several studies have identified 8q24 as an important region associated with risk for various cancers, including prostate, breast, colon, ovarian and bladder cancers [62,76-78]. To date, all susceptibility markers within 8q24 have been located in five distinct LD blocks [53]. SNP rs1447295 is located in block 5 (previously referred to as susceptibility region 1) and was shown to increase PCa risk in various populations, with an OR ranging from 1.21 to 1.81 [47,48,57-60]. Its rare allele A was also shown to be associated with an increased risk for prostate-specific antigen (PSA) recurrence in patients receiving radical prostatectomy (OR = 1.56, 95% CI 1.14-2.21) [79]. In fact, a meta-analysis of this SNP supported previously GWAS-reported associations [80].
Among the polymorphisms in block 4 (region 3) at 8q24, rs6983267 has been consistently identified in many studies, with an OR ranging from 0.65 to 1.42 [46,47,49-51,57,59,81,82], and therefore shows the strongest association with PCa risk in this LD block [53,83]. It has also been associated with CRC and ovarian cancer [76]. Recently, a meta-analysis showed an allelic and genotypic association of the rs6983267 polymorphism with CRC risk among Asians, Europeans, and Americans with European ancestry [82]. Surprisingly, this variant did not show any association with the CRC phenotype in our study. However, it was significantly associated with AD risk (in the whole group and among females only) (Table 4). In our age-stratified analysis, the minor allele T of rs6983267 was significantly associated with a younger age at PCa diagnosis (≤65 years; considering an additive mode of inheritance) (Table S2). Accordingly, the G allele of rs6983267 was associated with an older age at PCa diagnosis in the Swedish population [42], and the higher PCa risk associated with this SNP was approximately doubled in those individuals susceptible to an early disease onset or to the development of a clinically aggressive disease [84].
Still, some previously reported associations with CRC and PCa risk were not replicated in our study. This may have been a result of a low statistical power coupled with a high genetic heterogeneity and/or cancer complexity [8]. If so, these inconsistencies may stem from a potential hidden stratification of our cohort, despite the apparent homogeneity of the Polish population.

Utility of cancer risk variants revealed by GWAS
The only factor that decreases cancer-related mortality significantly is early diagnosis. Since cancers at an early stage of development are asymptomatic or associated with unspecific symptoms, early diagnosis is usually accidental or results from participation in screening programs. Epidemiological studies demonstrate that screening can be effective in a few cancer locations, including the large bowel and prostate. However, screening effectiveness depends not only on the availability of appropriate diagnostic tests, but also on the general acceptance of the proposed screening methods by those who consider themselves healthy. Colonoscopy used for CRC screening also allows simultaneous detection and removal of ADs, but it is a rather expensive procedure with low acceptability, especially among men [88]. By contrast, simple and cheap detection of serum PSA is widely accepted as a screening tool, but its predictive value is limited by the lack of specificity and the inability to differentiate indolent from aggressive PCa [89]. Therefore, specific but more expensive imaging-based methods might be introduced in PCa preventive programs. Enrolling healthy individuals with a higher risk of cancer in screening programs would increase the acceptance of screening exams, and therefore enhance their effectiveness and greatly reduce healthcare costs. Currently, CRC screening guidelines are based on age and, to some extent, on the family history of the individuals screened. These guidelines could also be customized according to gender, race, ethnicity, smoking habits and the presence of obesity, diabetes and metabolic syndromes [90].
One of the early hopes of the GWAS approach was to enable the development of risk prediction models that could accurately select high-risk individuals based on their genetic profiles. However, the proportion of risk explained by known susceptibility variants is still small. For example, according to a recently published meta-analysis of 30 selected SNPs associated with PCa risk, the proportion of the total genetic variance attributed to each SNP ranged from 0.2% to 0.9% as based on both OR and risk allele frequency [18]. Moreover, since the relative risk conferred by these loci is moderate or low, with ORs below 2, and new loci identified by GWAS have had progressively smaller effect sizes, the capacity for risk prediction in newly discovered common marker SNPs may be diminishing [89]. The problem is further complicated by interactions between genetic and environmental risk factors, largely due to a lack of established guidelines or procedures that would determine the impact of environmental factors on humans over the span of a lifetime. Thus, the information provided by genome-wide genotyping is often insufficient to be clinically useful in the prediction of cancer. In this sense, the cost of GWAS-based studies should be always considered, especially when adequate GWAS coverage of risk variants of small or modest effect requires larger sample sizes.
The major idea behind genomic studies is not only to enable recognizing genetic variability associated with susceptibility to a disease, but also to recognize the complex nature of genetic variability underlying its pathogenesis [1]. In this regard, although the genetic variants identified to date explain only a modest proportion of cancer heritability, their combination with additional, newly discovered loci may have a greater, cumulative, effect. Ideally, instead of typing all known variants, the most informative combination of potential SNPs should be assessed. Further research is therefore needed to enable the detection of new susceptibility variants. Moreover, it would be beneficial if such efforts were accompanied by an increase in the statistical power of GWAS.
In summary, this study provides evidence for the utility of pooled sample-based GWAS, instead of genome-wide genotyping of individual DNA samples, as a cost-effective approach for filtering genetic variance. The approach reached reasonable statistical power, particularly for relatively common SNP markers of moderate effect size. The usefulness of pooling-based GWAS was exemplified through the identification of SNPs associated with CRC and PCa susceptibility in the Polish population. However, considering the complex nature of cancer, which involves the interaction of different genetic and environmental factors, detecting all cancer markers present in the human genome is a task beyond current capabilities. Even in combination with previous findings, the risk information provided in the present study is still not sufficient to be used in clinical practice.

Supporting Information
Table S1 Literature-selected SNPs used in the replication study.