Characterizing Associations and SNP-Environment Interactions for GWAS-Identified Prostate Cancer Risk Markers—Results from BPC3

Genome-wide association studies (GWAS) have identified multiple single nucleotide polymorphisms (SNPs) associated with prostate cancer risk. However, whether these associations can be consistently replicated, vary with disease aggressiveness (tumor stage and grade) and/or interact with non-genetic potential risk factors or other SNPs is unknown. We therefore genotyped 39 SNPs from regions identified by several prostate cancer GWAS in 10,501 prostate cancer cases and 10,831 controls from the NCI Breast and Prostate Cancer Cohort Consortium (BPC3). We replicated 36 out of 39 SNPs (P-values ranging from 0.01 to 10^-28). Two SNPs located near KLK3 associated with PSA levels showed differential association with Gleason grade (rs2735839, P = 0.0001 and rs266849, P = 0.0004; case-only test), where the alleles associated with decreasing PSA levels were inversely associated with low-grade (as defined by Gleason grade <8) tumors but positively associated with high-grade tumors. No other SNP showed differential associations according to disease stage or grade. We observed no effect modification by SNP for association with age at diagnosis, family history of prostate cancer, diabetes, BMI, height, smoking or alcohol intake. Moreover, we found no evidence of pair-wise SNP-SNP interactions. While these SNPs represent new independent risk factors for prostate cancer, we saw little evidence for effect modification by other SNPs or by the environmental factors examined.


Introduction
Prostate cancer is the most common non-skin cancer among men in industrialized countries, but beyond age, ethnicity and family history, very little is known about its etiology. Observed familial aggregation together with evidence from both twin and epidemiological studies demonstrate a key role for inherited genetic variants [1].
Genome-wide association studies (GWAS) conducted within the last few years have identified multiple common single nucleotide polymorphisms (SNPs) associated with prostate cancer risk [2][3][4][5][6][7][8][9][10][11][12][13][14][15]. However, the function of these SNPs (or the causal variants these SNPs serve as proxies for) remains largely unknown and data describing their correlation with clinical factors or their interplay with other genetic and non-genetic factors are sparse, mainly due to the large sample sizes needed for sufficient statistical power.
To this end, we selected 39 SNPs from regions identified in previous GWAS and genotyped them in 10,501 prostate cancer cases and 10,831 controls within the NCI Breast and Prostate Cancer Cohort Consortium (BPC3). We tested each SNP for association with two strongly predictive clinical factors: Gleason grade and tumor stage. We investigated interactions between SNPs and known and potential environmental risk factors. Finally, we performed exploratory analysis to identify possible pair-wise SNP-SNP interactions.

Association between SNPs and prostate cancer risk
Subject characteristics are displayed in Table 1. All 39 SNPs were significantly associated with prostate cancer risk (Table 2) and directions of associations were consistent with previous findings [2][3][4][6][7][8][10][12][14]. Although risk estimates varied somewhat between different cohorts (Table S1 and Figure S1), we observed overall no strong statistical evidence for heterogeneity (P > 0.01). Risk effects per allele ranged from 1.06 (rs2928679) to 1.44 (rs16901979). Carriers of two copies of the rare 'A' allele of rs16901979 had a 3-fold increased risk of developing prostate cancer in this population. The allele frequency of rs16901979 varies widely across ethnicities (HapMap population frequencies: 0.03 in CEU, 0.26 in CHB and 0.58 in YRI), and thus might explain a part of the differences seen in prostate cancer incidence across populations. Based on p-value, we observed the strongest association for rs4430796 located in HNF1B/TCF2 (OR: 0.80 (95% CI: 0.77-0.83), P = 2.09 × 10^-28) and the weakest association for rs4961199 near CPNE3 (OR: 1.07 (95% CI: 1.02-1.14), P = 0.012). In addition, rs266849 near KLK3 was only weakly associated with prostate cancer risk (OR: 0.93 (95% CI: 0.89-0.98), P = 0.009). rs266849 was initially identified in a GWAS using controls selected for low prostate-specific antigen (PSA) levels (<0.5 ng/ml) [4] and it has been suggested that rs266849 is a marker for circulating PSA levels rather than for prostate cancer risk [16,17].
The primary analysis in most GWAS assumes an additive increase in risk for each risk allele carried. rs4961199 (P = 0.02) was the only SNP showing nominally significant evidence of departure from additivity in our data. This was not unexpected since rs4961199 was initially identified using a recessive inheritance model [12].
No other SNP was differentially associated with tumor grade or stage after adjusting for multiple testing.

Association between non-genetic factors and prostate cancer risk
We tested for association between prostate cancer risk and potential non-genetic risk factors including family history of prostate cancer, diabetes, BMI, height, smoking and alcohol consumption. As expected, we observed a strong association between family history of prostate cancer and prostate cancer risk (OR: 1.77, 95% CI: 1.59-1.96, P = 1.88 × 10^-27) as well as between diabetes and prostate cancer risk (OR: 0.73, 95% CI: 0.64-0.83, P = 1.61 × 10^-6). Adjusting for BMI did not alter the association between diabetes and prostate cancer (data not shown). BMI was inversely associated with prostate cancer risk (OR: 0.996 (95% CI: 0.994-0.998) per BMI unit increase, P = 0.0004). This association was limited to obese men (BMI >30) compared to normal weight men (BMI <25) (OR: 0.86, 95% CI: 0.79-0.94, P = 0.0009), and we observed no association for being overweight (OR: 0.99, 95% CI: 0.93-1.05, P = 0.64). Adjusting for diabetes and smoking attenuated the association between obesity and prostate cancer risk (OR: 0.89, 95% CI: 0.82-0.98, P = 0.02). The inverse association between BMI and prostate cancer risk was restricted to non-aggressive cases as defined by Gleason grade <8 and tumor stages A and B (data not shown). Height was not associated with prostate cancer risk, whether analyzed as a continuous variable (OR: 1.001, 95% CI: 1.000-1.002 per cm increase, P = 0.12) or in tertiles (OR: 1.02, 95% CI: 0.99-1.06, P = 0.24). We observed a non-significant reduced prostate cancer risk among both former smokers (OR: 0.95, 95% CI: 0.89-1.01, P = 0.08) and current smokers (OR: 0.91, 95% CI: 0.82-1.00, P = 0.06) compared to never smokers. Adjusting for alcohol consumption or BMI did not change the results (data not shown). Finally, consuming more than 30 g alcohol per day (corresponding to two drinks) was associated with an increased prostate cancer risk (OR: 1.09, 95% CI: 1.01-1.18, P = 0.03). Adjusting for smoking did not alter this association (data not shown).

SNP-environment and SNP-SNP interactions
To investigate if the associations with family history of prostate cancer, diabetes and BMI were stronger in specific genetic strata, we tested for effect modification by including a SNPxE interaction term in the model. We also tested for SNP effect modification of age at diagnosis (studying the main effect of age is not appropriate since our population comprises a series of nested case-control studies matched on age). After adjusting for multiple testing, no SNP showed significant statistical interaction with any of the non-genetic factors tested (Table S3 and Table S4). Of note, two SNPs in the 8q24 region (rs620861, P = 0.05 and rs6983267, P = 0.004) showed nominally significant interactions with age at diagnosis, with the association being stronger in younger men. These results are in line with previous reports of stronger associations with earlier onset of disease for SNPs in the 8q24 region [6,18,19]. We observed marginally significant interactions between diabetes and rs10486567 in JAZF1 (P = 0.04) and between BMI and rs10486567 (P = 0.03). This is of particular interest since genetic variation in JAZF1 has been associated with diabetes, albeit not the same genetic variants. In this study, obesity was associated with a reduced risk for prostate cancer. It has been shown that BMI is inversely associated with PSA levels [20] and thus, obese men are less likely to be diagnosed through PSA screening. Because BMI was associated with non-aggressive disease, we also looked at possible SNP-BMI interactions stratified by disease aggressiveness but observed no significant interactions (data not shown).
To assess if the ambiguous associations between prostate cancer risk and height, smoking and alcohol consumption are due to hidden SNP-environment interactions, we conducted a joint test of the environmental main effect and the SNP-environment interaction effect. This test has proven powerful when the non-genetic effect is limited to a specific genetic stratum [21]. Across SNPs, the joint test was not significant for either alcohol or smoking after adjustment for multiple testing (Table S5, Table S6 and Table S7). Similarly, standard interaction tests between SNPs and height, SNPs and smoking and SNPs and alcohol consumption were not significant. Exploratory analyses of all possible pairwise SNP-SNP interactions revealed no excess of significant interactions over that expected by chance (50 out of 630 tests, Table S8). Furthermore, no SNP-SNP interaction was significant after correcting for multiple testing using a Bonferroni correction (lowest nominal P-value was 0.0005). Yeager and colleagues identified a SNP-SNP interaction between rs4242382 and rs620861 (P = 0.002) [13]. We also observed this interaction (P = 0.02), but not when the analysis was restricted to only MCCS and PHS (P = 0.75).
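The Bonferroni comparison described above can be illustrated with a short calculation. This is a minimal sketch, assuming a family-wise significance level of 0.05 (the text does not state the level used); the function name is hypothetical. As a side observation, 630 equals the number of unordered pairs among 36 SNPs, although the text does not say how the count was derived.

```python
from math import comb


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Values reported in the text.
n_tests = 630        # number of pairwise SNP-SNP tests
lowest_p = 0.0005    # lowest nominal P-value among those tests

# Assumption: family-wise alpha of 0.05 (not stated in the text).
threshold = bonferroni_threshold(0.05, n_tests)
significant = lowest_p < threshold

# Observation only: 630 is the number of unordered pairs among 36 items.
pairs_among_36 = comb(36, 2)

print(f"threshold = {threshold:.2e}, significant = {significant}")
```

Under this assumed level, the per-test threshold is well below 0.0005, which agrees with the statement that no pairwise interaction survived correction.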

Discussion
In this study, we set out to examine whether SNPs identified in GWAS to be associated with prostate cancer show variation in risk by disease aggressiveness (tumor stage and grade) and/or interact with non-genetic and genetic factors. All 39 SNPs tested were significantly associated with prostate cancer in the overall analysis. However, the CGEMS project, which included four and six BPC3 studies in its second and third stages, respectively, contributed to identification of eleven SNPs investigated in the present study. We tested whether associations for these eleven SNPs could be confirmed in the remaining studies and, with the exception of three SNPs, the findings were replicated with risk magnitudes similar to those in the CGEMS analysis. We could not replicate rs4961199 using data from three of the non-CGEMS cohorts. Since rs4961199 was included in CGEMS stage 2 based on its recessive association, we also tested the recessive model in the non-CGEMS studies and observed a non-significant association similar to, but weaker than, that in CGEMS (OR: 1.10, 95% CI: 0.82-1.47, P = 0.54).
Few of the observed associations differed by disease stage, tumor grade or environmental exposures. The most noteworthy finding was the qualitatively altered association according to Gleason grade for two SNPs near KLK3 (rs266849 and rs2735839), where the minor alleles were associated with lower risk of low-grade disease but higher risk of Gleason 8-10 tumors. This was previously observed by Kader and colleagues [22] who studied 5,000 patients and found a strong association between Gleason grade and rs2735839 (P = 3.7 × 10^-7). The minor alleles of these SNPs have been associated with lower PSA levels, indicating that carriers are less likely to be diagnosed at an early stage through PSA screening [16,17]. However, we did not observe any difference in the association of these two SNPs by disease stage, suggesting that delayed diagnosis might not fully explain these associations. Interestingly, the significant positive association of these two SNPs with Gleason 8-10 tumors supports the clinical observations that PSA expression is lower in malignant than in normal prostatic epithelium and is further reduced in poorly differentiated tumors [23,24]. Together, these results suggest that KLK variation might influence high-grade prostate cancer risk through a yet unidentified pathway, or might simply act as a genetic marker of the probability of being diagnosed with high-grade versus low-grade prostate cancer through its influence on PSA levels. To test this hypothesis, we performed case-only analysis based on year of diagnosis to reflect the introduction of widespread PSA screening (up to 1992 (670 men) vs. after 1992 (9,831 men)). If the association between Gleason grade and KLK3 variation is due to altered PSA levels, we would expect to see differential associations according to year of diagnosis. We did not observe such differences, however, suggesting that the KLK3-prostate cancer association is not mediated by altered PSA levels.
A recent Icelandic study conducted stratified analysis based on year of diagnosis and noticed that the association with prostate cancer was confined to the group of cases diagnosed in 1992 or later. These results suggest that the association between the KLK3 locus and prostate cancer is driven by the increasing frequency of PSA testing [25].
After adjusting for multiple testing, no other SNP was associated with clinical sub-types. Earlier studies had failed to link these SNPs to clinical characteristics [22,26], suggesting that these SNPs affect prostate cancer risk overall and not solely for more (or less) aggressive or advanced cancer.
We found overall no evidence that these SNPs interact with known or proposed risk factors for prostate cancer including family history of prostate cancer, age of onset, diabetes, BMI, height, smoking or alcohol consumption. Studying the interactions between SNPs and diabetes was of particular interest since genetic variation in JAZF1 and TCF2 has been associated with both prostate cancer and diabetes [8,11,12,27,28]. We did see a borderline statistically significant interaction between rs10486567 in JAZF1 and diabetes, but this particular SNP has not been associated with diabetes risk. A previous study conducted in CPS-II and PLCO found that diabetes did not mediate the association between JAZF1 and HNF1B/TCF2 SNPs and prostate cancer risk [29], and we observed no statistical interaction between diabetes and three SNPs in HNF1B/TCF2.
We observed no significant associations between prostate cancer risk and smoking or height and only a weak association between prostate cancer and alcohol consumption, even after accounting for the possibility of differences in the effects of these exposures by genotype. A meta-analysis of 39 studies observed that height was positively associated with risk (RR 1.05 per 10 cm increment, 95% CI 1.02-1.09) but the association was only seen in cohort studies [30]. A recent large meta-analysis of smoking and prostate cancer incidence found overall no evidence of an association but reported an increased risk when considering number of cigarettes smoked. Moreover, they observed a 9% risk increase for former smokers [31]. We did observe a marginal association between alcohol intake and prostate cancer risk. This is in line with earlier results indicating a weak risk increase for men consuming at least 25 grams alcohol per day (OR: 1.05 (95% CI: 1.00-1.08)) and for men consuming at least 50 grams per day (OR: 1.09 (95% CI: 1.02-1.17)) [32].
Overall, these results imply that the lack of robust associations between these environmental factors and prostate cancer risk is not due to interactions between these exposures and variation in any of the 36 SNPs assessed in this study. However, the lack of significant interactions does not rule out that gene-environment interactions exist in prostate cancer. All SNPs under study have been linked to prostate cancer through their main effects. Agnostic approaches such as incorporating gene-environment interactions in a genome-wide association study setting might identify genetic variants that only affect risk when acting with other factors. The lack of significant interactions can also reflect the low power to detect only modest interaction effects despite our sample size of 10,000 cases and 10,000 controls. It is important to note that our results do not rule out small departures from a multiplicative odds model for the joint effect of pairs of individual markers and risk factors, nor does absence of departure from a multiplicative odds model necessarily imply that these genetic loci and risk factors do not interact in some causal manner. Moreover, absence of interaction as defined here does not imply absence of a ''public health interaction'', where the benefit from reducing a risk factor in terms of absolute risk reduction differs across genotypes [33]. This is, to our knowledge, the first large-scale study to explore possible interactions between confirmed prostate cancer susceptibility markers and a broad spectrum of known and possible environmental factors. The SNPs considered in this study show marginal per-allele odds ratios ranging between 1.07 and 1.44. It is possible that these odds ratios might be larger in strata defined by other prostate cancer risk factors, not evaluated in this study. It is well recognized that exploring such interactions requires large study populations with well-defined exposure data. 
With 10,501 prostate cancer cases, 10,831 controls and prospectively collected data within established cohorts, BPC3 is in a unique position to explore both gene-gene and gene-environment interactions as demonstrated here. For example, in the absence of main effects (which is not the same as assuming no marginal effect and plausibly consistent with modest marginal genetic or environmental effects), the BPC3 has 89% power to detect an interaction effect of 1.2 assuming an allele frequency of 20% and an environmental exposure with a prevalence of 20%.
As with all studies utilizing environmental exposure data, the present investigation would be expected to have some degree of misclassification in the measurement of those factors. It is possible that alternative modeling of the environmental risk factors or more precise exposure quantification would increase statistical power (e.g. analyzing intensity, duration or pack-years of smoking rather than never/former/current status). However, a critical issue in conducting pooled analysis across studies is to harmonize data. As exposure data become more refined, there is an increasing risk of discrepancies between cohorts, which increases the risk of ''misclassification''. Since our study cohorts (MEC excepted) included predominantly men of European ancestry, we were limited in our ability to study other ethnicities.
Genome-wide association studies have been particularly successful for prostate cancer. Recently published secondary analysis of GWAS has now added approximately 10 additional prostate cancer SNPs to those presented here [5,9,11]. At the time of this study, we did not have genotype data for these SNPs in BPC3 and it remains to be seen if they are differentially associated with clinical subtypes or if they interact with non-genetic factors.
In summary, we independently replicated the association between prostate cancer risk and 36 SNPs identified in multistage genome-wide association studies of prostate cancer. Except for SNPs in KLK3 that were differentially associated with Gleason grade, we did not detect any differentiation in SNP associations according to Gleason grade or stage at diagnosis, two clinical factors strongly predictive of disease outcome. Moreover, we found no strong evidence that these SNPs interact with age, family history, diabetes, BMI, height, smoking or alcohol consumption.

Study Population
The BPC3 has been described in detail elsewhere [34]. In brief, the consortium combines resources from seven well-established cohort studies with blood samples collected as follows: the Alpha-Tocopherol, Beta-Carotene Cancer Prevention (ATBC) Study in 1992-1993 [35], American Cancer Society Cancer Prevention Study II (CPS-II) in 1998 [36], the European Prospective Investigation into Cancer and Nutrition Cohort (EPIC, comprising cohorts from Denmark, Great Britain, Germany, Greece, Italy, the Netherlands, Spain, and Sweden) in 1993 [37], the Health Professionals Follow-up Study (HPFS) in 1993 [38], the Multi-Ethnic Cohort (MEC) in 1995 [39], the Physicians' Health Study (PHS) in 1982 [40], and the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial in 1993-2001 [41]. In addition, the Melbourne Collaborative Cohort Study (MCCS) established in 1990-1994 [42] recently joined the consortium. Together, these eight cohorts include over 265,000 men who provided a blood sample.
Prostate cancer cases were identified through population-based cancer registries or self-reports confirmed by medical records, including pathology reports. Except for the MCCS study, the BPC3 consists of a series of matched nested case-control studies within each cohort; controls were matched to cases on a number of potential confounding factors, such as age, ethnicity, and region of recruitment, depending on the cohort. MCCS used a case-cohort design, with a randomly sampled sub-cohort serving as controls. Written informed consent was obtained from all subjects and each study was approved by the Institutional Review Boards at their respective institutions. The IRBs for each study were as follows: The current study was restricted to individuals who self-reported as being Caucasian. We had genotype data for a total of 10,501 prostate cancer cases and 10,831 controls. Data on disease stage and grade at time of diagnosis were collected from each cohort, wherever possible. A total of 1,717 cases were classified as high-stage (stage C or D at diagnosis) and 1,388 were classified as high-grade (Gleason grade >7 or equivalent, i.e. coded as poorly differentiated or undifferentiated). For 15% of the cases, we did not have information about tumor stage or Gleason grade.
Baseline information on height and body weight, family history of prostate cancer, cigarette smoking status (never, past, and current), alcohol intake (g/day) and preexisting diabetes diagnosis was collected by self-report. Family history, which was defined as having at least one first-degree family member diagnosed with prostate cancer, was available for all but two cohort studies (PHS and EPIC). For some countries in EPIC, weight and height were measured.

Collection and harmonization of non-genetic data
We collected data on family history, diabetes at baseline, smoking, alcohol consumption, height and BMI for each study. Family history of prostate cancer was dichotomized into ''yes'' (1,780 subjects) or ''no'' (12,382 subjects). Age was defined as age at diagnosis or at selection as a control, except for MCCS (at baseline for controls) and MEC (at blood draw for controls), and further dichotomized into 65 years or younger versus older than 65 years. BMI was calculated from baseline weight (kg) and height (m) and grouped into three categories: normal weight (BMI <25 kg/m^2, 7,947 subjects), overweight (BMI 25-30 kg/m^2, 10,206 subjects) and obese (BMI >30 kg/m^2, 2,771 subjects). Height was analyzed both as a continuous variable and in tertiles (<173 cm (7,221 subjects), 173-180 cm (7,324 subjects) and >180 cm (6,548 subjects)).
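The categorization of BMI and height described above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and assigning a BMI of exactly 30 to the overweight group and heights of exactly 173 cm or 180 cm to the middle tertile is an assumption, since the text gives the cut-points but not how boundary values were handled.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2 from weight (kg) and height (m)."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Three BMI groups: <25 normal, 25-30 overweight, >30 obese.

    Assumption: a value of exactly 30 is treated as overweight.
    """
    if value < 25:
        return "normal"
    if value <= 30:
        return "overweight"
    return "obese"


def height_tertile(height_cm: float) -> int:
    """Height tertile: 1 for <173 cm, 2 for 173-180 cm, 3 for >180 cm.

    Assumption: 173 cm and 180 cm both fall in the middle tertile.
    """
    if height_cm < 173:
        return 1
    if height_cm <= 180:
        return 2
    return 3


# Hypothetical subject: 95 kg, 1.75 m tall.
example = bmi_category(bmi(95.0, 1.75))
```

Coding the categories in one place, as here, is one way to keep a pooled analysis consistent when each cohort supplies raw weight and height in the agreed units.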
We agreed on a common protocol prior to data collection based on data availability in the studies. Each study was responsible for sending the data in a format as described in the protocol to facilitate data harmonization. We agreed on collecting as detailed information as possible without having to exclude any study due to lack of covariate information (that is, we aimed for the least common denominator for the variables of interest). Inconsistencies or clarifications were handled by a dialogue between the data coordinating center and the individual studies. All studies have published analysis on these variables earlier and details on quality checks can be found in study-specific publications. All statistical analyses were conducted centrally.

SNP selection and genotyping
We selected 39 SNPs based on the literature for prostate cancer GWAS (  22). For rs12418451, we used genotypes from either rs12418451 or rs10896438 (r^2 = 0.964 in HapMap CEU population) and for rs2928679 we used genotypes from either rs2928679 or rs13264338 (r^2 = 0.966 in HapMap CEU population). We did not have genotype data on rs4961199, rs16901979 and rs16902094 for MCCS.
Genotyping was performed using the TaqMan assay (Applied Biosystems, Foster City, CA) in five different genotyping laboratories: Core Genotyping Facility at National Cancer Institute, Harvard School of Public Health, University of Southern California, DKFZ and UK Cancer Research. Blinded duplicated samples indicated no genotyping error. For each autosomal SNP, we tested for Hardy-Weinberg equilibrium (HWE) in the controls in each study separately. All autosomal SNPs were in HWE (P > 0.01).
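A Hardy-Weinberg check of the kind described above can be sketched as a goodness-of-fit test on genotype counts in controls. The text does not say which HWE test was used, so this sketch assumes a Pearson chi-square test with one degree of freedom; the function name and the genotype counts are hypothetical.

```python
from math import erfc, sqrt


def hwe_chi2_test(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Pearson chi-square test of Hardy-Weinberg equilibrium (1 d.f.).

    Takes counts of the three genotypes and returns (statistic, p_value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 d.f.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical control genotype counts for one SNP in one study.
stat, p_value = hwe_chi2_test(640, 320, 40)
in_hwe = p_value > 0.01   # threshold used in the text
```

In practice an exact test is often preferred when the minor allele is rare, because the chi-square approximation becomes unreliable with small expected counts.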

Statistical methods
We tested the association between prostate cancer risk and each SNP with a likelihood ratio test based on unconditional logistic regression. We adjusted all analyses for study and age at diagnosis or selection as a control in five year intervals using indicator variables. All odds ratios are calculated per copy of the minor alleles (0,1,2) carried. For each SNP, we used Cochran's Q statistic to test for heterogeneity between studies.
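The heterogeneity test named above can be illustrated with the standard inverse-variance form of Cochran's Q, computed from study-specific log odds ratios and their standard errors. This is a minimal sketch; the function name and the input values are hypothetical and are not taken from the study. Under the null hypothesis of no heterogeneity, Q is compared with a chi-square distribution with k - 1 degrees of freedom, where k is the number of studies.

```python
from math import exp, log


def cochran_q(log_ors: list[float], ses: list[float]) -> tuple[float, int, float]:
    """Cochran's Q for between-study heterogeneity.

    Returns (Q, degrees of freedom, pooled fixed-effect log odds ratio).
    """
    if len(log_ors) != len(ses) or len(log_ors) < 2:
        raise ValueError("need matching estimates and SEs from >= 2 studies")
    weights = [1.0 / se ** 2 for se in ses]          # inverse-variance weights
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    return q, len(log_ors) - 1, pooled


# Hypothetical per-allele odds ratios and standard errors from three studies.
study_ors = [1.10, 1.20, 1.15]
study_ses = [0.05, 0.06, 0.04]
q, df, pooled_log_or = cochran_q([log(x) for x in study_ors], study_ses)
pooled_or = exp(pooled_log_or)
```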
To estimate odds ratios for high-grade or low-grade disease, we performed multinomial regression with an outcome variable coded as 0 (control), 1 (low-grade) or 2 (high-grade). To test for differential SNP associations between low-grade and high-grade disease, we used a likelihood ratio test based on case-only analysis. We repeated this analysis for high-stage/low-stage disease.
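One way to write down the models in this paragraph is shown below. This is a sketch under stated assumptions rather than the authors' exact specification: $G$ denotes the additive genotype code (0, 1, 2), $Z$ the adjustment covariates (study and age group), and $Y$ the outcome coded 0, 1 or 2 as in the text; the symbols $\alpha_k$, $\beta_k$ and $\gamma_k$ are notation introduced here.

```latex
\begin{align}
\log \frac{\Pr(Y = k \mid G, Z)}{\Pr(Y = 0 \mid G, Z)}
  &= \alpha_k + \beta_k G + \gamma_k^{\top} Z,
  \qquad k \in \{1, 2\}, \\
\operatorname{logit} \Pr\bigl(Y = 2 \mid Y \in \{1, 2\}, G, Z\bigr)
  &= (\alpha_2 - \alpha_1) + (\beta_2 - \beta_1)\, G
     + (\gamma_2 - \gamma_1)^{\top} Z .
\end{align}
```

The second line follows from the first by subtracting the $k = 1$ equation from the $k = 2$ equation. It shows why a logistic regression restricted to cases, comparing high-grade with low-grade disease, tests the hypothesis $\beta_1 = \beta_2$, that is, whether the SNP association differs by grade.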
We tested for interaction between SNPs and non-genetic factors by conducting a one degree-of-freedom likelihood ratio test of a single interaction term (SNPxE) as implemented in an unconditional logistic regression. When an environmental factor had more than two categories (as is the case for smoking, BMI and height), we used ordinal coding for the interaction term. To explore whether associations with proposed environmental risk factors may have been masked by effect heterogeneity, we performed a joint (2 d.f.) test of the environmental main effect and the interaction effect. This test has been shown to outperform the standard marginal test when the environmental effect is restricted to a genetic stratum [21]. Cohorts with no variability in exposure (such as ATBC and smoking) were excluded from gene-environment interaction analyses. We tested for pair-wise SNP-SNP interactions using a one degree-of-freedom likelihood ratio test of a single interaction term as described for the SNP-environment interaction tests.
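The likelihood ratio tests described above compare the maximized log-likelihoods of two nested logistic models. The sketch below shows only the final step, turning two log-likelihoods into a P-value for the 1 d.f. interaction test and the 2 d.f. joint test; fitting the models themselves is left to a statistics package. The function names and the log-likelihood values are hypothetical.

```python
from math import erfc, exp, sqrt


def lrt_statistic(loglik_null: float, loglik_full: float) -> float:
    """Likelihood ratio statistic 2 * (logL_full - logL_null), floored at 0."""
    return max(2.0 * (loglik_full - loglik_null), 0.0)


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable (1 or 2 d.f. only)."""
    if stat < 0:
        raise ValueError("statistic must be non-negative")
    if df == 1:
        return erfc(sqrt(stat / 2.0))   # closed form for 1 d.f.
    if df == 2:
        return exp(-stat / 2.0)         # closed form for 2 d.f.
    raise ValueError("this sketch supports only 1 or 2 degrees of freedom")


# Hypothetical maximized log-likelihoods from three nested models:
#   base: covariates + SNP; main: base + E; full: main + SNPxE
loglik_base, loglik_main, loglik_full = -14210.4, -14209.1, -14207.3

p_interaction = chi2_sf(lrt_statistic(loglik_main, loglik_full), df=1)
p_joint = chi2_sf(lrt_statistic(loglik_base, loglik_full), df=2)
```

The joint test compares the full model against the model without any environmental term, so it captures both the main effect of the exposure and its interaction with the SNP in one statistic.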
We tested for dominance deviation from an additive model by including an additional SNP covariate coded as (0,1,0) for (homozygote common allele, heterozygote, homozygote rare allele) respectively. Based on unconditional regression, we performed a one degree-of-freedom likelihood ratio test where the full model was tested against a model only including the SNP covariate with additive coding (0,1,2) as described above. All reported P values are two-sided and uncorrected for multiple hypothesis testing. Analyses were conducted in R [43] and SAS version 9.1.

Figure S1. Study-specific SNP associations with prostate cancer risk. For rs4961199, rs16901979 and rs16902094 we did not have genotype data from MCCS.