An Analysis of Growth, Differentiation and Apoptosis Genes with Risk of Renal Cancer

We conducted a case-control study of renal cancer (987 cases and 1298 controls) in Central and Eastern Europe and analyzed genomic DNA for 319 tagging single-nucleotide polymorphisms (SNPs) in 21 genes involved in cellular growth, differentiation and apoptosis using an Illumina Oligo Pool All (OPA). A haplotype-based method (sliding window analysis of consecutive SNPs) was used to identify chromosome regions of interest that remained significant at a false discovery rate of 10%. Subsequently, risk estimates were generated for regions with a high level of signal and for individual SNPs by unconditional logistic regression adjusting for age, gender and study center. Three regions containing genes associated with renal cancer were identified: caspase 1/5/4/12 (CASP1/5/4/12), epidermal growth factor receptor (EGFR), and insulin-like growth factor binding protein-3 (IGFBP3). We observed that individuals with the CASP1/5/4/12 haplotype GGGCTCAGT (spanning the area upstream of CASP1 through exon 2 of CASP5) were at higher risk of renal cancer compared to individuals with the most common haplotype (OR: 1.40, 95% CI: 1.10–1.78, p-value = 0.007). Analysis of EGFR revealed three strong signals within intron 1, particularly a region centered around rs759158 with a global p = 0.006 (GGG: OR: 1.26, 95% CI: 1.04–1.53 and ATG: OR: 1.55, 95% CI: 1.14–2.11). A region in IGFBP3 was also associated with increased risk (global p = 0.04). In addition, the number of statistically significant (p-value < 0.05) SNP associations observed within these three genes was higher than would be expected by chance on a gene level. To our knowledge, this is the first study to evaluate these genes in relation to renal cancer, and there is a need to replicate and extend our findings. The specific regions associated with risk may have particular relevance for gene function and/or carcinogenesis.
In conclusion, our evaluation has identified common genetic variants in CASP1, CASP5, EGFR, and IGFBP3 that could be associated with renal cancer risk.


Introduction
Renal cancer is among the most commonly diagnosed cancers in men and women in the United States [1] and Eastern Europe [2]. The incidence of renal cell carcinoma (RCC), the most common malignancy of renal cancer, has increased rapidly worldwide over the past few decades [3,4] with some of the highest rates occurring in Central and Eastern Europe [2,5]. Only a few well-established lifestyle risk factors have been identified: cigarette smoking, obesity, hypertension and diabetes [6]. An increased risk observed among those with a family history of renal cancer and the identification of inherited forms of kidney cancer provide justification for evaluating the genetic susceptibility of this disease, which has not been fully investigated [6].
The mechanism by which a normal cell progresses to carcinoma customarily involves the disruption of critical molecular pathways in cellular growth, differentiation, and development [7]. Among the steps required for tumor cell growth and survival are the amplification of signals from growth factors and the interruption of signals promoting cell death or apoptosis [8,9]. Alterations in genes involved in such pathways are thus likely to contribute to cancer risk. Based on this logic, we identified genes involved in cell growth and differentiation (AKR1C3, EGF, EGFR, IGFBP3, IGFBP5, PPARG, TGFA, VCAM1, and VEGF) and apoptosis (CASP1, CASP2, CASP3, CASP4, CASP5, CASP6, CASP7, CASP8, CASP9, CASP10, CASP12, and CASP14; Table 1). Several of these genes have been associated with risk of cancer at other sites [10,11]; however, the role of these genes in the development of renal cancer remains unknown.
Given the importance of these pathways in carcinogenesis and the lack of studies evaluating genetic susceptibility and renal cancer, we evaluated whether polymorphisms in these 21 genes could alter the risk for developing renal cancer in a large multicenter case-control study based in Central and Eastern Europe. We hypothesized that common variation in genes involved in cellular growth, differentiation and apoptosis may increase genetic susceptibility to renal cancer.

Study Population
The Central and Eastern European Renal Cancer (CEERC) Study is a hospital-based case-control study of renal cancer (1,097 cases and 1,555 controls) that was conducted in seven centers in Eastern and Central Europe (Moscow, Russia; Bucharest, Romania; Lodz, Poland; and Prague, Olomouc, Ceske Budejovice and Brno, Czech Republic). Details of the study have been described previously [12]. Newly diagnosed and histologically confirmed cases of renal cancer (ICD-O-2 code C64) between the ages of 20 and 79 years were recruited from August 1999 through January 2003. Trained medical staff reviewed medical records and extracted information on date and method of diagnosis, histological classification, tumor location, stage and grade. Pathology data were available for 917 cases. RCC was defined as the following subtypes: clear cell, clear cell with papillary features, clear cell with sarcomatoid, papillary type I, papillary non-type I, papillary type II, chromophobe and hybrid subtype (n = 848). Clear cell renal cancer was defined as the first three clear cell subtypes (n = 760). Eligible controls were chosen from among patients admitted to the same hospital as cases for conditions unrelated to smoking or genitourinary disorders (except for benign prostatic hyperplasia) and were frequency-matched to cases on age (within 3 years), sex, and study center. Among controls, the disease conditions associated with hospitalization were the following: obstetric or perinatal (0.1%), infectious (1%), psychiatric (1%), endocrine (2%), hematologic (3%), dermatologic (3%), injury or poisoning (3%), genitourinary (benign prostatic hyperplasia) (4%), pulmonary (4%), orthopedic or rheumatologic (9%), cardiovascular (10%), neurologic (11%), ophthalmologic or otologic (14%), gastrointestinal (19%), and other (16%). No single disease made up more than 20% of the control group. A portion of the controls were also recruited for a parallel study of lung cancer.
All recruited cases and controls were Caucasian. Response rates at each center ranged from 90.0 to 98.6% for cases and from 90.3 to 96.1% for controls. Interviews were conducted by trained personnel to collect standardized lifestyle and food frequency questionnaires. Data was collected on demographic characteristics, education, tobacco smoke exposures, alcohol consumption, dietary practices, anthropometry, medical history, family history, and occupational history.
Blood samples were collected, stored at −80°C, and shipped to the National Cancer Institute (NCI). Genomic DNA was extracted from whole blood buffy coat by the standard phenol chloroform method at the NCI laboratory. All subjects in this study provided written informed consent. This study was approved by the institutional review boards (IRB) at the NCI, International Agency for Research on Cancer (IARC), and each participating center.

Replication Study
The US Kidney Cancer Study is a population-based case-control study conducted in Detroit and Chicago. Cases were residents of the study areas, aged 20 to 79 years, who were newly diagnosed with histologically confirmed renal cell carcinoma (ICD-O-2 C64.9) from February 2002 through January 2007. Controls were frequency-matched to cases by study center, race, age, and sex. Controls aged 65 years and older were identified from Medicare files, and those under age 65 years were identified from Division of Motor Vehicle records. African American cases and controls were over-sampled. Written informed consent was obtained from all participants, and IRB approvals were obtained from all participating study centers.
Participants were interviewed by trained interviewers to elicit information on demographic factors, use of tobacco and alcohol, diet, occupational history, height and weight history, family history of cancer, reproductive history among women, medical history, and medication history including the use of diet pills and antihypertensives. A total of 1568 Caucasians (856 cases and 712 controls) and 884 African-Americans (523 cases and 361 controls) were interviewed. Of these subjects, 1109 cases and 1106 controls provided DNA that was extracted using standard procedures. Genotyping data was available for 966 cases and 977 controls with sufficient quality and quantity of DNA. Subjects were predominantly recruited from the Detroit area (84%), and were similar in age (76% >50 years) and sex (57% male) to those in the CEERC study.

Statistical Analyses
Of the 987 cases and 1298 controls that had valid study data and provided genomic DNA, analyses were based on the 777 cases and 1035 controls that had adequate quality DNA and were successfully genotyped on the OPA platform in the CEERC study. Associations were evaluated through several methods. Global p-values were evaluated using the minimum p-value permutation test [15]. A haplotype-based method called HaploWalk, conducted in Matlab, was used to identify chromosome regions of interest by examining regional associations rather than effects from an individual SNP. For a gene with K SNPs, the HaploWalk procedure considered a 3-SNP sliding window centered on each SNP from SNP 2 through SNP K-1. To account for multiple testing across the K SNPs, the K-2 p-values (one for each window) were adjusted for multiple comparisons using the False Discovery Rate (FDR)-controlling procedure of Benjamini and Hochberg [16]. Windows that remained significant at an FDR level of 10% were considered to be a candidate region of interest. If adjacent windows were significant, they were amalgamated into a single candidate region of interest. Haplotypes in the candidate block were then reconstructed and effects evaluated using Haplostats (version 1.3.1) in R (version 2.4.1). The most common haplotype was used as the reference group, and haplotypes with frequencies less than 1% were combined into one category for testing. Subsequently, unadjusted and adjusted (age, sex, and study center) odds ratios (OR) and 95% confidence intervals (95% CI) using the log-additive model were generated for regions with a high level of signal.
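The window-selection logic described above can be illustrated with a minimal Python sketch. It is not the HaploWalk implementation (which was run in Matlab and is not reproduced here); it only shows the three steps named in the text: forming 3-SNP windows centered on SNP 2 through SNP K-1, applying the Benjamini-Hochberg step-up rule at an FDR of 10%, and merging adjacent significant windows. The function names and the window p-values are hypothetical and chosen purely for illustration.

```python
from typing import List, Tuple


def three_snp_windows(k: int) -> List[Tuple[int, int, int]]:
    """Return the K-2 windows (1-based SNP indices) centered on SNP 2..K-1."""
    return [(c - 1, c, c + 1) for c in range(2, k)]


def benjamini_hochberg(p_values: List[float], q: float = 0.10) -> List[bool]:
    """Flag each p-value as rejected or not under the BH step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank r (1-based) such that p_(r) <= (r / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            reject[idx] = True
    return reject


def merge_adjacent(wins: List[Tuple[int, int, int]],
                   reject: List[bool]) -> List[Tuple[int, int]]:
    """Merge runs of adjacent significant windows into (first_snp, last_snp)."""
    regions: List[Tuple[int, int]] = []
    previous_significant = False
    for (first, _center, last), significant in zip(wins, reject):
        if significant and previous_significant:
            regions[-1] = (regions[-1][0], last)  # extend the current region
        elif significant:
            regions.append((first, last))         # start a new region
        previous_significant = significant
    return regions


# Hypothetical per-window p-values for a gene with K = 8 SNPs (6 windows).
window_p = [0.40, 0.004, 0.012, 0.30, 0.75, 0.015]
wins = three_snp_windows(8)
flags = benjamini_hochberg(window_p, q=0.10)
regions = merge_adjacent(wins, flags)
print(regions)
```

In this toy input, two neighbouring windows pass the FDR threshold and are reported as one candidate region, while an isolated significant window forms a region of its own.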
Associations between individual SNPs and risk of renal cancer were estimated by unconditional logistic regression, adjusted for age, sex, and study center. Genotypes were evaluated by coding the homozygous common allele genotype as the referent group and separately comparing the heterozygous and homozygous rare allele genotypes to the referent group. Linear tests for trend were conducted by including a variable coded 0, 1, and 2 corresponding to the number of rare alleles. Associations for SNPs were considered robust if they were significant (based on the p-value of the test for trend) at an FDR level of 20% or less. A more liberal FDR level was chosen at this stage of analysis in order to guide us toward SNPs that may be of interest within previously identified regions of interest. FDR adjustment was based on the number of SNPs within each gene region. Additional adjustment for potential confounders (body mass index [BMI], self-reported hypertension, and smoking) did not result in meaningful changes of the risk estimates, and these variables were not included in the analyses. In addition, we investigated multiplicative interaction between individual SNPs and age, sex, and BMI, using the likelihood ratio test to compare the fit of models with and without interaction terms. Heterogeneity of genotype frequencies among countries was evaluated by using the likelihood ratio test to compare the fit of models with and without interaction terms, but we did not find any evidence of heterogeneity. Analyses were conducted using SAS version 9.1 (SAS Institute, Cary, NC).
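As a rough illustration of the genotype coding described above, the following Python sketch computes crude (unadjusted) odds ratios for the heterozygous and homozygous rare genotypes against the homozygous common referent, and a Cochran-Armitage trend test with scores 0, 1, and 2. This is a simplified stand-in, not the adjusted logistic regression fitted in SAS for the study: it ignores age, sex, and study center. The genotype counts are hypothetical; only their totals (777 cases, 1035 controls) are taken from the text.

```python
import math
from typing import Sequence, Tuple


def crude_or(case_exp: int, ctrl_exp: int,
             case_ref: int, ctrl_ref: int) -> Tuple[float, float, float]:
    """Crude odds ratio with a Woolf (log-based) 95% confidence interval."""
    odds_ratio = (case_exp * ctrl_ref) / (ctrl_exp * case_ref)
    se_log = math.sqrt(1 / case_exp + 1 / ctrl_exp + 1 / case_ref + 1 / ctrl_ref)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log)
    return odds_ratio, lower, upper


def trend_test(cases: Sequence[int], controls: Sequence[int]) -> Tuple[float, float]:
    """Cochran-Armitage trend test; scores 0, 1, 2 = number of rare alleles."""
    scores = (0, 1, 2)
    n_case, n_ctrl = sum(cases), sum(controls)
    n = n_case + n_ctrl
    totals = [a + b for a, b in zip(cases, controls)]
    statistic = sum(s * (a * n_ctrl - b * n_case)
                    for s, a, b in zip(scores, cases, controls))
    variance = (n_case * n_ctrl / n) * (
        sum(s * s * t * (n - t) for s, t in zip(scores, totals))
        - 2 * sum(scores[i] * scores[j] * totals[i] * totals[j]
                  for i in range(3) for j in range(i + 1, 3))
    )
    z = statistic / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


# Hypothetical genotype counts: (homozygous common, heterozygous, homozygous rare).
case_counts = (300, 350, 127)     # sums to 777
control_counts = (450, 440, 145)  # sums to 1035

or_het = crude_or(case_counts[1], control_counts[1], case_counts[0], control_counts[0])
or_hom = crude_or(case_counts[2], control_counts[2], case_counts[0], control_counts[0])
z_trend, p_trend = trend_test(case_counts, control_counts)
print(or_het, or_hom, z_trend, p_trend)
```

The trend test plays the same role as the 0/1/2 variable in the regression model: it asks whether risk changes monotonically with the number of rare alleles rather than testing each genotype separately.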

Results
A large proportion of the study population was from the Czech Republic, with a slightly higher proportion among cases (Table 2). Controls were more likely to be male, but were similar to cases in age distribution. Cases were more likely than controls to have higher BMI, have a family history of cancer, and report hypertension.
Results from global gene-based tests of association are included in Table 1. Among results from the minimum p-value test, CASP1/5/4/12, CASP14, and IGFBP3 were the most promising gene regions, but were not significant after adjustment for multiplicity (total number of SNPs) over the entire gene (Table 1). However, CASP1/5/4/12, EGFR, and IGFBP3 had a larger number of significant SNPs (p-value for trend <0.05) than one would expect to see by chance. In addition, with a haplotype-based sliding window method, we identified the same genes with regions that were associated with renal cancer risk at an FDR level <10%: CASP1/5/4/12, EGFR, IGFBP3, and VCAM1 (Supplementary Figures S1, S2, S3).
An interesting region was detected that spans the area upstream of CASP1 through exon 2 of CASP5 (Supplementary Figure S1). At this region, individuals with a specific variant haplotype GGGCTCAGT (OR: 1.40, 95% CI: 1.10-1.78) had a 1.4-fold higher risk of renal cancer compared to those with the most common haplotype (Table 3). Concordant with the haplotype analysis, several individual variants within this haplotype also had nominally statistically significant associations with renal cancer risk (Table 4). After applying FDR adjustment, four CASP1 and CASP5 SNPs (rs1785883, rs568910, rs492859 and rs507879) were considered significant at an FDR level <20%. The strongest association among individual SNPs was for rs507879 (Thr90Ala), located in exon 2 of CASP5. The ORs (95% CI) for heterozygote and homozygote rare genotypes compared to the homozygote common genotype were 1.29 (1.03-1.60) and 1.39 (1.07-1.82), respectively (p-value for trend = 0.01). The OR and p-value of the specific variant haplotype were stronger than the associations (p-value for trend) observed for any of the individual SNPs in this region, suggesting that the causal variant within this haplotype may not have been genotyped.
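The relationship between a reported odds ratio, its 95% confidence interval, and a Wald p-value can be checked with a short Python sketch. Using the haplotype estimate quoted above (OR 1.40, 95% CI 1.10-1.78), the standard error of the log odds ratio is recovered from the width of the interval on the log scale. This is a back-of-the-envelope approximation under a normality assumption and with rounded inputs, so it should only roughly agree with the p-value of 0.007 given in the abstract; the function name is hypothetical.

```python
import math
from typing import Tuple


def wald_from_ci(odds_ratio: float, lower: float, upper: float,
                 z_crit: float = 1.96) -> Tuple[float, float, float]:
    """Recover SE(log OR), the Wald z statistic, and a two-sided p-value."""
    se_log_or = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se_log_or
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return se_log_or, z, p_two_sided


# Haplotype GGGCTCAGT versus the most common haplotype, as reported in the text.
se, z, p = wald_from_ci(1.40, 1.10, 1.78)
print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, p = {p:.4f}")
```

The same calculation applies to any of the odds ratios reported in this section, which makes it a convenient consistency check when only the estimate and interval are available.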
We had the opportunity to conduct a quick replication of our most statistically significant finding, CASP5 SNP rs507879 in the US Kidney Cancer Study population (Table 5). Although results from the US Kidney Cancer Study were not statistically significant, the point estimates were in the same direction as those from the CEERC study. A pooled estimate of 1.22 (95% CI: 1.04-1.42) was observed for those with at least one copy of the rare allele of rs507879 among Caucasian participants. A pooled estimate including both Caucasians and African-Americans from both studies was not noticeably different from the estimate restricted to Caucasians (OR: 1.22, 95% CI: 1.05-1.41; Table 5).
A sliding window analysis over EGFR revealed three signals within intron 1 (Supplementary Figure S2). In particular, two haplotypes centered on rs759158 (region 3) were associated with a higher risk of renal cancer (GGG: OR: 1.26, 95% CI: 1.04-1.53 and ATG: OR: 1.55, 95% CI: 1.14-2.11; Table 3) when compared to the common haplotype. In the second EGFR region, variant haplotype TGA was associated with an increased risk of renal cancer compared to the common haplotype (OR: 1.32, 95% CI: 1.02-1.70). Associations for three of the SNPs within these EGFR haplotypes (rs11238349, rs6954351, and rs7796139) were nominally statistically significant, but with FDR levels <30% (Table 4). The two SNPs rs6954351 and rs7796139 appear to be responsible for the associations in their respective regions; however, these associations do not appear to be entirely independent effects, as the SNPs are moderately correlated (r² = 0.47). We further evaluated the strong signal in EGFR by integrating the second and third regions to form a haplotype spanning seven SNPs in intron 1 (Supplementary Table S2). Among common haplotypes, the effect estimates for haplotypes containing GGG or GTG appear to be consistently above 1.0. It is interesting to note that among common haplotypes in the integrated region, the variant haplotype TGA from the second region is present only with either variant haplotype GGG or GTG, the statistically significant haplotypes from the third region. This suggests that the two sets of haplotypes may be reflecting the same signal. A strong haplotype effect was observed for the variant haplotype TGA-A-GGG, with an OR of 1.84 (95% CI: 1.25-2.71) and a p-value of 0.002. This effect was stronger than those observed for the individual regions and reinforces the idea that these two regions are related. A second variant haplotype in the integrated region was also statistically significant (OR:1.60; 95%
For IGFBP3, a large region across the gene was considered noteworthy using a sliding window analysis (Supplementary Figure S3). Two regions were defined by evaluating linkage disequilibrium across the identified area. The second region, spanning the area from exon 5 to 3′ downstream of IGFBP3, was associated with a global p-value of 0.04. Among haplotypes in this region, variant haplotypes AGC (OR: 1.27, 95% CI: 1.06-1.54) and TAT (OR: 1.62, 95% CI: 1.05-2.51) were associated with increased renal cancer risk (Table 3). Among SNPs in the haplotype, rs6670 was statistically significantly associated with renal cancer risk at an FDR level <20%. We observed a positive association with renal cancer risk among subjects who had at least one copy of the rare allele, with an OR of 1.27 (95% CI: 1.04-1.56). The association for haplotype AGC, which contains the rare allele for rs6670, was slightly stronger than the effect observed for the individual SNP and appears to be driven primarily by rs6670. The causal variant for haplotype TAT, however, is not apparent, suggesting that the causal variant was not genotyped in this study.

A-A-G-T-C-G-A-
In VCAM1, a variant haplotype centered on rs3917010 was also associated with an increased risk of renal cancer (CAT OR: 1.25, 95% CI: 1.01-1.54; Table 3). However, none of the VCAM1 SNPs were significantly associated with renal cancer risk after FDR adjustment. Although a statistically significant association was observed, this association could be spurious as the effects observed for the haplotype are not concordant with the individual SNP associations within this haplotype (Supplementary Table S1).
Results for individual analyses of all SNPs can be found in Supplemental Table S1. No statistically significant interactions between our statistically significant SNPs and potential effect modifiers (age, sex, and BMI) were detected (data not shown). Additional sensitivity analyses restricted to RCC (n = 627 cases) and clear cell RCC (n = 564 cases) did not meaningfully change any of the previously detected associations (data not shown).

Discussion
In this study, we conducted an exploratory analysis of 319 SNPs in or around 21 genes involved in cell growth/differentiation and apoptosis pathways in relation to renal cancer risk. We identified both haplotypes and SNPs in CASP1/5/4/12, EGFR, and IGFBP3 that were statistically significantly associated with risk of renal cancer. Associations between SNPs in the other investigated cell growth/differentiation and apoptosis pathway genes were weak and less promising.
There is strong evidence supporting the biological relevance of genetic variants in EGFR and IGFBP3 to renal cancer risk. EGFR encodes a transmembrane growth factor receptor that plays a critical role in the signal transduction pathway regulating cell proliferation, differentiation, and survival [17,18]. A recent study has proposed an additional role for EGFR in interacting with and stabilizing the sodium/glucose cotransporter 1 (SGLT1), thus helping to maintain intracellular glucose levels in low extracellular glucose environments and prevent cell death from occurring [19]. This is especially relevant to renal cancer, as both EGFR and SGLT1 are expressed in the kidney, where glucose uptake is important [20]. Altered glucose metabolism is one of the major hypotheses thought to explain the association between diabetes and renal cancer. Thus far, most studies have focused on evaluating EGFR in relation to cancer progression and targeted treatment [21,22]. It is interesting to note that the first intron of EGFR (>120 kb) has been implicated as an important regulatory area [21,23]. A highly polymorphic (CA)n repeat in intron 1 of EGFR, about 1.5 kb downstream of exon 1, has been associated with decreased EGFR transcription in multiple studies [24,25]. This microsatellite also appears to be in linkage disequilibrium with several SNPs of unknown function in the promoter region of this gene [26]. One of these variants (rs759171) was also genotyped in this study, but was not associated with renal cancer risk (Supplementary Table S1). In this study, three SNPs (rs11238349, rs6954351, and rs7796139) from intron 1 of EGFR, identified through our initial screen, were statistically significantly associated with risk of renal cancer. Among these three SNPs, only rs6954351 and rs7796139 were moderately correlated (r² = 0.47) with one another. Subsequent analyses suggest that a haplotype that includes these two SNPs may be driving the associations found in this region. The mechanism through which these intronic SNPs (or variants in linkage disequilibrium with these SNPs) might affect renal cancer risk is unknown, but they do reside within a functionally relevant region of EGFR that has been associated with decreased EGFR transcription and protein expression in humans.
Similar to our findings for EGFR, the IGFBP3 regions associated with modified risk appear to be functionally important in cancer. IGFBP3 encodes IGF-binding protein 3, the primary carrier of circulating IGF-1. A reduction in the amount of IGFBP3 available results in an increase in levels of free IGF-1, a factor associated with growth, proliferation, and an elevated risk of several cancers [27,28]. Independent of IGF-1, IGFBP3 has also been shown to affect cell proliferation and apoptosis through its interactions with several signaling pathways [29,30]. In relation to renal cancer, experimental studies have demonstrated that IGFBP3 expression is increased among both clear cell renal tumors and renal cancer cell lines [31,32]. The promoter region of IGFBP3 has also been observed to be frequently hypermethylated in primary renal cell tumors, but unmethylated among normal cells [33]. We observed a statistically significant increase in renal cancer risk with rs6670, located in the 3′ untranslated region (UTR) of IGFBP3. Variants in the 3′UTR could be involved in the stability and expression of mRNA [34]. IGFBP3 variation has been evaluated with several other cancer sites [35], but this is the first study to evaluate SNPs in relation to renal cancer. In association studies, SNPs in IGFBP3 and IGF-related genes (IGF-1 and IGFBP1) have been related to circulating IGF-1 and IGFBP-3 levels [36,37]. IGFBP3 SNP rs6670 (A allele) was not directly associated with IGFBP-3 levels but was weakly associated with a decreasing trend in circulating IGF-1 levels [36]. This is not entirely consistent with the positive association we observed with renal cancer in our study, but suggests that further study is needed to clarify the associations observed.
CASP1, CASP4, CASP5 and CASP12 belong to a caspase subfamily called the inflammatory caspases, which are involved in the maturation of inflammatory cytokines (IL-1β and IL-18) in addition to their role in apoptotic pathways [9,38,39]. Despite their involvement in two key carcinogenic pathways, inflammation and apoptosis, few published reports have evaluated genetic variation in these four caspase genes in relation to cancer. In our study, three CASP1/5/4/12 SNPs (rs568910, rs492859, rs507879) were associated with an increased risk of renal cancer, while one SNP (rs1785883) was associated with a decreased risk. The four SNPs were only weakly correlated with each other (r² < 0.5), except for rs492859 and rs568910, which were strongly correlated (r² = 0.99) within our data. The strongest individual SNP association with renal cancer was observed with rs507879, which is located within exon 2 of CASP5 and results in a missense change with an amino acid substitution (Thr90Ala). The function of this particular exon 2 SNP is unclear, and it is predicted to be benign by PolyPhen. However, a common somatic mutation in exon 2 has also been identified in leukemias and in gastric, colon, and lung cancers, but has not yet been examined in renal tumors [40-43]. A somatic mutation in a mononucleotide repeat (A)10 in exon 2 produces a shift in the reading frame, resulting in a premature stop and a truncated protein. This suggests that this region in CASP5 may be particularly important for carcinogenesis.
To our knowledge, this is the first study to evaluate SNPs in all but two of these growth/differentiation and apoptosis genes in relation to renal cancer. The primary focus so far in the area of renal cancer susceptibility has been on genetic variants in xenobiotic metabolism genes [12,44] and the von Hippel-Lindau (VHL) gene, which leads to an increased risk of the hereditary form of renal cancer [45]. Only three small studies have evaluated variants in PPARG and VEGF in relation to renal cancer. Smith et al. (n = 40 cases) observed that the rare allele of the PPARG P12A polymorphism (rs1801282) was underrepresented among RCC patients compared to controls, with an OR for trend of 0.28 (0.08-1.01) [46]. This finding is consistent with results from our analysis (OR for trend: 0.80; 95% CI: 0.67-0.96), but this SNP was not considered statistically significant after FDR adjustment. Kawai et al. (n = 213 cases) [47] observed a weak association between three VEGF promoter polymorphisms (rs1570360, rs2010963, rs699947) and renal cancer progression and prognosis, and Abe et al. (n = 145 cases) [48] observed a nonsignificant association between three VEGF 3′UTR polymorphisms (C702T, for which the dbSNP identifier number is unknown; rs3025039; and rs10434) and renal cancer risk in Japanese populations. Three of these SNPs were genotyped in our study (rs2010963, rs699947, rs3025039), but only SNP rs699947 demonstrated a weak but nonsignificant association with renal cancer risk. Our analysis of VEGF revealed only one nominally significant SNP in the promoter region (rs833058; Supplemental Table S1), which is correlated with rs699947 (r² = 0.65).
A strength of our study is the large sample size, which provides sufficient statistical power to detect associations between SNPs and renal cancer risk. Hospital-based controls in our study could potentially cause selection bias if carrying specific genetic variants were somehow related to hospitalization or if the controls were somehow not representative of the general population. However, the high participation and response rates among both cases and controls minimize the potential for selection bias. Given the multiple centers and countries in our study, the potential for population stratification exists; however, we found no evidence of heterogeneity. Population stratification may still be present, but the likelihood of this is small among European populations [49]. Although tagSNP selection was not based on resequencing data, the strategy for selecting tagSNPs allowed a more comprehensive analysis of common genetic variation in these genes than the traditional candidate SNP approach. Given the large number of associations investigated, additional examination of statistically significant associations using FDR control helped us to evaluate the potential for chance findings due to multiple testing. Results from the replication conducted within the US Kidney Cancer Study for rs507879 were not statistically significant on their own, but the study (696 cases and 593 controls) was underpowered (40%) to detect an odds ratio of 1.3. Point estimates calculated by pooling data from the two studies may better represent the true association between rs507879 and renal cancer.
In summary, the results from this study suggest that genetic polymorphisms and haplotypes within the CASP1, CASP5, EGFR, and IGFBP3 genes are associated with renal cancer risk. The regions identified in this study appear to have functional relevance in renal and other types of cancer. To our knowledge, this is one of the largest evaluations of genetic susceptibility and renal cancer conducted to date, but there is need to replicate and extend our findings in other populations.

Figure S1. Sliding window results and linkage disequilibrium plot of CASP1/5/4/12 region. SNPs associated or located within a CASP gene are indicated by their respective lines. The haplotype results reported in Table 3 are