Large-Scale Pathway-Based Analysis of Bladder Cancer Genome-Wide Association Data from Five Studies of European Background

Pathway analysis of genome-wide association studies (GWAS) offer a unique opportunity to collectively evaluate genetic variants with effects that are too small to be detected individually. We applied a pathway analysis to a bladder cancer GWAS containing data from 3,532 cases and 5,120 controls of European background (n = 5 studies). Thirteen hundred and ninety-nine pathways were drawn from five publicly available resources (Biocarta, Kegg, NCI-PID, HumanCyc, and Reactome), and we constructed 22 additional candidate pathways previously hypothesized to be related to bladder cancer. In total, 1421 pathways, 5647 genes and ∼90,000 SNPs were included in our study. Logistic regression model adjusting for age, sex, study, DNA source, and smoking status was used to assess the marginal trend effect of SNPs on bladder cancer risk. Two complementary pathway-based methods (gene-set enrichment analysis [GSEA], and adapted rank-truncated product [ARTP]) were used to assess the enrichment of association signals within each pathway. Eighteen pathways were detected by either GSEA or ARTP at P≤0.01. To minimize false positives, we used the I2 statistic to identify SNPs displaying heterogeneous effects across the five studies. After removing these SNPs, seven pathways (‘Aromatic amine metabolism’ [PGSEA = 0.0100, PARTP = 0.0020], ‘NAD biosynthesis’ [PGSEA = 0.0018, PARTP = 0.0086], ‘NAD salvage’ [PARTP = 0.0068], ‘Clathrin derived vesicle budding’ [PARTP = 0.0018], ‘Lysosome vesicle biogenesis’ [PGSEA = 0.0023, PARTP<0.00012], ’Retrograde neurotrophin signaling’ [PGSEA = 0.00840], and ‘Mitotic metaphase/anaphase transition’ [PGSEA = 0.0040]) remained. These pathways seem to belong to three fundamental cellular processes (metabolic detoxification, mitosis, and clathrin-mediated vesicles). Identification of the aromatic amine metabolism pathway provides support for the ability of this approach to identify pathways with established relevance to bladder carcinogenesis.


Introduction
Genome-wide association studies (GWAS) have served as a useful tool to identify common genetic variants associated with various complex traits [1]. As expected, each variant explains a tiny portion of the heritable component of their associated phenotypes [2,3]. Recently, Park and colleagues estimated that some proportion of the 'missing heritability' may reside in additional common low-penetrance susceptibility variants that can be discovered in larger GWAS [4]. In principle, other methods could complement the primary single-locus tests of GWAS in identifying additional susceptibility loci. One such approach is pathway (gene-set) analysis [5,6], which examines whether association signals of a collection of functionally related loci (typically genes) consistently deviate from what is expected by chance. This approach may suggest new candidate susceptibility loci and possibly provide insights into the mechanisms underlying complex traits. Pathway-based analyses have been applied to GWAS of complex diseases, including multiple sclerosis [7], type-2 diabetes [8,9], Crohn's disease [10,11], Parkinson's disease [12,13], colon [14] and breast [15] cancers.
Bladder cancer is the fourth most common malignancy among men in the western world [16]. Epidemiological studies have shown that exposure to aromatic amines (AAs) from tobacco smoking or occupation is strongly associated with bladder cancer risk [16,17,18,19]. Additionally, genetic studies have demonstrated that functional polymorphisms in two genes involved in carcinogen metabolism (N-acetyltransferase 2 [NAT2] and glutathione S-transferase M1 [GSTM1]) are associated with bladder cancer risk [20,21]. Notably, the risk of bladder cancer associated with NAT2 slow acetylation genotype is restricted to smokers [20,22]. Recently, a series of GWAS have identified previously unknown susceptibility loci for bladder cancer, with the prospects of more to be discovered [22,23,24,25]. To identify additional regions that harbor plausible candidate genes and shed further light on genetic basis of this disease, we applied pathway analysis to the first stage of the NCI's CGEMS bladder cancer GWAS containing 3,532 cases and 5,120 controls [22]. We report here seven pathways implicated in diverse carcinogenic processes to be enriched with bladder cancer susceptibility loci.

Study population
We applied our analyses to primary scan data of 591,637 SNPs from NCI's bladder cancer GWAS containing 3,532 cases and 5,120 controls of European ancestry from five studies ( [22].

Pathway data construction
We collected gene-sets from five publicly available pathway resources: BioCarta [26], Kyoto Encyclopedia of Genes and Genomes (KEGG) [27], NCI's Pathway Interaction Database (PID) [28], Reactome [29], and Encyclopedia of Homo sapiens Genes and Metabolism (HumanCyc) [30]. Inclusion criteria of pathways for analysis were those containing 5-100 genes to avoid testing too narrowly-or too broadly-defined functional categories. In addition, we constructed 22 candidate pathways (Table S2) based on known bladder cancer risk factors and general carcinogenic processes [31,32,33] which were not represented in the public databases above. Specifically, selection of genes was determined through 1) biochemical data for the detoxification of aromatic amines [34,35]; 2) Ingenuity pathway lists [36]; and 3) Gene ontology lists [37].
To explore the similarity between pathways in our database, we assessed the percentage of overlapping genes between each two pathways (A and B) as: where N A and N B are the number of genes within pathways A and B. SNPs from the first stage of the NCI bladder cancer GWAS [22] were mapped to genes in these pathways if they were located in a region encompassing 20 kb 59 upstream and 10 kb 39 downstream from the genes' coding regions (NCBI's human genome build 36.3). These gene's boundaries were selected attempting to capture most of the gene's coding and regulatory variants [38] as well as minimizing the overlap between genes. Overall, 1,422 pathways containing 5,647genes (24.3621.7 [mean 6 SD] genes per pathway) and ,92,000 SNPs were included in our database. A complete list of the studied pathways is available in Table S1.

Statistical analysis
SNPs with MAF,1% among controls were excluded from the analysis. We fitted logistic regression models adjusted for age, sex, study center, DNA source (buccal/blood), and smoking status (current/former/never/occasional), to assess the marginal effect of each SNP (1 degree of freedom trend test) on the risk of bladder cancer, as previously described [22]. For each gene G j (j = 1, …, N, where N is the total number of genes in our dataset), the SNP with the lowest p-value among all SNPs that were mapped to its region was selected to represent the gene in the pathway analysis. We used two approaches to test for overrepresentation of association signals within pathways in our database: A. Gene-set enrichment analysis (GSEA; [12]): In this approach, the 2log10 of the p-value of each gene's best SNP was used as the gene's test statistics (r j = 2log10(p j ). Then, a weighted Kolmogorov-Smirnov procedure was used to assess for overrepresentation of gene's statistics Enrichment Score (ES) within each pathway (S) [15].
where, W S~P G jÃ [S r jÃ and N H is the number of genes in a pathway. A. The statistical significance of ES S was empirically evaluated using 10,000 permutations (permuting the genotype data between individuals and keeping the LD between SNPs intact). B. Adaptive Rank-Truncated Product (ARTP; [39]): In this approach the genes' best SNP p-values (p j ) in each pathway were ordered from lowest to highest. Then, the mathematical product was computed for all possible sets of p (j) such that with K, 1#K#L, being all possible integers (the truncation points) between 1 and L, with L being the number of genes in a pathway. In words, W(K) is simply the product of the K smallest P-values in a pathway. Next, we used the minP statistics [40,41] to evaluated what is the K truncation point where the W(K) get the most statistically significant value.
min P~min where ŝ(K j ) be the estimated P-value for W(K j ), K 1 #…#K. B. We then used two-level permutation procedure (10,000 permutations, permuting the genotype data between individuals and keeping the LD structure between SNPs intact) to estimate ŝ(K j ), and to adjust for multiple testing over the different truncation points used.
Using both the GSEA and ARTP methods that employ different approaches to assess the enrichment of gene-based signals within predefined gene-sets may facilitate capturing a broader range of candidate pathways for bladder cancer susceptibility.
Finally, we calculated a false discovery rate (FDR) to assess the proportion of expected false positive findings in the GSEA and ARTP analyses. In short, we normalized the GSEA and ARTP statistics for each pathway (NSs (GSEA) and NSs (ARTP) respectively) based on the mean and standard deviation of the corresponding permutation data [12]. This procedure allows a direct comparison of pathways with different sizes and gene compositions. Then, we used these normalized statistics to calculate the FDR as:

Genetic heterogeneity analysis
To minimize false positives, we estimated the I-squared statistic (I 2 ) [42] to identify SNPs displaying heterogeneous effects across the five studies [ATBC, CPSII, NEBCS (ME, VT), PLCO, and SBCS]. I 2 describes the proportion of total variation in study estimates that is due to heterogeneity. In short, a meta-analysis was applied to every SNP belonging to one of the top pathways using the genotype frequency counts of cases and controls to estimate per-allele OR and CI's. SNPs with I 2 P-values,0.2 were removed from further analyses. We evaluated the OR, CI and p values for both the meta-analysis and they were similar in both models, and did not change the interpretation of the data. These analyses were done using STATA (Version 11, STATA Corporation, College Station, TX).

Results
Overall, there was good correlation between the results of the GSEA and the ARTP methods (r = 0.74, P,0.0001). A detailed examination of the results revealed that, on average, GSEA performed better in detecting pathways enriched with multiple weak association signals while ARTP appeared to be more powerful in detecting pathways where only few genes with relatively strong signals are dominating. Notably, the AA metabolism pathway, which contains several known bladder cancer susceptibility loci, was detected by both GSEA and ARTP methods (P GSEA = 0.0100, P ARTP = 0.0020). Therefore, we used its significance level as a reference for highlighting additional candidate susceptibility pathways. Of the 1421 pathways examined, 18 were significantly enriched with association signals at the P,0.01 level (Table 1). Of these, seven pathways were detected by both GSEA and ARTP, four pathways were detected only by GSEA, and seven were detected only by ARTP. After removing SNPs with heterogeneous effects across the five studies (I 2 Pvalue,0.2), the enrichment signals remained significant (P,0.01) in seven pathways belonging to four cellular processes (''aromatic amine [AA] metabolism'', ''Nicotinamide adenine dinucleotide [NAD] metabolism'', ''Clathrin-mediated vesicles'', and ''Mitosis''). For clarity, from this point forward, we will refer only to the results from the post heterogeneity analysis. Table 2 displays the results for the genes in the AA pathway. The enrichment signals in this pathway were mainly driven by SNPs in the UGT1A9 and NAT2 genes. SNPs in these genes were identified in the primary analysis of this GWAS [22]. Removing these two genes from the pathway analyses reduced the enrichment signal in the AA metabolism pathway in both methods but still ranked it relatively high using the GSEA (P GSEA = 0.0130, P ARTP = 0.1217). Apart from UGT1A9 and NAT2, five additional genes in this pathway had SNPs with significant genetic effect (P trend ,0.05). These included NAT1, UGT1A4, UGT1A6, NQO1 and CYP1B1.

Aromatic amine [AA] metabolism
Some of the genes in the AA metabolism pathway (i.e. CYP1A1 and CYP1A2; UGT1A4, UGT1A6 and UGT1A9; SULT1A1 and SULT1A2) occur on the same chromosomal locus and consequently share similar tagging SNPs. To assess the effect of this redundancy on the pathway enrichment signal, we pooled together genes with overlapping SNPs and treated them as a single genetic unit in our pathway analyses. Consequently, the number of loci included in the AA metabolism pathway was reduced to seven, (Table S2) and the corresponding enrichment signals were strengthened (P GSEA = 0.0046, P ARTP = 0.0001). Even when removing the NAT2 and UGT1A regions from this gene-set, its corresponding enrichment signal remains relatively high (P GSEA = 0.024, P ARTP = 0.0921).

NAD metabolism
Two nicotinamide adenine dinucleotide (NAD) metabolism pathways were detected in this analysis. The ''NAD biogenesis I'' pathway (HumanCyc) was detected by both GSEA and ARTP (P GSEA = 0.0018, P ARTP = 0.0086), and the ''NAD salvage II'' pathway (HumanCyc) was detected only by the ARTP method (P ARTP = 0.0068). Table 3 presents the results for the genes in these pathways. The three NMNAT genes (NMNAT1, NMNAT2, and NMNAT3) that are shared by both of these two pathways harbor SNPs with significant genetic effect (P trend ,0.05) and therefore likely to dominate the significant enrichment signals in these pathways. Other genes displaying significant bladder cancer risk are QPRT in the ''NAD I'' pathway, and ACP6, ITGB1BP3, ACPL2 in the ''NAD II'' pathway.

Vesicle biogenesis and budding
Three pathways involved in clathrin-dependent vesicle biogenesis and budding were detected in this analysis. The ''Lysosome Vesicle Biogenesis'' pathway (Reactome) showed the strongest enrichment signal among all pathways in this study, and was detected by both GSEA and ARTP (P GSEA = 0.0023, P ARTP ,0.0001). The ''Clathrin derived vesicle budding'' pathway (Reactome) was detected only by ARTP (P ARTP = 0.0018), while the ''Retrograde neurotrophin signaling'' pathway (Reactome) was detected only by GSEA (P GSEA = 0.0084). Table 4 displays the Aromatic amine metabolism Self 11 (5); 1 (0.0059); 0.0100 (.0.5) Results of the top ranked pathways (P,0.01) using GSEA and ARTP. In parenthesis are results prior of removal SNPs displaying heterogeneous signals. 1 The number of genes in the pathway. 2 The number of genes underlying the enrichment signal in the pathway. 3 P-value of the enrichment score based on 10,000 permutations. 4 False-discovery rate calculated based on the normalized statistics of the permutation data to account for the variable sizes of genes and pathways. doi:10.1371/journal.pone.0029396.t001 results for the genes in these pathways. Three genes are shared by the three pathways: CLTA and CLTC, which encode for the light and heavy chains of clathrin respectively, and SH3GL2 which is associated with clathrin-mediated endocytosis. The association of SNPs in these three genes with bladder cancer risk ranked them among the top four genes in these pathways.

Mitosis
The ''Mitotic metaphase/anaphase transition'' (Reactome) was detected by the GSEA method (P GSAE = 0.0040) and was marginally significant using ARTP (P ARTP = 0.0187). Interestingly, all eight genes in this pathway are included in the more comprehensive ''Mitotic prometaphase'' pathway that was detected in the initial pathway screening, but had a less significant signal after removing SNPs with heterogeneous signals (Table 1). Results for the eight genes included in the ''Mitotic metaphase/anaphase transition'' pathway are presented in Table 5. Three SNPs in three genes (FBXO5, SMC3 and SPC24) were associated with significant protective effect on bladder cancer (P trend ,0.05). The SNP representing the gene in the pathway analysis after the removal of SNPs with heterogeneous effects. 3 The rank of the SNP among all SNPs in the gene's region based on their p-values. 4 Minor allele frequency among controls. 5 Per allele odds ratios +95% confidence intervals from logistic regression models adjusting for age, sex, study center, DNA source , and smoking. 6 1 d.f. trend test. doi:10.1371/journal.pone.0029396.t003 Table 2. Summary of genes in the aromatic amine metabolism pathway used for pathway-based analysis of multi-study bladder cancer GWAS. Number of SNPs genotyped in the gene region (20 kb 59 upstream and 10 kb 39 downstream from the gene's coding region). 2 The SNP representing the gene in the pathway analysis after the removal of SNPs with heterogeneous effects. 3 The rank of the SNP among all SNPs in the gene's region based on their p-values. 4 Minor allele frequency among controls. 5 Per allele odds ratios +95% confidence intervals from logistic regression models adjusting for age, sex, study center, DNA source , and smoking. 6 1 d.f. trend test. doi:10.1371/journal.pone.0029396.t002
The identification of the AA metabolism pathway in this study by both GSEA and ARTP could be considered a good indication for the utility of this approach, since AA metabolism has established relevance to bladder cancer susceptibility. Interestingly, the enrichment signal in this pathway is driven by variations in the UGT1A gene cluster and the NAT1, NAT2, and NQO1 genes ( Table 1) that are involved in detoxification processes in the AA pathway [34,35]. The strong enrichment signal left in this pathway even after the removal of the UGT1A and NAT2 genes from the analysis indicates that other genetic variations affecting aromatic amines detoxification may contribute to bladder cancer susceptibility.
The detection of the NAD metabolism pathway may be relevant to bladder cancer susceptibility through several carcinogenic mechanisms. First, NAD homeostasis has been shown to play a role in various redox reactions that may lead to irreversible cellular damage and consequently to the initiation of malignant tumor [43]. In addition, NAD has been shown to be involved in DNA repair and telomere maintenances [44] as well as in energy production both of which are important processes in cancer development. Interestingly, NAD metabolism pathway has been implicated in a recent pathway-based analysis of colon cancer GWAS [14]. Colon and bladder cancers have been associated with NAT2 acetylation status. For bladder cancer, in which Nacetylation is a detoxification step, NAT2 slow acetylator phenotype presents a higher risk. In contrast, for heterocyclic amine-related colon cancer in which N-acetylation is negligible and O-acetylation is a carcinogen-activation step, NAT2 rapid acetylator phenotype presents a higher risk [45]. Thus, similar metabolic pathways could play diverse roles in the etiology of these two cancers.
Three clathrin-mediated vesicle pathways are also highlighted in this study. Clathrin-coated vesicles play essential role in intracellular trafficking, endocytosis, and exocytosis [46]. In this realm, it has been shown that clathrin-mediated vesicle pathways regulate the signaling and cellular localization of several growth factors [47] that are known to play a role in cancer susceptibility. Interestingly, clathrin may be also relevant to the Mitotic Metaphase/Anaphase transition pathway that was also implicated in this study. During mitosis, clathrin helps stabilizing the The SNP representing the gene in the pathway analysis after the removal of SNPs with heterogeneous effects. 3 The rank of the SNP among all SNPs in the gene's region based on their p-values. 4 Minor allele frequency among controls. 5 Per allele odds ratios +95% confidence intervals from logistic regression models adjusting for age, sex, study center, DNA source, and smoking. 6 1 d. kinetochore fibers which are required for the proper function of the mitotic spindle [48]. Thus, the overrepresentation of association signals in two distinct pathways associated with mitosis suggest that perturbations in the mitotic process, and particularly those related to the metaphase/anaphase transition, may modify the risk of human bladder cancer. Strengths of our study are the large sample size; the use of primary scan data from five independent studies allowing us to address consistency of effects across the different populations; and the use of two complementary pathway-based methods. A limitation of our study is the lack of pathway-based signals to reach a noteworthy FDR significance level, with only one pathway (Lysosome Vesicle Biogenesis) having an FDR value ,0.2. This could be partially due to the inherent limits of the methods used, the inadequate annotation of relevant pathways in public databases, or due to weak association signals in our data. Recent analysis of bladder cancers using RNA expression data, have also highlighted enrichment of genes with similar processes as we identified in our genomic data here, including metabolic processes, which provide further plausibility that the pathways identified may be relevant to bladder cancer susceptibility [49]. Furthermore, the high rank of the AA metabolism pathway in both GSEA and ARTP support the power of these methods to highlight pathways with established relevance to bladder cancer susceptibility and may therefore similarly suggest the involvement of metabolic detoxification, mitosis and clathrin-mediated pathways in bladder carcinogenesis.  The SNP representing the gene in the pathway analysis after the removal of SNPs with heterogeneous effects. 3 The rank of the SNP among all SNPs in the gene's region based on their pvalues. 4 Minor allele frequency among controls. 5 Per allele odds ratios +95% confidence intervals from logistic regression models adjusting for age, sex, study center, DNA source , and smoking. 6 1 d.f. trend test. doi:10.1371/journal.pone.0029396.t005