Genome-Wide Association Study of Pancreatic Cancer in Japanese Population

Pancreatic cancer has a very poor prognosis and is the fifth leading cause of cancer death in Japan. Previous studies have indicated that genetic factors contribute to the development and progression of pancreatic cancer; however, few reports have described common genetic variants associated with this disease, especially in Asian populations. We conducted a genome-wide association study (GWAS) using 991 invasive pancreatic ductal adenocarcinoma cases and 5,209 controls, and identified three loci showing significant association (P-value < 5×10−7) with susceptibility to pancreatic cancer. The SNPs that showed significant association are located on chromosomes 6p25.3, 12p11.21 and 7q36.2, with estimated odds ratios of 1.29, 1.32 and 3.73 (95% confidence intervals of 1.17–1.43, 1.19–1.47 and 2.24–6.21) and P-values of 3.30×10−7, 3.30×10−7 and 4.41×10−7, respectively. These associated SNPs are located within linkage disequilibrium blocks containing genes that have been implicated in the oncogenesis of pancreatic cancer.


Introduction
Pancreatic cancer is the fifth leading cause of cancer death in Japan, with an estimated 24,634 deaths in 2007. Its 5-year survival rate is as low as 6.7% (http://www.fpcr.or.jp/publication/pdf/statistics2009/fig01.pdf and http://www.fpcr.or.jp/publication/pdf/statistics2009/fig20.pdf). Because no specific symptoms are observed at an early stage, most patients are diagnosed at an advanced stage, when the possibility of cure is very low [1,2].
Previous reports indicated the involvement of both environmental and genetic factors in the etiology of this deleterious disease. Several case-control and cohort epidemiological studies have identified a number of possible risk factors, such as smoking [3], diabetes [4] and chronic pancreatitis [5], which are likely to predispose individuals to the disease. In addition, familial aggregation of the disease has implied the possible involvement of genetic factors in pancreatic cancer [6]; approximately 10% of patients were reported to have a family history, and individuals having first-degree relatives with pancreatic cancer showed a 2- to 4-fold higher risk of the disease [7][8][9]. These data indicate that genetic factors are likely to play a role in the development of pancreatic cancer. In the last decade, advances in molecular biology have improved the understanding of the pathogenesis of pancreatic cancer and characterized a number of genes that are mutated in pancreatic cancers, including INK4A (CDKN2A), TP53, DPC4, BRCA1/2, STK11, APC, KRAS, ATM and PALB2 [10][11][12][13][14][15][16][17][18].
Two recent GWAS of pancreatic cancer in Caucasian populations identified associations with genome-wide significance on chromosomes 9q34.2 (ABO), 13q22.1, 1q32 (NR5A2) and 5p15.33 (CLPTM1L-TERT), and highlighted that the accumulation of these common genetic risk variants with modest effects is likely to play an important role in this complex disease, either individually or in interaction with environmental factors [19][20][21][22]. Because ethnicity is one of the critical factors in the pathogenesis of genetic diseases with complex gene-gene and gene-environment interactions, we (Biobank Japan (BBJ) at The University of Tokyo and the National Cancer Center (NCC) Japan) combined samples from 991 pancreatic cancer cases and 5,209 controls (Table S1) and attempted to identify common genetic variations associated with susceptibility to pancreatic cancer in the Japanese population.

Results
After the standard quality control of the genotype results (Table S2), association analysis was performed for 420,236 SNPs using logistic regression analysis on the basis of allelic, dominant and recessive models after adjustment for age, sex and smoking status for each individual. The Q-Q plot for this GWAS based on allelic P-values by logistic regression revealed no significant population stratification, with a genomic inflation factor λ of 1.026 (Figure 1).
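The genomic inflation factor quoted above is conventionally computed as the median of the observed 1-df chi-square statistics divided by the median expected under the null (about 0.455). The following is a minimal sketch of that convention using only the Python standard library; the function names and the evenly spaced example P-values are illustrative assumptions, not the study's data or code.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def chi2_from_p(p):
    """Convert a two-sided P-value into the equivalent 1-df chi-square statistic."""
    z = NormalDist().inv_cdf(p / 2.0)  # lower-tail quantile; squaring removes the sign
    return z * z


def genomic_inflation(p_values):
    """Lambda = median observed chi-square / median expected under the null."""
    stats = [chi2_from_p(p) for p in p_values]
    return median(stats) / CHI2_1DF_MEDIAN


# Hypothetical input: P-values evenly spaced on (0, 1), as expected with no inflation.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
lam = genomic_inflation(null_p)
print(round(lam, 3))
```

A value close to 1 indicates little inflation of the test statistics; values well above 1 would suggest stratification or other systematic bias.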
We identified three genomic regions, 6p25.3, 12p11.21 and 7q36.2, that were significantly associated (P-value < 5.0×10−7) with increased risk of pancreatic cancer in the Japanese population, as indicated in the Manhattan plot in Figure 2 (see ref. 23).
The most significantly associated SNP, rs9502893 (P-value of 3.30×10−7, per-allele odds ratio (OR) of 1.29 with 95% confidence interval (CI) of 1.17-1.43), is located within a 75-kb linkage disequilibrium (LD) block on chromosome 6p25.3 (Table 1). This LD block includes the FOXQ1 (forkhead box (Fox) Q1) gene, which is located 25 kb upstream of this marker SNP (Figure 3a). Imputation analysis also revealed modest association at SNPs located near or within the FOXQ1 gene, suggesting that it may be one of the causative genes for pancreatic cancer (Figure 3a and Table S3).
The second significantly associated SNP, rs708224, is located in the second intron of the gene BICD1 (Bicaudal-D homolog 1) on chromosome 12p11 (P-value of 3.30×10−7, per-allele OR of 1.32 with 95% CI of 1.19-1.47) (Table 1). The 80-kb LD block showing the association corresponds to the second intron of BICD1, as revealed by the imputation analysis shown in Figure 3b (Table S3).
The third locus is marked by rs6464375, rs7779540, rs6973850 and rs1048768 in the first intron of the DPP6 gene. These SNPs showed suggestive associations only under the recessive model, with a minimum P-value of 4.41×10−7 (OR of 3.73 with 95% CI of 2.24-6.21), as shown in Table 1 and Figure 3c.
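As a rough consistency check on the reported estimates, a Wald-type P-value can be recovered from an OR and its 95% CI: the standard error of log(OR) is the CI width on the log scale divided by 2 × 1.96, and z = log(OR) / SE. The sketch below applies this to the three reported loci. Because the published ORs and CIs are rounded, and the study's P-values come from adjusted logistic regression, the results should only be expected to agree in order of magnitude; the function name is an illustrative assumption.

```python
import math


def p_from_or_ci(odds_ratio, ci_low, ci_high):
    """Approximate two-sided Wald P-value from an OR and its 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.959964)
    z = math.log(odds_ratio) / se
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability


# OR and 95% CI values as reported in the text.
reported = {
    "rs9502893 (6p25.3)": (1.29, 1.17, 1.43),
    "rs708224 (12p11.21)": (1.32, 1.19, 1.47),
    "DPP6 locus (7q36.2, recessive)": (3.73, 2.24, 6.21),
}

approx_p = {name: p_from_or_ci(*values) for name, values in reported.items()}
for name, p in approx_p.items():
    print(f"{name}: approximate P = {p:.2e}")
```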

Discussion
Here we present the results of a GWAS on 991 pancreatic cancer cases and 5,209 controls. Our study represents the first GWAS attempt to identify common variants associated with pancreatic cancer in the Japanese population, and identified SNPs on chromosomal loci 6p25.3, 12p11.21 and 7q36.2 that are significantly associated with increased risk of pancreatic cancer in this population. Common diseases are thought to arise from the accumulation of common genetic variants, each of which has a very modest effect on risk (for example, an OR of <1.2). To detect such small effects, GWAS involving much larger populations (5,000-10,000) would be required. Our study was expected to identify SNPs with moderate effects (i.e., OR > 1.4). Hence, SNPs with very modest effects might not have been identified in this study.
The most significantly associated SNP in this GWAS, rs9502893 (P-value = 3.30×10−7, OR = 1.29), is located within a 75-kb LD block that encompasses the gene FOXQ1 on chromosome 6p25.3. FOXQ1 encodes the protein forkhead box (Fox) Q1. The Fox family of transcription factors consists of at least 43 members, and mutations in Fox genes can have significant effects on common human diseases and cancers [24,25]. A Fox member, FoxM1, is well known to be associated with oncogenesis of pancreatic cancer. Down-regulation of this protein results in the inhibition of migration, invasion and angiogenesis in pancreatic cancer cells [26]. Furthermore, a recent study showed that FoxQ1 is overexpressed in pancreatic cancer, suggesting its role in pancreatic cancer tumorigenesis [27]. Although the SNP that we identified is approximately 25 kb downstream of this gene, the associated SNP may 'tag' a causative variant located in the expression regulatory region of the gene and subsequently alter expression of the gene. However, further study is needed to elucidate the precise biological role and mechanism of the gene function with regard to pancreatic carcinogenesis.
The second most significantly associated SNP, rs708224 (P-value = 3.30×10−7, OR = 1.32), is located within the BICD1 gene. This gene encodes the protein Bicaudal-D homolog 1, which plays a role in vacuolar trafficking. Previous studies reported substantial evidence indicating a link between vacuolar genes and shorter telomeres in a yeast model [28][29][30]. In addition, Mangino et al. suggested that genetic variations within the BICD1 gene could alter its transcriptional levels and in turn influence telomere length in humans [31]. Several recent studies have documented reduced telomere length in pancreatic ductal adenocarcinoma specimens, suggesting telomeric dysfunction in pancreatic cancer cells [32][33][34]. Thus, it is important to determine the functional consequences of rs708224 and/or variations linked to this SNP in the pathogenesis of pancreatic cancer.
Several SNPs located in the first intron of DPP6 showed suggestive associations with an increased risk of pancreatic cancer in this study. DPP6 encodes the protein dipeptidyl-peptidase 6, which binds to specific voltage-gated potassium channels and alters their expression and biophysical properties. A recent study on core signaling pathways in human pancreatic cancers found three somatic mutations in DPP6 among 24 pancreatic cancer samples examined by detailed sequence analyses. This report also suggested that DPP6 might play a crucial role in regulation of invasion of pancreatic cancer cells [35]. Hence, our study strengthens the evidence for a role of DPP6 in pancreatic cancer risk and warrants further screening of this gene to confirm its association with pancreatic cancer.
Recent GWAS reports have indicated several loci on chromosomes 9q34.2, 13q22.1, 1q32.1 and 5p15.33 to be associated with an increased risk of pancreatic cancer in Caucasian populations [21,22]. Among the significantly associated SNPs, rs9543325 on chromosome 13q22.1 showed moderate association in our study populations (P-value (allelic model) of 1.69×10−4; OR of 1.21 with 95% CI of 1.10-1.34) (Table S4). On the other hand, SNPs on chromosomes 9q34.2 (rs505922) and 1q32.1 (rs3790844) showed a weak association in our study populations (P-values of 3.69×10−2 and 1.24×10−2; ORs of 1.11 and 1.14 with 95% CI of 1.01-1.22 and 1.03-1.27, respectively) (Table S4). We were unable to replicate the remaining loci (SHH and two loci on chromosomes 5p15.33 and 15q14) in these reports, probably because most of these associated SNPs are either non-polymorphic or have very low allelic frequencies (MAF = 0.01) in the Japanese population. The power of our study was not sufficient to detect positive associations for these low-frequency variants. Such ethnic differences in the genetic architecture of disease susceptibility are not rare. For example, two recent GWAS reported common variants in the KCNQ1 gene associated with type 2 diabetes mellitus in the Japanese population, but European GWAS were unable to identify the associations because of the low allelic frequency of these variants in that population [36,37]. In addition, identification of susceptibility loci may also be influenced by differences in LD structure across populations and by potential interactions with other genetic variants and environmental factors [38].
In summary, this study represents the first GWAS to identify common variants possibly associated with pancreatic cancer in the Japanese population. Our study confirmed some of the associations reported in the Caucasian GWAS and revealed several novel candidate loci that were not detected in those studies. Nevertheless, further replication is required to confirm or exclude the current findings.

Case and control subjects
A total of 331 and 675 cases that were clinically and/or histologically diagnosed with invasive pancreatic ductal adenocarcinoma were obtained from Biobank Japan (http://biobankjp.org) at the Institute of Medical Science, The University of Tokyo, and from the National Cancer Center Hospital, respectively. The control samples consisted of Japanese volunteers recruited through the Osaka-Midosuji Rotary Club, Osaka, Japan (n = 906) and staff members of Keio University, Japan, who participated in its health-check program (n = 677). In addition, individuals who were registered in Biobank Japan as subjects with various diseases other than cancer (n = 3,728) (pulmonary tuberculosis, chronic hepatitis B, keloid, drug-induced skin rash, peripheral artery disease, arrhythmia, stroke and myocardial infarction) were used as controls. All samples were collected after written informed consent was obtained. This project was approved by the ethics committees of the Institute of Medical Science, The University of Tokyo, the National Cancer Center and Keio University. Individuals who had a clinical history of diabetes mellitus (a possible confounding factor for pancreatic cancer) were excluded from these control sets. For sample quality control, we excluded five cases with a call rate < 0.98. After performing principal component analysis, we excluded outliers (10 cases and 102 controls) who did not belong to the major Japanese cluster (Hondo cluster) (Figure S1) [39]. We eventually performed the association study based on 991 cases and 5,209 controls (Table S1). Power calculation showed that our study would have over 90% power to detect a per-allele OR of 1.4 or greater for an allele with 30% frequency at the genome-wide significance level (α = 5×10−7).
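The sample accounting and the power statement above can be reproduced approximately with a short calculation. The sketch below is not the authors' power method, which the text does not specify. It assumes a two-sided normal-approximation test comparing allele frequencies between cases and controls, and it assumes the 30% frequency refers to controls; the function name is hypothetical.

```python
from statistics import NormalDist


def allelic_power(n_cases, n_controls, control_freq, odds_ratio, alpha):
    """Approximate power of a two-sided allelic test (normal approximation)."""
    norm = NormalDist()
    # Allele frequency in cases implied by the per-allele odds ratio.
    odds = control_freq / (1 - control_freq) * odds_ratio
    case_freq = odds / (1 + odds)
    # Each individual contributes two alleles.
    variance = (case_freq * (1 - case_freq) / (2 * n_cases)
                + control_freq * (1 - control_freq) / (2 * n_controls))
    z_effect = (case_freq - control_freq) / variance ** 0.5
    z_crit = -norm.inv_cdf(alpha / 2)
    return norm.cdf(z_effect - z_crit)


# Sample counts from the text: recruited subjects minus quality-control exclusions.
n_cases = 331 + 675 - 5 - 10
n_controls = 906 + 677 + 3728 - 102

power = allelic_power(n_cases, n_controls, control_freq=0.30,
                      odds_ratio=1.4, alpha=5e-7)
print(n_cases, n_controls, round(power, 2))
```

Under these assumptions the estimate is consistent with the stated power of over 90%, although a different test or frequency definition would shift the exact figure.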

SNP genotyping and quality control
All individuals were genotyped using either the Illumina Infinium HumanHap550v3 or the Illumina Infinium Human610-Quad DNA Analysis Genotyping BeadChip. SNPs common to the two platforms were used for further analysis. We applied the following SNP quality control criteria to all sets of samples: the SNP call rate had to be >0.99 in both cases and controls, and the P-value of the Hardy-Weinberg equilibrium test had to be >1.0×10−6 in controls. SNPs with a minor allele frequency (MAF) of <0.01 in both case and control samples were excluded from further analysis (Table S2).
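The three SNP filters described above can be expressed as a simple predicate. The following is a minimal sketch: the thresholds come from the text, but the function names are hypothetical, and the 1-df chi-square Hardy-Weinberg test is an assumption, since the text does not state which form of the test was used.

```python
import math


def hwe_chi2_p(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square Hardy-Weinberg test from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return math.erfc(math.sqrt(chi2 / 2))  # chi-square (1 df) upper tail


def passes_qc(call_rate_cases, call_rate_controls, hwe_p_controls,
              maf_cases, maf_controls):
    """Apply the call-rate, Hardy-Weinberg and MAF thresholds given in the text."""
    if call_rate_cases <= 0.99 or call_rate_controls <= 0.99:
        return False
    if hwe_p_controls <= 1.0e-6:
        return False
    if maf_cases < 0.01 and maf_controls < 0.01:
        return False
    return True


# Hypothetical SNP whose control genotype counts are in exact equilibrium.
hwe_p = hwe_chi2_p(2500, 5000, 2500)
print(passes_qc(0.995, 0.998, hwe_p, maf_cases=0.30, maf_controls=0.31))
```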

Statistical analysis
We analyzed each SNP using logistic regression adjusted for age (continuous), sex and smoking status (current/former, never). P-values and ORs with 95% CIs were calculated for allelic, dominant and recessive models. We used the minimum P-value obtained from the three models to evaluate the statistical significance of the association. All ORs were reported with respect to the risk allele. All statistical analyses were performed using the R statistical environment version 2.9.0 (http://www.r-project.org/) or PLINK 1.06 (http://pngu.mgh.harvard.edu/purcell/plink/). R version 2.9.0 was used to draw the Q-Q plot and regional association plots.
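The three genetic models differ only in how a genotype is coded as a predictor before regression, and the reported significance is the smallest P-value across them. The sketch below illustrates that coding and selection step under the usual conventions (per-allele count for the allelic model); it does not fit the logistic regression itself, and the function names and example P-values are hypothetical.

```python
def encode(risk_allele_count, model):
    """Code a genotype (0, 1 or 2 copies of the risk allele) for one model."""
    if model == "allelic":
        return risk_allele_count                    # per-allele (additive) coding
    if model == "dominant":
        return 1 if risk_allele_count >= 1 else 0   # carriers versus non-carriers
    if model == "recessive":
        return 1 if risk_allele_count == 2 else 0   # risk homozygotes versus others
    raise ValueError(f"unknown model: {model}")


def best_model(p_by_model):
    """Return the model with the minimum P-value, and that P-value."""
    model = min(p_by_model, key=p_by_model.get)
    return model, p_by_model[model]


# Hypothetical P-values for a single SNP under the three models.
example = {"allelic": 2.0e-4, "dominant": 3.1e-2, "recessive": 4.4e-7}
print([encode(g, "recessive") for g in (0, 1, 2)])
print(best_model(example))
```

Taking the minimum over three correlated tests is more liberal than a single pre-specified test, which is worth bearing in mind when comparing against the significance threshold.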

Genotype Imputation
We performed genotype imputation analysis for each set of samples using a hidden Markov model as implemented in MACH version 1.0 (http://www.sph.umich.edu/csg/abecasis/mach/index.html). To infer untyped and missing genotypes around the candidate chromosomal loci, we provided genotypes from our own samples together with haplotypes for reference samples (Japanese from Tokyo, JPT) from the HapMap database (http://hapmap.ncbi.nlm.nih.gov/). SNPs with a low genotyping rate (<99%), deviation from Hardy-Weinberg equilibrium (P-value < 1.0×10−6) or low MAF (<0.01) were excluded from the analysis. MACH version 1.0 was used to estimate haplotypes, map crossover and error rates using 50 iterations of the Markov chain Monte Carlo algorithm. Using the genotype information from the HapMap database, maximum likelihood genotypes were generated. For quality control, we retained imputed SNPs with an estimated r² of >0.3. We also selected a total of 17 SNPs (P-value < 0.001) to verify the association using Invader and TaqMan genotyping methods (data not shown).