Genetic Association of the KLK4 Locus with Risk of Prostate Cancer

The Kallikrein-related peptidase, KLK4, has been shown to be significantly overexpressed in prostate tumours in numerous studies and is suggested to be a potential biomarker for prostate cancer. KLK4 may also play a role in prostate cancer progression through its involvement in epithelial-mesenchymal transition, a more aggressive phenotype, and metastases to bone. Because genetic variation can affect gene expression and/or various protein characteristics, we sought to investigate the possible role of single nucleotide polymorphisms (SNPs) in the KLK4 gene in prostate cancer. Assessment of 61 SNPs in the KLK4 locus (±10 kb) in approximately 1300 prostate cancer cases and 1300 male controls for associations with prostate cancer risk and/or prostate tumour aggressiveness (Gleason score <7 versus ≥7) revealed 7 SNPs to be associated with a decreased risk of prostate cancer at the Ptrend<0.05 significance level. Three of these SNPs, rs268923, rs56112930 and the HapMap tagSNP rs7248321, are located several kb upstream of KLK4; rs1654551 encodes a non-synonymous serine to alanine substitution at position 22 of the long isoform of the KLK4 protein; and the remaining 3 risk-associated SNPs, rs1701927, rs1090649 and rs806019, are located downstream of KLK4 and are in high linkage disequilibrium with each other (r²≥0.98). Our findings provide suggestive evidence of a role for genetic variation in the KLK4 locus in prostate cancer predisposition.


Introduction
The Kallikrein (KLK) gene family consists of 15 genes in a tightly clustered locus over 320 kilobases (kb) at 19q13.4 [1]. Many of the KLKs display altered expression in disease, in particular hormone-dependent cancers [1,2]. KLK4 is hormone-regulated and is expressed predominantly in the prostate [3,4], and to a lesser extent in other tissues [4,5]. KLK4 has gained support as a potential biomarker for several hormone-dependent cancers [2], and for prostate cancer specifically, in that numerous studies have found KLK4 to be significantly overexpressed in prostate carcinoma tissues compared to benign prostatic hyperplasia [6] and normal tissues [7][8][9][10][11]. Of note, KLK4 is known to be expressed as a variety of isoforms [12], with the full length protein (254 amino acids long) showing the potential to be a better biomarker of prostate tumour cells than the commonly expressed shorter isoform (205 amino acids) [11]. In addition, KLK4 has been proposed to play a role in prostate cancer progression through its involvement in epithelial-mesenchymal transition [13], a more aggressive phenotype, and metastases to bone [14]. KLK4 overexpression has been reported to be associated with prostate cancer stage, although the direction of effect differed for KLK4 mRNA (associated with advanced stage) [6] versus KLK4 protein (early stage tumours) [15].
Approximately 40% of prostate cancer is estimated to have a genetic component (http://www.genome.gov/gwastudies/) [16], and to date single nucleotide polymorphisms (SNPs) in over 40 loci have been identified by genome-wide association studies (GWAS) to be associated with prostate cancer risk [17]. One of these SNPs is located in the KLK locus, downstream of the

SNP Selection and Genotyping
The KLK4 gene region used for SNP selection was chr19:56091420…56115806 (hg18), which encompasses the longest KLK4 isoform ±10 kb. All SNPs in this region were extracted from the National Center for Biotechnology Information (NCBI) dbSNP build 130 [27], CHIP SNPper [28] and the "ParSNPs" database [29] and duplicates removed. SNPs not classified as validated were removed and validated SNPs were further investigated for occurrence in Europeans using SPSmart [30] and 1000 Genomes [23]. Additional SNPs excluded from investigation included all SNPs on the Illumina 550K, 610K and Omni1 genome-wide genotyping chips and SNPs assessed in the Cancer Genetics Markers of Susceptibility (CGEMS) project [31], unless there was evidence of association with prostate cancer by CGEMS (P<0.05). SNPs in high linkage disequilibrium (LD; r² ≥0.80) with these excluded Illumina and CGEMS SNPs were also removed, determined by the SNP Annotation and Proxy Search program (SNAP) version 2.1 [32] using HapMap release 22 (1000 Genomes data was not available at the time of initiation of this study). We then prioritised for genotyping all independent SNPs (r² <0.80) according to SNAP using HapMap release 22 data (N = 74). An additional 8 KLK4 tagSNPs (selected using HapMap data release 24/phase II, Nov 2008, NCBI build 36, dbSNP b126, using the Tagger program within Haploview v4.1 [33]), genotyped as part of a previous study, were also included (N = 82 overall).
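The r² threshold of 0.80 used for excluding and prioritising SNPs can be illustrated with a minimal sketch. It assumes that phased two-SNP haplotype counts are available and that pairwise r² values can be looked up for any SNP pair; the function names, the greedy selection order and the example counts are hypothetical and do not reproduce the SNAP or Tagger algorithms used in the study.

```python
def ld_r2(n_AB, n_Ab, n_aB, n_ab):
    """Pairwise r^2 between two biallelic SNPs from four haplotype counts.

    n_AB is the number of haplotypes carrying allele A at the first SNP
    and allele B at the second SNP, and so on for the other three counts.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total  # frequency of allele A at the first SNP
    p_B = (n_AB + n_aB) / total  # frequency of allele B at the second SNP
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        return 0.0  # at least one SNP is monomorphic, so r^2 is undefined
    d = p_AB - p_A * p_B  # linkage disequilibrium coefficient D
    return d * d / denom


def select_independent(snp_ids, r2_lookup, threshold=0.80):
    """Greedy selection (an assumption): keep a SNP only if its r^2 with
    every SNP already kept is below the threshold."""
    kept = []
    for snp in snp_ids:
        if all(r2_lookup(snp, other) < threshold for other in kept):
            kept.append(snp)
    return kept
```

With this definition, two SNPs whose alleles always travel together give r² = 1, and two SNPs whose alleles are inherited independently give r² = 0.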
SNPs were genotyped using iPLEX Gold assays on the Sequenom MassARRAY platform (Sequenom, San Diego, CA), as described previously [34]. There were 4 negative (H2O) controls per 384-well plate, and quality control parameters included genotype call rates >95%, a combination of cases and controls on each plate, inclusion of 20 duplicate samples per 384-well plate (>5% of samples) with ≥98% concordance between duplicates and Hardy-Weinberg Equilibrium P values >0.05. Of a total of 82 KLK4 SNPs selected for investigation, 11 could not be designed for Sequenom assays, and after application of quality control parameters, 61 SNPs were successfully genotyped. After the study was completed, 1000 Genomes data became available and revealed that 6 KLK4 SNPs not genotyped directly in our study (rs2659108, rs1654556, rs1090648, rs11881373, rs2569531 and rs73598979) were actually tagged by our genotyped SNPs (r² >0.80).
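Two of the quality control parameters above, call rate and Hardy-Weinberg equilibrium, can be sketched as follows. The text does not state which Hardy-Weinberg test was used or in which sample group it was applied, so the 1-degree-of-freedom chi-square goodness-of-fit test here is an assumption, and the function names and default thresholds are illustrative only.

```python
import math


def hwe_chi2_p(n_AA, n_Aa, n_aa):
    """P value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: no deviation can be tested
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(n_AA, n_Aa, n_aa, n_attempted,
              min_call_rate=0.95, min_hwe_p=0.05):
    """Apply the call-rate (>95%) and HWE (P > 0.05) filters to one SNP."""
    called = n_AA + n_Aa + n_aa
    call_rate = called / n_attempted
    return call_rate > min_call_rate and hwe_chi2_p(n_AA, n_Aa, n_aa) > min_hwe_p
```

Duplicate concordance and plate layout checks are not covered by this sketch.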

Statistical Methods
Predictive Analytics Software (PASW) Statistics version 17.0.2 (SPSS Inc., Chicago, IL) was used for all analyses. Genotype and allele frequencies were calculated for the patient and control groups. SNP allele and genotype distributions were compared using χ² tests, and associations with prostate cancer susceptibility and clinical data were assessed under codominant and linear models using logistic regression analysis. Prostate cancer cases with tumour Gleason scores ≥7 were classified as aggressive. All analyses were adjusted for age (as a continuous variable).
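As a rough illustration of a test for trend across genotypes, the sketch below implements the Cochran-Armitage trend test with genotypes scored 0, 1 and 2. This is a substitute technique, not the analysis performed in PASW: it corresponds to the score test for the slope in an unadjusted logistic regression under a linear (per-allele) model, and it does not include the age adjustment used in the study. The function name and the example counts are hypothetical.

```python
import math


def trend_test(case_counts, control_counts, scores=(0, 1, 2)):
    """Cochran-Armitage trend test on genotype counts.

    case_counts and control_counts hold the numbers of individuals with
    0, 1 and 2 copies of the minor allele. Returns (z, two-sided P).
    A negative z means the minor allele is less common in cases.
    """
    totals = [r + s for r, s in zip(case_counts, control_counts)]
    n = sum(totals)
    n_cases = sum(case_counts)
    n_controls = n - n_cases
    sum_rw = sum(w * r for w, r in zip(scores, case_counts))
    sum_nw = sum(w * t for w, t in zip(scores, totals))
    sum_nw2 = sum(w * w * t for w, t in zip(scores, totals))
    spread = n * sum_nw2 - sum_nw ** 2
    if n_cases == 0 or n_controls == 0 or spread == 0:
        raise ValueError("trend test undefined for these counts")
    numerator = n * sum_rw - n_cases * sum_nw
    z = numerator * math.sqrt(n) / math.sqrt(n_cases * n_controls * spread)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    return z, p_value
```

For example, `trend_test((60, 30, 10), (40, 40, 20))` yields a negative z, the direction expected for a minor allele associated with decreased risk.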
Results of bioinformatic prediction of functions of the associated SNPs are provided in Table S2. SNPs rs268923, rs198968, rs1654551, rs1701926, rs1090649 and rs806019 were found to alter transcription factor binding sites as predicted by at least one prediction tool. rs198968 and rs1654551 also lie within promoter histone marks as well as DNAse hypersensitive sites (Table S2), and hence are better candidates for functional follow-up studies.
SNP rs1654551 leads to a serine to alanine amino acid change, but is predicted to be benign using the FASTSNP web server, although the SNP is predicted to affect O-glycosylation. In addition, PsortII prediction (http://urgi.versailles.inra.fr/Tools/PsortII) predicted the serine variant to be only 44.4% extracellularly localised, compared with 55.6% for the alanine variant. This is supported by the SignalP prediction of an altered KLK4 signal peptide sequence for the serine variant. Further, this SNP is also predicted to be involved in differential splicing.

Discussion
We performed a comprehensive investigation of the role of variation in the KLK4 gene in prostate cancer risk and/or tumour aggressiveness by assessing the majority of SNPs that have not been covered by previously performed GWA studies. Our study of approximately 1,300 cases and 1,300 male controls provided suggestive evidence that several KLK4 SNPs may be associated with decreased risk of prostate cancer, and bioinformatic analysis provides evidence that some of these have potential biological relevance in prostate cancer.
None of the nominally risk-associated SNPs were located in known KLK4 hormone response elements [35]. Three SNPs lay several kb upstream of the KLK4 gene. rs7248321 is a tagSNP that has not been previously reported to be associated with prostate cancer risk in any GWAS, including CGEMS, and given the large numbers of samples assessed in previous studies [17], it is likely to be a false-positive result. Bioinformatic analyses of the rare rs56112930 SNP did not reveal any predicted effects on transcription factor binding sites [36,37,38]. SNP rs268923 was predicted by three different transcription factor binding site prediction programs to possibly have an effect [36][37][38][39]. Although each program predicts different transcription factor binding sites to be altered by the SNP, one prostate cancer-relevant example is the predicted gain of an Oct-1 site [36]. Oct-1 is a known co-regulator of the androgen receptor [40], regulates growth of prostate cancer cells and is associated with poor prognosis [41].
The only SNP located in the KLK4 coding region found to be marginally associated with prostate cancer risk was rs1654551. Since splicing of the KLK4 locus is complex and results in several KLK4 mRNA forms being produced [12], there are several possible functional consequences of this substitution. The two protein isoforms identified to date, in order of expression in normal prostate, are an intracellular 205 amino acid (aa) protein which lacks the classical KLK signal peptide and is localised to the nucleus ("short" isoform) [9,11], and a secreted 254 aa protein that is cytoplasmically localised [11,13]. rs1654551 codes for a serine to alanine substitution at amino acid 22 of the long isoform, or is located in the 5′ UTR of the 205 aa KLK4 protein. Although both the short and long isoforms have been found to be overexpressed in prostate cancer cells, the "long" 254 aa KLK4 protein is better able to discriminate between tumour and normal cells [11] and hence may be the more biologically relevant isoform in prostate cancer. Amino acid 22 is located within the signal peptide region of KLK4, which is cleaved off between aa 26 and 27 to result in secretion. The potential functional effects of an amino acid substitution within the signal peptide are unknown. However, a recent study has shown that this cleaved peptide may be a useful target in prostate cancer immunotherapy, with the KLK4 signal peptide successfully inducing and expanding the cytotoxic T lymphocyte response more readily than PSA or Prostatic Acid Phosphatase (PAP) [42]. In addition, in silico analysis using the signal peptide prediction program SignalP [43] predicted a Serine22Alanine substitution to alter the cleavage site from aa 26/27 to aa 21/22. This would result in a KLK4 pro-protein with an additional 5 aa, which could potentially affect localisation or possibly even activation of the KLK4 proenzyme.
Of relevance, a form of PSA has been reported that has an altered signal/pro-peptide and, although the pro-PSA sequence is truncated (not lengthened as is predicted for KLK4), the signal peptide alteration does result in an isoform of PSA that is unable to be activated [44]. This [-2]pro-PSA isoform is now also the basis of a commercially available prostate cancer serum test [45]. Attempting to predict the possible functional effects of the four associated SNPs located downstream of KLK4, rs1701927, rs1701926, rs1090649 and rs806019, is not as clearly directed. It is possible that some or all of these SNPs might alter enhancer/silencer binding sites, affecting expression of KLK4. In silico transcription factor binding site analysis predicts that rs1701926, rs1090649 and rs806019 may alter transcription factor binding sites [36][37][38][39] relevant to prostate cancer. The closest validated gene to the KLK4 3′ end is the Kallikrein family pseudogene KLKP1, thus these SNPs may regulate the activity of this pseudogene or expression of KLKP1 transcripts, which have been shown to be down-regulated in prostate cancer tissues [46]. In addition, one other SNP not genotyped in this study, rs1654556, is in high LD (r² ≥0.80) with these SNPs [23] and is predicted to alter mRNA folding [47] and miRNA binding (Table S2).
To the best of our knowledge, only two other studies have examined the role of KLK4 SNPs in cancer (aside from genome-wide investigations). Recently Klein et al. investigated the effect of common variation in the exons and putative promoter regions of all 15 KLK genes on prostate cancer risk and levels of PSA forms and KLK2 [48]. Five KLK4 SNPs (rs198969, rs198968, rs1654552, rs1654551 and rs1654553) were assessed for association with prostate cancer risk in the Cancer Prostate in Sweden (CAPS) 1 sample set of over 1,400 cases and 700 controls and none were found to be associated. A second study in a small Korean sample set of 117 breast cancer cases and 194 controls found KLK4 SNP rs806019 to be associated with a decreased risk of breast cancer (Odds Ratio 0.53; 95% Confidence Interval 0.33-0.85; P = 0.007) [49], a finding of similar magnitude and direction to that observed in our study of prostate cancer. Part of our study design was to exclude KLK4 SNPs already assessed in GWAS, except for those reported to be associated with prostate cancer at the P<0.05 level in CGEMS. Four SNPs in CGEMS gave evidence of association with prostate cancer: rs17714461, rs8101572, rs8100631 and rs10427094 [31]. All four were genotyped in this study, but none were associated in our sample set. Of note, rs17714461 was recently reported to interact with the GWAS-detected KLK2/3 SNP rs2735839 in CGEMS [50]. The authors mention that this result is notable considering KLK4 and KLK2 collaborate to stimulate cellular proliferation in prostate cancer [51]. rs2735839 genotype data was available for only a small proportion of our samples and hence we did not investigate this interaction further.
Our well-sized study indicates a possible contribution of SNPs in the KLK4 gene to decreased risk of prostate cancer. However, these results should be interpreted cautiously considering the number of tests performed, and validation in much larger sample sets such as those of the PRACTICAL consortium is necessary.

Table S1. rs IDs found to be monomorphic in this study. (DOC)