A genome-wide association study identifies three novel genetic markers for response to tamoxifen: A prospective multicenter study

Purpose Although association studies of genetic variations with the clinical outcomes of breast cancer patients treated with tamoxifen have been reported, genetic factors which could determine individual response to tamoxifen are not fully clarified. We performed a genome-wide association study (GWAS) to identify novel genetic markers for response to tamoxifen. Experimental design We prospectively collected 347 blood samples from patients with hormone receptor-positive and human epidermal growth factor receptor 2-negative, invasive breast cancer receiving preoperative tamoxifen monotherapy for 14 to 28 days. We used Ki-67 response in breast cancer tissues after preoperative short-term tamoxifen therapy as a surrogate marker for response to tamoxifen. We performed GWAS and genotype imputation using 275 patients, and an independent set of 72 patients was used for replication study. Results The combined result of GWAS and the replication study, and subsequent imputation analysis indicated possible association of three loci with Ki-67 response after tamoxifen therapy (rs17198973 on chromosome 4q34.3, rs4577773 on 6q12, and rs7087428 on 10p13, Pcombined = 5.69 x 10−6, 1.64 x 10−5, and 9.77 x 10−6, respectively). When patients were classified into three groups by the scoring system based on the genotypes of the three SNPs, patients with higher scores showed significantly higher after/before ratio of Ki-67 compared to those with lower scores (P = 1.8 x 10−12), suggesting the cumulative effect of the three SNPs. Conclusion We identified three novel loci, which could be associated with clinical response to tamoxifen. These findings provide new insights into personalized hormonal therapy for the patients with breast cancer.


Introduction
The clinical benefit of the antiestrogen agent "tamoxifen" for the treatment of estrogen receptor (ER)-positive breast cancers is well-known [1]. It is reported that five-year tamoxifen therapy could improve the risk of its relapse for 15 years, particularly for ER-positive invasive breast cancers in premenopausal women [2]. However, inter-individual differences in response to tamoxifen therapy have been reported and 30-50% of patients with adjuvant tamoxifen therapy suffer a relapse and die of the disease [3,4]. Many studies have suggested that metabolites of tamoxifen, 4-hydroxytamoxifen and 4-hydroxy-N-desmethyltamoxifen (endoxifen) are the active therapeutic moieties of tamoxifen. These two metabolites have at least 100-fold greater affinity to ER and 30-100 greater potency in inhibiting estrogen-dependent cell growth compared with a parent compound, tamoxifen [5][6][7]. Inter-individual differences in the production of these active metabolites could affect variability in the response to tamoxifen. Cytochrome P450 2D6 (CYP2D6) has been reported to be a key enzyme for the production of the potent active metabolites of tamoxifen, "4-hydroxytamoxifen" and "endoxifen" [8]. Many studies reported that decreased or null-function alleles of CYP2D6 were associated with poor clinical outcome in breast cancer patients treated with tamoxifen [9][10][11][12][13]. However, the results showing the lack of association between CYP2D6 genotypes and tamoxifen efficacy have also been reported [14][15][16][17][18][19], although these studies have been criticized for multiple issues as the cause of false-negative results [20]. To clarify the clinical significance of CYP2D6 allele in tamoxifen therapy, we recently carried out the prospective CYP2D6-tamoxifen study and reported the positive association between CYP2D6 genotype and response to tamoxifen using Ki-67 change in breast cancer tissues after short-term preoperative tamoxifen therapy [21], which has been known as a promising surrogate marker for clinical response to tamoxifen [21][22][23][24].
In addition to CYP2D6-tamoxifen study, many candidate gene approaches have been carried out [12,15,16,18,19,25,26], however, associations of the candidate genes have not yet been sufficiently validated. Hence, it has been suggested that the other genetic factors which could influence the inter-individual differences in responsiveness to tamoxifen should exist, even if the effects of genetic polymorphisms of CYP2D6 were considered. In this study, to identify novel responsible loci for the response to tamoxifen therapy, we carried out a genomewide association study (GWAS) and identified novel three loci associated with Ki-67 change after short-term tamoxifen therapy in the breast cancer patients treated with tamoxifen.

Patients
A total of 347 patients with primary breast cancer (including the 233 patients reported previously [21]) were recruited at Showa University, Nippon Medical School, Tokyo Medical University, Saitama Cancer Center, Hirosaki Municipal Hospital, Sapporo Medical University, Sapporo Breast Surgical Clinic, Nakagami Hospital, Sagara Hospital, Yokohama City University Medical Center, Yokohama Minato Red Cross Hospital, St. Marianna University School of Medicine Hospital, and National University Cancer Institute, Singapore. Of them, 275 patients who were recruited from July 2012 to July 2014 were used for a GWAS analysis, and the remaining 72 patients who were recruited from October 2014 to November 2015 were analyzed in a replication study. All patients were women who were pathologically diagnosed with ER-positive (> 10%), human epidermal growth factor receptor 2 (HER2)-negative, invasive breast cancer without metastasis. ER status was evaluated by immunohistochemistry, and HER2 negativity was defined as < 2+ immunohistochemical staining or 2+ immunohistochemical staining without gene amplification by FISH test as previously described [21]. All patients received 20 mg/day of tamoxifen for 14 to 28 days until the day before the radical operation for breast cancer. Core-needle biopsy samples for diagnosis of the primary tumor were obtained before the first dose of tamoxifen, and tumor tissues after tamoxifen treatment were obtained from surgical specimen. Formalin fixation of tissues and record of Ki-67 labeling index was performed as described previously [21]. International Union Against Cancer TNM classification was used to determine the tumor and nodal status. This study was approved by the Institutional Review Boards of the National Cancer Center (Tokyo, Japan) and each participating institution. Written informed consent was obtained from all patients.

Genotyping and quality control
In this study, genomic DNA was extracted from peripheral blood using a Qiagen DNA extraction kit (Qiagen, Hilden, Germany). For the GWAS analysis, 275 patients with primary breast cancer were genotyped at RIKEN Center for Integrative Medical Sciences using the Illumina HumanOmniExpressExome-8 v1.2 (Illumina, San Diego, USA). We performed single nucleotide polymorphism (SNP) quality control by excluding SNPs that had a low genotyping rate < 98%, showed deviations from Hardy-Weinberg equilibrium (P 1.0 x 10 −6 ) and SNPs with minor allele frequency of < 0.05. 519,335 SNPs in autosomal chromosomes passed the quality control filters. We utilized the identity-by-state method to evaluate cryptic relatedness for the samples included in this study. Additionally, we examined population stratification by principal component analysis (PCA) using the EIGENSTRAT software v2.0 (https://www. hsph.harvard.edu/alkes-price/software/). The PCA was performed by comparing the distribution of the sample populations with three reference populations from the HapMap database that included Europeans (represented by Caucasian from UTAH, CEU), Africans (represented by Yoruba from Ibadan, YRI) and East Asians (represented by Japanese from Tokyo, JPT, and Han Chinese from Beijing, CHB). The top two principal components were utilized to produce a scatter plot for the identification of outliers who did not belong to the Asian cluster. The PCA was performed on the basis of the genotype information from the samples included in this study. The quantile-quantile (Q-Q) plot was generated between observed P values of Kruskal Wallis test against expected P values and revealed no significant population stratification with genomic inflation factor (λ = 1.020) (S1 Fig). In the replication study, 72 were genotyped using the Illumina HumanOmniExpressExome-8 v1.2 (Illumina), TaqMan SNP genotyping assay (Thermo Fisher Scientific, MA, USA), and Sanger sequencing.

Genotype imputation
Genome-wide imputation was conducted separately for the 275 GWAS subjects. The reference panel that we used for imputation was based on the 1000 Genomes Project Phase 3 integrated release version 5 for individuals of East Asian descent comprising Japanese from Tokyo, Chinese from Beijing and Chinese from southern China. In brief, we first compared the allele frequencies in the GWAS and reference panels and exclude SNPs that have allele frequency disconcordance at 0.15. We then phased the haplotypes for the samples using SHAPEIT ver2.0 software. Missing genotypes were imputed with Minimac3 software. We extracted variants with imputation quality (RSQ) threshold of RSQ > 0.9.
For imputation analysis, we performed SNP quality control by excluding SNPs that had a low genotyping rate < 98%, showed deviations from Hardy-Weinberg equilibrium (P 1.0 x 10 −6 ) and SNPs with minor allele frequency of < 0.01.

Statistical analysis
In the GWAS and the replication study, the differences in the Ki-67 labeling index among each genotype were evaluated by the Kruskal-Wallis test, and the Mann-Whitney U test was applied to two genetic models: a dominant-inheritance model, and a recessive-inheritance model. Significance levels after Bonferroni correction for multiple testing of three genetic models were P = 3.21 x 10 −8 [0.05/(519,335x3)] in the GWAS stage and P = 2.03 x 10 −4 [0.05/ (82x3)] in replication analyses. For combination analysis, the genotype count of the replication study was added to that of the GWAS. For the predictive scoring system of Ki-67 response, we assigned a score of 2 to individuals homozygous for the risk allele (allele responsible for tamoxifen resistance) and 0 to individuals with the other genotypes at rs4577773 and rs7087428. We further assigned a scores of 2 and 1 to individuals homozygous and heterozygous for risk allele, respectively, and 0 to those without risk allele at rs17198973, and summed up the scores for each gene to obtain individuals' scores. Based on this system, each patient was classified into any of the three prediction score groups (group 0, 1, ! 2). All the statistical analyses were carried out using R statistical environment version 3.3.1 (http://www.r-project.org/), PLINK version 1.07 [27], or the Ekuseru-Toukei 2015 (Social Survey Research Information Co., Ltd., Tokyo, Japan). Regional association plots were generated using Locus Zoom (http:// locuszoom.sph.umich.edu/).

Patient characteristics
We recruited 347 patients receiving short-term (14-28 days) preoperative tamoxifen monotherapy to evaluate the effect of tamoxifen on change of Ki-67 labeling index in breast cancer tissues. The characteristics of these 347 patients who were pathologically diagnosed to have an ER-positive, HER2-negative, invasive breast cancer were summarized in Table 1. Their median age at the time of surgery was 54 years old (range, 25-85 years). Among the characteristics in Table 1, none of them showed significant association with Ki-67 response after preoperative tamoxifen therapy in logistic regression analysis (S1 Table).

GWAS and replication analysis
We used Ki-67 response after preoperative short-term tamoxifen therapy (after/before ratio of the Ki-67 index; when it is below 1, the proportion of Ki-67-positive cells is decreased) as a surrogate biomarker of tamoxifen efficacy because a change in the expression of Ki-67 after short-term tamoxifen therapy could be significantly associated with clinical response to tamoxifen [21-24]. To identify novel biomarker(s) for predicting response to tamoxifen therapy, we carried out a GWAS of Ki-67 response after tamoxifen therapy of 275 patients with breast cancer who received preoperative tamoxifen monotherapy using HumanOmniExpres-sExome-8 v1.2. The association analysis was carried out for 519,335 SNPs by the Kruskal-Wallis test and Mann-Whitney U test after the standard quality control. We generated a quantile-  quantile plot to inspect possible population stratification effects and obtained the genomic inflation factor (λ GC ) of 1.02, indicating no population substructure (S1 Fig). However, we could not observe the SNP which reach genome-wide significance level (Fig 1). The top 100 markers are displayed in S2 Table. To further validate the results of the GWAS, we performed a replication study using an independent set of 72 hormone receptor-positive breast cancer patients receiving preoperative tamoxifen monotherapy. We genotyped the top 100 SNPs that had the most significant P values in the GWAS (S2 Table). We set significance levels for three genetic models after the Bonferroni correction at P = 2.03 x 10 −4 [0.05/(82x3)] in replication study because 18 SNPs were highly linked (r 2 >0.80) to the other SNPs. Although we could not find any SNPs, which revealed significant levels of association in the replication stage, we identified possibly associated SNPs on a locus, chromosome 4q34 (rs4861477, P min = 3.65 x 10 −2 , Table 2). A combined result of GWAS and the replication studies reveled stronger association of three SNPs than those in GWAS results (rs4861477 on chromosome 4q34.3, P combined = 8.34 x 10 −6 , rs4577773 on chromosome 6q12, P combined = 1.64 x 10 −5 , and rs7893556 on chromosome 10p13, P combined = 2.03 x 10 −5 , Table 2).

Imputation analysis
To identify additional candidate loci associated with Ki-67 response after tamoxifen therapy, we examined associations by using genome-wide imputed genotypes of GWAS samples. Although this analysis identified additional 231 candidate variants (P < 1 x 10 −4 ), we could not observe novel candidate locus associated with Ki-67 response. Of the above three candidate locus identified in GWAS (chromosome 4q34.3, 6q12 and 10p13), imputation analysis identified two novel SNPs on chromosome 4q34.3 and 10p13 which showed greater significance than the marker SNPs (rs17198973 on chromosome 4q34.3; P = 7.26 x 10 −5 , rs7087428 on chromosome 10p13; P = 3.64 x 10 −5 , Table 2, Fig 2). We further performed a replication study for the two SNPs, and found that the associations of the two imputed SNPs were stronger than those of the marker SNPs (rs17198973 on chromosome 4q34.3, P = 1.03 x 10 −2 and rs7087428 on chromosome 10p13, P = 7.68 x 10 −2 , Table 2). A combined results of the GWAS and replication stage revealed stronger association of the two imputed SNPs with Ki-67 response than those of GWAS stage (rs17198973; P combined = 5.69 x 10 −6 , and rs7087428; P combined = 9.77 x 10 −6 , Table 2).

Combination analysis of the three candidate SNPs associated with Ki-67 response after preoperative tamoxifen therapy
We investigated combined effects of the three loci on the Ki-67 response after preoperative tamoxifen therapy using a scoring system because the three SNPs, which showed smallest P values in combined study at each locus (rs17198973, rs4577773 and rs7087428), were independent markers for Ki-67 response when analyzed by multivariate analysis (P < 7.18 x 10 −3 , S1 Table). For the predictive scoring system, each patient was scored considering the number of risk alleles (alleles responsible for increase of after/before ratio of the Ki-67, i.e., tamoxifen resistance alleles) of the three SNP; we gave a score of 2 to individuals homozygous for the risk allele, and 0 to those with other genotypes at rs4577773 and rs7087428 because the recessiveinheritance model revealed a smallest P value at the two SNPs. Furthermore, we gave scores of 2 and 1 to individuals homozygous and heterozygous for risk allele, respectively, and 0 to those without risk allele at rs17198973 because the additive model revealed a smallest P value at this SNP. Individuals' scores were obtained by summing up the scores for each gene. We investigated a combined effect of the three SNPs on the Ki-67 response after preoperative tamoxifen therapy by classifying the 347 patients into 3 groups (0, 1, and !2 prediction score groups) according to the above scoring method (Fig 3). The patients with higher prediction scores indicated significantly higher after/before ratio of the Ki-67 index (more resistant to tamoxifen), suggesting that the predictive scoring system using the above three SNPs could predict the clinical response to tamoxifen (P = 1.80 x 10 −12 , Fig 3).

Discussion
The use of effective drugs such as tamoxifen at the lower cost based on individual germline and/or somatic genetic information is important to reduce the medical cost with maintaining the quality of medical care for patients with ER-positive breast cancer. This study is the first prospective GWAS which attempts to identify genetic variants associated with clinical response to tamoxifen using Ki-67 response as a surrogate marker of its efficacy. Although the genome-wide significant SNPs was not identified in this study, we identified three candidate SNPs showing possible association with response to tamoxifen in three independent loci (Table 2). Moreover, predictive scoring system using the three candidate SNPs showed significantly higher after/before ratio of the Ki-67 index (more resistant to tamoxifen) in patients with higher score groups (higher ratio in score 2 or more >1 > 0 groups, P = 1.80 x 10 −12 , Fig  3). We further investigated the relationship between the prediction score and clinical outcome of breast cancer patients treated with tamoxifen in adjuvant setting, who were used in our previous retrospective study [26], and observed significant association of the prediction score with recurrence-free survival after tamoxifen therapy (Log-rank P = 0.0056, S2 Fig). These lines of evidence suggest the potential to improve the ability of physicians to select optimal hormonal drug based on the result of this predictive scoring system for the treatment of patients with hormone receptor-positive breast cancer. The most strongly associated SNP in this study, rs17198973 (P combined = 5.69 x 10 −6 ) which was identified by genome-wide imputation, is located in the region on chromosome 4q34.3, although no protein-coding gene in this region is known. A GWAS reported that rs17198973 was one of the variants which could be associated with height [28]. Although body height could be regulated by genetic and environmental factors, plasma estrogen (estradiol) level has been reported as one of the factors which could influence the body height [29]. The interindividual variation of estrogen level could affect the response to tamoxifen because this drug acts by competitive antagonism of estrogen at its receptor site [30,31]. Hence, rs17198973 might be one of the marker SNPs for response to tamoxifen, although further validation studies will be needed to prove the true association.
The second strongest association with Ki-67 response after preoperative tamoxifen therapy in this study was observed at rs7087428 (P combined = 9.77 x 10 −6 ) on chromosome 10p13 ( Table 2). rs7087428 located in intron 7 of Ras Suppressor Protein 1 (RSU1). RSU1 is reported to be involved in the Ras signal effects in breast cancer by inhibiting anchorage-independent cancer cell proliferation [32][33][34][35]. Activation of Ras/Raf-1/MAPK is related to tamoxifen resistance through phosphorylation of estrogen receptor alpha [36][37][38]. Ras signal transduction could influence response to tamoxifen, and increase risk of relapse and death after tamoxifen treatment [39][40][41][42]. Therefore, genetic variations in RSU1 might contribute to response to tamoxifen through effects on Ras/Raf-1/MAPK signaling.
Genetic variants of CYP2C19, CYP3A5, UGT2B15 and SULT1A1 have been investigated as candidate genes associated with clinical response to tamoxifen [12,18,25]. In our GWAS, the SNPs in these candidate genes showed no or weak association with Ki-67 response after preoperative tamoxifen therapy (P = 2.12 x 10 −2 -9.96 x 10 −1 ). Moreover, the P values of the SNPs in the ESR1, ESR2 and PGR genes, which encode ER-alpha, ER-beta and progesterone receptor (PR), respectively, were from 4.12 x 10 −2 to 9.96 x 10 −1 , indicating no or weak association [43]. The sample size used in this study might not be enough to detect significant associations of these SNPs because the effect sizes of these candidate genes were not so large according to the previous reports [13,18,44].
In conclusion, our prospective GWAS using 347 patients with breast cancer identified three novel candidate loci, chromosome 4q34.3, 6q12, and 10p13, which were associated with Ki-67 response after preoperative tamoxifen therapy. Moreover, the combined analysis of the three SNP loci revealed that the predictive score based on risk genotypes was significantly associated with Ki-67 response after tamoxifen therapy. Our findings provide new insights into individualization of hormonal therapy for the patients with breast cancer. However, larger replication study and further functional analysis for candidate loci are required to verify these results and to clarify biological mechanisms which have effects on the clinical response to tamoxifen treatment. Patients with higher prediction scores (>2) in this study showed significantly shorter recurrence-free survival after adjuvant tamoxifen therapy compared to those with lower prediction scores (2 or less) in retrospective cohort used in our previous study [26]. (TIF) S1