The Associations between Immunity-Related Genes and Breast Cancer Prognosis in Korean Women

We investigated the role of common genetic variation in immune-related genes on breast cancer disease-free survival (DFS) in Korean women. 107 breast cancer patients of the Seoul Breast Cancer Study (SEBCS) were selected for this study. A total of 2,432 tag single nucleotide polymorphisms (SNPs) in 283 immune-related genes were genotyped with the GoldenGate Oligonucleotide pool assay (OPA). A multivariate Cox-proportional hazard model and polygenic risk score model were used to estimate the effects of SNPs on breast cancer prognosis. Harrell’s C index was calculated to estimate the predictive accuracy of polygenic risk score model. Subsequently, an extended gene set enrichment analysis (GSEA-SNP) was conducted to approximate the biological pathway. In addition, to confirm our results with current evidence, previous studies were systematically reviewed. Sixty-two SNPs were statistically significant at p-value less than 0.05. The most significant SNPs were rs1952438 in SOCS4 gene (hazard ratio (HR) = 11.99, 95% CI = 3.62–39.72, P = 4.84E-05), rs2289278 in TSLP gene (HR = 4.25, 95% CI = 2.10–8.62, P = 5.99E-05) and rs2074724 in HGF gene (HR = 4.63, 95% CI = 2.18–9.87, P = 7.04E-05). In the polygenic risk score model, the HR of women in the 3rd tertile was 6.78 (95% CI = 1.48–31.06) compared to patients in the 1st tertile of polygenic risk score. Harrell’s C index was 0.813 with total patients and 0.924 in 4-fold cross validation. In the pathway analysis, 18 pathways were significantly associated with breast cancer prognosis (P<0.1). The IL-6R, IL-8, IL-10RB, IL-12A, and IL-12B was associated with the prognosis of cancer in data of both our study and a previous study. Therefore, our results suggest that genetic polymorphisms in immune-related genes have relevance to breast cancer prognosis among Korean women.


Introduction
Cancer is a significant health problem in many parts of the worldwide [1,2]. In Korea, the incidence rate of breast cancer was ranked second and the mortality rate fifth in Korean women, which steadily increased from 1983 to 2010 [3]. The etiology and progression of breast cancer is a multiple-step process caused by combining many factors which involve environmental, hormonal and genetic factors [4,5]. We focused on genetic factors involved in immune response which was known to play a role in breast cancer prognosis.
The association of immune markers with breast cancer prognosis were well known and the role as key factor of microenvironment of tumor such as tumor suppressor or growth. For example, high density of CD68 which is high-infiltration of tumor-associated macrophages was related with poorer outcome in node-negative breast cancer [6] and CD44 positive patients showed longer overall survival and progression free survival than CD44 negative patients [7]. In addition, cytokines produced by various immune cells were known to modulate the transition from the innate to the adaptive immune response, the activation of antitumor cells, persistent oxidative stress, and the angiogenesis of breast cancer [8][9][10]. The prognosis of breast cancer was also known to be associated with single nucleotide polymorphisms (SNPs) in the immune system related genes [11][12][13][14]. Those reports described that genetic variants of toll-like receptor 4 (TLR4), interleukin 12 (IL-12), interleukin 2 (IL-2), and interleukin 6 (IL-6) were related with breast cancer prognosis. However, there have been few studies that investigate the association between comprehensive list of variants in the immunity-related genes and the prognosis of breast cancer.
Given the findings that immune system is related with breast cancer prognosis, we hypothesized that many genetic polymorphisms in immune related genes might be prognostic factor of breast cancer recurrence. In this study, the role of common immune genetic variations to the disease free survival (DFS) of breast cancer was investigated with the multivariate Coxproportional hazard model by individual variants, polygenic risk score model, and an extended gene set enrichment analysis. Additionally, a systematic review of previous literature that had reported on the associations between variants of the immunityrelated genes and the prognosis of various cancers was done.

Study population
Among subjects of Seoul Breast Cancer Study (SEBCS), a multicenter based case-control study recruiting between 2001 and 2007, the participants in this study were patients diagnosed with histologically confirmed breast cancer in the Seoul National University Hospital during [2002][2003][2004]. Based on the sample availability and quality of DNA, 140 breast cancer patients were successfully genotyped [15]. Among them, 107 patients were included in the final analysis after excluding patients without survival status or clinical information or been diagnosed as metastatic breast cancer patients.
During recruitment, well-trained interviewers provided patients with informed consent forms and collected information with a structured questionnaire. Through abstracting the medical chart, information on survival status, hormone receptor status, and TNM stage [16] were obtained.
This study design was approved by the Committee on Human Research of Seoul National University Hospital (IRB No. H-0503-144-004).

Genotyping
Among 209 samples met the genotyping criteria (concentration .7.5 ng/ul and total amount of DNA .750 ng), 140 cases were successfully genotyped. 283 immune-related candidate genes were composed of 190 innate immune-related genes in innate immune oligonucleotide pool assay (OPA) chip and 93 adaptive immunerelated genes in Non-Hodgkin's lymphoma (NHL) OPA chip as described in previous study [15,17]. 2,432 Tags SNPs were selected with SNP500 Cancer project database considering the site from 20 kb upstream of the first site of transcription of a candidate gene to 10 kb downstream of the end site of the last exon of the candidate gene and genotyped. Among them, 461 SNPs were excluded from the analysis because of low minor allele frequency (MAF) (,3%) and deviation from Hardy-Weinberg Equilibrium (HWE) (P, 10 24 ). Finally, a total of 1,971 SNPs in 279 immunity genes were selected for the analysis.

Statistical method
A DFS was calculated from the date when patients underwent a breast cancer operation to the date of last follow-up or recurrence, such as loco-regional, distant, contralateral recurrence and death from any causes. If patients had no evidence of recurrence, they were censored at the last follow-up date or on June 30, 2011. The median follow-up time was 4.87 years (range, 0.25-6.72 years).
Demographic data including age (,50 and $50), body mass index (BMI) (,21.4 and $21.4), family history of breast cancer in 1 st and 2 nd relatives (no and yes), educational level (# middle school, high school, and $ college or university), smoking status (never and ever), alcohol consumption (never and ever), and menopausal status (premenopausal and postmenopausal), and clinicopathological data including estrogen receptor status (ER) (positive and negative), progesterone receptor status (PR) (positive and negative), and 7 th AJCC TNM stage (I, II, and III) were assessed for DFS with the log-rank test and univariate Coxproportional hazard model. Multivariate Cox-proportional hazard model adjusted for age, ER status, PR status, and TNM stage (I, II, and III) was used to calculate the hazard ratio (HR) and their 95% CI of the effect for each SNP on the DFS of breast cancer based on additive genetic models. If SNPs were located in the same candidate gene and these SNPs had a linkage disequilibrium (LD) (r 2 .0.4), the most significantly associated SNP were selected. To correct the multiple comparison, false discovery rate (FDR) pvalues were calculated with the Benjamin-Hochberg method [18].
For the polygenic risk score method, the polygenic risk score was calculated by adding the number of risk alleles in each patient based on individual SNP analyses and the patients were categorized into tertiles of polygenic risk score [19]. HR and 95% confidence intervals (CIs) per tertile of polygenic risk score were calculated. After analyzing multivariate Cox-proportional hazard model, Harrell's C index was calculated to evaluate predictive accuracy of polygenic risk score model [20]. In addition, 4-fold cross-validation method was used to appraise the internal validity of our model; the entire data set was randomly partitioned into 4 equal size subsets. Of the 4 subsets, 3 subsets were used as training data, and a remaining single subset was retained as the validation data for testing the model. Significantly associated SNPs with prognosis of breast cancer were firstly estimated in training set and then Harrell's C index was estimated based on those SNPs in validation set. The cross-validation process was then repeated 4 times. The summary of these 4 Harrell's indices was assessed by fixed-effect model meta-analysis.
The GSEA-SNP method was used to reveal the biological function of the SNPs which were significantly related to breast cancer prognosis [21]. Pathway information was obtained from the Molecular Signatures Database (MSigDB) which collected annotated gene sets from the following online databases; BioCarta, KEGG, Pathway Interaction Database, Reactome, SigmaAldrich, Signaling Gateway, Signal Transduction KE, and SuperArray. In addition, gene sets that have been extracted from experimental studies were included in the database. The curated gene sets were downloaded from MSigDB (version 4.0, C2). Because there was a chance of the biological pathway being narrowly defined, each pathway was set up to contain at least three genes in the following analyses. The names of gene sets were described with 'brief description' rather than 'standard name' which is available on the GSEA web (http://www.broadinstitute.org/gsea/index.jsp), because standard name equivocally explained function of gene set.
The statistical significance of the effects was estimated with a pvalue less than 0.05 in both multivariate Cox-proportional hazard model by individual variants and polygenic risk score models and 0.1 in GSEA-SNP. The SAS statistical software package version 9.3, PLINK program version 1.07, and R 2.15.1 packages (GenABEL), STATA statistical software version 12.0 were used for the analyses.

Systematic review
Previous studies conducting analyses to find associations between immunity-related genetic factors and the prognosis of cancer in the epidemiologic field were selected for Jan 2000 through Dec 2013 ( Figure 2). Available studies for systematic review were searched in the PubMed and EMBASE database with a set of keywords that delineated breast cancer as well as other cancers, immune, genetic factors, and survival; cancer AND immune AND polymorphism AND survival. Abstracts were reviewed to identify reports examining associations between immunity-related genetic factors and clinical outcomes including recurrence and death. Literatures were excluded in the following circumstances; review paper, studies unrelated with genomic epidemiology, using SNPs located in non-immune related genes, duplicated in both databases, with no survival or recurrence data reported for survival analysis and no hazard ratios (HRs) reported which were estimated with the Cox-proportional hazard model for the associations of immunity-related genetic factors with cancer outcomes (Figure 2). In cases of duplication between both databases, the studies were deemed to have been searched in the PubMed database. The following data were extracted from each eligible study from the literature; disease site, authors, genes   (Table S1). Table 1 shows the characteristics of the 107 patients including 20 patients who had the events. Among the 107 cases, BMI, PR status, and TNM stage showed a significant association with the prognosis on the DFS of breast cancer (P,0.05, log-rank test), while there were no significant differences in age, family history of breast cancer, educational level, menopausal status, smoking status, alcohol consumption, and ER status.

Results
The associations of immunity-related genetic factors on DFS of breast cancer prognosis are presented in Table 2. Among 1,971 SNPs, 80 SNPs were significantly associated with the DFS of breast cancer. The 62 SNPs were remained after excluding those with high LD (r 2 .0.4) and 3 SNPs were still significant at FDR p-value less than 0.05. The SNPs were rs1952438 in SOCS4 gene (HR = 11.99, 95% CI = 3.62-39.72, P = 4.84E-05), rs2289278 in TSLP gene (HR = 4.25, 95% CI = 2.10-8.62, P = 5.99E-05) and rs2074724 in HGF gene (HR = 4.63, 95% CI = 2.18-9.87, P = 7.04E-05). Figure 1 presents the Kaplan-Meier survival curve and estimated HRs of breast cancer in groups defined by tertile derived from the polygenic risk scores of the 107 patients with all 62 SNPs. The HR was significantly increased as the score increased (p for trend = 0.01). The HR of women in the 3 rd tertile was 6.78 (95% CI = 1.48-31.06) compared to patients in the 1 st tertile of polygenic risk score. Table 3 shows the predictive accuracy and validation results of polygenic risk score model. The Harrell's C index of total patients is 0.813, and summarized Harrell's C index of cross validation is 0.924.
In GSEA-SNP analysis, our results showed that 18 pathways with 62 SNPs in 56 immunity-related genes had significant association with the DFS of breast cancer at a p-value less than 0.1 (Table 4); set 'Myc targets1': targets of c-Myc identified by ChIP on chip in cultured cell lines, focusing on E-box-containing genes; high affinity bound subset (including BCL2 and NBN, P = 0.04), mitochondrial genes; based on literature and sequence annotation resources and converted to Affymetrix HG-U133A probe sets (including BCL2 and NBN, P = 0.04), genes down-regulated in T24 (bladder cancer) cells in response to the photodynamic   Perez et al. [51] Immune-Related Genes and Breast Cancer Prognosis PLOS ONE | www.plosone.org therapy (PDT) stress (including BCL2 and CCL2, P = 0.04), genes transiently induced only by the second pulse of EGF in 184A1 cells (mammary epithelium) (including IRF3, TRAF5, KLK15 and IL5R, P = 0.02). Table 5 showed 30 studies resulted from systematic review for survival analyses estimating effects of immune-related genetic factors on various cancers. In the studies, eighty eight SNPs in 58 immunity genes were significantly associated with the prognosis of cancer patients (Table 6). In those results, there were 29 genes overlapped in both our study and previous studies, but no SNPs overlapped. Among them, IL-6R, IL-8, IL-10RB, IL-12A, and IL-12B was significantly associated with the prognosis of cancer consistent to our finding.

Discussion
In this study, we found that the rs1952438 in the suppressors of cytokine signaling (SOCS4) gene, rs2289278 in the thymic stromal lymphopoietin (TSLP) gene and rs2074724 in the hepatocyte growth factor (HGF) gene were highly associated with a poor prognosis of breast cancer. Moreover, the polygenic risk score model with genetic variations of immunity-related genes showed that the hazard of DFS of patients was significantly increased as high-risk alleles accumulated. In the GSEA-SNP analysis, 18 pathways significantly affected breast cancer prognosis.
The rs1952438 is located in the intron region of SOCS4 gene. SOCS family are rapidly induced by activated STATs and negatively regulate JAK/STAT pathway by a classical feedback loop [22]. Furthermore, other signal molecules such as FAK, IRS, p65, GR which are related with carcinogenesis, are regulated by SOCS proteins [23][24][25][26][27]. In addition, there are several previous study which reported that people who have higher expression level of SOCS4 are likely remained disease free status compared to those who developed recurrence [28]. In the view of previous studies which explain functional importance of SOCS4 and results of present study, it might be assumed that rs1952438 is associated with poorer prognosis of breast cancer by declining expression level of SOCS4.
The rs2289278 is found in intron 2 of the long-form of TSLP and in the 59 untranslated region of the short-form of TSLP [29]. TSLP is a member of the IL-2 cytokine family and a distant paralog of IL-7. TSLP may have an important role in tumor progression by activating CD4+ T cells, inducing the expressing of OX40L in dendritic cells (DCs), and producing Th2-type cytokines and B-cell growth factor [30]. A recent study has shown that breast cancer cells have high expression levels of TSLP, indicating that the TSLP may be critical in the development of breast cancer [31]. It is that high expression level of TSLP in cancer increases the Th2 level [30]. Furthermore, Th2 cytokines promote disease progres-sion through the increased survival of cancer cells, M2 macrophage differentia-tion, and fibrosis [31,32]. Thus, TSLP may be an important factor of breast tumor progression and the prognosis of a patient.
The rs2074724 is located in the intron of HGF. HGF is known to activate angiogenesis of tumors as well as cell-cell interactions, matrix adhesion, migration, invasion [33]. Moreover, breast cancer patients with a high HGF concentration had a significantly poor prognosis when compared to those with a low HGF concentration [34]. Therefore, HGF level was found to be the most important independent factor in predicting the prognosis of breast cancer.
In the GSEA-SNP analysis, there are 18 significant pathways; among these pathways, gene set from Kyng et al [35] which included rs1952438 in SOCS4 gene and rs2074724, and   influencing on environmental response, although there are not precise result in this assumption. 'Myc tagets1' gene set from Benporath et al [36] which included rs12458289 and rs9989529 in BCL2 gene, and rs2142097 in NBN gene is shown as the most significant gene set. Benporath et al describe that targets of Nanog, Oct4, Sox2 and c-Myc are more frequently associated in poorly differentiated tumors than in welldifferentiated tumors. c-Myc is well known to directly regulate the expression of NBN gene involved in DNA double-strand break repair and can result in chromosomal instability, cellular proliferation defects leading to increased more aggressive and metastatic tumor latency [37,38]. BCL2 and c-Myc are known to make the negative feedback loop in breast cancer cell line [39]. Taking all these consideration of both Benporath et al and results of present study to account, it is can be deduced that rs12458289, rs9989529, and rs2142097 might be associated with the prognosis of breast cancer by interacting with c-MYC gene.
To support the indirectly functional effects of our results, we attempted to find potential functional SNPs in SOCS4, HGF, TSLP and genes included in GSEA-SNP using UCSC database [40] and checked the LD between the potential functional SNPs and our findings. Table S2 show the functional SNPs studied in this study and functional SNPs in LD with those SNPs, generally to affect histone modification, DNA methylation, and binding affinity of several transcription factors located in 59UTR or 39UTR. For example, transcription activity of IL-8 is influenced by rs4073 which located in promoter region of IL-8 [41] and the variant increased the risk of mortality in follicular lymphocytic leukemia by increasing production of IL-8 [42]. As a result, it is possibly anticipated that those potential SNPs may influence to breast cancer prognosis by regulating the epigenetic and transcriptional pathway.
Several previous reports have evaluated the associations of immunity gene polymorphism and breast cancer prognosis [11][12][13][14]. They suggested that the variants of ERCC2, TLR4, IL-2, IL-6, and IL-21 genes had associations with breast cancer prognosis respectively. However, those genes were not replicated in present study. In the other types of cancer studies, IL-6R, IL-8, IL-10RB, IL-12A, and IL-12B genes were consistently associated with cancer prognosis between our study and theirs. However, there were few consistent SNPs with cancer prognosis in our review of the literature, which may result from various cancer targets, different ethnicities, and different prognostic factors in the models and statistical power.
In this study, there are several limitations including a small sample size and absence of an external validation study. Since the power of this study was low to detect accurate results, the results of this study are carefully interpreted, although the significance levels of top 3 SNPs passed the FDR test with significance (p,0.05) and the internal validity was confirmed by the cross-validation. In addition, polygenic risk score model and GSEA-SNP are conducted with whole significant SNPs which include insignificant SNPs at FDR p-value greater than 0.05. Tag SNPs selected based on the data of a Caucasian population and lack of breast cancer subtype information were also limitations of this study. In the systematic review-level, the summary measure and synthesis of the results were not calculated because various genes and the variations related to immune response were the focus. However, the strength of this study is that lots of genetic factors in immunerelated genes were covered at once. Moreover, it attempted to apply the candidate gene approach to cover the pathway of immunity-related genetic factors with breast cancer prognosis in Asian women.
In conclusion, our study found that common variants in the SOCS4, TSLP and HGF genes might be related with breast cancer prognosis in Korean women. Hazard of DFS in patients was significantly increased when high-risk alleles were accumulated. Therefore, our results suggest that genetic polymorphisms in immunity-related genes have relevance to breast cancer prognosis among Korean women. Further large-scale functional studies are needed to confirm our findings.