A Study on Genetic Variants of Fibroblast Growth Factor Receptor 2 (FGFR2) and the Risk of Breast Cancer from North India

Genome-Wide Association Studies (GWAS) have identified Fibroblast growth factor receptor 2 (FGFR2) as a candidate gene for breast cancer, with single nucleotide polymorphisms (SNPs) located in the intron 2 region as the susceptibility loci most strongly associated with the risk. However, replication studies have often failed to extend the association to diverse ethnic groups. This points to heterogeneity among populations, arising from differences in linkage disequilibrium (LD) structure and in SNP frequencies within the associated regions of the genome. It is therefore important to revisit previously linked candidates in varied population groups to unravel the extent of this heterogeneity. To investigate the role of FGFR2 polymorphisms in susceptibility to breast cancer among North Indian women, we genotyped the rs2981582, rs1219648, rs2981578 and rs7895676 polymorphisms in 368 breast cancer patients and 484 healthy controls by Polymerase chain reaction-Restriction fragment length polymorphism (PCR-RFLP) assay. We observed a statistically significant association with breast cancer risk for all four genetic variants (P<0.05). In the per-allele model for rs2981582, rs1219648 and rs7895676, and in the dominant model for rs2981578, the association remained significant after Bonferroni correction (P<0.0125). On stratified analysis, significant correlations with various clinicopathological as well as environmental and lifestyle characteristics were observed. rs1219648 and rs2981578 were associated with exogenous hormone use and advanced clinical stage III, respectively (after Bonferroni correction, P<0.000694). Furthermore, combined analysis of these four loci revealed that, compared to women with 0–1 risk loci, those with 2–4 risk loci had an increased risk (OR = 1.645, 95%CI = 1.152–2.347, P = 0.006).
In haplotype analysis for rs2981578, rs2981582 and rs1219648, the risk haplotype (GTG) was associated with a significantly increased risk compared to the common (ACA) haplotype (OR = 1.365, 95% CI = 1.086–1.717, P = 0.008). Our results suggest that intron 2 SNPs of FGFR2 may contribute to genetic susceptibility to breast cancer in the North Indian population.


Introduction
Worldwide, breast cancer is the most commonly diagnosed cancer and the leading cause of cancer mortality among women [1]. Asian countries have witnessed the greatest increase of the globally rising breast cancer burden during the last several decades [2][3][4][5][6]. A similar trend has been observed in India [7][8][9][10], with a reported 0.5-2% per annum rise in incidence across all regions and in all age groups, particularly in younger age groups (<45 years) [11]. Further, it is predicted that breast cancer cases would increase by 26%, mostly in developing countries, by 2020 [12].
Breast carcinogenesis involves a complex combination of genetic, environmental and lifestyle factors. Inherited susceptibility makes an important contribution to breast cancer development, and the risk is around two times higher in first degree relatives of women with the disease [13]. Rare mutations in several high-penetrance genes like the BReast CAncer genes (BRCA1, BRCA2) account for less than 25% of the familial breast cancer risk, and less than 5% of the overall risk [14,15]. Therefore, common variants present in other low-penetrance genes may be more important and contribute to breast cancer along with lifestyle and environmental factors [16]. However, all of the common low-risk variants described so far collectively account for <10% of the familial risk of breast cancer [17][18][19][20][21][22][23][24][25], leaving ample room for uncovering additional variants that confer risk of this disease and account for the genetic basis of the remaining major fraction of breast cancer. Single nucleotide polymorphisms are the most common type of germline variation, present in at least 1% of a population [26]. The effect of an individual SNP is usually small, but combinations of relevant SNPs across the genome may additively contribute to higher risk in a polygenic model [27]. Though once supposed to be functionally insignificant, current evidence emphasizes their largely unexplored functional relevance [28][29][30].
Fibroblast Growth Factor Receptor 2 (FGFR2) belongs to the FGFR family of tyrosine kinase receptors and contributes to the process of tumorigenesis through cell growth, invasiveness, motility and angiogenesis [31]. It plays an important role during mammary gland development [32], and aberrant FGF signaling has been associated with the pathogenesis of multiple types of cancer [33][34][35][36]. FGFR2 overexpression has been observed in breast cancer cell lines and breast tumor tissues [37][38]. The human FGFR2 gene is located at chromosome 10q26 and contains 22 exons [39]. Two large Genome-Wide Association Studies (GWAS) have identified intron 2 SNPs of FGFR2 to be associated with breast cancer risk; rs2981582 and rs1219648 were the most strongly associated marker SNPs in the two studies, respectively [17,18]. Association of these variants with breast cancer has been evaluated in different ethnic groups with inconsistent findings [40][41][42][43][44][45][46][47][48][49][50][51][52][53][54]. A recent meta-analysis suggests their association with breast cancer risk in Caucasian and East Asian populations [55]. Both rs2981582 and rs1219648 fall in a 25 kb linkage disequilibrium (LD) block within the intron 2 region of FGFR2 [17,18]. Multiple haplotypes carrying the minor allele of rs2981582 were found to be associated with the risk in haplotype analysis [17]. Six polymorphisms, including rs7895676 and rs2981578, were identified as potentially causal for breast cancer, with rs7895676 exhibiting the strongest association in the combined analysis of European and Asian datasets [17]. Further analysis by Meyer et al. [56] supports their functional relevance in relation to breast cancer risk. However, the association of rs2981578 and rs7895676 with breast cancer susceptibility still remains inconclusive [49][50][51][52][57].
Wide variations in genetic architecture, including differential allele frequencies of SNPs and differently evolved LD structure for the GWAS-identified genetic variants, reflect differences among ethnicities and may contribute to disparities in the incidence and characteristics of breast cancer. Thus, variants identified in one study may not have the same impact on risk in other populations, and previously associated loci need to be replicated in multiple populations worldwide. This will help in determining the genetic heterogeneity among different population groups for these loci, particularly in India, which faces a rapidly rising breast cancer burden but has relatively few studies identifying common breast cancer associated variations. Such studies will assist in evaluating the generalizability of initial findings and in identifying the causal variants. We therefore assessed the impact of FGFR2 intron 2 polymorphisms (rs2981582, rs1219648, rs2981578 and rs7895676) on sporadic breast cancer and determined their association with the risk for North Indian women in a case-control approach, including the combined effect of these variants, LD structure measurement, haplotype analysis, and the relation with patients' clinical, environmental and lifestyle characteristics. We observed a significant association of these variants with breast cancer susceptibility for North Indian women.

Ethics Statement
The study was approved by Institution Ethics Committee of All India Institute of Medical Sciences (AIIMS), New Delhi and the Institutional Human Ethical Committee of Jamia Millia Islamia, New Delhi. All the participants provided their written informed consent to be included in the study.

Study subjects and specimen collection
This hospital-based case-control study included a total of 852 genetically unrelated women of North Indian ethnicity, comprising 368 sporadic breast cancer cases and 484 healthy controls. Controls were frequency-matched to cases on age (±2 years) and geographical location. The study participation response rates for cases and controls were 88.46% and 81.07%, respectively. All breast cancer cases (aged 24-80 years) were newly diagnosed, histopathologically confirmed with primary breast cancer and recruited from the Department of Surgical Oncology, AIIMS. Breast cancer was classified according to the TNM staging system of the American Joint Committee on Cancer (AJCC), and histological grading followed the Nottingham grading system. Exclusion criteria were a reported previous cancer history, metastasized cancer from other organs and previous exposure to radiotherapy or chemotherapy.
Detailed information on clinical profiles for cases and controls was collected from their medical records and is presented in Table S1.

Extraction of genomic DNA
Participating women provided 3-5 ml of venous blood, which was used for isolating genomic DNA by the standard phenol-chloroform extraction method [58]. DNA samples were stored at −80°C until used for further analysis.

SNP selection and Genotype analysis
Previously reported FGFR2 SNPs showing association with breast cancer in one or more GWAS and candidate gene studies, including the two proposed functional variants (rs2981582C/T, rs1219648A/G, rs2981578A/G, rs7895676T/C) [17,18][40][41][42][43][44][45][46][47][48][49][50][51][52][56], were selected for genotyping. All four SNPs were analyzed using the Polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) assay. Details of the selected SNPs, primer sequences used for PCR, sizes of the PCR products, restriction endonucleases (New England Biolabs, USA) used for digestion and their recognition sequences, as well as the sizes of the digested fragments distinguishing the different genotypes for all four SNPs are described in Table 1 and Table S2. For all the SNPs, restriction-digested fragments were analyzed by 2-3.5% agarose gel electrophoresis. To validate the data generated by the PCR-RFLP assay, 5% of randomly selected samples were directly sequenced. DNA sequencing was carried out at Xcelris Labs Ltd., India. The quality of genotyping was assessed by re-genotyping 10% of randomly selected samples; no discrepancies were observed in the replicate genotyping.

Statistical analysis
To compare the overall distribution of genotypes between patients and healthy controls, a 3×2 Chi-square (χ²) test was performed. Hardy-Weinberg equilibrium (HWE) was evaluated by a goodness-of-fit χ² test. For estimating associations between individual genotypes and breast cancer risk, and for cumulative risk analysis, odds ratios (ORs) and their 95% confidence intervals (CIs) were computed using unconditional logistic regression analysis with adjustment for age. One-way ANOVA (Analysis of variance) was carried out for estimating the contribution of different numbers of risk loci to breast cancer risk. These statistical analyses were performed using Statistical Package for the Social Sciences, version 17 (SPSS Inc., Chicago, IL, USA), and P<0.05 was considered statistically significant. Further, all P-values were corrected for multiple comparisons according to the Bonferroni method. LD pattern and population haplotype frequencies for the SNPs were estimated using HaploView v4.2 [59]. Fisher's exact test was performed for determining the association of haplotypes with disease status.
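The goodness-of-fit χ² test for Hardy-Weinberg equilibrium mentioned above can be sketched as follows. This is a minimal illustration, not the authors' SPSS workflow: the function name `hwe_chi_square` and the genotype counts in the usage lines are hypothetical, and the sketch assumes a biallelic SNP with both alleles present (so no expected count is zero).

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Goodness-of-fit chi-square test for HWE at a biallelic SNP (1 df).

    n_aa, n_ab, n_bb are observed genotype counts (hypothetical inputs).
    Assumes both alleles are present, so every expected count is > 0.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)      # frequency of allele A
    q = 1.0 - p                          # frequency of allele B
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts that sit exactly at HWE proportions
chi2, p_value = hwe_chi_square(25, 50, 25)
print(chi2, p_value)
```

A large P-value here indicates no detectable departure from HWE; this check is usually applied to the control group.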

Results
Results of genotype analysis for the four selected FGFR2 intronic variants (rs2981582C/T, rs1219648A/G, rs2981578A/G and rs7895676T/C) were available for 368 breast cancer cases and 484 healthy controls, and a significant association with breast cancer susceptibility was observed.

FGFR2 SNPs and overall breast cancer risk
The distribution of genotype and allele frequencies of the four FGFR2 SNPs in breast cancer cases and controls is shown in Table 2. The Chi-square test showed a significant association of the four FGFR2 variants with overall breast cancer risk (P<0.05). Logistic regression analysis (age adjusted) further confirmed this association, which remained significant in the per-allele model for rs2981582/T, rs1219648/G and rs7895676/C, and in the dominant model for rs2981578 (AG+GG), even after Bonferroni correction (P<0.0125).
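As a rough illustration of how an odds ratio and its 95% confidence interval arise from a 2×2 table, the sketch below computes an unadjusted OR with a Wald interval. It is a simplification under stated assumptions: the study itself reports age-adjusted ORs from unconditional logistic regression, whereas this example ignores age; the function name `odds_ratio_wald` and the counts are hypothetical.

```python
import math

def odds_ratio_wald(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: cases carrying the risk allele      b: cases not carrying it
    c: controls carrying the risk allele   d: controls not carrying it
    All four counts must be > 0. z = 1.96 gives a 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical carrier counts, not taken from Table 2
print(odds_ratio_wald(20, 10, 10, 20))
```

An interval that excludes 1 corresponds to a nominally significant association at the 5% level; adjusting for age, as in the study, would require fitting a logistic regression model instead.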
To evaluate the cumulative risk for these SNPs, we categorized study subjects as carrying 0, 1, 2, 3 or 4 risk loci (a risk locus being a SNP position at which the risk allele is present). To determine the contribution of different numbers of risk loci, overall as well as in the four different trio combinations of these SNPs, logistic regression and one-way ANOVA tests were performed (Table 3). Logistic regression analysis revealed a significantly higher risk in carriers of 2-4 risk loci (aOR = 1.645, 95%CI = 1.152-2.347, P = 0.006) compared to those with 0-1 risk loci. Further, a progressively increased risk was noted from 1 risk locus (aOR = 1.600, 95%CI = 0.754-3.394) to 2 risk loci (aOR = 1.786, 95%CI = 1.076-2.964) to 3-4 risk loci (aOR = 1.855, 95%CI = 1.230-2.799) in comparison to 0 risk loci (Table S3). ANOVA showed significant (P<0.05) differences in the contribution of various numbers of risk loci to disease status for the four SNPs taken together as well as in different combinations (P values labeled c in Table 3). Moreover, logistic regression analysis revealed a significant contribution of only two SNP combinations to breast cancer risk when 2-3 risk loci were compared with 0-1 risk loci: ABD (rs7895676, rs2981578 and rs1219648; aOR = 1.622, 95%CI = 1.150-2.289, P = 0.006) and BCD (rs2981578, rs2981582 and rs1219648; aOR = 1.431, 95%CI = 1.065-1.923, P = 0.018). On conducting multiple comparison analysis in ANOVA, similar trends were observed (Table S3, Table S4). Significant association with the risk was noted for 2 risk loci (P = 0.026) and 4 risk loci (P = 0.002) compared to 0 risk loci. Also, among the four different SNP combinations, significant P values for all the risk loci were observed for combination BCD (P values labeled b in Table 3), indicating its predominant contribution towards risk.
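The risk-locus categorization described above can be sketched as follows. The risk alleles are those named in the text (T for rs2981582, G for rs1219648, G for rs2981578, C for rs7895676); the function names, the two-letter genotype strings and the example subject are hypothetical.

```python
# Risk allele at each SNP, as named in the text
RISK_ALLELE = {
    "rs2981582": "T",
    "rs1219648": "G",
    "rs2981578": "G",
    "rs7895676": "C",
}

def count_risk_loci(genotypes):
    """Count SNPs at which the subject carries at least one risk allele.

    genotypes maps each SNP id to a two-letter genotype such as "CT".
    """
    return sum(1 for snp, allele in RISK_ALLELE.items() if allele in genotypes[snp])

def risk_group(genotypes):
    """Dichotomize into the '0-1' versus '2-4' risk-loci groups."""
    return "2-4" if count_risk_loci(genotypes) >= 2 else "0-1"

# Hypothetical subject: risk allele present at rs2981582 and rs2981578 only
subject = {"rs2981582": "CT", "rs1219648": "AA", "rs2981578": "GG", "rs7895676": "TT"}
print(count_risk_loci(subject), risk_group(subject))
```

Note that a heterozygote and a risk-allele homozygote both count as one risk locus under this definition, so the count ranges from 0 to 4 rather than 0 to 8.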

FGFR2 SNPs and clinicopathological characteristics
Further, we analyzed the association of these variants with various clinicopathological characteristics, including several reproductive and environmental risk factors of breast cancer, in a stratified analytical approach (Table 4). The P value cut-off after Bonferroni correction was set at P<0.000694.
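The Bonferroni cut-offs quoted in this paper follow from dividing the nominal significance level by the number of comparisons. The sketch below reproduces both thresholds; the count of 4 tests matches the four SNPs, while the count of 72 is an inference from 0.05/0.000694 and is not stated in the text. The function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests

# Four SNPs tested for overall risk
print(bonferroni_threshold(0.05, 4))

# Assumed number of stratified comparisons (inferred, not given in the text)
print(round(bonferroni_threshold(0.05, 72), 6))
```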

Linkage Disequilibrium (LD) and Haplotype analysis
Taking r² ≥ 0.80 and D′ = 1 as the threshold for high LD between two genetic markers, the four studied SNPs were found to be in moderate to weak LD in the control group of our study population (pairwise r² values ranging from 0.175 to 0.680 and D′ values from 0.610 to 0.938; Figure S1). Haplotype frequencies were estimated using Haploview for the four SNPs taken together as well as for different combinations of SNPs taken three and two at a time, and the association with the risk was determined by applying Fisher's exact test (Table 5, Table S5). Increased risk for the haplotype carrying only risk alleles compared to the one carrying only common alleles was observed for all possible combinations (P<0.05). However, contrary to the combined risk analysis, the predominant contribution towards the risk in terms of higher odds ratio was observed for the trio combinations ABC (rs7895676, rs2981578 and rs2981582; OR = 1.422, 95%CI = 1.126-1.797, P = 0.004) and ACD (rs7895676, rs2981582 and rs1219648; OR = 1.442, 95%CI = 1.144-1.816, P = 0.002). Nevertheless, the SNP combination BCD (rs2981578, rs2981582 and rs1219648) seems to be relevant, as pairwise D′>0.80 for these three SNPs (Figure S1) and carriers of the GTG haplotype (carrying only risk alleles) had a significantly greater risk compared to carriers of the ACA haplotype (carrying only wild-type alleles) (OR = 1.365, 95%CI = 1.086-1.717, P = 0.008). Among the duo SNP combinations, AC (rs7895676 and rs2981582) displayed the highest odds ratio (OR = 1.449, 95%CI = 1.153-1.822, P = 0.002; Table S5).
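For readers unfamiliar with the two LD measures used here, the sketch below computes D′ and r² for a pair of biallelic SNPs from a haplotype frequency and the two allele frequencies, using the standard textbook definitions. It is not a description of Haploview's internals: the function name and the example frequencies are hypothetical, and the sketch assumes both SNPs are polymorphic (allele frequencies strictly between 0 and 1).

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a, p_b: frequencies of allele A and allele B (each strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b                       # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2

# Hypothetical frequencies: allele A always travels with allele B,
# but the two alleles differ in frequency
print(ld_measures(0.2, 0.2, 0.5))
```

In this hypothetical case D′ equals 1 while r² is only 0.25, which illustrates why a pair of SNPs can show high D′ and yet only moderate r², as with the pairwise values reported above.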

Discussion
In this case-control study of sporadic breast cancer in North Indian women, we found that the variants rs2981582C/T, rs1219648A/G, rs2981578A/G and rs7895676T/C of FGFR2 were all significantly associated with increased breast cancer risk. The recent identification of these intron 2 SNPs [17,18] has drawn substantial attention towards FGFR2 as a candidate gene for breast cancer. At present, much effort is focused on targeting additional genetic alterations that drive breast cancer, and FGFR2, which has been implicated in different types of human malignancies including breast cancer [33][34][35][36], is a likely candidate.
Since a previous report from South India [60] did not succeed in replicating the association of the studied FGFR2 variant with breast cancer, as was observed in Europeans and other Asian populations [17,46,47], it was relevant to revisit the region along with other SNPs from the same LD block. The purpose of our study was to unravel any heterogeneity in association between population groups. Such differences reflect the variations among distinct geographic areas and ethnicity, and accentuate the necessity of characterizing breast cancer susceptibility genes among ethnic groups.
The present study reports a significant association of rs2981582 and rs1219648 with breast cancer, consistent with previous observations from two Asian studies by Liang et al. [46] and Kawase et al. [47]. The T allele of rs2981582 has been linked with increased activity of FGFR2, and it has been shown that the haplotype marked by this allele is associated with a higher level of FGFR2 transcription both in breast cancer cell lines and in tumors [56]. We observed an association of the risk allele at the rs2981582 and rs1219648 loci with breast cancer in premenopausal women, similar to some previous studies revealing the association of these variants with breast cancer risk in younger women [43][44][45][46]. We also observed that the rs2981582 T allele and the rs1219648 G allele were associated with ER-positive rather than ER-negative tumors and, for rs2981582, with PR-positive rather than PR-negative tumors. Such findings of association with reproductive hormones are supported by several earlier studies showing that FGFR2 variants contribute to breast cancer and confer their effect primarily in ER-positive and PR-positive tumor subtypes [42,44,46,60,61]. Also, higher levels of FGFR2 expression have been reported in ER-positive than in ER-negative cell lines and tumors [62][63][64]. It is well known that elevated levels of endogenous sex hormones, particularly estrogens, may increase breast cancer risk [65], and exposure to endogenous serum estrogen is much higher in premenopausal than in post-menopausal women [66]. For rs2981582, we also observed an association of the T allele with lower grade tumors, in accordance with a previous study by Garcia-Closas et al. [61]; with an early age at menarche (≤12 years), an observation previously reported by Kawase et al. [47]; and with employed status. Early onset of menarche is considered a breast cancer risk factor [67,68], as it leads to early exposure to endogenous sex hormones and can induce proliferation of breast cells [67].
A significant proportion of breast cancer in India has been attributed to greater urbanization and changing lifestyles. Higher education and increased income have been shown to be risk factors for breast cancer [69,70]. For rs1219648, we observed a strong association of the G allele with more invasive tumors with a higher chance of LN metastasis, consistent with a previous report from China [54], and with clinically advanced stage, suggesting its association with disease aggressiveness. Moreover, we observed a very strong association of the risk allele with the use of exogenous hormones (as contraceptives, infertility treatment or hormone replacement therapy). This somewhat contradicts a previous report by Rebbeck et al. [53], in which never users of combined hormone replacement therapy (CHRT) carrying the risk allele were at higher risk. Further, an association with positive breastfeeding status was also observed. Exogenous hormone exposure and breastfeeding have been described as important factors predictive of breast cancer risk [71].
Breast cancer tends to be diagnosed at an earlier age in developing countries than in European and American populations, and a rapid rate of increase in incidence has been observed before menopause [72]. Moreover, it has been reported that premenopausal women constitute about 50% of all breast cancer patients in India [6]. Thus, results from our study demonstrating the association of rs2981582 and rs1219648 with premenopausal status suggest the importance of investigating these two SNPs in the Indian context. Moreover, the restriction of the risk conferred by FGFR2 variants to ER-positive and PR-positive tumors suggests that these SNPs affect the reproductive hormone-related pathway in the development of breast cancer in North Indian women. However, these observations need to be confirmed in studies with larger sample sizes from our population.
A recent analysis by Meyer et al. has shown that two FGFR2 SNPs within the intron 2 region, rs2981578 and rs7895676, alter the DNA binding affinity of the transcription factors octamer-binding transcription factor 1 (Oct-1)/runt-related transcription factor 2 (Runx2) and CCAAT/enhancer binding protein β (C/EBPβ), respectively, resulting in increased FGFR2 gene expression both in cell lines and in breast tissues of patients homozygous for the risk allele as compared to those homozygous for the wild-type allele [56]. These two SNPs are located in the same LD block of interest identified by GWAS [17,18]. Their role as breast cancer susceptibility variants is not yet established. In our study, we observed a significant association of the G allele of rs2981578 with breast cancer risk, which is in accordance with a previous African American study [49] and a Chinese study [52]. We also observed an association of the G allele with LN-positive status and with advanced clinical stage, suggesting that the risk allele might relate to a more aggressive form of breast cancer. Further, an association with parous status was observed. For rs7895676, we observed a significant association of the C allele with breast cancer risk, consistent with an earlier study by Boyarskikh et al. [50]. We further observed an association of the C allele with PR-positive, LN-positive, less malignant grade I+II tumors and negative breastfeeding status. Both parity and breastfeeding have been described as important factors linked to breast cancer risk [71]. Moreover, nulliparity and negative breastfeeding status have been linked with increased risk for breast cancer in the Indian population [73,74].
The associations with exogenous hormone exposure and higher stage for SNPs rs1219648 and rs2981578, respectively, remained statistically significant even after Bonferroni correction (P<0.000694), while other clinical features lost statistical significance, suggesting the importance of these SNPs in sub-categorized breast cancers in our population. However, these observations need to be confirmed in further studies with larger sample sizes, to rule out false positive results and to establish intron 2 FGFR2 SNPs as breast cancer susceptibility loci.
On conducting combined risk analysis in our study population of North Indian women, the odds of breast cancer were found to be elevated by around 65% for women carrying 2-4 risk loci compared with those carrying 0-1 risk loci (aOR = 1.645, 95%CI = 1.152-2.347). Moreover, a progressively augmented risk with an increasing number of risk loci was also noted, demonstrating that a combination of these variants cumulatively increases risk (Table S3). In haplotype analysis, the FGFR2 rs2981578 G/rs2981582 T/rs1219648 G haplotype was associated with a significantly increased breast cancer risk compared with the rs2981578 A/rs2981582 C/rs1219648 A haplotype. Our findings on the combinatorial effect of these loci and on haplotype analysis are to some extent similar to previous studies [44][45][46]51,75], though those included only 2 or 3 of the FGFR2 variants we are reporting here. Although the tendency towards increased breast cancer risk was significant across all four SNPs tested, the LD pattern between the four FGFR2 variants in our North Indian population was only weak to moderate, in contrast to Europeans but resembling other Asian populations [17,46,47,75]. This indicates a fairly independent risk effect of each locus in our population, although the results warrant screening in larger sample sets. Moreover, we also observed significant differences in the contribution of different numbers of risk loci as well as of different combinations of SNPs, both in combined risk analysis and in haplotype analysis, which resulted in varied extents of involvement in risk.
To the best of our knowledge, we are reporting for the first time, a case control study on these four intronic FGFR2 variants taken together along with LD measurement, haplotype analysis and stratified analysis for possible correlation with patients' clinical parameters in susceptibility to breast cancer.
The location of these FGFR2 variants in an intronic region suggests that their association with the risk is probably explained by differential expression. Aberrant expression of alternatively spliced isoforms of FGFR2 has been shown to activate signal transduction leading to transformation in breast cancer cells [76]. Variable expression of FGFR2 in relation to intron 2 SNPs has been supported by the analyses carried out by Meyer et al. [56] and Huijts et al. [77]. Further, FGFR2 intron 2 shows a high degree of conservation in mammals, and a number of conserved putative transcription-factor binding sites have been identified in it [17,78], some of which lie in close proximity to the significant SNPs. However, the exact mechanism by which these SNPs affect FGFR2 upregulation remains unclear.
Besides SNPs, other features of FGFR2 could be targeted in the search for newer and more efficient biomarkers in the future. Several altered FGFR2 characteristics, such as amplification and overexpression, mutations, alternative splicing and isoform switching, have been linked with breast tumorigenesis and have shown promising results in studies on breast cancer cell lines and tumors [33][34][35][36][37][38][62][76][79][80][81][82][83]. Though none of them has reached the clinical phase as yet and many hurdles remain to be overcome, there is encouraging evidence suggesting that targeting FGFR2 along with other FGFRs in certain subtypes of breast cancer could be a valuable approach in the future [84][85][86][87][88].
In conclusion, our study revealed a significant association of FGFR2 intron 2 SNPs with breast cancer risk, as well as their interaction with various clinical parameters, indicating their contribution to breast cancer susceptibility among North Indian women. Although the findings of the present study by themselves are unlikely to have any immediate clinical implications, such studies may play a key role in elucidating the biological mechanisms that underlie breast tumor heterogeneity, which may ultimately lead to improved treatment and prevention. These findings suggest that genetic variants of FGFR2 might be used as candidate biomarkers for breast cancer risk. Further epidemiological and experimental studies of larger data sets, along with sub-categorization by clinical parameters and expression studies, are warranted to explore and confirm the role of these variants in increasing breast cancer risk, particularly in India; this will help us better understand the genetic heterogeneity in complex diseases like breast cancer.