Genetic predisposition to in situ and invasive lobular carcinoma of the breast.

Invasive lobular breast cancer (ILC) accounts for 10-15% of all invasive breast carcinomas. It is generally ER positive (ER+) and often associated with lobular carcinoma in situ (LCIS). Genome-wide association studies have identified more than 70 common polymorphisms that predispose to breast cancer, but these studies included predominantly ductal (IDC) carcinomas. To identify novel common polymorphisms that predispose to ILC and LCIS, we pooled data from 6,023 cases (5,622 ILC, 401 pure LCIS) and 34,271 controls from 36 studies genotyped using the iCOGS chip. Six novel SNPs most strongly associated with ILC/LCIS in the pooled analysis were genotyped in a further 516 lobular cases (482 ILC, 36 LCIS) and 1,467 controls. These analyses identified a lobular-specific SNP at 7q34 (rs11977670, OR (95%CI) for ILC = 1.13 (1.09-1.18), P = 6.0 × 10^-10; P-het for ILC vs IDC ER+ tumors = 1.8 × 10^-4). Of the 75 known breast cancer polymorphisms that were genotyped, 56 were associated with ILC and 15 with LCIS at P<0.05. Two SNPs showed significantly stronger associations for ILC than LCIS (rs2981579/10q26/FGFR2, P-het = 0.04 and rs889312/5q11/MAP3K1, P-het = 0.03), and two showed stronger associations for LCIS than ILC (rs6678914/1q32/LGR6, P-het = 0.001 and rs17529111/6q14, P-het = 0.04). In addition, seven of the 75 known loci showed significant differences between ER+ tumors with IDC and ILC histology, three of these showing stronger associations for ILC (rs11249433/1p11, rs2981579/10q26/FGFR2 and rs10995190/10q21/ZNF365) and four associated only with IDC (rs10941679/5p12, rs2588809/14q24/RAD51L1, rs6472903/8q21 and rs1550623/2q31/CDCA7). In conclusion, we have identified one novel lobular breast cancer specific predisposition polymorphism at 7q34, and shown for the first time that common breast cancer polymorphisms predispose to LCIS.
We have shown that many of the ER+ breast cancer predisposition loci also predispose to ILC, although there is some heterogeneity between ER+ lobular and ER+ IDC tumors. These data provide evidence for overlapping, but distinct etiological pathways within ER+ breast cancer between morphological subtypes.



Introduction
Invasive lobular breast cancer (ILC) accounts for 10-15% of all invasive breast carcinomas and has distinct etiological, clinical and biological characteristics compared with the more common invasive ductal/no special type carcinoma (IDC) [1]. Lobular cancers show stronger associations with the use of hormone replacement therapy (HRT) than IDC [2], and their incidence follows a similar temporal pattern to the use of combined HRT [3]. ILC is characterized by E-cadherin loss, and the malignant cells therefore infiltrate the breast stroma in single files with little associated stromal reaction. This makes these tumors difficult to detect by palpation or mammography, and they are often larger at presentation than IDCs [4]. ILCs are generally of histological grade 2 and estrogen receptor positive (ER+), with the exception of the pleomorphic subgroup. They typically have a different pattern of metastatic spread from IDCs, tending to infiltrate the peritoneum, ovary and gastrointestinal system. There is some evidence that they are less chemo-sensitive than IDC and that the 10-year survival rate of women with ILC is lower than that of women with ER+ IDC [5,6].
ILC is often associated with lobular carcinoma in situ (LCIS), a form of non-invasive breast cancer that is difficult to detect clinically and is typically found incidentally on biopsy. The increased breast biopsy rate associated with screening mammography has led to an increase in the diagnosis of LCIS. LCIS shares many of the same genetic aberrations as ILC, suggesting that it is a precursor lesion in an analogous manner to ductal carcinoma in situ (DCIS) and IDC [7]. Women who have had LCIS are 2.4 times more likely to develop invasive breast cancer than the general population, with an excess of ILC (23-80% of cases) [8,9]. However, only 50-70% of invasive cancers associated with LCIS have lobular morphology [10, unpublished data from GLACIER study]. The remaining cancers have an IDC or mixed ductal-lobular appearance, but again are generally ER+ (95% of IDC and mixed ductal-lobular cancers associated with LCIS in the GLACIER study were ER+). Unlike DCIS, LCIS is also a risk factor for developing invasive cancer in the contralateral breast [8].
Genome-wide association studies (GWAS) in breast cancer have identified loci that predispose to invasive breast cancer in general, or specifically to ER+ or ER-negative disease [11-25]. However, no previous study has focused specifically on lobular carcinomas. Only one common single nucleotide polymorphism (SNP; rs11249433 at 1p11.2) has been shown to be more strongly associated with lobular than ductal histology [26]. For the remaining SNPs predisposing to ER+ tumors, it is unclear whether the studies have lacked statistical power to identify differential associations by histology, or whether associations tend to be non-differential by morphology after accounting for ER status.
The aim of this study was to identify new breast cancer susceptibility loci specific to lobular carcinoma, and to evaluate the heterogeneity of associations of known loci by morphology. This involved pooling genotyping data from over 6,000 cases of lobular carcinoma (ILC and/or LCIS) and over 34,000 controls genotyped using the iCOGS chip, a custom SNP array that comprises 211,155 SNPs enriched at predisposition loci for breast and other cancers [24].

Results
In a phase I analysis, we evaluated risk associations between SNPs on the iCOGS chip and risk of ILC and LCIS using 1,782 lobular cases (1,470 ILC with or without LCIS, 312 pure LCIS) from GLACIER, a UK study of lobular breast cancer, and 4,755 UK controls from the Breast Cancer Association Consortium (BCAC) (Figure 1). There was little evidence for systematic inflation of the test statistics, based on 37,544 uncorrelated SNPs that had not been selected on the basis of breast cancer risk (λ = 1.04; Figure S1). Data were combined by meta-analysis with a further 4,241 cases (4,152 ILC, 89 LCIS) and 29,519 controls of European ancestry, derived from 34 studies in BCAC and previously typed on the iCOGS chip (Tables S1 and S2). This resulted in a total of 6,023 cases (5,622 ILC, 401 LCIS) and 34,271 controls with data on 199,961 iCOGS SNPs (after quality control exclusions and with minor allele frequency (MAF) >0.01) included in the meta-analysis.

Search for new lobular breast cancer predisposition loci
All SNPs reaching genome-wide significance (P<5 × 10^-8) in the meta-analysis were correlated with one of the known breast cancer predisposition loci. In order to identify new loci that predispose to lobular carcinoma, we selected six uncorrelated SNPs (rs11977670, rs2121783, rs2747652, rs3909680, rs9948182, rs7034265) that were only weakly correlated (r^2 <0.25) with known loci and that showed the best evidence of association (P between 5 × 10^-8 and 5 × 10^-5) in the overall lobular case-control analysis (ILC and LCIS). These SNPs were genotyped in Phase II, which included 516 cases (481 ILC, 35 LCIS) and 1,467 controls, all from white European donors (Figure 1).
One of the six SNPs, rs11977670 at 7q34, reached genome-wide significance in a pooled analysis of phase I and II ILC cases and controls (OR = 1.13, 95%CI = 1.09-1.18, P = 6.0 × 10^-10; Table 1, Figure 2). rs11977670 showed a similar association with LCIS (P-het for ILC vs LCIS = 0.198), and a very weak or no association with IDC (OR = 1.02, 95%CI = 1.00-1.05, P = 0.070; P-het for ILC vs IDC = 1.3 × 10^-5), indicating that this is a lobular-specific predisposition locus (Table 2). The risk allele appeared to act in a dominant rather than additive manner (OR for AG = 1.21, 95%CI = 1.14-1.30; OR for AA = 1.27, 95%CI = 1.17-1.38; P for departure from log-additivity = 0.009; Table S3). rs11977670 was not significantly associated with age at onset of ILC (P-trend = 0.16), and risk alleles were not significantly overrepresented in cases with a positive family history (FH) (P = 0.90, FH+ vs FH-). None of the other 5 SNPs genotyped

Author Summary
Invasive lobular breast cancer (ILC) accounts for 10-15% of invasive breast cancer and is generally ER positive (ER+). To date, none of the genome-wide association studies that have identified loci that predispose to breast cancer in general or to ER+ or ER-negative breast cancer have focused on lobular breast cancer. In this lobular breast cancer study we identified a new variant that appears to be specific to this morphological subtype. We also ascertained which of the known variants predisposes specifically to lobular breast cancer and show for the first time that some of these loci are also associated with lobular carcinoma in situ, a non-obligate precursor of breast cancer and also a risk factor for contralateral breast cancer. Our study shows that the genetic pathways of invasive lobular cancer and ER+ ductal carcinoma mostly overlap, but there are important differences that are likely to provide insights into the biology of lobular breast tumors.
rs11977670 at 7q34 (position: 139942304, GRCh Build 37) is intergenic, 65 kb from the nearest gene, JHDM1D, a histone demethylase, and 500 kb from BRAF, a gene frequently mutated in melanoma. It is also in close proximity to a predicted novel U1 spliceosomal RNA that contains two U1-specific promoter motifs (Figure S2). ENCODE data on normal human mammary epithelial cells (HMEC) and breast carcinoma cells (MCF-7) were used to establish chromatin states in the region and showed that rs11977670 lies in a region marked by H3K27 acetylation (Figure S3).
Using expression data from the Cancer Genome Atlas Network (TCGA database) [27], we assessed expression of the nine genes within 0.5 Mb of rs11977670 by breast cancer subtype (ER+ ILC, 40 cases; ER+ IDC, 341 cases; and ER-negative IDC, 108 cases; Figure S4). Three genes showed differential expression in ER+ ILC compared to ER+ IDC (BRAF, P = 0.006; NDUFB2, P = 0.02; SLC37A3, P = 0.05); however, none reached statistical significance after correction for multiple testing. Another two genes, JHDM1D and ADCK2, showed a difference in expression between ER-negative and ER+ cancers, but this was not lobular-specific. To further investigate which genes may be influenced by SNPs tagged by rs11977670, germline genotype data for rs13225058 (A/G), a surrogate for rs11977670 (G/A) (r^2 = 0.79), were taken from the TCGA database (SNP6.0 Affymetrix array) and compared to expression of these genes, correcting for copy number variation, in 335 ER+ primary breast cancers for which both genotype and expression data were available. A significant difference in expression between the AA and GG genotypes, after correcting for multiple testing, was found for two genes, JHDM1D (P = 0.0005) and SLC37A3 (P = 0.004) (Figure S5a). Confining the analysis to the 36 ILC cases with data in TCGA showed no significant genotype-specific expression, owing to small numbers, although there was a suggestion of a trend towards overexpression with the GG genotype (2 cases) (Figure S5b). Forty-eight of the cases also had expression data on adjacent normal breast tissue, but owing to the small numbers no significant genotype-specific expression changes were detected (Figure S6). There was no evidence of copy number variation around rs11977670 and no evidence of an excess of somatic mutations in JHDM1D, SLC37A3 or BRAF in ILC.

Assessment of the 75 known breast cancer susceptibility loci for association with ILC and LCIS
Most (56 of 75) known common breast cancer susceptibility loci were associated with ILC at P<0.05 with the effect in the same direction as previously reported (Table S5), and 13 of these reached genome-wide significance (P<5 × 10^-8, Table 3). The strongest associations were with SNPs close to FGFR2 (rs2981579, OR = 1.38, P = 5.1 × 10^-52), TOX3 (rs3803662, OR = 1.33, P = 1.1 × 10^-35), at 1p11.2 (rs11249433, OR = 1.25, P = 2.7 × 10^-25) and 11q13.3 (rs554219, OR = 1.33, P = 1.6 × 10^-22). All 13 loci had previously been shown to be associated with ER+ breast cancer and one locus, rs11249433 (1p11.2), with lobular histology in subgroup analysis. Of the remaining 19 SNPs with P≥0.05, 18 had ORs in the same direction as previously reported for overall breast cancer (Sign test P = 0.0001), suggesting that these SNPs are also likely to predispose to LCIS. Only one of the seven ER-negative specific loci on the iCOGS array showed a significant association with ILC (rs12710696, P = 0.037). In case-only analyses, no SNP showed an association with family history of breast cancer or young age at onset of ILC.

Assessment of the 75 known susceptibility SNPs for effects on mixed ILC-IDC cancer predisposition
Case-control analysis of 690 mixed ductal-lobular carcinomas revealed 25 loci that showed an association with these mixed cancers at P<0.05. The top hits were at FGFR2 (rs2981579, OR = 1.37, P = 1.6 × 10^-7), rs941764 (CCDC88C, OR = 1.25, P = 3.6 × 10^-4) and rs10995190 (ZNF365, OR = 0.74, P = 3.9 × 10^-4). The case-only analysis above showed that two of these SNPs are more strongly associated with ILC than IDC (rs2981579, rs10995190). rs941764 showed no association with ILC and only a weak association with ER+ IDC (Table S6).

Discussion
Our analyses of a total of 6,539 lobular cancers (including 436 cases of pure LCIS) and 35,710 controls have identified for the first time a lobular-specific SNP, rs11977670 (JHDM1D; OR = 1.13, P = 4.2 × 10^-10), that showed little evidence of association with IDC (P = 0.07) or DCIS (P = 0.23). Identification of the target of this association will require fine mapping of the region, followed by functional assays to determine which gene(s) the key SNPs regulate. The preliminary in silico functional analysis suggests that SNPs in this region may be influencing expression of JHDM1D (a histone demethylase) and SLC37A3 (a sugar-phosphate exchanger). For JHDM1D this appears to be a recessive effect, in contrast to the susceptibility data, which suggest a dominant effect. There are few data on the role of these genes in cancer. There is some evidence that increased expression of JHDM1D can suppress tumor growth by regulating angiogenesis [28] and that decreased expression promotes invasiveness, which is contrary to what one would expect from the risk data [29]. This inconsistency casts some doubt on these results, and further analysis of the region is required before any firm conclusion can be drawn. Studies of syndecan-1-deficient breast cancer cells, which show increased cell motility and invasiveness, demonstrate decreased expression of both JHDM1D and E-cadherin [29], suggesting that the two genes may interact. Somatic mutations in CDH1 (E-cadherin) are frequent in ILC, and rare germline frameshift mutations in CDH1 have been described in ILC, particularly in families with hereditary diffuse gastric cancer (HDGC), but also in cases of familial ILC with no HDGC [30,31]. However, none of the 56 SNPs in CDH1 that were typed on the iCOGS chip showed any association with lobular cancer at P<0.05.
It should also be noted that this study is not a true genome-wide association study for lobular breast cancer, as the SNPs on the iCOGS chip were chosen on the basis of some prior evidence of association with breast cancer as a whole. As ILC would have made up only a small proportion of the samples in the discovery sets for these SNPs, it is possible that other lobular-specific loci exist that have not been included on the iCOGS chip. This is particularly true for LCIS, which would only have been included in the discovery set as a parallel phenotype when associated with invasive disease.
Seventy-five of the known common breast cancer susceptibility loci were assessed for association with ILC and LCIS. As cases of ILC were included in the discovery sets that generated these susceptibility loci, and lobular breast cancer is generally ER+ (94% of the ILC cases in this study were ER+) with the majority of ILCs classified as luminal tumors [32], it is not surprising that the majority of SNPs that we found to be associated with ILC were known to also predispose to ER+ breast cancer. However, some loci were only associated with ER+ IDC and not with ILC, particularly rs10941679 at 5p12, previously shown to predispose more strongly to ER-positive, lower-grade cancers [33] (P-het = 2.7 × 10^-8). Others showed a much stronger association with ILC than IDC, particularly rs11249433 at 1p11.2, as previously described [26]. These data suggest specific etiological pathways for the development of different histological subtypes of breast cancer, in addition to common pathways that predispose to multiple tumor subtypes.
Despite the small number of pure LCIS cases without invasive disease, our analyses have shown for the first time that many of the SNPs that predispose to ILC also predispose to LCIS. Although only 15 of the known breast cancer SNPs were associated with LCIS risk at P<0.05, 47 of the remaining 60 SNPs with P>0.05 had ORs in the same direction as for ILC (Sign test P = 1.2 × 10^-5), suggesting that many more SNPs are likely to be associated with pure LCIS but did not reach statistical significance individually because of the relatively few LCIS cases without associated ILC in our sample set. This is not unexpected if LCIS is an intermediate phenotype for ILC. However, a small number of SNPs had differential effects on LCIS or ILC risk. Specifically, rs6678914 at 1q32.1 (LGR6), known to be an ER-negative specific SNP [25], appeared to be associated with LCIS but not ILC (P-het = 0.0007), and rs17529111 at 6q14, preferentially associated with ER-negative tumors [23], had a stronger association with LCIS than ILC (P-het = 0.04). We also identified SNPs in FGFR2 and at 5q11.2 (MAP3K1) that appear to predispose only to ILC and have little effect on LCIS, suggesting that SNPs affect different parts of the lobular carcinoma pathway. These findings are surprising and, being based on small numbers, need confirmation in future studies.
Some of the SNPs associated with both ILC and LCIS showed a stronger effect size in LCIS compared to ILC (for example SNPs at TOX3, 9q31.2, 11q13.3, ZNF365 and MLLT10). It is possible that the SNPs that showed an association with both LCIS and ILC predispose to the development of LCIS rather than ILC, and that the effect size is smaller in ILC as not all cases of LCIS will become invasive cancer. SNPs that predispose strongly to LCIS were also associated with ER+ IDCs but again with stronger effect sizes in LCIS, consistent with the fact that 30-40% of invasive tumors associated with LCIS will not be ILC but will be IDC, mixed ductal-lobular or other morphology.
In conclusion, we have identified a novel lobular-specific predisposition SNP at 7q34 close to JHDM1D that does not appear to be associated with IDC. Most known breast cancer predisposition SNPs also predispose to ILC, with some differential effects between ILC and IDC. In addition, many SNPs predisposing to invasive cancer are also likely to increase the risk for LCIS. Overall, our analyses show that genetic predisposition to IDC and lobular lesions (both ILC and LCIS) overlap to a large extent, but there are important differences that are likely to provide insights into the biology of lobular breast tumors.

Ethics statement
All studies were performed with ethical committee approval (Table S7), and subjects participated in the studies after providing informed consent.

Study populations
Phase I. Cases and controls came from 34 studies forming part of the Breast Cancer Association Consortium (BCAC) included in the COGS Project [13] (Table S1), and GLACIER (A study to investigate the Genetics of LobulAr Carcinoma In situ in EuRope MREC 06/Q1702/64), a UK case-only study of lobular breast cancer. BCAC studies recruited all types of breast cancer. Pathological information in BCAC was collected by the studies individually but combined and checked through standardized data control in a central database. A total of 4,152 ILC and 89 LCIS cases were identified by the central BCAC pathology database (see Table S2 for number of cases by study).
The GLACIER study recruited patients from participating centers throughout the UK with the aim of identifying predisposition genes for LCIS and/or ILC. Any patients aged 60 or less at the time of diagnosis, with a current or past history of LCIS (with or without invasive disease of any histological subtype) were eligible. A total of 2,539 cases were recruited: 2,167 were identified from local pathology reports in 97 UK hospitals, 346 cases were identified through the British Breast Cancer Study (BBCS) using UK Cancer Registry data and 26 cases from the Royal Marsden Breast Tissue Bank. Cryptic relatedness analysis showed no evidence of overlap between these samples and the BCAC samples. All these cases were genotyped with the iCOGS chip and compared to 5,000 UK controls selected from four UK studies participating in BCAC and already typed on the iCOGS chip. Controls were randomly selected prior to analysis so that each of these UK studies, including GLACIER, had a case:control ratio of at least 1:2 (Table S8). These controls were excluded from case-control comparisons with BCAC cases from the originating study. This report includes only cases of pure LCIS or ILC with or without LCIS. Cases of LCIS with IDC or mixed lobular and ductal carcinoma in GLACIER were excluded in order to perform meta-analyses with the BCAC studies which do not have information on the presence or absence of LCIS associated with an invasive cancer. After excluding individuals based on genotyping quality (see Genotyping and Analysis) and non-European ancestry, data for the GLACIER study available for analyses included 1,782 cases (1,470 ILC (with or without LCIS), 312 pure LCIS) and 4,755 controls.
Phase II. A further 516 cases (481 ILC, 35 LCIS) and 1,465 controls were analyzed as part of Phase II. Controls were recruited through the GLACIER study, but were not genotyped in Phase I on the iCOGS chip to reduce costs, and were all white West European. Cases came from the following studies: 232 cases from GLACIER, 176 from BBCS, 71 from DietCompLyf [38], 39 from King's Health Partners Cancer Biobank (KHP-CB). All cases were white West European, apart from the 39 samples from the KHP-CB where there were no associated ethnicity data. For studies that had also participated in Phase I, we selected samples so there was no overlap with the samples in Phase I.

Genotyping and analysis
Phase I. After DNA extraction from peripheral blood, GLACIER samples were genotyped on the iCOGS custom Illumina iSelect array, which contains 211,155 SNPs, at King's College, London. The remaining cases and controls were genotyped as part of the COGS project, described in detail elsewhere [13]. The GLACIER cases were analyzed using the same QC criteria as the COGS project. Briefly, genotypes were called using Illumina's proprietary GenCall algorithm and 10,000 SNPs were manually inspected to verify the algorithm calling. Individuals were excluded if they were genotypically not female, had an overall call rate <95% or were ethnic outliers (248 cases), as identified by multi-dimensional scaling combining the genotyping data with the three HapMap2 populations. SNPs with a GenCall rate of <0.25, call rate <95% (call rate <99% if MAF <0.1), HWE P<10^-7 or evidence of poor clustering on inspection of cluster plots were excluded. All SNPs with MAF <0.01 were excluded for this analysis. A cryptic relatedness analysis of the whole dataset was performed using 46,918 uncorrelated SNPs and there was no evidence of any duplicates.
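The numeric SNP exclusion thresholds described above can be sketched as a simple filter. This is only an illustration under stated assumptions: the `SnpQc` record and `passes_qc` function are hypothetical names, the HWE threshold is taken to apply to the test P value, and the GenCall and cluster-plot criteria are omitted because they depend on array-level data.

```python
from dataclasses import dataclass


@dataclass
class SnpQc:
    """Per-SNP QC summary (hypothetical structure, for illustration only)."""
    name: str
    call_rate: float  # fraction of samples with a genotype call
    maf: float        # minor allele frequency
    hwe_p: float      # Hardy-Weinberg equilibrium test P value (assumed)


def passes_qc(snp: SnpQc) -> bool:
    """Apply the numeric thresholds quoted in the text to one SNP."""
    # Call rate must be at least 95%, or at least 99% when MAF < 0.1.
    min_call_rate = 0.99 if snp.maf < 0.1 else 0.95
    if snp.call_rate < min_call_rate:
        return False
    # Exclude SNPs with strong departure from Hardy-Weinberg equilibrium.
    if snp.hwe_p < 1e-7:
        return False
    # Exclude rare SNPs (MAF < 0.01) for this analysis.
    if snp.maf < 0.01:
        return False
    return True


if __name__ == "__main__":
    example = SnpQc(name="snp_example", call_rate=0.97, maf=0.30, hwe_p=0.4)
    print(example.name, "kept" if passes_qc(example) else "excluded")
```

Expressing the call-rate rule as a MAF-dependent minimum keeps the two thresholds in one place, which makes it easier to check against the written criteria.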
For GLACIER cases and controls, principal component analysis (PCA) was carried out on a subset of 46,918 uncorrelated SNPs and used to exclude individuals or groups distinct from the main cluster using the first five principal components (PCs) (Figure S7). Following removal of outliers (166 cases and 245 controls), the PCA was repeated and the first five PCs were included as covariates in the analysis. The adequacy of the case-control matching was evaluated using quantile-quantile plots of test statistics and the inflation factor (λ), calculated using only uncorrelated SNPs that were not selected by BCAC and were not within one of the four common fine-mapping regions, to minimize selection for SNPs associated with breast cancer (Figure S1). As the majority of the SNPs on the iCOGS array were selected from GWAS of breast, ovarian and prostate cancer, the SNPs used for this analysis were taken from the set selected by the prostate consortium, on the assumption that these SNPs were more likely to be representative of common SNPs in terms of population structure in our study than those selected by the breast or ovarian consortia.
For each SNP, we estimated a per-allele log-odds ratio (log OR) and standard error by logistic regression, including the 5 PCs as covariates, using PLINK v1.07 (http://pngu.mgh.harvard.edu/purcell/plink/). Genotyping and analysis of BCAC studies are described in detail elsewhere [24]; in brief, data were analyzed using the Genotype Library and Utilities (GLU) package to estimate per-allele ORs and standard errors for each SNP using unconditional logistic regression. All analyses were performed in subjects of European ancestry (determined by PC analyses) and adjusted for study and seven principal components.
Case-control odds ratios (ORs) for ILC or LCIS cases vs controls from BCAC and GLACIER were combined using inverse variance-weighted fixed-effects meta-analysis, as implemented in METAL [39]. Case-only analyses were also carried out to compare genotype frequencies for ILC vs LCIS (GLACIER and BCAC) and ILC vs IDC (BCAC studies only), and were used as a test for heterogeneity of ORs by tumor subtype. Any study without data on both histological subtypes was dropped from the case-only analysis.
Phase II. SNPs showing the strongest evidence for association with lobular tumors (P<5 × 10^-5) in the meta-analysis (after excluding previously reported loci) were genotyped at LGC Genomics (formerly KBiosciences) in Phase II samples. Duplicate samples genotyped on the iCOGS chip were included to assess the concordance of the two genotyping methods. Cluster plots for rs11977670 are shown in Figure S8.
A pooled analysis of ILC including Phase I (GLACIER and BCAC) and Phase II data was performed. Data were analyzed using STATA v.12 to estimate per-allele ORs and standard errors for each SNP using unconditional logistic regression. Differences in the strength of the associations with ILC, IDC and LCIS were assessed using case-only analyses. A sign test was used to test whether the number of SNPs showing associations in the same direction in two different subtypes (i.e. LCIS vs ILC, and IDC vs ILC) was significantly greater than expected by chance. A likelihood ratio test was used as a global test of the null hypothesis of no differences between subtypes for any of the ORs of the 75 known loci evaluated. Stratum-specific estimates of per-allele OR by categories of age and family history of disease were obtained from logistic regression models, and differences in ORs across strata were tested using an interaction term.
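As an illustration of the inflation factor (λ) mentioned above, the sketch below uses the common median-based definition: the median of the observed 1-df chi-square test statistics divided by the median of the chi-square distribution with 1 df. The text does not state which estimator was used, so this definition and the function name are assumptions.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(chi2_stats):
    """Median-based genomic inflation factor for 1-df chi-square statistics."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Illustrative (made-up) test statistics for a handful of SNPs.
    observed = [0.02, 0.31, 0.47, 0.90, 2.10]
    print(f"lambda = {inflation_factor(observed):.3f}")
```

Values close to 1 indicate little systematic inflation; restricting the input to SNPs not selected for breast cancer association, as described above, avoids inflating λ with true signals.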
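To make the link between a per-allele log-odds ratio, its standard error and the reported summary statistics explicit, here is a minimal sketch assuming a Wald test with a normal approximation. The function name and the input values are illustrative and are not taken from the study outputs.

```python
import math


def summarize_log_or(beta, se, z_crit=1.959964):
    """Convert a log-odds ratio and standard error to OR, 95% CI and P.

    Assumes a two-sided Wald test under a normal approximation.
    """
    odds_ratio = math.exp(beta)
    ci_low = math.exp(beta - z_crit * se)
    ci_high = math.exp(beta + z_crit * se)
    z = beta / se
    # Two-sided P value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return odds_ratio, ci_low, ci_high, p_value


if __name__ == "__main__":
    # Hypothetical inputs: log(1.13) with an assumed standard error of 0.02.
    or_, lo, hi, p = summarize_log_or(math.log(1.13), 0.02)
    print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f}), P = {p:.1e}")
```

Because published ORs and confidence limits are rounded, back-calculating a P value from them in this way will only approximate the value obtained from the full regression output.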
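The inverse variance-weighted fixed-effects combination described above can be sketched as follows. This is a minimal illustration of the standard formula rather than a reproduction of the METAL implementation; the function name and the example inputs are hypothetical.

```python
import math


def fixed_effects_meta(estimates):
    """Pool (log_or, standard_error) pairs by inverse-variance weighting.

    Each study is weighted by 1 / se^2; the pooled standard error is the
    square root of the reciprocal of the summed weights.
    """
    estimates = list(estimates)
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


if __name__ == "__main__":
    # Hypothetical per-study log ORs and standard errors for one SNP.
    studies = [(0.14, 0.04), (0.11, 0.03)]
    beta, se = fixed_effects_meta(studies)
    print(f"pooled OR = {math.exp(beta):.3f}, SE(log OR) = {se:.4f}")
```

Working on the log-OR scale keeps the estimates approximately normal, so larger studies (smaller standard errors) dominate the pooled estimate in proportion to their precision.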
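The sign test can be illustrated with an exact binomial calculation under the null hypothesis that each SNP is equally likely to show an effect in either direction. The sketch below assumes a two-sided test; the text does not state whether the reported P values are one- or two-sided, and the function name is hypothetical.

```python
from math import comb


def sign_test_p(n_same, n_total):
    """Two-sided exact sign test P value against a probability of 0.5."""
    # Use the larger of the two counts so the tail is always the upper one.
    k = max(n_same, n_total - n_same)
    upper_tail = sum(comb(n_total, i) for i in range(k, n_total + 1)) / 2 ** n_total
    # Double the tail for a two-sided test, capped at 1.
    return min(1.0, 2.0 * upper_tail)


if __name__ == "__main__":
    # Counts reported in the Results and Discussion.
    print(f"18 of 19 concordant: P = {sign_test_p(18, 19):.1e}")
    print(f"47 of 60 concordant: P = {sign_test_p(47, 60):.1e}")
```

For 18 of 19 SNPs in the same direction, the two-sided value is 40/2^19, about 7.6 × 10^-5, which is in line with the reported 0.0001 after rounding.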

Bioinformatics
To investigate the functional role of the SNP, a window of 10 kb upstream and downstream of the marker was defined and pairwise r^2 values were calculated using 1000 Genomes CEU population data. Three SNPs were identified as being in LD (r^2 >0.5) with rs11977670 and were compared with next-generation sequencing data from the ENCODE Project to assess their overlap with chromatin states. Two cell lines, normal human mammary epithelial (HMEC) and breast carcinoma (MCF-7), were used to establish these chromatin states, i.e. active or engaged enhancers (H3K27ac), nucleosome-depleted regions (DNase I and FAIRE), and RNA polymerase linked regions (Pol II). Expression data from the Cancer Genome Atlas Network for each gene within a 1 Mb window of rs11977670 were analyzed for differential expression in each breast cancer subtype (ER+ ILC, 40 cases; ER+ IDC, 341 cases; and ER-negative IDC, 108 cases). Allele data for the surrogate SNP rs13225058 were obtained for all ER+ cases from TCGA. These 335 cases were used to produce genotype-specific gene expression data in R. Differences in gene expression between the three genotypes were tested using one-way ANOVA, verified by t-test and inspected visually by boxplot. Linear regression was performed across all three genotypes using copy number variation as a covariate. Level 3 copy number variation data (hg19 build) were obtained from the TCGA data portal.

Figure S3 rs11977670 falls in a high H3K27ac region. ENCODE data from normal human mammary epithelial (HMEC) and breast carcinoma (MCF-7) cell lines were used to establish chromatin states in the region. (PPTX)
Figure S4 Gene expression data taken from TCGA for genes in a 1 Mb window of rs11977670. Three genes showed differential expression in ER+ ILC compared to ER+ IDC (BRAF, P = 0.006; NDUFB2, P = 0.02; SLC37A3, P = 0.05). (PPTX)
Figure S5 a: Genotype-specific gene expression in ER+ breast cancers. Gene expression and genotype data were taken from TCGA and compared using a surrogate for rs11977670, rs13225058 (r^2 = 0.79), for 335 ER+ cancers. A significant difference between the AA and GG genotypes was found for only two genes, JHDM1D and SLC37A3. b: Genotype-specific gene expression in 36 invasive lobular cancers. Gene expression and genotype data were taken from TCGA and compared using a surrogate for rs11977670, rs13225058 (r^2 = 0.79). (PPTX)
Figure S6 Genotype-specific gene expression in 48 cases of normal breast tissue associated with ER+ breast cancer. Gene expression and genotype data were taken from TCGA and compared using a surrogate for rs11977670, rs13225058 (r^2 = 0.79), for 48 cases with normal breast tissue.