Candidate Luminal B Breast Cancer Genes Identified by Genome, Gene Expression and DNA Methylation Profiling

Breast cancers (BCs) of the luminal B subtype are estrogen receptor-positive (ER+), highly proliferative, resistant to standard therapies and have a poor prognosis. To better understand this subtype we compared DNA copy number aberrations (CNAs), DNA promoter methylation, gene expression profiles, and somatic mutations in nine selected genes, in 32 luminal B tumors with those observed in 156 BCs of the other molecular subtypes. Frequent CNAs included 8p11-p12 and 11q13.1-q13.2 amplifications, 7q11.22-q34, 8q21.12-q24.23, 12p12.3-p13.1, 12q13.11-q24.11, 14q21.1-q23.1, 17q11.1-q25.1, 20q11.23-q13.33 gains and 6q14.1-q24.2, 9p21.3-p24,3, 9q21.2, 18p11.31-p11.32 losses. A total of 237 and 101 luminal B-specific candidate oncogenes and tumor suppressor genes (TSGs) presented a deregulated expression in relation with their CNAs, including 11 genes previously reported associated with endocrine resistance. Interestingly, 88% of the potential TSGs are located within chromosome arm 6q, and seven candidate oncogenes are potential therapeutic targets. A total of 100 candidate oncogenes were validated in a public series of 5,765 BCs and the overexpression of 67 of these was associated with poor survival in luminal tumors. Twenty-four genes presented a deregulated expression in relation with a high DNA methylation level. FOXO3, PIK3CA and TP53 were the most frequent mutated genes among the nine tested. In a meta-analysis of next-generation sequencing data in 875 BCs, KCNB2 mutations were associated with luminal B cases while candidate TSGs MDN1 (6q15) and UTRN (6q24), were mutated in this subtype. In conclusion, we have reported luminal B candidate genes that may play a role in the development and/or hormone resistance of this aggressive subtype.

Luminal B BCs have a poor prognosis [32]. Although they express hormone receptors, their metastatic risk and resistance to hormone therapy and to conventional chemotherapy demand to develop appropriate therapies. Some proteins (e.g. CITED2, NCOR2) or molecular networks associated with BCAR (breast cancer anti-estrogen resistance) genes are involved in the resistance to anti-estrogen therapy and in the progression of these cancers [32][33][34][35]. Luminal B cancers exhibit various mutated genes, TP53 and PIK3CA being the most frequent (29% each) [28].
To further define molecular alterations associated with the luminal B subtype we studied DNA copy number aberrations (CNAs), DNA promoter methylation alterations (DPMAs), gene expression deregulation (EXP), and selected gene mutations in 188 primary BC samples. These analyses identified luminal B-specific candidate genes.

Ethics statement
The study was approved by our institutional review board: the ''Comité d'Orientation Stratégique'' of the Institut Paoli Calmettes (IPC) (Marseille, France). Each patient gave a written informed consent for research use.

Breast cancer samples
Pre-treatment tumor tissues were collected from 188 patients with invasive adenocarcinomas. Patients underwent surgical biopsies or initial surgery at the Institut Paoli-Calmettes between 1987 and 2007. The main histoclinical, biological and subtype characteristics were established for the 188 BCs as described [12,[17][18][19]. They are listed in Table S1A and illustrated in Figure S1.
Gene profiling and data analysis DNA and RNA were extracted as previously described [12,[17][18][19] and controled on Agilent Bioanalyzer (Agilent Technologies, Massy, France). Genomic profiles of the 188 BCs were established by using array-comparative genomic hybridization (aCGH) onto high-resolution 244K CGH microarrays (Hu-244A, Agilent Technologies, Massy, France). A pool of 13 normal male DNA was used as reference. Gene expression data from the same 188 BCs and 4 normal breast (NB) samples, which represented 1 pool of 4 samples from 4 women, and 3 commercial pools of respectively 1, 2 and 4 normal breast RNA (Clontech, Palo Alto, CA), were obtained using whole-genome DNA microarrays (HG-U133 Plus 2.0, Affymetrix). Both approaches and analysis methods have been used in our previous studies [12,17,18]. All probes for aCGH, gene expression and DNA promoters methylation analyses were mapped according to the hg18/NCBI human genome mapping database (build 36) to homogeneously integrate the data. The aCGH, gene expression, methylation data, as well as the integrated CNA/gene expression and DPMA/gene expression analyses are illustrated by pipelines ( Figures S2 and S3, respectively).
Validation and prognostic impact of candidate genes were evaluated in a large public series of BC samples. Thirty-six data sets, including a total of 5,765 non-redundant samples, were collected from public database i.e. Gene Expression Omnibus (GEO/NCBI), Array Express (EBI) and authors' websites (Table  S1B). Raw data from each study were normalized using quantile normalization, and log2-transformed. The intrinsic molecular subtypes of each tumor sets were defined using Hu single sample predictor (SSP) [4]. To be comparable across data sets, each gene expression levels were standardized within each data set using luminal A population as reference.
The data (experiment called ''Candidate luminal B breast cancer genes identified by genome, gene expression and DNA methylation profiling'') are publicly available (ArrayExpress repository ref ID: E-MTAB-1861).

DNA promoter methylation profiling and data analysis
We captured the methylated DNA of 117 (109 tumors+8 NB) samples by using the MethylMiner Methylated DNA Enrichment Kit (Invitrogen). Genome-wide DNA-methylation analysis was done on custom [A-MEXP-2178 (arrayexpress)] human promoter arrays 26400K (Agilent Technologies, Massy, France) using the MethylMiner Methylated DNA Enrichment Kit (Invitrogen) [36]. Over 414,000 probes cover promoter regions approximately 2 3 kb to +3 kb relative to transcription start sites (TSSs) with a resolution of 280 bp in average. Scanning was done with Agilent Autofocus Dynamic Scanner (G2565C, Agilent Technologies). Raw data were obtained from Feature extraction 10.7.3 software (Agilent Technologies). Probes not mapped to exact positions in the genome as well as those under the background signal were removed with the control probes. The final dataset contained 326,350 unique probes covering 18,297 promoter regions according to the hg18/NCBI human genome mapping database (build 36). The M (Log 2 Red-Log 2 Green) values of each probe on the array were then obtained and normalized according to their GC content. Then, inter-array quantile normalization was done for the correction of distribution differences among experiments.
To estimate the global methylation level for a given gene in one sample, we computed a methylation score based on the sum of frequency of probes with a M value greater than zero combined with its amplitude and frequency of probes with a M value less than zero combined with its amplitude. Clustering was done with the Cluster program using Pearson correlation as similarity metrics and average linkage clustering. Results were displayed using TreeView program.

Gene mutations
Polymerase chain reaction (PCR) and direct sequencing were done using standard conditions with gene-specific primers designed to amplify coding sequence of ARID1A, ASXL1, FOXO3, L3MBTL4, MAP2K4, PIK3CA, RUNX1, RUNX3 (Table S1C). Most PCR amplifications were done in a total volume of 25 ml PCR mix containing at least 10 ng template DNA, Taq buffer, 200 mmol of each deoxynucleotide triphosphate, 20 pmol of each primer and 1 unit of appropriated Taq polymerase (Table S1C). PCR products were purified using Millipore plate MSNU030. The purified PCR products (2 ml) were used for sequencing using the Big Dye terminator v1.1 kit (Applied Biosystems, Courtaboeuf, France). After G50 purification, sequences were loaded on an ABI 3130XL automat (Applied Biosystems). The sequence data files were analyzed using the Seqscape software and all mutations were confirmed on an independent PCR product.

Statistical analyses
Correlations between sample groups and histoclinical factors were calculated with the Fisher's exact test for qualitative variables with discrete categories, and the Mann-Whitney test for continuous variables. Metastasis-free survival (MFS) was calculated from the date of diagnosis until the date of first metastatic relapse. Survivals were calculated using the Kaplan-Meier method and curves were compared with the log-rank test. Stratification into high-risk and low-risk groups was based on relative risk defined by Cox model using the natural threshold of 1. Univariate and multivariate survival analyses were done using the Cox regression model (Wald test). Variables tested in univariate analyses included patients' age at time of diagnosis (#50 years vs .50), pathological tumor size (pT: pT1 vs pT2-3), pathological axillary lymph node status (pN: negative vs positive), pathological grade (I vs 2-3), histological type, delivery of hormone therapy and chemotherapy, immunohistochemical (IHC) ERBB2 status (negative vs positive), molecular subtypes and RECQL4 expression (continuous value). Variables with a p-value,0.05 in univariate analysis were tested in multivariate analysis. All statistical tests were two-sided at the 5% level of significance. Analyses were done using R software (2.14.2) and associated packages. We followed the reporting REcommendations for tumor MARKer prognostic studies (REMARK criteria) [39].
Luminal B CNAs landscape was drawn with the Circos software [40] and significant mutual exclusive and co-occurring CNAs (FDR,0.05) were identified by DRP analysis [41].
In the meta-analysis of six recent NGS studies including 875 breast tumors, co-occurring and mutually exclusive gene mutations were identified by using a method previously reported [42].

Genomic characterization of 188 BC samples
We first describe the results on the whole set of 188 tumors before addressing the specific question of the luminal B cases. The 188 samples were first profiled using whole-genome gene expression microarrays. Figure S1 shows the hierarchical clustering of samples based on the expression of 13,031 probe sets. Samples were sorted into four major clusters, which strongly correlated with histoclinical features (grade, IHC data) and molecular subtypes. The 188 cases (Table S1A) (Table S1D).

Copy number aberrations in luminal B tumors
We next focused on the CNAs found in luminal B tumors. The luminal B subtype shows both common and specific alterations [7,9,16,19]. To identify the latter we did a supervised analysis comparing CNAs observed in luminal B to those found in each of the other subtypes ( Fig. S5 and Tables S2A-F).
The  33 and 6q, 9p22-p24, 13q34, and 18p11.31-11.32 regional gains and losses with a frequency $30%, respectively) are more frequent in luminal B samples (Fisher's exact test; FDR, 0.05) (Fig. S5, Tables S2C,D). Gains and losses targeted 1,091 and 3,339 genes, respectively ( Table S2D). The comparison with the ERBB2 subtype did not distinguish regions with a different CNA frequency, perhaps because of the low number of samples.

Integrated comparative analysis and luminal B candidate genes
We compared the degree of CNA-driven RNA up or downregulation in 32 luminal B (i) vs 64 luminal A, (ii) vs 54 basal, and (iii) vs 156 pooled non-luminal B tumors, by analyzing the 13,127 genes common to the two platforms (aCGH Agilent On the left is shown a hierarchical clustering of genome copy number profiles measured by aCGH on 24,907 probes or groups of probes (without X and Y). Red indicates increased copy number and green indicates decreased copy number. To the left are indicated chromosome locations with chromosome 1pter to the top and 22qter to the bottom. Next on the right, significant copy number amplifications (dark red), gains (red) and losses (green) observed in luminal B compared to non-luminal B tumors (Fisher's exact test), are plotted as a function of chromosome location. Only amplification, gains, and losses associated with luminal B tumors are shown (Fisher's exact test; FDR,0.05) for each chromosome. In addition to the previously reported 8p11-12, 11q13 amplifications, 17q, 20q gains and 18p losses we previously reported (16,20) and Hu233 2.0 plus Affymetrix) and retained after filtering based on the expression level. From these supervised analyses (Tables  S2A-F), 160, 148 and 307 genes had an expression level that varied according to CNA (Mann-Whitney; FDR,0,05) after luminal B/luminal A (Table S3A), luminal B/basal (Table S3C), and luminal B/non-luminal B (Table S3E) comparison, respectively. Of these, 138, 132, and 251 genes had a deregulated gene expression in relation with CNA specifically associated with the luminal B subtype (t test, FDR,0.05) after luminal B/luminal A (Table S3B), luminal B/basal (Table S3D), and luminal B/nonluminal B (Table S3F) comparison, respectively. These specific luminal B candidate genes were qualified as potential oncogenes or TSGs if they presented up or downregulated expression in relation with amplification/gain or losses, respectively, as previously described [12,17,18]. Overall, out of the 337 candidates identified, 66% (221) were found in the TCGA data [28], deregulated in relation with their copy number alterations. (Tables S3B, 3D and S3F, respectively).
A total of 34, 47 and 2 candidate genes were identified as specific of luminal B tumors when compared to luminal A, basal, and normal-like tumors (Tables S3H-J), respectively. These different repertoires of candidate genes could help identify pathways and mechanisms either specifically affected in luminal B tumors or shared by the major subtypes.
RECQL4 was the most significant. Corresponding Kaplan-Meier MFS curves are shown in Figure 3A (p = 3.80610 29 ). Interestingly, concordant results were obtained with another referent molecular classifier i.e. PAM50 SSP [50] (Table S3M). Indeed, the definition of luminal B tumors is highly variable due to their molecular heterogeneity, making their assignment in the luminal B subtype non-reproducible by different gene expression signatures [51].This suggests that the robustness of our candidates is not impacted by the choice of the classifier [4,50]. A high RECQL4 gene expression level has recently been reported to confer proliferation advantage and survival to breast cancer cells [50]. We observed that RECQL4 overexpression was associated with MDM2 overexpression (t-test p = 2.75610 22 ) as well as with mutated TP53 or/and overexpressed MDM2 gene status (t-test p = 1,73610 220 ) in the independent public TCGA data set [28] (data not shown). This might highlight high genomic instability and DNA repair perturbations in such tumors.
In multivariate analysis including the other histoclinical prognostic features, RECQL4 expression remained significant for MFS (p = 3.4610 22 ), with pT and molecular subtypes ( Table 1). The clinical response study in regard to RECQL4 gene expression in 924 BCs (included in the large public series, see Table S1B) showed as expected that luminal B tumors have a better pathological response to chemotherapy (Fisher, p = 3.04610 24 ) than luminal A tumors. Moreover, RECQL4 overexpression was associated with a better pathological response in the group including both luminal A and B tumors (Fisher, p = 0.013) ( Figure 3B) but not within separated luminal subgroups (data not shown).

DNA methylation profiles of 109 BCs
The hierarchical clustering established with the most variant methylation scores observed in 5,492 promoters (SD.0.3) classified the 109 BC and 8 NB samples into clusters associated with different DNA methylation patterns (Fig. 4A) variably associated with ER and PR expression (p = 5.6610 27 and p = 2.26610 27 ), SBR grade, molecular subtype (p = 5610 24 ), and TP53 status (p = 1.2610 23 ) ( Table S4A). The 8 NB samples and most of the ER-tumors were included in cluster I while ER+ tumors were mainly distributed in cluster II. Most of the basal tumors were included in cluster I whereas cluster II contained most of the luminal A and luminal B tumors. ERBB2 tumors were distributed in the two clusters probably because ERBB2 tumors are heterogeneous for ER status [17]. These different profiles were consistent with recently reported BC methylation patterns [21][22][23], which pointed to a possible relationship between DNA methylation and ER status.

Validation of methylation results
The supervised analysis comparing methylation score data of ER+ and ER-tumors identified 3,484 gene promoters with methylation differences between the two groups (Table S4B) (t test, FDR,0.05). Among them, 1,753 gene promoters (including those of APC, CAV1, CCND2, CDCA7, CDH3, CDKN2A, CDKN2B, HEY2, RASSF1, RECK) had a DNA methylation level higher in the ER+ group (t-test, FDR,0.05). Conversely, 1,731 gene promoters (including those of BCL2, ESR1, HSD17B4, PISD, WNK4) had a higher level of methylation in the ER-group (FDR,0.05). These . For each region, heatmaps for genome copy number and gene expression profiles are consecutively drawn. Genome copy number was measured by aCGH on probes or groups of probes spanning each of these regions. Red indicates increased copy number and green indicates decreased copy number. In the heatmap tumors are organized from the tumor that presented the most copy number losses to the tumor that exhibited the most copy number gains. The next heatmap was established with the expression of the independent genes located on the corresponding 6q region and profiled in the same 188 tumors similarly organized. For gene copy number and gene expression heatmaps, we used color scale limits from 23 to +3 and 22 to +2, respectively. Next to the right, are plotted genes successively selected by steps I, II and III of the integrated analysis ''aCGH & mRNA expression'' as defined by the work pipeline ( Figure S2). Grey and green lines correspond to rejected and selected genes, respectively. Among genes with an expression level that varied according to CNAs, we retained genes showed significant differences (vertical line) in copy number loss correlated with downregulated expression in luminal B compared to non-luminal B data are in agreement with those reported on IlluminaH platform [20,21,23,24]. The strong correlation between DNA methylation data of RASSF1 gene promoter established by EpiTyper and promoter array approaches further validated our method (Results and References S1, Fig. S10, S11).

DNA methylation associated with breast cancer molecular subtypes
Using the ANOVA method, we identified 4,545 gene promoters with a DNA methylation level different in at least one subtype among the five major ones (Table S4C). They included several genes previously reported differentially methylated in BC subtypes and particularly in luminal and basal (or ER+ and ER2) tumors ( Table S4C). The median methylation levels of these 4,545 subtype-associated genes were highest in the luminal and ERBB2 subtypes and lowest in the basal subtype (p = 6.24610 212 ) (Fig. 4B).
Tukey's range test integrating ten supervised analyses was applied to distinguish gene promoters exhibiting a DNA methylation level different in only one subtype as compared to the others (FDR#0.05) ( Table S4C). A total of 375 and 431 promoters had a methylation level higher and lower in the luminal B tumors than in the other subtypes, respectively. Among them, 265 and 295 had a methylation level higher and lower than in NB tissues (t test, FDR,0.05). This analysis was extended to the other subtypes (Results and References S1).

Molecular subtypes and specific deregulated gene expression in relation with the DNA methylation level variation
Among the 4,545 promoters with a DNA methylation level different in at least one subtype (Table S4C), only 459 genes presented an associated mRNA deregulation (correlation,20.40) ( Table S4D). These genes are listed with their CNA status in Table S4E. The presence of CpG islands was observed in 62.5% of them. In the luminal B cases, among the 560 promoters associated with a significant DNA methylation variation, 46 corresponding genes presented a deregulated expression (24 and 22 were down and upregulated, respectively) (correlation,20.40) ( Table S4E). 52% (24 genes) were found significantly deregulated in relation with their level of DNA methylation in the TCGA data [28] (Table S4E).

Subtype-specific candidates presenting gene expression deregulation in relation with CNA and with DNA methylation aberrations
We integrated genomic, gene expression and DNA methylation profiles to identify subtype-specific candidate genes presenting expression deregulation in relation with CNA and with DNA promoter methylation aberrations (Table S4F).
In the luminal B cases, no gene was downregulated in relation with both copy number loss and with high methylation (Table  S4F). Conversely, 8 genes were upregulated in relation with copy number gain or amplification and with low DNA methylation (Table S4F). C12ORF60 and two previously reported oncogenes, H2AFJ and RAB11FIP1/RCP were the most significantly upregulated (t-test, p,0.05) (Fig. S12A). Except for TPD52 and C12orf60, 75% were upregulated in relation with copy number gain or amplification and with low DNA methylation in the TCGA data [28]. The other subtypes were also analyzed (Results and References S1, Table S4F, Fig. S12B-S12D1-D4).
tumors. They were qualified as potential TSGs. For each region, only the first five most significant are listed. PNRC1, NCOA7 and TNFAIP3 genes were the most significant candidate TSGs for the 6q14.1-q22. 31 (Table S1B). Stratification into high-risk (red curve) and low-risk (black curve) groups were based on relative risk defined by the Cox model using the natural threshold of 1. RECQL4 gene expression is associated with MFS (p = 3.8010 29 ). B -From the same large public series, 924 BCs were informed for the pathological response ( Table S1B). The RECQL4 overexpression was associated with a better pathological response within the group including both luminal A and B tumors (N = 435) (Fisher, p = 0.013). doi:10.1371/journal.pone.0081843.g003
In these NGS studies, only 602 samples were subtyped ( Table  S6A) including 248 luminal A, 134 luminal B, 132 basal, 71 ERBB2 and 17 normal-like BCs. A total of 517 genes exhibited more than 5 mutations in the 602 tumors. For each of them, the frequency of mutation was established and we identified gene mutations associated with a specific subtype by comparing the number of mutations with those observed in the others (Fisher, FDR,0.25 with odds ratio.1 dark grey colored in Table S6D). The first 15 most frequently mutated genes are reported in  In an NGS subset [29], TP53 mutations were enriched in luminal B tumors (p = 0.04) and in high grade tumors (p = 0.02). In our meta-analysis of all NGS samples, TP53 mutations were enriched in only basal (FDR = 4610 218 ) and ERBB2 (FDR = 5610 26 ) tumors. The proportion of analyzed tumors and their heterogeneities could influence the results.
To identify co-occurring and mutually exclusive subtype-specific gene mutations, we retained in the 602 samples, the genes targeted by more than 3 mutations in each subtype. To identify key luminal B genes that could be altered by several mechanisms, we crossed in Table S6I information between genes found in the meta-analysis as mutated in luminal B cases and those identified in our study as significantly altered in luminal B tumors (Tables S2A-S2F) or as potential luminal B candidates (Table  S3G). KCNB2 mutations were associated with luminal B tumors (Fisher, FDR,0.25 with odds ratio.1), and the gene was more frequently gained in luminal B than in the other subtypes (Fisher's exact test; FDR,0.05) (Tables S2A-S2F). Among luminal B candidate genes, only CIT (12q24) (3%), CHD6 (20q12) (2%), MDN1 (6q15), SRGAP1 (12q14.2) (1.5%), UTRN (6q24), BRCA1 (17q21) and EVPL (17q25) (less than 1%) were also targeted by mutations. Red indicates increased DNA methylation score and green indicates decreased DNA methylation score. The dendrogram of samples (above matrixes) represents overall similarities in DNA methylation profiles and is zoomed to the right. Three groups of tumor samples (I, IIa and IIb) are associated with various DNA methylation patterns and delimited by orange vertical lines. Below the dendrogram are some histoclinical and molecular features of the samples: from top to bottom, intrinsic molecular subtypes 4 , IHC ER status and SBR grade. Color legends for the various features are illustrated below. B -The median methylation levels of the 4,545 subtype-associated genes were highest in the luminal and ERRB2 subtypes and lowest in the basal subtype (p = 6.24 10 212 ). C -Compared to the other molecular subtypes, the DNA methylation levels (three top panels) of ASS1, C6ORF145 and ZFP36L2 gene promoters and their mRNA expression (three bottom panels) were higher (

Specific gains and amplifications target oncogene candidates in luminal B tumors
The high proportion of complex ''sawtooth and firestorm'' genomic profiles observed in luminal B tumors suggests a high level of genomic instability. Expectedly, among the most common specific alterations in luminal B tumors ($30%) were the amplified 8p11-p12 and 11q13.1-q13 regions. In these regions ZNF703, FGFR1 and CCND1 have already been found associated with luminal B BCs [9,19,46,51]. The comparison between luminal B and luminal A genomic profiles showed only differences in the frequency of these, which may contribute to phenotypic differences. ZNF703 interacts with WDR68/DCAF7, PHB2 and HSP60 and induces transcriptional repression [19]. Here, we found that WDR68/DCAF7 (17q23.3) is indeed a candidate luminal B oncogene. Genes of the 8p11 region were coamplified with ZNF703, supporting the idea that the 8p amplicon carries multiple genes, such as RAB11FIP1/RCP [47], FGFR1 [54] and PPAPDC1B [45], which contribute to the luminal B phenotype.
In the 8p12/11q13 coamplified regions [55], EIF4EBP1 and RPS6KB2 could be co-targeted and play a synergistic role associated with the development and cancer progression as AKT/MTOR activators [56]. The recurrent 8p12/11q13 coamplification was often accompanied by gain of 17q25.1 containing RPS6KB1, a paralog of RPS6KB2, suggesting again the involvement of the AKT/MTOR pathway in luminal B oncogenesis.

Specific losses target TSG candidates in luminal B tumors
Chromosome arm 6q showed frequent deletions in luminal B tumors. Several 6q regions were lost and rare homozygous or focal deletions of ARID1B, ASCC3, FOXO3, PARK2, MLLT4/Afadin and *Gene mutation associated with the corresponding molecular subtype. doi:10.1371/journal.pone.0081843.t002 UFL1/NLBP genes were observed. Deletion of 6q is also frequent in a variety of cancers [57]. Several other chromosomal regions were targeted by losses suggesting the existence of several potential TSGs in luminal B cancers (Table S7). We identified luminal B candidate TSGs PNRC1 (6q15), PTPRK (6q22.33) and L3MBTL4 (18p11.31). We previously reported L3MBTL4 as a potential BC TSG. PNRC1 is a coactivator of nuclear receptors such as ER. PTPRK negatively regulates the transcriptional activity of ßcatenin in cancer cells.
Gene expression deregulation of luminal B candidate genes could perturb epigenetic regulation Eleven of the luminal B candidate genes we found have been associated with endocrine resistance [33,34,44]. Upregulation of FOXM1 [58] could explain in part the DNA methylated landscape characterizing luminal B tumors. The deregulation of ARID1B, ASHL2, CHD6, KDM4C/JMJD2C, L3MBTL4 and ZFP161 suggests an important perturbation of the epigenetic regulation in luminal B tumors.
BC subtypes have specific methylation profiles [20,21]. Luminal B and basal BCs were reported as the most and least frequently methylated, respectively [21]. Our data are in agreement with these observations; the median methylation level of the 4,545 subtype-associated promoters was the highest in the luminal and ERRB2 subtypes and the lowest in the basal subtype. High DNA methylation level associated with the luminal B subtype targeted CITED4, SP100, SAMD9L, DCR1, FBXO32, ASS1, FAM78A and STAT5A genes previously reported as TSGs or associated with tumor progression ( Table S7). None of them is located in 6q. The two specific luminal B candidates, ASS1 and ZFP36L2 downregulated in relation with DNA methylation have been reported targeted in cancers. DNA methylation of the ASS1 promoter leading to its downregulation was associated with poor prognosis and chemoresistance in various cancers. Several phase I/II clinical trials of the arginine-lowering drug, pegylated arginine deiminase, showed encouraging evidence of clinical benefit and low toxicity in patients with ASS1-negative tumors that might be extended to luminal B tumors.
Finally, we further noted that multiple alternatively spliced transcript variants have been described for candidate genes ASB13, CIT, CPSF1, EPN2, LSM1, PDCD4, RNF146, TAF4, VTCN1. Transcripts of CIT, CPSF1, PDCD4 and TAF4 are overlapped by MIR1178, MIR939 and MIR1234 (both), MIR4680 and MIR1257, respectively [59]. Changes in MIR expression may also contribute to luminal B tumorigenesis by modulating the functional expression of critical genes involved in cancer growth and metastasis.

Mutation status
As expected, TP53 and PIK3CA were among the most frequently mutated genes in our luminal B series. FOXO3 and RUNX3 genes were lost and mutated. A possible role for RUNX3 as a tumor suppressor in ER+ BCs has been suggested [60,61]. FOXO3 functions as a tumor suppressor in both ER+ and ER2 BCs [62,63]. Its nuclear localization and subsequent transcriptional activity is a marker of good prognosis among breast cancer patients [64].
Our meta-analysis of NGS samples showed the frequent mutations of PIK3CA, TP53 and GATA3 in luminal B tumors but also that KCNB2 gene mutations were, for an unexplained reason, associated with this aggressive subtype. A total of 91 cooccurring mutations and GATA3/PIK3CA gene mutual exclusive mutations completed the landscape associated with luminal B tumors. Surprisingly, RUNX3 and FOXO3 mutations were not found in the NGS studies. This may be due the limited coverage and low depth of these early analyses, or to BC heterogeneity and the large number of alterations that could lead to similar deregulations of particular pathways.

Potential targeted therapy in luminal B breast cancer
Hormonal therapy is the preferred treatment for about twothirds of all BC patients whose tumor expresses ER. ER+ BCs are commonly treated with anti-estrogens or aromatase inhibitors [65]. Luminal B tumors respond less than luminal A tumors to such treatments. Seven of the luminal B candidate oncogenes we have identified could be targeted [43]. Several phase I/II clinical trials targeting IGF, FGF and PI3K/AKT pathways in luminal B BC have been reported [32]. Except for the IGF pathway, our data are in agreement with the potential activation of the FGF, PI3K/AKT, PIK3/MTOR pathways subsequent to the overexpression of candidate oncogenes such as RPS6KB1, RPS6KB2, EIF4EBP1, FGFR1, FRS2, and RAB11FIP1 and the high frequency of PIK3CA mutations. Interestingly, targeting the PI3K/AKT/ mTOR pathway is one of the most-promising therapeutic approaches to reversing endocrine resistance for ER positive breast cancer (for review, [66]). Therapeutic strategies combining endocrine treatment and signaling pathways inhibition (such as targeting growth factor receptors (ERBB2) PI3K/AKT/mTOR, CDK4 and CDK6, MDM2-TP53 interaction, FGFR pathway aberrations and histone deacetylases) might be pertinent [66]. However, the luminal B molecular heterogeneity depicted by its complex genomic and epigenomic landscape strongly suggests that other potential targets could also exist and should be identified for each patient. The candidate oncogene RECQL4, which might be involved in endocrine resistance [44], could be also a potential therapeutic target [43] as recently shown in breast cancer cells [52].
The targeting of candidates such as YWHAZ and its coregulated proteins such as FOXM1, or NHERF1, could also restore endocrine sensitivity and reduce the risk of BC recurrence [32,34,44,67,68]. FOXM1 has a significant role in endocrine therapy resistance [67]. In endocrine therapy-resistant BCs, high FOXM1 expression could result from the loss of FOXO function as transcriptional repressor.
In conclusion, this refined molecular dissection of luminal B BCs has pointed to both new and well-known specific candidates. Some code for proteins that participate to the same signaling pathways including those known to cross-talk with ER signaling pathways. Many of the candidate genes have not been previously reported in breast cancer, and deserve further functional validation. Similarly, further characterization of the 6q TSGs is an important goal. This should help better understand pathways and mechanisms affected, and find new therapeutic targets.   4 , GC (genome complexity) (green for simplex, orange for complex saw tooth and brown for complex firestorm), SBR grade (white for grade I, grey for II, and black for III), and IHC ER, ERBB2, and P53 status (white for negative, and black for positive). Crossed white boxes mean not assigned samples.  1-q13.4 (bottom) regions. For each region, heatmaps for genome copy number and gene expression profiles are consecutively drawn. Genome copy number was measured by aCGH on probes or groups of probes spanning each of these regions. Red indicates increased copy number and green indicates decreased copy number. In the heatmap tumors are organized from the tumor that presented the highest copy number gains and amplification to the tumor that exhibited the most copy number losses. The next heatmap was established with the expression of the independent genes located on the corresponding region and profiled in the same 188 tumors similarly organized. For gene copy number and gene expression heatmaps, we used colour scale limits from 23 to +3 and 22 to +2, respectively. Next to the right, are plotted genes successively selected by steps I, II and III of the integrated analysis ''aCGH & mRNA expression'' as defined by the work pipeline ( Figure S2). Grey and red lines correspond to rejected and selected genes, respectively. Among genes with an expression level that varied according to CNAs, we retained genes showed significant differences (vertical line) in copy number gains correlated with upregulated expression in luminal B compared to non-luminal B tumors. They were qualified as potential oncogenes.. For each region, only the first five most significant are listed. ZNF703 and CCND1 genes were the most significant candidate oncogenes for the 8p11.1-p12 (top) and 11q13.1-q13.4 (bottom) regions, respectively. (TIFF) Figure S9 Luminal B candidates and gene CNAs landscape. The Circos diagram presents from outside to inside, luminal B candidate genes, luminal B altered chromosomes, luminal B regional CNAs colored in red and green for significant gains and losses, respectively. Oranges and grey arcs indicate respectively genes/regions that present significant mutually exclusive and cooccurring luminal B CNAs (FDR,0.05) as identified in Table  S3N.  (no apparent CNA), T9345, T8056, T50115, T9934,  T11568, T15120, and T9207. Arrow shows RUNX1 location on each genomic profile. Results show that RUNX1 is targeted by potential breaks in T8056, T9207 and T15120, but also by regional deletions (samples T50115, T9934, T11568, T15120, and T9207). The genomic profiles of T11568 (B), T15120 (C), and T9207 (D) BCs presenting the smallest regional deletions were established with CGH analyticsH software (Agilent Technologies), from centromere to telomere, within the genomic intervals [34.0-36.4 Mb] of the long arm of the chromosome 21. The common smallest deleted region observed in T9207 (D) involves RUNX1.

(TIF)
Results and References S1 In this section, results about: validation of our methylation approach; DNA methylation level associated with the other breast cancer molecular subtypes; and specific deregulated gene expression in relation with the DNA methylation level variation associated with the other breast cancer molecular subtypes; are presented with supplementary references. (DOCX)    [28] was done. C: Comparison of expression levels according to CNAs (Mann-Whitney ; FDR,0.05) in luminal B vs basal BCs. D: Identification of specific luminal B candidate genes from comparison between luminal B and basal BCs. A comparison with TCGA data [28] was done. E: Comparison of expression levels according to CNAs (Mann-Whitney ; FDR, 0.05) in luminal B vs non luminal B BCs. F: Identification of specific luminal B candidate genes from comparison between luminal B and non-luminal B BCs. A comparison with TCGA data [28] Table S6 A: References of the six recent NGS studies used in the meta-analysis (dataset Meta-analysis). B: Frequency of somatic mutations identified in the 875 NGS breast tumors. C: Cooccurring and mutually exclusive mutations in the 875 NGS BCs (without molecular subtype distinction). D: Frequency of somatic mutations and association with molecular subtypes. E: Cooccurring and mutually exclusive gene mutations associated with luminal A subtype. F: Co-occurring and mutually exclusive gene mutations associated with luminal B subtype. G: Co-occurring and mutually exclusive gene mutations associated with basal subtype. H: Co-occurring and mutually exclusive gene mutations associated with ERBB2 subtype. I: genes targeted by several alterations mechanisms in luminal B subtype. (XLSX)

Table S7
Genes commented in discussion with supplemental references. (XLSX)