High-Resolution Comparative Genomic Hybridization of Inflammatory Breast Cancer and Identification of Candidate Genes

Background Inflammatory breast cancer (IBC) is an aggressive form of BC poorly defined at the molecular level. We compared the molecular portraits of 63 IBC and 134 non-IBC (nIBC) clinical samples. Methodology/Findings Genomic imbalances of 49 IBCs and 124 nIBCs were determined using high-resolution array-comparative genomic hybridization, and mRNA expression profiles of 197 samples using whole-genome microarrays. Genomic profiles of IBCs were as heterogeneous as those of nIBCs, and globally relatively close. However, IBCs showed more frequent “complex” patterns and a higher percentage of genes with CNAs per sample. The number of altered regions was similar in both types, although some regions were altered more frequently and/or with higher amplitude in IBCs. Many genes were similarly altered in both types; however, more genes displayed recurrent amplifications in IBCs. The percentage of genes whose mRNA expression correlated with CNAs was similar in both types for the gained genes, but ∼7-fold lower in IBCs for the lost genes. Integrated analysis identified 24 potential candidate IBC-specific genes. Their combined expression accurately distinguished IBCs and nIBCS in an independent validation set, and retained an independent prognostic value in a series of 1,781 nIBCs, reinforcing the hypothesis for a link with IBC aggressiveness. Consistent with the hyperproliferative and invasive phenotype of IBC these genes are notably involved in protein translation, cell cycle, RNA processing and transcription, metabolism, and cell migration. Conclusions Our results suggest a higher genomic instability of IBC. We established the first repertory of DNA copy number alterations in this tumor, and provided a list of genes that may contribute to its aggressiveness and represent novel therapeutic targets.


Introduction
Inflammatory breast cancer (IBC) [1] is one of the most lethal forms of breast cancer because of its high metastatic potential [2]. IBC is classified T4d in the TNM-UICC classification. Diagnosis is based on clinical signs: edema, erythema, pain, breast widening, and induration [2]. Most cases are associated with a ductal type and a high histological grade [3]. The presence of tumor emboli in dermal lymphatics, although not mandatory for the diagnosis, is a pathological hallmark of 50-75% of IBCs, Emboli are nonadherent cell clusters that rapidly spread by continuous passive dissemination [4], thus favoring both distant metastasis and local recurrence. Despite progresses due to the multi-modality treatment [2,5], the prognosis remains poor with a 3-year survival of ,50% [6]. New molecular therapeutic targets need to be identified to improve treatment and increase patient survival.
High-throughput molecular approaches have led to important insights in the molecular heterogeneity of non-IBC (nIBC), notably by identifying biologically and clinically relevant subtypes (luminal A and B, basal, ERBB2-overexpressing, normal-like) [26]. More recently, IBCs have been studied at the mRNA level using DNA microarrays [27][28][29][30][31][32][33]. The results showed the presence of the five molecular subtypes in IBCs, and the definition of IBC versus nIBC gene expression signatures. But the studied series remain relatively small, with 35 IBC samples in the largest one [27,28]. DNA copy number alterations (CNAs) may account for phenotypic and expression differences between IBCs and nIBCs. However, in contrast to nIBCs [34][35][36][37][38][39], genomic imbalances have not yet been analyzed in IBC by using recent techniques such as arraycomparative genomic hybridization (aCGH) or SNP-arrays. The rare genomic studies performed to date used low resolution methods [40][41][42]. Similarly, whole-genome integrated studies (associating analysis of DNA CNAs and mRNA expression levels) have been done in nIBC [34,39,43,44], but never in IBC. Such approaches provide opportunities to better elucidate IBC at the DNA and RNA levels.
Here, we have studied and compared DNA CNAs and mRNA expression deregulation on a whole-genome scale in a large series of IBCs and nIBCs. To our knowledge, this is both the first highthroughput aCGH analysis of IBC and the first whole-genome integrated analysis comparing IBC vs nIBC. This is also the largest series of IBC profiled using high-throughput molecular analyses.

Breast cancer samples and histoclinical characteristics
Pre-treatment tumor tissues were collected from 197 patients with invasive adenocarcinomas, who underwent surgical biopsies or initial surgery at the Institut Paoli-Calmettes (IPC, Marseille, France) between 1987 and 2007. Each patient gave written informed consent and the study was approved by the IPC ''Comité d'Orientation Stratégique''. Tumor samples were macrodissected and frozen in liquid nitrogen within 30 minutes of removal. Before RNA extraction, tumor sections were reviewed by two pathologists (ECJ and JJ) and contained more than 60% of tumor cells. The 197 samples comprised 63 IBCs and 134 nIBCs. IBC tumors were selected upon clinical criteria as T4d tumor. The control group (nIBCs) represented a mixture of early (121 samples, including 68 with pathological axillary lymph node involvement).and locallyadvanced stages (13 samples), as found in previous studies [27][28][29]32,33]. Using only locally-advanced cases as controls would introduce a bias toward the discovery of molecular differences related to the difference of growth kinetics between IBCs (sudden onset and rapid growth) and nIBCs (long-standing evolution with slower growth). Immunohistochemistry (IHC) status was available on most samples for ER (positivity cut-off: 10%), ERBB2 (0-3+ score, DAKO HercepTest, with .1+ defined as positive), and P53 (positivity cut-off: 1%). Patients were treated according to standard guidelines. The main histoclinical characteristics are listed in Table 1. As expected, IBCs were associated with more unfavorable prognostic features and poorer survival than nIBCs.

DNA and RNA extraction
DNA and RNA were extracted from frozen samples by using guanidium isothiocynanate and cesium chloride gradient. DNA and RNA quality was respectively controlled on polyacrylamide gel electrophoresis and on Agilent Bioanalyzer (Agilent Technologies, Massy, France).

Array-comparative genomic hybridization profiling and data analysis
From the 197 samples analyzed, DNA was available for 173, including 49 IBCs and 124 nIBCs. Genomic imbalances of the DNA samples were determined by using high-resolution 244K CGH microarrays (Hu-244A, Agilent Technologies, Massy, France). A pool of 13 normal male DNA was used as reference. Scanning was done with Agilent Autofocus Dynamic Scanner (G2565BA, Agilent Technologies). Data analysis was done and visualized with CGH Analytics 3.4 software (Agilent Technologies). Extraction of data (log 2 ratio) was done from CGH analytics, while normalized and filtered log 2 ratio were obtained from ''Feature extraction'' software (Agilent Technologies). Data generated by probes mapped to X and Y chromosomes were eliminated . The final dataset contained 225,388 unique probes  covering 22,509 genes and intergenic regions according to the  hg17/NCBI human genome mapping database (build 35). Data were analyzed using circular binary segmentation (CBS) as implemented in the DNA copy R/Bioconductor package [45] with default parameters to translate intensity measurements in regions of equal copy number, each region being defined by at least five consecutive probes. Thus, each probe was assigned a segment value referred to as its ''smoothed'' value. We used two different threshold values (log 2 ratio .|0.5|, and |1|) to distinguish low level from high level CNAs [43]. DNA copy number patterns were classified as ''simplex'' (very few CNAs), ''complex sawtooth'' (many low-level CNAs), or ''complex firestorm'' (low-level CNAs and recurrent amplifications) [46,47]. Unsupervised analysis was done with the Cluster program [48] using log 2 ratio data, complete linkage and Pearson correlation as similarity metrics. Results were displayed using TreeView [48]. To determine the robustness of the resulting tumor clusters, we used R package Pvclust [49] with multiscale bootstrap resampling using 1,000 iterations. Approximately Unbiased (AU) p-values $90% were used to define the significant clusters. To identify altered regions, we used the GISTIC algorithm [50], which computes for each segment through the genome a score based on the frequency of CNA combined with its amplitude, with bootstrapping to calculate the significance level (p,0.05). To identify genes with different CNA frequencies between IBCs and nIBCs, the frequency of CNAs was computed for each probe locus as the proportion of samples showing an aberration therein. Alteration frequencies were compared using the Fisher's exact test and false discovery rate (FDR) was applied to correct the multiple testing hypothesis (p,0.05) [51].

Gene expression profiling and data analysis
Gene expression data of the 197 BCs and 4 normal breast (NB) samples, which represented 1 pool of 4 samples from 4 women, and 3 commercial pools of respectively 1, 2 and 4 normal breast RNA (Clontech, Palo Alto, CA), were quantified using wholegenome DNA microarrays (HG-U133 Plus 2.0, Affymetrix). Scanning was done with Affymetrix GeneArray scanner and quantification with Affymetrix GCOS software. Data were analyzed by the Robust Multichip Average method in R using Bioconductor and associated packages [52]. Robust Multichip Average did background adjustment, quantile normalization, and summarization of 11 oligonucleotides per gene. Before analysis, a first filtering step removed from the dataset the genes with low and poorly measured expression as defined by an expression value inferior to 100 units in all samples. All data was then log 2transformed for display and analysis.
Before clustering analysis, a second filtering retained the 12,813 probe sets with the most variable expression across all samples. Clustering was done with the Cluster program [48] using Pearson correlation as similarity metrics and centroid linkage clustering. Results were displayed using TreeView program [48]. The molecular subtypes of breast cancer were determined using the single sample predictor (SSP) classifier based on the list of 306 intrinsic genes as described [53]. The sample was attributed the subtype corresponding to the most correlated centroid. To develop a predictive model based on the expression of the 24 genes identified by integrated analysis (see below), we applied a logistic regression analysis using the glm function in R statistical package. Once defined, the model was applied to expression data to assign the ''IBC-like'' or the ''nIBC-like'' class: first for testing its robustness by leave-one-out cross-validation [54] in the learning set and by validation in an independent set of 24 IBC and nIBC samples, then for estimating its prognostic value in public nIBC datasets.

Integrated analyses of genome and expression data
Before integrated analysis, Affymetrix expression probe sets were matched with Agilent aCGH probes using the hg17/NCBI database. When multiple Affymetrix probe sets mapped to the same gene, the probe sets with an extension « _at », next « s_at », and followed by all other extensions were preferentially kept. When several probe sets with the best extension were available, the one with the highest median value was retained.
We analyzed the degree of correlation between RNA expression and CNAs for the genes and tumors (173 tumors) profiled with both platforms. For each gene with a CNA in at least two IBCs or nIBCs, a Student t-test compared the expression levels in samples with and without the corresponding CNA (FDR-corrected p,0.05). Comparative integrated analysis of IBCs and nIBCs was only applied to the genes with CNA frequencies significantly different between IBCs and nIBCs (628 genes). Genes had to satisfy three other criteria: i) frequencies of combined alterations (gain associated with overexpression vs other combinations, and conversely, loss associated with underexpression vs other combinations) different (Fisher exact test) between IBCs and nIBCs, ii) correlation (Student t-test) between CNA and expression in the 173 samples, and iii) expression different (Student t-test) between IBCs and nIBCs. In the first above-quoted step, overexpression and underexpression for a given gene were assigned using a threshold of |1| corresponding to twice the expression level found in the normal breast pool. These steps are summarized in Figure S1.

Public gene expression data of breast cancer
To test the prognostic performance of our 24-gene signature in nIBCs, we analyzed 6 public data sets collected from five publications [55][56][57][58][59], and from the UNC Microarray Database (Table S1). When different publications included the same patients redundancy was eliminated, resulting in 1,781 different patient samples available for analysis. Before analysis, we mapped hybridization probes for the 24 genes across the two oligonucleotide-based platforms used across the series. When multiple probes were mapped to the same GeneID (EntrezGene identification number), the one with the highest variance in a particular dataset was selected to represent the GeneID. Analysis of each data set (using available normalized data) was done separately to guarantee a larger number of genes common with our signature.

Statistical analysis
Correlations between sample groups and histoclinical factors were calculated with the Fisher's exact test for qualitative variables with discrete categories, and the Mann-Whitney test for continuous variables. Follow-up was measured from the date of diagnosis to the date of last news for patients without any event. Disease-free survival (DFS) was calculated from the date of diagnosis until the date of first relapse whatever its location (local, regional or distant) or date of death (when the relapse data was not available) using the Kaplan-Meier method. Survival was compared between groups with the log-rank test. Univariate and multivariate analyses were done using Cox regression analysis. The p-values were based on the Wald test, and patients with one or more missing data were excluded. All statistical tests were twosided at the 5% level of significance. Analyses were done using the survival package (version 2.30), in the R software (version 2.9.1).

Genome and transcriptome analysis of breast cancer
We first describe the results on the whole set of tumors before addressing the specific question of IBC. High-resolution aCGH was performed on 173 samples, including 49 IBCs and 124 nIBCs. Figure 1A (left) shows the frequency of low level CNAs in all samples. As previously reported [37,44], the three most frequently gained regions were on 1q, 8q and 17q chromosomal arms, whereas the regions frequently lost were on 8p, 11q and 16q. The median percentage of probe sets displaying a CNA in a sample was 3.5%, with a great variability between samples (range, 0.03-44%). As expected, this percentage was higher in grade 3 tumors (2.1%) than in grade 1 tumors (0.5%; p = 0.005; Mann-Whitney test).
The 173 samples and 24 additional samples (197 including 63 IBCs and 134 nIBCs) were profiled using whole-genome mRNA expression microarrays. Figure S2 shows the hierarchical clustering of samples based on the expression of 12,813 probe sets. Samples were sorted into three major clusters, which strongly correlated with histoclinical features (grade, IHC data) and molecular subtypes, but not with the IBC status. As with aCGH analysis, differences due to molecular subtype greatly overcame differences due to clinical type. IBCs were scattered across the three clusters, intermixed with nIBCs, suggesting that, at the RNA level and on a whole-genome scale, they are as heterogeneous as nIBCs.

Regions and genes with significant genome aberrations in IBCs
Our first objective was to establish the first repertoire of genome CNAs in IBC. Table S3 shows the regions with frequent low and/ or high level CNAs in the 49 IBCs, defined using the GISTIC algorithm with a significance threshold of p,0.05. Sixty-five regions were gained/amplified. The top ten regions (in term of median GISTIC score) resided on 17q, 6p, 1q, 8q, 11q, 19q, and 8p chromosomal arms. Many of these regions contain genes involved in breast cancer, such as ERBB2, MYC, CCND1 and FGFR1. Thirty-four regions were gained in at least 15% of IBCs.
The most frequently gained region (.40%) was 8q12.1-24.3, and contained 461 genes. A total of 216 regions were lost/deleted, with the top ten (in term of median GISTIC score) distributed in 4q, 3q, 1q, 9q, 15q and 2q. Only 4 regions were lost in at least 15% of IBCs.

Comparative genome analysis of IBCs and nIBCs
We compared the genomic profiles of 173 samples (49 IBCs and 124 nIBCs). Globally, as shown in Figure 1A, IBCs looked like nIBCs, since both phenotypes showed similar altered regions with similar frequencies of alterations for most of them. However, some differences were visually apparent, such as the more frequent gain of 1q, 8q and 17q regions in IBCs, or the more frequent loss of 4p, 8p, 11q, and 16q regions in nIBCs. The median percentage of probe sets displaying a CNA for a sample was higher in IBCs (3.7%, range 0.01-14%) than in nIBCs (1.9%, range 0.01-26%; p = 6.1.E-253, Fisher's exact test), even if a great variability between samples existed for both types. IBCs showed a higher proportion of ''complex sawtooth'' and ''complex firestorm'' profiles than nIBCs, which conversely showed a higher proportion of ''simplex'' profiles (p = 0.0009, Fisher's exact test) ( Figure S3).

Correlations between gene expression and CNA
We compared the degree of CNA-driven mRNA up and downregulation in 49 IBCs and 124 nIBCs profiled on both platforms by analyzing the 13,127 genes common to the two platforms and retained after filtering based on the expression levels.
In IBCs, from all genes with a CNA (gains/amplifications and losses/deletions) in at least two samples, 10.4% of gained genes exhibited mRNA expression correlated with the gain, and 1.5% of lost genes exhibited expression correlated with the loss (Table S8). In nIBCs, these respective features were 10.4% and 9.5%. For the gained genes, the percentages of correlations were similar in IBCs and nIBCS but for the lost genes, this percentage was 7-fold more important in nIBCs than in IBCs (p,1.E-12, Fisher's exact test; OR = 0.14 (CI95% 0.09-0.20)).

Integrated comparative analysis and IBC-specific candidate genes
To identify potential IBC-specific candidate genes, we compared in the 173 IBCs and nIBCs only the genes with deregulated mRNA expression in relation to CNA ( Figure S1).
Out of the 628 genes identified by supervised analysis with CNA frequencies different between IBCs and nIBCs, 500 were present on the Affymetrix microarrays. They were represented by 748 probe sets on these microarrays, and 4,259 probes on the Agilent microarrays. From these 500 genes, we identified 24 genes whose expression was deregulated in relation to CNA with significant differences between IBCs and nIBCs ( Table 3; Table S9; Figure 2). In all cases, the transcriptional deregulation was associated with IBCs only and corresponded to an overexpression related to a gain (21 genes) and/or amplification (13 genes). By definition, these 24 genes were also overexpressed in IBCs as compared to nIBCs. Twenty of these genes are located in 8q22-q24 and 17q21, including PAPBC1, RAD21, ATAD2, MTSS1, SQLE, ST3GAL1, C17orf37, ABCC3 and PTPN2, previously reported as cancer-related genes. In contrast, no candidate gene was both downregulated and loss-targeted in IBCs.
To validate the association of these 24 genes with the IBC/ nIBC distinction, we analyzed the 24 remaining breast tumors (14 IBCs and 10 nIBCs) only profiled on our Affymetrix platform. None of them had been included in the 173 tumors from which the 24 gene-list had been derived. First, we defined an IBC/nIBC genomic classifier by applying logistic regression to the expression data of the 24 genes in the 173 samples (learning set). As expected, the rate of accurate resulting classification was high (86%). Leaveone-out cross-validation gave similar results with a rate of 82% (p = 2.94E-11, Fisher's exact test; Figure 3A). Then, we applied the predictor to our validation set of 24 samples ( Figure 3B): 75% of samples (10/14 IBCs and 8/10 nIBCs; p = 0.03, Fisher's exact test) were correctly classified, suggesting the robustness of the classifier.
Because of the differences between IBC and nIBC regarding some histoclinical features such as grade, ER, ERBB2 and P53 status, we verified that our 24-gene signature was more associated to the IBC/nIBC distinction than to any of these features. This was done by comparing the p-values testing the correlations between the ''IBC-like''/''nIBC-like'' classes and each feature ( Figure 3A-B). This was further confirmed by regression analysis on the whole set of samples testing and comparing the ability of each variable (24-gene signature, grade, ER, ERBB2 and P53 status) to discriminate IBC from nIBC (Table 4). Three features were significant in univariate analysis (24-gene signature, ERBB2 and P53 status), and two, including the 24-gene signature, remained significant in multivariate analysis. Altogether, these data indicated that our 24-gene signature is more linked to the distinction IBC/niBC than to other potential confounding variables.

Prognostic value of the 24-gene signature
Given the poor prognosis of IBCs, we hypothesized that the 24gene predictor, if biologically relevant with respect to the IBC/ nIBC distinction, might be prognostic in breast cancer. We thus tested its prognostic value in a series of 1,781 clinically annotated nIBCs. Using our 24-gene model (the number of genes common with each data set ranged from 17 to 24), we attributed to each sample an ''IBC-like'' or ''nIBC-like'' profile. All series were then pooled, resulting in 338 nIBCs with an ''IBC-like'' profile and 1,323 with a ''nIBC-like'' profile ( Figure 3C).
We compared the DFS of these two nIBC classes for the 1,420 patients with available clinical outcome. With a median follow-up of 91 months after diagnosis (range, 1 to 220), the ''nIBC-like'' class had a 5-year DFS of 73%, better than the survival of the ''IBC-like'' class with a 5-year DFS of 61% (p = 4.4.E-4, log-rank test). Survival curves are shown in Figure 3D.
We then performed univariate and multivariate DFS analyses (Table 5). In addition to the ''IBC-like'' or ''nIBC-like'' profile, we tested the variables most frequently annotated in the six data sets: patients' age, pathological tumor size, axillary lymph node status, and grade, and IHC ER and PR status (ERBB2 status not available). All features, except age and PR status, were significant

Discussion
We applied high-throughput molecular analyses to a large series of IBCs and nIBCs. Because copy-number changes drive a considerable proportion of the transcriptional changes [34], we compared genome copy number and expression profiles to identify potential IBC-specific candidate genes. To our knowledge, this is the first high-resolution aCGH study of IBC, the first integrated genomic analysis for IBC vs nIBC comparison, and the largest series of IBCs profiled using high-resolution genomic analytic tools.

Genomic alterations in IBC
Because of the high number of CNAs observed in breast cancer, we used stringent log 2 ratio threshold values to define the most specific genomic aberrations. We did not separate IBCs from nIBCs on the basis of whole genome profiles, which were globally very close, suggesting that IBCs are as heterogeneous as nIBCs at the genome level and that different obvious genome alterations are not what distinguish them. However, the number and frequency of CNAs were more important in IBCs than in nIBCs, as well as the proportion of ''complex sawtooth'' and ''complex firestorm'' profiles, clearly suggesting that the genomic differences between them are not due to a possible damping of the nIBC signal by higher contamination with normal tissue. Among the 628 genes with CNAs differentially represented between IBCs and nIBCs, 88% were associated with IBCs and only 12% with nIBCs. These results indicate a higher degree of genomic instability in IBCs, in agreement with their high grade, frequency of P53 mutations and their aggressiveness. Given this genomic complexity and heterogeneity of IBCs, and the low degree of differences observed with nIBCs globally, future studies should ideally compare IBCs and nIBCs within molecular subgroups defined by expression (molecular subtypes) and/or by CNA (simplex, complex sawtooth, complex firestorm) profiles as recently suggested [60]. In this context, and given the scarcity of IBC, international collaborations are underway for collecting enough IBC samples that should allow identifying genomic alterations perhaps more specific of IBC.
However, even if the number of profiled samples is relatively small, our series represents the largest tumor set reported in literature, and its study provides several interesting results. We identified 65 regions with gain/amplification and 216 with loss/ deletion in IBCs. Few data are available in literature regarding the genomic imbalances in IBCs [40][41][42], and none has been done using high-resolution aCGH. A study of loss of heterozygosity at 71 microsatellite markers in 66 IBCs [41] reported deletions of various chromosomal regions including 3p14-p21.2, 6p, 8p22, 11q22-q23, 11q24-q25, 13q14, and especially 17q21, more frequently than in nIBCs. Our results show some overlap with these results (Table S3). We here identified four 17q21 genes, NSF, ARL17P1, ARL17 and KIAA1267 targeted by loss in 17% of IBCs (Table S3).
In both IBCs and nIBCs the number of regions with CNAs was very close and many altered regions were similar, suggesting that common genes are involved in both types. However, some common regions displayed higher frequency and/or higher amplitude of alteration in IBCs, and other regions were specifically altered in IBCs such as 5p15.33, previously associated with poor prognosis in breast [61] and bladder cancers [62].  At the gene level, the results were similar. Many genes were similarly altered in IBCs and nIBCs in terms of frequency and amplitude of aberrations. However, a larger number of genes displayed recurrent amplifications in IBCs than in nIBCs. Some genes, such as CCND1 and FGFR1, were amplified at the same frequency in the two types, whereas others (99 out of 321 recurrently amplified, 31%), such as ERBB2 and MYC, were more often amplified in IBCs. Of note, all high-level CNAs differentially represented between the two types were associated with IBCs, and none with nIBCs. Several of them code for validated or potential therapeutic targets, which could contribute to enlarge our therapeutic armament in IBC. The difference between IBCs and nIBCs was much less important regarding the genes recurrently deleted. The deletion of the RB1 tumor suppressor gene likely contributes to the genomic instability in IBC.

IBC and transcriptional deregulation
Whole-genome gene expression profiling did not distinguish IBCs and nIBCs more than did aCGH. To our knowledge, the information about the relationship between gene CNAs and mRNA expression in IBC is scarce in the literature. In the present study, we restricted the analysis of expression data to genes with DNA CNA. Having identified the genes with a CNA in at least two IBCs and in at least two nIBCs, we determined those whose mRNA expression correlated with the CNA. In both tumor types, ,10% of gained genes presented such correlation, in agreement with previous observations obtained with low resolution techniques and less samples in nIBC series [34,63]. In contrast, this percentage for the genes with loss/deletion was smaller in IBCs (1.4%) than in nIBCs (9.5%), suggesting that epigenetic mechanisms might be more operational in IBCs than in nIBCs.

IBC candidate genes
Integrated analysis of aCGH and expression data identified 24 genes as potential IBC-specific candidate oncogenes, whereas no IBC-specific gene inactivated by loss was found. This does not rule out the likely existence of IBC-specific tumor suppressor genes inactivated by other mechanisms, as well as the existence of a gene expression signatures identified by the sole comparison of whole-genome expression data of IBC vs nIBC. Importantly, the discriminative power of the predictive model -built from the expression levels of these 24 genes -was validated in an independent sample set. Furthermore, this model was an independent prognostic feature in a multicentric series of 1781 nIBCs, indirectly validating its association with IBC, known to be more aggressive than nIBC.
Whether these genes are causative or even predictive of the IBC phenotype in a biological sense or reflect aggressiveness or another associated phenomenon remain to be explored by further in-depth experimental analyses. Several encode proteins involved in the protein translation and transport: MRPL27, a component of mitochondrial ribosomes; VPS13B, involved in vesicle-mediated sorting and transport of proteins within the cell; ABCC3, an ATPbinding cassette transporter, and PABPC1 [64]. Increasing evidence points to a crucial role of translational regulation in cancer development and progression, notably in IBC [65]. PABPC1 is a poly(A)-binding protein (PABP) required for translation initiation. Its interaction with the translation initiation factor eIF4G is crucial for the translational stimulatory effect conferred by the poly(A) tail. In IBC, eIF4G1 reprograms the protein synthetic machinery for specifically increasing the translation of certain mRNAs, notably that encoding p120 catenin, resulting in an increased stabilization of E-cadherin, and that encoding VEGF [66, Silvera, 2009]. E-cadherin stabilization maintains the structure of tumor emboli, allowing them to survive and to metastasize as entire structures. VEGF expression accounts for high levels of angiogenesis in IBC and resistance to hypoxia. Our result suggests that PABPC1 could also participate and potentiates this process, allowing IBC cells to adapt to the persistent hypoxia they experience as tumor emboli. Other genes are associated with cell cycle progression: RBM13/MAK16 [67], TAF2, ATAD2, UTP23, MTBP, and DSCC1. TAF2 is a general transcription factor particularly involved in the G2/M transition [68]. ATAD2, as target of E2F, ER and coactivator of MYC [69], links the 3 corresponding pathways, and likely contributes to the aggressiveness of disease through the enhancement of MYC-dependent transcription [70]. UTP23 [71] is a component of the small subunit processome, required for ribosome biogenesis and cell cycle progression at G1. MTBP regulates the E3 ubiquitin ligase activity of MDM2, a critical negative regulator of p53 function [72]. DSCC1 is a component of an alternative replication factor C complex that loads PCNA onto DNA during S phase. Genes of the signature encode proteins involved in RNA processing and transcription: the TAF2 and ATAD2 transcriptional regulators, ARID2, which facilitates ligand-dependent transcriptional activation by nuclear receptors [73], UTP23; INTS2 and INTS8, subunits of the integrator complex, which associates with RNA polymerase II and mediates 3-prime end processing of snRNAs (Baillat, D, Cell 2005). Other genes are associated with metabolism. ST3GAL1 encodes a glycosyltransferase that induces aberrant glycosylation of MUC1 in breast cancer [4,74]. SQLE encodes a key enzyme of cholesterol biosynthesis, whose expression is associated with poor survival in nIBCs [75]. Two genes, RAD21 and RAD54B, are associated with DNA repair [76]. Finally, four genes (PTPN2, MTSS1, EPN3, and C17ORF37) are associated with cell migration and adhesion and/or poor prognosis of breast cancer. The PTPN2/TC-PTP phosphatase stimulates the ERK pathway [77], and its decreased expression markedly impairs IGF2 induced MCF7 migration [78]. The presence of MTSS1, which encodes for metastasis suppressor 1 [79] appears paradoxal, given its favorable prognostic impact in breast cancer [80]. However, it has been suggested that MTSS1 is unlikely to be a metastasis suppressor, but interacted with RAC, actin and actin-associated proteins to modulate lamellipodia formation [81]. Epsin 3 (EPN3) is involved in extracellular matrix-epithelial cells interactions [82]. C17orf37, whose expression correlates with grade and stage of nIBC [83], promotes invasion and migration of prostate cancer cells by enhancing secretion of uPA, MMP9 and VEGF through NFk B pathway [84]. Altogether, these different processes (protein translation and transport, cell cycle, RNA processing and transcription, metabolism, cell migration) are consistent with the hyperproliferative and invasive phenotype of IBC. Some of them (protein processing, RNA translation, proliferation and lipid metabolism) have been previously reported as overrepresented among genes or pathways associated with IBC [29,31].
In conclusion, we report the first description of genomic profiles of IBCs, on a large sample size and with a high-resolution aCGH platform, and the first integrated genomic analysis comparing IBC vs nIBC. This repertoire of whole-genome CNAs in IBCs may serve of basis for further investigations. We show the genomic complexity and heterogeneity of IBCs, which globally look like nIBCs. Many genes targeted by CNA -some of them specific of IBC -have not been previously reported in breast cancer. We have identified 24 IBC-specific potential oncogenes that could explain, at least partially, the IBC phenotype and its aggressiveness, and lead to the development of new therapeutic strategies. As such, they represent new candidates for further clinical and functional validation in IBC. One of them, PABPC1, is particularly interesting as it likely potentiates the role of an alteration recently discovered as essential in IBC pathogenesis. Our findings, as well as the comprehensive database of CNA and mRNA expression generated, constitute a novel step towards the goal of better understanding, and perhaps treating, IBC, even if other alterations of the tumor, as well as those of its microenvironment, at other molecular levels such as DNA mutations, epigenetic regulations, microRNAs [85], proteins and others [4,25] still need to be identified. Figure S1 Integrated comparative analysis of IBcs and nIBCs with the three successive steps numbered 1, 2 and 3). (PPT) Figure S2 Whole-genome expression profiling of IBCs and nIBCs. A) Hierarchical clustering of 197 samples and 12,813 probe sets with significant variation in mRNA expression level across the samples. Each row of the data matrix represents a gene and each column represents a sample. Expression levels are depicted according to the color scale shown at the bottom. Red and green indicate expression levels respectively above and below the median. The magnitude of deviation from the median is represented by the color saturation. The dendrogram of samples (above matrixes) represents overall similarities in gene expression profiles and is zoomed in B. Colored bars to the right indicate the locations of 6 gene clusters of interest (ECM means extra-cellular matrix). B) Dendrograms of samples. Top, three large groups of tumor samples (designated I to III) are evidenced and delimited by orange vertical lines. Below the dendrogram, are some histoclinical and molecular features of the samples: from top to bottom, clinical type (green for nIBC, and orange for IBC), SBR grade (white for grade I, grey for II, and black for III), IHC ER, ERBB2, and P53 status (white for negative, and black for positive), and intrinsic molecular subtypes (dark blue for luminal A, light blue for luminal B, red for basal, pink for ERBB2-overexpressing, and green for normal-like). (PPT) Figure S3 Proportion of genomic patterns in IBCs and nIBCs. (PPT) Figure S4 6q21 gains are more frequent in IBCs than nIBCs. Profiles of chromosome 6 show higher 6q21 gain frequency in IBC tumors than in nIBC (A). Regional genomic profiles were established with CGH analyticsH software (Agilent Technologies), for IBC and nIBC cases (panels B and C) both within the genomic interval [105.9-114.7 Mb] of the long arm of the chromosome 6 (hg17 human genome mapping; build 35 from NCBI). Profiles are distinguishable by different colors corresponding to different cases. Several IBC cases showed 6q21 gain or regional or focused amplification (B), whereas only two nIBC cases displayed a regional amplification (C). (PPT)