A Stromal Immune Module Correlated with the Response to Neoadjuvant Chemotherapy, Prognosis and Lymphocyte Infiltration in HER2-Positive Breast Carcinoma Is Inversely Correlated with Hormonal Pathways

Introduction HER2-positive breast cancer (BC) is a heterogeneous group of aggressive breast cancers, the prognosis of which has greatly improved since the introduction of treatments targeting HER2. However, these tumors may display intrinsic or acquired resistance to treatment, and classifiers of HER2-positive tumors are required to improve the prediction of prognosis and to develop novel therapeutic interventions. Methods We analyzed 2893 primary human breast cancer samples from 21 publicly available datasets and developed a six-metagene signature on a training set of 448 HER2-positive BC. We then used external public datasets to assess the ability of these metagenes to predict the response to chemotherapy (Ignatiadis dataset), and prognosis (METABRIC dataset). Results We identified a six-metagene signature (138 genes) containing metagenes enriched in different gene ontologies. The gene clusters were named as follows: Immunity, Tumor suppressors/proliferation, Interferon, Signal transduction, Hormone/survival and Matrix clusters. In all datasets, the Immunity metagene was less strongly expressed in ER-positive than in ER-negative tumors, and was inversely correlated with the Hormonal/survival metagene. Within the signature, multivariate analyses showed that strong expression of the “Immunity” metagene was associated with higher pCR rates after NAC (OR = 3.71[1.28–11.91], p = 0.019) than weak expression, and with a better prognosis in HER2-positive/ER-negative breast cancers (HR = 0.58 [0.36–0.94], p = 0.026). Immunity metagene expression was associated with the presence of tumor-infiltrating lymphocytes (TILs). Conclusion The identification of a predictive and prognostic immune module in HER2-positive BC confirms the need for clinical testing for immune checkpoint modulators and vaccines for this specific subtype. The inverse correlation between Immunity and hormone pathways opens research perspectives and deserves further investigation.

identified the ER-, PR-and AR-positive samples in each dataset as follows: the distributions of ER PR and AR expression were analyzed empirically, with a two-component Gaussian mixture model, and parameters for bimodal filtering were estimated with the R Mclust package. A density plot of ER, PR and AR gene expression in each dataset showed a bimodal distribution for ER and PR, but not for AR. The median value was used as the cutoff for AR expression in each dataset.

Affymetrix© platform
For each dataset (training, validation and prediction), we used the R arrayQualityMetrics package on an Affybatch object to filter out outliers. We excluded samples detected as outliers by at least two of the following methods: distances between arrays, boxplots, relative log expression (RLE), normalized unscaled standard error (NUSE), MA plots, spatial distribution of M. The HER2-positive samples were selected and split into training and validation sets. We filtered out 52 outlier samples from the training set and 11 from the validation set. Raw GE values for the HER2-positive samples in each dataset (training, validation and prediction) were normalized independently. The optimal microarray probe set to represent a gene was selected with the R JetSet package. This package developed scoring methods for the assessment of each probe set for specificity, coverage, and degradation resistance. For each dataset, batch effects were removed by median centering of each probe set across arrays and the quantile normalization of all arrays separately for each set.

Illumina© platform
We used the Illumina© probe quality score introduced by Barbosa-Morais in 2010 to ensure that all the probes used were of high quality (deleting probes scored as "bad" or "no match").
Probes were filtered on the basis of three criteria: probe quality [3], GC content between 38% and 64% and presence in more than 5% of the samples (n= 20,009 probes). We eliminated any batch effects associated with sample collection sites, by fitting a linear model (R limma package). We then chose the most variable probes as the optimal probe set.

Statistical analyses
All statistical analyses were performed with R software. Affymetrix© microarrays were normalized with the robust multichip average (RMA) procedure from the EMA R package [6].
Principal component analysis was carried out and heatmaps were generated with the R gplots package.

Gene selection process
Consensus clustering [7] was applied to the training set, to determine the optimal number of robust gene clusters from the most variable genes (standard deviation >0.8) (ConsensusClusterPlus R package). Cluster robustness was assessed by hierarchical clustering (1,000 iterations) with a ward inner, final linkage and Pearson distance. The optimal number of clusters was determined from the cumulative distribution function (CDF), which plots the corresponding empirical cumulative distribution, defined over the range [0,1], and by calculating the proportional increase in the area under the CDF curve. The number of clusters was set as that at which an increase in cluster number (k) did not lead to a marked increase in CDF area. We calculated Pearson's correlation coefficient for the relationships between genes within the same cluster, to assess the heterogeneity of each gene cluster. The consensus clustering method and hierarchical clustering identified five main gene clusters. Further increases in cluster number yielded no significant increase in the consensus distribution function area. Each gene cluster was tested for gene enrichment (biological process (BP), molecular function (MF), cellular component (CC)) by a conditional test for overrepresentation, in the R runHyperGO package. The various gene clusters were associated with different gene ontologies). The clusters were named as follows: Immunity cluster (70 genes), Signal/transduction cluster (130 genes), Hormonal/survival cluster (312 genes), Tumor suppressor cluster (65 genes) and Matrix cluster (39 genes). The Matrix and Immunity clusters were the most homogeneous, with strong correlations between the gene expression profiles of most of the genes within each of these clusters (Pearson's correlation coefficients of 0.60 and 0.43, respectively). We then used the String© database (http://stringdb.org/help/index.jsp?topic=/org.string-db.docs/ch04.html) [8] to identify biological gene networks. String© is a database of known and predicted protein interactions. The interactions include physical and functional associations derived from four sources: genomic context, high-throughput experiments, conserved coexpression, previous knowledge. For each gene cluster, we excluded genes that were not connected to any of the other genes present in the cluster. We then applied a two-step selection process: 1) we selected strong biological networks, by retaining genes with connection scores of at least 0.7 to each other, according to the String database 2) within each biological network, we then selected groups of genes with correlated patterns of expression, with correlation coefficients of at least 0.5. For this step, we used Cytoscape (http://cytoscapeweb.cytoscape.org), an open-source software platform for visualizing complex networks and integrating them with any type of attribute data. Attribute data, such as correlations, variance and interquartile range, were calculated from the expression data matrix.
After selection, we checked that the various genes selected from the same cluster clustered together again (R package ConsensusClusterPlus). Following biological network-driven gene selection, it became clear that the original Immunity cluster was more accurately described by splitting into two slightly different subclusters (Immunity (n=28), Interferon (n=11)). This approach yielded an increase in the area under the consensus distribution function (CDF) curve. The six clusters were renamed as follows, after reclustering: Immunity (n=28), Interferon (n=11), Signal transduction (n=20), Hormonal/survival (n=22), Tumor suppressors/Proliferation (n=36), Matrix (n=21).

Metagene identification
For each dataset (training, validation, Ignatiadis and METABRIC), each gene cluster was used to define a metagene (for the Affymetrix© platform: from the 138 selected genes; for the prognosis set, we used the 136 genes common to the preprocessed prognosis expression matrix and the selected genes matrix from the Affymetrix© platform). Metagene expression was assessed by calculating the median value for the normalized expression values of all probe sets in the respective gene clusters for each sample. For each dataset, we calculated the correlation between expression levels for the various metagenes, using the R psych package.
The metagene value for each sample was then classified as corresponding to "high" or "low" expression according to the median value for the metagene.

Classification of HER2-positive samples
In each dataset, hierarchical clustering was applied to the HER2-positive GE profiles, using the selected genes to visualize the optimal number of stable HER2-positive subtypes (ConsensusClusterPlus R package). We identified five HER2-positive subtypes, using the 138-genes signature. We checked the concordance between each of the validation sets and the training set (Pearson's correlation coefficient for the relationship between centroids). Centroid expression values were determined by calculating the mean normalized expression values for all samples in the sample cluster, for each probe set ( Figure S2).

Analysis of the response to chemotherapy
We assessed the predictive value of our metagenes, by collecting the independent dataset described above (the Ignatiadis set), which contained data for gene expression and clinical variables. We selected HER2-positive samples and the data were preprocessed as described above. The following clinical and pathological variables were available and were categorized as follows: age (<50 versus ≥50), pre-chemotherapy clinical tumor size (T1 and T2, T3, T4), pre-chemotherapy tumor nodal status (N0 versus N1, N2, N3), tumor grade (Grade I and II versus III), treatment type, and response to chemotherapy (pathological compete response versus no pathological complete response). ER and PR status were assessed by gene expression (as described above). We assessed the expression of the metagenes in each population. Given their unimodal distribution, the expression of each metagene was classified as "low" or "high", based on the median value for the six metagenes. Pathological complete response (pCR) was defined as the disappearance of the invasive component of the primary tumor in one study [9] and no residual invasive cancer in the breast and axillary lymph nodes in the other seven studies. Factors predictive of pCR were introduced into a univariate logistic regression model. A multivariate logistic model was then generated. The covariates selected for the multivariate analysis were those with a likelihood ratio test p-value lower than 0.10 in univariate analysis. A backward stepwise selection procedure was used.
We also tested the predictive value of nine immune signatures or metagenes validated as predictive of the response to chemotherapy in HER2-positive breast cancer patients: 12 immune-gene signatures [10], Th1 and T-fh metagenes [11], an immune-related 23-gene signature for NAC (IRSN-23) [12], the CXCL13, CCL8 and CXCL9 metagenes [13], the LCK and IgG combined metagene [14], and the HER2-derived prognostic predictor (HDPP) [15]. We performed several multivariate analyses with each signature or metagene independently and the clinical variables significantly associated with pCR in univariate analysis (tumor grade, tumor stage, ER status). We generated a heatmap of the gene expression profiles of each of the above predictive signatures ( Figure S4A). The samples were ordered according to our classification of low/high levels of Immunity metagene expression.
Expression patterns were similar for the Immunity metagene and for the other predictive gene expression signatures or metagenes, with the exception of Staaf's signature genes.

Prognosis
We assessed the prognostic classification, by collecting the independent dataset described above (METABRIC set), which contained data for gene expression and clinical variables. We selected HER2-positive samples and the data were preprocessed as described above. The following clinical and pathological variables were available and were categorized as follows: age (≤45, [45-55], >55), menopausal status, tumor size, tumor grade according to the Elston and Ellis grading system, number of lymph nodes involved (0 (N-, node negative) versus 1 or more lymph nodes involved (N+, node positive)), the Nottingham Prognostic Index score (good prognosis, intermediate prognosis, poor prognosis), treatment type, last follow-up status and the time at which last-follow-up occurred. ER and PR statuses were assessed on the basis of gene expression (as described above). We assessed the expression of the metagenes in each population and the expression of each metagene was classified as "low" or "high", relative to the median value for the six metagenes. Survival analyses were performed by calculating Kaplan-Meier estimates of the survival function. The endpoint of these analyses was breast cancer-specific survival (BCSS) (death from breast cancer). Time-censoring analyses were performed with a right censoring of events from 1 to 20 years. Log-rank tests were used to compare survival curves. Hazard ratios and their associated 95% confidence intervals were calculated with the Cox proportional hazard model. Variables with a p-value for the likelihood ratio test lower than 0.10 in univariate analysis were included in the multivariate model. Backward selection was used to establish the final multivariate model. However, NPI was excluded due to redundancy with the variables size, grade and lymph node status.
Analyses were performed in the whole population, and in the ER-positive, and ER-negative populations. We also assessed the prognostic value of nine immune signatures or metagenes previously validated as predictive of prognosis in HER2-positive breast cancer patients: the LCK metagene [14], an immune response module described by Desmedt [16], the T-fh and Th1 metagenes, and CXCL13 alone [11], the stroma-derived prognosis predictor (SDDP) [17]) or signatures developed specifically for HER2-positive breast cancers (HDPP [18] 18 (18)(17)(17) [15], the 105 lymphocyte-associated gene signature [19], the 14-immune gene signature [20]). We generated a heatmap of the gene expression profiles of each of the above prognostic signatures ( Figure S4B). The samples were ordered according to our classification of low/high Immunity metagene expression. The expression patterns were similar among signatures, except for the Staaf signature genes, which were associated with poor outcome, and the Finak signature genes associated with a mixed or poor outcome.

Assessment of tumor and stromal infiltrating lymphocytes and Immunity metagene expression in the REMAGUS dataset.
We selected 27 HER2-positive breast cancer patients treated by NAC with or without neoadjuvant trastuzumab at our institution during the REMAGUS 02 trial [21]. Core biopsies were obtained before treatment, and separate cores were processed for histology and for RNA extraction, amplification, and hybridization to Affymetrix U133P2 arrays. We selected HER2positive patients on the basis of the expression of the "216836_s_at" probeset. According to the bimodal distribution of ERBB2 expression, 74 of the 226 samples for which transcriptomic data were available were considered HER2-positive. Gene expression data were normalized with the RMA package and batch effects were removed by the median centering of each probe set across arrays and the quantile normalization of all arrays separately for each set. Thirty of these patients were treated at our institution, and pretreatment microbiopsies corresponding to the gene expression chips were retrieved for 27 of them. Histologic microbiopsy specimens were evaluated independently for the presence of a lymphocytic infiltrate by one BCA pathologist (ML) and one physician (ASHP) unaware of the gene expression classification. Intratumoral TILs and stromal TILs were quantified in a semi-quantitative manner as percentages, as previously recommended [22]. Samples were split into Immunity "low" and "high" expression, relative to the median value for the metagene, as described for the other datasets. Intratumoral TIL and stromal TIL percentages were compared between the Immune "high" and Immune "low" subgroups by ANOVA. The correlation between gene expression and intratumoral TIL and stromal TIL percentages was assessed by calculating Pearson's correlation coefficient.

Expression of our gene signature in human breast cancer cell lines
We assessed the expression of our signature in "in vitro" models, as a means of validating our classification and its prognostic value. We used the gene expression profiles of the human cancer cell lines from the Cancer Cell Line Encyclopedia (CCLE) [23]   Sample clustering was moderately consistent between the training and validation gene sets.
Concordance was high for centroid 1 (high Immunity, low Matrix, low Hormone/survival, concordance from 69 to 79%), but reproducibility was not high for the other centroids in the validation datasets.

Correlation of Immunity metagene expression with hormonal pathways
The levels of expression of ESR1, PGR and AR as a function of Immunity metagene expression status were compared in the four datasets (see the results section for the training set). The results below are presented with p-values for the differences in the Immunity "low" versus the Immunity "high" group, in the training, validation, Ignatiadis and METABRIC datasets. ESR1 expression was consistently higher in the "Immunity low" subgroup than in the "Immunity high" subgroup (p< 10 -16 , p=0.008, p=0.003, p<0.00001, respectively). PGR expression was also stronger in the Immunity metagene "low" expression group, in three of the four datasets (p<10 -6 , p=0.003, p=0.07 and p=0.02, respectively). Similarly, AR expression was stronger in the Immunity metagene "low" expression group, in two of the four datasets (p=0.0002, p=0.43, p=0.003 and p=0.17, respectively).
We also compared the level of Immunity metagene expression across the four datasets as a function of ER, PR and AR status, as previously described. Immunity metagene expression was significantly stronger in the ER-negative subgroup than in the ER-positive subgroup, in all four datasets (p<10 -9 , p=0.03, p<0.0001 and p=0.02, respectively). PR negativity was associated with higher levels of Immunity metagene expression in all but the Ignatiadis dataset (p=0.0001, p=0.05, p=0.24 and p=0.04, respectively), and AR negativity was associated with stronger Immunity metagene expression in the training and METABRIC datasets (p=10 -6 and p=0.04, respectively), whereas no such association was observed in the validation and Ignatiadis datasets (p=0.75 and 0.80, respectively).
We also compared the expression levels of each gene of the Immunity metagene separately as a function of ER status. Most of the genes of the Immunity metagene were significantly less strongly expressed in ER-positive than in ER-negative tumors in all four datasets (training set: 27/28 genes, validation set: 19/28, METABRIC: 23/26 genes, and Ignatiadis dataset: 16/28).

Correlations between the Immunity metagene and published signatures
String database connections between the Immunity genes and published predictive and prognostic metagenes or immune signatures are provided in Figure S4. The gene intersection was poor, but our immune signature nevertheless appears to be strongly correlated with other published signatures, consistent with the use of similar immune pathways. Pearson's correlation coefficients for the signatures are shown in brackets. The Immunity metagene was strongly correlated with the T-fh metagene (r=0.89), the CXCL13 metagene (r=0.74), and the LCK metagene (r=0.96). In comparisons with prognostic signatures, it was found to be correlated with the Denkert signature (r=0.73), the Th1 metagene (r=0.75), the IRSN-23 predictor (r=0.62), the CXCL9 metagene (r=0.70), and the IgG metagene (r=0.78), although the coefficients were lower.