Identification of cell proliferation, immune response and cell migration as critical pathways in a prognostic signature for HER2+:ERα- breast cancer

Background Multi-gene prognostic signatures derived from primary tumor biopsies can guide clinicians in designing an appropriate course of treatment. Identifying genes and pathways most essential to a signature performance may facilitate clinical application, provide insights into cancer progression, and uncover potentially new therapeutic targets. We previously developed a 17-gene prognostic signature (HTICS) for HER2+:ERα- breast cancer patients, using genes that are differentially expressed in tumor initiating cells (TICs) versus non-TICs from MMTV-Her2/neu mammary tumors. Here we probed the pathways and genes that underlie the prognostic power of HTICS. Methods We used Leave-One Out, Data Combination Test, Gene Set Enrichment Analysis (GSEA), Correlation and Substitution analyses together with Receiver Operating Characteristic (ROC) and Kaplan-Meier survival analysis to identify critical biological pathways within HTICS. Publically available cohorts with gene expression and clinical outcome were used to assess prognosis. NanoString technology was used to detect gene expression in formalin-fixed paraffin embedded (FFPE) tissues. Results We show that three major biological pathways: cell proliferation, immune response, and cell migration, drive the prognostic power of HTICS, which is further tuned by Homeostatic and Glycan metabolic signalling. A 6-gene minimal Core that retained a significant prognostic power, albeit less than HTICS, also comprised the proliferation/immune/migration pathways. Finally, we developed NanoString probes that could detect expression of HTICS genes and their substitutions in FFPE samples. Conclusion Our results demonstrate that the prognostic power of a signature is driven by the biological processes it monitors, identify cell proliferation, immune response and cell migration as critical pathways for HER2+:ERα- cancer progression, and defines substitutes and Core genes that should facilitate clinical application of HTICS.


Introduction
Breast cancer is a highly heterogeneous disease [1][2][3][4][5][6]. In the clinic, most (~65-70%) tumors are classified as estrogen receptor alpha positive (ERα + ) that are treated with endocrine therapy [7]. About 20% is driven by amplification of the receptor tyrosine kinase HER2/ERBB2/ NEU and either express ERα (HER2 + :ERα + ) or not (HER2 + :ERα -). A third category comprises triple negative tumors (~10-15%), which do not express ERα, progesterone receptor or HER2. HER2 + patients are treated with chemotherapy plus HER2 antagonists such as trastuzumab, trastuzumab-emtansine, pertuzumab or lapatinib [8][9][10][11][12][13]. Without HER2-directed therapy, HER2 + :ERαpatients have the worst clinical outcome. Yet, some of these tumors do not progress to develop macro-metastases, and therefore surgical removal alone with local radiation or conventional therapy may suffice, at least as front-line therapy. Indeed, there is growing evidence that patients with small (T1), node negative HER2 positive tumors may profit from less aggressive therapies [14] [15]. In contrast, other tumors disseminate to form distal metastases that are virtually incurable, and should be treated most aggressively. Generating a prognostic signature from primary biopsies of these tumors may therefore help guide clinicians and patients as to the most appropriate treatment (Reviewed in Ref. [16]). Many such multi-gene based signatures have been developed for breast cancer and other malignancies [17][18][19][20][21][22]. Understanding the biology of these signatures and the basis for their prognostic power may provide further insight into processes that drive metastatic disease and mortality.
We previously reported on the development of a powerful prognostic signature for HER2 + : ERαbreast cancer patients [23]. Our strategy was based on the recognition that Tumor Initiating Cells (TICs), which sustain growth following transplantation into recipient mice, may express genes that drive metastatic dissemination, colonization and growth at distal sites [24]. We therefore developed a prognostic signature for HER2 + :ERαpatients using enriched TIC fraction from a mouse model for this subtype, MMTV-Her2/Neu [23,25,26]. Using both differentially up-and down-regulated genes between TIC-enriched and non-TIC fractions, we generated a 17-gene Her2-TIC-enriched signature (HTICS) that predicted clinical outcome on publicly available HER2 + :ERαcohorts, but not HER2 + :ERα + patients [23]. When tested headto-head, HTICS had superior prognostic power for HER2 + :ERαpatients than a HER2-Derived Prognostic Predictor (HDPP) [27], Stroma-Derived Prognostic Predictor (SDPP) [19], IGS [18], mammaPrint [28] or a proliferation signature [29]. Furthermore, in multivariate analysis, HTICS was independent of multiple clinical variables [23]. It predicted overall survival (OS) for HER2 + :ERαpatients with hazard ratio (HR) of 5.57 (P = 0.002), and metastatic free survival (MFS) with HR of 7.94 (P = 0.00084). Retrospective analysis on a small cohort of patients treated with trastuzumab (Herceptin) revealed that HTICS + HER2 + :ERαpatients had a worse prognosis compared with HTICSpatients but benefited from trastuzumab therapy, and therefore should be prioritized for anti-HER2 therapy [23]. In contrast, HTICSpatients exhibited good prognosis and may benefit from withholding anti-HER2 therapy as a frontline treatment.
We hypothesized that a prognostic signature monitors the combinatory effect of several biological pathways, and that understanding these pathways may shed light into the biology behind aggressive forms of a particular cancer subtype. The identification of a minimal "core" set of genes may also help define essential pathways. From a clinical perspective, a minimal set of genes may also be useful in generating immunohistochemical (IHC)-based prognosis. In addition, targeting these core genes or pathways may be therapeutically beneficial. Finally, it is important to identify substitutes that can be used to replace specific genes in a signature. If such substitutions are derived from the same functional pathways as the original signature genes, one can further establish the importance of the pathway. Such substitutions are also useful because most signatures are developed using RNA microarray data derived from fresh biopsies, whereas clinical biopsies are routinely processed as formalin-fixed paraffin embedded (FFPE) tissues, necessitating other modes of transcript detection such as NanoString technology [30]. Substituting genes may be required to compensate for limited technical performance of certain probes when a molecular signature acquired from frozen tissues is applied to another platform such as FFPE. This is especially critical for small signatures in which the contribution of each gene is highly important.
In the present study, we describe the identification of biological pathways monitored by HTICS and demonstrate that most genes capable of substituting for this signature share the same pathways. Expression of HTICS genes and their substitutes could be reliably detected in FFPE samples from breast cancer patients using NanoString technology, thus extending the clinical utility of HTICS. Through gene combination studies we have identified a core of 6 genes in HTICS and validated the effectiveness of this core using an independent RNA-Seq platform.
Together, our results demonstrate that a prognostic signature reflects critical biological pathways in a particular breast cancer subtype that dictate patient outcome. Specifically, we found that the prognostic power of HTICS lies in its ability to track cell proliferation, migration and immune-response pathways, which are critical for HER2 + :ERαtumorigenesis. Our analysis also identifies a core and substituting genes that would facilitate clinical implementation of HTICS. The critical role of the immune response, even within the 6 gene Core, for the prognostic power of HER2 + :ERαpatients is particularly important given the accumulative evidence linking the immune system to the therapeutic effects of trastuzumab [31].

Research ethics board (REB)
Experimental protocols with human breast cancer samples were approved by Sunnybrook Health Sciences Centre REB (PIN 418-2011), and were carried out in accordance with the approved guidelines.

Score for Signature Match (SSM) Classifier
For the comparison of effectiveness of our signatures, we used the Score for Signature Match (SSM) classifier to determine the status (signature match or no match) of each sample in the cohort [23]. In short, the SSM classifier takes into consideration 2 major aspects of the signature genes for calculation: 1) the rank of each gene among the samples in the cohort (X n ). This is measured by median-centering the log2 transformed gene expression value, where a positive value is given to X n if the gene is expressed at levels above the median of the cohort and a negative value if below; 2) the expected direction (up or down) and weight of the gene (I n ). We gave every gene equal weight and therefore I n = +1 for Up-regulated signature genes and I n = -1 for Down-regulated signature genes. The score for each gene is the product of I n and X n (I n x X n / |X n |). The formula for SSM is: The formula can be simplified as: Sum of Scores for Every Gene in Signature / Total Weight. If a gene is supposed to be up-regulated (I n = 1), a sample with positive expression above median (X n > 0) will be I n X n /|X n | = 1. If a sample is below median expression (X n < 0) for the same gene, I n X n /|X n | = -1 leading to deduction of SSM score. Samples with SSM !0 were considered a match for the signature and used to compare with all other samples with SSM <0.

Receiver Operating Characteristic (ROC) analysis
The ROC curve represents the tradeoff between True-Positive (correct match) and False-Positive (incorrect match) rates as the threshold level of the classifier varies. The Area Under the ROC Curve (AUC) provides a standard measure of the discrimination power of the classifier, with 1/2 indicating no discrimination power and 1 indicating perfect discrimination. We used ROC and AUC to assess and compare the discrimination/prognostic power of the different signatures/combinations. ROC analysis was performed using algorithms developed by John Eng from the Russell H. Morgan Department of Radiology and Radiological Science at Johns Hopkins University School of Medicine (http://www.rad.jhmi.edu/jeng/javarad/roc/ JROCFITi.html). In brief, the score of each sample for a given signature/combination was calculated by Score for Signature Match (SSM) algorithm and compared to a threshold to determine the match/no-match groups. The match/no-match class was compared with the survival events in the datasets. The survival events of patients were divided into 4 categories to reflect the degree of cancer aggressiveness: 1) disease-free patients after 36 months, the least aggressive; 2) disease-free patients within 36 months, not aggressive; 3) patients' disease progressed after 36 months, aggressive; 4) patients' disease progressed within 36 months, most aggressive.

Human breast cancer patient samples and FFPE RNA extraction
Formalin-fixed paraffin-embedded breast cancer samples from the Department of Anatomic Pathology at Sunnybrook Hospital, Toronto, were used to generate 5-μm thick tissue section slides. Tumors were classified to HER2 -:ERα + , HER2 + :ERα -, HER2 + :ERα + and triple-negative using immunohistochemistry and FISH according to standard methodology. Sections were macrodissected to enrich the sample for invasive component, and de-waxed through sequential 10-minutes washes in xylene (3x), 100%, 90%, 75%, 50%, 30% Ethanol, and PBS. The de-waxed samples were then scrapped off by surgical scalpels and placed in 1.5 ml eppendorf tubes. RNA was extracted using High Pure FFPE RNA Micro Kit (Roche, cat# 04823125001).
Human peripheral blood mononuclear cell, breast cancer cell lines and RNA extraction Frozen human peripheral blood mononuclear cell (PBMC) were gifts from Dr. Shannon Dunn in the Department of Immunology, University of Toronto. MDA157, MDA231, MDA361, MDA436, MDA468, BT549, CAMA1, JIMT1. HCC3153 and HCC38 cells were cultured as described [22,32]. The cells were plated in 10 cm plates and harvested at 80% confluence. RNA was extracted from cells using RNeasy Mini Kit (Qiagen, cat#74104).

Pathway analysis
Microarray analysis with mouse tumor models was carried out using Affymetrix Mouse Gene 1.0 ST with 500 ng of total RNA isolated by double Trizol extractions (Centre for Applied Genomics, Hospital for Sick Children, Toronto). Microarray data were normalized using RMA method via Partek software and log2-transformed gene expression values were obtained. The data were analyzed by GSEA using paired t-test comparing gene expression values in the TIC and CD24-fractions, and parameters set to 2,000 gene-set permutations, gene-sets size between 15 and 500 [23]. The data is deposited in GEO Datasets with identifiers GSE29590 and GSE29616. An enrichment map (version 1.1 of Enrichment Map software) was generated using enriched gene-sets with a nominal p-value < 0.005, FDR < 1% and the overlap coefficient set to 0.5 [33,34]. The databases included in the GSEA analyses were the Gene Ontology (GO), KEGG, PFAM, BIOCARTA and NCI databases. GO, PFAM and KEGG annotations were downloaded from Bioconductor (org.Mm.eg.db version 2.4.6, GO.db version 2.4.5, KEGG.db version 2.4.5). NCI annotations were downloaded from NCI website (http://pid.nci. nih.gov/, 2010-11-08) and BioCarta annotations downloaded from WhichGenes (2010-03-26). Only highly significant gene-sets (p < 0.005; FDR < 1%) are shown as nodes in the enrichment map. When two nodes have high degree of overlap in genes (overlap coefficient > 0.5), they are connected by a green line (edge). The major pathways labeled in Fig 1B are clusters of these significant gene-sets (nodes) by the Markov Cluster Algorithm.

Algorithms for random sets and combination analysis
To determine the probability that signatures/combinations could be the result of random chance, we generated 1000 random signatures with the same composition (i.e. same number of up-and down-regulated genes) and estimated their HRs using a formula provided by Kleinbaum and Klein (Chapter 3, Survival Analysis, 2005) with the same patients as indicated [35]. Algorithms for random sets as well as for combination analysis both written in R are available upon request.

Additional statistical analysis
Parametric paired samples were analyzed by student t-test. Kaplan-Meier Survival analysis was performed using the PAST program (P.D. Ryan and Ø. Hammer, University of Oslo) and pvalue was calculated using the Wilcoxon method. Hazard ratios were calculated by Cox Proportional Hazards Survival Regression. Pearson correlation studies were also performed using the PAST program. Differences between values were considered statistically significant at p< 0.05.

ROC analysis of HER2 amplicon versus IHC for HER2 + breast cancer classification
As a prelude to the forthcoming analysis, we assessed the assay we employ to call HER2 + breast cancer samples. This assay, developed by Staaf et al., measures expression levels of 5 genes around the HER2 amplicon (Her2/Erbb2, Stard3, Perld1, Grb7, C17orf37) directly from microarray data to predict HER2 status [23,27]. The advantage of this approach is that the status of HER2, determined by IHC or FISH, is only available for a fraction of cohorts with mRNA profiling and clinical outcome. Moreover, methods and standardization for HER2 expression vary across clinical labs. To assess whether the amplicon strategy is as efficient as IHC in identifying HER2 + breast cancer samples, we performed Receiver Operating Characteristic (ROC) analysis. This analysis calculates the discriminatory power of a classifier by describing the trade-off between True-Positives (correctly predicting HER2 status from amplicon expression as positive) and False-Positives (incorrectly predicting HER2 status as positive). For this analysis, we combined 11 cohorts containing a total of 1833 breast cancer samples with microarray and IHC data (although most have no clinical outcome, S1 Table). HER2 amplicon status (HER2 positive or negative for 5 amplicon gene expression) was then compared to HER2 IHC classification (positive or negative IHC; details in Method and Materials, S1 Table). Fisher's exact test demonstrated a significant detection of IHC HER2 + samples by HER2 amplicon genes with 96% specificity (S1 Table). The HER2 amplicon accurately predicted HER2 IHC status of breast cancer patients with an Area Under Curve (AUC) value of 0.90 (S1A Fig). Together, the two methods have a Concordance Rate of 90.9% and Cohens Kappa coefficient of 0.69. Thus, in agreement with a recent publication [36], microarray amplicon analysis is a reliable surrogate for HER2 status as determined by IHC, and can be used to improve consistency across datasets.

Leave-one-out analysis of HTICS pathways
To assess the contribution of each pathway as a whole, we deleted each at a time and assessed the prognostic power (as measured by Hazard Ratio and AUC) of the remaining genes, using combined data from 12 publically available datasets with microarray gene expression profiles and clinical outcome [23]. We found that leaving out each of the five pathways diminished the prognostic power of HTICS, albeit to different extents, indicating that all pathways contribute to its full prognostic power ( Table 2). Most critical pathways were Cell cycle and Immune response, followed by Glycan metabolism, Homeostasis and Cell migration. Notably, by this analysis Glycan metabolism and Homeostasis (one gene each) are more critical than Cell migration (3 genes). However, the detailed analysis below reveals that Cell migration is more important than Glycan metabolism and Homeostasis, and that the three main pathways (Cell cycle, Immune response, Cell migration) are most significant, yet genes from all five pathways contribute to the prognostic capability of HTICS.

Identification of substitutes for HTICS genes
Having demonstrated that HTICS monitors the activity of specific signaling pathways in HER2 + breast cancer, we next asked whether genes in the same, rather than all other, pathways would preferentially substitute for HTICS genes. To identify such potential replacements, we initially performed correlation studies on each of the additional 266 TIC genes (i.e. from the original 283 differentially regulated genes in TICs, omitting the 17 HTICS genes) that are significantly regulated in enriched-TICs vs. non-TICs in MMTV-Her2/Neu tumors [23]. Specifically, for each of the 17 HTICS genes, a correlation analysis was performed to determine which of the 266 differentially regulated genes had the most similar expression pattern. Remarkably, 15 of the 17 top correlated genes shared the same pathways as the HTICS genes they could replace (Fig 1A). The only 2 exceptions were Cldn8 (Cell Migration), which was best replaced by Clu (Cell Death) and St8sia4 (Glycan Metabolism), which was best substituted by Ncf4 (Immune Response) (Fig 1B).
Of the 15 genes with potential substitutes in the same pathway, two had the highest correlation with genes from different pathways: CCR2 (Immune Response) could be substituted by Rnase6 (RNA Catabolic Process), and Scrn1 (Immune Response) could be replaced by Mphosph6 (not associated with any identified pathway) (Fig 1; S2 Fig). Our observation that for most HTICS genes (15/17), the best substitutes share similar functional pathways underscores the importance of these pathways in predicting disease progression and survival of HER2 + :ERαpatients.

Development of Nanostring probes for HTICS and detection in FFPE samples
To assess the clinical utility of HTICS, we designed Nanostring probes for the signature genes. As noted, HTICS was generated using RNA microarray data acquired from fresh/frozen tumor biopsies. However, in a clinical setting, the routine platform for detecting gene expression is formalin-fixed paraffin-embedded (FFPE) samples. RNA quality from FFPE tissues is poor, and differences in sample preparation may introduce artifacts that can render detection of some genes unreliable. Knowledge of signature pathways and substitutes could therefore enable the design of clinical prognostic kits with built-in backup genes and controls to expand the clinical utility of HTICS. To test this strategy, we employed NanoString technology, which can faithfully detect RNA expression from partially degraded and contaminated RNA because it does not use an amplification step [30]. To detect the same spliced RNA transcripts, we designed the 17 NanoString probes for the same regions in the mRNA as in the microarray platform used to generate HTICS. These NanoString probes detected expression of all 17 HTICS genes from FFPE tumor samples (Fig 2A). By comparing signals generated from human breast cancer cell lines and peripheral blood mononuclear cells (PBMC), we could determine the tissue from which signals of each HTICS gene originated. For example, Cldn8 expression was predominantly detected in tumor epithelial cells, with very low signal in PBMC. In contrast, CD74 was exclusively detected in PBMC (Fig 2A). Vcam1 and C1qb were undetectable in both tumor epithelia and PBMC but readily detected in whole tumors, suggesting expression in other cell lineages/biological contexts such as endothelium/ coagulation (Fig 2A).

Prognostic power of HTICS substitutes
To determine the ability of the potential replacements to predict clinical outcome, we designed NanoString probes for four substitutes (Spink5, Ctss, Aim2 and CD48). These substitutes were selected because the original HTICS genes (Npy, Ccr2, CD180 and CD72) gave low expression signals in FFPE sections (<100). The replacement genes exhibited comparable or even better expression (Fig 2A). Moreover, ROC analysis of the pattern of gene expression of each substitute compared to the original HTICS gene in all the 79 MFS patients revealed high AUCs of 0.83, 0.77, 0.80 and 0.82 when using the Spink5, Ctss, Aim2 or CD48 substitutes, respectively (Fig 2A and 2B). A modified HTICS signature (HTICSm) with all 4 genes substituted gave nearly identical score in predicting outcome as HTICS, with a correlation of 0.97 (Fig 2C). HTICSm and HTICS also segregated patients into the same risk groups with AUC of 0.99 via ROC analysis (Fig 2D). Thus, these substitutes should produce consistent results by the Nano-String assay, and thus, may be included in a prognostic kit for HER2 + :ERαpatients.
To assess the prognostic power of HTICSm and HTICS in a non-biased way, we generated 1000 randomly selected sets of 17 genes and compared their prediction power to those of HTICS and HTICSm [35]. Both signatures were highly prognostic, ranking within the top 25 for HER2 + patients and top 12 for HER2 + :ERαpatients (Fig 3A). Yet, HTICS performed consistently better than HTICSm and was ranked #1 for HER2 + :ERαpatients (Fig 3A). It had the highest prognostic power with hazard ratio (HR) of 7.94 for MFS and 5.57 for OS, as compared to HTICSm with HR of 5.20 for MFS, and HR of 4.01 for OS (Fig 3B).
When replacing all 17 HTICS genes with the substitutes, the alternative signature (AltH-TICS) still made weak but significant prognosis for HER2 + :ERαpatients with HR of 2.83 for MFS (Fig 3C). In those cases where the highest correlated genes were from different pathways (Rnase6 for CCR2; Mphosph6 for Scrn1), substitution with other-pathway genes resulted in inconsistent (Rnase6 in AltHTICSb), or insignificant (Mphosph6 in AltHTICSc) prognosis S2  Fig). Thus, while the substitutes are capable of significant prognostication, likely because they monitor similar pathways, they are not as effective as the original HTICS.

Identifying HTICS core genes
To define a core gene set that is most critical within HTICS, we developed an algorithm in R to test the prognostic power of all 131,071 possible combinations of genes in HTICS using both MFS and OS cohorts. Of these, 15,417 gene combinations were found to produce significant prognosis (P<0.05) with an average signature length of 9.8 genes. We used two criteria to select for gene combinations with prognostic power. First, we asked how frequently a gene appears with other genes within the 15,417 significant prognostic signatures. For this, we performed a pair-wise analysis for each of the 17 HTICS genes to determine the percentage of overlaps with the other 16 genes. The average % of overlap was 39.52% with standard deviation (SD) of 7.5%. If the percentage overlap of a pair of genes was greater than 1 SD above the average, i.e. 47.02%, the two genes were considered to require each other for prognosis (Fig 4A). To further increase the stringency of selecting critical genes, we only considered those that paired with two or more other genes. 8 such genes highlighted in pink in Fig  5A were identified: Ccr2, Nrp1, Scrn1, Vcam1, CD74, Chaf1b, Npy and CD180, and were designated Core1 (Fig 4B). Core1 gave a HR of 4.53 for HER2 + :ERαpatients (Fig 4C), and AUC of 0.61 with high false positive rate by ROC analysis (Fig 4D).
Interestingly, of the 15,417 gene combinations with significant prognosis, 92 (with average length of 11.1 genes), gave similar or even better score than HTICS. Composition of the top 10 signatures and corresponding HRs in OS and MFS cohorts are depicted in Fig 5A. The 92 signatures maintained the cell migration, cell cycle and immune-response pathways; a relatively high percentage of them also included the glycan metabolism gene St8sia4, but not the homeostasis gene, Atp7b (Fig 5A and 5B). At least 3 of the 5 HTICS pathways were found in 100% of these 92 highly prognostic sub-signatures (Fig 5C). Whether these HTICS-derived sub-signatures are indeed superior to HTICS or whether they represent over-fitting awaits analysis of additional/new cohorts. Here, we used these 92 top combinations in a second approach to identify core genes for HTICS employing the pair-wise analysis described above (Fig 5D). This analysis also resulted in a core of 8 genes (designated Core2): Scrn1, Nrp1, C1qb, CD74, Npy, Chaf1b, St8sia4 and Ccr2 (Fig 5B). Core2 was a better predictor than Core1 with HR of 5.33 for HER2 + :ERαpatients and AUC of 0.72 by ROC analysis (Fig 5E and 5F).

Effect of substitutions on the prognostic power of HTICS Core
We next examined the ability of the replacement genes to substitute for the Core. By substituting all 6 genes in the Core (Fig 7A), the resulting AltCore produced better HR of 6.37 than the Core in OS analysis for HER2 + :ERαpatients (Fig 7B). However, like the Core, AltCore had a low AUC of 0.59 (Fig 7C). Comparing the Core and AltCore with 1000 randomly selected sets of 6-genes (with 3 gene-up; 3 gene-down), the Core produced consistently significant result (ranked #15 for MFS and #12 for OS for HER2 + patients; #2 for MFS in HER2 + :ERαpatients), whereas AltCore was very unstable ranking #2 for MFS but #84 for OS (Fig 7D). Therefore, while the substitutes were able to provide prognostication, they are less stable in predicting patient outcome than the original HTICS or optimized Core.

Cross-platform analysis of the HTICS Core
Finally, to validate the efficacy of the Core, we analyzed RNA-Sequencing profiles of invasive HER2 + breast cancer patients with OS data from The Cancer Genome Atlas (TCGA) [2]. Survival analysis demonstrated that HTICS Core significantly predicted OS with HR of 4.15 ( Fig  7E). The AltCore, Core1 and Core2 gave similar trends but none was significant (Fig 7E). Therefore the 6 gene Core identified here was consistent in making prognostic predictions on independent datasets generated by different technologies (RNA microarray and RNA-seq).

Comparison of HTICS and Core signatures by empirical false detection rate
To compare all the signatures described herein we used Empirical False Detection Rate (eFDR). This was calculated by 10,000 permutations using the entire microarray dataset. Each permutation generated a randomly selected gene sets for HR calculation and the rate of random gene sets with significantly better HR than HTICS was recorded (hence eFDR). After 10,000 permutations, only 13 random gene sets had better HR than the original signature and thus, HTICS had eFDR of 0.00129 (Fig 8A). All six other signatures developed herein were analyzed in a similar way. The results clearly show that while all core signatures were prognostic, significant and with low eFDR, HTICS was by far the most powerful by all criteria. Thus, while the various pathways, in particularly Cell Cycle, Immune Response and Cell Migration, were sufficient to predict clinical outcome, the other pathways (Homeostasis and Glycan Metabolism) fine-tuned the signature and maximized its prognostic power (Fig 8B).

Discussion
We report the identification of critical pathways mediating the prognostic power of HTICS, a 17-gene signature for HER2 + :ERαbreast cancer [23]. We identified cell proliferation, cell migration and immune response pathways as mots critical. Two other pathways, homeostasis and Glycan Metabolism, albeit not as important, contributed to optimal prognostication. We found that 15 of 17 (88%) substituting genes that could best replace the 17 HTICS genes were from the same biological pathways, highlighting their importance in predicting disease-free and overall survival for these patients. We also identified a core signature of 6 genes with high prognostic power and showed that they also include genes from the three major pathways. Thus, not all genes in HTICS equally contribute to its prognostic power. Our results suggest that IHC-based detection of the three major pathways-cell proliferation, cell migration and immune response-may successfully predict clinical outcome for HER2 + :ERαbreast cancer patients. Indeed, we are currently performing retrospective analysis on large cohorts of HER2 + :ERαpatients using the NanoString assay developed herein. In addition, drugs/inhibitors that can antagonize the Core or substituting genes identified herein may have therapeutic benefits.
Our finding of immune-response gene pathway in HTICS, especially of those associated exclusively with PBMC, seems counterintuitive given that the signature was derived from mouse MMTV-Her2/Neu tumor epithelial cells that were depleted of other cell lineages (endothelial and hematopoietic cells [23]). However, the non-TIC fraction was evidently contaminated with immune-cells, possibly due to incomplete lineage depletion or to lack of expression of cell surface markers used for lineage depletion (CD31; CD45-TER119, respectively). In hindsight, the inclusion of these cells, reflecting normal anti-cancer responses by the host, allowed us to generate a signature that includes immune-response genes that is critically important for the prognostic power of HTICS. Indeed, a meta analysis revealed that immuneresponse is an important determinant in the prognosis of HER2 + breast cancer relative to Critical pathways in a HER + :ERαprognostic signature other subtypes [28]. Notably, the identification of immune-related genes as part of HTICS was only made possible because we identified MMTV-Her2/Neu TICs in isogenic (FvB) mice with intact immune system. Had we attempted to generate HTICS from human HER2 + TICs injected into immunocompromised mice, we would have not had the immune component of HTICs, and thus, not a strong prognostic signature. As noted, the presence of immune response pathway even in the minimal 6-gene Core, is particularly interesting given the effect of the immune system on the therapeutic effects of trastuzumab [31]. Combination therapy with trastuzumab and immunomodulatory drugs may therefore be highly synergistic for HER2 + :ERαpatients.
that paired with at least 2 HTICS genes with >1 standard deviation above average are highlighted in red. (C) Frequency of appearance of each HTICS gene in the 92 top combinations. Vertical black lines indicate the gene is present in each combination; yellow lines indicate its absence. Core2 includes genes with higher than median frequency that paired with at least 2 genes. (D) Kaplan-Meier analysis of Core2 on MFS cohorts. (E) ROC analysis of Core2. (F) Frequency of the 5 HTICS pathways in the 92 superior combinations. Black lines indicate a pathway is present in each combination; cyan lines indicate its absence. All combinations monitored at least 3 pathways as indicated.
https://doi.org/10.1371/journal.pone.0179223.g005 Critical pathways in a HER + :ERαprognostic signature One important question is whether the HTICS or Core signature genes identified here merely monitor aggressive behaviour of HER2 + :ERαbreast cancer, or whether they also of up-and down-regulated genes. (E) Kaplan-Meier OS analyses of Core, AltCore, Core1 and Core2 on HER2 + breast cancer samples with RNA-Seq data (TCGA).
https://doi.org/10.1371/journal.pone.0179223.g007 actively participate in and drive metastatic growth. The identification of a 6 Core gene set as well as substitutes may simplify this question by focusing on the most critical genes. For example, inhibition/knockdown of the only Core cell cycle gene Chaf1b, (Chromatin assembly factor 1, subunit B), which is required for assembly of histone octamers onto newly-replicated DNA, and/or its substitutes Kif11 (Kinesin Family Member 11), which is required for chromosome positioning, centrosome separation and bipolar spindle formation during cell mitosis, may diminish proliferation or self-renewal of HER2 + :ERα -TICs.
Over-expression of Nrp1 (Neuropilin 1), a vascular endothelial cell growth factor (VEGF) 165 Receptor, has been implicated in the vascularization and metastasis of many cancer types, including breast cancer [37]. Our results suggest that anti-NRP1 therapy such as the synthetic peptide, EG3287, [38], may be assessed against HER2 + :ERαbreast cancer. The Nrp1 substitute, Klrd1 (Killer Cell Lectin-Like Receptor Subfamily D, Member 1), an antigen expressed on Natural Killer cells, may also be considered as target to block tumor dissemination. Finally, as noted, the homeostasis gene, Atp7b is required for efflux of copper out of the cells. Interestingly, disulfiram, a potent anti-cancer agent, kills tumor cells by increasing intracellular copper concentrations [39,40].
The identification of cell proliferation, migration and immune-response pathways as key prognostic determinants in HER2 + :ERαbreast cancer suggests that IHC assays that can faithfully detect alterations in these pathways may be highly prognostic. While proliferation and immune-response can readily be tested by IHC, identification of cell migration markers may be more challenging. A starting point may be the HTICS Cell migration pathway genes (Nrp1, Npy, Cldn8) and their substitutes (Klrd1, Spink5, Itga8) identified herein-but other known markers of cancer cell dissemination or circulating tumor cells may be equally informative [41,42].

Conclusion
In conclusion, using bioinformatic analysis we established a rationale for identifying critical pathways as well as substitute genes and Core Signature genes that would facilitate clinical application of HTICS and may uncover new therapeutic targets for HER2 + :ERαbreast cancer. Our analysis should be applicable to other types of multigene-based prognostic signatures for breast and other cancer types, and may facilitate the development of a universal breast cancer prognostic test for multiple breast cancer subtypes.