Refinement of Triple-Negative Breast Cancer Molecular Subtypes: Implications for Neoadjuvant Chemotherapy Selection

Triple-negative breast cancer (TNBC) is a heterogeneous disease that can be classified into distinct molecular subtypes by gene expression profiling. Although TNBC is considered a difficult-to-treat cancer, a fraction of TNBC patients benefit significantly from neoadjuvant chemotherapy and have far better overall survival. Outside of BRCA1/2 mutation status, biomarkers do not exist to identify patients most likely to respond to current chemotherapy and, to date, no FDA-approved targeted therapies are available for TNBC patients. Previously, we developed an approach to identify six molecular subtypes of TNBC (TNBCtype), with each subtype displaying unique ontologies and differential response to standard-of-care chemotherapy. Given the complexity of the varying histological landscape of tumor specimens, we used histopathological quantification and laser-capture microdissection to determine that transcripts in the previously described immunomodulatory (IM) and mesenchymal stem-like (MSL) subtypes were contributed from infiltrating lymphocytes and tumor-associated stromal cells, respectively. Therefore, we refined TNBC molecular subtypes from six (TNBCtype) into four (TNBCtype-4) tumor-specific subtypes (BL1, BL2, M and LAR) and demonstrate differences in diagnosis age, grade, local and distant disease progression and histopathology. Using five publicly available, neoadjuvant chemotherapy breast cancer gene expression datasets, we retrospectively evaluated chemotherapy response of over 300 TNBC patients from pretreatment biopsies subtyped using either the intrinsic (PAM50) or TNBCtype approaches. Combined analysis of TNBC patients demonstrated that TNBC subtypes significantly differ in response to similar neoadjuvant chemotherapy, with 41% of BL1 patients achieving a pathological complete response compared to 18% for BL2 and 29% for LAR (95% confidence intervals (CIs): [33, 51], [9, 28] and [17, 41], respectively).
Collectively, we provide pre-clinical data that could inform clinical trials designed to test the hypothesis that improved outcomes can be achieved for TNBC patients, if selection and combination of existing chemotherapies is directed by knowledge of molecular TNBC subtypes.


Introduction
Triple-negative breast cancer (TNBC) is a heterogeneous collection of breast cancers lacking expression of estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) amplification. Patients with TNBC have a higher risk of both local and distant recurrence, and metastases are more likely to occur in the brain and lungs rather than bone compared to other breast cancers [1][2][3][4]. The majority of TNBC patients who progress to metastatic disease do so within the first three years after diagnosis; however, those patients who have not recurred during this time have similar survival rates as patients with ER-positive breast cancers [5][6][7]. Unlike ER-positive and HER2-amplified breast cancers, there is a lack of recurrent oncogenic driver alterations in TNBC [8][9][10]. This molecular heterogeneity has to date resulted in the absence of FDA-approved targeted therapies for TNBC. Chemotherapy is the main treatment option for patients with TNBC in the neoadjuvant, adjuvant or metastatic settings. Despite the rather aggressive clinical behavior of TNBC, approximately 30-40% of patients achieve a pathological complete response (pCR) with no histological evidence of disease at the time of surgery after neoadjuvant chemotherapy, and those patients have much higher rates of survival [11,12]. However, patients that show evidence of residual disease after neoadjuvant chemotherapy are six times more likely to have recurrence and twelve times more likely to die of metastatic disease [1,12,13].
The differences in clinical response and survival after neoadjuvant chemotherapy suggest that a subset of TNBC may be inherently insensitive to cytotoxic chemotherapy. We and others have recently demonstrated that TNBCs are transcriptionally heterogeneous and can be grouped into subtypes with vastly differing biologies and responses to chemotherapy and targeted therapies [5,14,15,16,17]. Previously, we identified six molecular TNBC subtypes (TNBCtype) [14,15,16,18], each displaying unique ontologies and differential response to standard-of-care chemotherapy. The TNBC subtypes include two basal-like (BL1 and BL2), an immunomodulatory (IM), a mesenchymal (M), a mesenchymal stem-like (MSL), and a luminal androgen receptor (LAR) type [8,15,19]. The BL1 subtype is characterized by elevated cell cycle and DNA damage response gene expression, while the BL2 subtype is enriched in growth factor signaling and myoepithelial markers. The IM subtype is composed of genes encoding immune antigens and cytokine and core immune signal transduction pathways and likely represents gene expression from both the tumor cells and infiltrating lymphocytes. Both M and MSL subtypes share elevated expression of genes involved in epithelial-mesenchymal-transition and growth factor pathways, but only the MSL subtype has decreased expression of genes involved in proliferation. The LAR subtype is characterized by luminal gene expression and is driven by the androgen receptor (AR). In addition, we identified TNBC cell lines representative of each subtype and demonstrated differential sensitivity to alkylating agent cisplatin, with BL1 cell lines displaying the greatest sensitivity [15,19,20].
In a prior retrospective analysis of patient pretreatment biopsies, TNBCtype molecular subtypes were predictive of response to neoadjuvant anthracycline and cyclophosphamide followed by taxane (ACT), with BL1 subtype tumors exhibiting the highest pCR (52%) and BL2 and LAR subtypes the lowest (0% and 10%, respectively). These results suggest that certain TNBC subtypes are intrinsically sensitive or insensitive to neoadjuvant chemotherapy.
TNBCtype molecular subtypes were identified from surgical tumor specimens containing significant stromal and immune components including normal cells. To determine if normal cells contribute to TNBC subtypes, we performed histological evaluation, laser-capture microdissection, RNA isolation and gene expression analysis on a panel of TNBC tumors. We provide significant evidence that the IM and MSL TNBC subtypes represent tumors with substantial infiltrating lymphocytes and tumor-associated mesenchymal cells, respectively. This evidence led us to refine our original TNBCtype (BL1, BL2, IM, M, MSL and LAR) to TNBCtype-4 (BL1, BL2, M and LAR).
Using the refined TNBCtype-4 on TNBC tumors genomically analyzed as part of The Cancer Genome Atlas (TCGA), we evaluated survival, age at diagnosis, grade, lymph node positivity, histopathological subtype enrichment and metastatic site preferences relative to TNBC subtype. In a retrospective analysis of gene expression datasets from five clinical trials we determined the predictive value of TNBCtype-4 subtypes in response to neoadjuvant chemotherapy.

TCGA breast cancer datasets and analysis
RNA-seq gene expression data for the TCGA breast cancer (BRCA) study were obtained from the Broad GDAC Firehose (http://gdac.broadinstitute.org/). Gene level 3 RSEM mRNA expression (stddata_2015_02_04 run) was downloaded and used for bimodal identification of TNBC samples and for PAM50 and TNBCtype subtyping. Mutation annotation files for BRCA were downloaded from GDAC Firehose (stddata_2015_02_04 run) and the total number of missense variants per sample was determined. H&E stained sections corresponding to the biopsy or primary tumor used for gene expression were obtained from the cancer digital slide archive (http://cancer.digitalslidearchive.net/) for histological evaluation of lymphocytes. Level 1 clinical annotation was downloaded from GDAC Firehose (stddata_2015_02_04 run).

Patients, samples and clinical data
Retrospective analysis was performed on previously published, clinically annotated microarray datasets in the public domain (GSE25066 [8], GSE41998 [14], GSE22358 [11], GSE22226 [1] and GSE32646 [5]) containing gene expression data from tumors of patients that received neoadjuvant chemotherapy (see Table 1 for details). Gene expression data were analyzed from pretreatment tumor samples. Pathological complete response (pCR) was defined as the absence of residual invasive adenocarcinoma in the breast and axillary lymph nodes upon histologic evaluation after neoadjuvant chemotherapy. All patients provided written informed consent and studies were approved by the Institutional Review Board or Independent Ethics Committee at all participating sites. The Hatzis et al. dataset (GSE25066) generated by MD Anderson Cancer Center (MDACC) [8] consisted of 508 breast cancer gene expression profiles from HER2-negative breast cancer patients who were enrolled in neoadjuvant chemotherapy trials and received an anthracycline-based and taxane regimen, either in combination or sequentially.
The Glück et al. dataset (GSE22358) contained gene expression profiles of tumors from 154 women with HER2-neu negative breast cancer that received chemotherapy alone consisting of 3 weekly cycles of treatment with Xeloda (capecitabine, 825 mg/m²) with taxotere (docetaxel, 75 mg/m²). Patients with HER2-neu positive breast cancer received the same chemotherapy in combination with Herceptin (trastuzumab) [11].
The Essermann et al. dataset (GSE22226) was obtained from pretreatment breast cancer biopsies from 149 women treated with four cycles of anthracycline followed by optional taxane as per physician's discretion [1].
The Horak et al. dataset (GSE41998) was obtained from the tumors of women enrolled on a randomized multicenter, phase II trial (NCT00455533) with no prior treatment and histologically confirmed primary invasive breast carcinoma regardless of hormone receptor or HER2 expression status [14]. Patients received sequential neoadjuvant therapy with 4 cycles of adriamycin (60 mg/m²) and cyclophosphamide (600 mg/m²), followed by a 1:1 randomization to either ixabepilone (40 mg/m² every 3 weeks for 4 cycles) or paclitaxel (80 mg/m² weekly for 12 weeks). Patients were stratified by tumor size at baseline, ER status, investigator site and clinical response to AC. All patients underwent surgery 4 to 6 weeks after the last dose of ixabepilone (n = 148) or paclitaxel (n = 147) and specimens were evaluated by pathological review at each site. We obtained the publicly available gene expression profiles from 279 pretreatment samples (GSE41998). In addition, gene expression profiling was performed on tumor and adjacent stroma from 10 TNBC specimens from Vanderbilt University. All human participants provided written consent and the study was approved by the Institutional Review Board (IRB090026).

Gene expression microarray normalization and processing
Raw microarray expression (CEL) files were downloaded from Gene Expression Omnibus. The Affymetrix U133 PLUS 2.0 CEL files from GSE25066 and GSE32646 and U133A CEL files from GSE41998 were processed and normalized using Frozen Robust Multiarray Analysis (fRMA) implemented in Bioconductor package frma, which renders the samples from different studies comparable by utilizing information from the large publicly available microarray databases [19]. The log2-transformed gene expression values were the basis for the analysis presented in this study. For datasets generated using Agilent microarray platforms (GSE22226 and GSE22358), processed gene expression data was obtained from the series matrix file in which Lowess-normalized data were obtained from the log2 ratio of sample (channel 2-Cy5) to the Stratagene human universal reference sample (channel 1-Cy3).

Identification of TNBC patients from gene expression data
To compare mRNA expression with IHC results, eliminating potential false negatives and including false positives, we approximated the empirical distributions of ESR1, PGR and ERBB2 mRNA expression from each dataset individually using a two-component Gaussian mixture (R optim function). The following probe sets were used for each of the datasets: GSE41998, GSE25066 and GSE32646: ESR1 (205225_at), PGR (208305_at) and ERBB2 (216836_s_at); GSE22226 (GPL4133): ESR1_18336, PGR_2744 and ERBB2_43498; GSE22226 (GPL4133): ESR1_26884, PGR_6923 and ERBB2_37893; GSE22358: ESR1_26884, PGR_15163 and ERBB2_38777. Given the estimated distributions, the posterior probability of negative expression status of ER, PR and HER2 was determined and samples negative for expression of each of these markers were identified.
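The mixture-based receptor calls described above can be illustrated with a minimal sketch. It fits a two-component Gaussian mixture to one marker's log2 expression values and reports the posterior probability that a sample belongs to the lower (negative) component. The sketch uses a plain EM loop rather than the R optimizer named in the text, and the function names, the initialisation, and the simulated expression values are all assumptions made for illustration, not the authors' code or data.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture by EM (illustrative only).

    Returns (weights, means, sds), each a list of length two.
    """
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    # Assumption: start the two means at the lower-half and upper-half averages.
    means = [sum(xs[:half]) / half, sum(xs[half:]) / (n - half)]
    grand_mean = sum(xs) / n
    pooled_sd = math.sqrt(sum((x - grand_mean) ** 2 for x in xs) / n) or 1.0
    sds = [pooled_sd, pooled_sd]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            p0 = weights[0] * normal_pdf(x, means[0], sds[0])
            p1 = weights[1] * normal_pdf(x, means[1], sds[1])
            total = (p0 + p1) or 1e-300
            resp.append((p0 / total, p1 / total))
        # M-step: update weight, mean and standard deviation per component.
        for k in (0, 1):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)
    return weights, means, sds


def posterior_negative(x, weights, means, sds):
    """Posterior probability that expression x comes from the low-expression component."""
    low = 0 if means[0] <= means[1] else 1
    high = 1 - low
    p_low = weights[low] * normal_pdf(x, means[low], sds[low])
    p_high = weights[high] * normal_pdf(x, means[high], sds[high])
    return p_low / (p_low + p_high)


# Simulated (hypothetical) log2 expression for one marker: a negative and a positive mode.
rng = random.Random(0)
simulated = [rng.gauss(6.0, 0.6) for _ in range(150)] + [rng.gauss(10.0, 0.8) for _ in range(150)]
w, m, s = fit_two_gaussians(simulated)
print("component means:", sorted(round(v, 2) for v in m))
print("P(negative | x = 6.5):", round(posterior_negative(6.5, w, m, s), 3))
```

In this sketch a sample would be called triple-negative only if the posterior for the negative component is high for all three markers; the threshold applied to that posterior is not stated in the text and would have to be chosen by the analyst.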

TNBCtype and PAM50 subtype predictor
For each dataset, all samples were analyzed with the PAM50 predictor using the robust method (R genefu package) [20]. To identify TNBC molecular subtypes, only samples determined to be TNBC by the mixed Gaussian distribution were subtyped, with each dataset analyzed individually using TNBCtype (http://cbc.mc.vanderbilt.edu/tnbc/) as previously described [18].

Histopathological evaluation of tumor lymphocytes from TCGA
The contribution of mononuclear chronic inflammatory cells to the cellularity of the entire tissue section was assessed histopathologically using digitally scanned H&E slides and Aperio software (Buffalo Grove, IL). Because non-microdissected tissues were used to generate the TCGA profiles, we assumed that peritumoral and intratumoral mononuclear cells were equally likely to contribute to the gene expression profiles of individual tumors. Semi-quantitative assessment of the proportion of mononuclear cells was performed. The mononuclear cell infiltrate as a percentage of all nuclei in a field was characterized as mild (0-10%), moderate (20-40%) or intense (>50%), using a modification of the proportion score described by the International TILs Working Group [22].

Gene expression analysis of laser capture-microdissected tumor and adjacent stroma
Depending on the amount of available tissue, laser capture microdissection (LCM) was performed on 30 to 50 sections (5 μm) of frozen breast cancer needle core biopsies using the Arcturus PixCell IIe microscope (Mountain View, CA). RNA from LCM-captured tumor and adjacent stromal cells was isolated using the RNAqueous-Micro kit (Ambion, Grand Island, NY). RNA was validated for quality, and subsequent cDNA synthesis and amplification were performed (on 10 ng of total RNA) by the Vanderbilt Technologies for Advanced Genomics (VANTAGE) core. The reactions were run through first strand and second strand synthesis, followed by two rounds of single primer isothermal amplification (NuGEN, San Carlos, CA). The amplified cDNA product was hybridized to the Affymetrix HuGene 1.0ST array. Raw Affymetrix Human Gene CEL files were normalized using the Robust MultiChip Averaging (RMA) algorithm implemented in the Bioconductor package affy. The probes were annotated using Bioconductor package hugene10sttranscriptcluster.db and the normalized, log2-transformed gene expression data were used for further analysis. Gene expression data are available under GEO (Gene Expression Omnibus) accession number GSE81838.

Statistical analysis
Odds ratios for pCR were computed for each TNBC and PAM50 subtype versus all unselected TNBC patients. Forest plots were generated with the R package rmeta. Kaplan-Meier estimates and log-rank tests were used to estimate and compare survival curves of TNBC patients stratified by PAM50 or TNBCtype. 95% confidence intervals were determined with the R package binom. Chi-square tests were performed for all comparisons involving two categorical variables from a single population. Fisher's exact test was performed on categorical variable comparisons between two groups. The Wilcoxon signed-rank test was used for pairwise significance testing of continuous variables. A Cox proportional hazards model was fitted using IM correlation as a continuous variable, and significance was determined by likelihood ratio test. All correlations use Spearman's method. All statistical analyses were performed in R version 3.1.2.
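As an illustration of the binomial confidence intervals mentioned above, the following sketch computes a 95% Wilson score interval for a pCR proportion. The text does not say which interval type was requested from the binom package, so the choice of the Wilson method is an assumption, and the counts in the example are hypothetical rather than taken from the study.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (default z gives ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return centre - half_width, centre + half_width


# Hypothetical example: 50 pathological complete responses among 100 patients.
low, high = wilson_interval(50, 100)
print(f"pCR = 50%, 95% CI [{100 * low:.1f}, {100 * high:.1f}]")
```

Other interval types (for example exact Clopper-Pearson) would give slightly different bounds, so the sketch should not be expected to reproduce the published intervals exactly.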

Significant correlation between IM subtype and the level of tumor infiltrating lymphocytes (TILs)
Gene expression profiles from human tumors are a composite, to varying degrees, of tumor and surrounding stromal and immune cells, including fibroblasts, adipocytes, endothelial cells, macrophages and lymphocytes. Recent studies have suggested prognostic value of TILs in TNBC [23,24]. Since the IM subtype is highly enriched for immune cell markers and signaling, we hypothesized that TIL levels in a tumor specimen would influence the IM subtype 'call' for a given TNBC. To test this hypothesis, we scored H&E sections from 180 TNBCs within The Cancer Genome Atlas (TCGA) for lymphocytes and analyzed the results relative to the TNBCtype call generated from the RNA-seq data (see methods). The study pathologists found that the percentage of total nuclei represented by lymphocytes ranged from 0% to 70% with a median of 10% per tumor sample. The latter is considered mild lymphocytic presence (Fig 1A and S1 Table). Tumors classified as IM had the highest average percentage of lymphocytes with 38%, followed by BL2 (23%), MSL (21%), LAR (17%), BL1 (15%) and M (9%) (S1 Table). Regardless of tumor subtype, the IM component for each tumor was highly correlated (Spearman, 0.67) with percentage lymphocytes. Analysis of the corresponding TNBCtype calls showed that the level of IM correlation tracked the degree of lymphocytic presence when binned as mild (median = -0.32), moderate (median = 0.20) and intense (median = 0.56) lymphocytic presence (p<0.001) (Fig 1B).
To further examine the relative level of immune component in the other TNBC subtypes, we used a separate cohort of 587 TNBC tumors [15] and examined cases receiving a BL1, BL2, M, MSL or LAR primary 'call' for the presence of a secondary correlation to the IM subtype. BL1, BL2, MSL and LAR classified tumors all had representatives with high secondary correlations to the IM subtype (Fig 1C). In contrast, mesenchymal (M) classified tumors all had a very low IM correlation. In fact, there is a negative correlation (Spearman, -0.95) between IM and M across all tumors (S1A Fig), suggesting opposite biological states, with M-subtyped tumors having a microenvironment that is non-permissive to immune cell infiltration or immunosuppressed (S1B Fig). These data provide strong evidence that the infiltrating lymphocytes contribute significantly to the gene expression profiles for the IM subtype and that correlations to this signature should be considered as a descriptor of the immune state of the tumor rather than an independent subtype [25].
A previously published gene signature composed of T- and B-cell markers, chemokines, and immune checkpoint regulators (CXCL9, CCL5, CD8A, CD80, CXCL13, IGKC, CD21, IDO1, PD-1, PD-L1, CTLA4, FOXP3) was shown to be highly correlated with pathological evaluation of immune cell infiltrate [23]. To further demonstrate that the IM subtype represents tumors with high lymphocytic infiltrate, we examined the expression of the genes listed above in the gene expression data set of 587 TNBC [15]. When ordered by increasing correlation to the IM subtype, high immune gene expression was confined to IM, BL1, BL2 and MSL tumors, with very low expression in M tumors (Fig 1D). IM subtype tumors had high levels of the immune checkpoint regulatory genes CTLA4, CD274 (the gene encoding PD-L1) and PDCD1 (the gene encoding PD-1) and may be amenable to agents targeting immune checkpoints given the recent success of anti-PD-1 and anti-PD-L1 therapy in TNBC [26].

Significant correlation between MSL TNBC subtype and the level of stromal mesenchymal cell gene expression
We performed laser-capture microdissection (LCM) on 10 TNBC tumors followed by RNA isolation and gene expression analysis on malignant epithelial cell-enriched areas and the adjacent stromal cell-containing areas of the tumor sections (Fig 2A). Principal component analysis demonstrated that overall gene expression profiles were more similar within tumor samples and stromal samples than between matched tumor/stromal samples (S2 Fig). To determine if we efficiently captured tumor and stromal cells, we identified differentially expressed genes (fold change, FC > 2; false discovery rate, FDR < 0.01) between tumor and stromal samples and performed gene set enrichment analysis (S2 Table and Fig 2B). Among the most significantly enriched pathways in dissected tumor epithelial cells were cell cycle, mitosis, DNA replication, cell cycle checkpoint and G1/S transition. In contrast, dissected stromal samples were highly enriched for expression of genes encoding extracellular matrix proteins, collagens, proteoglycans, glycoproteins and integrins (Table 2). Analysis of TNBC subtypes from both the matched tumor epithelium and stroma revealed that six of ten pairs had discordant subtype calls (Table 3). Of these, five changed to MSL when the stromal gene expression was analyzed, suggesting that the MSL subtype has features of surrounding cells or that these samples are comprised of stromal gene expression. Examination of the centroid correlations between each of the pairs revealed that the correlations remained stable for all subtypes, with the exception of MSL (S2 Table). The MSL component is significantly higher for the adjacent stromal cells compared to the tumor epithelium for each of the pairs (Wilcoxon signed-rank p = 0.001953), indicating the MSL subtype is comprised of tumors with a high abundance of tumor-associated mesenchymal tissue (Fig 2C).
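The reported Wilcoxon signed-rank p-value can be understood with a short worked example. With ten pairs and no ties, the smallest possible two-sided exact p-value is 2/2^10 = 0.001953, which is what is obtained when the difference has the same sign in every pair, consistent with the MSL correlation being higher in stroma for each of the ten pairs. The sketch below enumerates the exact null distribution; the function name and the paired differences are hypothetical, and it assumes no zero or tied differences.

```python
from itertools import product


def exact_signed_rank_p(differences):
    """Two-sided exact Wilcoxon signed-rank p-value (assumes no zeros and no ties)."""
    if any(d == 0 for d in differences):
        raise ValueError("zero differences are not handled in this sketch")
    magnitudes = sorted(abs(d) for d in differences)
    if len(set(magnitudes)) != len(magnitudes):
        raise ValueError("tied magnitudes are not handled in this sketch")
    rank = {a: i + 1 for i, a in enumerate(magnitudes)}
    n = len(differences)
    # Observed statistic: sum of the ranks attached to positive differences.
    w_plus = sum(rank[abs(d)] for d in differences if d > 0)
    # Under the null hypothesis every rank is positive or negative with probability 1/2,
    # so all 2**n sign assignments are equally likely.
    count_le = 0
    count_ge = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(range(1, n + 1), signs) if s)
        if w <= w_plus:
            count_le += 1
        if w >= w_plus:
            count_ge += 1
    return min(1.0, 2.0 * min(count_le, count_ge) / 2 ** n)


# Hypothetical stroma-minus-tumor differences in MSL correlation, all positive.
hypothetical_diffs = [0.11, 0.25, 0.32, 0.18, 0.41, 0.07, 0.29, 0.36, 0.22, 0.15]
print(exact_signed_rank_p(hypothetical_diffs))
```

Because the enumeration grows as 2^n, this approach is only practical for small numbers of pairs such as the ten used here.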

Clinical, histological and genomic differences in refined TNBC subtypes
Having demonstrated that IM and MSL subtype calls are strongly weighted by stromal cell gene expression and that subtype correlations are independent of one another, we refined TNBCtype from six to four subtypes (TNBCtype-4) by re-assigning IM and MSL subtypes to the second highest correlated centroid. Using PAM50, TNBCtype and the refined TNBCtype-4 subtyping algorithms, we reanalyzed 587 TNBC tumors from publicly available gene expression data [15] and 180 additional cases from TCGA [9] for clinical, histological and genomic differences. Given the similar distribution of subtypes for TNBCtype, TNBCtype-4 and PAM50, we merged TNBC patients from both datasets and analyzed clinical variables (S3 Table).
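The re-assignment rule can be sketched as follows. Given one tumor's correlations to the six TNBCtype centroids, the original call is the highest-correlated centroid; if that call is IM or MSL, the refined call falls back to the next centroid. The text does not say what happens when the second-highest centroid is itself IM or MSL, so this sketch assumes the best-correlated of the four tumor-specific centroids is used in that case. The function name and the correlation values are hypothetical.

```python
TUMOR_SPECIFIC = ("BL1", "BL2", "M", "LAR")
DESCRIPTORS = ("IM", "MSL")  # treated as descriptors of immune/stromal content


def refine_to_tnbctype4(correlations):
    """Return (TNBCtype call, TNBCtype-4 call) from a dict of centroid correlations."""
    ranked = sorted(correlations, key=correlations.get, reverse=True)
    original = ranked[0]
    if original not in DESCRIPTORS:
        return original, original
    # Assumption: skip IM and MSL entirely and take the best tumor-specific centroid.
    refined = next(name for name in ranked if name in TUMOR_SPECIFIC)
    return original, refined


# Hypothetical centroid correlations for a single tumor.
example = {"BL1": 0.31, "BL2": 0.05, "IM": 0.52, "M": -0.40, "MSL": 0.10, "LAR": -0.12}
print(refine_to_tnbctype4(example))  # ('IM', 'BL1')
```

Under this scheme the IM and MSL correlations are not discarded; they remain available as continuous descriptors of lymphocytic and stromal content alongside the four-class call.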
The IM subtype displayed the best overall and relapse-free survival (Fig 3E and 3H). Since the IM subtype had a higher amount of lymphocytes and better survival, we examined whether correlation to the IM centroid resulted in better survival, regardless of TNBCtype subtype. Cox proportional hazards modeling of survival using correlation to the IM centroid as a continuous variable demonstrated a significant increase in relapse-free survival (likelihood ratio p = 0.0494) and a trend for increased overall survival (likelihood ratio p = 0.0742) (S3 Fig). Therefore, the presence of lymphocytes as measured by the IM correlation has predictive value for better relapse-free survival for TNBC patients, regardless of TNBCtype subtype.
To determine if PAM50 or the refined TNBCtype-4 subtypes display clinical differences, we examined age of diagnosis, tumor size, grade and lymph node involvement among the subtypes from annotated microarray gene expression and TCGA RNA-seq data (Table 4). In contrast to their lower histological grade, non-basal TNBC presented with significantly more advanced clinical disease and higher stage than basal TNBC (p = 0.0004; Table 4). Stratification by TNBCtype-4 resulted in significant differences in disease stage (p = 0.0003). Despite being of histologically higher grade, BL1 (6% stage 3) tumors were of lower clinical stage than BL2 (30% stage 3) and LAR (22% stage 3) tumors. Regional spread to lymph nodes occurred in 34% of TNBC and there was no significant difference between basal (29%) and non-basal (31%) TNBC (p = 0.1325). However, there was a significant enrichment of lymph node metastasis in LAR TNBC, with nearly half (47%) of these patients displaying regional spread (p = 0.0278; Table 4). Lymph node involvement was lower for the M TNBC subtype, as only 21% had lymph node disease.
Since there are clear differences in regional lymph node spread, we evaluated if these differences resulted in differential clinical progression. TNBCs have been shown to have a high frequency of lung and brain metastases [28]. Using published datasets with metastasis-site annotations (GSE12276, GSE2034 and GSE2603), we identified 124 patients with site-specific metastasis data and examined the metastatic pattern in TNBC subtypes [29]. Overall in TNBC, the incidences of brain (11%), bone (19%) and lung (31%) metastasis were similar to a previous report of 10.9%, 16.6% and 18.5% to brain, bone and lung, respectively [28]. Stratification by TNBC subtype did not show any statistical differences in brain (p = 0.1238) and lung (p = 0.0776) metastasis (S5 Table). However, the M subtype displayed a significantly higher frequency of lung metastasis (46%) compared to all other subtypes (25%) (p = 0.0388). Metastasis to the bone was significantly different among TNBC subtypes (p = 0.0398). For example, the incidence of bone metastasis was significantly higher for the LAR subtype (46%) as compared to all other subtypes (16%) (p = 0.0456), consistent with the preference of hormonally-regulated cancers to metastasize to bone [30].
Since stratification by TNBCtype-4 resulted in subtypes with clinical differences, we examined if stratification by PAM50 or TNBCtype-4 subtypes enriched for atypical histology within the TCGA cohort (S6 Table). Nearly all of the special histological subtypes are basal by PAM50, with the exception of the lobular carcinomas that are luminal and the secretory and an invasive pleomorphic lobular carcinoma that are normal-like (S6 Table). While comprising the largest TNBCtype-4 subtype, BL1 tumors were largely ductal carcinomas without notable atypical histology. In contrast, infiltrating lobular carcinomas were nearly exclusive to the LAR subtype (4 of 5), suggesting a potential role for AR signaling in lobular breast cancer. Medullary carcinomas are characterized by infiltrating carcinomas with circumscribed pushing borders and dense peripheral lymphoid infiltrate, and have favorable outcome. Medullary breast cancer histological types were present in BL1, BL2 and LAR and absent in the M subtype, consistent with the lack of lymphocytic infiltration in the M subtype. Metaplastic carcinomas display differentiation towards squamous epithelium with mesenchymal components and cells displaying spindle, chondroid, osseous or rhabdoid morphologies. All of the metaplastic carcinomas were either BL2 (n = 4) or M (n = 4), with one BL2 described as squamous. In contrast, each of the metaplastic breast cancers is classified as basal by PAM50, even though they display striking differences in morphology.

Analysis of over 300 TNBC patients treated with neoadjuvant chemotherapy identifies TNBC molecular subtypes with differing responses
Masuda and colleagues previously showed that TNBC patients significantly differ in response to neoadjuvant chemotherapy composed of anthracycline and taxane (A-T) based on the subtype of their tumor as determined by TNBCtype [13]. To determine if women with the refined TNBCtype-4 subtypes have differing outcomes to standard chemotherapy, we re-examined the MDACC cohort (GSE25066) used by Masuda et al. [13] with PAM50 and the refined TNBCtype-4 (S7 Table). To identify TNBC patients within each of the cohorts, we applied a mixed Gaussian distribution for each dataset using the mRNA expression for ER, PR and ERBB2 (S4 Fig). Using mixed Gaussian distribution along with annotated pathological calls, we were able to identify 182 TNBC patients, of which 176 had neoadjuvant chemotherapy response information. Consistent with previous reports, TNBC patients had higher pCR than non-TNBC patients (34% vs. 11%; p = 0.0001) in the GSE25066 dataset (Fig 4A) [12]. TNBC tumors were subtyped by either PAM50 or TNBCtype-4 and chemotherapy response was evaluated in this dataset. PAM50 subtyping of tumors into basal and non-basal subtypes did not result in significant differences in pCR (p = 0.1135; Fig 4B). Although TNBCtype-4 subtyping did not result in significant differences in pCR for TNBC patients treated with neoadjuvant chemotherapy (p = 0.1074; Fig 4C), the pCR incidence for the subtypes showed similar trends to previous studies, with BL1 displaying the greatest response and BL2 and LAR displaying lower pCR. However, compared to all other subtypes, BL1 patients had significantly higher pCR (49% vs. 31%; p = 0.0441).
Distant relapse-free survival (DRFS) was evaluated in the same cohort to determine if chemotherapy responses to neoadjuvant ACT resulted in differences in survival within PAM50 and TNBCtype-4 subtypes. Despite having better pCR to neoadjuvant chemotherapy (34% vs. 11%), TNBC patients had significantly worse DRFS compared to non-TNBC (p = 1.8e-6; Fig 4D). However, TNBC patients that responded to chemotherapy and achieved a pCR clearly had a far better DRFS compared to those patients that did not, with 95% of patients surviving seven years after treatment compared to a median survival of 2.7 y (p = 3.78e-8; Fig 4E). While there were no differences in DRFS between basal and non-basal PAM50 subtypes (p = 0.41), stratification by TNBCtype-4 trended towards significance (p = 0.09), with BL2 patients displaying the worst outcome with a median survival of 2.4 y compared to a median survival for unselected TNBC being greater than 7 y (Fig 4F and 4D). In contrast, the BL1 subtype displayed the highest pCR (49%) and also the best long-term DRFS with 72% of patients relapse-free at 7 y follow up (Fig 4G).
To determine how likely TNBC patients were to achieve a pCR, we computed the odds ratio for pCR for each subtype compared to all TNBC patients. In the combined dataset of 306 TNBC patients, unselected TNBC patients were 2.5 times more likely to achieve pCR than non-TNBC. When TNBC was stratified by PAM50, basal TNBC (OR, 1.17) was slightly more likely and non-basal TNBC (OR, 0.50) was less likely to achieve pCR than unselected TNBC (OR, 1.00) (Fig 5D). Stratification of tumors by TNBCtype resulted in BL1 (OR, 1.44) and M (OR, 1.21) tumors with greater odds and both BL2 (OR, 0.44) and LAR (OR, 0.81) tumors with lower odds of achieving a pCR compared to unselected TNBC (OR, 1.00) (Fig 5E). Stratification of TNBC patients by refined TNBCtype-4 could identify those patients most and least likely to respond to neoadjuvant chemotherapy. These findings support the need for further clinical testing of the predictive power of TNBC molecular subtyping.
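The odds ratios quoted above compare the odds of pCR in one subtype with the odds of pCR in a reference group (unselected TNBC). A minimal sketch of that arithmetic is given below; the function names are illustrative and the counts are hypothetical, since the per-subtype counts are not given in this section.

```python
def odds(events, total):
    """Odds of an event: events divided by non-events."""
    non_events = total - events
    if events < 0 or non_events <= 0:
        raise ValueError("need 0 <= events < total")
    return events / non_events


def odds_ratio(events, total, ref_events, ref_total):
    """Odds of the event in a subgroup relative to the odds in a reference group."""
    return odds(events, total) / odds(ref_events, ref_total)


# Hypothetical counts: 20 of 50 patients in a subtype achieve pCR,
# compared with 100 of 300 patients in the unselected reference group.
print(round(odds_ratio(20, 50, 100, 300), 2))  # 1.33
```

By construction the reference group compared with itself gives an odds ratio of 1.00, which is why unselected TNBC appears with OR 1.00 in the comparisons above.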

Discussion
TNBC is a molecularly heterogeneous disease that we had previously subtyped by gene expression into six different subtypes [15]. Herein we refine this prior classification into four subtypes after taking into consideration the contribution of transcripts from normal stromal and immune cells in the tumor environment. The four TNBC subtypes display differing clinical characteristics, with BL1 tumors displaying higher grade, lower stage and increased patient overall and relapse-free survival. TNBC subtypes displayed different patterns of progression, with patients with LAR tumors having increased regional spread and preferential distant metastasis to bone, while M tumors preferentially metastasize to lung. Clinical differences were complemented by histological differences, with lobular carcinomas nearly exclusive to the LAR subtype and metaplastic carcinomas either M or BL2. More importantly, TNBC subtypes differed in their response to standard neoadjuvant chemotherapy, with the BL1 subtype displaying the greatest likelihood of achieving a pathological complete response.
Using pathological evaluation of lymphocytes and tandem LCM followed by gene expression analysis of adjacent tumor and stromal cells, we demonstrated that the previously described IM and MSL subtypes are composed of tumors with low cellularity. Therefore, we have refined TNBC from six to four (TNBCtype-4) transcriptional subtypes, using IM and MSL correlations as subtype descriptors of cellular heterogeneity. Refined TNBCtype-4 subtypes show consistencies with histological differences observed by pathologists, such as lobular breast cancers belonging to the LAR subtype and metaplastic breast cancers having a BL2 or M subtype. Consistent with differing biologies and histologies, are differences in disease progression and metastatic spread. Paradoxically, BL1 tumors are higher grade but lower stage than LAR. There is an increased frequency of regional lymph node involvement in LAR TNBC and differential metastatic spread to the lung and bone for M and LAR tumors, respectively. This tissue tropism likely reflects unique tumor biology and suggests the need for different and more comprehensive approaches for monitoring metastatic disease in newly diagnosed M and LAR patients, with more personalized imaging approaches.
Approximately 20% of TNBCs classify as immunomodulatory and are highly enriched in immune cell markers and signaling. Pathological evaluation of lymphocytes from H&E sections provides significant evidence that infiltrating lymphocytes within tumors drive the overall gene expression of the IM subtype. The presence of tumor-associated lymphocytes in a TNBC generated a gene expression profile that had increased expression of immune checkpoint regulators such as PD1, PD-L1 and CTLA4, was strongly correlated with the IM gene signature centroid and was associated with increased relapse-free survival for the patient. These data are of interest, given the promising phase I results with inhibitors of the PD-1/PD-L1 axis, in which 18.5% of 27 TNBC patients responded to pembrolizumab [31].
Of note, recent studies have shown that the presence of TILs is associated with better response to adjuvant chemotherapy [32] and neoadjuvant chemotherapy [23]. This association with neoadjuvant response appears to be more pronounced with platinum-based agents, as patients enrolled in the doxorubicin with carboplatin arm of the GeparSixto (GBG 66) trial had greater odds of achieving pCR than those on the doxorubicin arm alone [23].
The IM subtype descriptor has the potential to be a semi-quantitative biomarker for immunoreactive TNBC tumors, and consideration should be given to investigating whether it can identify patients who may benefit from immune checkpoint inhibitors. Interestingly, select tumors representative of all the subtypes, except M, had some degree of correlation to the IM centroid and presence of immune cells. In fact, there was a strong negative correlation between the IM and M subtypes, suggesting that M tumors create an immune-suppressive microenvironment.
Certain TNBC patients clearly benefit from chemotherapy that includes a combination of anthracyclines, alkylating agents and microtubule inhibitors in the neoadjuvant, adjuvant and metastatic settings. However, historically this benefit is restricted to a subset of patients, with approximately 22-30% achieving pCR in the neoadjuvant setting, an outcome that correlates with overall and event-free survival [12]. Currently, no validated predictive biomarkers for individual chemotherapeutics have been described outside of platinum agents for BRCA1/2-mutated TNBC [6,33,34].
Since TNBC subtypes have previously been shown to be independently associated with pCR, we re-evaluated response to neoadjuvant chemotherapy in TNBC patients from five neoadjuvant chemotherapy datasets stratified into the refined TNBCtype-4 molecular subtypes. Initial evaluation of the GSE25066 dataset was consistent with the previous publication, in which BL1 had the highest pCR and BL2 and LAR the lowest [13]. Similar evaluation of tumors stratified by PAM50 showed basal tumors having a better response than non-basal tumors. The decreased response of BL2 tumors to neoadjuvant chemotherapy was consistent with decreased distant relapse-free survival for those patients. In contrast, the LAR subtype had better survival despite a decreased response to neoadjuvant chemotherapy. The decreased response of AR-positive TNBC tumors to neoadjuvant chemotherapy has recently been validated with the report of significantly lower pCR and increased disease recurrence in AR-positive TNBC patients [35]. The discrepancy between response and survival in the LAR subtype can potentially be explained by the decreased proliferation and well-differentiated luminal state of this subtype.
Combined analysis of over 300 TNBC patients receiving neoadjuvant chemotherapy showed that BL2 patients were significantly less likely to achieve a pCR than TNBC patients as a whole. These data are also supported by decreased relapse-free survival in the GSE25066 cohort, with a median survival of less than 3 y compared to 7 y for unselected TNBC, and highlight the unmet medical need to identify alternative therapeutic strategies for this patient population. In contrast, BL1 patients were nearly 50% more likely to achieve a pCR than unselected TNBC patients. Even though BL1 tumors are more likely to be of higher grade, they are more responsive in general to genotoxic chemotherapies. This responsiveness is likely due, in part, to aberrant DNA signaling and repair functions in BL1 subtype tumors. The largest share of samples was classified as BL1 (36%); however, after neoadjuvant chemotherapy, one would anticipate a greater enrichment of chemo-insensitive subtypes, such as BL2 and LAR, in patients with residual disease. Stratification into the BL1 subtype may identify a patient population that is more responsive to chemotherapy and for whom chemotherapeutic treatments are most appropriate.

Conclusions
Our analyses and resulting data refine TNBC into four molecular subtypes and provide further evidence that patients with tumors subtyped as BL1 will receive greater benefit from standard neoadjuvant chemotherapy such as ACT than patients with other TNBC subtypes such as BL2 and LAR. Subtyping of TNBC tumors should provide significant value for future clinical decision-making and the alignment of TNBC patients with traditional chemotherapy versus targeted and immune-based therapies that are currently under clinical investigation.