Development of predictive models for the prognosis of triple-negative breast cancer using multiple transcriptomic analyses

Suhyun Hwangbo; Yoojin Choi; Jae Yong Ryu

doi:10.1371/journal.pone.0348414

Abstract

Triple-negative breast cancer (TNBC) is a subtype of breast cancer (BC) and constitutes approximately 15–20% of all BC cases. This subtype has the most aggressive behavior and the worst prognosis. Numerous studies have been conducted over the past several decades to address the lack of clinically available treatment options. In particular, potential markers for guiding effective treatment have been actively studied. However, these efforts were hindered by the complex mechanisms of TNBC, and no study has demonstrated a model with an area under the curve (AUC) exceeding 0.85. This study developed TNBC prognosis predictive models with a test AUC exceeding 0.94. Applying the nine selected markers to five independent datasets demonstrated their potential as TNBC-specific prognostic markers. Most of these genes (including GPR61, PZP, IGFL1, and AHCTF1) are associated with overall survival (OS) in patients with TNBC. Based on these results, these nine selected genes may serve as prognostic markers for OS in patients with TNBC.

Citation: Hwangbo S, Choi Y, Ryu JY (2026) Development of predictive models for the prognosis of triple-negative breast cancer using multiple transcriptomic analyses. PLoS One 21(5): e0348414. https://doi.org/10.1371/journal.pone.0348414

Editor: Julie Decock, Qatar Biomedical Research Institute, QATAR

Received: February 26, 2025; Accepted: April 14, 2026; Published: May 4, 2026

Copyright: © 2026 Hwangbo et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All publicly available datasets used in this study are listed below. The TCGA-BRCA gene expression data can be accessed from the UCSC Xena Data Hub (“TCGA-BRCA cohort: gene expression RNAseq - IlluminaHiSeq”; https://xenabrowser.net/datapages/?dataset=TCGA.BRCA.sampleMap%2FHiSeqV2&host=https%3A%2F%2Ftcga.xenahubs.net&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443). Survival data are available under “TCGA-BRCA cohort: phenotype - Curated survival data” (https://xenabrowser.net/datapages/?dataset=survival%2FBRCA_survival.txt&host=https%3A%2F%2Ftcga.xenahubs.net&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443), and BC subtype information is provided in “TCGA-BRCA cohort: phenotype - Phenotypes” (https://xenabrowser.net/datapages/?dataset=TCGA.BRCA.sampleMap%2FBRCA_clinicalMatrix&host=https%3A%2F%2Ftcga.xenahubs.net&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443). The GEO datasets analyzed (GSE65216, GSE215442, and GSE18864) are available via the NCBI Gene Expression Omnibus: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE65216 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE215442 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE18864 The METABRIC dataset used for external validation is accessible via cBioPortal (https://www.cbioportal.org/study/summary?id=brca_metabric). The single-cell RNA-seq and bulk RNA-seq datasets generated from the cell line experiments in this study are publicly available at Zenodo (DOI: 10.5281/zenodo.17527826). Code for model development, along with example data, is available at GitHub (https://github.com/Syhyun-Hwangbo/TNBC-Prediction-Model/tree/main).

Funding: This study was supported by Seoul National University Hospital (grant no. 04-2023-0690, Suhyun Hwangbo) and Basic Science Research Program through the National Research Foundation (NRF) of Korea (grant no. RS-2021-NR060140, Jae Yong Ryu) funded by the Ministry of Education. The NRF grant supported single-cell data generation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Breast cancer (BC) is the most commonly diagnosed cancer in women worldwide and has the highest cancer-related mortality rate [1,2]. It is heterogeneous and is generally classified into four subtypes based on the expression of the estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor-2 (HER2) [2,3]. Triple-negative breast cancer (TNBC) is one of the four subtypes of BC, accounting for approximately 15–20% of all BC cases [2,4]. It is defined as the absence of the expression of three receptors (ER, PR, and HER2) [5], resulting in unresponsiveness to agents targeting hormone receptors and HER2. Significant progress has been made in addressing this issue, particularly with novel agents such as poly ADP-ribose polymerase inhibitors and antibody-drug conjugates for patients with TNBC [6–8]. However, these studies have primarily focused on subsets of patients with TNBC, such as those with germline mutations in BRCA1/2 or metastatic TNBC.

Despite these advancements, TNBC remains a challenging disease with poorer outcomes compared to other BC subtypes. Specifically, the 5-year overall survival (OS) rate of primary TNBC is significantly lower (77%) than that of other BC subtypes (93%): Luminal A (LumA), Luminal B (LumB), and HER2-positive [9–11]. Given this poor prognosis and the lack of effective targeted therapies for most TNBC patients, identifying robust prognostic biomarkers has important clinical implications. Such biomarkers can improve risk stratification, guide treatment decisions, and identify patients who may benefit from novel therapeutic strategies or intensified surveillance [12]. Thus, there is a critical need for additional prognostic markers and therapeutic targets to further improve patient outcomes, which is the primary aim of this study.

Numerous studies have been conducted over the past several decades to overcome the limitations of treatment options for patients with TNBC by identifying potential markers for guiding effective treatment. Neoadjuvant chemotherapy (NAC) aimed at reducing tumor size has been established as the standard treatment for TNBC, with improved prognosis for patients achieving a pathological complete response (pCR) [2,13]. Owing to the clinical benefits of NAC, many studies have focused on identifying markers that predict pCR after NAC in order to stratify patients by NAC response [10,14,15]. Studies have also identified predictors of sensitivity to chemotherapy in TNBC (such as BCL2), including postoperative adjuvant chemotherapy [16,17]. Additionally, studies have sought predictors of the response to immune checkpoint inhibitors (ICI) plus chemotherapy to identify new therapeutic agents [18,19]. Unfortunately, these efforts have not yet yielded effective treatment strategies, owing to the complex mechanisms of TNBC that do not solely rely on specific signals [20].

In addition to chemotherapy or ICI, previous studies have attempted to identify prognostic markers that predict outcomes, such as recurrence, death, disease-free survival (DFS), and OS, in patients with TNBC [5,21,22]. Campone et al. developed a model to predict recurrence using three protein signatures (TrpRS, TSP1, and DP), achieving an area under the curve (AUC) of 0.82 [5]. Xu et al. developed predictive models for death based on multiple machine learning algorithms using clinicopathological data, the best of which achieved an AUC of 0.732 [21]. Yang et al. developed a nomogram to predict DFS and OS using clinicopathological data, which achieved an AUC of 0.784 for DFS and 0.783 for OS [22]. However, no current model achieves an AUC above 0.9 on validation data, indicating the ongoing need for models with greater accuracy.

Recently, research has been conducted to identify TNBC markers using data from LumA, which expresses two hormone receptors and has the best prognosis among the BC subtypes. Choi et al. constructed a molecular regulatory network model for reprogramming TNBC cells into LumA cells and identified BCL11A and HDAC1/2 as the optimal targets for inducing the transition to LumA cells [1]. Singhal et al. established TNBC cell line-driven SLFN12-overexpressing human BC xenografts that led to higher levels of LumA markers, HER2 receptor expression, and ultimately better survival [23].

In this study, we aimed to develop predictive models with outstanding performance in forecasting the prognosis of TNBC subtypes by employing multiple machine learning algorithms and identifying TNBC-specific prognostic markers through validation using multi-cohort transcriptomic datasets.

Materials and methods

Data sources

This retrospective study used six datasets: four RNA-sequencing (RNA-seq) datasets and two microarray datasets. The RNA-seq dataset from The Cancer Genome Atlas Breast Invasive Carcinoma (TCGA-BRCA) cohort served as the primary dataset for model development; gene expression profiles (Illumina HiSeq 2000) were obtained from the UCSC Xena Data Hub (https://xenabrowser.net/). Only primary tumors were included to build a prognostic model for early outcome prediction. Subtypes were defined by the Prediction Analysis of Microarray 50 (PAM50) signature and comprised 143 TNBC (basal-like), 386 LumA, 186 LumB, and 69 HER2-positive samples. OS and death events were used as the primary outcomes.

For external validation of prognostic genes identified during model development, we used two Gene Expression Omnibus (GEO) datasets (GSE65216 and GSE215442), two cell-line datasets, and the METABRIC cohort. GSE65216 is a GPL570 (Affymetrix U133 Plus 2.0) microarray dataset including 55 TNBC, 29 LumA, 30 LumB, and 39 HER2-positive samples. GSE215442 is an RNA-seq dataset generated from MDA-MB-231 TNBC cells overexpressing SLFN12 to create LumA-like subclones with favorable prognosis, comprising three SLFN12-overexpressing lines and three controls [23]. The cell-line datasets consisted of a single-cell RNA-seq dataset (7,484 TNBC and 4,599 LumA cells) and a bulk RNA-seq dataset (31 TNBC and 10 LumA cell lines). METABRIC is a microarray cohort of 320 TNBC tumors used for external validation; during follow-up, 168 patients (52%) died, with a median OS of 13.3 years.

Model development and evaluation

The overall workflow is illustrated in Fig 1. The objective of this study was to develop a prognostic model for TNBC using the Cox proportional hazards (CoxPH) regression based on time-to-event data. All 143 TNBC samples from the TCGA-BRCA cohort were included, since CoxPH models estimate relative risk within the cohort without requiring a predefined control group.

Fig 1. Workflow for model development and evaluation.

The dataset was randomly split 70/30 into training and test sets. Across 100 resamples, univariate Cox proportional hazards models were fit in the training set; genes significant in ≥80/100 resamples were retained as candidates. An AUC-based stepwise selection in the training set produced the final gene signature. For each resample, the signature was trained on the training set and evaluated on its paired test set, and performance (AUC) was summarized across all 100 pairs. Outcomes were 5-year and 10-year overall survival.

https://doi.org/10.1371/journal.pone.0348414.g001

The dataset was randomly divided into training (70%) and test (30%) subsets, stratified by event status to maintain the proportion of deaths and censored cases. For each gene, a univariate CoxPH model was fitted using the training set, and genes with p-value < 0.05 were considered significant. This process was repeated 100 times with random resampling, and genes identified as significant in at least 80 of the 100 iterations were retained as candidate predictors.
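The resampling screen described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `stratified_split`, `screen_genes`, and the `pvalue_fn` callback are hypothetical names, and `pvalue_fn` stands in for fitting a univariate CoxPH model on the training rows and returning the gene's p-value.

```python
import random
from collections import Counter


def stratified_split(events, train_frac=0.7, rng=None):
    """Split sample indices 70/30 separately for censored (0) and deceased (1) cases."""
    rng = rng or random.Random()
    train, test = [], []
    for status in (0, 1):
        idx = [i for i, e in enumerate(events) if e == status]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)


def screen_genes(genes, events, pvalue_fn, n_resamples=100, min_hits=80,
                 alpha=0.05, seed=0):
    """Keep genes with p < alpha in at least min_hits of n_resamples training sets.

    pvalue_fn(gene, train_idx) is a hypothetical callback: it should fit a
    univariate CoxPH model on the training rows and return the p-value.
    """
    rng = random.Random(seed)
    hits = Counter()
    for _ in range(n_resamples):
        train_idx, _ = stratified_split(events, rng=rng)
        for gene in genes:
            if pvalue_fn(gene, train_idx) < alpha:
                hits[gene] += 1
    return [g for g in genes if hits[g] >= min_hits]
```

Counting hits across resamples, rather than trusting a single split, is what makes the retained candidates less dependent on any one random partition.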

To determine the optimal combination of predictors, we applied a training AUC-based stepwise selection procedure. Each candidate gene was first fitted individually using CoxPH, and the mean training AUC across 100 resamples was calculated. The gene with the highest mean AUC was chosen as the initial model (M_1; k = 1).

In each subsequent forward step (k = k + 1), all candidate models formed by adding exactly one previously unselected predictor to the current model (M_{k-1}) were evaluated, and the model with the highest mean AUC was designated as M_k. The new model (M_k) was accepted only if its mean AUC exceeded that of M_{k-1} by more than α = 0.005, which served as a minimal improvement threshold to prevent overfitting from marginal gains; otherwise, M_{k-1} was retained and the procedure stopped.

In each backward step (k = k + 1), we evaluated all reduced models formed by removing exactly one predictor from M_{k-1} (one-at-a-time deletions; the number of candidates equals the number of predictors in M_{k-1}). The reduced model with the highest mean training AUC (M_k) was retained only if its performance exceeded that of M_{k-1} by more than α = 0.005; otherwise, M_{k-1} was retained and the algorithm returned to the forward step. Forward and backward steps were alternated until no further increase in mean training AUC was observed during the forward phase.
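One possible reading of this alternating procedure is sketched below. It is an illustration only: `stepwise_select` and the `score` callback are hypothetical names, and `score(genes)` is assumed to return the mean training AUC across resamples for a CoxPH model fitted on the given list of genes.

```python
def stepwise_select(candidates, score, alpha=0.005):
    """Alternate forward and backward steps until the forward step stops improving.

    score(genes) is a hypothetical callback returning the mean training AUC
    of a model built on the list `genes`.
    """
    # M_1: the single gene with the highest mean training AUC.
    model = [max(candidates, key=lambda g: score([g]))]
    best = score(model)
    while True:
        # Forward step: evaluate every model with one extra gene.
        additions = [model + [g] for g in candidates if g not in model]
        if not additions:
            break
        trial = max(additions, key=score)
        if score(trial) - best <= alpha:
            break  # gain too small: keep the previous model and stop
        model, best = trial, score(trial)
        # Backward step: evaluate every model with one gene removed.
        if len(model) > 1:
            reductions = [[g for g in model if g != r] for r in model]
            trial = max(reductions, key=score)
            if score(trial) - best > alpha:
                model, best = trial, score(trial)
    return model, best
```

Because every accepted step must raise the score by more than `alpha` and the AUC is bounded by 1, the loop terminates for any deterministic `score`.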

The final predictors obtained through this selection procedure were used to develop prognostic models with three algorithms: CoxPH, Random Survival Forest (RSF), and Survival Support Vector Machine (Survival-SVM). For RSF and Survival-SVM, hyperparameters were optimized by maximizing the training AUC. Model performance across 100 resamples was evaluated using the corresponding test sets, with time-dependent AUC, area under the precision-recall curve (AUPRC), and C-index as performance metrics for 5-year and 10-year overall survival outcomes.
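For readers unfamiliar with the C-index, a minimal sketch of Harrell's version is given below. The text does not state which C-index estimator was used, so Harrell's definition is an assumption here, and `concordance_index` is a hypothetical helper name.

```python
def concordance_index(times, events, risk):
    """Harrell's C-index: share of comparable pairs that the risk score orders correctly."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i died before subject j's last follow-up.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    return concordant / comparable
```

A value of 1.0 means that every patient who died earlier was assigned a higher risk, and 0.5 corresponds to an uninformative score. The function assumes at least one comparable pair exists.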

Statistical analysis

Group differences in gene expression were tested with two-sided Wilcoxon rank-sum tests. For survival analyses, optimal expression cutoffs for each gene were determined using maximally selected rank statistics (MaxStat) [24], after which samples were dichotomized into high- and low-expression groups. Survival differences were compared with two-sided log-rank tests. Associations between expression group (high vs. low) and overall survival status were evaluated using Fisher’s exact test. Statistical significance was set at p-value < 0.05.
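The two-sided log-rank comparison of high- and low-expression groups can be illustrated with the following sketch. It is a plain-Python version of the standard two-group test written for clarity, not the software used in the study; `logrank_test` is a hypothetical name, and the MaxStat cutoff search is not reproduced.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-sided log-rank test for two groups; returns (chi2, p_value)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})
    obs_minus_exp, variance = 0.0, 0.0
    for t in event_times:
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk in group A
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk in group B
        d_a = sum(1 for u, e, g in data if u == t and e == 1 and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e == 1 and g == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n  # observed minus expected deaths in A
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square upper tail, 1 degree of freedom
    return chi2, p_value
```

Note that when the cutoff is chosen to maximize group separation, as with MaxStat, the unadjusted log-rank p-value tends to be optimistic; the sketch does not correct for this.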

Results

Development and evaluation of TNBC prognosis model

Among the 16,336 protein-coding genes in the TCGA-BRCA cohort, we initially screened candidate predictors (genes) associated with OS. Of the 784 BC patients in the cohort, 143 (18%) were identified as having TNBC (S1 Table). NAC history was available for 142 of them, none of whom had received NAC; one patient had missing data. During the follow-up period, 18 TNBC patients (13%) died, with a median OS of 20.4 years. To prevent overfitting, the total dataset was randomly divided into training and test sets in a 7:3 ratio, and every step from screening to model development was performed using only the training set. A univariate CoxPH model was used to screen the candidate predictors at a significance level of 5%. To avoid selecting predictors that depend on one particular split, the random split was repeated 100 times, which is expected to reduce the selection bias introduced by random splitting. After 100 iterations, the 53 genes that were significant in at least 80 iterations were retained as final candidates.

We performed training AUC-based stepwise selection using these 53 candidates. The 5-year OS and 10-year OS (which were used as the main outcomes of previously developed TNBC prognosis prediction models [22,25]) were used as response variables. For the 5-year OS, over 90% of the 100 test sets achieved a test AUC greater than 0.9, with mean AUC and AUPRC values of 0.9459 and 0.8027, respectively (Fig 2). For the 10-year OS, more than 98% of the test sets achieved a test AUC greater than 0.9, with mean AUC and AUPRC values of 0.9570 and 0.9070, respectively.

Fig 2. ROC and precision-recall curves of the developed model across 100 training/test splits.

For each outcome (5-year OS and 10-year OS), receiver operating characteristic (ROC) and precision-recall curves from 100 resampled test sets are shown as dashed lines, and the mean curves are shown as solid lines. The mean AUC and mean AUPRC are reported for the ROC and precision-recall curves, respectively.

https://doi.org/10.1371/journal.pone.0348414.g002

In addition to the CoxPH model used as the main model in this study, other machine learning (ML) algorithms (RSF and Survival-SVM) were applied to the 100 training and test sets. Both CoxPH and the other ML algorithms achieved values above 0.8 for AUC, AUPRC, and C-index (Table 1).

Table 1. Final selected predictive model for each response variable.

https://doi.org/10.1371/journal.pone.0348414.t001

The direction and significance of the coefficients contributing to OS showed consistent trends across the 100 training sets and the total dataset. According to the fitted CoxPH models for the entire dataset (Table 2), CELF6, IGFL1, GPR61, and TTLL2 had positive coefficients, indicating shorter survival with increasing expression levels. In contrast, other predictors (including TMEM14B and CREB5) showed negative coefficients, indicating shorter survival with decreasing expression levels. These findings were consistent with Kaplan-Meier analyses (Fig 3). In addition, Fisher’s exact test showed significant differences in gene expression group distributions (high vs. low) between survivors and non-survivors (Table 3), further supporting the association between gene expression and survival outcomes. Standardized coefficients indicated that CELF6 and IGFL1 were the most influential predictors for OS (Fig 4).
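The Fisher's exact test mentioned above operates on a 2×2 table of expression group (high vs. low) by survival status. A minimal two-sided sketch is shown below; `fisher_exact_two_sided` is a hypothetical name, the cell counts in the usage note are made up for illustration, and the two-sided rule used (summing all tables no more probable than the observed one) is one common convention that the text does not specify.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
```

For example, `fisher_exact_two_sided(3, 0, 0, 3)` evaluates a table in which all three high-expression samples fall in one outcome group and all three low-expression samples in the other.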

Table 2. Fitted results of the CoxPH model for each response variable (multivariable analysis).

https://doi.org/10.1371/journal.pone.0348414.t002

Table 3. Association between gene expression groups (high vs. low), defined by gene-specific thresholds, and survival status (alive vs. deceased at follow-up). A Fisher’s exact test was performed to evaluate differences in expression groups between survivors and non-survivors.

https://doi.org/10.1371/journal.pone.0348414.t003

Fig 3. Kaplan-Meier survival curves for nine genes in TNBC (n = 143).

For each gene, samples were dichotomized by the MaxStat-derived optimal cutoff into high- (red) and low-expression (blue) groups. Survival differences were evaluated with the log-rank test.

https://doi.org/10.1371/journal.pone.0348414.g003

Fig 4. Standardized gene coefficients in OS predictive models.

For each outcome (5-year OS and 10-year OS), coefficients from the final model fit to the entire dataset are shown. All variables were z-standardized before fitting to enable direct comparison of gene effects.

https://doi.org/10.1371/journal.pone.0348414.g004

Validation of TNBC prognostic markers across cohorts

To confirm the potential of the nine selected prognostic markers for TNBC, we validated them across multiple cohort datasets. We hypothesized that if increased expression of a gene contributed to a worse prognosis, its expression would be higher in TNBC than in LumA, which is known to have the best prognosis among the BC subtypes. To test this hypothesis, we compared the expression patterns of each of the nine genes between TNBC and the other BC subtypes in the four datasets, including the TCGA-BRCA cohort. Among the nine genes, TTLL2 and GPR61 exhibited trends consistent with this hypothesis in both the TCGA-BRCA and GSE65216 datasets (Figs 5A and 5B). Specifically, the expression of TTLL2 and GPR61 was significantly higher in TNBC than in LumA (p-value < 2.2E-16 for TTLL2 and < 4.1E-11 for GPR61 in TCGA-BRCA). Furthermore, the expression levels of both genes were higher in TNBC than in the other BC subtypes. This trend was consistent for both genes in the GSE65216 dataset.
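The subtype comparison can be illustrated with the sketch below, which computes the fold change (mean TNBC expression divided by mean LumA expression) and a two-sided Wilcoxon rank-sum test. The helper names are hypothetical, and the p-value uses the large-sample normal approximation without tie or continuity correction, which is an assumption and may differ from the exact procedure used in the study.

```python
import math


def fold_change(tnbc, luma):
    """Mean expression in TNBC divided by mean expression in LumA."""
    return (sum(tnbc) / len(tnbc)) / (sum(luma) / len(luma))


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation); returns (z, p_value)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average rank shared by tied values
        i = j + 1
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)  # rank sum of x
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return z, math.erfc(abs(z) / math.sqrt(2))
```

A positive `z` indicates that the first group (here TNBC) tends to have higher expression than the second.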

Fig 5. External validation in four independent cohorts.

(a) TCGA-BRCA; (b) GSE65216; (c) GSE215442; (d) cell line bulk RNA-seq. For each cohort, boxplots show the genes among the nine selected genes whose expression follows the hypothesized direction. Fold change (FC) is the mean expression in TNBC divided by that in LumA. TNBC was compared with LumA, LumB, and HER2 using two-sided Wilcoxon rank-sum tests.

https://doi.org/10.1371/journal.pone.0348414.g005

Unlike the TCGA-BRCA and GSE65216 datasets, the GSE215442 dataset was designed to generate SLFN12-overexpressing xenografts from a TNBC cell line (MDA-MB-231), resulting in a LumA-like TNBC cell line with better prognosis. The GSE215442 dataset contained RNA-seq data from two groups: original TNBC and LumA-like TNBC cell lines. Analysis of this dataset identified three genes whose expression differences between the LumA-like and original TNBC groups were consistent with their effect directions as predictors (Fig 5C). GPR61, for which higher expression was associated with shorter survival among TNBC patients in the TCGA cohort, was expressed at higher levels in the original TNBC group than in the LumA-like TNBC group. In contrast, TMEM14B and PZP, for which lower expression was associated with shorter survival among TNBC patients in the TCGA cohort, were expressed at lower levels in the original TNBC group than in the LumA-like TNBC group. The GSE65216 dataset and the cell-line-based bulk RNA-seq dataset showed higher IGFL1 expression in TNBC than in LumA, but the difference was not statistically significant (Figs 5B and 5D).

Through validation analysis of multi-cohort datasets, we confirmed that five genes (TTLL2, GPR61, TMEM14B, PZP, and IGFL1) were validated in at least one independent dataset. Interestingly, GPR61 was validated in three datasets, although no relationship between GPR61 (G-protein-coupled receptor 61) and TNBC prognosis has yet been reported. GPR161, which belongs to the same receptor family as GPR61, is overexpressed in TNBC, and its knockdown impairs the proliferation of TNBC cell lines [26]. Given that GPR161 is a potential drug target, GPR61 may likewise warrant investigation as one.

Discussion

We developed prognostic models for TNBC that achieved an AUC exceeding 0.94 in the test sets, outperforming previously reported OS-predictive models, which typically achieved AUCs below 0.85 [22,25,27]. Comparable high-performing models have been reported, including a 10-gene early-stage TNBC signature [28], a stemness-based prognostic model [29], and an EMT-related gene signature [30]. These studies collectively demonstrate that compact gene sets can achieve clinically meaningful risk stratification. Our model extends this approach by incorporating systematic resampling with AUC-based feature selection, thereby improving generalizability.

Beyond technical performance, long-term survival prediction has important clinical relevance. While treatment-response prediction primarily informs initial therapeutic decision-making, long-term survival prediction (5- and 10-year OS) provides complementary but distinct clinical value. TNBC is a highly aggressive and heterogeneous disease, with substantial variability in survival outcomes even among patients with similar clinical characteristics. This heterogeneity necessitates personalized prognostic assessment and accurate long-term risk stratification [31].

Despite extensive research efforts, robust prognostic tools for TNBC remain limited, and existing clinical markers such as pCR provide only partial prognostic information [32]. Survival prediction offers clinically actionable insights beyond treatment response by enabling risk stratification and supporting long-term management decisions, including treatment intensity and follow-up planning [31,33]. In addition, a substantial proportion of TNBC recurrences and deaths occur beyond five years after diagnosis, and even patients who initially achieve favorable responses (e.g., pCR) may experience late relapse [34,35]. Together, these findings indicate that treatment response and long-term survival capture related but distinct aspects of disease progression, underscoring the importance of long-term survival prediction in TNBC.

Among the evaluated algorithms, the CoxPH model showed the best predictive performance (Table 1). Although advanced ML approaches can perform similarly, previous studies have shown that CoxPH-based models remain competitive, and are often superior, when sample sizes are modest and relationships are approximately linear [36,37]. Consistent with these reports, the CoxPH model achieved the highest discrimination in the TCGA-BRCA dataset.

Given the class imbalance in our dataset, AUPRC provides a complementary performance metric to AUC. Only 18 of 142 patients (13%) experienced events, corresponding to a baseline AUPRC of 0.127 for a random classifier. Despite this imbalance, our model achieved AUPRC values of 0.8027 and 0.9070 for 5- and 10-year OS, respectively, substantially exceeding the baseline and demonstrating strong predictive performance for the minority class. The corresponding AUC values were 0.9459 and 0.9570. As AUC and AUPRC have different baselines and scales, direct numerical comparison is not appropriate; however, the consistently high values across both metrics support the robustness of our model under class imbalance.
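The baseline arithmetic in this paragraph can be checked directly. The sketch below uses only the figures quoted in the text (18 events among 142 patients and the two reported mean AUPRC values); the function name and the "lift" ratio are illustrative additions rather than quantities reported by the authors.

```python
def baseline_auprc(n_events, n_total):
    """Expected AUPRC of a random classifier, which equals the event prevalence."""
    return n_events / n_total


baseline = baseline_auprc(18, 142)   # about 0.127
lift_5y = 0.8027 / baseline          # reported 5-year AUPRC relative to baseline
lift_10y = 0.9070 / baseline         # reported 10-year AUPRC relative to baseline
```

Under these figures, the reported AUPRC values are roughly six to seven times the random-classifier baseline.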

External validation using the METABRIC cohort yielded lower predictive performance, likely due to differences in clinical composition and assay platforms between METABRIC (microarray) and TCGA-BRCA (RNA-seq). Similar cross-platform discrepancies have been reported, and frameworks such as EMBER have demonstrated that statistical harmonization can improve integration across datasets [38]. Applying such approaches may further enhance cross-cohort reproducibility.

Kaplan-Meier analyses demonstrated consistent survival trends across TCGA-BRCA and METABRIC (S1 Fig), supporting the biological plausibility of the identified markers. Among the nine selected genes, five (PZP, GPR61, TTLL2, TMEM14B, and IGFL1) showed consistent expression patterns and effect directions across cohorts, in line with prior studies linking them to TNBC proliferation and survival [1,23,26,39,40]. The remaining four genes showed discordant coefficients but similar expression patterns (S2 Fig), suggesting potential subtype-specific effects [41].

Recent studies have identified immune- and B cell-related signatures as strong prognostic determinants in early-stage TNBC [42], suggesting that incorporating immune-related features into our model may further improve predictive performance. In addition to OS, we developed models for progression-free survival (PFS) and DFS using the same framework (S2 Table). As these endpoints reflect distinct biological processes, differences in model performance are expected [43,44]. These findings highlight the potential applicability of our framework across multiple prognostic outcomes.

In summary, we present a reproducible and high-performing prognostic model for TNBC that exceeds prior benchmarks and aligns with emerging literature. Despite limitations in external validation and experimental confirmation, the model’s strong reproducibility across datasets and consistency with recent prognostic studies [28–30,38,45] underscore its robustness. Future studies will focus on experimental validation, cross-platform harmonization, and integration of multi-omics and immune features to further advance precision prognostication in TNBC.

Supporting information

S1 Fig. Kaplan-Meier survival curves for nine genes in the METABRIC cohort (external validation).

Analysis followed the procedure in Fig 3, but MaxStat cutoffs were re-estimated within METABRIC for each gene. Samples were dichotomized into high- (red) and low-expression (blue) groups using these cohort-specific cutoffs, and survival differences were assessed with the two-sided log-rank test.

https://doi.org/10.1371/journal.pone.0348414.s001

(TIF)

S2 Fig. External validation across four independent datasets.

(a) TCGA-BRCA; (b) GSE65216; (c) cell line single cell RNA-seq; (d) cell line bulk RNA-seq. Fold change (FC) is the mean expression in TNBC divided by that in LumA. Differences between TNBC and LumA, LumB, and HER2 were tested using two-sided Wilcoxon rank-sum tests.

https://doi.org/10.1371/journal.pone.0348414.s002

(TIF)

S1 Table. Clinicopathological characteristics of all patients in the TCGA-BRCA cohort.

NAC: neoadjuvant chemotherapy. The p-values for the continuous variable (age) and the categorical variables were calculated using the Kruskal-Wallis test and the Chi-square test, respectively.

https://doi.org/10.1371/journal.pone.0348414.s003

(DOCX)

S2 Table. The best model for each of PFS and DFS.

PFS: progression-free survival; DFS: disease-free survival. For these results, we used CoxPH as the machine learning algorithm and AUC as the prediction measure, both of which showed the highest predictive performance in Table 1.

https://doi.org/10.1371/journal.pone.0348414.s004

(DOCX)

References

1. Choi SR, Hwang CY, Lee J, Cho K-H. Network analysis identifies regulators of basal-like breast cancer reprogramming and endocrine therapy vulnerability. Cancer Res. 2022;82(2):320–33. pmid:34845001
2. Lee J. Current treatment landscape for early triple-negative breast cancer (TNBC). J Clin Med. 2023;12(4):1524. pmid:36836059
3. Turner KM, Yeo SK, Holm TM, Shaughnessy E, Guan J-L. Heterogeneity within molecular subtypes of breast cancer. Am J Physiol Cell Physiol. 2021;321(2):C343–54. pmid:34191627
4. Xu Y, Gong M, Wang Y, Yang Y, Liu S, Zeng Q. Global trends and forecasts of breast cancer incidence and deaths. Sci Data. 2023;10(1):334. pmid:37244901
5. Campone M, Valo I, Jézéquel P, Moreau M, Boissard A, Campion L, et al. Prediction of Recurrence and Survival for Triple-Negative Breast Cancer (TNBC) by a protein signature in tissue samples. Mol Cell Proteomics. 2015;14(11):2936–46. pmid:26209610
6. Bardia A, Hurvitz SA, Tolaney SM, Loirat D, Punie K, Oliveira M, et al. Sacituzumab Govitecan in Metastatic Triple-Negative Breast Cancer. N Engl J Med. 2021;384(16):1529–41. pmid:33882206
7. McCann KE, Hurvitz SA. Advances in the use of PARP inhibitor therapy for breast cancer. Drugs Context. 2018;7:212540. pmid:30116283
8. Beniey M, Haque T, Hassan S. Translating the role of PARP inhibitors in triple-negative breast cancer. Oncoscience. 2019;6(1–2):287–8. pmid:30800714
9. Chen X, Li J, Gray WH, Lehmann BD, Bauer JA, Shyr Y, et al. TNBCtype: A subtyping tool for triple-negative breast cancer. Cancer Inform. 2012;11:147–56. pmid:22872785
10. Zhao Y, Schaafsma E, Cheng C. Gene signature-based prediction of triple-negative breast cancer patient response to Neoadjuvant chemotherapy. Cancer Med. 2020;9(17):6281–95. pmid:32692484
11. Stover DG, Winer EP. Tailoring adjuvant chemotherapy regimens for patients with triple negative breast cancer. Breast. 2015;24 Suppl 2:S132–5. pmid:26255198
12. Sukumar J, Gast K, Quiroga D, Lustberg M, Williams N. Triple-negative breast cancer: promising prognostic biomarkers currently in development. Expert Rev Anticancer Ther. 2021;21(2):135–48.
13. Lee JS, Yost SE, Yuan Y. Neoadjuvant treatment for triple negative breast cancer: Recent progresses and challenges. Cancers (Basel). 2020;12(6):1404. pmid:32486021
14. Ono M, Tsuda H, Shimizu C, Yamamoto S, Shibata T, Yamamoto H, et al. Tumor-infiltrating lymphocytes are correlated with response to neoadjuvant chemotherapy in triple-negative breast cancer. Breast Cancer Res Treat. 2012;132(3):793–805. pmid:21562709
15. Han Y, Wang J, Xu B. Novel biomarkers and prediction model for the pathological complete response to neoadjuvant treatment of triple-negative breast cancer. J Cancer. 2021;12(3):936–45. pmid:33403050
16. Bouchalova K, Kharaishvili G, Bouchal J, Vrbkova J, Megova M, Hlobilkova A. Triple negative breast cancer - BCL2 in prognosis and prediction. Review. Curr Drug Targets. 2014;15(12):1166–75. pmid:25374001
17. Abdel-Fatah TMA, Perry C, Dickinson P, Ball G, Moseley P, Madhusudan S, et al. Bcl2 is an independent prognostic marker of triple negative breast cancer (TNBC) and predicts response to anthracycline combination (ATC) chemotherapy (CT) in adjuvant and neoadjuvant settings. Ann Oncol. 2013;24(11):2801–7. pmid:23908177
18. Ensenyat-Mendez M, Orozco JIJ, Llinàs-Arias P, Íñiguez-Muñoz S, Baker JL, Salomon MP, et al. Construction and validation of a gene expression classifier to predict immunotherapy response in primary triple-negative breast cancer. Commun Med (Lond). 2023;3(1):93. pmid:37430006
19. Benitez JC, Remon J, Besse B. Current panorama and challenges for neoadjuvant cancer immunotherapy. Clin Cancer Res. 2020;26(19):5068–77. pmid:32434852
20. Nakai K, Hung M-C, Yamaguchi H. A perspective on anti-EGFR therapies targeting triple-negative breast cancer. Am J Cancer Res. 2016;6(8):1609–23. pmid:27648353
21. Xu Y, Ju L, Tong J, Zhou C, Yang J. Supervised machine learning predictive analytics for triple-negative breast cancer death outcomes. Onco Targets Ther. 2019;12:9059–67. pmid:31802913
22. Yang Y, Wang Y, Deng H, Tan C, Li Q, He Z, et al. Development and validation of nomograms predicting survival in Chinese patients with triple negative breast cancer. BMC Cancer. 2019;19(1):541. pmid:31170946
23. Singhal SK, Al-Marsoummi S, Vomhof-DeKrey EE, Lauckner B, Beyer T, Basson MD. Schlafen 12 Slows TNBC tumor growth, induces luminal markers, and predicts favorable survival. Cancers (Basel). 2023;15(2):402. pmid:36672349
24. Lausen B, Schumacher M. Maximally selected rank statistics. Biometrics. 1992;48(1):73.
25. Polley M-YC, Leon-Ferre RA, Leung S, Cheng A, Gao D, Sinnwell J, et al. A clinical calculator to predict disease outcomes in women with triple-negative breast cancer. Breast Cancer Res Treat. 2021;185(3):557–66. pmid:33389409
26. Feigin ME, Xue B, Hammell MC, Muthuswamy SK. G-protein-coupled receptor GPR161 is overexpressed in breast cancer and is a promoter of cell proliferation and invasion. Proc Natl Acad Sci U S A. 2014;111(11):4191–6. pmid:24599592
27. Huang K, Zhang J, Yu Y, Lin Y, Song C. The impact of chemotherapy and survival prediction by machine learning in early Elderly Triple Negative Breast Cancer (eTNBC): A population based study from the SEER database. BMC Geriatr. 2022;22(1):268. pmid:35361134
28. Kim CM, Park KH, Yu YS, Kim JW, Park JY, Park K, et al. A 10-Gene signature to predict the prognosis of early-stage triple-negative breast cancer. Cancer Res Treat. 2024;56(4):1113–25. pmid:38754473
29. Ouyang M, Gui Y, Li N, Zhao L. Prognostic model based on tumor stemness genes for triple-negative breast cancer. Sci Rep. 2024;14(1):30855. pmid:39730613
30. Zhang B, Zhao R, Wang Q, Zhang Y-J, Yang L, Yuan Z-J, et al. An EMT-Related gene signature to predict the prognosis of triple-negative breast cancer. Adv Ther. 2023;40(10):4339–57. pmid:37462865
31. Kesireddy M, Elsayed L, Shostrom VK, Agarwal P, Asif S, Yellala A, et al. Overall Survival and prognostic factors in metastatic triple-negative breast cancer: A national cancer database analysis. Cancers (Basel). 2024;16(10):1791. pmid:38791870
32. Kim JW, Lee J, Lee SH, Ahn S, Park KH. Machine learning-based prognostic gene signature for early triple-negative breast cancer. Cancer Res Treat. 2025;57(3):731–40. pmid:39563200
33. Gao H, Yang J, Li Y. Triple-negative breast cancer survival outcomes: Prognostic model validated with SEER database. Discov Oncol. 2026;17(1):258. pmid:41521352
34. Park WK, Chung SY, Jung YJ, Ha C, Kim J-W, Nam SJ, et al. Long-term oncologic outcomes of unselected triple-negative breast cancer patients according to BRCA1/2 mutations. NPJ Precis Oncol. 2024;8(1):96. pmid:38689097
35. Assunção Ribeiro da Costa RE, Rocha de Oliveira FT, Nascimento Araújo AL, Vieira SC. Impact of pathologic complete response on the prognosis of triple-negative breast cancer patients: A cohort study. Cureus. 2023;15(4):e37396. pmid:37182056
36. Germer S, Rudolph C, Labohm L, Katalinic A, Rath N, Rausch K, et al. Survival analysis for lung cancer patients: A comparison of Cox regression and machine learning models. Int J Med Inform. 2024;191:105607. pmid:39208536
37. Tran TT, Lee J, Gunathilake M, Kim J, Kim S-Y, Cho H, et al. A comparison of machine learning models and Cox proportional hazards models regarding their ability to predict the risk of gastrointestinal cancer based on metabolic syndrome and its components. Front Oncol. 2023;13:1049787. pmid:36937438
38. Ronchi C, Haider S, Brisken C. EMBER creates a unified space for independent breast cancer transcriptomic datasets enabling precision oncology. NPJ Breast Cancer. 2024;10(1):56. pmid:38982086
39. Kumar R, Kuligina E, Sokolenko A, Siddiqui Q, Gardi N, Gupta S, et al. Genetic ablation of pregnancy zone protein promotes breast cancer progression by activating TGF-β/SMAD signaling. Breast Cancer Res Treat. 2021;185(2):317–30. pmid:33057846
40. Fang K, Xu Z, Jiang S, Yan C, Tang D, Huang Y. Integrated profiling uncovers prognostic, immunological, and pharmacogenomic features of ferroptosis in triple-negative breast cancer. Front Immunol. 2022;13:985861. pmid:36505498
41. Cheng X-Y, Liang Y, Zhang H-F, Qian F-Z, Sun X-H, Liu X-A. An immunogenic cell death-related classification predicts response to immunotherapy and prognosis in triple-negative breast cancer. Am J Transl Res. 2023;15(4):2598–609. pmid:37193173
42. Conte B, Brasó-Maristany F, Hernández AR, Pascual T, Villacampa G, Schettini F, et al. A 14-gene B-cell immune signature in early-stage triple-negative breast cancer (TNBC): a pooled analysis of seven studies. EBioMedicine. 2024;102.
43. Li K, Qiu L, Zhao Y, Sun X, Shao J, He C, et al. Nomograms Predict PFS and OS for SCLC patients after standardized treatment: A real-world study. Int J Gen Med. 2024;17:1949–65. pmid:38736664
44. Kim SI, Song M, Hwangbo S, Lee S, Cho U, Kim J-H, et al. Development of web-based nomograms to predict treatment response and prognosis of epithelial ovarian cancer. Cancer Res Treat. 2019;51(3):1144–55. pmid:30453728
45. Hou X, Li X, Han Y, Xu H, Xie Y, Zhou T, et al. Triple-negative breast cancer survival prediction using artificial intelligence through integrated analysis of tertiary lymphoid structures and tumor budding. Cancer. 2024;130(S8):1499–512. pmid:38422056


