Node properties of biomarkers within the protein–protein interaction network derived from breast cancer-associated genes

Takanori Sasaki; Saito Torii

doi:10.1371/journal.pone.0347551

Abstract

Analyzing the network properties of cancer biomarkers within protein–protein interaction (PPI) networks is valuable for discovering novel biomarker candidates. Therefore, we constructed PPI networks using breast cancer (BC)-associated gene sets and performed 12 distinct centrality analyses to characterize the topological features of clinically validated biomarkers. Our reference set of biomarkers comprised genes from five clinical genetic testing panels—MammaPrint, Oncotype DX, PAM50, EndoPredict, and the BC Index—that were also present in the STRING database. The PPI networks were constructed from the top 2,000 BC-associated genes, ranked by disease score from the DISEASES database. These networks were then subjected to centrality analysis using five local and seven global measures. The top 5% centrality rankings were evaluated, demonstrating that maximum clique centrality (MCC) identified the highest proportion of known biomarkers, with an inclusion rate of approximately 36%. Furthermore, MCC generated a unique biomarker-ranking pattern, exhibiting a Spearman’s rank correlation coefficient below 0.8 when compared with all other metrics. Consequently, a high MCC score is a key topological feature of many validated biomarkers. Genes with the highest MCC scores (top 5%) were significantly enriched for gene-ontology terms related to the cell cycle and fibroblast growth factor receptor signaling pathway. Additionally, biomarkers with high MCC scores exhibited significantly greater evolutionary conservation and potential for protein complex formation. Collectively, our findings indicate that many effective BC biomarkers are components of large, evolutionarily conserved cliques within cell-cycle-associated regions of the PPI network. Finally, based on this MCC-centric approach, we identified 11 novel candidate biomarkers.

Citation: Sasaki T, Torii S (2026) Node properties of biomarkers within the protein–protein interaction network derived from breast cancer-associated genes. PLoS One 21(5): e0347551. https://doi.org/10.1371/journal.pone.0347551

Editor: Attila Csikász-Nagy, Pázmány Péter Catholic University: Pazmany Peter Katolikus Egyetem, HUNGARY

Received: February 7, 2026; Accepted: April 2, 2026; Published: May 6, 2026

Copyright: © 2026 Sasaki, Torii. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All data used in this study are publicly available. Breast cancer–associated genes were retrieved from the DISEASES database (https://diseases.jensenlab.org) using the disease query terms “breast cancer” (DOID:1612), “breast disease” (DOID:3463), and “breast carcinoma” (DOID:3459) (accessed February 23, 2024), yielding 5,646 genes ranked by disease score. The top 2,000 genes were selected based on disease score (disease score >2.08) using the Cytoscape plugin stringApp (v2.0.3), and 1,816 genes present in the STRING database (https://string-db.org) were used for protein–protein interaction network construction. PPI networks were generated using STRING with interaction sources “Experiments,” “Databases,” and “Co-expression,” and minimum interaction scores of 0.4, 0.7, and 0.9. Functional enrichment analysis was performed using Metascape (https://metascape.org) on the top 5% of MCC-ranked genes (n = 68). Survival analysis was conducted using the Kaplan–Meier Plotter (https://kmplot.com), and gene expression comparisons were performed using TNMplot (https://tnmplot.com) on 45 candidate genes. Ortholog information was obtained from InParanoidDB v9 (http://inparanoid.sbc.su.se) using the “full-length orthologs” search option.

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

Abbreviations: BC, Breast cancer; CC, Clustering coefficient; DMNC, Density of maximum neighborhood component; EPC, Edge percolated component; MNC, Maximum neighborhood component; MCC, Maximum clique centrality; PPI, Protein–protein interaction; GO, Gene ontology; RFS, Relapse-free survival; OS, Overall survival

Introduction

Breast cancer (BC) is the most frequently diagnosed malignancy in women and a leading cause of cancer-related mortality [1]. The clinical management of BC is challenged by its heterogeneity, making an understanding of the specific mutations and gene expression changes within individual tumors essential for developing effective therapeutic strategies [2,3]. In this context, biomarkers function as critical tools for diagnosis and prognostication. For example, established multigene tests such as Oncotype DX, PAM50, MammaPrint, EndoPredict, and the BC Index have been developed and validated by correlating gene signatures with patient clinical outcomes [4–8]. Each of these assays has demonstrated robust prognostic power, contributing significantly to personalized treatment and improving patient outcomes [9].

Meanwhile, numerous computational biology studies have focused on identifying novel biomarker candidates for BC [10,11]. These approaches have often employed gene co-expression analysis and machine-learning techniques to identify candidate genes and subsequently assess their prognostic value on large datasets and through in vitro experiments. However, no prior studies have explored novel biomarker candidates by leveraging the computational characteristics of biomarkers already established in clinical practice.

Protein–protein interaction (PPI) networks are invaluable for elucidating both the maintenance of cellular functions and the roles of genes in disease progression. Centrality analysis, a tool from network theory, has been widely applied to characterize disease-associated proteins as nodes within these PPI networks [12–14]. For instance, Viacava Follis (2021) reported that targets of available selective drugs tend to possess a high degree of centrality, short average path lengths, and low topological coefficients in the PPI network [13]. In networks of prostate cancer-associated genes, “kinless hubs” that bridge modules were identified as essential for maintaining network stability and integrity [15]. Similarly, a centrality analysis of BC-related PPI networks showed that these genes often exhibit high clustering coefficients and link densities [16]. This body of work confirms that disease-associated genes possess distinct node characteristics. However, it remains unclear whether clinically implemented biomarkers for BC share common centrality features within disease-specific PPI networks.

This study was designed to investigate the node properties of established BC biomarkers in a PPI network context. We constructed networks using BC-related genes with high disease scores from the DISEASES database and applied 12 centrality analyses. Our investigation revealed that high maximum clique centrality (MCC) scores are a characteristic feature of many BC biomarkers. Based on this finding, we used the optimal network conditions for MCC to propose 11 new candidate biomarkers, providing an analytical framework to advance network-based biomarker discovery.

Materials and Methods

Data acquisition and preprocessing

The workflow of this study is illustrated in Fig 1.

Fig 1. Study workflow.

BC: Breast Cancer, PPI: protein–protein interaction.

https://doi.org/10.1371/journal.pone.0347551.g001

Genes associated with BC were obtained from the DISEASES database, which links genes to diseases via a confidence metric called a disease score (scale: 0–5) [17,18]. Using the Cytoscape plugin stringApp (v2.0.3), we queried the database with the disease terms “breast cancer,” “breast disease,” and “breast carcinoma,” which returned an initial set of 5,646 genes [19]. The database was accessed on February 23, 2024. Because DISEASES is continuously updated, the analyses in this study reflect the database version available at the time of access. From this list, we selected the top 2,000 BC-related genes ranked by disease score, all of which had disease scores above 2.08.
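The selection step can be sketched in Python as follows. This is a minimal illustration rather than the stringApp procedure used in the study; the function name `select_top_genes` and the toy gene symbols and scores are hypothetical.

```python
def select_top_genes(disease_scores, top_n):
    """Return the top_n genes by descending disease score and the lowest score kept.

    disease_scores: dict mapping gene symbol -> DISEASES disease score (0-5).
    """
    ranked = sorted(disease_scores.items(), key=lambda kv: kv[1], reverse=True)
    selected = ranked[:top_n]
    cutoff = selected[-1][1] if selected else None
    return [gene for gene, _ in selected], cutoff


# Toy scores (hypothetical values, for illustration only).
toy_scores = {"GENE_A": 4.8, "GENE_B": 3.1, "GENE_C": 2.2, "GENE_D": 1.4}
genes, cutoff = select_top_genes(toy_scores, top_n=3)
print(genes, cutoff)
```

Applied to the 5,646 retrieved genes with `top_n=2000`, the returned cutoff would play the role of the 2.08 disease-score threshold reported above.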

PPI network construction using BC-related genes

A PPI network was constructed using the selected BC-related genes that were registered in the DISEASES and STRING databases [20]. To define the network edges, physical and functional interactions were sourced from STRING by selecting “Experiments,” “Databases,” and “Co-expression” as interaction sources. The density of each resulting PPI network was calculated using the formula 2m / [n(n − 1)], where m and n denote the numbers of edges and nodes, respectively.
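As a rough sketch of these two steps, the Python snippet below filters scored edges by a minimum interaction score and evaluates the density formula. The function names and the toy edge list are hypothetical. The final line applies the formula to the node and edge counts reported in the Results for the 1,000-query-gene, medium-confidence network, assuming that all 920 STRING nodes are counted.

```python
def filter_edges(scored_edges, min_score):
    """Keep edges whose interaction score is at least min_score."""
    return [(u, v) for u, v, score in scored_edges if score >= min_score]


def network_density(n_nodes, n_edges):
    """Density of a simple undirected graph: 2m / (n(n - 1))."""
    if n_nodes < 2:
        return 0.0
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


# Toy scored edges (hypothetical proteins and scores).
toy_edges = [
    ("P1", "P2", 0.95),
    ("P1", "P3", 0.72),
    ("P2", "P3", 0.45),
    ("P3", "P4", 0.30),
]
for cutoff in (0.4, 0.7, 0.9):  # medium, high, highest confidence
    print(cutoff, len(filter_edges(toy_edges, cutoff)))

# 920 nodes and 11,032 edges, as reported for the medium-confidence network.
print(round(network_density(920, 11032), 4))
```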

Centrality analysis

Centrality analysis was performed on the BC-related PPI network with the cytoHubba (v0.1) plugin for Cytoscape [12]. We calculated 12 centrality measures: Betweenness, BottleNeck, Clustering Coefficient (CC), Closeness, Degree, Density of Maximum Neighborhood Component (DMNC), EcCentricity, Edge Percolated Component (EPC), MCC, Maximum Neighborhood Component (MNC), Radiality, and Stress. The biomarker set for this analysis comprised genes from five genetic tests (MammaPrint, Oncotype DX, PAM50, EndoPredict, and the BC Index) that were also registered in the STRING database. Using the resulting centrality rankings for these biomarkers, we calculated Spearman’s rank correlation coefficients using the R package corrplot (v0.92) and performed hierarchical clustering via Ward’s method.
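For readers unfamiliar with MCC, the sketch below illustrates the idea on a toy graph. It assumes the definition commonly given for cytoHubba, in which the MCC of a node is the sum of (|C| − 1)! over all maximal cliques C that contain the node, so that a node whose neighbors share no edges scores its degree. This definition is not stated in the present text and is an assumption here, and the pure-Python clique enumeration is an illustration, not the cytoHubba implementation.

```python
from math import factorial


def maximal_cliques(adj):
    """Yield every maximal clique of an undirected graph (Bron-Kerbosch with pivoting).

    adj: dict mapping node -> set of neighbouring nodes (no self-loops).
    """
    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            yield frozenset(clique)
            return
        pivot = max(candidates | excluded, key=lambda u: len(adj[u] & candidates))
        for v in list(candidates - adj[pivot]):
            yield from expand(clique | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    yield from expand(set(), set(adj), set())


def mcc_scores(adj):
    """Assumed definition: MCC(v) = sum of (|C| - 1)! over maximal cliques C containing v."""
    scores = {v: 0 for v in adj}
    for clique in maximal_cliques(adj):
        if len(clique) < 2:  # an isolated node has no edges and keeps a score of 0
            continue
        for v in clique:
            scores[v] += factorial(len(clique) - 1)
    return scores


# Toy graph: a triangle (a, b, c) with a pendant node d attached to c.
toy_adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
toy_mcc = mcc_scores(toy_adj)
print(toy_mcc)
```

In the toy graph, node c belongs to one clique of size three and one of size two, so it outranks a and b even though all three sit in the same triangle. Because the factorial term grows quickly with clique size, membership in large cliques dominates the score.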

Gene ontology (GO) enrichment analysis

GO enrichment analysis was conducted on genes with high-ranking centrality scores using Metascape (v3.5) [21]. A custom analysis was conducted using GO biological process-related genes from H. sapiens as the background set. To explore the semantic relationships between the resulting GO terms, we used REVIGO (v1.8.1) [22,23]. These results were then exported as an R script for plotting.
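The logic of an over-representation test can be illustrated with a hypergeometric upper-tail probability, as in the Python sketch below. This is a generic textbook calculation and is not claimed to reproduce Metascape's exact statistics or its multiple-testing correction; the function name and the toy counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from a background of N genes,
    K of which are annotated with the GO term of interest."""
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Toy example: background of 10 genes, 5 annotated; all 5 selected genes are annotated.
p_toy = hypergeom_upper_tail(k=5, n=5, K=5, N=10)
print(p_toy)
```

The smaller this probability, the less likely it is that the observed overlap between the gene list and the GO term arose by chance.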

Survival analysis

Survival analyses of BC-related genes were performed using the Kaplan–Meier Plotter public datasets [24]. Patients were stratified into two groups based on the median expression level of a given gene, and analyses were conducted for both relapse-free survival (RFS) and overall survival (OS). A p-value < 0.05 was considered statistically significant.
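The stratification and survival estimation can be sketched as follows. The Kaplan–Meier Plotter web tool performed the actual analysis; this Python snippet only illustrates a median split and a product-limit (Kaplan–Meier) estimate on toy data. The patient identifiers, expression values, and the rule that values equal to the median fall in the low group are assumptions of the sketch, and no log-rank test is included.

```python
from statistics import median


def split_by_median(expression):
    """Split patients into (low, high) groups around the median expression.

    expression: dict mapping patient id -> expression value of one gene.
    Values equal to the median go to the low group (an assumption of this sketch).
    """
    cut = median(expression.values())
    low = sorted(p for p, x in expression.items() if x <= cut)
    high = sorted(p for p, x in expression.items() if x > cut)
    return low, high


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with at least one event.

    times: follow-up times; events: 1 if the event occurred, 0 if censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy data (hypothetical patients).
toy_expression = {"pt1": 2.0, "pt2": 5.5, "pt3": 3.1, "pt4": 7.4}
low, high = split_by_median(toy_expression)
curve = kaplan_meier(times=[1, 2, 3, 4], events=[1, 0, 1, 1])
print(low, high, curve)
```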

Differential expression analysis

Differential gene expression was analyzed using the TNMplot tool, which contains data from 56,938 unique samples [25,26]. To compare gene expression across normal, tumor, and metastatic tissues, the Kruskal–Wallis test was applied, followed by Dunn’s test for post hoc comparisons. A p-value < 0.05 in the Dunn’s tests for normal vs. tumor and tumor vs. metastatic samples was considered statistically significant.
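The omnibus step of this comparison can be illustrated with a minimal Kruskal–Wallis computation. TNMplot performed the actual tests; the Python sketch below applies no tie correction, derives the p-value only for the three-group case (where the chi-square survival function with 2 degrees of freedom is exp(−H/2)), and omits Dunn's post hoc test. All names and values are hypothetical.

```python
from math import exp


def average_ranks(values):
    """Rank values from 1 to N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Return (H, p) for a list of samples; no tie correction is applied.

    p uses the chi-square approximation and is computed only for three groups.
    """
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
    p = exp(-h / 2) if len(groups) == 3 else None
    return h, p


# Toy expression values for normal, tumor, and metastatic samples (hypothetical).
h_toy, p_toy = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h_toy, 2), round(p_toy, 4))
```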

Ortholog pair analysis

To evaluate the evolutionary conservation of BC biomarkers, we identified orthologous pairs using InParanoidDB9 [27,28]. Pairwise comparisons were performed between H. sapiens and five other eukaryotic species: S. cerevisiae, D. melanogaster, C. elegans, A. thaliana, and D. rerio. Seed ortholog pairs were defined as genes from full-length protein ortholog groups with an Inparalog score of 1.0 and a seed score ≥ 0.95 from the InParanoid-DIAMOND algorithm. Based on this, an evolutionary conservation score was assigned to each biomarker, with 0.2 points awarded for the presence of an ortholog in each species, for a maximum possible score of 1.0.
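The scoring rule can be written compactly, as in the Python sketch below. It assumes the per-species seed scores have already been retrieved from InParanoidDB and checks only the seed-score cutoff; the Inparalog-score condition is taken as already satisfied. The function name and the toy inputs are hypothetical.

```python
SPECIES = ("S. cerevisiae", "D. melanogaster", "C. elegans", "A. thaliana", "D. rerio")


def conservation_score(seed_scores, min_seed_score=0.95):
    """Fraction of the five species with an ortholog passing the seed-score cutoff.

    seed_scores: dict mapping species -> seed score of the ortholog pair;
    species without a detected ortholog are simply absent.
    """
    hits = sum(1 for sp in SPECIES if seed_scores.get(sp, 0.0) >= min_seed_score)
    return hits / len(SPECIES)  # equivalent to 0.2 points per species


# Toy inputs (hypothetical seed scores).
all_five = {sp: 1.0 for sp in SPECIES}
partial = {"D. rerio": 1.0, "D. melanogaster": 0.97, "C. elegans": 0.90}
print(conservation_score(all_five), conservation_score(partial))
```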

Prediction of complex formation ability

The potential for BC biomarkers to form protein complexes was predicted using the MCODE (v2.0.3) plugin for Cytoscape [29]. The analysis was run with default parameters on the BC-related PPI network described in the “PPI network construction using BC-related genes” section. The MCODE-generated node score, which is used to identify cluster “seed” proteins, was used as a measure of each protein’s ability to form complexes.

Results

PPI network construction using BC-related genes

To generate a series of PPI networks, we first obtained the top 2,000 BC-related genes ranked by disease score from the DISEASES database. From this list, we created three sets of “query genes” using the top 1,000, 1,500, and 2,000 genes. These sets corresponded to 920, 1,362, and 1,816 genes, respectively, that were registered in the STRING database (Table 1).

Table 1. Construction of PPI networks based on BC-related genes obtained from the DISEASES database.

https://doi.org/10.1371/journal.pone.0347551.t001

For each gene set, networks were constructed using three minimum interaction score cutoffs: 0.4 (medium confidence), 0.7 (high confidence), and 0.9 (highest confidence). For example, the PPI network built from 1,000 query genes with medium confidence interactions consisted of 920 STRING nodes and 11,032 edges, as visualized in Fig 2(A).

Fig 2. PPI network of BC-related genes and centrality analysis.

(A) PPI network constructed with 920 nodes from the STRING database, derived from 1,000 query genes. Medium confidence edges (minimum required interaction score = 0.4) were applied. Active interaction sources were experiments, databases, and co-expression. (B) Results of centrality analysis. The upper panel shows scores for the local-based “Degree” method; the lower panel shows scores for the global-based “Betweenness” method. Deeper red node colors indicate higher centrality scores. Nodes with a degree of zero are not displayed. BC: Breast Cancer, PPI: protein–protein interaction.

https://doi.org/10.1371/journal.pone.0347551.g002

Our target list for analysis comprised 128 known biomarkers from five representative genetic tests (MammaPrint, Oncotype DX, PAM50, EndoPredict, and the BC Index), which were also registered in STRING (S1 Table). Within the constructed networks, 46, 64, and 72 of these biomarkers were present among the 920, 1,362, and 1,816 STRING nodes, respectively (Table 1).

Centrality analysis on the PPI network

We applied five local-based and seven global-based centrality ranking methods to the constructed PPI networks, calculating scores for all genes with a degree of one or more (S2 Table). The results for two representative methods, the local-based “degree” and global-based “betweenness,” are shown in Fig 2(B), whereas S1 Fig displays the results for the other ten analyses.

For each metric, we ranked all genes in the PPI network by their score and then calculated the percentage of known biomarkers that appeared in the top 5% of the ranking. This proportion is displayed for each centrality score in Fig 3. As shown in Fig 3(A), the inclusion rate of biomarkers was sensitive to the network confidence cutoff.

Fig 3. Proportion of BC biomarkers included in the top 5% of genes from each centrality ranking.

(A) Centrality analysis results for PPI networks constructed with medium (light blue bar), high (blue bar), and highest (orange bar) confidence edges. Each bar represents the average proportion of biomarkers from networks built with 1,000, 1,500, and 2,000 query genes. Error bars indicate the standard deviation. (B) Centrality analysis results for PPI networks constructed with nodes from 1,000 (light blue bar), 1,500 (blue bar), and 2,000 (orange bar) query genes. Each bar represents the average proportion of biomarkers across all three confidence levels. Error bars indicate the standard deviation. BC: Breast Cancer, PPI: protein–protein interaction.

https://doi.org/10.1371/journal.pone.0347551.g003

For example, using the betweenness centrality method with medium confidence interactions, the biomarker inclusion rates were 4/46 × 100 = 8.7%, 5/64 × 100 = 7.8%, and 6/72 × 100 = 8.3% for the 1,000-, 1,500-, and 2,000-gene networks, respectively, with an average of 8.3%. Generally, biomarkers tended to rank higher under the medium and high confidence settings than under the highest confidence setting, except for the “clustering coefficient” and “radiality” measures. Notably, the “EcCentricity” metric exhibited poor resolution, with many genes sharing the same score; in these instances, all genes tied with the score at the top 5% threshold were included in the analysis, and the result was normalized accordingly.

Fig 3(B) compares biomarker inclusion rates across PPI networks built with different numbers of query genes. Each bar represents the average inclusion rate calculated across the medium, high, and highest confidence networks. For local-based methods such as Degree, DMNC, MCC, and MNC, the inclusion rate was 1.5–2.1 times higher in the networks constructed from 1,500 and 2,000 query genes than in the 1,000-gene network. Across all comparisons shown in Figs 3(A) and (B), MCC consistently yielded the highest biomarker inclusion rate. These results strongly suggest that a high MCC score is a shared characteristic among established BC biomarkers.
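The inclusion-rate calculation can be sketched in Python as follows. The denominator is the number of known biomarkers present in the network, as in the 4/46 example above. Taking the top-5% cutoff as the integer part of 0.05 × (number of ranked genes) is an assumption of the sketch, and the normalization applied to tied EcCentricity scores is not reproduced. Function names and toy data are hypothetical.

```python
def top_fraction_genes(scores, fraction=0.05, include_ties=True):
    """Return the genes in the top `fraction` of a centrality ranking.

    scores: dict mapping gene -> centrality score (higher is more central).
    With include_ties=True, genes tied with the score at the cutoff rank are kept.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_top = int(len(ranked) * fraction)
    if n_top == 0:
        return set()
    top = set(ranked[:n_top])
    if include_ties:
        threshold = scores[ranked[n_top - 1]]
        top |= {g for g in ranked[n_top:] if scores[g] == threshold}
    return top


def inclusion_rate(scores, biomarkers, fraction=0.05):
    """Percentage of the biomarkers present in the network that fall in the top fraction."""
    present = set(biomarkers) & set(scores)
    if not present:
        return 0.0
    top = top_fraction_genes(scores, fraction)
    return 100 * len(present & top) / len(present)


# Toy ranking of 40 genes with strictly decreasing scores (hypothetical).
toy_scores = {f"g{i}": 40 - i for i in range(40)}
toy_biomarkers = {"g0", "g5", "g10", "g20"}
print(inclusion_rate(toy_scores, toy_biomarkers))

# Worked arithmetic from the text: 4 of 46 biomarkers in the top 5%.
print(round(100 * 4 / 46, 1))
```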

Relationship between biomarker MCC rankings and PPI network statistics

Next, we examined the influence of PPI network statistics on the inclusion rate of biomarkers among the top 5% of MCC-ranked genes. As shown in Fig 4, this rate varied depending on the number of query genes and the interaction confidence used to build the network (see S3 Table for full network statistics).

Fig 4. Proportion of BC biomarkers in the top 5% of MCC rankings.

The results of MCC analysis are shown for PPI networks constructed from a range of 1,000 to 2,000 query genes with the top disease scores. Edge conditions were medium (light blue bar), high (blue bar), and highest (orange bar) confidence. BC: Breast Cancer, PPI: protein–protein interaction.

https://doi.org/10.1371/journal.pone.0347551.g004

The optimal condition was the network constructed from 1,500 query genes (1,362 STRING nodes) with medium confidence interactions, which showed the highest inclusion rate, encompassing approximately 36% of the 64 biomarkers present in that network. To determine whether the MCC ranking pattern in this optimal network was distinct, we compared it with the other 11 centrality rankings for the 62 connected biomarkers (S4 Table; two unconnected genes, HOXB13 and SLC39A, were excluded). Fig 5 shows the Spearman’s rank correlation coefficients and the results of a hierarchical clustering analysis (correlation threshold = 0.8).

Fig 5. Spearman’s rank correlation coefficient and hierarchical clustering of centrality rankings for BC biomarkers.

The analysis used 62 biomarkers with a degree of ≥1 from the PPI network based on 1,500 query genes. Two biomarkers with a degree of 0 (HOXB13 and SLC39A) were excluded. BC: Breast Cancer, PPI: protein–protein interaction.

https://doi.org/10.1371/journal.pone.0347551.g005

The analysis revealed three major clusters: the first comprised global-based methods (“Betweenness,” “Closeness,” “Radiality,” and “Stress”), the second comprised local-based methods (“DMNC” and “CC”), and the third comprised “EPC,” “Degree,” and “MNC.” By contrast, MCC, BottleNeck, and EcCentricity each had correlation coefficients below 0.8 with all other metrics, indicating that their ranking patterns are unique.
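The criterion used to call a ranking pattern unique can be illustrated with a small Python sketch. It computes Spearman's coefficient as the Pearson correlation of average ranks and lists the metrics whose coefficient with every other metric is below 0.8. The study itself used R (corrplot) and Ward clustering, which are not reproduced here, and the toy rankings are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks (inputs must not be constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def unique_metrics(rankings, threshold=0.8):
    """Metrics whose rho with every other metric is below the threshold."""
    names = list(rankings)
    return [
        a for a in names
        if all(spearman(rankings[a], rankings[b]) < threshold for b in names if b != a)
    ]


# Toy biomarker ranks under three metrics (hypothetical values).
toy_rankings = {
    "Degree": [1, 2, 3, 4, 5],
    "MNC": [1, 2, 3, 5, 4],
    "MCC": [3, 1, 5, 2, 4],
}
print(unique_metrics(toy_rankings))
```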

GO of genes with high MCC scores

Because the PPI network constructed from 1,500 query genes with medium confidence interactions showed the highest concentration of highly ranked biomarkers, we performed a GO analysis on the top 5% of MCC-ranked genes from this network. This set comprised 68 genes, including 23 known biomarkers (Table 2).

Table 2. Top 5% genes of MCC ranking for PPI network based on 1,500 query genes^a.

https://doi.org/10.1371/journal.pone.0347551.t002

An examination of the top 20 enriched GO terms revealed that one of the most significant was the “fibroblast growth factor receptor signaling pathway” (P = 3.9 × 10⁻⁴⁵) (Table 3).

Table 3. GO terms enriched among the top 5% of MCC-ranked genes.

https://doi.org/10.1371/journal.pone.0347551.t003

Additionally, five GO terms related to the “cell cycle” were significantly enriched: “mitotic cell cycle process” (P = 3.1 × 10⁻³⁷), “regulation of cell cycle process” (P = 7.6 × 10⁻²⁹), “cell cycle phase transition” (P = 1.2 × 10⁻²²), “regulation of G2/M transition of mitotic cell cycle” (P = 1.6 × 10⁻¹²), and “meiotic cell cycle” (P = 2.7 × 10⁻¹⁰). To explore the relationships between these terms, we used REVIGO to cluster them based on semantic similarity (Fig 6).

Fig 6. REVIGO clustering of GO terms from top 5% MCC-ranked genes.

Semantically similar GO terms are positioned closely in the two-dimensional space. The number inside each bubble corresponds to the GO term list in Table 3. Bubble color indicates the log p-value, and the size indicates the term’s frequency in the underlying EBI GOA database. GO: gene ontology.

https://doi.org/10.1371/journal.pone.0347551.g006

This process identified two major cell-cycle-related clusters. The first, represented by the most significant term “mitotic cell cycle process,” included related terms such as “cell cycle phase transition” and “meiotic cell cycle.” The second, represented by “regulation of cell cycle process,” included terms such as “regulation of cell division” and “positive regulation of chromosome segregation.” These results suggest that genes with high MCC scores are predominantly involved in the fibroblast growth factor receptor signaling pathway and cell-cycle-related processes.

Relationship between MCC rankings of biomarkers and their biological characteristics

To investigate the biological properties associated with MCC rankings, we first examined the evolutionary conservation of biomarkers. InParanoidDB9 was used to determine if Homo sapiens genes for each BC biomarker had orthologs in five representative eukaryotic species (Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, and Danio rerio) (S5 Table). A score of 0.2 was added to each biomarker’s evolutionary conservation score for every orthologous pair that was considered a true ortholog, defined as having a seed score of ≥0.95. For instance, CDC6 (MCC rank 34), which has orthologs across all five species, received a maximum score of 1.0. A comparison of biomarkers divided into high- and low-MCC ranking groups (32 genes each) revealed that the high-MCC group had a significantly greater median conservation score (P = 0.02, Wilcoxon rank-sum test; Fig 7(A), S7 Table).
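The group comparison can be sketched in Python as follows: biomarkers ordered by MCC rank are split into two halves, and their scores are compared with a two-sided rank-sum test. The sketch uses the normal approximation with a tie-corrected variance and no continuity correction, so its p-values may differ slightly from those of the statistical software used in the study. All names and toy values are hypothetical.

```python
from math import erfc, sqrt


def average_ranks(values):
    """Rank values from 1 to N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def split_high_low(items_by_rank):
    """Split a list ordered from best to worst rank into (high, low) halves."""
    half = len(items_by_rank) // 2
    return items_by_rank[:half], items_by_rank[half:]


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Returns (U statistic for x, p-value).
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    tie_term = sum(pooled.count(v) ** 3 - pooled.count(v) for v in set(pooled))
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u, 1.0
    z = (u - n1 * n2 / 2) / sqrt(variance)
    return u, erfc(abs(z) / sqrt(2))


# Toy conservation scores ordered by MCC rank, best first (hypothetical).
toy_scores = [1.0, 0.8, 0.8, 0.6, 0.4, 0.2, 0.2, 0.0]
high, low = split_high_low(toy_scores)
u_toy, p_toy = rank_sum_test(high, low)
print(u_toy, round(p_toy, 3))
```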

Fig 7. Association of BC biomarker MCC ranking with biological features.

(A) Comparison of the evolutionary conservation score between high- and low-MCC ranking groups. The 64 BC biomarkers were divided into two groups, each comprising 32 genes. (B) Comparison of the node score for complex prediction between high- and low-MCC ranking groups. The 62 connected BC biomarkers were divided into two groups, each comprising 31 genes. HOXB13 and SLC39A were excluded from the protein complex analysis because their degree of 0 prevented the calculation of the node score.

https://doi.org/10.1371/journal.pone.0347551.g007

Among the 11 other centrality metrics, only five showed a similar statistically significant difference, suggesting a strong association between high MCC rank and evolutionary conservation.

Furthermore, we investigated the link between biomarker MCC rankings and the potential for protein complex formation using the MCODE algorithm, a density-based clustering method for predicting protein complexes within PPI networks. MCODE calculates a node score for each protein that reflects its local interaction density, a proxy for its likelihood of being part of a biological complex. Based on the node scores of the 62 connected biomarkers (S6 Table), we found that those with higher MCC rankings had significantly higher node scores than those with lower rankings (P = 4.0 × 10⁻¹⁶, Wilcoxon rank-sum test; Fig 7(B), S7 Table). Although nine other centrality metrics also showed significant associations, the association with MCC was the most statistically significant (S7 Table). These findings suggest that biomarkers with high MCC rankings, which are part of larger or more numerous network cliques, have a greater potential for protein complex formation.

Prediction of novel biomarker candidates

Finally, we leveraged these findings to predict novel biomarker candidates by examining the top 5% of MCC-ranked genes, excluding those already known to be biomarkers. These potential candidates were filtered based on survival analysis in the Kaplan–Meier Plotter database (7,830 BC samples) and differential gene expression analysis in the TNMplot database (56,938 samples). To qualify, a gene had to be significantly associated (P < 0.05) with both RFS and OS, and also show significant differential expression (P < 0.05) between normal and tumor tissues and between tumor and metastatic tissues. As illustrated in the Venn diagram in Fig 8(A), a total of 11 genes satisfied all criteria: TOP2A, TTK, BUB1, CCNB2, KIF11, HMMR, FOXM1, FGF1, PTPN11, FGFR2, and ERBB3 (S8 Table).
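The filtering logic amounts to requiring all four significance criteria at once, as sketched below in Python. The gene names, p-values, and dictionary keys are hypothetical placeholders; the actual p-values came from the Kaplan–Meier Plotter and TNMplot tools.

```python
CRITERIA = ("RFS", "OS", "normal_vs_tumor", "tumor_vs_metastatic")


def select_candidates(top_genes, known_biomarkers, pvalues, alpha=0.05):
    """Return top-ranked genes that are not known biomarkers and pass every criterion.

    pvalues: dict mapping gene -> dict of criterion -> p-value.
    A missing p-value is treated as non-significant.
    """
    candidates = []
    for gene in top_genes:
        if gene in known_biomarkers:
            continue
        gene_p = pvalues.get(gene, {})
        if all(gene_p.get(c, 1.0) < alpha for c in CRITERIA):
            candidates.append(gene)
    return candidates


# Toy inputs (hypothetical genes and p-values).
toy_top = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
toy_known = {"GENE_A"}
toy_p = {
    "GENE_A": dict.fromkeys(CRITERIA, 0.001),
    "GENE_B": dict.fromkeys(CRITERIA, 0.01),
    "GENE_C": {"RFS": 0.01, "OS": 0.20, "normal_vs_tumor": 0.01, "tumor_vs_metastatic": 0.01},
}
toy_candidates = select_candidates(toy_top, toy_known, toy_p)
print(toy_candidates)
```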

Fig 8. Prediction of key genes as BC biomarker candidates.

(A) Venn diagram illustrating the filtering of the top 5% MCC-ranked genes based on survival and differential gene expression analyses. Among the top 68 genes in the MCC ranking (Table 2), the 45 genes that are not known biomarkers were analyzed. For survival analysis, patients were split by median gene expression, and genes with significance in both RFS and OS (P < 0.05) were selected. For differential gene expression analysis, genes with significant expression differences (P < 0.05) in both normal vs. tumor and tumor vs. metastatic tissues were selected. (B) MCC rank of the 11 identified key genes and the number of interactions with 128 known BC biomarkers. The number of direct interactions for each key gene was counted in the STRING database under medium confidence using “Experiments,” “Databases,” and “Co-expression” as sources. BC: Breast Cancer.

https://doi.org/10.1371/journal.pone.0347551.g008

We then mapped the interactions of these 11 genes with the 128 known BC biomarkers from the STRING database (Fig 8(B), S1 Table). This revealed two groups: seven genes (TOP2A, TTK, BUB1, CCNB2, KIF11, FOXM1, and HMMR) were connected to approximately 21–26% of known biomarkers, whereas the remaining four (FGF1, FGFR2, PTPN11 (SHP2), and ERBB3) were connected to only 3–5%. These connection levels may reflect the extent to which the biological functions and prognostic significance of the newly identified candidates resemble those of established biomarkers.
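The connectivity percentages reported here correspond to a simple neighbor count, sketched below in Python under the assumption that each candidate's direct STRING neighbors are available as a set. The function name and the toy network are hypothetical.

```python
def biomarker_connectivity(gene, adj, biomarkers):
    """Percentage of known biomarkers that are direct neighbours of `gene`.

    adj: dict mapping gene -> set of directly interacting genes.
    """
    biomarkers = set(biomarkers)
    if not biomarkers:
        return 0.0
    neighbours = adj.get(gene, set())
    return 100 * len(neighbours & biomarkers) / len(biomarkers)


# Toy network (hypothetical genes): CAND_1 touches 2 of 4 biomarkers.
toy_adj = {"CAND_1": {"BM_1", "BM_2", "OTHER"}, "CAND_2": {"OTHER"}}
toy_biomarkers = ["BM_1", "BM_2", "BM_3", "BM_4"]
print(biomarker_connectivity("CAND_1", toy_adj, toy_biomarkers))
```

With 128 known biomarkers in the denominator, connectivity of 21–26% corresponds to roughly 27–33 directly connected biomarkers, and 3–5% to roughly 4–6.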

Discussion

In this study, an analysis of 12 centrality measures applied to the PPI network of BC-related genes found that a high MCC score was the most consistent node feature associated with known BC biomarkers. Under the optimal network condition (1,500 query genes, corresponding to 1,362 STRING nodes, with medium confidence interactions), the proportion of known biomarkers among the top 5% of MCC-ranked genes reached approximately 36%, the highest of all evaluated conditions (Fig 4). This detection rate was approximately 1.8 times greater than that achieved with DMNC, the second-best performing metric (20.3%).

The success of MCC-based biomarker detection was influenced by network parameters. First, network density was a key factor: the network constructed with medium confidence interactions was approximately 2.8- and 4.5-fold denser than those built with high and highest confidence edges, respectively, which corresponded to 1.2- and 7.7-fold higher biomarker detection rates (S3 Table, Fig 4). Another contributing factor was the number of nodes (genes), as the detection rate increased from 8.7% in the network with 1,000 query genes to approximately 36% in the networks built from larger gene sets (Fig 4). These findings suggest that BC biomarkers tend to be embedded within large cliques in the PPI network, and that successful detection of these cliques requires a network that is both large and dense. By contrast, increasing the number of query genes from 1,500 to 2,000 resulted in a slight decrease in the biomarker detection rate. One possible explanation is that the inclusion of genes with lower relevance to breast cancer (i.e., lower disease scores) led to a more diffuse interaction structure, thereby reducing the relative prominence of the densely connected, biomarker-enriched cliques and, in turn, the performance of MCC-based biomarker detection.

Although BC biomarkers exhibit diverse biological functions, including hormone-related activities, cell-cycle regulation, adhesion/invasion/angiogenesis processes, epithelial and tumor-associated antigen presentation, and cellular and proliferative activity [30], we found that biomarkers and other BC-related genes with high MCC scores were predominantly associated with GO terms related to cell-cycle processes and the fibroblast growth factor receptor signaling pathway (Table 2, Fig 6). The connection to the cell cycle is particularly strong, as 19 of the 23 top-ranked biomarkers with high MCC scores have been previously reported to be involved in its regulation (S9 Table). The cell cycle is an evolutionarily conserved process, and many of its related genes are known to be highly conserved [31]. Furthermore, Wuchty et al. demonstrated that evolutionarily conserved genes are more likely to form cliques, which serve as dense interaction motifs within PPI networks [32]. Consistent with these findings, our study also demonstrated that biomarkers with high MCC scores were significantly more evolutionarily conserved (S5 Table, Fig 7(A), S7 Table). Collectively, these results suggest that evolutionarily conserved BC-related genes embedded in large and numerous cliques within cell-cycle-associated regions of the PPI network may serve as sensitive biomarkers that reflect cancer progression.

In this study, evolutionary conservation was assessed by assigning equal weight to each species to provide a simple and interpretable metric across diverse eukaryotes. However, we acknowledge that phylogeny-aware weighted scoring, which accounts for the evolutionary distance between species, may provide a more refined evaluation of conservation. Such an approach could be particularly useful for capturing subtle differences in conservation among genes embedded in large network cliques and represents an important direction for future research.

The tendency of cell-cycle-related genes to form cliques likely arises from their complex interaction dynamics, as many of these proteins colocalize to specific organelles, such as the centrosome, kinetochore, and midbody [33]. Approximately 20% of these proteins exhibit multi-localization, which allows them to form dynamic PPI networks centered on these proteins to regulate cell division. Additionally, many cell-cycle-related proteins are reported to function as complexes or higher-order super-complexes formed by the assembly of multiple smaller complexes, thereby exerting spatiotemporal control over their activities [33]. Our MCODE analysis supported these findings, as high-MCC biomarkers showed a strong potential for complex formation (Fig 7(B), S6 Table, S7 Table). The propensity of cell-cycle-related genes to create dynamic interaction networks and form protein complexes likely explains the formation of large and numerous cliques within the PPI network. Additionally, Metascape analysis revealed that several biomarkers (EGFR, ERBB2, FGF18, FGFR4) were also classified under the “fibroblast growth factor receptor signaling pathway” GO term, a key functional category among the top MCC-ranked genes. Receptors with this GO term are known to form heterodimers and engage with various adaptor proteins and growth factors to activate multiple signaling cascades, such as the Akt/PI3K and Ras/MAPK pathways, which would also contribute to the formation of large cliques [34,35].

Interestingly, the GO enrichment analysis of the top 5% MCC-ranked genes also identified terms such as lung development, glial cell differentiation, connective tissue development, and embryo development, which reflect fundamental biological programs related to development and differentiation (Table 3). These findings suggest that the MCC-based framework may preferentially capture genes associated with core cellular functions that are commonly dysregulated across multiple cancer types. This raises the possibility that the MCC approach could be applicable to biomarker discovery beyond BC. However, as the present study focused exclusively on a PPI network constructed from BC-associated genes, further investigation is required to determine whether this framework can be generalized to other cancer types and whether comparable biomarker detection performance can be achieved.

Experimentally validated cancer biomarkers sensitively reflect tumor initiation and progression through changes in their expression. Using a bioinformatics approach, we identified 11 genes within the top 5% of MCC-ranked nodes in the 1,500-query-gene PPI network as candidate biomarkers for BC (Fig 8(A), S10 Table). Among these, seven genes (TOP2A, TTK, BUB1, CCNB2, KIF11, FOXM1, HMMR) are connected to approximately 21–26% of the 128 known biomarkers, suggesting they may share similar prognostic characteristics (Fig 8(B)). Six of these genes (CCNB2, TOP2A, KIF11, TTK, BUB1, and FOXM1) are primarily involved in the G2/M phase of the cell cycle, promoting mitosis and regulating mitotic chromosome condensation, segregation, and bipolar spindle assembly. Moreover, their prognostic power has been reported by other researchers, supporting the validity of our bioinformatics-based candidate selection method (S10 Table). Although HMMR is also important for spindle formation, its prognostic significance remains largely unreported [36]. The remaining four genes (FGF1, FGFR2, PTPN11 [SHP2], and ERBB3) exhibited lower connectivity to the known biomarkers (approximately 3–5%), suggesting they may have distinct prognostic characteristics. Among them, FGF1, FGFR2, and PTPN11 are classified under the “fibroblast growth factor receptor signaling pathway” GO term and are involved in regulating the cell cycle and survival through the Ras/MAPK, Akt/PI3K, PLCγ, and STAT signaling pathways [37]. ERBB3 also activates the Ras/MAPK and Akt/PI3K pathways through dimerization with ERBB2, contributing to proliferation and cell survival [38]. Although ERBB3 and PTPN11 have reported prognostic potential (S10 Table), and prognostic effects of genetic variants in FGF1 and FGFR2 have been studied [39,40], to the best of our knowledge, no studies have evaluated the prognostic potential of their wild-type forms. Therefore, these genes may represent promising novel biomarkers.
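One plausible way to quantify the connectivity described above is the share of known biomarkers that are direct neighbors of a candidate gene in the PPI network. The sketch below assumes that definition; the function name, the adjacency data, and the gene labels are placeholders and do not reproduce the values in Fig 8(B).

```python
def biomarker_connectivity(adj, gene, biomarkers):
    """Fraction of known biomarkers that are direct neighbors of `gene`.

    adj maps each node to the set of its neighbors; `biomarkers` is the
    reference set (assumed here to be the denominator of the percentage).
    """
    linked = adj.get(gene, set()) & (biomarkers - {gene})
    return len(linked) / len(biomarkers)


# Hypothetical example: the candidate touches 2 of 4 reference biomarkers.
example_adj = {"candidate": {"bm1", "bm2", "other"}}
example_biomarkers = {"bm1", "bm2", "bm3", "bm4"}
example_share = biomarker_connectivity(example_adj, "candidate", example_biomarkers)
```

Under this definition, a value of 0.21–0.26 would correspond to direct edges to roughly a quarter of the reference biomarkers, whereas 0.03–0.05 would indicate only a handful of shared edges.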

Notably, the four genes that exhibited low connectivity to known biomarkers were located near the threshold of the top 5% MCC ranking (Fig 8(B)). This observation suggests that genes involved in cancer progression may participate in parallel signaling pathways that are not strongly connected to the core biomarker-enriched network and may not always rank at the very top under strict clique-based criteria. In this context, integrating MCC with centrality measures such as DMNC, which are less dependent on strict clique structures, may improve the detection of such parallel pathways that are not fully captured by MCC alone. Such integrated approaches represent an important direction for future research.

Our findings align with those of recent studies that have identified BC biomarker candidates using other bioinformatics approaches. For instance, Lin et al. identified five genes (CDC2, CCNB1, CCNA2, TOP2A, and CCNB2) as potential diagnostic biomarkers for triple-negative BC by integrating co-expression network analysis with degree centrality [41]. In the present study, four of these five genes (all except CDC2) were also ranked within the top 5% by MCC score (Table 2). Of these four, TOP2A and CCNB2 were identified here as biomarker candidates, and CCNB1 is a known biomarker included in the PAM50 and Oncotype DX panels. This overlap suggests that the MCC score is a useful and efficient criterion for selecting biomarker candidates.

This study has several limitations. Notably, approximately 64% of the biomarkers did not rank within the top 5% of MCC scores in the optimal PPI network constructed from the 1,500 query genes. Future studies must therefore identify other node characteristics shared by these biomarkers. In addition to proteins, numerous microRNAs have been reported as potential biomarkers for the diagnosis and prognosis of BC [42]. Our next step will be a centrality analysis of integrated PPI and mRNA regulatory networks that incorporate relationships involving microRNAs and transcription factors, in order to evaluate the network properties of potential microRNA biomarkers in BC [43].
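The inclusion rate discussed here can be illustrated with a short sketch. It assumes that the top 5% cutoff is obtained by rounding down the node count (for 1,362 nodes this gives 68 genes) and that ties are broken by sort order; neither convention is stated in the text, and the function names and toy scores are hypothetical. With 23 of the 64 networked biomarkers inside the cutoff (S9 Table), the rate is 23/64 ≈ 0.36, which leaves roughly 64% outside it.

```python
from math import floor


def top_cutoff(n_nodes, fraction=0.05):
    """Number of nodes in the top `fraction` of a ranking (rounded down; an assumption)."""
    return floor(n_nodes * fraction)


def inclusion_rate(scores, biomarkers, fraction=0.05):
    """Share of networked biomarkers that fall within the top `fraction` by score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    top = set(ranked[:top_cutoff(len(ranked), fraction)])
    present = biomarkers & set(scores)  # only biomarkers that are nodes in the network
    return len(top & present) / len(present)


# Hypothetical toy ranking: 40 genes scored 0..39, four of which are biomarkers.
toy_scores = {f"g{i}": i for i in range(40)}
toy_biomarkers = {"g39", "g10", "g5", "g1"}
toy_rate = inclusion_rate(toy_scores, toy_biomarkers)
```

In the toy ranking, the top 5% comprises two genes (g39 and g38), one of which is a biomarker, so the inclusion rate is 1/4.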

Conclusion

In this study, we applied 12 centrality measures to a PPI network of BC-related genes and found that BC biomarkers tend to have high MCC scores. In the network built from 1,500 query genes (including 64 biomarkers) with medium-confidence interactions, approximately 36% of the biomarkers ranked among the top 5% of genes by MCC. The top MCC-ranked genes were primarily enriched in GO terms related to the cell-cycle process and the fibroblast growth factor receptor signaling pathway. Furthermore, the biomarkers within this high-ranking gene set exhibited high evolutionary conservation and a strong potential for complex formation. These findings indicate that dense, cell-cycle-related regions of the PPI network contain numerous genes that can serve as reliable biomarkers capable of sensitively reflecting cancer progression. Notably, many of the 11 biomarker candidates selected using our MCC-based bioinformatic approach are related to the cell cycle, and their prognostic power has been reported in other experimental studies. MCC-based analysis of this kind is therefore a valuable method for identifying potential cancer biomarkers.

Supporting information

S1 Fig. Ten types of centrality analyses.

This figure displays the results for ten types of centrality analyses (excluding degree and betweenness). The analyses were performed on a PPI network of 920 nodes, which were derived from 1,000 query genes and registered in the STRING database. Medium confidence edges (minimum required interaction score = 0.4) were used. Deeper red node colors indicate higher centrality scores, and nodes with a degree of zero are not shown. CC: Clustering Coefficient; DMNC: Density of Maximum Neighborhood Component; EPC: Edge Percolated Component; MCC: Maximal Clique Centrality; MNC: Maximum Neighborhood Component; PPI: protein–protein interaction.

https://doi.org/10.1371/journal.pone.0347551.s001

(TIF)

S1 Table. BC biomarkers included in five genetic tests.

This table lists the 128 biomarkers that were selected as targets for analyzing node properties. The biomarkers are registered in the STRING database and were derived from five representative multigene tests: MammaPrint, Oncotype DX, PAM50, EndoPredict, and the BC Index. BC: Breast Cancer; PPI: protein–protein interaction.

https://doi.org/10.1371/journal.pone.0347551.s002

(XLSX)

S2 Table. Algorithm for computing centrality metrics.

This table describes the five local-based and seven global-based centrality ranking methods that were applied to the constructed PPI networks of BC-related genes. BC: Breast Cancer; PPI: protein–protein interaction.

https://doi.org/10.1371/journal.pone.0347551.s003

(XLSX)

S3 Table. Construction of PPI network using BC-related genes obtained from the DISEASES database.

This table details the number of query genes selected from the DISEASES database based on descending disease score, along with the corresponding number of nodes registered in the STRING database. It also shows the statistics for PPI networks constructed using three different minimum required interaction scores. Network density was calculated as described in Section 2.2.

https://doi.org/10.1371/journal.pone.0347551.s004

(XLSX)

S4 Table. Centrality rankings of BC biomarkers in the optimal PPI network.

This table displays the twelve centrality rankings for 64 biomarkers included in the PPI network that was constructed from 1,500 query genes (1,362 nodes, 17,907 edges, medium confidence interactions). Deeper red shading indicates a higher ranking. HOXB13 and SLC39A were excluded from the ranking because they had a degree of 0. CC: Clustering Coefficient; DMNC: Density of Maximum Neighborhood Component; EPC: Edge Percolated Component; MCC: Maximal Clique Centrality; MNC: Maximum Neighborhood Component; BC: Breast Cancer; PPI: protein–protein interaction.

https://doi.org/10.1371/journal.pone.0347551.s005

(XLSX)

S5 Table. Evolutionary conservation scores for BC biomarkers.

This table presents the MCC ranking and evolutionary conservation scores (0–1.0) for the 64 biomarkers included in the 1,500-query-gene network (1,362 nodes). For each Homo sapiens gene, ortholog candidates across five eukaryotic species are shown. Pairs with a seed score >0.95 were considered true orthologs and are indicated in bold. An evolutionary conservation score of 0.2 was assigned for each identified true ortholog pair. BC: Breast Cancer; PPI: protein–protein interaction.

https://doi.org/10.1371/journal.pone.0347551.s006

(XLSX)
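The scoring rule stated for S5 Table can be sketched as follows: each of the five species contributes 0.2 when its ortholog pair has a seed score above 0.95, giving a score between 0 and 1.0. The species labels and seed scores below are placeholders, and the function name is hypothetical; the sketch is not the pipeline used to build the table.

```python
SEED_THRESHOLD = 0.95      # pairs above this seed score count as true orthologs
POINTS_PER_ORTHOLOG = 0.2  # contribution of each species with a true ortholog


def conservation_score(seed_scores):
    """Evolutionary conservation score for one human gene.

    seed_scores maps a species label to the seed score of its best ortholog
    candidate, or to None when no candidate was found.
    """
    hits = sum(
        1 for s in seed_scores.values()
        if s is not None and s > SEED_THRESHOLD
    )
    return round(hits * POINTS_PER_ORTHOLOG, 1)


# Hypothetical gene with true orthologs in three of five species.
example_gene = {
    "species_1": 1.0,
    "species_2": 0.99,
    "species_3": 0.97,
    "species_4": 0.80,   # candidate below the threshold
    "species_5": None,   # no candidate found
}
example_score = conservation_score(example_gene)
```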

S6 Table. MCODE node scores for prediction of complex formation.

This table shows the MCC rankings and node scores for 64 biomarkers from the 1,500-query-gene network (1,362 nodes, 17,907 edges, medium confidence). The node scores, which represent the capacity for protein complex formation, were calculated using the MCODE algorithm on the PPI network of BC-related genes. BC: Breast Cancer; PPI: protein–protein interaction.

https://doi.org/10.1371/journal.pone.0347551.s007

(XLSX)

S7 Table. Statistical relationships between biomarker centrality and biological characteristics.

This table shows the results of the Wilcoxon rank-sum test for the evolutionary conservation and protein complex analyses. These tests evaluated the differences in evolutionary conservation scores (from S5 Table) and node scores (from S6 Table) between the high- and low-centrality ranking groups (each comprising 32 genes). CC: Clustering Coefficient; DMNC: Density of Maximum Neighborhood Component; EPC: Edge Percolated Component; MCC: Maximal Clique Centrality; MNC: Maximum Neighborhood Component; BC: Breast Cancer.

https://doi.org/10.1371/journal.pone.0347551.s008

(XLSX)
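The comparison summarized in S7 Table can be illustrated with a minimal Wilcoxon rank-sum sketch. It assumes a two-sided test based on the normal approximation with a tie correction and no continuity correction; the statistical software and settings actually used are not stated here, so the function and the example scores are hypothetical.

```python
from math import erfc, sqrt


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (z, p), where z > 0 means that x tends to have larger values than y.
    """
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j for a block of ties
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    mean = n1 * (n + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return 0.0, 1.0  # all values identical: no evidence of a difference
    z = (rank_sum_x - mean) / sqrt(var)
    return z, erfc(abs(z) / sqrt(2))


# Hypothetical conservation scores for two groups of 32 genes each.
high_group = [1.0] * 20 + [0.8] * 12
low_group = [0.2] * 20 + [0.4] * 12
z_stat, p_value = rank_sum_test(high_group, low_group)
```

Because conservation scores take only a few discrete values, ties are frequent, which is why the sketch uses a tie-corrected variance.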

S8 Table. Analysis of top 5%-MCC genes as novel biomarker candidates via survival and expression data.

This table provides the data used to identify novel biomarker candidates from the top 5% of MCC-ranked genes. Candidates were required to satisfy two conditions: (1) achieve statistical significance (P < 0.05) in survival analyses for both RFS and OS, and (2) show significant differential expression (P < 0.05) between normal and cancer tissues, and between cancer and metastatic tissues. The table lists the 11 genes that met all criteria: TOP2A, TTK, BUB1, CCNB2, KIF11, HMMR, FOXM1, FGF1, PTPN11, FGFR2, and ERBB3. RFS: relapse-free survival; OS: overall survival.

https://doi.org/10.1371/journal.pone.0347551.s009

(XLSX)

S9 Table. GO categories for biomarkers with top MCC rankings.

This table lists the GO categories for the 23 known biomarkers that ranked within the top 5% for MCC. Supporting references for the functional annotations are also provided. GO: gene ontology; FGFR: fibroblast growth factor receptor.

https://doi.org/10.1371/journal.pone.0347551.s010

(XLSX)

S10 Table. Functional annotation of 11 novel BC biomarker candidates.

This table presents the known biological functions and summarizes published reports on the prognostic power for each of the 11 novel biomarker candidates identified in this study. BC: Breast Cancer.

https://doi.org/10.1371/journal.pone.0347551.s011

(XLSX)

References

1. Siegel RL, Miller KD, Wagle NS, Jemal A. Cancer statistics, 2023. CA Cancer J Clin. 2023;73(1):17–48. pmid:36633525
2. Turner KM, Yeo SK, Holm TM, Shaughnessy E, Guan J-L. Heterogeneity within molecular subtypes of breast cancer. Am J Physiol Cell Physiol. 2021;321(2):C343–54. pmid:34191627
3. Kudelova E, Smolar M, Holubekova V, Hornakova A, Dvorska D, Lucansky V. Genetic heterogeneity, tumor microenvironment and immunotherapy in triple-negative breast cancer. Int J Mol Sci. 2022;23(23). pmid:36499265
4. Paik S, Shak S, Tang G, Kim C, Baker J, Cronin M, et al. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med. 2004;351(27):2817–26. pmid:15591335
5. Parker JS, Mullins M, Cheang MCU, Leung S, Voduc D, Vickery T, et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol. 2009;27(8):1160–7. pmid:19204204
6. van ’t Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AAM, Mao M, et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature. 2002;415(6871):530–6. pmid:11823860
7. Filipits M, Rudas M, Jakesz R, Dubsky P, Fitzal F, Singer CF, et al. A New Molecular Predictor of Distant Recurrence in ER-Positive, HER2-Negative Breast Cancer Adds Independent Information to Conventional Clinical Risk Factors. Clinical Cancer Research. 2011;17(18):6012–20.
8. Ma X-J, Salunga R, Dahiya S, Wang W, Carney E, Durbecq V, et al. A Five-Gene Molecular Grade Index and HOXB13:IL17BR Are Complementary Prognostic Factors in Early Stage Breast Cancer. Clinical Cancer Research. 2008;14(9):2601–8.
9. Zeng C, Zhang J. A narrative review of five multigenetic assays in breast cancer. Transl Cancer Res. 2022;11(4):897–907. pmid:35571670
10. Tian Z, He W, Tang J, Liao X, Yang Q, Wu Y. Identification of important modules and biomarkers in breast cancer based on WGCNA. Onco Targets Ther. 2020;13:6805–17. pmid:32764968
11. Guo Q, Qiu P, Pan K, Liang H, Liu Z, Lin J. Integrated machine learning algorithms identify KIF15 as a potential prognostic biomarker and correlated with stemness in triple-negative breast cancer. Sci Rep. 2024;14(1):21449. pmid:39271768
12. Chin C-H, Chen S-H, Wu H-H, Ho C-W, Ko M-T, Lin C-Y. cytoHubba: identifying hub objects and sub-networks from complex interactome. BMC Syst Biol. 2014;8 Suppl 4(Suppl 4):S11. pmid:25521941
13. Viacava Follis A. Centrality of drug targets in protein networks. BMC Bioinformatics. 2021;22(1):527. pmid:34715787
14. Hayat S, Ishrat R. Exploring potential genes and pathways related to lung cancer: a graph theoretical analysis. Bioinformation. 2023;19(9):954–63. pmid:37928493
15. Mangangcha IR, Malik MZ, Kucuk O, Ali S, Singh RKB. Kinless hubs are potential target genes in prostate cancer network. Genomics. 2020;112(6):5227–39. pmid:32976977
16. Zhang Y, Xiang J, Tang L, Li J, Lu Q, Tian G, et al. Identifying Breast Cancer-Related Genes Based on a Novel Computational Framework Involving KEGG Pathways and PPI Network Modularity. Front Genet. 2021;12:596794. pmid:34484285
17. Pletscher-Frankild S, Pallejà A, Tsafou K, Binder JX, Jensen LJ. DISEASES: text mining and data integration of disease-gene associations. Methods. 2015;74:83–9. pmid:25484339
18. Grissa D, Junge A, Oprea TI, Jensen LJ. Diseases 2.0: a weekly updated database of disease-gene associations from text mining and data integration. Database (Oxford). 2022;2022:baac019. pmid:35348648
19. Doncheva NT, Morris JH, Gorodkin J, Jensen LJ. Cytoscape StringApp: Network Analysis and Visualization of Proteomics Data. J Proteome Res. 2019;18(2):623–32. pmid:30450911
20. STRING. https://string-db.org/. Accessed 2024 February 23.
21. Metascape. https://metascape.org/. Accessed 2024 February 29.
22. Revigo. http://revigo.irb.hr/. Accessed 2024 February 29.
23. Supek F, Bošnjak M, Škunca N, Šmuc T. REVIGO summarizes and visualizes long lists of gene ontology terms. PLoS One. 2011;6(7):e21800. pmid:21789182
24. Györffy B, Lanczky A, Eklund AC, Denkert C, Budczies J, Li Q, et al. An online survival analysis tool to rapidly assess the effect of 22,277 genes on breast cancer prognosis using microarray data of 1,809 patients. Breast Cancer Res Treat. 2010;123(3):725–31. pmid:20020197
25. TNMplot. https://tnmplot.com/analysis/. Accessed 2024 March 3.
26. Bartha Á, Győrffy B. TNMplot.com: A Web Tool for the Comparison of Gene Expression in Normal, Tumor and Metastatic Tissues. Int J Mol Sci. 2021;22(5). pmid:33807717
27. InParanoiDB 9. https://inparanoidb.sbc.su.se/. Accessed 2024 September 3.
28. Persson E, Sonnhammer ELL. InParanoiDB 9: Ortholog Groups for Protein Domains and Full-Length Proteins. J Mol Biol. 2023;435(14):168001. pmid:36764355
29. Bader GD, Hogue CWV. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics. 2003;4:2. pmid:12525261
30. Arciero C, Somiari SB, Shriver CD, Brzeski H, Jordan R, Hu H, et al. Functional relationship and gene ontology classification of breast cancer biomarkers. Int J Biol Markers. 2003;18(4):241–72. pmid:14756541
31. Dabydeen SA, Desai A, Sahoo D. Unbiased Boolean analysis of public gene expression data for cell cycle gene identification. Molecular Biology of the Cell. 2019;30(14):1770–9. pmid:31091168
32. Wuchty S, Oltvai ZN, Barabási A-L. Evolutionary conservation of motif constituents in the yeast protein interaction network. Nat Genet. 2003;35(2):176–9. pmid:12973352
33. Zheng Y, Guo J, Li X, Xie Y, Hou M, Fu X, et al. An integrated overview of spatiotemporal organization and regulation in mitosis in terms of the proteins in the functional supercomplexes. Front Microbiol. 2014;5:573. pmid:25400627
34. Hennessy BT, Smith DL, Ram PT, Lu Y, Mills GB. Exploiting the PI3K/AKT pathway for cancer drug discovery. Nat Rev Drug Discov. 2005;4(12):988–1004. pmid:16341064
35. Turner N, Grose R. Fibroblast growth factor signalling: from development to cancer. Nat Rev Cancer. 2010;10(2):116–29. pmid:20094046
36. Tolg C, Hamilton SR, Morningstar L, Zhang J, Zhang S, Esguerra KV, et al. RHAMM promotes interphase microtubule instability and mitotic spindle integrity through MEK1/ERK1/2 activity. J Biol Chem. 2010;285(34):26461–74. pmid:20558733
37. Ornitz DM, Itoh N. The Fibroblast Growth Factor signaling pathway. Wiley Interdiscip Rev Dev Biol. 2015;4(3):215–66. pmid:25772309
38. Baselga J, Swain SM. Novel anticancer targets: revisiting ERBB2 and discovering ERBB3. Nat Rev Cancer. 2009;9(7):463–75. pmid:19536107
39. Slattery ML, John EM, Stern MC, Herrick J, Lundgreen A, Giuliano AR, et al. Associations with growth factor genes (FGF1, FGF2, PDGFB, FGFR2, NRG2, EGF, ERBB2) with breast cancer risk and survival: the Breast Cancer Health Disparities Study. Breast Cancer Res Treat. 2013;140(3):587–601. pmid:23912956
40. Cui F, Wu D, Wang W, He X, Wang M. Variants of FGFR2 and their associations with breast cancer risk: a HUGE systematic review and meta-analysis. Breast Cancer Res Treat. 2016;155(2):313–35. pmid:26728143
41. Lin Y, Wang S, Yang Q. Identification of hub genes and diagnostic efficacy for triple-negative breast cancer through WGCNA and Mendelian randomization. Discov Oncol. 2024;15(1):117. pmid:38609711
42. Jordan-Alejandre E, Campos-Parra AD, Castro-López DL, Silva-Cázares MB. Potential miRNA use as a biomarker: from breast cancer diagnosis to metastasis. Cells. 2023;12(4). pmid:36831192
43. Zhang J, Le TD, Liu L, He J, Li J. A novel framework for inferring condition-specific TF and miRNA co-regulation of protein-protein interactions. Gene. 2016;577(1):55–64. pmid:26611531

[ref1] 1. Siegel RL, Miller KD, Wagle NS, Jemal A. Cancer statistics, 2023. CA Cancer J Clin. 2023;73(1):17–48. pmid:36633525
View Article
PubMed/NCBI
Google Scholar

[2] View Article

[3] PubMed/NCBI

[4] Google Scholar

[ref2] 2. Turner KM, Yeo SK, Holm TM, Shaughnessy E, Guan J-L. Heterogeneity within molecular subtypes of breast cancer. Am J Physiol Cell Physiol. 2021;321(2):C343–54. pmid:34191627
View Article
PubMed/NCBI
Google Scholar

[6] View Article

[7] PubMed/NCBI

[8] Google Scholar

[ref3] 3. Kudelova E, Smolar M, Holubekova V, Hornakova A, Dvorska D, Lucansky V. Genetic heterogeneity, tumor microenvironment and immunotherapy in triple-negative breast cancer. Int J Mol Sci. 2022;23(23). pmid:36499265
View Article
PubMed/NCBI
Google Scholar

[10] View Article

[11] PubMed/NCBI

[12] Google Scholar

[ref4] 4. Paik S, Shak S, Tang G, Kim C, Baker J, Cronin M, et al. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med. 2004;351(27):2817–26. pmid:15591335
View Article
PubMed/NCBI
Google Scholar

[14] View Article

[15] PubMed/NCBI

[16] Google Scholar

[ref5] 5. Parker JS, Mullins M, Cheang MCU, Leung S, Voduc D, Vickery T, et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol. 2009;27(8):1160–7. pmid:19204204
View Article
PubMed/NCBI
Google Scholar

[18] View Article

[19] PubMed/NCBI

[20] Google Scholar

[ref6] 6. van ’t Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AAM, Mao M, et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature. 2002;415(6871):530–6. pmid:11823860
View Article
PubMed/NCBI
Google Scholar

[22] View Article

[23] PubMed/NCBI

[24] Google Scholar

[ref7] 7. Filipits M, Rudas M, Jakesz R, Dubsky P, Fitzal F, Singer CF, et al. A New Molecular Predictor of Distant Recurrence in ER-Positive, HER2-Negative Breast Cancer Adds Independent Information to Conventional Clinical Risk Factors. Clinical Cancer Research. 2011;17(18):6012–20.
View Article
Google Scholar

[26] View Article

[27] Google Scholar

[ref8] 8. Ma X-J, Salunga R, Dahiya S, Wang W, Carney E, Durbecq V, et al. A Five-Gene Molecular Grade Index and HOXB13:IL17BR Are Complementary Prognostic Factors in Early Stage Breast Cancer. Clinical Cancer Research. 2008;14(9):2601–8.
View Article
Google Scholar

[29] View Article

[30] Google Scholar

[ref9] 9. Zeng C, Zhang J. A narrative review of five multigenetic assays in breast cancer. Transl Cancer Res. 2022;11(4):897–907. pmid:35571670
View Article
PubMed/NCBI
Google Scholar

[32] View Article

[33] PubMed/NCBI

[34] Google Scholar

[ref10] 10. Tian Z, He W, Tang J, Liao X, Yang Q, Wu Y. Identification of important modules and biomarkers in breast cancer based on WGCNA. Onco Targets Ther. 2020;13:6805–17. pmid:32764968
View Article
PubMed/NCBI
Google Scholar

[36] View Article

[37] PubMed/NCBI

[38] Google Scholar

[ref11] 11. Guo Q, Qiu P, Pan K, Liang H, Liu Z, Lin J. Integrated machine learning algorithms identify KIF15 as a potential prognostic biomarker and correlated with stemness in triple-negative breast cancer. Sci Rep. 2024;14(1):21449. pmid:39271768
View Article
PubMed/NCBI
Google Scholar

[40] View Article

[41] PubMed/NCBI

[42] Google Scholar

[ref12] 12. Chin C-H, Chen S-H, Wu H-H, Ho C-W, Ko M-T, Lin C-Y. cytoHubba: identifying hub objects and sub-networks from complex interactome. BMC Syst Biol. 2014;8 Suppl 4(Suppl 4):S11. pmid:25521941
View Article
PubMed/NCBI
Google Scholar

[44] View Article

[45] PubMed/NCBI

[46] Google Scholar

[ref13] 13. Viacava Follis A. Centrality of drug targets in protein networks. BMC Bioinformatics. 2021;22(1):527. pmid:34715787
View Article
PubMed/NCBI
Google Scholar

[48] View Article

[49] PubMed/NCBI

[50] Google Scholar

[ref14] 14. Hayat S, Ishrat R. Exploring potential genes and pathways related to lung cancer: a graph theoretical analysis. Bioinformation. 2023;19(9):954–63. pmid:37928493
View Article
PubMed/NCBI
Google Scholar

[52] View Article

[53] PubMed/NCBI

[54] Google Scholar

[ref15] 15. Mangangcha IR, Malik MZ, Kucuk O, Ali S, Singh RKB. Kinless hubs are potential target genes in prostate cancer network. Genomics. 2020;112(6):5227–39. pmid:32976977
View Article
PubMed/NCBI
Google Scholar

[56] View Article

[57] PubMed/NCBI

[58] Google Scholar

[ref16] 16. Zhang Y, Xiang J, Tang L, Li J, Lu Q, Tian G, et al. Identifying Breast Cancer-Related Genes Based on a Novel Computational Framework Involving KEGG Pathways and PPI Network Modularity. Front Genet. 2021;12:596794. pmid:34484285
View Article
PubMed/NCBI
Google Scholar

[60] View Article

[61] PubMed/NCBI

[62] Google Scholar

[ref17] 17. Pletscher-Frankild S, Pallejà A, Tsafou K, Binder JX, Jensen LJ. DISEASES: text mining and data integration of disease-gene associations. Methods. 2015;74:83–9. pmid:25484339
View Article
PubMed/NCBI
Google Scholar

[64] View Article

[65] PubMed/NCBI

[66] Google Scholar

[ref18] 18. Grissa D, Junge A, Oprea TI, Jensen LJ. Diseases 2.0: a weekly updated database of disease-gene associations from text mining and data integration. Database (Oxford). 2022;2022:baac019. pmid:35348648
View Article
PubMed/NCBI
Google Scholar

[68] View Article

[69] PubMed/NCBI

[70] Google Scholar

[ref19] 19. Doncheva NT, Morris JH, Gorodkin J, Jensen LJ. Cytoscape StringApp: Network Analysis and Visualization of Proteomics Data. J Proteome Res. 2019;18(2):623–32. pmid:30450911
View Article
PubMed/NCBI
Google Scholar

[72] View Article

[73] PubMed/NCBI

[74] Google Scholar

[ref20] 20. 20.string. https://string-db.org/. Accessed 2024 February 23.
View Article
Google Scholar

[76] View Article

[77] Google Scholar

[ref21] 21. Metascape. https://metascape.org/. Accessed 2024 February 29.
View Article
Google Scholar

[79] View Article

[80] Google Scholar

[ref22] 22. Revigo. http://revigo.irb.hr/. Accessed 2024 February 29.
View Article
Google Scholar

[82] View Article

[83] Google Scholar

[ref23] 23. Supek F, Bošnjak M, Škunca N, Šmuc T. REVIGO summarizes and visualizes long lists of gene ontology terms. PLoS One. 2011;6(7):e21800. pmid:21789182
View Article
PubMed/NCBI
Google Scholar

[85] View Article

[86] PubMed/NCBI

[87] Google Scholar

[ref24] 24. Györffy B, Lanczky A, Eklund AC, Denkert C, Budczies J, Li Q, et al. An online survival analysis tool to rapidly assess the effect of 22,277 genes on breast cancer prognosis using microarray data of 1,809 patients. Breast Cancer Res Treat. 2010;123(3):725–31. pmid:20020197
View Article
PubMed/NCBI
Google Scholar

[89] View Article

[90] PubMed/NCBI

[91] Google Scholar

[ref25] 25. TNMplot. https://tnmplot.com/analysis/. Accessed 2024 March 3.
View Article
Google Scholar

[93] View Article

[94] Google Scholar

[ref26] 26. Bartha Á, Győrffy B. TNMplot.com: A Web Tool for the Comparison of Gene Expression in Normal, Tumor and Metastatic Tissues. Int J Mol Sci. 2021;22(5). pmid:33807717
View Article
PubMed/NCBI
Google Scholar

[96] View Article

[97] PubMed/NCBI

[98] Google Scholar

[ref27] 27. InParanoidDB9 [Internet]. 2024 [cited 2024 Sep 3]. Available from: https://inparanoidb.sbc.su.se/
View Article
Google Scholar

[100] View Article

[101] Google Scholar

[ref28] 28. Persson E, Sonnhammer ELL. InParanoiDB 9: Ortholog Groups for Protein Domains and Full-Length Proteins. J Mol Biol. 2023;435(14):168001. pmid:36764355
View Article
PubMed/NCBI
Google Scholar

[103] View Article

[104] PubMed/NCBI

[105] Google Scholar

[ref29] 29. Bader GD, Hogue CWV. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics. 2003;4:2. pmid:12525261
View Article
PubMed/NCBI
Google Scholar

[107] View Article

[108] PubMed/NCBI

[109] Google Scholar

[ref30] 30. Arciero C, Somiari SB, Shriver CD, Brzeski H, Jordan R, Hu H, et al. Functional relationship and gene ontology classification of breast cancer biomarkers. Int J Biol Markers. 2003;18(4):241–72. pmid:14756541
View Article
PubMed/NCBI
Google Scholar

[111] View Article

[112] PubMed/NCBI

[113] Google Scholar

[ref31] 31. Dabydeen SA, Desai A, Sahoo D. Unbiased Boolean analysis of public gene expression data for cell cycle gene identification. Molecular Biology of the Cell. 2019;30(14):1770–9. pmid:31091168
View Article
PubMed/NCBI
Google Scholar

[115] View Article

[116] PubMed/NCBI

[117] Google Scholar

[ref32] 32. Wuchty S, Oltvai ZN, Barabási A-L. Evolutionary conservation of motif constituents in the yeast protein interaction network. Nat Genet. 2003;35(2):176–9. pmid:12973352
View Article
PubMed/NCBI
Google Scholar

[119] View Article

[120] PubMed/NCBI

[121] Google Scholar

[ref33] 33. Zheng Y, Guo J, Li X, Xie Y, Hou M, Fu X, et al. An integrated overview of spatiotemporal organization and regulation in mitosis in terms of the proteins in the functional supercomplexes. Front Microbiol. 2014;5:573. pmid:25400627
View Article
PubMed/NCBI
Google Scholar

[123] View Article

[124] PubMed/NCBI

[125] Google Scholar

[ref34] 34. Hennessy BT, Smith DL, Ram PT, Lu Y, Mills GB. Exploiting the PI3K/AKT pathway for cancer drug discovery. Nat Rev Drug Discov. 2005;4(12):988–1004. pmid:16341064
View Article
PubMed/NCBI
Google Scholar

[127] View Article

[128] PubMed/NCBI

[129] Google Scholar

[ref35] 35. Turner N, Grose R. Fibroblast growth factor signalling: from development to cancer. Nat Rev Cancer. 2010;10(2):116–29. pmid:20094046
View Article
PubMed/NCBI
Google Scholar

[131] View Article

[132] PubMed/NCBI

[133] Google Scholar

[ref36] 36. Tolg C, Hamilton SR, Morningstar L, Zhang J, Zhang S, Esguerra KV, et al. RHAMM promotes interphase microtubule instability and mitotic spindle integrity through MEK1/ERK1/2 activity. J Biol Chem. 2010;285(34):26461–74. pmid:20558733
View Article
PubMed/NCBI
Google Scholar

[135] View Article

[136] PubMed/NCBI

[137] Google Scholar

[ref37] 37. Ornitz DM, Itoh N. The Fibroblast Growth Factor signaling pathway. Wiley Interdiscip Rev Dev Biol. 2015;4(3):215–66. pmid:25772309
View Article
PubMed/NCBI
Google Scholar

[139] View Article

[140] PubMed/NCBI

[141] Google Scholar

[ref38] 38. Baselga J, Swain SM. Novel anticancer targets: revisiting ERBB2 and discovering ERBB3. Nat Rev Cancer. 2009;9(7):463–75. pmid:19536107
View Article
PubMed/NCBI
Google Scholar

[143] View Article

[144] PubMed/NCBI

[145] Google Scholar

[ref39] 39. Slattery ML, John EM, Stern MC, Herrick J, Lundgreen A, Giuliano AR, et al. Associations with growth factor genes (FGF1, FGF2, PDGFB, FGFR2, NRG2, EGF, ERBB2) with breast cancer risk and survival: the Breast Cancer Health Disparities Study. Breast Cancer Res Treat. 2013;140(3):587–601. pmid:23912956
View Article
PubMed/NCBI
Google Scholar

[147] View Article

[148] PubMed/NCBI

[149] Google Scholar

[ref40] 40. Cui F, Wu D, Wang W, He X, Wang M. Variants of FGFR2 and their associations with breast cancer risk: a HUGE systematic review and meta-analysis. Breast Cancer Res Treat. 2016;155(2):313–35. pmid:26728143
View Article
PubMed/NCBI
Google Scholar

[151] View Article

[152] PubMed/NCBI

[153] Google Scholar

[ref41] 41. Lin Y, Wang S, Yang Q. Identification of hub genes and diagnostic efficacy for triple-negative breast cancer through WGCNA and Mendelian randomization. Discov Oncol. 2024;15(1):117. pmid:38609711

42. Jordan-Alejandre E, Campos-Parra AD, Castro-López DL, Silva-Cázares MB. Potential miRNA use as a biomarker: from breast cancer diagnosis to metastasis. Cells. 2023;12(4). pmid:36831192

43. Zhang J, Le TD, Liu L, He J, Li J. A novel framework for inferring condition-specific TF and miRNA co-regulation of protein-protein interactions. Gene. 2016;577(1):55–64. pmid:26611531

