Abstract
Objective
Breast cancer, a global concern predominantly impacting women, poses a significant threat when not identified early. While survival rates for breast cancer patients are typically favorable, the emergence of regional metastases markedly diminishes survival prospects. Detecting metastases and comprehending their molecular underpinnings are crucial for tailoring effective treatments and improving patient survival outcomes.
Methods
Various artificial intelligence methods and techniques were employed in this study to achieve accurate outcomes. Initially, the data was organized and underwent hold-out cross-validation, data cleaning, and normalization. Subsequently, feature selection was conducted using ANOVA and binary Particle Swarm Optimization (PSO). During the analysis phase, the discriminative power of the selected features was evaluated using machine learning classification algorithms. Finally, the selected features were considered, and the SHAP algorithm was utilized to identify the most significant features for enhancing the decoding of dominant molecular mechanisms in lymph node metastases.
Results
In this study, five main steps were followed for the analysis of mRNA expression data: reading, preprocessing, feature selection, classification, and SHAP algorithm. The RF classifier utilized the candidate mRNAs to differentiate between negative and positive categories with an accuracy of 61% and an AUC of 0.6. During the SHAP process, intriguing relationships between the selected mRNAs and positive/negative lymph node status were discovered. The results indicate that GDF5, BAHCC1, LCN2, FGF14-AS2, and IDH2 are among the top five most impactful mRNAs based on their SHAP values.
Conclusion
The most prominent mRNAs identified, including GDF5, BAHCC1, LCN2, FGF14-AS2, and IDH2, are implicated in lymph node metastasis. This study may offer insight into key candidate genes that could significantly impact the early detection and tailored therapeutic strategies for lymph node metastasis in patients with breast cancer.
Citation: Vahed SZ, Khatibi SMH, Saadat YR, Emdadi M, Khodaei B, Alishani MM, et al. (2024) Introducing effective genes in lymph node metastasis of breast cancer patients using SHAP values based on the mRNA expression data. PLoS ONE 19(8): e0308531. https://doi.org/10.1371/journal.pone.0308531
Editor: Guanghui Liu, State University of New York at Oswego, UNITED STATES OF AMERICA
Received: March 24, 2024; Accepted: July 24, 2024; Published: August 16, 2024
Copyright: © 2024 Vahed et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper.
Funding: This research was funded by the National Institute for Medical Research Development (NIMAD) (Grant No: 4000599). The funders had no role in the study design, data collection, analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Breast cancer is a widespread and fatal form of cancer that affects women globally [1]. In 2023, 297,790 new cases of breast cancer and 43,700 deaths were reported in the US [2]. The survival rate for breast cancer patients is high, with 90% and 83% at 5 and 10 years, respectively [3]. However, the presence of metastases in breast cancer patients leads to a notable decrease in survival rates. In cases where there are regional metastases, such as in the lymph nodes, the 5-year survival rate drops to 85%. If the metastases are distant, the rate decreases further to 26% [4]. Identifying the presence of metastases is of utmost importance as it allows for appropriate treatment and ultimately enhances patient survival rates.
When looking for the presence of metastases, the first step is to examine the regional lymph nodes. The presence of metastases in these lymph nodes is a poor prognostic factor in its own right and is also the main factor in predicting the presence of distant metastases [5]. In the case of breast cancer, the most common method of evaluating the regional lymph node status is the sentinel lymph node procedure [6, 7]. This procedure involves injecting a blue dye or radioactive tracer near the tumor. The first lymph node reached by the injected substance, known as the sentinel lymph node, is the node most likely to contain metastatic cancer cells and is removed. The removed lymph node is then sent for pathological processing and analysis by a pathologist.
Evaluating lymph node status is a crucial task for pathologists, but it comes with a challenge. The area of tissue that needs to be checked is extensive, and it can be hard to identify metastases that can be as small as single cells. For sentinel lymph nodes, at least three sections at different levels through the lymph node have to be examined. For non-sentinel nodes, at least ten lymph nodes must be examined [8, 9]. This process can be tedious and time-consuming, and pathologists may miss small metastases [10]. To address this, pathologists in the Netherlands perform a secondary examination using immunohistochemical staining for cytokeratin if inspection of the H&E slide reveals no metastases. However, even in this secondary examination, metastases can still be missed [11].
With the advent of personalized treatment, traditional prognostic biomarkers such as tumor size, tumor grade, and lymph node metastases are no longer sufficient for the effective management of early-diagnosed breast cancer patients [12, 13]. As a result, extensive research has been conducted in recent years to identify and validate molecular biomarkers that can serve as prognostic and predictive indicators. These new approaches are commonly referred to as multi-parameter, multi-analyte, and multi-gene tests. Several of these tests, recommended by experts, are currently used in clinical practice. Validated tests include Oncotype DX, MammaPrint, and uPA/PAI-1. The Oncotype DX test is a widely used multigene signature test that helps predict the risk of breast cancer recurrence by evaluating the mRNA expression of 21 genes and calculating a recurrence score based on the relative expression of these genes [14, 15]. However, it should be noted that Oncotype DX has some limitations, such as a lack of validation for long-term follow-up and ER-negative cases. Another molecular test, MammaPrint, uses microarrays to evaluate the relative expression of 70 genes associated with regulatory pathways of cancer. MammaPrint is a validated test for predicting cancer recurrence and dividing patients into high-risk and low-risk groups [16–20]. In addition to these tests, there is another molecular test that evaluates protein levels. This test measures the uPA and PAI-1 markers in extracts of breast cancer tissue [21]. Studies have shown that elevated levels of these proteins are associated with more severe outcomes in patients.
These multigene signature tests are prohibitively expensive in many countries, and considerable effort has been devoted to setting up a simple and inexpensive test to serve as a diagnostic and predictive biomarker. Databases can also be a helpful tool for discovering novel biomarkers. Recently, many researchers have been working on circulating tumor cells (CTCs), microRNAs, and DNA mutation testing (such as the measurement of ctDNA) to find new prognostic and predictive markers. Before clinical application, these novel biomarkers should be validated through clinical and analytical assessments. To move towards personalized treatment for early-diagnosed patients with breast cancer, we need established prognostic biomarkers in combination with validated prognostic/predictive factors.
In this study, we aimed to create a new roadmap for the development of robust biomarkers that can accurately detect the presence of lymph node metastases. The significance of this work lies in its ability to provide effective pathways for identifying patients with complex conditions in whom traditional clinical and imaging diagnostic methods fail to detect metastases that are present on pathological examination. Our research introduces new perspectives and tools to assist pathologists and surgeons in screening and classifying such complex and ambiguous conditions and to facilitate decision-making when considering the possible removal of axillary lymph nodes.
Material and methods
Material
The NCBI data portal (https://www.ncbi.nlm.nih.gov/geo/) provides mRNA-seq data from the GEO repository. For this study, we used dataset number GSE96058, which contains 30,865 gene expressions (mRNAs) for 3409 breast cancer patients measured using the GPL11154 platform [22, 23]. This dataset is a subset of the multi-center prospective cohort study Sweden Cancerome Analysis Network—Breast [SCAN-B], in which gene expression data is collected from multiple clinical centers. This platform applied the Illumina HiSeq 2000 technique, in which gene expression levels are reported using 54,715 Probes for 30,865 genes. In addition, clinical data of breast cancer patients were obtained from the same dataset. The breast cancer patients were divided into two groups based on their lymph node status: 2099 samples were in the negative lymph node group, while 1209 were in the positive lymph node group (Table 1). Informed consent to participate was obtained from all patients by the Regional Ethical Review Board of Lund (diary numbers 2007/155, 2009/658, 2009/659, 2014/8), the county governmental biobank center, and the Swedish Data Inspection group (diary number 364–2010).
The study was conducted in accordance with the principles of the Declaration of Helsinki (2013), and we received permission to access the research data file from the NCBI-GEO program through the National Cancer Institute in the United States. Since the NCBI-GEO data is publicly available, the local ethics committee waived the need for approval. The local ethics code is IR.NIMAD.REC.1400.025.
Method
The proposed approach for finding significant mRNAs involves five steps: reading, pre-processing, feature selection, classification, and applying the SHAP algorithm. Fig 1 illustrates the complete process and provides additional details for each step.
Five main steps (reading, preprocessing, feature selection, classification, and the SHAP algorithm) were applied to the mRNA expression data. 1) Required data was collected from the NCBI-GEO repository and organized during the reading step. 2) The pre-processing step includes two sub-steps, cross-validation and data normalization. 3) The feature-selection step contains two parts: the filter method based on ANOVA and the wrapper method based on Particle Swarm Optimization (PSO) for mRNA data, in which candidate mRNAs with more relevance to the positive and negative lymph node groups were selected. 4) Multi-classifier models were utilized to evaluate the discrimination power of the selected mRNAs. 5) The SHAP algorithm was employed to discover the possible relationship between the selected mRNAs and the positive and negative groups.
In the reading step, the mRNA data was organized into a matrix with 3308 rows and 30,865 columns, representing the number of samples and features respectively. To obtain a more realistic error estimate, the hold-out Cross-Validation (CV) approach was used to separate the data into training, validation, and test sets, which were set to 70%, 10%, and 20% of the samples, respectively. Additionally, feature columns with identical values for all samples of the training set were removed. Finally, the z-score and min-max methods were employed to normalize the data for the feature selection and classification steps.
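The pre-processing described above can be sketched as follows. This is a minimal illustration written with the Python standard library only, not the study's code: the function names, the toy matrix, and the random seed are assumptions, and only the z-score variant of the normalization is shown. The point it makes is that constant columns and normalization statistics are determined on the training set and then reused for the other sets.

```python
import random
import statistics

def holdout_split(n_samples, train=0.7, val=0.1, seed=0):
    """Shuffle sample indices and cut them into train/validation/test parts."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = round(n_samples * train)
    n_val = round(n_samples * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def fit_zscore(rows):
    """Learn (kept_columns, means, stds) from the training rows only.
    Columns that are constant across the training rows are dropped."""
    kept, means, stds = [], [], []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        sd = statistics.pstdev(col)
        if sd > 0:
            kept.append(j)
            means.append(statistics.fmean(col))
            stds.append(sd)
    return kept, means, stds

def apply_zscore(rows, kept, means, stds):
    """Apply the training-set statistics to any set of rows."""
    return [[(r[j] - m) / s for j, m, s in zip(kept, means, stds)] for r in rows]

# Toy matrix (hypothetical): 10 samples x 3 features; feature 1 is constant.
X = [[float(i), 5.0, float(i * i)] for i in range(10)]
train_idx, val_idx, test_idx = holdout_split(len(X))
kept, means, stds = fit_zscore([X[i] for i in train_idx])
X_train = apply_zscore([X[i] for i in train_idx], kept, means, stds)
X_test = apply_zscore([X[i] for i in test_idx], kept, means, stds)
```

Fitting the statistics on the training rows alone keeps information from the validation and test samples out of the model-building steps.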
To reduce the number of irrelevant attributes, we implemented a two-part feature-selection process that consisted of a classifier-independent filter method and a wrapper method [24]. During the filter phase, we used ANOVA [25] to evaluate each feature individually and reduce the dimension of the mRNA data. Due to class imbalance in the training set, we used the SMOTE algorithm [26] to reduce its effect on ANOVA. This also helped to reduce the computational cost during the subsequent wrapper step. From the training set, we selected the 200 top features based on their F-values.
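The filter step can be illustrated with a small sketch of the one-way ANOVA F-value, computed per feature and used to keep the top-ranked features. The code below uses only the standard library; the function names and toy data are assumptions, and the SMOTE oversampling that the study applied before ANOVA is deliberately left out.

```python
import statistics

def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across the label groups."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    grand = statistics.fmean(values)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand) ** 2
                     for g in groups.values())
    ss_within = sum((v - statistics.fmean(g)) ** 2
                    for g in groups.values() for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / df_between) / (ss_within / df_within)

def top_k_features(X, y, k):
    """Rank feature columns by F-value and return the indices of the top k."""
    scores = [anova_f([row[j] for row in X], y) for j in range(len(X[0]))]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores

# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
y = [0, 0, 0, 1, 1, 1]
X = [[1.0, 3.0], [2.0, 1.0], [3.0, 2.0], [7.0, 2.0], [8.0, 3.0], [9.0, 1.0]]
selected, scores = top_k_features(X, y, k=1)
# scores[0] is 54.0 and scores[1] is 0.0, so feature 0 is selected
```

In the study the same ranking idea was applied to the training set with k = 200.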
We used binary Particle Swarm Optimization (PSO) [27] as the wrapper method to select the most important features. This method is based on swarm intelligence [28] and requires a classifier, in our case Random Forest (RF), to evaluate the fitness function. The fitness function was based on AUC, measured on the validation set. The population size and the number of iterations were set to 35 and 100, respectively. After running the binary PSO method, we selected 109 significant mRNAs based on the output.
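A rough sketch of how a binary PSO wrapper searches over feature subsets is given below. It is not the PySwarms implementation used in the study: the inertia and acceleration constants (`w`, `c1`, `c2`) are assumed values, and `toy_fitness` is a hypothetical stand-in for the study's fitness, which was the validation AUC of a Random Forest trained on the selected features. Only the population size (35) and iteration count (100) come from the text.

```python
import math
import random

def binary_pso(fitness, n_features, n_particles=35, n_iters=100,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize `fitness` over 0/1 masks of length `n_features`."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                # The sigmoid maps the velocity to the probability that the bit is 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit

# Hypothetical fitness: reward two "informative" features, penalize subset size.
INFORMATIVE = (0, 2)

def toy_fitness(mask):
    return sum(mask[j] for j in INFORMATIVE) - 0.1 * sum(mask)

best_mask, best_fit = binary_pso(toy_fitness, n_features=6)
```

Each particle is one candidate feature subset, so the classifier has to be retrained for every fitness evaluation, which is why the filter step first reduces the feature count.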
No machine learning model is inherently better than the others, as each model can outperform the others depending on the problem at hand. Therefore, it is important to try several methods, particularly in the classification domain, and then select the best-performing model based on an unbiased evaluation. Several metrics can be used to determine the most suitable model, and the AUC is among the most widely used in the machine learning literature. We used several supervised classifiers, namely Support Vector Machine (SVM) [29], Naive Bayes (NB), K-Nearest Neighbor (KNN) [30], and Random Forest (RF) [31], to determine the discriminative power of the selected mRNAs. For each classifier, the evaluation metrics accuracy and AUC were calculated.
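The two evaluation metrics can be written out in a few lines, which may help when reading the reported values. The sketch below is standard-library Python with hypothetical labels and scores; the AUC is computed through its rank interpretation, the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical example: 0 = negative lymph node status, 1 = positive.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]            # predicted probability of class 1
y_pred = [1 if s >= 0.5 else 0 for s in scores]
acc = accuracy(y_true, y_pred)            # 0.75
auc = roc_auc(y_true, scores)             # 0.75
```

Under this reading, an AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two groups.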
In the next step, we studied significant relationships using the SHapley Additive exPlanation (SHAP) algorithm [32]. We extracted SHAP values regarding the effect of selected mRNAs in model predictions (positive/negative lymph node status). It is crucial to accurately understand the output of a prediction model. This helps build user trust, identify areas for improvement, and gain insights into the modelling process. Simple models, such as linear models, are often preferred in certain applications due to their ease of interpretation, even if they may not be as accurate as complex models. However, the increasing availability of big data has highlighted the trade-off between a model’s accuracy and interpretability. Various methods have been proposed to address this issue, but there is still a lack of understanding about how these methods compare and when to use one over the other.
The SHAP algorithm is an effective tool for explaining the role of each input variable in model predictions. The SHAP technique employs simplified explanation models that yield close approximations to the original predictive model. These models help in explaining complex machine learning models.
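To make the idea concrete, the following sketch computes exact Shapley values for a tiny model by enumerating every feature coalition. It is an illustration only: the SHAP library relies on faster model-specific or sampling-based estimators, and replacing "absent" features with a fixed baseline vector is an assumption of this sketch. `toy_model`, `x`, and `baseline` are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x`; absent features take `baseline`."""
    n = len(x)

    def value(subset):
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi

def toy_model(z):
    """Hypothetical model with one additive term and one interaction term."""
    return 2.0 * z[0] + z[1] * z[2]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)   # [2.0, 3.0, 3.0]
```

The values sum to the difference between the model output at `x` and at the baseline (8.0 here), which is the additivity property that lets SHAP values be read as per-feature contributions to a single prediction.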
The detailed technical explanation for each of the stages mentioned in the methodology is covered in the original references, and the output of each step is reported in the results section of the study. Python was the primary programming language used in this study, and we employed various libraries and frameworks such as NumPy, pandas, Matplotlib, seaborn, scikit-learn, SciPy, PySwarms, and SHAP to implement the proposed steps.
Result
The current study had two objectives. The primary aim was to decipher the molecular mechanisms implicated in lymph node metastasis and isolate the top mRNAs with the most associative interactions. The secondary objective was to identify the most significant mRNAs capable of accurately distinguishing positive and negative lymph node status in breast cancer. Table 2 reports the 109 most significant features among the 28,456 mRNAs. These features were selected in two consecutive feature selection steps, using ANOVA (filter method) followed by the binary PSO algorithm (wrapper method).
To assess the effectiveness of the selected mRNAs in distinguishing between positive and negative lymph nodes, four different classifiers, namely SVM, KNN, NB, and RF, were used. The performance of each classifier on the candidate mRNAs was evaluated by computing accuracy and AUC on the training, validation, and test sets and is reported in Table 3.
The classifiers rely on mRNA features, and the results showed that RF outperformed the other algorithms. On the test data, the accuracy of RF was 61% and its AUC-ROC score was 0.6. Fig 2 illustrates the confusion matrices and ROC curves for the training, validation, and test sets.
Confusion matrices and ROC curves. (a): Training set confusion matrix, (b): Training set ROC, (c): Validation set confusion matrix, (d): Validation set ROC, (e): Test set confusion matrix, (f): Test set ROC. Zero and one are negative and positive groups, respectively.
During the SHAP analysis, some interesting relationships between the selected mRNAs and positive/negative lymph node status were discovered. The five most significant mRNAs based on their SHAP values were selected, and their role in the positive and negative lymph nodes of breast cancer patients was studied to gain a better understanding of their impact and relevance. A summary plot of the SHAP values is displayed in Fig 3A. This plot shows the significance and impact of the selected mRNAs in model predictions for both negative and positive lymph nodes. The results indicate that GDF5, BAHCC1, LCN2, FGF14-AS2, and IDH2 are the five most impactful mRNAs based on their SHAP values. Additionally, two separate summary plots are provided for the negative and positive lymph node classes, shown in Fig 3B and 3C, respectively. The Y-axis in these plots lists feature names in order of importance, from top to bottom, while the X-axis represents the SHAP value, indicating the degree of change in log odds. The color of each point represents the value of the corresponding mRNA, where red indicates high values and blue indicates low values. Each point represents a row of mRNA expression data from the original dataset. For example, the SHAP value of GDF5 is mostly negative when GDF5 expression is low, which indicates that lower GDF5 expression tends to decrease the model output.
The summary plot using SHAP values (a): summary plot of both classes, (b): summary plot of the negative group, (c): summary plot of the positive group.
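The ordering of features in a summary plot can be reproduced from a matrix of SHAP values by ranking features on their mean absolute SHAP value across samples. The sketch below assumes that convention; the feature names and numbers are hypothetical placeholders, not values from this study.

```python
def rank_by_mean_abs_shap(shap_matrix, names):
    """Rank features by mean |SHAP| over samples (rows = samples)."""
    n = len(shap_matrix)
    importance = {
        name: sum(abs(row[j]) for row in shap_matrix) / n
        for j, name in enumerate(names)
    }
    ranking = sorted(importance, key=importance.get, reverse=True)
    return ranking, importance

# Hypothetical SHAP values: 3 samples x 3 features.
names = ["mRNA_A", "mRNA_B", "mRNA_C"]
shap_matrix = [
    [0.2, -0.1, 0.00],
    [-0.4, 0.1, 0.05],
    [0.3, -0.3, 0.05],
]
ranking, importance = rank_by_mean_abs_shap(shap_matrix, names)
```

The absolute value matters here: a feature that pushes some predictions up and others down still ranks highly, even though its signed contributions partly cancel.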
We present the decision plot, which illustrates the model's decisions by displaying the cumulative SHAP values for each prediction or sample. The decision plot features lines that show the extent to which each feature contributed to a specific model prediction, thereby explaining which feature values influenced the prediction. The decision plots in Fig 4A and 4B represent the negative and positive target labels, respectively. The explanations in Fig 4C (negative class) and Fig 4D (positive class) display the features that contribute to the model prediction, compared to the average model output over the training dataset. The features that push the prediction higher are shown in red, while those that push it lower are shown in blue. This explanation is presented in a waterfall plot format.
(a): Decision plot of the first sample related to negative class, (b): Decision plot of the first sample related to positive class, (c): waterfall plot of the first sample related to negative class, (d): waterfall plot of the first sample related to positive class.
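The cumulative reading of a decision plot can be sketched in a few lines: starting from the expected (average) model output, the SHAP values of one sample are added one feature at a time until the model output for that sample is reached. The code assumes features are accumulated from least to most influential; the names and numbers are hypothetical.

```python
from itertools import accumulate

def decision_path(expected_value, shap_values, names):
    """Cumulative SHAP path for one sample, least influential feature first."""
    order = sorted(range(len(shap_values)), key=lambda j: abs(shap_values[j]))
    steps = [shap_values[j] for j in order]
    path = list(accumulate(steps, initial=expected_value))
    return [names[j] for j in order], path

# Hypothetical single-sample explanation.
names = ["mRNA_A", "mRNA_B", "mRNA_C"]
shap_values = [0.10, -0.05, 0.02]
ordered_names, path = decision_path(0.4, shap_values, names)
# path starts at the expected value and ends at the sample's model output
```

A waterfall plot shows the same information as separate bars: each bar is one step of this path, colored by its sign.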
The first sample in the dataset was analyzed to identify the features that led to positive and negative outcomes (Fig 5A and 5B). A force plot was used to show the expected value and the SHAP values for this test sample. For the negative class in Fig 5A, PHLDA1 and FGF14-AS2 increased the model output, while BAHCC1 and GDF5 decreased it. Fig 5B provides the corresponding details for the positive class. Fig 6 displays box plots of the five mRNAs with the most significant effect on model predictions. These mRNAs were identified based on their SHAP values.
The force plot of the first sample in dataset (a): Force plot of negative class, (b): Force plot of positive class.
The box plots of the five most important mRNAs based on their SHAP values (a): GDF5, (b): BAHCC1, (c): LCN2, (d): FGF14-AS2, (e): IDH2.
Discussion
The presence of lymph node metastasis demonstrates a strong correlation with the recurrence of cancer after surgery and the length of time a patient survives post-surgery. Therefore, it is of utmost importance to identify biomarkers that can predict the early occurrence of lymph node metastasis in breast cancer. Our results show that GDF5, BAHCC1, LCN2, FGF14-AS2, and IDH2 are among the top five most effective mRNAs involved in lymph node metastasis of breast cancer based on their SHAP values.
Changes in the vasculature of the lymph nodes during malignant progression could create an environment conducive to the attraction and maintenance of tumor cells, thereby establishing a specialized niche for metastasis [33]. The GDF5 gene (also called BMP14; bone morphogenetic protein-14) was the first mRNA identified in this study. It is a member of the transforming growth factor β (TGF-β) superfamily [34]. In human vascular endothelial cells, GDF5 facilitates the proangiogenic action of vascular endothelial growth factor [35], controls the up-regulation of urokinase-type plasminogen activator (uPA) receptor (uPAR), and plays a role in angiogenesis [36] by promoting the migration of aortic endothelial cells without affecting their proliferation [37]. The overproduction of TGF-β in breast cancer cells elicits the expression of GDF5 in endothelial cells, promoting angiogenesis. The proangiogenic effect of GDF5 can be regulated by anti-TGF-β peptides and anti-GDF5 antibodies, which could be applied as potential therapeutic tools for TGF-β/GDF5-dependent breast cancer angiogenesis [38]. Recently, GDF5 has also been found to be associated with lymph node metastasis in colorectal cancer [39]. The absence of GDF5 may have significant implications for immune responses and the functionality of macrophages [40]. Based on the available reports, GDF5 can be a valuable molecular target for modulating lymph node metastasis in breast cancer.
Cancer cells experience extensive alterations in their transcriptional profile through the development of alternative gene regulatory elements such as super-enhancers [41]. Based on the results of this study, BAHCC1 ranked as the second most significant mRNA in lymph node metastasis of breast cancer samples. The presence of BAHCC1, a super-enhancer, is crucial for the successful growth, engraftment, and dissemination of tumors. BAHCC1 acts as a transcriptional regulator, controlling the expression of DNA-repair and E2F/KLF-dependent cell-cycle genes. Silencing of BAHCC1 leads to reduced cell proliferation and delayed DNA repair [41]. Furthermore, it interacts with important transcriptional corepressors, namely histone deacetylase and SAP30-binding protein, thereby providing a molecular foundation for BAHCC1-mediated suppression of target genes. BAHCC1 is extensively overexpressed in various subtypes of human acute leukemia and plays a vital role in the growth of malignant cells in animals and cell lines. In leukemia, the depletion of BAHCC1 leads to the inhibition of oncogenesis [42]. Notably, BAHCC1 forms associations with BRG1-containing remodeling complexes at the promoters of these genes. The mechanism by which BAHCC1 is involved in lymph node metastasis has not been defined in breast cancer, and our results suggest that its direct role in breast cancer warrants elucidation.
Lipocalin-2 (LCN2), a secretory glycoprotein, was identified as one of the top mRNAs associated with lymph node metastasis. Lipocalin-2 is responsible for the transportation of small lipophilic ligands. Aberrant expression of LCN2 plays a crucial role in various processes related to breast cancer, such as angiogenesis, invasion and migration of cells, and epithelial-mesenchymal transition (EMT). An increase in the expression of LCN2 is linked with a negative prognosis, the presence of lymph node metastasis, tumor grade, and estrogen receptor (ER)-negative status [43]. Lipocalin-2 actively encourages the metastasis of breast cancer by inducing angiogenesis, the production of vascular endothelial growth factor (VEGF), EMT, and the migration and invasion of cells, all of which occur through various signaling pathways including PI3K/AKT/NF-κB [44], HIF-1α/Erk, and the ERα/Slug axis [45], and by stabilizing matrix metalloproteinase-9 [46]. Taken together, these findings indicate that LCN2 stimulates the invasion and metastasis of breast cancer cells by inducing EMT and promoting angiogenesis. This suggests that LCN2 could be a potential target for therapeutic interventions aimed at inhibiting the progression of breast cancer. Agents capable of reducing or preventing the secretion of LCN2 are expected to have a wide range of applications and to benefit patients with breast cancer [44, 47–51].
The 4th identified mRNA in this study was FGF14 antisense RNA 2 (FGF14-AS2), an emerging long non-coding RNA. FGF14-AS2 was originally identified as a suppressor of tumor formation in breast cancer. In comparison to its expression in adjacent normal tissue, FGF14-AS2 displays a notable decrease in breast cancer tissues. This reduced expression of FGF14-AS2 is closely linked to an enlargement in tumor size, an advanced clinical stage, a higher occurrence of lymph node metastasis, and an overall survival rate that is more unfavorable [52]. According to a study conducted by Jin et al., FGF14-AS2 functions as a competitive endogenous RNA (ceRNA) of miR-370-3p, thereby promoting the expression of FGF14 at the post-transcriptional level in breast cancer [53]. Furthermore, it has been reported that FGF14-AS2 directly binds to miR-205-5p, leading to the inhibition of proliferation, migration, invasion, and the initiation of apoptosis in breast cancer [54]. These studies support and validate the functional roles of the identified FGF14-AS2 in lymph node metastasis of breast cancer.
Isocitrate dehydrogenase (IDH) is a rate-limiting enzyme of the tricarboxylic acid cycle (TCA cycle), which plays a crucial role in energy metabolism. The TCA cycle upregulates cellular energy in cancer cells that are highly proliferative and have metastasized. The expression levels of IDH isoforms have been observed to be dysregulated in various human malignancies, suggesting their involvement in oncogenesis. In the context of breast cancer, upregulation or mutation of the mitochondrial NADP-dependent enzyme IDH2 has been associated with disease progression and prognosis. The presence of IDH2 has been linked to the aggressive behavior of breast cancer through its promotion of cell proliferation. Furthermore, the status of IDH2 has been identified as a robust predictor of outcome, particularly in individuals with ER-positive breast cancer [55]. The wild-type IDH2 [56] and its elevated expression potentially play a critical function in the progression of breast cancer and the emergence of lymphovascular invasion and metastasis [57]. Notably, IDH2 was found to be significantly upregulated in stage 3 breast cancer tissues and cell lines. Its presence was shown to be essential for arresting the cell cycle and inducing breast cancer proliferation [58]. These findings, along with our results, suggest that IDH2 is a promising target for the treatment of breast cancer.
Our study has some limitations. While our initial AI-based analysis focused on detecting significant genes associated with lymph node metastasis, a post-hoc pathway enrichment analysis was not performed to provide additional biological insights into the potential mechanisms and signaling pathways involved in the metastatic process. Moreover, we have not validated the identified RNA/protein expression levels in clinical samples. Nevertheless, existing studies support and reinforce the molecular mechanisms and biological functions of the candidate RNAs in lymph node metastasis of breast cancer.
Apart from the five genes discussed above, other genes identified in this study have previously been reported to be related to lymph node metastases. Our literature review reveals that dysregulated levels of identified RNAs including PQLC3 [59], Syndecan-1 (SDC1; a heparan sulfate proteoglycan) [60, 61], IRS2 (Insulin Receptor Substrate 2) [62, 63], MMP11 (Matrix Metalloproteinase 11) [64], aquaporin (AQP6) [65], TTC17 (Tetratricopeptide Repeat Domain 17) [66], SEPHS2 (Selenophosphate Synthetase 2) [67], PPP6R2 (Protein Phosphatase 6 Regulatory Subunit 2) [68], PPAPDC1A (Phosphatidic Acid Phosphatase Type 2 Domain Containing 1A) [69], STIM1 (Stromal Interaction Molecule 1) [70], LMOD2 (Leiomodin 2) [71], PPP4C phosphoprotein phosphatase catalytic subunit (PPPCs) [72], RAC1 [73], CCNB2 (Cyclin B2) [74], RAMP1 (Receptor activity modifying protein 1) [75], FOS [76], LINC00899 [77], and phosphatidylcholine (PC) [78] are involved in breast cancer metastasis. These reports can serve as an independent, external validation of our computational framework. Future research should explore this RNA panel in tissue and liquid biopsy samples to thoroughly establish their diagnostic, prognostic, and therapeutic value.
Conclusion
Lymph node metastasis is a critical and influential occurrence in the advancement of breast cancer. To find mRNA targets for lymph node metastasis in breast cancer, we applied an AI-based framework in this study. The suggested method selected the top five mRNAs involved in lymph node metastasis including GDF5, BAHCC1, LCN2, FGF14-AS2, and IDH2. This research has the potential to establish a comprehensive understanding of potential candidate genes that may play a crucial role in the early detection of lymph node metastasis in patients with breast cancer. The differential expression patterns of these genes between lymph node-positive and lymph node-negative tumors, as well as their association with lymph node metastases, highlight their clinical relevance. Further validation and investigation of these genes could lead to the development of more accurate prognostic tools and targeted therapies for breast cancer patients with lymph node involvement.
Acknowledgments
The authors would like to thank the National Institute for Medical Research Development (NIMAD), the Clinical Research Development Unit of Tabriz Valiasr Hospital and Kidney Research Center of Tabriz University of Medical Sciences for their assistance in this research.
References
- 1. Nagarajan D. and McArdle S. E., "Immune landscape of breast cancers," Biomedicines, vol. 6, no. 1, p. 20, 2018. pmid:29439457
- 2. American Cancer Society, "Key Statistics for Breast Cancer." https://www.cancer.org/ (accessed).
- 3. Howlader N. et al., "SEER cancer statistics review, 1975–2013," Bethesda, MD: National Cancer Institute, vol. 19, 2016.
- 4. Litjens G. et al., "1399 H&E-stained sentinel lymph node sections of breast cancer patients: the CAMELYON dataset," GigaScience, vol. 7, no. 6, p. giy065, 2018.
- 5. Voogd A. C. et al., "Differences in risk factors for local and distant recurrence after breast-conserving therapy or mastectomy for stage I and II breast cancer: pooled results of two large European randomized trials," Journal of clinical oncology, vol. 19, no. 6, pp. 1688–1697, 2001. pmid:11250998
- 6. Giuliano A. E. et al., "Locoregional recurrence after sentinel lymph node dissection with or without axillary dissection in patients with sentinel lymph node metastases: long-term follow-up from the American College of Surgeons Oncology Group (Alliance) ACOSOG Z0011 randomized trial," Annals of surgery, vol. 264, no. 3, p. 413, 2016. pmid:27513155
- 7. Giuliano A. E. et al., "Effect of axillary dissection vs no axillary dissection on 10-year overall survival among women with invasive breast cancer and sentinel node metastasis: the ACOSOG Z0011 (Alliance) randomized clinical trial," Jama, vol. 318, no. 10, pp. 918–926, 2017. pmid:28898379
- 8. Weaver D. L., "Pathology evaluation of sentinel lymph nodes in breast cancer: protocol recommendations and rationale," Modern Pathology, vol. 23, no. 2, pp. S26–S32, 2010. pmid:20436499
- 9. Somner J., Dixon J., and Thomas J., "Node retrieval in axillary lymph node dissections: recommendations for minimum numbers to be confident about node negative status," Journal of clinical pathology, vol. 57, no. 8, pp. 845–848, 2004. pmid:15280406
- 10. Van Diest P. J., Van Deurzen C. H., and Cserni G., "Pathology issues related to SN procedures and increased detection of micrometastases and isolated tumor cells," Breast disease, vol. 31, no. 2, pp. 65–81, 2010. pmid:21368369
- 11. Vestjens J. et al., "Relevant impact of central pathology review on nodal classification in individual breast cancer patients," Annals of oncology, vol. 23, no. 10, pp. 2561–2566, 2012. pmid:22495317
- 12. Duffy M. J., O’Donovan N., McDermott E., and Crown J., "Validated biomarkers: The key to precision treatment in patients with breast cancer," The Breast, vol. 29, pp. 192–201, 2016. pmid:27521224
- 13. Duffy M. J., McDermott E. W., and Crown J., "Use of multiparameter tests for identifying women with early breast cancer who do not need adjuvant chemotherapy," Clinical Chemistry, vol. 63, no. 4, pp. 804–806, 2017. pmid:28188230
- 14. Markopoulos C., van de Velde C., Zarca D., Ozmen V., and Masetti R., "Clinical evidence supporting genomic tests in early breast cancer: Do all genomic tests provide the same information?," European Journal of Surgical Oncology (EJSO), vol. 43, no. 5, pp. 909–920, 2017. pmid:27639633
- 15. Paik S. et al., "A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer," New England Journal of Medicine, vol. 351, no. 27, pp. 2817–2826, 2004. pmid:15591335
- 16. Van De Vijver M. J. et al., "A gene-expression signature as a predictor of survival in breast cancer," New England Journal of Medicine, vol. 347, no. 25, pp. 1999–2009, 2002. pmid:12490681
- 17. Buyse M. et al., "Validation and clinical utility of a 70-gene prognostic signature for women with node-negative breast cancer," Journal of the National Cancer Institute, vol. 98, no. 17, pp. 1183–1192, 2006. pmid:16954471
- 18. Knauer M. et al., "The predictive value of the 70-gene signature for adjuvant chemotherapy in early breast cancer," Breast cancer research and treatment, vol. 120, pp. 655–661, 2010. pmid:20204499
- 19. C. A. Drukker et al., "A prospective evaluation of a breast cancer prognosis signature in the observational RASTER study," International journal of cancer, vol. 133, no. 4, pp. 929–936, 2013. pmid:23371464
- 20. Bueno-de-Mesquita J. M. et al., "Use of 70-gene signature to predict prognosis of patients with node-negative breast cancer: a prospective community-based feasibility study (RASTER)," The lancet oncology, vol. 8, no. 12, pp. 1079–1087, 2007. pmid:18042430
- 21. Duffy M. J., McGowan P. M., Harbeck N., Thomssen C., and Schmitt M., "uPA and PAI-1 as biomarkers in breast cancer: validated for clinical use in level-of-evidence-1 studies," (in eng), Breast Cancer Res, vol. 16, no. 4, p. 428, Aug 22 2014, pmid:25677449
- 22. Brueffer C. et al., "Clinical Value of RNA Sequencing-Based Classifiers for Prediction of the Five Conventional Breast Cancer Biomarkers: A Report From the Population-Based Multicenter Sweden Cancerome Analysis Network-Breast Initiative," (in eng), JCO Precis Oncol, vol. 2, 2018, pmid:32913985
- 23. Saal L. H. et al., "The Sweden Cancerome Analysis Network—Breast (SCAN-B) Initiative: a large-scale multicenter infrastructure towards implementation of breast cancer genomic analyses in the clinical routine," (in eng), Genome Med, vol. 7, no. 1, p. 20, 2015, pmid:25722745
- 24. Tsai C.-F. and Sung Y.-T., "Ensemble feature selection in high dimension, low sample size datasets: Parallel and serial combination approaches," Knowledge-Based Systems, vol. 203, p. 106097, 2020/09/05/ 2020, https://doi.org/10.1016/j.knosys.2020.106097.
- 25. Bommert A., Sun X., Bischl B., Rahnenführer J., and Lang M., "Benchmark for filter methods for feature selection in high-dimensional classification data," Computational Statistics & Data Analysis, vol. 143, p. 106839, 2020.
- 26. Chawla N. V., Bowyer K. W., Hall L. O., and Kegelmeyer W. P., "SMOTE: synthetic minority over-sampling technique," Journal of artificial intelligence research, vol. 16, pp. 321–357, 2002.
- 27.
J. Kennedy and R. Eberhart, "Particle swarm optimization," in Proceedings of ICNN’95—International Conference on Neural Networks, 27 Nov.-1 Dec. 1995 1995, vol. 4, pp. 1942–1948 vol.4, https://doi.org/10.1109/ICNN.1995.488968
- 28. Rostami M., Berahmand K., Nasiri E., and Forouzandeh S., "Review of swarm intelligence-based feature selection methods," Engineering Applications of Artificial Intelligence, vol. 100, p. 104210, 2021.
- 29. Hearst M. A., Dumais S. T., Osuna E., Platt J., and Scholkopf B., "Support vector machines," IEEE Intelligent Systems and their Applications, vol. 13, no. 4, pp. 18–28, 1998,
- 30.
Guo G., Wang H., Bell D., Bi Y., and Greer K., "KNN Model-Based Approach in Classification," in On The Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE, Berlin, Heidelberg, Meersman R., Tari Z., and Schmidt D. C., Eds., 2003// 2003: Springer Berlin Heidelberg, pp. 986–996.
- 31. Breiman L., "Random Forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001/10/01 2001,
- 32. Lundberg S. M. and Lee S.-I., "A unified approach to interpreting model predictions," Advances in neural information processing systems, vol. 30, 2017.
- 33. Farnsworth R. H. et al., "A role for bone morphogenetic protein-4 in lymph node vascular remodeling and primary tumor growth," (in eng), Cancer research, vol. 71, no. 20, pp. 6547–57, Oct 15 2011, pmid:21868759
- 34. Harsanyi S., Zamborsky R., Krajciova L., Bohmer D., Kokavec M., and Danisovic L., "Association Analysis of GDF5 and Contributing Factors in Developmental Dysplasia of the Hip in Infants," (in eng), Ortopedia, traumatologia, rehabilitacja, vol. 23, no. 5, pp. 335–339, Oct 31 2021, pmid:34734566
- 35. Kurogane Y. et al., "FGD5 mediates proangiogenic action of vascular endothelial growth factor in human vascular endothelial cells," (in eng), Arteriosclerosis, thrombosis, and vascular biology, vol. 32, no. 4, pp. 988–96, Apr 2012, pmid:22328776
- 36. Serratì S. et al., "TGFbeta1 antagonistic peptides inhibit TGFbeta1-dependent angiogenesis," (in eng), Biochemical pharmacology, vol. 77, no. 5, pp. 813–25, Mar 1 2009, pmid:19041849
- 37. Yamashita H. et al., "Growth/differentiation factor-5 induces angiogenesis in vivo," (in eng), Experimental cell research, vol. 235, no. 1, pp. 218–26, Aug 25 1997, pmid:9281371
- 38. Margheri F. et al., "GDF5 regulates TGFß-dependent angiogenesis in breast carcinoma MCF-7 cells: in vitro and in vivo control by anti-TGFß peptides," (in eng), PloS one, vol. 7, no. 11, p. e50342, 2012, pmid:23226264
- 39. Yang H. et al., "An Analysis of the Gene Expression Associated with Lymph Node Metastasis in Colorectal Cancer," (in eng), International journal of genomics, vol. 2023, p. 9942663, 2023, pmid:37719786
- 40. Daans M., Lories R. J., and Luyten F. P., "Absence of GDF5 does not interfere with LPS Toll-like receptor signaling," (in eng), Clinical and experimental rheumatology, vol. 27, no. 3, pp. 495–8, May-Jun 2009. pmid:19604444
- 41. Berico P. et al., "Super-enhancer-driven expression of BAHCC1 promotes melanoma cell proliferation and genome stability," (in eng), Cell reports, vol. 42, no. 11, p. 113363, Nov 28 2023, pmid:37924516
- 42. Fan H. et al., "BAHCC1 binds H3K27me3 via a conserved BAH module to mediate gene silencing and oncogenesis," (in eng), Nature genetics, vol. 52, no. 12, pp. 1384–1396, Dec 2020, pmid:33139953
- 43. Bauer M., Eickhoff J. C., Gould M. N., Mundhenke C., Maass N., and Friedl A., "Neutrophil gelatinase-associated lipocalin (NGAL) is a predictor of poor prognosis in human primary breast cancer," (in eng), Breast cancer research and treatment, vol. 108, no. 3, pp. 389–97, Apr 2008, pmid:17554627
- 44. Leng X. et al., "Inhibition of lipocalin 2 impairs breast tumorigenesis and metastasis," (in eng), Cancer research, vol. 69, no. 22, pp. 8579–84, Nov 15 2009, pmid:19887608
- 45. Yang J. et al., "Lipocalin 2 promotes breast cancer progression," (in eng), Proceedings of the National Academy of Sciences of the United States of America, vol. 106, no. 10, pp. 3913–8, Mar 10 2009, pmid:19237579
- 46. Provatopoulou X. et al., "Circulating levels of matrix metalloproteinase-9 (MMP-9), neutrophil gelatinase-associated lipocalin (NGAL) and their complex MMP-9/NGAL in breast cancer disease," (in eng), BMC cancer, vol. 9, p. 390, Nov 4 2009, pmid:19889214
- 47. Chiang K. C. et al., "The Vitamin D Analog, MART-10, Attenuates Triple Negative Breast Cancer Cells Metastatic Potential," (in eng), International journal of molecular sciences, vol. 17, no. 4, Apr 21 2016, pmid:27110769
- 48. Guo P., Yang J., Jia D., Moses M. A., and Auguste D. T., "ICAM-1-Targeted, Lcn2 siRNA-Encapsulating Liposomes are Potent Anti-angiogenic Agents for Triple Negative Breast Cancer," (in eng), Theranostics, vol. 6, no. 1, pp. 1–13, 2016, pmid:26722369
- 49. Cheng G. et al., "HIC1 silencing in triple-negative breast cancer drives progression through misregulation of LCN2," (in eng), Cancer research, vol. 74, no. 3, pp. 862–72, Feb 1 2014, pmid:24295734
- 50. Guo P., You J. O., Yang J., Jia D., Moses M. A., and Auguste D. T., "Inhibiting metastatic breast cancer cell migration via the synergy of targeted, pH-triggered siRNA delivery and chemokine axis blockade," (in eng), Molecular pharmaceutics, vol. 11, no. 3, pp. 755–65, Mar 3 2014, pmid:24467226
- 51. Fougère M. et al., "NFAT3 transcription factor inhibits breast cancer cell motility by targeting the Lipocalin 2 gene," (in eng), Oncogene, vol. 29, no. 15, pp. 2292–301, Apr 15 2010, pmid:20101218
- 52. Yang F. et al., "A novel long non-coding RNA FGF14-AS2 is correlated with progression and prognosis in breast cancer," (in eng), Biochemical and biophysical research communications, vol. 470, no. 3, pp. 479–483, Feb 12 2016, pmid:26820525
- 53. Jin Y. et al., "Long noncoding RNA FGF14-AS2 inhibits breast cancer metastasis by regulating the miR-370-3p/FGF14 axis," (in eng), Cell death discovery, vol. 6, p. 103, 2020, pmid:33083023
- 54. Yang Y., Xun N., and Wu J. G., "Long non-coding RNA FGF14-AS2 represses proliferation, migration, invasion, and induces apoptosis in breast cancer by sponging miR-205-5p," (in eng), European review for medical and pharmacological sciences, vol. 23, no. 16, pp. 6971–6982, Aug 2019, pmid:31486497
- 55. Minemura H. et al., "Isoforms of IDH in breast carcinoma: IDH2 as a potent prognostic factor associated with proliferation in estrogen-receptor positive cases," Breast Cancer, vol. 28, pp. 915–926, 2021. pmid:33713004
- 56. Kurozumi S. et al., "A key genomic subtype associated with lymphovascular invasion in invasive breast cancer," (in eng), British journal of cancer, vol. 120, no. 12, pp. 1129–1136, Jun 2019, pmid:31114020
- 57. Aljohani A. I. et al., "The prognostic significance of wild-type isocitrate dehydrogenase 2 (IDH2) in breast cancer," (in eng), Breast cancer research and treatment, vol. 179, no. 1, pp. 79–90, Jan 2020, pmid:31599393
- 58. Piao S. et al., "The relative isoform expression levels of isocitrate dehydrogenase in breast cancer: IDH2 is a potential target in MDA-MB-231 cells," (in eng), Korean journal of clinical oncology, vol. 19, no. 2, pp. 60–68, Dec 2023, pmid:38229490
- 59. Wang C. et al., "A six-gene signature related with tumor mutation burden for predicting lymph node metastasis in breast cancer," (in eng), Translational cancer research, vol. 10, no. 5, pp. 2229–2246, May 2021, pmid:35116541
- 60. Cui X., Jing X., Yi Q., Long C., Tian J., and Zhu J., "Clinicopathological and prognostic significance of SDC1 overexpression in breast cancer," (in eng), Oncotarget, vol. 8, no. 67, pp. 111444–111455, Dec 19 2017, pmid:29340066
- 61. Sayyad M. R. et al., "Syndecan-1 facilitates breast cancer metastasis to the brain," (in eng), Breast cancer research and treatment, vol. 178, no. 1, pp. 35–49, Nov 2019, pmid:31327090
- 62. Lee J. S. et al., "The insulin and IGF signaling pathway sustains breast cancer stem cells by IRS2/PI3K-mediated regulation of MYC," (in eng), Cell reports, vol. 41, no. 10, p. 111759, Dec 6 2022, pmid:36476848
- 63. Gibson S. L., Ma Z., and Shaw L. M., "Divergent roles for IRS-1 and IRS-2 in breast cancer metastasis," (in eng), Cell cycle (Georgetown, Tex.), vol. 6, no. 6, pp. 631–7, Mar 15 2007, pmid:17361103
- 64. Molière S. et al., "MMP-11 expression in early luminal breast cancer: associations with clinical, MRI, pathological characteristics, and disease-free survival," (in eng), BMC cancer, vol. 24, no. 1, p. 295, Mar 4 2024, pmid:38438841
- 65. Charlestin V., Fulkerson D., Arias Matus C. E., Walker Z. T., Carthy K., and Littlepage L. E., "Aquaporins: New players in breast cancer progression and treatment response," (in eng), Frontiers in oncology, vol. 12, p. 988119, 2022, pmid:36212456
- 66. Zhang J. et al., "Loss of TTC17 promotes breast cancer metastasis through RAP1/CDC42 signaling and sensitizes it to rapamycin and paclitaxel," (in eng), Cell & bioscience, vol. 13, no. 1, p. 50, Mar 9 2023, pmid:36895029
- 67. Zhang L. et al., "Bioinformatics Analyses Reveal the Prognostic Value and Biological Roles of SEPHS2 in Various Cancers," (in eng), International journal of general medicine, vol. 14, pp. 6059–6076, 2021, pmid:34594130
- 68. Müller A. et al., "Involvement of chemokine receptors in breast cancer metastasis," (in eng), Nature, vol. 410, no. 6824, pp. 50–6, Mar 1 2001, pmid:11242036
- 69. Guo X., Yang F., Yu L., Wen R., Zhang X., and Lin H., "MiR-598-5p inhibits breast cancer tumor growth and lung metastasis by targeting PPAPDC1A," (in eng), The Chinese journal of physiology, vol. 66, no. 2, pp. 103–110, Mar-Apr 2023, pmid:37026213
- 70. Mo P. and Yang S., "The store-operated calcium channels in cancer metastasis: from cell migration, invasion to metastatic colonization," (in eng), Frontiers in bioscience (Landmark edition), vol. 23, no. 7, pp. 1241–1256, Jan 1 2018, pmid:28930597
- 71. Liu Y. et al., "LMO2 promotes tumor cell invasion and metastasis in basal-type breast cancer by altering actin cytoskeleton remodeling," (in eng), Oncotarget, vol. 8, no. 6, pp. 9513–9524, Feb 7 2017, pmid:27880729
- 72. Xie W. et al., "Comprehensive analysis of PPPCs family reveals the clinical significance of PPP1CA and PPP4C in breast cancer," (in eng), Bioengineered, vol. 13, no. 1, pp. 190–205, Jan 2022, pmid:34964699
- 73. Baugher P. J., Krishnamoorthy L., Price J. E., and Dharmawardhane S. F., "Rac1 and Rac3 isoform activation is involved in the invasive and metastatic phenotype of human breast cancer cells," Breast Cancer Research, vol. 7, pp. 1–10, 2005.
- 74. Aljohani A. I. et al., "Upregulation of Cyclin B2 (CCNB2) in breast cancer contributes to the development of lymphovascular invasion," (in eng), American journal of cancer research, vol. 12, no. 2, pp. 469–489, 2022. pmid:35261781
- 75. Gutierrez S. and Boada M. D., "Neuropeptide-induced modulation of carcinogenesis in a metastatic breast cancer cell line (MDA-MB-231 LUC+)," Cancer Cell International, vol. 18, pp. 1–10, 2018.
- 76. Li P. et al., "Enhancer RNA SLIT2 inhibits bone metastasis of breast cancer through regulating P38 MAPK/c-Fos signaling pathway," Frontiers in Oncology, vol. 11, p. 743840, 2021. pmid:34722297
- 77. Mondal P. and Meeran S. M., "Long non-coding RNAs in breast cancer metastasis," Non-coding RNA research, vol. 5, no. 4, pp. 208–218, 2020. pmid:33294746
- 78. Qiu Y. et al., "ACSL4-mediated membrane phospholipid remodeling induces integrin β1 activation to facilitate triple-negative breast cancer metastasis," (in eng), Cancer research, Mar 12 2024, pmid:38471082