Machine learning-driven risk stratification for distant metastasis in gastric cancer: A comparative study of clinical features and composite indices integrated models

Shaoxue Yang; Han Lei

doi:10.1371/journal.pone.0335258

Abstract

Objective

Distant metastasis (DM) of gastric cancer (GC) represents a significant health challenge due to its high mortality rates, necessitating advancements in early detection and management strategies. The objective of this study was to create a machine learning (ML) model that is interpretable for preoperative prediction of DM in GC.

Methods

We retrospectively analyzed 1,009 GC patients, of whom 769 from Zhejiang Cancer Hospital formed the development cohort and 240 from Zhejiang Provincial Hospital of Chinese Medicine formed the external test cohort. Nine clinical features and four composite indices derived from ten laboratory indicators were selected as candidate features. The dataset was balanced using the borderline Synthetic Minority Over-sampling Technique (SMOTE) and the Edited Nearest Neighbors (ENN) under-sampling method. Univariate and multivariate analyses were used to identify key metastasis-related features. Based on the identified features, we developed predictive models using five ML algorithms, with performance evaluated via receiver operating characteristic (ROC) curves, recall, and precision-recall (PR) curves. Finally, Shapley additive explanations (SHAP) analysis was applied to rank feature importance and explain the final model.

Results

Univariate and multivariate analyses identified five metastasis-related features: cT stage, cN stage, differentiation grade, PLR and TMI. Logistic Regression emerged as the optimal predictive model with the highest area under the curve (AUC) of 0.942 (95% CI: 0.922–0.962), Recall of 0.895 (95% CI: 0.843–0.947), and AUPRC of 0.889 (95% CI: 0.867–0.911) among five models. Additionally, the internal and external test cohorts recorded AUC values of 0.935 (95% CI: 0.897–0.972) and 0.879 (95% CI: 0.833–0.926), respectively. The SHAP analysis revealed the features that played a significant role in the predictions made by the model.

Conclusion

This ML model integrates clinical features and composite indices to predict GC metastasis risk, supported by an online tool to guide preoperative decision-making.

Citation: Yang S, Lei H (2025) Machine learning-driven risk stratification for distant metastasis in gastric cancer: A comparative study of clinical features and composite indices integrated models. PLoS One 20(10): e0335258. https://doi.org/10.1371/journal.pone.0335258

Editor: Jincheng Wang, Hokkaido University: Hokkaido Daigaku, JAPAN

Received: May 27, 2025; Accepted: October 7, 2025; Published: October 30, 2025

Copyright: © 2025 Yang, Lei. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The anonymized data and analysis scripts of this study have been deposited in Mendeley Data. The corresponding repository is publicly accessible via the following DOI: 10.17632/t7j3tmcys4.1.

Funding: This work was jointly supported by grants from the Zhejiang Medical and Health Science and Technology Plan (Grant Nos. 2022KY671 and 2023KY136). There was no additional external funding received for this study. No internal funding from our organization was received for this study.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Gastric cancer (GC) is a significant global health concern, ranking fifth in incidence and fourth in mortality worldwide. In China, 24.05% of individuals diagnosed with GC have distant metastases (DM). The peritoneum, liver, and bones are the three most frequent sites of metastatic spread [1–2]. The therapeutic approach for metastatic disease fundamentally differs from that of non-metastatic disease, transitioning from endoscopic resection or curative-intent gastrectomy to palliative systemic therapy, such as chemotherapy, molecularly targeted therapies, and immunotherapeutic regimens [3–4]. This distinction highlights the critical importance of accurately identifying the metastatic status to guide appropriate clinical management.

Detecting DM in GC remains challenging due to the limitations of existing technologies. For example, CT/PET-CT has low sensitivity for identifying sub-5 mm peritoneal micrometastases and often fails to differentiate metastatic lesions from inflammatory changes, particularly in hypometabolic tumors [5–6]. Liquid biopsies, which include circulating tumor DNA and circulating tumor cells, show promise but suffer from low sensitivity in the early detection of metastasis and lack standardized protocols. Tumor heterogeneity further compromises their reliability [7–9]. The substantial costs associated with PET-CT and liquid biopsies impose significant financial burdens on patients, limiting their feasibility for routine monitoring applications.

Machine learning (ML) is revolutionizing disease prediction by integrating diverse data sources to enable earlier and more precise risk assessment. These technologies excel in chronic disease management, oncology, infectious disease surveillance, and rare disease diagnosis [10–15]. The majority of current ML models for predicting GC with DM are based on the Surveillance, Epidemiology, and End Results (SEER) database. These models predominantly leverage demographic characteristics and clinicopathological profiles, rarely integrating multimodal predictors with laboratory-derived composite indices into their predictive architectures [16–19]. Compared to single laboratory indicators, accumulating evidence demonstrates that composite indices play significant roles in tumor early detection and prognostication. Neutrophil-to-lymphocyte ratio (NLR), platelet-to-lymphocyte ratio (PLR), platelet-to-monocyte ratio (PMR), and systemic immune-inflammation index (SIII) are common indicators of systemic inflammation, which have been identified as potential predictors for prognosis and treatment efficacy evaluation in various types of cancer [20–23]. Tumor Marker Index (TMI) and Prognostic Nutritional Index (PNI) are recognized as important prognostic factors in multiple malignancies [24–27]. However, limited studies have applied these composite indices to the investigation of DM in GC.

In this research, our objective was to create and assess an optimal explainable ML model for preoperative forecasting of DM risk in GC using clinical features and laboratory-derived composite indices, thereby helping clinicians reduce unnecessary surgical trauma caused by inappropriate radical surgery in patients at high DM risk.

Methods

Patient involvement

In this retrospective study, all data were extracted from the electronic medical record system (EMRS) following ethical approval from both the Ethics Committee of Zhejiang Cancer Hospital (No. IRB-2022–140), with initial access commencing on September 30, 2022; and the Ethics Committee of the First Affiliated Hospital of Zhejiang Chinese Medical University (No. 2023-KLS-137–01), with initial access commencing on October 20, 2023. Minors were not included in this study. This retrospective study utilized fully anonymized medical record data. The ethics committee waived the requirement for informed consent, and all procedures adhered to the Declaration of Helsinki. A total of 1,107 GC patients treated at Zhejiang Cancer Hospital were enrolled between January 2019 and March 2023. After excluding certain patients, 769 patients were included in the final analysis as the development cohort. The inclusion criteria were as follows: (1) pathological diagnosis of gastric adenocarcinoma; (2) preoperative patients who had not received neoadjuvant therapy or radiation; (3) non-residual gastric cancer. The exclusion criteria were: (1) lack of clinical data; (2) patients with other malignancies or gastrointestinal stromal tumor (GIST); (3) diseases of the hematopoietic system; (4) hepatic or renal insufficiency; (5) uncertain distant metastasis status. The external test cohort comprised 240 GC patients from the Zhejiang Traditional Chinese Medicine Hospital, all of whom met the same inclusion and exclusion criteria. The patients were treated between January 2019 and April 2024. The study flow of this paper is shown in Fig 1.

Fig 1. The workflow diagram for study design and patient screening.

Abbreviations: SMOTE: Synthetic Minority Oversampling Technique; XGBoost: eXtreme Gradient Boosting; RF: Random Forest; AdaBoost: Adaptive Boosting; SVM: Support Vector Machine; SHAP: Shapley additive explanations.

https://doi.org/10.1371/journal.pone.0335258.g001

Data collection and processing

The clinical features collected were gender, age, tumor location, tumor size, Lauren classification, differentiation grade, clinical T stage, clinical N stage, and HER2 status. Laboratory indicators comprised lymphocyte count, monocyte count, platelet count, albumin level, and CA242, CA50, CA724, CEA, CA125, and CA199 levels. All data were obtained from the EMRS and were collected prior to treatment initiation. The clinical tumor stage and DM were classified based on the 8th edition of the AJCC Cancer Staging System. The PLR, LMR, PNI and TMI were calculated as follows: PLR = Platelet count (10⁹/L) ÷ Lymphocyte count (10⁹/L); LMR = Lymphocyte count (10⁹/L) ÷ Monocyte count (10⁹/L); PNI = Albumin (g/L) + 5 × Lymphocyte count (10⁹/L). TMI was calculated using the method proposed by Miyata T et al [28] and was defined as the number of positive tumor markers (TMs) for each individual, where a value exceeding the upper limit of the reference range was considered positive. The upper limits for CA50, CEA, CA199, CA125, CA724, and CA242 were 25 U/mL, 5 ng/mL, 34 U/mL, 35 U/mL, 6.9 U/mL and 20 U/mL, respectively.
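The index definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis script: the function name `composite_indices`, the dictionary layout, and the patient values are hypothetical, while the formulas and upper reference limits are those quoted in the text.

```python
# Upper reference limits quoted in the text (units as given there).
UPPER_LIMITS = {
    "CA50": 25.0,   # U/mL
    "CEA": 5.0,     # ng/mL
    "CA199": 34.0,  # U/mL
    "CA125": 35.0,  # U/mL
    "CA724": 6.9,   # U/mL
    "CA242": 20.0,  # U/mL
}


def composite_indices(platelets, lymphocytes, monocytes, albumin, markers):
    """Return PLR, LMR, PNI and TMI for one patient.

    Cell counts are in 10^9/L, albumin in g/L, and `markers` maps each
    tumor-marker name to its measured level.
    """
    plr = platelets / lymphocytes
    lmr = lymphocytes / monocytes
    pni = albumin + 5 * lymphocytes
    # TMI counts the markers whose level exceeds the upper reference limit.
    tmi = sum(1 for name, limit in UPPER_LIMITS.items() if markers[name] > limit)
    return {"PLR": plr, "LMR": lmr, "PNI": pni, "TMI": tmi}


# Hypothetical patient values, for illustration only.
example = composite_indices(
    platelets=240.0,
    lymphocytes=1.5,
    monocytes=0.5,
    albumin=40.0,
    markers={"CA50": 30.0, "CEA": 3.2, "CA199": 50.0,
             "CA125": 12.0, "CA724": 7.5, "CA242": 10.0},
)
print(example)
```

For this hypothetical patient, three markers (CA50, CA199, CA724) exceed their limits, so the TMI is 3.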

Sample balancing strategies

Considering the class imbalance in the GC metastasis dataset (with metastatic cases constituting only 18.9% and non-metastatic cases making up 81.1%), the borderline Synthetic Minority Over-sampling Technique (SMOTE) and the Edited Nearest Neighbors (ENN) under-sampling method were employed to address the imbalance. First, the borderline SMOTE algorithm was used for data augmentation. This method generated new minority-class (metastatic) samples by leveraging feature similarities among existing positive cases. Subsequently, the ENN cleaning approach was implemented to remove majority-class (non-metastatic) samples that overlapped with minority-class samples. These overlapping cases are prevalent noise sources in imbalanced datasets. This strategy not only enhances model training accuracy but also better reflects class-specific sample distributions while limiting the noise associated with synthetic data.
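The two balancing steps can be illustrated with a simplified, standard-library sketch. It is not the authors' pipeline and not a library implementation: the function names, the neighbour counts `k`, and the toy two-feature data are assumptions. It follows the usual borderline-SMOTE idea (interpolate only from minority points whose neighbourhood is mostly, but not entirely, majority class) and the ENN idea (drop majority points that their nearest neighbours outvote).

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_nearest(idx, points, k):
    """Indices of the k points closest to points[idx], excluding idx itself."""
    others = [j for j in range(len(points)) if j != idx]
    others.sort(key=lambda j: sq_dist(points[idx], points[j]))
    return others[:k]


def borderline_oversample(X, y, n_new, k=5, seed=0):
    """Create n_new synthetic minority (label 1) samples from borderline points."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    minority_points = [X[i] for i in minority]
    danger = []
    for i in minority:
        n_majority = sum(1 for j in k_nearest(i, X, k) if y[j] == 0)
        if k / 2 <= n_majority < k:  # borderline: mostly, not entirely, majority
            danger.append(i)
    seeds = danger or minority  # fall back if no borderline point is found
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(seeds)
        neighbours = k_nearest(minority.index(i), minority_points, k)
        j = minority[rng.choice(neighbours)]
        gap = rng.random()  # random position on the segment between i and j
        synthetic.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
    return X + synthetic, y + [1] * n_new


def enn_clean(X, y, k=3):
    """Remove majority (label 0) samples outvoted by their k nearest neighbours."""
    keep = []
    for i in range(len(X)):
        if y[i] == 0:
            minority_votes = sum(y[j] for j in k_nearest(i, X, k))
            if minority_votes > k / 2:
                continue  # overlaps the minority region: treat as noise
        keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy imbalanced data: 40 majority and 10 minority points in two dimensions.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(40)]
X += [[data_rng.gauss(1.5, 1), data_rng.gauss(1.5, 1)] for _ in range(10)]
y = [0] * 40 + [1] * 10

X_over, y_over = borderline_oversample(X, y, n_new=10)
X_bal, y_bal = enn_clean(X_over, y_over)
print("minority:", sum(y), "->", sum(y_bal))
print("majority:", len(y) - sum(y), "->", len(y_bal) - sum(y_bal))
```

In practice a maintained implementation would normally be preferred over a hand-written one; the sketch only shows why oversampling raises the minority count while ENN can only lower the majority count.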

Feature selection and validation strategy

To select optimal features for predicting gastric cancer metastasis, we first applied univariate analysis to filter out variables with weak associations (P > 0.05). The remaining candidates were then included in multivariate stepwise regression to eliminate collinearity and retain only those with independent predictive value (P < 0.05). To evaluate the model performance, we employed five-fold nested cross-validation. Five-fold nested cross-validation is a robust validation method that uses double-layer data partitioning. The outer loop evaluates model performance by dividing the development data into four training folds and one test fold, while the inner loop optimizes hyperparameters through five splits, utilizing four subsets for training and one for validation. This approach prevents data leakage, ensures unbiased assessment, and improves generalizability, although it requires 25 training iterations. The optimal hyperparameters for the model were determined using a grid search approach, which systematically assesses each and every combination of hyperparameters. The internal and external test cohorts were employed to validate the final optimal model derived through rigorous filtering.
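The double-layer partitioning can be made concrete with a short index-only sketch. It assumes a simple interleaved split without shuffling or stratification, and the helper names are hypothetical; the sample size of 750 is the balanced development set reported in the Results. Each of the 5 outer folds contains 5 inner train/validation splits, which gives the 25 inner training runs mentioned above (per hyperparameter combination).

```python
def k_fold_indices(indices, k):
    """Split a list of indices into k disjoint, nearly equal folds."""
    return [indices[i::k] for i in range(k)]


def nested_cv_splits(n_samples, k_outer=5, k_inner=5):
    """Yield (outer_test, inner_train, inner_val) index lists."""
    outer_folds = k_fold_indices(list(range(n_samples)), k_outer)
    for o, outer_test in enumerate(outer_folds):
        # Everything outside the held-out outer fold is available for tuning.
        outer_train = [i for f, fold in enumerate(outer_folds) if f != o for i in fold]
        inner_folds = k_fold_indices(outer_train, k_inner)
        for v, inner_val in enumerate(inner_folds):
            inner_train = [i for f, fold in enumerate(inner_folds) if f != v for i in fold]
            yield outer_test, inner_train, inner_val


splits = list(nested_cv_splits(n_samples=750))
print(len(splits))  # number of inner (train, validation) pairs

outer_test, inner_train, inner_val = splits[0]
print(len(outer_test), len(inner_train), len(inner_val))
```

Because the outer test fold never appears in any inner split, hyperparameters chosen in the inner loop cannot be influenced by the data used for the outer performance estimate.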

ML algorithms

The optimal ML model was selected from a pool of five algorithms: eXtreme Gradient Boosting (XGBoost), Logistic Regression, Random Forest (RF), Adaptive Boosting (AdaBoost), and Support Vector Machine (SVM). ROC curves, precision-recall (PR) curves, calibration curves, and decision curve analysis (DCA) were used to evaluate the models’ performance. A battery of metrics, including AUC, accuracy, recall, specificity, area under the precision-recall curve (AUPRC), Brier score, and F1 score, was employed to assess the models. Following a comparative analysis of the ML models, the one exhibiting the best AUC, recall, and AUPRC in the validation cohort was selected as the final predictive model. Finally, Shapley additive explanations (SHAP) analysis was used to interpret and evaluate the final model.
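As a reminder of how several of these metrics are defined, the following sketch computes recall, specificity, accuracy, and F1 score from a confusion matrix, and AUC as the probability that a randomly chosen metastatic case receives a higher score than a randomly chosen non-metastatic case. The labels, scores, and the 0.5 cut-off are toy values chosen for illustration, not study data.

```python
def classification_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels (1 = metastasis)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    recall = tp / (tp + fn)        # sensitivity
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": recall,
        "specificity": specificity,
        "f1": 2 * precision * recall / (precision + recall),
    }


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and predicted probabilities, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1, 0.35, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Recall depends on the chosen cut-off, whereas AUC summarizes ranking quality across all cut-offs, which is why the two can favor different models.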

Statistical analysis

Statistical evaluations were conducted using Python version 3.11.4 and R version 4.2.3. Categorical variables are presented as percentages, while continuous variables are presented as mean ± standard deviation or median with interquartile range (IQR). Categorical data were compared using the Chi-square test, and continuous data were compared using either Student’s t-test or the Mann–Whitney U test. ROC curves were used to evaluate the discrimination performance of the models, and AUCs are reported with their 95% confidence intervals (CIs). The Brier score, which ranges from 0 to 1, was used to measure the discrepancy between the predicted risk and the actual outcome; a score closer to 0 indicates better calibration. Additionally, DCA was performed to illustrate the net benefit of using the model at varying thresholds, thereby assessing the model’s clinical relevance. A two-tailed P-value of less than 0.05 was deemed statistically significant.
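The Brier score and the net benefit used in DCA both have short closed-form definitions. The sketch below uses the standard formulas (mean squared difference between predicted probability and outcome; true-positive rate minus false-positive rate weighted by the threshold odds). The toy labels and probabilities are assumptions for illustration.

```python
def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, probs)) / len(y_true)


def net_benefit(y_true, probs, threshold):
    """Net benefit of treating every patient whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Toy outcomes and predicted risks, for illustration only.
y_true = [1, 0, 1, 0]
probs = [0.9, 0.1, 0.6, 0.4]

brier = brier_score(y_true, probs)
nb_30 = net_benefit(y_true, probs, threshold=0.3)
nb_50 = net_benefit(y_true, probs, threshold=0.5)
print(brier, nb_30, nb_50)
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing the result with the treat-all and treat-none strategies.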

Results

Baseline clinical characteristics

The essential demographic and clinical characteristics of GC patients from the two centers are detailed in Table 1. A total of 769 patients from Zhejiang Cancer Hospital were classified into the development cohort, while 240 patients from Zhejiang Provincial Hospital of Traditional Chinese Medicine comprised the external test cohort. Statistical analysis revealed no significant differences in gender, age, metastasis category, or metastasis sites between the cohorts (all P > 0.05), confirming comparable baseline characteristics. Table 2 presents the baseline clinical characteristics of the development cohort, which included 145 patients with DM and 624 non-metastatic patients. The groups showed no significant differences in gender, age, HER2 status, or monocyte count (all P > 0.05), whereas statistically significant variations were observed in tumor location, tumor size, Lauren classification, clinical T/N stages, lymphocyte count, platelet count, albumin, CA242, CA50, CA724, CEA, CA125, CA199, PLR, LMR, PNI, and TMI (all P < 0.05).

Table 1. Baseline characteristics of gastric cancer cohorts from two centers.

https://doi.org/10.1371/journal.pone.0335258.t001

Table 2. Comparison of characteristics between M0 and M1 gastric cancer patients from the development cohort.

https://doi.org/10.1371/journal.pone.0335258.t002

Sample balancing via borderline SMOTE-ENN method

After sample balancing, the number of metastatic samples increased from 145 to 250 (preserving clinical feature consistency), and the number of non-metastatic samples decreased from 624 to 500 (removing noisy samples). The resulting 750-sample dataset (250 positive, 500 negative) had a 1:2 ratio. Feature distributions did not differ significantly from those of the original dataset (all P > 0.05), indicating no data distortion, as shown in Table 3.

Table 3. Comparison of clinical characteristics between original cohort and borderline SMOTE-ENN balanced cohort in M0 and M1 gastric cancer.

https://doi.org/10.1371/journal.pone.0335258.t003

Feature selection associated with distant metastasis in gastric cancer

Univariate and multivariate analyses were performed to screen feature variables, and five key variables were ultimately identified: cT stage, cN stage, differentiation grade, PLR, and TMI. Detailed information on these variables is presented in Table 4.

Table 4. Univariate and multivariate analysis of factors associated with distant metastasis in gastric cancer.

https://doi.org/10.1371/journal.pone.0335258.t004

Identification of the optimal ML model

Table 5 and Fig 2 compare the performance metrics of the five ML models across the training and validation cohorts. Radar chart analysis revealed that in the training cohort, the RF model achieved superior predictive performance (Fig 3A), with an AUC of 0.990 (95% CI: 0.983–0.997), accuracy of 0.969 (95% CI: 0.962–0.977), sensitivity of 0.952 (95% CI: 0.930–0.974), specificity of 0.978 (95% CI: 0.976–0.980), F1-score of 0.954 (95% CI: 0.942–0.966), and AUPRC of 0.977 (95% CI: 0.972–0.981). In the validation cohort (Fig 3B), Logistic Regression demonstrated optimal classification capability with an AUC of 0.942 (95% CI: 0.904–0.980), accuracy of 0.840 (95% CI: 0.810–0.870), sensitivity of 0.895 (95% CI: 0.843–0.947), specificity of 0.813 (95% CI: 0.755–0.870), F1-score of 0.790 (95% CI: 0.759–0.820), and AUPRC of 0.889 (95% CI: 0.867–0.911). The Logistic Regression model also showed good calibration, with a Brier score of 0.093 (95% CI: 0.082–0.104) (Fig 2E), and clinical utility, with the highest net benefit across all probability thresholds in decision curve analysis (Fig 2F). These results establish Logistic Regression as the optimal predictor for DM in GC, combining the highest discriminative power (AUC), the highest ability to correctly identify positive cases (recall), balanced precision-recall performance (AUPRC), and reliable probabilistic calibration (Brier score). The hyperparameters for the five ML models are provided in S1 Table.

Table 5. A battery of metrics of five classifiers in the training and validation cohorts for fivefold nested cross-validation.

https://doi.org/10.1371/journal.pone.0335258.t005

Fig 2. Performance of the five models in forecasting distant metastasis among gastric cancer patients, as evaluated in both the training and validation cohorts.

ROC curves (A, B) and PR curves (C, D) for each model in the training and validation cohorts. Calibration curves (E) and DCA curves (F) for each model in the validation cohort. Abbreviations: XGBoost: eXtreme Gradient Boosting; AdaBoost: Adaptive Boosting; SVM: Support Vector Machine.

https://doi.org/10.1371/journal.pone.0335258.g002

Fig 3. Five-model comparative analysis via radar plot: AUC, accuracy, sensitivity, specificity, F1-Score, AUPRC evaluation.

(A) Comparative analysis in the training cohort. (B) Comparative analysis in the validation cohort. Abbreviations: AUC: area under the curve; AUPRC: area under the precision-recall curve; XGBoost: eXtreme Gradient Boosting; RF: Random Forest; AdaBoost: Adaptive Boosting; SVM: Support Vector Machine.

https://doi.org/10.1371/journal.pone.0335258.g003

Analysis and assessment of the Logistic regression model

As shown in Fig 4, the AUC for the validation cohort [AUC = 0.942 (95% CI: 0.904–0.980)] did not exceed that of the internal test cohort [AUC = 0.935 (95% CI: 0.897–0.972)] by more than 10%, supporting robust model generalizability. Therefore, the Logistic Regression model was considered appropriate for classification tasks within this dataset. The SHAP methodology offers two complementary views. The bar plot visualizes the global importance of features by calculating the mean absolute SHAP value (Mean |SHAP Value|) for each feature’s impact on the model output, with features ranked by their importance. The dot plot displays the distributional influence and directionality of features, showing how individual feature values (red indicating higher feature values and blue indicating lower values) correlate with the SHAP values across all samples, thereby highlighting both the spread and the positive or negative effects on predictions. As illustrated in Fig 5A, the bar plot ranked the influence of the features on the model in descending order: TMI, T stage, PLR, N stage, and differentiation grade. Furthermore, the dot plot (Fig 5B) depicted both the direction and extent of the impact that each feature exerts on the model’s predictions. Notably, advanced T and N stage, higher TMI and PLR, and poorer differentiation (Grade 3, Grade 4) increased the predicted risk. As shown in Table 6, the full model (with cT stage, cN stage, differentiation grade, PLR, and TMI) performed best, with significantly positive IDI values of 0.258 to 0.309 (all P < 0.001) versus all reduced models, indicating that TMI is the core variable boosting model performance.
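For a linear model such as logistic regression, SHAP values have a simple closed form when features are treated as independent: the contribution of feature i to the log-odds of one patient is the coefficient times the deviation of that patient's value from the feature mean. The sketch below illustrates this and the mean |SHAP| ranking used in the bar plot. The coefficients, intercept, and patient rows are hypothetical and are not the fitted model from this study.

```python
def log_odds(intercept, weights, row):
    """Linear predictor of a logistic regression model."""
    return intercept + sum(w * x for w, x in zip(weights, row))


def linear_shap(weights, X):
    """SHAP values in log-odds units, assuming independent features."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(weights))]
    return [[w * (x - m) for w, x, m in zip(weights, row, means)] for row in X]


def mean_abs_shap(shap_values):
    """Global importance: mean absolute SHAP value per feature."""
    n = len(shap_values)
    n_features = len(shap_values[0])
    return [sum(abs(row[j]) for row in shap_values) / n for j in range(n_features)]


# Hypothetical coefficients and patients (TMI, cT stage, PLR), illustration only.
features = ["TMI", "cT stage", "PLR"]
intercept = -4.0
weights = [0.8, 0.5, 0.01]
X = [[0, 1, 100.0], [2, 3, 150.0], [4, 4, 200.0], [1, 2, 120.0]]

shap_values = linear_shap(weights, X)
importance = mean_abs_shap(shap_values)
ranking = [name for _, name in sorted(zip(importance, features), reverse=True)]
print(ranking)
```

The SHAP values of each patient sum to the difference between that patient's log-odds and the average log-odds, which is the additivity property that force plots display.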

Table 6. Comparison of integrated discrimination improvement (IDI) among models with different feature combinations.

https://doi.org/10.1371/journal.pone.0335258.t006

Fig 4. Logistic Regression performance evaluation with 5-fold nested cross-validation for gastric cancer distant metastasis prediction.

The ROC curves (A, B) and confusion matrices (C, D) for the Logistic Regression model in the training and validation cohorts. (E) The calibration curve of the Logistic Regression model in the test cohort. (F) The decision curve analysis of the Logistic Regression model in the test cohort.

https://doi.org/10.1371/journal.pone.0335258.g004

Fig 5. Explainability of Logistic Regression model with SHAP method and the online tool for forecasting distant metastasis in patients with gastric cancer.

(A) The bar plot by SHAP. (B) The dot plot by SHAP. (C) The online tool for forecasting distant metastasis in patients with gastric cancer. Abbreviations: TMI: tumor marker index; PLR: platelet-to-lymphocyte ratio.

https://doi.org/10.1371/journal.pone.0335258.g005

External test of the model and the performance differences between internal and external test

A set comprising 240 patients diagnosed with GC was assembled from the Zhejiang Provincial Hospital of Traditional Chinese Medicine for the purpose of external test. The AUC of external test was 0.879 (95% CI: 0.833–0.926).

The absolute difference in AUC between the internal and external test sets was 0.056; the AUC of the external test showed a relative decline of approximately 6.0% compared with the internal test.
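The two figures follow directly from the reported AUCs, as this short check shows (variable names are illustrative):

```python
auc_internal = 0.935  # internal test cohort
auc_external = 0.879  # external test cohort

absolute_drop = auc_internal - auc_external
relative_drop = absolute_drop / auc_internal

print(f"absolute: {absolute_drop:.3f}, relative: {relative_drop:.1%}")
```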

Online prediction tool

As depicted in Fig 5C, we developed a web-based clinical decision support system (https://www.xsmartanalysis.com/model/list/predict/model/html?mid=27296&symbol=41755942uz68CdglZU92) that calculates the likelihood of metastasis by incorporating various feature variables. By applying a predefined decision threshold, the system categorizes patients into low-risk and high-risk groups. This tool combines model interpretability visualizations (SHAP force plots) with risk probability outputs to improve clinical utility and reliability.
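The logic behind such a tool can be sketched as a logistic model followed by a cut-off. Everything numeric here is a placeholder: the intercept, coefficients, feature encoding, and the 0.5 threshold are hypothetical, because the fitted coefficients and the predefined threshold are not stated in the text. Only the five feature names come from the study.

```python
import math


def predict_probability(intercept, coefficients, features):
    """Logistic model: convert the linear predictor into a probability."""
    linear = intercept + sum(coefficients[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-linear))


def risk_group(probability, threshold):
    """Assign a risk category from a predicted probability."""
    return "high-risk" if probability >= threshold else "low-risk"


# Hypothetical coefficients and threshold, for illustration only.
coefficients = {
    "cT stage": 0.6,
    "cN stage": 0.4,
    "differentiation grade": 0.3,
    "PLR": 0.005,
    "TMI": 0.7,
}
patient = {
    "cT stage": 3,
    "cN stage": 2,
    "differentiation grade": 3,
    "PLR": 180.0,
    "TMI": 2,
}

probability = predict_probability(-6.0, coefficients, patient)
print(round(probability, 3), risk_group(probability, threshold=0.5))
```

In a deployed tool the threshold would be chosen from the validation data, for example to reach a target recall, rather than fixed at 0.5.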

Discussion

This is one of the few multicenter ML prediction models for GC with DM built on data other than the SEER database, incorporating clinical characteristics and laboratory-derived composite indices. It features two key innovations: 1. Transforming multiple laboratory indicators into composite indices to reduce multicollinearity among laboratory data; 2. Integrating sample balancing strategies and validation protocols into model training to mitigate the reliability and generalization limitations of models trained on imbalanced samples.

To date, various studies have reported predictive ML models that assess the risk of DM in GC; however, these studies are constrained by significant limitations. For instance, the majority of these investigations are SEER-based models, and the SEER database has notable deficiencies in clinical granularity. These deficiencies include the absence of laboratory biomarkers, molecular profiles, and detailed pathological characteristics, limiting its efficacy in constructing a precise predictive model. Furthermore, the database’s population does not encompass Chinese gastric cancer cohorts, thereby undermining its generalizability. These limitations are corroborated by external validation in Chinese population cohorts, which revealed suboptimal discriminative performance with AUC values ranging from 0.727 to 0.760 [16–19]. Some models integrate DNA methylation profiles or miRNAs as variables; however, routine genetic testing remains limited and standardized methodologies for detection and interpretation are lacking, which severely undermines the reproducibility and generalizability of such models [29–30]. Radiomics has potential value in diagnosing peritoneal metastasis of GC; however, the existing research varies in quality, and future research needs to be more standardized and of higher quality to facilitate the translation of radiomics findings into clinical practice [31–33]. Therefore, it is imperative to develop an ML model tailored to Chinese populations that relies on easily acquired data and is standardizable and clinically translatable.

Currently, the domain of medical ML faces substantial challenges, particularly in scenarios with small sample sizes and binary class imbalance; these include overfitting, constrained generalizability, and non-transparent decision-making processes, which significantly impede clinical translation [34–36]. To mitigate the impact of these issues on model performance, we adopted the following approaches. Firstly, we condensed ten single laboratory indicators into four disease-associated composite indices, which reduced the model’s input dimensionality, mitigated overfitting with high-dimensional data, and helped reduce multicollinearity among the original indicators, thereby enhancing model stability and generalizability. Secondly, we employed the borderline SMOTE-ENN method to address sample imbalance. This approach improves model training accuracy, captures class-specific sample distributions more effectively, strengthens generalization performance, and limits overfitting and the noise introduced by synthetic data [37]. Thirdly, we implemented 5-fold nested cross-validation to handle model evaluation and hyperparameter optimization within the constraints of limited sample sizes, closely replicating real-world clinical settings. This technique operates via two validation loops: the outer loop partitions the dataset into five segments, sequentially assigning one segment as an independent test fold while the other four segments are used for model development. Within these four segments, the inner loop conducts further cross-validation to adjust hyperparameters and select features, ensuring the test data remain entirely segregated from any optimization activities. This strict separation of the test fold from the model refinement stages (hyperparameter tuning, feature selection) avoids data leakage that could artificially inflate performance metrics, while making efficient use of scarce medical data. Finally, we integrated SHAP to enhance model interpretability. SHAP quantifies the contribution of each feature to individual patient predictions and visually demonstrates the directional impact (positive or negative) and magnitude of features through force plots, thereby providing clinicians with an intuitive decision-making audit tool.

For a predictive model of DM, performance should be evaluated comprehensively, with a focus on key metrics aligned with clinical needs: recall, which reflects the model’s ability to identify patients with distant metastasis; AUC, which assesses the overall discriminative power between metastatic and non-metastatic cases; and AUPRC, which is more robust for evaluating performance in imbalanced datasets. Logistic Regression emerged as the optimal predictive model with the highest AUC, recall, and AUPRC, while achieving the lowest Brier score. External validation in an independent center yielded an AUC of 0.879, superior to existing SEER-based DM models, whose external validation AUCs range from 0.727 to 0.760 [16–19]. This suggests that our model has strong generalization capability. The final model incorporated five critical predictors: T stage, N stage, differentiation grade, PLR and TMI. SHAP analysis revealed that advanced T and N stage, high PLR and TMI, and poor differentiation markedly increased the risk. Both SHAP plots and IDI analysis indicated that TMI is the key incremental factor in this modeling process. To address the limitations of individual TMs in diagnostic and predictive performance, numerous studies have proposed TMI as a new marker, demonstrating its superior clinical utility over single-marker approaches [26–28]. In these studies, there are mainly two types of TMI calculation methods: one based on ROC analysis and the other based on the number of positive indicators. Considering the ease of standardization for data from different populations, we adopted the latter method and successfully constructed a prediction model incorporating TMI. Our study extends the application of TMI to gastric cancer metastasis. The model incorporating TMI improved significantly compared with models without TMI, with IDI values of 0.258 to 0.309. In particular, the improvement was largest when compared with the traditional T and N stages. This supports the application value of TMI in gastric cancer.
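The IDI reported in Table 6 compares two models through their discrimination slopes, that is, the mean predicted risk in patients with DM minus the mean predicted risk in patients without DM. The sketch below uses this standard definition; the predicted probabilities are toy values, not study data, and the function names are illustrative.

```python
def discrimination_slope(y_true, probs):
    """Mean predicted risk in events minus mean predicted risk in non-events."""
    events = [p for t, p in zip(y_true, probs) if t == 1]
    nonevents = [p for t, p in zip(y_true, probs) if t == 0]
    return sum(events) / len(events) - sum(nonevents) / len(nonevents)


def idi(y_true, probs_full, probs_reduced):
    """Integrated discrimination improvement of the full over the reduced model."""
    return discrimination_slope(y_true, probs_full) - discrimination_slope(y_true, probs_reduced)


# Toy predictions from a full and a reduced model, for illustration only.
y_true = [1, 1, 0, 0]
probs_full = [0.9, 0.7, 0.2, 0.1]
probs_reduced = [0.6, 0.5, 0.4, 0.3]

improvement = idi(y_true, probs_full, probs_reduced)
print(round(improvement, 3))
```

An IDI of 0.258, for example, means that the gap between the average predicted risks of metastatic and non-metastatic patients widened by 0.258 on the probability scale, which is an absolute difference rather than a percentage change in AUC.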

We acknowledge certain shortcomings that require further optimization. Firstly, this investigation was retrospective in nature. Although the study implemented rigorous inclusion and exclusion criteria, bias could not be completely eliminated from the findings. Secondly, the internal test AUC of 0.935 showed strong discriminative ability on familiar data, while the external test AUC of 0.879 indicated slight overfitting to training data noise, although it remained above 0.85, supporting basic generalization. Future work should add diverse data to narrow this gap. Thirdly, molecular markers (E-cadherin, VEGF, and Claudin-18.2) and imaging features associated with DM in GC were not incorporated into the current research; we will integrate them in future studies. Finally, the online prediction tool for DM risk in GC developed in this study still requires further systematic evaluation by clinicians in terms of clinical practicality, decision-making auxiliary value, and operational experience. Notwithstanding these constraints, our final predictive model showed strong performance in both internal and external testing.

In conclusion, we developed an interpretable ML model leveraging routine EMRS data to predict the risk of DM in GC. The Logistic Regression model exhibited excellent predictive ability. The online prediction tool, developed using this model, classifies patients into distinct risk categories to provide doctors with preoperative decision support.

Supporting information

S1 Table. Hyperparameters for 5 machine learning models.

https://doi.org/10.1371/journal.pone.0335258.s001

(DOCX)

References

1. Chen Y, Jia K, Xie Y, Yuan J, Liu D, Jiang L, et al. The current landscape of gastric cancer and gastroesophageal junction cancer diagnosis and treatment in China: a comprehensive nationwide cohort analysis. J Hematol Oncol. 2025;18(1):42. pmid:40234884
2. Sirody J, Kaji AH, Hari DM, Chen KT. Patterns of gastric cancer metastasis in the United States. Am J Surg. 2022;224(1 Pt B):445–8. pmid:35144812
3. Lordick F, Carneiro F, Cascinu S, Fleitas T, Haustermans K, Piessen G, et al. Gastric cancer: ESMO Clinical Practice Guideline for diagnosis, treatment and follow-up. Ann Oncol. 2022;33(10):1005–20. pmid:35914639
4. Rau B, Lang H, Koenigsrainer A, Gockel I, Rau H-G, Seeliger H, et al. Effect of Hyperthermic Intraperitoneal Chemotherapy on Cytoreductive Surgery in Gastric Cancer With Synchronous Peritoneal Metastases: The Phase III GASTRIPEC-I Trial. J Clin Oncol. 2024;42(2):146–56. pmid:37906724
5. Jayaprakasam VS, Paroder V, Schöder H. Variants and Pitfalls in PET/CT Imaging of Gastrointestinal Cancers. Semin Nucl Med. 2021;51(5):485–501. pmid:33965198
6. Ho SYA, Tay KV. Systematic review of diagnostic tools for peritoneal metastasis in gastric cancer-staging laparoscopy and its alternatives. World J Gastrointest Surg. 2023;15(10):2280–93. pmid:37969710
7. Lengyel CG, Hussain S, Trapani D, El Bairi K, Altuna SC, Seeber A, et al. The Emerging Role of Liquid Biopsy in Gastric Cancer. J Clin Med. 2021;10(10):2108. pmid:34068319
8. Han HS, Lee K-W. Liquid Biopsy: An Emerging Diagnostic, Prognostic, and Predictive Tool in Gastric Cancer. J Gastric Cancer. 2024;24(1):4–28. pmid:38225764
9. Zhang Z, Wu H, Chong W, Shang L, Jing C, Li L. Liquid biopsy in gastric cancer: predictive and prognostic biomarkers. Cell Death Dis. 2022;13(10):903. pmid:36302755
10. Mroz T, Griffin M, Cartabuke R, Laffin L, Russo-Alvarez G, Thomas G, et al. Predicting hypertension control using machine learning. PLoS One. 2024;19(3):e0299932. pmid:38507433
11. Yesilyaprak A, Kumar AK, Agrawal A, Furqan MM, Verma BR, Syed AB, et al. Predicting Long-Term Clinical Outcomes of Patients With Recurrent Pericarditis. J Am Coll Cardiol. 2024;84(13):1193–204. pmid:39217549
12. Ma Y, Luo M, Guan G, Liu X, Cui X, Luo F. An explainable predictive machine learning model of gangrenous cholecystitis based on clinical data: a retrospective single center study. World J Emerg Surg. 2025;20(1):1. pmid:39757162
13. Vieira FG, Venugopalan S, Premasiri AS, McNally M, Jansen A, McCloskey K, et al. A machine-learning based objective measure for ALS disease severity. NPJ Digit Med. 2022;5(1):45. pmid:35396385
14. Swanson K, Wu E, Zhang A, Alizadeh AA, Zou J. From patterns to patients: Advances in clinical machine learning for cancer diagnosis, prognosis, and treatment. Cell. 2023;186(8):1772–91. pmid:36905928
15. Keyl J, Kasper S, Wiesweg M, Götze J, Schönrock M, Sinn M, et al. Multimodal survival prediction in advanced pancreatic cancer using machine learning. ESMO Open. 2022;7(5):100555. pmid:35988455
16. Qin X, Qiu B, Ge L, Wu S, Ma Y, Li W. Applying machine learning techniques to predict the risk of distant metastasis from gastric cancer: a real world retrospective study. Front Oncol. 2024;14:1455914. pmid:39703842
17. Yang K, Wu J, Xu T, Zhou Y, Liu W, Yang L. Machine learning to predict distant metastasis and prognostic analysis of moderately differentiated gastric adenocarcinoma patients: a novel focus on lymph node indicators. Front Immunol. 2024;15:1398685. pmid:39364413
18. Tian H, Liu Z, Liu J, Zong Z, Chen Y, Zhang Z, et al. Application of machine learning algorithm in predicting distant metastasis of T1 gastric cancer. Sci Rep. 2023;13(1):5741. pmid:37029221
19. Lin Z, Wang R, Zhou Y, Wang Q, Yang C-Y, Hao B-C, et al. Prediction of distant metastasis and survival prediction of gastric cancer patients with metastasis to the liver, lung, bone, and brain: research based on the SEER database. Ann Transl Med. 2022;10(1):16. pmid:35242861
20. Yamamoto T, Kawada K, Obama K. Inflammation-Related Biomarkers for the Prediction of Prognosis in Colorectal Cancer Patients. Int J Mol Sci. 2021;22(15):8002. pmid:34360768
21. Staniewska E, Grudzien K, Stankiewicz M, Raczek-Zwierzycka K, Rembak-Szynkiewicz J, Nowicka Z, et al. The Prognostic Value of the Systemic Immune-Inflammation Index (SII) and Red Cell Distribution Width (RDW) in Patients with Cervical Cancer Treated Using Radiotherapy. Cancers. 2024;16(8):1542.
22. Menyhart O, Fekete JT, Győrffy B. Inflammation and Colorectal Cancer: A Meta-Analysis of the Prognostic Significance of the Systemic Immune-Inflammation Index (SII) and the Systemic Inflammation Response Index (SIRI). Int J Mol Sci. 2024;25(15):8441. pmid:39126008
23. Inoue H, Kosuga T, Kubota T, Konishi H, Shiozaki A, Okamoto K, et al. Significance of a preoperative systemic immune-inflammation index as a predictor of postoperative survival outcomes in gastric cancer. World J Surg Oncol. 2021;19(1):173. pmid:34118953
24. Ding P, Guo H, Sun C, Yang P, Kim NH, Tian Y, et al. Combined systemic immune-inflammatory index (SII) and prognostic nutritional index (PNI) predicts chemotherapy response and prognosis in locally advanced gastric cancer patients receiving neoadjuvant chemotherapy with PD-1 antibody sintilimab and XELOX: a prospective study. BMC Gastroenterol. 2022;22(1):121. pmid:35287591
25. Tobing E, Tansol C, Tania C, Sihombing AT. Prognostic Nutritional Index (PNI) as Independent Predictor of Poor Survival in Prostate Cancer: A Systematic Review and Meta-Analysis. Clin Genitourin Cancer. 2024;22(5):102142. pmid:39079465
26. Zhang J, Yin X, Wang H, Fang T, Gao J, Zhu Z, et al. Development and Validation of Tumor Marker Indices in Advanced Gastric Cancer Patients. Cancer Control. 2023;30:10732748231202466. pmid:37728233
27. Kamada T, Ohdaira H, Takahashi J, Aida T, Nakashima K, Ito E, et al. Novel tumor marker index using carcinoembryonic antigen and carbohydrate antigen 19-9 is a significant prognostic factor for resectable colorectal cancer. Sci Rep. 2024;14(1):4192. pmid:38378762
28. Miyata T, Hayashi H, Yamashita Y-I, Matsumura K, Nakao Y, Itoyama R, et al. Prognostic Value of the Preoperative Tumor Marker Index in Resected Pancreatic Ductal Adenocarcinoma: A Retrospective Single-Institution Study. Ann Surg Oncol. 2021;28(3):1572–80. pmid:32804325
29. Shi J, Chen Y, Wang Y. Deep learning and machine learning approaches to classify stomach distant metastatic tumors using DNA methylation profiles. Comput Biol Med. 2024;175:108496. pmid:38657466
30. Zhang C, Yang J, Chen Y, Jiang F, Liao H, Liu X, et al. miRNAs derived from plasma small extracellular vesicles predict organo-tropic metastasis of gastric cancer. Gastric Cancer. 2022;25(2):360–74. pmid:35031872
31. Mirniaharikandehei S, Heidari M, Danala G, Lakshmivarahan S, Zheng B. Applying a random projection algorithm to optimize machine learning model for predicting peritoneal metastasis in gastric cancer patients using CT images. Comput Methods Programs Biomed. 2021;200:105937. pmid:33486339
32. Wu A, Luo L, Zeng Q, Wu C, Shu X, Huang P, et al. Comparative assessment of the capability of machine learning-based radiomic models for predicting omental metastasis in locally advanced gastric cancer. Sci Rep. 2024;14(1):16208. pmid:39003337
33. Xue Y, Zhang H, Zheng Z, Liu X, Yin J, Zhang J. Predictive performance of radiomics for peritoneal metastasis in patients with gastric cancer: a meta-analysis and radiomics quality assessment. J Cancer Res Clin Oncol. 2023;149(13):12103–13. pmid:37422882
34. Kolla L, Parikh RB. Uses and limitations of artificial intelligence for oncology. Cancer. 2024;130(12):2101–7. pmid:38554271
35. Krepper D, Cesari M, Hubel NJ, Zelger P, Sztankay MJ. Machine learning models including patient-reported outcome data in oncology: a systematic literature review and analysis of their reporting quality. J Patient Rep Outcomes. 2024;8(1):126. pmid:39499409
36. de Hond AAH, Leeuwenberg AM, Hooft L, Kant IMJ, Nijman SWJ, van Os HJA, et al. Guidelines and quality criteria for artificial intelligence-based prediction models in healthcare: a scoping review. NPJ Digit Med. 2022;5(1):2. pmid:35013569
37. Wei X, Shi B. Reducing bias in coronary heart disease prediction using Smote-ENN and PCA. PLoS One. 2025;20(8):e0327569. pmid:40773445
