Abstract
Background
Patients with advanced lung cancer admitted to the intensive care unit (ICU) face a substantially elevated risk of in-hospital mortality. Early identification of high-risk individuals is essential to support timely clinical decision-making. This study aimed to develop and validate a predictive model using machine learning (ML) techniques to estimate in-hospital mortality in this patient population.
Methods
Clinical data were obtained from the Medical Information Mart for Intensive Care-IV (MIMIC-IV) database. Feature selection was performed using least absolute shrinkage and selection operator (LASSO) regression, enabling the construction of eight ML models: logistic regression (LR), support vector machine (SVM), gradient boosting machine (GBM), artificial neural network (ANN), extreme gradient boosting (XGBoost), k-nearest neighbors (k-NN), adaptive boosting (AdaBoost), and random forest (RF). Model performance was assessed using the area under the receiver operating characteristic curve (AUC), as well as accuracy, sensitivity, specificity, and F1 score. Discrimination, calibration, and clinical utility were also evaluated. The final model incorporated 27 clinically interpretable variables, including not only established severity scores (e.g., SAPS II) but also dynamic treatment factors (e.g., vasopressin, mechanical ventilation duration) that reflect real-world ICU practice. SHAP analysis was employed to enhance interpretability, allowing clinicians to understand both the magnitude and directionality of key predictors—an improvement over black-box ML applications in prior studies.
Results
Among the 1,755 patients included, 368 (21%) died during hospitalization in the training cohort. Notably, older individuals, particularly those of Caucasian descent, demonstrated a higher susceptibility to mortality during their hospital stay. LASSO regression revealed that 27 variables were significantly associated with in-hospital mortality, such as gender and hospital stay duration. The XGBoost model achieved the highest predictive performance, with an accuracy of 0.783, an F1 score of 0.595, and an AUC of 0.865 (95% CI: 0.840–0.891) in the training cohort. The performance metrics for the test cohort reflected similar trends, with an accuracy of 0.719, an F1 score of 0.543, and an AUC of 0.790 (95% CI: 0.741–0.840). Key predictors identified consistently across models (LR, SVM, ANN, and XGBoost) included hospital stay duration, Simplified Acute Physiology Score II (SAPS II), use of norepinephrine and vasopressin, prothrombin time (PT), mechanical ventilation duration, white blood cell count (WBC), and blood urea nitrogen (BUN). The SHAP summary plot further illustrated the direction and magnitude of influence for the top 15 predictors.
Conclusion
The XGBoost-based model showed the best performance in predicting in-hospital mortality among critically ill lung cancer patients. Hospital stay duration and SAPS II score emerged as the most influential predictors, which can serve as the basis for a simplified clinical risk score. These findings may support early risk stratification and guide clinical decision-making in the ICU. However, the analysis relied exclusively on internal divisions of MIMIC-IV, which restricts the model's generalizability and, consequently, its applicability in broader clinical contexts.
Citation: Wang J, Lin L, Qiu L-p, Zheng L-l, Wu L-x, Lv H, et al. (2026) Assessing in-hospital mortality risk in ICU lung cancer patients using machine learning: An analysis based on the MIMIC-IV database. PLoS One 21(1): e0341259. https://doi.org/10.1371/journal.pone.0341259
Editor: Miquel Vall-llosera Camps, PLOS ONE, UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND
Received: July 6, 2025; Accepted: January 5, 2026; Published: January 22, 2026
Copyright: © 2026 Wang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All data are available from the Medical Information Mart for Intensive Care-IV (MIMIC-IV, v2.2) (https://physionet.org/content/mimiciv/2.1/).
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Lung cancer is a leading cause of cancer-related mortality worldwide, with a significant number of patients facing poor prognoses due to diagnoses made at advanced stages and inadequate treatment options [1–3]. The prognosis for lung cancer patients in the intensive care unit (ICU) is particularly bleak, with studies indicating in-hospital mortality rates that can be as high as 69% [4]. Moreover, identifying prognostic factors, such as performance status and biochemical markers, is essential for developing effective care strategies and prioritizing ICU admissions. Recent research has highlighted the increased mortality rates among lung cancer patients admitted to the ICU, thereby emphasizing the urgent need for reliable early prediction and risk stratification methods to identify high-risk individuals and improve clinical outcomes [5–7]. By filling this knowledge gap, we can better allocate ICU resources, optimize treatment plans, and offer more accurate prognoses to patients and their families. Therefore, there is an immediate need for research that focuses specifically on predicting in-hospital mortality in this vulnerable patient population.
Machine learning (ML) offers a promising alternative due to its ability to analyze complex datasets, revealing patterns and relationships that may remain hidden in traditional statistical analyses [8,9]. This highlights the necessity for dependable predictive models that can accurately identify patients at increased risk of mortality during hospitalization. As an innovative form of artificial intelligence, ML has the potential to transform measurement results into relevant predictive models, particularly in oncology, aided by rapid advancements in large datasets and deep learning techniques. Recent findings demonstrate that ML has been effective in predicting lung cancer susceptibility, recurrence, survival rates, and the prognosis of malignant tumors [10–13]. However, there is a notable lack of data regarding in-hospital mortality risk prediction models utilizing ML approaches specifically for lung cancer patients in an ICU setting.
Previous studies may have only used a single or a few machine-learning models to predict the in-hospital mortality of critically ill lung cancer patients. In this study, eight different machine-learning models were constructed, including Logistic Regression (LR), Support Vector Machine (SVM), Gradient Boosting Machine (GBM), Artificial Neural Network (ANN), eXtreme Gradient Boosting (XGBoost), k-Nearest Neighbors (k-NN), Adaptive Boosting (AdaBoost), and Random Forest (RF). By comparing multiple models, it is possible to comprehensively evaluate the performance of different models in predicting the in-hospital mortality of this specific patient group, and thus select the optimal model.
Despite the high accuracy levels achieved by these ML models, the specific contributions of individual variables to their performance often remain unclear. This ambiguity limits the practical application of ML methodologies in clinical environments [14]. Shapley Additive ExPlanations (SHAP) provides a solution by integrating optimal credit allocation with localized interpretations, visually elucidating the importance of each variable within the model [15], thereby enhancing interpretability.
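The additive-attribution idea behind SHAP can be illustrated with an exact Shapley computation on a toy model. The sketch below is purely didactic: the linear "risk model", its coefficients, and the feature names (`sapsii`, `ventilator_hours`) are hypothetical, not taken from the study's fitted model. Absent features fall back to a reference (baseline) patient, mimicking SHAP's background sample.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, baseline, instance):
    """Exact Shapley attributions for a model over a small feature set.

    predict(x) scores a dict of feature values; features outside the
    current coalition are replaced by baseline (reference) values.
    """
    features = list(instance)
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            for subset in combinations(others, k):
                # Classic Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                with_f = {g: (instance[g] if g in subset or g == f else baseline[g])
                          for g in features}
                without_f = {g: (instance[g] if g in subset else baseline[g])
                             for g in features}
                total += weight * (predict(with_f) - predict(without_f))
        phi[f] = total
    return phi

# Hypothetical linear "mortality risk" model, so attributions are intuitive:
# for a linear model, each Shapley value equals coef * (x - baseline).
model = lambda x: 0.02 * x["sapsii"] + 0.05 * x["ventilator_hours"]
base = {"sapsii": 30, "ventilator_hours": 0}      # reference patient
patient = {"sapsii": 50, "ventilator_hours": 10}

phi = shapley_values(model, base, patient)
# sapsii: 0.02 * (50 - 30) = 0.4 ; ventilator_hours: 0.05 * 10 = 0.5
```

By construction the attributions sum to the difference between the patient's prediction and the baseline prediction, which is the "efficiency" property that makes SHAP force plots additive.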
This study aims to utilize the extensive data available in the MIMIC-IV database to develop a robust ML model for predicting in-hospital mortality in lung cancer patients. The model will be validated using various algorithms, including logistic regression (LR), support vector machine (SVM), gradient boosting machine (GBM), artificial neural network (ANN), extreme gradient boosting (XGBoost), k-nearest neighbors (KNN), adaptive boosting (AdaBoost), and random forest (RF), thereby allowing a comprehensive assessment of its predictive capability. The identification of significant risk factors associated with in-hospital mortality will be prioritized, with the visual interpretation of the model using SHAP techniques to assist clinicians in identifying and intervening with high-risk populations.
Methods
Data source
This retrospective cohort study utilized data from the Medical Information Mart for Intensive Care-IV (MIMIC-IV, v2.2) (https://physionet.org/content/mimiciv/2.1/) [16,17], a clinical database that includes 730,141 ICU admissions to Beth Israel Deaconess Medical Center, located in the United States, between 2008 and 2019. Since the study utilized non-identifiable, publicly available data, it received an exemption from informed consent requirements and Institutional Review Board (IRB) approval in accordance with ethical guidelines. All research procedures adhered to the ethical principles outlined in the 1964 Declaration of Helsinki and its subsequent amendments. Before data extraction, the research team completed the required web-based training modules and successfully passed the Protecting Human Research Participants certification exam, thereby obtaining authorized access to the MIMIC-IV database. A verbal notification was submitted to the institutional ethics committee, waiving the need for formal approval due to the retrospective, anonymized nature of the study.
Cohort selection
Individuals meeting any of the following criteria were excluded from the study: (1) those under 18 years of age at the time of their initial ICU admission; (2) patients with multiple ICU admissions; (3) cases with over 80% of personal data missing. We employed a random selection from the MIMIC-IV database to create both the training and test cohorts. Ultimately, this study included 1,288 patients in the training cohort and 527 patients in the test cohort, as depicted in the detailed flowchart presented in Fig 1.
Data collection and outcomes
Data extraction from the database regarding patients admitted to the ICU within the first 24 hours was conducted using structured query language (SQL) in PostgreSQL. The variables considered in this investigation included the following categories: (1) Demographic information: gender, race, admission age, and weight; (2) Laboratory indicators: blood urea nitrogen (BUN), prothrombin time (PT), red blood cell count (RBC), red cell distribution width (RDW), partial thromboplastin time (PTT), chloride, anion gap, hemoglobin, white blood cell count (WBC), platelet count, creatinine, potassium, sodium, and calcium; (3) Comorbidities: congestive heart failure, chronic pulmonary disease, peptic ulcer disease, diabetes, renal disease, Charlson comorbidity index, and mild liver disease; (4) Vital signs: heart rate (HR), respiratory rate (RR), systolic blood pressure (SBP), diastolic blood pressure (DBP), temperature, and percutaneous arterial oxygen saturation (SpO2); (5) Scoring systems: Sequential Organ Failure Assessment (SOFA) score, Glasgow Coma Scale (GCS), and Acute Physiology Score III (APS III); (6) Interventions: continuous renal replacement therapy (CRRT), mechanical ventilation, vasopressin, losartan, epinephrine, dopamine, and norepinephrine. For multiple measurements, the maximum and minimum values recorded on the first day were utilized, with the exception of SpO2. To reduce the impact of missing data on model development, the KNN Imputer method was employed to impute values for variables with less than 30% missingness, whereas variables with over 30% missingness were excluded from the analysis. The primary outcome of interest was in-hospital mortality.
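The missingness threshold and KNN imputation described above can be sketched with scikit-learn's `KNNImputer`. The data frame below uses hypothetical lab values, not MIMIC-IV records, and the 30% cut-off and neighbor count are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Illustrative first-24h lab values with missing entries (hypothetical data).
df = pd.DataFrame({
    "bun":        [18.0, np.nan, 42.0, 25.0, 30.0],
    "wbc":        [9.1, 12.4, np.nan, 7.8, 15.2],
    "hemoglobin": [11.2, 9.8, 10.5, np.nan, 8.9],
})

# Exclude variables with more than 30% missing values, as described in the text.
keep = df.columns[df.isna().mean() <= 0.30]
df = df[keep]

# Impute remaining gaps from the 2 nearest patients in feature space
# (KNNImputer handles the NaNs via nan-aware Euclidean distances).
imputer = KNNImputer(n_neighbors=2)
imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
```

Each imputed cell is the mean of the corresponding values in the two most similar complete-enough rows, which preserves local structure better than a global mean fill.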
Statistical analysis
The Kolmogorov–Smirnov test was employed to analyze continuous variables. Given that all continuous variables demonstrated non-normal distributions, data were summarized using the median and interquartile range. To assess differences between groups, the Mann–Whitney U test was applied. Categorical variables were reported as percentages (%), and Pearson chi-squared tests were conducted to evaluate group differences. To address the class imbalance observed in the dependent variable, under-sampling techniques were implemented to rebalance the dataset. The sample data were partitioned into a training set and an internal validation set using fivefold cross-validation sampling. In scenarios involving numerous features, the LASSO technique was utilized for feature selection on the training set; LASSO incorporates L1 regularization to select relevant features and reduce dimensionality by shrinking coefficients, thereby retaining features with significant contributions while discarding redundant ones.
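The under-sampling and LASSO selection steps can be sketched as follows. This is a minimal illustration on synthetic data (not MIMIC-IV); the sample sizes, feature counts, and cross-validation settings are assumptions chosen only to keep the example small.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the training cohort: 20 candidate predictors,
# only some of which carry signal (hypothetical data, not MIMIC-IV).
X, y = make_classification(n_samples=400, n_features=20, n_informative=6,
                           n_redundant=4, random_state=0)

# Simple random under-sampling of the majority class to balance outcomes.
idx_pos = np.flatnonzero(y == 1)
idx_neg_pool = np.flatnonzero(y == 0)
k = min(len(idx_pos), len(idx_neg_pool))
idx = np.concatenate([idx_pos[:k], rng.choice(idx_neg_pool, size=k, replace=False)])
Xb, yb = X[idx], y[idx]

# LASSO with the penalty chosen by cross-validation; features with
# non-zero coefficients are retained, the rest are discarded.
Xs = StandardScaler().fit_transform(Xb)
lasso = LassoCV(cv=10, random_state=0).fit(Xs, yb)
selected = np.flatnonzero(lasso.coef_ != 0)
```

Standardizing before LASSO matters: the L1 penalty is scale-sensitive, so unscaled features with large numeric ranges would otherwise be penalized inconsistently.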
Development of predictive models for in-hospital mortality
This research implemented eight ML algorithms: LR, SVM, GBM, ANN, XGBoost, KNN, AdaBoost, and RF to develop a predictive model for in-hospital mortality. One-hot encoding was used to handle categorical variables for LR, SVM, ANN, and KNN, while label encoding was used for GBM, XGBoost, AdaBoost, and RF. The variables selected through LASSO regression were incorporated into the model. To ensure the model's robustness, ten-fold cross-validation was utilized. Parameter tuning was conducted using grid search to identify the optimal tuning parameters for each algorithm. During the parameter adjustment phase, the model exhibiting the highest area under the curve (AUC) from the receiver operating characteristic (ROC) analysis was identified as the most effective model. The models were constructed using the training dataset, followed by validation of the best model using both internal and external validation datasets.
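The grid-search tuning loop can be sketched with scikit-learn's `GridSearchCV`. To keep the example free of extra dependencies, scikit-learn's `GradientBoostingClassifier` stands in for XGBoost here; the data, parameter grid, and split sizes are all illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (not MIMIC-IV).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

# Ten-fold cross-validated grid search, selecting the parameter
# combination that maximises ROC AUC, mirroring the tuning procedure.
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [2, 3]},
    scoring="roc_auc",
    cv=10,
)
grid.fit(X_tr, y_tr)
best_model = grid.best_estimator_
```

Selecting on cross-validated AUC rather than training AUC is what guards against simply rewarding the most overfit parameter combination.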
LASSO regression can effectively identify these key features with linear relationships and retain them. Subsequently, XGBoost can further explore possible non-linear interaction effects between features on the basis of these selected features. In this way, the hybrid approach combines the advantages of both models, leveraging LASSO regression's effective capture of linear relationships and XGBoost's ability to mine non-linear information between features, thereby improving the overall performance of the model.
Performance assessment
The evaluation of the predictive model’s performance was conducted utilizing the AUC of the ROC curve, as well as metrics such as sensitivity, specificity, F1 score, accuracy, and threshold values. Furthermore, decision curve analysis (DCA) and calibration curves were generated to illustrate the true clinical applicability of the model.
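The evaluation metrics and the decision-curve net benefit can be computed as below. The labels and predicted probabilities are hypothetical; the net-benefit formula is the standard one used in decision curve analysis, NB = TP/n − (FP/n) · pt/(1 − pt), where pt is the threshold probability.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             roc_auc_score)

# Hypothetical predicted probabilities and true outcomes.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1, 0, 0])
y_prob = np.array([0.1, 0.3, 0.8, 0.6, 0.2, 0.9, 0.4, 0.7, 0.2, 0.5])
y_pred = (y_prob >= 0.5).astype(int)  # illustrative 0.5 threshold

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
metrics = {
    "auc": roc_auc_score(y_true, y_prob),
    "accuracy": accuracy_score(y_true, y_pred),
    "sensitivity": tp / (tp + fn),
    "specificity": tn / (tn + fp),
    "f1": f1_score(y_true, y_pred),
}

def net_benefit(y_true, y_prob, pt):
    """Decision-curve net benefit at threshold probability pt."""
    pred = y_prob >= pt
    n = len(y_true)
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    return tp / n - fp / n * pt / (1 - pt)
```

Sweeping `net_benefit` over a range of thresholds and comparing against the "treat all" and "treat none" strategies produces the DCA curves referenced in the text.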
Results
Baseline characteristics
Following a thorough screening process, a cohort of 1,755 patients from the MIMIC-IV database was identified for inclusion in this study, of which 368 individuals (21%) died after admission to the ICU. As presented in Table 1, the baseline characteristics of the enrolled patients in the MIMIC-IV database were summarized. Notably, older individuals, particularly those of Caucasian descent, demonstrated a higher susceptibility to mortality during their hospital stay. These patients also experienced extended hospital stays following ICU admission, coupled with significantly elevated in-hospital mortality rates.
Furthermore, patients categorized as Non-Survival exhibited elevated rates of Charlson comorbidity and mild liver disease. On the first day following admission to the ICU, Non-Survival patients were more likely to necessitate interventions, including the administration of vasopressors, CRRT, mechanical ventilation, and medications such as epinephrine, norepinephrine, and dopamine. Laboratory evaluations and vital sign measurements indicated that patients who did not survive demonstrated significantly lower levels of RBC, hemoglobin, SpO2, pH, base excess, albumin, anion gap, bicarbonate, and calcium. In contrast, these patients exhibited elevated concentrations of BUN, potassium, creatinine, total bilirubin, alanine aminotransferase (ALT), alkaline phosphatase (ALP), aspartate aminotransferase (AST), lactate dehydrogenase (LDH), PT, international normalized ratio (INR), WBC, RDW and the anion gap when compared to lung cancer patients who survived.
Feature selection
LASSO regression was employed to identify relevant features within the training dataset, with the characteristics of the variable coefficients illustrated in Fig 2. An iterative analysis employed a tenfold cross-validation approach. This analysis revealed that 27 variables demonstrated a significant association with in-hospital mortality, including gender, hospital length of stay, age at admission, race, BUN, potassium, sodium, creatinine, PT, RBC, WBC, platelet count, RDW, hemoglobin, PTT, chloride, anion gap, calcium, heart rate, SBP, DBP, respiratory rate, temperature, SpO2, congestive heart failure, peptic ulcer disease, Charlson comorbidity index, mild liver disease, SOFA score, SAPS II score, GCS, ventilator hours, vasopressin usage, epinephrine usage, norepinephrine usage, and dopamine usage.
Model performance comparisons
In our study, we developed eight ML models to predict in-hospital mortality using a comprehensive set of features. The performance of these models is summarized in Table 2. Notably, the RF, KNN, and GBM models exhibited suboptimal predictive capabilities, as indicated by overfitting in the ROC curves. In contrast, the XGBoost model emerged as the most effective, achieving an accuracy of 0.783, an F1 score of 0.595, and an AUC of 0.865 within the training cohort. The performance metrics for the test cohort reflected similar trends, with an accuracy of 0.719, an F1 score of 0.543, and an AUC of 0.790 (refer to Table 2). Furthermore, we conducted a ROC analysis to validate the predictive capabilities of these eight models concerning in-hospital mortality. As illustrated in Fig 3A and 3B, the RF, KNN, and GBM models consistently demonstrated the least predictive power, whereas the XGBoost model exhibited superior performance across both the training and test cohorts.
The calibration curves for all eight models are depicted in Fig 4A, C, providing essential insights into their predictive accuracy. Among the evaluated models, five—excluding the RF, KNN, and GBM—exhibited commendable agreement between predicted probabilities and actual observed results. Regarding clinical utility, all models, apart from the RF, KNN, and GBM, demonstrated significant net benefit across a broad spectrum of threshold probabilities. Notably, the XGBoost model provided the highest net benefit, establishing it as the most effective model for forecasting in-hospital mortality in lung cancer patients (Fig 4B, D).
Feature importance analysis
To clarify the significant features influencing model outputs, an analysis of feature importance was conducted. The top 36 features identified from the LR, SVM, Neural Network, and XGBoost models are presented in Fig 4. Within the XGBoost framework, hospital stay duration (los_hospital) and the SAPS II emerged as the most critical factors, followed by norepinephrine administration, vasopressin, PT, ventilator hours, WBC, and BUN levels (Fig 5). In the LR model, the SAPS II was recognized as the most significant predictor of in-hospital mortality; additionally, ventilator hours, length of stay, norepinephrine administration, vasopressin usage, PT, and WBC count also substantially contributed to mortality predictions (S1 Fig). Furthermore, in the SVM and Neural Network models, the SAPS II and ventilator hours were similarly identified as having considerable effects on the prediction of in-hospital mortality, as illustrated in S2 and S3 Fig.
Interpretability analysis
To clarify the overall effects—both positive and negative—on model results and to investigate the similarities and differences in the significant characteristics of critically ill lung cancer patients across varying severity levels, the SHAP summary chart was employed. Fig 6A illustrates a detailed SHAP bee swarm plot that displays the impact of different features on the XGBoost model's output. The horizontal axis represents the SHAP values, whereas the vertical axis ranks the features according to their cumulative effect on the SHAP values. Each data point corresponds to a unique instance, with its position along the x-axis reflecting the SHAP value associated with that specific instance and feature.
The plot clearly ranks the features by their overall impact on the model's predictions. Features at the top of the plot (e.g., los_hospital, vasopressin_used, sapsii_idx1, norepinephrine_used) are the most important, with hospital stay duration (los_hospital) appearing to be the single most influential feature. Most of its points are clustered around zero, but the color variance suggests a complex relationship with other variables. sapsii_idx1 encodes the SAPS II severity score: patients with high severity scores (yellow dots) show a wider spread of SHAP values, suggesting a positive association. vasopressin_used and norepinephrine_used indicate the use of vasopressors, typically administered to treat low blood pressure; higher values of these features (yellow dots) tend to have positive SHAP values, suggesting that their use is associated with an increased predicted mortality risk.
As depicted in Fig 6B and S4 Fig, the SAPS II score was identified as the most significant feature among the top 15 in the XGBoost model; a higher SAPS II score correlates with an increased likelihood of in-hospital mortality. In the illustrated case, sapsii_idx1 = 32, a severity score, contributes −0.0091 to the prediction. This finding highlights the necessity of prioritizing the SAPS II score when predicting in-hospital mortality rates. All features shown in the force plot push the prediction to the left, meaning that each of them contributed to decreasing the value from the average to the final prediction. A correlation heatmap and a table of correlation coefficients for the top features (SAPS II, hospital stay duration, PT, and norepinephrine use) are provided in S5 Fig; SAPS II shows almost no correlation with hospital stay duration (r = 0.0778) and therefore does not appear to pose a risk to the interpretability of the model.
Discussion
In contemporary healthcare, the accurate prediction of in-hospital mortality for intensive care unit (ICU) patients with advanced lung cancer is of paramount importance. This study, leveraging the rich data within the Medical Information Mart for Intensive Care-IV (MIMIC-IV) database, aimed to develop and validate machine learning (ML) models for this purpose. Our findings contribute significantly to the existing body of knowledge, offering new insights into the factors influencing in-hospital mortality and highlighting the potential of ML in enhancing clinical decision-making.
Lung cancer patients admitted to ICUs exhibit significantly higher mortality rates than those with other solid malignancies, with results varying considerably by disease stage. Existing literature indicates mortality rates exceeding 50% for patients with advanced or extensive-stage disease. Specifically, Park et al. documented a 58.3% mortality rate among critically ill patients with advanced lung cancer in their Korean cohort study (2008–2010) [18]. Similarly, Song et al. reported mortality rates of 82.4% and 65.9% for stage IIIB/IV NSCLC and extensive-stage SCLC patients prior to and following 2011, respectively [19]. Our findings are more consistent with Adam et al.’s report of a 20% mortality rate in stage I NSCLC patients, likely attributable to our cohort’s predominance of primary rather than metastatic lung cancer cases [20].
This study developed a predictive model for in-hospital mortality among lung cancer patients using ML approaches. Twenty-seven clinical variables collected within the first 24 hours of ICU admission were analyzed using eight different ML algorithms. The XGBoost model exhibited superior performance, demonstrating robust discrimination, calibration, and significant clinical utility. Validation in an independent test cohort further affirmed the model's reliability and accuracy. To enhance interpretability, we employed the SHAP framework for feature importance analysis. The results identified eight key predictors: hospital stay duration, SAPS II score, administration of norepinephrine or vasopressin, PT, duration of mechanical ventilation, WBC, and BUN levels. Additionally, SHAP force plots offered valuable insights into the model's decision-making process for assessing individual patient mortality risk.
The XGBoost algorithm, an advanced implementation of gradient-boosted decision trees, demonstrates significant efficacy in handling large-scale datasets characterized by complex feature spaces. Medical research has widely adopted XGBoost-based predictive frameworks, revealing superior performance across various clinical domains, including sepsis management, cardiovascular risk assessment, and renal impairment evaluation [21–23]. Unlike traditional LR techniques, XGBoost effectively identifies non-linear associations and integrates multiple weak classifiers into a robust predictive model, enhancing generalizability. The SHAP framework offers valuable insights into ML model behavior by quantifying feature contributions, thereby improving model interpretability. The XGBoost model achieved the highest predictive performance, with an accuracy of 0.783, an F1 score of 0.595, and an AUC of 0.865 (95% CI: 0.840–0.891) in the training cohort. The performance metrics for the test cohort reflected similar trends, with an accuracy of 0.719, an F1 score of 0.543, and an AUC of 0.790 (95% CI: 0.741–0.840).
This study systematically evaluated eight different ML approaches (LR, SVM, GBM, neural network, XGBoost, KNN, AdaBoost, and RF) for developing predictive models. Model performance was rigorously assessed using six established evaluation metrics: receiver operating characteristic curve analysis, F1 score, classification accuracy, sensitivity, specificity, and precision. Notably, our findings revealed superior predictive capability and stability with an ensemble approach that incorporated SVM, neural network, LR, and XGBoost, consistent with existing literature [24].
While comparing eight ML algorithms strengthens the methodological rigor of this study, the calibration curves (Fig 4A) reveal potential overfitting in some models, particularly Random Forest (RF) and k-Nearest Neighbors (k-NN). These models demonstrate a significant deviation from the ideal calibration line, suggesting that their predicted probabilities do not accurately reflect the observed event percentages in the validation dataset. Specifically, the observed event percentage is significantly higher in the validation data than the probabilities predicted by the KNN and RF models. This overconfidence in predictions is characteristic of overfitting, where the model has learned the training data too well and struggles to generalize to unseen data. While GBM also shows some calibration issues, the divergence is less pronounced than in RF and k-NN. This observation is reinforced when contrasting Fig 4A (training set) with Fig 4C (test set), where all models exhibit better calibration.
Through SHAP value analysis, we identified key predictive features, including hospital stay duration, SAPS II score, vasopressor administration (norepinephrine and vasopressin), PT, mechanical ventilation duration, WBC, and BUN levels. The SAPS II scoring system, a validated measure of organ dysfunction severity upon ICU admission, has demonstrated robust performance in mortality risk stratification [21,25–27]. Although extensively utilized for general ICU mortality prediction, the specific association between SAPS II scores and in-hospital mortality among lung cancer patients remains inadequately explored in current literature. Comparative studies have established the superior predictive validity of both SAPS II and SOFA scores for infection-related mortality in critical care populations, with score elevation correlating positively with increased mortality risk [28].
The association between the duration of ICU stay and patient mortality exhibits a biphasic pattern, indicating that both short stays (reflecting rapid clinical deterioration) and prolonged stays (suggesting persistent organ failure) are associated with increased mortality risks. However, existing research has not established precise cutoff values for these critical time points. Optimal clinical management should incorporate sequential physiological evaluations using standardized scoring systems, such as SAPS II, to facilitate comprehensive patient assessment [29,30]. Since total length of stay (LOS) is not known at admission, the use of first-day LOS as a proxy should be explicitly stated when the model is applied alongside SAPS II scoring.
Our predictive model exhibited consistent performance in the held-out test cohort, supporting its robustness. Moreover, the selected predictive variables comprise standard clinical measurements, ensuring both ease of assessment and seamless integration across institutions with varying resource capacities.
Several methodological limitations merit attention in this study. First, the retrospective observational design introduces inherent selection bias and potential information bias due to inconsistencies in data collection and incomplete records. Second, because the analysis relied solely on the MIMIC-IV database, the absence of external validation cohorts raises concerns regarding potential model overfitting, highlighting the necessity for future validation studies. Third, despite internal validation, the single-center nature of our data highlights the need for multicenter prospective studies to evaluate model generalizability across diverse populations. Most notably, the lack of histopathological and radiological characterization of lung cancer subtypes, particularly the distinction between small cell and non-small cell carcinomas, represents a significant limitation, as the absence of molecular diagnostic data may have affected model performance. Despite these limitations, our predictive model demonstrates clinical utility in assisting physicians with timely patient risk stratification.
In developing our predictive model, we are acutely aware that several important clinical variables, namely cancer stage, metastasis, histology, and performance status, were not incorporated into the analysis. The absence of these variables has significant implications for the model's clinical applicability, and we acknowledge it as a limitation of our study. By discussing the potential confounding effects and reasons for their missingness, we hope to provide a more comprehensive understanding of the model's limitations and to guide future research efforts to improve its clinical applicability.
While our XGBoost model demonstrates robust predictive accuracy in the MIMIC-IV cohort, its deployment in real-world ICU settings requires addressing several practical barriers. First, interoperability with existing electronic health record (EHR) systems is critical; this could be achieved through API-based integration or modular deployment within clinical decision support platforms. Second, clinician adoption hinges on interpretability, a challenge we mitigated via SHAP analysis (Fig 6), which quantifies and visualizes the impact of key variables (e.g., SAPS II, vasopressin use) on predictions. To foster trust, future iterations could incorporate interactive interfaces allowing clinicians to adjust inputs and observe risk estimates in real time. Regulatory approval would require prospective trials to evaluate model performance under dynamic ICU conditions and diverse patient demographics. A critical future direction involves deploying this model within a dedicated clinical platform to rigorously evaluate its runtime performance and integration feasibility.
Conclusion
In a clinical setting, especially at the bedside, healthcare providers need quick and easy-to-use tools for risk assessment. Although the XGBoost model demonstrated the best performance in predicting in-hospital mortality among critically ill lung cancer patients, it may be too complex for immediate use in a busy ICU environment. A simplified risk score can distill the key information from the model into a more accessible format, enabling rapid decision-making.
Based on this study, multiple models (logistic regression, support vector machine, artificial neural network, and XGBoost) consistently identified several key predictive factors that can serve as the basis for a simplified clinical risk score: hospital stay duration, SAPS II score, vasopressor administration (norepinephrine and vasopressin), coagulation parameters (PT), mechanical ventilation duration, and laboratory markers (WBC and BUN). Developing a simplified score requires assigning a weight to each predictive factor, which can be informed by the SHAP analysis. SHAP analysis improved clinical interpretability by clarifying each feature's contribution to mortality risk, providing intensivists with actionable insights for prognostic assessment in critical care settings.
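One common way to turn SHAP-derived weights into a bedside score is to rescale each predictor's mean absolute SHAP value to an integer number of points and sum the points present for a patient. The sketch below shows this conversion; the mean |SHAP| values are hypothetical placeholders, not the values estimated in this study, and a real score would also need validated cut-offs for each continuous predictor.

```python
# Sketch: deriving an integer bedside point score from SHAP-based importances.
# Mean |SHAP| values below are hypothetical, not study estimates.

MEAN_ABS_SHAP = {
    "hospital_stay_duration": 0.42,
    "saps_ii": 0.55,
    "norepinephrine": 0.30,
    "vasopressin": 0.21,
    "pt": 0.18,
    "ventilation_duration": 0.25,
    "wbc": 0.12,
    "bun": 0.15,
}

def to_points(importances, max_points=10):
    """Rescale importances so the strongest predictor gets max_points."""
    top = max(importances.values())
    return {f: round(v / top * max_points) for f, v in importances.items()}

def risk_score(flags, points):
    """Sum points for the risk factors present in a patient (binary flags)."""
    return sum(points[f] for f, present in flags.items() if present)

points = to_points(MEAN_ABS_SHAP)

# Example patient with an elevated SAPS II and norepinephrine use.
flags = {f: False for f in MEAN_ABS_SHAP}
flags.update({"saps_ii": True, "norepinephrine": True})
score = risk_score(flags, points)
```

Collapsing continuous predictors to binary flags sacrifices some discrimination for speed; the trade-off is deliberate, since the score is meant for rapid triage rather than as a replacement for the full model.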
In conclusion, a simplified clinical risk score based on the key predictive factors identified in this study is expected to improve the bedside applicability of the model and support clinical decision-making in the ICU. However, a notable limitation of our study is that the analysis relied solely on internal splits of MIMIC-IV, which restricts the model's applicability in broader clinical scenarios.
Supporting information
S1 Fig. The top 36 features derived from Logistic.
https://doi.org/10.1371/journal.pone.0341259.s001
(TIF)
S3 Fig. The top 36 features derived from Neural Network.
https://doi.org/10.1371/journal.pone.0341259.s003
(TIF)
S5 Fig. Correlation heatmap and a table of correlation coefficients for the top features (SAPS II, hospital stay duration, PT, and norepinephrine use).
https://doi.org/10.1371/journal.pone.0341259.s005
(TIF)