Abstract
Objectives
To develop and validate interpretable machine learning (ML) models for predicting whether triaged patients need to be admitted to the intensive care unit (ICU).
Measures
The study analyzed 189,167 emergency patients from the Medical Information Mart for Intensive Care IV database, with the outcome being ICU admission. Three models were compared: Model 1 was based on the Emergency Severity Index (ESI), Model 2 on vital signs, and Model 3 on vital signs, demographic characteristics, medical history, and chief complaints. Nine ML algorithms were employed. The area under the receiver operating characteristic curve (AUC), F1 Score, Positive Predictive Value, Negative Predictive Value, Brier score, calibration curves, and decision curve analysis were used to evaluate the performance of the models. SHapley Additive exPlanations was used to explain the ML models.
Results
The AUC of Model 3 was superior to that of Model 1 and Model 2. In Model 3, the top four algorithms with the highest AUC were Gradient Boosting (0.81), Logistic Regression (0.81), naive Bayes (0.80), and Random Forest (0.80). Upon further comparison of the four algorithms, Gradient Boosting was slightly superior to Random Forest and Logistic Regression, while naive Bayes performed the worst.
Citation: Liu Z, Shu W, Liu H, Zhang X, Chong W (2025) Development and validation of interpretable machine learning models for triage patients admitted to the intensive care unit. PLoS ONE 20(2): e0317819. https://doi.org/10.1371/journal.pone.0317819
Editor: Jerome Baudry, The University of Alabama in Huntsville, UNITED STATES OF AMERICA
Received: November 10, 2024; Accepted: January 6, 2025; Published: February 18, 2025
Copyright: © 2025 Liu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Publicly available datasets were analyzed in this study. These data can be found at https://mimic.mit.edu/. The datasets generated during and/or analysed during the current study are available in the figshare repository, accessible at: https://doi.org/10.6084/m9.figshare.26402761.v1.
Funding: This research was supported by the Education Department of Liaoning Province, China. The funding was awarded to Zheng Liu under the grant number LJ232410159024. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Over the past 20 years, the number of emergency patients has increased every year [1,2], and overcrowding of emergency rooms has become an important health problem worldwide [3–7]. Similarly, the proportion of critically ill patients has also increased [8]. These patients need immediate treatment in the intensive care unit (ICU), but overcrowding forces them to wait longer in the emergency department (ED) [9]. ED overcrowding and delayed ICU admission can lead to a series of adverse consequences, such as increased mortality [10–13]. Timely risk stratification and diversion of emergency patients, supported by an early warning system, is an effective measure to decrease ED congestion and improve the survival rate of critically ill patients [14–17].
Currently, commonly used international triage standards include the Australian Triage Scale, the Canadian Triage and Acuity Scale, the Manchester Triage System, the Emergency Severity Index (ESI) five-level triage system, the Korean Triage and Acuity Scale, and the National Early Warning Score [18,19]. Because of the limited information available during triage, these warning models are based primarily on subjective judgments by medical staff or on patients’ vital signs, and previous studies have demonstrated their shortcomings in predictive performance [16,20,21]. With the development of structured electronic medical records (sEMR) and machine learning (ML), additional variables can be collected during triage and included in a prediction model without increasing the workload of medical staff [17,22]. At present, there are few studies using ML methods to predict ICU admission from complex triage data, and the “black box” nature of ML limits its application in medical decision support [23,24]. The interpretability of ML models has long been a research focus and challenge, and in the medical field in particular there is a higher demand for interpretable model decisions to ensure transparency and reliability in the decision-making process.
We chose vital signs, demographic characteristics, medical history, and chief complaint as predictive variables to develop the models. We believe these variables represent all the information obtained during triage, and models constructed from them may outperform traditional models based on subjective judgments by medical staff or patients’ vital signs. To explain the models, we adopted SHapley Additive exPlanations (SHAP) [25].
This study compared the ESI five-level triage system, models based on vital signs, and ML models constructed using additional triage information, aiming to explore the maximum predictive potential of triage data for ICU admission. Additionally, it employed interpretable ML techniques to better assist clinical staff in making informed decisions during triage.
Materials and methods
Study design and data source
This was a retrospective cohort study, and the study design followed the TRIPOD guidelines [26]. All data were obtained from the Medical Information Mart for Intensive Care IV (MIMIC-IV, version 1.4) database, which contains clinical information on patients admitted to the ED, inpatient departments, and ICUs of the Beth Israel Deaconess Medical Center in Boston, Massachusetts, USA, from 2008 to 2019. The MIMIC-IV database is a public resource that provides researchers worldwide with free access to anonymized, de-identified clinical data. The database was developed by the Computational Physiology Laboratory at MIT and received approval from the Institutional Review Boards of both MIT and Beth Israel Deaconess Medical Center, which waived the requirement for informed consent from participants. After successfully completing the training course and examination of the cooperative organization (Record ID 45797033), ZL extracted the data needed for this study on July 31, 2024. Data extraction was performed using Navicat Premium software (version 15.0), while statistical analysis and ML were conducted using R (version 4.1.3, The R Foundation for Statistical Computing, Vienna, Austria) and Python (version 3.8.5, Python Software Foundation). The main code and libraries used for the statistical analysis and ML sections are detailed in S1 Text.
Study population and outcome
The study population included adult patients (aged ≥ 18 years) whose medical data were extracted from the MIMIC-IV ED database. The predicted outcome of this study was admission to the ICU, which was extracted from the MIMIC-IV ICU database. Patients who lacked outcome data, had multiple visits, or had missing predictor variables were excluded.
Data collection and feature engineering
We selected ESI, vital signs, demographic characteristics, medical history, and chief complaints as predictors for data collection. The vital signs included body temperature, heart rate, respiratory rate, systolic blood pressure (SBP), diastolic blood pressure (DBP), blood oxygen saturation, and pain index. The demographic characteristics included age and sex. Medical history included smoking, hypertension, diabetes, coronary heart disease, chronic heart failure, chronic obstructive pulmonary disease, chronic kidney disease, cirrhosis, and malignant tumors. For chief complaints, we selected those accounting for more than 1% of visits, or clinically typical severe symptoms, to construct Model 3. To extract chief complaints, we employed natural language processing techniques to address spelling errors and language inconsistencies.
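The exact NLP pipeline is not published here, but a minimal Python sketch of the kind of normalization described (lowercasing, punctuation stripping, and fuzzy matching of misspellings to a canonical complaint list; the list and threshold below are hypothetical) could look like this:

```python
# Hypothetical sketch of chief-complaint normalization; the canonical list
# and matching cutoff are illustrative, not the authors' actual pipeline.
import re
import difflib

CANONICAL = ["chest pain", "abdominal pain", "dyspnea", "fever", "syncope"]

def normalize_complaint(text: str) -> str:
    # Lowercase and strip punctuation/digits before matching
    cleaned = re.sub(r"[^a-z ]", " ", text.lower()).strip()
    # Map misspellings to the closest canonical complaint, if close enough
    match = difflib.get_close_matches(cleaned, CANONICAL, n=1, cutoff=0.8)
    return match[0] if match else "other"

print(normalize_complaint("Chest pian"))  # -> "chest pain"
```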
In this study, we characterized continuous variables by reporting the median and interquartile range, and employed the Mann-Whitney U test to assess intergroup differences. For categorical variables, frequencies and percentages were utilized for description, with group comparisons conducted using either the chi-square test or Fisher’s exact test.
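For illustration, a minimal sketch of these tests with SciPy, assuming a pandas DataFrame `df` with a binary `icu` outcome column (all column names are hypothetical):

```python
# A minimal sketch of the group comparisons described above, not the authors' code.
import pandas as pd
from scipy import stats

def compare_groups(df: pd.DataFrame, outcome: str = "icu") -> None:
    admitted = df[df[outcome] == 1]
    not_admitted = df[df[outcome] == 0]
    # Continuous variables: median [IQR] with Mann-Whitney U test
    for col in ["heart_rate", "sbp"]:  # hypothetical column names
        q1, med, q3 = df[col].quantile([0.25, 0.5, 0.75])
        _, p = stats.mannwhitneyu(admitted[col], not_admitted[col])
        print(f"{col}: {med:.1f} [{q1:.1f}-{q3:.1f}], p = {p:.3g}")
    # Categorical variables: counts with chi-square test (Fisher's exact
    # test would be used instead when expected cell counts are small)
    table = pd.crosstab(df["sex"], df[outcome])
    chi2, p, _, _ = stats.chi2_contingency(table)
    print(f"sex: chi2 = {chi2:.2f}, p = {p:.3g}")
```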
For handling missing and extreme values, we excluded data points with missing predictors or outcomes. Extreme values of continuous variables were defined based on clinical plausibility and prior literature: body temperature < 34.5 °C or > 43.0 °C, SBP < 60 mmHg or > 250 mmHg, DBP < 30 mmHg or > 150 mmHg, respiratory rate < 10 breaths/min or > 50 breaths/min, blood oxygen saturation < 60% or > 100%, and heart rate < 40 beats/min or > 200 beats/min (Fig 1). Beyond these thresholds, we further assessed outliers using the interquartile range (IQR) method. Specifically, for each continuous variable, values more than 1.5 times the IQR below the first quartile (Q1) or above the third quartile (Q3) were flagged as potential outliers. These flagged values were reviewed and retained if they were clinically plausible; otherwise, they were excluded. Categorical variables were uniformly processed across all algorithms by applying one-hot encoding, ensuring consistent treatment of categorical features regardless of the algorithm used. For example, variables such as sex (male/female) and smoking status (yes/no) were transformed into binary dummy variables, while multi-class variables such as chief complaints were expanded into multiple binary columns. To improve the accuracy of the models, we used a normalization method to scale all the variables, mapping the data to the [0,1] interval.
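A minimal sketch of this preprocessing in Python (not the pipeline from S1 Text; column names are hypothetical, and the manual clinical review of flagged outliers is reduced to a flag column):

```python
# A minimal sketch, assuming a pandas DataFrame with hypothetical column names.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Clinically defined (lower, upper) plausibility bounds from the text
CLINICAL_LIMITS = {
    "temperature": (34.5, 43.0), "sbp": (60, 250), "dbp": (30, 150),
    "resp_rate": (10, 50), "o2sat": (60, 100), "heart_rate": (40, 200),
}

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    df = df.dropna().copy()  # complete-case analysis, as in the study
    for col, (lo, hi) in CLINICAL_LIMITS.items():
        df = df[df[col].between(lo, hi)]  # drop clinically implausible values
    df = df.copy()
    for col in CLINICAL_LIMITS:
        # Flag values beyond 1.5 * IQR; in the study these were reviewed
        # manually and kept only if clinically plausible
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        df[f"{col}_outlier_flag"] = ~df[col].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    # One-hot encode categorical features uniformly across algorithms
    df = pd.get_dummies(df, columns=["sex", "chief_complaint"])
    # Min-max normalization maps continuous variables to [0, 1]
    num_cols = list(CLINICAL_LIMITS)
    df[num_cols] = MinMaxScaler().fit_transform(df[num_cols])
    return df
```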
MIMIC, medical information mart for intensive care; SBP, systolic blood pressure; DBP, diastolic blood pressure; o2sat, oxygen saturation; ICU, intensive care unit; AUC, area under the receiver operating characteristic curve; SHAP, SHapley Additive exPlanations; ESI, Emergency Severity Index.
An order of priority based upon acuity utilizing the ESI Five Level triage system. This priority is assigned by a registered nurse. Level 1 is the highest priority, while level 5 is the lowest priority. The levels are:
- Level 1: When a Level 1 condition is present or the patient meets ED Trigger Criteria, the triage process stops, the patient is taken directly to a room, and immediate physician intervention is requested. Patient conditions that trigger Level 1 include being unresponsive, intubated, apneic, or pulseless; requiring a medication/intervention to alter the ESI level (e.g., Narcan, adenosine, cardioversion); trauma; stroke; and STEMI.
- Level 2: When a Level 2 condition is identified, the triage nurse notifies the resource nurse and appropriate placement is determined. Patient conditions that trigger Level 2 include high-risk situations, new-onset confusion, suicidal/homicidal ideation, lethargy, seizures or disorientation, possible ectopic pregnancy, an immunocompromised patient with a fever, severe pain/distress, or vital sign instability.
- Level 3: Includes patients requiring two or more resources (labs, EKG, x-rays, IV fluids, etc) with stable vital signs.
- Level 4: Patients requiring one resource only (labs, EKG, etc.).
- Level 5: Patients not requiring any resources.
Resampling strategies for class imbalance
Class imbalance is a common challenge in ML, often leading to overfitting and suboptimal model performance [27]. In the current dataset, the majority class (non-ICU patients) comprises 94.69% of the data, while the minority class (ICU patients) accounts for only 5.31%. To explore the effect of addressing class imbalance on model performance, we selected the Logistic Regression (LR) algorithm as a case study and compared the results of three resampling strategies (SMOTE, SMOTE-Tomek, and RandomUnderSampler) against the original, non-resampled dataset in Models 2 and 3. Model performance was assessed using metrics such as Precision, Sensitivity, and F1-Score, with a specific emphasis on enhancing the predictive accuracy for the minority class (ICU patients).
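A hedged sketch of this comparison with imbalanced-learn and scikit-learn (defaults shown, not the authors' exact configuration; `X_train`, `y_train`, `X_test`, and `y_test` are assumed to come from the 80/20 split described in the next subsection):

```python
# Comparison of resampling strategies on the training data only; the test
# set keeps its original class distribution.
from imblearn.over_sampling import SMOTE
from imblearn.combine import SMOTETomek
from imblearn.under_sampling import RandomUnderSampler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

samplers = {
    "No resampling": None,
    "SMOTE": SMOTE(random_state=42),
    "SMOTE-Tomek": SMOTETomek(random_state=42),
    "RandomUnderSampler": RandomUnderSampler(random_state=42),
}

for name, sampler in samplers.items():
    if sampler is None:
        X_res, y_res = X_train, y_train
    else:
        X_res, y_res = sampler.fit_resample(X_train, y_train)
    clf = LogisticRegression(max_iter=1000).fit(X_res, y_res)
    # Report Precision, Sensitivity (recall), and F1 for the minority (ICU) class
    print(name)
    print(classification_report(y_test, clf.predict(X_test), digits=3))
```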
Development and evaluation of the models
The data were randomly divided into a training set (80%) and test set (20%). We developed three models: Model 1 was based on the ESI five-level triage system, Model 2 was constructed using vital signs, and Model 3 was built using vital signs, demographic characteristics, medical history, and chief complaints. We applied nine algorithms to the training data to build the models: 1) LR, 2) k-Nearest Neighbor (KNN), 3) Support Vector Machine (SVM), 4) naive Bayes, 5) Decision Tree, 6) Random Forest, 7) Extra Tree, 8) Gradient Boosting, and 9) Adaptive Boosting (AdaBoost). By tuning the regularization parameters, we aimed to control model complexity and enhance its generalization ability and performance. Model 1, containing only the ESI variable, used the LR algorithm. Models 2 and 3 employed all nine algorithms to explore the best area under the receiver operating characteristic curve (AUC) value. The AUC and its 95% confidence interval were determined using the DeLong method. We evaluated the performance of the models on the test data using several metrics, including the F1 Score, Area Under the Precision-Recall Curve (AUC-PR), Positive Predictive Value (PPV), Negative Predictive Value (NPV), Brier score, calibration curve, and decision curve analysis. We utilized the SHAP method to interpret different algorithms and assessed the risk of ICU admission to better assist healthcare providers in decision-making.
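As a minimal illustration of this workflow (scikit-learn defaults shown; the authors tuned regularization parameters, and the code actually used is in S1 Text), assuming a feature matrix `X` and outcome vector `y` from the preprocessing above, and omitting the resampling step for brevity:

```python
# A sketch of the 80/20 split and nine-algorithm AUC comparison, not the
# authors' exact configuration.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier, AdaBoostClassifier)
from sklearn.metrics import roc_auc_score

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(probability=True),       # probability=True enables predict_proba
    "NaiveBayes": GaussianNB(),
    "DecisionTree": DecisionTreeClassifier(),
    "RandomForest": RandomForestClassifier(),
    "ExtraTrees": ExtraTreesClassifier(),
    "GradientBoosting": GradientBoostingClassifier(),
    "AdaBoost": AdaBoostClassifier(),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: AUC = {auc:.2f}")
```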
Results
Characteristics of study sample
A total of 189,167 emergency patients were included in this study, of whom 10,042 (5.31%) were admitted to the ICU and 179,125 (94.69%) were not (Table 1). The correlation coefficient between SBP and DBP was 0.5, while it was less than 0.25 for the other continuous variables. Body temperature, heart rate, respiratory rate, and age exhibited positive correlations with the outcome, whereas SBP, DBP, blood oxygen saturation, and pain index showed negative correlations with the outcome (Fig 2A). Restricted cubic splines illustrated the nonlinear relationships between systolic and diastolic blood pressure and the outcome, whereas temperature, heart rate, respiratory rate, blood oxygen saturation, pain index, and age displayed linear relationships with the outcome (Fig 2B). Bar charts illustrating categorical variables with outcomes were presented in Fig 2C.
A: Heatmap of correlation between continuous variables. B: Restricted cubic spline analysis of the linear relationship between continuous variables and outcomes. C: Bar chart of the relationship between categorical variables and outcomes. ICU, intensive care unit; o2sat, oxygen saturation; SBP, systolic blood pressure; DBP, diastolic blood pressure; COPD, chronic obstructive pulmonary disease; CKD, chronic kidney disease; CHF, chronic heart failure; CHD, coronary heart disease; GIB, gastrointestinal bleeding.
Effect of resampling on the dataset
The results showed that, compared to the original dataset without resampling, all three resampling techniques—SMOTE, SMOTE-Tomek, and RandomUnderSampler—significantly improved the Precision, Sensitivity, and F1-Score for the minority class, while their AUC values were largely consistent (S1 Table). Among them, RandomUnderSampler, which uses only actual data, minimizes the risk of overfitting that can arise from generating synthetic samples, while also being less computationally demanding. Considering the large scale of the dataset and the need for multi-model comparisons in this study, random undersampling was chosen as the primary strategy to mitigate class imbalance, as it offers an optimal trade-off between predictive performance and computational efficiency.
Performance of the models
As shown in Fig 3, when Model 1 was used for prediction, the AUC value was 0.68. When Model 2 was used for prediction, the AUC values of the nine algorithms, in descending order, were as follows: Gradient Boosting (0.76), naive Bayes (0.75), LR (0.74), Random Forest (0.71), AdaBoost (0.69), Extra Tree (0.67), KNN (0.61), Decision Tree (0.55), and SVM (0.33). When Model 3 was used for prediction, the AUC values of the nine algorithms, in descending order, were as follows: Gradient Boosting (0.81), LR (0.81), naive Bayes (0.80), Random Forest (0.80), AdaBoost (0.75), Extra Tree (0.73), Decision Tree (0.65), KNN (0.63), and SVM (0.35). The LR, Gradient Boosting, naive Bayes, and Random Forest algorithms were compared further for Model 3, while the other models and algorithms were not evaluated further due to their lower AUC values. Table 2 presented metrics evaluating the overall performance, discrimination, calibration, and clinical utility of these four algorithms. Calibration curves for the algorithms were shown in Fig 4A and decision curve analysis in Fig 4B. Overall, Model 3 demonstrated superior predictive performance compared to Models 1 and 2, with LR, Random Forest, and Gradient Boosting being the three best-performing algorithms within Model 3.
ESI, Emergency Severity Index; AUC, area under the receiver operating characteristic curve; KNN, k-Nearest Neighbor; SVM, Support Vector Machine; AdaBoost, Adaptive Boosting.
A: Calibration curves of the four algorithms. B: Decision curve analysis curves of the four algorithms.
Feature importance and interpretation of the models
The feature importance for the three best-performing ML algorithms in Model 3 (LR, Random Forest, Gradient Boosting) was detailed in the S1 Fig. Using Case16 as an example, we analyzed the predictions of these three algorithms using the SHAP method and provided the probability of ICU admission (Fig 5). The predicted probabilities of ICU admission using the three algorithms were as follows: LR (77%), Random Forest (84%), and Gradient Boosting (96%).
LR, Logistic Regression; RF, Random Forest; GB, Gradient Boosting; sbp, systolic blood pressure; dbp, diastolic blood pressure; o2sat, oxygen saturation; CKD, chronic kidney disease; CHD, coronary heart disease; CHF, chronic heart failure.
In the SHAP method, f(X) represented the final prediction result, which equaled the baseline value E[f(X)] plus the sum of all variable SHAP values. The SHAP values quantified the magnitude and direction of each variable’s influence on the predicted outcome. Blue and red respectively represented decreases or increases in risk, with longer arrows indicating greater effects. The baseline value E[f(X)] was equivalent to the average risk in the dataset.
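A minimal sketch of this decomposition with the shap library (assumed usage, not the authors' code), reusing the fitted `models` dictionary and test split from the earlier sketch and assuming `X_test` is a pandas DataFrame:

```python
# For a single patient, f(x) = E[f(X)] + sum of per-feature SHAP values.
import shap

explainer = shap.TreeExplainer(models["GradientBoosting"])
shap_values = explainer.shap_values(X_test)   # one value per feature per patient

i = 0                                         # index of a single triage record
base = explainer.expected_value               # E[f(X)]: the dataset-average output
# Note: for this learner the values are in the model's margin (log-odds) space
pred = base + shap_values[i].sum()
print(f"baseline {base:.3f} + contributions {shap_values[i].sum():.3f} = {pred:.3f}")

# A force plot draws the red (risk-increasing) and blue (risk-decreasing)
# arrows of the kind shown in Fig 5
shap.force_plot(base, shap_values[i], X_test.iloc[i])
```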
Discussion
In this study, we predicted the risk of ICU admission based on triage information from 189,167 emergency patients by constructing interpretable ML models. Our study indicated that adding demographic characteristics, medical history, and chief complaints to the triage prediction model improved the predictive performance for critical conditions compared to using only vital signs or the ESI five-level triage system for prediction. Furthermore, interpretable ML helped by providing transparency and insight into predictions, aiding medical decision-making during triage.
We visualized the data to intuitively observe the relationships between the feature variables and the outcome. In this study, all vital signs were continuous variables. Restricted cubic splines demonstrated significant linear or U-shaped relationships between these variables and the outcome (Fig 2B). Traditionally, triage warning scores categorize vital signs, but we retained them as continuous variables to explore their maximum predictive efficacy, which categorization often diminishes. Unfortunately, the MIMIC-IV ED database lacked data on patients’ conscious state, which was reflected only in the chief complaint. Relationships between categorical variables and the outcome were displayed as bar charts in Fig 2C. Descriptive statistics indicated that most variables correlated with the occurrence of the outcome.
In our study, the AUC value of the ESI five-level triage system was 0.68, which was lower than that of Models 2 and 3. Previous studies have reported that the ESI has lower sensitivity in accurately identifying critically ill patients, which may lead to delays in care [28,29]. As shown in Fig 3, after adding additional variables, the AUC values of most of the algorithms improved. This showed that, based on sEMR, collecting additional variables during triage can improve prediction efficiency, at least for ICU admission. The popularization of sEMR has become a trend in the medical system, providing a carrier for artificial intelligence to serve medical care [30,31]. Among the nine algorithms, the highest AUC value was 0.81, achieved by LR and Gradient Boosting, followed by 0.80 for naive Bayes and Random Forest. Although sEMR is quite convenient, the information it can collect in the triage setting is limited and typically does not extend beyond four aspects: vital signs, medical history, demographics, and chief complaint. Therefore, we believe that an AUC of 0.81 may represent the maximum predictive performance of triage data for predicting ICU admission.
According to the AUC values, we further compared the prediction performance of LR, Gradient Boosting, naive Bayes, and Random Forest, because the AUC is the most important indicator for evaluating a prediction model [32]. The AUC was used to judge the discrimination ability of the model, whereas the calibration curve was used to judge the agreement between predictions and actual outcomes. A perfect calibration curve should lie on the 45-degree line [33]; in this study, the naive Bayes algorithm performed poorly, while the remaining three algorithms performed excellently, as shown in Fig 4A. The decision curve analysis showed that the performance of naive Bayes was noticeably inferior to the other three algorithms (Fig 4B). As shown in Table 2, considering the evaluation metrics of overall performance, discriminative ability, calibration ability, and clinical application value, Gradient Boosting performed slightly better than LR and Random Forest, while naive Bayes performed the worst. Although complex models such as Gradient Boosting offered only a marginal AUC advantage over simpler ones like Logistic Regression (0.81 for both), the added complexity is justifiable in certain contexts. The AUC is only one evaluation metric and does not fully capture a model’s overall performance. Gradient Boosting achieved the highest F1 Score (0.74) among all models, along with the lowest Brier score (0.044), indicating superior overall performance, calibration, and probabilistic prediction accuracy, as well as comparable or better clinical usefulness (Table 2, Fig 4B). Additionally, complex models can capture non-linear relationships and interactions between variables that simpler models may fail to identify, which is particularly relevant in clinical datasets with inherent heterogeneity. However, we acknowledge that simpler models like Logistic Regression remain highly interpretable, computationally efficient, and better suited for resource-limited or transparent decision-making scenarios. Therefore, the choice between simple and complex models requires balancing the slight gain in performance against the trade-offs in interpretability and computational demands, depending on the clinical application.
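A minimal sketch of this calibration and Brier-score assessment (a generic scikit-learn recipe, not the authors' plotting code), reusing the fitted `models` dictionary and test split from the earlier sketch:

```python
# Calibration curves compare predicted probabilities with observed frequencies;
# the dashed diagonal is the 45-degree line of perfect calibration.
import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve
from sklearn.metrics import brier_score_loss

for name in ["LR", "GradientBoosting", "RandomForest", "NaiveBayes"]:
    prob = models[name].predict_proba(X_test)[:, 1]
    frac_pos, mean_pred = calibration_curve(y_test, prob, n_bins=10)
    brier = brier_score_loss(y_test, prob)
    plt.plot(mean_pred, frac_pos, marker="o", label=f"{name} (Brier = {brier:.3f})")

plt.plot([0, 1], [0, 1], "k--", label="Perfect calibration")
plt.xlabel("Mean predicted probability")
plt.ylabel("Observed fraction of ICU admissions")
plt.legend()
plt.show()
```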
Many studies have concluded that ML can deal better than LR with high-order nonlinear interactions between variables when faced with complex data or numerous features [34–36]. Indeed, some studies have found that ML is better than LR at predicting critical illness or poor prognosis [37–40]; however, a systematic review in 2019 showed that ML had no performance advantage over LR for clinical prediction models [23]. Yun et al. [41] compared an ML model with the Korean Triage and Acuity Scale (KTAS) for predicting the risk of critical care outcomes using patient triage data. The results showed that the Gradient Boosting algorithm was superior to KTAS, with an AUC of 0.86. Yun et al. included more variables than we did, such as mode of arrival, interval between onset and arrival, and state of consciousness, but did not include medical history. De Hond et al. [42] added laboratory tests within 2 h to predict the risk of hospitalization in emergency patients. The results showed that ML had excellent predictive performance, similar to LR, for predicting hospitalization, and the best-performing ML algorithm was Gradient Boosting. The advantage of the Gradient Boosting algorithm is that it can identify a variety of distinguishing features and feature combinations.
This study used Case16 as an example to explain the three better-performing algorithms: LR, Random Forest, and Gradient Boosting. Although the different algorithms included slightly different variables and weights, they all could assist healthcare professionals in clinical judgment. In our study, all three algorithms included vital signs, demographics, medical history, and chief complaints in the model (Fig 5). This also demonstrated that using only vital signs for modeling in a triage alert model may not be optimal. SHAP fairly allocates contribution values to each feature in a sample, ultimately explaining the difference between the predicted value of an individual sample and the average predicted value of the model [43]. SHAP also has some drawbacks, such as long computation times when explaining certain ML algorithms, which limits its clinical applicability. For instance, in this study, the computation time for Random Forest was significantly longer than for LR and Gradient Boosting.
The primary significance of this study lies in our demonstration that incorporating more triage information into the model development process was superior to traditional early warning models. Additionally, we employed nine different algorithms for prediction and identified the optimal one through comparison. As sEMR continues to develop, these results may provide more options for future triage warning methods, surpassing the sole reliance on vital signs or subjective judgment. At the same time, we utilized interpretable ML methods, enabling ML to assist healthcare professionals in clinical decision-making, thereby providing feasibility and credibility for the application of this new warning model in triage. For the first time, we provided the probability of patients being admitted to the ICU using interpretable ML methods. By assessing these probabilities, triage healthcare personnel could further determine the severity of a patient’s condition. However, the ethical implications of using automated decision-making systems in clinical practice warrant careful consideration. While such systems can enhance the accuracy and efficiency of triage processes, they must not replace human judgment but rather serve as a supportive tool for healthcare professionals. Over-reliance on automated systems may risk overlooking nuanced clinical contexts that are difficult to quantify or encode into models. Additionally, the potential for algorithmic bias, stemming from imbalanced datasets or unrepresentative training data, must be addressed to ensure equitable care across diverse patient populations. Transparency in model development, validation, and implementation is essential to build trust among healthcare providers and patients. Lastly, the deployment of such systems should always prioritize patient autonomy, ensuring that automated predictions are used to inform, rather than dictate, clinical decisions.
The limitations of this study mainly manifest in the following aspects. Firstly, as a single-center study, external validation was not possible. Secondly, as a retrospective study, it was unable to include all potential variables, such as consciousness status and history of alcohol abuse. Certain chief complaints were not very precise; for example, some patients’ chief complaints were “low blood pressure” or “low blood sugar”. We included such less standardized chief complaints if their proportion exceeded 1%. Thirdly, removing patients with missing and extreme values may introduce selection bias, particularly if the data were not missing completely at random. For example, extreme values could carry clinically important information, and their exclusion might affect model performance. Future studies should consider applying advanced imputation methods, such as multiple imputation by chained equations (MICE), to address missing values and compare the results with models based on complete-case data to evaluate potential biases. Additionally, the clinically defined thresholds for extreme values were conservative, which might have excluded meaningful data points, and future efforts could explore machine learning-based anomaly detection methods or use winsorization to better account for the impact of extreme values on model performance. Fourthly, in terms of population selection, we excluded patients from the MIMIC-IV ED who lacked outcomes and those with multiple visits. Since these patients constituted a significant proportion, there may be some selection bias. Finally, while our predictive model provided the probability of ICU admission, it did not directly determine the severity of emergency patient conditions, and further research is needed to confirm whether it can improve clinical decision-making and better stratify emergency patients by severity.
Conclusions
In conclusion, this study developed an interpretable ML triage warning model based on vital signs, demographics, medical history, and chief complaints. It was more effective in predicting whether patients needed ICU admission compared to traditional early warning models based on vital signs or ESI. Interpretable ML can assist healthcare professionals in making clinical decisions during triage.
Supporting information
S1 Table. Performance comparison of ML model (LR) before and after resampling.
https://doi.org/10.1371/journal.pone.0317819.s002
(PDF)
S1 Fig. Feature importance in Logistic Regression, Random Forest and Gradient Boosting model.
https://doi.org/10.1371/journal.pone.0317819.s003
(PDF)
Acknowledgments
The authors would like to express their sincere gratitude to the MIMIC database for providing publicly available data, which has been instrumental in facilitating this research.
References
- 1. Sun R, Karaca Z, Wong HS. Trends in Hospital Emergency Department Visits by Age and Payer, 2006–2015. Healthcare Cost and Utilization Project (HCUP) Statistical Briefs. Rockville (MD): Agency for Healthcare Research and Quality (US); 2006.
- 2. Van Der Linden MC, Khursheed M, Hooda K, Pines JM, Van Der Linden N. Two emergency departments, 6000km apart: differences in patient flow and staff perceptions about crowding. Int Emerg Nurs. 2017;35:30–6. Epub 2017/07/01 pmid:28659247
- 3. Sun BC, Hsia RY, Weiss RE, Zingmond D, Liang LJ, Han W, et al. Effect of emergency department crowding on outcomes of admitted patients. Ann Emerg Med. 2013;61(6):605–11.e6. Epub 2012/12/12 pmid:23218508
- 4. Gaieski DF, Agarwal AK, Mikkelsen ME, Drumheller B, Cham Sante S, Shofer FS, et al. The impact of ED crowding on early interventions and mortality in patients with severe sepsis. Am J Emerg Med. 2017;35(7):953–60. Epub 2017/02/25 pmid:28233644
- 5. Morley C, Unwin M, Peterson GM, Stankovich J, Kinsman L. Emergency department crowding: a systematic review of causes, consequences and solutions. PLoS One. 2018;13(8):e0203316. Epub 2018/08/31 pmid:30161242
- 6. Zachariasse JM, van der Hagen V, Seiger N, Mackway-Jones K, van Veen M, Moll HA. Performance of triage systems in emergency care: a systematic review and meta-analysis. BMJ Open. 2019;9(5):e026471. Epub 2019/05/31 pmid:31142524 PMCID: PMC6549628
- 7. Hoot NR, Aronsky D. Systematic review of emergency department crowding: causes, effects, and solutions. Ann Emerg Med. 2008;52(2):126–36. Epub 2008/04/25 pmid:18433933 PMCID: PMC7340358
- 8. Herring AA, Ginde AA, Fahimi J, Alter HJ, Maselli JH, Espinola JA, et al. Increasing critical care admissions from U.S. emergency departments, 2001-2009. Crit Care Med. 2013;41(5):1197–204. Epub 2013/04/18 pmid:23591207 PMCID: PMC3756824
- 9. Goldstein RS. Management of the critically ill patient in the emergency department: focus on safety issues. Crit Care Clin. 2005;21(1):81–9, viii. Epub 2004/12/08 pmid:15579354
- 10. Hsieh CC, Lee CC, Hsu HC, Shih HI, Lu CH, Lin CH. Impact of delayed admission to intensive care units on patients with acute respiratory failure. Am J Emerg Med. 2017;35(1):39–44. Epub 2016/10/16 pmid:27742520
- 11. Bernstein SL, Aronsky D, Duseja R, Epstein S, Handel D, Hwang U, et al; Society for Academic Emergency Medicine, Emergency Department Crowding Task Force. The effect of emergency department crowding on clinically oriented outcomes. Acad Emerg Med. 2009;16(1):1–10. Epub 2008/11/15 pmid:19007346
- 12. Oliveira EG, Garcia PC, Citolino Filho CM, de Souza Nogueira L. The influence of delayed admission to intensive care unit on mortality and nursing workload: a cohort study. Nurs Crit Care. 2019;24(6):381–6. Epub 2018/11/28 pmid:30478867
- 13. García-Gigorro R, de la Cruz Vigo F, Andrés-Esteban EM, Chacón-Alves S, Morales Varas G, Sánchez-Izquierdo JA, et al. Impact on patient outcome of emergency department length of stay prior to ICU admission. Med Intensiva. 2017;41(4):201–8. Epub 2016/08/25 pmid:27553889
- 14. LaMantia MA, Platts-Mills TF, Biese K, Khandelwal C, Forbach C, Cairns CB, et al. Predicting hospital admission and returns to the emergency department for elderly patients. Acad Emerg Med. 2010;17(3):252–9. Epub 2010/04/08 pmid:20370757 PMCID: PMC5985811
- 15. Peck JS, Gaehde SA, Nightingale DJ, Gelman DY, Huckins DS, Lemons MF, et al. Generalizability of a simple approach for predicting hospital admission from an emergency department. Acad Emerg Med. 2013;20(11):1156–63. Epub 2013/11/19 pmid:24238319
- 16. Dugas AF, Kirsch TD, Toerper M, Korley F, Yenokyan G, France D, et al. An electronic emergency triage system to improve patient distribution by critical outcomes. J Emerg Med. 2016;50(6):910–8. Epub 2016/05/03 pmid:27133736
- 17. Fernandes M, Vieira SM, Leite F, Palos C, Finkelstein S, Sousa JMC. Clinical decision support systems for triage in the emergency department using intelligent systems: a review. Artif Intell Med. 2020;102:101762. Epub 2020/01/26 pmid:31980099
- 18. Fernandes M, Mendes R, Vieira SM, Leite F, Palos C, Johnson A, et al. Risk of mortality and cardiopulmonary arrest in critical patients presenting to the emergency department using machine learning and natural language processing. PLoS One. 2020;15(4):e0230876. Epub 2020/04/03 pmid:32240233 PMCID: PMC7117713
- 19. Lee JH, Park YS, Park IC, Lee HS, Kim JH, Park JM, et al. Over-triage occurs when considering the patient’s pain in Korean Triage and Acuity Scale (KTAS). PLoS One. 2019;14(5):e0216519. Epub 2019/05/10 pmid:31071132 PMCID: PMC6508716
- 20. Afnan MAM, Netke T, Singh P, Worthington H, Ali F, Kajamuhan C, et al. Ability of triage nurses to predict, at the time of triage, the eventual disposition of patients attending the emergency department (ED): a systematic literature review and meta-analysis. Emerg Med J. 2021;38(9):694–700. Epub 2020/06/21 pmid:32561525
- 21. Christ M, Grossmann F, Winter D, Bingisser R, Platz E. Modern triage in the emergency department. Deutsches Arzteblatt Int. 2010;107(50):892–8. Epub 2011/01/20 pmid:21246025 PMCID: PMC3021905
- 22. Mueller B, Kinoshita T, Peebles A, Graber MA, Lee S. Artificial intelligence and machine learning in emergency medicine: a narrative review. Acute Med Surg. 2022;9(1):e740. Epub 2022/03/08 pmid:35251669 PMCID: PMC8887797
- 23. Christodoulou E, Ma J, Collins GS, Steyerberg EW, Verbakel JY, Van Calster B. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J Clin Epidemiol. 2019;110:12–22. Epub 2019/02/15 pmid:30763612
- 24. Gravesteijn BY, Nieboer D, Ercole A, Lingsma HF, Nelson D, van Calster B, et al; CENTER-TBI collaborators. Machine learning algorithms performed no better than regression models for prognostication in traumatic brain injury. J Clin Epidemiol. 2020;122:95–107. Epub 2020/03/24 pmid:32201256
- 25. Jin Y, Kattan MW. Methodologic issues specific to prediction model development and evaluation. Chest. 2023;164(5):1281–9. Epub 2023/07/07 pmid:37414333
- 26. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ (Clinical Research ed). 2015;350:g7594. Epub 2015/01/09 pmid:25569120
- 27. Fernando KRM, Tsokos CP. Dynamically weighted balanced loss: class imbalanced learning and confidence calibration of deep neural networks. IEEE Trans Neural Netw Learn Syst. 2022;33(7):2940–51. Epub 2021/01/1 pmid:33444149
- 28. Sax DR, Warton EM, Mark DG, Vinson DR, Kene MV, Ballard DW, et al; Kaiser Permanente CREST (Clinical Research on Emergency Services & Treatments) Network. Evaluation of the emergency severity index in US emergency departments for the rate of Mistriage. JAMA Netw Open. 2023;6(3):e233404. Epub 2023/03/18. pmid:36930151
- 29. Weiss SL, Fitzgerald JC, Balamuth F, Alpern ER, Lavelle J, Chilutti M, et al. Delayed antimicrobial therapy increases mortality and organ dysfunction duration in pediatric sepsis. Crit Care Med. 2014;42(11):2409–17. Epub 2014/08/26 pmid:25148597 PMCID: PMC4213742
- 30. Sun W, Cai Z, Li Y, Liu F, Fang S, Wang G. Data processing and text mining technologies on electronic medical records: a review. J Healthc Eng. 2018;2018:4302425. Epub 2018/06/01 pmid:29849998 PMCID: PMC5911323
- 31. Huang Y, Wang N, Zhang Z, Liu H, Fei X, Wei L, et al. Patient representation from structured electronic medical records based on embedding technique: development and validation study. JMIR Med Inform. 2021;9(7):e19905. Epub 2021/07/24 pmid:34297000 PMCID: PMC8367145
- 32. Alba AC, Agoritsas T, Walsh M, Hanna S, Iorio A, Devereaux PJ, et al. Discrimination and calibration of clinical prediction models: users’ guides to the medical literature. JAMA. 2017;318(14):1377–84. Epub 2017/10/20 pmid:29049590
- 33. Huang Y, Li W, Macheret F, Gabriel RA, Ohno-Machado L. A tutorial on calibration measurements and calibration models for clinical prediction models. J Am Med Inform Assoc. 2020;27(4):621–33. Epub 2020/02/28 pmid:32106284 PMCID: PMC7075534
- 34. Beam AL, Kohane IS. Big data and machine learning in health care. JAMA. 2018;319(13):1317–8. Epub 2018/03/14 pmid:29532063
- 35. Chen JH, Asch SM. Machine learning and prediction in medicine - beyond the peak of inflated expectations. N Engl J Med. 2017;376(26):2507–9. Epub 2017/06/29 pmid:28657867 PMCID: PMC5953825
- 36. Luo W, Phung D, Tran T, Gupta S, Rana S, Karmakar C, et al. Guidelines for developing and reporting machine learning predictive models in biomedical research: a multidisciplinary view. J Med Internet Res. 2016;18(12):e323. Epub 2016/12/18 pmid:27986644 PMCID: PMC5238707
- 37. Berlyand Y, Raja AS, Dorner SC, Prabhakar AM, Sonis JD, Gottumukkala RV, et al. How artificial intelligence could transform emergency department operations. Am J Emerg Med. 2018;36(8):1515–7. Epub 2018/01/13 pmid:29321109
- 38. Goto T, Camargo CA, Jr, Faridi MK, Yun BJ, Hasegawa K. Machine learning approaches for predicting disposition of asthma and COPD exacerbations in the ED. Am J Emerg Med. 2018;36(9):1650–4. Epub 2018/07/05 pmid:29970272
- 39. Taylor RA, Pare JR, Venkatesh AK, Mowafi H, Melnick ER, Fleischman W, et al. Prediction of in-hospital mortality in emergency department patients with sepsis: a local big data-driven, machine learning approach. Acad Emerg Med. 2016;23(3):269–78. Epub 2015/12/19 pmid:26679719 PMCID: PMC5884101
- 40. Wellner B, Grand J, Canzone E, Coarr M, Brady PW, Simmons J, et al. Predicting unplanned transfers to the intensive care unit: a machine learning approach leveraging diverse clinical elements. JMIR Med Inform. 2017;5(4):e45. Epub 2017/11/24 pmid:29167089 PMCID: PMC5719228
- 41. Yun H, Choi J, Park JH. Prediction of critical care outcome for adult patients presenting to emergency department using initial triage information: an XGBoost algorithm analysis. JMIR Med Inform. 2021;9(9):e30770. Epub 2021/08/05 pmid:34346889 PMCID: PMC8491120
- 42. De Hond A, Raven W, Schinkelshoek L, Gaakeer M, Ter Avest E, Sir O, et al. Machine learning for developing a prediction model of hospital admission of emergency department patients: hype or hope? Int J Med Inform. 2021;152:104496. Epub 2021/05/22 pmid:34020171
- 43. Ali S, Akhlaq F, Imran AS, Kastrati Z, Daudpota SM, Moosa M. The enlightening role of explainable artificial intelligence in medical & healthcare domains: a systematic literature review. Comput Biol Med. 2023;166:107555. Epub 2023/10/09 pmid:37806061