
Using machine learning methods to predict all-cause somatic hospitalizations in adults: A systematic review

  • Mohsen Askar ,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    mohsen.g.askar@uit.no

    Affiliation Faculty of Health Sciences, Department of Pharmacy, UiT-The Arctic University of Norway, Tromsø, Norway

  • Masoud Tafavvoghi,

    Roles Data curation, Investigation, Validation, Writing – review & editing

    Affiliation Faculty of Science and Technology, Department of Computer Science, UiT-The Arctic University of Norway, Tromsø, Norway

  • Lars Småbrekke,

    Roles Data curation, Investigation, Methodology, Supervision, Writing – review & editing

    Affiliation Faculty of Health Sciences, Department of Pharmacy, UiT-The Arctic University of Norway, Tromsø, Norway

  • Lars Ailo Bongo,

    Roles Supervision, Writing – review & editing

    Affiliation Faculty of Science and Technology, Department of Computer Science, UiT-The Arctic University of Norway, Tromsø, Norway

  • Kristian Svendsen

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Supervision, Writing – review & editing

    Affiliation Faculty of Health Sciences, Department of Pharmacy, UiT-The Arctic University of Norway, Tromsø, Norway

Abstract

Aim

In this review, we investigated how Machine Learning (ML) was utilized to predict all-cause somatic hospital admissions and readmissions in adults.

Methods

We searched eight databases (PubMed, Embase, Web of Science, CINAHL, ProQuest, OpenGrey, WorldCat, and MedNar) from their inception date to October 2023, and included records that predicted all-cause somatic hospital admissions and readmissions of adults using ML methodology. We used the CHARMS checklist for data extraction, PROBAST for bias and applicability assessment, and TRIPOD for reporting quality.

Results

We screened 7,543 studies, of which 163 full-text records were read and 116 met the review inclusion criteria. Among these, 45 predicted admission, 70 predicted readmission, and one study predicted both. There was substantial variety in the types of datasets, algorithms, features, data preprocessing steps, and evaluation and validation methods. The most used types of features were demographics, diagnoses, vital signs, and laboratory tests. Area Under the ROC curve (AUC) was the most used evaluation metric. Models trained using boosting tree-based algorithms often performed better than others. ML algorithms commonly outperformed traditional regression techniques. Sixteen studies used Natural Language Processing (NLP) of clinical notes for prediction, and all of them yielded good results. Overall adherence to reporting guidelines was poor in the reviewed studies. Only five percent of models were implemented in clinical practice. The most frequently inadequately addressed methodological aspects were: providing model interpretations on the individual patient level, full code availability, performing external validation, calibrating models, and handling class imbalance.

Conclusion

This review has identified considerable concerns regarding methodological issues and reporting quality in studies investigating ML to predict hospitalizations. To ensure the acceptability of these models in clinical settings, it is crucial to improve the quality of future studies.

Introduction

Unplanned hospital admissions and readmissions (hospitalizations) account for a significant share of global healthcare expenditures [1,2]. Interestingly, up to 35% of these hospitalizations are potentially avoidable [3]. One approach to address avoidable hospitalizations is to implement statistical and mathematical models on healthcare datasets in order to predict future hospitalization [4,5].

Previous attempts were mainly based on regression models and specific risk indexes (scores). Systematic reviews have concluded that most models had poor, inconsistent performance, and limited applicability. They also found that models utilizing health records data performed better than models using self-report data [4,6,7].

More recently, prediction models that utilize Machine Learning (ML) [8,9] algorithms have become more popular. Recent reviews emphasized the growing importance and effectiveness of ML models in predicting clinical outcomes such as hospital readmissions. These reviews concluded that ML techniques can improve readmission prediction ability over traditional statistical models. This improvement could be explained by ML models offering several advantages over traditional regression models, such as flexibility, the ability to handle large, complex, high-dimensional datasets, and the identification of non-linear relationships [10]. The reviews also highlighted the critical role of feature selection and addressed challenges such as transparency, the difficulty of interpreting ML models, and the importance of handling class imbalance to enhance model performance. Moreover, they highlighted the importance of demonstrating the clinical usefulness of the models in practice [11–13]. A systematic analysis of readmission prediction literature proposed a comprehensive framework for ML model development, detailing steps from data preparation and preprocessing to methods of feature selection and transformation, data splitting, model training, validation, and evaluation [14].

Although several reviews have considered the use of ML in predicting hospitalizations for specific diseases and conditions [15–17], none has systematically reviewed the literature on all-cause hospital admissions. With this review, we aim to (i) summarize the characteristics of ML studies used in predicting all-cause somatic admissions and readmissions; (ii) provide a picture of the ML pipeline steps including data preprocessing, feature selection, model evaluation, validation, calibration, and explanation; (iii) assess the risk of bias, applicability, and reporting completeness of the studies; and finally (iv) comment on the challenges facing implementation of ML models in clinical practice.

Materials and methods

The protocol of this systematic review was registered in the International Prospective Register of Systematic Reviews, PROSPERO (CRD42021276721). The PRISMA and PRISMA-Abstract guidelines [18] were followed in reporting this review, see S1 File: Section 1.

Inclusions/Exclusion criteria

To formulate the research question, we used the PICOTS checklist [19,20]. Studies that only included non-adults were excluded. Hospitalizations were defined as all-cause somatic admissions or readmissions from outside the hospital; hence, admissions related to psychological conditions, disease-specific admissions, and internal transfers between wards were excluded. Emergency Departments (EDs) were considered portals; thus, admissions from an ED to the hospital were included, but ED admissions followed by discharge were excluded.

Our focus is on studies performed in an ML context (whether ML was used in model-development steps, e.g., feature engineering, or to make the final predictions), so studies that only used statistical learning or risk indexes for prediction were excluded. All performance measures reported for competing models were included. This review is mainly descriptive of how ML was used in predicting hospitalization; hence, we chose to include studies conducted using real-world data with hospital admissions and readmissions as a valid outcome, regardless of the timing of the outcome. Table 1 presents the overall inclusion criteria. A detailed description of the inclusion and exclusion criteria is provided in S1 File: Section 2.

Search strategy

We searched four main databases: PubMed, Embase (via Ovid), Web of Science, and CINAHL (via EBSCO) from their inception dates to October 13th, 2023. The search strategy was developed through piloting of relevant studies. Database-specific index terms were used where available (MeSH for PubMed and CINAHL, and Emtree for Embase). We also searched four other databases for grey literature: ProQuest, OpenGrey, WorldCat (OCLC FirstSearch), and MedNar.

Four main search blocks were used to identify relevant studies: prediction, hospitalization, machine learning, and exclusions. The exclusion of irrelevant search words was developed by iteration and preliminary title/abstract piloting. The Boolean operators AND, OR, and NOT were used alongside truncation operators and phrase-searching. Search syntax was adapted for each database using the Polyglot tool [21] with manual supervision. The complete search syntax can be found in S1 File: Section 3.

Duplicate studies were removed using Mendeley Reference Manager (version 1.19.8, Elsevier). In cases where the reference manager was uncertain, we manually checked and removed any duplicates. Titles and abstracts were screened by two independent investigators (MA and KS), and full-text papers were retrieved for all candidate studies. The full-text screening was performed separately by MA, MT, LS, and KS. A manual search of the reference lists of the included studies was conducted to identify literature that did not appear in the electronic search. A list of all full-text screened studies, including those that were included and excluded with the reason(s) for exclusion, is attached to S2 File, sheet: Included & excluded studies. The final included studies were decided by discussion between MA, KS, and LS. The descriptive results were synthesized using Pivot tables in Microsoft Excel.

Data extraction

Data were extracted separately by MA, MT, and LS using the Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS) checklist [19]. For further analysis, features were grouped into administrative and clinical feature groups. The included records, extracted data, models’ features, and feature groupings can be found in S2 File, sheets: CHARMS and Features.

Assessment of bias and applicability

Although the main purpose of the review is descriptive, MA and MT assessed the risk of bias and applicability using the Prediction model Risk of Bias Assessment Tool (PROBAST) [22]. PROBAST is a commonly used tool to assess prediction models. The tool evaluates four domains: Participants, Predictors, Outcome, and Analysis. For each domain, there is a set of questions to help judge the risk of both bias and applicability concerns. If any domain was not rated “low”, the overall risk of bias was considered “high”. Abstracts were not assessed due to their limited information. The assessment is attached to S2 File, sheet: PROBAST.

Quality of reporting

To assess the quality of reporting, we utilized the Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) checklist [23]. We followed the methodology suggested by Andaur Navarro et al. [24] to evaluate adherence to TRIPOD per article and per item in the reporting checklist. Each item was scored as 1 = reported, 0 = not reported, 0.5 = incomplete reporting, or ‘_’ = not applicable. Abstracts and conference proceedings were not evaluated. We then calculated the adherence per TRIPOD item by dividing the sum of that item’s scores across all studies by the total number of studies. Adherence for each article was calculated as the sum of its TRIPOD item scores divided by the maximum possible score had reporting been complete, S2 File, sheet: TRIPOD. All abbreviations mentioned in the study are included in S2 File, sheet: Abbreviations.
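
As an illustration of this scoring arithmetic, the sketch below (Python, using a randomly generated score matrix rather than the review's actual extraction sheet) computes adherence per item and per article; the matrix dimensions and column names are assumptions made only for the example.

```python
# Illustrative sketch (not the authors' code): computing TRIPOD adherence
# per item and per article from a studies x items score matrix, where each
# cell is 1 (reported), 0.5 (incomplete), 0 (not reported) or NaN (not applicable).
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
scores = pd.DataFrame(
    rng.choice([1.0, 0.5, 0.0, np.nan], size=(106, 20), p=[0.6, 0.1, 0.2, 0.1]),
    columns=[f"item_{i + 1}" for i in range(20)],
)

# Adherence per TRIPOD item: sum of scores across studies divided by the
# number of studies for which the item was applicable.
adherence_per_item = scores.sum(axis=0) / scores.notna().sum(axis=0)

# Adherence per article: sum of the article's scores divided by the number of
# applicable items (the maximum achievable if reporting were complete).
adherence_per_article = scores.sum(axis=1) / scores.notna().sum(axis=1)

print(adherence_per_item.round(2).head())
print(f"Median adherence per article: {adherence_per_article.median():.0%}")
```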

Results

Of the 7,543 records reviewed, 147 were eligible for full-text screening. We included 16 additional records identified by manual searching of the references. In total, 163 studies were fully screened and 116 studies were included in the review, of which 87 were peer-reviewed articles (76%), 17 conference articles, nine abstracts, and three theses (Fig 1).

Data extraction results

Characteristics of the included studies.

Sixty-one studies (53%) were conducted using data from the USA, followed by Australia (seven studies), Taiwan (four studies), and Canada and Singapore, each with three studies. The oldest article is from 2005, and 2019 had the greatest number of articles (22 articles, 19%), followed by 2020 (17 articles, 15%), see Fig 2.

Fig 2.

Left panel, a bar plot of the top 10 countries where the datasets originate and the number of publications. Right panel, a bar plot of the number of publications by year.

https://doi.org/10.1371/journal.pone.0309175.g002

Population characteristics.

Only 23 studies (20%) completely reported sample size (both the number of unique patients and the total number of admissions). Six studies (5%) reported neither the number of patients nor the number of admissions (four of them were abstracts). The sample size varied from 371 to 4,637,294. Regarding age, 49 studies (42%) did not report an age range for the included patients. The remaining studies had different minimum age requirements for the studied patients.

Outcomes characteristics.

Readmission was the outcome in 70 studies (60%), while 45 studies (39%) had hospital admission as an outcome, and one study investigated both outcomes. The readmission prediction horizon varied from 24 hours to 1 year. The most frequently predicted horizon was 30-day readmission (51 studies, 73% of the readmission studies); seven other studies combined it with other readmission horizons, giving 58 studies (83%) in total. The datasets’ inclusion periods varied from 1 month to 30 years (median: 1.25 years, mean: 3.2 years). Excluding rebalanced datasets, the readmission proportion varied from 0.7% to 34.6% (median 12.4%), while the admission proportion varied from 0.38% to 41% (median 17.2%).
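
For readers less familiar with how such outcome horizons are operationalized, the following sketch shows one way a 30-day readmission label can be derived from an admissions table; the column names and toy data are hypothetical and not taken from any reviewed study.

```python
# Illustrative sketch (hypothetical column names): deriving a 30-day
# all-cause readmission label from an admissions table with one row per stay.
import pandas as pd

admissions = pd.DataFrame({
    "patient_id": [1, 1, 2, 2, 3],
    "admit_date": pd.to_datetime(
        ["2020-01-01", "2020-01-20", "2020-02-01", "2020-05-01", "2020-03-10"]),
    "discharge_date": pd.to_datetime(
        ["2020-01-05", "2020-01-25", "2020-02-06", "2020-05-04", "2020-03-12"]),
})

admissions = admissions.sort_values(["patient_id", "admit_date"])
# Date of the patient's next admission, if any.
admissions["next_admit"] = admissions.groupby("patient_id")["admit_date"].shift(-1)
# Label = 1 if the next admission starts within 30 days of this discharge.
days_to_next = (admissions["next_admit"] - admissions["discharge_date"]).dt.days
admissions["readmit_30d"] = ((days_to_next >= 0) & (days_to_next <= 30)).astype(int)
print(admissions[["patient_id", "discharge_date", "readmit_30d"]])
```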

Datasets

The number of studies that used administrative, clinical, or both types of data was similar (40, 36, and 38 studies, respectively); two abstracts had an unclear description of the dataset type. Among the studies that reported an Area Under the ROC curve (AUC) and used these types of datasets (103 studies), the mean AUCs were 0.80, 0.78, and 0.77, with standard deviations (SD) of 0.08, 0.07, and 0.09, respectively. Six studies reported an AUC over 90%, while 81 studies reported an AUC of 70–90% and 18 studies reported an AUC of 60–70%. Fig 3 shows the relationship between outcomes, dataset types, dataset sources, and the best model performance. S1 File: Section 4 includes detailed information on the data types, sources, and frequency of use in predicting either admissions or readmissions.

Fig 3. Sankey diagram showing the type of outcome, datatypes and sources of datasets, and model performance by AUC.

The thickness of the streams indicates the number of records common between pairs of categories. Medical records include patient information from EHR or EMR. Hospital datasets include data from hospital information systems.

https://doi.org/10.1371/journal.pone.0309175.g003

Types of features included in models.

The most used feature groups were demographics (92 studies, 79%), diagnoses (43 studies, 37%), vital signs (34 studies, 29%), and laboratory tests (28%). Fig 4 presents the most used feature groups in the included studies. Natural Language Processing (NLP) techniques were used in 16 studies (14%) to predict hospitalizations from clinical free-text notes.

Fig 4. The most frequently used feature groups in the retrieved studies.

https://doi.org/10.1371/journal.pone.0309175.g004

Missing data and data imbalance.

Missing values were not mentioned at all in 52 studies (45%). In the 55 studies (47%) that reported how missing values were handled, the most used methods were removing records with missing values (27 studies, 23%) and various imputation methods (25 studies, 22%), with some studies combining removal and imputation.
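
The sketch below illustrates the two approaches mentioned, complete-case removal and simple imputation, on toy data; it is not code from any included study, and the feature names are invented.

```python
# Illustrative sketch (toy data): the two most common strategies reported in
# the reviewed studies -- dropping records with missing values vs. imputing them.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

X = pd.DataFrame({
    "age": [71, 65, np.nan, 80],
    "creatinine": [1.1, np.nan, 0.9, 1.4],
    "prior_admissions": [2, 0, 1, np.nan],
})

# Strategy 1: complete-case analysis (drop any row with a missing value).
X_dropped = X.dropna()

# Strategy 2: simple imputation (here, median per feature; in a real pipeline
# the imputer is fitted on training data only to avoid leakage into the test set).
imputer = SimpleImputer(strategy="median")
X_imputed = pd.DataFrame(imputer.fit_transform(X), columns=X.columns)

print(f"Rows kept after dropping: {len(X_dropped)} of {len(X)}")
print(X_imputed)
```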

Of 99 studies (85%) that reported class imbalance in the outcome, only 32 studies (32%) reported handling this imbalance by some technique. The most used techniques were undersampling (19 studies, 59%), oversampling (7 studies, 22%), and Synthetic Minority Oversampling TEchnique (SMOTE) (6 studies, 19%). Note that some studies tested more than one resampling method.

Models’ performance and comparison

In total, 57 different algorithms were used for predicting the outcomes. Regression models were the most frequently used algorithm group (73 studies, 63%) followed by bagging tree-based algorithms, in 61 studies (53%), and boosting tree-based algorithms in 60 studies (52%). The best-performing algorithm group was boosting algorithms in 35 studies (42%), bagging algorithms in 16 studies (19%), followed by regression and Neural Networks (NN) models in 14 studies (17%) each, see Table 2.

Table 2. Algorithms’ groups and frequency of use in the included studies.

https://doi.org/10.1371/journal.pone.0309175.t002

Comparing the performance of algorithms.

Eighty-three studies (72%) compared the performance of multiple algorithms. Based on the results of the best-performing algorithm groups (Table 2), we compared the performance of some of these algorithm groups. Decision Tree (DT) and Bayesian models were not included in the comparison, as they did not perform best in any of the studies. Fig 5 illustrates the performance comparison between different algorithm groups in the retrieved studies.

Fig 5. A pairwise comparison of the performance of different algorithms’ groups.

The numbers on each segment denote the count of publications in which the first algorithm group demonstrated superior, equivalent, or inferior performance compared to the second one. Adjacent to each bar, the total number of publications involving such comparisons is indicated.

https://doi.org/10.1371/journal.pone.0309175.g005

Evaluation metrics.

AUC was the most used evaluation metric (105 studies), followed by precision, sensitivity, specificity, and accuracy (Fig 6). Thirty-seven studies (32%) reported only one evaluation metric, such as AUC or accuracy, without reporting a clinical performance metric such as sensitivity or specificity. Of the 105 studies that reported AUC, 18 studies (17%) reported an AUC of 60–70%, 42 studies (40%) reported an AUC of 70–80%, 39 studies (37%) reported an AUC of 80–90%, and only six studies (6%) reported an AUC above 90% (Fig 3). The highest reported AUC was 95% in admission models and 99% in readmission models. The mean AUC reported in the studies that used administrative, clinical, or combined datasets was 0.80, 0.78, and 0.77 (SD: 0.08, 0.07, and 0.09), respectively.

Fig 6. Various aspects of model evaluation.

Each subplot represents the frequency of use in the reviewed studies.

https://doi.org/10.1371/journal.pone.0309175.g006

Model calibration and benchmarking.

Only 28 studies (24%) calibrated their models using one of the calibration methods. Fig 6 presents the calibration methods used and the count of publications. Eighteen studies (16%) were benchmarked against one or more risk prediction indexes such as the LACE [25], PARR [26], and HOSPITAL [27] indexes. The most used risk index in benchmarking was the LACE index (nine studies), followed by PARR and HOSPITAL (two studies each). In all 18 studies, ML models outperformed predictions obtained from these risk indexes. A detailed comparison is attached to S1 File: Section 5.
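
As a hedged illustration of what calibration involves, the sketch below applies Platt scaling and inspects a reliability (calibration) curve on synthetic data; it does not reproduce any reviewed study's method, and isotonic regression is an equally common alternative.

```python
# Illustrative sketch (synthetic data): checking and improving probability
# calibration, one of the under-reported steps noted in this review.
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, weights=[0.85], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

base = GradientBoostingClassifier(random_state=0)
# Platt scaling (sigmoid) fitted via cross-validation on the training data.
calibrated = CalibratedClassifierCV(base, method="sigmoid", cv=5).fit(X_train, y_train)

# Reliability pairs: predicted probability per bin vs. observed event fraction.
prob_true, prob_pred = calibration_curve(
    y_test, calibrated.predict_proba(X_test)[:, 1], n_bins=10)
print(list(zip(prob_pred.round(2), prob_true.round(2))))
```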

Model validation.

The majority of studies were trained and validated retrospectively (96 studies, 83%). Only 17 studies (15%) were trained retrospectively and tested prospectively; among them, three studies performed real-time validation. The study design was not clear in three studies. Fig 6 depicts the internal and external validation methods used in the studies.

Model explainability and availability

Providing model interpretation at the patient level (local model interpretation) was presented in only three studies. Fig 6 presents the different interpretation methods used in the studies. Twenty studies (17%) used publicly available datasets, and 15 studies (13%) reported providing the data upon request. Only 17 studies (15%) made their code available, and only six studies implemented their models in clinical practice.

Quality of the studies

Bias and applicability assessment.

Of 106 studies assessed, 68 (64%) were evaluated to be at high risk of bias. We evaluated 94 studies (87%) to be at low concern of applicability. Assessment results are attached to S2 File, sheet: PROBAST.

Reporting quality assessment.

Only nine studies reported adherence to the TRIPOD checklist. These studies [28–36] had generally better reporting quality (scoring 17, 17, 19, 17.5, 16.5, 18, 17.5, 17, and 16 out of 20, respectively). The overall median adherence to the 20 TRIPOD items was 77% (IQR 63–95%). The assessment of adherence to TRIPOD reveals insufficient reporting, especially for items such as reporting the flow of participants (35% of the studies), supplementary material (52%), population characteristics (53%), reporting missing data (56%), and funding (58%), among others (Fig 7). The evaluation sheet is attached to S2 File, sheet: TRIPOD.

Fig 7. Studies’ adherence proportion to a range of TRIPOD checklist items.

Only explicit reporting of Confidence Intervals (CI) was considered complete reporting. Note that all items were calculated excluding abstracts and proceedings (10 studies). Hence, some items, such as missing data, can differ from the results section, where calculations included all studies.

https://doi.org/10.1371/journal.pone.0309175.g007

Discussion

To our knowledge, this is the first systematic review to focus on ML models for predicting all-cause somatic hospitalizations. Of 7,543 citations, 116 studies were included. Our review reveals the potential of ML models in predicting all-cause somatic hospitalizations, which is consistent with what is reported by both a general review of AI and machine learning and disease-specific reviews [8,9]. Our findings also raise concerns regarding the quality of the studies conducted. Therefore, despite the potential of the ML prediction framework and the superiority over traditional statistical prediction shown in many studies, there are clear issues with the quality of reporting, external validation, model calibration, and interpretation. All these aspects should complement model performance for the models to be suitable for implementation in real-life clinical practice. These main findings are consistent with findings from other reviews [11,12,37–39].

Most studies were based on data from the USA, which can be an issue. This geographic skew limits the generalizability of the developed models, considering the differences in healthcare systems and patient populations between countries [40]. As 30-day readmission is a widely used indicator of hospital care quality [41], the majority of the included readmission studies used it as an outcome.

Datasets and features

A wide variety of data sources and types were used. We found the performance of models trained on administrative (claims) data, clinical data, or datasets combining both clinical and administrative variables to be similar, with a slight edge for models trained on administrative datasets.

The most important features varied between the different studies. This lack of convergence of risk factors is due to: i) different definitions of admission and readmission outcomes across studies, ii) the use of different feature selection methods [42], iii) the diversity of recorded features in different healthcare databases, iv) the lack of standard handling of data preprocessing steps and the variation in methods for handling and generating variables, v) the variety in populations, subpopulations, and exclusion criteria, and finally, vi) the use of different risk scores and indexes, which include different sets of features. This is consistent with what previous studies concluded about the difficulty of finding universal features for predicting hospitalization [43–45]. While defining general risk factors is particularly difficult for studies of all-cause hospitalizations, it may be appropriate in subpopulations (e.g., patients with specific diseases) that have more similarities and less diversity. Yet, some groups of risk factors are shown to be more common than others (Fig 4).

The most used feature groups were demographics, diagnoses, physiological measurements, and laboratory tests, respectively (Fig 4). Some studies used only one or a limited number of feature groups [46–53]. All these studies yielded generally good predictive performance, suggesting that the sole use of one or a limited number of feature categories can be enough to predict hospitalization. However, this needs to be further investigated by comparing the performance of models built exclusively on one or a few feature groups with models built on several feature groups.

Some studies used Natural Language Processing (NLP) techniques to extract information from clinical text, either combining it with other structured features [54–57] or using it as the sole source of data [47,52,53,58–62]. Some studies reported better prediction performance using textual data than numerical data (e.g., laboratory tests and vital signs), suggesting the existence of relevant expert knowledge within these reports [53]. We noticed an increase in the application of NLP techniques in recent studies, suggesting that utilizing textual data is a promising future direction for predicting hospitalizations. Incorporating NLP techniques in prediction models provides a rich source of clinical information that may not be present in the tabular format of patient records. It can also improve research scalability through automatic extraction of relevant information rather than manual processing. Furthermore, it can provide real-time assistance for clinicians. However, some challenges should be considered, such as the limited availability of large, shared, annotated datasets necessary for developing efficient NLP models, popular evaluation methods that may not be clinically relevant, and the lack of transparent protocols to ensure NLP methods are reproducible [63,64].
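
To make the NLP approach concrete, here is a minimal bag-of-words baseline on invented triage-note snippets; the reviewed studies used much larger clinical corpora and, often, more sophisticated language models.

```python
# Illustrative sketch (toy notes, not a clinical dataset): a simple
# bag-of-words baseline for predicting admission from free-text triage notes.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

notes = [
    "chest pain radiating to left arm, diaphoretic",
    "minor laceration to finger, no active bleeding",
    "shortness of breath, history of heart failure",
    "medication refill request, no acute complaints",
]
admitted = [1, 0, 1, 0]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),  # word and bigram features
    LogisticRegression(max_iter=1000),
)
model.fit(notes, admitted)
print(model.predict_proba(["worsening dyspnea and chest pain"])[:, 1])
```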

Data preprocessing

How multiple admissions for the same patient were handled during data preparation should be reported. Only 23 studies reported both the number of unique patients and the total number of admissions, indicating poor reporting of this item. Reporting both numbers, along with the method of handling multiple admissions for the same patient, is important, since neglecting the correlation between admissions may lead to unreliable predictions.
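
One common way to respect this correlation, splitting at the patient level so the same patient never appears in both training and test data, is sketched below using synthetic data and hypothetical variable names.

```python
# Illustrative sketch: split at the patient level so that no patient
# contributes admissions to both the training and test sets.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

n_admissions = 1000
rng = np.random.default_rng(0)
X = rng.normal(size=(n_admissions, 15))                # admission-level features
y = rng.integers(0, 2, size=n_admissions)              # readmission label
patient_id = rng.integers(0, 400, size=n_admissions)   # ~2.5 admissions/patient

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=patient_id))

# No patient appears in both folds.
assert set(patient_id[train_idx]).isdisjoint(patient_id[test_idx])
print(len(train_idx), "training admissions,", len(test_idx), "test admissions")
```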

Similarly, less than half of the studies (47%) reported a method of handling missing values. Only a few studies (32) reported handling class imbalance in the dataset. Class imbalance means that the outcome contains more samples from one class (the majority class) than from the other (the minority class) [65] and is one of the most common issues in training ML models to predict hospitalizations. However, it is not usually taken into consideration in the readmission risk prediction literature [66]. The problem with class imbalance is that models can become biased towards the majority class, leading to misleadingly high prediction performance [67]. Resampling techniques, especially undersampling, were the most used approach. Resampling techniques balance the distribution of outcome classes either by oversampling or undersampling. Oversampling involves increasing the number of minority-class instances (e.g., SMOTE), while undersampling involves randomly reducing the number of majority-class instances, thus balancing the class distribution [68]. It should be noted that resampling techniques have drawbacks, such as overfitting or loss of useful information, which can introduce problematic consequences and hinder model learning [69,70].
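
A minimal sketch of the resampling techniques described above, using the imbalanced-learn package (an assumption; the reviewed studies used various implementations), applied to a synthetic imbalanced outcome and, as recommended, to training data only:

```python
# Illustrative sketch: rebalancing an imbalanced readmission outcome with
# random undersampling and SMOTE (requires the imbalanced-learn package).
from collections import Counter
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification

X_train, y_train = make_classification(
    n_samples=5000, n_features=20, weights=[0.9], random_state=0)
print("original:", Counter(y_train))

# Random undersampling of the majority (non-readmitted) class.
X_under, y_under = RandomUnderSampler(random_state=0).fit_resample(X_train, y_train)
print("undersampled:", Counter(y_under))

# SMOTE: synthesize new minority-class samples by interpolating neighbours.
X_smote, y_smote = SMOTE(random_state=0).fit_resample(X_train, y_train)
print("SMOTE:", Counter(y_smote))
```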

Models’ performance comparisons

Model performance for health-related outcomes should be reported on two levels: model performance metrics (e.g., AUC, F1-score) and clinical performance metrics (e.g., sensitivity, specificity, PPV, NPV) [71]. More than one-third of the studies reported only a model performance metric, with AUC as the most used, which could limit their acceptance in clinical practice.
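
The sketch below illustrates this two-level reporting on synthetic data: a discrimination metric (AUC) plus threshold-dependent clinical metrics. The 0.3 operating threshold is arbitrary and chosen only for the example.

```python
# Illustrative sketch (synthetic data): reporting both a model-level metric
# (AUC) and threshold-dependent clinical metrics, as recommended above.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, n_features=20, weights=[0.85], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

probs = RandomForestClassifier(random_state=0).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
print(f"AUC: {roc_auc_score(y_te, probs):.2f}")

preds = (probs >= 0.3).astype(int)  # example operating threshold
tn, fp, fn, tp = confusion_matrix(y_te, preds).ravel()
print(f"Sensitivity: {tp / (tp + fn):.2f}  Specificity: {tn / (tn + fp):.2f}")
print(f"PPV: {tp / (tp + fp):.2f}  NPV: {tn / (tn + fn):.2f}")
```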

The analysis of different algorithms’ performance confirms that no algorithm consistently performs better than the others [72]. Yet, some algorithms more frequently yield better results than others. In this review, we found that tree-based boosting algorithms often outperformed other algorithms (Table 2 and Fig 5). Tree-based boosting algorithms, such as Gradient Boosting Machine (GBM), XGBoost, and AdaBoost, are a class of ensemble learning methods that build multiple decision trees sequentially [73]. Each new decision tree corrects the errors of previous ones by giving more focus to samples that were difficult to estimate [74]. The predictions of the trees are then combined to produce the final model prediction [75]. This group of algorithms has many advantages, such as training multiple models, which enhances prediction performance over training a single one, flexibility to handle different data types, the ability to capture non-linear patterns, and being less prone to overfitting [76].
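
As an illustration of the boosting approach described above, and of the typical comparison against logistic regression, the following sketch uses scikit-learn on synthetic tabular data; it is not meant to reproduce any reviewed study's results.

```python
# Illustrative sketch (synthetic tabular data): a tree-based boosting model of
# the kind most often reported as best-performing, compared against logistic
# regression using cross-validated AUC.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(
    n_samples=5000, n_features=30, n_informative=10, weights=[0.85], random_state=0)

# Sequentially fitted trees, each correcting the errors of the previous ones.
gbm = GradientBoostingClassifier(n_estimators=200, learning_rate=0.05,
                                 max_depth=3, random_state=0)
lr = LogisticRegression(max_iter=5000)

for name, model in [("boosting", gbm), ("logistic regression", lr)]:
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean AUC {auc.mean():.3f} (+/- {auc.std():.3f})")
```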

Many studies tended to compare the performance of different algorithms on the same dataset. In this regard, we suggest that conducting more studies focused solely on comparing the performance of commonly used ML algorithms is not needed unless they aim to benchmark new algorithms against existing ones. We propose that researchers should instead focus on how to generalize ML models and implement them in clinical practice.

There is an ongoing discussion about whether ML models can offer better predictive abilities than conventional statistical models such as logistic regression (LR). While some studies found that ML models outperform regression models [11,37,77–81], others suggest that using ML models gives no better prediction than LR [55,82,83]. In our analysis, ML models mostly performed better than regression models. This is consistent with a meta-analysis that concluded the same by comparing LR to advanced ML algorithms such as NN [84]. Regression models performed better in only 17% of the studies that compared regression with ML algorithms (Table 2 and Fig 5). This can be explained by LR being a parametric algorithm that lacks the flexibility of non-parametric ones [85], or by LR’s restrictive assumptions, which favor less restricted or assumption-free algorithms [86].

We also found that ML models outperform risk indexes in prediction performance. This is reasonable because risk indexes usually contain few predictors and aim mainly to simplify prediction, while ML models utilize more predictors and complex methods to capture patterns in the data. It could also be argued that ML models are developed and tested on the same dataset and may therefore be better tuned to that specific dataset, or even overfitted to it, whereas risk indexes are usually developed in one setting and then validated on different datasets and settings, which would favor ML models in such comparisons anyway.

Finally, two studies compared ML models to clinicians’ predictions. They concluded that the models outperform ED nurses in predicting hospital admission from the ED and that combining ML models with clinical insight improves model performance [87,88].

Model validation

External validation (EV) should ideally be conducted on datasets that are unrelated to, and structurally different from, the dataset used for model training [89,90]. If the validation dataset differs only temporally but still originates from the same setting and place, it is called temporal EV and is regarded as an approach that lies midway between internal and external validation [91]. This is because the overall patient characteristics are similar between the two datasets [92]. Our analysis shows a clear shortage of EV of models; most of the EV performed can be regarded as temporal EV. Although recent studies indicate an increased awareness of EV, it remains a critical and often missing step in the current development of ML models [93]. However, there are still several obstacles facing ML models’ generalizability. These obstacles can be categorized as either model-related or data-related. Model-related obstacles include issues with transparency in model development and results reporting. Data-related obstacles include the diversity of data structures, formats, and populations across different healthcare systems, the lack of a standardized data preprocessing framework, and strict health data privacy regulations.
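
A minimal sketch of a temporal validation split of this kind, training on earlier admissions and evaluating on later ones, using synthetic data and hypothetical column names:

```python
# Illustrative sketch: temporal validation -- train on pre-2019 admissions,
# evaluate on 2019 and later (labels here are random, so AUC will be ~0.5).
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 3000
data = pd.DataFrame(rng.normal(size=(n, 10)), columns=[f"x{i}" for i in range(10)])
data["admit_date"] = pd.to_datetime("2015-01-01") + pd.to_timedelta(
    rng.integers(0, 6 * 365, size=n), unit="D")
data["readmit_30d"] = rng.integers(0, 2, size=n)

cutoff = pd.Timestamp("2019-01-01")
train = data[data["admit_date"] < cutoff]
test = data[data["admit_date"] >= cutoff]

features = [f"x{i}" for i in range(10)]
model = GradientBoostingClassifier(random_state=0).fit(
    train[features], train["readmit_30d"])
auc = roc_auc_score(test["readmit_30d"], model.predict_proba(test[features])[:, 1])
print(f"Temporal validation AUC: {auc:.2f}")
```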

Adopting Common Data Models (CDMs) [94], designing a comprehensive and widely accepted framework for data preprocessing, and implementing Federated Learning (FL) [95] could help address these issues. In S1 File: Section 6, we provide a more detailed explanation of these obstacles and solutions.

Model explainability and availability

Model interpretation is of great importance in predicting health-related outcomes. Global model interpretation involves describing the most important rules and most influential features that the model learned during training [96], while local model interpretation refers to explaining how the model derived each individual prediction (i.e., for each patient) [97,98].

In our analysis, 62 studies (53%) provided a global interpretation of their model, for example in the form of feature importance or a risk score [28,99,100], while only three studies presented methods for local interpretation [56,78]. Introducing both global and local model interpretation is important to increase trustworthiness and to enhance the implementation of these models in practice [101–104].
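
A hedged sketch of producing both levels of interpretation with SHAP values, assuming the shap package is available (the reviewed studies used a variety of interpretation methods, not necessarily this one):

```python
# Illustrative sketch (synthetic data): global interpretation via mean absolute
# SHAP values and local interpretation for a single patient.
import numpy as np
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=2000, n_features=10, n_informative=5, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)          # shape: (n_samples, n_features)

# Global interpretation: average contribution of each feature across patients.
global_importance = np.abs(shap_values).mean(axis=0)
print("Global feature importance:", global_importance.round(3))

# Local interpretation: contribution of each feature to one patient's prediction.
patient_idx = 0
print("Patient-level contributions:", shap_values[patient_idx].round(3))
```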

Few studies made their dataset (20 studies) or code (17 studies) publicly available. To facilitate the technical reproducibility of a model, publishing both the dataset and the code is necessary. Healthcare datasets, however, contain patients’ confidential information, which hinders publishing them. Hence, some suggestions have been reported to partially address this issue, such as publishing a simulated dataset [105], providing complementary empirical results on an open-source benchmark dataset [106], or sharing model predictions and data labels to allow further statistical analysis [107]. There is also no doubt that publicly available datasets such as MIMIC-III [108] have boosted ML research and opened many opportunities to develop ML in the health domain. MIMIC-III has been cited more than 3,000 times to date. The dataset has enabled numerous studies that focus on developing predictive models and enhancing clinical decision support systems [109,110].

Reporting model development code and the experiments performed can help readers understand the final methodology, accelerate overall development, and ensure that models are safeguarded from data leakage and other pitfalls in model development [71,111]. Additionally, reporting the software and package versions is necessary. Many algorithmic decisions are made silently through the default settings of different packages, leading to differences in results when an experiment is repeated, even on the same dataset [112].

Bias risk and applicability

More than 60% of the assessed studies had a high risk of bias in line with other reviews’ findings [38,39,113]. Twelve studies were found to have a high concern of applicability. However, factors such as variability of populations, settings, and dataset characteristics are anticipated to further constrain the applicability of these studies.

In general, we observed poor quality of reporting in the studies. This is consistent with findings in other studies [24,114,115]. Poor reporting quality raises concerns about the reproducibility of models [105]. Studies that adhered to TRIPOD had better scores than those that did not. This points to the importance of adherence to a reporting checklist in ML studies, especially in the health domain. It also raises the need to develop ML-specific checklists for quality assessment and reporting. Ongoing research is currently addressing this requirement [116]. In S1 File: Section 7, we suggest a reporting scheme for ML studies.

Limitations

We identified the relevant literature from eight databases, but we did not approach authors for missing information on the studies. This was due to the considerable amount of missing information, which could have impacted the assessment of bias risk. Our results are also limited by the fact that most of the reviewed studies were based on data from the USA, which limits generalizability because of differences in populations and healthcare systems between countries. To address this limitation, future studies should aim to include diverse datasets from various countries and healthcare settings. Additionally, more efforts should be directed towards comparing models from different populations and settings to understand their limitations in different contexts. Assessing the quality of studies was also limited by not being able to access their code scripts. Potential publication bias also limits the ability of the review to comprehensively evaluate the overall results. Additionally, reporting quality varied significantly between studies, which can affect the reliability of the findings.

The heterogeneity of healthcare systems and patient populations across countries, and of ML algorithms and settings, limits comparisons of results between studies and makes it more difficult to harmonize the results of different models. Due to this heterogeneity, we had to make decisions regarding the inclusion criteria that may have caused us to miss relevant studies. Finally, only literature published in English was included, which also limits our insight into the overall picture of ML development globally.

Conclusions

The main purpose of the review was to describe how ML was used in predicting all-cause somatic hospitalizations. The review raises some concerns about the quality of data preprocessing, the reporting quality, reproducibility, local interpretation, and the external validity of many studies. The quality of studies needs to improve to meet the expectations of clinicians and stakeholders before using these models in clinical practice. We recommend that future studies should prioritize generalizing ML models and integrating them into clinical practice.

Supporting information

S1 File. Includes: Section 1: PRISMA checklist, Section 2: Detailed inclusion/exclusion criteria, Section 3: Literature search syntax, Section 4: Studies’ data sources, Section 5: Benchmarking with risk indexes, Section 6: A comment on the generalizability of ML models, and Section 7: A suggested reporting checklist specifically for ML models in structured datasets.

https://doi.org/10.1371/journal.pone.0309175.s001

(DOCX)

S2 File. Includes: (Sheet: Abbreviations), (Sheet: CHARMS) includes the extracted data for the reviewed studies and study citations, (Sheet: Features) includes feature-related extractions, (Sheet: PROBAST) includes the applicability and risk of bias assessment, (Sheet: TRIPOD) includes the reporting quality assessment, and (Sheet: Included & excluded studies) includes the full-text screened studies with the reason(s) for exclusion.

https://doi.org/10.1371/journal.pone.0309175.s002

(XLSX)

References

  1. 1. McDermott KW, Jiang HJ. Characteristics and Costs of Potentially Preventable Inpatient Stays, 2017. Healthcare Cost and Utilization Project (HCUP) Statistical Briefs. Agency for Healthcare Research and Quality (US); 2006. Available: https://www.ncbi.nlm.nih.gov/books/NBK559945/.
  2. 2. Jencks SF, Williams M V., Coleman EA. Rehospitalizations among Patients in the Medicare Fee-for-Service Program. N Engl J Med. 2009;361: 311–312. pmid:19605841
  3. 3. Lyhne CN, Bjerrum M, Riis AH, Jørgensen MJ. Interventions to Prevent Potentially Avoidable Hospitalizations: A Mixed Methods Systematic Review. Front Public Heal. 2022;10. pmid:35899150
  4. 4. Kansagara D, Englander H, Salanitro A, Kagen D, Theobald C, Freeman M, et al. Risk Prediction Models for Hospital Readmission. JAMA. 2011;306: 1688. pmid:22009101
  5. 5. Dhillon SK, Ganggayah MD, Sinnadurai S, Lio P, Taib NA. Theory and Practice of Integrating Machine Learning and Conventional Statistics in Medical Data Analysis. Diagnostics 2022, Vol 12, Page 2526. 2022;12: 2526. pmid:36292218
  6. 6. Wallace E, Stuart E, Vaughan N, Bennett K, Fahey T, Smith SM. Risk Prediction Models to Predict Emergency Hospital Admission in Community-dwelling Adults. Med Care. 2014;52: 751–765. pmid:25023919
  7. 7. Zhou H, Della PR, Roberts P, Goh L, Dhaliwal SS. Utility of models to predict 28-day or 30-day unplanned hospital readmissions: an updated systematic review. BMJ Open. 2016;6: e011060. pmid:27354072
  8. 8. Helm JM, Swiergosz AM, Haeberle HS, Karnuta JM, Schaffer JL, Krebs VE, et al. Machine Learning and Artificial Intelligence: Definitions, Applications, and Future Directions. Curr Rev Musculoskelet Med. 2020;13: 69–76. pmid:31983042
  9. 9. El Naqa I, Murphy MJ. What Is Machine Learning? Machine Learning in Radiation Oncology. Cham: Springer International Publishing; 2015. pp. 3–11. https://doi.org/10.1007/978-3-319-18305-3_1
  10. 10. Rajula HSR, Verlato G, Manchia M, Antonucci N, Fanos V. Comparison of Conventional Statistical Methods with Machine Learning in Medicine: Diagnosis, Drug Development, and Treatment. Medicina (B Aires). 2020;56: 455. pmid:32911665
  11. 11. Artetxe A, Beristain A, Graña M. Predictive models for hospital readmission risk: A systematic review of methods. Comput Methods Programs Biomed. 2018;164: 49–64. pmid:30195431
  12. 12. Teo K, Yong CW, Chuah JH, Hum YC, Tee YK, Xia K, et al. Current Trends in Readmission Prediction: An Overview of Approaches. Arab J Sci Eng. 2021. pmid:34422543
  13. 13. Benedetto U, Dimagli A, Sinha S, Cocomello L, Gibbison B, Caputo M, et al. Machine learning improves mortality risk prediction after cardiac surgery: Systematic review and meta-analysis. J Thorac Cardiovasc Surg. 2022;163: 2075–2087.e9. pmid:32900480
  14. 14. Chen T, Madanian S, Airehrour D, Cherrington M. Machine learning methods for hospital readmission prediction: systematic analysis of literature. J Reliab Intell Environ. 2022;8: 49–66.
  15. 15. Cho SM, Austin PC, Ross HJ, Abdel-Qadir H, Chicco D, Tomlinson G, et al. Machine Learning Compared With Conventional Statistical Models for Predicting Myocardial Infarction Readmission and Mortality: A Systematic Review. Can J Cardiol. 2021;37: 1207–1214. pmid:33677098
  16. 16. Sun Z, Dong W, Shi H, Ma H, Cheng L, Huang Z. Comparing Machine Learning Models and Statistical Models for Predicting Heart Failure Events: A Systematic Review and Meta-Analysis. Front Cardiovasc Med. 2022;9. pmid:35463786
  17. 17. Mahajan SM, Heidenreich P, Abbott B, Newton A, Ward D. Predictive models for identifying risk of readmission after index hospitalization for heart failure: A systematic review. Eur J Cardiovasc Nurs J Work Gr Cardiovasc Nurs Eur Soc Cardiol. 2018;17: 675–689. pmid:30189748
  18. 18. Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. Syst Rev. 2021;10: 89. pmid:33781348
  19. 19. Moons KGM, de Groot JAH, Bouwmeester W, Vergouwe Y, Mallett S, Altman DG, et al. Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies: The CHARMS Checklist. PLoS Med. 2014;11: e1001744. pmid:25314315
  20. 20. Debray TPA, Damen JAAG, Snell KIE, Ensor J, Hooft L, Reitsma JB, et al. A guide to systematic review and meta-analysis of prediction model performance. BMJ. 2017;356: 6460. pmid:28057641
  21. 21. Clark JM, Sanders S, Carter M, Honeyman D, Cleo G, Auld Y, et al. Improving the translation of search strategies using the Polyglot Search Translator: a randomized controlled trial. J Med Libr Assoc. 2020;108. pmid:32256231
  22. 22. Wolff RF, Moons KGM, Riley RD, Whiting PF, Westwood M, Collins GS, et al. PROBAST: A Tool to Assess the Risk of Bias and Applicability of Prediction Model Studies. Ann Intern Med. 2019;170: 51. pmid:30596875
  23. 23. Collins GS, Reitsma JB, Altman DG, Moons K. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD Statement. BMC Med. 2015;13: 1. pmid:25563062
  24. 24. Andaur Navarro CL, Damen JAA, Takada T, Nijman SWJ, Dhiman P, Ma J, et al. Completeness of reporting of clinical prediction models developed using supervised machine learning: a systematic review. BMC Med Res Methodol. 2022;22: 12. pmid:35026997
  25. 25. van Walraven C, Dhalla IA, Bell C, Etchells E, Stiell IG, Zarnke K, et al. Derivation and validation of an index to predict early death or unplanned readmission after discharge from hospital to the community. CMAJ. 2010;182: 551–7. pmid:20194559
  26. 26. Billings J, Blunt I, Steventon A, Georghiou T, Lewis G, Bardsley M. Development of a predictive model to identify inpatients at risk of re-admission within 30 days of discharge (PARR-30). BMJ Open. 2012;2. pmid:22885591
  27. 27. Donzé J, Aujesky D, Williams D, Schnipper JL. Potentially Avoidable 30-Day Hospital Readmissions in Medical Patients. JAMA Intern Med. 2013;173: 632. pmid:23529115
  28. 28. Fenn A, Davis C, Buckland DM, Kapadia N, Nichols M, Gao M, et al. Development and Validation of Machine Learning Models to Predict Admission From Emergency Department to Inpatient and Intensive Care Units. Ann Emerg Med. 2021;78: 290–302. pmid:33972128
  29. 29. Chandra A, Rahman PA, Sneve A, McCoy RG, Thorsteinsdottir B, Chaudhry R, et al. Risk of 30-Day Hospital Readmission Among Patients Discharged to Skilled Nursing Facilities: Development and Validation of a Risk-Prediction Model. J Am Med Dir Assoc. 2019;20: 444–450.e2. pmid:30852170
  30. 30. Spangler D, Hermansson T, Smekal D, Blomberg H. A validation of machine learning-based risk scores in the prehospital setting. PLoS One. 2019;14: e0226518. pmid:31834920
  31. 31. Raita Y, Goto T, Faridi MK, Brown DFM, Camargo CAJ, Hasegawa K. Emergency department triage prediction of clinical outcomes using machine learning models. Crit Care. 2019;23: 64. pmid:30795786
  32. 32. Rahimian F, Salimi-Khorshidi G, Payberah AH, Tran J, Ayala Solares R, Raimondi F, et al. Predicting the risk of emergency admission with machine learning: Development and validation using linked electronic health records. PLoS Med. 2018;15: e1002695. pmid:30458006
  33. 33. Hegselmann S, Ertmer C, Volkert T, Gottschalk A, Dugas M, Varghese J. Development and validation of an interpretable 3 day intensive care unit readmission prediction model using explainable boosting machines. Front Med. 2022;9. pmid:36082270
  34. 34. Olza A, Millán E, Rodríguez-Álvarez MX. Development and validation of predictive models for unplanned hospitalization in the Basque Country: analyzing the variability of non-deterministic algorithms. BMC Med Inform Decis Mak. 2023;23: 152. pmid:37543596
  35. 35. Brankovic A, Rolls D, Boyle J, Niven P, Khanna S. Identifying patients at risk of unplanned re-hospitalisation using statewide electronic health records. Sci Rep. 2022;12: 16592. pmid:36198757
  36. 36. Dadabhoy FZ, Driver L, McEvoy DS, Stevens R, Rubins D, Dutta S. Prospective External Validation of a Commercial Model Predicting the Likelihood of Inpatient Admission From the Emergency Department. Ann Emerg Med. 2023;81: 738–748. pmid:36682997
  37. 37. Shin S, Austin PC, Ross HJ, Abdel‐Qadir H, Freitas C, Tomlinson G, et al. Machine learning vs. conventional statistical models for predicting heart failure readmission and mortality. ESC Hear Fail. 2021;8: 106–115. pmid:33205591
  38. 38. Huang Y, Talwar A, Chatterjee S, Aparasu RR. Application of machine learning in predicting hospital readmissions: a scoping review of the literature. BMC Med Res Methodol. 2021;21: 96. pmid:33952192
  39. 39. Kamel Rahimi A, Canfell OJ, Chan W, Sly B, Pole JD, Sullivan C, et al. Machine learning models for diabetes management in acute care using electronic medical records: A systematic review. Int J Med Inform. 2022;162: 104758. pmid:35398812
  40. 40. Kashyap M, Seneviratne M, Banda JM, Falconer T, Ryu B, Yoo S, et al. Development and validation of phenotype classifiers across multiple sites in the observational health data sciences and informatics network. J Am Med Inform Assoc. 2020;27: 877–883. pmid:32374408
  41. 41. Demir E. A Decision Support Tool for Predicting Patients at Risk of Readmission: A Comparison of Classification Trees, Logistic Regression, Generalized Additive Models, and Multivariate Adaptive Regression Splines. Decis Sci. 2014;45: 849–880.
  42. 42. Junqueira ARB, Mirza F, Baig MM. A machine learning model for predicting ICU readmissions and key risk factors: analysis from a longitudinal health records. Health Technol (Berl). 2019;9: 297–309.
  43. 43. Hong WS, Haimovich AD, Taylor RA. Predicting hospital admission at emergency department triage using machine learning. PLoS One. 2018;13: e0201016. pmid:30028888
  44. 44. Sabbatini AK, Kocher KE, Basu A, Hsia RY. In-Hospital Outcomes and Costs Among Patients Hospitalized During a Return Visit to the Emergency Department. JAMA. 2016;315: 663. pmid:26881369
  45. 45. Adams JG. Ensuring the Quality of Quality Metrics for Emergency Care. JAMA. 2016;315: 659. pmid:26881367
  46. 46. Fialho AS, Cismondi F, Vieira SM, Reti SR, Sousa JMC, Finkelstein SN. Data mining using clinical physiology at discharge to predict ICU readmissions. Expert Syst Appl. 2012;39: 13158–13165.
  47. 47. Lorenzana A, Tyagi M, Wang QC, Chawla R, Nigam S. Using text notes from call center data to predict hospitalization. Value Heal. 2016;19: A87.
  48. 48. Jayousi, Rashid; Assaf R. 30-day Hospital Readmission Prediction using MIMIC Data. The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Conference Proceedings. Al-Quds University,Jerusalem,Palestine: The Institute of Electrical and Electronics Engineers, Inc. (IEEE); 2020. pp. 1–6. http://dx.doi.org/10.1109/AICT50176.2020.9368625.
  49. 49. Xue Y, Klabjan D, Luo Y. Predicting ICU readmission using grouped physiological and medication trends. Artif Intell Med. 2019;95: 27–37. pmid:30213670
  50. 50. Feretzakis G, Karlis G, Loupelis E, Kalles D, Chatzikyriakou R, Trakas N, et al. Using Machine Learning Techniques to Predict Hospital Admission at the Emergency Department. J Crit Care Med. 2022;8: 107–116. pmid:35950158
  51. 51. Aphinyanaphongs Y, Liang Y, Theobald J, Grover H, Swartz JL. Models to predict hospital admission from the emergency department through the sole use of the medication administration record. Acad Emerg Med. 2016;23: S116.
  52. 52. Lucini FR, Fogliatto FS, da Silveira GJC, Neyeloff JL, Anzanello MJ, Kuchenbecker R de S, et al. Text mining approach to predict hospital admissions using early medical records from the emergency department. Int J Med Inform. 2017;100: 1–8. pmid:28241931
  53. 53. Curto S, Carvalho JP, Salgado C, Vieira SM, Sousa JMC. Predicting ICU readmissions based on bedside medical text notes. 2016 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE). Piscataway: IEEE; 2016. pp. 2144-a-2151-h. https://doi.org/10.1109/FUZZ-IEEE.2016.7737956
  54. 54. Handly N, Thompson DA, Li J, Chuirazzi DM, Venkat A. Evaluation of a hospital admission prediction model adding coded chief complaint data using neural network methodology. Eur J Emerg Med. 2015;22: 87–91. pmid:24509606
  55. 55. Zhang X, Kim J, Patzer RE, Pitts SR, Patzer A, Schrager JD. Prediction of Emergency Department Hospital Admission Based on Natural Language Processing and Neural Networks. Methods Inf Med. 2017;56: 377–389. pmid:28816338
  56. 56. Hilton CB, Milinovich A, Felix C, Vakharia N, Crone T, Donovan C, et al. Personalized predictions of patient outcomes during and after hospitalization using artificial intelligence. NPJ Digit Med. 2020;3: 1–8. pmid:32285012
  57. 57. Fernandes M, Mendes R, Vieira SM, Leite F, Palos C, Johnson A, et al. Predicting Intensive Care Unit admission among patients presenting to the emergency department using machine learning and natural language processing. Olier I, editor. PLoS One. 2020;15: e0229331. pmid:32126097
  58. 58. Topaz M, Woo K, Ryvicker M, Zolnoori M, Cato K. Home Healthcare Clinical Notes Predict Patient Hospitalization and Emergency Department Visits. Nurs Res. 2020;69: 448–454. pmid:32852359
  59. 59. Boggan JC, Schulteis RD, Simel DL, Lucas JE. Use of a natural language processing algorithm to predict readmissions at a veterans affairs hospital. J Gen Intern Med. 2019;34: S396–S397.
  60. 60. Teo K, Yong CW, Chuah JH, Hum YC, Tee YK, Xia K, et al. Current Trends in Readmission Prediction: An Overview of Approaches. Arab J Sci Eng. pmid:34422543
  61. 61. Li Z, Xing X, Lu B, Zhao Y, Li Z. Early Prediction of 30-Day ICU Re-admissions Using Natural Language Processing and Machine Learning. Biomed Stat Informatics. 2019;4: 22.
  62. 62. Sterling NW, Patzer RE, Di M, Schrager JD. Prediction of emergency department patient disposition based on natural language processing of triage notes. Int J Med Inform. 2019;129: 184–188. pmid:31445253
  63. 63. Velupillai S, Suominen H, Liakata M, Roberts A, Shah AD, Morley K, et al. Using clinical Natural Language Processing for health outcomes research: Overview and actionable suggestions for future advances. J Biomed Inform. 2018;88: 11–19. pmid:30368002
  64. 64. Sheikhalishahi S, Miotto R, Dudley JT, Lavelli A, Rinaldi F, Osmani V. Natural Language Processing of Clinical Notes on Chronic Diseases: Systematic Review. JMIR Med Informatics. 2019;7: e12239. pmid:31066697
  65. 65. Johnson JM, Khoshgoftaar TM. Survey on deep learning with class imbalance. J Big Data. 2019;6: 27.
  66. 66. Artetxe A, Graña M, Beristain A, Ríos S. Emergency Department Readmission Risk Prediction: A Case Study in Chile. In: Vicente JMF, AlvarezSanchez JR, Lopez F, Moreo JT, Adeli H, editors. BIOMEDICAL APPLICATIONS BASED ON NATURAL AND ARTIFICIAL COMPUTING, PT II. Vicomtech IK4 Res Ctr, Mikeletegi Pasealekua 57, San Sebastian 20009, Spain; 2017. pp. 11–20. https://doi.org/10.1007/978-3-319-59773-7_2
  67. 67. Tanha J, Abdi Y, Samadi N, Razzaghi N, Asadpour M. Boosting methods for multi-class imbalanced data classification: an experimental review. J Big Data. 2020;7: 70.
  68. 68. Guo X, Yin Y, Dong C, Yang G, Zhou G. On the Class Imbalance Problem. 2008 Fourth International Conference on Natural Computation. IEEE; 2008. pp. 192–201. https://doi.org/10.1109/ICNC.2008.871
  69. 69. He Haibo, Garcia EA. Learning from Imbalanced Data. IEEE Trans Knowl Data Eng. 2009;21: 1263–1284.
  70. 70. Kaur H, Pannu HS, Malhi AK. A Systematic Review on Imbalanced Data Challenges in Machine Learning. ACM Comput Surv. 2020;52: 1–36.
  71. Norgeot B, Quer G, Beaulieu-Jones BK, Torkamani A, Dias R, Gianfrancesco M, et al. Minimum information about clinical artificial intelligence modeling: the MI-CLAIM checklist. Nat Med. 2020;26: 1320–1324. pmid:32908275
  72. Wolpert DH, Macready WG. No free lunch theorems for optimization. IEEE Trans Evol Comput. 1997;1: 67–82.
  73. Zhou Z-H. Ensemble Learning. Encyclopedia of Biometrics. Boston, MA: Springer US; 2009. pp. 270–273. https://doi.org/10.1007/978-0-387-73003-5_293
  74. Zhang Y, Haghani A. A gradient boosting method to improve travel time prediction. Transp Res Part C Emerg Technol. 2015;58: 308–324.
  75. Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: ACM; 2016. pp. 785–794. https://doi.org/10.1145/2939672.2939785
  76. Opitz D, Maclin R. Popular Ensemble Methods: An Empirical Study. J Artif Intell Res. 1999;11: 169–198.
  77. Graham B, Bond R, Quinn M, Mulvenna M. Using Data Mining to Predict Hospital Admissions From the Emergency Department. IEEE Access. 2018;6: 10458–10469.
  78. Lo Y-T, Liao JC-H, Chen M-H, Chang C-M, Li C-T. Predictive modeling for 14-day unplanned hospital readmission risk by using machine learning algorithms. BMC Med Inform Decis Mak. 2021;21: 288. pmid:34670553
  79. Futoma J, Morris J, Lucas J. A comparison of models for predicting early hospital readmissions. J Biomed Inform. 2015;56: 229–238. pmid:26044081
  80. Li Q, Yao X, Échevin D. How Good Is Machine Learning in Predicting All-Cause 30-Day Hospital Readmission? Evidence From Administrative Data. Value Health. 2020;23: 1307–1315. pmid:33032774
  81. Barbieri S, Kemp J, Perez-Concha O, Kotwal S, Gallagher M, Ritchie A, et al. Benchmarking Deep Learning Architectures for Predicting Readmission to the ICU and Describing Patients-at-Risk. Sci Rep. 2020;10: 1111. pmid:31980704
  82. Christodoulou E, Ma J, Collins GS, Steyerberg EW, Verbakel JY, Van Calster B. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J Clin Epidemiol. 2019;110: 12–22. pmid:30763612
  83. Gravesteijn BY, Nieboer D, Ercole A, Lingsma HF, Nelson D, van Calster B, et al. Machine learning algorithms performed no better than regression models for prognostication in traumatic brain injury. J Clin Epidemiol. 2020;122: 95–107. pmid:32201256
  84. Talwar A, Lopez-Olivo MA, Huang Y, Ying L, Aparasu RR. Performance of advanced machine learning algorithms over logistic regression in predicting hospital readmissions: A meta-analysis. Explor Res Clin Soc Pharm. 2023; 100317. pmid:37662697
  85. Kino S, Hsu Y-T, Shiba K, Chien Y-S, Mita C, Kawachi I, et al. A scoping review on the use of machine learning in research on social determinants of health: Trends and research prospects. SSM Popul Health. 2021;15: 100836. pmid:34169138
  86. Mesgarpour M, Chaussalet T, Chahed S. Ensemble Risk Model of Emergency Admissions (ERMER). Int J Med Inform. 2017;103: 65–77. pmid:28551003
  87. Peck JS, Benneyan JC, Nightingale DJ, Gaehde SA. Predicting emergency department inpatient admissions to improve same-day patient flow. Acad Emerg Med. 2012;19: E1045–54. pmid:22978731
  88. Flaks-Manov N, Shadmi E, Yahalom R, Perry-Mezre H, Balicer R, Srulovici E. Identification of elderly patients at-risk for 30-day readmission: clinical insight beyond big data prediction. J Nurs Manag. 2021. pmid:34661943
  89. Ramspek CL, Jager KJ, Dekker FW, Zoccali C, van Diepen M. External validation of prognostic models: what, why, how, when and where? Clin Kidney J. 2021;14: 49–58. pmid:33564405
  90. Riley RD, Ensor J, Snell KIE, Debray TPA, Altman DG, Moons KGM, et al. External validation of clinical prediction models using big datasets from e-health records or IPD meta-analysis: opportunities and challenges. BMJ. 2016; i3140. pmid:27334381
  91. Staartjes VE, Kernbach JM. Significance of external validation in clinical machine learning: let loose too early? Spine J. 2020;20: 1159–1160. pmid:32624150
  92. Altman DG, Vergouwe Y, Royston P, Moons KGM. Prognosis and prognostic research: validating a prognostic model. BMJ. 2009;338: b605. pmid:19477892
  93. Cabitza F, Campagner A, Soares F, García de Guadiana-Romualdo L, Challa F, Sulejmani A, et al. The importance of being external. methodological insights for the external validation of machine learning models in medicine. Comput Methods Programs Biomed. 2021;208: 106288. pmid:34352688
  94. Ryu B, Yoo S, Kim S, Choi J. Development of Prediction Models for Unplanned Hospital Readmission within 30 Days Based on Common Data Model: A Feasibility Study. Methods Inf Med. 2021. pmid:34583416
  95. Rieke N, Hancox J, Li W, Milletarì F, Roth HR, Albarqouni S, et al. The future of digital health with federated learning. npj Digit Med. 2020;3: 119. pmid:33015372
  96. Yang C, Rangarajan A, Ranka S. Global Model Interpretation via Recursive Partitioning. 2018 [cited 27 Dec 2022]. Available: http://arxiv.org/abs/1802.04253.
  97. Kopitar L, Cilar L, Kocbek P, Stiglic G. Local vs. Global Interpretability of Machine Learning Models in Type 2 Diabetes Mellitus Screening. 2019. pp. 108–119.
  98. Du M, Liu N, Hu X. Techniques for interpretable machine learning. Commun ACM. 2019;63: 68–77.
  99. Wu CX, Suresh E, Phng FWL, Tai KP, Pakdeethai J, D’Souza JLA, et al. Effect of a Real-Time Risk Score on 30-day Readmission Reduction in Singapore. Appl Clin Inform. 2021;12: 372–382. pmid:34010978
  100. Maali Y, Perez-Concha O, Coiera E, Roffe D, Day RO, Gallego B. Predicting 7-day, 30-day and 60-day all-cause unplanned readmission: a case study of a Sydney hospital. BMC Med Inform Decis Mak. 2018;18: 1. pmid:29301576
  101. Petch J, Di S, Nelson W. Opening the Black Box: The Promise and Limitations of Explainable Machine Learning in Cardiology. Can J Cardiol. 2022;38: 204–213. pmid:34534619
  102. Ribeiro MT, Singh S, Guestrin C. “Why should I trust you?” Explaining the predictions of any classifier. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery; 2016. pp. 1135–1144.
  103. Sheu Y. Illuminating the Black Box: Interpreting Deep Neural Network Models for Psychiatric Research. Front Psychiatry. 2020;11. pmid:33192663
  104. Barredo Arrieta A, Díaz-Rodríguez N, Del Ser J, Bennetot A, Tabik S, Barbado A, et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inf Fusion. 2020;58: 82–115.
  105. McDermott MBA, Wang S, Marinsek N, Ranganath R, Foschini L, Ghassemi M. Reproducibility in machine learning for health research: Still a ways to go. Sci Transl Med. 2021;13. pmid:33762434
  106. Pineau J, Vincent-Lamarre P, Sinha K, Larivière V, Beygelzimer A, d’Alché-Buc F, et al. Improving Reproducibility in Machine Learning Research (A Report from the NeurIPS 2019 Reproducibility Program). J Mach Learn Res. 2020;22: 1–20.
  107. Haibe-Kains B, Adam GA, Hosny A, Khodakarami F, Shraddha T, Kusko R, et al. Transparency and reproducibility in artificial intelligence. Nature. 2020;586: E14–E16. pmid:33057217
  108. Johnson AEW, Pollard TJ, Shen L, Lehman LH, Feng M, Ghassemi M, et al. MIMIC-III, a freely accessible critical care database. Sci Data. 2016;3: 160035. pmid:27219127
  109. Harutyunyan H, Khachatrian H, Kale DC, Ver Steeg G, Galstyan A. Multitask learning and benchmarking with clinical time series data. Sci Data. 2019;6: 96. pmid:31209213
  110. Johnson AEW, Ghassemi MM, Nemati S, Niehaus KE, Clifton D, Clifford GD. Machine Learning and Decision Support in Critical Care. Proc IEEE. 2016;104: 444–466. pmid:27765959
  111. Varoquaux G, Cheplygina V. Machine learning for medical imaging: methodological failures and recommendations for the future. npj Digit Med. 2022;5: 48. pmid:35413988
  112. Beam AL, Manrai AK, Ghassemi M. Challenges to the Reproducibility of Machine Learning Models in Health Care. JAMA. 2020;323: 305. pmid:31904799
  113. Lans A, Pierik RJB, Bales JR, Fourman MS, Shin D, Kanbier LN, et al. Quality assessment of machine learning models for diagnostic imaging in orthopaedics: A systematic review. Artif Intell Med. 2022;132: 102396. pmid:36207080
  114. Yusuf M, Atal I, Li J, Smith P, Ravaud P, Fergie M, et al. Reporting quality of studies using machine learning models for medical diagnosis: a systematic review. BMJ Open. 2020;10: e034568. pmid:32205374
  115. Li J, Zhou Z, Dong J, Fu Y, Li Y, Luan Z, et al. Predicting breast cancer 5-year survival using machine learning: A systematic review. Baltzer PAT, editor. PLoS One. 2021;16: e0250370. pmid:33861809
  116. Collins GS, Dhiman P, Andaur Navarro CL, Ma J, Hooft L, Reitsma JB, et al. Protocol for development of a reporting guideline (TRIPOD-AI) and risk of bias tool (PROBAST-AI) for diagnostic and prognostic prediction model studies based on artificial intelligence. BMJ Open. 2021;11: e048008. pmid:34244270