Abstract
Background
Surgical site infections (SSIs) lead to increased mortality and morbidity, as well as increased healthcare costs. Multiple models for the prediction of this serious surgical complication have been developed, with an increasing use of machine learning (ML) tools.
Objective
The aim of this systematic review was to assess the performance as well as the methodological quality of validated ML models for the prediction of SSIs.
Methods
A systematic search in PubMed, Embase and the Cochrane Library was performed from inception until July 2023. Exclusion criteria were the absence of reported model validation, SSIs as part of a composite adverse outcome, and pediatric populations. ML performance measures were evaluated, and ML performances were compared to regression-based methods for studies that reported both methods. Risk of bias (ROB) of the studies was assessed using the Prediction model Risk of Bias Assessment Tool.
Results
Of the 4,377 studies screened, 24 were included in this review, describing 85 ML models. Most models were only internally validated (81%). The C-statistic was the most used performance measure (reported in 96% of the studies) and only two studies reported calibration metrics. A total of 116 different predictors were described, of which age, steroid use, sex, diabetes, and smoking were most frequently incorporated (in 75% to 100% of models). Thirteen studies compared ML models to regression-based models and showed a similar performance of both modelling methods. For all included studies, the overall ROB was high or unclear.
Conclusions
A multitude of ML models for the prediction of SSIs are available, with large variability in performance. However, most models lacked external validation, performance was reported limitedly, and the risk of bias was high. In studies describing both ML models and regression-based models, one modelling method did not outperform the other.
Citation: van Boekel AM, van der Meijden SL, Arbous SM, Nelissen RGHH, Veldkamp KE, Nieswaag EB, et al. (2024) Systematic evaluation of machine learning models for postoperative surgical site infection prediction. PLoS ONE 19(12): e0312968. https://doi.org/10.1371/journal.pone.0312968
Editor: Mohamad K. Abou Chaar, Mayo Clinic Rochester, UNITED STATES OF AMERICA
Received: January 21, 2024; Accepted: October 15, 2024; Published: December 12, 2024
Copyright: © 2024 van Boekel et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting Information files.
Funding: The author(s) received no specific funding for this work.
Competing interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: B.F. Geerts declares to be shareholder and owner of Healthplus.ai S.L. van der Meijden, M. Wiewel, E.B. Nieswaag, K.F.T. Jochems, J. Holtz, A. van IJlzinga Veenstra, and J. Reijman declare to be an employee of Healthplus.ai. This does not alter our adherence to PLOS ONE policies on sharing data and materials.
Introduction
Surgical site infections (SSIs) are known complications following surgery and are among the most frequently occurring hospital-acquired infections. The incidence of SSIs ranges between 0.6% and 18%, depending on the type of surgical procedure and setting [1–4]. Surgical site infections lead to increased morbidity, mortality and hospital stay, with a negative impact on the patient's health-related quality of life [2]. Moreover, SSIs increase healthcare costs due to prolonged hospitalization, the need for additional diagnostic tests and interventions, and prolonged treatment. Recent meta-analyses showed an additional length of hospital stay of 2.1 to 54 days for patients with an SSI [2], with an estimated cost ranging from USD 10,443 to USD 25,546 per case [3]. Early detection and treatment are important for reducing these negative effects of SSIs.
Several risk factors for the development of SSIs have been identified, such as sex, BMI, comorbidity, American Society of Anesthesiologists (ASA) score, smoking, age and surgical approach [5, 6]. Several prognostic prediction models have been developed to identify which patients are at risk of developing an SSI. Besides traditional models, such as those using logistic regression [7], machine learning (ML) models are increasingly being developed and used for this purpose. ML comprises a wide spectrum of algorithms that automatically learn from presented and new input data in a continuous iterative process, with variable selection performed by the algorithms themselves. This is in contrast to traditional models, where variable selection and internal model settings are more dictated by humans [8–10]. ML models benefit not only from this iterative learning process, but also from using more, and more diverse, types of input variables. Their complex algorithmic structure can capture non-linear relations between variables, in contrast to traditional regression-based models [11]. A disadvantage of ML models is that they produce “black-box” predictions, in which the data used for the model output, the (relative) importance of these data and their possible mutual effects are less evident than in regression-based models [12, 13].
To evaluate the statistical performance of prediction models, discriminative performance in terms of concordance statistics (C-statistic), also known as area under the receiver operating characteristic curve (ROC or area under the curve -AUC-), and calibration in terms of calibration plots with slope and intercept are most often assessed [14]. Discriminative performance is the ability of the model to distinguish between patients with and without the outcome, whereas calibration is the agreement between the predicted probability and the proportion of patients with the actual outcome. Prediction models are first internally validated, using for example cross-validation or bootstrapping. Thereafter, external validation should be performed either on other hospital datasets, prospectively in time, or both, to ensure generalizability [15].
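As an illustration of the two measures described above, the C-statistic can be computed as the proportion of all (event, non-event) patient pairs in which the patient with the event received the higher predicted risk, and calibration can be summarized "in the large" as the gap between mean predicted and observed risk. A minimal sketch in Python, with toy data and function names of our own (a full calibration assessment would also require the slope and intercept of a calibration plot, as the text notes):

```python
def c_statistic(y_true, y_score):
    """Concordance statistic (AUC): fraction of (event, non-event) pairs
    in which the event patient gets the higher predicted risk; ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def calibration_in_the_large(y_true, y_score):
    """Mean predicted risk minus observed event rate
    (0 means well calibrated on average)."""
    return sum(y_score) / len(y_score) - sum(y_true) / len(y_true)

# Toy example: 2 patients with an SSI (label 1) and 2 without (label 0).
y = [0, 0, 1, 1]
p = [0.10, 0.40, 0.35, 0.80]
print(c_statistic(y, p))              # 0.75: 3 of 4 pairs are concordant
print(calibration_in_the_large(y, p))
```

A model can discriminate perfectly (C-statistic 1.0) yet be badly calibrated, for example when every predicted risk is systematically too high, which is why the review treats the two measures as complementary.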
ML models are increasingly being developed for many different purposes in surgery [7]. Elfanagely et al [16] described 45 ML models used for the prediction of surgical outcomes, and another review [17] summarized the outcomes of 212 articles with ML models developed for predicting a broad spectrum of outcomes in vascular surgery. The ML models performed reasonably well, but there were concerns regarding the risk of bias. A recent systematic review and meta-analysis by Wu et al. showed that many different ML models exist for the prediction or detection of SSIs, but that the validation of these models is generally insufficient [18]. Wu et al. mainly focused on the methodological aspects of the models and made no distinction between SSI prediction and SSI detection for surveillance purposes. Moreover, a clear overview of the available models per surgical specialty or SSI subtype (superficial, deep or organ space SSI) is still missing. The number of models developed for SSI prediction is increasing, and new models have been developed since 2021. The aim of this systematic review was therefore to describe the performance of all internally or externally validated ML models for the prediction of SSI, to assess the methodological quality of the studies describing these models, and to give an overview of the available models per surgical specialty and SSI subtype.
Methods
A systematic review of the published literature on the prediction of postoperative infections was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement (S1 Appendix). The protocol for this study was registered in PROSPERO (registration number 248953).
Search strategy
The literature search was performed in MEDLINE, EMBASE and the Cochrane Library from inception to July 1, 2023. The complete search strings are shown in the Supplementary material (S2 Appendix).
Inclusion and exclusion criteria
All original studies that developed and validated (internally or externally) ML models for the prediction of SSIs, as well as studies that externally validated previously developed ML models, were included. A model was considered an ML model if a non-regression-based approach was used for model development, such as random forests, support vector machines or neural networks. As outcome, prediction of all types of SSIs within 30 days postoperatively was included. Models that only predicted SSIs as part of a composite adverse outcome were excluded. Other exclusion criteria were pediatric populations (age <18 years), no full-text article available, and articles not written in English.
Screening and data extraction
Study selection was performed using the Covidence® software program (www.covidence.org, Melbourne, Australia). After removal of duplicates, titles and abstracts were screened against the inclusion criteria by two independent authors (AB, BG, or MW). Full-text analysis of the remaining articles was performed by the same authors. All conflicts were resolved by a third reviewer (MB or SA).
The following data were obtained from the included articles: type of SSI predicted (superficial, deep or organ space), surgical specialty, number of surgeries and/or patients, performance parameters of the model (sensitivity, specificity, accuracy, calibration and C-statistic), method of validation, variables used as predictors, and all types of developed and/or validated models (ML as well as regression-based models). A complete list of the extracted data is provided in S2 Table. Reviewers used a standardized data extraction form based on the CHARMS checklist (CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies) [19]. Extracted data were double-checked for inconsistencies by AB and BG, and discrepancies were resolved by consensus.
Descriptive analyses
Results were summarized using descriptive statistics. We did not perform a meta-analysis due to the heterogeneity in reported outcome measures and definitions. Analyses were performed in R (R Core Team, Vienna, Austria) using RStudio (version 2023.06.1+524).
Risk of bias
The methodological quality of all included studies was assessed using the Prediction model Risk of Bias Assessment Tool (PROBAST) [20, 21]. The PROBAST is designed to critically appraise prediction model studies and covers two aspects: risk of bias, assessed in four domains (participant selection, predictor selection, outcome definition and analysis), and applicability to the review question. In total there are 20 signaling questions, each scored as ‘yes’, ‘probably yes’, ‘probably no’, ‘no’, or ‘no information’, which combined lead to a rating of low, unclear, or high risk of bias and concerns regarding applicability.
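The aggregation of signaling-question answers into domain and overall ratings can be sketched as a simplified rule. This is only an approximation of the published PROBAST guidance (which also allows rater judgment, e.g. rating a domain low risk despite a 'no' when justified), not a faithful implementation of the tool:

```python
def domain_rating(answers):
    """Simplified PROBAST domain rating from signaling-question answers
    ('Y', 'PY', 'PN', 'N', 'NI'). Any (probable) 'no' raises the domain
    to high risk; missing information makes it unclear."""
    if any(a in ('N', 'PN') for a in answers):
        return 'high'
    if 'NI' in answers:
        return 'unclear'
    return 'low'

def overall_rob(domain_ratings):
    """Overall risk of bias: high if any domain is high, low only if
    all domains are low, otherwise unclear."""
    if 'high' in domain_ratings:
        return 'high'
    if all(r == 'low' for r in domain_ratings):
        return 'low'
    return 'unclear'

# Example: one 'no' in the analysis domain drives the overall rating to high,
# regardless of how the other three domains score.
ratings = [domain_rating(d) for d in
           [['Y', 'Y'], ['PY', 'Y'], ['Y', 'NI'], ['Y', 'N']]]
print(ratings)            # ['low', 'low', 'unclear', 'high']
print(overall_rob(ratings))  # high
```

This coarse aggregation also illustrates the criticism discussed later in the article: a domain with a single 'no' answer and a domain with all answers 'no' receive the same high-risk rating.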
Results
A flowchart of the search is summarized in Fig 1. Of the 4,377 publications identified, 24 studies were included for further analysis. See S1 Table for the reasons for exclusion of full-text articles.
Characteristics of included studies
The 24 included studies described a total of 85 different ML models. Sixty-nine models (81%) were internally validated and 16 (19%) were externally validated, including one model (the Predictive OpTimal Trees in Emergency Surgery Risk (POTTER) Calculator) that was externally validated in five separate studies. The most frequently predicted outcome was SSI in general (i.e., a combination of superficial, deep and organ space SSI, or unspecified); 11 models predicted superficial SSI, nine predicted deep SSI and 24 predicted organ space SSI. Abdominal surgery was the surgical specialty for which most models were developed (47%), followed by general surgery (21%) and orthopedic surgery (8%). See Table 1 for an overview of all included studies.
Performance of ML models
The most commonly reported measure of model performance was the C-statistic, reported in 96% of the studies. Other reported performance measures were sensitivity, specificity, negative predictive value (NPV) and positive predictive value (PPV). Only two studies reported calibration metrics, of which one also reported the Brier score [39, 44]. For the internally validated models, the median C-statistic was 0.62 (range 0.44 to 0.99); for the externally validated models, the median C-statistic was 0.79 (range 0.55 to 0.87). Sensitivity, specificity, NPV and PPV were reported for one externally validated model, by Grass et al., and were 0.47, 0.80, 0.97 and 0.10, respectively. Of the internally validated models, sensitivity was reported for twenty (29%) models and ranged from 0.24 to 0.90, specificity for fifteen (22%) models (0.25 to 0.91), NPV for four (6%) models (0.87 to 0.98) and PPV for eleven (16%) models (0.06 to 0.90). Overall, the performance of the models varied widely and there was no clear difference between surgical specialties or between types of SSI predicted (Tables 2–5).
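The reported pattern of a high NPV alongside a low PPV is exactly what one would expect at the low event rates typical of SSIs. A hypothetical illustration in Python: the cohort size and cell counts below are our own construction, chosen to roughly match the sensitivity and specificity reported for the Grass et al. model, and are not data from that study:

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Classification metrics from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

# Hypothetical cohort of 1,000 patients with a 5% SSI rate (50 events),
# a sensitivity of 0.46 and a specificity of 0.80.
m = diagnostic_metrics(tp=23, fp=190, tn=760, fn=27)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
# Low prevalence pushes PPV down (~0.11) while NPV stays high (~0.97),
# even with unchanged sensitivity and specificity.
```

Because PPV and NPV depend on prevalence while sensitivity and specificity do not, predictive values reported for one surgical population cannot simply be carried over to another with a different SSI rate.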
Predictors used in ML models
Of the 85 included ML models, the number of predictors used in the model was reported for 20 models (24%), with feature importance (determined by SHAP values) reported for 15 models (18%). In total, 116 different predictors were used in these 20 models. The median number of included predictors per model was 22, ranging from 5 to 56. The most commonly included predictors were age (100%), oral corticosteroid use (85%), sex (85%), smoking (80%), and diabetes (75%) (Fig 2).
All predictors used five times or more are included in the figure. ASA, American Society of Anesthesiologists classification; BMI, body mass index; COPD, chronic obstructive pulmonary disease; INR, international normalized ratio; PT, prothrombin time; WBC, white blood cell count.
Regression-based models
Of the 24 studies, thirteen (54%) also included regression-based models and compared their performance to that of the developed ML models (Fig 3 and S3 Table). The C-statistic for regression-based models ranged from 0.41 to 0.95. For the prediction of SSIs, ML performed slightly better than regression-based models in four studies [27, 29, 35, 44], whereas regression-based models performed better in two studies [26, 41]. In the other studies reporting both regression-based and ML models, performances were similar [23, 30, 31, 39, 40, 43, 45]. See Fig 3 for an overview of the AUCs of the studies presenting both ML and regression-based models.
Green dots represent the AUC of the ML models, orange dots represent the AUC of the regression-based models. The green and orange lines represent the median.
Risk of bias
The ROB was assessed for all models described in the 24 studies. ROB was low in the participants domain. ROB was high or unclear in the predictors domain and outcome domain, as studies often poorly reported the used predictors and whether predictors were selected independent of the outcome status. In the analysis domain, all studies had a high or unclear ROB, mostly caused by statistical issues such as poor reporting of performance measures, not taking competing risks into account and inappropriate methods to handle missing data. There were no concerns on applicability for all studies. See Fig 4 for an overview of ROB, and S4 Table for the complete ROB.
Green: low risk of bias; yellow: unclear risk of bias due to lack of information; red: high risk of bias. ROB, risk of bias.
Discussion
This systematic review showed that a multitude of 85 different validated ML prediction models for SSIs exists. Most models were developed and tested in patient populations that underwent abdominal surgery. Most of these models (81%) were only internally validated. The most frequently reported performance parameter was the C-statistic, which varied widely between the different models, and only two studies reported calibration metrics. This corresponds with previous studies on the use of ML in other fields, which found that calibration is rarely reported and that only a minority of models are externally validated [11, 46, 47]. However, for proper assessment of model performance, both discrimination and calibration are essential for the interpretation of the predicted probabilities [14]. Without external validation of a prediction model, it is difficult to accurately estimate its actual performance in other clinical settings. Furthermore, it is common that retraining or recalibration of an ML model is necessary to fit an unseen population [48]. Therefore, newly developed ML prediction models as well as already existing models need to be retrained, recalibrated, and validated again for new populations. Furthermore, their effect on patient care should then be evaluated and reported in impact studies.
Thirteen of the included studies described both regression-based and ML models and compared their performances in the same population. Both the regression-based models as well as the ML models showed large variability of performance, which is in accordance with previous literature on regression-based models for the prediction of SSIs [49–51]. When compared, the ML and regression-based models did not outperform each other. This is in accordance with previous studies that compared ML models with regression-based models, although some studies suggest that certain subtypes of ML (i.e. gradient boosting trees) perform better than regression-based models [52, 53]. ML models generally need larger datasets to use their full potential. It is possible that this condition was not met in all studies, as the median number of predictors was 22 and the sample size ranged from 256 to 5,881,881.
Model explainability is an important issue with ML prediction models. In general, ML models are considered more complex and less transparent than regression-based models with respect to which variables are used for the prediction. Furthermore, in our study, transparency of ML models was further limited because the predictors used were reported for only a minority of the ML models (24%). This contrasts with regression-based models, which are usually presented with regression coefficients representing the strength of the relation between individual predictors and the outcome [54]. Despite being less transparent, ML models are able to utilize large and heterogeneous datasets and data types, can take into account more complex relationships between predictors, can be adapted to the local setting if the model has been validated or recalibrated for this population, and can be incorporated into the electronic health record system, making them potentially more beneficial when implemented into clinical care [10].
The ROB was high or unclear for almost all studies, suggesting considerable methodological issues. ROB was scored using the PROBAST, which is the most commonly used tool to estimate the ROB of prediction studies. Although a high or unclear ROB for almost all studies is in agreement with previous reviews using the PROBAST [55–57], the PROBAST has been criticized for poor inter-rater agreement [56, 57]. Moreover, it is not possible to distinguish domains rated high ROB on the basis of a single signaling question answered with ‘no’ from domains with all signaling questions answered with ‘no’. Despite these limitations, the PROBAST remains a useful tool to assess methodological shortcomings in prediction studies. Caution is therefore recommended when interpreting the findings of these ML models for SSI prediction. Recently, the new TRIPOD+AI guidelines have been published, and newly developed ML models should follow these guidelines in order to prevent bias [58].
Strengths and limitations
The major strength of this systematic review is that it included all presently available validated ML models for the prediction of SSIs, without restrictions on surgical specialty or SSI subtype. In addition, we described the comparison of regression-based models with ML models where possible. As both types of models were compared within the same population, bias was minimized.
Some limitations exist. Differences in the quality and the heterogeneity of the data prevented a sound meta-analysis comparing the different ML models. Furthermore, this review is limited to SSIs as outcome, although other postoperative infections such as pneumonia and bloodstream infections are also clinically relevant.
Conclusions
This systematic review showed that many ML models for the prediction of SSIs exist, and that their performance is generally comparable to that of regression-based models. Machine learning techniques are still developing and are seen as a promising tool to improve medical care. However, there are multiple methodological issues with the currently available models, and there is still a substantial gap between the existing models and their practical and safe implementation in clinical settings. The recently published TRIPOD+AI guidelines should be used to reduce methodological flaws. To create clinically relevant prediction models for future use, more collaboration between clinicians and data scientists, as well as post-implementation studies, is needed.
Supporting information
S2 Table. Extracted parameters from the data.
https://doi.org/10.1371/journal.pone.0312968.s004
(DOCX)
S3 Table. Studies with both ML and regression-based models.
https://doi.org/10.1371/journal.pone.0312968.s005
(DOCX)
S4 Table. Risk of bias assessment with the use of the PROBAST score.
https://doi.org/10.1371/journal.pone.0312968.s006
(DOCX)
Acknowledgments
The authors would like to thank Rory Monahan for proofreading the pre-final manuscript.
References
- 1. Global patient outcomes after elective surgery: prospective cohort study in 27 low-, middle- and high-income countries. Br J Anaesth. 2016;117(5):601–9.
- 2. Badia JM, Casey AL, Petrosillo N, Hudson PM, Mitchell SA, Crosby C. Impact of surgical site infection on healthcare costs and patient outcomes: a systematic review in six European countries. J Hosp Infect. 2017;96(1):1–15. pmid:28410761
- 3. Gillespie BM, Harbeck E, Rattray M, Liang R, Walker R, Latimer S, et al. Worldwide incidence of surgical site infections in general surgical patients: A systematic review and meta-analysis of 488,594 patients. Int J Surg. 2021;95:106136. pmid:34655800
- 4. European Centre for Disease Prevention and Control. Healthcare-associated infections: surgical site infections. ECDC. Annual epidemiological report for 2018–2020. Stockholm; 2023.
- 5. Qu H, Liu Y, Bi DS. Clinical risk factors for anastomotic leakage after laparoscopic anterior resection for rectal cancer: a systematic review and meta-analysis. Surg Endosc. 2015;29(12):3608–17. pmid:25743996
- 6. Dietz N, Sharma M, Alhourani A, Ugiliweneza B, Wang D, Drazin D, et al. Evaluation of Predictive Models for Complications following Spinal Surgery. J Neurol Surg A Cent Eur Neurosurg. 2020;81(6):535–45. pmid:32797468
- 7. Guo Y, Hao Z, Zhao S, Gong J, Yang F. Artificial Intelligence in Health Care: Bibliometric Analysis. J Med Internet Res. 2020;22(7):e18228. pmid:32723713
- 8. Rajkomar A, Dean J, Kohane I. Machine Learning in Medicine. N Engl J Med. 2019;380(14):1347–58. pmid:30943338
- 9. Choi RY, Coyner AS, Kalpathy-Cramer J, Chiang MF, Campbell JP. Introduction to Machine Learning, Neural Networks, and Deep Learning. Transl Vis Sci Technol. 2020;9(2):14. pmid:32704420
- 10. Miotto R, Wang F, Wang S, Jiang X, Dudley JT. Deep learning for healthcare: review, opportunities and challenges. Brief Bioinform. 2018;19(6):1236–46. pmid:28481991
- 11. Andaur Navarro CL, Damen JAA, van Smeden M, Takada T, Nijman SWJ, Dhiman P, et al. Systematic review identifies the design and methodological conduct of studies on machine learning-based prediction models. J Clin Epidemiol. 2023;154:8–22. pmid:36436815
- 12. Solomonides AE, Koski E, Atabaki SM, Weinberg S, McGreevey JD, Kannry JL, et al. Defining AMIA’s artificial intelligence principles. J Am Med Inform Assoc. 2022;29(4):585–91. pmid:35190824
- 13. Hunter DJ, Holmes C. Where Medical Statistics Meets Artificial Intelligence. N Engl J Med. 2023;389(13):1211–9. pmid:37754286
- 14. Steyerberg EW, Vickers AJ, Cook NR, Gerds T, Gonen M, Obuchowski N, et al. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology. 2010;21(1):128–38. pmid:20010215
- 15. Steyerberg EW, Vergouwe Y. Towards better clinical prediction models: seven steps for development and an ABCD for validation. Eur Heart J. 2014;35(29):1925–31. pmid:24898551
- 16. Elfanagely O, Toyoda Y, Othman S, Mellia JA, Basta M, Liu T, et al. Machine Learning and Surgical Outcomes Prediction: A Systematic Review. J Surg Res. 2021;264:346–61. pmid:33848833
- 17. Li B, Feridooni T, Cuen-Ojeda C, Kishibe T, de Mestral C, Mamdani M, et al. Machine learning in vascular surgery: a systematic review and critical appraisal. NPJ Digit Med. 2022;5(1):7. pmid:35046493
- 18. Wu G, Khair S, Yang F, Cheligeer C, Southern D, Zhang Z, et al. Performance of machine learning algorithms for surgical site infection case detection and prediction: A systematic review and meta-analysis. Ann Med Surg (Lond). 2022;84:104956. pmid:36582918
- 19. Moons KG, de Groot JA, Bouwmeester W, Vergouwe Y, Mallett S, Altman DG, et al. Critical appraisal and data extraction for systematic reviews of prediction modelling studies: the CHARMS checklist. PLoS Med. 2014;11(10):e1001744. pmid:25314315
- 20. Moons KGM, Wolff RF, Riley RD, Whiting PF, Westwood M, Collins GS, et al. PROBAST: A Tool to Assess Risk of Bias and Applicability of Prediction Model Studies: Explanation and Elaboration. Ann Intern Med. 2019;170(1):W1–w33. pmid:30596876
- 21. de Jong Y, Ramspek CL, Zoccali C, Jager KJ, Dekker FW, van Diepen M. Appraising prediction research: a guide and meta-review on bias and applicability assessment using the Prediction model Risk Of Bias ASsessment Tool (PROBAST). Nephrology (Carlton). 2021;26(12):939–47. pmid:34138495
- 22. Bertsimas D, Dunn J, Velmahos GC, Kaafarani HMA. Surgical Risk Is Not Linear: Derivation and Validation of a Novel, User-friendly, and Machine-learning-based Predictive OpTimal Trees in Emergency Surgery Risk (POTTER) Calculator. Ann Surg. 2018;268(4):574–83. pmid:30124479
- 23. Bonde A, Varadarajan KM, Bonde N, Troelsen A, Muratoglu OK, Malchau H, et al. Assessing the utility of deep neural networks in predicting postoperative surgical complications: a retrospective study. Lancet Digit Health. 2021;3(8):e471–e85. pmid:34215564
- 24. Chang B, Sun Z, Peiris P, Huang ES, Benrashid E, Dillavou ED. Deep Learning-Based Risk Model for Best Management of Closed Groin Incisions After Vascular Surgery. J Surg Res. 2020;254:408–16. pmid:32197791
- 25. El Hechi MW, Maurer LR, Levine J, Zhuo D, El Moheb M, Velmahos GC, et al. Validation of the Artificial Intelligence-Based Predictive Optimal Trees in Emergency Surgery Risk (POTTER) Calculator in Emergency General Surgery and Emergency Laparotomy Patients. J Am Coll Surg. 2021. pmid:33705983
- 26. Gowd AK, Agarwalla A, Amin NH, Romeo AA, Nicholson GP, Verma NN, et al. Construct validation of machine learning in the prediction of short-term postoperative complications following total shoulder arthroplasty. J Shoulder Elbow Surg. 2019;28(12):e410–e21. pmid:31383411
- 27. Grass F, Storlie CB, Mathis KL, Bergquist JR, Asai S, Boughey JC, et al. Challenges of Modeling Outcomes for Surgical Infections: A Word of Caution. Surg Infect (Larchmt). 2020.
- 28. Ke C, Jin Y, Evans H, Lober B, Qian X, Liu J, et al. Prognostics of surgical site infections using dynamic health data. J Biomed Inform. 2017;65:22–33. pmid:27825798
- 29. Liu WC, Ying H, Liao WJ, Li MP, Zhang Y, Luo K, et al. Using Preoperative and Intraoperative Factors to Predict the Risk of Surgical Site Infections After Lumbar Spinal Surgery: A Machine Learning-Based Study. World Neurosurg. 2022;162:e553–e60. pmid:35318153
- 30. Liu X, Lei S, Wei Q, Wang Y, Liang H, Chen L. Machine Learning-based Correlation Study between Perioperative Immunonutritional Index and Postoperative Anastomotic Leakage in Patients with Gastric Cancer. Int J Med Sci. 2022;19(7):1173–83. pmid:35919820
- 31. Mamlook REA, Wells LJ, Sawyer R. Machine-learning models for predicting surgical site infections using patient pre-operative risk and surgical procedure factors. Am J Infect Control. 2023;51(5):544–50. pmid:36002080
- 32. Maurer LR, Chetlur P, Zhuo D, El Hechi M, Velmahos GC, Dunn J, et al. Validation of the AI-based Predictive OpTimal Trees in Emergency Surgery Risk (POTTER) Calculator in Patients 65 Years and Older. Ann Surg. 2020;Publish Ahead of Print.
- 33. Mazaki J, Katsumata K, Ohno Y, Udo R, Tago T, Kasahara K, et al. A Novel Predictive Model for Anastomotic Leakage in Colorectal Cancer Using Auto-artificial Intelligence. Anticancer Res. 2021;41(11):5821–5. pmid:34732457
- 34. Merath K, Hyer JM, Mehta R, Farooq A, Bagante F, Sahara K, et al. Use of Machine Learning for Prediction of Patient Risk of Postoperative Complications After Liver, Pancreatic, and Colorectal Surgery. J Gastrointest Surg. 2020;24(8):1843–51. pmid:31385172
- 35. Nudel J, Bishara AM, de Geus SWL, Patil P, Srinivasan J, Hess DT, et al. Development and validation of machine learning models to predict gastrointestinal leak and venous thromboembolism after weight loss surgery: an analysis of the MBSAQIP database. Surg Endosc. 2021;35(1):182–91. pmid:31953733
- 36. Ohno Y, Mazaki J, Udo R, Tago T, Kasahara K, Enomoto M, et al. Preliminary Evaluation of a Novel Artificial Intelligence-based Prediction Model for Surgical Site Infection in Colon Cancer. Cancer Diagn Progn. 2022;2(6):691–6. pmid:36340449
- 37. Sanger PC, van Ramshorst GH, Mercan E, Huang S, Hartzler AL, Armstrong CA, et al. A Prognostic Model of Surgical Site Infection Using Daily Clinical Wound Assessment. J Am Coll Surg. 2016;223(2):259–70.e2. pmid:27188832
- 38. Taylor J, Meng X, Renson A, Smith AB, Wysock JS, Taneja SS, et al. Different models for prediction of radical cystectomy postoperative complications and care pathways. Ther Adv Urol. 2019;11:1756287219875587. pmid:31565072
- 39. Van Esbroeck A, Rubinfeld I, Hall B, Syed Z. Quantifying surgical complexity with machine learning: looking beyond patient factors to improve surgical models. Surgery. 2014;156(5):1097–105. pmid:25108343
- 40. van Kooten RT, Bahadoer RR, Ter Buurkes de Vries B, Wouters M, Tollenaar R, Hartgrink HH, et al. Conventional regression analysis and machine learning in prediction of anastomotic leakage and pulmonary complications after esophagogastric cancer surgery. J Surg Oncol. 2022;126(3):490–501. pmid:35503455
- 41. Velmahos CS, Paschalidis A, Paranjape CN. The Not-So-Distant Future or Just Hype? Utilizing Machine Learning to Predict 30-Day Post-Operative Complications in Laparoscopic Colectomy Patients. Am Surg. 2023:31348231167397. pmid:36992631
- 42. Walczak S, Davila M, Velanovich V. Prophylactic antibiotic bundle compliance and surgical site infections: an artificial neural network analysis. Patient Saf Surg. 2019;13:41. pmid:31827618
- 43. Weller GB, Lovely J, Larson DW, Earnshaw BA, Huebner M. Leveraging electronic health records for predictive modeling of post-surgical complications. Stat Methods Med Res. 2018;27(11):3271–85. pmid:29298612
- 44. Ying H, Guo BW, Wu HJ, Zhu RP, Liu WC, Zhong HF. Using multiple indicators to predict the risk of surgical site infection after ORIF of tibia fractures: a machine learning based study. Front Cell Infect Microbiol. 2023;13:1206393. pmid:37448774
- 45. Zhang N, Fan K, Ji H, Ma X, Wu J, Huang Y, et al. Identification of risk factors for infection after mitral valve surgery through machine learning approaches. Front Cardiovasc Med. 2023;10:1050698. pmid:37383697
- 46. van der Endt VHW, Milders J, Penning de Vries BBL, Trines SA, Groenwold RHH, Dekkers OM, et al. Comprehensive comparison of stroke risk score performance: a systematic review and meta-analysis among 6 267 728 patients with atrial fibrillation. Europace. 2022;24(11):1739–53. pmid:35894866
- 47. de Jong Y, Ramspek CL, van der Endt VHW, Rookmaaker MB, Blankestijn PJ, Vernooij RWM, et al. A systematic review and external validation of stroke prediction models demonstrates poor performance in dialysis patients. J Clin Epidemiol. 2020;123:69–79. pmid:32240769
- 48. de Hond AAH, Kant IMJ, Fornasa M, Cinà G, Elbers PWG, Thoral PJ, et al. Predicting Readmission or Death After Discharge From the ICU: External Validation and Retraining of a Machine Learning Model. Crit Care Med. 2023;51(2):291–300. pmid:36524820
- 49. Kunutsor SK, Whitehouse MR, Blom AW, Beswick AD. Systematic review of risk prediction scores for surgical site infection or periprosthetic joint infection following joint arthroplasty. Epidemiol Infect. 2017;145(9):1738–49. pmid:28264756
- 50. Gwilym BL, Ambler GK, Saratzis A, Bosanquet DC. Groin Wound Infection after Vascular Exposure (GIVE) Risk Prediction Models: Development, Internal Validation, and Comparison with Existing Risk Prediction Models Identified in a Systematic Literature Review. Eur J Vasc Endovasc Surg. 2021;62(2):258–66. pmid:34246547
- 51. Lubelski D, Alentado V, Nowacki AS, Shriver M, Abdullah KG, Steinmetz MP, et al. Preoperative Nomograms Predict Patient-Specific Cervical Spine Surgery Clinical and Quality of Life Outcomes. Neurosurgery. 2018;83(1):104–13. pmid:29106662
- 52. Song X, Liu X, Liu F, Wang C. Comparison of machine learning and logistic regression models in predicting acute kidney injury: A systematic review and meta-analysis. Int J Med Inform. 2021;151:104484. pmid:33991886
- 53. Christodoulou E, Ma J, Collins GS, Steyerberg EW, Verbakel JY, Van Calster B. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J Clin Epidemiol. 2019;110:12–22. pmid:30763612
- 54. van Smeden M, Heinze G, Van Calster B, Asselbergs FW, Vardas PE, Bruining N, et al. Critical appraisal of artificial intelligence-based prediction models for cardiovascular disease. Eur Heart J. 2022;43(31):2921–30. pmid:35639667
- 55. Venema E, Wessler BS, Paulus JK, Salah R, Raman G, Leung LY, et al. Large-scale validation of the prediction model risk of bias assessment Tool (PROBAST) using a short form: high risk of bias models show poorer discrimination. J Clin Epidemiol. 2021;138:32–9. pmid:34175377
- 56. Langenhuijsen LFS, Janse RJ, Venema E, Kent DM, van Diepen M, Dekker FW, et al. Systematic metareview of prediction studies demonstrates stable trends in bias and low PROBAST inter-rater agreement. J Clin Epidemiol. 2023;159:159–73. pmid:37142166
- 57. Kaiser I, Pfahlberg AB, Mathes S, Uter W, Diehl K, Steeb T, et al. Inter-Rater Agreement in Assessing Risk of Bias in Melanoma Prediction Studies Using the Prediction Model Risk of Bias Assessment Tool (PROBAST): Results from a Controlled Experiment on the Effect of Specific Rater Training. J Clin Med. 2023;12(5). pmid:36902763
- 58. Collins GS, Moons KGM, Dhiman P, Riley RD, Beam AL, Van Calster B, et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. Bmj. 2024;385:e078378. pmid:38626948