A Comparison of a Machine Learning Model with EuroSCORE II in Predicting Mortality after Elective Cardiac Surgery: A Decision Curve Analysis

  • Jérôme Allyn ,

    allyn.jer@gmail.com

    Affiliations Réanimation Polyvalente, Centre Hospitalier Universitaire Félix Guyon, Saint-Denis, France, Département d'Anesthésie Réanimation, APHP, CHU Bichat-Claude Bernard, Paris, France

  • Nicolas Allou,

    Affiliations Réanimation Polyvalente, Centre Hospitalier Universitaire Félix Guyon, Saint-Denis, France, Département d'Anesthésie Réanimation, APHP, CHU Bichat-Claude Bernard, Paris, France

  • Pascal Augustin,

    Affiliation Département d'Anesthésie Réanimation, APHP, CHU Bichat-Claude Bernard, Paris, France

  • Ivan Philip,

    Affiliations Département d'Anesthésie Réanimation, APHP, CHU Bichat-Claude Bernard, Paris, France, Département d'Anesthésie Réanimation, Institut Mutualiste Montsouris, 42 boulevard Jourdan, Paris, France

  • Olivier Martinet,

    Affiliation Réanimation Polyvalente, Centre Hospitalier Universitaire Félix Guyon, Saint-Denis, France

  • Myriem Belghiti,

    Affiliation Réanimation Polyvalente, Centre Hospitalier Universitaire Félix Guyon, Saint-Denis, France

  • Sophie Provenchere,

    Affiliation Département d'Anesthésie Réanimation, APHP, CHU Bichat-Claude Bernard, Paris, France

  • Philippe Montravers,

    Affiliations Département d'Anesthésie Réanimation, APHP, CHU Bichat-Claude Bernard, Paris, France, Université Denis Diderot, PRESS Sorbonne Cité, Paris, France

  • Cyril Ferdynus

    Affiliations Unité de Soutien Méthodologique, CHU de La Réunion, Saint-Denis, France, INSERM, CIC 1410, Saint-Pierre, France

Abstract

Background

The benefits of cardiac surgery are sometimes difficult to predict and the decision to operate on a given individual is complex. Machine Learning and Decision Curve Analysis (DCA) are recent methods developed to create and evaluate prediction models.

Methods and findings

We conducted a retrospective cohort study using a prospectively collected database covering December 2005 to December 2012 at the cardiac surgical center of a university hospital. Models predicting in-hospital mortality after elective cardiac surgery, including EuroSCORE II, a logistic regression model and a machine learning model, were compared using ROC analysis and DCA. Of the 6,520 patients who underwent elective cardiac surgery with cardiopulmonary bypass, 6.3% died. Mean age was 63.4 years (standard deviation 14.4), and mean EuroSCORE II was 3.7 (4.8) %. The area under the ROC curve (95% CI) for the machine learning model (0.795 (0.755–0.834)) was significantly higher than that of EuroSCORE II or the logistic regression model (0.737 (0.691–0.783) and 0.742 (0.698–0.785), respectively; p < 0.0001). Decision Curve Analysis showed that, in this monocentric study, the machine learning model had a greater net benefit whatever the probability threshold.

Conclusions

According to ROC analysis and DCA, the machine learning model is more accurate than EuroSCORE II in predicting mortality after elective cardiac surgery. These results confirm the use of machine learning methods in the field of medical prediction.

Introduction

Cardiac surgery carries a high risk of intraoperative and postoperative complications. The benefit of surgery is sometimes difficult to predict, and the decision to proceed to surgery in a given individual is complex.

European recommendations cite various risk stratification methods in decision making, even though these scores cannot replace clinical judgment and multidisciplinary dialogue [1]. Among the many scores that have been proposed, the European EuroSCORE II and the American Society of Thoracic Surgeons (STS) scores have been the most studied, and are the most widely used [2–5].

EuroSCORE II, the replacement for EuroSCORE I, was developed and validated in 2012 from a cohort of more than 22,000 patients hospitalized in 154 hospitals in 43 countries, and is used to predict postoperative mortality during hospital stay, through the collection of preoperative variables [2]. Logistic regression was used to create EuroSCORE II and Receiver Operating Characteristic (ROC) analysis was used to evaluate the model. Several studies have shown the limits of this score in some surgeries or patient subgroups [6–8].

Machine learning, a subfield of artificial intelligence, is a relatively new approach arising from the development of complex algorithms and the analysis of large datasets. Machine learning can establish powerful predictive models, and is already used by major companies such as Google and Facebook. Several studies have shown the superiority of machine learning over the traditional logistic regression used in EuroSCORE II [9–13].

Moreover, Decision Curve Analysis (DCA) has been developed to evaluate and compare diagnostic and prediction models by integrating the clinical consequences of false positives and false negatives [14]. In addition to its many other advantages, this method takes into consideration the patient's choice to accept a risk of false negatives or false positives. The analysis is presented as a graph with the selected probability threshold plotted on the abscissa and the benefit of the evaluated model on the ordinate. This evaluation method may be of considerable help in making decisions on medical treatment with serious side effects, and several studies have been conducted in cancer diagnosis [15–18].

Our objective was to compare a machine learning-based model with EuroSCORE II in predicting mortality after elective cardiac surgery, using ROC and decision curve analysis. To the best of our knowledge, machine learning and decision curve analysis have never previously been used in this context.

Methods

Study Design and Data Collection

Data from consecutive patients who underwent cardiac surgery with cardiopulmonary bypass between December 2005 and December 2012 in a 1200-bed university hospital were included in this single-center cohort study. Perioperative characteristics were obtained from our local, prospectively collected database (S1 Table). The study was approved by the local ethics committee, which waived the need for informed consent because of the observational nature of the study (Institutional Review Board 00006477, Paris 7 University, AP-HP). Reporting of this study complies with the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement.

EuroSCORE I and EuroSCORE II scores were calculated using the coefficients described in the literature [2,19]. We used socio-demographic characteristics, comorbidities, data on pre-operative conditions, and EuroSCORE I and EuroSCORE II as predictors.

Patients with non-elective cardiac surgery were excluded from the analysis.

Perioperative care (including anaesthesia, monitoring techniques and normothermic cardiopulmonary bypass) was standardized for all patients.

Models Development

The database was split into 2 datasets: 70% for model training and 30% for model validation. Each model was trained using a 5-fold cross-validation process. For each model, we repeated this process 10 times to obtain 10 individual probabilities per patient. The final individual probability for each model was the mean of these 10 individual probabilities.
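The split and the repeated cross-validation described above can be sketched as follows. This is a minimal illustration, not the authors' R code: it assumes the 10 probabilities per patient are out-of-fold predictions on the training set (the text does not say which predictions were averaged), and the names `split_train_validation`, `make_folds`, `repeated_cv_probabilities`, `fit` and `predict` are hypothetical.

```python
import random
from statistics import mean


def split_train_validation(n_patients, train_fraction=0.7, seed=0):
    """Shuffle patient indices and split them into training and validation sets."""
    rng = random.Random(seed)
    indices = list(range(n_patients))
    rng.shuffle(indices)
    cut = int(round(train_fraction * n_patients))
    return indices[:cut], indices[cut:]


def make_folds(indices, n_folds, rng):
    """Randomly partition indices into n_folds folds of nearly equal size."""
    shuffled = indices[:]
    rng.shuffle(shuffled)
    return [shuffled[k::n_folds] for k in range(n_folds)]


def repeated_cv_probabilities(indices, fit, predict, n_folds=5, n_repeats=10, seed=0):
    """Return {patient index: mean out-of-fold probability over n_repeats runs}.

    fit(training_indices) returns a fitted model (hypothetical callable);
    predict(model, patient_index) returns a probability of death.
    """
    rng = random.Random(seed)
    per_patient = {i: [] for i in indices}
    for _ in range(n_repeats):
        folds = make_folds(indices, n_folds, rng)
        for k, held_out in enumerate(folds):
            # Train on every fold except the k-th, then predict the k-th.
            training = [i for j, fold in enumerate(folds) if j != k for i in fold]
            model = fit(training)
            for i in held_out:
                per_patient[i].append(predict(model, i))
    # Each patient is held out once per repeat, so each list has n_repeats values.
    return {i: mean(values) for i, values in per_patient.items()}
```

Any learner can be plugged in through `fit` and `predict`; averaging over repeats mainly reduces the variance that comes from a single random fold assignment.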

Logistic Regressions

Three logistic regressions were performed: two univariate models and one multivariate model. The first univariate model was adjusted using EuroSCORE I. The second model was adjusted using EuroSCORE II. In contrast to the sample used to assess EuroSCORE II, in our sample there were no patients requiring emergency surgery. We therefore fitted a third logistic regression model (named LR model), adjusted using EuroSCORE II covariates [2]. All covariates were kept in the model.

Gradient Boosting Machine (GBM)

Gradient boosting machine is a family of supervised machine learning techniques for regression and classification problems that are highly customizable. Gradient boosting machine produces a prediction which is an ensemble of weak prediction models (e.g. decision trees) [20]. Using boosting techniques and a meta-algorithm for reducing bias, GBM can convert a set of weak predictors to strong ones [20].

Random Forests (RF)

Random Forests are an ensemble learning method for classification and regression analysis. The principle of RF is to construct multiple decision trees and return the mode of the predicted classes (classification) or the mean prediction (regression) of the individual trees. In contrast to classical decision trees, RF are more robust to overfitting [20].

Support Vector Machine

Support vector machine is a class of supervised learning models for regression and classification analysis [20]. Support vector machine separates the input feature space using hyperplanes. Support vector machine can handle linear and nonlinear separation of the feature space using kernels [20]. Support vector machine offers good performance but can be computationally complex with high-dimensional input feature spaces and nonlinear kernels.

Naïve Bayes Model

The Naïve Bayes model is a supervised learning method for classification analysis based on Bayes' theorem [20]. Naïve Bayes assumes that given a class of the outcome, all covariates are independent (naïve assumption). With appropriate preprocessing, a simple Naïve Bayes classification can outperform more advanced machine learning methods [21].

Features Selection

Some of the methods are sensitive to correlation between input features [20,21]. To overcome this problem we fitted the four models described above (GBM, RF, Support vector machine and Naïve Bayes) twice. For the first fit, we kept all input features in the analysis. For the second fit, we applied Chi-Square filtering to the input features and then used only relevant features in each of the four models.
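As an illustration of Chi-Square filtering, the sketch below computes the Pearson chi-square statistic between a binary feature and the binary outcome, and keeps the features whose statistic exceeds a cut-off. The article does not report the cut-off or how continuous features were handled; the critical value for 1 degree of freedom at the 5% level and the function names are assumptions made for this example.

```python
def chi_square_2x2(feature, outcome):
    """Pearson chi-square statistic (no continuity correction) for two 0/1 lists."""
    n = len(feature)
    observed = [[0, 0], [0, 0]]
    for f, o in zip(feature, outcome):
        observed[f][o] += 1
    row_totals = [sum(observed[0]), sum(observed[1])]
    col_totals = [observed[0][0] + observed[1][0], observed[0][1] + observed[1][1]]
    statistic = 0.0
    for r in (0, 1):
        for c in (0, 1):
            expected = row_totals[r] * col_totals[c] / n
            if expected > 0:
                statistic += (observed[r][c] - expected) ** 2 / expected
    return statistic


# Chi-square critical value for 1 degree of freedom at alpha = 0.05 (assumed cut-off).
CRITICAL_VALUE_DF1_ALPHA_05 = 3.841


def filter_features(features, outcome, critical_value=CRITICAL_VALUE_DF1_ALPHA_05):
    """Keep the names of binary features associated with the outcome.

    features: {feature name: list of 0/1 values}, one value per patient.
    """
    return [name for name, values in features.items()
            if chi_square_2x2(values, outcome) > critical_value]
```

A feature unrelated to the outcome yields a statistic near 0 and is dropped, while a strongly associated feature is retained for the second fit.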

Ensemble of Models

It has been shown that assembling the results of different machine learning algorithms can outperform any single base learner [20]. We therefore built a greedy ensemble (i.e. a weighted sum) of the probabilities obtained with each of the four machine learning algorithms (GBM, RF, Support vector machine and Naïve Bayes), which we called the ML model.
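The article does not detail how the greedy ensemble weights were obtained. One common scheme, shown here purely as an assumption, is greedy forward selection with replacement: at each round the model whose addition gives the best averaged prediction is selected, and the final weight of a model is the share of rounds in which it was chosen. The Brier score as selection criterion, the number of rounds and all function names are hypothetical choices for this sketch.

```python
def brier(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)


def greedy_ensemble_weights(model_probs, outcomes, n_rounds=20):
    """Greedy forward selection with replacement.

    model_probs: {model name: list of predicted probabilities}.
    Returns {model name: weight}; the weights sum to 1.
    """
    names = sorted(model_probs)
    counts = {name: 0 for name in names}
    running_sum = [0.0] * len(outcomes)
    for step in range(1, n_rounds + 1):
        best_name, best_score = None, None
        for name in names:
            # Averaged prediction if this model were added at this round.
            candidate = [(s + p) / step for s, p in zip(running_sum, model_probs[name])]
            score = brier(candidate, outcomes)
            if best_score is None or score < best_score:
                best_name, best_score = name, score
        counts[best_name] += 1
        running_sum = [s + p for s, p in zip(running_sum, model_probs[best_name])]
    return {name: counts[name] / n_rounds for name in names}


def weighted_sum(model_probs, weights):
    """Combine the model probabilities into one probability per patient."""
    n = len(next(iter(model_probs.values())))
    return [sum(weights[m] * model_probs[m][i] for m in model_probs) for i in range(n)]
```

In practice the weights would be chosen on data not used to fit the base learners, so that the ensemble does not simply reward the most overfitted model.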

Models Performance

We assessed the performance of each model using the area under the ROC curves (AUC) on the validation dataset. We used bootstrap with 1000 replications to obtain 95% Confidence Intervals (95% CI). Comparisons between ROC Curves were performed using the method described by DeLong et al. [22].
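For illustration, the AUC and a bootstrap confidence interval can be computed as in the sketch below. It assumes a percentile bootstrap, which the article does not specify, and uses the pairwise (Mann-Whitney) definition of the AUC; the function names are hypothetical and the DeLong comparison is not shown.

```python
import random


def auc(scores, outcomes):
    """Probability that a randomly chosen death is scored above a randomly
    chosen survivor, counting ties as one half (pairwise definition of AUC)."""
    positives = [s for s, y in zip(scores, outcomes) if y == 1]
    negatives = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def bootstrap_auc_ci(scores, outcomes, n_replications=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC (assumed method)."""
    rng = random.Random(seed)
    n = len(outcomes)
    estimates = []
    while len(estimates) < n_replications:
        sample = [rng.randrange(n) for _ in range(n)]
        ys = [outcomes[i] for i in sample]
        # A resample needs both deaths and survivors for the AUC to be defined.
        if 0 < sum(ys) < n:
            estimates.append(auc([scores[i] for i in sample], ys))
    estimates.sort()
    lower = estimates[round((alpha / 2) * n_replications)]
    upper = estimates[round((1 - alpha / 2) * n_replications) - 1]
    return lower, upper
```

The pairwise loop is quadratic in the number of patients, which is acceptable for a sketch; a rank-based formula would be preferable on a validation set of about 2,000 patients resampled 1000 times.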

Statistical Analysis

Qualitative variables were expressed as frequencies, percentages, and 95% confidence intervals (95% CI). Quantitative variables were expressed as mean and standard deviation (SD) or median and interquartile range (IQR). Comparisons of percentages were performed with a Chi-square test or Fisher's exact test, as appropriate. Comparisons of means were performed with Student's t-test or the Mann-Whitney test, as appropriate.

Finally, we analyzed the net benefit of EuroSCORE II, the LR model and the ML model for predicting postoperative mortality. For these three methods, we calculated the net benefit using DCA, as described by Vickers et al.: the proportion of all patients who are false-positive is subtracted from the proportion who are true-positive, weighting by the relative harm of a false-positive and a false-negative result [14]:

$$\text{Net benefit} = \frac{\text{True positives}}{n} - \frac{\text{False positives}}{n} \times \frac{P_t}{1 - P_t}$$

where n is the total number of patients included in the study and Pt is the probability threshold [14,23,24]. For example, a net benefit of 0.08 for a Pt of 50% can be interpreted as: "using this predictive model, 8 per 100 patients who will die after elective cardiac surgery will not undergo surgery, without any increase in the number of patients unduly denied surgery". The decision curve is constructed by varying Pt and plotting net benefit on the vertical (y) axis against Pt on the horizontal (x) axis.
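As a worked illustration, the sketch below computes net benefit in the form given by Vickers and Elkin [14]: true positives divided by n, minus false positives divided by n weighted by Pt / (1 − Pt). It is not the SAS macro used in the study. It assumes that a patient is classified as positive when the predicted probability of death is at or above Pt, and the function names are hypothetical.

```python
def net_benefit(probs, outcomes, threshold):
    """Net benefit of classifying as positive every patient whose predicted
    probability is at or above the threshold (0 < threshold < 1)."""
    n = len(outcomes)
    true_pos = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 0)
    return true_pos / n - (false_pos / n) * (threshold / (1 - threshold))


def net_benefit_all_positive(outcomes, threshold):
    """Reference strategy: classify every patient as positive."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1 - prevalence) * (threshold / (1 - threshold))


def decision_curve(probs, outcomes, thresholds):
    """Return (threshold, net benefit) pairs, i.e. the points of a decision curve."""
    return [(pt, net_benefit(probs, outcomes, pt)) for pt in thresholds]
```

For instance, with 100 patients, 10 true positives and 2 false positives at a threshold of 50%, the net benefit is 10/100 − (2/100) × (0.5/0.5) = 0.08, the value used in the interpretation example above. The reference strategy of classifying no patient as positive has a net benefit of 0 at every threshold.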

DCA was performed using the SAS macro provided by Vickers et al. All statistical tests were two-tailed, with a significance level of 5%. All analyses were performed using SAS 9.4 (SAS Institute, Inc, Cary, NC, USA) and R software version 3.2.2 (The R Foundation for Statistical Computing; Vienna, Austria) with the packages xgboost, extraTrees and e1071 [25–27].

Results

Characteristics of Patients

From December 2005 to December 2012, 6,889 cardiac procedures with cardiopulmonary bypass were performed, including 369 (5.3%) non-elective procedures, which were excluded from the analysis. The remaining 6,520 patients constituted the cohort.

Pre- and intra-operative characteristics of the 6,520 patients are presented in Table 1. Among them, 411 (6.3%; 95% CI: 5.7–6.9) died during the in-hospital stay. The mean (standard deviation) age was 63.4 (14.4) years, and the mean EuroSCORE II was 3.7 (4.8) %. Univariate analysis showed a significant association with mortality for 50 variables: age, height, weight, body mass index, 24 comorbidities, 4 preoperative treatments, 8 preoperative paraclinical variables, 5 preoperative coronary angiography variables, and 5 surgery characteristics (Table 1).

Table 1. Characteristics of patients at admission and evolution in ICU (whole dataset).

https://doi.org/10.1371/journal.pone.0169772.t001

Receiver Operating Characteristic Analysis

Table 2 summarizes the performance of the different predictive models tested in the validation dataset: EuroSCORE I and II, the LR model, different machine learning techniques, and the ML model. The ML model was the most accurate (AUC, 0.795 (0.755–0.834)). Of the different machine learning techniques, Random Forest had the best AUC irrespective of whether filtering was used. The AUC of the EuroSCORE II and the LR model was significantly lower than the ML model (both, p < 0.0001).

Fig 1 presents the ROC curves of EuroSCORE I and II, and ML model tested on the validation dataset, to predict mortality after cardiac surgery for all patients included in the study.

Fig 1. Receiver operating characteristic curves showing the performance of EuroSCORE I, EuroSCORE II, and the ML model in predicting post-operative mortality.

Areas under curves (95% CI) are 0.719 (0.674–0.763), 0.737 (0.691–0.783), and 0.795 (0.755–0.834), respectively.

https://doi.org/10.1371/journal.pone.0169772.g001

Decision Curve Analysis

Fig 2 presents the decision curves that show the clinical usefulness of EuroSCORE I, EuroSCORE II and the ML model in predicting mortality after cardiac surgery in the validation dataset. The results are presented as a graph with the selected probability threshold (i.e., the degree of certitude of postoperative mortality over which the patient's decision is not to operate) plotted on the abscissa and the benefit of the evaluated model on the ordinate [14,28]. The curves of EuroSCORE I and EuroSCORE II remain very close regardless of the threshold selected. The benefit of the ML model, which lies between 1 and 6%, is always greater than that of EuroSCORE I and II regardless of the selected threshold.

Fig 2. Decision curves showing the clinical usefulness of EuroSCORE I, EuroSCORE II, and the ML model in predicting post-operative mortality.

The blue line represents the net benefit of providing surgery to all patients, assuming that all patients would survive. The red line represents the net benefit of providing surgery to no patients, assuming that all would die after surgery. The green, purple and turquoise lines represent the net benefit of providing surgery to patients according to EuroSCORE I, EuroSCORE II, and the ML model, respectively. The selected probability threshold (i.e., the degree of certitude of postoperative mortality over which the patient's decision is not to operate) is plotted on the abscissa.

https://doi.org/10.1371/journal.pone.0169772.g002

Discussion

To the best of our knowledge, this study is the first to use machine learning and DCA, two relatively recent and efficient methods, in the context of decision making in cardiac surgery.

Mortality after this type of surgery is high. We report a mortality rate of 6.3%, which is slightly greater than that reported in the EuroSCORE II study (4.6%), even though we excluded urgent surgery. This difference may be related to a greater frequency of valvular surgery (56% versus 46.3%) and a lower frequency of isolated coronary surgery (38.4% versus 46.7%) in our cohort [2].

This study found the ML model to be superior to traditional logistic regression, which has already been observed in many studies [9–13]. For example, in the study of Churpek et al., machine learning and logistic regression were compared in predicting clinical deterioration in a large multicenter database of 269,999 patients. Random Forest (AUC 0.80) was more accurate than logistic regression (AUC 0.77) in predicting clinical deterioration [9]. In the same way, the study of Taylor et al. on the prediction of mortality in 4676 patients with sepsis in the emergency department found their machine learning model (AUC 0.86) to be superior to their logistic regression model (AUC 0.76, p ≤ 0.003) [12].

EuroSCORE I was used during the years of patient inclusion in our database. Moreover, EuroSCORE I and EuroSCORE II were created from another database. This is why we created a logistic regression model from our own database, so as not to disadvantage EuroSCORE I and EuroSCORE II. Again, the ML model outperformed the LR model, with ROC analysis and DCA significantly more favorable for the ML model.

Beyond our study, it appears that machine learning may replace logistic regression for prediction when the dataset includes several thousand patients or more, and when it is not necessary to assign a weight to each variable. Where these conditions are not met, logistic regression may still be helpful.

The second aim of our study was to provide an analysis complementary to traditional ROC curves. DCA provides additional information that helps in decision making involving exposure to surgical risk. This technique is increasingly widely used and has already shown promise in cancer research. Our analyses show a moderate benefit from the different models (between 1 and 6%), even for the ML model. While the benefit is modest, the stakes are major (i.e. prediction of mortality after surgery), and the ML model is never inferior to EuroSCORE II. This point is interesting because European guidelines advocate the use of a prediction score to help in decision making [1].

Beyond the theory, combining these two innovative methods for building and evaluating predictive models could improve the clinical management of patients. Many predictive models, however, are of no use in clinical practice, and studies on the utility of prediction scores in preoperative cardiac evaluation are still awaited.

Our study has several limitations. First, because EuroSCORE II was calculated retrospectively, bias may be present. Second, we were unable to grade the degree of emergency as proposed in EuroSCORE II. This is a minor limitation because our objective was to help in decision making, and we acknowledge that decision making is different in an emergency situation. Third, this was a monocentric study, which could limit extrapolation of our ML model to other cardiac surgery centers. Teams with sufficient data could establish their own ML model, or teams with multicenter data could conduct an even larger study than ours. Similarly, it would be interesting to use these methods in a more complex context, where standard scores are less efficient. For example, they may be very useful in decision making involving very old patients, or in helping to choose between transcatheter and surgical aortic valve replacement [6–8]. Finally, we could not compare the ML model with other scores such as the STS score, which is widely used in North America.

In conclusion, machine learning is more accurate than EuroSCORE II in predicting mortality after non-urgent surgery. These results confirm the use of machine learning methods in the field of medical prediction. Decision curve analysis showed this prediction model to be a moderate help in decision making, in accordance with recommendations.

Supporting Information

Acknowledgments

The authors thank Lucie Allyn and Emma Allou for their guidance in the early stage of this project.

Author Contributions

  1. Conceptualization: JA NA CF.
  2. Formal analysis: JA NA CF.
  3. Investigation: JA NA CF.
  4. Methodology: JA NA CF.
  5. Project administration: JA OM.
  6. Resources: IP NA MB PM OM.
  7. Software: JA CF.
  8. Supervision: JA NA CF PA OM.
  9. Validation: JA MB PA IP SP PM.
  10. Visualization: JA NA PM MB PA.
  11. Writing – original draft: JA NA CF PA PM.

References

  1. Authors/Task Force members, Windecker S, Kolh P, Alfonso F, Collet J-P, Cremer J, et al. 2014 ESC/EACTS Guidelines on myocardial revascularization: The Task Force on Myocardial Revascularization of the European Society of Cardiology (ESC) and the European Association for Cardio-Thoracic Surgery (EACTS). Developed with the special contribution of the European Association of Percutaneous Cardiovascular Interventions (EAPCI). Eur Heart J. 2014;35: 2541–2619. pmid:25173339
  2. Nashef SAM, Roques F, Sharples LD, Nilsson J, Smith C, Goldstone AR, et al. EuroSCORE II. Eur J Cardio-Thorac Surg Off J Eur Assoc Cardio-Thorac Surg. 2012;41: 734–744; discussion 744–745.
  3. Shahian DM, O’Brien SM, Filardo G, Ferraris VA, Haan CK, Rich JB, et al. The Society of Thoracic Surgeons 2008 cardiac surgery risk models: part 1--coronary artery bypass grafting surgery. Ann Thorac Surg. 2009;88: S2–22. pmid:19559822
  4. O’Brien SM, Shahian DM, Filardo G, Ferraris VA, Haan CK, Rich JB, et al. The Society of Thoracic Surgeons 2008 cardiac surgery risk models: part 2--isolated valve surgery. Ann Thorac Surg. 2009;88: S23–42. pmid:19559823
  5. Shahian DM, O’Brien SM, Filardo G, Ferraris VA, Haan CK, Rich JB, et al. The Society of Thoracic Surgeons 2008 cardiac surgery risk models: part 3--valve plus coronary artery bypass grafting surgery. Ann Thorac Surg. 2009;88: S43–62. pmid:19559824
  6. Joint Task Force on the Management of Valvular Heart Disease of the European Society of Cardiology (ESC), European Association for Cardio-Thoracic Surgery (EACTS), Vahanian A, Alfieri O, Andreotti F, Antunes MJ, et al. Guidelines on the management of valvular heart disease (version 2012). Eur Heart J. 2012;33: 2451–2496. pmid:22922415
  7. Chhor V, Merceron S, Ricome S, Baron G, Daoud O, Dilly M-P, et al. Poor performances of EuroSCORE and CARE score for prediction of perioperative mortality in octogenarians undergoing aortic valve replacement for aortic stenosis. Eur J Anaesthesiol. 2010;27: 702–707. pmid:20520558
  8. Kuwaki K, Inaba H, Yamamoto T, Dohi S, Matsumura T, Morita T, et al. Performance of the EuroSCORE II and the Society of Thoracic Surgeons Score in patients undergoing aortic valve replacement for aortic stenosis. J Cardiovasc Surg (Torino). 2015;56: 455–462.
  9. Churpek MM, Yuen TC, Winslow C, Meltzer DO, Kattan MW, Edelson DP. Multicenter Comparison of Machine Learning Methods and Conventional Regression for Predicting Clinical Deterioration on the Wards. Crit Care Med. 2016;44: 368–374. pmid:26771782
  10. Kessler RC, van Loo HM, Wardenaar KJ, Bossarte RM, Brenner LA, Cai T, et al. Testing a machine-learning algorithm to predict the persistence and severity of major depressive disorder from baseline self-reports. Mol Psychiatry. 2016;
  11. Weichenthal S, Ryswyk KV, Goldstein A, Bagg S, Shekkarizfard M, Hatzopoulou M. A land use regression model for ambient ultrafine particles in Montreal, Canada: A comparison of linear regression and a machine learning approach. Environ Res. 2016;146: 65–72. pmid:26720396
  12. Taylor RA, Pare JR, Venkatesh AK, Mowafi H, Melnick ER, Fleischman W, et al. Prediction of In-hospital Mortality in Emergency Department Patients With Sepsis: A Local Big Data-Driven, Machine Learning Approach. Acad Emerg Med Off J Soc Acad Emerg Med. 2016;23: 269–278.
  13. Liu NT, Holcomb JB, Wade CE, Darrah MI, Salinas J. Utility of vital signs, heart rate variability and complexity, and machine learning for identifying the need for lifesaving interventions in trauma patients. Shock Augusta Ga. 2014;42: 108–114.
  14. Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Mak Int J Soc Med Decis Mak. 2006;26: 565–574.
  15. Shigeta K, Ishii Y, Hasegawa H, Okabayashi K, Kitagawa Y. Evaluation of 5-fluorouracil metabolic enzymes as predictors of response to adjuvant chemotherapy outcomes in patients with stage II/III colorectal cancer: a decision-curve analysis. World J Surg. 2014;38: 3248–3256. pmid:25167895
  16. Zastrow S, Brookman-May S, Cong TAP, Jurk S, von Bar I, Novotny V, et al. Decision curve analysis and external validation of the postoperative Karakiewicz nomogram for renal cell carcinoma based on a large single-center study cohort. World J Urol. 2015;33: 381–388. pmid:24850226
  17. Hernandez JM, Tsalatsanis A, Humphries LA, Miladinovic B, Djulbegovic B, Velanovich V. Defining optimum treatment of patients with pancreatic adenocarcinoma using regret-based decision curve analysis. Ann Surg. 2014;259: 1208–1214. pmid:24169177
  18. Shariat SF, Savage C, Chromecki TF, Sun M, Scherr DS, Lee RK, et al. Assessing the clinical benefit of nuclear matrix protein 22 in the surveillance of patients with nonmuscle-invasive bladder cancer and negative cytology: a decision-curve analysis. Cancer. 2011;117: 2892–2897. pmid:21692050
  19. Nashef SA, Roques F, Michel P, Gauducheau E, Lemeshow S, Salamon R. European system for cardiac operative risk evaluation (EuroSCORE). Eur J Cardio-Thorac Surg Off J Eur Assoc Cardio-Thorac Surg. 1999;16: 9–13.
  20. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning [Internet]. New York, NY: Springer New York; 2009. Available: http://link.springer.com/10.1007/978-0-387-84858-7
  21. Rennie JD, Shih L, Teevan J, Karger D. Tackling the poor assumptions of naive bayes text classifiers. International Conference on Machine Learning. 2003. p. 616. Available: http://www.aaai.org/Papers/ICML/2003/ICML03-081.pdf
  22. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44: 837–845. pmid:3203132
  23. Vickers AJ. Decision analysis for the evaluation of diagnostic tests, prediction models and molecular markers. Am Stat. 2008;62: 314–320. pmid:19132141
  24. Vickers AJ, Cronin AM, Elkin EB, Gonen M. Extensions to decision curve analysis, a novel method for evaluating diagnostic tests, prediction models and molecular markers. BMC Med Inform Decis Mak. 2008;8: 53. pmid:19036144
  25. Chen T, He T, Benesty M. xgboost: Extreme Gradient Boosting [Internet]. 2016. Available: https://cran.r-project.org/web/packages/xgboost/index.html
  26. Simm J, Abril IM de. extraTrees: Extremely Randomized Trees (ExtraTrees) Method for Classification and Regression [Internet]. 2014. Available: https://cran.r-project.org/web/packages/extraTrees/index.html
  27. Meyer D, Dimitriadou E, Hornik K, Weingessel A, Leisch F, et al. e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien [Internet]. 2015. Available: https://cran.r-project.org/web/packages/e1071/index.html
  28. Fitzgerald M, Saville BR, Lewis RJ. Decision curve analysis. JAMA. 2015;313: 409–410. pmid:25626037