Advertisement
Browse Subject Areas
?

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here.

  • Loading metrics

Improving preoperative risk-of-death prediction in surgery congenital heart defects using artificial intelligence model: A pilot study

  • João Chang Junior ,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Writing – original draft, Writing – review & editing

    chang.joao@gmail.com

    Affiliations Department of Cardiovascular Surgery—Pediatric Cardiac Unit, Heart Institute of University of São Paulo Medical School—HCFMUSP—InCor, São Paulo, Brazil, Fundação Armando Alvares Penteado–FAAP, São Paulo, Brazil, Escola Superior de Engenharia e Gestão–ESEG, São Paulo, Brazil

  • Fábio Binuesa,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Writing – original draft, Writing – review & editing

    Affiliation Department of Cardiovascular Surgery—Pediatric Cardiac Unit, Heart Institute of University of São Paulo Medical School—HCFMUSP—InCor, São Paulo, Brazil

  • Luiz Fernando Caneo,

    Roles Conceptualization, Formal analysis, Investigation, Supervision, Validation, Writing – original draft, Writing – review & editing

    Affiliation Department of Cardiovascular Surgery—Pediatric Cardiac Unit, Heart Institute of University of São Paulo Medical School—HCFMUSP—InCor, São Paulo, Brazil

  • Aida Luiza Ribeiro Turquetto,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Project administration, Supervision, Validation, Writing – review & editing

    Affiliations Department of Cardiovascular Surgery—Pediatric Cardiac Unit, Heart Institute of University of São Paulo Medical School—HCFMUSP—InCor, São Paulo, Brazil, Health Technology Assessment Center of Clinics Hospital–NATS-HCFMUSP, São Paulo, Brazil

  • Elisandra Cristina Trevisan Calvo Arita,

    Roles Resources, Writing – review & editing

    Affiliation Department of Cardiovascular Surgery—Pediatric Cardiac Unit, Heart Institute of University of São Paulo Medical School—HCFMUSP—InCor, São Paulo, Brazil

  • Aline Cristina Barbosa,

    Roles Resources, Writing – review & editing

    Affiliation Department of Cardiovascular Surgery—Pediatric Cardiac Unit, Heart Institute of University of São Paulo Medical School—HCFMUSP—InCor, São Paulo, Brazil

  • Alfredo Manoel da Silva Fernandes,

    Roles Conceptualization, Formal analysis, Investigation, Supervision, Validation, Writing – review & editing

    Affiliation Department of Cardiovascular Surgery—Pediatric Cardiac Unit, Heart Institute of University of São Paulo Medical School—HCFMUSP—InCor, São Paulo, Brazil

  • Evelinda Marramon Trindade,

    Roles Conceptualization, Formal analysis, Investigation, Supervision, Validation, Writing – review & editing

    Affiliations Department of Cardiovascular Surgery—Pediatric Cardiac Unit, Heart Institute of University of São Paulo Medical School—HCFMUSP—InCor, São Paulo, Brazil, Health Technology Assessment Center of Clinics Hospital–NATS-HCFMUSP, São Paulo, Brazil, São Paulo State Health Secretariat–SES-SP, São Paulo, Brazil

  • Fábio Biscegli Jatene,

    Roles Conceptualization, Investigation, Project administration, Supervision, Validation, Writing – review & editing

    Affiliation Department of Cardiovascular Surgery—Pediatric Cardiac Unit, Heart Institute of University of São Paulo Medical School—HCFMUSP—InCor, São Paulo, Brazil

  • Paul-Eric Dossou,

    Roles Visualization, Writing – review & editing

    Affiliation Institut Catholique des Arts et Métiers–Icam, Paris-Sénart, France

  • Marcelo Biscegli Jatene

    Roles Conceptualization, Investigation, Project administration, Supervision, Validation, Writing – review & editing

    Affiliation Department of Cardiovascular Surgery—Pediatric Cardiac Unit, Heart Institute of University of São Paulo Medical School—HCFMUSP—InCor, São Paulo, Brazil

Improving preoperative risk-of-death prediction in surgery congenital heart defects using artificial intelligence model: A pilot study

  • João Chang Junior, 
  • Fábio Binuesa, 
  • Luiz Fernando Caneo, 
  • Aida Luiza Ribeiro Turquetto, 
  • Elisandra Cristina Trevisan Calvo Arita, 
  • Aline Cristina Barbosa, 
  • Alfredo Manoel da Silva Fernandes, 
  • Evelinda Marramon Trindade, 
  • Fábio Biscegli Jatene, 
  • Paul-Eric Dossou
PLOS
x

Abstract

Background

Congenital heart disease accounts for almost a third of all major congenital anomalies. Congenital heart defects have a significant impact on morbidity, mortality and health costs for children and adults. Research regarding the risk of pre-surgical mortality is scarce.

Objectives

Our goal is to generate a predictive model calculator adapted to the regional reality focused on individual mortality prediction among patients with congenital heart disease undergoing cardiac surgery.

Methods

Two thousand two hundred forty CHD consecutive patients’ data from InCor’s heart surgery program was used to develop and validate the preoperative risk-of-death prediction model of congenital patients undergoing heart surgery. There were six artificial intelligence models most cited in medical references used in this study: Multilayer Perceptron (MLP), Random Forest (RF), Extra Trees (ET), Stochastic Gradient Boosting (SGB), Ada Boost Classification (ABC) and Bag Decision Trees (BDT).

Results

The top performing areas under the curve were achieved using Random Forest (0.902). Most influential predictors included previous admission to ICU, diagnostic group, patient's height, hypoplastic left heart syndrome, body mass, arterial oxygen saturation, and pulmonary atresia. These combined predictor variables represent 67.8% of importance for the risk of mortality in the Random Forest algorithm.

Conclusions

The representativeness of “hospital death” is greater in patients up to 66 cm in height and body mass index below 13.0 for InCor’s patients. The proportion of “hospital death” declines with the increased arterial oxygen saturation index. Patients with prior hospitalization before surgery had higher “hospital death” rates than who did not required such intervention. The diagnoses groups having the higher fatal outcomes probability are aligned with the international literature. A web application is presented where researchers and providers can calculate predicted mortality based on the CgntSCORE on any web browser or smartphone.

Introduction

Congenital heart defects (CHD) are structural problems that arise in the formation of the heart or major blood vessels, with a significant impact on morbidity, mortality and health costs in children and adults. Defects vary in severity, from tiny holes between chambers that are resolved naturally or malformations that may require multiple surgical procedures, being a major cause of perinatal and infant mortality [1].

Reported birth estimates for patients with congenital heart disease vary widely among studies worldwide. The estimate incidence of 9 per 1,000 live births is generally accepted, thus, annually, more than 1.35 million of children are expected to be born with some congenital heart disease [2,3].

CHD may require several surgical procedures carrying its implicit death risk [2]. Survival risk analysis provides support for medical decision-making, avoiding futile clinical interventions or ineffective treatments [4]. There are some risk stratification models for mortality and morbidity for children with congenital heart disease, for example RACHS-1 [57], Aristotles Basic Complexity (ABC) and Aristotles Comprehensive Complexity (ACC) [8]. These models were developed based on experience and consensus among experts in this field, due to the lack of adequate data at that time [9].

One of the greatest challenges in developing accurate predictors of death related to pediatric heart surgery is the wide heterogeneity range of congenital heart anomalies. Unlike adult cardiac surgery (where there is a limited number of surgical procedures and very large numbers of patients undergoing such procedures), the exact opposite applies in pediatric cardiac surgery where there are thousands of different procedures and small number of patients undergoing each type of procedures. Many cardiac surgical programs may undertake certain rare procedures only once in several years. Thus, to build a helpful risk predictive model, the experience of a large number of patients must be analyzed. Of note, the Society of Thoracic Surgeons (STS) has the world’s largest multi institutional congenital cardiac database registering data from the majority of United States and Canada pediatric cardiac centers [10]. The STS has published numerous comprehensive articles modelling risk factor analysis in a representative population living and treated in a developed country. Recently a new global database is being established by the World Society for Pediatric and Congenital Heart Surgery (WSPCHS). It aims to include multiple institutions from multiple countries with the proposal to represent more heterogeneous population of children assisted by different health systems and facilities [11]. Indeed, the motivation for the development of the global database is exactly the problem pointed in 2014, that we used to justify the ASSIST Registry project: there is massive heterogeneity of CHD diagnosis, procedures, patient characteristics, trained human resources and facilities diverse structures where they are treated. Following the need to perform outcomes assessments, respecting the characteristics of our population and healthcare system, we hope to establish a multi institutional Brazilian database in the near future based on our ASSIST Registry.

The ASSIST Registry was established in 2014 as a multicenter São Paulo State regional database [12], as the pilot for this national project. In the past 5 years the ASSIST Registry collects data from five institutions aiming to elicit our population’s specific covariates and their individual conditions that could modify the outcomes risk model, currently based on international well established risk scales for databases stratification [10]. The predictions of these models, despite the efforts for enhancing it are not sufficiently accurate for individual’s risk assessment worldwide, either because the performance of a given risk score reflects the average of a group or because there are sociodemographic particularities that affect the model's response [13,14]. Given this scenario, individualized mortality prediction models have been proposed [15]. Some studies with AI aids have been performing better when compared to the standard severity scoring system [16,17].

The proposal for mortality risk models using artificial intelligence for patients with congenital heart disease is promising, although research on this subject is scarce. Recent studies include machine-learning algorithms to classify groups of risks of death in surgery [16]. Another study proposed an artificial neural network (ANN) model to predict the risks of congenital heart disease in pregnant women [17]. However, there was no result when we searched the published scientific literature for specific models of individual prediction of death in cardiac surgery for patients with congenital heart defects.

The proposal of this study is to evaluate six statistics models to ascertain the mortality risk, adapted to the regional reality, focused on individual mortality prediction among patients with congenital heart disease undergoing cardiac surgery. Secondly, we aimed to instrument the model-based mortality prediction with a calculator tool, the CngtSCORE calculator model, accessible through any web browser or smartphone.

Materials and methods

Study design

This is a retrospective post-hoc AI analysis of the prospectively built ASSIST Registry multicenter CHD 2014–2019 study. These analyses intended to elicit the highest AI accuracy model to build the individual’s death risk prediction before individual’s surgery.

Six artificial intelligence models most cited in medical references were used in this study. The Multilayer Perceptron (MLP), Random Forest (RF), Extra Trees (ET), Stochastic Gradient Boosting (SGB), Ada Boost Classification (ABC) and Bag Decision Trees (BDT) machine-learning algorithms were tested with the InCor’s dataset aiming to elicit the most adjusted outcome evaluation.

Study population

Between January 2014 and December 2018, there were 2,240 consecutive patients with CHD referred for InCor’s surgical treatment. All data were extracted from the general ASSIST Registry dataset and stored in compliance with institutional security and privacy governance rules.

The database ASSIST Registry accumulates more than 3,000 patients reported [12].

Despite this collaborative dataset (since the data from the remaining centers was not externally audited until now), we used only the InCor’s data to keep this pilot test data more accurate.

To ensure data accuracy, the postgraduate student and the supervisors (authors) performed quality checks over time.

Predicting variables

Eighty-three pre-operational ASSIST Registry predictive variables for the outcome of each patient were applied. Selection decisions were made based on their methodology, the evidence literature used, their applicability, and by consensus among the participant researchers (these variables and its parametrization are presented in Table 1). These variables were used as exogenous variables in the six machine-learning algorithms to create the CgntSCORE calculator. The six algorithms trained in this study were Multilayer Perceptron (MLP), Random Forest (RF), Extra Trees (ET), Stochastic Gradient Boosting (SGB), Ada Boost Classification (ABC) and Bag Decision Trees (BDT). These six different machine-learning algorithms were used to predict the risk of pre-surgical mortality and to understand the magnitude each variable affected the risk of death.

Outcome variables

The outcome variable of interest was hospital mortality, defined as death in the hospital or within 30 days of cardiac surgery, as defined by STS [10].

Data analysis

The experiments were performed on an Intel® Core ™ i7-7700HQ 2.80GHz notebook, 16.0 GB of RAM, under the Windows 10 platform. Moreover, for the manipulation, analysis and training of the algorithms, Python 3.7.1 software and Numpy, Pandas, Matplotlib, Seaborn, Scikit-Learn, Imblearn and PyTorch libraries were used.

The forecasting model development included the following steps: preparation of the InCor data set, normalization or standardization of the variables, division of the data into training and validation sub-sets, balancing of the training set, training and algorithm adjustments, and finally, measuring the model's forecast performance. Fig 1 presents this procedure sequence and subsequent texts define and further explain each step.

The Department of Cardiovascular Surgery-Pediatric Cardiac Unit, Heart Institute of University of São Paulo Medical School—InCor provided the data set used in this study and its technical expertise. The data set, extracted from the ASSIST database, contains the history of 2,240 cardiac surgeries performed on patients with heart disease from 2014 to 2018. This information was organized into 84 variables, many derived from the international RACHS and Aristotle risk scores checklists, including continuous, quantitative or categorical qualitative parameterized fields, as detailed in Table 2. We defined the objective variable as “Final Outcome” and tabulated it 0 or 1, with 0 for “Hospital Discharge” and 1 for “Hospital Death”.

As the first step to train of the algorithm, it was necessary to evaluate the need for normalization or standardization of the variables. Indeed, many machine-learning algorithms perform better or converge more quickly when the resources are on a relatively similar scale or close to the normal distribution, as for example, in Linear Regression, Logistic Regression, K-Nearest Neighbors Algorithm (KNN), Artificial Neural Networks (RNA), Support Vector Machines with radially polarized core (SVM), Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) [1821].

The data set partition for training and testing (validation) was separated by the forecasting statistic adjusted model. The training set was then used to train the model and the test set (validation) was used to evaluate the model's performance. However, this approach without adjustments could have led to problems of variance using the same algorithm, with scenarios where precision obtained in one test is different from the precision obtained in another test set. To minimize variance and ensure better performance of the machine learning models [22,23], Hold-out and K-Fold Cross Validation techniques were compared, regarding our model purpose and the size of the dataset. The Hold-out method divides the data set into two parts, training and testing, while the K-Fold Cross Validation method divides the data set into K parts of equal size, also called folds. The training process was then applied to all folds, with the exception of one fold that was used as a test set in the validation process, where, finally, the measure of performance was the average of all performance tests for all folds. The advantage of this K partition method is that the entire data set is trained and tested, reducing the variation of the chosen estimator. This guarantees a more accurate forecast and less bias from the positive rate estimator [23,24].

As in this study, the 10-fold K-Fold Cross Validation method (K = 10) has shown good performance in several health-related studies with low variability between the training and test sample [16,2529].

Another source of results variance we analyzed was derived from unbalance within data categories included, such as, for example, some diagnostic variables (e.g. small numbers of rare conditions). Indeed, in data sets it is common to observe large differences in the percentage of representativeness in the classes studied. For instance, in the InCor’s study data set, we observed 10.8% patients dying after surgery versus 89.2% who survived. When the classification categories are not equally represented, it is said that the data set is unbalanced [30,31].

Conventional algorithms tend to be biased towards the majority class because their loss functions try to optimize quantities such as error rate, disregarding data distribution. In the worst case, minority examples are treated as outliers of the majority class and ignored, causing the model to be trained only to identify the majority class, which for this study it would lead to failure to classify a patient's risk of death.

The InCor’s data set is unbalanced, that is, there is a 1:9 ratio between mortality and post-surgery survival [30,31]. In some algorithms, if this effect is not addressed, there would be a false interpretation of the model's performance, which was not desirable since the study’s aim is to identify and understand the risk of death, the minority class of the studied data set.

To reduce the impacts due to the unbalance, technical methods to do under-sampling or over-sampling the data set were used. The under-sampling techniques consisted of reducing the sample of its most representative category to increase the sensitivity of a classifier towards its minority class, while the over-sampling technique simply increased the sample of the minority category using statistical techniques to replicate minority samples, duplicating them or generating new samples from the actual samples [30].

Moreover, in the process of building the machine-learning model, it was necessary to evaluate algorithm performance, the model’s errors and its hits capacity. In this study, the model aims to accurately predict the risk of death of patients with congenital heart disease before cardiac surgery, so it is a binary classification problem. In binary classification problems there are several evaluation metrics [32]; the most common performance metrics in machine learning are Accuracy, Precision, Specificity, Sensitivity or Recall, and the ROC (Receiver Operating Characteristics) curve AUC (Area Under the Curve), also written as AUROC (Area Under the Receiver Operating Characteristics).

Accuracy, Precision, Specificity, Sensitivity or Recall measurements were calculated using the Confusion Matrix (Fig 2).

The methods and techniques used in this study are summarized in Table 2.

Ethical approval

This study is part of the larger ASSIST Registry project ("Estudo do Impacto dos Fatores de Risco na Morbimortalidade das Cardiopatias Congênitas: comparação entre estratos de risco quando duas escalas internacionais são aplicadas no Sistema Único de Saúde do Estado de São Paulo"), protocol CAAE: 31994814.0.3001.5440 approved by the Ethics Committee of the Heart Institute of University of São Paulo Medical School, São Paulo, Brazil. Because this study used solely the pre-established database by the larger ASSIST Registry project, the use of the patient’s informed consent forms was waived.

Companion web site

A companion site was designed to contain additional, up-to-date information on the data set, model as well as a Web Application that can perform mortality predictions based on individual patient characteristics.

Results

The predictive performance metrics of the machine learning algorithms tested in this study are shown in Table 3.

Table 3 shows that the Multilayer Perceptron (MLP) neural network obtained the highest levels of accuracy (accuracy) and specificity (specificity) in relation to the other studied algorithms, respectively 90.2% and 98.5%. On the other hand, it obtained the lowest sensitivity index (20.8%), ROC AUC (84.6%) and AP ((Average Precision) 0.44). These results demonstrate that the Multilayer Perceptron neural network achieved the best performance in the accuracy of survival forecasts, a fact reinforced with the specificity index (Specificity) of 98.5%. In contrast, its ability to identify patients at risk of death is the lowest among the models studied; only 20.8% of the total patients who died in surgery were identified as risk of death by the neural network.

Given our InCor’s unbalanced dataset [33,34], we used the ROC AUC and AP to analyze the performance of the models. The Bagged Decision Trees (BGT), Random Forest (RF) and Stochastic Gradient Boosting (SGB) algorithms stand out with the highest ROC AUC rates among the studied algorithms, respectively 92.6%, 90.2% and 88.5%. They also have the highest AP rates, 0.81 for Bagged Decision Trees (BGT), 0.73 for Random Forest (RF) and 0.70 for Stochastic Gradient Boosting (SGB). The sensitivity index (Recall) is another metric considered useful to subsidize decision making, where the 92.2% index for Random Forest (RF) is observed, the highest index among the studied algorithms.

In line with the objective of predicting the risk of pre-surgical mortality, the Bagged Decision Trees (BGT) and Random Forest (RF) algorithms stand out in the performance requirement. The Bagged Decision Trees (BGT) algorithm demonstrated better performance for predicting survival, specificity (specificity) of 90.8%, without giving up the ability to identify risk of death, sensitivity index (Recall) of 70.6%. While Random Forest (RF) stood out in its ability to identify risk of death, reaching a sensitivity index (Recall) of 92.2%.

The data in the Confusion Matrix of the RF algorithm are from the test set (validation), in which it is possible to verify the number of observations with correct and predicted errors. It can be seen that 8 of the 51 patients who died were not predicted by the model (Fig 3). This matrix contains the information that was used to generate the model's Accuracy, Sensitivity or Recall, Specificity and Precision Indices (Fig 4).

In Fig 4, it can be seen that the RF model obtained an accuracy of 80.8%, sensitivity of 92.1% and precision of 54.6%.

The ROC AUC (AUROC) curve is another model performance metric frequently used to support medical decision-making [35] increasingly adopted in the machine learning research community [36]. The ROC AUC curve of the RF algorithm is shown in Fig 5.

Fig 5 shows the rate of false positives on the horizontal axis and the rate of true positives on the vertical axis, and the plotted curve is the ROC curve. The area under the curve is called AUC (Area Under Curve) and indicates the model's ability to hit. The closer this index gets to 1, the greater the model's ability to hit its predictions. The Random Forest (RF) reached a rate of 90.2% of ROC AUC, information used to compare the performance of the other models and to support the decision of the cutoff values for the desired objective.

Due to the InCor’s unbalanced dataset [33,34] and the warn of caution in the use of the AUROC it is recommended to include Precision-Recall Curves in decision-making (Fig 6).

The Precision-Recall Curve shows the correlation of those two indices. It is observed that the higher the Recall, the lower the Precision of the generated model. It is important information for the cutoff point of the model. The highest precision is obtained in detriment of the sensitivity, or the opposite. The AP index is also calculated, which in the RF model was 0.73.

The variables importance and influence analysis using the Random Forest (RF), Stochastic Gradient Boosting (SGB), Extra Trees (ET) and AdaBoost Classification (ABC) algorithms is presented, with the resulting magnitude each variable affected the risk of death, in Table 4.

thumbnail
Table 4. Importance of predictor variables in the outcome.

https://doi.org/10.1371/journal.pone.0238199.t004

Table 4 shows that some variables are listed as the most important in the four outstanding studied algorithms, such as: Previous ICU admission, Diagnostic Group, Patient Height at the Time of Surgery, Arterial Oxygen Saturation, Hypoplasia of the Left Heart and the Body Mass Index in Surgery. These variables together represent 67.8% of the importance of the risk of death in the Random Forest (RF) model, 57.6% in the Stochastic Gradient Boosting (SGB), 28.4% in Extra Trees (ET) and 32.0% in AdaBoost Classification (ABC).

Discussion

In recent research on individualized mortality prediction models, it is observed that the use of machine learning techniques can be a tool to support medical decision making. Research in this field has been increasing in recent years [16]. These techniques have shown better performance compared to traditional techniques, such as logistic regression, [15,25,37], including in mortality prediction studies [15].

The experiment was started following the steps described in Fig 2. With the normalized data set, the training and test samples (validation) were separated, using the K-Fold Cross Validation method 10 times (K = 10), where 90% of the sample was separated to train the machine learning algorithms and the rest separated for the validation step. Then, the need to balance the data set was assessed, since there is an unbalanced set, a 1:9 ratio between mortality and post-surgery survival and the search for greater predictability of risk of death. When balancing was necessary, different methods of under-sampling or over-sampling were tested to adjust the distribution of the training sample categories.

With the training and test samples (validation) separated and adjusted, the training step of the algorithm began. In the process of training the algorithms, several parameter configurations were tested, always seeking to minimize the generalization errors, either by overfitting or under fitting [38].

In order to optimize the generalization of the algorithm, the ROC AUC (AUROC) and Average Precision indexes were maximized, aiming to obtain the best forecasting assertiveness and minimize the error of not signaling a patient at high risk of mortality.

Among the six machine learning algorithms studied, it was observed that the Bagged Decision Trees (BGT) and Random Forest (RF) algorithms stood out in predicting mortality in relation to the other studied algorithms. In this selection to decide which one of the algorithms best fulfilled the mortality prognosis, we considered the implementation complexity and the possibility to understand the impact of the variables on the risk of death. Based on these parameters, the Random Forest (RF) algorithm outperformed and made possible to analyze the importance of variables, a function not present in Bagged Decision Trees (BDT). It was also observed that the Random Forest (RF) algorithm did not use all the variables of the data set in the generation of the model, thus, it reduced the implementation complexity. Moreover, the Random Forest (RF) can be highlighted for its precision, ease of training, and adjustments [21].

To the best of our knowledge, this is the first individual’s CHD mortality prognosis ascertainment using AI.

Our AI derived outcomes analyses are in line with the aggregated international scientific literature published. The variables listed as the most important, where the representativeness of “hospital death” is greater with more severe CHD diagnosis, indeed, agrees with the STS core aggregated data published. However, the STS aggregated data more recently published [10] has excluded low weight or out of -7.0 to 5.0 Z score range neonates present in InCor’s patient population due to malnutrition, problems not prevalent in developed countries.

Conclusions

This study suggests the use of Random Forest (RF) as a model of individual death prediction for cardiac surgery in patients with congenital heart disease. The prediction results of Random Forest (RF) corroborate that machine learning algorithms can assist clinical specialists, patients and family members to analyze the risks associated with a possible cardiac surgical intervention.

Understanding which diagnoses and variables impact the probability of mortality of a patient with congenital heart disease, when proposed to be submitted to a cardiac surgical intervention, allows clinical specialists to understand the risks associated with a surgical intervention, provide information to support the decision of health professionals and family members of patients.

Analyzing the variables listed as the most important, it was observed that the representativeness of “hospital death” is greater in patients up to 66 cm in height and BMI below 13.0. In addition, the “hospital death” probability declines with the increase in the arterial oxygen saturation index, allowing focusing the action and medical intervention to mitigate the risk of death.

In the patients’ cluster with previous ICU stay or with prior hospitalization before surgery, it was observed the highest proportion of deaths than the patients who did not needed such admissions. In-depth analysis of the effects of this variable is timely in understanding the risks of death and may be the target of studies in future research.

Accordingly, the most severe diagnosis groups have a higher percentage of death than others, for example the left heart hypoplasia syndrome.

It is, thus, opportune to direct specific studies for these groups and variables that can direct actions to mitigate the risk of death.

As a model-based mortality prediction tool, the CngtSCORE model can be accessed through Web browsers and smartphones.

Perspectives

Future research can evaluate new machine-learning algorithms or even test new variables and diagnoses, as well as an in-depth analysis of the effects of these variables in understanding the risks of death in patients with congenital heart disease undergoing cardiac surgery.

In addition, given the transition of pediatric care into adult life, the continuous evolution of treatment strategies, and the relatively long life expectancies of survivors of cardiac interventions, new machine learning algorithms may compare the long-term efficacy of different treatment strategies.

References

  1. 1. TENNANT P. W. G. et al. 20-year survival of children born with congenital anomalies: a population-based study. Lancet 2010, 2010.
  2. 2. BENJAMIN E. J. et al. Heart Disease and Stroke Statistics’2017 Update: A Report from the American Heart AssociationCirculation, 2017.
  3. 3. LINDE D. VAN DER et al. Birth Prevalence of Congenital Heart Disease Worldwide. Israel Medical Association Journal, v. Vol. 58, N, 2011.
  4. 4. LIN K.; HU Y.; KONG G. Predicting in-hospital mortality of patients with acute kidney injury in the ICU using random forest model. International Journal of Medical Informatics, v. 125, n. 38, p. 55–61, 2019.
  5. 5. JENKINS K. J. et al. Consensus-based method for risk adjustment for surgery for congenital heart disease. Journal of Thoracic and Cardiovascular Surgery, 2002.
  6. 6. JENKINS K. J. Risk adjustment for congenital heart surgery: The RACHS-1 method. Pediatric Cardiac Surgery Annual, v. 7, n. 1, p. 180–184, 2004.
  7. 7. JENKINS K. J.; GAUVREAU K. Center-specific differences in mortality: Preliminary analyses using the risk adjustment in congenital heart surgery (RACHS-1) method. Journal of Thoracic and Cardiovascular Surgery, v. 124, n. 1, p. 97–104, 2002. pmid:12091814
  8. 8. LACOUR-GAYET F. et al. The Aristotle score: A complexity-adjusted method to evaluate surgical results. European Journal of Cardio-thoracic Surgery, v. 25, n. 6, p. 911–924, 2004. pmid:15144988
  9. 9. HÖRER J. et al. Mortality Following Congenital Heart Surgery in Adults Can Be Predicted Accurately by Combining Expert-Based and Evidence-Based Pediatric Risk Scores. World Journal for Pediatric and Congenital Heart Surgery, 2016.
  10. 10. JACOBS J. P. et al. Refining The Society of Thoracic Surgeons Congenital Heart Surgery Database Mortality Risk Model With Enhanced Risk Adjustment for Chromosomal Abnormalities, Syndromes, and Noncardiac Congenital Anatomic Abnormalities. The Annals od Thoracic Surgery, v. 108, issue 2, p. 558–566, 2019.
  11. 11. JACOBS J. P. et al. History of the World Society for Pediatric and Congenital Heart Surgery: The First Decade. World Journal for Pediatric and Congenital Heart Surgery, v. 9, p. 392–406, 2018. pmid:29945512
  12. 12. CARMONA F. et al. Collaborative Quality Improvement in the Congenital Heart Defects: Development of the ASSIST Consortium and a Preliminary Surgical Outcomes Report. Brazilian Journal of Cardiovasculary Surgery, v. 32, p. 260–269, 2017.
  13. 13. VILELA DE ABREU HAICKEL NINA R. et al. O escore de risco ajustado para cirurgia em cardiopatias congênitas (RACHS-1) pode ser aplicado em nosso meio? Rev Bras Cir Cardiovasc, v. 22, n. 4, p. 425–431, 2007. pmid:18488109
  14. 14. OMAR A. V. MEJIA, et al. Predictive performance of six mortality risk scores and the development of a novel model in a prospective cohort of patients undergoing valve surgery secondary to rheumatic fever. Journal Plos One, p. 1–14, 2018.
  15. 15. AWAD A.; BADER-EL-DEN M.; MCNICHOLAS J. Patient length of stay and mortality prediction: A survey. Health Services Management Research, v. 30, n. 2, p. 105–120, 2017. pmid:28539083
  16. 16. RUIZ-FERNÁNDEZ D. et al. Aid decision algorithms to estimate the risk in congenital heart surgery. Computer Methods and Programs in Biomedicine, 2016.
  17. 17. LI H. et al. An artificial neural network prediction model of congenital heart disease based on risk factors A hospital-based case-control study. Medicine (United States), 2017.
  18. 18. MITCHELL T. M. Machine Learning. [s.l.] McGraw-Hill Science/Engineering/Math, 1997.
  19. 19. HAYKIN S. Redes neurais: princípios e prática. 2. ed. ed. Porto Alegre: Bookman, 2001.
  20. 20. SILVA I. N. DA; SPATTI D. H.; FLAUZINO R. A. Redes neurais artificiais: para engenharia e ciências aplicadadas. 2.ed. ed. São Paulo: Artliber, 2010.
  21. 21. HASTIE T.; TIBSHIRANI R.; FRIEDMAN J. The Elements of Statistical Learning. Second Edi ed. [s.l.] Springer Series in Statistics, 2009.
  22. 22. BROWNE M. W. Cross-Validation Methods. Journal of Mathematical Psychology, v. 44, n. 4, p. 108–132, 2001.
  23. 23. KOHAVI, R. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. International Joint Conference On Artificial Intelligence, n. 0, p. 0–6, 1995.
  24. 24. BLAGUS R.; LUSA L. Joint use of over-and under-sampling techniques and cross-validation for the development and assessment of prediction models. BMC Bioinformatics, v. 16, n. 1, p. 1–10, 2015.
  25. 25. AWAD A. et al. Early hospital mortality prediction of intensive care unit patients using an ensemble learning approach. International journal of medical informatics, v. 108, n. October, p. 185–195, 2017. pmid:29132626
  26. 26. CHURPEK M. M. et al. Multicenter Comparison of Machine Learning Methods and Conventional Regression for Predicting Clinical Deterioration on the Wards. Crit Care Med, v. 44, n. 2, p. 298–305, 2017.
  27. 27. DESAUTELS T. et al. Prediction of early unplanned intensive care unit readmission in a UK tertiary care hospital: A cross-sectional machine learning approach. BMJ Open, v. 7, n. 9, p. 1–9, 2017.
  28. 28. COHEN K. B. et al. Methodological Issues in Predicting Pediatric Epilepsy Surgery Candidates through Natural Language Processing and Machine Learning. Biomedical Informatics Insights, v. 8, p. BII.S38308, 2016.
  29. 29. R., P. et al. Mortality prediction in intensive care units with the Super ICU Learner Algorithm (SICULA): A population-based study. The Lancet Respiratory Medicine, v. 3, n. 1, p. 42–52, 2015. pmid:25466337
  30. 30. CHAWLA N. V. et al. SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research, v. 16, n. 1, p. 321–357, 2002.
  31. 31. PROVOST F.; FAWCETT T. Robust classification for imprecise environments. Machine Learning, v. 42, n. 3, p. 203–231, 2001.
  32. 32. BADER-EL-DEN, M.; TEITEI, E.; ADDA, M. Hierarchical classification for dealing with the Class imbalance problem. Proceedings of the International Joint Conference on Neural Networks, v. 2016- Octob, p. 3584–3591, 2016.
  33. 33. SAITO T.; REHMSMEIER M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE, v. 10, n. 3, 2015.
  34. 34. DAVIS, J.; GOADRICH, M. The Relationship Between Precision-Rcall and ROC Curves. Madison: [s.n.], 2006.
  35. 35. HAJIAN-TILAKI K. Receiver operating characteristic (ROC) curve analysis for medical diagnostic test evaluation. Caspian Journal of Internal Medicine, v. 4, n. 2, p. 627–635, 2013. pmid:24009950
  36. 36. FAWCETT T. ROC graphs: Notes and practical considerations for researchers. Machine learning, v. 31, n. 1, p. 1–38, 2004.
  37. 37. XU J. et al. Data Mining on ICU Mortality Prediction Using Early Temporal Data: A Survey. [s.l: s.n.]. v. 16
  38. 38. HAWKINS D. M. The Problem of Overfitting. J. Chem. Inf. Comput. Sci., v. 44, p. 1–12, 2004. pmid:14741005