Predictive performance of six mortality risk scores and the development of a novel model in a prospective cohort of patients undergoing valve surgery secondary to rheumatic fever

Background: Mortality prediction after cardiac procedures is an essential tool in clinical decision making. Although rheumatic cardiac disease remains a major cause of heart surgery in the world, no previous study has validated risk scores in a sample composed exclusively of patients with this condition.
Objectives: To develop a novel predictive model focused on mortality prediction among patients undergoing cardiac surgery secondary to rheumatic valve conditions.
Methods: We conducted a prospective cohort study of consecutive all-comers patients with rheumatic heart disease (RHD) referred for surgical treatment of valve disease between May 2010 and July 2015. Risk scores for hospital mortality were calculated using the 2000 Bernstein-Parsonnet, EuroSCORE II, InsCor, AmblerSCORE, GuaragnaSCORE, and the New York SCORE. In addition, we developed the rheumatic heart valve surgery score (RheSCORE).
Results: A total of 2,919 RHD patients underwent heart valve surgery. After evaluating 13 different models, the top performing areas under the curve were achieved using Random Forest (0.982) and Neural Network (0.952). The most influential predictors across all models included left atrium size, high creatinine values, a tricuspid procedure, reoperation and pulmonary hypertension. Areas under the curve for previously developed scores were all below the performance of the RheSCORE model: 2000 Bernstein-Parsonnet (0.876), EuroSCORE II (0.857), InsCor (0.835), Ambler (0.831), Guaragna (0.816) and the New York score (0.834). A web application is presented where researchers and providers can calculate predicted mortality based on the RheSCORE.
Conclusions: The RheSCORE model outperformed pre-existing scores in a sample of patients with rheumatic cardiac disease.


Introduction
Rheumatic fever (RF) and one of its most prevalent complications, rheumatic heart disease (RHD), are present in approximately 80% of countries worldwide. Patients with advanced RHD who lack access to cardiac surgery die [1].
An improvement in our ability to predict who the best surgical candidates might be can partially account for recent improvements in mortality rates after cardiac procedures. This prediction is frequently accomplished through risk scores. As a consequence, numerous risk scores have been developed over time [2][3][4][5]. Although the widespread use of risk scores is deemed to be a sign of improvement in our clinical decision support system, clinicians often fail to notice that the performance of a given risk score only remains adequate under certain conditions. For example, if the sample in which the risk score was validated differs from the patient population to which it is being applied, then prediction performance could be compromised, ultimately resulting in misleading clinical decisions.
Around the world, there are over 15 million people with RHD, with 300,000 new cases per year and over 200,000 annual deaths [6]. The public health system of Brazil spends over 90 million dollars a year to treat patients with RF and RHD, which has motivated the creation of a task force for prevention [7] and quality improvement initiatives for the surgical treatment of patients with RHD, such as repair rather than replacement of diseased valves, given the well-known consequences of replacement [8]. However, RHD damages the valve leaflets and the subvalvular apparatus, making repair more difficult. For these reasons, proper risk assessment for purposes of informed consent and the determination of treatment in these patients is important, because the traditional risk scores often emerge from non-rheumatic populations. Given that most risk scores to date were developed and validated among patients in developed countries, it is questionable whether their predictive performance would still be optimal when applied to rheumatic patients. Unfortunately, we are not aware of any previous scores validated in a large, prospective sample composed exclusively of patients with a diagnosis of rheumatic valve disease.
In the face of this gap in the literature, our study aimed to evaluate the predictive performance of six different risk scores: the 2000 Bernstein-Parsonnet, EuroSCORE II, InsCor, Ambler, Guaragna and the New York scores. In addition, we developed the RheSCORE model, optimized for mortality risk prediction among patients with rheumatic valve disease. It was our hypothesis that a prediction model specifically designed to be used among patients with rheumatic valve disease would outperform previously existing scores.

Study design
This study is a prospective consecutive all-comers cohort of patients referred to the Department of Thoracic and Cardiovascular Surgery, Heart Institute-University of São Paulo Medical Center, São Paulo, Brazil.

Study population
Between May 2010 and July 2015, a total of 2,919 consecutive all-comers patients with RHD were referred for surgical treatment of valve disease. Symptomatic RHD was characterized according to the 2004 World Health Organization criteria for the diagnosis of first onset, recurrence and chronic RHD (modified Jones criteria) and by transthoracic echocardiography. Patients were excluded from the study if the primary diagnosis of the valvular disease was not RHD or if they were undergoing associated procedures such as myocardial revascularization, ASD closure, thoracic aorta procedures, etc. The ESC/EACTS 2012 guideline was used for surgical indication of valvular heart disease. All data were extracted from the general prospective institutional register (Si3) and stored in compliance with institutional security and privacy governance rules. To ensure data accuracy, quality checks were performed over time by the postgraduate student and the supervisors (authors).

Predicting variables
Our choice of risk scores was defined by a consensus among participating surgeons, with decisions made on the basis of the scores' methodology and popularity in the literature as well as their applicability (Table 1). Clinical and laboratory-related variables for the 2000 Bernstein-Parsonnet [2], EuroSCORE II [9], InsCor [5], Ambler [10], New York [11] and Guaragna [12] scores (Table 2), as well as for our newly proposed model, RheSCORE, were prospectively collected and subsequently scored according to the criteria and definitions stipulated by their developers.

Outcome variables
The outcome variable of interest was hospital mortality, defined as death in the hospital or within 30 days of cardiac surgery.

Data analysis
We followed international reporting guidelines as well as an expert recommendation in our modeling strategy. We started the analysis by performing a graphical exploratory analysis evaluating the frequency, percentage and near-zero variance for categorical variables, the distribution for numeric variables, and missing values and patterns across all variables. In addition, a Maximal Information-based Nonparametric Exploration (MINE) algorithm was run to guide bivariate plot inspection. Feature engineering then proceeded by attempting variable transformations and dummy coding for variables with distributions that were not normal at inspection, variable recategorization or removal for near-zero variance, and different imputation algorithms for variables with missing values. We modeled hospital mortality as the outcome variable. To train and test our models, we used five-fold cross-validation (Table 3). Comparison across models was performed using the area under the curve, sensitivity, specificity, Kappa, and positive and negative predictive values. All calculations were performed using the statistical language R, including the packages ggplot2, caret, rmarkdown, vcd, randomForest, MASS, glmnet, mda, pROC, corrplot, and tabplot. Finally, total scores for the 2000 Bernstein-Parsonnet, EuroSCORE II, InsCor, Ambler, Guaragna, and the New York score were used to predict mortality using logistic regression models under the same validation criteria used for the RheSCORE model.
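The validation and comparison steps described above can be illustrated with a minimal sketch. It is written in Python using only the standard library and is not the authors' R code. The function names, the random seed, the unstratified shuffling of patients into folds, and the toy labels are all assumptions made for illustration. The metric formulas are the standard definitions of sensitivity, specificity, positive and negative predictive values, and Cohen's Kappa.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle row indices and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def confusion_metrics(y_true, y_pred):
    """Summarize 0/1 predictions against 0/1 outcomes (1 = hospital death)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn
    observed = _ratio(tp + tn, n)
    # Agreement expected by chance, from the row and column totals.
    expected = _ratio((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), n * n)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "kappa": _ratio(observed - expected, 1 - expected),
    }


# Hypothetical toy labels, not study data.
folds = k_fold_indices(n_samples=10, k=5)
metrics = confusion_metrics([1, 1, 0, 0, 0, 0, 1, 0], [1, 0, 0, 0, 1, 0, 1, 0])
```

In such a scheme each fold serves once as the test set while the remaining four folds are used for training, and the metrics are then summarized across the five test folds.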

Ethical approval
This study is part of the project: "Mortality prediction in coronary bypass surgery and/or heart valve surgery at InCor: Validation of two external risk models and comparison to the locally developed model (InsCor)" approved with the number 1063/07 (SDC: 3073/07/148) by the Ethics Committee of the Heart Institute of the Hospital das Clinicas, Medicine School, University of São Paulo, Brazil. Because our study used a pre-established database, the use of informed consent forms was waived.

Companion web site
A companion site was designed to contain additional, up-to-date information on the data set and model, as well as a Web application that can perform mortality predictions based on individual patient characteristics. The application was developed using the Shiny framework.

Results
A total of 2,919 RHD patients underwent heart valve surgery. A hospital mortality rate of 3.51% was recorded for the entire population. Mortality rates associated with aortic, mitral and tricuspid surgery were 2.43%, 3.85%, and 7.25%, respectively. Our study sample was mostly composed of patients above the age of 50 years, with over 40% having undergone at least one previous surgical procedure, and with the aortic valve being the most common valve location. A number of baseline variables were significantly different between the group of patients who died and those who did not, including lower ejection fraction, pulmonary hypertension, reoperations, emergency, cardiogenic shock, aortic valve surgery, tricuspid valve surgery, renal failure, dialysis and high creatinine values (Table 4). A more pronounced heterogeneity, demonstrated by increased variability, was observed among variables such as pulmonary hypertension, reoperation and aortic and tricuspid valve surgery procedures (Fig 1A and 1B). In these graphics, all variables are presented in relation to the distribution of age (left-most column).
Results for bivariate associations with mortality from the MINE analysis, a test used to detect overall associations, indicated that pulmonary hypertension, left atrium size, high creatinine, renal failure, tricuspid procedure and aortic valve procedure were the main unadjusted predictors of mortality according to the Maximal Information Coefficient (Table 5).
During our feature engineering, the following variables were deemed to have near-zero variance on the basis of their frequency ratios and percent uniqueness and, despite their clinical relevance, were eliminated from our final model: emergency surgery, cardiogenic shock, concomitant valve and revascularization procedure, presence of pacemaker, myocardial infarction within 48 hours of the surgical procedure, dialysis, and renal failure. Since the percentage of missing values in our cohort was negligible, we opted not to perform imputation.
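A near-zero variance screen based on frequency ratio and percent uniqueness can be sketched as follows. This is a minimal Python illustration, not the authors' R code. The cutoffs actually used in the study are not reported; the values below (a frequency ratio above 95/5 together with fewer than 10% unique values) mirror the documented defaults of the `nearZeroVar` function in the caret package and are an assumption here. The function name and the example data are hypothetical.

```python
from collections import Counter


def near_zero_variance(values, freq_cut=95 / 5, unique_cut=10.0):
    """Flag a variable as near-zero variance.

    freq_ratio: count of the most common value / count of the second most common.
    percent_unique: 100 * number of distinct values / number of observations.
    The cutoffs are assumed defaults, not values reported in the study.
    """
    if not values:
        raise ValueError("values must not be empty")
    counts = Counter(values).most_common()
    percent_unique = 100.0 * len(counts) / len(values)
    if len(counts) == 1:
        # A constant variable has zero variance and is always flagged.
        return {"flagged": True, "freq_ratio": float("inf"),
                "percent_unique": percent_unique}
    freq_ratio = counts[0][1] / counts[1][1]
    flagged = freq_ratio > freq_cut and percent_unique < unique_cut
    return {"flagged": flagged, "freq_ratio": freq_ratio,
            "percent_unique": percent_unique}


# Hypothetical indicator variables, not study data.
rare_event = [0] * 980 + [1] * 20      # 2% of patients have the condition
common_event = [0] * 600 + [1] * 400   # 40% of patients have the condition
rare_result = near_zero_variance(rare_event)
common_result = near_zero_variance(common_event)
```

With these made-up inputs, the rare indicator has a frequency ratio of 49 and is flagged, whereas the common indicator has a ratio of 1.5 and is retained. This matches the pattern described above, in which clinically relevant but infrequent conditions are removed.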
Results for all 13 models regarding their overall performance are displayed in Table 6, with the top performing models being Random Forest and Neural Network, with areas under the curve of 0.982 and 0.952, respectively (Fig 2).
When evaluating the main predictors among our top two models, we observed that left atrium size, high creatinine, tricuspid procedure, reoperation and pulmonary hypertension were consistently the most influential variables predicting mortality (Fig 3). Comparison of model performance was conducted using the area under the curve, where larger values represent a better combination of sensitivity and specificity. Table 7 summarizes the main finding of our paper by comparing the area under the curve for the best performing RheSCORE model with those of the 2000 Bernstein-Parsonnet, EuroSCORE II, InsCor, Ambler, Guaragna, and New York scores, demonstrating a substantial improvement in predictive performance in favor of the RheSCORE model based on Random Forest.
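For readers less familiar with the area under the curve, the following minimal Python sketch computes it from its rank-based definition: the probability that a randomly chosen patient who died received a higher predicted risk than a randomly chosen survivor, with ties counted as one half. It is an illustration only. The study's values were obtained in R, and the function name and toy data below are hypothetical.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of cases and non-cases.

    y_true holds 0/1 outcomes (1 = hospital death); scores holds predicted risks.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # death ranked above survivor
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted risks, not study data.
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In this toy example three of the four death-survivor pairs are ranked correctly, giving an area of 0.75. A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation.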
Finally, we have published a Web application containing the best performing random forest model so that healthcare professionals can calculate predicted mortality rates for individual patients. The application is available at http://www.incor.usp.br/quick/app.html.

Discussion
To the best of our knowledge, this is the first report of a predictive model specifically designed for patients with rheumatic valve conditions undergoing cardiac procedures, with model results made available not as a score but as a Web application. This Web application is readily available to peers as well as to practitioners at the bedside. We have demonstrated that the RheSCORE model using a random forest algorithm provides substantially improved predictive performance over previous scores. We also observed that, among the top performing models, the following variables were consistently ranked among the most important in predicting mortality: left atrium size, high creatinine, a tricuspid procedure, a reoperation procedure and the presence of pulmonary hypertension.
We obtained better prediction performance with the RheSCORE model than with traditional scores. Traditional scores were designed to be simple to calculate, as long as the practitioner could recall their scoring formula at the bedside. Despite their simplicity, efforts to improve the predictive performance of traditional scores have mostly come to a halt in the past decade. Parsonnet [13] was one of the first authors to analyze mortality risk factors in a sample of patients undergoing only coronary artery bypass graft surgery. Eleven years later, a study involving 10,703 patients undergoing coronary artery bypass graft surgery as well as valve procedures in 10 centers in New Jersey (USA) led to the 2000 Bernstein-Parsonnet score [2]. Given that Parsonnet only involved an American-based sample, the EuroSCORE [3] was subsequently developed with 19,000 patients from 128 European centers, and this score was later reformulated to create EuroSCORE II [9].
With scores now validated in both European and American populations, our team validated the 2000 Bernstein-Parsonnet and EuroSCORE scores among a Brazilian group of patients undergoing coronary artery bypass graft surgery and valve procedures, also generating the new InsCor model [5]. Given that the InsCor score was specifically designed to address the needs of a patient population that is essentially different from its American and European counterparts, the InsCor was the most appropriate of all three [4]. Although there are a number of other scores in the literature, to our knowledge none of them has substantially improved prediction performance, including the Ambler score [10], with an area under the curve of 0.77 in the original publication and 0.73 in the Brazilian population [14]. Scores specifically designed for patients undergoing valve procedures have also not achieved substantially greater performance, including Hannan's score [11], with an area under the curve of 0.79. To our knowledge, Hannan's score has not been previously validated in a sample involving patients from developing countries. Finally, Guaragna published a valve-specific model validated in a Brazilian population [12], with a resulting area under the curve of 0.83 in the original publication and 0.78 in a subsequent validation [15]. Our current development of the RheSCORE model can be considered the next generation in model development, with prediction results that far surpass those of classical scores.
Our finding regarding the importance of left atrial size is aligned with previous reports [16], in which it often surpassed the combination of multiple isolated predictors. For example, in one previous series evaluating surgical outcome predictors, left atrial size was found to be the primary outcome predictor, although this association might vanish in the presence of atrial fibrillation [17].
The importance of left atrial size can be explained since this parameter reflects both the severity and duration of mitral regurgitation, both of which can significantly affect mortality risk.
Regarding the predictive importance of high creatinine levels, our findings concur with many previous publications demonstrating its association with high mortality after cardiac surgery when compared with controls. Of importance, previous findings have demonstrated that even small serum creatinine changes after surgery can significantly affect mortality, this association being independent of other well-established perioperative risk indicators [18].
Our results regarding the importance of tricuspid procedures in predicting mortality align with previous publications pointing to these interventions as carrying the second highest risk of mortality after valvular heart surgery [19]. In a separate series evaluating determinants of surgical mortality after cardiac surgery, the tricuspid procedure was again shown to be the second highest determinant of mortality [20] among a selected group of 19 predictors. Although this was not evaluated in our study, previous studies report that mortality rates for re-operated patients undergoing tricuspid procedures can rise by up to 37% [21], a factor that should be taken into account when planning procedures as well as when discussing potential risks with patients.
Despite the increased surgical trauma and the underlying reasons leading to a re-operation, heart valve re-operations are known to be performed with acceptable operative mortality, although some patient categories present elevated risks [22]. This underscores the need for appropriate risk prediction and stratification in relation to therapeutic options and preoperative selection.
Age remains an independent predictor of mortality in this population, although rheumatic patients are typically younger [23]. Our data indicate that left ventricular dysfunction, assessed by left ventricular ejection fraction (LVEF), is also associated with mortality after heart valve surgery in rheumatic patients. As in a previous report not specific to rheumatic patients [24], the number of valve reoperations was an independent predictor of hospital mortality. Finally, in contrast to previous publications, we observed no association between gender and hospital mortality [22].
Despite a significant improvement in predictive performance when compared to previously reported scores, our study does have limitations. First, although our model is transparent enough to point to the most important variables predicting mortality, it does not provide a clear causal path. In other words, our model does not offer an instrument that could help us better design quality improvement programs aimed at reducing mortality rates after cardiac surgery among patients with rheumatic valve conditions. This limitation could be addressed in future studies in which causal models such as Bayesian Networks are used not only to predict mortality but also to determine a clinically interpretable causal model and to conduct causal experiments. Second, our cohort of rheumatic patients comes from one of the cities with the highest income in Latin America. This probably explains why the average age of our subjects is at the upper limit of that reported for upper-middle-income countries included in the REMEDY Study [25].

Conclusions
In conclusion, we believe that future studies should further validate the predictive performance of the RheSCORE model among patient populations from other countries, evaluate how healthcare professionals might use our Web application in daily clinical practice, and investigate how that use might affect their clinical decision making. Despite these pending evaluations, and in view of our results pointing to superior predictive performance, we recommend the incorporation of the RheSCORE model into daily practice when attempting to predict mortality risk among patients undergoing cardiac surgical procedures for rheumatic valve conditions.

Perspectives
• The RheSCORE model, developed specifically for rheumatic-related conditions, has superior predictive performance when compared to previous traditional scores.
• The most important variables predicting mortality across different models were left atrium size, high creatinine, a tricuspid procedure, a reoperation procedure and the presence of pulmonary hypertension.
• As a model-based mortality prediction tool, the RheSCORE model can be accessed through Web browsers and smartphones at http://www.incor.usp.br/quick/app.html

Author Contributions
Conceptualization