Short- and long-term mortality prediction after an acute ST-elevation myocardial infarction (STEMI) in Asians: A machine learning approach

Background Conventional risk scores for predicting short- and long-term mortality following an ST-segment elevation myocardial infarction (STEMI) are often not population specific.
Objective To apply machine learning for the prediction and identification of factors associated with short- and long-term mortality in Asian STEMI patients and compare it with a conventional risk score.
Methods The National Cardiovascular Disease Database for Malaysia registry, covering a multi-ethnic, heterogeneous Asian population, was used for in-hospital (6299 patients), 30-day (3130 patients) and 1-year (2939 patients) model development. Fifty variables were considered. Mortality prediction was analysed using feature selection methods with machine learning algorithms and compared to the Thrombolysis in Myocardial Infarction (TIMI) score. Invasive management of varying degrees was selected among the important variables that improved mortality prediction.
Results Model performance using complete and reduced variable sets produced an area under the receiver operating characteristic curve (AUC) from 0.73 to 0.90. The best machine learning model outperformed the TIMI risk score for in-hospital (AUC = 0.88, 95% CI: 0.846–0.910 vs AUC = 0.81, 95% CI: 0.772–0.845), 30-day (AUC = 0.90, 95% CI: 0.870–0.935 vs AUC = 0.80, 95% CI: 0.746–0.838) and 1-year mortality (AUC = 0.84, 95% CI: 0.798–0.872 vs AUC = 0.76, 95% CI: 0.715–0.802; p < 0.0001 for all). The TIMI score underestimates patients’ risk of mortality: 90% of non-surviving patients were classified as high risk (>50%) by the machine learning algorithm, compared to 10–30% by TIMI. Common predictors identified for short- and long-term mortality were age, heart rate, Killip class, fasting blood glucose, prior primary PCI or pharmaco-invasive therapy and diuretics. The final algorithm was converted into an online tool with a database for continuous data archiving for algorithm validation.
Conclusions In a multi-ethnic population, patients with STEMI were better classified using the machine learning method compared to TIMI scoring. Machine learning allows for the identification of distinct factors in individual Asian populations for better mortality prediction. Ongoing continuous testing and validation will allow for better risk stratification and potentially alter management and outcomes in the future.


Introduction
Half of the global burden related to ischemic heart disease occurs within the Asia-Pacific region [1]. Prediction of mortality risks associated with acute coronary syndrome (ACS) is often evaluated using risk scores such as the Thrombolysis in Myocardial Infarction (TIMI) or Global Registry of Acute Cardiac Events (GRACE) scores. These scores are extrapolated from studies of predominantly Caucasian patients, with limited participation from Asia [2]. Asian countries tend to have younger patients with myocardial infarction, a higher burden of diabetes mellitus, hypertension and renal failure, as well as higher rates of delayed presentation for medical care [3,4]. South-East Asia in particular is unique because of its heterogeneity, owing to inherent genetic variations in an already diverse group of multi-ethnic communities. Conventional risk scores may not be able to account for nuances related to the individual region in terms of disease burden, healthcare resources and available interventions.
The TIMI risk score is widely used due to its simplicity in calculation and accuracy in STEMI patients. TIMI scoring, unlike the GRACE score, was derived from patients with ST-segment elevation myocardial infarction (STEMI) only [5]. Studies using TIMI scores amongst Asians revealed a higher incidence of STEMI when compared to their Caucasian counterparts, with somewhat similar mortality risk. This discrepancy is difficult to explain, especially in the context of a higher disease burden amongst Asian patients.
Conventional cardiovascular disease (CVD) risk assessment models assume that risk factors have a linear relationship to clinical outcomes, leading to the oversimplification of a truly complex relationship. There is a need to develop models that consider these multiple risk factors and outcomes, including the use of machine learning (ML) algorithms [2,6-8].
Current evidence supporting the use of ML over statistically based models in mortality prediction includes studies using Logistic Regression (LR), Support Vector Machine (SVM) and Random Forest (RF). ML has been shown to outperform conventional risk scoring models in population-specific post-STEMI mortality studies in countries such as China, Israel and Korea [2,7,9].
To our knowledge, the development and application of ML algorithms to predict short- and long-term mortality post-STEMI in a heterogeneous Asian population has yet to be reported. This study aims to identify factors and develop an ML model risk calculator that predicts short- and long-term mortality in a heterogeneous South-East Asian population.

Study data
We used retrospective data from the Malaysian National Cardiovascular Database (NCVD-ACS) registry collected between 2006 and 2016. The NCVD registry was approved by the Medical Review & Ethics Committee (MREC), Ministry of Health (MOH) Malaysia in 2007 (Approval Code: NMRR-07-20-250). MREC waived informed patient consent for NCVD. The registry collects data on a standardised set of clinical, demographic, and procedural variables, along with outcomes, for consecutive patients treated at participating institutions [10,11]. The study was also approved by the UITM ethics committee (Reference number: 600-TNCPI (5/1/6)) and the National Heart Association of Malaysia (NHAM) for data acquisition.
All patients in the ACS registry were used without exclusion, including patients who received reperfusion (fibrinolysis, primary PCI (PPCI), angiography demonstrating spontaneous reperfusion, or urgent coronary artery bypass grafting (CABG)) for STEMI. In this context, STEMI was defined as persistent ST-segment elevation ≥ 1 mm in two contiguous electrocardiographic leads, or the presence of a new left bundle branch block in the setting of positive cardiac markers. Fifty variables from the complete-case data were used in this study, chosen on clinical recommendation. The categories of variables used were sociodemographic characteristics, CVD diagnosis and severity, CVD risk factors, CVD comorbidities, non-CVD comorbidities, biomarkers and medication used. The mortality time frames (in-hospital, 30 days and 1 year) were calculated from the first hospital admission. Deaths were confirmed yearly via record linkage with the Malaysian National Registration Department. The registry data used here do not include short-term complications such as heart failure: the follow-up data points are meant to capture these variables but had too many missing values, so we omitted them from the study. To increase the impact of the study, we focused our algorithm on policy-changing hard endpoints such as death, as has been done in other publications [2,7,9].

Classification and sample pre-processing
We developed the ML models using complete cases to ensure the validity of the findings. Of 27,592 STEMI cases collected from the registry, 12,368 were identified as complete cases (no missing values on predictors). Of these 12,368 cases, 6299, 3130 and 2939 were used for in-hospital, 30-day and 1-year model development, respectively. Complete cases therefore made up about 45% of patients (12,368 of 27,592), each with a full predictor set of 50 variables (9 continuous, 41 categorical) for each time frame (Table 1). Stratified random sampling of data was used [12]. Data were split for model development (70%) and validation (30%). We assessed the performance of ML and TIMI using the validation set, i.e. the 30% of data for each time frame that was not used for model development.
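The stratified 70/30 split described above can be illustrated with a minimal Python sketch. This is not the authors' R code: the function name `stratified_split`, the seed and the toy outcome vector are hypothetical, and only the 70/30 ratio is taken from the text.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, valid_idx) that preserve the class ratio of `labels`."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, valid_idx = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)  # random order within each outcome class
        cut = round(len(members) * train_fraction)
        train_idx.extend(members[:cut])
        valid_idx.extend(members[cut:])
    return sorted(train_idx), sorted(valid_idx)


# Hypothetical toy outcomes: 1 = died, 0 = survived (10% event rate).
labels = [1] * 100 + [0] * 900
train_idx, valid_idx = stratified_split(labels)
```

Because sampling is done within each outcome class, the development and validation sets keep the same event rate, which matters when deaths are a small minority of cases.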

Model development and calibration
Prediction models post-STEMI were developed using three selected ML algorithms. Next, feature selection (see below) was carried out iteratively on the ranked variables in ascending order of importance [13]. 10-fold cross-validation was used on the training set to avoid overfitting during model development [14]. The prediction models were trained and tested at each iteration, and the models with the highest performance and the fewest variables were selected. Predictive performance of the models was calculated using the validation dataset.
Table 1 note: Data are shown as n (%) for categorical variables and mean ± SD for continuous variables; p < 0.001 is considered highly statistically significant. https://doi.org/10.1371/journal.pone.0254894.t001
Secondary analyses were carried out after adding 15,224 incomplete cases, imputed using multivariable imputation by chained equations with predictive mean matching, giving a total of 27,592 cases [15]. This method imputes each missing value with a real value observed in another case whose predicted value is closest. Here, an incomplete case is one with up to 50% of its variables missing. The missing data concern patient characteristics, not outcome data. As our dataset is a prospective dataset with retrospective data management, the missingness across all variables was completely random and beyond our control. The probability of missingness in our dataset depends neither on the observed values of any variable nor on the unobserved part of the dataset. Hence the data are classified as missing completely at random (MCAR), the highest level of randomness, which implies that the pattern of missing values does not depend on any variable, whether or not it is included in the analysis. We had complete data for all our outcomes. The models were tested with a validation dataset similar to that used for the ML models trained on complete cases.
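The idea behind predictive mean matching can be shown with a deliberately simplified Python sketch. It handles one incomplete variable with one predictor and a plain least-squares fit, whereas full chained-equations imputation cycles over all incomplete variables and also draws the regression parameters at random. The function `pmm_impute`, the donor pool size `k = 3` and the toy values are assumptions made for illustration.

```python
import random


def pmm_impute(x, y, k=3, seed=0):
    """Fill None entries of y by predictive mean matching on one predictor x."""
    rng = random.Random(seed)
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(observed)

    # Ordinary least-squares line fitted on the observed cases only.
    mean_x = sum(xi for xi, _ in observed) / n
    mean_y = sum(yi for _, yi in observed) / n
    sxx = sum((xi - mean_x) ** 2 for xi, _ in observed)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in observed)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    def predict(xi):
        return intercept + slope * xi

    completed = []
    for xi, yi in zip(x, y):
        if yi is not None:
            completed.append(yi)
            continue
        target = predict(xi)
        # Donors: the k observed cases whose predicted value is closest.
        donors = sorted(observed, key=lambda d: abs(predict(d[0]) - target))[:k]
        # Borrow a real observed value from one randomly chosen donor.
        completed.append(rng.choice(donors)[1])
    return completed


# Hypothetical toy data: age as predictor, heart rate with two gaps.
ages = [40, 50, 60, 70, 80, 55, 65]
heart_rates = [70, 75, 82, 88, 95, None, None]
completed = pmm_impute(ages, heart_rates)
```

Every imputed value is copied from a real case, so the filled-in data never contain values outside the observed range.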

Machine learning algorithms and calibration
Supervised classification ML algorithms RF [16], SVM [17] and LR [18] were selected in this study. These classifiers have shown high predictive performance compared to conventional methods in mortality studies [7,19]. RF and SVM are black-box models (models without interpretability), whereas LR is a white-box model (a model with good interpretability) [16,18,20]. The ML algorithms' hyperparameters were tuned to optimized values to obtain higher predictive performance (S1 Table), since tuned hyperparameters improve ML model performance over the default settings [21]. The area under the receiver operating characteristic curve (AUC) was used as the predictive performance metric [22]. Additional performance metrics were accuracy, sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) for model calibration [23]. A paired resampled t-test was used to compare the ML models' predictive performance [12,24].
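For readers less familiar with these metrics, the following Python sketch shows how they are defined. It is an illustration only, not the study's code; the function names, the 0.5 probability cut-off and the toy predictions are assumptions.

```python
def auc(scores, labels):
    """AUC as the probability that a random event outranks a random non-event."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Ties between an event and a non-event count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_metrics(scores, labels, threshold=0.5):
    """Accuracy, sensitivity, specificity, PPV and NPV at a given cut-off."""
    pred = [int(s >= threshold) for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(pred, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(pred, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(pred, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(pred, labels))
    return {
        "accuracy": (tp + tn) / len(labels),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical predicted death probabilities and outcomes (1 = died).
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 1, 0, 0, 0, 0, 0]
model_auc = auc(scores, labels)
metrics = confusion_metrics(scores, labels)
```

The AUC is threshold-free, whereas the other five metrics depend on the chosen cut-off, which is why they are reported alongside it rather than instead of it.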

Feature selection
Feature selection is the process of ranking variables using a classifier-specific variable evaluator. The RF, SVM and LR variable importance methods were used to rank the importance of variables associated with the outcome (survival / non-survival in-hospital, at 30 days and at 1 year).
Feature reduction eliminates existing variables to reach a minimal set, which reduces training time and can improve the accuracy of results. Sequential backward elimination (SBE) [13] was used in this study for feature reduction, applied iteratively to the variables ranked in ascending order of importance by the ML variable importance methods. The ML prediction model is retrained and tested each time a variable is eliminated. A variable whose elimination causes a significant decrease in the AUC of the prediction model is deemed important. We selected the important variables and ranked them again, and the elimination process was repeated until a model with a reduced number of variables and the highest AUC value was achieved.
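One pass of this elimination loop can be sketched in Python as follows. This is a simplified reading of the procedure, not the authors' implementation: `evaluate` stands in for retraining a classifier and returning its cross-validated AUC, and the feature names, the `tolerance` parameter and the toy scoring function are hypothetical.

```python
def backward_eliminate(ranked_features, evaluate, tolerance=0.0):
    """Drop features from least to most important while the AUC does not fall.

    ranked_features: feature names ordered from least to most important.
    evaluate: callable taking a list of features and returning an AUC.
    """
    kept = list(ranked_features)
    best_auc = evaluate(kept)
    for feature in ranked_features:
        if len(kept) == 1:
            break
        candidate = [f for f in kept if f != feature]
        candidate_auc = evaluate(candidate)  # "retrain and test" step
        if candidate_auc >= best_auc - tolerance:
            # Removal did not hurt performance: eliminate the feature.
            kept = candidate
            best_auc = max(best_auc, candidate_auc)
        # Otherwise the AUC dropped, so the feature is deemed important.
    return kept, best_auc


# Hypothetical stand-in for model training: informative features raise the
# AUC, and any other feature lowers it slightly.
signal = {"age": 0.15, "killip_class": 0.12, "heart_rate": 0.08}


def toy_auc(features):
    return 0.5 + sum(signal.get(f, -0.01) for f in features)


ranked = ["noise_a", "noise_b", "heart_rate", "killip_class", "age"]
selected, selected_auc = backward_eliminate(ranked, toy_auc)
```

In the study the retained variables were then re-ranked and the pass repeated; the sketch shows a single pass only.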
We also used recursive feature elimination (RFE) to find a minimal and best set of variables by removing the least important features, and compared the result with feature selection by the ML methods (RF, SVM and LR) [29,30]. RFE combined with ML classifiers has been used successfully in various clinical datasets [25-28].

Comparison with TIMI score
TIMI scores calculated in the NCVD registry were used for the validation data performance. TIMI score performance (AUC) was compared with the developed ML-based models using the validation set. We derived a graph to compare performance between ML and the TIMI score based on cutoff points applicable in clinical practice and the literature [31]. A high risk of death was defined as a probability of death of more than 8%, similar to that reported by Correia et al. [31].
Net reclassification improvement (NRI) was used to determine the change in discrimination between the TIMI risk score for STEMI and the ML algorithm [32]. The NRI uses reclassification tables to examine whether there is an additive benefit gained from reclassifying patients using a different approach to mortality assessment. By calculating the NRI, we were able to quantify the degree to which the different mortality risk assessment approaches drive correct movement between categories. An NRI can be interpreted as the percentage by which the net classification has improved by using the new approach.
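For a two-category (low/high risk) split, the NRI is the net proportion of patients who died and were moved up to high risk, plus the net proportion of survivors who were moved down to low risk. The Python sketch below illustrates that standard definition; the function name and the toy counts are hypothetical and are not taken from the study's reclassification tables.

```python
def two_category_nri(old_high, new_high, died):
    """Net reclassification improvement for a low/high risk split.

    All arguments are parallel sequences of 0/1 flags, one entry per patient:
    old_high / new_high mark high risk under the old and new score.
    """
    counts = {True: {"n": 0, "up": 0, "down": 0},
              False: {"n": 0, "up": 0, "down": 0}}
    for old, new, event in zip(old_high, new_high, died):
        group = counts[bool(event)]
        group["n"] += 1
        if new and not old:
            group["up"] += 1
        elif old and not new:
            group["down"] += 1

    events, non_events = counts[True], counts[False]
    # Moving up is correct for events; moving down is correct for non-events.
    event_part = (events["up"] - events["down"]) / events["n"]
    non_event_part = (non_events["down"] - non_events["up"]) / non_events["n"]
    return event_part + non_event_part


# Hypothetical toy cohort: 10 deaths (4 moved up, 1 moved down) and
# 20 survivors (3 moved down, 1 moved up).
old_high = [0] * 4 + [1] * 1 + [1] * 5 + [1] * 3 + [0] * 1 + [0] * 16
new_high = [1] * 4 + [0] * 1 + [1] * 5 + [0] * 3 + [1] * 1 + [0] * 16
died = [1] * 10 + [0] * 20
nri = two_category_nri(old_high, new_high, died)
```

In this toy cohort the event component is (4 − 1)/10 = 0.3 and the non-event component is (3 − 1)/20 = 0.1, giving an NRI of 0.4.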

Additional statistics
The results are expressed as mean and SD for continuous variables and as frequencies for categorical variables. Correlation analysis was carried out to identify significant relationships between variables. Univariate analysis was performed using a chi-square test and a two-sided independent Student's t-test to identify significant variables (p < 0.05). ML performance was compared using a pair-wise corrected resampled t-test [33,34]. A p-value of less than 0.05 was considered statistically significant.
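The corrected resampled t-test differs from an ordinary paired t-test by inflating the variance term to account for the overlap between resampled training sets. A minimal Python sketch of the statistic, in its usual form with variance scaled by (1/k + n_test/n_train), is given below. The function name, the ten AUC differences and the 700/300 split sizes are hypothetical, and the sketch stops at the t statistic because the p-value requires a t distribution from a statistics library.

```python
import math
from statistics import mean, variance


def corrected_resampled_t(diffs, n_train, n_test):
    """Corrected resampled t statistic for paired performance differences.

    diffs: per-resample differences in a metric (e.g. AUC of model A minus B).
    Returns (t, degrees_of_freedom).
    """
    k = len(diffs)
    # 1/k is the usual paired-test term; n_test/n_train is the correction
    # for the dependence between overlapping training sets.
    scale = 1.0 / k + n_test / n_train
    t = mean(diffs) / math.sqrt(scale * variance(diffs))
    return t, k - 1


# Hypothetical AUC differences between two models over 10 resamples.
diffs = [0.06, 0.08, 0.07, 0.05, 0.09, 0.07, 0.06, 0.08, 0.07, 0.07]
t_stat, dof = corrected_resampled_t(diffs, n_train=700, n_test=300)
```

With a 70/30 split the correction roughly halves the t statistic relative to the uncorrected paired test, making the comparison more conservative.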

Software
R (version 3.5.2) was used for ML algorithm development. Statistical analysis was conducted using the Statistical Package for Social Sciences (SPSS) program, version 16.0 [35].

Patient characteristics
A total of 27,592 STEMI patients were identified. Incomplete data made up 55.2% of patients enrolled. Table 1 illustrates the characteristics of the patients in the complete dataset used in this study. The mean age was 56.6 years (SD = 11.7). The majority of patients (~87%) were male. The overall mortality reported for in-hospital, 30 days and 1 year was 5.4%, 8.1% and 14.4%, respectively. There was a significant difference between survivors and non-survivors for in-hospital, 30-day and 1-year mortality in terms of gender, smoking status, diabetes, renal disease, heart rate, Killip class, fasting blood glucose, ECG abnormalities, and beta-blocker, ACE inhibitor, statin, diuretic, insulin and anti-arrhythmic agent use (p < 0.0001 for all). S2 Table illustrates patient characteristics for the secondary analysis using the imputed dataset. The statistical analyses on the complete and imputed datasets gave similar results.

ML prediction
On the untouched 30% validation dataset, ML models constructed using both reduced and complete sets of variables achieved higher predictive performance than the TIMI risk score (Table 2). TIMI outperformed only the RFE-LR model, for 30-day and 1-year mortality. The best-selected ML model (SVMvarImp-SBE-SVM) performed better than TIMI based on the AUC value using the untouched 30% validation dataset (p < 0.0001 for all models). Detailed performance evaluation of the best ML model against the TIMI risk score is presented in Table 3. Fig 1(a) illustrates the ML model performances and Fig 1(b) the best-selected ML model against TIMI, based on the AUC value using the untouched 30% validation dataset.

Table 2. The AUC of TIMI risk score and ML models with and without feature selection based on a 30% validation dataset.

Table 3. Additional performance metrics based on a 30% validation dataset for TIMI risk score and ML models with and without feature selection.

Feature selection
RFE and SBE feature selection methods were combined with ML algorithms to construct predictive models with optimal performance (refer to methods). The initial ranking of all 50 variables for the best model (SVMvarImp-SBE-SVM) using SVM variable importance is shown in S1-S3 Figs. SBE was then used to identify the features that gave optimal model performance. The common predictors observed for in-hospital, 30-day and 1-year mortality across all ML models in this study were age, heart rate, Killip class and fasting blood glucose. Diuretics were an additional common predictor for the best model (SVMvarImp-SBE-SVM). Age, heart rate and Killip class were common to the best ML model (SVMvarImp-SBE-SVM) for in-hospital, 30-day and 1-year mortality and to TIMI (Table 4).
[...] stratum as ≥50%. This is equivalent to a TIMI low-risk score of ≤5 and a high-risk score of >5 [5].

Comparison of ML to TIMI risk score when applied to validation dataset
In the high-risk group, ML predicted mortality better than TIMI for in-hospital death (21.94% vs 16.15%), but the two were similar for 30-day and 1-year deaths (25.61% vs 23.15% and 35.71% vs 34.48%, respectively).
For the in-hospital model, reclassifying patients with ML gave an NRI of 0.20 (p < 0.0001) over the original TIMI risk score, that is, a 20% improvement in classification. The corresponding NRI was 0.19 (p < 0.0001; 19% improvement) for the 30-day model and 0.14 (p < 0.0001; 14% improvement) for the 1-year model (Table 5).

Discussion
To our knowledge, our study is the first to show better short- and long-term mortality prediction using the ML method in multi-ethnic Asian patients with STEMI. We demonstrated high performance on the validation dataset for ML models combining feature selection and classifier algorithms. Overall, the ML model performed better than TIMI for in-hospital, 30-day and 1-year mortality (AUC 0.88 vs 0.81, 0.90 vs 0.80 and 0.84 vs 0.76, respectively). SVMvarImp-SBE-SVM also performed better than RF, LR and TIMI scoring for in-hospital, 30-day and 1-year mortality prediction.
The TIMI risk score was originally developed to estimate 30-day mortality risk. In the absence of a more convenient risk score system, it has since been used to predict in-hospital, 30-day and 1-year mortality post-STEMI in other Asian countries as well as Mexico [2,36-38]. This is despite its moderate accuracy for risk prediction in Asians, with an AUC of 0.78 [3]. In that validation study, the Asian cohort was found to carry an overall higher disease burden and risk than the TIMI cohort. The mortality rate, however, was no different, suggesting an inherent inaccuracy within the algorithm. Moreover, TIMI is known to underestimate mortality risk in the lower-risk group, which may delay treatment and incur excess avoidable deaths.
The TIMI risk score for STEMI consists of the following components: age; systolic blood pressure; heart rate; Killip classification; infarct location or left bundle branch block; a history of diabetes, hypertension or angina pectoris; weight; and time to reperfusion (thrombolysis or PPCI). Previous studies have modified 'time to reperfusion' to be 'door-to-needle' or 'door-to-balloon' time instead of 'symptom onset-to-reperfusion' time because of inconsistencies in the reporting of symptom onset time [39]. Our study excluded some variables, such as angina pectoris, weight and time to reperfusion, from model development as over 50% of the data were missing. Additional parameters (Table 1) were included, such as ethnicity, smoking status, invasive and non-invasive treatments, lipid profile and features from the complete blood chemistry at admission.
Feature selection algorithms are essential in mortality prediction. Combining feature selection methods with classification algorithms has resulted in higher performance than using standalone classifiers [29]. Feature selection algorithms improve ML model performance with a reasonable number of predictors by reducing the dimensionality of the predictor set [40]. The model performance in this study increased as the number of predictors was reduced. Our results indicate that the ML models need 15 predictors for in-hospital, 13 for 30-day and 12 for 1-year mortality prediction to perform better than models developed using a conventional statistical approach.
We used univariate analysis to support the relationship between the variables selected by the ML algorithms and outcomes (Table 1). Age, heart rate, Killip class and fasting blood glucose were ranked and selected by all short- and long-term mortality prediction ML models. Older age and higher Killip class were significant predictors of mortality [41,42]. Age, Killip class and fasting blood glucose were also selected as factors that affect mortality post-STEMI by ML models in previous studies [7,19]. Glucose levels were ranked by all ML models, supporting the relationship between hyperglycemia and increased risk of mortality for patients with STEMI in the Asian population [43]. Higher heart rates in STEMI patients were associated with an increased risk of mortality, even after primary PCI [44]. This may reflect a worse presentation (higher Killip class) or even higher pain intensity from a larger infarct.

Table 5 (in-hospital panel): individuals with events (n = 101); number of individuals.

Incorporating variables such as invasive or non-invasive management into the SVMvarImp-SBE-SVM model to predict in-hospital, 30-day and 1-year mortality yielded interesting results. Invasive treatment such as PCI received by STEMI patients showed a trend towards better outcomes in hospital and at 30 days after discharge. Mortality risk at 1 year was reduced by 40% for patients who received PCI compared to those who did not [4,39,45].
TIMI and GRACE scores were calculated based on data from an era when early reperfusion therapy and routine use of drug-eluting stents were not common. Non-invasive treatment predictors such as pharmacological therapy, including anti-hypertensives (ACE inhibitors, beta-blockers, diuretics), anti-diabetic agents (oral hypoglycaemic agents, insulin) and antiplatelets, were selected for in-hospital, 30-day and 1-year mortality prediction in our study. These drugs are often prescribed in the acute setting to augment neurohumoral modulation associated with left ventricular negative remodelling. Being on these medications could signal a sicker ventricle, hence the strong association with death.
Systolic and diastolic blood pressure were ranked as predictors for the in-hospital, 30-day and 1-year models. Cardiogenic shock at presentation increases the risk of death. STEMI patients with cardiogenic shock who survive the hospital stay are at an increased risk of long-term death, probably reflecting the severity of the initial admission [46].
Other CVD risk factors, such as hypertension, diabetes, smoking and chronic renal disease, were associated with a poorer 1-year outcome. Poorly controlled CVD risk leads to adverse systemic remodeling and, in turn, a plethora of cardiovascular conditions including heart failure, stroke, renal failure and peripheral vascular disease [47].
Continuous data collection through an electronic health records system allowed us to adapt the ML predictive algorithm to patients' risk grouping. The ML methods discussed in this study are needed to rank and select significant risk factors associated with short- and long-term STEMI mortality. Feature selection allows better interpretation of the models by restricting the scope of predictors used, selecting only those that are clinically relevant. The ML models in this study demonstrated higher performance than TIMI scoring, which was extrapolated from a Caucasian cohort. Asian patients present at a younger age with acute coronary syndromes. The average age in the GRACE registry was 61, whereas it is 58 in Malaysia and 51 in the Middle East [48]. Numerous factors are associated with differences in presentation. Hence risk scoring tools should be adapted to a specific population to reflect these differences with greater accuracy.
Data imputation was performed to ensure the validity of the findings. We used multivariable imputation by chained equations with predictive mean matching rather than a machine learning-based method such as missForest. This imputation method was selected as recommended in a similar study on the Swedish heart registry dataset that achieved high model performance [19]. Moreover, Solaro et al. demonstrated that the relative performance of missForest varied with the MCAR data patterns and did not show a clear advantage. Overall, the imputation accuracy and applicability of missForest are still unclear [49]. We initially did not include patients with more than 50% missing data, as they would require data imputation, which may affect our results. We do not feel this is a limitation for the population, as the dataset is still large. As the dataset had complete data for all follow-up time points, generation of a risk calculator was possible for both ML and TIMI. For identifying factors associated with short- and long-term mortality, however, the use of complete cases would lead to more reliable findings. We then returned to the incomplete dataset, imputed the data, and obtained similar results.
The cross-validation approach used in this study increases the efficacy of the models during model construction, as it reduces the risk of over-fitting. Classification performance is also highly influenced by data pre-processing and tuning of algorithms [50]. A pair-wise corrected resampled t-test was used to evaluate the differences between the ML models' predictive performances. The resampled t-test is a validated tool for comparing the outcomes of two classifiers [33,34]. The ML algorithms SVM and RF have demonstrated high predictive performance when combined with feature selection in mortality-related studies [7,19]. Both RF and SVM models were used to determine variable importance, which is essential to good model performance. RF, SVM and LR combined with SBE, a feature reduction algorithm, showed higher performance than with RFE. The SBE algorithm relies only on variable importance to eliminate unimportant variables one by one from a model [51]. Meanwhile, RFE is reported to have poor generalisation ability.
ML models in this study were validated with untouched validation data that were not used for model development, to confirm the reliability of the current study. We also built ML models using the complete set of collected variables, without a variable selection process, which performed similarly to the models with feature selection. This shows that feature selection does not lead to the loss of important prognostic information.
Despite a large proportion of missing values in the original dataset, we were still able to apply both TIMI and the ML algorithms and compare outcomes. This is likely because we used a hard endpoint of death that is not affected by missing values. Another possibility is that the variables extracted (15 for in-hospital, 13 for 30 days and 12 for 1 year) were sufficient to increase the model's precision to predict death reliably.
Future studies will focus on real-time validation of the ML algorithm in several local hospitals for continuous assessment of its reliability. Applying population-specific ML models together with conventional risk scoring methods may allow better mortality prediction and communication, increase patients' awareness to enable behavioural modification, and help clinicians better manage limited resources.

Study limitations
This study compared the performance of an ML-based model for in-hospital, 30-day and 1-year mortality with a clinical prognostic model that was designed for 30-day mortality. Its robustness would have been greater had we included further variables and compared the models with other scoring systems such as GRACE and the Heart Score. The lack of certain variables precluded this attempt. We recognise that missing variables may result in biased findings. We attempted to reduce this effect by applying the TIMI score and the ML-based score to the same population. Selection bias that exists within registries is difficult to control. We hope that future real-world studies will validate our findings.