Emergency department routine data and the diagnosis of acute ischemic heart disease in patients with atypical chest pain

Background Due to an aging population and the increasing proportion of patients with various comorbidities, the number of patients with acute ischemic heart disease (AIHD) who present to the emergency department (ED) with atypical chest pain is increasing. The aim of this study was to develop and validate a prediction model for AIHD in patients with atypical chest pain.
Methods and results A chest pain workup registry, ED administrative database, and clinical data warehouse database from a single academic ED were analyzed and integrated using nonidentifiable key factors to create a comprehensive clinical dataset covering 2014 to 2018. Demographic findings, vital signs, and routine laboratory test results were assessed for their ability to predict AIHD. An extreme gradient boosting (XGB) model was developed and evaluated, and its performance was compared to that of a single-variable model and a logistic regression model. The area under the receiver operating characteristic curve (AUROC) was calculated to assess discrimination. A calibration plot and partial dependence plots were also used in the analyses. Overall, 4,978 patients were analyzed. Of the 3,833 patients in the training cohort, 453 (11.8%) had AIHD; of the 1,145 patients in the validation cohort, 166 (14.5%) had AIHD. The XGB, troponin (single-variable), and logistic regression models showed similar discrimination power (AUROC [95% confidence interval]: XGB model, 0.75 [0.71–0.79]; troponin model, 0.73 [0.69–0.77]; logistic regression model, 0.73 [0.70–0.79]). Most patients were classified as non-AIHD; calibration was good in patients with a low predicted probability of AIHD in all prediction models. Unlike the logistic regression model, the XGB model revealed nonlinear relationships, including threshold-like and U-shaped relationships, between variables and the probability of AIHD.
Conclusion We developed and validated an AIHD prediction model for patients with atypical chest pain by using an XGB model.


Introduction
Traditional models usually assume that each predictor is associated with outcomes in a linear fashion [28]. However, the association between routinely collected data and outcomes in atypical chest pain is uncertain. For example, components of routinely collected data, e.g., aspartate transaminase (GOT) and white blood cell count (WBC), may exhibit different patterns in relation to AIHD. High-order interactions among predictors are also possible. A machine learning-based prediction model can be beneficial in this scenario because it can capture complex structure in the data, including nonlinear relationships and high-order interactions [29].
The aim of this study was to develop and validate a machine learning-based prediction model for AIHD using routinely collected data (excluding 12-lead ECG and cardiac biomarker test results) in patients with atypical chest pain who visited the ED. We also evaluated the predictor variable importance of the model and how each predictor affected the probability of AIHD according to its value.

Study design and ethical statements
This study was a single-center retrospective study in the ED of a large, urban, academic teaching hospital that receives approximately 60,000 ED visits annually.
This study was approved by the Institutional Review Board, and the requirement for informed consent was waived (IRB No. 1808-001-962). This study complied with the Declaration of Helsinki, and we adhered to the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis statement on reporting predictive models [30].

Study setting
The ED had approximately 10-12 emergency physicians, 9-11 emergency residents, 5 specialty board staff, 50-60 emergency nursing staff, and 15-17 emergency medical technicians from 2014 to 2018. A structured protocol for triage based on patients' vital signs, brief history, and chief complaint was used. When patients visited the ED, dedicated ED nurses triaged patients based on a five-level scale (level 1: immediate, level 2: very urgent, level 3: urgent, level 4: less urgent, and level 5: not urgent) [31,32]. Based on this scale, the most unstable patients, such as those with cardiac arrest or definite shock status, were classified as level 1. If a triage nurse found no evidence of severe shock or desaturation, all patients with chest pain were assigned to on-duty physicians in the cardiovascular section, which assists with efficient ED operation and sensitive diagnosis of cardiovascular disease. Patients with a chief complaint other than chest pain were included if they also had chest pain. After interviews and physical examinations, the primary physician completed the structured chest pain workup registry. They assigned all eligible patients into categories consisting of typical angina and other angina, as per the guidelines [33]. Typical angina was defined as having all three conditions: substernal squeezing chest pain, pain subsiding with rest or nitroglycerine administration, and pain aggravated by exercise [34,35]. The registry was reviewed every other day by attending physicians for quality control of the management of patients with chest pain. If patients were suspected of having AIHD, consultation with a cardiology specialist was conducted for further management and admission.

Selection of participants
We included adult (≥18 years old) nontrauma patients with chest pain who visited the ED from January 2014 to December 2018. Patients who presented with typical characteristics of angina were excluded because most needed to undergo further cardiac testing regardless of their characteristics. Patients who did not visit the ED until 1 week after symptom onset, received interhospital transfer, or presented with cardiac arrest or unstable vital signs and were classified as level 1 by a triage nurse, were excluded.
The study population was divided into a training cohort from which each of the machine learning prediction models was derived and a validation cohort in whom the prediction models were applied and tested. The training cohort was derived from data collected from January 2014 to December 2017; the validation cohort comprised data collected from January 2018 to December 2018.
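The temporal split described above can be sketched in a few lines. This is a minimal illustration only: the record layout and the `visit_date` field name are assumptions made for the example, and only the calendar boundaries come from the text.

```python
from datetime import date

# Calendar boundaries taken from the text; everything else is illustrative.
STUDY_START = date(2014, 1, 1)
TRAIN_END = date(2017, 12, 31)    # last day of the training period
STUDY_END = date(2018, 12, 31)    # last day of the validation period


def split_by_visit_date(visits):
    """Split visit records (dicts with a hypothetical 'visit_date' key) by period."""
    train, valid = [], []
    for v in visits:
        d = v["visit_date"]
        if not (STUDY_START <= d <= STUDY_END):
            continue  # outside the study window
        (train if d <= TRAIN_END else valid).append(v)
    return train, valid


# Toy records, not study data.
visits = [
    {"id": "a", "visit_date": date(2014, 1, 1)},
    {"id": "b", "visit_date": date(2017, 12, 31)},
    {"id": "c", "visit_date": date(2018, 1, 1)},
    {"id": "d", "visit_date": date(2019, 1, 1)},
]
train, valid = split_by_visit_date(visits)
```

Splitting by calendar time rather than at random means the validation cohort is evaluated strictly on later patients, which is closer to how a deployed model would be used.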

Data sources
We obtained data from three independent databases, including the ED administrative database, clinical data warehouse (CDW) database [36], and chest pain workup registry. Data were obtained from January 2014 to December 2018. The ED administrative database contains patients' demographic characteristics, route of visit, time of visit, and diagnosis and disposition. The CDW database includes laboratory study results and imaging study results. The chest pain workup registry includes chest pain characteristics. We integrated the three databases using a common deidentified key to produce a comprehensive clinical dataset that contained sufficient information. If patients visited the ED multiple times within 7 days, only the data from the index visit were analyzed.
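As a rough sketch of the linkage step, the three sources can be joined on the shared deidentified key and then reduced to index visits. The field names, the inner-join behavior, and the rule that a visit more than 7 days after the previous index visit starts a new index visit are all assumptions made for illustration; the text does not specify them.

```python
from datetime import date, timedelta


def merge_sources(admin, cdw, registry):
    """Join three dicts keyed by a deidentified visit key (inner join, assumed)."""
    shared = admin.keys() & cdw.keys() & registry.keys()
    return [{"visit_key": k, **admin[k], **cdw[k], **registry[k]} for k in sorted(shared)]


def keep_index_visits(records, window_days=7):
    """Drop repeat visits that fall within window_days of a patient's index visit."""
    kept = []
    last_index = {}  # patient key -> date of the most recent index visit
    for r in sorted(records, key=lambda r: (r["patient"], r["visit_date"])):
        prev = last_index.get(r["patient"])
        if prev is not None and r["visit_date"] - prev <= timedelta(days=window_days):
            continue  # repeat visit inside the window
        last_index[r["patient"]] = r["visit_date"]
        kept.append(r)
    return kept


# Toy data with hypothetical field names, not study data.
admin = {
    "v1": {"patient": "p1", "visit_date": date(2015, 3, 1)},
    "v2": {"patient": "p1", "visit_date": date(2015, 3, 4)},
    "v3": {"patient": "p1", "visit_date": date(2015, 4, 1)},
    "v4": {"patient": "p2", "visit_date": date(2016, 1, 1)},
}
cdw = {"v1": {"wbc": 7.2}, "v2": {"wbc": 8.0}, "v3": {"wbc": 6.5}, "v4": {"wbc": 9.1}}
registry = {"v1": {"typical_features": 0}, "v2": {"typical_features": 1},
            "v3": {"typical_features": 0}}  # v4 has no registry entry

merged = merge_sources(admin, cdw, registry)
index_visits = keep_index_visits(merged)
```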

Data description and preprocessing
Because our primary purpose of prediction was diagnosis of AIHD using routinely collected information aside from 12-lead ECG findings and cardiac biomarker measurements in the early ED period, only data available at the initial ED visit were used as prediction variables. For laboratory tests, we chose the initially retrieved complete blood count (CBC) and comprehensive metabolic panel (CMP), which are frequently used in most EDs. We selected 25 predictors according to eligibility, and a detailed description of the variables is presented in S1 Table. Among them, 23 predictors were continuous variables (age, vital signs, and blood laboratory test results), and the proportion of missing data ranged from 2.5% to 16.8% (S1 Table). Median imputation, which is a common method used to deal with missing values in machine learning models, was conducted [37]. Extreme value imputation was conducted for outlier replacement of continuous variables except for age. Using the training cohort, the 1st and 99th percentile values of each continuous variable were defined as cutoff values. Values smaller than the 1st percentile cutoff or larger than the 99th percentile cutoff were defined as extreme values and replaced in both the training and validation cohorts. This method was used to develop a model that is less sensitive to extreme values, in order to reduce the effect of outliers [38].
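The preprocessing above can be illustrated for a single continuous variable. This sketch makes two assumptions that the text does not state: the imputation median is taken from the training cohort, and extreme values are replaced by the nearest cutoff (winsorization). As described above, age would be exempt from the clamping step.

```python
from statistics import median, quantiles


def fit_preprocessor(train_values):
    """Learn the median and the 1st/99th percentile cutoffs from training data only."""
    observed = [v for v in train_values if v is not None]
    cuts = quantiles(observed, n=100, method="inclusive")  # 99 cut points
    return {"median": median(observed), "low": cuts[0], "high": cuts[-1]}


def transform(values, params):
    """Impute missing values with the median, then clamp to the training cutoffs."""
    out = []
    for v in values:
        if v is None:
            v = params["median"]
        out.append(min(max(v, params["low"]), params["high"]))
    return out


# Toy training values 1..101, not study data.
train_glucose = [float(x) for x in range(1, 102)]
params = fit_preprocessor(train_glucose)

# The same fitted cutoffs are applied to the validation cohort.
valid_glucose = transform([None, 0.0, 50.0, 500.0], params)
```

Fitting the cutoffs on the training cohort and reusing them for the validation cohort keeps information from the later period out of the model's preprocessing.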

Outcome variable
The diagnosis of AIHD, which was extracted from the ED administrative database and CDW database, was used as the outcome. We defined patients as having AIHD if both of the following conditions were satisfied. First, among patients who visited the ED, the diagnostic code according to the International Statistical Classification of Diseases and Related Health Problems (ICD-10) needed to be between I20 and I25, which indicates ischemic heart disease (IHD). The ED administrative database has two types of primary diagnostic codes: the final diagnostic codes at ED discharge and at hospital discharge. We defined the diagnostic code as positive for IHD if a confirmative diagnostic code was found in either discharge record. Second, coronary angiography (CAG) needed to have been performed during the patient's hospital stay. We defined patients who were discharged without CAG results as nondiagnosed.
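The two-part outcome definition can be expressed as a small rule. The function and argument names below are hypothetical, and the pattern assumes codes are stored as ICD-10 strings with or without the dot (for example `I21.0` or `I210`).

```python
import re

# ICD-10 categories I20 to I25, with an optional subcode.
IHD_CODE = re.compile(r"^I2[0-5](\.?\w+)?$")


def has_ihd_code(codes):
    """True if any code in the list falls in I20-I25."""
    return any(IHD_CODE.match(c.strip().upper()) is not None for c in codes)


def is_aihd(ed_discharge_codes, hospital_discharge_codes, cag_performed):
    """Outcome is positive only if an IHD code is present AND CAG was performed."""
    coded = has_ihd_code(ed_discharge_codes) or has_ihd_code(hospital_discharge_codes)
    return coded and bool(cag_performed)
```

For example, `is_aihd(["R07.4"], ["I20.0"], True)` is positive because the code appears at hospital discharge and CAG was performed, whereas `is_aihd(["I21.0"], [], False)` is treated as nondiagnosed.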

Statistical analysis
Sample size estimation was not conducted since this was designed as a hypothesis-generating epidemiological study, and all eligible patients were included to maximize the statistical power.
Characteristics including baseline characteristics, vital signs, laboratory test results, and the study outcome were compared between the training cohort and validation cohort using the t-test or Wilcoxon rank-sum test for continuous variables and the chi-square test or Fisher exact test for categorical variables, as appropriate [40]. Cardiac tests during hospital stays were also compared between the groups according to the study outcome.
A machine-learning model using the XGB algorithm was developed using 25 predictor variables. The XGB model has been widely used in the development of prediction models in the clinical field, and it has demonstrated good performance [41,42]. The performance of the predictive model was evaluated by the area under the receiver operating characteristic curve (AUROC) as a primary measure. We assessed calibration power using the scaled Brier score, Hosmer-Lemeshow test, and a calibration plot in the validation cohort. The test characteristics of each model in the validation cohort, including the sensitivity, specificity, and positive and negative predictive values with 95% confidence intervals, were reported. The optimal cutoff probability for evaluation of the test characteristics was calculated using the Youden index. The variable importance of the XGB model was also reported. Variables in the XGB model were ranked by importance based on gain, which reflects the relative contribution of the corresponding variable to the model, calculated from each variable's contribution to each tree in the model [43]. In addition, partial dependence plots were used to determine the marginal effect of features on the predicted outcome in the XGB model.
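To make the evaluation measures concrete, the following sketch computes the AUROC, the Youden-index cutoff, and the scaled Brier score from labels and predicted probabilities. It is an illustration on toy values and not the R code used in the study. The scaled Brier score is taken here as 1 minus the ratio of the Brier score to that of a reference model that predicts the prevalence for every patient.

```python
def auroc(y_true, y_score):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(y_true, y_score):
    """Threshold t maximizing J = sensitivity + specificity - 1 (positive if score >= t)."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(y_score)):
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def scaled_brier(y_true, y_prob):
    """1 - Brier / Brier_ref, where the reference predicts the prevalence for everyone."""
    n = len(y_true)
    brier = sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / n
    prevalence = sum(y_true) / n
    return 1 - brier / (prevalence * (1 - prevalence))


# Toy labels and probabilities, not study data.
y = [0, 0, 0, 1, 0, 1]
p = [0.1, 0.2, 0.3, 0.4, 0.6, 0.8]
auc = auroc(y, p)
cutoff, j_stat = youden_cutoff(y, p)
sbs = scaled_brier(y, p)
```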
Two baseline models were developed for comparison with the XGB model. First, because troponin is the cardiac biomarker most commonly used in the diagnosis of AIHD, a single-variable logistic regression model using troponin was developed to assess the usability of the XGB model in the clinical setting. Second, to compare the performance of the machine learning model and the traditional model, a logistic regression model of all predictors was developed. The variable importance of the logistic regression model was also reported. Variables in the logistic regression model were ranked by importance using z-statistics (the beta estimate divided by the standard error of beta). Partial dependence plots for the logistic regression model were also evaluated. Comparison of the AUROC between the XGB model and the two baseline models was performed using the DeLong test [44].
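For readers unfamiliar with the DeLong test, the sketch below shows one way to compare two correlated AUROCs computed on the same patients. It is a simplified illustration with hypothetical function names and toy scores, written from the standard placement-value formulation; it is not the implementation used in the study.

```python
from math import sqrt
from statistics import NormalDist


def _psi(x, y):
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _components(pos, neg):
    """Placement values for one model: one per positive and one per negative."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(y_true, score_a, score_b):
    """Two-sided test of AUROC(a) == AUROC(b); returns (auc_a, auc_b, z, p)."""
    pos_idx = [i for i, y in enumerate(y_true) if y == 1]
    neg_idx = [i for i, y in enumerate(y_true) if y == 0]
    v10a, v01a = _components([score_a[i] for i in pos_idx], [score_a[i] for i in neg_idx])
    v10b, v01b = _components([score_b[i] for i in pos_idx], [score_b[i] for i in neg_idx])
    auc_a, auc_b = sum(v10a) / len(v10a), sum(v10b) / len(v10b)
    m, n = len(pos_idx), len(neg_idx)
    # Variance of the AUROC difference, accounting for the paired design.
    var = ((_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / m
           + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / n)
    if var <= 0:  # degenerate case, e.g. identical score vectors
        return auc_a, auc_b, 0.0, (1.0 if auc_a == auc_b else 0.0)
    z = (auc_a - auc_b) / sqrt(var)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return auc_a, auc_b, z, p


# Toy scores from two hypothetical models on the same seven patients.
y = [1, 1, 1, 0, 0, 0, 0]
model_a = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1, 0.35]
model_b = [0.7, 0.25, 0.5, 0.6, 0.3, 0.2, 0.1]
auc_a, auc_b, z, p_value = delong_test(y, model_a, model_b)
```

Because both models score the same patients, the covariance terms matter: ignoring them and treating the two AUROCs as independent would generally overstate the variance of their difference.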
A p-value of <0.05 was considered statistically significant. All analyses were performed using R, version 3.5 (R Foundation for Statistical Computing, Vienna, Austria) with packages including caret and xgboost for the analysis of the machine learning algorithms.

Characteristics of study subjects
Altogether, 10,217 patients were screened from the comprehensive dataset. After excluding patients who were transferred from other hospitals (N = 1,085), transferred to another hospital (N = 39), had symptom onset more than 7 days before their ED visit (N = 1,298), or had the highest triage level at presentation (N = 17), 5,415 patients remained. Among them, 437 patients with typical chest pain were excluded; therefore, 4,978 patients were included in the final analysis (Fig 1). The distribution of chest pain characteristics and proportion of patients with AIHD among the 5,415 patients are presented in S1 Fig. Compared to patients in the training cohort, those in the validation cohort were more likely to visit the ED earlier, to call emergency medical services (EMS), to have a higher blood pressure and heart rate, and to be diagnosed as having AIHD (11.8% vs. 14.5%). Several laboratory test results, such as WBC, total protein level, albumin level, and electrolyte level, were significantly different between the two groups, and mortality was low in both the training and validation groups (0.5% and 0.1%, respectively; p = 0.071) (Table 1).

Main results
The characteristics of the study patients according to the study outcome are presented in Table 2.
Compared to patients without AIHD, those with AIHD were more likely to be older men, to call EMS, and to have a higher systolic blood pressure (SBP) and lower heart rate. Among the 17 laboratory test results, 11 were significantly different between patients with and without AIHD (Table 3).
Classification results of the machine learning models in the validation cohort are presented in Table 4. There was no significant difference in AUROC between the XGB model and the baseline models (Table 4). The test characteristics of the prediction models are also shown in Table 4. The accuracy and F1 scores of the logistic regression model and XGB model were similar (logistic regression model: 0.66 and 0.36; XGB model: 0.67 and 0.36, respectively). Calibration metrics are presented in Fig 2. Calibration was poor in patients with a high predicted probability of AIHD in all prediction models.
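The test characteristics reported in Table 4 all derive from the 2×2 confusion matrix at the chosen cutoff. The sketch below shows the definitions on toy labels (not study data) and assumes every denominator is nonzero.

```python
def classification_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels (1 = AIHD, 0 = non-AIHD)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / len(pairs),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Toy example with 20% prevalence: 1 TP, 1 FN, 2 FP, 6 TN.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

In this toy case the accuracy is 0.70 while the F1 score is only 0.40, which illustrates how a low-prevalence outcome can yield a modest F1 score even when most patients are classified correctly.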

Variable importance and partial dependence plot
Variable importance was calculated for the logistic regression model and XGB model (S2 Table). Age and glucose ranked first and second in both models, but the importance of the remaining variables differed between the models. The relationship between the probability of AIHD and all features in the models is shown in Fig 3, ordered by variable importance. A notable nonlinear trend was observed in several features in the XGB model (Fig 3).
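A partial dependence curve of the kind shown in Fig 3 can be computed by forcing one feature to each value on a grid for every patient and averaging the model's predictions. In the sketch below, `toy_model` is a made-up stand-in with a U-shaped dependence on SBP, used only to show the mechanics; it is not the fitted XGB model and its coefficients have no clinical meaning.

```python
def partial_dependence(predict, rows, feature, grid):
    """Mean prediction when `feature` is set to each grid value for every row."""
    curve = []
    for value in grid:
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve


def toy_model(row):
    """Hypothetical risk function: U-shaped in SBP, increasing with age."""
    risk = 0.05 + 0.00002 * (row["sbp"] - 130) ** 2 + 0.002 * (row["age"] - 40)
    return min(1.0, risk)


# Toy patients, not study data.
rows = [{"sbp": 120, "age": 40}, {"sbp": 150, "age": 70}]
sbp_grid = [90, 130, 170]
sbp_curve = partial_dependence(toy_model, rows, "sbp", sbp_grid)
```

Here the averaged prediction is lowest at the middle grid value and higher at both ends, which is the U-shaped pattern a linear term in a logistic regression model cannot represent.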

Discussion
We applied a machine-learning algorithm to patients with atypical chest pain who visited the ED in order to generate a predictive model of AIHD. We found that routinely collected data showed considerable predictive power, comparable to that of cardiac biomarkers. We also found that there were differences in variable importance between the XGB model and logistic regression model. Unlike the logistic regression model, many predictors showed nonlinear associations with the study outcome in the XGB model (Fig 3). We found that discrimination power was comparable between the XGB model and troponin model. Cardiac biomarkers are important for the diagnosis of AIHD in patients with chest pain. Our findings suggest that a machine learning model with a combination of less relevant predictors could achieve equivalent performance to biomarkers with biological relevance. Because the XGB model showed good calibration in patients with a low probability of AIHD and identified most patients as having a low probability of AIHD (Fig 2), it can be helpful when deciding whether to conduct further cardiac tests, such as CCTA, in patients with a low risk of AIHD even when ECG and troponin findings are not available. In our study, one-fifth of non-AIHD patients underwent CCTA (Table 3). Decreasing the proportion of patients receiving CCTA may reduce radiological hazards and length of stay for those patients and preserve ED resources [45].
Fig 2. Calibration plots for acute ischemic heart disease in the validation cohort. The observed probability of acute ischemic heart disease (with a 95% confidence interval) is plotted against the predicted probability of acute ischemic heart disease in 10% intervals. Point size indicates the relative number of observations in a group. AIHD, acute ischemic heart disease. https://doi.org/10.1371/journal.pone.0241920.g002
We also found that the discrimination power was comparable between our model and the logistic regression model. However, we found that the variable importance was markedly different between the XGB model and logistic regression model (S2 Table). The linear relationship between each feature and the outcome that is inherent to logistic regression could contribute to the different variable importance between the two models. SBP was the 24th most important variable in the logistic regression model but the 9th most important variable in the XGB model. Because the relationship between SBP and AIHD was U-shaped in the XGB model, the logistic regression model could not detect an important relationship between SBP and AIHD. A U-shaped relationship between SBP and the outcome of AIHD was also reported in a previous study [46]. GOT was among the top 5 most important variables in both models. GOT was proportionate to the risk of AIHD in the logistic regression model, but in the XGB model, that risk did not increase for GOT >75 IU/L. GOT is one of the oldest known biomarkers for AIHD [47]. However, because GOT originates from skeletal muscles or the liver rather than the heart, a high GOT level usually reflects diseases in those organs and not AIHD [48]. Therefore, the nonlinear relationship derived from the XGB model may be more compatible with biological relevance. A similar finding was also shown for glucose and WBC. Hyperglycemia and leukocytosis were associated with a high risk of AIHD in previous studies [49,50]. However, severe hyperglycemia and leukocytosis are usually associated with an endocrine problem or an infection acquired in the ED, respectively.
The similarity of discrimination power between the XGB model and logistic regression model may be due to the preprocessing method used in our study. We replaced the extreme values of each feature in preprocessing to develop a prediction model that is less sensitive to outliers [38]. Because of the linear relationship between the predictor and the outcome in the logistic regression model, patients with extreme values of predictors, such as glucose, WBC, or GOT, may be classified as having AIHD in the logistic regression model. Such misclassification could diminish the discrimination power of the logistic regression model because those laboratory results could be caused by other diseases, such as an endocrine problem, infection, or hepatitis, rather than AIHD. Additionally, the inclusion of all predictors in both the XGB and logistic regression models may have contributed to the similar predictive power between the models. Because the variable importance and internal processing of predictors are different between the two models, the performance of each model may be different when limited information can be used in constructing it.
We evaluated the predictive performance of routinely collected information for AIHD in patients with atypical chest pain. Atypical chest pain has been reported to have a high prevalence in AIHD patients, especially in the elderly and in women [51–53]. Because the diagnosis of AIHD in atypical chest pain is often challenging, decision support tools for the accurate diagnosis of these patients could result in decreased misdiagnosis, inappropriate discharge, and in-hospital mortality [51]. As chest pain characteristics are not routinely collected data in many administrative databases or CDW, this group of patients is not easily identified. In this study, merging various databases allowed us an opportunity to evaluate hypotheses that could not solely be addressed by one database. We found that chest pain characteristics critically affected the probability of AIHD. Patients with three typical characteristics showed a 9.5 times higher probability of AIHD compared to patients without typical characteristics. Even in patients with one typical characteristic, the probability of AIHD doubled compared to patients without typical characteristics (S1 Fig). Because uncertainty in AIHD diagnosis increases with lower numbers of typical characteristics, the utility of prediction models is greater in patients with few characteristics.
ECG and serum cardiac biomarkers are well-known predictors for diagnosing AIHD [54,55]. Both variables can achieve a high level of performance, and prediction models and stratification tools tend to utilize these variables [10,56]. However, we did not use these variables because we considered that each would dominate the predictive power of the model by itself, which would make it difficult to evaluate other clinically important variables. Moreover, we expected that prediction models for AIHD that do not utilize those test results would have their own clinical implications. Our model can be applied in various settings without high-level laboratory facilities or specialists who interpret ECG findings. Even in facilities with many resources, nurses can undertriage, physicians may overlook diagnoses, or ED overcrowding can create issues [57,58]. Our model could also be applied to patients who visit the ED after their routine laboratory test was performed in an outpatient clinic or another hospital. Because our model can be applied in situations where some laboratory findings are missing, the potential coverage of our model is extensive.
This study has several limitations. First, the final diagnosis of the patient was defined based on the diagnostic code and procedure result recorded in his/her electronic medical record. This definition did not include whether the culprit lesion was observed or whether further intervention, such as ballooning or stenting, was performed. We focused on CAG rather than percutaneous coronary intervention (PCI) since this population should not be overlooked in the ED, even if they do not ultimately undergo PCI. Second, this study was conducted based on a population that visited one tertiary ED; thus, further external validation is required for data generalization. Third, only patients with chest pain who visited the ED were enrolled, and the chief complaint was determined by a triage nurse. Patients with dyspnea, syncope, or palpitation but no chest pain at the time of their ED visit were not included. In addition, patients with altered mental state who could not verbalize their complaint were not enrolled.

Conclusion
In summary, we used the XGB algorithm to develop and validate prediction models for AIHD in patients with atypical chest pain who visited the ED. Our prediction model showed similar performance to the troponin and logistic regression models for detecting AIHD. However, we identified a notable nonlinear relationship between predictors and the study outcome and a different variable importance pattern by using the XGB model. Further prospective validation of our results is warranted, and a response protocol based on our model should be evaluated. Because we developed our prediction model using routinely collected data, a rapid response system based on our model may be applied more broadly to critical patients in the emergency setting. An automatic screening process that uses basic important variables and routine testing should be considered.