
Predicting 30-day and 1-year mortality in heart failure with preserved ejection fraction (HFpEF)

  • Ikgyu Shin ,

    Contributed equally to this work with: Ikgyu Shin, Nilay Bhatt

    Roles Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    Current Address: Department of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT, United States of America

    Affiliation Yale School of Public Health, New Haven, Connecticut, United States of America

  • Nilay Bhatt ,

    Contributed equally to this work with: Ikgyu Shin, Nilay Bhatt

    Roles Formal analysis, Methodology, Resources, Software, Visualization, Writing – review & editing

    Current Address: Department of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT, United States of America

    Affiliation Yale School of Public Health, New Haven, Connecticut, United States of America

  • Alaa Alashi,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Resources, Software, Supervision, Writing – review & editing

    Affiliation Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, Connecticut, United States of America

  • Keervani Kandala,

    Roles Investigation, Writing – review & editing

    Affiliation Yale School of Public Health, New Haven, Connecticut, United States of America

  • Karthik Murugiah

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Supervision, Validation, Writing – original draft, Writing – review & editing

    karthik.murugiah@yale.edu

    Affiliations Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, Connecticut, United States of America, Center for Outcomes Research and Evaluation, Yale-New Haven Hospital, New Haven, Connecticut, United States of America

Abstract

Objectives

To develop and compare prediction models for 30-day and 1-year mortality in heart failure with preserved ejection fraction (HFpEF) using electronic health record (EHR) data, utilizing both traditional and machine learning (ML) techniques.

Background

HFpEF represents 1 in 2 heart failure patients. Predictive models in HFpEF, specifically those derived from electronic health record (EHR) data, are less established.

Methods

Using MIMIC-IV EHR data from 2008−2019, patients aged ≥ 18 years admitted with a primary diagnosis of HFpEF were identified using ICD-9 and ICD-10 codes. Demographics, vital signs, prior diagnoses, and lab data were extracted. Data were partitioned into 80% training and 20% test sets. Prediction models from seven model classes (Support Vector Classifier (SVC), Logistic Regression, Lasso Regression, Elastic Net, Random Forest, Histogram-based Gradient Boosting Classifier (HGBC), and eXtreme Gradient Boosting (XGBoost)) were developed using various imputation and oversampling techniques with 5-fold cross-validation. Model performance was compared using several metrics, and individual feature importance was assessed using SHapley Additive exPlanations (SHAP) analysis.

Results

Among 3,235 hospitalizations for HFpEF, 30-day mortality was 6.3%, and 1-year mortality was 29.2%. Logistic Regression performed best for 30-day mortality (Area Under the Receiver Operating Characteristic curve (AUC) 0.83), whereas Random Forest (AUC 0.79) and HGBC (AUC 0.78) performed best for 1-year mortality. Age and NT-proBNP were the strongest predictors in SHAP analyses for both outcomes.

Conclusion

Models derived from EHR data can predict mortality after HFpEF hospitalization with comparable performance to models derived from registry or trial data, highlighting the potential for clinical implementation.

Introduction

Heart failure with preserved ejection fraction (HFpEF) is a distinct subtype of heart failure (HF), and accounts for the majority of HF hospitalizations [1]. Despite this burden of hospitalizations, and the associated considerable morbidity and mortality, prognostic models specifically for patients hospitalized with HFpEF are less established. Accurate prediction models are essential to physicians to help identify and manage high-risk patients, to health systems for allocating resources, and to policy makers for risk adjustment to measure performance.

With the wide availability of electronic health records (EHR), there is a need for predictive models to be based on real-world EHR data, which is critical for implementation at the bedside. The few predictive models that have been developed for HFpEF have been derived from registry or trial data and are for ambulatory populations [2–7]. In addition, these models often contain variables such as New York Heart Association (NYHA) Class or complex health status assessments, which are not readily available in the EHR [3–6]. Additionally, it is important for models to be developed in a data-driven approach incorporating complex interactions, which can be accomplished with machine learning techniques.

Accordingly, we leveraged data from the Medical Information Mart for Intensive Care (MIMIC)-IV database and tested a variety of modeling techniques including machine learning to develop prediction models for 30-day and 1-year mortality with an index hospitalization for HFpEF. We compared model performance using an array of performance metrics.

Methods

This study adheres to the guidelines set by the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement. Compliance with the TRIPOD checklist for the thorough and transparent reporting of our predictive model development and validation processes is detailed in S1 Table [8].

We employed seven predictive models: Logistic Regression, Lasso Regression, Elastic Net, Support Vector Classifier (SVC) with a radial basis function (RBF) kernel, Random Forest, Histogram-based Gradient Boosting Classifier (HGBC), and eXtreme Gradient Boosting (XGBoost) [9–15]. Each model class has its unique advantages in handling different aspects of the data.

The models were evaluated using the following metrics: Accuracy, Sensitivity, Specificity, Area Under the ROC curve (AUC), Precision-Recall Area Under the Curve (PR-AUC), Calibration curves, MCC score (Matthews Correlation Coefficient), AIC (Akaike Information Criterion), and BIC (Bayesian Information Criterion) [16–19].

Accuracy, Sensitivity, Specificity, AUC and PR-AUC are commonly encountered metrics used to evaluate models in medical literature. In addition, MCC is a balanced measure of model performance, particularly in the context of imbalanced classes, as it considers true and false positives and negatives, offering more information than accuracy alone. AIC and BIC both assess model fit and complexity. AIC estimates the relative quality of models for a given dataset by considering the trade-off between goodness-of-fit and the number of parameters, penalizing models with excessive complexity. BIC incorporates a penalty term for the number of parameters but with a stronger penalty for model complexity, providing a stricter criterion that favors more parsimonious models. We informed overall model selection with the metrics that would be more important from a clinical standpoint for this particular prediction problem.
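As a toy illustration (the parameter count `k` is an assumption, not a value from the study), most of these metrics are available in scikit-learn, and AIC/BIC follow from the binomial log-likelihood:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, average_precision_score,
                             confusion_matrix, log_loss,
                             matthews_corrcoef, roc_auc_score)

# Toy labels and predicted probabilities
y_true = np.array([0, 0, 0, 0, 1, 1, 0, 1, 0, 0])
y_prob = np.array([0.1, 0.2, 0.3, 0.2, 0.8, 0.6, 0.4, 0.7, 0.1, 0.3])
y_pred = (y_prob >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = accuracy_score(y_true, y_pred)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
auc = roc_auc_score(y_true, y_prob)
pr_auc = average_precision_score(y_true, y_prob)
mcc = matthews_corrcoef(y_true, y_pred)

# AIC/BIC from the binomial log-likelihood; k = number of fitted parameters
k, n = 5, len(y_true)
ll = -log_loss(y_true, y_prob, normalize=False)
aic = 2 * k - 2 * ll
bic = k * np.log(n) - 2 * ll
```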

Data sources

We used the MIMIC-IV dataset version 2.2 – a publicly shared database of de-identified electronic health record data, including hospital and intensive care unit admissions from the Beth Israel Deaconess Medical Center in Boston, MA from 2008 to 2019 [20,21]. The data were accessed via PhysioNet after completing the necessary requirements. Patients and/or the public were not involved in the design, conduct, reporting, or dissemination plans of this research. The data that support the findings of this study are openly available in PhysioNet [22]. Given that the MIMIC-IV data is de-identified and publicly accessible, the study was not subject to Yale Institutional Review Board review.

Study population

We identified hospitalizations from 2008 to 2019 of patients aged ≥ 18 years with HFpEF as a primary diagnosis using appropriate ICD-9 and ICD-10 codes (S2 Table) [23].

As our diagnosis was based on ICD codes, to test the validity of this label we queried clinical notes using regular expressions to extract mentions of the left ventricular ejection fraction value or a qualitative report of the left ventricular function using appropriate phrases. However, as this LVEF data was extracted from clinical notes and not readily available in a structured field in MIMIC-IV data, we intentionally did not include this in predictive modeling.
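A hedged sketch of such a regular-expression query; the patterns below are illustrative stand-ins, not the authors' actual expressions, and real MIMIC note phrasing varies more widely:

```python
import re

# Hypothetical patterns: a numeric LVEF ("LVEF 55%") or a qualitative
# description of left ventricular function ("normal LV systolic function")
EF_NUM = re.compile(r"(?:LVEF|EF|ejection fraction)\D{0,15}?(\d{2})\s*%", re.I)
EF_QUAL = re.compile(r"(normal|preserved|reduced|depressed)\s+(?:LV\s+)?"
                     r"(?:systolic\s+)?function", re.I)

def extract_lvef(note: str):
    """Return a numeric LVEF if present, else a qualitative label, else None."""
    m = EF_NUM.search(note)
    if m:
        return int(m.group(1))
    m = EF_QUAL.search(note)
    if m:
        return m.group(1).lower()
    return None
```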

From a total of 430,852 hospitalizations, we identified 3,235 individual hospitalization encounters with a discharge diagnosis of HFpEF, which comprised the study sample. Among these encounters, we had access to clinical notes for 3,146 (97.3%), of which 1,836 (58.4%) had an LVEF measurement value reported. Of these, 1,726 (94.0%) had an LVEF value ≥ 50%, and 46 (2.5%) had an LVEF between 45-50%. An additional 586 (18.6%) encounters had a qualitative mention of LVEF, of which 551 (94.0%) indicated the LVEF was normal/preserved. Thus, among the 77% of encounters with either a quantitative or qualitative mention of ventricular function, >97% had a documented preserved LVEF, and we concluded that this method of ICD code-based identification of HFpEF is valid and has sufficient positive predictive value. This validation exercise will also be useful for future projects by other researchers that use ICD codes to identify HFpEF. In this process we did note 55 cases with a documented LVEF <40% and 16 cases with a qualitative mention of ‘reduced’ LVEF. However, these cases were not excluded from the cohort, as the primary method of identification for this study remained ICD code-based; the notes assessment was performed solely to validate the ICD code-based diagnosis.

Outcomes

Outcomes for predictive models included 30-day and 1-year mortality. Date of death in MIMIC data is derived from hospital records and state records. The maximum time of follow up for each patient in MIMIC data is exactly one year after their last hospital discharge.

Data extraction

Data containing patient demographics, vital signs, diagnoses using ICD codes, admission information, laboratory tests, and date of death were extracted from the appropriate relational tables using two identification columns: ‘subject_id’ and ‘hadm_id’. The ‘subject_id’ identifies an individual patient, while the ‘hadm_id’ pertains to a specific hospital admission event. For data tables not readily alignable through these IDs, we employed alternative matching strategies, such as correlating timestamps within one day.
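A small pandas sketch of this alignment; column names other than ‘subject_id’ and ‘hadm_id’ (e.g., ‘admittime’, ‘charttime’, ‘ntprobnp’) are illustrative stand-ins for the MIMIC-IV schema:

```python
import pandas as pd

admissions = pd.DataFrame({
    "subject_id": [1, 2], "hadm_id": [100, 200],
    "admittime": pd.to_datetime(["2019-01-01 08:00", "2019-02-01 09:00"]),
})
labs = pd.DataFrame({
    "subject_id": [1, 2],
    "charttime": pd.to_datetime(["2019-01-01 10:00", "2019-02-03 09:00"]),
    "ntprobnp": [3500.0, 1200.0],
})

# Align by patient ID, then keep only lab values charted within one day
# of the admission timestamp (the fallback matching strategy described)
merged = admissions.merge(labs, on="subject_id", how="left")
within_1d = (merged["charttime"] - merged["admittime"]).abs() <= pd.Timedelta(days=1)
merged.loc[~within_1d, "ntprobnp"] = float("nan")
```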

ICD diagnosis codes were mapped to comorbidity categories in the Charlson Comorbidity Index (CCI) - a common method for mapping and summarizing patients’ comorbidities. However, as the weighting of comorbidities in the CCI is not particular to HFpEF, and our goal was to identify and use predictive variables, we did not use the comorbidity score as a predictive variable and instead used the individual mapped comorbidities as separate variables. In addition, we included a select few other comorbidities, such as hypertension, atrial fibrillation, and pulmonary hypertension, which are noted to be predictors in prior HFpEF prediction models but are not part of the CCI comorbidities. For vital signs and specific lab values we used the first entry on the day of admission using appropriate time stamps.
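As a sketch of the comorbidity-mapping step, with a tiny illustrative subset of ICD-10 prefixes (the full CCI mapping covers many more codes, plus ICD-9):

```python
# Illustrative subset of a Charlson-style ICD-10 prefix map; NOT the full
# CCI mapping used in the study.
CCI_PREFIXES = {
    "diabetes": ("E10", "E11"),
    "renal_disease": ("N18",),
    "copd": ("J44",),
    "metastatic_cancer": ("C77", "C78", "C79"),
}

def comorbidity_flags(icd_codes):
    """Map a patient's ICD codes to binary comorbidity indicators."""
    codes = [c.upper().replace(".", "") for c in icd_codes]
    return {name: int(any(c.startswith(p) for c in codes for p in prefixes))
            for name, prefixes in CCI_PREFIXES.items()}
```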

We assessed sample size adequacy to support model development to predict mortality in HFpEF patients using the criteria suggested by Riley et al. [24]. Using the I-PRESERVE [4] 1-year all-cause mortality model’s AUC of 0.74 as a benchmark, we calculated the minimum sample size required for 1-year mortality with a prevalence of 29.2% to be 2,037. For 30-day mortality, there are no contemporary HFpEF prediction models specific to this time frame. However, using an in-hospital mortality model by Wang et al. with an AUC of 0.83 as a reference, and an observed 30-day mortality rate of 6.3% in our cohort, a similarly performing model would need a sample size of 3,186 [25]. This suggested our sample size should be adequate for both outcomes.

Preprocessing

As a part of data preprocessing, we one-hot encoded sex, and binarized comorbidity variables. Four extreme outliers were identified and subsequently treated as missing data. Based on visual inspection of the variable distributions, we applied PowerTransformer (Yeo–Johnson) or QuantileTransformer (normal output) transformations to variables with wide ranges or evident non-normal distributions (e.g., creatinine, INR, platelet count, WBC count, and oxygen saturation), while BMI was standardized to zero mean and unit variance. These transformations were applied prior to training Elastic Net, Lasso, Logistic Regression, SVC, and XGBoost models. Random Forest and HGBC models were trained on the original unscaled features, as tree-based methods are inherently robust to feature distribution and scaling.
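These per-variable transformer choices can be expressed with a scikit-learn ColumnTransformer; the column indices and skewed distributions below are illustrative, not the study's actual data:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import (PowerTransformer, QuantileTransformer,
                                   StandardScaler)

rng = np.random.default_rng(0)
# Skewed stand-ins for, e.g., creatinine and WBC count, plus a BMI-like column
X = np.column_stack([rng.lognormal(0, 1, 300),
                     rng.lognormal(2, 0.5, 300),
                     rng.normal(28, 6, 300)])

pre = ColumnTransformer([
    ("yeo_johnson", PowerTransformer(method="yeo-johnson"), [0]),
    ("quantile", QuantileTransformer(output_distribution="normal",
                                     n_quantiles=100), [1]),
    ("scale_bmi", StandardScaler(), [2]),  # zero mean, unit variance
])
Xt = pre.fit_transform(X)
```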

To identify the most effective preprocessing strategy for handling missing data, we explored several imputation techniques, including mean and median imputation, along with Multiple Imputation by Chained Equations (MICE). As a validation, we compared the statistical analysis results from the imputed data with those obtained after dropping missing data, and assessed the consistency of results and distribution changes to best maintain data integrity and statistical power while avoiding the substantial data loss associated with dropping missing data. The statistical tests included the Shapiro-Wilk test for normality, t-tests and Mann-Whitney U tests for continuous variables, Chi-Squared and Fisher’s Exact tests for categorical variables, and Variance Inflation Factor (VIF) analysis for multicollinearity.
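A minimal sketch of the three imputation strategies, using scikit-learn's IterativeImputer as the MICE-style chained-equations imputer (note its experimental import is required):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, SimpleImputer

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[rng.random(X.shape) < 0.15] = np.nan  # ~15% missing at random

imputers = {
    "mean": SimpleImputer(strategy="mean"),
    "median": SimpleImputer(strategy="median"),
    "mice": IterativeImputer(max_iter=10, random_state=0),  # MICE-style
}
imputed = {name: imp.fit_transform(X) for name, imp in imputers.items()}
```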

To address class imbalance, we employed random oversampling, undersampling, Synthetic Minority Over-sampling Technique (SMOTE), and balanced sampling methods. Each method was evaluated based on final performance metrics to determine its effectiveness in creating a balanced class distribution and improving the performance and generalizability of our predictive models, with model performance also assessed using a baseline of no imputation and no resampling for comparison.
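Random oversampling, the first strategy listed, can be sketched without external dependencies (SMOTE and the other samplers are typically taken from the imbalanced-learn package, not shown here):

```python
import numpy as np

def random_oversample(X, y, random_state=0):
    """Resample each class with replacement up to the majority-class size."""
    rng = np.random.default_rng(random_state)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_max, replace=True)
        for c in classes
    ])
    return X[idx], y[idx]

X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)         # 8:2 imbalance
X_bal, y_bal = random_oversample(X, y)  # balanced to 8:8
```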

Imputation was done first, followed by transformations, resampling, and then scaling (for applicable models).

Feature analysis

Our study included a set of 36 features selected based on data availability and clinical relevance – 17 categorical and 19 continuous features (S3 Table). Categorical features included patient demographics and comorbidities (as defined by ICD codes) such as diabetes, renal disease, and cancer. Continuous features included vital signs such as heart rate, systolic blood pressure, and oxygen saturation, and laboratory values such as hemoglobin, creatinine, sodium, troponin, and NT-proBNP levels. Note that prior renal disease as identified by ICD codes and admission creatinine values were both used as individual variables in model development.

To understand the relationship between individual features and their predictive power, mutual information plots for 30-day and 1-year mortality were constructed. Additionally, Pearson correlation heatmaps were generated to visualize the linear relationships between continuous features.
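The mutual information scores underlying such plots can be computed with scikit-learn; a toy illustration on synthetic features:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

# Synthetic stand-in: 6 features, 3 of which carry signal about the outcome
X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           random_state=0)

mi = mutual_info_classif(X, y, random_state=0)
ranked = np.argsort(mi)[::-1]  # feature indices, most informative first
```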

Model fitting and evaluation

An 80−20 data split was applied to separate the data into training and testing sets. We used 5-fold cross-validation for the pipelines utilizing resampling methods; for the pipeline without resampling methods, we utilized repeated stratified K-fold cross-validation, given its strength for imbalanced classification tasks. We used a randomized hyperparameter search to fine-tune each model. Model evaluation was performed using the metrics outlined above.
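The split, repeated stratified cross-validation, and randomized search might be wired together as below; the logistic model and its search space are illustrative assumptions, not the study's actual grids:

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (RandomizedSearchCV,
                                     RepeatedStratifiedKFold,
                                     train_test_split)

# Imbalanced synthetic stand-in for the study data
X, y = make_classification(n_samples=600, weights=[0.9], random_state=0)

# 80-20 split, stratified on the outcome
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=0)

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0)
search = RandomizedSearchCV(
    LogisticRegression(max_iter=2000),
    param_distributions={"C": loguniform(1e-3, 1e2)},  # assumed search space
    n_iter=10, scoring="roc_auc", cv=cv, random_state=0,
)
search.fit(X_tr, y_tr)
```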

Model interpretability and explainability

To enhance the transparency and interpretability of our predictive models, we used SHAP (SHapley Additive exPlanations) values which provide a unified measure of feature importance, quantifying the contribution of each feature to the model’s predictions. We used SHAP summary plots and bar plots to visualize the global importance of features. For logistic regression models, we calculated odds ratios to quantify the impact of each feature on the target variable.
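For the logistic regression odds ratios mentioned, exponentiating the fitted coefficients is the standard computation (the SHAP summary and bar plots themselves would come from the separate shap package, omitted here); a toy sketch:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in; with standardized inputs, exp(coef) is the odds
# ratio per one-standard-deviation increase in that feature
X, y = make_classification(n_samples=400, n_features=5, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

odds_ratios = np.exp(model.coef_.ravel())
```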

Analyses were conducted using Python 3.10.12, R, and Stata Statistical Software: Release 18 (College Station, TX). Code used to analyze data and build the models is publicly accessible via GitHub (https://github.com/itsjustnilay/Predictive_Modelling_for_Heart_Failure_with_Preserved_Ejection_Fraction).

Results

The study sample consisted of 3,235 individual hospitalization encounters with a discharge diagnosis of HFpEF. Demographics and clinical characteristics for the study sample are shown in Table 1. Note that the comorbidities in Table 1 were defined by ICD codes. The mean age of the study population was 76.4 ± 13.3 years; 62.0% were female, and 20.5% self-identified as Black. Missing value proportions by variable are shown in S4 Table. BMI, temperature, and oxygen saturation had higher proportions of missing values, while laboratory parameters like creatinine, bicarbonate, and hemoglobin had fewer missing values, except for troponin, which had a high proportion of missing values.

Table 1. Baseline Characteristics of Patients by Survival Status (N = 3,235).

https://doi.org/10.1371/journal.pone.0336809.t001

The observed 30-day mortality was 6.3% (N = 245) and 1-year mortality was 29.2% (N = 1,145). Women had similar mortality to men (28.5% vs 27.5%, p = 0.52). The in-hospital mortality rate for Black patients was lower at 20.7% vs 31.6% for White patients, while that for patients ≥65 years was higher at 31.6% vs 13.9% for those <65 years (both p < 0.001). Patients who died during their hospital stay had higher proportions of comorbidities such as chronic kidney disease, chronic obstructive pulmonary disease (COPD), cancer, and atrial fibrillation, compared with patients who survived hospitalization (Table 1).

Correlation heat maps for continuous variables are shown in S1 Fig. and mutual information plots are shown in S2 Fig. Mutual information plots showed NT-proBNP and age to be key predictors for both outcomes; heart rate, White race, and potassium levels were significant markers for 30-day mortality, while systolic blood pressure, Black race, and oxygen saturation were significant predictors for 1-year mortality. Black race has been previously shown to be associated with lower mortality in HFpEF [26]. However, as race is a social and not a biological construct, we did not include any race variables in predictive modeling. Multiple imputation and balanced resampling were noted to be the most effective strategies for managing missing data and class imbalance, respectively.

Model performance metrics are shown in Table 2, and ROC curves for all models are shown in Fig 1. PR-AUC and calibration curves are shown in S3 and S4 Figs, respectively.

Table 2. Performance Metrics of Predictive Models for 30-Day and 1-Year Mortality.

https://doi.org/10.1371/journal.pone.0336809.t002

Fig 1. ROC Curves for Predictive Models of (A) 30-Day Mortality and (B) 1-Year Mortality.

https://doi.org/10.1371/journal.pone.0336809.g001

Model performance

For 30-day mortality, the regression-based models overall appeared to perform better than tree-based models. The Logistic Regression model using median imputation and random undersampling demonstrated good overall performance, with an accuracy of 0.67, AUC of 0.83, sensitivity of 0.82, and specificity of 0.66.

For 1-year mortality, tree-based models overall appeared to perform better. The HGBC model using multiple imputation and random oversampling had an accuracy of 0.77, AUC of 0.78, sensitivity of 0.49, and specificity of 0.87. In contrast, regression models such as the Elastic Net showed higher specificity but lower sensitivity (accuracy of 0.79, AUC of 0.75, sensitivity of 0.35, and specificity of 0.94).

Variable importance

The odds ratios (OR) for the logistic regression models for 1-year and 30-day mortality are shown in Table 3. For 30-day mortality, the most significant predictors were elevated WBC count and NT-proBNP levels (OR 2.85 and 2.44, respectively). Other important predictors included age, troponin, and bicarbonate levels. For 1-year mortality, age and elevated NT-proBNP levels (OR 1.78 and 1.66, respectively) were significant predictors, though with lower odds ratios compared to 30-day mortality. Atrial fibrillation, metastatic cancer, and elevated bicarbonate level were other important predictors.

Table 3. Feature Odds Ratios for 30-Day and 1-Year Mortality as per Logistic Regression.

https://doi.org/10.1371/journal.pone.0336809.t003

Interpretability and explainability

SHAP summary plots for 30-day and 1-year mortality are shown in Fig 2 and SHAP bar plots in S5 Fig. SHAP interpretations were performed for the Logistic Regression model for the 30-day mortality outcome and the HGBC model for 1-year mortality. For 30-day mortality, NT-proBNP was the most important feature, followed by age and coronary artery disease. For 1-year mortality, age at admission, NT-proBNP levels, and systolic blood pressure were the most significant factors.

Fig 2. SHAP Summary Plots for Predictive Models of (A) 30-Day Mortality and (B) 1-Year Mortality.

https://doi.org/10.1371/journal.pone.0336809.g002

Discussion

In our study, models derived from EHR data to predict 30-day and 1-year mortality with a heart failure with preserved ejection fraction (HFpEF) hospitalization showed good performance and potential for clinical use. Regression models performed well for the 30-day outcome, with the best performing Logistic Regression model achieving an AUC of 0.83. Tree-based models overall appeared to perform better for the 1-year outcome, with the best performing HGBC model achieving an AUC of 0.78.

Prior studies developing prediction models in HFpEF have focused on the ambulatory population [4,6,7,27] and are not optimal for use in the hospitalized setting, where markers of acuity such as vital signs need to be additionally incorporated and can help define risk. Further, most prior HFpEF models have been derived from trial data, which have standardized data collection, and often contain variables which are not readily available in the EHR, such as complex health status assessments, NYHA Class, or genetic data. Additionally, traditional models often focus on being parsimonious, which is extremely pertinent for low-resource settings; but in clinical environments delivering care using contemporary EHR systems, computation is not a limitation, and thus leveraging all available variables and modeling the complexity of variable relationships can help improve risk prediction [28].

It is critical for models to be developed using EHR data for two reasons. First, patient populations sourced from the EHR may be more reflective of the real world than trial data, which can be affected by selection bias. Second, EHR-based prediction models are easier to implement in patient-facing environments, given that the constituent risk variables are already sourced from the EHR and are obtained in routine clinical care. The potential uses of these models include early risk stratification for in-hospital planning such as ICU triage, and triggering care pathways such as advanced HF team or palliative care involvement. There is also a role for quality metrics and for hospitals to track post-discharge outcomes.

In our study, for predicting 30-day mortality, regression-based models (Logistic regression and Elastic Net) performed better than tree-based models. The logistic regression model had the best metrics overall including an AUC of 0.83. It could be that short-term outcomes are driven by more immediate and linear relationships with acute clinical indicators which are modeled well by regression methods. Additionally, it may be that regression-based methods are able to handle the highly imbalanced nature of the 30-day outcome more effectively. Techniques like Elastic Net provide regularization, preventing overfitting by penalizing complex models, which may be crucial for the shorter prediction window. In addition, as short-term mortality risk is often incorporated into triage decisions, the higher sensitivity of the regression-based models is also favorable.

Tree-based models, on the other hand, performed better for the 1-year outcome, with the overall best performing HGBC model achieving an AUC of 0.78. Tree-based models are non-linear, which enables them to capture complex interactions between variables which are present in long-term prediction tasks. They are also more effective at handling different types of data and missing values, ensuring robust prediction in the face of incomplete data. Their ensemble-based structure, aggregating predictions from multiple trees, makes them versatile and helps reduce variance. They also improve predictive accuracy by leveraging the strengths of multiple models to explore deeper relationships within the data, capturing long-term trends and patterns more effectively than linear models.

Age was an important predictor of both 30-day and 1-year mortality. Sex was not a predictor, unlike in some prior models [27]. Among comorbidities, we noted COPD, atrial fibrillation, malignancy, and liver disease to be important predictors. Among laboratory parameters, NT-proBNP was the most important predictor, as has been noted in most prior HFpEF models, and affected both outcomes significantly [4,25,27,28]. Troponin, on the other hand, was a more important predictor for 30-day than for 1-year mortality. Higher bicarbonate levels and WBC count, similar to a prior study, were also noted to be important variables for both outcomes. Unlike in HFrEF, the effect of elevated bicarbonate levels on mortality in HFpEF has not been specifically reported before [4,29].

Overall, most of the individual risk variables, such as age and NT-proBNP, are not novel and have been noted in prior studies. However, the novelty of our study remains the use of real-world EHR data from a large health system to derive the model, and the model being specific to the hospitalized population. In contrast, prior predictive models for HFpEF have been derived from registry or trial data and are for ambulatory populations. Thus, although there is some commonality of variables identified with prior models, the strength of association with the outcomes may differ in this treatment setting. Further, the incomplete nature of EHR data can also change the association of variables. An EHR-derived model is likely to perform better in a patient-facing EHR setting. However, despite these advantages, further validation of our model is needed.

To further enhance the predictive accuracy of such EHR-based models, future investigations could use data combined from multiple health systems, which would allow larger numbers of patients and fully leverage the capabilities of machine learning methodologies. In addition, exploring ensemble methods by combining model classes can further enhance prediction by strategically amalgamating the strengths of individual algorithms. Further, including additional data categories such as prescription fill data and imaging parameters can help enhance prediction. These data streams are currently not universally accessible in EHRs; however, with advancements in interoperability, there is potential in the near future to incorporate such data into clinical models for use at the bedside.

Limitations

One limitation of our study is the lack of external validation using an independent cohort; despite the use of techniques like stratified cross-validation and bootstrapping, concerns remain about the model’s generalizability. Further validation across diverse populations is necessary. Additionally, the completeness of the data presented challenges, particularly with features that exhibited high levels of imbalance and missingness. This is, however, a common issue encountered with EHR data. Although imputation and resampling methods were carefully applied to address these issues while maintaining the original dataset’s distribution, these processes can introduce bias and leave the potential for misclassification, which may impact the model’s performance. Despite implementing regularization techniques to reduce the risk of overfitting, there remains a concern that the model may still be overly tailored to the training data.

Conclusion

Models derived from EHR data have good performance in predicting 30-day and 1-year mortality after a HFpEF hospitalization, with performance metrics similar to other contemporary models derived from trial datasets. Models derived from EHR data have immediate potential to be implemented at the bedside.

Supporting information

S1 Table. Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement.

https://doi.org/10.1371/journal.pone.0336809.s001

(PDF)

S2 Table. ICD codes of observed outcomes and their frequencies.

https://doi.org/10.1371/journal.pone.0336809.s002

(PDF)

S3 Table. Features, Target Outcomes and their data types.

https://doi.org/10.1371/journal.pone.0336809.s003

(PDF)

S4 Table. Missing Values (count and %) of features.

https://doi.org/10.1371/journal.pone.0336809.s004

(PDF)

S1 Fig. Correlation Matrix for Continuous Variables in (A) 30-Day and (B) 1-Year Mortality.

https://doi.org/10.1371/journal.pone.0336809.s005

(PDF)

S2 Fig. The mutual information (MI) analysis comparing (A) 30-day and (B) 1-year mortality.

https://doi.org/10.1371/journal.pone.0336809.s006

(PDF)

S3 Fig. The precision-recall curves for (a) 30-Day and (b) 1-Year mortality outcomes.

https://doi.org/10.1371/journal.pone.0336809.s007

(PDF)

S4 Fig. The calibration curves for (A) 30-Day and (B) 1-Year mortality outcomes.

https://doi.org/10.1371/journal.pone.0336809.s008

(PDF)

S5 Fig. SHAP Bar plots for (A) 30-Day mortality Logistic regression model and (B) 1-Year mortality outcomes HGBC model.

https://doi.org/10.1371/journal.pone.0336809.s009

(PDF)

References

  1. Kittleson MM, Panjrath GS, Amancherla K, Davis LL, Deswal A, Dixon DL, et al. 2023 ACC Expert Consensus Decision Pathway on Management of Heart Failure With Preserved Ejection Fraction: A Report of the American College of Cardiology Solution Set Oversight Committee. J Am Coll Cardiol. 2023;81(18):1835–78. pmid:37137593
  2. Jia Y-Y, Cui N-Q, Jia T-T, Song J-P. Prognostic models for patients suffering a heart failure with a preserved ejection fraction: a systematic review. ESC Heart Fail. 2024;11(3):1341–51. pmid:38318693
  3. Rich JD, Burns J, Freed BH, Maurer MS, Burkhoff D, Shah SJ. Meta-Analysis Global Group in Chronic (MAGGIC) Heart Failure Risk Score: Validation of a Simple Tool for the Prediction of Morbidity and Mortality in Heart Failure With Preserved Ejection Fraction. J Am Heart Assoc. 2018;7(20):e009594. pmid:30371285
  4. Komajda M, Carson PE, Hetzel S, McKelvie R, McMurray J, Ptaszynska A, et al. Factors associated with outcome in heart failure with preserved ejection fraction: findings from the Irbesartan in Heart Failure with Preserved Ejection Fraction Study (I-PRESERVE). Circ Heart Fail. 2011;4(1):27–35. pmid:21068341
  5. McDowell K, Kondo T, Talebi A, Teh K, Bachus E, de Boer RA, et al. Prognostic Models for Mortality and Morbidity in Heart Failure With Preserved Ejection Fraction. JAMA Cardiol. 2024;9(5):457–65. pmid:38536153
  6. Pocock SJ, Ferreira JP, Packer M, Zannad F, Filippatos G, Kondo T, et al. Biomarker-driven prognostic models in chronic heart failure with preserved ejection fraction: the EMPEROR-Preserved trial. Eur J Heart Fail. 2022;24(10):1869–78. pmid:35796209
  7. Angraal S, Mortazavi BJ, Gupta A, Khera R, Ahmad T, Desai NR, et al. Machine Learning Prediction of Mortality and Hospitalization in Heart Failure With Preserved Ejection Fraction. JACC Heart Fail. 2020;8(1):12–21. pmid:31606361
  8. Collins GS, Moons KGM, Dhiman P, Riley RD, Beam AL, Van Calster B, et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ. 2024;385:e078378. pmid:38626948
  9. Vapnik V. The nature of statistical learning theory. Springer Science & Business Media; 2013.
  10. Hosmer Jr DW, Lemeshow S, Sturdivant RX. Applied logistic regression. John Wiley & Sons; 2013.
  11. Tibshirani R. Regression Shrinkage and Selection Via the Lasso. J R Stat Soc Ser B Stat Methodol. 1996;58(1):267–88.
  12. Zou H, Hastie T. Regularization and Variable Selection Via the Elastic Net. J R Stat Soc Ser B Stat Methodol. 2005;67(2):301–20.
  13. Breiman L. Random Forests. Mach Learn. 2001;45(1):5–32.
  14. Friedman JH. Greedy function approximation: A gradient boosting machine. Ann Statist. 2001;29(5):1189–232.
  15. Chen T, Guestrin C. XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016.
  16. Hossin M, Sulaiman MN. A Review on Evaluation Metrics for Data Classification Evaluations. IJDKP. 2015;5(2):01–11.
  17. Namdar K, Haider MA, Khalvati F. A Modified AUC for Training Convolutional Neural Networks: Taking Confidence Into Account. Front Artif Intell. 2021;4:582928. pmid:34917933
  18. 18. Chicco D, Jurman G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics. 2020;21(1):6. pmid:31898477
  19. 19. Burnham KP, Anderson DR. Multimodel inference: understanding AIC and BIC in model selection. Soc Methods Res. 2004;33(2):261–304.
  20. 20. Goldberger AL, Amaral LA, Glass L, Hausdorff JM, Ivanov PC, Mark RG, et al. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation. 2000;101(23):E215-20. pmid:10851218
  21. 21. Johnson AEW, Bulgarelli L, Shen L, Gayles A, Shammout A, Horng S, et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci Data. 2023;10(1):1. pmid:36596836
  22. 22. Johnson A, Bulgarelli L, Pollard T, Gow B, Moody B, Horng S, et al. MIMIC-IV (version 3.0). PhysioNet. 2024. https://doi.org/10.13026/hxp0-hg59
  23. 23. Johnson AE, Stone DJ, Celi LA, Pollard TJ. The MIMIC Code Repository: enabling reproducibility in critical care research. J Am Med Inform Assoc. 2018;25(1):32–9. pmid:29036464
  24. 24. Riley RD, Snell KI, Ensor J, Burke DL, Harrell FE Jr, Moons KG, et al. Minimum sample size for developing a multivariable prediction model: PART II - binary and time-to-event outcomes. Stat Med. 2019;38(7):1276–96. pmid:30357870
  25. 25. Wang C-H, Han S, Tong F, Li Y, Li Z-C, Sun Z-J. Risk prediction model of in-hospital mortality in heart failure with preserved ejection fraction and mid-range ejection fraction: a retrospective cohort study. Biomark Med. 2021;15(14):1223–32. pmid:34498488
  26. 26. Brown S, Biswas D, Wu J, et al. Race- and Ethnicity-Related Differences in Heart Failure With Preserved Ejection Fraction Using Natural Language Processing. JACC Adv. 2024;3(8):101064.
  27. 27. McDowell K, Kondo T, Talebi A, Teh K, Bachus E, de Boer RA, et al. Prognostic Models for Mortality and Morbidity in Heart Failure With Preserved Ejection Fraction. JAMA Cardiol. 2024;9(5):457–65. pmid:38536153
  28. 28. Kasahara S, Sakata Y, Nochioka K, Tay WT, Claggett BL, Abe R, et al. The 3A3B score: The simple risk score for heart failure with preserved ejection fraction - A report from the CHART-2 Study. Int J Cardiol. 2019;284:42–9. pmid:30413304
  29. 29. Cooper LB, Mentz RJ, Gallup D, Lala A, DeVore AD, Vader JM, et al. Serum Bicarbonate in Acute Heart Failure: Relationship to Treatment Strategies and Clinical Outcomes. J Card Fail. 2016;22(9):738–42. pmid:26777758