Evaluating risk prediction models for adults with heart failure: A systematic literature review

Background The ability to predict risk allows healthcare providers to propose which patients might benefit most from certain therapies, and is relevant to payers’ demands to justify clinical and economic value. To understand the robustness of risk prediction models for heart failure (HF), we conducted a systematic literature review (SLR) to (1) identify HF risk-prediction models, (2) assess statistical approach and extent of validation, (3) identify common variables, and (4) assess risk of bias (ROB). Methods Literature databases were searched for studies published from March 2013 to May 2018 that reported risk prediction models in an out-of-hospital setting in adults with HF. Distinct risk prediction variables were ranked according to outcomes assessed and incorporation into the studies. ROB was assessed using the Prediction model Risk Of Bias ASsessment Tool (PROBAST). Results Of 4720 non-duplicated citations, 40 risk-prediction publications were deemed relevant. Within the 40 publications, 58 models assessed 55 (co)primary outcomes, including all-cause mortality (n = 17), cardiovascular death (n = 9), HF hospitalizations (n = 15), and composite endpoints (n = 14). Few publications reported detail on handling missing data (n = 11; 28%). The discriminatory ability for predicting all-cause mortality, cardiovascular death, and composite endpoints was generally better than for HF hospitalization. In total, 105 distinct predictor variables were identified. Predictors included in >5 publications were: N-terminal prohormone brain-natriuretic peptide, creatinine, blood urea nitrogen, systolic blood pressure, sodium, NYHA class, left ventricular ejection fraction, heart rate, and characteristics including male sex, diabetes, age, and BMI. Only 11/58 (19%) models had overall low ROB, based on our application of PROBAST. In total, 26/58 (45%) models discussed internal validation, and 14/58 (24%) external validation.
Conclusions The majority of the 58 identified risk-prediction models for HF present particular concerns according to ROB assessment, mainly due to lack of validation and calibration. The potential utility of novel approaches such as machine learning tools is yet to be determined. Registration number The SLR was registered in Prospero (ID: CRD42018100709).

The most commonly reported risk predictors were also investigated, and the discrimination and calibration of the models were analyzed. As potential for bias is a consideration in risk prediction, each identified model was assessed according to the Prediction model Risk Of Bias ASsessment Tool (PROBAST) [16,17].

Data sources
MEDLINE, including MEDLINE In-Process, EMBASE, and the Cochrane Library Database, including the National Health Service Economic Evaluation Database and the Health Technology Assessment Database, were searched using a combination of search terms (S1 Appendix). Principles and practical guidelines advocated by the Cochrane Collaboration Handbook and the Centre for Reviews and Dissemination were employed (where relevant). The SLR followed a standardized, methodical, and transparent approach that adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) and Cochrane Collaboration guidelines. The SLR was registered in Prospero (ID: CRD42018100709; see: https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=100709).

Study eligibility
English-language studies published March 1, 2013 to May 29, 2018 were retained for further review if they involved adult patients with HF (aged ≥18 years), were conducted in an out-of-hospital setting, and documented multivariable models that predicted single or multiple HF outcomes in the target population, according to the search strategy (S1 Appendix). Preclinical, pharmacokinetic, or pharmacodynamic studies were excluded. Studies were not eligible for inclusion if they: used clinical outcomes that were considered in-hospital; focused on individual predictors or markers of risk (i.e., univariable models, as these tend to report overly optimistic findings [18]); were a letter, opinion piece, or review article; or used a dataset that did not reflect current clinical practices.

Study selection
Titles and abstracts of identified publications were screened and relevant publications retained for full-text review, according to National Institute for Health and Care Excellence guidance [19] (Fig 1). Both search and screening phases were independently conducted by two trained investigators. Any disagreements were resolved with a senior investigator.

Analysis of bias (PROBAST)
PROBAST was used to assess the risk of bias (ROB) of each risk prediction model identified from the relevant publications, according to our interpretation of Moons et al. [16]. Each model was assessed for applicability concerns and ROB, according to 3 and 4 domains, respectively. According to guidance from Moons et al. [16], if ≥1 domain is considered "No [N]" or "Probably No [PN]", there is concern for applicability or potential for bias within that domain. If the review questions were considered a good match to the study, concern regarding applicability was rated overall "low" [16]. A publication needed to score "low ROB" in each of the 4 domains for an overall judgment of "low ROB". However, if ≥1 domain was "high ROB", a judgment can still be made that the study is overall "low ROB", but specific reasons should be provided as to why the risk can be considered low [16].

Study selection
The SLR yielded 5425 citations, of which 4720 were non-duplicated citations and were further screened. Of these, 290 were retained for full-text review, which led to 40 relevant publications [21-60] (Fig 1). The 250 excluded publications are detailed in Fig 1, with reasons for exclusion.

Study characteristics
Sample size varied from 43 to 33,349 patients. Patients were aged 59-81 years, and 28-84% of cohorts were male (Table 1). Study follow-up varied considerably (30 days to 5 years).

Characteristics of risk prediction studies
Nearly half of the studies (n = 18 [45%]) failed to provide any indication of the data collection period. Of the studies that did report a study period, data were collected from 2001 to 2015. Few studies reported detail on how missing data were handled (n = 11 [28%]), with multiple imputation the most common approach (n = 6 [55%]) (Table 2). Of the 14 studies that reported missing data (35%), the percentage of complete cases ranged between 86% and 100%. Thirty-nine studies (98%) evaluated candidate predictors during model development. Cox regression was used by approximately half of the studies (n = 22 [55%]). As would be expected, hazard ratios (n = 25 [64%]) and odds ratios (n = 12 [31%]) were most often used for estimating risk. All publications employed discrimination methods to assess the prognostic utility of their model(s). Area under the receiver operating characteristic curve (AUC-ROC) (n = 19 [48%]) and the C-statistic (n = 18 [45%]) were most often used (Table 2).
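The discrimination measures dominating these publications reduce to the same pairwise idea. As a minimal illustration (our sketch, not code from any of the reviewed studies), Harrell's C-statistic for censored survival data is the fraction of comparable patient pairs in which the model assigns the higher risk score to the patient whose event occurred first:

```python
def concordance_index(times, events, scores):
    """Harrell's C-statistic for right-censored data: among comparable
    pairs (one subject has an observed event strictly before the other
    subject's follow-up time), the fraction in which the risk score is
    higher for the subject who experienced the event first."""
    concordant, tied, comparable = 0, 0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # pair (i, j) is comparable only if subject i has an
            # observed event before subject j's (event or censoring) time
            if events[i] and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1
                elif scores[i] == scores[j]:
                    tied += 1
    return (concordant + 0.5 * tied) / comparable

# perfect risk ordering: earlier events carry higher scores -> C = 1.0
c = concordance_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.5, 0.3])
```

For a binary outcome without censoring this reduces to the AUC-ROC; 0.5 is chance-level and 1.0 is perfect ranking, which frames the "moderate" to "good" values (roughly 0.6-0.9) reported across the retrieved studies.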
Beyond model discrimination, steps for evaluating model performance were suboptimal. Less than half of the retrieved publications evaluated model fit through calibration methods (n = 16 [40%]). Approaches to correctly classify patients according to severity of HF risk were not widely reported, with net reclassification improvement (NRI) (n = 14 [35%]) or the integrated discrimination index (IDI) (n = 6 [15%]) used by a minority of studies (Table 2). Interpretation of these observations is hampered by lack of similarity in approach, particularly as some studies utilized category-dependent NRIs, whereas others used a category-free NRI technique. Only 20 studies (50%) performed an estimation of internal model validation, with bootstrapping most commonly used. External validation was less frequently reported (n = 10/40 [25%]), with the majority of these publications (n = 8/10 [80%]) employing an external model cohort for comparison (Table 2).
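Bootstrapping, the most common internal validation approach among these studies, estimates how optimistic the apparent performance is. A minimal sketch of a Harrell-style optimism-corrected bootstrap, using an illustrative least-squares risk score and a hand-rolled AUC rather than any model from the reviewed publications:

```python
import numpy as np

def auc(y, p):
    """Probability a random event subject scores above a random non-event."""
    pos, neg = p[y == 1], p[y == 0]
    return float(np.mean([(pi > ni) + 0.5 * (pi == ni) for pi in pos for ni in neg]))

def optimism_corrected_auc(X, y, n_boot=200, seed=0):
    """Optimism correction: apparent AUC minus the average over bootstrap
    resamples of (AUC of the refit model on its bootstrap sample minus
    its AUC on the original data)."""
    rng = np.random.default_rng(seed)
    fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]  # toy risk score
    apparent = auc(y, X @ fit(X, y))
    optimism = 0.0
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))
        w = fit(X[idx], y[idx])
        optimism += auc(y[idx], X[idx] @ w) - auc(y, X @ w)
    return apparent - optimism / n_boot
```

The corrected value is what the apparent AUC shrinks to once model-selection optimism is discounted, which is the quantity Domain 4.8 of PROBAST probes.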
All-cause mortality. Discriminatory value was assessed for all 17 all-cause mortality outcomes, based on the C-statistic (n = 10) or reported as AUC-ROC (n = 7). Relevant model outcomes showed predictive C-statistic values considered "moderate" or "good", ranging between 0.655 and 0.840 (Table 3). Eight model outcomes provided C-statistics according to a base model in an effort to determine the incremental value of retaining candidate variables into the final model; these C-statistics ranged between 0.677 and 0.826. Internal validation was carried out for 8 model outcomes (47%), primarily by bootstrapping. Just 3 (18%) performed external validation.
CV mortality. Of the 9 CV mortality model outcomes, 3 reported a C-statistic for model discrimination, 3 reported AUC-ROC, 1 used a Kaplan-Meier assessment, and 2 used Therneau's survival concordance index. The 8 relevant model outcomes displayed "moderate" or "good" discriminatory values, with model C-statistics ranging between 0.680 and 0.890 (Table 3). Only 1 model outcome underwent internal validation [53], and none external validation.
HF hospitalization. Admission to hospital for HF was the most common endpoint, assessed in 15 models. Of the overall outcomes, 3 additionally assessed all-cause mortality and 1 CV mortality. The median number of candidate variables for HF hospitalization was the highest of the 4 outcome categories (19), although the median number of retained variables was equivalent to that retained for composite endpoints (7). Discrimination was most commonly assessed using the C-statistic (n = 7) or reported as AUC-ROC (n = 7), with 1 model outcome using a Kaplan-Meier assessment. C-statistics ranged between 0.59 and 0.80 (Table 3). Eapen et al. had the largest sample size (33,349 subjects), and a "low" discriminatory value of 0.59 for HF hospitalization [35]. This study assessed all-cause mortality and composite endpoints using different models, and reported good (0.75) and modest (0.62) discrimination, respectively [35] (Table 3). The majority of the predictive model outcomes for HF hospitalization were unable to determine incremental values, as only 2 included a base model. Seven model outcomes (47%) included an assessment of internal validation; 3 (20%) discussed external validation.

Model predictors
From the 38 retrieved publications that did not employ machine learning, 105 distinct predictor variables were identified. The 12 most commonly used variables (in >5 publications) were derived from pathophysiological pathways linked to poor health in HF (Fig 2). These included surrogates of demographic, anthropometric, clinical, and laboratory measures. N-terminal prohormone brain natriuretic peptide (NT-proBNP) and age were most commonly included (n = 11 studies each), followed by type 2 diabetes mellitus (T2DM) and male sex (n = 10 studies each), systolic blood pressure (SBP) (n = 9 studies), blood urea nitrogen (BUN) and creatinine (n = 8 studies each), heart rate and left ventricular ejection fraction (EF) (n = 7 studies each), and sodium, body mass index (BMI), and New York Heart Association (NYHA) class (n = 6 studies each) (Fig 2). Shameer et al. [55] and Krumholz et al. [46] used machine learning and included 4205 and 105 candidate variables, respectively. Despite these large numbers of variables, these studies did not consider the commonly identified distinct predictors given in Fig 2. Shameer et al. displayed "good" discriminatory ability with a C-statistic of 0.77 [55], suggesting this approach might be promising for predicting relevant outcomes. Conversely, Krumholz et al. documented that a number of socioeconomic, health status, adherence, and psychosocial indicators were not dominant factors for predicting 30-day readmission risk, and model discrimination remained "modest" (C-statistic = 0.65) [46].

Identification of HF subgroups
Five studies (13%) looked to classify a "high-risk" patient subset. The groups were typically defined according to the highest scoring category, based on each included publication's risk scoring. Álvarez-García et al. [23] demonstrated that patients who presented with 20-30 points on the Redin-SCORE had a 5-fold increase (i.e., 5.9% vs. 0.9%) in the cumulative incidence of 30-day HF readmission vs. patients scoring 0-19 points [23]. Uszko-Lencer et al. [58] reported that 2-year survival probability among patients classified with "high scores" (i.e., BARDICHE-score >16 points) was 58% vs. 97% in the low BARDICHE-score group (≤8 points). Using the Echo Heart Failure Score, Carluccio and colleagues [30] reported that all-cause mortality increased progressively with higher scores (0-5 points); notably, patients with a score of 5 had an all-cause mortality hazard ratio (HR) of 13.6 vs. patients with a score of 0. When evaluating "high-risk" on the Heart Failure Patient Severity Index (i.e., decile 10), Hummel et al. [41] noted a 57% increase in 6-month all-cause death and hospitalization (composite), vs. an 8% increase in 6-month combined event rate for those classified as "low-risk" (deciles 1-4).

PROBAST
In total, 58 distinct models were identified from the 40 publications. By applying our assessment of PROBAST [32, 35], 11 models (19%) were classified as overall low ROB, 4 (7%) as overall unclear, and the majority (43 [74%]) as overall high ROB (Fig 3). Of the 11 models considered overall low ROB, (co)primary outcomes across the 4 categories were modeled. Although 11 models (from 7 studies) were rated as overall low ROB according to our assessment of PROBAST, only 3 models had "Yes [Y]" or "Partial Yes [PY]" in all domains of PROBAST. The other 8 models were considered overall low ROB despite being rated "Unclear" within at least one domain (Domains 1-3). Of the overall low ROB models, 4 also had an "N" in 1 category of Domain 4. For example, Cubbon et al. [32] had an "N" in Domain 4.1 ("Were there a reasonable number of participants with the outcome?"), because the events per variable (i.e., subjects/variables) was <10 [16]; however, as this model assessing HF rehospitalization was externally validated, it was considered overall low ROB according to Moons et al. [16]. Eapen et al. [35] developed 3 models and split their data set 70%/30%, leading to an "N" in Domain 4.3 ("Were all enrolled participants included in the analysis?"). The authors used the 30% holdout to validate models developed on the 70% training set, and as the models were also calibrated, these models were considered overall low ROB according to our interpretation of Moons et al. [16].
Most of the models considered overall high ROB fell short on multiple signaling questions, in particular within Domain 4, which assessed model design and validation (S2 Appendix). Zai et al. [60] was rated high ROB on all 4 domains, mainly owing to lack of reporting. Of the 43 models rated overall high ROB, 32 were ranked "Low" or "Unclear" on the first 3 PROBAST domains assessing participants, predictors, and outcomes, but were classified overall high ROB due to "N" or "PN" in ≥1 aspect of Domain 4 (S2 Appendix). Most often, for these 32 models and overall, an "N" was recorded in Domain 4.8, which assessed model overfitting and optimism, particularly involving internal validation [16,17]. For example, Ford et al. assessed 4 co-primary outcomes using 4 models [37]. Ford et al. [37] had an "N" in Domain 4.8, as the models were not reported as being internally validated, and a "PY" in Domains 4.2 and 4.9, as information was reported in the appendix only. As the study did not report whether the models were externally validated, the models were rated overall high ROB.
When the 58 models were assessed according to applicability concerns, just 6 models (from 5 studies) were rated with overall "High" applicability concern. The majority (52 models) were considered overall "Low" concern, following assessment of applicability to participants, predictors, and outcomes (S2 Appendix).

Discussion
Publications on risk prediction models have become more common in recent years, but distinct prediction models frequently exist for the same outcome or target population. As such, healthcare professionals, policy makers, and guideline committees face competing information regarding which prediction models should be used or recommended [61,62]. To aid these decisions, SLRs of risk prediction models are increasingly demanded and performed [11-15]. In this review of the past 5 years, we identified 40 studies that reported 58 multivariable models for risk prediction in HF. Despite the risk prediction models varying widely, a number of common distinct predictor variables were incorporated into the identified models. As CV disorders manifest from multiple pathophysiological pathways, a multivariable approach would likely offer incremental value beyond the use of single predictors.
In total, 33 of the 40 studies retained >1 candidate variable in the initial assessment, and we identified the 12 most commonly used variables, as incorporated in more than 5 studies. For example, age and male sex were frequently incorporated into the base model, in line with their status as key risk factors for onset and survival in HF [2,63]. Although we identified some commonality in predictors, 105 distinct predictors were identified. This highlights real complexity in HF as a condition, but also the interrelated pathological mechanisms that are considered important for predicting risk, and in part explains some of the confusion professionals face when selecting the most appropriate risk prediction model [10,14,61]. Two publications [46,55] reported the use of machine learning for predicting risk. Both of these studies incorporated an extensive number of candidate variables for model selection (n = 4205 [55] and n = 110 [46]). Machine learning has shown some promise for improving the accuracy of risk prediction, aiming to increase the number of patients identified who could benefit from preventive treatment, while avoiding unnecessary treatment of others [64]. In contrast, an analysis of 71 studies suggested that machine learning had no superiority over logistic regression techniques for predicting risk, although comparison of studies was hindered by incomplete methodological reporting [65]. Whether such automated processes can markedly augment predictive performance in the HF setting remains unclear and requires further investigation to define a role in evaluating risk prediction.
Of the multivariable models identified, several provided C-statistics according to a base model in an effort to determine the incremental value of adding the retained candidate variables into the final model. These studies highlight the steps taken to improve discriminatory ability, the range of variables retained in different risk prediction models, and how these seem dependent on the HF outcomes and population under study. There was no particular evidence to suggest that differences in sample size, data source, or HF type significantly affected the discriminatory ability of the models to predict HF outcomes, nor clear commonality in the variables retained within the final model. However, it is unlikely that one prediction model will suit all types of HF, and risk should be dependent on level of preserved EF [61]. Ensuring that models properly evaluate both calibration and discrimination is a domain of PROBAST (Domain 4.7), and 14 models did not include a sufficient level of information on this domain by our application of the tool. The majority of retrieved studies relied on the AUC-ROC / C-statistic to define discriminatory value, and newer approaches were not widely adopted [66,67]. Reliance on the C-statistic alone could naively eliminate established risk factors from CV risk prediction scores [68]. Reclassification techniques such as the NRI go beyond conventional discrimination methods by facilitating risk reclassification of patients, although interpreting the distribution of findings across studies remained a challenge, particularly as some studies utilized category-dependent NRIs whereas others employed a category-free NRI technique. Measures such as the NRI also have their own limitations; for example, the NRI is often heavily influenced by the choice of cutoff points used, as well as the inclusion of unnecessary predictors, and generally requires well-calibrated prediction models to be clinically meaningful [66,67].
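The category-free variant of the NRI can be made concrete with a short sketch (the helper below is ours, assuming paired risk estimates from an old and a new model): it sums the net proportion of events whose predicted risk moves up under the new model and the net proportion of non-events whose predicted risk moves down.

```python
import numpy as np

def continuous_nri(y, p_old, p_new):
    """Category-free (continuous) net reclassification improvement:
    events should move up in predicted risk under the new model and
    non-events should move down; each component ranges from -1 to 1,
    so the total lies in [-2, 2]."""
    y, p_old, p_new = (np.asarray(a, dtype=float) for a in (y, p_old, p_new))
    up, down = p_new > p_old, p_new < p_old
    events, nonevents = y == 1, y == 0
    nri_events = up[events].mean() - down[events].mean()
    nri_nonevents = down[nonevents].mean() - up[nonevents].mean()
    return nri_events + nri_nonevents
```

When every event is reclassified upward and every non-event downward, the statistic reaches its maximum of 2; category-dependent NRIs instead count movements across fixed risk thresholds, which is why the two variants are not directly comparable across studies.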
The concept of risk reclassification has caused much discussion in the literature, with novel decision-analytic measures being proposed [69]. Moreover, as novel risk factors are discovered, sole reliance on the C-statistic to evaluate the discriminatory ability of risk predictors has been suggested to be ill-advised [67]. A limited number of studies included a reliable approach to evaluate model performance, and fewer than half evaluated goodness-of-fit by calibration methods. As such, there is clear room for improving the design of risk prediction models away from reliance on the C-statistic, in parallel with research into improving model performance, ensuring validity, and enhancing generalizability.
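The grouping that underlies calibration plots and the Hosmer-Lemeshow test can be sketched briefly; this is an illustrative helper of ours, assuming binary outcomes and predicted probabilities, not code from any reviewed study:

```python
import numpy as np

def calibration_table(y, p, n_bins=10):
    """Sort subjects by predicted risk, split them into n_bins groups,
    and return (mean predicted risk, observed event rate) per group; in
    a well-calibrated model the two columns track each other."""
    order = np.argsort(p)
    y, p = np.asarray(y, float)[order], np.asarray(p, float)[order]
    groups = np.array_split(np.arange(len(y)), n_bins)
    return [(float(p[g].mean()), float(y[g].mean())) for g in groups]
```

Discrimination (ranking) and calibration (agreement of predicted and observed rates) are distinct: a model can rank patients well yet systematically over- or under-estimate absolute risk, which is why both are required for external validation.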
Given the wide variety of models identified, the PROBAST assessment was applied to give further insight into model design and application. Through our application of PROBAST, 11 models (from 7 studies [21, 26, 32, 35, 53, 57, 59]) were suitably designed and published in a way that suggests bias was not introduced into the assessment, highlighting that the remaining 47 were not sufficiently described. Areas found lacking across the prediction models included reporting on methods of calibration and discrimination, validation, and the key issue of how missing data were handled using imputation or other techniques.
A lack of full reporting on aspects of validation or overfitting was the domain on which most studies "failed" (Domain 4.8) according to our application of PROBAST. For example, only 26/58 models included sufficient information to confirm the studies were internally validated ("Y" on Domain 4.8). Although the model by Ahmad et al. [22] reported sufficient information across many aspects of PROBAST, it reported no information on internal or external validation and was therefore rated overall high ROB, despite being rated low ROB on the first 3 domains of PROBAST, covering participants, predictors, and outcomes. The authors themselves noted that they did not carry out any method of internal validation [22]. Our observations highlight the need for regular assessments of internal validation and goodness-of-fit, but also the wider adoption of methods of external validation. Importantly, external validation requires measures of both discrimination and calibration in another cohort, and only 8 studies reported attempting to use an external model cohort for comparison. Although applicability concerns were low, the PROBAST ROB observations suggested models were generally prone to bias. Introduction of bias could lead to the wrong patients being identified and treated, and ultimately to costly mistakes within a healthcare system, if a model were widely used [16,17]. The risk that patients will be inappropriately treated could partly explain why models are not being confidently used as an aid to HF patient management [61,62], alongside other concerns discussed in more detail below.
Despite 40 new publications on predicting risk in HF within the past 5 years, there is little evidence to suggest that any of the 58 models has been adopted by clinicians or healthcare institutions, and no international or local guidance recommends one risk prediction model over another. Indeed, <1% of patients in a European registry received any form of prognostic evaluation [10]. Although many reasons contribute to the limited uptake, the poor performance of short-term assessments in guiding decision-making is likely one of them [11]. Based on a single-variable model, the GUIDE-IT trial demonstrated that NT-proBNP-guided therapy was not more effective than usual care for improving outcomes in high-risk patients with HF and reduced EF [70]. With such studies, clinicians may see little need to change patient management based on risk prediction, seeing all patients as high risk. If risk assessments are to be useful at the bedside, providers need pragmatic models that rely upon easily accessible variables to stratify patients. Given these diverse needs and conflicting evidence of value, further research is required to develop tools, or indeed automated techniques, that can provide clinical guidance for risk estimation in primary care and in high-risk or secondary prevention settings [62]. Beyond these concerns, our study highlights the wide variation in statistical approach, the complexity of certain models, and the lack of clear external validation, all further important considerations for decision makers when recommending any model for predicting risk or stratifying patients according to future risk. Although statistical concerns may hinder clinicians' confidence in a risk model, development of an app or similar tool to simplify application of the model may also negate the need for the healthcare provider to fully understand the statistical approach. In an app-type approach, clear step-by-step guidance toward the correct patient population would be needed.
In order to endorse a risk prediction model within a suitable patient group, decision makers would need to ensure the model is generalizable, as one model may suit given patient groups better than another [61]. Only 13% of identified studies stratified patients by HF type, despite evidence that suggests different models should be used depending on level of preserved EF [61]. In addition, considering patients' "frailty index" or other functional parameters, such as the 6-min walking distance, within the prognostic modelling of HF may provide a more multidimensional picture of the patient's risk [71][72][73]. Further research is needed, however, to ensure the validity of such measures. Patient frailty, for example, can be difficult to interpret [73] and requires additional functional parameters (such as mental, nutritional, or social components) to provide a reasonably accurate definition of "frailty" [72]. However, recent research demonstrated that a frailty index can predict mortality, disability, and hospitalization rates in patients with HF, discriminating from patients without HF [74]. Configuration and use of functional parameters is something that may become more important along with the development of generalizable risk prediction models, but they are still being validated and debated [71,74]. Further exploration and understanding of automated processes [46,55,64] is also needed to help researchers and clinicians gain better insight into the risks and uncertainties involved in the management of different types of HF patient. Collectively, future risk prediction models may involve different measures of function, classification, or clinical usefulness, to give additional insight on the prediction, which extends beyond traditional measures of calibration and discrimination [69].
Some limitations need to be considered when interpreting our observations. We selected a study window of 5 years to ensure we reflected up-to-date knowledge and treatment practices, given that HF is a dynamic condition for which treatment recommendations are updated annually in many countries. However, by limiting the study window in this way, we did not capture risk prediction models that were published prior to the 5-year window, such as MAGGIC (Meta-Analysis Global Group in Chronic Heart Failure) or the Seattle Heart Failure Model [75,76], which had informed contemporary clinical guidelines [77]. Previous reviews, such as that carried out by Rahimi et al., 2014, have included discussion and analysis of these earlier HF models in a contemporary context [12]. Our study time window started after the Rahimi et al. review, but those authors similarly concluded that although models varied widely, they had some variables in common. In line with Rahimi et al. [12], we also found that prediction of HF hospitalization was associated with the lowest discrimination, while other risk predictions had higher performance that may facilitate clinical use, suggesting that discrimination for HF hospitalization has not improved with models developed within the past 5 years and that learnings have not been applied. Just falling outside our study window, Rich et al. evaluated the MAGGIC risk score (first published in 2013 [75]) for predicting morbidity/mortality in 407 HF patients with preserved EF [78], comparing it with the Seattle Heart Failure Model. The authors concluded that the MAGGIC risk score is a valid instrument to assess mortality and morbidity of HF patients with preserved EF, with better calibration for the hospitalization outcome than the Seattle HF instrument. Unfortunately, neither risk model has been assessed with PROBAST.
Each risk model differed depending on the overall aim of the study, target population considered, length of follow-up, health procedures assessed, location of study, and accessibility of study data, to name but a few factors. To this end, advocating an optimal modeling approach for use in the HF setting is beyond the scope of this review, and we have discussed some of the limitations around differences in methodologies. Time horizon and sample size varied considerably among the studies identified, with few studies providing sufficient information to confirm robustness and generalizability for qualifying the prognosis of individual patients. More rigorous reporting guidance would aid more complete reporting and, in turn, more accurate comparison of studies in an SLR. Nevertheless, by highlighting similarities in approach we hope to help future decision makers optimize a model for wider use.
It is clear there is a real need to integrate risk prediction models into healthcare management, but this must be carried out with an eye on bias and the handling of missing data [61]. Only 28% of studies reported on how they handled missing data; indeed, most studies (20/40 [50%]) included no information [NI] on how missing data were handled, leading to "NI" in Domain 4.4 of PROBAST. This highlights an area in need of significant improvement in data reporting, to ensure robust conclusions can be drawn. Our understanding of the retrieved models is necessarily limited to what is reported within each publication, and the PROBAST assessment should be considered in this light, as a number of domains were scored "NI" because no information was available. As such, we cannot disregard the possibility that certain model elements of interest (e.g., as documented in technical modeling reports) may have been overlooked by the present review. Furthermore, the PROBAST checklist is based on reviewer decision-making regarding aspects of the model, which introduces a level of professional judgment into the assessment of each domain; analysis of each study as "high" or "low" ROB should be considered accordingly, and independent assessors may come to different decisions regarding domains and models. Wider application of PROBAST is therefore required before our observations can be fully contextualized.
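Multiple imputation, the most commonly reported strategy among the minority of studies that described handling missing data, fills each gap several times with plausible draws so that downstream estimates can be pooled across the completed datasets (Rubin's rules). A deliberately simplified single-variable sketch of ours, with normal draws standing in for proper chained-equations imputation:

```python
import numpy as np

def multiply_impute(x, m=5, seed=0):
    """Return m completed copies of x, replacing each NaN with a draw
    from a normal distribution fit to the observed values; downstream
    analyses run on every copy, and the per-copy estimates are pooled."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    observed = x[~np.isnan(x)]
    missing = np.isnan(x)
    copies = []
    for _ in range(m):
        xc = x.copy()
        xc[missing] = rng.normal(observed.mean(), observed.std(), missing.sum())
        copies.append(xc)
    return copies

# pooled point estimate = mean of the per-copy estimates (Rubin's rules)
```

Unlike single (e.g., mean) imputation, repeating the draw m times preserves between-imputation variability, which is what allows standard errors to reflect the uncertainty due to the missing values.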

Conclusions
We identified 58 risk prediction models for HF, of which 11 (from 7 studies) were sufficiently detailed and validated to be considered overall low ROB according to PROBAST. The risk prediction models differed with regard to the patient population analyzed, statistical approach, and modeling applied, and confirming prognostic utility was challenging because the majority of models did not establish a base model. A number of distinct predictors were identified in multiple models, suggesting commonality in certain key variables when predicting risk in patients with HF. We feel there is room for improvement beyond what is currently offered in the literature as risk prediction tools for HF, particularly by HF type.