Development of a Novel Heart Failure Risk Tool: The Barcelona Bio-Heart Failure Risk Calculator (BCN Bio-HF Calculator)

Background A combination of clinical and routine laboratory data with biomarkers reflecting different pathophysiological pathways may help to refine risk stratification in heart failure (HF). A novel calculator (BCN Bio-HF calculator) was developed that incorporates N-terminal pro B-type natriuretic peptide (NT-proBNP, a marker of myocardial stretch), high-sensitivity cardiac troponin T (hs-cTnT, a marker of myocyte injury), and high-sensitivity soluble ST2 (ST2, reflective of myocardial fibrosis and remodeling).
Methods Model performance was evaluated using discrimination, calibration, and reclassification tools for 1-, 2-, and 3-year mortality. Ten-fold cross-validation with 1000 bootstrap samples was used.
Results The BCN Bio-HF calculator was derived from 864 consecutive outpatients (72% men) with mean age 68.2±12 years (73%/27% New York Heart Association (NYHA) class I-II/III-IV, left ventricular ejection fraction (LVEF) 36%, ischemic etiology 52.2%) followed for a median of 3.4 years (305 deaths). After an initial evaluation of 23 variables, eight independent models were developed. The variables included in these models were age, sex, NYHA functional class, LVEF, serum sodium, estimated glomerular filtration rate, hemoglobin, loop diuretic dose, β-blocker, angiotensin converting enzyme inhibitor/angiotensin-2 receptor blocker and statin treatments, and hs-cTnT, ST2, and NT-proBNP levels. The calculator can run with none, one, two, or all three biomarkers available. Adding biomarker data significantly changed the calculated risk of death. The average C-statistic in cross-validation analysis was 0.79.
Conclusions A new HF risk calculator that incorporates available biomarkers reflecting different pathophysiological pathways allowed better individual prediction of death at 1, 2, and 3 years.


Introduction
Risk stratification of heart failure (HF) is a challenge, and as guidelines acknowledge, new accurate scoring models are needed. Several models have been developed [1][2][3][4][5][6][7][8][9][10], of which the Seattle HF model [6] has had the most visibility. Nevertheless, this scoring model was derived from a cohort of patients carefully selected for a randomized clinical trial over 20 years ago, with the corresponding inclusion and exclusion criteria. Serum biomarkers for patient risk stratification were not available. However, in recent times a number of biomarkers reflective of different pathophysiological pathways have been identified in HF [11]. Therefore, we developed a calculator for HF stratification that, in addition to classical risk factors, includes N-terminal pro B-type natriuretic peptide (NT-proBNP), a marker of myocardial stretch; high-sensitivity cardiac troponin T (hs-cTnT), a marker of myocyte injury; and high-sensitivity soluble ST2 (ST2), which is reflective of myocardial fibrosis and remodeling.
The Barcelona Bio Heart Failure risk calculator (BCN Bio-HF calculator), which is derived from a real-life cohort of contemporary treated HF patients, is a web-based calculator allowing quick and easy interactive calculations of individual mortality at 1, 2, and 3 years and life expectancy.

Derivation Study Population
The study population, samples, and biomarker assays were described elsewhere [12]. In summary, ambulatory patients treated at a multidisciplinary HF unit were consecutively included in the study. Patients were referred to the unit by cardiology or internal medicine departments and, to a lesser extent, from the emergency or other hospital departments. The principal referral criteria were HF according to the European Society of Cardiology guidelines irrespective of etiology, at least one HF hospitalization, and/or reduced left ventricular ejection fraction (LVEF). Etiologies of HF were: ischemic heart disease 52.2%, dilated cardiomyopathy 10%, hypertensive 9.4%, alcoholic cardiomyopathy 5.7%, drug-related cardiomyopathy 2.5%, valvular disease 11.4%, and others 8.8%.
All participants provided written informed consent, and the local ethics committee approved the study. All study procedures were in accordance with the ethical standards outlined in the Helsinki Declaration of 1975 as revised in 1983. The regular visitation schedule was reported elsewhere [12][13][14][15]. Death from all causes was the main outcome. Fatal events were identified from clinical records, family contact or by reviewing the electronic clinical history at the Catalan and Spanish Institute of Health. Physicians and nurses of the HF Unit identified adverse events (JL, M de A, AU, BG, RC, LC).

Model Making
A selection of 23 well-known mortality-related variables from the literature and from our own previous studies was first evaluated. Eleven of them were included in eight Cox proportional hazards regression models, either because they were significant in the multivariable analysis or because they were considered clinically relevant: one model without biomarkers ('clinical model') and seven additional models with all possible combinations of the three biomarkers.
The proportional hazards assumptions required for Cox proportional hazards regression models were tested for all variables. Variables in which the non-linear component achieved significance were transformed according to what the plot of time vs. hazard suggested until non-significance of the non-linear component was achieved, as reported elsewhere [12,14,15]. In summary, to fulfill the assumption of linearity for the covariates hs-cTnT, ST2, and NT-proBNP, the logarithmic functions of both NT-proBNP and hs-cTnT, the quadratic term of the logarithmic function of hs-cTnT, and the quadratic term of ST2 were used in the Cox proportional hazards regression models. In the 'clinical model', variables were removed one by one in a backward manner to assess whether their exclusion significantly reduced the likelihood of the model. When two variables were collinear in predicting outcome, the one with the better likelihood was included. All two-variable interactions were also tested. Some variables, such as New York Heart Association (NYHA) functional class and left ventricular ejection fraction (LVEF), were dichotomized for better performance.
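The biomarker transformations named above can be sketched in a few lines of Python. This is only an illustration: the function and key names are hypothetical, natural logarithms are assumed (the base is not stated in the text), and whether a linear ST2 term accompanies the quadratic one is not specified here, so only the terms explicitly listed are returned.

```python
import math

def transform_biomarkers(nt_probnp, hs_ctnt, st2):
    """Return the transformed biomarker terms named in the text.

    Assumptions (not stated in the source): natural logarithms are used
    and all inputs are positive concentrations.
    """
    if min(nt_probnp, hs_ctnt, st2) <= 0:
        raise ValueError("biomarker concentrations must be positive")
    log_tnt = math.log(hs_ctnt)
    return {
        "log_nt_probnp": math.log(nt_probnp),  # log(NT-proBNP)
        "log_hs_ctnt": log_tnt,                # log(hs-cTnT)
        "log_hs_ctnt_sq": log_tnt ** 2,        # quadratic term of log(hs-cTnT)
        "st2_sq": st2 ** 2,                    # quadratic term of ST2
    }

# Biomarker values from the worked patient example in the Results.
terms = transform_biomarkers(nt_probnp=900.0, hs_ctnt=14.0, st2=40.0)
```

Each returned term would enter the Cox linear predictor multiplied by its own coefficient.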

Model Performance
We used different measures of performance to test the potential incremental prognostic value of the three biomarkers as follows:
Discrimination. The ability of the model to discriminate between patients who will and will not have the event over the entire follow-up was measured by means of the C-statistic obtained from a generalization of Somers' Dxy rank correlation, which equals 2 × (c − 1/2), where c is the concordance (discrimination) probability [16], and which already incorporates information from censored data.
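As an illustration of the discrimination measure, the following minimal Python sketch computes a concordance probability c for right-censored data in the usual Harrell fashion and converts it to Somers' Dxy through Dxy = 2(c − 1/2). The function names and the toy data are hypothetical, and this quadratic-time loop is a teaching sketch, not the software used in the study.

```python
def harrell_c(times, events, risk_scores):
    """Concordance probability c for right-censored data.

    A pair (i, j) is usable when subject i had the event strictly before
    subject j's observed time. The pair is concordant when subject i also
    has the higher predicted risk; ties in predicted risk count as 1/2.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable

def somers_dxy(c):
    """Somers' Dxy rank correlation from the concordance probability."""
    return 2.0 * (c - 0.5)

# Toy data: follow-up in years, event indicator (1 = death), predicted risk.
times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.5, 0.1]
c = harrell_c(times, events, risks)   # 4 of 5 usable pairs are concordant
dxy = somers_dxy(c)
```

In the toy data the third subject is censored, so it contributes only as the longer-surviving member of a pair, which is how censored observations still inform the estimate.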
Calibration. How well the observed incidence rate fit the predicted risk was measured by Nam-D'Agostino statistics using the Hosmer and Lemeshow test for censored survival [17]. Calibration using this method was calculated for one-, two-, and three-year mortality.
Accuracy. The integrated Brier score for censored observations was used to measure the accuracy of probabilistic predictions [18]. A lower score represents higher accuracy. This score takes values between 0 and 1 and was calculated for one-, two-, and three-year mortality.
Best prediction. The Bayesian information criterion (BIC) and the Akaike information criterion (AIC), measures of the relative goodness-of-fit of a statistical model, were used to compare non-nested models. Lower values indicate a better model. Both indicators take into account the events over the entire follow-up.
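For reference, both criteria follow from the model log-likelihood and the number of estimated parameters. The sketch below uses the textbook definitions; the log-likelihoods and parameter counts are invented for illustration, and using the number of deaths as the sample size in BIC is an assumption (a convention sometimes used for Cox models), not something stated in the text.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2.0 * n_params - 2.0 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

# Hypothetical (partial) log-likelihoods for two non-nested models.
N_EVENTS = 305  # deaths in the cohort, used here as n (an assumption)
model_a = {"loglik": -1900.0, "k": 11}
model_b = {"loglik": -1850.0, "k": 15}

aic_a = aic(model_a["loglik"], model_a["k"])
aic_b = aic(model_b["loglik"], model_b["k"])
bic_a = bic(model_a["loglik"], model_a["k"], N_EVENTS)
bic_b = bic(model_b["loglik"], model_b["k"], N_EVENTS)
# With these made-up numbers, model_b has the lower AIC and BIC.
```

BIC penalizes each additional parameter by ln(n) rather than 2, so it favors smaller models more strongly than AIC whenever n exceeds about 7.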
Reclassification. We used the method described by Pencina et al. [19]. Integrated discrimination improvement (IDI) considers changes in the estimated mortality prediction probabilities as a continuous variable. Net reclassification improvement (NRI) requires a previous definition of meaningful risk categories; we used tertiles for the risk of death: <18.5%, 18.5-41%, and >41%. NRI considers changes in the predicted probabilities of estimated mortality that imply a change from one category to another. Reclassification was evaluated for one-, two-, and three-year mortality.
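The two reclassification measures can be sketched as follows, using the risk cut-offs quoted above. This is a simplified illustration for a binary outcome at a fixed time point: it ignores censoring, the function names and the four-patient data set are hypothetical, and it assumes at least one death and one survivor.

```python
CUTS = (0.185, 0.41)  # tertile boundaries for risk of death quoted in the text

def category(p):
    """0 = <18.5%, 1 = 18.5-41%, 2 = >41%."""
    if p < CUTS[0]:
        return 0
    if p <= CUTS[1]:
        return 1
    return 2

def nri(p_old, p_new, died):
    """Net reclassification improvement (categorical)."""
    up_e = down_e = up_n = down_n = 0
    n_events = sum(died)
    n_nonevents = len(died) - n_events
    for po, pn, d in zip(p_old, p_new, died):
        move = category(pn) - category(po)
        if d:
            up_e += move > 0
            down_e += move < 0
        else:
            up_n += move > 0
            down_n += move < 0
    return (up_e - down_e) / n_events + (down_n - up_n) / n_nonevents

def idi(p_old, p_new, died):
    """Integrated discrimination improvement."""
    def mean(values):
        return sum(values) / len(values)
    ev_old = [po for po, d in zip(p_old, died) if d]
    ev_new = [pn for pn, d in zip(p_new, died) if d]
    ne_old = [po for po, d in zip(p_old, died) if not d]
    ne_new = [pn for pn, d in zip(p_new, died) if not d]
    return (mean(ev_new) - mean(ev_old)) - (mean(ne_new) - mean(ne_old))

# Hypothetical predictions from a model without and with biomarkers.
p_old = [0.30, 0.20, 0.45, 0.10]
p_new = [0.50, 0.15, 0.35, 0.10]
died = [1, 0, 0, 1]
nri_value = nri(p_old, p_new, died)  # one death moved up, two survivors moved down
idi_value = idi(p_old, p_new, died)
```

NRI only rewards changes that cross a category boundary, whereas IDI credits any shift in predicted probability in the right direction, which is why the two can disagree.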
Generalization or validation. To assess how the results of the models can be generalized to an independent data set, a 10-fold cross-validation technique was used [20]. Using a bootstrapping technique, we created 1000 samples (allowing repetition) equal in size to the present cohort. One by one, each of the 1000 samples was split into 10 distinct blocks of roughly equal size. We left out the first block (the testing set) and fit a model with the remaining blocks (the training set) to predict the held-out block. We continued this process until the model had predicted all 10 held-out blocks. The mean C-statistic was calculated and the process repeated for all 1000 samples.
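The resampling scheme described above can be outlined in Python as below. This is a generic sketch under stated assumptions: `fit` and `score` are hypothetical callables supplied by the user (in the study they would correspond to fitting a Cox model and computing a C-statistic), the score is averaged over the 10 held-out blocks and then over the bootstrap samples, which is one reading of the description, and the toy linear data only demonstrate that the loop runs.

```python
import random

def bootstrap_cv(data, fit, score, n_boot=1000, k=10, seed=0):
    """Mean k-fold cross-validated score over bootstrap resamples.

    data  : list of records (must contain at least k records)
    fit   : callable(train_records) -> predict(record)
    score : callable(test_records, predictions) -> float
    """
    rng = random.Random(seed)
    n = len(data)
    sample_means = []
    for _ in range(n_boot):
        # Resample with replacement to the size of the original cohort.
        sample = [data[rng.randrange(n)] for _ in range(n)]
        # Split the resample into k blocks of roughly equal size.
        folds = [sample[i::k] for i in range(k)]
        fold_scores = []
        for i, test in enumerate(folds):
            train = [r for j, fold in enumerate(folds) if j != i for r in fold]
            predict = fit(train)
            preds = [predict(r) for r in test]
            fold_scores.append(score(test, preds))
        sample_means.append(sum(fold_scores) / k)
    return sum(sample_means) / n_boot

# Toy usage: records are (x, y) with y = 2x, fitted by a slope through the origin.
def fit_slope(train):
    slope = sum(x * y for x, y in train) / sum(x * x for x, _ in train)
    return lambda record: slope * record[0]

def neg_mae(records, preds):
    return -sum(abs(y - p) for (_, y), p in zip(records, preds)) / len(records)

toy = [(float(x), 2.0 * x) for x in range(1, 31)]
result = bootstrap_cv(toy, fit_slope, neg_mae, n_boot=20, k=10, seed=1)
```

Because every prediction is made on a block the model did not see during fitting, the averaged score estimates performance on new data drawn from the same population rather than apparent performance on the training data.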

Calculator algorithms
Mortality. The calculator was designed to run with the availability of none, one, two, or three of the chosen biomarkers, using the best model for each available combination. To calculate the probability of developing an event at a specific time, the following formula was applied:
Life expectancy. To get an estimate of life expectancy we refitted all Cox proportional hazards regression models as parametric Weibull models [21]. In those models the mean survival, "E(T)", is estimated by: where μ is the estimated intercept obtained from the model
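The two expressions referred to above are not reproduced in this text. As a hedged reconstruction, the standard Cox prediction formula that matches the quantities listed in the Results (beta coefficients, survival at the mean of covariates, and the sum of the products of coefficients and covariate means) and the textbook mean of a Weibull accelerated-failure-time model would read as follows. The symbols $S_0(t)$, $\beta_i$, $x_i$, $\bar{x}_i$, $\gamma_i$, and $\sigma$ are notation assumed here, not taken from the source, and the authors' exact parameterization may differ.

```latex
% Probability of death by time t for a patient with covariates x,
% where S_0(t) is the survival at the mean of covariates:
P(\text{death by } t \mid x)
  = 1 - S_0(t)^{\exp\left(\sum_i \beta_i x_i - \sum_i \beta_i \bar{x}_i\right)}

% Mean survival in a Weibull model written as
% \log T = \mu + \sum_i \gamma_i x_i + \sigma W, with W standard extreme value:
E(T \mid x) = \exp\left(\mu + \sum_i \gamma_i x_i\right)\,\Gamma(1 + \sigma)
```

With all covariates at their reference values the second expression reduces to $E(T) = \exp(\mu)\,\Gamma(1+\sigma)$, where $\mu$ is the model intercept and $\sigma$ its scale parameter.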

Results
After an initial evaluation of 23 variables, eight models (one without biomarkers and seven with combinations of the three studied biomarkers) were finally included in the risk calculator tool. Table 1 provides the evaluated demographic, clinical, and biochemical characteristics of the studied patients with univariate and multivariate Cox regression analysis. During a median follow-up of 3.4 years (25th-75th percentiles 1.8-5.0 years), 305 deaths occurred. The follow-up for surviving patients was 4.4 years (25th-75th percentiles 2.7-5.2). The following variables emerged as significant in at least one of the models: age, sex, NYHA functional class, LVEF, estimated glomerular filtration rate (eGFR), serum sodium, hemoglobin, daily loop diuretic dose, beta-blocker, angiotensin converting enzyme inhibitor (ACEI)/angiotensin-2 receptor blocker (ARB), and statin treatment, and hs-cTnT, ST2, and NT-proBNP levels. No pair-wise interaction between variables achieved significance. Variables excluded because they did not statistically improve the models were: ischemic etiology of HF, diabetes mellitus, body mass index, systolic blood pressure, heart rate, atrial fibrillation, chronic obstructive pulmonary disease, hypertension, iron deficiency, cystatin-C, spironolactone/eplerenone treatment, cardiac resynchronization therapy (CRT), and implantable cardiac defibrillator (ICD).
In the 'clinical model' (Model 1), LVEF (HR 0.69, P = 0.016), eGFR (HR 0.99, P = 0.017), and ACEI/ARB treatment (HR 0.67, P = 0.014) were significant outcome predictors. In the other models, these variables often lost significance after the addition of biomarkers. In Model 8, in which the predictors were adjusted for the three biomarkers, age (HR 1, P<0.001) and NYHA functional class (HR 1.
To calculate the probability of developing an event at a specific time for a particular combination of covariates, the beta coefficients, the survival at the mean of covariates, and the sum of the products of the coefficients and the covariate means are needed. Survival at the mean of covariates was 94.2% at 1 year, 87.5% at 2 years, and 80.2% at 3 years. The remaining values are shown in Table S1 in File S1. When a covariate added no prognostic accuracy, it was not included in the risk calculation. An example of calculator functioning is shown in the appendix.
Table 2 shows the C-statistic for the 'clinical model' and all of the models containing biomarkers (alone or in combination) in the derivation sample. The model with the three biomarkers had a C-statistic of 0.794 (95% CI 0.77-0.817). Calibration for 1-, 2-, and 3-year mortality was good (non-significant in the Hosmer and Lemeshow test) (Fig. 1).
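A minimal Python sketch of the risk calculation from these ingredients is given below. It assumes the standard Cox prediction formula (baseline survival raised to the exponential of the centered linear predictor). The survival values at the mean of covariates are those quoted in the text, although which of the eight models they belong to is not stated here. The function name and the linear-predictor values in the usage lines are hypothetical, since the coefficients reside in Table S1.

```python
import math

# Survival at the mean of covariates at 1, 2, and 3 years (from the text).
S0_AT_MEAN = {1: 0.942, 2: 0.875, 3: 0.802}

def mortality_risk(lp_patient, lp_mean, years):
    """Probability of death by `years` (1, 2, or 3).

    lp_patient : sum of coefficient * covariate value for the patient
    lp_mean    : sum of coefficient * covariate mean in the cohort
    """
    s0 = S0_AT_MEAN[years]
    return 1.0 - s0 ** math.exp(lp_patient - lp_mean)

# A patient whose linear predictor equals the cohort mean has exactly the
# baseline risk, i.e. 1 - 0.942 at one year.
baseline_1y = mortality_risk(lp_patient=0.0, lp_mean=0.0, years=1)

# Hypothetical patient whose linear predictor is 0.7 above the mean.
higher_risk = [mortality_risk(0.7, 0.0, t) for t in (1, 2, 3)]
```

A linear predictor above the cohort mean raises the exponent above 1 and therefore lowers predicted survival at every time point.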
Reclassification for 1-, 2-, and 3-year mortality was better in the models containing more than one biomarker, with the highest found using the combination of ST2 and hs-cTnT (Model 7; Table 2). The best overall performance was observed with models 7 and 8 (Table 2).
A web-based calculator (Fig. 2) (www.BCNBioHFcalculator.cat) has been developed, allowing interactive calculation of estimated individual probability. A graphic with monthly mortality probabilities is also available. Risk of death was found to be largely influenced by biomarker results. As a practical example, a 68-year-old male in NYHA class III, with LVEF 30%, sodium 130 mmol/L, eGFR 45 ml/min/m2, and hemoglobin 12 g/dl, taking 60 mg of furosemide and on treatment with statins, ACEI, and beta-blockers, had a risk of death of 22%, 42%, and 60% at 1, 2, and 3 years, respectively. When adding the following biomarker levels: hs-cTnT 14 ng/L, ST2 40 ng/mL, and NT-proBNP 900 ng/L, the risk fell to 10%, 21%, and 32%. However, if biomarker data had been hs-cTnT 70 ng/L, ST2 140 ng/mL, and NT-proBNP 2500 ng/L, the risk would rise to 35%, 62%, and 80%, respectively (Fig. S1).
In the 10-fold cross-validation analysis with 1000 bootstrap samples, the average C-statistic for the model with all combined biomarkers was 0.79 (Fig. 3), suggesting that the results may be generalized safely to independent data sets.
Mean (95% confidence interval) life expectancy for the entire cohort was 11.8 years (11.1-12.4) and expected mean age (95% confidence interval) for death was 80.2 years (79.7-80.7) using the clinical model; and 11.4 years (10.7-12.0) and 79.8 years      Table S2 in File S1.

Discussion
HF risk prediction is a cornerstone of HF management. The development of an accurate HF risk calculator has the potential to enable tailored management. The BCN Bio-HF calculator reported here was derived from a real-life, contemporarily treated, consecutive cohort and includes, in addition to conventional prediction factors, three serum biomarkers (NT-proBNP, ST2, and hs-cTnT) that are highly accurate for cardiac malfunction.
Mortality risk prediction models specific to the HF population have been developed with broad variation in the degree of validation and in the specificity of prognostic output, from classification into risk groups (low-high risk, low-medium-high risk, risk deciles) [2-5], to life expectancy [6] or individual mortality at a certain time point [3,5-7,10]. Most of these models have not included a substantial proportion of patients taking evidence-based treatments, including ACEI/ARBs, beta-blockers, and spironolactone/eplerenone, or were developed only for patients admitted to the hospital [8,9]. In the Seattle HF Model, the relative effect of HF medications could not be obtained from the derivation cohort and benefits were estimated from published trials or meta-analyses. In our cohort of ambulatory patients, 87% were on beta-blockers, 90% on ACEI/ARBs, and 40% on spironolactone/eplerenone. Independent of the causality of the risks and benefits of treatments, taking evidence-based HF drugs clearly reduces the risk of death, and they merit inclusion in a risk calculator. In fact, the estimated risk can be very significantly modified by treatment, both in the model without biomarkers and in the model containing biomarkers, where treatment can also modify biomarker levels and thereby change the calculated risk of death.
Some scores [4-6] have the advantage of a large derivation cohort. Their limitation is that all of the subjects in their derivation and validation samples were participating in a clinical trial, and how well they represent those in routine clinical practice is unknown. In addition, all of the samples were obtained over a decade ago, and none included biomarker testing. The very recent 3C-HF score [7] did include contemporary treatment but not biomarkers in a mixed population of in- and outpatients included both prospectively and retrospectively. The even more recent MAGGIC score, derived from a meta-analysis of 30 studies [10], also did not include biomarkers. In the discussion, however, the authors of this score state: "Any new risk score's success depends on the patient variables available for inclusion. Current knowledge of biomarkers in HF is inevitably ahead of what data are available across multiple cohort studies… but could not be included in our model. In principle, its inclusion would enhance further the excellent prognostic discrimination we achieved with routinely collected long-established predictors." Adding B-type natriuretic peptide to the Seattle HF Model in the Valsartan Heart Failure Trial increased the AUC by approximately 0.03 [6]. In the BCN Bio-HF Calculator, we included three commercially available complementary biomarkers that provide information about myocyte necrosis (hs-cTnT), fibrosis, remodeling, and inflammation (ST2), and chamber strain (NT-proBNP). Other biomarkers are in the pipeline for the HF field, but some of them are not yet commercially available (e.g., growth differentiation factor-15) and others reflect pathways that overlap those used here. We and others previously reported on the prognostic utility of these three biomarkers [12,14,15]. This calculator was developed with eight models that include none, one, two, or all three biomarkers, allowing its use with any combination of biomarkers. This characteristic is unique to this new tool and, in combination with the state-of-the-art statistical assessment of biomarker value (C-statistic, calibration, and reclassification), makes it more robust.
The Seattle HF Model [6] is probably the most extensively used model. It was prospectively validated in several trials. The validation AUC varied from 0.68 to 0.81 in these diverse populations, with an overall AUC of 0.73 and an AUC of less than 0.70 in the three biggest cohorts [6]. AUCs from other studies range from 0.75 (CHARM two-year mortality [4]) to 0.83 in the validation cohort of the 3C-HF score (one-year mortality with logistic regression analysis [5]). The use of Somers' Dxy rank correlation in the C-statistic analysis, which already incorporates information from censored data, is more appropriate from a survival-analysis point of view than determining the C-statistic for death at one fixed time point with logistic regression models. Using logistic regression in our population, the C-statistic was 0.82 for 1-year, 0.82 for 2-year, and 0.83 for 3-year mortality. We evaluated both the Seattle HF Model and the 3C-HF score in our population. Taking into account the inherent limitations (default values of Seattle HF for percentage of lymphocytes as well as "diabetes" instead of "diabetes with organ damage" for 3C-HF), the C-statistics using Somers' Dxy rank correlation in such models were 0.71 (95% CI 0.678-0.79) for the Seattle HF model and 0.73 (95% CI 0.68-0.73) for the 3C-HF score.
The C-statistic obtained in our 10-fold cross-validation analysis with 1000 bootstrap samples was substantially higher, averaging 0.79. Cross-validation is useful, especially when additional samples are hazardous, costly, or impossible to collect. The resulting average accuracy is likely somewhat of an underestimate of the true accuracy when the model is trained on all data and tested on external data (the optimal way for validation), but in most cases this estimate is reliable, particularly if the amount of available data is sufficiently large and the external data follow the same distribution as the available data [22]. Both the Seattle HF Model and the BCN Bio-HF calculator provide the individual risk of death at several points of time without the necessity of a physician calculating the score as an intermediate step. Also, as an added value over other scores, both allow predicting life expectancy, although using different statistical methods.
A number of the clinical variables in our calculator are also included in the Seattle HF Model, though our calculator has fewer variables. Some variables that may be considered clinically important, such as devices, aldosterone blockers, and systolic blood pressure, were excluded from our model due to the absence of significance in the multivariable model. In the case of devices, particularly ICD and CRT, the lack of significance could be influenced by the limited number of patients with such devices. Remarkably, other variables such as blood pressure, ischemic etiology, and diabetes did not achieve statistical significance in the multivariate analysis, did not improve the model prediction of risk, and were not included in the calculator.
Recently, Ky et al. [23] showed that adding a more complex biomarker panel consisting of high-sensitivity C-reactive protein, myeloperoxidase, B-type natriuretic peptide, soluble fms-like tyrosine kinase receptor-1, troponin I, ST2, creatinine, and uric acid to the Seattle HF Model improves the predictive accuracy for 1-year all-cause death, with a C-statistic up to 0.8. In contrast, both of our clinical and biomarker additive models were less complex but performed similarly. Choosing the panel of biomarkers to deploy in clinical practice will depend on factors such as cost and ease of assay, among others.

Limitations
Our population was a general HF population treated at an HF unit in a tertiary hospital. Most patients were white and referred from the cardiology department and were thus relatively young men with HF of ischemic etiology and reduced LVEF. As such, risk prediction is more accurate in these patients. The risk calculator is based on ambulatory patients with chronic HF and may require extensive adjustments when applied to an inpatient population, some of whom have acute decompensated HF. Absence of external validation represents an acknowledged limitation, although we overcame it using 10-fold cross-validation analysis with 1000 bootstrap samples as already discussed.

Conclusion
We developed a new HF risk calculator that incorporates available biomarkers reflecting different pathophysiological pathways and allows quick and easy prediction of death at 1, 2, and 3 years.
Figure S1. Graphic with monthly mortality probabilities in the same patient (68-year-old male in NYHA class III, LVEF 30%, sodium 130 mmol/L, eGFR 45 ml/min/m2, hemoglobin 12 g/dl, taking 60 mg of furosemide and on treatment with statins, ACEI, and beta-blockers) according to model without biomarkers