Multimarker Proteomic Profiling for the Prediction of Cardiovascular Mortality in Patients with Chronic Heart Failure

Risk stratification of patients with systolic chronic heart failure (HF) is critical to better identify those who may benefit from invasive therapeutic strategies such as cardiac transplantation. Proteomics has been used to provide prognostic information in various diseases. Our aim was to investigate the potential value of plasma proteomic profiling for risk stratification in HF. Proteomic profiling using surface enhanced laser desorption ionization-time of flight-mass spectrometry was performed in a case/control discovery population of 198 patients with systolic HF (left ventricular ejection fraction <45%): 99 patients who died from a cardiovascular cause within 3 years and 99 patients alive at 3 years. Proteomic scores predicting cardiovascular death were developed using 3 regression methods: support vector machine, sparse partial least squares discriminant analysis, and lasso logistic regression. Forty-two ion m/z peaks were differentially intense between cases and controls in the discovery population and were used to develop proteomic scores. In the validation population, score levels were higher in patients who subsequently died within 3 years. Similar areas under the curve (0.66-0.68) were observed for the 3 methods. After adjustment for confounders, proteomic scores remained significantly associated with cardiovascular mortality. Use of the proteomic scores allowed a significant improvement in discrimination of HF patients as determined by the integrated discrimination improvement and net reclassification improvement indexes. In conclusion, proteomic analysis of plasma proteins may help to improve risk prediction in HF patients.


Introduction
In spite of recent therapeutic improvements, chronic heart failure (HF) remains a major public health problem [1,2] with a high rate of mortality [3]. Risk stratification is a critical issue in patients with systolic HF, since high-risk patients can then be considered for invasive strategies such as implantable assist devices and/or cardiac transplantation. Variables such as New York Heart Association (NYHA) class, left ventricular ejection fraction (LVEF), brain natriuretic peptide (BNP), or variables obtained during cardiopulmonary exercise testing (peak oxygen consumption (peak VO2)) have been associated with the outcome of HF patients [4,5,6,7]. In spite of these advances, risk stratification of HF patients needs to be further improved. Indeed, prognosis remains variable: some patients categorized as low risk experience early mortality and, conversely, some patients categorized as severe have an unexpectedly prolonged survival.
There is a need for novel prognostic markers that may help to better stratify the risk of major cardiac events in HF patients. Recently, a systematic review of 55 papers dealing with risk prediction model accuracy showed only moderate success in predicting mortality in HF patients using "classic" prognostic evaluation and emphasized the need for models using a systematic biology approach [8]. Due to their availability and non-invasive nature, circulating biomarkers are currently the subject of intense research in this area [9,10]. Surface enhanced laser desorption ionization-time of flight-mass spectrometry (SELDI-TOF-MS), a proteomic technology that combines chromatography on ProteinChip arrays with mass spectrometry, offers a high-throughput, non a priori strategy for the detection of differentially expressed biomarkers [11]. This may allow the development of a multimarker strategy for improving risk prediction in HF patients.
The aim of the present study was to investigate the potential value of plasma proteomic profiling for risk stratification in HF. Proteomic scores predictive of cardiovascular mortality were developed in a discovery population of chronic HF patients; their performance was then tested in a validation cohort and challenged against established prognostic indicators.

Methods
Population
All patients referred for evaluation of systolic HF (LVEF <45%) in our institution between November 1998 and May 2010 were included in a prospective cohort on prognostic indicators [6,12,13,14]. The study was approved by the ethics committee of the Centre Hospitalier de Lille (Lille, France) and complies with the Declaration of Helsinki. All patients gave written informed consent. All patients were ambulatory and clinically stable for at least 2 months, and received optimal medical therapy with maximal tolerated doses of angiotensin-converting enzyme inhibitors and beta-blockers. At inclusion, patients underwent a prognostic evaluation including BNP level assessment, echocardiography, and cardiopulmonary exercise testing as previously described [6,13]. In addition, patients underwent a coronary angiogram to help define the etiology of left ventricular (LV) systolic dysfunction as either ischemic or nonischemic. Follow-up was performed at 3 years to assess clinical outcome. All events were adjudicated by two investigators and by a third one in case of disagreement. Cardiovascular death included cardiovascular-related death, urgent transplantation (defined as United Network for Organ Sharing (UNOS) status 1), and urgent assist device implantation.
The flow chart of the present study is shown in Fig 1. For the discovery phase, we selected 198 patients included between November 1998 and December 2005. Ninety-nine patients who died from a cardiovascular cause within 3 years after the initial prognostic evaluation (cases) were individually matched for age, sex, and HF etiology with 99 patients who were still alive at 3 years (controls). For the validation phase, the proteomic analysis was repeated in a population of 344 consecutive patients included between January 2006 and May 2010.

Proteomic analyses of plasma samples
The detailed procedure is described in the supplemental methods (see S1 Methods). At inclusion, peripheral blood was collected in tubes containing EDTA and plasma samples were stored at -80°C. Prior to the proteomic study, plasma samples underwent no more than two freeze/thaw cycles. One mL of each plasma sample was treated with the ProteoMiner protein enrichment kit (Bio-Rad Laboratories, Hercules, CA, USA) as previously described [15,16]. This combinatorial peptide ligand library (CPLL) method has been shown to be reproducible and to allow access to proteolytic fragments [11].
Proteomic analyses were performed on CPLL-treated plasma samples in both populations using the SELDI-TOF-MS technique. CPLL-treated plasma samples were profiled with eight-spot format CM10 (Weak Cation Exchanger) and H50 (Reverse Phase) ProteinChip arrays (Bio-Rad Laboratories). To obtain ion m/z peak intensities, all arrays were read in an automated PBS 4000 SELDI-TOF-MS (Bio-Rad Laboratories) as previously described [11]. Samples were analyzed in duplicate and randomly distributed on arrays. Representative mass spectra are displayed in S1 Fig.

Statistical analyses
All statistical analyses were performed using the R statistical package, version 3.0. Continuous variables are presented as mean ± standard deviation (SD) and were compared using Student's t-test. Categorical variables are expressed as absolute numbers and/or percentages and were compared using the χ2 test or Fisher's exact test as appropriate. Single imputation was used for missing clinical and proteomic data. In the discovery set, missing proteomic data (1 patient) were imputed with the median of the corresponding ion m/z peak intensity. In the validation set, 9 peak VO2 values and 2 BNP values were imputed with their respective medians.
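The single median imputation described above can be illustrated with a minimal sketch. It is written in Python rather than the R environment used in the study, the function name `impute_median` is hypothetical, and the peak VO2 values are invented for illustration; they are not study data.

```python
from statistics import median


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a variable with no observed values")
    fill = median(observed)
    return [fill if v is None else v for v in values]


# Invented peak VO2 values (mL/kg/min) with one missing measurement.
peak_vo2 = [14.0, None, 18.5, 12.0, 16.0]
imputed = impute_median(peak_vo2)
print(imputed)
```

Because every missing entry receives the same value, this approach preserves the median of the variable but slightly reduces its variance, which is usually acceptable when very few values are missing, as was the case here.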
Proteomic variables were standardized before further analyses by subtracting the mean and then dividing by the SD, so as to have a mean of 0 and an SD of 1. The detailed analysis is described in the supplemental methods (see S2 Methods). The mean intensity of each ion m/z peak was compared between cases and controls with a Bonferroni correction to account for multiple testing. Three different statistical regression methods were applied to the selected ion m/z peaks in the discovery set to calculate proteomic scores predictive of cardiovascular mortality: the support vector machine (SVM), the sparse partial least squares discriminant analysis (sPLS-DA), and a lasso logistic regression (LASSO). We used the following R packages: "kernlab" (version 0.9-19) for SVM [17], "spls" (version 2.2-1) for sPLS-DA [18], and "glmnet" (version 1.9-5) for LASSO [19]. The same 3 models were applied in the validation cohort to compute the predicted probabilities of cardiovascular death. Receiver operating characteristic (ROC) curve analysis was used to display the performance of the proteomic scores. Multivariate logistic regressions relating cardiovascular mortality to the proteomic scores were performed to calculate odds ratios (OR) and corresponding 95% confidence intervals (CI). Covariables included in the final logistic regression models were: age, sex, HF etiology, diabetes, creatinine, NYHA class, BNP, LVEF and peak VO2. Finally, the incremental value of the proteomic scores to predict the cardiovascular mortality risk, when added to models with established prognostic indicators, was estimated with the continuous net reclassification improvement (NRI) and the integrated discrimination improvement (IDI).

Results
Table 1 shows the baseline characteristics of the patients included in the discovery population. The patients who died from a cardiovascular cause during the 3-year follow-up showed significantly higher NYHA class, BNP level, and creatinine level, and lower peak VO2, as compared to patients alive at 3 years. A total of 203 different ion m/z peaks were detected and analyzed by SELDI-TOF-MS in the CPLL-treated plasma of these patients (S1 Table). The 42 ion m/z peaks found to be differentially intense after Bonferroni correction (highlighted in blue in S1 Table) were used to build the proteomic scores. Sixteen of these peaks were highly correlated (correlation coefficients >0.9; S2 Table), which required feature selection during model construction. As shown in Fig 2A, the values of the proteomic scores obtained with the 3 statistical methods were consistently and significantly higher in cases as compared to controls. The ROC curves are shown in Fig 2B. High and similar area under the curve (AUC) values (0.86-0.88) were observed. The 3 proteomic scores were highly correlated (S3 Table).
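The peak selection reported for the discovery population (a Bonferroni-corrected comparison across the 203 detected peaks, followed by a check for peaks correlated above 0.9) can be sketched as follows. This is an illustrative Python sketch, not the authors' R code: the family-wise significance level of 0.05 is an assumption (the text does not state it), and the peak names, p values and intensities are invented.

```python
import math


def bonferroni_select(p_values, alpha=0.05):
    """Return the Bonferroni threshold and the names of peaks passing it.

    alpha = 0.05 is an assumed family-wise level, not taken from the study.
    """
    threshold = alpha / len(p_values)
    selected = sorted(name for name, p in p_values.items() if p < threshold)
    return threshold, selected


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def highly_correlated_pairs(peaks, cutoff=0.9):
    """List pairs of peaks whose correlation coefficient exceeds the cutoff."""
    names = sorted(peaks)
    return [(a, b)
            for i, a in enumerate(names)
            for b in names[i + 1:]
            if pearson(peaks[a], peaks[b]) > cutoff]


# With 203 tested peaks, the per-peak threshold is 0.05 / 203.
threshold = 0.05 / 203

# Invented p values for three hypothetical peaks.
_, selected = bonferroni_select({"peak_a": 1e-6, "peak_b": 1e-3, "peak_c": 0.2},
                                alpha=3 * threshold)

# Invented intensities for three hypothetical peaks.
peaks = {
    "peak_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "peak_b": [2.1, 3.9, 6.2, 8.0, 9.9],
    "peak_c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
pairs = highly_correlated_pairs(peaks)
print(threshold, selected, pairs)
```

In the toy call, `alpha` is scaled so that the three invented p values are judged against the same per-peak threshold that 203 tests would give (about 0.00025). Strongly correlated peaks carry largely redundant information, which is why sparse methods such as LASSO or sPLS-DA are a natural fit for this kind of data.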
The proteomic scores were then tested in the validation population. Patients with non-cardiovascular death and patients with non-urgent heart transplantation (n = 35) were excluded from the analysis. The remaining 266 patients with no event were compared to the 43 patients with cardiovascular death during the 3-year follow-up period. As shown in Table 2, patients who died from a cardiovascular cause more often had ischemic HF, higher NYHA class and BNP, and lower LVEF and peak VO2. The values of the 3 proteomic scores were significantly higher in patients who died (Fig 3A). ROC curves are shown in Fig 3B; similar AUC values (0.66-0.68) were observed for the 3 methods. After adjustment for confounders, the proteomic scores remained significantly associated with cardiovascular mortality. Other variables kept in the models were NYHA class and peak VO2. Finally, continuous NRI and IDI demonstrated that the proteomic scores calculated with the three methods significantly improved the prediction of cardiovascular death in HF patients (Table 3).
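For readers less familiar with the performance measures used here, the sketch below computes the AUC, the continuous NRI (with its event and non-event components) and the IDI from predicted probabilities. It follows the commonly used definitions of these indexes and is only an illustration in Python; the study itself used R, the function names are hypothetical, and the probabilities are invented rather than taken from Table 3.

```python
def auc(scores, events):
    """AUC as the probability that an event outranks a non-event (ties count 0.5)."""
    pos = [s for s, e in zip(scores, events) if e == 1]
    neg = [s for s, e in zip(scores, events) if e == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def continuous_nri(p_old, p_new, events):
    """Continuous NRI: any upward or downward change in predicted risk counts.

    Returns (NRI.event, NRI.non-event, total NRI).
    """
    up_e = down_e = up_n = down_n = 0
    for old, new, e in zip(p_old, p_new, events):
        if new > old:
            up_e, up_n = up_e + (e == 1), up_n + (e == 0)
        elif new < old:
            down_e, down_n = down_e + (e == 1), down_n + (e == 0)
    n_events = sum(events)
    n_nonevents = len(events) - n_events
    nri_event = (up_e - down_e) / n_events
    nri_nonevent = (down_n - up_n) / n_nonevents
    return nri_event, nri_nonevent, nri_event + nri_nonevent


def idi(p_old, p_new, events):
    """IDI: gain in mean predicted risk among events minus gain among non-events."""
    def mean_diff(flag):
        diffs = [new - old for old, new, e in zip(p_old, p_new, events) if e == flag]
        return sum(diffs) / len(diffs)
    return mean_diff(1) - mean_diff(0)


# Invented predicted probabilities: 1 = cardiovascular death, 0 = alive.
events = [1, 1, 0, 0, 0]
p_old = [0.30, 0.40, 0.20, 0.35, 0.10]   # model with established indicators only
p_new = [0.50, 0.35, 0.10, 0.30, 0.15]   # same model plus a proteomic score

print(auc(p_old, events), auc(p_new, events))
print(continuous_nri(p_old, p_new, events))
print(idi(p_old, p_new, events))
```

A positive NRI.event means that adding the score tends to raise the predicted risk of patients who died, and a positive NRI.non-event means that it tends to lower the predicted risk of patients who remained alive.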

Discussion
Recent international guidelines have emphasized that the assessment of prognosis is an important step in the management of chronic systolic HF, particularly when counselling patients about devices and cardiac transplantation [20]. The aim of our study was to demonstrate that a multimarker strategy could be used for risk prediction in these patients. For that purpose we used an "unbiased" proteomic technique for plasma profiling. The SELDI-TOF-MS technique has previously provided meaningful prognostic information in various diseases such as cancer [21,22]. For instance, Belluco et al. [23] reported a profile combining 7 ion m/z peaks that yielded a sensitive and specific diagnostic procedure to discriminate women with stage 1 breast cancer from women without breast cancer. To the best of our knowledge, the present study is the first demonstration that a proteomic approach may improve risk stratification in HF patients. We paid close attention to the phenotyping of the study populations. It should be underlined that the patients were well treated regarding their HF status, with more than 90% receiving ACE inhibitors and beta-blockers, reflecting modern practice in chronic HF. In addition, all the main predictors of cardiovascular mortality (age, sex, etiology of HF, NYHA class, LVEF, BNP, creatinine and peak VO2), previously identified in the literature [4,5,6,7], were assessed with few missing values.
The proteomic scores were built using a plasma proteomic biomarker classifier based on 42 ion m/z peaks that were differentially intense in the discovery cohort; this strategy of using discriminatory patterns for disease detection has already been successful in previous studies [24]. To select a good biomarker panel and to minimize the effects of overfitting [25], we used 3 different statistical regression methods to set up proteomic scores. Our data in the validation population show that the proteomic scores provide information that is independent from the currently "classic" prognostic markers (NYHA class, LVEF, BNP, peak VO2). Incremental improvement in model performance with the proteomic scores was also demonstrated.
Our results thus suggest that a multimarker strategy based on plasma proteomics has the potential to be clinically useful for risk stratification in HF patients. In practice, however, it should be acknowledged that the SELDI-TOF-MS technique has some limitations, including difficulties in ion m/z peak identification and/or the low rate of new analytes going from discovery to accurate measurement by specific assays in routine clinical practice. In addition, pre-analytic treatment of the samples is time-consuming and this technology will not be commercially available in the coming years. Nevertheless, it should also be emphasized that emerging proteomic technologies may provide easier access to the global information contained in the patient plasma proteome in the near future [26,27]. These technologies, associated with advances in the field of computational biology employing artificial neural networks to analyze complex changes in multiple biomarkers simultaneously, may potentially modify the prognostic evaluation of HF patients [28].

Supporting Information
S1 Table. List of ion m/z peaks detected in the discovery population. Each ion m/z peak detected is named using the initial "p" followed by its m/z value, the type of array on which it was detected (H50 or CM10) and then the laser intensity: low mass (LM) or high mass (HM). Ion m/z peak intensities are expressed as mean ± SD. The 42 ion m/z peaks that reach a significant p value after Bonferroni correction in the discovery population are highlighted in blue and were used to calculate the proteomic scores. (DOC)
S2 Table. Pearson correlation matrix of the 42 ion m/z peaks used to build the proteomic scores. Each ion peak detected is named using the initial "p" followed by its m/z value, the type of array on which it was detected (H50 or CM10) and then the laser intensity: low mass (LM) or high mass (HM). (DOC)
S3 Table. Pearson correlation matrix of the proteomic scores in the discovery population. The proteomic scores were developed using the support vector machine (SVM), the sparse partial least squares discriminant analysis (sPLS-DA) and the lasso logistic regression (LASSO). (DOC)
S1 Fig. Representative mass spectra. The upper mass spectrum was obtained from the H50 ProteinChip array using low-mass (LM) parameter settings and the lower mass spectrum was obtained from the CM10 ProteinChip array using high-mass (HM) parameter settings. (TIF)
and NRI.non-event (corresponding to the capacity of the proteomic score to reclassify those alive patients).