A Random Forest Based Risk Model for Reliable and Accurate Prediction of Receipt of Transfusion in Patients Undergoing Percutaneous Coronary Intervention

  • Hitinder S. Gurm,

    Affiliation Department of Internal Medicine, Division of Cardiovascular Medicine, University of Michigan, Ann Arbor, Michigan, United States of America

  • Judith Kooiman,

    Affiliation Department of Thrombosis and Hemostasis and Department of Nephrology, Leiden University Medical Center, Leiden, The Netherlands

  • Thomas LaLonde,

    Affiliation Department of Internal Medicine, St John Providence Health System, Detroit, Michigan, United States of America

  • Cindy Grines,

    Affiliation Department of Internal Medicine, Detroit Medical Center, Detroit, Michigan, United States of America

  • David Share,

    Affiliation Blue Cross Blue Shield of Michigan, Detroit, Michigan, United States of America

  • Milan Seth

    Affiliation Department of Internal Medicine, Division of Cardiovascular Medicine, University of Michigan, Ann Arbor, Michigan, United States of America



Abstract

Background

Transfusion is a common complication of Percutaneous Coronary Intervention (PCI) and is associated with adverse short and long term outcomes. There is no risk model for identifying patients most likely to receive transfusion after PCI. The objective of our study was to develop and validate a tool for predicting receipt of blood transfusion in patients undergoing contemporary PCI.


Methods

Random forest models were developed utilizing 45 pre-procedural clinical and laboratory variables to estimate the receipt of transfusion in patients undergoing PCI. The most influential variables were selected for inclusion in an abbreviated model. Model performance in estimating transfusion was evaluated in an independent validation dataset using area under the ROC curve (AUC), with net reclassification improvement (NRI) used to compare full and reduced model prediction after grouping into low, intermediate, and high risk categories. The impact of procedural anticoagulation on observed versus predicted transfusion rates was assessed for the different risk categories.


Results

Our study cohort comprised 103,294 PCI procedures performed at 46 hospitals in Michigan between July 2009 and December 2012, of which 72,328 (70%) were randomly selected for training the models and 30,966 (30%) for validation. The models demonstrated excellent calibration and discrimination (full model AUC = 0.888 (95% CI, 0.877–0.899), reduced model AUC = 0.880 (95% CI, 0.868–0.892), p for difference 0.003, NRI = 2.77%, p = 0.007). Procedural anticoagulation and radial access significantly influenced transfusion rates in the intermediate and high risk patients, but no clinically relevant impact was noted in low risk patients, who made up 70% of the total cohort.


Conclusions

The risk of transfusion among patients undergoing PCI can be reliably calculated using a novel, easy-to-use computational tool. This risk prediction algorithm may prove useful for both bedside clinical decision making and risk adjustment for assessment of quality.


Introduction

Bleeding and transfusion after PCI have been associated with increased morbidity, short term and long term mortality and increased health care cost[1], [2], [3], [4]. Although there is considerable debate on the causal versus casual nature of the relation between bleeding and mortality, there is general consensus that bleeding is a negative outcome following PCI and is best avoided[5]. Assessment of transfusion after PCI as a quality measure is more complex since blood transfusion is clearly necessary in some patients, and occasionally may even be life-saving, whereas it may be avoidable in others[2]. Transfusion has been associated with several adverse outcomes and is associated with worsened short and long term survival in patients with acute coronary syndrome and following coronary revascularization[6], [7], [8]. There is increasing evidence that restrictive blood transfusion policies may be beneficial in patients with cardiac disease and there is increasing focus on transfusion as a quality improvement objective.

Considerable variation in transfusion rates has been identified across institutions, and it remains unclear whether this is driven by variations in case mix and/or practice[9]. The lack of a validated model to predict the likelihood of transfusion is an impediment to benchmarking and guiding quality improvement. Further, such a model, if available, could help individualize care and guide therapeutic strategies to reduce transfusion in patients who are most at risk.

The widespread use of computers in medical care has opened up the possibility of bed side application of more complex tools that leverage developments in statistical science, and facilitate use of algorithms that cannot be easily converted into risk scores[10], [11]. We have recently reported on such a tool for prediction of contrast induced nephropathy in patients undergoing PCI[12].

The goal of our work was to use a similar approach to develop a highly accurate model for prediction of transfusion using pre-procedural variables that are routinely collected in patients undergoing PCI, while retaining the advantages of bed side applicability. Further, we evaluated the impact of bleeding avoidance strategies on observed transfusion rates based on predicted transfusion risk.


Methods

We developed and validated the transfusion model using data from the Blue Cross Blue Shield of Michigan Cardiovascular Consortium (BMC2), a quality improvement collaborative that tracks the inpatient outcomes of consecutive patients undergoing PCI at all non-federal hospitals in the State of Michigan. The details of the BMC2 and its data collection and auditing process have been described previously[13], [14]. The BMC2 registry is a clinical registry that tracks the outcome of all consecutive patients undergoing PCI at the participating institutions. Procedural data are collected using standardized data collection forms. Baseline data include clinical, demographic, procedural, and angiographic characteristics as well as medications used before, during, and after the procedure, and in-hospital outcomes. All data elements have been prospectively defined. In addition to a random audit of 2% of all cases, medical records of all patients undergoing multiple procedures or coronary artery bypass grafting (CABG) and of patients who died in the hospital are reviewed routinely to ensure data accuracy. The audit has revealed a data accuracy of over 95% for the study population.

The BMC2 registry and waiver of patient consent have been either approved by, or the need for approval waived by, the IRB at each of the participating hospitals. The University of Michigan has waived the need for IRB approval on all analyses that are performed using BMC2 data. The need for consent has been waived since all data are anonymous and no patient identifiers are collected.

The study population for this analysis included all consecutive patients who underwent PCI between July 2009 and December 2012. Patients who underwent coronary artery bypass grafting during the same hospitalization were excluded from the analysis since a post-operative transfusion could not be distinguished from a post-PCI transfusion. The choice of vascular access, procedural anticoagulation, and decision to transfuse were left to operator preference, guided by institutional policy and practice.

Study endpoints

The primary endpoint for our study was blood transfusion. Transfusion was defined as transfusion of packed red cells or whole blood after the PCI procedure but prior to hospital discharge irrespective of the total number of units transfused. Baseline hemoglobin was collected within a month of the procedure. Among patients who had multiple assessments of hemoglobin, the value closest to the time of the procedure was considered as the baseline value.

Model development

The model was developed using a random forest method as previously described[12]. The study cohort was divided randomly into training and validation datasets, with 70% of procedures assigned to training and the remaining 30% utilized for validation. A random forest regression model was trained for predicting transfusion using 45 baseline clinical variables including pre-procedural medications, with missing predictors imputed to be the overall median for continuous variables and the mode for categorical variables. Details of random forest methods have been described elsewhere[15]. Briefly, random forest is an ensemble classification method that determines a consensus prediction for each observation by averaging the results of many individual recursive partitioning tree models. Each of the individual trees is fitted to a randomly selected subset of the observations and utilizes a random subset of the available predictors at each node as candidates for splitting. Random forests have been shown to have good predictive value, are generally robust to over-fitting and missing data, and are particularly suited for evaluating a large number of possible predictors and exploiting potential interactions between predictors and their relationship with the outcome[16]. The transfusion outcome was entered as a continuous variable, coded as 1 in patients who were transfused and 0 in those who were not, to facilitate regression rather than classification modeling, so that the estimated means (leaf node probabilities of transfusion) assigned to a given observation were aggregated in the ensemble. To facilitate the development of an easy-to-use bedside tool, a reduced model was also trained using only the fourteen most important predictors, as assessed in the full model by the incremental decrease in node impurity (residual sum of squares) associated with splitting on the predictor, averaged over all trees in the ensemble.
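The data preparation steps described above (median or mode imputation followed by a random 70/30 split) can be sketched as follows. This is a minimal illustration in Python using only the standard library; the study itself used R, and the column names, toy values, seed, and the choice to impute before splitting are assumptions made for the example, not details taken from the study.

```python
import random
from statistics import median, mode


def impute(rows, continuous, categorical):
    """Return copies of rows with missing values (None) replaced by the
    overall median (continuous columns) or mode (categorical columns)."""
    filled = [dict(r) for r in rows]
    for col in continuous:
        fill = median(r[col] for r in rows if r[col] is not None)
        for r in filled:
            if r[col] is None:
                r[col] = fill
    for col in categorical:
        fill = mode(r[col] for r in rows if r[col] is not None)
        for r in filled:
            if r[col] is None:
                r[col] = fill
    return filled


def split_train_validation(rows, train_fraction=0.70, seed=0):
    """Randomly assign rows to a training set and a validation set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical toy records; names and values are illustrative only.
hemoglobin = [13.1, None, 9.8, 14.2, 12.0, 11.5, None, 15.0, 10.4, 13.6]
sex = ["M", "F", "F", "M", "M", None, "M", "F", "M", "M"]
records = [{"hemoglobin": h, "sex": s} for h, s in zip(hemoglobin, sex)]

complete = impute(records, continuous=["hemoglobin"], categorical=["sex"])
train, valid = split_train_validation(complete)
print(len(train), len(valid))
```

The imputed records would then be passed to a random forest implementation; fitting the forest itself is outside the scope of this sketch.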

Model validation

The full and reduced models were evaluated in terms of discrimination and calibration in the validation dataset through evaluation of the area under the ROC curve (AUC), and by graphical examination of observed versus predicted transfusion rates after grouping observations by predicted risk (<1%, 1–2%, 2–3%, 3–5%, 5–10%, 10–15%, 15–20%, 20–30%, 30–40%, and >40%). The net reclassification index was used to compare full and reduced model performance, after classifying the predicted risk as low, medium and high; p-values and confidence intervals were obtained through bootstrapping[12], [17], [18]. Random forest estimates for observations in the validation dataset were scaled so that the overall predicted transfusion rate for the validation sample matched the overall transfusion rate observed in the training dataset.
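The validation quantities described above can be illustrated with a short Python sketch: the AUC computed as the proportion of event/non-event pairs ranked correctly, a rescaling of predictions to a target overall rate, and a table of observed versus predicted rates by risk group. The multiplicative form of the rescaling, the handling of values that fall exactly on a group boundary, and the toy data are assumptions; the paper does not specify them, and the analysis itself was done in R.

```python
from bisect import bisect_right

# Upper edges of the predicted-risk groups (<1%, 1-2%, ..., 30-40%, >40%).
BIN_EDGES = [0.01, 0.02, 0.03, 0.05, 0.10, 0.15, 0.20, 0.30, 0.40]


def auc(scores, outcomes):
    """AUC as the fraction of (event, non-event) pairs in which the event
    has the higher score; ties count one half. Quadratic time, so this is
    only suitable for small illustrative inputs."""
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    nonevents = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = 0.0
    for e in events:
        for n in nonevents:
            if e > n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(events) * len(nonevents))


def rescale(preds, target_rate):
    """Scale predictions so their mean equals target_rate
    (assumed here to be a simple multiplicative adjustment)."""
    factor = target_rate / (sum(preds) / len(preds))
    return [p * factor for p in preds]


def calibration_table(preds, outcomes):
    """Map group index -> (n, mean predicted risk, observed event rate)."""
    groups = {}
    for p, y in zip(preds, outcomes):
        b = bisect_right(BIN_EDGES, p)
        n, sum_p, sum_y = groups.get(b, (0, 0.0, 0))
        groups[b] = (n + 1, sum_p + p, sum_y + y)
    return {b: (n, sum_p / n, sum_y / n)
            for b, (n, sum_p, sum_y) in sorted(groups.items())}


# Hypothetical predictions and outcomes (1 = transfused).
preds = [0.005, 0.008, 0.02, 0.04, 0.12, 0.35, 0.50, 0.015]
outcomes = [0, 0, 0, 1, 0, 1, 1, 0]

model_auc = auc(preds, outcomes)
scaled = rescale(preds, target_rate=0.03)
table = calibration_table(preds, outcomes)
print(round(model_auc, 3))
```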

The potential application of the tool for guiding individualized decision making was assessed by comparing the predicted and observed transfusion rates in the low, medium, and high risk categories in patients treated with heparin alone, heparin and platelet glycoprotein IIbIIIa inhibitor (GPI) and bivalirudin. For patients in each risk category, the unadjusted number needed to treat (NNT) with bivalirudin compared to GPI in order to prevent one transfusion was estimated as the inverse of the absolute difference in observed transfusion rates. A similar calculation was made for use of radial versus femoral access.
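As a worked illustration of the NNT calculation just described, the sketch below takes the inverse of the absolute difference between two observed transfusion rates. The counts are hypothetical and are not taken from Table 6.

```python
def number_needed_to_treat(rate_comparator, rate_treatment):
    """Unadjusted NNT: inverse of the absolute risk reduction."""
    absolute_reduction = rate_comparator - rate_treatment
    if absolute_reduction <= 0:
        raise ValueError("treatment rate must be lower than comparator rate")
    return 1.0 / absolute_reduction


# Hypothetical counts within one risk category (illustrative only).
rate_gpi = 60 / 500          # transfusion rate with GPI
rate_bivalirudin = 35 / 500  # transfusion rate with bivalirudin

nnt = number_needed_to_treat(rate_gpi, rate_bivalirudin)
print(round(nnt))
```

Because the rates are unadjusted, an NNT computed this way reflects differences in the patients who received each strategy as well as any effect of the strategy itself.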

All analyses were performed in R version 2.14.1 using freely distributed contributed packages[19], [20].


Results

Our study cohort comprised 103,294 (99%) of 104,408 procedures performed across Michigan between July 2009 and December 2012. We excluded 1,047 (1%) patients because they underwent CABG during the same hospitalization (n = 1,018) or because post-procedural CABG data were not available (n = 29), and 67 patients for whom post-procedural transfusion data were not available.

The training dataset consisted of 72,328 PCI procedures of which 2156 (3.0%) were accompanied by transfusion, and the validation dataset of 30,966 procedures of which 922 (3.0%) were followed by a transfusion. All baseline variables presented in Table 1 were included in the full random forest model. The training and validation datasets were similar in terms of baseline covariates (Table 2). The variables with the largest model determined importance are listed in Table 3, and Table 4 provides their distribution in training dataset patients both with and without transfusion. This set of predictors was used to fit the reduced random forest model that is available for use at

Table 2. Characteristics of patients in the training and the validation cohort.

Table 3. Patient/procedural characteristics selected for reduced model.

Table 4. Distribution of abbreviated model covariates by transfusion status in the training dataset.
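The cohort counts reported in this section can be cross-checked with simple arithmetic. The short Python sketch below uses only numbers stated in the text; the variable names are illustrative.

```python
# Counts as reported in the text.
total_procedures = 104_408
excluded_cabg = 1_018                 # CABG during the same hospitalization
excluded_cabg_unknown = 29            # post-procedural CABG data unavailable
excluded_transfusion_unknown = 67     # post-procedural transfusion data unavailable

cohort = (total_procedures - excluded_cabg
          - excluded_cabg_unknown - excluded_transfusion_unknown)

train_n, train_events = 72_328, 2_156
valid_n, valid_events = 30_966, 922

train_share = train_n / cohort
train_rate = train_events / train_n
valid_rate = valid_events / valid_n

print(cohort, round(100 * train_share, 1),
      round(100 * train_rate, 1), round(100 * valid_rate, 1))
```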

When evaluated in the validation dataset, both models provided good discrimination for transfusion, with the full model having a small but statistically significant advantage in AUC (full model AUC: 0.888 [95% CI, 0.877–0.899]; reduced model AUC: 0.880 [95% CI, 0.868–0.892]; p for difference = 0.003). Both models were well calibrated (Figure 1), with good concordance between observed and predicted transfusion rates.

Figure 1. Calibration plot depicting observed transfusion across predicted risk using the full and the abbreviated model.

The full and reduced model predictions were grouped into low risk (<1%), intermediate risk (1–5%), and high risk (>5%) categories, and the number of patients along with the observed transfusion rate in each group is presented in Table 5. The patients in the highest risk category comprised one sixth of the total population but received over 75% of the transfusions. The net reclassification improvement statistic for the full model relative to the reduced model for these categories was small but statistically significant (NRI: 2.77% [0.62–5.06%], p = 0.007).
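For readers unfamiliar with the statistic, the categorical NRI can be sketched in Python as follows, using the risk cut-points given above and the standard definition: net upward movement among patients with the event plus net downward movement among patients without it. The toy predictions are hypothetical, and assigning a predicted risk of exactly 5% to the intermediate category is an assumption.

```python
def risk_category(p):
    """0 = low (<1%), 1 = intermediate (1-5%), 2 = high (>5%)."""
    if p < 0.01:
        return 0
    if p <= 0.05:
        return 1
    return 2


def net_reclassification_improvement(new_preds, old_preds, outcomes):
    """Categorical NRI of the new model relative to the old model."""
    up_event = down_event = up_nonevent = down_nonevent = 0
    n_event = n_nonevent = 0
    for new, old, y in zip(new_preds, old_preds, outcomes):
        move = risk_category(new) - risk_category(old)
        if y == 1:
            n_event += 1
            up_event += move > 0
            down_event += move < 0
        else:
            n_nonevent += 1
            up_nonevent += move > 0
            down_nonevent += move < 0
    return ((up_event - down_event) / n_event
            + (down_nonevent - up_nonevent) / n_nonevent)


# Hypothetical predictions from a "full" and a "reduced" model.
full = [0.08, 0.03, 0.006, 0.02, 0.07, 0.004]
reduced = [0.04, 0.03, 0.02, 0.004, 0.07, 0.004]
outcomes = [1, 1, 0, 0, 1, 0]

nri = net_reclassification_improvement(full, reduced, outcomes)
print(round(nri, 3))
```

In the study, p-values and confidence intervals for this statistic were obtained by bootstrapping, which the sketch does not attempt.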

Figure 2 depicts the observed transfusion rates across the three predicted risk categories in patients treated with heparin only, bivalirudin, and GPI (with heparin). The use of GPI was associated with the highest transfusion rates, while bivalirudin was associated with the lowest transfusion rates overall. In the lowest risk category, however, the transfusion rates for all three anticoagulant strategies were very small, so that the absolute differences were not clinically meaningful (<0.5%). The highest risk group, by contrast, demonstrated the greatest absolute difference in bleeding (>5%), so that only 19 patients would need to be treated with bivalirudin instead of GPI to prevent one transfusion (Table 6). Figure 3 provides transfusion rates by risk category for patients with radial and femoral vascular access. When the impact of access site on transfusion was considered, the greatest benefit of radial access was seen in patients in the highest risk category, although a lower transfusion rate was observed with radial access in all risk groups. The number needed to treat with a radial versus femoral approach to prevent one transfusion was 18 for the highest risk category and 244 for the lowest risk cohort.

Figure 2. The observed transfusion rates across the three predicted risk categories in patients treated with heparin only, bivalirudin, and platelet glycoprotein IIbIIIa inhibitor (with heparin) are depicted in panel A.

Panel B depicts the total number of patients treated with each anticoagulation strategy across the three transfusion risk groups.

Figure 3. The observed transfusion rates across the three predicted risk categories in patients treated with femoral versus radial access are depicted in panel A.

Panel B depicts the total number of patients treated with the two access routes across the three transfusion risk groups. There is an inverse association between predicted transfusion risk and access route with radial access being more commonly used in the low risk patients.

Table 6. Projected numbers needed to treat (NNT) to prevent one transfusion across categories of predicted risk.


Discussion

The key finding of our study is that the risk of transfusion in patients undergoing PCI can be reliably estimated using standard clinical and laboratory variables that are routinely collected in this population. Secondly, this tool helps identify patient subgroups that are at higher or lower risk of needing a transfusion and can therefore guide appropriate choice of pharmacotherapy or vascular access in a cost effective fashion. The robust discrimination and calibration of this method, combined with the ease of use for simplified bedside prediction, makes this model an easy tool for routine clinical practice.

Risk stratification models have been advocated for two broad usages: patient level decision making (for guiding informed consent, and therapeutic decision-making) and risk adjustment for assessment of quality of care. Our model has several advantages that make it especially suited for these purposes.

First, to the best of our knowledge, this is the only model to predict transfusion that has been developed and validated on a contemporary patient population. Secondly, the model has very high discrimination, which should improve the reliability and accuracy of risk estimates. While there are no contemporary models to predict transfusion, the NCDR bleeding prediction model is perhaps the closest in its clinical application[21]. The modest discrimination of that model (C statistic of 0.72) raises concerns about misclassification when it is applied for individualized decision making. Thirdly, our model should be generalizable to routine clinical practice since it was developed and validated on all consecutive patients treated in Michigan and reflects contemporary practice across multiple institutions and operators. The model is based only on pre-procedure variables, and thus can be used for risk stratification prior to the procedure. This can help facilitate better informed consent as well as consideration of alternate therapeutic strategies that would minimize the risk of bleeding and the resultant need for transfusion.

Unlike traditional risk scores, our model requires a computer for calculation and cannot be converted into a bedside arithmetic risk score. While the need to favor simplicity over accuracy might have been reasonable in the past, these considerations should no longer be relevant in the era of widespread use of smart devices and electronic medical records. We developed two different models, with the full model providing slightly greater discrimination than the abbreviated model. In the ideal world, models like ours would be embedded in the electronic medical record and would be an integral part of the clinical workflow, providing physicians and patients with accurate risk estimates. All the predictors in this model are routinely ascertained and are embedded in the templates that are used for the documentation of the initial history and physical assessment of a patient being evaluated for PCI. Therefore, real-time automatic risk estimation is feasible and hopefully will be adopted in the near future by the vendors of electronic medical record systems.

We envision multiple applications of this model. The BMC2 consortium is using this model for calculating risk adjusted transfusion rates for physicians and operators and this will guide quality improvement efforts. Initial application of this approach has identified institutions where the rate of transfusion is significantly greater than expected, and these hospitals have initiated focused efforts geared towards reducing transfusion. Secondly, the model can be used to personalize the consent process, with the patient provided with their personalized risk estimate rather than the standard average risk of bleeding. Thirdly, the model helps identify the 16% of patients who are most likely to need transfusion and are thus the ideal subset for use of strategies that have been proven to reduce the risk of transfusion, such as bivalirudin or use of the radial approach. Conversely, the model helps identify the large subset of patients who are at extremely low risk and in whom the use of such therapies may not be that beneficial or cost effective. The use of the model to target therapies like bivalirudin (with demonstrated reduction in transfusion but increased expense relative to heparin) to the highest risk patients, while avoiding it in the low risk patients, has the potential to reduce both cost and complications and should be evaluated in future studies. Use of the NCDR bleeding model in this fashion has been recently demonstrated to be associated with clinically meaningful reductions in bleeding and transfusion, and it is likely that the use of our model with its greater accuracy would enhance those benefits[22]. In our exploratory analysis, we demonstrate that the absolute benefit of bleeding avoidance therapies is dependent on both the baseline bleeding risk and the type of therapy used. As expected, radial access is the most effective approach towards preventing transfusion, with a number needed to treat of 19 among patients at high risk, while the benefit is less impressive and of uncertain clinical significance in patients at low risk of bleeding. This is also evident in the comparison of heparin and bivalirudin in low risk patients, where the absolute difference in events is too small to be clinically meaningful, and many institutions may not be able to prevent one transfusion in a year even if they treated all their low risk patients with bivalirudin instead of heparin. It is expected that as clinicians use this tool in practice, other uses will emerge that will lead to further optimization of patient care, as well as modification and refinement of the prediction tool.

Like most observational studies, our study findings must be evaluated with certain caveats. We developed a model to predict transfusion, which is distinct from bleeding. While both bleeding and transfusion can be considered negative outcomes following PCI, transfusion, unlike bleeding, is occasionally necessary and cannot be considered a never event. Secondly, while bleeding that does not require transfusion is associated with adverse long term outcomes, it is unclear if the relationship is causal. On the other hand transfusion, if not needed, is best avoided both due to its negative health impact, and the associated cost. A model for predicting transfusion thus can help guide quality improvement as well as guide clinical practice. Furthermore, the decision to transfuse in our population was clinically driven and may vary from physician to physician and across institutions. However, we believe this makes our model more generalizable to routine clinical care since it reflects findings from contemporary practice across the entire patient population undergoing PCI in Michigan.


Conclusions

We have developed a simple tool for accurately predicting risk of transfusion among patients undergoing PCI. This risk prediction algorithm may prove useful for both bed side clinical decision making and risk adjustment for assessment of quality.

Author Contributions

Conceived and designed the experiments: HG MS. Wrote the paper: HG. Performed the statistical analysis: MS. Interpretation of the data: HG JK. Contributed patient data: TL CG. Critical review of the manuscript: JK TL CG DS.


References

  1. Ndrepepa G, Berger PB, Mehilli J, Seyfarth M, Neumann FJ, et al. (2008) Periprocedural bleeding and 1-year outcome after percutaneous coronary interventions: appropriateness of including bleeding as a component of a quadruple end point. J Am Coll Cardiol 51: 690–697.
  2. Doyle BJ, Rihal CS, Gastineau DA, Holmes DR Jr (2009) Bleeding, blood transfusion, and increased mortality after percutaneous coronary intervention: implications for contemporary practice. J Am Coll Cardiol 53: 2019–2027.
  3. Feit F, Voeltz MD, Attubato MJ, Lincoff AM, Chew DP, et al. (2007) Predictors and impact of major hemorrhage on mortality following percutaneous coronary intervention from the REPLACE-2 Trial. Am J Cardiol 100: 1364–1369.
  4. Jani SM, Smith DE, Share D, Kline-Rogers E, Khanal S, et al. (2007) Blood transfusion and in-hospital outcomes in anemic patients with myocardial infarction undergoing percutaneous coronary intervention. Clin Cardiol 30: II49–56.
  5. Steg PG, Huber K, Andreotti F, Arnesen H, Atar D, et al. (2011) Bleeding in acute coronary syndromes and percutaneous coronary interventions: position paper by the Working Group on Thrombosis of the European Society of Cardiology. Eur Heart J 32: 1854–1864.
  6. Shishehbor MH, Madhwal S, Rajagopal V, Hsu A, Kelly P, et al. (2009) Impact of blood transfusion on short- and long-term mortality in patients with ST-segment elevation myocardial infarction. JACC Cardiovasc Interv 2: 46–53.
  7. Rao SV, Jollis JG, Harrington RA, Granger CB, Newby LK, et al. (2004) Relationship of blood transfusion and clinical outcomes in patients with acute coronary syndromes. JAMA 292: 1555–1562.
  8. Koch CG, Li L, Duncan AI, Mihaljevic T, Loop FD, et al. (2006) Transfusion in coronary artery bypass grafting is associated with reduced long-term survival. Ann Thorac Surg 81: 1650–1657.
  9. Sherwood MW, Wang Y, Curtis JP, Peterson ED, Rao SV (2012) Patterns of Red Blood Cell Transfusion Use in Patients Undergoing Percutaneous Coronary Intervention in Contemporary Clinical Practice. Circulation 126: A9286.
  10. Chia CC, Rubinfeld I, Scirica BM, McMillan S, Gurm HS, et al. (2012) Looking beyond historical patient outcomes to improve clinical models. Sci Transl Med 4: 131ra149.
  11. Pencina MJ, D'Agostino RB (2012) Thoroughly Modern Risk Prediction? Sci Transl Med 4: 131fs110.
  12. Gurm HS, Seth M, Kooiman J, Share D (2013) A novel tool for reliable and accurate prediction of renal complications in patients undergoing percutaneous coronary intervention. J Am Coll Cardiol 61: 2242–2248.
  13. Moscucci M, Rogers EK, Montoye C, Smith DE, Share D, et al. (2006) Association of a continuous quality improvement initiative with practice and outcome variations of contemporary percutaneous coronary interventions. Circulation 113: 814–822.
  14. Gurm HS, Smith DE, Collins JS, Share D, Riba A, et al. (2008) The relative safety and efficacy of abciximab and eptifibatide in patients undergoing primary percutaneous coronary intervention: insights from a large regional registry of contemporary percutaneous coronary intervention. J Am Coll Cardiol 51: 529–535.
  15. Touw WG, Bayjanov JR, Overmars L, Backus L, Boekhorst J, et al. (2012) Data mining in the Life Sciences with Random Forest: a walk in the park or lost in the jungle? Briefings in Bioinformatics.
  16. Breiman L (2001) Random Forests. Machine Learning 45: 5–32.
  17. Pencina MJ, D'Agostino RB Sr, D'Agostino RB Jr, Vasan RS (2008) Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Stat Med 27: 157–172; discussion 207–212.
  18. Cook NR (2008) Comments on 'Evaluating the added predictive ability of a new marker: From area under the ROC curve to reclassification and beyond' by M. J. Pencina et al. Stat Med 27: 191–195.
  19. Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, et al. (2011) pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 12: 77.
  20. Liaw A, Wiener M (2002) Classification and Regression by randomForest. R News 2: 18–22.
  21. Mehta SK, Frutkin AD, Lindsey JB, House JA, Spertus JA, et al. (2009) Bleeding in patients undergoing percutaneous coronary intervention: the development of a clinical risk algorithm from the National Cardiovascular Data Registry. Circ Cardiovasc Interv 2: 222–229.
  22. Rao SC, Chhatriwalla AK, Kennedy KF, Decker CJ, Gialde E, et al. (2013) Pre-Procedural Estimate of Individualized Bleeding Risk Impacts Physicians' Utilization of Bivalirudin During Percutaneous Coronary Intervention. J Am Coll Cardiol 61: 1847–1852.