
MIASurviveMTP: Machine learning for immediate assessment and survival prediction after massive transfusion protocol

  • Michael D. Cobler-Lichter,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Visualization, Writing – original draft, Writing – review & editing

    mdc232@miami.edu

    Affiliation Divisions of Trauma, Surgical Critical Care & Burns, Daughtry Family Department of Surgery, University of Miami Miller School of Medicine and Jackson Memorial Hospital Ryder Trauma Center, Miami, Florida, United States of America

  • Jessica M. Delamater,

    Roles Data curation, Writing – review & editing

    Affiliation Divisions of Trauma, Surgical Critical Care & Burns, Daughtry Family Department of Surgery, University of Miami Miller School of Medicine and Jackson Memorial Hospital Ryder Trauma Center, Miami, Florida, United States of America

  • Brianna L. Collie,

    Roles Data curation, Methodology, Writing – review & editing

    Affiliation Divisions of Trauma, Surgical Critical Care & Burns, Daughtry Family Department of Surgery, University of Miami Miller School of Medicine and Jackson Memorial Hospital Ryder Trauma Center, Miami, Florida, United States of America

  • Nicole B. Lyons,

    Roles Data curation, Writing – review & editing

    Affiliation Divisions of Trauma, Surgical Critical Care & Burns, Daughtry Family Department of Surgery, University of Miami Miller School of Medicine and Jackson Memorial Hospital Ryder Trauma Center, Miami, Florida, United States of America

  • Luciana Tito Bustillos,

    Roles Data curation, Writing – review & editing

    Affiliation Divisions of Trauma, Surgical Critical Care & Burns, Daughtry Family Department of Surgery, University of Miami Miller School of Medicine and Jackson Memorial Hospital Ryder Trauma Center, Miami, Florida, United States of America

  • Nicholas Namias,

    Roles Conceptualization, Data curation, Methodology, Resources, Supervision, Writing – review & editing

    Affiliation Divisions of Trauma, Surgical Critical Care & Burns, Daughtry Family Department of Surgery, University of Miami Miller School of Medicine and Jackson Memorial Hospital Ryder Trauma Center, Miami, Florida, United States of America

  • Brandon M. Parker,

    Roles Conceptualization, Methodology, Supervision, Writing – review & editing

    Affiliation Divisions of Trauma, Surgical Critical Care & Burns, Daughtry Family Department of Surgery, University of Miami Miller School of Medicine and Jackson Memorial Hospital Ryder Trauma Center, Miami, Florida, United States of America

  • Jonathan P. Meizoso,

    Roles Conceptualization, Investigation, Methodology, Supervision, Writing – original draft, Writing – review & editing

    Affiliation Divisions of Trauma, Surgical Critical Care & Burns, Daughtry Family Department of Surgery, University of Miami Miller School of Medicine and Jackson Memorial Hospital Ryder Trauma Center, Miami, Florida, United States of America

  • Kenneth G. Proctor

    Roles Resources, Supervision, Writing – original draft, Writing – review & editing

    Affiliation Divisions of Trauma, Surgical Critical Care & Burns, Daughtry Family Department of Surgery, University of Miami Miller School of Medicine and Jackson Memorial Hospital Ryder Trauma Center, Miami, Florida, United States of America

Abstract

Early triage of trauma patients requiring massive transfusion (MT) may help to marshal appropriate resources and improve treatment and outcome. Artificial intelligence (AI) and machine learning (ML) offer theoretical advantages compared to conventional prediction algorithms but have not been thoroughly evaluated in this population. We hypothesized that AI/ML techniques incorporating all available data in a patient’s medical record could achieve similar, if not higher, performance in the prediction of mortality in MT patients as compared to existing models. Patients from the American College of Surgeons Trauma Quality Improvement Project database (TQIP) were retrospectively reviewed. Those receiving ≥ 5 units of red blood cells and/or whole blood within the first four hours of arrival were defined as MT patients. Those receiving ≥10 units were identified as ultramassive transfusion (UMT) patients. ML models were created to predict 6-hour mortality using variables available at different time points, including patient arrival. Of 5,481,046 patients in TQIP from 2017 to 2021, 47,744 received MT and 20,337 of these received UMT. Using only variables available on arrival, MT AUROC was 0.901 [95% CI 0.895–0.910] which increased to 0.943 [95% CI 0.938–0.948] with addition of 4-hour variables. For UMT, arrival AUROC was 0.858 [95% CI 0.846–0.872] and increased to 0.922 [95% CI 0.914–0.931] at 4 hours. ML models reliably predict mortality in both MT and UMT patients. These are the only ML models trained on MT and UMT patients. Future work can focus on prospective implementation of these models with potential direct integration into the electronic medical record. Real-time utilization of comprehensive patient data may enhance clinical decision-making regarding which patients should continue receiving massive transfusion, thus optimizing the allocation of this limited resource.

Introduction

Massive transfusion (MT), defined as transfusion of ≥10 units of whole blood (WB) or packed red blood cells (pRBCs) within 24 hours, or ≥5 units within 4 hours, is required for approximately 3–5% of trauma patients [1]. With approximately 2.6 million trauma admissions per year in the US, there are roughly 100,000 MTs for trauma each year, accounting for 70% of all blood consumed at trauma centers [2,3]. Though standardization of Massive Transfusion Protocols (MTP) has led to reductions in mortality, postoperative complications, and total volume of product transfused, these patients still face high mortality rates [4–8]. To make matters worse, the number of people donating blood has declined by 40% in the last two decades [9]. Thus, judicious use of this limited resource is an important consideration, especially in resource-limited settings.

Both the Assessment of Blood Consumption (ABC) score and the Massive Transfusion Score have been applied to mortality prediction after MT but have achieved only 60–70% discrimination by area under the receiver operator curve (AUROC), likely because these scores, along with virtually every other MT-related scoring system, were designed to predict who will require MTP, not who will survive once it has been initiated [10–12]. In current practice, the decision to initiate an MTP is typically made within minutes of patient arrival, often based on initial vital signs and clinical gestalt, with transfusion beginning almost immediately once activated. However, decisions about continuing transfusion and prognostication beyond the initial phase are more complex and rely on evolving physiologic parameters over the ensuing hours.

Artificial Intelligence (AI) and Machine Learning (ML) are rapidly growing fields that excel in their ability to integrate large amounts of data quickly to make real-time predictions. The ability of these models to use substantially more information than traditional techniques offers a unique advantage now that virtually all patient records are maintained electronically, making these data readily accessible. ML algorithms have been developed to predict post-trauma complications such as surgical site infection, venous thromboembolism, and mortality, as well as outcomes in many other surgical and non-surgical populations [13–15]. These methods have also been refined to predict mortality in Critical Administration Threshold-positive (CAT+) patients, those requiring ≥3 units of pRBCs in the first hour. However, the only model reported in the literature is based on <2,000 CAT+ patients, not true MT patients, and its primary focus was prediction of the need for MT [16].

To fill this gap, we aimed to utilize AI and ML techniques to predict survival in trauma patients undergoing MT using a large national dataset. We hypothesized that ML models trained on MT and ultramassive transfusion (UMT) patients could achieve similar, if not higher, performance compared to pre-existing models for CAT+ patients, and that these predictions could be improved further by incorporating additional clinical information that becomes available as the resuscitation progresses. These evidence-based mortality prediction tools have the potential to guide more informed blood allocation, especially in resource-limited settings where transfusion resources may be scarce.

Methods

Per recommendations of the EQUATOR network, the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD+AI) guidelines were followed, and the checklist is available in S1 Appendix [17]. This study was exempt from institutional review board approval because it used a de-identified national database.

Data source

The American College of Surgeons Trauma Quality Improvement Program (TQIP) dataset from 2017–2021 was retrospectively reviewed. Patients who underwent any transfusion were identified using International Statistical Classification of Diseases and Related Health Problems Version 10 (ICD-10) codes. From this cohort, those who underwent MT, defined as ≥5 units of pRBCs and/or WB within the first 4 hours of resuscitation, and UMT, defined as ≥10 units of pRBCs and/or WB within the first 4 hours, were selected. 6-hour mortality was derived by combining the mortality variable with the time-to-discharge variable: patients who died with a time to discharge ≤6 hours were classified as mortalities within 6 hours. Those with missing mortality data were excluded.
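As a minimal sketch of this outcome derivation (the column names below are hypothetical stand-ins, not the actual TQIP field names):

```python
import pandas as pd
import numpy as np

def derive_six_hour_mortality(df):
    """Flag deaths within 6 hours of arrival and drop records with
    missing mortality status. Column names are illustrative."""
    out = df.copy()
    # Died in hospital AND time to discharge (i.e., death) <= 6 hours
    out["MORT_6HR"] = ((out["DECEASED"] == 1) &
                       (out["HOURS_TO_DISCHARGE"] <= 6)).astype(int)
    # Exclude records with missing mortality data
    return out[out["DECEASED"].notna()]

# Toy data: died at 4h, died at 30h, survived, missing mortality status
demo = pd.DataFrame({
    "DECEASED": [1, 1, 0, np.nan],
    "HOURS_TO_DISCHARGE": [4.0, 30.0, 5.0, 2.0],
})
result = derive_six_hour_mortality(demo)
```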

Data preprocessing

Initial exploratory data analysis was performed using IBM SPSS Statistics version 28 (International Business Machines Corp, Armonk, New York). ML model development was done using Google Colab servers running Python 3.10.12 (Python Software Foundation, Beaverton, OR) with the following packages: scikit-learn version 1.5.2, tensorflow version 2.9.1, DeepTables version 0.2.6, XGBoost version 2.0.2, SHAP version 0.46.0 [18–23]. All code is available in S2 Appendix.

Of all available variables in TQIP, only those that would have been available on arrival were included in the initial model, termed the arrival model, to simulate the information available at MTP initiation. Additional 4-hour models were developed separately that also included variables available after the first 4 hours of resuscitation, such as 4-hour transfusion volume, to refine the initial prediction made by the arrival model. In addition to the standard variables available in TQIP, multiple other variables were derived from available data, including body mass index, total number of patients transfused at a given facility, mortality rate at a given facility, and average volume of blood products per patient administered at a given facility. Patients (n = 88) from facilities with a mortality rate of 0, or with 0 or 1 total deaths, had their facility-level data censored and treated as missing to prevent perfect separation of the data on this single variable. Abbreviated Injury Scale (AIS) codes were processed to derive 32 additional binary variables corresponding to the presence or absence of injury in clinically relevant body regions/organs (S3 Appendix) for inclusion in the 4-hour models. See S4 Appendix for an exhaustive list of variables contained in each model.
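The facility-level derivations and the censoring rule might be sketched as follows (again with hypothetical column names standing in for TQIP fields):

```python
import pandas as pd
import numpy as np

def add_facility_features(df):
    """Derive facility-level aggregates and censor facilities with
    0 deaths, 1 death, or a mortality rate of 0. Columns hypothetical."""
    df = df.copy()
    grp = df.groupby("FACILITY_ID")
    df["FAC_N_TRANSFUSED"] = grp["PATIENT_ID"].transform("count").astype(float)
    df["FAC_MORT_RATE"] = grp["MORT_6HR"].transform("mean")
    df["FAC_AVG_UNITS"] = grp["UNITS_4HR"].transform("mean")
    # Censor to prevent perfect separation on facility-level variables
    fac_deaths = grp["MORT_6HR"].transform("sum")
    mask = (fac_deaths <= 1) | (df["FAC_MORT_RATE"] == 0)
    df.loc[mask, ["FAC_N_TRANSFUSED", "FAC_MORT_RATE", "FAC_AVG_UNITS"]] = np.nan
    return df

# Facility A: 3 patients, 2 deaths (kept); facility B: 2 patients, 0 deaths (censored)
demo = pd.DataFrame({
    "FACILITY_ID": ["A", "A", "A", "B", "B"],
    "PATIENT_ID": [1, 2, 3, 4, 5],
    "MORT_6HR": [1, 1, 0, 0, 0],
    "UNITS_4HR": [6, 12, 8, 5, 7],
})
feat = add_facility_features(demo)
```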

All categorical variables were one-hot encoded (a method of transforming a categorical variable with n levels into n binary variables that are more easily handled by predictive modeling). Any variable with >50% missing data was excluded. For categorical data with <50% but >0% missing data, missing values were replaced with the most common value across the data set for each variable (mode imputation). For continuous data, the missing values were replaced with the median value of the variable across the dataset (median imputation), a commonly described technique for handling missing data [24]. Random imputation was also tested for key continuous variables, using values drawn from a synthetic distribution matched to the variable’s mean and variance, to ensure model performance was not overly dependent on patterns of missingness. To assess the impact of the missingness threshold used for variable exclusion, we conducted sensitivity analyses using five thresholds (10%, 20%, 33%, 50%, and 66%), with empiric selection of the best-performing threshold overall for the final model threshold (see S5 Appendix for sensitivity analysis results). Continuous variables were normalized by subtraction of the mean of each variable and scaling to unit variance. All time-related variables are measured from patient arrival to the trauma bay.

ML model development

Given the potential for these predictive models to influence decisions on continuation or stoppage of MT in critically ill patients, we chose an early mortality timepoint (6-hour) as our outcome. 6-hour mortality was chosen as the endpoint as early mortality in patients requiring MT is most reflective of hemorrhage-related death, the outcome most directly targeted by MT protocols, and is a commonly used outcome in hemorrhage-control literature. Each later timepoint would have progressively higher mortality rates and would be more influenced by specific in-hospital treatments compared to the initial resuscitation, inflating the potential error in these models and therefore the potential harm.

The models tested included logistic regression (with both LASSO and Ridge regularization), gradient-boosted decision trees (GBDT), random forest, k-nearest neighbor, and artificial neural networks. Each model was optimized against the AUROC. The data were randomly split into training (80%) and testing (20%) sets. The training set was again split into true training (80%) and calibration (20%) sets for model calibration. Grid search with fivefold cross validation was used to determine optimal hyperparameters (tested combinations of hyperparameters and final specifications available in S2 Appendix). Models were calibrated after fitting and hyperparameter optimization.
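The splitting, grid-search, and calibration workflow might look roughly like this sketch on synthetic data. The paper's actual hyperparameter grids and final specifications are in S2 Appendix, so the estimator and grid below are placeholders, and Platt-style recalibration on the held-out set is one common choice rather than necessarily the study's method:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in for the feature matrix and 6-hour mortality labels
X = rng.normal(size=(600, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=600) > 0).astype(int)

# 80/20 train/test split, then 80/20 true-train/calibration split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
X_fit, X_cal, y_fit, y_cal = train_test_split(X_tr, y_tr, test_size=0.2,
                                              random_state=0)

# Grid search with fivefold cross-validation, optimizing AUROC
grid = GridSearchCV(GradientBoostingClassifier(random_state=0),
                    {"n_estimators": [50, 100], "max_depth": [2, 3]},
                    scoring="roc_auc", cv=5)
grid.fit(X_fit, y_fit)
best = grid.best_estimator_

# Recalibrate on the held-out calibration set (Platt scaling)
cal_scores = best.predict_proba(X_cal)[:, 1].reshape(-1, 1)
platt = LogisticRegression().fit(cal_scores, y_cal)
test_probs = platt.predict_proba(
    best.predict_proba(X_te)[:, 1].reshape(-1, 1))[:, 1]
test_auc = roc_auc_score(y_te, test_probs)
```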

Two different model sets were developed: the MT models, trained on any patient who received ≥5 units of pRBCs and/or WB in the first four hours of resuscitation, and the UMT models, trained on patients who received ≥10 units in the first four hours.

Model evaluation

The AUROC with 95% confidence intervals (CI) was calculated using bootstrapping with 1,000 iterations and served as the primary performance metric for each model. The model with the highest AUROC was chosen as the best-performing model for each time interval (arrival and 4-hour) and was compared against a more traditional regression-based approach (LASSO) using paired bootstrap testing with 2,000 iterations. Secondary performance metrics included sensitivity, specificity, positive predictive value, area under the precision-recall curve (a measure of a model’s ability to correctly identify the positive class across classification thresholds, which may better reflect performance on imbalanced data with low prevalence), F1 score (the harmonic mean of positive predictive value and sensitivity), and the Brier score (mean squared difference between predicted probabilities and actual outcomes). Secondary metrics that depend on the chosen positive classification threshold (sensitivity, specificity, positive predictive value, negative predictive value, and F1 score) were reported at two thresholds: the threshold that maximizes the F1 score and the threshold that achieves 90% specificity.
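A percentile-bootstrap AUROC confidence interval of the kind described can be sketched as follows (synthetic predictions; 1,000 iterations as in the Methods):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auroc_ci(y_true, y_prob, n_boot=1000, seed=0):
    """Point estimate and percentile-bootstrap 95% CI for the AUROC."""
    rng = np.random.default_rng(seed)
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))
        if len(np.unique(y_true[idx])) < 2:  # resample must contain both classes
            continue
        stats.append(roc_auc_score(y_true[idx], y_prob[idx]))
    lo, hi = np.percentile(stats, [2.5, 97.5])
    return roc_auc_score(y_true, y_prob), lo, hi

# Toy predictions with partial class separation
rng = np.random.default_rng(1)
y = rng.integers(0, 2, 500)
p = np.clip(0.35 * y + rng.normal(0.35, 0.2, 500), 0, 1)
auc, lo, hi = bootstrap_auroc_ci(y, p)
```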

Models were calibrated after fitting. Calibration was assessed with reliability diagrams, in which predictions are grouped into deciles of predicted risk and the mean predicted probability of each decile is compared to the observed proportion of positive outcomes. Plotting these points against the diagonal (perfect calibration) line indicates whether the model under- or overestimates risk. Agreement between predicted probabilities and observed event rates signifies successful calibration and was examined in accordance with best-practice guidelines for the development of ML-based clinical prediction models [25]. 95% CIs were generated by bootstrap resampling of the dataset with 1,000 iterations.
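The decile-based reliability data underlying such a diagram can be computed as in this sketch (plotting omitted; the toy outcomes are drawn from their own predicted probabilities, so the example is perfectly calibrated by construction):

```python
import numpy as np

def reliability_table(y_true, y_prob, n_bins=10):
    """Group predictions into equal-count deciles of predicted risk and
    return (mean predicted probability, observed event rate) per decile,
    for plotting against the diagonal perfect-calibration line."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    order = np.argsort(y_prob)
    return [(y_prob[b].mean(), y_true[b].mean())
            for b in np.array_split(order, n_bins)]

# Perfectly calibrated toy model: outcomes sampled from predicted risks
rng = np.random.default_rng(0)
p = rng.random(20000)
y = (rng.random(20000) < p).astype(int)
table = reliability_table(y, p)
```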

Similarly, the potential clinical utility of the predictive model was evaluated using decision curve analysis across a range of threshold probabilities. Higher net benefit implies that using the model at that threshold would yield better clinical outcomes than either of two baseline strategies: treating all (assuming everyone died) or treating none (assuming no one died), accounting for the trade-off between true positives and false positives [26,27].
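Net benefit at threshold t follows the standard decision-curve formula NB(t) = TP/n - (FP/n) * t/(1 - t); a sketch with the two baseline strategies on synthetic, well-separated predictions:

```python
import numpy as np

def net_benefit(y_true, y_prob, threshold):
    """Net benefit of 'treating' patients with predicted risk >= threshold,
    weighting false positives by the threshold odds t/(1 - t)."""
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    n = len(y_true)
    treat = y_prob >= threshold
    tp = np.sum(treat & (y_true == 1))
    fp = np.sum(treat & (y_true == 0))
    return tp / n - (fp / n) * threshold / (1 - threshold)

def decision_curve(y_true, y_prob, thresholds):
    """Rows of (threshold, model NB, treat-all NB, treat-none NB)."""
    prev = np.mean(y_true)
    return [(t,
             net_benefit(y_true, y_prob, t),
             prev - (1 - prev) * t / (1 - t),  # treat everyone
             0.0)                               # treat no one
            for t in thresholds]

rng = np.random.default_rng(2)
y = rng.integers(0, 2, 1000)
p = np.clip(np.where(y == 1, 0.9, 0.1) + rng.normal(0, 0.05, 1000), 0.01, 0.99)
curve = decision_curve(y, p, [0.25, 0.5, 0.75])
```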

Finally, to improve interpretability and provide insight into model behavior, we conducted a descriptive analysis of feature importance using Shapley additive explanation (SHAP) scores, a game-theoretical approach that estimates the average marginal contribution of each variable across all permutations. While informative, this feature analysis is secondary to the primary goal of the study, which is predictive modeling of 6-hour mortality in patients receiving massive transfusion [23,28].
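The SHAP library approximates these values efficiently for tree models; purely as an illustration of the underlying game-theoretic idea (not the study's implementation), exact Shapley values for a tiny linear model can be computed by brute force over all feature orderings:

```python
import numpy as np
from itertools import permutations

def exact_shapley(predict, x, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over all orderings in which features are switched from the baseline
    to their observed values. Only feasible for a handful of features."""
    n = len(x)
    phi = np.zeros(n)
    perms = list(permutations(range(n)))
    for perm in perms:
        z = baseline.copy()
        prev = predict(z)
        for j in perm:
            z[j] = x[j]
            cur = predict(z)
            phi[j] += cur - prev
            prev = cur
    return phi / len(perms)

# Toy linear "risk" model over 3 standardized features (hypothetical weights)
w = np.array([0.8, -0.5, 0.2])
predict = lambda z: float(w @ z)
x = np.array([1.0, 2.0, -1.0])
baseline = np.zeros(3)
phi = exact_shapley(predict, x, baseline)
```

For a linear model, each Shapley value reduces to the weight times the feature's deviation from baseline, and the values sum to the difference between the prediction at x and at the baseline (the efficiency property).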

Results

Patient characteristics

Of 5,481,046 patients in TQIP from 2017 to 2021, 333,987 received at least one unit of any blood product; 47,744 patients received MT and, of these, 20,337 received UMT. After excluding patients with missing data for 6-hour mortality (n = 1,675, of whom 800 were also in the UMT group), the study populations comprised 46,069 patients in the MT group and 19,537 in the UMT group.

The 46,069 MT patients had a 6-hour mortality rate of 21.9%, increasing to 41.2% at 30 days. Of these patients, 19,537 underwent UMT, with mortality rates at 6-hours and 30 days of 29.9% and 53.7% respectively. Tables 1 and 2 depict descriptive statistics for the MT (with UMT excluded) and UMT cohorts respectively, stratified by outcome.

ML prediction performance

A GBDT showed the best performance for predicting 6-hour mortality for both the arrival and 4-hour timepoints. In both the MT and UMT models, performance improved at the 4-hour timepoint, from AUROC of 0.901 [95% CI 0.895–0.910] to 0.943 [95% CI 0.938–0.948] and 0.858 [95% CI 0.846–0.872] to 0.922 [95% CI 0.911–0.931] respectively (Fig 1). On paired bootstrap analysis, all GBDT models outperformed LASSO models (all p < 0.0001, Table 3).

Table 3. Comparison of final models to L1 regularization regression models.

https://doi.org/10.1371/journal.pone.0335151.t003

Fig 1. Receiver-operator curves for prediction of 6-hour mortality in each model.

AUROC: area under the receiver-operator curve.

https://doi.org/10.1371/journal.pone.0335151.g001

All models demonstrated satisfactory calibration as depicted by the reliability diagrams in Fig 2. Decision curve analysis showed all models demonstrated net benefit over both “treat none” and “treat all” baseline strategies (Fig 3) for all decision thresholds under 0.95. Secondary performance metrics are displayed in Table 4.

Table 4. Secondary metrics for each model developed and tested.

https://doi.org/10.1371/journal.pone.0335151.t004

Fig 2. Reliability diagrams to assess calibration of each model.

https://doi.org/10.1371/journal.pone.0335151.g002

Fig 3. Decision Curve Analysis to assess net benefit of each model.

https://doi.org/10.1371/journal.pone.0335151.g003

Evaluation with SHAP in all models identified the clinical factors with the strongest impact on model predictions, which were roughly similar between the MT and UMT cohorts. On arrival, these were ED Glasgow Coma Scale (GCS), lowest systolic blood pressure, and patient weight/body mass index (BMI). At the 4-hour timepoint, these determinants remained similar, but total 4-hour transfusion volume became the single most important predictor in the MT group, and cryoprecipitate administration in the UMT group (Figs 4, 5).

Fig 4. Shapley additive explanation (SHAP) methods to assess feature importance for both massive transfusion models.

Left side: beeswarm plot; each point represents an individual patient, the color of each point represents the value of that variable for that patient, and the horizontal displacement represents the effect of that value on the model’s outcome prediction for that patient. Right side: top 20 most important features in the model’s decision making, ranked by mean absolute SHAP value. ED: emergency department; SBP: systolic blood pressure; GCS: Glasgow Coma Scale; BMI: body mass index; Avg: average; EMS: emergency medical services.

https://doi.org/10.1371/journal.pone.0335151.g004

Fig 5. Shapley additive explanation (SHAP) methods to assess feature importance for all ultramassive transfusion models.

Left side: beeswarm plot; each point represents an individual patient, the color of each point represents the value of that variable for that patient, and the horizontal displacement represents the effect of that value on the model’s outcome prediction for that patient. Right side: top 20 most important features in the model’s decision making, ranked by mean absolute SHAP value. ED: emergency department; GCS: Glasgow Coma Scale; SBP: systolic blood pressure; BMI: body mass index; Avg: average; EMS: emergency medical services.

https://doi.org/10.1371/journal.pone.0335151.g005

Discussion

To our knowledge, this is the first study to describe the use of AI and ML methods for real-time prediction of 6-hour mortality in patients undergoing MT and UMT. A GBDT performed best for all models, with an AUROC of 0.901 [95% CI 0.895–0.910] in the arrival model, increasing to 0.943 [95% CI 0.938–0.948] in the 4-hour model (0.858 [95% CI 0.846–0.872] to 0.922 [95% CI 0.914–0.931] in the UMT models). On descriptive feature analysis, the variables with the highest average impact on the models’ predictions included ED GCS, lowest systolic blood pressure, patient weight/BMI, total 4-hour transfusion volume, and administration of cryoprecipitate. These results demonstrate that ML models can use variables available at the time of MTP initiation to accurately predict early mortality. These predictions outperform traditional regression-based approaches (p < 0.0001, Table 3) and can be refined by the inclusion of more data points as the resuscitation progresses. This information can serve as a useful adjunct for clinicians facing difficult decisions about continued use of limited blood products and may be especially helpful in resource-limited settings, such as smaller trauma centers or mass casualty incidents. It may also aid in difficult family discussions surrounding prognostication in this very sick cohort.

MTP is not a single event but an ongoing process, often over hours and involving variable transfusion volumes. While some patients may receive only a few units of blood, others can receive more than 100. Throughout this process, clinicians face multiple decision points where reassessment of resuscitation goals, patient trajectory, and futility becomes necessary. Our model is intended to support decision-making throughout the entire resuscitation process by providing real-time mortality prediction once MT has already begun and resuscitation is ongoing. While emergency and trauma physicians already incorporate evolving physiologic data into their decision-making, these decisions often rely heavily on clinical gestalt. The aim of our model is to quantify that expert intuition using data-driven approaches, creating a reproducible, evidence-based tool that mirrors expert reasoning. This is a goal supported by multiple recent studies citing the need to adapt advanced diagnostics, precision medicine, and patient-tailored strategies to inform clinical care in the context of massive transfusion [29,30].

We envision this model functioning as a decision-support tool for physicians during resuscitation, augmenting rather than replacing clinical judgment, particularly when time and resources are limited. This may be especially beneficial in lower-volume or lower-acuity trauma centers, where providers may not encounter massive transfusion cases frequently and where reliance on gestalt may be more variable. In these settings, a real-time, data-driven prediction model can help standardize decision-making, reduce uncertainty, and support more confident choices in the face of limited clinical exposure. Similarly, in mass casualty events, where clinical bandwidth and blood product availability can be rapidly overwhelmed, such a tool could aid in triaging ongoing transfusion efforts and prioritizing patients most likely to benefit.

Previous studies identifying when MT is futile in trauma patients have focused on identifying a threshold number of units of blood beyond which continued resuscitation is considered “futile,” or have identified risk factors for mortality after MT and/or UMT. For instance, Louden et al identified ≥16 units in 4 hours as the point at which mortality rates eclipse 50%, labeling this “heroic,” and ≥36 units as the point at which survival rates approach 0%, labeling this “near futile” [31]. Ang et al went a step further to describe how patient age should influence these thresholds, with increasing age resulting in lower transfusion thresholds to maintain the same mortality rates [32]. A recent narrative review by Kim et al identified nine articles examining when ongoing transfusion in trauma patients is futile, focusing on patients in the UMT range [33]. While the authors note that it is difficult, if not impossible, to decide on an absolute transfusion cutoff value, they do note that “circumstances vary and the decision to continue transfusions should be individualized”. This widely shared view underscores the potential utility of ML-based solutions that provide highly individualized risk estimates, especially given the growing consensus that no single transfusion threshold can universally define futility in massive transfusion. Static numerical cutoffs have proven inadequate, and clinical decisions must instead account for patient-specific and situational factors [34–40].

There has been prior work on prediction models in MT patients, but virtually every scoring system and/or algorithm has been developed to predict which patients will require MTP activation in the first place, not who will survive once it has been initiated [11,12,41,42]. Additionally, while there are standardized metrics to quantify injury severity, such as the Injury Severity Score and the Trauma and Injury Severity Score, these are not available in real time and are therefore not useful for mortality prediction or prognostication during the resuscitation. Benjamin et al developed the first ML-based algorithm for mortality risk estimation in trauma patients requiring transfusion, providing the first proof of concept that highly individualized mortality prediction can outperform standard methods, such as stratifying patients by ABC or Massive Transfusion Scores [16]. While their algorithm was based on a cohort of CAT+ patients, not MT patients, their Tier 2 model (using variables available on presentation as well as additional vital signs and adjuncts) achieved an AUROC of 0.858, compared to 0.544 using the ABC score alone.

Though our model’s performance metrics cannot be statistically compared to Benjamin’s given the different patient populations and variables in the datasets used for algorithm training, our AUROCs of 0.901 [95% CI 0.895–0.910] and 0.858 [95% CI 0.846–0.872] for the MT and UMT arrival models, and 0.943 [95% CI 0.938–0.948] and 0.922 [95% CI 0.914–0.931] for the 4-hour models, suggest that our algorithm performs similarly to, if not better than, existing ML methods. Another strength of our model is that it uses only structured variables that would be available within approximately 60 seconds of patient arrival. This can facilitate direct electronic medical record (EMR) integration, where these structured variables could be pulled from the chart automatically, allowing the model to run directly within the EMR. Additionally, we specifically chose to train our model on a database that publishes new data every year, so that retraining can counteract model drift and keep performance stable over time [43].

Despite these strengths, our model is not without limitations. Because these models were trained on a retrospective national database, they lack certain granular variables that may be important for predicting mortality after MT, such as the Focused Assessment with Sonography in Trauma exam, coagulation values, and hemoglobin, all of which were highly impactful features in Benjamin’s models for predicting CAT+ status. Because the dataset did not include the components necessary to calculate the ABC or Massive Transfusion Scores, we were unable to benchmark against those established tools. As with all retrospective modeling, the results are only as reliable as the training data. Additionally, though our data were taken from a national database, the model has not yet been prospectively validated, which may limit its generalizability. Our inclusion criterion of at least 5 units of blood within the first four hours also introduces a degree of survival bias, as it excludes patients who died before reaching that threshold. Consequently, our model does not apply to patients who did not survive long enough to receive 5 units of blood.

While our model offers a high degree of accuracy in its predictions, it is certainly less interpretable than standard regression-based techniques, an issue common to all ML prediction algorithms. Though we can use SHAP to gain insight into why a given ML model makes the predictions it does, SHAP explanations do not depict independent associations the way a regression would. Interpretation must therefore be done cautiously, with the knowledge that these are akin to univariate correlations. For instance, height appears consistently as an important predictor in our model. This does not imply that patient height is associated with mortality; rather, height is unlikely to be measured accurately in critical patients, and this “missingness” is likely a proxy for acuity or process-of-care dynamics rather than a biologically meaningful signal. This phenomenon is consistent with findings from Donohue et al, who showed that missing thromboelastography data correlated with hypotension, low GCS, prehospital intubation, and increased 30-day mortality [44]. As a result, height in these critical patients may be defaulted to a standard value in the EMR, or left missing and later imputed by our model. It is therefore critical to recognize that this feature analysis is provided to offer transparency into which variables the model is leveraging for its predictions, not to draw causal or explanatory conclusions about the biological or clinical importance of specific variables.

We chose to exclude explicit missingness indicator variables from the models for two reasons: first, to avoid overfitting to institution- or database-specific documentation patterns that may not generalize during single-facility implementation, and second, to reduce the risk of the model relying on non-physiologic process artifacts. While missingness may carry real prognostic value, it is fragile as a predictor. Future institutional workflow changes, such as mandatory auto-filling of height/weight fields from prior encounters or license records, may reduce or entirely eliminate the predictive utility of missingness. As a result, models that lean heavily on these artifacts may fail silently when deployed in environments with different charting practices. Though our model does not explicitly include missingness as a determinant of mortality, it may still pick up imputed values as a sign of missingness. To address this, separate models were run using random imputation for the height, weight, and temperature variables to assess model performance without this missingness heuristic. While performance of the models did drop slightly on admission (AUROC of 0.870 [95% CI 0.862–0.879] for MT and 0.814 [95% CI 0.801–0.830] for UMT), this drop was less pronounced in the 4-hour models (AUROC of 0.929 [95% CI 0.924–0.935] for MT and 0.904 [95% CI 0.895–0.914] for UMT). Relative feature importance remained similar across all models, except that weight and BMI became less impactful. Altogether, this suggests that these models are not overly reliant on missing data in their predictions, and that local implementation should take facility-specific practices into account when deciding both whether to include explicit missingness indicators and which imputation method to use.

Most importantly, this predictive tool is not designed to guide the initial activation of MTP or to identify patients for whom it should be started. Rather, it is specifically designed to inform decisions about the continuation or termination of massive resuscitation once MTP has already been initiated. These decisions are currently based largely on clinician gestalt and experience rather than objective, data-driven methods, which may be especially problematic and variable in low-volume, low-acuity centers. Our model aims to supplement, not replace, clinical judgment in this complex setting. No matter how accurate prediction algorithms become, they certainly cannot replace the clinical decision-making of an experienced trauma surgeon. Importantly, while predictive accuracy is high, caution is warranted in interpreting model outputs to avoid a self-fulfilling prophecy, wherein high predicted mortality could lead to premature withdrawal of care. At the same time, the model may offer value in supporting continued resuscitation in patients who might otherwise be deemed futile, particularly in lower-volume trauma centers, where limited experience could lead to underestimation of survivability.

Prospective implementation of this tool must therefore be accompanied by ethical safeguards and institutional calibration. Threshold selection and model integration should reflect each institution’s resources, clinical culture, and tolerances for aggressive care. For example, a high predicted mortality score at one institution might trigger an early automated alert to the blood bank signaling imminent MTP activation, while at another, it could serve as an indicator to consider transfer to a higher level of care or an early alert to prepare either the operating room or an ICU bed. Ultimately, the model is intended to enhance, not constrain, individualized clinical decision-making, providing a data-informed method to augment trauma care, especially in resource-limited scenarios.

Our findings demonstrate that ML techniques can predict survival of trauma patients after both MT and UMT with outstanding discrimination and can refine this prediction over time as the resuscitation progresses. The next steps for this model will focus on two distinct areas: prospective validation and model extension using different data modalities. Prospective validation is essential to assess the real-time performance, clinical utility, and integration feasibility of this algorithm across diverse healthcare settings [45]. This will not only confirm the model's predictive accuracy in practice but will also allow for iterative refinement based on institution-specific workflows, thresholds, and patient populations.

Implementation trials could help evaluate how predictive outputs influence clinician decision-making, and whether they improve resource allocation, patient outcomes, patient communication/prognostication, or the efficiency of MTP continuation decisions. This will also allow for collection of other types of data that can extend the model, such as ED notes, operative reports, radiology reports, and time-series vitals data, as the use of unstructured data like this has been shown to improve predictive accuracy in other ML-based decision support tools [46]. Just as important as technical performance is end-user trust, usability, and buy-in. The successful integration of decision-support tools such as this will depend heavily on how frontline clinicians perceive their relevance, reliability, and ease of use. With this in mind, it will be important to include structured usability assessments, such as those informed by the Health Information Technology Usability Evaluation Scale (Health-ITUES) framework, to ensure that we are developing tools that actually help clinicians at the bedside [47,48]. As the use of ML and AI increases, so will the importance of databases that contain the information needed to develop these tools. To support the development of robust ML models, future trauma databases should be structured to ensure high-quality, standardized, and clinically relevant variable capture.

Conclusions

Though ML models have been developed for prediction of mortality in CAT+ patients, this is the first reported ML model to predict 6-hour mortality in trauma patients requiring both MT and UMT. A well-calibrated GBDT model performed best, with an AUROC of 0.901 [95% CI 0.895–0.910] in the arrival model, increasing to 0.943 [95% CI 0.938–0.948] in the 4-hour model (0.858 [95% CI 0.846–0.872] to 0.922 [95% CI 0.914–0.931] in the UMT models). This model performs similarly to, if not better than, previously developed ML models for CAT+ patients. This demonstrates proof of concept that ML techniques can be leveraged to develop highly accurate prediction models that may aid clinicians in prognostication as well as in making difficult decisions regarding futility and judicious use of blood products in the most critically injured patients.

Supporting information

S1 Appendix. PDF of the EQUATOR Network Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD+AI) guidelines.

https://doi.org/10.1371/journal.pone.0335151.s001

(PDF)

S2 Appendix. Microsoft Word file with a link to the GitHub repository containing all code used in training and production of the ML models, along with hyperparameter information.

https://doi.org/10.1371/journal.pone.0335151.s002

(DOCX)

S3 Appendix. Microsoft Excel file containing an exhaustive list of Abbreviated Injury Scale codes used to generate binary variables encoding the presence or absence of injury to various body regions/organ systems.

https://doi.org/10.1371/journal.pone.0335151.s003

(XLSX)

S4 Appendix. Microsoft Excel file containing an exhaustive list of all variables used in each specific model described in the manuscript, with descriptions of the variables.

https://doi.org/10.1371/journal.pone.0335151.s004

(XLSX)

S5 Appendix. Sensitivity analysis of different missingness thresholds for imputation versus variable removal.

https://doi.org/10.1371/journal.pone.0335151.s005

(PDF)

S6 Appendix. Structured Technology Summary based on Health Technology Assessment Domains.

https://doi.org/10.1371/journal.pone.0335151.s006

(PDF)

References

1. van der Meij JE, Geeraedts LMG Jr, Kamphuis SJM, Kumar N, Greenfield T, Tweeddale G, et al. Ten-year evolution of a massive transfusion protocol in a level 1 trauma centre: have outcomes improved? ANZ J Surg. 2019;89(11):1470–4. pmid:31496010
2. Finkelstein EA, Corso PS, Miller TR. The incidence and economic burden of injuries in the United States. New York (NY): Oxford University Press; 2006.
3. ACS TQP Best Practices Guidelines | ACS. [cited 8 July 2024]. Available from: https://www.facs.org/quality-programs/trauma/quality/best-practices-guidelines/
4. Cotton BA, Au BK, Nunez TC, Gunter OL, Robertson AM, Young PP. Predefined massive transfusion protocols are associated with a reduction in organ failure and postinjury complications. J Trauma. 2009;66(1):41–8; discussion 48–9. pmid:19131804
5. Cotton BA, Gunter OL, Isbell J, Au BK, Robertson AM, Morris JA Jr, et al. Damage control hematology: the impact of a trauma exsanguination protocol on survival and blood product utilization. J Trauma. 2008;64(5):1177–82; discussion 1182–3. pmid:18469638
6. Riskin DJ, Tsai TC, Riskin L, Hernandez-Boussard T, Purtill M, Maggio PM, et al. Massive transfusion protocols: the role of aggressive resuscitation versus product ratio in mortality reduction. J Am Coll Surg. 2009;209(2):198–205. pmid:19632596
7. O'Keeffe T, Refaai M, Tchorz K. A massive transfusion protocol to decrease blood component use and costs. Arch Surg. 2008;143:686–90; discussion 690–1.
8. Hwang K, Kwon J, Cho J, Heo Y, Lee JC-J, Jung K. Implementation of Trauma Center and Massive Transfusion Protocol Improves Outcomes for Major Trauma Patients: A Study at a Single Institution in Korea. World J Surg. 2018;42(7):2067–75. pmid:29290073
9. Red Cross declares emergency blood shortage, calls for donations during National Blood Donor Month. [cited 8 July 2024]. Available from: https://www.redcross.org/about-us/news-and-events/press-release/2024/red-cross-declares-emergency-blood-shortage-calls-for-donations-during-national-blood-donor-month.html
10. Callcut RA, Cripps MW, Nelson MF, Conroy AS, Robinson BBR, Cohen MJ. The Massive Transfusion Score as a decision aid for resuscitation: Learning when to turn the massive transfusion protocol on and off. J Trauma Acute Care Surg. 2016;80(3):450–6. pmid:26517786
11. Strickland M, Nguyen A, Wu S, Suen S-C, Mu Y, Del Rio Cuervo J, et al. Assessment of Machine Learning Methods to Predict Massive Blood Transfusion in Trauma. World J Surg. 2023;47(10):2340–6. pmid:37389644
12. Callcut RA, Cotton BA, Muskat P, Fox EE, Wade CE, Holcomb JB, et al. Defining when to initiate massive transfusion: a validation study of individual massive transfusion triggers in PROMMTT patients. J Trauma Acute Care Surg. 2013;74(1):59–65, 67–8; discussion 66–7. pmid:23271078
13. Bonde A, Bonde M, Troelsen A, Sillesen M. Assessing the utility of a sliding-windows deep neural network approach for risk prediction of trauma patients. Sci Rep. 2023;13(1):5176. pmid:36997598
14. Harmantepe AT, Dulger UC, Gonullu E, Dikicier E, Şentürk A, Eröz E. A method for predicting mortality in acute mesenteric ischemia: Machine learning. Ulus Travma Acil Cerrahi Derg. 2024;30(7):487–92. pmid:38967529
15. Kurita T, Oami T, Tochigi Y, Tomita K, Naito T, Atagi K, et al. Machine learning algorithm for predicting 30-day mortality in patients receiving rapid response system activation: A retrospective nationwide cohort study. Heliyon. 2024;10(11):e32655. pmid:38961987
16. Benjamin AJ, Young AJ, Holcomb JB, Fox EE, Wade CE, Meador C, et al. Early Prediction of Massive Transfusion for Patients With Traumatic Hemorrhage: Development of a Multivariable Machine Learning Model. Ann Surg Open. 2023;4(3):e314. pmid:37746616
17. Collins GS, Moons KGM, Dhiman P, Riley RD, Beam AL, Van Calster B, et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ. 2024;385:e078378. pmid:38626948
18. Bisong E. Google Colaboratory. In: Building Machine Learning and Deep Learning Models on Google Cloud Platform: A Comprehensive Guide for Beginners. Berkeley (CA): Apress; 2019. p. 59–64.
19. Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: Machine Learning in Python. J Mach Learn Res. 2011;12:2825–30.
20. Abadi M, Agarwal A, Barham P, et al. TensorFlow: large-scale machine learning on heterogeneous systems. 2015. Available from: https://www.tensorflow.org/
21. Yang J, Li XHW. DeepTables: A deep learning python package for tabular data. 2022. Available from: https://github.com/DataCanvasIO/DeepTables
22. Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York (NY): ACM; 2016. p. 785–794.
23. Nohara Y, Matsumoto K, Soejima H, Nakashima N. Explanation of machine learning models using shapley additive explanation and application for real data in hospital. Comput Methods Programs Biomed. 2022;214:106584. pmid:34942412
24. Rios R, Miller RJH, Manral N, Sharir T, Einstein AJ, Fish MB, et al. Handling missing values in machine learning to predict patient-specific risk of adverse cardiac events: Insights from REFINE SPECT registry. Comput Biol Med. 2022;145:105449. pmid:35381453
25. Luu J, Borisenko E, Przekop V, Patil A, Forrester JD, Choi J. Practical guide to building machine learning-based clinical prediction models using imbalanced datasets. Trauma Surg Acute Care Open. 2024;9(1):e001222. pmid:38881829
26. Piovani D, Sokou R, Tsantes AG, Vitello AS, Bonovas S. Optimizing Clinical Decision Making with Decision Curve Analysis: Insights for Clinical Investigators. Healthcare (Basel). 2023;11(16):2244. pmid:37628442
27. Vickers AJ, van Calster B, Steyerberg EW. A simple, step-by-step guide to interpreting decision curve analysis. Diagn Progn Res. 2019;3:18. pmid:31592444
28. Lundberg SM, Lee S-I. A unified approach to interpreting model predictions. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook (NY): Curran Associates Inc.; 2017. p. 4768–4777.
29. Lin T-L, Liu H-T, Hsieh C-H. Current controversies and advances in massive transfusion: Balancing evidence and practice. J Formos Med Assoc. 2025;S0929-6646(25)00383-3. pmid:40683811
30. Sanderson BJ, Field JD, Estcourt LJ, Wood EM, Coiera EW. Massive transfusion experience, current practice and decision support: A survey of Australian and New Zealand anaesthetists. Anaesth Intensive Care. 2021;49(3):214–21. pmid:33951942
31. Loudon AM, Rushing AP, Hue JJ, Ziemak A, Sarode AL, Moorman ML. When is enough enough? Odds of survival by unit transfused. J Trauma Acute Care Surg. 2023;94(2):205–11. pmid:36694331
32. Ang D, Fakhry SM, Watts DD, Liu H, Morse JL, Armstrong J, et al. Data-Driven Blood Transfusion Thresholds for Severely Injured Patients During Blood Shortages. J Surg Res. 2023;291:17–24. pmid:37331188
33. Kim JS, Casem CF, Baral E. Narrative review: Is there a transfusion cutoff value after which nonsurvivability is inevitable in trauma patients receiving ultramassive transfusion? Anesth Analg. 2023;137:354–64.
34. Clements TW, Van Gent J-M, Lubkin DE, Wandling MW, Meyer DE, Moore LJ, et al. The reports of my death are greatly exaggerated: An evaluation of futility cut points in massive transfusion. J Trauma Acute Care Surg. 2023;95(5):685–90. pmid:37125814
35. Criddle LM, Eldredge DH, Walker J. Variables predicting trauma patient survival following massive transfusion. J Emerg Nurs. 2005;31(3):236–42; quiz 320. pmid:15983575
36. Matthay ZA, Hellmann ZJ, Callcut RA, Matthay EC, Nunez-Garcia B, Duong W, et al. Outcomes after ultramassive transfusion in the modern era: An Eastern Association for the Surgery of Trauma multicenter study. J Trauma Acute Care Surg. 2021;91(1):24–33. pmid:34144557
37. Quintana MT, Zebley JA, Vincent A, Chang P, Estroff J, Sarani B, et al. Cresting mortality: Defining a plateau in ongoing massive transfusion. J Trauma Acute Care Surg. 2022;93(1):43–51. pmid:35393379
38. Morris MC, Niziolek GM, Baker JE, Huebner BR, Hanseman D, Makley AT, et al. Death by Decade: Establishing a Transfusion Ceiling for Futility in Massive Transfusion. J Surg Res. 2020;252:139–46. pmid:32278968
39. Shibahashi K, Aoki M, Hikone M, Sugiyama K. Association between transfusion volume and survival outcome following trauma: Insight into the limit of transfusion from an analysis of nationwide trauma registry in Japan. J Trauma Acute Care Surg. 2024;96(5):742–8. pmid:37962149
40. Velmahos GC, Chan L, Chan M, Cornwell EE 3rd, Asensio JA, et al. Is There a Limit to Massive Blood Transfusion After Severe Trauma? Arch Surg. 1998;133:947–52.
41. Mina MJ, Winkler AM, Dente CJ. Let technology do the work: Improving prediction of massive transfusion with the aid of a smartphone application. J Trauma Acute Care Surg. 2013;75(4):669–75. pmid:24064881
42. Nunez TC, Voskresensky IV, Dossett LA, Shinall R, Dutton WD, Cotton BA. Early prediction of massive transfusion in trauma: simple as ABC (assessment of blood consumption)? J Trauma. 2009;66(2):346–52. pmid:19204506
43. Sahiner B, Chen W, Samala RK, Petrick N. Data drift in medical machine learning: implications and potential remedies. Br J Radiol. 2023;96(1150):20220878. pmid:36971405
44. Donohue JK, Iyanna N, Lorence JM, Brown JB, Guyette FX, Eastridge BJ, et al. Missingness matters: a secondary analysis of thromboelastography measurements from a recent prehospital randomized tranexamic acid clinical trial. Trauma Surg Acute Care Open. 2024;9(1):e001346. pmid:38375027
45. Brajer N, Cozzi B, Gao M, Nichols M, Revoir M, Balu S, et al. Prospective and External Evaluation of a Machine Learning Model to Predict In-Hospital Mortality of Adults at Time of Admission. JAMA Netw Open. 2020;3(2):e1920733. pmid:32031645
46. Wu G, Khair S, Yang F, Cheligeer C, Southern D, Zhang Z, et al. Performance of machine learning algorithms for surgical site infection case detection and prediction: A systematic review and meta-analysis. Ann Med Surg (Lond). 2022;84:104956. pmid:36582918
47. Tan ML, Prasanna R, Stock K, Doyle EEH, Leonard G, Johnston D. Understanding end-users' perspectives: Towards developing usability guidelines for disaster apps. Progress in Disaster Science. 2020;7:100118.
48. Schnall R, Cho H, Liu J. Health Information Technology Usability Evaluation Scale (Health-ITUES) for Usability Assessment of Mobile Health Technology: Validation Study. JMIR Mhealth Uhealth. 2018;6(1):e4. pmid:29305343