
Development of a prediction model for 30-day COVID-19 hospitalization and death in a national cohort of Veterans Health Administration patients–March 2022—April 2023

  • David P. Bui ,

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Validation, Writing – original draft, Writing – review & editing

    David.Bui@va.gov

    Affiliations Veterans Affairs Portland Health Care System, Portland, Oregon, United States of America, Center of Innovation to Improve Veteran Involvement in Care (CIVIC), Veterans Affairs Portland Healthcare System, Portland, Oregon, United States of America

  • Kristina L. Bajema,

    Roles Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Writing – original draft, Writing – review & editing

    Affiliations Veterans Affairs Portland Health Care System, Portland, Oregon, United States of America, Division of Infectious Diseases, Department of Medicine, Oregon Health & Science University, Portland, Oregon, United States of America

  • Yuan Huang,

    Roles Conceptualization, Formal analysis, Methodology, Supervision, Writing – original draft, Writing – review & editing

    Affiliations Veterans Affairs Cooperative Studies Program Clinical Epidemiology Research Center (CSP CERC), Veterans Affairs Connecticut Healthcare System, West Haven, Connecticut, United States of America, Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America

  • Lei Yan,

    Roles Conceptualization, Data curation, Writing – original draft, Writing – review & editing

    Affiliations Veterans Affairs Cooperative Studies Program Clinical Epidemiology Research Center (CSP CERC), Veterans Affairs Connecticut Healthcare System, West Haven, Connecticut, United States of America, Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America

  • Yuli Li,

    Roles Data curation, Writing – original draft, Writing – review & editing

    Affiliation Veterans Affairs Cooperative Studies Program Clinical Epidemiology Research Center (CSP CERC), Veterans Affairs Connecticut Healthcare System, West Haven, Connecticut, United States of America

  • Nallakkandi Rajeevan,

    Roles Data curation, Writing – original draft, Writing – review & editing

    Affiliation Veterans Affairs Cooperative Studies Program Clinical Epidemiology Research Center (CSP CERC), Veterans Affairs Connecticut Healthcare System, West Haven, Connecticut, United States of America

  • Kristin Berry,

    Roles Data curation, Methodology, Writing – original draft, Writing – review & editing

    Affiliation Research and Development, Veterans Affairs Puget Sound Health Care System, Seattle, Washington, United States of America

  • Mazhgan Rowneki,

    Roles Data curation, Writing – original draft, Writing – review & editing

    Affiliation Center of Innovation to Improve Veteran Involvement in Care (CIVIC), Veterans Affairs Portland Healthcare System, Portland, Oregon, United States of America

  • Stephanie Argraves,

    Roles Data curation, Writing – original draft, Writing – review & editing

    Affiliation Veterans Affairs Cooperative Studies Program Clinical Epidemiology Research Center (CSP CERC), Veterans Affairs Connecticut Healthcare System, West Haven, Connecticut, United States of America

  • Denise M. Hynes,

    Roles Investigation, Supervision, Writing – original draft, Writing – review & editing

    Affiliations Center of Innovation to Improve Veteran Involvement in Care (CIVIC), Veterans Affairs Portland Healthcare System, Portland, Oregon, United States of America, Health Management and Policy, School of Social and Behavioral Health Sciences, College of Health, Health Data and Informatics Program, Center for Quantitative Life Sciences, Oregon State University, Corvallis, Oregon, United States of America

  • Grant Huang,

    Roles Funding acquisition, Supervision, Writing – original draft, Writing – review & editing

    Affiliation Office of Research and Development, Veterans Health Administration, Washington, District of Columbia, United States of America

  • Mihaela Aslan,

    Roles Conceptualization, Supervision, Writing – original draft, Writing – review & editing

    Affiliations Veterans Affairs Cooperative Studies Program Clinical Epidemiology Research Center (CSP CERC), Veterans Affairs Connecticut Healthcare System, West Haven, Connecticut, United States of America, Office of Research and Development, Veterans Health Administration, Washington, District of Columbia, United States of America

  • George N. Ioannou

    Roles Conceptualization, Funding acquisition, Methodology, Supervision, Writing – original draft, Writing – review & editing

    Affiliations Research and Development, Veterans Affairs Puget Sound Health Care System, Seattle, Washington, United States of America, Divisions of Gastroenterology, Veterans Affairs Puget Sound Healthcare System and University of Washington, Seattle, Washington, United States of America

Abstract

Objective

The epidemiology of COVID-19 has substantially changed since its emergence given the availability of effective vaccines, circulation of different viral variants, and re-infections. We aimed to develop models to predict 30-day COVID-19 hospitalization and death in the Omicron era for contemporary clinical and research applications.

Methods

We used comprehensive electronic health records from a national cohort of patients in the Veterans Health Administration (VHA) who tested positive for SARS-CoV-2 between March 1, 2022, and March 31, 2023. Full models incorporated 84 predictors, including demographics, comorbidities, and receipt of COVID-19 vaccinations and anti-SARS-CoV-2 treatments. Parsimonious models included 19 predictors. We created models for 30-day hospitalization or death, 30-day hospitalization, and 30-day all-cause mortality. We used the Super Learner ensemble machine learning algorithm to fit prediction models. Model performance was assessed with the area under the receiver operating characteristic curve (AUC), Brier scores, and calibration intercepts and slopes in a 20% holdout dataset.

Results

Models were trained and tested on 198,174 patients, of whom 8% were hospitalized or died within 30 days of testing positive. AUCs for the full models ranged from 0.80 (hospitalization) to 0.91 (death). Brier scores were close to 0, with the lowest error in the mortality model (Brier score: 0.01). All three models were well calibrated with calibration intercepts <0.23 and slopes <1.05. Parsimonious models performed comparably to full models.

Conclusions

We developed prediction models that accurately estimate COVID-19 hospitalization and mortality risk following emergence of the Omicron variant and in the setting of COVID-19 vaccinations and antiviral treatments. These models may be used for risk stratification to inform COVID-19 treatment and to identify high-risk patients for inclusion in clinical trials.

Introduction

Early in the COVID-19 pandemic, many prognostic prediction models were developed to identify persons infected with SARS-CoV-2 who could be at high risk for adverse outcomes [1–3]. These models informed implementation of risk reduction measures, clinical care, and resource allocation [4]. Despite reductions in mortality and hospitalization rates, COVID-19 remains a leading cause of death in the United States and accounts for thousands of hospitalizations every week nationally [5]. COVID-19 risk prediction models may continue to inform patient care and policy decisions, particularly as effective vaccines and antiviral treatments have become available [6, 7]. Such models may also help identify high-risk patients for facilitating future clinical trials of new pharmacotherapies.

Currently available COVID-19 risk prediction models have important limitations. First, most models were developed early in the pandemic [8–13] and do not account for currently circulating viral strains, population immunity from previous infection or vaccination, or available treatments. This is particularly important because risk models are sensitive to the contextual factors of the population in which they are developed; model performance, especially calibration, may diminish over time as context (e.g., population immunity, treatment availability) changes [14, 15]. Moreover, most models were developed among acutely ill patients receiving care in hospital [13, 16–18] and emergency department settings [19], focus predominantly on mortality and severe outcomes, and may not be applicable to or perform well in a broader population, particularly as mortality rates have declined. COVID-19 risk models have also largely relied on parametric regression models (e.g., logistic regression) with strong linearity assumptions that may not hold for many predictors. Further, available models rarely leverage ensemble methods, which can combine parametric and nonparametric modeling approaches to hedge against model misspecification and have been shown to improve predictive performance in diverse applications [20–22].

We report the development, assessment, and internal validation of multivariable risk prediction models for 30-day hospitalization and death among patients with COVID-19, which use more recent data to reflect the current context of COVID-19 and leverage an ensemble modeling approach. These models were developed on a large national source population of United States Veterans enrolled in Veterans Health Administration (VHA) care using detailed covariates captured in electronic health records (EHR).

Material and methods

Data sources, participants, and sample size

We used the Department of Veterans Affairs (VA) COVID-19 Shared Data Resource (CSDR) to identify patients with COVID-19. The CSDR includes data on all VHA patients who had laboratory-confirmed SARS-CoV-2 tests performed in the VHA as well as tests performed outside the VHA and documented in clinical notes. In addition to the CSDR, we used the VA Corporate Data Warehouse, a repository of comprehensive VA electronic health records, which provided the broad range of demographic (e.g., age, sex, race), geographic, and clinical variables (e.g., vaccination, comorbidity indicators) used in our prediction models. Hospitalization and mortality outcomes were obtained from the CSDR, and hospitalizations were further supplemented with claims data for purchased care provided outside of the VHA through the VHA Integrated Veteran Care Program and with Fee-for-Service claims from the Centers for Medicare & Medicaid Services (CMS). CMS data were also used to supplement data on COVID-19 vaccinations.

For model development and validation, we selected all patients aged 18 years and older with a first positive SARS-CoV-2 test in the CSDR between March 1, 2022, and March 31, 2023. Patients from all VHA facilities across 50 states and territories were eligible for inclusion. To minimize missing data and improve ascertainment of key variables, we limited the cohort to patients engaged in VHA care, defined as having a VHA primary care encounter in the 18 months preceding the positive test. All data for this analysis were accessed on October 2, 2023.

Outcomes

We developed separate models to predict three outcomes: 1) 30-day hospitalization or death (composite outcome model), 2) 30-day hospitalization (hospitalization model), and 3) 30-day all-cause mortality (mortality model). Hospitalizations and deaths were counted if they occurred within 30 days from the first test-positive date for each patient.

Predictors

We reviewed predictors available in the EHR and selected covariates for model development based on prior research and clinical knowledge [6, 23–25]. We selected 84 predictors, including demographic variables (e.g., age, sex, race), geographic variables (e.g., region of residence, rurality), body mass index, indicators for comorbidities, COVID-19 vaccination status, and receipt of outpatient anti-SARS-CoV-2 treatments, including nirmatrelvir-ritonavir, molnupiravir, and monoclonal antibodies. COVID-19 treatments were included among the candidate predictors because studies have shown that including treatments known to affect outcomes can enhance predictive performance and precision in prognostic models [26, 27]; including post-infection treatment also flexibly allows prediction of both pre-treatment and on-treatment risk. We included both individual comorbidity indicators (e.g., type 1 diabetes, type 2 diabetes, diabetes with complications) and composite indicators of related comorbidities (e.g., any diabetes). We also included three index measures of vulnerability and frailty: the Area Deprivation Index (ADI) [28], the Charlson Comorbidity Index (CCI) [29], and the Care Assessment Needs (CAN) score [30], an automated EHR-based risk score developed at the VHA to predict 1-year mortality [31]. See S1 Table in S1 File for the full list of predictors.

Training and holdout test datasets

We used a random 80/20 data split of our full cohort to create a training and a holdout dataset, respectively. Sampling was done at the VHA facility level such that all patients from a given station were either allocated to the training dataset or test dataset but not both. This data partition approach ensured our models were developed on a population that was geographically different from the population in which we tested model performance.
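The station-level partition described above can be sketched in a few lines. The following is a minimal Python illustration (the paper's analyses were performed in R; the record layout and function name here are hypothetical):

```python
import random

def facility_split(records, train_frac=0.8, seed=42):
    """Split (patient_id, facility_id) records so that every patient from a
    given facility lands in exactly one partition (train or test)."""
    facilities = sorted({fac for _, fac in records})
    rng = random.Random(seed)
    rng.shuffle(facilities)
    n_train = int(round(train_frac * len(facilities)))
    train_fac = set(facilities[:n_train])
    train = [pid for pid, fac in records if fac in train_fac]
    test = [pid for pid, fac in records if fac not in train_fac]
    return train, test

# Toy cohort: 100 patients spread across 10 stations.
records = [(i, f"station_{i % 10}") for i in range(100)]
train, test = facility_split(records)
# No station contributes patients to both partitions.
assert {f for p, f in records if p in set(train)}.isdisjoint(
    {f for p, f in records if p in set(test)})
```

Splitting on facility rather than on individual patients means held-out performance reflects transport to geographically distinct sites, which is the property the authors sought.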

Data pre-processing and missing data

We pre-processed the training dataset to ensure categorical variables were properly coded and missing data were imputed. Missing data were rare: just 3,971 patients (2%) in our cohort were missing one or more variables, and the CAN score was the most frequently missing variable (missing for 2,883 patients [1.4%]). Categorical predictors were one-hot coded by creating a series of binary ‘dummy’ indicator variables for each factor level with a reference level omitted; the most prevalent level served as the referent. Missing continuous covariates were imputed with observed medians, and missing categorical covariates were imputed with observed modes. We created covariate-specific missing indicators (0/1) to flag observations with missing and imputed covariate values, allowing covariate missingness itself to serve as an outcome predictor and enabling future prediction in the presence of missing data [32, 33].
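The three pre-processing steps just described (median imputation, missingness flags, and reference-level dummy coding) can be illustrated with a short stdlib Python sketch; the function names and toy columns are invented for illustration, and the paper's actual pipeline was implemented in R:

```python
from collections import Counter
from statistics import median

def impute_with_flag(values):
    """Median-impute a continuous column (None = missing) and return a
    parallel 0/1 missingness indicator, mirroring the text."""
    med = median(v for v in values if v is not None)
    imputed = [med if v is None else v for v in values]
    flags = [1 if v is None else 0 for v in values]
    return imputed, flags

def one_hot(values):
    """Dummy-code a categorical column, omitting the most prevalent level
    as the referent."""
    counts = Counter(values)
    ref = counts.most_common(1)[0][0]
    return {f"is_{lvl}": [1 if v == lvl else 0 for v in values]
            for lvl in sorted(counts) if lvl != ref}

imputed, flags = impute_with_flag([3.0, None, 5.0, 7.0, None])
assert imputed == [3.0, 5.0, 5.0, 7.0, 5.0] and flags == [0, 1, 0, 0, 1]
assert one_hot(["urban", "urban", "rural"]) == {"is_rural": [0, 0, 1]}
```

Keeping the missingness flags as predictors is what lets a fitted model score future patients whose covariates are missing, rather than failing on them.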

Statistical analysis methods: Model development

We used the Super Learner (SL) algorithm to develop a risk prediction model for each specified outcome [20]. SL is an ensemble machine learning algorithm that leverages cross-validation to find optimal weighted combinations of a pre-specified ensemble of models and algorithms (i.e., learners) to improve prediction. The SL algorithm requires minimal assumptions, is flexible enough to include both parametric and nonparametric learners, and mitigates the risk of overfitting in high-dimensional applications through cross-validation, making it well suited to complex prediction problems [21, 22, 33]. The SL algorithm has been applied in numerous prediction [34] and causal inference studies, has been shown to perform as well as or better than the best-performing individual learner, and is especially well suited to rare outcomes [21, 35]. We used the ‘SuperLearner’ R package (version 2.0–28.1) to build our SL models.

In total, our final ensembles included a library of 10 distinct learners, including generalized linear models, penalized regressions (LASSO and ridge regression), and machine learning algorithms such as generalized additive models, multivariate adaptive regression splines, neural networks, random forest, and XGBoost (see S2 Table in S1 File for learners and hyperparameters). For machine learning algorithms included in the ensemble, we used the ‘caret’ package (version 6.0–94) to perform a series of cross-validated grid searches of hyperparameters in a random 50% sample of the training dataset (to reduce computation time) and used the set of parameters that maximized the cross-validated area under the receiver operating characteristic curve (AUC) for the composite hospitalization or death outcome model [36].

To fit the ensemble models, we used a 5-fold cross-validation scheme stratified on outcome to generate cross-validated predictions from each included learner, which were then used in a meta-learner to obtain final ensemble model weights for prediction. Based on our effective sample size and guidance from Phillips et al. [22], we chose 5-fold cross-validation for fitting our SL models to minimize computation time and bias in cross-validated risk estimates. We used the non-negative binomial likelihood maximization meta-learner in the ‘SuperLearner’ package, which has been shown to perform well for SL classification models [35] (see S3 Table in S1 File for full SL model specifications).
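The meta-learning step, choosing non-negative convex weights over the learners' cross-validated predictions by maximizing the binomial likelihood, can be illustrated with a two-learner toy in Python. The real ‘SuperLearner’ R package optimizes over arbitrarily many learners with a proper solver; the grid search and names below are a simplified sketch only:

```python
import math

def binom_nll(weights, preds, y):
    """Negative Bernoulli log-likelihood of a weighted ensemble prediction."""
    total = 0.0
    for i, yi in enumerate(y):
        p = sum(w * pl[i] for w, pl in zip(weights, preds))
        p = min(max(p, 1e-12), 1 - 1e-12)  # guard the log
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total

def fit_two_learner_weights(preds, y, steps=100):
    """Grid-search the convex weight between two learners that minimizes
    the negative binomial log-likelihood of cross-validated predictions."""
    _, w = min((binom_nll((s / steps, 1 - s / steps), preds, y), s / steps)
               for s in range(steps + 1))
    return (w, 1 - w)

y = [0, 1, 0, 1]
learner_a = [0.1, 0.9, 0.2, 0.8]   # informative learner
learner_b = [0.5, 0.5, 0.5, 0.5]   # uninformative learner
assert fit_two_learner_weights([learner_a, learner_b], y) == (1.0, 0.0)
```

Because the weights are fitted to out-of-fold predictions, a learner that merely memorizes the training data receives little weight, which is how the ensemble guards against overfitting.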

Assessing ensemble model performance and variable importance

For each fitted outcome model, we assessed model performance in two ways. First, we conducted 5-fold cross-validation of each ensemble model with the training dataset to estimate the ensemble model’s performance in unseen data with average AUC (a measure of discrimination, with 1 indicating perfect discrimination) as the target performance metric. Second, we assessed each model’s performance in the holdout dataset and reported each model’s AUC, Brier score (a measure of accuracy, with 0 indicating perfect accuracy), and Cox’s calibration intercept and slope (an intercept of 0 and slope of 1 indicating perfect calibration); no model updating or recalibration was conducted [37, 38]. Model calibration was also visually assessed in the holdout test dataset using local regression-smoothed calibration curves. We further estimated calibration intercepts and slopes for each model across various subgroups in the holdout dataset to assess local model calibration by age, race, ethnicity, region, and CAN score [38].
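The two scalar metrics above have compact definitions; here is a minimal stdlib Python sketch of AUC (in its Mann-Whitney form) and the Brier score, assuming binary outcomes and predicted probabilities (the paper computed these in R):

```python
def auc(y, p):
    """AUC as the probability that a random positive outscores a random
    negative (ties count 1/2) -- the Mann-Whitney form of discrimination."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def brier(y, p):
    """Brier score: mean squared error between outcomes and predicted
    probabilities (0 = perfect accuracy)."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)

y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
assert auc(y, p) == 0.75
assert abs(brier(y, p) - 0.158125) < 1e-12
```

Note that with rare outcomes the Brier score is dominated by the event rate (predicting the base rate for everyone already yields a small score), which is why the authors report calibration intercepts and slopes alongside it.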

To compare our models’ performance against existing risk stratification scores, we estimated AUC, Brier scores, and calibration intercepts and slopes for the CAN score and the VACO Index, a publicly available risk score for 30-day COVID-19 mortality that was developed in the VHA population early in the pandemic [11, 39]. For the CAN score, we used Platt scaling to recalibrate the score to our COVID-19 outcomes in the training dataset [38]. The CAN score was modeled using restricted cubic splines with five knots, and outcomes were predicted and assessed in the holdout dataset. We also calculated the absolute net reclassification index (NRI) for our full mortality model compared with the CAN score and VACO Index to assess the improvement in our model’s ability to classify patients into low- and high-risk groups [40]. The absolute NRI measures the absolute percentage of patients correctly reclassified by our developed model; larger positive NRIs indicate better classification. Model performance metrics and NRI were estimated in the holdout test dataset.
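As a concrete illustration of reclassification, a simplified absolute NRI across a single high/low risk threshold might be computed as below. The published NRI [40] has category- and event-specific variants; the threshold, function name, and toy data here are hypothetical:

```python
def absolute_nri(y, p_new, p_old, threshold=0.1):
    """Net proportion of patients whose high/low risk classification is
    corrected (moved the right way) by the new model versus the old one."""
    up = down = 0
    for yi, pn, po in zip(y, p_new, p_old):
        new_hi, old_hi = pn >= threshold, po >= threshold
        if new_hi == old_hi:
            continue  # classification unchanged
        if (yi == 1) == new_hi:  # events should move up, non-events down
            up += 1
        else:
            down += 1
    return (up - down) / len(y)

# One event correctly moved above threshold, one non-event correctly
# moved below it; two patients unchanged -> NRI = 2/4 = 0.5.
y = [1, 0, 1, 0]
assert absolute_nri(y, [0.2, 0.05, 0.15, 0.05],
                    [0.05, 0.2, 0.15, 0.05]) == 0.5
```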

To understand variable importance in our SL models, we calculated SHapley Additive exPlanations (SHAP) values from each model to quantify the average marginal contribution of each variable to the predicted outcome probability [41]. For computational efficiency, we estimated SHAP values from the individual learner assigned the most weight in each ensemble. SHAP values were estimated using the ‘shapviz’ R package (version 0.9.2). For visual clarity, we collapsed comorbidity indicators in the same domain (e.g., indicators for heart disease and heart failure were collapsed into ‘cardiovascular diseases’), collapsed one-hot coded categorical variables (e.g., individual race indicators), and plotted the top 15 feature domains by largest average absolute SHAP value.
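The domain-collapsing step can be made concrete: per-patient SHAP values for features in the same domain are summed (SHAP values are additive), and domains are then ranked by mean absolute value. A stdlib Python sketch follows; the feature and domain names are invented for illustration, and the paper performed this aggregation in R:

```python
from collections import defaultdict

def domain_importance(shap_values, domains):
    """Collapse per-feature SHAP values into domains (summing within each
    patient) and rank domains by mean absolute SHAP value.

    shap_values: {feature: [per-patient SHAP values]}
    domains:     {feature: domain label}; unmapped features stand alone.
    """
    n = len(next(iter(shap_values.values())))
    per_patient = defaultdict(lambda: [0.0] * n)
    for feat, vals in shap_values.items():
        dom = domains.get(feat, feat)
        for i, v in enumerate(vals):
            per_patient[dom][i] += v
    ranked = sorted(((sum(abs(v) for v in vals) / n, dom)
                     for dom, vals in per_patient.items()), reverse=True)
    return [(dom, round(mean_abs, 4)) for mean_abs, dom in ranked]

shap = {"heart_disease": [0.1, -0.2], "heart_failure": [0.05, 0.1],
        "age": [0.3, 0.4]}
doms = {"heart_disease": "cardiovascular", "heart_failure": "cardiovascular"}
assert domain_importance(shap, doms) == [("age", 0.35),
                                         ("cardiovascular", 0.125)]
```

Summing before taking absolute values matters: offsetting contributions within a domain (as in the first toy patient above) partially cancel, which is the intended behavior for additive attributions.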

Parsimonious model development

In addition to the full models, we developed and assessed a set of parsimonious models for each outcome, which included the subset of most important predictors from our full models (the top 15 feature domains with the highest SHAP values). To facilitate future model implementation in other healthcare systems, we omitted variables used exclusively within the VHA, including the CAN score, the facility complexity indicator, and residency at a Community Living Center (VA nursing home). These models were trained using the same learners and parameters used to fit the full ensemble models. We used the same performance metrics used to assess our full models and further calculated the absolute NRI for these parsimonious models compared with the full models. All analyses were conducted in R (version 4.3.1).

We followed the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) statement in preparing this manuscript [42]. The study was approved by the VA Central Institutional Review Board (CIRB). The CIRB waived the requirement for informed consent, as this study was secondary research involving the use of identifiable health information for the purposes of "health care operations" or "research" as those terms are defined at 45 CFR 164.501; only CIRB approved research team members were provided data access.

Results

Participants

We identified 198,174 VHA patients engaged in VHA care with a first positive COVID-19 test between March 1, 2022, and March 31, 2023; the cohort was predominantly male (174,110 [88%]), White (133,461 [67%]), and non-Hispanic or Latino (166,670 [84%]), with an average age of 63 years (Table 1). Comorbid conditions were common, with 141,256 (71%) having cardiovascular disease, 91,144 (46%) having a mental health diagnosis (including bipolar disorder, depression, post-traumatic stress disorder, and schizophrenia), and 19,999 (10%) who were immunocompromised. Most (156,956 [79%]) were vaccinated against COVID-19. The patients in the training and holdout test datasets were similar on nearly all characteristics except race (SMD = 0.18) and region (SMD = 0.35). There were 15,037 (8%) persons in the cohort who were hospitalized or died within 30 days of testing positive, including 2,444 (1%) who died (Table 1).

Table 1. Cohort characteristics used in model training and holdout testing.

https://doi.org/10.1371/journal.pone.0307235.t001

Model development

The ensemble models were successfully fitted with all ten pre-specified learners. In the composite and hospitalization models, only the XGBoost, random forest, and GAM learners received nonzero ensemble weights; the XGBoost learner accounted for over 60% of the ensemble weight in both models. The mortality model showed greater diversity in weighted learners (including LASSO regression and multivariate adaptive regression splines) with a more even weight distribution (Table 2).

Table 2. Ensemble model weights and 5-fold cross-validated area under the receiver operating characteristic curve (AUC) for individual learners and fitted ensemble model in training dataset.

https://doi.org/10.1371/journal.pone.0307235.t002

Overall, each model showed good cross-validated discrimination, with AUCs of 0.82 (range: 0.81–0.83) for the composite model and 0.80 (range: 0.80–0.81) for the hospitalization model (Table 2). The mortality model had the highest cross-validated AUC at 0.91 (range: 0.90–0.93). The full ensemble models generally had higher average cross-validated AUCs than the individually included learners.

Model performance in holdout test dataset

Histograms of predicted risks from each model are shown in S4 Fig in S1 File and display right-skewed distributions, with predicted risks ranging from <1% to 70% for the composite outcome model, <1% to 57% for the hospitalization model, and <1% to 49% for the mortality model. The ensemble models all performed well in the holdout test dataset, with AUCs >0.80 similar to the cross-validated estimates (Fig 1). The Brier scores for the ensemble models were all close to 0, with the largest score in the composite model (Brier score: 0.07) and the lowest in the mortality model (Brier score: 0.01), suggesting good prediction and classification accuracy. The ensemble models all showed good calibration, with calibration slopes close to 1, suggesting the spread of estimated risks generally aligns with observed risk. Calibration intercepts were all positive, suggesting slight underestimation of predicted risks, but the magnitude of miscalibration was low, with calibration intercepts all <0.3 (Fig 1). Model calibration by age, sex, race, ethnicity, region, and CAN score was generally consistent across subgroups (see S5 Fig in S1 File).

Fig 1.

Calibration plots and performance metrics in holdout test dataset for A) Full 30-day Composite Model, B) Full 30-day Hospitalization Model, C) Full 30-day Mortality Model, D) Parsimonious 30-day Composite Model, E) Parsimonious 30-day Hospitalization Model, and F) Parsimonious 30-day Mortality Model. White 45-degree line represents ideal model calibration. Solid black line represents the nonparametric local regression-smoothed calibration curve. Triangle points represent the average actual outcome probability for grouped deciles of the predicted probability. The rug plot at the bottom of each panel shows the distribution of predicted probabilities.

https://doi.org/10.1371/journal.pone.0307235.g001

Comparison with existing risk scores

Both the recalibrated CAN score and VACO Index had lower AUCs than our full ensemble models for predicting COVID-19 outcomes, although the AUCs for mortality in both the CAN score and VACO Index were >0.80 (S6 Fig in S1 File). In terms of model calibration, the CAN score showed nearly perfect calibration with intercepts <0.2 and slopes of 1. The VACO Index had an intercept of -2.1 and a calibration slope >1, suggesting more extreme predicted risks than observed in our holdout cohort. Compared with the VACO Index, our mortality model showed an absolute increase of +13% in correctly reclassifying 30-day mortality (S7 Table in S1 File). Compared with the recalibrated CAN score, our mortality model showed a decrease in the absolute NRI of -7%, driven by reclassification errors among patients without the outcome.

Variable importance

SHAP values were estimated from the XGBoost learner in each ensemble, since it received the greatest weight; values for the top 15 variable domains in each ensemble model are shown in Fig 2 (see S8 Table and S9 Fig in S1 File for SHAP values for all predictors). In all models, the CAN score had the highest importance, with SHAP values all >0.5 and highest in the mortality model (CAN score SHAP value: 1.35). The vulnerability and frailty indices were strong predictors of outcomes, with the CAN score, CCI, and ADI all among the top 15 predictors. Outpatient COVID-19 treatment and vaccination status were also important predictors, ranking among the top 5 by largest SHAP value.

Fig 2. Bee swarm plot showing distribution of SHAP values and mean absolute SHAP value from outcome models predicted on holdout test dataset.

Treatment SHAP value estimate includes indicators for receipt of outpatient nirmatrelvir-ritonavir, molnupiravir, or anti-SARS-CoV-2 monoclonal antibody treatment. Age includes age at index and indicator for being >65 years old. SHAP value estimates for cardiovascular diseases, immunocompromise, other diseases, substance use disorders, and mental health conditions, include SHAP values for all indicators within those domains (see S1 Table in S1 File for list of indicators and their domains used in prediction). ADI: area deprivation index; CCI: Charlson Comorbidity Index; CAN score: Care Assessment Needs score; CLC: community living center.

https://doi.org/10.1371/journal.pone.0307235.g002

Facility complexity level was also among the top 5 predictors in the composite and hospitalization models (but not the mortality model), suggesting treating facility complexity may be more relevant for hospitalization risk than death [43]. While cardiovascular disease, receipt of immunosuppressive medications or cancer therapies, and mental health conditions were among the top 15 predictors, other high-risk chronic conditions like diabetes, liver disease, and lung disease did not have high SHAP values. Notably, race had the 9th largest SHAP value (SHAP value: 0.076) in the mortality model but was not among the top 15 features for the composite or hospitalization models.

Parsimonious models

The performance and calibration of the parsimonious models trained using the subset of the top 15 most important predictors (see S10 Table in S1 File) are shown in Fig 1. Overall, these parsimonious models showed discrimination comparable to the fully fitted models (parsimonious composite model AUC: 0.79, parsimonious hospitalization model AUC: 0.77, and parsimonious mortality model AUC: 0.89), with slightly improved calibration in the composite and hospitalization models but poorer calibration in the parsimonious mortality model, where risks were underestimated. The absolute NRIs for the parsimonious models compared with the fully fitted models showed a loss in reclassification accuracy of about -2% in the composite and hospitalization models and of <1% in the mortality model (S11 Table in S1 File).

Discussion

Using data from a national cohort of veterans diagnosed with COVID-19 during the Omicron era, we developed three multivariable prediction models for 30-day COVID-19 hospitalization and death which demonstrated good discrimination and calibration in internal validation. We found age, composite measures of frailty including CCI and CAN score, select medical comorbidities such as cardiovascular disease, COVID-19 vaccination, and outpatient COVID-19 pharmacotherapies to be important predictors of adverse outcomes. Parsimonious models which included a small subset of predictors created to be accessible in non-VHA healthcare systems also showed comparable performance.

In all models, we found the CAN score to be the most important predictor of adverse COVID-19 outcomes. Osborne et al. previously demonstrated that the CAN score was associated with COVID-19 mortality in the first three months of the pandemic and recommended its use for risk stratification [31]. The CAN score is an automated and validated model for predicting patient hospitalization or death developed and used at the VHA [4]. In addition to demographic and comorbidity data, the CAN score includes healthcare utilization measurements such as emergency department visits and medication refills, and vital sign measurements such as blood pressure and heart rate [30]. While the CAN score was the most important predictor in our ensemble models and performed well alone once recalibrated (S6 Fig in S1 File), it is not available in healthcare systems outside the VHA. Our parsimonious models, which excluded the CAN score, performed comparably to our fully fitted models and could be implemented more widely.

Our models have multiple strengths. They address a significant gap in the field, given that only 5% of developed COVID-19 outcome models predict hospitalization risk [3]. We also focused on COVID-19 diagnoses during the Omicron era and included COVID-19 vaccines and treatments as predictors. Following the early pandemic and Delta era, COVID-19 hospitalization and mortality rates declined significantly, and effective vaccines and treatments became available; thus, early risk prediction models may no longer be well calibrated [44]. Furthermore, many prior risk models were developed during a period of lower population immunity [45]. Dickerman et al. have demonstrated how prediction model calibration can worsen when models are applied in settings where treatment strategies and availability differ from the context in which the models were developed [14]. We demonstrated that our prediction models outperformed the VACO Index [11], with improved discrimination, calibration, and classification for mortality. We chose not to assess the VACO Index’s performance on hospitalization outcomes since it was developed and validated only for mortality. Reasons for this improvement could include our use of an ensemble of models and a larger number of predictors; however, training on a more recent cohort reflecting contemporary contexts may be the most important factor, as we showed the VACO Index is no longer well calibrated.

Our models have important implications for clinical practice and research applications. Since our models were trained using predictors readily available in CSDR, it is possible to automate risk scores for all VHA patients, similar to the CAN score [4]. Identifying high-risk patients who might benefit most from outpatient COVID-19 treatments, for example, could facilitate improved prescribing, particularly as utilization remains low in many settings [46, 47]. In a recent study, we used these models to assess treatment effect heterogeneity of nirmatrelvir-ritonavir, an FDA-approved antiviral for COVID-19, and found that COVID-19 patients in the highest two quartiles of predicted risk experienced the greatest benefit (i.e., risk reduction) compared with patients in the lower two risk quartiles [48]. In addition, the parsimonious models, which exclude VHA-specific predictors, could be implemented more broadly in other healthcare systems and could facilitate recruitment of high-risk patients to power future clinical trials. While our models were developed specifically to predict COVID-19 outcomes, many of the predictors we included are likely relevant to other respiratory viral illnesses. As such, our models could serve as a base for developing models for other respiratory illnesses such as respiratory syncytial virus or influenza.
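The quartile-based heterogeneity analysis follows a simple pattern: bin patients by model-predicted risk, then compare outcomes between treated and untreated patients within each bin. A minimal sketch on simulated data — the risk distribution, treatment assignment, and effect size are all assumptions for illustration, not estimates from [48]:

```python
# Sketch: absolute risk reduction within quartiles of model-predicted risk.
import numpy as np

rng = np.random.default_rng(2)
risk = rng.beta(1, 20, 8000)                  # predicted 30-day risk
treated = rng.binomial(1, 0.3, 8000)          # illustrative treatment indicator
# Assumed data-generating process: treatment halves each patient's risk.
p = np.where(treated == 1, risk * 0.5, risk)
outcome = rng.binomial(1, p)

# Assign each patient to a predicted-risk quartile (0 = lowest, 3 = highest).
quartile = np.searchsorted(np.quantile(risk, [0.25, 0.5, 0.75]), risk)
rds = []
for q in range(4):
    m = quartile == q
    rd = outcome[m & (treated == 0)].mean() - outcome[m & (treated == 1)].mean()
    rds.append(rd)
    print(f"Q{q + 1}: absolute risk reduction = {rd:.4f}")
```

Because a constant relative effect scales with baseline risk, the absolute risk reduction concentrates in the upper quartiles — the pattern that motivates targeting treatment to high-risk patients.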

Limitations

Our risk models have several important limitations. First, we did not have data on re-infections, an important prognostic factor for severe COVID-19-related outcomes. Second, while we considered many predictors for adverse COVID-19 outcomes, it is possible we are missing important, unmeasured prognostic factors; however, such factors would also likely be unavailable for model implementation. Third, while we expect the models to perform well in an out-of-sample population given good cross-validated discrimination, we have not performed external validation. Since our models were developed using readily available predictors, validation in future infection periods is possible, and external validation of our parsimonious models in a separate healthcare system is feasible. Finally, while we included a diverse set of learners in our ensemble models, our set of ten learners is relatively small compared with typical super learner applications [20]. We limited the number of included learners for computational efficiency; future work could consider incorporating additional algorithms and models with multiple parameter sets to enhance prediction [22].
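The stacked-generalization structure of a super learner — cross-validated predictions from a library of base learners combined by a meta-learner [20, 21] — can be approximated with scikit-learn's `StackingClassifier`. The three base learners below are placeholders for illustration, not the study's library of ten:

```python
# Sketch: a small stacked ensemble in the spirit of the super learner.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Imbalanced toy data, loosely mimicking a rare adverse outcome.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9], random_state=0)

ensemble = StackingClassifier(
    estimators=[
        ("logit", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ],
    final_estimator=LogisticRegression(),  # meta-learner over CV predictions
    cv=5,                                  # inner CV for base-learner predictions
    stack_method="predict_proba",
)
auc = cross_val_score(ensemble, X, y, cv=3, scoring="roc_auc").mean()
print(f"cross-validated AUC: {auc:.3f}")
```

Expanding the `estimators` list (and including multiple hyperparameter settings per algorithm) is the kind of library enrichment suggested in [22], at the cost of a multiplicative increase in cross-validated fits.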

Conclusion

We developed machine learning risk prediction models that accurately estimate the risk of COVID-19 hospitalization and mortality in the VHA in the context of recent Omicron-era infections, COVID-19 vaccination, and antiviral treatment. These models may be used for risk stratification of individual patients to optimize care and to identify high-risk patients for inclusion in clinical trials.

Supporting information

S1 File. Supporting information, data, and results.

https://doi.org/10.1371/journal.pone.0307235.s001

(DOCX)

Acknowledgments

Disclaimer

The contents of this article do not represent the views of the U.S. Department of Veterans Affairs or the U.S. government.

Additional contributions

We thank the Biomedical Advanced Research and Development Authority (interagency agreement No. AAI21050) and US Food and Drug Administration (interagency agreement No. 75F40121S30013) for their support. These institutions did not directly contribute to this study.

References

  1. Wang L, Zhang Y, Wang D, Tong X, Liu T, Zhang S, et al. Artificial Intelligence for COVID-19: A Systematic Review. Front Med (Lausanne). 2021;8: 704256. pmid:34660623
  2. Shakibfar S, Nyberg F, Li H, Zhao J, Nordeng HME, Sandve GKF, et al. Artificial intelligence-driven prediction of COVID-19-related hospitalization and death: a systematic review. Front Public Health. 2023;11: 1183725. pmid:37408750
  3. Appel KS, Geisler R, Maier D, Miljukov O, Hopff SM, Vehreschild JJ. A Systematic Review of Predictor Composition, Outcomes, Risk of Bias, and Validation of COVID-19 Prognostic Scores. Clinical Infectious Diseases. 2023; ciad618. pmid:37879096
  4. Atkins D, Makridis CA, Alterovitz G, Ramoni R, Clancy C. Developing and Implementing Predictive Models in a Learning Healthcare System: Traditional and Artificial Intelligence Approaches in the Veterans Health Administration. Annu Rev Biomed Data Sci. 2022;5: 393–413. pmid:35609894
  5. Ahmad FB, Cisewski JA, Xu J, Anderson RN. COVID-19 Mortality Update—United States, 2022. MMWR Morb Mortal Wkly Rep. 2023;72: 493–496. pmid:37141157
  6. Bajema KL, Berry K, Streja E, Rajeevan N, Li Y, Mutalik P, et al. Effectiveness of COVID-19 Treatment With Nirmatrelvir–Ritonavir or Molnupiravir Among U.S. Veterans: Target Trial Emulation Studies With One-Month and Six-Month Outcomes. Ann Intern Med. 2023;176: 807–816. pmid:37276589
  7. Hammond J, Leister-Tebbe H, Gardner A, Abreu P, Bao W, Wisemandle W, et al. Oral Nirmatrelvir for High-Risk, Nonhospitalized Adults with Covid-19. N Engl J Med. 2022;386: 1397–1408. pmid:35172054
  8. Barda N, Riesel D, Akriv A, Levy J, Finkel U, Yona G, et al. Developing a COVID-19 mortality risk prediction model when individual-level data are not available. Nat Commun. 2020;11: 4439. pmid:32895375
  9. Cisterna-García A, Guillén-Teruel A, Caracena M, Pérez E, Jiménez F, Francisco-Verdú FJ, et al. A predictive model for hospitalization and survival to COVID-19 in a retrospective population-based study. Scientific Reports. 2022;12: 18126. pmid:36307436
  10. Dashti H, Roche EC, Bates DW, Mora S, Demler O. SARS2 simplified scores to estimate risk of hospitalization and death among patients with COVID-19. Scientific Reports. 2021;11: 4945. pmid:33654180
  11. King JT Jr, Yoon JS, Rentsch CT, Tate JP, Park LS, Kidwai-Khan F, et al. Development and validation of a 30-day mortality index based on pre-existing medical administrative data from 13,323 COVID-19 patients: The Veterans Health Administration COVID-19 (VACO) Index. PLOS ONE. 2020;15: e0241825. pmid:33175863
  12. Murtas R, Morici N, Cogliati C, Puoti M, Omazzi B, Bergamaschi W, et al. Algorithm for Individual Prediction of COVID-19–Related Hospitalization Based on Symptoms: Development and Implementation Study. JMIR Public Health Surveill. 2021;7: e29504. pmid:34543227
  13. Yuan Y, Sun C, Tang X, Cheng C, Mombaerts L, Wang M, et al. Development and Validation of a Prognostic Risk Score System for COVID-19 Inpatients: A Multi-Center Retrospective Study in China. Engineering (Beijing). 2022;8: 116–121. pmid:33282444
  14. Dickerman BA, Dahabreh IJ, Cantos KV, Logan RW, Lodi S, Rentsch CT, et al. Predicting counterfactual risks under hypothetical treatment strategies: an application to HIV. European Journal of Epidemiology. 2022;37: 367–376. pmid:35190946
  15. Subbaswamy A, Saria S. From development to deployment: dataset shift, causality, and shift-stable models in health AI. Biostatistics. 2020;21: 345–352. pmid:31742354
  16. Gao Y, Cai G-Y, Fang W, Li H-Y, Wang S-Y, Chen L, et al. Machine learning based early warning system enables accurate mortality risk prediction for COVID-19. Nature Communications. 2020;11: 5033. pmid:33024092
  17. Horvath A, Lind T, Frece N, Wurzer H, Stadlbauer V. Validation of a simple risk stratification tool for COVID-19 mortality. Frontiers in Medicine. 2022;9. Available: https://www.frontiersin.org/articles/10.3389/fmed.2022.1016180 pmid:36304183
  18. Knight SR, Ho A, Pius R, Buchan I, Carson G, Drake TM, et al. Risk stratification of patients admitted to hospital with covid-19 using the ISARIC WHO Clinical Characterisation Protocol: development and validation of the 4C Mortality Score. BMJ. 2020;370: m3339. pmid:32907855
  19. Goodacre S, Thomas B, Sutton L, Burnsall M, Lee E, Bradburn M, et al. Derivation and validation of a clinical severity score for acutely ill adults with suspected COVID-19: The PRIEST observational cohort study. PLOS ONE. 2021;16: e0245840. pmid:33481930
  20. van der Laan MJ, Polley EC, Hubbard AE. Super learner. Stat Appl Genet Mol Biol. 2007;6: Article25. pmid:17910531
  21. Naimi AI, Balzer LB. Stacked generalization: an introduction to super learning. European Journal of Epidemiology. 2018;33: 459–464. pmid:29637384
  22. Phillips RV, van der Laan MJ, Lee H, Gruber S. Practical considerations for specifying a super learner. International Journal of Epidemiology. 2023;52: 1276–1285. pmid:36905602
  23. Admon AJ, Wander PL, Iwashyna TJ, Ioannou GN, Boyko EJ, Hynes DM, et al. Consensus elements for observational research on COVID-19-related long-term outcomes. Medicine. 2022;101: 8. pmid:36401423
  24. Ioannou GN, Green P, Fan VS, Dominitz JA, O’Hare AM, Backus LI, et al. Development of COVIDVax Model to Estimate the Risk of SARS-CoV-2–Related Death Among 7.6 Million US Veterans for Use in Vaccination Prioritization. JAMA Network Open. 2021;4: e214347–e214347. pmid:33822066
  25. Ioannou GN, O’Hare AM, Berry K, Fan VS, Crothers K, Eastment MC, et al. Trends Over Time in the Risk of Adverse Outcomes Among Patients With Severe Acute Respiratory Syndrome Coronavirus 2 Infection. Clin Infect Dis. 2022;74: 416–426. pmid:33973000
  26. Pajouheshnia R, Peelen LM, Moons KGM, Reitsma JB, Groenwold RHH. Accounting for treatment use when validating a prognostic model: a simulation study. BMC Medical Research Methodology. 2017;17: 103. pmid:28709404
  27. Groenwold RHH, Moons KGM, Pajouheshnia R, Altman DG, Collins GS, Debray TPA, et al. Explicit inclusion of treatment in prognostic modeling was recommended in observational and randomized settings. Journal of Clinical Epidemiology. 2016;78: 90–100. pmid:27045189
  28. Ghirimoldi FM, Schmidt S, Simon RC, Wang C-P, Wang Z, Brimhall BB, et al. Association of Socioeconomic Area Deprivation Index with Hospital Readmissions After Colon and Rectal Surgery. J Gastrointest Surg. 2021;25: 795–808. pmid:32901424
  29. Charlson ME, Pompei P, Ales KL, MacKenzie CR. A new method of classifying prognostic comorbidity in longitudinal studies: Development and validation. Journal of Chronic Diseases. 1987;40: 373–383. pmid:3558716
  30. Wang L, Porter B, Maynard C, Evans G, Bryson C, Sun H, et al. Predicting Risk of Hospitalization or Death Among Patients Receiving Primary Care in the Veterans Health Administration. Medical Care. 2013;51. Available: https://journals.lww.com/lww-medicalcare/fulltext/2013/04000/predicting_risk_of_hospitalization_or_death_among.12.aspx
  31. Osborne TF, Veigulis ZP, Arreola DM, Röösli E, Curtin CM. Automated EHR score to predict COVID-19 outcomes at US Department of Veterans Affairs. PLOS ONE. 2020;15: e0236554. pmid:32716922
  32. Gruber S, Lee H, Phillips R, Ho M, van der Laan M. Developing a Targeted Learning-Based Statistical Analysis Plan. Statistics in Biopharmaceutical Research. 2023;15: 468–475.
  33. Butzin-Dozier Z, Ji Y, Li H, Coyle J, Shi J, Philips RV, et al. Predicting Long COVID in the National COVID Cohort Collaborative Using Super Learner. medRxiv. 2023; 2023.07.27.
  34. Pfaff ER, Girvin AT, Bennett TD, Bhatia A, Brooks IM, Deer RR, et al. Identifying who has long COVID in the USA: a machine learning approach using N3C data. The Lancet Digital Health. 2022;4: e532–e541. pmid:35589549
  35. LeDell E, van der Laan MJ, Petersen M. AUC-Maximizing Ensembles through Metalearning. Int J Biostat. 2016;12: 203–218. pmid:27227721
  36. Kuhn M. Building Predictive Models in R Using the caret Package. J Stat Soft. 2008;28: 1–26.
  37. Austin PC, Steyerberg EW. Graphical assessment of internal and external calibration of logistic regression models by using loess smoothers. Stat Med. 2014;33: 517–535. pmid:24002997
  38. Huang Y, Li W, Macheret F, Gabriel RA, Ohno-Machado L. A tutorial on calibration measurements and calibration models for clinical prediction models. Journal of the American Medical Informatics Association. 2020;27: 621–633. pmid:32106284
  39. Veterans Health Administration COVID-19 (VACO) Index for COVID-19 Mortality. In: MDCalc [Internet]. [cited 3 Nov 2023]. Available: https://www.mdcalc.com/calc/10346/veterans-health-administration-covid-19-vaco-index-covid-19-mortality
  40. Alba AC, Agoritsas T, Walsh M, Hanna S, Iorio A, Devereaux PJ, et al. Discrimination and Calibration of Clinical Prediction Models: Users’ Guides to the Medical Literature. JAMA. 2017;318: 1377–1384. pmid:29049590
  41. Lundberg SM, Lee S-I. A unified approach to interpreting model predictions. Advances in neural information processing systems. 2017;30.
  42. Collins GS, Reitsma JB, Altman DG, Moons KGM. Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): The TRIPOD Statement. Ann Intern Med. 2015;162: 55–63. pmid:25560714
  43. National Academies of Sciences, Engineering, and Medicine. Nature of Veterans Health Administration Facilities Management (Engineering) Tasks and Staffing. In: Facilities Staffing Requirements for the Veterans Health Administration—Resource Planning and Methodology for the Future. National Academies Press (US); 2019. Available: https://www.ncbi.nlm.nih.gov/books/NBK555777/
  44. Finlayson SG, Subbaswamy A, Singh K, Bowers J, Kupke A, Zittrain J, et al. The Clinician and Dataset Shift in Artificial Intelligence. N Engl J Med. 2021;385: 283–286. pmid:34260843
  45. CDC. COVID Data Tracker. In: Centers for Disease Control and Prevention [Internet]. 28 Mar 2020 [cited 1 Nov 2023]. Available: https://covid.cdc.gov/covid-data-tracker/index.html#nationwide-blood-donor-seroprevalence
  46. Yan L, Streja E, Li Y, Rajeevan N, Rowneki M, Berry K, et al. Anti–SARS-CoV-2 Pharmacotherapies Among Nonhospitalized US Veterans, January 2022 to January 2023. JAMA Netw Open. 2023;6: e2331249. pmid:37651140
  47. Sullivan M, Perrine CG, Kelleher J, Kanwar O, Kuwabara S, Bennett K, et al. Notes From the Field: Dispensing of Oral Antiviral Drugs for Treatment of COVID-19 by Zip Code-Level Social Vulnerability—United States, December 23, 2021-August 28, 2022. MMWR Morb Mortal Wkly Rep. 2022;71: 1384–1385. pmid:36301746
  48. Yan L, Bui D, Li Y, Rajeevan N, Rowneki M, Berry K, et al. Identifying Veterans Who Benefit From Nirmatrelvir-Ritonavir: A Target Trial Emulation. Clinical Infectious Diseases. 2024; ciae202. pmid:38864601