Development and validation of a 30-day mortality index based on pre-existing medical administrative data from 13,323 COVID-19 patients: The Veterans Health Administration COVID-19 (VACO) Index

Background Available COVID-19 mortality indices are limited to acute inpatient data. Using nationwide medical administrative data available prior to SARS-CoV-2 infection from the US Veterans Health Administration (VA), we developed the VA COVID-19 (VACO) 30-day mortality index and validated the index in two independent, prospective samples. Methods and findings We reviewed SARS-CoV-2 testing results within the VA between February 8 and August 18, 2020. The sample was split into a development cohort (test positive between March 2 and April 15, 2020), an early validation cohort (test positive between April 16 and May 18, 2020), and a late validation cohort (test positive between May 19 and July 19, 2020). Our logistic regression model in the development cohort considered demographics (age, sex, race/ethnicity), pre-existing medical conditions, and the Charlson Comorbidity Index (CCI), derived from ICD-10 diagnosis codes. Weights were fixed to create the VACO Index, which was then validated by comparing areas under the receiver operating characteristic curve (AUC) in the early and late validation cohorts and among important validation cohort subgroups defined by sex, race/ethnicity, and geographic region. We also evaluated calibration curves and the range of predictions generated within age categories. In total, 13,323 individuals tested positive for SARS-CoV-2 (median age: 63 years; 91% male; 42% non-Hispanic Black). We observed 480/3,681 (13%) deaths in the development cohort, 253/2,151 (12%) deaths in the early validation cohort, and 403/7,491 (5%) deaths in the late validation cohort. Age, multimorbidity described with the CCI, and a history of myocardial infarction or peripheral vascular disease were independently associated with mortality; no other individual comorbid diagnosis provided additional information. The VACO Index discriminated mortality in development (AUC = 0.79, 95% CI: 0.77–0.81), and in early (AUC = 0.81, 95% CI: 0.78–0.83) and late (AUC = 0.84, 95% CI: 0.78–0.86) validation. 
The VACO Index allows personalized estimates of 30-day mortality after COVID-19 infection. For example, among those aged 60–64 years, overall mortality was estimated at 9% (95% CI: 6–11%). The Index further discriminated risk in this age stratum from 4% (95% CI: 3–7%) to 21% (95% CI: 12–31%), depending on sex and comorbid disease. Conclusion Prior to infection, demographics and comorbid conditions can discriminate COVID-19 mortality risk overall and within age strata. The VACO Index reproducibly identified individuals at substantial risk of COVID-19 mortality who might consider continuing social distancing, despite relaxed state and local guidelines.


Introduction
The highly contagious nature of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the lack of widespread immunity, and the absence of an effective vaccine ensure continued spread of the virus among the general population [1]. As state and local authorities relax guidelines, we need accurate and reliable means of identifying those at greatest risk should they become infected, to inform both personal choices and public policy.
Several studies have identified risk factors for mortality associated with coronavirus disease 2019 (COVID-19) in the inpatient setting [2-7]. However, these analyses do not adequately address how to identify at-risk individuals before infection, for several reasons. First, they were not based exclusively on data present prior to SARS-CoV-2 infection. Second, the models require data not routinely available or directly analyzable from administrative databases or electronic health records (EHR), making them difficult to apply in real time to large patient populations. Third, a recent systematic review [4] found that most SARS-CoV-2 infection outcome models were based on limited sample sizes, were likely overfit, and were not validated in independent data.
The Veterans Health Administration (VA) is the largest integrated health care system in the United States, providing care at 1,255 health care facilities, including 170 medical centers and 1,074 outpatient sites, serving 6 million Veterans each year. Using data routinely available and directly analyzable in the VA national system, we developed the VA COVID-19 (VACO) Index estimating 30-day COVID-19 mortality after a positive test based on demographics and pre-existing conditions, and validated its discrimination and calibration. We explored the VACO Index performance in two different time intervals of the pandemic, and in important clinical subgroups by sex, race/ethnicity, geographic region, and within age strata.

Data source and participants
We obtained individual patient data on August 19, 2020 from the VA Corporate Data Warehouse, which includes daily updates from over 1,200 facilities across the United States. All Veterans who were alive as of January 1, 2020 and active in care were eligible; active in care was defined as having at least one clinical encounter between January 1, 2018 and December 31, 2019 with either a recorded blood pressure or a routine laboratory test result (complete blood count, serum creatinine, alanine aminotransferase, or aspartate aminotransferase). We included patients who tested positive for SARS-CoV-2 in inpatient or outpatient settings between March 2 and July 18, 2020 and followed them for 30 days.
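The eligibility rule above can be expressed as a small predicate. This is a minimal sketch under stated assumptions: the `Encounter` record, its field names, and the lab identifiers are hypothetical (the Corporate Data Warehouse schema is not described here), and treating a death on January 1, 2020 itself as still "alive as of" that date is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional

# Routine labs named in the eligibility definition (identifiers are illustrative).
ROUTINE_LABS = frozenset({
    "complete_blood_count",
    "serum_creatinine",
    "alanine_aminotransferase",
    "aspartate_aminotransferase",
})
ACTIVE_START = date(2018, 1, 1)
ACTIVE_END = date(2019, 12, 31)
ALIVE_AS_OF = date(2020, 1, 1)


@dataclass(frozen=True)
class Encounter:
    """Hypothetical clinical encounter record."""
    day: date
    has_blood_pressure: bool = False
    labs: frozenset = frozenset()


def active_in_care(encounters: Iterable[Encounter]) -> bool:
    """True if any 2018-2019 encounter recorded a blood pressure or a routine lab."""
    return any(
        ACTIVE_START <= e.day <= ACTIVE_END
        and (e.has_blood_pressure or bool(e.labs & ROUTINE_LABS))
        for e in encounters
    )


def eligible(death_date: Optional[date], encounters: Iterable[Encounter]) -> bool:
    """Alive as of January 1, 2020 (assumed: no death before that day) and active in care."""
    alive = death_date is None or death_date >= ALIVE_AS_OF
    return alive and active_in_care(encounters)
```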
We identified tested individuals using text searches of laboratory results containing terms consistent with SARS-CoV-2 or COVID-19. Nearly all tests used nasopharyngeal swabs; <1% were from other sources, and serum tests were excluded. Testing was performed in VA, state public health, and commercial reference laboratories using SARS-CoV-2 assays with Emergency Use Authorization. If an individual had more than one test, we used the date of their first positive test. Baseline was defined as the date of specimen collection unless testing occurred during hospitalization, in which case it was defined as the date of admission. If admission began more than 14 days prior to testing, possibly indicating nosocomial infection, we set the baseline to 14 days prior to testing to delineate health status before SARS-CoV-2 infection.
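The first-positive-test and baseline-date rules can be sketched as pure functions. The function names and arguments are hypothetical; `admission_date` is assumed to be the start of the hospitalization during which the specimen was collected, or `None` when testing occurred outside hospitalization.

```python
from datetime import date, timedelta
from typing import Iterable, Optional

MAX_PRETEST_ADMISSION_DAYS = 14


def first_positive(test_dates: Iterable[date]) -> date:
    """Earliest positive test date when an individual has several positives."""
    return min(test_dates)


def baseline_date(specimen_date: date, admission_date: Optional[date] = None) -> date:
    """Baseline per the rule in the text.

    admission_date: start of the hospitalization during which the specimen was
    collected, or None if testing happened outside hospitalization (assumed inputs).
    """
    if admission_date is None:
        return specimen_date
    earliest_allowed = specimen_date - timedelta(days=MAX_PRETEST_ADMISSION_DAYS)
    # Admission more than 14 days before testing may indicate nosocomial infection,
    # so baseline is capped at 14 days before the test.
    return max(admission_date, earliest_allowed)
```

For example, a specimen collected on April 20 during a stay admitted on March 1 gets a baseline of April 6.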
The data were split into a development cohort (positive test between March 2 and April 15, 2020), an early validation cohort (positive test between April 16 and May 18, 2020), and a late validation cohort (positive test between May 19 and July 19, 2020). The date of last follow-up was August 18, 2020, allowing 30 days of follow-up after testing for all patients. This study was conducted in compliance with the Health Insurance Portability and Accountability Act (HIPAA) and was approved by the Institutional Review Boards of VA Connecticut Healthcare System and Yale University, both of which granted waivers of consent. This cohort study is reported according to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines (S1 Checklist).
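Cohort assignment by date of first positive test, and the 30-day outcome window, might look like the following sketch. The cohort labels are illustrative; the boundaries are the inclusive dates given in the text. A July 19 test plus 30 days ends on August 18, matching the stated last follow-up date.

```python
from datetime import date, timedelta
from typing import Optional

# Inclusive date ranges for the date of first positive test, as stated in the text.
COHORTS = (
    ("development", date(2020, 3, 2), date(2020, 4, 15)),
    ("early_validation", date(2020, 4, 16), date(2020, 5, 18)),
    ("late_validation", date(2020, 5, 19), date(2020, 7, 19)),
)
FOLLOW_UP_DAYS = 30


def assign_cohort(first_positive: date) -> Optional[str]:
    """Return the cohort label, or None if the date falls outside all windows."""
    for name, start, end in COHORTS:
        if start <= first_positive <= end:
            return name
    return None


def follow_up_end(baseline: date) -> date:
    """Last day of the 30-day mortality window."""
    return baseline + timedelta(days=FOLLOW_UP_DAYS)
```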

VACO Index development: Candidate predictors
We began by performing a literature review to identify candidate demographic and medical condition predictors available in medical administrative records. Demographic variables included age, sex (male or female), and race and ethnicity (non-Hispanic Black, non-Hispanic White, Hispanic, or other). Medical conditions included individual components of the Charlson Comorbidity Index (CCI) and the CCI without an age adjustment, derived from International Classification of Diseases, Tenth Revision (ICD-10) codes [8,9] present between 730 and 15 days before COVID-19 testing (S1 Table). Using a previously validated grouping of ICD-10 code-defined comorbidities recorded during at least one inpatient or two outpatient encounters within the past two years [10,11], we also considered conditions reported by other investigators as associated with COVID-19 mortality that are not included in the CCI: asthma and hypertension [12-14].
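A sketch of the diagnosis lookback logic follows. It assumes each diagnosis record is a (date, setting) pair representing one distinct encounter, and it evaluates the one-inpatient-or-two-outpatient rule inside the same 730-to-15-day window used for the CCI components; both the record shape and that combination are assumptions, not the study's documented code.

```python
from datetime import date, timedelta
from typing import Iterable, Tuple

LOOKBACK_FAR_DAYS = 730   # earliest qualifying diagnosis: 730 days before testing
LOOKBACK_NEAR_DAYS = 15   # latest qualifying diagnosis: 15 days before testing


def in_lookback(dx_date: date, test_date: date) -> bool:
    """True if the diagnosis falls 730 to 15 days (inclusive) before testing."""
    earliest = test_date - timedelta(days=LOOKBACK_FAR_DAYS)
    latest = test_date - timedelta(days=LOOKBACK_NEAR_DAYS)
    return earliest <= dx_date <= latest


def condition_present(records: Iterable[Tuple[date, str]], test_date: date) -> bool:
    """At least one inpatient or two outpatient qualifying records in the window.

    Each record is (diagnosis date, "inpatient" | "outpatient") and is assumed
    to represent a distinct encounter.
    """
    inpatient = outpatient = 0
    for dx_date, setting in records:
        if not in_lookback(dx_date, test_date):
            continue
        if setting == "inpatient":
            inpatient += 1
        elif setting == "outpatient":
            outpatient += 1
    return inpatient >= 1 or outpatient >= 2
```

Excluding the final 15 days keeps diagnoses made during the COVID-19 workup itself out of the "pre-existing" conditions.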
Deaths were identified using inpatient records and the VA death registry, which captures deaths occurring outside hospitalization. Previous research has demonstrated that these combined sources are as accurate as, and more up to date than, the National Death Index [15].

Statistical analyses
All data analyses were performed using Stata, version 15.1 (StataCorp, College Station, TX). We assessed the distribution of variables in the development cohort, and their associations and functional forms with 30-day mortality, using unadjusted and multivariable logistic regression models. All variables with P<0.1 in unadjusted models were evaluated for inclusion in the adjusted models and retained in the final adjusted model if P<0.05. We double-checked the final multivariable model by reinserting previously excluded individual comorbidity and condition variables and assessing their significance; none attained significance at P<0.05. Sex was included in the final multivariable model regardless of P-value. CCI values with similar mortality rates were collapsed into five categories (0, 1-3, 4-5, 6-9, 10+). We explored interactions between variables; a significant interaction between age and CCI below the age of 85 was incorporated into the final model.
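The CCI collapsing and the two P-value thresholds described above can be sketched as follows. The P-values in any usage are hypothetical inputs (the fitting itself was done in Stata and is not reproduced here), and the variable names are illustrative.

```python
from typing import Dict, Iterable, List


def cci_category(cci: int) -> str:
    """Collapse a raw CCI score into the five categories used in the model."""
    if cci < 0:
        raise ValueError("CCI cannot be negative")
    if cci == 0:
        return "0"
    if cci <= 3:
        return "1-3"
    if cci <= 5:
        return "4-5"
    if cci <= 9:
        return "6-9"
    return "10+"


def screen(unadjusted_p: Dict[str, float], alpha: float = 0.1) -> List[str]:
    """Candidates carried into adjusted models (unadjusted P < 0.1)."""
    return sorted(v for v, p in unadjusted_p.items() if p < alpha)


def retain(adjusted_p: Dict[str, float], alpha: float = 0.05,
           forced: Iterable[str] = ("sex",)) -> List[str]:
    """Variables kept in the final model: adjusted P < 0.05, plus forced-in terms."""
    forced = set(forced)
    return sorted(v for v, p in adjusted_p.items() if p < alpha or v in forced)
```

Forcing `sex` into the model mirrors the text's statement that sex was kept regardless of P-value.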

Model validation and calibration
We report areas under the receiver operating characteristic curve (AUC) and calibration curves as assessments of VACO Index performance in the development and validation samples. To validate performance, we froze the statistical weights from the final development model and then generated risk prediction scores for individuals in the validation cohorts. We used the early and late validation cohorts, and a combined validation cohort, to evaluate Index performance overall and in important subgroups: sex (male vs female), race/ethnicity (Black vs non-Black), and VA-defined geographic regions combined to generate two approximately equal population samples (Northeast and West vs Southeast and Midwest). We assessed Index calibration with the Hosmer-Lemeshow goodness-of-fit test in the development cohort, and with plots of observed versus predicted 30-day mortality in 10 strata containing equal numbers of deaths, in the development and validation cohorts and in validation cohort subgroups by sex, race/ethnicity, and geographic region. We also compared the range of predicted mortality values stratified by age category.
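Applying frozen weights and computing the AUC can be illustrated with a short sketch. The intercept, weights, and covariate names are hypothetical placeholders, not the published VACO coefficients; the AUC is computed as the Mann-Whitney concordance probability, which equals the area under the ROC curve.

```python
import math
from typing import Dict, Sequence


def predict(intercept: float, weights: Dict[str, float], x: Dict[str, float]) -> float:
    """Predicted probability from frozen logistic-regression weights.

    Missing covariates in x are treated as 0 (e.g., an absent indicator).
    """
    z = intercept + sum(w * x.get(name, 0.0) for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))


def auc(scores: Sequence[float], died: Sequence[int]) -> float:
    """AUC as P(score of a random death > score of a random survivor); ties count 1/2."""
    pos = [s for s, y in zip(scores, died) if y == 1]
    neg = [s for s, y in zip(scores, died) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic; it is adequate for a cohort of a few thousand but a rank-based formula would be preferable for larger samples. Because the weights are never refit on validation data, any drop in AUC reflects genuine loss of discrimination rather than overfitting to the new sample.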

Participants
Among tests performed from February 8 to July 19, 2020, we identified 13,323 individuals who tested positive for SARS-CoV-2 in the VA and met our inclusion criteria. The first positive VA test was on March 2, 2020. Based on the date of their first positive test, we assigned 3,681 patients to the development cohort, 2,151 patients to the early validation cohort, and 7,491 patients to the late validation cohort (Fig 1). As of August 18, 2020, we observed 1,136 deaths (9%): 480 (13%) in the development cohort, 253 (12%) in the early validation cohort, and 403 (5%) in the late validation cohort. The development cohort was older (median age: 64.8 vs 62.3 years), with a higher proportion of non-Hispanic Black patients (52% vs 38%) and a higher proportion of males (93% vs 90%) than the combined validation cohorts (Table 1). The development cohort also had fewer patients with a Charlson Comorbidity Index of zero, indicating absence of comorbid disease (26% vs 35%).
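The cohort counts and mortality percentages above can be cross-checked with simple arithmetic, using only the figures reported in the text.

```python
# Deaths and cohort sizes as reported in the text.
cohorts = {
    "development": (480, 3681),
    "early_validation": (253, 2151),
    "late_validation": (403, 7491),
}

total_deaths = sum(d for d, _ in cohorts.values())
total_n = sum(n for _, n in cohorts.values())
val_deaths = cohorts["early_validation"][0] + cohorts["late_validation"][0]
val_n = cohorts["early_validation"][1] + cohorts["late_validation"][1]

for name, (d, n) in cohorts.items():
    print(f"{name}: {d}/{n} = {100 * d / n:.1f}%")
print(f"combined validation: {val_deaths}/{val_n} = {100 * val_deaths / val_n:.1f}%")
print(f"overall: {total_deaths}/{total_n} = {100 * total_deaths / total_n:.1f}%")
```

This prints 13.0%, 11.8%, and 5.4% for the three cohorts, 6.8% for the combined validation cohort of 9,642 patients, and 8.5% overall (1,136/13,323); rounded to whole percentages these agree with the 13%, 12%, 5%, and 9% reported.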

VACO Index development
Unadjusted analyses demonstrated strong associations between multiple candidate predictors and 30-day mortality in the development cohort (Table 2). The strongest predictor was age, with mortality ranging from 0.3% among those under age 50 to 44% among those aged 90 or older. Women experienced lower mortality than men. Before adjustment, non-Hispanic White patients had higher mortality, although these differences vanished after adjustment for age and CCI. Many pre-existing conditions were associated with mortality, including prior myocardial infarction (MI), chronic kidney disease (CKD), chronic lung disease, diabetes with complications, hypertension, and peripheral vascular disease (PVD), both individually and when combined in the CCI.

VACO Index specification and performance
Age alone was strongly associated with mortality (Table 2), with an AUC of 0.77 (95% CI: 0.75-0.79). There was a significant interaction between CCI and age below the age of 85. Discrimination improved in the multivariable model after supplementing age with sex, CCI, and MI or PVD (AUC: 0.79, 95% CI: 0.77-0.81; Fig 2). When applied to the validation cohorts, the VACO Index maintained good discrimination in the early (AUC: 0.81, 95% CI: 0.78-0.83) and late (AUC: 0.84, 95% CI: 0.78-0.86) validation cohorts. The AUCs for important

Calibration and discrimination of the VACO Index beyond age alone
Hosmer-Lemeshow goodness-of-fit testing supported good calibration of the Index in development (P = 0.847, indicating no significant lack of fit). Calibration curves of predicted versus observed 30-day mortality illustrated good calibration of the VACO Index in development, with modest overestimation of mortality in the early and late validation cohorts, in which overall observed mortality rates progressively decreased (Fig 3). The VACO Index demonstrated stable performance between the development and combined validation cohorts across sex, race/ethnicity, and geographic region subgroups (Fig 4). The VACO Index can be used to estimate 30-day COVID-19 mortality risk by age stratum and covariates (Fig 5; S1 File). For example, among males 60-64 years of age, overall mortality was estimated as 9% (95% CI: 6-11%). The VACO Index provided risk estimates ranging from 5% (95% CI: 3-7%) for men with a CCI of zero, indicating no comorbidity, to 22% (95% CI: 12-31%) for men with a CCI of 10 or more and a history of MI or PVD. Similar trends were seen across other age strata.
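For readers unfamiliar with the Hosmer-Lemeshow test, a minimal sketch of its standard form follows: sort patients by predicted risk, split them into groups of roughly equal size, and compare observed with expected deaths. Note that this standard grouping differs from the strata with equal numbers of deaths used for the calibration plots, and it is not a reproduction of the Stata output reported above. The p-value uses the exact closed form of the chi-square upper tail, which holds only for even degrees of freedom (10 groups give 8).

```python
import math
from typing import List, Sequence, Tuple


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; exact closed form valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(pred: Sequence[float], died: Sequence[int],
                    groups: int = 10) -> Tuple[float, float, List[Tuple[float, float]]]:
    """Return (statistic, p-value, [(mean predicted, observed rate) per group])."""
    order = sorted(range(len(pred)), key=lambda i: pred[i])
    n = len(order)
    stat = 0.0
    curve = []
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue
        size = len(idx)
        expected = sum(pred[i] for i in idx)
        observed = sum(died[i] for i in idx)
        mean_p = expected / size
        # (O - E)^2 / (N * p * (1 - p)) combines the death and survivor cells.
        denom = expected * (1.0 - mean_p)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
        curve.append((mean_p, observed / size))
    return stat, chi2_sf_even_df(stat, groups - 2), curve
```

The `curve` pairs are the points of an observed-versus-predicted calibration plot; a large P-value, like the one reported for development, indicates no detectable lack of fit.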

Discussion
Using information present prior to SARS-CoV-2 infection from a national healthcare system, we created a practical index that can predict 30-day COVID-19 mortality and validated it in two prospective, independent samples. The VACO Index is based on real-world data routinely available in medical administrative datasets. Our findings describe the experience of a large, racially and ethnically diverse, fully integrated healthcare system, encompassing inpatient and outpatient care. Discrimination of the VACO Index was maintained in both validation samples, and despite major changes in overall observed mortality over time, the Index only modestly overestimated mortality in those samples. The VACO Index identifies individuals at greatest risk for COVID-19 mortality, enabling patients, providers, healthcare systems, insurers, and accountable care organizations to make better-informed decisions.
We are one of the first groups to use pre-existing information and multivariable modeling to generate a mortality risk index, and our findings are likely more generalizable than those of earlier studies [16]. Our sample was larger than most prior studies, and we included patients testing positive for SARS-CoV-2 in both inpatient and outpatient settings. Most importantly, discrimination and calibration of the VACO Index validated well for two different time periods in the pandemic, and among important subgroups including men and women, racial/ethnic minorities, and those living in different geographic regions of the US.

PLOS ONE
30-day COVID-19 mortality index based on pre-existing data

The strong relationship between age and COVID-19 mortality has been a consistent finding across multiple studies [17-19], and age was the strongest predictor in both unadjusted and adjusted analyses. The VACO Index allows personalized estimates of 30-day mortality after COVID-19 infection stratified by age. For example, among those aged 60-64 years, overall mortality was estimated at 9% (95% CI: 6-11%). The Index further discriminated risk in this age stratum from 4% (95% CI: 3-7%) to 21% (95% CI: 12-31%), depending on sex and comorbid disease. This added discrimination is particularly relevant for patients aged 60-74, who are both at substantial risk and often remain employed. Thirty-nine percent of those aged 60-74 in the US are employed [20]; thus, accurate personalized risk estimation can better inform personal and system-level decisions regarding returning to work or other group settings.
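The within-stratum spread of predictions (shown in Fig 5 as bars around the age-only estimate) amounts to a grouped minimum and maximum. This sketch assumes predictions have already been generated per patient; the age labels and probabilities in any usage are made-up illustrative values, not study results.

```python
from typing import Dict, Iterable, Tuple


def prediction_range_by_age(rows: Iterable[Tuple[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Map each age category to the (min, max) predicted 30-day mortality in it."""
    ranges: Dict[str, Tuple[float, float]] = {}
    for age_cat, p in rows:
        lo, hi = ranges.get(age_cat, (p, p))
        ranges[age_cat] = (min(lo, p), max(hi, p))
    return ranges
```

A wide range within one age category is exactly the added information the Index provides over age alone, which would assign a single value to everyone in that category.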
Most prior studies considered only individual comorbid conditions such as asthma, chronic lung disease, diabetes, hypertension, and vascular disease [6,7,12,21-23]. Liang et al. found that comorbidity count predicted critical illness in hospitalized patients in China [3]. We found that multimorbidity captured by the CCI has a stronger relationship with mortality than nearly all individual comorbid conditions. After adjustment for the CCI, only a prior MI or PVD was independently associated with mortality. The CCI also has the advantage of straightforward calculation from ICD-10 diagnosis codes obtained from medical administrative data, and it is widely used across numerous diseases, health care systems, and populations [9]. Our finding that MI and PVD added independent prognostic information underscores the likely importance of thrombotic complications in COVID-19 [24,25]. It stands to reason that those with pre-existing vascular disease are more susceptible to thrombosis if infected.
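To illustrate why the CCI is straightforward to compute once conditions are flagged, here is a sketch that sums the original Charlson weights, without age points. The weights are the classic 1987 values and the rule that a milder form is not counted alongside its severe form is a common implementation convention; both are assumptions here, since the study's own ICD-10 mapping is defined in its S1 Table.

```python
from typing import Iterable

# Original Charlson weights (assumed; the study's S1 Table defines its own mapping).
CHARLSON_WEIGHTS = {
    "myocardial_infarction": 1,
    "congestive_heart_failure": 1,
    "peripheral_vascular_disease": 1,
    "cerebrovascular_disease": 1,
    "dementia": 1,
    "chronic_pulmonary_disease": 1,
    "rheumatic_disease": 1,
    "peptic_ulcer_disease": 1,
    "mild_liver_disease": 1,
    "diabetes_without_complications": 1,
    "diabetes_with_complications": 2,
    "hemiplegia_or_paraplegia": 2,
    "renal_disease": 2,
    "malignancy": 2,
    "moderate_severe_liver_disease": 3,
    "metastatic_solid_tumor": 6,
    "aids_hiv": 6,
}

# Milder form -> severe form; when the severe form is present, the milder is dropped.
HIERARCHY = {
    "mild_liver_disease": "moderate_severe_liver_disease",
    "diabetes_without_complications": "diabetes_with_complications",
    "malignancy": "metastatic_solid_tumor",
}


def charlson_index(conditions: Iterable[str]) -> int:
    """Unweighted-by-age CCI from a collection of condition flags."""
    present = set(conditions)
    unknown = present.difference(CHARLSON_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown conditions: {sorted(unknown)}")
    for milder, severe in HIERARCHY.items():
        if severe in present:
            present.discard(milder)
    return sum(CHARLSON_WEIGHTS[c] for c in present)
```

Because the total is a simple weighted count, a patient with several modest conditions can reach the same score as one with a single severe condition, which is how the CCI captures multimorbidity.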
Fig 5. Range of 30-day mortality predictions from age alone and VACO Index. Bar graphs demonstrating the additional variation in mortality prediction provided by the VACO Index over age alone across age categories in the combined validation cohort (n = 9,642). The diamonds indicate predicted 30-day mortality within each age category when only age is used to generate the predicted value. The bars show the range of predicted 30-day mortality within the same age category provided by the VACO Index, where age is supplemented with sex and comorbidities. https://doi.org/10.1371/journal.pone.0241825.g005

The most important limitation of the VACO Index is that it was developed on patients who presented for COVID-19 testing early in the pandemic, presumably because they had symptomatic disease. COVID-19 testing capacity in the US was limited early in the pandemic, and testing was reserved for patients with significant symptoms that might represent a more severe infection. While the discrimination of the VACO Index was maintained in both prospective independent validations, Index predictions modestly overestimated mortality risk in validation, particularly in the late validation cohort. Mortality rates among those testing positive for COVID-19 are decreasing as US testing capacity improves, permitting testing of more mildly symptomatic and asymptomatic people who are less likely to succumb to the disease. The overall mortality rate in our development cohort was more than twice that found in our most recent validation cohort (13% vs 5%). Predictive indices developed in the context of high mortality rates will almost inevitably overestimate risk in samples with substantially lower mortality. However, if discrimination of the index is preserved, it is possible to adjust calibration as rates eventually stabilize. COVID-19 testing criteria and rates, test positivity rates, and mortality are evolving with the pandemic. Centers for Disease Control and Prevention (CDC) data estimate that the number of people with antibody evidence of SARS-CoV-2 infection is many times the number of reported COVID-19 test-positive cases [26]. The CDC report did not stratify its results by age, and older people are almost certainly more likely to experience symptoms if infected. While the CDC report suggested that the overall ratio of asymptomatic to symptomatic infections was ~10:1, it may be substantially lower for older individuals. Future research should examine this ratio stratified by age as a potential factor in mortality risk estimation. We are gathering data to adjust risk estimates based on the ratio of asymptomatic to symptomatic infections stratified by age; however, this is beyond the scope of this analysis.
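The point that calibration can be adjusted while discrimination is preserved can be illustrated with recalibration-in-the-large: adding one constant to every prediction on the logit scale so that mean predicted mortality matches a newly observed rate. This is one common approach, offered as a sketch rather than the authors' planned method; the function names and the bisection bounds are assumptions. Because the shift is monotone, patient rankings, and therefore the AUC, are unchanged.

```python
import math
from typing import List, Sequence


def _logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def _expit(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def recalibrate_intercept(pred: Sequence[float], observed_rate: float,
                          tol: float = 1e-10) -> List[float]:
    """Shift all predictions by one logit-scale constant so their mean equals observed_rate.

    pred values must lie strictly between 0 and 1.
    """
    if not pred:
        raise ValueError("need at least one prediction")
    if not 0.0 < observed_rate < 1.0:
        raise ValueError("observed_rate must be strictly between 0 and 1")
    logits = [_logit(p) for p in pred]
    lo, hi = -20.0, 20.0  # assumed search bounds for the shift
    # Mean predicted risk increases monotonically with the shift, so bisection converges.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        mean = sum(_expit(z + mid) for z in logits) / len(logits)
        if mean < observed_rate:
            lo = mid
        else:
            hi = mid
    shift = (lo + hi) / 2.0
    return [_expit(z + shift) for z in logits]
```

In the setting described above, a model developed at 13% mortality would receive a negative shift when applied to a period with roughly 5% mortality.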
This study has other limitations. Our study population was limited to Veterans in VA care. Prior work has demonstrated that while Veterans in VA care are older and have a higher prevalence of chronic health conditions and risk behaviors than the general US population [27-29], after adjustment for age, sex, race/ethnicity, region, and residence location, there are no significant differences in total disease burden [29]. VA has excellent mortality ascertainment [15], but delays in registering outpatient deaths could result in some underreporting. We only included Veterans receiving COVID-19 testing in the VA; others may have been tested and treated outside the VA. In the future, when Centers for Medicare & Medicaid Services (CMS) data are available, this limitation could be addressed in Veterans aged 65 and older. Our goal was to create a predictive model using pre-existing data that are available and readily analyzable in real time in most medical administrative datasets. Consequently, we did not consider laboratory data, vital signs, medications, or information typically residing in text notes, such as symptoms, physical exam findings, or imaging. We have demonstrated internal generalizability of the VACO Index within the VA; we recommend further validation in external datasets before applying the VACO Index outside of the VA.
In summary, using data from a national healthcare system, we developed and validated the VACO Index, a short-term mortality risk index based upon directly analyzable data available prior to infection with SARS-CoV-2. By doing so, we provide timely, quantifiable, and individualized risk estimates that successfully differentiate risk of 30-day mortality among those of similar age to better inform personal decision making and public policy as countries begin to relax lockdown guidelines.