Health diagnosis associated with COVID-19 death in the United States: A retrospective cohort study using electronic health records

  • Mariam Joseph,

    Roles Data curation, Formal analysis, Methodology, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Statistics, University of Michigan, Ann Arbor, Michigan, United States of America

  • Qiwei Li,

    Roles Funding acquisition, Investigation, Methodology, Project administration, Supervision, Validation, Writing – review & editing

    Affiliation Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas, United States of America

  • Sunyoung Shin

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Validation, Writing – original draft, Writing – review & editing

    sunyoungshin@postech.ac.kr

    Affiliation Department of Mathematics, Pohang University of Science and Technology, Pohang, Gyeongbuk, South Korea

Abstract

Background

The United States has experienced a high surge in COVID-19 cases since the beginning of 2020. Identifying the types of diagnoses that raise the risk of COVID-19 death will give our community a better perspective on the most vulnerable populations and enable these populations to take better precautionary measures.

Objective

To identify demographic factors and health diagnosis codes that are associated with a high or a low risk of COVID-19 death, using individual health record data sourced from the United States.

Methods

We used logistic regression models to analyze demographic factors and the top 500 health diagnosis codes for their association with COVID-19 death.

Results

Among 223,286 patients who tested positive at least once, 218,831 (98%) were alive and 4,455 (2%) had died by the end of the study period. In our logistic regression analysis, four demographic characteristics of patients (age, gender, race, and region) were associated with COVID-19 mortality. Patients from the West region of the United States (Alaska, Arizona, California, Colorado, Hawaii, Idaho, Montana, Nevada, New Mexico, Oregon, Utah, Washington, and Wyoming) had the highest odds ratio of COVID-19 mortality across the United States. In terms of diagnoses, Complications mainly related to pregnancy (adjusted odds ratio, OR: 2.95; 95% confidence interval, CI: 1.4–6.23) had the highest odds ratio for COVID-19 death, followed by Other diseases of the respiratory system (OR: 2.0; CI: 1.84–2.18), Renal failure (OR: 1.76; CI: 1.61–1.93), Influenza and pneumonia (OR: 1.53; CI: 1.41–1.67), Other bacterial diseases (OR: 1.45; CI: 1.31–1.61), Coagulation defects, purpura and other hemorrhagic conditions (OR: 1.37; CI: 1.22–1.54), Injuries to the head (OR: 1.27; CI: 1.1–1.46), Mood [affective] disorders (OR: 1.24; CI: 1.12–1.36), Aplastic and other anemias (OR: 1.22; CI: 1.12–1.34), Chronic obstructive pulmonary disease and allied conditions (OR: 1.18; CI: 1.06–1.32), Other forms of heart disease (OR: 1.18; CI: 1.09–1.28), Infections of the skin and subcutaneous tissue (OR: 1.15; CI: 1.04–1.27), Diabetes mellitus (OR: 1.14; CI: 1.03–1.26), and Other diseases of the urinary system (OR: 1.12; CI: 1.03–1.21).

Conclusion

We found demographic factors and medical conditions, including some novel ones, that are associated with COVID-19 death. These findings can be used for clinical and public awareness and for future research purposes.

Introduction

The COVID-19 (Coronavirus Disease 2019) pandemic has substantially impacted most individuals in the world since it broke out in Wuhan, China in December 2019. COVID-19, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is highly contagious and can lead to death. The U.S. Centers for Disease Control and Prevention (U.S. CDC) declared COVID-19 the third leading cause of death in the United States by the end of 2020 [1]. The case-fatality ratio, which is the number of deaths divided by the number of confirmed cases, in the U.S. is 1.1% [2]. The severity of COVID-19 varies among individuals. Many COVID-19 patients have mild illness, and some patients are asymptomatic; however, others face serious consequences such as hospitalization, admission to intensive care, or death [3]. Adults over the age of 65 with pre-existing medical conditions such as obesity, diabetes, asthma, hypertension, cardiovascular diseases, or chronic lung diseases have been found to be at higher risk of severe COVID-19 outcomes [4–6].

Comprehensive systematic reviews and meta-analyses have been conducted to quantify the impact of such pre-existing medical conditions on COVID-19 mortality. During the rise of the pandemic, COVID-19 hospitalization and ICU admission data from multiple studies were synthesized through random-effects meta-analysis, which identified hypertension, coronary heart disease, diabetes, lymphocytopenia, and D-dimer as factors associated with increased mortality during hospitalization [7–9]. As more COVID-19 data accumulated, many systematic reviews of published studies identified underlying health conditions as prognostic factors for COVID-19 mortality [3,10–12]. However, data sources with inconsistent definitions of outcomes, selection criteria, reporting, etc. make the combined results from multiple studies difficult to interpret [3,13].

COVID-19 electronic health records (EHR) data collected from large multicenter cohort studies contain crucial information for understanding the underlying health factors that determine COVID-19 hospitalization and mortality. The large CERNER EHR database has been widely used to understand the risk of COVID-19 among patients with specific conditions, such as patients with sickle cell disease, pregnant women, and patients with type 1 diabetes [14–16]. OPTUM, a U.S. healthcare company, initiated individual-level COVID-19 EHR data collection in February 2020, near the beginning of the pandemic, harnessing the power of comprehensive medical networks. Full access to the OPTUM database has been granted to UTHealth Systems and the UTHealth School of Biomedical Informatics (SBMI) Data Service, and the database is updated biweekly. The OPTUM EHR database has been used by researchers conducting in-depth investigations of patients with COVID-19 [17–21]. Characteristics of adults hospitalized with COVID-19 were examined along with their disease progression and outcomes, and their changes were explored over time [18,20,21]. [17] studied patients’ clinical conditions after COVID-19 diagnosis or hospital discharge and recognized the significance of their adverse long-term outcomes. The EHR data have provided evidence that patients with psychiatric disorders are at higher risk of COVID-19 infection and mortality [19,22].

The rich OPTUM EHR data source includes full demographics and baseline characteristics of COVID-19 deaths such as comorbidities and medication use collected during the enrollment period [23]. Statistical models containing the demographics and the baseline characteristics for COVID-19 outcomes were used for predicting prognosis of hospitalized adults with COVID-19 and understanding racial and ethnic disparity in clinical outcomes among the patients [18,24]. Acute respiratory failure, pneumonia, sepsis, coagulation defects, arrhythmia, and myocardial infarction are found to be the recent diagnoses predictive of COVID-19 deaths during hospitalization [25].

The objective of this paper is to assess the effect of pre-existing medical conditions on mortality in COVID-19 patients. We leverage full access to the individual-level medical diagnosis records for up to 10 years with standard codes in common use in the OPTUM EHR database. [26] considered all subjects tested for COVID-19 in the OPTUM medical networks between February 2020 and August 2020 regardless of their COVID-19 infection status. In this paper, our focus is COVID-19 patients since we aim to improve accuracy in evaluating associations between the pre-existing medical conditions and the mortality among COVID-19 cases. The study subjects are divided into alive and died groups. The associations between COVID-19 mortality and demographic factors such as age, region, and race are examined. Logistic models adjusted for the demographics are used to identify COVID-19 patients’ pre-existing diagnoses that increase the risk of death. Identifying populations with underlying conditions that are associated with COVID-19 mortality would help healthcare providers deliver better medical interventions to COVID-19 patients.

Methods

Study design

OPTUM, an American health care provider, has acquired COVID-19 EHR data since February 2020 from various medical care provider organizations, including hospitals and clinics across the United States. The observational COVID-19 EHR dataset consists of medical and healthcare utilization data from outpatient, inpatient and ambulatory medical records, medical practice management systems, and several other internal systems. An independent statistical expert has certified the data to be de-identified based on the HIPAA statistical de-identification rules and OPTUM customer data use agreements. The study protocol was exempted in writing from review by the Institutional Review Board (IRB) at the University of Texas at Dallas. The exemption was granted because no personally identifying information was collected or obtained, and the identity and privacy of study participants are protected by reviewing the de-identified data in a private setting during each phase of research. Our study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline.

Study population and observation period

The COVID-19 dataset used for this study was collected from February 2020 until January 2021 and contains 2,627,679 individuals (COVID-19 positive and negative). For this study, however, we analyzed COVID-19 positive patients exclusively and removed all COVID-19 negative individuals from the dataset. The dataset includes 223,286 COVID-19 positive individuals who meet at least one of the following criteria: (i) have taken a laboratory test for COVID-19 or a COVID-19 antibody test, (ii) have been associated with a procedure code for a COVID-19 test, or (iii) have a COVID-19 diagnosis code, or a diagnosis code for a similar condition, in their medical records. For these individuals, the EHR was included up to 10 years prior to their COVID-19 diagnosis, where available. Patient deaths are recorded with their month and year of death. Additionally, the history of patient visits, procedures, laboratory diagnostics, and observations is recorded to analyze treatments.

Data preprocessing

We conducted the COVID-19 data preprocessing and analysis from March 2021 until December 2022. Unix shell scripts were written to obtain COVID-19 patient information from the initial OPTUM COVID-19 EHR dataset. Specifically, we used the lab and test results to identify the patients who were COVID-19 positive, along with their ID, COVID-19 test order date, and test collected date. We acquired the following demographic variables: age, gender, race, ethnicity, and region. The variable race has three levels: African American, Asian, and Caucasian. The variable ethnicity indicates whether an individual is Hispanic or Not Hispanic. The variable gender takes the values Female or Male. Region is primarily based on the United States Census Bureau’s categorization of states into Northeast, Midwest, West, and South [27].

As shown in S1 Fig, we excluded 300,971 of 2,627,679 patients (11.45%) who were recorded as unknown in at least one of the demographic categories. Patients’ death records, stored in YYYYMM format, were also acquired. Additionally, the patients’ visit information was processed to obtain their last visit date, which enabled us to estimate the age of patients who survived COVID-19. Lastly, we obtained all patients’ pre-existing conditions from their diagnosis history before May 5, 2020. This cut-off date was chosen to ensure feature engineering of all patients’ diagnosis history files identified in OPTUM Data. We obtained 223,286 individuals who tested positive for COVID-19 at least once from the sample of 2,326,708 individuals who took COVID-19 tests.
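The exclusion step described above could be sketched as follows. This is a minimal illustration and not the authors' Unix shell pipeline: the record layout, the field names, and the literal "Unknown" marker are all assumptions. Only the two counts passed to `exclusion_summary` come from the text.

```python
# Hypothetical record layout; field names and the "Unknown" marker are assumptions.
DEMOGRAPHIC_FIELDS = ("age", "gender", "race", "ethnicity", "region")

def has_complete_demographics(record):
    """Return True when no demographic field is missing or marked unknown."""
    for field in DEMOGRAPHIC_FIELDS:
        value = record.get(field)
        if value is None or str(value).strip().lower() in ("", "unknown"):
            return False
    return True

def exclusion_summary(total, excluded):
    """Return (remaining count, excluded share in percent, rounded to 2 places)."""
    return total - excluded, round(100.0 * excluded / total, 2)

# Two made-up records: the second has an unknown race and is dropped.
records = [
    {"id": "p1", "age": 46, "gender": "Female", "race": "Caucasian",
     "ethnicity": "Not Hispanic", "region": "Midwest"},
    {"id": "p2", "age": 78, "gender": "Male", "race": "Unknown",
     "ethnicity": "Not Hispanic", "region": "South"},
]
kept = [r for r in records if has_complete_demographics(r)]

# Counts reported in the text: 300,971 of 2,627,679 individuals excluded.
remaining, pct_excluded = exclusion_summary(2_627_679, 300_971)
```

With the reported counts, the remaining sample size and the excluded share agree with the 2,326,708 individuals and 11.45% quoted in the paragraph.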

Diagnostic measures

In this study, we identified the top 500 diagnosis codes most frequently used among COVID-19 patients within the study period. These codes were then standardized and aligned with the International Classification of Diseases (ICD) versions 9 and 10. This preprocessing step was essential to ensure consistency in diagnosis categorization across different coding systems. To further refine our analysis, we consolidated these codes into 117 unique ‘pre-existing diagnosis codes.’ This consolidation involved merging related ICD 9 and 10 diagnosis codes, categorizing them into similar sections based on their respective diagnosis chapters [28,29].
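A frequency-based selection of this kind might look like the sketch below. It assumes that "most frequently used" means "recorded for the largest number of distinct patients", which the text does not state; the `(patient, code)` row layout and the three example ICD-10 codes are illustrative only.

```python
from collections import Counter

def top_codes(rows, k):
    """Return the k codes recorded for the largest number of distinct patients."""
    # Drop repeated (patient, code) pairs while keeping first-seen order.
    distinct_pairs = dict.fromkeys(rows)
    counts = Counter(code for _patient, code in distinct_pairs)
    return [code for code, _count in counts.most_common(k)]

# Made-up diagnosis history rows: (patient id, diagnosis code).
rows = [
    ("p1", "N18"), ("p2", "N18"), ("p3", "N18"), ("p1", "N18"),
    ("p1", "J18"), ("p2", "J18"),
    ("p3", "E11"),
]
selected = top_codes(rows, 2)
```

In the study the cut-off would be `k = 500`; counting raw occurrences instead of distinct patients is an equally plausible reading and only requires skipping the deduplication line.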

We grouped all diagnosis codes according to ICD 9/10 classifications, resulting in 15 overarching categories of general pre-existing conditions: Circulatory, Digestive, Emergency, Endocrine, Eyes and ENT, Genitourinary, Infectious and Parasitic, Injury Poisoning, Mental and Behavioral Disorders, Musculoskeletal, Neoplasms, Nervous, Pregnancy and Childbirth, Respiratory, and Skin. For our study’s purposes, any patient with at least one diagnosis code falling under a general pre-existing condition category was classified as having that condition. This methodology allowed for a comprehensive and systematic analysis of pre-existing conditions among COVID-19 patients, enhancing the accuracy and relevance of our findings.
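The "at least one code in the category" rule above can be expressed as a set of 0/1 indicators per patient. The sketch below is an assumption-laden illustration: the mapping covers only four diagnosis labels taken from the text, and the assignment of each label to a category follows the ICD chapter structure rather than any table published by the authors.

```python
# Illustrative mapping from a consolidated diagnosis label to its general category.
CODE_TO_CATEGORY = {
    "Renal failure": "Genitourinary",
    "Other diseases of the urinary system": "Genitourinary",
    "Diabetes mellitus": "Endocrine",
    "Influenza and pneumonia": "Respiratory",
}

def category_flags(patient_codes, categories):
    """Return {category: 0 or 1}; 1 when the patient has at least one code in it."""
    present = {CODE_TO_CATEGORY[c] for c in patient_codes if c in CODE_TO_CATEGORY}
    return {category: int(category in present) for category in categories}

categories = ["Endocrine", "Genitourinary", "Respiratory"]
flags = category_flags(
    ["Renal failure", "Other diseases of the urinary system"], categories
)
```

A patient with two genitourinary codes still gets a single indicator of 1 for that category, which is what makes the 15 category variables less correlated with one another than the 117 individual codes.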

Outcome

We use all-cause mortality as our primary binary outcome. All dead individuals in the OPTUM EHR data are marked with their death record of year and month. Only cases whose time of death follows the COVID-19 test order date are considered for our analysis.
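Because deaths are recorded only by year and month, the outcome rule has to compare a month against a full test order date. One way to do this is sketched below. The function name is hypothetical, and counting a death in the same month as the test order is an assumption, since the text does not say how same-month cases were handled.

```python
from datetime import date

def died_after_test(death_yyyymm, test_order_date):
    """Binary outcome: 1 if a death record exists and its month is not earlier
    than the month of the COVID-19 test order date, otherwise 0.

    Assumption: a death in the same month as the test order is counted,
    because the death record has only month resolution.
    """
    if not death_yyyymm:
        return 0
    death_year, death_month = int(death_yyyymm[:4]), int(death_yyyymm[4:6])
    test_month = (test_order_date.year, test_order_date.month)
    return int((death_year, death_month) >= test_month)

outcome_late = died_after_test("202011", date(2020, 10, 5))   # death after the test
outcome_early = died_after_test("202003", date(2020, 10, 5))  # death before the test
outcome_alive = died_after_test(None, date(2020, 10, 5))      # no death record
```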

Statistical analysis

After data processing, we first conducted a two-sample unpaired t-test to test for a difference in patients’ age between the alive and died populations. Secondly, we implemented a Chi-square test of independence between mortality and demographic variables such as age, gender, and race to determine their eligibility as predictors for our analysis. After verifying the demographic variables as essential predictors, we used multivariate logistic regression models adjusted for age, gender, race, ethnicity, and region to identify pre-existing diagnosis codes that are influential to COVID-19 patients’ mortality.
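The Chi-square test of independence mentioned above can be sketched in plain Python for a table of counts. The counts below are hypothetical and are not taken from Table 1. The p-value helper is exact only for one degree of freedom (a 2 x 2 table), where the chi-square upper tail reduces to a complementary error function.

```python
import math

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

def p_value_one_dof(stat):
    """Upper-tail p-value of a chi-square statistic with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical counts: rows are gender (Female, Male), columns are (alive, died).
table = [[560, 44], [440, 56]]
stat, dof = chi_square_independence(table)
p = p_value_one_dof(stat)
```

For variables with more than two levels, such as race or region, the statistic is computed the same way but the p-value needs the chi-square distribution with `dof` degrees of freedom, which a statistics library would normally supply.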

To gain a more nuanced understanding, we implemented both joint and marginal analyses. In the joint analysis, we included all 117 pre-existing diagnosis codes in a single multivariate logistic regression model alongside the demographic variables to assess their combined effect on mortality. In contrast, the marginal analysis involved running separate multivariate logistic regressions for each of the 117 diagnosis codes, each adjusted for the same set of demographic variables, to evaluate the individual effect of each diagnosis code on the outcome. We then compared the results from these two approaches by calculating the Spearman’s correlation coefficient for the z-values obtained from each analysis. Additionally, we computed the Spearman’s correlation coefficient for the adjusted odds ratios to further compare the joint and marginal models’ outcomes.
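The two modelling strategies can be written side by side. The notation below is introduced here for illustration and does not appear in the paper: $Y_i$ is the death indicator for patient $i$, $\mathbf{x}_i$ the vector of demographic covariates, and $d_{ij}$ the 0/1 indicator that patient $i$ has pre-existing diagnosis code $j$. Reading the reported z-values as Wald statistics is an assumption, although it is the usual output of a logistic regression fit.

```latex
% Joint model: all 117 diagnosis indicators enter one regression together.
\operatorname{logit} P(Y_i = 1)
  = \beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\gamma} + \sum_{j=1}^{117} \beta_j \, d_{ij}

% Marginal model for diagnosis code j (one separate fit for each j = 1, \dots, 117).
\operatorname{logit} P(Y_i = 1)
  = \beta_0^{(j)} + \mathbf{x}_i^{\top}\boldsymbol{\gamma}^{(j)} + \beta^{(j)} \, d_{ij}

% Adjusted odds ratio and z-value for a fitted coefficient.
\mathrm{OR}_j = \exp\bigl(\hat{\beta}_j\bigr), \qquad
z_j = \hat{\beta}_j \big/ \widehat{\mathrm{SE}}\bigl(\hat{\beta}_j\bigr)
```

Comparing $\hat{\beta}_j$ from the joint fit with $\hat{\beta}^{(j)}$ from the marginal fit, code by code, is what the Spearman correlations on z-values and adjusted odds ratios summarize.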

Furthermore, we extended this approach to the 15 general pre-existing conditions, applying both joint and marginal logistic regression models adjusted for demographic variables. This comprehensive examination allowed us to assess not only the influence of specific diagnosis codes but also general categories of pre-existing conditions on COVID-19 mortality, providing a more comprehensive understanding of the factors contributing to patient outcomes.

Results

Demographical factors impacting COVID-19 deaths in the U.S

Table 1 presents an in-depth examination of demographic factors associated with the life status of COVID-19 patients at the end of our study period. The first row of the table displays age in years against life status, a binary outcome for COVID-19 patients at the study’s conclusion. A notable observation from the table is that older individuals are at higher risk of death than others. The median age for patients who survived is 46 years with an interquartile range of 32, while the median age for patients who died is 78 years with an interquartile range of 18. We obtained a mean of 45.44 years for the alive group and a mean of 75.51 years for the deceased group. A two-sample t-test was performed on age based on life status. The null hypothesis for this t-test is that the means of the alive and deceased populations are equal. We reject the null hypothesis at a significance level of 0.05 and conclude that the population means are different (p-value < 0.001).
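The age comparison can be illustrated with a small two-sample computation. The text says only that an unpaired two-sample t-test was used; the sketch below uses Welch's unequal-variance form, which is an assumption, and the ages are hypothetical values rather than study data.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch's unequal-variance t statistic and approximate degrees of freedom."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors of the two sample means.
    se2_a, se2_b = var_a / n_a, var_b / n_b
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    dof = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, dof

# Hypothetical ages in years, not taken from the study data.
alive_ages = [31, 38, 44, 46, 52, 60]
died_ages = [66, 72, 78, 80, 85, 89]
t_stat, dof = welch_t(alive_ages, died_ages)
```

A negative `t_stat` here means the first group (alive) is younger on average; turning it into a p-value requires the t distribution, which a statistics library would provide.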

Table 1. Demographic characteristics based on life status.

https://doi.org/10.1371/journal.pone.0319585.t001

Additionally, Table 1 presents the counts of the other demographic variables along with the p-values of the Chi-square test of independence. The EHR database provides a large body of evidence that COVID-19 death is associated with gender, race, ethnicity, and region. Gender, race, ethnicity, and region all have p-values below 0.05, as listed in Table 1, indicating relationships between these demographic variables and mortality. This supports using these demographics as predictors in a logistic regression to adjust for their effects.

A crucial aspect of gender characteristics is displayed in Table 1. Female patients make up the majority of the alive population (56%), whereas male patients make up the majority of the deceased population (56%). Caucasians are the majority in both the alive and died populations, at 87% each. Likewise, Not Hispanic patients are the majority in both the alive and died populations, at 95% and 96% respectively. In addition, the Midwest accounts for 60% of the alive population and 53% of the died population, making it the region where most of our COVID-19 patients originated. The second biggest region for the alive population is the Northeast, holding 21%; however, the South region is behind the Midwest by 26% for the died population.

Demographics-adjusted pre-existing diagnosis codes associated with the COVID-19 mortality in the U.S

Table 2 displays the odds ratios of the demographic variables and of the 29 pre-existing diagnosis codes, out of the 117 diagnosis codes, that are significantly associated with COVID-19 mortality in the joint logistic regression model fit (p-value < 0.05). Our data provide strong statistical evidence that 13 of the 15 distinct general categories are significant categories that influence the risk of COVID-19 mortality. Among the demographic variables, age, gender, race, and region are found to be statistically significant.

Table 2. Adjusted Odds Ratio and P-value results of the joint logistic regression.

https://doi.org/10.1371/journal.pone.0319585.t002

Among the 13 significant general pre-existing diagnosis categories, 11 have at least one diagnosis code with an odds ratio greater than 1. The most significant is the respiratory category: the diagnosis Other diseases of the respiratory system, such as acute chest syndrome and chronic respiratory failure (for full details refer to ICD 10 codes J95-J99 and ICD 9 codes 510-519), is associated with a 100.2% increase in the odds of COVID-19 mortality and holds the highest adjusted odds ratio among respiratory diseases. Interestingly, renal failure is identified as posing the second highest increase in the likelihood of COVID-19 mortality. In contrast, seven of the 15 general categories have at least one diagnosis code with an odds ratio less than 1. All diagnoses identified as significant under musculoskeletal disorders are observed to decrease the odds of COVID-19 mortality. In addition, other metabolic and immunity disorders, such as cystinosis and obesity (refer to ICD 9 codes 270-279), identified in the endocrine category, are associated with a decrease in COVID-19 mortality odds.
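The percentage statements above follow from the adjusted odds ratio by simple arithmetic: an odds ratio of OR corresponds to a (OR - 1) x 100 percent change in the odds. The helpers below are an illustrative sketch; the function names are hypothetical, and the 1.96 multiplier assumes a Wald-type 95% confidence interval, which the text does not confirm.

```python
import math

def percent_change_in_odds(odds_ratio):
    """Percent change in the odds implied by an odds ratio (negative means a decrease)."""
    return (odds_ratio - 1.0) * 100.0

def odds_ratio_with_ci(beta, se, z=1.96):
    """Return (OR, lower, upper) from a logistic coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Renal failure has a reported adjusted odds ratio of 1.76.
renal_change = percent_change_in_odds(1.76)

# A coefficient of zero gives an odds ratio of 1 with an interval around it.
null_or, null_lower, null_upper = odds_ratio_with_ci(0.0, 0.1)
```

Under this arithmetic, the 100.2% increase quoted for Other diseases of the respiratory system corresponds to an unrounded odds ratio of about 2.002, consistent with the rounded value of 2.0 in the abstract.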

Statistical significance supported by the agreement between marginal and joint analyses

We implemented two logistic regression strategies: a joint model accounting for all demographic factors and diagnosis codes and a marginal model assessing each of the 117 diagnosis codes individually with adjustments for demographic variables. The joint model mitigates confounding effects but is prone to multicollinearity, potentially skewing results. However, the marginal model acts as an essential counterpart, allowing the comparison of adjusted odds ratios to gauge the influence of individual diagnosis codes. The similarity between corresponding adjusted odds ratios from the two models bolsters confidence in the significance of specific diagnosis codes, suggesting that their effects are robust to the presence of other variables.

The diagnosis z-values from the joint logistic model are plotted against the marginal diagnosis z-values in Fig 1. Similarly, the scatterplot in Fig 2 presents the association between the adjusted odds ratios of the pre-existing diagnosis codes from the joint logistic model and those from the marginal logistic models. We obtained a Spearman correlation coefficient of 0.6818 for the z-values, which signifies positive agreement between our main joint logistic model’s z-values and the individual marginal models’ z-values. The adjusted odds ratio scatterplot leads to a similar conclusion, with a Spearman coefficient of 0.7651.
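Spearman's coefficient is the Pearson correlation of the ranks, so the comparison above can be reproduced with a few lines of plain Python. The two lists of z-values below are hypothetical and only demonstrate the computation; they are not the study's 117 values.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical z-values for five diagnosis codes under the two analyses.
joint_z = [4.1, 2.3, -1.2, 0.4, 3.0]
marginal_z = [6.0, 2.9, 0.5, -0.8, 5.2]
rho = spearman(joint_z, marginal_z)
```

Because only ranks matter, the coefficient is unaffected by the marginal z-values being systematically larger in magnitude than the joint ones, which is why it suits this joint-versus-marginal comparison.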

Fig 1. Scatterplot of Z-values on diagnoses given by joint and marginal analyses.

https://doi.org/10.1371/journal.pone.0319585.g001

Fig 2. Scatterplot of adjusted odds ratios on diagnoses given by joint and marginal analyses.

https://doi.org/10.1371/journal.pone.0319585.g002

We found strong agreement between the z-values from the joint analysis and those from the marginal analyses in the scatterplot of the general pre-existing diagnosis categories in Fig 3. Similarly, the adjusted odds ratios in Fig 4 agree well. The Spearman correlation coefficients further substantiate this alignment, with both the general z-values and the general adjusted odds ratios showing a strong correlation of 0.93.

Fig 3. Scatterplot of Z-values on general categories given by joint and marginal analyses.

https://doi.org/10.1371/journal.pone.0319585.g003

Fig 4. Scatterplot of adjusted odds ratios on general categories given by joint and marginal analyses.

https://doi.org/10.1371/journal.pone.0319585.g004

Overall, the pre-existing diagnosis codes identified by the joint analysis with the demographic adjustment are substantiated by the results from the marginal analysis. The results from the joint logistic model with the 15 general categories are also supported by the marginal analysis; therefore, our conclusion is that the 13 general diagnosis categories pose significant association with the COVID-19 mortality.

Discussion

Among the COVID-19 positive patients across the United States through January 2021, our analysis identified many patient characteristics and underlying medical diagnosis codes associated with COVID-19 mortality. The logistic model showed that all the demographic variables except ethnicity have a prominent influence on COVID-19 mortality. The sample of COVID-19 positive cases obtained from the comprehensive OPTUM networks (N = 223,286) is large enough to generalize our study results to COVID-19 patients in the U.S. population. Male African American patients from the West region were at greater risk of COVID-19 mortality. COVID-19 patients from the Northeast region of the United States have lower odds of COVID-19 mortality than those from the West region, which has the highest odds of mortality.

A history of respiratory diseases, circulatory disease, and diabetes is associated with higher odds of COVID-19 death. Such pre-existing diseases were reported as predictors of COVID-19 mortality in the literature [4,30–35]. In our current model, we have found nine new diagnoses, such as Renal Failure and Other diseases of urinary system, that are shown to hold higher odds of COVID-19 death. Among these diagnoses, Renal Failure presents a complex challenge in terms of determining whether the primary disease process leading to this association is chronic kidney disease (CKD), acute kidney injury (AKI), or AKI on CKD, further complicated by the necessity of dialysis [36]. This intricate relationship underscores the need for precise diagnostic differentiation to understand the exact nature of the association.

A COVID-19 study [26] observed that people residing in the Northeastern region of the United States have the highest risk of death; however, we found that COVID-19 patients from the West region (OR: 1.61, p-value: 0.01) are more susceptible to COVID-19 mortality than those from the Northeast. Note that our results, based on the COVID-19 cases, examine the fatality of the disease among infected patients, while [26] explored the risk of COVID-19 related death in the entire population. Additionally, [25] detected lower COVID-19 mortality for African American patients among hospitalized patients, while our results show that African American patients have an increased risk of COVID-19 mortality (OR: 1.19, p-value: 0.01). Such a difference in the risk of death may be attributable to limited access to hospitalization for African American patients [37]. Further investigation is needed.

Characteristics of COVID-19 patients admitted to hospitals were carefully investigated in [21,38,39], which used OPTUM, Geisinger, and three U.S. EHR databases (Academic Health System, Explorys, and OneFlorida), respectively. The results in [21,39] are based on descriptive statistics of variables characterizing hospitalized patients and do not establish the statistical significance of their findings. [38] investigated patients’ phenotypes that determine their hospitalization status with Firth’s logistic regression [40] adjusted for age, sex and race, which is similar to our marginal analysis. [41] studied COVID-19 mortality with 31,461 patients from TriNetX EHR data using multivariate logistic regression. Our study results, which are similar to those in [41], have greater statistical power owing to a much larger dataset of 223,286 COVID-19 patients, together with our validation through the marginal analysis.

In the analysis with the 117 pre-existing diagnostic codes, several diagnostic codes, including chronic rheumatic heart disease and other metabolic and immunity disorders, hold lower odds of death by COVID-19, a finding for which it is difficult to find supporting evidence in previous studies. Although considering the pre-existing diagnostic codes altogether is useful for evaluating specific pre-existing diagnoses, high correlations among the diagnostic codes may lead to underestimated or overestimated odds and misleading results. The analyses with the general pre-existing conditions mitigate such high correlations to some extent by merging pre-existing diagnoses under a general category. From the general analyses, all general pre-existing conditions except Digestive and Eyes and ENT are found to be significant for explaining COVID-19 mortality. We find that the Circulatory and Endocrine categories, each of which includes a diagnosis code with an odds ratio less than 1, hold higher odds of COVID-19 mortality at the category level.

This study faced some limitations. Fewer than 1 percent of individuals in the EHR database are uninsured, which is far below the rate in the U.S. Thus, poverty might confound the effect of the demographic characteristics on COVID-19 mortality. Next, use of all-cause mortality may overestimate COVID-19 mortality. Obtaining information about causes of death would reduce such bias. Another limitation comes from the design of the dataset, which does not record the death date and the cause of the patients’ death. Such information would have allowed researchers to conduct survival analysis and various other time series analyses that could have enhanced our knowledge of which diagnoses had a significant influence on COVID-19 mortality. The current study does not delve into comorbidity due to the limitation of the logistic model’s complexity. Further study of comorbidity on this large dataset would produce vital information on the interconnection between diagnoses and COVID-19 mortality [10].

Our data curation strategy for this study primarily revolved around the inclusion of all COVID-positive patients with complete demographic information. Subsequently, we collected their pre-existing diagnoses and selected only the top five hundred diagnosis codes. Consequently, certain diagnoses associated with specific diseases might not have been fully represented in the dataset. For instance, while rheumatic disease is linked to a higher risk of COVID-19 mortality, multiple diagnosis codes pertaining to this disease may not have been captured [42]. As a result, individual diagnosis codes such as ‘other joint disorders’, ‘other dorsopathies’, and ‘arthrosis’ exhibit lower odds of COVID-19 mortality. Another issue of our marginal analysis is that we did not control for other pre-existing diagnosis codes when evaluating a given individual diagnosis code. This might hinder or bias the association between that diagnosis code and the COVID-19 mortality. Controlling for other pre-existing diagnosis codes could provide additional insight into such association [22].

Additionally, future exploration of the determinants of COVID-19 mortality in various clinical settings, such as hospitalization, ICU, urgent care, and others currently identified in OPTUM data, would help advance our healthcare practices and operations. Furthermore, determining the age of COVID-19 patients who are alive is difficult: given the substantial size of the visit files and our current computing memory and processing speed, obtaining the most recent visit date for each patient is impractical. Obtaining the accurate current age of living COVID-19 patients would deepen our understanding of the association between age groups and COVID-19 mortality.

Conclusion

Recognizing patient health diagnoses and characteristics associated with COVID-19 death is vital for aiding public awareness and promoting better precautionary measures. Using the EHR of 223,286 patients, we conducted a large cohort logistic analysis. With respect to pre-existing diagnoses, patients who have had one of the following are found to have greater odds of COVID-19 mortality: complications mainly related to pregnancy, other diseases of the respiratory system, renal failure, influenza and pneumonia, other bacterial diseases, coagulation defects, purpura and other hemorrhagic conditions, injuries to the head, mood [affective] disorders, aplastic and other anemias, chronic obstructive pulmonary disease and allied conditions, other forms of heart disease, diabetes mellitus, and other diseases of the urinary system. Our large cohort analysis can pave the way for future healthcare policies and outbreak preparedness plans.

Supporting information

S1 Fig. Steps in selection of COVID-19 patients.

https://doi.org/10.1371/journal.pone.0319585.s001

(TIF)

Acknowledgments

This study incorporates COVID-19 data obtained under a license agreement with OPTUM. Investigators can contact OPTUM directly regarding licensing of OPTUM data assets and other OPTUM data inquiries.

References

  1. Xu J, Murphy SL, Kochanek KD, Adrias E. Mortality in the United States, 2020. NCHS Data Brief No. 456, December 2022. 2021. Available from: https://www.cdc.gov/nchs/products/databriefs/db456.htm.
  2. Dong E, Ratcliff J, Goyea TD, Katz A, Lau R, Ng TK, et al. The Johns Hopkins University Center for Systems Science and Engineering COVID-19 Dashboard: data collection process, challenges faced, and lessons learned. Lancet Infect Dis. 2022;22(12):e370–6. pmid:36057267
  3. Treskova-Schwarzbach M, Haas L, Reda S, Pilic A, Borodova A, Karimi K, et al. Pre-existing health conditions and severe COVID-19 outcomes: an umbrella review approach and meta-analysis of global evidence. BMC Med. 2021;19(1):212. pmid:34446016
  4. Aggarwal G, Cheruiyot I, Aggarwal S, Wong J, Lippi G, Lavie CJ, et al. Association of cardiovascular disease with coronavirus disease 2019 (COVID-19) Severity: A Meta-Analysis. Curr Probl Cardiol. 2020;45(8):100617. pmid:32402515
  5. Chu Y, Yang J, Shi J, Zhang P, Wang X. Obesity is associated with increased severity of disease in COVID-19 pneumonia: a systematic review and meta-analysis. Eur J Med Res. 2020;25(1):64. pmid:33267871
  6. Clark A, Jit M, Warren-Gash C, Guthrie B, Wang HHX, Mercer SW, et al. Global, regional, and national estimates of the population at increased risk of severe COVID-19 due to underlying health conditions in 2020: a modelling study. Lancet Glob Health. 2020;8(8):e1003–17. pmid:32553130
  7. Figliozzi S, Masci PG, Ahmadi N, Tondi L, Koutli E, Aimo A, et al. Predictors of adverse prognosis in COVID-19: A systematic review and meta-analysis. Eur J Clin Invest. 2020;50(10):e13362. pmid:32726868
  8. Mesas AE, Cavero-Redondo I, Álvarez-Bueno C, Sarriá Cabrera MA, Maffei de Andrade S, Sequí-Dominguez I, et al. Predictors of in-hospital COVID-19 mortality: A comprehensive systematic review and meta-analysis exploring differences by age, sex and health conditions. PLoS One. 2020;15(11):e0241742. pmid:33141836
  9. Tian W, Jiang W, Yao J, Nicholson CJ, Li RH, Sigurslid HH, et al. Predictors of mortality in hospitalized COVID-19 patients: A systematic review and meta-analysis. J Med Virol. 2020;92(10):1875–83. pmid:32441789
  10. Biswas M, Rahaman S, Biswas TK, Haque Z, Ibrahim B. Association of sex, age, and comorbidities with mortality in COVID-19 Patients: a systematic review and meta-analysis. Intervirology. 2020:1–12. pmid:33296901
  11. Li X, Zhong X, Wang Y, Zeng X, Luo T, Liu Q. Clinical determinants of the severity of COVID-19: A systematic review and meta-analysis. PLoS One. 2021;16(5):e0250602. pmid:33939733
  12. Patel U, Malik P, Usman MS, Mehta D, Sharma A, Malik FA, et al. Age-adjusted risk factors associated with mortality and mechanical ventilation utilization amongst COVID-19 hospitalizations-a systematic review and meta-analysis. SN Compr Clin Med. 2020;2(10):1740–9. pmid:32904541
  13. Wynants L, Van Calster B, Collins GS, Riley RD, Heinze G, Schuit E, et al. Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal. BMJ. 2020;369:m1328. pmid:32265220
  14. Guarino S, Lanzkron SM. COVID-19 in hospitalized patients with sickle cell disease. Blood. 2021;138(Supplement 1):3090–3090.
  15. Qeadan F, Mensah NA, Tingey B, Stanford JB. The risk of clinical complications and death among pregnant women with COVID-19 in the Cerner COVID-19 cohort: a retrospective analysis. BMC Pregnancy Childbirth. 2021;21(1):305. pmid:33863292
  16. Qeadan F, Tingey B, Egbert J, Pezzolesi MG, Burge MR, Peterson KA, et al. The associations between COVID-19 diagnosis, type 1 diabetes, and the risk of diabetic ketoacidosis: A nationwide cohort from the US using the Cerner Real-World Data. PLoS One. 2022;17(4):e0266809. pmid:35439266
  17. Jovanoski N, Chen X, Becker U, Zalocusky K, Chawla D, Tsai L, et al. Severity of COVID-19 and adverse long-term outcomes: a retrospective cohort study based on a US electronic health record database. BMJ Open. 2021;11(12):e056284. pmid:34893488
  18. Page JH, Londhe AA, Brooks C, Zhang J, Sprafka JM, Bennett C, et al. Trends in characteristics and outcomes among US adults hospitalised with COVID-19 throughout 2020: an observational cohort study. BMJ Open. 2022;12(2):e055137. pmid:35228287
  19. Teixeira AL, Krause TM, Ghosh L, Shahani L, Machado-Vieira R, Lane SD, et al. Analysis of COVID-19 infection and mortality among patients with psychiatric disorders, 2020. JAMA Netw Open. 2021;4(11):e2134969. pmid:34812848
  20. Zhu J, Wei Z, Suryavanshi M, Chen X, Xia Q, Jiang J, et al. Characteristics and outcomes of hospitalised adults with COVID-19 in a global health research network: a cohort study. BMJ Open. 2021;11(8):e051588. pmid:34362806
  21. Liang C, Ogilvie RP, Doherty M, Clifford CR, Chomistek AK, Gately R, et al. Trends in COVID-19 patient characteristics in a large electronic health record database in the United States: A cohort study. PLoS One. 2022;17(7):e0271501. pmid:35857793
  22. Li L, Li F, Fortunati F, Krystal JH. Association of a prior psychiatric diagnosis with mortality among hospitalized patients with coronavirus disease 2019 (COVID-19) infection. JAMA Netw Open. 2020;3(9):e2023282. pmid:32997123
  23. Ayodele O, Ren K, Zhao J, Signorovitch J, Jonsson Funk M, Zhu J, et al. Real-world treatment patterns and clinical outcomes for inpatients with COVID-19 in the US from September 2020 to February 2021. PLoS One. 2021;16(12):e0261707. pmid:34962924
  24. Buikema AR, Buzinec P, Paudel ML, Andrade K, Johnson JC, Edmonds YM, et al. Racial and ethnic disparity in clinical outcomes among patients with confirmed COVID-19 infection in a large US electronic health record database. EClinicalMedicine. 2021;39:101075. pmid:34493997
  25. Chomistek AK, Liang C, Doherty MC, Clifford CR, Ogilvie RP, Gately RV, et al. Predictors of critical care, mechanical ventilation, and mortality among hospitalized patients with COVID-19 in an electronic health record database. BMC Infect Dis. 2022;22(1):413. pmid:35488229
  26. Chen U-I, Xu H, Krause TM, Greenberg R, Dong X, Jiang X. Factors associated with COVID-19 death in the United States: cohort study. JMIR Public Health Surveill. 2022;8(5):e29343. pmid:35377319
  27. U.S. Census Bureau. Geographic Levels. 2021 Oct 8 [cited 15 August 2024]. Available from: https://www.census.gov/programs-surveys/economic-census/guidance-geographies/levels.html#par_textimage_34
  28. Centers for Disease Control and Prevention. ICD-10-CM Official Guidelines for Coding and Reporting. 2024 Jun 4 [cited 15 August 2024]. Available from: https://www.cdc.gov/nchs/icd/icd-10-cm/index.html
  29. Centers for Disease Control and Prevention. International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM). 2021 November 2 [cited 15 August 2024]. Available from: https://archive.cdc.gov/www_cdc_gov/nchs/icd/icd9cm.htm
  30. Aveyard P, Gao M, Lindson N, Hartmann-Boyce J, Watkinson P, Young D, et al. Association between pre-existing respiratory disease and its treatment, and severe COVID-19: a population cohort study. Lancet Respir Med. 2021;9(8):909–23. pmid:33812494
  31. Estiri H, Strasser ZH, Klann JG, Naseri P, Wagholikar KB, Murphy SN. Predicting COVID-19 mortality with electronic medical records. NPJ Digit Med. 2021;4(1):15. pmid:33542473
  32. Beltramo G, Cottenet J, Mariet A-S, Georges M, Piroth L, Tubert-Bitter P, et al. Chronic respiratory diseases are predictors of severe outcome in COVID-19 hospitalised patients: a nationwide study. Eur Respir J. 2021;58(6):2004474. pmid:34016619
  33. Mehra MR, Desai SS, Kuy S, Henry TD, Patel AN. Cardiovascular disease, drug therapy, and mortality in covid-19. N Engl J Med. 2020;382(25):e102. pmid:32356626
  34. Kumar A, Arora A, Sharma P, Anikhindi SA, Bansal N, Singla V, et al. Is diabetes mellitus associated with mortality and severity of COVID-19? A meta-analysis. Diabetes Metab Syndr. 2020;14(4):535–45. pmid:32408118
  35. Aggarwal G, Lippi G, Lavie CJ, Henry BM, Sanchis-Gomar F. Diabetes mellitus association with coronavirus disease 2019 (COVID-19) severity and mortality: A pooled analysis. J Diabetes. 2020;12(11):851–5. pmid:32677321
  36. Zaki N, Alashwal H, Ibrahim S. Association of hypertension, diabetes, stroke, cancer, kidney disease, and high-cholesterol with COVID-19 disease severity and fatality: A systematic review. Diabetes Metab Syndr. 2020;14(5):1133–42. pmid:32663789
  37. Webb Hooper M, Nápoles AM, Pérez-Stable EJ. COVID-19 and racial/ethnic disparities. JAMA. 2020;323(24):2466–7. pmid:32391864
  38. Oetjens MT, Luo JZ, Chang A, Leader JB, Hartzel DN, Moore BS, et al. Electronic health record analysis identifies kidney disease as the leading risk factor for hospitalization in confirmed COVID-19 patients. PLoS One. 2020;15(11):e0242182. pmid:33180868
  39. Saunders-Hastings P, Zhou CK, Hobbi S, Boyd E, Lloyd P, Alawar N, et al. Characterization of COVID-19 hospitalized patients in three United States electronic health record databases. Pathogens. 2023;12(3):390. pmid:36986311
  40. Carroll RJ, Bastarache L, Denny JC. R PheWAS: data analysis and plotting tools for phenome-wide association studies in the R environment. Bioinformatics. 2014;30(16):2375–6. pmid:24733291
  41. Harrison SL, Fazio-Eynullayeva E, Lane DA, Underhill P, Lip GYH. Comorbidities associated with mortality in 31,461 adults with COVID-19 in the United States: A federated electronic medical record analysis. PLoS Med. 2020;17(9):e1003321. pmid:32911500
  42. Grainger R, Machado PM, Robinson PC. Novel coronavirus disease-2019 (COVID-19) in people with rheumatic disease: epidemiology and outcomes. Best Pract Res Clin Rheumatol. 2021;35(1):101657. pmid:33468418