
Physiologic signatures within six hours of hospitalization identify acute illness phenotypes

  • Yuanfang Ren,

    Roles Data curation, Formal analysis, Investigation, Methodology, Validation, Writing – original draft, Writing – review & editing

    Affiliations Intelligent Critical Care Center (IC3), University of Florida, Gainesville, Florida, United States of America, Department of Medicine, Division of Nephrology, Hypertension, and Renal Transplantation, University of Florida, Gainesville, Florida, United States of America

  • Tyler J. Loftus,

    Roles Formal analysis, Investigation, Supervision, Writing – original draft, Writing – review & editing

    Affiliations Intelligent Critical Care Center (IC3), University of Florida, Gainesville, Florida, United States of America, Department of Surgery, University of Florida, Gainesville, Florida, United States of America

  • Yanjun Li,

    Roles Formal analysis, Investigation, Methodology, Writing – review & editing

    Affiliations Intelligent Critical Care Center (IC3), University of Florida, Gainesville, Florida, United States of America, Department of Computer & Information Science & Engineering, University of Florida, Gainesville, Florida, United States of America

  • Ziyuan Guan,

    Roles Data curation, Writing – review & editing

    Affiliations Intelligent Critical Care Center (IC3), University of Florida, Gainesville, Florida, United States of America, Department of Medicine, Division of Nephrology, Hypertension, and Renal Transplantation, University of Florida, Gainesville, Florida, United States of America

  • Matthew M. Ruppert,

    Roles Data curation, Writing – review & editing

    Affiliations Intelligent Critical Care Center (IC3), University of Florida, Gainesville, Florida, United States of America, Department of Medicine, Division of Nephrology, Hypertension, and Renal Transplantation, University of Florida, Gainesville, Florida, United States of America

  • Shounak Datta,

    Roles Data curation, Methodology, Writing – review & editing

    Affiliations Intelligent Critical Care Center (IC3), University of Florida, Gainesville, Florida, United States of America, Department of Medicine, Division of Nephrology, Hypertension, and Renal Transplantation, University of Florida, Gainesville, Florida, United States of America

  • Gilbert R. Upchurch Jr.,

    Roles Supervision, Writing – review & editing

    Affiliation Department of Surgery, University of Florida, Gainesville, Florida, United States of America

  • Patrick J. Tighe,

    Roles Writing – review & editing

    Affiliation Department of Anesthesiology, University of Florida, Gainesville, Florida, United States of America

  • Parisa Rashidi,

    Roles Formal analysis, Methodology, Supervision, Writing – review & editing

    Affiliations Intelligent Critical Care Center (IC3), University of Florida, Gainesville, Florida, United States of America, J. Crayton Pruitt Family Department of Biomedical Engineering, University of Florida, Gainesville, Florida, United States of America

  • Benjamin Shickel,

    Roles Formal analysis, Methodology, Writing – review & editing

    Affiliations Intelligent Critical Care Center (IC3), University of Florida, Gainesville, Florida, United States of America, Department of Medicine, Division of Nephrology, Hypertension, and Renal Transplantation, University of Florida, Gainesville, Florida, United States of America

  • Tezcan Ozrazgat-Baslanti,

    Roles Conceptualization, Data curation, Formal analysis, Methodology, Supervision, Writing – original draft, Writing – review & editing

    ‡ These authors contributed equally as senior authors.

    Affiliations Intelligent Critical Care Center (IC3), University of Florida, Gainesville, Florida, United States of America, Department of Medicine, Division of Nephrology, Hypertension, and Renal Transplantation, University of Florida, Gainesville, Florida, United States of America, Sepsis and Critical Illness Research Center, University of Florida, Gainesville, Florida, United States of America

  • Azra Bihorac

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Methodology, Resources, Supervision, Writing – original draft, Writing – review & editing

    abihorac@ufl.edu

    ‡ These authors contributed equally as senior authors.

    Affiliations Intelligent Critical Care Center (IC3), University of Florida, Gainesville, Florida, United States of America, Department of Medicine, Division of Nephrology, Hypertension, and Renal Transplantation, University of Florida, Gainesville, Florida, United States of America, Department of Surgery, University of Florida, Gainesville, Florida, United States of America, Sepsis and Critical Illness Research Center, University of Florida, Gainesville, Florida, United States of America

Abstract

During the early stages of hospital admission, clinicians use limited information to make decisions as patient acuity evolves. We hypothesized that clustering analysis of vital signs measured within six hours of hospital admission would reveal distinct patient phenotypes with unique pathophysiological signatures and clinical outcomes. We created a longitudinal electronic health record dataset for 75,762 adult patient admissions to a tertiary care center in 2014–2016 lasting six hours or longer. Physiotypes were derived via unsupervised machine learning in a training cohort of 41,502 patients applying consensus k-means clustering to six vital signs measured within six hours of admission. Reproducibility and correlation with clinical biomarkers and outcomes were assessed in a validation cohort of 17,415 patients and a testing cohort of 16,845 patients. Training, validation, and testing cohorts had similar age (54–55 years) and sex (55% female) distributions. There were four distinct clusters. Physiotype A had physiologic signals consistent with early vasoplegia, hypothermia, and low-grade inflammation and favorable short- and long-term clinical outcomes despite early, severe illness. Physiotype B exhibited early tachycardia, tachypnea, and hypoxemia followed by the highest incidence of prolonged respiratory insufficiency, sepsis, acute kidney injury, and short- and long-term mortality. Physiotype C had minimal early physiological derangement and favorable clinical outcomes. Physiotype D had the greatest prevalence of chronic cardiovascular and kidney disease, presented with severely elevated blood pressure, and had good short-term outcomes but suffered increased 3-year mortality. Comparing sequential organ failure assessment (SOFA) scores across physiotypes demonstrated that clustering did not simply recapitulate previously established acuity assessments.
In a heterogeneous cohort of hospitalized patients, unsupervised machine learning techniques applied to routine, early vital sign data identified physiotypes with unique disease categories and distinct clinical outcomes. This approach has the potential to augment understanding of pathophysiology by distilling thousands of disease states into a few physiological signatures.

Author summary

In this paper, we present a machine learning approach, consensus clustering, to group hospitalized patients based on six routinely collected vital signs measured within six hours of hospital admission into previously undescribed subsets or acute illness phenotypes that may have different risks for a poor outcome or different treatment responses. We identified four acute illness phenotypes associated with distinct clinical characteristics, biomarker patterns, and clinical outcomes. We validated the reproducibility of phenotypes using a different dataset and clustering approach. The early identified phenotypes, which have unique disease states and mortality risk, have the potential to augment understanding of pathophysiology by distilling thousands of disease states into a few physiological signatures and to augment clinical decision-support systems under time constraints.

Introduction

Each year in the United States alone there are more than 36 million hospital admissions and seven hundred thousand in-hospital mortalities, nearly one quarter of which may be preventable [1–4]. Early in each hospital admission, clinicians formulate decisions regarding diagnostic tests, treatments, and triage destinations using information that has diluted signal-to-noise ratios [5–7]. These arduous clinical decision-making tasks are supported by analyzing vital signs representing essential physiological processes [8–12]. Identifying early vital sign trajectories may have utility for discovering unique physiological signatures that are associated with distinct patient phenotypes and clinical outcomes. Unsupervised machine learning (ML) clustering analyses of clinical variables have identified meaningful subtypes of sepsis and the acute respiratory distress syndrome, but this approach has not been reported among broad, heterogeneous cohorts incorporating all hospitalized patients [13–15].

Using electronic health record data spanning 75,762 adult hospital admissions, we test the hypothesis that unsupervised ML analysis of vital signs recorded within six hours of hospital admission reveals discrete and reproducible physiologic signatures of acute illness phenotypes (physiotypes) that are associated with distinct disease categories and clinical outcomes.

Methods

Data source and participants

We generated a longitudinal dataset of electronic health records (EHR) for 75,762 hospital admissions of 43,598 patients representing all adults (age ≥18 years) admitted to the University of Florida Health 1000-bed academic hospital between June 1, 2014 and April 1, 2016 with length of stay greater than or equal to six hours including emergency department admission if applicable. Patients completely missing at least two of the six vital sign measurements (systolic and diastolic blood pressure, heart rate, respiratory rate, temperature, and oxygen saturation) within six hours of admission were excluded (S1 Fig). A detailed description of our methods is available in S1 Text. This project was approved by the University of Florida Institutional Review Board.

Study design

We followed Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) recommendations under the Type 2b analysis category [16] to chronologically split the dataset into training (admissions between June 1, 2014 and May 31, 2015, n = 41,502), validation (admissions between June 1, 2015 and October 31, 2015, n = 17,415), and testing (admissions between November 1, 2015 and April 1, 2016, n = 16,845) cohorts to mitigate potentially adverse effects of dataset drift due to changes in clinical practice or patient populations. To identify acute illness phenotypes (physiotypes) using early physiologic signatures, we applied unsupervised ML clustering to temporal measurements of six vital signs recorded within six hours of hospital admission in the training cohort. We assessed physiotype reproducibility by applying alternative clustering methods in the training dataset, assessing physiotype frequency distributions and clinical outcomes in the validation cohort, and predicting physiotypes in the testing cohort (S2 Fig).
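The chronological split described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the date boundaries come from the text, while the function name `assign_cohort` and the cohort labels are assumptions introduced here.

```python
from datetime import date

# Boundaries taken from the study design text; names are illustrative only.
STUDY_START = date(2014, 6, 1)
TRAIN_END = date(2015, 5, 31)
VALID_END = date(2015, 10, 31)
STUDY_END = date(2016, 4, 1)


def assign_cohort(admit_date):
    """Return the cohort for an admission date, or None if outside the study window."""
    if not (STUDY_START <= admit_date <= STUDY_END):
        return None
    if admit_date <= TRAIN_END:
        return "training"
    if admit_date <= VALID_END:
        return "validation"
    return "testing"
```

Splitting by calendar time rather than at random keeps every validation and testing admission later than every training admission, which is what allows the later cohorts to expose dataset drift.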

Identifying acute illness physiotypes using early physiologic signatures

To derive physiotypes with reproducible early physiologic signatures, we applied consensus k-means clustering [17] to 36 features derived from time series of six vital signs measured within six hours of hospital admission for each encounter in the training cohort. Based on consensus matrix plots and cumulative distribution function curves, the optimal number of physiologic clusters was four (S3 Fig) [15].
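The idea behind consensus clustering can be illustrated with a small, dependency-free sketch. It assumes the common resampling formulation: cluster many random subsamples with k-means, then record, for each pair of admissions, the fraction of co-sampled runs in which they shared a cluster. The subsample fraction, number of resamples, iteration count, and all function names are assumptions for illustration and are not taken from the study.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def consensus_matrix(points, k, n_resamples=50, frac=0.8, seed=0):
    """Fraction of co-sampled runs in which each pair shared a cluster."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(n_resamples):
        idx = rng.sample(range(n), int(frac * n))
        labels = kmeans([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                together[i][j] += labels[a] == labels[b]
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def consensus_cdf(matrix, thresholds):
    """Empirical CDF of the pairwise (i < j) consensus values."""
    n = len(matrix)
    vals = [matrix[i][j] for i in range(n) for j in range(i + 1, n)]
    return [sum(v <= t for v in vals) / len(vals) for t in thresholds]
```

A stable choice of k yields consensus values concentrated near 0 and 1, so the CDF is nearly flat between those extremes. Comparing CDF curves across candidate values of k is one way to arrive at a choice such as k = 4.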

We processed raw time series to remove outliers and assess distributions, missingness, and correlation (S1 Table and S4 Fig). Raw time series were resampled to an hourly frequency, using mean values when multiple measurements were recorded during the same one-hour window. Missing values were imputed by propagating temporally adjacent values forward and backward [18]. For records with no measurements within six hours of hospitalization, we imputed median values from the training cohort. Each admission was represented by six hourly values for six vital signs, yielding 36 clustering features. Vital sign patterns were visualized using line plots with 95% confidence intervals, t-distributed stochastic neighbor embedding (t-SNE) plots, ranked plots for mean standardized difference between physiotype pairs, and vital sign mosaic plots (see S1 Text for a comprehensive description).
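The resampling and imputation steps can be sketched as follows. This is a hedged illustration rather than the study pipeline: it assumes measurement times are expressed in hours since admission, that forward filling is applied before backward filling, and that features are concatenated in the order given by `VITALS`. The names `hourly_series`, `feature_vector`, and `VITALS` are introduced here for illustration.

```python
# Assumed feature order; the study does not state how features were ordered.
VITALS = ("SBP", "DBP", "HR", "RR", "Temp", "SpO2")


def hourly_series(measurements, n_hours=6, fallback=None):
    """Resample (hours_since_admission, value) pairs to hourly means and fill gaps.

    Gaps are filled forward, then backward; if there are no measurements at all,
    every hour receives `fallback` (e.g., the training-cohort median).
    """
    bins = [[] for _ in range(n_hours)]
    for t, v in measurements:
        if 0 <= t < n_hours:
            bins[int(t)].append(v)
    hourly = [sum(b) / len(b) if b else None for b in bins]
    if all(h is None for h in hourly):
        return [fallback] * n_hours
    last = None
    for i, h in enumerate(hourly):  # forward fill
        if h is None:
            hourly[i] = last
        else:
            last = h
    nxt = None
    for i in range(n_hours - 1, -1, -1):  # backward fill for leading gaps
        if hourly[i] is None:
            hourly[i] = nxt
        else:
            nxt = hourly[i]
    return hourly


def feature_vector(vitals, train_medians, n_hours=6):
    """Concatenate six hourly series into one 36-element feature list."""
    features = []
    for name in VITALS:
        features.extend(hourly_series(vitals.get(name, []), n_hours, train_medians[name]))
    return features
```

For example, heart rate readings of 80 and 90 in the first hour and 100 in the fourth hour become `[85.0, 85.0, 85.0, 100.0, 100.0, 100.0]`.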

Clinical characteristics, biological correlates, and clinical outcomes

For each admission we extracted demographics, 19 clinical biomarkers routinely measured at hospital admission (S2 Table), Sequential Organ Failure Assessment (SOFA) and Modified Early Warning Score (MEWS) acuity scores, and patient outcomes [19,20]. Details on data processing are described in S1 Text. Primary outcomes were thirty-day and three-year mortality. Median follow-up duration was 4.3 years per reverse Kaplan-Meier method. Other outcomes were acute kidney injury (AKI), venous thromboembolism, sepsis, intensive care unit (ICU) admission, mechanical ventilation (MV), and renal replacement therapy (RRT).

Statistical methods

We assessed physiotype reproducibility by comparing phenotype derivation with Gaussian mixture modeling (GMM) [21] in the training dataset and by assessing frequency distributions in the validation and testing cohorts (S2 Fig). We assessed the robustness of derived physiotypes using sensitivity analyses excluding variables with high missingness, excluding both highly missing and highly correlated variables, and using a 12-hour vital sign window. We validated derived physiotypes in two steps. In the validation cohort we rederived clusters using consensus k-means and compared them with training cohort clusters. In the testing cohort, we predicted physiotypes based on the clinical characteristics of training cohort clusters. Predictions arose from the minimum Euclidean distance from each patient to the centroid of each physiotype (S1 Text). Clinical variables across clusters were compared using line plots, t-distributed stochastic neighbor embedding plots, and ranked plots.
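Prediction by minimum Euclidean distance to a centroid can be sketched as below. This is a minimal sketch under stated assumptions: centroids are taken to be the per-feature means of training-cohort members of each physiotype, and the function names are hypothetical. The exact procedure is described in S1 Text and may differ.

```python
def compute_centroids(feature_rows, labels):
    """Per-feature mean of the training rows belonging to each label."""
    grouped = {}
    for row, lab in zip(feature_rows, labels):
        grouped.setdefault(lab, []).append(row)
    return {
        lab: [sum(col) / len(rows) for col in zip(*rows)]
        for lab, rows in grouped.items()
    }


def predict_physiotype(features, centroids):
    """Return the label whose centroid is nearest in Euclidean distance."""
    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(features, centroids[label])) ** 0.5

    return min(centroids, key=distance)
```

Because assignment needs only the stored centroids, a new admission can be labeled without rerunning the clustering.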

Physiotypes were compared using the χ2 test for categorical variables and analysis of variance and the Kruskal-Wallis test for continuous variables. Overall survival was illustrated using Kaplan–Meier curves and compared using the log-rank test. Adjusted hazard ratios (HR) for each physiotype were compared using Cox proportional-hazards regression while adjusting for age, sex, comorbidities, and SOFA score on admission. We adjusted p values for the family-wise error rate due to multiple comparisons using the Bonferroni correction. To assure that physiotypes did not recapitulate existing acuity scores, we compared physiotypes with SOFA scores within 24 hours of admission using alluvial plots and chord diagrams. Analyses were performed with Python version 3.7 and R version 3.5.1.
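As a brief illustration of the Bonferroni correction, each p value is multiplied by the number of comparisons and capped at 1, which is equivalent to dividing the significance threshold by that number. The function name and the example values are hypothetical. The count of six in the usage note follows from the six possible pairs among four groups and is not a statement about which comparisons the study actually corrected for.

```python
def bonferroni_adjust(p_values):
    """Return Bonferroni-adjusted p values: min(1, p * number_of_tests)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def bonferroni_threshold(alpha, n_tests):
    """Per-comparison significance threshold that controls the family-wise error rate."""
    return alpha / n_tests
```

With four groups there are six possible pairwise comparisons, so a family-wise alpha of 0.05 would correspond to a per-comparison threshold of 0.05 / 6, or about 0.0083.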

Results

Clinical characteristics of patients

Training, validation, and testing cohorts had similar clinical characteristics, biomarker distributions, and outcomes (S3 and S4 Tables). Average patient age was 54 years and sex was equally distributed. Almost two thirds of all patients had urgent admissions, 18% were transferred from another hospital, 27% were admitted to an ICU or intermediate care unit (IMC), and 28% had surgery during admission. Among patients admitted directly to an ICU/IMC, 22–27% had high acuity scores (SOFA greater than 6 or MEWS greater than 4) on admission. Among patients admitted to hospital wards, 2–3% had high acuity scores. Overall thirty-day and three-year mortality rates were 4% and 19%, respectively.

Derivation and characteristics of physiotypes

We identified four physiotypes with unique pathophysiological signatures, disease categories, and clinical outcomes (Tables 1 and 2, S5 and S6 Tables, Fig 1). Physiotypes were labeled as Physiotype A (31% of total cohort), B (23% of total cohort), C (31% of total cohort), and D (15% of total cohort) according to ascending value of systolic blood pressure (Fig 1A).

Fig 1. Physiotype vital sign representations.

(A) Distribution of vital signs during the first six hours of hospital admission. (B) Visualization of physiotypes using the t-distributed stochastic neighbor embedding (t-SNE) technique. (C) Physiotype average vital sign mosaics using a self-organizing map.

https://doi.org/10.1371/journal.pdig.0000110.g001

Table 1. Physiotype clinical characteristics and biomarkers.

https://doi.org/10.1371/journal.pdig.0000110.t001

Table 2. Physiotype illness severity, clinical outcomes, and resource use.

https://doi.org/10.1371/journal.pdig.0000110.t002

Physiotype A.

Physiotype A exhibited early and persistent hypotension without concomitant rise in HR, high incidence of vasopressor support (32%), initial normothermia followed by decreasing body temperature, and low RR with high SpO2, consistent with having the highest proportion undergoing early surgery (35%). Despite high incidence of surgical interventions, Physiotype A had lower inflammatory markers (i.e., C-reactive protein, erythrocyte sedimentation rate) than two of the other three physiotypes. Despite early, severe illness, Physiotype A had favorable short- and long-term clinical outcomes, consistent with reversible surgical disease and evidenced by the greatest proportion of patients with SOFA score > 6 within 24 hours of admission (12%) but the second-lowest incidence of ICU/IMC admission (26%), AKI (16%), and three-year mortality (17%).

Physiotype B.

Physiotype B exhibited early tachycardia, tachypnea, and hypoxemia. Unlike similarly hypotensive Physiotype A, Physiotype B had substantial biomarker evidence of inflammation, evidenced by the highest levels of C-reactive protein (53 mg/L compared with 11–18 mg/L in all other physiotypes) and erythrocyte sedimentation rate. The Physiotype B biomarker profile also suggested infection and perfusion deficit, manifested as higher white blood cell counts and base deficit values (6.4 vs. 3.6–4.4 mmol/L in other physiotypes). Physiotype B had the highest incidence of prolonged respiratory insufficiency (11% receiving mechanical ventilation, 59% of whom received more than 2 calendar days of ventilator support), sepsis (20%, compared with 4–7% in the other physiotypes), acute kidney injury (22%), hospital mortality (5%, more than two-fold greater than all other physiotypes), and three-year mortality (25%).

Physiotype C.

Physiotype C had minimal early physiological derangement and a diffuse pattern of mild organ dysfunction. Physiotype C had favorable clinical outcomes, manifested as the lowest incidence of ICU/IMC admission (20%), AKI (13%), sepsis (4%), hospital mortality (2%), and three-year mortality (16%), despite having comorbid disease burdens similar to those of other physiotypes (cardiovascular disease: 29% vs. 27–32%, diabetes: 24% vs. 23–27%, chronic kidney disease: 16% vs. 14–20%).

Physiotype D.

Physiotype D had the greatest prevalence of chronic cardiovascular and kidney disease (32% and 20%, respectively), the greatest proportions of African American patients (37% vs. 16–23% in other physiotypes) and emergent admissions (89%), and presented with severely elevated blood pressure; 79% had a systolic blood pressure measurement greater than 160 mmHg. Physiotype D had the second highest incidence of ICU/IMC admission (27%) despite having the lowest proportion of patients with SOFA > 6 (5%) and had 2% hospital mortality but suffered 20% 3-year mortality.

Vital sign signatures

To understand which vital signs made the greatest contributions to cluster assignments, vital sign standardized mean differences were compared between pairs of phenotypes (Fig 2). Temperature and oxygen saturation contributed least to phenotype differences. Systolic and diastolic blood pressure varied substantially between all physiotypes except for A and B. Fig 2 demonstrates that consensus clustering and Gaussian mixture modeling yielded similar standardized variable values. Fig 1C and S5 Fig illustrate average vital sign mosaics for each physiotype and individual vital sign mosaics for two example patients, demonstrating how each physiotype had a unique overall mosaic that was representative of individual patient mosaics.
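The pairwise comparison described above can be sketched as follows, assuming each vital sign is first standardized to mean 0 and standard deviation 1 across the whole cohort and that the reported quantity is the difference between two physiotypes' mean standardized values. The function name and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def standardized_mean_difference(values, labels, group_a, group_b):
    """Difference in mean z-score of one vital sign between two groups.

    values: one measurement per admission; labels: the physiotype of each admission.
    """
    mu = mean(values)
    sd = pstdev(values)
    z = [(v - mu) / sd for v in values]
    z_a = [zi for zi, lab in zip(z, labels) if lab == group_a]
    z_b = [zi for zi, lab in zip(z, labels) if lab == group_b]
    return mean(z_a) - mean(z_b)
```

Values near zero indicate that a vital sign does little to separate the two physiotypes, and large absolute values indicate a strong contribution.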

Fig 2. Vital sign contributions to cluster assignments.

Pairwise physiotype comparisons of vital sign values standardized to mean 0 and standard deviation 1 demonstrated that temperature and oxygen saturation contributed least to phenotype differences. Systolic and diastolic blood pressure varied substantially between all physiotypes except for A and B. SpO2: peripheral capillary oxygen saturation; Temp: temperature; SBP: systolic blood pressure; DBP: diastolic blood pressure; RR: respiratory rate; HR: heart rate.

https://doi.org/10.1371/journal.pdig.0000110.g002

SOFA scores

Associations between physiotypes and highest SOFA score within first 24 hours of admission are illustrated in S6 Fig; SOFA components for each physiotype are illustrated in chord diagrams in S7 Fig. Physiotypes A and B had the greatest proportions of patients with cardiovascular and respiratory dysfunction. Yet, each physiotype contained substantial proportions of patients across the full range of SOFA scores and component subscores; clustering did not simply recapitulate SOFA acuity assessments.

Survival probabilities

Three-year survival probability was modeled adjusting for demographics and comorbidities (Fig 3), demonstrating lower probability of survival for male sex (HR 1.5, 95% CI 1.4–1.5) and age 65 years or greater (HR 2.9, 95% CI 2.7–3.1). Using Physiotype C as a reference, probability of survival was lower for Physiotype A (HR 1.1, 95% CI 1.0–1.2), D (HR 1.4, 95% CI 1.3–1.6), and B (HR 1.8, 95% CI 1.7–1.9, all p<0.008). Three-year survival probability was similarly modeled with additional adjustment for SOFA score (S8C and S8D Figs), demonstrating similar associations among demographics, comorbidities, and survival, and strong associations between higher SOFA and lower survival probability (SOFA 2–4: HR 1.6, 95% CI 1.5–1.7; SOFA 5 or greater: HR 2.3, 95% CI 2.1–2.4, all p<0.001). When adjusting for SOFA, Physiotype A was not associated with lower survival probability (HR 1.0, 95% CI 0.9–1.0, p = 0.146), but Physiotypes D and B were (HR 1.5, 95% CI 1.4–1.7 and HR 1.7, 95% CI 1.6–1.8, respectively).

Fig 3. Survival curves and Cox proportional hazards modeling.

(A) Physiotype survival curves adjusted using demographic information and comorbidities. (B) Adjusted Cox proportional hazards models using demographic information and comorbidities. CCI: Charlson Comorbidity Index.

https://doi.org/10.1371/journal.pdig.0000110.g003

Reproducibility

Proportions of the total cohort in each physiotype were stable across training, validation, and testing (Physiotype A: 31%, 30%, and 30%, respectively; Physiotype B: 23%, 23%, and 24%, respectively; Physiotype C: 31%, 31%, and 31%, respectively; Physiotype D: 15%, 16%, and 15%, respectively). Physiotypes derived in the validation cohort using consensus k-means clustering showed clinical characteristics, biomarkers, and patient outcomes similar to those observed in the training cohort (S9–S12 Figs, S7 and S8 Tables). No significant differences were observed after excluding temperature values with high missingness (S13 Fig, S9 and S10 Tables), or after excluding variables with high missingness and correlation (S14 Fig, S11 and S12 Tables). Similar trends were observed when using a 12-hour vital sign window (S15 Fig, S13 and S14 Tables). Physiotypes were also reproducible in testing data (S15 Table). The clinical characteristics, biomarkers, and patient outcomes of physiotypes predicted in the testing cohort mimicked the training cohort (S16 and S17 Figs, S16 and S17 Tables). SOFA score distributions, survival curves, and diagnosis groups were similar across training, validation, and testing cohorts (S18–S26 Figs). Gaussian mixture modeling confirmed the statistical fit of the 4-class model (S27 Fig and S18 Table). Physiotypes identified by Gaussian mixture modeling had vital sign distributions and t-distributed stochastic neighbor embedding plots that were similar to those originally derived by consensus clustering (S28–S30 Figs, S19 and S20 Tables).

Discussion

Using six vital signs measured within six hours of hospital admission, consensus clustering identified four distinct, clinically relevant patient phenotypes with unique pathophysiological signatures, disease categories, and clinical outcomes. Blood pressure values and trends contributed substantially to cluster assignments: one hypertensive, one normotensive, and two hypotensive clusters. Among the two hypotensive clusters, one was inflammatory, the other non-inflammatory according to C-reactive protein and erythrocyte sedimentation rate values. Beyond these fundamental distinctions, clusters were also differentiated by disease categories, producing the final physiotype labels. Physiotype A, hypotensive non-inflammatory surgical shock, had physiologic signals suggesting early vasoplegia and hypothermia but low-grade inflammation relative to Physiotype B, a hypotensive inflammatory pulmonary dysfunction physiotype associated with early tachycardia, tachypnea, and hypoxemia followed by greatest burdens of prolonged respiratory insufficiency, sepsis, acute kidney injury, and short- and long-term mortality. Physiotype C, a normotensive, rapid normalization physiotype, had minimal early physiological derangement and favorable clinical outcomes. Physiotype D, hypertensive chronic disease exacerbation, had greatest prevalence of chronic cardiovascular and kidney disease, presented with severely elevated blood pressure, and had favorable short-term outcomes but suffered 20% three-year mortality. Each physiotype contained substantial patient proportions across the full ranges of SOFA scores and component subscores, suggesting that clustering did not simply recapitulate SOFA acuity assessments. Finally, physiotype characteristics were reproduced with fidelity in validation and testing cohorts.

Beyond the potential to augment understanding of pathophysiology by distilling thousands of disease states into a few physiological signatures, physiotypes could be adapted to augment clinical decision-making under time constraints and uncertainty. Early identification of hypotensive inflammatory pulmonary dysfunction could theoretically facilitate early ICU admission and high suspicion for sepsis with attention to resuscitation strategies that maintain adequate renal perfusion without inducing volume overload and hydrostatic pulmonary edema, primarily by focusing on providing the optimal balance of intravenous fluid resuscitation and vasopressor [5–8,22]. Early identification of normotensive rapid recovery could facilitate early hospital discharge or triage to low-intensity care settings (i.e., hospital floors), avoiding excessive monitoring and testing that confers lower value of care and may impart harm from unnecessary treatments [9,10]. Early identification of hypertensive chronic disease exacerbation could suggest low value for critical care resources compared with careful post-discharge follow-up for mitigating long-term mortality, and could be built into a decision-support system that facilitates hospital ward admission and outpatient clinic visits to address modifiable risk factors and optimize medication regimens for treating the underlying chronic disease. Several statistical and machine learning methods can accurately predict risk for death, but these approaches do not elucidate pathophysiologic states or disease categories [23,24]. Conversely, clustering can identify patient phenotypes that have unique disease states and mortality risk, representing a potentially useful adjunct to clinical decision-support systems, particularly among heterogeneous patient cohorts with diverse disease etiologies.

We are unaware of previous studies using cluster analyses of early vital sign measurements to identify phenotypes in heterogeneous cohorts of patients hospitalized for any reason. Others have used clustering for identifying patients with unique disease subtypes with unique treatment responses; sepsis and diastolic heart failure are prominent examples. Seymour et al. [15] performed clustering analyses on a multi-center cohort of sepsis patients with the rationale that sepsis pathophysiology is heterogeneous and identifying distinct sepsis phenotypes may facilitate targeted therapy. Clustering was performed on both clinical and host immune response biomarker variables, identifying four distinct clusters. In a series of simulations, varying proportions of each cluster were applied to previously reported randomized controlled trials. Treatment effects varied significantly across simulations, suggesting unique treatment responses. Shah et al. [25] performed clustering analyses on a single-center cohort of patients with heart failure and preserved ejection fraction, another heterogeneous syndrome refractory to one-size-fits-all management. Clustering was performed on electrocardiogram and echocardiogram data as well as clinical variables, identifying three distinct phenotypes with unique risk-adjusted clinical outcomes. While Seymour et al. [15] and Shah et al. [25] both identified subgroups of patients within larger patient groups that share an established diagnosis, we instead apply clustering methods to any hospitalized patient, identifying broad, generalized patterns of pathophysiology rather than targeted treatment responses. This difference precludes further comparison of our results with others.

We also acknowledge several limitations. Our study used data from a single institution, limiting the generalizability of our findings, and external validation in databases from different centers is needed. Yet, it seems unlikely that selection bias significantly affected results, as all adult patients admitted for longer than six hours were included. Input variables were limited to the first six hours following hospital admission so that phenotypes could be identified early enough to support clinical decision-making under time constraints and uncertainty. It is possible that the same advantages for early decision-support could be achieved while incorporating historical patient data from previous encounters in the electronic health record; further research is necessary to determine whether this strategy is advantageous. Waveform data, though not universally available in EHRs, has the potential to improve the precision of phenotype clustering. Our clustering approach does not ensure temporal ordering of vital signs, which could influence cluster assignments. Finally, the potential of early clustering to augment clinical decision-making remains theoretical until evaluated in a prospective trial.

Conclusions

Using six vital signs measured within six hours of hospital admission, clustering analyses identified four distinct patient phenotypes that had unique disease categories and clinical outcomes and did not recapitulate previously established acuity assessments. Beyond elucidating pathophysiology by distilling thousands of disease states into a few physiological signatures, identifying patient phenotypes during the early stages of hospital admission may have important implications for clinical decision-making under time constraints.

Supporting information

S1 Text. Detailed description of methods.

https://doi.org/10.1371/journal.pdig.0000110.s001

(DOCX)

S1 Fig. Cohort selection and exclusion criteria.

https://doi.org/10.1371/journal.pdig.0000110.s002

(DOCX)

S2 Fig. Purposes of training, validation, and testing cohorts.

https://doi.org/10.1371/journal.pdig.0000110.s003

(DOCX)

S3 Fig. Consensus k clustering results in training cohort (N = 41,502).

(A) Unsupervised consensus k clustering in training cohort showing optimal partitioning in consensus matrix for k = 4. (B) Consensus cumulative distribution function (CDF) across k = 2 to k = 8, where more horizontal curves suggest optimal fit. (C) Relative change in the area under the CDF curve with increasing clusters (k), with little change beyond k = 4. (D) Cluster consensus plot showing the mean of all pairwise consensus values between a cluster's members, for k = 2 to k = 8, where greater values for all bars suggest optimal fit.

https://doi.org/10.1371/journal.pdig.0000110.s004

(DOCX)
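
The quantities in panels (B) and (C) can be illustrated with a minimal sketch. It assumes the pairwise consensus values for a given k are already available as a flat list of numbers between 0 and 1; the function names, the evenly spaced grid, and the trapezoidal rule are illustrative choices and are not taken from the study's code.

```python
def consensus_cdf(pair_values, grid_size=101):
    """Empirical CDF of pairwise consensus values on an even grid over [0, 1]."""
    n = len(pair_values)
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    cdf = [sum(v <= c for v in pair_values) / n for c in grid]
    return grid, cdf


def cdf_area(grid, cdf):
    """Area under the CDF curve, approximated with the trapezoidal rule."""
    return sum(
        (grid[i] - grid[i - 1]) * (cdf[i] + cdf[i - 1]) / 2
        for i in range(1, len(grid))
    )


def relative_area_change(areas_by_k):
    """Relative change in CDF area between consecutive values of k."""
    ks = sorted(areas_by_k)
    return {
        k: (areas_by_k[k] - areas_by_k[prev]) / areas_by_k[prev]
        for prev, k in zip(ks, ks[1:])
    }


# Hypothetical consensus values: every pair is either always or never co-clustered,
# which gives a CDF that is flat between 0 and 1.
grid, cdf = consensus_cdf([0.0, 0.0, 1.0, 1.0])
area = cdf_area(grid, cdf)

# Hypothetical areas for k = 2, 3, 4; a small relative change suggests little
# gain from adding another cluster.
changes = relative_area_change({2: 0.50, 3: 0.60, 4: 0.63})
```

Under these assumptions, a flat CDF between 0 and 1 indicates that pairs of patients are consistently assigned together or apart across resamples, and a relative area change near zero corresponds to the plateau described in panel (C).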

S4 Fig. Spearman correlation heat map for the training cohort (N = 41,502).

Spearman correlation heat map showing the pairwise Spearman rank-order correlation coefficients among the six vital signs studied in our paper. The darker the red, the stronger the positive correlation. Abbreviations: RR: respiratory rate; SpO2: peripheral capillary oxygen saturation; Temp: temperature; HR: heart rate; SBP: systolic blood pressure; DBP: diastolic blood pressure.

https://doi.org/10.1371/journal.pdig.0000110.s005

(DOCX)

S5 Fig. Average vital sign mosaics of phenotypes using a self-organizing map in the training cohort (N = 41,502).

https://doi.org/10.1371/journal.pdig.0000110.s006

(DOCX)

S6 Fig. Alluvial plot showing distribution of phenotypes across worst SOFA scores of patients within first 24 hours of admission in training cohort.

For each phenotype, the larger percentage of patients with that score, the broader the ribbon.

https://doi.org/10.1371/journal.pdig.0000110.s007

(DOCX)

S7 Fig. Chord diagrams showing the distribution of patients with higher SOFA scores (i.e., 2+) within first 24 hours of admission of six organ systems by phenotypes in training cohort.

For each phenotype, the larger the percentage of patients with a higher score for that organ system, the broader the ribbon.

https://doi.org/10.1371/journal.pdig.0000110.s008

(DOCX)

S8 Fig. Survival curves and Cox proportional hazards modeling by phenotypes in training cohort.

(A) Physiotype survival curves adjusted using demographic information and comorbidities. (B) Adjusted Cox proportional hazards models using demographic information and comorbidities. (C) Physiotype survival curves adjusted using demographic information, comorbidities, and SOFA scores. (D) Adjusted Cox proportional hazards model using demographic information, comorbidities, and SOFA scores. Abbreviations: CCI: Charlson comorbidity index; SOFA: sequential organ failure assessment.

https://doi.org/10.1371/journal.pdig.0000110.s009

(DOCX)

S9 Fig. Consensus k clustering results in validation cohort (N = 17,415).

(A) Unsupervised consensus k clustering in validation cohort showing optimal partitioning in consensus matrix for k = 4. (B) Consensus cumulative distribution function (CDF) across k = 2 to k = 8, where more horizontal curves suggest optimal fit. (C) Relative change in the area under the CDF curve with increasing clusters (k), with little change beyond k = 4. (D) Cluster consensus plot showing the mean of all pairwise consensus values between a cluster's members, for k = 2 to k = 8, where greater values for all bars suggest optimal fit.

https://doi.org/10.1371/journal.pdig.0000110.s010

(DOCX)

S10 Fig. Mean standardized differences between variables across phenotype pairs for training cohort (N = 41,502, dark line) and validation cohort (N = 17,415) using consensus clustering.

In all panels, the variables are standardized such that all means are scaled to 0 and SDs to 1. A value of 1 for the standardized variable (x-axis) signifies that the mean value for the phenotype was 1 SD higher than the mean value for both phenotypes shown in the graph as a whole. Abbreviations in order: SpO2: peripheral capillary oxygen saturation; Temp: temperature; SBP: systolic blood pressure; DBP: diastolic blood pressure, RR: respiratory rate; HR: heart rate.

https://doi.org/10.1371/journal.pdig.0000110.s011

(DOCX)
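
The standardization described for S10 Fig can be made concrete with a small worked example. This is a sketch only: it assumes each variable is standardized against the pooled values of the two phenotypes being compared and that the population standard deviation is used. The function name and the sample heart rates are hypothetical.

```python
from statistics import mean, pstdev


def standardized_phenotype_means(values_a, values_b):
    """Mean of each phenotype after z-scoring against the pooled pair."""
    pooled = list(values_a) + list(values_b)
    mu = mean(pooled)      # pooled mean, scaled to 0
    sd = pstdev(pooled)    # pooled SD, scaled to 1 (population SD assumed)
    return (mean(values_a) - mu) / sd, (mean(values_b) - mu) / sd


# Hypothetical heart rates (beats per minute) for two phenotypes.
z_a, z_b = standardized_phenotype_means([100, 110], [70, 80])

# A simpler pair where the result is exactly one SD below and above the pooled mean.
low, high = standardized_phenotype_means([1, 1], [3, 3])
```

In the second pair, the pooled mean is 2 and the pooled SD is 1, so the phenotypes sit at -1 and +1 on the standardized axis, matching the interpretation given in the caption.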

S11 Fig. Validation cohort clusters had unique distributions of vital signs during the first six hours of admission.

https://doi.org/10.1371/journal.pdig.0000110.s012

(DOCX)

S12 Fig. t-SNE plot of phenotype assignments in validation cohort.

Starting from the original 36-dimensional vital sign data, we ran t-SNE to reduce the data to 2 dimensions. Each dot represents a patient. Phenotypes are shown in separate colors.

https://doi.org/10.1371/journal.pdig.0000110.s013

(DOCX)

S13 Fig. Distribution of vital signs by phenotypes in sensitivity analysis excluding variables with high missingness (temperature) in training cohort (N = 41,502).

https://doi.org/10.1371/journal.pdig.0000110.s014

(DOCX)

S14 Fig. Distribution of vital signs by phenotypes in sensitivity analysis excluding variables with high correlation (diastolic blood pressure and respiratory rate) and high missingness (temperature) in training cohort (N = 41,502).

https://doi.org/10.1371/journal.pdig.0000110.s015

(DOCX)

S15 Fig. Distribution of vital signs by phenotypes in sensitivity analysis using a 12-hour window of EHR data in the training cohort (N = 41,502).

https://doi.org/10.1371/journal.pdig.0000110.s016

(DOCX)

S16 Fig. Testing cohort clusters had unique distributions of vital signs during the first six hours of admission.

https://doi.org/10.1371/journal.pdig.0000110.s017

(DOCX)

S17 Fig. t-SNE plot of phenotype assignments in testing cohort.

Starting from the original 36-dimensional vital sign data, we ran t-SNE to reduce the data to 2 dimensions. Each dot represents a patient. Phenotypes are shown in separate colors.

https://doi.org/10.1371/journal.pdig.0000110.s018

(DOCX)

S18 Fig. Alluvial plot showing distribution of phenotypes across worst SOFA scores of patients within first 24 hours of admission in validation cohort.

For each phenotype, the larger percentage of patients with that score, the broader the ribbon.

https://doi.org/10.1371/journal.pdig.0000110.s019

(DOCX)

S19 Fig. Chord diagrams showing the distribution of patients with higher SOFA scores (i.e., 2+) within first 24 hours of admission of six organ systems by phenotypes in validation cohort.

For each phenotype, the larger the percentage of patients with a higher score for that organ system, the broader the ribbon.

https://doi.org/10.1371/journal.pdig.0000110.s020

(DOCX)

S20 Fig. Alluvial plot showing distribution of phenotypes across worst SOFA scores of patients within first 24 hours of admission in testing cohort.

For each phenotype, the larger percentage of patients with that score, the broader the ribbon.

https://doi.org/10.1371/journal.pdig.0000110.s021

(DOCX)

S21 Fig. Chord diagrams showing the distribution of patients with higher SOFA scores (i.e., 2+) within first 24 hours of admission of six organ systems by phenotypes in testing cohort.

For each phenotype, the larger the percentage of patients with a higher score for that organ system, the broader the ribbon.

https://doi.org/10.1371/journal.pdig.0000110.s022

(DOCX)

S22 Fig. Survival curves and Cox proportional hazards modeling by phenotypes in validation cohort.

(A) Physiotype survival curves adjusted using demographic information and comorbidities. (B) Adjusted Cox proportional hazards models using demographic information and comorbidities. (C) Physiotype survival curves adjusted using demographic information, comorbidities, and SOFA scores. (D) Adjusted Cox proportional hazards model using demographic information, comorbidities, and SOFA scores. Abbreviations: CCI: Charlson comorbidity index; SOFA: sequential organ failure assessment.

https://doi.org/10.1371/journal.pdig.0000110.s023

(DOCX)

S23 Fig. Survival curves and Cox proportional hazards modeling by phenotypes in testing cohort.

(A) Physiotype survival curves adjusted using demographic information and comorbidities. (B) Adjusted Cox proportional hazards models using demographic information and comorbidities. (C) Physiotype survival curves adjusted using demographic information, comorbidities, and SOFA scores. (D) Adjusted Cox proportional hazards model using demographic information, comorbidities, and SOFA scores. Abbreviations: CCI: Charlson comorbidity index; SOFA: sequential organ failure assessment.

https://doi.org/10.1371/journal.pdig.0000110.s024

(DOCX)

S24 Fig. Chord diagrams showing the distribution of nine most common admission diagnosis groups by phenotype in training cohort.

Diagnosis groups are shown in order of frequency among all patients. For each phenotype, the larger the percentage of patients with that diagnosis, the broader the ribbon. Detailed diagnosis groups from left to right are: Nonspecific chest pain, Abdominal pain, Other and unspecific lower respiratory disease, Complication of device; implant or graft, Septicemia (except in labor), Acute cerebrovascular disease, Cardiac dysrhythmias, Congestive heart failure; nonhypertensive, and Osteoarthritis.

https://doi.org/10.1371/journal.pdig.0000110.s025

(DOCX)

S25 Fig. Chord diagrams showing the distribution of nine most common admission diagnosis groups by phenotype in validation cohort.

Diagnosis groups are shown in order of frequency among all patients. For each phenotype, the larger the percentage of patients with that diagnosis, the broader the ribbon. Detailed diagnosis groups from left to right are: Nonspecific chest pain, Abdominal pain, Complication of device; implant or graft, Other and unspecific lower respiratory disease, Septicemia (except in labor), Malaise and fatigue, Acute cerebrovascular disease, Osteoarthritis, and Cardiac dysrhythmias.

https://doi.org/10.1371/journal.pdig.0000110.s026

(DOCX)

S26 Fig. Chord diagrams showing the distribution of nine most common admission diagnosis groups by phenotype in testing cohort.

Diagnosis groups are shown in order of frequency among all patients. For each phenotype, the larger the percentage of patients with that diagnosis, the broader the ribbon. Detailed diagnosis groups from left to right are: Nonspecific chest pain, Other and unspecific lower respiratory disease, Septicemia (except in labor), Abdominal pain, Complication of device; implant or graft, Acute cerebrovascular disease, Cardiac dysrhythmias, Osteoarthritis, and Other complications of pregnancy.

https://doi.org/10.1371/journal.pdig.0000110.s027

(DOCX)

S27 Fig. Sensitivity analysis using gaussian mixture modeling clustering in training cohort (N = 41,502), showing probabilities of phenotype assignment.

Interpretive example: Using gaussian mixture modeling to derive phenotypes, histograms of within phenotype probability demonstrated that members have high probability of being a phenotype member (>0.9).

https://doi.org/10.1371/journal.pdig.0000110.s028

(DOCX)

S28 Fig. Distribution of vital signs by phenotypes derived using gaussian mixture modeling clustering in the training cohort (N = 41,502).

https://doi.org/10.1371/journal.pdig.0000110.s029

(DOCX)

S29 Fig. t-SNE plot of phenotype assignments in training cohort.

Visualization of phenotypes using t-distributed stochastic neighbor embedding (t-SNE) technique in the training cohort with (A) physiotypes derived by consensus clustering shown in color, and (B) physiotypes derived by gaussian mixture modeling (GMM) shown in color.

https://doi.org/10.1371/journal.pdig.0000110.s030

(DOCX)

S30 Fig. Probabilities of assignment for phenotype members and for those not assigned, using gaussian mixture modeling in the training cohort (N = 41,502).

(A) Probabilities of assignment to cluster 1, and purple for those actually assigned to cluster 1, (B) Probabilities for patients assigned to cluster 2, and blue for those actually assigned to cluster 2, (C) Probabilities for patients assigned to cluster 3, and green for those actually assigned to cluster 3, and (D) probabilities for patients assigned to cluster 4, and orange for those actually assigned to cluster 4. Black lines correspond to median [IQR] of probability. Gray shading corresponds to region with a 45–55% (low or marginal) probability of assignment. Inset proportion is the % of 41,502 in the marginal region.

https://doi.org/10.1371/journal.pdig.0000110.s031

(DOCX)
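
The inset proportion described for S30 Fig can be sketched as follows. The example assumes the per-patient probabilities of assignment to one cluster are available as a list of numbers between 0 and 1; the function name and the sample probabilities are hypothetical, and the 0.45 to 0.55 bounds follow the gray region in the caption.

```python
def marginal_percentage(probabilities, low=0.45, high=0.55):
    """Percentage of patients whose assignment probability is marginal."""
    in_region = sum(low <= p <= high for p in probabilities)
    return 100.0 * in_region / len(probabilities)


# Hypothetical assignment probabilities for four patients and one cluster.
example = [0.01, 0.50, 0.99, 0.97]
pct = marginal_percentage(example)
```

A low percentage under this calculation means that few patients fall in the ambiguous region, so most assignments are made with probabilities well away from 50%.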

S1 Table. Processing of vital sign time series.

https://doi.org/10.1371/journal.pdig.0000110.s032

(DOCX)

S2 Table. LOINC codes used, ranges of values, and directions of abnormal values for lab variables.

https://doi.org/10.1371/journal.pdig.0000110.s033

(DOCX)

S3 Table. Clinical characteristics and biomarkers of the cohorts.

https://doi.org/10.1371/journal.pdig.0000110.s034

(DOCX)

S4 Table. Illness severity, clinical outcomes, and resource use of the cohorts.

https://doi.org/10.1371/journal.pdig.0000110.s035

(DOCX)

S5 Table. Physiotype clinical characteristics and biomarkers in the training cohort.

https://doi.org/10.1371/journal.pdig.0000110.s036

(DOCX)

S6 Table. Physiotype illness severity, clinical outcomes, and resource use in the training cohort.

https://doi.org/10.1371/journal.pdig.0000110.s037

(DOCX)

S7 Table. Physiotype clinical characteristics and biomarkers in the validation cohort.

https://doi.org/10.1371/journal.pdig.0000110.s038

(DOCX)

S8 Table. Physiotype illness severity, clinical outcomes, and resource use in the validation cohort.

https://doi.org/10.1371/journal.pdig.0000110.s039

(DOCX)

S9 Table. Physiotype clinical characteristics and biomarkers in sensitivity analysis by excluding highly missing variable (Temperature) in the training cohort.

https://doi.org/10.1371/journal.pdig.0000110.s040

(DOCX)

S10 Table. Physiotype illness severity, clinical outcomes, and resource use in sensitivity analysis by excluding highly missing variable (Temperature) in the training cohort.

https://doi.org/10.1371/journal.pdig.0000110.s041

(DOCX)

S11 Table. Physiotype clinical characteristics and biomarkers in sensitivity analysis by excluding variables with high missingness (temperature) and correlation (diastolic blood pressure and respiratory rate) in the training cohort.

https://doi.org/10.1371/journal.pdig.0000110.s042

(DOCX)

S12 Table. Physiotype illness severity, clinical outcomes, and resource use in sensitivity analysis by excluding variables with high missingness (temperature) and correlation (diastolic blood pressure and respiratory rate) in the training cohort.

https://doi.org/10.1371/journal.pdig.0000110.s043

(DOCX)

S13 Table. Physiotype clinical characteristics and biomarkers in sensitivity analysis by using a 12 hour window of EHR data in the training cohort.

https://doi.org/10.1371/journal.pdig.0000110.s044

(DOCX)

S14 Table. Physiotype illness severity, clinical outcomes, and resource use in sensitivity analysis by using a 12 hour window of EHR data in the training cohort.

https://doi.org/10.1371/journal.pdig.0000110.s045

(DOCX)

S15 Table. Centroids of physiotypes for prediction.

https://doi.org/10.1371/journal.pdig.0000110.s046

(DOCX)

S16 Table. Physiotype clinical characteristics and biomarkers in the testing cohort.

https://doi.org/10.1371/journal.pdig.0000110.s047

(DOCX)

S17 Table. Physiotype illness severity, clinical outcomes, and resource use in the testing cohort.

https://doi.org/10.1371/journal.pdig.0000110.s048

(DOCX)

S18 Table. Statistical output from gaussian mixture modeling in the training cohort (N = 41,502).

https://doi.org/10.1371/journal.pdig.0000110.s049

(DOCX)

S19 Table. Physiotype clinical characteristics and biomarkers by physiotypes derived using gaussian mixture modeling in sensitivity analysis in the training cohort.

https://doi.org/10.1371/journal.pdig.0000110.s050

(DOCX)

S20 Table. Physiotype illness severity, clinical outcomes, and resource use by physiotypes derived using gaussian mixture modeling in sensitivity analysis in the training cohort.

https://doi.org/10.1371/journal.pdig.0000110.s051

(DOCX)

Acknowledgments

The content is solely the responsibility of the authors. AB and TOB had full access to all of the data. The authors thank members of the Intelligent Critical Care Center and Integrated Data Repository at the University of Florida Health for supporting this work.

References

  1. Graber ML, Franklin N, Gordon R. Diagnostic Error in Internal Medicine. Archives of Internal Medicine. 2005;165(13):1493–9. pmid:16009864
  2. Blumenthal-Barby JS, Krieger H. Cognitive Biases and Heuristics in Medical Decision Making: A Critical Review Using a Systematic Search Strategy. Medical Decision Making. 2015;35(4):539–57. pmid:25145577.
  3. Kirch W, Schafii C. Misdiagnosis at a University Hospital in 4 Medical Eras Report on 400 Cases. Medicine (Baltimore). 1996 Jan;75(1):29–40. 00005792-199601000-00004; PubMed Central PMCID: PMC8569468. pmid:8569468
  4. Ludolph R, Schulz PJ. Debiasing Health-Related Judgments and Decision Making: A Systematic Review. Medical Decision Making. 2018 Jan;38(1):3–13. Epub 2017 Jun 25. pmid:28649904.
  5. Pearse RM, Harrison DA, James P, Watson D, Hinds C, Rhodes A, et al. Identification and characterisation of the high-risk surgical population in the United Kingdom. Crit Care. 2006;10(3):R81–R. Epub 2006/06/02. pmid:16749940.
  6. Skogvoll E, Isern E, Sangolt GK, Gisvold SE. In-hospital cardiopulmonary resuscitation. Acta Anaesthesiologica Scandinavica. 1999;43(2):177–84. pmid:10027025
  7. Merchant RM, Yang L, Becker LB, Berg RA, Nadkarni V, Nichol G, et al. Incidence of treated cardiac arrest in hospitalized patients in the United States. Crit Care Med. 2011;39(11):2401–6. pmid:21705896.
  8. Perman SM, Stanton E, Soar J, Berg RA, Donnino MW, Mikkelsen ME, et al. Location of In‐Hospital Cardiac Arrest in the United States—Variability in Event Rate and Outcomes. Journal of the American Heart Association. 2016;5(10):e003638. pmid:27688235
  9. Levinson W, Kallewaard M, Bhatia RS, Wolfson D, Shortt S, Kerr EA. ‘Choosing Wisely’: a growing international campaign. BMJ Quality & Safety. 2015;24(2):167–74. pmid:25552584
  10. Emanuel EJ, Fuchs VR. The Perfect Storm of Overutilization. JAMA. 2008;299(23):2789–91. pmid:18560006
  11. Van den Bruel A, Thompson M, Buntinx F, Mant D. Clinicians’ gut feeling about serious infections in children: observational study. BMJ: British Medical Journal. 2012;345:e6144. pmid:23015034
  12. Van den Bruel A, Haj-Hassan T, Thompson M, Buntinx F, Mant D. Diagnostic value of clinical features at presentation to identify serious infection in children in developed countries: a systematic review. The Lancet. 2010;375(9717):834–45. pmid:20132979
  13. Calfee CS, Delucchi K, Parsons PE, Thompson BT, Ware LB, Matthay MA, et al. Subphenotypes in acute respiratory distress syndrome: latent class analysis of data from two randomised controlled trials. Lancet Respir Med. 2014;2(8):611–20. Epub 2014/05/24. pmid:24853585; PubMed Central PMCID: PMC4154544.
  14. Famous KR, Delucchi K, Ware LB, Kangelaris KN, Liu KD, Thompson BT, et al. Acute Respiratory Distress Syndrome Subphenotypes Respond Differently to Randomized Fluid Management Strategy. Am J Respir Crit Care Med. 2017;195(3):331–8. Epub 2016/08/12. pmid:27513822; PubMed Central PMCID: PMC5328179.
  15. Seymour CW, Kennedy JN, Wang S, Chang C-CH, Elliott CF, Xu Z, et al. Derivation, Validation, and Potential Treatment Implications of Novel Clinical Phenotypes for Sepsis. JAMA. 2019;321(20):2003–17. pmid:31104070
  16. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Journal of British Surgery. 2015;102(3):148–58.
  17. Monti S, Tamayo P, Mesirov J, Golub T. Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Machine learning. 2003;52(1):91–118.
  18. Booth HP, Prevost AT, Gulliford MC. Validity of smoking prevalence estimates from primary care electronic health records compared with national population survey data for England, 2007 to 2011. Pharmacoepidemiol Drug Saf. 2013;22(12):1357–61. Epub 2013/11/19. pmid:24243711.
  19. Shickel B, Loftus TJ, Adhikari L, Ozrazgat-Baslanti T, Bihorac A, Rashidi P. DeepSOFA: A Continuous Acuity Score for Critically Ill Patients using Clinically Interpretable Deep Learning. Sci Rep. 2019;9(1):1879. Epub 2019/02/14. pmid:30755689; PubMed Central PMCID: PMC6372608.
  20. Subbe CP, Kruger M, Rutherford P, Gemmel L. Validation of a modified Early Warning Score in medical admissions. QJM. 2001;94(10):521–6. Epub 2001/10/06. pmid:11588210.
  21. Fraley C, Raftery AE. How Many Clusters? Which Clustering Method? Answers Via Model-Based Cluster Analysis. The Computer Journal. 1998;41(8):578–88.
  22. Jhanji S, Thomas B, Ely A, Watson D, Hinds CJ, Pearse RM. Mortality and utilisation of critical care resources amongst high-risk surgical patients in a large NHS trust*. Anaesthesia. 2008;63(7):695–700. pmid:18489613
  23. Green M, Lander H, Snyder A, Hudson P, Churpek M, Edelson D. Comparison of the Between the Flags calling criteria to the MEWS, NEWS and the electronic Cardiac Arrest Risk Triage (eCART) score for the identification of deteriorating ward patients. Resuscitation. 2018;123:86–91. Epub 2017/11/25. pmid:29169912; PubMed Central PMCID: PMC6556215.
  24. Rothman MJ, Rothman SI, Beals Jt. Development and validation of a continuous measure of patient condition using the Electronic Medical Record. J Biomed Inform. 2013;46(5):837–48. Epub 2013/07/09. pmid:23831554.
  25. Shah SJ, Katz DH, Selvaraj S, Burke MA, Yancy CW, Gheorghiade M, et al. Phenomapping for novel classification of heart failure with preserved ejection fraction. Circulation. 2015;131(3):269–79. Epub 2014/11/16. pmid:25398313; PubMed Central PMCID: PMC4302027.