Physiologic signatures within six hours of hospitalization identify acute illness phenotypes

During the early stages of hospital admission, clinicians use limited information to make decisions as patient acuity evolves. We hypothesized that clustering analysis of vital signs measured within six hours of hospital admission would reveal distinct patient phenotypes with unique pathophysiological signatures and clinical outcomes. We created a longitudinal electronic health record dataset for 75,762 adult patient admissions to a tertiary care center in 2014–2016 lasting six hours or longer. Physiotypes were derived via unsupervised machine learning in a training cohort of 41,502 patients by applying consensus k-means clustering to six vital signs measured within six hours of admission. Reproducibility and correlation with clinical biomarkers and outcomes were assessed in a validation cohort of 17,415 patients and a testing cohort of 16,845 patients. Training, validation, and testing cohorts had similar age (54–55 years) and sex (55% female) distributions. There were four distinct clusters. Physiotype A had physiologic signals consistent with early vasoplegia, hypothermia, and low-grade inflammation and favorable short- and long-term clinical outcomes despite early, severe illness. Physiotype B exhibited early tachycardia, tachypnea, and hypoxemia followed by the highest incidence of prolonged respiratory insufficiency, sepsis, acute kidney injury, and short- and long-term mortality. Physiotype C had minimal early physiological derangement and favorable clinical outcomes. Physiotype D had the greatest prevalence of chronic cardiovascular and kidney disease, presented with severely elevated blood pressure, and had good short-term outcomes but suffered increased 3-year mortality. Comparing sequential organ failure assessment (SOFA) scores across physiotypes demonstrated that clustering did not simply recapitulate previously established acuity assessments.
In a heterogeneous cohort of hospitalized patients, unsupervised machine learning techniques applied to routine, early vital sign data identified physiotypes with unique disease categories and distinct clinical outcomes. This approach has the potential to augment understanding of pathophysiology by distilling thousands of disease states into a few physiological signatures.

Introduction
Each year in the United States alone there are more than 36 million hospital admissions and seven thousand in-hospital mortalities, nearly one quarter of which may be preventable [1][2][3][4]. Early in each hospital admission, clinicians formulate decisions regarding diagnostic tests, treatments, and triage destinations using information that has diluted signal-to-noise ratios [5][6][7]. These arduous clinical decision-making tasks are supported by analyzing vital signs representing essential physiological processes [8][9][10][11][12]. Identifying early vital sign trajectories may have utility for discovering unique physiological signatures that are associated with distinct patient phenotypes and clinical outcomes. Unsupervised machine learning (ML) clustering analyses of clinical variables have identified meaningful subtypes of sepsis and the acute respiratory distress syndrome, but this approach has not been reported among broad, heterogeneous cohorts incorporating all hospitalized patients [13][14][15].
Using electronic health record data spanning 75,762 adult hospital admissions, we test the hypothesis that unsupervised ML analysis of vital signs recorded within six hours of hospital admission reveals discrete and reproducible physiologic signatures of acute illness phenotypes (physiotypes) that are associated with distinct disease categories and clinical outcomes.

Data source and participants
We generated a longitudinal dataset of electronic health records (EHR) for 75,762 hospital admissions of 43,598 patients representing all adults (age ≥18 years) admitted to the University of Florida Health 1000-bed academic hospital between June 1, 2014 and April 1, 2016 with length of stay greater than or equal to six hours including emergency department admission if applicable. Patients completely missing at least two of the six vital sign measurements (systolic and diastolic blood pressure, heart rate, respiratory rate, temperature, and oxygen saturation) within six hours of admission were excluded (S1 Fig). A detailed description of our methods is available in S1 Text. This project was approved by the University of Florida Institutional Review Board.

Study design
We followed Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) recommendations under the Type 2b analysis category [16] to chronologically split the dataset into training (admissions between June 1, 2014 and May 31, 2015, n = 41,502), validation (admissions between June 1, 2015 and October 31, 2015, n = 17,415), and testing (admissions between November 1, 2015 and April 1, 2016, n = 16,845) cohorts to mitigate potentially adverse effects of dataset drift due to changes in clinical practice or patient populations. To identify acute illness phenotypes (physiotypes) using early physiologic signatures, we applied unsupervised ML clustering to temporal measurements of six vital signs recorded within six hours of hospital admission in the training cohort. We assessed physiotype reproducibility by applying alternative clustering methods in the training dataset, assessing physiotype frequency distributions and clinical outcomes in the validation cohort, and predicting physiotypes in the testing cohort (S2 Fig).
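The chronological split described above can be sketched as a simple date lookup. This is an illustrative sketch only, not the authors' code: the cohort boundaries are taken from the text, while the function name `assign_cohort` and the use of a bare admission date are assumptions made for the example.

```python
from datetime import date

# Cohort date ranges as stated in the text (both endpoints inclusive).
COHORTS = [
    ("training", date(2014, 6, 1), date(2015, 5, 31)),
    ("validation", date(2015, 6, 1), date(2015, 10, 31)),
    ("testing", date(2015, 11, 1), date(2016, 4, 1)),
]


def assign_cohort(admission_date):
    """Return the cohort name for an admission date, or None if it falls outside the study period."""
    for name, start, end in COHORTS:
        if start <= admission_date <= end:
            return name
    return None
```

Splitting by calendar time rather than at random keeps later admissions out of the training data, so the validation and testing cohorts can expose drift in practice patterns or patient mix.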

Identifying acute illness physiotypes using early physiologic signatures
To derive physiotypes with reproducible early physiologic signatures, we applied consensus k-means clustering [17] to 36 features derived from time series of six vital signs measured within six hours of hospital admission for each encounter in the training cohort. Based on consensus matrix plots and cumulative distribution function curves, the optimal number of physiologic clusters was four (S3 Fig) [15].
We processed raw time series to remove outliers and assess distributions, missingness, and correlation (S1 Table and S4 Fig). Raw time series were resampled to an hourly frequency, using mean values when multiple measurements were recorded during the same one-hour window. Missing values were imputed by propagating temporally adjacent values forward and backward [18]. For records with no measurements within six hours of hospitalization, we imputed median values from the training cohort. Each admission was represented by six hourly values for six vital signs, yielding 36 clustering features. Vital sign patterns were visualized using line plots with 95% confidence intervals, t-distributed stochastic neighbor embedding (t-SNE) plots, ranked plots for mean standardized difference between physiotype pairs, and vital sign mosaic plots (see S1 Text for a comprehensive description).
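The feature construction described in this paragraph can be illustrated with a short standard-library sketch. It is not the authors' pipeline: the vital sign names in `VITALS`, the function names, the `(hours_since_admission, value)` input format, and the choice to fill forward before filling backward are all assumptions made for the example, and outlier removal is omitted.

```python
from statistics import mean

# Hypothetical short names for the six vital signs listed in the text.
VITALS = ["sbp", "dbp", "hr", "rr", "temp", "spo2"]


def hourly_series(measurements, n_hours=6, fallback=None):
    """Resample (hours_since_admission, value) pairs to one value per hour.

    Each one-hour window takes the mean of its measurements. Empty windows are
    filled from the previous window, then leading gaps from the next known
    window. If there are no measurements at all, `fallback` is used throughout
    (in the paper's setting, the training cohort median).
    """
    bins = [[] for _ in range(n_hours)]
    for t, value in measurements:
        if 0 <= t < n_hours:
            bins[int(t)].append(value)
    hourly = [mean(b) if b else None for b in bins]
    if all(v is None for v in hourly):
        return [fallback] * n_hours
    last = None
    for i, v in enumerate(hourly):  # forward fill
        if v is None:
            hourly[i] = last
        else:
            last = v
    nxt = None
    for i in range(n_hours - 1, -1, -1):  # backward fill for leading gaps
        if hourly[i] is None:
            hourly[i] = nxt
        else:
            nxt = hourly[i]
    return hourly


def encounter_features(vitals, training_medians, n_hours=6):
    """Concatenate the hourly series of all six vital signs into one vector (6 x 6 = 36 features)."""
    features = []
    for name in VITALS:
        features.extend(hourly_series(vitals.get(name, []), n_hours, training_medians[name]))
    return features
```

For example, a heart rate series with readings only in the first and fourth hours yields six hourly values in which the second and third hours repeat the first-hour mean and the last two hours repeat the fourth-hour value.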

Clinical characteristics, biological correlates, and clinical outcomes
For each admission we extracted demographics, 19 clinical biomarkers routinely measured at hospital admission (S2 Table), Sequential Organ Failure Assessment (SOFA) and Modified Early Warning Score (MEWS) acuity scores, and patient outcomes [19,20]. Details on data processing are described in S1 Text. Primary outcomes were thirty-day and three-year mortality. Median follow-up duration was 4.3 years per reverse Kaplan-Meier method. Other outcomes were acute kidney injury (AKI), venous thromboembolism, sepsis, intensive care unit (ICU) admission, mechanical ventilation (MV), and renal replacement therapy (RRT). Physiotype reproducibility was assessed in the validation and testing cohorts (S2 Fig). We assessed the robustness of derived physiotypes using sensitivity analyses excluding variables with high missingness, excluding both highly missing and highly correlated variables, and using a 12-hour vital sign window. We validated derived physiotypes in two steps. In the validation cohort we rederived clusters using consensus k-means and compared them with training cohort clusters. In the testing cohort, we predicted physiotypes based on the clinical characteristics of training cohort clusters. Each patient was assigned to the physiotype whose centroid was at the minimum Euclidean distance from that patient (S1 Text). Clinical variables across clusters were compared using line plots, t-SNE plots, and ranked plots.
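The minimum-distance prediction step used in the testing cohort can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the dictionary layout of `centroids` are hypothetical, the toy centroids in the usage note are invented, and any feature scaling applied before computing distances is not shown.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def predict_physiotype(features, centroids):
    """Return the label of the centroid closest to `features`.

    `centroids` maps a physiotype label to its centroid vector, which would be
    derived from the training cohort clusters.
    """
    return min(centroids, key=lambda label: euclidean(features, centroids[label]))
```

With two-dimensional toy centroids `{"A": [0, 0], "B": [10, 10]}`, a patient at `[1, 2]` would be assigned to "A" and a patient at `[9, 8]` to "B". In the study each vector would have 36 entries and there would be four centroids.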
Physiotypes were compared using the χ2 test for categorical variables and analysis of variance and the Kruskal-Wallis test for continuous variables. Overall survival was illustrated using Kaplan-Meier curves and compared using the log-rank test. Adjusted hazard ratios (HR) for each physiotype were estimated using Cox proportional-hazards regression while adjusting for age, sex, comorbidities, and SOFA score on admission. We adjusted p values for the family-wise error rate due to multiple comparisons using the Bonferroni correction. To assess whether physiotypes merely recapitulated existing acuity scores, we compared physiotypes with SOFA scores within 24 hours of admission using alluvial plots and chord diagrams. Analyses were performed with Python version 3.7 and R version 3.5.1.
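As a brief illustration of the Bonferroni correction mentioned above, each raw p value is multiplied by the number of comparisons and capped at 1. The function name and the example p values below are hypothetical, and which sets of tests the authors treated as one family is not specified here.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted p values: each p is multiplied by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

For three comparisons with raw p values of 0.01, 0.04, and 0.5, only the first would remain below 0.05 after adjustment.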

Clinical characteristics of patients
Training, validation, and testing cohorts had similar clinical characteristics, biomarker distributions, and outcomes (S3 and S4 Tables). Average patient age was 54 years and sex was equally distributed. Almost two thirds of all patients had urgent admissions, 18% were transferred from another hospital, 27% were admitted to an ICU or intermediate care unit (IMC), and 28% had surgery during admission. Among patients admitted directly to an ICU/IMC, 22-27% had high acuity scores (SOFA greater than 6 or MEWS greater than 4) on admission. Among patients admitted to hospital wards, 2-3% had high acuity scores. Overall thirty-day and three-year mortality rates were 4% and 19%, respectively.

Derivation and characteristics of physiotypes
We identified four physiotypes with unique pathophysiological signatures, disease categories, and clinical outcomes (Tables 1 and 2, S5 and S6 Tables, Fig 1). Physiotypes were labeled as Physiotype A (31% of total cohort), B (23% of total cohort), C (31% of total cohort), and D (15% of total cohort) according to ascending value of systolic blood pressure (Fig 1A).
Physiotype A. Physiotype A exhibited early and persistent hypotension without a concomitant rise in heart rate, high incidence of vasopressor support (32%), initial normothermia followed by decreasing body temperature, and low respiratory rate with high oxygen saturation (SpO2), consistent with having the highest proportion undergoing early surgery (35%). Despite high incidence of surgical interventions, Physiotype A had lower inflammatory markers (i.e., C-reactive protein, erythrocyte sedimentation rate) than two of the other three physiotypes. Despite early, severe illness, Physiotype A had favorable short- and long-term clinical outcomes, consistent with reversible surgical disease and evidenced by the greatest proportion of patients with SOFA score > 6 within 24 hours of admission (12%) but the second-lowest incidence of ICU/IMC admission (26%), AKI (16%), and three-year mortality (17%).
Physiotype B. Physiotype B exhibited early tachycardia, tachypnea, and hypoxemia. Unlike similarly hypotensive Physiotype A, Physiotype B had substantial biomarker evidence of inflammation, evident by the highest levels of C-reactive protein (53 mg/L compared with 11-
Physiotype D. Physiotype D had the greatest prevalence of chronic cardiovascular and kidney disease (32% and 20%, respectively), the greatest proportions of African American patients (37% vs. 16-23% in other physiotypes) and emergent admissions (89%), and presented with severely elevated blood pressure; 79% had a systolic blood pressure measurement greater than 160 mmHg. Physiotype D had the second highest incidence of ICU/IMC admission (27%) despite having the lowest proportion of patients with SOFA > 6 (5%) and had 2% hospital mortality but suffered 20% 3-year mortality.

Vital sign signatures
To understand which vital signs made the greatest contributions to cluster assignments, vital sign standardized mean differences were compared between pairs of phenotypes (Fig 2). Temperature and oxygen saturation contributed least to phenotype differences. Systolic and diastolic blood pressure varied substantially between all physiotypes except for A and B.
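A pairwise standardized comparison of this kind can be sketched as below. The sketch follows the convention described in the figure caption (values standardized over both physiotypes in the pair, then averaged per physiotype), but it is an assumption-laden illustration: the function names are hypothetical, the population standard deviation is an arbitrary choice, and the authors' exact computation is described in S1 Text rather than here.

```python
from statistics import mean, pstdev


def standardized_group_means(values_a, values_b):
    """Standardize values over both groups combined, then return each group's mean z-score."""
    pooled = list(values_a) + list(values_b)
    mu, sd = mean(pooled), pstdev(pooled)
    if sd == 0:  # no variation in the pair, so no difference to report
        return 0.0, 0.0
    z_a = mean((v - mu) / sd for v in values_a)
    z_b = mean((v - mu) / sd for v in values_b)
    return z_a, z_b


def rank_vitals(group_a, group_b):
    """Rank vital signs by absolute standardized mean difference between two physiotypes.

    `group_a` and `group_b` map a vital sign name to that physiotype's values.
    Returns (name, difference) pairs, largest absolute difference first.
    """
    diffs = []
    for name in group_a:
        z_a, z_b = standardized_group_means(group_a[name], group_b[name])
        diffs.append((name, z_b - z_a))
    return sorted(diffs, key=lambda item: abs(item[1]), reverse=True)
```

Vital signs at the top of the ranking are those that separate the two physiotypes most strongly, which is how blood pressure would stand out relative to temperature or oxygen saturation in the comparison above.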

PLOS DIGITAL HEALTH
Physiologic signatures identify acute illness phenotypes

Survival probabilities
Three-year survival probability was modeled adjusting for demographics and comorbidities (Fig 3), demonstrating lower probability of survival for male sex (HR 1.

Physiotypes were also reproducible in testing data (S15 Table). The clinical characteristics, biomarkers, and patient outcomes of physiotypes predicted in the testing cohort mimicked those in the training cohort (S16 and S17 Figs, S16 and S17 Tables). SOFA score distributions, survival curves, and diagnosis groups were similar across training, validation, and testing cohorts (S18-S26 Figs). Gaussian mixture modeling confirmed the statistical fit of the 4-class model (S27 Fig and S18 Table). Physiotypes identified by Gaussian mixture modeling had vital sign distributions and t-distributed stochastic neighbor embedding plots that were similar to those originally derived by consensus clustering (S28-S30 Figs, S19 and S20 Tables).

Discussion
Using six vital signs measured within six hours of hospital admission, consensus clustering identified four distinct, clinically relevant patient phenotypes with unique pathophysiological signatures, disease categories, and clinical outcomes. Blood pressure values and trends contributed substantially to cluster assignments: one hypertensive, one normotensive, and two hypotensive clusters. Among the two hypotensive clusters, one was inflammatory, the other noninflammatory according to C-reactive protein and erythrocyte sedimentation rate values. Beyond these fundamental distinctions, clusters were also differentiated by disease categories, producing the final physiotype labels. Physiotype A, hypotensive non-inflammatory surgical shock, had physiologic signals suggesting early vasoplegia and hypothermia but low-grade inflammation relative to Physiotype B, a hypotensive inflammatory pulmonary dysfunction physiotype associated with early tachycardia, tachypnea, and hypoxemia followed by the greatest burdens of prolonged respiratory insufficiency, sepsis, acute kidney injury, and short- and long-term mortality. Physiotype C, a normotensive, rapid normalization physiotype, had minimal early physiological derangement and favorable clinical outcomes. Physiotype D, hypertensive chronic disease exacerbation, had the greatest prevalence of chronic cardiovascular and kidney disease, presented with severely elevated blood pressure, and had favorable short-term outcomes but suffered 20% three-year mortality. Each physiotype contained substantial patient proportions across the full ranges of SOFA scores and component subscores, suggesting that clustering did not simply recapitulate SOFA acuity assessments. Finally, physiotype characteristics were reproduced with fidelity in validation and testing cohorts.
Beyond the potential to augment understanding of pathophysiology by distilling thousands of disease states into a few physiological signatures, physiotypes could be adapted to augment clinical decision-making under time constraints and uncertainty. Early identification of hypotensive inflammatory pulmonary dysfunction could theoretically facilitate early ICU admission and high suspicion for sepsis with attention to resuscitation strategies that maintain adequate renal perfusion without inducing volume overload and hydrostatic pulmonary edema, primarily by focusing on providing the optimal balance of intravenous fluid resuscitation and vasopressors [5][6][7][8][22]. Early identification of normotensive rapid recovery could facilitate early hospital discharge or triage to low-intensity care settings (i.e., hospital floors), avoiding excessive monitoring and testing that confers lower value of care and may impart harm from unnecessary treatments [9,10]. Early identification of hypertensive chronic disease exacerbation could suggest low value for critical care resources compared with careful postdischarge follow-up for mitigating long-term mortality, and could be built into a decision-support system that facilitates hospital ward admission and outpatient clinic visits to address modifiable risk factors and optimize medication regimens for treating the underlying chronic disease. Several statistical and machine learning methods can accurately predict risk for death, but these approaches do not elucidate pathophysiologic states or disease categories [23,24]. Conversely, clustering can identify patient phenotypes that have unique disease states and mortality risk, representing a potentially useful adjunct to clinical decision-support systems, particularly among heterogeneous patient cohorts with diverse disease etiologies.
We are unaware of previous studies using cluster analyses of early vital sign measurements to identify phenotypes in heterogeneous cohorts of patients hospitalized for any reason. Others have used clustering for identifying patients with unique disease subtypes with unique treatment responses; sepsis and diastolic heart failure are prominent examples. Seymour et al. [15] performed clustering analyses on a multi-center cohort of sepsis patients with the rationale that sepsis pathophysiology is heterogeneous and identifying distinct sepsis phenotypes may facilitate targeted therapy. Clustering was performed on both clinical and host immune response biomarker variables, identifying four distinct clusters. In a series of simulations, varying proportions of each cluster were applied to previously reported randomized controlled trials. Treatment effects varied significantly across simulations, suggesting unique treatment responses. Shah et al. [25] performed clustering analyses on a single-center cohort of patients with heart failure and preserved ejection fraction, another heterogeneous syndrome refractory to one-size-fits-all management. Clustering was performed on electrocardiogram and echocardiogram data as well as clinical variables, identifying three distinct phenotypes with unique risk-adjusted clinical outcomes. While Seymour et al. [15] and Shah et al. [25] both identified subgroups of patients within larger patient groups that share an established diagnosis, we instead apply clustering methods to any hospitalized patient, identifying broad, generalized patterns of pathophysiology rather than targeted treatment responses. This difference precludes further comparison of our results with others.
We also acknowledge several limitations. Our study used data from a single institution, limiting the generalizability of our findings, and external validation in databases from different centers is needed. Yet, it seems unlikely that selection bias significantly affected results, as all adult patients admitted for longer than six hours were included. Input variables were limited to the first six hours following hospital admission so that phenotypes could be identified early enough to support clinical decision-making under time constraints and uncertainty. It is possible that the same advantages for early decision-support could be achieved while incorporating historical patient data from previous encounters in the electronic health record; further research is necessary to determine whether this strategy is advantageous. Waveform data, though not universally available in EHRs, has the potential to improve the precision of phenotype clustering. Our clustering approach does not ensure temporal ordering of vital signs, which could influence cluster assignments. Finally, the potential of early clustering to augment clinical decision-making remains theoretical until evaluated in a prospective trial.

Conclusions
Using six vital signs measured within six hours of hospital admission, clustering analyses identified four distinct patient phenotypes that had unique disease categories and clinical outcomes and did not recapitulate previously established acuity assessments. Beyond elucidating pathophysiology by distilling thousands of disease states into a few physiological signatures, identifying patient phenotypes during the early stages of hospital admission may have important implications for clinical decision-making under time constraints.

N = 41,502). Spearman correlation heat map shows the pairwise Spearman rank order correlation coefficient among the 6 vital signs studied in our paper. The darker red color, the higher correlation in positive direction. Abbreviations: RR: respiratory rate; SpO2: peripheral capillary oxygen saturation; Temp: temperature; HR: heart rate; SBP: systolic blood pressure; DBP: diastolic blood pressure.
In all panels, the variables are standardized such that all means are scaled to 0 and SDs to 1. A value of 1 for the standardized variable (x-axis) signifies that the mean value for the phenotype was 1 SD higher than the mean value for both phenotypes shown in the graph as a whole. Abbreviations in order: SpO2: peripheral capillary oxygen saturation; Temp: temperature; SBP: systolic blood pressure; DBP: diastolic blood pressure; RR: respiratory rate; HR: heart rate.

Supporting information
N = 41,502). (A) Probabilities of assignment to cluster 1, and purple for those actually assigned to cluster 1, (B) Probabilities for patients assigned to cluster 2, and blue for those actually assigned to cluster 2, (C) Probabilities for patients assigned to cluster 3, and green for those actually assigned to cluster 3, and (D) probabilities for patients assigned to cluster 4, and orange for those actually assigned to cluster 4. Black lines correspond to median [IQR] of probability. Gray shading corresponds to region with a 45-55% (low or marginal) probability of assignment. Inset proportion is the % of 41,502 in the marginal region. (DOCX)
S1 Table. Physiotype clinical characteristics and biomarkers in sensitivity analysis by excluding highly missing variable (Temperature) in the training cohort. (DOCX)
S10 Table. Physiotype illness severity, clinical outcomes, and resource use in sensitivity analysis by excluding highly missing variable (Temperature) in the training cohort. (DOCX)
S11 Table. Physiotype clinical characteristics and biomarkers in sensitivity analysis by excluding variables with high missingness (temperature) and correlation (diastolic blood pressure and respiratory rate) in the training cohort. (DOCX)
S12 Table. Physiotype illness severity, clinical outcomes, and resource use in sensitivity analysis by excluding variables with high missingness (temperature) and correlation (diastolic blood pressure and respiratory rate) in the training cohort. (DOCX)
S13 Table. Physiotype clinical characteristics and biomarkers in sensitivity analysis by using a 12 hour window of EHR data in the training cohort. (DOCX)
S14 Table. Physiotype illness severity, clinical outcomes, and resource use in sensitivity analysis by using a 12 hour window of EHR data in the training cohort. (DOCX)
S15 Table. Centroids of physiotypes for prediction. (DOCX)
S16 Table. Physiotype clinical characteristics and biomarkers in the testing cohort. (DOCX)
S17 Table. Physiotype illness severity, clinical outcomes, and resource use in the testing cohort. (DOCX)
S18 Table. Statistical output from Gaussian mixture modeling in the training cohort (N = 41,502). (DOCX)
S19 Table. Physiotype clinical characteristics and biomarkers by physiotypes derived using Gaussian mixture modeling in sensitivity analysis in the training cohort. (DOCX)
S20 Table. Physiotype illness severity, clinical outcomes, and resource use by physiotypes derived using Gaussian mixture modeling in sensitivity analysis in the training cohort. (DOCX)