The authors have declared that no competing interests exist.
‡ These authors contributed equally as senior authors.
During the early stages of hospital admission, clinicians use limited information to make decisions as patient acuity evolves. We hypothesized that clustering analysis of vital signs measured within six hours of hospital admission would reveal distinct patient phenotypes with unique pathophysiological signatures and clinical outcomes. We created a longitudinal electronic health record dataset for 75,762 adult patient admissions to a tertiary care center in 2014–2016 lasting six hours or longer. Physiotypes were derived via unsupervised machine learning in a training cohort of 41,502 patients applying consensus
In this paper, we present a machine learning approach, consensus clustering, that groups hospitalized patients, based on six routinely collected vital signs measured within six hours of hospital admission, into previously undescribed subsets, or acute illness phenotypes, that may have different risks for a poor outcome or different treatment responses. We identified four acute illness phenotypes associated with distinct clinical characteristics, biomarker patterns, and clinical outcomes. We validated the reproducibility of the phenotypes using a different dataset and a different clustering approach. These early-identified phenotypes, which have unique disease states and mortality risks, have the potential to augment understanding of pathophysiology by distilling thousands of disease states into a few physiological signatures, and to augment clinical decision-support systems operating under time constraints.
Each year in the United States alone there are more than 36 million hospital admissions and seven thousand in-hospital mortalities, nearly one quarter of which may be preventable [
Using electronic health record data spanning 75,762 adult hospital admissions, we test the hypothesis that unsupervised ML analysis of vital signs recorded within six hours of hospital admission reveals discrete and reproducible physiologic signatures of acute illness phenotypes (
We generated a longitudinal dataset of electronic health records (EHR) for 75,762 hospital admissions of 43,598 patients representing all adults (age ≥18 years) admitted to the University of Florida Health 1000-bed academic hospital between June 1, 2014 and April 1, 2016 with length of stay greater than or equal to six hours including emergency department admission if applicable. Patients completely missing at least two of the six vital sign measurements (systolic and diastolic blood pressure, heart rate, respiratory rate, temperature, and oxygen saturation) within six hours of admission were excluded (
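The inclusion and exclusion rules above can be sketched as a simple filter. This is an illustrative sketch only, not the authors' pipeline: the function name `is_eligible`, the vital sign keys, and the `(name, timestamp)` record layout are assumptions made for the example.

```python
from datetime import datetime, timedelta

# Hypothetical short names for the six vital signs listed in the text.
VITALS = ("sbp", "dbp", "hr", "rr", "temp", "spo2")
WINDOW = timedelta(hours=6)


def is_eligible(age_years, admit_time, discharge_time, measurements):
    """Apply the stated cohort rules to one admission.

    measurements: iterable of (vital_name, timestamp) tuples (assumed layout).
    """
    # Adults only, with a stay of at least six hours.
    if age_years < 18 or discharge_time - admit_time < WINDOW:
        return False
    # Vital signs observed at least once within six hours of admission.
    observed = {
        name
        for name, ts in measurements
        if name in VITALS and admit_time <= ts <= admit_time + WINDOW
    }
    # Exclude admissions completely missing two or more of the six vitals.
    n_missing = len(VITALS) - len(observed)
    return n_missing < 2


admit = datetime(2015, 1, 1, 8, 0)
records = [(v, admit + timedelta(hours=1)) for v in VITALS[:5]]
print(is_eligible(54, admit, admit + timedelta(hours=30), records))
```

In this example the admission is missing only one vital sign (oxygen saturation), so it passes the filter; dropping a second vital sign would exclude it.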
We followed Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) recommendations under the Type 2b analysis category [
To derive
We processed raw time series to remove outliers and assess distributions, missingness, and correlation (
For each admission we extracted demographics, 19 clinical biomarkers routinely measured at hospital admission (
We assessed
Training, validation, and testing cohorts had similar clinical characteristics, biomarker distributions, and outcomes (
We identified four
(A) Distribution of vital signs during the first six hours of hospital admission. (B) Visualization of physiotypes using the t-distributed stochastic neighbor embedding (t-SNE) technique. (C) Physiotype average vital sign mosaics using a self-organizing map.
Variables | Total | Physiotype A | Physiotype B | Physiotype C | Physiotype D |
---|---|---|---|---|---|
Number of Encounters (%) | 41,502 | 12,695 (31) | 9,710 (23) | 12,962 (31) | 6,135 (15) |
Age, mean (SD), years | 54 (19) | 53 (18) | 50 (20) | 56 (18) | 56 (17) |
Female sex, n (%) | 22,745 (55) | 7,291 (57) | 5,585 (58) | 6,641 (51) | 3,228 (53) |
Race, n (%) | | | | | |
White | 29,076 (70) | 9,577 (75) | 6,723 (69) | 9,195 (71) | 3,581 (58) |
African American | 9,634 (23) | 2,090 (16) | 2,342 (24) | 2,947 (23) | 2,255 (37) |
Primary Insurance, n (%) | | | | | |
Private | 9,591 (23) | 3,158 (25) | 2,278 (23) | 2,991 (23) | 1,164 (19) |
Medicare | 18,499 (45) | 5,604 (44) | 3,852 (40) | 6,124 (47) | 2,919 (48) |
Medicaid | 9,231 (22) | 2,767 (22) | 2,588 (27) | 2,566 (20) | 1,310 (21) |
Uninsured | 4,181 (10) | 1,166 (9) | 992 (10) | 1,281 (10) | 742 (12) |
Residing neighborhood characteristics | | | | | |
Proportion of African-Americans (%), mean (SD) | 18.7 (17.5) | 17.3 (16.1) | 19.3 (17.8) | 18.5 (17.3) | 21.3 (19.4) |
Proportion Below Poverty (%), mean (SD) | 22.7 (10.1) | 21.8 (10.0) | 23.2 (9.9) | 22.6 (10.0) | 24.0 (10.4) |
Distance from Hospital (mile), median (IQR) | 18 (3, 34) | 22 (3, 37) | 14 (3, 32) | 18 (3, 34) | 14 (3, 27) |
Hypertension, n (%) | 21,639 (52) | 6,498 (51) | 5,000 (51) | 6,723 (52) | 3,418 (56) |
Cardiovascular disease, n (%) | 12,058 (29) | 3,477 (27) | 2,833 (29) | 3,783 (29) | 1,965 (32) |
Diabetes mellitus, n (%) | 10,111 (24) | 2,934 (23) | 2,400 (25) | 3,125 (24) | 1,652 (27) |
Chronic kidney disease, n (%) | 6,518 (16) | 1,757 (14) | 1,454 (15) | 2,056 (16) | 1,251 (20) |
Emergent Admission, n (%) | 30,177 (73) | 7,367 (58) | 8,106 (83) | 9,244 (71) | 5,460 (89) |
Transfer from another hospital, n (%) | 7,115 (17) | 1,943 (15) | 1,957 (20) | 2,100 (16) | 1,115 (18) |
Diseases of the circulatory system | 7,719 (19) | 2,142 (17) | 1,503 (15) | 2,533 (20) | 1,541 (25) |
Respiratory and infectious diseases | 3,306 (8) | 571 (4) | 1,403 (14) | 692 (5) | 640 (10) |
Complications of pregnancy and childbirth | 3,148 (8) | 857 (7) | 1,100 (11) | 862 (7) | 329 (5) |
Diseases of the digestive/genitourinary systems | 5,184 (12) | 1,857 (15) | 1,028 (11) | 1,661 (13) | 638 (10) |
Diseases of the musculoskeletal/connective tissue and skin | 3,651 (9) | 1,489 (12) | 479 (5) | 1,216 (9) | 467 (8) |
Neoplasms | 2,743 (7) | 1,244 (10) | 377 (4) | 950 (7) | 172 (3) |
Surgical procedure on admission day, n (%) | 8,644 (21) | 4,441 (35) | 796 (8) | 2,933 (23) | 474 (8) |
ICU/IMC admission within first 24 hours, n (%) | 9,426 (23) | 2,893 (23) | 3,022 (31) | 2,151 (17) | 1,360 (22) |
Hypotension (MAP < 60 mmHg) at any time, n (%) | 14,470 (35) | 7,420 (58) | 3,393 (35) | 3,051 (24) | 606 (10) |
Duration, median (IQR), minutes | 57 (15, 168) | 60 (18, 197) | 75 (30, 212) | 18 (6, 62) | 24 (8, 68) |
Vasopressors used, n (%) | 7,531 (18) | 4,079 (32) | 995 (10) | 2,113 (16) | 344 (6) |
Out of operating room | 1,403 (3) | 646 (5) | 494 (5) | 198 (2) | 65 (1) |
Hypertension (SBP > 160 mmHg) at any time, n (%) | 14,838 (36) | 2,742 (22) | 1,611 (17) | 5,629 (43) | 4,856 (79) |
Troponin, tested, n (%) | 14,616 (35) | 3,223 (25) | 4,090 (42) | 4,214 (33) | 3,089 (50) |
Abnormal result among tested, n (%) | 3,398 (23) | 791 (25) | 987 (24) | 816 (19) | 804 (26) |
Highest administered FiO2, median (IQR), % | 0.21 (0.21, 0.40) | 0.28 (0.21, 0.40) | 0.21 (0.21, 0.33) | 0.21 (0.21, 0.40) | 0.21 (0.21, 0.29) |
Room air only, n (%) | 23,963 (58) | 6,273 (49) | 5,580 (57) | 8,040 (62) | 4,070 (66) |
0.22 – 0.40, n (%) | 14,790 (36) | 5,419 (43) | 3,285 (34) | 4,320 (33) | 1,766 (29) |
> 0.40, n (%) | 2,749 (7) | 1,003 (8) | 845 (8) | 602 (4) | 299 (5) |
PaO2/FiO2, tested with arterial blood gas, n (%) | 6,113 (15) | 2,015 (16) | 1,965 (20) | 1,345 (10) | 788 (13) |
<200 among tested, n (%) | 2,265 (5) | 747 (37) | 837 (43) | 427 (32) | 254 (32) |
Mechanical ventilation, n (%) | 2,123 (5) | 808 (6) | 656 (7) | 449 (3) | 210 (3) |
Preadmission estimated glomerular filtration rate | 95 (78, 111) | 96 (80, 112) | 100 (83, 117) | 93 (77, 107) | 90 (59, 105) |
Highest/reference creatinine | 1.24 (0.66) | 1.25 (0.71) | 1.31 (0.73) | 1.18 (0.54) | 1.24 (0.67) |
Renal Replacement therapy, n (%) | 641 (2) | 170 (1) | 119 (1) | 128 (1) | 224 (4) |
Highest Anion Gap, median (IQR), mmol/L | 14 (12, 17) | 13 (11, 16) | 15 (12, 18) | 14 (11, 16) | 15 (12, 17) |
Arterial Blood Gas tested, n (%) | 6,115 (15) | 2,016 (16) | 1,966 (20) | 1,345 (10) | 788 (13) |
pH < 7.3 among tested, n (%) | 1,437 (23) | 532 (26) | 558 (28) | 216 (16) | 131 (17) |
Highest Base deficit among tested, mean (SD), mmol/L | 4.8 (4.7) | 4.4 (4.2) | 6.4 (5.8) | 3.6 (3.2) | 4.3 (3.7) |
Lactate, tested, n (%) | 15,447 (37) | 4,360 (34) | 4,660 (48) | 4,006 (31) | 2,421 (39) |
2 – 4 mmol/L among tested, n (%) | 3,739 (24) | 1,012 (23) | 1,305 (28) | 854 (21) | 568 (23) |
> 4 mmol/L among tested, n (%) | 1,374 (9) | 379 (9) | 607 (13) | 204 (5) | 184 (8) |
Highest White blood cell count, median (IQR), ×10⁹/L | 9 (7, 13) | 9 (7, 13) | 10 (8, 14) | 9 (7, 12) | 9 (7, 12) |
Highest Premature neutrophils (bands), median (IQR), % | 10 (4, 20) | 10 (4, 17) | 12 (5, 24) | 5 (2, 14) | 8 (3, 15) |
Lowest Lymphocytes, median (IQR), % | 16 (9, 24) | 16 (9, 26) | 12 (6, 20) | 18 (11, 26) | 17 (10, 24) |
C-reactive protein, tested, n (%) | 5,862 (14) | 1,479 (12) | 1,694 (17) | 1,730 (13) | 959 (16) |
Highest C-reactive protein, median (IQR), mg/L | 18 (5, 77) | 18 (5, 71) | 53 (11, 122) | 11 (3, 54) | 12 (4, 52) |
Erythrocyte sedimentation rate, tested, n (%) | 3,903 (9) | 962 (8) | 1,021 (11) | 1,234 (10) | 686 (11) |
Highest Erythrocyte sedimentation rate, median (IQR), mm/h | 40 (19, 73) | 37 (18, 66) | 51 (23, 88) | 34 (17, 65) | 40 (20, 72) |
Highest Temperature, mean (SD), Celsius | 37.7 (0.6) | 37.7 (0.6) | 37.9 (0.8) | 37.6 (0.5) | 37.7 (0.6) |
38–39, n (%) | 8,633 (21) | 2,869 (23) | 2,349 (24) | 2,259 (17) | 1,156 (19) |
> 39, n (%) | 1,548 (4) | 349 (3) | 826 (9) | 211 (2) | 162 (3) |
Lowest Temperature, mean (SD), Celsius | 36.7 (1.0) | 36.5 (1.4) | 36.7 (0.8) | 36.7 (0.8) | 36.8 (0.7) |
Lowest Hemoglobin, mean (SD), g/dL | 11.5 (2.3) | 11.1 (2.3) | 11.2 (2.4) | 12.0 (2.2) | 12.0 (2.3) |
Highest RDW, mean (SD), % | 15.5 (2.1) | 15.5 (2.2) | 15.9 (2.3) | 15.2 (1.9) | 15.5 (2.0) |
Lowest Platelets, median (IQR), ×10⁹/L | 210 (161, 269) | 200 (152, 258) | 218 (161, 285) | 211 (166, 265) | 219 (169, 274) |
Platelets < 200, n (%), ×10⁹/L | 16,707 (40) | 5,535 (44) | 3,874 (40) | 4,971 (38) | 2,327 (38) |
< 100 | 2,643 (16) | 976 (18) | 785 (20) | 628 (13) | 254 (11) |
100–200 | 14,064 (84) | 4,559 (82) | 3,089 (80) | 4,343 (87) | 2,073 (89) |
International normalized ratio, tested, n (%) | 20,357 (49) | 5,607 (44) | 5,193 (53) | 6,150 (47) | 3,407 (56) |
≥ 2 | 1,836 (9) | 586 (10) | 583 (11) | 465 (8) | 202 (6) |
Glasgow Coma Scale score, n (%) | | | | | |
Moderate neurologic dysfunction (9–12) | 1,708 (4) | 631 (5) | 479 (5) | 401 (3) | 197 (3) |
Severe neurologic dysfunction (≤ 8) | 1,482 (4) | 477 (4) | 479 (5) | 336 (3) | 190 (3) |
Bilirubin tested, n (%), mg/dL | 21,183 (51) | 5,431 (43) | 5,902 (61) | 6,110 (47) | 3,740 (61) |
≥ 2 | 1,427 (7) | 527 (10) | 481 (8) | 306 (5) | 113 (3) |
Highest Glucose, median (IQR), mg/dL | 126 (104, 170) | 125 (102, 165) | 129 (105, 175) | 124 (102, 167) | 132 (106, 186) |
Albumin, tested, n (%) | 21,368 (51) | 5,508 (43) | 5,929 (61) | 6,172 (48) | 3,759 (61) |
< 2.5 | 1,243 (6) | 403 (7) | 555 (9) | 180 (3) | 105 (3) |
2.5–3.5 | 6,904 (32) | 1,912 (35) | 2,292 (39) | 1,621 (26) | 1,079 (29) |
Abbreviations: ICU: intensive care unit; IMC: intermediate care unit; MAP: mean arterial pressure; RDW: red cell distribution width; SD: standard deviation; IQR: interquartile range.
a Indicates p < 0.05 for the difference compared with Physiotype C, adjusted for multiple comparisons using the Bonferroni method. Supplemental Tables list p-values for all within-group comparisons.
b Cardiovascular disease was considered present if there was a history of congestive heart failure, coronary artery disease, or peripheral vascular disease.
c Reference glomerular filtration rate and reference creatinine were derived without use of race correction (see
Variables | Total | Physiotype A | Physiotype B | Physiotype C | Physiotype D |
---|---|---|---|---|---|
Number of encounters (%) | 41,502 | 12,695 (31) | 9,710 (23) | 12,962 (31) | 6,135 (15) |
SOFA score > 6, n (%) | 3,506 (8) | 1,494 (12) | 974 (10) | 720 (6) | 318 (5) |
Patients in ICU/IMC, SOFA score ≤ 6, n (%) | 6,882 (17) | 1,868 (15) | 2,195 (23) | 1,693 (13) | 1,126 (18) |
Patients in ICU/IMC, SOFA score > 6, n (%) | 2,544 (6) | 1,025 (8) | 827 (9) | 458 (4) | 234 (4) |
Patients on ward, SOFA score ≤ 6, n (%) | 31,114 (75) | 9,333 (74) | 6,541 (67) | 10,549 (81) | 4,691 (76) |
Patients on ward, SOFA score > 6, n (%) | 962 (2) | 469 (4) | 147 (2) | 262 (2) | 84 (1) |
MEWS score ≥ 5, n (%) | 2,828 (7) | 472 (4) | 1,549 (16) | 264 (2) | 543 (9) |
Patients in ICU/IMC, MEWS score ≤ 4, n (%) | 7,235 (17) | 2,507 (20) | 1,785 (18) | 1,941 (15) | 1,002 (16) |
Patients in ICU/IMC, MEWS score > 4, n (%) | 2,191 (5) | 386 (3) | 1,237 (13) | 210 (2) | 358 (6) |
Patients on ward, MEWS score ≤ 4, n (%) | 31,439 (76) | 9,716 (77) | 6,376 (66) | 10,757 (83) | 4,590 (75) |
Patients on ward, MEWS score > 4, n (%) | 637 (2) | 86 (1) | 312 (3) | 54 (0) | 185 (3) |
Hospital days, median (IQR) | 4 (2, 7) | 4 (2, 6) | 4 (3, 8) | 3 (2, 6) | 4 (2, 7) |
Surgery at any time, n (%) | 11,634 (28) | 5,225 (41) | 1,502 (15) | 3,957 (31) | 950 (15) |
Admitted to ICU/IMC | 11,121 (27) | 3,330 (26) | 3,504 (36) | 2,640 (20) | 1,647 (27) |
Days in ICU/IMC | 4 (2, 7) | 4 (3, 7) | 4 (3, 8) | 4 (2, 7) | 4 (2, 6) |
ICU/IMC stay greater than 48 hrs, n (%) | 8,332 (75) | 2,517 (76) | 2,722 (78)a | 1,872 (71) | 1,221 (74) |
Mechanical ventilation, n (%) | 3,218 (8) | 1,120 (9) | 1,036 (11) | 736 (6) | 326 (5) |
Mechanical ventilation hours, median (IQR) | 35 (14, 113) | 24 (11, 81) | 46 (17, 142) | 26 (12, 105) | 54 (21, 145) |
Mechanical ventilation greater than 2 calendar days, n (%) | 1,661 (52) | 492 (44) | 613 (59) | 349 (47) | 207 (63) |
Renal replacement therapy, n (%) | 1,262 (3) | 335 (3) | 299 (3) | 265 (2) | 363 (6) |
Acute kidney injury, n (%) | 6,905 (17) | 1,971 (16) | 2,119 (22) | 1,682 (13) | 1,133 (18) |
Community-acquired AKI, n (%) | 3,839 (56) | 1,234 (63) | 1,221 (58) | 873 (52) | 511 (45) |
Hospital-acquired AKI, n (%) | 3,066 (44) | 737 (37) | 898 (42) | 809 (48) | 622 (55) |
Worst AKI staging, n (%) | | | | | |
Stage 1 | 4,360 (63) | 1,194 (61) | 1,241 (59) | 1,174 (70) | 751 (66) |
Stage 2 | 1,362 (20) | 404 (20) | 484 (23) | 280 (17) | 194 (17) |
Stage 3 | 848 (12) | 269 (14) | 276 (13) | 171 (10) | 132 (12) |
Stage 3 with RRT | 335 (5) | 104 (5) | 118 (6) | 57 (3) | 56 (5) |
Venous thromboembolism, n (%) | 1,257 (3) | 341 (3) | 393 (4) | 350 (3) | 173 (3) |
Sepsis, n (%) | 3,750 (9) | 902 (7) | 1,933 (20) | 500 (4) | 415 (7) |
Hospital Disposition, n (%) | | | | | |
Hospital mortality | 1,141 (3) | 291 (2) | 502 (5) | 227 (2) | 121 (2) |
Another hospital, LTAC, SNF, Hospice | 4,475 (11) | 1,286 (10) | 1,231 (13) | 1,233 (10) | 725 (12) |
Home or short-term rehabilitation | 35,886 (86) | 11,118 (88) | 7,977 (82) | 11,502 (89) | 5,289 (86) |
Thirty-day mortality, n (%) | 1,633 (3.9) | 429 (3) | 684 (7) | 332 (3) | 188 (3) |
Three-year mortality, n (%) | 8,013 (19) | 2,205 (17) | 2,466 (25) | 2,109 (16) | 1,233 (20) |
Abbreviations: SOFA: sequential organ failure assessment; MEWS: modified early warning score; ICU: intensive care unit; IMC: intermediate care unit; IQR: interquartile range.
a Indicates p < 0.05 for the difference compared with Physiotype C, adjusted for multiple comparisons using the Bonferroni method. Supplemental Tables list p-values for all within-group comparisons.
b At any time during hospitalization.
c Values were calculated among patients admitted to ICU/IMC.
d Values were calculated among patients requiring MV.
To understand which vital signs made the greatest contributions to cluster assignments, vital sign standardized mean differences were compared between pairs of phenotypes (
Pairwise physiotype comparisons of vital sign values standardized to mean 0 and standard deviation 1 demonstrated that temperature and oxygen saturation contributed least to phenotype differences. Systolic and diastolic blood pressure varied substantially between all Physiotypes except for A and B. SpO2: peripheral capillary oxygen saturation; Temp: temperature; SBP: systolic blood pressure; DBP: diastolic blood pressure, RR: respiratory rate; HR: heart rate.
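The pairwise standardization described above can be illustrated with a short sketch. It pools the values of one vital sign from two phenotypes, rescales the pooled values to mean 0 and standard deviation 1, and reports each phenotype's mean on that scale. The function name `standardized_group_means`, the use of the population standard deviation, and the blood pressure values are assumptions made for illustration; they are not taken from the study data.

```python
from statistics import mean, pstdev


def standardized_group_means(values_a, values_b):
    """Mean of each group after z-scoring the two groups pooled together."""
    pooled = list(values_a) + list(values_b)
    mu = mean(pooled)
    sd = pstdev(pooled)  # population SD is an assumption of this sketch
    if sd == 0:
        raise ValueError("pooled values have zero variance")
    z_a = mean((v - mu) / sd for v in values_a)
    z_b = mean((v - mu) / sd for v in values_b)
    return z_a, z_b


# Made-up systolic blood pressures (mmHg) for two hypothetical phenotypes.
sbp_a = [100, 110]
sbp_b = [130, 140]
z_a, z_b = standardized_group_means(sbp_a, sbp_b)
print(round(z_a, 3), round(z_b, 3))
```

A large gap between the two standardized means for a vital sign suggests that the vital sign separates the pair of phenotypes well; a gap near zero suggests it contributes little.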
Associations between
Three-year survival probability was modeled adjusting for demographics and comorbidities (
(A) Physiotype survival curves adjusted using demographic information and comorbidities. (B) Adjusted Cox proportional hazards models using demographic information and comorbidities. CCI: Charlson Comorbidity Index.
Proportions of the total cohort in each
Using six vital signs measured within six hours of hospital admission, consensus clustering identified four distinct, clinically relevant patient phenotypes with unique pathophysiological signatures, disease categories, and clinical outcomes. Blood pressure values and trends contributed substantially to cluster assignments: one hypertensive, one normotensive, and two hypotensive clusters. Among the two hypotensive clusters, one was inflammatory, the other non-inflammatory according to C-reactive protein and erythrocyte sedimentation rate values. Beyond these fundamental distinctions, clusters were also differentiated by disease categories, producing the final
Beyond the potential to augment understanding of pathophysiology by distilling thousands of disease states into a few physiological signatures,
We are unaware of previous studies using cluster analyses of early vital sign measurements to identify phenotypes in heterogeneous cohorts of patients hospitalized for any reason. Others have used clustering for identifying patients with unique disease subtypes with unique treatment responses; sepsis and diastolic heart failure are prominent examples. Seymour et al. [
We also acknowledge several limitations. Our study used data from a single institution, limiting the generalizability of our findings, and external validation in databases from different centers is needed. Yet, it seems unlikely that selection bias significantly affected results, as all adult patients admitted for longer than six hours were included. Input variables were limited to the first six hours following hospital admission so that phenotypes could be identified early enough to support clinical decision-making under time constraints and uncertainty. It is possible that the same advantages for early decision-support could be achieved while incorporating historical patient data from previous encounters in the electronic health record; further research is necessary to determine whether this strategy is advantageous. Waveform data, though not universally available in EHRs, has the potential to improve the precision of phenotype clustering. Our clustering approach does not ensure temporal ordering of vital signs, which could influence cluster assignments. Finally, the potential of early clustering to augment clinical decision-making remains theoretical until evaluated in a prospective trial.
Using six vital signs measured within six hours of hospital admission, clustering analyses identified four distinct patient phenotypes that had unique disease categories and clinical outcomes and did not recapitulate previously established acuity assessments. Beyond elucidating pathophysiology by distilling thousands of disease states into a few physiological signatures, identifying patient phenotypes during the early stages of hospital admission may have important implications for clinical decision-making under time constraints.
(A) Unsupervised consensus k clustering in the training cohort showing optimal partitioning in the consensus matrix for k = 4. (B) Consensus cumulative distribution function (CDF) across k = 2 to k = 8, where more horizontal curves suggest optimal fit. (C) Relative change in the area under the CDF curve with increasing number of clusters (k), with little change beyond k = 4. (D) Cluster consensus plot showing the mean of all pairwise consensus values between a cluster's members, for k = 2 to k = 8, where greater values for all bars suggest optimal fit.
(DOCX)
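The quantities named in this caption (consensus matrix, consensus CDF, and relative change in area under the CDF) can be sketched as follows. This is a minimal sketch on synthetic data, not the authors' implementation: the base clusterer (a plain k-means written inline), the 80% subsampling fraction, the number of resamples, and all function names are assumptions made for the example.

```python
import numpy as np


def kmeans_labels(x, k, rng, n_iter=20):
    """Plain Lloyd's k-means; returns one cluster label per row of x."""
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # an empty cluster keeps its old center
                centers[j] = x[labels == j].mean(axis=0)
    return labels


def consensus_matrix(x, k, n_resamples=30, frac=0.8, seed=0):
    """Fraction of resamples in which each pair of rows shares a cluster."""
    rng = np.random.default_rng(seed)
    n = len(x)
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    for _ in range(n_resamples):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        labels = kmeans_labels(x[idx], k, rng)
        pair = np.ix_(idx, idx)
        sampled[pair] += 1.0
        together[pair] += labels[:, None] == labels[None, :]
    return np.divide(together, sampled, out=np.zeros_like(together), where=sampled > 0)


def cdf_area(consensus, n_grid=101):
    """Area under the empirical CDF of the off-diagonal consensus values."""
    values = consensus[np.triu_indices_from(consensus, k=1)]
    grid = np.linspace(0.0, 1.0, n_grid)
    cdf = np.array([(values <= g).mean() for g in grid])
    return float(((cdf[1:] + cdf[:-1]) / 2.0 * np.diff(grid)).sum())


# Synthetic data: two well-separated groups of 30 points each.
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(0.0, 1.0, (30, 2)), rng.normal(50.0, 1.0, (30, 2))])

areas = {k: cdf_area(consensus_matrix(x, k)) for k in range(2, 6)}
for k in range(3, 6):
    relative_change = (areas[k] - areas[k - 1]) / areas[k - 1]
    print(k, round(areas[k], 3), round(relative_change, 3))
```

When the chosen k matches the structure in the data, most consensus values sit near 0 or 1, so the CDF is nearly flat between those extremes; a small relative change in area when moving to k + 1 is the usual signal that adding clusters no longer helps.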
Spearman correlation heat map showing the pairwise Spearman rank-order correlation coefficients among the six vital signs studied in our paper. The darker the red color, the stronger the positive correlation. Abbreviations: RR: respiratory rate; SpO2: peripheral capillary oxygen saturation; Temp: temperature; HR: heart rate; SBP: systolic blood pressure; DBP: diastolic blood pressure.
(DOCX)
For each phenotype, the larger the percentage of patients with that score, the broader the ribbon.
(DOCX)
For each phenotype, the larger the percentage of patients with a higher score for that organ system, the broader the ribbon.
(DOCX)
(A) Physiotype survival curves adjusted using demographic information and comorbidities. (B) Adjusted Cox proportional hazards models using demographic information and comorbidities. (C) Physiotype survival curves adjusted using demographic information, comorbidities, and SOFA scores. (D) Adjusted Cox proportional hazards model using demographic information, comorbidities, and SOFA scores. Abbreviations: CCI: Charlson Comorbidity Index; SOFA: sequential organ failure assessment.
(DOCX)
In all panels, the variables are standardized such that all means are scaled to 0 and SDs to 1. A value of 1 for the standardized variable (x-axis) signifies that the mean value for the phenotype was 1 SD higher than the mean value for both phenotypes shown in the graph as a whole. Abbreviations in order: SpO2: peripheral capillary oxygen saturation; Temp: temperature; SBP: systolic blood pressure; DBP: diastolic blood pressure, RR: respiratory rate; HR: heart rate.
(DOCX)
Starting from the original 36-dimensional vital sign data, we ran t-SNE to reduce the data to 2 dimensions. Each dot represents a patient. Phenotypes are shown in separate colors.
(DOCX)
Diagnosis groups are shown in order of frequency among all patients. For each phenotype, the larger the percentage of patients with that diagnosis, the broader the ribbon. Detailed diagnosis groups from left to right are: Nonspecific chest pain, Abdominal pain, Other and unspecific lower respiratory disease, Complication of device; implant or graft, Septicemia (except in labor), Acute cerebrovascular disease, Cardiac dysrhythmias, Congestive heart failure; nonhypertensive, and Osteoarthritis.
(DOCX)
Diagnosis groups are shown in order of frequency among all patients. For each phenotype, the larger the percentage of patients with that diagnosis, the broader the ribbon. Detailed diagnosis groups from left to right are: Nonspecific chest pain, Abdominal pain, Complication of device; implant or graft, Other and unspecific lower respiratory disease, Septicemia (except in labor), Malaise and fatigue, Acute cerebrovascular disease, Osteoarthritis, and Cardiac dysrhythmias.
(DOCX)
Diagnosis groups are shown in order of frequency among all patients. For each phenotype, the larger the percentage of patients with that diagnosis, the broader the ribbon. Detailed diagnosis groups from left to right are: Nonspecific chest pain, Other and unspecific lower respiratory disease, Septicemia (except in labor), Abdominal pain, Complication of device; implant or graft, Acute cerebrovascular disease, Cardiac dysrhythmias, Osteoarthritis, and Other complications of pregnancy.
(DOCX)
Interpretive example: Using Gaussian mixture modeling to derive phenotypes, histograms of within-phenotype probability demonstrated that members have a high probability of being a phenotype member (>0.9).
(DOCX)
Visualization of phenotypes using the t-distributed stochastic neighbor embedding (t-SNE) technique in the training cohort with (A) physiotypes derived by consensus clustering shown in color, and (B) physiotypes derived by Gaussian mixture modeling (GMM) shown in color.
(DOCX)
(A) Probabilities of assignment to cluster 1, and purple for those actually assigned to cluster 1, (B) Probabilities for patients assigned to cluster 2, and blue for those actually assigned to cluster 2, (C) Probabilities for patients assigned to cluster 3, and green for those actually assigned to cluster 3, and (D) probabilities for patients assigned to cluster 4, and orange for those actually assigned to cluster 4. Black lines correspond to median [IQR] of probability. Gray shading corresponds to region with a 45–55% (low or marginal) probability of assignment. Inset proportion is the % of 41,502 in the marginal region.
(DOCX)
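The summary statistics in this caption (median probability among assigned members and the share of the cohort in the 45–55% marginal region) can be computed from any matrix of membership probabilities. The sketch below assumes such a matrix is already available, for example from a fitted mixture model; the function name `summarize_assignments` and the toy probabilities are illustrative assumptions, not study data.

```python
from statistics import median


def summarize_assignments(probabilities, low=0.45, high=0.55):
    """Per-cluster summary of membership probabilities.

    probabilities: list of rows, one per patient, each row holding that
    patient's probability of belonging to every cluster (assumed layout).
    """
    n_patients = len(probabilities)
    n_clusters = len(probabilities[0])
    # Hard assignment: the cluster with the highest probability (first on ties).
    assigned = [max(range(n_clusters), key=row.__getitem__) for row in probabilities]
    summary = {}
    for c in range(n_clusters):
        member_probs = [row[c] for row, a in zip(probabilities, assigned) if a == c]
        n_marginal = sum(low <= row[c] <= high for row in probabilities)
        summary[c] = {
            "n_assigned": len(member_probs),
            "median_prob": median(member_probs) if member_probs else None,
            "marginal_share": n_marginal / n_patients,
        }
    return summary


# Toy probabilities for four hypothetical patients and two clusters.
toy = [[0.95, 0.05], [0.90, 0.10], [0.50, 0.50], [0.20, 0.80]]
for cluster, stats in summarize_assignments(toy).items():
    print(cluster, stats)
```

A small marginal share indicates that few patients sit near the decision boundary for that cluster, which is one way to read the inset proportions described in the caption.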
The content is solely the responsibility of the authors. AB and TOB had full access to all of the data. The authors thank members of the Intelligent Critical Care Center and Integrated Data Repository at the University of Florida Health for supporting this work.