Identifying and analyzing sepsis states: A retrospective study on patients with sepsis in ICUs

Sepsis accounts for more than 50% of hospital deaths, and the associated cost ranks the highest among hospital admissions in the US. Improved understanding of disease states, progression, severity, and clinical markers has the potential to significantly improve patient outcomes and reduce cost. We develop a computational framework that identifies disease states in sepsis and models disease progression using clinical variables and samples in the MIMIC-III database. We identify six distinct patient states in sepsis, each associated with different manifestations of organ dysfunction. We find that patients in different sepsis states are statistically significantly composed of distinct populations with disparate demographic and comorbidity profiles. Our progression model accurately characterizes the severity level of each pathological trajectory and identifies significant changes in clinical variables and treatment actions during sepsis state transitions. Collectively, our framework provides a holistic view of sepsis, and our findings provide the basis for future development of clinical trials, prevention, and therapeutic strategies for sepsis.


Introduction
Sepsis accounts for more than 50% of hospital deaths [1], and the cost of sepsis management ranks the highest among hospital admissions for all illnesses in the United States [2]. Key factors in improving patients' outcomes are the early diagnosis of sepsis, and subsequent timely and appropriate treatment actions. While significant progress has been made towards the former [3,4], with recent development of the Sequential Organ Failure Assessment (SOFA or Quick-SOFA) measure outside the Intensive Care Unit (ICU) [5,6], the latter continues to be a significant challenge [7][8][9]. There have been a number of efforts aimed at classifying disorders that broadly comprise sepsis, which have resulted in categories such as Systemic Inflammatory Response Syndrome (SIRS), severe sepsis, and septic shock. These, in turn, have resulted in treatment strategies with limited success. More recently, these categorizations have been abandoned, in favor of a more broadly accepted definition of sepsis as a 'life-threatening organ dysfunction caused by a dysregulated host response to infection' [10]. Although SOFA score is a more comprehensive measure of the severity of health status of patients with sepsis (S1 Table), and a good predictor of mortality [11], the diverse mechanisms underlying sepsis and how they map to SOFA scores are still not fully understood. This is in spite of significant efforts aimed at developing and deploying new and improved treatments. As a result, current approaches to sepsis treatment are primarily guideline-based, as opposed to relying on clinicians' decision-making capability, when presented with a patient's unique set of clinical variables [12].
A personalized decision process for sepsis must be capable of differentiating heterogeneous response from diverse groups of patients, and understanding the etiology of disease to minimize errors and maximize treatment efficacy. With the goal of motivating research on such personalized decision processes, the Medical Information Mart for Intensive Care version III (MIMIC-III) [13] database released de-identified clinical data from approximately 46,000 patients admitted to Beth Israel Deaconess Medical Center in Boston, Massachusetts between 2001 and 2012. We use clinical variables in the MIMIC-III database, along with a range of novel algorithmic and statistical constructs for our retrospective study of sepsis states and response (Fig 1).
We first identify a patient cohort using AI Clinician [14] from five tertiary ICUs in Boston. AI Clinician [14] defines sepsis as the presence of suspicion of infection in conjunction with evidence of organ dysfunction (SOFA score � 2) 48 hours before and up to 24 hours after onset of infection. We then extract clinical variables, including vital signs, lab results, and the use of a ventilator, from 24 hours before and up to 48 hours after the onset of infection, as well as demographic data and various comorbidities before sepsis infection, to characterize the health status of the patients with sepsis during the ICU stay. This results in a sample of 16,546 distinct patients from five tertiary ICUs in Boston with 20,944 ICU stays. We summarize this data in Table 1.
We develop a novel mathematical framework that: (i) identifies distinct sepsis states using archetypal analysis; (ii) extracts representative sets of features from clinical variables to differentiate sepsis states and identifies associated biomarkers that can be mapped back to organ function(s); (iii) analyzes relationships between sepsis states, demographic variables, and comorbidities pre-and post-infection; and (iv) models sepsis progression using a higher-order Markov chain and identifies significant changes in clinical variables and treatment actions during sepsis state transitions. We demonstrate that our framework identifies distinct sepsis states-each state characterized by a unique set of pathological responses that can be mapped back to organ function(s) and an association between patient attributes and sepsis states. We also find that these states manifest distinct comorbidity profiles before infection. Moreover, our computational framework also provides insights into understanding the pathological processes in sepsis. Our state transition graphs provide an estimate of treatment actions (average In the data analysis phase, we use archetypal analysis to find distinct states in sepsis. We validate that each state corresponds to patient clusters that are statistically distinct from the distribution of the cohort as a whole, and the SOFA score, SIRS score, and mortality rate are calculated to characterize each sepsis state. In primary function analysis, selected features from archetypes are used to identify the primary functions (namely, nervous system, inflammation and infection, liver function, kidney function, coagulation, respiratory function, and, cardiovascular function) of each sepsis state. In etiological analysis, we find correlation between pre-existing comorbidity profiles (30 types) and sepsis states. Finally, in progression analysis, we use higher-order Markov chains to model the dynamics of pathological processes of sepsis. We then use archetypal analysis to identify distinct types of sepsis state transitions and use z-score analysis to find representative clinical markers of each state transition. https://doi.org/10.1371/journal.pdig.0000130.g001

Results
Identifying distinct sepsis states from the cohort Archetypal analysis of sepsis cohort. We pose the following important question: do there exist distinct states of sepsis with different clinical manifestations, recovery rates, demographic and pathological characteristics, and is it possible to identify these states from patient clinical measurements? We formulate this problem as one of finding archetypes (representatives of states) of sepsis, and design powerful mathematical models and methods for solving this problem. A geometric interpretation of our approach is to view each patient as a point characterized by clinical manifestations in a high dimensional space of attributes, and archetypes as corners of a convex hull in this high dimensional space. Within this representation, each data point can be approximated as a linear combination of the archetypes. Since archetypes form a convex polytope, the coefficients in the linear combinations sum to one (convex combinations).
This formulation has several advantages over traditional clustering techniques (e.g., kmeans). Archetypes represent extremal or pure states-to this end, they have clear clinical interpretations. Second, each convex combination has a well-characterized interpretation as a mixture of pure states. Finally, descriptors of archetypes may themselves be processed to identify clinical markers of pure states. The elbow method was used to determine the number of archetypes in the dataset. Specifically, we measured how well the archetypes and the coefficient matrix approximate the original data with respect to the Frobenius norm and chose the elbow point as the optimal number of archetypes. Using this procedure, shown in S1 and S2 Figs, we discover six distinct states in sepsis among our cohort. Since archetypes represent extreme sepsis states, in the rest of this discussion, we use the terms "sepsis state" and "archetype" interchangeably.
Fig 2 shows a uniform manifold approximation and projection (UMAP) embedding of the data points, along with the archetypes (represented by colors). Note that archetypes (A1 through A6) do not appear as corners of the convex polytope since this is a two-dimensional embedding of a higher dimensional attribute space.
To ascertain that the identified clusters are distinct, we apply a statistical test to validate that the probability distributions corresponding to these groups are significantly different from the distribution of the cohort as a whole. We also test to ensure that the probability distribution of each group is significantly different from others, as characterized by a multivariate analysis of variance (MANOVA) procedure (please see Methods Section Testing statistical significance of states for more details). Statistics of pairwise Hotelling t-square test across sepsis states and two-sample t-test for each variable for each sepsis state compared to overall populations are shown in S2 and S3 Tables, respectively. The pairwise Hotelling t-square test across sepsis states demonstrates that sepsis states are significantly different from each other, and the twosample t-test demonstrates that most of the clinical variables are significantly different from the overall populations. Since there is no apparent separation between the MODS group in the embedding space, shown in Fig 2, we further applied two-sample t-tests for each variable between the MODS group, shown in S4 Table. In general, SGOPT and SGPT are significantly different, which are the dominant signals in the MODS group. A distinct subset of clinical variables is significantly different between the MODS group. Specifically, besides SGOT and SGPT, age, meanBP, potassium, calcium, total bilirubin, Hb, white blood cell count, INR, arterial lactate are significantly different (p-values < 0.001) between A4 and A5; SGOT, potassium, glucose, and white blood cell count are significantly different (p-values < 0.001) between A5 and A6; SGOT, SGPT, gender, GCS, FiO2, glucose, BUN, Magnesium, Calcium, PT, and INR are significantly different (p-values < 0.001) between A4 and A6.
Characterizing sepsis states: SOFA score, SIRS score, and mortality rate. We use the SOFA score, SIRS score, and mortality rate as measures to characterize sepsis states (please see Table 2 for detailed statistics for each clinical variable in each sepsis state.). Since SOFA (and SIRS) scores vary over time, we use the average SOFA (and SIRS) scores for each state. We define mortality rate for each state as the rate of mortality among the patients who have passed through the sepsis state of interest. Stated otherwise, if a patient transitions from state A1 to A2, back to A1, and then dies, we only attribute this once to states A1 and A2, even though the patient visited state A1 twice.
Based on the SIRS criteria, a patient with SIRS score higher than two is diagnosed with sepsis infection (please see full discussion in S1 Text). This definition of sepsis is mainly focused on signs of inflammation exhibited by patients. We find that, among the sepsis states, only state A2 satisfied the SIRS criteria. Consequently, we identify state A2 as primarily representing inflammatory response. According to the sepsis-3 criteria [10], sepsis is defined as having a change of SOFA score higher than two in the presence of clinical suspicion of infection (as indicated by the ordering of IV antibiotics and blood cultures), and the higher the score, the more severe the condition. It is reported that patients who developed Multiple Organ Dysfunction Syndrome (MODS) display significantly higher mortality rate [8,15]. We observe that the mortality rate, as well as the SOFA scores of state A4, A5, and A6 are significantly higher than the other types. Thus, we hypothesize that the A4, A5, and A6 represent MODS with heterogeneous organ dysfunction. Compared to the MODS states, states A1 and A3 display lower mortality rates and SOFA scores, with A3 having the lowest mortality rate and SOFA score.  Therefore, we characterize states A1 and A3 as 'moderate condition' and 'mild condition', respectively. A restrictive definition of sepsis has significant adverse implications for diagnosis and treatment. The SIRS metric has been criticized for its inability to identify all possible host responses for sepsis since the SIRS criteria focuses solely on inflammatory excess; hence it is an inaccurate predictor for mortality. This diagnostic metric for sepsis was replaced by sepsis-2, and eventually by sepsis-3. Sepsis-3 uses the SOFA score to characterize the health status of patients with sepsis. It has been shown to be a more accurate predictor of mortality [16], compared to SIRS and sepsis-2. Our results support the arguments against the SIRS metric, and reinforce the use of SOFA scores for severity of sepsis infections. As mentioned earlier, only state A2 in our cohort qualified as a sepsis infection based on SIRS scores-leading to potentially inadequate care for other sepsis states. Our analysis demonstrates that SOFA scores correlate well with our identified states, and that the severity and mortality rate for identified states correlates well with their SOFA sores. However, as we show in the rest of this study, SOFA score alone does not capture the diversity of sepsis states-motivating our multidimensional approach based on archetypal analysis.
On the generalizability of archetypes. We demonstrate the generalizability of archetypal analysis. We first generate "ground truth" sepsis states by computing corresponding archetypes using the entire set of samples. We then characterize the state of each sample by assigning it to the closest archetype (or the highest coefficient in the convex combination). Next, we divide the entire set of samples into training sets (90%) and test sets (10%). We show that: i) the computed archetypes using training sets are very close to the "ground truth" archetypes; and ii) the cluster assignment for samples in the test sets using the archetypes from training sets are very similar to the cluster labels using "ground truth" archetypes. We run archetypal analysis with 1000 iterations and 20 repetitions for both cases, where the entire samples and training sets are used for statistically stable results. We report: i) relative errors of sepsis states, as measured by , where A and A train denote the "ground truth" archetypes and the archetypes computed from training set; ii) cluster assignment accuracy on the test set; and iii) errors in SOFA scores, SIRS scores, and mortality rates on the test set. We find that the computed archetypes using training sets are very close to the "ground truth" archetypes (averaged relative error over 20 repetitions is 0.0180), that the cluster assignment using archetypes from the training set is consistent with the ground truth assignment (averaged cluster assignment accuracy over 20 repetitions on tests sets is 99.88%), and that the computed SOFA scores, SIRS scores and mortality rates using archetypes from the training set are consistent with those using ground truth archetypes (averaged errors on the SOFA scores, SIRS scores and mortality rates on the test sets are 0.1887, 0.0256, and 0.0104, respectively.)

Selecting distinguishing features of sepsis states
To identify discriminative attributes for each state, we use three criteria. The first two criteria are based on the Huygens-Steiner theorem to measure the inertia (i.e., the tendency of a physical object to remain still or continue in motion) of the points in Euclidean space. PTT, DiaBP, Glucose, BUN, Creatinine and Comorbidity Count were selected by one. We use these features to analyze the primary profiles for each sepsis state in the next section.

Analyzing the primary attributes for sepsis states
Among the 21 features selected by our methods, 18 are vitals and lab results that are known biomarkers of organ functions or other aspects of overall health. From this set of features, we identify associated primary health indicators corresponding to the nervous system, inflammation and infection, liver function, kidney function, coagulation, respiratory function, and cardiovascular function. We refer to these seven as primary functions. We used these selected features to calculate the expression of these primary functions (please see Method Section Expression of primary functions for more a detailed calculation.). The spider-plot of primary functions affected in each sepsis state is shown in Fig 4. We find that each sepsis state manifests distinct expressions of organ dysfunction. We provide a detailed examination of the expression level of the biomarkers for each primary function next. We provide: (i) the biomarkers that are used in assessment of organ function; (ii) the severity level of biomarkers to assess organ function, and (iii) the expressions of these biomarkers in each sepsis state. A summary of the statistics for these biomarkers highlighted in color light cyan can be found in Table 2.
Nervous system. We use the Glasgow Coma Scale (GCS) to evaluate the state of the nervous system. GCS estimates coma severity based on eye, verbal, and motor criteria, and classifies the patient into mild (score = 13-15), moderate (score = 9-12), severe (score = 3-8), and vegetative state (score less than 3) [17]. We find that the average GCS scores for states A1 total) by Q j (P K ), Q 0 j ðP K Þ, and Variation test. Q j (P K ) calculates the discriminative power of feature i for a given clustering as the ratio of inter-cluster inertia to the total inertia computed using feature i. Q 0 j ðP K Þ calculates the discriminative power of feature i as the ratio of inter-cluster inertia computed using feature i to total inter-cluster inertia computed using all features. Variation test selects features that have the lowest probability of overlap across clusters (please see Methods section Feature selection methods [56] for more details). Note that there is a significant overlap between features chosen by these selection criteria. However, each criterion yields a distinct set of features significantly associated with different sepsis states.
https://doi.org/10.1371/journal.pdig.0000130.g003 through A6 are 12.56, 12.24, 13.96, 10.67, 11.20, and 11.77, respectively. This indicates that the non-MODS states-A1, A2, and A3, are mild GCS states, while the MODS states-A4, A5, and A6 states, are within the range of moderate GCS state. Among all of the sepsis states, state A3 displays the highest level of consciousness, while the A4 corresponds to the lowest level of consciousness.
We find that the GCS scores correlate well with SOFA scores and mortality rate of sepsis states, indicating that the GCS score is a good predictor of severity level or mortality for patients with sepsis.
Inflammation and infection. White Blood Cell (WBC) count, Heart Rate (HR), and platelet count are used to evaluate inflammation and infection response in the body. Among these, WBC count and HR are also used in the SIRS criteria to characterize systemic inflammation [18]. In acute inflammatory conditions, an increase in HR is often observed. HR in sepsis increased when patients suffer from hypovolemia and hypoperfusion. The WBC count increases from a normal value of 4.5 to 11.0 × 10 9 /L to 15.0 to 20.0 × 10 9 /L, with WBC levels higher than 11.0 defined as leukocytosis. While not considered in the SIRS criteria, the elevation of platelet count is an important indicator for inflammation and infection [19]. Inflammatory conditions such as bacterial infection, sepsis, malignancy, and tissue damage, motivate a reactive response that elevates platelet count, namely secondary thrombocytosis (platelet count higher than 500 × 10 9 /L) [20].
In summary, states A1, A3, A4, A5, and A6 show few signs of inflammation. In contrast, state A2 reveals high inflammatory response, as all the inflammatory biomarkers-WBC count, platelet count, and HR, are significantly elevated. Liver function. SGOT, SGPT, and arterial lactate are used to characterize liver function. An increase in SGOT and SGPT levels indicates damage to the liver. In general, the severity of liver dysfunction can be classified as mild, moderate, or severe if elevation of SGOT and SGPT levels is less than 5 times, 5-10 times, and 10-50 times the upper reference limit. In addition to SGOT and SGPT, arterial lactate is a biomarker for liver dysfunction. Arterial lactate is primarily cleared by the liver, with a small amount of additional clearance by the kidneys. Thus, arterial lactate is elevated when liver function is compromised. In a healthy body, the lactate level is usually less than two mmol/L. Patients with hyperlactatemia usually have lactate levels higher than two mmol/L. Lactate levels higher than four mmol/L are considered to be in a severe state of hyperlactatemia.
We find that non-MODS states-A1, A2, and A3, reveal mild liver damage, with only a mild increase in SGOT and SGPT levels (less than 5 times the upper reference limit), as well as a mild increase in arterial lactate. The average SGOT levels in states A1, A2, and A3 are 121.20 u/L, 115.25 u/L, and 97.13 u/L, respectively; the average levels of SGPT in states A1, A2, and A3 are 104.71 u/L, 103.37 u/L, and 81.68 u/L, respectively; and the average arterial lactate levels in states A1, A2, and A3 are 2.04 mmol/L, 1.88 mmol/L, and 2.10 mmol/L, respectively. In contrast to non-MODS states, SGOT, SGPT, and arterial lactate are all in severe levels in MODS states-A4, A5, and A6. The average SGOT levels in states A4, A5, and A6 are 6.56 × 10 3 u/L, 2.01 × 10 3 u/L, and 7.66 × 10 3 u/L, respectively; the average SGPT levels in states A4, A5, and A6 are 3.02 × 10 3 u/L, 6.39 × 10 3 u/L, and 6.69 × 10 3 u/L, respectively; and the average arterial lactate levels in states A4, A5, and A6 are 5.50 mmol/L, 3.95 mmol/L, and 4.58 mmol/L, respectively. Not identified by our feature selection methods, but also a representative biomarker, high levels of bilirubin are often associated with liver damage [21]. Patients with sepsis having (i) bilirubin � 4.0 mg/dL, or (ii) SGPT levels of twice the upper limit of normal for age are considered to have a sepsis-associated liver injury (SALI) [22]. A high level of bilirubin (� 2.5 to 3 mg/dL) can cause jaundice. In our study, the average bilirubin levels in states A4, A5, and A6 are 5.37 mg/dL, 3.37 mg/dL, and 4.56 mg/dL, respectively, indicating common occurrence of jaundice in MODS states.
In summary, non-MODS states reveal mild liver damage, reflected in a mild increase in SGOT, SGPT, and arterial lactate levels. In contrast, MODS states display severe liver dysfunction, with SGOT, SGPT, and arterial lactate all at severe levels. This is generally accompanied with jaundice. Finally, we note that state A6 potentially develops ischemic injury, since we observe that: (i) both SGOT and SGPT are more than 50 times higher than the upper reference limit; and (ii) SGOT is greater than SGPT [23].
Kidney function. Blood Urea Nitrogen (BUN) test and serum creatinine, identified by our feature selection methods, are common biomarkers of Acute Kidney Injury (AKI). AKI, defined as a sudden episode of kidney failure or kidney damage that happens within a few hours or a few days, is a common complication in sepsis patients. It is associated with high morbidity and mortality [24]. BUN measures the amount of urea nitrogen in the blood. Urea nitrogen is removed from the blood by the kidneys; consequently, high BUN levels indicate potential kidney damage. A serum creatinine test provides an estimate of filtration efficiency of kidneys (glomerular filtration rate). An increased level of creatinine in blood is indicative of potential impaired kidney function. Our feature selection methods also identify glucose. Higher glucose levels are often observed in patients with compromised kidney function, such as Chronic Kidney Disease (CKD).
We find that state A2 exhibits relatively better kidney function when compared to the other sepsis states since both serum creatinine (1.02 mg/dL) and BUN (24.96 mg/dL), though slightly elevated, are the lowest. In the rest of the states, damage to kidneys is observed, with state A4 being the worst (highest average glucose level (150.89 mg/dL)). The average levels of serum creatinine in states A1, A3, A4, A5, and A6 are 1.49 mg/dL, 1.50 mg/dL, 2.05 mg/dL, 2.00 mg/L, and 2.01 mg/dL, respectively; the BUN levels in states A1, A3, A4, A5, and A6 are 29.31 mg/dL, 26.68 mg/dL, 33.63 mg/dL, 26.26 mg/dL, and 29.35 mg/dL, respectively.
Coagulation function. Activated Partial Thromboplastin Time (aPTT), Prothrombin Time (PT), and International Normalized Ratio (INR) are identified by our feature selection methods. These are measures of coagulation function. Sepsis-associated coagulopathy (SAC) is typically diagnosed by PT prolongation or elevated INR, in conjunction with reduced platelet count [25]. aPTT is also used as a test for coagulation in patients with sepsis. Increased aPTT and PT above normal values, and decreased platelet count below normal value indicate long clotting time (DIC) and bleeding in sepsis patients [26].
We find that in non-MODS states, aPTT is within the normal range, and that INR and PT are slightly elevated. We also examine the platelet count in these states. Although the average platelet count in all sepsis states is within the normal range, a higher percentage of the cases with a platelet count below the normal range (150 × 10 9 /L) are observed in MODS states. The percentage of cases with platelet count lower than normal from states A1 to A6 are 30%, 0%, 27.7%, 44.9%, 40.3%, and 44.9%, respectively.
In summary, nearly one-third of the cases in states A1 and A3 develop a mild condition of SAC or DIC, while more than 40% of cases in the MODS group have worse SAC or DIC.
Respiratory function. PaO2, FiO2, PaO2/FiO2, and the use of a mechanical ventilator are identified by our feature selection methods. These are commonly used to measure respiratory function. PaO2 measures the pressure of oxygen dissolved in blood, and how well oxygen can move from the airspace of the lungs into the blood. FiO2 is defined as the concentration of oxygen that a person inhales. Patients experiencing difficulty in breathing are provided with oxygen-enriched air. Therefore, higher FiO2 is observed if the respiratory function is compromised. PaO2/FiO2 is a known measure for the assessment of respiratory dysfunction, such as Acute Respiratory Distress Syndrome (ARDS). Under the Berlin ARDS definition, patients with PaO2/FiO2 levels in the range of 200-300, 100-200, and less than 100 are classified as mild, moderate, and severe ARDS. The SOFA metric also incorporates PaO2/FiO2 as a parameter in assessing respiratory function. According to the SOFA score, a normal person has a PaO2/FiO2 ratio of approximately 500 and a patient with PaO2/FiO2 ratio between 300-400, 200-300, 100-200, and less than 100 would have SOFA scores 1, 2, 3, and 4, respectively. Thus, a lower PaO2/FiO2 ratio indicates worse respiratory condition. Conversely, high PaO2/FiO2 ratio (PaO2 > 300 mmHg) indicates that the lung is exposed to hyperoxia. Mechanical ventilators are often used in ICUs to assist or replace spontaneous breathing, indicating compromised respiratory function.
We find that patients in state A3 display excessive amounts of PaO2 and a slight increase in FiO2 (0.07 higher than the normal value of 0.21, on average). This indicates that patients in state A3 are less prone to lung dysfunction, but that state A3 manifests hyperoxia. The lower fraction of patients on ventilators in state A3 also indicates better lung function, compared to other states. The fraction of patients on a ventilator in state A3 is the lowest, at 0.09. The PaO2/FiO2 parameter also indicates that state A3 does not develop ARDS. Distinct from state A3, respiratory functions are compromised to varying extents in other states. We find that both PaO2 and FiO2 in states A1, A2, A4, A5, and A6 are slightly elevated. The average values of PaO2 in states A1, A2, A4, A5, and A6 are 120.83 mmHg, 121.77 mmHg, 126.95 mmHg, 120.98 mmHg, and 117.92 mmHg, respectively, and the average values of FiO2 in states A1, A2, A4, A5, and A6 are 0.46, 0.47, 0.52, 0.48, and 0.47, respectively. A higher rate of patients on ventilators is observed in states A1, A2, A4, A5, and A6, with the mean value of 0.37, 0.33 0.56, 0.50, and 0.41, respectively. We observe that states A1, A2, A4, A5, and A6 correspond to mild ARDS. The average values of PaO2/FiO2 in states A1, A2, A4, A5, and A6 are, respectively, 292.98 mmHg, 294.59 mmHg, 285.03 mmHg, 318.13 mmHg, and 301.8 5 mmHg, which is close to the boundary of normal value of 300mmHg in the Berlin ARDS definition and close to SOFA score of 2. We highlight that among these states, state A4, one of the MODS states that displays the highest SOFA score and mortality rate, shows the highest FiO2, the lowest PaO2/ FiO2, and the highest rate of ventilator use.
In summary, states A1, A2, A4, A5, and A6 manifest mild respiratory dysfunction with state A4 being the worst. State A3 shows better respiratory function, as the average value of FiO2, and rate of ventilator use is the lowest. According to both the Berlin ARDS definition and SOFA score, state A3 type does not manifest ARDS. However, state A3 manifests hyperoxia, since the average value of PaO2 in state A3 is higher than 300 mmHg. Cardiovascular function. DiaBP and serum lactate are identified by our feature selection method. DiaBP is indicative of potential hypotension (systolic blood pressure � 90 mmHg or diastolic � 60 mm Hg), and serum lactate is an important biomarker of septic shock [4]. Patients with septic shock can be identified with a clinical construct of sepsis with persisting hypotension requiring vasopressors to maintain mean arterial pressure (MAP) of 65 mmHg, and having a serum lactate level higher than two mmol/L despite adequate volume resuscitation [4].
We observe that all sepsis states except A5 show lower blood pressure. The average DiaBP of state A5 is 60. 46 We also find that the dosage of vasopressin in MODS states is significantly higher than non-MODS states. The average dosage of vasopressin in states A1, A2, and A3 are 0.06 mcg/ kg/min, 0.08 mcg/kg/min, and 0.01 mcg/kg/min, respectively. In contrast, the average dosage of vasopressin in states A4, A5, and A6 are 0.26 mcg/kg/min, 0.13 mcg/kg/min, and 0.13 mcg/ kg/min, respectively.
In summary, non-MODS states show mild hypotension, while MODS states are potentially in septic shock, with state A4 being the worst. It has been shown that the development of septic shock is an accurate predictor of mortality [27,28]. Our results are consistent with these studies since states A4, A5, and A6 consist of patients with higher SOFA scores and correlate with higher mortality rates.

Correlation of demographic variables and comorbidities with sepsis states
Variations in patients' demographics, such as gender, age, and medical comorbidities, present additional considerations for classifying sepsis states [7]. Our feature selection methods identify age, weight, and comorbidities, indicating that these attributes are strongly correlated with sepsis states.
Correlation of demographic variables with sepsis states. The distributions of sepsis states in terms of patient age are shown in Fig 5A. We observe that while the average ages of patient in states A1, A3, and A4 are close to the average age of the entire cohort, the average age patients in states A2, A5, and A6 are significantly lower than the average age of the entire cohort-the average age of the entire cohort is 64.57 years, and the average ages of states A1 to A6 are 64.79 years, 59.55 years, 65.53 years, 64.09 years, 58.66 years, and 60.16 years, respectively. Several studies have been shown that advanced age has been associated with worse outcomes [29,30]. We also find that worse outcomes are observed in older people for severe sepsis types. Specifically, in MODS states, we observe that state A4, shown to be associated with the highest mortality, also has the highest average age. On the other hand, we observe that the sepsis state that demonstrates notable expression of inflammatory response, i.e., A2, is associated with lower average age.
The distributions of sepsis states in terms of patient weight are shown in Fig 5B. We observe that the average weight of the entire cohort and the average weight of all sepsis states except A5 are within 4 percent. The average weight of patients in state A5 is 7 percent higher than the average weight of the entire cohort. The average weight of the entire cohort is 83.27 kilograms and the average weight of patients in states A1 to A6 is 83.29 kilograms, 81.95 kilograms, 79.28 kilograms, 85.18 kilograms, 89.30 kilograms, and 82.31 kilograms, respectively.
Comorbidity profiles and their association with sepsis states. We investigate the association of different comorbidity profiles with sepsis states. First, we construct distributions of sepsis states by the total number of pre-existing comorbidities, shown in Fig 5C. We observe that, compared to non-MODS states, the MODS group has a higher number of comorbidities -the average comorbidity count for states A1 to A6 is 3.93, 3.36, 3.75, 4.32, 4.04, and 4.08, respectively. The higher the number of comorbidities, the worse the outcomes of sepsis. Our results show clear association between comorbidities and patient outcomes. We next analyze the relationship between specific comorbidity profiles and their association (positive or negative) with sepsis states. We use the z-score to measure the distance between the observed condition (comorbidity) and its average over the entire cohort. If the z-score of a condition is positive in a state, we note that the condition is expressed in the state; conversely, if the z-score is negative, the condition is suppressed in the state. To ensure that a diverse range of conditions is covered, a comprehensive set of comorbidity measures, 30 types in all, are included in the z-score analysis (please see S6 Table for more detail for each type).
Once the z-scores are computed, we apply the two-sample t-test to ensure that the computed values are statistically significantly different. The statistically significant z-scores (pvalue � 0.05) are shown in Fig 5D. (S3 Fig shows the original z-scores and the corresponding p-values for the pairwise two-sample t-test for the comorbidity profiles of each sepsis type.) As shown in Fig 5D, we find that none of the conditions are significantly differentially expressed from the overall cohort in state A1. This is explained by the fact that state A1 represents a mild sepsis state. We find that the z-score of metastatic cancer (0.81) is significantly expressed in the inflammation state (A2 state). A slightly increased differential expression of valvular disease, peripheral vascular, hypertension, and diabetes (uncomplicated and complicated) is observed in state A3, with values of 0.14 and 0.13, 0.05, 0.08, and 0.18, respectively. We observe a strong association between MODS states and liver disease. Therefore, we observe that z-scores of liver disease in MODS states are statistically higher than average. The z-scores of liver disease in states A4, A5, and A6 are 0.73, 0.78, and 0.74, respectively. Individuals with poor kidney health manifest fluid and electrolyte imbalances. We observe the z-score of fluid and electrolyte imbalances in state A4 is statistically higher than average, with a value of 0.17. Lastly, we find a strong association between complicated diabetes and state A4, with a z-score value of 0.53.

Analyzing pathological processes of sepsis
Understanding the progression of pathological process of sepsis is essential for designing disease management protocols. First, we model sepsis patients' trajectories over identified states and analyze the association between pathological trajectories and mortality rates. Specifically, we model transition across sepsis states using higher-order Markov chains and compute the mortality rate of each node, which corresponds to a particular pathological trajectory in a Markov chain. Here, a third-order Markov chain, represented as directed de Bruijn graph, in conjunction with the association between third-order transitions and mortality rate is shown in Fig 6A (See S4 Fig for the second-order counterparts.) In Fig 6A, we observe that nearly all sepsis patients begin in state A1 and that the conditional probability of remaining in the same state given they were in A1 state in the previous three time points is approaching one (98.74%)-the state "111" of the third-order Markov chain is nearly an absorbing state. Also, we observe that if patients were in the states other than state A1 and enter state A1, they remain in state A1. This indicates that most of the sepsis patients remain in moderate condition. Lastly, we find that there are very few transitions across MODS groups. This suggests that patients in different MODS groups are composed of distinct sub-populations.
Combined with knowledge of the severity level of sepsis states (Fig 2), we can characterize the mortality trend for each pathological trajectory. (Note that we had shown in Section Characterizing sepsis states: SOFA score, SIRS score, and mortality rate that MODS states (state A4, A5, and A6) consist of patients with higher mortality rates than non-MODS states (state A1, A2, and A3).) As shown in Fig 6A, we observe a higher mortality rate in trajectories that traversed MODS states. Also, we observe that the longer a patient stays in MODS states, the higher the mortality rate. Finally, we find that mortality rate increases if there is a transition from a non-MODS state to a MODS state. Conversely, if there is a transition from a MODS state to a non-MODS state, the mortality rate decreases.
Transition dynamics across states are functions of treatments, patient characteristics, and sepsis states. The identification of sepsis states guides physicians to monitor a set of variables from patient covariates to assess the status of sepsis patients or the severity level of organ dysfunctions. Based on this information, a set of treatment actions are performed to manage sepsis. To find treatment actions during transitions, we identified the amounts of fluids infused, the dosage of vasopressor, and mechanical ventilators as treatment actions. We kept track of the averaged fluids, maximum dosage of vasopressor, and the probability of using a mechanical ventilator between transitions. The First-order transition graph and treatment actions are shown in Fig 6B. We find that a different set of treatment actions was imposed for each state with a general trend of amount of fluids, vasopressors, and mechanical ventilators on the MODS group. In addition, amount of fluids, vasopressors, and the usage of mechanical ventilators are reduced during the transitions from MODS states to non-MODs states (A1).
Next, we identify distinctive clinical transition makers of sepsis progression. We construct a state transition dataset as a collection of gradients of clinical measurements associated with transitions from one sepsis state to another, to quantify dominant gradient trends using archetypal analysis. Formally, given a transition dataset G ¼ fg 1 ; . . . ; g m 0 g, we find a set of archetypes of gradients so that each gradient is a convex combination of archetypes and each archetype is a convex combination of the gradients. Following the procedure described in Section Archetypal analysis of sepsis cohort, we identified six archetypes of gradients, labeled G1 to G6. Fig 7A shows a UMAP embedding of the gradients, along with the archetypes of gradients (represented by colors).
To identify discriminating components across gradient groups, we compute the z-score of each component in each gradient group and identify components that diverse significantly.

Discussion
We present a computational framework to identify disease states and model pathological processes of sepsis from 16,546 distinct patients collected from the MIMIC-III database [13]. We identified six sepsis states based on the measurement of 42 variables (demographic profiles, vital signs, laboratory tests, mechanical ventilation status of the patients, and information on pre-existing clinical conditions) from these sepsis patients. Among these states, State 1 manifests a moderate condition of sepsis, State 2 primarily represents inflammation and infection with evident signs of inflammatory response, State 3 corresponds to the highest survival rate, but is typically associated with hyperoxia. The last three states show signs and symptoms of Multiple Organ Dysfunction Syndrome (MODS) with diverse manifestations of organ failures.
Our framework identified the most discriminating attributes for each sepsis state and showed that each state manifests a unique set of pathological responses, which correspond to different extents of organ dysfunction. These observations have two significant implications: (i) in contrast to the SOFA metric, our method identifies a larger number of attributes to provide a comprehensive view of sepsis symptoms, allowing for a more detailed diagnostic criterion; and (ii) it is possible to focus on a smaller set of attributes to differentiate sepsis symptoms, potentially reducing the associated diagnostic time and associated cost. Our identification of three distinct MODS states associated with a higher mortality rate, provides insight into advanced management of sepsis in ICU environments.
We also analyzed the association of different demographics and comorbidity profiles with identified sepsis states. Our results revealed that these sepsis states are composed of distinct populations with different demographics and comorbidity profiles, some of which have been supported in prior results. We find that a higher percentage of patients in MODS states had developed liver disease before the onset of sepsis, validating that patients with liver disease are more prone to developing severe sepsis [31]. Studies have shown inflammatory response during the tumor progression [32,33]. We find that patients with metastatic cancer are over-represented in state A2. The effect of diabetes on the outcome of sepsis remains controversial [34][35][36][37]. We find that patients with uncomplicated diabetes are over-represented in state A3 and that patients with complicated diabetes are over-represented in state A4 (highest mortality rate). Fluid and electrolyte imbalances are commonly observed in critically ill patients with compromised kidney function [38,39]. We find that patients with fluid and electrolyte imbalances are over-represented in state A4. Our etiological analysis, presented in S3 Fig, also shows other associations between comorbidity profiles and sepsis states. However, these associations might not be statistically significant. Further virtual clinical trials [40] are needed to validate these associations. Although we used the most distinguishing attributes to analyze organ function, other attributes, including ABG, electrolytes, albumin, shock index, and hemoglobin, are also crucial in analyzing various aspects of the health status of sepsis patients. We also investigated these variables and found that these variables provide significant insight into patients' health status in different sepsis states; many of which are consistent with current literature. However, there are a few cases that are at odds with the current literature. We highlight a few cases here, and refer readers to the detailed description in the S1 Text section: Analyses of other variables. Metabolic acidosis often occurs in sepsis patients with organ failure, and metabolic alkalosis has been noticed in sepsis patients [41]. We also find that a higher percentage of metabolic acidosis occurs in MODS types. However, we observe fewer cases of metabolic alkalosis in our cohort. We also find that Arterial BE is a significant predictor of metabolic acidosis for sepsis patients. Hypocalcemia, measured by the concentration of ionized calcium or serum calcium in the body, might be observed in critically ill patients, especially in those with sepsis [42,43], and is reported to be associated with increased severity of illness and increased mortality [42]. We find that low ionized calcium concentrations coincide with worse outcomes, while low serum calcium concentrations do not.
By comparing the SIRS scores, SOFA scores, and mortality rates across the states, we confirm that the SIRS metric only identifies a subset of sepsis states [4]. Furthermore, our results show that the SOFA metric covers a broader spectrum of disease states and is a more accurate predictor of mortality for sepsis patients [16].
Finally, our framework provides insight into the complex states of sepsis and the pathological processes underlying state transitions. By analyzing the relationship between pre-existing comorbidities and sepsis states, changes in clinical measurements treatment actions during disease progression, and severity of each pathological trajectory, one can prognosticate individuals' outcomes and devise prevention and therapeutic strategies. As a narrative example, for patients with suspected sepsis, we examine 21 features selected from our feature selection methods that differentiate sepsis states. Once the sepsis state is identified, finer management of sepsis disease can be applied to each sub-group based on the associated clinical variables and comorbidities that are uniquely expressed in these states. Using the transition graph, we predict patients' transition path and corresponding treatment actions. For example, in the firstorder transition graph, if the patient were in state A4, the patient would be less likely to transition to state A5 or A6 and stay in state A4 in the next 4 hours. In addition, the average amounts of fluids, the dosage of vasopressors, and the usage of mechanical ventilators are presented as a reference point to guide treatment decisions.
In our patient cohort (AI clinician cohort), patients are defined as sepsis when the SOFA score is higher than two within the time window of 48 hours before and up to 24 hours after the onset of infection. However, Sepsis-3 requires a change in the SOFA score of two or more, consequent to infection. The baseline SOFA score can be assumed to be zero only when the presence of preexisting organ dysfunction is unknown. Therefore, although the patients in our cohort have a suspicion of infection, a subset of patients in our cohort have SOFA scores higher than two but do not manifest change in SOFA score of two or more over the baseline. Here, we analyze the number of non-sepsis patients included in our cohort if it was constructed under strict Sepsis-3 criteria. We define the change in SOFA score consequent to infection as the maximum SOFA score after onset (within the time window of up to 24 hours) minus the baseline SOFA score (within the time window of 48 hours before the onset) and identify the patient as sepsis if the difference is two or more. Baseline SOFA is measured as the minimum SOFA score among the available time points prior to the onset. If no available SOFA scores are present in the 48 hour time window, the baseline SOFA score is assumed to be zero. Our AI Clinician cohort includes 21,329 ICU stays. Of these 21,329 stays, 20,944 are included as our final cohort for the subsequent analyses since we focus on identifying sepsis states within the time window of 24 hours before and up to 48 hours after the onset of infection. The cohort constructed under the Sepsis-3 criteria includes 16,018 ICU stays, and all 16,018 of these ICU stays are also included in our AI Clinician cohort. Although there are 4,926 stays in our AI Clinician cohort that were excluded from the Sepsis-3 criteria since the change in SOFA score for these stays is smaller than 2, only 1,278 stays can be classified as non-Sepsis-3, as only these stays have available SOFA scores up to 48 hours before the onset to define baseline SOFA scores, as well as the available SOFA scores within 24 hours after the onset of sepsis to estimate the change in SOFA score. The other 3,648 stays are unknown with respect to their inclusion under Sepsis-3.
The current work characterizes correlation between the treatment actions and the state transitions. These correlations can be used to generate hypothesis for subsequent trials to establish causal relationships. Virtual clinical trials [40] or counterfactual queries [44,45] are needed to estimate causal effects of treatment actions on the state transitions or clinical outcomes. Ongoing research focuses on designing virtual clinical trials and construction of causal models to further analyze sepsis states based on the pathophysiology and on learning personalized intervention strategies.

Patient cohort
We use the sepsis cohort defined by AI Clinician [14]. Patient samples were collected from five distinct ICUs in Boston, Massachusetts, stored in the Medical Information Mart for Intensive Care version III (MIMIC-III) database. Patients that are diagnosed with sepsis when they had both suspicion of infection, defined as the presence of the prescription of antibiotics and sampling of bodily fluids for microbiological culture (as also used in prior work of Nemati et al. [46], Johnson et al. [47], and recommended by the Sepsis-3 criteria [10]), and the evidence of organ dysfunction, defined as the total SOFA score higher than 2 (baseline SOFA scores are assumed to be zero. [5,48]), within the time window of 48 hours before and up to 24 hours after onset of infection. As recommended by the Sepsis-3 criteria [10], the time of the onset is defined as the earliest event of the following two conditions: i) if the antibiotic was given first, the microbiological sample must have been collected within 24 hours; ii) if the microbiological sampling occurred first, the antibiotic must have been administered within 72 hours. Finally, sepsis patients whose age was less than 18 years old at the time of ICU admission, whose mortality was not documented, and withdrawal of treatment (vasopressors) were excluded from our cohort.

Data preprocessing
Data were included from up to 24 hours prior to the estimated onset of sepsis and 48 hours after the onset to capture the characteristics of the early stages of sepsis. The resulting cohort includes 16,546 distinct patients with 20,944 stays in ICUs. Patients' data are modeled as discrete multivariate time series with 4-hour time steps. Clinical variables associated with the patients are demographics, vital signs, lab values, severity measures such as SIRS and SOFA scores, and other information relating to use of ventilator and the number of comorbidities before sepsis infection. We also track whether the patient survived for 48 hours and extract the history of 30 types of comorbidities [49] before infection, as supplementary variables. Outliers were removed when the clinical variables were out of normal range (e.g., weight larger than 300 kg and blood pressure below 0). If there are multiple measurements within the 4 hour time period, either the max values are taken (e.g., vasopressor), are averaged (e.g., heart rate), or are accumulated (e.g., urine output and fluids). To address the issue of missing values, we use time-limited sample-and-hold method [50] for each variable with a different maximum hold time to identify missing values. Linear interpolation and k-nearest-neighbor methods were then used for the imputations [51].

Archetypal analysis [52]
Archetypal analysis views each point in a dataset as a mixture (convex combination) of "pure types", or "archetypes". The convexity constraint here implies that in contrast to traditional clustering techniques that aim to identify "typical" representatives, archetypal analysis aims to identify "extremal" points in the dataset. The archetypes are themselves mixtures (convex combinations) of the points in the data set. Archetypes can be learned by minimizing the squared error in representing each point as a mixture of archetypes. Specifically, let x 1 , . . ., x n be the data points in R m . The problem is to find a set of archetypes {z 1 , . . ., z K } so that each archetype z k is a convex combination of the data points, i.e., P n j¼1 b kj x j , with the constraints of: (i) β kj � 0 8j; and (ii) P n j¼1 b kj ¼ 1 (convexity constraint), and that each data point x i can be best approximated by a convex combination of the archetypes, i.e., P K k¼1 a ik z k , with the constraints: (i) α ik � 0 8i; and (ii) P p k¼1 a ik ¼ 1. We can then define the following optimization problem: and the archetype problem is to find α's and β's to minimize the objective 1 subject to the aforementioned constraints. This problem can be solved using general-purpose constrained nonlinear least squares methods, the alternating minimizing algorithm, or the projected gradient procedure. The learned archetypes (for K > 1) form a convex hull of the data set such that all of the points can be well-represented as a convex mixture of the archetypes. In our study, we first treat patient measurements as points in a high-dimensional space, and find archetypes for our cohort. These archetypes represent extreme states in sepsis, and each patient can be expressed as a convex combination of these states. We also use archetypal analysis to identify the archetypes of the transitions of clinical measurements. Please see Section Identification of transition markers for a detailed discussion.

Testing statistical significance of states
We characterize the statistical significance of each sepsis state based on the data points (patient records) mapped to the corresponding archetype from the cohort. Each point is a mixture of sepsis states, and we can assign the point to the closest sepsis state (the corresponding archetype). A statistical interpretation of this formulation views data points as mixtures of samples from the six distinct multivariate distributions. To ascertain that these distributions are indeed distinct, we first validate that the probability distributions corresponding to these groups are significantly different from the distribution of the cohort as a whole. We also test to ensure that the probability distribution of each group is significantly different from others, as characterized by a multivariate analysis of variance (MANOVA) procedure. Two-sample testing is sensitive to the homogeneity of covariance matrices from the compared populations. We use the Box test to compare variation in multivariate samples. We then use a Hotelling T-Squared testing variant to compare the mean vectors from two populations. We note that covariance matrices and mean vectors from the compared pairs are significantly different, with a 95% confidence interval. Statistics of pairwise Hotelling t-square test across sepsis states and twosample t-test for each variable for each sepsis state are shown in S2 and S3 Tables, respectively. Comparing mean vectors from two populations. We use Two-sample Hotelling's T 2 tests to characterize significant differences between the mean vectors of two multivariate distributions (in reality, datasets drawn from these distributions). Two-sample Hotelling's T 2 tests are sensitive to violations of the assumption of equal variances and covariances. Different approximation of the sample variance is needed when the covariance matrices of the two populations are significantly different. We use Box's M test for significant differences between covariance matrices.
Testing homogeneity of covariance matrices: Box's M Test. Consider a sample set fx 11 ; . . . ; x 1n 1 g in R m sampled from population Θ 1 and a sample set fx 21 ; . . . ; x 2n 2 g in R m sampled from population Θ 2 . Assuming that the sample sizes n 1 and n 2 are sufficiently large, Box's M Test tests the null hypothesis that the population covariance matrices are equal, i.e., H 0 : S 1 = S 2 . Let S 1 and S 2 be the sample covariance matrices from the populations Θ 1 and Θ 2 , where each S j is based on n j independent observations, we define the pool variance S pooled as follows: and the value of M is given by: M ¼ ðn 1 þ n 2 À 2Þ ln jS pooled j À ððn 1 À 1Þ ln jS 1 j þ ðn 2 À 1Þ ln jS 2 jÞ: The null hypothesis H 0 is rejected when Mð1 À cÞ > w 2 df ðaÞ (or p-value < α). Testing homogeneity of mean vectors: Hotelling's T-Squared test. We test the equality of vector means from populations Θ 1 and Θ 2 . The null hypothesis is that the population means are equal, i.e., H 0 : μ 1 = μ 2 . If Box's M test indicates that the two covariance matrices are not significantly different, we can assume S 1 = S 2 and: If Box's M test concludes that S 1 6 ¼ S 2 , In either case, T 2 approximates chi-square distribution with m degrees of freedom, i.e., w 2 m . The null hypothesis is rejected when T 2 > w 2 m ðaÞ (or p-value < α).

Low-dimensional embeddings of dataset
We use Uniform Manifold Approximation and Projection (UMAP) [53] to compute a mapping from a dataset X = {x 1 , . . ., x n } in R m to its corresponding lower-dimensional representation Y = {y 1 , . . ., y n } in R d that preserves as much of the local and the global structure from the original space. UMAP assumes that the dataset X is uniformly drawn from a Riemannian manifold M. With this assumption, the goal is to reconstruct M and to find a mapping from M into R d . To do so, UMAP first approximates the manifold and finds a fuzzy simplicial set that captures all topological properties of the manifold M. Similarly, given a current lower-dimensional representation in Y 0 of the data X in R m , it can also construct a fuzzy simplicial set from Y 0 . Having the two fuzzy simplicial sets, one constructed from X and the other constructed from Y 0 , UMAP then measures how good Y 0 is as a representation of X using cross-entropy C of two fuzzy sets: CððA; mÞ; ðA; uÞÞ ¼ X a2A mðaÞlog mðaÞ uðaÞ The above objective function can be minimized using first-order optimization methods or second-order methods [54,55].

Feature selection methods [56]
Three criteria are developed to identify discriminative attributes for each state. The first method, which we refer to as the Q j (P K ), calculates the discriminative power of feature i for a given clustering as the ratio of inter-cluster inertia to the total inertia computed using feature i. Intuitively, this method quantifies the heterogeneity of feature i across clusters. The second method, which we refer to as Q 0 j ðP K Þ, calculates the discriminative power of feature i as the ratio of inter-cluster inertia computed using attribute i to total inter-cluster inertia computed using all attributes. Intuitively, this method computes the relative heterogeneity of feature i with respect to all other features. The third method uses a variation test that selects features with the lowest probability of overlap across clusters.
Quality-index based approach. Given a set S = {x 1 , . . ., x N } of N points in R m that is partitioned as P K = {C 1 , . . ., C K }, where for each cluster pair C i , C j , 1 � i, j, �K and i 6 ¼ j, C i T C j = F, define g k to be the mean values of the instances in cluster C k , g be the average values over all the instances in S, the total inertia T ¼ P N i¼1 d 2 ðx i ; gÞ measures the dispersion of the points in the set S, where d 2 ðx i ; x i 0 Þ ¼ P m j¼1 ðx ij À x i 0 j Þ 2 is the squared Euclidean distance. According to Huygens-Steiner Theorem, the total inertia T can be decomposed into inter-cluster inertia B and within-cluster inertia W: Here, inter-cluster inertia B measures the separation between the clusters, I(C k ) is the inertia for cluster C k , and the within-cluster inertia W is the summation of the inertia of the clusters that measures the heterogeneity within the clusters.
Variable quality. The quality index is given by the ratio of the homogeneity value of each cluster and the corresponding homogeneity value associated with the partition P 0 = S. This can be interpreted as the gap between the null hypothesis, i.e., partition into one cluster, and partition into k clusters. We can use the quality index at each variable (in our case, patient feature) j to find the importance of the features. Given a set of points S partitioned by P K , the null hypothesis is that the total inertia of the system is T j . Since the partition P K is given, the total inertia of the system is fixed at T j and the optimal strategy of assigning clusters is to select minimal within-cluster inertia, W j . This leads to the following ratio as a measure of variable quality: Alternatively, since the partition P K is given, we can ignore the variability of within-cluster inertia in the variable quality measure. That is, we only consider how each variable contributes to the total inter-cluster inertia: In our study, we use both Q j (P K ) and Q 0 j ðP K Þ variable quality measures for feature selection. We also include a third feature selection criteria based on a variation test.
Variation test. Given a partition P K = {C 1 , . . ., C K } of a set of points S, we can regard the points in cluster C i as being sampled from a distinct multivariate probability distribution Θ i . To find the j-th feature that can distinguish the clusters, there exist at least two pairs of marginal distributions such that the difference of the mean value at the j-th feature sampled from the compared marginal distributions is larger than some threshold θ with probability 1 − δ, where δ is sufficiently small. We set θ = σ ij when comparing clusters C i and C l (l 6 ¼ i) and collect the union of the variables from each case as the selected features. Although this feature selection method treats each dimension independently, we find that the selected features are similar to those using quality-index based approaches. See S1 Text for a detailed comparison.

Expression of primary functions
We measure the overall expression of each of these primary functions for each sepsis state, shown as a spider-plot of primary functions affected in each sepsis state in Fig 4, using the level of corresponding biomarkers, weighted by the confidence level (as measured by the number of overlaps between feature selection methods). For sepsis state i, we calculate the distance d i between readings from the biomarkers to the boundary of the normal range for each primary function. We then normalize each d i by dividing by the maximum of the distance across sepsis states, i.e., d i /max{d 1 , . . ., d K }. Since each primary function can be tested by more than one biomarker, each of which has a different confidence level, we define the final expression of the primary function as a summation of normalized distances. The final value is linearly normalized into the range from 0 to 10, with higher values indicating higher expression of primary function.

Z-score analysis for comorbidity profiles
We assess whether the comorbidity profile of interest is uniquely expressed or suppressed in a certain sepsis state. We compute the corresponding Z-score: where μ j is the rate of presence of comorbidity j among all the patients in the cohort, σ j is its corresponding standard deviation, and w i j is the reference point, calculated as the rate of the presence of comorbidity i among patients who have passed through sepsis state j. Here, z i j measures how far the reference point w i j is from the population mean for the sepsis state i. If z i j is positive, w i j is expressed in sepsis state i, and vice versa.

Analysis of sepsis progression
Higher-order Markov chains. Higher-order Markov chains are used to model transitions across sepsis states [57]. Formally, given a dataset (observational traces) D from m patients. where h i denotes patient i's trace with n i time ordered points, and each point is defined as (t, x, a), meaning that at time t, clinical measurements x, and sepsis state a are observed. We model the transition of disease through these states using a higher-order Markov chain. The transition probability of l-th order Markov model can be written as: where l � 1 denotes the order of a Markov chain and i j denotes the realization of sepsis state at time point t − j. The higher-order Markov model can be represented as a directed valued De Bruijn graph, denoted as BðK; lÞ, where the set of vertices is given by: V ¼ A K ¼ fða 1 ; � � � ; a 1 ; a 1 Þ; ða 1 ; � � � ; a 1 ; a 2 Þ; � � � ; ða 1 ; � � � ; a 1 ; a K Þ; ða 1 ; � � � ; a 2 ; a 1 Þ; � � � ; ða K ; � � � ; a K ; a K Þg; and the set of edges is given by: E ¼ fððâ 1 ;â 2 ; � � � ;â l Þ; ðâ 2 ; � � � ;â l ; a j ÞÞ : a i 2 A; 1 � i � l; 1 � j � Kg; where each edge takes the value of its corresponding transition probability. In our study, we analyze sepsis transitions up to third order Markov models.
Identification of transition markers. Archetypal analysis and z-score analysis are used to identify transition markers across sepsis states. Formally, let transition dataset G ¼ fg 1 ; . . . ; g m 0 g represent gradients of clinical measurements on transition from one sepsis state to another, collected from dataset D of m patients. We first find distinct groupings of gradients using archetypal analysis. That is, given a transition dataset G ¼ fg 1 ; . . . ; g m 0 g, we aim to find a set of archetypes of gradients so that each gradient is a convex combination of archetypes and each archetype is a convex combination of the gradients. Once archetypes are identified, we define gradient states by mapping the gradient points to the corresponding archetype. We then compute z-scores to find the transition markers by finding the subset of features (components) of each cluster that are over-represented or suppressed. The z-score is computed as follows: Here, � m j is the mean value of j-th feature over the entire population, � s j is its corresponding standard deviation, and m i j is the mean value of j-th feature for gradient group i. The z-score z i j measures how far the reference point m i j is from the population mean for the gradient group i. If z i j is positive, m i j is expressed in gradient group i, and vice versa.