A deep state-space analysis framework for cancer patient latent state estimation and classification from EHR time-series data

doi:10.1371/journal.pone.0341003

Fig 1.

Deep state-space analysis framework overview.

(a) Overview of the deep state-space model used for latent state estimation from time-series EHR data. The deep state-space model is a DNN-based time-series prediction model; the panel shows an example configuration for each DNN. The observed state at time t (t = 1, 2, …, T) is the input of various test items from the time-series EHR, and the latent state at time t (t = 1, 2, …, T) is the multidimensional output of the model. (b) The latent states are visualized using UMAP. The upper part of (b) shows the distribution of surviving and deceased patients in the latent space, while the lower part shows the typical trajectories for each group. (c) Patient states are stratified using k-means. Each cluster is classified into one of three categories (Dangerous State, Intermediate State, or Stable State), and the transition probabilities between them are shown at the bottom of (c).
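
The stratification step in (c) can be illustrated with a minimal sketch. It is not the authors' implementation: the caption does not say whether k-means is run on the full latent states or on the UMAP embedding, nor how clusters are mapped to the three state names. The sketch assumes a plain NumPy k-means with a deterministic farthest-point initialization and, as a hypothetical naming rule, orders clusters by the fraction of points belonging to deceased patients. The functions `kmeans` and `name_clusters_by_risk` and all data are illustrative.

```python
import numpy as np

def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm with farthest-point initialization (an assumption)."""
    centroids = [points[0]]
    for _ in range(k - 1):
        # Distance of every point to its nearest already-chosen centroid.
        d = np.linalg.norm(
            points[:, None, :] - np.array(centroids)[None, :, :], axis=2
        ).min(axis=1)
        centroids.append(points[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        labels = np.linalg.norm(
            points[:, None, :] - centroids[None, :, :], axis=2
        ).argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster is empty.
        updated = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids

def name_clusters_by_risk(labels, deceased,
                          names=("Dangerous State", "Intermediate State", "Stable State")):
    """Hypothetical rule: highest deceased fraction gets the first name."""
    frac = [deceased[labels == j].mean() if np.any(labels == j) else 0.0
            for j in range(len(names))]
    order = np.argsort(frac)[::-1]
    return {int(cluster): names[rank] for rank, cluster in enumerate(order)}

# Synthetic 2-D "latent states": three well-separated blobs of 30 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
latent = np.vstack([c + 0.5 * rng.standard_normal((30, 2)) for c in centers])

# Synthetic outcome flags: 27/30, 15/30 and 3/30 deceased per blob.
deceased = np.zeros(90)
deceased[:27] = 1
deceased[30:45] = 1
deceased[60:63] = 1

labels, _ = kmeans(latent, k=3)
names = name_clusters_by_risk(labels, deceased)
print(names)
```

On real data, a library implementation such as scikit-learn's `KMeans` would normally replace the hand-written loop; the NumPy version is used here only to keep the sketch free of extra dependencies.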

Table 1.

Demographic and clinical characteristics of the patient population. The table presents the number of patients in the overall study cohort (All; N = 12,695) and the deceased subgroup (Dead; N = 4,668). Patients are stratified by age group, gender, and primary diagnosis according to the International Classification of Diseases, 10th Revision (ICD-10).

Table 2.

Overview of the input features used for the model. This table details the four feature categories, the methods for data selection and preprocessing (such as normalization and missing value imputation), and the final dimensionality of each category.

Fig 2.

(a) The differences in the distribution of endpoints over the time-series of latent states for deceased patients (red) and surviving patients (blue) are shown.

(b) The results of patient stratification and the number of endpoints for deceased and surviving patients in each cluster are shown. Cluster I (red), Cluster II (yellow), and Cluster III (green) correspond to the dangerous state, the intermediate state, and the stable state, respectively. (c) The state transitions for deceased and surviving patients are shown as an example. The blue plots represent the latent states of patients across all time-series, the color bar indicates the number of days elapsed from the endpoint, and the plots change from white to red, and from red to black over time. The example state transitions for deceased patients show transitions from cluster III to II to I, while the example for surviving patients shows transitions back and forth between clusters III and II. (d) The transition probabilities between the three clusters for deceased and surviving patients. The thickness of the lines between the clusters represents the magnitude of the probability.
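
Transition probabilities of the kind shown in (d) can be estimated by counting consecutive cluster pairs along each patient's trajectory and normalizing each row. The sketch below assumes this simple first-order counting scheme; the caption does not state how the authors computed the probabilities. The function `transition_matrix` and the example sequences are made up for illustration, with clusters I, II, and III encoded as 0, 1, and 2.

```python
import numpy as np

def transition_matrix(sequences, n_states=3):
    """Row-normalized counts of consecutive state pairs across all sequences."""
    counts = np.zeros((n_states, n_states))
    for seq in sequences:
        for current, nxt in zip(seq[:-1], seq[1:]):
            counts[current, nxt] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # States that are never left keep an all-zero row instead of dividing by zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Toy trajectories (0 = Cluster I, 1 = Cluster II, 2 = Cluster III).
deceased_sequences = [[2, 2, 1, 1, 0], [2, 1, 0, 0]]
surviving_sequences = [[2, 1, 2, 2, 1, 2]]

P_deceased = transition_matrix(deceased_sequences)
P_surviving = transition_matrix(surviving_sequences)
print(P_deceased)
print(P_surviving)
```

Entry `[i, j]` is the estimated probability of moving from cluster `i` to cluster `j` in one time step. In the toy data, the deceased trajectories drift from III toward I, while the surviving trajectory alternates between III and II, which mirrors the pattern described in (c).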

Fig 3.

(a) A bubble plot depicting the percentage of abnormal values for the 10 items with the largest difference in distribution (Wasserstein distance) between clusters.

The size of each bubble indicates the percentage of abnormally low, normal, or abnormally high values in each cluster. The x-axis indicates whether test values are abnormally low, normal, or abnormally high; the y-axis corresponds to the type of test item, and the color corresponds to the cluster. The items are arranged in descending order of the difference in distribution between clusters. (b) A stacked bar graph illustrates the percentage of abnormal values for HGB and HCT, the two items with the largest difference in distribution between clusters. The graph shows the percentage of abnormal values for HGB and HCT in each cluster, for the patients with cancer receiving each drug. In all patients with cancer receiving chemotherapy, there is a higher percentage of abnormally low values for HGB and HCT in Cluster I, which is a dangerous state. (c) For the patients with cancer receiving each drug, we examined the characteristic differences in distribution between clusters for lymphocyte and segmented neutrophil values. The graph illustrates the percentage of abnormal values in each cluster, for the patients receiving each drug. In patients treated with Afatinib, Nivolumab, and Osimertinib, there is a higher proportion of abnormally low values for lymphocytes and abnormally high values for segmented neutrophils in Cluster I, which is a dangerous state.
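
The ranking of test items by Wasserstein distance in (a) can be sketched as follows. The 1-D distance is computed as the area between the two empirical cumulative distribution functions. How the authors combine the three pairwise cluster comparisons into a single score is not stated, so the sketch assumes the largest pairwise distance; the function names, item names, and values are hypothetical.

```python
from itertools import combinations

import numpy as np

def wasserstein_1d(u, v):
    """1-D Wasserstein-1 distance: area between the two empirical CDFs."""
    u = np.sort(np.asarray(u, dtype=float))
    v = np.sort(np.asarray(v, dtype=float))
    grid = np.sort(np.concatenate([u, v]))
    widths = np.diff(grid)
    u_cdf = np.searchsorted(u, grid[:-1], side="right") / len(u)
    v_cdf = np.searchsorted(v, grid[:-1], side="right") / len(v)
    return float(np.sum(np.abs(u_cdf - v_cdf) * widths))

def rank_items(values_by_item):
    """Rank items by their largest pairwise between-cluster distance (an assumption)."""
    scores = {}
    for item, by_cluster in values_by_item.items():
        scores[item] = max(
            wasserstein_1d(by_cluster[a], by_cluster[b])
            for a, b in combinations(sorted(by_cluster), 2)
        )
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy values per test item and cluster; not real laboratory data.
toy_values = {
    "item_a": {"I": [1, 2, 3], "II": [2, 3, 4], "III": [6, 7, 8]},
    "item_b": {"I": [0, 1], "II": [0, 1], "III": [1, 2]},
}

ranking = rank_items(toy_values)
print(ranking)
```

Because the distance is expressed in the units of the measured item, items on different scales would need to be normalized before their scores are compared; the sketch leaves that step out.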

Table 3.

The eight anticancer drugs selected for the analysis of temporal risk factors. The table lists each drug and its corresponding drug classification.

Fig 4.

(a) The differences in the distribution of endpoints for deceased patients (red) and surviving patients (blue) in the time-series of latent states obtained by the proposed method (deep state-space model) and by each comparison method (PCA, VAE, linear state-space model) are shown.

(b) The state transition of a deceased patient is shown as an example. The blue plots represent the latent states of all patients across all time-series, and the plots for the example patient change from white to red and then to black as time progresses.
