Fig 1.
The scheme of the presented approach.
First, ECG data are collected for each person, and a sequence of R-R interval lengths is extracted from them. Next, we apply a rolling window of size 60 or 300 to this sequence. The step between two consecutive windows is one; therefore, two consecutive time series extracted by our rolling-window algorithm overlap in 59 or 299 R-R values. Then, for each window, a classifier predicts whether it belongs to a person from the control or the treatment group. Finally, a single decision for a given person is returned, based on the predictions for multiple windows.
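The window-extraction and per-person aggregation steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation; in particular, the majority-vote rule in `person_decision` is an assumption, since the caption only states that one decision is derived from multiple window predictions.

```python
import numpy as np

def rolling_windows(rr_intervals, size=60, step=1):
    """Extract overlapping windows from a sequence of R-R interval lengths.

    With step=1, consecutive windows of size 60 share 59 values
    (or 299 values for size 300), as described in Fig 1.
    """
    rr = np.asarray(rr_intervals, dtype=float)
    n_windows = (len(rr) - size) // step + 1
    return np.stack([rr[i * step : i * step + size] for i in range(n_windows)])

def person_decision(window_preds):
    """Aggregate per-window binary predictions into one per-person label.

    Majority vote is an assumed aggregation rule; the paper only states
    that a single decision is returned based on multiple windows.
    """
    preds = np.asarray(window_preds)
    return int(preds.mean() >= 0.5)
```

For example, a sequence of 100 R-R values yields 41 windows of length 60, each shifted by one value relative to the previous one.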
Fig 2.
(a) Histogram of hourly activity for each group.
Each bar indicates the number of individuals whose measurements covered the corresponding hour. (b) Histogram of age distributions across the control and treatment groups with kernel density estimation of ages.
Fig 3.
The distributions of the five folds designed for the cross-validation classification experiments with 60-element rolling time windows (a–e).
Each fold contains six patients and six persons from the control group, and when merged, they form the whole dataset. In consecutive iterations, samples (sequences of R-R interval lengths) from four folds form a training and a validation set, while the remaining fold constitutes a test set. (f) Comparison of median R-R interval values with corresponding standard deviations across the two experiment groups and five data folds.
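The subject-wise fold rotation described above can be sketched as follows. This is an illustrative outline only, assuming each fold is represented as a list of subject identifiers; the authors' actual fold assignments are shown in Fig 3.

```python
def cv_splits(folds):
    """Yield (train_subjects, test_subjects) pairs for subject-wise
    cross-validation: in each iteration one fold serves as the test
    set and the remaining folds form the training/validation pool.
    """
    for i, test in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, test

# Hypothetical example: five folds of twelve subjects each
folds = [[f"subject_{12 * i + j}" for j in range(12)] for i in range(5)]
splits = list(cv_splits(folds))
```

Keeping all windows of one subject inside a single fold prevents leakage: no subject ever appears in both the training and the test set of the same iteration.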
Table 1.
Experimental results for different methods and time window lengths averaged over 5 test folds, in terms of overall accuracy.
In the case of non-deterministic methods, results are additionally averaged over 5 training runs per cross-validation step. Bold font indicates methods with the highest accuracy in a given category. MLP – multilayer perceptron; FSH – feature selection with automatically configured hypothesis tests; GRU – Gated Recurrent Unit; FCN – fully convolutional network.
Fig 4.
Results of Cohen’s d between consecutive pairs of classifiers, calculated for the two considered time window sizes: 60 (a) and 300 (b), corresponding to the experiments described in Table 1.
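Cohen's d, reported in Fig 4, measures the standardized difference between two sets of scores. A minimal sketch of the standard pooled-standard-deviation formula is shown below; how exactly the paper pairs classifier results into the two samples is assumed, not stated here.

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d effect size between two samples a and b.

    d = (mean(a) - mean(b)) / s_pooled, where s_pooled is the
    pooled sample standard deviation (ddof=1 in each group).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)
```

By a common convention, |d| around 0.2 is considered a small effect, 0.5 medium, and 0.8 large.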
Table 2.
The results of the leave-one-out cross-validation experiment averaged over 60 test folds (in terms of overall accuracy).
In the case of non-deterministic methods, such as GRU + FCN, results are additionally averaged over 3 training runs for each cross-validation step. Bold font indicates the highest-performing method in each category.
Fig 5.
The distributions of the 60 test-fold accuracies for the leave-one-out cross-validation experiment for the GRU + FCN method (a), averaged over three different runs, and the Ensemble of SVMs method (b).
Fig 6.
The distribution of feature impact on model predictions of the treatment (positive SHAP values) and the control group (negative SHAP values) as a SHAP bee-swarm plot.
Every point represents a data instance, with clusters indicating the data density. Rows and columns correspond to features and SHAP values, respectively, while color intensity represents the feature value: red indicates high values of the considered feature, blue the opposite case. Features are ranked by their impact on the model output. Data come from the 300-element window experiment, on a dataset subsampled to every 500th instance.
Table 3.
Autocorrelation function values at selected lags, i.e. ACF(i), for the control and treatment groups across all time windows.
For lags greater than 2, the mean ACF values for the treatment group consistently exceed those of the control group.
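The lagged autocorrelation values reported in Table 3 can be computed with the standard sample estimator, sketched below. This is an assumption about the estimator used; the paper's exact ACF implementation is not shown here.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation function ACF(i) for lags i = 1..max_lag.

    Each lagged autocovariance is normalized by the lag-0
    autocovariance, so ACF values lie in [-1, 1].
    """
    x = np.asarray(x, dtype=float) - np.mean(x)
    c0 = np.dot(x, x) / len(x)  # lag-0 autocovariance
    return np.array(
        [np.dot(x[:-i], x[i:]) / len(x) / c0 for i in range(1, max_lag + 1)]
    )
```

Applied per time window, this yields one ACF(i) value per window and lag, which can then be averaged within each group for a comparison like the one in Table 3.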
Fig 7.
The influence of features on model predictions, presented as a SHAP heatmap.
Rows and columns correspond to features and instances, respectively. Instances are sorted according to the model’s output score. Color intensity represents the SHAP value, with red indicating a positive impact and green a negative impact on the model output (the probability of belonging to the positive class) relative to the baseline. Features are ranked by their importance, which is also presented as sidebars. Data come from the 300-element window experiment, on a dataset subsampled to every 500th instance.
Fig 8.
An overview of R-R interval values and the predictions of the XGBoost, Ensemble of SVMs, and GRU + FCN classifiers for consecutive time windows for three selected individuals.
Green areas correspond to correctly classified periods, while red areas refer to the opposite case. (a) depicts an individual from the treatment group who is relatively straightforward for all three classifiers. (b) corresponds to a selected control group individual for whom most of the time windows are classified incorrectly by all three compared methods. Finally, (c) represents a treatment group individual classified mostly accurately by XGBoost and the Ensemble of SVMs but misclassified by GRU + FCN.
Fig 9.
An overview of R-R interval values and the predictions of the XGBoost, Ensemble of SVMs, and GRU + FCN classifiers for consecutive time windows for two individuals.
Green areas correspond to correctly classified periods, while red areas refer to the opposite case. (a) corresponds to a control group individual for whom selected contiguous time windows were classified incorrectly. All three tested methods made errors for periods corresponding to lower R-R interval lengths, while for the remaining time windows, the Ensemble of SVMs achieved the highest performance. (b) depicts an individual from the treatment group whose signal is challenging for the classifiers. Only the middle part of the measurements mostly led to correct labelling (except for GRU + FCN, which made many prediction errors there, though still fewer than for the remaining parts of the signal).
Fig 10.
Histogram of R-R intervals for treatment group participants, stratified by quetiapine usage (quetiapine vs. non-quetiapine treatment subgroups).