Machine learning for passive mental health symptom prediction: Generalization across different longitudinal mobile sensing studies

doi:10.1371/journal.pone.0266516

Fig 1.

Modeling overview.

Table 1.

Comparing the CrossCheck and StudentLife datasets used in this work.

Table 2.

The ecological momentary assessment (EMA) symptom outcome measures collected during the CrossCheck study.

Table 3.

The mental health ecological momentary assessment (EMA) symptom outcome measures collected during the StudentLife study.

Table 4.

The sensor data alignment between CrossCheck and StudentLife sensing data.

Fig 2.

Summary of the 44 features used for prediction.

Each data type on the left-hand side is summarized over a 3-day period for each epoch (e.g., 12 AM to 6 AM) using the aggregation technique (mean or count) listed on the right-hand side. Aggregations were performed to align features with ecological momentary assessment (EMA) mental health symptom outcomes.
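
The aggregation described in this caption can be sketched as follows. This is a minimal illustration, not the study's pipeline: the four 6-hour epochs, the integer day index, the choice to end the 3-day window on the EMA day, and the names `epoch_of` and `aggregate_window` are all assumptions made for the example.

```python
from collections import defaultdict

EPOCH_HOURS = 6  # assumption: four 6-hour epochs, the first being 12 AM to 6 AM


def epoch_of(hour):
    """Map an hour of day (0-23) to an epoch index (0-3)."""
    return hour // EPOCH_HOURS


def aggregate_window(readings, ema_day, window_days=3, how="mean"):
    """Summarize readings per epoch over the window ending on ema_day.

    readings: iterable of (day_index, hour, value) tuples (hypothetical layout).
    how: "mean" or "count", the two aggregation techniques named in the caption.
    """
    first_day = ema_day - window_days + 1
    buckets = defaultdict(list)
    for day, hour, value in readings:
        if first_day <= day <= ema_day:
            buckets[epoch_of(hour)].append(value)
    if how == "mean":
        return {e: sum(v) / len(v) for e, v in sorted(buckets.items())}
    if how == "count":
        return {e: len(v) for e, v in sorted(buckets.items())}
    raise ValueError("how must be 'mean' or 'count'")


# Toy readings: the day-3 value falls outside the window for an EMA on day 2.
readings = [(0, 1, 10.0), (1, 2, 20.0), (2, 3, 30.0), (2, 13, 4.0), (3, 1, 100.0)]
means = aggregate_window(readings, ema_day=2, how="mean")
counts = aggregate_window(readings, ema_day=2, how="count")
```

Epochs with no readings in the window are simply absent from the result, so a caller would need to decide separately how to treat missing data.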

Table 5.

Summary of the aligned training and validation data.

Fig 3.

Example feature distribution differences across datasets.

Feature distributional differences across the CrossCheck (CC), StudentLife sleep EMA (SL: Sleep), and StudentLife stress EMA (SL: Stress) validation data for 11 example features across data types. Each subfigure shows a boxplot of the feature distribution within each dataset. The centerline of the boxplot is the median, the box edges mark the interquartile range (IQR), and the fences mark values 1.5 × the IQR. The “Missing Days” distribution is a histogram describing counts across participants. A “*” appears above each StudentLife dataset whose distribution differed significantly (Mann-Whitney U test, two-sided, or Chi-square test of independence, α = 0.05) from CrossCheck. The numbers above the “*” are the rank-biserial correlation (RBC) or Cramér’s V, which indicate the magnitude of these differences. EMA: Ecological momentary assessment.
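
As a rough illustration of the effect size named in this caption, the rank-biserial correlation can be derived from the Mann-Whitney U statistic as 2U/(n1·n2) − 1. The sketch below uses a direct pairwise count with ties scored as one half; the function name `rank_biserial` and the sign convention (positive when the first sample tends to be larger) are assumptions, and the caption does not state which convention the authors used. A p-value would come from a separate test, for example `scipy.stats.mannwhitneyu` with `alternative="two-sided"`.

```python
def rank_biserial(x, y):
    """Rank-biserial correlation between two independent samples.

    U counts the pairs (a, b) with a from x and b from y where a > b,
    scoring ties as 0.5. The result lies in [-1, 1].
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    return 2.0 * u / (n1 * n2) - 1.0


# Toy samples (made-up numbers): complete separation gives +1 or -1,
# identical samples give 0.
separated = rank_biserial([3, 4, 5], [1, 2])
identical = rank_biserial([1, 2, 3], [1, 2, 3])
```

The pairwise count is quadratic in sample size, which is fine for a sketch but would be replaced by a rank-based computation for large samples.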

Fig 4.

Outcome distribution differences across datasets.

Sleep (left column) and stress (right column) ecological momentary assessment (EMA) validation distributions for CrossCheck (CC, top row) and StudentLife (SL, bottom row) data. The height of each bar represents the frequency of an EMA response, where the specific response is listed on the x-axis under that bar. At the bottom, a “*” indicates whether there were significant (Mann-Whitney U test, two-sided, α = 0.05) differences between CrossCheck and StudentLife EMA distributions, with rank-biserial correlation (RBC) values listing the magnitude of these differences.

Table 6.

Sensitivity analysis of predictive models using different training datasets.

Fig 5.

Sensitivity analysis reveals that combined data are more likely to be predictive than single-study data.

The left y-axis shows ΔMAE = MAE_Single − MAE_Combined against the sorted distribution percentiles (x-axis). The thick green solid line represents the ΔMAE percentiles, and the dashed black intersection lines show the percentile value (x-axis) where ΔMAE = 0. The right y-axis shows the actual MAE for the combined (blue solid line) and single-study (dashed orange line) data at each percentile. The baseline MAE, or error for a model predicting the average of the training data, is shown by the dotted horizontal red line. Wilcoxon signed-rank test (one-sided) statistics (W), p-values, and rank-biserial correlations (RBCs) are included for models where, across hyperparameters, using combined data significantly (α = 0.05) outperformed using single-study data. Shaded areas represent 95% confidence intervals around the mean. EMA: Ecological momentary assessment.
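
The ΔMAE curve in this caption can be approximated with a short sketch. It assumes that each hyperparameter configuration yields one MAE for the single-study model and one for the combined model, paired by position; the function names and the toy numbers are hypothetical and are not results from the paper. A positive ΔMAE means the combined-data model had the lower error.

```python
def mae(y_true, y_pred):
    """Mean absolute error between paired observations and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def delta_mae_curve(mae_single, mae_combined):
    """Sorted dMAE = MAE_Single - MAE_Combined with percentile positions.

    Returns (percentiles, sorted deltas, share of configurations in which
    the combined data gave the lower error, i.e. dMAE > 0).
    """
    if len(mae_single) != len(mae_combined) or not mae_single:
        raise ValueError("inputs must be non-empty and of equal length")
    deltas = sorted(s - c for s, c in zip(mae_single, mae_combined))
    n = len(deltas)
    percentiles = [100.0 * (i + 1) / n for i in range(n)]
    share_combined_better = sum(1 for d in deltas if d > 0) / n
    return percentiles, deltas, share_combined_better


# Made-up MAE values for four hyperparameter configurations.
single = [1.0, 2.0, 3.0, 4.0]
combined = [1.5, 1.5, 2.0, 5.0]
percentiles, deltas, share = delta_mae_curve(single, combined)
```

The point where the sorted deltas cross zero corresponds to the dashed intersection lines in the figure. The one-sided paired comparison itself could be run with `scipy.stats.wilcoxon`, which is not shown here.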

Table 7.

Uncovering the association between distributional distance (ΔPAD) and model performance (ΔMAE).

Fig 6.

Personalization increases training and held-out data alignment, but is not guaranteed to improve prediction performance.

(A) Effects of personalization by changing the number of neighbors (x-axis) used for model training on the feature distribution alignment between training and leave-one-subject-out cross-validation (LOSO-CV) participants (Proxy-A distance, y-axis). (B) Effects of changing the number of neighbors (x-axis) during model training on the model mean absolute error (MAE, y-axis). On all plots, each point is the mean Proxy-A distance (A) or MAE (B) across hyperparameters, and error bars are 95% confidence intervals around the mean. Each plot is split by the training data used (combined versus single-study), and plots are specific to the LOSO-CV result for a study (CrossCheck/StudentLife) and EMA (Sleep/Stress).
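
For readers unfamiliar with the Proxy-A distance, it is commonly defined as 2(1 − 2ε), where ε is the held-out error of a classifier trained to distinguish two samples (here, training versus held-out participants). The sketch below only encodes that formula; the caption does not say which classifier or error estimate the authors used, so the function name and input are assumptions.

```python
def proxy_a_distance(domain_classifier_error):
    """Proxy-A distance from a domain classifier's held-out error rate.

    An error of 0.5 (chance level, samples indistinguishable) gives 0;
    an error of 0 (samples perfectly separable) gives the maximum of 2.
    """
    if not 0.0 <= domain_classifier_error <= 1.0:
        raise ValueError("error rate must lie in [0, 1]")
    return 2.0 * (1.0 - 2.0 * domain_classifier_error)


# Illustrative error rates, not values from the study.
aligned = proxy_a_distance(0.5)
separable = proxy_a_distance(0.0)
```

Lower values therefore indicate closer alignment between the two feature distributions, which is the sense in which panel (A) reads the y-axis.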

Fig 7.

SMOTE increases sensitivity and positive predictive value, but reduces specificity and increases mean absolute error.

SMOTE (see legends) oversampled under-represented ecological momentary assessment (EMA) values. The height of each bar is the mean value of the metric described on the x-axis across hyperparameters. Error bars are 95% confidence intervals around the mean. Plots are specific to the leave-one-subject-out cross-validation (LOSO-CV) result for a study (CrossCheck/StudentLife) and EMA (Sleep/Stress). The specificity, sensitivity, and positive predictive value (PPV) were calculated by transforming regression results into a classification problem, labeling the two most severe symptom classes in each EMA as “1” and all other symptom classes as “0”. The remaining plots show the regression mean absolute error (MAE). “*” indicates p<0.05, and “✝” indicates p<0.10, for a Wilcoxon signed-rank test (one-sided) comparing results with and without SMOTE across hyperparameter combinations.
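
The conversion from regression output to classification metrics described in this caption might look like the sketch below. Several details are assumptions made for illustration: a 0 to 3 EMA scale on which higher values are more severe, a cut-off of 2 so that the two most severe classes count as positive, and applying the same cut-off to the continuous predictions. The caption does not state how predictions were mapped to classes.

```python
def binarize(values, threshold):
    """Label values at or above the threshold as 1, all others as 0."""
    return [1 if v >= threshold else 0 for v in values]


def classification_metrics(y_true, y_pred, threshold):
    """Sensitivity, specificity, and PPV after thresholding both inputs."""
    t = binarize(y_true, threshold)
    p = binarize(y_pred, threshold)
    tp = sum(1 for a, b in zip(t, p) if a == 1 and b == 1)
    tn = sum(1 for a, b in zip(t, p) if a == 0 and b == 0)
    fp = sum(1 for a, b in zip(t, p) if a == 0 and b == 1)
    fn = sum(1 for a, b in zip(t, p) if a == 1 and b == 0)
    nan = float("nan")  # returned when a metric's denominator is zero
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "ppv": tp / (tp + fp) if tp + fp else nan,
    }


# Made-up EMA responses and regression predictions.
y_true = [0, 1, 2, 3, 3, 0]
y_pred = [0.2, 1.9, 2.4, 2.6, 1.0, 2.1]
metrics = classification_metrics(y_true, y_pred, threshold=2)
```

In this toy example there are two true positives, two true negatives, one false positive, and one false negative, so all three metrics equal 2/3.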
