Using computer-vision and machine learning to automate facial coding of positive and negative affect intensity

doi:10.1371/journal.pone.0211735

Fig 1.

Emotion-evoking task.

Fig 2.

Machine learning procedure.

The goal of our first analysis was to determine whether CVML could perform similarly to humans in rating facial expressions of emotion. For each AU evidence time-series, we computed the normalized Area Under the Curve (AUC), that is, the AUC divided by the total time that FACET detected a face, which captures the probability that a given AU is present over time. All AUC values (20 total) were entered as predictors into a random forest (RF) model to predict the average coder rating for each recording. To test how similar the model ratings were to human ratings, we separated the data into training (3,060 recordings) and test (1,588 recordings) sets. We fit the RF to the training set and made predictions on the unseen test set. Model performance was assessed with the Pearson and intraclass correlations between computer- and human-generated ratings in the test sets.
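The normalized AUC described in this caption can be illustrated with a minimal sketch. It assumes trapezoidal integration and a hypothetical encoding in which frames without a detected face carry `None`; neither detail is stated in the caption, and FACET's actual output format may differ.

```python
from typing import Optional, Sequence


def normalized_auc(times: Sequence[float],
                   values: Sequence[Optional[float]]) -> float:
    """Trapezoidal area under an AU time-series divided by face-detected time.

    `values[i]` is None for frames where no face was detected (an assumed
    encoding, not FACET's actual output format).
    """
    area = 0.0
    detected_time = 0.0
    for i in range(1, len(times)):
        left, right = values[i - 1], values[i]
        if left is None or right is None:
            continue  # skip intervals that touch an undetected frame
        dt = times[i] - times[i - 1]
        area += 0.5 * (left + right) * dt  # trapezoid for this interval
        detected_time += dt
    if detected_time == 0:
        raise ValueError("no interval with a detected face")
    return area / detected_time


# Hypothetical recording: four frames one second apart, face lost in frame 3,
# so only the first interval contributes to both the area and the time.
example = normalized_auc([0.0, 1.0, 2.0, 3.0], [0.2, 0.4, None, 0.6])
```

Dividing by detected time rather than recording length keeps recordings with dropped frames on the same scale as fully tracked ones.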

Fig 3.

Sensitivity of model performance to different training/test splits.

Results of sensitivity analyses across different splits of the training and test sets. We created 1,000 different splits of the training and test sets, fit the RF to each training set, and then made predictions on each respective test set. We stored the Pearson correlations between human- and model-generated ratings for each iteration. Distributions therefore represent uncertainty in prediction accuracy. Means of the distributions (superimposed on respective graphs) are represented by dashed red lines.
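The repeated-split procedure in this caption can be sketched as follows. To stay self-contained, the sketch swaps the random forest for a one-predictor least-squares line and uses synthetic data. The function names, the default test fraction (roughly matching 1,588 of 4,648 recordings), and the choice to split recordings at random without regard to participant are all assumptions for illustration.

```python
import random
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_correlations(features, ratings, fit_predict,
                       n_splits=1000, test_fraction=0.34, seed=0):
    """Refit on many random train/test splits; return the test-set r of each."""
    rng = random.Random(seed)
    idx = list(range(len(ratings)))
    n_test = max(2, round(test_fraction * len(idx)))
    correlations = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        preds = fit_predict([features[i] for i in train],
                            [ratings[i] for i in train],
                            [features[i] for i in test])
        correlations.append(pearson(preds, [ratings[i] for i in test]))
    return correlations


def line_fit_predict(train_x, train_y, test_x):
    """Stand-in model: a least-squares line, NOT the random forest."""
    mx, my = mean(train_x), mean(train_y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(train_x, train_y))
             / sum((a - mx) ** 2 for a in train_x))
    return [my + slope * (x - mx) for x in test_x]


# Synthetic data standing in for AUC predictors and coder ratings.
data_rng = random.Random(1)
xs = [data_rng.random() for _ in range(200)]
ys = [3.0 * x + data_rng.gauss(0.0, 0.3) for x in xs]
rs = split_correlations(xs, ys, line_fit_predict, n_splits=100)
mean_r = mean(rs)  # the value marked by the dashed line in each panel
```

Any model can be passed as `fit_predict`, so the same loop would apply to a random forest from a library of the reader's choice.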

Table 1.

Correlations between human- and computer-generated valence ratings.

Fig 4.

Model performance within participants.

Distributions of within-participant Pearson correlations for positive and negative ratings in the training (all 125 participants) and test (122 participants; correlations could not be computed for 3 participants whose human ratings had zero variance) sets. Red dashed lines represent the median within-participant Pearson correlation for each distribution. Intraclass correlations for the corresponding figures are reported in the text.
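A minimal sketch of the within-participant computation follows, including the exclusion of participants whose human ratings have zero variance. The record layout `(participant_id, human_rating, model_rating)` and the function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, median


def pearson_or_none(x, y):
    """Pearson r, or None when either series has zero variance."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined for a constant series
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sxx * syy)


def within_participant_correlations(records):
    """Map participant id -> r between human and model ratings.

    Participants with an undefined correlation are left out.
    """
    human = defaultdict(list)
    model = defaultdict(list)
    for pid, human_rating, model_rating in records:
        human[pid].append(human_rating)
        model[pid].append(model_rating)
    result = {}
    for pid in human:
        r = pearson_or_none(human[pid], model[pid])
        if r is not None:
            result[pid] = r
    return result


# Hypothetical ratings for three participants; "p2" has constant human ratings.
records = [
    ("p1", 1, 1.1), ("p1", 2, 2.0), ("p1", 3, 3.2),
    ("p2", 2, 1.8), ("p2", 2, 2.4), ("p2", 2, 2.1),
    ("p3", 1, 3.0), ("p3", 2, 2.0), ("p3", 3, 1.0),
]
per_participant = within_participant_correlations(records)
median_r = median(per_participant.values())
```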

Fig 5.

Main and interactive effects among and between AUs for positive and negative ratings.

Relative importance of the main effects and interactions among all AUs for positive and negative human-coder ratings. Relative importance (normalized partial dependence from the RF model) is a measure of the SD in the outcome variable (i.e., positive or negative affect intensity) attributable to each AU while integrating over all other AUs, and it can be interpreted as how important a given AU is with respect to all other AUs. Note that partial dependence is not directional (see Fig 6 for directional effects). Visual depictions of the 5 most important AUs for predicting positive and negative ratings are shown on the graphs. Because there are 190 possible pairwise combinations of the 20 AUs for displaying interactive effects, we only show the top 20 here for brevity.
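One way to read this importance measure is sketched below: compute each predictor's partial dependence curve by averaging model predictions over the other predictors, take the SD of that curve, and normalize. The grid (each predictor's observed values), the normalization (dividing by the sum), and the toy `predict` function are assumptions; the caption does not give these details, and the RF model is replaced by an arbitrary prediction function.

```python
from math import comb
from statistics import mean, pstdev

# Number of unordered AU pairs among 20 AUs, as quoted in the caption.
N_AU_PAIRS = comb(20, 2)


def partial_dependence(predict, rows, j, grid):
    """Average prediction when predictor j is fixed to each grid value."""
    curve = []
    for g in grid:
        modified = [list(row[:j]) + [g] + list(row[j + 1:]) for row in rows]
        curve.append(mean(predict(r) for r in modified))
    return curve


def relative_importance(predict, rows):
    """SD of each predictor's partial dependence curve, scaled to sum to 1."""
    raw = []
    for j in range(len(rows[0])):
        grid = sorted({row[j] for row in rows})  # assumed grid: observed values
        raw.append(pstdev(partial_dependence(predict, rows, j, grid)))
    total = sum(raw)
    return [r / total if total else 0.0 for r in raw]


# Toy stand-in for a fitted model: only the first predictor matters.
def toy_predict(row):
    return 2.0 * row[0]


rows = [[0.0, 1.0], [1.0, 0.0], [2.0, 3.0]]
importance = relative_importance(toy_predict, rows)
```

Because the SD ignores sign, a predictor that strongly lowers the rating scores as high as one that strongly raises it, which is why the caption calls the measure non-directional.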

Fig 6.

Directionality of main and interactive effects.

Partial dependence plots of the 5 most important main and interactive effects for both positive and negative affect intensity ratings. Partial dependence indicates the predicted affect intensity while integrating over all other AUs. Panel (A) shows the directionality of main effects, where increasing (decreasing) values indicate positive (negative) effects as AU presence increases. Panel (B) shows directionality of interactive effects, where warmer (cooler) colors indicate higher (lower) affect intensity ratings given specific combinations of AU presence scores on the x- and y-axes.

Table 2.

Correlations between human- and computer-generated ratings within conditions.

Fig 7.

AU relative importance values across task instructions.

Relative importance of each AU for positive valence and negative valence human-coder ratings within each of the three task instructions (enhance, react normally, suppress). Intraclass correlation coefficients, treating importance values both as average [ICC(3)] and as single [ICC(1)] units, are superimposed. We show ICC(3) here because the AU importance scores could be interpreted as “averages” across all recordings.
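The sketch below shows how single-unit and average-unit intraclass correlations can be computed from a two-way table, using the consistency forms ICC(3,1) and ICC(3,k) in Shrout and Fleiss's notation. Reading the caption's ICC(1) and ICC(3) labels as these two forms is an assumption, as is the table layout (rows as AUs, columns as task instructions) and the function name.

```python
from statistics import mean


def consistency_icc(table):
    """Return (single, average) consistency ICCs for an n-by-k table.

    Rows are the rated targets (e.g. AUs) and columns the measurements
    being compared (e.g. task instructions); that layout is an assumption.
    """
    n, k = len(table), len(table[0])
    grand = mean(v for row in table for v in row)
    row_means = [mean(row) for row in table]
    col_means = [mean(row[j] for row in table) for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in table for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    single = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)  # ICC(3,1)
    average = (ms_rows - ms_error) / ms_rows                        # ICC(3,k)
    return single, average


# Hypothetical importance scores: four AUs under two task instructions.
single_icc, average_icc = consistency_icc([[1, 2], [2, 2], [3, 5], [4, 3]])
```

When agreement is positive, the average-unit value is at least as large as the single-unit value, because averaging over columns reduces the error variance.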
