Explainable human-centered traits from head motion and facial expression dynamics

doi:10.1371/journal.pone.0313883

Fig 1.

Overview of the proposed framework: Kinemes (elementary head motions), action units (atomic facial movements) and speech features employed for explainable trait prediction.

Fig 2.

(a) Plots of 16 kinemes extracted for the FICS dataset following raster ordering (left to right, top to bottom) and (b) Selected kineme plots for the MIT dataset.

Table 1.

18 AUs common to the FICS and MIT datasets.

Fig 3.

Trimodal feature fusion architecture.

N denotes the number of neurons per layer. For the regression model, the LSTM layer has 32 neurons and linear activation is applied to the dense layer output.

Fig 4.

Additive soft attention fusion.

(a) Additive attention fusion architecture overview, and (b) Attention score computation process (FC layer comprises twelve neurons). N denotes the number of neurons per layer. Linear/sigmoid activation is applied on the dense layer output for regression/classification.
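
As an illustration of the score computation in panel (b), the following is a minimal NumPy sketch of additive soft attention over three modality embeddings. It is not the authors' implementation: the caption only states that the FC layer has twelve neurons, so the tanh nonlinearity, the scoring vector `v`, the embedding size of 32, and all function and variable names are assumptions made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def additive_attention_fusion(features, W, b, v):
    """Fuse per-modality embeddings with additive (soft) attention.

    features: (M, d) array, one row per modality (e.g. kineme, AU, speech).
    W, b:     weights and bias of the FC layer, shapes (d, h) and (h,).
    v:        scoring vector of shape (h,) (assumed, not from the source).
    """
    hidden = np.tanh(features @ W + b)   # (M, h) FC layer output
    scores = hidden @ v                  # (M,) one scalar score per modality
    weights = softmax(scores)            # (M,) attention weights, sum to 1
    fused = weights @ features           # (d,) weighted sum of embeddings
    return fused, weights

# Hypothetical sizes: 3 modalities, 32-dim embeddings, 12-neuron FC layer.
rng = np.random.default_rng(0)
M, d, h = 3, 32, 12
features = rng.normal(size=(M, d))
W = rng.normal(size=(d, h)) * 0.1
b = np.zeros(h)
v = rng.normal(size=h)

fused, weights = additive_attention_fusion(features, W, b, v)
```

In a sketch like this, `weights` can be read as the relative contribution of each modality to the fused representation, which is the kind of quantity an attention-based explanation would inspect.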

Table 2.

Trait-wise train (Tr) and test (Te) class distributions for the FICS and MIT datasets obtained for classification experiments.

MIT class distributions correspond to 1-minute video samples employed for analysis.

Table 3.

Unimodal and multimodal regression results on the MIT dataset.

Accuracy and PCC values are tabulated as (μ±σ) values, with highest PCC achieved per trait denoted in bold.

Table 4.

Unimodal and multimodal regression results on the FICS dataset.

Accuracy and PCC values for different methods are tabulated, with highest PCC achieved per trait denoted in bold.

Table 5.

Unimodal and multimodal classification results on the MIT dataset.

Accuracy and F1-score are tabulated as (μ±σ) values, with highest F1 achieved per trait denoted in bold.

Table 6.

Unimodal and multimodal classification results on the FICS dataset.

Accuracy and F1-score for different methods are tabulated, with highest F1 achieved per trait denoted in bold.

Table 7.

Soft Additive Attention Fusion Results over the 2s behavioral slice: MIT Dataset (top, results tabulated as (μ ± σ) values) and FICS Dataset (bottom).
