Fig 1.
Overview of the proposed framework: kinemes (elementary head motions), action units (atomic facial movements) and speech features are employed for explainable trait prediction.
Fig 2.
(a) Plots of 16 kinemes extracted for the FICS dataset following raster ordering (left to right, top to bottom) and (b) Selected kineme plots for the MIT dataset.
Table 1.
18 AUs common to the FICS and MIT datasets.
Fig 3.
Trimodal feature fusion architecture.
N denotes the number of neurons per layer. For the regression model, the LSTM layer comprises 32 neurons and linear activation is applied to the dense layer output.
Fig 4.
Additive soft attention fusion.
(a) Additive attention fusion architecture overview, and (b) Attention score computation process (FC layer comprises twelve neurons). N denotes the number of neurons per layer. Linear/sigmoid activation is applied on the dense layer output for regression/classification.
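The attention score computation in panel (b) can be illustrated with a minimal sketch of generic additive (tanh-based) soft attention over three modality vectors. This is an assumption-laden illustration, not the authors' implementation: the function name `additive_attention_fuse`, the parameters `W`, `b` and `v`, the feature dimension of 4, and the random inputs are all hypothetical; only the twelve-neuron FC layer and the three modalities come from the caption.

```python
import math
import random

def additive_attention_fuse(features, W, b, v):
    """Fuse modality vectors with additive soft attention.

    features: list of M modality vectors, each of length d
    W: hidden x d weight matrix of the FC layer (hypothetical parameters)
    b: bias vector of length hidden
    v: projection vector of length hidden that maps to a scalar score
    """
    scores = []
    for h in features:
        # FC layer with tanh, then projection to one score per modality
        hidden = [
            math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i)
            for row, b_i in zip(W, b)
        ]
        scores.append(sum(v_i * z_i for v_i, z_i in zip(v, hidden)))

    # Numerically stable softmax across modalities
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Attention-weighted sum of the modality vectors
    d = len(features[0])
    fused = [sum(w * h[j] for w, h in zip(weights, features)) for j in range(d)]
    return fused, weights

rng = random.Random(0)
d, hidden_units = 4, 12  # twelve FC neurons as in the caption; d is assumed
W = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden_units)]
b = [0.0] * hidden_units
v = [rng.uniform(-0.5, 0.5) for _ in range(hidden_units)]

# Stand-ins for kineme, AU and speech representations (random, illustrative)
features = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(3)]
fused, weights = additive_attention_fuse(features, W, b, v)
```

In a trained model the fused vector would feed the dense output layer, and the per-modality weights are the quantities one would average to obtain modality-specific attention weights.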
Table 2.
Trait-wise train (Tr) and test (Te) class distributions for the FICS and MIT datasets obtained for classification experiments.
MIT class distributions correspond to 1-minute video samples employed for analysis.
Table 3.
Unimodal and multimodal regression results on the MIT dataset.
Accuracy and PCC values are tabulated as (μ±σ) values, with highest PCC achieved per trait denoted in bold.
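For reference, the Pearson correlation coefficient (PCC) and a (μ±σ) summary over repeated runs can be computed as in the sketch below. The helper names `pearson_cc` and `mean_std` are hypothetical, and the use of the population standard deviation for σ is an assumption, since the caption does not state which estimator was used.

```python
import math
import statistics

def pearson_cc(y_true, y_pred):
    """Pearson correlation between ground-truth and predicted trait scores."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

def mean_std(values):
    """Return (mu, sigma) over per-run scores; sigma is the population std (assumed)."""
    return statistics.mean(values), statistics.pstdev(values)
```

A typical use would be to call `pearson_cc` once per fold or run and pass the resulting list to `mean_std`.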
Table 4.
Unimodal and multimodal regression results on the FICS dataset.
Accuracy and PCC values for different methods are tabulated, with highest PCC achieved per trait denoted in bold.
Table 5.
Unimodal and multimodal classification results on the MIT dataset.
Accuracy and F1-score are tabulated as (μ±σ) values, with highest F1 achieved per trait denoted in bold.
Table 6.
Unimodal and multimodal classification results on the FICS dataset.
Accuracy and F1-score for different methods are tabulated, with highest F1 achieved per trait denoted in bold.
Table 7.
Additive soft attention fusion results over the 2 s behavioral slice: MIT dataset (top, results tabulated as (μ ± σ) values) and FICS dataset (bottom).
Table 8.
Comparison with prior works for the two datasets.
Table 9.
Regression results for model generalisability.
Table 10.
Explaining OCEAN and interview traits via kinemes and AUs.
MIT kinemes in bold font are visualized in Fig 2(b).
Fig 5.
Mean modality-specific attention weights for personality traits (left) and interview traits (right).
Error bars denote standard error.
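As a brief illustration of how such error bars are usually obtained, the standard error of the mean is the sample standard deviation divided by the square root of the number of samples. The function name `standard_error` is hypothetical, and the choice of the sample (n − 1) estimator is an assumption not stated in the caption.

```python
import math
import statistics

def standard_error(weights):
    """Standard error of the mean for a list of attention weights (n >= 2)."""
    # statistics.stdev uses the sample (n - 1) estimator
    return statistics.stdev(weights) / math.sqrt(len(weights))
```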