Fig 1.
A labyrinth of data representation choices for a MER algorithm.
The choices that we made for the benchmark are highlighted in red.
Fig 2.
Annotation interface for both continuous (upper-left corner) and static per-song (middle; using the self-assessment manikins [43]) ratings of arousal.
Table 1.
Overview of the DEAM data.
Table 2.
Cronbach’s α and the coefficient of determination (mean and standard deviation) of the generalized additive mixed models (GAMs), per year.
Fig 3.
Fitted GAMs for the arousal and valence annotations of two songs.
Fig 4.
Liking of the music and confidence in rating for a) valence, Spearman’s ρ = 0.37, p-value = 2.2 × 10⁻¹⁶; b) arousal, Spearman’s ρ = 0.29, p-value = 2.2 × 10⁻¹⁶.
Fig 5.
Krippendorff’s α of dynamic annotations in 2015, averaged over all dynamic samples.
Table 3.
Performance of the algorithms for arousal and valence in 2013.
BLSTM-RNN: Bi-directional Long Short-Term Memory Recurrent Neural Network. GPR: Gaussian Process Regression. SVR: Support Vector Regression.
Table 4.
Performance of the algorithms for arousal and valence in 2014.
KF: Kalman Filter. LSTM: Long Short-Term Memory Recurrent Neural Network. CCRF: Continuous Conditional Random Fields. CCNF: Continuous Conditional Neural Fields. MR: Multi-level Regression. PLSR: Partial Least Squares Regression.
Table 5.
Performance of the algorithms for arousal and valence in 2015.
BLSTM-ELM: BLSTM-based multi-scale regression fusion with Extreme Learning Machine. AE-HE-BLSTM: BLSTM + features created through deep learning. LS: Linear Regression + Smoothing. LSB: Least Squares Boosting + Smoothing. SVR + CCRF: SVR + Continuous Conditional Random Fields.
Table 6.
Performance of the different algorithms for arousal and valence, using the baseline feature set.
Combo: An unweighted combination of LS, LSB and a boosted ensemble of single-feature filters.
Fig 6.
Distribution of the labels on the arousal-valence plane for a) the development set and b) the evaluation set.
Table 7.
Performance of the different feature sets on valence for the development and evaluation sets of 2015, with 20-fold cross-validation.
Table 8.
Performance of the different feature sets on arousal for the development and evaluation sets of 2015, with 20-fold cross-validation.