Table 1.
A breakdown of the 110 films in our dataset.
Only 33 of the films have a single genre tag; the other 77 are multi-genre. A list of tags for every film is given in S1 Appendix.
Fig 1.
A film is partitioned into non-overlapping five-second segments. For every segment, Dejavu predicts whether a track from the film’s soundtrack is playing. Cues, or instances of a song’s use in a film, are built by combining window predictions. In this example, the “Cantina Band” cue lasts for 15 seconds because Dejavu predicted it in two nearby windows.
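The cue-building step described in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `build_cues`, the assumption of at most one predicted track per window, and the gap tolerance `max_gap_windows` (how many unmatched windows may separate two predictions that are still merged into one cue) are all hypothetical choices made for the example.

```python
from typing import List, Optional, Tuple

WINDOW_SECONDS = 5  # segment length stated in the caption


def build_cues(
    predictions: List[Optional[str]],
    max_gap_windows: int = 1,  # assumed tolerance, not taken from the paper
) -> List[Tuple[str, int, int]]:
    """Merge per-window track predictions into (track, start_s, end_s) cues.

    predictions[i] is the track predicted for window i, or None if no
    track was predicted for that window.
    """
    cues: List[list] = []    # each entry: [track, first_window, last_window]
    latest: dict = {}        # track -> index of its most recent cue in `cues`
    for i, track in enumerate(predictions):
        if track is None:
            continue
        idx = latest.get(track)
        # Extend the most recent cue if the gap of unmatched windows is small.
        if idx is not None and i - cues[idx][2] - 1 <= max_gap_windows:
            cues[idx][2] = i
        else:
            latest[track] = len(cues)
            cues.append([track, i, i])
    # Convert window indices to seconds; the end time is exclusive.
    return [(t, s * WINDOW_SECONDS, (e + 1) * WINDOW_SECONDS) for t, s, e in cues]


windows = ["Cantina Band", None, "Cantina Band", None, None, None]
print(build_cues(windows))
```

Under these assumptions, two predictions separated by one unmatched window merge into a single cue spanning three windows, that is, 15 seconds.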
Table 2.
Auditory features used and feature type.
Table 3.
The six pooling functions, where x_i refers to the embedding vector of instance i in a bag set B, and k is a particular element of the output vector h.
In the multi-attention equation, L refers to the attended layer and w is a learned weight. The attention module outputs are concatenated before being passed to the output layer. In the feature-level attention equation, q(⋅) is an attention function on a representation of the input features, u(⋅).
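To make the notation concrete, the sketch below applies two pooling functions that are common in multiple-instance learning, element-wise max and average pooling, to a small bag of embedding vectors. Whether these correspond exactly to rows of Table 3 is an assumption; the function names `max_pool` and `mean_pool` and the toy bag are hypothetical, and the attention-based variants are not shown because they depend on learned weights.

```python
from typing import List

Vector = List[float]


def max_pool(bag: List[Vector]) -> Vector:
    """h[k] = max over instances i in the bag of x_i[k]."""
    return [max(column) for column in zip(*bag)]


def mean_pool(bag: List[Vector]) -> Vector:
    """h[k] = (1 / |B|) * sum over instances i in the bag of x_i[k]."""
    return [sum(column) / len(bag) for column in zip(*bag)]


# Toy bag B with two instance embeddings of dimension 2 (illustrative values).
bag = [[1.0, 4.0], [3.0, 2.0]]
print(max_pool(bag))   # element-wise maximum across instances
print(mean_pool(bag))  # element-wise mean across instances
```

In both cases the bag of instance vectors is reduced to a single vector h of the same dimension, one element k at a time.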
Fig 2.
Neural network model architecture.
Table 4.
Classification results on the 110-film dataset.
Performance metrics using leave-one-out cross-validation for each cue-level feature model are reported. IMV stands for Instance Majority Voting; FL Attn for Feature-Level Attention. Simple MI and IMV results represent performance with the best base classifier (kNN, SVM, and random forest were tried). All models reported mean-averaged precision significantly better than the random guess baseline (p < .01), as given by a paired t-test.
Fig 3.
Feature importance by genre and feature group, reported with 95% CI error bars.
Table 5.
Difference in median brightness and contrast (×10^1) between all films labeled with a given genre and the set of films excluding that genre.
Bold values show a statistically significant difference, as given by a Mann-Whitney U test with Bonferroni correction (α = 0.01, m = 6), between the median of films including a given genre and that of films excluding it, within a given prediction source (Actual, Predicted, or False Positive).
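As a worked illustration of the Bonferroni correction named in this caption, the sketch below computes the per-test significance threshold for α = 0.01 and m = 6 comparisons. The helper names and the example p-values are hypothetical and are not results from the paper.

```python
ALPHA = 0.01  # family-wise significance level from the caption
M = 6         # number of comparisons from the caption


def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test threshold under Bonferroni correction: alpha / m."""
    return alpha / m


def is_significant(p_value: float, alpha: float = ALPHA, m: int = M) -> bool:
    """A single test is significant only if p falls below alpha / m."""
    return p_value < bonferroni_threshold(alpha, m)


print(bonferroni_threshold(ALPHA, M))  # 0.01 / 6, roughly 0.00167
print(is_significant(0.001))           # illustrative p-value below the threshold
print(is_significant(0.005))           # illustrative p-value above the threshold
```

A p-value of 0.005 would pass an uncorrected 0.01 threshold but not the corrected one, which is why the correction matters when six comparisons are made.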