A computational lens into how music characterizes genre in film
Table 3
The six pooling functions, where $x_i$ is the embedding vector of instance $i$ in a bag $B$ and $k$ indexes an element of the output vector $h$.
In the multi-attention equation, L refers to the attended layer and w is a learned weight. The outputs of the attention modules are concatenated before being passed to the output layer. In the feature-level attention equation, q(⋅) is an attention function over u(⋅), a representation of the input features.
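The equations of Table 3 are not reproduced in this excerpt, so the following is only a rough NumPy sketch of how bag-level pooling of this kind can work. It follows the caption's notation: a bag B is an array whose row i is x_i, and each function returns the pooled vector h, with one entry per index k. This excerpt does not say which six functions the table lists or how they are parameterized. The max, average, linear softmax, exponential softmax and attention forms below are the ones commonly used in weakly labelled audio classification, and they are assumptions here. The parameters W, b, Wq and Wu, and the ReLU used for u(⋅), are hypothetical stand-ins for learned weights.

```python
import numpy as np

# Assumption: a bag B is an array of shape (n_instances, dim); row i is x_i.
# Every function returns h with shape (dim,), pooling over instances (axis 0).

def max_pool(B):
    # h_k = max_i x_ik
    return B.max(axis=0)

def average_pool(B):
    # h_k = (1/|B|) * sum_i x_ik
    return B.mean(axis=0)

def linear_softmax_pool(B, eps=1e-8):
    # h_k = sum_i x_ik^2 / sum_i x_ik  (assumes non-negative x_ik)
    return (B ** 2).sum(axis=0) / (B.sum(axis=0) + eps)

def exp_softmax_pool(B):
    # h_k = sum_i x_ik * exp(x_ik) / sum_i exp(x_ik)
    # Subtracting the per-column max keeps exp() stable and leaves the ratio unchanged.
    e = np.exp(B - B.max(axis=0))
    return (B * e).sum(axis=0) / e.sum(axis=0)

def attention_pool(B, W, b):
    # Hypothetical learned parameters: W of shape (dim, dim), b of shape (dim,).
    # Positive weights w_ik = exp((x_i W + b)_k); h_k = sum_i w_ik x_ik / sum_i w_ik.
    logits = B @ W + b
    w = np.exp(logits - logits.max(axis=0))
    return (w * B).sum(axis=0) / w.sum(axis=0)

def multi_attention_pool(layer_outputs, params):
    # One attention module per attended layer; the module outputs are concatenated.
    # layer_outputs: list of bags (one per layer); params: list of (W, b) pairs.
    return np.concatenate(
        [attention_pool(B, W, b) for B, (W, b) in zip(layer_outputs, params)]
    )

def feature_level_attention_pool(B, Wq, Wu):
    # h = sum_i q(x_i) * u(x_i), with q normalised over instances by a softmax.
    # Hypothetical choices: u(x) = ReLU(x Wu) and pre-softmax scores x Wq.
    u = np.maximum(B @ Wu, 0.0)
    s = B @ Wq
    q = np.exp(s - s.max(axis=0))
    q = q / q.sum(axis=0)
    return (q * u).sum(axis=0)

# Usage: a bag of 5 instances with 4-dimensional embeddings in [0, 1).
rng = np.random.default_rng(0)
bag = rng.random((5, 4))
pooled = {
    "max": max_pool(bag),
    "average": average_pool(bag),
    "linear_softmax": linear_softmax_pool(bag),
    "exp_softmax": exp_softmax_pool(bag),
}
```

In this sketch, setting the attention parameters to zero gives every instance the same weight, and attention pooling then reduces to average pooling. This is one way to see how the learned weights w move the result between the average and the strongest instances.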