Invariant recognition drives neural representations of action sequences

doi:10.1371/journal.pcbi.1005859


Fig 6

Feature representation empirical dissimilarity matrices.

We extracted feature representations with the four Spatiotemporal Convolutional Neural Network models from 50 videos depicting five actors performing five actions, each filmed from two viewpoints (frontal and side). We also obtained magnetoencephalography (MEG) recordings of human subjects' brain activity while they watched the same videos, and used these recordings as a proxy for the neural representation of the videos. None of these videos were used to construct or train any of the models. For each of the six video representations (four artificial models, a categorical oracle, and the MEG recordings), we constructed an empirical dissimilarity matrix using linear correlation and normalized it to the range [0, 1]. Panels show empirical dissimilarity matrices on the same set of stimuli, constructed from the video representations of a) Model 1: Purely Convolutional model, b) Model 2: Unstructured pooling model, c) Model 3: Structured pooling model, d) Model 4: Learned templates model, e) Categorical oracle, and f) Magnetoencephalography brain recordings.
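To make the matrix construction concrete, here is a minimal sketch in plain Python. It rests on a few assumptions that the caption does not spell out. Dissimilarity is taken as 1 minus the Pearson (linear) correlation between two videos' feature vectors. Normalization to [0, 1] is taken as min-max scaling over the whole matrix. The categorical oracle is taken to assign 0 to video pairs that share an action label and 1 otherwise. The function names (`pearson`, `dissimilarity_matrix`, `categorical_oracle`) and the toy feature vectors are hypothetical illustrations, not the authors' code or data.

```python
import math
from itertools import product


def pearson(x, y):
    """Linear (Pearson) correlation between two equal-length feature vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def dissimilarity_matrix(features):
    """Pairwise dissimilarity (assumed: 1 - Pearson r), min-max scaled to [0, 1]."""
    n = len(features)
    # The diagonal is set to exactly 0 (a video is identical to itself).
    d = [[0.0 if i == j else 1.0 - pearson(features[i], features[j])
          for j in range(n)] for i in range(n)]
    lo = min(min(row) for row in d)
    hi = max(max(row) for row in d)
    if hi == lo:  # degenerate case: every entry is equal
        return [[0.0] * n for _ in range(n)]
    return [[(v - lo) / (hi - lo) for v in row] for row in d]


def categorical_oracle(actions):
    """Hypothetical oracle: 0 for pairs sharing an action label, 1 otherwise."""
    return [[0.0 if a == b else 1.0 for b in actions] for a in actions]


# Stimulus set from the caption: 5 actors x 5 actions x 2 viewpoints = 50 videos.
stimuli = list(product(range(5), range(5), ("frontal", "side")))
oracle = categorical_oracle([action for _, action, _ in stimuli])

# Toy feature vectors standing in for one model's representations of three videos.
feats = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.1], [3.0, 1.0, 0.0]]
rdm = dissimilarity_matrix(feats)
```

Here the first two toy vectors are almost perfectly correlated, so their normalized dissimilarity is close to 0. The third vector is anticorrelated with the first and receives the maximum value of 1. Applying min-max scaling to every matrix puts the model, oracle and MEG matrices on a common [0, 1] scale, which is what allows the panels to be compared visually.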

doi: https://doi.org/10.1371/journal.pcbi.1005859.g006