Deep neural networks explain spiking activity in auditory cortex

doi:10.1371/journal.pcbi.1013334

Fig 1.

Schematic of ANN-based encoding models.

Spiking activity (upper right) in response to stimuli (e.g., speech; waveform at left) is recorded from squirrel monkey auditory cortex (black dots on brain in upper left). The exact same stimuli are presented to the ANN (bottom). The spike sequences are then regressed separately (schematic linear plots, middle) onto each hidden layer’s responses to the same stimuli, yielding a temporal receptive field (TRF) for each layer of the ANN. Performance is evaluated on held-out stimulus-response pairs by computing the correlation between the TRF-based predictions and the spike-count sequences.
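
The regression step in this schematic can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`lagged_design`, `fit_trf`, `evaluate_trf`), the ridge penalty, and the number of lags are all assumptions introduced here. It fits a linear readout from time-lagged layer activations to a spike-count sequence and scores it by correlation on held-out data.

```python
import numpy as np

def lagged_design(features, n_lags):
    """Stack time-lagged copies of features (T x D) into a T x (D * n_lags) matrix.

    Rows with no available history are zero-padded.
    """
    T, D = features.shape
    X = np.zeros((T, D * n_lags))
    for lag in range(n_lags):
        X[lag:, lag * D:(lag + 1) * D] = features[:T - lag]
    return X

def fit_trf(features, spikes, n_lags, ridge=1.0):
    """Fit TRF weights by ridge regression (the ridge penalty is an assumption)."""
    X = lagged_design(features, n_lags)
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ spikes)

def evaluate_trf(weights, features, spikes, n_lags):
    """Pearson correlation between TRF predictions and held-out spike counts."""
    pred = lagged_design(features, n_lags) @ weights
    return np.corrcoef(pred, spikes)[0, 1]
```

In this sketch, one such readout would be fit per ANN layer and per recording site, so that layers can be compared by their held-out correlations.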

Fig 2.

Example cortical responses and ANN predictions.

A: (Left) Sequences of spike counts (50-ms bins) in response to an English sentence (“A tiny handful never did make the concert”; spectrogram above). Below are cortical responses to ten repetitions (gray) and their mean (black), and the prediction from the Whisper [base] ANN (green). (Right) The same as the left panel, but for a different sentence (“A bullet, she answered”), a different ANN (wav2vec2, yellow), and a different monkey. The locations of the recordings are indicated by color in B. B: Locations of recording sites across monkeys C (top row), B (bottom left), and F (bottom right). All recordings are from the right hemisphere except those in the top left panel. The large circle indicates the location of the recording cylinder, within which the upper half plane corresponds roughly to primary auditory cortex (core) and the lower half to non-primary auditory cortex (belt and parabelt). The approximate location of belt in each cylinder is between the two parallel lines.

Fig 3.

Model-neuron correlations.

All subpanels show correlations between model predictions and the multi-unit activity they are supposed to predict. A: Model-neuron correlations for speech (TIMIT) stimuli. Bar plot: median (across multi-units) correlation of the STRF (gray) and of the best layers of each of the neural networks, both trained (dark colors) and untrained (light colors). Line plots: the distributions of correlations as a function of ANN layer for each of six trained networks (colored) and their untrained counterparts (gray). The median (solid line) and interquartile range (shaded region) are shown. At starred layers, the trained ANN is significantly superior to its untrained counterpart (top row of stars) or a STRF (bottom row of stars; Wilcoxon signed-rank test with p < 0.01). To control the false discovery rate, we apply the Benjamini–Hochberg correction separately for each model. The layer type is indicated along the top of each plot: convolutional (brown), self-attention (blue), and recurrent (light blue). B: The same as A but using monkey vocalizations for stimuli.
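
The Benjamini–Hochberg step-up procedure mentioned above can be sketched as follows. This is a generic illustration under stated assumptions: the function name `benjamini_hochberg` is hypothetical, and treating 0.01 as the false discovery rate level is an assumption, not something the caption specifies. In the setting of the caption, the input would be the per-layer p-values of a single model.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.01):
    """Return a boolean mask of hypotheses rejected at FDR level alpha."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # The i-th smallest p-value is compared against alpha * i / m.
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        # Step-up rule: reject everything up to the largest rank that passes.
        k = np.max(np.nonzero(below)[0])
        reject[order[:k + 1]] = True
    return reject
```

Applying the function once per model, rather than once to the pooled p-values of all models, corresponds to the "separately for each model" wording of the caption.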

Fig 4.

Distributions of model-neuron correlations as a function of maximum frequency of the model predictions.

Median (solid line) and interquartile range (shaded) are indicated. For each ANN, results are shown only for the layer that was most predictive at 50-ms bins. First, the layer’s responses were low-pass filtered at the frequency indicated on the horizontal axis. Then a linear readout (TRF) was fit to predict spiking activity binned at 20 ms. The vertical axis shows the resulting distribution of correlations on the test set. The red star indicates the cut-off frequency yielding the largest median (across multi-units) correlation; black dots indicate frequencies yielding correlation distributions indistinguishable from that at the red star (Wilcoxon signed-rank test with p < 0.01).
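
The low-pass filtering step can be sketched as follows. The caption does not state which filter was used, so this example substitutes an ideal (brick-wall) filter implemented with the FFT. The function name `lowpass_fft` and its parameters are assumptions made for illustration only.

```python
import numpy as np

def lowpass_fft(activations, cutoff_hz, sample_rate_hz):
    """Zero all Fourier components above cutoff_hz along the time axis (axis 0)."""
    x = np.asarray(activations, dtype=float)
    freqs = np.fft.rfftfreq(x.shape[0], d=1.0 / sample_rate_hz)
    spectrum = np.fft.rfft(x, axis=0)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=x.shape[0], axis=0)
```

Sweeping `cutoff_hz` over a grid and refitting the linear readout at each value would produce one correlation distribution per cut-off frequency, as plotted along the horizontal axis of the figure.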

Fig 5.

Distributions of most predictive layers (normalized) for primary (blue) and non-primary (orange) multi-unit activity.

Histograms and corresponding kernel density estimates are shown as a function of network depth (from shallowest to deepest), pooled across all neurons and all six ANNs. The distribution of preferred layers is significantly “deeper” for non-primary than primary neurons (Wilcoxon rank-sum test, p < 0.001).
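
The comparison of preferred-layer depths can be sketched with a rank-sum test. This is a minimal illustration using the normal approximation with midranks and a tie correction. Whether the original analysis handled ties this way is not stated, and the function name `rank_sum_test` is hypothetical.

```python
import math
import numpy as np

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (z, p). A negative z means values in x tend to be smaller than in y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = x.size, y.size
    n = n1 + n2
    pooled = np.concatenate([x, y])
    # Midranks: tied values share the mean of the ranks they occupy.
    _, inverse, counts = np.unique(pooled, return_inverse=True, return_counts=True)
    midranks = np.cumsum(counts) - (counts - 1) / 2.0
    ranks = midranks[inverse]
    w = ranks[:n1].sum()
    mean_w = n1 * (n + 1) / 2.0
    tie_term = np.sum(counts ** 3 - counts) / (n * (n - 1))
    var_w = n1 * n2 / 12.0 * ((n + 1) - tie_term)
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p
```

With `x` holding the normalized preferred depths of primary sites and `y` those of non-primary sites, a significantly negative `z` would correspond to the "deeper" non-primary distribution described in the caption.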
