Sparse high-dimensional decomposition of non-primary auditory cortical receptive fields

doi:10.1371/journal.pcbi.1012721

Fig 1.

Cortical receptive fields of PEG neurons are estimated with respect to the primary-cortical representation of acoustic stimuli by fitting generalized linear models (GLMs) to spiking responses.

Each scale-rate channel of the primary-cortical representation, obtained by convolving the stimulus auditory spectrogram with the associated basis function, is convolved with the corresponding channel of the cortical receptive field. The outputs of the scale-rate channels combine linearly to modulate the conditional intensity function (CIF), which is logistically linked to stimulus and spiking history modulations. The equivalent spectrotemporal receptive field (STRF) is thus computed as the linear combination of the spectrotemporal filters of each scale-rate channel.
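
The equivalence between the channel-wise model and a single STRF follows from the linearity and associativity of convolution. The following is a minimal numerical sketch of that identity, not the authors' implementation: the array sizes, the random stand-ins for the basis functions and receptive-field channels, and all variable names are assumptions chosen for illustration.

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)

# Hypothetical sizes: 16 frequency bins x 40 time bins, 3 scale-rate channels.
spectrogram = rng.standard_normal((16, 40))
basis_functions = [rng.standard_normal((5, 7)) for _ in range(3)]  # stand-in basis
cort_rf_channels = [rng.standard_normal((3, 4)) for _ in range(3)]  # stand-in CortRF

# Channel-wise route: spectrogram -> basis function -> CortRF channel, then sum.
channel_sum = sum(
    convolve2d(convolve2d(spectrogram, b), w)
    for b, w in zip(basis_functions, cort_rf_channels)
)

# Equivalent-STRF route: combine the per-channel filters first, then filter once.
equivalent_strf = sum(
    convolve2d(b, w) for b, w in zip(basis_functions, cort_rf_channels)
)
direct = convolve2d(spectrogram, equivalent_strf)

# Both routes give the same output up to floating-point error.
print(np.allclose(channel_sum, direct))
```

Full (untruncated) convolution is used here so that the two routes agree exactly; with cropped or causal convolutions the identity holds only away from the edges.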

Fig 2.

CortRF analysis of a simulated neuron.

A. The ground-truth receptive field of the simulated PEG neuron consisted of two components (circled in blue). B. The estimated cortical receptive field recovered the ground-truth receptive field almost exactly. The dominant features are circled in blue. C. Kolmogorov-Smirnov (KS) and autocorrelation function (ACF) tests of model goodness-of-fit with 95% confidence intervals show that the history-dependent GLM accurately accounted for the simulated spiking statistics. D–E. Single-realization goodness-of-fit tests showed that the spiking responses of individual simulated realizations to both speech and TORC stimuli were well-modeled by the GLM. F–G. Estimated CIF vs. observed spiking. The estimated CIFs for unseen realizations of the simulated spiking process closely matched the spiking responses, with distinctive correlogram peaks close to 0-lag. Receptive fields have been normalized for visualization.
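
One common way to carry out a KS goodness-of-fit test for a point-process GLM is the time-rescaling approach. The caption does not state the exact procedure used, so the sketch below is an assumption rather than the authors' method. It simulates Bernoulli spiking from a known, hypothetical per-bin spike probability, rescales the inter-spike intervals by the integrated intensity, and compares the result with a uniform distribution. The bound 1.36/sqrt(n) is the usual large-sample approximation to the 95% KS limit.

```python
import numpy as np
from scipy.stats import kstest


def rescaled_uniforms(spikes, p_spike):
    """Map inter-spike intervals to values that are ~Uniform(0, 1) if the model is correct."""
    q = -np.log1p(-p_spike)              # per-bin integrated intensity
    cumulative = np.cumsum(q)
    spike_bins = np.flatnonzero(spikes)
    z = np.diff(cumulative[spike_bins])  # rescaled intervals, ~Exp(1) under the model
    return 1.0 - np.exp(-z)


rng = np.random.default_rng(1)
t = np.arange(20000)
# Hypothetical time-varying spike probability per bin (illustrative values only).
p_spike = 0.02 + 0.015 * np.sin(2 * np.pi * t / 500.0)
spikes = rng.random(t.size) < p_spike

u = rescaled_uniforms(spikes, p_spike)
ks_statistic, p_value = kstest(u, "uniform")
ks_bound_95 = 1.36 / np.sqrt(u.size)
print(ks_statistic, ks_bound_95)
```

Here the true spike probability is reused as the "model", so the KS statistic is expected to be small; with a fitted GLM, the estimated CIF would take the place of `p_spike`.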

Fig 3.

CortRF analysis of an example PEG neuron.

A. The estimated CortRFs of PEG neurons were sparse, but produced complex CortSTRFs. Here, three atoms from the dictionary of primary-cortical features had non-zero weight, yet corresponded to a CortSTRF with wide temporal tuning. B. KS and ACF tests show that spiking statistics were well-matched by the estimated model over all speech and TORC stimulus repetitions withheld during model estimation. C–D. Single-trial spiking statistics were also well-matched. E–F. Comparing the predicted conditional intensity function (CIF) to withheld observed spiking responses showed that the estimated model was highly predictive of spiking responses. G. The 2-dimensional convolution of a sample speech spectrogram with the estimated CortSTRF shows how the sentence would be represented by a family of PEG neurons with similar receptive fields translated in frequency. Spectrograms and receptive fields are normalized for visualization.

Fig 4.

STRF analysis of a PEG neuron.

A. The estimated STRFs of PEG neurons generally characterized the neurons’ selectivity differently from the CortRF, as seen here. The 2-dimensional convolution of a speech spectrogram with the estimated STRF suggested a shorter latency in the neural representation than when convolved with the CortSTRF. B–F. Statistical tests for goodness-of-fit, both for individual repetitions and over all speech and TORC stimulus repetitions withheld during model estimation, indicated that the estimated models were well-matched to the observed spiking statistics. Additionally, the withheld spiking responses were closely matched by the predicted CIFs. Spectrograms and STRFs are normalized for visualization.

Fig 5.

CortRF models of PEG neurons were more predictive of speech responses than STRF models.

Predictive performance was quantified by the cosine similarity between the predicted CIFs and observed spiking responses, and CortRF models were compared to STRF models over the same set of speech and TORC responses withheld from model estimation. Histograms of the differences in TORC (blue) and speech (red) response predictions are shown in the left panel. The empirical cumulative distribution functions of these differences are shown in the right panel. While no difference between the two models was found in predicting TORC responses (p = 0.353, Wilcoxon signed-rank test), a significant advantage in using the CortRF model to predict speech responses was observed (p = 0.018, Wilcoxon signed-rank test).
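
The comparison described in this caption can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' code: the function name `cosine_similarity`, the per-neuron score arrays, and their values are hypothetical, and the scores are synthetic placeholders rather than data from the study.

```python
import numpy as np
from scipy.stats import wilcoxon


def cosine_similarity(predicted_cif, spike_counts):
    """Cosine of the angle between a predicted CIF and a binned spike train."""
    a = np.asarray(predicted_cif, dtype=float)
    b = np.asarray(spike_counts, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Synthetic placeholder scores, one pair per neuron (NOT data from the study).
rng = np.random.default_rng(2)
strf_scores = rng.uniform(0.2, 0.6, size=30)
cortrf_scores = strf_scores + rng.normal(0.0, 0.05, size=30)

# Paired, non-parametric test on the per-neuron differences.
statistic, p_value = wilcoxon(cortrf_scores, strf_scores)
print(statistic, p_value)
```

A paired test is appropriate here because both models are scored on the same neurons and the same withheld responses.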

Fig 6.

CortRF models of A1 neurons provided no advantage over STRFs in predicting spiking responses to speech or TORCs.

A–B. CortRF analysis and STRF analysis characterized the feature selectivity of A1 neurons similarly more often than they did for PEG neurons, in the sense that CortSTRFs and STRFs had similar frequency tuning, latency, and receptive field shape. C. The representations of a speech spectrogram, obtained via 2-dimensional convolution with either the CortSTRF (left) or STRF (right) of the same A1 neuron, reflected this similarity. The latency and frequencies represented in both convolved spectrograms were much more similar than in PEG neurons. D. In further contrast to PEG neurons, there was no significant difference between the predictive performance of CortRF models and STRF models of A1 neurons for either TORC (p = 0.750, Wilcoxon signed-rank test) or speech stimuli (p = 0.802, Wilcoxon signed-rank test). Spectrograms and receptive fields are normalized for visualization.

Fig 7.

PEG neurons encoded more complex features than A1 neurons.

A. The complexity of CortSTRFs and STRFs was quantified using two approaches to determine whether PEG neurons (top row) were selective for more complex acoustic features than A1 neurons (bottom row). The magnitudes of STRFs were computed (second column) and approximated by a probability distribution function for a Gaussian mixture model (GMM) fit with a boosting algorithm with large- and small-covariance Gaussian weak learners (third and fourth columns, respectively) and by k components of its singular value decomposition (fifth column). Here, k was the smallest number of singular values that accounted for at least 75% of the spectral power, ensuring that the mean-squared errors of all rank-k approximations were small. Receptive fields are normalized for visualization. B. The concentration of energy in STRFs was measured by the determinant of the covariance of the GMM likelihood; smaller values indicate more concentrated energy. GMMs were fit using a boosting algorithm with large-covariance weak learners. The energy in the CortSTRFs and STRFs of PEG neurons was more dispersed than in those of A1 neurons (CortSTRF: p < 0.001, STRF: p = 0.002; Wilcoxon rank sum test). C. The analysis of energy concentration in STRFs was repeated with small-covariance weak learners, demonstrating robustness to the choice of base learner and further indicating that the energy in the CortSTRFs and STRFs of PEG neurons was more dispersed than in those of A1 neurons (CortSTRF: p < 0.001, STRF: p = 0.002; Wilcoxon rank sum test). D. The receptive field shape complexity, quantified by the number of eigenmodes, was higher for both the CortSTRFs (p = 0.002, Wilcoxon rank sum test) and STRFs (p = 0.023, Wilcoxon rank sum test) of PEG neurons than of A1 neurons.
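
The eigenmode count in panels A and D can be illustrated with a short sketch. It assumes that "spectral power" means the sum of squared singular values, which the caption does not state explicitly; the function name `n_eigenmodes` and the example matrices are hypothetical.

```python
import numpy as np


def n_eigenmodes(strf, power_fraction=0.75):
    """Smallest k whose leading singular values carry at least power_fraction of the power."""
    s = np.linalg.svd(np.asarray(strf, dtype=float), compute_uv=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    # Index of the first cumulative value that reaches the threshold, plus one.
    return int(np.searchsorted(cumulative, power_fraction) + 1)


# A separable (rank-1) receptive field needs a single eigenmode.
separable = np.outer([1.0, 2.0, 1.0], [0.5, 1.0, 0.25, 0.1])
print(n_eigenmodes(separable))

# Squared singular values 9, 4, 1: the first mode carries 9/14 of the power,
# the first two carry 13/14, so two modes are needed to reach 75%.
print(n_eigenmodes(np.diag([3.0, 2.0, 1.0])))
```

Under this measure, a receptive field that factors into one spectral and one temporal profile scores 1, and less separable shapes score higher.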

Fig 8.

Unsupervised clustering of CortRFs segregated PEG and A1 receptive fields by cortical area.

A. Spectral clustering, applied jointly to the CortRFs estimated for PEG and A1 neurons, yielded 6 clusters. The proportion of PEG neurons in each cluster ±2 SEM was computed (top), and a cluster was designated as representative of distinct PEG features if at least half of its members were from PEG (green). A1 clusters were designated similarly (teal). The three clusters with at least ten members (bottom) were inspected further. One was an A1 cluster (Cluster 1), and the other two were PEG clusters (Clusters 4 and 5). Each of these three clusters was confirmed to deviate significantly from the chance-level proportion of PEG neurons (t-test, p < 0.05). Moreover, a Fisher exact test confirmed a significant interaction between neurons’ cluster labels (PEG or A1) and cortical area (p < 0.05). B–D. The average CortRF convolved with the primary-cortical basis functions is displayed for each cluster. The convolved CortRF was marginalized over scales (below), over rates (left), and over both to compute the average CortSTRF. Cluster 1, the A1 cluster, had more energy in fast than in slow rate channels (B). In contrast, Cluster 4 (C) had more energy in slow rate channels; Cluster 5 (D) was intermediate between these two, but still had more energy in slow rate channels. Clusters 4 and 5, the PEG clusters, predominantly had energy in wide-bandwidth scale channels, while Cluster 1 had more narrow-bandwidth components. E. The representations of a speech spectrogram by each of these clusters were obtained by computing the 2-dimensional convolution with the cluster-average CortSTRFs. The convolved spectrogram of Cluster 1 (top right) had shorter latency and narrower bandwidth than that of either Cluster 4 or 5. Corroborating observations about the rate-scale composition of Cluster 5, its convolved speech spectrogram had narrower bandwidth than that of Cluster 4 but longer latency than that of Cluster 1. Spectrograms and receptive fields are normalized for visualization.
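
The cluster designation rule in panel A can be sketched in a few lines. The sketch assumes the SEM is the standard error of a binomial proportion, which the caption does not specify, and the function names and the example cluster membership are hypothetical rather than taken from the study.

```python
from math import sqrt


def peg_proportion(areas):
    """Proportion of PEG neurons in a cluster and its binomial standard error."""
    n = len(areas)
    p = sum(area == "PEG" for area in areas) / n
    sem = sqrt(p * (1.0 - p) / n)  # assumed SEM definition
    return p, sem


def designate(p):
    """Label a cluster by the cortical area contributing at least half of its members."""
    return "PEG" if p >= 0.5 else "A1"


# Hypothetical cluster with 8 PEG neurons and 2 A1 neurons.
cluster = ["PEG"] * 8 + ["A1"] * 2
p, sem = peg_proportion(cluster)
print(designate(p), p - 2 * sem, p + 2 * sem)
```

A cluster split exactly in half would satisfy the "at least half" rule for both areas; this sketch resolves that tie in favor of PEG, which is an arbitrary choice.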

Table 1.

Cluster variability.
