Fig 1.

Neural ensemble correlation statistics for an auditory midbrain penetration site.

(A) Neural recording probe and the corresponding frequency response areas at 8 staggered recording sites show tonotopic organization (red indicates high activity; blue indicates low activity). aMUA for the 16 recording channels for a (B) fire and (E) water sound segment (red indicates strong response; blue indicates weak response). The spectral (C = fire; F = water) and temporal (D = fire; G = water) neural ensemble correlations for the penetration site. Stimulus-driven spectral (H) and temporal (K) correlations of the recording ensemble show distinct differences and unique patterns across the five sounds tested. Spectral (I) and temporal (L) noise correlations recorded during the same sound delivery sessions are substantially less structured (diagonalized spectrally and restricted across time) and show little stimulus dependence. The total spectral and temporal ensemble correlations (stimulus-driven + noise) are shown for the same site and sounds in J and M, respectively. Additional example penetration sites are provided in S2–S4 Figs. Figure data and related code are available from http://dx.doi.org/10.6080/K03X84V3. aMUA, analog multiunit activity; SPL, sound pressure level; Freq., frequency.
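
The caption separates stimulus-driven from noise correlations, with the totals (J, M) being their sum. Below is a minimal sketch of one standard way to make this split for the spectral (channel × channel) case, using a split-half estimator on repeated trials; the function name, the interleaved-trial split, and the aMUA array layout are illustrative assumptions, not the paper's exact estimator (see Materials and Methods).

```python
import numpy as np

def spectral_correlations(resp):
    """Split channel-by-channel (spectral) correlations into
    stimulus-driven and noise components.

    resp : array, shape (n_trials, n_channels, n_time)
        aMUA responses to repeated presentations of one sound.
        (Layout is an assumption for this sketch.)
    """
    # Total correlation: channel x channel correlation of single trials,
    # averaged over trials.
    total = np.mean([np.corrcoef(trial) for trial in resp], axis=0)

    # Stimulus-driven (signal) correlation: correlate the trial averages
    # of two interleaved halves, so trial-to-trial noise cannot contribute.
    def zscore(x):
        return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)

    a = zscore(resp[0::2].mean(axis=0))   # average of even-numbered trials
    b = zscore(resp[1::2].mean(axis=0))   # average of odd-numbered trials
    signal = a @ b.T / a.shape[1]
    signal = 0.5 * (signal + signal.T)    # symmetrize

    # Noise correlation: the residual, so that total = signal + noise.
    noise = total - signal
    return signal, noise
```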


Fig 2.

Average stimulus-driven and noise-driven neural correlations.

Summary results showing the average stimulus-driven (A = spectral; C = temporal) and noise-driven (B = spectral; D = temporal) neural correlations across N = 13 penetration sites (N = 4 and 9, from two animals). To allow for averaging across recording sites with different best frequencies, the spectral and temporal correlation matrices (as for A, C, and Fig 1H and 1K) are collapsed across their principal dimension (channel offset for spectral and time lag for temporal) prior to averaging. The average noise-driven correlations are more compact, being restricted in both time and frequency, and have less structure across sounds than the corresponding stimulus-driven correlations. Figure data and related code are available from http://dx.doi.org/10.6080/K03X84V3.
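
A minimal sketch of the collapse described above for the spectral case: averaging the diagonals of a channel × channel correlation matrix yields a one-dimensional profile of correlation versus channel offset, which can then be averaged across sites with different best frequencies. The same diagonal averaging applies to the temporal matrices, with time lag in place of channel offset.

```python
import numpy as np

def collapse_by_offset(corr):
    """Collapse an n x n spectral correlation matrix into a 1-D profile
    of correlation versus channel offset by averaging its diagonals."""
    n = corr.shape[0]
    offsets = np.arange(-(n - 1), n)
    profile = np.array([np.diagonal(corr, k).mean() for k in offsets])
    return offsets, profile

# Averaging across penetration sites then reduces to stacking the
# per-site profiles and taking the mean across sites, e.g.:
# grand_avg = np.mean([collapse_by_offset(c)[1] for c in site_corrs], axis=0)
```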


Fig 3.

Using neural ensemble correlation statistics to identify sounds.

(A) Single-trial classification results for the penetration site shown in Fig 1. The average single-trial classifier performance (red curve) and the performance for each individual sound (black lines) are shown as a function of the sound duration for four classifiers. Blue curves designate the upper bound on performance based on a noiseless classifier (see Materials and Methods). In all cases, classifier performance improves with sound duration. The combined spectro-temporal classifier has the highest performance, followed by the spectral and temporal classifiers. Removing the tonotopic ordering of recording sites for the temporal classifier at this recording location (far right) substantially reduces its performance. (B) Average performance across N = 13 IC penetration sites (N = 4 and 9, from two animals) for each of the four classifiers shown as a function of sound duration for the single-trial (red) and noiseless (blue) classifiers. Red and blue bands represent SD. The average performance for each individual sound and classifier is provided in S6 Fig. Figure data and related code are available from http://dx.doi.org/10.6080/K03X84V3. IC, inferior colliculus; w/o, without.
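
A minimal template-matching sketch of what a correlation-based single-trial classifier could look like; the matching rule (maximum Pearson similarity to per-sound training templates) and the names below are assumptions, with the paper's actual classifiers defined in Materials and Methods. Under this reading, longer sound durations yield less noisy single-trial correlation estimates, consistent with performance improving with duration.

```python
import numpy as np

def classify_single_trial(trial_corr, templates):
    """Assign one held-out trial's correlation matrix to the training
    sound whose template it matches best (maximum Pearson similarity).

    trial_corr : (n, n) correlation matrix estimated from a single trial
    templates  : dict mapping sound label -> (n, n) training-set template
    """
    v = trial_corr.ravel()
    similarity = {label: np.corrcoef(v, t.ravel())[0, 1]
                  for label, t in templates.items()}
    return max(similarity, key=similarity.get)
```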


Fig 4.

Neural ensemble correlation statistics for sound variants with equalized power spectrum.

The spectral (A and B) and temporal (C and D) correlations of a single IC recording site show similar structure between the original (A and C) and 1/f equalized (B and D) sounds (Pearson correlation coefficient, r = 0.92 for spectral; r = 0.72 for temporal; averaged across five sounds). (E) Across penetration sites (N = 11 total; N = 3 and 8 from two animals), the neural correlations of the original and spectrum-equalized variants have an average Pearson correlation coefficient of r = 0.95 ± 0.02 for spectral and r = 0.80 ± 0.02 for temporal when the comparison is between same sounds (e.g., original fire versus 1/f fire sounds; red). Across-sound comparisons (e.g., original fire versus 1/f water sound; blue) show a reduced correlation (r = 0.85 ± 0.02 for spectral; r = 0.57 ± 0.01 for temporal). (F) Single-trial classification results (averaged across five sounds) for the above penetration site obtained for the original and spectrum-equalized sounds. The model is trained using the responses to the original sounds, while the validation data are from the responses to the original (red) or the spectrum-equalized (blue) sounds (see Materials and Methods). The spectrum-equalized (1/f) condition shows slightly lower performance for spectro-temporal, spectral, and temporal classifiers, while the spectrum (rate code) classifier is near chance. (G, H) Average performance versus sound duration across N = 11 IC penetration sites for each of the four classifiers, shown for the original (G) and spectrum-equalized (H) sound responses. Figure data and related code are available from http://dx.doi.org/10.6080/K03X84V3. IC, inferior colliculus; Ori, original.
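
A minimal sketch of one way to equalize a sound's power spectrum to 1/f while preserving its phase spectrum (and hence much of its temporal structure); the exact equalization used in the paper is described in Materials and Methods, so treat this FFT-based variant as an assumption.

```python
import numpy as np

def equalize_to_one_over_f(x, fs):
    """Impose a 1/f power spectrum on a sound while keeping its phase
    spectrum, which preserves much of its temporal modulation structure."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)
    mag = np.zeros_like(f)
    mag[1:] = 1.0 / np.sqrt(f[1:])       # 1/f power => 1/sqrt(f) magnitude
    y = np.fft.irfft(mag * np.exp(1j * np.angle(X)), n=len(x))
    return y / np.max(np.abs(y))         # peak-normalize the result
```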


Fig 5.

Measuring the average correlation structure of natural sounds.

The procedure is illustrated for a speech and a flowing water sound. The spectro-temporal correlations are obtained by cross-correlating the frequency-organized outputs of a cochlear model representation (A, E). The resulting spectro-temporal correlation matrices (B, F) characterize the correlations between frequency channels at different time lags. The spectro-temporal correlations are then decomposed into purely spectral (C, G) or temporal (D, H) correlations. Speech is substantially more correlated across frequency channels, and its temporal correlation structure is substantially slower than for the water sound. Figure data and related code are available from http://dx.doi.org/10.6080/K03X84V3. Freq., frequency.
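
A minimal sketch of the measurement pipeline described above, assuming the cochlear-model output is available as a channels × time envelope array (a cochleogram); the lag-zero and within-channel readings of the spectral and temporal decompositions are plausible assumptions, not the paper's exact definitions.

```python
import numpy as np

def spectrotemporal_corr(cochleogram, max_lag):
    """Cross-correlate cochlear-model channel envelopes across channels
    and time lags.

    cochleogram : (n_channels, n_time) array of channel envelopes
    Returns (lags, corr), with corr of shape (n_lags, n_channels, n_channels).
    """
    z = cochleogram - cochleogram.mean(axis=1, keepdims=True)
    z = z / z.std(axis=1, keepdims=True)
    n_time = z.shape[1]
    lags = np.arange(-max_lag, max_lag + 1)
    corr = np.empty((lags.size, z.shape[0], z.shape[0]))
    for i, lag in enumerate(lags):
        # Overlapping segments shifted by `lag` samples.
        a = z[:, max(lag, 0): n_time + min(lag, 0)]
        b = z[:, max(-lag, 0): n_time + min(-lag, 0)]
        corr[i] = a @ b.T / a.shape[1]
    return lags, corr

# One plausible reading of the decomposition in panels C-D and G-H:
# lags, corr = spectrotemporal_corr(cgram, max_lag=50)
# spectral = corr[lags == 0][0]               # channel x channel at zero lag
# idx = np.arange(cgram.shape[0])
# temporal = corr[:, idx, idx]                # within-channel lag profiles
```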


Fig 6.

Sound-correlation statistics for the 13 sound categories and white noise.

The category average (A) spectral correlation matrix and (B) temporal correlations show unique differences among the 13 sounds examined. (C) The CDI quantifies the variability of the correlation statistics for each category. A CDI of 1 indicates that the sound category is diverse (the correlation statistics are highly variable between sounds), while 0 indicates that the category is homogeneous (all sounds have identical correlation statistics). A detailed list of sounds and sources used is provided in S1 Table. Figure data and related code are available from http://dx.doi.org/10.6080/K03X84V3. CDI, category diversity index; Freq., frequency.
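
The exact CDI formula is given in Materials and Methods; the sketch below is one plausible proxy, assumed here only to make the 0-to-1 interpretation concrete: diversity as one minus the mean pairwise similarity of the per-sound correlation statistics within a category.

```python
import numpy as np

def category_diversity_index(corr_mats):
    """A plausible CDI proxy (assumption, not the paper's formula):
    one minus the mean pairwise Pearson similarity among the vectorized
    correlation matrices of the sounds in one category.
    Near 0 = homogeneous category, near 1 = diverse category."""
    vecs = np.array([c.ravel() for c in corr_mats])   # one row per sound
    sim = np.corrcoef(vecs)                           # sound x sound similarity
    pairs = np.triu_indices_from(sim, k=1)            # unique sound pairs
    return float(np.clip(1.0 - sim[pairs].mean(), 0.0, 1.0))
```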


Fig 7.

Short-term correlation statistics and stationarity.

The short-term correlation statistics are estimated by computing the spectro-temporal correlation matrix using a moving sliding window. The procedure is shown for an excerpt of (A–D) speech and (E–H) water (additional examples in S1–S4 Movies). The sliding window (400 ms for these examples) is varied continuously over all time points but is shown for three selected time points. The short-term statistics are also shown for the spectral and temporal correlation decompositions. Note that for speech, the correlations change dynamically from moment to moment and differ from the time-averaged correlations (gray panel), indicating nonstationary structure. By comparison, the time-varying correlations for water resemble the time-averaged correlations (gray panel), indicating more stationarity. (I) SIs for the 13 categories and white noise. Speech has the lowest stationarity values, while white noise is the most stationary sound. Principal components derived for the short-term spectral and temporal correlations of all natural sounds in the database are shown in S8 Fig. Figure data and related code are available from http://dx.doi.org/10.6080/K03X84V3. Freq., frequency; SI, stationarity index.
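
A minimal sketch of the sliding-window estimate together with a plausible stationarity-index proxy; the 400 ms window matches the caption's example, but the SI formula below (mean similarity between short-term and time-averaged correlations) is an assumption, with the exact definition in Materials and Methods.

```python
import numpy as np

def short_term_correlations(cochleogram, win, step):
    """Channel x channel correlation matrices in a sliding window.

    cochleogram : (n_channels, n_time) array; win, step in samples
    (e.g., win = 400 ms worth of samples for the caption's example).
    """
    n_time = cochleogram.shape[1]
    starts = range(0, n_time - win + 1, step)
    return np.array([np.corrcoef(cochleogram[:, s:s + win]) for s in starts])

def stationarity_index(short_corrs):
    """A plausible SI proxy (assumption): the mean Pearson similarity
    between each short-term correlation matrix and the time-averaged
    matrix. 1 = fully stationary; lower values = more nonstationary."""
    avg = short_corrs.mean(axis=0).ravel()
    sims = [np.corrcoef(c.ravel(), avg)[0, 1] for c in short_corrs]
    return float(np.mean(sims))
```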


Fig 8.

Using neural ensemble correlation statistics to categorize sounds in a three-category identification task.

The sound categories delivered included fire, water, and speech, with six exemplars per category. As shown for a representative IC penetration site, spectral (A) and temporal (C) neural correlations (100 observations × 6 exemplars × 3 categories; see Materials and Methods) show distinct structures across sound categories but are similar within a category. Projections of the neural correlations onto the first two principal components show that spectral (B) and temporal (D) correlations form distinct clusters for each of the three sound categories. (E) Single-trial classification results for this penetration site. In all cases, classifier performance improves with sound duration, approaching 100% for the full sound duration (1 s). (F) Average performance across IC penetration sites (N = 11; N = 3 and 8 from two animals). The spectro-temporal classifier has the highest performance, with an average performance of 96% correct classification for 1 s duration. Figure data and related code are available from http://dx.doi.org/10.6080/K03X84V3. IC, inferior colliculus; PC, principal component.
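
A minimal sketch of the principal-component projection used in panels B and D, assuming each single-trial spectral or temporal correlation estimate is vectorized into one row of a feature matrix.

```python
import numpy as np

def project_onto_pcs(features, n_pc=2):
    """Project vectorized correlation estimates onto their first
    principal components via SVD.

    features : (n_observations, n_features) array, one row per
        single-trial spectral or temporal correlation estimate.
    """
    X = features - features.mean(axis=0)         # center each feature
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_pc].T                       # (n_observations, n_pc) scores
```

Coloring the resulting scores by sound category (100 observations × 6 exemplars × 3 categories, per the caption) would qualitatively reproduce the clusters in B and D.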


Fig 9.

Using short-term correlation statistics to categorize sounds in a 13-category identification task.

A cross-validated Bayesian classifier is applied to the sound short-term correlation statistics (spectral, temporal, and spectro-temporal) to identify the category of each of the test sounds (see Materials and Methods). (A) Both the spectral and temporal correlation classifiers have an optimal temporal resolution of 144 ms (i.e., short-term analysis window size). The optimal resolution of the spectro-temporal correlation classifier, by comparison, is slightly finer (100 ms). For reference, the performance of a spectrum-based classifier is largely independent of the resolution used. (B) For all three correlation classifiers and the spectrum classifier control, the performance improves with the sound duration. The spectro-temporal correlation classifier performance improves with sound duration at the fastest rate, while the temporal correlation classifier has the slowest rate of improvement. The correlation-based classifiers outperform the spectrum-based classifier in all instances. (C) The short-term spectro-temporal classifier outperforms the time-averaged classifier, indicating that nonstationary structure improves performance. (D) Confusion matrices for the three correlation-based classifiers for 10 s sound durations. (E) Performance for the three classifiers shown as a function of sound category (measured at the optimal resolution and at 10 s sound duration). Figure data and related code are available from http://dx.doi.org/10.6080/K03X84V3.
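
A minimal sketch of a Bayesian classifier over correlation features; the diagonal-covariance Gaussian likelihood and flat prior below are assumptions (the paper's classifier and its cross-validation scheme are specified in Materials and Methods).

```python
import numpy as np

def fit_gaussian_bayes(X, y):
    """Fit a per-category Gaussian likelihood with diagonal covariance.

    X : (n_samples, n_features) short-term correlation features
    y : (n_samples,) category labels
    """
    return {c: (X[y == c].mean(axis=0), X[y == c].var(axis=0) + 1e-9)
            for c in np.unique(y)}

def predict(model, x):
    """Maximum a posteriori category under a flat prior."""
    def log_lik(mu, var):
        return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
    return max(model, key=lambda c: log_lik(*model[c]))
```

Cross-validation in this setting would train on a subset of sounds per category and score `predict` on the held-out test sounds.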


Fig 10.

Model classification performance in a two-category identification task.

The classification task requires that the model distinguish vocalization from background sound categories. For all three classifiers, the overall performance is consistently high and improves with increasing sound duration. Vocalization classification accuracy is highest for the spectro-temporal classifier (C); the spectral classifier (A) shows a nearly identical trend. The performance of the temporal classifier, however, is approximately 20% lower. For background sounds, classification accuracy does not improve over time and is consistently high (approximately 90%) for all three classifiers. Figure data and related code are available from http://dx.doi.org/10.6080/K03X84V3.
