Auditory Frequency and Intensity Discrimination Explained Using a Cortical Population Rate Code

The nature of the neural codes for pitch and loudness, two basic auditory attributes, has been a key question in neuroscience for over a century. A currently widespread view is that sound intensity (subjectively, loudness) is encoded in spike rates, whereas sound frequency (subjectively, pitch) is encoded in precise spike timing. Here, using information-theoretic analyses, we show that the spike rates of a population of virtual neural units with frequency-tuning and spike-count correlation characteristics similar to those measured in the primary auditory cortex of primates contain sufficient statistical information to account for the smallest frequency-discrimination thresholds measured in human listeners. The same population, and the same spike-rate code, can also account for the intensity-discrimination thresholds of humans. These results demonstrate the viability of a unified rate-based cortical population code for both sound frequency (pitch) and sound intensity (loudness), and thus suggest a resolution to a long-standing puzzle in auditory neuroscience.


Introduction
The nature of the neural code for perception is a fundamental question in neuroscience [1][2][3][4][5]. In auditory neuroscience, the search for the neural code for pitch, an essential perceptual attribute of sound classes such as music and speech, has attracted considerable interest [6][7][8][9]. Two main types of neural codes for pitch have been proposed: "timing" codes, which rely on fine spike-timing information [10], and "rate" codes, which involve spike rates computed over relatively long time windows, typically a few hundred milliseconds [11].
Timing codes can carry considerably more information than rate codes [12], and the spike times of auditory-nerve fibers have been found to contain more information than needed to account for human listeners' ability to discriminate very small changes in frequency [11,13,14]. However, temporal coding degrades rapidly beyond the auditory nerve, making spike timing a less viable code at higher levels of neural processing. Indeed, in the primary auditory cortex, single units cannot precisely follow frequencies higher than a few hundred Hertz [15][16][17], which is more than an order of magnitude below the upper limit of accurate pitch perception in humans [18][19][20]. Although studies in non-human animals found no deficits in pure-tone intensity or frequency discrimination following bilateral ablation of auditory cortex, substantial deficits in pure-tone frequency (pitch) and intensity (loudness) discrimination have been observed in human patients with cortical lesions [21,22], suggesting that the auditory cortex plays an important role in those two perceptual abilities.
It seems likely, therefore, that any timing code for frequency in the auditory nerve is transformed into a cortical rate-place code. However, it is not known whether the information contained in the spike counts of a population of cortical neurons is sufficient to account for the very fine frequency-discrimination thresholds of human listeners. A cortical rate-place code for frequency discrimination faces two major obstacles: relatively broad receptive fields [23], implying poor resolution of small frequency differences by single units, and correlated spike counts [24,25], which can severely limit the benefit of pooling information across multiple units [26][27][28][29].
Here, we examine the properties of a population of virtual neurons with frequency-tuning and spike-count correlation characteristics similar to those measured in the primary auditory cortex of primates. We determine that statistically optimal decoding of the information contained in the spike rates of these neurons can account quantitatively for the remarkable ability of trained human listeners to discriminate sound frequency. In addition, we show that the same cortical population code is also consistent with psychophysical data concerning another fundamental auditory ability: intensity discrimination. These results demonstrate the viability of a cortical rate code for both frequency and intensity discrimination, thus providing a possible resolution for a longstanding puzzle in auditory neuroscience.

Results
Figure 1A shows frequency tuning curves (spike rate versus stimulus frequency) for an array of virtual frequency-selective neurons with best frequencies (BFs) equally spaced on a logarithmic scale spanning a 1-octave range centered on 1 kHz. For illustration purposes, tuning curves are plotted for a small subset of units (n = 6) and a limited BF range, but the results described below are based on a larger number of units (n = 1700) and a wider BF range (2 octaves).
A key characteristic of neural tuning curves is their sharpness. A common measure of sharpness is the "quality factor" (Q), which is obtained by dividing the BF of the unit by a measure of tuning width, in this case the width of the tuning curve at half of the peak spiking rate. The sharpness of the simulated units was adjusted to yield Q values consistent with those measured in the primary auditory cortex of primates, which have been found to equal 12 on average for sharply tuned units, and 3.7 on average for non-sharply tuned units [23]. Since sharp tuning is generally beneficial for frequency discrimination, in the context of this study we were interested primarily in discrimination performance based on the outputs of sharply tuned units. Thus, unless indicated otherwise, Q was set to 12. The tuning curves illustrated in Figure 1A reflect this choice.
Figure 1B shows simulated spike counts for this population of virtual neurons in response to a 1000 Hz, 50 dB SPL pure tone with a duration of 1 s. The spike counts were modeled as integer-valued random draws from a multivariate Gaussian distribution in which the variance of the spike counts for a given unit was equal to the unit's mean spike count, as is the case for Poisson-distributed spike counts. The covariance between the spike rates of two different units was either set to zero, reflecting an assumption of complete statistical independence between units, or to the product of the geometric mean spike rate and the spike-count correlation coefficient. The latter choice is consistent with the facts that $\mathrm{cov}(C_i, C_j) = \rho_{i,j}\sqrt{\mathrm{var}(C_i)\,\mathrm{var}(C_j)}$ and that, for Poisson-distributed counts, $\mathrm{var}(C_i) = E(C_i)$, where $\mathrm{cov}(C_i, C_j)$, $\rho_{i,j}$, $\mathrm{var}(C_i)$, $\mathrm{var}(C_j)$, $E(C_i)$, and $E(C_j)$ denote the covariance, correlation, variances, and expected values of the spike counts of units i and j, respectively. This covariance structure is consistent with neurophysiological data, which show decreasing spike-count correlations between pairs of cortical units as the distance between the units increases, and the overlap between their receptive fields decreases [24,25,[30][31][32].
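The simulation described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`roex`, `alpha_for_q`, `simulate_counts`) are hypothetical, the tuning curve is assumed to follow the standard rounded-exponential shape (1 + x)exp(-x), and the correlation matrix is assumed to decay exponentially with BF distance (0.1-octave constant), which stands in for the correlation matrix defined in Methods. Clipping negative draws to zero is also a choice of this sketch.

```python
import numpy as np

def roex(f, bf, alpha):
    """Rounded-exponential tuning: equals 1 at f == bf, decays with |f - bf| / bf."""
    g = np.abs(f - bf) / bf
    return (1.0 + alpha * g) * np.exp(-alpha * g)

def alpha_for_q(q):
    """Sharpness alpha for which the half-height bandwidth equals BF / q."""
    lo, hi = 0.0, 10.0                      # (1 + x) exp(-x) crosses 0.5 inside [0, 10]
    for _ in range(60):                     # bisection on the normalized offset x
        mid = 0.5 * (lo + hi)
        if (1.0 + mid) * np.exp(-mid) > 0.5:
            lo = mid
        else:
            hi = mid
    x_half = 0.5 * (lo + hi)
    return 2.0 * x_half * q                 # bandwidth = 2 * x_half * BF / alpha

def simulate_counts(f=1000.0, n_units=6, q=12.0, rho_max=0.25,
                    r_evoked=15.0, r_spont=0.1, n_trials=2000, seed=0):
    rng = np.random.default_rng(seed)
    bfs = 1000.0 * 2.0 ** np.linspace(-0.5, 0.5, n_units)      # log-spaced BFs, 1 octave
    mean = r_evoked * roex(f, bfs, alpha_for_q(q)) + r_spont   # 1-s tone: count = rate
    # ASSUMPTION: correlation decays exponentially with BF distance (in octaves).
    dist = np.abs(np.log2(bfs[:, None] / bfs[None, :]))
    corr = rho_max * np.exp(-dist / 0.1)
    np.fill_diagonal(corr, 1.0)
    # Covariance = correlation * geometric mean of the two mean counts,
    # so each unit's variance equals its mean (Poisson-like).
    cov = corr * np.sqrt(np.outer(mean, mean))
    counts = rng.multivariate_normal(mean, cov, size=n_trials)
    counts = np.clip(np.rint(counts), 0, None)   # integer-valued, non-negative
    return bfs, mean, cov, counts

bfs, mean, cov, counts = simulate_counts()
print(np.round(mean, 2))
print(np.round(counts.mean(axis=0), 2))
```

With Q = 12, `alpha_for_q` returns a sharpness of about 40.3, and the diagonal of `cov` equals the vector of mean counts by construction.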
In the context of this article, the phrase "spike-count correlations" refers specifically to covariations in the spike counts of different units across multiple presentations of the same stimulus. Such correlations, also known as "noise correlations," should not be confused with correlations between the spike counts of different units across different stimuli, which are traditionally referred to as "signal correlations" [33]. The resulting covariance and correlation matrices are shown in Figs. 1C and 1D, respectively. The correlation matrix was scaled so that the spike-count correlation coefficient (or, equivalently, the expected value of the correlation between the spike counts) of two units, ρ_{i,j}, where i and j indicate different units, was maximally equal to ρ. Unless indicated otherwise, ρ was set to 0.25. This value was chosen based on recent findings, which indicate that such a value is not atypical for proximal cortical neurons [32], especially for output layers [34]. Even though higher discrimination performance might be achieved based on the response of cortical input layers [34], we reasoned that the properties of output layers of the primary auditory cortex were more relevant than those of other cortical layers for predicting the discrimination performance for a read-out mechanism located beyond the primary auditory cortex.
Figure 2A shows mean population responses evoked by two sequentially presented pure tones with slightly different frequencies: 1000 and 1001.68 Hz. The frequency difference, 1.68 Hz, corresponds approximately to the mean frequency-discrimination threshold (corresponding to a d′ of 1) at 1000 Hz [35].
Note that the difference between the spike rates (r) evoked by the two tones (Figure 2B, black curve) is quite small relative to the variability of the spike counts (Figure 1B): across the entire population of neurons (n = 1700), the largest single-unit signal-to-noise ratio (SNR), computed as the difference in spike rates evoked by the two stimuli (Δr) divided by the square root of the spike rate evoked by the first stimulus [5], was equal to 0.12. An SNR of 0.12 corresponds approximately to only 53% correct in a two-interval two-alternative forced-choice (2I2AFC) discrimination task [36], where chance performance is 50% correct. This leads to the question of how many units an optimal observer must pool spike-count information from in order to obtain the same performance as trained human listeners in this task, and with these stimuli, i.e., a d′ of 1, or 76% correct in a 2I2AFC experiment. For statistically independent units with a constant spike-count covariance matrix, $d' = \sqrt{\sum_i \mathrm{SNR}_i^2}$, where SNR_i is the SNR (as defined above) for unit i. Therefore, if all the units in the population had uncorrelated spike counts and the same BF and tuning curve as the most informative unit (i.e., the unit for which SNR was the highest), combining spike-count information from approximately 70 units would be sufficient to obtain a d′ of 1.
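The numbers quoted above follow from standard signal-detection arithmetic, which can be checked with a short sketch. The function names are hypothetical; the only assumptions are an unbiased observer in the 2I2AFC task, for which the proportion correct is Φ(d′/√2), and independent units with identical SNR, for which d′ grows as the square root of the number of units.

```python
import math

def percent_correct_2i2afc(d_prime):
    """Proportion correct of an unbiased 2I2AFC observer: Phi(d' / sqrt(2))."""
    return 0.5 * (1.0 + math.erf(d_prime / 2.0))

def pooled_d_prime(snrs):
    """d' for independent units with constant covariance: sqrt(sum of SNR_i^2)."""
    return math.sqrt(sum(s * s for s in snrs))

def units_needed(single_unit_snr, target_d_prime=1.0):
    """Smallest number of identical independent units reaching the target d'."""
    return math.ceil((target_d_prime / single_unit_snr) ** 2)

print(round(100 * percent_correct_2i2afc(0.12)))   # best single unit
print(round(100 * percent_correct_2i2afc(1.0)))    # trained listeners at threshold
print(units_needed(0.12))                          # independent copies of the best unit
```

The three printed values are 53, 76, and 70, matching the percentages and pool size given in the text.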
In the presence of spike-count correlations, and of a stimulus-dependent covariance matrix, the relationship between overall performance (d′) and the single-unit SNRs (SNR_i) is more complex, but d′ can still be evaluated based on the Fisher information (see Methods) [37]. The Fisher information is inversely related to the Cramér-Rao lower bound on the variance of an estimator, which places a limit on the precision with which a quantity can be estimated using any decoding scheme (linear or nonlinear) [38,39]; it is often used to quantify the best decoding performance that can be achieved based on the information contained in the responses of a population of neurons [27][28][29]37]. Using this approach, we found that, for a population of units with tuning curves and spike-count correlations as illustrated in Figure 1, a d′ of 1 was reached when the number of units in the population (with BFs spread evenly across the two-octave BF range) was set to 1700, which corresponds to a density of 850 units/octave. With no spike-count correlation (ρ = 0), a density of 300 units/octave was sufficient to obtain a d′ of 1.0. Even if only 25% of units in primary auditory cortex are sharply tuned [23], a density of 850 sharply tuned units per octave implies an overall neuronal density of 3400 units per octave; this number is well within the range of physiologically realistic neuronal densities for the output layer of primary auditory cortex [40].

Author Summary
A widely held view among auditory scientists is that the neural code for sound intensity (or loudness) involves temporally coarse spike-rate information, whereas the code for sound frequency (or pitch) requires more fine-grained and precise spike-timing information. One problem with this view is that neurons in auditory cortex do not produce precisely time-locked responses to higher frequencies within the pitch range, suggesting that a transformation to a rate code must occur. However, because cortical neurons exhibit relatively broad tuning to frequency and correlated spike counts, it is unclear whether a cortical population code based on spike rates alone can support the remarkably precise pitch-discrimination ability of humans. Here we show that a relatively small population of virtual neurons with frequency-tuning and spike-count correlation characteristics consistent with those of actual neurons in the primary auditory cortex of primates can account for both the smallest frequency- and intensity-discrimination thresholds measured behaviorally in humans. These results suggest a resolution to a longstanding puzzle in auditory neuroscience.

To gain insight into the effective contribution of each unit in the population to the overall performance, we computed the product of the square root of the Fisher information for each unit and the frequency difference between the two stimuli (1.68 Hz), and plotted the resulting measure, d′/unit, as a function of BF. This was done for a maximal spike-count correlation of ρ = 0.25 (Figure 2B, solid green curve) and for ρ = 0 (Figure 2B, dashed green curve). Units with BFs more than half an octave below, or above, the reference stimulus frequency (1 kHz) contributed very little to the overall discrimination performance. In fact, for ρ = 0.25, only 130 (approximately 8% of the 1700) units had a d′/unit larger than half of the d′/unit of the "best" (i.e., most informative) unit. Almost all of these units had BFs located within a frequency range of 2 semitones (12%, or 1/6th of an octave) centered on the reference-stimulus frequency (1 kHz).
Our finding that a larger pool size is needed to reach the same performance (d′ = 1) in the presence than in the absence of spike-count correlations is consistent with previous findings [26]. A simple explanation for the detrimental impact of spike-count correlations on stimulus discrimination performance is that they limit an observer's ability to "average out" neural noise without simultaneously canceling the signal. The "cost" of spike-count correlations on discrimination performance is apparent in the difference between the areas under the solid and dashed green curves (Figure 2B): the square root of the sum of the squared d′/unit values across all units, which is equal to d′, was approximately 70% larger for ρ = 0 than for ρ = 0.25. The "dip" at 1 kHz in the d′/unit curves stems from the fact that, for units with a BF close to 1 kHz, the difference between the spike rates evoked by the two stimuli (Δr, black curve) was close to zero. The other dips, which are apparent in the dashed green curve, reflect the combined influence of the two factors that determine the Fisher information for each unit, namely, the change in spike rate (Δr) and the change in the spike-count covariance matrix (see Methods). Intuitively, d′/unit values close to zero indicate units whose spike counts convey little information beyond that already provided by other units, once spike-count correlations are taken into account.
Figure 2C shows the mean population responses evoked by two tones having the same frequency (1000 Hz), but a different intensity (50 dB SPL versus 51.22 dB SPL). The intensity difference between the two stimuli (1.22 dB) was selected to represent the difference that corresponds to a human discrimination sensitivity (d′) of 1, based on data in the psychoacoustic literature [41].
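For reference, the two factors that determine the Fisher information (the change in spike rate and the change in the covariance matrix) appear as separate terms in the standard expression for a multivariate Gaussian response model. The block below is a sketch of that textbook result, not a transcription of the authors' Methods: it writes $\mathbf{r}(f)$ for the vector of mean spike counts and $\mathbf{V}(f)$ for the spike-count covariance matrix, with primes on $\mathbf{r}$ and $\mathbf{V}$ denoting derivatives with respect to frequency, and it assumes a small frequency difference $\Delta f$.

```latex
% Fisher information for counts distributed as N(r(f), V(f)):
% the first term comes from the change in mean rate,
% the second from the change in the covariance matrix.
J(f) \;=\; \mathbf{r}'(f)^{\mathsf T}\,\mathbf{V}(f)^{-1}\,\mathbf{r}'(f)
      \;+\; \tfrac{1}{2}\,\mathrm{Tr}\!\left[\left(\mathbf{V}(f)^{-1}\,\mathbf{V}'(f)\right)^{2}\right]

% Discriminability of two tones separated by a small \Delta f:
d' \;\approx\; \Delta f \,\sqrt{J(f)}
```

When the units are independent and the covariance does not depend on the stimulus, the second term vanishes and the first reduces to a sum of squared single-unit SNRs, consistent with the square-root-of-sum-of-squares pooling rule for uncorrelated units.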
We determined the change in spike rate needed to obtain a d′ of 1 using the same correlation coefficient (ρ = 0.25) and pool size (n = 1700), which were found earlier to yield a d′ of 1 for the frequency-discrimination task. We found that a change in spike rate of slightly less than 1 spike/s (namely, 0.94 spikes/s) was sufficient. A spike-rate change of 0.94 spikes/s for a 1.22 dB change in sound intensity translates to a change of approximately 15 spikes/s for a 20-dB change in intensity. This value is consistent with example rate-level functions for neurons in primary auditory cortex in the literature, which typically show increases of 10 to 20 spikes/s as the intensity of a tone at BF increases from 40 to 60 dB SPL [42]. Thus, it is possible to account for performance in the intensity-discrimination task using the same pool of sharply tuned units as assumed for the frequency-discrimination task with the same coarse rate-based neural code.
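The extrapolation from the threshold-level change to a 20-dB change is a simple proportionality, written out below under the assumption (implicit in the text) that spike rate grows linearly with level in dB over this range.

```latex
\frac{0.94\ \text{spikes/s}}{1.22\ \text{dB}} \times 20\ \text{dB}
\;\approx\; 0.77\ \text{spikes/s per dB} \times 20\ \text{dB}
\;\approx\; 15.4\ \text{spikes/s}
```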
Figure 2. A. Population responses evoked by two tones differing slightly in frequency. Blue: mean spike rate as a function of BF for a 1000 Hz, 50 dB SPL pure tone. Red: mean spike rate as a function of BF for a 1001.68 Hz, 50 dB SPL pure tone. B. Spike-rate difference (Δr) and d′/unit as a function of BF. The spike-rate difference was obtained by subtracting the mean spike rate evoked by the higher-frequency stimulus (red curve in panel A) from the spike rate evoked by the lower-frequency stimulus (blue curve). d′/unit was computed as described in Experimental Procedures. C. Population responses evoked by two tones differing slightly in intensity. Blue: mean spike rate as a function of BF for a 1000 Hz, 50 dB SPL pure tone. Red: mean spike rate as a function of BF for a 1000 Hz, 51.22 dB SPL pure tone. D. As for panel B, but for the population responses shown in panel C. doi:10.1371/journal.pcbi.1003336.g002
Note that the maximal change in spike rate (across all units) corresponding to the discrimination threshold was larger (by a factor of 2.5) for the intensity-discrimination task than for the frequency-discrimination task (compare the heights of the black curves in Figs. 2B and 2D). This outcome underscores the fact that equally discriminable stimulus differences need not correspond to equal differences in spike rates. It can be understood by considering the impact of spike-count correlations on the discrimination of frequency or intensity changes for a population containing only two units. Figure 3 shows equal-probability contours of probability distributions for spike counts (or single-trial estimates of spike rates) evoked by tones differing in frequency (Figure 3A) or in intensity (Figure 3B), for two units, i and j. In this example, unit i, whose spike rates are plotted on the x-axis, has a BF below the reference stimulus frequency (1 kHz), while unit j, whose spike rates are plotted on the y-axis, has a BF above that frequency. When the frequency of the stimulus is increased, the spike rate of unit i decreases while that of unit j increases (Figure 3A). By contrast, when the intensity of the stimulus is increased, the spike rates of both units increase simultaneously (Figure 3B). Note that, for illustration purposes, the mean magnitude of the stimulus-induced changes in spike rate is the same for the two units and the two tasks. Under these circumstances, positive spike-count correlations, which are reflected in elongated contours along the major diagonal, lead to a smaller overlap between the two distributions for the frequency change than for the intensity change. Since the error rate of the optimal observer is directly related to the overlap between the spike-count probability distributions, positive correlations in this case have a more dramatic impact on intensity-discrimination performance than on frequency-discrimination performance.
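The two-unit argument can be made concrete with a small numerical sketch. It assumes, for illustration only, two units with equal spike-count variance, a fixed (stimulus-independent) covariance matrix, and rate changes of equal magnitude; the function name `d_prime` and the numerical values of `var` and `step` are hypothetical.

```python
import numpy as np

def d_prime(delta, var, rho):
    """d' of the optimal linear observer for two equal-variance, correlated units."""
    cov = var * np.array([[1.0, rho], [rho, 1.0]])
    delta = np.asarray(delta, dtype=float)
    # Mahalanobis distance between the two mean response vectors.
    return float(np.sqrt(delta @ np.linalg.solve(cov, delta)))

var, rho, step = 15.0, 0.25, 1.0

# Frequency change: unit i (BF below the tone) decreases, unit j (BF above) increases.
d_freq = d_prime([-step, +step], var, rho)
# Intensity change: both units increase together, along the noise-correlation axis.
d_int = d_prime([+step, +step], var, rho)

print(f"d' frequency change: {d_freq:.3f}")
print(f"d' intensity change: {d_int:.3f}")
print(f"ratio: {d_freq / d_int:.3f}")
```

In this idealized case the ratio equals sqrt((1 + ρ)/(1 - ρ)), about 1.29 for ρ = 0.25, so the same rate change is more discriminable when the two units move in opposite directions. The factor of 2.5 reported for the full population depends on the complete tuning and correlation structure and is not reproduced by this two-unit sketch.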

Discussion
Historically, auditory researchers have found it difficult to account for both frequency discrimination and intensity discrimination within the same framework, or using the same neural code. This is in part because the changes in spike rate (or the changes in "excitation patterns" in psychoacoustical models) corresponding to threshold are usually smaller for a frequency-discrimination task than for an intensity-discrimination task [11,13,43,44]. This has led to the view that intensity discrimination relies on spike-rate information whereas frequency discrimination requires fine spike-timing information, at least for frequencies lower than about 8 kHz. One of the strongest arguments supporting this view stemmed from comparisons of discrimination thresholds measured in human listeners with predictions obtained using observer models that operate on spike-count information only, or use fine spike timing [13]. However, these models have traditionally been based on neural responses at the level of the auditory nerve, which contain a wealth of precise temporal information. The findings described above suggest a different conclusion at the level of the auditory cortex, where neurons are unable to accurately phase-lock to frequencies higher than, at most, a few hundred Hertz. We find that the spike rates of a realistically small population of units, with frequency-tuning and response-correlation characteristics similar to those observed in the primary auditory cortex of primates, contain enough statistical information to account for the smallest frequency-discrimination thresholds measured in human listeners: slightly less than 0.2%, or approximately 2 Hz at 1 kHz.
Any viable rate-based population code for frequency (or pitch) discrimination must overcome two major limitations. The first limitation stems from the width of neural tuning curves: even the most sharply tuned units in the primary auditory cortex of primates have relatively wide receptive fields, with bandwidths (measured at half the peak spike rate) of approximately 8% of the unit's best frequency (BF) [23]. One consequence of such wide receptive fields is that the change in spike rate produced by a small (e.g., 0.2%) change in stimulus frequency is very small relative to the neural noise, i.e., the random variability in spike counts. In principle, the detectability of small spike-rate differences can be enhanced by pooling information across many neural units. However, previous work in theoretical neuroscience has indicated that the benefit of pooling spike-count information across multiple units can be drastically limited if the pooled spike counts are correlated [26][27][28][29]37]. This is the second limitation: the spike counts of cortical neurons are correlated [24][25][26]30,[32][33][34]. Thus, it was unclear a priori whether a population code based solely on spike-rate information in auditory cortex could support the remarkably fine frequency-discrimination performance of humans. The results described above offer a positive answer to this question. They show that, contrary to popular belief, a cortical rate-place code can provide sufficient information to account for human behavior in the dimensions of both frequency and intensity, using reasonable assumptions relating to unit density, unit tuning, and inter-unit correlations.
As with any modeling study, the conclusions of this work depend on the assumptions of the underlying model. In particular, our estimates of the number of units needed to achieve a given level of behavioral discrimination performance rely on the assumption that downstream neurons use the information contained in the spike counts of the population optimally in a statistical (maximum-likelihood) sense. It remains to be determined whether neural networks in the auditory cortex can achieve, or even approach, this optimum. If they cannot, the estimated numbers of neurons needed to explain the behavioral performance of human listeners in the frequency- and intensity-discrimination tasks would be underestimates. Importantly, however, increasing the assumed population size would not necessarily alter our main conclusion, according to which the behavioral thresholds for these two tasks can, at least in theory, be accounted for using the same population and same type of (spike-rate) code. Another assumption on which our conclusions may depend relates to the strength of spike-count correlations and its relationship with other characteristics, such as the BFs and frequency-tuning widths of the units. Our choice of correlation structure for the virtual population was based in part on neurophysiological data [24,25], and in part on theoretical and simplicity considerations. Lastly, the conclusions of this study are subject to the limitations of Fisher information as a measure of optimum decoding performance for neural populations [e.g., 45].
Figure 3. A. Horizontal slices across spike-rate distributions evoked by two stimuli differing in frequency in two units with BFs below and above the reference-stimulus frequency. The spike rate of the unit (i) with BF lower than the reference frequency is plotted along the x-axis; the spike rate of the unit (j) with BF higher than the reference frequency is plotted along the y-axis. Blue: horizontal slice across the spike-rate distribution for the reference tone (1000 Hz, 50 dB SPL); red: horizontal slice across the spike-rate distribution for the higher-frequency tone. B. Same as A, but for an intensity change. doi:10.1371/journal.pcbi.1003336.g003
Our finding that a cortical population code operating solely on spike-count information can account for frequency-discrimination performance in humans has important implications for the search for neural correlates of frequency (pitch) perception in humans. For example, while explanations for the dependence of frequency-discrimination thresholds on stimulus parameters such as frequency, intensity, and duration have so far focused almost exclusively on peripheral (i.e., cochlear and auditory-nerve) response properties, our approach provides a method for examining the role of central factors, such as variations in neuronal density [46,47] or in spike-count correlations across BFs at the cortical level, in determining behavioral discrimination thresholds.

Methods
Responses of a population of frequency-selective cortical units were simulated as follows. The spike rate (in spikes/s) for unit i (i = 1,…, n) in response to a tone of frequency, f, and intensity, l, was computed as,
$$r_i(f, l) = r_e(l)\,h_i(f) + r_s,$$
where $r_e(l)$ and $r_s$ denote the stimulus-evoked spike rate at BF and the spontaneous spike rate, respectively, and $h_i(f)$ represents the frequency-tuning function,
$$h_i(f) = \left(1 + \alpha_i \frac{|f - \Theta_i|}{\Theta_i}\right)\exp\!\left(-\alpha_i \frac{|f - \Theta_i|}{\Theta_i}\right),$$
in which $\Theta_i$ denotes the BF (in Hz) of unit i, and the sharpness parameter, $\alpha_i$, was adjusted to yield a quality factor, Q, consistent with that of single units in the primary auditory cortex of primates [23]. This function is sometimes referred to as the "rounded exponential" (roex) function, and has been used to model psychophysical auditory-filter shapes [48] as well as neural frequency-tuning curves in the primary auditory cortex of primates [49]. The spontaneous rate, $r_s$, was set to 0.1 spikes/s and the evoked rate, $r_e$, for a 50 dB SPL pure tone having a frequency equal to the BF of the unit was set to 15 spikes/s. These numbers are consistent with neurophysiological data [42]. Other physiologically realistic values for these parameters (e.g., $r_s$ = 1 and $r_e$ = 10 or 20) were also tested and led to qualitatively similar conclusions. Spike counts were simulated by drawing samples from a multivariate Gaussian probability density function with mean vector, $\mathbf{r}(f, l) = [r_1(f, l), \ldots, r_n(f, l)]$, and covariance matrix, $\mathbf{V}(f, l)$,
$$\mathbf{V}(f, l) = \mathbf{C} \circ \sqrt{\mathbf{r}(f, l)\,\mathbf{r}(f, l)^{\mathsf T}},$$
where $\circ$ denotes the Hadamard (entrywise) matrix product and the square root is taken entrywise. Consistent with neurophysiological data indicating that spike-count correlations for neuron pairs in primary auditory cortex tend to decrease with increasing BF distance and decreasing receptive-field overlap between the units [24,25], the spike-rate correlation matrix, C, was defined as,