Human discrimination and modeling of high-frequency complex tones shed light on the neural codes for pitch

doi:10.1371/journal.pcbi.1009889

Fig 1.

Representation of complex tones in a simulated auditory nerve.

(A) Simulated responses of a population of high-spontaneous-rate auditory-nerve fibers [52] for five periods of a sine-phase HCT composed of harmonics 4–13 at a level of 50 dB SPL per component and an F0 of 280 Hz. The middle panel shows a “neurogram”, a plot of instantaneous firing rate as a function of time and characteristic frequency. In the neurogram, color (from purple [low] to yellow [high]) indicates the instantaneous firing rate, analogous to color indicating intensity in a spectrogram. The top and left panels show the temporal waveform and spectrum, respectively, of the acoustic stimulus. In the left panel, the yellow box highlights the frequency ranges used in the behavioral experiments. The bottom two panels and the right panel show the simulated responses of individual nerve fibers over time and the simulated response profile averaged over time, respectively. The bottom two panels show responses for auditory-nerve fibers tuned to component frequencies (4F0, purple; 12F0, blue) or tuned between component frequencies (5.5F0, pink; 9.5F0, maroon). (B) Same as A, except for an F0 of 1400 Hz.
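
The stimulus described in panels A and B can be sketched in a few lines. This is a minimal illustration, not the authors' stimulus-generation code: the sampling rate `FS`, the function name `sine_phase_hct`, and the dB SPL conversion (20 µPa reference, level read as the RMS level of each component) are assumptions made for the example.

```python
import math

P_REF = 20e-6  # reference pressure for dB SPL, in pascals


def sine_phase_hct(f0, harmonics, level_db_spl, fs, n_periods):
    """Return a sine-phase harmonic complex tone as a list of pressure samples (Pa).

    Every component starts at zero phase (sine phase) and has the same level.
    """
    # Peak amplitude of one component, assuming the level is an RMS level.
    amp = math.sqrt(2.0) * P_REF * 10.0 ** (level_db_spl / 20.0)
    n_samples = round(n_periods * fs / f0)
    return [
        sum(amp * math.sin(2.0 * math.pi * h * f0 * n / fs) for h in harmonics)
        for n in range(n_samples)
    ]


FS = 100_000  # assumed sampling rate in Hz (not stated in the caption)

# Harmonics 4-13 at 50 dB SPL per component, five periods of each F0.
tone_low = sine_phase_hct(280.0, range(4, 14), 50.0, FS, 5)
tone_high = sine_phase_hct(1400.0, range(4, 14), 50.0, FS, 5)
```

With an F0 of 1400 Hz the highest component (13F0) lies at 18.2 kHz, so any sampling rate chosen for such a sketch has to keep that frequency well below the Nyquist limit.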

Fig 2.

Behavioral results from Experiment 1.

The left panel (Exp1a) shows results with only harmonics 6–10 present in the target; the right panel (Exp1b) shows results with the bandpass-filtered target, with harmonics 6–10 in the passband. Large filled circles and error bars indicate the average F0DL and ±1 standard error of the mean (SEM). The small filled circles and error bars indicate individual F0DLs and ±1 SEM for each participant.

Fig 3.

Behavioral results from Experiment 2.

Results from Experiment 2. Large filled circles and error bars indicate the average TMR and ±1 standard error of the mean (SEM). The small filled circles and error bars indicate individual TMRs and ±1 SEM for each participant.

Fig 4.

Comparison of the availability of spectrally resolved harmonics at low and high frequencies in an excitation pattern simulation.

Excitation patterns (average firing rate versus CF) for high-spontaneous-rate (HSR) auditory-nerve fibers (left) and low-spontaneous-rate (LSR) auditory-nerve fibers (right) responding to the stimuli in Experiment 1b based on a computational model of the auditory periphery [52]. The solid curve indicates the average excitation pattern while the filled area around the curve indicates ±1 standard deviation (over 10 simulations with different samples of masking noise and level roving). Vertical dashed lines indicate the frequencies of target harmonics. The F0 difference between the target and masker was 3%. See Materials and Methods for more details on the simulations.
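
The solid curve and shaded band described above amount to a per-CF mean and standard deviation across repeated simulations. The sketch below shows that summary step only. The function name `summarize_excitation` and the toy firing rates are hypothetical placeholders, not model output, and the use of the sample standard deviation is an assumption because the caption does not say which estimator was used.

```python
import statistics


def summarize_excitation(runs):
    """Summarize repeated excitation patterns.

    runs: list of simulations, each a list of average firing rates (one per CF).
    Returns the per-CF mean plus the lower and upper edges of a +/-1 SD band.
    """
    per_cf = list(zip(*runs))  # regroup so each entry holds one CF across runs
    mean = [statistics.mean(rates) for rates in per_cf]
    sd = [statistics.stdev(rates) for rates in per_cf]  # sample SD (assumed)
    lower = [m - s for m, s in zip(mean, sd)]
    upper = [m + s for m, s in zip(mean, sd)]
    return mean, lower, upper


# Placeholder data: 3 runs x 3 CFs (the figure uses 10 runs of a real model).
toy_runs = [
    [10.0, 50.0, 20.0],
    [12.0, 54.0, 22.0],
    [14.0, 58.0, 24.0],
]
mean, lower, upper = summarize_excitation(toy_runs)
```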

Fig 5.

Comparison of simulated autocorrelograms at low and high frequencies for the DBL stimulus.

(A) Autocorrelograms for simulated high-spontaneous-rate auditory-nerve-fiber responses [52] for the low-frequency DBL stimulus (left column) and the high-frequency DBL stimulus (right column). Here, one masker F0 was set to 5.5 ST below the target F0 while the other masker F0 was set to 6 ST above the target F0. The figure was constructed by fixing the target F0 and then varying the simulated CF (y-axis) linearly over a fixed range. For each CF, the instantaneous firing rate of the auditory-nerve model was computed and then the autocorrelation of that response was computed. Color (from purple/blue [low] to yellow/green [high]) indicates the autocorrelation value at each lag-CF point, analogous to color indicating intensity in a spectrogram. In each column, the first four panels show autocorrelograms for varying TMRs and the bottom panel shows the summary autocorrelation function (sACF; the sum of the autocorrelograms along the CF axis). Red boxes and labels indicate the zoomed-in views plotted in the following subfigures. (B) A zoomed-in view of autocorrelograms and the sACF for 0, 5, and 10 dB TMR at low frequencies for lags of 3.5 to 4.5 times the period of the target F0. Vertical lines indicate lags corresponding to the periods of each tone in the stimulus or to multiples of those periods (marked using the shorthand “2F0” to refer to the lag twice as long as the period of the tone). Lags corresponding to multiples of the target period are indicated with “XF0”, while lags corresponding to multiples of the masker periods are indicated with “XF0_L” and “XF0_U” for the lower and upper maskers, respectively. (C) A zoomed-in view of autocorrelograms and the sACF for 5 and 10 dB TMR and without maskers at high frequencies for lags of 1.5 to 2.5 times the period of the target F0. Vertical lines indicate the same as in the previous subfigure. Only vertical lines corresponding to the target period are plotted in the last subpanel because this simulation was conducted without the maskers. See Materials and Methods for more details on the simulations.
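
The construction described in panel A (one autocorrelation per CF, then a sum along the CF axis) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: the half-wave-rectified sinusoids stand in for auditory-nerve model output, the autocorrelation is left unnormalized, and all names and toy values (`FS`, `TARGET_F0`, `toy_rate`) are hypothetical. The helper `semitones_to_hz` uses the standard relation of 12 semitones per octave to place the two masker F0s.

```python
import math


def semitones_to_hz(f0, semitones):
    """Shift a frequency by a number of semitones (12 semitones per octave)."""
    return f0 * 2.0 ** (semitones / 12.0)


def autocorrelation(rate, max_lag):
    """Unnormalized autocorrelation of one channel for lags 0..max_lag (samples)."""
    n = len(rate)
    return [
        sum(rate[i] * rate[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def autocorrelogram(rates_by_cf, max_lag):
    """One autocorrelation function per CF channel (rows = CFs, columns = lags)."""
    return [autocorrelation(rate, max_lag) for rate in rates_by_cf]


def summary_acf(acg):
    """Summary autocorrelation function: sum of the autocorrelogram over CFs."""
    return [sum(column) for column in zip(*acg)]


FS = 10_000        # assumed sampling rate of the toy rate functions, Hz
TARGET_F0 = 100.0  # toy target F0, Hz (not a value from the paper)
N_SAMPLES = 1000


def toy_rate(freq):
    """Stand-in for a model firing rate: a half-wave-rectified sinusoid."""
    return [max(0.0, math.sin(2.0 * math.pi * freq * n / FS)) for n in range(N_SAMPLES)]


# Two toy channels phase-locked to the first and second harmonic of the target.
rates_by_cf = [toy_rate(TARGET_F0), toy_rate(2.0 * TARGET_F0)]
acg = autocorrelogram(rates_by_cf, max_lag=150)
sacf = summary_acf(acg)

period_lag = round(FS / TARGET_F0)  # lag (in samples) of one target period

# Masker F0s placed 5.5 ST below and 6 ST above the target F0, as in the caption.
masker_f0s = (semitones_to_hz(TARGET_F0, -5.5), semitones_to_hz(TARGET_F0, 6.0))
```

In this toy case both channels repeat at the target period, so the sACF is larger at `period_lag` than at neighboring or half-period lags. That is the kind of peak the vertical lines in panels B and C mark for the real stimuli.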

Fig 6.

Comparison of excitation patterns at low and high frequencies for the DBL stimulus.

Excitation patterns (average firing rate versus CF) for simulated high-spontaneous-rate and low-spontaneous-rate auditory-nerve fibers [52] responding to the low-frequency DBL stimulus (left column) and the high-frequency DBL stimulus (right column) in Experiment 2. The solid curve indicates the average excitation pattern while the filled area around the curve indicates ±1 standard deviation (over 10 simulations with different samples of masking noise and level roving). Vertical dashed lines indicate the frequencies of target components and masker components. The color of each line indicates which F0 it corresponds to (orange for the lower masker F0, gold for the target F0, and pink for the upper masker F0). See Materials and Methods for more details on the simulations.

Fig 7.

Ideal-observer predictions for pure-tone frequency discrimination.

Results of the frequency discrimination simulations. (A) Simulated FDLs versus frequency for a pure tone in each auditory-nerve model. Simulations in this panel include no parameter roving. Points indicate the simulated FDLs at a particular frequency while lines indicate a locally estimated scatterplot smoothing (LOESS) fit to the simulated FDLs. The solid black line indicates the predicted FDLs from Micheyl et al. [58] scaled by a factor of 0.002 (to roughly match the low-frequency side of the curve to the best-performing model predictions from the present study). The axis on the right-hand side corresponds to the unscaled FDL predictions from Micheyl et al. [58]. (B) Simulated all-information FDLs and vector strength (top row) and simulated rate-place FDLs and Q10 (bottom row) versus frequency with a double y-axis. To choose the warping on the y-axis for vector strength and Q10, linear models were fit to predict log-transformed FDLs as a function of log-transformed reciprocals of vector strength (Q10) for the all-information FDLs (rate-place FDLs). The fitted regression equations were then used to warp the y-axes. In other words, we warped the y-axes for vector strength and Q10 to maximize overlap with the FDL predictions (across all three models) in order to visually demonstrate the relationship between vector strength and Q10 and the simulated FDLs. (C) Ratio of simulated FDLs at 8.5 kHz and 2.0 kHz in the non-roved simulation at 30 dB re: threshold for each model (left) and ratio of behavioral estimates of FDLs at 8.5 kHz and 2.0 kHz from various studies (right). Simulated FDLs were interpolated using LOESS while behavioral FDLs were linearly interpolated on log-log coordinates.
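
The behavioral ratios in panel C depend on interpolating FDLs to 8.5 kHz and 2.0 kHz on log-log coordinates. A minimal sketch of that step is given below. The function name `loglog_interp` and the toy frequency and FDL values are hypothetical placeholders that follow a pure power law. They are not data from any study, and the LOESS step used for the simulated FDLs is not reproduced here.

```python
import math


def loglog_interp(x, xs, ys):
    """Linearly interpolate in log-log coordinates.

    xs must be strictly increasing and positive, ys positive, and x must lie
    within the range of xs (no extrapolation is attempted).
    """
    points = list(zip(xs, ys))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            # Fractional position of x between x0 and x1 on a log axis.
            t = (math.log(x) - math.log(x0)) / (math.log(x1) - math.log(x0))
            return math.exp(math.log(y0) + t * (math.log(y1) - math.log(y0)))
    raise ValueError("x is outside the measured range")


# Placeholder measurements (Hz), chosen so that FDL = (f / 1000) ** 2.
toy_freqs_hz = [1000.0, 4000.0, 16000.0]
toy_fdls_hz = [1.0, 16.0, 256.0]

fdl_high = loglog_interp(8500.0, toy_freqs_hz, toy_fdls_hz)
fdl_low = loglog_interp(2000.0, toy_freqs_hz, toy_fdls_hz)
ratio = fdl_high / fdl_low  # analogue of the 8.5 kHz / 2.0 kHz ratio in panel C
```

Interpolating on log-log axes reproduces a power-law relationship exactly between measured points, which is one reason it is a natural choice for thresholds that span orders of magnitude.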

Fig 8.

Ideal-observer predictions for F0 discrimination.

Results of the F0 discrimination simulations. (A) Simulated F0DLs versus F0 of the ISO HCT stimulus in each auditory-nerve model. Simulations in this panel include no parameter roving. Points indicate the simulated F0DLs at a particular F0 while lines indicate a locally estimated scatterplot smoothing (LOESS) fit to the simulated F0DLs. (B) Simulated all-information F0DLs and vector strength (top row) and simulated rate-place F0DLs and Q10 (bottom row) versus frequency with a double y-axis. To choose the warping on the y-axis for vector strength and Q10, linear models were fit to predict log-transformed F0DLs as a function of log-transformed reciprocals of vector strength or Q10 for the all-information F0DLs or rate-place F0DLs, respectively. The fitted regression equations were then used to warp the y-axes. In other words, we warped the y-axes for vector strength and Q10 to maximize overlap with the model predictions (across all three models) in order to visually demonstrate the relationship between vector strength and Q10 and the simulated F0DLs. (C) Ratio of simulated F0DLs at 1.4 kHz and 0.28 kHz in the non-roved simulation at 30 dB re: threshold for each model (left) and ratio of behavioral estimates of F0DLs at 1.4 kHz and 0.28 kHz from various studies (right). Simulated F0DLs were interpolated using LOESS while behavioral F0DLs were linearly interpolated on log-log coordinates.
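
The axis warping in panel B can be read as an ordinary least-squares fit of log(F0DL) against log(1 / predictor), where the predictor is vector strength or Q10, followed by mapping each predictor value through the fitted line. The sketch below illustrates that reading. The names `fit_log_log` and `warp`, the natural-log base, and the toy values are assumptions for the example and do not come from the paper.

```python
import math


def fit_log_log(predictor, thresholds):
    """Least-squares fit of log(threshold) = a + b * log(1 / predictor)."""
    x = [math.log(1.0 / p) for p in predictor]
    y = [math.log(t) for t in thresholds]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def warp(value, intercept, slope):
    """Map a vector-strength or Q10 value onto the threshold axis."""
    return math.exp(intercept + slope * math.log(1.0 / value))


# Placeholder values obeying threshold = 0.5 / vs ** 2 (not simulation output).
toy_vector_strength = [0.8, 0.4, 0.2, 0.1]
toy_thresholds = [0.5 / vs ** 2 for vs in toy_vector_strength]

a, b = fit_log_log(toy_vector_strength, toy_thresholds)
warped = [warp(vs, a, b) for vs in toy_vector_strength]
```

Plotting `warped` on the threshold axis places each vector-strength (or Q10) value where the regression predicts the matching threshold, so the overlap between the two curves shows how well the predictor tracks the simulated thresholds.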
