Understanding Pitch Perception as a Hierarchical Process with Top-Down Modulation

doi:10.1371/journal.pcbi.1000301

Figure 1.

Schematic outline of the model.

The model consists of: 1) a simulation of auditory nerve spiking probabilities, p(t,k) (blue), in response to a sound for each cochlear frequency channel, k; 2) a cross-product of the auditory nerve activity with a time-delayed version of itself for a range of different time lags, l (in the diagram, processing relating to different lags is represented by stacked boxes); 3) two integration stages, A₂ and A₃, shown by green and red ellipses, which represent highly idealized models of collective neuronal responses using a shorter (τ₂) and a longer (τ₃) time constant, respectively. L₂(t) is the lag yielding the maximum response at the second processing stage, A₂(t,l); its inverse, 1/L₂(t), represents an intermediate pitch prediction of the model. Similarly, 1/L₃(t) represents the ultimate pitch estimate predicted by the model. When the pitch estimate changes over time, a mismatch between the previous pitch estimate at level 3 (labelled “expected pitch” or 1/L^E) and the current prediction at the first integration stage, 1/L₂, feeds back to modulate the recurrent processes (curved lines) at both integration stages. See text for details.
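The processing chain outlined in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the integrators below use fixed time constants, the top-down modulation of the integration times is left out entirely, and the function names, the toy single-channel input, and the numerical values of the time constants are assumptions chosen only for illustration.

```python
import numpy as np

def lagged_cross_product(p, max_lag):
    """Sum over channels k of p(t, k) * p(t - l, k) for lags l = 1..max_lag.

    p has shape (n_samples, n_channels). The result has shape
    (n_samples, max_lag), with zeros where t - l would be negative.
    """
    n_samples, _ = p.shape
    out = np.zeros((n_samples, max_lag))
    for lag in range(1, max_lag + 1):
        out[lag:, lag - 1] = np.sum(p[lag:] * p[:-lag], axis=1)
    return out

def leaky_integrate(x, tau, dt):
    """First-order leaky integrator with a FIXED time constant tau (seconds).

    Assumption: the model described in the caption adapts its integration
    times through feedback; that mechanism is not reproduced here.
    """
    alpha = dt / tau
    y = np.zeros_like(x)
    for t in range(1, len(x)):
        y[t] = y[t - 1] + alpha * (x[t] - y[t - 1])
    return y

def pitch_from_lags(a, dt):
    """Pitch estimate (Hz): inverse of the lag with the largest response."""
    best_lag = np.argmax(a, axis=1) + 1  # column 0 holds a lag of one sample
    return 1.0 / (best_lag * dt)

# Toy input (hypothetical): one channel of a half-wave rectified 200 Hz tone.
fs = 10_000
dt = 1.0 / fs
t = np.arange(0, 0.2, dt)
p = np.maximum(np.sin(2 * np.pi * 200 * t), 0.0)[:, None]

cross = lagged_cross_product(p, max_lag=80)     # lags up to 8 ms
a2 = leaky_integrate(cross, tau=0.0025, dt=dt)  # shorter time constant
a3 = leaky_integrate(a2, tau=0.020, dt=dt)      # longer time constant
pitch_final = pitch_from_lags(a3, dt)[-1]
print(f"final pitch estimate: {pitch_final:.1f} Hz")
```

For this toy input the final estimate should land at or very near the 200 Hz fundamental, because the lag range covers one period (5 ms) but not two.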

Figure 2.

Example of the model output in response to an arbitrary sequence of pure tones with random frequencies and durations.

(A) Spectrogram of the stimulus as a function of time. (B) Response of the first processing stage, A₁(t,l), plotted as a function of time, t (abscissa), and time lag, l (ordinate). (C) Effective integration window of the second and third processing stages, A₂(t,l) (E₂(t), green solid line) and A₃(t,l) (E₃(t), red dotted line). E₂(t) represents the integration time at the lag corresponding to the maximum response in the second stage. (D) Response of the third processing stage, A₃(t,l). A₃(t,l) was normalized to a maximum response of unity and exponentially enhanced after each time step for illustrative purposes. The colours in plots (B) and (D) represent the activation strength as a percentage of the maximum response at that time (blue: low response or 0%; red: maximal response or 100%). Thus, the lag channel corresponding to the current pitch estimate appears red.

Figure 3.

Responses of autocorrelation models with fixed time constants.

(A) Response of the cascade autocorrelation model [13] to the sequence of random tones shown in Figure 2A (left panel) and to the alternating click train shown in Figure 8A (right panel). (B) Response of the short-term integration stage of the cascade model (corresponding to the second stage of the current model, A₂(t,l), when the feedback modulation of the integration times, equation 8, is switched off); see text for further explanation. As in panel A, the left panel shows the response to the tone sequence and the right panel shows the response to the click train (arrows mark the reported pitches). Different colours show activation strength as a percentage of the maximum response, as in Figure 2.

Table 1.

Model parameters used in the simulations.

Figure 4.

Flowchart of the computations involved in the recurrent processes of A_n(t,l).

Figure 5.

Response of the model to stimuli used in Hall and Peters' experiment [14].

(A) Spectrogram of a rapid sequence of three 40-ms tones presented in quiet (left panel) and after the addition of white noise (right panel). (B) Response of the third stage of the model, A₃(t,l), for the stimulus in quiet (left panel) and in noise (right panel). In the noise condition, the response represents the average over three different random realizations of the noise background. Different colours represent activation strength as a percentage of the maximum response as in previous figures. Arrows indicate the lowest pitch reported by listeners in each condition. (C) Snapshot of A₃(t,l) at the end of the stimulus (t_final) in quiet (left panel) and in noise (right panel). Vertical dashed lines correspond to the final predicted pitch.

Figure 6.

Response of the model to stimuli used in the Plack and White experiment [17].

(A) Stimulus waveform of a rapid sequence of two 20-ms complex tones (harmonics of 250 Hz) separated by an 8-ms silent gap (left panel) or by a noise burst with a root-mean-square level similar to that of the complex tones (right panel), after band-pass filtering (5500–7500 Hz) and the addition of a white noise background. (B) Effective integration times, E₂(t) (green solid line) and E₃(t) (red dotted line), at the second and third stages of the model. (C) Response of the third stage of the model, A₃(t,l), for the silent-gap condition (left panel) and noise-burst condition (right panel). Different colours represent activation strength as a percentage of the maximum response as in previous figures.

Figure 7.

Comparison between human and model pitch-gap and pitch-modulation thresholds in a task specifically designed for assessing temporal resolution in pitch perception (see text for details).

(A) Spectrogram of a rippled noise (RN) with a 4-ms delay, which contains a 25-ms gap in serial correlation around the centre of the stimulus (not visible in the figure). (B) First peak height of the running autocorrelation as a function of time (Rh1[t]) for both the modulated (red) and gap (blue) RN stimuli, averaged over 10⁵ stimulus realizations. (C) A₃(t,l) for the stimulus shown in panel A, normalized and displayed as in previous figures. (D) Average detection thresholds and standard errors for the pitch-gap (blue circles) and pitch-modulation (red triangles) conditions. The corresponding model predictions are shown in the same colours (dots and stars).
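As a rough illustration of the stimulus in this caption, the sketch below builds a plain rippled noise by adding a noise to a copy of itself delayed by 4 ms, then measures the normalized autocorrelation at that delay. Several things are assumptions rather than details from the paper: the unit gain of the delayed copy, the sampling rate, the function names, and the use of a whole-signal autocorrelation in place of the running measure Rh1[t]. The gap and modulation manipulations are not reproduced.

```python
import numpy as np

def rippled_noise(n_samples, delay_samples, rng, gain=1.0):
    """Gaussian noise plus a copy of itself delayed by delay_samples."""
    noise = rng.standard_normal(n_samples + delay_samples)
    current = noise[delay_samples:]
    delayed = noise[:-delay_samples]
    return current + gain * delayed

def normalized_autocorr_at(x, lag):
    """Autocorrelation of x at one lag, divided by its zero-lag value."""
    x = x - x.mean()
    return np.dot(x[lag:], x[:-lag]) / np.dot(x, x)

fs = 20_000                    # assumed sampling rate (Hz)
delay = int(round(0.004 * fs)) # 4-ms delay, as in the caption
rng = np.random.default_rng(0)

rn = rippled_noise(fs, delay, rng)         # one second of rippled noise
h1 = normalized_autocorr_at(rn, delay)     # height of the first peak
h_off = normalized_autocorr_at(rn, 50)     # an unrelated lag, for contrast
print(f"peak at delay: {h1:.2f}, off-peak: {h_off:.2f}")
```

With unit gain the expected peak height at the delay is about 0.5, while unrelated lags stay close to zero; a gap in serial correlation would show up as a temporary drop in a running version of this quantity.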

Figure 8.

Model response to a high-pass-filtered click train with alternating inter-click intervals [49],[50].

(A) Central portion of the stimulus waveform (the total duration is 400 ms) for a click train with inter-click intervals alternating between 4 and 6 ms after high-pass filtering and the addition of a pink noise background. (B) Effective integration times, E₂(t) (green solid line) and E₃(t) (red dotted line), at the second and third stages of the model. (C) Model response at the third stage, A₃(t,l), normalized and displayed as in previous figures. The arrow marks the lag corresponding to the pitch reported by listeners. (D) Final snapshot of A₃(t,l) at t_final. The vertical dashed line corresponds to the average pitch reported by listeners.

Figure 9.

Model evaluation of the Pitch Onset Response (POR).

(A) Spectrogram of the final portion of the stimulus waveform, consisting of 500 ms of iterated rippled noise (delay 12 ms, 16 iterations) preceded by uncorrelated noise (not shown). (B) A₃(t,l) (without any normalization); colours show activation strength as a percentage of the maximum response. The horizontal arrow indicates the delay corresponding to the reported pitch of this stimulus. (C) Smoothed derivative of A₃(t,l) (dotted red line), obtained by convolving the model output with the first derivative of a Gaussian function with a width of 60 ms and a standard deviation of 6 ms. The solid green line shows the variance of A₃(t,l). C is the Pearson correlation coefficient between the smoothed derivative and the variance. (D) Comparison between the model and neuromagnetic results. The solid blue line illustrates the latency of the experimentally measured POR. The dotted red line shows the time at which the maximum of the smoothed derivative is first reached (within a 2% tolerance). The left panel shows the POR latencies as a function of delay when the number of iterations is fixed (16). The right panel shows POR latencies when the delay is fixed to 16 ms and the number of iterations varies.
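The smoothing step described for panel C, and the latency criterion described for panel D, can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the 1-ms time step, the function names, and the sigmoidal toy signal standing in for the model output are all hypothetical. Only the kernel width (60 ms), its standard deviation (6 ms), and the 2% tolerance come from the caption.

```python
import numpy as np

def gaussian_derivative_kernel(width_s, sigma_s, dt):
    """First derivative of a unit-area Gaussian, sampled over width_s seconds."""
    half = int(round(width_s / (2 * dt)))
    t = np.arange(-half, half + 1) * dt
    g = np.exp(-t**2 / (2 * sigma_s**2))
    g /= g.sum()                     # discrete unit area
    return -t / sigma_s**2 * g       # analytic derivative, in units of 1/s

def smoothed_derivative(signal, dt, width_s=0.060, sigma_s=0.006):
    """Convolving with the Gaussian derivative smooths and differentiates.

    mode="same" zero-pads the signal, so values within half a kernel width
    of either end are edge artefacts and should be ignored.
    """
    kernel = gaussian_derivative_kernel(width_s, sigma_s, dt)
    return np.convolve(signal, kernel, mode="same")

def first_time_near_max(d, dt, tol=0.02):
    """Earliest time at which d comes within tol (fractional) of its maximum."""
    return np.argmax(d >= (1.0 - tol) * d.max()) * dt

dt = 0.001                                     # assumed 1-ms time step
t = np.arange(0, 0.3, dt)
toy = 1.0 / (1.0 + np.exp(-(t - 0.1) / 0.005)) # hypothetical onset at 100 ms

d = smoothed_derivative(toy, dt)
latency = first_time_near_max(d, dt)
r = np.corrcoef(d, toy)[0, 1]                  # Pearson correlation coefficient
print(f"latency: {latency * 1000:.0f} ms, Pearson r: {r:.2f}")
```

Taking the first time within a tolerance of the maximum, rather than the maximum itself, makes the latency estimate less sensitive to a broad or flat peak in the smoothed derivative. The correlation line only demonstrates the `np.corrcoef` call; in the caption the coefficient is computed between the smoothed derivative and the variance of A₃(t,l).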

Figure 10.

Minimum stimulus duration required to perceive a stable pitch sensation.

The solid blue line shows the perceptual results averaged over listeners, and the dotted red line shows the mean model predictions.
