Figure 1.
Illustration of the two vocoding strategies.
A) Magnitude of the filters for generating a mixture of one target and one masker source in the No-Pitch configuration. Time-frequency “chess” glimpses are shown in blue for the target and in red for the masker. The upper right insert shows example target and masker spectra, when the two sources were processed with this filter, averaged over a 10 ms time period (cf., vertical ellipse in 1A). Target and masker spectra generally have little mutual energetic overlap (compare peak energy levels of each source at 75 dB to intersection points between blue and red curves at 25 dB, a difference of 50 dB). The lower insert illustrates the shape of the time windowing that was applied over each 20 ms time period (cf., horizontal ellipse in 1A). B) and C) Side-by-side comparison of filters for a target-masker mixture in the Pitch and No-Pitch configurations. In both configurations, target and masker glimpses occupy mutually exclusive parts of the spectrum. Furthermore, because of the time windowing, in both the Pitch and the No-Pitch configuration, all target and masker glimpses within each 10-ms time slice are co-modulated with each other, enhancing perceptual fusion between all elements in the mixture. In the Pitch configuration, each source only occupies fields in half of the spectral channels (resulting in co-modulated glimpses with place pitch, i.e., a stripy filter pattern). In the No-Pitch configuration, glimpses from both sources can occupy all spectral channels, constrained so that within each 10-ms time slice each source only occupies half of the channels (resulting in co-modulated glimpses without place pitch, i.e., a chess board filter pattern).
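The channel allocation described in panels B and C can be sketched as a pair of binary time-frequency masks. This is only an illustration under stated assumptions, not the filters used in the study: the function name `glimpse_masks`, the strictly alternating assignment of channels, and the channel and slice counts are hypothetical, and the time windowing is not modeled.

```python
import numpy as np

def glimpse_masks(n_channels, n_slices, pitch):
    """Return boolean (target, masker) masks of shape (n_channels, n_slices).

    Assumption: channels are assigned in a strictly alternating pattern.
    The caption only states that each source occupies half of the channels
    in every time slice.
    """
    chan = np.arange(n_channels)[:, None]   # column vector of channel indices
    t = np.arange(n_slices)[None, :]        # row vector of time-slice indices
    if pitch:
        # Pitch: the target keeps the same channels in every slice (stripes).
        target = np.broadcast_to(chan % 2 == 0, (n_channels, n_slices)).copy()
    else:
        # No-Pitch: the assignment flips from slice to slice (chess board).
        target = (chan + t) % 2 == 0
    # The masker occupies exactly the cells the target does not.
    return target, ~target

stripes_target, stripes_masker = glimpse_masks(8, 6, pitch=True)
chess_target, chess_masker = glimpse_masks(8, 6, pitch=False)
```

In both cases the masks are mutually exclusive and each source holds half of the channels per slice; only in the chess pattern does each source visit every channel over time.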
Figure 2.
Acoustic properties of spatial cues.
Across-time average interaural time differences (ITDs, top row, A, C, E) and interaural level differences (ILDs, bottom row, B, D, F) as a function of frequency. A, B) Unprocessed speech with high-fidelity spatial cues, from Experiment 1. ITDs and ILDs are zero in the Front configuration and considerably greater than zero in the Side configuration (in each panel, dashed lines are above solid lines). C, D) Chess-vocoded speech with high-fidelity ITDs and ILDs, from Experiment 2. ITDs and ILDs are similar to those in the unprocessed condition (compare panels A and C, and B and D). E, F) Chess-vocoded speech with scrambled ITDs and high-fidelity ILDs, from Experiment 3. ITDs are close to zero in both the Front and the Side configuration (the ITD pattern differs from those in panels A and C). ILDs, however, are similar to those in the unprocessed and clean-ITD conditions.
Figure 3.
Performance from Experiment 1 with unprocessed speech as a function of target to masker broadband energy ratio (TMR).
Error bars show estimated 95% confidence intervals after correcting for between-listener variance. A) Percent correct for spatially separated sources (dashed line) and co-located sources (solid line). Performance was better in the spatially separated than in the co-located configuration, a phenomenon referred to as spatial release from masking. B) Spatial release from masking (SRM), i.e., the difference between the dashed and solid lines in panel A, expressed in z-units (see text for details on Data Analysis).
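As a rough illustration of expressing SRM in z-units, the sketch below assumes that percent correct is converted with the inverse standard normal CDF (a probit transform) before taking the separated-minus-co-located difference. The actual transform is the one described under Data Analysis in the text; the function names `to_z` and `srm_z` and the clipping value `eps` are hypothetical.

```python
from statistics import NormalDist

def to_z(percent_correct, eps=0.01):
    """Map percent correct to a z-score via the inverse normal CDF.

    Assumption: proportions are clipped to [eps, 1 - eps] so that scores
    of 0% or 100% do not map to infinite z-values.
    """
    p = min(max(percent_correct / 100.0, eps), 1.0 - eps)
    return NormalDist().inv_cdf(p)

def srm_z(pc_separated, pc_colocated):
    """SRM as the z-score difference: separated minus co-located."""
    return to_z(pc_separated) - to_z(pc_colocated)

# Hypothetical scores, not data from the study.
example_srm = srm_z(84.13, 50.0)
```

A positive value indicates better performance for spatially separated sources, and equal scores in the two configurations give an SRM of zero.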
Figure 4.
Performance from Experiment 2 with chess-vocoded speech with clean ITD cues as a function of TMR.
Error bars show 95% confidence intervals. A) Percent correct for spatially separated sources (dashed line) and co-located sources (solid line). B) SRM. Performance was consistently better in the spatially separated than in the co-located configuration. SRM was greater for the No-Pitch condition than for the Pitch condition.
Figure 5.
Performance from Experiment 3 with chess-vocoded speech with scrambled ITD cues as a function of TMR.
Error bars show 95% confidence intervals. A) Percent correct for spatially separated sources (dashed line) and co-located sources (solid line). B) SRM. Dashed and solid lines nearly overlap, and SRM is close to zero for both the No-Pitch and the Pitch condition.
Figure 6.
Comparison of SRM across experiments.
SRM at 0 dB TMR (data replotted from Figs. 3, 4, and 5). SRM is greater in the clean-ITD conditions than in the scrambled-ITD conditions. Across the two clean-ITD cases, more spatial release occurred when place-pitch cues were absent than when they were present.