Fig 1.
Cocktail party task.
(a) Participants were seated at the center of a 16-channel speaker array within an anechoic chamber. During the task, speakers were positioned at ear level (~130 cm) at a radial distance of 160 cm from the center of the head, with a speaker-to-speaker separation of ~20°. (b) Example stimulus presentation (2- and 4-masker-talker conditions). Participants were asked to recall the color, number, and perceived location of target callsign sentences from the CRM corpus [68]. Target location varied randomly from trial to trial, and the target was presented simultaneously with 0 to 4 concurrent masker talkers played either forward or time-reversed. (c) Example trial time course. After presentation of the CRM sentences, listeners reported the color-number combination of the target talker, its perceived location in the hemifield, and how many talkers they heard in the soundscape.
Fig 2.
Extended high frequency (EHF) hearing thresholds.
Audiograms for the left (LE) and right (RE) ears. Pure-tone average (PTA) thresholds in both the standard audiometric and EHF (9–20 kHz; yellow highlight) frequency ranges were well within normal hearing limits. Error bars = ±1 s.e.m.
Fig 3.
Cocktail party listening performance.
(a) Speech recognition declines with increasing masker count but is much poorer under informational/linguistic than under purely energetic masking (cf. forward vs. reverse masker directions). Dotted line = chance performance. (b) Owing to their added linguistic interference, forward maskers yield slower recognition than reverse maskers. (c) Listeners localized targets to within 2 speakers (40–60° error), with better localization during purely energetic masking. (d) Source monitoring. Listeners' source monitoring saturates: they report hearing only up to ~3 additional talkers despite up to 5 being present in the soundscape. Error bars = ±1 s.e.m.; ***p < 0.0001.
Fig 4.
Stimulus- and task-dependent changes in the strength of perceptual categorization.
Speech categorization and RT speeds under (a–b) 2AFC and (c–d) VAS labeling tasks. Note the sharper, more discrete categorization for CVs compared to vowels in the 2AFC (but not VAS) condition. For both tasks, RTs show the typical slowing near the perceptually ambiguous midpoint of the vowel (but not CV) continuum. VAS responses were 750 ms slower than 2AFC responses across the board. RTs are plotted normalized to the global mean to highlight token- and stimulus-related changes. Identification slopes reflect sqrt[abs(X - mean(X))]-transformed values. Error bars = ±1 s.e.m.; *p < 0.05.
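The two transformations named in this caption can be sketched in Python as follows. This is a minimal illustration, not the authors' analysis code: the function names and the list/dict data layout are hypothetical, and the RT step assumes that "normalized to the global mean" means subtracting the grand mean pooled across all conditions (dividing by it would be an equally plausible reading).

```python
from statistics import mean


def slope_transform(slopes):
    """Apply sqrt(|X - mean(X)|) to a list of per-listener identification slopes.

    Each slope is folded about the group mean and square-root compressed.
    The input layout (a flat list of slopes) is a hypothetical choice.
    """
    m = mean(slopes)
    return [abs(x - m) ** 0.5 for x in slopes]


def normalize_rts(rts_by_condition):
    """Express RTs relative to the grand mean across all conditions.

    ASSUMPTION: normalization = subtraction of the pooled (global) mean RT,
    so values above 0 are slower than average and values below 0 are faster.
    rts_by_condition maps a condition label to a list of RTs in ms.
    """
    all_rts = [rt for rts in rts_by_condition.values() for rt in rts]
    grand_mean = mean(all_rts)
    return {cond: [rt - grand_mean for rt in rts]
            for cond, rts in rts_by_condition.items()}
```

Subtracting a single global mean preserves the differences between tokens and stimulus types while removing the overall offset, such as the across-the-board slowing of VAS relative to 2AFC responses.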
Fig 5.
VAS ratings reveal stark individual differences in categorization, distinguishing “continuous” from “categorical” listeners.
Individual histograms show the distribution of each listener’s phonetic labeling for CV and vowel sounds. Discrete (categorical) listeners produce more binary categorization, with responses clustered near the endpoint tokens (e.g., S2). In contrast, continuous (gradient) listeners tend to hear the continuum in a gradient fashion (e.g., S16). Inset values show each listener’s Hartigan’s dip statistic [99], which quantifies the bimodality, and thus the categoricity, of each distribution: higher dip values = discrete categorization; lower values = continuous categorization. (inset) Dip values are similar for CVs and vowels, suggesting the dip is a reliable measure of listener strategy that is independent of speech material. Error bars = ±1 s.e.m.
Fig 6.
Gradient listeners are less susceptible to speech interference at the “cocktail party”.
(a) Speech recognition performance in the cocktail party task for discrete and continuous listeners. Listener strategy was determined by applying Hartigan’s dip statistic [99] to VAS labeling (i.e., Fig 5) to identify individuals with bimodal (categorical) vs. unimodal (continuous) response distributions. Release from masking was measured as the difference in recognition performance between forward and reverse masker conditions at each masker count. (b) Discrete/categorical listeners show less masking release during the speech cocktail party task than their continuous-listener peers. Error bars = ±1 s.e.m.; shading = 95% CI; *p < 0.05.
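The masking-release measure described in this caption can be sketched as a per-masker-count difference of mean recognition scores. This is a hypothetical illustration: the function name and data layout are assumptions, and so is the sign convention (reverse minus forward, so that positive values indicate release from informational masking), since the caption states only "the difference" between the two conditions.

```python
from statistics import mean


def masking_release(forward, reverse):
    """Release from masking at each masker count.

    forward, reverse: dicts mapping masker count -> list of recognition
    scores (e.g., proportion correct) under forward or time-reversed maskers.
    This layout is hypothetical.

    ASSUMPTION: release = mean(reverse) - mean(forward). Positive values mean
    recognition improved when the maskers were time-reversed, i.e., when
    their linguistic (informational) content was removed.
    """
    return {n: mean(reverse[n]) - mean(forward[n])
            for n in sorted(forward) if n in reverse}
```

Computing this separately for each listener group (discrete vs. continuous) would give the per-group release curves that panel (b) compares.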