Sounds in everyday environments tend to follow one another as events unfold over time. The tacit knowledge of contextual relationships among environmental sounds can influence their perception. We examined the effect of semantic context on the identification of sequences of environmental sounds by adults of varying age and hearing abilities, with an aim to develop a nonspeech test of auditory cognition.
The familiar environmental sound test (FEST) consisted of 25 individual sounds arranged into ten five-sound sequences: five contextually coherent and five incoherent. After hearing each sequence, listeners identified each sound and arranged them in the presentation order. FEST was administered to young normal-hearing, middle-to-older normal-hearing, and middle-to-older hearing-impaired adults (
FEST scores revealed a strong positive effect of semantic context in all listener groups, with young normal-hearing listeners outperforming other groups. FEST scores also correlated with other measures of cognitive ability, and for CI users, with the intelligibility of speech-in-noise.
Being sensitive to semantic context effects, FEST can serve as a nonspeech test of auditory cognition for diverse listener populations to assess and potentially improve everyday listening skills.
When sensory information is degraded due to external signal distortions or peripheral sensory limitations, listeners tend to increasingly rely on contextual information to maximize the accuracy of perceptual decisions [
In daily life, sounds rarely occur in isolation. The tacit knowledge of how they relate to one another informs the listener about what is happening in their environment, with significant implications for individual safety and quality of life [
As an ontologically broader class of ecologically relevant sounds, environmental sounds are different from speech in that they are nonlinguistic in nature. With the exception of alarms or other electronically synthesized sounds such as auditory icons specifically constructed to convey information, environmental sounds represent unintentional byproducts of distal events [
Previous research with normal-hearing young adults indicates that the identification of individual environmental sounds in an auditory scene can be affected by their contextual relationships. Similar effects have been previously found in the perception of objects in visual scenes [
Furthermore, when asked to categorize environmental sounds, adult listeners tended to group them based on semantic relationships that could either include abstract object properties or draw on meaningful activities the sounds represent in everyday life, e.g., ‘getting the groceries’ [
Another recent investigation of how identification of individual sounds forming auditory scene-like sequences is affected by the contextual relationships among the sounds was undertaken by Risley and colleagues [
Aging has been shown to be associated with a decline in cognitive abilities, including working memory and attention [
Evidence from speech research that has evaluated the effect of semantic context indicates that in adverse listening environments older and hearing-impaired adults tend to rely on context to a greater extent than younger or normal-hearing listeners [
Cochlear implants (CIs) have provided a highly effective intervention for hearing-impaired individuals who do not benefit from other types of sensory aids. However, a number of perceptually salient acoustic features are removed (e.g., temporal fine structure) or severely degraded (e.g., spectral resolution) by the signal processing of the implant [
The present study was designed, first, to develop a clinical test of auditory cognition based on nonlinguistic environmental sound stimuli—a test suitable for use in both clinical and research settings with diverse patient populations. The second goal was, using this test, to extend previous investigation of context effects in environmental sound perception by examining the role of aging, hearing impairment and cochlear implants. To that end, in Experiment 1, a short test of environmental sound sequences was constructed from the stimuli used in previous environmental sound studies [
This experiment investigated the influence of aging and hearing loss on the involvement of semantic context in environmental sound perception. Based on findings from earlier work with young normal-hearing (YNH) adults [
All methods were approved by the Institutional Review Board of the Rush University Medical Center, and all participants provided written informed consent.
Upon completing a background audiometric assessment, subjects completed the environmental sound test. Participants in the middle-to-older aged normal-hearing (MON) and the middle-to-older aged hearing-impaired (MOI) groups were also assessed for speech perception in noise, cognitive status and working memory ability. Environmental sounds were presented diotically at 75 dB SPL. The levels used in speech testing are described below. All auditory testing was conducted in a double-walled soundproof booth using Etymotic ER-3A insert earphones. After completing the cognitive tests, subjects received audiometric evaluation and speech testing, and concluded with the environmental sound tests.
Environmental sound perception was tested with the following instruments. The Familiar Environmental Sound Test—Identification (FEST-I) comprised sounds selected from a previously developed large-item test of environmental sound perception [
Sound Name | Duration (sec) | Sequence Name (C) | Position (C) | Sequence Name (I) | Position (I) |
---|---|---|---|---|---|
alarm ringing | 4.74 | Waking up | 3 | INC05 | 3 |
barking | 1.23 | House visitor | 2 | INC03 | 3 |
birds chirping | 1.18 | Waking up | 5 | INC02 | 4 |
fog horn | 3.97 | Ocean side | 2 | INC02 | 2 |
tires screeching | 1.37 | Car accident | 2 | INC01 | 2 |
busy signal | 2.55 | Phone call | 5 | INC05 | 5 |
crashing | 4.85 | Car accident | 4 | INC03 | 5 |
dialing | 5.26 | Phone call | 4 | INC04 | 1 |
dial tone | 2.09 | Phone call | 3 | INC02 | 3 |
doorbell | 1.93 | House visitor | 1 | INC04 | 4 |
door closing | 2.06 | House visitor | 4 | INC02 | 1 |
driving | 2.17 | Car accident | 1 | INC02 | 5 |
honking | 0.92 | Car accident | 3 | INC04 | 2 |
dog panting | 2.18 | House visitor | 5 | INC05 | 2 |
phone ringing | 2.93 | Phone call | 1 | INC03 | 2 |
pickup receiver | 0.56 | Phone call | 2 | INC01 | 4 |
rooster | 1.64 | Waking up | 2 | INC04 | 3 |
seagulls | 1.98 | Ocean side | 1 | INC05 | 4 |
police siren | 3.88 | Car accident | 5 | INC05 | 1 |
snoring | 3.72 | Waking up | 1 | INC03 | 4 |
splash | 1.96 | Ocean side | 5 | INC03 | 1 |
trotting | 6.58 | House visitor | 3 | INC01 | 5 |
footsteps | 5.04 | Ocean side | 4 | INC04 | 5 |
waves crashing | 2.55 | Ocean side | 3 | INC01 | 3 |
yawning | 2.38 | Waking up | 4 | INC01 | 1 |
The 25 sounds of FEST-I and their durations, along with each sound’s position in the coherent (C) and incoherent (I) sequences of FEST-S. All environmental sound stimuli are available online in wav format and can be downloaded from
Familiar Environmental Sound Test—Sequences (FEST-S)—the 25 individual sounds used in FEST-I were arranged into two separate sets of five sequences, with each sequence composed of five individual sounds (
Individual FEST-S sequences were presented in random order. After hearing all five sounds in each sequence, subjects first rated its contextual coherence, i.e., how likely the sounds in the sequence were to be heard at the same place and time. Subject responses were entered using a slider with minimum (0.0) and maximum (1.0) scale values representing “extremely unlikely” and “extremely likely,” respectively. Next, subjects selected the names of the sounds from the 25 names on the screen and arranged them in the order they were presented in the sequence. Subjects were free to begin the sound identification task from any position in the sequence between the first and last sound. Sequence presentation was subject-paced. Two practice trials, one with a coherent and one with an incoherent sound sequence, were completed as many times as necessary to ensure understanding of the task (typically once or twice). The environmental sounds of the practice trials were not used in the scored FEST-S testing. Prior to testing, each subject read the list of 25 environmental sounds out loud to ensure familiarity with all sound name options and their locations on the display. On average, FEST-S administration took about 12-15 minutes. The Matlab software package used for presenting FEST stimuli can be downloaded from
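For readers who wish to implement a similar protocol, the sketch below illustrates how a single FEST-S trial and its two responses (the coherence rating and the ordered identification) might be represented. This is a hypothetical Python illustration, not the Matlab package referenced above; the class and field names are invented for this example, and the example sequence is the "Waking up" sequence from the stimulus table.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FestTrial:
    """One FEST-S trial: a five-sound sequence plus the listener's responses.

    Illustrative data structure only; the actual test was administered with
    the authors' Matlab package.
    """
    sequence_name: str             # e.g., "Waking up" or "INC03"
    coherent: bool                 # True for contextually coherent sequences
    presented_sounds: List[str]    # five sound labels in presentation order
    coherence_rating: float = 0.0  # slider value, 0.0 ("extremely unlikely") to 1.0
    response_order: List[str] = field(default_factory=list)  # labels in reported order

    def record_rating(self, slider_value: float) -> None:
        # Clamp to the 0.0-1.0 slider range used in the experiment.
        self.coherence_rating = min(max(slider_value, 0.0), 1.0)

    def record_identification(self, chosen_labels: List[str]) -> None:
        # The listener picks five of the 25 available labels and orders them.
        assert len(chosen_labels) == len(self.presented_sounds)
        self.response_order = list(chosen_labels)


# Example usage with one coherent sequence from the stimulus table
trial = FestTrial("Waking up", True,
                  ["snoring", "rooster", "alarm ringing", "yawning", "birds chirping"])
trial.record_rating(0.9)
trial.record_identification(
    ["snoring", "rooster", "alarm ringing", "birds chirping", "yawning"])
```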
Environmental sound context effects assessed by FEST-S involve manipulation of individual sounds in working memory. Although FEST-S is administered in the auditory modality and thus involves auditory working memory, it could also be expected that manipulations across semantic categories may involve general modality-independent aspects of working memory. To test this possibility, MON and MOI subjects were given two working memory tests, one delivered through the auditory and the other through the visual modality.
Letter-Number Sequencing (LNS), presented to subjects aurally, is a subscale of the WAIS test [
The Reading Span (RS) test assesses parallel operations of both memory storage and semantic processing abilities [
The Montreal Cognitive Assessment (MoCA) [
Two tests were used to evaluate speech perception in noise. The first assessed speech perception in terms of the intelligibility of sentences from the Quick Speech-in-Noise Test (QuickSIN) in the presence of a four-talker speech-babble masker [
The second measure was based on the Speech in Noise Test—Revised (SPIN-R) [
Three groups of listeners participated in Experiment 1. The first group, young normal-hearing (YNH) listeners, consisted of 15 young adults (five males; age range: 21–28 yrs; mean: 24.4 yrs) who had normal audiometric thresholds (≤15 dB HL re: ANSI 2004) for the octave frequencies between 0.25–4.0 kHz. The second and third groups consisted of middle-to-older aged listeners, with the groups distinguished by the pure-tone average (PTA) of audiometric thresholds between 0.25–4.0 kHz for their better hearing ear. There were 19 participants in the MON group and 11 participants in the MOI group. The subjects in the MON group (5 males; age range: 54–78 yrs; mean: 63.1 yrs) had an average PTA of 16.6 dB HL (
Audiometric thresholds for each ear for middle-to-older normal-hearing (MON) and middle-to-older hearing-impaired (MOI) participants. The error bars represent 1 standard error, shown on one side of each curve for better visibility.
Unlike FEST-S, which was administered to all participants, FEST-I was administered to only 17 subjects from the MON and MOI groups (11 MON and six MOI). It was added as an extra precaution after the initial 13 subjects in these groups had already been tested, to confirm high identification of the individual sounds in older listeners. Although performance on coherent and incoherent FEST-S sequences is based on the same set of individual sounds, which prevents individual sound identification from confounding performance differences between coherent and incoherent sequences, FEST-I scores were obtained to verify high identification accuracy for individual sounds in the two older groups. FEST-I was not administered to the YNH group because previous studies consistently indicated high identification accuracy (above 90%) with these test sounds.
Environmental sound results were analyzed to determine the role of semantic context in the perception of individual environmental sounds and to examine the relationships of environmental sound perception with working memory and speech perception abilities. Initially, the coherence ratings for all sound sequences were examined in all groups to confirm the classification of sequences into the two categories. Next, individual sound identification scores obtained in FEST-I were examined in a sample of listeners in the MON and MOI groups to verify their familiarity with the test stimuli. In turn, results from the environmental sound sequence test, FEST-S, were evaluated using three outcome metrics: labels correct (LC), order correct (OC) and sequence correct (SC). These three metrics applied to the same listener responses, but differed in how stringently the responses were evaluated. For the LC metric, a response was counted as correct if the label chosen corresponded to any of the five sounds in the corresponding sequence. For the OC metric, a response was counted as correct only if a correct response label was placed in the correct position for the corresponding sound in the sequence. For the SC metric, a response was counted as correct only if all five sounds in a sequence were correctly labeled and each label was placed in the correct order of sound presentation. Thus, for the LC and OC metrics, there were 25 scored responses per condition (five sounds times five sequences), while for the SC metric, a single binary score was derived from the response to each trial. Finally, correlation analysis was performed to examine potential relationships among recognition of environmental sound sequences, speech-in-noise intelligibility, and working memory ability. Percent-correct scores from the FEST-I, FEST-S, and SPIN-R tests were submitted to an arcsine transform before analysis.
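To make the scoring rules concrete, the following Python sketch scores one sequence response under the LC, OC, and SC metrics as defined above and applies a standard arcsine transform for proportions, 2*arcsin(sqrt(p)). It is an illustrative reimplementation under the stated definitions, not the authors' analysis code, and the exact arcsine variant used in the study is not specified.

```python
import math
from typing import List, Tuple


def score_sequence(presented: List[str], response: List[str]) -> Tuple[int, int, int]:
    """Score one five-sound sequence under the three FEST-S metrics.

    LC (labels correct): a response label counts if it matches any sound
        presented in the sequence, regardless of position.
    OC (order correct): a response label counts only if it matches the sound
        presented at that position.
    SC (sequence correct): 1 only if all five labels are in the correct order.
    """
    lc = sum(1 for label in response if label in presented)
    oc = sum(1 for label, target in zip(response, presented) if label == target)
    sc = int(oc == len(presented))
    return lc, oc, sc


def arcsine_transform(p: float) -> float:
    """One common variance-stabilizing transform for proportions in [0, 1]."""
    return 2.0 * math.asin(math.sqrt(p))


# Example: "Waking up" sequence with the last two sounds transposed in the response
presented = ["snoring", "rooster", "alarm ringing", "yawning", "birds chirping"]
response = ["snoring", "rooster", "alarm ringing", "birds chirping", "yawning"]
lc, oc, sc = score_sequence(presented, response)          # -> 5, 3, 0
print(lc, oc, sc, round(arcsine_transform(oc / 5), 3))    # transform of the OC proportion
```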
All three groups rated the sounds of the semantically coherent sequences as being more likely to be heard in the same place or time than those in the incoherent sequences (
Sequence type | YNH (Exp. 1) | MON (Exp. 1) | MOI (Exp. 1) | CIV (Exp. 2) | CI (Exp. 2) |
---|---|---|---|---|---|
Coherent | 0.92 (0.03) | 0.78 (0.05) | 0.74 (0.06) | 0.69 (0.04) | 0.79 (0.06) |
Incoherent | 0.24 (0.06) | 0.38 (0.05) | 0.42 (0.08) | 0.26 (0.03) | 0.33 (0.07) |
Average ratings of sound sequences by listener groups in both experiments: young normal-hearing (YNH), middle-to-older normal-hearing (MON), and middle-to-older hearing-impaired (MOI) listeners, YNH subjects listening through vocoder-simulated implants (CIV), and cochlear implant users (CI). Rating values increase with perceived coherence of the FEST sequences. Standard errors are shown in parentheses.
Analysis of individual sound identification responses, conducted on the subset of 17 MON and MOI listeners, indicated that listeners in both groups were able to identify the 25 individual sounds in the test with high accuracy. The overall identification rate was 88.5% correct (
Analysis of context effects revealed that all three groups were able to benefit from contextual relationships among sounds in semantically coherent environmental sound sequences (
As box plots, performance accuracy of each group for contextually coherent (open boxes) and incoherent sequences (gray boxes) for each of the three scoring metrics: Labels Correct (top), Order Correct (middle), and Sequence Correct (bottom). The line through each box is the median; the upper and lower box edges indicate the 25th and 75th percentiles, with error bars showing the 10th and 90th percentiles. Note that with the most stringent scoring metric, Sequence Correct, many listeners across groups did not respond correctly to any of the incoherent sequences, skewing the group distribution of scores. Consequently, the line displayed at 0% correct may represent the group median as well as the 25th and 75th percentiles of the performance distribution.
FEST-S scores averaged across all 10 FEST sequences for each group, without division by contextual coherence, indicated differences in performance between the YNH controls and both the MON and MOI groups. However, these differences emerged only with the more stringent metrics: OC and SC. With the most lenient scoring metric, LC, all three groups performed similarly on FEST-S overall: 77.6% (
The overall magnitude of the context effect was greater for YNH than MON and MOI subjects, whose performance, in turn, was remarkably similar. Furthermore, with the more stringent response metrics of OC and SC, the magnitude of context effects was smaller for MON and MOI subjects, compared to that with LC scoring. In contrast, for YNH listeners, the magnitude of the context effects nearly doubled with the SC metric compared to the LC or OC metrics (
Overall, MON and MOI listeners demonstrated highly comparable results on the tests of cognitive abilities and speech perception in noise. Independent t-tests between the two groups revealed no significant differences on any of the tests. Both groups had very similar average scores on MoCA [MON: 26.5 (
Evaluation of the association between environmental sound sequence perception and tests of cognitive status and speech-in-noise abilities was conducted with multiple linear regression models that controlled for age and hearing sensitivity. Separate models were run for results obtained with coherent and incoherent sequences of the FEST-S protocol. Only one of the three FEST-S scoring metrics was used in the analysis since the three metrics are based on the same underlying data and are not independent. The OC metric was chosen because of the broader range in performance it provides compared to either the more or less stringent measures. The analysis was conducted with the MOI and MON groups combined due to lack of significant differences in their FEST-S performance.
For coherent sequences, the regression model was statistically significant [
Independent Variable | Estimate | SE | 95% CI (lower/upper) | β | p | Squared Bivariate Correlation | Squared Semi-Partial Correlation |
---|---|---|---|---|---|---|---|
 | .000 | .007 | -.013/.014 | .014 | .950 | .057 | >.001 |
 | .005 | .004 | -.003/.014 | .256 | .195 | .024 | .034 |
 | .014 | .023 | -.033/.062 | .123 | .535 | .191 | .008 |
 | -.105 | .319 | -.768/.558 | -.074 | .745 | .079 | .002 |
 | .533 | .298 | -.087/1.153 | .376 | .088 | .124 | .060 |
 | .050 | .018 | .013/.087 | .625 | .010 | .364 | .150 |
 | .020 | .016 | -.013/.053 | .241 | .215 | .242 | .031 |
 | .000 | .003 | -.007/.007 | .075 | .941 | .191 | >.001 |
Relation of coherent-sequence performance for identifying sounds in the correct order to age, auditory and cognitive abilities for older listeners with and without hearing loss. Table entries are the estimated coefficient, standard error (
Independent Variable | Estimate | SE | 95% CI (lower/upper) | β | p | Squared Bivariate Correlation | Squared Semi-Partial Correlation |
---|---|---|---|---|---|---|---|
 | .001 | .006 | -.012/.015 | .047 | .817 | .028 | >.001 |
 | .002 | .004 | -.006/.010 | .087 | .636 | .008 | .004 |
 | .014 | .022 | -.031/.059 | .116 | .536 | .218 | .007 |
 | -.319 | .303 | -.949/.312 | -.224 | .305 | .048 | .019 |
 | .325 | .284 | -.265/.916 | .228 | .264 | .116 | .022 |
 | .045 | .017 | .010/.080 | .559 | .015 | .350 | .120 |
 | .039 | .015 | .008/.070 | .469 | .016 | .367 | .117 |
 | .002 | .003 | -.005/.008 | .095 | .628 | .180 | .004 |
Relation of incoherent-sequence order-correct performance to age, auditory and cognitive abilities for older listeners with and without hearing loss. Table entries are the estimated coefficient, standard error (
As exploratory analyses, the linear regression models included covariates that assessed either similar abilities (e.g., SPIN Low and QuickSIN) or interrelated processing (e.g., involvement of working memory in speech-in-noise performance). Consequently, collinearity among model covariates was anticipated. This collinearity is illustrated for each independent variable by comparing the squared bivariate correlation and the squared semi-partial correlation. In both models, the semi-partial correlations are much lower (see Tables
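For readers unfamiliar with this comparison, a predictor's squared semi-partial correlation is the reduction in the model's R-squared when that predictor is removed, whereas its squared bivariate correlation ignores the remaining covariates. The short Python sketch below illustrates the distinction with simulated data and invented variable names; it is not the study's analysis code.

```python
import numpy as np


def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """R^2 of an ordinary least-squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()


def squared_semipartial(X: np.ndarray, y: np.ndarray, j: int) -> float:
    """Increment in R^2 uniquely attributable to predictor column j."""
    full = r_squared(X, y)
    reduced = r_squared(np.delete(X, j, axis=1), y)
    return full - reduced


# Toy example with correlated predictors (variable names are illustrative only).
rng = np.random.default_rng(0)
n = 30
moca = rng.normal(size=n)
lns = 0.6 * moca + rng.normal(scale=0.8, size=n)       # collinear with moca
fest_oc = 0.5 * moca + 0.3 * lns + rng.normal(size=n)  # simulated outcome

X = np.column_stack([moca, lns])
sq_bivariate = np.corrcoef(moca, fest_oc)[0, 1] ** 2
sq_semipartial = squared_semipartial(X, fest_oc, j=0)
print(round(sq_bivariate, 3), round(sq_semipartial, 3))  # semi-partial is smaller
```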
Tables
Component | Variance explained (%) |
---|---|
24.8 | |
10.0 | |
8.5 | |
7.5 | |
7.2 | |
6.3 |
Unique and shared variance components contributing at least 6% to variance explained in the linear regression model of
Component | Variance explained (%) |
---|---|
18.6 | |
18.1 | |
10.2 | |
7.9 | |
7.1 | |
7.1 | |
6.2 |
Unique and shared variance components contributing at least 6% to variance explained in the linear regression model of
Overall, the regression analyses indicated that MoCA, the summary measure of cognitive status, was significantly associated with FEST-S performance for both coherent and incoherent sequences. A significant association with auditory working memory, as assessed by LNS, was obtained only for the incoherent sequences of the FEST-S protocol. This effect of sequence type on the relationship to working memory is consistent with the absence of sequence context leading to greater memory demands. Though both environmental sound sequence recognition and speech-in-noise processing may involve related aspects of working memory, the absence of a significant association between FEST-S and speech performance may in part reflect collinearity among the metrics. Finally, no significant associations were found between FEST-S scores and either hearing sensitivity or age, despite the relatively broad age range of the older participants in the analyses.
This experiment investigated the effects of semantic context on the perception of environmental sounds in listeners with cochlear implants. To separate contributions of CI processing-related distortions from those of other user variables that can influence the involvement of semantic context, two groups were examined: YNH adults tested with CI vocoder simulations (CIV) and older experienced CI users. Prior to taking FEST-S, both groups practiced with FEST-I to achieve a similar level of group performance in identification of the individual sounds. Based on the findings of Experiment 1 in which YNH listeners outperformed MOI and MON listeners in terms of the beneficial effect of context, it was expected that the magnitude of the context effect would be greater for the YNH listeners in the CIV group than for the older adult CI users.
All methods were approved by the Institutional Review Board of the Rush University Medical Center, and all participants provided written informed consent.
Experiment 2 closely followed the procedures of Experiment 1 except for the following modifications. Prior to testing listeners with sequences in FEST-S, both groups practiced with the individual environmental sounds in FEST-I. The practice consisted of initial FEST-I testing which was followed by three additional repetitions of FEST-I during which the 25 sounds were presented in random order with feedback. During these practice runs, if an individual sound was not identified correctly, the correct sound name appeared and the subject was required to listen to the sound three times before the next sound could be played. Following the three practice runs, during one final administration of FEST-I, subjects demonstrated moderately high identification accuracy with the individual environmental sounds: mean 75% correct (range 40–100%) for the CIV group, and mean 85% correct (range 69–100%) for the CI users.
Experiment 2 used BKB-SIN [
All FEST stimuli were presented to CIV subjects diotically via Sennheiser HD250 headphones in a sound-treated room at 75 dB SPL. Prior to presentation to CIV listeners, all FEST stimuli were modified with a vocoder to simulate effects of CI processing. Their spectral resolution was reduced to four frequency bands, using the spectral-degradation techniques of previous studies [
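For illustration, a basic noise-excited vocoder of the kind referred to here can be sketched as follows. This Python example makes its own assumptions about band edges, filter order, and envelope extraction (many implementations also low-pass filter the envelope), and is only an approximation of the cited technique, not the exact processing applied to the FEST stimuli.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert


def noise_vocode(signal: np.ndarray, fs: float, n_bands: int = 4,
                 f_lo: float = 100.0, f_hi: float = 6000.0) -> np.ndarray:
    """Basic noise vocoder; band edges and filter settings are assumptions."""
    # Logarithmically spaced band edges between f_lo and f_hi.
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)
    rng = np.random.default_rng(0)
    out = np.zeros_like(signal, dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, signal)
        envelope = np.abs(hilbert(band))        # temporal envelope of the band
        carrier = rng.standard_normal(len(signal))
        carrier = sosfiltfilt(sos, carrier)     # noise limited to the same band
        out += envelope * carrier               # envelope-modulated noise band
    # Match the RMS of the input so the presentation level is preserved.
    out *= np.sqrt(np.mean(signal ** 2) / (np.mean(out ** 2) + 1e-12))
    return out
```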
CI users were similarly tested in a sound-treated booth. However, all FEST and speech stimuli were presented to them unprocessed in a sound field at 70 dB SPL at the position of the listener's head. Sounds were presented through a single loudspeaker positioned at 45 degrees to the implanted ear of each participant, who was seated one meter away. The lower presentation level used with CI than with CIV listeners was chosen to minimize potential input distortions that could have resulted from the automatic gain control of the CIs. Furthermore, the nonimplanted ear was occluded with an E-A-R Classic-AQ10 foam earplug (NRR 29 dB) to avoid potential residual hearing effects from the contralateral side.
Two groups of listeners participated in Experiment 2. One group included 19 young adults (two males; age range: 20–26 yrs; mean 23 yrs) with normal audiometric thresholds (≤15 dB HL). The second group included eight postlingually deafened experienced CI users (three males; age range: 25–68; mean 54 yrs) with the average implanted-ear four-tone PTA (0.5, 1.0, 2.0, 4.0 kHz) equal to 28.1 dB (
Mean (SD) | Median | Range | |
---|---|---|---|
54.2 (12.8) yrs | 56.5 yrs | 25.0–68.0 yrs | |
43.2 (12.3) yrs | 45.0 yrs | 16.0–53.5 yrs | |
50.5 (13.6) yrs | 52.0 yrs | 21.0–66.0 yrs | |
3.6 (2.5) yrs | 3.0 yrs | 1.3–9.0 yrs | |
27.71 dB HL (8.35) | 27.5 dB HL | 15–43.3 dB HL | |
26.88 points (2.9) | 27.5 points | 21–30 points | |
10.63 points (2.77) | 11 points | 6–15 points | |
7.8 dB SNR 50 (4.85) | 5.7 dB SNR 50 | 3–16 dB SNR 50 |
Characteristics of the cochlear implant (CI) users of Experiment 2, along with audiometric and cognitive test results.
Following the analysis of ratings of sequence coherence, FEST-S scores obtained from CIV and CI participants were analyzed using the same three scoring metrics as in Experiment 1, with percent-correct scores submitted to an arcsine transform before data analysis. In addition, accuracy scores of CIV and CI participants, along with the three groups from Experiment 1, were evaluated to examine the effect of sound serial position within sequences on identification accuracy (i.e., primacy and recency effects). Next, response timing for the rating and identification tasks was examined to gain further insight into the effect of coherence in different subject groups. Lastly, a correlational analysis was conducted to evaluate possible associations between FEST-S and tests of working memory and cognition.
Both CIV and CI subjects rated semantically coherent sequences as being more likely to occur at the same place or time than semantically incoherent ones (0.69 and 0.79 vs. 0.26 and 0.33, respectively). These ratings closely correspond with those obtained from the MON and MOI listeners in Experiment 1 (
As can be seen in
Additional analyses compared the performance of CI users with that of MON and MOI listeners, who, on average, provided a closer age match than did the CIV group, and who also performed similarly on the tests of cognitive abilities (i.e., MoCA and LNS). With one exception, CI users were middle-aged to older adults, overlapping with the age range of the participants in the MON and MOI groups of Experiment 1. The MOI and MON subjects were combined into a single group due to the lack of significant differences in their FEST-S performance, and were compared with CI users in six independent-samples t-tests (one for coherent and one for incoherent sequences for each scoring metric—LC, OC, SC). Across tests, there were no significant effects (
Comparison between YNH and CIV listeners, two similar groups distinguished by the signal processing performed on the environmental sounds presented to the CIV group, indicated that sensory degradation introduced by cochlear implants can impede the processing of the semantic information in auditory scenes. Although both groups were able to benefit from semantic context in the coherent sound sequences, there was a general trend for YNH listeners to perform better than CIV listeners in all conditions. Six separate independent-samples t-tests (two sequence types by three scoring metrics) with a Bonferroni correction showed significant differences,
Identification of individual sounds in FEST-S sequences was further examined in terms of their serial position in the sequence. This analysis was performed for the OC metric only since it provided the largest performance range across groups and was deemed most informative. Differences in sound identification accuracy were expected to follow a typical U-shaped function which reflects superior recall of items occurring earlier and later in the sequence (i.e., primacy and recency effects) [
Interestingly, there were also clear differences in position order effects between coherent and incoherent sequences. Overall, sounds in incoherent sequences were recalled less accurately than those in coherent sequences. However, the differences in accuracy between coherent and incoherent sequences varied with sound position (
Panels display performance accuracy of each group for all five serial positions of the individual environmental sounds comprising coherent and incoherent sequences. With some exceptions in CIV and CI groups, better performance for environmental sounds that occur early and late in the sequence can be seen with both coherent and incoherent sequences. Performance for coherent sequences is also generally better than incoherent sequences. Notably, CI users demonstrate the recency effect only for the coherent sequences, in which they could use contextual information, while the effect is absent for incoherent sequences. In all other groups, recency effects are evident for both coherent and incoherent sequences.
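Serial-position curves of this kind can be computed directly from position-wise order-correct scores, as in the brief Python sketch below, which uses hypothetical response data rather than the study's analysis code. The last-position difference between coherent and incoherent curves discussed below can be read directly from such output.

```python
import numpy as np


def position_accuracy(presented_trials, response_trials):
    """Order-correct accuracy at each of the five serial positions.

    presented_trials, response_trials: lists of 5-label lists of equal length.
    Returns an array of proportions; index 0 is the first sound in the sequence.
    """
    presented = np.array(presented_trials)
    responses = np.array(response_trials)
    return (presented == responses).mean(axis=0)


# Hypothetical responses to two trials (labels shortened for brevity)
presented = [["s1", "s2", "s3", "s4", "s5"],
             ["t1", "t2", "t3", "t4", "t5"]]
responses = [["s1", "s2", "s4", "s3", "s5"],   # middle items transposed
             ["t1", "t3", "t2", "t4", "t5"]]
print(position_accuracy(presented, responses))  # e.g., [1. 0.5 0. 0.5 1.]
```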
It could be proposed that the difference in identification accuracy for the last sound of coherent versus incoherent sequences offers a simple measure of how much listeners benefit from the preceding context: the greater the difference, the more the semantic context provided by the earlier sounds in the sequence has contributed. As can be seen in
Additional analyses were conducted to evaluate the effect of coherence on the speed of performing the rating and identification tasks in each group. Generally, responses to coherent sequences were faster than responses to incoherent sequences in both tasks (
Task and sequence type | YNH (Exp. 1) | MON (Exp. 1) | MOI (Exp. 1) | CIV (Exp. 2) | CI (Exp. 2) |
---|---|---|---|---|---|
Rating, coherent | 2.61 (.31) | 5.48 (.73) | 5.64 (.73) | 2.79 (.29) | 6.86 (.76) |
Rating, incoherent | 3.15 (.37) | 5.6 (.52) | 7.78 (.71) | 3.09 (.33) | 7.62 (1.16) |
Identification, coherent | 35.39 (2.75) | 85.81 (8.91) | 74.99 (6.71) | 33.35 (2.38) | 57.38 (11.38) |
Identification, incoherent | 41.49 (4.26) | 87.77 (8.28) | 84.0 (9.7) | 40.8 (3.94) | 77.22 (10.33) |
Average time in seconds taken to complete the rating and identification tasks for listener groups in both experiments: young normal-hearing (YNH), middle-to-older normal-hearing (MON), and middle-to-older hearing-impaired (MOI) listeners, YNH subjects listening through vocoder-simulated implants (CIV), and cochlear-implant users (CI). Standard errors are shown in parentheses.
Summary descriptions of scores on the tests of cognitive status, working memory, and speech-in-noise ability for CI users are listed in
Correlation analyses were conducted to examine if the use of semantic context in the perception of auditory scenes was associated with the speech-in-noise abilities of CI users. As in Experiment 1, only the OC metric was used to represent FEST-S performance. As shown in
 | Coherent Sequences | Incoherent Sequences |
---|---|---|
 | -.17 | -.47 |
 | -.25 | .04 |
 | -.80 | -.64 |
 | .53 | .67 |
 | .57 | .70 |
Pearson correlations of order-correct scores on coherent and incoherent sequences for cochlear implant (CI) listeners with age and audiometric results.
** indicates significance at
* indicates significance at
Successful navigation of real-world environments involves two interdependent perceptual skills: (1) identification of the objects and events in one’s vicinity, and (2) awareness of the relationships among these objects and events. For example, the sound of a barking dog may signal a coming visitor and could be preceded or followed by a doorbell. Honking may be followed by screeching tires, alerting listeners to the possibility of an accident. Running footsteps followed by a big splash may signal someone jumping into the water. These skills, ubiquitous in both visual and auditory modalities, aid people in constant monitoring of the environment, focusing of attention, and prediction of future events [
The better performance of YNH listeners may result from a combination of factors. In addition to better hearing sensitivity, YNH listeners typically demonstrate higher sensitivity to spectral and temporal variation, and greater cognitive capacity, than older adults [
Nevertheless, auditory working memory capacity as assessed by LNS, and a broader range of cognitive abilities, assessed by MoCA, may be involved in mediating the relationship between performance on FEST-S and speech perception in noise. Shared variance of these tests was a predictor of FEST-S scores for both coherent (
There are several additional factors that may have contributed to the strong facilitatory effects of semantic context found in the present study. The current sound identification procedure relied on word labels, and thus involved linguistic coding of stimuli names. Such explicit linguistic coding of the stimuli might have been enhanced since prior to identification, listeners rated the coherence of each stimulus sequence. Performing additional semantic operations on the stimuli during rating, which also further extended the time interval between stimulus presentation and identification of individual sounds, could conceivably increase the reliance on word labels for more efficient memory processing. Furthermore, the present results were obtained when all stimuli were presented in quiet. The addition of background noise, even without any identifiable semantic content, could also affect the strength of the context effect—a possibility consistent with prior research [
The magnitude of the context effect might have also been affected by the order of individual sounds in the coherent sequences. Some sequences of individual sounds may be more statistically probable or semantically coherent than others. For example, snoring may be more likely to be heard before a ringing alarm clock than after. Although present results do not provide any indication about possible contributions of sound order, manipulations of sound position within sequences may elucidate the mechanisms behind the facilitatory context effects. It is possible that the context effect was facilitated by a temporally unfolding script-like template for coherent sequences in which each consecutive sound provides a certain degree of priming for the immediately following sounds. Alternatively, if the semantic context effect is based on the relatively long lasting activation of semantic categories corresponding to all of the individual sounds in a given sequence, the order of sounds may not be as important.
Overall, the current results are in agreement with prior environmental sound studies in showing that semantic context can have a facilitatory effect on environmental sound perception [
Overall, based on the present findings, a brief environmental sound test, FEST, appears to be effective in detecting semantic context effects in listeners of varying age and hearing abilities. The three scoring metrics which differentially assess response accuracy provide further flexibility in scoring, and can be useful when examining semantic context effects in diverse listener populations. In the current study, all listener groups demonstrated robust use of semantic context in the perception of environmental sound sequences. At the least stringent level of assessment, LC, which did not take into account correct placement of sound in the sequence, the ability to identify sounds in short auditory scene-like sequences was not affected by any of the potentially detrimental factors: age, presbycusis or listening through a cochlear implant. Participants were able to utilize the tacit knowledge of probabilistic relationships among different environmental sounds forming the semantically coherent sequences. The use of semantic context thus provides an important advantage in the perception of environmental sounds. Furthermore, auditory working memory, along with other cognitive abilities, appears to play a role in maximizing performance in the perception of environmental sounds in auditory scenes.
The ability to utilize semantic context in auditory scenes may, however, be reduced in other listener populations. For instance, prelingual CI users who have not developed typical auditory cognitive capacity in childhood, or individuals with certain central-processing disorders or cognitive impairments, may have difficulty integrating information across the semantic categories associated with specific environmental objects and events. Finally, as a short instrument for the assessment of higher-order auditory cognitive abilities that relies on environmental sounds, FEST can be potentially useful in cognitive and auditory assessments of populations with limited command of the English language. To that end, present efforts are directed toward the development of a version of FEST that uses pictures rather than word labels for indicating subject responses, as well as a version with a variable number of environmental sounds per sequence to accommodate gradual perceptual learning. This gradation in working-memory load and semantic difficulty can increase its utility in the assessment of the auditory cognition of children and adults with limited literacy. Other applications may include aural rehabilitation programs to improve real-world listening skills in CI users and older adults, either with or without hearing loss.
We would like to thank Carly Shen, Hannah Tautz, Drew Price, Eric Moskus, Karson Glass and Madeleine Thomas for their help with stimulus selection and editing.