Abstract
Purpose
Talker identification is a crucial auditory skill that underpins human social communication and forensic applications. However, real-world conditions pose several challenges, such as environmental noise, channel variability, language familiarity, and talker familiarity, that can undermine the accuracy of auditory identification. In light of the limitations and insights from previous studies, the present study employed auditory experiments to systematically examine the impact of these four adverse factors on talker identification.
Methods
The study aimed to address two questions: (1) whether the independent and interactive effects among these factors are significant, and (2) whether lab-training can enhance talker identification accuracy. Using a voice line-up paradigm, this study conducted a perception experiment in which speech stimuli were presented under four primary conditions: noise (No Noise vs. Noise), channel (High-quality vs. High-quality; Landline vs. Landline; High-quality vs. Landline), language (Mandarin, Reversed Mandarin, English, Reversed English), and speaker familiarity (assessed through listening tests and lab-training). Auditory responses to the stimuli under these adverse conditions were collected from 53 listeners.
Results
The findings indicate that environmental noise and channel variability have significantly negative effects on talker identification, while intelligible speech yields superior performance under adverse conditions compared to unintelligible reversed speech. Furthermore, the study found that lab-training (i.e., increasing talker familiarity) could enhance talker identification accuracy under adverse conditions, although it did not improve accuracy under the no-noise and high-quality channel conditions.
Conclusion
This paper systematically examines the interactive effects of multiple adverse factors on talker identification, thereby advancing our understanding of the auditory mechanisms underlying human social speech communication and providing important theoretical support for auditory examination techniques in forensic speaker identification.
Citation: Fan N, Geng P, Li Z, Guo H (2026) Talker identification under adverse auditory conditions-The impacts of noise, channel, language, and familiarity. PLoS One 21(2): e0339396. https://doi.org/10.1371/journal.pone.0339396
Editor: Gauri Mankekar, LSU Health Shreveport, UNITED STATES OF AMERICA
Received: August 6, 2025; Accepted: December 7, 2025; Published: February 23, 2026
Copyright: © 2026 Fan et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data analyzed during the current study has been uploaded as supplemental material.
Funding: This research was supported by grants from the Ministry of Finance of the People’s Republic of China (GY2024G-5 to P.G.) and the Shanghai Education Science Research Project “Shanghai Universities Philosophy and Social Sciences Research Special Project” (2025ZSS007 to N.F.). No additional external funding was received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
1. Introduction
Talker identification is a crucial auditory skill underpinning human social communication, from the early recognition of caregivers in infancy to the complex interactions of adulthood [1,2], and extending to its significant role in forensic applications [3]. In forensic contexts, reliable voice identification is essential for judicial proceedings, drawing on both the naturalistic judgments of ear witnesses and the systematic evaluations conducted by forensic experts [3,4]. However, real-world conditions introduce several challenges, such as environmental noise, channel variability, language familiarity, and talker familiarity, that can undermine the accuracy of auditory identification. Consequently, examining the impact of these adverse auditory conditions is critical not only for refining cognitive models of speech processing but also for advancing effective and robust applications in both communicative and forensic settings.
1.1 Effect of noise on talker identification
Research has confirmed that environmental noise significantly disrupts critical acoustic cues [5,6], thereby potentially impairing talker identification accuracy. However, the number of such studies is limited, and the existing research yields inconsistent results regarding the impact of noise on talker identification. It has been reported that, in three distinct noise environments (i.e., speech-shaped noise, multi-talker babble, and a single, unfamiliar competing talker), identification accuracy declined as the signal-to-noise ratio (SNR) decreased across all noise conditions, with the most pronounced reduction occurring under multi-talker babble [7]. Similarly, Mamun et al. [8] found that both cochlear implant users and healthy controls experienced significant declines in talker identification accuracy when exposed to speech-shaped noise.
Other studies have highlighted a more complex influence of noise: while aged and hearing-impaired female listeners did not show significant changes under noise or competing-talker conditions, hearing-impaired male listeners were significantly affected [9,10]. Furthermore, Kanber et al. [11] found that in a four-talker babble environment, there was no significant difference in identification accuracy between familiar and unfamiliar talkers, with both conditions averaging around 80% accuracy. A review of previous studies also indicates that, regardless of whether the listener is normal-hearing or hearing-impaired, and regardless of talker familiarity, identification accuracy rarely exceeds 90% under various noisy conditions [8–11].
1.2 Effect of channel variability on talker identification
Channel variability is another common factor influencing daily speech communication and forensic talker identification (e.g., recordings from landline phones and high-definition mobile phones). One fundamental impact of landline phone use, for instance, is its limited frequency range of 400–3400 Hz [12]; this restriction can affect the transmission of crucial acoustic cues, such as F0 and formant frequencies below 400 Hz and above 3400 Hz [3,13], thereby potentially compromising accurate talker identification. However, only a few studies have examined this factor to date. One study found that channel (i.e., landline vs. mobile phones) significantly affected talker identification accuracy (approximately 74%), with its negative impact surpassing that of language and dialect (81%–86%) [3].
Moreover, the authors pointed out that research on multi-factor interactions in talker identification (e.g., various languages mixed with channel variability) remains extremely scarce. A more recent study further revealed that consonant-based talker identification is not affected by channel variability (i.e., full-band, telephone-band, and non-telephone-band recordings), whereas vowel-based talker identification is significantly influenced by the channel [14].
1.3 Effect of language familiarity on talker identification
The language familiarity effect is one of the most popular and controversial topics in talker identification research, and it is a key focus of the current study. The central debate in the extant literature concerns whether language intelligibility exerts an influence on talker identification. Specifically, researchers have questioned whether talker identification necessitates language comprehension [15] or whether it can be accomplished without an understanding of the language [16].
The argument in favor of language-independent talker identification originally emerged from early neuropathological research. For instance, patients with receptive aphasia, characterized by impaired language comprehension, can still recognize speakers, whereas patients with phonagnosia lose the ability to identify talkers despite intact language comprehension [17]. Fleming et al. [16] further substantiated this perspective through a perceptual experiment employing backward Chinese and English sentences. Their findings indicated that, although the reversed sentences were largely unintelligible, native English speakers did not exhibit significant cross-language differences in talker identification accuracy; in other words, enhanced familiarity with English phonology did not translate into improved identification performance. Another study reported a similar finding: no significant difference was observed in a talker similarity rating task based on forward and backward speech [18].
Conversely, using a paradigm similar to that of Fleming et al. [16], Perrachione et al. [19] reported results that strongly suggest talker identification is contingent upon language comprehension. In support of this view, several studies involving infants, individuals with dyslexia, and second-language learners have demonstrated that auditory talker identification is facilitated by language comprehension; that is, listeners are generally more adept at discriminating between speakers when the linguistic context is familiar [15,20–22]. Mary Zarate et al. [23] extended this line of inquiry by examining talker identification among native English speakers using a range of stimuli, including non-linguistic sounds, Chinese, German, pseudo-English, and English. Their results revealed a progressive improvement in identification accuracy correlating with increased language familiarity (non-linguistic < Chinese < German < pseudo-English < English). Similarly, other studies demonstrated that talker identification accuracy was significantly higher for rhyming word pairs (e.g., “day-bay”) compared to unrelated word combinations (e.g., “day-bee”), thereby underscoring the role of phonological familiarity [24,25].
1.4 Effect of speaker-familiarity/training on talker identification
Another factor influencing talker identification is speaker familiarity. In recent years, researchers have examined the impact of familiarity by comparing the performance of listeners with familiar versus unfamiliar speakers and by employing lab-training paradigms. The majority of studies report that listeners demonstrate significantly higher accuracy when identifying familiar voices compared to unfamiliar ones [26–30]. Nevertheless, even though familiar talkers are identified more accurately, listener performance is not invariably flawless [27,28]. One plausible explanation for the speaker familiarity effect is that listeners are able to extract distinctive acoustic features or leverage prior knowledge associated with familiar speakers [31].
Furthermore, the potential of lab-training to enhance talker identification accuracy has attracted attention only over the past two decades. Several investigations have demonstrated that perceptual training can lead to improvements in talker identification accuracy [32–34]. Kanber et al. [11] compared recognition accuracy among personally familiar voices, lab-trained voices, and unfamiliar voices, and found that brief training (i.e., 5–10 minutes) was sufficient to enhance identification performance. In contrast, another study reported that training does not consistently yield improvements in talker identification; specifically, training benefits observed with foreign-language talkers were restricted to the trained speaker set and did not generalize to novel foreign-language voices [35]. Similarly, McLaughlin et al. [36] found no significant enhancement in talker identification accuracy following training in conditions involving an unfamiliar language.
1.5 The present study
In summary, existing research on talker identification under adverse conditions (i.e., environmental noise, channel variability, language familiarity, and talker familiarity) remains limited in quantity, and the findings continue to be contentious. While the effects of these four adverse factors have been investigated individually, to the best of our knowledge their combined influence on talker identification has yet to be examined. Consequently, in light of the insights and gaps in the current literature, the present study aims to address two primary questions:
- (1). What are the individual and interactive effects of noise, channel variability, and language familiarity on talker identification?
- (2). Can lab-training designed to enhance talker familiarity improve talker identification accuracy under adverse conditions?
2. Method
The research was approved by the Committee for the Protection of Human Subjects (CPHS) at the Academy of Forensic Science (Shanghai, China) [No. 2023−15]. All participants were informed about the study’s purpose, provided written informed consent, and received financial compensation after completing the experiment. Participants were informed that they could withdraw from the experiment at any time. All participants were recruited between March and April 2025.
2.1 Participants
A preliminary power analysis was conducted via the pwr package in R [37,38]. It indicated that a sample size greater than 21.10 was needed to detect a large effect size (Cohen’s f = 0.4; [39]) with a significance level of 0.05 and statistical power of 0.80. Consequently, a total of 53 native Mandarin speakers (33 females, 20 males) participated in this study. All participants were undergraduate or graduate students recruited from universities in China. All participants used English as their second language and had passed the CET-4 (College English Test), indicating an intermediate level of English proficiency. Additionally, none of the participants had received professional auditory training (e.g., musical training) that might bias their auditory perception. The female participants had a mean age of 26.21 years (SD = 3.36), and the male participants had a mean age of 27.36 years (SD = 6.74). None of the participants reported a history of speech or hearing impairments. Upon completion of the study, participants were provided with appropriate financial compensation.
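The reported threshold can be reproduced from first principles. As a minimal sketch, assuming a one-way design with three groups (a hypothetical choice; the paper does not report the design passed to pwr), the noncentral-F computation underlying R's pwr.anova.test can be written as:

```python
# Sketch: required per-group n for a one-way ANOVA power analysis,
# mirroring the noncentral-F logic behind R's pwr::pwr.anova.test.
# ASSUMPTION: k = 3 groups is hypothetical; the paper does not state the design.
from scipy import stats

def anova_power(n_per_group, k=3, f=0.4, alpha=0.05):
    """Power of a one-way ANOVA via the noncentral F distribution."""
    N = n_per_group * k
    df1, df2 = k - 1, N - k
    nc = f**2 * N                              # noncentrality, lambda = f^2 * N
    f_crit = stats.f.ppf(1 - alpha, df1, df2)  # critical F at the alpha level
    return 1 - stats.ncf.cdf(f_crit, df1, df2, nc)

# Smallest integer n per group reaching 80% power
n = 2
while anova_power(n) < 0.80:
    n += 1
print(n, round(anova_power(n), 3))
```

Under this assumption, the continuous solution falls near the reported 21.10 per group, which the study's 53 participants comfortably exceed.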
2.2 Stimuli
Four female native speakers aged 31–38 years (SD = 3.16) were recruited to record the speech stimuli for this study. All speakers were fluent in standard Mandarin, used English as their second language, and had passed the CET-4, indicating intermediate English proficiency. Additionally, none of the speakers had a history of speech or hearing impairments.
As shown in Table 1, eight target sentences were constructed in both Chinese and English versions, with each sentence comprising 4–11 words. All speech stimuli were recorded in a sound-attenuated room using a high-quality digital recorder (i.e., SONY PCM-D100). Additionally, during a telephone call initiated from an iPhone 14 Pro Max, simultaneous recordings were acquired using a landline telephone (i.e., Motorola C7501RC). The digital recorder and the iPhone 14 Pro Max were positioned 30 cm from the speakers’ mouths. Prior to recording, the speakers were given ample opportunity to familiarize themselves with the materials and practice as needed. They were instructed to articulate each target sentence in their habitual neutral voice twice. Considering the natural variability in a speaker’s acoustic features even when uttering identical content [40], and to maintain ecological validity with daily communication and forensic contexts, different rounds of utterances were used if two sequentially presented speech stimuli originated from the same speaker. All recordings were saved in WAV format at a 44.1 kHz sampling rate and 16-bit resolution. In total, 4 (speakers) × 8 (target sentences) × 2 (languages: Chinese and English) × 2 (repetitions) × 2 (channels: digital recorder and landline phone) = 256 recordings were collected.
The speech stimuli were first normalized to 70 dB and subsequently reversed using Praat software [41]. Consequently, four categories of speech stimuli (i.e., Mandarin, Mandarin-reverse, English, English-reverse) were created to examine the influence of language familiarity on talker identification. These stimuli were then divided into two groups to assess the impact of channel variability. Specifically, those recorded using the digital recorder (including both forward and reversed versions) were designated as High-quality (H), while the recordings obtained via the landline telephone were labeled as Landline (L).
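These two preprocessing steps can be sketched in NumPy as follows. The study itself used Praat's built-in functions; the synthetic tone below is only a stand-in signal, and the 2e-5 Pa reference is the convention Praat uses for its dB scale.

```python
# Sketch of the two preprocessing steps: intensity normalization to 70 dB
# and time reversal. The study used Praat; this NumPy version is illustrative.
import numpy as np

REF = 2e-5  # reference pressure (Pa) underlying the dB scale

def scale_intensity(x, target_db=70.0):
    """Scale a waveform so its RMS intensity equals target_db."""
    rms = np.sqrt(np.mean(x**2))
    target_rms = REF * 10 ** (target_db / 20)
    return x * (target_rms / rms)

def reverse(x):
    """Time-reverse a waveform (Praat: Reverse)."""
    return x[::-1]

# Example on a synthetic 1 s, 44.1 kHz tone
t = np.arange(44100) / 44100
tone = 0.1 * np.sin(2 * np.pi * 220 * t)
norm = scale_intensity(tone)
db = 20 * np.log10(np.sqrt(np.mean(norm**2)) / REF)
print(round(db, 2))  # 70.0
```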
To further investigate the effects of noise on talker identification, high-quality speech stimuli across all four categories were synthesized with a mixed noise component, following a methodology analogous to that employed in previous speech-in-noise perception tasks (e.g., [42,43]). Previous studies have frequently employed sine waves and broadband noise (e.g., white noise) in the investigation of speech-in-noise perception, revealing that both exert a masking effect on the transmission of speech information [44–48]. To emulate as closely as possible the impact of noise on speech perception in realistic interference scenarios, the present study generated a composite noise signal by combining sine waves and white noise, employing the default formula integrated within Praat (i.e., 1/2 * sin (2π × 377 × x) + randomGauss (0, 0.1)) at a sampling rate of 44.1 kHz. For all speech materials under noise conditions, the signal-to-noise ratio (SNR) was maintained at 0 dB.
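The noise synthesis and mixing can be sketched as below. The sine-plus-Gaussian expression follows the Praat formula quoted above; the power-based scaling used to fix the mix at 0 dB SNR is one standard approach and is our assumption, not the paper's actual script.

```python
# Sketch of the composite noise (sine + white noise) and 0 dB SNR mixing.
# The scaling step is a standard SNR-fixing method, assumed for illustration.
import numpy as np

FS = 44100
rng = np.random.default_rng(0)

def composite_noise(n_samples, fs=FS):
    """1/2 * sin(2*pi*377*x) + randomGauss(0, 0.1), with x in seconds."""
    x = np.arange(n_samples) / fs
    return 0.5 * np.sin(2 * np.pi * 377 * x) + rng.normal(0, 0.1, n_samples)

def mix_at_snr(speech, noise, snr_db=0.0):
    """Scale noise so 10*log10(P_speech / P_noise) == snr_db, then add."""
    p_speech = np.mean(speech**2)
    p_noise = np.mean(noise**2)
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise

speech = 0.05 * np.sin(2 * np.pi * 150 * np.arange(FS) / FS)  # stand-in signal
noisy = mix_at_snr(speech, composite_noise(FS), snr_db=0.0)
```

At 0 dB SNR the speech and noise carry equal power, so the scaled noise power in the mix matches the speech power exactly.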
To examine the influence of speaker familiarity on talker identification, a lab-training paradigm was employed in the auditory perceptual experiment. All speech stimuli from the speakers were divided into two sessions (1–4 target sentences for the listening test; 5–8 target sentences for the lab-training test). For the listening test, stimuli representing the four categories (i.e., Mandarin, Mandarin-reverse, English, English-reverse) under adverse noise (i.e., No Noise vs. Noise) and channel conditions (i.e., High-quality vs. High-quality; Landline vs. Landline; High-quality vs. Landline) were utilized. In the lab-training test, listeners were first exposed to the four categories of speech stimuli in adverse noise and channel conditions from a single speaker twice, after which they completed a perceptual talker identification task for that speaker. This procedure was conducted sequentially for all four speakers. The speech stimuli of the two sessions (i.e., listening test and lab-training test) for the talker identification experiment are shown in Table 2.
It is important to note that, to limit experimental sessions to approximately 40 minutes and maintain participant engagement and attention, the stimulus set of the current study has several limitations. For instance, it included only four female talkers, used a relatively high signal-to-noise ratio (SNR = 0 dB), and featured just two training rounds. These limitations necessitate caution when interpreting the study’s results, as they may constrain the generalizability of the findings. Nevertheless, the study systematically examines interactions among multiple adverse factors in talker identification, offers key insights into auditory talker identification patterns under complex adverse conditions, and lays groundwork for understanding how listeners process talker information amid combined auditory challenges. Future research can build on these findings by conducting more targeted, comprehensive investigations to address these constraints.
2.3 Procedure
The perceptual experiment was conducted in a sound-attenuated room. Each participant sat in front of a laptop monitor and adjusted the screen to a position that allowed clear visibility. Professional high-quality headphones (Sennheiser HD650 and Audio-Technica ATH-M70x) were used. The experiment was run in PsychoPy [49].
The procedure for the perceptual talker identification experiment is illustrated in Fig 1. For the listening test session, each trial began with a 500-millisecond red fixation cross. Subsequently, two stimuli (i.e., either from the same speaker or from different speakers) were presented in a voice line-up paradigm, separated by a 400-millisecond silent interval. Participants were then required to select one of three response options (i.e., Same, Different, or Unclear) based on the stimuli they heard. For the lab-training test session, the procedure commenced with two rounds of auditory training using the speech stimuli from a single speaker (as shown in Table 2). Following the training phase, participants completed a talker identification task that followed the same procedure as the listening test session. The entire experiment lasted approximately 30–45 minutes. To mitigate auditory fatigue, participants were permitted to take breaks at any time during the session. Before formal data collection, participants were given instructions for the perceptual experiment and completed two practice trials to familiarize themselves with the procedure. Subsequently, perceptual data for each stimulus were collected from all 53 participants.
2.4 Data analysis
Two generalized logistic regression analyses were conducted for the listening test session using the afex package [50] in R software [38] to investigate the impact of adverse auditory conditions on talker identification accuracy. In these models, each stimulus’s perceptual judgment was re-coded as a binary outcome (0 for an incorrect response, 1 for a correct response; “unclear” responses were excluded from the analysis) and served as the dependent variable. For one model, the independent variables were noise (No Noise vs. Noise) and language (Mandarin, Mandarin-reverse, English, English-reverse); for the other, they were channel (High-quality vs. High-quality; Landline vs. Landline; High-quality vs. Landline) and language. The models were constructed using the following formulas: Answer ~ Noise * Language + (1 | Speaker) + (1 | Listener); Answer ~ Channel * Language + (1 | Speaker) + (1 | Listener).
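The recoding step can be sketched as follows. The trial records and field names are hypothetical, and the actual models were fit in R (afex); the sketch only shows how "Unclear" responses are excluded and the remaining judgments scored as a binary outcome.

```python
# Sketch of the response recoding: "Unclear" responses are excluded and the
# remaining judgments are scored 0/1 against the ground truth. The toy data
# and field names are hypothetical; the models themselves were fit in R.
trials = [
    {"truth": "Same",      "response": "Same"},
    {"truth": "Different", "response": "Same"},
    {"truth": "Same",      "response": "Unclear"},
    {"truth": "Different", "response": "Different"},
]

scored = [
    {**t, "answer": int(t["response"] == t["truth"])}
    for t in trials
    if t["response"] != "Unclear"          # exclude "Unclear" responses
]

print([t["answer"] for t in scored])  # [1, 0, 1]
```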
Furthermore, two additional generalized logistic regression analyses were performed to assess the effect of lab-training on talker identification under adverse auditory conditions. In these analyses, perceptual accuracy, coded as 0 or 1, was the dependent variable. For one model, the independent variables were familiarity (Listening test vs. Lab-training test), noise, and language; for the other, they were familiarity, channel, and language. These models were specified as follows: Answer ~ Train * Noise * Language + (1 | Speaker) + (1 | Listener); Answer ~ Train * Channel * Language + (1 | Speaker) + (1 | Listener).
Random intercepts for speaker and listener, as well as by-listener random slopes for noise, channel, and language, were initially included in all models to support a maximal random-effects structure [51]. The likelihood ratio test was used to assess the contribution of the random slopes, which indicated that the slopes were not significant in any of the model fits. Consequently, to maintain model simplicity, the random slopes were removed from all models. Tukey’s HSD post hoc tests were subsequently performed for pairwise comparisons [52], and odds ratios were reported as the measure of effect size.
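The likelihood ratio comparison between the models with and without random slopes can be sketched generically. The log-likelihood values and the degrees-of-freedom difference below are hypothetical, chosen only to show the mechanics of the test.

```python
# Sketch of a likelihood ratio test comparing nested mixed models (with vs.
# without random slopes). The log-likelihoods here are hypothetical values.
from scipy import stats

def lr_test(ll_reduced, ll_full, df_diff):
    """LR statistic 2*(ll_full - ll_reduced) against chi-square(df_diff)."""
    lr = 2 * (ll_full - ll_reduced)
    p = stats.chi2.sf(lr, df_diff)
    return lr, p

# A small improvement in fit that does not reach significance, which would
# justify dropping the slope term (as was done in the present analyses).
lr, p = lr_test(ll_reduced=-1520.6, ll_full=-1519.8, df_diff=2)
print(round(lr, 2), round(p, 3))  # 1.6 0.449
```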
3. Results
The average accuracies for the talker identification task under noise (No Noise vs. Noise), channel (High-quality vs. High-quality [HH]; Landline vs. Landline [LL]; High-quality vs. Landline [HL]), and language conditions (Mandarin [M], Mandarin-reverse [M-reverse], English [E], English-reverse [E-reverse]) across the two sessions (Listening Test vs. Lab-training) are presented in Figs 2 and 3. As shown in Table 3, a descriptive comparison indicated that noise, poorer signal transmission (i.e., Landline vs. Landline), and channel discrepancies (i.e., High-quality vs. Landline) all resulted in reduced talker identification accuracy. Although forward speech yielded significantly superior talker identification performance compared to backward speech, no language familiarity effect was observed (i.e., talker identification accuracies were comparable for Mandarin and English). Additionally, lab-training (i.e., higher speaker familiarity) moderately improved talker identification accuracy. In terms of response times, longer identification times were observed under poorer signal transmission and channel difference conditions. Following lab-training, response times decreased across all conditions.
To further illustrate the impact of these adverse auditory conditions on talker identification, two generalized logistic regression models were conducted. The results of these models (as shown in S1 Appendix) revealed significant main effects of “Noise”, “Channel” and “Language”, as well as significant two-way interaction effects of “Noise × Language” and “Channel × Language” (p < 0.05) on talker identification accuracy. In instances where a higher-order interaction effect was significant, the corresponding main effects and lower-order interaction effects were not interpreted.
As shown in S1 Appendix, the results of Tukey-HSD post hoc analysis for the two-way interaction effect of “Noise × Language” demonstrated that, (1) for reversed speech, talker identification accuracies were significantly higher in the no noise condition than in the noise condition, while for forward speech, there was no significant difference in talker identification accuracies between the no noise and noise conditions; (2) forward speech yielded higher identification accuracies than reversed speech under both the no noise and noise conditions. Additionally, significantly lower talker identification accuracy was observed for reversed Mandarin speech under the noise condition compared to reversed English.
The results of the Tukey HSD post hoc analysis examining the two-way interaction effect of “Channel × Language” are presented in S1 Appendix. Overall, talker identification accuracy was highest for the High-quality vs. High-quality condition. In addition, identical channel conditions (i.e., High-quality vs. High-quality and Landline vs. Landline) generally yielded higher accuracy than mismatched channel conditions (i.e., High-quality vs. Landline) across all language conditions, with two exceptions: no significant difference was observed for reversed English between the High-quality vs. High-quality and Landline vs. Landline conditions, and for reversed Mandarin between the High-quality vs. Landline and Landline vs. Landline conditions. Furthermore, the post hoc analyses revealed that, (1) for High-quality vs. High-quality condition, forward speech exhibited significantly higher identification accuracy than reversed speech; (2) for Landline vs. Landline condition, reversed Mandarin speech showed lower accuracy than the other three language conditions; and (3) for High-quality vs. Landline condition, English speech showed higher accuracy than the other three language conditions.
Considering the impact of speaker familiarity on talker identification, the results from two generalized logistic regression models are shown in S1 Appendix. These models identified significant main effects of “Familiarity”, “Noise”, “Channel” and “Language”, significant two-way interaction effects of “Familiarity × Language” and “Channel × Language”, and significant three-way interaction effects of “Familiarity × Noise × Language” and “Familiarity × Channel × Language” (p < 0.05) on talker identification accuracy.
The Tukey HSD post hoc analysis for the three-way interaction of “Familiarity × Noise × Language” revealed that talker identification accuracy was significantly higher in the Lab-training session compared to the Listening test session only for reversed English in the no noise condition {β = 0.40, SE = 0.18, t = 2.22, p = 0.03, OR = 1.45 (95% CI: [1.03, 2.03])} and reversed Mandarin in the noise condition {β = 0.90, SE = 0.16, t = 5.60, p < 0.001, OR = 2.30 (95% CI: [1.70, 3.12])}.
Furthermore, as shown in S1 Appendix, the Tukey HSD analysis for the three-way interaction of “Familiarity × Channel × Language” indicated that the improvement in identification accuracy following lab-training (i.e., higher speaker familiarity) was observed exclusively in the poor signal transmission condition (i.e., Landline vs. Landline), with the exception of English speech. Additionally, a positive impact of speaker familiarity was found for Mandarin speech in the High-quality vs. Landline condition and for reversed English speech in the High-quality vs. High-quality condition.
4. Discussion
The current study aims to investigate talker identification under adverse auditory conditions and whether lab-training can enhance listeners’ performance. Through perceptual experiments of talker identification conducted in two sessions (i.e., Listening Test and Lab-training Test), the study found that adverse auditory conditions, specifically environmental noise, channel variability, and speaker familiarity, have a significant impact on talker identification. Moreover, although language familiarity did not have a significant effect on talker identification, forward speech yielded significantly higher identification accuracy compared to reversed (i.e., unintelligible) speech.
4.1 Complex interactive effects of adverse conditions on talker identification
In experiments involving speech mixed with noise, this study found that noise exerted a significant adverse effect only on reversed speech, with no such effect observed for forward speech. This finding contrasts with studies on forward speech, which have reported a decline in identification accuracy due to noise [7,8], but it further supports the view that noise exerts a complex influence on talker identification [10,11]. In light of evidence suggesting that reversed speech does not facilitate the formation of short-term memory representations of the speaker for the listener [53–55], we propose that talker identification under the reversed speech condition may be more susceptible to noise. Conversely, as listeners are better able to establish short-term memory of the speaker through forward speech, they may experience less interference from noise during talker identification. This hypothesis, however, requires further systematic auditory and neuroimaging investigations, potentially employing 1-back or n-back experimental paradigms to assess the difficulty of recalling short-term memory for talker identification under conditions of varying speech intelligibility.
The results regarding channel variability support previous fragmented findings [3], showing that talker identification accuracy declines significantly under poor signal transmission (i.e., Landline vs. Landline) and across different channels (i.e., High-quality vs. Landline), following a descending order of High-quality vs. High-quality > Landline vs. Landline > High-quality vs. Landline. Moreover, consistent with previous suggestions that language and channel may exhibit complex interactive effects [3,14], the current study found that in the High-quality vs. High-quality condition, the accuracy for forward speech was superior to that of reversed speech, whereas in the Landline vs. Landline and High-quality vs. Landline conditions, reversed Mandarin and English speech stimuli displayed higher accuracies, respectively. Both the current study and Wang et al.’s work [14] confirm that adverse conditions interact in complex ways rather than through a simple linear summation. This finding underscores the need for future research to build upon these fragmented observations and to conduct more systematic, in-depth investigations into the interactive effects between language and channel.
Surprisingly, this study did not find a significant effect of language familiarity on talker identification. Although language familiarity remains one of the most controversial topics in the literature, most studies have reported significant effects [19,21,33]. The current research revealed that, regardless of the presence or absence of noise, forward speech yielded higher talker identification rates than reversed speech; additionally, forward speech outperformed reversed speech in the High-quality vs. High-quality condition, and the four language categories (i.e., Mandarin, English, reversed Mandarin, and reversed English) exhibited a complex pattern under poor signal transmission and cross-channel conditions. Based on these results, the intelligibility of speech may play a more critical role in talker identification than phonological familiarity [36]. However, the effect of language familiarity on talker identification under varying channel conditions remains complex and warrants further investigation.
4.2 Modest improvements of lab-training on talker identification
Consistent with previous studies [32–34], the current study found that increased speaker familiarity (after lab-training) led to modest improvements in talker identification accuracy under adverse auditory conditions (e.g., reversed speech, noise, poor signal transmission, and different channels), with overall gains of approximately 3–4%. Kanber et al. [11] argued that 5–10 minutes of training was sufficient to raise lab-trained voice identification performance above 80%. By contrast, the lab-training in the current study yielded only limited improvements in talker identification (see Fig 2), potentially because the training comprised only two rounds. Future research could investigate more systematically how different training durations influence talker identification.
Notably, the current study also found interactive effects between speaker familiarity and noise, channel, and language, as evidenced by the inconsistency of the training effect across conditions. While improvements occurred under adverse conditions, listeners’ accuracy for intelligible speech (no noise, high-quality channels) did not improve with training (see Table 3). This inconsistency confirms that the training benefits were small and selective, limited to adverse auditory scenarios rather than generalizing across conditions.
4.3 Implications for forensic speaker identification
In forensic practice, speech is often recorded under varying conditions of noise, channel, and language [56–58]. Auditory examination constitutes a critical component of the acoustic-phonetic paradigm used in forensic speaker identification [4,59]. Therefore, the findings of the current study offer tentative implications for such examinations. First, talker identification is significantly impaired under adverse auditory conditions, which necessitates careful attention to judicial examination procedures. Potential interventions may include speech denoising and signal simulation techniques designed to present speech for identity judgment in conditions that are as optimal as possible [58,60]. Furthermore, when forensic experts encounter unintelligible speech or unfavorable signal conditions, repeated perceptual training to enhance familiarity with the target speaker could yield small improvements in identification accuracy. However, it is important to note that these benefits are not universally observed.
Several limitations of this study warrant discussion. First, this research examined the identification of speech from only four female talkers. Previous studies have reported potential gender differences in talker identification (e.g., male listeners showing higher identification accuracy for male talkers; [61]). Future studies should include male talkers to test whether these effects generalize. Second, given the significant application of talker identification in forensic contexts, it remains an interesting question whether forensic experts differ from untrained listeners. Lastly, to deepen our understanding of the mechanisms underlying talker identification, further research employing neuroscience and brain-imaging techniques is necessary to corroborate the present findings.
5. Conclusion
This study examined the effects of adverse auditory conditions (i.e., environmental noise, channel variability, language familiarity, and speaker familiarity) on talker identification. The findings indicate that both environmental noise and channel variability negatively impact talker identification. In particular, when the channel transmits poor signals or varies in nature, the accuracy of talker identification is significantly reduced. Furthermore, intelligible language demonstrates superior recognition performance under adverse conditions compared to unintelligible language, and this effect appears to be independent of phonological familiarity. Finally, lab-training designed to enhance speaker familiarity moderately improves talker identification accuracy under adverse auditory conditions, while it has no effect on accuracy under no-noise and high-quality conditions. This study systematically examined the interactive effects of multiple factors on talker identification, thereby enriching our understanding of the underlying auditory mechanisms under various auditory conditions and providing important theoretical support for auditory examination techniques in forensic speaker identification.
Supporting information
S1 Appendix. The results of the statistical analysis.
https://doi.org/10.1371/journal.pone.0339396.s001
(DOCX)
S1 File. The datasets analyzed in the current study.
https://doi.org/10.1371/journal.pone.0339396.s002
(CSV)
References
- 1. Cooper A, Paquette-Smith M, Bordignon C, Johnson EK. The influence of accent distance on perceptual adaptation in toddlers and adults. Language Learning and Development. 2022;19(1):74–94.
- 2. Drozdova P, van Hout R, Scharenborg O. Talker-familiarity benefit in non-native recognition memory and word identification: The role of listening conditions and proficiency. Atten Percept Psychophys. 2019;81(5):1675–97.
- 3. Betancourt KS, Bahr RH. The influence of signal complexity on speaker identification. The International Journal of Speech, Language and the Law. 2011;17(2):179–200.
- 4. Morrison GS, Enzinger E. Introduction to forensic voice comparison. The Routledge Handbook of Phonetics. Routledge. 2019. p. 599–634.
- 5. Lecumberri MLG, Cooke M, Cutler A. Non-native speech perception in adverse conditions: A review. Speech Communication. 2010;52(11–12):864–86.
- 6. Leibold LJ. Speech perception in complex acoustic environments: developmental effects. J Speech Lang Hear Res. 2017;60(10):3001–8. pmid:29049600
- 7. Razak A, Thurston EJ, Gustainis LE, Kidd G, Swaminathan J, Perrachione TK. Talker identification in three types of background noise. J Acoust Soc Am. 2017;141:4039.
- 8. Mamun N, Ghosh R, Hansen JHL. Familiar and unfamiliar speaker recognition assessment and system emulation for cochlear implant users. J Acoust Soc Am. 2023;153(2):1293. pmid:36859118
- 9. Best V, Ahlstrom JB, Mason CR, Roverud E, Perrachione TK, Kidd G Jr, et al. Talker identification: Effects of masking, hearing loss, and age. J Acoust Soc Am. 2018;143(2):1085. pmid:29495693
- 10. Best V, Ahlstrom JB, Mason CR, Perrachione TK, Kidd G, Dubno JR. Effects of age and hearing loss on talker identification and talker change detection. J Acoust Soc Am. 2023;153:A285–A285.
- 11. Kanber E, Lavan N, McGettigan C. Highly accurate and robust identity perception from personally familiar voices. J Exp Psychol Gen. 2022;151(4):897–911. pmid:34672658
- 12. Künzel HJ. Beware of the ‘telephone effect’: the influence of telephone transmission on the measurement of formant frequencies. Forensic Linguist. 2001;8:80–99.
- 13. Hillenbrand J, Getty LA, Clark MJ, Wheeler K. Acoustic characteristics of American English vowels. J Acoust Soc Am. 1995;97(5 Pt 1):3099–111. pmid:7759650
- 14. Wang X, Ge J, Meller L, Yang Y, Zeng F-G. Speech intelligibility and talker identification with non-telephone frequencies. JASA Express Lett. 2024;4(7):075202. pmid:39046893
- 15. Perrachione TK, Del Tufo SN, Gabrieli JDE. Human voice recognition depends on language ability. Science. 2011;333(6042):595. pmid:21798942
- 16. Fleming D, Giordano BL, Caldara R, Belin P. A language-familiarity effect for speaker discrimination without comprehension. Proc Natl Acad Sci U S A. 2014;111(38):13795–8. pmid:25201950
- 17. Garrido L, Eisner F, McGettigan C, Stewart L, Sauter D, Hanley JR, et al. Developmental phonagnosia: a selective deficit of vocal identity recognition. Neuropsychologia. 2009;47(1):123–31. pmid:18765243
- 18. Furbeck K, Thurston EJ, Tin J, Perrachione TK. Perceptual similarity judgments of voices: Effects of talker and listener language, vocal source acoustics, and time-reversal. J Acoust Soc Am. 2018;143:1923.
- 19. Perrachione TK, Furbeck KT, Thurston EJ. Acoustic and linguistic factors affecting perceptual dissimilarity judgments of voices. J Acoust Soc Am. 2019;146(5):3384. pmid:31795676
- 20. Fecher N, Johnson EK. The native-language benefit for talker identification is robust in 7.5-month-old infants. J Exp Psychol Learn Mem Cogn. 2018;44(12):1911–20. pmid:29698034
- 21. Fecher N, Johnson EK. Developmental improvements in talker recognition are specific to the native language. J Exp Child Psychol. 2021;202:104991. pmid:33096370
- 22. Johnson EK, Bruggeman L, Cutler A. Abstraction and the (Misnamed) language familiarity effect. Cogn Sci. 2018;42(2):633–45. pmid:28744902
- 23. Zarate JM, Tian X, Woods KJP, Poeppel D. Multiple levels of linguistic and paralinguistic features contribute to voice recognition. Sci Rep. 2015;5:11475. pmid:26088739
- 24. Narayan CR, Mak L, Bialystok E. Words get in the way: linguistic effects on talker discrimination. Cogn Sci. 2017;41(5):1361–76. pmid:27445079
- 25. Quinto A, Abu El Adas S, Levi SV. Re‐examining the effect of top‐down linguistic information on speaker‐voice discrimination. Cognitive Science. 2020;44(10).
- 26. Johnson J, McGettigan C, Lavan N. Comparing unfamiliar voice and face identity perception using identity sorting tasks. Q J Exp Psychol (Hove). 2020;73(10):1537–45. pmid:32530364
- 27. Lavan N, Burston LFK, Garrido L. How many voices did you hear? Natural variability disrupts identity perception from unfamiliar voices. Br J Psychol. 2019;110(3):576–93. pmid:30221374
- 28. Lavan N, Burston LF, Ladwa P, Merriman SE, Knight S, McGettigan C. Breaking voice identity perception: Expressive voices are more confusable for listeners. Q J Exp Psychol (Hove). 2019;72(9):2240–8. pmid:30808271
- 29. Lavan N, Kreitewolf J, Obleser J, McGettigan C. Familiarity and task context shape the use of acoustic information in voice identity perception. Cognition. 2021;215:104780. pmid:34298232
- 30. Stevenage SV, Symons AE, Fletcher A, Coen C. Sorting through the impact of familiarity when processing vocal identity: Results from a voice sorting task. Q J Exp Psychol (Hove). 2020;73(4):519–36. pmid:31658884
- 31. Njie S, Lavan N, McGettigan C. Talker and accent familiarity yield advantages for voice identity perception: A voice sorting study. Mem Cognit. 2023;51(1):175–87. pmid:35274221
- 32. Hollien H, Didla G, Harnsberger JD, Hollien KA. The case for aural perceptual speaker identification. Forensic Sci Int. 2016;269:8–20. pmid:27855301
- 33. Lloy A, Johnson K, Babel M. Examining the roles of language familiarity and bilingualism in talker recognition. The 13th International symposium on Bilingualism. 2021. 87–140. https://www.khiajohnson.com/pdfs/lloy-johnson-babel-isb13-abstract.pdf
- 34. Perrachione TK. Recognizing speakers across languages. In: Frühholz S, Belin P, editors. The Oxford Handbook of Voice Perception. Oxford: Oxford University Press; 2019. https://academic.oup.com/edited-volume/38687/chapter/335931302
- 35. Lee JJ, Tin JA, Perrachione TK. Foreign language talker identification training does not generalize to new talkers. J Acoust Soc Am. 2020;148:2763.
- 36. McLaughlin DE, Carter YD, Cheng CC, Perrachione TK. Hierarchical contributions of linguistic knowledge to talker identification: Phonological versus lexical familiarity. Atten Percept Psychophys. 2019;81(4):1088–107. pmid:31218598
- 37. Champely S, Ekstrom C, Dalgaard P, Gill J, Weibelzahl S, Anandkumar A. pwr: Basic functions for power analysis. R package. 2020.
- 38. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2020. https://www.R-project.org/
- 39. Cohen J. Things I have learned (so far). Am Psychol. 1990;45:1304–12.
- 40. Jacewicz E, Fox RA, Wei L. Between-speaker and within-speaker variation in speech tempo of American English. J Acoust Soc Am. 2010;128(2):839–50. pmid:20707453
- 41. Boersma P, Weenink D. Praat: Doing phonetics by computer. 2021. http://www.praat.org/
- 42. Nilsson M, Soli SD, Sullivan JA. Development of the Hearing in Noise Test for the measurement of speech reception thresholds in quiet and in noise. J Acoust Soc Am. 1994;95(2):1085–99. pmid:8132902
- 43. Sharma S, Tripathy R, Saxena U. Critical appraisal of speech in noise tests: a systematic review and survey. Int J Res Med Sci. 2017;5:13–21.
- 44. Galdos M, Simons C, Fernandez-Rivas A, Wichers M, Peralta C, Lataster T, et al. Affectively salient meaning in random noise: a task sensitive to psychosis liability. Schizophr Bull. 2011;37(6):1179–86. pmid:20360211
- 45. Roberts B, Summers RJ, Bailey PJ. The perceptual organization of sine-wave speech under competitive conditions. J Acoust Soc Am. 2010;128(2):804–17. pmid:20707450
- 46. Rosen S, Hui SNC. Sine-wave and noise-vocoded sine-wave speech in a tone language: Acoustic details matter. J Acoust Soc Am. 2015;138(6):3698–702. pmid:26723325
- 47. Slater J, Skoe E, Strait DL, O’Connell S, Thompson E, Kraus N. Music training improves speech-in-noise perception: Longitudinal evidence from a community-based music program. Behav Brain Res. 2015;291:244–52. pmid:26005127
- 48. Souza P, Rosen S. Effects of envelope bandwidth on the intelligibility of sine- and noise-vocoded speech. J Acoust Soc Am. 2009;126(2):792–805. pmid:19640044
- 49. Peirce J, Gray JR, Simpson S, MacAskill M, Höchenberger R, Sogo H, et al. PsychoPy2: Experiments in behavior made easy. Behav Res Methods. 2019;51(1):195–203. pmid:30734206
- 50. Singmann H, Bolker B, Westfall J, Aust F, Ben-Shachar MS. afex: Analysis of factorial experiments. R package. 2015.
- 51. Barr DJ, Levy R, Scheepers C, Tily HJ. Random effects structure for confirmatory hypothesis testing: Keep it maximal. J Mem Lang. 2013;68(3):10.1016/j.jml.2012.11.001. pmid:24403724
- 52. Lenth R. Emmeans: Estimated Marginal Means, aka Least-Squares Means. 2020. https://CRAN.R-project.org/package=emmeans
- 53. Dougherty SC, Mclaughlin DE, Perrachione TK. A language familiarity effect for talker identification in forward but not time-reversed speech. J Acoust Soc Am. 2015;137:2415.
- 54. El Adas SA, Levi SV. Phonotactic and lexical factors in talker discrimination and identification. Atten Percept Psychophys. 2022;84(5):1788–804. pmid:35641859
- 55. Kreitewolf J, Wöstmann M, Tune S, Plöchl M, Obleser J. Working-memory disruption by task-irrelevant talkers depends on degree of talker familiarity. Atten Percept Psychophys. 2019;81(4):1108–18. pmid:30993655
- 56. Hollien HF. Forensic voice identification. Academic Press; 2002.
- 57. Lindh J. Forensic comparison of voices, speech and speakers. J R Stat Soc Ser C Appl Stat. 2017;53:109–22.
- 58. Fraser H, Aubanel V, Maher RC, Mawalim C, Wang X, Poc̆ta P, et al. Forensic speech enhancement: toward reliable handling of poor-quality speech recordings used as evidence in criminal trials. J Audio Eng Soc. 2024;72(11):748–53.
- 59. Rana S, Qureshi MA. A comprehensive review of forensic phonetics techniques. ABBDM. 2024;4(02).
- 60. Ekpenyong M, Obot O. Speech quality enhancement in digital forensic voice analysis. In: Computational Intelligence in Digital Forensics: Forensic Investigation and Applications. Springer; 2014. p. 429–51.
- 61. Skuk VG, Schweinberger SR. Gender differences in familiar voice identification. Hear Res. 2013;296:131–40. pmid:23168357