Abstract
Objective
To evaluate the advancements in speech intelligibility testing over the recent decades, with a particular emphasis on the development of audiovisual speech in noise tests that incorporate both auditory and visual modalities for the measurement of speech recognition thresholds.
Design
A scoping review was conducted systematically to examine the existing literature on speech intelligibility testing methods. Following a comprehensive screening process, studies were selected for detailed analysis, focusing on audiovisual integration and the potential for remote or automated administration within study methodologies.
Study Sample
The review encompassed 11 scholarly articles that investigated diverse approaches to speech intelligibility testing.
Results
The analysis revealed variability in the accuracy and reliability of speech intelligibility testing methods. Although certain methods demonstrated efficacy in incorporating audiovisual cues, none of the reviewed studies included provisions for remote administration, thereby necessitating the presence of a clinician for test execution. This limitation underscores the imperative for further research into remote testing methodologies that leverage audiovisual technologies to assess speech in noise.
Conclusions
The findings of this review underscore the critical need for advancement in speech intelligibility testing methodologies, particularly those integrating audiovisual components and enabling remote administration. Development in this domain holds significant potential to enhance the assessment and implementation of assistive technologies for individuals with hearing impairments.
Citation: Hussain A, Goman AM, Gogate M, Dashtipour K, Kirton-Wingate J, Hussain Z, et al. (2026) Audio-visual speech-in-noise tests for evaluating speech reception thresholds: A scoping review. PLoS One 21(1): e0338600. https://doi.org/10.1371/journal.pone.0338600
Editor: Vidya Ramkumar, Sri Ramachandra Institute of Higher Education and Research (Deemed to be University), INDIA
Received: July 6, 2025; Accepted: November 25, 2025; Published: January 27, 2026
Copyright: © 2026 Hussain et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: No new data were generated in this scoping review. All data supporting the findings of this study were obtained from previously published articles. The study selection process followed the PRISMA-ScR guidelines, and the list of included studies is provided within the manuscript.
Funding: This research is supported by the UK EPSRC COG-MHEAR programme (Grant No. EP/M026981/1). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Speech perception is a complex multimodal process that integrates both auditory and visual cues, exemplifying multimodal or multisensory integration, wherein various unisensory modalities such as sight, hearing, or touch are combined. Research has demonstrated that language processing is highly interactive, involving the combination of diverse information sources [1]. Speech, generated by the vocal apparatus, is filtered through the configuration of articulatory organs. There is an inherent and perceptible link between the auditory and visual properties of speech, since articulators such as lips, teeth, and the tongue visibly contribute to the process [2–4].
Extensive research in this domain has investigated the phenomenon by which listeners unconsciously engage in lip reading to enhance speech intelligibility in noisy environments [5–8]. Speech intelligibility assessments are generally conducted in clinical or research laboratory settings, where controlled conditions are maintained. However, this presents challenges when attempting to generalise these findings to real-world listening environments, which are characterised by variables such as ambient noise, environmental factors such as wind and machinery noise, and presence of multiple concurrent conversations [9].
In the context of audiology and hearing loss, it is currently estimated that around 5% of the global population, equivalent to 430 million people, experience hearing impairment [10]. The prevalence of hearing loss increases markedly with age, with over 25% of individuals aged 60 and above affected by disabling hearing loss, which refers to hearing loss greater than 40 dB in the better-hearing ear in adults and greater than 30 dB in the better-hearing ear in children [10,11]. Despite continuous advancements in research and technology, hearing assessments currently rely predominantly on audio-only (AO) methodologies.
Pure tone audiometry (PTA) is the standard procedure for the identification and assessment of hearing loss and its severity. This diagnostic approach enables clinicians to accurately determine the extent of hearing impairment, thereby facilitating informed counselling and the provision of tailored recommendations to patients. PTA is widely regarded as the gold standard and most frequently employed test for the detection of hearing loss [12,13]. Although PTA provides essential data regarding a listener's hearing sensitivity, Parmar et al. [14] found that hearing healthcare professionals view speech testing as particularly valuable for offering patients relatable insights into their functional hearing abilities. This information is crucial for guiding hearing aid fittings and constitutes a vital component of the comprehensive diagnostic test battery. These assessments are conducted under both aided and unaided conditions, encompassing evaluations in quiet as well as in the presence of background noise.
Speech-in-noise (SIN) tests are effective tools for assessing hearing loss across diverse populations and languages. A wide array of commercially available AO SIN tests currently exists, many of which are suitable for both adults and children. AO SIN tests evaluate performance at the sentence, word, or phonemic level and include both adaptive and fixed signal-to-noise ratio (SNR) formats. Fixed SNR tests include the Connected Speech Test (CST) [15] and the Speech Perception in Noise Test (SPIN) [16]. Adaptive tests, such as the Hearing In Noise Test (HINT) [17], Quick Speech In Noise (QSIN) [18], the Words-in-Noise Test (WIN) [19] and the Bamford–Kowal–Bench (BKB) SIN test [20], are commonly employed in clinical settings. These tests typically involve a target speaker delivering the specific material (sentence, word, or phoneme) amid background noise, which varies depending on the test. Background noises range from multi-talker babble to speech-shaped noise; adaptive tests feature a variable noise or speech level, while fixed SNR tests maintain constant levels. Stimuli may be presented through headphones to provide ear-specific results or via soundfield loudspeakers. Participants are instructed to repeat what they hear, and clinicians score the responses based on the number of accurately recognised keywords, determining either the percentage of correct words or the speech recognition threshold at 50% intelligibility, depending on the test employed.
Although the benefits of employing speech testing for guiding hearing aid fitting and as a component within a diagnostic test battery are well recognised, the widespread adoption of this practice remains limited. Certain countries [14], such as Canada and India, recommend speech testing as an essential component of audiology practice, whereas others, such as the UK, do not. Parmar et al. [14] have identified a lack of clinical time, inadequate training, and insufficient equipment as key factors contributing to the limited implementation of speech testing within a diagnostic battery, a trend particularly evident in the UK and likely contributing to the observed global variability in service provision [14].
In addition to the limited adoption of SIN testing, current testing protocols exhibit several other limitations, including their failure to replicate real-world scenarios [21] and their lack of integration of visual cues, which could enable listeners to benefit from combined auditory and visual information. The incorporation of visual cues into speech perception tests has been extensively investigated, with a substantial body of research establishing that speech comprehension is significantly enhanced when both auditory and visual modalities are engaged [5,7,22,23]. This research consistently demonstrates the advantages of incorporating visual cues, particularly in environments where auditory signals are degraded. For instance, a study conducted by [7] with normal-hearing adults revealed a marked improvement in word recognition when both auditory and visual cues were presented, compared to auditory input alone. In particular, the inclusion of visual speech information was found to enhance performance to a level equivalent to a 15 dB increase in SNR over AO conditions. Similarly, Gagné and Wittich [24] underscored the importance of visual cues for older adults with hearing loss, reporting an average 18% improvement in speech recognition when visual information was incorporated. These findings underscore the imperative to integrate visual cues into SIN testing protocols, particularly for populations where auditory processing alone may be insufficient, thus ensuring more accurate assessments of speech perception in realistic listening environments. Another critical consideration when integrating visual elements into speech tests is the specific contribution of visual input and the extent of its benefit. As noted by Tye-Murray et al. [25], lipreading (visual-only speech perception) plays a significant role in audiovisual (AV) speech perception, accounting for up to 60% of the variance in individual AV speech perception measures.
Expanding on the integration of visual cues in speech testing, it is also essential to consider the role of the Speech Reception Threshold (SRT) and other speech perception measures in evaluating auditory performance. The SRT is a widely utilised metric in AO SIN tests that determines the minimum SNR at which a listener can correctly identify speech 50% of the time. Fixed SNR tests, which assess the percentage of correct responses at a predetermined SNR, offer the advantage of providing a straightforward evaluation of hearing aid benefit, facilitating patients' comprehension. However, a critical limitation of these tests is the difficulty of selecting an appropriate SNR. If the SNR is set too low, the results may underestimate the true benefit of the hearing aids. Conversely, if the SNR is set too high, the perceived benefit may be overstated. This issue is particularly salient among high-performing cochlear implant users, where traditional fixed SNR tests may result in ceiling effects, thus failing to accurately differentiate levels of performance. In contrast, adaptive SRT tests, which dynamically adjust the SNR based on the listener's performance, offer a more nuanced assessment capable of better distinguishing performance across a wide range of abilities [26]. This adaptive approach is essential for capturing the full spectrum of auditory processing capabilities and ensuring that the results are both meaningful and reflective of real-world listening conditions. While percentage-based tests and SRT measurements offer different perspectives on speech perception abilities, SRT testing can provide more fine-grained insights into specific aspects of hearing performance. Both approaches are interconnected, as an SRT can be derived from a psychometric function of percentage correct versus decibel level, and conversely, a percentage-versus-decibel curve can be calculated from an adaptive SRT test.
The choice between these methods often depends on the specific experimental focus and goals. The implementation of adaptive SRT tests in noise would serve as a valuable tool for assessing hearing capabilities and comparing hearing aid systems in both clinical and research settings, as these tests reveal variation in psychometric function slopes and offer a more comprehensive range of performance levels [26,27].
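To make the adaptive principle concrete, the sketch below simulates a one-down/one-up staircase, the simplest form of the adaptive procedures described above. The listener model, step size, and parameter values are illustrative assumptions rather than those of any reviewed test: SNR is lowered after each correct response and raised after each error, so the track hovers around the 50% intelligibility point, the SRT.

```python
import math
import random

def simulated_listener(snr_db, srt_db=-6.0, slope=0.6):
    """Logistic psychometric function: the probability of a correct
    response rises with SNR and equals 0.5 exactly at srt_db."""
    p_correct = 1.0 / (1.0 + math.exp(-slope * (snr_db - srt_db)))
    return random.random() < p_correct

def adaptive_srt(trials=200, start_snr_db=0.0, step_db=2.0, seed=1):
    """One-down/one-up staircase: decrease SNR after a correct response,
    increase it after an error; estimate the SRT as the mean SNR over
    the second half of the track, once it has converged."""
    random.seed(seed)
    snr, track = start_snr_db, []
    for _ in range(trials):
        track.append(snr)
        snr += -step_db if simulated_listener(snr) else step_db
    tail = track[len(track) // 2:]
    return sum(tail) / len(tail)
```

Running `adaptive_srt()` yields an estimate close to the simulated listener's true SRT of −6 dB SNR; clinical procedures refine this basic scheme with variable step sizes and reversal-based averaging.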
Traditionally, hearing services have predominantly been provided in hospitals or clinics, where testing is conducted by healthcare professionals. This centralised model of care presents notable limitations in accessibility, particularly for certain populations and specific circumstances. The shortage of qualified professionals, especially in low- and middle-income countries, constitutes a significant barrier to access [28]. Recent global events, such as the COVID-19 pandemic, have underscored another critical limitation of clinic-based services: their susceptibility to disruption during public health crises. Safety measures, including lockdowns implemented to control the spread of infectious diseases, can severely impede access to conventional hearing healthcare services [29]. In response to these limitations and the need to enhance accessibility to hearing assessments, there has been an increasing shift towards the development of remote and mobile-based AO SIN tests. These innovative approaches seek to democratise access to hearing screening and assessment tools, enabling individuals to undergo preliminary evaluations without necessitating in-person clinic visits. Several AO SIN tests have been adapted or newly developed with remote capabilities, using internet-based platforms or smartphone applications [30–32]. These tools hold significant potential for the widespread screening and monitoring of hearing health, particularly in underserved areas or during periods when physical access to healthcare facilities is restricted.
This comprehensive review aims to critically examine the evolution and current state of speech intelligibility testing, with a particular focus on SRT assessments that have integrated AV elements over recent decades. The scoping review will systematically identify and evaluate studies that have incorporated visual components into SRT assessments, exploring their methodologies, outcomes, and potential clinical applications. Additionally, the review will investigate the extent to which these AV SRT tests have been adapted for remote administration, addressing a critical gap in accessibility for individuals in geographically isolated regions or those with technological or mobility constraints that hinder access to traditional clinical settings. Furthermore, this review will not only provide a comprehensive overview of the current landscape of AV SRT testing but will also critically analyse the potential of these methods to enhance diagnostic accuracy, thereby informing the development of an AV SIN test. Drawing on the findings of this scoping study, recommendations can be formulated regarding the applicability of existing methodologies or stimuli to the development of a new remote British English AV SIN test. Consequently, this review underscores the necessity for future research to focus on developing and validating AV speech tests that can be administered remotely or via mobile applications, while ensuring the reliability and validity of clinical assessments. Such advancements hold the potential to significantly improve the accessibility, comprehensiveness, and patient-centredness of hearing healthcare services and could facilitate the future development of multimodal hearing aids by manufacturers.
Methods
A scoping review was conducted, following the methodology outlined by Arksey and O'Malley [33], which comprises five stages: (1) identifying the research question, (2) identifying pertinent studies, (3) selecting relevant studies, (4) charting the data, and (5) collating, summarising, and reporting the results. This review adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) guidelines [34].
Identifying the research question
The primary research question addressed was: 'Have any previously developed or researched AV SIN tests been used to measure SRT?' A secondary question investigated which of these tests incorporated remote or automated functionalities.
Identifying relevant studies
In this phase of the study, we sought to establish the criteria for selecting publications to be included in the scoping review. Although scoping studies are inherently broad in scope, we deliberately identified specific criteria to guide our search process. The search strategy utilised for electronic databases was formulated based on our research questions and the key concepts underpinning the study. Prior to commencing the searches, two authors (AH and AG) reached a consensus on the relevant keywords for article retrieval. Searches were conducted from the earliest records available up to February 18, 2025. The databases selected for this review included the Cochrane Library, IEEE Xplore, PubMed, Science Direct, and Web of Science, chosen for their comprehensive coverage of topics pertinent to health, engineering, social sciences, and psychology. Repositories containing grey literature were excluded from this systematic search due to the absence of peer review, which raises concerns regarding the authenticity, reliability, and reproducibility of the included work [35]. This exclusion was enforced to ensure that the retrieved literature would directly contribute to addressing the research question. The search terms employed in the database queries included “speech in noise,” “speech intelligibility,” “speech perception,” “speech recognition in noise,” “audiovisual,” “audio-visual,” and “auditory-visual.” These terms were consistently applied across all databases using Boolean operators to structure the queries. Specifically, the key terms within each axis were combined using the “OR” operator, and the search strategies for the two axes were linked using the “AND” operator. The search commands were as follows: (“speech in noise” OR “speech intelligibility” OR “speech perception” OR “speech recognition in noise”) AND (“audiovisual” OR “audio-visual” OR “auditory-visual”). To minimise the risk of excluding relevant studies, no restrictions were placed on publication dates or languages.
Study selection
All studies published in English that addressed AV SIN tests using SRT as a measurement were considered for inclusion in this review. The primary objective was to identify any AV SIN tests that had been developed, without imposing any restrictions on the date range for inclusion. Additionally, no limitations were placed on sample size or study design, as the focus was on analysing the methodologies and developments employed in previous research within the scope of this review.
Incomplete papers, opinion pieces, book chapters, editorials, and grey literature were excluded from further review. The management of all search result citations, including the identification and merging of duplicates, was conducted using Paperpile (Paperpile LLC, 2024) software. Subsequently, a three-stage selection process was implemented to evaluate the articles.
The inclusion criteria were established based on the following parameters: (1) studies involving AV SIN testing, (2) no age restriction for participants, (3) measurement of the SRT, (4) no limitation on the language of stimuli, and (5) articles written in English. Articles were excluded if they: (1) reported measurements without SRT, (2) were not written in English, or (3) consisted of clinical commentaries, editorials, interviews, letters, newspaper articles, abstracts only, or non-peer-reviewed literature (e.g., theses).
During stage 1, two authors (AH and KD) independently reviewed the titles and abstracts of all articles related to AV SIN tests. In the second stage, the full-text articles were meticulously examined by the same authors (AH and KD) to assess their eligibility. The final stage involved a third author (AG) who evaluated the articles flagged by the previous reviewers, due to discrepancies, resolving any disagreement to make the final determination on inclusion.
Ultimately, the fully eligible articles were selected for further analysis and synthesis (Fig 1).
Charting the data
The subsequent phase involved systematically organising the data extracted from the primary research reports under review. The data were structured according to the main research question to emphasise the primary findings. Each paper was read multiple times to develop a comprehensive understanding of its aims, objectives, and findings and to ensure that no relevant information was overlooked. The following information was systematically documented from each article: author(s), year of publication, study aim, research context, study participants, outcome measures, stimuli, language, test procedures and main findings.
Collating, summarising and reporting results
Papers meeting the inclusion criteria were meticulously examined, with their content subjected to rigorous analysis and assessment. Recurring themes, the various methodologies employed for recording the AV material, and the procedures utilised for determining SRT were critically scrutinised and presented in the following section.
Results
Study selection
Articles were retrieved from five online databases, resulting in an initial set of 5261 papers. After eliminating duplicates and excluding irrelevant studies, 3536 articles remained. Title screening then led to the exclusion of 3340 articles, followed by abstract screening which further excluded 109 articles. This process resulted in a final selection of 87 articles for a comprehensive full-text review. Throughout this process, 76 articles were excluded based on the inclusion criteria due to reasons such as not having an SRT measurement, lack of AV integration or not being written in English. Consequently, the final number of studies included to address the research question was 11.
Study characteristics
Each of the 11 included studies is reviewed below, summarising its aims, methodology and findings (Table 1).
Stimuli and measures
An analysis of the studies revealed substantial variability in both the speech stimuli and the measurement methodologies employed for AV SIN testing. The speech materials ranged from basic digit triplets to more intricate sentence-based stimuli. Several studies utilised standardised speech tests, such as matrix sentence tests, BKB sentences, and IEEE sentences, while others developed bespoke materials, including passages from the CST or novel sentence lists. In terms of measurement, most studies focused on determining SRTs, albeit with varying target percentages. Whilst many studies employed the conventional 50% correct performance criterion, others explored alternative thresholds, such as an 80% SRT, and one study examined multiple thresholds, including 5%, 50%, 80%, and 95%. Furthermore, another study calculated SRTs based on the mean of the final six SNR values. These variations in SRT percentages reflect ongoing efforts to optimise sensitivity and mitigate ceiling effects in AV testing. Adaptive procedures were commonly used for efficient SRT estimation, although some studies incorporated fixed SNRs to characterise performance across diverse listening conditions. Scoring methodologies encompassed both keyword and whole-sentence approaches (Table 2).
Visual integration
The studies reviewed utilised a range of methods to present visual speech information and assess its integration with auditory cues. Several studies employed video recordings of real speakers (Arnold et al. [36]; Bernstein and Grant [37]; Cox et al. [38]; Llorach et al. [39,40]; Van de Rijt et al. [41]; Le Rhun et al. [42]), providing naturalistic visual speech cues that facilitated the integration process. In contrast, other studies employed virtual human speakers (Choudhary et al. [43]; Devesse et al. [44]; Schreitmüller et al. [45]), which, despite sacrificing some realism, allowed for enhanced control over visual elements such as lip contrast and head scaling. Although the extent of visual benefit varied across studies, it was consistently significant, with improvements in SRTs ranging from 1.5 to 5 dB when visual cues were incorporated. Furthermore, Van de Rijt et al. [41] and Bernstein and Grant [37] provided empirical support for the principle of inverse effectiveness, demonstrating that the benefit of visual cues was maximised at intermediate SNRs where AO performance was neither excessively high nor low.
Remote testing
A key finding of this review was the lack of remote testing capabilities in the AV SIN tests examined. None of the 11 studies incorporated methods for administering tests remotely or via telehealth platforms. All tests were conducted in controlled laboratory or clinical environments, with participants required to be physically present for the testing sessions.
Future innovations could leverage virtual reality (VR) to create immersive and standardised testing environments or employ AI-generated, photorealistic avatars to offer precise control over visual speech cues, overcoming the inconsistencies of video recordings.
Discussion
This comprehensive scoping review, focusing on AV SIN tests with SRT measurements, aimed to (i) identify developed AV SIN tests specifically designed to measure SRT, and (ii) evaluate the remote testing capabilities of these assessments. In analysing the search results, we identified the methodologies employed in the development of various tests. Our investigation revealed 11 studies demonstrating considerable variability in functionality, with several requiring further development or validation. This highlights both significant progress in the field and areas necessitating future development. Due to our stringent screening criteria, the number of research studies included in this review is substantially lower than in comparable reviews [46] exploring AV speech perception. However, this focused sample enables us to thoroughly examine the methodologies utilised in previous studies and gain valuable insights into techniques that can be adapted and implemented in future research.
Speech material and masking noise
The reviewed studies demonstrate significant progress in developing AV SIN tests across different languages and populations, using a diverse array of speech materials and masking noise. The studies included matrix sentence tests [39,42,45], which are beneficial due to their highly controlled vocabulary and syntactic structure, ease of adaptability across languages, and extensive number of possible combinations, rendering them suitable for repeated measures. Additionally, more naturalistic sentence materials [44] have been validated and standardised for clinical application, while word-level stimuli (Arnold et al. [36]) represent an alternative approach.
Choudhary et al. [43] used digit triplets, which offer simplicity and rapid administration, thereby minimising linguistic confounds. However, this material does not accurately reflect the complexities of real-world communication challenges, nor does it assess sentence-level processing. MacLeod and Summerfield [40] utilised BKB sentences, which are standardised and widely used in research. Despite their utility, these sentences may be constrained by the limited number of available materials and may not adequately reflect the complexity of adult conversational discourse. Bernstein and Grant [37] utilised IEEE sentences, which are phonetically balanced and commonly employed in speech testing. Although this corpus is known for its low context predictability, it could be limited by the number of sentences available when compared to matrix sentences (720 sentences versus up to 100,000 possible sentence combinations). Cox et al. [38] used material from the CST, which consists of 48 passages of conversational speech. While this test has high ecological validity, the use of conversational speech makes it more difficult to control for linguistic factors, and scoring can be more complex and time-consuming.
This variety reflects a tension between the need for controlled, comparable stimuli and the desire for ecological validity. Future development of AV SIN tests should strategically balance competing factors by incorporating a diverse range of speech materials. These materials should include single words and sentences, thereby enabling a comprehensive assessment of AV speech perception across multiple levels of linguistic complexity, from phoneme recognition to contextual comprehension.
The masking noises used across studies were diverse, including speech-shaped noise, multi-talker babble, and modulated noise; some reflect the complexity of real-world listening environments, while others are tailored to controlled laboratory experiments. While speech-shaped noise offers consistent energetic masking, it may not fully capture the informational masking effects encountered in everyday listening situations. The comparative analysis of various noise types within individual studies, as done by Bernstein and Grant [37], is particularly valuable in elucidating how different masker characteristics interact with AV integration processes. These maskers often fall into two categories: energetic masking, where noise overlaps spectrally with the speech signal, obstructing it at the auditory periphery, and informational masking, which arises from cognitive interference, such as competing speech signals that are perceptually similar to the target. Future AV SIN tests could benefit from incorporating adaptive technologies that manipulate both the type and intensity of background noise to offer a more comprehensive assessment of AV speech perception in challenging listening conditions.
It is critical to note that the studies identified in this review were exclusively conducted in Western countries, utilising materials in English, German, French, and Dutch. This reveals a significant geographic and linguistic gap in the literature. There is a notable absence of AV-SIN test development and validation in languages from Asia, Africa, and South America. Given that linguistic and cultural factors can significantly influence speech perception, the direct application of existing tests to diverse global populations is not appropriate. Future research should prioritise the development of culturally and linguistically adapted AV-SIN tests to ensure that the benefits of this assessment methodology are accessible globally and relevant to diverse patient populations.
Procedural aspects and scoring methods
The review underscored several important methodological considerations for AV SIN testing, including the choice between adaptive procedures and fixed SNR measurements, as well as differences in scoring methods and the number of trials. Adaptive procedures for estimating SRTs offer efficiency and precision but may not capture the full range of performance across different SNRs. The approach taken by Van de Rijt et al. [41] of measuring performance across a fixed range of SNRs provides valuable insights into the shape of the psychometric function for AV speech perception, revealing important effects such as inverse effectiveness [41]. Inverse effectiveness is the principle that the benefit of combining auditory and visual information grows as the individual signals become less reliable.
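The interconnection between fixed-SNR measurements and the SRT noted earlier can be illustrated with a brief sketch: given proportion-correct scores at several fixed SNRs, a logistic psychometric function is fitted and the SRT read off as its 50% point. The data values, parameter ranges, and grid-search approach below are hypothetical choices for illustration, not the fitting method of any reviewed study.

```python
import math

def logistic(snr_db, srt_db, slope):
    """Psychometric function: proportion correct as a function of SNR,
    equal to 0.5 exactly at srt_db."""
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - srt_db)))

def fit_srt(data):
    """Least-squares grid search over candidate (SRT, slope) pairs;
    kept dependency-free for self-containment. Returns the pair that
    best explains the observed fixed-SNR scores."""
    best_err, best_srt, best_slope = float("inf"), None, None
    for srt10 in range(-150, 51):        # SRT candidates: -15.0 .. 5.0 dB
        for slope10 in range(1, 31):     # slope candidates: 0.1 .. 3.0
            srt, slope = srt10 / 10.0, slope10 / 10.0
            err = sum((p - logistic(s, srt, slope)) ** 2 for s, p in data)
            if err < best_err:
                best_err, best_srt, best_slope = err, srt, slope
    return best_srt, best_slope

# Hypothetical fixed-SNR block results: (SNR in dB, proportion correct).
blocks = [(-12, 0.08), (-9, 0.22), (-6, 0.50), (-3, 0.79), (0, 0.93)]
srt, slope = fit_srt(blocks)   # the fitted 50% point is the SRT
```

Fitting the full function in this way recovers both the threshold and the slope, the latter being the quantity that fixed-SNR designs such as that of Van de Rijt et al. [41] make visible and adaptive tracks alone do not.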
Another crucial aspect to consider is the selection of an appropriate SRT percentage. Studies have demonstrated that a 50% threshold may be too easy to reach when visual cues are present due to the benefits of lipreading [39]. Scoring methods also varied, ranging from keyword scoring for sentences to more detailed phoneme-level scoring, each offering different balances between efficiency and the depth of information obtained. Additionally, studies differed in trial count, with test lists ranging from 10 to 30 sentences, underscoring the need for further research to find the optimal balance between test reliability and administration time.
Future advancements in AV SIN testing could benefit from integrating diverse approaches. Employing adaptive procedures to rapidly estimate SRTs, alongside fixed-SNR measurements to characterise the full psychometric function, may enhance test accuracy. Additionally, incorporating multiple scoring levels could provide a comprehensive assessment, offering both overall intelligibility measures and detailed insights into AV integration.
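To make the adaptive approach concrete, it can be sketched as a simple simulation. The following is a minimal illustration, not a reproduction of any reviewed procedure: a hypothetical listener is modelled with a logistic psychometric function, and a 1-up/1-down staircase adjusts the SNR after each trial, converging on the SNR yielding 50% correct. The function names, slope, step size, and true SRT are all illustrative assumptions.

```python
import math
import random

def p_correct(snr_db, srt_db=-6.0, slope=0.5):
    """Logistic psychometric function: probability of repeating a
    sentence correctly at a given SNR (hypothetical listener)."""
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - srt_db)))

def adaptive_srt(n_trials=30, start_snr=0.0, step=2.0, seed=1):
    """Simple 1-up/1-down staircase: lower the SNR after a correct
    response, raise it after an error, so the track oscillates around
    the SNR yielding 50% correct. The SRT estimate is the mean of the
    SNRs visited after the initial approach phase."""
    rng = random.Random(seed)
    snr, track = start_snr, []
    for _ in range(n_trials):
        track.append(snr)
        correct = rng.random() < p_correct(snr)
        snr += -step if correct else step
    tail = track[10:]  # discard the descent toward threshold
    return sum(tail) / len(tail)

print(f"Estimated SRT: {adaptive_srt():.1f} dB SNR")  # converges near the true -6 dB
```

This illustrates the trade-off discussed above: the staircase homes in on a single point (here the 50% threshold) quickly, but says nothing about performance at SNRs far from that point, which is why fixed-SNR blocks are needed to trace the full psychometric function.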
Visual integration
The exploration of both video recordings and virtual human speakers for presenting visual speech information signifies an important area of innovation in AV SIN testing. Although video recordings of real speakers currently offer the most naturalistic visual cues, the potential benefits of virtual humans in terms of stimulus control and flexibility make this a promising avenue for future research. An important consideration is the simultaneous recording of the visual and auditory signals, as studies have shown that dubbing audio onto video during post-processing can introduce inconsistencies [39, 42].
The consistent finding of significant visual benefits across studies, with SRT improvements ranging from 1.5 to 5 dB, underscores the importance of incorporating visual cues in speech intelligibility assessments. The evidence for inverse effectiveness observed in several studies has significant implications for test design and the interpretation of results. Future AV SIN tests should be designed to capture this phenomenon by adjusting SNRs to optimise AV integration for each individual. Additionally, further research is needed to examine individual differences in the capacity to integrate auditory and visual speech cues, which have meaningful implications for developing rehabilitative strategies in clinical populations.
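These two quantities, the visual benefit in dB and inverse effectiveness, can be illustrated numerically. The sketch below assumes two logistic psychometric functions with hypothetical thresholds: an audio-only SRT of -4 dB and an audiovisual SRT of -7 dB, i.e. a 3 dB visual benefit within the 1.5 to 5 dB range reported above. It shows that the percentage-point gain from adding visual cues shrinks as the SNR improves, consistent with inverse effectiveness over this range. All parameter values are illustrative, not data from the reviewed studies.

```python
import math

def logistic(snr_db, srt_db, slope=0.4):
    """Proportion of speech correctly repeated as a logistic
    function of SNR (illustrative parameters, not fitted data)."""
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - srt_db)))

# Hypothetical thresholds: visual cues shift the SRT from -4 dB
# (audio-only, AO) to -7 dB (audiovisual, AV).
SRT_AO, SRT_AV = -4.0, -7.0
print(f"Visual benefit: {SRT_AO - SRT_AV:.1f} dB")  # 3.0 dB

# Inverse effectiveness within this range: the gain from adding
# visual cues is larger at adverse SNRs, where the auditory signal
# alone is less reliable, and shrinks as audio-only performance
# approaches ceiling.
for snr in (-6, -2, 2, 6):
    gain = logistic(snr, SRT_AV) - logistic(snr, SRT_AO)
    print(f"SNR {snr:+d} dB: AV gain = {gain * 100:4.1f} pct points")
```

This also makes the test-design point concrete: if trials are presented only at favourable SNRs, where both curves are near ceiling, the measurable AV gain is small, so SNRs must be chosen to sample the region where integration effects are expressed.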
Remote testing
The absence of remote testing capabilities in current AV SIN assessments highlights a significant gap in the field. As technology has advanced, particularly in recent years, the potential for remote administration of speech perception tests has grown considerably. With the growing importance of telehealth in audiology and the global shift towards remote healthcare delivery [47,50], it is imperative to prioritise the development of AV SIN tests that can be reliably administered in remote settings. Leveraging improved connectivity and emerging telehealth platforms, future research may increasingly incorporate remote testing methods to address this need.
However, the transition to remote AV SIN testing presents several challenges. Ensuring consistent audiovisual presentation across diverse devices and varying internet connections is critical. Additionally, safeguarding test integrity in unsupervised settings poses unique difficulties. Potential solutions may involve the development of specialised software or web-based platforms for AV SIN test delivery, integrating automated calibration procedures to standardise device output, and employing advanced encryption and authentication technologies to protect test materials.
Rigorous research will also be necessary to validate remote versions of AV SIN tests by comparing them with in-person administration to ensure equivalence of results. The successful advancement of remote AV SIN testing capabilities would substantially enhance the accessibility and clinical applicability of these assessments, facilitating broader adoption in both research and clinical practice. This development has the potential to revolutionise speech perception testing, bridging gaps in accessibility and aligning with the broader trends in telehealth innovation.
Strengths and limitations
A primary strength of this scoping review is its specific focus on SRTs measured through AV tests. This focus, however, inherently excluded a substantial body of research that examines AV speech perception without formally quantifying thresholds. Much of the research within the fields of psychology and neuroscience explores behavioural and neurological mechanisms underlying the integration of auditory and visual inputs during language processing without calculating speech reception scores. As a result, our review may have overlooked significant insights that this broader literature on AV speech perception could have provided. Nevertheless, we chose to focus specifically on SRTs because they offer a quantifiable and clinically relevant measure of speech intelligibility, which is particularly important for evaluating and comparing the performance of AV-based hearing assessments and interventions. This strategic focus enabled the identification of AV assessments that were expressly designed and validated to measure speech intelligibility thresholds. Although the SRT criterion resulted in a more limited pool of eligible studies, it permitted a more focused mapping of AV testing methodologies designed to quantify visual speech intelligibility gains. By narrowing the scope in this way, the review highlights the range of existing tools, their key characteristics, and areas where further methodological development or validation may be needed.
Furthermore, the decision to include only studies published in peer-reviewed journals may have introduced a degree of publication bias. Research on AV speech testing that is documented in unpublished manuscripts, conference proceedings, and dissertations was likely excluded. Expanding the scope of future reviews to include this so-called grey literature could yield additional perspectives and insights. An additional advantage of our inclusion criteria was the lack of restrictions regarding publication dates. This inclusive approach provided a comprehensive view of the development trajectory of AV speech testing, encompassing efforts that span several decades. By reviewing foundational works, we were able to identify early pioneers who explored the use of visual cues to enhance speech perception, striving towards the optimisation and validation of AV speech tests.
Similarly, our review identified a lack of research focusing on diverse populations. Although one study developed a test for paediatric use, most of the research centred on monolingual adults with normal hearing or post-lingual hearing loss. The unique challenges faced by multilingual individuals, pre-lingually deaf children, or individuals with comorbid cognitive or visual impairments were largely unaddressed. The applicability of these findings to populations in low-resource settings, where access to technology and clinical expertise is limited, also remains unexplored. This limits the global generalisability of the current body of evidence and underscores the need for research that is more inclusive of diverse participant groups.
Conclusions
In conclusion, this review identified 11 studies, conducted in English, Dutch, French, and German and across specialised research domains, providing a comprehensive evaluation of existing AV SIN tests and methodologies. The analysis revealed substantial variability across studies, highlighting the necessity for further research before these tests can be widely adopted. A key finding emphasised the importance of carefully calibrating scoring percentages and threshold criteria in future test designs to mitigate ceiling and floor effects, which may otherwise limit test sensitivity.
Notably, none of the AV assessments reviewed incorporates remote administration capabilities, underscoring a significant gap in the field. This presents a substantial opportunity to utilise telehealth innovations, thereby increasing access to testing for populations constrained by geographic location, mobility issues, or clinician availability. Careful consideration of the speech materials used in these assessments is also critical to optimising reliability and validity: factors such as the selection of talkers and linguistic complexity play a pivotal role in shaping test outcomes. Finally, the design of adaptive procedures must carefully balance precision in threshold measurement, test efficiency, and the minimisation of participant fatigue or demotivation.
Therefore, the development of a novel AV British English SIN test should prioritise addressing these identified gaps and considerations. The successful development of a remote test of this nature has the potential to transform clinical practice by facilitating more frequent assessments without the need for clinician involvement or specialised testing environments such as soundproof booths. Moreover, such advancements, particularly those integrating telehealth platforms with emerging technologies like virtual reality and artificial intelligence, would be invaluable for researchers focused on developing ecologically valid assessments that address the limitations of current audio-only (AO) protocols and pave the way for truly patient-centred hearing healthcare.
Acknowledgments
I thank all co-authors for their support and the extended team for their helpful advice on this study.
References
- 1. Rosenblum LD. Primacy of multimodal speech perception. In: The handbook of speech perception. 2005. p. 51–78.
- 2. Campbell R. The processing of audio-visual speech: empirical and neural bases. Philos Trans R Soc Lond B Biol Sci. 2008;363(1493):1001–10. pmid:17827105
- 3. Hussain A, Barker J, Marxer R, Adeel A, Whitmer W, Watt R, Derleth P. Towards multi-modal hearing aid design and evaluation in realistic audio-visual settings: challenges and opportunities. In: First International Conference on Challenges in Hearing Assistive Technology (CHAT-17), Stockholm, Sweden; 2017.
- 4. Rosenblum LD. Speech perception as a multimodal phenomenon. Curr Dir Psychol Sci. 2008;17(6):405–9. pmid:23914077
- 5. Erber NP. Auditory-visual perception of speech. J Speech Hear Disord. 1975;40(4):481–92. pmid:1234963
- 6. Guellaï B, Streri A, Yeung HH. The development of sensorimotor influences in the audiovisual speech domain: Some critical questions. Front Psychol. 2014;5:812. pmid:25147528
- 7. Sumby WH, Pollack I. Visual contribution to speech intelligibility in noise. J Acoust Soc Am. 1954;26(2):212–5.
- 8. Vatikiotis-Bateson E, Eigsti IM, Yano S, Munhall KG. Eye movement of perceivers during audiovisual speech perception. Percept Psychophys. 1998;60(6):926–40. pmid:9718953
- 9. Miles K, Beechey T, Best V, Buchholz J. Measuring speech intelligibility and hearing-aid benefit using everyday conversational sentences in real-world environments. Front Neurosci. 2022;16:789565. pmid:35368279
- 10. World Health Organization. Deafness and hearing loss. https://www.who.int/news-room/fact-sheets/detail/deafness-and-hearing-loss. 2023. Accessed 2023 December 3.
- 11. Akeroyd MA, Munro KJ. Population estimates of the number of adults in the UK with a hearing loss updated using 2021 and 2022 census data. Int J Audiol. 2024;63(9):659–60. pmid:38747510
- 12. Barlow C, Davison L, Ashmore M. Variation in tone presentation by pure tone audiometers: the potential for error in screening audiometry. In: Proceedings of Euronoise 2015; 2015.
- 13. Barlow C, Davison L, Ashmore M, Weinstein R. Amplitude variation in calibrated audiometer systems in clinical simulations. Noise Health. 2014;16(72):299–305.
- 14. Parmar BJ, Rajasingam SL, Bizley JK, Vickers DA. Factors affecting the use of speech testing in adult audiology. Am J Audiol. 2022;31(3):528–40.
- 15. Cox RM, Alexander GC, Gilmore C. Development of the connected speech test (CST). Ear Hear. 1987;8(5 Suppl):119S-126S. pmid:3678650
- 16. Bilger RC, Nuetzel JM, Rabinowitz WM, Rzeczkowski C. Standardization of a test of speech perception in noise. J Speech Hear Res. 1984;27(1):32–48. pmid:6717005
- 17. Nilsson M, Soli SD, Sullivan JA. Development of the Hearing in Noise Test for the measurement of speech reception thresholds in quiet and in noise. J Acoust Soc Am. 1994;95(2):1085–99. pmid:8132902
- 18. Killion MC, Niquette PA, Gudmundsen GI, Revit LJ, Banerjee S. Development of a quick speech-in-noise test for measuring signal-to-noise ratio loss in normal-hearing and hearing-impaired listeners. J Acoust Soc Am. 2004;116(4 Pt 1):2395–405. pmid:15532670
- 19. Wilson RH. Development of a speech-in-multitalker-babble paradigm to assess word-recognition performance. J Am Acad Audiol. 2003;14(9):453–70. pmid:14708835
- 20. Etymotic Research. Bamford-Kowal-Bench Speech-in-Noise Test (version 1.03). Elk Grove Village, IL: Etymotic Research; 2005.
- 21. Badajoz-Davila J, Buchholz JM. Effect of test realism on speech-in-noise outcomes in bilateral cochlear implant users. Ear Hear. 2021;42(6):1687–98. pmid:34010247
- 22. Arnold P, Hill F. Bisensory augmentation: A speechreading advantage when speech is clearly audible and intact. Br J Psychol. 2001;92 Part 2:339–55. pmid:11802877
- 23. Erber NP. Auditory, visual, and auditory-visual recognition of consonants by children with normal and impaired hearing. J Speech Hear Res. 1972;15(2):413–22. pmid:5047880
- 24. Gagné J-P, Wittich W. Visual impairment and audiovisual speech perception in older adults with acquired hearing loss. In: Hearing care for adults: the challenge of aging. Proceedings of the Second International Conference; 2009. p. 165–77.
- 25. Tye-Murray N, Spehar B, Myerson J, Hale S, Sommers M. Lipreading and audiovisual speech recognition across the adult lifespan: Implications for audiovisual integration. Psychol Aging. 2016;31(4):380–9. pmid:27294718
- 26. Poissant SF, Bero EM, Busekroos L, Shao W. Determining cochlear implant users’ true noise tolerance: Use of speech reception threshold in noise testing. Otol Neurotol. 2014;35(3):414–20. pmid:24518402
- 27. Hu W, Swanson BA, Heller GZ. A statistical method for the analysis of speech intelligibility tests. PLoS One. 2015;10(7):e0132409. pmid:26147290
- 28. Mulwafu W, Ensink R, Kuper H, Fagan J. Survey of ENT services in sub-Saharan Africa: Little progress between 2009 and 2015. Glob Health Action. 2017;10(1):1289736. pmid:28485648
- 29. Hussain A, Hussain Z, Gogate M, Dashtipour K, Ng D, Riaz MS, et al. Impact of the Covid-19 pandemic on audiology service delivery: Observational study of the role of social media in patient communication. PLoS One. 2024;19(4):e0288223. pmid:38662689
- 30. Almufarrij I, Dillon H, Dawes P, Thodi C, Stone M, Charalambous AP, et al. Web- and app-based tools for remote hearing assessment: a scoping review protocol. 2020.
- 31. Motlagh Zadeh L, Brennan V, Swanepoel DW, Lin L, Moore DR. Remote self-report and speech-in-noise measures predict clinical audiometric thresholds. Int J Audiol. 2025;64(6):618–26. pmid:39109478
- 32. Paglialonga A, Polo EM, Zanet M, Rocco G, van Waterschoot T, Barbieri R. An automated speech-in-noise test for remote testing: Development and preliminary evaluation. Am J Audiol. 2020;29(3S):564–76. pmid:32946249
- 33. Arksey H, O’Malley L. Scoping studies: towards a methodological framework. Int J Soc Res Methodol. 2005;8(1):19–32.
- 34. Tricco AC, Lillie E, Zarin W, O’Brien KK, Colquhoun H, Levac D, et al. PRISMA Extension for scoping reviews (PRISMA-ScR): Checklist and explanation. Ann Intern Med. 2018;169(7):467–73. pmid:30178033
- 35. Haddaway NR, Collins AM, Coughlin D, Kirk S. The role of google scholar in evidence reviews and its applicability to grey literature searching. PLoS One. 2015;10(9):e0138237. pmid:26379270
- 36. Arnold L, Boyle P, Canning D. Development of a paediatric audiovisual speech test in noise. Cochlear Implants Int. 2010;11 Suppl 1:244–8. pmid:21756624
- 37. Bernstein JGW, Grant KW. Auditory and auditory-visual intelligibility of speech in fluctuating maskers for normal-hearing and hearing-impaired listeners. J Acoust Soc Am. 2009;125(5):3358–72. pmid:19425676
- 38. Cox RM, Alexander GC, Gilmore C, Pusakulich KM. The Connected speech test version 3: audiovisual administration. Ear Hear. 1989;10(1):29–32. pmid:2470629
- 39. Llorach G, Kirschner F, Grimm G, Zokoll MA, Wagener KC, Hohmann V. Development and evaluation of video recordings for the OLSA matrix sentence test. Int J Audiol. 2022;61(4):311–21. pmid:34109902
- 40. MacLeod A, Summerfield Q. Quantifying the contribution of vision to speech perception in noise. Br J Audiol. 1987;21(2):131–41. pmid:3594015
- 41. van de Rijt LPH, Roye A, Mylanus EAM, van Opstal AJ, van Wanrooij MM. The Principle of inverse effectiveness in audiovisual speech perception. Front Hum Neurosci. 2019;13:335. pmid:31611780
- 42. Le Rhun L, Llorach G, Delmas T, Suied C, Arnal LH, Lazard DS. A standardised test to evaluate audio-visual speech intelligibility in French. Heliyon. 2024;10(2):e24750. pmid:38312568
- 43. Datta Choudhary Z, Bruder G, Welch GF. Visual facial enhancements can significantly improve speech perception in the presence of noise. IEEE Trans Vis Comput Graph. 2023;29(11):4751–60. pmid:37782611
- 44. Devesse A, Dudek A, van Wieringen A, Wouters J. Speech intelligibility of virtual humans. Int J Audiol. 2018;57(12):908–16. pmid:30261770
- 45. Schreitmüller S, Frenken M, Bentz L, Ortmann M, Walger M, Meister H. Validating a method to assess lipreading, audiovisual gain, and integration during speech reception with cochlear-implanted and normal-hearing subjects using a talking head. Ear Hear. 2018;39(3):503–16. pmid:29068860
- 46. Basharat A, Thayanithy A, Barnett-Cowan M. A scoping review of audiovisual integration methodology: Screening for auditory and visual impairment in younger and older adults. Front Aging Neurosci. 2022;13:772112. pmid:35153716
- 47. D’Onofrio KL, Zeng F-G. Tele-audiology: current state and future directions. Front Digit Health. 2022;3:788103. pmid:35083440
- 48. Moher D, Liberati A, Tetzlaff J, Altman DG, PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. Ann Intern Med. 2009;151(4):264–9, W64. pmid:19622511
- 49. MacLeod A, Summerfield Q. A procedure for measuring auditory and audio-visual speech-reception thresholds for sentences in noise: rationale, evaluation, and recommendations for use. Br J Audiol. 1990;24(1):29–43. pmid:2317599
- 50. Eikelboom RH, Bennett RJ, Manchaiah V, Parmar B, Beukes E, Rajasingam SL, et al. International survey of audiologists during the COVID-19 pandemic: use of and attitudes to telehealth. Int J Audiol. 2022;61(4):283–92. pmid:34369845