Efficacy of Individual Computer-Based Auditory Training for People with Hearing Loss: A Systematic Review of the Evidence

Background: Auditory training involves active listening to auditory stimuli and aims to improve performance in auditory tasks. As such, auditory training is a potential intervention for the management of people with hearing loss.
Objective: This systematic review (PROSPERO 2011: CRD42011001406) evaluated the published evidence-base for the efficacy of individual computer-based auditory training to improve speech intelligibility, cognition and communication abilities in adults with hearing loss, with or without hearing aids or cochlear implants.
Methods: A systematic search of eight databases and key journals identified 229 articles published since 1996, 13 of which met the inclusion criteria. Data were independently extracted and reviewed by the two authors. Study quality was assessed using ten pre-defined scientific and intervention-specific measures.
Results: Auditory training resulted in improved performance for trained tasks in 9/10 articles that reported on-task outcomes. Although significant generalisation of learning was shown to untrained measures of speech intelligibility (11/13 articles), cognition (1/1 articles) and self-reported hearing abilities (1/2 articles), improvements were small and not robust. Where reported, compliance with computer-based auditory training was high, and retention of learning was shown at post-training follow-ups. Published evidence was of very-low to moderate study quality.
Conclusions: Our findings demonstrate that published evidence for the efficacy of individual computer-based auditory training for adults with hearing loss is not robust and therefore cannot be reliably used to guide intervention at this time. We identify a need for high-quality evidence to further examine the efficacy of computer-based auditory training for people with hearing loss.


Background
The World Health Organization [1] reported in 2004 that over 275 million people worldwide had a significant hearing impairment. Adult-onset hearing loss is highly prevalent, whereby 27% of males and 24% of females aged 45 years and over experience mild hearing loss (defined as a pure-tone hearing threshold of 26 decibels (dB) averaged across 0.5, 1, 2 and 4 kHz) or greater in the better hearing ear. Hearing loss is currently estimated to be the 13th most common disease burden worldwide, and it has been predicted that by 2030 adult-onset hearing loss will be the seventh leading disease burden, above diabetes and HIV [1]. Hearing loss can lead to additional difficulties with employment, depression, social isolation, and reduced quality of life [2]. Untreated hearing loss has a substantial social impact on the person with hearing loss and on those with whom they communicate [3,4].
For adults who gradually acquire a hearing loss, their first complaint is unlikely to be 'I cannot hear'. More often they report 'I can hear but I cannot understand what is being said', particularly in noisy listening environments [5]. Hearing aids are the most common management option for people with hearing loss, yet uptake is low, with just 20% of people with hearing loss in the UK [6,7], and just under 30% in the US [8] owning them. Furthermore, of those who do own hearing aids, between 15% and 30% do not wear them regularly [6,9]. It has become apparent over the last decade that the challenges faced by people with hearing loss cannot be explained by the audiogram alone [5,10]. Difficulties in hearing may be exacerbated by, or masquerade as, reductions in cognitive ability such as problems in remembering or comprehending speech [11,12]. Although hearing aids may help people with hearing loss hear speech, their ability to listen to and make sense of speech may still be sub-optimal. Cognitive function plays a significant role in listening, whereby greater working memory capacity is associated with improved language comprehension [13], and selective attention has been shown to be central to following multi-speaker conversations (see [14] for a review).
The 2012 British Society of Audiology guidance for adult hearing rehabilitation [15] states that successful rehabilitation should be based upon, 'identifying individual needs, setting specific goals, making shared informed decisions and supporting self-management - steps that are important for helping the client to overcome his/her difficulties in everyday life' [15]. To enable this, clinicians may need to consider interventions that are complementary or alternative to the provision of hearing aids (or cochlear implants where hearing loss is severe to profound). Auditory training is one such clinical intervention, which promotes self-management of hearing difficulties, and is aimed at improving speech intelligibility through the development of auditory perceptual skills. Typically, listeners learn to make perceptual distinctions between sounds presented systematically [16]. Studies of auditory perceptual learning demonstrate the potential for training to improve auditory perceptual skills over the course of an adult's lifespan (see [17] for a review).

Auditory Training as an Intervention for People with Hearing Loss
Historically, a distinction has been made between bottom-up sensory refinement (analytic training) and top-down improvement of spoken language comprehension (synthetic training) [18]. In 2005, Sweetow and Palmer published a systematic review of studies that assessed the efficacy of individual auditory training for adults with hearing loss [18]. These studies assessed clinician-delivered training, an intervention which is time-, resource-, and cost-intensive. Six articles, published between 1970 and 1996 [19-24], met the criteria for inclusion, which were: Participants: adults with hearing loss with or without hearing aids, who were not cochlear implant users; Intervention: analytic or synthetic auditory training, or a combination of the two; Controls: with or without a control group comparison; Outcomes: one or more measures relating to communication skills (e.g. understanding speech, self-perception of ability); Study designs: randomized controlled trials, nonrandomized controlled trials, cohort and repeated measures designs with or without a control group. The authors concluded that speech recognition skills, particularly in noise, may be improved by synthetic training, whereas the contribution of analytic training remains uncertain. Yet, more recently, bottom-up (analytic) auditory training using phoneme discrimination has also shown improvements in top-down cognitive processing, which may offer additional benefit to people with hearing loss, particularly in adverse listening situations [25]. Finally, a meta-analysis of six studies assessing the benefits of (primarily clinician-delivered) auditory training for people with hearing loss published between 1970 and 2009 (including those reviewed by Sweetow and Palmer [18]) suggested reliable but small post-training improvements in speech recognition performance (Cohen's d = .352) [26].
Over the last two decades, the emergence of individual (non-clinician-delivered), computer-based auditory training (CBAT) packages has resulted in a resurgence of interest in auditory training as an intervention for people with hearing loss. The key benefits of CBAT include home-delivery, the potential to tailor training packages to individual needs, and the ability to remotely monitor and capture trainee data over the internet. Thus, CBAT is an intervention that is time-, resource- and cost-effective, and can be conveniently accessed by the user [27]. There are several key considerations in using individual CBAT as an intervention to improve speech intelligibility for people with hearing loss. First and foremost, the intervention should be demonstrated to be effective, whereby any on-task learning should generalise to functional benefits in real-world listening ability. Improvements in behavioural measures of speech intelligibility in noise are typically considered by researchers and clinicians to be the ultimate aim of CBAT, as this is the most common complaint of people with hearing loss. However, as speech intelligibility has been shown to be mediated by cognition, particularly where the speech signal is degraded, training-related improvements in cognition (e.g. attention and working memory) are also likely to reflect functional real-world benefits to listening. Second, for auditory training to be accepted by an individual and therefore undertaken, the individual must be able to identify tangible benefits from the training. Thus, improvements in self-reported communication abilities are also important to the success of CBAT. Nevertheless, evidence from studies of alternative interventions suggests that improvements in self-reported outcomes alone may simply reflect expectations created as a result of receiving an intervention [28,29]. Ideally, any improvements in self-reported outcomes should therefore be accompanied by functional benefits, as indexed by behavioural tasks of speech intelligibility or cognitive performance. Third, for auditory training to be a successful intervention for people with hearing loss, any CBAT-related improvements should persist over time. Finally, individuals must comply with CBAT, as an intervention can only be effective when individuals conform. This final point is of particular importance where CBAT is used as an unsupervised, home-based intervention.

Research Aims
A systematic review aims to identify, appraise and synthesize all the empirical evidence that meets pre-specified eligibility criteria in order to answer a given research question [30]. The primary aim of the present review was to examine the evidence for individual CBAT as an effective intervention for people with hearing loss. The evidence-base in the published literature was evaluated for both on-task learning for trained stimuli, and generalisation of learning to improvements in untrained measures of speech intelligibility, cognition and communication. Secondary aims sought to examine the feasibility of individual CBAT as an intervention for people with hearing loss by examining (i) the long-term retention of training-related improvements, and (ii) levels of compliance with CBAT programmes. To address these questions, data from 13 published articles that met the criteria for inclusion were reviewed and quality-assessed. Evidence for the efficacy of CBAT for people with hearing loss was extracted from included articles and the quality of evidence examined. Findings are presented together with recommendations for future research.

Methods
The Centre for Reviews and Dissemination, University of York [31], part of the National Institute for Health Research (NIHR), and the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) statement [32], which offer guidance for undertaking and reporting of systematic reviews in health care, were used to inform the methodology, the systematic search procedure, and the reporting of this systematic review (Checklist S1).

Systematic Search Strategy and Study Selection
Methods of data extraction, data analysis, and inclusion and exclusion criteria were pre-specified and documented within the systematic review protocol. This is important in providing transparency in the review process by ensuring that the objectives of the systematic review and methods of data identification and extraction are clearly defined prior to any data being collected. Details of the systematic review protocol have been registered with PROSPERO, the International prospective register of systematic reviews. The protocol is available online at: http://www.crd.york.ac.uk/prospero/display_record.asp?ID=CRD42011001406. Inclusion criteria were formed using the Participants, Intervention, Control, Outcomes, and Study designs (PICOS) strategy [33]. PICOS inclusion criteria are presented in Table 1. Exclusion criteria included articles that were published prior to 1996 (i.e. those included in the previous review of auditory training for people with hearing loss by Sweetow and Palmer [18]), studies presenting pilot data, studies that were not peer reviewed and those not available in English.
Study identification. Eight electronic databases (Embase, Medline, PubMed, Web of Science, Applied Social Sciences Index and Abstracts (ASSIA), Science Direct, Cumulative Index to Nursing and Allied Health (CINAHL) and PsycINFO) were initially searched in August 2011 using the terms hearing loss OR hearing aid* OR hearing impair* OR cochlear implant* AND auditory training OR auditory learning OR perceptual training OR perceptual learning. Search terms were always combined in an attempt to limit identified papers to those reporting adult subjects with hearing loss. An example search string is provided (Example Search Terms S1). Additional articles were identified through the systematic snowballing of the reference lists of all 349 articles, and a related article search for each author of an article which met the PICOS criteria for inclusion. Three further articles were identified through ongoing hand-searches of audiology journals, up to the date of first submission of this article (December 2012), to ensure an up-to-date review.
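The Boolean structure of the search terms listed above can be sketched in Python. This is an illustration only: the helper name `build_query` is hypothetical, the parenthesised grouping of the two term sets is an assumption about how the terms were combined, and each database has its own query syntax that is not reproduced here.

```python
# Term sets copied from the search description; the grouping is an assumption.
POPULATION_TERMS = [
    "hearing loss", "hearing aid*", "hearing impair*", "cochlear implant*",
]
INTERVENTION_TERMS = [
    "auditory training", "auditory learning",
    "perceptual training", "perceptual learning",
]


def build_query(population, intervention):
    """Join each term set with OR, then combine the two sets with AND."""
    population_clause = " OR ".join(population)
    intervention_clause = " OR ".join(intervention)
    return f"({population_clause}) AND ({intervention_clause})"


query = build_query(POPULATION_TERMS, INTERVENTION_TERMS)
print(query)
```

Grouping the clauses explicitly avoids relying on each database's operator precedence, which is one plausible reading of "search terms were always combined".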
Screening. The database searches returned a total of 349 articles. A further 27 articles were identified through the additional journal searches. A total of 229 articles of potential relevance remained after the removal of duplicate articles (n = 147). Abstracts of the 229 identified articles were independently assessed by the two authors relative to the PICOS criteria for inclusion (Figure 1), of which 201 failed to meet the inclusion criteria. In cases where insufficient detail was available in the abstract to make a decision, the full-text of the article was retrieved and assessed against the PICOS criteria. A total of 28 abstracts either met the PICOS criteria for inclusion, or contained insufficient information from which to make a judgement, and progressed to a second stage of screening where the full-text articles were obtained.
Eligibility. A second stage of assessment, a full-text review of the 28 potentially relevant articles, revealed 15 articles that failed to meet the PICOS criteria for inclusion. For cases where multiple publications arose from the same participant data, only the first publication was included in line with the Centre for Reviews and Dissemination guidelines [31]. A total of 13 articles were eligible for inclusion in the systematic review.
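The article counts reported across the identification, screening and eligibility stages can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the text; the variable names are illustrative.

```python
# Counts as reported in the study identification, screening and eligibility text.
database_hits = 349
additional_hits = 27
duplicates_removed = 147
excluded_on_abstract = 201
excluded_on_full_text = 15

# Each stage is derived from the one before it.
screened = database_hits + additional_hits - duplicates_removed
full_text_reviewed = screened - excluded_on_abstract
included = full_text_reviewed - excluded_on_full_text

print(f"Abstracts screened: {screened}")
print(f"Full texts reviewed: {full_text_reviewed}")
print(f"Articles included: {included}")
```

The derived totals agree with the 229 abstracts screened, 28 full-text articles reviewed and 13 articles included.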

Data Extraction and Data Synthesis
Data to be extracted were pre-specified within a data extraction and quality assessment form, piloted by the two authors and amended as necessary. Final study data extraction was conducted independently by the same two authors and included details of study design, participants (number, age, sex and hearing loss), training stimuli, amount of training and training duration, outcome measures, main findings (both trained and untrained), compliance and follow-up. Where any instances of non-agreement on the extracted data occurred, the article was jointly revisited until a consensus was reached.

Study Quality and Potential Sources of Study Bias
Scientific study quality and potential sources of study bias were assessed using five independent measures: randomisation, controls, sample size and power calculation, blinding, and outcome measure reporting. Low scores on these measures indicate less information, and thus a higher potential for bias in results. Five additional measures, which were all highly specific to training intervention studies, aimed to capture the quality of the intervention study designs. These measures were: generalisation of learning to functional benefits in real-world listening (outcome selection); training feedback, which has been previously shown to maximise auditory learning in auditory training [33,34]; ecological validity; compliance with training protocols; and long-term follow-up of improvements. Scores for each of the study quality measures ranged from 0 to 2. A score of 0 indicated flawed or no information from which to make a judgement, a score of 1 indicated weak information or lack of detail and a score of 2 indicated appropriate use and reporting.
Individual measure scores were summed to form an overall study quality score that was then used to inform the level of evidence attributed to each study. Levels of evidence were adapted from the 2004 Grading of Recommendations Assessment, Development and Evaluation (GRADE) Working Group guidelines [35] and provide an indication of confidence in the estimation of effect (Table 2). Studies that represent a low level of evidence offer results that are likely to vary should the experiment be repeated, whereas a study offering a high level of evidence provides greater confidence that the data are representative of valid results.
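As a minimal sketch of this scoring scheme, the following Python sums ten measure scores (each 0 to 2) and maps the total to a level of evidence using the bands given in the notes to Table 5 (0-5 very low, 6-10 low, 11-15 moderate, 16-20 high). The measure keys and function names are illustrative, and the example scores are invented for demonstration; they do not correspond to any included article.

```python
SCIENTIFIC_MEASURES = (
    "randomisation", "controls", "sample_size_and_power",
    "blinding", "outcome_reporting",
)
TRAINING_MEASURES = (
    "outcome_selection", "feedback", "ecological_validity",
    "compliance", "follow_up",
)


def study_quality_score(scores):
    """Sum the ten measure scores after checking each is 0, 1 or 2."""
    expected = set(SCIENTIFIC_MEASURES) | set(TRAINING_MEASURES)
    if set(scores) != expected:
        raise ValueError("scores must cover exactly the ten quality measures")
    if any(value not in (0, 1, 2) for value in scores.values()):
        raise ValueError("each measure is scored 0, 1 or 2")
    return sum(scores.values())


def level_of_evidence(total):
    """Map a study quality score (0-20) to its level-of-evidence band."""
    if not 0 <= total <= 20:
        raise ValueError("study quality score must lie between 0 and 20")
    if total <= 5:
        return "very low"
    if total <= 10:
        return "low"
    if total <= 15:
        return "moderate"
    return "high"


# Hypothetical study: 1 on every measure except outcome reporting (2).
example_scores = {name: 1 for name in SCIENTIFIC_MEASURES + TRAINING_MEASURES}
example_scores["outcome_reporting"] = 2
example_total = study_quality_score(example_scores)
print(example_total, level_of_evidence(example_total))
```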
A meta-analysis of results from comparable studies was not possible due to the heterogeneity between studies in terms of differences in participant samples (people with hearing loss, hearing aid users, and cochlear implant users), training stimuli, training protocols, and outcome measures adopted. As such, study findings and study quality were incorporated within a narrative synthesis to aid interpretation of findings and to examine any differences in outcomes across the 13 articles included in the systematic review. Table 3 summarises the data extracted from each of the 13 articles. Where publications reported more than one study or training protocol, these are presented separately in the table.

Study Characteristics
Data extracted from the 13 articles are presented in terms of the PICOS criteria (Participants, Intervention, Controls, Outcome measures, Study designs).
Intervention. Training stimuli, training frequency, and training duration varied substantially between studies. Several studies trained participants on small parts of speech, such as phonemes [36], monosyllabic words, vowels or consonants [37,40,42,47,48], spondee words [44], and nonsense syllables [38,41]. Other studies trained participants using digits [45], words or sentences [41,43]. The remaining studies trained their participants using hybrid communication training packages such as the Listening and Communication Enhancement (LACE) program [39], which included a variety of listening and cognitive tasks alongside interactive communication strategies, and 'I hear what you mean' [46], that comprised several listening comprehension tasks. Training sessions ranged from 30 minutes [39] to 2 hours per session [41], and occurred daily to twice-weekly. Training duration ranged from four days [47] to three months [44].
Controls. Ideally a systematic review would assess only high-level evidence arising from randomised controlled trials, as randomisation substantially increases our confidence that any observed effects are attributable to the intervention rather than to between-group differences. Nevertheless, inclusion of only randomised controlled trials in this review would have served to eliminate the majority of published evidence assessing the efficacy of auditory training for people with hearing loss. As such, other study designs were considered. However, only studies that reported direct comparisons between control and trained groups, or between control and trained periods within a subject group, were included in the review. Repeated measures design (where participants act as their own controls) was the most commonly identified study type.
Outcome measures. There were no outcome measures that were common to all training studies. The majority of studies assessed measures of speech intelligibility using validated speech tests such as the Hearing in Noise Test (HINT) [68], the Revised Speech in Noise test (R-SPIN) [71], IEEE sentences [64], and the Nonsense Syllables Test (NST) [70]. The study by Sweetow and Henderson-Sabes [39] also included behavioural measures of cognition (working memory: Listening Span [11], and attention: Stroop Task [75]) and self-report of hearing (either the Hearing Handicap Inventory for the Elderly (HHIE) [67] or the Hearing Handicap Inventory for Adults (HHIA) [66], and the Communication Scale for Older Adults (CSOA) [60]). The study by Ingvalson et al. [47] was the only other study to include a self-report measure of hearing (Speech, Spatial and Qualities of Hearing Scale (SSQ) [74]), whereas Stacey et al. [43] assessed self-report of health status using the Glasgow Benefit Inventory (GBI) [65].

On-task Learning
On-task learning was defined as any improvement in performance on a task or stimulus that had been directly trained. This was almost always reported in studies assessing the efficacy of auditory training for people with hearing loss (10/13). Nine articles reported significant on-task learning for trained stimuli. Despite trends towards improvement, Stacey et al. [43] did not show significant on-task learning for trained words in a group of 10 cochlear implant users. Barcroft et al. [46], Ingvalson et al. [47] and Zhang et al. [48] did not report any on-task learning results.
Burk and colleagues were the only authors to report multiple outcomes from variations in training protocols using the same word training stimuli. Humes et al. [42] reported significant but considerably smaller improvements (20%) in open-set word recognition for trained words presented in larger sets (600 words) than Burk and Humes [40], who reported improvements of 40-55% for words presented in smaller sets (150 words).

Generalisation of On-task Learning
Generalisation of learning was defined as an improvement in performance on a task or stimulus not directly trained. Outcomes that measured the generalisation of learning were divided into three sub-groups: improvements in speech intelligibility, cognition and self-report of communication.
Speech Intelligibility. All studies reported at least one measure of speech intelligibility. Table 4 provides a summary of these outcomes and any significant post-training improvement. Results are presented for untrained speech stimuli only, thus generalisation to improvements in performance for trained stimuli produced by different talkers is not considered here (refer to Table 3).
A number of studies reporting untrained speech intelligibility measures identified measures that had a degree of overlap between the lexical content of the trained and outcome stimuli. For example, Burk et al. [37] reported a 6-9% overlap and tested trained word recognition embedded in untrained sentences. Humes et al. [42] reported substantially greater overlap between the lexical content of trained and outcome stimuli of 50-80%. For some studies the degree of overlap was unclear, for example, the 'Four Choice Discrimination Task' reported by Barcroft et al. [46] appears to be very similar in nature to a trained exercise, although no details about any lexical overlap are provided.
Results revealed mixed findings for all participant groups, training stimuli, and study designs, whereby generalisation of learning to untrained measures of speech intelligibility did not always occur. For example, Burk et al. [37] reported that training on words generalised to improvements in untrained words and to trained words by untrained speakers, but did not generalise to […] [48] showed post-training improvements in the intelligibility of untrained vowels, consonants and words, but no improvements in performance for untrained sentences. Typically, where improvements were reported, the magnitude of improvement for people with hearing loss was small. For example, older hearing impaired listeners in Burk et al. [37] improved on untrained word recognition by an average of 6.9% compared to 45.3% in younger normally hearing listeners. Average improvement in untrained measures of speech intelligibility in people with hearing loss ranged from 3.3% for sentences [38] to 14.9% for words [48]. Sweetow and Henderson-Sabes [39] were the only authors to report effect sizes for improvements in speech outcomes for people with hearing loss following training using LACE. Despite small reported effect sizes (ES) for an untrained measure of speech intelligibility (QuickSIN [70]: improvement of -1.5 dB signal to noise ratio (SNR) when presented at 70 dB, ES = 0.23; improvement of -2.2 dB SNR when presented at 45 dB, ES = 0.31), the authors suggested that 46% of participants achieved clinically significant post-training improvements (defined as an improvement of -1.6 dB or greater in the SNR). No significant improvements were shown for HINT sentences, which the authors attributed to improvements also being shown in the control group, suggesting likely test-retest improvement effects.
Cognition. The study by Sweetow and Henderson-Sabes [39] was the only study to include cognitive outcome measures. Significant post-training improvements were shown for measures of attention (Stroop, 7.5 points) and working memory (Listening Span, 0.5 sentences). However, unlike the results for speech intelligibility measures in the same study, effect sizes were not presented for these cognitive outcomes. Furthermore, due to the hybrid (auditory-cognitive) nature of the training stimuli, it is not clear which element(s) of LACE contributed to the improvements in cognition.

Table 3 notes. Data for normally hearing participants are omitted from this table [42]. - = no data reported; *p < .05, **p < .01, ***p < .001. MSB = multi-speaker babble; SNHL = sensorineural hearing loss; SNR = signal to noise ratio; SRT = speech reception threshold; RAU = rationalised arcsine unit [49]; Adaptive spondee words in babble test [50]; Adaptive 12-choice spondee words with multiple jammers test [51]; AzBio sentences [52]; BKB = Bamford-Kowal-Bench sentence lists [53]; Build-a-sentence test [54]; CID Everyday Sentences = Central Institute for the Deaf everyday sentences [55]; CID W22 = Central Institute for the Deaf word lists [56]; CNC = consonant-vowel nucleus-consonant monosyllables [57]; CNC words [58]; Consonant recognition [59]; CSOA = Communication Scale for Older Adults [60]; CST-A = Connected Speech [audio] test [61]; CST-AV = Connected Speech [audio-visual] test [61]; CUNY = City University of New York sentences [62]; Everyday sounds localization test [63]; IEEE Sentences [64]; Four-choice discrimination test [46]; GBI = Glasgow benefit inventory [65]; HHIA = Hearing handicap inventory for adults [66]; HHIE = Hearing handicap inventory for the elderly [67]; HINT = Hearing in noise test [68]; Iowa consonant test [69]; LACE = Listening and Communication Enhancement [39]; Listening span test [11]; NST = Nonsense syllable test [70]; QuickSIN [70]; R-SPIN = Revised speech perception noise test [71]; SPATS = [72]; SPIN = Speech perception in noise test [73]; SSQ = Speech, Spatial and Qualities of Hearing Scale [74]; Stroop Color Word test [75]; TIMIT sentences [76]; VAST = Verb and sentence test [77]; Vowel recognition [78]. doi:10.1371/journal.pone.0062836.t003

Retention of Learning
Retention of learning was defined as (i) the maintenance of a significant improvement from pre-training baseline performance, or (ii) a non-significant decrease in post-training performance, at a delayed post-training follow-up assessment. Follow-up assessments were reported in 8/13 articles, ranging from 4 days to 7 months post-training.
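The two-part definition of retention above can be expressed as a simple decision rule. The sketch below is illustrative only: the function and argument names are hypothetical, and each boolean stands for the outcome of a significance test at follow-up that a study would need to have reported.

```python
def learning_retained(improved_vs_baseline, declined_vs_post_training):
    """Apply the review's retention definition at a follow-up assessment.

    improved_vs_baseline: follow-up performance is significantly better
        than pre-training baseline (criterion i).
    declined_vs_post_training: follow-up performance is significantly
        worse than immediate post-training performance; retention under
        criterion (ii) requires this to be False.
    """
    return improved_vs_baseline or not declined_vs_post_training


# Significant gain over baseline, despite a significant drop since training.
print(learning_retained(True, True))    # retained under criterion (i)
# No significant gain over baseline, but no significant drop either.
print(learning_retained(False, False))  # retained under criterion (ii)
# Neither criterion met.
print(learning_retained(False, True))
```

Because either criterion is sufficient, a study can meet the definition even when performance has fallen significantly from its post-training peak, provided it remains significantly above baseline.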
Retention of on-task learning. Burk et al. [37] reported that trained word-recognition performance at baseline (37.6%) was significantly improved six months post-training (62.9%, p < .05), but significantly worse than immediate post-training performance (83.5%, p < .05). The authors reported that participants were returned to peak post-training performance levels with as little as one hour of top-up training, although no additional follow-up was conducted to identify for how long peak performance was maintained. Stecker et al. [38] reported that Nonsense Syllable Test (NST) scores for new hearing aid users showed no significant decrement from immediate post-training performance (9.8%) at an eight-week follow-up (8.7%). For existing hearing aid users, the same was true, despite a smaller post-training improvement. Burk and Humes [40] tested participants on the same outcomes once a week for seven weeks after completion of two training protocols (easy and hard words), with no significant reduction in performance over the seven-week period (Table 3). Tyler et al. [44] reported retention of trained sound localization and spondee-in-noise performance at two and seven months post-training, although no statistical tests are reported due to the small sample size. Finally, Oba et al. [45] reported retention of trained digit recognition performance at one month post-training that was significantly better than pre-training baseline, with no significant reduction from post-training performance levels.
Retention of generalised improvements in untrained outcomes. Sweetow and Henderson-Sabes [39] reported that at the time of publication, 65% (42/65) of trained participants had completed both QuickSIN and HINT sentences, and 48% (31/65) of participants had completed the HHIE, HHIA and the CSOA questionnaires, at a four-week follow-up. Post-training improvements were reported to be maintained for all measures, although no statistical analyses were presented. Tyler et al. [44] reported retention of improvements for HINT sentences seven months post-training for subject 1 (of 3) only. However, the authors also reported a gradual improvement in performance over time at pre-training assessments for subject 1, suggesting that some degree of post-training improvement in this measure may be attributable to test-retest improvement effects. Oba et al. [45] reported that performance at one month post-training was significantly greater than pre-training baseline performance for both HINT and IEEE sentences (in steady noise and in babble), with no significant change between immediate post-training and follow-up performance. Ingvalson et al. [47] reported no significant performance differences four days post-training for HINT and QuickSIN sentences. Zhang et al. [48] reported maintenance of a significant increase from pre-training baseline at a one-month post-training assessment, for measures of vowel, consonant and CNC word recognition. However, no information was provided as to whether performance on these measures was significantly reduced from immediate post-training levels.

Table 5. Study validity criteria, study quality scores and levels of evidence for included articles. Criteria scoring: 0 = flawed or no information from which to make a judgement, 1 = weak information, incorrect use or lack of detail from which to make a judgement, 2 = appropriate use and reporting. Study quality score = sum of scores for scientific and training-specific study validity criteria. Level of evidence: study quality score of 0-5 = very low, 6-10 = low, 11-15 = moderate, 16-20 = high (adapted from GRADE Working Group, 2004 [35]).

Compliance with training
Compliance was defined as the percentage of participants completing the requested training duration in each study. Compliance was reported in fewer than half of the articles (6/13) and the method of identifying those participants who did not achieve 100% compliance differed between studies. Stecker et al. [38] reported that on average, participants achieved 37 of the requested 44 days of training (92.5%). Sweetow and Henderson-Sabes [39] reported overall compliance of 73% (65/89 participants completed the training), although no details of training duration were provided for those who had completed. Humes et al. [42] reported that 13/16 participants (81%) completed the requested training duration. The remaining three participants completed 92%, 75% and 50% of the requested training. As these participants did not significantly differ in performance on the post-training outcomes compared with fully-compliant participants (CID Everyday Sentences [55], frequent words and phrases [42], and VAST sentences [77]), data from these low-compliant participants were included in the main analysis. In the article by Stacey et al. [43], compliance was 73% and three participants who completed only five of the requested 15 hours of training were excluded from the main analysis. Oba et al. [45] reported that 100% of participants achieved the required training duration of 600 minutes, despite reported training durations ranging from 583 to 767 minutes. Similarly, Zhang et al. [48] stated that 100% of participants completed the 20 hours of requested training, but reported a mean training duration of 18 hours (range 15.4-21.2 hours).
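Under the definition used here, compliance is the share of participants who completed the requested training. A minimal Python sketch, with an illustrative function name, reproduces two of the participant-based percentages quoted above.

```python
def compliance_percent(completed, enrolled):
    """Percentage of enrolled participants who completed the requested training."""
    if enrolled <= 0:
        raise ValueError("enrolled must be a positive number of participants")
    if not 0 <= completed <= enrolled:
        raise ValueError("completed must lie between 0 and enrolled")
    return 100.0 * completed / enrolled


# Participant counts as reported in the text for [39] and [42].
lace_compliance = round(compliance_percent(65, 89))
humes_compliance = round(compliance_percent(13, 16))
print(lace_compliance, humes_compliance)
```

Note that this participant-level measure differs from the proportion of requested training time that an individual completed, which some of the studies reported instead.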

Quality Assessment
Study quality was assessed using five measures of scientific study validity, and five training-specific study quality criteria (each scoring 0-2), resulting in a possible maximum study quality score of 20. Table 5 shows individual study validity ratings and overall study quality scores for each of the 13 articles. Overall study quality ranged from very low (lowest score for Barcroft et al. [46], scoring 1/20) to moderate (highest score for Sweetow and Henderson-Sabes [39], scoring 13/20).
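The scoring scheme can be expressed compactly. The sketch below assumes only what is stated in the text and the table note: ten criteria each rated 0, 1 or 2, a summed score out of 20, and the four evidence bands adapted from GRADE. The function name and the individual criterion ratings in the example are hypothetical; only the totals (5 scientific points and 13/20 overall for [39]) come from the review.

```python
from typing import Sequence, Tuple

# Upper bound of each evidence band and its label (0-5, 6-10, 11-15, 16-20).
EVIDENCE_BANDS = [(5, "very low"), (10, "low"), (15, "moderate"), (20, "high")]


def study_quality(scientific: Sequence[int],
                  training_specific: Sequence[int]) -> Tuple[int, str]:
    """Return (total score out of 20, level of evidence) for one article."""
    if len(scientific) != 5 or len(training_specific) != 5:
        raise ValueError("expected five ratings in each criteria group")
    ratings = list(scientific) + list(training_specific)
    if any(r not in (0, 1, 2) for r in ratings):
        raise ValueError("each criterion must be rated 0, 1 or 2")
    total = sum(ratings)
    # Pick the first band whose upper bound is not exceeded.
    level = next(label for upper, label in EVIDENCE_BANDS if total <= upper)
    return total, level


# Hypothetical ratings chosen to reproduce the reported totals for [39]:
# 5/10 on the scientific criteria and 13/20 overall.
print(study_quality([1, 0, 0, 2, 2], [2, 2, 2, 1, 1]))
```

Because each band spans five points, a change of a single rating near a boundary (for example from 10 to 11) moves an article between evidence levels, so the band label should be read alongside the underlying score.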
Scientific study quality. Out of a maximum 10 points for the scientific study quality, the highest scoring article achieved a total of five points [39]. None of the included articles reported participant or tester blinding, or a power calculation to determine the required participant sample size. Where participant randomisation was used, there was often a lack of detail on how randomisation was conducted [38,39,46]. However, more than half the articles (8/13) scored 2 points for the adequate reporting of all included outcome measures in their studies.
Training intervention-specific study quality. The majority of articles (10/13) reported the use of performance feedback in their training protocols. Reporting of participant compliance with training regimens occurred in almost half of the included articles (6/13). However, definitions of compliance varied. For example, some studies reported this as the number of participants remaining in the study irrespective of whether they had completed the requested amount of training, e.g. [39]. Others considered this to mean completion of the requested training duration, e.g. [43]. Follow-up assessments were reported in 8/13 articles, and ranged from repeated testing of participants at weekly intervals [40] to single follow-up assessments [37-39,44,45,47,48]. Training in the participant's home environment occurred in approximately half of the studies [36,38,39,43-45,48], while the remaining studies delivered training in the laboratory [37,40-42,46,47]. Although the majority of studies (11/13) assessed and reported the generalisation of learning to untrained measures of speech intelligibility, and some to cognition or communication, outcome measures were not always reported in adequate detail [37,38,44]. Furthermore, there were frequent reports of test administration inconsistencies whereby not all participants were administered all outcome measures [36,38,46], training stimuli varied between participants [48], and findings from some outcome measures were omitted from the results [46,47].
Risk of funding bias. Sweetow and Henderson-Sabes [39] acknowledged a potential conflict of interest in funding. They reported a financial interest in Neurotone, Inc., the company licensed by the University of California, San Francisco to produce LACE training software.

Discussion
The primary aim of this systematic review was to examine the evidence for individual computer-based auditory training (CBAT) as an effective intervention for people with hearing loss. Secondary aims sought to examine the feasibility of individual CBAT as an intervention for people with hearing loss by examining (i) the long-term retention of auditory training-related improvements at post-training follow-ups, and (ii) levels of participant compliance with individual CBAT protocols.

Efficacy of Individual Computer-based Auditory Training as an Intervention for People with Hearing Loss
Following a program of individual CBAT, significant improvements on the trained task (on-task learning) were shown for all but one of the articles that reported on-task outcomes (9/10). However, evidence for the generalisation of learning to functional benefits (i.e. speech intelligibility) for people with hearing loss was mixed. A narrative synthesis and quality assessment of included articles suggested that evidence was not robust, and a number of confounding factors contributed to the inconsistency in reported effects. First and foremost, a lack of homogeneity in training protocols (training stimuli, duration or frequency), outcome measures, participant samples (sample size, hearing loss) and study designs may have resulted in variations in reported outcomes. Second, where generalisation of learning was shown to untrained measures, improvements were often highly variable between trained individuals [36,42,43,48] and not everyone was shown to benefit from auditory training [39-41,44,45,48]. Previous research into the neurophysiological changes resulting from auditory perceptual learning for normally hearing adults suggests that although the auditory system is responsive to training, there is a substantial degree of variability across individuals in their ability to make use of physiological cues [79].
In a previous review of the efficacy of auditory training for adults with hearing loss [18], the authors concluded there was little evidence for real-world effectiveness, but some evidence for within-study efficacy, of individual auditory training for people with hearing loss. A more recent meta-analysis of six (predominantly clinician-delivered) auditory training studies published between 1970 and 2009 [26] suggested reliable but small improvements in speech recognition (Cohen's d = .352). Findings from the present review are similar in that on-task learning nearly always occurred for people with hearing loss following individual CBAT. Furthermore, there was some evidence for the generalisation of that on-task learning to untrained measures of speech intelligibility, cognition and self-reports of communication. However, the magnitude of improvement for untrained outcomes is small, and reported improvements are shown to be inconsistent across different studies, and within studies across individual trainees.
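For readers less familiar with standardised effect sizes, Cohen's d expresses a difference between two group means in units of their pooled standard deviation. The following is a minimal sketch of the textbook pooled-variance form; the meta-analysis [26] may have used a different estimator or a small-sample correction, and the scores in the example are invented purely to show the calculation.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def cohens_d(trained: Sequence[float], control: Sequence[float]) -> float:
    """Cohen's d for two independent groups using the pooled sample SD."""
    n1, n2 = len(trained), len(control)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(trained)
                  + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; d is undefined")
    return (mean(trained) - mean(control)) / sqrt(pooled_var)


# Invented scores (not from any reviewed study): the group means differ
# by 1 point and the pooled SD is 2, so d is one half.
print(cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0]))
```

By Cohen's widely used conventions (0.2 small, 0.5 medium, 0.8 large), the reported d of .352 sits between the small and medium benchmarks, which is in keeping with the description of the improvements as reliable but small.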

Feasibility of Individual Computer-based Auditory Training as an Intervention for People with Hearing Loss
Feasibility was considered in terms of the retention of training-related improvements and compliance with individual CBAT. Although retention of post-training improvements was shown for a range of on-task and untrained measures at follow-up assessments up to 7 months post-training, the definition of retention varied across studies. The majority focused on the retention of improvements for trained tasks, not the retention of generalised improvements in untrained measures of speech intelligibility, cognition and communication. It is the latter that holds the most promise for individual CBAT to improve the everyday listening abilities of people with hearing loss.
Details of participant compliance with training were often underreported, appearing in fewer than half of the articles included in the review (6/13). Where reported, participant compliance rates were high. However, these reports of high compliance with training were not consistent with a large-scale study of more than 3000 clinical LACE trainees [80], where compliance (defined as completion of 10 or more training sessions) was around 30% [81]. This suggests that compliance with individual CBAT may be greater in small, controlled research settings than is typically achieved in clinical environments.

Study Quality and Evidence of Bias
Study quality scores suggest that the articles included in this review offer very-low to moderate levels of evidence. None of the studies reported participant or tester blinding. Thus, where between-groups designs were employed and the control group received no intervention [38,39,41,44], we cannot be confident that any training improvements were not biased by placebo effects. The study by Sweetow and Henderson-Sabes [39] that obtained the highest quality score was the only study to report cognitive outcomes. Results showed significant generalisation of learning from trained stimuli (LACE) to untrained measures of speech intelligibility (QuickSIN), cognition (attention: Stroop Task, and working memory: listening span), and self-report of communication (HHIE, HHIA and CSOA).
The majority of studies failed to account for test-retest improvements in reported outcomes. When administering the outcomes across multiple test sessions, there is a possibility that improvement will occur as a result of procedural rather than perceptual learning [82,83]. It has been recommended for the HINT and QuickSIN sentences that practice with at least two sentence lists is needed to eliminate procedural learning effects at baseline sessions [83]. Only two of the articles repeated outcome measure assessments at baseline sessions. Fu and colleagues [36] administered outcomes for a minimum of two weeks prior to training, and Stacey and colleagues [43] repeated baseline measures for approximately three hours per participant, both until a performance asymptote was reached. The total number of occasions on which outcomes were administered was not reported in either study. Finally, two articles omitted findings from some outcome measures included in the studies [46,47]. Selective outcome reporting is likely to lead to inaccurate and misleading conclusions being reached [84].

Lack of High-quality Evidence as a Barrier to implementation
Results from this systematic review demonstrate robust on-task learning following individual CBAT. Generalisation of on-task learning to functional benefits for people with hearing loss is less robust. Evidence for the generalisation of on-task learning to improvements in speech intelligibility, cognition and self-report of hearing suggests that improvements are both small and inconsistent across studies and individual trainees. Inconsistencies in reported effects may in part be associated with inconsistencies in study designs, training protocols, participant samples, and outcome measures adopted. However, analysis of study quality has demonstrated some fundamental issues with scientific control, which may result in a range of biases in reported effects. Nevertheless, retention of learning from both trained and untrained stimuli was shown to persist where reported, up to seven months post-training. Furthermore, where participant compliance was reported, rates were high. This suggests that individual CBAT has the potential to be a feasible intervention, which may offer benefit to the auditory-perceptual abilities of people with hearing loss.
Nevertheless, some of the questions posed by Boothroyd [85], in a discussion of the potential role of formal training in adapting to changed hearing, remain unanswered in the current evidence-base. First, where benefits occur, what are the mechanisms of benefit of auditory training for people with hearing loss (i.e. where generalisation of learning is shown, what elements of on-task learning are these attributable to)? How do individual characteristics interact with training outcomes? And, do training-induced changes influence participation and quality of life for people with hearing loss? Although many of these questions are currently being explored in normally hearing listeners [86-88], there is a need for further research designed to specifically address these issues in people with hearing loss.

Recommendations for further research
Based on the reviewed evidence we propose key recommendations for future research:
1. High-level evidence. High-quality studies that are randomised, blinded, with a sample size dictated by a power calculation, are crucial to adequately assess the efficacy of individual computer-based auditory training for people with hearing loss. Furthermore, the inclusion of an 'active' control group (that is, a control task comparable to the training task, but for which no improvement in performance is expected) may enable participant blinding and so help ensure that any training-related improvements are not influenced by placebo effects. Future research would ideally be reported in accordance with the CONSORT statement [89], which offers guidelines for the reporting of randomised controlled trials. This would result in sufficiently detailed reporting to allow for an adequate appraisal of the quality and applicability of published results. It is also important that future studies consider key factors pertinent to training intervention studies, including ecologically valid training environments, performance-related feedback, follow-up assessments to ascertain the long-term benefits of auditory training, and adequate reporting of participant (non-)compliance, as interventions can only ever be beneficial if individuals comply with them.
2. Outcome selection. Measures that are appropriate for and sensitive to CBAT effects should be adopted to allow accurate assessment of training efficacy. This includes a consideration of the magnitude of effect required for those outcomes to represent clinically significant improvements in listening abilities for people with hearing loss, for example, combining behavioural outcomes with questionnaires to assess self-reported benefits in everyday listening. In addition, the inclusion of cognitive outcomes in future studies assessing the efficacy of individual CBAT for people with hearing loss may be informative given that the only study to include such measures in this review reported significant post-training improvements in attention and auditory working memory [39]. Evidence from a study of LACE training with normally hearing adults suggested that training improves the neural representation of cues important for speech perception [86]. However, Tremblay and colleagues argue that at least some of the physiological changes as a result of auditory training may not reflect sensory-specific fine-tuning, but other top-down modulatory influences that are activated during focused listening tasks, such as stimulus exposure, attention, memory, decision-making and task execution [88]. Thus, measurement of both auditory and cognitive outcomes may help to characterise the mechanisms of benefit for people with hearing loss following auditory training.
3. Standardisation of outcome measures. Standardisation of outcomes across auditory training studies would enable comparisons to be made between different training protocols. Furthermore, common outcome measures would enable meta-analyses of data from future training intervention studies.
4. Candidature. Published evidence suggests that post-training improvements in untrained outcomes are highly variable, and not everyone benefits from auditory training [39-41,44,45]. Thus, identification of those individuals most likely to benefit from auditory training would be of substantial clinical importance, enabling clinicians to individually target those for whom CBAT would be most effective, and consider alternative interventions for those who are least likely to benefit from training.
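To illustrate what the power calculation called for in recommendation 1 might involve, the sketch below estimates the sample size per group for a two-group comparison of means using the standard normal approximation. It is an illustration under stated assumptions rather than a prescription: the function name is hypothetical, the design is assumed to be a simple two-arm parallel trial with equal group sizes and a two-sided test, and the effect size of 0.352 is borrowed from the meta-analysis [26] cited in the Discussion only as a plausible planning value.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05,
                power: float = 0.80) -> int:
    """Approximate participants needed per group to detect `effect_size`.

    Uses n = 2 * (z_(1 - alpha/2) + z_power)^2 / d^2, the normal
    approximation for a two-sided, two-sample comparison of means.
    An exact t-based calculation gives a slightly larger n.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


# Planning value taken from the meta-analytic estimate in [26] (assumption).
print(n_per_group(0.352))
```

Under these assumptions an effect of this size would require roughly 127 participants per group for 80% power at a two-sided alpha of 0.05. A trial planned around a larger minimum clinically important difference would need fewer participants, which is one reason the choice of outcome and target effect (recommendation 2) should precede the sample size calculation.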

Summary and Conclusions
Individual computer-based auditory training (CBAT) is a time- and cost-efficient, flexible self-management intervention that has the potential to be delivered to people with hearing loss in their home environment. It is easily accessible to the target population via PCs and the internet [27], and can be tailored to individual needs. The present review identifies scientific, methodological and study quality issues in each of the 13 articles included in this systematic review. Our findings demonstrate that although individual CBAT is a feasible intervention for people with hearing loss, published evidence for the efficacy of individual CBAT to improve speech intelligibility, cognition and hearing abilities for adults with hearing loss is neither consistent nor robust. As such, the evidence cannot be used reliably to guide intervention at this time. Future high-level evidence and the standardisation of outcome measures across different training studies will provide an evidence-base from which to adequately assess the efficacy of auditory training as an intervention for people with hearing loss.

Supporting Information
Checklist S1 Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) Checklist. (DOC)
Example Search Terms S1 Example terms used to search the PubMed database. (DOCX)