Real-time loudness normalisation with combined cochlear implant and hearing aid stimulation

Dimitar Spirrov; Maaike Van Eeckhoutte; Lieselot Van Deun; Tom Francart

doi:10.1371/journal.pone.0195412

Abstract

Background

People who use a cochlear implant together with a contralateral hearing aid—so-called bimodal listeners—have poor localisation abilities and sounds are often not balanced in loudness across ears. In order to address the latter, a loudness balancing algorithm was created, which equalises the loudness growth functions for the two ears. The algorithm uses loudness models in order to continuously adjust the two signals to loudness targets. Previous tests demonstrated improved binaural balance, improved localisation, and better speech intelligibility in quiet for soft phonemes. In those studies, however, all stimuli were preprocessed so spontaneous head movements and individual head-related transfer functions were not taken into account. Furthermore, the hearing aid processing was linear.

Study design

In the present study, we simplified the acoustical loudness model and implemented the algorithm in a real-time system. We tested bimodal listeners on speech perception and on sound localisation, both in normal loudness growth configuration and in a configuration with a modified loudness growth function. We also used linear and compressive hearing aids.

Results

The comparison between the original acoustical loudness model and the new simplified model showed loudness differences below 3% for almost all tested speech-like stimuli and levels. We found no effect of balancing the loudness growth across ears for speech perception ability in quiet and in noise. We found some small improvements in localisation performance. Further investigation with a larger sample size is required.

Citation: Spirrov D, Van Eeckhoutte M, Van Deun L, Francart T (2018) Real-time loudness normalisation with combined cochlear implant and hearing aid stimulation. PLoS ONE 13(4): e0195412. https://doi.org/10.1371/journal.pone.0195412

Editor: Bolajoko O. Olusanya, Center for Healthy Start Initiative, NIGERIA

Received: November 28, 2017; Accepted: March 21, 2018; Published: April 4, 2018

Copyright: © 2018 Spirrov et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The data from the current study are publicly available in the Open Science Framework repository under https://osf.io/tg8r6 with DOI: 10.17605/OSF.IO/TG8R6.

Funding: This work was supported by Flanders Innovation (https://www.iwt.be) and Cochlear Ltd. (http://www.cochlear.com/wps/wcm/connect/be/home) (IWT R & D 110722 and IWT Baekeland 140748). Maaike Van Eeckhoutte was supported by a PhD grant for Strategic Basic Research by the Agency for Innovation by Science and Technology in Flanders (IWT, 131106).

Competing interests: Apart from the necessary additional equipment provided by Cochlear Ltd, the funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. We declare no competing interest. This does not alter our adherence to PLOS ONE policies on sharing data and materials.

Introduction

There is currently a large number of people who use a cochlear implant (CI) together with a contralateral hearing aid (HA), as a result of relaxed CI candidacy criteria. The combination of acoustical stimulation by the HA and electrical stimulation by the CI is known as bimodal stimulation. Although there is individual variability, bimodal users generally have better speech intelligibility in quiet and in noise when they use both devices compared to the CI alone [1].

However, this improvement is sometimes smaller than expected because of limitations of the devices used [2]. Even similar components, such as compressors, often differ between the CI and the HA [3]. The differences between the modes of stimulation and the processing in the devices result in different loudness growth at the two ears [4, 5]. Loudness growth describes how loudness increases with sound intensity level. This means that even when the sounds of the CI and the HA are balanced in loudness at a certain stimulus level, their loudness can be quite different at another level. This can lead to unbalanced sound perception [6] and can limit sound localisation performance [7].

To address the problem of different loudness growth functions, Francart and McDermott [8] developed a loudness normalisation algorithm called SCORE. For each segment of sound, this algorithm estimates the loudness caused by the CI and by the HA. The loudness caused by the CI is calculated with the loudness model of McKay [9], and that caused by the HA with the 1997 model of Moore and Glasberg [10]. For each device, the algorithm also computes, from the microphone signal, the loudness that a normal-hearing listener would perceive. This loudness serves as the reference, or loudness target. Because the loudness target is computed at each side, SCORE does not require communication between the devices. Based on the difference between the loudness target and the loudness caused by the CI or HA, the algorithm adapts the overall gain of the HA and the electrical current delivered by the CI. In this way, the algorithm restores loudness growth to normal and automatically balances the HA and the CI for a frontal signal of any intensity and frequency content. SCORE also restores the interaural loudness difference: for sounds arriving from an angle other than 0 degrees, the targets at the two sides differ, and the loudness target at the side nearer the sound source is higher than that at the contralateral side.
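The per-frame principle on the HA side can be illustrated with a minimal sketch: compute a normal-hearing target, then search for the broadband gain at which a hearing-impaired loudness model reaches that target. Both loudness functions below are toy stand-ins (a simple sone rule and a hypothetical linear-recruitment model), not the models of Moore and Glasberg or McKay, and the bisection search is an assumption rather than the update rule used in SCORE. The CI side, where the current is adapted instead of the gain, is not shown.

```python
def loudness_normal(level_db: float) -> float:
    """Toy normal-hearing loudness in sones: 1 sone at 40 dB, doubling every 10 dB."""
    return 2.0 ** ((level_db - 40.0) / 10.0)


def loudness_impaired(level_db: float, threshold_db: float) -> float:
    """Hypothetical impaired-ear loudness with linear recruitment.

    Levels between threshold_db and 100 dB are mapped linearly onto 0..100 dB
    before the normal-hearing rule is applied, so loudness grows faster than normal.
    """
    if level_db <= threshold_db:
        return 0.0
    effective_db = 100.0 * (level_db - threshold_db) / (100.0 - threshold_db)
    return loudness_normal(effective_db)


def score_gain(level_db: float, threshold_db: float,
               lo_db: float = -30.0, hi_db: float = 60.0, tol_db: float = 1e-6) -> float:
    """Broadband gain (dB) at which the impaired loudness matches the normal target.

    Bisection works because impaired loudness increases monotonically with gain.
    """
    target = loudness_normal(level_db)
    while hi_db - lo_db > tol_db:
        mid = 0.5 * (lo_db + hi_db)
        if loudness_impaired(level_db + mid, threshold_db) < target:
            lo_db = mid   # still too soft: more gain needed
        else:
            hi_db = mid   # loud enough: try less gain
    return 0.5 * (lo_db + hi_db)


# One frame at a time: soft frames receive more gain than loud ones.
for level in (50.0, 70.0, 100.0):
    print(level, round(score_gain(level, threshold_db=60.0), 2))
```

With a 60 dB threshold this toy model asks for about 30, 18 and 0 dB of gain at 50, 70 and 100 dB input, which illustrates how a loudness target produces level-dependent (compressive) gain for a recruiting ear.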

The SCORE algorithm was implemented in the MATLAB (Matrix Laboratory) computing environment and evaluated in six bimodal listeners for speech intelligibility in quiet and in noise, and in two listeners for localisation [11]. Improvements were found in sound localisation performance and in speech intelligibility in quiet for soft phonemes, but not in speech intelligibility in noise.

Because the algorithm of Francart and McDermott was tested with an off-line implementation, the stimuli had to be preprocessed. As a result, individual head-related transfer functions were not considered, nor were spontaneous head movements, which change the loudness differences between the devices and may be important for localisation. A real-time version of SCORE was therefore needed. The main challenge here is the computational complexity of the acoustical loudness model, which had to be simplified for real-time implementation.

A few studies have addressed the real-time implementation of the normal-hearing loudness model [12, 13]. They used either spectral peaks or non-uniform sampling of the signal spectrum. Both studies then computed the intensity differences between consecutive time frames and recomputed the excitation pattern only when these differences exceeded a certain threshold. As a consequence, the group delay of these algorithms depends on the incoming speech, which makes them sub-optimal for implementation in hearing devices.
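Why such gating makes the processing load signal-dependent can be shown with a toy version of the frame-difference rule. This is a hedged sketch, not the method of [12, 13]: comparing per-frame levels in dB and the 1 dB threshold are hypothetical choices.

```python
def recomputed_frames(frame_levels_db, threshold_db=1.0):
    """Indices of frames at which a gated model would recompute the excitation pattern.

    Frame 0 is always computed; afterwards a frame is recomputed only when its
    level differs from the previous frame by more than threshold_db.
    """
    if not frame_levels_db:
        return []
    recompute = [0]
    for i in range(1, len(frame_levels_db)):
        if abs(frame_levels_db[i] - frame_levels_db[i - 1]) > threshold_db:
            recompute.append(i)
    return recompute


steady = [60.0] * 10            # stationary input: almost no recomputation
fluctuating = [60.0, 70.0] * 5  # strongly modulated input: every frame recomputed
print(len(recomputed_frames(steady)), len(recomputed_frames(fluctuating)))
```

The work per frame, and hence the processing time, therefore varies with the input, whereas a hearing device needs a fixed, predictable delay.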

An important parameter to consider is the loudness target. There is an ongoing discussion as to whether loudness growth should be restored to normal. It has been argued [14] that listeners with substantial sensorineural hearing loss should, for better audibility, receive comfortable loudness across all frequencies. Some fitting prescriptions aim for this [15]. On the electrical side, too, there is evidence that lower intensities should be (almost) as loud as higher intensities [16, 17]. SCORE makes it possible to investigate the effect of alternative loudness targets. For instance, the loudness target can be raised relative to normal-hearing loudness at lower intensities, so that consonants become almost as loud as vowels.

A number of studies [3, 7, 18, 19] have investigated the effect of balanced loudness on speech intelligibility. However, in those studies the CI and HA were balanced in loudness only at a limited number of intensities and/or frequencies. In contrast, SCORE enables us to study the effect of balancing the entire loudness growth functions on speech intelligibility and localisation. We hypothesise that keeping low and high intensities almost equally loud will disturb localisation, since it reduces the interaural loudness difference (ILoD). Note that we use a slightly different abbreviation to distinguish it from the interaural level difference (ILD), which is a physical phenomenon caused by the head-shadow effect. To improve localisation, the ILoD should instead be enlarged by changing the loudness target in the opposite direction, that is, by reducing the loudness target at lower intensities.
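The link between a modified loudness target and the ILoD can be sketched with a hypothetical power-law warp of the normal-hearing target, N' = N_ref (N / N_ref)^alpha. Both the warp and the choice to express the ILoD as a log2 loudness ratio (in loudness doublings) are illustrative assumptions, not the targets used in this study.

```python
import math


def modified_target(n_normal_sones, alpha, n_ref_sones=1.0):
    """Hypothetical power-law warp of the normal-hearing loudness target.

    alpha < 1 raises targets below n_ref (soft sounds louder) and compresses
    loudness ratios; alpha > 1 lowers soft targets and expands ratios.
    """
    return n_ref_sones * (n_normal_sones / n_ref_sones) ** alpha


def ilod_doublings(n_near_sones, n_far_sones):
    """ILoD expressed (by assumption) as the log2 loudness ratio between the ears."""
    return math.log2(n_near_sones / n_far_sones)


near, far = 8.0, 2.0  # hypothetical normal-hearing targets at the two ears, in sones
for alpha in (0.5, 1.0, 1.5):
    print(alpha, ilod_doublings(modified_target(near, alpha), modified_target(far, alpha)))
```

Because (N1/N2)^alpha is the new ratio, the ILoD is simply multiplied by alpha: a target that makes soft sounds louder (alpha < 1) shrinks the ILoD, and one that makes them softer (alpha > 1) enlarges it, which is the trade-off described above.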

This study had three objectives. First, to simplify the acoustical loudness models and to validate them. Second, to implement SCORE in a real-time system and to test it in bimodal users, both with a linear and with a compressive hearing aid. Third, to change the loudness targets at the two sides to assess the importance of loudness cues on speech intelligibility and localisation.

Real-time implementation: Methods and validation

The major difficulty for a real-time algorithm is the implementation of the loudness models: a normal-hearing loudness model that serves as the reference, and a loudness model that computes the loudness caused by the HA and accounts for the degree of hearing impairment. Both models use a number of level-dependent equivalent rectangular bandwidth (ERB) filters [20], which are challenging to compute on-line. In the models of Moore and Glasberg, loudness is derived from the product of the ERB filters and the intensities obtained from the fast Fourier transform (FFT), and this multiplication is also difficult to perform on-line.
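The filter-times-intensity step amounts to a matrix-vector product per frame. As a hedged sketch, the code uses level-independent roex(p) filters (a simplification: the filters in Moore's model are level dependent), placeholder centre frequencies, and the assumption that the 64 usable bins are k = 0..63 at 125 Hz spacing.

```python
import numpy as np

FS_HZ = 16000
N_FFT = 128


def erb_bandwidth_hz(f_hz):
    """ERB of the auditory filter at f_hz (Glasberg and Moore, 1990)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def roex_weights(centres_hz, bin_freqs_hz):
    """Level-independent roex(p) power weights, shape (n_filters, n_bins)."""
    fc = centres_hz[:, None]
    g = np.abs(bin_freqs_hz[None, :] - fc) / fc  # normalised distance from the centre
    p = 4.0 * fc / erb_bandwidth_hz(fc)           # makes the filter's ERB equal ERB(fc)
    return (1.0 + p * g) * np.exp(-p * g)


bin_freqs = np.arange(N_FFT // 2) * FS_HZ / N_FFT  # bins 0..63: 0 to 7875 Hz (assumed)
centres = np.geomspace(100.0, 7500.0, 30)          # placeholder centre frequencies
weights = roex_weights(centres, bin_freqs)         # can be precomputed once

rng = np.random.default_rng(0)
frame = rng.standard_normal(N_FFT) * np.hanning(N_FFT)  # one 8 ms frame of noise
power = np.abs(np.fft.rfft(frame)[: N_FFT // 2]) ** 2    # per-bin intensities
excitation = weights @ power                             # one value per filter
print(excitation.shape, weights.size, "multiply-adds per frame")
```

The cost per frame is n_filters × n_bins multiply-adds (30 × 64 = 1920 here). Shrinking either dimension, for example summing bins into 19 bands (30 × 19 = 570), reduces the load proportionally, which is the motivation for the simplifications that follow.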

The loudness models were implemented and validated in Simulink (The MathWorks, Natick, MA, USA). The execution time of the acoustical loudness models depends on two factors: the number of ERB filters [20] and the number of FFT bins. To make the models faster, either number can be reduced. Speech signals have more power at lower frequencies [21]. Given that the cochlea has a logarithmic frequency organisation [22], whereas the FFT provides linear frequency spacing, it seems appropriate to group high-frequency FFT bins. We tested three quasi-logarithmic combinations of FFT bins in order to obtain bands that more closely represent the distribution of the ERB filters [23]. Combining bins and summing their power in this way is often used in the design of HAs [24, 25]. The combinations resulted in 19, 13 or 8 frequency bands. We used a time window of 8 ms, a sampling rate of 16 kHz and a 128-point FFT (64 bins up to π, the Nyquist frequency), which gives a bin spacing of 125 Hz. For the configuration with 19 bands, we started grouping bins from 1 kHz, so bands 1 to 8 were identical to the first 8 FFT bins. Bands 9 to 12 each grouped 2 bins, bands 13 to 16 each grouped 4 bins, bands 17 and 18 each grouped 8 bins, and band 19 grouped 16 bins.
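The 19-band grouping described above can be written as a small plan of (number of bands, bins per band) and applied with NumPy's `add.reduceat`. This is a sketch under the assumption that the 64 bins are indexed k = 0..63 at k × 125 Hz; the plan itself follows the text.

```python
import numpy as np

# (number of bands, FFT bins per band) for the 19-band configuration.
BAND_PLAN_19 = [(8, 1), (4, 2), (4, 4), (2, 8), (1, 16)]


def band_start_bins(plan):
    """First FFT bin of each band, plus the total number of bins the plan covers."""
    starts, k = [], 0
    for n_bands, width in plan:
        for _ in range(n_bands):
            starts.append(k)
            k += width
    return np.array(starts), k


def group_bins(power, plan=BAND_PLAN_19):
    """Sum per-bin power into quasi-logarithmic bands."""
    starts, n_bins = band_start_bins(plan)
    if len(power) != n_bins:
        raise ValueError(f"plan covers {n_bins} bins, got {len(power)}")
    return np.add.reduceat(np.asarray(power, dtype=float), starts)


starts, n_bins = band_start_bins(BAND_PLAN_19)
# Band count, bins covered, and the frequency of each band's first bin (Hz).
print(len(starts), n_bins, starts * 125)
```

Summing powers preserves the total intensity of the frame while cutting the bin dimension from 64 to 19; the 13- and 8-band plans would start grouping at bins 4 and 2 respectively, but their exact widths are not given in the text.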

In the other two configurations, with 13 and 8 frequency bands, grouping started at even lower frequencies, namely from 0.5 kHz and from 0.25 kHz, respectively. We also tested ERB step sizes of 1, 2 and 3 ERB numbers.
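The effect of the ERB step size on the number of filters can be sketched with the ERB-number scale of Glasberg and Moore (1990). The 50 Hz to 8 kHz range is an assumption, so the filter counts need not match the study's configuration exactly.

```python
import numpy as np


def erb_number(f_hz):
    """ERB-number scale: 21.4 * log10(4.37 f / 1000 + 1)."""
    return 21.4 * np.log10(4.37 * np.asarray(f_hz, dtype=float) / 1000.0 + 1.0)


def erb_number_to_hz(e):
    """Inverse of erb_number."""
    return (10.0 ** (np.asarray(e, dtype=float) / 21.4) - 1.0) * 1000.0 / 4.37


def filter_centres_hz(step_erb, f_lo_hz=50.0, f_hi_hz=8000.0):
    """Filter centre frequencies spaced step_erb apart on the ERB-number scale."""
    e = np.arange(erb_number(f_lo_hz), erb_number(f_hi_hz), step_erb)
    return erb_number_to_hz(e)


for step in (1, 2, 3):
    print(step, len(filter_centres_hz(step)))
```

Doubling the step roughly halves the number of filters, and with it the filter dimension of the per-frame matrix product.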

The rest of the loudness model implementation followed Moore 1997 [10]. The only exception was the middle-ear transfer function, for which we used that of Moore 2004 [26], as in the off-line implementation of Francart and McDermott [8].

Validation of the simplified models

For each of the three tested FFT configurations and each ERB step size, we compared the computed loudness with that of the original MATLAB implementation. We did this for three different hearing losses: a flat loss of 60 dB HL, a ski-slope loss based on the data of Byrne [27], and a loss representative of the residual hearing of bimodal listeners, based on the data of Yoon [28]. For the normal-hearing loudness model, we also compared the implementations without hearing loss.

Stimuli used for the validation

We used five stimuli, four of which were speech or speech-like: 1) a 1 s white noise filtered according to the long-term average speech spectrum (LTASS) from Byrne [21]; 2) a 2 s fragment of the international speech test signal (ISTS) from Holube [29]; 3) a 2 s fragment of a Swedish competing male talker reading the story 'The north wind and the sun' (IPA, 1999); 4) one 3.9 s sentence from a Dutch female speaker from the LIST speech material [30]; 5) a 1 s frequency sweep (250–8000 Hz). For each stimulus, we tested five presentation levels from 50 to 90 dB SPL in steps of 10 dB.

Metrics for the validation

We compared the instantaneous loudness differences between the original MATLAB implementation and four Simulink implementations (the three FFT configurations above plus one with all 64 bins, which served as the reference). For each 8 ms signal frame, we computed the loudness difference as a percentage of the loudness given by the MATLAB model. A frame was treated as an outlier if (1) the MATLAB model loudness was below 0.1 sones, or (2) the ratio between the loudness of the original and the simplified model fell outside the 99% confidence interval. We also measured the execution times and calculated the time reduction relative to the Simulink implementation with 64 bins.
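The frame-wise metric and the two outlier rules can be sketched as follows. Two points are assumptions: the "99% confidence interval" is read as the central 99% of the per-frame loudness ratios (0.5th to 99.5th percentile), and the differences are taken as absolute values before the median is computed.

```python
import numpy as np


def frame_loudness_differences(n_original, n_simplified, floor_sones=0.1, coverage=0.99):
    """Absolute per-frame loudness differences (% of the original model) after outlier removal."""
    n_original = np.asarray(n_original, dtype=float)
    n_simplified = np.asarray(n_simplified, dtype=float)
    # Rule 1: drop frames where the original model gives less than floor_sones.
    keep = n_original >= floor_sones
    orig, simp = n_original[keep], n_simplified[keep]
    # Rule 2 (assumed reading): drop frames whose simplified/original ratio lies
    # outside the central `coverage` range of all ratios.
    ratio = simp / orig
    tail = (1.0 - coverage) / 2.0
    lo, hi = np.quantile(ratio, [tail, 1.0 - tail])
    inlier = (ratio >= lo) & (ratio <= hi)
    return 100.0 * np.abs(simp[inlier] - orig[inlier]) / orig[inlier]


# Hypothetical frame loudness values, with the simplified model reading 2% high.
n_orig = np.linspace(0.05, 20.0, 1000)
diffs = frame_loudness_differences(n_orig, 1.02 * n_orig)
print(round(float(np.median(diffs)), 3))
```

The median of the returned values is the quantity that would be compared against the 3% just noticeable loudness difference.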

Results from the validation

The loudness differences were more influenced by the ERB step size than by the combination of bins. The overall trend was that the difference decreased exponentially with decreasing ERB step size. Also, as expected, the configuration with 19 bands yielded the smallest loudness differences. Therefore, the configuration with 19 bands and ERB step size of 1 was selected for further evaluation. The loudness differences for the selected configuration are shown in Fig 1.

Fig 1. Median loudness differences.

The dashed line shows the just noticeable loudness difference for noises, based on Allen (1997) [31].

https://doi.org/10.1371/journal.pone.0195412.g001

Compared with the frequency sweep, the speech and speech-like stimuli showed smaller loudness differences. In only one case (60 dB HL flat hearing loss and 50 dB SPL ISTS) was the median loudness difference above the just noticeable loudness difference of 3% for noises, based on the work of Allen [31]. However, all participants in the study (see below) had hearing losses greater than 60 dB HL. Therefore, for speech stimuli, we do not expect a substantial effect of the model simplifications. The time reduction for the three configurations is shown in Table 1.

Table 1. Execution time as a percentage of the baseline.

https://doi.org/10.1371/journal.pone.0195412.t001

We were able to execute the complete algorithm, with loudness models based on 19 bands and 30 ERB filters with a step size of 1, on a real-time xPC target system, which is used as a rapid-prototyping tool.

Experiment 1: Normalising loudness growth

Methods

In order to assess the effect of the model simplifications and the implementation of the real-time SCORE algorithm, we tested it in bimodal listeners.

Subjects.

In total, nine native Dutch speakers (six male, three female) participated in either one of the experiments. All subjects were tested on speech perception in quiet. In the first experiment, six of them were tested on speech perception in noise; four were tested on localisation. One of the subjects (S2) had strong fluctuating tinnitus at the HA side. More information about the subjects is given in Table 2.

Table 2. Information about the subjects.

https://doi.org/10.1371/journal.pone.0195412.t002

The unaided audiograms of the hearing aid side are shown in Fig 2.

Fig 2. Unaided audiograms.

Pure-tone unaided audiograms of the non-implanted ear at the time of the first participation.

https://doi.org/10.1371/journal.pone.0195412.g002

Residual hearing at high frequencies was preferable for testing a potential effect on localisation, because bimodal listeners only have access to interaural level differences [11], which arise mainly at high frequencies through the head-shadow effect. Therefore, we tested fewer people for localisation than for speech intelligibility.

The experiments were approved by the ethics committee of UZ Leuven. All subjects signed a declaration of informed consent before the experiments. Their travel costs were reimbursed.

Equipment.

The experimental setup is shown in Fig 3.

Fig 3. Experimental setup.

A real-time system was used for HA and CI processing.

https://doi.org/10.1371/journal.pone.0195412.g003

We used a regular PC with APEX3 [32] and MATLAB/Simulink software (The MathWorks, Natick, MA, USA). The computer was connected to three RME Multiface II sound cards (Audio AG, Haimhausen, Germany); one card was used for the speech perception experiments and the other two for the localisation test. The cards drove either one Genelec 8030A loudspeaker (Genelec, Iisalmi, Finland) at 0° for the speech intelligibility tests, or an arc of 13 Fostex 6301B loudspeakers (Foster Electric Co., Ltd, Tokyo, Japan) from -90° to +90° in steps of 15° for the localisation test. The distance between the loudspeakers and the test subject was 1 m.

For the CI stimulation, we used a Simulink model of a cochlear implant (ACE strategy) provided by Cochlear Ltd, so the CI processing was identical to that of the clinical devices. All preprocessing options (AGC, ADRO, beam-former, etc.) were switched off. On the hearing aid side, we used a linear HA model implemented in Simulink as a 129-tap digital filter, whose coefficients were calculated from the HA insertion gains. The electrical and acoustical parts of the SCORE algorithm [11] were implemented as Simulink models. All models were compiled and executed on a real-time xPC target system (Speedgoat GmbH, Liebefeld, Switzerland) with an Intel i3 dual-core 3.3 GHz processor.
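One plausible way to turn insertion gains into 129 filter coefficients is the window (frequency-sampling) method provided by SciPy's `firwin2`. The design method, the 16 kHz rate, and the gain values below are assumptions for illustration, not details taken from the study.

```python
import numpy as np
from scipy.signal import firwin2, freqz


def ha_fir(insertion_gains_db, fs_hz=16000.0, numtaps=129):
    """Linear-phase FIR whose magnitude follows the given insertion gains.

    insertion_gains_db maps frequency (Hz, strictly below fs/2) to gain (dB).
    Gains are held constant below the lowest and above the highest frequency.
    """
    freqs = sorted(insertion_gains_db)
    grid = [0.0] + [float(f) for f in freqs] + [fs_hz / 2.0]
    gains_db = ([insertion_gains_db[freqs[0]]]
                + [insertion_gains_db[f] for f in freqs]
                + [insertion_gains_db[freqs[-1]]])
    gains_lin = 10.0 ** (np.asarray(gains_db, dtype=float) / 20.0)
    return firwin2(numtaps, grid, gains_lin, fs=fs_hz)


# Hypothetical insertion gains (dB) at audiometric frequencies.
gains = {250: 10.0, 500: 15.0, 1000: 20.0, 2000: 25.0, 4000: 30.0, 6000: 30.0}
taps = ha_fir(gains)
check_hz = np.array([500.0, 1000.0, 2000.0, 4000.0])
_, h = freqz(taps, worN=check_hz, fs=16000.0)
print(np.round(20 * np.log10(np.abs(h)), 1))  # realised gain (dB) at the check frequencies
```

Because this filter is linear-phase (symmetric taps), it delays the signal by (129 - 1)/2 = 64 samples, i.e. 4 ms at 16 kHz; an odd tap count also allows a non-zero gain at the Nyquist frequency.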

The stimuli at the HA side were presented using an ER-3A insert earphone (Etymotic Research, Elk Grove Village, IL, USA). For the electrical stimulation, a Cochlear StimGen box provided by Cochlear Ltd. was used to connect the xPC target system to the participant’s implant. An oscilloscope was used to measure the time difference between the acoustical and the electrical stimulation (see below).

Calibration.

Since the tests were carried out in free field, we used a CORTEX MK2 manikin (Metravib, Limonest, France) for calibration. The CI model was calibrated according to the recommendations of Cochlear Ltd. The slope of the loudness growth function of the electrical loudness model [11] was held constant. The electrical loudness model has a calibration parameter that scales its output to sones. This parameter was adjusted until the additional current units applied by SCORE were zero or negligibly small, meaning that SCORE was inactive at a comfortable loudness level. The HA model was calibrated by setting a broadband gain to match the output for a 60 dBA broadband noise presented in free field. The acoustical loudness model was calibrated by matching a 60 dBA broadband noise to the level recorded in the xPC system.
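The acoustical calibration amounts to finding one dB offset that maps the digital RMS level of the recorded calibration noise onto its known acoustic level. A minimal sketch, assuming a plain RMS measure (A-weighting is ignored) and hypothetical signal values:

```python
import numpy as np


def rms(x):
    """Root-mean-square value of a signal."""
    return float(np.sqrt(np.mean(np.square(np.asarray(x, dtype=float)))))


def calibration_offset_db(recorded, known_level_db=60.0):
    """dB offset mapping the recorded digital level (dB re full scale) to the known level."""
    return known_level_db - 20.0 * np.log10(rms(recorded))


def frame_level_db(frame, offset_db):
    """Calibrated level of a signal frame, as it would be fed to the loudness model."""
    return 20.0 * np.log10(rms(frame)) + offset_db


offset = calibration_offset_db(np.full(1000, 0.1))  # recording at -20 dB re full scale
print(offset, frame_level_db(np.full(128, 0.01), offset))
```

After calibration, a signal 20 dB below the calibration recording is reported 20 dB below 60 dB, i.e. at 40 dB.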

It takes about 1.5 ms for the acoustical signal to travel from the insert earphone to the auditory nerve [33], whereas at the CI side the auditory nerve is stimulated directly. We therefore delayed the CI channel by 1.5 ms.
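Converting the 1.5 ms compensation into a whole-sample delay is simple arithmetic. The sketch assumes a 16 kHz processing rate on the CI path (the actual rate of the CI model is not stated here) and uses a plain zero-padded shift; a block-based real-time system would typically use a circular buffer instead.

```python
import numpy as np


def delay_in_samples(delay_s, fs_hz):
    """Nearest whole-sample delay for a delay given in seconds."""
    return int(round(delay_s * fs_hz))


def delay_signal(x, n_samples):
    """Delay x by n_samples, padding with zeros and keeping the original length."""
    x = np.asarray(x, dtype=float)
    return np.concatenate([np.zeros(n_samples), x])[: len(x)]


n = delay_in_samples(1.5e-3, 16000)  # 1.5 ms at an assumed 16 kHz rate
impulse = np.zeros(64)
impulse[0] = 1.0
print(n, int(np.argmax(delay_signal(impulse, n))))
```

At 16 kHz the compensation corresponds to 24 samples.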

Stimuli.

Speech perception in quiet was measured with consonant-vowel-consonant (CVC) words from the Dutch NVA lists, spoken by a male speaker [34]. Words were presented at 50 and 65 dBA. Each list contains 12 words, the first of which is a training word. We used one NVA list at each level.

Speech perception in noise was tested with the Dutch LIST sentences [30], spoken by a female speaker. Each list consists of ten sentences. The noise was a male competing talker in Dutch reading the story “The north wind and the sun” from IPA, 1999. Sentences were presented at a fixed level of 60 dBA.

For the localisation experiment, we used a 250 ms broadband telephone alerting signal [35]. The short stimulus was chosen to prevent subjects from turning their head while it was playing. The stimulus was presented at 65 dBA with a level rove of ±5 dB. Each level was presented once from each loudspeaker in random order, giving 39 presentations (13 loudspeakers × 3 levels).
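A randomised run of this design can be generated as the shuffled Cartesian product of loudspeaker azimuths and levels. The three levels (60, 65 and 70 dBA) are an assumed reading of the ±5 dB rove; the seed parameter only makes a run reproducible.

```python
import itertools
import random


def localisation_trials(levels_db=(60.0, 65.0, 70.0), seed=None):
    """Shuffled (azimuth, level) pairs: each level once from each loudspeaker."""
    azimuths = range(-90, 91, 15)  # 13 loudspeakers, -90 to +90 degrees in 15-degree steps
    trials = list(itertools.product(azimuths, levels_db))
    random.Random(seed).shuffle(trials)
    return trials


run = localisation_trials(seed=1)
print(len(run), run[:3])
```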

Procedures.

First, to set the HA gains, we measured the pure-tone audiometric thresholds. Second, we calculated the gains using the NAL-RP rule [36]. To avoid stimulating possible dead regions, we set the gains to zero or strongly reduced them at frequencies where the thresholds were worse than 100 dB HL. To account for the real-ear insertion gain, we measured the aided thresholds in free field. The insertion gains were adjusted until they were within 10 dB of the prescribed target. This was not possible for one subject (S2) at two frequencies (125 and 250 Hz), where the difference was 17 dB. Third, we fine-tuned the gains based on a sound-quality assessment. In general, this resulted in a decrease of the high-frequency gains, most probably due to dead regions in the cochlea. The CI was fitted according to the subject’s clinical MAP. Finally, without SCORE, we balanced the loudness of the HA and the CI by adjusting the overall gain of the HA for a 60 dB SPL LTASS signal.
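As a hedged illustration of the prescription step, the sketch below implements only the basic NAL-R formula, IG_f = 0.05 (H500 + H1000 + H2000) + 0.31 H_f + k_f, on which NAL-RP builds. The NAL-RP corrections for severe and profound losses [36] are deliberately not reproduced, negative gains are clipped to zero, and the dead-region rule is simplified to zero gain above 100 dB HL.

```python
# Frequency-specific constants k (dB) of the basic NAL-R formula.
NAL_R_K_DB = {250: -17.0, 500: -8.0, 750: -3.0, 1000: 1.0, 1500: 1.0,
              2000: -1.0, 3000: -2.0, 4000: -2.0, 6000: -2.0}


def nal_r_gains(thresholds_db_hl, dead_region_limit_db_hl=100.0):
    """Basic NAL-R insertion gains (dB); NAL-RP profound-loss corrections omitted.

    thresholds_db_hl maps frequency (Hz) to hearing threshold (dB HL) and must
    contain 500, 1000 and 2000 Hz.
    """
    h = thresholds_db_hl
    x = 0.05 * (h[500] + h[1000] + h[2000])
    gains = {}
    for f, threshold in h.items():
        if threshold > dead_region_limit_db_hl:
            gains[f] = 0.0  # possible dead region: no amplification (simplified rule)
        else:
            gains[f] = max(0.0, x + 0.31 * threshold + NAL_R_K_DB[f])  # clip negative gains
    return gains


flat_60 = {250: 60.0, 500: 60.0, 1000: 60.0, 2000: 60.0, 4000: 60.0}
print({f: round(g, 1) for f, g in nal_r_gains(flat_60).items()})
```

In the study, these prescribed values were only the starting point; they were then verified with aided free-field thresholds and fine-tuned for sound quality.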

After the fitting, subjects were familiarised with SCORE by listening to an audiobook for 10 minutes with SCORE active.

Speech perception in quiet was tested in three listening conditions (CI only, HA only, and CI+HA) and two processing configurations (with and without SCORE). In order to maintain potential familiarisation with SCORE after the audiobook, all tests with SCORE were conducted first. For each block of tests (with and without SCORE), the listening conditions were presented in a random order.

Speech perception in noise was measured in the CI+HA listening condition only. Each test started at a signal-to-noise ratio (SNR) of +5 dB. Based on keyword recognition, the level of the masker was adapted with a one-up/one-down procedure in steps of 2 dB. The speech reception threshold (SRT) was determined as the mean SNR of the last six trials. To avoid procedural learning effects, we first tested with SCORE until the results no longer decreased. We then did two measurements for each configuration (with and without SCORE) in random order.
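The adaptive track can be sketched as follows. The trial count (10, one list), the scoring callback, and the idealised listener are assumptions; since speech is fixed at 60 dBA, changing the masker level is equivalent to changing the SNR.

```python
def adaptive_srt(respond, start_snr_db=5.0, step_db=2.0, n_trials=10, n_last=6):
    """One-up/one-down SNR track; returns (SRT estimate, presented SNRs).

    respond(snr_db) -> bool tells whether a sentence was scored correct
    (a hypothetical scoring rule, e.g. all keywords repeated).
    """
    snr = start_snr_db
    presented = []
    for _ in range(n_trials):
        presented.append(snr)
        # Correct: raise the masker (lower SNR). Incorrect: lower the masker (raise SNR).
        snr = snr - step_db if respond(snr) else snr + step_db
    return sum(presented[-n_last:]) / n_last, presented


SPEECH_LEVEL_DBA = 60.0
srt, track = adaptive_srt(lambda snr: snr > -3.0)  # idealised listener, threshold -3 dB SNR
masker_levels = [SPEECH_LEVEL_DBA - s for s in track]
print(srt, track)
```

With this deterministic listener, the track settles into alternating between -3 and -1 dB SNR, and the mean of the last six trials lands halfway between them.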

Sound localisation was tested in the CI+HA listening condition, with and without SCORE. Before testing, we balanced the loudness for each configuration using the localisation stimulus (except for subject S1): the stimulus was presented from the front, and the overall gain of the HA was changed if needed. During the localisation experiment, the subjects had to identify the loudspeaker from which the sound came. To avoid learning effects, listeners were trained with SCORE until the root mean square (RMS) error no longer decreased. We then did two test runs for each configuration, in random order. In each run, the stimulus was presented three times from each loudspeaker, in random order.

Statistical analysis.

For speech intelligibility in quiet, we first averaged the test and retest results of the two retested subjects. Second, we transformed the results from percentages to rationalised arcsine units (RAU) [37]. We then fitted an ANOVA model with three factors to the RAU scores: 1) listening condition (CI only, HA only, CI+HA), 2) stimulation level (50 or 65 dBA), and 3) processing configuration (with or without SCORE).
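The RAU transform (Studebaker's formula) needs the number of correct items X and the number of scored items N, not just the percentage: theta = asin(sqrt(X/(N+1))) + asin(sqrt((X+1)/(N+1))) and RAU = (146/π)·theta - 23. A minimal sketch; the item count used in the example is a hypothetical value.

```python
import math


def rau(n_correct: int, n_items: int) -> float:
    """Rationalised arcsine units for n_correct out of n_items."""
    theta = (math.asin(math.sqrt(n_correct / (n_items + 1)))
             + math.asin(math.sqrt((n_correct + 1) / (n_items + 1))))
    return 146.0 / math.pi * theta - 23.0


# Hypothetical 33 scored items: 0%, 50% and 100% correct map to roughly -20, 50 and 120 RAU.
print([round(rau(x, 33), 1) for x in (0, 16, 33)])
```

The transform is approximately linear in percent correct over the middle range but stretches scores near 0% and 100%, which stabilises the variance before the ANOVA.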

For speech intelligibility in noise, we averaged the test and retest results and then performed a Wilcoxon signed-rank test comparing the SRTs with and without SCORE.

Since only a few subjects were tested for localisation, we did not fit an ANOVA model to these results. However, each subject contributed many presentations, so we used the within-subject RMS error to compare the results with and without SCORE.
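The within-subject RMS error is the square root of the mean squared difference between the response azimuth and the target azimuth over all presentations in a run. A minimal sketch with hypothetical responses:

```python
import math


def rms_error_deg(target_deg, response_deg):
    """Root-mean-square localisation error (degrees) over paired presentations."""
    if len(target_deg) != len(response_deg) or not target_deg:
        raise ValueError("need equally long, non-empty lists")
    squared = [(r - t) ** 2 for t, r in zip(target_deg, response_deg)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical run: one response off by one loudspeaker (15 degrees), two correct.
print(round(rms_error_deg([0, 15, -30], [0, 30, -30]), 2))
```

Because errors are squared before averaging, occasional large confusions (e.g. front-back or left-right) weigh more heavily than many small one-loudspeaker errors.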