Fig 1.
Comb filters resulting from 3 gain / latency pairs. Yellow: 17 dB, 64 ms, dip depth = 2.89 dB. Red: 12 dB, 32 ms, dip depth = 6.06 dB. Blue: 8 dB, 8 ms, dip depth = 13.82 dB.
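As a rough illustration of why larger gains give shallower dips, the sketch below treats the combined signal as an idealized two-path sum, a unit-gain direct path plus a delayed enhanced path with linear gain g. This model, the function name comb_summary, and the peak-to-notch definition of depth are assumptions made here for illustration; the dip depths quoted in the caption come from the authors' own analysis, and this formula is not expected to reproduce them exactly.

```python
import math

def comb_summary(gain_db, delay_ms):
    """Idealized two-path comb filter: y(t) = x(t) + g * x(t - tau).

    Returns (notch spacing in Hz, peak-to-notch depth in dB).
    Assumes gain_db != 0, otherwise the notches are infinitely deep.
    """
    g = 10.0 ** (gain_db / 20.0)           # enhanced-path gain, linear amplitude
    notch_spacing_hz = 1000.0 / delay_ms   # notches repeat every 1/tau Hz
    # |1 + g*exp(-j*w*tau)| ranges from |g - 1| (notch) to g + 1 (peak).
    peak_to_notch_db = 20.0 * math.log10((g + 1.0) / abs(g - 1.0))
    return notch_spacing_hz, peak_to_notch_db

# The three gain / latency pairs listed in the caption.
for name, gain_db, delay_ms in [("yellow", 17, 64), ("red", 12, 32), ("blue", 8, 8)]:
    spacing, depth = comb_summary(gain_db, delay_ms)
    print(f"{name}: notch spacing {spacing:.3f} Hz, idealized depth {depth:.2f} dB")
```

Under this model the notch spacing depends only on the latency, and the depth depends only on the gain, which matches the trend in the caption: the highest-gain pair has the shallowest dips.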
Fig 2.
Filter latency vs. array gain.
Increasing the filter length gives a larger array gain at the expense of increased latency. The reverberation time (RT60, the time for the sound level to decay by 60 dB) is given in the legend. Filter latency = group delay = 0.5 × filter length. Simulation for one desired source and two interfering sources.
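The latency relation in the caption can be turned into milliseconds with a short sketch. The function name filter_latency_ms and the 16 kHz sample rate in the usage line are assumptions for illustration, not values from the simulation. The caption's "0.5 × filter length" is used as stated; for a linear-phase FIR filter of N taps the exact group delay is (N - 1)/2 samples, which is nearly the same for long filters.

```python
def filter_latency_ms(filter_length_taps, sample_rate_hz):
    """Latency of a filter using the caption's relation:
    group delay = 0.5 * filter length (in samples), converted to ms."""
    delay_samples = 0.5 * filter_length_taps
    return delay_samples / sample_rate_hz * 1000.0

# Hypothetical example: a 1024-tap filter at an assumed 16 kHz sample rate.
print(filter_latency_ms(1024, 16000))
```

Doubling the filter length doubles the latency at a fixed sample rate, which is the trade-off the figure plots against array gain.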
Fig 3.
Simplified Simulink schematic.
a) Speech model containing two speech signals. The direct path signal is set at X dB and varies according to the previous subject response. The enhanced path signal contains the same speech signal with an additional gain of xi dB and a latency of yi ms (i = 1, 2, or 3; the gain / latency pairs are described in the text). b) Noise model containing two speech-shaped noise signals. The direct path is fixed at 0 dB and the enhanced path at -6 dB, representing the worst-case scenario of a successful beamformer, with the same latency value, yi, as in the speech model.
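The two-path structure in the schematic can be sketched in a few lines of plain Python. This is not the authors' Simulink model. The function names, the use of sample lists, rounding the latency to whole samples, and taking dB relative to unit amplitude are all assumptions made for illustration. The fixed noise levels (0 dB direct, -6 dB enhanced) and the shared latency are taken from the caption.

```python
def db_to_amp(level_db):
    """Convert a level in dB to a linear amplitude factor."""
    return 10.0 ** (level_db / 20.0)

def delay(signal, n_samples):
    """Delay a signal by n_samples, zero-padding the start and keeping the length."""
    n = min(max(n_samples, 0), len(signal))
    return [0.0] * n + list(signal[:len(signal) - n])

def two_path_mix(speech, noise, direct_db, gain_db, latency_ms, fs_hz):
    """Sum direct and enhanced paths for both speech and noise.

    speech, noise: equal-length lists of samples.
    direct_db:     direct-path speech level (X dB in the caption).
    gain_db:       additional enhanced-path gain (xi dB).
    latency_ms:    enhanced-path latency (yi ms), shared by speech and noise.
    """
    n = round(latency_ms * fs_hz / 1000.0)
    s_direct = [db_to_amp(direct_db) * v for v in speech]
    s_enh = [db_to_amp(direct_db + gain_db) * v for v in delay(speech, n)]
    n_direct = [db_to_amp(0.0) * v for v in noise]          # fixed at 0 dB
    n_enh = [db_to_amp(-6.0) * v for v in delay(noise, n)]  # fixed at -6 dB
    return [a + b + c + d for a, b, c, d in zip(s_direct, s_enh, n_direct, n_enh)]

# Hypothetical check with a unit impulse as "speech" and silent noise.
print(two_path_mix([1.0, 0.0, 0.0, 0.0], [0.0] * 4, 0.0, 20.0, 2.0, 1000))
```

With an impulse input the output shows the direct path followed, yi ms later, by the amplified enhanced copy. The same delayed-copy structure is what produces comb filtering when the two paths overlap.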
Fig 4.
Experiment’s graphical user interface.
Example of the MATLAB GUI for a single trial of the Modified Rhyme Test (MRT).
Fig 5.
Visualization of the permutation statistics.
The distribution of thresholds from the permutation statistics is shown in red. The gray line shows where the average from the experimental data falls within the distribution.
Table 1.
Experimental conditions and threshold results.
The three gain / latency combinations used in this experiment, taken from the RT60 = 0.6 s beamforming simulation in Fig 2, together with the average thresholds (SNR of enhanced signal to noise) and standard deviations across subjects for each of the three conditions. (dB = decibels; ms = milliseconds.)
Fig 6.
Left graphs show the HASPI and STOI (top and bottom, respectively) as a function of the direct path signal-to-noise ratio. Adding the various enhancements translates the curve by the corresponding amount of gain. Right graphs show the HASPI and STOI (top and bottom, respectively) as a function of the enhanced path signal-to-noise ratio. Regardless of the direct path signal level, the enhanced signal dominates the intelligibility calculation.
Fig 7.
The green line is the noise level across conditions. The gray region shows the levels below the measured speech intelligibility threshold for the average listener. Yellow diamonds represent the threshold levels across conditions for the average listener. Purple circles indicate the level of the direct signal, which is unintelligible and in need of enhancement.