Modeling the impact of cochlear nerve degeneration on speech recognition performance

doi:10.1371/journal.pone.0336299

Fig 1.

Audiometry and age effects on degraded speech recognition.

A: Distribution of mean audiometric thresholds across the study cohort (n = 365), assessed at both standard audiometric frequencies (St) and extended high frequencies (EHF). Thresholds were averaged across ears. B: Age distribution of participants, who ranged from 18 to 80 years. C: Relationship between participant age and word recognition performance. Word scores were measured using time-compressed (65%) and reverberant NU-6 words. Each dot represents an individual participant. Pearson correlation coefficients quantify the association between age and speech recognition ability and show a decline in performance with advancing age.

Fig 2.

Spectro-temporal stimulus structure and SR-specific ANF responses.

Panel A shows the spectrogram of the word “size” (left), and the time-averaged energy distribution across frequency bins (right), normalized by the maximum value. Panels B through D show the neurograms (left) and rate-place profiles (right) for single low-, medium-, and high-SR ANFs at each CF in response to the same stimulus presented at 90 dB SPL. ANF firing rates from the model output (X axis) are displayed as a function of CF (Y axis). Average responses are computed across time and normalized by the maximum value across CF. For comparison, the average acoustic energy from panel A is superimposed (dashed lines) on each of the rate-place profiles.
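
The rate-place profiles described above (a time average per CF, normalized by the maximum across CF) can be sketched in a few lines of NumPy. This is an illustrative sketch only: the function name `rate_place_profile`, the (CF × time) array layout, and the toy firing rates are assumptions, not taken from the study's code.

```python
import numpy as np

def rate_place_profile(neurogram: np.ndarray) -> np.ndarray:
    """Time-averaged firing rate per CF, scaled so the peak CF equals 1.

    Assumes `neurogram` has shape (n_cf, n_time_bins); this layout is a
    convention chosen for the sketch, not one stated in the caption.
    """
    mean_rate = neurogram.mean(axis=1)  # average across time for each CF
    peak = mean_rate.max()
    # Guard against an all-zero neurogram to avoid dividing by zero.
    return mean_rate / peak if peak > 0 else mean_rate

# Toy neurogram: 4 CFs x 5 time bins of made-up firing rates (spikes/s).
toy = np.array([
    [10, 20, 30, 20, 20],
    [50, 50, 50, 50, 50],
    [0, 0, 10, 10, 5],
    [25, 25, 25, 25, 25],
], dtype=float)

profile = rate_place_profile(toy)
print(profile)
```

The same function applied to a spectrogram with frequency bins on the first axis would give the normalized time-averaged energy curve that is superimposed as dashed lines.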

Fig 3.

Similarity of acoustic and neural representations across SR groups and levels.

Pearson correlations (r) between stimulus spectrogram and corresponding ANF neurogram are plotted for low-, medium-, and high-SR fibers at multiple presentation levels. Each point is one of 50 NU-6 words (65% time-compressed, 0.3-s reverberation). Higher r indicates closer spectro-temporal fidelity. Asterisks mark between-group differences (*** p < 0.001).
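
As a minimal sketch of the similarity measure, the Pearson r between a spectrogram and a neurogram can be computed by flattening both arrays and correlating them element by element. The sketch assumes the two representations have already been brought onto the same frequency-by-time grid; how the study aligned them is not stated in this caption, and the function name is hypothetical.

```python
import numpy as np

def spectro_neural_similarity(spectrogram, neurogram) -> float:
    """Pearson r between two equally shaped 2-D representations.

    Assumption: both arrays share one (frequency x time) grid. Any
    resampling needed to achieve that is outside this sketch.
    """
    a = np.asarray(spectrogram, dtype=float)
    b = np.asarray(neurogram, dtype=float)
    if a.shape != b.shape:
        raise ValueError("spectrogram and neurogram must have the same shape")
    # np.corrcoef returns the 2x2 correlation matrix; [0, 1] is r(a, b).
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

# Synthetic check: a linearly rescaled copy is perfectly correlated,
# and a sign-flipped copy is perfectly anti-correlated.
rng = np.random.default_rng(0)
spec = rng.random((16, 40))
r_scaled = spectro_neural_similarity(spec, 2.0 * spec + 1.0)
r_flipped = spectro_neural_similarity(spec, -spec)
print(r_scaled, r_flipped)
```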

Fig 4.

Training performance of neural decoders on simulated ANF neurograms.

Learning curves (loss and accuracy) for two fully connected models trained to classify 25 NU-6 words from neurograms. A, NN1: five dense layers (128 units each), ReLU, dropout = 0.03, trained for 10 epochs. B, NN2: sixteen dense layers (32 units each), ReLU, dropout = 0.03, trained for 500 epochs. Both models used the Adam optimizer with sparse categorical cross-entropy. Training inputs were neurograms generated from model responses to speech tokens presented at 11-30 dB SNR in 1 dB steps.
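
To make the two decoder shapes concrete, here is a framework-free NumPy sketch of a forward pass through a stack of dense ReLU layers with a 25-way softmax output and a sparse categorical cross-entropy loss. It is not the study's implementation. Several details are assumptions: the input size `N_INPUTS` is hypothetical, the output layer is treated as separate from the five (NN1) or sixteen (NN2) dense layers, weights are random rather than trained, and dropout is omitted because it only acts during training.

```python
import numpy as np

N_INPUTS = 1000   # hypothetical flattened neurogram size (not from the paper)
N_CLASSES = 25    # 25 NU-6 words, as in the caption

def init_dense_stack(n_inputs, hidden_units, n_hidden, n_classes, rng):
    """Random (weights, bias) pairs for n_hidden dense layers plus an output layer."""
    sizes = [n_inputs] + [hidden_units] * n_hidden + [n_classes]
    return [
        (rng.normal(0.0, np.sqrt(2.0 / fan_in), (fan_in, fan_out)), np.zeros(fan_out))
        for fan_in, fan_out in zip(sizes[:-1], sizes[1:])
    ]

def forward(params, x):
    """ReLU on every hidden layer, softmax on the output layer."""
    h = x
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)
    w, b = params[-1]
    logits = h @ w + b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def sparse_categorical_crossentropy(probs, labels):
    """Mean negative log-probability of the integer-coded true class."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked + 1e-12).mean())

def n_parameters(params):
    return sum(w.size + b.size for w, b in params)

rng = np.random.default_rng(0)
nn1 = init_dense_stack(N_INPUTS, 128, 5, N_CLASSES, rng)   # NN1: 5 x 128 units
nn2 = init_dense_stack(N_INPUTS, 32, 16, N_CLASSES, rng)   # NN2: 16 x 32 units

x = rng.random((8, N_INPUTS))            # 8 made-up input vectors
y = rng.integers(0, N_CLASSES, size=8)   # made-up integer word labels
probs1, probs2 = forward(nn1, x), forward(nn2, x)
loss1 = sparse_categorical_crossentropy(probs1, y)
loss2 = sparse_categorical_crossentropy(probs2, y)
print(n_parameters(nn1), n_parameters(nn2), loss1, loss2)
```

Under these assumptions the deeper, narrower NN2 has far fewer trainable parameters than NN1, so depth rather than parameter count is the main difference between the two stacks.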

Fig 5.

Confusion matrices for word classification across simulated CND.

Each matrix summarizes predictions for 25 NU-6 words (65% time-compressed, 0.3-s reverberation) presented at 10, 5, or 0 dB SNR. Neurograms were generated with the Zilany et al. auditory model under three neural conditions: normal innervation, high-SR only, and severe neural loss. A-C: NN1 results for the three conditions. D-F: NN2 results for the same three conditions. Y-axes show true word labels; X-axes show predicted labels. Accuracy is indicated by concentration of values along the diagonal; overall percent correct appears below each matrix.
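
A confusion matrix of this kind, with true labels on rows and predicted labels on columns, and the percent correct read off its diagonal, can be sketched as follows. The function names and the three-class toy labels are illustrative assumptions; the study used 25 word classes.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Count matrix with true labels on rows and predicted labels on columns."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly when the same (row, col) pair repeats.
    np.add.at(cm, (true_labels, predicted_labels), 1)
    return cm

def percent_correct(cm):
    """Share of trials on the diagonal, expressed as a percentage."""
    return 100.0 * np.trace(cm) / cm.sum()

# Made-up labels for 8 trials over 3 classes.
true = np.array([0, 0, 1, 1, 2, 2, 2, 1])
pred = np.array([0, 1, 1, 1, 2, 0, 2, 1])

cm = confusion_matrix(true, pred, n_classes=3)
accuracy = percent_correct(cm)
print(cm)
print(accuracy)
```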

Fig 6.

Word-recognition accuracy of the neural decoders across simulated CND and comparison with human data.

A: Mean classification accuracy for Neural Network 1 (NN1) and Neural Network 2 (NN2) across three SNR bins (10 to 0 dB; −5 to −15 dB; −20 to −30 dB). Curves are shown for three CND profiles generated by the Zilany et al. model: normal innervation (2 low-, 3 mid-, 7 high-SR fibers/CF), high-SR only (loss of low- and mid-SR fibers), and severe neural loss (3 high-SR fibers/CF). B: NN2 accuracy at high SNRs (10 to 0 dB; red symbols) for the three neural profiles, plotted against a “percent synapse survival” axis. For comparison (gray points), individual word-recognition scores from Fig 1C were mapped to the same axis by converting participant age to predicted auditory-nerve survival using the published age–survival function for normal aging [3].
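
One plausible way to place the three neural profiles on a percent-survival axis is to express each profile's fiber count per CF relative to the normal count. The caption does not state that the study used this mapping, so the arithmetic below is an assumption, as is the reading that the high-SR-only profile retains all 7 high-SR fibers per CF.

```python
# Fibers per CF for each simulated profile, taken from the caption.
# Assumption: "high-SR only" keeps all 7 high-SR fibers of the normal profile.
PROFILES = {
    "normal innervation": {"low": 2, "mid": 3, "high": 7},
    "high-SR only": {"low": 0, "mid": 0, "high": 7},
    "severe neural loss": {"low": 0, "mid": 0, "high": 3},
}

def percent_survival(profile, reference):
    """Total fibers per CF as a percentage of the reference profile's total."""
    return 100.0 * sum(profile.values()) / sum(reference.values())

reference = PROFILES["normal innervation"]
survival = {name: percent_survival(p, reference) for name, p in PROFILES.items()}

for name, value in survival.items():
    print(f"{name}: {value:.1f}% of normal fiber count")
```

Under this assumed mapping the three profiles fall at 100%, about 58%, and 25% of the normal fiber count.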
