Imputation of missing values for cochlear implant candidate audiometric data and potential applications

doi:10.1371/journal.pone.0281337

Fig 1.

Dataset creation.

Flow chart for inclusion and exclusion of subjects and audiograms. Data drawn from the Washington University School of Medicine in St. Louis Cochlear Implant database (WUSM CI) and the HIPAA-secure, Encrypted, Research, Management, and Evaluation Solution database (HERMES).

Fig 2.

Performance assessment pipeline.

Diagram of steps for assessment of model performance with repeated simulations of nested cross-validation. Output consists of model performance mean and confidence intervals averaged across all 10 simulations.
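The pipeline described in this caption can be sketched in a few lines. The block below is a minimal illustration, not the authors' implementation: the toy model (a shrunken training mean with a hypothetical hyperparameter `alpha`), the fold counts, and the normal-approximation confidence interval are all assumptions introduced here. It shows only the control flow: an inner loop selects a hyperparameter using outer-training data alone, an outer loop scores the selected model on held-out data, and the whole procedure is repeated over 10 seeds.

```python
import math
import random
import statistics


def k_folds(indices, k):
    """Split a list of indices into k roughly equal strided folds."""
    return [indices[i::k] for i in range(k)]


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def fit_predict(train, alpha):
    """Toy model (assumption): predict alpha times the training mean."""
    return alpha * statistics.fmean(train)


def nested_cv(values, alphas, outer_k=5, inner_k=3, seed=0):
    """Return the mean outer-fold RMSE for one shuffled simulation."""
    rng = random.Random(seed)
    idx = list(range(len(values)))
    rng.shuffle(idx)
    outer_scores = []
    for test_idx in k_folds(idx, outer_k):
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]

        def inner_score(alpha):
            # Inner loop: score alpha using only the outer-training data.
            scores = []
            for val_idx in k_folds(train_idx, inner_k):
                val = set(val_idx)
                fit = [values[i] for i in train_idx if i not in val]
                pred = fit_predict(fit, alpha)
                scores.append(rmse([values[i] for i in val_idx], [pred] * len(val_idx)))
            return statistics.fmean(scores)

        best_alpha = min(alphas, key=inner_score)
        pred = fit_predict([values[i] for i in train_idx], best_alpha)
        outer_scores.append(rmse([values[i] for i in test_idx], [pred] * len(test_idx)))
    return statistics.fmean(outer_scores)


def repeated_nested_cv(values, alphas, n_sim=10, z=1.96):
    """Average nested-CV scores over n_sim seeds; CI is a normal approximation."""
    scores = [nested_cv(values, alphas, seed=s) for s in range(n_sim)]
    mean = statistics.fmean(scores)
    half = z * statistics.stdev(scores) / math.sqrt(n_sim)
    return mean, (mean - half, mean + half)
```

Keeping hyperparameter selection inside the outer-training split is what prevents the held-out score from being optimistically biased; the model and interval method can be swapped without changing that structure.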

Fig 3.

Visualization of sparsity distribution.

All tested sparsity distributions applied to sample dataset of 500 audiograms. Quantity of sparsity fixed at 3 missing features per instance. Rows represent audiograms, columns represent features. Black bars indicate present data, white bars indicate missing data. (a) Real-world distribution sets weighted likelihood for feature removal equivalent to the corresponding parent subset. (b) Random distribution sets equivalent likelihood of removal for all features. (c) Terminal distribution is weighted 3-fold towards removing the terminal 6 frequencies (125 Hz, 250 Hz, 500 Hz, 4000 Hz, 6000 Hz, 8000 Hz). (d) Central distribution is weighted 3-fold towards removing the central 5 frequencies (750 Hz, 1000 Hz, 1500 Hz, 2000 Hz, 3000 Hz).
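The synthetic distributions in panels (b) through (d) can be illustrated with a weighted sampler. This is a hedged sketch under stated assumptions: "weighted 3-fold" is read here as giving each favored frequency three times the removal weight of the others, and the function names are hypothetical. The real-world distribution in panel (a) depends on missingness rates from the parent subset, which the caption does not list, so it is not reproduced.

```python
import random

FREQS_HZ = [125, 250, 500, 750, 1000, 1500, 2000, 3000, 4000, 6000, 8000]
TERMINAL = {125, 250, 500, 4000, 6000, 8000}
CENTRAL = {750, 1000, 1500, 2000, 3000}


def removal_weights(distribution, fold=3.0):
    """Per-frequency removal weights for 'random', 'terminal', or 'central'."""
    if distribution == "random":
        return [1.0] * len(FREQS_HZ)
    favored = {"terminal": TERMINAL, "central": CENTRAL}[distribution]
    # Assumption: each favored frequency is `fold` times likelier to be removed.
    return [fold if f in favored else 1.0 for f in FREQS_HZ]


def sample_missing(weights, n_missing, rng):
    """Weighted sampling without replacement of feature indices."""
    candidates = list(range(len(weights)))
    remaining = list(weights)
    chosen = []
    for _ in range(n_missing):
        pick = rng.choices(range(len(candidates)), weights=remaining, k=1)[0]
        chosen.append(candidates.pop(pick))
        remaining.pop(pick)
    return chosen


def sparsity_mask(n_rows, distribution, n_missing=3, seed=0):
    """Return rows of booleans: True = value present, False = value removed."""
    rng = random.Random(seed)
    weights = removal_weights(distribution)
    mask = []
    for _ in range(n_rows):
        missing = set(sample_missing(weights, n_missing, rng))
        mask.append([j not in missing for j in range(len(FREQS_HZ))])
    return mask
```

Calling `sparsity_mask(500, "terminal")` produces a 500 by 11 mask with exactly 3 missing features per row, matching the fixed sparsity quantity used in the figure.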

Fig 4.

Demographics.

Descriptive statistics for raw sparse audiometric dataset (n = 7,451).

Fig 5.

Quantity and distribution of sparsity.

Assessment of model performance on sparse datasets with different degrees of sparsity (1–10 of 11 features) and sparsity distributions (Real-world, Random, Terminal, Central). Colored lines denote mean root mean squared error; shaded bands represent 99% confidence intervals.
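For readers unfamiliar with the metric, the sketch below shows one way to compute root mean squared error over imputed entries only, plus a 99% interval across repeated simulations. The restriction to removed entries, the normal approximation (z = 2.576), and the function names are assumptions made for illustration; the caption does not state how the intervals were derived.

```python
import math
import statistics


def imputation_rmse(truth, imputed, present):
    """RMSE over entries that were removed (present[i][j] is False)."""
    sq_errors = [
        (t - p) ** 2
        for t_row, p_row, m_row in zip(truth, imputed, present)
        for t, p, is_present in zip(t_row, p_row, m_row)
        if not is_present
    ]
    return math.sqrt(sum(sq_errors) / len(sq_errors))


def mean_ci(scores, z=2.576):
    """Mean and normal-approximation interval (z = 2.576 for about 99%)."""
    mean = statistics.fmean(scores)
    half = z * statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, (mean - half, mean + half)
```

Scoring only the removed entries keeps observed values, which are copied through unchanged, from diluting the error.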

Fig 6.

Audiometric correlation.

Correlation matrix demonstrating the pairwise Pearson correlation coefficient between audiometric frequencies.
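A pairwise Pearson matrix of this kind can be computed as sketched below. Handling of missing thresholds is an assumption: this version uses pairwise-complete observations (rows where both frequencies are present, with `None` marking a missing value), which may differ from what the authors did. It does not guard against a column with zero variance.

```python
import math


def pearson(xs, ys):
    """Pearson r over pairs where both values are present (not None)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(rows):
    """rows: audiograms (one threshold per frequency); returns a square matrix."""
    columns = list(zip(*rows))
    return [[pearson(a, b) for b in columns] for a in columns]
```

Each row passed to `correlation_matrix` is one audiogram and each column one frequency, so the result has one row and one column per frequency.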

Table 1.

Model selection.

Model performance assessment given different sparsity distributions. Quantity of missing data varied on a per-instance basis, capped at 6 missing features but otherwise statistically equivalent to parent subset. Results averaged across 10 simulations, reported as mean (95% confidence interval).

Fig 7.

Model performance with varying dataset size.

Models assessed on sparse datasets with sample size sequentially increasing twofold. Amount of sparsity varied on a per-instance basis from 1 to 6 missing features, mirroring the quantity of sparsity of the parent subset. Distribution of sparsity was real-world, mirroring parent subsets. Lines represent metric mean across 10 simulations; shaded bands represent 99% confidence intervals.
