Benchmarking interpretability of deep learning for predictive genomics: Recall, precision, and variability of feature attribution

doi:10.1371/journal.pcbi.1013784

Fig 3

Mean attribution recall across quantile thresholds for DNN interpretation methods and GWAS.

Mean recall values are shown for DNN attribution methods with (orange) and without (blue) SmoothGrad, compared with the GWAS baseline (green). Shaded regions represent the mean ± 1 standard deviation across replicates. (A) Dominant-effect recall. Smoothed DNN attribution methods achieved consistently higher recall than both non-smoothed variants and GWAS for dominant synthetic variants. (B) Recessive-effect recall. A similar trend was observed for recessive variants, where smoothed methods maintained greater sensitivity across thresholds. (C) Epistatic-effect recall. Both smoothed and non-smoothed DNN methods recovered measurable epistatic associations, substantially outperforming GWAS, which exhibited near-zero recall.

doi: https://doi.org/10.1371/journal.pcbi.1013784.g003
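
The caption does not state how recall at a quantile threshold is computed. The sketch below shows one plausible reading, offered as an assumption and not as the authors' implementation: a synthetic causal variant counts as recovered when its absolute attribution score is at or above the chosen quantile of all scores, and the per-threshold recall is then summarized across replicates as mean and standard deviation (the shaded band in the figure). The function names `recall_at_quantiles` and `summarize_replicates`, and the use of absolute scores, are hypothetical choices made for illustration.

```python
import numpy as np


def recall_at_quantiles(scores, causal_idx, quantiles):
    """Recall of known causal features at each quantile cutoff.

    Assumption: a causal feature is "recovered" when its absolute
    attribution score is >= the given quantile of all absolute scores.
    """
    abs_scores = np.abs(np.asarray(scores, dtype=float))
    causal_scores = abs_scores[np.asarray(causal_idx)]
    # One cutoff per requested quantile, computed over all features.
    cutoffs = np.quantile(abs_scores, quantiles)
    # Fraction of causal features at or above each cutoff.
    return np.array([(causal_scores >= c).mean() for c in cutoffs])


def summarize_replicates(recalls):
    """Mean and (population) standard deviation per threshold.

    `recalls` has shape (n_replicates, n_thresholds).
    """
    recalls = np.asarray(recalls, dtype=float)
    return recalls.mean(axis=0), recalls.std(axis=0)
```

Under this reading, each curve in the figure would come from calling `recall_at_quantiles` once per replicate on that method's attribution scores (or on GWAS association scores for the baseline), then passing the stacked results to `summarize_replicates`.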