Benchmarking interpretability of deep learning for predictive genomics: Recall, precision, and variability of feature attribution

doi:10.1371/journal.pcbi.1013784

Fig 2. Illustration of the attribution precision metric.

The full set of features consists of real SNPs (left, green) and decoy SNPs (right, red). Within the set of real SNPs, a small subset is truly associated with the phenotype (grey circle; SNPs with associations). A DNN interpretation method identifies a set of top-K SNPs (yellow oval; DL-Salient SNPs), which comprises three subsets: A (truly associated real SNPs), B (real SNPs lacking true association), and C (decoy SNPs). The precision among the real SNPs in the top-K set would be |A| / |A + B| = 1 − |B| / |A + B|, but |B| cannot be observed directly. Because sets B and C are assumed to be comparable in size, the number of decoy SNPs among the top-K most highly attributed SNPs, |C|, is used as an estimate of |B|, which gives the attribution precision as 1 − |C| / |A + B|.

doi: https://doi.org/10.1371/journal.pcbi.1013784.g002
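
The calculation described in the caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `attribution_precision`, its parameters, and the convention that a larger score means a more salient SNP are all hypothetical. It relies on the fact that every top-K SNP is either real (A or B) or a decoy (C), so |A + B| = K − |C|.

```python
from typing import Sequence


def attribution_precision(
    scores: Sequence[float], is_decoy: Sequence[bool], k: int
) -> float:
    """Estimate attribution precision as 1 - |C| / |A + B|.

    scores   -- one attribution score per SNP (assumption: higher = more salient)
    is_decoy -- True where the SNP is a decoy, False where it is real
    k        -- size of the top-K set selected by the interpretation method
    """
    if len(scores) != len(is_decoy):
        raise ValueError("scores and is_decoy must have the same length")
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of SNPs")

    # Rank SNP indices by attribution score, highest first, and keep the top K.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top_k = ranked[:k]

    n_decoy = sum(1 for i in top_k if is_decoy[i])  # |C|
    n_real = k - n_decoy                            # |A + B|

    if n_real == 0:
        # No real SNPs in the top-K set: the ratio is undefined.
        return float("nan")
    return 1.0 - n_decoy / n_real


# Toy example: 4 real and 1 decoy SNP in the top 5, so 1 - 1/4.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.1]
is_decoy = [False, False, True, False, False, True]
print(attribution_precision(scores, is_decoy, k=5))
```

Note that the estimate becomes negative when the top-K set holds more decoy SNPs than real SNPs; this sketch leaves such values unclipped so that the case stays visible.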