Improved CRISPR/Cas9 off-target prediction with DNABERT and epigenetic features
Fig 5
SHAP analysis of epigenetic feature contributions in the DNABERT-Epi model.
The analysis was performed on the Lazzarotto et al. (2020) GUIDE-seq dataset. (A) The global importance of each epigenetic mark, measured by the mean absolute SHAP value across all features and samples. Error bars represent the standard deviation. (B) A SHAP summary plot from a representative cross-validation fold, illustrating the impact of the 30 most important feature bins. Each point is a single sample; its color indicates the feature’s value (red for high, blue for low) and its x-position shows the impact on the model’s output. (C) The positional importance of each epigenetic mark. The plot shows the mean absolute SHAP value for each 10 bp bin across a ±500 bp window centered on the cleavage site. The shaded area represents the 95% confidence interval.
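The aggregations described for panels (A) and (C) can be illustrated with a minimal sketch. It is not the authors' code. It assumes the SHAP values are already available as an array of shape (samples, marks, bins), that the ±500 bp window is split into 100 bins of 10 bp, that the standard deviation in (A) is taken over all bins and samples, and that the 95% confidence interval in (C) is a normal approximation over samples (1.96 × standard error). The function name `summarize_shap` and the random demo data are hypothetical.

```python
import numpy as np

def summarize_shap(shap_values, bin_size_bp=10):
    """Aggregate SHAP values of shape (n_samples, n_marks, n_bins).

    Hypothetical helper: the array layout and the CI method are assumptions.
    """
    abs_vals = np.abs(shap_values)
    n_samples, n_marks, n_bins = abs_vals.shape

    # Panel (A) style: mean |SHAP| per mark over all bins and samples,
    # with the standard deviation over the same values (assumed).
    global_mean = abs_vals.mean(axis=(0, 2))
    global_std = abs_vals.std(axis=(0, 2))

    # Panel (C) style: mean |SHAP| per mark and bin, averaged over samples.
    pos_mean = abs_vals.mean(axis=0)

    # Assumed 95% CI half-width: normal approximation, 1.96 * SEM.
    sem = abs_vals.std(axis=0, ddof=1) / np.sqrt(n_samples)
    ci_half = 1.96 * sem

    # Bin centers in bp relative to the cleavage site (window center).
    half_window = n_bins * bin_size_bp / 2
    centers = -half_window + bin_size_bp * (np.arange(n_bins) + 0.5)

    return global_mean, global_std, pos_mean, ci_half, centers

# Demo on random data standing in for real SHAP values:
# 200 samples, 3 marks, 100 bins of 10 bp (a ±500 bp window).
rng = np.random.default_rng(0)
demo = rng.normal(size=(200, 3, 100))
g_mean, g_std, p_mean, p_ci, centers = summarize_shap(demo)
```

Because every bin holds the same number of samples, the per-mark global mean equals the average of that mark's per-bin means, so panels (A) and (C) summarize the same quantity at different resolutions.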