Identification of High-Impact cis-Regulatory Mutations Using Transcription Factor Specific Random Forest Models
Fig 6
Comparison of PWMs and Random Forest classifiers on the known TAL1 insertion.
We scored the known TAL1 enhancer insertion that occurs in the Jurkat cell line [6] with MYB-specific Random Forest (M1) and PWM (M0) models. As a control, we scored all SNVs and insertions in promoters across 498 breast cancer genomes with the same MYB models to obtain a background distribution of impact scores. A) The distribution of background PRIME scores (i.e., delta Random Forest scores) and the observed PRIME score for M1, indicated by the orange arrow. B) The distribution of background PWM-delta scores (M0 model) and the observed score. C) Feature importance within the MYB model indicates that both MYB motifs and co-regulatory TF motifs contribute significantly to the classification decision; the most important co-regulatory motif is RUNX, a known co-regulatory factor of MYB. D) The known driver insertion in the TAL1 enhancer generates a gain of an H3K27Ac peak, whereas the known SNV in the TERT promoter does not. The red highlighted region indicates which samples harbor the respective cis-regulatory mutation.
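The comparison in panels A and B, an observed delta score placed against a background distribution of delta scores, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `delta_score` and `empirical_p_value` are hypothetical, the background values are synthetic random numbers rather than scores from the 498 genomes, the reference and mutated scores are made up, and the one-sided, add-one empirical p-value is a common convention that the caption does not state was used.

```python
import numpy as np


def delta_score(score_ref, score_alt):
    """Impact score of a mutation: model score of the mutated sequence
    minus model score of the reference sequence (hypothetical helper)."""
    return score_alt - score_ref


def empirical_p_value(observed, background):
    """One-sided empirical p-value for a gain: fraction of background
    delta scores at least as large as the observed one, with an add-one
    correction so the result is never exactly zero."""
    background = np.asarray(background, dtype=float)
    n_extreme = int(np.sum(background >= observed))
    return (n_extreme + 1) / (background.size + 1)


# Synthetic placeholder background, NOT data from the study.
rng = np.random.default_rng(0)
background = rng.normal(loc=0.0, scale=0.05, size=10_000)

# Made-up reference and mutated model scores for a strong gain of binding.
observed = delta_score(score_ref=0.10, score_alt=0.85)

p = empirical_p_value(observed, background)
print(f"observed delta = {observed:.2f}, empirical p = {p:.2e}")
```

The same two functions apply to either model: only the scores passed to `delta_score` change (Random Forest class probabilities for M1, PWM match scores for M0), which is what makes the two background distributions directly comparable.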