Identification of High-Impact cis-Regulatory Mutations Using Transcription Factor Specific Random Forest Models
Fig 5
Regulatory impact score on simulated substitutions.
A) Nucleotide substitutions with higher PRIME scores are under constraint. B) An example of the E2F1 promoter, in which each possible substitution is evaluated by the M0 and M1 models. The M1 model (Random Forest) identifies a 15 bp region that is highly vulnerable to mutations, while three different M0 models (using only the PWM) identify excessive numbers of false-positive substitutions, demonstrating the higher specificity of the Random Forest classifiers compared to single PWMs. C) Barplot showing an example from A): averaged phastCons scores as a function of the PRIME score threshold for the E2F4 model. Error bars represent the standard error of the mean.
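The summary shown in panel C can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `mean_sem_by_threshold`, the toy values, and the choice to select substitutions with a score greater than or equal to each threshold (rather than binning them) are assumptions made for the example.

```python
import math
from statistics import mean, stdev


def mean_sem_by_threshold(scores, conservation, thresholds):
    """Mean and standard error of conservation for substitutions scoring >= each threshold.

    Hypothetical helper: `scores` stands in for PRIME scores and `conservation`
    for phastCons scores at the substituted positions.
    """
    summary = {}
    for t in thresholds:
        selected = [c for s, c in zip(scores, conservation) if s >= t]
        if len(selected) < 2:
            # The SEM is undefined for fewer than two observations.
            summary[t] = (selected[0] if selected else None, None)
            continue
        sem = stdev(selected) / math.sqrt(len(selected))
        summary[t] = (mean(selected), sem)
    return summary


# Toy values for illustration only, not data from the study.
prime_scores = [0.1, 0.4, 0.6, 0.9]
phastcons = [0.2, 0.4, 0.6, 0.8]
result = mean_sem_by_threshold(prime_scores, phastcons, [0.0, 0.5])
```

Each entry of `result` maps a threshold to a `(mean, SEM)` pair, which corresponds to one bar and its error bar under the assumptions above.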