Identification of High-Impact cis-Regulatory Mutations Using Transcription Factor Specific Random Forest Models
Fig 3
A) Three example TFs, each with several target CRMs (NANOG and TP53) or one target CRM (MYC), illustrating feature importance in the Random Forest classifier of the M3 model. For NANOG, the co-regulatory PWMs contribute more to classification performance than the PWM of NANOG itself. For TP53, the contribution of the co-regulatory PWMs is weak, and the classification decision rests largely on the presence of strong TP53 binding sites. For the MYC model, the most important features are regulatory tracks. B) Examples of decision trees in the ensemble. C) Feature importance averaged across trees, showing the contribution of the various features to the classification decision. For example, the TCF12 and ATF2 tracks are dominant in the NANOG model; for TP53, the most relevant features are motifs of the query TF (red), and particularly important ones are represented with logos. The colored region around the dashed line shows the standard deviation of the feature importance across trees.
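The averaging in panel C can be illustrated with a minimal sketch. It assumes that each tree in the ensemble yields one importance value per feature (in scikit-learn, for instance, a fitted RandomForestClassifier exposes its trees through `estimators_`, each with a `feature_importances_` vector). The feature names and the per-tree values below are hypothetical placeholders chosen for illustration, not values from the models described here, and the population standard deviation is an assumed choice.

```python
from statistics import mean, pstdev


def summarize_importance(per_tree, feature_names):
    """Return {feature: (mean, std)} of importance across trees.

    per_tree is a list with one importance vector per tree; the
    population standard deviation (pstdev) is an assumed choice.
    """
    summary = {}
    for j, name in enumerate(feature_names):
        values = [tree[j] for tree in per_tree]
        summary[name] = (mean(values), pstdev(values))
    return summary


# Hypothetical feature names and made-up importances for three trees.
feature_names = ["query_TF_PWM", "coregulatory_PWM", "regulatory_track"]
per_tree = [
    [0.5, 0.3, 0.2],
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
]

summary = summarize_importance(per_tree, feature_names)

# Rank features by mean importance, highest first.
ranked = sorted(summary, key=lambda n: summary[n][0], reverse=True)
for name in ranked:
    m, s = summary[name]
    print(f"{name}: mean={m:.3f}, std={s:.3f}")
```

In a plot of this kind, the mean would give the central line for each feature and the standard deviation the width of the shaded band around it.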