Identification of High-Impact cis-Regulatory Mutations Using Transcription Factor Specific Random Forest Models
Fig 1
A) To identify functional CRMs we searched for significant correlations between TF ChIP-seq tracks and TF target genes using i-cisTarget [28]; and selected peaks (marked in green) that are located in 20 kb regulatory space around up- or down-regulated TF target genes. B) Feature selection was performed on the set of functional CRMs to select TF and co-regulatory PWMs and data tracks. C) The performance of each of the 45 TF models was evaluated by 5-fold cross-validation, using area under the precision-recall and receiver-operating characteristic curves. D) The 45 learned classifiers where used to identify cis-regulatory somatic mutations that have an impact on the CRM score, defining a PRIME score (Predicted Regulatory Impact of a Mutation in an Enhancer).