Identification of High-Impact cis-Regulatory Mutations Using Transcription Factor Specific Random Forest Models
Fig 2
Cross-validation performance for 45 TF models.
A) Area under the precision-recall (AuPR) and receiver operating characteristic (AuROC) curves for the different models. Mk, M1, M2, and M3 are estimated by 5-fold cross-validation. The M0 model does not use a training set; its AuROC and AuPR were obtained by varying the threshold of the PWM. B) Examples of precision-recall curves for ATF2 and BATF. Random Forest classifiers outperform PWM-based models. M3 models (using experimental data tracks) outperform M1 models (using sequence only).
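As an illustration of how AuROC and AuPR can be obtained by varying a score threshold, as described for the M0 model, the following is a minimal sketch and not the authors' implementation. The function name `threshold_sweep_auc`, the toy scores, and the toy labels are all hypothetical. AuROC is integrated with the trapezoidal rule and AuPR with a step-wise sum (average precision); the source does not state which integration scheme was used.

```python
def threshold_sweep_auc(scores, labels):
    """Return (AuROC, AuPR) by sweeping a threshold over the scores.

    scores: higher means "more likely bound" (e.g. a PWM match score).
    labels: 1 for a bound site, 0 for an unbound site.
    Hypothetical helper for illustration only.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")

    # Lowering the threshold corresponds to walking down the ranked list.
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])

    tp = fp = 0
    prev_tpr = prev_fpr = 0.0
    auroc = aupr = 0.0
    i = 0
    while i < len(ranked):
        # Sites with tied scores cross the threshold together.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        tpr = tp / n_pos            # recall / sensitivity
        fpr = fp / n_neg            # 1 - specificity
        precision = tp / (tp + fp)
        auroc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2.0  # trapezoid
        aupr += (tpr - prev_tpr) * precision                # step-wise
        prev_tpr, prev_fpr = tpr, fpr
        i = j
    return auroc, aupr


# Toy example with made-up scores (not data from the paper).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_labels = [1, 1, 0, 1, 0, 0]
auroc, aupr = threshold_sweep_auc(toy_scores, toy_labels)
print(f"AuROC={auroc:.3f} AuPR={aupr:.3f}")
```

The same function could be applied to the held-out predictions of each cross-validation fold for the trained models, since it only needs one score and one label per site.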