Imputation for transcription factor binding predictions based on deep learning
Fig 3
Comparison with gkm-SVM, PIQ, and DeepSEA.
(A) AUC comparison of TFImpute and gkm-SVM on TestSet1, TestSet2, and TestSet3. ‘Shuf cell line’ indicates that the cell lines in the corresponding test set were shuffled and that the trained TFImpute model was then applied to the shuffled dataset. Similarly, ‘Shuf TF’ indicates that the TFs were shuffled. For some regions, PIQ gives NA predictions: NA means either that no motif passes the log-probability threshold of 5 or that the region lacks DNase I signal. PIQNoNA denotes the result after removing all NAs, and PIQ denotes the result after treating NAs as no binding. To calculate the AUC, the predictions were grouped by TF. The middle bar in each box indicates the median. (B) AUC comparison based on predictions grouped by TF-cell line combination. (C) Recall rates of the different methods at an FDR of 0.05 (see Materials and methods for details). The predictions were grouped by TF. (D) AUC comparison of TFImpute on TFs appearing in both TestSet2 and TestSet3. (E) Hierarchical clustering of a subset of the TFs based on the embedding learned by TFImpute. The full clustering is shown in S3 Fig. (F) Hierarchical clustering of a subset of cell lines based on the embedding learned by TFImpute. The full clustering is shown in S4 Fig. (G) Recall rates of TFImpute and DeepSEA at different FDR cutoffs on the datasets provided by DeepSEA.
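The per-group summaries in panels A to C can be sketched as follows. This is a minimal illustration, not the authors' evaluation code. The function names (`auc`, `grouped_auc`, `recall_at_fdr`) and the toy records are hypothetical. NA predictions are represented as `None`. FDR is assumed to be the empirical fraction FP / (TP + FP) among predictions at or above a score cutoff; the paper's Materials and methods section gives the exact definition it uses.

```python
import math
import statistics
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Optional, Tuple

# One prediction: (group key, score or None for an NA prediction, 0/1 label).
# The group key can be a TF name (panels A, C) or a (TF, cell line) tuple (panel B).
Record = Tuple[Hashable, Optional[float], int]


def auc(scores: List[float], labels: List[int]) -> float:
    """ROC AUC via the Mann-Whitney U statistic; tied scores share an average rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank of the tie block
        i = j + 1
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        return float("nan")  # AUC is undefined without both classes
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def grouped_auc(records: Iterable[Record], drop_na: bool) -> Dict[Hashable, float]:
    """Per-group AUC. drop_na=True mimics 'PIQNoNA'; False treats NA as no binding ('PIQ')."""
    groups: Dict[Hashable, Tuple[List[float], List[int]]] = defaultdict(lambda: ([], []))
    for key, score, label in records:
        if score is None:
            if drop_na:
                continue  # discard regions with an NA prediction
            score = -math.inf  # NA ranks below every real score, i.e. "no binding"
        groups[key][0].append(score)
        groups[key][1].append(label)
    return {key: auc(s, y) for key, (s, y) in groups.items()}


def recall_at_fdr(scores: List[float], labels: List[int], max_fdr: float = 0.05) -> float:
    """Highest recall over score cutoffs whose empirical FDR, FP / (TP + FP), is <= max_fdr.

    Assumption: FDR is estimated from the test labels, not from a null model.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        return float("nan")
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        cutoff = pairs[i][0]
        # Admit all predictions tied at this score together, so each step is a real cutoff.
        while i < len(pairs) and pairs[i][0] == cutoff:
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        if fp / (tp + fp) <= max_fdr:
            best = max(best, tp / n_pos)
    return best


# Hypothetical toy data: TF_B has one region with an NA prediction.
records = [
    ("TF_A", 0.9, 1), ("TF_A", 0.8, 1), ("TF_A", 0.3, 0), ("TF_A", 0.1, 0),
    ("TF_B", 0.7, 1), ("TF_B", None, 1), ("TF_B", 0.2, 0), ("TF_B", 0.6, 0),
]
per_tf_nona = grouped_auc(records, drop_na=True)
per_tf_na_neg = grouped_auc(records, drop_na=False)
print(per_tf_nona, statistics.median(per_tf_nona.values()))
print(per_tf_na_neg, statistics.median(per_tf_na_neg.values()))
print(recall_at_fdr([0.9, 0.8, 0.7, 0.6, 0.5], [1, 1, 0, 1, 0]))
```

In this toy example, dropping the NA region leaves TF_B perfectly ranked. Treating it as no binding places a true binding site at the bottom of the ranking and lowers that TF's AUC. This is the kind of gap that separates PIQNoNA from PIQ. Passing (TF, cell line) tuples as group keys gives the panel B grouping instead of the per-TF grouping.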