maxATAC: Genome-scale transcription-factor binding prediction from ATAC-seq with deep neural networks
Fig 5
Protocol and cell input numbers influence the performance of TFBS predictions.
(A) Test AUPR in cell line GM12878 for 60 TFs across chromatin-accessibility experimental designs (OMNI-, sc- and standard ATAC-seq or DNase-seq protocols, with the number of input cells indicated). Grey squares indicate that no predictions were made because the TF lacks a motif. TFs are hierarchically clustered based on maxATAC performance. (B) To visualize protocol-dependent trends for each method, AUPRs were normalized per TF (row-wise) as log2(AUPR/AUPR_MEAN), independently for maxATAC and TF motif-scanning AUPRs. (C) Distribution of log2(AUPR/AUPR_MEAN) per ATAC-seq sample. Because the maxATAC models were trained on OMNI-ATAC-seq data from ~50k cells, we compared each experiment to the reference "OMNI 50k Corces" sample (red boxplot). Black lines indicate protocol-dependent performance differences relative to the reference (Student’s two-sided t-test, Bonferroni-corrected P < 0.05).
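The row-wise normalization in panel B can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the example AUPR values, and the choice to skip missing predictions (the grey squares) when computing each TF's mean are all assumptions made here.

```python
import math

def log2_ratio_to_row_mean(aupr_rows):
    """Normalize each row (one TF) as log2(AUPR / mean AUPR of that row).

    Missing predictions are given as None and stay None in the output.
    Assumption: the row mean is taken over the observed values only.
    """
    normalized = []
    for row in aupr_rows:
        observed = [v for v in row if v is not None]
        row_mean = sum(observed) / len(observed) if observed else None
        normalized.append([
            # log2 is undefined for non-positive ratios, so those stay None.
            math.log2(v / row_mean)
            if v is not None and v > 0 and row_mean
            else None
            for v in row
        ])
    return normalized

# Hypothetical AUPRs: rows are TFs, columns are experimental designs.
example = [
    [0.50, 0.25, 0.75],  # row mean is 0.5
    [0.20, None, 0.20],  # one design has no prediction for this TF
]
result = log2_ratio_to_row_mean(example)
```

A value of 0 means the experiment matches the TF's mean AUPR, and a value of -1 means the AUPR is half the mean, so protocol-dependent trends become comparable across TFs with very different absolute AUPRs.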