maxATAC: Genome-scale transcription-factor binding prediction from ATAC-seq with deep neural networks

doi:10.1371/journal.pcbi.1010863

Fig 3

The maxATAC models offer state-of-the-art TFBS prediction from ATAC-seq.

For every TF model, one cell type and two chromosomes (chr1, chr8) were held out during training to assess predictive (test) performance in a new cell type. Test performance is shown as (A) AUPR (median = 0.43) and (B) precision at 5% recall (median = 0.85). Boxplots display the median (horizontal line), interquartile range (box), 3-quartile range (whiskers) and points outside the 3-quartile range (diamonds). maxATAC model performance is compared to (C) TF motif scanning in ATAC-seq peaks and (D) TFBS prediction using the averaged ChIP-seq signal from the training cell types; each dot represents the median AUPR across train-test cell-type splits (AUPR_MEDIAN). Red dots indicate TFs with no known motifs in CIS-BP. (E) Test AUPR of maxATAC models compared to the Leopard (DNase-seq-based) model, using ATAC-seq input and maxATAC ChIP-seq gold standards for 8 cell lines and 7 TFs. maxATAC outperforms Leopard in 20 of 29 test performance comparisons. (F) Test AUPR of the maxATAC models on ATAC-seq compared to the test AUPR reported by state-of-the-art deep learning models (FactorNet, Leopard and DeepGRN) on DNase-seq. (G) Validation performance (AUPR_MEDIAN) on chr2 (training cell types) as a function of test performance (AUPR_MEDIAN) on chr1 (held-out test cell type) (n = 74; ρ_Pearson = 0.97, P < 10⁻¹⁵).
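The two test metrics in panels A and B can be illustrated with a small sketch. It assumes each genomic region (here a hypothetical bin) has a predicted binding score and a binary bound/unbound label from the ChIP-seq gold standard. AUPR is computed as step-wise average precision, and "precision at 5% recall" is taken at the first score threshold whose recall reaches 5%. Both definitions are assumptions for illustration; the caption does not specify how maxATAC interpolates the precision-recall curve, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def pr_curve(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (recall, precision) points, lowering the threshold from the top score.

    Bins with tied scores are added together as a single threshold step.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive (bound) bin")
    tp = fp = 0
    points: List[Tuple[float, float]] = []
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every bin whose score equals the current threshold.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((tp / n_pos, tp / (tp + fp)))
    return points


def aupr(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Step-wise area under the PR curve: sum of (R_n - R_{n-1}) * P_n."""
    area = 0.0
    prev_recall = 0.0
    for recall, precision in pr_curve(scores, labels):
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


def precision_at_recall(scores: Sequence[float], labels: Sequence[int],
                        target_recall: float = 0.05) -> float:
    """Precision at the first threshold whose recall reaches target_recall."""
    if not 0.0 < target_recall <= 1.0:
        raise ValueError("target_recall must be in (0, 1]")
    for recall, precision in pr_curve(scores, labels):
        if recall >= target_recall:
            return precision
    raise AssertionError("unreachable: final recall is always 1.0")


# Toy example: six bins, three of them bound in the gold standard.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 0, 1, 1, 0, 0]
print(round(aupr(scores, labels), 4))           # 0.8056 (= 29/36)
print(precision_at_recall(scores, labels))      # 1.0
```

Precision at low recall rewards a model whose highest-scoring predictions are reliable, whereas AUPR summarizes the whole ranking; this is why the two panels report quite different medians for the same models.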

doi: https://doi.org/10.1371/journal.pcbi.1010863.g003