Table 1.
Description of the benchmark datasets.
Table 2.
Data used for the semi-simulations.
Fig 1.
Benchmark of differential TF activity methods.
Ranking-based metrics for each method across each dataset. The left part of the heatmap indicates the rank of the true TF (the top-ranked one if the dataset has more than one true target); the numbers are the actual ranks, and the colors are mapped to the square root of the ranks (because differences between very high ranks are irrelevant in practice). The central part shows the network score, i.e. the AUC of the proportion of the top X TFs that are in the network of interactors of the true TF, relative to the maximal possible AUC for the dataset (see Methods and Fig Bc in S1 Appendix for details). The labels indicate the actual scores, while for the color mapping the values are relative to the maximum achieved for each dataset. Finally, the right part indicates a similar AUC-based score using motif models that are similar to (i.e. cluster together with) the true motif (see Methods). The methods are ranked based on the average of transformations of the three metrics across datasets (see Methods). For ATACseqTFEA, NA values indicate that the method repeatedly crashed on those datasets, presumably due to unsustainably high (>70GB) memory usage. For methods for which variations were tested, we show here only the original and top variant(s); all variants are available in Fig E in S1 Appendix. S1 Table contains full descriptions of the short names displayed here.
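For illustration, the network score described above could be computed along the following lines. This is a simplified sketch, not the benchmark's actual implementation; the function and variable names are hypothetical, and the maximal achievable AUC is taken as the score of a ranking that places all interactors first.

```python
import numpy as np

def network_score(ranked_tfs, interactors, max_x=None):
    """AUC of the proportion of top-X TFs that are known interactors of
    the true TF, relative to the maximal achievable AUC.
    ranked_tfs: TFs ordered by the method's significance.
    interactors: set of TFs in the true TF's interaction network.
    (Illustrative sketch; names are hypothetical.)"""
    n = len(ranked_tfs) if max_x is None else max_x
    hits = np.cumsum([tf in interactors for tf in ranked_tfs[:n]])
    prop = hits / np.arange(1, n + 1)      # proportion of interactors among top X
    auc = prop.mean()                      # area under the proportion-vs-X curve
    # best possible curve: all interactors ranked first
    k = min(len(interactors), n)
    best_hits = np.minimum(np.arange(1, n + 1), k)
    best = (best_hits / np.arange(1, n + 1)).mean()
    return auc / best
```

A perfect ranking (interactors on top) yields a score of 1, and the score decreases as interactors are pushed down the ranking.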
Fig 2.
Further benchmark results, and improvements over native chromVAR.
A. Sensitivity and specificity (considering interactors and members of the same archetypes as positives) of the top alternative per approach. The methods are coloured by family (sample-wise methods in pink, logFC-based methods in blue). B. Mean (and standard error of the mean) running time (elapsed, as well as total CPU time when multithreading) across datasets of the top alternative per approach (the x axis is square-root-transformed for visibility). Note that, because it was performed separately for standardization purposes, the running time includes neither the generation of the peak-count matrix nor, except for monaLisa, the motif scanning. monaLisa is therefore disadvantaged in this comparison, and these times should be interpreted as rough estimates. C. Comparison of the rank of the true motifs obtained by a limma analysis on the chromVAR deviations, versus using chromVAR’s native differentialDeviations. D. Comparison of the rank of the true motifs obtained by a limma analysis on the normalized chromVAR z-scores, versus on the chromVAR deviations.
Fig 3.
Impact of using GC smooth quantile normalization.
A. Comparison, across key methods, of the ranks of the true motif when using GC smooth quantile normalization instead of standard (TMM) normalization to calculate per-peak log(fold-change). B. Ranks of the true motif with GC smooth quantile normalization, compared to those of the two best-performing methods from the earlier benchmark. C. Comparison of the precision and recall (at adjusted p-value <= 0.05) when using GC smooth quantile normalization (in blue), compared to the two best-performing methods (in pink).
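The core idea of GC-aware quantile normalization can be sketched as follows: peaks are grouped into bins of similar GC content, and counts are quantile-normalized within each bin so that GC-dependent technical biases do not propagate into the per-peak fold changes. This is a deliberately simplified Python sketch; the actual smooth quantile normalization interpolates between a global and a bin-wise reference distribution, which is not reproduced here.

```python
import numpy as np

def gc_quantile_normalize(counts, gc, n_bins=10):
    """Quantile-normalize a peaks-x-samples count matrix within bins of
    similar GC content (simplified illustration; the published method
    smooths between a global and a bin-wise reference)."""
    counts = np.asarray(counts, dtype=float)
    out = np.empty_like(counts)
    # assign each peak to a GC-content bin (quantile-based edges)
    edges = np.quantile(gc, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(gc, edges)
    for b in np.unique(bins):
        rows = bins == b
        sub = counts[rows]
        order = np.argsort(sub, axis=0)
        ref = np.sort(sub, axis=0).mean(axis=1)  # bin-wise reference distribution
        normed = np.empty_like(sub)
        for j in range(sub.shape[1]):
            normed[order[:, j], j] = ref         # map each sample onto the reference
        out[rows] = normed
    return out
```

After normalization, all samples within a GC bin share the same empirical distribution, so differences between conditions reflect rank changes rather than GC-driven distributional shifts.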
Fig 4.
Semi-simulations and impact of perturbation strength.
A. Strategy for sampling ATAC-seq fragments: ChIP-seq peaks of the TF to be perturbed are overlapped with the baseline ATAC-seq profiles (1); per-peak fold changes are drawn from a reference distribution (2) and used to sub-sample the overlapping fragments of the ATAC-seq profiles of one of the two groups (3). B. Two-group designs were simulated by downsampling ATAC-seq fragments in one group according to the obtained fold changes of the overlapping peaks. Perturbation strength was varied by multiplying the fold changes by different factors (0, 0.25, 0.5, 1, 3), hereafter referred to as the perturbation strength. Further technical variation was introduced in some datasets by varying GC content and fragment length distributions. C. Performance of a subset of the top-performing methods on the semi-simulated datasets with varying perturbation strength. Numbers in the line plot represent the rank of the perturbed TF as detected by the respective method. Colors of the boxes signify the corresponding adjusted p-value obtained in the differential activity analysis. Bold vs plain text indicates whether the adjusted p-value was significant (adjusted p-value <= 0.05).
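The downsampling step of this semi-simulation strategy can be sketched as follows. This is an illustrative Python sketch, not the actual simulation code: the fragment representation, function name, and keep-probability formula (2 to the power of the scaled log2 fold change, capped at 1 so that fragments are only removed) are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

def perturb_fragments(fragments, peak_log2fc, strength=1.0):
    """Down-sample ATAC-seq fragments overlapping ChIP-seq peaks of the
    perturbed TF (illustrative sketch; names are hypothetical).
    fragments: list of (fragment_id, peak_id) pairs, with peak_id None
        when the fragment overlaps no ChIP-seq peak.
    peak_log2fc: dict mapping peak_id -> reference log2 fold change.
    strength: multiplier applied to the fold changes (0, 0.25, ...)."""
    kept = []
    for frag_id, peak_id in fragments:
        if peak_id is None:
            kept.append(frag_id)  # background fragment, left untouched
            continue
        # keep probability derived from the scaled fold change, capped at 1
        p = min(1.0, 2.0 ** (strength * peak_log2fc[peak_id]))
        if rng.random() < p:
            kept.append(frag_id)
    return kept
```

With strength 0 the keep probability is 1 and the data are unperturbed, providing a negative control; larger strengths remove proportionally more fragments from peaks with negative reference fold changes.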
Fig 5.
Characterizing the impacts of a TRAFTAC against NFkB after TNF-α activation in HEK cells.
A. Results of the ‘chromVAR(z)>limma’ analysis, highlighting the motifs with the strongest changes between conditions and high absolute deviations. The two overlapping yellow circles highlight NFKB1 and NFKB2. B. Comparison of the per-motif p-values assigned by monaLisa (Simes p-value) and chromVAR, showing agreement only on the strongest differences. C. Representative examples of motifs called as differentially accessible by at least one method, showing side by side monaLisa’s enrichment scores across logFC bins, as well as the per-sample chromVAR z-scores and z-scores of the MLM t-values.
Fig 6.
Summary of the benchmark on variants of the top methods.
A. Sensitivity (left and center) and precision (right) of each method across real (left) and semi-simulated (center) datasets. The size of the points indicates whether the method found the true motif significant (adjusted p-value <= 0.05). For real datasets, the color indicates the rank of the true motif, while for semi-simulated datasets it indicates the perturbation strength at which the true motif was found significant. B. Sensitivity and specificity (considering interactors as positives) of the top methods, aggregating both real and semi-simulated datasets. The methods are coloured by family (sample-wise methods in blue, logFC-based methods in pink).