iRegulon: From a Gene List to a Gene Regulatory Network Using Large Motif and Track Collections
Figure 2
Evaluation of iRegulon and comparison to other methods.
The TF recovery (y-axis) is the fraction of TFs correctly detected among all TFs that can be associated with at least one motif from our library.
A. Positive sets consist of the top 200 genes ranked by the maximum signal value of the ChIP-Seq peak in the corresponding search space. Control sets are ENCODE negatives (genes without a ChIP-Seq peak), TF neighborhoods (TFNB; all TFs located within 5 Mb of a query TF), and random signatures (RND). The color (from red to yellow) and order of the stacked bars indicate the number of times the queried TF was identified at the 1st rank (top1), 2nd rank (top2), 3rd rank (top3), 4th rank (top4), 5th rank (top5), and 6th to 10th rank (top10). White indicates TFs that were detected (motif enrichment ≥3) but at a rank >10.
B. Positive sets are mixed with negative genes (noise) at levels from 0% to 100%. The lines show the sensitivity (Sn), specificity (Sp), and precision, or positive predictive value (PPV), of target gene selection.
C. Adding motif2TF layers increases performance. Recovery for ENCODE signatures and their control sets under different motif2TF parameters: 1) motif collection effect (J, T, A bar charts); 2) homology effect, using a threshold on percent identity (Identity%) for all motifs (A+O bar charts); 3) motif similarity effect, using a threshold on the p-value (A+S bar charts); and their combination (A+O+S). J, only JASPAR motifs; T, only TRANSFAC Pro motifs; A, all motifs from JASPAR, TRANSFAC Pro, and other databases; A+O, all motifs plus orthology; A+S, all motifs plus similarity; A+O+S, all motifs plus orthology and similarity. Blue bars show results on the ENCODE sets and grey bars on the control sets.
D. Tool comparison on a benchmark of 30 gene sets, each built from the top 200 target genes ranked by ChIP peak occurrences in the 20 kb regulatory region, for 30 TFs (selected from FactorBook because their canonical motif is the top enriched motif in their own ChIP peaks).
The bars show the number of times the queried TF was identified at top1 (red) and top5 (yellow). Dashed boxes show the top5 recoveries obtained when similar motifs are manually re-associated with the query TF. Default parameters were used, but, where possible, they were adjusted to use the tss-centered-20 kb regions. See also Figures S2–S3 and Table S1.
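The measures plotted in panels A and B can be summarized in a short sketch. This is not iRegulon code: the function names (`recovery_bucket`, `tf_recovery`, `selection_metrics`) and input formats (a ranked list of predicted TF names per query, already filtered to enrichment ≥3; gene sets as Python sets) are hypothetical and chosen only to illustrate how the rank categories, TF recovery, and Sn/Sp/PPV are defined in the caption.

```python
# Hypothetical sketch of the evaluation measures described in Figure 2.
# Function names and input formats are illustrative assumptions, not iRegulon's API.

# Stacked-bar categories of panel A: (lowest rank, highest rank, label).
RANK_BUCKETS = [
    (1, 1, "top1"), (2, 2, "top2"), (3, 3, "top3"),
    (4, 4, "top4"), (5, 5, "top5"), (6, 10, "top10"),
]


def recovery_bucket(query_tf, ranked_tfs):
    """Return the panel-A category for query_tf, or None if it is not detected.

    ranked_tfs: predicted TFs ordered by motif enrichment, assumed to be
    already filtered to the detection threshold (enrichment >= 3).
    """
    if query_tf not in ranked_tfs:
        return None
    rank = ranked_tfs.index(query_tf) + 1  # 1-based rank of first occurrence
    for low, high, label in RANK_BUCKETS:
        if low <= rank <= high:
            return label
    return ">10"  # detected, but ranked beyond 10th (white in panel A)


def tf_recovery(results):
    """Fraction of queried TFs detected at any rank (the TF recovery y-axis).

    results: list of (query_tf, ranked_tfs) pairs, restricted to TFs that
    can be associated with at least one motif in the library.
    """
    detected = sum(1 for tf, ranked in results
                   if recovery_bucket(tf, ranked) is not None)
    return detected / len(results)


def selection_metrics(selected, positives, universe):
    """Sensitivity, specificity and PPV of a target-gene selection (panel B).

    selected: genes predicted as targets; positives: true target genes;
    universe: all genes in the input set (positives plus added noise genes).
    """
    selected, positives, universe = set(selected), set(positives), set(universe)
    negatives = universe - positives
    tp = len(selected & positives)   # true targets that were selected
    fp = len(selected & negatives)   # noise genes that were selected
    fn = len(positives - selected)   # true targets that were missed
    tn = len(negatives - selected)   # noise genes correctly left out
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return sn, sp, ppv


if __name__ == "__main__":
    print(recovery_bucket("TAL1", ["GATA1", "TAL1", "KLF1"]))  # top2
    print(tf_recovery([("TAL1", ["GATA1", "TAL1"]), ("MYC", ["MAX"])]))  # 0.5
    genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
    # Sn = 2/3, Sp = 1.0, PPV = 1.0 for this toy selection
    print(selection_metrics(["g1", "g2"], ["g1", "g2", "g3"], genes))
```

In this sketch a TF counts toward recovery whenever it is detected, including at ranks beyond 10, which mirrors the caption's use of white bar segments for detected TFs with rank >10.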