iRegulon: From a Gene List to a Gene Regulatory Network Using Large Motif and Track Collections

doi:10.1371/journal.pcbi.1003731

Figure 1.

Regulon detection by rank-based motif discovery and motif2TF.

Motif enrichment in iRegulon is measured with a ranking-and-recovery procedure that uses a large collection of position weight matrices (PWMs). In the ranking step (A), all human genes are ranked for each motif by scoring for homotypic motif clusters across ten vertebrate species. In the recovery step (B), each of these gene rankings is tested against the set of input genes by calculating the Area Under the cumulative Recovery Curve (AUC, in pink). The example shown is for the top enriched motif, motif M2. The AUC is converted to a normalized enrichment score (NES) based on the AUC scores of all motif rankings (distribution shown as inset). A high NES (≥3.0) indicates a motif that recovers a large proportion of the input genes within the top of its ranking. In parallel, the leading edge of the recovery curve is used to determine the optimal subset of genes that are likely controlled by this motif. In the last step (C), Motif2TF associates the candidate motif with one or more TFs by finding possible paths from the motif to a TF in a motif-TF network based on direct evidence, orthology, and motif-motif similarity. The enriched TF can itself be among the input genes (e.g. TG5, encoding TF2). See also Materials and Methods and Figures S1, S4.
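The recovery step and the NES normalization described in this caption can be illustrated with a minimal Python sketch. It is not the iRegulon implementation: the function names, the `rank_cutoff` parameter, the scaling of the AUC to the range 0 to 1, and the use of a z-score over all motif AUCs are assumptions made for illustration.

```python
from statistics import mean, pstdev


def recovery_auc(ranking, input_genes, rank_cutoff):
    """Area under the cumulative recovery curve for one motif ranking.

    ranking      -- genes ordered from best to worst score for the motif
    input_genes  -- the query gene set
    rank_cutoff  -- hypothetical parameter: only the top of the ranking is used
    The result is scaled to [0, 1] (an assumption of this sketch).
    """
    targets = set(input_genes)
    if not targets or rank_cutoff <= 0:
        raise ValueError("need a non-empty gene set and a positive cutoff")
    recovered = 0
    area = 0
    for gene in ranking[:rank_cutoff]:
        if gene in targets:
            recovered += 1
        area += recovered  # height of the curve at this rank
    return area / (rank_cutoff * len(targets))


def normalized_enrichment_scores(auc_by_motif):
    """Assumed normalization: z-score of each AUC against all motif AUCs."""
    values = list(auc_by_motif.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("all AUC values are identical")
    return {motif: (auc - mu) / sigma for motif, auc in auc_by_motif.items()}
```

In this sketch, a motif whose ranking places the input genes near the top accumulates area early and therefore receives a higher AUC, and a higher NES, than a motif whose ranking places them near the bottom.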

Table 1.

Description of the motif and track collections used.

Figure 2.

Evaluation of iRegulon and comparison to other methods.

The TF recovery (y-axis) corresponds to the fraction of TFs correctly detected among all TFs for which a motif from our library can be associated. A. Positive sets consist of the top 200 genes ranked by the maximum signal value of the ChIP-Seq peak in the corresponding search space. Control sets are negatives from ENCODE (genes without a ChIP-Seq peak); TF neighborhoods (TFNB; all TFs within 5 Mb around a query TF); and random signatures (RND). The color (from red to yellow) and order of the stacked bars indicate the number of times the queried TF was identified at the 1st rank (top1), 2nd rank (top2), 3rd rank (top3), 4th rank (top4), 5th rank (top5), and 6th to 10th rank (top10). White indicates the number of TFs detected (motif enrichment ≥3) but with rank >10. B. Positives are mixed with negative genes (noise), from 0% to 100% noise. The lines represent the sensitivity (Sn), specificity (Sp), and precision or positive predictive value (PPV) of target gene selection. C. The layers of motif2TF increase the performance. Recovery for ENCODE signatures and their control sets using different motif2TF parameters: 1) motif collection effect (J, T, A bar charts), 2) homology effect using a threshold on Identity% for all motifs (A+O bar charts), 3) motif similarity effect using a threshold on the p-value (A+S bar charts), and combinations (A+O+S). Only Jaspar motifs (J); only Transfac Pro (T); all motifs from Jaspar, Transfac Pro, and other databases (A); all motifs+Orthology (A+O); and all motifs+Orthology+Similarity (A+O+S). Blue indicates analyses done on ENCODE sets and grey indicates analyses done on the control sets. D. Tool comparison using a benchmark of 30 gene sets constructed as the top 200 target genes based on ChIP peak occurrences in the 20 kb regulatory region for 30 TFs (these TFs were selected from FactorBook as having their canonical motif as top enriched in the actual ChIP peaks). The number of times the queried TF was identified in the top1 (red) and top5 (yellow) is recorded. The dashed boxes represent top 5 recoveries if similar motifs are manually re-associated to the query TF. Default parameters were used but, where possible, they were adjusted to use the TSS-centered 20 kb regions. See also Figures S2–S3 and Table S1.
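The three target-selection measures plotted in panel B follow the standard confusion-matrix definitions. The Python sketch below is only an illustration of those definitions; the function name and the representation of genes as sets are assumptions, not part of the benchmark code.

```python
def selection_metrics(predicted, true_targets, universe):
    """Sensitivity, specificity and PPV of a predicted target set.

    predicted     -- genes selected as targets
    true_targets  -- the positive genes
    universe      -- all genes considered (positives plus noise genes)
    """
    universe = set(universe)
    predicted = set(predicted) & universe
    positives = set(true_targets) & universe

    tp = len(predicted & positives)             # true positives
    fp = len(predicted - positives)             # false positives
    fn = len(positives - predicted)             # false negatives
    tn = len(universe - predicted - positives)  # true negatives

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as None.
        return numerator / denominator if denominator else None

    return {
        "Sn": ratio(tp, tp + fn),
        "Sp": ratio(tn, tn + fp),
        "PPV": ratio(tp, tp + fp),
    }
```

For example, with ten genes of which four are true targets, selecting three true targets and one noise gene gives Sn = 0.75, Sp = 5/6 and PPV = 0.75.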

Figure 3.

Using iRegulon to map a p53-dependent gene regulatory network.

A. MCF-7 breast cancer cells were treated with Nutlin-3a to stabilize p53, followed by RNA-Seq after 24 h. iRegulon results show p53 as the top regulator in a set of 801 up-regulated genes, represented by 6 significantly enriched motifs and 307 predicted direct targets. The top regulator in the set of down-regulated genes is E2F, with 653/790 predicted direct targets. B. Regulatory network for up-regulated target genes, showing the overlap between the p53 regulon and the regulons of predicted co-factors (AP-1, NF-Y, FOX), and regulatory network for down-regulated target genes, showing a strong overlap between the predicted E2F and NF-Y regulons. Targets are shown as grey circle nodes and TFs as black hexagon nodes. Regulons for each TF are represented by different edge colours. See also Tables S2–S5.

Figure 4.

Validation of the p53 regulon by ChIP-Seq.

A. Integrative Genomics Viewer (IGV) [131] screenshot for CDKN1A, a known p53 target gene, showing up-regulation by RNA-seq (red arrowhead) and ChIP peaks in the upstream region (green and blue arrowheads). IGV is free software under the GNU Lesser General Public License, version 2.1 (LGPL-2.1). B. Gene Set Enrichment Analysis (GSEA), with all genes in the genome ranked along the x-axis according to their maximum ChIP-Seq peak (20 kb around the TSS). The p53 targets (green curve) show higher enrichment than the total set of up-regulated genes (blue curve), approaching the previously known curated targets (red curve), while the non-predicted p53 targets (magenta curve) and the set of down-regulated genes (cyan curve) show no enrichment. The initial two steps in the magenta curve represent two false negative predictions of iRegulon (they fall just below the optimal cutoff), namely PLK3 and DDB2, which are up-regulated and have a ChIP peak. P-values in the legend are calculated with the hypergeometric formula applied to the leading edge determined by GSEA. C. Comparison between annotated up-regulated p53 targets and p53 targets predicted by iRegulon and ChIP-Seq, indicating the number of previously known p53 targets. See also Figure S6.
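The hypergeometric p-value mentioned for panel B can be computed as an upper-tail probability. The sketch below shows only the general formula in Python. How its four parameters map onto the GSEA leading edge is an assumption here (for instance, `draws` as the number of genes in the leading edge and `observed` as the gene-set members among them); the caption does not specify this mapping.

```python
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) for a hypergeometric variable X.

    population -- total number of ranked genes
    successes  -- genes in the population that belong to the gene set
    draws      -- genes taken from the top of the ranking (assumed: leading edge)
    observed   -- gene-set members found among the drawn genes
    """
    if not (0 <= successes <= population and 0 <= draws <= population):
        raise ValueError("successes and draws must lie within the population")
    if observed < 0:
        raise ValueError("observed must be non-negative")
    upper = min(successes, draws)
    total = comb(population, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(observed, upper + 1)
    )
    return favourable / total
```

A small p-value under this formula means that the top of the ChIP-Seq ranking contains more members of the gene set than expected by chance.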

Figure 5.

Validation of p53 target genes and target CRMs.

A. Workflow to generate meta-regulons. Meta-regulons can be obtained directly via the iRegulon Cytoscape plugin. B. Direct targets of p53 in MCF-7 cells. All genes are significantly up-regulated by p53, are predicted as p53 targets by motif discovery in iRegulon, and have a significant ChIP peak. In addition, genes in the grey shaded inner circle are part of the p53 meta-regulon, meaning that they are also found as p53 targets across cancer signatures. C. Four new p53 target genes are presented in detail. D. Relative mRNA expression levels of p53 target genes before (−) and 24 h after stimulation with 10 µM Nutlin-3a (N) or after a 1 h pulse of 5 µM Doxorubicin (D). Expression is shown relative to the non-treated control and normalized to optimal reference genes for each cell type, assessed by GeNorm [130]. Error bars show the standard error of the mean (SEM) of 3 replicates. E. Enhancer-reporter assays of four predicted p53 target CRMs, after transfection into wild-type and p53 knock-down MCF-7 cells, before and after induction with Nutlin-3a (5 µM). Error bars represent the SEM of 5 replicates. See also Figures S7–S8 and Tables S4, S6.

Figure 6.

Combined analysis using 10K motifs and 1K ChIP-Seq tracks.

A. Two ranking databases were made using 9713 motifs and 1118 ChIP-Seq tracks. The ChIP-Seq tracks consisted of all ENCODE and Taipale ChIP-Seq data against TFs, and the p53 ChIP-Seq track generated in this study. B. AUC distributions for ChIP-Seq and motif rankings, using the p53 signature as input. C. The actual recovery curve for the p53 motif and track. The shaded area indicates the AUC. D. Top enriched ChIP tracks and motifs on the up- and down-regulated gene sets (NES>3, except for the RFX5 motif, which was detected with NES = 2.82 (b)). (a) Predicted targets are shown for both enriched tracks and motifs, respectively. E. Functional categories found enriched for predicted co-factors of p53. The annotation of p53-shared targets is shown in the inner circle, while the annotation of non-shared targets (for example, AP-1 targets but not p53 targets) is shown in the outer circle. The co-factors shown here are those found by both motif and track enrichment (see also Table S7).
