Fig 1.
A flowchart of ChIP-GSM for TF module inference.
Given ChIP-seq data of multiple TFs and candidate genomic regions, ChIP-GSM learns a mixture model of Power-Law (for binding events) and Gamma (for non-binding events) distributions that best explains the read counts in TF-bound and background regions. ChIP-GSM’s Gibbs sampler iteratively samples TF modules for each region until convergence toward a posterior probability distribution of modules for all regions. Using logistic regression, ChIP-GSM correlates TF module binding likelihoods at individual regions with experimentally measured regulatory activities to systematically predict activities for each region with TF module regulation.
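As a rough illustration of the mixture idea in this legend, the sketch below computes the posterior probability that a single region is bound, given its read count, under a two-component mixture of a power-law density (binding) and a Gamma density (background). It is not ChIP-GSM's implementation. The function names and every parameter value (`prior_bound`, `alpha`, `x_min`, `shape`, `scale`) are assumptions chosen for illustration, and the full method additionally samples TF modules jointly with a Gibbs sampler, which is not shown here.

```python
import math


def power_law_pdf(x, alpha, x_min):
    """Continuous power-law density, defined for x >= x_min and alpha > 1."""
    if x < x_min:
        return 0.0
    return (alpha - 1.0) / x_min * (x / x_min) ** (-alpha)


def gamma_pdf(x, shape, scale):
    """Gamma density for x > 0, computed in log space for stability."""
    if x <= 0:
        return 0.0
    log_p = ((shape - 1.0) * math.log(x) - x / scale
             - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_p)


def binding_posterior(x, prior_bound, alpha, x_min, shape, scale):
    """P(bound | read count x) under the two-component mixture."""
    bound = prior_bound * power_law_pdf(x, alpha, x_min)
    background = (1.0 - prior_bound) * gamma_pdf(x, shape, scale)
    total = bound + background
    return bound / total if total > 0 else 0.0


# Hypothetical parameters, for illustration only (not fitted to any data).
params = dict(prior_bound=0.1, alpha=2.5, x_min=5.0, shape=2.0, scale=3.0)
for count in (2, 10, 50, 200):
    print(count, round(binding_posterior(count, **params), 3))
```

With these assumed parameters, low read counts are attributed to background and high read counts to binding; in a real analysis the parameters would be learned from the data rather than fixed by hand.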
Fig 2.
Ability of ChIP-GSM and competing methods to infer TF modules from realistically simulated ChIP-seq data.
We simulate ChIP-seq read counts for 100,000 regions and assess the accuracy of module inference by applying each method to a low-difficulty case (Case 1, four TFs), a medium-difficulty case (Case 2, seven TFs), and a high-difficulty case (Case 3, eighteen TFs). (A) F-measure of each method for module inference across all regions; (B) F-measure of each method for regions with at least one weak binding event. ChIP-GSM outperforms the competing methods, especially when there are many TFs with weak binding events.
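For readers unfamiliar with the metric, the F-measure is the harmonic mean of precision and recall. The sketch below computes it over per-region TF sets, counting each TF-region binding call as one prediction. How the evaluation in the paper counts a correct module call is not stated in this legend, so this counting scheme, the function name `f_measure`, and the toy data are assumptions.

```python
def f_measure(true_modules, predicted_modules):
    """F-measure over TF-region binding calls.

    Each argument is a list with one set of TF names per region.
    """
    tp = fp = fn = 0
    for truth, pred in zip(true_modules, predicted_modules):
        tp += len(truth & pred)   # TFs correctly called as bound
        fp += len(pred - truth)   # TFs called as bound but not truly bound
        fn += len(truth - pred)   # truly bound TFs that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example with hypothetical TF names.
truth = [{"TF1", "TF2"}, {"TF3"}, set()]
predicted = [{"TF1"}, {"TF3", "TF4"}, set()]
score = f_measure(truth, predicted)
print(score)
```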
Fig 3.
ChIP-GSM-inferred TF modules for enhancer and promoter regions.
The number of modules functioning at enhancer or promoter regions in (A) MCF-7 cells and (B) K562 cells. Module abundance in (C) MCF-7 cells and (D) K562 cells reveals that region-specific modules can be as strong as common modules functioning in both enhancer and promoter regions.
Fig 4.
Improved ChIP-GSM prediction of cell type-specific active enhancers and promoters.
(A) and (C) show the F-measure of ChIP-GSM on the 20% held-out labelled enhancers or promoters. (B) and (D) show boxplots of the F-measures of ChIP-GSM and three comparable methods across all cell types.
Fig 5.
ChIP-GSM-predicted active regions are significantly enriched with epigenetic markers and significantly correlated with 3D chromatin interactions.
(A) The top 10% of predicted enhancers are significantly enriched with marker ChIP-seq peaks of H3K4me1 and H3K27ac but not H3K4me3. (B) and (C) The ChIP-GSM-predicted enhancer activities are significantly correlated with ChIA-PET 3D chromatin interactions in MCF-7 and K562 cells, respectively. (D) The top 10% of predicted promoters are significantly enriched with marker peaks of H3K4me3 but not H3K4me1 or H3K27ac. (E) and (F) The ChIP-GSM-predicted promoter activities are significantly correlated with ChIA-PET 3D chromatin interactions in MCF-7 and K562 cells, respectively.
Fig 6.
ChIP-GSM-identified TF modules at the gene promoter regions of K562 cells.
(A) Eight groups of modules identified by ChIP-GSM functioning at gene promoter regions in leukemia K562 cells (TF modules are defined in S4 Fig); (B) pairwise mRNA co-expression of the TFs in each group; (C) selected modules whose target genes are significantly enriched (hypergeometric p-value < 0.001) in activated genes, as shown in (D). Each color label or box represents a unique module group.
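The hypergeometric enrichment test mentioned in (C) can be sketched as an upper-tail probability: given a gene universe, a set of activated genes, and a module's target genes, it asks how likely an overlap at least as large as the observed one would be by chance. The function name, argument names, and numbers below are hypothetical; the gene universe and any multiple-testing correction used in the paper are not specified in this legend.

```python
from math import comb


def hypergeom_enrichment_p(n_genes, n_activated, n_targets, n_overlap):
    """Upper-tail hypergeometric p-value P(X >= n_overlap).

    n_genes:     size of the gene universe
    n_activated: number of activated genes in the universe
    n_targets:   number of target genes of the module
    n_overlap:   number of targets that are also activated
    """
    upper = min(n_activated, n_targets)
    total = comb(n_genes, n_targets)
    tail = sum(
        comb(n_activated, k) * comb(n_genes - n_activated, n_targets - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts, for illustration only.
p_value = hypergeom_enrichment_p(
    n_genes=20000, n_activated=500, n_targets=100, n_overlap=15
)
print(p_value, p_value < 0.001)
```

Exact integer binomial coefficients are used here to avoid a dependency; for large screens a library routine such as a survival function of the hypergeometric distribution would be the more usual choice.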