Identification of High-Impact cis-Regulatory Mutations Using Transcription Factor Specific Random Forest Models
Fig 8
Scoring cis-regulatory variants in the HeLa cell line.
A) Scatter plot of PRIME scores (45 M1 models) for heterozygous SNVs in the HeLa cell line versus z-scores of H3K27Ac peak scores (the higher the z-score, the more exclusive the H3K27Ac signal is to HeLa compared to 108 other samples). The arrow indicates an example SNV that generates a de novo JUN binding site (shown in C-D). B) For each TF model, using high-scoring SNVs that fall in acetylation peaks, we plotted the fractions of gains and losses among variants in dbSNP (polymorphisms) versus variants not in dbSNP (possibly somatic mutations). Oncogenic TFs that are important for HeLa, namely MYC, E2F7, JUND, and STAT1, have more gains than losses, specifically among variants not in dbSNP. Conversely, YY1, a known repressor related to cancer, has almost no gains among non-dbSNP variants, while its dbSNP variants show an almost equal number of gains and losses. C) The H3K27Ac signal around the SNP that is predicted to generate a gain in JUN binding (PRIME = 0.21; z-score = 16.28) indeed shows moderate exclusivity of H3K27Ac to HeLa. D) This position shows allele-specific binding of JUN: ChIP-seq reads are found only for the variant allele that causes the gain of a JUN binding site.
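The two quantities summarized in panels A and B can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the caption does not state: that the z-score compares the HeLa peak score against the mean and standard deviation of the other samples, that a positive PRIME score denotes a gain and a negative score a loss, and that "high-scoring" means an absolute score above a cutoff. The function names `exclusivity_z` and `gain_loss_fractions`, the `threshold` parameter, and all example values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def exclusivity_z(target, others):
    """z-score of one sample's peak score relative to the other samples.

    Assumption: the reference distribution is the set of other samples only.
    """
    sd = pstdev(others)
    if sd == 0:
        return 0.0  # no spread among the other samples; z-score is undefined
    return (target - mean(others)) / sd


def gain_loss_fractions(snvs, threshold):
    """Tally gains and losses per (TF, dbSNP status).

    snvs: iterable of (tf, score, in_dbsnp) tuples.
    Assumption: score > 0 is a gain, score < 0 is a loss, and only SNVs
    with |score| >= threshold count as high-scoring.
    Returns {(tf, in_dbsnp): (gain_fraction, loss_fraction)}.
    """
    counts = defaultdict(lambda: [0, 0])  # [gains, losses]
    for tf, score, in_dbsnp in snvs:
        if abs(score) < threshold:
            continue
        counts[(tf, in_dbsnp)][0 if score > 0 else 1] += 1
    return {
        key: (gains / (gains + losses), losses / (gains + losses))
        for key, (gains, losses) in counts.items()
    }


# Toy data, invented for illustration only.
toy_snvs = [
    ("JUN", 0.30, False),
    ("JUN", 0.25, False),
    ("JUN", -0.40, False),
    ("JUN", 0.05, False),   # below the cutoff, ignored
    ("YY1", -0.30, True),
    ("YY1", 0.30, True),
]
fractions = gain_loss_fractions(toy_snvs, threshold=0.2)
```

With this toy input, the non-dbSNP JUN variants give two gains and one loss, and the dbSNP YY1 variants split evenly. Any real analysis would need the actual cutoff and sign convention used for the PRIME scores.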