Systematic identification and characterization of regulatory elements derived from human endogenous retroviruses
The interactions were extracted from the pcHi-C dataset of GM12878 cells [54, 55]. Results from unique-read TFBSs are shown. A) Proportions of HERV/LTR copies overlapping promoter-interacting regions, shown separately for all HERV/LTRs, HERV/LTRs with HERV-TFBSs, and HERV/LTRs with HSREs. B) Transcription levels (log10 (RPKM+1)) of protein-coding genes grouped by the number of HERV-TFBSs interacting with each gene. Genes were divided into five categories (0, 1, 2–5, 6–10, and >10 interacting HERV-TFBSs), which contained 13,265, 1,179, 1,946, 822, and 1,639 genes, respectively. P values were calculated using the Mann-Whitney U test and adjusted for multiple testing using the Benjamini-Hochberg (BH) method. C) Word cloud of HERV/LTR types enriched in the interacting regions. Word sizes are proportional to −log10 (p value) calculated using Fisher's exact test, and word colors indicate HERV/LTR families. D) Hi-C-based GO enrichment analysis using the set of all HERV-TFBSs in GM12878 cells, excluding HERV-TFBSs identified in cells treated under specific conditions (e.g., interferon supplementation). GO terms were summarized using REVIGO, and terms with fold enrichment scores >2 are shown.
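The statistics named in panels B and C can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (tfbs_bin, log_expression, bh_adjust, fisher_enrichment_p) are hypothetical, the one-sided ("greater") alternative for Fisher's exact test is an assumption because the legend does not state sidedness, and the per-comparison Mann-Whitney U p values are assumed to come from a statistics library (for example scipy.stats.mannwhitneyu) before BH adjustment.

```python
import math
from typing import List, Sequence


def tfbs_bin(n_interacting: int) -> str:
    """Hypothetical helper: assign a gene to one of the panel B categories
    (0, 1, 2-5, 6-10, >10) by its number of interacting HERV-TFBSs."""
    if n_interacting < 0:
        raise ValueError("count must be non-negative")
    if n_interacting == 0:
        return "0"
    if n_interacting == 1:
        return "1"
    if n_interacting <= 5:
        return "2-5"
    if n_interacting <= 10:
        return "6-10"
    return ">10"


def log_expression(rpkm: float) -> float:
    """Transcription level on the panel B scale: log10(RPKM + 1)."""
    return math.log10(rpkm + 1.0)


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def fisher_enrichment_p(a: int, b: int, c: int, d: int) -> float:
    """One-sided (greater) Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout: a = copies of one HERV/LTR type in interacting regions,
    b = copies of that type outside them, c and d = the same for all other types.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = math.comb(n, col1)
    p = 0.0
    # Sum hypergeometric probabilities of tables at least as extreme as observed.
    for x in range(a, min(row1, col1) + 1):
        p += math.comb(row1, x) * math.comb(n - row1, col1 - x) / denom
    return min(p, 1.0)
```

For example, bh_adjust([0.01, 0.04, 0.03, 0.5]) gives approximately [0.04, 0.0533, 0.0533, 0.5], and the −log10 of a fisher_enrichment_p value would set the word size in panel C under the assumptions above.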