Systematic identification and characterization of regulatory elements derived from human endogenous retroviruses
Results from all-read TFBSs are shown. A) and B) Number of HERV-TFBSs mapped on each consensus position of LTR7. Results for NANOG and EOMES are shown in (A), and those for FOXA1, SOX2, POU5F1, FOXA2, and GATA6 are shown in (B). The X-axis indicates the nucleotide position in the consensus sequence of LTR7. The Y-axis indicates the number of HERV/LTR copies harboring HERV-TFBSs at each position. C) and D) Number of TF-binding motifs in HERV-TFBSs mapped on each consensus position of LTR7. Results for NANOG and EOMES are shown in (C), and those for FOXA1, SOX2, POU5F1, FOXA2, and GATA6 are shown in (D). The X-axis indicates the consensus position of LTR7. The Y-axis indicates the number of HERV/LTR copies harboring the TF-binding motifs in TFBSs at each position. Peaks of the motifs corresponding to HSREs are denoted by an asterisk (*) with motif names (e.g., SOX2 M0). E) Number of HERV-DHSs (DHSs on HERV/LTRs) mapped on each consensus position of LTR7. The X-axis indicates the consensus position of LTR7. The Y-axis indicates the number of HERV/LTR copies harboring HERV-DHSs at each position. F) Proportion of LTR7 copies overlapping each chromatin state predicted by the genome segmentation methods [47–49]. TSS, promoter region including TSS; PF, predicted promoter flanking region; E, enhancer; WE, weak enhancer or open chromatin cis-regulatory element. G) Unrooted phylogenetic tree of LTR7 copies reconstructed using the maximum likelihood method with RAxML. Fragmented and outlier copies were excluded from the analysis. In total, 1,914 of the 2,344 LTR7 copies were included in the tree. Representative support values calculated by the Shimodaira-Hasegawa (SH)-like test are shown on the corresponding branches. The identified phylogenetic subgroups (subgroups I, II, and III) are indicated. H) Orthologous copies of LTR7 in the reference genomes of primates. The order of LTR7 copies is the same as in (G). I) TFBSs on each LTR7 copy. The order of LTR7 copies is the same as in (G).
J) TF-binding motifs at positions corresponding to HSREs on each LTR7 copy. The order of LTR7 copies is the same as in (G). Black and gray respectively indicate the presence of motifs with p values of <0.0001 and <0.001, as identified by FIMO. K) Enrichment of sequence reads mapped to LTR7 copies belonging to the respective subgroups. The Y-axis shows reads per million (RPM) relative to that of the input control. L) Insertion dates of HERVH/LTR7 proviruses alongside the species tree of primates. Upper panel: Boxplot of provirus insertion dates estimated by sequence comparison between the 5′- and 3′-LTRs. Insertion dates are shown separately for each subgroup. Subgroups I, II, and III contained 66, 248, and 227 proviruses, respectively. Lower panel: Phylogenetic tree of primates with time scale. The tree was obtained from TIMETREE. The red branch in the tree indicates the period when the rewiring of the core regulatory network of pluripotent cells seems to have occurred.
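The LTR-based dating in panel (L) relies on the fact that the 5′- and 3′-LTRs of a provirus are identical at the time of integration and diverge independently afterwards. The following is a minimal sketch of that general idea, not the procedure used for this figure: it assumes the two LTRs are already aligned, applies a Jukes-Cantor correction (an assumption; the caption does not state the substitution model), and takes the neutral substitution rate as a caller-supplied parameter. All function names are hypothetical.

```python
import math


def p_distance(ltr5: str, ltr3: str) -> float:
    """Proportion of differing sites between two aligned LTR sequences.

    Columns containing gaps or ambiguous bases are skipped.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # ignore gaps and ambiguity codes
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance (substitutions per site)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def insertion_age_years(ltr5: str, ltr3: str, rate_per_site_per_year: float) -> float:
    """Estimate insertion age as K / (2 * r).

    K is the corrected divergence between the 5' and 3' LTRs, and r is the
    assumed substitution rate. The factor 2 reflects that both LTRs
    accumulate substitutions independently after integration.
    """
    k = jukes_cantor(p_distance(ltr5, ltr3))
    return k / (2.0 * rate_per_site_per_year)
```

For example, `insertion_age_years(aligned_5ltr, aligned_3ltr, rate)` returns an age in years for one provirus; the rate must be chosen by the user for the lineage in question, and the result is only as reliable as that choice and the alignment.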