Systematic identification and characterization of regulatory elements derived from human endogenous retroviruses
Results from all-read TFBSs are shown. A) Unrooted phylogenetic tree of LTR5A (red), LTR5B (green), and LTR5_Hs (blue) copies, constructed using the maximum likelihood method. LTR5 copies were divided into five groups (I–V) based on the tree and their TFBSs (shown in (C)). Fragmented and outlier copies were excluded from the analysis; the tree includes 233 of 265 LTR5A copies, 300 of 431 LTR5B copies, and 532 of 645 LTR5_Hs copies. Representative bootstrap values are shown at the corresponding nodes. B) Orthologous copies in the reference genomes of primates. The order of LTR5 copies is the same as in (A). C) TFBSs present on each copy; representative TFBSs are shown. TFBSs of SPI1, TAL1, and GATA1/2 were from the ENCODE dataset, and the others were from the Roadmap dataset. The order of LTR5 copies is the same as in (A). D) TF-binding motifs at positions corresponding to HSREs on each LTR5 copy. The order of LTR5 copies is the same as in (A). Black and gray respectively indicate the presence of motifs with p values of <0.0001 and <0.001, as identified by FIMO. E) Enrichment of sequence reads mapped to LTR5 copies belonging to the respective subgroups. The Y-axis shows RPM relative to that of the input control. F) Relative number of HERV-DHSs mapped to each consensus position. The X-axis indicates the nucleotide position in the consensus sequence of LTR5_Hs. The Y-axis indicates the proportion of HERV/LTR copies harboring HERV-DHSs at each position.
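The quantities plotted in (E) and (F) can be illustrated with a minimal sketch. It assumes that RPM means reads per million mapped reads and that the per-position proportion is taken over a boolean matrix of copies by consensus positions. The function names, the pseudocount parameter, and the input layout are hypothetical and are not taken from the original analysis.

```python
from typing import List, Sequence


def rpm(read_count: int, total_mapped_reads: int) -> float:
    """Reads per million mapped reads (assumed definition of RPM)."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return read_count * 1_000_000 / total_mapped_reads


def relative_enrichment(
    chip_count: int,
    chip_total: int,
    input_count: int,
    input_total: int,
    pseudocount: float = 0.0,  # hypothetical guard against division by zero
) -> float:
    """RPM of the sample relative to RPM of the input control, as in (E)."""
    chip_rpm = rpm(chip_count, chip_total)
    input_rpm = rpm(input_count, input_total)
    denominator = input_rpm + pseudocount
    if denominator == 0:
        raise ZeroDivisionError("input RPM is zero; supply a pseudocount")
    return (chip_rpm + pseudocount) / denominator


def proportion_with_dhs(coverage: Sequence[Sequence[bool]]) -> List[float]:
    """Fraction of copies harboring a DHS at each consensus position, as in (F).

    coverage[i][j] is True if copy i overlaps a DHS at consensus position j.
    All rows are assumed to have the same length (the consensus length).
    """
    n_copies = len(coverage)
    if n_copies == 0:
        return []
    length = len(coverage[0])
    return [sum(row[j] for row in coverage) / n_copies for j in range(length)]
```

For example, `relative_enrichment(50, 10_000_000, 10, 20_000_000)` compares a sample RPM of 5.0 with an input RPM of 0.5.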