Figure 1.
Model framework for lineage-specific GATA1 binding sites.
Multiple alignments are shown for two GATA1-bound regions in humans. Red and blue boxes in the alignment correspond to GATA1 binding sites. Phylogenies illustrate the birth-death model framework, in which the most likely number of binding sites is assigned to each ancestral node (denoted here as either ‘present’ or ‘absent’, with values 1 or 0 in this example). Highlighted branches denote the branch of origin. Evolutionary comparisons were conducted across ten primate species and 36 non-primate vertebrates (not all are shown). (A) A binding site originating within an LTR insertion. (B) A genomic region containing a human GATA1 binding site that originated along the ancestral primate lineage and a GATA1 binding site specific to mouse and rat. Although the ChIP-seq peaks are located at nearly identical positions in human and mouse (in analogous erythroblast cell lines), assigning each site to a specific branch of origin allows us to identify cases of TFBS turnover in close proximity.
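The idea of a branch of origin can be sketched in a few lines, assuming the ancestral states have already been inferred. This is an illustrative sketch only, not the birth-death inference itself: the function name `branch_of_origin`, the node labels, and the convention of naming a branch by its descendant node are all assumptions made for the example.

```python
def branch_of_origin(lineage_states):
    """Return the branch on which a human binding site most recently arose.

    lineage_states: list of (node_name, site_count) pairs ordered from the
    oldest ancestor to the human leaf. A branch is named after its
    descendant node (a convention assumed here for illustration).
    """
    if lineage_states[-1][1] == 0:
        return None  # no site in human, so no human branch of origin
    origin = "ancestral"  # present at the oldest node unless a gain is seen
    for (_, parent_count), (child, child_count) in zip(
        lineage_states, lineage_states[1:]
    ):
        if parent_count == 0 and child_count > 0:
            origin = child  # keep the most recent gain along the lineage
    return origin


# Hypothetical states resembling panel B: absent in the human-mouse
# ancestor, present from the primate ancestor onward.
states = [
    ("human_mouse_ancestor", 0),
    ("primate_ancestor", 1),
    ("simian_ancestor", 1),
    ("hominid_ancestor", 1),
    ("human", 1),
]
print(branch_of_origin(states))
```

A site present at every node along the lineage is reported as ancestral, matching the caption's use of that term for sites predating the human-mouse divergence.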
Figure 2.
Times of origin for binding sites of six TFs in humans.
Binding motifs were determined using human ChIP-seq data for GATA1, SOX2, MYC, CTCF, ETS1, and MAX. The branch of origin was determined for each binding site within the (−100, +100) region relative to a human ChIP-seq peak summit. (Left) Distribution of the branch of origin for each binding site. Branch labels correspond to those in Figure 1B. ‘Ancestral’ binding sites have origins prior to human-mouse divergence. (Right) The rate of binding site creation along branches ancestral to humans. Rates were estimated by dividing the number of sites originating along each branch by the evolutionary time spanned by that branch, including only binding sites currently existing in humans.
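The rate calculation in the right-hand panel amounts to a per-branch division, sketched below. The branch names, site counts, and branch durations are made-up placeholders, not values from the study, and the time unit is left unspecified because the caption does not state it.

```python
def creation_rates(origin_counts, branch_times):
    """Sites gained per unit of evolutionary time, for each branch.

    origin_counts: branch name -> number of human binding sites that
                   originated on that branch.
    branch_times:  branch name -> evolutionary time spanned by the branch
                   (unit assumed; must be the same for every branch).
    """
    return {
        branch: origin_counts[branch] / branch_times[branch]
        for branch in origin_counts
    }


# Placeholder numbers for illustration only.
counts = {"human": 30, "hominid": 120}
times = {"human": 6.0, "hominid": 12.0}
print(creation_rates(counts, times))
```

Because only sites that still exist in humans are counted, sites gained and later lost on older branches do not contribute to these rates.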
Table 1.
Human-mouse ChIP-seq factor-bound region overlap.
Figure 3.
Within-species variation of binding sites according to time of origin.
Boxplots show the fraction of TFBSs containing common SNPs in humans [72]. Each plot shows the median (center line), upper and lower quartiles (box edges), and range (whisker extremes) of percentages across the TFBSs of six TFs. TFBSs are categorized as human-specific, hominid-specific (not including human-specific sites), simian primate-specific (not including hominid-specific sites), and ancestral (present in the human-mouse common ancestor). Overall fractions (including all sites) are shown in the left-most boxplot. Note that more recently derived binding sites contain substantially more human variation than older sites.
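The quantities summarized in each boxplot can be sketched as follows: a per-TF fraction of sites overlapping at least one common SNP, then a five-number summary across TFs. The function names, the half-open interval convention, and the choice of the "inclusive" quartile method are assumptions for illustration; the study's exact procedure may differ.

```python
from statistics import quantiles


def snp_fraction(sites, snp_positions):
    """Fraction of sites that contain at least one SNP.

    sites: list of (start, end) half-open intervals on one chromosome.
    snp_positions: collection of SNP coordinates on the same chromosome.
    """
    if not sites:
        raise ValueError("no sites given")
    hits = sum(
        1
        for start, end in sites
        if any(start <= pos < end for pos in snp_positions)
    )
    return hits / len(sites)


def boxplot_summary(fractions):
    """Minimum, quartiles, and maximum of the per-TF fractions."""
    q1, q2, q3 = quantiles(fractions, n=4, method="inclusive")
    return {
        "min": min(fractions),
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": max(fractions),
    }


# Toy coordinates: two of the four sites overlap a SNP.
toy_sites = [(10, 20), (30, 40), (50, 60), (70, 80)]
toy_snps = {12, 55, 100}
print(snp_fraction(toy_sites, toy_snps))
```

The linear scan over SNP positions is adequate for a toy example; genome-scale data would normally call for sorted positions or an interval index.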
Figure 4.
Time of origin for human CTCF binding sites according to cell-specificity.
CTCF binding sites in humans were separated according to cell-specificity, considering four distinct cell lines (GM12878, H1hESC, HAC, and HRE). Colored bars correspond to varying degrees of cell-specificity, denoting sites bound in one, two, three, or all four cell types (red to dark blue bars, respectively). Note the tendency for cell-specific binding sites to have more recent evolutionary origins than sites bound ubiquitously in all cell types.
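The grouping behind these bars can be sketched as a cross-tabulation of each site's branch of origin against the number of cell lines in which it is bound. The input layout, the function name `tabulate_by_specificity`, and the example records are hypothetical; only the four cell-line names come from the caption.

```python
from collections import Counter

CELL_LINES = ("GM12878", "H1hESC", "HAC", "HRE")


def tabulate_by_specificity(sites):
    """Count sites by (number of cell lines bound, branch of origin).

    sites: iterable of (branch_of_origin, bound_cell_lines) pairs, where
    bound_cell_lines is any collection of cell-line names.
    """
    table = Counter()
    for origin, bound in sites:
        n_bound = len(set(bound) & set(CELL_LINES))
        if n_bound:  # skip sites not bound in any of the four lines
            table[(n_bound, origin)] += 1
    return table


# Hypothetical records for illustration only.
example_sites = [
    ("human", {"GM12878"}),
    ("human", {"HRE"}),
    ("ancestral", {"GM12878", "H1hESC", "HAC", "HRE"}),
    ("hominid", {"HAC", "HRE"}),
]
for (n_bound, origin), count in sorted(tabulate_by_specificity(example_sites).items()):
    print(n_bound, origin, count)
```

Normalizing each row by the total number of sites with that degree of cell-specificity would give distributions comparable across the four bar colors.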
Figure 5.
A TFBS turnover event within a functionally conserved enhancer.
A TFBS turnover event illustrates the impact of lineage-specific TFBSs within an enhancer. The Genome Browser view shows the region upstream of the human gene EPB41. The VISTA Enhancer track and the ChromHMM track (orange indicates a strong enhancer, yellow a weak enhancer) indicate a putative human enhancer. ChIP-seq signals near the predicted enhancer region for three TFs used in this study are consistent with the predicted lineage-specific binding sites shown in the 46-way multiple sequence alignment (only a subset of species is shown). Note that the two MAX binding sites are also MYC binding sites, since MAX and MYC have very similar motifs. A potential TFBS turnover is observed between the two predicted MAX/MYC binding sites (1700 bp apart). TFBSs are highlighted in different colors, with MAX in blue and GATA1 in red. The predicted enhancer may function as a blood-cell-specific enhancer in mouse, as shown by images of LacZ-positive E11.5 transgenic mouse embryos on the VISTA Enhancer Browser [54] (ID: mm80; http://enhancer.lbl.gov/cgi-bin/imagedb3.pl?form=presentation&show=1&experiment_id=80&organism_id=2).