Figure 1.
Nucleosomal DNA is depleted at transcription factor binding sites.
(A) Nucleosomal tag counts in the vicinity of Abf1 motifs as a function of the p-value for ChIP enrichment of the site. Abf1 sequence motifs, and the sites bound at p≤1e-3, were defined by MacIsaac et al. [16]. Abf1 motifs considered unbound by MacIsaac et al. (p>1e-3) were assigned the Harbison et al. ChIP enrichment p-values of the genomic region spanning the site [13]. Only ChIP-enrichment values obtained in YPD media were used. (B) Area under ROC curves (ROC AUC) for the prediction of Abf1 binding based on low nucleosomal DNA tag counts. Abf1 bound sites were compared to randomly selected yeast promoter sites, using 15bp windows centered on the Abf1 and random sites. Values above 0.5 are considered significant. (C) ROC AUC values as in panel B but for a total of 41 transcription factors. Bound sites were defined as for the Abf1 sites, using a p-value for enrichment of 1e-3 or better. All transcription factors that had at least 50 bound sites are shown.
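As an illustration of the ROC AUC measure used in panels B and C, the following minimal sketch computes the area under the ROC curve for predicting binding from low tag counts. It uses the rank-based (Mann-Whitney) form of the AUC, which is an assumption made for brevity and not necessarily how the values in the figure were computed. The function name and the toy tag counts are hypothetical.

```python
def roc_auc_low_counts(bound_counts, control_counts):
    """ROC AUC for predicting 'bound' from LOW tag counts.

    Equals the probability that a randomly chosen bound site has a lower
    tag count than a randomly chosen control site (ties count as 0.5).
    """
    wins = 0.0
    for b in bound_counts:
        for c in control_counts:
            if b < c:
                wins += 1.0
            elif b == c:
                wins += 0.5
    return wins / (len(bound_counts) * len(control_counts))


# Toy values (not data from the figure): mean tag counts in 15bp windows.
bound = [2, 3, 5, 1]
control = [8, 4, 9, 3]
auc = roc_auc_low_counts(bound, control)
print(auc)
```

An AUC of 0.5 corresponds to no discrimination between bound and control sites; values approaching 1 mean that bound sites almost always carry fewer tags than control sites.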
Figure 2.
Crosslinking of chromatin preferentially protects sites that are otherwise nuclease sensitive and correlated with transcription factor binding.
(A) (top): Tag counts of uncrosslinked chromatin (orange) and crosslinked chromatin (gray) in the region of bound Abf1 sites. Tag counts have been symmetrized around the Abf1 site. Tag counts for the crosslinked sample were normalized to the uncrosslinked sample between 100–600bp from the Abf1 site to highlight the concordance in the phased nucleosome locations and occupancies. (bottom): Tag count difference map (green) in the vicinity of bound Abf1 sites showing excess tags in crosslinked chromatin vs. uncrosslinked. (B) Predictive value of nuclease-resistant tag counts for binding of 41 TFs. ROC AUC values on the y-axis were calculated based on the difference map (excess tag counts found in the crosslinked sample compared to the uncrosslinked). ROC AUC values on the x-axis were calculated as in Figure 1. The dashed line is for y = x. Binding of most TFs is predicted by the difference map as well as, or even better than, by the under-representation of tags in the normal chromatin preparation.
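The symmetrization, flank normalization and difference map described for panel A can be sketched as follows. This is a minimal illustration under stated assumptions: profiles are lists of tag counts indexed by position, the scaling is a single ratio of summed flank counts, and the function names and the small toy profile (with shortened flank distances) are hypothetical.

```python
def symmetrize(profile):
    """Average a profile with its mirror image about the central position."""
    return [(a + b) / 2.0 for a, b in zip(profile, reversed(profile))]


def difference_map(uncrosslinked, crosslinked, center, lo=100, hi=600):
    """Scale the crosslinked profile to the uncrosslinked one over the
    flanks lying lo..hi bp from `center`, then return the excess tags
    (scaled crosslinked minus uncrosslinked) at every position."""
    flank = [i for i in range(len(uncrosslinked)) if lo <= abs(i - center) <= hi]
    scale = sum(uncrosslinked[i] for i in flank) / sum(crosslinked[i] for i in flank)
    return [scale * x - u for x, u in zip(crosslinked, uncrosslinked)]


# Toy 7-position profiles (not data from the figure); flanks are 2..3 positions away.
unx = [10, 10, 4, 1, 4, 10, 10]
xl = [20, 20, 10, 8, 10, 20, 20]
diff = difference_map(unx, xl, center=3, lo=2, hi=3)
print(diff)
```

In the toy example the flanks agree after scaling, so the difference map is zero there and positive only around the central site, which is the pattern panel A (bottom) is meant to highlight.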
Figure 3.
Some of the nucleosome position information correlated with transcription factor binding is intrinsic to genomic sequence.
(A) ROC AUC values quantifying the predictive value of low nucleosome occupancy based on chromatin in vivo (x-axis) or chromatin reconstituted in vitro from genomic DNA and histones (y-axis). The in vivo data are as shown in Figure 1; values based on the in vitro reconstituted chromatin were calculated in the same manner. The positions of four TFs that are used in panel B are indicated by colored circles. The solid line is the best linear fit through the data (R = 0.90), excluding outliers Abf1 and Reb1 (gray circles). That Abf1 and Reb1 are truly outliers was established by assessing the deviation from a fit to all the data: these two TFs deviate from that line by a distance that exceeds the average distance by more than 2.5 standard deviations. The dashed line, y = x, corresponds to the expected fit if in vitro and in vivo nucleosome data were entirely equivalent. (B) Correlation between the ChIP enrichment value at perfect consensus binding sites and tag counts from nuclease-protected mono-nucleosome-sized DNA obtained from in vivo chromatin (left panels) or in vitro reconstituted chromatin (right panels). The best linear fits between log(ChIP-qPCR enrichment value) and tag count are shown.
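The outlier criterion described for panel A can be sketched as below: fit a line to all points, measure each point's distance from it, and flag points whose distance exceeds the mean distance by more than 2.5 standard deviations. Several details are assumptions, since the legend does not specify them: an ordinary least-squares fit, perpendicular distance to the line, and the population standard deviation. The function names and the toy data are hypothetical.

```python
import math
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def flag_outliers(xs, ys, n_sd=2.5):
    """Indices of points whose perpendicular distance from the fit to ALL
    points exceeds mean(distance) + n_sd * sd(distance)."""
    slope, intercept = fit_line(xs, ys)
    norm = math.hypot(slope, 1.0)
    dist = [abs(slope * x - y + intercept) / norm for x, y in zip(xs, ys)]
    cutoff = statistics.fmean(dist) + n_sd * statistics.pstdev(dist)
    return [i for i, d in enumerate(dist) if d > cutoff]


# Toy data (not from the figure): 20 points on y = x, one displaced upward.
xs = [i / 20 for i in range(20)]
ys = list(xs)
ys[10] += 0.5
outliers = flag_outliers(xs, ys)
print(outliers)
```

Once outliers are identified in this way, the line can be refit without them, which corresponds to the solid line described in the legend.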
Figure 4.
The effect of intrinsic sequence-dependent chromatin structure on TF binding is not dependent on exact nucleosome positioning.
(A) Windowing scheme to simulate lower resolution data. Nucleosome tag counts around genomic loci (TF binding sites or control loci) were averaged over windows of 15, 40, 75, 150, 300 and 600bp as indicated by the lines at the bottom of the panel. Average tag counts around Abf1-bound sites are shown as in Figure 1A, along with the locations of nucleosomes inferred from those data. (B) ROC AUC values for the prediction of Abf1 binding sites based on low nucleosome occupancy, averaged over the indicated window sizes. The difference in ROC AUC in going from high resolution data (15bp) to simulated low resolution data (600bp) is indicated by the arrow. This measure of the effect of data blurring was used in panels C and D. (C) Effect of averaging in vivo nucleosome data on the correlation between nucleosome occupancy and TF binding for 41 TFs. Coloring of the histogram bars is based on the standard deviation of the values for abs(Δ(ROC AUC)), as indicated in the legend. (D) Same as panel C except that data from in vitro reconstituted chromatin were used rather than data from in vivo chromatin.
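The windowing scheme of panel A amounts to averaging tag counts over a window centered on each locus. A minimal sketch follows, assuming per-base tag counts stored in a list and windows clipped at the ends of the sequence; the function name and the toy counts are hypothetical.

```python
def window_mean(counts, center, width):
    """Mean tag count over `width` positions centered on index `center`,
    clipped to the bounds of `counts`."""
    first = center - width // 2
    start = max(0, first)
    end = min(len(counts), first + width)
    window = counts[start:end]
    return sum(window) / len(window)


# Toy per-base tag counts (not data from the figure).
counts = list(range(100))
for width in (15, 40, 75):
    print(width, window_mean(counts, center=50, width=width))
```

Applying the same function with widths of 15bp and 600bp to bound and control loci, and comparing the resulting ROC AUC values, gives the Δ(ROC AUC) measure indicated by the arrow in panel B.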
Figure 5.
Clustering of nucleosome occupancy profiles suggests more than one reason for the efficacy of in vitro nucleosome occupancy blurring.
(A) Heat map of the 1200bp nucleosome occupancy profiles surrounding TF binding sites. Each row represents the average profile around binding sites for one of the 41 TFs. The left side is based on the in vivo nucleosome map; the right is based on the in vitro nucleosome map. Tag counts were separately normalized to a mean of 0 for each of the 1200bp in vitro and in vivo windows. Yellow represents low tag counts, and blue high. The TFs have been placed into five groups based on k-means clustering (Methods) and are ranked, within the groups, based on the improvement in association with TF binding when nucleosome occupancy is averaged over 600bp windows rather than 15bp windows. Nucleosome-poorer regions are typically broader in the in vitro maps, and some TFs show nucleosome enrichment over TF binding sites in the in vitro maps. (B) Changes in ROC AUC values for the prediction of TF binding using blurred data (600bp vs 15bp). TFs are colored according to the clusters in which they fall based on the in vivo and in vitro nucleosome occupancy profiles surrounding their binding sites (note color key in panel A). For each group of TFs, the one with the greatest improvement using blurred in vitro nucleosome data is indicated by a square rather than a circle. (C) In vitro and in vivo nucleosome tag counts in a 1200bp window surrounding bound sites. The five TFs shown are representative of the five profile clusters and are indicated by squares in panel B.
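The normalization and grouping described for panel A can be illustrated with a small sketch: each profile is centered to a mean of 0 and the centered profiles are grouped by k-means. The actual clustering procedure is given in Methods and may differ. Here the seeding (the first k profiles), the squared Euclidean distance, the fixed iteration count, the function names and the toy three-point profiles are all assumptions made for illustration.

```python
def center_profile(profile):
    """Shift a tag-count profile so that its mean is 0."""
    m = sum(profile) / len(profile)
    return [v - m for v in profile]


def kmeans(profiles, k, n_iter=50):
    """Plain k-means on equal-length profiles; returns one label per profile.
    Centroids are seeded naively with the first k profiles."""
    centroids = [list(p) for p in profiles[:k]]
    labels = [0] * len(profiles)
    for _ in range(n_iter):
        # Assign each profile to the nearest centroid.
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in profiles
        ]
        # Recompute each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy profiles (not data from the figure): two with a central dip, two with a central peak.
raw = [[5, 1, 5], [1, 5, 1], [6, 2, 6], [2, 6, 2]]
labels = kmeans([center_profile(p) for p in raw], k=2)
print(labels)
```

Centering before clustering means that profiles are grouped by shape (dip versus peak around the site) and not by overall tag density.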
Figure 6.
The blurring of intrinsic nucleosome occupancy data accentuates the effects of nucleosomes in a computational model of TF binding.
(A) Illustration of how nucleosome occupancies are used to weight the predicted binding affinities of sequence motifs. (top panel): Two 2.4kb genomic regions (CAN1/NPR2 and YPL137C/ISU1) showing normalized nucleosome tag counts from in vitro reconstituted chromatin, averaged over 15bp windows (gray line) or 600bp windows (black line). Red dots indicate the location of a perfect Gcn4 consensus site in each region. (middle panel): Same as the top panel except the lines show the conversion of normalized tag counts into weights that can be applied to Position Weight Matrix (PWM) based estimates of TF binding affinity. Note that the weights are plotted on a log scale. Details of the weighting scheme are given in Methods. (bottom panel): Predicted equilibrium binding constants for the two sequence-identical Gcn4 sites (relative unweighted Ka = 1; white histogram bars). High-resolution nucleosome data (15bp window; gray bars) increase the effective Ka of the two sites by about the same amount because the local nucleosome occupancy for both sites is about the same, and lower than average. Averaged over 600bp, the CAN1/NPR2 site is in a much lower-than-average nucleosome occupancy region while the YPL137C/ISU1 site is in a higher-than-average region. As a result, the predicted effective binding affinities of these two sites, subject to low resolution nucleosome occupancy (black bars), are very different. (B) Effect of nucleosome-based weighting on the prediction of TF bound promoters. Each dot is a TF. The value along the x-axis shows how well the PWM, used in a computational model of TF binding, predicts which promoters are bound. This is quantified as the area under an ROC curve (ROC AUC). Plotted against this value is the change in ROC AUC that is obtained by weighting genomic loci on the basis of high-resolution nucleosome data (orange; 15bp window) or on the same data averaged over 600bp windows.
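The idea in panel A, that a site's PWM-based affinity is multiplied by a weight derived from local nucleosome occupancy, can be sketched as follows. The actual weighting scheme is described in Methods and is not reproduced here. The exponential form, the `beta` parameter, the function names and the numerical inputs below are hypothetical stand-ins, chosen only so that below-average occupancy raises the effective Ka and above-average occupancy lowers it.

```python
import math


def occupancy_weight(normalized_count, beta=1.0):
    """Hypothetical weight from a mean-centered tag count:
    > 1 for below-average occupancy, < 1 for above-average occupancy."""
    return math.exp(-beta * normalized_count)


def effective_ka(pwm_ka, normalized_count, beta=1.0):
    """PWM-predicted Ka scaled by the nucleosome-occupancy weight."""
    return pwm_ka * occupancy_weight(normalized_count, beta)


# Invented illustrative inputs (not values from the figure): two
# sequence-identical sites with unweighted relative Ka = 1.
site_a = {"15bp": -0.4, "600bp": -0.9}  # low occupancy at both resolutions
site_b = {"15bp": -0.4, "600bp": 0.6}   # low locally, high over 600bp

for res in ("15bp", "600bp"):
    print(res, effective_ka(1.0, site_a[res]), effective_ka(1.0, site_b[res]))
```

With these inputs the two sites receive identical weights at 15bp resolution but diverge at 600bp, mirroring the qualitative contrast between the gray and black bars in the bottom panel.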