The Role of Genome Accessibility in Transcription Factor Binding in Bacteria

doi:10.1371/journal.pcbi.1004891

Fig 1.

The role of genome accessibility in TF-binding in vivo.

The genome accessibility model classifies genomic regions as accessible (A) or not accessible (B). ChIP-seq data show that coverage cannot be explained by binding affinity alone. Example data are shown for an accessible region (A) that has a weak binding site (small purple box, p-value ≈ 5×10⁻⁴) and high ChIP-seq coverage. The gray dashed line indicates the location of the TF-binding site motif. Example data are shown for an inaccessible region (B) with a strong binding site (large purple box, p-value ≈ 5×10⁻⁶) but low coverage. Example data are from M. tuberculosis DosR ChIP-seq experiments [15].


Fig 2.

Genome accessibility improves prediction of ChIP-seq profiles in comparison to a model that only considers motif score.

Motif score alone explains only 35% of the observed variance (A), while the improved, biophysically motivated model that incorporates genome accessibility explains 63% of the variance (B) (p<10⁻¹⁶, likelihood ratio test). The predicted coverage is estimated from parameters fitted for Eq 1. Coverage is represented in terms of log(p_ij). The panels display a randomly selected subset of 10,000 points to reduce point density and improve visualization.


Fig 3.

Genome accessibility improves binding peak prediction in ChIP-seq profiles.

Reference ChIP-seq peaks are defined according to a previously described method [15]. (A) Receiver operating characteristic (ROC) curve. Three models are presented for de novo peak prediction (see main text for details). The accessibility parameter (blue and orange lines) increases peak prediction from 0.69 to 0.82 in comparison to a model that only accounts for motif score (violet-red line). (B) Accuracy of genome accessibility estimation as a function of the number of ChIP-seq experiments. The accuracy of accessibility values is defined as the Pearson correlation between the values estimated from a subset of ChIP-seq experiments and those estimated from the entire dataset (S2 Fig). The expected accuracy of accessibility values is defined as the mean value of 100 samples. Error bars represent one standard error.
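
For readers unfamiliar with how a ROC curve is summarized by a single number, the following is a minimal sketch and not the authors' code. It assumes the values 0.69 and 0.82 can be read as areas under the ROC curve, which the caption does not state explicitly. The function name `roc_auc` and the toy scores are hypothetical; the area is computed as the fraction of (peak, non-peak) pairs in which the peak receives the higher predicted score.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted values, one per genomic region (hypothetical input)
    labels: truthy for regions overlapping a reference peak, falsy otherwise
    """
    positives = [s for s, is_peak in zip(scores, labels) if is_peak]
    negatives = [s for s, is_peak in zip(scores, labels) if not is_peak]
    if not positives or not negatives:
        raise ValueError("need at least one peak and one non-peak region")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # peak ranked above non-peak
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy example with made-up scores: two peak regions, two non-peak regions.
toy_scores = [0.9, 0.3, 0.8, 0.1]
toy_labels = [1, 1, 0, 0]
print(roc_auc(toy_scores, toy_labels))
```

The pairwise form is quadratic in the number of regions, so it suits small illustrations only; a rank-based implementation would be preferable on genome-scale data.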


Fig 4.

Genome accessibility correlates with genomic features.

(A) Intergenic regions are more accessible than protein-coding genic regions (p<10⁻¹⁶). (B) Regions associated with amino acid and carbohydrate metabolism and transport (COGs E and G) show significantly reduced accessibility, whereas genes associated with transcription and translation (COGs K and J) show significantly higher accessibility (p<0.05, Bonferroni correction). (C) Gene expression is positively correlated with accessibility. The correlation of DNA accessibility with gene expression after controlling for motif affinity is 0.278 (p<3.98×10⁻⁵⁶; functions pcor and pcor.test, R package ggm). (D) Expected gene expression is highest at an intermediate level of accessibility. Accessibility bins with fewer than 10 data points are merged with whichever neighboring bin has fewer data points. Error bars represent one standard error of the mean.
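
The correlation in panel (C) is a partial correlation: the association between two variables after the linear effect of a third has been removed. The caption reports that it was computed with pcor and pcor.test from the R package ggm. The sketch below is not that code. It is a minimal, pure-Python illustration of the standard first-order partial correlation formula, and the function names and toy data are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def partial_correlation(x, y, z):
    """Correlation of x and y after controlling for a single variable z.

    Uses r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)).
    """
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy data (made up): x and y are correlated only because both depend on z.
z = [-3, -1, 1, 3]
x = [-2, -2, 0, 4]   # z plus a component orthogonal to z
y = [-4, 2, -2, 4]   # z plus a different orthogonal component
print(pearson(x, y))                 # raw correlation is clearly positive
print(partial_correlation(x, y, z))  # close to zero once z is controlled for
```

In the caption's setting, x would stand for accessibility, y for gene expression, and z for motif affinity; the same formula applies to the GC-content analysis in Fig 5.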


Fig 5.

Genome accessibility is affected by GC content and distance to oriC.

(A) Accessibility is negatively correlated with local genomic GC content. The correlation between accessibility values and region GC content after controlling for motif affinity is -0.30 (p<10⁻¹⁷⁹; functions pcor and pcor.test, R package ggm). (B) Accessibility does not appear to correlate with genome position. (C) Accessibility is negatively correlated with distance to oriC. (D) A schematic of genome replication that could explain the correlation between accessibility and distance to the origin of replication.
