Dynamic landscape of protein occupancy across the Escherichia coli chromosome

doi:10.1371/journal.pbio.3001306

Fig 1.

Schematic of IPOD-HR technology and detection of context-dependent binding by the TF PurR.

(A) Overall workflow for isolation of the IPOD-HR fraction and quantification of total protein occupancy. (B) The final IPOD-HR signal is obtained by subtracting a normalized RNA polymerase occupancy signal from the raw IPOD-HR protein occupancy, resulting in a polymerase-corrected signal. (C) Example of RNA polymerase–corrected IPOD-HR profile upstream of the purC gene, where subtraction of RNA polymerase occupancy from the raw IPOD-HR signal properly reveals a PurR binding site in rich media that is lost upon deletion of purR or transition to minimal media. In the schematic above the plots, blue regions show genes, orange regions show promoters, and purple regions show annotated TFBSs. ChIP-seq, chromatin immunoprecipitation sequencing; IPOD, in vivo protein occupancy display; IPOD-HR, in vivo protein occupancy display—high resolution; TFBS, transcription factor binding site; WT, wild-type.

Fig 2.

IPOD-HR profiles reveal rich high-resolution occupancy dynamics and large-scale structural features across the chromosome.

(A) Outer track: IPOD-HR occupancy (robust Z-scores, 5-kb moving average); middle track: total RNA read density (5-kb moving average); inner track: locations of inferred EPODs. The outer green wedges mark the portion of the chromosome shown in subsequent panels. The origin of the coordinate system is oriented at the top of the plot. All data in this figure are for the “WT,rich” condition unless otherwise noted. (B) IPOD-HR occupancy measured during growth in glucose RDM, in the vicinity of wedge i from panel A. Green segments below the genomic coordinates indicate the regions highlighted in panels C–D. (C) Condition-dependent occupancy changes at the ArgR binding sites upstream of argA. (D) Identification of condition-specific occupancy of a likely LysR binding site between lysA and lysR. (E) Cumulative histograms showing RNA polymerase ChIP-subtracted IPOD-HR occupancy in coding vs. noncoding regions and at sites that match known TFBSs from RegulonDB [7], compared with the curve that would be expected from a standard normal distribution of scores. Additional descriptive statistics and significance calls are given in S1 Table. (F) Occupancy (blue) and total RNA abundance (orange) for a selected sector of the genome (wedge ii from panel A), showing the presence of several EPODs in regions corresponding to low RNA abundance; rolling medians over a 5-kb window are plotted, with RNA read densities shown in units of RPM. (G) Magnification of the region highlighted by the green bar in panel F, illustrating a silenced region in and around rhsC, alongside flanking areas of low IPOD-HR occupancy and high transcription. A 5-kb rolling median is plotted. ChIP, chromatin immunoprecipitation; EPOD, extended protein occupancy domain; IPOD-HR, in vivo protein occupancy display—high resolution; RDM, rich defined medium; RPM, reads per million; TF, transcription factor; TFBS, transcription factor binding site; WT, wild-type.
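
The robust Z-scores and rolling medians named in this legend can be illustrated with a minimal sketch. It assumes the common definition of a robust Z-score, (x - median) / (1.4826 * MAD); the authors' exact normalization is given in their Methods and may differ. The function names, the window measured in data points rather than base pairs, and the truncation of windows at the sequence ends (instead of wrapping around the circular chromosome) are all illustrative assumptions.

```python
from statistics import median


def robust_z_scores(values):
    """Return (x - median) / (1.4826 * MAD) for each x (assumed definition)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; robust Z-scores are undefined")
    # 1.4826 rescales the MAD to match a standard deviation for normal data.
    scale = 1.4826 * mad
    return [(v - med) / scale for v in values]


def rolling_median(values, window):
    """Centered rolling median; windows are truncated at both ends.

    Assumption: `window` counts data points, and the sequence is treated
    as linear rather than circular.
    """
    half = window // 2
    return [
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


if __name__ == "__main__":
    signal = [1, 2, 3, 4, 100]  # hypothetical occupancy values with one outlier
    print(robust_z_scores(signal))
    print(rolling_median([1, 9, 2, 8, 3], window=3))
```

Because both the median and the MAD are insensitive to a few extreme values, a single strong peak does not compress the scores of the remaining positions the way a mean and standard deviation would.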

Fig 3.

IPOD-HR profiles reveal global binding activity of known TFs and sigma factors.

(A) Average (geometric mean) occupancies for all annotated binding sites of the 6 indicated TFs under each indicated condition. Error bars indicate a 95% confidence interval based on parametric bootstrapping with pessimistic assumptions; see Methods for details. The numbers of detectable sites used to estimate the condition-specific occupancies were 30, 10, 2, 45, 6, and 9 for ArgR, LexA, PurR, ArcA, RutR, and CytR, respectively. (B) Spearman correlations between all occupancy values at annotated binding sites for the indicated TFs (all TFs with at least 50 sites in the RegulonDB database) in the IPOD-HR vs. Lrp ChIP data sets. Points shown in red have a statistically significant correlation (FDR-corrected p-value < 0.05). Annotated binding sites are from RegulonDB release 9.4 (prior to inclusion of the ChIP data used here, with overlapping or bookended sites for the same TF merged prior to analysis); data are from [24] (Lrp ChIP) or the present study (IPOD-HR). Data are taken from the most closely equivalent conditions (log phase growth in minimal media, log phase growth in rich media, and stationary phase in rich media), although the carbon source is different (glycerol vs. glucose). (C) Heat map showing the consensus clustering (co-occurrence frequencies) of the pattern of occupancy dynamics for the regulons of all considered TFs across the varied nutrient conditions in this study (see Methods for details). Consensus division into 10 clusters via agglomerative clustering is shown at right; for each cluster, representative TFs (on matrix) and regulated GO terms (right) are shown, with numbers in parentheses indicating the approximate p-value for enrichment of that GO term. A full listing of p-values is given in S2 Table. (D) Changes in occupancy and target gene transcript level for all annotated repressive binding sites of ArgR and PurR (for minimal media vs. rich media), in each case demonstrating the strong and oppositely directed changes in binding and regulatory effects across the regulons. (E) Correlation of promoter-level occupancy changes (measured by RNA polymerase ChIP-seq) and changes in transcript abundance, shown for the WT stationary phase condition compared with exponential phase. Shaded area shows a bootstrap-based 95% confidence interval. (F) IPOD-HR protein occupancy profiles in the vicinity of the potF promoter under the indicated conditions. Drawn TFBSs are taken from Ecocyc [25], reflecting recent updates in known TFBSs in this region. (G) IPOD-HR occupancy profiles upstream of ndh under the indicated conditions. For all rows of TFBSs except the top, all TFBSs in a given row correspond to the factor named at the beginning of that row. ChIP, chromatin immunoprecipitation; ChIP-seq, chromatin immunoprecipitation sequencing; FDR, false discovery rate; GO, gene ontology; IPOD-HR, in vivo protein occupancy display—high resolution; TF, transcription factor; TFBS, transcription factor binding site; WT, wild-type.

Fig 4.

Experimental identification of the protein bound to a novel occupancy peak upstream of the sdaC promoter.

(A) IPOD-HR profiles upstream of sdaC in rich (M9/RDM/glu) media, minimal (M9/glu) media, and in rich media in stationary phase (the drawn Lrp binding site is taken from Ecocyc [25] and is not present in RegulonDB). (B) Schematic of pulldown/mass spectrometry experiments used to identify factors binding the sdaC promoter. (C) Gel shift experiments showing specific interaction of YieP with the sdaC promoter. Increasing concentrations of purified His₆-YieP are incubated with a mixture of fluorescein-labeled promoter regions from sdaC and purC and then run on a gel, demonstrating specific shifting of the sdaC promoter region. YieP concentrations are given as the number of 2-fold dilutions relative to full strength. (D) Comparison of IPOD-HR occupancy profile (as in panel A) with ChIP-exo data from [31], with the latter given as total read counts (parsed from GEO accession numbers GSM3022131 and GSM3022132). The top track of predicted YieP sites shows significant hits for the YieP motif identified based on that ChIP-exo data set. Out of 1,025 potential YieP sites in the genome, the location highlighted in cyan is tied for 10th highest score (identified using FIMO; see Methods for details). Occupancy signal is given as −log₁₀(p) for the IPOD-HR track or raw counts (averaged across strands) for the ChIP-exo tracks. (E) Results of Miller assays in which lacZ transcription is driven by a copy of the sdaC promoter, either with the native sequence (WT) or with one or both of the apparent YieP binding sites scrambled, in both a WT and yieP background. Large points and error bars show a posterior mean and 95% credible interval from a Bayesian analysis; small points show individual data points, with symbols denoting the day on which data were gathered (a total of 8 biological replicates split across 4 different days were performed for each strain). Significance is assessed using 1-sided Bayes factors with the interpretive scale of Kass and Raftery [33] (*: Substantial; **: Strong; ***: Decisive). Stars within the plot denote direct comparisons of the WT and yieP strains for each promoter, whereas those above the plot denote comparisons of each promoter variant with the original within a given genetic background. ChIP, chromatin immunoprecipitation; IPOD-HR, in vivo protein occupancy display—high resolution; WT, wild-type.
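
The star notation in panel E maps Bayes factors onto evidence categories. A minimal sketch of such a mapping follows; it assumes the commonly cited cutoffs of 0.5, 1, and 2 on the log10 Bayes factor scale for "substantial", "strong", and "decisive", whereas the legend itself names only the categories. The function name and the handling of values exactly at a cutoff are illustrative assumptions.

```python
import math


def evidence_stars(bayes_factor):
    """Map a Bayes factor to a star label.

    Assumed cutoffs (log10 scale): > 0.5 substantial, > 1 strong,
    > 2 decisive. The legend does not state the numeric thresholds.
    """
    if bayes_factor <= 0:
        raise ValueError("a Bayes factor must be positive")
    log_bf = math.log10(bayes_factor)
    if log_bf > 2:
        return "***"  # decisive
    if log_bf > 1:
        return "**"   # strong
    if log_bf > 0.5:
        return "*"    # substantial
    return ""         # below the 'substantial' category


if __name__ == "__main__":
    for bf in (2, 5, 50, 500):  # hypothetical Bayes factors
        print(bf, repr(evidence_stars(bf)))
```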

Fig 5.

Genome-wide de novo discovery of sequence specificity motifs for actively bound TFs.

(A) At a peak calling threshold of 4 (cf. S1 Fig), we show the number of identified binding sites that overlap with annotated sites in RegulonDB (“RegulonDB”), overlap with motif-based predicted binding sites (“SwissRegulon”), or are novel (“New”). The “Combined” category represents peak sets where the peaks at a given threshold identified across all conditions are merged prior to comparisons with the RegulonDB and predicted databases. Qualitatively similar results are observed at all tested peak calling thresholds (all peaks are provided in S2 Data). (B) All called IPOD-HR occupancy peaks across the conditions shown in panel A were combined and then partitioned based on whether they overlap with a known or inferred binding site in RegulonDB (RegulonDB peaks) or not (Other peaks). Peaks were then considered to have regulatory potential if they fell within 100 bp of an annotated transcription start site, and the fraction of the genes potentially regulated by each peak category that are poorly annotated is plotted across different peak calling thresholds. Error bars show 95% credible intervals calculated assuming that the incidence of poorly annotated genes in the inferred regulon is a binomial random variable, using Bayesian inference with a Beta(1,1) prior. The dashed line shows the overall fraction of poorly annotated genes included in the analysis (i.e., those belonging to transcripts regulated by at least 1 annotated transcription start site in RegulonDB). (C) Number of motifs discovered de novo using IPOD-HR occupancies under each condition in our study. “All” and “pruned” refer to all discovered motifs and those surviving cluster-based filtering by RSAT (see Methods for details), respectively. “Real” shows the motif counts discovered in real data, and “Decoy” shows the maximum discovered motif count across 20 independent circular permutations of the data under each condition. (D) Classification of nonredundant motifs across conditions as “Identified” (match to an existing motif from the SwissRegulon database, via TOMTOM, with E-value < 0.5) or “Unidentified” (no matches found with E < 0.5). “Combined” refers to the full set of motifs discovered after pooling all motifs across all conditions and redundancy filtering; a horizontal dashed line shows the total number of known motifs present in SwissRegulon. (E) Example cases of “Identified” matches of IPOD-HR-inferred motifs with motifs from the SwissRegulon database, showing good correspondence with annotated CRP (left) and NanR motifs. E-values arising from the TOMTOM search pairing newly discovered motifs with similar known motifs are shown beneath each inferred motif. The y axes for motifs in this and the following panel show information content in bits. In the case of CRP, the half site was inferred and is shown here in both the forward and reverse orientations aligned to the motif in SwissRegulon. (F) Examples of 2 newly inferred motifs that do not have identifiable hits in the SwissRegulon database (as assessed using TOMTOM). In each case, representative GO terms showing significant enrichments amid the predicted regulon associated with that motif are shown (see Methods for details). (G) Overlap of predicted binding sites for IPOD-HR-inferred motifs with either coding regions (genes) or promoters (both as annotated in RegulonDB) using only strict motif hits; shown are the log₂ fold enrichment or depletion of the overlap as compared with that expected by chance. (H) For the predicted regulon of each newly inferred motif (using only strict motif hits), we show the fraction of regulon members that are poorly annotated (UniProt annotation score of 1 or 2 out of 5); for comparison, dashed lines are shown for the values obtained when the same statistic is calculated for all annotated TF–gene interactions in RegulonDB (“Annotated TFBS”) and for the genome as a whole (“Overall”). GO, gene ontology; IPOD-HR, in vivo protein occupancy display—high resolution; TF, transcription factor; WT, wild-type.
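
The credible intervals in panel B follow from a standard conjugate update: with a Beta(1,1) prior and k poorly annotated genes among n, the posterior is Beta(1 + k, 1 + n - k). The sketch below computes an equal-tailed interval from that posterior by numerical integration on a grid, using only the standard library. The function name, the grid size, and the choice of an equal-tailed (rather than highest-density) interval are assumptions; the legend does not say which interval type was used.

```python
def beta_credible_interval(successes, trials, level=0.95, grid=20000):
    """Equal-tailed credible interval for a binomial proportion.

    Posterior under a Beta(1,1) prior: Beta(1 + k, 1 + n - k).
    Assumption: quantiles are found by trapezoidal integration of the
    unnormalized density, which is adequate for modest n.
    """
    a = 1 + successes
    b = 1 + trials - successes
    xs = [i / grid for i in range(grid + 1)]
    dens = [x ** (a - 1) * (1 - x) ** (b - 1) for x in xs]

    # Cumulative trapezoidal integral of the unnormalized density.
    cdf = [0.0]
    for i in range(1, grid + 1):
        cdf.append(cdf[-1] + 0.5 * (dens[i - 1] + dens[i]) / grid)
    total = cdf[-1]

    def quantile(q):
        target = q * total
        for i in range(1, grid + 1):
            if cdf[i] >= target:
                span = cdf[i] - cdf[i - 1]
                frac = (target - cdf[i - 1]) / span if span > 0 else 0.0
                return xs[i - 1] + frac / grid
        return 1.0

    tail = (1 - level) / 2
    return quantile(tail), quantile(1 - tail)


if __name__ == "__main__":
    # Hypothetical counts: 3 poorly annotated genes out of 10.
    print(beta_credible_interval(3, 10))
```

With no data (0 of 0), the posterior equals the uniform prior and the interval is approximately (0.025, 0.975), which is a quick sanity check on the integration.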

Fig 6.

EPODs define stable genomic structures and are associated with many distinct features.

(A) EPOD calls from a representative genomic region in the WT rich media condition, along with protein occupancy and RNA levels smoothed with a 1-kb rolling median. All displayed/analyzed EPOD calls refer to our strict threshold unless otherwise noted. (B) Number of called EPODs by condition (left) and fraction of the genome covered by EPODs (right) for both our loose and strict thresholds (see text for details). (C) IPOD-HR occupancies (shown over a 1-kb rolling median) and associated EPOD calls under 3 different conditions, in the same genomic region shown in panel A. EPOD calls are shown above the occupancy, in the same order as the data tracks. (D) Lower triangle: Overlap of EPOD calls (using a symmetrized distance that is the average of the fraction of EPOD positions from a condition a that is also called in condition b and vice versa) between each pair (a,b) of the studied conditions. Upper triangle: Each entry shows the fraction of the EPOD calls (at a 5-bp resolution) from the sample defining that row that is contained in a relaxed set of EPOD calls (see text) of the sample defining that column (only the upper triangle of that matrix is shown; the lower triangle is similar except that the smaller ΔargR EPOD set contains fewer of the EPODs from other conditions). (E) Density plots showing normalized histograms (smoothed by a kernel density estimator) of the specified quantities for regions of the genome that are in EPODs vs. those that are not (Background), as assessed in the WT M9/RDM/glu (WT,rich) condition. “*” indicates FDR-corrected p < 0.005 via a permutation test (against a null hypothesis of no difference in medians). Significance calling and additional comparisons are shown in S4 Table. EPOD, extended protein occupancy domain; IPOD-HR, in vivo protein occupancy display—high resolution; WT, wild-type.
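
The two overlap measures in panel D can be sketched as follows. The symmetrized value averages the fraction of condition a's EPOD positions also called in condition b with the reverse fraction, and the upper-triangle value is a one-directional containment fraction. Representing EPOD calls as half-open (start, end) intervals, the function names, and the binning details are illustrative assumptions; only the 5-bp resolution comes from the legend.

```python
def covered_positions(intervals, step=5):
    """Expand half-open (start, end) intervals into the set of bin starts.

    Assumption: positions are binned at `step` bp, with each interval's
    first bin aligned down to a multiple of `step`.
    """
    positions = set()
    for start, end in intervals:
        first = (start // step) * step
        positions.update(range(first, end, step))
    return positions


def containment_fraction(calls_a, calls_b, step=5):
    """Fraction of positions called in a that are also called in b."""
    a = covered_positions(calls_a, step)
    b = covered_positions(calls_b, step)
    if not a:
        raise ValueError("condition a has no called positions")
    return len(a & b) / len(a)


def symmetrized_overlap(calls_a, calls_b, step=5):
    """Average of the a-in-b and b-in-a containment fractions."""
    return 0.5 * (
        containment_fraction(calls_a, calls_b, step)
        + containment_fraction(calls_b, calls_a, step)
    )


if __name__ == "__main__":
    cond_a = [(0, 100)]   # hypothetical EPOD calls
    cond_b = [(50, 100)]
    print(containment_fraction(cond_a, cond_b))  # half of a lies in b
    print(symmetrized_overlap(cond_a, cond_b))
```

The containment fraction is asymmetric by construction, which is why a condition with a smaller EPOD set can be almost fully contained in another while containing little of it in return.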

Fig 7.

EPODs are statistically enriched for genes in specific functional categories.

(A) The genome was split into EPOD and background regions as in Fig 6; we then applied iPAGE [37] to identify GO terms showing significant mutual information with occupancy in EPODs. All shown GO terms were significant according to the built-in tests in iPAGE. (B) Multiple EPODs are associated with silencing of the CP4-57 prophage. Shown are the IPOD-HR occupancy and transcript levels in the vicinity of the prophage locus during growth in rich defined media with glucose, with EPOD locations indicated above the plots. (C) Association of a small EPOD with 2 genes of unknown function, yigF and yigG, along with the putative transporter rarD/yigH; data tracks defined as in panel B. EPOD, extended protein occupancy domain; GO, gene ontology; IPOD-HR, in vivo protein occupancy display—high resolution.
