In Vitro Whole Genome DNA Binding Analysis of the Bacterial Replication Initiator and Transcription Factor DnaA

DnaA, the replication initiation protein in bacteria, is an AAA+ ATPase that binds and hydrolyzes ATP and exists in a heterogeneous population of ATP-DnaA and ADP-DnaA. DnaA binds cooperatively to the origin of replication and several other chromosomal regions, and functions as a transcription factor at some of these regions. We determined the binding properties of Bacillus subtilis DnaA to genomic DNA in vitro at single nucleotide resolution using in vitro DNA affinity purification and deep sequencing (IDAP-Seq). We used these data to identify 269 binding regions, refine the consensus sequence of the DnaA binding site, and compare the relative affinity of binding regions for ATP-DnaA and ADP-DnaA. Most sites had a slightly higher affinity for ATP-DnaA than ADP-DnaA, but a few had a strong preference for binding ATP-DnaA. Of the 269 sites, only the eight strongest binding ones have been observed to bind DnaA in vivo, suggesting that other cellular factors or the amount of available DnaA in vivo restricts DnaA binding to these additional sites. Conversely, we found several chromosomal regions that were bound by DnaA in vivo but not in vitro, and that the nucleoid-associated protein Rok was required for binding in vivo. Our in vitro characterization of the inherent ability of DnaA to bind the genome at single nucleotide resolution provides a backdrop for interpreting data on in vivo binding and regulation of DnaA, and is an approach that should be adaptable to many other DNA binding proteins.


Introduction
DnaA is the highly conserved replication initiation protein found in virtually all bacteria [reviewed in 1, 2 -6].DnaA binds to multiple 9 bp "DnaA boxes" (DnaA binding sites; consensus sequence 5'-TTATNCACA) in the origin of chromosomal replication (oriC).Once properly bound to sites in oriC, DnaA causes melting of an AT-rich region and recruitment of the replication machinery.DnaA also binds to DnaA boxes at other chromosomal regions, and in some cases functions as a transcription factor, activating some genes and repressing others.DnaA directly activates transcription of sda [7][8][9][10][11], and appears to directly repress transcription of ywlC, vpr, and the yyzF-yydD, trmE-noc, and ywcI-sacT operons [8,9,12,13].
DnaA is a AAA+ ATPase that is present in cells as both ATP-DnaA and ADP-DnaA.Although both ATP-DnaA and ADP-DnaA bind DNA, where analyzed, ATP-DnaA is required for replication initiation [3,[14][15][16][17][18][19].For E. coli DnaA, there are some sites that appear to bind ATP-DnaA and ADP-DnaA equally well, and others that have a preference for ATP-DnaA [reviewed in 3].The activity of E. coli DnaA is largely controlled by regulation of its ATPase activity and nucleotide exchange [reviewed in 1,3,20,21].In contrast, the activity of Bacillus subtilis DnaA appears to be largely regulated by several proteins, all of which affect its ability to bind cooperatively to DNA [22][23][24].
We set out to analyze the binding properties of B. subtilis DnaA to target chromosomal sites, on a genomic scale, in vitro, in the absence of other proteins.We used in vitro DNA affinity purification and sequencing (IDAP-Seq), an approach that is an in vitro analog of chromatin immunoprecipitation or affinity purification followed by deep sequencing, ChIP-Seq or ChAP-Seq, respectively.In IDAP-Seq, purified his-tagged protein (DnaA-his) is mixed with genomic DNA, the DNA bound to protein is isolated by affinity purification, and the bound DNA is analyzed by high throughput DNA sequencing.This approach has been used to analyze DNA binding by the transcriptional regulator CodY [25][26][27].
Using IDAP-Seq, we defined individual DnaA binding sites throughout the genome, and compared their affinity for ATP-DnaA and ADP-DnaA over a range of DnaA concentrations.We generated a position specific scoring matrix (PSSM) that can be used to predict DnaA binding sites with improved accuracy compared to a simple consensus sequence.The vast majority of sites bound in vitro have not been observed in vivo, suggesting that the amount of DnaA in vivo is limiting, or that other factors prevent binding at these sites under normal growth conditions.Conversely, we found some sites that were bound by DnaA in vivo, but that were not bound in vitro, indicating that at least one additional factor was involved in binding in vivo.We found that the nucleoid associated protein Rok was required for DnaA to bind to these chromosomal regions in vivo.Our results demonstrate that IDAP-Seq data can be used to understand and compare inherent binding properties of different forms of a given protein under defined in vitro conditions.In addition, comparison of IDAP-Seq data with data from in vivo analyses can provide insights into binding regions that require additional factors in vivo.

Results and Discussion
Overview of in vitro DNA affinity purification and sequencing (IDAP-Seq) The overall goal of our experiments was to identify all regions in the B. subtilis genome capable of binding DnaA, and to compare the binding properties of ATP-DnaA and ADP-DnaA to these regions to gain a better understanding of how DnaA binding is regulated.We incubated various concentrations of purified functional C-terminal hexa-histidine tagged DnaA (DnaAhis) with either ATP or ADP, and mixed each nucleotide-bound form of DnaA-his with purified sheared genomic DNA from B. subtilis.The ATPase activity of DnaA was very low under these conditions, hydrolyzing 2.2 ± 0.6 moles of ATP per mole of DnaA per hr (n = 6).This rate of hydrolysis would cause a small (0.2%) decrease in the concentration of ATP over the course of the incubations.Given the rapid rate of exchange between ATP and ADP [23], we expect that after addition of ATP most of the DnaA will be in the ATP-bound form.Genomic DNA with a uniform copy number along the chromosome and nearly random shearing pattern was used in these experiments, facilitating quantitative analysis of the data (see S1 Text).We isolated DnaA-his along with any bound DNA fragments, without crosslinking, using affinity resin for the hexa-histidine tag.We analyzed binding over a range of DnaA concentrations to compare binding at different chromosomal regions and to estimate apparent binding constants.The identity and amounts of genomic regions bound by DnaA were determined by deep sequencing (IDAP-Seq).

Binding of ATP-DnaA-his to genomic DNA in vitro
Using IDAP-seq, we identified the chromosomal regions that had increased binding by DnaA within the range of 55 nM to 4.1 μM DnaA.We found that the number of chromosomal regions bound and the amount of binding to individual regions increased with increasing concentrations of ATP-DnaA-his (Fig 1).There were no specific chromosomal regions recovered in control reactions with no added DnaA, as assessed by the distribution of sequencing reads over the genome (Fig 1A).In contrast, there were eight chromosomal regions predominantly associated with 55 nM ATP-DnaA-his following affinity purification (Fig 1B).These regions were the same as the major DnaA binding regions previously defined in vivo [8,9,12,13,28].They have a greater number of DnaA boxes than the other regions detected in vitro that required higher concentrations of DnaA for binding.As the concentration of ATP-DnaA-his was increased (55 nM; 140 nM; 550 nM; 1.4 μM; 4.1 μM), binding to the eight predominant regions increased and appeared to become saturated (Fig .In addition, binding to many other regions was detected and increased with increasing concentrations of ATP-DnaA-his.Confirmation that binding was mediated by the DnaA-binding domain of DnaA was obtained for six of the regions, spanning a wide range of affinities, by performing a parallel assay with a mutant DnaA (DnaAΔC-his) that is missing the DNA binding domain (S2 Fig) .We identified 269 chromosomal regions that were bound by 1.4 μM ATP-DnaA-his (S1 Fig and S1 Table ).This list includes all the regions that were bound at lower concentrations of DnaA, and also those that had increased binding at 4.1 μM DnaA.There was an approximately 300-fold difference in the amount of DNA detected from the weakest bound regions compared to the strongest sites at 1.4 μM ATP-DnaA-his, the second highest DnaA concentration tested.There were many additional regions bound at 4.1 μM ATP-DnaA-his, the highest concentration tested, that were not detected at the lower concentrations (Fig 1F).Because the amount of binding at these regions was low and was not detected at other concentrations of DnaA, they were not included in the list of binding regions (S1 Table ).
The number of binding regions (269) determined in vitro is significantly greater than the previously known binding regions (eight) determined in vivo.Because several different analyses of in vivo binding did not detect these additional regions [8,9,12,13,28], it seems unlikely that all, or even most, of these 269 regions are occupied by DnaA in vivo.The much larger number of regions bound by DnaA in vitro could be due to a combination factors, including the much higher sensitivity of the in vitro system, the possibility that the amount of DnaA in vivo is limiting, and the fact that DnaA binding is regulated in vivo.

Analysis of binding regions at single nucleotide resolution
We used the IDAP-Seq data to visualize binding by DnaA at single nucleotide resolution (Figs 2 and 3 and S1).In these analyses, the number of sequence reads starting at a specific nucleotide position was determined, and the reads were extended and summed to generate a curve indicative of total binding.If a specific DNA sequence is required for binding, then no sequence reads should start in that region.At 1.4 μM ATP-DnaA-his, the binding patterns for the strongest binding regions with multiple DnaA boxes were complex (Fig 2A -2H), often having  (A-J), plus one weaker binding region (K, L), are displayed, using binding data at 1.4 μM ATP-DnaA-his.The bottom section of each panel shows the genomic coordinates (using the AG1839 genomic sequence), the positions of each gene (gray rectangles), and the gene names.Arrowheads indicate the direction of transcription.The middle section of each panel is a histogram of the number of sequence reads that start at each nucleotide (blue, above the line, for sequence reads mapping to the top strand; green, below the line, for sequence reads mapping to the bottom strand).The red circles indicate DnaA boxes, predicted using the PSSM described in this paper.The top section of each panel is a plot of the amount of DNA recovered (as inferred from the sequence data) versus genome position, using 1.4 μM ATP-DnaA-his (black line) or no DnaA (red lines).The amount of DNA recovered is scaled to a global maximum of 1, as described for Fig 1 .A-J.The 10 strongest binding regions, corresponding to peaks #1-10 in Table     plateaus and multiple peaks of read start sites.In contrast, other regions with fewer DnaA boxes typically had a single, well-defined peak on each strand (Figs 2I-2L and 3A-3D).
Analyses and visualization of the binding regions at single nucleotide resolution provided insights into the requirements and contributions of individual DnaA boxes.A characteristic almost symmetric pattern of sequence reads beginning on either side of an otherwise "bare" region was indicative of a simple binding region containing a single DnaA box (Fig 3A and 3B).This footprint-like region defines the binding site and can be used to determine binding sites for uncharacterized proteins.Similarly, a larger bare region flanked by sequence reads on opposite strands was indicative of two binding sites, both of which appear to be required for DnaA to bind the region (Fig 3C and 3D).In more complex regions (e.g., Fig 3E), some DnaA boxes (numbered 1 and 2) appeared to make partial contributions to binding, as evidenced by an abrupt decrease in, but not a complete elimination of, reads at the junctions of the DnaA boxes.In contrast, DnaA boxes 3 and 4 appeared to be required for binding, since no sequence reads started in or between them.
The strongest binding regions contain arrays of DnaA boxes, and had complex binding patterns (Fig 2A -2H and 3F and 3G).In addition to DNA fragments that contained the complete array of DnaA boxes, fragments were also efficiently recovered that had one end within the array and therefore contained only a subset of the DnaA boxes.The requirement for specific DnaA boxes varied with the DnaA concentration.For example, in the sda promoter region, DnaA boxes 4, 5, 6, and 7 (Fig 3F ) were required for binding at the lowest concentration (55 nM) of ATP-DnaA-his tested.However, at the highest ATP-DnaA-his concentration (4.1 μM), fragments were efficiently recovered as long as they contained either DnaA boxes 1 and 2 or DnaA boxes 6 and 7 (Fig 3G).The finding that DnaA boxes 1 and 2 contribute to binding is consistent with in vivo results showing that these sites are important for full activation of transcription of sda by DnaA, and that a mutation in either of these individual DnaA boxes causes a reduction in sda expression [7].
The single nucleotide resolution afforded by IDAP-seq is somewhat reminiscent of the resolution obtained with DNA footprinting.Published footprinting data for DnaA binding to B. subtilis DNA is available for two sites: the dnaA promoter region, and the region upstream of the DUE [30].About half of the DnaA boxes observed by footprinting of these regions were directly supported by our IDAP-seq data.For the remaining footprinted sites, it was not possible to determine whether or not they were bound in our assay.This is because the IDAP-seq method is more analogous to a single nucleotide truncation analysis than footprinting, and in regions that contain arrays of DnaA boxes (including the dnaA promoter and the DUE), removal of one DnaA box from the end will not always give a robust change in DNA recovery if its contribution is small compared the remaining DnaA boxes.

Position-specific scoring matrix for DnaA binding sites
We used a subset of the binding data to establish a position-specific scoring matrix (PSSM) that predicted DnaA binding sites more effectively than a simple consensus sequence.When we searched for potential DnaA binding sites based on the consensus sequence 5'-TTATN-CACA-3' [31] we found that restricting the search to one mismatch was overly stringent, whereas allowing two mismatches from the consensus resulted in an excessive number of predicted DnaA boxes throughout the genome.A PSSM was developed to more accurately predict the binding sites observed in our experiments.
We derived a PSSM from 150 DnaA box sequences (S2 Table ) that were observed to bind DnaA in our IDAP-Seq data.Whereas the PSSM (Fig 4A and S3 Table) was consistent with the previously determined consensus sequence, it provided a more sensitive measure of the extent to which mismatches are tolerated at each position.We predicted a total of 11,353 DnaA boxes over the whole genome using the PSSM, and in general these putative DnaA boxes were more closely correlated with binding than boxes predicted using the consensus sequence with up to two mismatches (e.g., Fig 4B -4E).In many instances, the PSSM identified functional DnaA boxes that had three mismatches from the consensus (asterisks in Fig 4D and 4E).In contrast, using the consensus sequence and allowing three mismatches predicted one DnaA box every 17 bp of genomic sequence.The vast majority of the 269 regions bound by ATP-DnaA-his contained at least one DnaA box centered at the peak summit or two DnaA boxes flanking the peak summit (S1 Fig and S4 Table ).A total of 784 predicted DnaA boxes were found within 150 bp of the summits of these 269 binding regions.A Pearson's correlation coefficient of 0.74 was observed between the amount of binding observed at each region and how well the DnaA boxes in that region matched the PSSM (S4 Fig).
Many additional predicted DnaA boxes were bound at 4.1 μM ATP-DnaA-his, and others might require yet higher DnaA concentrations to bind.In addition, there are almost certainly some predicted DnaA boxes that are not functional, either because of limitations in the PSSM itself, or due to that fact that the PSSM approach does not take into consideration other factors

Apparent binding constants for several regions throughout the genome
We used the IDAP-Seq data to estimate apparent dissociation constants for several of the 269 regions bound at 1.4 μM ATP-DnaA-his.We plotted the amount of DNA recovered as a function of ATP-DnaA-his concentration for each of the 269 binding regions (Figs 5 and S1).For each region the nucleotide position with the maximum amount of DNA (peak summit) was identified, and the relative number of DNA fragments that spanned that position was determined at each DnaA concentration (Materials and Methods; S3 Fig) .Genomic DNA (~300 μM of base pairs) was used in the binding reactions, providing an excess of non-specific DNA for competition in binding to DnaA.For regions with high affinity, the amount of DNA recovered became saturated as the ATP-DnaA-his concentration was increased (Fig 5A -5H).For regions with intermediate affinity, the amount of DNA recovered appeared to begin to saturate at the highest concentration of ATP-DnaA-his (Fig 5I and 5J).For the remaining regions, the amount of DNA recovered increased with the concentration of ATP-DnaA-his, but did not reach saturation (e.g., Fig 5K and 5L).Evaluating relative binding for the weaker binding regions can be done by comparing the relative amount of DNA recovered at 1.4 or 4.1 μM DnaA (S1 Table ).DnaA binding to almost all regions was cooperative, consistent with the positive cooperative binding previously observed for B. subtilis DnaA [22][23][24]30].
The apparent binding constants determined by IDAP-seq for the eight high affinity binding sites ranged from 0.13-0.33μM (Table 1, S1 Table ), and were several-fold greater than those determined previously by gel mobility shift assays [22,23].Although IDAP-seq measurements have inherently higher error owing to the complex experimental protocol, the higher K d 's can be attributed, in large part, to washing the bound complexes in the IDAP experiments and the absence of caging effects from gel electrophoresis.In gel mobility shift assays, the complexes are loaded directly on a gel, where caging effects of the gel matrix stabilize binding.Other The two binding regions in oriC are listed first, and subsequent peaks are ordered by the amounts of DNA recovered at 1.4 μM ATP-DnaA-his.For peaks in intergenic regions, both flanking genes are indicated.For peaks located within genes, only that gene is indicated. 2 The numbers presented are the genome coordinates for the summit of each peak.The genome sequence used was from lab strain AG1839 [29] with 1 being the nucleotide 409 bp upstream of the dnaA open reading frame. 3 The number of putative DnaA binding sites (DnaA boxes) in a 300 bp window centered on the peak summit, as determined using the PSSM (Materials and Methods).
factors, including the heterogeneous nature of the DNA template for binding, the presence of multiple binding regions with similar affinities, and the excess of competitor DNA in the IDAP-Seq experiments, likely also contribute to the higher apparent K d 's determined with IDAP-Seq compared to those determined by gel shift assays.These differences between IDAP--Seq and gel mobility shift assays likely also affect estimates of cooperativity.
Comparisons of binding by ATP-DnaA-his to that of ADP-DnaA-his We found that the overall binding patterns for ADP-DnaA-his (S5  A few regions had a strong preference for ATP-DnaA-his (Fig 6A).Among the eight high affinity regions, the most dramatic differences between ATP-DnaA-his and ADP-DnaA-his were observed in the sda promoter region and the region between the 3' ends of gcp and ydiF (Fig 6A and 6B and 6C).Approximately 50-fold more DNA from the sda promoter region was recovered with 55 nM ATP-DnaA-his than with 55 nM ADP-DnaA-his.For the region between gcp and ydiF, this difference was 16-fold.The differences between ATP-and ADP-DnaA-his diminished at higher DnaA concentrations as binding became saturated.
Large differences between binding by ATP-DnaA-his and ADP-DnaA-his were also observed for weaker binding regions.For example, there was detectable binding to yhcN by ATP-DnaA-his at a concentration of 140 nM, whereas binding by ADP-DnaA-his was not detected until 550 nM (Fig 6A and 6D).At 550 nM DnaA, there was 73-fold more yhcN DNA bound to ATP-DnaA-his compared to ADP-DnaA-his.Similarly, there was 24-fold more yhdF bound to 550 nM ATP-DnaA-his compared to ADP-DnaA-his (Fig 6A and 6E).Although we cannot be certain that homogeneous DnaA-ATP or DnaA-ADP was present in the respective reactions, if heterogeneity did exist, it would cause an underestimate of the differences between DnaA-ATP and DnaA-ADP.
The basis for some DnaA sites exhibiting much higher affinity for ATP-DnaA than ADP-DnaA is almost certainly due to a combination of factors, including the sequence, orientation and spacing of the DnaA boxes, and the sequences flanking the DnaA boxes.There are not enough regions to define the features that contribute to the large differences.Rather, the four characterized regulators of replication initiation in B. subtilis, YabA [22,33,34], Soj [24], SirA [35][36][37], and DnaD [23,34], all affect the ability of DnaA to bind DNA.Our findings that DnaA binding to the oriC region is not particularly sensitive to the nucleotide bound state of DnaA are consistent with the emerging view that regulation of nucleotide hydrolysis and exchange may not play a predominant role in the regulation of replication initiation in B. subtilis, in contrast to the regulation in E. coli [6,38].
sda.Whereas the activity of DnaA during replication initiation appears to be regulated primarily at the level of oligomerization (rather than nucleotide binding) this is not necessarily the case when DnaA functions as a transcription factor.Our finding that ATP-bound DnaA binds to the sda promoter region much more strongly than the ADP-bound form may point to an important role for bound nucleotide in transcriptional regulation at this locus, and perhaps other loci where DnaA functions as a transcriptional regulator.
The sda gene product inhibits activation of Spo0A [7,10,11], a transcription factor that functions during stationary phase and sporulation.DnaA activates transcription of sda in vivo, thereby inhibiting entry into sporulation.Because the Sda protein is highly unstable [39], changes in transcription of sda will result in rapid changes in Sda protein levels.
Our findings indicate that transcription from the sda promoter may be highly sensitive to the relative amounts of ATP-and ADP-DnaA.These changes could occur due to modulation of the nucleotide hydrolysis and exchange activities of DnaA, or due to changes in the cellular ratio of ATP and ADP.Further studies will be needed to determine whether these potential mechanisms are important in vivo for DnaA/Sda-mediated activation of stationary phase and sporulation gene expression pathways.

Comparisons of DnaA binding in vitro and in vivo
Our in vitro data on DnaA binding provides a framework for interpreting in vivo DnaA ChIP results, and vice versa.We anticipated three general types of findings when comparing in vitro to in vivo binding by DnaA: 1) binding is detected both in vitro and in vivo; 2) binding is detected in vitro but not in vivo; and 3) binding is detected in vivo, but is not detected in vitro.
Of the 269 binding regions identified in vitro at 1.4 μM ATP-DnaA-his, only the eight strongest binding regions have been readily detected in vivo [8,9,12,13].The next strongest binding regions in vitro were within the open reading frames of codV (encoding a homologue of the tyrosine recombinase XerC), and rplB (encoding ribosomal protein L2) (Table 1).We estimated that the in vivo concentration of DnaA is ~1-2.5 μM in cells growing exponentially in minimal glucose medium at 30°C.The amount of binding at rplB in vitro at the 1.4 and 4.1 μM ATP-DnaAhis concentrations is 28-46% that observed for the eight sites that are readily observed in vivo.If no other factors affect binding, then this indicates that DnaA could bind rplB in a significant fraction of cells.Instead we found no detectable DnaA binding to rplB in vivo using ChIP-PCR.
We suspect that there are factors in vivo that prevent DnaA from binding to the site within rplB.For example, since the binding site is within the rplB open reading frame, it is possible that transcription prevents stable association of DnaA with the site.Alternatively, the concentration of available DnaA might be limited by titration due to efficient binding at other regions [e.g., 40].It is also possible that there is some binding in vivo, but that it is below the limit of detection of the ChIP-PCR assay, or that binding occurs under biological conditions that we have not assayed.

DnaA depends on Rok in vivo to bind to some chromosomal regions
We used ChIP-PCR to measure DnaA binding in vivo at four regions that we observed to bind DnaA in preliminary in vivo ChIP-seq experiments.These regions had not been identified in previously reported in vivo ChIP experiments with DnaA [8,9,12,13], perhaps due to lower sensitivity of the detection methods.These are all intergenic regions (between: ywiB-sboA, yuzB-yutJ, yjcM-yjcN, and icsS-braB) and contain promoters in one or both directions [41].We found that DnaA was consistently associated with these regions in ChIP-PCR experiments (n = 6; Table 2), with mean fold enrichment values ranging from 4.5 (iscS) to 12.8 (sboA), compared to 83-fold for the dnaA promoter, a control site that is readily detected in vivo [8,9,12,13,28].Interestingly, none of these four regions bound DnaA in vitro in our IDAP-seq experiments, even at the highest concentration of DnaA tested.Furthermore, only the sboA region has a recognizable DnaA box near the in vivo binding site.The simple interpretation of these results is that there is a factor needed for binding in vivo that is not present in the purified in vitro binding reactions.Because three of these four regions bound by DnaA in vivo but not in vitro were previously found to be bound in vivo by the nucleoid-associated protein Rok [42], we tested whether Rok might be required for DnaA binding at these regions in vivo.In a rok null mutant, there was much less association of DnaA with these regions compared to wild type cells (Table 2).DnaA protein levels are not substantially different in rok null mutant cells (S6 Fig) , indicating that the loss of binding was not due to a decreased amount of DnaA.Consistent with this, binding at the dnaA promoter, which has 12 DnaA boxes and binds DnaA robustly in vitro, was not significantly different between wild type and rok mutant cells (Table 2).These results indicate that Rok is required for association of DnaA with these chromosomal regions in vivo.

General utility of IDAP-Seq approach
The IDAP-Seq approach used here to study DNA binding by DnaA should be a useful for determining the inherent binding properties of many different proteins, and the effects of various ligands, such as the ATP and ADP comparison presented here.It should also be possible to determine the effects of mutations, phosphorylation, acetylation, and other modifications on DNA binding.The effects of regulatory proteins that modulate binding could also be evaluated, provided the two proteins do not have the same tag.In all cases, apparent binding constants can be determined and compared for sites throughout the entire genome.
Experiments to date with DnaA and CodY [25][26][27] used a his-tagged version of the DNA binding protein, with DNA-protein complexes recovered by metal affinity chromatography.Other tags could potentially be used for purification of complexes.In preliminary experiments, we found that filter binding, which does not require a tag, can also be used to recover DNAprotein complexes.Antibodies could also be used to recover DNA-protein complexes with or without a tag.Our experiments were performed without crosslinking, but for weaker binding proteins, where initially bound fragments might be lost in wash steps, crosslinking protein to DNA could be used in a more standard ChIP-type experiment, either with immunoprecipitation, affinity purification, or filter binding of crosslinked protein-DNA complexes.Comparing in vitro studies of genomic binding using the IDAP-Seq method with more traditional in vivo ChIP experiments should provide valuable clues about how the activities of DNA binding proteins are modulated in cells.

Materials and Methods
Purification of hexa-histidine-tagged DnaA, preparation of sheared genomic DNA, refinement of the catalog of DnaA binding regions, and development of the DnaA box PSSM are described in S1 Text.

DnaA-his
B. subtilis DnaA with the amino acids AAALEHHHHHH added to the C-terminus of DnaA was purified from E. coli (S1 Text).Final DnaA-his preparations were typically 98% pure.A similarly his-tagged DnaA (with 12 instead of six histidines) is functional in vivo [9].
Each reaction was mixed with 100 μl Talon Co + resin (Clontech) pre-equilibrated with equilibration/wash buffer (40 mM HEPES-KOH, pH 7.6, 150 mM potassium glutamate, 2.5 mM ATP or ADP, 10 mM magnesium acetate, 50 μg/ml BSA, 20% glycerol) and rotated for 30 min at room temperature.Each mixture was transferred to a Poly-Prep column (Bio-Rad, Hercules, CA), and washed three times with 1 ml equilibration/wash buffer, with care taken to ensure that all washes were done under virtually identical conditions.Complexes of DnaA-his bound to DNA were eluted by adding 0.5 ml ChIP elution buffer (50 mM Tris-HCl, pH 8.0, 10 mM EDTA, 1% SDS), capping the bottoms and covering the tops of the columns tightly with foil, and incubating at 65°C for 15 min.The eluate was collected, and the resin was washed twice with 200 μl ChIP elution buffer to recover all of the eluted DNA.The recovered DNA was purified using a QiaQuick PCR purification kit (Qiagen).

DNA sequencing
Sample preparation, including incorporation of a 3' barcode, selection of 200-400 bp fragments (after addition of adaptors and amplification), and single read sequencing (40 nt) on an Illumina HiSeq were performed by the MIT BioMicro Center.

Seq data processing and peak calling algorithm
Alignment of DNA fragments bound by DnaA-his to the genome of AG1839 (a.k.a., KPL69; GenBank accession number CP008698) [29] was performed using Bowtie [45], with adjustments to compensate for the fact that the chromosome is circular.Peak calling on the 1.4 μM and 4.1 μM ATP-DnaA-his data was done using cisGenome v. 2.0 [46], and in some cases PeakSplitter [47], and visualized in the genome browser MochiView [48] for manual refinement (see S1 Text for details).The genome position of the summit of each peak was determined using data from the 4.1 μM ATP-DnaA-his binding reaction, because the peaks (especially the weaker ones) were better defined at this DnaA concentration.Seq data are available at NCBI under accession SRX648534.

Quantitation of binding and determination of apparent binding constants
To determine the amount of DNA bound by DnaA-his for each chromosomal region, we determined the number of sequence reads across that region.Each sequence read (mapped to the chromosome using Bowtie) was computationally extended by the estimated average fragment length of 250 base pairs (presented schematically in S3A and S3B Fig) .The relative coverage at each bp along the chromosome was obtained by summing the number of fragments on both the positive and negative strands that are inferred to span that position (S3C Fig) .Custom R scripts were used for these steps.The resulting coverage map allowed different regions along the chromosome to be compared for any given sample.
To compare individual loci under a variety of binding conditions (e.g., ATP v. ADP, or at different concentrations of DnaA), we normalized the number of sequence reads (coverage Apparent K d 's were determined using Prism5 (GraphPad Software).Data were fit to the equation y = (B max )(x n )/(K d n + x n ), where B max is maximum binding, x is the DnaA-his concentration, y was the amount of binding observed, and n is the Hill constant.For each binding region, the position of the peak was determined in the 4.1 μM DnaA dataset, and the peak height at the same position was determined for the lower DnaA concentrations and used as the amount of binding.For binding regions that approached saturation, B max was fitted from the binding data.
For several binding regions, B max could be determined for ATP-but not ADP-DnaA-his binding.In these cases, the B max determined for ATP-DnaA-his was used to fit the ADP-DnaA-his data.In all other instances, B max of 0.8 was used to determine an apparent K d .

Annotation of DnaA boxes
DnaA boxes in the B. subtilis genome were annotated using the PSSM generated as part of this study (S1 Text).This PSSM was used to search the genome sequence of AG1839 genome using RSAT [49] with a p-value cutoff of 0.0015.Where overlapping DnaA boxes were detected, the one with the higher p-value was discarded.This collection of DnaA boxes was used in all figures and tables.A "DnaA Box Score" for the binding regions was calculated by summing the negative log of the P-value from the PSSM for each binding region.
In vivo DnaA ChIP-PCRs and strains used DnaA binding to specific chromosomal regions in vivo was determined by ChIP followed by quantitative PCR (ChIP-PCR).Wild type (AG174; genotype: trp, phe) and rok null mutant {HM57; genotype: trp, phe, rok::pDG641rok (mls)} cells were grown at 37°C in LB medium.
(The rok null mutation is an integration of plasmid pDG641rok into rok by single crossover, disrupting the rok open reading frame.)Cells in mid-exponential phase were treated with 1% formaldehyde for 20 min to crosslink protein and DNA.Crosslinking was quenched by adding glycine (0.22 M). Preparation of lysates and immunoprecipitations were done essentially as describe previously [50].DnaA was immunoprecipitated with rabbit polyclonal antiserum and the DNA was recovered using a QiaQuick PCR purification kit (Qiagen).Quantitative PCR was performed on a Roche LightCycler 480, with reduced annealing (48°C), extension (68°C), and acquisition (63°C) temperatures to compensate for the low melting temperatures of many of the loci being examined.Fold-enrichments were calculated as described in [51], using nicK, a region in ICEBs1 [52] that does not bind DnaA, for normalization.The primer pairs used are listed in S5 Table.(G) The coverage in panel E, which was calculated using the same number of reads for each sample, was scaled so that the total coverage (summed over the whole genome) is proportional

Fig 1 .
Fig 1. Binding of ATP-DnaA-his to genomic DNA in vitro.The relative amount of binding by ATP-DnaAhis is plotted on the y-axis (normalized so that maximum binding has an amplitude of 1) versus the position along the chromosome on the x-axis.The amount of binding was determined by sequence analysis of the DNA recovered in each binding reaction.Binding data is presented in 200 nucleotide bins, with the maximum binding amplitude in each bin drawn.The 4.2 mb circular chromosome is depicted linearly such that the origin of replication is near the middle of the x-axis.The concentration of ATP-DnaA-his in each binding reaction was (A) no DnaA; (B) 55 nM; (C) 140 nM; (D) 550 nM; (E) 1.4 μM; (F) 4.1 μM.The major peaks are numbered (C), and correspond to the following nearby loci: (1) sda; (2) ywlC; (3) ywcI; (4) yydA; (5) consists of 3 adjacent peaks (trmE, dnaA, and between dnaA and dnaN) that are not resolved at this scale; (6) gcp/ydiF.The inset in panel B above the asterisk corresponds to a 7 kb region that contains the trmE, dnaA, and dnaA/N binding regions.doi:10.1371/journal.pgen.1005258.g001

Fig 2 .
Fig 2. IDAP-Seq data for individual regions bound by DnaA.The 10 strongest binding regions (A-J), plus one weaker binding region (K, L), are displayed, using binding data at 1.4 μM ATP-DnaA-his.The bottom section of each panel shows the genomic coordinates (using the AG1839 genomic sequence), the positions of each gene (gray rectangles), and the gene names.Arrowheads indicate the direction of transcription.The middle section of each panel is a histogram of the number of sequence reads that start at each nucleotide (blue, above the line, for sequence reads mapping to the top strand; green, below the line, for sequence reads mapping to the bottom strand).The red circles indicate DnaA boxes, predicted using the PSSM described in this paper.The top section of each panel is a plot of the amount of DNA recovered (as inferred from the sequence data) versus genome position, using 1.4 μM ATP-DnaA-his (black line) or no DnaA (red lines).The amount of DNA recovered is scaled to a global maximum of 1, as described for Fig 1. A-J.The 10 strongest binding regions, corresponding to peaks #1-10 inTable1, S1Table, and S1Fig.(A) upstream of dnaA (part of oriC); (B) between dnaA and dnaN, containing the DNA unwinding element (also part of oriC); (C) upstream of ywcI and vpr; (D) downstream of gcp and ydiF; (E) upstream of trmE and downstream of jag; (F) upstream of ywlC and downstream of ywlB; (G) upstream of yqeG and sda; (H) upstream of yydA, spanning yyzF, and upstream of yycS; (I) within codV; (J) within rplB.K-L.A representative weaker binding region, (peak #49 in S1 Table; S1 Fig).(K) binding scaled to 1 to be comparable to previous panels; (L) the same region rescaled so that the binding pattern is visible.
Fig 2. IDAP-Seq data for individual regions bound by DnaA.The 10 strongest binding regions (A-J), plus one weaker binding region (K, L), are displayed, using binding data at 1.4 μM ATP-DnaA-his.The bottom section of each panel shows the genomic coordinates (using the AG1839 genomic sequence), the positions of each gene (gray rectangles), and the gene names.Arrowheads indicate the direction of transcription.The middle section of each panel is a histogram of the number of sequence reads that start at each nucleotide (blue, above the line, for sequence reads mapping to the top strand; green, below the line, for sequence reads mapping to the bottom strand).The red circles indicate DnaA boxes, predicted using the PSSM described in this paper.The top section of each panel is a plot of the amount of DNA recovered (as inferred from the sequence data) versus genome position, using 1.4 μM ATP-DnaA-his (black line) or no DnaA (red lines).The amount of DNA recovered is scaled to a global maximum of 1, as described for Fig 1. A-J.The 10 strongest binding regions, corresponding to peaks #1-10 inTable1, S1Table, and S1Fig.(A) upstream of dnaA (part of oriC); (B) between dnaA and dnaN, containing the DNA unwinding element (also part of oriC); (C) upstream of ywcI and vpr; (D) downstream of gcp and ydiF; (E) upstream of trmE and downstream of jag; (F) upstream of ywlC and downstream of ywlB; (G) upstream of yqeG and sda; (H) upstream of yydA, spanning yyzF, and upstream of yycS; (I) within codV; (J) within rplB.K-L.A representative weaker binding region, (peak #49 in S1 Table; S1 Fig).(K) binding scaled to 1 to be comparable to previous panels; (L) the same region rescaled so that the binding pattern is visible.
Fig 2. IDAP-Seq data for individual regions bound by DnaA.The 10 strongest binding regions (A-J), plus one weaker binding region (K, L), are displayed, using binding data at 1.4 μM ATP-DnaA-his.The bottom section of each panel shows the genomic coordinates (using the AG1839 genomic sequence), the positions of each gene (gray rectangles), and the gene names.Arrowheads indicate the direction of transcription.The middle section of each panel is a histogram of the number of sequence reads that start at each nucleotide (blue, above the line, for sequence reads mapping to the top strand; green, below the line, for sequence reads mapping to the bottom strand).The red circles indicate DnaA boxes, predicted using the PSSM described in this paper.The top section of each panel is a plot of the amount of DNA recovered (as inferred from the sequence data) versus genome position, using 1.4 μM ATP-DnaA-his (black line) or no DnaA (red lines).The amount of DNA recovered is scaled to a global maximum of 1, as described for Fig 1. A-J.The 10 strongest binding regions, corresponding to peaks #1-10 inTable1, S1Table, and S1Fig.(A) upstream of dnaA (part of oriC); (B) between dnaA and dnaN, containing the DNA unwinding element (also part of oriC); (C) upstream of ywcI and vpr; (D) downstream of gcp and ydiF; (E) upstream of trmE and downstream of jag; (F) upstream of ywlC and downstream of ywlB; (G) upstream of yqeG and sda; (H) upstream of yydA, spanning yyzF, and upstream of yycS; (I) within codV; (J) within rplB.K-L.A representative weaker binding region, (peak #49 in S1 Table; S1 Fig).(K) binding scaled to 1 to be comparable to previous panels; (L) the same region rescaled so that the binding pattern is visible.doi:10.1371/journal.pgen.1005258.g002

Fig 3 .
Fig 3. Identification of DnaA binding sites at single nucleotide resolution.Histograms of the number of sequence reads that start at each nucleotide are plotted as in Fig 2 (blue, above the line, for sequence reads mapping to the top strand; green, below the line, for sequence reads mapping to the bottom strand).Data are from a binding reaction containing 4.1 μM ATP-DnaA-his, except for panel F, where data are from a reaction containing 55 nM ATP-DnaA-his.For each panel, a 300 bp portion of the genome is presented, and the y-axis is scaled so that the data fills the space.Predicted DnaA boxes are represented by red ovals, and are numbered from left to right for regions with more than three DnaA boxes.The gray rectangles below the histograms depict nearby genes, with arrowheads indicating the direction of transcription.The regions presented are: (A) inside cssS; (B) inside ygaN; (C) inside ypiF; (D) inside lpdV; (E) inside yabS; (F) upstream of sda (55 nM ATP-DnaA-his data); (G) upstream of sda (4.1 μM ATP-DnaA-his data).Panel B corresponds to peak #140 (S1 Table and S1 Fig) and panels F and G correspond to peak #7.The other regions did not show sufficient binding at 1.4 μM DnaA to be included in the peak catalog.doi:10.1371/journal.pgen.1005258.g003

Fig 4 .
Fig 4. Comparison of DnaA boxes from a consensus sequence with DnaA boxes from the PSSM.(A) A logo, drawn using WebLogo [32], of the DnaA boxes used to construct the DnaA box PSSM is shown.(B-E) Histograms of the number of sequence reads that start at each nucleotide are plotted (blue for sequence reads mapping to the top strand; green for sequence reads mapping to the bottom strand).Data are from the binding reaction containing 4.1 μM ATP-DnaA-his.For each panel, a 300 bp portion of the genome is presented, and the y-axes are scaled so that the data fills the space.The red ovals on the horizontal axis indicate the position of DnaA binding sites predicted using the PSSM described here.The pink ovals above the horizontal axis indicate DnaA boxes with 2 mismatches from the TTATNCACA consensus sequence, and the maroon ovals have 1 mismatch from the TTATNCACA consensus.(No perfect matches to the consensus are found in these regions.)The gray rectangles below the histograms indicate nearby genes, with arrowheads indicating the direction of transcription.doi:10.1371/journal.pgen.1005258.g004

Fig 5 .
Fig 5. Binding curves for ATP-DnaA-his for selected chromosomal regions.The relative amount of DNA recovered from different chromosomal regions is plotted on the y-axis, versus the ATP-DnaA-his concentration on the x-axis.The curves were obtained from fitting the data to the Hill Equation.Chromosomal locations (nucleotide position in the sequence of AG1839, peak number in S1 Table, and nearby genes) include: (A) 150, #1, upstream of rpmH and dnaA; (B) 1841, #2, downstream of dnaA and upstream of the DUE and dnaN; (C) 3885674, #5, upstream of ywcI and vpr; (D) 627955, #3, downstream of gcp and ydiF; (E) 4191071, #4, upstream of trmE and downstream of jag; (F) 3772105, #6, upstream of ywlC and downstream of ywlB; (G) 2620059, #7, upstream of yqeG and sda; (H) 4113012, #8, upstream of yydA and downstream of yyzF; (I) 1670814, #9, inside codV; (J) 137727, #10, inside rplB; (K) 624773, #40, inside thiL; (L) 1922157, #127, upstream of ynfC and alsT.doi:10.1371/journal.pgen.1005258.g005 Fig) were similar to those for ATP-DnaA-his (Fig 1).At the lowest concentration of DnaA tested, the prominent binding regions were the same eight regions that were bound by ATP-DnaA-his, and the number of bound regions increased at higher concentrations of ADP-DnaA-his (S5 Fig).The amount of binding to any specific region at a given concentration of DnaA was almost always greater with ATP-D-naA-his than with ADP-DnaA-his.This is seen by comparing the ratio of binding (amount of DNA recovered) by ATP-DnaA-his to that by ADP-DnaA-his (Fig 6A).Two regions (the yydA promoter region and the region in oriC between dnaA and dnaN) appeared to have a small preference for ADP-DnaA-his over ATP-DnaA-his, but only at 1.4 μM DnaA (Fig 6A and 6G).

Fig 6 .
Fig 6.Relative DNA binding by ATP-DnaA-his compared to ADP-DnaA-his.(A) The ratio of the amount of DNA recovered bound to ATP-DnaA-his vs. ADP-DnaA-his is plotted versus the DnaA concentration.Background binding was subtracted prior to calculating the ratio.All 269 peaks detected at 1.4 μM were analyzed at each concentration, but the ratio of ATP/ADP binding is shown only if the binding was 1.5-times greater than background for both the ATP and the ADP binding reactions at that DnaA-his concentration.For many weaker peaks, these criteria were met only at the highest concentration of DnaA-his.Solid and dotted lines connect points for the indicated region across concentrations.The black bars indicate the average ratio (ATP-DnaA-his/ADP-DnaAhis) at each DnaA-his concentration.The two data points at 1.4 μM DnaA-his that are slightly less than 1 correspond to the yydA and dnaN regions.(B-G) The relative amount of DNA recovered from different chromosomal regions is plotted on the y-axis, versus the DnaA-his concentration (log scale) on the x-axis.ATP-DnaA-his, open circles and dashed lines (data are the same as shown in parts of Fig 5); ADP-DnaA-his, filled triangles and dotted lines.Genomic coordinates (AG1839 genome), and nearby genes: (B) 2620051, upstream of sda and yqeG; (C) 627955, downstream of gcp and ydiF; (D) 972751, inside yhcN; (E) 1006129, inside yhdF; (F) 150, oriC upstream of dnaA; (G) 1841, oriC upstream of the DUE and dnaN.doi:10.1371/journal.pgen.1005258.g006 The dnaA-dnaN oriC region.The oriC region includes two clusters of DnaA binding sites: one in the dnaA promoter region (Fig 2A), and the other between dnaA and dnaN (Fig 2B), just upstream of the DNA unwinding element (DUE).The difference between the nucleotide bound forms for these oriC binding regions are fairly modest-a maximal difference is seen at 140 nM DnaA-his, where 3-4 times more DNA is bound with ATP compared to ADP (Fig 6A and 6F and 6G).It is likely that in vivo, DnaA is bound to the oriC sites regardless of whether DnaA is in the ATP or ADP bound form.Thus far, none of the known regulators of replication initiation in B. subtilis affect nucleotide binding, exchange, or hydrolysis by DnaA.
map amplitudes) to the total amount of DNA recovered in each reaction (S1 Text, S3D-S3G Fig).The amount of DNA that was recovered in each sample increased with increasing amounts DnaA (S3F Fig), due to 1) increases in background binding, 2) increases in binding at regions that have not yet reached saturation, and 3) binding at new weaker binding regions.Measurements of the maximum binding following normalization vs.DNA concentration gave data that could be fit to a binding curve (S3H Fig).
dashed vertical red line).The labeled x-axis indicates genomic coordinates from strain AG1839.Top the overall amount of binding inferred from the sequence data.black curve, binding with 4.1 μM DnaA; red curve, binding with no added DnaA.Relative binding (y-axis) was normalized to a global maximum of 1 at 1.4 μM ATP-DnaA.Middle left: a histogram of the number of sequence reads (from 4.1 μM ATP-DnaA-his) that start at each nucleotide; blue, sequence reads mapping to the top strand; green, sequence reads mapping to the bottom strand.Red circles indicate potential DnaA binding sites predicted using the PSSM described in this paper.Bottom left: genes with arrowheads indicating the direction of transcription.Top right: sequence of each of the putative DnaA boxes identified by the PSSM and shown in the middle left.For regions with >5 putative DnaA boxes, the complete list is in S5 Table).Bottom right: binding curves plotting the amount of DNA recovered as a function of the concentration of DnaA-his.ATP-DnaA-his, open circles and dashed lines; ADP-DnaA-his, filled triangles and dotted lines.(PDF) S2 Fig.The C-terminal DNA binding domain of DnaA is required for association of DnaA with chromosomal regions in vitro.Binding reactions were performed under the same conditions as for DnaA-his, except that 4.1 μM DnaAΔC-his was used.DnaAΔC lacks the C-terminal 91 amino acids that are required for DNA binding.A reaction containing full-length DnaA-his was performed in parallel.The binding reactions contained 2.5 mM ATP.The recovered DNA was assayed using qPCR, with the primers indicated in S5 Table.The following loci were assayed (peak numbers refer to those in S1 Fig and S1 Table): cotH (peak #198), ypfD (peak #235), yphF (too weak to be called as a peak at 1.4 μM DnaA but clearly discernible at 4.1 μM), ydiO (peak #250), rplB (peak #10), dnaA (peak #1), and nicK, a control region that does not bind DnaA.(TIFF) S3 Fig. Method for quantitating binding data over a range of DnaA-his concentrations.In panels A-C, a schematic representation using a toy dataset shows how deep sequencing data were converted to coverage along the chromosome.(A) start positions of sequence reads are plotted as histograms, and are shown clustered around a DnaA binding site depicted by the red dotted line.(B) Each read was extended in the appropriate direction (rightward for reads corresponding to the top strand, and leftward for reads corresponding to the bottom strand) by the average fragment length of 250 bp.(C)The number of fragments containing each nucleotide along the genome is determined, yielding the relative coverage along the genome.Although this allows for comparison between different genomic loci in the same binding reaction, it does not support comparison between different binding reactions (i.e., comparing ATP and ADP, or comparing different concentrations of DnaA-his.)(D) Actual sequence data from the sda promoter region from samples containing the indicated concentrations of ATP-DnaA-his.The y-axis scale for each of the samples is the same.The same total number of reads was mapped for each binding reaction, but the number of reads mapping to the sda promoter region (and other high-affinity DnaA binding regions) decreased at the two highest concentrations of DnaA-his.This is because at these DnaA concentrations, binding to sda has already saturated, while an increasing portion of the reads map to weaker binding regions, and there is also an increase in background binding.(E) The relative coverage along the same region as in D, obtained by extending the reads by the average read length and summing the number of extended reads spanning each position, as depicted in A, B, and C. (F) The amount of DNA recovered in each binding reaction (prior to any preparation steps for deep sequencing) was determined.

Table 1 .
Top 20DnaA binding regions in vitro.

Table 2 .
Effects of Rok on binding by DnaA in vivo 1 .
indicated chromosomal regions were done as described (Materials and Methods).Numbers indicate the fold enrichment of the indicated chromosomal regions in the immunoprecipitate compared to a control chromosomal region ± standard deviation from six independent cultures.P values are from a one-sided Student's t-Test.The effects of rok on DnaA binding are significant at all loci indicated, except for dnaA.doi:10.1371/journal.pgen.1005258.t002