A Regulatory Hierarchy Controls the Dynamic Transcriptional Response to Extreme Oxidative Stress in Archaea

Networks of interacting transcription factors are central to the regulation of cellular responses to abiotic stress. Although the architecture of many such networks has been mapped, their dynamic function remains unclear. Here we address this challenge in archaea, microorganisms possessing transcription factors that resemble those of both eukaryotes and bacteria. Using genome-wide DNA binding location analysis integrated with gene expression and cell physiological data, we demonstrate that a bacterial-type transcription factor (TF), called RosR, and five TFIIB proteins, homologs of eukaryotic TFs, combinatorially regulate over 100 target genes important for the response to extremely high levels of peroxide. These genes include 20 other transcription factors and oxidative damage repair genes. RosR promoter occupancy is surprisingly dynamic, with the pattern of target gene expression during the transition from rapid growth to stress correlating strongly with the pattern of dynamic binding. We conclude that a hierarchical regulatory network orchestrated by TFs of hybrid lineage enables dynamic response and survival under extreme stress in archaea. This raises questions regarding the evolutionary trajectory of gene networks in response to stress.


Introduction
All organisms encounter reactive oxygen species (ROS) originating from biotic and abiotic sources. ROS are produced at relatively low levels as natural byproducts of aerobic respiration, Fenton reactions, or other biotic sources [1,2]. In contrast, abiotic sources include environmental toxins such as solar UV radiation, pollutants, and excessive metals, which damage macromolecules [3]. In each case, oxidants must be neutralized and macromolecular damage repaired at the cellular level to enable survival. Enzymes such as superoxide dismutase and thioredoxin reductase are induced to neutralize oxidants and restore redox balance in the cell [4]. The production of these oxidant response proteins is typically transient and precisely controlled to enable rapid restoration of homeostasis following oxidant clearance and damage repair [5]. Such regulation is accomplished by a diversity of strategies throughout the microbial world. For instance, complexes of transcription factor (TF) proteins coordinate ROSinduced cell cycle block with production of repair enzymes in yeast [6]. In bacteria, TFs [7,8] or their bound cofactors [9,10] are directly and reversibly oxidized in the presence of ROS, altering DNA binding specificity to induce repair enzyme-coding genes [5,11].
Relative to the other domains of life, the function of TFs that control the oxidant response in archaea remain understudied. To our knowledge, only a few transcription factors have been characterized to date [12][13][14][15][16]. Generally, components of archaeal transcription complexes are hybrid between the bacterial and eukaryal domains. For example, the basal transcriptional machinery in archaea, like that of eukaryotes, consists of transcription factor II B (Tfb), a TATA binding protein (TBP), and an RNA-Pol II-like polymerase [17]. The proteins that modulate transcription (e.g. stress-responsive TFs) typically resemble those of bacteria at the amino acid sequence level [18]. This class of TFs, like those of bacteria, can sense stressors or metabolites directly [14,19,20]. Recent evidence also suggests that these ''bacterial-like'' TFs can bind together on DNA combinatorially to expand their repertoire of gene regulation [21,22]. Machine-learning efforts to reconstruct gene regulatory networks in archaea also suggest combinatorial regulation [16,23,24]. More generally, it remains an open question how networks of transcription factors interact dynamically to enact genome-scale regulation during stress response across the domains of life.
Here we use the salt-loving archaeon H. salinarum as a model, both to characterize the genome-wide binding dynamics of an ROS-responsive transcription factor, and to analyze regulatory network function during ROS stress in archaea. This hypersaline adapted archaeal model organism encounters high levels of abiotic oxidants in its natural salt lake environment, where intense solar radiation and desiccation are frequent [25]. Halophilic archaea use several complementary strategies to protect against, respond to, and repair damage induced by ROS. These include the natural protective capacity of cytoplasmic salt inclusions [26], multiple copies of repair enzymes [27], and an extensive transcription regulatory network that has been hypothesized to respond to oxidative damage [16].
However, this network was computationally inferred from gene expression data. To experimentally characterize TFs with putative involvement in this network, our previous work identified the winged helix-turn-helix DNA-binding TF RosR. This TF dynamically regulates expression of more than 300 genes in response to oxidative stress in H. salinarum [13]. RosR is required for survival of oxidants from multiple sources (e.g. H 2 O 2 and paraquat). Genes directly and indirectly controlled by RosR in response to oxidant encode macromolecular repair functions. In the current study, we ask which of these genes are direct targets of RosR regulation. Integrated analysis of genome-wide binding location time course data with gene expression data demonstrates that RosR binds and regulates over 100 target genes. These encode molecular repair functions and a surprisingly high number of other TFs. RosR binds many of these sites in the absence of stress. Upon exposure to H 2 O 2 , RosR disengages from DNA at most loci. However, at other loci, RosR-DNA binding is dynamic following peroxide exposure, with locus-specific differences in TF occupancy over time. RosR binding is mediated via a 20-bp palindromic cisregulatory binding sequence. Integration of data generated here in the context of other existing systems biology datasets reveals extensive combinatorial binding of RosR with multiple Tfb proteins throughout the regulon. We conclude that RosR is a master regulator of a hierarchy of TFs that performs global, dynamic physiological readjustment in response to oxidative stress.

Conditional ChIP-chip time course experiments reveal dynamic patterns of RosR binding
Previous work demonstrated that the RosR transcription factor is required for the differential expression of genes in response to ROS [13]. To differentiate direct from indirect targets of RosR transcriptional regulation, we mapped DNA binding locations genome-wide in the presence and absence of H 2 O 2 over time (see Methods). A total of 189 regions (252 genes, including operons and divergently transcribed genes) were significantly enriched for RosR binding throughout the genome in the absence of stress, with fewer sites bound over time upon exposure to H 2 O 2 (Fig. 1A, S2 Table). Upon clustering, four major RosR-DNA binding profiles were detected: (1) nearly one-third of sites (88 genes) is significantly enriched for RosR binding under standard, non-stress conditions (  Table). Dynamic binding patterns for representative loci were validated by ChIP-qPCR as shown in Fig. 2B (cluster 2 Spearman correlation = 0.4; cluster 3 C s = 0.8). RosR binding ability in the absence of stress (clusters 1 and 4 at the 0 time point) was previously validated by ChIP-qPCR [13]. Together, these experiments suggest that RosR-DNA binding distributions are dynamic and reproducible genome-wide over time in response to oxidant treatment.
A large fraction of RosR-DNA binding interaction dynamics is strongly associated with differential gene expression patterns To determine if RosR-DNA binding results in functional consequences in gene expression, we asked whether genes nearby binding loci were also differentially expressed over time. ChIPchip binding profiles were compared to previously published gene expression data from H. salinarum Dura3 parent vs DrosR exposed to oxidative stress over time (0, 10, 20, and 60 min relative to H 2 O 2 addition; [13]). Of the 252 genes (including operon members) within 250 bp of a binding locus, 51 exhibit differential expression in response to H 2 O 2 and/or deletion of rosR when all time points are considered together [13]. To uncover additional putative functional binding events, the correlation of RosR-DNA binding with gene expression was calculated for all 252 genes associated with binding loci. Patterns of RosR binding occupancy nearby 70 genes are strongly correlated with expression profiles (''GE-ChIP correlation'', Cs$ 0.6, Fig. 3A, left graphs). Binding time course patterns at 52 other sites were anticorrelated with gene expression profiles (Cs#20.6, Fig. 3A, right graphs). The remaining sites were uncorrelated,

Author Summary
Complex circuits of genes rather than a single gene underlie many important processes such as disease, development, and cellular damage repair. Although the wiring of many of these circuits has been mapped, how circuits operate in real time to carry out their functions is poorly understood. Here we address these questions by investigating the function of a gene circuit that responds to reactive oxygen species damage in archaea, microorganisms that represent the third domain of life. Members of this domain of life are excellent models for investigating the function and evolution of gene circuits. Components of archaeal regulatory machinery driving gene circuits resemble those of both bacteria and eukaryotes. Here we demonstrate that regulatory proteins of hybrid ancestry collaborate to control the expression of over 100 genes whose products repair cellular damage. Among these are other regulatory proteins, setting up a stepwise hierarchical circuit that controls damage repair. Regulation is dynamic, with gene targets showing immediate response to damage and restoring normal cellular functions soon thereafter. This study demonstrates how strong environmental forces such as stress may have shaped the wiring and dynamic function of gene circuits, raising important questions regarding how circuits originated over evolutionary time.
which suggests that these sites represent non-specific DNA interactions and/or that other factors may be required for significant change in gene expression at these sites [28,29]. The four clusters observed for binding profiles alone were also detected for genes exhibiting strongly correlated or anticorrelated gene expression and binding patterns (Figs. 2 and 3A). Across the distribution of strong GE-ChIP correlations and anticorrelations, deletion of rosR significantly alters the relationship between binding and gene expression, with a trend toward uncorrelated gene expression and binding relationships in this strain (Fig. 3B). Because the time scale of TF-DNA binding is faster than that of transcript synthesis (,1 minute vs .5 minutes, respectively; [30]), binding and expression would appear simultaneous with the resolution of the time course experiments herein (Fig. 3A). Therefore, we reasoned that the relationships between gene expression and binding profiles detected are consistent with RosR activity, with activated genes exhibiting correlated binding and expression, and repressed genes showing anticorrelated binding and expression. Together, these results suggest that: (a) dynamic binding events are strongly associated with a change in gene expression before and/or after oxidant exposure; and (b) RosR is required for direct and dynamic activation or repression of over 100 genes in response to oxidative stress in H. salinarum.

Computational and experimental determination of the cis-regulatory binding sequence
A key component of gene network function is the specific cisregulatory binding sequence for a TF. To provide further support for RosR direct activation and repression of these target genes, we next sought to determine this binding sequence consensus for RosR. In previous work, a putative cis-regulatory sequence was computationally predicted from promoters of genes differentially expressed in response to deletion of rosR (direct and indirect RosR target genes; [13]). This sequence consisted of a 7 bp inverted repeat palindrome with the consensus TCGnCGA. To gain additional refinement in these predictions, the cis-regulatory  Table. (B) Zoom-in of selected binding regions. For each binding site shown, vertical lines represent average enrichment intensity at the binding site predicted from the bootstrapped noise estimation fits to the raw data. Surrounding curves represent model fits to the raw data. Colors for each time point are as in (A). Peaks with solid lines in each region represent those binding sites that pass statistical filtering criteria (see Methods); peaks with dotted lines do not. Gene strand designations are shown in the legend. Identification numbers or names are given for those genes immediately neighboring the binding sites. doi:10.1371/journal.pgen.1004912.g001 sequence search was repeated using only direct RosR targets detected here by binding location analysis (S3 Table). The resultant consensus motif contained a 20 bp imperfect palindrome sequence TCGnCGACGAGnTCGnCGAC (Fig. 4A, p, 3.5610 212 ), which was detected nearby 37 of RosR-bound loci (,15%; p,10 237 ), but not detectable elsewhere in the genome (S5 Table). Some loci contain more than one motif. Of these 37 loci with motifs detected, 40% also exhibited strong ChIP-GE associations (Cs$|0.6|; Fig. 3). On average, motifs were located within 18 bp of ORF start sites (Fig. 4B).
To validate the function of this computationally predicted binding site experimentally, the native genomic promoter (TATA box and putative cis-regulatory sequence) of VNG2094G (trh4, a TF-coding gene) was fused to GFP. Promoter activity was assayed in the DrosR vs parent strain in the absence of stress, when RosR binding activity was evident in ChIP-chip experiments for these promoters. P trh4 activity is significantly higher in the DrosR strain relative to the parent and the empty vector background control (Fig. 4C). This suggests that the predicted cis-regulatory sequence is required for RosR-mediated repression of this promoter, consistent with the genome-wide data (Figs. 1-4, S2 Table). Together, these data suggest that (a) the computationally predicted motif is biologically relevant; (b) RosR binds to the predicted cisregulatory sequence in vivo to regulate gene expression; and (c) this cis-regulatory sequence carries significant importance in the function of the RosR regulatory network.

Functional enrichment analysis of the RosR regulon
To gain additional insight into RosR function in the cell, we calculated statistical enrichment in archaeal clusters of orthologous genes functional ontology categories [31] for RosR target genes (those bound in binding location assays). These genes are significantly enriched for stress response functions (e.g. genes encoding heat shock proteins hsp4 and hsp5, peroxidase perA), translation (e.g. genes encoding ribosomal protein), DNA replication, cell growth and division, and transcription (e.g. RNA polymerase subunits, TFIIB family member tfbB, LRP family homolog trh4; Table 1). In general, the direction of regulation corresponds with the function of these gene products. For example, genes associated with translation (e.g. eif2B) are downregulated upon ROS exposure, whereas stress response genes (e.g. perA, hsp5) are upregulated (S2 Table; [16]). This analysis confirms previous results implicating RosR in the regulation of genes whose products serve stress repair functions [13], but also expands the RosR regulon.
Other transcription factors required for survival under oxidative stress are major targets of RosR regulation The functional enrichment analysis revealed novel RosR targets, notably 21 genes encoding TFs and 4 other putative regulators involved in signal transduction and DNA binding ( Table 1, Tables S2 and S3). Cis-regulatory sequences were detected in the vicinity of the translation start site for 14 of these TF-coding genes, including rosR itself ( Table 2, Fig. 5, S5 Table). This could explain why direct RosR binding was not detected for many genes affected by deleting rosR [13] (i.e. RosR binding not detected here). For example, nearly 25% of RosR indirect gene regulation appears to be mediated through TfbB, whose encoding gene is among the TFs directly regulated by RosR ( Fig. 6; [32]). Dynamic ChIP-chip profiles for seven of the 14 TF-coding genes with cis-regulatory sequences nearby were anticorrelated with their gene expression profiles ( Table 2). Closer inspection of binding and gene expression profiles revealed that these seven TFs are repressed by RosR during optimum growth in the absence of stress but de-repressed in response to H 2 O 2 (Fig. 5A). These sites were bound again within 60 minutes. Temporally coherent binding profiles resulted in two waves of time-resolved expression of TF-coding genes, with the majority of RosR-regulated TFs expressed in the late wave (Fig. 5A, S2 Table). Taken together, these results suggest that RosR regulates a hierarchy of TFs, the majority of which are transiently de-repressed in a RosRdependent manner during oxidative stress.
We reasoned that such TF-TF regulation might contribute to H. salinarum survival of extreme oxidative stress. To test this, we generated strains deleted in-frame of two of the TF-coding genes regulated by RosR (VNG0194H and hrg). Relative to the isogenic parent strain, both TF knockout strains are significantly impaired for growth in response to oxidative stress induced by addition of H 2 O 2 to the cultures (Fig. 5B). These phenotypes are significantly complemented when the corresponding wild type copy of the TF gene is supplied in trans on a plasmid. These phenotypes are similar to that previously observed for the DrosR mutant strain ( Fig. 5B; [13]). Together, these results implicate new TFs in oxidative stress survival in H. salinarum, suggest important physiological consequences for RosR regulation of other TFs, and validate hypotheses generated from systems-level datasets.
Extensive combinatorial control of gene expression by RosR with multiple TFIIB homologs RosR regulates many genes encoding TFs, a subset of which is required for oxidant survival. However, we reasoned that RosR might not be the only regulator of these TFs, since the phenotyping results described above are inconsistent with a classical epistatic relationship with TFs downstream of RosR in a linear regulatory cascade. To identify candidates for such coregulation, RosR binding positions were compared to those for Tfb proteins from previously published high-resolution genomewide DNA binding location experiments [32,33]. Similar to RosR, Tfb binding sites are detected under standard, non-stress conditions, providing comparable physiological conditions. At 82 of each of the 252 RosR-bound loci, we also detected binding for five of seven H. salinarum Tfb proteins (TfbA, B, D, F, G, S4 Table). A single Tfb bound together with RosR at just over half of these loci (Fig. 6A). In contrast, 2 or more Tfbs co-bound at the same locus with RosR at 40 loci. At least four Tfbs together with RosR occupied 10 of these 40 loci (Fig. 6A, 6B). Whether the different Tfb proteins bind simultaneously or one at a time together with RosR remains unclear. While TfbA was underrepresented for co-binding with RosR, TfbG alone was significantly enriched for co-binding with RosR. At other loci, TfbF and TfbG together were enriched for co-binding with RosR (Fig. 6B). Also among the total 82 co-bound loci were 12 of the 21 RosRregulated TF-coding genes (S4 Table, Fig. 6C).
Previous studies suggest that sequence-specific TFs in archaea activate gene expression by binding upstream of the transcription pre-initiation complex [PIC, includes TATA-binding protein (TBP) and TFIIB (Tfb)]. In contrast, most repressor TFs inhibit gene expression by binding downstream of the PIC [34][35][36][37]. To test this model and the mechanism of RosR gene regulation, the RosR-to-Tfb binding locus distance was calculated for the 82 RosR sites where Tfb binding was also detected (see Methods). These distances were compared to RosR activity using the GE-ChIP correlation as a proxy. Interestingly, the distance between RosR and Tfb binding loci was strongly and significantly anticorrelated with RosR activity. That is, if RosR binding upstream of Tfb is considered as a negative distance, then positive GE-ChIP correlation, or activation, is observed and vice versa. When these sites are binned into distance cut-offs (absolute value of 5 bp), a peak association is detected at distances of 65-75 bp (Fig. 6D). This relationship is abrogated in the DrosR mutant background (Fig. 6D, light grey trace) and is significantly different from random distributions across the distance scale (Fig. 6D, dark  (Fig. 3); and (c) supports the hypothesis that the relative binding position and distance between Tfb proteins and sequence-specific transcription factors dictates the activation or repression of target genes.
Comparison of the experimentally determined RosR network to that predicted from statistical inference models We next assessed how predictions of statistically inferred gene regulatory network models (''environmental gene regulatory influence network (EGRIN)''; [16,23]) compared to the RosR regulatory network determined from the experiments described here. Of the 252 experimentally observed direct RosR-gene interactions, 15% were predicted from EGRIN (p,5.68610 23 ; see Methods for p-value calculation and S2 Table for a list of genes with validated predictions). Further, the correspondence between predicted and observed target gene lists subject to combinatorial control by RosR-TfbB or RosR-TfbG was significant (p, 2.05610 24 for TfbB; p,2.16610 27 for TfbG). In contrast, predictions from the model did not match experimental observations regarding combinatorial control by RosR-TfbD and RosR-TfbF pairs (S4 Table). Of all RosR regulated genes that were both predicted and observed, genes encoding TFs and functions in transcription are most highly enriched (arCOG category enrichment p,1.77610 25 ; see also Fig. 6C). This analysis suggests that network topological predictions from the EGRIN model are accurate for RosR regulatory influences, especially for those genes that encode functions in transcriptional regulation.

Discussion
Data and analyses presented here suggest that H. salinarum RosR is a bifunctional regulator that directly controls a large hierarchy of transcription factors in combination with Tfb proteins to enable extreme oxidative stress survival. The majority of these sites are bound in the absence of stress, with RosR released from DNA in the presence of oxidant. A subset of loci exhibits the opposite binding pattern. We show that RosR binds to a ,20 bp imperfect palindrome cis-regulatory sequence and directly activates or represses genes encoding functions in transcription, macromolecular repair and central cellular physiology. We demonstrate that RosR regulates genes encoding TFs that are also required for oxidative stress survival. Such regulation is conducted in concert with Tfb proteins. We conclude that RosR plays an important role in a large transcriptional network that enables a rapid response to extreme oxidative stress followed by reestablishment of homeostasis.
The function of gene products in the RosR regulon reported here reflects the observations from our previous work [13]. Here we expand this regulon, differentiating between direct and indirect control of gene expression by RosR, including new gene targets whose products are involved in central cellular functions such as translation, transcription, and DNA replication. RosR regulation of specific genes encoding such functions is also accurately predicted from a computationally inferred gene regulatory network for H. salinarum [23] (S2 and S4 Tables S2; Fig. 6D). However, the RosR cis-regulatory binding sequence we detected and validated here was not predicted from the model, nor was combinatorial control of gene expression by RosR and Tfbs D and F, possibly because the inference model predicts regulatory interactions primarily based on gene expression [23]. Recent evidence suggests that such predictions can be improved by the incorporation of TF-DNA binding data (e.g. ChIP-seq or  Table. Information content in bits is shown on the y-axis and the residue position is given on the x-axis. (B) Position of predicted cis-regulatory sequence and ChIP-chip hits relative to start codon of genes for which a motif was detected (S5 Table). The vertical blue line indicates the average position of the binding location. The red line indicates the average position of the cis-regulatory sequence (18 bp from ATG). (C) Experimental validation of cis-regulatory motif sequence. Negative control (-) empty vector and positive control (+) with a strong constitutive promoter driving GFP expression are shown as a comparison to promoter activities of interest in each of the parent (grey bars) and DrosR (red bars) strains grown in the absence of stress. Error bars represent the standard error of the mean (minimum n = 12, at least five independent biological replicate measurements, each with 2-4 technical replicates). ChIP-chip, [38]). Therefore, the current work also pinpoints specific areas for model refinement.
The integrated genome-wide analysis presented here suggests hypotheses for the RosR biochemical mechanism. Dynamic TF-DNA binding analysis suggests a differential preference in RosR promoter occupancy, as some promoters are re-bound while homeostasis is restored, whereas a small subset of other sites are bound only in the presence of peroxide (Fig. 2, Cluster 3). Binding to slightly different cis-regulatory sequences could enable promoter binding under both conditions, similar to transcription factors that use Fe-S clusters as cofactors in bacteria [39]. However, we observed only one significant motif in our computational analysis  Mean expression profiles are shown for TFcoding genes expressed in the early wave (red) and late wave (blue) in response to RosR binding dynamics (black lines, binding profiles for each of the 16 sites is shown). y-axis represents mean and variance scaled ChIP-chip and expression data. All expression and ChIP-chip data for these TFs are given in S2 Table. (B) RosR dynamic regulation of TFs has functional consequences. Growth rate ratios for two strains deleted of TF-coding genes (DVNG0194H and Dhrg) as well as VNG0194H and hrg complemented in trans on a plasmid are shown. Asterisks indicate p-value ,0.05 in t-test comparisons of growth ratios between each mutant and the parent strain. Growth rates of complemented strains are not significantly different from that of the Dura3 parent strain. Raw growth data are given in S6 Table. Annotations for the strongly RosR-regulated TFs with cis-regulatory sequences are listed in Table 2, with all 21 RosR-regulated TFs listed in S3 Table. doi:10.1371/journal.pgen.1004912.g005 (S5 Table), suggesting that other co-factors may be involved (e.g. Tfb proteins, Fig. 6). It remains unclear how and whether RosR itself senses oxidant, since no cysteines are present in the protein.
Further biochemical studies are required.
In contrast to RosR targets in Cluster 3, a significant fraction of sites are bound in the absence of H 2 O 2 and re-occupied by RosR within 60 minutes of oxidant exposure (Fig. 2, Cluster 2).
Clearance of oxidant from the cell by detoxification enzymes (e.g. perA, sod2) may enable RosR to re-bind. For example, DperA mutants experience high intracellular H 2 O 2 concentrations during mid-log phase growth, whereas H 2 O 2 is cleared from the H. salinarum parent strain within the time frame tested here [16]. The gene encoding PerA is a direct target of RosR regulation (S2 and S3 Tables).  S4  Table, including TfbA data, which was omitted from the figure for clarity given the small number of co-bound RosR-TfbA sites. (C) Gene network of RosR and Tfb coordinate control of other TF-encoding genes. Tfb designations and colors are as in (B). Arrows represent activation, bars represent repression. Each edge in the network is based on both gene expression and ChIP-chip or ChIP-seq data (S2 Table, [32,33]). (D) Comparison of RosR activity (GE-ChIP correlation) with the distance between RosR and Tfb binding loci. The negative log10 p-value of the significance of this comparison is given on the y-axis, and the absolute value of the distance cutoff is given on the X-axis. These correlations in the parent strain (black lines) are compared to the correlation in the DrosR mutant (grey lines) and random data with the same mean and standard deviation as the actual distribution at each distance cutoff (dark grey dotted lines). doi:10.1371/journal.pgen.1004912.g006 Dynamic patterns of differential promoter occupancy observed in yeast suggest that the probability of productive gene expression correlates with longer TF-DNA dwell times [40]. The addition of stress in the experiments reported here links these dynamic events to environmental perturbation. For example, TF-coding genes are found almost exclusively in dynamic binding cluster 2, which are re-bound at the earliest time point following ROS exposure (S2 Table, Fig. 2). Binding at these sites correlates well with gene expression dynamics and TF knockout strains are more sensitive to H 2 O 2 challenge than the parent strain (Table 2, Fig. 5). The pattern of binding in cluster 2 is therefore consistent with an immediate need for TFs to work with RosR to restore homeostasis following stress exposure. Taken together, these dynamic genomewide data point to a non-canonical mechanism for RosR regulation in response to oxidant.
Integrated analysis of several genome-wide binding location and gene expression datasets for TFIIB homologs [32,33] with those presented here suggests a surprising degree of RosR-Tfb combinatorial control of gene expression in response to oxidant (Fig. 6). RosR combinatorial control contrasts with the H. salinarum nutritional regulator TrmB, which regulates far fewer TFs (only 4 for TrmB vs. 21 for RosR) and binds together with only one other Tfb protein at its target promoters [36]. Similarly, H. salinarum iron regulators Idr1 and Idr2 only regulate one other TF each [22]. Further regulatory interactions were observed between TFs, including TfbB regulation of RosR, setting up a potential feedback loop ( Fig. 6D; [32]). Taken together, these data are consistent with the hypothesis that the regulatory reach of RosR under oxidative stress conditions is extended significantly via TF-TF network interactions.
Systems-level studies suggest that extensive TF-TF interactions may be a conserved feature of transcriptional regulation of stress response across the domains of life. For example, hierarchical regulation in response to oxidant has been observed in Escherichia coli, where SoxS regulates at least four other TF-coding genes (fur, marA, marR, rob; [41]), some of which in turn regulate other TF-coding genes. However, RosR control of more than 20 other TF-coding genes is closer to the order of the global nutritional regulator, CRP, which controls the expression of at least 50 other TF-coding genes. Such extensive inter-TF regulation in H. salinarum is also reminiscent of multi-TF regulatory networks in yeast that coordinate the cell cycle with DNA damage repair [6]. Thus, RosR appears to possess unique functional features, resembling a eukaryotic-like TF in global activation of gene expression (Fig. 1), control of a large network of TFs (Figs. 5, 6, Table 2), and extensive coordinate control of gene expression ( Fig. 6; [33]). However, some features of RosR also resemble a bacterial-type TF, with its DNA binding sequence specificity (Fig. 4), repression of gene expression (Fig. 4C), and stress-specific alteration of its binding activity (Fig. 1).

Strains and growth conditions
Strains of Halobacterium salinarum NRC-1 used in this study are listed in S1 Table. Cultures were routinely grown in complex medium (CM; 250 g/L NaCl, 20 g/L MgSO 4 7H 2 O, 3 g/L sodium citrate, 2 g/L KCl, 10 g/L peptone). Dura3, the parent strain, and transcription factor deletion strain derivatives thereof, were grown in CM supplemented with 0.05 mg/mL uracil to complement the auxotrophy. In-frame gene deletion strains (Dura3DVNG0194H, Dura3Dhrg) were constructed using the pop-in/pop-out gene deletion strategy described previously [42]. Dura3DrosR, referred to throughout as DrosR for brevity, was constructed previously [13]. H. salinarum strains harboring plasmids were cultured in CM supplemented with 20 mg/mL mevinolin for plasmid maintenance. H 2 O 2 was added to mid- logarithmic phase cultures to 25 mM or at inoculation at 5 or 6 mM to test oxidative stress response as displayed in the figures.

Dynamic, genome-wide transcription factor binding site location analysis
ChIP-chip data collection. H. salinarum harboring VNG0258H::myc (constructed previously in [13]) was grown to mid-logarithmic phase (OD600 ,0.2-0.4) and either left untreated or exposed to 25 mM H 2 O 2 for 10, 20, and 60 minutes. Transcription factor-chromatin complexes from cultures untreated and at each treated time point were then cross-linked in vivo with 1% formaldehyde for 30 min at room temperature and subjected to immunoprecipitation (IP) by virtue of the myc epitope tag as described previously [22]. One mg of each IP sample was hybridized against matched, mock-treated controls on a custom 26105,000 feature 60-mer oligonucleotide microarray (Agilent Technologies). On this high-resolution array, the entire H. salinarum genome was tiled every 30 bp in triplicate. Randomly selected regions of the genome were spotted in quadruplicate. The custom tiling microarray design can be accessed via Agilent Technologies AMADID 026819, Gene Expression Omnibus (GEO) platform accession GPL18848. Dye swaps were conducted to correct for bias in incorporation. Seven biological replicate experiments were conducted for VNG0258::myc in the absence of H 2 O 2 and three in the presence of H 2 O 2 . These experiments yielded a total of at least 18 replicate intensity data points per 30bp genomic region per condition. DNA fragments were directly labeled with Cy3 and Cy5 dyes (Kreatech) as described previously [33]. Microarray slide hybridization and washing protocols were conducted according to the manufacturer's instructions (Agilent Technologies) with the exception that hybridization was conducted in the presence of 37.5% formamide at 68uC to ensure proper stringency for high G+C content of the H. salinarum genome (67%, [43]).
ChIP-chip data preprocessing, peak picking, and peak-togene correspondence. Resultant slides were scanned and processed with Feature Extraction software (Agilent). Raw probe intensities were first normalized within each array using densityweighted loess [40]. Second, probes were normalized to quantiles across arrays. Binding peaks were detected from normalized data for each replicate independently using MeDiChI [44]. This peak detection algorithm relies on a deconvolution-based model to determine genomic regions significantly enriched for TF binding. Binding peaks were included in subsequent correlation and statistical analyses if: (a) they were located within 250 bp of a predicted translation start site for an open reading frame (the majority of ORFs are leaderless in H. salinarum, see [45]); (b) the ORF was not redundant (H. salinarum genome encodes multiple copies of some genes; [43]); and (c) achieved p-value ,0.05 (calculated using MeDiChI) in at least one time point. Composite p-values for multiple binding peaks nearby the same gene within a given time point were calculated using Fisher's combined probability test [46]. Enrichment intensity values for these combined peaks were averaged within each time point. Peaks with variable enrichment across the four time points were set to 0 intensity (i.e. no binding) at any time point that did not meet the selection criteria. This enabled comparison of ChIP-chip to gene expression data. Using this pipeline, a total of 189 binding loci were detected across the time course, which corresponded to 252 genes when experimentally determined operon members and divergently transcribed genes were included ( [47]; S2 Table). Raw ChIP-chip data are available through GEO accession GSE58696.
Detection of dynamic binding profiles in ChIP-chip data and integration with gene expression data Time course profiles of processed ChIP-chip binding data were grouped using Spearman correlated complete linkage hierarchical clustering to identify various dynamic binding patterns. To determine the dynamic relationship between binding and gene expression, each gene in each dynamic binding cluster was correlated to expression data under the same culturing conditions as ChIP-chip from a previous study [13] (mid-logarithmic phase cultures exposed to 25 mM H 2 O 2 at 0, 10, 20, and 60 min; GEO accession GSE33980). These correlations are referred to throughout as ''GE-ChIP correlations''. GE-ChIP correlations were calculated separately for each of the DrosR deletion and isogenic parent backgrounds as an additional metric for the impact of RosR binding on gene expression. Significance of the difference in GE-ChIP correlations between the parent and DrosR strains was calculated using Student's t-test. Genes with strong GE-ChIP correlations (Cs$0.6) were interpreted as directly activated by RosR, whereas anticorrelations (Cs#20.6) were interpreted as repressed. Statistical overrepresentation in archaeal clusters of orthologous genes (arCOG) functional categories [31] for RosRbound genes was calculated for using the hypergeometric test. Enriched categories are listed in Table 1. Detailed arCOG annotations, GE-ChIP correlation values, and significance of correlations for each of the 252 genes nearby RosR binding sites are listed in S3 Table. The code repository containing the pipeline used for binding location data analysis and correspondence to gene expression can be accessed at github.com/amyschmid/rosrchip-utils.
Integration of data generated here with previously published systems biology datasets for H. salinarum To detect RosR-Tfb combinatorial control, or ''co-binding'', high resolution ChIP-chip binding data for TfbA and TfbF [33,44] and ChIP-seq binding data for TfbB, G, and D [32] were analyzed. Genes located within 250 bp of a Tfb protein binding site with ChIP enrichment significance of p,0.01 were selected using the R bioconductor MeDiChI package [44]. Sites meeting the following criteria were considered to be co-bound by RosR and a Tfb protein: (a) both RosR and Tfb binding sites were detected within 250 bp of the same gene; (b) RosR and Tfb binding positions were at most 250 bp away from each other. Venn diagram was constructed using the VennDiagram package in R [48] and RosR-Tfb gene regulatory network shown in Fig. 6D was constructed using BioTapestry [49]. Distances from RosR to Tfb binding sites for each of the co-bound genes are listed in S4 Table. The relationship between Tfb-to-RosR binding site distances with RosR GE-ChIP activity values was calculated using Spearman correlation. These correlations were calculated separately for each strain background (parent and DrosR). Significance of these correlations was computed from by comparing 10,000fold resampled data to actual data (S4 Table) at each distance cutoff in 50 bp sliding windows. The negative log10 transform of resultant p-values are reported. Simulated data was generated from the random normal distribution with the same mean, standard deviation, and number of samples in the actual data set (S4 Table). All other p-values of significance listed in the text, including comparisons to EGRIN predictions, combinatorial control, arCOG functional enrichments, etc., were calculated using the hypergeometric test against the genome-wide background distribution unless indicated otherwise.

Validation of dynamic RosR binding profiles with ChIP-qPCR
To validate RosR binding patterns from ChIP-chip time course experiments, representative binding sites from dynamic binding pattern groups were selected. Chromatin immunoprecipitation (ChIP) samples were prepared over the time course described above and subjected to quantitative real-time PCR analysis (qPCR) using SYBR green as previously described [22,50]. Primers used are listed in S1 Table. High throughput growth assays H. salinarum Dura3 parent, TF deletion strains Dur-a3DVNG0194 and Dura3Dhrg (deletion of VNG0917G), and the complementation strains (see S1 Table for strain details) were pre-grown in CM containing 0.05 mg/mL uracil (and 20 mg/mL mevinolin for complementation strains), then tested for growth phenotypes in high throughput as previously described [13]. Strains were diluted to OD600 ,0.1 and H 2 O 2 was added to final concentrations of 0, 5, or 6 mM. Absorbance at an optical density of 600 nm was measured every 30 minutes using the Bioscreen C (Growth Curves USA, Piscataway, NJ). Growth rates were calculated from the slope of the log2 transformed data during logarithmic growth. Reported in the figures are ratios of the growth rates of each strain under H 2 O 2 stress relative to the same strain's growth rate without stress. All growth data are provided in S6 Table. Cis-regulatory sequence prediction and experimental validation Regions of the H. salinarum genome sequence 250 bp upstream and downstream of each of the 189 ChIP-chip binding loci (nearby 252 genes including operons, S2 Table) were searched for a cis-regulatory consensus binding motif for RosR using MEME [51]. The output of the search was constrained to three motifs, any number of repeats per sequence, forward or reverse strand, and maximum motif width of 20 bp. Palindromic motifs were not enforced. Similar cis-regulatory sequences were detected using varying subsets of the input sequences. Motif significance was determined using the Wilcoxon signed rank test comparing randomized input sequences to actual sequences. Resultant significance of the top-scoring motif is reported in the text. Details regarding motif genomic positions, E-value of significance of similarity to consensus, and sequence are listed in S5 Table. To validate the predicted cis-regulatory binding sequence, a 200-bp region containing the putative cis-sequence and TATA box of VNG2094G was cloned into the pMTF1044GFP plasmid [36,52] by Gibson assembly [53] using primers listed in S1 Table. The maximum cloned DNA fragment size was kept to 200 bp to reduce signal from other cryptic promoter elements. H. salinarum Dura3 parent and DrosR strains transformed with the fusion vector were grown to mid-logarithmic phase (OD600 ,0.3-0.6) in the absence of stress in 50 mL CM. Samples were collected, washed and fixed as previously described [54] except for fixing temperature (4uC). Resultant samples were measured for fluorescence in an FLx800 fluorimeter (BioTek). Dura3 harboring the empty vector (i.e. GFP-encoding gene with no promoter) or vector containing GFP-encoding gene driven by the strong constitutive Pfdx promoter [33] were used as negative and positive controls, respectively (S1 Table). For each strain, at least five biological replicate cultures with 2 to 4 technical replicates each were tested. Resultant raw fluorescence values were normalized to the cell density of each culture. The mean of these normalized values and standard error of the mean are presented in the figures.
Supporting Information S1