The regulatory control of Cebpa enhancers and silencers in the myeloid and red-blood cell lineages

Cebpa encodes a transcription factor (TF) that plays an instructive role in the development of multiple myeloid lineages. The expression of Cebpa itself is finely modulated, as Cebpa is expressed at high and intermediate levels in neutrophils and macrophages respectively and downregulated in non-myeloid lineages. The cis-regulatory logic underlying the lineage-specific modulation of Cebpa’s expression level is yet to be fully characterized. Previously, we had identified 6 new cis-regulatory modules (CRMs) in a 78kb region surrounding Cebpa. We had also inferred the TFs that regulate each CRM by fitting a sequence-based thermodynamic model to a comprehensive reporter activity dataset. Here, we report the cis-regulatory logic of Cebpa CRMs at the resolution of individual binding sites. We tested the binding sites and functional roles of inferred TFs by designing and constructing mutated CRMs and comparing theoretical predictions of their activity against empirical measurements in a myeloid cell line. The enhancers were confirmed to be activated by combinations of PU.1, C/EBP family TFs, Egr1, and Gfi1 as predicted by the model. We show that silencers repress the activity of the proximal promoter in a dominant manner in G1ME cells, which are derived from the red-blood cell lineage. Dominant repression in G1ME cells can be traced to binding sites for GATA and Myb, a motif shared by all of the silencers. Finally, we demonstrate that GATA and Myb act redundantly to silence the proximal promoter. These results indicate that dominant repression is a novel mechanism for resolving hematopoietic lineages. Furthermore, Cebpa has a fail-safe cis-regulatory architecture, featuring several functionally similar CRMs, each of which contains redundant binding sites for multiple TFs. Lastly, by experimentally demonstrating the predictive ability of our sequence-based thermodynamic model, this work highlights the utility of this computational approach for understanding mammalian gene regulation.


Introduction
CCAAT/Enhancer binding protein, α (Cebpa) encodes a TF that is necessary for neutrophil development [1] as well as the specification of hepatocytes and adipocytes [2,3]. During Cebpa. The remaining CRMs, appeared to behave as silencers and repressed the activity of the proximal Cebpa promoter in a dominant fashion. Our computational analysis inferred a comprehensive map of the regulation of the Cebpa locus and suggested a novel mechanism of lineage resolution [21]. The enhancers were predicted to be activated by PU.1, C/EBP family TFs, Egr1, and Gfi1 and repressed by Myb. Surprisingly, the model predicted that the silencers exert repression through the activity of TFs strongly expressed in non-myeloid cell types, GATAs, Ebf1, and Myb. The silencing of Cebpa is necessary for the specification and maintenance of non-myeloid cell fates [9,17,35]. Dominant repression of the Cebpa promoter by distal silencers in non-myeloid lineages therefore might be a mechanism for resolving lineages. These inferences must however be regarded as predictions since they are yet to be verified experimentally.
Here we rigorously test the predictions of our computational models to determine the regulatory logic of Cebpa CRMs. We predicted the effect of mutations to one or more binding sites by simulating the regulation of mutated DNA in the model. The predictions were experimentally tested by synthesizing mutated CRMs and assaying their activity and by comparing against publicly available ChIP datasets. The regulatory logic of enhancers was investigated in PUER cells, representing the myeloid lineage where Cebpa is expressed robustly. The function and regulation of silencers was investigated in G1ME cells [36], which are derived from Gata1 −/− mice and represent the red-blood cell lineage by virtue of being blocked at the megakaryocyte-erythrocyte progenitor (MEP) stage.

Construct design and cloning using Gibson assembly
Putative CRMs were cloned into a pGL4.10luc2 Luciferase reporter vector (Promega, E6651). The proximal promoter was introduced into the multiple cloning site (MCS) of pGL4.10luc2 between XhoI and HindIII sites. The distal CRMs were inserted between BamHI and SalI sites downstream of the SV40 late poly(A) signal. CRM sequences are provided in S1 File.
Each CRM or promoter insert was amplified from genomic DNA of C57BL/6J mice using Q5 High-Fidelity 2X Master Mix (NEB, M0492L) following the manufacturer's instructions. The following PCR cycling conditions were used: initial denaturation of 30s at 98C, 30 cycles of 30s at 98C, 30s at 60C, and 60s at 72C, and a final extension for 10 minutes at 72C. Primers included 40bp of sequence homologous to pGL4.10luc2 (Table C in S1 Text). Gibson Assembly (GA) reactions [37] were carried out using 0.06pmol of digested vector and 0.18pmol of insert, for 60 minutes at 50C. NEB high-efficiency competent cells (NEB, E5510S) were transformed according to manufacturer's instructions.

Transfection and Luciferase assays
PUER or G1ME cells were transfected with a reporter vector and Renilla control vector (pRL-TK, TK promoter, gift of A. Dhasarathy) in a 1:200 ratio using a 4D-Nucleofector (Lonza). PUER cells were transfected with 2.26μg total plasmid DNA in SF buffer (Lonza, V4SC-2096), using program CM134 and incubated for 24 hours prior to luminescence measurement. G1ME cells were transfected with 4.52μg total plasmid DNA in P3 buffer (Lonza, V4SP-3096), using program CM134 and incubated for 6 hours before luminescence measurement. After incubation, Firefly and Renilla luminescence were measured using the Dual-Glo Luciferase activity kit (Promega, E2920) and the DTX 880 Multimode Detector (Beckman Coulter) according to manufacturer's instructions. Transfections were performed in at least 10 replicates. Raw luminescence data from PUER and G1ME cells are provided in S2 and S3 Datasets respectively.

Normalization of Firefly luminescence against Renilla luminescence
Well-to-well transfection efficiency variation was controlled for by normalizing Firefly luminescence against Renilla luminescence. Robust errors-in-variables (EIV) regression, implemented according to the method of Zamar [38], was used to estimate the slope, β, of the line y = βx, where y is Firefly luminescence and x is the Renilla luminescence. Briefly, β was estimated by minimizing the loss function where, x i and y i are individual replicates of Renilla and Firefly luminescence measurements, is Tukey's loss function with c = 4.7, and S is an estimate of the scale of the residuals. The argument of Tukey's function is the orthogonal distance of the point (x i , y i ) from the regression line. Tukey's function is bounded for large values of t, which limits the contribution of outliers to the loss function and ensures that the slope estimate is robust to outliers. The value of S was estimated by solving the equation where κ = 0.05 and χ(t) is Tukey's loss function with c = 1.56. The minimization problems were solved by the sequential least-squares quadratic programming (SLSQP) algorithm of the NLOPTR package of R, with parameters xtol_rel and maxeval set to 10 −7 and 1000 respectively. 95% confidence intervals were estimated by bootstrapping using the R package BOOT. 999 replicates were subsampled using the ordinary simulation and the function boot.ci was used determine confidence intervals using the basic bootstrap method.

Sequence-based thermodynamic model
We briefly describe the specific model used here to identify binding sites and make predictions. For details, see Bertolino et al. [21]. The model includes 11 TFs, C/EBPα, C/EBPδ, Egr1, Gfi1, Myb, PU.1, Jun, Myc, Ets1, Ikaros, and Fli1, that are expressed in PUER cells [8,21] and were chosen based on differential expression between uninduced, 24 hr IL3+OHT, and 24 hr GCSF+OHT conditions. 4 TFs, which are expressed in non-myeloid cells, Ebf1, GATA(s), Elf1, E2A, were included based on the detection of their binding sites in silencer elements (see Bertolino et al. [21] for details). The regulatory roles, activation or repression, of the TFs were inferred by constructing 2 15 = 32, 768 alternative models realizing all possible combinations of roles. The alternative models were fit to reporter activity measurements from 46 putative CRMs from the Cebpa, Egr1, and Egr2 regions in uninduced and 24 hour IL3 and GCSF induced conditions. Hierarchical clustering was used to identify 8 models with highly consistent regulatory schemes from the 20 lowest scoring model realizations. The model utilized in this work, 81762, was representative of the low scoring models and it's output was highly correlated with the measured reporter activity (r 2 = 0.91). It is worth noting that the same model, that is the same set of TF-related parameters, was able to correctly simulate the regulation of 46 diverse CRMs in three conditions simultaneously.

Design and synthesis of mutant CRMs
Mutations to predicted TF binding sites were designed in silico with the aid of our sequencebased model of transcription [21]. A mutant binding site was created by changing each nucleotide in the wildtype site to one having the lowest frequency in the alignment matrix [39] of the cognate TF (Table A in S1 Text). The mutated CRM was then simulated in the model to predict its activity and confirm that the targeted site was lost, no new sites had been created, and the other sites were unmodified. If the mutant sequence interfered with other sites or introduced new ones, then nucleotides having the second lowest frequency in the alignment matrix were chosen at a few positions to circumvent interference. Mutant sequences were synthesized using Gibson assembly either with primers carrying the desired mutations or with synthetic dsDNA, or both (Tables B and C in S1 Text). The mutant CRMs were cloned into pGL4.10luc2 using Gibson assembly as described above. Mutant CRM sequences are provided in S1 File.

Reverse transcription real-time PCR
Total RNA was extracted using MagJet RNA kit (Thermo, K2731), and reverse transcribed using the High Capacity cDNA Transcription kit (Applied Biosystems, 4368814) following the manufacturer's instructions. Real-Time PCR was performed using the Ssofast Evagreen Supermix (BioRad, 1725201) in a C1000 Thermal Cycler with CFX384 Real-Time System (BioRad) using the following cycling conditions: initial denaturation of 30s at 95C followed by 40 cycles of 5s at 95C and 5s at 60C. Cebpa expression relative to Hprt was computed as 2 C Cebpa The data are provided in S1 Dataset.

The expression of Cebpa during macrophage-neutrophil differentiation
We characterized the time course of Cebpa expression during the differentiation of PUER cells into macrophages and neutrophils. PUER cells are IL3-dependent hematopoietic progenitors derived from Spi1 −/− mice and carry a transgene encoding a PU.1-Estrogen receptor fusion protein [33]. Uninduced PUER cells function like myeloid progenitors and can be induced to differentiate by treatment with 4-hydroxy-tamoxifen (OHT) into either macrophages or neutrophils in the presence of IL3 or GCSF respectively (Fig 1B, Fig A in S1 Text, and [7]). For neutrophil differentiation, IL3 medium is completely replaced with GCSF medium 48 hours prior to differentiation.
We measured Cebpa gene expression relative to Hprt or Gapdh using RT-RTPCR in uninduced PUER cells and at four time points during a 7-day course of differentiation in IL3 and GCSF conditions ( Fig 1C). Overall, Cebpa is expressed two-fold higher in GCSF than in IL3 conditions (Wilcoxon rank sum test after pooling time points, N = 13, p = 2.99 × 10 −6 ). Cebpa expression increases 70% during the 48 hour pretreatment with GCSF, and another 40% after the first 24 hours of GCSF+OHT treatment. Thereafter, the expression level declines gradually over time to revert to pre-OHT levels at day 7. In contrast, the expression level remains relatively constant after OHT treatment in IL3 conditions. C/EBPα protein displays the same expression pattern as the mRNA (Fig B in S1 Text). The increased expression of Cebpa during neutrophil differentiation is consistent with the essential role that C/EBPα plays in neutrophil development and previous analyses of PUER differentiation [7,8]. These data also indicate that most of the regulatory modulation of Cebpa expression occurs during the first 24 hours of differentiation.

The activity pattern of Cebpa enhancers
We had previously identified four enhancers of Cebpa in a screen utilizing evolutionary conservation and reporter assays [21, Fig 1A]. Three of four enhancers were novel, while one enhancer, Cebpa (18), overlapped with a known enhancer located 37kb downstream of the Cebpa TSS [11,34,40]. Prior to dissecting the cis-regulatory logic of these newly identified In transient reporter assays, transfection efficiency can vary over an order of magnitude from sample to sample [41]. In our reporter data, we observed a 2-4 fold variation in luminescence from sample-to-sample ( Fig 1E). The prevalent method of correcting for transfection efficiency variation is to co-transfect an independent reporter, such as the Renilla Luciferase expressed from a constitutive promoter, along with the CRM reporter being assayed. Firefly luminescence is then normalized to Renilla luminescence to control for sample-to-sample variation in transfection efficiency. Normalizing by taking the ratio of Firefly and Renilla luminescence is statistically unsound since it weights low-and high-luminescence replicates equally even though the latter produce more reliable estimates of normalized reporter activity.
In our method, we utilize linear regression to determine the normalized activity as the slope of the best fit line (Fig 1E), and hence avoid weighting all points equally. Ordinary least squares regression assumes that the values of the independent variable, Renilla luminescence in our case, are known exactly and don't include random errors. Since Renilla luminescence is itself a random variable in transient assays, we use robust errors-in-variables (EIV) regression [38,42] instead. The estimation of the slope and intercept is rendered insensitive to outliers by utilizing a bounded loss function [38]. Furthermore, the loss function is a sum of the squares of the scaled orthogonal distance of each data point from the line, and hence leads to the minimization of errors in both variables, instead of just the dependent variable. Finally, we performed reporter assays in 10 replicates in order to boost statistical power.
We tested the four previously identified enhancers [21] using this statistically robust methodology. In all reporter data presented in this manuscript, the reporter vectors either carry the Cebpa proximal promoter alone (Cebpa(0)) or in combination with one of the distal CRMs ( Fig 1D). We denote the vector carrying a CRM along with the promoter as Cebpa(X), where X is the CRM number. The comparison of the CRM-bearing reporter with Cebpa(0) allows us to discriminate enhancing or silencing CRMs from neutral ones. Reporter activity was assayed in uninduced conditions and 24 hours after the addition of OHT in either IL3 or GCSF conditions, when the difference in Cebpa expression between the two treatments is the largest (Fig 1C).
Although these activity patterns are largely consistent with our previous measurements [21], some quantitative differences were observed. For example, Cebpa(7) upregulates activity GCSF+OHT, for which N = 2, N � 3. Error bars show standard error. D. Schematics of reporter vectors, based on the pGL4 backbone (Promega), which contain the Cebpa promoter immediately upstream of luc2 either with (below) or without (above) a distal CRM located downstream of the SV40 Poly(A) signal. E. Normalization of Firefly luminescence against Renilla luminescence to correct for sample-to-sample variation in transfection efficiency. Points are independent Firefly and Renilla luminescence measurements for Cebpa(0) (blue) and Cebpa (7) (red). Luminescence is reported in relative luminescence units (RLUs). The ratio of Firefly and Renilla luminescence was estimated as the slope of the best-fit line (solid) determined by robust errors-in-variable (EIV) regression. Dashed lines represent the 95% confidence interval for slope determined by bootstrapping (see Methods).
https://doi.org/10.1371/journal.pone.0217580.g001 *6-fold instead of *4-fold as observed previously. These differences likely stem from two sources. First, we used the pGL4 vector backbone instead of pGL3 since the former has many fewer predicted binding sites for mammalian TFs, minimizing confounding effects from spurious TF binding. Secondly, we measured luminescence in 10 replicates and analyzed the data with robust EIV regression. Both of these modifications should result in more accurate estimates of reporter activity than before.

The cis-regulatory logic of Cebpa enhancers at binding-site resolution
Having rigorously validated the novel enhancers, we next decoded their cis-regulatory logic by mutating binding sites predicted by sequence-based models of gene regulation [21]. We provide a brief description of the model here and refer the reader to Bertolino et al. [21] for implementation details and equations. A schematic of the model is provided in Fig D in S1 Text. Given the DNA sequence of a CRM, the TFs regulating the CRM, and estimates of TF concentrations in one or more conditions, our model predicts the rate of transcription in each condition. The model utilizes position weight matrices (PWMs) to identify binding sites and to compute their binding affinity relative to the consensus site [43]. The model then determines the occupancy of each site by its TF "thermodynamically" [22,24], that is, by enumerating all possible configurations in which the identified sites may be bound. The occupancy of a site in a given configuration takes into account potential cooperative and competitive interactions between TFs. The model implements position dependent repression, or quenching [26,44,45], by reducing the site occupancy of activators bound in a *150bp neighborhood of repressor sites. The total strength of a CRM's interaction with the polymerase holoenzyme complex is determined by computing a weighted sum of individual activator sites' occupancies, using activation efficiencies as weights. In the penultimate step, crucial for correctly modeling silencers, the model allows for repression over long distances by reducing the interaction strength as a function of repressor site occupancy. In the last step, transcription initiation is modeled as an enzymatic reaction, in which greater interaction strength results in higher transcription rates. In summary, the model utilizes well-known biophysical principles and phenomenological rules to predict CRM activity from DNA sequence.
Besides predicting the activity from sequence when the regulating TFs are known, this modeling framework can also be used to learn which TFs regulate a particular CRM and whether they act as activators or repressors. This is achieved by constructing an ensemble of models realizing all possible combinations of the regulatory roles of a set of candidate TFs and identifying which model realization best fits the empirical reporter activity data [21]. Whether a particular TF is predicted to act as an activator or repressor is implicit in the combination of regulatory roles represented in the best fitting model. The TFs predicted to regulate each CRM, as well as their binding sites, can be inferred by analyzing the utilization of TFs in the occupancy and interaction-strength calculations of the best fitting model.
Using this reverse engineering methodology, we had inferred a comprehensive map of Cebpa CRM regulation at binding-site resolution (Fig 1A). These inferences, implicit in the internal composition of the best-fit model for each CRM, constitute a set of hypotheses about the cis-regulatory logic of Cebpa. In order to place the decoded logic on a firm empirical footing, we sought to test these hypotheses by site-directed mutagenesis. The interpretation of sitedirected mutagenesis experiments can be challenging because deletions change binding-site spacing while substitutions have the potential to introduce new binding sites. Having CRM models capable of predicting transcription rate from DNA sequence allowed us to circumvent these limitations. For each binding site to be tested, we designed substitutions to abolish binding by choosing the nucleotide least favored at each position according to the PWM of the cognate TF [39]. The mutated sequences were simulated in the model to predict their activity. The simulations allowed us to ensure that the mutations did not create any new binding sites for the TFs represented in the model. We tested the decoded logic by synthesizing the mutant CRMs (see Methods), assaying their activity in uninduced and induced PUER cells, and comparing with the theoretical prediction. In what follows, we describe the inferred cis-regulatory logic, the predicted effect of mutations, and the empirical results for each enhancer. Cebpa(7). In the best-fit model, the upregulation of Cebpa (7) over Cebpa(0) results from activation provided by C/EBP family TFs and Gfi1, which bind 2 and 3 sites respectively ( Fig  3A and 3B). We tested the predicted sites of the C/EBP family TFs first since C/EBP TFs are known to regulate the proximal promoter [2] and the +37kb enhancer [11]. We designed a mutant CRM, Cebpa(7m1), which lacks C/EBP sites and is predicted to have half the activity of Cebpa (7) in uninduced conditions when simulated in our model (Fig 3B and 3C). Next, we synthesized Cebpa(7m1) and assayed its activity in both uninduced and induced conditions in PUER cells. We compare fold-change relative to the proximal promoter, Cebpa(0), since the absolute scale of the reporter data used to fit the model in Bertolino et al. [21] is different owing to the use of a different vector backbone and luminometer. We observed a *40% reduction of activity in uninduced conditions, matching the model's prediction and confirming the activation of Cebpa(7) by C/EBP family TFs (Fig 3C and 3D). The activity was also reduced in induced IL3 conditions, although not to the same extent as was predicted by the model. The model predicts a slight reduction of activity in induced GCSF conditions which is not observed experimentally.
The Cebpa(7m1) data also suggested that Gfi1 or other as yet unidentified sites are functional since the C/EBP sites did not account for the entirety of Cebpa(7) activity. We tested the contribution of Gfi1 sites to the residual activity of Cebpa(7m1) by designing a second mutant, Cebpa(7m2), lacking all Gfi1 and C/EBP binding sites (Fig 3B). Simulations predicted that Cebpa(7m2) completely lacked activity in the uninduced condition (Fig 3C). We observed a further *45% reduction in activity compared to Cebpa(7m1), so that Cebpa(7m2)'s activity was three-fold lower than that of the wildtype CRM (Fig 3D). This result confirms the cis-  (14). A-D. Cebpa (7). E-H. Cebpa (14). A, E. Schematics of the construct design showing a distal CRM (blue) and the Cebpa proximal promoter (red). B, F. Activity of each TF activator site predicted by the sequence-based model for each construct. The activity is the amount by which an individual site reduces the activation energy barrier [21] and depends on the occupancy of the site and the efficiency of the bound activator. Sites occurring in the CRM and proximal promoter are shown. The gray box is intervening vector sequence. The x-axis shows each binding site modeled and the position of its 5' end in the reporter construct relative to the 3' end of the proximal promoter in parentheses. 7m1, 7m2 (panel B), and 14m1 (panel F) refer to mutant constructs tested experimentally. Crosses indicate the sites mutated in each construct. C, G. Wildtype and mutant CRM activity predicted by the model in silico. D, H. Experimentally measured activity of wildtype and synthesized mutant CRMs. Both predicted and measured activity levels have been normalized to the activity of Cebpa(0) regulatory scheme of C/EBP and Gfi1 activation inferred by the model, although residual upregulation of Cebpa(7m2) suggests that as yet unknown TFs also contribute to the activity of Cebpa (7).
Cebpa (14). We had inferred that Cebpa (14) is activated exclusively by Egr1, which binds the CRM at two predicted sites (Fig 3E and 3F). Consistent with regulation by a single factor, Egr1, the activity pattern of Cebpa(14) (Fig 2) matches that of Egr1 [21], having the lowest expression in induced GCSF conditions. We designed a mutant CRM, Cebpa(14m1), which lacks both Egr1 sites. Simulation of Cebpa(14m1) predicted a reversion of activity to the level of the proximal promoter ( Fig 3G). Experimentally, we observed a reduction of *35% (Fig  3H), demonstrating the functionality of the Egr1 sites and suggesting that other TFs not represented in the model might also activate Cebpa (14).
Cebpa (16). The model for enhancer Cebpa(16) utilizes 4 activator binding sites, 3 for PU.1 and 1 for C/EBP family TFs (Fig 4A and 4B). Activation by PU.1 is consistent with the preferential upregulation of Cebpa (16) in induced conditions (Fig 2), when the PU.1-estrogen receptor fusion protein is expected to be localized to the nuclei. A mutant enhancer lacking the PU.1 sites, Cebpa(16m1), was predicted to lack enhancing activity in induced conditions, while being expressed at the same level as wildtype in uninduced conditions (Fig 4C). Experimentally, Cebpa(16m1) behaved as predicted, with an activity nearly half of Cebpa (16) and indistinguishable from that of the proximal promoter in induced IL3 conditions (Fig 4D). There was a much smaller reduction in uninduced conditions so that the activity of Cebpa(16m1) was statistically indistinguishable from that of the wildtype enhancer. The activity of Cebpa(16m1) was also *43% lower than that of Cebpa (16) in the induced GCSF condition, although residual upregulation relative to the proximal promoter likely implies that the C/EBP site is also functional.
Given that Cebpa (18) is 201bp longer than the +37kb enhancer, we next investigated whether the extra sequences had any function or not. As a first step, we simulated the +37kb enhancer in our model. The model predicted that the activity of the +37kb enhancer is 7-and 4.5-fold higher than that of Cebpa (18) in the uninduced and induced IL3 conditions respectively (Fig 4G), suggesting that the extra sequence has a repressive function. We tested the activity of the +37kb enhancer in PUER cells and observed a *2.5-fold increase relative to Cebpa (18) in both uninduced and induced IL3 conditions (Fig 4H), confirming a repressive role for the extra sequence.
We analyzed the repressors predicted by the model to pinpoint the TFs and binding sites responsible for moderating Cebpa(18)'s activity. The model had inferred five active repressor sites in Cebpa (18), Fli1, Elf1, GATA, Myb, and Ebf1 (Fig 4E and 4F). Of these five, only two, Myb and Ebf1, are unique to Cebpa (18), lying in the extra 201bp of sequence. Of the two TFs, Myb is more likely to mediate the repressive effects since Ebf1 is not expressed in myeloid cells [46]. To test the function of Myb, we simulated a mutant CRM lacking the Myb site, Cebpa (18m1), with the model. The model predicted that Cebpa(18m1) has a much higher level of in each condition. Reporter assays were performed in 10 replicates. Error bars are 95% confidence intervals. Regression plots corresponding to each bar are shown in Fig E in (16). E-H. Cebpa (18). A, E. Schematics of the construct design showing a distal CRM (blue) and the Cebpa proximal promoter (red). B. Activity of each TF activator site predicted by the sequence-based model for Cebpa (16). 16m1 refers to the mutant construct for testing PU.1 sites (crosses). See the legend of Fig 3B and 3F for details of the calculations, axes, and legend. F. Activity of each TF repressor site predicted by the sequence-based model for Cebpa (18). The repressive activity is the fraction by which the repressor reduces the interaction strength, which results in a higher activation energy barrier. The repressive activity depends on the occupancy of the repressor site and the efficiency of long-range repression of the bound repressor [21]. 18m1 refers to the mutant construct for testing the Myb site (cross). See the legend of Fig 3B and 3F  activity that is indistinguishable from that of the +37kb enhancer (Fig 4G). This is indeed how the mutant enhancer behaved in experiment. Cebpa(18m1)'s activity was derepressed relative to Cebpa (18) and indistinguishable from that of the +37kb enhancer (Fig 4H), with the caveat that the model overestimated the quantitative magnitude of Myb's repression. Since Myb is downregulated in induced PUER cells [21], this result suggests that the upregulation of Cebpa enhancers in induced conditions is a consequence not just of a gain in activation by PU.1, but also a loss of repression by Myb.
Validation against ChIP-seq datasets. Having validated the binding sites predicted by the model, we checked in publicly available genome-wide TF binding datasets whether the predicted TFs bind to Cebpa CRMs. We compiled a set of ChIP-seq datasets for C/EBP family TFs, PU.1, Gfi1, Egr1, and Myb in myeloid cell types (Fig M in S1 Text). C/EBP peaks were detected in the proximal promoter, Cebpa (7), and Cebpa (16). PU.1 peaks were detected in Cebpa (16) and Cebpa (18). Gfi1, Myb, and Egr1 bind to Cebpa (7), Cebpa (18), and the proximal promoter respectively. We were unable to verify just one prediction, the binding of Egr1 to Cebpa (14), with the available datasets. This discrepancy could be a result of cell-type specific binding of Egr1 since the ChIP data in question (GSM881139) are from Dendritic cells [47] and not GMPs. Taken together, the TF binding data strongly support the model's predictions.
To summarize, we have decoded the cis-regulatory logic of four Cebpa enhancers at the resolution of individual binding sites. In all cases, the model's predictions were borne out by experiment. The identified TFs and their sites are likely the most important regulators of Cebpa during macrophage-neutrophil differentiation since they account for most of the CRM activity. The investigated TFs do not however account for all of the CRM activity, suggesting that other TFs not represented in the model also perhaps regulate Cebpa. The overall picture that emerges is that C/EBP family TFs, Gfi1, and Egr1 support Cebpa's expression in uninduced or progenitor conditions by binding to Cebpa (7) and Cebpa (14). Activation in induced conditions is provided via Cebpa (16) and Cebpa(18) by increased PU.1 activation and a loss of Myb repression.

The role of novel silencer elements in hematopoietic lineage resolution
Our previous analysis had revealed CRMs that, when placed in the reporter vector along with the Cebpa promoter, reduced the activity of the construct to levels lower than that of the promoter alone [21]. This mode of action is consistent with the definition of silencers [44,48]. The reduction of activity to levels lower than that of the promoter alone implies that the repressors binding to these silencers act in a dominant manner, similar to long-range repression observed in Drosophila [49,50]. Furthermore, the CRMs in question, Cebpa(9), Cebpa (11), Cebpa (23), and Cebpa(24), lie 9-40kb away from the Cebpa TSS (Fig 1A), implying that dominant repression occurs over long distances. Our analysis had inferred that the silencers were repressed by GATA family TFs, Ebf1, and Myb (Fig 1A and [21]). Gata1/ Gata2 and Ebf1 play key roles in the specification of the red-blood cell and B-cell lineages respectively [28,46], while Myb has been implicated in megakaryocyte development [51]. These inferences are supported by evidence that Gata2 binds to Cebpa (11) and Cebpa (24) in G1ME cells (Fig M in S1 Text), which are blocked at the MEP stage and can be differentiated into erythrocytes [36,52].
These observations motivated the hypothesis that was the subject of our subsequent experiments. We hypothesized that dominant repression mediated via distal silencers is a mechanism for resolving hematopoietic lineages. We tested this hypothesis by 1) checking whether the silencers do, in fact, exert dominant repression in a non-myeloid cell type, and 2) determining whether the silencing is attributable to the predicted repressors.
The activity pattern of Cebpa silencers. Before testing the activity of the silencers in a non-myeloid cell type, we measured their activity in PUER cells using the statistically rigorous methodology we developed for analyzing reporter data. In PUER cells, all of the silencers were either neutral or had weak enhancing activity (Fig H in S1 Text). This result implies that previous observations of reduced activity in PUER cells [21] were likely artifacts of low sample size or statistically unsound normalization. Furthermore, this result implies that these CRMs do not silence Cebpa in a myeloid background.
Next, we tested the function of silencers in G1ME cells, representative of the red-blood cell lineage. We chose the red-blood cell lineage since Gata2 is known to bind the silencers Cebpa (11) and Cebpa (24) in G1ME cells (Fig M in S1 Text). Even though we could not detect Cebpa expression in G1ME cells (Fig K in S1 Text), the Cebpa promoter had detectable activity (Fig 5). This suggested that additional repression is required to completely silence Cebpa. Reporter vectors carrying Cebpa (9), Cebpa (11), Cebpa (24) in addition to the promoter had *3-fold lower activity compared to the promoter alone (Fig 5). The silencers, therefore, while being inert in myeloid cells, repress the Cebpa proximal promoter in the red-blood cell lineage.
GATA and Myb repress the Cebpa proximal promoter in a dominant and redundant fashion. We decoded the cis-regulatory logic of the silencers using the same model-guided strategy as was employed for the enhancers. In contrast to the enhancers, each of which had a distinctive regulatory scheme, the validated silencers shared a common regulatory motif. All three silencers had GATA and Myb sites, which were predicted to be among the most active in each silencer (Fig 6B, 6E and 6H). GATA family TFs and Myb are plausible repressors of Cebpa. Knocking down Gata2 leads to the derepression of Cebpa in G1ME cells [53], while we have demonstrated that Myb represses Cebpa (18) (Fig 4G and 4H). We synthesized mutants CRMs-Cebpa(9m1), Cebpa(11m1), and Cebpa(24m1)-lacking binding sites for both TFs (Fig 6B, 6E and 6H). Cebpa(11m1) carried additional mutations in an Ebf1 site but was functionally equivalent to a GATA/Myb mutant since Ebf1 is not expressed in the red-blood cell lineage (http://biogps.org/gene/13591; [52,54]). As predicted by the model, the mutant CRMs were derepressed relative to wildtype and, in the case of Cebpa(9m1) and Cebpa(11m1), their activity was indistinguishable from that of Cebpa(0) (Fig 6C, 6F and 6I). This implies that the silencing can be attributed specifically to GATA and Myb, which account for the entirety of the effect in two of three silencers.
Next, we investigated how GATA and Myb jointly repress the activity of the Cebpa proximal promoter. We considered three hypotheses and tested them by mutating GATA and Myb sites individually in Cebpa (11) (Fig 7A). First, it is possible that GATA and Myb repress the proximal promoter redundantly [55], so that only one functional site is sufficient to achieve silencing. The second possibility is that of synergism [17,56,57], in which GATA and Myb would have much greater silencing activity together than individually. The third possibility is that of context-dependent role switching, when a TF switches its role when bound near a second TF. For example, in the Drosophila blastoderm, the repressor Hunchback activates gene expression when bound near Bicoid [58,59]. We tested two new constructs, Cebpa(11m2) and Cebpa(11m3), which carry impaired GATA or Myb sites respectively. Both of the constructs carrying only one functional repressor site were able to silence the proximal promoter ( Fig  7C). The expression of Cebpa(11m2) is lower than that of Cebpa (11), which could be interpreted to imply an activating role for GATA. However, if GATA were an activator, one would expect a loss of repression in Cebpa(11m3), which has a functional GATA site. Since this is not the case, we favor the explanation that perhaps, even though Myb is a more potent repressor, GATA competes with Myb and limits the overall repression to that achieved by GATA alone. This would help explain why the activities of Cebpa(11m3) Cebpa (11) are indistinguishable.
The maintenance of repression in the single mutants supports the hypothesis that GATA and Myb are capable of repressing the promoter individually and act redundantly.
In summary, we have shown that three of four putative silencers are capable of attenuating the activity of the Cebpa proximal promoter in a dominant manner. Dominant repression only occurs in the red-blood cell lineage and the CRMs do not silence the proximal promoter in myeloid cells. Lastly, the silencing activity is attributable to a regulatory motif shared by all three silencers-GATA and Myb sites that act redundantly.

Discussion
We have comprehensively analyzed the regulation of 7 CRMs neighboring Cebpa at the resolution of individual binding sites. In the process of doing so, we have also verified the predictive ability of a thermodynamic model of mammalian gene regulation that we developed recently [21]. It is worth noting that prior to our efforts, thermodynamic modeling was limited to Drosophila gene regulation [16,22,24,[60][61][62][63][64], with a single gene, even-skipped, as the focus of most of the work. Our model is closely related to its Drosophila counterparts, incorporating just one additional mechanism, long-distance dominant repression, lacking in the latter. The ability of models with shared mechanisms of gene regulation to predict reporter activity in these divergent species supports the view that the rules of transcriptional regulation are universal.
Long-distance dominant repression by non-myeloid TFs, GATA [65], Myb [51], and Ebf1 [46], was required to correctly model silencers [21]. This led us to hypothesize that long-distance repression by silencer-bound TFs is necessary for quenching Cebpa expression in non- myeloid lineages. The hypothesis predicts that the silencers of Cebpa would have much lower activity in the myeloid lineage, where the gene is expressed, than in non-myeloid ones, where it is not expressed. When we assayed the activity of the silencers in PUER cells, which belong to the myeloid lineage, we found they they acted as enhancers or neutral elements (Fig H in S1 Text), confirming part of the prediction. In contrast, 3 of 4 putative silencers downregulated promoter activity *3-fold in G1ME cells (Fig 5) belonging to the megakaryocyte-erythrocyte lineage, confirming the rest of the prediction. Furthermore, silencing by distal elements appears to be necessary for quenching Cebpa expression in G1ME cells since the promoter has detectable activity (Fig 5), even though Cebpa expression is not detectable in G1ME cells ( Fig  K in S1 Text). The necessity of silencing for completely quenching Cebpa expression in the red-blood cell lineage suggests that long-distance repression is a novel mechanism for resolving hematopoietic lineages.
Detailed analysis of the regulatory logic of silencers revealed two layers of redundancy in their function. First, structural similarity underlies the functional equivalence of all three silencers. All silencers contain the same regulatory motif, a pair of GATA and Myb sites, that mediates dominant repression of the Cebpa promoter (Fig 6). Second, the regulatory motif itself is structured redundantly since mutating either GATA or Myb alone is not sufficient for relieving dominant silencing (Fig 7). We did not find evidence for synergy, where the combined effect of the two sites is greater than the sum of individual effects, implying that GATA and Myb function redundantly. The presence of multiple functionally equivalent and structurally homologous silencers in the locus suggests that the regulatory architecture of distal silencing is similar to that of distal activation by multiple redundant enhancers [66,67]. Redundant enhancers have been shown to ensure robust and precise gene expression [50,68,69], leading us to speculate that redundant silencing might serve a similar function.
The activation of Cebpa CRMs in myeloid cells also occurs in a redundant and overlapping regulatory arrangement reminiscent of shadow enhancers in Drosophila [66]. All enhancers are simultaneously co-active in nearly all conditions tested in PUER cells. The sole exception is CRM18, for which we could not detect statistically significant upregulation in induced IL3 (macrophage) conditions (Fig 2). The coactive enhancers of Cebpa share common regulators. Cebpa (16) (Fig 4B) and Cebpa (18) (Fig F in S1 Text) are activated by PU.1 and other ETS factors, while CRM7 and CRM16 are activated by C/EBPα. CRM14 is the exception with predicted and verified binding sites for Egr1 ( Fig 3F) unique to itself.
The overall picture that emerges from our analysis of Cebpa enhancers and silencers is that of a distributed and specialized control scheme (Fig 1). CRMs distributed over an *80kb region specialize in either activation or repression. Specialization is a departure from cis-regulatory organization of Drosophila segmentation genes, whose enhancers are capable of both activation and repression [31,49,50,70]. Although this arrangement could be evolutionary happenstance, it is also possible that it serves a functional purpose. Despite the antagonism between Gata1/Gata2 and Cebpa that we ( [21] and Fig 7) and others [52] have demonstrated, Gata2 and Cebpa are known to be co-expressed in eosinophils, basophils, and mast cells [71]. The regulatory logic of Cebpa, therefore, must allow for expression even in the presence of Gata2 protein. We propose that separable activation and silencing allows Cebpa to be S2 Dataset. Luciferase activity measurements in PUER cells. Luc/Ren are luminescence measurements from each well. Construct indicates the CRM/promoter being assayed, where "0" is promoter by itself. OHT is true or false for induced or uninduced conditions respectively. The "note" column indicates whether the construct is wildtype and, if not, which sites were mutated. (CSV) S3 Dataset. Luciferase activity measurements in G1ME cells. Luc/Ren are luminescence measurements from each well. Construct indicates the CRM/promoter being assayed, where "0" is promoter by itself. OHT is true or false for induced or uninduced conditions respectively. The "note" column indicates whether the construct is wildtype and, if not, which sites were mutated. (CSV)