Uncovering a Macrophage Transcriptional Program by Integrating Evidence from Motif Scanning and Expression Dynamics

Macrophages are versatile immune cells that can detect a variety of pathogen-associated molecular patterns through their Toll-like receptors (TLRs). In response to microbial challenge, the TLR-stimulated macrophage undergoes an activation program controlled by a dynamically inducible transcriptional regulatory network. Mapping a complex mammalian transcriptional network poses significant challenges and requires the integration of multiple experimental data types. In this work, we inferred a transcriptional network underlying TLR-stimulated murine macrophage activation. Microarray-based expression profiling and transcription factor binding site motif scanning were used to infer a network of associations between transcription factor genes and clusters of co-expressed target genes. The time-lagged correlation was used to analyze temporal expression data in order to identify potential causal influences in the network. A novel statistical test was developed to assess the significance of the time-lagged correlation. Several associations in the resulting inferred network were validated using targeted ChIP-on-chip experiments. The network incorporates known regulators and gives insight into the transcriptional control of macrophage activation. Our analysis identified a novel regulator (TGIF1) that may have a role in macrophage activation.


Introduction
Dynamic cellular processes, such as the response to a signaling event, are governed by complex transcriptional regulatory networks. These networks typically involve a large number of transcription factors (TFs) that are activated in different combinations in order to produce a particular cellular response. The macrophage, a vital cell type of the mammalian immune system, marshals a variety of phenotypic responses to pathogenic challenge, such as secretion of pro-inflammatory mediators, phagocytosis and antigen presentation, stimulation of mucus production, and adherence. In the innate immune system, the first line of defense against infection, the macrophage's Toll-like receptors (TLRs) play a crucial role by recognizing distinct pathogen-associated molecular patterns (PAMPs), such as flagellin, lipopeptides, or double-stranded RNA [1,2]. TLR signals are first channeled through adapter molecules (e.g., TICAM1/TRIF [3,4] and MyD88 [5]) and then through parallel cross-talking signal pathways. These activated pathways initiate a transcriptional program in which over 1,000 genes [6] and hundreds of TF genes [7] can be differentially expressed, and which is tailored to the type of infection [8,9]. The transcriptional network underlying macrophage activation can exhibit many distinct steady-states which are associated with tissue-and infection-specific macro-phage functions [10]. The transcriptional response is also dynamic and characterized by temporal waves of gene activation [6,7,9], each enriched for distinct sets of gene functions [7,9] and likely to be controlled by different combinations of transcriptional regulators [6,7]. Long-term, elucidating the transcriptional network underlying TLR-stimulated macrophage activation, and identifying key regulators and their functions, would greatly enhance our understanding of the innate immune response to infection and potentially yield new ideas for vaccine development.
Computational analysis of high-throughput experimental data is proving increasingly useful in the inference of transcriptional regulatory interaction networks [11][12][13][14][15] and in the identification and prioritization of potential regulators for targeted experimental validation [6,7]. Time-course microarray expression measurements have been used to infer dynamic transcriptional networks in yeast [14,15] and static ''influence'' networks in mammalian cell lines [11]. In the context of primary macrophages, expressionbased computational reconstruction of the transcriptional control logic underlying the activation program is not straightforward and progress is difficult to measure, for several reasons. First, transcriptional control within mammalian networks in general [16], and for key TLR-responsive genes in particular [7], is combinatorial. Second, many induced TFs are subject to posttranslational activation [17] and dynamic control of nuclear localization [6]. Third, targeted genetic perturbations are presently infeasible to perform on a large scale in a mammalian animal model, and expression knockdown (RNAi) is difficult in macrophages due to the tendency of the vector to stimulate TLRs. Finally, the few transcriptional regulatory interactions that have been validated through targeted experiments in TLR-stimulated primary macrophages are not available in a single ''gold standard'' dataset. Therefore, in the context of transcriptional regulation in the mammalian macrophage, with presently accessible expression data sets, large-scale computational inference is primarily useful for statistically identifying potential regulatory interactions, rather than as an inference tool for predicting the transcriptional control logic for specific target genes.
For the reasons described above, in order to computationally infer transcriptional regulatory interactions in a mammalian system, it is necessary to include additional sources of evidence (beyond expression data) to constrain or inform the transcriptional network model selection. Computational scanning of the promoter sequences of clusters of co-expressed genes for known transcription factor binding site (TFBS) motifs has proved particularly valuable when combined with global expression data [6,18,19]. Recently, Nilsson et al. [7] used a combination of expression clustering and promoter sequence scanning for TFBS motifs to construct an initial transcriptional network of the macrophage stimulated with the TLR4 stimulus lipopolysaccharide (LPS). Their work identified two novel regulators, but the clustering was based on an expression dataset with a single stimulus, limited biological replicates, and few time points. Moreover, TFBS motif scanning of co-expressed clusters, without utilizing expression dynamics, provides only a limited and static picture of the underlying transcriptional network. Many TFBS motifs are often recognized by multiple TFs, making difficult the unambiguous identification of the regulating TF from TFBS enrichment alone. Furthermore, because of the tendency of TFBS motifs to co-occur [20], it is difficult to determine from among a set of co-occurring motifs which associated TF is the most relevant to the condition-specific regulation of the target cluster. In the TLR-stimulated macrophage, core transcription factors already expressed in the cell (e.g., NFkB, AP1, and CREB) are rapidly activated and initiate transcriptional regulation of ''second wave'' TF genes [6]. Such transcriptionally regulated TF genes are key candidates for an integrated analysis combining TF-specific dynamic expression data and sequence-based motif scanning data.
This work is concerned with using computational data integration to identify a set of core differentially expressed transcriptional regulators in the TLR-stimulated macrophage and, in the form of statistical associations, the clusters of coexpressed genes that they may regulate. The clusters are differentiated based on temporal and stimulus-specific activation, and in this sense, the inferred associations constitute a preliminary dynamic transcriptional network for the TLR-stimulated macrophage. To achieve this, we used a novel computational approach incorporating TFBS motif scanning and statistical inference based on time-course expression data across a diverse array of stimuli. Our approach involved four steps. (i) A set of genes was identified that were differentially expressed by wild-type macrophages under at least one TLR stimulation experiment. (ii) These genes were clustered based on their expression profiles across a wide range of conditions and strains, grouping genes based on the similarity of the timing and stimulus-dependence of their induction. Gene Ontology annotations were used to identify functional categories enriched within the gene clusters. (iii) Promoter sequences upstream of the genes within each cluster were scanned for a library of TFBS motifs, each recognized by at least one differentially expressed TF, to identify possible associations between TFs and gene clusters. (iv) Across eleven different timecourse studies, dynamic expression profiles of TF genes and target genes were compared in order to identify possible causal influences between differentially expressed TF genes and clusters.
Several techniques have been developed specifically for model inference from time-course expression data, notably dynamic Bayesian networks (DBN) [21] and ODE-based model selection [12]. However, the parametric complexity of these model classes makes it difficult to apply them to infer a network underlying a specific cellular perturbation (e.g., TLR activation in the macrophage) with a limited expression dataset. Here, potential transcriptional regulatory influence is inferred from time-course expression data using the time-lagged correlation (TLC) statistic, which has been used to infer biochemical interaction networks [22] as well as transcriptional networks [23][24][25][26][27][28][29]. The TLC has the advantage that it accounts for the time delay between differential expression of an induced TF and differential expression of a target gene. In contrast to standard correlation-based methods that identify co-expressed genes, the TLC method uses temporal ordering of expression to determine whether the time lag between two correlated genes is consistent with a causal interaction. We developed a novel method to identify the optimal time lag for each gene pair, and used a prior probability distribution of transcriptional time delays to score possible interactions.
By combining the promoter scanning-based evidence with the evidence obtained by the time-lagged correlation analysis of the expression data, we were able to identify a network of statistically significant associations between 36 TF genes and 27 co-expressed clusters. Overall, 63% of differentially expressed genes are included in the network. The network provided insights into the temporal organization of the transcriptional response and into

Author Summary
Macrophages play a vital role in host defense against infection by recognizing pathogens through pattern recognition receptors, such as the Toll-like receptors (TLRs), and mounting an immune response. Stimulation of TLRs initiates a complex transcriptional program in which induced transcription factor genes dynamically regulate downstream genes. Microarray-based transcriptional profiling has proved useful for mapping such transcriptional programs in simpler model organisms; however, mammalian systems present difficulties such as post-translational regulation of transcription factors, combinatorial gene regulation, and a paucity of available geneknockout expression data. Additional evidence sources, such as DNA sequence-based identification of transcription factor binding sites, are needed. In this work, we computationally inferred a transcriptional network for TLR-stimulated murine macrophages. Our approach combined sequence scanning with time-course expression data in a probabilistic framework. Expression data were analyzed using the time-lagged correlation. A novel, unbiased method was developed to assess the significance of the time-lagged correlation. The inferred network of associations between transcription factor genes and coexpressed gene clusters was validated with targeted ChIPon-chip experiments, and yielded insights into the macrophage activation program, including a potential novel regulator. Our general approach could be used to analyze other complex mammalian systems for which time-course expression data are available.
combinations of TFs that may act as key regulators of macrophage activation. Finally, the analysis identified a potential transcriptional regulator, TGIF1 (Tgif1), which was not previously known to play a role in macrophage activation. As a targeted experimental validation of the inferred network, two transcriptional regulators, p50 (a component of NFkB) and IRF1, were assayed for binding to cis-regulatory elements in LPS-stimulated macrophages using ChIP-on-chip, and were confirmed to bind the promoters of genes in four out of five predicted target clusters at significantly higher proportions than expected for a random set of TLR-responsive genes.

Gene selection and clustering
To probe a diverse set of transcriptional responses of Toll-like receptor (TLR)-stimulated macrophages, primary bone marrowderived macrophages (BMMs) from five mouse strains (wild-type and four mutant strains; see Table S1) were stimulated with six purified TLR agonists representing various pathogen-associated molecular patterns (PAMPs). The TLR agonists include bacterialassociated (lipopolysaccharide, Pam 2 CSK 4 , Pam 3 CSK 4 , CpG), viral-associated (poly I:C), and anti-viral (R848) stimuli, and are listed in Table S2. The mutant strains, which were included to increase the diversity of the TLR-stimulated gene expression dataset and to increase the number of time-course measurements used, consisted of null mutations of the two key adapter molecules for the TLR signaling pathway (TRIF [3] and MyD88 [5]) and two TFs predicted to be involved in TLR activation (ATF3 [30] and CREM [31]). Genome-wide expression measurements of 45,037 probesets, representing 23,259 annotated genes, were made for time courses of up to 48 hours post-stimulation, using oligonucleotide microarrays (see Materials and Methods). In all, expression measurements were made for 95 distinct combinations of strain, stimulus, and elapsed time (hereafter, ''experiments''; see Table S3). Using a spline-based multivariate regression method specifically adapted for significance testing of temporal expression datasets [32], annotated probesets were analyzed for differential expression across seven TLR-stimulated wild-type expression time-courses. After filtering for minimum absolute expression intensity and differential expression under at least one TLRstimulation experiment (see Materials and Methods), 1,960 probesets were identified as significantly differentially expressed, with each probeset mapped to a unique gene (see Table S4). Of these, 44% were found to be upregulated in LPS-stimulated wildtype macrophages. Additionally, a set of 80 TF genes (for which corresponding position-weight matrices are available in the TRANSFAC database [33]) were found to be differentially expressed in the TLR-stimulated wild-type macrophage (Table  S5). Those of TF families with established relevance in macrophage activation included two NFkB [34] component genes (Rel, Nfkb1), three AP1 [35] components (Jun, Junb, Fos), two ATF family genes [6] (Atf1, Atf3), six IRF family TF genes (Irf1/2/3/5/ 7/9) [17], and four STAT family TF genes [36] (Stat1/3/4/5a). The 80 TF genes were taken to constitute the set of potential regulators in the TLR-stimulated macrophage network.
Clustering was used to identify cohorts of genes that were coexpressed across the diverse set of TLR-stimulation experiments, based on the assumption that genes within a cluster are likely to share common cis-regulatory elements such as TF binding sites [18]. In order to focus on TF control of the timing and stimulus specificity of gene expression, genes were clustered based on the normalized profile of expression, rather than based on the foldchange. Expression measurements were transformed based on a single universal reference experiment (wild-type unstimulated macrophages) so that the transformed measurements would all lie between 21 and 1, with zero indicating the intensity in the reference experiment. This technique, which we call the signed difference ratio (SDR), has previously proved useful in clustering genes based on temporal expression in a mammalian system [37]. Each log 2 intensity measurement y pj for probeset p and nonreference experiment j, was transformed to an SDR value x pj by where j R is the index of the global reference experiment (j9 has the same range of values as j). By construction, 21#x pj #1 for all p and j. A positive SDR value indicates higher expression than in the reference experiment, and a negative value indicates lower expression. The SDR-transformed log 2 intensities of all 1,960 target genes across all 94 non-reference experiments were clustered using an unsupervised algorithm (K-means with Euclidean distance), with the number of clusters chosen using the Bayesian information criterion (BIC) [38] (see Materials and Methods, and Figure S1). The target genes were partitioned into 32 clusters (see Table S4, column 5). The differences in temporal and stimulus-specific expression between the clusters are clearly visible in a heat-map representation of the SDR-transformed expression data (hereafter, ''expression data'') ( Figure 1; see also Figure S2 for the cluster-median expression heat-map). The clusters (Table S6), which ranged in size from 18 to 113 genes, exhibit a significant diversity of timing and TLR-specificity of response. The wild-type LPS time-course was used to characterize the time scale for each cluster to respond transcriptionally (see Materials and Methods, and Table S6 columns 3-4). A small number of clusters reach peak induction within the first hour (C28, C27, C25, C26), but the majority of clusters (representing 55% of genes) respond between 2-4 hours. The temporal profiles of the clusters in wild-type BMMs under stimulation by LPS, Pam 3 CSK 4 , poly I:C, and R848 are shown in Figure S3, Figure S4, Figure S5, and Figure S6, respectively. The clusters exhibit distinct temporal profiles of transcriptional activation and repression that vary in the time of initial response and the duration of differential expression. Across all four stimuli, cluster C28 is induced first (and has sustained induction), followed by cluster C27 (which undergoes transient (2-3 h) upregulation), and then by induction of C25 and C26. Induction of C27 and C28 is delayed approximately 1 h under poly I:C stimulation, while C26 fails to fully induce under poly I:C. A comparison of the responses of clusters under 8 hours post-stimulation (LPS, Pam 3 CSK 4 , poly I:C, and R848) enabled the segregation of these clusters based on the signal transduction pathway through which they are likely primarily regulated ( Figure 2). Groups include those primarily induced (C11, C12, C15, C17, C21, C26) and downregulated (C7, C29) by the MyD88-dependent pathway, and those primarily induced (C6, C8, C22, C24) and downregulated (C4, C5, C10, C20) by the TRIF-dependent pathway. Although ''core early response'' clusters C27 and C28 appear to be inducible through either signaling pathway, a comparison of the wild-type LPS vs. poly I:C response and of the wild-type vs. Ticam1 (Lps2/Lps2) and Myd88 (2/2) responses under LPS (see Table  S7) together indicate that the MyD88-dependent pathway is responsible for the early response (within the first hour), and the TRIF-dependent pathway is responsible for sustaining the induction of these clusters. Early induced TF genes (Egr1/2/3, Junb, Rel, Irf1) also appear to be inducible through either pathway, Figure 1. Global heat-map of differential gene expression in TLR-stimulated murine macrophages, organized by clusters of coexpressed genes. Each row is one of the 1,960 genes that are differentially expressed in macrophages under TLR stimulation, and each column is a replicate-combined microarray experiment. Red/green coloring indicates the differential expression level (SDR-normalized, see Equation 1). Red indicates upregulation relative to wild-type unstimulated macrophages. Green indicates downregulation relative to wild-type unstimulated macrophages. Genotypes are indicated along the bottom edge. Clusters are indicated along the left edge. Stimuli are indicated along the top edge, with the color scheme given in the lower right corner. Clusters have been ordered based on pairwise similarity, as described in Materials and Methods, Expression Clustering. doi:10.1371/journal.pcbi.1000021.g001 from analysis of the LPS response in Ticam1 (Lps2/Lps2) and Myd88 (2/2) macrophages.
To characterize the functional role of each gene cluster in macrophage activation, gene ontology (GO) information was used to identify GO term enrichments within the gene clusters (see Materials and Methods). The 460 GO term enrichments identified within the 32 gene clusters are listed in Table S8. Many of the downregulated gene clusters are enriched for cell cycle related genes (C1, C3, C7). Clusters C15, C25, and C28 appear to be enriched for cytokines-C28 includes the pro-inflammatory cytokine Tnf (TNFa) as well as Ccl3, Ccl4, Cxcl1, and Cxcl2; C25 includes the cytokines Cxcl10 and Il10; and C15 includes the interleukin cytokine genes Il1b, Il6, and Il12b. Cluster C24, enriched for signal transduction genes, also includes the important cytokine Ifnb1 (IFNb). The early-unregulated clusters, C24-28, show a high proportion of induced TFs and are enriched for TFs relative to the genome (see Table S6 and Materials and methods). Across clusters, the fraction of TFs was generally found to decrease with increasing induction time ( Figure 3). Subsequent analysis focused on identifying statistically significant associations between the 80 differentially expressed TF genes and the 32 co-expressed clusters.

Expression dynamics analysis
Noting the high proportion of induced TFs in early-upregulated clusters, we chose a signal processing technique, the time-lagged correlation (TLC), to assess potential transcriptional regulatory interactions using the time-course expression data [22,23,[25][26][27][28]. The approach is based on the observation that when an induced TF affects a target gene's expression through its own differentially Figure 2. Hierarchical organization of differentially expressed gene clusters from TLR-stimulated macrophages reveals pathwayspecific transcriptional responses. The color of a rectangle in the heat-map shows the cluster-median differential expression (relative to wildtype unstimulated macrophages) under stimulation with the TLR agonist indicated by the column label (bottom of figure), for the cluster indicated by the row label (right-hand side). The column label Pam3 denotes the TLR agonist Pam 3 CSK 4 . The differential gene expression level is computed using the signed difference ratio (SDR, see Equation 1). Clusters (rows) have been ordered for display based on similarity of overall transcriptional response to the four indicated TLR agonists (see Materials and Methods, Expression Clustering). In the heat-map, green indicates downregulation, and red indicates upregulation. Colored subtrees of the dendrogram indicate specific inferences that can be made about the likely signaling pathway (MyD88-dependent, TRIF-dependent, or a combination) on which the transcriptional regulation of the cluster depends. The legend in the lower-left corner explains the color scheme for denoting the inferred signaling pathway-dependence of the clusters. Clusters without a color bar on the right appear to respond through either signaling pathway. The regulation of clusters C7, C11, C12, C15, C17, C21, C26, and C29 appears to be primarily MyD88-dependent; regulation of clusters C4, C5, C6, C8, C10, C20, C22, and C24 appears to be primarily TRIF-dependent; and clusters C23, C30, and C32 appear to be regulated oppositely by the two signaling pathways. This plot shows only the extremal differential response to TLR agonists; the clusters also differ in temporal expression (see Figure S3, Figure S4, Figure S5, and Figure S6). doi:10.1371/journal.pcbi.1000021.g002 regulated mRNA level (and through its own differential protein expression), the induction of the target gene's mRNA expression will occur with a time lag relative to the induction of the regulator [39][40][41][42]. This time lag is due to the combined effects of the translation, folding, nuclear translocation, and turnover timescales for the regulatory protein, and the time scale for elongation of the target gene mRNA. In our application of the TLC method, both the correlation magnitude and the time lag are used to assess significance, as we describe below.
Let g 1 denote a differentially expressed TF gene, and let g 2 denote a differentially expressed gene. We wish to estimate our degree of confidence in the null hypothesis, that g 1 does not transcriptionally regulate g 2 , given time-course expression data for both genes. In the simplest case, the alternative hypothesis could be that g 1 codes for a TF protein that binds the promoter of g 2 , thereby regulating its transcriptional activity. Let t be a fixed time lag for which the TLC between g 1 and g 2 is to be computed. Let T denote a set of discrete time points at which gene expression is measured, and let T9 denote the set of time points T+t. Let X T (g 1 ) denote the vector of expression measurements of g 1 at the time points T, and let X T9 (g 1 ) denote the measurements of g 2 at times T9 (which can be estimated by interpolation). The time-lagged correlation (TLC) coefficient between g 1 and g 2 with time lag t is defined as r t (g 1 ,g 2 ): cov(X T (g 1 ),X T 0 (g 2 )) ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi cov(X T (g 1 ),X T (g 1 ))cov(X T 0 (g 2 ),X T 0 (g 2 )) p where ''cov'' is the standard covariance. As with the standard correlation, a TLC that is close to 1 represents a perfect correlation, and a TLC that is close to 21 represents a perfect anti-correlation. This definition is easily extended to multiple timecourses (see Materials and Methods). We note that although Equation 2 is defined in terms of g 1 being a TF, it can be applied to any gene pair, for example, to obtain a background distribution of TLC coefficients of gene pairs satisfying the null hypothesis. Two examples of a TF exhibiting a high time-lagged correlation with a target gene are shown in Figure 4. Both interactions (RelRNfkb1 [43] and Irf7RStat1 [44]) correspond to known transcriptional regulatory interactions, and in both cases, the correlation with zero time lag is poorer than the correlation obtained with a time lag. Assessing the significance of an observed sequence of timelagged correlations between two genes (as a function of the time lag) as an indicator of possible transcriptional regulation necessitates formulating our prior expectation (i.e., prior probability distribution) for the time lag of a true transcriptional regulatory interaction. For a TF gene g 1 and a target gene g 2 , the overall transcriptional regulatory time delay t c (where ''c'' stands for the combined gene-gene delay) can be decomposed as a sum of two contributions, one for translation of the TF and posttranslational processing/translocation (,10.564 min [41,45]), and one for transcription and post-transcriptional processing of the target gene (,20-40 min [41,42]). The total delay t c was modeled using the gamma distribution with a mean value of 45 min and a variance of ,250 min 2 (see Text S1, Section 3). Because it is conditioned on the existence of a transcriptional regulatory interaction (TRI) between g 1 and g 2 , we denote this probability distribution by P(t c |H 0 ) (the symbol H 0 means that the null hypothesis, i.e., that there is no TRI, is false). This distribution was discretized to the set of time lags for which the TLC was computed, to obtain an estimate of the discrete probability for observing a given optimal time lag, P(t|H 0 ) (see Figure S7). These probabilities were then combined with the P value for the squared time-lagged correlation coefficient, r t 2 (g 1 , g 2 ), whose derivation we describe next.
For each pair (g 1 ,g 2 ) for which the TLC approach was to be applied, an ''optimal time lag'' h(g 1 ,g 2 ) was selected, so that a single representative TLC could be obtained for the pair. The set of time lags and the set of time-course experiments to use were selected according to a constraint (imposed to minimize interpolation error) that the target gene expression at maximum time lag must be interpolated from at least three measurements. Based on this constraint, and taking into account the expected precision at which the optimal time lag can be estimated (65 min, based on the replicate variability in the expression data-see Materials and Methods), the set of time lags was chosen to be t M {0, 10, 20, 30, 40, 50, 60, 70, 80 min}. Eleven time-course experiments satisfied the criteria (combining six stimuli and three genotypes, see Table  S9). The TLC r t 2 (g 1 , g 2 ) was computed for each of the t values, for each pair of genes, using data from all eleven time-course experiments combined (see Materials and Methods). The next step was to determine the optimal time lag for (g 1 ,g 2 ) from the squared TLC coefficient r t 2 (g 1 , g 2 ). It is not ideal to simply select the t at which r t 2 (g 1 , g 2 ) is maximal, as some studies have done [23,26,46], because of two competing bias effects, as we now explain. Consider a pair of genes (h 1 ,h 2 ) satisfying the null hypothesis, and let t max ;max(T), where T is the set of time points for a single time-course. In practice the expression of h 2 cannot be extrapolated beyond t max , so the effective number of data points for computing the TLC r t 2 (h 1 , h 2 ) is limited to the number of time points within T that are less than t max 2t. Thus, the number of measurements that can be used to compute the TLC is tdependent, and the distribution of TLCs for pairs of genes satisfying the null hypothesis depends on t. Therefore, one will more frequently observe (by chance) a TLC exceeding a given value (say, 0.9), by selecting the largest possible t. In addition, the high degree of synchronization within the transcriptional response, as well as the fact that all the SDR-transformed expression levels are zero at the initial time point, result in a second bias towards zero time lag. This effect is strengthened as the number of time points in the data set (per time-course) decreases. Therefore, selecting the optimal tto maximize r t 2 (g 1 , g 2 ) introduces an unwanted bias towards the smallest and largest tvalues investigated (see Figure S8), and against t values in the middle of the range of time lags (i.e., 20-60 min).
To avoid the above-described bias, a background cumulative distribution of squared time-lagged correlation coefficient values, denoted by F pt (where p t is the squared correlation r t 2 ) was computed separately for each time lag t, using a large set H of gene pairs such that there is no direct transcriptional regulatory interaction (TRI) for each gene pair in the set (see Materials and Methods). The functions F pt f g were used to select the optimal time lag h(g 1 ,g 2 ), and the fractional significance of the lag-specific squared correlation coefficient j(g 1 ,g 2 ), Making use of the discretized distribution P(t|H 0 ) defined above, a probability ratio R(t) was computed as the ratio of the probability of the null hypothesis (that there is no direct TRI between g 1 and g 2 ) given the measured optimal time lag, to the marginal probability of the null hypothesis, It should be noted that the uncertainty in q due to the discretization of time lags (a practical necessity in the context of microarray-derived expression data) leads to uncertainty in the estimation of R(t). However, the effect of this uncertainty on the cluster-combined P value (see Equation 10 below) is small, due to the fact that time lag estimation errors for genes in a cluster are not strongly correlated. The marginal probability P(t) was estimated from the optimal time lags of all gene pairs, and the marginal probability P(H 0 ) was estimated from data in the literature (see Materials and Methods). Using this probability ratio, and in analogy with Fisher's method, a combined score for the gene pair (g 1 ,g 2 ) was constructed, taking into account both the optimal time lag h(g 1 ,g 2 ) and the fractional lag-specific significance j(g 1 ,g 2 ), s(g 1 ,g 2 )~ln j(g 1 ,g 2 ) : R h(g 1 ,g 2 ) ð Þ ð Þ ð 6Þ Using the cumulative distribution F sjH0 of s scores for gene pairs satisfying the null hypothesis, the significance of the association between g 1 and g 2 based on expression data can be computed, This formula was applied for all pairs (g 1 ,g 2 ) where g 1 ranged over the set of 80 TFs, g 2 ranged over the set of all 1,960 differentially expressed genes, and g 1 ?g 2 (see Materials and Methods). The expression data for the TFs are provided in Table S10 and the expression data for all 1,960 differentially expressed genes are provided in Table S4.
To estimate the overall significance (based on time-course expression data) of the association between a TF gene f and a cluster C, the P values P tlc (f,g) were combined into a P value for the cluster, P exp (f,C). For each pair (f,C), a Fisher score F exp (f,C) was computed, where C\{f} means that if the TF gene f was a member of cluster ) as a function of the time lag t, for Irf7 and Stat1. The peak value of r t 2 occurs at t = 10, but the peak significance value (taking into account the lag-specific null distribution) occurs at t = 20 min. doi:10.1371/journal.pcbi.1000021.g004 C, the self-association P tlc (f,f) was excluded. For each cluster C, the number of degrees of freedom, denoted by d(C), was estimated using K-means clustering (see Materials and methods). The d(C) values were used to obtain a TF-to-cluster P value, P exp (f,C), using a x 2 test (see Materials and methods). The number of pairs for which P exp (f,C)#10 23 , was 23. The differential expression levels for the strongest (TF,cluster) pairs in wild-type time-courses following stimulation by LPS (one of the time-courses used for the TLC analysis; see Table S9) are shown in Figure S9. They show a high degree of correlation between the TF gene and target cluster. The distribution of P exp (f,C) over all TF-to-cluster pairs, and the estimated false discovery rate (FDR), are shown in Figure  S10.

Promoter scanning of co-expressed gene clusters
To provide an independent source of evidence for association between a differentially expressed TF gene and a co-expressed gene cluster, the promoters of differentially expressed genes were scanned using position-weight matrices (PWMs) representing motifs recognized by murine TFs. A motif was selected if it is recognized by at least one TF of which at least one component protein was differentially expressed in the expression dataset, ensuring that the TF had at least one expression profile that could be compared with (potential) target genes using the TLC. For each PWM, the fraction of genes with at least one above-threshold match within the promoter was computed, within a reference set of all genes detected as expressed within the TLR-stimulated macrophage, and within each co-expressed gene cluster. A total of 150 position-weight matrices were selected from the TRANSFAC database [33] for motif scanning, corresponding to the 80 differentially expressed murine TF genes (see Table S5, and Materials and Methods). Promoter sequences 2 kbp upstream of the transcription start site were obtained for 1,713 out of the 1,960 differentially expressed genes, and for 7,492 out of 8,788 expressed genes (used as a reference set; see Materials and Methods) from the UCSC genome annotation database [47]. For each PWM, a minimum match score was determined at which the PWM had a match on average once per 10 kb, within a set of 7,503 promoter sequences for genes not detectably expressed in the macrophage (to avoid biasing the match score threshold calculation with true TF targets; see Materials and Methods). Using these PWM match score thresholds, promoters were scanned within the reference set of genes, and within each co-expressed cluster of genes. The distribution of distances of matches from the transcription start site ( Figure S11) has a median of 537 bp.
As a next step towards inferring a transcriptional network, enrichments of TFBS motifs were computed for individual gene clusters. For each cluster C and position-weight matrix m, enrichment statistics were computed based on the fraction of genes in C possessing at least one match for m. For each pair (m,C) for which the fraction of genes containing a match for m within the cluster C was greater than in the reference set of genes, a P value was computed using Fisher's exact test (see Materials and Methods, and [48]) and denoted by P scan (m,C). This P value represented the significance of the enrichment of matrix m within the promoters of cluster C, relative to the reference set of promoters (expressed genes). A matrix representation of the strongest motif enrichments (56 associations with P scan (m,C)#10 22 ) with the clusters grouped by expression similarity ( Figure 5) reveals several associations between TF motifs and patterns of differential expression. First, NFkB and IRF recognition elements are associated with upregulated clusters, while E2F and MYCMAX elements are associated with downregulated clusters. The IRF element was strongly associated with TRIF-dependent cluster C6 and STAT1 was strongly associated with C22. Many TF motifs were associated with the core early response cluster C27, including AP1, CREB/ATF, EGR, PEBP, and PPARA. The quantitative results of the cluster-wise statistical tests (numbers of matches and P values) are provided in Table S11.
To enable integration of the promoter scanning evidence with the time-lagged correlation evidence, PWMs that were enriched for matches within gene clusters, were mapped to differentially expressed TF genes as follows. For each PWM m, a list of genes coding for TFs (or TF components) that bind the motif corresponding to m were obtained from a TRANSFAC-derived mapping (see Materials and Methods). For each TF gene f and cluster C, a P value for the association between f and C based on promoter scanning evidence, P scan (f,C), was defined as the minimum over all P scan (m,C) for all matrices m that are associated with the TF gene f. The distribution of the resulting P values and the false discovery rate (as a function of P value) are shown in Figure S12. A total of 31 factor-to-cluster associations were identified with P scan (f,C)#10 23 , indicating a statistical power that is slightly higher than with the TLC-based evidence.

Data integration and network extraction
To identify the set of all possible TF gene-to-target interactions consistent with motif scanning evidence, for each TFBS motif match within the promoter of a target gene, the time-lagged correlation was computed for all possible TF genes that map to the TFBS motif. The resulting list of 54,253 pairs (f,g) of TF gene f and target gene g, provided as Table S12, shows that many known transcriptional regulatory interactions have high ranking based on time-lagged correlation-for example, NFkB/Rel associated with Icam1 [49] and Cebpd associated with Il6 [50]. Although the TLCranked list of motif targets has some potential utility for identifying specific transcriptional regulatory interactions, even the highranking elements of the list will contain many false positives (and will miss many true transcriptional regulatory interactions) due to the uncertainty in motif PWMs and the prevalence of posttranslational regulation that may obscure the time-lagged correlation. Therefore, further data reduction is necessary to gain insight into the global transcriptional program of the TLRstimulated macrophage. By using a statistical test that compares the relative frequency of motif occurrence within a cluster relative to a background set of genes, a more reliable estimate of TF association with a co-expressed cluster can be obtained.
To construct a combined transcriptional network of the TLRstimulated macrophage, P values for associations between TF genes and co-expressed gene clusters based on expression dynamics and promoter scanning were combined. For each pair (f,C) where f is one of 80 TF genes and C is one of 32 gene clusters, a combined P value P comb (f,C) was computed from the P values for the scanning and expression evidences, P scan (f,C) and P exp (f,C). The P values were combined using Fisher's method (see Materials and Methods), a standard tool for meta-analysis of independent tests of a hypothesis [51]. TF-cluster pairs were then ordered by increasing P value P comb (f,C), and a cutoff was selected so that the estimated false discovery rate did not exceed 0.025 (resulting in a cutoff P comb (f,C)#0.0248). Additionally, two filtering criteria were imposed: (i) P scan (f,C)#0.05, to ensure that there is a minimal enrichment of TFBS; and (ii) a cluster-average optimal time lag between f and C that was greater than 10 min, i.e., AEhae f,C $10 min (see Materials and Methods). A scatter plot of the P values for the two evidences is shown in Figure S13, and indicates that for the data points that were rejected based on the P comb (f,C) cutoff, no dependency between the evidences is evident. A total of 90 interactions involving 36 TF genes and 27 clusters (comprising 86% differentially expressed genes), were accepted based on the above criteria (see Table 1). If the TLC P values were not included, and if the same rate of false discovery were imposed, the network would be significantly less parsimonious (,150 interac-tions), due to the large number of TF gene families that map to a common motif. Overall network coverage was estimated by taking the fraction of differentially expressed genes that (i) are members of the 27 clusters in the network; and (ii) possess a match for a motif Figure 5. Patterns of high-confidence motif enrichments within promoters of target clusters reveal associations between regulatory elements and expression patterns. Each row in the matrix represents a TF binding element, and each column represents a cluster of differentially expressed genes. Clusters are ordered as in Figure 2, and thus are grouped hierarchically by similarity of their extremal expression foldchange under the four TLR agonists LPS, Pam 3 CSK 4 , poly I:C, and R848. Each motif (row) is associated with one or more position-weight matrices (the V$ prefix and numeric suffixes are omitted, and results for multiple position-weight matrices representing the same motif were combined for each column, by taking the matrix with the maximum number of matches within the indicated cluster). Each colored block in the matrix indicates pair of a motif and target cluster for which the fraction of genes in the cluster with a motif match, is enriched relative to the overall fraction of genes expressed in the macrophage that possess the motif (P#10 22 , Fisher's exact test). The color of each matrix element (block) in the interior of the figure indicates the fraction scanned of genes within the cluster containing at least one match for the indicated motif. The number of scanned genes within the cluster that contained a match for the indicated motif is shown in yellow typeface. The red/green colored blocks above the top horizontal axis shows whether each cluster is upregulated (red) or downregulated (green) at its most extremal fold-change under stimulation with the aforementioned TLR agonists. The hatched green/red pattern indicates a cluster whose extremal fold-change direction (up/down) is stimulusdependent (see Figure 2). The colored (blue, cyan, orange, yellow, purple) blocks above the top of the matrix indicate the likely pathway through which the cluster is differentially expressed; the color scheme corresponds to that shown in the dendrogram in  Column 1 indicates the target gene cluster. Column 2 indicates the transcription factor gene that is associated with the cluster, based on the two sources of evidence. Column 3 indicates the cluster of which the transcription factor gene is a member. Column 4 indicates the -log 10 P value of the promoter scanning-based evidence. Column 5 indicates the name of the position-weight matrix that had the smallest scanning-based P value of association with the cluster, for the indicated transcription factor gene (the ''V$'' prefix is not shown). Column 6 indicates the fraction of scanned genes within the cluster that had at least one match for the indicated positionweight matrix. Column 7 contains the number of scanned genes within the cluster that had at least one match for the indicated position-weight matrix. Column 8 indicates the negative log 10 P value of the time-lagged correlation evidence. Column 9 indicates the cluster-wide average time lag h with respect to the indicated transcription factor gene. Column 10 contains the average optimal time-lagged correlation between the indicated transcription factor gene, and the genes within the cluster. doi:10.1371/journal.pcbi.1000021.t001 The distribution of the number of targets regulated by TFs, the so-called out-degree distribution of the transcriptional network, is one key measure of the network's interconnectedness [52]. For each TF that was included in the transcriptional network, the number of targets was estimated using the promoter scanning data (see Materials and Methods). The out-degree varied approximately 20-fold over the set of 36 TF genes ( Figure S14). The transcription factor MYC (which is involved in development and cellular differentiation [53]) was found to be the most highly connected in the network (consistent with the high out-degree for MYC found in [11]), followed by members of the E2F family of TFs (believed to play a role in cell cycle regulation [54]). Other highly connected TFs include NFYC (a repressor in the TGFb signaling pathway [55] and member of a TF family involved in monocyte differentiation [56]) and RXRA (a component of heterodimeric TFs that regulate inflammatory signaling and cholesterol metabolism [57]). Also strongly connected in the network are the NFkB TF family members cREL and NFKB1/ p50 (key early regulators of the immune response [58]); the IRF family members IRF1, IRF3, IRF5, IRF7, and IRF9 (regulators of interferon-induced immune response [17]); and STAT1 (a key regulator of apoptosis and mediator of interferon signaling [59]). Both the IRF and E2F family TFs had strong P values for association with target clusters ( Figure S14). The out degree distribution appears to be scale-free, consistent with previous reports for mammalian networks [11,60]. The number of TF genes associated each cluster (in degree) ranged from 1 to 9, with an average in-degree of 3.3.
To reveal patterns among TFs that may regulate multiple clusters, the connections between the 36 TFs and the 27 clusters in the inferred network were arranged in a matrix in which each row represents an induced TF and each column represents a cluster of differentially expressed genes ( Figure 6). Both the TFs and clusters were divided into subsets that are induced or repressed under LPS stimulation, and ordered within these subsets based on the time of 25% differential expression under LPS (see Materials and Methods). Thus, the matrix is divided into quadrants; for example, the upper left quadrant contains connections between induced TF genes and induced clusters, and the lower-right quadrant contains connections between downregulated TF genes and downregulated clusters. The upper left and lower right quadrants contain primarily positive correlations, with most anti-correlated connections found in the upper right and lower left quadrants. In the upper left quadrant, the connections generally fall along an arc indicating the temporal sequence of TF gene activation. The anticorrelated ''off arc'' connections within this quadrant generally indicate the association between the falling edge of a transiently induced TF gene and the rising edge of a late-induced gene cluster. The only correlated ''off arc'' connections within this quadrant (Nfkb1RC28, and JunbRC11) have weak time-lagged correlation evidence, but a very significant motif scanning P value. In contrast, the downregulated gene clusters and TF genes are not as stratified as the upregulated clusters in terms of the time of differential expression, and thus associations appear throughout the lower-right quadrant.
The network of associations between TF genes and clusters (based on combined scanning and expression evidence) directly leads to hypotheses regarding TF regulation of clusters. For example, a statistical association between any of the TF genes Jun, Junb, or Fos and a cluster would suggest a hypothesis that the TF AP1 regulates that cluster. The network also recapitulates several known transcriptional regulatory interactions. First, the NFkB component Rel is associated with C15, which is enriched for cytokines and contains many NFkB targets including Nfkb1 [43], Il6, and Il12b [6]. Second, Jun, a component of AP1 (a regulator of stress response such as response to ultraviolet radiation or pathogenic insult [61]), is associated with C27, an earlyupregulated cluster that is enriched for cell cycle-related genes and genes involved in the DNA damage response. Furthermore, C27 contains Egr1, which is a known target of AP1 under genotoxic stress conditions [61]. Third, IRF1 is strongly associated with the antiviral cluster C13, which contains the validated IRF1 target gene, Ccl5 [62]. The network also includes the TF genes Egr1 (a key regulator of LPS-induced cytokine signaling [63]) and Egr2 (implicated in adhesion and phagocytosis [64] as well as cell proliferation [65]) as regulators of C27. Finally, the TF gene Sfpi1 (PU.1) is associated with C17, an induced gene cluster enriched for endosome-associated genes (PU.1 over-expression is known to block viral escape from the endosome [66]).
Several interactions in the network were detected only through the integration of expression data with promoter scanning evidence. For example, based on scanning evidence alone, with a FDR of 0.1 (P scan #0.0033), the association between Nfkb1 and C17 would not have been detected, but by including the effect of the strong TLCs between Nfkb1 and C17 genes, an association between Nfkb1 and C17 was detected. As a second example, the network includes an association between the TF gene Irf1 and cluster C25; based on promoter scanning evidence alone, only a general association of the IRF family with the cluster would have been possible (see Table 1).
In order to investigate the possible co-operative regulation of clusters by TFs in the network, protein interactions were obtained for human orthologs of protein units associated with the 36 TF genes shown in Figure 6. Protein interactions between the TFs were obtained from the Human Protein Reference Database [67] and the Biomolecular Interaction Network Database [68] (see Materials and Methods). The resulting interaction diagram, shown in Figure S15, reveals that upregulated TFs are highly interconnected at the level of protein-protein interactions [6]. Furthermore, the diagram shows 15 pairs of interacting TFs whose corresponding genes co-associate with clusters in the network. An example corresponding to a known transcriptional complex is the pair c-JUN (an AP1 component) and EGR1 [69]; both are associated with C27.
A notable induced TF gene in the network is Tgif1 (TGIF1, or TG-interacting factor 1, named for the core TGIF1 binding sequence, 59-TGTCA-39 [70]), a transcriptional repressor in the TGFb signaling pathway [71]. TGIF1 has not been previously implicated in classical macrophage activation. It is associated (P scan ,0.01) with C26, a cluster containing genes involved in immune response, ubiquitin cycle, and leukocyte activation. Specifically, C26 contains the cytokines Csf2 (which stimulates differentiation of macrophages and granulocytes, and is proinflammatory [72]) and Gm1960 (a mediator of neutrophil chemotaxis [73]). The Csf2 promoter appears to have a TGIF1 binding site motif match (match score.0.96) in the region (2254,2244) relative to the transcription start site, and Gm1960 also has three TGIF1 motif matches approximately 1.5 kbp upstream of the start site (best match score.0.95). In humans, TGIF1 is known to interact with several protein members of the SMAD/AP1 transcriptional complex ( Figure S16) [71,74].
To validate the microarray-based expression measurement, Tgif1 expression was measured in murine BMMs using quantitative PCR (qPCR; see Materials and Methods). Consistent with the microarray-based results, Tgif1 expression was found to be ,3-fold upregulated after 1 hour of stimulation by LPS or Pam 3 CSK 4 (data not shown). Furthermore, from microarray-based measurement (Affymetrix probeset 1422286_a_at), Tgif1 expression is ,2.4-fold reduced in unstimulated Ticam1 (Lps2/Lps2) BMMs relative to wild-type (with no apparent effect in MyD88 (2/2) BMMs relative to wild-type), suggesting that basal expression of Tgif1 is TRIF-dependent.

Targeted validation using ChIP-on-chip
Genome location analysis based on chromatin immunoprecipitation-on-chip (ChIP-on-chip) hybridization was used to validate five high-confidence associations in the transcriptional network, between NFkB/p50 and clusters C13, C17 and C28; and between IRF1 and clusters C13 and C25. This validation consisted of demonstrating a statistical enrichment of ChIP-on-chip-identified binding for a given TF among genes within a cluster with which the TF was associated through our computational method, as compared to randomly selected TLR-responding genes. A customfabricated oligonucleotide microarray was used, with probes tiling up-and downstream of genes that were differentially expressed under TLR stimulation in a murine macrophage-like cell line (see Materials and Methods). Macrophages were stimulated with LPS and then ChIP was carried out using TF-specific antibodies at 1 and 2 h, and (for IRF1 only) 4 h. Binding of p50 was highly enriched within the genes of clusters C13 and C28 represented on the tiling array (18 out of 23 and 20 out of 21 genes were bound, respectively) but not significantly enriched for C17 (11 out of 20). IRF1 binding was enriched within the genes of C13 and C25 (18 out of 23, and 18 out of 22, respectively). In four out of five cases, the enrichment relative to the overall rate of binding to differentially expressed genes represented on the tiling array satisfied P,0.01 (Fisher's Exact Test; see Table 2). ChIP-on-chip results for individual target genes within the aforementioned clusters are provided in Table S13, and results for all clusters that were represented on the array (see Materials and Methods) are shown in Table S14. For each of the two TFs assayed with ChIPon-chip, and for those clusters that were identified as targets of the TF through the network analysis, the fraction of clusters found to have significant TF binding to their genes was higher than for clusters selected randomly from among all clusters represented on the tiling array (1.7-fold overall). Additionally, the association between IRF1 and C30 was significant (P,0.05) based on scanning, but not significant based on P comb . Consistent with the integrated analysis, C30 was not significantly enriched for IRF1 binding, based on the ChIP-on-chip assay.

Discussion
In this study we inferred a transcriptional network underlying dynamic TLR-stimulated activation of the murine macrophage. This network consists of statistical associations between differentially expressed transcription factor (TF) genes and co-expressed clusters of genes, each indicating a possible role for the associated TF in regulating the cluster. Such associations have proved useful for generating and prioritizing testable hypotheses regarding transcriptional regulation [6,7]. A novel computational approach was used that combined sequence-and expression-based evidence. Using expression data acquired under a comprehensive set of TLR stimuli (and sampled densely in time), differentially expressed genes were partitioned into clusters of co-expressed genes that revealed a diversity of induction time scales, functional enrichments, and stimulus-dependent activation patterns. The clustering enabled sensitive identification of TFBS enrichments despite uncertainty (due to limited sampling) in the position-weight matrices and in the appropriate score threshold for motif scanning. In addition, using the SDR-normalized expression data for clustering ensured that genes were clustered based on their temporal (and stimulus-dependent) activation profiles, rather than by the magnitude of fold-change. Early-upregulated clusters were found to be enriched for TFs, consistent with the idea that many regulators of the transcriptional program are themselves produced on-demand in response to TLR stimulation [6]. The early transcription factor gene (over time), in LPS-stimulated wild-type macrophages (SDR, see Equation 1). Red indicates upregulation relative to unstimulated macrophages and green indicates downregulation. A diamond symbol indicates the transcription factor response time. (C) Each column of the red-green matrix indicates the median normalized differential expression of the genes in the indicated cluster (over time), in LPS-stimulated wild-type macrophages. The diamond indicates the average LPS response time of the genes within the cluster. doi:10.1371/journal.pcbi.1000021.g006 Shown are five (TF,cluster) associations for which at least 30% of the genes within the cluster are represented on the tiling array, along with the results of the ChIP-onchip assay for binding of the indicated TF to the promoters of genes within the indicated cluster. Column 1 indicates the transcription factor antibody target. Column 2 indicates the position-weight matrix that was used for scanning the promoters of genes in the cluster. Column 3 indicates the stimulus used. Column 4 indicates the gene cluster whose promoters the indicated TF is predicted to bind. Column 5 indicates the time points at which ChIP-on-chip assays were performed. Column 6 indicates the number of genes in the cluster. Column 7 indicates how many of these genes have probes tiled on the chip, in the flanking 59 intergenic region (due to the much smaller microarray expression dataset used to select genes for the tiling array, only about 22% of the 1,960 differentially expressed genes were represented on the tiling array, as described in Materials and Methods). Column 8 indicates the number of these genes that were identified positively by ChIP-on-chip as having the indicated transcription factor bound to chromatin, in the 59 flanking intergenic region. Column 9 indicates the P value for the enrichment of ChIP-on-chip hits among genes within the cluster identified by promoter scanning, as compared to the set of all genes on the array (Fisher's exact test). The ChIP-on-chip results for individual genes are provided in Table S13 induction of a large number of TFs was an important indicator of the potential utility of analyzing temporal expression as an evidence for transcriptional regulatory interactions (TRIs). The time-lagged correlation (TLC) was used to analyze temporal gene expression for TFs and gene clusters, and in addition to the correlation strength, the biological plausibility of the estimated optimal time lag was factored into the significance assessment for the TLC. This time lag is useful for distinguishing between genes that are linked by a regulatory interaction and genes that are merely co-expressed. The TLC is efficient to compute, and in general requires fewer measurements than methods that rely on estimating the joint probability distribution of the expression of two genes (e.g., pairwise mutual information [11]). This observation is related to the most notable drawback of the TLC, namely, that it is sensitive only to the covariance of the joint probability distribution, and not higher order moments (with significantly more expression measurements, a possible extension of this method could be to use time-lagged mutual information [75]). A second limitation of the TLC (and of any evidence based solely on lagged expression comparison) is that in practice it can be difficult to distinguish between indirect transcriptional regulation through a rapid-acting intermediary, and direct transcriptional regulation. Finally, while it is not a significant issue in the cluster analysis described in this work, we note that the proposed method for estimating the significance of the expression data for a single gene pair (Equation 6) is potentially not robust with respect to noise in the data. For the purpose of single-gene analysis, it could be improved by using a polynomial fit to the t dependence of F pt r 2 t À Á , or by defining the optimal time lag to be the time lag that minimizes s.
The specific implementation of the TLC approach used in this study has two key advantages. First, by selecting the optimal time lag for a TF-gene pair based on minimizing the lag-dependent P value rather than maximizing the squared correlation coefficient, the inherent bias of the TLC technique in selecting time lags was avoided. This made it possible to include the contributions of (i) the magnitude of the correlation, and (ii) the probability of observing the optimal time lag, to the significance of a pairwise association. Second, the probability distribution for time lags among true interactions was incorporated as a prior in the significance calculation. This enabled taking into account the biological plausibility of the time lag in computing the significance. This significance test for the TLC has not, to our knowledge, been previously reported.
With any network inference method based on pair-wise comparison of the expression profiles of a regulator and a possible target (including the TLC method), it is difficult to accurately resolve the multi-factorial control of a target gene. This is particularly true when the effect of one TF is simply to modulate (amplify or dampen) the time-varying influence of another TF on a target gene. Several additional mechanisms can confound or eliminate the correlation between the expression level of a TF gene and the chromatin-bound activity of the corresponding TF, including multimeric TF assembly from protein products of several genes, post-translational activation of the TF, dynamically regulated nuclear translocation, and dynamically regulated TF protein turnover. For example, in the case of ATF3, there is little correlation between differential expression and nuclear localization [6], and as a result, this TF is not strongly implicated in the network via TLC. However, we note that the CREB/ATF binding motif was identified as enriched within the core early response cluster C27. Additionally, we note that given that the expression data set used in this work is densely sampled at early times (1-2 hours) and sparsely sampled at late times, our ability to leverage expression data as an evidence for TRI is reduced for very lateresponding TF genes (e.g., Lmo2). In summary, with a limited expression dataset, a high-significance TLC by itself should not be regarded as sufficient evidence to infer a TF-to-target association, underscoring the importance of incorporating additional sources of evidence.
In the present work, promoter sequence scanning was used to identify TFBS motifs enriched within co-expressed gene clusters. Due to the often one-to-many mapping between TFBS motifs and TFs, the scanning-based evidence often identifies multiple candidate TFs with a gene cluster, of which perhaps a single TF may be the relevant regulator in the given condition. The TLC approach described here provides an objective statistical framework for evaluating the suitability of a proposed TF-to-target association based on a large set of time-course expression measurements. In particular, the approach enabled the preferential identification of TF-to-target associations for which the optimal time lag is biologically plausible, and the rejection of associations with a biologically implausible (e.g., zero) time lag. Four (TF,cluster) associations were validated using ChIP-on-chip assays, in which enriched binding of the relevant TF was shown among genes within the relevant cluster. The ChIP-on-chip enrichment P values are conservative estimates of the genomewide binding enrichment, due to the fact that genes were selected for inclusion in the tiling array based on differential expression under LPS stimulation in a macrophage-like murine cell line (RAW 264.7). We note that for each of the two TFs assayed, two (TF,cluster) pairs were found to be enriched for binding based on ChIP-on-chip, but not based on the network analysis. Such falsenegative predictions may be the result of binding sites sometimes occurring upstream of the 2 kbp region selected for TFBS motif scanning, the target TF being cross-linked to a DNA-bound coregulator recognizing a different motif than the TF, or due to the TF recognizing a TFBS motif variant not represented in the motif database.
The inferred transcriptional network resulting from our analysis associates at least one TF with 27 of the 32 clusters. The 27 clusters comprise 86% of all differentially expressed genes, with an overall network coverage (including motif matches for individual targets) of 63%. An average of 3.3 TF genes were associated with each cluster, which may reflect the prevalence of combinatorial control in the transcriptional network. The TFs implicated in the network are also highly interconnected at the level of proteinprotein interactions, and interacting TFs are found to co-associate with clusters in the network. Many TFs known to play a role in macrophage activation were strongly associated with clusters in the inferred network (e.g., NFkB, AP1, IRF family members, and STAT1). NFkB and AP1 appear to be the most prolific activators in the network. EGR family members appear to be associated with early-induced clusters, and IRF family members are associated with later-induced clusters. In particular, the network associated specific TFs with immunologically important gene clusters (e.g., EGR1/2 and AP1 regulating cluster C27; and NFkB and IRF1 regulating cluster C13). Finally, incorporating expression data enabled identifying a specific TF from among members of a large TF family recognizing a motif enriched within a target cluster; for example, the predicted interaction between IRF1 and C25 was validated by ChIP-on-chip. However, we note that more ChIP-onchip data, with a variety of TF targets, would be required to quantitatively assess the performance of the combined network analysis compared to single-evidence analysis using sequence data or expression data alone.
We note that by including in the analysis only TFBS motifs for which at least one associated TF gene was differentially expressed, the inferred network does not include TFs for which there is no transcript-level differential expression; this trade-off enabled network inference based on dual criteria of motif match enrichment and the estimated time lag prior probability. Work is in progress to extend the analysis to include all 208 TFBS motifs corresponding to TFs that are transcriptionally expressed in the TLR-stimulated macrophage. Another limitation related to sequence scanning is that the promoter sequence data set used is purely upstream of the annotated transcription start site (TSS); recent evidence suggests that some TFs may be equally likely localized downstream of the annotated TSS [76]. In future work, it could be productive to scan for TFBS motifs both upstream and downstream of the annotated TSS.
In addition to recapitulating known regulators, the analysis identified a potential transcriptional regulator not previously known to play a direct role in TLR-stimulated macrophage activation, TGIF1. TGIF1 is a three-amino acid loop extension homeobox protein that acts as an obligate repressor through either direct binding to the retinoic acid responsive element on DNA, or through its interaction with SMAD2 in the TGFb pathway [71]. Its associated TFBS motif is enriched within the promoters of genes within cluster C26 (P,10 22 ) and cluster C4 (P,10 22 ), and Tgif1 is strongly (11-fold) upregulated in murine macrophages in response to Streptococcus pyogenes infection [77]. Particularly intriguing is the possibility that, in light of motif scanning evidence, TGIF1 may act as a transcriptional repressor of the cytokines Csf2 and Gm1960.
The approach of combining promoter scanning-based evidence with expression dynamics-based evidence enabled more specific identification of the TF gene(s) regulating a cluster than would have been possible using promoter scanning alone. Time-course expression data allowed, in some cases, the disambiguation of which TF gene (out of a family of TF genes associated with a given TFBS motif) is the likely regulator of a cluster enriched for the corresponding TFBS motif. Inclusion of expression data provided a second source of evidence to indicate the relevance of a given TF gene for predicting the condition-and time-specific expression of a target gene cluster. In total, these results validate the strategy of computationally integrating two distinct large-scale data sources (expression and genomic sequence) to infer a murine macrophage transcriptional network. In a future study, additional sequencebased data sources, such as evolutionarily conserved elements in the cis-regulatory region, could be incorporated into the method.

Materials and Methods
All data were analyzed in MatLab (MathWorks, Natick, MA) unless otherwise stated. In all cases where Fisher's exact test was performed, the test was one-tailed, using the cumulative distribution function (CDF) of the hypergeometric distribution.

Microarray expression measurements
Mutant strains (see Table S1) were generated in the 129SVJ background and backcrossed to C57BL/6 (Jackson Laboratories), ten times. Femurs from the C57BL/6 and mutant strains were flushed with complete RPMI (RPMI 1640 supplemented with 10% FBS, 2mM L-glutamine, 100 IU/mL penicillin and 100 mg/ mL streptomycin, all from Cellgro, Mediatech, except the FBS which was from Hyclone). Bone marrow cells were plated on nontissue culture treated plastic in complete RPMI supplemented with recombinant human M-CSF (rhM-CSF) at 50 ng/mL (gift from Chiron). On day 4 the cells were washed two times with RPMI with no additions and then grown 2 more days in complete RPMI supplemented with 50 ng/mL of rhM-CSF. On day 6 the cells were lifted from the non-tissue culture treated plastic, counted and plated at a density of 1.04610 5 cells/cm 2 (1610 6 cells per well in a 6-well dish) on tissue culture-treated plastic. On day 7 cells were stimulated with TLR agonists at the concentrations indicated in Table S2, without changing the media. Stimulus reagent sources are shown in Table S15. Stimulation of the cells was verified by the presence of TNFa in the culture supernatants detected by ELISA (Duoset ELISA Assay Development System, R&D Systems). Total RNA was isolated using TRIzol (Invitrogen) and analyzed for overall quality using an Agilent 2100 Bioanalyzer. mRNA was labeled using the Affymetrix One-Cycle Target Labeling protocol and reagents for eukaryotic target preparation. The labeled cRNA was hybridized to an Affymetrix GeneChip Mouse Genome 430 2.0 array using standard protocols and reagents from Affymetrix. Probe intensities were measured using the Affymetrix GeneChip Scanner 3000 and processed into CEL files using Affymetrix GeneChip Operating Software.

Microarray data processing
Expression data were acquired from 216 microarray hybridization experiments comprising 95 combinations of strain, stimulus, and time point (hereafter, ''experiments''; see Table  S3), of which 41 were in mutant strains, and 54 in wild-type. Data in the form of CEL files were background-subtracted and normalized with the Robust Multi-chip Average (RMA) method [78] using the software Bioconductor [79], then exported to MatLab for further analysis. For each of the 95 experiments, normalized expression measurements for each probeset were averaged across biological replicates using the log 2 intensities [78] to obtain the replicate-combined probeset intensity.

Differential expression testing
Significance testing was performed using mean log 2 intensities from 7 wild-type TLR-stimulation time-course experiments comprising 54 assays (where ''assays'' refers to a specific combination of strain, stimulus, and elapsed time; see Table S3) for which at least two replicates were available, relative to the mean log 2 intensities of unstimulated wild-type macrophages (hereafter, the ''reference experiment''). For each probeset and for each of the wild-type TLR-stimulation time-course experiments, a differential expression test was performed using a spline-based multivariate regression method [32] to obtain a P value for the difference in the sum-squared residuals under the alternative and the null hypotheses. A fourth-order polynomial basis was used, with 1,000 iterations for the bootstrap resampling. For each timecourse experiment, a separate P value threshold was selected based on a maximum Benjamini-Hochberg false discovery rate (FDR) [80] as described below.

Probeset selection
A probeset selection algorithm was carried out to select a representative probeset for each gene, eliminating probesets that are annotated as cross-hybridizing to transcripts from different genes.
Representative probesets from among the 45,037 probesets (excluding on-chip control probesets) on the Affymetrix Mouse GeneChip 430.2 were selected based on four criteria. A probeset was selected if and only if: (i) it possessed an Entrez GeneID in the Affymetrix probeset annotation database [81]; (ii) it had a log 2 intensity exceeding a fixed cutoff, in at least one replicatecombined experiment; (iii) it had a P value less than a fixed cutoff, for at least one experiment; and (iv) its probeset name did not contain ''_x_'' or ''_s_'', and was not associated (by GeneID annotation) with transcripts of two distinct genes. Criterion (iv) was imposed in order to eliminate probesets containing probes that cross-hybridize to transcripts from different genes [82]. Whenever multiple probesets mapped to the same GeneID (or the same collection of GeneIDs), the probeset with the smallest minimum P value, across all non-reference experiments, was selected as the ''representative probeset'' for the GeneID (or GeneID list).
This selection procedure was applied with four different cutoffs for log 2 intensity and P value, as summarized in Table S16. (i) To generate a set of differentially expressed genes suitable for expression clustering (hereafter, the ''target'' genes), a log 2 intensity cutoff of 6 was used, and a P value cutoff of 10 24 was used. The resulting number of representative probesets for target genes was 1,960. The complete list of the 1,960 target genes, and their expression measurements, are provided in Table S4. (ii) To generate a set of differentially expressed TF genes, the algorithm was re-run for probesets that are annotated as TFs, and for which a TRANSFAC matrix is available (see Selection of Transcription Factors), with a FDR cutoff of 0.05. A total of 80 differentially expressed TF genes were identified, as described in Selection of Transcription Factors below. (iii) To generate a set of all genes that were expressed in the macrophage, in at least one experiment, the probeset selection was run with a log 2 intensity cutoff of 6 and no filtering for differential expression. The 8,788 resulting genes were used as the reference set for applying Fisher's exact test to the promoter scanning results (see Promoter Scanning below). (iv) To generate the set of all genes represented by ''_at'' or ''_a_at'' probesets on the GeneChip, the algorithm was run with no filtering for minimum intensity or differential expression. This generated a list of 20,905 genes that constituted the genome-wide set used in the gene ontology enrichment analysis (see Functional Enrichment Analysis below).

Selection of transcription factors
A set of 388 position-weight matrices (PWMs) corresponding to murine TFs was obtained from the TRANSFAC Professional database version 10.3 [33]. These PWMs were mapped using TRANSFAC as well as literature searching, to 273 mouse genes that code for corresponding TFs or TF components. Of these, 80 TF genes were identified as differentially expressed (FDR#0.05) as described in Probeset Selection above (see Table S5). The 80 TF genes are represented by 150 TRANSFAC position-weight matrices. Table S10 contains the microarray expression measurements for these TF genes.
To estimate the fraction of genes in the mouse genome that are TFs, a genome-wide list of 1,245 murine TF genes (and probable TF genes) was assembled by mapping a list of 1,800 human TF genes from the literature [83] to mouse orthologs present on the Mouse GeneChip and integrating the set of genes possessing GO annotations for transcription factor activity (GO:0003700).

Expression clustering
The SDR values x pj for log 2 intensity, where p indicates the probeset and j indicates the experiment, were clustered using a fast implementation of the K-means algorithm [84], with a minimum cluster size of 1. The number of clusters K was chosen to minimize the Bayesian Information Criterion (BIC) [38]. The BIC is a function of K represented as BIC(K), where k p is the cluster to which the p th probeset is assigned, c kp,j is the j th coordinate of the centroid of the k th cluster in the SDR-transformed space of expression measurements, N = 1,960 (the number of target genes), M = 94 (the number of non-reference experiments), and s e 2 is the average intra-cluster variance evaluated at K = 3. The K-means clustering was carried out for integer values 18#K#50, for 1,000 iterations at each value of K; the optimal clustering (lowest average BIC over the 1,000 iterations) occurred at K = 32 (see Figure S1). The cluster expression profiles were characterized using the within-cluster median of the SDR; as a result, the cluster expression profile will not necessarily have a maximum value of 1 across all data points. This is because, in general, the genes within a cluster will not all reach a maximum value at the same time point. The induction time scale for the median SDR expression within each cluster was estimated using linear interpolation between the time points for the wild-type LPS time-course, and finding the time at which the absolute value of the SDR first exceeded 0.25. Clusters were displayed (in Figure 1 and Figure S2) in the cluster order that minimized the sum of Euclidean distances between adjacent clusters, obtained using simulated annealing [85] with 5000 iterations and a cooling rate of 0.5. The cluster expression profiles in Figure 2 were ordered for display using hierarchical agglomerative linkage using the Euclidean distance of extremal SDR expression level in time-course microarray experiments under the four indicated TLR agonists.

Functional enrichment analysis
Jackson Laboratory Mouse Genome Informatics GO annotations [86] were added to the Affymetrix Mouse GeneChip GO annotations [81] by string matching on the gene symbol field for each annotated probeset. For each of the 20,945 GO term IDs [87], the number of occurrences of the GO term ID (or a descendent of the GO term ID) in the GO hierarchy was computed for all 20,905 genes represented on the Affymetrix Mouse GeneChip (see Probeset Selection above) as well as for each co-expressed gene cluster. For each GO hierarchy (process, component, and function) the total number of genes possessing at least one GO annotation for the hierarchy was computed (see Table S17). The P value for GO enrichment was computed for each pair (i,C) of a GO term ID i and gene cluster C, using Fisher's exact test (under-occurrences of a GO term relative to the reference set were discarded). Any pairs (i,C) in which less than 5% of the genes within C possess GO term ID i, or with a term level in the GO hierarchy less than 3, were discarded. The resulting 629 (i,C) pairs were ordered by P value, and a P value cutoff was selected by demanding that the estimated false discovery rate be 0.02 (P#0.0148, or 2log 10 P$1.83). The resulting 460 GO term enrichments are shown in Table S8.
The list of 32 TLR-regulated murine cytokines was obtained by screening for all differentially expressed genes possessing an annotation for cytokine or chemokine activity, and by refining the list by using NCBI PubMed searches to determine whether each gene is a cytokine.

Selection of genes for null distribution
To form the null distribution of time-lagged correlation, a set of non-TF genes was generated. From the set of 1,960 differentially expressed genes, a set Q of 484 genes were selected such that each gene: (i) does not correspond to a TRANSFAC transcription factor as described above; (ii) has at least two GO process and two GO function annotations; (iii) is not annotated as ''regulation of transcription, DNA-dependent'' (GO:0008015); (iv) does not have a gene name with the prefix ''Zfp'' (zinc finger protein); and (v) is not listed among the 1800 TF genes (see Selection of Transcription Factors). The time-lagged correlations between genes within this group were taken as the null distributions of time-lagged correlations, for the purpose of computing the P value of a timelagged correlation between a TF and a gene (see Time-lagged Correlation below).

Constructing the prior distribution of time lags
Given the time resolution of the expression data (for which the smallest Dt is 20 min), the set L of time lags was chosen to be 0-80 min (inclusive), at 10 min intervals. The precision at which the optimal time lag can be estimated, at |r t |$0.9, was determined to be 65 min, based on simulated independent Gaussian noise added to the replicate-combined array data with standard deviation given by the measured replicate-standard deviation of the log 2 intensity in each experiment. The upper limit of 80 min was selected to ensure that in each time-course with time points T, the target gene expression evaluated at time points {t+t|tMT and t+t#max(T)} would always be based on measurements from at least three time points. The conditional probability density P(t c |H 0 ) of the overall transcriptional time delay t c , for true interacting TF-target gene pairs, was defined using the gamma distribution (see Text S1, Section 3). This probability density was integrated for bins of t c centered at the discrete time lags tML, to obtain an estimate of the discrete probability for observing an optimal time lag, where Dt = 10 min. Using the distribution P(t c |H 0 ), the upper limit of 80 min for t included approximately 97% of transcriptional delays.

Time-lagged correlation
The time-lagged correlation (TLC) was computed for all possible triples (f,g,t) of TF gene f, potential target gene g, and time lag t M L. There were 80 TFs and 1,960 target genes. The TLC was computed as follows, for a given (fixed) time lag t. Let the vectors X T (f) and X T (g) represent the log 2 -transformed, SDRnormalized expression measurements for f and g in a time-course, where T is the set of time points, and let t max ;max(T). Let T t ;{tMT|t#t max 2t}. Let X Tt (f ) and X Tt (g) represent the measurements of f and g, respectively, at the times T t . We now define the set of shifted time points T' t ;T t +t = {t+t|tMT t }. The expression values X T 0 t (g) were computed using linear interpolation between the adjacent time points. Expression values X Tt (f ) for each time course were concatenated together to obtain a combined multi-experiment vector X comb Tt (f ) of measurements for f and a combined vector X comb T ' t (g) of time-boosted measurements for g. The TLC r t (f,g) was then computed using Equation 2 and using X comb Tt (f ) and X comb T ' t (g). The criteria for inclusion of a time-course experiment in the TLC calculation were (i) a minimum of three points in the set T t , and (ii) a minimum of three measurements contributing to the interpolated values X T 0 t . A total of eleven time-course experiments comprising 72 independent time points were included in the TLC analysis, as shown in Table S9.
To build the background (null) TLC distribution D pt (as defined in Text S1, Section 2) for each time lag t, the TLC was computed for a set H consisting of all non-identical pairs of genes (h 1 ,h 2 ), where h 1 and h 2 are drawn from the set Q of non-TF genes (see Selection of Genes for Null Distribution above). The background distributions were constructed from the r t 2 (h 1 ,h 2 ) values, using Gaussian kernel density estimation [38] (see also Text S1, Section 4) with a smoothing length of 0.005 (chosen to maximize the number of pair-wise associations in the non-background set for which P tlc #10 23 ). For each t and each r t (f,g), the complementary CDF F pt was computed by integration of D pt using the extended Simpson's Rule (closed interval) [85] with 200 bins.
The TLC was then analyzed for the set G of gene pairs (g 1 ,g 2 ), where g 1 was drawn from the set of 80 TFs (see Selection of Transcription Factors above), g 2 was drawn from the set of 1,960 differentially expressed (''target'') genes (see Probeset Selection above), and g 1 ?g 2 (the inequality avoids perfect zero-time-lagged correlations that would bias the significance test). For each pair (g 1 ,g 2 ), the time lag that maximized F pt (r 2 t (g 1 ,g 2 )) was selected as the optimal time lag for the pair, and denoted by h(g 1 ,g 2 ).
The probability ratio R(t) was computed using Equation 5. The marginal probability P(H 0 ) was estimated to be ,0.94 based on an analysis of the transcriptional network of [7], taking the average out-degree of the TFs in Fig. 4B and dividing by the number of differentially expressed genes in that study (1,784 genes). The marginal probability P(t)was obtained from h(H).
The combined, cumulative, TLC-based P value for (f,g), denoted by P tlc (f,g), was computed according to Equation 7 (for which a detailed mathematical derivation is given in Text S1). Empirical evidence showing the approximate independence of j and R under the null hypothesis is shown in Figure S17. For each pair (f,C) of TF gene f and gene cluster C (see Expression Clustering above), an overall F score, F exp (f,C) was computed using Equation 8, combining the |C\{f}|P values. Because the genes within a cluster are grouped by expression similarity, their TLCs with respect to f are not independent, even under the null hypothesis that f does not regulate any of the genes within the cluster. Thus, among a large collection of pairs (f,C) satisfying the null hypothesis, the F scores F exp (f,C) will not be distributed according to the x 2 distribution with 2|C\{f}| degrees of freedom. Instead, the number of intra-cluster degrees of freedom was computed for each cluster by clustering the SDR expression profiles of the genes within a cluster (across all 94 non-reference experiments) using the K-means algorithm. For a range of numbers k of sub-clusters, the BIC was computed using the variance at k = 3 for normalizing the bias term [38]. The number of sub-clusters k at which the BIC was minimized was doubled to obtain the effective number of degrees of freedom, d(C), within each cluster. The average over all clusters was AEd(C k )ae k = 11.03, where C k denotes the k th cluster. The x 2 test was applied with d(C) degrees of freedom, to obtain an overall P value for the association between f and C: where F exp (f,C) is defined in Equation 8, and c is the incomplete gamma function [85]. A second statistic, the average time lag, was computed for each pair (f,C), and used as an additional criterion in the network inference (see Network Inference below).

Promoter scanning
Mouse position-weight matrices (150 in total) corresponding to the 80 differentially expressed TF genes, were obtained from TRANSFAC Professional (see Selection of Transcription Factors above, and Table S5) [33]. Promoter sequences of 2 kbp upstream of 17,254 mouse genes were obtained from the UCSC genome database [47] (UCSC annotation build mm8, based on the NCBI mouse genome assembly m36), each identified by NCBI RefSeq ID. The [22 kb, 0] coordinate range relative to the transcription start site was selected based on Figure 2c from [88]. For each representative probeset (see Probeset Selection above), the corresponding RefSeq ID (if available) was obtained from the Affymetrix GeneChip annotation file [81]. The 8,788 expressed genes mapped to 7,492 unique promoter sequences (hereafter, the ''reference'' set, denoted by m exp = 7,492). Of the 12,117 genes that were not expressed in any of the microarray experiments, 7,503 were mapped to UCSC promoter sequences (hereafter, the ''background'' set). The 1,960 differentially expressed genes were mapped to 1,713 unique promoter sequences. Low-complexity repeats were masked from all promoter sequences prior to motif scanning, using RepeatMasker [89]. Scanning was performed using MotifLocator version 3.2 [90], using a first-order background model with frequencies computed from the first 496 genes on chromosome 17, obtained from the 5 kbp upstream promoter sequence file from NCBI mouse genome assembly 32 (UCSC build mm4), and using motif matrix score thresholds selected as described below. The background sequences were scanned with all matrices with no cutoff. For each matrix, the score threshold was computed at which an above-threshold match would occur on average in one out of every 5 promoter sequences (i.e., once per 10 kb). The motif match score thresholds are given in Table S11. The reference promoter set was scanned using these score thresholds, and for each matrix m, the number of promoter sequences in the reference set that had at least one above-threshold match was denoted by n exp (m). For each cluster C, the mapped promoter sequences for the genes within the cluster (the number of which was denoted by m(C)) were scanned, and the number of sequences with at least one above-threshold match was denoted by n(m,C). For each matrix m and cluster C, a P value P scan (m,C) was computed from the values m exp , n exp (m), m(C), n(m,C), using Fisher's exact test. Let W denote the mapping between the 80 TF genes and subsets of the 150 TRANSFAC matrices (see Table S5), so that W(f) is the set of TRANSFAC matrices associated with the TF gene f. For each TF gene f and cluster C, a P value representing the association between f and C was computed as follows, The values of m exp , n exp (m), m(C), and n(m,C) for all clusters, are provided in Table S11.

Network inference
For each pair (f,C) of TF gene f and co-expressed gene cluster C, an overall combined P value, P comb (f, C) for the significance of the association between f and C based on both promoter scanning and expression time-course data, was computed using Fisher's method, The set of all pairs (f,C) were selected, satisfying the following criteria: (i) P comb (f, C)#0.0248 (or 2log 10 P comb (f,C)$1.61, where the P value cutoff was obtained using an FDR of 0.025); (ii) P scan (f,C)#0.05 (or 2log 10 P scan (f,C)$1.3); and (iii) AEhae f,C $10 min. Criterion (iii) was used to ensure that a pair (f,C) would not be accepted based solely on a very low P scan (f,C) value; the average optimal time lag must be biologically plausible. A total of three TF-cluster associations were rejected, that passed criteria (i) and (ii), but not criterion (iii). A total of 90 TF-cluster associations were identified based on these criteria, involving 36 TF genes. The out-degree of a TF gene f within the network was estimated by summing (over all clusters for which (f,C) was accepted) the product z(f,C) |C\{f}|, where z(f,C) is the fraction of genes within C that have at least one binding site for any matrix m M W(f). The diagrams shown in Figures S15 and S16 were generated using Cytoscape [91] version 2.5.0. Protein interactions were obtained from the Human Protein Reference Database [67], Release 6 (2007/01/01) and the Biomolecular Interaction Network Database [68] (2007/10/14). The 36 differentially expressed TF genes were mapped to human orthologs using NCBI Entrez Gene. For the protein network diagram shown in Figure S16, a minimum log 2 microarray probeset intensity cutoff of 6.5 was required in at least one array experiment (with the exception of Smad6, whose human ortholog protein is expressed in HL60 macrophage differentiation [92]).

Quantitative PCR
Total RNA was isolated from bone marrow-derived macrophages using TRIzol (Invitrogen), treated with DNAase (Ambion), and used as template for reverse transcription (Superscript II, Invitrogen) according to the manufacturers' instructions. qPCR was performed using Applied Biosystems ABI 7900 HT. Expression units were computed relative to the housekeeping gene Eef1a1 [6,93]. Primer reagents for Tgif1 and Eef1a1 were obtained as described in Table S15.

ChIP-on-chip validation
Five (TF,cluster) pairs were selected for ChIP-on-chip validation based on several criteria: (1) the gene members of the cluster needed to be well-represented on the tiling array (at least 30% of the genes in the cluster must be represented on the ChIP-on-chip array); (2) a correlation between TF gene and cluster expression consistent with known function (activator or repressor) for the TF; (3) the availability of a high-quality polyclonal murine antibody for a relevant TF protein; (4) demonstrated specificity of the antibody based on Western blot analysis; (5) a successful ChIP assay for several known targets of the TF. Genome location was assayed using ChIP-on-chip hybridization as described in [6], with polyclonal antibodies for murine IRF1 and p50 (Nfkb1) ( Table  S15). A custom Affymetrix GeneChip microarray was used, consisting of 25-mer oligonucleotides selected to densely tile 20 kbp upstream and 20 kbp downstream (and selectively, the coding regions) of genes selected based on differential expression in preliminary microarray expression studies involving murine RAW 264.7 cells stimulated for 60 minutes by LPS, Pam 3 CSK 4 , or Pam 2 CSK 4 [94]. Of the 1,960 differentially expressed genes identified in Probeset Selection, 517 are represented on the tiling array. Hybridization to the custom tiling array was carried out using standard protocols and reagents from Affymetrix. ChIP-onchip microarray scans were background-adjusted and quantile normalized as described in [6]. ChIP-on-chip data were processed as follows. First, probes were sorted based on chromosomal location. The sample/control absolute intensity ratio was computed for each probe, where the control intensity was taken from an experiment with antibody, but without LPS stimulation. A smoothed intensity profile was then generated using a sliding window algorithm based on Tukey's biweight kernel [95] with a 100 bp window size (as was used in [6]). Probes were then selected for which the intensity ratio was higher than a statistical cutoff (P#0.01). If there were multiple significant probes within a 200 bp region, the combined statistical significance of region was computed by performing a t-test in which the distribution of probe intensities within the 200 bp region is compared to a background region of probe intensities. For each identified chromosomal region, the annotated gene nearest to the region in the 59 direction was recorded, along with the distance to the nearest flanking gene. Significance testing of the enrichment of ChIP-on-chip binding among genes within a specific cluster was carried out using Fisher's exact test with a background set consisting of all 520 differentially expressed mouse genes (see Differential Expression Testing above) for which at least one probe on the array is located within 20 kbp upstream of the TSS.

Accession numbers
All microarray expression data from this study have been deposited into the ArrayExpress [96] public database under accession number E-TABM-310. NCBI Entrez Gene identifiers can be found for all differentially expressed genes considered in this study, in Tables S4 and S5. Mouse Genome Informatics Allele accession numbers are provided for each mutant strain, in Table  S15.

Supporting Information
Text S1 Mathematical Derivations. This document provides a complete mathematical description of the significance test used for the time-lagged correlation. In addition, it provides background information on the Gaussian kernel density estimation method and some key theorems supporting the derivation of the method. Found at: doi:10.1371/journal.pcbi.1000021.s001 (0.23 MB PDF) Figure S1 The optimal number of clusters was determined using the Bayesian Information Criterion (BIC). The horizontal axis indicates the number of clusters K used for K-means clustering. The cluster analysis was repeated for K varying between 18 and 50, with the BIC computed for each number of clusters. The optimal number of clusters, for which the BIC is minimized, was found to be K = 32 (see Materials and Methods, Expression Clustering). Found at: doi:10.1371/journal.pcbi.1000021.s002 (0.15 MB TIF) Figure S2 Differential expression profiles of gene clusters, in TLR-stimulated macrophages, across all microarray expression experiments. Each row represents an experiment (a specific combination of strain, stimulus, and time point), and each column represents a cluster. Clusters are displayed in the order that minimizes the sum of pairwise distances between adjacent clusters (see Materials and Methods, Expression Clustering). Each colored rectangle within the heat-map indicates the centroid of the expression levels for genes within the indicated cluster, for the indicated experiment. The differential expression level (SDR, see Equation 1) is indicated in red/green color, and varies between -1 (bright green) and 1 (bright red), with 0 (black) indicating no change from the expression level in the unstimulated wild-type macrophage. The shaded light gray/charcoal regions in the far left column indicate the genotype. The color-coding in the second-toleft column indicates the stimulus (color code legend in lower right; and see Table S2 for the concentrations). The four-digit numbers to the right of the color-code column, indicate the elapsed time (min) post-stimulation, for each experiment.  Figure S4 Cluster-median differential expression profiles in wild-type macrophages stimulated with Pam 3 CSK 4 show a diversity of time scales. Each data point shown is the median of the SDR-transformed (see Equation 1) differential expression levels of the genes within the indicated cluster, at the indicated time after stimulation. Cluster C26 shows sustained activation under this stimulus, as opposed to the case of stimulation with LPS (see Figure S3). Found at: doi:10.1371/journal.pcbi.1000021.s005 (0.32 MB TIF) Figure S5 Cluster-median differential expression profiles in wild-type macrophages stimulated with poly I:C show a diversity of time scales. Each data point shown is the median of the SDRtransformed (see Equation 1) differential expression levels of the genes within the indicated cluster, at the indicated time after stimulation. The core response Clusters C27 and C28 induce later in this time-course experiment than in the case of stimulation with LPS ( Figure S3).  Figure S14 The set of transcription factor genes has a 20-fold variation in out-degree (number of target genes), within the transcriptional network. (a) Estimated out degree of transcription factor genes. The out degree of a transcription factor gene is the number of genes estimated to be regulated by the transcription factor(s) associated with that TF gene (i.e., of which that TF gene is a component). For each gene cluster with which a TF gene was associated, the number of genes within the cluster for which a motif match was found (corresponding to the TF gene), was tabulated. The number of target genes was summed over all clusters with which the TF was associated, based on the combined expression and promoter scanning data (see Materials and Methods, Network Inference). Among the 36 TF genes in the network, the estimated out degree had a median of 49, and a maximum value of 285. (b) Estimated significance of the association of the TF gene in the network. For each TF gene f implicated in the network, the minimum P value P comb (f,C) of association with any cluster C, was used as a measure of the overall significance of the association of TF gene in the transcriptional network. Transcription factor genes are displayed in decreasing order of estimated out degree (number of target genes). Transcription factors associated with larger clusters are seen to correlate with higher significances in the network, as a consequence of the sample size-dependence of the statistical tests used for the motif scanning and expression dynamics evidences.
Found at: doi:10.1371/journal.pcbi.1000021.s015 (0.51 MB TIF) Figure S15 Transcription factors involved in macrophage activation are highly interconnected in the protein interaction network, and the interacting TFs co-associate with clusters. Nodes indicate TF genes whose transcript levels are differentially expressed in LPS-stimulated macrophages, and that are associated with the transcriptional network through the combination of scanning-and expression-based evidences. Node labels are gene names. A red node indicates upregulated gene expression under LPS, and green indicates downregulation, and a purple node indicates transient up-and downregulation. A blue arc indicates that the human orthologs of the murine proteins associated with the murine TF genes connected by the arc, have an interaction in the Human Protein Reference Database [68] or in the Biomolecular Interaction Network Database [69]. A thick black arc indicates that the two connected TF genes co-associate with one or more clusters within the network, and share a protein interaction (suggesting a possible transcriptional complex). A purple arrow indicates a known protein-DNA interaction between the source node's human ortholog protein and the promoter of the human ortholog of the gene indicated by the target node. Brown ellipses denote the core transcription factor complexes NFkB and AP1. Found at: doi:10.1371/journal.pcbi.1000021.s016 (0.64 MB TIF) Figure S16 TGIF1 interacts with many members of the SMAD/AP-1 transcription complex. Shown here is a network diagram of 16 proteins that interact with the SMAD family of transcription factors SMAD1/2/3/6, the histone deacetylaces HDAC1/2, and the TG-interacting factors TGIF1/2. Nodes indicate proteins, and a blue line between two nodes indicates that the human orthologs of the two proteins have an interaction, in either the Human Protein Reference Database (HPRD) [68] or in the literature [72,75]. Red arrows indicate human protein-DNA interactions annotated in the TRANSFAC database [34]. The diagram includes nearest-neighbors of the SMAD, HDAC, and TGIF families in the protein interaction network. Each node shown in the diagram corresponds to a transcript that is likely expressed in murine bone marrow-derived macrophages, based on having an above-threshold microarray intensity within at least one experiment (see Materials and Methods, Probeset Selection). Table S1 Summary of mutant mouse strains used in this study. Expression data from available mouse strains with mutations of known TLR signaling adapter molecules or known transcriptional regulators were included in the cluster analysis, in order to maximize the diversity of expression patterns in the data set used for clustering. Column 1 is the mutant strain name. Column 2 is the name of the molecule affected by the mutation. Column 3 gives the gene title. Column 4 briefly summarizes the relevance of the molecule in TLR-stimulated macrophages.   Table  S3). Column 1 indicates the NCBI gene symbol of the gene.  Table S6 Summary of co-expressed gene clusters. Column 1 indicates the cluster name. Clusters were numbered in order of decreasing size. Column 2 indicates the number of genes in the cluster. Column 3 is a heat-map representation of the withincluster median of the normalized differential expression intensity (SDR, see Equation 1), over time, in wild-type macrophages stimulated with LPS. The color red indicates upregulation relative to wild-type unstimulated macrophages, and green indicates downregulation (see color bar in Figure S2). Column 4 indicates the cluster response time under LPS stimulation, defined as the time scale (in minutes) for the log 2 fold change to reach 25% of its extremal value (see Materials and Methods, Expression Clustering); the time scale uncertainty is 6 5 min. Column 5 lists the known (excluding those solely inferred from electronic annotation, i.e., ''IEA'' evidence code) transcription factor genes that are members of the cluster (these are not the inferred transcriptional regulators of the cluster). Column 6 lists the known cytokines and chemokines that are members of the indicated cluster. Found at: doi:10.1371/journal.pcbi.1000021.s024 (0.13 MB DOC)

Table S7
The timing of induction of core response clusters C27 and C28 is adapter molecule-dependent. Column 1 indicates the stimulus. Column 2 indicates the microarray conditions compared, for example, fold-change (stimulated relative to unstimulated) in Myd88 (2/2) macrophages vs. the fold-change in wild-type. Column 3 indicates the time post-stimulation. Columns 4 and 5 are the within-cluster medians of the log 2 of the ratios for the condition comparison indicated in column 2, for the clusters C27 and C28, respectively. The data indicate that the early response of these clusters is largely dependent on the MyD88 signaling pathway, and that the later response (2 hours) is more strongly dependent on the TRIF signaling pathway. Found at: doi:10.1371/journal.pcbi.1000021.s025 (0.03 MB DOC)

Table S8
Gene Ontology enrichments in co-expressed gene clusters. Column 1 indicates the cluster. Column 2 contains the Gene Ontology ID (GOID) for the GO term. Column 3 contains the GO term. Column 4 indicates the GO hierarchy (process, component, or function) to which the GO term belongs. Column 5 contains the -log 10 P value (significance) for the enrichment of the GO term in the indicated cluster. Column 6 contains the level of the GO term in the gene ontology hierarchy. Column 7 indicates the number of genes within the cluster that possess this GO term. Column 8 indicates the frequency at which this GO term appears in the set of all annotated genes in the genome (see Materials and Methods, Functional Enrichment Analysis). Column 8 indicates the frequency at which the GO term appears among genes in the indicated cluster. Table S10 Transcription factor genes with microarray expression data. This spreadsheet contains microarray probeset intensities for all 80 differentially expressed transcription factor genes (see Materials and Methods, Selection of Transcription Factors) across all 95 microarray experiments (see Table S3). Column 1 indicates the NCBI gene symbol of the gene. Column 2 indicates the NCBI Entrez Gene ID. Column 3 indicates the probeset selected as representative for the gene. Column 4 provides a brief gene description, obtained from the Affymetrix Mouse GeneChip annotations file. Column 5 indicates the co-expressed gene cluster to which the gene was assigned (see Materials and Methods, Expression Clustering). Columns 6-8 provide listings of the gene's Gene Ontology annotations in the process, component, and function GO hierarchies, respectively (see Materials and Methods, Functional Enrichment Analysis). Column 9 indicates the set of TRANSFAC matrices associated with this transcription factor gene (see Materials and Methods, Selection of Transcription Factors). Column 10 indicates the maximum log 2 intensity observed, across all experiments. Columns 11-105 provide the log 2 intensity measurements of the probesets, across all 95 microarray experiments. Found at: doi:10.1371/journal.pcbi.1000021.s028 (0.21 MB XLS) Table S11 Transcription factor binding site (TFBS) motif position-weight matrices, threshold scores, and number of matches for promoter TFBS motif searching. This spreadsheet contains the results from scanning the promoters of all genes in the reference set and in each co-expressed cluster, for transcription factor binding site motifs from TRANSFAC (see Materials and Methods, Promoter Scanning). Column 1 contains the TRANSFAC matrix name. Column 2 contains the minimum MotifLocator match score required for the given PWM to be identified as matching the sequence at a given chromosomal location. Column 3 contains the number of matches within the set of 7,492 reference promoter sequences. Columns 4-35 contain the number of matches for the PWM for each of the 32 co-expressed gene clusters. Section 2 contains the P values of the enrichments of the PWM matches within each of the 32 clusters (see Materials and Methods, Promoter Scanning). Row 2 indicates the number of genes whose promoters were scanned, for each cluster. The number of matches for each motif within each of the clusters is shown in a second section of the spreadsheet, starting at row 154). Found at: doi:10.1371/journal.pcbi.1000021.s029 (0.15 MB XLS) Table S12 Time-lagged correlation data for all (TF,target) gene pairs in which a motif associated with the TF gene was found to match within the promoter region of the target gene. Column 1 contains the transcription factor gene symbol. Column 2 contains the transcription factor gene's Affymetrix probeset ID. Column 3 contains the target gene symbol. Column 4 contains the target gene's Affymetrix probeset ID. Column 5 indicates the co-expressed gene cluster (1-32) of which the target gene is a member. Column 6 indicates the time-lagged correlation coefficient between the TF and the target genes, at the optimal time lag. Column 7 indicates the optimal time lag selected for the gene pair. Column 8 contains the score assigned to the motif match by MotifLocator. Found at: doi:10.1371/journal.pcbi.1000021.s030 (7.84 MB XLS) Table S13 ChIP-on-chip data. Results of five ChIP-on-chip assays for predicted (TF,cluster) pairs. Each row in the table shows integrated data sources for a specific gene target. Column 1 indicates the TF gene predicted to regulate the target cluster. Column 2 gives the probeset of the TF gene. Column 3 indicates the gene symbol of the target gene. Column 4 gives the target gene probeset. Column 5 gives the co-expressed cluster of which the target gene is a member. Column 6 gives the score for the best motif match for the indicated TF, within the promoter of the target gene (a blank cell indicates that no above-threshold motif match was found, at the 1 match per 10 kbp level of stringency). Column 7 indicates the P tlc from time-lagged correlation. Column 8 indicates whether the gene's promoter region was represented on the promoter array. Column 9 indicates the ChIP-on-chip P value; a blank cell in this column indicates that no significant ChIP-on-chip binding was found (see Materials and Methods, ChIP-on-chip Validation). Found at: doi:10.1371/journal.pcbi.1000021.s031 (0.05 MB XLS) Table S14 ChIP-on-chip enrichment results for co-expressed gene clusters that are well-represented on the promoter array. Each row in the table gives results for the ChIP-on-chip assay for a particular cluster and for a particular TF target. Each row in the table is associated with a particular cluster and a particular TF target, for all pairings of p50/NFKB1 and IRF1 with the nine clusters for which at least 30% of the member genes were represented on the tiling array. The first column indicates the TF target. The second column gives the cluster number. The third column gives the number of genes on the ChIP-on-chip array for which binding was observed upstream of the transcription start site. The fourth column gives the number of genes within the cluster, that were represented on the ChIP-on-chip array. The fifth column gives the number of genes within the cluster that showed evidence of TF binding in the upstream region, in the ChIP-on-chip assay. The sixth column gives the fraction of genes in the cluster that are represented on the array. The seventh column gives the enrichment P value for the ChIP-on-chip hits within the cluster (see Materials and Methods, ChIP-on-chip Validation). The eighth column gives the motif match enrichment P value based on sequence scanning (see Materials and Methods, Promoter Scanning). The ninth column gives the P value based on the time-lagged correlation of expression profiles of the TF gene and the genes within the target cluster. The tenth column gives the average time lag, between the TF gene and the genes within the target cluster. The eleventh column gives the combined P value based on motif match enrichment and time-lagged correlation (see Equation 13). Found at: doi:10.1371/journal.pcbi.1000021.s032 (0.02 MB XLS) Table S15 List of key materials and reagents. Column 1 indicates the type of material (mouse strain or stimulus reagent). Column 2 indicates the specific strain or reagent. For mutant mouse strains, the Mouse Genome Informatics accession number of the allele is provided. Column 3 indicates the source laboratory from which the mouse strain or reagent was obtained. Found at: doi:10.1371/journal.pcbi.1000021.s033 (0.05 MB DOC) Table S16 Summary of probeset selection criteria. Each row describes a set of data selection criteria, for a specific purpose. For a detailed explanation of each set of criteria, see Materials and Methods, Probeset Selection. Column 1 states the purpose of the set of selection criteria. Column 2 indicates the minimum log 2 absolute probeset intensity that must have been recorded in at least one experiment, for the gene to be included in the selection described in Column 1. Column 3 indicates the false discovery rate used to determine the P value cutoffs for each of the seven timecourse experiments used for differential expression testing (see Materials and Methods, Differential Expression Testing); ''n/a'' means that no differential expression test was applied, for genes in the indicated row. Column 4 gives the number of probesets resultant from the indicated selection criteria.