Conceived and designed the experiments: TN AT. Performed the experiments: SE ML KCM DEM. Analyzed the data: TN AT. Contributed reagents/materials/analysis tools: HF PC. Wrote the paper: TN AT.

The authors have declared that no competing interests exist.

The Mediator is a highly conserved, large multiprotein complex that plays an essential role in the regulation of eukaryotic mRNA transcription. It acts as a general transcription factor by integrating regulatory signals from gene-specific activators or repressors and relaying them to RNA Polymerase II. The internal network of interactions between Mediator subunits that conveys these signals is largely unknown. Here, we introduce MC EMiNEM, a novel method for the retrieval of functional dependencies between proteins that have pleiotropic effects on mRNA transcription. MC EMiNEM is based on Nested Effects Models (NEMs), a class of probabilistic graphical models that extends the idea of hierarchical clustering. It combines mode-hopping Monte Carlo (MC) sampling with an Expectation-Maximization (EM) algorithm for NEMs to increase sensitivity compared to existing methods. A meta-analysis of four Mediator perturbation studies in

Phenotypic diversity and environmental adaptation in genetically identical cells are achieved by precise tuning of their transcriptional program. Unraveling even parts of the complex network of gene regulatory components and their interactions is a challenging task. Here, we shed light on the role of the Mediator complex in transcription regulation in yeast. The Mediator is highly conserved in all eukaryotes and acts as an interface between gene-specific transcription factors and the general mRNA transcription machinery. Even though most of the proteins involved and numerous structural features are already known, the details of its functional contribution to basal as well as activated transcription remain obscure. We use gene expression data, measured upon perturbations of various Mediator subunits, to relate the Mediator structure to the way it processes regulatory information. Moreover, we relate specific subunits to interacting transcription factors.

The Mediator, first discovered by Kim et al. (1994) and Koleske et al. (1994)

Recently, “structure-function” analyses have been suggested and conducted by van de Peppel et al. (2005) and Koschubs et al. (2009)

Nested Effects Models (NEMs) are probabilistic graphical models designed for the analysis of gene expression data from perturbation experiments. They are designed to reconstruct the dependency structure of the perturbation signals, and they perform particularly well if this structure is hierarchical
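The nested-subset idea behind a NEM can be illustrated with a minimal sketch. It assumes the standard NEM formulation, which this text does not spell out: a binary, transitively closed signals graph `gamma` with ones on the diagonal, a binary effects graph `theta` attaching each effect to one signal, predicted effects given by the Boolean product of the two, and a log-likelihood (up to a constant) equal to the sum of log-odds ratios over all predicted effects. The names `gamma`, `theta`, `R`, and both function names are hypothetical, as is the orientation of `R` (signals in rows, effects in columns). It is not the MC EMiNEM implementation.

```python
def predicted_effects(gamma, theta):
    """Boolean product F = gamma * theta.

    F[i][k] is 1 if perturbing signal i is predicted to change effect k,
    i.e. if effect k is attached to some signal j reachable from i.
    """
    n_signals = len(gamma)
    n_effects = len(theta[0])
    return [
        [int(any(gamma[i][j] and theta[j][k] for j in range(n_signals)))
         for k in range(n_effects)]
        for i in range(n_signals)
    ]


def log_likelihood_score(gamma, theta, R):
    """Sum of log-odds ratios R[i][k] over all predicted effects (assumed scoring)."""
    F = predicted_effects(gamma, theta)
    return sum(F[i][k] * R[i][k]
               for i in range(len(F))
               for k in range(len(F[0])))


# Toy signals graph: chain S0 -> S1 -> S2, transitively closed, diagonal set to 1.
gamma = [[1, 1, 1],
         [0, 1, 1],
         [0, 0, 1]]

# Toy effects graph: E0 attached to S0, E1 to S1, E2 and E3 to S2.
theta = [[1, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 1, 1]]

# Made-up log-odds ratios (rows: perturbed signal, columns: effect gene).
R = [[2.0, 1.5, 1.0, 0.5],
     [-1.0, 2.0, 1.0, 1.0],
     [-2.0, -1.0, 3.0, 0.5]]

F = predicted_effects(gamma, theta)
score = log_likelihood_score(gamma, theta, R)
```

In this toy example the predicted effect sets are nested: perturbing S0 affects all four effects, perturbing S1 affects a subset of them, and perturbing S2 affects a subset of that subset. This is the hierarchical structure that NEMs are designed to recover.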

Consequently, a NEM is parametrized by the tuple

The main objective is the reconstruction of the signals graph

Throughout this section, the data

The expectation (E-)step of the EM algorithm involves calculating the expected log-posterior with respect to the distribution of

The EM algorithm is guaranteed to find a local maximum which, for unimodal distributions, equals the global optimum. In practice, the posterior landscape

It is not obvious how the effects graph prior should be defined. Being most conservative,

In an Empirical Bayes approach, we iteratively estimate

1. Initialize

2. Generate a representative sample

3. Replace

4. Repeat steps 2 and 3 until convergence (see Sections S3.2 and S4.4 in

Our goal was to establish MC EMiNEM as a general-purpose tool for the analysis of high-dimensional intervention data, and to use it for the reconstruction of the internal Mediator complex signaling network. MC EMiNEM includes three key features for an efficient and comprehensive search of the space of candidate regulatory networks: Markov Chain Monte Carlo sampling, Expectation Maximization, and an Empirical Bayes method for the adaptive attachment of effects. We show in simulations that all three features contribute substantially to the method's performance. We then construct a high-confidence regulatory network of Mediator subunits. The predicted effects graph reveals interactions between the Mediator and gene-specific transcription factors.

Extensive simulations were performed to ensure the convergence of the MCMC chain, and to verify the independence of the outcome from the initial parameter choice (see Section S2.2 in

(A) Prediction quality. Comparison of the sensitivity of MC EMiNEM and four alternative methods for four different noise levels (top) and four different signals graph sizes (bottom). The sensitivity is depicted on the y-axis, each frame corresponds to one parameter setting. Top: For a signals graph of 11 nodes, noisy data was generated such that for an optimal test with a type-I error (

Our approach attempts to maximize the marginal posterior

The 25 protein subunits of the Mediator are subdivided into 4 distinct modules (head, middle, tail, kinase, see

The numbers of the Mediator subunits correspond to the unified Mediator nomenclature

We generated expression profiles of

The predicted Mediator network (the signals graph in

Shown are the log-odds ratios which serve as MC EMiNEM's input. Genes that are likely to change in a given condition are depicted in red, all others in blue. Color saturation indicates the absolute value of the log-odds ratio (cf. Fig. S4.3 in

Apart from an estimate of the internal flow of regulatory information in the signals graph, MC EMiNEM returns a posterior probability of the attachment of effect genes to specific Mediator subunits (
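How an attachment posterior can arise from a signals graph and log-odds data is sketched below. This is a minimal illustration under stated assumptions, not the procedure used by MC EMiNEM: it takes a fixed signals graph `gamma`, a uniform prior over attachment positions (whereas the method described here adapts the prior by Empirical Bayes), and an optional "unattached" position with score 0. The function name, the `allow_null` parameter, and the orientation of `R` (signals in rows, effects in columns) are hypothetical.

```python
import math


def attachment_posterior(gamma, R, allow_null=True):
    """Posterior over attachment positions for each effect, given gamma.

    For effect k and candidate signal j, the score is the sum of log-odds
    R[i][k] over all perturbations i that reach j (gamma[i][j] == 1).
    A softmax over the scores gives the posterior under a uniform prior.
    If allow_null is True, the last entry of each row is "unattached".
    """
    n_signals = len(gamma)
    n_effects = len(R[0])
    posterior = []
    for k in range(n_effects):
        scores = [sum(gamma[i][j] * R[i][k] for i in range(n_signals))
                  for j in range(n_signals)]
        if allow_null:
            scores.append(0.0)  # an unattached effect predicts no change anywhere
        top = max(scores)       # subtract the maximum for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        posterior.append([w / total for w in weights])
    return posterior


# Toy chain S0 -> S1 -> S2 (transitively closed) and made-up log-odds ratios.
gamma = [[1, 1, 1],
         [0, 1, 1],
         [0, 0, 1]]
R = [[2.0, 1.5, 1.0, 0.5],
     [-1.0, 2.0, 1.0, 1.0],
     [-2.0, -1.0, 3.0, 0.5]]

post = attachment_posterior(gamma, R)
best = [row.index(max(row)) for row in post]  # most probable position per effect
```

Each row of `post` sums to one. An effect gene that responds only to the perturbation of S0 is placed at S0, while a gene that responds to all three perturbations is placed at the most downstream signal, S2.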

The 21 TF-Mediator subunit interactions mapped by MC EMiNEM were validated using the BioGRID database

All target genes of TFs associated with the tail module show downregulation after perturbation, consistent with the tail's function of contacting gene-specific transcription factors

A) Expression changes of the target genes of SKO1 across all experiments. Experiments correspond to rows; the respective Mediator subunit perturbations are indicated by the colored boxes to the left of the heat map (coloring is in accordance with the Mediator module structure in

The transcriptional activator SWI5 has a large number of physical interactions with subunits from various Mediator modules (Med15, Med17, Med18, Med22,

Similar analyses were carried out for all TFs in the MC EMiNEM map (

The reconstruction of interaction networks from high-dimensional perturbation effects is still a challenge. We have developed MC EMiNEM, a method for learning a Nested Effects Model. It introduces two major improvements: an Expectation-Maximization algorithm for the very fast detection of local maxima of the posterior probability function, and mode-hopping Markov Chain Monte Carlo sampling for the efficient exploration of the space of local maxima. We applied MC EMiNEM to a combination of our own and public gene expression data obtained from Mediator subunit perturbations. MC EMiNEM not only sheds light on structural dependencies of Mediator subunits, but also identifies interactions of gene-specific transcription factors with Mediator subunits. Our findings are consistent with current knowledge about the Mediator architecture and function. By grouping components with similar profiles, hierarchical clustering has proved tremendously useful for the analysis of expression data obtained from observational experiments. MC EMiNEM reaches beyond the identification of undirected relationships; it resolves directed regulatory structures, and it identifies gene groups with a consistent and specific response pattern. For interventional data, MC EMiNEM is thus the appropriate counterpart to clustering.

We thank the members of the Tresch and Cramer Laboratories. We thank Frank Holstege, Kathrin Sameith, and Patrick Kemmeren for valuable suggestions and discussions.