The authors have declared that no competing interests exist.

Maps of genetic interactions can dissect functional redundancies in cellular networks. Gene expression profiles as high-dimensional molecular readouts of combinatorial perturbations provide a detailed view of genetic interactions, but can be hard to interpret if different gene sets respond in different ways (called

Genes do not act in isolation, but rather in tight interaction networks. Maps of genetic interactions between pairs of genes are a powerful way to dissect these relationships. Genetic interactions are mostly defined by quantifying individual phenotypes like growth or survival. However, when high-dimensional phenotypes are observed, genetic interactions can become very hard to interpret. Here we test the hypothesis that complex relationships between a gene pair can be explained by the action of a third gene that modulates the interaction. Our approach to test this hypothesis builds on Nested Effects Models (NEMs), a probabilistic model tailored to inferring networks from gene perturbation data. We have extended NEMs with logical functions to model gene interactions and show in simulations and case studies that our approach can successfully infer modulators of genetic interactions and thus lead to a better understanding of an important feature of cellular organisation.

More than 80% of genes in the yeast

While most genetic interaction maps use survival [

For two fixed knock-out mutations, we denote by 00 the wild type, by 10 and 01 the two single mutants, and by 11 the double mutant. Their effect on the expression of a given gene $i$ is denoted by $x_{i,00}$, $x_{i,01}$, $x_{i,10}$, and $x_{i,11}$. Gene expression is reported as the log-fold change relative to the wild type 00, hence $x_{i,00} = 0$. Epistasis between the two mutations is defined as
$$\varepsilon_i = x_{i,11} - (x_{i,01} + x_{i,10}),$$
and the epistasis profile across all $m$ genes as $\varepsilon = (\varepsilon_1, \dots, \varepsilon_m)$.

Complete redundancy is the situation in which, for most genes $i$, $x_{i,01} = x_{i,10} = 0$ and hence $\varepsilon_i = x_{i,11}$. Depending on the sign of $x_{i,11}$, epistasis may be positive or negative for each individual gene $i$.

Mixed epistasis is defined by $x_{i,01}, x_{i,10} \neq x_{i,11}$ for some genes and some of those not following redundancy. It is mixed in the sense that the single mutants can have any effect, positive or negative, in any combination (see
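As a minimal sketch of these definitions, the following Python snippet computes a per-gene epistasis value. It assumes that epistasis is the double-mutant log-fold change minus the sum of the two single-mutant log-fold changes, which reduces to the double-mutant effect when both single mutants show no change. The function names, the tolerance `tol`, and the toy profiles are hypothetical and not taken from the study.

```python
def epistasis_profile(x01, x10, x11):
    """Per-gene epistasis, assuming eps_i = x11_i - (x01_i + x10_i)."""
    return [d - (a + b) for a, b, d in zip(x01, x10, x11)]


def follows_redundancy(a, b, d, tol=0.5):
    """True if both single mutants show no effect but the double mutant does.

    `tol` is an illustrative threshold on the absolute log-fold change.
    """
    return abs(a) < tol and abs(b) < tol and abs(d) >= tol


# Made-up log-fold changes relative to wild type for three genes.
x01 = [0.0, 1.0, 0.0]    # single mutant 01
x10 = [0.0, -2.0, 0.0]   # single mutant 10
x11 = [2.0, 1.5, -1.5]   # double mutant 11

eps = epistasis_profile(x01, x10, x11)
redundant = [follows_redundancy(a, b, d) for a, b, d in zip(x01, x10, x11)]
```

In this toy example the first and third genes follow the complete redundancy pattern with positive and negative epistasis respectively, while the second gene responds to both single mutants and therefore does not.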

Left: Complete redundancy is explained by an effect only being visible when both genes A and B are knocked out simultaneously. Right: Mixed epistasis is characterized by a mixed behaviour of two genes. Their interaction differs for different gene sets.

In this paper we test the hypothesis that mixed epistasis between a gene pair can be explained by the action of a third gene that mediates between the functional interaction and the transcriptional readout. To test this hypothesis, we extend the framework of Nested Effects Models (NEMs), which has been specifically tailored to analyze high-dimensional gene perturbation data [

There exist many different pathway reconstruction methods [

Boolean networks have a long tradition in biology [

Bayesian networks have been used on multi-parametric readouts of gene perturbations [

This limitation motivated the development of Nested Effects Models (NEMs) to indirectly reconstruct signaling networks from observations of downstream genes whose expression levels are affected by perturbations of signaling proteins [

[…] ($E_A$, $E_B$, $E_C$, $E_D$). Effects of perturbing D ($E_D$) are nested in the effects of perturbing A ($\{E_A, E_C, E_D\}$) and B ($\{E_B, E_D\}$). The matrices show the expected behaviour under the model. In real data, each gene in a set of effect reporters […] $2^{3} = 8$ possible logical functions are AND, OR, XOR, not-A and not-B. The NEM in (A) is the special case of epiNEM with an OR logic.

The key contribution in this paper is to extend NEMs by introducing logical functions modeling the effects of combinatorial perturbations. The fact that NEMs can easily be extended in this way shows their advantage over subset-based methods that are only defined on pairs of variables [

EpiNEMs consist of three elements. First, a directed graph $\Phi$ connecting the signaling genes $S_i$. Second, a directed graph $\Theta$ linking each observed effect $E_i$ to exactly one of the signaling genes $S_i$. Combining these two graphs results in the NEM model. Third, in our epiNEM approach we add logical functions, one for each signaling gene $S_i$ that has two or more parents in $\Phi$.

In total there are five logic gates that represent different biological relationships (see
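The five gates named in this article (AND, OR, XOR, not-A, not-B) can be sketched as truth tables over two parents. The encoding below is an assumption made for illustration: states are perturbation states, with 1 meaning "knocked out or disrupted" and 0 meaning "unaffected". Under this reading OR reproduces the plain NEM behaviour (any single knock-out disrupts the child) and AND reproduces complete redundancy (only the double knock-out does). The exact semantics of not-A and not-B are also assumed here as masking relations, and the dictionary `GATES` is a hypothetical name.

```python
from itertools import product

# Each gate maps the perturbation states (a, b) of two parents to the state of
# their child: 1 means "disrupted", 0 means "unaffected" (assumed encoding).
GATES = {
    "OR": lambda a, b: a | b,            # any single knock-out disrupts the child
    "AND": lambda a, b: a & b,           # only the double knock-out does
    "XOR": lambda a, b: a ^ b,           # exactly one knock-out does
    "not-A": lambda a, b: (1 - a) & b,   # assumed: knocking out A masks B's effect
    "not-B": lambda a, b: a & (1 - b),   # assumed: knocking out B masks A's effect
}


def truth_table(gate):
    """Child state for the wild type (0, 0), both single mutants and the double mutant."""
    return {(a, b): GATES[gate](a, b) for a, b in product((0, 1), repeat=2)}


tables = {name: truth_table(name) for name in GATES}
```

Each table lists the child state for the four genotypes 00, 01, 10 and 11, so the five gates differ only in which of the three mutant genotypes disrupt the child.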

Adding logics extends the

Thus, each set of perturbations corresponds to a unique pattern of activation states of pathway genes and we can summarize the expected effects on pathway genes in a row-vector

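Given the states of all signaling genes $S_i$, we calculate the likelihood of each model hypothesis in the same way as in standard NEMs. We first assume that the attachment of effects to signaling genes, $\Theta = \{\theta_1, \dots, \theta_m\}$, is given. With these parameters, the expected effects can be compared to the observed effects to obtain the likelihood
$$P(D \mid \Phi, \Theta) = \prod_{i=1}^{m} \prod_{k=1}^{l} P(d_{ik} \mid \Phi, \theta_i),$$
where $d_{ik}$ is the observation for effect $i$ in experiment $k$, with $d_{ik} = 1$ if we observe an effect and $d_{ik} = 0$ if we do not observe any effect. Experimental data, however, will always be noisy and therefore the probability $P(d_{ik} \mid \Phi, \theta_i)$ will be dependent on the false positive rate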
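The comparison of expected and observed effects can be sketched as follows for the case where the attachments are known. This is a minimal illustration that assumes the usual binary error model with a false positive rate `alpha` and a false negative rate `beta`; the function name, argument layout and toy matrices are hypothetical.

```python
import math


def log_likelihood(data, expected, theta, alpha, beta):
    """Log-likelihood of binary data given expected effects and attachments.

    data[i][k]      observed effect (0/1) of reporter i in experiment k
    expected[j][k]  expected state (0/1) of signaling gene j in experiment k
    theta[i]        index of the signaling gene that reporter i is attached to
    alpha, beta     assumed false positive and false negative rates
    """
    total = 0.0
    for i, row in enumerate(data):
        for k, d in enumerate(row):
            e = expected[theta[i]][k]
            if e == 1:
                p = 1.0 - beta if d == 1 else beta    # true positive / false negative
            else:
                p = alpha if d == 1 else 1.0 - alpha  # false positive / true negative
            total += math.log(p)
    return total


# Two signaling genes, three experiments, two reporters (made-up values).
expected = [[1, 0, 1], [0, 1, 1]]
data = [[1, 0, 1], [0, 1, 0]]
theta = [0, 1]
ll = log_likelihood(data, expected, theta, alpha=0.05, beta=0.1)
```

Working in log space keeps the product over all reporters and experiments numerically stable when many measurements are combined.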

In almost all applications, however, it is not known which effect is directly linked to which signaling gene. Therefore, the marginal likelihood for each silencing scheme is computed by averaging over the effect attachments $\Theta$. This is achieved by summing over all attachment probabilities:
$$P(D \mid \Phi) = \prod_{i=1}^{m} \sum_{j=1}^{n} P(\theta_i = j) \prod_{k=1}^{l} P(d_{ik} \mid \Phi, \theta_i = j),$$
where $j$ runs over the $n$ signaling genes to which effect $i$ can be attached via $\theta_i$. The optimal pathway is the one resulting in the highest likelihood. For small networks like the ones we use here, exhaustive search over all network topologies is possible. For faster inference, or to make inference feasible for networks with more than five genes, a greedy hill climbing method is provided in the package ‘
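Averaging over attachments and picking the best-scoring model can be sketched as below. The snippet assumes a uniform prior over attachments and the same binary error model with false positive rate `alpha` and false negative rate `beta`. The two candidate models are hypothetical expected-effect matrices (rows: signaling genes A, B and their child C; columns: knock-out of A, of B, and of both), and none of the names correspond to functions of the accompanying R package.

```python
import math


def cell_prob(d, e, alpha, beta):
    """P(observed d | expected e) under false positive/negative rates."""
    if e == 1:
        return 1.0 - beta if d == 1 else beta
    return alpha if d == 1 else 1.0 - alpha


def marginal_log_likelihood(data, expected, alpha, beta):
    """Sum over all attachments of each reporter, assuming a uniform prior."""
    n = len(expected)
    total = 0.0
    for row in data:
        per_gene = [
            math.prod(cell_prob(d, e, alpha, beta) for d, e in zip(row, exp_row))
            for exp_row in expected
        ]
        total += math.log(sum(per_gene) / n)
    return total


def best_model(data, candidates, alpha, beta):
    """Exhaustive search: return the candidate name with the highest score."""
    return max(
        candidates,
        key=lambda name: marginal_log_likelihood(data, candidates[name], alpha, beta),
    )


# Three reporters observed under knock-out of A, of B, and of both (made-up data).
data = [[0, 0, 1], [0, 0, 1], [1, 0, 1]]
candidates = {
    "AND": [[1, 0, 1], [0, 1, 1], [0, 0, 1]],  # child disrupted only in the double mutant
    "OR": [[1, 0, 1], [0, 1, 1], [1, 1, 1]],   # child disrupted by any knock-out
}
winner = best_model(data, candidates, alpha=0.05, beta=0.1)
```

Two of the three toy reporters respond only to the double knock-out, so the AND hypothesis scores higher here. For long data rows the inner product should be computed in log space to avoid underflow.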

Interpretation of the network inferred from data is not always straightforward. As in the original NEM approach we have to consider the degree of identifiability of the network. Two networks belong to the same equivalence class if they have the same likelihood given the data. In the case of the original NEMs two networks are equivalent, if they have the same transitive closure. Due to our extension of the method, additional equivalences between network hypotheses occur in the case of epiNEMs.
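For the original NEM case, the equivalence criterion can be checked mechanically by comparing transitive closures. The sketch below uses Warshall's algorithm on 0/1 adjacency matrices. It covers only the plain NEM criterion stated above, not the additional equivalences introduced by logic gates, and the function names are hypothetical.

```python
def transitive_closure(adj):
    """Warshall's algorithm on a square 0/1 adjacency matrix (list of lists)."""
    n = len(adj)
    closure = [row[:] for row in adj]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if closure[i][k] and closure[k][j]:
                    closure[i][j] = 1
    return closure


def nem_equivalent(adj1, adj2):
    """Two plain NEM graphs are treated as equivalent if their closures match."""
    return transitive_closure(adj1) == transitive_closure(adj2)


chain = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]        # A -> B -> C
chain_plus = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]   # A -> B -> C plus A -> C
fork = [[0, 1, 1], [0, 0, 0], [0, 0, 0]]         # A -> B and A -> C only

same_class = nem_equivalent(chain, chain_plus)
different_class = nem_equivalent(chain, fork)
```

The chain and the chain with the extra shortcut edge share a transitive closure and so cannot be distinguished by a plain NEM, whereas the fork lacks the B to C relation and falls into a different class.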

Let Φ be a network with two parents regulating their child by one of epiNEM's five logics or one of the three other types of relations (

Another major challenge for pathway inference methods is the presence of hidden players [

We validated and benchmarked epiNEMs in the controlled setting of a simulation study. In two case studies in

In a simulation study, we compare epiNEM results to networks reconstructed by NEM without logics as well as B-NEM, ARACNE [

We generated data sets of 4-node networks with 100 effects, $E_i$, being randomly attached to the 4 signaling genes, $S_i$. In each network, two of the four signaling genes were randomly connected by one of the five possible logic gates. These networks were translated into adjacency matrices with knock-outs in the rows and observed signal disruptions in the columns. For every $E_i$ we check the behavior upon perturbation from the adjacency matrix. We kept the false positive rate
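A data-generating step of this kind can be sketched as follows. The snippet assumes that each effect copies the expected perturbation pattern of the signaling gene it is attached to, and that noise is then added independently per entry with a false positive rate `alpha` and a false negative rate `beta`. The toy expected-effect matrix, the rate values and the function name are illustrative and are not the settings used in the study.

```python
import random


def simulate(expected, n_effects, alpha, beta, seed=0):
    """Attach effects uniformly at random and add binary noise.

    expected[j][k] is the expected state (0/1) of signaling gene j in
    experiment k. Returns the attachment list and the noisy data matrix.
    """
    rng = random.Random(seed)
    n_genes = len(expected)
    theta = [rng.randrange(n_genes) for _ in range(n_effects)]
    data = []
    for j in theta:
        row = []
        for e in expected[j]:
            if e == 1:
                row.append(0 if rng.random() < beta else 1)   # false negative
            else:
                row.append(1 if rng.random() < alpha else 0)  # false positive
        data.append(row)
    return theta, data


# Toy 4-gene network: C is disrupted only when A and B are both knocked out,
# and D inherits the state of C. Columns: knock-out of A, of B, and of both.
expected = [
    [1, 0, 1],  # A
    [0, 1, 1],  # B
    [0, 0, 1],  # C
    [0, 0, 1],  # D
]
theta, data = simulate(expected, n_effects=100, alpha=0.05, beta=0.1, seed=1)
```

Fixing the seed makes a simulated data set reproducible, and setting both rates to zero returns the noise-free expected patterns, which is a convenient sanity check.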

In total, we generated data from 100 random networks, for each false negative rate. We compared the five competing methods by running time and accuracy of the predicted edges. In the case of the PC algorithm and ARACNE, we did not consider the edge direction, because they only infer partially directed and undirected networks, respectively. For B-NEM and epiNEM, we additionally scored the accuracy of the inferred logical gates and their expected data generated by the inferred network, which is similar to the truth table of a Boolean network.
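One way to score predicted edges with and without direction is sketched below. It assumes that accuracy is the fraction of node pairs whose edge status matches the ground truth, which is a simplifying choice made for this illustration and may differ from the exact measure used in the benchmark. The function name is hypothetical.

```python
def edge_accuracy(truth, inferred, directed=True):
    """Fraction of node pairs whose edge status agrees with the ground truth.

    With directed=False, an edge in either direction counts as present, as
    would be appropriate for methods that return undirected networks.
    """
    n = len(truth)
    if directed:
        pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
        same = sum(truth[i][j] == inferred[i][j] for i, j in pairs)
    else:
        pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
        same = sum(
            (truth[i][j] or truth[j][i]) == (inferred[i][j] or inferred[j][i])
            for i, j in pairs
        )
    return same / len(pairs)


truth = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]     # A -> B, B -> C
inferred = [[0, 0, 0], [1, 0, 1], [0, 0, 0]]  # B -> A, B -> C

acc_directed = edge_accuracy(truth, inferred, directed=True)
acc_undirected = edge_accuracy(truth, inferred, directed=False)
```

In the toy example the reversed edge between A and B is penalised in the directed score but not in the undirected one, which illustrates why ignoring direction is the fairer comparison for methods that do not infer it.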

ARACNE, the PC algorithm, and NEMs are by far the fastest methods. However, they do not infer any logical gates, and the first two report no or only partial edge directions, respectively. B-NEM is almost an order of magnitude slower than epiNEM. Additionally, epiNEM achieves the highest accuracy for the inferred edges, closely followed by B-NEM and, at some distance, by the other methods. Due to B-NEM’s larger search space, it cannot identify the correct epistatic signaling, even though its accuracy for the expected data is high. EpiNEM, on the other hand, while achieving only slightly higher accuracy for the expected data, has a median accuracy of 100% for the logic gates at false negative rates of up to 20% (see

(A) Time in seconds. (B) Accuracy of inferred edges. (C) Accuracy of logic gates. (D) Accuracy of expected data, which is similar to the truth table. epiNEM is faster than B-NEM and slower than the other methods, while correctly identifying the logic gate in the median over all networks for false negative rates of up to 20%.

We applied epiNEMs to the studies of yeast knock-out screens of van Wageningen

Van Wageningen

Taking up the idea of such a modulator, we used epiNEMs to screen genetic interactions against every single mutant showing an effect compared to wild type. Each screen consisted of scoring many 3-node networks combining the two genes in the genetic interaction with a third gene from a set of potential modulators. To test the full range of possibilities, our search space contains models with and without logics. In order to be able to compare the marginal likelihood of the different models, a common set of

We used binarized data and thus only address complete redundancy and mixed epistasis. We do not restrict the search space to enforce epistasis, but allow every network hypothesis. However, for almost all significant modulators epiNEM infers logic gates. Only for some epiNEMs, we find “no epistasis”, defined as either some other type of regulatory network without logic gates or an unconnected network even though the three signaling genes share effect reporters. This is in contrast to “no information”, when the modulator does not share effect reporters.

There are three pairs of double knock-outs, which were previously identified to exhibit complete redundancy [

(A) The identified modulators for

Another two pairs,

For the remaining 9 double mutants, mixed epistasis was found by van Wageningen

Sameith

Sameith

For the

In both cases the AND logic (blue) is the most frequent. The absence of OR gates can be explained by the selection of regulators. Only a few modulators are identified as related to the regulators, but not via any logic (purple). False negatives in the data and equivalences between models can be responsible for the absence of XOR gates and the large number of masking logics.

Only a small fraction of modulators are not identified as modulating any epistatic effect (

To further validate that the modulators we inferred are biologically meaningful, we made use of the STRING [

The distributions for the string-db interaction scores for the top 30 modulators with their respective regulators (red) and the distributions for the interaction scores of all possible modulators and regulators in the data (blue) for the Van Wageningen

We did the same analysis using a graph based GO similarity score (Wang et al., 2007 [

Additionally, we performed KEGG pathway enrichment analysis for the set of significant modulators of each double knock-out pair as well as the effect reporters connected to each significant modulator. The modulators are highly enriched in common pathways like meiosis, cell cycle and MAPK signaling for both data sets (Fig. N and O in

In this paper, we have developed a method to address a central question of molecular cell biology: how to characterise the mechanisms underlying the functional redundancies visible in genetic interactions. We hypothesized that mixed epistatic effects found in high-dimensional readouts can be explained by the action of a third gene that mediates between the genetic interaction and the transcriptional response. To explore this hypothesis we extended Nested Effects Models, an established methodology to infer signaling pathways, with logical functions. The resulting method, called epiNEMs, is a general approach to infer pathways including combinatorial regulation from perturbation effects. In particular, it allowed us to screen for modulators of genetic interactions in

Our approach has several limitations. First, extending NEMs with logics increases the size of the model space and makes exhaustive enumeration infeasible for larger networks. Second, we only consider logics between pairs of regulators, which helps to limit the model space and is well suited to our application to genetic interactions, but might be an oversimplification in other applications. In the future, the model could therefore be improved by allowing logic gates for more than two parents. This will result in more complex logics but will also allow more interactions to be captured. Also, it is currently only possible to distinguish between complete redundancy and mixed epistasis, while quantitative redundancy cannot be captured. To improve this situation, we plan to extend the model to use quantitative effects rather than binary data.

In summary, we presented a general framework to understand mediators of complex phenotypes of genetic interactions. Our case studies on transcriptional phenotypes in yeast showed very promising results and there are potentially many other applications in other organisms using either combinatorial RNAi [

All our analyses were done in the statistical computing environment R [

We used 160 microarray gene expression profiles of single and double mutants from [

All analysis steps including data preprocessing are documented in the vignette of the R-package ‘


We thank Frank Holstege and Patrick Kemmeren for useful discussions.