Analyzed the data: TP NY. Wrote the paper: TP NY ER RS.

The authors have declared that no competing interests exist.

Perturbation experiments, in which a gene is knocked out and the expression levels of other genes are observed, constitute a fundamental step in uncovering the intricate wiring diagrams of the living cell and in elucidating the causal roles of genes in signaling and regulation. Here we present a novel framework for analyzing large cohorts of gene knockout experiments and their genome-wide effects on expression levels. We devise clustering-like algorithms that identify groups of genes that behave similarly with respect to the knockout data, and use them to predict knockout effects and to annotate physical interactions between proteins as inhibiting or activating. Unlike previous approaches, our prediction approach does not depend on physical network information; the latter is used only for the annotation task. Consequently, it is both more efficient and more widely applicable than previous methods. We evaluate our approach using a large-scale collection of gene knockout experiments in yeast, comparing it to the state-of-the-art SPINE algorithm. In cross-validation tests, our algorithm exhibits superior prediction accuracy while increasing the coverage by over 25-fold. Significant coverage gains are also obtained in the annotation of the physical network.

Observing a complex biological system in steady state is often insufficient for a thorough understanding of its workings. For such inference, perturbation experiments are necessary and are traditionally employed. In this work we focus on perturbations in which a gene is knocked out and, as a result, multiple genes change their expression levels. We aim to use a given set of perturbation experiments to predict the results of new experiments. Using a large cohort of gene knockout experiments in yeast, we show that the emerging map of causal relations has a very simple structure that can be exploited for the prediction task. The resulting prediction scheme, and its extension to more complex functional maps, greatly improve on extant approaches, increasing the coverage of known relations by 25-fold while maintaining the same level of prediction accuracy. Unique to our approach is its independence of physical network data, which leads to its high efficiency and coverage as well as to its wide applicability to organisms whose interactions have not yet been mapped. We further extend our method to annotate the interactions of a physical network as activating or suppressing, obtaining significant coverage gains compared to current approaches.

High-throughput technologies are routinely used to map molecular interactions within the cell. These include chromatin immuno-precipitation experiments for measuring protein-DNA interactions (PDIs)

Physical interactions, however, may not be sufficient to deduce the causal roles played by genes in regulation and signaling. For such deduction, perturbation studies are necessary and are traditionally employed

The problem of explaining knockout experiments using a physical network was first introduced by

Another line of work, related to the analysis of single knockout experiments, is the analysis of genetic interactions. Qi

Here we present a novel approach for analyzing a functional network to infer knockout effects. In contrast to previous work, our method does not depend on knowledge of a physical network, but in fact decouples the task of predicting knockout effects from the task of annotating the edges of the physical network. The method is based on partitioning the genes into functional groups whose members are indistinguishable with respect to the rest of the (functional) network.

We start by considering a partition of the genes into two “chromatic” groups with links of up-regulation between the groups and links of down-regulation within each group. To motivate this model, we show that if the latent physical network that underlies the functional data has no cycles with an aggregate negative sign (i.e., the product of the signs along the cycle's edges is negative), then such a partition is indeed possible. We devise several tests for the two-group assumption and find that it is sufficient to explain a large fraction of the analyzed data. Nevertheless, we find that negative feedback mechanisms within signaling pathways lead to deviations of the experimental data from this model. To tackle such deviations, we extend our algorithm to more than two groups, based on ideas from the work of
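The two-group model can be illustrated with a small consistency check. The sketch below is not the authors' procedure: it assumes that every functional relation is given as a hypothetical triple `(gene_a, gene_b, effect)` with `effect` equal to `"up"` or `"down"`, and it uses a breadth-first traversal to test whether the genes can be split into two groups such that up-regulation links run between the groups and down-regulation links stay within a group.

```python
from collections import deque


def two_group_partition(relations):
    """Try to split genes into two groups (0 and 1).

    relations: iterable of (gene_a, gene_b, effect), effect in {"up", "down"}.
    "up" forces the two genes into different groups, "down" into the same one.
    Returns a dict gene -> group, or None if no consistent partition exists.
    """
    adjacency = {}
    for a, b, effect in relations:
        differ = 1 if effect == "up" else 0
        adjacency.setdefault(a, []).append((b, differ))
        adjacency.setdefault(b, []).append((a, differ))

    group = {}
    for start in adjacency:
        if start in group:
            continue
        group[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, differ in adjacency[u]:
                expected = group[u] ^ differ
                if v not in group:
                    group[v] = expected
                    queue.append(v)
                elif group[v] != expected:
                    return None  # contradiction: the relations are not two-group consistent
    return group


# Illustrative (made-up) relations.
consistent = [("A", "B", "up"), ("B", "C", "down"), ("A", "C", "up")]
conflicting = [("A", "B", "up"), ("B", "C", "up"), ("A", "C", "up")]

print(two_group_partition(consistent))   # a valid partition
print(two_group_partition(conflicting))  # None
```

In the conflicting example the three up-regulation links form an odd cycle of "different group" constraints, which is exactly the kind of structure that prevents a two-group partition.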

We validate our methods using a collection of over two hundred knockout experiments in yeast

Finally, we tackle the task of annotating the physical edges with signs of activation or suppression. We provide an efficient algorithm for annotating a given physical network so as to explain a maximal number of functional relations. We validate the algorithm by using manual annotation of the filamentous growth pathway

We follow the seminal work of Yeang et al.

Given a set of knockout experiments, we start by representing them as a

We say that a functional network is

To motivate this assumption, it is important to consider its implications for the physical network that underlies the observed knockout effects. We say that a physical network is

If a network is sign-linear then one can efficiently compute a Boolean assignment that explains the input functional relations, and the task of predicting a knockout effect translates to computing the product of the signs of the participating nodes. In the general case, such a perfect Boolean assignment might not exist. Instead, we aim to find an assignment that will satisfy as many of the observed functional relations as possible (see
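As a rough illustration of prediction by sign products, the sketch below encodes each observed relation as a hypothetical triple `(knocked_out, target, sign)` with `sign` in {+1, -1}. Following the two-group description, +1 is assumed here to stand for down-regulation (same group) and -1 for up-regulation (different groups). The greedy flip heuristic is only a stand-in for illustration; it is not claimed to be the optimization procedure used in the paper.

```python
def predict_effect(sign, knocked_out, target):
    """Predict a knockout effect as the product of the two node signs.

    Returns None when either gene has no assigned sign.
    """
    if knocked_out not in sign or target not in sign:
        return None
    return sign[knocked_out] * sign[target]


def count_satisfied(sign, relations):
    """Number of observed relations reproduced by the assignment."""
    return sum(1 for a, b, s in relations if predict_effect(sign, a, b) == s)


def greedy_assignment(relations, max_rounds=100):
    """Illustrative heuristic: flip one gene at a time while it strictly helps."""
    genes = sorted({g for a, b, _ in relations for g in (a, b)})
    sign = {g: 1 for g in genes}
    for _ in range(max_rounds):
        improved = False
        for g in genes:
            before = count_satisfied(sign, relations)
            sign[g] = -sign[g]
            if count_satisfied(sign, relations) > before:
                improved = True
            else:
                sign[g] = -sign[g]  # undo the flip
        if not improved:
            break
    return sign


# Made-up relations; the last one cannot be satisfied together with the others.
relations = [
    ("A", "B", -1),
    ("B", "C", 1),
    ("A", "C", -1),
    ("C", "D", 1),
    ("A", "D", 1),
]
sign = greedy_assignment(relations)
print(count_satisfied(sign, relations), "of", len(relations), "relations satisfied")
```

Because no perfect assignment exists for this toy input, the heuristic settles on an assignment that violates one relation, which mirrors the "satisfy as many relations as possible" objective described above.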

(A) A physical network model with nodes representing proteins and edges representing protein-DNA interactions. The sign of an interaction is denoted by its arrow type: regular (activating) or cut (suppressing). Note that the network is not sign-consistent since for example,

We tested the validity of the sign-linearity assumption using the yeast knockout data. Applying a single iteration of the sign-linear algorithm to the entire data set, we obtained a Boolean assignment that satisfies over 83% of the knockout pairs (

We use the yeast mating network, studied in

Two variants of SPINE

Method | Global Accuracy | Global Coverage | Mating Accuracy | Mating Coverage |
Sign-linear | 80.2% | 76.4% | 93.3% | 92.2% |
Sign-clustering | 88.3% | 73.8% | 96% | 94% |
SPINE node variant | 72.5% | 2.6% | 89.3% | 89.3% |
SPINE edge variant | NA | NA | 99% | 98% |
Yeang | NA | NA | 97.1% | 97.1% |

We further tested our method using varying sizes of the training set (leaving out 10%, 20% and 50% of the knockout pairs). The accuracy level remained stable at 90% even when leaving out 50% of the pairs. The coverage level was at 90% when leaving out 10% or 20% of the pairs, but dropped to 38% when leaving out 50% of the pairs.
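The two evaluation measures can be made concrete with a short helper. The definitions used here are an assumption about how the terms are meant: coverage is taken as the fraction of held-out knockout pairs for which any prediction is produced, and accuracy as the fraction of those predictions that match the held-out sign. All names and data are illustrative.

```python
def accuracy_and_coverage(predictions, held_out):
    """Evaluate predictions on held-out knockout pairs.

    predictions: dict (knocked_out, target) -> predicted sign, or None / missing
                 when no prediction could be made.
    held_out:    dict (knocked_out, target) -> observed sign.
    Returns (accuracy, coverage) under the assumed definitions above.
    """
    covered = [
        (pair, truth)
        for pair, truth in held_out.items()
        if predictions.get(pair) is not None
    ]
    coverage = len(covered) / len(held_out) if held_out else 0.0
    correct = sum(1 for pair, truth in covered if predictions[pair] == truth)
    accuracy = correct / len(covered) if covered else 0.0
    return accuracy, coverage


# Made-up example: four held-out pairs, three predictions, two of them correct.
held_out = {("A", "B"): -1, ("A", "C"): 1, ("B", "C"): 1, ("B", "D"): -1}
predictions = {("A", "B"): -1, ("A", "C"): 1, ("B", "C"): -1, ("B", "D"): None}
accuracy, coverage = accuracy_and_coverage(predictions, held_out)
print(f"accuracy={accuracy:.2f}, coverage={coverage:.2f}")
```

Separating the two numbers matters because a method can trade one for the other: declining to predict uncertain pairs raises accuracy while lowering coverage.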

The simplicity of the model and its independence of physical data allow the sign-linear algorithm to be applied to large data sets on which the methods of

We compare the results of the sign-linear algorithm to results from

Thus far, we predicted a functional edge to be (for instance)

Results for the sign-linear and sign-clustering algorithms are displayed for different decision cutoffs. The results were obtained using cross validation, each time leaving out 200 knockout pairs. Results for SPINE are presented for its node variant as provided by

Finally, we tested the robustness of the sign-linear algorithm to noise in the input data. Following

While the sign-linear algorithm gave promising results, its underlying assumption is quite restrictive and about 20% of the data do not follow it. To characterize the deviations from the linearity assumption in a finer manner, we devised several local linearity tests for the following properties: (i) Local linearity 1 (LL-1) occurs when the effects of two knocked-out genes on a common target are consistent with their effect on each other (
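Only LL-1 is spelled out in the text here, so the sketch below covers that property alone, and under one possible reading of it: for knocked-out genes a and b with a common target t, the product of their effects on t should equal the effect of a on b. The sign encoding (+1 or -1 per knockout pair) and all names are hypothetical.

```python
def ll1_counts(effect):
    """Count triples that satisfy the assumed LL-1 condition.

    effect: dict (knocked_out, target) -> +1 or -1.
    A triple (a, b, t) is tested when a->b, a->t and b->t are all observed.
    Returns (consistent, total).
    """
    targets = {}
    for a, t in effect:
        targets.setdefault(a, set()).add(t)

    consistent = total = 0
    for (a, b), sign_ab in effect.items():
        if b not in targets:
            continue  # b was never knocked out, so it has no observed targets
        for t in targets[a] & targets[b]:
            if t in (a, b):
                continue
            total += 1
            if effect[(a, t)] * effect[(b, t)] == sign_ab:
                consistent += 1
    return consistent, total


# Made-up data: the triple (A, B, T) is consistent, (A, B, U) is not.
effect = {
    ("A", "B"): -1,
    ("A", "T"): -1,
    ("B", "T"): 1,
    ("A", "U"): 1,
    ("B", "U"): 1,
}
print(ll1_counts(effect))
```

Under a perfect sign-linear assignment every testable triple would pass, so the fraction of passing triples gives a local view of how far the data deviate from the model.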

Edges represent functional relations with down-regulation relations depicted as regular arrows and up-regulations as cut-arrows. (A) LL-1: if knocking out genes

We evaluated the prevalence of these three properties in the yeast knockout data set and compared the results to those obtained on randomized data sets (

One particular example is the biosynthesis of steroids pathway (

A natural extension of the sign-linear model is to partition the genes into multiple (greater than two) groups, and use this as a baseline for predicting knockout effects. Taking an approach similar to

The

The sign clustering algorithm was applicable to over 83% (20,445) of the knockout pairs. The sizes of the resulting clusters varied from 1 to 35 with an average size of 4.5 (

The partition into functional groups introduced above can also facilitate the annotation of edges in a physical network with signs of activation or suppression. Given a physical network, hypothesized to provide the underlying “wiring” for the knockout effects, the problem of assigning signs (“+” for activation and “−” for suppression) on its edges so as to explain a maximum number of knockout pairs is computationally hard (

We constructed a network of physical interactions in yeast, containing 5,850 nodes and 45,512 interactions (39,946 PPIs and 5,566 PDIs), using information from public databases

The filamentous growth pathway in yeast is displayed in frame A; the high osmolarity glycerol (HOG) pathway is displayed in frame B. Literature-curated interaction signs are denoted by the arrow type: regular (activating), cut (suppressing), or none (unassigned). Node colors correspond to a specific partition of the respective genes into two groups made by the sign-annotation algorithm. Gray nodes represent proteins that could not be assigned to a group due to a lack of data. Physical edges connecting proteins of different groups are predicted as suppressing, and edges connecting proteins of the same group are predicted as activating. SPINE, in contrast, assigns signs to proteins, meaning that all the outgoing edges of a protein are assigned the same sign. Proteins that were predicted by SPINE to be activators are displayed as hexagons. Proteins that were predicted by SPINE to be suppressors are displayed as squares.
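The labeling rule stated in the caption (same group implies activating, different groups implies suppressing, unassigned proteins give no prediction) can be sketched as follows. This shows only the final labeling step for an already computed two-group partition; it does not reproduce the optimization that chooses the partition, and the protein names are placeholders.

```python
def annotate_edges(edges, group):
    """Label physical edges from a two-group partition.

    edges: iterable of (protein_u, protein_v) pairs.
    group: dict protein -> 0 or 1; proteins lacking data are simply absent.
    Returns dict edge -> "+" (activating), "-" (suppressing) or None (unassigned).
    """
    annotated = {}
    for u, v in edges:
        if u not in group or v not in group:
            annotated[(u, v)] = None  # corresponds to a gray node in the figure
        elif group[u] == group[v]:
            annotated[(u, v)] = "+"
        else:
            annotated[(u, v)] = "-"
    return annotated


# Placeholder proteins P1..P4; P4 has no group assignment.
group = {"P1": 0, "P2": 0, "P3": 1}
edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4")]
for edge, label in annotate_edges(edges, group).items():
    print(edge, label)
```

Note that this rule assigns a sign per edge, so two outgoing edges of the same protein may receive different signs, in contrast to the per-protein signs attributed to SPINE in the caption.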

One interesting finding of our algorithm concerns the annotation of the interactions between the suppressor of sensor kinase 2 (Ssk2) and Actin 1 (Act1) in the HOG pathway. While the manual annotation of this edge

We devised two clustering methodologies for predicting knockout effects based solely on a given network of functional interactions. The first algorithm employs a restrictive assumption on the structure of the functional network; nevertheless, its underlying model is sufficient for describing the majority of the knockout effects in the large-scale yeast data set that we analyzed. In cross-validation tests it was shown to provide a very efficient means of predicting held-out knockout effects, dramatically improving upon the state-of-the-art benchmark. The second, refined algorithm extends the two-group logic that is at the heart of the first algorithm, aiming to partition the genes into several groups that behave similarly with respect to the knockout data. We show that this refined model captures functional relations within signaling pathways that could not be explained by the first model, leading to superior accuracy.

Notably, since the input data contain only single-gene perturbations, neither algorithm can decipher combinatorial regulation functions involving multiple inputs (as in

Being “network-free” (

In a recent paper, Ma'ayan

We define a

Let

Let

The following two lemmas motivate our sign-linear algorithm; their proofs appear in

The sign-linear algorithm is based on finding a Boolean assignment

To obtain general partitions into more than two groups we use a hierarchical clustering procedure. For a given pair

We use a standard complete-linkage hierarchical clustering procedure. We define the groups by finding inner nodes in the hierarchy whose score is lower than the a-priori probability for functional similarity (
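A minimal sketch of complete-linkage agglomeration with a score cutoff is given below. The pairwise score used in the paper is not reproduced in the text here, so the example substitutes a hypothetical dissimilarity (the fraction of knockout experiments in which two genes respond differently), and the cutoff value is arbitrary.

```python
def complete_linkage_clusters(items, dissimilarity, cutoff):
    """Naive complete-linkage clustering, stopping at a score cutoff.

    Repeatedly merges the two clusters whose complete-linkage score (the
    largest pairwise dissimilarity between their members) is smallest, and
    stops once that score is no longer below `cutoff`.
    """
    clusters = [[x] for x in items]

    def linkage(c1, c2):
        return max(dissimilarity(x, y) for x in c1 for y in c2)

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = linkage(clusters[i], clusters[j])
                if best is None or score < best[0]:
                    best = (score, i, j)
        score, i, j = best
        if score >= cutoff:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Made-up response profiles over four knockout experiments.
profiles = {
    "g1": (1, 1, -1, 1),
    "g2": (1, 1, -1, 1),
    "g3": (-1, -1, 1, 1),
    "g4": (-1, -1, 1, -1),
}


def profile_dissimilarity(x, y):
    """Hypothetical score: fraction of experiments with differing responses."""
    px, py = profiles[x], profiles[y]
    return sum(a != b for a, b in zip(px, py)) / len(px)


clusters = complete_linkage_clusters(sorted(profiles), profile_dissimilarity, cutoff=0.5)
print(clusters)
```

Because complete-linkage merge scores never decrease as the hierarchy grows, stopping at the first merge that reaches the cutoff yields the same groups as building the full hierarchy and cutting it at that score.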

Supporting Information

(0.20 MB PDF)

Distribution of the sizes of clusters constructed by the sign-clustering algorithm.

(0.05 MB JPG)

The number of predictable knockout pairs as a function of the decision cutoff

(0.06 MB JPG)