Fault Tolerance in Protein Interaction Networks: Stable Bipartite Subgraphs and Redundant Pathways

As increasing amounts of high-throughput data for the yeast interactome become available, more system-wide properties are uncovered. One interesting question concerns the fault tolerance of protein interaction networks: whether there exist alternative pathways that can perform some required function if a gene essential to the main mechanism is defective, absent or suppressed. A signature pattern for redundant pathways is the BPM (between-pathway model) motif, introduced by Kelley and Ideker. Past methods proposed to search the yeast interactome for BPM motifs have had several important limitations. First, they have been driven heuristically by local greedy searches, which can lead to the inclusion of extra genes that may not belong in the motif; second, they have been validated solely by functional coherence of the putative pathways using GO enrichment, making it difficult to evaluate putative BPMs in the absence of already known biological annotation. We introduce stable bipartite subgraphs, and show they form a clean and efficient way of generating meaningful BPMs which naturally discard extra genes included by local greedy methods. We show by GO enrichment measures that our BPM set outperforms previous work, covering more known complexes and functional pathways. Perhaps most importantly, since our BPMs are initially generated by examining the genetic-interaction network only, the location of edges in the protein-protein physical interaction network can then be used to statistically validate each candidate BPM, even with sparse GO annotation (or none at all). We uncover some interesting biological examples of previously unknown putative redundant pathways in such areas as vesicle-mediated transport and DNA repair.


Introduction
It is estimated that only 18% of the yeast genome consists of essential genes, meaning that if the gene is deleted, the resulting strain is not viable on rich media [1]. Sometimes, the reason a given gene is not found to be essential is that the gene is not required for growth in rich media under laboratory conditions [2,3]; for example, a gene producing an enzyme that metabolizes one particular nutrient is dispensable when other nutrients are available [4]. In other cases, however, genes are not essential because there exist other genes that can compensate for the missing gene. Three main mechanisms of compensation have been observed [2,3,5]. First, there can exist one or more paralogs of a nonessential gene which can substitute directly for it. The second mechanism involves the existence of redundant metabolic pathways or regulatory networks; this is called ''robustness'' by Wagner [6]. A third mechanism involving a more global and diffuse relation among multiple genes across many pathways has also been reported to occur [7]. There are only preliminary data on the relative importance of the three mechanisms; one study estimates that at least 25% of the gene deletions in yeast that have no phenotype involve the first mechanism of duplicate genes [8].
Recent years have seen a huge increase in the amount of genetic interaction data available from yeast double-mutants, where interactions between pairs of nonessential genes are characterized by the phenotypic effect of their simultaneous suppression or deletion. One of the simplest of such effects is a ''synthetic-lethality'' interaction: both genes are nonessential, but their simultaneous deletion destroys the viability of the yeast. A synthetic-lethality genetic interaction (GI) network is defined by representing genes/proteins as nodes, with an edge between two nodes if a synthetic-lethality interaction has been observed between the corresponding genes. An increasingly comprehensive protein-protein physical interaction (PI) network (defined as was the GI network, with nodes as genes/proteins and edges as pairwise interactions) is available for yeast, where physical interactions include direct binding between two genes' protein products, regulatory protein-DNA binding mechanisms, and the existence of enzymatic reactions between pairs of proteins linked by a common metabolite (excluding common metabolic cofactors like water and ATP [7]).
In a seminal paper, Kelley and Ideker [7] showed how the superimposition of the PI and GI networks could be used to search the yeast interactome for a simple network sub-architecture that they called a between-pathway model (BPM). The search for BPMs within the yeast interactome was studied further by Ulitsky and Shamir [9], and by Ma, Tarone, and Li [10] using different models. The BPM model treats GI and PI edges as fundamentally different. This is in contrast to the model used by Nabieva et al. [11] in their groundbreaking work using maximum flow methods to predict gene function: their method depends on GI edges being treated as simply one type of high-confidence PI edge. In the search for fault tolerance, however, it is crucial that these two types of edges be treated separately: the fundamental insight of this paper comes from recognizing that we can view the PI edges as ordinary edges, and the GI edges as 2-vertex cuts of the functional network. Thus algorithmic work related to the theory of maximum (and in our case maximal) cuts becomes highly relevant.
Specifically, a BPM is a graph-theoretic indicator that genetic fault tolerance may be present. Consider a model consisting of a pair of protein pathways where each pathway serves as a redundant backup for the other. Within each pathway, there will be many physical interactions between nodes (protein-protein binding, direct transcriptional regulation, etc.), reflecting each pathway's existence as a coherent functional unit. Synthetic-lethality interactions, on the other hand, will be few or nonexistent within each pathway, since the other pathway provides a failsafe mechanism for its partner. Between the two pathways, there will be more observed synthetic-lethality interactions: if corresponding components are deleted or suppressed in both pathways at once, the fault tolerance of the system is defeated and the strain dies. A network motif corresponding to this situation, in which two groups of genes (each group found to be edge-dense within the PI network) are connected by many synthetic-lethality edges in the GI network, defines the BPM (see Figure 1).
Kelley and Ideker, and later Ulitsky and Shamir, identified possible examples of the BPM architecture for yeast. They found that many of their candidate BPMs were enriched over certain Gene Ontology (GO) categories of protein function [12], and correlated well with some biologically-known examples of pathway buffering. However, in both cases, a heuristic approach was used to extend small connected components of the PI network, searching for patterns in a combined network which superimposed both physical and genetic interactions.
In contrast, our new method initially takes into account only the graph-theoretic structure of the synthetic-lethality GI network to search for candidate BPMs; this allows the location and density of physical interactions in the PI network to be used afterward to validate the results. (We note that a recent paper, Ma et al. [10], also takes an approach based on GI edges only; their strategy, however, is to produce a very large set of possible candidate BPMs, many of which will be meaningless, and then to filter this set using GO annotation to discover a much smaller subset of meaningful pathways. We discuss this more below.)
The subgraphs of the synthetic-lethality network which our method returns as putative examples of the BPM architecture we call stable bipartite subgraphs. They are defined as follows. Given any bipartition of all nodes of a network into sets A and B, we call such a partition maximal if the act of moving a single node from A to B or from B to A does not increase the number of edges crossing between A and B. These partitions are locally maximal; there can be many different maximal bipartitions of the same network. There exists an efficient, randomized, greedy algorithm [13] for sampling maximal bipartitions in any network, described below. (In contrast, finding the bipartition with the globally maximum number of edges crossing between A and B is NP-hard [14,15].) Given the results of M repeated runs of the randomized maximal bipartition detection algorithm on a network, we define the stable bipartite subgraph of any node v to be the bipartite subnetwork (with bipartition (v_A, v_B)) consisting of all nodes in the network (set v_A) that appear in the same partition as v in at least 70% of sampled maximal bipartitions, and all nodes (set v_B) that appear in the opposite partition from v at least 70% of the time. The putative BPMs returned by our method consist of the stable bipartite subgraphs generated for each node v. Note that we obtain fewer stable bipartite subgraphs than genes, because some genes generate the same stable bipartite subgraph.
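The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used to produce our results: the function name `stable_bipartite_subgraph`, the representation of each sampled bipartition as a dictionary mapping every node to side 0 or 1, and the integer percentage parameter are all assumptions made for the example.

```python
from typing import Dict, Hashable, List, Set, Tuple


def stable_bipartite_subgraph(
    v: Hashable,
    samples: List[Dict[Hashable, int]],
    threshold_pct: int = 70,
) -> Tuple[Set[Hashable], Set[Hashable]]:
    """Return (v_A, v_B) for node v from sampled maximal bipartitions.

    Each sample is assumed to map every node to side 0 or 1. Side labels
    are arbitrary, so only agreement or disagreement with v is counted.
    """
    if not samples:
        raise ValueError("at least one sampled bipartition is required")
    m = len(samples)
    same = {u: 0 for u in samples[0]}
    for side in samples:
        for u in same:
            if side[u] == side[v]:
                same[u] += 1
    # Integer arithmetic avoids floating-point trouble at the 70% boundary.
    v_a = {u for u, c in same.items() if 100 * c >= threshold_pct * m}
    v_b = {u for u, c in same.items() if 100 * (m - c) >= threshold_pct * m}
    return v_a, v_b
```

A node that sides with v in roughly half of the samples falls into neither set, which is how genes that are unrelated to v are left out of its putative BPM.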
We show that BPMs obtained from stable bipartite subgraphs show significant functional enrichment over GO categories (using FuncAssociate [16], with an FDR multiple testing correction). Using a network containing the same set of GI and PI edges as that explored by Kelley and Ideker (i.e. the edges known in 2005), we find 602 BPMs covering 1,526 SL edges with 53.4% of the 602 × 2 = 1,204 putative functional pathways exhibiting GO enrichment (for some functional category of depth ≥ 3); Kelley and Ideker reported 360 BPMs covering 687 SL edges with 34.9% of their 360 × 2 = 720 putative pathways exhibiting GO enrichment. Using a more recent network of GI and PI edges from the BioGRID [17] (as of October 2007), we find 50.6% of 3020 pathways exhibiting GO enrichment, as compared to Ulitsky and Shamir, who find less than 36% of their smaller number of pathways enriched, on a similar but slightly older network (see Table 1). Furthermore, coverage of known complexes by our BPMs is substantially increased over those of Ulitsky and Shamir (79.8% of the complexes annotated in SGD GO-slim [18] for our BPMs, versus 46.3% for theirs).
Since the PI edges are not considered by our BPM construction method, we can go on to measure the propensity of physical protein-protein interactions to occur within rather than between putative pathways. Using this measure, we obtain high-confidence pathways that are not currently represented in known functional annotation; thus we can make new biological functional and fault-tolerance predictions. As further statistical validation, we find that the BPM motifs which we predict from the smaller Kelley and Ideker interaction network are consistently carried forward on the larger BioGRID network. That is, newly-discovered synthetic-lethality relationships and protein-protein interactions (which appear in the BioGRID data after 2005) tend to appear where we would expect from the structure of the BPMs generated from the older network.
All of our candidate BPMs, along with enrichment results and individual constituent gene annotations, are publicly available at http://bcb.cs.tufts.edu/yeast.bpm/. When we refer to a BPM by number in this paper, the number refers to the ID associated with that BPM on this website.

Results
Two datasets describing the yeast interactome were studied: the first contained the interaction data used by Kelley and Ideker (KI) in [7], whose synthetic-lethality (SL) network we denote by G. The second includes the first as well as an updated collection of all additional SL and protein-protein interactions published in the October 1, 2007 release of the BioGRID database [17], which we denote by G′. Both datasets were filtered to exclude essential genes, as well as all genes not found to participate in any synthetic-lethality relationships. Thus filtered, G contained 682 gene/protein-product nodes with 1,858 synthetic-lethality interactions, and G′ contained 1,678 genes with 6,818 SL edges. These data represent only a fraction of the total estimated number of SL interactions in the yeast interactome, because most gene pairs have not yet been tested (we address complications arising from the incompleteness of the known interactome below).
We computed the stable bipartite subgraph of each gene in G and G′; note that for some genes participating in the same BPM, their stable bipartite subgraphs will be identical, so fewer unique BPMs were generated than the number of genes in each network.

Biological validation (GO enrichment results)
The number of different BPM subgraphs we found using this method, the total count of distinct SL edges involved in these BPMs, and the number of pathways found to be enriched for at least one GO category of depth 3 or more are reported in Table 1, for the network G (identical to Kelley and Ideker's network) and for the more up-to-date network G′. Additionally, Ulitsky and Shamir report 46.3% coverage of the complexes annotated in SGD GO-slim [18]; our coverage of the same database was 79.8%. In both cases, we find many more BPMs on the comparable networks than do the previous studies. This is not surprising, because Kelley and Ideker, as well as Ulitsky and Shamir, include physical protein-protein interaction data in their search for BPMs, so one might expect they would find a smaller set of BPMs. It might be expected that a larger proportion of their BPMs would be enriched, since their BPMs are supported by both genetic and physical interaction data, whereas ours are based solely on genetic interaction data. In fact, Ma et al. [10], using another method that employed only SL edges to construct BPMs (as we do), found exactly this: that they generated more BPMs, but a smaller fraction were GO-enriched. Surprisingly, a larger proportion of the BPMs put forth by our method are enriched as compared with previous work: over 53% of our BPM pathways were enriched in G, and over 50% in G′, whereas the previous methods which used both networks to generate BPMs never exceeded 36%. We attribute this improvement to the power of the stable bipartite subgraph algorithm to automatically prune unrelated genes which are more often included by localized greedy heuristics.
We did not place the results of Ma et al. [10] in the comparison table because some of their BPM generation rules bias the pathway samples, and not enough of their pathway data is available for us to generate valid comparison statistics. In particular, using a local greedy approach similar to Kelley and Ideker, but limited to the GI network, Ma et al. report 2,590 generated BPMs, but those BPMs were not made available; instead a subset of 89 BPMs from this set was published that satisfied the following criteria: 1) each pathway contains at least 4 genes, 2) both pathways are enriched for the same GO annotation, and 3) at least 30% of the genes in each pathway have GO annotations that match the annotation that is enriched for both pathways. Presumably, if these enrichment heuristics were relaxed, the number of enriched BPMs would increase from the 89 they report, but since the initial set of 2,590 BPMs was not published, it was not possible to determine by how much. Ma et al. do report that a smaller fraction of their pathways are enriched than those of Kelley and Ideker and Ulitsky and Shamir; in their discussion, they attribute this to their use of GI edges only. Our method used only GI edges and produced a higher percentage of enriched BPMs than the methods that use PI edges as well, so we suggest a different conclusion than Ma et al. concerning the amount of information present in the GI network by itself.
610 of the BPMs we found in G′ had both pathways enriched; of these, all but 71 had at least one functional-enrichment term common to both pathways. A partial list appears in Table 2. We found 71 BPMs in G′ for which both pathways were enriched for at least one GO term, but where no enriched GO terms were found which were common to both partitions. These might represent interdependent but not redundant pathways, or else might represent genuinely redundant pathways which have not yet been sufficiently annotated. A partial list of these appears in Table 3.
In both tables, ''Coverage'' columns indicate how many genes (out of all genes in the background set matching the listed GO term) were found in each pathway. BPM number refers to the IDs given in the list of BPMs on our website (see above). We also found 308 BPMs where only one of two partitions exhibited GO enrichment of sufficient specificity.

Mathematical validation (probabilistic results)
The first mathematical validation involves examining the physical protein-protein interaction (PI) network; if our BPMs represent real redundancy in function, PI edges should be biased to occur within each partition as opposed to between partitions. We measure, for each BPM, how much more biased the observed PI edges (between all pairs of gene/protein nodes in the BPM) are to remain within a single partition than would be expected by chance (see the Methods section for computational details). Of the BPMs we found (which were all generated using only synthetic-lethality interaction edges), the top 10 most strongly validated by the location of known PI edges (and their associated p-values) appear in Table 4.
The second statistical validation we applied to our approach was to check the consistency of the BPMs we generated using the Kelley and Ideker network G in the context of those generated from the more recent BioGRID dataset G′. Synthetic-lethality interactions in the newer BioGRID dataset are (except for a small number of false positives weeded out since 2005) a superset of the older data. If our BPMs are biologically meaningful, then SL interactions reported since the Kelley and Ideker network was constructed should tend to appear between genes in different partitions of the BPMs generated from the older network. We therefore estimated the bias of the distribution of all such newly-reported SL interactions in favor of appearing between rather than within pathways (see the Methods section for computational details). Across the set of 175 BPMs from G which contained at least 20 new SL edges, the average probability that the observed between-pathway bias would occur by chance was 0.017. Since these new edges were not used to construct candidate BPMs in G, their distribution bias provides parallel independent support to the hypothesis that stable bipartite subgraphs do indeed correspond to biologically meaningful motifs.
Dimmer et al. [19] showed that deletion of MDM31 or MDM32 resulted in a phenotype very similar to that of deletion of MDM10/MDM12/MMM1, namely large, rounded mitochondria with profoundly reduced motility. On the other hand, deletion of PHB1 or PHB2 (the tumor suppression protein prohibitin and its homolog) displayed no detectable phenotype, but was found to be synthetically lethal when any of the genes MDM12, MDM10 or MMM1 on the right side of the partition were mutated [20]. The remaining two genes, ATP23 and ATP10, both associated with the mitochondrial envelope, are believed to possess overlapping functions with respect to ATPase biogenesis [21].
Figure 3 shows an example BPM from Table 3 in more detail; three genes on the left (TOP3, SGS1, RMI1) are known to make up the RecQ helicase-Topo III complex (GO:0031422), while on the right, overlapping sets of genes are annotated as being involved in GO:0006974 (response to DNA damage stimulus) and GO:0006310 (DNA recombination). SGS1 is the yeast homolog of BLM, responsible for the cancer-prone Bloom's syndrome in humans [22,23], whose signature is cells with unregulated crossing-over. It is known to prevent aberrant crossing-over during meiosis by suppressing formation of joint molecules comprising three and four interconnected duplexes [24]. Hollingsworth and Brill [25] studied the endonuclease MUS81-MMS4, and showed that this two-protein complex also has a role in generating crossovers. In fact, they postulate that there are two independent mechanisms for resolving recombination intermediates, including Holliday junctions, during meiosis: one involving MUS81-MMS4, and one involving the RecQ helicase-Topo III complex. They note that budding yeast appears to have the extra pathway as a failover, but that some other organisms appear to have evolved to exclusively use only one mechanism or the other. Our BPM appears to support this theory, while segregating additional genes, some already known to be involved in DNA repair, into association with one mechanism or the other. A literature search finds additional support: Wagner et al. [26] show that PIF1 has a direct role in the prevention or repair of SGS1-induced DNA damage that accumulates in top3 mutants. Mullen et al. [27] propose that the MMS4/SLX3, SLX5/8, and SLX1/4 gene pairs encode heterodimeric complexes and speculate that they are required to resolve recombination intermediates arising in response to DNA damage, during meiosis, in the absence of SGS1-TOP3.
Table 3. Top-scoring BPMs from among those which had both pathways enriched for some GO function, but whose GO matches were different across the two partitions.

Ascertainment bias
Although the issue was not addressed in the work of Kelley and Ideker or of Ulitsky and Shamir, Fritz Roth [28] alerted us to a possible ascertainment bias in the available synthetic-lethality data, which needs to be addressed. In particular, many smaller-scale synthetic-lethality experiments result in data with an artificially bipartite structure. That is, they test a set of query genes against a set of genes on an array, and query genes are tested only against array genes and not against each other. A complete graph could therefore artificially appear in the data as bipartite, based on which subset of all possible gene pairs was tested. We note that the strong enrichment results obtained both in this study and in previous work go some way toward implying that we are not just rediscovering bipartite structure in the network left by ascertainment structure; support for the relevance of our BPMs is also deepened by our validation results concerning the observed within-versus-between distribution bias of protein-protein interactions, as well as validation based on the biased distribution of newly-tested synthetic-lethality interactions, appearing where we would predict them to appear as more experimental data is generated. Even so, we wished to quantify the extent to which ascertainment bias could be affecting our results.
We ordered the various experiments that produced synthetic-lethality data in the BioGRID dataset by volume, according to the number of synthetic-lethality interactions each contributed. Thus ordered, the top 25 experiments taken together contributed 72% of all synthetic-lethality interactions in the database. For these experiments, we went through each of the associated papers and uncovered exactly which pairs of genes were tested for synthetic-lethality relationships. In this way, instead of having two labels for SL interactions (''known to be synthetic-lethal'' vs. ''known not to be synthetic-lethal or never tested''), we now had three possible labels (''known to be synthetic-lethal'', ''known not to be synthetic-lethal'' and ''never tested''). Intuitively, a BPM could be an artifact of ascertainment bias if it turned out that all or nearly all pairs of genes tested for synthetic lethality turned out to lie between the two pathways, with few or no tests having been performed between pairs of genes that lie within the same pathway.
As an example, consider the ''worst'' BPM we are able to find in our set, BPM 622. There are four genes in one pathway (call it pathway 1) that were tested by hand across several very small-scale experiments (not in the top 25 by volume): ECM1, PHB1, PHB2, and YPK2. Pathway 1 also contains HSP92, which was a query gene in a very high-throughput experiment. In pathway 2, we find 5 genes that were also tested in the small-scale experiments (MDM10, RPL2A, YPK1, ATP10, and MDM12), but there are an additional 128 genes which were array genes in the same high-throughput experiment in which HSP92 was a query gene. Further examination shows that none of the pairs of these 128 array genes were ever tested against each other; thus, most of the genes in this 133-gene partition are likely to be present simply as an artifact of ascertainment bias.
At the other extreme, we are more confident of those BPMs where, for example, many pairs of genes within each pathway were tested for synthetic lethality. Considering only the top 25 experiments (so this is an underestimate), we find that at least 391 out of 610 dually-enriched BPMs had at least 10 pairs of genes tested in pathway 1, together with at least 10 pairs of genes tested in pathway 2.
Denote the numbers of pairs of genes known to have been tested for synthetic lethality within pathway 1, between pathways, and within pathway 2 by A, B and C, respectively. Suppose there were M total synthetic-lethality edges observed within the BPM as a whole, and suppose K ≤ M of these appeared between the two pathways. We compute the probability p of observing, by chance, K or more edges between the two pathways, when M edges are randomly assigned to the slots created by known tested pairs. Table 5 lists the top 25 of our dually-enriched BPMs, ordered by this statistic. (We stress here that this statistic is not equal to the probability of observing one of our BPMs independent of ascertainment bias, because our BPM generation process will bias for edges going across; i.e. regardless of underlying structure, the placement of SL edges is not uniform, but biased by our algorithm to produce partitions where edges appear between pathways. Nonetheless, pathways which have a low value according to this p will have the desired quality that many edges within each pathway were, in fact, tested for synthetic lethality; thus we can still rank our confidence in the BPMs based on this p.)
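One way to compute a probability of this kind is sketched below. It assumes that the M observed edges occupy M distinct slots chosen uniformly at random from the A + B + C tested pairs, which gives a hypergeometric tail; this modelling choice and the function name `between_pathway_pvalue` are assumptions of the sketch, and should not be read as the exact expression behind Table 5.

```python
from math import comb


def between_pathway_pvalue(a: int, b: int, c: int, m: int, k: int) -> float:
    """Tail probability of seeing k or more of m edges between pathways.

    a, c: tested pairs within pathway 1 and within pathway 2.
    b:    tested pairs between the two pathways.
    Assumes (for illustration) that the m edges fill m distinct tested
    slots chosen uniformly at random, i.e. a hypergeometric model.
    """
    total = a + b + c
    if not 0 <= k <= m <= total:
        raise ValueError("require 0 <= k <= m <= a + b + c")
    denom = comb(total, m)
    # i edges land in the b between-pathway slots, m - i in the a + c others.
    tail = sum(comb(b, i) * comb(a + c, m - i) for i in range(k, min(m, b) + 1))
    return tail / denom
```

Under this model, a BPM with few tested within-pathway pairs (small A and C) cannot obtain a small p however many of its edges cross, which matches the intended use of the statistic as a ranking of confidence.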
As an increasing fraction of all possible yeast double mutants are grown and tested for genetic interactions, the problem of ascertainment bias in the data will resolve on its own. In the meantime, in order to help the yeast genome researcher weed out those BPMs (like BPM 622, discussed above) which are likely to be artifacts of ascertainment bias, on our website at http://bcb.cs.tufts.edu/yeast.bpm/ we have annotated every gene in every BPM pathway with the names of the experiments from which it came, and whether it was a query or an array gene (the latter label provided only for the top 25 experiments by volume). Using this annotation, one can quickly flag BPMs in which query genes appear opposite array genes from a particular large-scale experiment. One can likewise easily identify cases where many pairs of non-edges were in fact tested for synthetic-lethality interactions, in which cases the likelihood of ascertainment bias is greatly reduced.

Discussion
We have introduced the stable bipartite subgraph as a new means to generate redundant-pathway hypotheses in genetic and protein-protein interaction networks, and we have shown that this approach can generate subnetwork motifs (BPMs) that provide substantially more coverage than earlier approaches, with confident functional-enrichment results.
For the majority of our BPMs, we have evidence (in the form of either high-confidence enrichment results or well-characterized protein-protein interactions) that we are describing genuine redundant pathways. As for the rest, we examine two possible ways in which our method might produce less relevant or meaningless BPMs and discuss how to correct for each.
First, there is the possibility of ''fused pathways.'' Our method only searches for bipartite structures; if there is a tripartite or multipartite redundancy arrangement, we may erroneously aggregate multiple pathways together into a single partition. We believe we have found at least one instance where this is happening (Figure 4).
A second potential issue is that when there are hub nodes (nodes of very high degree in the SL network), the structure of our algorithm will tend to give a high score to partitions that place the hub node in one partition and all of its neighbors in the other. In order to screen out these high-degree effects, on our website, we report results for alternative networks G_75, G_35, G′_75 and G′_35, where for example G_75 stands for the subnetwork of G which remains after all genes of SL degree ≥ 75 have been deleted. Some interesting BPMs that are missed in the full network are uncovered in this way; we believe that more analysis of this effect is warranted in later studies.
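The degree-filtered subnetworks can be illustrated with a short sketch. It assumes that degrees are computed once in the original network and that high-degree genes are removed in a single pass (rather than iteratively); the function name `remove_high_degree` and the edge-list representation are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]


def remove_high_degree(edges: List[Edge], max_degree: int) -> List[Edge]:
    """Drop every node whose SL degree is >= max_degree, with its edges.

    Degrees are measured in the input network only (single pass); nodes
    left without any edge simply vanish from the returned edge list.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    keep = {n for n, d in degree.items() if d < max_degree}
    return [(u, v) for u, v in edges if u in keep and v in keep]
```

For example, `remove_high_degree(edges, 75)` would correspond to building a network in the style of G_75 from an SL edge list.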

Future work
The present work makes use of only one class of genetic interaction, namely synthetic-lethality. There are other known classes of genetic interactions such as synthetic-sick and synthetic-rescue (when deletion of gene A has a particular phenotype distinct from wildtype, such as slow growth, but deletion of both A and B together results in a strain indistinguishable from wildtype). Supplementary results (reported on our website) imply that treating synthetic-sick interactions as equivalent to synthetic-lethality interactions (as Ulitsky and Shamir do) produces weaker results when using our method than limiting analysis only to the latter. We observed here that edges representing synthetic-lethality interactions behave as 2-vertex-cuts; it is not clear how best to incorporate other types of epistatic genetic interactions into our model. To extend this work to aid in the reconstruction of complete functional pathways, and not just fault-tolerant submechanisms, we will also have to find ways to use evidence from purely physical interactions, so that all genes involved in each pathway can be placed back into pathways reconstructed solely from genetic interactions.

Data
We downloaded the genetic and protein-protein gene interaction networks used by Kelley and Ideker from their website [29]. We refer to this network as G. Our newer network G′ was constructed from the BioGRID release 2.0.33 of Oct. 1, 2007. The SL network used to construct G′ consisted simply of all SL interactions recorded for S. cerevisiae, along with all genes which participated in such interactions. The physical protein-protein interaction network used to validate BPMs from both genetic networks was also taken from this BioGRID release, and consisted of all interactions labeled as ''Affinity Capture,'' ''Affinity Chromatography,'' ''Affinity Precipitation,'' ''Chip On-Chip,'' ''Co-Crystal Structure,'' ''Co-Purification,'' ''Phosphorylation Array,'' ''Purified Complex,'' ''Two-Hybrid,'' ''Protein-RNA,'' ''Protein-Peptide,'' or ''Reconstituted Complex.'' Essential genes were filtered out before any processing took place; we retrieved a list of these genes from Stanford's ''Saccharomyces Genome Deletion Project'' website [30].
Table 5 legend. ''In pathway 1'' represents the fraction of pairs of genes in the first pathway of the BPM which are known to have been tested for synthetic lethality, which actually exhibited an SL relationship. Likewise, ''Between pathways'' and ''In pathway 2'' list the observed number of SL interactions over the number of known tested pairs between the two pathways and within the second pathway, respectively. The last column lists the probability of observing by chance the bias of SL edges in the BPM in favor of appearing between rather than within pathways if edges were placed independently at random between all known tested pairs. doi:10.1371/journal.pone.0005364.t005

Algorithm
We define the yeast SL graph G to have a vertex (node) corresponding to each gene/protein-product pair known to participate in at least one synthetic-lethality interaction, and an edge representing each such interaction. Let G have n vertices and E edges.
Given any bipartition (A,B) of G (that is, given any division of the nodes of G into two disjoint subsets A and B), let c denote the number of edges with one endpoint in A and one in B. For any vertex v ∈ A, define two new sets A′ and B′ to be A \ {v} and B ∪ {v}, respectively. (Similarly, for v ∈ B, define B′ to be B \ {v} and A′ to be A ∪ {v}.) We say that the bipartition (A,B) is maximal in G if, for every vertex v, the number of edges of G with one endpoint in A′ and one in B′ is at most c: in other words, moving a single vertex from A to B or vice versa cannot increase the number of edges that cross the cut between A and B.
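The maximality condition can be checked directly from this definition. The sketch below is illustrative only: it assumes the graph is given as an edge list and the bipartition as a dictionary mapping each vertex to side 0 or 1, and the names `crossing_edges` and `is_maximal` are hypothetical. It recomputes the cut after each trial move, which is simple rather than efficient.

```python
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]


def crossing_edges(edges: List[Edge], side: Dict[Hashable, int]) -> int:
    """Count edges with one endpoint on each side of the bipartition."""
    return sum(1 for u, v in edges if side[u] != side[v])


def is_maximal(edges: List[Edge], side: Dict[Hashable, int]) -> bool:
    """True if moving any single vertex cannot increase the cut size c."""
    c = crossing_edges(edges, side)
    for v in side:
        moved = dict(side)
        moved[v] = 1 - side[v]  # move v to the other side
        if crossing_edges(edges, moved) > c:
            return False
    return True
```

On a triangle, for instance, any split of one vertex against the other two is maximal with c = 2, while placing all three vertices on one side is not.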
In any partition (A,B) of the vertices of G, call a vertex happy if it has at least as many edges to vertices in the other partition as it does to vertices in its own partition, and unhappy otherwise. (The term ''happy partition'' was first used in [31].) The following procedure Flip generates a maximal bipartition of G; it is based on a classical result of Lovász [13].
1. Place each vertex of G into A or B uniformly at random.
2. While there exists at least one unhappy vertex in G:
(a) Choose a random unhappy vertex v.
(b) Switch its side (from A to B or from B to A).
3. Output the resulting sets A and B.
Theorem. Procedure Flip goes through its while loop at most E times, and results in a maximal bipartition of G.
Proof. Call an edge crossing if it has one endpoint in A and one endpoint in B. Each pass through the loop takes an unhappy vertex and makes it happy. This flip can have the side effect of causing previously happy vertices, which are neighbors of the flipped vertex, to become unhappy, leading to any given vertex potentially becoming happy and unhappy multiple times throughout the course of the algorithm. Every time the while loop is executed, however, the number of crossing edges increases by at least one, and there are E edges, so the loop terminates in at most E iterations. At termination, all vertices must be happy. Therefore, for each node, at least as many of its edges cross the partition as stay within a side of the partition. Thus, globally, there are at least as many edges that cross the partition as stay within a side of the partition. QED.
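Procedure Flip can be written compactly as follows. This is a minimal sketch rather than the code used to generate our BPMs: it assumes a simple graph (no self-loops or repeated edges) given as an edge list, recomputes the set of unhappy vertices on every pass for clarity instead of speed, and uses hypothetical names (`flip`, `rng`). The returned iteration count lets the bound in the theorem be checked empirically.

```python
import random
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

Edge = Tuple[Hashable, Hashable]


def flip(
    edges: List[Edge],
    nodes: Sequence[Hashable],
    rng: Optional[random.Random] = None,
) -> Tuple[Dict[Hashable, int], int]:
    """Sample one maximal bipartition; return (side, loop iterations)."""
    rng = rng or random.Random()
    nodes = list(nodes)
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    # Step 1: place each vertex on side 0 or 1 uniformly at random.
    side = {n: rng.randint(0, 1) for n in nodes}

    def unhappy(n: Hashable) -> bool:
        same = sum(1 for w in adj[n] if side[w] == side[n])
        return same > len(adj[n]) - same  # more own-side than crossing edges

    # Step 2: flip a random unhappy vertex until none remain.
    iterations = 0
    while True:
        bad = [n for n in nodes if unhappy(n)]
        if not bad:
            break
        v = rng.choice(bad)
        side[v] = 1 - side[v]
        iterations += 1

    # Step 3: output the bipartition.
    return side, iterations
```

Calling `flip` repeatedly with different seeds yields the sample of maximal bipartitions from which stable bipartite subgraphs are built; each flip raises the number of crossing edges by at least one, so the loop cannot run more than E times.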
Running Flip several times may generate different maximal bipartitions, because of the random choices in initializing vertices to partitions, and also because of the random choices of which unhappy node to switch to happy at each iteration of the while loop. Notice that if we have a true example of the two-redundant-pathway BPM motif, there will be a large bipartite or nearly-bipartite subgraph contained in G whose SL edges are likely to cross the partition in ''most'' of the maximal bipartitions of G (because we get a large crossing gain for having the correct edges
Figure 4. Tripartite pathway redundancy. This is a modified reproduction of structure C in Figure 5 in [34]. This structure is tripartite, with three interacting complexes. Our BPM 541 contains all but two of the genes involved in all three complexes (the two that are missing were not present in our synthetic-lethality data to begin with). BPM 541 correctly separates the complex on the left (yellow nodes are in pathway 1 of BPM 541) from the other two (violet nodes are in pathway 2 of BPM 541), but because our search is limited to bipartite structure, our algorithm grouped both the complex on the bottom and the one on the right together into a single ''pathway,'' basing this decision on the fact that there are more SL interactions observed between the bottom complex and the one on the left than were observed between the bottom complex and the one on the right. doi:10.1371/journal.pone.0005364.g004