Fig 1.
Overview of CompaCt analysis steps.
(A) Input correlation datasets from multiple biological systems, with lines indicating correlations between proteins based on the similarity of their migration profiles in the complexome profiling data and colors indicating proteins from different datasets. (B) Determining the interactor profile similarity of two proteins, I and II, from different datasets, by calculating the overlap between their interactors (A, B, C, etc.) with the rank-biased overlap (RBO) metric [14], using orthology (A-A’, C-C’) when the proteins are from different species. (C) Clustering the combined network of all proteins (nodes) with RBO scores (edges) into clusters with MCL [15]. (D) Processing of MCL clusters, separating them into subclusters per system (e.g., from the same tissue or species), while pooling information from datasets representing the same system.
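The interactor comparison in panel (B) can be illustrated with a minimal sketch of a truncated rank-biased overlap. This is not the CompaCt implementation: the function name `rbo_truncated`, the persistence parameter `p` and its default, and the `orthologs` dictionary are assumptions made for illustration, and the caption does not state which RBO variant or parameter value is used.

```python
def rbo_truncated(ranked_a, ranked_b, p=0.9, orthologs=None):
    """Truncated rank-biased overlap of two ranked interactor lists.

    ranked_a, ranked_b: interactors ordered from strongest to weakest.
    p: persistence parameter (0 < p < 1); the default is an assumed value.
    orthologs: optional (hypothetical) dict mapping identifiers in
        ranked_b to their orthologs in the namespace of ranked_a.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    mapping = orthologs or {}
    # Translate the second list so that orthologs compare as equal.
    mapped_b = [mapping.get(item, item) for item in ranked_b]
    depth = min(len(ranked_a), len(mapped_b))
    seen_a, seen_b = set(), set()
    score = 0.0
    for d in range(1, depth + 1):
        seen_a.add(ranked_a[d - 1])
        seen_b.add(mapped_b[d - 1])
        # Agreement at depth d: shared fraction of the two top-d prefixes.
        agreement = len(seen_a & seen_b) / d
        score += p ** (d - 1) * agreement
    return (1 - p) * score
```

Because the sum is truncated at the shorter list length and not extrapolated, two identical finite lists score below 1. Smaller values of `p` put more weight on the top-ranked interactors.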
Fig 2.
Overview of clusters resulting from CompaCt analysis of a collection of complexome profiling datasets.
Each dot corresponds to an MCL supercluster, with dot size corresponding to the number of subclusters (e.g., species) consistently represented in this cluster. The x-axis represents the number of proteins consistently represented in the cluster. The cluster coherence score displayed on the y-axis is a measure of the degree of commonality between the clustered proteins per dataset. Clusters that we identified as corresponding to known protein complexes are annotated in the figure. Note that the cluster consisting mainly of ras-related proteins is unlikely to represent a large complex, as the comigration of multiple members of this protein family is most likely caused by their similar mass.
Table 1.
Overview of analyzed complexomes.
Fig 3.
Effect of including data from additional complexomes in the CompaCt analysis on the agreement of H. sapiens cluster results with CORUM v4 [17].
Starting with only the datasets representing the human complexome, datasets from each additional complexome were added to the analysis one at a time, and the MMR was computed after each addition. The remaining complexomes were then added progressively, at each step prioritizing the one that resulted in the highest MMR increase. The y-axis displays the maximum matching ratio (MMR) between the human subclusters and the CORUM reference set of protein complexes, reflecting the overlap of the identified clusters with the reference. At each step, the MMR resulting from including data from each remaining complexome is shown.
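The progressive addition described in this caption amounts to a greedy forward selection, which can be sketched as follows. The function `greedy_addition_order` and the `score` callable are hypothetical. In the actual analysis, `score` would correspond to rerunning CompaCt on the selected datasets and computing the MMR against CORUM, which is not reproduced here.

```python
def greedy_addition_order(base, candidates, score):
    """Greedy forward selection of complexomes.

    base: complexomes included from the start (e.g. the human one).
    candidates: complexomes that may be added, as hashable labels.
    score: hypothetical callable taking a list of complexomes and
        returning a quality value such as the MMR.

    Returns one (chosen, trial_scores) pair per step, where trial_scores
    holds the score obtained by adding each remaining candidate.
    """
    selected = list(base)
    remaining = list(candidates)
    history = []
    while remaining:
        # Try each remaining complexome on top of the current selection.
        trial_scores = {c: score(selected + [c]) for c in remaining}
        # Keep the candidate with the highest resulting score.
        best = max(remaining, key=lambda c: trial_scores[c])
        history.append((best, trial_scores))
        selected.append(best)
        remaining.remove(best)
    return history
```

Each entry of the returned history holds the values that would be plotted for one step: the scores of all remaining candidates, together with the candidate that was retained.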
Fig 4.
Recovery of the five oxidative phosphorylation complexes by the corresponding subclusters in four complexomes.
The x-axis categories correspond to different criteria for including proteins in the subclusters. The fractions reflect the minimum fraction clustered (FrC) score for proteins to be included, i.e., the fraction of datasets per complexome in which the protein is part of the cluster. The “bg” (“best guess”) selection criterion includes all proteins that have a fraction clustered of over 1/2. In addition, it includes proteins with lower FrC scores that cluster with an orthologous protein scoring higher than 1/2. Bars above zero correspond to counts of all proteins included in the corresponding clusters. Bars below zero correspond to known complex members that are present in at least one dataset but are not part of the corresponding cluster (false negatives). The numeric values above the bars represent Jaccard index values (true positives divided by the sum of true positives, false positives and false negatives) for each selection criterion. (A) H. sapiens, (B) Y. lipolytica, (C) A. thaliana seedling, (D) A. thaliana leaf.
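The selection criteria and the Jaccard index described in this caption can be sketched as below. All function names, the `frc` and `orthologs` dictionaries, and the inclusive reading of "minimum" FrC are assumptions for illustration. The sketch only mirrors the definitions given in the caption and is not the authors' code.

```python
def select_by_frc(frc, min_frc):
    """Proteins whose fraction clustered (FrC) is at least min_frc."""
    return {protein for protein, value in frc.items() if value >= min_frc}


def select_best_guess(frc, orthologs):
    """'bg' criterion: FrC over 1/2, plus lower-scoring clustered proteins
    that have an ortholog with FrC over 1/2.

    frc: dict protein -> FrC over all subclusters of one supercluster.
    orthologs: (hypothetical) dict protein -> iterable of its orthologs.
    """
    core = {protein for protein, value in frc.items() if value > 0.5}
    rescued = {
        protein
        for protein in frc
        if protein not in core
        and any(o in core for o in orthologs.get(protein, ()))
    }
    return core | rescued


def jaccard(cluster, detected_reference):
    """TP / (TP + FP + FN), i.e. intersection size over union size."""
    union = cluster | detected_reference
    if not union:
        return 0.0
    return len(cluster & detected_reference) / len(union)
```

Here `detected_reference` stands for the known complex members present in at least one dataset, so undetected subunits do not count as false negatives.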
Fig 5.
Recovery of the respiratory chain complex I in H. sapiens, Y. lipolytica and A. thaliana by the corresponding CompaCt output cluster.
Subunits in green are detected and are part of the cluster. Subunits in yellow were not detected in any of the analyzed complexome profiles. Subunits in red were detected in at least one complexome profiling dataset, but are not part of the cluster. Taxon-specific elements of the protein complex are depicted against a colored background; the others correspond to conserved elements. Proteins in columns are orthologs or best hit homologs.
Table 2.
Overview of 45 inspected superclusters.
Fig 6.
(A) Overview of CompaCt output cluster containing proteins of the p24 family in Y. lipolytica, H. sapiens and P. falciparum. Clustered proteins with a fraction clustered score of over 0.5 or best hit homologs of those are displayed. Proteins in columns are orthologs or best hit homologs; two or more proteins in the same block reflect likely gene duplications. (B) Phylogenetic tree of all representatives of the p24 protein family in the three aforementioned species. Species-specific branches have been colored. Members of the protein family that were not detected in the data used in this project are shown in gray. Sequences were aligned with ClustalOmega [39], and the phylogenetic tree was reconstructed with PhyML [40]. The dynamic evolution of the protein family, whose members are part of one complex, is well captured by the CompaCt approach.
Fig 7.
Overview of CompaCt output cluster containing subunits from the mitochondrial ATP Synthase complex in H. sapiens, P. falciparum and T. gondii complexomes.
Clustered P. falciparum and H. sapiens proteins with a fraction clustered score of over 0.5 or best hit homologs of those are displayed, as well as their clustered Toxoplasma orthologs. Human proteins labeled “predicted mitochondrial” are listed in MitoCarta 3.0 [42]. Plasmodium and Toxoplasma proteins with this label are predicted as such by PlasmoMitoCarta [44] and a HyperLOPIT study [45], respectively.
Fig 8.
Overview of CompaCt output cluster containing subunits from the membrane component (V0) of the Vacuolar ATPase complex (V-ATPase) in H. sapiens, Y. lipolytica, A. thaliana and P. falciparum complexomes.
Clustered proteins with a fraction clustered score of over 0.5 or best hit homologs of those are displayed. Proteins in columns are orthologs or best hit homologs, and are labeled depending on whether they have previously been associated with this protein complex. Two or more proteins in the same block indicate likely gene duplications. For proteins labeled as “associated with the complex” there is some previous evidence associating them with the V-ATPase complex, while for the other proteins there is not.
Table 3.
Rubisco supercluster (cluster id: 410) actual and possible matches.
Table 4.
Characteristics of the different c15orf61 HEK293T clones used in this work.