Figure 1.
Co-occurrence significance (CS) scores measure the interaction specificity for two proteins in AP/MS data.
(A) Flow chart for the computation of CS scores. (B) Illustration for protein pair Tub1∶Tub2, showing an overrepresented co-occurrence in the purification data of Gavin et al. [8]: 156 (observed) vs. 61.2 (σ = 6.0) (random), with a corresponding CS score of 15.8. (C) Illustration for protein pair Ssa1∶Ssa2, showing an underrepresented co-occurrence in the purification data of Gavin et al. [8]: 65 (observed) vs. 187.6 (σ = 6.6) (random), with a corresponding CS score of −18.7. (D) Total score distributions of experimental data sets and corresponding average distributions from 10^5 random (shuffled) realizations (see Materials and Methods). (E) Score distributions, in the purification data of Gavin et al. [8], of selected curated interactions in the MIPS [18] and SGD-Biogrid (SBMC2) [23],[24] repositories (see Materials and Methods), showing their measured high specificities.
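The numbers quoted in panels (B) and (C) are consistent with reading the CS score as a Z-score of the observed co-occurrence count against the shuffled-data mean and standard deviation. The sketch below assumes that reading; the function name `cs_score` and its parameters are illustrative and do not come from the paper.

```python
def cs_score(observed, random_mean, random_sd):
    """Z-score-style significance of an observed co-occurrence count.

    Assumption: CS = (observed - mean of shuffled data) / SD of shuffled data.
    """
    if random_sd <= 0:
        raise ValueError("random_sd must be positive")
    return (observed - random_mean) / random_sd


# Values quoted in the caption for the Gavin et al. purification data.
tub = cs_score(156, 61.2, 6.0)   # Tub1:Tub2, overrepresented
ssa = cs_score(65, 187.6, 6.6)   # Ssa1:Ssa2, underrepresented
print(round(tub, 1), round(ssa, 1))
```

With the rounded inputs given in the caption, the Tub1∶Tub2 pair reproduces 15.8, while the Ssa1∶Ssa2 pair gives about −18.6 rather than the reported −18.7; the small gap is presumably due to rounding of the quoted mean and σ.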
Figure 2.
Evaluation of the IDBOS scoring scheme.
Coverage versus accuracy data (see Materials and Methods) comparing the scoring schemes of IDBOS (this work) and Collins et al. [16], when applied to the purification data of Gavin et al. [8]. Four diverse reference interaction data sets were used: (A) BGS; (B) PCA; (C) SBMC2; and (D) MIPS. See Materials and Methods for full descriptions of these references. Also shown are the scored data of Hart et al. [17] (determined by multiplying individual results across the Gavin et al. [8], Krogan et al. [9], and Ho et al. [7] AP/MS data sets) and evaluations for the Y2H data sets of Yu et al. [6] (CCSB-YI1), Ito et al. [4] (core subset), Uetz et al. [5], and a union of these data sets [6] (Y2H-union).
Figure 3.
Abundance effects in high-confidence PINs derived from AP/MS data.
The association between protein degree and abundance in high-confidence PINs derived by (A) the IDBOS procedure (this work) and (B) Collins et al. [16], from the AP/MS data sets of Gavin et al. [8] and Krogan et al. [9]. Proteins were sorted by increasing abundance, as measured by Newman et al. [30], into 11 classes. Undetectable, low-abundance proteins comprised class 0, while the remaining proteins were sorted into 10 equally sized classes. The sizes of class 0/classes 1–10 were as follows: 231/92 for the IDBOS-Gavin PIN; 265/68 for the IDBOS-Krogan (MALDI) PIN; 424/101 for the IDBOS-Krogan (LCMS) PIN; 238/87 for the Collins-Gavin PIN; and 384/111 for the Collins-Krogan (MALDI+LCMS) PIN. For each class, we determined the significance of the average degree, as a Z-score, compared to the network average and standard deviation determined from equivalently sized, randomly compiled pools (10^4 realizations). The enclosed rectangular areas represent |Z|<2.6 (P>0.05 after multiple-test correction).
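The per-class Z-score described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function `class_degree_z`, the toy `degrees` dictionary, and the default of 1000 realizations are all hypothetical, and the pools are drawn uniformly without replacement from all proteins in the network.

```python
import random
import statistics


def class_degree_z(degrees, class_members, n_realizations=1000, seed=0):
    """Z-score of a class's average degree against random pools of equal size.

    degrees: dict mapping protein -> degree in the network.
    class_members: proteins belonging to one abundance class.
    """
    rng = random.Random(seed)
    proteins = list(degrees)
    k = len(class_members)
    observed = statistics.fmean(degrees[p] for p in class_members)
    # Average degree of each equivalently sized, randomly compiled pool.
    pool_means = [
        statistics.fmean(degrees[p] for p in rng.sample(proteins, k))
        for _ in range(n_realizations)
    ]
    return (observed - statistics.fmean(pool_means)) / statistics.stdev(pool_means)


# Toy network: 90 low-degree proteins and 10 hubs (hypothetical data).
degrees = {f"P{i}": 1 for i in range(90)}
degrees.update({f"H{i}": 20 for i in range(10)})
hub_class = [f"H{i}" for i in range(10)]
z_hubs = class_degree_z(degrees, hub_class)
print(z_hubs > 2.6)
```

A class whose |Z| stays below 2.6 would fall inside the rectangular areas of the figure, that is, its average degree would not differ significantly from that of random pools.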
Figure 4.
The high-confidence IDBOS-Gavin PIN is highly modular.
Depictions of (A) the high-confidence IDBOS-Gavin PIN and (B) a commensurate, degree-preserving random network. (C) Enrichments of numbers of disjoined parts in the IDBOS-Gavin PIN and Y2H data sets of Yu et al. [6] (CCSB-YI1), Ito et al. [4] (core subset), Uetz et al. [5], and a union of these data sets [6] (Y2H-union). Expected values and standard deviations (SD) were computed from 1000 realizations of commensurate, degree-preserving random networks. (D) Clustering coefficients of the IDBOS-Gavin PIN and experimental Y2H data sets. The inset shows average clustering coefficients by degree for the IDBOS-Gavin PIN and two realizations of a commensurate, degree-preserving random network. (E) Coverage versus accuracy data for the weakest links in the IDBOS-Gavin PIN using the BGS reference set (see Materials and Methods). Also shown are coverage-accuracy values for the Y2H data sets.
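The caption does not say how the commensurate, degree-preserving random networks were generated. One common way to build such a null model is repeated double-edge swapping, sketched below as an assumption rather than as the authors' procedure; the function names and the toy edge list are illustrative.

```python
import random
from collections import Counter


def degree_preserving_shuffle(edges, n_swaps, seed=0):
    """Randomize an undirected simple graph by double-edge swaps.

    Each accepted swap replaces edges a-b and c-d with a-d and c-b, which
    leaves every node's degree unchanged. Swaps that would create a
    self-loop or a duplicate edge are rejected.
    """
    rng = random.Random(seed)
    edge_list = [tuple(sorted(e)) for e in edges]
    edge_set = set(edge_list)
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both orientations of the second edge
        if a == d or c == b:
            continue  # would create a self-loop
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if new1 in edge_set or new2 in edge_set or new1 == new2:
            continue  # would create a duplicate edge
        edge_set -= {edge_list[i], edge_list[j]}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = new1, new2
        done += 1
    return edge_list


def degree_table(edges):
    """Count how many edges touch each node."""
    return Counter(node for edge in edges for node in edge)


toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"),
             ("D", "E"), ("E", "F"), ("F", "G"), ("G", "H")]
shuffled = degree_preserving_shuffle(toy_edges, n_swaps=20)
print(degree_table(shuffled) == degree_table(toy_edges))
```

Statistics such as the number of disjoined parts or the clustering coefficient can then be recomputed on many such realizations to obtain expected values and standard deviations.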
Figure 5.
Indirect associations in the IDBOS-Gavin PIN and Y2H data sets are enriched with false negatives.
An indirect association occurs when two non-interacting proteins share an interaction partner, e.g., A and B represent an indirect association in the case of A–C–B. Indirect associations form a subset of all non-interactions. A false negative is defined as a non-interaction that is curated as a direct physical interaction in a reference set: (A) BGS, (B) SBMC2 (see Materials and Methods). The fraction of indirect associations that are false negatives (actual) was compared with the fraction of all non-interactions that are false negatives (expected). Enrichments were computed as ratios of actual/expected.
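The enrichment defined above can be illustrated on a toy network. The sketch assumes that the set of non-interactions consists of all non-adjacent pairs among proteins present in the network; the function `indirect_enrichment` and the example data are hypothetical.

```python
from itertools import combinations


def indirect_enrichment(edges, reference):
    """Ratio of false-negative fractions: indirect associations vs. all non-interactions."""
    edges = {frozenset(e) for e in edges}
    reference = {frozenset(e) for e in reference}
    nodes = sorted({p for e in edges for p in e})
    neighbors = {p: set() for p in nodes}
    for a, b in (tuple(e) for e in edges):
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Non-interactions: pairs of network proteins with no edge between them.
    non_interactions = [
        (a, b) for a, b in combinations(nodes, 2) if frozenset((a, b)) not in edges
    ]
    # Indirect associations: non-interacting pairs that share a partner.
    indirect = [(a, b) for a, b in non_interactions if neighbors[a] & neighbors[b]]
    if not indirect:
        raise ValueError("network has no indirect associations")

    def false_negative_fraction(pairs):
        return sum(frozenset(p) in reference for p in pairs) / len(pairs)

    expected = false_negative_fraction(non_interactions)
    if expected == 0:
        raise ValueError("no false negatives among non-interactions")
    return false_negative_fraction(indirect) / expected


# Toy data: A-C-B and D-E-F give two indirect associations (A,B) and (D,F).
toy_network = [("A", "C"), ("C", "B"), ("D", "E"), ("E", "F"), ("G", "H")]
toy_reference = [("A", "B"), ("A", "D")]  # curated direct interactions
enrichment = indirect_enrichment(toy_network, toy_reference)
print(enrichment)
```

In this toy case, 1 of 2 indirect associations is a false negative (actual = 0.5), against 2 of 23 non-interactions overall (expected ≈ 0.087), so the enrichment is 5.75.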
Figure 6.
High-confidence AP/MS interaction data shows assortative mixing while Y2H interaction data shows disassortative mixing.
(A) Power-law-like degree distribution of the IDBOS-Gavin PIN, compared with that of a commensurate, completely random Erdős–Rényi (ER) graph. Enrichments (Z-scores) of interaction frequencies between pairs of degrees, relative to commensurate, degree-preserving random networks (10^4 realizations), in the (B) IDBOS-Gavin PIN, (C) Y2H-union data set [6], and (D) BGS curated interaction set (see Materials and Methods). The deepest red indicates Z≥5 (overrepresented) and the deepest green indicates Z≤−5 (underrepresented).