A Novel Scoring Approach for Protein Co-Purification Data Reveals High Interaction Specificity

doi:10.1371/journal.pcbi.1000515

Figure 1.

Co-occurrence significance (CS) scores measure the interaction specificity for two proteins in AP/MS data.

(A) Flow chart for the computation of CS scores. (B) Illustration for protein pair Tub1∶Tub2 showing an overrepresented co-occurrence in the purification data of Gavin et al. [8]: 156 (observed) vs. 61.2 (σ = 6.0) (random), with a corresponding CS score of 15.8. (C) Illustration for protein pair Ssa1∶Ssa2, showing an underrepresented co-occurrence in the purification data of Gavin et al. [8]: 65 (observed) vs. 187.6 (σ = 6.6) (random), with a corresponding CS score of −18.7. (D) Total score distributions of experimental data sets and corresponding average distributions from 10⁵ random (shuffled) realizations (see Materials and Methods). (E) CS score distributions, in the purification data of Gavin et al. [8], of selected curated interactions in the MIPS [18] and SGD-Biogrid (SBMC2) [23],[24] repositories (see Materials and Methods), showing their high measured specificities.
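
The values quoted for panels B and C are consistent with reading the CS score as a Z-score: the observed co-occurrence count minus the mean over shuffled data, divided by the standard deviation over shuffled data. The following is a minimal Python sketch under that assumption. The function name `cs_score` is illustrative and does not come from the paper, and the shuffling procedure that produces the random mean and σ (see Materials and Methods) is not reproduced here.

```python
def cs_score(observed, random_mean, random_sd):
    """Z-score of an observed co-occurrence count against shuffled data.

    Assumption: CS = (observed - random_mean) / random_sd, which matches
    the rounded values quoted in the caption.
    """
    if random_sd <= 0:
        raise ValueError("random_sd must be positive")
    return (observed - random_mean) / random_sd


# Values quoted in the caption for the purification data of Gavin et al.
tub = cs_score(156, 61.2, 6.0)   # Tub1:Tub2, overrepresented
ssa = cs_score(65, 187.6, 6.6)   # Ssa1:Ssa2, underrepresented
print(round(tub, 1), round(ssa, 1))
```

With the rounded inputs shown in the caption, this gives 15.8 for Tub1∶Tub2 and about −18.6 for Ssa1∶Ssa2. The small difference from the quoted −18.7 is presumably due to rounding of the displayed mean and σ.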

Figure 2.

Evaluation of the IDBOS scoring scheme.

Coverage versus accuracy data (see Materials and Methods) comparing the scoring schemes of IDBOS (this work) and Collins et al. [16], when applied to the purification data of Gavin et al. [8]. Four diverse reference interaction data sets were used: (A) BGS; (B) PCA; (C) SBMC2; and (D) MIPS. See Materials and Methods for full descriptions of these references. Also shown are the scored data of Hart et al. [17] (determined by multiplying individual results across the Gavin et al. [8], Krogan et al. [9], and Ho et al. [7] AP/MS data sets) and evaluations for the Y2H data sets of Yu et al. [6] (CCSB-YI1), Ito et al. [4] (core subset), Uetz et al. [5], and a union of these data sets [6] (Y2H-union).

Figure 3.

Abundance effects in high-confidence PINs derived from AP/MS data.

The association between protein degree and abundance in high-confidence PINs derived by (A) the IDBOS procedure (this work) and (B) Collins et al. [16], from the AP/MS data sets of Gavin et al. [8] and Krogan et al. [9]. Proteins were sorted by increasing abundance, as measured by Newman et al. [30], into 11 classes. Undetectable, low-abundance proteins made up class 0, while the remaining proteins were sorted into 10 equally sized classes. The sizes of class 0 and of each of classes 1–10, respectively, were as follows: 231/92 for the IDBOS-Gavin PIN; 265/68 for the IDBOS-Krogan (MALDI) PIN; 424/101 for the IDBOS-Krogan (LCMS) PIN; 238/87 for the Collins-Gavin PIN; and 384/111 for the Collins-Krogan (MALDI+LCMS) PIN. For each class, we determined the significance of the average degree, as a Z-score, compared to the network average and standard deviation determined from equivalently sized, randomly compiled pools (10⁴ realizations). The enclosed rectangular areas represent |Z|<2.6 (P>0.05 after multiple-test correction).
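
The per-class Z-score described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the function name `class_degree_z`, the toy degree table, and the choice to draw pools without replacement are all hypothetical, and the mean of the pool averages stands in for the network average (the two agree up to sampling noise).

```python
import random
import statistics


def class_degree_z(degrees, class_members, n_realizations=10_000, seed=0):
    """Z-score of a class's mean degree against random pools of equal size.

    degrees: dict mapping protein -> degree in the network.
    class_members: proteins belonging to one abundance class.
    """
    rng = random.Random(seed)
    all_degrees = list(degrees.values())
    k = len(class_members)
    observed = statistics.fmean(degrees[p] for p in class_members)
    # Mean degree of each randomly compiled pool of k proteins
    # (assumption: pools are drawn without replacement).
    pool_means = [
        statistics.fmean(rng.sample(all_degrees, k))
        for _ in range(n_realizations)
    ]
    mu = statistics.fmean(pool_means)
    sd = statistics.stdev(pool_means)
    return (observed - mu) / sd


# Toy data (hypothetical): 200 proteins with degrees 1..10.
degrees = {f"p{i}": i % 10 + 1 for i in range(200)}
high = [p for p, d in degrees.items() if d == 10]   # only top-degree proteins
mixed = [f"p{i}" for i in range(20)]                # two proteins per degree
z_high = class_degree_z(degrees, high, n_realizations=1000)
z_mixed = class_degree_z(degrees, mixed, n_realizations=1000)
print(f"high-degree class Z = {z_high:.1f}, mixed class Z = {z_mixed:.1f}")
```

In this toy example the class of top-degree proteins falls far outside the |Z|<2.6 band, while the mixed class, whose mean degree equals the network average, falls well inside it.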

Figure 4.

The high-confidence IDBOS-Gavin PIN is highly modular.

Depictions of (A) the high-confidence IDBOS-Gavin PIN and (B) a commensurate, degree-preserving random network. (C) Enrichments of numbers of disjoined parts in the IDBOS-Gavin PIN and Y2H data sets of Yu et al. [6] (CCSB-YI1), Ito et al. [4] (core subset), Uetz et al. [5], and a union of these data sets [6] (Y2H-union). Expected values and standard deviations (SD) were computed from 1000 realizations of commensurate, degree-preserving random networks. (D) Clustering coefficients of the IDBOS-Gavin PIN and experimental Y2H data sets. The inset shows average clustering coefficients by degree for the IDBOS-Gavin PIN and two realizations of a commensurate, degree-preserving random network. (E) Coverage versus accuracy data for the weakest links in the IDBOS-Gavin PIN using the BGS reference set (see Materials and Methods). Also shown are coverage-accuracy values for the Y2H data sets.
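
The caption does not say how the commensurate, degree-preserving random networks were generated. One common way to build such a null model is repeated double edge swapping, sketched below in Python as an assumption rather than as the procedure used in the paper. The names `degree_preserving_shuffle` and `degree_counts` and the toy edge list are hypothetical.

```python
import random
from collections import Counter


def degree_preserving_shuffle(edges, n_swaps, seed=0):
    """Randomize an undirected simple graph by double edge swaps.

    Each accepted swap turns edges (a, b), (c, d) into (a, d), (c, b),
    so every node keeps its degree. Swaps that would create a self-loop
    or a duplicate edge are rejected.
    """
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:       # also allow the other orientation
            c, d = d, c
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if len(new1) < 2 or len(new2) < 2:
            continue                  # would create a self-loop
        if new1 in edge_set or new2 in edge_set or new1 == new2:
            continue                  # would create a duplicate edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_list


def degree_counts(edges):
    """Degree of every node that appears in the edge list."""
    return Counter(node for edge in edges for node in edge)


# Toy network (hypothetical): a 6-cycle with two chords.
toy = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3), (1, 4)]
shuffled = degree_preserving_shuffle(toy, n_swaps=100)
print(degree_counts(shuffled) == degree_counts(toy))
```

Quantities such as the number of disjoined parts or the clustering coefficient can then be recomputed on each realization to obtain the expected value and SD against which the observed network is compared.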

Figure 5.

Indirect associations in the IDBOS-Gavin PIN and Y2H data sets are enriched with false negatives.

An indirect association occurs when two non-interacting proteins share an interaction partner, e.g., A and B represent an indirect association in the case of A–C–B. Indirect associations form a subset of all non-interactions. A false negative is defined as a non-interaction that is curated as a direct physical interaction in a reference set: (A) BGS, (B) SBMC2 (see Materials and Methods). The fraction of indirect associations that are false negatives (actual) was compared with the fraction of all non-interactions that are false negatives (expected). Enrichments were computed as ratios of actual/expected.
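
The actual/expected ratio defined above can be illustrated with a short Python sketch. It assumes that non-interactions are enumerated over all pairs of proteins present in the network, which may differ from the protein universe used in the paper. The function name `false_negative_enrichment` and the toy data are hypothetical.

```python
from itertools import combinations


def false_negative_enrichment(edges, reference):
    """Enrichment of false negatives among indirect associations.

    edges: interactions in the network, as pairs of proteins.
    reference: curated direct physical interactions, as pairs.
    Returns (fraction of indirect associations that are curated) divided
    by (fraction of all non-interactions that are curated).
    """
    interactions = {frozenset(e) for e in edges}
    curated = {frozenset(e) for e in reference}
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # All non-interacting pairs among proteins present in the network.
    non_interactions = [
        p for p in combinations(sorted(neighbors), 2)
        if frozenset(p) not in interactions
    ]
    # Indirect associations: non-interacting pairs sharing a partner.
    indirect = [
        (a, b) for a, b in non_interactions if neighbors[a] & neighbors[b]
    ]

    def fn_fraction(pairs):
        # Raises ZeroDivisionError if the list of pairs is empty.
        return sum(frozenset(p) in curated for p in pairs) / len(pairs)

    return fn_fraction(indirect) / fn_fraction(non_interactions)


# Toy chain A-C-B-D-E: indirect associations are A/B, C/D and B/E.
edges = [("A", "C"), ("C", "B"), ("B", "D"), ("D", "E")]
reference = [("A", "B"), ("C", "D"), ("A", "E")]
enrichment = false_negative_enrichment(edges, reference)
print(round(enrichment, 2))
```

In the toy chain, two of the three indirect associations are curated (actual = 2/3), against three of the six non-interactions overall (expected = 1/2), giving an enrichment of 4/3.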

Figure 6.

High-confidence AP/MS interaction data shows assortative mixing while Y2H interaction data shows disassortative mixing.

(A) Power-law-like degree distribution of the IDBOS-Gavin PIN and the degree distribution of a commensurate, completely random Erdős-Rényi (ER) graph. Enrichments (Z-scores) of interaction frequencies between pairs of degrees, relative to commensurate, degree-preserving random networks (10⁴ realizations), in the (B) IDBOS-Gavin PIN, (C) Y2H-union data set [6], and (D) BGS curated interaction set (see Materials and Methods). The deepest red indicates Z≥5 (overrepresented) and the deepest green indicates Z≤−5 (underrepresented).
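
As a compact illustration of assortative versus disassortative mixing, the Python sketch below computes the Pearson correlation between the degrees at the two ends of each interaction. This single-number summary is a swapped-in technique and not the degree-pair Z-score analysis shown in panels B to D. The function name `degree_assortativity` and the toy graphs are hypothetical.

```python
from collections import Counter


def degree_assortativity(edges):
    """Pearson correlation of the degrees at either end of each edge.

    Positive values indicate assortative mixing (high-degree nodes tend
    to link to high-degree nodes); negative values indicate
    disassortative mixing. Undefined (division by zero) when all nodes
    have the same degree.
    """
    deg = Counter(node for edge in edges for node in edge)
    xs, ys = [], []
    for a, b in edges:
        # Count each undirected edge in both directions for symmetry.
        xs += [deg[a], deg[b]]
        ys += [deg[b], deg[a]]
    n = len(xs)
    mean = sum(xs) / n               # xs and ys share the same mean
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var


# A star: the hub links only to degree-1 leaves (disassortative).
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
# A triangle plus a separate 4-clique: like links to like (assortative).
cliques = [(0, 1), (1, 2), (0, 2),
           (3, 4), (3, 5), (3, 6), (4, 5), (4, 6), (5, 6)]
r_star = degree_assortativity(star)
r_cliques = degree_assortativity(cliques)
print(r_star, r_cliques)
```

The star gives a coefficient of −1 and the two disjoint cliques give +1, the two extremes of the mixing patterns contrasted in this figure.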
