Predicting and Validating Protein Interactions Using Network Structure

doi:10.1371/journal.pcbi.1000118

Figure 1.

Upcast Sets of Characteristic Pairs and Triplets.

In this example, we consider only a single characteristic (e.g., protein function), so the characteristic vector of each protein has a single component (a 1-vector). The protein interaction network (left) contains three single-category proteins and one two-category protein, which yield an upcast set of six characteristic pairs {A–B, A–B, A–D, B–D, B–C, C–D}. Likewise, the upcast set of triplets contains two triangles and three lines.
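
The caption's counts can be reproduced with a short sketch. It assumes that each interaction contributes every combination of one category from each partner, so a two-category protein contributes two pairs per edge. It also assumes that triplets split into closed triangles and open lines (paths whose end proteins do not interact). The network `CATS`/`EDGES` is hypothetical: it is one four-protein network consistent with the counts in the caption, not necessarily the one drawn in the figure. The function names are illustrative.

```python
from collections import Counter
from itertools import combinations, product

# Hypothetical network: three single-category proteins and one
# two-category protein (p1), chosen so the counts match the caption.
CATS = {"p1": ["A", "C"], "p2": ["B"], "p3": ["D"], "p4": ["A"]}
EDGES = [("p1", "p2"), ("p1", "p3"), ("p2", "p3"), ("p2", "p4")]


def adjacency(cats, edges):
    """Undirected adjacency sets for the protein network."""
    adj = {p: set() for p in cats}
    for p, q in edges:
        adj[p].add(q)
        adj[q].add(p)
    return adj


def upcast_pairs(cats, edges):
    """Multiset of unordered category pairs, one per category combination per edge."""
    pairs = Counter()
    for p, q in edges:
        for a, b in product(cats[p], cats[q]):
            pairs[tuple(sorted((a, b)))] += 1
    return pairs


def upcast_triplets(cats, edges):
    """Split upcast triplets into closed triangles and open lines."""
    adj = adjacency(cats, edges)
    triangles, lines = Counter(), Counter()
    # Closed triplets: each protein triangle is visited once via sorted combinations.
    for p, q, r in combinations(sorted(cats), 3):
        if q in adj[p] and r in adj[p] and r in adj[q]:
            for combo in product(cats[p], cats[q], cats[r]):
                triangles[tuple(sorted(combo))] += 1
    # Open triplets: a centre protein with two neighbours that do not interact.
    for centre in cats:
        for u, w in combinations(sorted(adj[centre]), 2):
            if w not in adj[u]:
                for a, c, b in product(cats[u], cats[centre], cats[w]):
                    end1, end2 = sorted((a, b))
                    lines[(end1, c, end2)] += 1
    return triangles, lines


if __name__ == "__main__":
    pairs = upcast_pairs(CATS, EDGES)
    triangles, lines = upcast_triplets(CATS, EDGES)
    print(sorted(pairs.elements()))
    print(sum(triangles.values()), "triangles,", sum(lines.values()), "lines")
```

Running it prints the six pairs (A–B twice, then A–D, B–C, B–D, C–D) and `2 triangles, 3 lines`, matching the counts in the caption. Sorting each pair and each line's end points makes the multisets independent of edge direction.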

Table 1.

The Size of Predictable Protein Pairs in Yeast.

Table 2.

Eligible Protein Interactions for Different Methods.

Figure 2.

ROC Curves of Predictive Scores.

ROC curves (sensitivity plotted against 1 minus specificity) for predicting yeast protein interactions using domain-interaction-based approaches (Deng et al.'s score and Liu et al.'s score), a homology-based approach (Jonsson et al.'s score plus paralogs), and our network-based approach (the triangle rate score).
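
Each curve traces sensitivity against 1 minus specificity as the score threshold is lowered. A minimal sketch of how such points and the area under the curve (as reported in Table 3) can be computed from scores and reference labels follows. The trapezoidal AUC is a standard choice assumed here, and the function names and example scores are hypothetical.

```python
def roc_points(scores, labels):
    """(1 - specificity, sensitivity) points as the threshold moves from high to low.

    labels: 1 for a true interaction (positive), 0 for a negative pair.
    Tied scores are consumed together, so each distinct threshold gives one point.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    # Hypothetical scores for six protein pairs; 1 = known interaction.
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
    labels = [1, 1, 0, 1, 0, 0]
    print(round(auc(roc_points(scores, labels)), 4))
```

For these six pairs the script prints 0.8889, which equals 8/9: the fraction of (positive, negative) pairs in which the positive is scored higher.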

Table 3.

Areas under ROC Curves for Score Comparison.

Table 4.

Z-tests for AUC Comparison among Predictive Scores.
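
The z-test used for the table is not specified here. One common choice, assumed in this sketch, takes the Hanley-McNeil (1982) standard error for each AUC and forms z = (A1 - A2) / sqrt(SE1^2 + SE2^2). This treats the two AUCs as independent. Scores evaluated on the same protein pairs are correlated, and the Hanley-McNeil (1983) correlated-areas correction or DeLong's method would account for that. The function names are hypothetical.

```python
import math


def auc_se(a, n_pos, n_neg):
    """Hanley-McNeil standard error of an AUC value a (assumes 0 < a < 1)."""
    q1 = a / (2 - a)
    q2 = 2 * a * a / (1 + a)
    var = (a * (1 - a)
           + (n_pos - 1) * (q1 - a * a)
           + (n_neg - 1) * (q2 - a * a)) / (n_pos * n_neg)
    return math.sqrt(var)


def auc_z_test(a1, a2, n_pos, n_neg):
    """z statistic and two-sided p-value, treating the two AUCs as independent."""
    se1 = auc_se(a1, n_pos, n_neg)
    se2 = auc_se(a2, n_pos, n_neg)
    z = (a1 - a2) / math.sqrt(se1 ** 2 + se2 ** 2)
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


if __name__ == "__main__":
    # Hypothetical AUCs on 100 positive and 100 negative pairs.
    print(auc_z_test(0.80, 0.75, 100, 100))
```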

Figure 3.

P-ROC Curves for Comparison among Scores.

The P-ROC curves for the comparison of scores.

Figure 4.

ROC Curves of Pair-Based Score and Triangle Rate Score.

ROC curves for interaction prediction using the triangle rate score and the pair-based score.

Figure 5.

EPR Index in Predictions of Interactions.

The black triangles indicate the EPR index of the predicted interactions within each top-ranked fraction of scores. For example, the top 10% of predictions has an EPR index of 41.2.
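
The caption does not define how the EPR (expression profile reliability) index is computed. The sketch below assumes that it estimates the fraction of true interactions in a set. The estimate comes from fitting the set's histogram of expression-profile distances as a mixture of a trusted-interaction reference histogram and a random-pair reference histogram. The function name and the histogram inputs are hypothetical; the least-squares fit is a closed-form one-parameter regression.

```python
def epr_fraction(sample, true_ref, random_ref):
    """Least-squares alpha in: sample ~ alpha * true_ref + (1 - alpha) * random_ref.

    Each argument is a normalised histogram (same bins) of expression-profile
    distances. Assumed model, not taken from the source. Result clipped to [0, 1].
    """
    d = [t - r for t, r in zip(true_ref, random_ref)]
    e = [s - r for s, r in zip(sample, random_ref)]
    denom = sum(x * x for x in d)
    if denom == 0:
        raise ValueError("reference histograms are identical")
    alpha = sum(x * y for x, y in zip(d, e)) / denom
    return min(1.0, max(0.0, alpha))


if __name__ == "__main__":
    # Hypothetical 4-bin histograms; the sample is a 40/60 mixture by construction.
    true_h = [0.5, 0.3, 0.2, 0.0]
    rand_h = [0.1, 0.2, 0.3, 0.4]
    mix = [0.4 * t + 0.6 * r for t, r in zip(true_h, rand_h)]
    print(epr_fraction(mix, true_h, rand_h))
```

Rewriting the model as sample - random_ref = alpha * (true_ref - random_ref) reduces the fit to projecting one vector onto another, which is why no optimisation library is needed.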

Figure 6.

Percentages of Predictions of Interactions Overlapping with DIP CORE.

For each triangle rate score threshold, the percentage of predicted interactions (score ≥ threshold) that overlap with DIP CORE is plotted.
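
Each plotted value is a thresholded overlap. The sketch below assumes that predictions are stored as one score per unordered protein pair and that the DIP CORE interactions are available as a set of pairs. Loading DIP CORE is not shown, and the function name and example pairs are hypothetical.

```python
def overlap_percentage(predictions, reference, threshold):
    """Percentage of predicted pairs with score >= threshold that are in reference.

    predictions: dict mapping an unordered protein pair (frozenset) to its score.
    reference: set of frozenset pairs, e.g. a high-confidence interaction set.
    Returns None when no prediction reaches the threshold.
    """
    selected = [pair for pair, score in predictions.items() if score >= threshold]
    if not selected:
        return None
    hits = sum(1 for pair in selected if pair in reference)
    return 100.0 * hits / len(selected)


if __name__ == "__main__":
    # Hypothetical predictions and reference set.
    preds = {frozenset(("a", "b")): 0.9, frozenset(("a", "c")): 0.7,
             frozenset(("b", "d")): 0.4, frozenset(("c", "d")): 0.2}
    ref = {frozenset(("a", "b")), frozenset(("d", "b"))}
    for t in (0.8, 0.5, 0.0):
        print(t, overlap_percentage(preds, ref, t))
```

Using `frozenset` keys makes A–B and B–A the same interaction, so the lookup does not depend on the order in which a database lists the two proteins.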

Figure 7.

Accuracy and Coverage.

Number of predictions and accuracy when using, firstly, only fully annotated proteins and, secondly, both fully and partially annotated proteins. Accuracy is the fraction of predictions that are correct against the reference set; error bars show the mean ± 2 standard deviations.
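
The caption does not say where the spread in accuracy comes from. This sketch assumes accuracy is computed on several replicates (for example, resampled reference sets) and summarised as the mean ± 2 sample standard deviations. The function names and replicate values are hypothetical.

```python
import statistics


def accuracy(predicted, reference):
    """Fraction of predicted pairs that appear in the reference set."""
    predicted = list(predicted)
    return sum(1 for p in predicted if p in reference) / len(predicted)


def error_bar(accuracies):
    """(mean - 2*sd, mean, mean + 2*sd) over replicate accuracy values."""
    m = statistics.mean(accuracies)
    sd = statistics.stdev(accuracies)  # sample standard deviation (n - 1)
    return m - 2 * sd, m, m + 2 * sd


if __name__ == "__main__":
    # Hypothetical accuracies from three replicates.
    print(error_bar([0.6, 0.7, 0.8]))
```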

Table 5.

AUC Based on Different Priors.

Figure 8.

Performance Using Different Prior Databases.

ROC curves for the triangle rate score using upcast sets constructed from yeast only, from all eukaryotes, from all prokaryotes, and from all organisms. A curve based on randomly shuffled proteins is included for comparison.
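
A shuffled control of this kind typically keeps the network fixed and breaks the link between proteins and their annotations. The sketch assumes a simple permutation of characteristic vectors among proteins. The shuffling scheme actually used is not described here, so the function and its `seed` parameter are illustrative.

```python
import random


def shuffle_characteristics(cats, seed=0):
    """Randomly reassign characteristic vectors among proteins.

    The network and the multiset of characteristic vectors are kept;
    only which protein carries which vector is permuted.
    """
    rng = random.Random(seed)  # seeded for a reproducible control
    proteins = list(cats)
    vectors = [cats[p] for p in proteins]
    rng.shuffle(vectors)
    return dict(zip(proteins, vectors))


if __name__ == "__main__":
    # Hypothetical annotations for four proteins.
    cats = {"p1": ("A", "C"), "p2": ("B",), "p3": ("D",), "p4": ("A",)}
    print(shuffle_characteristics(cats, seed=1))
```

Building the upcast sets and triangle rate scores from the shuffled mapping, in place of the real annotations, gives a baseline that preserves category frequencies but not their placement in the network.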

Figure 9.

P-ROC Curves for Different Priors.
