Structural Similarity and Classification of Protein Interaction Interfaces

doi:10.1371/journal.pone.0019554

Figure 1.

A protocol for obtaining a reliable set of similar and dissimilar interface pairs.

First, two structure-based similarity measures, iiRMSD and siRMSD, are evaluated on a dataset collected from the 3D Complex database. Second, a non-redundant domain-domain interaction dataset is obtained from PDB, SCOP, and CATH. Third, iiRMSD is used to classify pairs of interaction interface structures into positive (similar) and negative (dissimilar) training sets.

Table 1.

Positive and negative datasets.

Figure 2.

An overview of the machine learning approach used to determine the interface similarity measure.

First, interface structures are extracted from the training sets of similar and dissimilar interaction interfaces. Second, a 106-dimensional feature vector is calculated for each pair of interfaces. Third, a Support Vector Machine (SVM) classifier is trained and evaluated using the above datasets. Finally, a protein interface similarity measure δ(I₁, I₂) is defined for two interfaces, I₁ and I₂, as the distance between the corresponding 106-dimensional feature vector and the separating hyperplane.
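
The last step of the caption above, taking δ(I₁, I₂) as the distance from a pair's feature vector to the separating hyperplane, can be sketched as follows. This is a minimal illustration that assumes a linear hyperplane described by a weight vector `w` and an offset `b`; the caption does not state the kernel or the trained parameters, so the function name and the toy values below are hypothetical, and a 2-dimensional vector stands in for the 106-dimensional one.

```python
import math
from typing import Sequence


def hyperplane_distance(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Signed distance from feature vector x to the hyperplane w.x + b = 0.

    Assumption: a linear decision boundary. The sign tells which side of
    the boundary the pair of interfaces falls on, and the magnitude tells
    how far it lies from the boundary.
    """
    if len(x) != len(w):
        raise ValueError("x and w must have the same dimension")
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0.0:
        raise ValueError("w must be a non-zero vector")
    decision_value = sum(wi * xi for wi, xi in zip(w, x)) + b
    return decision_value / norm


# Hypothetical toy hyperplane, not parameters from the article.
w_toy = [3.0, 4.0]
b_toy = -5.0
delta_far = hyperplane_distance([3.0, 4.0], w_toy, b_toy)
delta_other_side = hyperplane_distance([0.0, 0.0], w_toy, b_toy)
```

Under this assumption, two pairs on the same side of the boundary can still be ranked by how far they lie from it, which is what lets a classifier output serve as a continuous similarity score rather than a yes/no label.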

Table 2.

Amino acid residue classes according to their physicochemical properties.

Figure 3.

Hierarchical classification of interaction interfaces.

Similar shapes correspond to homologous proteins. Three levels of structurally similar interaction interfaces are defined. A single cluster at the H-level, C-level, and A-level can include homologous, common partner analogous, and analogous interfaces, respectively.

Figure 4.

Histograms of the distributions of (A) iiRMSD and (B) siRMSD values on the datasets of similar and dissimilar interfaces.

Both datasets are obtained from the 3D Complex database. On average, the dissimilar interface pairs had larger iiRMSD and siRMSD values (mean values are 20.6 and 15.8, respectively) than the similar pairs (mean values are 14.8 and 14.7). In addition, the difference in mean values between the similar and dissimilar interfaces was larger when using the iiRMSD measure (Δμ is 4.7 for iiRMSD and 1.1 for siRMSD).

Figure 5.

Distribution of SCOP class ID pairs from the training dataset of protein-protein interactions.

The dataset covers all SCOP class IDs. The uneven distribution of the pairs is consistent with the uneven overall distribution of protein structures across the SCOP classes.

Table 3.

Leave-one-out cross validation of two SVM models.

Table 4.

Top 20 ranked features for both SVM models.

Table 5.

Minimum, maximum, and median feature values for the top 20 ranked features of both SVM models.

Table 6.

Comparison of SCOPPI and PRISM with Model_ND and Model_NDNN.

Figure 6.

Average silhouette value plotted against the number of clusters (K).

The knee point of the curve (K = 140) is selected as the number of clusters.
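
The quantity plotted in this figure, the average silhouette value for a given clustering, can be computed as in the sketch below. It uses the standard silhouette definition on a precomputed pairwise distance matrix. The caption does not say how the distances or cluster assignments were produced, so the function name and the four-point toy data are hypothetical, and the knee-point choice itself (made by inspection in the caption) is not automated here.

```python
from typing import Dict, List, Sequence


def average_silhouette(dist: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette value over all items for one cluster assignment.

    dist[i][j] is a symmetric pairwise distance; labels[i] is the cluster
    of item i. Items in singleton clusters contribute a silhouette of 0.
    """
    clusters: Dict[int, List[int]] = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # singleton cluster: silhouette defined as 0
        # a: mean distance to the other members of the item's own cluster
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scale = max(a, b)
        if scale > 0.0:
            total += (b - a) / scale
    return total / len(labels)


# Hypothetical toy data: four points on a line at 0, 1, 10 and 11.
positions = [0.0, 1.0, 10.0, 11.0]
toy_dist = [[abs(p - q) for q in positions] for p in positions]
good = average_silhouette(toy_dist, [0, 0, 1, 1])  # tight, well separated
bad = average_silhouette(toy_dist, [0, 1, 0, 1])   # mixes near and far points
```

Repeating this calculation for each candidate K and plotting the results gives a curve of the kind described in the caption, from which a knee point can then be read off.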

Table 7.

A three-level hierarchy obtained using the new feature-based interface similarity measure.

Figure 7.

Case studies of similar interactions.

(A) H-level interactions (iiRMSD = 2.93 Å), (B) C-level interactions (iiRMSD = 6.12 Å), and (C) A-level interactions (iiRMSD = 6.19 Å). Subunits from the first interaction, together with the corresponding interface and binding sites, are colored gold and light yellow. Subunits from the second interaction (and their interfaces and binding sites) are colored dark and light grey. Positively and negatively charged residues are colored blue and red, respectively, in the first interaction, and cyan and magenta in the second. Superposition refers to the superposed interactions, interfaces, and binding sites.
