Interactome-Wide Prediction of Protein-Protein Binding Sites Reveals Effects of Protein Sequence Variation in Arabidopsis thaliana

Figure 1

SLIDERBio strategy to predict protein-protein binding sites.

(A–B) SLIDERBio assumes that interfaces can be represented by short sequence motifs. (A) In the 3D structure of a protein, interaction sites (spacefill) form continuous patches of amino acid residues, whereas (B) in the protein sequence the interface is composed of scattered short motifs (regions highlighted in red and green). In (A–B), the structure and sequence of the Mms2/Ubc13 heterodimer (PDB ID 1jat) are used as illustration. (C–D) SLIDERBio predicts interaction sites by finding motif pairs that are overrepresented in pairs of interacting proteins in an interaction network. (C) A protein-protein interaction network in which proteins are represented by nodes and interactions by connecting edges. (D) The protein sequences and their short motifs (regions highlighted as colored bars; the same color represents similar motifs). In this example, the motif pair [grey-orange] is overrepresented compared to the motif pair [red-green]. To calculate the degree of overrepresentation of a motif, the method counts the sequences of interacting proteins in which the motif is found. Originally, SLIDER considered a motif present in a sequence only if a region of the protein sequence matched the motif perfectly. In contrast, SLIDERBio uses a substitution matrix to calculate the similarity between the motif and the sequence, and considers that the sequence contains the motif if this similarity is greater than a threshold. In addition, SLIDERBio verifies whether the conservation score and the surface accessibility score of the motif are greater than pre-defined thresholds. All three thresholds apply to the average value per residue over the length of the motif (E).
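The three-threshold motif test described in this caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SLIDERBio implementation: the function names (`toy_score`, `find_motif`), the residue groups standing in for a substitution matrix, the per-residue score lists, and all threshold values are hypothetical. A real analysis would use an established substitution matrix and scores derived from alignments and structural predictions.

```python
from statistics import mean

# Hypothetical residue groups standing in for a real substitution matrix.
GROUPS = ["AVLIMC", "FWY", "STNQ", "DE", "KRH", "GP"]
GROUP_OF = {aa: i for i, group in enumerate(GROUPS) for aa in group}


def toy_score(a, b):
    """Toy similarity: 1.0 for identical residues, 0.5 within a group, else 0.0."""
    if a == b:
        return 1.0
    if a in GROUP_OF and GROUP_OF.get(a) == GROUP_OF.get(b):
        return 0.5
    return 0.0


def find_motif(sequence, motif, conservation, accessibility,
               sim_thr=0.7, cons_thr=0.5, acc_thr=0.5):
    """Return start positions of windows that pass all three thresholds.

    Each threshold is compared against the average value per residue
    over the length of the motif (threshold values are assumptions).
    """
    k = len(motif)
    hits = []
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        sim = mean(toy_score(m, s) for m, s in zip(motif, window))
        cons = mean(conservation[start:start + k])
        acc = mean(accessibility[start:start + k])
        if sim > sim_thr and cons > cons_thr and acc > acc_thr:
            hits.append(start)
    return hits


# Made-up example data: uniform conservation and accessibility scores.
sequence = "MKLVDEAG"
conservation = [0.8] * len(sequence)
accessibility = [0.8] * len(sequence)
print(find_motif(sequence, "LIDE", conservation, accessibility))
```

In this toy example the motif LIDE has no perfect match in the sequence, so exact matching as in the original SLIDER would miss it, while the similarity-based test accepts the window LVDE because V and I fall in the same hypothetical residue group. Raising `sim_thr` close to 1.0 recovers the exact-match behaviour.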

