Design of Multi-Specificity in Protein Interfaces

doi:10.1371/journal.pcbi.0030164

Figure 1.

Computational Strategy and Methodology Flowchart

(A) Computational strategy for determining the degree of optimization and predicted cost of multi-specificity.

(B) Flowchart illustrating the methodology for generating a dataset of multi-specific proteins and the computational protocol for predicting sequences optimal for each binding interaction alone (single-constraint), as well as sequences predicted to satisfy binding in the context of all structurally characterized partners (multi-constraint).
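
The difference between the single-constraint and multi-constraint protocols in (B) can be illustrated with a toy example. This is a minimal sketch, not the authors' protocol. It assumes a hypothetical score table for a single interface position (lower scores are more favorable), treats positions as independent, and assumes the multi-constraint objective is a plain sum of per-partner scores. The published simulations may combine partners differently. The names `SCORES`, `single_constraint_choice`, and `multi_constraint_choice` are illustrative only.

```python
from typing import Dict, List

# Hypothetical binding scores (lower = more favorable) at ONE interface position:
# SCORES[partner][amino_acid]. The values are invented for illustration.
Scores = Dict[str, Dict[str, float]]

SCORES: Scores = {
    "partner_A": {"G": -0.5, "S": -1.2, "L": 0.8},
    "partner_B": {"G": -0.4, "S": 1.2, "L": 1.5},  # serine clashes here
    "partner_C": {"G": -0.3, "S": -0.9, "L": 0.2},
}


def single_constraint_choice(scores: Scores, partner: str) -> str:
    """Best residue when only one partner is considered."""
    table = scores[partner]
    return min(table, key=table.get)


def multi_constraint_choice(scores: Scores) -> str:
    """Best residue when all partners are scored together (assumed: plain sum)."""
    residues: List[str] = list(next(iter(scores.values())))
    return min(residues, key=lambda aa: sum(t[aa] for t in scores.values()))


if __name__ == "__main__":
    for p in SCORES:
        print(p, single_constraint_choice(SCORES, p))
    print("multi", multi_constraint_choice(SCORES))
```

In this toy table, two of the three partners prefer serine when optimized alone. The summed objective instead selects glycine because serine is strongly penalized by `partner_B`, loosely mirroring the residue 74 case described for Figure 4.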

Figure 2.

Dataset of Promiscuous Proteins

PDB codes and descriptions of 20 promiscuous proteins and their 65 crystallized interaction partners. For each binding partner, the “Total” column gives the number of residues it contacts (within 4 Å) on its promiscuous binding protein, and the “Shared” column gives how many of these residues are also contacted by at least one other characterized binding partner. Fold classes were assigned using SCOP [20]. Protein–protein interaction maps of sequence homologs to the promiscuous proteins in our dataset (see Methods) were taken directly from the Database of Interacting Proteins [21] (http://dip.doe-mbi.ucla.edu/; see Table S21). Root nodes are colored red, and the number of first-shell (orange) and second-shell (yellow) nodes for each map is given on the far left. Edges are color-coded by the reliability of the data used to infer each interaction: green lines indicate interactions verified by one or more computational methods, and red lines indicate interactions from unverified high-throughput screens. Line width in the interaction graphs reflects the number of independent experiments supporting each predicted interaction.
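
The “Total” and “Shared” columns can be computed from atomic coordinates roughly as follows. This is a minimal sketch under stated assumptions: residues are represented as hypothetical lists of atom coordinates in Å (which atoms to include, for example heavy atoms only, is an assumption), and a contact is counted when any pair of atoms lies at or below 4 Å. `contacted_residues` and `total_and_shared` are illustrative names, not functions from the original work.

```python
import math
from typing import Dict, List, Set, Tuple

Coord = Tuple[float, float, float]
# Hypothetical input format: residue id -> list of atom coordinates (Å).
Residues = Dict[str, List[Coord]]

CUTOFF = 4.0  # Å, the contact distance used in the caption


def contacted_residues(protein: Residues, partner: Residues,
                       cutoff: float = CUTOFF) -> Set[str]:
    """Residues of `protein` with any atom within `cutoff` of any partner atom."""
    partner_atoms = [a for atoms in partner.values() for a in atoms]
    return {
        res
        for res, atoms in protein.items()
        if any(math.dist(a, b) <= cutoff for a in atoms for b in partner_atoms)
    }


def total_and_shared(protein: Residues,
                     partners: Dict[str, Residues]) -> Dict[str, Tuple[int, int]]:
    """Per partner: (Total contacts, contacts also used by >= 1 other partner)."""
    contacts = {name: contacted_residues(protein, p) for name, p in partners.items()}
    result: Dict[str, Tuple[int, int]] = {}
    for name, own in contacts.items():
        others: Set[str] = set().union(*(c for n, c in contacts.items() if n != name))
        result[name] = (len(own), len(own & others))
    return result


# Toy example: three residues, one atom each, and two hypothetical partners.
PROTEIN: Residues = {"R1": [(0.0, 0.0, 0.0)], "R2": [(10.0, 0.0, 0.0)],
                     "R3": [(20.0, 0.0, 0.0)]}
PARTNERS: Dict[str, Residues] = {
    "X": {"a": [(0.0, 3.0, 0.0)], "b": [(10.0, 3.5, 0.0)]},
    "Y": {"c": [(10.0, 0.0, 3.0)]},
}

if __name__ == "__main__":
    print(total_and_shared(PROTEIN, PARTNERS))
```

In the toy data, partner X contacts R1 and R2, and partner Y contacts only R2, so R2 is the single shared residue. The all-pairs distance loop is the simplest correct approach; for full complexes a spatial index would be faster.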

Figure 3.

Single- and Multi-Constraint Simulation Trajectories and Sequences Selected for the Multi-Specific Protein Ran

Trajectories of single-constraint (A) and multi-constraint (B) optimizations. PDB codes for all complexes with the five different binding partners are given in the legend. For reference, the score of the native amino acid sequence for each binding partner is marked on the y-axis (squares, final generation). Scores among partners are correlated for multi-constraint simulations (arrows).

(C) Optimal interface sequences taken from the endpoints of the trajectories in (A) and (B). The first row of the table contains the interface residue numbering (PDB), the second row lists the native sequence (red), and the following rows list the sequences predicted to be optimal in each simulation: multi-constraint (second sequence) and single-constraint (third through seventh sequences). Plus signs denote positions at which the wild-type amino acid was recovered as optimal. The number and percentage of interface residues identical to native are shown for each simulation in the rightmost column. Grey shading denotes interface positions that are not within 4 Å of the interaction partner in the shaded row (see Methods).
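
The table convention in (C), where “+” marks a recovered wild-type residue and the rightmost column gives the count and percentage identical to native, can be sketched as follows. The function name `recovery_row` and the example sequences are hypothetical. The sketch also assumes the native and designed sequences are already aligned over the same interface positions, and it does not handle the grey-shaded positions.

```python
from typing import Tuple


def recovery_row(native: str, designed: str) -> Tuple[str, int, float]:
    """Return (display row, number recovered, percent recovered).

    Positions where the designed residue equals the native one are shown as '+',
    following the table convention; other positions show the designed residue.
    """
    if len(native) != len(designed):
        raise ValueError("sequences must cover the same interface positions")
    row = "".join("+" if d == n else d for n, d in zip(native, designed))
    recovered = row.count("+")
    return row, recovered, 100.0 * recovered / len(native)


if __name__ == "__main__":
    # Hypothetical five-position interface.
    print(recovery_row("GRKLE", "SRKIE"))
```

For the hypothetical sequences above, three of five positions are recovered, which gives the row `S++I+` and 60% recovery.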

Figure 4.

Single- and Multi-Constraint Models for Two Ran Interface Sites

Shown are computational models of interface regions around residues predicted to be optimal for binding each partner (orange, 1A2K.pdb; yellow, 1I2M.pdb; green, 1IBR.pdb; purple, 1K5D.pdb; blue, 1WA5.pdb) of Ran (pink). Single-constraint predictions for residue 74 (panels A–E; wild-type glycine) indicate a compromise among the preferences of the five partners. Three partners (A, C, D), when optimized alone, prefer a residue with greater hydrogen-bonding capability than the wild-type glycine. Steric constraints imposed by the remaining two partners (B, E) force selection of the wild-type glycine by the multi-constraint protocol. Multi-constraint predictions for residue 76 are shown in panels F–J. The wild-type arginine is also chosen in the single-constraint predictions in which it mediates an inter-chain hydrogen-bonding network (F, G, H, J). The single-constraint selection of leucine at position 76 for 1K5D.pdb is not shown.

Figure 5.

Comparison of Native Amino Acid Recovery and Predicted Binding Scores of Native, Single-Constraint, and Multi-Constraint Sequences

(A) The number of residues recovered as identical to native is plotted for each promiscuous protein (see Figure 2). For reference, the size of the shared interface is shown for each protein in red. For roughly half of the dataset (group II, pink shading), sequence recovery from the multi-constraint simulations (black) significantly outperformed the average single-constraint recovery (grey). The remaining proteins (group I, blue shading) showed similar native recovery whether sequences were optimized with respect to one or to all characterized partners. Error bars represent the best and worst native sequence recovery among the single-constraint optimizations.
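
The per-protein quantities plotted in (A) can be sketched as follows: the mean single-constraint recovery (grey bar), its best and worst values (error bars), and the multi-constraint recovery (black bar). This is a minimal sketch with hypothetical counts. It does not reproduce the significance test or the criteria used to assign proteins to group I or group II, which are described in Methods.

```python
from statistics import mean
from typing import Dict, List


def summarize_recovery(single: List[int], multi: int) -> Dict[str, float]:
    """Summarize native-sequence recovery for one promiscuous protein.

    single: native residues recovered in each single-constraint run (one run
            per characterized partner); the mean is the grey bar and the
            max/min give the error bars.
    multi:  native residues recovered in the multi-constraint run (black bar).
    """
    if not single:
        raise ValueError("need at least one single-constraint run")
    avg = mean(single)
    return {
        "single_mean": avg,
        "single_best": max(single),
        "single_worst": min(single),
        "multi": multi,
        "multi_minus_mean": multi - avg,
    }


if __name__ == "__main__":
    # Hypothetical counts for a protein with five characterized partners.
    print(summarize_recovery([10, 12, 8, 11, 9], multi=15))
```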

(B) Calculated binding scores of native (red), single-constraint (grey), and multi-constraint (black) sequences for each of the 65 complexes examined in this study (see Figure 2). Sequences selected by single- and multi-constraint optimizations often showed a favorable decrease in binding score relative to native sequences for group I proteins (blue shading), whereas multi-constraint binding scores were close to native for group II proteins (pink shading).

Figure 6.

Distribution of Optimization in Promiscuous Interfaces

Predicted per-residue binding score improvements (relative to native) for sequences selected in single-constraint (A) and multi-constraint (B) simulations. Coloring indicates the magnitude of predicted improvement over native. Darker bars (compromise value 1–1.5, orange; more than 1.5, red) indicate positions at which the simulation predicts a non-native residue to bind more strongly than the native one. Lighter bars (compromise value 0–0.5, wheat; 0.5–1, yellow) indicate positions at which the simulations recovered the native (or a near-native) residue type. Whether optimization was performed in the context of a single partner or of multiple partners, positions calculated to be hotspots (see Methods) consistently returned the native amino acid as optimal (244/303 and 272/303 for single- and multi-constraint simulations, respectively). In contrast, roughly half of the non-hotspot interface positions were predicted to be suboptimal for binding when each partner was considered separately (350/682), but only a quarter (167/682) were estimated to remain suboptimal in the context of binding multiple partners. Overall, the total number of interface sites at which improvements in binding score could be found was significantly smaller for multi-constraint optimizations. Scores for the same residue position with different binding partners are included in all plots.
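
As a quick check on the proportions quoted above, the counts can be converted to percentages. The sketch below also encodes the caption's color bins. It assumes the bins are half-open ([0, 0.5), [0.5, 1), [1, 1.5), [1.5, ∞)), because the caption does not say which bin an exact boundary value belongs to. The names `COUNTS`, `percent`, and `color_bin` are illustrative.

```python
from typing import Dict, Tuple

# Counts quoted in the Figure 6 caption: (positions, total positions).
COUNTS: Dict[str, Tuple[int, int]] = {
    "hotspots, native recovered (single)": (244, 303),
    "hotspots, native recovered (multi)": (272, 303),
    "non-hotspots, suboptimal (single)": (350, 682),
    "non-hotspots, suboptimal (multi)": (167, 682),
}


def percent(k: int, n: int) -> float:
    """Percentage k/n rounded to one decimal place."""
    return round(100.0 * k / n, 1)


def color_bin(value: float) -> str:
    """Map a compromise value to the caption's color scheme.

    Assumption: half-open bins; negative values are outside the caption's range.
    """
    if value < 0:
        raise ValueError("the caption's bins start at 0")
    if value < 0.5:
        return "wheat"
    if value < 1.0:
        return "yellow"
    if value < 1.5:
        return "orange"
    return "red"


if __name__ == "__main__":
    for label, (k, n) in COUNTS.items():
        print(f"{label}: {k}/{n} = {percent(k, n)}%")
```

The resulting values, about 80.5% and 89.8% for hotspots and 51.3% and 24.5% for non-hotspots, correspond to the caption's “roughly half” and “only a quarter”.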

Figure 7.

Distribution of Constraint Scores in Promiscuous Interfaces

The tradeoff at each interface position in our dataset was estimated as the per-residue difference between the scores of the amino acids chosen when each partner was optimized alone and those chosen when all binding partners were considered in the optimization procedure (see Figure 1A2). The percentage of interface sites displaying the lowest level (0–0.5) of “tradeoff value” (see Methods and text) is shown for all 20 proteins in our dataset (A). Such positions are predicted to be highly shared, in that no partner had to “give up” potential gain so that other partners could make their optimal interactions. Blue and pink shading denotes whether each protein was assigned to group I or group II. Right-hand panels show color-coded mappings of constraint scores onto three promiscuous protein interfaces calculated to display high (B; Ran set #11), medium (C; CheY set #4), and low (D; Ovomucoid Inhibitor set #3) compromise. Compromise values are colored as follows: 0–0.5, wheat; 0.5–1, yellow; 1–1.5, orange; >1.5, red.
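
The tradeoff value described above can be sketched for a single binding partner as follows. This is a minimal sketch with hypothetical per-position scores. It assumes lower scores are more favorable, consistent with the “favorable decrease in binding score” in Figure 5, so a positive difference is gain the partner gives up. It counts every value below 0.5 as the lowest bin, including any negative values, which is an assumption. A per-protein percentage like panel (A) would presumably pool the values from all partners before computing the fraction. `tradeoff_values` and `fraction_low_tradeoff` are illustrative names.

```python
from typing import List


def tradeoff_values(single: List[float], multi: List[float]) -> List[float]:
    """Per-position tradeoff for one binding partner.

    single[i]: the partner's per-residue score at position i for the residue
               chosen when the partner was optimized alone.
    multi[i]:  the same partner's score at position i for the residue chosen
               when all partners were optimized together.
    Lower scores are more favorable, so multi - single > 0 is the predicted
    gain this partner gives up to accommodate the others.
    """
    if len(single) != len(multi):
        raise ValueError("score lists must cover the same interface positions")
    return [m - s for s, m in zip(single, multi)]


def fraction_low_tradeoff(values: List[float], threshold: float = 0.5) -> float:
    """Fraction of positions in the lowest tradeoff bin (below `threshold`)."""
    if not values:
        raise ValueError("no positions given")
    return sum(1 for v in values if v < threshold) / len(values)


# Hypothetical per-position scores for one partner (arbitrary units).
SINGLE = [-1.0, -0.5, -2.0, 0.0]
MULTI = [-0.8, -0.5, -0.9, 0.3]

if __name__ == "__main__":
    t = tradeoff_values(SINGLE, MULTI)
    print([round(v, 2) for v in t])
    print(fraction_low_tradeoff(t))
```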
