Fig 1.
The input to MFPred is a backbone ensemble of a protein/peptide complex, generated from a PDB structure (here 1CKA) as described in Methods. For each backbone, Rosetta precalculates the interaction graph, which stores the intrinsic one-body rotamer energies on the vertices (blue circles) and matrices of rotamer-rotamer two-body energies on the edges (black lines). A probability matrix (P) is initialized. Mean-field energies (E) are calculated from the interaction graph and P, and a new probability matrix, P’, is generated from E. If P’ equals P, convergence has been reached; if not, P is updated with a combination of P and P’ and the process is repeated. Once convergence is reached, the final energy and probability matrices are used to generate the Boltzmann weights of each backbone position, which are then used to average the specificity profiles of all backbones. This averaged profile is divided by the background specificity profile to obtain the final predicted specificity profile.
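The loop in Fig 1 can be sketched as a damped mean-field iteration over one backbone's interaction graph, followed by Boltzmann-weighted averaging over the ensemble. This is a hypothetical sketch, not the MFPred code. It assumes a uniform initial P, a fixed mixing weight `lam` for combining P and P’, and a convergence tolerance. It also assumes each backbone has already been reduced to one scalar energy. How MFPred derives that energy from the final E and P matrices is not specified here. The function names and the per-position renormalization after background division are likewise assumptions.

```python
import numpy as np

def mean_field(one_body, two_body, kT=0.6, lam=0.5, tol=1e-6, max_iter=1000):
    """Damped mean-field iteration (hypothetical sketch).

    one_body: list of 1-D arrays; one_body[i][r] = energy of rotamer r at position i.
    two_body: dict {(i, j): 2-D array}, entry [r, s] = energy of rotamer r at i
              paired with rotamer s at j (each edge stored once).
    Returns (E, P): per-position mean-field energies and rotamer probabilities.
    """
    # Assumption: start from a uniform distribution over each position's rotamers.
    P = [np.full(len(e), 1.0 / len(e)) for e in one_body]
    for _ in range(max_iter):
        # Mean-field energy = one-body term + two-body terms averaged over
        # the neighbouring positions' current rotamer probabilities.
        E = [np.array(e, dtype=float) for e in one_body]
        for (i, j), pair in two_body.items():
            E[i] += pair @ P[j]
            E[j] += pair.T @ P[i]
        # New probabilities P' from a Boltzmann distribution over E (min-shifted for stability).
        P_new = []
        for e in E:
            w = np.exp(-(e - e.min()) / kT)
            P_new.append(w / w.sum())
        # Converged when P' matches P within tolerance.
        if max(np.abs(pn - p).max() for pn, p in zip(P_new, P)) < tol:
            return E, P_new
        # Otherwise update P with a combination of P and P'.
        P = [lam * pn + (1 - lam) * p for pn, p in zip(P_new, P)]
    return E, P

def ensemble_profile(profiles, backbone_energies, background, kT=0.6):
    """Boltzmann-average per-backbone profiles, then divide by a background profile.

    profiles: array (n_backbones, n_positions, n_residue_types)
    backbone_energies: hypothetical scalar energy per backbone, shape (n_backbones,)
    background: array broadcastable to (n_positions, n_residue_types)
    """
    g = np.asarray(backbone_energies, dtype=float)
    w = np.exp(-(g - g.min()) / kT)
    w /= w.sum()
    avg = np.tensordot(w, np.asarray(profiles, dtype=float), axes=1)
    ratio = avg / np.asarray(background, dtype=float)
    # Assumption: renormalize each position so it sums to 1 after background division.
    return ratio / ratio.sum(axis=-1, keepdims=True)
```

With two-body energies set to zero, each position reduces to an independent Boltzmann distribution over its one-body energies. This makes a quick sanity check. The damping (`lam` < 1) is a common way to keep mean-field updates from oscillating.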
Fig 2.
Comparison of backbone ensemble generation methods.
(a) Experimental specificity profiles. (b) MFPred on the FastRelax backbone ensemble. The p-value of the JSD at each position is shown by the color of the square under that position; white denotes a p-value > 0.5 and dark blue denotes a p-value of 0. The circle to the right of each profile represents the cosine similarity (white) and AUC (black) of that profile. The ROC plots beneath each profile depict the SSAL calculation via the experimental ROC (blue) and predicted ROC (red), with their respective AUC values. (c) MFPred on the FlexPepDock backbone ensemble. (d) MFPred on the Backrub backbone ensemble.
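Two of the comparison measures in this legend can be illustrated with short definitions. Below is a per-position Jensen-Shannon divergence (base 2, so values lie in [0, 1]) and the cosine similarity between two profiles. This is a sketch under assumptions: the log base and flattening the whole profile into one vector for cosine similarity are choices made here, and the function names are hypothetical. The JSD p-values and the SSAL/AUC calculation are not described in this legend, so they are not reproduced.

```python
import numpy as np

def _kl(a, b):
    # Kullback-Leibler divergence in bits; terms with a == 0 contribute 0.
    mask = a > 0
    return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

def jsd(p, q):
    """Jensen-Shannon divergence (base 2) between two residue distributions."""
    p = np.asarray(p, dtype=float); p = p / p.sum()
    q = np.asarray(q, dtype=float); q = q / q.sum()
    m = 0.5 * (p + q)
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)

def per_position_jsd(pred, exp):
    """JSD at each position of two (n_positions, n_residue_types) profiles."""
    return [jsd(p, q) for p, q in zip(np.asarray(pred, float), np.asarray(exp, float))]

def cosine_similarity(pred, exp):
    """Cosine similarity of two profiles, each flattened to a single vector (assumption)."""
    a = np.ravel(np.asarray(pred, dtype=float))
    b = np.ravel(np.asarray(exp, dtype=float))
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```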
Fig 3.
Number of sequences vs. accuracy and information for methods of profile prediction.
(a)-(d) Number of sequences vs. accuracy for TEV, HCV, GrB, and HIV, respectively. The number of sequences is varied over 1, 5, 10, and all experimentally derived sequences; the total number of sequences differs for each protease. (e)-(h) Number of sequences vs. difference in information content (i.e., profile shape) for TEV, HCV, GrB, and HIV, respectively. The information difference equals the predicted bits minus the experimental bits. An information difference close to zero indicates that the prediction approximates the experimental information content well. A highly positive difference indicates a more peaked predicted profile than the experimental one, while a highly negative difference indicates a flatter predicted profile.
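The information difference in panels (e)-(h) can be sketched as predicted bits minus experimental bits. The legend does not define how bits are computed. Here we assume the sequence-logo convention: for each position, log2(20) minus the Shannon entropy, summed over positions. The function names are hypothetical.

```python
import numpy as np

def information_bits(profile, alphabet_size=20):
    """Total information (bits) of a (n_positions, alphabet_size) profile,
    assuming logo-style log2(alphabet_size) - H per position, summed."""
    prof = np.asarray(profile, dtype=float)
    prof = prof / prof.sum(axis=1, keepdims=True)
    # Treat 0 * log2(0) as 0; suppress the warnings raised while evaluating log2(0).
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(prof > 0, prof * np.log2(prof), 0.0)
    entropy = -terms.sum(axis=1)
    return float(np.sum(np.log2(alphabet_size) - entropy))

def information_difference(pred, exp, alphabet_size=20):
    """Predicted bits minus experimental bits: > 0 means a more peaked prediction."""
    return information_bits(pred, alphabet_size) - information_bits(exp, alphabet_size)
```

A uniform position carries 0 bits and a single-residue position carries log2(20) ≈ 4.32 bits. A predicted profile that is sharper than the experimental one therefore gives a positive difference.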
Table 1.
Results of all methods, MFPred (MF), sequence_tolerance (ST), and pepspec (PS), on variously sized backbone ensembles.
Fig 4.
Generalization of MFPred to the PRD benchmark.
(a) Experimental specificity profiles. (b) MFPred prediction. The p-value of the JSD at each position is shown by the color of the square under that position; white denotes a p-value > 0.5 and dark blue denotes a p-value of 0. The circle to the right of each profile represents the cosine similarity (white) and AUC (black) of that profile. For the PDZ domain, prediction was performed at a kT of 0.6, which was found to be optimal for PDZ domains.
Fig 5.
Changes in specificity profile upon granzyme B protease mutation are recapitulated by MFPred. (a) Experimental (bold) specificity (average of Harris et al. [46] and Ruggles et al. [47]) and predicted P3 specificity for WT granzyme B protease. (b)-(c) WT granzyme B protease structure. (d) R192E granzyme B protease active site. (e) Experimental specificity (bold) [46] and predicted P3 specificity for R192E granzyme B protease. (f) R192E/N218A granzyme B protease active site. (g) Experimental specificity (bold) [47] and predicted P3 specificity for R192E/N218A granzyme B protease.
Table 2.
Substrates for proteases and PRDs.