Fig 1.
Flowchart of the NABind algorithm.
(A) NABind comprises four basic modules (i.e., a deep learning-based module, a template-based module, a merging module, and a post-processing module). In the deep learning module, proteins were represented as graphs, which were then fed into the EGAT layer and the fully connected layer for node classification. In the template module, multiple templates were retrieved for each query, and template-related features were generated for supervised learning. (B) Features used in different modules. For deep learning, node features included sequence descriptors and structural descriptors, while edge features included the distance and orientation between residues. The template features included overall alignment descriptors and residue-based alignment descriptors. (C) Schematic of the merging module. This module was implemented with the LGBM method and took the outputs of the deep learning and template modules as inputs. (D) Schematic of the post-processing module. In this module, a random walk process was performed on the surface residue network to optimize the binding probabilities.
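The post-processing idea in panel (D) can be illustrated with a minimal sketch. It is not the NABind implementation: the caption does not give the update rule, so the code below assumes a simple propagation in the spirit of a random walk with restart, in which each residue's score is blended with the mean score of its neighbours in the residue network. The function name, the `restart` weight, and the iteration count are hypothetical.

```python
def random_walk_smooth(adjacency, initial_probs, restart=0.5, n_iter=50):
    """Smooth per-residue scores on a residue network (illustrative only).

    adjacency: dict mapping residue index -> list of neighbouring indices
    initial_probs: initial binding probabilities, one per residue
    restart: assumed weight for returning to the initial score
    n_iter: assumed number of propagation steps
    """
    current = list(initial_probs)
    for _ in range(n_iter):
        updated = []
        for i, p0 in enumerate(initial_probs):
            neighbours = adjacency.get(i, [])
            if neighbours:
                # Mean score of neighbouring residues at the previous step
                spread = sum(current[j] for j in neighbours) / len(neighbours)
            else:
                # Isolated residues keep their own score
                spread = current[i]
            updated.append(restart * p0 + (1.0 - restart) * spread)
        current = updated
    return current


# Toy network of three residues in a chain: 0 - 1 - 2
toy_adjacency = {0: [1], 1: [0, 2], 2: [1]}
toy_probs = [0.9, 0.1, 0.9]
smoothed = random_walk_smooth(toy_adjacency, toy_probs)
```

In this toy case the low-scoring middle residue is pulled up by its two high-scoring neighbours, which is the kind of spatial smoothing the caption describes.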
Fig 2.
Comparison of deep learning models using different types of features and comparison of structural features of binding and non-binding residues.
(A) AUC measures for different types of features. Significance tests were performed as described in the Methods section. (B) MCC measures for different types of features. (C) Scatter plots of AUC for native structures and an example with prediction results generated by different types of features. (D) Comparison of partial structural features between DNA-binding and non-binding residues. The complete comparison is presented in S2 Fig. ORC: Ollivier-Ricci curvature, FRC: Forman-Ricci curvature, MFD: multifractal dimension, MIR: minimum inaccessible radius, ASV: accessible shell volume, and USR: ultrafast shape recognition. Significant differences were evaluated using the Wilcoxon rank-sum test. **** p < 0.0001, *** 0.0001 ≤ p < 0.001, ** 0.001 ≤ p < 0.01, * 0.01 ≤ p < 0.05, and ns: p ≥ 0.05.
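The significance notation above maps a p-value to a label by fixed thresholds. A small sketch of that mapping follows; the function name is hypothetical, and the thresholds are taken directly from the legend.

```python
def significance_label(p_value):
    """Return the star label used in the legend for a given p-value."""
    if p_value < 0.0001:
        return "****"
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return "ns"
```

Each lower bound is inclusive, so a p-value of exactly 0.05 is labelled "ns" and a p-value of exactly 0.0001 is labelled "***".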
Fig 3.
Usefulness of the template-based module and the merging module.
(A) Performance comparison of the deep learning-based module, template-based module, and merging module. (B) Relationship between the AUC of chains and the TM-score of the best template. (C) Increments in AUC by incorporating template-based predictions and the relationship between the increment in AUC of difficult cases and the TM-score of the best template. Difficult cases are chains whose AUC in the deep learning phase was below the average AUC of all chains. (D) An example with prediction results of NABindDL, NABindTL and NABindMer.
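The definition of difficult cases in panel (C) amounts to a filter against the mean per-chain AUC. A minimal sketch follows, with a hypothetical function name and made-up chain identifiers and AUC values used purely for illustration.

```python
def difficult_cases(chain_auc):
    """Return chains whose AUC is below the mean AUC of all chains.

    chain_auc: dict mapping chain identifier -> AUC from the deep learning phase
    """
    mean_auc = sum(chain_auc.values()) / len(chain_auc)
    return sorted(chain for chain, auc in chain_auc.items() if auc < mean_auc)


# Illustrative values only; these are not results from the paper.
example_auc = {"chain_1": 0.95, "chain_2": 0.60, "chain_3": 0.85}
hard_chains = difficult_cases(example_auc)
```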
Table 1.
Performance of different modules on training sets using 5-fold cross-validation.
Fig 4.
Comparison between NABind and our previous methods (i.e. DNABind and RBRDetector).
(A) Differences in the design strategy of each module. The feature-based module adopted different feature representations and supervised learning models. The template-based module utilized different approaches for constructing the template library and inferring binding residues based on retrieved templates. The integration module used the stacking strategy instead of a piecewise function. A newly designed post-processing module was used in the updated method. (B) Comparison of improved and traditional residue representations using random forest classifiers. (C) Performance comparison of NABindDL, DNABindML and RBRDetectorML. (D) Statistics of best templates retrieved by current and previous methods. (E) Performance comparison of NABindTL, DNABindTL and RBRDetectorTL. (F) Performance comparison of NABind and our previous methods.
Table 2.
Comparison of NABind and existing methods on test sets.
Fig 5.
Prediction results of several examples generated by NABind and other state-of-the-art methods.
(A) Results for the native structure of a DBP (PDB ID: 7D7C_F). (B) Results for the trRosetta-based predicted structure of this DBP. (C) Results for the native structure of an RBP (PDB ID: 5Y58_B). (D) Results for the trRosetta-based predicted structure of this RBP.