Fig 1.
Workflow of DNABP
Table 1.
Comparison of the performances of various features using the RF algorithm based on the Mainset with five-fold cross-validation
Fig 2.
The IFS curve showing MCC values plotted against feature numbers.
The maximum MCC value was 0.727 when the top 64 features were selected.
Table 2.
The optimal 64 features for the prediction of DNA-binding proteins
Table 3.
The performance of DNABP, enDNA-Prot, iDNA-Prot|dis and nDNA-Prot based on the Testset
Table 4.
Comparison of the performances of DNABP and enDNA-Prot based on various test datasets
Fig 3.
(a) Feature distribution for the 64 optimal features. (b) The selection proportion of each type of feature.
Fig 4.
(a) Physicochemical property distribution of the 38 PSSM-PP features that were selected in the optimal feature set. (b) Distribution of the amino acid types used to construct the 38 PSSM-PP features that were selected in the optimal feature set.
Fig 5.
(a) Physicochemical property distribution used to construct the 23 PHY features that were selected in the optimal feature set. (b) Distribution of the three descriptors used to construct the 23 PHY features that were selected in the optimal feature set.
Table 5.
Comparison of the performances of various datasets using the RF algorithm based on 292 features with five-fold cross-validation