Table 1.
Summary of the considered features, where x, x′ = {A, R, N, D, C, Q, E, G, H, I, L, K, M, F, P, S, T, W, Y, V} denotes the 20 AA types, y = {C, H, E} denotes the three secondary structure states, h = {0.1, 0.2, 0.3, 0.4, 0.5} denotes the cutoff used to categorize the buried/exposed residues based on their relative solvent accessibility, t = {0, 25, 50, 75, 100} denotes the ratio for computing the percentile values, and m = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} denotes the lag for calculating the auto-correlation coefficients.
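The parameter sets h, t, and m in this caption can be illustrated with a minimal sketch. It assumes a hypothetical per-residue profile (for example, predicted relative solvent accessibility values in [0, 1]). The function names, the linear-interpolation percentile, and the mean-centred auto-covariance formula are assumptions made for illustration; the caption does not specify the exact definitions used for the features.

```python
from statistics import mean

H_CUTOFFS = (0.1, 0.2, 0.3, 0.4, 0.5)   # h: buried/exposed cutoffs listed in the caption
T_PERCENTS = (0, 25, 50, 75, 100)       # t: percentile ratios listed in the caption
M_LAGS = tuple(range(1, 11))            # m: auto-correlation lags listed in the caption


def percentile(values, t):
    """t-th percentile using linear interpolation (assumed definition)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * t / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def exposed_fraction(rsa, h):
    """Fraction of residues whose relative solvent accessibility exceeds h."""
    return sum(1 for v in rsa if v > h) / len(rsa)


def autocovariance(profile, m):
    """Mean-centred auto-covariance at lag m (assumed form of the coefficient)."""
    mu = mean(profile)
    n = len(profile) - m
    if n <= 0:
        return 0.0  # the sequence is too short for this lag
    return sum((profile[i] - mu) * (profile[i + m] - mu) for i in range(n)) / n


def profile_features(rsa):
    """Collect one value per h, t, and m into a dictionary of named features."""
    feats = {}
    for h in H_CUTOFFS:
        feats[f"exposed_h{h}"] = exposed_fraction(rsa, h)
    for t in T_PERCENTS:
        feats[f"pct_t{t}"] = percentile(rsa, t)
    for m in M_LAGS:
        feats[f"ac_m{m}"] = autocovariance(rsa, m)
    return feats
```

For a profile of length L, the sketch yields 5 + 5 + 10 = 20 values; lags with m ≥ L fall back to 0.0, which is one possible convention among several.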
Table 2.
Comparison of the prediction performance of the Gaussian naïve Bayes (GNB)-based wrapper, logistic regression (LogR)-based wrapper, decision tree (DT)-based wrapper, k-nearest neighbor (KNN)-based wrapper, and two support vector machine (SVM)-based wrappers with the RBF and polynomial kernels (denoted as SVM-RBF and SVM-Poly, respectively).
Figure 1.
The flowchart of the proposed method.
Figure 2.
The improvement of MCC values (y axis) with the increasing number of selected features (x axis) during the wrapper-based feature selection.
A forward, best-first search was executed using both 10 runs of 5-fold cross-validation (CV) and jackknife tests on the PDB594 dataset. The standard deviations of the MCC values over the 10 runs of 5-fold CV are shown as error bars.
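The selection loop described in this caption can be sketched as follows. This is a simplified greedy forward selection, not a full best-first search (it keeps no queue of alternative subsets and never backtracks). The names `forward_select` and `score_fn` are hypothetical: `score_fn` stands in for whatever evaluates a feature subset, for example the mean MCC of a wrapped classifier under cross-validation. The `mcc` function is the standard Matthews correlation coefficient for binary labels.

```python
from math import sqrt


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def forward_select(n_features, score_fn, max_features=None):
    """Greedy forward selection (a simplification of best-first search).

    score_fn(subset) is a hypothetical callback that returns a score such as
    the cross-validated MCC of a classifier trained on `subset`.
    Returns the selected indices and the score reached after each addition.
    """
    selected, history = [], []
    best = float("-inf")
    limit = max_features if max_features is not None else n_features
    while len(selected) < limit:
        candidates = [i for i in range(n_features) if i not in selected]
        scored = [(score_fn(tuple(selected + [i])), i) for i in candidates]
        top_score, top_idx = max(scored, key=lambda pair: pair[0])
        if top_score <= best:
            break  # no remaining feature improves the score
        selected.append(top_idx)
        history.append(top_score)
        best = top_score
    return selected, history
```

Plotting `history` against the number of selected features gives a curve of the kind shown in the figure; repeating the CV runs with different splits would supply the standard deviations for error bars.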
Table 3.
Comparison of DBPPred with the existing methods based on independent blind tests on the same dataset PDB186.
Figure 3.
ROC curves for the predictions of DNA-binding proteins on the PDB186 dataset.
We compare the predictions of DBPPred with those of DNABIND and DNAbinder, which provide real-valued outputs.
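A ROC curve requires real-valued scores because it is traced by sweeping a decision threshold over them. The sketch below shows one common way to compute the curve and its area; the function names are hypothetical, and it assumes labels coded as 0/1 with at least one positive and one negative example.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by lowering a threshold over the scores."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```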
Table 4.
List of false positive rates of the proposed DBPPred and the existing iDNA-Prot, DNA-Prot, DNAbinder and DNABIND on datasets NDBP4025, RB174, RB256 and RB430.
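For reference, the false positive rate is FP / (FP + TN). The sketch below uses a hypothetical function name and assumes labels coded as 0/1; if a dataset contains only negatives (an assumption here, since the caption does not describe the composition of the four datasets), the rate reduces to the fraction of proteins predicted as DNA-binding.

```python
def false_positive_rate(y_true, y_pred):
    """FP / (FP + TN) for binary labels coded as 0/1."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn) if (fp + tn) else 0.0
```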
Table 5.
The mean values of the 56 selected features and the P values that quantify the significance of the differences between DNA-binding and non-DNA-binding proteins for the PDB594 dataset.