Figure 1.
An example to show the process of generating the PseAAC of Distance-Pairs with Reduced Alphabet Scheme cp(3).
The characters C1, C2, and C3 represent the three different clusters and are coloured orange, blue, and yellow, respectively. When the maximum pairwise distance is d = 2, the occurrence frequencies can be derived from Eq. 12, which also determines the corresponding dimension of the PseAAC vector. See the text for further explanation.
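The process summarised in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the three-cluster grouping `HYPOTHETICAL_CP3`, the function name, and the normalisation (each count divided by the number of positions or pairs available at that distance) are assumptions made for illustration, and Eq. 12 in the main text remains the authoritative definition. In this sketch, a three-cluster alphabet with d = 2 yields 3 + 2 × 9 = 21 entries.

```python
from collections import Counter
from itertools import product

# Hypothetical three-cluster grouping of the 20 amino acids (an assumption
# for illustration; the actual cp(3) clusters are defined in the paper).
HYPOTHETICAL_CP3 = {
    "C1": "AVLIMCFWYGP",
    "C2": "STNQH",
    "C3": "DEKR",
}


def distance_pair_features(sequence, max_distance, clusters=HYPOTHETICAL_CP3):
    """Return cluster frequencies followed by distance-pair frequencies.

    Assumed normalisation: counts at distance k are divided by the number
    of residue pairs that exist at that distance (len(sequence) - k).
    """
    labels = sorted(clusters)
    residue_to_label = {
        aa: label for label, members in clusters.items() for aa in members
    }
    reduced = [residue_to_label[aa] for aa in sequence]
    n = len(reduced)

    # Distance 0: composition of the single clusters.
    single = Counter(reduced)
    features = [single[label] / n for label in labels]

    # Distances 1..max_distance: ordered cluster pairs (first, second).
    for dist in range(1, max_distance + 1):
        pairs = Counter(zip(reduced, reduced[dist:]))
        total = n - dist
        for pair in product(labels, repeat=2):
            features.append(pairs[pair] / total if total > 0 else 0.0)
    return features


features = distance_pair_features("ASDASD", max_distance=2)
print(len(features))  # 21 entries: 3 single clusters + 2 distances x 9 pairs
```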
Figure 2.
The overall Acc values achieved by iDNA-Prot|dis for cp(20) with different d values on the benchmark dataset using five-fold cross-validation.
Figure 3.
An illustration for discriminant visualization and interpretation.
(A) The discriminative power of the 400 amino acid pairs. Each element in this panel represents the summed score of the features with positive discriminant weights for a specific distance amino acid pair with cp(20). The amino acids are identified by their one-letter codes. The amino acids on the horizontal axis and the vertical axis indicate the first and the second amino acid in each pair, respectively. The adjacent colour bar shows the mapping of the summed score values. (B) The different discriminant weights of the distance amino acid pair R-R. Three kinds of features have positive discriminative power for the amino acid pair R-R: RR, R*R, and R**R, with distances 1, 2, and 3, respectively. (C) The occurrence distribution of RR, R*R, and R**R in the sequence of protein 1HLVA. The three features occur ten times in total, shown as red dots. The two DNA-binding regions (sequence positions 28–48 and 97–129) are shown in yellow. (D) The distribution of RR in the three-dimensional structure of 1HLVA. Only one RR occurs outside the two DNA-binding regions; it is marked with a red square. (E) The distribution of R*R and R**R in the three-dimensional structure of 1HLVA.
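The distance-pair features named in panels (B) and (C) can be located in a sequence with a short Python sketch. The sequence `toy_sequence` and the region `toy_regions` below are hypothetical stand-ins, not the 1HLVA sequence or its DNA-binding regions, and the helper names are chosen here only for illustration.

```python
def gapped_pair_positions(sequence, first, second, distance):
    """Return 1-based positions of `first` followed by `second`
    exactly `distance` residues later (distance 1 means adjacent)."""
    return [
        i + 1
        for i in range(len(sequence) - distance)
        if sequence[i] == first and sequence[i + distance] == second
    ]


def in_regions(position, regions):
    """True if a 1-based position falls inside any inclusive (start, end) region."""
    return any(start <= position <= end for start, end in regions)


# Hypothetical toy data (not taken from 1HLVA).
toy_sequence = "MRRAKRGRLLRR"
toy_regions = [(1, 4)]

rr = gapped_pair_positions(toy_sequence, "R", "R", 1)      # RR
r_x_r = gapped_pair_positions(toy_sequence, "R", "R", 2)   # R*R
r_xx_r = gapped_pair_positions(toy_sequence, "R", "R", 3)  # R**R

# Occurrences whose first residue lies inside a region of interest.
inside = [p for p in rr + r_x_r + r_xx_r if in_regions(p, toy_regions)]
```

Here a position refers to the first residue of the pair; whether a pair counts as inside a region when only one of its residues falls within it is a design choice the caption does not specify.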
Table 1.
The jackknife test results by iDNA-Prot|dis with different amino acid alphabet profiles (cf. Eqs. 9–13) on the benchmark dataset of Eq. 1 (cf. Supporting Information S1).
Table 2.
A comparison of the jackknife test results by iDNA-Prot|dis with the other methods on the benchmark dataset of Eq. 1.
Figure 4.
The ROC (receiver operating characteristic) curves obtained by different methods on the benchmark dataset using the jackknife tests.
The areas under the ROC curves (AUC) are 0.834, 0.826, 0.814, 0.815, 0.789, and 0.761 for iDNA-Prot|dis (cp(20)), iDNA-Prot|dis (cp(14)), DNAbinder (dimension 21), DNAbinder (dimension 400), DNA-Prot, and iDNA-Prot, respectively. See the main text for further explanation.
Figure 5.
The ROC (receiver operating characteristic) curves obtained by different methods on the independent dataset PDB186.
The areas under the ROC curves (AUC) are 0.786, 0.779, 0.607, 0.694, and 0.791 for iDNA-Prot|dis (cp(20)), iDNA-Prot|dis (cp(14)), DNAbinder, DNABIND, and DBPPred, respectively. See the main text for further explanation.
Table 3.
A comparison of the results obtained by iDNA-Prot|dis and the other methods on the independent dataset PDB186.
Figure 6.
A semi-screenshot to show the top page of the web-server iDNA-Prot|dis, which is available at http://bioinformatics.hitsz.edu.cn/iDNA-Prot_dis/.