Figure 1.
The average ROC scores of the sequence-based PDT approach with different β values on the SCOP 1.53 dataset.
Table 1.
Comparison against the methods based on sequence composition information.
Figure 2.
The influence on performance of the number of amino acids randomly chopped from the first 20 amino acids at the N-terminus of the target proteins.
Table 2.
Ordered list of discriminative features of SVM-PDT.
Figure 3.
The average ROC scores of the profile-based PDT approach with different values of β.
Table 3.
Comparison against the profile-based methods.
Figure 4.
A schematic diagram of the physicochemical distance transformation approach with λ values of 1 (subfigure a), 2 (subfigure b) and 3 (subfigure c).
A1 is the first amino acid in the protein sequence; AL is the Lth amino acid in that protein. Dj(Ai, Ai+λ) is calculated by equation (4) based on index j in AAIndex and measures the correlation between two amino acids separated by a distance λ along the protein sequence. The sequence-order information associated with the physicochemical properties is efficiently reflected by equations (3) and (4).
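As an illustration of the transformation sketched in this figure, the following minimal Python example averages a pairwise distance over all residue pairs separated by λ. Equations (3) and (4) are not reproduced in this caption, so the form used here is an assumption: the squared difference of index values stands in for Dj, and the average over the L − λ pairs stands in for equation (3). The names `pdt_feature`, `pdt_vector`, `squared_difference` and the toy index values are hypothetical and do not come from AAIndex.

```python
from typing import Callable, Dict, List


def squared_difference(x: float, y: float) -> float:
    """Assumed stand-in for D_j: squared difference of two index values."""
    return (x - y) ** 2


def pdt_feature(
    sequence: str,
    index: Dict[str, float],
    lam: int,
    distance: Callable[[float, float], float] = squared_difference,
) -> float:
    """Average the distance over all residue pairs (A_i, A_{i+lam})."""
    if lam < 1 or lam >= len(sequence):
        raise ValueError("lam must satisfy 1 <= lam < len(sequence)")
    n_pairs = len(sequence) - lam
    total = sum(
        distance(index[sequence[i]], index[sequence[i + lam]])
        for i in range(n_pairs)
    )
    return total / n_pairs


def pdt_vector(
    sequence: str, indices: List[Dict[str, float]], max_lam: int
) -> List[float]:
    """One feature per (index, lam) pair, with lam running from 1 to max_lam."""
    return [
        pdt_feature(sequence, index, lam)
        for index in indices
        for lam in range(1, max_lam + 1)
    ]


# Toy index with made-up values for three residues (not real AAIndex data).
toy_index = {"A": 0.0, "C": 1.0, "D": 3.0}
features = pdt_vector("ACDA", [toy_index], max_lam=3)
```

With one index and λ running from 1 to 3, the sketch yields three features, one per subfigure (a), (b) and (c). The distance function is passed in as a parameter so that a different definition of Dj can be substituted without changing the rest of the code.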
Figure 5.
The flowchart of generating the profile-based protein sequences.
The multiple sequence alignment is obtained by PSI-BLAST. The frequency profile is calculated from the multiple sequence alignment. For each column in the frequency profile, the amino acids are sorted in descending order according to their frequencies, and the profile-based sequences are then obtained by combining the n-th most frequent amino acids across all columns.
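The flowchart can be sketched in a few lines of Python, starting from an alignment that is already available (the PSI-BLAST step itself is not shown). Several details below are assumptions that this caption does not specify: gaps are ignored when counting, raw counts are used in place of weighted frequencies, ties are broken alphabetically, and a column with fewer than n distinct residues falls back to its least frequent residue. The function names `frequency_profile` and `profile_based_sequence` are hypothetical.

```python
from collections import Counter
from typing import List


def frequency_profile(msa: List[str]) -> List[Counter]:
    """Count residues per alignment column, ignoring gap characters ('-')."""
    if not msa:
        raise ValueError("alignment is empty")
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all aligned sequences must have the same length")
    return [
        Counter(row[col] for row in msa if row[col] != "-")
        for col in range(length)
    ]


def profile_based_sequence(msa: List[str], n: int = 1) -> str:
    """Combine the n-th most frequent residue of every column into one sequence."""
    if n < 1:
        raise ValueError("n must be at least 1")
    residues = []
    for counts in frequency_profile(msa):
        if not counts:
            continue  # assumption: columns containing only gaps are skipped
        # Sort by descending count; ties are broken alphabetically (assumption).
        ranked = sorted(counts, key=lambda aa: (-counts[aa], aa))
        # Assumption: fall back to the last ranked residue if fewer than n exist.
        residues.append(ranked[min(n, len(ranked)) - 1])
    return "".join(residues)


# Toy alignment with made-up sequences, for illustration only.
toy_msa = ["ACD", "ACE", "GCE"]
first = profile_based_sequence(toy_msa, n=1)
second = profile_based_sequence(toy_msa, n=2)
```

Sorting by raw count within a column gives the same order as sorting by frequency, since every count in a column is divided by the same total. A profile built with sequence weighting or pseudocounts could rank residues differently.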