Using Amino Acid Physicochemical Distance Transformation for Fast Protein Remote Homology Detection

doi:10.1371/journal.pone.0046633

Figure 1.

The average ROC scores of the sequence-based PDT approach with different β values on the SCOP 1.53 dataset.

Table 1.

Comparison against the methods based on sequence composition information.

Figure 2.

The influence on performance of the number of amino acids randomly chopped from the first 20 residues at the N-terminus of the target proteins.

Table 2.

Ordered list of discriminative features of SVM-PDT.

Figure 3.

The average ROC scores of the profile-based PDT approach with different values of β.

Table 3.

Comparison against the profile-based methods.

Figure 4.

A schematic diagram of the physicochemical distance transformation approach with λ values of 1 (subfigure a), 2 (subfigure b) and 3 (subfigure c).

A₁ is the first amino acid in the protein sequence and A_L is the L-th. D_j(A_i, A_{i+λ}) is calculated by equation (4) from index j in AAIndex and measures the correlation between two amino acids separated by a distance λ along the protein sequence. Equations (3) and (4) thereby capture the sequence-order information associated with the physicochemical properties.
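
The transformation described in this caption can be sketched in a few lines of Python. Equations (3) and (4) are not reproduced in this listing, so the sketch assumes that D_j is the squared difference of the index values of the two residues and that the feature for a given λ is the average of D_j over all residue pairs at that distance. The function name `pdt_features` and the toy index values are hypothetical and are not taken from AAIndex.

```python
def pdt_features(sequence, indices, max_lambda):
    """Return one feature per (index j, distance lambda) pair.

    sequence   -- protein sequence as a string of one-letter codes
    indices    -- list of dicts mapping amino acid -> index value
                  (stand-ins for AAIndex entries; assumed already normalized)
    max_lambda -- largest sequence distance to consider
    """
    features = []
    for index in indices:
        for lam in range(1, max_lambda + 1):
            n_pairs = len(sequence) - lam
            if n_pairs <= 0:
                # Sequence too short for this distance: emit a neutral value.
                features.append(0.0)
                continue
            # Assumed form of D_j: squared difference of the two index values.
            total = sum(
                (index[sequence[i]] - index[sequence[i + lam]]) ** 2
                for i in range(n_pairs)
            )
            features.append(total / n_pairs)
    return features


# Toy index with made-up values, for illustration only.
toy_index = {"A": 0.0, "C": 1.0, "D": -1.0}
toy_features = pdt_features("ACDA", [toy_index], max_lambda=3)
print(toy_features)
```

Under these assumptions the feature vector has one entry per combination of index and λ, so its length does not depend on the sequence length.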

Figure 5.

The flowchart for generating the profile-based protein sequences.

The multiple sequence alignment is obtained by PSI-BLAST, and the frequency profile is calculated from it. For each column in the frequency profile, the amino acids are sorted in descending order of frequency; the n-th profile-based sequence is then obtained by combining the n-th most frequent amino acid of every column.
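
The column-wise ranking step can be illustrated with a minimal Python sketch. It starts from a frequency profile that has already been computed, so the PSI-BLAST search and the profile calculation are outside its scope. The function name `profile_based_sequences`, the dict-per-column layout, the alphabetical tie-break and the three-letter toy alphabet are all assumptions made for illustration.

```python
def profile_based_sequences(profile, n_sequences):
    """Build profile-based sequences from a frequency profile.

    profile     -- list of dicts, one per alignment column,
                   mapping amino acid -> frequency in that column
    n_sequences -- how many ranked sequences to return
    """
    # Sort the amino acids of each column by descending frequency.
    # Ties are broken alphabetically (an assumption, not from the source).
    ranked_columns = [
        sorted(column, key=lambda aa: (-column[aa], aa)) for column in profile
    ]
    # The n-th sequence takes the n-th most frequent residue of each column.
    return [
        "".join(column[rank] for column in ranked_columns)
        for rank in range(n_sequences)
    ]


# Toy profile over a three-letter alphabet, with made-up frequencies.
toy_profile = [
    {"A": 0.6, "C": 0.3, "D": 0.1},
    {"A": 0.2, "C": 0.1, "D": 0.7},
    {"A": 0.1, "C": 0.8, "D": 0.1},
]
toy_sequences = profile_based_sequences(toy_profile, 3)
print(toy_sequences)
```

The first returned sequence is the consensus of the alignment; later ones combine progressively less frequent residues.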
