Hierarchical Classification of Protein Folds Using a Novel Ensemble Classifier

doi:10.1371/journal.pone.0056499

Table 1.

Division of amino acids into three groups according to different physicochemical properties.

Figure 1.

Extraction process of the 188-dimensional (188D) feature vectors (FV).

Protein sequences are input and analyzed in terms of amino acid composition, distribution, and physicochemical properties; the features FV1–FV188 are output as the feature vector.
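
A minimal sketch of how such a feature vector can be assembled, assuming the common composition/transition/distribution (CTD) scheme in which 188 = 20 + 8 × 21: twenty amino acid composition values, plus 21 values (3 composition, 3 transition, 15 distribution) for each of eight physicochemical properties, each property dividing the amino acids into three groups. The grouping `HYDROPHOBICITY_GROUPS` is a hypothetical example (a polar/neutral/hydrophobic split often used with CTD descriptors), not necessarily the grouping of Table 1. The function names and the floor rounding used for the distribution quartiles are likewise assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical example grouping: a polar / neutral / hydrophobic split often
# used in CTD descriptors. Table 1's actual groups may differ.
HYDROPHOBICITY_GROUPS = ("RKEDQN", "GASTPHY", "CLVIMFW")


def amino_acid_composition(seq):
    """20 features: relative frequency of each standard amino acid."""
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def ctd_features(seq, groups):
    """21 features for one property: 3 composition, 3 transition, 15 distribution."""
    lookup = {aa: str(i + 1) for i, group in enumerate(groups) for aa in group}
    encoded = "".join(lookup[aa] for aa in seq if aa in lookup)
    n = len(encoded)
    if n < 2:
        raise ValueError("need at least two standard residues")
    # Composition: fraction of residues falling in each group.
    composition = [encoded.count(g) / n for g in "123"]
    # Transition: fraction of adjacent pairs that switch between two given groups.
    pairs = list(zip(encoded, encoded[1:]))
    transition = [
        sum(1 for a, b in pairs if {a, b} == {x, y}) / (n - 1)
        for x, y in (("1", "2"), ("1", "3"), ("2", "3"))
    ]
    # Distribution: relative position (% of length) of the first, 25%, 50%,
    # 75% and last residue of each group (floor rounding is an assumption).
    distribution = []
    for g in "123":
        positions = [i + 1 for i, s in enumerate(encoded) if s == g]
        if not positions:
            distribution.extend([0.0] * 5)
            continue
        k = len(positions)
        marks = [1] + [max(1, int(p * k)) for p in (0.25, 0.5, 0.75)] + [k]
        distribution.extend(100.0 * positions[m - 1] / n for m in marks)
    return composition + transition + distribution


def feature_vector(seq, property_groupings):
    """20 composition features plus 21 CTD features per property grouping."""
    features = amino_acid_composition(seq)
    for groups in property_groupings:
        features += ctd_features(seq, groups)
    return features


seq = "MKVLAAGIVGLLLAQ"
print(len(feature_vector(seq, [HYDROPHOBICITY_GROUPS])))  # 41 = 20 + 21
```

With eight property groupings passed in `property_groupings`, the same function returns 20 + 8 × 21 = 188 values.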

Figure 2.

The architecture of our ensemble classifier.

The training dataset is classified by every base classifier. After K-Means clustering and circulating combination, the best ensemble result is obtained.
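
The sketch below illustrates one plausible reading of this architecture. It is not the authors' Algorithm 1; every detail is an assumption. Each base classifier is represented by its 0/1 correctness vector on the training set, these vectors are grouped with plain K-Means, the most accurate classifier from each cluster is kept, and "circulating combination" is approximated by trying every subset of the kept classifiers under majority voting and keeping the most accurate subset. The names `kmeans`, `majority_vote` and `select_ensemble` are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def kmeans(vectors, k, iters=50, seed=0):
    """Plain Lloyd's K-Means; returns a cluster index for each vector."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])))
            for v in vectors
        ]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def majority_vote(prediction_lists):
    """Per-sample majority vote; ties go to the label seen first."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*prediction_lists)]


def select_ensemble(predictions, truth, k):
    """predictions maps a classifier name to its predicted labels on the training set."""
    names = sorted(predictions)
    # Describe each base classifier by its 0/1 correctness vector over the samples.
    vectors = [[float(p == t) for p, t in zip(predictions[n], truth)] for n in names]
    labels = kmeans(vectors, k)
    # Keep the most accurate classifier from each cluster.
    reps = sorted(
        max((n for n, lab in zip(names, labels) if lab == c),
            key=lambda n: accuracy(predictions[n], truth))
        for c in set(labels)
    )
    # Assumed stand-in for circulating combination: try every subset of the
    # representatives and keep the one whose majority vote is most accurate.
    best, best_acc = None, -1.0
    for r in range(1, len(reps) + 1):
        for combo in combinations(reps, r):
            acc = accuracy(majority_vote([predictions[n] for n in combo]), truth)
            if acc > best_acc:
                best, best_acc = combo, acc
    return best, best_acc


preds = {"A": [0, 1, 2, 0], "B": [0, 1, 2, 1], "C": [1, 0, 0, 0]}
truth = [0, 1, 2, 0]
print(select_ensemble(preds, truth, k=2))  # (('A',), 1.0)
```

Exhaustive subset search is only practical for a handful of cluster representatives; clustering first keeps that number small.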

Table 2.

Algorithm 1. Circulating Combination of EFSS.

Figure 3.

Protein structure levels in SCOP.

The classification of protein classes and of protein folds form the first and second layers, respectively, of our hierarchical classification framework.
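
A minimal sketch of such a two-layer frame, assuming that a separate fold classifier is trained for each class on that class's proteins only, and that a query is routed to the fold classifier of its predicted class. A toy nearest-centroid classifier stands in for the real base classifiers; `NearestCentroid` and `HierarchicalFoldClassifier` are hypothetical names.

```python
import math
from collections import defaultdict


class NearestCentroid:
    """Toy stand-in base classifier: predicts the label with the closest mean vector."""

    def fit(self, X, y):
        rows_by_label = defaultdict(list)
        for x, label in zip(X, y):
            rows_by_label[label].append(x)
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in rows_by_label.items()
        }
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda label: math.dist(x, self.centroids[label]))


class HierarchicalFoldClassifier:
    """Layer 1 predicts the SCOP class; layer 2 picks a fold among that class's folds only."""

    def fit(self, X, classes, folds):
        self.class_clf = NearestCentroid().fit(X, classes)
        self.fold_clfs = {}
        for c in set(classes):
            idx = [i for i, ci in enumerate(classes) if ci == c]
            self.fold_clfs[c] = NearestCentroid().fit([X[i] for i in idx], [folds[i] for i in idx])
        return self

    def predict(self, x):
        protein_class = self.class_clf.predict(x)
        return protein_class, self.fold_clfs[protein_class].predict(x)


X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
model = HierarchicalFoldClassifier().fit(X, ["a", "a", "b", "b"], ["a.1", "a.2", "b.1", "b.1"])
print(model.predict([0.0, 0.9]))  # ('a', 'a.2')
```

Restricting the second layer to the folds of one class shrinks the label space each fold classifier must separate, at the cost that a first-layer error cannot be corrected later.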

Figure 4.

Comparison of the two datasets.

In each query description, the first letter represents the class name and the second digit represents the fold number. The SCOP dataset used in this paper is shown in red. The seven classes of the SCOP dataset, containing 1195 folds in total (omitted from the figure), are: (a) all-α proteins (284 folds), (b) all-β proteins (174 folds), (c) α/β proteins (147 folds), (d) α+β proteins (376 folds), (e) multi-domain proteins (66 folds), (f) membrane and cell surface proteins and peptides (58 folds), and (g) small proteins (90 folds). The benchmark dataset [3] proposed by Ding and Dubchak, composed of 27 folds extracted from SCOP, is shown in blue.
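
As a quick consistency check, the per-class fold counts listed above add up to the stated total of 1195. The snippet also sketches parsing a label into its class letter and fold number, assuming a hypothetical `class.fold` form such as `c.37`; the exact label format in the figure is not reproduced here.

```python
# Fold counts per SCOP class, copied from the caption of Figure 4.
FOLDS_PER_CLASS = {"a": 284, "b": 174, "c": 147, "d": 376, "e": 66, "f": 58, "g": 90}


def parse_label(label):
    """Split a hypothetical 'class.fold' label such as 'c.37' into ('c', 37)."""
    protein_class, fold = label.split(".")
    return protein_class, int(fold)


print(sum(FOLDS_PER_CLASS.values()))  # 1195
print(parse_label("c.37"))  # ('c', 37)
```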

Figure 5.

Comparison of success rate among several studies.

Our method outperforms all of the previous methods compared here, with an accuracy of 74.21%.

Figure 6.

Success rate achieved by three classifiers with different sequence identity.

The two graphs show the results on two datasets: (a) SCOP version 1.75 and (b) SCOP version 1.75A. The similar success rates on both datasets demonstrate the robustness of our model. As the sequence identity threshold increases, the test becomes less stringent and the success rate rises. The graphs also show that our ensemble classifier outperforms the other two classifiers.
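
A minimal sketch of how success rate can be tabulated against a sequence identity threshold, assuming that each test protein carries its highest sequence identity (%) to any training protein and that only proteins at or below the threshold are scored. The function `success_rate_by_identity` and the toy data are hypothetical; the example shows why a looser threshold tends to raise the success rate.

```python
def success_rate_by_identity(predicted, actual, identity, thresholds):
    """Success rate on test proteins whose identity to the training set is <= each threshold.

    identity[i] is a hypothetical per-protein value: the highest sequence identity (%)
    between test protein i and any training protein.
    """
    rates = {}
    for t in thresholds:
        kept = [i for i, s in enumerate(identity) if s <= t]
        correct = sum(predicted[i] == actual[i] for i in kept)
        rates[t] = correct / len(kept) if kept else float("nan")
    return rates


predicted = ["a.1", "b.2", "c.3", "d.4"]
actual = ["a.1", "b.1", "c.3", "d.4"]
identity = [90, 20, 40, 60]
print(success_rate_by_identity(predicted, actual, identity, [25, 40, 100]))
# {25: 0.0, 40: 0.5, 100: 0.75}
```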