Table 1.

Division of amino acids into 3 different groups by different physicochemical properties.

Figure 1.

Extraction process of the 188-dimensional (188D) feature vectors (FV).

Input sequences are processed by analyzing amino acid composition, distribution, and protein physicochemical properties; the resulting features FV1–FV188 form the output feature vector.
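A minimal sketch of this kind of extraction is given below, assuming one common layout that yields 188 dimensions: 20 amino acid composition frequencies plus 21 composition/transition/distribution values for each of 8 physicochemical properties (20 + 8 × 21 = 188). Whether the article uses exactly this layout is an assumption. The three-group hydrophobicity split in `GROUPS`, the function names, and the quantile convention in `ctd` are illustrative choices, not taken from Table 1.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed three-group split for one property (hydrophobicity).
# Illustrative only: the article's Table 1 is not reproduced in this listing.
GROUPS = {"1": "RKEDQN", "2": "GASTPHY", "3": "CVLIMFW"}


def composition(seq):
    """Return the 20 amino acid frequencies of seq (standard residues only)."""
    residues = [aa for aa in seq if aa in AMINO_ACIDS]
    counts = Counter(residues)
    n = len(residues)
    return [counts[aa] / n if n else 0.0 for aa in AMINO_ACIDS]


def ctd(seq, groups=GROUPS):
    """Return 21 values for one property: 3 composition, 3 transition, 15 distribution."""
    lookup = {aa: g for g, members in groups.items() for aa in members}
    s = "".join(lookup[aa] for aa in seq if aa in lookup)
    n = len(s)
    labels = sorted(groups)
    if n == 0:
        return [0.0] * 21

    # Composition: fraction of residues that fall in each group.
    comp = [s.count(g) / n for g in labels]

    # Transition: fraction of adjacent pairs that switch between two given groups.
    pairs = [(a, b) for i, a in enumerate(labels) for b in labels[i + 1:]]
    steps = list(zip(s, s[1:]))
    trans = [
        sum(1 for x, y in steps if {x, y} == {a, b}) / len(steps) if steps else 0.0
        for a, b in pairs
    ]

    # Distribution: relative position of the first, 25%, 50%, 75% and last
    # residue of each group along the sequence.
    dist = []
    for g in labels:
        pos = [i + 1 for i, c in enumerate(s) if c == g]
        for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
            if not pos:
                dist.append(0.0)
                continue
            k = max(1, math.ceil(frac * len(pos)))
            dist.append(pos[k - 1] / n)

    return comp + trans + dist


def features(seq, property_groups):
    """Concatenate composition with one CTD block per property grouping."""
    vec = composition(seq)
    for groups in property_groups:
        vec += ctd(seq, groups)
    return vec
```

Calling `features(seq, [GROUPS] * 8)` returns a list of length 188. Reusing the same grouping eight times only demonstrates the dimensionality; a real extraction would pass eight different property groupings.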

Figure 2.

The architecture of our ensemble classifier.

The training dataset is classified by all base classifiers. K-Means clustering followed by circulating combination then selects the best ensemble result.

Table 2.

Algorithm 1. Circulating Combination of EFSS.

Figure 3.

Protein structure levels in SCOP.

The classification of protein classes and the classification of protein folds are the first and second layers, respectively, of our hierarchical classification framework.

Figure 4.

Comparison of the two datasets.

In each query description, the first letter represents the class name and the second digit represents the fold number. The SCOP dataset used in this paper is shown in red. Its seven classes, which together contain 1195 folds (omitted from the figure), are: (a) all-α proteins (284 folds), (b) all-β proteins (174 folds), (c) α/β proteins (147 folds), (d) α+β proteins (376 folds), (e) multi-domain proteins (66 folds), (f) membrane and cell surface proteins and peptides (58 folds), and (g) small proteins (90 folds). The benchmark dataset [3] proposed by Ding and Dubchak, composed of 27 folds extracted from SCOP, is shown in blue.

Figure 5.

Comparison of success rate among several studies.

Our method outperforms all of the previous studies compared, with an accuracy of 74.21%.

Figure 6.

Success rate achieved by three classifiers at different sequence identities.

The two graphs show the results for two datasets: (a) SCOP version 1.75 and (b) SCOP version 1.75A. The similar success rates on both demonstrate the robustness of our model. As the sequence identity threshold increases, the filtering becomes less stringent and the success rate rises. The graphs also show that our ensemble classifier outperforms the other two classifiers.

Table 3.

Performance of different classifiers on protein fold recognition (one sequence in each family).

Table 4.

Performance of different classifiers on protein fold recognition (sequences at 35% identity).

Table 5.

Preliminary results* of PCA.

Table 6.

Loadings of the most informative features* on principal component factors.

Figure 7.

Success rate of seven subsets with different sequence identities.

The figure shows the factors that influence success rate. Success rate tends to increase as sequence identity rises or as the number of classes drops.

Table 7.

Factors influencing the success rate of the 1st and 2nd hierarchical layers.
