Effective Automated Feature Construction and Selection for Classification of Biological Sequences | PLOS One

Figure 1.

The EFFECT framework consists of two algorithms, EFC and EFS, as detailed in the Methods section.
EFC conducts a biased exploration of a vast space of potentially complex features to find a set of top features, and EFS then reduces this set to a subset of informative, low-redundancy features. The remaining features are used to transform sequence data into vector data that can be separated by any classifier.
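
As a rough illustration of the last step in this caption, the sketch below maps sequences to binary vectors using a list of already selected features. It assumes each feature is a Boolean predicate over a sequence; the names `has_motif`, `has_base_at`, and `to_vectors` are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence

# Assumption: a feature is any Boolean test on a sequence string.
Feature = Callable[[str], bool]


def has_motif(motif: str) -> Feature:
    """Hypothetical compositional feature: motif occurs anywhere in the sequence."""
    return lambda seq: motif in seq


def has_base_at(position: int, base: str) -> Feature:
    """Hypothetical positional feature: base occurs at a 0-based position."""
    return lambda seq: position < len(seq) and seq[position] == base


def to_vectors(sequences: Sequence[str], features: Sequence[Feature]) -> List[List[int]]:
    """Turn each sequence into a 0/1 vector with one entry per feature."""
    return [[int(f(seq)) for f in features] for seq in sequences]


# Illustrative feature set standing in for the output of a selection step.
selected = [has_motif("GT"), has_base_at(0, "A")]
vectors = to_vectors(["AGTC", "CCCC"], selected)
```

The resulting rows can be passed to any standard vector-based classifier, which is the point the caption makes.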

Table 1.

IUPAC code is adapted from [100].

Figure 2.

Conjunction features combining one positional and one compositional feature.

Figure 3.

Disjunction features combining one positional feature and the negation of one compositional feature.
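
The conjunction and disjunction features named in the captions of Figures 2 and 3 can be sketched as Boolean combinators. This is a minimal sketch, assuming a positional feature tests for a base at a fixed position and a compositional feature tests for a motif anywhere in the sequence; these definitions and all function names are illustrative assumptions, not the paper's implementation.

```python
from typing import Callable

# Assumption: a feature is any Boolean test on a sequence string.
Feature = Callable[[str], bool]


def positional(position: int, base: str) -> Feature:
    """Hypothetical positional feature: base at a 0-based position."""
    return lambda seq: position < len(seq) and seq[position] == base


def compositional(motif: str) -> Feature:
    """Hypothetical compositional feature: motif present anywhere."""
    return lambda seq: motif in seq


def negate(feature: Feature) -> Feature:
    """True exactly when the wrapped feature is false."""
    return lambda seq: not feature(seq)


def conjunction(*features: Feature) -> Feature:
    """True only if every child feature holds."""
    return lambda seq: all(f(seq) for f in features)


def disjunction(*features: Feature) -> Feature:
    """True if at least one child feature holds."""
    return lambda seq: any(f(seq) for f in features)


# One positional AND one compositional feature (as in the Figure 2 caption).
conj = conjunction(positional(2, "G"), compositional("TATA"))
# One positional OR the negation of one compositional feature (Figure 3 caption).
disj = disjunction(positional(2, "G"), negate(compositional("TATA")))
```

Because each combinator returns another feature, combinations can be nested to form a tree of arbitrary depth.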

Table 2.

A table of non-terminals and terminals employed in feature construction.

Table 3.

auROC and auPRC comparison analysis for recognition of HSS sites.

Table 4.

auROC and auPRC comparison analysis for recognition of splice sites on NN269 dataset.

Table 5.

auROC and auPRC comparison analysis for recognition of splice sites on C. elegans dataset.

Table 6.

auROC and auPRC comparison analysis for recognition of ALU sites.

Figure 4.

A complex disjunctive feature is obtained by the EFFECT framework for the ALU sequence classification problem.
The feature is shown in the tree representation employed for features in this work.

Figure 5.

A Clustal alignment of the sequences shows the same overrepresented signals that the EFFECT framework combines, as lower-level features, in the higher-level disjunctive feature.

Figure 6.

Information gain for features obtained by EFFECT on the HS site dataset is compared to that obtained by feature-based methods used for comparison in this work.
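
For readers unfamiliar with the quantity compared in Figures 6 to 9, the sketch below computes the standard information gain of a discrete feature with respect to class labels, i.e. the label entropy minus the label entropy conditioned on the feature value. It is a generic textbook formulation with hypothetical function names; the paper's exact computation may differ.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values: Sequence[Hashable],
                     labels: Sequence[Hashable]) -> float:
    """H(labels) - H(labels | feature) for paired feature values and labels."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature_values):
        # Labels of the examples on which the feature takes this value.
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional


# A feature that matches the labels exactly versus one that carries no signal.
perfect = information_gain([1, 1, 0, 0], [1, 1, 0, 0])
useless = information_gain([1, 0, 1, 0], [1, 1, 0, 0])
```

With balanced binary labels the gain ranges from 0 bits for an uninformative feature to 1 bit for a feature that determines the label.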

Figure 7.

Information gain for features obtained by EFFECT on the NN269 Acceptor dataset is compared to that obtained by feature-based methods used for comparison in this work.

Figure 8.

Information gain for features obtained by EFFECT on the NN269 Donor dataset is compared to that obtained by feature-based methods used for comparison in this work.

Figure 9.

Information gain for features obtained by EFFECT on the ALU dataset is compared to that obtained by feature-based methods used for comparison in this work.
