Figure 1.
The EFFECT framework consists of two algorithms, EFC and EFS, as detailed in the Methods section.
While EFC conducts a biased exploration of a vast space of potentially complex features to find a set of top features, EFS reduces this set to a subset of informative, low-redundancy features. The remaining features are used to transform sequence data into vector data that can then be passed to any classifier.
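The final step in this caption, turning sequences into vectors, can be illustrated with a minimal sketch. It assumes each selected feature is a Boolean predicate over a sequence; the helper names (`positional`, `compositional`, `to_vectors`) and the 0-based position convention are hypothetical and are not taken from the EFFECT implementation.

```python
from typing import Callable, List, Sequence

# Assumption: a feature is any predicate mapping a sequence to True/False.
Feature = Callable[[str], bool]


def positional(motif: str, pos: int) -> Feature:
    """Hypothetical positional feature: motif occurs at 0-based position pos."""
    return lambda seq: seq.startswith(motif, pos)


def compositional(motif: str) -> Feature:
    """Hypothetical compositional feature: motif occurs anywhere in the sequence."""
    return lambda seq: motif in seq


def to_vectors(seqs: Sequence[str], features: Sequence[Feature]) -> List[List[int]]:
    """Encode each sequence as a 0/1 vector with one entry per feature."""
    return [[int(f(s)) for f in features] for s in seqs]


selected = [positional("GT", 2), compositional("TT")]
vectors = to_vectors(["ACGT", "TTTT"], selected)
```

The resulting rows can be handed to any vector-based classifier, which is the point the caption makes about the transformed data.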
Table 1.
IUPAC code is adapted from [100].
Figure 2.
Conjunction features combining one positional and one compositional feature.
Figure 3.
Disjunction features combining one positional feature and the negation of one compositional feature.
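The conjunction and disjunction features described in the captions of Figures 2 and 3 can be sketched as Boolean combinators over simpler features. This is only an illustration under assumptions: the combinator names (`conj`, `disj`, `neg`), the motifs, and the 0-based position convention are hypothetical and do not reproduce the features shown in the figures.

```python
from typing import Callable

# Assumption: a feature is any predicate mapping a sequence to True/False.
Feature = Callable[[str], bool]


def positional(motif: str, pos: int) -> Feature:
    """Hypothetical positional feature: motif occurs at 0-based position pos."""
    return lambda seq: seq.startswith(motif, pos)


def compositional(motif: str) -> Feature:
    """Hypothetical compositional feature: motif occurs anywhere in the sequence."""
    return lambda seq: motif in seq


def neg(f: Feature) -> Feature:
    """Negation of a feature."""
    return lambda seq: not f(seq)


def conj(f: Feature, g: Feature) -> Feature:
    """Conjunction: both features must hold."""
    return lambda seq: f(seq) and g(seq)


def disj(f: Feature, g: Feature) -> Feature:
    """Disjunction: at least one feature must hold."""
    return lambda seq: f(seq) or g(seq)


# One positional AND one compositional feature (as in the Figure 2 caption).
conjunction = conj(positional("AG", 0), compositional("GT"))

# One positional OR the negation of one compositional feature (Figure 3 caption).
disjunction = disj(positional("AG", 0), neg(compositional("GT")))
```

Because every combinator returns another predicate, such features nest naturally into trees of arbitrary depth.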
Table 2.
Non-terminals and terminals employed in feature construction.
Table 3.
auROC and auPRC comparison analysis for recognition of HSS sites.
Table 4.
auROC and auPRC comparison analysis for recognition of splice sites on the NN269 dataset.
Table 5.
auROC and auPRC comparison analysis for recognition of splice sites on the C. elegans dataset.
Table 6.
auROC and auPRC comparison analysis for recognition of ALU sites.
Figure 4.
A complex disjunctive feature is obtained by the EFFECT framework for the ALU sequence classification problem.
The feature is shown in the tree representation employed for features in this work.
Figure 5.
A Clustal alignment of the sequences shows the same overrepresented signals that the EFFECT framework combines, as lower-level features, in the higher-level disjunctive feature.
Figure 6.
Information gain for features obtained by EFFECT on the HS site dataset is compared to that obtained by feature-based methods used for comparison in this work.
Figure 7.
Information gain for features obtained by EFFECT on the NN269 Acceptor dataset is compared to that obtained by feature-based methods used for comparison in this work.
Figure 8.
Information gain for features obtained by EFFECT on the NN269 Donor dataset is compared to that obtained by feature-based methods used for comparison in this work.
Figure 9.
Information gain for features obtained by EFFECT on the ALU dataset is compared to that obtained by feature-based methods used for comparison in this work.
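For readers unfamiliar with the quantity plotted in Figures 6 to 9, the following is a minimal sketch of the standard information gain of a discrete feature with respect to class labels, IG(Y; X) = H(Y) - H(Y | X). It assumes already discretized feature values and is not the code used to produce the figures; the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """H(labels) minus the entropy of labels conditioned on the feature values."""
    n = len(labels)
    gain = entropy(labels)
    for v in set(values):
        # Labels of the examples on which the feature takes value v.
        subset = [y for x, y in zip(values, labels) if x == v]
        gain -= (len(subset) / n) * entropy(subset)
    return gain


# A feature that perfectly separates two balanced classes, and one that does not.
perfect = information_gain([1, 1, 0, 0], ["pos", "pos", "neg", "neg"])
useless = information_gain([1, 0, 1, 0], ["pos", "pos", "neg", "neg"])
```

A perfectly separating binary feature on balanced classes reaches the maximum of 1 bit, while a feature independent of the labels yields zero gain.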