Using Rule-Based Machine Learning for Candidate Disease Gene Prioritization and Sample Classification of Cancer Gene Expression Data

doi:10.1371/journal.pone.0039932

Figure 1.

Flowchart illustrating the experimental procedure.

The protocol consists of three steps: 1) Pre-processing; 2) Supervised analysis; 3) Post-analysis.

Table 1.

Datasets used in this paper.

Figure 2.

A BioHEL classification rule set obtained for the prostate cancer dataset and illustrating different types of rules.

“Exp(x)” is short for “Expression of gene x”, where x is a HUGO gene symbol; “∧” represents the conjunctive AND operator; “[x,y]” is an interval of expression values in which the value of the attribute must lie to fulfill one premise of the rule; and “→” is the class assignment operator, followed by the output class of the rule. Rule 5 is a default rule that applies if none of the rules above it matches.
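
The rule format described in this caption can be illustrated with a minimal Python sketch. It assumes that rules are checked in order and that the first matching rule assigns the class, which is one plausible reading of the default-rule remark and not a description of BioHEL's actual implementation. The gene symbols, intervals, and class labels below are hypothetical placeholders, not values from the figure.

```python
from typing import Dict, List, Tuple

# A premise maps a gene symbol to a closed interval [low, high] of expression values.
Premises = Dict[str, Tuple[float, float]]
# A rule pairs its premises with an output class; all premises must hold (logical AND).
Rule = Tuple[Premises, str]


def rule_matches(premises: Premises, sample: Dict[str, float]) -> bool:
    """Return True if every premise interval contains the sample's expression value."""
    return all(
        gene in sample and low <= sample[gene] <= high
        for gene, (low, high) in premises.items()
    )


def classify(rules: List[Rule], default_class: str, sample: Dict[str, float]) -> str:
    """Return the class of the first matching rule, else the default class.

    Assumption: ordered evaluation with first match winning.
    """
    for premises, output_class in rules:
        if rule_matches(premises, sample):
            return output_class
    return default_class


# Hypothetical rule set: names, intervals, and labels are made up for illustration.
EXAMPLE_RULES: List[Rule] = [
    ({"GENE_A": (0.5, 2.0), "GENE_B": (-1.0, 0.3)}, "tumor"),
    ({"GENE_C": (1.2, 3.5)}, "normal"),
]

if __name__ == "__main__":
    sample = {"GENE_A": 1.0, "GENE_B": 0.0, "GENE_C": 0.0}
    print(classify(EXAMPLE_RULES, "normal", sample))
```

Storing each rule as a dictionary of intervals keeps the conjunction explicit: a rule fires only when every listed gene falls inside its interval, and the default class covers samples that no rule matches.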

Figure 3.

Comparison of text mining scores.

Histogram of text mining scores for randomly chosen gene identifier subsets compared to scores achieved by BioHEL and the ensemble feature selection (FS) approach (prostate cancer dataset).
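
A comparison of this kind, an observed score set against the scores of randomly chosen gene subsets, can be sketched in Python as follows. This is an illustrative assumption about how such a comparison might be quantified, not the procedure used in the paper: the scoring function `toy_score`, the per-gene scores, and the empirical p-value with add-one correction are all hypothetical choices introduced here.

```python
import random
from typing import Callable, List, Sequence


def random_subset_scores(
    universe: Sequence[str],
    subset_size: int,
    score_fn: Callable[[Sequence[str]], float],
    n_draws: int,
    seed: int = 0,
) -> List[float]:
    """Score n_draws random gene subsets of a fixed size (the null distribution)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    genes = list(universe)
    return [score_fn(rng.sample(genes, subset_size)) for _ in range(n_draws)]


def empirical_p_value(observed: float, null_scores: Sequence[float]) -> float:
    """Fraction of random scores at least as high as the observed score.

    The add-one correction avoids reporting a p-value of exactly zero.
    """
    exceed = sum(1 for s in null_scores if s >= observed)
    return (1 + exceed) / (1 + len(null_scores))


# Hypothetical per-gene scores standing in for a real text mining score.
TOY_GENE_SCORES = {f"GENE_{i}": float(i % 10) for i in range(200)}


def toy_score(genes: Sequence[str]) -> float:
    """Sum of the hypothetical per-gene scores for a subset."""
    return sum(TOY_GENE_SCORES[g] for g in genes)


if __name__ == "__main__":
    null = random_subset_scores(list(TOY_GENE_SCORES), 10, toy_score, n_draws=1000)
    selected = [f"GENE_{i}" for i in (9, 19, 29, 39, 49, 59, 69, 79, 89, 99)]
    print(empirical_p_value(toy_score(selected), null))
```

Drawing the random subsets at the same size as the selected gene set keeps the comparison fair, since a sum-based score would otherwise grow with subset size.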

Figure 4.

Comparison of text mining scores.

Histogram of text mining scores for randomly chosen gene identifier subsets compared to scores achieved by BioHEL and the ensemble feature selection (FS) approach (lymphoma cancer dataset).

Figure 5.

Comparison of text mining scores.

Histogram of text mining scores for randomly chosen gene identifier subsets compared to scores achieved by BioHEL and the ensemble feature selection (FS) approach (breast cancer dataset).

Table 11.

Literature mining significance scores.
