Figure 1.
MIDClass flowchart.
Figure 2.
Example of the MIDClass flowchart on the Breast Cancer 2 dataset (data are partially shown).
Let $M_{ij}$ denote the expression value of sample $i$ on the $j$-th gene (an example of an entry in $M$ is shown as a black box). Samples are divided into classes corresponding to disease phenotypes. After the discretization process, MIDClass constructs a discretized matrix from $M$ by replacing each $M_{ij}$ with the unique interval containing it (an example of an entry in the discretized matrix is shown as a black box). Then, MIDClass computes, per class, the sets of discretized entries (gene expression intervals) that are frequent and of maximal size. MIDClass filters out gene expression intervals whose size is below a given threshold. Since association rules express interesting relationships between gene expressions and class labels, MIDClass uses them for classification. Therefore, MIDClass extracts a set of rules per class. Each rule has quantitative attributes in the antecedent (i.e., discretized values) and one categorical attribute in the consequent (i.e., the class). Finally, it returns only the rules that have a maximal score. The score takes into account the number of items of each sample that are contained in the rule, together with the cardinality of the rule (the computation of the score is described in detail in the Methods section).
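The pipeline described in this caption can be illustrated with a minimal Python sketch. It is not the MIDClass implementation: the equal-width binning, the function names (`discretize`, `rule_score`, `classify`), and the scoring formula (fraction of a rule's items matched by the sample) are all assumptions made for illustration, since the actual discretization and score are defined in the Methods section. Rules are assumed to be already mined and are represented as lists of (gene index, interval index) pairs.

```python
import numpy as np

def discretize(M, n_bins=3):
    """Replace each expression value with the index of the interval containing it.

    Assumption: equal-width bins per gene; the paper's discretization may differ.
    """
    M = np.asarray(M, dtype=float)
    D = np.empty(M.shape, dtype=int)
    for j in range(M.shape[1]):
        edges = np.linspace(M[:, j].min(), M[:, j].max(), n_bins + 1)
        # Use interior edges only, so values map to bin indices 0..n_bins-1.
        D[:, j] = np.digitize(M[:, j], edges[1:-1])
    return D

def rule_score(sample_row, rule):
    """Hypothetical score: fraction of the rule's items that the sample satisfies."""
    matched = sum(1 for gene, interval in rule if sample_row[gene] == interval)
    return matched / len(rule)

def classify(sample_row, rules_by_class):
    """Return the class whose best rule has the highest (hypothetical) score."""
    best_class, best_score = None, -1.0
    for label, rules in rules_by_class.items():
        for rule in rules:
            s = rule_score(sample_row, rule)
            if s > best_score:
                best_class, best_score = label, s
    return best_class

# Toy data: 3 samples (rows) x 2 genes (columns); values are made up.
M = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]
D = discretize(M, n_bins=2)

# Toy rules per class; the class labels are placeholders.
rules_by_class = {
    "tumor": [[(0, 1), (1, 1)]],
    "normal": [[(0, 0), (1, 0)]],
}
predicted = classify(D[0], rules_by_class)
```

In this sketch a sample is assigned to the class of its best-matching rule; ties are resolved in favor of the first class encountered, which is a simplification.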
Table 1.
Dataset description.
Table 2.
Number of genes used by classifiers in each tested dataset.
Figure 3.
Running time of MIDClass to (a) build the model and establish its reliability using LOOCV and (b) create the model and classify a new instance.
Figure 4.
MIDClass ROC curves.
Table 3.
Comparison of MIDClass, single-gene classifiers, and standard classifiers.
Table 4.
MIDClass classification rules in the Breast Cancer 2 dataset.