MIDClass: Microarray Data Classification by Association Rules and Gene Expression Intervals

doi:10.1371/journal.pone.0069873

Figure 1.

MIDClass flowchart.

Figure 2.

Example of the MIDClass flowchart on the Breast Cancer 2 dataset (data are partially shown).

Let M denote the gene expression matrix, in which each entry is the expression value of one sample on one gene (an example entry of M is shown as a black box). Samples are divided into classes corresponding to disease phenotypes. After the discretization process, MIDClass constructs a discretized matrix from M by replacing each expression value with the unique interval containing it (an example entry of the discretized matrix is shown as a black box). Then, for each class, MIDClass computes the sets of gene expression intervals that are frequent and of maximal size. MIDClass filters out gene expression intervals whose size is below a given threshold. Since association rules express interesting relationships between gene expressions and class labels, MIDClass uses them for classification and therefore extracts a set of rules per class. Each rule has quantitative attributes in the antecedent (i.e., discretized values) and one categorical attribute in the consequent (i.e., the class). Finally, it returns only the rules that have a maximal score. The score takes into account the number of items in each sample that are contained in the rule, together with the cardinality of the rule (the computation of the score is described in detail in the Methods section).
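The steps in this caption can be illustrated with a minimal Python sketch on toy data. This is not the MIDClass implementation. The fixed interval edges, the brute-force itemset search, the toy expression values, and in particular the score (the fraction of a rule's items found in the sample) are all assumptions made for illustration; the actual discretization and scoring are those described in the Methods section. All function and variable names are hypothetical.

```python
from itertools import combinations

# Assumed toy discretization: fixed interval edges shared by all genes.
EDGES = [0.0, 1.0, 2.0, 3.0]

def discretize(value, edges=EDGES):
    """Return the half-open interval (lo, hi) that contains `value`."""
    for lo, hi in zip(edges, edges[1:]):
        if lo <= value < hi:
            return (lo, hi)
    raise ValueError(f"{value} is outside the discretization range")

def to_items(sample):
    """Turn one sample into a set of (gene index, interval) items."""
    return frozenset((g, discretize(v)) for g, v in enumerate(sample))

def maximal_frequent_itemsets(transactions, min_support):
    """Brute-force search, only suitable for toy data."""
    candidates = set()
    for t in transactions:
        for k in range(1, len(t) + 1):
            candidates.update(frozenset(c) for c in combinations(sorted(t), k))
    n = len(transactions)
    frequent = {c for c in candidates
                if sum(c <= t for t in transactions) / n >= min_support}
    # Keep only itemsets that have no frequent proper superset.
    return {c for c in frequent if not any(c < other for other in frequent)}

def score(sample_items, rule):
    """Placeholder score: fraction of the rule's items present in the sample."""
    return len(sample_items & rule) / len(rule)

def classify(sample_items, rules_by_class):
    """Pick the class whose best rule scores highest on the sample."""
    best = {label: max((score(sample_items, r) for r in rules), default=0.0)
            for label, rules in rules_by_class.items()}
    return max(best, key=best.get)

# Toy expression values (rows are samples, columns are genes), grouped by class.
training = {
    "A": [[0.2, 1.5, 2.5], [0.4, 1.7, 0.5]],
    "B": [[2.2, 0.3, 1.1], [2.8, 0.6, 2.4]],
}

# One rule set per class: antecedent = itemset, consequent = class label.
rules = {
    label: maximal_frequent_itemsets([to_items(s) for s in samples], min_support=1.0)
    for label, samples in training.items()
}

prediction = classify(to_items([0.7, 1.2, 2.9]), rules)
```

In this toy setting each class yields a single maximal frequent itemset covering genes 0 and 1, and the new sample matches both items of the class "A" rule. A real microarray dataset would need an efficient maximal frequent itemset miner rather than exhaustive enumeration.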