MIDClass: Microarray Data Classification by Association Rules and Gene Expression Intervals

We present a new classification method for expression profiling data, called MIDClass (Microarray Interval Discriminant CLASSifier), based on association rules. It classifies expression profiles by exploiting the idea that transcript expression intervals better discriminate subtypes within the same class. An extensive experimental analysis shows the effectiveness of MIDClass compared to the most prominent classification approaches.


Installing locally
You can download MIDClass from http://ferrolab.dmi.unict.it/midclass.html. You will get a zip file that can be installed and executed on any Linux operating system. First, uncompress the zip file into a folder using your preferred application; usually a click or double click on the file starts an application that lets you uncompress it.
Once uncompressed, you will get a folder such as MIDClass (the name may differ depending on the name you chose). That's all: MIDClass is now installed on your computer and you can work with the graphical interface.

Memory configuration
MIDClass is a Java application, and by default it starts with the default memory limit of most Java applications (usually 256 megabytes). You will most probably need more than the default; you will notice this if you get an error like this:

java.lang.OutOfMemoryError
You can configure the memory limit for MIDClass the same way as for Java, using the option -Xmx. This option must be specified through an environment variable called MIDCLASS_JAVA_OPTS. Let's see some examples of how to do this on different operating systems:

Linux
Suppose you would like to use 1024 megabytes of memory for MIDClass. Edit the ~/.bashrc file and add a line like this:

export MIDCLASS_JAVA_OPTS='-Xmx1024m'

You can also specify 2 gigabytes like this:

export MIDCLASS_JAVA_OPTS='-Xmx2g'
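As an alternative to editing ~/.bashrc, the variable can be set for a single run from a small launcher script. The following is only a sketch: the function names midclass_environment and launch_midclass and the install_dir argument are hypothetical and not part of MIDClass. It assumes that java is on the PATH and that MIDClass reads MIDCLASS_JAVA_OPTS at start-up, as described above.

```python
import os
import re
import subprocess


def midclass_environment(max_heap, base_env=None):
    """Return a copy of the environment with MIDCLASS_JAVA_OPTS set.

    max_heap is a size such as "1024m" or "2g". This sketch only accepts
    sizes that end in a k, m or g suffix.
    """
    if not re.fullmatch(r"\d+[kKmMgG]", max_heap):
        raise ValueError("max_heap must look like '1024m' or '2g'")
    # Copy so that the caller's environment is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env["MIDCLASS_JAVA_OPTS"] = "-Xmx" + max_heap
    return env


def launch_midclass(install_dir, max_heap="1024m"):
    """Run 'java -jar MIDClass.jar' inside install_dir (hypothetical path)."""
    return subprocess.run(
        ["java", "-jar", "MIDClass.jar"],
        cwd=install_dir,
        env=midclass_environment(max_heap),
        check=True,
    )
```

For example, launch_midclass("MIDClass", "2g") would start the program with a 2 gigabyte limit for that run only, leaving ~/.bashrc unchanged.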

Starting MIDClass
Open a Terminal window, cd into the MIDClass directory, and run the command java -jar MIDClass.jar. When running on a Linux/Unix OS, make sure that you have read, write and execute (rwx) permissions for the MIDClass directory and for the directory in which your data is located. Upon running the program, the GUI appears.

Description of Tabs
• Training Set: It shows the chosen data matrix.
• Discretized T.S.: It shows the discretized matrix. MIDClass builds this new matrix by discretizing the expression values of each gene.
• Rules: It shows the discriminant association rules created by the algorithm.
• Input: The user can visualize the elements to be classified.
• Discretized Input: It shows the discretized input matrix.
• Output: In this tab, the program prints all the output produced.
The different tabs show the various steps of the algorithm, the rules and the output produced.

MIDClass settings and go function
The action buttons on the left of the GUI perform all the functions provided by MIDClass; they are described in the following paragraphs. Our algorithm has several parameters that users can set with the buttons below the action buttons. However, we experimentally tuned the method to establish default values; with those values MIDClass outperformed all the other methods in almost all cases. Users can set:
• Discretization Function: Choose among ID3, CACC, USD, EWIB, EFIB, KMEANS, MC, ENTR, RMDL and SSD (default ID3).
• MFI Threshold: Cutoff for the maximal frequent itemsets (default 0.05).
• Minimum Interval size threshold: Removes gene-intervals whose range is smaller than this value (default 0.05).
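To illustrate the Minimum Interval size threshold, here is a minimal sketch of such a filter. It is not the MIDClass implementation: the GeneInterval type, the function name and the example intervals are made up for illustration, and the sketch assumes that the threshold is compared with the absolute width of each interval, which the description above does not specify.

```python
from typing import List, NamedTuple


class GeneInterval(NamedTuple):
    """An expression interval of one gene (hypothetical type for this sketch)."""
    gene: str
    low: float
    high: float


def drop_narrow_intervals(intervals: List[GeneInterval],
                          min_size: float = 0.05) -> List[GeneInterval]:
    """Keep only the intervals whose width (high - low) is at least min_size.

    Assumption: the threshold applies to the absolute interval width.
    """
    return [iv for iv in intervals if (iv.high - iv.low) >= min_size]


# Made-up intervals, two per gene, used only to show the effect of the filter.
EXAMPLE_INTERVALS = [
    GeneInterval("geneA", 0.10, 0.12),  # width 0.02: removed
    GeneInterval("geneA", 0.12, 0.90),  # width 0.78: kept
    GeneInterval("geneB", 1.00, 1.04),  # width 0.04: removed
    GeneInterval("geneB", 1.04, 2.50),  # width 1.46: kept
]

KEPT_INTERVALS = drop_narrow_intervals(EXAMPLE_INTERVALS)
```

With the default of 0.05, only the two wide intervals remain in KEPT_INTERVALS; raising min_size removes more intervals.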

Input Data
The input data files should be tab-delimited ASCII text files. The first two rows are the header row and the class row; every following row represents a gene/microRNA in the dataset, and each column represents a sample. The header row must contain a unique alphanumeric label for each sample; the class row must contain, for each sample, the alphanumeric label of the class it belongs to. Here is an example:

[Figure: example input file with the Header Row and the Class Row marked]

To load tabular expression data, select File >> load data Matrix or click the button labelled "Input data matrix". A dialog box will appear allowing you to select your input file.
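The layout described in this section can be checked before a file is loaded into the GUI. The following is a hedged sketch, not part of MIDClass: the function name, the has_row_labels option and the sample data are assumptions made for illustration. In particular, the description above does not say whether the first column holds row names, so the option defaults to treating every column as a sample.

```python
import csv
import io


def check_input_matrix(text, has_row_labels=False):
    """Validate a tab-delimited matrix and return (labels, classes, n_genes).

    has_row_labels is an assumption of this sketch: set it to True if the
    first column holds row names rather than a sample.
    """
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    if len(rows) < 3:
        raise ValueError("expected a header row, a class row and a gene row")
    start = 1 if has_row_labels else 0
    header, classes = rows[0][start:], rows[1][start:]
    if len(set(header)) != len(header):
        raise ValueError("sample labels in the header row must be unique")
    if len(classes) != len(header):
        raise ValueError("the class row needs one class label per sample")
    for label in header + classes:
        if not label.isalnum():
            raise ValueError("label %r is not alphanumeric" % label)
    for index, row in enumerate(rows[2:], start=1):
        values = row[start:]
        if len(values) != len(header):
            raise ValueError("gene row %d has the wrong number of values" % index)
        for value in values:
            float(value)  # raises ValueError if the value is not numeric
    return header, classes, len(rows) - 2


# Made-up data: four samples, two classes, two gene rows.
EXAMPLE = (
    "S1\tS2\tS3\tS4\n"
    "tumor\ttumor\tnormal\tnormal\n"
    "2.31\t1.98\t0.45\t0.52\n"
    "0.12\t0.20\t1.75\t1.60\n"
)
```

Calling check_input_matrix(EXAMPLE) returns the sample labels, the class labels and the number of gene rows; a file with duplicated sample labels or a short row raises ValueError instead.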