Analyzing Kernel Matrices for the Identification of Differentially Expressed Genes

doi:10.1371/journal.pone.0081683

Figure 1.

The linear SVM trained on samples from two classes.

Samples located on the two margin hyperplanes are referred to as "boundary samples".
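
As an illustration of the caption above, the following minimal sketch flags boundary samples for a linear SVM that has already been trained. It assumes the standard formulation in which the two margin hyperplanes satisfy w·x + b = +1 and w·x + b = -1. The weight vector `w`, bias `b`, tolerance `tol`, and the toy samples are hypothetical values chosen for illustration, not taken from the article.

```python
def decision_value(w, b, x):
    """Signed decision value w.x + b of a linear SVM for one sample."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def boundary_samples(w, b, samples, tol=1e-6):
    """Return indices of samples lying on either margin hyperplane.

    Assumes the margins are w.x + b = +1 and w.x + b = -1, so a sample is a
    boundary sample when |w.x + b| equals 1 up to the tolerance `tol`.
    """
    return [
        i
        for i, x in enumerate(samples)
        if abs(abs(decision_value(w, b, x)) - 1.0) <= tol
    ]


# Hypothetical trained model and toy data (two features per sample).
w = [1.0, 0.0]
b = 0.0
samples = [[1.0, 2.0], [-1.0, 0.5], [3.0, 1.0], [-2.5, -1.0]]

boundary = boundary_samples(w, b, samples)
print(boundary)
```

In this toy setting the first two samples sit exactly on the margins, while the other two lie farther from the separating hyperplane.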

Figure 2.

The essential idea of kernel matrix induced gene selection algorithms.

Table 1.

The Algorithm of Kernel Matrix Gene Selection.

Table 2.

The Algorithm of Kernel Matrix Sequential Forward Selection.
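
Sequential forward selection, named in the table title above, is a greedy search that adds one gene at a time, each time choosing the gene that most improves a scoring criterion. The sketch below is a generic illustration and not the article's algorithm: the criterion used here, kernel-target alignment on a linear kernel matrix, is a stand-in chosen because the article's own kernel-matrix criterion is not reproduced in this listing. The function names and the toy data are hypothetical.

```python
import math


def linear_kernel_matrix(X, genes):
    """Linear kernel matrix computed from only the columns listed in `genes`."""
    return [[sum(xi[g] * xj[g] for g in genes) for xj in X] for xi in X]


def alignment(K, y):
    """Kernel-target alignment <K, yy^T>_F / (||K||_F * ||yy^T||_F).

    Labels are assumed to be +1 or -1, so ||yy^T||_F equals len(y).
    """
    n = len(y)
    num = sum(K[i][j] * y[i] * y[j] for i in range(n) for j in range(n))
    norm_k = math.sqrt(sum(v * v for row in K for v in row))
    return num / (norm_k * n) if norm_k > 0 else 0.0


def forward_select(X, y, n_select):
    """Greedily add the gene that maximizes the stand-in criterion."""
    selected = []
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < n_select:
        best = max(
            remaining,
            key=lambda g: alignment(linear_kernel_matrix(X, selected + [g]), y),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy expression matrix: 4 samples (rows) by 3 genes (columns).
X = [
    [2.0, 1.0, 1.0],
    [2.0, -1.0, 0.5],
    [-2.0, 1.0, -0.5],
    [-2.0, -1.0, -1.0],
]
y = [1, 1, -1, -1]

print(forward_select(X, y, 2))
```

Gene 0 separates the two classes perfectly in this toy matrix, so it is picked first; gene 2 is partly informative and is picked ahead of the uninformative gene 1.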

Figure 3.

The flow chart for evaluating a gene selection algorithm using the B.632+ technique.
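
For readers unfamiliar with the B.632+ technique, the sketch below shows one commonly used form of the .632+ bootstrap error estimator, which blends the resubstitution error with the leave-one-out bootstrap error using a weight that depends on the relative overfitting rate. It only illustrates the final combination step; the article's full evaluation flow is in the figure and is not reproduced here. The function and argument names are hypothetical, and the article may use a slightly different variant.

```python
def no_information_rate(y_true, y_pred):
    """Binary no-information error rate.

    This is the error expected if predictions were independent of the labels:
    gamma = p1 * (1 - q1) + (1 - p1) * q1, where p1 is the fraction of true
    labels equal to 1 and q1 is the fraction of predictions equal to 1.
    """
    n = len(y_true)
    p1 = sum(1 for v in y_true if v == 1) / n
    q1 = sum(1 for v in y_pred if v == 1) / n
    return p1 * (1 - q1) + (1 - p1) * q1


def b632_plus(resub_err, loo_boot_err, gamma):
    """Combine resubstitution and leave-one-out bootstrap errors (.632+ form)."""
    # Cap the bootstrap error at the no-information rate.
    err1 = min(loo_boot_err, gamma)
    # Relative overfitting rate, set to 0 when there is no apparent overfitting.
    if err1 > resub_err and gamma > resub_err:
        r = (err1 - resub_err) / (gamma - resub_err)
    else:
        r = 0.0
    # The weight ranges from 0.632 (no overfitting) to 1 (maximal overfitting).
    w = 0.632 / (1.0 - 0.368 * r)
    return (1.0 - w) * resub_err + w * err1


gamma = no_information_rate([1, 1, 0, 0], [1, 0, 0, 0])
print(b632_plus(resub_err=0.0, loo_boot_err=0.2, gamma=gamma))
```

When the classifier does not overfit, the weight stays at 0.632 and the estimate reduces to the plain .632 bootstrap; as overfitting grows, more weight shifts to the leave-one-out bootstrap error.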

Figure 4.

The B.632+ error shown as a function of the number of DEGs for the prostate dataset.

The curves depict the performance of the following wrapper methods, each with its optimal parameter setting: Gaussian KMGS, Gaussian KMSFS, LOOSFS, linear KMSFS, and linear KMGS. Linear KMSFS performed best when the number of DEGs was between 10 and 60, while Gaussian KMGS outperformed the rest as the number of DEGs increased further to 100. The lowest B.632+ error rate was achieved by Gaussian KMGS.

Figure 5.

The B.632+ error shown as a function of the number of DEGs for the prostate dataset.

The curves are obtained from the following algorithms, each with its optimal parameter setting: Fisher's ratio, both of Yang's methods, Cho's method, linear KMSFS, and Gaussian KMGS. Linear KMSFS and Gaussian KMGS performed better than the four filter methods.

Figure 6.

The B.632+ error shown as a function of the number of DEGs for the colon dataset.

The curves depict the performance of the following wrapper methods, each with its optimal parameter setting: Gaussian KMGS, Gaussian KMSFS, LOOSFS, linear KMSFS, and linear KMGS. Gaussian KMSFS performed best, while LOOSFS performed worst.

Figure 7.

The B.632+ error shown as a function of the number of DEGs for the colon dataset.

The curves are obtained from the filter methods (Fisher's ratio, Cho's method, and Yang's methods), all with the same parameter setting. Gaussian KMSFS noticeably outperformed all the filter methods.

Figure 8.

The B.632+ error shown as a function of the number of DEGs for the leukemia dataset.

The curves depict the performance of the following wrapper methods, each with its optimal parameter setting: Gaussian KMGS, Gaussian KMSFS, LOOSFS, GLGS, linear KMSFS, and linear KMGS. Gaussian KMSFS remained competitive with LOOSFS and achieved the lowest B.632+ error rate, with around 50 selected DEGs.

Figure 9.

The B.632+ error shown as a function of the number of DEGs for the leukemia dataset.

The curves are obtained from the filter methods (Fisher's ratio, Cho's method, and Yang's methods), all with the same parameter setting. Gaussian KMSFS performed better than the four filter methods.

Figure 10.

Heatmaps of the top 50 DEGs selected most frequently by Gaussian KMGS, Gaussian KMSFS, LOOSFS, and GLGS, respectively, each with its optimal parameter setting.
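
Ranking genes by how often they are selected can be sketched as a simple count over repeated selection runs, for example one run per bootstrap replicate. The snippet below is a minimal illustration under that assumption; the function name, the tie-breaking rule (alphabetical by gene name), and the gene labels are hypothetical and not taken from the article.

```python
from collections import Counter


def most_frequent_genes(selections, top_k):
    """Return the `top_k` genes that appear in the most selection runs.

    `selections` is a list of gene lists, one per run. Each gene is counted at
    most once per run; ties are broken alphabetically by gene name.
    """
    counts = Counter(g for run in selections for g in set(run))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gene for gene, _ in ranked[:top_k]]


# Hypothetical selections from three runs.
runs = [["g1", "g2"], ["g2", "g3"], ["g2", "g1"]]
print(most_frequent_genes(runs, top_k=2))
```

The resulting ordered list is what would then be used to pick the rows of an expression heatmap.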