Figure 1.
The linear SVM trained on samples from two classes.
Samples lying on the two margin hyperplanes are referred to as "boundary samples".
Figure 2.
The essential idea of kernel matrix induced gene selection algorithms.
Table 1.
The Algorithm of Kernel Matrix Gene Selection.
Table 2.
The Algorithm of Kernel Matrix Sequential Forward Selection.
Figure 3.
The flow chart for evaluating a gene selection algorithm using the B.632+ technique.
Figure 4.
The B.632+ error shown as a function of the number of DEGs for the prostate dataset.
The curves depict the performance of the following wrapper methods, each with its optimal parameter setting: Gaussian KMGS, Gaussian KMSFS, LOOSFS, linear KMSFS, and linear KMGS. Linear KMSFS performed best when the number of DEGs was between 10 and 60, while Gaussian KMGS outperformed the rest as the number of DEGs increased further to 100. The lowest B.632+ error rate was achieved by Gaussian KMGS.
Figure 5.
The B.632+ error shown as a function of the number of DEGs for the prostate dataset.
The curves are obtained from the following algorithms, each with its optimal parameter setting: Fisher's ratio, Yang's two methods (both with the same setting), Cho's method, linear KMSFS, and Gaussian KMGS. Linear KMSFS and Gaussian KMGS performed better than the four filter methods.
Figure 6.
The B.632+ error shown as a function of the number of DEGs for the colon dataset.
The curves depict the performance of the following wrapper methods, each with its optimal parameter setting: Gaussian KMGS, Gaussian KMSFS, LOOSFS, linear KMSFS, and linear KMGS. Gaussian KMSFS performed best, while LOOSFS performed worst.
Figure 7.
The B.632+ error shown as a function of the number of DEGs for the colon dataset.
The curves are obtained from the filter methods (Fisher's ratio, Cho's method, and Yang's methods), all with the same parameter setting. Gaussian KMSFS noticeably outperformed all the filter methods.
Figure 8.
The B.632+ error shown as a function of the number of DEGs for the leukemia dataset.
The curves depict the performance of the following wrapper methods, each with its optimal parameter setting: Gaussian KMGS, Gaussian KMSFS, LOOSFS, GLGS, linear KMSFS, and linear KMGS (the two linear methods with the same setting). Gaussian KMSFS remained competitive with LOOSFS and achieved the lowest B.632+ error rate, with around 50 selected DEGs.
Figure 9.
The B.632+ error shown as a function of the number of DEGs for the leukemia dataset.
The curves are obtained from the filter methods (Fisher's ratio, Cho's method, and Yang's methods), all with the same parameter setting. Gaussian KMSFS performed better than the four filter methods.
Figure 10.
Heatmaps of the top 50 DEGs selected most frequently by Gaussian KMGS, Gaussian KMSFS, LOOSFS, and GLGS, respectively, with their optimal parameter settings.
Figure 11.
Heatmaps of the top 50 DEGs selected most frequently by Fisher's ratio, Cho's method, and Yang's two methods.
Figure 12.
Heatmaps of the top 50 DEGs selected most frequently by linear KMSFS and linear KMGS.
Table 3.
Scores obtained from Friedman rank sum tests with Holm correction for choices of one parameter, with another parameter fixed at a specific value in each row.
Table 4.
Scores obtained from Friedman rank sum tests with Holm correction for choices of one parameter, with another parameter fixed at a specific value in each row.