Fig 1.
Flowchart of the proposed Margin Weighted Robust Discriminant Score (MW-RDS) algorithm.
Table 1.
Summary of the gene expression datasets. Number of samples, number of features, and class-wise frequency distribution are shown against each dataset.
Table 2.
Using the ID1 dataset, results of the 3 classifiers for the given feature selection methods.
Table 3.
Using the ID2 dataset, results of the 3 classifiers for the given feature selection methods.
Table 4.
Using the ID3 dataset, results of the 3 classifiers for the given feature selection methods.
Table 5.
Using the ID4 dataset, results of the 3 classifiers for the given feature selection methods.
Table 6.
Using the ID5 dataset, results of the 3 classifiers for the given feature selection methods.
Table 7.
Using the ID6 dataset, results of the 3 classifiers for the given feature selection methods.
Table 8.
Using the ID7 dataset, results of the 3 classifiers for the given feature selection methods.
Table 9.
Using the ID8 dataset, results of the 3 classifiers for the given feature selection methods.
Table 10.
Using the ID9 dataset, results of the 3 classifiers for the given feature selection methods.
Table 11.
p-values from the Wilcoxon rank-sum test comparing MW-RDS with the other feature selection methods across the 9 datasets in terms of classification accuracy. Statistically significant p-values (*p< 0.05, **p< ***p<0.001) indicate that MW-RDS significantly outperforms the other method.
Fig 2.
Boxplots of classification accuracies of the 3 classifiers for the given feature selection methods on ID1.
Fig 3.
Boxplots of sensitivities of the 3 classifiers for the given feature selection methods on ID1.
Fig 4.
Boxplots of classification specificities of the 3 classifiers for the given feature selection methods on ID1.
Fig 5.
Boxplots of classification F1-scores of the 3 classifiers for the given feature selection methods on ID1.
Fig 6.
Boxplots of classification precisions of the 3 classifiers for the given feature selection methods on ID1.
Fig 7.
Plots of classification accuracies on ID2 for various numbers of selected features.
Fig 8.
Plots of sensitivities on ID2 for various numbers of selected features.
Fig 9.
Plots of specificities on ID2 for various numbers of selected features.
Fig 10.
Plots of F1-scores on ID2 for various numbers of selected features.
Fig 11.
Plots of precisions on ID2 for various numbers of selected features.
Fig 12.
Barplots of results on the balanced simulated dataset based on 10 selected features.
Fig 13.
Barplots of the results on the imbalanced simulated dataset based on 10 selected features.
Table 12.
Execution time (in milliseconds) of the feature selection methods for various numbers of features.
Fig 14.
The effect of under different levels of minority amplification factor.
Fig 15.
Classification accuracy of RF, SVM, and WKNN on 100–500 features selected by different methods for imbalanced simulated datasets.
Fig 16.
Sensitivity of RF, SVM, and WKNN on 100–500 features selected by different methods for imbalanced simulated datasets.
Table 13.
Classification performance (accuracy, sensitivity, specificity, F1-score, and precision) based on 50 selected features, reported as over 500 runs.