
Feature selection for high dimensional microarray gene expression data via weighted signal to noise ratio

  • Muhammad Hamraz ,

    Contributed equally to this work with: Muhammad Hamraz, Amjad Ali, Wali Khan Mashwani, Saeed Aldahmani, Zardad Khan

    Roles Conceptualization, Formal analysis, Methodology, Writing – original draft, Writing – review & editing

    Affiliation Department of Statistics, Abdul Wali Khan University Mardan, Mardan, Pakistan

  • Amjad Ali ,

    Contributed equally to this work with: Muhammad Hamraz, Amjad Ali, Wali Khan Mashwani, Saeed Aldahmani, Zardad Khan

    Roles Formal analysis, Methodology, Software, Validation, Visualization, Writing – original draft

    Affiliation Department of Statistics, Abdul Wali Khan University Mardan, Mardan, Pakistan

  • Wali Khan Mashwani ,

    Contributed equally to this work with: Muhammad Hamraz, Amjad Ali, Wali Khan Mashwani, Saeed Aldahmani, Zardad Khan

    Roles Conceptualization, Investigation, Methodology, Writing – review & editing

    Affiliation Institute of Numerical Sciences, Kohat University of Science and Technology, Kohat, Pakistan

  • Saeed Aldahmani ,

    Contributed equally to this work with: Muhammad Hamraz, Amjad Ali, Wali Khan Mashwani, Saeed Aldahmani, Zardad Khan

    Roles Conceptualization, Formal analysis, Software, Writing – original draft, Writing – review & editing

    Affiliation Department of Analytics in the Digital Era, United Arab Emirates University, Al Ain, UAE

  • Zardad Khan

    Contributed equally to this work with: Muhammad Hamraz, Amjad Ali, Wali Khan Mashwani, Saeed Aldahmani, Zardad Khan

    Roles Conceptualization, Formal analysis, Methodology, Software, Supervision

    zaar@uaeu.ac.ae

    Affiliation Department of Analytics in the Digital Era, United Arab Emirates University, Al Ain, UAE

Abstract

Feature selection in high dimensional gene expression datasets not only reduces the dimension of the data, but also the execution time and computational cost of the underlying classifier. The current study introduces a novel feature selection method called weighted signal to noise ratio (WSNR), which exploits feature weights based on support vectors together with the signal to noise ratio, with the objective of identifying the most informative genes in high dimensional classification problems. Combining these two state-of-the-art procedures enables the extraction of the most informative genes: the weights produced by the two procedures are multiplied and arranged in decreasing order, and a larger weight indicates a feature's discriminatory power in classifying tissue samples into their true classes. The method is validated on eight gene expression datasets, and its results are compared with those of four well known feature selection methods. WSNR outperforms the competing methods on six out of eight datasets. Box-plots and bar-plots of the results of the proposed method and all the other methods are also constructed. The proposed method is further assessed on simulated data; this analysis reveals that WSNR outperforms all the other methods included in the study.

1 Introduction

Feature/gene selection in micro-array gene expression datasets has gained great attention during recent decades [1–7]. High dimensional datasets usually contain noisy, redundant and non-informative features that increase the computational complexity as well as the execution time of the underlying model. Feature selection is therefore necessary to select the informative features and remove the unnecessary ones; this not only reduces execution or training time but also increases the accuracy of the model, on the basis of which one can categorize the samples in the data into their classes [8]. Feature selection is mainly carried out by three kinds of methods: wrapper, filter and embedded. The feature selection methods used in this paper fall under the category of filter methods, except sigF [9], which is a wrapper method. Feature or variable selection is used in a variety of tasks such as classification, regression and clustering [10]. Different types of biological data sets can also be analyzed using feature selection, for instance whole-genome sequencing data [11], protein mass spectra data [12], whole-genome expression data [13–15], and so on. Micro-array and other high throughput technologies are capable of measuring thousands of genes simultaneously, which has led to their widespread use in clinical settings. Recent years have witnessed many feature selection methods for micro-array data analysis. The authors in [16] introduced a "double feature selection method" that uses both global and intrinsic geometric information for the selection of informative features in the data. Similarly, the study in [17] introduced a method that handles semi-supervised feature selection tasks, combining the neighborhood discriminant index (NDI) and forward iterative Laplacian score (FILS) methods for the selection of discriminative features in high-dimensional data sets. A more efficient implementation of linear support vector machines that improves the recursive feature elimination strategy, combining the two to select informative genes, was proposed in [18]. The study in [2] proposed a technique that applies an ensemble of feature selection procedures to select genes highly correlated with lung adenocarcinoma (LUAD): using LUAD RNA-seq data from the Cancer Genome Atlas (TCGA), mutual information (MI) was employed, followed by recursive feature elimination (RFE), along with an SVM classifier. A Bi-dimensional Principal Feature Selection (BPFS) procedure for efficiently extracting critical genes from high dimensional gene expression datasets was proposed in [19]; it applies principal component analysis (PCA) on the sample and gene domains successively, in order to identify informative genes and reduce redundancy while losing less information. Further work on the selection of informative features and their importance in classification/regression can be found in [20–27]. The main focus of these methods is to enhance the classification accuracy of the underlying classifier with the help of the selected genes, while ignoring their biological relevance, which leads to inaccurate downstream data analysis [28–33]. It is therefore necessary to devise a feature selection method that not only increases classification accuracy but is also capable of identifying the biological significance of the selected genes in the tumor versus normal contrast [34, 35].

This paper proposes a new feature selection procedure that combines the information obtained from the well known signal to noise ratio (SNR) feature selection method [40] with the feature weights given by the support vector machine (SVM) [36]. To assess its performance, eight gene expression datasets, i.e., Leukemia, Colon, Srbct, DLBCL, Lungcancer, Breastcancer, TumorC and Prostate, have been used. Furthermore, the results of the proposed method are compared with four other well known feature selection methods: significant features (sigF) [9], minimum redundancy maximum relevance (mRmR) [37], the Wilcoxon rank sum test (Wilc) [38] and an ensemble method called SVM-mRMRe [39]. This comparison shows that the proposed WSNR stands apart in terms of classification error. Box-plots and bar-plots of the results are also constructed, and they likewise indicate that the proposed method performs better than the aforementioned feature selection methods. The rest of the paper is organized as follows.

Section 2 gives a detailed description of the datasets used in the paper, the support vector machine (SVM) classifier, the feature selection procedures "Significant Features" (sigF) [9] and Signal to Noise Ratio (SNR) [40], and the proposed method (WSNR) with its mathematical background and algorithm. Section 3 presents the experimental setup of the proposed method. Section 4 discusses the results of the proposed method, WSNR. The paper is concluded in Section 5.

2 Methods

2.1 Data sets

For the assessment of the proposed method, WSNR, eight benchmark problems are used. Their sources, along with the number of features, number of observations and class-wise distribution of samples, are given in Table 1.

Table 1. Brief description of the datasets along with the corresponding number of features, observations, class-wise distributions and sources.

https://doi.org/10.1371/journal.pone.0284619.t001

2.2 Support vector machine

Support vector machine (SVM) is a supervised learning technique that has been widely used in the literature for regression and classification problems. It has also been used for feature selection in several studies [32, 33, 48]. This classifier utilizes several kernel functions to perform classification effectively in linear and non-linear feature spaces. The SVM searches for an optimal linear or non-linear hyperplane (H) that divides the two groups of observations meaningfully [49]. This hyperplane is supposed to lie at maximum distance from both classes in high-dimensional space, so as to separate the two groups as much as possible. The hyperplane is represented by the equation given in Eq 1, which acts as a reference frame for identifying the position of each sample in high-dimensional space; the weighted inputs are summed to produce a discriminant score, which is then used to categorize an observation into one of the two classes:

y = \sum_{j=1}^{d} w_j z_j + b,    (1)

where y is the response, i.e., y ∈ {0, 1}, so that each sample in the data is classified into class 0 or class 1, z = (z_1, ⋯, z_d) is a d-dimensional input vector, the vector w = (w_1, ⋯, w_d) contains the coefficients of the hyperplane, and the term b is the intercept of the hyperplane.

2.2.1 Mathematical description behind SVM weights w.

The SVM algorithm uses the hyperplane (H) to classify the data points into their respective classes, i.e.,

\hat{y} = \begin{cases} 1, & \text{if } w \cdot \psi(z) + b \geq 0, \\ 0, & \text{otherwise.} \end{cases}

The distance between a given point ψ(z_0) and the hyperplane H is given by

d(\psi(z_0), H) = \frac{|w \cdot \psi(z_0) + b|}{\lVert w \rVert_2},    (2)

where ‖w‖_2 is the Euclidean norm,

\lVert w \rVert_2 = \sqrt{\sum_{j=1}^{d} w_j^2}.    (3)

The weight vector is the argument that maximizes the distance given in Eq 2, that is,

\hat{w} = \arg\max_{w} \, d(\psi(z_0), H).    (4)
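As a minimal illustration (not the authors' code; the toy data, variable names and the choice of the e1071 package are assumptions), the weight vector w of a linear SVM can be recovered in R from the fitted support vectors as follows:

library(e1071)

set.seed(1)
X <- matrix(rnorm(60 * 100), nrow = 60)    # toy data: 60 tissue samples, 100 genes
y <- rep(c(0, 1), each = 30)               # binary class labels

fit <- svm(X, factor(y), kernel = "linear", scale = TRUE)
w <- as.vector(t(fit$coefs) %*% fit$SV)    # linear-kernel weights: w = sum_i alpha_i y_i SV_i
head(abs(w))                               # |w_j| serves as the weight of gene j

For a linear kernel this reconstruction of w is exact, since the separating hyperplane is a linear combination of the support vectors.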

2.3 Significant feature selection (sigF)

The significant feature selection (sigF) method was introduced in [9]. In this method, significant features are identified with the help of the support vector machine and the t-test. First, the weight of each feature is computed via the support vector machine (SVM). In the second stage, a t-statistic is computed for each feature in the data as follows:

t_j = \frac{\bar{x}_j^{(0)} - \bar{x}_j^{(1)}}{\sqrt{\frac{(s_j^{(0)})^2}{n_0} + \frac{(s_j^{(1)})^2}{n_1}}},    (5)

where \bar{x}_j^{(0)}, \bar{x}_j^{(1)}, s_j^{(0)}, s_j^{(1)} and n_0, n_1 represent the means, standard deviations and numbers of samples in class 0 and class 1, respectively. In this way the t-statistic is computed for each feature in the data. Equivalently, p-values for all the features are computed from the t-test; a smaller p-value for a feature indicates greater discriminative ability. The weights computed via the SVM classifier are then multiplied by these p-values to obtain new weights for all the features:

\xi_j = w_j \times p_j,    (6)

where p_j = \Pr(|T_v| \geq |u|), with v the degrees of freedom of the corresponding reference distribution and u the observed value of the test statistic. A feature is considered informative if it possesses a smaller value of ξ.
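A hedged sketch of this sigF-style weighting (the published method is implemented in the sigFeature R package; this snippet only illustrates the idea, reusing X, y and w from the previous snippet):

pvals <- apply(X, 2, function(g) t.test(g[y == 0], g[y == 1])$p.value)
xi    <- abs(w) * pvals    # Eq 6: a smaller xi marks a more informative gene
head(order(xi), 10)        # ten most informative genes under this ranking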

2.4 The proposed method, WSNR

The proposed method selects the informative genes or features in high-dimensional gene expression data sets in a similar fashion to sigF [9]. The only difference is that the method in [9] computes a t-statistic for each feature, which is then multiplied by the weights computed via the support vector machine classifier, whereas the proposed method computes the signal to noise ratio [40] for each feature:

SNR_j = \frac{\bar{x}_j^{(0)} - \bar{x}_j^{(1)}}{s_j^{(0)} + s_j^{(1)}},    (7)

where \bar{x}_j^{(0)}, \bar{x}_j^{(1)} and s_j^{(0)}, s_j^{(1)} represent the means and standard deviations of class 0 and class 1, respectively. Features that carry a larger value of SNR are supposed to have greater discriminative ability. Similarly, the weights w_j of all the features in the data are computed via SVM. Since both weighting schemes assign larger weights to informative genes, their product also assigns larger weights to the informative features. The resultant weights of the proposed method are computed as

(WSNR)_j = w_j \times SNR_j,    (8)

where (WSNR)_j represents the weight of the jth feature in the data. The proposed method takes the following steps to identify the informative genes.

  • Compute the weights of all the features using support vectors and denote them by w_j.
  • Compute the signal to noise ratio of all the features in the training data and denote it by SNR_j.
  • Multiply the corresponding weights from steps 1 and 2 and arrange the products in descending order.
  • Select the top ranked K genes from step 3 for model construction.

The authors in [9] used the t-test rather than the signal to noise ratio for the selection of discriminative genes. The t-test requires the underlying distribution of each variable to be approximately normal, which is difficult to verify when the data contain tens of thousands of genes. The signal to noise ratio, on the other hand, does not require such an assumption. The pseudo code given in Algorithm 1 explains how the proposed method, WSNR, identifies the informative genes in high-dimensional gene expression data sets, followed by its flowchart in Fig 1.

Algorithm 1 Pseudo code of the proposed method, WSNR.

1: D ← Micro-array data with dimension n × (d + 1);

2: n ← Number of tissue samples in the data;

3: d ← Number of genes in the data;

4: Xn×d ← Total input feature space with n samples and d genes;

5: Y ← Target variable having n values.

6: K ← Number of genes to be selected.

7: w ← Weights vector of genes obtained via support vector;

8: wj ← Weight of jth gene obtained via support vector;

9: for j ← 1: d do

10:  SNRj ← Compute the signal to noise ratio of the jth gene using Eq 7;

11:  Perform (WSNR)j = wj * SNRj;

12: end for

13: Arrange the weights (WSNR)j in decreasing order;

14: Select the top K genes for model construction.
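A compact R sketch of Algorithm 1 is given below. It is a minimal illustration under the same assumptions as the earlier snippets (toy data, e1071 for the SVM weights), not the authors' implementation; taking absolute values so that genes are ranked by weight magnitude is also an assumption:

library(e1071)

wsnr_select <- function(X, y, K = 10) {
  # Step 1: gene weights from a linear support vector machine
  fit <- svm(X, factor(y), kernel = "linear", scale = TRUE)
  w   <- abs(as.vector(t(fit$coefs) %*% fit$SV))
  # Step 2: signal to noise ratio of every gene (Eq 7)
  snr <- apply(X, 2, function(g) {
    (mean(g[y == 0]) - mean(g[y == 1])) / (sd(g[y == 0]) + sd(g[y == 1]))
  })
  # Steps 3-4: multiply the two weights (Eq 8), sort in decreasing order,
  # and keep the indices of the top K genes
  order(w * abs(snr), decreasing = TRUE)[1:K]
}

top_genes <- wsnr_select(X, y, K = 15)    # indices of the selected genes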

3 Experiments

This section describes the experimental setup of the current paper. Eight high-dimensional gene expression benchmark problems are analyzed, where each benchmark problem is split into 70% training and 30% testing parts. This splitting is repeated 500 times for all feature selection procedures and for the classifiers used to assess their performance. Random forest (RF) and k-nearest neighbours (k-NN) classifiers are used to evaluate the performance of the different subsets of informative genes selected by the various feature selection techniques.

The feature selection method minimum redundancy maximum relevance (mRmR) is implemented in the R package mRMRe [50]. The Wilcoxon rank sum test (Wilc) and significant feature selection (sigF) are implemented using the R packages WilcoxCV [51] and sigFeature [9], respectively. Moreover, the R library randomForest [52] is used to fit the random forest algorithm with default parameters, i.e., ntree = 500 and nodesize = 1. Similarly, the R library caret [53] is used for the implementation of the k-nearest neighbours classifier, with parameter k = 5.

The training part of each benchmark problem is used to select different subsets of discriminative genes, i.e., K = 5, 10 and 15, by the different gene selection procedures, and the classifiers are trained on these subsets. The classification error rate is used as the performance metric to assess the classifiers on the selected sets of informative genes.
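The following R sketch illustrates this protocol for WSNR with the random forest classifier; it is an illustration only (fewer repetitions than the 500 used in the paper, and wsnr_select is the sketch from Section 2.4, not the authors' code):

library(randomForest)

errs <- replicate(50, {                          # the paper uses 500 repetitions
  idx  <- sample(nrow(X), floor(0.7 * nrow(X)))  # 70% training split
  top  <- wsnr_select(X[idx, ], y[idx], K = 10)  # select genes on training data only
  rf   <- randomForest(X[idx, top, drop = FALSE], factor(y[idx]),
                       ntree = 500, nodesize = 1)   # stated defaults
  pred <- predict(rf, X[-idx, top, drop = FALSE])
  mean(pred != y[-idx])                          # test classification error rate
})
mean(errs)                                       # average error over the repetitions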

4 Results and discussion

Table 2 provides the classification error rates produced by the proposed method, WSNR, and all the other competitors included in the study, for different subsets of informative genes. From Table 2, it is evident that for the "Leukemia" data set the proposed method outperforms all the other methods with both classifiers. In the case of the "Colon" data set, the proposed method outperforms the others with the random forest classifier for all subsets of discriminative genes, while with the k-nearest neighbour classifier the sigF method produces the minimum error for the subset of 5 informative genes; the proposed method, however, produces the minimum error rates for the subsets of 10 and 15 genes. Similarly, for the "Lungcancer" data set, the Wilc method yields the minimum error rates with the random forest classifier, while the proposed method outperforms all the other competitors with the k-NN classifier. For the "Srbct" data set, the proposed WSNR method outperforms all the other methods except for 5 informative genes, where sigF yields the minimum error rate with the k-NN classifier. The proposed method outperforms all the other methods with the random forest classifier on the "DLBCL" data set but shows poor performance with the k-NN classifier. Similarly, the WSNR method wins over all the other procedures in the majority of cases for the "Breast" data set but performs poorly on the "TumorC" data set, while it wins over all the other methods on the Prostate data set. Overall, WSNR produces the minimum error rates on six out of eight data sets and comparable results on one data set. To summarize these results, a win-loss summary is given in Table 3.

Table 2. Classification error rates produced by different methods on various subsets of genes.

https://doi.org/10.1371/journal.pone.0284619.t002

Table 3. Win-loss table of the methods used.

Total number of wins of the methods on the data sets is given in the last row of the table.

https://doi.org/10.1371/journal.pone.0284619.t003

The performance of the proposed method is also illustrated with bar-plots of the results, given in Figs 2–9. The plots show that for the "Leukemia" data set the bars corresponding to the proposed method, WSNR, are shorter than those of all the other procedures included in the study. For the "Lungcancer" data set, the Wilc method produces lower error rates than the rest of the gene selection procedures. For the "Srbct" and "DLBCL" data sets, the WSNR method produces the minimum classification error rates. For the remaining data sets, our method maintains a majority winning position except for the "TumorC" data set. Fig 2 is shown for a quick insight into the results of the various feature selection methods included in the study.

Fig 2. Bar-plots of error rates of the proposed and the other classical methods on various subsets for Leukemia dataset.

https://doi.org/10.1371/journal.pone.0284619.g002

Fig 3. Bar-plots of error rates of the proposed and the other classical methods on various subsets of genes for Colon dataset.

https://doi.org/10.1371/journal.pone.0284619.g003

Fig 4. Bar-plots of error rates of the proposed and the other classical methods on various subsets of genes for Lungcancer dataset.

https://doi.org/10.1371/journal.pone.0284619.g004

Fig 5. Bar-plots of error rates of the proposed and the other classical methods on various subsets of genes for Srbct dataset.

https://doi.org/10.1371/journal.pone.0284619.g005

Fig 6. Bar-plots of error rates of the proposed and the other classical methods on various subsets of genes for DLBCL dataset.

https://doi.org/10.1371/journal.pone.0284619.g006

Fig 7. Bar-plots of error rates of the proposed and the other classical methods on various subsets of genes for Breast dataset.

https://doi.org/10.1371/journal.pone.0284619.g007

Fig 8. Bar-plots of error rates of the proposed and the other classical methods on various subsets of genes for TumorC dataset.

https://doi.org/10.1371/journal.pone.0284619.g008

Fig 9. Bar-plots of error rates of the proposed and the other classical methods on various subsets of genes for Prostate dataset.

https://doi.org/10.1371/journal.pone.0284619.g009

Similarly, box-plots of the results produced by the WSNR method and all the other competitors, for 10 informative genes with the random forest classifier, are given in Figs 10–17. The box-plots also show that WSNR outperforms the others in the majority of cases.

Fig 10. Box-plots of the error rates produced by random forest, using top 10 features selected by different feature selection methods for Leukemia dataset.

https://doi.org/10.1371/journal.pone.0284619.g010

Fig 11. Box-plots of the error rates produced by random forest, using top 10 features selected by different feature selection methods for Colon dataset.

https://doi.org/10.1371/journal.pone.0284619.g011

Fig 12. Box-plots of the error rates produced by random forest, using top 10 features selected by different feature selection methods for Lungcancer dataset.

https://doi.org/10.1371/journal.pone.0284619.g012

Fig 13. Box-plots of the error rates produced by random forest, using top 10 features selected by different feature selection methods for Srbct dataset.

https://doi.org/10.1371/journal.pone.0284619.g013

Fig 14. Box-plots of the error rates produced by random forest, using top 10 features selected by different feature selection methods for DLBCL dataset.

https://doi.org/10.1371/journal.pone.0284619.g014

Fig 15. Box-plots of the error rates produced by random forest, using top 10 features selected by different feature selection methods for Breastcancer dataset.

https://doi.org/10.1371/journal.pone.0284619.g015

Fig 16. Box-plots of the error rates produced by random forest, using top 10 features selected by different feature selection methods for TumorC dataset.

https://doi.org/10.1371/journal.pone.0284619.g016

Fig 17. Box-plots of the error rates produced by random forest, using top 10 features selected by different feature selection methods for Prostate dataset.

https://doi.org/10.1371/journal.pone.0284619.g017

4.1 Simulation

This subsection describes two simulation scenarios for the proposed method. The first scenario (S1) is designed to mimic a situation where the proposed method is useful, whereas the second scenario (S2) reflects a data generation environment that might not favour it. For this purpose, two different models are designed, one for each scenario. The class probabilities of the Bernoulli response Y = Bernoulli(p), given the n × d dimensional matrix X of n iid observations from the Normal(0, 1) and Uniform(0, 1) distributions, are generated in each scenario through a link function with constants a and b, both fixed at 1.5 (Eq 9). A vector of coefficients β, generated from the Uniform(−5, 5) distribution, defines the linear predictor

\eta_i = \sum_{j=1}^{d} \beta_j x_{ij}.    (10)

The top five (K = 5) most important variables are identified from the above model on the basis of their coefficients β. To contaminate the data, outliers drawn from the Normal(20, 60) distribution are added to these top five variables. In addition, 20 noisy variables are added to the data from the Normal(5, 10) distribution. In this way a simulated data set with n = 100 observations and d = 120 variables is generated. For all the methods considered, the same experimental setup is used as for the benchmark data sets. The second model is constructed in a similar fashion; the difference between the two is that the former contains outliers and noisy variables, while the latter does not. A total of 500 realizations are made for estimating the performance metric values. The results of the simulation study for both scenarios are presented in Table 4.
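The following R sketch generates data in the spirit of scenario S1. It is only illustrative: the exact link of Eq 9 is not reproduced (a plain logistic link stands in for it), and the Normal/Uniform column split, the rows receiving outliers, and reading Normal(m, s) as mean m and standard deviation s are all assumptions:

set.seed(2)
n <- 100
X <- cbind(matrix(rnorm(n * 50), n, 50),            # Normal(0, 1) columns
           matrix(runif(n * 50), n, 50))            # Uniform(0, 1) columns
beta <- runif(100, -5, 5)                           # coefficients from Uniform(-5, 5)
eta  <- as.vector(X %*% beta)                       # linear predictor (Eq 10)
p    <- 1 / (1 + exp(-eta))                         # stand-in for the link in Eq 9
Y    <- rbinom(n, 1, p)                             # Bernoulli response

top5 <- order(abs(beta), decreasing = TRUE)[1:5]    # five most important variables
X[sample(n, 10), top5] <- rnorm(50, mean = 20, sd = 60)         # outliers
X <- cbind(X, matrix(rnorm(n * 20, mean = 5, sd = 10), n, 20))  # 20 noise variables; d = 120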

Table 4. Classification error rates produced by different methods on simulated data.

https://doi.org/10.1371/journal.pone.0284619.t004

From Table 4, it is evident that when there are noisy variables in the data, the proposed method, WSNR, performs better than the other competitors, whereas the Wilc method produces the minimum error rates when there are no noisy variables. Bar-plots of the error rates for different subsets of genes, with and without noisy genes in the simulated data, are given in Figs 18 and 19, respectively. The plots indicate that the proposed method, WSNR, produces the minimum error rates in the presence of noisy features in the data.

Fig 18. Bar-plots of errors produced by different feature selection methods on simulated data having outliers, for various subsets of genes.

https://doi.org/10.1371/journal.pone.0284619.g018

Fig 19. Bar-plots of errors produced by different feature selection methods on simulated data, having no outliers, for various subsets of genes.

https://doi.org/10.1371/journal.pone.0284619.g019

5 Conclusion

The current study has proposed a novel feature selection method that exploits feature weighting via support vectors and the signal to noise ratio (SNR). The proposed method first computes the weights of all genes using the support vector machine, followed by the computation of the signal to noise ratio for all genes on the training data. These two weights are then multiplied to obtain a new weight for each gene, the genes are arranged in decreasing order of their weights, and the top ranked genes are selected for model construction.

The proposed method is validated on eight benchmark problems and assessed against other methods in terms of classification error rates. The results are compared with those of four well known feature selection methods. Two state-of-the-art classifiers, random forest (RF) and k-NN, are used to evaluate the performance of the genes selected by the various feature selection methods. The analyses reveal that the proposed method, WSNR, outperforms all the other methods on 6 out of 8 data sets and produces comparable results on the remaining 2. For quick insight into the results of the proposed method and all the other methods, bar-plots and box-plots have also been constructed. Furthermore, the proposed method is evaluated on simulated data under two scenarios: first, a scenario that favours the proposed idea, where the data contain noisy features and outlier observations; second, a scenario with no noisy features or outlier observations, which does not favour the proposed method. From all the analyses, it is concluded that the proposed method can effectively be used in high dimensional settings where the underlying distribution of the observations is not known, as is the case with micro-array data.

For future work in the direction of the proposed study, one can extend it to unsupervised learning, where the features are first divided into clusters and the proposed method is then applied within each cluster; the top ranked genes in each cluster can then be selected for model construction. One can also use robust measures of location and dispersion in the conventional signal to noise ratio to mitigate the effect of outliers in gene expression values, as sketched below. In addition, the performance of the proposed method can be examined with various kernel functions in the SVM.
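One possible robustification of Eq 7 along these lines (an assumption for illustration, not part of the paper) replaces the means and standard deviations with medians and median absolute deviations:

robust_snr <- function(g, y) {
  # medians and MADs damp the influence of outlying expression values
  (median(g[y == 0]) - median(g[y == 1])) / (mad(g[y == 0]) + mad(g[y == 1]))
}
# e.g. apply(X, 2, robust_snr, y = y) ranks genes robustly before weighting by |w|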

References

  1. Akinola OA, Agushaka JO, Ezugwu AE. Binary dwarf mongoose optimizer for solving high-dimensional feature selection problems. PLoS ONE. 2022;17(10):e0274850. pmid:36201524
  2. Abdelwahab O, Awad N, Elserafy M, Badr E. A feature selection-based framework to identify biomarkers for cancer diagnosis: A focus on lung adenocarcinoma. PLoS ONE. 2022;17(9):e0269126. pmid:36067196
  3. Song J, Li Z, Yao G, Wei S, Li L, Wu H. Framework for feature selection of predicting the diagnosis and prognosis of necrotizing enterocolitis. PLoS ONE. 2022;17(8):e0273383. pmid:35984833
  4. Tahmouresi A, Rashedi E, Yaghoobi MM, Rezaei M. Gene selection using pyramid gravitational search algorithm. PLoS ONE. 2022;17(3):e0265351. pmid:35290401
  5. Taguchi Y, Turki T. Projection in genomic analysis: A theoretical basis to rationalize tensor decomposition and principal component analysis as feature selection tools. PLoS ONE. 2022;17(9):e0275472. pmid:36173994
  6. Chen LP. Classification and prediction for multi-cancer data with ultrahigh-dimensional gene expressions. PLoS ONE. 2022;17(9):e0274440. pmid:36107929
  7. Ai H. GSEA–SDBE: A gene selection method for breast cancer classification based on GSEA and analyzing differences in performance metrics. PLoS ONE. 2022;17(4):e0263171. pmid:35472078
  8. James G, Witten D, Hastie T, Tibshirani R. Statistical learning. In: An introduction to statistical learning. Springer; 2021. p. 15–57.
  9. Das P, Roychowdhury A, Das S, Roychoudhury S, Tripathy S. sigFeature: novel significant feature selection method for classification of gene expression data using support vector machine and t statistic. Frontiers in Genetics. 2020;11:247. pmid:32346383
  10. Jović A, Brkić K, Bogunović N. A review of feature selection methods with applications. In: 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO); 2015. p. 1200–1205.
  11. Das R, Dimitrova N, Xuan Z, Rollins RA, Haghighi F, Edwards JR, et al. Computational prediction of methylation status in human genomic sequences. Proceedings of the National Academy of Sciences. 2006;103(28):10713–10716. pmid:16818882
  12. Hilario M, Kalousis A, Pellegrini C, Müller M. Processing and classification of protein mass spectra. Mass Spectrometry Reviews. 2006;25(3):409–449. pmid:16463283
  13. Zheng C, Li L, Haak M, Brors B, Frank O, Giehl M, et al. Gene expression profiling of CD34+ cells identifies a molecular signature of chronic myeloid leukemia blast crisis. Leukemia. 2006;20(6):1028–1034. pmid:16617318
  14. Frank O, Brors B, Fabarius A, Li L, Haak M, Merk S, et al. Gene expression signature of primary imatinib-resistant chronic myeloid leukemia patients. Leukemia. 2006;20(8):1400–1407. pmid:16728981
  15. Ben-Dor A, Bruhn L, Friedman N, Nachman I, Schummer M, Yakhini Z. Tissue classification with gene expression profiles. In: Proceedings of the Fourth Annual International Conference on Computational Molecular Biology; 2000. p. 54–64.
  16. Shang R, Song J, Jiao L, Li Y. Double feature selection algorithm based on low-rank sparse non-negative matrix factorization. International Journal of Machine Learning and Cybernetics. 2020;11(8):1891–1908.
  17. Pang Q, Zhang L. A recursive feature retention method for semi-supervised feature selection. International Journal of Machine Learning and Cybernetics. 2021;12(9):2639–2657.
  18. Li Z, Xie W, Liu T. Efficient feature selection and classification for microarray data. PLoS ONE. 2018;13(8):e0202167. pmid:30125332
  19. Hou X, Hou J, Huang G. Bi-dimensional principal gene feature selection from big gene expression data. PLoS ONE. 2022;17(12):e0278583. pmid:36477666
  20. Bakhshandeh S, Azmi R, Teshnehlab M. Symmetric uncertainty class-feature association map for feature selection in microarray dataset. International Journal of Machine Learning and Cybernetics. 2020;11(1):15–32.
  21. Li Z, Du J, Nie B, Xiong W, Xu G, Luo J. A new two-stage hybrid feature selection algorithm and its application in Chinese medicine. International Journal of Machine Learning and Cybernetics. 2022;13(5):1243–1264.
  22. Nasfi R, Bouguila N. A novel feature selection method using generalized inverted Dirichlet-based HMMs for image categorization. International Journal of Machine Learning and Cybernetics. 2022; p. 1–17.
  23. Javidi MM. Feature selection schema based on game theory and biology migration algorithm for regression problems. International Journal of Machine Learning and Cybernetics. 2021;12(2):303–342.
  24. Hamraz M, Khan Z, Khan DM, Gul N, Ali A, Aldahmani S. Gene selection in binary classification problems within functional genomics experiments via robust Fisher score. IEEE Access. 2022;10:51682–51692.
  25. Hamraz M, Gul N, Raza M, Khan DM, Khalil U, Zubair S, et al. Robust proportional overlapping analysis for feature selection in binary classification within functional genomic experiments. PeerJ Computer Science. 2021;7:e562. pmid:34141889
  26. Hamraz M, Khan DM, Gul N, Ali A, Khan Z, Ahmad S, et al. Regulatory genes through robust-SNR for binary classification within functional genomics experiments. 2022.
  27. Ali A, Hamraz M, Kumam P, Khan DM, Khalil U, Sulaiman M, et al. A k-nearest neighbours based ensemble via optimal model selection for regression. IEEE Access. 2020;8:132095–132105.
  28. Ali F, El-Sappagh S, Islam SR, Kwak D, Ali A, Imran M, et al. A smart healthcare monitoring system for heart disease prediction based on ensemble deep learning and feature fusion. Information Fusion. 2020;63:208–222.
  29. Ali F, El-Sappagh S, Islam SR, Ali A, Attique M, Imran M, et al. An intelligent healthcare monitoring framework using wearable sensors and social networking data. Future Generation Computer Systems. 2021;114:23–43.
  30. Kumar Y, Koul A, Singla R, Ijaz MF. Artificial intelligence in disease diagnosis: a systematic literature review, synthesizing framework and future research agenda. Journal of Ambient Intelligence and Humanized Computing. 2022; p. 1–28. pmid:35039756
  31. Mandal M, Singh PK, Ijaz MF, Shafi J, Sarkar R. A tri-stage wrapper-filter feature selection framework for disease classification. Sensors. 2021;21(16):5571. pmid:34451013
  32. Li X, Peng S, Chen J, Lü B, Zhang H, Lai M. SVM–T-RFE: A novel gene selection algorithm for identifying metastasis-related genes in colorectal cancer using gene expression profiles. Biochemical and Biophysical Research Communications. 2012;419(2):148–153. pmid:22306013
  33. Mishra S, Mishra D. SVM-BT-RFE: An improved gene selection framework using Bayesian T-test embedded in support vector machine (recursive feature elimination) algorithm. Karbala International Journal of Modern Science. 2015;1(2):86–96.
  34. Galland F, Lacroix L, Saulnier P, Dessen P, Meduri G, Bernier M, et al. Differential gene expression profiles of invasive and non-invasive non-functioning pituitary adenomas based on microarray analysis. Endocrine-Related Cancer. 2010;17(2):361–371. pmid:20228124
  35. Jiang H, Martin V, Gomez-Manzano C, Johnson DG, Alonso M, White E, et al. The RB-E2F1 pathway regulates autophagy. Cancer Research. 2010;70(20):7882–7893. pmid:20807803
  36. Cortes C, Vapnik V. Support-vector networks. Machine Learning. 1995;20(3):273–297.
  37. Ding C, Peng H. Minimum redundancy feature selection from microarray gene expression data. Journal of Bioinformatics and Computational Biology. 2005;3(2):185–205. pmid:15852500
  38. Lausen B, Hothorn T, Bretz F, Schumacher M. Assessment of optimal selected prognostic factors. Biometrical Journal. 2004;46(3):364–374.
  39. El Kafrawy P, Fathi H, Qaraad M, Kelany AK, Chen X. An efficient SVM-based feature selection model for cancer classification using high-dimensional microarray data. IEEE Access. 2021;9:155353–155369.
  40. Mishra D, Sahu B. Feature selection for cancer classification: a signal-to-noise ratio approach. International Journal of Scientific & Engineering Research. 2011;2(4):1–7.
  41. Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, et al. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences. 1999;96(12):6745–6750. pmid:10359783
  42. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science. 1999;286(5439):531–537. pmid:10521349
  43. Michiels S, Koscielny S, Hill C. Prediction of cancer outcome with microarrays: a multiple random validation strategy. The Lancet. 2005;365(9458):488–492. pmid:15705458
  44. Statnikov A, Aliferis CF, Tsamardinos I, Hardin D, Levy S. A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis. Bioinformatics. 2005;21(5):631–643. pmid:15374862
  45. Gordon GJ, Jensen RV, Hsiao LL, Gullans SR, Blumenstock JE, Ramaswamy S, et al. Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma. Cancer Research. 2002;62(17):4963–4967. pmid:12208747
  46. Shipp MA, Ross KN, Tamayo P, Weng AP, Kutok JL, Aguiar RC, et al. Bioinformatics Laboratory; 2002. Available from: https://file.biolab.si/biolab/supp/bi-cancer/projections/info/DLBCL.html.
  47. Shipp MA, Ross KN, Tamayo P, Weng AP, Kutok JL, Aguiar RC, et al. Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nature Medicine. 2002;8(1):68–74. pmid:11786909
  48. Guyon I, Weston J, Barnhill S, Vapnik V. Gene selection for cancer classification using support vector machines. Machine Learning. 2002;46(1):389–422.
  49. Butte A. The use and analysis of microarray data. Nature Reviews Drug Discovery. 2002;1(12):951–960. pmid:12461517
  50. De Jay N, Papillon-Cavanagh S, Olsen C, Bontempi G, Haibe-Kains B. mRMRe: an R package for parallelized mRMR ensemble feature selection. Bioinformatics. 2013;29(18):2365–2368.
  51. Boulesteix AL. WilcoxCV: Wilcoxon-based variable selection in cross-validation; 2012. Available from: https://CRAN.R-project.org/package=WilcoxCV.
  52. Liaw A, Wiener M. Classification and Regression by randomForest. R News. 2002;2(3):18–22.
  53. Kuhn M. caret: Classification and Regression Training; 2021. Available from: https://CRAN.R-project.org/package=caret.