KNCFS: Feature selection for high-dimensional datasets based on improved random multi-subspace learning

Feature selection has long been a focal point of research in various fields. Recent studies have focused on the application of random multi-subspace methods to extract more information from raw samples. However, this approach inadequately addresses the adverse effects that may arise from feature collinearity in high-dimensional datasets. To further address the limited ability of traditional algorithms to extract useful information from raw samples while considering the challenge of feature collinearity during random subspace learning, we employ a clustering approach based on correlation measures to group features. Subsequently, we construct subspaces with lower inter-feature correlations. When integrating the feature weights obtained from all feature spaces, we introduce a weighting factor to better handle the contributions from different feature spaces. We comprehensively evaluate our proposed algorithm on ten real datasets and four synthetic datasets, comparing it with six other feature selection algorithms. Experimental results demonstrate that our algorithm, denoted as KNCFS, effectively identifies relevant features and exhibits robust feature selection performance, making it particularly suited to feature selection challenges in practice.


Introduction
In disease prediction tasks, the collected DNA microarray datasets are often high-dimensional. Searching for the genes that determine the occurrence of diseases among all measured genes is challenging, as it constitutes an NP-hard problem with a complexity of $O(2^d)$ for $d$ features. Furthermore, high-dimensional datasets contain a significant number of redundant and noisy features. Blindly learning these features causes the model to learn spurious correlations and reduces its performance [1,2]. To address this challenge, an effective approach is to reduce data dimensionality through feature selection [3]. The objective of feature selection is to retain relevant features while discarding irrelevant ones. Feature selection not only reduces feature dimensionality but also enhances model performance.
Feature selection (FS) methods can be categorized into three primary modes: wrapper mode, filter mode, and embedded mode [4]. Wrapper mode typically employs heuristic search to select the most favorable features with respect to evaluation metrics [5][6][7][8]. Wrapper models typically use swarm intelligence optimisation to generate binary solution vectors, where 1 denotes that a particular feature is selected and 0 means that the corresponding feature is not included in the feature subset. Examples include bee colony optimization [6], particle swarm optimization [8], and whale optimization [9]. However, when dealing with high-dimensional data, these methods often struggle to complete the search within a reasonable time frame [10]. To address this issue, some filter-mode methods search for the optimal subset by exploring the intrinsic relationships between samples and features [11][12][13][14]. For example, [13] combines the correlation coefficient with mutual information to measure the relationship between different features for feature selection, and [15] uses mutual information and joint mutual information to balance the significance between the two feature correlation terms for weighted correlation-based feature selection. Because no specific classifier guides the feature selection stage, the features selected by such methods may not be optimal [16]. Embedded mode, on the other hand, views the process of learning the optimal subset as an optimization problem. These methods introduce penalties or constraints to FS through the construction of an objective function and regularization terms related to feature weights [17][18][19][20][21][22]. This encourages the model to select the most relevant features. For example, [23] embedded the relevance self-representation matrix into unsupervised learning to take into account the complete sample relevance and feature dependencies; [24] helped to identify relevant features by embedding indication labels into ridge regression models; and [25] proposed a new adaptive LapSVM feature selection method by embedding the acquisition of Laplacian matrices into the SVM training process in order to achieve semi-supervised learning. In comparison to filter mode, embedded mode involves interaction with a classifier and can often select the features with the highest information content [26].
Neighborhood Component Feature Selection (NCFS) [27] is an embedded feature selection method that has garnered significant attention, primarily due to its excellent performance on high-dimensional datasets. However, NCFS has a notable limitation: it learns only within the original feature space, so it extracts relatively limited information from the raw samples and fails to fully exploit the latent information within the data. In a separate study, a method known as the Random Multi-subspace Approach [28] was proposed. This approach treats the ReliefF method as a black box and, through multiple random partitions of the data set, learns local weights in each subspace to enhance the sample diversity of the ReliefF method. However, the experiments conducted in [28] were limited to low-dimensional datasets with at most 649 features. Our further investigation suggests that the direct application of the Random Subspace Approach to high-dimensional datasets offers limited performance enhancement for NCFS. This is because in high-dimensional datasets some features can be approximately represented as linear combinations of other features, resulting in a certain degree of feature collinearity. The existence of collinearity can reduce the model's generalization performance. Furthermore, during the random subspace partitioning of the feature space, collinear features may accidentally be assigned to the same subspace. This can lead to model overfitting and consequently decrease the accuracy of feature selection.
To address the limited information captured by NCFS from the original samples, we introduce an enhanced approach that simultaneously increases the diversity of the original samples and mitigates the problem of feature collinearity. Specifically, we propose a method that uses a clustering algorithm to construct random subspaces, aiming to alleviate the impact of collinearity. Furthermore, once feature weight learning within each feature partition is complete, we employ a feature partition weight factor to assess the contribution of each feature partition to the final weight vector, as opposed to a simple averaging approach. Extensive experiments on ten high-dimensional datasets and on synthetic datasets validate the effectiveness of our algorithm. The primary contributions of this paper are summarized as follows.
• The proposed method simultaneously addresses the issues of diversity in the random subspace during the feature selection process and feature collinearity.
• A feature partition weight factor is introduced to weight the importance of features learned within each feature partition.
• Multiple sets of experiments on synthetic and real datasets confirm the effectiveness of the proposed method. Notably, the experimental results demonstrate that the consideration of feature collinearity in the random subspace approach enhances the effectiveness of feature selection.
The remainder of the paper is organised as follows. Section 2 covers the preliminaries: Section 2.2 briefly introduces the NCFS algorithm, Section 2.3 details the random multi-subspace method, and Section 2.4 briefly introduces K-means clustering. Section 3 presents our method, Section 4 reports the experimental results of the new method and the comparative methods, and Section 5 draws conclusions.

Preliminaries
In this section, Section 2.1 introduces the notation and definitions used in this paper, Section 2.2 briefly describes the original NCFS method, Section 2.3 presents the random multi-subspace method, and Section 2.4 introduces K-means clustering.

Notation and definition
Given a feature matrix $X = [x_1, x_2, \ldots, x_n]^T \in \mathbb{R}^{n \times d}$, which is a set of $n$ training samples with a dimensionality of $d$, and $y = [y_1, y_2, \ldots, y_n]^T$ representing the labels corresponding to the samples, $X$ can also be formalised as a feature set $F = [f_1, f_2, \ldots, f_d]$, where $f_i$ denotes the column vector consisting of the $i$th feature of all samples. Then, according to the definition in [28], the set family $E$ is a feature partition of $F$ when the following conditions hold:
$$\emptyset \notin E, \qquad \bigcup_{A \in E} A = F, \qquad A \cap B = \emptyset \ \text{ for all } A, B \in E \text{ with } A \neq B.$$
When $A \in E$, $A$ is called a random subspace. For example, let $D = \{f_1, f_2, f_3, f_4, f_5\}$ be a set with 5 feature columns, with subsets $A = \{f_1, f_3\}$, $B = \{f_2, f_4\}$, and $C = \{f_5\}$. Then, by definition, $\{A, B, C\}$ is a feature partition, and each of $A$, $B$, and $C$ is a subspace.
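The partition conditions can be checked mechanically. The following is a minimal sketch in plain Python, using the five-feature example above; the function name `is_feature_partition` and the string labels for features are illustrative assumptions, not part of [28].

```python
def is_feature_partition(family, features):
    """Return True if `family` (an iterable of sets) partitions `features`."""
    blocks = [set(block) for block in family]
    if any(len(block) == 0 for block in blocks):
        return False              # empty subspaces are not allowed
    seen = set()
    for block in blocks:
        if seen & block:          # subspaces must be pairwise disjoint
            return False
        seen |= block
    return seen == set(features)  # subspaces must cover every feature


# The example from the text: D has five feature columns.
D = {"f1", "f2", "f3", "f4", "f5"}
A, B, C = {"f1", "f3"}, {"f2", "f4"}, {"f5"}
print(is_feature_partition([A, B, C], D))
```

Dropping `C`, or letting two subsets share a feature, makes the check fail, which mirrors the requirement that each feature belongs to exactly one subspace.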

Neighborhood component feature selection
NCFS is an embedded feature selection method built on the nearest-neighbour model. It measures the similarity between samples using feature-weighted distances. Furthermore, for each sample $x_i$, the algorithm measures the probability of correct classification with a probability distribution function. After summing the probabilities of correct classification over all samples, NCFS introduces a penalty term to prevent overfitting.
The algorithm first initializes the feature importance weights $w$ as a vector with all elements set to 1. Then, based on $w$, it defines the weighted distance between two samples $x_i$ and $x_j$ as follows:
$$D_w(x_i, x_j) = \sum_{l=1}^{d} w_l^2 \, |x_{il} - x_{jl}|,$$
where $w_l$ denotes the weight of the $l$-th feature. In order to learn $w$ based on the approximate leave-one-out classification accuracy, NCFS further defines the probability that a sample $x_i$ selects $x_j$ as a reference point:
$$p_{ij} = \begin{cases} \dfrac{\kappa(D_w(x_i, x_j))}{\sum_{k \neq i} \kappa(D_w(x_i, x_k))}, & i \neq j, \\ 0, & i = j, \end{cases}$$
where $\kappa(x) = \exp(-x/\sigma)$ is the kernel function and $\sigma$ is the kernel width. According to the above definition, the probability that the query point $x_i$ is correctly classified is:
$$p_i = \sum_{j} y_{ij} \, p_{ij},$$
where $y_{ij} = 1$ if and only if $y_i = y_j$, and $y_{ij} = 0$ otherwise. Finally, NCFS defines the objective function in the following form:
$$\arg\max_{w} F(w) = \sum_{i} p_i - \lambda \sum_{l=1}^{d} w_l^2,$$
where $\lambda$ is the regularisation parameter to be tuned. To maximize the objective, NCFS computes the derivative of $F(w)$ with respect to $w$ and uses gradient ascent to update $w$ until $F(w)$ converges at a point near its (local) maximum; the weighting vector $w$ at this point is the output.
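To make the objective concrete, here is a minimal sketch in plain Python that evaluates the regularised leave-one-out objective for a given weight vector, following the NCFS formulation of [27] (squared weights in the distance, exponential kernel). It only evaluates $F(w)$ and does not perform gradient ascent; the function name `ncfs_objective` and the toy data are illustrative assumptions.

```python
import math


def ncfs_objective(X, y, w, sigma=1.0, lam=1.0):
    """Evaluate F(w) = sum_i p_i - lam * sum_l w_l^2 for samples X and labels y."""
    n = len(X)
    total = 0.0
    for i in range(n):
        kern = []
        for j in range(n):
            if j == i:
                kern.append(0.0)  # a sample never selects itself (p_ii = 0)
                continue
            # Weighted distance D_w(x_i, x_j) = sum_l w_l^2 * |x_il - x_jl|
            dist = sum(wl * wl * abs(a - b) for wl, a, b in zip(w, X[i], X[j]))
            kern.append(math.exp(-dist / sigma))
        denom = sum(kern)
        if denom == 0.0:
            continue  # all kernel values underflowed; skip this sample
        # p_i: probability mass placed on reference points with the same label
        total += sum(k for k, yj in zip(kern, y) if yj == y[i]) / denom
    return total - lam * sum(wl * wl for wl in w)


# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, -3.0], [1.0, 4.0], [1.1, -2.0]]
y = [0, 0, 1, 1]
print(ncfs_objective(X, y, [1.0, 0.0]), ncfs_objective(X, y, [0.0, 1.0]))
```

On this toy data, putting the weight on the informative feature gives a higher objective than putting it on the noise feature, which is the behaviour gradient ascent exploits.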

Random multi-subspace based learning
The Random Multi-subspace Approach, as introduced in [28], partitions the original feature space into $s$ different subspaces, with each subspace constructed from a distinct subset of features. Subsequently, separate feature weight learning is conducted for each of these feature subspaces. By repeatedly performing such partitions, the Random Multi-subspace Approach is capable of extracting additional information from the data, thereby enhancing the model's robustness and generalization capacity.
In the context of the Random Multi-subspace Approach, each random partition of the original feature space is referred to as a feature partition. Assuming that the Random Multi-subspace Approach conducts $M$ random partitions of the original feature space, the $i$th feature partition can be represented as follows:
$$P^{(i)} = \{P^{(i,1)}, P^{(i,2)}, \ldots, P^{(i,s)}\},$$
where $s$ represents the number of random subspaces, and $P^{(i,j)}$ signifies the $j$-th subspace within the $i$th feature partition, $j \in \{1, 2, \ldots, s\}$. Here, we assume an equal number of feature subspaces within each feature partition.
For an original feature space comprising $d$ features, a random partition can be executed by first generating a random permutation of the $d$ features. Subsequently, consecutive blocks of $\lfloor d/s \rfloor$ features from the permutation are assigned to the $s$ subspaces in turn. The remaining features are then allocated one by one to different subspaces until all features have been partitioned. Evidently, within each feature partition, each feature belongs to exactly one feature subspace.
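The partitioning procedure just described can be sketched as follows. This is a minimal illustration in plain Python under one reading of the text (each subspace receives one block of $\lfloor d/s \rfloor$ features, then the leftovers are dealt out one per subspace); the function name `random_partition` is an assumption.

```python
import random


def random_partition(d, s, rng):
    """Split feature indices 0..d-1 into s disjoint random subspaces."""
    perm = list(range(d))
    rng.shuffle(perm)                      # random permutation of the features
    base = d // s                          # floor(d / s) features per block
    subspaces = [perm[j * base:(j + 1) * base] for j in range(s)]
    # Fewer than s features remain; give one to each of the first subspaces.
    for offset, feat in enumerate(perm[s * base:]):
        subspaces[offset % s].append(feat)
    return subspaces


parts = random_partition(23, 5, random.Random(0))
print([len(p) for p in parts])
```

With $d = 23$ and $s = 5$, each subspace receives four features from the blocks and three subspaces receive one extra, so every feature lands in exactly one subspace.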
For each subspace $P^{(i,j)}$, local feature weights $w^{(i,j)}$ can be computed using feature selection methods such as ReliefF. Then the overall feature weight of the $i$-th feature partition can be obtained by splicing the local weights of its $s$ subspaces:
$$w^{(i)} = [w^{(i,1)}, w^{(i,2)}, \ldots, w^{(i,s)}].$$
It is assumed that each feature partition contributes equally to the final feature weight. Then the final feature weights can be obtained by averaging the feature weights of the $M$ feature partitions:
$$w = \frac{1}{M} \sum_{i=1}^{M} w^{(i)}.$$
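As a small illustration of the splicing and averaging steps, the sketch below places each local weight back at the index of its feature and then averages over partitions. It is plain Python; the function names and the toy weights are illustrative assumptions.

```python
def splice_weights(subspaces, local_weights, d):
    """Place each local weight at its feature's original index (length-d vector)."""
    w = [0.0] * d
    for feats, weights in zip(subspaces, local_weights):
        for feat, weight in zip(feats, weights):
            w[feat] = weight
    return w


def average_weights(partition_weights):
    """Average the per-partition weight vectors w^(1), ..., w^(M)."""
    m = len(partition_weights)
    return [sum(column) / m for column in zip(*partition_weights)]


# One partition of 4 features into 2 subspaces, with made-up local weights.
w_1 = splice_weights([[2, 0], [1, 3]], [[0.5, 0.1], [0.9, 0.3]], d=4)
print(w_1)
print(average_weights([[1.0, 0.0], [0.0, 1.0]]))
```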

K-means cluster
We employ the k-means algorithm for feature clustering, which is one of the most well-known and widely used clustering methods [29]. It partitions a set of samples into $k$ clusters (the value of $k$ needs to be predetermined).
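Let $A = \{a_1, \ldots, a_k\}$ represent the $k$ cluster centers. Consider $z = [z_{ic}]_{d \times k}$, where $z_{ic}$ is a binary variable taking values 0 or 1, indicating whether feature $f_i$ belongs to the $c$-th cluster, with $c \in \{1, \ldots, k\}$. The objective function of k-means is given by:
$$J(A, z) = \sum_{i=1}^{d} \sum_{c=1}^{k} z_{ic} \, D^2(f_i - a_c), \tag{8}$$
where $D^2(f_i - a_c)$ denotes the Euclidean distance between feature $f_i$ and the $c$-th cluster centre $a_c$. The Euclidean distance is a commonly used similarity measure. The k-means algorithm iteratively minimizes the objective function $J(A, z)$ and updates the cluster centers $A$ and the membership matrix $z$ as follows:
$$a_c = \frac{\sum_{i=1}^{d} z_{ic} \, f_i}{\sum_{i=1}^{d} z_{ic}}, \tag{9}$$
$$z_{ic} = \begin{cases} 1, & \text{if } D^2(f_i - a_c) = \min_{1 \le c' \le k} D^2(f_i - a_{c'}), \\ 0, & \text{otherwise}. \end{cases} \tag{10}$$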
The algorithmic steps of K-means are as follows. Initially, $k$ features are randomly chosen as the centers of the $k$ clusters. Then, (1) the membership of each feature with respect to each cluster center is computed following Eq (10): a feature $f_i$ is assigned to cluster $c$ if $a_c$ is its nearest cluster center. Once all features are assigned to their respective clusters, (2) the position of each cluster center is updated following Eq (9). Steps 1 and 2 alternate until the stopping criteria are met.
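The two alternating steps can be sketched in plain Python as below. This is a minimal illustration that clusters feature columns with the squared Euclidean distance; stopping when the assignments no longer change (or after `max_iter` rounds) is a simplifying assumption, and the names `sq_dist` and `kmeans_features` are hypothetical.

```python
import random


def sq_dist(u, v):
    """Squared Euclidean distance between two feature columns."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans_features(F, k, rng, max_iter=300):
    """Cluster the feature columns in F; returns (labels, centers)."""
    centers = [list(c) for c in rng.sample(F, k)]   # k random features as centers
    labels = None
    for _ in range(max_iter):
        # Step 1: assign every feature to its nearest center (membership update).
        new_labels = [min(range(k), key=lambda c: sq_dist(f, centers[c])) for f in F]
        if new_labels == labels:
            break                                   # assignments are stable
        labels = new_labels
        # Step 2: move every center to the mean of its members (center update).
        for c in range(k):
            members = [f for f, lab in zip(F, labels) if lab == c]
            if members:                             # keep old center if cluster is empty
                centers[c] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels, centers


# Four feature columns observed on three samples; two obvious groups.
F = [[0, 0, 0], [0.1, 0, 0.1], [5, 5, 5], [5.1, 5, 4.9]]
labels, centers = kmeans_features(F, 2, random.Random(1))
print(labels)
```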

The proposed method
Previous random multi-subspace weight learning methods did not take into account that, in high-dimensional datasets, some highly collinear features might be accidentally allocated to the same subspace. This collinearity is prevalent and can lead to local overfitting of the algorithm, thereby reducing the accuracy of feature selection. Hence, to address the issue of feature collinearity within random subspaces while ensuring diversity within the original sample space, we propose a novel approach.
Our algorithm performs $M$ iterations. Initially, each feature is treated as equally important, with the initial weight vector $w$ set as a vector of all ones. In the $i$-th iteration of the algorithm, we begin by partitioning the original feature set into $k$ clusters based on a correlation measure using K-means clustering. Subsequently, we randomly select features from each feature cluster to construct $s$ equally sized random subspaces. Within each subspace, we employ NCFS to learn its local feature weights. These local feature weights from each subspace are then integrated to form a complete $d$-dimensional feature vector, denoted as $w^{(i)}$. When integrating the feature weights $w^{(i)}$ learned during each iteration into the overall feature weight vector $w$, we apply an importance factor to weight them, as opposed to the previous approach of taking a simple average. The general framework of the algorithm is illustrated in Fig 1. Section 3.1 presents the method for constructing random subspaces using K-means clustering, while Section 3.2 introduces the proposed weighting factor.

Use K-means to generate subspaces
To address the issue of feature collinearity within random subspaces, we cluster features based on their inter-correlations. The purpose of this step is to group features with a certain level of correlation into the same cluster. To achieve this, we employ the correlation coefficient as the measure in the K-means objective function instead of the Euclidean distance. Consequently, Eqs (8) and (10) are redefined by replacing the Euclidean distance $D^2(f_i - a_c)$ with $\mathrm{pea}(f_i, a_c)$, where $\mathrm{pea}(f_i, a_c)$ represents the Pearson coefficient between feature $f_i$ and the $c$-th cluster center $a_c$. The Pearson coefficient is a commonly used measure of the degree of correlation between variables, and its values range between -1 and 1. A Pearson coefficient closer to 0 indicates a smaller degree of correlation between variables, while a coefficient closer to 1 (or -1) suggests a larger degree of correlation.
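For reference, the Pearson coefficient between a feature column and a cluster center can be computed as in the following plain-Python sketch. The helper `most_correlated_center`, which assigns a feature to the center with the largest absolute coefficient, is one plausible reading of the correlation-based membership rule and should be treated as an assumption rather than the exact rule used here.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length vectors."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0.0 or sv == 0.0:
        return 0.0                       # constant vector: treat as uncorrelated
    return cov / (su * sv)


def most_correlated_center(f, centers):
    """Index of the center with the largest |pea(f, a_c)| (assumed assignment rule)."""
    return max(range(len(centers)), key=lambda c: abs(pearson(f, centers[c])))


centers = [[1, 2, 3, 4], [1, -1, -1, 1]]
print(pearson([1, 2, 3], [2, 4, 6]), pearson([1, 2, 3], [3, 2, 1]))
print(most_correlated_center([2, 4, 6, 8.5], centers))
```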
Once the feature set is divided into $k$ clusters $(K_1, \ldots, K_k)$, any feature cluster $K_c$ comprises different features, denoted as:
$$K_c = \{f^{(c,1)}, f^{(c,2)}, \ldots, f^{(c,n_c)}\},$$
where $f^{(c,j)}$ represents the $j$-th feature in the $c$-th feature cluster, and $n_c$ is the number of features in the $c$-th feature cluster. To construct feature subspaces with low collinearity among their constituents, we generate a random permutation of length $n_c$ for each feature cluster $K_c$. Subsequently, we sequentially assign $\lfloor n_c/s \rfloor$ contiguous features to each subspace. The remaining features are assigned to different subspaces in order until all features have been partitioned.
Therefore, the feature cluster $K_c$ can be represented as:
$$K_c = \{su^{(c,1)}, su^{(c,2)}, \ldots, su^{(c,s)}\},$$
where $su^{(c,j)}$ denotes the features assigned by $K_c$ to the $j$-th subspace, and the $su^{(c,j)}$ are pairwise disjoint. Thus, each subspace $P^{(i,j)}$ can be represented as:
$$P^{(i,j)} = \{su^{(1,j)}, \ldots, su^{(k,j)}\}. \tag{14}$$
Each feature cluster is evenly divided into $s$ segments, which are then separately incorporated into the subspaces. Together, the subspaces $P^{(i,1)}, \ldots, P^{(i,s)}$ constitute the $i$-th feature partition generated by the algorithm. Subsequently, we employ NCFS to learn local feature weights within each subspace, where $w^{(i,j)}$ represents the feature weights learned in the $j$-th subspace of the $i$-th feature partition (i.e., the $i$-th iteration of the algorithm). Once the feature weights $w^{(i,j)}$ have been computed for all subspaces, they are consolidated into a complete $d$-dimensional feature weight vector, denoted as $w^{(i)}$.
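The construction of subspaces from feature clusters can be sketched as follows. This is a minimal plain-Python illustration in which clusters are given as lists of feature indices; leftovers are dealt out in order, following the description above, and the function name `build_subspaces` is an assumption.

```python
import random


def build_subspaces(clusters, s, rng):
    """Spread every cluster's features across s subspaces (lists of indices)."""
    subspaces = [[] for _ in range(s)]
    for cluster in clusters:
        perm = list(cluster)
        rng.shuffle(perm)                   # random permutation within the cluster
        base = len(perm) // s               # floor(n_c / s) features per subspace
        for j in range(s):
            subspaces[j].extend(perm[j * base:(j + 1) * base])
        # Fewer than s features remain; assign them to the subspaces in order.
        for offset, feat in enumerate(perm[s * base:]):
            subspaces[offset % s].append(feat)
    return subspaces


clusters = [[0, 1, 2, 3, 4], [5, 6, 7], [8, 9, 10, 11]]
subspaces = build_subspaces(clusters, 2, random.Random(0))
print(subspaces)
```

Because every subspace draws a share from every cluster, features that were grouped together for being correlated are spread across subspaces rather than concentrated in one.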

Weighting $w^{(i)}$
In prior random subspace methods, every feature partition's contribution was assigned equal weight, so the final feature weights were a plain average over all partitions. We argue that the contributions of the feature partitions need not be equal. Therefore, we introduce an appraisal factor, $\alpha$, for the feature weight vector $w^{(i)}$, to provide a weighted assessment of every $w^{(i)}$. Before calculating $\alpha$, it is necessary to obtain the correlation matrix $R$ of the feature matrix $X$. The $(i, j)$-th element of $R$ is computed as:
$$R_{ij} = \mathrm{pea}(f_i, f_j),$$
where $f_i$ and $f_j$ represent the $i$-th and $j$-th feature columns of $X$, and 'pea' denotes the Pearson coefficient. According to the above definition, $R$ is a $d \times d$ symmetric matrix. Subsequently, we perform Cholesky decomposition on $R$ and multiply the resulting factor $L$ by a random sample $\theta$ drawn from a Gaussian distribution, resulting in $v$:
$$v = L\theta, \qquad R = LL^T.$$
The vector $v$ has the same form as $w$: it is a random vector that maintains the correlation structure of the features. Afterward, we pass $v$ through a cumulative multivariate distribution function $\Phi$, resulting in a cumulative probability vector $u$ whose entries lie in $[0, 1]$:
$$u = \Phi(v).$$
Define $\pi^{(i)}$ as the normalised $w^{(i)}$ and $X^{(i)}$ as an initially empty set. Formally, if $(\pi^{(i)})_k$ is greater than $u_k$, the feature $f_k$ is added to $X^{(i)}$. Finally, the weighting factor $\alpha$ is defined as:
$$\alpha = \mathrm{evaluate}(\mathrm{Classifier}, X^{(i)}).$$
Here, the 'evaluate' function calculates the classification results on the feature matrix $X^{(i)}$; the classification accuracy (ACC) can be selected as the evaluation criterion, and Classifier denotes the selected classifier. We use the KNN classifier ($k = 3$) as Classifier. Therefore, at the end of the $i$-th iteration of the algorithm, we use the following formula to add $w^{(i)}$ to the total weight vector $w$:
$$w = w + \alpha \, w^{(i)},$$
where $w^{(i)}$ denotes the feature weight vector obtained by the algorithm in the $i$-th iteration.
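The sampling part of this step (correlated thresholds $u$ and the selection of $X^{(i)}$) can be sketched in plain Python as follows. Several details are assumptions made for illustration: the distribution function is taken to be the standard normal CDF applied elementwise, the normalisation of $w^{(i)}$ is taken to be min-max scaling, and $R$ is assumed to be positive definite so that the Cholesky factor exists. The function names are hypothetical, and the classifier evaluation that yields $\alpha$ is left out.

```python
import math
import random


def cholesky(R):
    """Lower-triangular L with R = L L^T (R must be positive definite)."""
    d = len(R)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            acc = R[i][j] - sum(L[i][t] * L[j][t] for t in range(j))
            L[i][j] = math.sqrt(acc) if i == j else acc / L[j][j]
    return L


def correlated_thresholds(R, rng):
    """Draw theta ~ N(0, I), form v = L theta, and map v through the normal CDF."""
    L = cholesky(R)
    theta = [rng.gauss(0.0, 1.0) for _ in R]
    v = [sum(L[i][t] * theta[t] for t in range(i + 1)) for i in range(len(R))]
    return [0.5 * (1.0 + math.erf(x / math.sqrt(2.0))) for x in v]


def select_features(w_i, u):
    """Indices k whose min-max normalised weight exceeds the threshold u_k."""
    lo, hi = min(w_i), max(w_i)
    if hi == lo:
        return list(range(len(w_i)))     # all weights equal: keep every feature
    pi = [(x - lo) / (hi - lo) for x in w_i]
    return [k for k, (p, t) in enumerate(zip(pi, u)) if p > t]


R = [[1.0, 0.8], [0.8, 1.0]]
L = cholesky(R)
u = correlated_thresholds(R, random.Random(0))
print(L, u)
```

Because $R$ has a unit diagonal, each entry of $v$ has unit variance, so under these assumptions each threshold $u_k$ is uniformly distributed on $[0, 1]$ while correlated features receive correlated thresholds.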
The detailed steps of the algorithm are described in Algorithm 1.

Algorithm 1
initialize w as a vector of all ones
for i = 1 to M do
    set P^(i,1), ..., P^(i,s) as s empty sets
    cluster the features into K_1, ..., K_k using K-means with the Pearson coefficient (Section 3.1)
    for c = 1 to k do
        for j = 1 to s do
            randomly select floor(len(K_c)/s) features from K_c, add them to P^(i,j), and remove the selected features from K_c
        if K_c is not empty then
            for q = 1 to len(K_c) do
                randomly select a feature from K_c, add it to one of the subspaces P^(i,1), ..., P^(i,s), and remove it from K_c
    for j = 1 to s do
        use NCFS to compute the feature importance w^(i,j) on P^(i,j)
    form w^(i,1), ..., w^(i,s) into a complete weighting vector w^(i) based on the indexing of the features
    compute the weighting factor α for w^(i) and add the weighted w^(i) to w (Section 3.2)
return w

Experiments
In this section, we conduct multiple sets of experiments to evaluate the performance of the proposed algorithm. First, we explore the convergence of the proposed K-means algorithm based on Pearson coefficients. Subsequently, we compare our method with other approaches on both synthetic and real-world datasets, and the experimental results confirm its effectiveness. Finally, we investigate the sensitivity of the algorithm to its parameters to determine the optimal parameter configuration.

Datasets
Ten real-world datasets were utilised as the primary experimental benchmarks to assess the performance of the proposed method. These datasets were obtained from diverse domains such as facial images (pixraw10P, warpAR10P), biomedical data (lung_discrete, tumors_C, GLIOMA, TOX_171, leukemia, ALLAML), and other areas (SCADI, arcene). Table 1 offers comprehensive information on these datasets. In addition, synthetic datasets were produced as benchmarks using four small-scale datasets from the UCI database [30]. Further details about these synthetic datasets are given in Section 4.5.2. All datasets were normalized to conform to a standard distribution.

Compared method
We compared the proposed KNCFS method against six different approaches. Specifically, we used chi-square as a baseline and compared KNCFS with two widely used feature selection (FS) methods [32][33][34], Fisher-score and ReliefF. Additionally, we considered RBEFF, recognized as one of the most advanced FS methods, and NCFS, renowned for its excellent performance on high-dimensional datasets. Given the various improvements we made to NCFS, we also used RB-NCFS as a comparative method to evaluate the effectiveness of the improvements in this paper. Below, we provide a brief overview of all the comparative methods:
• chi-square [35]: A statistical method used to select categorical variables significantly associated with the target variable.
• Fisher-score [36]: Measures the importance of features for classification tasks by comparing the variance between different classes and within classes.
• ReliefF [37]: A feature selection method based on a nearest-neighbor model that uses ReliefF scores to assess feature importance; number of nearest neighbours k = 5.
• RBEFF [28]: A method based on random subspaces that uses ReliefF to learn local feature weights in subspaces; number of feature partitions M = 10, number of subspaces s = 10, number of nearest neighbours in ReliefF k = 5.
• NCFS [27]: A feature selection method based on fast neighborhood model analysis that maximizes leave-one-out classification accuracy to obtain feature weights; kernel width σ = 1 and regularisation parameter λ = 1.
• RB-NCFS: A method based on random subspaces that uses NCFS to learn local feature weights in subspaces; number of feature partitions M = 10, number of subspaces s = 10, kernel width for NCFS σ = 1 and regularisation parameter λ = 1.

Compared metric
In our experiments, to validate the effectiveness of the method, we employed four classifiers to calculate classification performance: Support Vector Machine (SVM), Naive Bayes (NB), Decision Tree (DT), and K-Nearest Neighbors (KNN). Additionally, we utilized two standard evaluation metrics, accuracy (ACC) and $F_1$-score, to assess the performance of different feature selection methods. ACC and $F_1$-score range between 0 and 1, where higher values indicate better performance.
1. ACC:
$$\mathrm{ACC} = \frac{1}{n} \sum_{i=1}^{n} I(y_i = c(x_i)),$$
where $I(y_i = c(x_i)) = 1$ if and only if $y_i = c(x_i)$, $y_i$ is the true label of $x_i$, and $c(x_i)$ is the label predicted for sample $x_i$ by the classifier.
2. $F_1$-score:
In binary classification problems, samples are categorized into four scenarios based on the actual labels and predicted labels: True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN). Precision is the proportion of samples predicted as "positive" that are actually "positive", while recall is the proportion of samples actually labeled as "positive" that were correctly predicted as "positive" by the model. The definitions of these two metrics are as follows (Eqs (21) and (22)):
$$\mathrm{Precision} = \frac{TP}{TP + FP}, \tag{21}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}. \tag{22}$$
Often, we would like to account for both Precision and Recall. The $F_1$-score is therefore another commonly used metric: it is the harmonic mean of Precision and Recall and can be used to comprehensively evaluate the performance of the model. It is defined in Eq (23):
$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}. \tag{23}$$
The $F_1$-score for binary classification can be extended to multiclass problems, where one of the classes is considered as the positive class and the others as negative classes, and the $F_1$-score is then calculated according to Eq (23).
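The metrics above can be computed directly from label lists, as in this minimal plain-Python sketch. The function names and the toy labels are illustrative; returning 0 when a denominator is zero is a convention chosen here, not something specified in the text.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 with `positive` treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
print(accuracy(y_true, y_pred), precision_recall_f1(y_true, y_pred))
```

For a multiclass problem, calling `precision_recall_f1` once per class with that class as `positive` corresponds to the one-versus-rest extension described above.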

Parameter settings
For KNCFS, there are three parameters to consider: the number of feature partitions M, the number of subspaces s, and the number of clusters for feature clustering k. Following the recommendation of prior research, σ and λ in NCFS are both set to 1; for M, s, and k, we explore their optimal settings within the range {5, 10, 15, 20, 25}.
Regarding the parameters for classifiers, we chose the RBF kernel for the support vector machine, set the maximum tree depth to 5 for the decision tree, and selected 3 nearest neighbors for K-nearest neighbors (KNN). The parameter configurations are summarized in Table 2.

Convergence results.
In this paper, K-means terminates and returns its result when, at the $k$th iteration, the objective function satisfies $J_k(A, z) - J_{k-1}(A, z) < \eta$ or $k > \text{max\_iter}$, where we set $\eta = 0.02$ and max_iter = 300. Fig 2 shows the convergence of the K-means method we use on the ten datasets. We find that the algorithm converges quickly on most datasets, while on the SCADI, TOX_171, and arcene datasets the objective function value oscillates within an interval, and the algorithm returns results only when the maximum number of iterations is reached.

Classification result.
The average classification accuracy of the seven feature selection methods is presented in Tables 3 and 4, while the $F_1$-score results are shown in Tables 5 and 6. The last row in the tables gives the number of times each method achieved the best results (win/tie). The average results achieved by SVM with different features are shown in Fig 3. The average results accumulated by the four classifiers are shown in Fig 4. Regarding classification accuracy, KNCFS achieved the best results 27 times when selecting 50 features and 26 times (21/5) when selecting 100 features. Fig 3 shows that in most cases KNCFS outperforms the other six comparative methods in terms of classification accuracy, and Fig 4 (sub-figure a) also shows that KNCFS obtains the best classification accuracy. It is worth noting that RB-NCFS (2/0 and 8/1) generally performs better than NCFS (0/0 and 1/1) because it is a random subspace method that considers sample diversity, giving it an advantage over traditional methods. However, due to its lack of a solution for feature collinearity, its performance falls short compared to KNCFS.
On the other hand, KNCFS obtained the best results 27 times in Table 5 and 25 times (22/2) in Table 6 for $F_1$-score, demonstrating a significant advantage in $F_1$-score as well, and Fig 4 (sub-figure b) also shows that KNCFS obtains the best $F_1$-score. RBEFF achieved the best results twice (selecting 50 features) and three times (selecting 100 features) on the SCADI dataset because of its lower dimensionality. RBEFF is a filter-based feature selection method that performs well on small datasets. However, due to its lack of guidance algorithms in the feature selection stage and its inability to consider nonlinear relationships between features, its performance on high-dimensional datasets falls short compared to KNCFS.

Success rate of feature selection.
In this subsection, we generated synthetic datasets based on four real-world datasets from the UCI database. These datasets are as follows: Caesarian (80 samples, 5 features, 2 classes), Fertility (100 samples, 10 features, 2 classes), BLOGGER (100 samples, 5 features, 2 classes) and Immunotherapy (90 samples, 7 features, 2 classes). Before starting the experiment, we considered the initial features of each dataset as relevant. We then added noise features consisting of random numbers with a mean of 0 and a variance of 5. The number of noise features varies from 50 to 500 in increments of 50 to form a set of synthetic datasets. We apply all seven methods to each synthetic dataset to learn the importance of the features and then rank the features by importance. For example, on the Caesarian dataset with five relevant features, we ranked the feature weights and counted the number of relevant features in the top five to measure the success of feature selection. The experimental results for the synthetic datasets are shown in Fig 5. On the Caesarian dataset (sub-figure a), KNCFS and Chi-square have similar results when the number of noisy features is below 200. On the Fertility dataset (sub-figure b), KNCFS outperforms NCFS slightly, and the gap between KNCFS and RBEFF widens as the number of noisy features increases. Lastly, KNCFS consistently exhibits the strongest performance on the BLOGGER and Immunotherapy datasets. This study establishes KNCFS as a compelling contender. RB-NCFS is often less effective than NCFS, possibly due to overfitting caused by random noise features interfering with the random subspace approach. This creates bias in the

Parameter sensitivity analysis
In order to investigate the influence of the parameters M, s and k on the performance of our proposed method, we performed sensitivity experiments on the average accuracy of the SVM and KNN classifiers. For simplicity, we selected two representative datasets, namely "leukemia" and "GLIOMA", for the parameter sensitivity analysis. As shown in Figs 6 and 7, the accuracy of the algorithm decreases when k exceeds 15, with the highest accuracy achieved when k is in the range {5, 10}. Similarly, the algorithm shows higher accuracy when s is in the range {5, 10}. Overall, variations in M have a relatively small effect on the accuracy of the algorithm. Therefore, it can be concluded that the algorithm is not very sensitive to the value of M. Finally, we set M, s and k to {10, 10, 10}.

Conclusion
Feature selection (FS) is an important data preprocessing technique that reduces the dimensionality of a dataset, decreases model complexity and lowers computational cost. In this paper, we propose a random multi-subspace method based on feature correlation clustering, which is implemented through an iterative process consisting of two key phases: a stochastic subspace learning phase and a feature vector weighting phase. The stochastic subspace learning phase aims to increase the diversity of samples to extract more information, while the feature vector weighting phase evaluates the feature partitions. We conducted numerical experiments on two types of datasets: real-world datasets and synthetic datasets with noisy features. The experimental results, when compared with Chi-square, Fisher-score, ReliefF, RBEFF, NCFS and RB-NCFS, show that KNCFS is a state-of-the-art FS algorithm, as it can effectively identify relevant features. In this study, we used the existing feature selection method NCFS in the subspace learning phase, but there are more advanced feature selection methods that can be used to improve the effectiveness of the algorithm. In addition, selecting feature selection methods or unsupervised feature selection algorithms capable of handling multi-labeled data could further extend the applicability of the algorithm. Feature selection remains an important area of research with many other aspects to be explored.