Interfered feature elimination coupled with feature group selection for wound infection detection by electronic nose

  • Jia Liu ,

    Contributed equally to this work with: Jia Liu, Jinglei Zhang

    Roles Conceptualization, Formal analysis, Methodology, Writing – original draft

    ☯ These authors contributed equally to this work.

    Affiliation School of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou, China

  • Jinglei Zhang ,

    Contributed equally to this work with: Jia Liu, Jinglei Zhang

    Roles Data curation, Formal analysis, Validation

    ☯ These authors contributed equally to this work.

    Affiliation The First Affiliated Hospital of Sun Yat-sen University of Clinical and basic research of assisted reproductive technology, Guangzhou, China

  • Shaoqi Zhang,

    Roles Software

    Affiliation College of information and management science, Henan Agricultural University, Zhengzhou, China

  • Kaiwei Li,

    Roles Data curation

    Affiliation The First Affiliated Hospital of Henan University of CM Clinical and Basic Research on the Treatment of Digestive Tract Tumors with Integrative Medicine, Zhengzhou, China

  • Xiang Li,

    Roles Data curation

    Affiliation The First Affiliated Hospital of Henan University of CM Clinical and Basic Research on the Treatment of Digestive Tract Tumors with Integrative Medicine, Zhengzhou, China

  • Shuo Zhang,

    Roles Methodology, Software, Validation, Writing – original draft

    Affiliation College of information and management science, Henan Agricultural University, Zhengzhou, China

  • Hang Gu,

    Roles Validation

    Affiliation School of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou, China

  • Zhen Chen,

    Roles Funding acquisition, Project administration

    Affiliation College of information and management science, Henan Agricultural University, Zhengzhou, China

  • Chao Liu ,

    Roles Software, Validation

    cliu@henau.edu.cn (CL); suntong@henau.edu.cn (TS); nan1969@sina.com (NZ)

    Affiliation College of information and management science, Henan Agricultural University, Zhengzhou, China

  • Nan Zhang ,

    Roles Data curation, Project administration, Supervision, Writing – review & editing

    cliu@henau.edu.cn (CL); suntong@henau.edu.cn (TS); nan1969@sina.com (NZ)

    Affiliation The First Affiliated Hospital of Henan University of CM Clinical and Basic Research on the Treatment of Digestive Tract Tumors with Integrative Medicine, Zhengzhou, China

  • Tong Sun

    Roles Conceptualization, Formal analysis, Funding acquisition, Methodology, Project administration, Software, Supervision, Writing – review & editing

    cliu@henau.edu.cn (CL); suntong@henau.edu.cn (TS); nan1969@sina.com (NZ)

    Affiliation College of information and management science, Henan Agricultural University, Zhengzhou, China

Abstract

As precise odor-sensing equipment, the electronic nose (e-nose) integrates multiple advanced and sensitive sensors that can identify wound infections non-invasively and rapidly by analyzing the characteristic odor of the wound. To reduce sensor cost while maintaining or improving e-nose performance, the sensor array must be optimized efficiently. To this end, we propose a new sensor array optimization algorithm named Interfered Feature Elimination coupled with Feature Group Selection (IFE-FGS). In this method, the IFE algorithm first removes the bad sensor features; the FGS algorithm then determines the optimized sensor combination by gradually selecting features in groups. Experimental results on two bacteria datasets and six public gene expression profiling datasets show the superiority of the IFE-FGS method. On bacteria dataset 2, IFE-FGS achieves a mean accuracy of 93.95% and a max accuracy of 94.94%, significantly ahead of the comparison methods. Our proposed method also shows consistency and effectiveness: it takes two first places and four second places in mean accuracy, and one first place and six second places in max accuracy. Moreover, it yields three novel and valuable findings for the electronic nose: 1) It can effectively identify biomarkers in the application. 2) It can effectively distinguish the degree to which chemical components contribute to odors. 3) It can reveal the effective detection range of the targets.

1. Introduction

Wound bacterial infection is a common complication of wounds. It can severely hinder wound healing, make the wound difficult to treat, and increase the risk of amputation and death for patients [1–3]. Traditional detection methods, such as morphological examination and immunological testing, are expensive, slow, and invasive, so the standard cycle of diagnosis and treatment may last several months, which cannot meet the practical need for better diagnosis and treatment [4,5]. In home care, patients and caregivers have difficulty noticing wound changes in time, so treatment is easily delayed. As a result, wound bacterial infections seriously threaten human health and impose a substantial economic burden on society [6–8]. For these reasons, accurate, rapid, and low-cost detection of wound bacterial infection has attracted the attention of many researchers.

The odor-sensing approach can identify wound infections by analyzing the characteristic odor of the wound. It is non-invasive, fast, and easy to operate, which makes it very suitable for routine detection of wound infection [9,10]. The electronic nose (e-nose) is a common odor-sensing approach: an electronic device that mimics the working principle of the mammalian olfactory system [11–15]. It has been intensively applied in many fields, such as medical diagnosis [16–19], food safety [20], flavor identification [21–23], and environmental monitoring [24–26]. An e-nose system typically consists of three parts: a multi-sensor array to sense the target odor, a signal processing unit to generate odor data, and a set of dedicated algorithms to identify the odor. For bacterial wound infection detection, gas sensors with broad selectivity and cross-sensitivity jointly construct odor fingerprints, and a dedicated odor recognition algorithm then recognizes the sensor signals. Hence, the e-nose can quickly and effectively detect and monitor wound infection by analyzing the characteristic odor of the wound.

For an established electronic nose sensor array, the performance of the odor recognition algorithm determines the final detection capability of the whole electronic nose system. However, more than 10,000 odorous compounds are known [27], so developing a universal e-nose for all odor detection tasks is nothing but wishful thinking. Therefore, to achieve rapid and effective detection of bacterial wound infection, it is necessary to design a specific sensor array optimization algorithm that reduces the cost and even improves the performance of the electronic nose system.

Based on the above research requirements, sensor array optimization (SAO) algorithms are gaining increasing attention in the electronic nose community. The general concept of SAO is to determine the best sensor combination to provide effective and robust odor judgment. This implies that the SAO algorithm should guide us in eliminating poor and redundant sensors and/or adding necessary sensors to the array, thus achieving rapid detection of bacterial wound infection while reducing the cost of sensors. Currently, in the e-nose community, SAO is commonly solved by borrowing feature selection technology from the machine learning community. This comes down to two major technical issues: 1) the evaluation of the features and 2) the searching scheme used to form the compact key feature set. For 1), statistical and information-theoretic methods are usually adopted, such as the Pearson correlation coefficient, mutual information, and performance sensitivity. For 2), Sequential Selection Algorithms and Heuristic Search Algorithms have been extensively applied, such as Sequential Feature Selection (SFS), Sequential Floating Forward Selection (SFFS), Particle Swarm Optimization (PSO), and Genetic Algorithm (GA). Details can be found in the review article by Chandrashekar and Sahin [28] and other literature [29–31].

Some advanced methods applied in medical research have demonstrated the superiority of their algorithms and are capable of effectively extracting valuable information. Specifically, Elnaz Pashaei [32] proposed a novel Binary Sand Cat Swarm Optimization to address the “curse of dimensionality” in biomedical data analysis, which could even classify colon cancer without any error. The work in [33] combined hub gene ranking techniques and feature selection algorithms to identify reliable biomarkers and therapeutic targets for Alzheimer’s disease research. The work in [34] integrated differential expression with network centrality analysis and then identified genes over-represented in crucial pathways and cancer fitness genes. The work in [35] used a metaheuristic algorithm to improve classification on several publicly available high-dimensional biomedical datasets. Although these methods did not focus on the electronic nose, they still gave us many valuable inspirations.

In some published works on SAO, for example [36–43], individual evaluation and manipulation of a single feature form the basis of the optimization process. However, we believe that this ignores the working principle of the e-nose, i.e., using a set of sensors with broad selectivity and cross-sensitivity to identify odors, which may lead to performance loss. Therefore, we infer that the e-nose sensors can be viewed as working in several groups. Inspired by this, we propose a novel SAO searching scheme named Feature Group Selection (FGS), which is used in our application of bacteria detection. In our practice, the original features are first filtered by the Interfered Feature Elimination (IFE) algorithm in an unsupervised way to exclude the features with a low signal-to-noise ratio (SNR), and then the FGS algorithm is employed to find the key and compact feature group.

This paper studies SAO for the e-nose and optimizes the custom e-nose for detecting wound infection. The main work is threefold: 1) A feature evaluation criterion based on linear transformation, which has been widely used in feature selection, is reformulated; 2) Under a reasonable assumption, the IFE method is presented to exclude the features with low SNR; 3) An effective and efficient sensor array optimization method is proposed to optimize the custom e-nose for wound infection detection and analysis.

Before the FGS algorithm is elaborated, the feature evaluation criterion and the IFE algorithm, which are used to preprocess the original features, are introduced. The paper is organized as follows. Section 2.1 presents a brief description of the custom e-nose, the measurement process of 3 common pathogenic bacteria, and two custom bacteria datasets. In section 2.2, a feature evaluation criterion based on linear transformation is first reviewed and then reformulated for use in the IFE algorithm. In section 2.3, the IFE method is presented based on a reasonable assumption, the criterion, and the Pearson correlation coefficient. Subsequently, in section 2.4, the FGS algorithm is elaborated step by step, where a crossover selection and a mutation selection are involved. In section 2.5, the validation protocol of the involved algorithms is presented. After the initial performance of the e-nose without SAO is described in section 3.1, the validation results of the proposed method and eight benchmarks on the bacteria datasets and six public datasets are discussed in section 3.2, where the reason for the outstanding merit of the IFE-FGS method in model stability is explained. In section 3.3, we seek further knowledge about the application with the assistance of the SAO algorithm. Finally, section 4 concludes the paper.

2. Materials and methods

2.1. Experiment and datasets

In this research, we further investigated the potential of an odor-sensing approach for the detection of wound infections by proposing a novel sensor array optimization algorithm. Specifically, we collected the data from experiments on 44 Sprague Dawley (SD) rats [42,44]. Based on a full-thickness wound infection model, the rats were inoculated with three common wound infection pathogens, Escherichia coli (EC), Pseudomonas aeruginosa (PA), and Staphylococcus aureus (SA), for three infected groups, and with sterile Phosphate Buffer Solution (PBS) for a control group. The prototype was used to directly sniff the rat wound samples without any sample pretreatment to acquire the wound odor data, and an information fusion algorithm dedicated to the dual odor-sensing prototype was developed to accomplish the final identification of the rat infection type.

The specifications of the sensors used in the e-nose and a description of the two bacteria datasets are listed in Tables 1 and 2, respectively.

Table 1. Specifications of the involved sensors of the custom e-nose.

https://doi.org/10.1371/journal.pone.0327748.t001

Table 2. Description of the two custom bacteria odor datasets.

https://doi.org/10.1371/journal.pone.0327748.t002

The custom e-nose consisted of two modules: the gas channel module and the signal processing and control module. The gas channel module mainly included Teflon gas lines, an air filter (Teflon membrane), one single-way solenoid valve and two three-way solenoid valves, a Teflon sample chamber, a sensor array of 4 environmental sensors and 30 gas sensors, a mass flow controller (MFC), and a vacuum pump. The signal processing and control module mainly included an upper computer, a signal processing circuit, and a control board. The detection process takes 14 min, comprising 3 min of baseline collection, 3 min of sample collection, and 8 min of system purging, with the gas flow rate set at 100 ml/min by the MFC.

Escherichia coli (EC), Staphylococcus aureus (SA), and Pseudomonas aeruginosa (PA) are three common pathogenic bacteria responsible for wound infection. Using two different carrier gases, we measured the three bacterial fluids with the custom e-nose and acquired two bacteria datasets. Dataset 1 was collected using clean air as the carrier gas, and dataset 2 using the volatiles of medical ethanol as the carrier gas to simulate the odor background of the hospital ward.

2.2. Evaluation of feature relevance by linear transformation

The feature evaluation criterion assesses feature relevance for the SAO algorithm. One family of feature evaluation criteria is derived from linear transformations, such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Independent Component Analysis (ICA), and has been widely used to estimate feature relevance [42,45–51]. In this subsection, the fundamental criteria are first introduced, with a brief review of them listed in Table 3. They are then corrected and reformulated.

Table 3. Previous literature on feature relevance evaluation based on linear transformation.

https://doi.org/10.1371/journal.pone.0327748.t003

Suppose that a multivariate random vector $\mathbf{x} = [x_1, x_2, \ldots, x_s]^{T}$ consists of $s$ original features and is projected onto a set of unit vectors $\mathbf{w}_1, \ldots, \mathbf{w}_m$ by a linear transformation, so that the transformed random vector $\mathbf{y} = [y_1, \ldots, y_m]^{T}$ can be represented as

$$\mathbf{y} = W^{T}\mathbf{x}, \qquad W = [\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_m], \tag{1}$$

where

$$y_i = \mathbf{w}_i^{T}\mathbf{x} = \sum_{j=1}^{s} w_{ij}\, x_j, \tag{2}$$

and $w_{ij}$ is the j-th component of the projection direction $\mathbf{w}_i$. Intuitively, a bigger $|w_{ij}|$ indicates greater importance of the corresponding feature $x_j$ on the projection direction $\mathbf{w}_i$, and the importance of $x_j$ can be denoted by the contribution degree $c_j$,

$$c_j = \sum_{i=1}^{m} \lambda_i\, |w_{ij}|, \tag{3}$$

where $\lambda_i$ has different meanings in different linear-transformation techniques. For example, in PCA, $\lambda_i$ is the i-th largest eigenvalue of the covariance matrix of the observations of $\mathbf{x}$. It denotes the sample variance on the principal component $y_i$, and is the intensity of the component $y_i$. In LDA, $\lambda_i$ is the i-th largest eigenvalue of $S_w^{-1} S_b$ ($S_w^{-1}$ is the inverse of the within-class scatter matrix and $S_b$ is the between-class scatter matrix), and it denotes the Fisher separation or signal-to-noise ratio (SNR) on the projection direction $\mathbf{w}_i$. Detailed descriptions and formulations of PCA and LDA can be found in [47] and [50].

We revised the evaluation of the feature relevance of $x_j$ as

$$c_j = \sum_{i=1}^{m} \lambda_i\, \frac{|w_{ij}|^{p}}{\lVert \mathbf{w}_i \rVert_p^{p}}, \tag{4}$$

where $\lVert \mathbf{w}_i \rVert_p^{p} = \sum_{j=1}^{s} |w_{ij}|^{p}$. The numerator is the absolute value of $w_{ij}$ to the power of $p$. The denominator is the $p$-norm of $\mathbf{w}_i$ to the power of $p$, which normalizes the numerator. This raises the first issue, namely the normalization. The aim of the normalization is to guarantee that the allocation of $\lambda_i$ over the features $x_j$ is conserved, i.e., $\sum_{j=1}^{s} |w_{ij}|^{p} / \lVert \mathbf{w}_i \rVert_p^{p} = 1$. However, as shown in Table 3, the previous criteria are equivalent to adopting a 2-norm normalization, because $\mathbf{w}_i$ is usually normalized by the 2-norm automatically in the transformation process, i.e., $\lVert \mathbf{w}_i \rVert_2 = 1$, which causes a false allocation of $\lambda_i$. For example, given a unit vector $\mathbf{w}_i = [\sqrt{2}/2, \sqrt{2}/2]^{T}$, the method of [47] would allocate a total contribution of $\sqrt{2}\,\lambda_i \approx 1.414\,\lambda_i$, so the extra 41.4% of $\lambda_i$ is groundless. In our formulation, the issue is solved by matching the numerator with its normalization term.
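The normalization argument around Eq. (4) can be checked numerically. The sketch below is only an illustration: it assumes the unit direction has two equal components of √2/2 (consistent with the 41.4% figure quoted above), and the function name `allocation` is hypothetical rather than taken from the paper.

```python
import math

def allocation(w, p):
    """Share of lambda_i given to each feature when the numerator |w_ij|^p
    is divided by the matching p-norm term ||w_i||_p^p, as in Eq. (4)."""
    norm_pp = sum(abs(x) ** p for x in w)
    return [abs(x) ** p / norm_pp for x in w]

# Assumed example direction: unit length in the 2-norm, two equal components.
w = [math.sqrt(2) / 2, math.sqrt(2) / 2]

# Mismatched case: numerator |w_ij| (p = 1) with only the 2-norm normalization.
mismatched_total = sum(abs(x) for x in w)

# Matched case: numerator and normalization term use the same p.
matched_total_p1 = sum(allocation(w, 1))
matched_total_p2 = sum(allocation(w, 2))

print(round(mismatched_total, 3))                              # 1.414
print(round(matched_total_p1, 3), round(matched_total_p2, 3))  # 1.0 1.0
```

With the matched normalization the shares sum to 1 for either choice of p, so each eigenvalue is distributed over the features without being inflated.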

Another issue is that the projection direction selection may affect the feature relevance evaluation. In literature [47] and [46], the authors noticed the effect on the final performance by selecting different projection directions to evaluate the feature importance, yet no practical method was presented to complete the selection. In the following subsection, we will present an implementation of the selection to calculate feature relevance by Eq. (4).

2.3. Interfered feature Elimination (IFE)

The IFE method used PCA and the Pearson correlation coefficient to evaluate feature relevance, and thereby eliminated the features badly interfered with by noise in an unsupervised way. This method was built on the reasonable assumption that sensors differ in their ability to resist actual noise, since they are usually made with various materials and principles. Therefore, we could deduce that most sensors tended to point in directions roughly consistent with the latent signal directions in feature space. Based on this deduction, we provide a specific implementation of the IFE method as follows:

First, we normalized each original feature by z-score standardization, so the variance of each original feature was scaled to 1. Then, we transformed the feature dataset by PCA, and the derived principal components with intensity bigger than 1 (equivalent to an eigenvalue $\lambda_i > 1$) were considered to be consistent with the latent signal direction. Next, we used these components to calculate feature relevance by Eq. (4). As an illustration, let $f_1$, $f_2$, and $f_3$ denote three features whose intensity (modulus) is pre-scaled to 1. After PCA transformation of the three features, the principal components PC 1 and PC 2, which have a larger intensity, are considered to be consistent with the latent signal direction, while PC 3, which has a smaller intensity, is excluded because it might be badly interfered with.

Second, we sorted the features in descending order of their individual relevance. We assumed that a feature with higher relevance was more likely to be a key feature than a feature with lower relevance. Therefore, a feature was excluded if it was not correlated with each feature ranked above it in relevance. The specific algorithm flow of the IFE method is listed in Table 4. The threshold was the minimum Pearson correlation, which was set below 0.4 empirically.
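The two IFE steps can be sketched in a few lines of NumPy. This is a simplified reading, not the authors' implementation (Table 4 holds the actual flow): the function name `ife`, the default `p = 2`, the threshold value 0.4, and the rule "drop a feature whose absolute correlation with every higher-ranked feature is below the threshold" are all assumptions. A stricter reading would replace `np.max` with `np.min`.

```python
import numpy as np

def ife(X, corr_threshold=0.4, p=2):
    """Return sorted indices of the features kept by a simplified IFE pass.

    X has shape (n_samples, n_features). All names are illustrative.
    """
    # PCA on z-scored features equals an eigen-decomposition of the
    # Pearson correlation matrix.
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    strong = eigvals > 1.0                      # components with intensity > 1
    lam, W = eigvals[strong], eigvecs[:, strong]
    # Eq. (4): allocate each eigenvalue over the features with a matched p-norm.
    share = np.abs(W) ** p / np.sum(np.abs(W) ** p, axis=0, keepdims=True)
    relevance = share @ lam
    order = np.argsort(-relevance)              # descending relevance
    kept = []
    for rank, j in enumerate(order):
        higher = order[:rank]
        # Assumption: keep j if it correlates with at least one higher-ranked feature.
        if rank == 0 or np.max(np.abs(R[j, higher])) >= corr_threshold:
            kept.append(int(j))
    return sorted(kept)

# Deterministic toy data built from orthogonal +/-1 columns of a Hadamard matrix.
H2 = np.array([[1, 1], [1, -1]])
H = np.kron(np.kron(H2, H2), H2).astype(float)
s, e1, e2, e3, u = (H[:, k] for k in range(1, 6))
# Three features follow the latent signal s; the fourth is mostly unrelated.
X = np.column_stack([s + 0.3 * e1, s + 0.3 * e2, s + 0.3 * e3, 0.15 * s + u])
print(ife(X))  # [0, 1, 2]
```

In this toy case only one principal component has an eigenvalue above 1, the three signal-following features share it almost equally, and the weakly coupled fourth feature is dropped because its correlation with the others is about 0.14.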

2.4. Feature Group Selection (FGS)

An implementation of the FGS method is illustrated in Fig 1, where the goal was to determine a combination of 8 features from 64 sensor features. Based on this specific implementation, the FGS method is elaborated below.

In the first step, we ranked the 64 features in descending order of their individual relevance calculated by regularized LDA (rLDA) [52]. In Step 1, feature f (1) was first assigned as a chooser group (a special group with one feature) to select its partner group from the remaining 63 groups (f (2) to f (64)). The selected partner group should be the one whose combination with the chooser group achieves the maximum Fisher score, calculated as in Eq. (5),

$$J_i = \sum_{j:\ \lambda_{ij} > \lambda_0} \lambda_{ij}, \tag{5}$$

where $\lambda_{ij}$ was the eigenvalue of the j-th projection direction of the combination of the chooser group and partner group $i$ derived from rLDA. The threshold $\lambda_0$ was used to exclude the directions that were not effective for classification. As shown in Fig 2 (a)–(e), five scatter diagrams were drawn with a gradually increasing distance between the centers of the two categories. In each of the diagrams, the 400 points were randomly drawn from a normal distribution with a standard deviation of 1. It was clear that the two categories became separable when $\lambda$ was bigger than 0.5. Therefore, it was appropriate to set $\lambda_0$ to about 0.5. Then, similarly, each subsequent single-feature group picked out its partner group, and 32 feature groups were obtained.
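The partner selection of Step 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' rLDA: the Fisher score is taken as the sum of the eigenvalues of the regularized within-class scatter inverse times the between-class scatter that exceed a threshold of 0.5, a plain ridge term `reg` stands in for the regularization of [52], and the names `fisher_score` and `pick_partner` are hypothetical.

```python
import numpy as np

def fisher_score(X, y, cols, lam_th=0.5, reg=1e-3):
    """Sum of the eigenvalues of Sw^{-1} Sb above lam_th for the columns in cols."""
    Xs = X[:, cols]
    d = Xs.shape[1]
    mean_all = Xs.mean(axis=0)
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = Xs[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)           # within-class scatter
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)         # between-class scatter
    Sw += reg * np.eye(d)                       # assumed ridge regularization
    eigvals = np.linalg.eigvals(np.linalg.solve(Sw, Sb)).real
    return float(eigvals[eigvals > lam_th].sum())

def pick_partner(X, y, chooser, candidates):
    """Candidate feature whose pairing with the chooser maximizes the score."""
    return max(candidates, key=lambda c: fisher_score(X, y, [chooser, c]))

# Toy data: feature 0 separates the classes weakly, feature 1 strongly,
# and feature 2 not at all.
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X = np.array([
    [0.0, 0.0, 0.0], [1.0, 0.0, 1.0], [0.0, 0.1, 1.0], [1.0, 0.1, 0.0],
    [1.0, 5.0, 0.0], [2.0, 5.0, 1.0], [1.0, 5.1, 1.0], [2.0, 5.1, 0.0],
])
print(pick_partner(X, y, chooser=0, candidates=[1, 2]))  # 1
```

Repeating `pick_partner` for each remaining single-feature group, in rank order and without reusing features, would give the 32 pairs described above.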

Fig 2. Random points from a normal distribution with different lambda.

https://doi.org/10.1371/journal.pone.0327748.g002

In the second step, an iterative ranking and selecting procedure was conducted in Steps 2 and 3. In Step 2, the 32 feature groups were ranked in descending order of their respective Fisher scores. In Step 3, once a chooser group was found unable to keep its previous rank with its current partner group, a check procedure was conducted. For example, after Step 2, the previous first group {f (1), f (2)} dropped to third place and {f (3), f (6)} moved up to the top, which is marked with a red solid box in Fig 1.

In the third step, the check procedure first conducted the crossover selection. We additionally calculated the scores of {f (3), f (2)} and {f (3), f (1)}, and assigned f (2) or f (1) as the new partner group of f (3) if the new combination achieved a higher Fisher score. Then we conducted the mutation selection by additionally calculating the scores of {f (6), f (2)} and {f (6), f (1)}, and assigning f (2) or f (1) as the new partner group of f (6) if the new combination achieved a higher Fisher score. As a result, the group {f (3), f (2)} achieved a higher score than {f (3), f (6)}, so it became the new top group. Moreover, the feature f (6) was assigned as the partner group of f (1).
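The crossover half of this check can be sketched with a toy score table. The sketch below is a hypothetical simplification: the function `crossover_check`, the score values, and the rule that the displaced partner joins the feature left behind are assumptions chosen to reproduce the worked example in the text, and the mutation selection (which lets the old partner try the same candidates) is omitted.

```python
def crossover_check(top, lower, score):
    """Let the chooser of the promoted pair `top` try the members of the
    demoted pair `lower` as partners; swap if the score improves."""
    chooser, partner = top
    best_pair, best_score = top, score(top)
    for candidate in lower:
        trial = (chooser, candidate)
        trial_score = score(trial)
        if trial_score > best_score:
            best_pair, best_score = trial, trial_score
    if best_pair == top:
        return top, lower                      # no better partner found
    # The member of `lower` that was not taken pairs with the displaced partner.
    left = [f for f in lower if f != best_pair[1]][0]
    return best_pair, (left, partner)

# Toy Fisher scores (made-up numbers) keyed by unordered feature pairs.
table = {
    frozenset({3, 6}): 2.0,
    frozenset({3, 1}): 1.5,
    frozenset({3, 2}): 2.4,
}

def score(pair):
    return table.get(frozenset(pair), 0.0)

print(crossover_check((3, 6), (1, 2), score))  # ((3, 2), (1, 6))
```

The output matches the example above: {f (3), f (2)} becomes the top group and f (6) becomes the partner of f (1).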

When the check procedure had been applied to all the feature groups, we ranked the new groups according to their new scores and repeated the procedure until the ranking no longer changed. In this setting, a feature group was given the right to select its partner according to its current rank, but not to possess the partner. Similarly, in Stages 2 and 3, the group size was gradually enlarged to four and eight, and a group of 8 key features was formed at the top. Finally, a routine back sequential feature elimination (BSE) procedure was conducted, which iteratively eliminated a feature from the 8 key features and added a new feature until the Fisher score no longer increased significantly.

2.5. Protocol and validation

Eight state-of-the-art methods were used as benchmarks, i.e., FSASL (Feature Selection with Adaptive Structure Learning) [53], MCFS (Multi-Cluster Feature Selection) [54], UFSOL (Unsupervised Feature Selection with Ordinal Locality) [55], LASSO [56], Relief-F [57], SVM-RFE (Support Vector Machines with Recursive Feature Elimination) [58], mRMR (max-Relevance Min-Redundancy) [59], and RST (Rank Sum Test) [60,61]. For the validation on the two bacteria datasets, the size of the optimized sensor combination was set to eight. Besides, to increase the reliability of the validation, we added six public gene expression profiling datasets, namely Srbct [62], ULC [63], Brain [64], Breast [65], Lymphoma [66], and NCI [67].

Although our sensor array optimization method is specifically proposed for the e-nose applied to wound infection detection, producing and collecting bacterial datasets is time-consuming, and such datasets are rare. Therefore, following the same procedure as on our bacteria datasets, we further ran experiments on the six public gene expression profiling datasets to verify the effectiveness of our proposed method. Gene expression profiling measures the overall activity (the expression) of hundreds or thousands of genes, which is also known as a gene fingerprint. A set of gene expression data can be processed by a feature selection algorithm to find a good feature (expression) combination that helps to better understand the information in the data. The size of the selected feature combination was also set to eight, since a small feature combination can be perceived as the basis of any desired larger feature combination. The structure of the eight datasets involved is described in Table 5.

For algorithm validation, to reduce the risk of overfitting, we used the protocol of 20 times 5-fold cross-validation instead of the common protocol of 10 times 10-fold cross-validation. We believed that the risk of overfitting could be greatly reduced by doubling the number of random tests and halving the number of partitioned subsets. Based on the adopted protocol, algorithm accuracy was calculated by averaging the 100 random recognition rates. Moreover, Least Squares Support Vector Machines (LSSVM) was used as the classifier, with an RBF kernel and grid search for parameter optimization. The code was executed in MATLAB 2024b on the Windows operating system with an Intel(R) Core(TM) i7-4790K.
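The 20 times 5-fold protocol yields 20 × 5 = 100 recognition rates, which are then averaged. The sketch below shows only this bookkeeping in plain Python. It is not the authors' MATLAB code: the splits are not stratified, and `evaluate` is a hypothetical callback that would train the classifier (LSSVM in the paper) on the training indices and return its accuracy on the held-out indices.

```python
import random
from statistics import mean

def repeated_kfold_accuracy(n_samples, evaluate, repeats=20, folds=5, seed=0):
    """Average accuracy over repeats x folds random train/test splits.

    evaluate(train_idx, test_idx) is a user-supplied (hypothetical) callback
    returning the recognition rate on test_idx.
    """
    rng = random.Random(seed)
    rates = []
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)                       # new random partition each repeat
        for k in range(folds):
            test_idx = idx[k::folds]           # every folds-th sample is held out
            held_out = set(test_idx)
            train_idx = [i for i in idx if i not in held_out]
            rates.append(evaluate(train_idx, test_idx))
    return mean(rates), len(rates)

# Dummy callback for demonstration: reports the training fraction as "accuracy".
def dummy_evaluate(train_idx, test_idx):
    return len(train_idx) / (len(train_idx) + len(test_idx))

acc, n_rates = repeated_kfold_accuracy(50, dummy_evaluate)
print(n_rates, acc)  # 100 0.8
```

Replacing `dummy_evaluate` with a real train-and-test routine gives the mean accuracy over the 100 runs as described in the protocol.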

It should be noted that, for our application of bacteria detection, the optimization on the bacteria datasets consists of four steps, as shown in Fig 3. Initially, the respective performance of seven features common in the e-nose community [68] is investigated, and three good features are selected to construct the feature dataset. In step 0, the three features are extracted from each of the 30 gas sensors, so 90 original features are obtained in total. In step 1, the IFE method is used to eliminate the interfered features. In step 2, the remaining features are ranked by the Fisher score of rLDA. Finally, the FGS method and the BSE procedure are applied to determine the key feature set.

Fig 3. Overview diagram of the IFE-FGS on the bacteria datasets.

https://doi.org/10.1371/journal.pone.0327748.g003

3. Results and discussion

3.1. Initial performance of the e-nose without SAO

A good initial performance is crucial to the result of the experiment. We did several experiments and then extracted seven popular e-nose features on the two bacteria datasets, including max response (Max), max amplitude (Amplitude), peak area, max slope, rising area, declining area, and purging area, and we gave a brief description of each of the features in Table 6. Based on 20 times 5-fold cross-validation, we investigated each feature’s performance separately on the two bacteria datasets. The investigation results (Table 6) showed that the Max feature performed better than other features, and achieved an accuracy of 88.64% on dataset 1 and 90.02% on dataset 2. Therefore, the two accuracies of the Max feature were considered as the initial performance of the e-nose on the two bacteria datasets before SAO.

Table 6. Initial performance of the e-nose with the use of all sensors and single feature extraction method.

https://doi.org/10.1371/journal.pone.0327748.t006

3.2. Performance validations on the custom bacteria datasets and six public datasets

The parameter configuration of the involved methods is listed in Table 7, and the validation results on bacteria dataset 1 and bacteria dataset 2 are shown in Figs 4 and 5, respectively. The results of all the methods on the six public gene datasets are shown in Table 8. Rank statistics of the validation results are shown in Figs 6 and 7. Moreover, two novel and interesting conclusions can be deduced through contrastive analysis, as discussed below.

Table 7. Experiment results of all the involved algorithms on the two bacteria datasets and six famous public datasets.

https://doi.org/10.1371/journal.pone.0327748.t007

Table 8. The experimental results (Mean) of all the involved algorithms on six public datasets.

https://doi.org/10.1371/journal.pone.0327748.t008

Fig 4. The experimental results of all the involved algorithms on bacteria dataset 1.

https://doi.org/10.1371/journal.pone.0327748.g004

Fig 5. The experimental results of all the involved algorithms on bacteria dataset 2.

https://doi.org/10.1371/journal.pone.0327748.g005

Fig 6. Heatmap matrix (Mean accuracy) for rank statistics of the validation results on the eight datasets.

https://doi.org/10.1371/journal.pone.0327748.g006

Fig 7. Heatmap matrix (Max accuracy) for rank statistics of the validation results on the eight datasets.

*The cell denotes the number of corresponding rank (each column) achieved by the model (each row) on the results of the eight datasets.

https://doi.org/10.1371/journal.pone.0327748.g007

From Fig 4, we can see that the IFE-FGS method achieved the best mean accuracy and a competitive max accuracy on bacteria dataset 1. Specifically, our proposed method obtained 88.71% accuracy, which was more than 20% higher than that of the UFSOL method. Relief-F achieved the second-best accuracy (88.28%), 0.43% behind ours. In max accuracy, the FSASL method was 0.2% higher than ours, but its mean accuracy was clearly worse (77.08% vs 88.71%), which indicates that it is highly sensitive to its parameters. Besides, the accuracy rates of the other seven compared methods did not exceed 89%. Therefore, our proposed method is shown to be not only effective but also robust to parameters.

Compared with bacteria dataset 1, the experimental results on bacteria dataset 2 showed that the advantages of our proposed method were further enhanced. In Fig 5, we can see that the IFE-FGS method achieved the best performance (93.95%) in mean accuracy, which was 3.24% higher than the second-best method and as much as 24.71% ahead of the last one. A similar situation also occurred in the max accuracy results: our method obtained 94.94% accuracy, 2.7% and 2.8% higher than the second-best and third-best methods, respectively. Comparing this with the initial performance of the current e-nose system indicates that the IFE-FGS method has effectively fulfilled the sensor array optimization target for the electronic nose applied to wound bacterial detection, which could significantly reduce the cost of sensors while improving system accuracy.

We further ran experiments on the six public gene expression profiling datasets to verify the effectiveness of our proposed method. From Tables 8 and 9, we can see that the IFE-FGS method took four second places and two third places in mean accuracy, and five second places in max accuracy. Although our method was specifically designed for electronic nose datasets rather than genetic datasets, it achieved acceptable results and demonstrated the possibility of its migration to other application scenarios.

Table 9. The experimental results (Max) of all the involved algorithms on six public datasets.

https://doi.org/10.1371/journal.pone.0327748.t009

Figs 6 and 7 show that the IFE-FGS method was consistently effective and performed well on all eight datasets: it took 2 first places, 4 second places, and 2 third places in mean accuracy, and 1 first place, 6 second places, and 1 third place in max accuracy. This means that the IFE-FGS method has a distinct advantage in model stability [53], which we attribute to the group-based strategy of FGS: binding features into groups generates a stable structure of key feature sets. In contrast, none of the other methods maintained its effectiveness across datasets; their accuracy usually fluctuated from one dataset to another.
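
The placement tallies quoted above can be reproduced from a table of per-dataset accuracies. The following is a minimal sketch, assuming the results are stored as a mapping from dataset to per-method accuracy. The function name `placement_counts`, the method labels, and all numbers are hypothetical and serve only to illustrate the counting.

```python
from collections import Counter

def placement_counts(scores, method):
    """Count how often `method` places 1st, 2nd, 3rd, ... across datasets.

    `scores` maps dataset name -> {method name -> accuracy}.
    Tied methods share the same (best) place.
    """
    counts = Counter()
    for per_method in scores.values():
        target = per_method[method]
        # Place = 1 + number of methods with strictly higher accuracy.
        place = 1 + sum(1 for acc in per_method.values() if acc > target)
        counts[place] += 1
    return counts

# Hypothetical accuracies (placeholders, not values from the paper).
scores = {
    "dataset_1": {"A": 90.0, "B": 85.0, "C": 80.0},
    "dataset_2": {"A": 88.0, "B": 91.0, "C": 70.0},
    "dataset_3": {"A": 75.0, "B": 80.0, "C": 78.0},
}

counts_a = placement_counts(scores, "A")
print(dict(counts_a))
```

Because each dataset contributes exactly one place, the counts for a method always sum to the number of datasets, as the tallies in the text do (2 + 4 + 2 = 8 and 1 + 6 + 1 = 8).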

3.3. Comprehension of the feature recommendation by IFE-FGS

The above experimental results not only show the superiority of our proposed method but also reveal two interesting findings. 1) Each recommended feature set involved two or three types of features; the case in which all eight features come from one type, such as (which is the best one in our application), never occurred. This demonstrates sensor cooperation at the feature level. 2) Some sensors are always recommended; these are highlighted in Table 10. By summarizing the known volatiles of the three bacteria from previous research [69–72], listed in Table 11, we can draw three new and valuable conclusions.

Table 10. Characteristics of the sensors recommended by IFE-FGS under limit of eight features.

https://doi.org/10.1371/journal.pone.0327748.t010

Table 11. Known volatile organic compounds and gases emanated from the metabolite of the bacteria.

https://doi.org/10.1371/journal.pone.0327748.t011

  1. The algorithm always favors the sensors sensitive to the inorganic compound “ammonia,” which is consistent with the previous research summarized in Table 11, where ammonia is found only among the metabolites of Staphylococcus aureus (SA). The algorithm also prefers sensors that are simultaneously sensitive to ethanol, toluene, formaldehyde, or benzene. We therefore have reason to believe that these chemicals are biomarkers in this application.
  2. Sensors sensitive only to oxycarbide, oxygen, sulfur dioxide, or hydrogen sulfide were never selected by the algorithm, which indicates that they make no significant contribution to the recognition.
  3. From the perspective of engineering design for the e-nose, we think that the selected sensors reveal the effective detection range for the targets. For ethanol, the algorithm strongly recommends the sensors TGS2602 (detection range 1–30 ppm), MQ138 (30–300 ppm), and TGS822 (50–5000 ppm); together they cover 1–5000 ppm of ethanol, which should be considered an effective range for the detection.
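
The combined ethanol range in the third conclusion is the union of the three sensor ranges. The following is a minimal sketch that merges the ranges quoted in the text and checks that they form one contiguous span. The helper `merge_ranges` is a hypothetical name introduced here for illustration and is not part of IFE-FGS.

```python
def merge_ranges(ranges):
    """Merge overlapping or touching (low, high) ranges into disjoint spans."""
    merged = []
    for low, high in sorted(ranges):
        if merged and low <= merged[-1][1]:
            # Overlaps or touches the previous span: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], high))
        else:
            merged.append((low, high))
    return merged

# Ethanol detection ranges in ppm, as quoted in the text.
ethanol_ranges = {
    "TGS2602": (1, 30),
    "MQ138": (30, 300),
    "TGS822": (50, 5000),
}

coverage = merge_ranges(ethanol_ranges.values())
print(coverage)  # [(1, 5000)]
```

A single merged span means the three sensors leave no gap between 1 and 5000 ppm; a gap would show up as two or more spans in the output.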

4. Conclusion

In this study, we presented the IFE-FGS method for optimizing the sensor array of an e-nose. Experimental results showed that the proposed method could effectively reduce the number of sensors while improving accuracy. A performance comparison with eight state-of-the-art methods on the two bacteria datasets and six public gene expression profiling datasets confirmed its effectiveness. In addition, the SAO results helped us further understand the different roles of the sensors in the application.

Supporting information

References

  1. Lora AJM, et al. Diagnosis and management of wound infections. Critical Limb Ischemia: Acute And Chronic. 2017. p. 517–30.
  2. Troeger C, et al. Estimates of the global, regional, and national morbidity, mortality, and aetiologies of lower respiratory infections in 195 countries, 1990–2016: a systematic analysis for the Global Burden of Disease Study 2016. The Lancet Infectious Diseases. 2018;18(11):1191–210.
  3. Sen CK. Human Wounds and Its Burden: An Updated Compendium of Estimates. Adv Wound Care (New Rochelle). 2019;8(2):39–48. pmid:30809421
  4. Sanaeifar A, ZakiDizaji H, Jafari A, Guardia M de la. Early detection of contamination and defect in foodstuffs by electronic nose: A review. TrAC Trends in Analytical Chemistry. 2017;97:257–71.
  5. Ghasemi-Varnamkhasti M, Apetrei C, Lozano J, Anyogu A. Potential use of electronic noses, electronic tongues and biosensors as multisensor systems for spoilage examination in foods. Trends in Food Science & Technology. 2018;80:71–92.
  6. Sen CK, Gordillo GM, Roy S, Kirsner R, Lambert L, Hunt TK, et al. Human skin wounds: a major and snowballing threat to public health and the economy. Wound Repair Regen. 2009;17(6):763–71. pmid:19903300
  7. Nerín C, Aznar M, Carrizo D. Food contamination during food process. Trends in Food Science & Technology. 2016;48:63–8.
  8. Gu D-C, Liu W, Yan Y, Wei W, Gan J, Lu Y, et al. A novel method for rapid quantitative evaluating formaldehyde in squid based on electronic nose. LWT. 2019;101:382–8.
  9. Sun T, He J, Qian S, Zheng Y, Zhang K, Luo J, et al. Collaborative detection for wound infections using electronic nose and FAIMS technology based on a rat wound model. Sensors and Actuators B: Chemical. 2020;320:128595.
  10. Swanson T, Ousey K, Haesler E, Bjarnsholt T, Carville K, Idensohn P, et al. IWII Wound Infection in Clinical Practice consensus document: 2022 update. J Wound Care. 2022;31(Sup12):S10–21. pmid:36475844
  11. Wilson AD, Baietto M. Applications and advances in electronic-nose technologies. Sensors (Basel). 2009;9(7):5099–148. pmid:22346690
  12. Byun HG, Persaud KC, Pisanelli AM. Wound-state monitoring for burn patients using e-nose/spme system. ETRI Journal. 2010;32(3):440–6.
  13. Loutfi A, Coradeschi S, Mani GK, Shankar P, Rayappan JBB. Electronic noses for food quality: A review. Journal of Food Engineering. 2015;144:103–11.
  14. Wojnowski W, Majchrzak T, Dymerski T, Gębicki J, Namieśnik J. Electronic noses: Powerful tools in meat quality assessment. Meat Sci. 2017;131:119–31. pmid:28501437
  15. Majchrzak T, Wojnowski W, Dymerski T, Gębicki J, Namieśnik J. Electronic noses in classification and quality control of edible oils: A review. Food Chem. 2018;246:192–201. pmid:29291839
  16. Gardner JW, Shin HW, Hines EL. An electronic nose system to diagnose illness. Sensors and Actuators B: Chemical. 2000;70(1–3):19–24.
  17. Dutta R, Gardner JW, Hines EL. ENT bacteria classification using a neural network based cyranose 320 electronic nose. In: SENSORS, 2004 IEEE. IEEE; 2004.
  18. D’Amico A, Pennazza G, Santonico M, Martinelli E, Roscioni C, Galluccio G, et al. An investigation on electronic nose diagnosis of lung cancer. Lung Cancer. 2010;68(2):170–6. pmid:19959252
  19. D’Amico A, Di Natale C, Falconi C, Martinelli E, Paolesse R, Pennazza G, et al. Detection and identification of cancers by the electronic nose. Expert Opin Med Diagn. 2012;6(3):175–85. pmid:23480684
  20. Concina I, Falasconi M, Gobbi E, Bianchi F, Musci M, Mattarozzi M, et al. Early detection of microbial contamination in processed tomatoes by electronic nose. Food Control. 2009;20(10):873–80.
  21. Mamatha BS, Prakash M, Nagarajan S, Bhat KK. Evaluation of the flavor quality of pepper (Piper nigrum L.) cultivars by GC–MS, electronic nose and sensory analysis techniques. Journal of Sensory Studies. 2008;23(4):498–513.
  22. Yang Z, Dong F, Shimizu K, Kinoshita T, Kanamori M, Morita A, et al. Identification of coumarin-enriched Japanese green teas and their particular flavor using electronic nose. Journal of Food Engineering. 2009;92(3):312–6.
  23. Banerjee(Roy) R, Chattopadhyay P, Tudu B, Bhattacharyya N, Bandyopadhyay R. Artificial flavor perception of black tea using fusion of electronic nose and tongue response: A Bayesian statistical approach. Journal of Food Engineering. 2014;142:87–93.
  24. Fenner RA, Stuetz RM. The Application of Electronic Nose Technology to Environmental Monitoring of Water and Wastewater Treatment Activities. Water Environment Research. 1999;71(3):282–9.
  25. Tang K-T, et al. An electronic-nose sensor node based on polymer-coated surface acoustic wave array for environmental monitoring. In: The 2010 International Conference on Green Circuits and Systems. IEEE; 2010.
  26. Dentoni L, Capelli L, Sironi S, Del Rosso R, Zanetti S, Della Torre M. Development of an electronic nose for environmental odour monitoring. Sensors (Basel). 2012;12(11):14363–81. pmid:23202165
  27. Pearce TC, et al. Handbook of machine olfaction. Weinheim: Wiley-VCH; 2003.
  28. Chandrashekar G, Sahin F. A survey on feature selection methods. Computers & Electrical Engineering. 2014;40(1):16–28.
  29. Guyon I, Elisseeff A. An introduction to variable and feature selection. Journal of Machine Learning Research. 2003;3(Mar):1157–82.
  30. Bolón-Canedo V, Sánchez-Maroño N, Alonso-Betanzos A, Benítez JM, Herrera F. A review of microarray datasets and applied feature selection methods. Information Sciences. 2014;282:111–35.
  31. Hira ZM, Gillies DF. A Review of Feature Selection and Feature Extraction Methods Applied on Microarray Data. Adv Bioinformatics. 2015;2015:198363. pmid:26170834
  32. Pashaei E. An Efficient Binary Sand Cat Swarm Optimization for Feature Selection in High-Dimensional Biomedical Data. Bioengineering (Basel). 2023;10(10):1123. pmid:37892853
  33. Pashaei E, Pashaei E, Aydin N. Biomarker Identification for Alzheimer’s Disease Using a Multi-Filter Gene Selection Approach. Int J Mol Sci. 2025;26(5):1816. pmid:40076442
  34. Pashaei E, et al. DiCE: differential centrality-ensemble analysis based on gene expression profiles and protein-protein interaction network. bioRxiv. 2025; p. 2025.03.14.638654.
  35. Pashaei E, Pashaei E. An efficient binary chimp optimization algorithm for feature selection in biomedical data classification. Neural Comput & Applic. 2022;34(8):6427–51.
  36. Zhang H, et al. Optimization of sensor array and detection of stored duration of wheat by electronic nose. Journal of Food Engineering. 2007;82(4):403–8.
  37. Petersson H, Klingvall R, Holmberg M. Sensor array optimization using variable selection and a Scanning Light Pulse Technique. Sensors and Actuators B: Chemical. 2009;142(2):435–45.
  38. Xu Z, Shi X, Lu S. Integrated sensor array optimization with statistical evaluation. Sensors and Actuators B: Chemical. 2010;149(1):239–44.
  39. Szecówka PM, Szczurek A, Licznerski BW. On reliability of neural network sensitivity analysis applied for sensor array optimization. Sensors and Actuators B: Chemical. 2011;157(1):298–303.
  40. Xu Z, Lu S. Multi-objective optimization of sensor array using genetic algorithm. Sensors and Actuators B: Chemical. 2011;160(1):278–86.
  41. Yin Y, Yu H, Chu B, Xiao Y. A sensor array optimization method of electronic nose based on elimination transform of Wilks statistic for discrimination of three kinds of vinegars. Journal of Food Engineering. 2014;127:43–8.
  42. Sun H, Tian F, Liang Z, Sun T, Yu B, Yang SX, et al. Sensor Array Optimization of Electronic Nose for Detection of Bacteria in Wound Infection. IEEE Trans Ind Electron. 2017;64(9):7350–8.
  43. Xu K, Wang J, Wei Z, Deng F, Wang Y, Cheng S. An optimization of the MOS electronic nose sensor array for the detection of Chinese pecan quality. Journal of Food Engineering. 2017;203:25–31.
  44. Liang Z, Tian F, Zhang C, Sun H, Liu X, Yang SX. A correlated information removing based interference suppression technique in electronic nose for detection of bacteria. Anal Chim Acta. 2017;986:145–52. pmid:28870320
  45. Malhi A, Gao RX. PCA-Based Feature Selection Scheme for Machine Defect Classification. IEEE Trans Instrum Meas. 2004;53(6):1517–25.
  46. Cataltepe Z, Genc HM, Pearson T. A PCA/ICA based feature selection method and its application for corn fungi detection. In: 2007 15th European Signal Processing Conference. IEEE; 2007.
  47. Xu J-L, et al. Principal component analysis based feature selection for clustering. In: 2008 International Conference on Machine Learning and Cybernetics. IEEE; 2008.
  48. Song F, Guo Z, Mei D. Feature selection using principal component analysis. In: 2010 International Conference on System Science, Engineering Design and Manufacturing Informatization. IEEE; 2010.
  49. Song F, Mei D, Li H. Feature selection based on linear discriminant analysis. In: 2010 International Conference on Intelligent System Design and Engineering Application. IEEE; 2010.
  50. Sharma A, Paliwal KK, Imoto S, Miyano S. A feature selection method using improved regularized linear discriminant analysis. Machine Vision and Applications. 2013;25(3):775–86.
  51. Bhardwaj I, Londhe ND, Kopparapu SK. Feature selection for novel fingerprint dynamics biometric technique based on PCA. In: 2016 International Conference on Advances in Computing, Communications and Informatics (ICACCI). IEEE; 2016.
  52. Cai D, He X, Han J. SRDA: An Efficient Algorithm for Large-Scale Discriminant Analysis. IEEE Trans Knowl Data Eng. 2008;20(1):1–12.
  53. Du L, Shen YD. Unsupervised feature selection with adaptive structure learning. In: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining. 2015.
  54. Cai D, Zhang C, He X. Unsupervised feature selection for multi-cluster data. In: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining. 2010.
  55. Guo J, et al. Unsupervised feature selection with ordinal locality. In: 2017 IEEE International Conference on Multimedia and Expo (ICME). IEEE; 2017.
  56. Tibshirani R. Regression Shrinkage and Selection Via the Lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology. 1996;58(1):267–88.
  57. Kononenko I. Estimating attributes: Analysis and extensions of RELIEF. In: European Conference on Machine Learning. Springer; 1994.
  58. Guyon I, et al. Gene selection for cancer classification using support vector machines. Mach Learn. 2002;46:389–422.
  59. Peng H, Long F, Ding C. Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell. 2005;27(8):1226–38. pmid:16119262
  60. Rosner B, Glynn RJ, Lee M-LT. Incorporation of clustering effects for the Wilcoxon rank sum test: a large-sample approach. Biometrics. 2003;59(4):1089–98. pmid:14969489
  61. Rosner B, Glynn RJ. Power and sample size estimation for the Wilcoxon rank sum test with application to comparisons of C statistics from alternative prediction models. Biometrics. 2009;65(1):188–97. pmid:18510654
  62. Khan J, Wei JS, Ringnér M, Saal LH, Ladanyi M, Westermann F, et al. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat Med. 2001;7(6):673–9. pmid:11385503
  63. Johnson B, Xie Z. Classifying a high resolution image of an urban area using super-object information. ISPRS Journal of Photogrammetry and Remote Sensing. 2013;83:40–9.
  64. Pomeroy SL, Tamayo P, Gaasenbeek M, Sturla LM, Angelo M, McLaughlin ME, et al. Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature. 2002;415(6870):436–42. pmid:11807556
  65. van ’t Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AAM, Mao M, et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature. 2002;415(6871):530–6. pmid:11823860
  66. Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature. 2000;403(6769):503–11. pmid:10676951
  67. Ross DT, Scherf U, Eisen MB, Perou CM, Rees C, Spellman P, et al. Systematic variation in gene expression patterns in human cancer cell lines. Nat Genet. 2000;24(3):227–35. pmid:10700174
  68. Yan J, Guo X, Duan S, Jia P, Wang L, Peng C, et al. Electronic Nose Feature Extraction Methods: A Review. Sensors (Basel). 2015;15(11):27804–31. pmid:26540056
  69. Labows JN, McGinley KJ, Webster GF, Leyden JJ. Headspace analysis of volatile metabolites of Pseudomonas aeruginosa and related species by gas chromatography-mass spectrometry. J Clin Microbiol. 1980;12(4):521–6. pmid:6775012
  70. Allardyce RA, Hill AL, Murdoch DR. The rapid evaluation of bacterial growth and antibiotic susceptibility in blood cultures by selected ion flow tube mass spectrometry. Diagn Microbiol Infect Dis. 2006;55(4):255–61. pmid:16678377
  71. Šetkus A, Galdikas A-J, Kancleris Ž-A, Olekas A, Senulienė D, Strazdienė V, et al. Featuring of bacterial contamination of wounds by dynamic response of SnO2 gas sensor array. Sensors and Actuators B: Chemical. 2006;115(1):412–20.
  72. Šetkus A, et al. Qualitative and quantitative characterization of living bacteria by dynamic response parameters of gas sensor array. Sensors and Actuators B: Chemical. 2008;130(1):351–8.