
Improved swin transformer-based thorax disease classification with optimal feature selection using chest X-ray

Abstract

Thoracic diseases, including pneumonia, tuberculosis, lung cancer, and others, pose significant health risks and require timely and accurate diagnosis to ensure proper treatment. This research therefore proposes a deep learning model for thorax disease classification using chest X-rays. The input is pre-processed by resizing, normalizing pixel values, and applying data augmentation to address the issue of imbalanced datasets and improve model generalization. Significant features are extracted from the images using an Enhanced Auto-Encoder (EnAE) model, which combines a stacked auto-encoder architecture with an attention module to enhance feature representation and classification accuracy. To further improve feature selection, the Chaotic Whale Optimization (ChWO) Algorithm is utilized to optimally select the most relevant attributes from the extracted features. Finally, disease classification is performed using the novel Improved Swin Transformer (IMSTrans) model, which is designed to efficiently process high-dimensional medical image data and achieve superior classification performance. The proposed EnAE + ChWO+IMSTrans model for thorax disease classification was evaluated using extensive Chest X-ray datasets and the Lung Disease Dataset. The proposed method demonstrates enhanced Accuracy, Precision, Recall, F-Score, MCC, and MAE of 0.964, 0.977, 0.9845, 0.964, 0.9647, and 0.184 respectively, indicating a reliable and efficient solution for thorax disease classification.

1. Introduction

Thorax diseases encompass a broad range of conditions that affect the organs and structures within the thoracic cavity, including the lungs, heart, and surrounding tissues. Common thorax diseases include pneumonia, tuberculosis, lung cancer, pleural effusion, chronic obstructive pulmonary disease (COPD), and interstitial lung diseases [1]. These conditions can have serious health implications, ranging from breathing difficulties and chronic pain to life-threatening complications [2]. The early and accurate detection of these diseases is crucial for effective treatment and improved patient outcomes [3,4]. Traditional methods of detecting thorax diseases primarily rely on clinical assessments and imaging techniques, particularly computed tomography (CT) scans and Chest X-rays. Chest X-rays are the most commonly used imaging modality due to their accessibility, speed, and ability to reveal abnormalities in the lungs, heart, and chest wall [5]. Radiologists analyze these X-ray images for signs of disease, such as abnormal shadows, opacities, or lesions that may indicate conditions like pneumonia or lung cancer [6]. However, the manual interpretation of Chest X-rays is highly dependent on the radiologist’s expertise and experience, and it can be prone to variability and errors, especially in cases of subtle or early-stage diseases [7].

Numerous research gaps exist in the classification of thoracic diseases using chest X-rays, despite advances in deep learning [8]. Because patient demographics and imaging conditions differ, models frequently do not generalize well across datasets [9]. Predictions are biased as a result of data imbalance, especially when rare diseases are underrepresented [10]. The majority of datasets only include labels at the image level, which restricts interpretability and localization [11]. Patients frequently appear with numerous illnesses, making multi-label categorization difficult for current models [12]. Deep learning systems also operate as “black boxes,” providing little explanation and casting doubt on clinical reliability [13]. They could also form erroneous associations with objects that have nothing to do with illness. Practical implementation is hampered by the lack of real-world evaluation and integration into clinical operations [14]. Furthermore, the development of reliable, equitable, and thorough diagnostic tools is further limited by bias across demographic groups and the underutilization of multimodal and longitudinal data [15,16]. Building clinically dependable and morally decent AI systems for thoracic imaging requires addressing these problems.

To overcome these limitations, this research introduces a deep learning-based thorax disease classification model. The primary objective of this study is to develop a deep learning framework that is both reliable and accurate for classifying thoracic diseases from chest X-ray images. By combining an EnAE with attention mechanisms, the study aims to enhance diagnostic performance by efficiently extracting important features that capture intricate thoracic anomalies [17]. The research employs advanced data augmentation techniques and the ChWO Algorithm to effectively select the most critical features, addressing the fundamental challenge of imbalanced datasets and enhancing the model’s ability to accurately diagnose both common and rare diseases [18,19]. Furthermore, the framework utilizes the IMSTrans model in conjunction with a multi-label classification technique to accurately classify instances that exhibit multiple concurrent conditions, thereby mirroring actual clinical settings. To enhance clinical applicability, the research aims to integrate explainability elements that improve model transparency and credibility among medical professionals [20]. The major contributions of the study are:

Design of EnAE for feature extraction: The proposed EnAE model is designed by integrating the attention mechanism with a stacked autoencoder for extracting significant features.

Design of ChWO for feature selection: The proposed ChWO algorithm is designed by incorporating the chaotic Chebyshev mapping within the existing whale optimization for obtaining a better solution in choosing the optimal best features.

Design of IMSTrans for Thorax Disease Classification: The proposed IMSTrans model is designed by integrating a DenseNet-based MLP layer into a conventional Swin Transformer to enhance the classification accuracy of the model.

The organization of the research is as follows: Section 2 details the related works with a problem statement, and Section 3 elaborates on the detailed Thorax disease classification. The result and discussion are presented in Section 4, and the conclusion is in Section 5.

2. Related works

To enhance the accuracy and efficiency of thoracic disease classification and localization from chest X-ray images, [21] designed a novel attention-based convolutional neural network (CNN) model for thoracic disease classification, named ThoraX-PriorNet. The developed model utilizes a separate network, called Anatomical Prior Estimation, which was trained to estimate disease-dependent spatial probability maps. Attention was employed to extract the regions relevant to disease classification. When applied to other medical image classification and localization tasks, the designed method is more efficient in terms of classification accuracy. However, the outcome relies on the precision of the estimated anatomical prior probability maps, which is the challenging aspect.

To develop an accurate and efficient deep learning model for detecting and classifying various chest diseases from X-ray images, [22] designed the CXray-EffDet model using EfficientNet. Here, the EfficientDet-D0 model, built on the EfficientNet-B0 backbone, was utilized to extract a distinctive set of features from the input images, which are then used to predict the presence and type of chest abnormalities. The designed model offers a lightweight and computationally efficient solution for detecting and classifying chest diseases. However, the complex structure of chest X-rays, including variations in image quality and anatomical differences, presents challenges for accurate detection and classification.

A deep learning approach based on a modified CNN with an attention mechanism based on Xception model was designed by [23] to develop a reliable and automatic system for detecting thorax disease. Here, the crucial region of the X-ray image for improved thorax detection was acquired by the attention mechanism. The model can potentially assist radiologists in chest X-ray analysis, improving efficiency and reducing workload. Datasets utilized for the evaluation have an imbalanced distribution of typical and thorax cases, requiring specific techniques to address the issue during training.

A deep learning approach based on Vision Transformers (ViTs) was designed by [24] for performing multi-label thorax disease classification tasks. In this, ViTs were pre-trained to capture the missing pixels of the input image. The designed method offers improved performance in multi-label thorax disease classification compared to traditional CNN-based approaches. The computational complexity of the model was higher when considering a large dataset.

A deep learning model based on CNN was designed by [25], wherein nine various disease classes were identified using Chest X-rays. The developed model utilized four different CNN models for extracting distinct image-level representations. The designed model demonstrates superior performance due to the consideration of various attributes. Training and deploying complex CNN models was computationally expensive, which presented a significant challenge.

A ResNet-based model was designed by [26] for thorax classification and localization using the Triplet-Attention Mechanism. The features achieved through the ResNet model were utilized to extract radiomic features from the regions highlighted by the Class Activation Maps. Finally, the classification of disease was employed through the fully connected layer. Here, the use of radiomic features provides a more interpretable explanation of the model’s predictions, helping to achieve better outcomes. Still, the computational complexity limits the model’s performance.

A CX-Ultranet was designed by [27] to classify and diagnose 13 distinct thoracic illnesses from plain radiographs. A multiclass cross-entropy loss function was employed within a compound scaling structure, with EfficientNet serving as the baseline model. To create reduction cells and add more skip connections, channel shuffling was applied at various points along the network. Together, the loss function and the Adam optimizer stabilize the model, enabling ongoing learning from new data over time.

A transfer learning method for categorizing radiological abnormalities and respiratory conditions from chest X-rays was designed by [28]. The dataset comprised 752 X-ray images from the University Clinical Center of Kragujevac and 191,660 publicly available images. The disorders it covers include infections, pleural thickening, atelectasis, cardiomegaly, tuberculosis, malignancies, non-viral pneumonia, and COVID-19 pneumonia, as well as healthy cases. In contrast to other studies, this research defines up to 18 illness groups and distinguishes between healthy and diseased instances. The procedure uses DenseNet121 with CheXNeXt weights and additional layers for fine-tuning.

A CycleGAN-based preprocessing approach for enhanced lung disease classification on ChestX-Ray14 was designed by [29]. It starts by identifying images with artifacts and then uses a CycleGAN model to produce sharper images, lessening the noise effect of the artifacts. The DenseNet-121 model, used for classification, utilizes channel and spatial attention mechanisms to focus on specific areas of the image. The model was also updated to incorporate additional data from the dataset, specifically clinical features. Table 1 shows the analysis of recent research work.

2.1. Problem statement

Deep learning for thoracic disease classification and localization has advanced; however, several important research gaps remain that highlight the importance and applicability of further study in this area. Numerous current models, like the ResNet-based Triplet-Attention model [30] and ThoraX-PriorNet [31,32], rely heavily on anatomical priors and attention mechanisms. These methods improve accuracy, but they also increase computational complexity and introduce a dependence on the quality of the prior predictions. Similarly, models like CXray-EffDet [33,34] and CX-Ultranet [33,35] utilize lightweight backbones, such as EfficientNet, to promote computational efficiency; however, they still struggle to handle anatomical variations and fluctuating image quality. Although Vision Transformer-based methods provide better multi-label classification, their high computational requirements limit their scalability. Additionally, the majority of research suffers from inadequate generalizability across various clinical contexts, a lack of interpretability, and dataset imbalance. Despite recent attempts to use CycleGANs to enhance preprocessing and integrate clinical metadata, multimodal data integration is still not fully explored. Furthermore, transfer learning techniques have the potential to benefit from massive datasets, but fine-tuning for domain-specific nuances remains a challenging task. This research is motivated by the difficulty of correctly identifying thoracic disease from chest X-ray scans, which frequently involve complicated anatomical variances and imbalanced datasets. Thus, the goal is to create a robust system that enhances diagnostic accuracy, reduces false positives and negatives, and supports radiologists in making faster and more accurate decisions, ultimately improving patient outcomes.
A novel EnAE with an attention mechanism for robust feature extraction, sophisticated data augmentation techniques to reduce class imbalance, and the ChWO algorithm for optimal feature selection are all used in the proposed approach to address these problems. Ultimately, the IMSTrans model enables the accurate classification of thoracic diseases, thereby enhancing precision and facilitating more informed clinical decision-making.

3. Proposed methodology

The proposed thorax disease classification framework comprises four modules: pre-processing, extraction of significant attributes, optimal feature selection, and classification of thorax disease. Initially, the input chest X-ray is acquired from the dataset and pre-processed by resizing images, normalizing pixel values, and applying data augmentation to improve model generalization; some thoracic diseases are rare, resulting in imbalanced datasets where certain classes are underrepresented. Then, the significant attributes are extracted using the EnAE, which is designed with a stacked auto-encoder and an attention module to enhance classification accuracy. From the extracted features, the significant attributes are selected optimally using the ChWO algorithm. Using the selected features, disease classification is performed with the novel IMSTrans model. The workflow is portrayed in Fig 1.

3.1. Data acquisition

The input is obtained from the publicly available dataset and is utilized for the evaluation of the proposed EnAE + ChWO+IMSTrans model. The datasets like Chest X-Ray dataset [36] and Lung Disease Dataset [37] are utilized for the evaluation of the proposed EnAE + ChWO+IMSTrans-based Thorax disease classification.

ChestX-Ray dataset: Dataset 1 contains a total of 8,274 chest X-ray images in 1024 × 1024 PNG format, covering 14 disease classes: Atelectasis, Cardiomegaly, Effusion, Infiltration, Mass, Nodule, Pneumonia, Pneumothorax, Consolidation, Edema, Emphysema, Fibrosis, Pleural Thickening, and Hernia. Each image is accompanied by rich metadata (from Data_Entry_2017.csv) including patient age, gender, view position, and more. This dataset provides a realistic and challenging basis for evaluating computer-aided diagnosis systems, due to the visual similarity of pathologies and the complexity of clinical interpretation, even when compared to CT imaging.

Lung Disease Dataset: This dataset was created using the NIH CXR Structured Dataset 248, which contains chest X-ray (CXR) images designed to support the classification of lung diseases. It includes a total of 4,260 high-resolution chest X-ray images, categorized into three classes: Normal Lungs, Fibrosis Lungs, and Pneumonia Lungs. The images are organized into separate folders for each category to ensure structured access and labeling.

3.2. Pre-processing

Image filtering, resizing, and normalization are employed in the pre-processing stage to remove artifacts from the image and simplify further processing with minimal computational load.

3.2.1. Gaussian filtering.

Image filtering is the process of enhancing or suppressing certain features of an image to make the relevant information more visible to the model. The proposed method utilizes Gaussian filtering to remove the artifacts from the image. The image filtered by the Gaussian filter is formulated as:

$I_f(x, y) = (I * G_\sigma)(x, y), \qquad G_\sigma(x, y) = \dfrac{1}{2\pi\sigma^2} \exp\left(-\dfrac{x^2 + y^2}{2\sigma^2}\right)$ (1)

where, the filtered outcome is signified as $I_f$, the input is denoted as $I$, and the smoothing factor is denoted as $\sigma$.

3.2.2. Resizing.

To bring all images to a uniform size, image resizing is employed in the proposed disease classification model. Each image is resized to 224 × 224 for further evaluation.

3.2.3. Normalization.

Normalization is a process that adjusts the pixel intensity values in the image to a standard range, often between 0 and 1. The normalization is designed to make the image data more uniform and to help deep learning models converge faster and perform more effectively. The Z-score normalization is expressed as:

$Z = \dfrac{X - \mu}{\sigma}$ (2)

where, the normalized outcome is denoted as $Z$, the average pixel intensity is defined as $\mu$, the standard deviation is represented as $\sigma$, and the input is signified as $X$.
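The pre-processing chain of Section 3.2 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the Gaussian blur of Eq. (1) is built from a separable 1-D kernel, the resize uses nearest-neighbour indexing as a simple stand-in for library interpolation, and the Z-score step follows Eq. (2).

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """1-D Gaussian kernel with smoothing factor sigma, as in Eq. (1)."""
    radius = radius or int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def gaussian_filter(img, sigma):
    """Separable Gaussian blur: convolve rows, then columns."""
    k = gaussian_kernel(sigma)
    blurred = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, blurred)

def resize_nearest(img, size=(224, 224)):
    """Nearest-neighbour resize to a fixed size (stand-in for bilinear resizing)."""
    h, w = img.shape
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return img[rows][:, cols]

def zscore(img):
    """Z-score normalization, Eq. (2)."""
    return (img - img.mean()) / (img.std() + 1e-8)

img = np.random.rand(1024, 1024)   # stand-in for a 1024 x 1024 chest X-ray
out = zscore(resize_nearest(gaussian_filter(img, sigma=1.5)))
print(out.shape, abs(float(out.mean())) < 1e-6)
```

A production pipeline would typically use a library resize (e.g. OpenCV or Pillow) instead of the nearest-neighbour stand-in above.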

3.3. Data augmentation

The dataset is enriched to eliminate biased outcomes in the disease classification task. The proposed thorax disease classification model employs augmentation based on rotating, flipping, cropping, shifting, and scaling.

Rotating: The X-ray image is rotated around its center by a specified angle in a clockwise or counterclockwise direction.

Flipping: Flipping an X-ray image generates a mirror image by reflecting it horizontally or vertically, regardless of the original orientation.

Cropping: Cropping an X-ray image involves cutting out a part of the image to create an augmented version that focuses on different parts and introduces variations in image framing.

Shifting: Shifting moves the X-ray image along the x-axis, the y-axis, or both, producing an augmented outcome irrespective of the position of the region of interest within the image frame.

Scaling: Scaling involves enlarging or reducing the X-ray image to handle variations in the size of the regions of interest.

The augmented image and the original image in the dataset are fed into the feature extraction module to obtain the most appropriate features for reducing the computation burden of the classification model.
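The augmentation operations above might be sketched with plain NumPy arrays as follows. This is an assumed minimal version: real pipelines would typically rely on a library such as torchvision or Albumentations, the shift here wraps around the border for simplicity, and the crop pads back to the input size so all outputs stay the same shape.

```python
import numpy as np

rng = np.random.default_rng(0)

def flip(img):
    """Mirror the image horizontally or vertically at random."""
    return img[:, ::-1] if rng.random() < 0.5 else img[::-1, :]

def shift(img, max_px=20):
    """Translate along x/y; wrap-around at the border for simplicity."""
    dy, dx = rng.integers(-max_px, max_px + 1, size=2)
    return np.roll(np.roll(img, dy, axis=0), dx, axis=1)

def crop(img, size=200):
    """Random crop, then zero-pad back to the original frame size."""
    h, w = img.shape
    y = rng.integers(0, h - size + 1)
    x = rng.integers(0, w - size + 1)
    patch = img[y:y + size, x:x + size]
    return np.pad(patch, ((0, h - size), (0, w - size)))

def augment(img):
    """Compose flip, shift, and crop into one augmented sample."""
    return crop(shift(flip(img)))

batch = [augment(np.random.rand(224, 224)) for _ in range(4)]
print(all(b.shape == (224, 224) for b in batch))
```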

3.4. Enhanced auto-encoder-based feature extraction

The proposed EnAE model is designed by integrating the attention mechanism with Stacked Auto-Encoders (SAEs). The EnAE feature extraction model was created to overcome the limitations of conventional SAEs, which lose information as a result of layer-wise reconstruction errors. In contrast to ordinary auto-encoders, which only attempt to reconstruct the features transmitted from earlier layers, each enhanced auto-encoder unit includes both the input of the current layer and the original raw data in the reconstruction process. By avoiding the cumulative degradation observed in conventional SAEs, this dual reconstruction ensures that the features learned at each hidden layer retain a faithful representation of the original data. Stacking several such units yields a deep network that can hierarchically extract progressively abstract features while maintaining the integrity of the raw input across all layers. The significant features of the augmented and original images are extracted using the EnAE model. The structure of EnAE is portrayed in Fig 2.

Encoder: The encoder is responsible for transforming the input to obtain low level features and is represented through:

$h^{(l)} = f\left(W^{(l)} h^{(l-1)} + b^{(l)}\right)$ (3)

Here, the weights and biases of the $l$-th encoder layer are defined as $W^{(l)}$ and $b^{(l)}$. The encoder of the EnAE model acquires $h^{(l-1)}$ as its input, and the outcome produced in the $l$-th layer is signified as $h^{(l)}$. The output of the encoder is thus obtained as a function $f$ of the biases, weights, and input.

Decoder: The decoder takes the latent space representation and is represented as:

$\hat{X} = g\left(W' h + b'\right)$ (4)

Here, the final features arriving at the output of the EnAE model are defined as $\hat{X}$, with $W'$ and $b'$ denoting the decoder weights and biases and $g$ the decoder activation.

Attention Mechanism: The attention mechanism is included after each layer of the encoder and decoder modules to capture long-range dependencies between features. By including the attention layer within the stacked auto-encoder, both attention-weighted and deep hierarchical features are extracted, which helps enhance the classification process. In addition, attention layers in both the encoder and decoder parts help learn appropriate features during both the compression and reconstruction stages of feature extraction. The attention mechanism assigns a different weight to each feature based on its importance: features deemed more relevant receive higher weights, while less important features receive lower weights. The attention mechanism is employed using query, key, and value functions. The attention score of the features is estimated as:

$s_{ij} = q_i \cdot k_j^{T}$ (5)

where, the query and key vectors are signified as $q_i$ and $k_j$ respectively, $\cdot$ denotes the dot product, and $T$ specifies the transpose. The normalized form of the scores for the features is formulated as:

$\alpha_{ij} = \dfrac{\exp(s_{ij})}{\sum_{j'} \exp(s_{ij'})}$ (6)

where, $\alpha_{ij}$ is the attention weight for the $i$-th input, $s_{ij}$ is the un-normalized attention score for the $j$-th key, and the softmax function ensures that the attention weights sum to 1. Then, the outcome of the attention mechanism is defined as:

$A_i = \sum_{j} \alpha_{ij} v_j$ (7)

where, $\alpha_{ij}$ is the attention weight for the $j$-th value and $v_j$ is the value vector. Thus, the proposed EnAE-based feature extraction model helps capture all the significant features.

3.5. Chaotic whale optimization algorithm-based feature selection

The most appropriate features that enhance the thorax disease classification are selected by the feature selection model based on the ChWO algorithm [38]. The ChWO algorithm is designed by incorporating chaotic Chebyshev mapping within the existing whale optimization to obtain a better solution when choosing the optimal features. The conventional whale optimization [39,40] is modeled on the foraging behavior of humpback whales, which generate distinctive bubbles along a 9-shaped or circular path to capture their target. This hunting method, termed the bubble-net feeding strategy, is utilized to address the optimization problem. Here, the inclusion of Chebyshev-based randomization enables the algorithm to explore a larger search space and addresses the issue of trapping in local optima.

ChWO improves the randomization criteria by introducing chaotic Chebyshev mapping into the traditional whale optimization process. This lessens the possibility of being stuck in local optima, a typical drawback of many optimization strategies, and enables the algorithm to more efficiently traverse a larger search space. The whale optimization process’s rate of convergence is accelerated by the Chebyshev mapping. The ChWO ensures effective optimization during feature selection since it converges more quickly than the conventional whale optimization algorithm. By selecting the most relevant and ideal features from the extracted set, ChWO feature selection greatly enhances the classification of thoracic diseases.

The ChWO algorithm considers the current best solution as the prey and the other members of the candidates update the solution using the following equations:

$\vec{D} = \left|\vec{C} \cdot \vec{X}^{*}(t) - \vec{X}(t)\right|$ (8)

$\vec{X}(t+1) = \vec{X}^{*}(t) - \vec{A} \cdot \vec{D}$ (9)

Here, $\vec{X}^{*}$ is the best solution obtained so far, $t$ denotes the present iteration, $\vec{X}$ signifies the solution vector, and $\vec{A}$ and $\vec{C}$ signify the coefficient vectors, formulated as:

$\vec{A} = 2\vec{a} \cdot \vec{r} - \vec{a}$ (10)

$\vec{C} = 2\vec{r}$ (11)

where, $\vec{a}$ decreases linearly from 2 to 0 over the iterations, and $\vec{r}$ is an arbitrary factor within [0,1].

Randomization: The randomization phase of the ChWO algorithm considers two strategies for acquiring the global best solution. The first strategy reduces the value of $\vec{a}$ to shrink the encircling range. The second strategy devises a spiral movement-based position update, calculating the distance between the candidate and the target and updating the solution along a spiral-shaped path:

$\vec{X}(t+1) = \vec{D}' \cdot e^{bl} \cdot \cos(2\pi l) + \vec{X}^{*}(t)$ (12)

where, $\vec{D}' = \left|\vec{X}^{*}(t) - \vec{X}(t)\right|$ is the distance between the candidate and the best solution, $b$ defines the shape of the spiral movement, and $l$ is an arbitrary factor in [−1, 1]. Here, the Chebyshev mapping for enhancing the randomization criteria of the whale optimization is expressed as:

$x_{k+1} = \cos\left(\mu \cos^{-1}(x_k)\right)$ (13)

where, the control parameter for enhancing the convergence rate is defined as $\mu$. Then, the solution accomplished by the proposed ChWO algorithm is signified as:

$\vec{X}(t+1) = \vec{X}^{*}(t) - \vec{A} \cdot \vec{D}, \quad \text{if } p < 0.5$ (14)

$\vec{X}(t+1) = \vec{D}' \cdot e^{bl} \cdot \cos(2\pi l) + \vec{X}^{*}(t), \quad \text{if } p \geq 0.5$ (15)

where $p$ is a random number generated by the Chebyshev map of Eq. (13). Thus, after the addition of the Chebyshev mapping, the proposed ChWO algorithm enhances the randomization criteria in capturing the global best solution.

Local Search: Whales search for prey randomly based on their positions. This is represented as:

$\vec{D} = \left|\vec{C} \cdot \vec{X}_{rand} - \vec{X}\right|, \qquad \vec{X}(t+1) = \vec{X}_{rand} - \vec{A} \cdot \vec{D}$ (16)

where, $\vec{X}_{rand}$ is a random solution chosen from the current population.

Termination: The global best solution accomplishment or completion of pre-defined iteration terminates the processing. The pseudo-code is presented in Algorithm 1.

Algorithm 1. Pseudo-code of ChWO algorithm

1 Initialize the parameters, population and iteration

2 Locate the whales in the search area

3 Evaluate the feasibility based on error

4 While (t < maximum number of iterations)

5 {

6 Estimate the solution in the randomization phase using Eqs. (14) and (15)

7 Estimate the solution in the local search phase using Eq. (16)

8 Re-estimate the feasibility using error

9 Return the best solution

10}

11

12 end

Thus, using the solution accomplished by the ChWO algorithm, the optimal best features are selected for further processing.
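A compact sketch of the ChWO loop is given below. It makes several labeled assumptions: the Chebyshev map simply replaces the uniform random draw $\vec{r}$ in Eqs. (10)-(11), the spiral constant $b$ is set to 1, and the toy sphere objective stands in for the classifier-error fitness used during feature selection in the paper.

```python
import numpy as np

def chebyshev(x, mu=4):
    """Chebyshev chaotic map, Eq. (13): x_{k+1} = cos(mu * arccos(x_k))."""
    return np.cos(mu * np.arccos(np.clip(x, -1.0, 1.0)))

def chwo(fitness, dim=10, whales=20, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1, 1, (whales, dim))      # whale positions
    chaos = rng.uniform(-1, 1, whales)         # per-whale chaotic state
    best = min(X, key=fitness).copy()
    for t in range(iters):
        a = 2.0 * (1 - t / iters)              # a decreases linearly from 2 to 0
        for i in range(whales):
            chaos[i] = chebyshev(chaos[i])     # chaotic draw replaces the uniform r
            r = (chaos[i] + 1) / 2             # map the chaotic value into [0, 1]
            A, C = 2 * a * r - a, 2 * r        # Eqs. (10)-(11)
            if rng.random() < 0.5:
                if abs(A) < 1:                 # Eqs. (8)-(9): encircle the best whale
                    X[i] = best - A * np.abs(C * best - X[i])
                else:                          # Eq. (16): search around a random whale
                    Xr = X[rng.integers(whales)]
                    X[i] = Xr - A * np.abs(C * Xr - X[i])
            else:                              # Eq. (12): spiral update, with b = 1
                l = rng.uniform(-1, 1)
                X[i] = np.abs(best - X[i]) * np.exp(l) * np.cos(2 * np.pi * l) + best
        cand = min(X, key=fitness)
        if fitness(cand) < fitness(best):
            best = cand.copy()
    return best

# toy objective: in the paper this would be classification error on a feature subset
best = chwo(lambda x: float(np.sum(x**2)))
print(best.shape, float(np.sum(best**2)) < 1.0)
```

For feature selection, the continuous positions would be thresholded into a binary mask over the extracted features, with the fitness evaluating the classifier on the selected subset.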

3.6. Improved swin transformer-based neural network for thorax disease detection

Using the selected features, thorax disease detection is devised with the proposed IMSTrans model. The proposed IMSTrans is efficient in image-based disease classification tasks because it extracts image features hierarchically, capturing fine details and abstract features at multiple scales, which enhances the disease classification task. The designed model utilizes two different attention mechanisms to capture both local and global features: non-overlapping window-based local self-attention is utilized for capturing local features, and the shifted window mechanism with cross-window attention is employed for capturing long-range dependent features. In addition, complex features are extracted from the input by incorporating DenseNet within the standard Swin Transformer. The DenseNet-based model assists in capturing the significant attributes with minimal computational burden through its skip connections, and it helps solve the vanishing gradient problem. The structure of the proposed IMSTrans model for thorax disease classification is portrayed in Fig 3.

Fig 3. Structure of proposed IMSTrans model for Thorax disease classification.

https://doi.org/10.1371/journal.pone.0327099.g003

The key components of the proposed IMSTrans model for Thorax disease classification are defined as:

(i) DenseNet: The DenseNet model, with its dense connectivity and efficient parameter usage, is helpful in classifying thorax disease by extracting complex features. In addition, DenseNet's efficient use of parameters helps reduce the risk of over-fitting. Also, the improved gradient flow due to its dense connections ensures more effective training and helps alleviate the vanishing gradient problem. The feature re-use capability of DenseNet enables the proposed model to learn complex features; the design of DenseNet is portrayed in Fig 4.

Dense Block: The dense block associated with the DenseNet comprises of several convolution layers with ReLU activation and batch normalization functions. The Dense block concerning the DenseNet results in improved gradient flow during training and encourages feature reuse. The structure of dense block is portrayed in Fig 5.

(a) Convolution Layer: A convolution layer applies convolution operations to extract feature maps from the input. It performs a dot product between the filter (kernel) and local regions of the input image. The convolution layer in the proposed Thorax disease classification model assists in extracting the significant features concerning the texture and patterns of input features.

$F(i, j) = \sum_{m}\sum_{n} I(i + m,\, j + n)\, K(m, n)$ (17)

where, the kernel is defined as $K$, the input feature is defined as $I$, and the outcome of the convolutional layer is notated as $F$. The spatial coordinates are represented as $i$ and $j$, respectively.

(b) Batch Normalization: Batch normalization normalizes the output of the previous activation layer, maintaining a zero mean and unit standard deviation at the output. It is commonly applied before the activation functions in the network.

(c) ReLU Activation: ReLU is applied to the feature maps to ensure non-linearity.

Transition Layer: After each dense block, the transition layer reduces the dimensions through down-sampling. The transition layer helps in handling large inputs and ensures that the model remains computationally efficient.

Pooling Layer: Global Average Pooling (GAP) is utilized in DenseNet to reduce each feature map by averaging all of its elements, resulting in a single feature per map. In the proposed model, GAP is used towards the end of the network to drastically reduce the number of parameters while retaining important information. In thorax disease classification, GAP helps reduce the risk of over-fitting while still preserving the information necessary for accurate classification.
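Dense connectivity can be illustrated with a toy NumPy block in which a random projection stands in for each Conv + BN + ReLU layer (the real layers are learned convolutions); with 4 layers and a growth rate of 12, a 16-channel input grows to 16 + 4 × 12 = 64 channels because every layer sees the concatenation of all earlier outputs.

```python
import numpy as np

def conv_bn_relu(x, out_ch, rng):
    """Stand-in for Conv + BatchNorm + ReLU: a random projection,
    column-wise standardization, then rectification."""
    Wt = rng.standard_normal((x.shape[-1], out_ch)) / np.sqrt(x.shape[-1])
    y = x @ Wt
    y = (y - y.mean(0)) / (y.std(0) + 1e-8)   # batch-norm style standardization
    return np.maximum(y, 0)                   # ReLU non-linearity

def dense_block(x, layers=4, growth=12, seed=0):
    """Dense connectivity: each layer consumes the concatenation of
    the input and every earlier layer's output."""
    rng = np.random.default_rng(seed)
    feats = [x]
    for _ in range(layers):
        y = conv_bn_relu(np.concatenate(feats, axis=-1), growth, rng)
        feats.append(y)
    return np.concatenate(feats, axis=-1)

x = np.random.rand(32, 16)     # 32 spatial positions, 16 input channels
out = dense_block(x)
print(out.shape)               # channels: 16 + 4 * 12 = 64
```

The concatenation (rather than summation) is what gives DenseNet its feature-reuse behaviour and short gradient paths.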

(ii) Attention Mechanism based on the shifted window: Unlike the traditional attention mechanism employed in transformers, Swin Transformers utilize an attention mechanism based on the shifted window, performing attention within small local windows. The input feature map is divided into non-overlapping windows, which are then shifted by half of the window size. The shift in window position causes the windows to straddle the old borders, introducing connections between patches from neighboring windows. It is formulated as:

$\hat{z}^{l} = \text{W-MSA}\left(\text{LN}(z^{l-1})\right) + z^{l-1}$ (18)

$z^{l} = \text{MLP}\left(\text{LN}(\hat{z}^{l})\right) + \hat{z}^{l}$ (19)

$\hat{z}^{l+1} = \text{SW-MSA}\left(\text{LN}(z^{l})\right) + z^{l}$ (20)

$z^{l+1} = \text{MLP}\left(\text{LN}(\hat{z}^{l+1})\right) + \hat{z}^{l+1}$ (21)

where, $\hat{z}^{l}$ and $\hat{z}^{l+1}$ represent the outputs of the window attention modules, and $z^{l}$ and $z^{l+1}$ represent the feature outputs of the successive layers. W-MSA refers to the regular window-based self-attention, SW-MSA refers to the self-attention based on the shifted window, and LN denotes layer normalization. MLP signifies the dense representation of the DenseNet-based MLP layer.
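The window partition and cyclic shift that underlie W-MSA and SW-MSA can be illustrated with NumPy. This sketch only shows how shifted windows straddle the borders of the regular ones, not the attention computation itself; the 8 × 8 map and window size 4 are illustrative choices.

```python
import numpy as np

def window_partition(x, w):
    """Split an (H, W, C) feature map into non-overlapping w x w windows,
    returning an array of shape (num_windows, w*w, C)."""
    H, W, C = x.shape
    x = x.reshape(H // w, w, W // w, w, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, w * w, C)

def cyclic_shift(x, w):
    """Shift the map by w // 2 so the next layer's windows straddle
    the borders of the previous layer's windows."""
    return np.roll(x, (-(w // 2), -(w // 2)), axis=(0, 1))

x = np.arange(8 * 8).reshape(8, 8, 1).astype(float)   # toy 8 x 8 feature map
wins = window_partition(x, 4)                          # W-MSA groups (Eq. 18)
shifted = window_partition(cyclic_shift(x, 4), 4)      # SW-MSA groups (Eq. 20)
print(wins.shape, bool(not np.allclose(wins, shifted)))
```

Because the shifted partition groups different patches together, attention computed inside each window after the shift connects patches that belonged to neighboring windows before it.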

(iii) Layer Normalization: Layer normalization helps to standardize the input features across different image patches, ensuring that the attention mechanism can focus on meaningful relationships between patches.

(iv) Fully Connected Layer: In the thorax disease classification task, the fully connected layer serves as the output layer, providing the classification result as a probability distribution across the disease classes.
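Since a chest X-ray may carry several disease labels at once, one common choice for the output layer in multi-label settings is independent per-class sigmoids rather than a softmax; the sketch below assumes that choice, with random (untrained) weights and 14 classes as in Dataset 1.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def multilabel_head(features, n_classes=14, seed=0):
    """Fully connected output layer: one independent probability per
    disease class, so several classes can be active at once."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((features.shape[-1], n_classes)) * 0.01
    b = np.zeros(n_classes)
    return sigmoid(features @ W + b)

probs = multilabel_head(np.random.rand(8, 96))   # 8 images, 96 pooled features
preds = probs > 0.5                              # independent per-class decisions
print(probs.shape, bool(((probs > 0) & (probs < 1)).all()))
```

Training such a head would use a binary cross-entropy loss summed over the classes, matching the binary multi-label vectors described for the ChestX-Ray dataset.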

4. Result and discussion

The proposed EnAE + ChWO+IMSTrans-based thorax disease classification is implemented in the Python programming language. The analysis is carried out for various training and testing percentages, and the testing outcomes are presented in this section. The proposed method is compared with existing thorax disease detection methods, namely ThoraX-PriorNet, CXray-EffDet, Attention-based CNN, and the Swin Transformer, to portray the superiority of the EnAE + ChWO+IMSTrans-based thorax disease classification model.

4.1. Dataset description

The datasets, such as the ChestX-Ray dataset and the Lung Disease Dataset, are utilized for the evaluation of the proposed EnAE + ChWO+IMSTrans-based Thorax disease classification.

4.1.1. Original data distribution.

Dataset 1: ChestX-Ray dataset

  • The dataset consists of 8,274 chest X-ray images across 14 thoracic disease categories:
  • Atelectasis, Cardiomegaly, Effusion, Infiltration, Mass, Nodule, Pneumonia, Pneumothorax, Consolidation, Edema, Emphysema, Fibrosis, Pleural Thickening, and Hernia.
  • These images are annotated with one or more disease labels, and the dataset is inherently multi-label and imbalanced. For instance, diseases such as Infiltration and Effusion are more frequent, while Hernia and Fibrosis are rare.
  • This imbalance was initially visualized through class-wise frequency plots to better understand the skewness in data distribetter.

Dataset 2: Lung Disease Dataset

  • • This dataset contains 4,260 high-resolution chest X-ray images organized into three classes:
    • • Normal Lungs (no abnormalities),
    • • Fibrosis Lungs and
    • • Pneumonia Lungs
  • The data also exhibits imbalance, with Normal cases being more prevalent compared to Fibrosis and Pneumonia.

4.1.2. Data distributions and preprocessing steps.

Dataset 1: ChestX-Ray dataset

  • Source: Publicly available NIH ChestX-ray14 dataset.
  • Subset Size Used: 8,274 chest X-ray images.
  • Classes (14 in total):

Atelectasis, Cardiomegaly, Effusion, Infiltration, Mass, Nodule, Pneumonia, Pneumothorax, Consolidation, Edema, Emphysema, Fibrosis, Pleural Thickening, Hernia.

  •  Nature: Multi-label, class-imbalanced dataset.

Infiltration and Effusion are overrepresented, while Hernia and Fibrosis are rare.

  • Label Handling: The “Finding Labels” column is parsed to assign binary multi-label vectors for each class.
  • Preprocessing:
    • • Images with missing or unreadable files were removed.
    • • Images resized to 256×256, followed by random crop (224×224).
    • • Pixel normalization with ImageNet statistics.
    • • Missing or null labels were excluded.
    •  Data Augmentation:
    • • Random horizontal and vertical flips (p=0.5).
    • • Random rotations (up to ±30 degrees).
    • • Augmentation increases feature variability but does not alter label distribution.

Dataset 2: Lung Disease Dataset

  • Source: dataset containing 4,260 images.
  • Classes: Normal, Pneumonia, and Fibrosis.
  • Nature: Multi-label, class-imbalanced dataset.
  • Preprocessing:
    • •Images resized and normalized using the same procedure as Dataset 1 to ensure consistency.
    • •Verified for label completeness and removed any ambiguous or mislabeled samples.

4.1.3. Data augmentation and impact.

To address the data imbalance and enhance model generalization, a robust augmentation pipeline was applied during training that includes:

  • Spatial transformations (resizing, random cropping),
  • Geometric variations (horizontal/vertical flipping, random rotations),
  • Normalization to standardize image intensity.

These augmentations are applied dynamically at each training epoch. While the underlying image count remains unchanged, the model is exposed to diverse representations of each sample, especially from minority classes. This synthetically balances the training data, increases intra-class variance, and reduces overfitting.

4.1.4. Hyperparameter tuning and cross-validation.

Enhancing the input feature space through feature selection based on the ChWO algorithm was the main objective of the optimization strategy. By ensuring that only the most relevant features were included for classification, this technique successfully reduced feature redundancy, enhanced class separability, and improved overall model performance.

  • Batch Size and Learning Rate: To balance memory limitations with training stability, we chose widely used empirically validated settings a fixed learning rate of 1e-4 and a batch size of 32.
  • Optimizer: The Adam optimizer was chosen because of its demonstrated convergence efficiency in deep learning problems, and it has a weight decay of 1e-5.
  • Training Epochs: Up to 100 training epochs were used to give the models enough time to converge without being overfit.

4.2. Comparative assessment

The comparative assessment is devised using two various datasets like D1 and D2 with the existing Thorax disease detection methods like ThoraX-PriorNet [21], CXray-EffDet [22], Attention-based CNN [23], and Swin Transformer [26]. The definition for the assessment measures is portrayed in Table 2.

4.2.1. Assessment using dataset 1.

The assessment of Thorax disease classification method using the D1 dataset is evaluated based on training data and is elaborated in this section.

The accuracy-based performance evaluation of several models at varying training percentages is displayed in Fig 6. Accuracy defines the proportion of correct predictions made by the model out of the total samples. The accuracy assessment of Thorax disease classification methods is portrayed in Fig 6a. Here, the EnAE is key to extracting critical attributes using stacked auto-encoders combined with an attention module, EnAE extracts important features, removing noise and irrelevant data. Additionally, IMSTrans utilizes advanced attention mechanisms and hierarchical feature extraction, enabling it to process image patches efficiently. With values ranging from 0.878 at 50% training to 0.964 at 90%, the suggested model achieves the highest accuracy and routinely outperforms alternative approaches. Thus, the proposed EnAE + ChWO+IMSTrans model captures both local and global patterns in the X-rays, which leads to more accurate predictions compared to existing methods. The precision analysis is illustrated in Fig 6b. Here, the ChWO algorithm is utilized to select the most significant features after extraction. By discarding less relevant attributes, the model is streamlined to focus on features that are strongly correlated with specific thoracic diseases, reducing the risk of making incorrect classifications. The model’s highly accurate performance in predicting positives is demonstrated by the proposed method’s highest precision of 0.977 at 90% training percentage. Thus, the incorrect classifications are eliminated by the proposed model, which leads to higher precision by the proposed EnAE + ChWO+IMSTrans model. The recall-based analysis is illustrated in Fig 6c. The proposed method achieves higher recall due to the balanced data and efficient feature extraction modules. By augmenting the minority class samples, the model becomes better at identifying actual cases of Thorax diseases, which helps improve its recall. 
In addition, the EnAE captures finer and more complex features, ensuring that the model can detect even minor indicators of thoracic diseases, reducing false negatives. With a recall of 0.9845 at a 90% training percentage, the suggested technique accurately identifies the majority of relevant instances.

thumbnail
Fig 6. Assessment Using Dataset 1: (a) Accuracy, (b) Precision, (c) Recall, (d) F-Measure, (e) MCC and (f) MAE.

https://doi.org/10.1371/journal.pone.0327099.g006

The F1-score-based analysis is demonstrated in Fig 6d The EnAE’s ability to extract significant features and IMSTrans’s robust classification ensure that both precision and recall are maximized. It indicates that the proposed model helps accomplish both sensitive in detecting diseases (high recall) and specificity in making correct predictions (high precision). The maximum F1-Score of 0.964 is obtained by the proposed method at 90% training percentage, demonstrating an effective trade-off between recall and precision. The Matthews Correlation Coefficient (MCC) is used to analyze the robustness of the Thorax disease model for both positive and negative classes. The MCC-based analysis is demonstrated in Fig 6e. The use of data augmentation helps reduce the bias toward majority classes that results in a more balanced classification. It leads to lower false positive and false negative rates, which contribute to a higher MCC score by the proposed EnAE + ChWO+IMSTrans model. The proposed method achieves the maximum MCC of 0.9647 at 90%, indicating that it is the most dependable and balanced model. The MAE-based analysis is illustrated in Fig 6f. EnAE minimizes reconstruction error when extracting features, which ensures that the latent representations of the X-rays are highly accurate. It reduces the overall error in classifying the Thorax disease. The careful selection of features helps minimize prediction errors. By ensuring that only relevant features are used, the model avoids making large errors, which keeps the MAE low. The proposed method has the lowest MAE of 0.184 at 90%, meaning it predicts with the least amount of error. Table 3 shows the Assessment Using Dataset 1

4.2.2. Assessment using dataset 2.

The assessment of Thorax disease classification method using the D2 dataset is evaluated based on training data and is elaborated in this section.

Fig 7 analyzes the performance of various Thorax disease classification techniques on the D2 dataset, comparing their efficacy at different training data proportions, ranging from 50% to 90%. The accuracy is shown in Fig 7a, where EnaE+CHWO+IMSTrans continuously attains the best accuracy, hitting roughly 0.96 at 90% training data. Likewise, in Fig 7b Precision, EnaE+CHWO+IMSTrans is in the lead with approximately 0.95 at 90%. In Fig 7c, the Recall exhibits a comparable pattern, with EnaE+CHWO+IMSTrans reaching roughly 0.94. The F-Measure in Fig 7d, which strikes a balance between recall and precision, likewise peaks for EnaE+CHWO+IMSTrans at about 0.94. Fig 7e shows that the MCC scores for all models generally improve as the training percentage rises from 50% to 90%. In particular, proposed EnAe+CHWO+IMSTrans attains the greatest MCC with 90% training data, at roughly 0.96. At the maximum training, ThoraX-PriorNet has the lowest MCC, which is approximately 0.86. This suggests that, particularly when trained on more data, EnAe+CHWO+IMSTrans exhibits the best overall classification performance. Figure (f) shows that the MAE for all models tends to decrease as the training set size increases. EnAe+CHWO+IMSTrans shows the lowest MAE at the 90% training level, which is at 0.04, indicating the most minor average magnitude of errors in its predictions. At 90% training data, ThoraX-PriorNet has the greatest MAE, approximately 0.09, indicating higher average prediction errors. This suggests that, when trained with a larger dataset, EnAe+CHWO+IMSTrans produces the most accurate numerical predictions among all the models compared. Table 4 shows the assessment using Dataset 2.

thumbnail
Fig 7. Assessment Using Dataset 2: (a) Accuracy, (b) Precision, (c) Recall, (d) F-Measure, (e) MCC and (f) MAE.

https://doi.org/10.1371/journal.pone.0327099.g007

4.3. Accuracy-Loss: Accuracy-loss analysis compares the model’s accuracy and loss during both training and testing phases to assess the performance of the model and determine if it’s generalizing well to unseen data.

The accuracy and loss trends for Dataset 1 to classify thorax disease over 100 epochs are displayed in Fig 8. Over 100 epochs, the accuracy trends are shown in Fig 8a. The accuracy of training rises quickly, peaking at about 95% around epoch 30 before plateauing. The trajectory of the testing accuracy is similar, with modest swings after a slightly lower peak at 94%. This shows that the model is capable of learning and generalizing to new data with reasonable accuracy. In Fig 8b, the loss curves are shown. Over time, the training loss decreases from roughly 0.7 to less than 0.2. After a brief period of volatility, the testing loss likewise drops dramatically, stabilizing at about 0.2. It is clear from the rather narrow difference between training and testing loss that the model is not significantly overfitting the training set.

The accuracy and loss trends for Dataset 2 to classify thorax disease over 100 epochs are displayed in Fig 9. Minor overfitting is suggested by the testing loss, which drops to a minimum of 0.21 before slightly increasing to 0.22, while the training loss gradually reduces from roughly 0.7 to 0.19. On the other hand, the black training accuracy quickly increases from 0.5 to a plateau at 0.97. Also improving is the testing accuracy, which peaks at about 0.95 and stays steady. Although there is some overfitting in later epochs, the model’s overall high testing accuracy of almost 95% shows that learning is effective.

4.4. Confusion Matrix: The confusion matrix representation of the proposed EnAE + ChWO+IMSTrans model using dataset 1 and dataset 2 is portrayed in Fig 10, which illustrates the correct classification made by the proposed method.

thumbnail
Fig 10. Confusion Matrix for (a) Dataset 1 and (b) Dataset 2.

https://doi.org/10.1371/journal.pone.0327099.g010

Fig 10a shows a 14x14 matrix for thorax disease classification. For several classes, including class 0 (82), class 2 (71), and class 4 (107), the diagonal displays high correct classification rates. Misclassifications sometimes occur, though; for example, 37 cases of class 1 were incorrectly classified as class 0. The three categories of fibrosis, normal, and pneumonia are simplified in Fig 10b. With 250 pneumonia cases correctly predicted, 201 normal cases correctly classified, and 573 fibrosis patients correctly recognized, the matrix shows strong results. Three cases of pneumonia were misclassified as fibrosis, while two cases of normal were misclassified as fibrosis. At 0.986, this classification’s overall accuracy score on Dataset 2 is remarkably high. The model’s accuracy in classifying thoracic disorders is clearly demonstrated by these matrices, with Dataset 2 exhibiting robust findings.

4.5. AUC Analysis: The AUC analysis of Dataset 1 is portrayed in Fig 11a, b. The area under the ROC curve ranges from 0 to 1. A higher AUC value by EnAE + ChWO+IMSTrans model indicates better performance in distinguishing between classes. The analysis of Dataset 2’s Area Under the Curve (AUC) is shown in Fig 12a, b. Various models’ performances are evaluated in Fig 12a, and the model with the most excellent AUC, EnAE + ClWO+IMSTrans, shows superior classification abilities. The AUC curves for several classes (0, 1, and 2) are shown in Fig 12b. All of these classes achieve unusually high AUC scores of 0.99, indicating superior discrimination ability across all categories of Dataset 2.

4.6. Convergence Analysis: Convergence analysis evaluates how quickly and effectively an optimization algorithm minimizes the loss function, ensuring that the model reaches the global or near-global minimum efficiently. The convergence analysis is demonstrated in Fig 13. The inclusion of chaotic mapping assist the proposed ChWO algorithm to converge faster compared to existing whale optimization algorithm.

4.7. Comparative discussion

The comparative discussion for Dataset 1 and Dataset 2 based on the best case with 90% of training data and 10% of testing data is portrayed in Table 5. The proposed EnAE + ChWO+IMSTrans approach performs better across all assessment measures, according to a comparative analysis of Thorax disease categorization on Datasets 1 and 2. The proposed approach outperformed ThoraX-PriorNet, CXray-EffDet, Attention-based CNN, and Swin Transformer by 7.99%, 5.39%, 3.73%, and 1.97%, respectively, and obtained the highest accuracy (0.964) for Dataset 1. Likewise, the proposed technique outperformed Swin Transformer, the following best method, by 2.81% for Dataset 2, an accuracy of 0.986. Throughout both datasets, the proposed approach continuously yielded superior outcomes in terms of accuracy, recall, F-score, and MCC. Its robustness in detecting true positives is noteworthy, as evidenced by the recall on Datasets 1 (0.9845) and 2 (0.985). Additionally, the model’s Mean Absolute Error (MAE) was the lowest, particularly on Dataset 2, at 0.014, which is far lower than all baseline techniques. These findings highlight the improved generality and efficacy of the proposed method in classifying Thorax disease, particularly when 90% of the data is used for training and 10% is used for testing. Overall, the consistent superiority of the two datasets highlights their suitability for dependable clinical use in medical image classification tasks.

Here, the comparative analysis portrays the superiority of the proposed model compared to the existing methods.

A comparative analysis of proposed and state-of-the-art methods for classifying thoracic diseases across two datasets is shown in Table 6. The suggested approach achieves the highest accuracy, precision, and recall at 90%, outperforming all others. It considerably outperforms CycleGAN with 93.3% accuracy, 94.7% precision, and 92.1% recall on Dataset 1, achieving 96.4% accuracy, 97.7% precision, and 98.45% recall. Likewise, the suggested approach attains 98.6% accuracy, 98.5% precision, and 98.5% recall in Dataset 2. On the other hand, DenseNet121 and CX-Ultranet perform moderately, demonstrating the superior efficacy of the proposed model in classifying thoracic diseases.

thumbnail
Table 6. Comparative Analysis of proposed and state-of-the-art methods.

https://doi.org/10.1371/journal.pone.0327099.t006

5. Conclusion

The proposed approach incorporates several key innovations, including data augmentation during pre-processing to mitigate class imbalance, feature extraction using the EnAE with an attention mechanism and optimal feature selection through the ChWO algorithm. Finally, classification is performed using the IMSTrans model, which has demonstrated superior performance in terms of accuracy, precision, recall, and overall classification efficacy compared to existing methods. The proposed model was rigorously validated using two major datasets: the Chest X-Ray dataset with over 112,000 images and the Lung Disease Dataset of 4,260 high-resolution images. Results showed superior performance, achieving accuracies of up to 96.4% and 98.6% on these datasets, respectively. The proposed framework is computationally intensive, particularly during the training phase, requiring significant resources for both training and feature extraction.

5.1. Limitations and future work

The proposed method faces certain limitations, despite showing improved classification performance when employing the Chaotic Whale Optimization Algorithm for feature selection and the EnAE with attention mechanisms. First and foremost, it is computationally demanding, particularly during the training and feature extraction stages, using significant resources that could impede its use in environments with limited resources, such as remote hospitals or mobile diagnostic units. Furthermore, the quality and representativeness of the Chest X-ray dataset used are crucial to the model’s performance; even with data augmentation efforts, imbalances and rare illness classes persist as problems. There is currently no test for generalizability to other medical imaging modalities, which restricts wider clinical use.

Thus, Future research should conduct comprehensive clinical validation tests with radiologists to overcome the uncertainties surrounding the model’s applicability in actual clinical situations. To assess the model’s interpretability, dependability, and practical utility in routine diagnostics, such investigations are crucial. Refinements can be guided by collaborations with doctors to better address clinical demands. Trust and usability will be improved by incorporating explainability elements. Furthermore, deployment requires optimizing computing efficiency, particularly in environments with restricted resources. It will be easier to move from a research prototype to a clinically useful tool for diagnosing thorax disease if the dataset is expanded to include multi-institutional and multi-modal imaging data.

References

  1. 1. Li Q, Lai Y, Adamu MJ, Qu L, Nie J, Nie W. Multi-Level Residual Feature Fusion Network for Thoracic Disease Classification in Chest X-Ray Images. IEEE Access. 2023;11:40988–1002.
  2. 2. Yimer F, Tessema AW, Simegn GL. Multiple lung diseases classification from chest X-ray images using deep learning approach. Int J. 2021;10:2936–46.
  3. 3. Sharma S, Guleria K. A Deep Learning based model for the Detection of Pneumonia from Chest X-Ray Images using VGG-16 and Neural Networks. Procedia Computer Science. 2023;218:357–66.
  4. 4. Goyal S, Singh R. Detection and classification of lung diseases for pneumonia and Covid-19 using machine and deep learning techniques. J Ambient Intell Humaniz Comput. 2023;14(4):3239–59. pmid:34567277
  5. 5. Khan E, Rehman MZU, Ahmed F, Alfouzan FA, Alzahrani NM, Ahmad J. Chest X-ray Classification for the Detection of COVID-19 Using Deep Learning Techniques. Sensors (Basel). 2022;22(3):1211. pmid:35161958
  6. 6. Yi R, Tang L, Tian Y, Liu J, Wu Z. Identification and classification of pneumonia disease using a deep learning-based intelligent computational framework. Neural Comput Appl. 2023;35(20):14473–86. pmid:34035563
  7. 7. Ullah I, Ali F, Shah B, El-Sappagh S, Abuhmed T, Park SH. A deep learning based dual encoder-decoder framework for anatomical structure segmentation in chest X-ray images. Sci Rep. 2023;13(1):791. pmid:36646735
  8. 8. Avanzato R, Beritelli F. Thorax Disease Classification based on the Convolutional Network SqueezeNet. In 2023 IEEE 12th International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS). IEEE; 2023, September; 1. p. 933–7.
  9. 9. Chen K, Wang X, Zhang S. Thorax Disease Classification Based on Pyramidal Convolution Shuffle Attention Neural Network. IEEE Access. 2022;10:85571–81.
  10. 10. Devasia J, Goswami H, Lakshminarayanan S, Rajaram M, Adithan S. Deep learning classification of active tuberculosis lung zones wise manifestations using chest X-rays: a multi label approach. Sci Rep. 2023;13(1):887. pmid:36650270
  11. 11. Deng LY, Lim X-Y, Luo T-Y, Lee M-H, Lin T-C. Application of Deep Learning Techniques for Detection of Pneumothorax in Chest Radiographs. Sensors (Basel). 2023;23(17):7369. pmid:37687825
  12. 12. Albahli S, Rauf HT, Arif M, Nafis MT, Algosaibi A. Identification of thoracic diseases by exploiting deep neural networks. Neural Netw. 2021;5(6):10–32604.
  13. 13. Ibrahim AU, Ozsoz M, Serte S, Al-Turjman F, Yakoi PS. Pneumonia Classification Using Deep Learning from Chest X-ray Images During COVID-19. Cognit Comput. 2021;:1–13. pmid:33425044
  14. 14. Victor Ikechukwu A, S M. CX-Net: an efficient ensemble semantic deep neural network for ROI identification from chest-x-ray images for COPD diagnosis. Mach Learn: Sci Technol. 2023;4(2):025021.
  15. 15. Xu Y, Lam H-K, Bao X, Wang Y. Learning group-wise spatial attention and label dependencies for multi-task thoracic disease classification. Neurocomputing. 2024;573:127228.
  16. 16. Sajed S, Sanati A, Garcia JE, Rostami H, Keshavarz A, Teixeira A. The effectiveness of deep learning vs. traditional methods for lung disease diagnosis using chest X-ray images: A systematic review. Applied Soft Computing. 2023;147:110817.
  17. 17. Bhosale YH, Patnaik KS. PulDi-COVID: Chronic obstructive pulmonary (lung) diseases with COVID-19 classification using ensemble deep convolutional neural network from chest X-ray images to minimize severity and mortality rates. Biomed Signal Process Control. 2023;81:104445. pmid:36466567
  18. 18. Rahimiaghdam S. Enhancing the stability and quality assessment of visual explanations for thorax disease classification using deep learning. 2023.
  19. 19. Soysal OA, Guzel MS, Dikmen M, Bostanci GE. Common Thorax Diseases Recognition Using Zero-Shot Learning With Ontology in the Multi-Labeled ChestX-ray14 Data Set. IEEE Access. 2023;11:27883–92.
  20. 20. Agarwal S, Arya KV, Meena YK. CNN-O-ELMNet: Optimized Lightweight and Generalized Model for Lung Disease Classification and Severity Assessment. IEEE Transactions on Medical Imaging. 2024.
  21. 21. Hossain MdI, Zunaed M, Ahmed MdK, Hossain SMJ, Hasan A, Hasan T. ThoraX-PriorNet: A Novel Attention-Based Architecture Using Anatomical Prior Probability Maps for Thoracic Disease Classification. IEEE Access. 2024;12:3256–73.
  22. 22. Nawaz M, Nazir T, Baili J, Khan MA, Kim YJ, Cha J-H. CXray-EffDet: Chest Disease Detection and Classification from X-ray Images Using the EfficientDet Model. Diagnostics (Basel). 2023;13(2):248. pmid:36673057
  23. 23. Upasana C, Tewari AS, Singh JP. An Attention-based Pneumothorax Classification using Modified Xception Model. Procedia Computer Science. 2023;218:74–82.
  24. 24. Xiao J, Bai Y, Yuille A, Zhou Z. Delving into masked autoencoders for multi-label thorax disease classification. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2023. p. 3588–600.
  25. 25. Malik H, Anees T. Multi-modal deep learning methods for classification of chest diseases using different medical imaging and cough sounds. PLoS One. 2024;19(3):e0296352. pmid:38470893
  26. 26. Li W, Zou X, Zhang J, Hu M, Chen G, Su S. Predicting lung cancer bone metastasis using CT and pathological imaging with a Swin Transformer model. J Bone Oncol. 2025;52:100681. pmid:40342492
  27. 27. Han Y, Chen C, Tang L, Lin M, Jaiswal A, Wang S, et al. Using radiomics as prior knowledge for thorax disease classification and localization in chest x-rays. In: AMIA Annual Symposium Proceedings. 2022; 2021. p. 546.
  28. 28. Kabiraj A, Meena T, Reddy PB, Roy S. Multiple thoracic diseases detection from X-rays using CX-Ultranet. Health Technol. 2024;14(2):291–303.
  29. 29. Geroski T, Pavić O, Dašić L, Milovanović D, Petrović M, Filipović N. SoftLungX: leveraging transfer learning with convolutional neural networks for accurate respiratory disease classification in chest X-ray images. J Big Data. 2024;11(1).
  30. 30. Hage Chehade A, Abdallah N, Marion J-M, Hatt M, Oueidat M, Chauvet P. Advancing chest X-ray diagnostics: A novel CycleGAN-based preprocessing approach for enhanced lung disease classification in ChestX-Ray14. Comput Methods Programs Biomed. 2025;259:108518. pmid:39615193
  31. 31. Yuan X, Qi S, Wang Y. Stacked Enhanced Auto-Encoder for Data-Driven Soft Sensing of Quality Variable. IEEE Trans Instrum Meas. 2020;69(10):7953–61.
  32. 32. Ejaz K, Rahim MSM, Bajwa UI, Rana N, Rehman A. An Unsupervised Learning with Feature Approach for Brain Tumor Segmentation Using Magnetic Resonance Imaging. In: Proceedings of the 2019 9th International Conference on Bioscience, Biochemistry and Bioinformatics. 2019. p. 1–7. https://doi.org/10.1145/3314367.3314384
  33. 33. Alshanketi F, Alharbi A, Kuruvilla M, Mahzoon V, Siddiqui ST, Rana N, et al. Pneumonia Detection from Chest X-Ray Images Using Deep Learning and Transfer Learning for Imbalanced Datasets. Journal of Imaging Informatics in Medicine. 2024:1–20.
  34. 34. Ejaz K, Suaib NBM, Kamal MS, Rahim MSM, Rana N. Segmentation Method of Deterministic Feature Clustering for Identification of Brain Tumor Using MRI. IEEE Access. 2023;11:39695–712.
  35. 35. Cui L, Jing X, Wang Y, Huan Y, Xu Y, Zhang Q. Improved Swin Transformer-Based Semantic Segmentation of Postearthquake Dense Buildings in Urban Areas Using Remote Sensing Images. IEEE J Sel Top Appl Earth Observations Remote Sensing. 2023;16:369–85.
  36. 36. Arya Shetty. Lung Disease Detection [Internet]. Kaggle. 2022 [cited 2025 Jun 13. ]. Available from: https://www.kaggle.com/datasets/aryashetty29/fibrosis
  37. 37. Kamoji A. Lung Disease Detection [Internet]. Kaggle; 2022 [cited 2025 Jun 13. ]. Available from: https://www.kaggle.com/code/anmolkamoji/lung-disease-detection/input
  38. 38. Sayed GI, Darwish A, Hassanien AE. A New Chaotic Whale Optimization Algorithm for Features Selection. J Classif. 2018;35(2):300–44.
  39. 39. Mirjalili S, Lewis A. The Whale Optimization Algorithm. Advances in Engineering Software. 2016;95:51–67.
  40. 40. Rana N, Latiff MSA, Abdulhamid SIM, Chiroma H. Whale optimization algorithm: a systematic review of contemporary applications, modifications and developments. Neural Computing and Applications. 2020;32:16245–77. https://doi.org/10.1007/s00521-020-04849-z