
A pruned and parameter-efficient Xception framework for skin cancer classification

  • Şafak Kılıç ,

    Contributed equally to this work with: Şafak Kılıç, Yahya Doğan

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    safakkilic@kayseri.edu.tr

    Affiliations School of Computer Science, CHART Laboratory, University of Nottingham, Nottingham, United Kingdom, Faculty of Engineering, Architecture and Design, Department of Software Engineering, Kayseri University, Kayseri, Turkey

  • Yahya Doğan

    Contributed equally to this work with: Şafak Kılıç, Yahya Doğan

    Roles Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Computer Engineering, Siirt University, Siirt, Turkey

Abstract

Skin cancer is one of the most prevalent and potentially lethal diseases worldwide, with early detection being critical for patient survival. This study presents a novel framework that leverages transfer learning, pruning, SMOTE, data augmentation, and the advanced Avg-TopK pooling method to improve the accuracy and efficiency of skin cancer classification using dermoscopic images. The HAM10000 dataset was used to evaluate the performance of various transfer learning models, with Xception as the top performer. A layer-based pruning strategy was proposed to optimize the model and reduce its complexity. SMOTE and data augmentation were applied to address the class imbalance within the dataset, significantly improving the model’s generalization across all skin lesion classes. The utilization of the Avg-TopK pooling technique further enhanced model accuracy by preserving crucial image features during the downsampling process. The proposed approach achieved an overall accuracy of 91.52%, surpassing several state-of-the-art models. Following pruning, the model’s parameter count was reduced by approximately 35%, from 20.9 million to 13.5 million, improving efficiency and performance. This framework demonstrates the effectiveness of combining model pruning, oversampling, and advanced pooling methods to build robust and efficient skin cancer classification systems suitable for clinical applications.

1. Introduction

Skin cancer is a rapidly growing concern in the medical field due to its increasing incidence worldwide, driven by factors such as environmental changes, radiation exposure, lifestyle habits, and genetic predispositions [1,2]. Among all cancer types, skin cancer holds a dominant position, with millions of new cases diagnosed annually. The complexity of skin lesions, which vary greatly in appearance and often resemble benign conditions, makes early detection challenging for even the most experienced dermatologists [3,4]. This difficulty highlights the need for more advanced diagnostic tools, especially as some types, like melanoma, are highly aggressive and life-threatening if not detected early [5].

The introduction of dermoscopy as a non-invasive imaging technique has significantly enhanced the accuracy of skin cancer diagnosis. However, its effectiveness is still dependent on the expertise of dermatologists, which varies widely [2,6]. As skin cancer cases continue to rise, automated methods using deep learning, particularly Convolutional Neural Networks (CNNs), have gained traction for their ability to accurately classify skin lesions based on image data [1,3]. CNNs have revolutionized the field by reducing the need for handcrafted features, allowing faster and more reliable diagnoses in real-time settings [7,8].

Transfer learning models have been particularly effective in medical image classification tasks, leveraging pre-trained networks such as Xception, DenseNet, and ResNet to reduce training time and computational resources while maintaining high accuracy [9]. However, these models often involve many parameters, which can be computationally expensive and difficult to deploy in real-time or resource-constrained environments. To address this, pruning techniques are increasingly being explored to reduce the number of parameters in these models without significantly compromising their performance [10,11]. Applying pruning to the highest-performing transfer learning model enables the creation of a more efficient model that retains high accuracy while reducing computational load.

One of the significant challenges encountered in skin cancer classification is the issue of class imbalance, where certain lesion types are underrepresented in the dataset. This imbalance can lead to biased predictions, with the model favoring majority classes. To address this problem, techniques such as SMOTE (Synthetic Minority Over-sampling Technique) and ADASYN (Adaptive Synthetic Sampling) are employed in various problem domains to generate synthetic examples for minority classes, thereby balancing the dataset and enabling the model to produce more accurate results across all classes [12–14]. Additionally, data augmentation techniques such as rotation, scaling, and color adjustments are applied to increase the diversity of the dataset, thus preventing overfitting. The HAM10000 dataset used in this study suffers from class imbalance, so comprehensive balancing and preprocessing techniques must be applied to obtain a more robust and accurate model.

Recent advancements in 2025 have further expanded the scope of deep learning in oncological applications, ranging from skin lesion classification to breast cancer detection, emphasizing the need for both high accuracy and computational efficiency. For instance, Aruk et al. [15] introduced a novel hybrid approach combining ConvNeXt blocks with Vision Transformers (ViT), which captures both local textural patterns and global long-range dependencies, achieving 94.30% accuracy on the HAM10000 dataset. In a comprehensive comparative study, Aruk et al. [16] benchmarked 15 CNNs against 15 ViT models, demonstrating that while Swin Transformer-based ViTs yield superior accuracy (92.12%), they incur higher computational costs compared to CNNs. Beyond dermatology, similar architectural optimizations are critical in other domains; Alswilem and Pacal [17] highlighted the trade-off between diagnostic performance and efficiency in breast cancer ultrasound analysis, identifying RexNet-200 as a pragmatic, high-efficiency model. Additionally, Cakmak and Pacal [18] demonstrated the efficacy of InceptionV3 in breast ultrasound classification, achieving 96.67% accuracy, which further underscores the importance of selecting appropriate deep learning architectures for specific medical imaging tasks.

In addition to addressing the class imbalance, enhancing model architecture through optimized pooling strategies can further boost performance. Traditional pooling layers, such as max pooling or average pooling, are commonly used in CNNs to reduce the dimensionality of feature maps. However, replacing these layers with advanced techniques like Avg-TopK Pooling can improve feature selection by retaining the most informative data points, leading to more accurate predictions [19]. This approach enhances the model's accuracy and reduces the risk of losing important features during the downsampling process.
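The idea behind Avg-TopK pooling, as described in the cited work [19], is to average only the K largest activations in each pooling window, sitting between max pooling (K = 1) and average pooling (K = window size). A minimal single-channel numpy sketch of that behavior (window size and K are illustrative, not the framework's configuration):

```python
import numpy as np

def avg_topk_pool(feature_map, window=2, k=2):
    """Avg-TopK pooling sketch: in each non-overlapping window, average the
    K largest activations. K=1 reduces to max pooling; K=window*window
    reduces to average pooling."""
    h, w = feature_map.shape
    out = np.zeros((h // window, w // window))
    for i in range(0, h - window + 1, window):
        for j in range(0, w - window + 1, window):
            patch = feature_map[i:i + window, j:j + window].ravel()
            topk = np.sort(patch)[-k:]  # K largest values in the window
            out[i // window, j // window] = topk.mean()
    return out
```

Because the K largest responses are retained instead of a single maximum or a dilution over the whole window, strong but non-maximal activations still contribute to the pooled feature, which is the property the paper exploits.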

This study investigates the performance of multiple transfer learning models on a large skin cancer dataset. The top-performing model is then pruned to create a lightweight version, maintaining high performance while reducing the number of parameters. In addition, the class imbalance problem in the dataset is addressed through the use of SMOTE and data augmentation techniques. Traditional pooling layers are replaced with Avg-TopK pooling to evaluate its impact on overall model performance. These optimizations aim to develop an efficient and high-performing model for skin cancer classification, suitable for real-time diagnostic applications.

This study makes the following key contributions:

  • In this study, a novel parameter-efficient framework for skin cancer classification was developed using pruning, SMOTE, data augmentation, and advanced Avg-TopK pooling.
  • A sparsity-based pruning strategy was proposed to reduce the complexity of the best-performing model. This pruning method decreased the number of parameters by approximately 35% (from 20.9 million to 13.5 million), resulting in a more parameter-efficient variant of Xception while still maintaining high classification performance.
  • Experiments were carried out on a well-known dataset, i.e., HAM10000, demonstrating that the proposed method improves classification accuracy by addressing class imbalance through SMOTE and enhancing the model’s generalization with data augmentation techniques.
  • The integration of the advanced Avg-TopK pooling technique further boosted model performance by preserving essential features during the downsampling process, resulting in improved accuracy and robustness in skin cancer classification.
  • The performances of a pruned Xception model were compared to standard models through both quantitative metrics and qualitative evaluations, demonstrating that the pruned model retains high accuracy while significantly reducing the number of parameters, making it more efficient and practical for clinical applications.

The structure of this article is organized as follows: Section 2 reviews the existing literature on skin cancer classification techniques. Section 3 describes the proposed approach, including the use of transfer learning, pruning, SMOTE, data augmentation, and Avg-TopK pooling. Section 4 presents the experimental setup and analyzes the results. Finally, the article concludes with a discussion of the findings and potential directions for future research.

2. Related works

In medical image analysis, particularly in the classification of skin diseases, a wide range of techniques have been applied to enhance the accuracy and reliability of diagnostic systems [20,21]. Each technique offers distinct advantages and faces specific challenges depending on the complexity of the dataset and the nature of the task. From traditional statistical methods to advanced neural networks, these approaches are designed to tackle the diverse characteristics of skin lesions, such as texture, color, and shape. Several widely used techniques, as summarized in Table 1, are discussed, highlighting their key functions and limitations in the context of skin disease classification.

Table 1. Summary of related works on skin lesion classification, including classical methods, hybrid approaches, and deep learning models.

https://doi.org/10.1371/journal.pone.0341227.t001

Morphological operations have been widely used as preprocessing techniques to enhance structural features such as lesion boundaries and abnormal regions. By applying operations like dilation and erosion, these methods aim to emphasize diagnostically relevant shapes. However, their effectiveness strongly depends on threshold selection and structuring elements, making them less reliable for complex skin lesion variations in shape, texture, and size [22–25].

Gray Level Co-occurrence Matrix (GLCM) is a statistical texture analysis method that extracts spatial intensity relationships between pixel pairs. While effective for capturing textural patterns, GLCM-based approaches are computationally expensive and lack robustness to rotation, scale, and texture variations, limiting their generalization performance [26–28].

Traditional machine learning algorithms, including Bayesian classifiers, Decision Trees, KNN, SVM, and ANN, have been extensively applied to skin disease classification. These methods leverage handcrafted features related to texture, color, and shape. Despite their effectiveness in structured scenarios, they often suffer from overfitting, sensitivity to feature selection, and limited scalability when dealing with complex and high-dimensional image data [29–31].

Genetic Algorithms (GA) have also found applications in skin disease classification tasks due to their ability to explore large and complex solution spaces efficiently. In skin classification, GA is utilized to optimize feature selection, model parameters, and other classification criteria. GA helps identify near-optimal solutions for distinguishing between different skin conditions by simulating natural selection, crossover, and mutation processes. However, despite its strengths, GA does not always guarantee convergence to the global optimum, especially in high-dimensional spaces, and its computational cost can be substantial, making it less ideal for real-time medical applications [32–34].

CNNs have become one of the most widely used techniques in skin disease classification due to their ability to automatically learn and extract relevant features from images. In particular, CNNs excel in handling the complexities of image data, making them highly suitable for tasks such as skin lesion classification. A common approach within this field involves leveraging transfer learning, where pre-trained CNN models such as ResNet, Inception, or VGG are fine-tuned on specific datasets, significantly reducing training time while achieving high accuracy. Despite their effectiveness, CNNs come with certain limitations [35]. They sometimes struggle to interpret the scale and size of objects within images, which is crucial in distinguishing subtle differences in skin lesions. Moreover, CNNs demand substantial computational resources and long training times to achieve reliable performance. Additionally, maintaining spatial invariance (ensuring that small changes in object position or orientation do not affect the model's predictions) remains a challenge, potentially affecting the robustness of the classification results [36–38].

Ensemble learning approaches improve skin lesion classification performance by combining multiple classifiers or deep models to enhance robustness and accuracy. These methods benefit from complementary feature representations but introduce increased computational complexity, reduced interpretability, and a higher risk of overfitting when training data variability is high [39–41].

While CNN-based approaches have established strong baselines, recent literature from 2024 and 2025 highlights a shift towards Vision Transformers (ViT) and hybrid architectures to capture global dependencies. For instance, Lian et al. introduced a shifted windowing vision transformer that achieved 93.60% accuracy on the HAM10000 dataset, demonstrating the efficacy of attention mechanisms over pure CNNs [42]. Similarly, Agarwal and Mahto proposed a hybrid CNN-Transformer model utilizing Convolutional Kolmogorov-Arnold Networks (CKAN) for feature fusion, which reached an accuracy of 92.81% [43]. Recently, Kılıç (2025) proposed FocusGate-Net, a dual-attention guided MLP–convolution hybrid segmentation network that integrates shifted token MLP blocks with CBAM and attention gates, achieving strong performance and cross-dataset generalization on ISIC2018, PH2, and Kvasir-SEG datasets while maintaining high computational efficiency [44]. Furthermore, Yang et al. developed a multi-scale attention booster within an ensemble framework, achieving a remarkable 95.05% accuracy by focusing on discriminative lesion regions [45]. These studies collectively indicate that integrating global context modeling with local feature extraction represents the current state-of-the-art.

3. Materials and methods

3.1. Dataset

This study used the publicly available skin cancer HAM10000 dataset [46] to classify skin cancer. This dataset consists of 10,015 dermatoscopic images, including actinic keratoses (akiec), basal cell carcinoma (bcc), benign keratosis-like lesions (bkl), dermatofibroma (df), melanoma (mel), melanocytic nevi (nv), and vascular lesions (vasc). However, as shown in Fig 1, there is a noticeable class imbalance in the dataset. Most images belong to the nv class, accounting for 6,705 images, making up a significant portion of the dataset. In contrast, minority classes such as vasc and df have only 142 and 115 images, respectively.

Fig 1. Distribution of skin lesion classes in the HAM10000 dataset.

The dataset exhibits a significant class imbalance, with most images belonging to the melanocytic nevi class, while minority classes such as dermatofibroma and vascular lesions are underrepresented.

https://doi.org/10.1371/journal.pone.0341227.g001

This imbalance between classes may pose challenges in model training, as the model is likely to become biased towards the majority class, leading to better performance for this class while struggling to classify the minority classes correctly. To address this imbalance, techniques such as data augmentation, class weighting, or oversampling of minority classes can be employed to achieve a more balanced model training process.

The images in the dataset are originally in RGB format. In this study, the images were resized to match the input resolution of the transfer learning models, ensuring compatibility and reducing computational complexity. While the HAM10000 dataset provides a valuable resource for automatically classifying skin lesions, it is crucial to consider the class imbalance during model development. Fig 2 presents example images from each class in the dataset. These images illustrate the diversity of skin lesions and highlight the visual differences between the various classes.

Fig 2. Example images for each class from the HAM10000 dataset.

The images illustrate the visual diversity of skin lesion categories and highlight inter-class variability across different lesion types.

https://doi.org/10.1371/journal.pone.0341227.g002

3.2. Ethical considerations

This study did not involve any new data collection from human participants. All experiments were conducted using the publicly available HAM10000 dataset [46], which consists of fully de-identified and anonymized dermoscopic images. According to the original dataset publication, image acquisition and data collection were performed in compliance with applicable ethical standards and were approved by the relevant institutional review boards. As this study exclusively used secondary, anonymized data, no additional ethical approval or informed consent was required.

3.3. Transfer learning models

Transfer learning enables the use of pre-trained deep neural networks—originally trained on large-scale datasets—for new tasks with limited data. By leveraging previously learned representations, training time is reduced and generalization is improved. Each model utilized in this study incorporates architectural innovations that enhance feature extraction, computational efficiency, and convergence behavior. This subsection summarizes the key contributions and mathematical foundations of the models employed.

DenseNet201 introduces dense connectivity, where each layer receives the concatenated outputs of all preceding layers. This formulation facilitates feature reuse, strengthens gradient flow, and reduces redundancy across layers. Formally, the output of layer l is expressed as:

$x_l = H_l([x_0, x_1, \ldots, x_{l-1}])$ (1)

where $H_l$ denotes the composite transformation at layer $l$ and $[\cdot]$ denotes channel-wise concatenation of the preceding feature maps. Dense connectivity mitigates vanishing gradients and enhances representational efficiency.

VGG16 employs a deep architecture with small convolution filters stacked sequentially to capture fine-grained spatial features. A single convolution operation is defined as:

$y_{i,j} = \sigma\left(\sum_{m}\sum_{n} w_{m,n}\, x_{i+m,\,j+n} + b\right)$ (2)

where $y_{i,j}$ denotes the output activation, $\sigma$ the nonlinearity, and $w_{m,n}$ the filter weights. Despite its simplicity, VGG16 provides a strong baseline for feature extraction.

ResNet50 resolves the vanishing gradient issue by introducing residual learning. Residual blocks add shortcut connections that allow gradients to propagate directly through the network:

$y = F(x, \{W_i\}) + x$ (3)

where $F$ denotes the residual mapping and $x$ is carried through the identity shortcut. This design stabilizes training in deeper networks and improves convergence.

MobileNetV2 is built upon depthwise separable convolutions, decomposing a standard convolution into depthwise and pointwise operations. The depthwise step is:

$\hat{y}_{i,j,c} = \sum_{m}\sum_{n} w^{d}_{m,n,c}\, x_{i+m,\,j+n,\,c}$ (4)

followed by a pointwise convolution:

$y_{i,j,k} = \sum_{c} w^{p}_{c,k}\, \hat{y}_{i,j,c}$ (5)

This decomposition dramatically reduces computational cost, making MobileNetV2 suitable for resource-limited environments.
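The two-step factorization and the parameter savings it buys can be sketched in plain numpy (illustrative shapes and naive loops, not the framework's implementation):

```python
import numpy as np

def depthwise_separable(x, w_depth, w_point):
    """Depthwise separable convolution sketch (valid padding, stride 1).
    x: (H, W, C_in); w_depth: (K, K, C_in); w_point: (C_in, C_out)."""
    H, W, C_in = x.shape
    K = w_depth.shape[0]
    Ho, Wo = H - K + 1, W - K + 1
    # Depthwise step (Eq. 4): one spatial filter per input channel.
    depth_out = np.zeros((Ho, Wo, C_in))
    for c in range(C_in):
        for i in range(Ho):
            for j in range(Wo):
                depth_out[i, j, c] = np.sum(x[i:i+K, j:j+K, c] * w_depth[:, :, c])
    # Pointwise step (Eq. 5): a 1x1 convolution mixing channels.
    return depth_out @ w_point

# Parameter counts for a K x K convolution mapping C_in -> C_out channels:
def params_standard(K, C_in, C_out):
    return K * K * C_in * C_out

def params_separable(K, C_in, C_out):
    return K * K * C_in + C_in * C_out
```

For a typical 3x3 layer with 64 input and 128 output channels, the separable form needs 8,768 weights versus 73,728 for a standard convolution, which is the roughly 8–9x reduction that makes MobileNetV2 suitable for resource-limited environments.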

EfficientNetB3 applies a compound scaling strategy that uniformly scales network depth, width, and resolution using predefined coefficients:

$d = \alpha^{\phi}, \quad w = \beta^{\phi}, \quad r = \gamma^{\phi}, \quad \text{s.t. } \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2$ (6)

where α, β, and γ are tuned scaling constants for depth, width, and resolution, and φ is the compound coefficient. This balanced scaling yields improved performance with fewer parameters.
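As a concrete check, using the base coefficients reported in the original EfficientNet paper (α = 1.2, β = 1.1, γ = 1.15; these values come from that paper, not from the present study), the compound-scaling constraint can be verified directly:

```python
# Compound scaling sketch: each dimension grows exponentially with the
# compound coefficient phi. Coefficients are EfficientNet's published
# base values (an assumption here, not values from this study).
alpha, beta, gamma = 1.2, 1.1, 1.15

def scale_factors(phi):
    """Depth, width, and resolution multipliers for a given phi."""
    return alpha ** phi, beta ** phi, gamma ** phi

# The constraint alpha * beta**2 * gamma**2 ~= 2 means total FLOPs roughly
# double for each unit increase of phi (cost scales with depth and with the
# squares of width and resolution).
flops_growth = alpha * beta ** 2 * gamma ** 2
```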

Xception extends the idea of depthwise separable convolutions by fully separating spatial and channel-wise correlations. The depthwise operation is:

$\hat{y}_{i,j,c} = \sum_{m}\sum_{n} w^{d}_{m,n,c}\, x_{i+m,\,j+n,\,c}$ (7)

followed by a pointwise projection:

$y_{i,j,k} = \sum_{c} w^{p}_{c,k}\, \hat{y}_{i,j,c}$ (8)

This architecture enhances efficiency without compromising accuracy, making it highly effective for transfer learning tasks.

InceptionV3 extracts multi-scale features by applying convolution filters of different receptive field sizes in parallel:

$y = \mathrm{concat}\left(f_{1\times1}(x),\ f_{3\times3}(x),\ f_{5\times5}(x),\ \mathrm{pool}(x)\right)$ (9)

This parallel structure enables the model to capture both local and global patterns, improving representational richness. In summary, the models employed in this study incorporate complementary innovations—such as dense connectivity, residual learning, separable convolutions, and multi-scale feature extraction—that collectively enhance transfer learning performance. These architectures allow efficient reuse of pre-trained features and facilitate rapid adaptation to the HAM10000 dataset.
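In practice, transfer learning with these backbones typically freezes the pre-trained convolutional layers and trains only a small classification head on top of them. A minimal numpy sketch of such a head (global average pooling followed by a dense softmax over the seven HAM10000 classes; the weight shapes are illustrative, and the study itself used Keras models rather than this hand-rolled code):

```python
import numpy as np

def transfer_head(features, W, b):
    """Classification head on top of a frozen backbone.
    features: (H, W_sp, C) feature map from the pre-trained network;
    W: (C, 7) dense weights; b: (7,) bias. Returns class probabilities."""
    pooled = features.mean(axis=(0, 1))   # global average pooling -> (C,)
    logits = pooled @ W + b
    exp = np.exp(logits - logits.max())   # numerically stable softmax
    return exp / exp.sum()
```

Only `W` and `b` are learned during fine-tuning, which is why transfer learning adapts quickly to a comparatively small dataset like HAM10000.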

3.4. The proposed pruning method

Model pruning is a widely used optimization technique to enhance the performance of deep learning models. This technique aims to optimize a model’s speed and memory usage by removing unnecessary or low-importance parameters. In large models, some parameters may contribute minimally to the output. The pruning process identifies these redundant parameters and reduces the model’s size without compromising performance. This approach is especially important for mobile, edge, and other environments with limited hardware resources.

In CNN models, several pruning methods are commonly used, including Weight-based Pruning, Structured Pruning, and Layer Pruning. Weight-based Pruning [47,48] selects which weights to remove based on their magnitude. Weights with smaller values generally contribute less to the model's output, so they are either set to zero or entirely removed. This method is effective in significantly reducing the number of parameters without greatly impacting the model's accuracy. It introduces sparsity into dense networks, which improves computational efficiency. Structured Pruning [49,50], unlike weight-based pruning, targets entire structures within the model, such as neurons or filters. For instance, in a convolutional layer, entire filters or channels can be removed. This approach not only reduces the model's size but also creates a hardware-friendly structure, which facilitates easier parallel computations. Structured pruning is especially useful in environments with hardware constraints, like mobile or edge devices, as it enhances inference speed and reduces memory usage. Layer Pruning [51,52] focuses on removing whole layers or blocks within the model. In architectures like ResNet, layers that do not significantly contribute to performance may be deactivated or entirely removed. This technique can dramatically decrease the model's size and complexity, particularly in deep networks where some layers become redundant as training progresses, leading to more efficient computations.

In this study, a novel pruning method for CNNs is proposed, which reduces model depth by analyzing layer-wise sparsity after training. Instead of relying on a single test image, sparsity values are now computed using a validation batch of 16 images. For each layer, activations are collected for all samples in this batch, and sparsity is defined as the proportion of zero activations averaged across the batch. This provides a more stable and representative estimate of layer importance.

Sparsity for each layer is computed as:

$S_l = \frac{1}{B} \sum_{b=1}^{B} \frac{\left|\{a \in A_l^{(b)} : a = 0\}\right|}{\left|A_l^{(b)}\right|}$ (10)

where $B$ denotes the validation batch size, and $A_l^{(b)}$ represents the activations of layer $l$ for the $b$-th image in the batch. Once the sparsity values for all layers are calculated, the layer with the highest sparsity is identified. The critical step in the pruning process involves removing all layers following the layer with the highest sparsity. The detailed pruning process is outlined in Algorithm 1. Formally, the pruned model consists of all layers up to and including the layer with the highest sparsity:

$M' = \{L_1, L_2, \ldots, L_k\}, \qquad k = \arg\max_{l} S_l$ (11)

where $k$ denotes the index of the layer with the highest sparsity. After the pruning operation, the modified model is retrained from scratch to adapt to its reduced architecture. The retraining process allows the pruned model to learn new representations without the removed layers, potentially leading to a more compact model that maintains its original performance levels. It should be noted that this retraining step is not a separate neural architecture search process; rather, it is an integral part of the pruning strategy that ensures the reduced architecture can effectively learn after the pruning decision guided by batch-averaged sparsity.

This pruning method differs from traditional techniques that target individual neurons or weights. By analyzing layer-wise sparsity and removing entire sections of the network, a more drastic reduction in model complexity is achieved. This approach is particularly advantageous in cases where deeper layers contribute minimally to the final output. Retraining the pruned model allows for maintaining strong performance while significantly reducing computational cost. The objective is not to produce an ultra-lightweight architecture, but to obtain a parameter-efficient reduction relative to the original Xception structure.

Algorithm 1 Layer-wise Sparsity-Based Pruning

Input: Trained CNN model M, validation batch X = {x_1, …, x_B}

Output: Pruned and retrained model M′

1:  sparsities ← [ ]

2:  for each layer l in M do

3:    s ← 0

4:    for each image x_b in X do

5:      A ← Activations(M, l, x_b)

6:      s ← s + |{a ∈ A : a = 0}| / |A|

7:    end for

8:    S_l ← s / B

9:    sparsities.Append(S_l)

10: end for

11: k ← argmax(sparsities)              ▹ index of max-sparsity layer

12: pruned_model ← SliceLayers(M, 1:k)

13: Retrain(pruned_model)

14: return pruned_model
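The sparsity computation and layer selection at the heart of Algorithm 1 can be sketched in numpy as follows (activations are passed in as plain arrays; extracting them from a real Keras model and slicing/retraining it are left out of this sketch):

```python
import numpy as np

def layer_sparsity(activations_batch):
    """Batch-averaged sparsity of one layer (Eq. 10): the fraction of
    zero activations, averaged over the B images in the validation batch.
    activations_batch: list of B activation arrays, one per image."""
    return float(np.mean([np.mean(a == 0) for a in activations_batch]))

def prune_index(per_layer_activations):
    """Selection step of Algorithm 1: return the index k of the layer with
    the highest batch-averaged sparsity, plus all per-layer sparsities.
    per_layer_activations: one list of per-image activation arrays per layer."""
    sparsities = [layer_sparsity(batch) for batch in per_layer_activations]
    return int(np.argmax(sparsities)), sparsities
```

In the actual framework the pruned model then keeps layers 1 through k and is retrained from scratch, e.g. by rebuilding the network up to the selected layer and attaching a fresh classification head.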

3.5. Handling imbalanced datasets

Many classification algorithms assume that the dataset used for training is balanced, meaning that each class has a similar number of observations. However, in real-world scenarios, datasets often exhibit class imbalance, where one class has significantly more observations than the other. In such cases, models tend to become biased toward the majority class, leading to poor performance in predicting the minority class, which is often the class of greater interest. This bias occurs because standard classification models aim to minimize the overall error rate, inherently placing more weight on the majority class due to its larger presence in the dataset. As a result, the minority class is often overlooked, leading to misclassification or under-representation in the final model predictions. This issue has drawn significant attention in recent years due to its implications in various fields such as medical diagnostics, fraud detection, and rare event prediction.

Several strategies have been developed to address class imbalance, one of the most popular being resampling methods. Resampling modifies the dataset to balance the class distributions either by oversampling the minority class or undersampling the majority class [53]. Studies have demonstrated that balancing class distributions through resampling can improve model performance in imbalanced settings [54,55].

Resampling techniques are generally divided into undersampling, oversampling, and hybrid methods. In undersampling methods, observations from the majority class are randomly removed until the classes are balanced. Random undersampling is the simplest form of this approach, where majority class instances are randomly discarded, reducing bias but potentially leading to loss of valuable information. On the other hand, oversampling methods aim to increase the representation of the minority class. Random oversampling achieves this by duplicating minority class instances and adding them to the dataset [56]. Although Random oversampling is straightforward, it can lead to overfitting as the model may become too focused on specific minority class samples, which can reduce generalization performance on unseen data.

To overcome this, [57] introduced SMOTE, a more advanced method that generates synthetic samples of the minority class. Rather than simply duplicating existing instances, SMOTE creates new samples by interpolating between existing minority class instances and their nearest neighbors. This process helps the model to generalize better by covering the feature space more effectively, reducing overfitting. In recent years, various SMOTE variants have been developed to address specific limitations and challenges of the original method [58–61]. These variants aim to further improve the performance of oversampling, particularly in complex and highly imbalanced datasets. For instance, ADASYN [58], which adapts the generation of synthetic samples based on the density of minority class instances, gives more attention to harder-to-classify regions of the data. SMOTEWB [61] incorporates a noise detection mechanism combined with a boosting procedure to mitigate the generation of synthetic samples in noisy regions. This approach determines the appropriate number of neighbors for each observation, reducing the risk of overfitting or generating unrealistic data points. The method leverages the strengths of boosting to improve class balance and overall model robustness, proving to be effective in noisy environments. These SMOTE variations provide tailored solutions for different scenarios, enhancing the robustness and effectiveness of resampling techniques in handling imbalanced datasets.

Hybrid methods combine both oversampling and undersampling approaches to leverage the strengths of each method, aiming for a balanced and representative dataset without sacrificing generalization. By adopting these resampling techniques, it becomes possible to address the class imbalance and improve the classifier’s ability to perform well in both classes.

In this study, the SMOTE technique was utilized under various scenarios. The first of these strategies is the 'minority' option, which ensures that only the minority class is oversampled. Another strategy, 'not minority', focuses on oversampling all classes except the minority class, while 'not majority' allows for oversampling of all classes except the majority class. Furthermore, model performance was evaluated by generating different numbers of synthetic samples for the minority classes at several oversampling ratios. SMOTE can be applied directly to RGB images, where each image consists of three color channels. The technique generates synthetic samples in the high-dimensional pixel space by interpolating between existing minority class samples. For an image represented as a vector in this feature space, SMOTE creates synthetic images based on the following formulation:

(12)

Where is a vector representing the original minority class image, is one of the k-nearest neighbors of within the minority class, also represented as a vector in the space. The term λ is a random scalar between 0 and 1, ensuring that the new synthetic sample lies somewhere between and , effectively interpolating between the two vectors in the feature space. This interpolation is performed for each pixel in the image. Since each pixel has three values (one for each color channel: Red, Green, and Blue), the interpolation is applied independently to each channel. Thus, for each pixel p in the image, the corresponding pixel in the new synthetic image is calculated as:

p_new^c = p^c + λ (p_nn^c − p^c),   c ∈ {R, G, B}    (13)

Where p^c is the pixel value of the original image in channel c and p_nn^c is the corresponding pixel value in the neighbor image. This process generates a new synthetic image within the feature space defined by the minority class, creating more variation and reducing the risk of overfitting specific samples. By applying SMOTE in this high-dimensional space, the minority class is better represented, helping the model generalize more effectively during training.
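The interpolation above can be sketched in a few lines of NumPy; the function name and toy vectors are illustrative, not the authors' implementation:

```python
import numpy as np

def smote_interpolate(x, x_nn, rng):
    """One synthetic SMOTE sample between a minority-class image vector x
    and one of its nearest neighbors x_nn; the interpolation is element-wise,
    i.e. applied independently to every pixel and color channel."""
    lam = rng.uniform(0.0, 1.0)        # random scalar between 0 and 1
    return x + lam * (x_nn - x)        # lies on the segment between x and x_nn

# Toy example: two 2x2 RGB images flattened into vectors.
rng = np.random.default_rng(0)
x = rng.uniform(0, 255, size=2 * 2 * 3)
x_nn = rng.uniform(0, 255, size=2 * 2 * 3)
x_new = smote_interpolate(x, x_nn, rng)

# Every element of the synthetic sample lies between the two originals.
assert np.all(x_new >= np.minimum(x, x_nn) - 1e-9)
assert np.all(x_new <= np.maximum(x, x_nn) + 1e-9)
```

Because a single λ is drawn per sample, the synthetic image is a convex combination of the two source images rather than an independently perturbed copy of each pixel.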

3.6. Data augmentation

One of the most significant challenges faced by machine learning and deep learning algorithms is the lack of sufficient training data. This issue often results in overfitting, where the model performs exceptionally well on training data but fails to generalize to unseen data. Overfitting occurs when the model effectively memorizes the training data instead of learning the underlying patterns, causing poor performance when it encounters new inputs. To mitigate this problem, data augmentation is widely used, especially in image-based tasks. Data augmentation artificially increases the size of the training dataset by applying various transformations to the existing data, thus helping the model to generalize better by exposing it to new variations of the same data.

In this study, several augmentation techniques were applied using the following transformations: rescaling the pixel values to a common range, rotating images by up to 10 degrees, shifting the width and height by up to 20%, applying shear transformations with a range of 0.2, flipping the images both horizontally and vertically, and filling exposed pixels using the nearest neighbor interpolation method. These transformations introduce variability in the training data, allowing the model to learn more robust features. For instance, rotation and flipping help the model to recognize objects regardless of their orientation, while shifting and shearing introduce positional variation. This increases the overall diversity of the dataset without requiring the collection of new data from scratch, which can be particularly useful in fields like medical imaging, where acquiring large datasets is often challenging and time-consuming.
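As an illustration, two of the listed transformations (flips and ±20% shifts with nearest-edge fill) can be sketched in NumPy; rotation and shear are omitted because they need an interpolation library, and the function name is illustrative:

```python
import numpy as np

def augment(img, rng):
    """Randomly flip and shift an H x W x C image (illustrative sketch of
    two of the transformations above; shifts are up to 20% of each side,
    and exposed borders are filled with the nearest edge pixels)."""
    out = img.copy()
    if rng.random() < 0.5:
        out = np.fliplr(out)           # horizontal flip
    if rng.random() < 0.5:
        out = np.flipud(out)           # vertical flip
    h, w = out.shape[:2]
    dy = int(rng.integers(-int(0.2 * h), int(0.2 * h) + 1))  # height shift
    dx = int(rng.integers(-int(0.2 * w), int(0.2 * w) + 1))  # width shift
    # Edge padding mimics the 'nearest' fill mode for shifted-in borders.
    out = np.pad(out, ((abs(dy),) * 2, (abs(dx),) * 2, (0, 0)), mode="edge")
    return out[abs(dy) - dy : abs(dy) - dy + h,
               abs(dx) - dx : abs(dx) - dx + w]

rng = np.random.default_rng(42)
img = rng.uniform(0.0, 1.0, size=(10, 10, 3))
aug = augment(img, rng)
assert aug.shape == img.shape          # transformations preserve image size
```

Applying such a pipeline on the fly during training means each epoch effectively sees a different variant of every image, which is what makes augmentation cheap compared to collecting new data.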

Moreover, another challenge faced is the issue of imbalanced datasets, where some classes contain significantly more images than others. As shown in Table 2, certain classes have far fewer images, and this imbalance can lead to poor model performance on the minority classes, which the model then tends to misclassify. Data augmentation helps alleviate this by artificially balancing the dataset, especially for the underrepresented classes, and improving the model’s ability to generalize across all classes, even the minority ones.

Table 2. Number of training samples for each class after applying different sampling strategies.

https://doi.org/10.1371/journal.pone.0341227.t002

3.7. Avg-TopK pooling method

Avg-TopK pooling is a novel pooling method designed to address the limitations of traditional pooling techniques like max and average pooling. In this method, the average of the top K values within a pooling window is taken, which helps preserve significant information while allowing the model to utilize more representative features. Let X = {x_1, x_2, …, x_n} be the set of input values in a pooling window of size n. The top K highest values in the pooling window, sorted in descending order, are denoted x_(1), x_(2), …, x_(K), such that:

x_(1) ≥ x_(2) ≥ … ≥ x_(K),   x_(k) ∈ X    (14)

Where x_(1) is the maximum value, x_(2) is the second-highest value, and so on. The Avg-TopK pooled value (P_AvgTopK) is calculated by averaging the top K values:

P_AvgTopK = (1/K) Σ_{k=1}^{K} x_(k)    (15)

Unlike max pooling, which selects only the maximum value, Avg-TopK pooling preserves more information by averaging the top K values, thus retaining multiple important features from the input. Consider a 2×2 pooling window with the values {8, 7, 6, 1}.

If K = 3, the top 3 values are {8, 7, 6}. The Avg-TopK pooled value is calculated as:

P_AvgTopK = (8 + 7 + 6) / 3 = 7    (16)

In this example, Avg-TopK pooling retains more information compared to max pooling (which would only select 8), and it provides a more meaningful output than average pooling (which would compute (8 + 7 + 6 + 1) / 4 = 5.5 over the whole window).
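The pooling rule can be sketched in NumPy, using an assumed 2×2 window with the values {8, 7, 6, 1} as a worked example:

```python
import numpy as np

def avg_topk(window, k):
    """Average of the K largest values in a pooling window (Eq. 15)."""
    flat = np.sort(np.asarray(window, dtype=float).ravel())[::-1]
    return float(flat[:k].mean())

window = np.array([[8.0, 7.0], [6.0, 1.0]])   # assumed 2x2 example window
print(avg_topk(window, 3))    # Avg-TopK with K=3: (8 + 7 + 6) / 3 = 7.0
print(float(window.max()))    # max pooling keeps only 8.0
print(float(window.mean()))   # average pooling dilutes to 5.5
```

Setting K = 1 recovers max pooling and K = n recovers average pooling, so K directly controls how selective the downsampling is.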

3.8. Training details

In this section, the details of the training process used to optimize the classification model are presented. The dataset images were resized to a fixed input resolution. The dataset was split into training and testing sets, following the common practice of using 80% of the data for training and 20% for testing. This split aligns with previous research utilizing similar datasets, ensuring consistency in methodology [4].

The model was compiled using the Adam optimizer with a learning rate of 0.001. The loss function chosen for this classification task was sparse categorical cross entropy, as the target labels were encoded as integers representing the different skin classes. The model’s performance was tracked using the accuracy metric, which provides a clear indication of how well the model distinguishes between the skin classes.

To avoid overfitting and ensure optimal training, early stopping was implemented. Early stopping monitors the validation loss during training and halts the process if there is no improvement for 10 consecutive epochs. Additionally, a ReduceLROnPlateau callback was employed to reduce the learning rate by a factor of 0.1 if the validation loss plateaued for more than three epochs. This adaptive learning rate strategy ensures that the model continues to improve during later stages of training.

The training process was conducted for a maximum of 50 epochs, using a batch size of 16. During training, 20% of the training data was set aside as a validation set to monitor the model’s performance and optimize hyperparameters through early stopping and learning rate adjustments.
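The early-stopping and learning-rate-reduction behavior described above can be sketched in pure Python; this is an illustrative re-implementation of the callback logic (patience values from the text), not the Keras code:

```python
def simulate_callbacks(val_losses, es_patience=10, lr_patience=3,
                       lr=0.001, factor=0.1):
    """Halt after `es_patience` epochs without validation-loss improvement,
    and multiply the learning rate by `factor` whenever the loss plateaus
    for more than `lr_patience` epochs (illustrative sketch)."""
    best, es_wait, lr_wait, stopped_at = float("inf"), 0, 0, None
    for epoch, loss in enumerate(val_losses):
        if loss < best:                      # validation loss improved
            best, es_wait, lr_wait = loss, 0, 0
            continue
        es_wait += 1
        lr_wait += 1
        if lr_wait > lr_patience:            # plateau: reduce LR by 10x
            lr *= factor
            lr_wait = 0
        if es_wait >= es_patience:           # stop after 10 stale epochs
            stopped_at = epoch
            break
    return lr, stopped_at

# A validation-loss curve that improves for five epochs, then plateaus.
losses = [1.0, 0.8, 0.6, 0.5, 0.45] + [0.45] * 12
lr, stop = simulate_callbacks(losses)
assert stop is not None   # training halted before exhausting all epochs
assert lr < 0.001         # learning rate was reduced on the plateau
```

Combining the two mechanisms lets the optimizer first try smaller steps on a plateau before training is abandoned altogether.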

4. Experiment and results

This section presents the experimental studies conducted to improve the performance of skin cancer classification. First, several widely used transfer learning models were evaluated on the original HAM10000 dataset. The models included DenseNet201, VGG16, ResNet50, MobileNetV2, EfficientNetB3, Xception, InceptionV3, and InceptionResNetV2. Their performance metrics—accuracy, precision, recall, F1-score, early stopping epoch, and parameter count—are summarized in Table 3, reported as mean ± standard deviation over five independent runs. The results show that Xception consistently outperformed the other models, achieving an accuracy of 84.70 ± 0.30%, precision of 84.50 ± 0.33%, recall of 84.68 ± 0.29%, and an F1-score of 84.55 ± 0.31%. Training consistently converged around the 17th epoch, demonstrating stable and fast optimization behavior. Despite having approximately 20.9 million parameters, Xception exhibited the most balanced and reliable performance across evaluation metrics.

Table 3. Performance scores of different models for the original dataset (mean ± standard deviation over 5 runs).

https://doi.org/10.1371/journal.pone.0341227.t003

In contrast, VGG16 remained the lowest-performing model, with an accuracy of 68.47 ± 0.55% and an F1-score of 55.76 ± 0.73%. Its relatively shallow architecture and limited representational capacity contributed to slower convergence and higher training loss. MobileNetV2 and EfficientNetB3 continued to provide strong efficiency–accuracy trade-offs, with accuracies above 83% and relatively low parameter counts, particularly in the case of MobileNetV2 (2.3M parameters). Figs 3 and 4 provide additional insights into model behavior. In Fig 3, models such as Xception and EfficientNetB3 show rapid loss reduction early in training, while VGG16 maintains higher loss values throughout. Early stopping prevented overfitting in all models, with most converging between epochs 15 and 20. Fig 4 demonstrates the fast and stable rise in accuracy for Xception and EfficientNetB3, whereas VGG16 shows slower and lower stabilization consistent with its overall performance. In summary, Xception remained the strongest baseline model across repeated runs, followed by EfficientNetB3 and MobileNetV2. The updated metrics confirm the robustness and consistency of these models and support the selection of Xception as the base architecture for subsequent pruning and optimization steps.

Fig 3. Training loss curves of multiple deep learning models.

The X-axis denotes training epochs, while the Y-axis represents loss values. All models were trained with early stopping to mitigate overfitting. Most architectures exhibit a stable reduction in loss, whereas VGG16 shows slower convergence, highlighting differences in learning dynamics across models.

https://doi.org/10.1371/journal.pone.0341227.g003

Fig 4. Training accuracy curves of different deep learning models.

The X-axis represents training epochs, while the Y-axis indicates accuracy. Most architectures demonstrate a rapid increase in accuracy during the early epochs, with DenseNet201, Xception, and EfficientNetB3 approaching near-perfect performance. In contrast, VGG16 converges more slowly and stabilizes at a lower accuracy level, reflecting differences in training dynamics among models.

https://doi.org/10.1371/journal.pone.0341227.g004

After evaluating the performance of various transfer learning models, the Xception model was identified as the best-performing model. The next step involved making the Xception model more efficient by reducing redundant layers while preserving performance. To achieve this, the pruning method described earlier, which analyzes layer-wise sparsity to identify layers that contribute less to the model’s overall performance, was applied. Based on the sparsity values, the layers with the highest sparsity were progressively pruned, and the model was retrained after each pruning operation. During this process, eight layers with the highest sparsity values were identified, and the model was pruned starting from the layer with the highest sparsity. After pruning, the model was retrained from scratch for each configuration to allow it to adapt to the reduced architecture. In addition, the ablation pipeline follows the sequence Pruning → SMOTE → Augmentation to ensure methodological consistency. Pruning is performed first because it fixes the final architecture; SMOTE is applied afterward to generate synthetic samples in the correct feature space; and augmentation is applied last to enrich visual diversity for the finalized model. This ordering ensures stable, architecture-aligned, and reproducible training across all ablation stages. Table 4 presents the performance scores and the corresponding parameter reductions after pruning the model based on the proposed layer-wise sparsity. As shown in Table 4, the model maintained high-performance levels after progressively removing layers with the highest sparsity values. The best performance was achieved after pruning the model at the block12_sepconv3_act layer, with an accuracy of 85.33% ± 0.24, precision of 85.12% ± 0.26, recall of 85.30% ± 0.23, and an F1-Score of 84.88% ± 0.25.
This pruning resulted in a model with 13.5 million parameters, a significant reduction compared to the original 20.9 million parameters in the unpruned Xception model, representing a reduction of approximately 35% in the number of parameters.
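The layer-ranking step of this pruning strategy can be sketched as follows, assuming sparsity is measured as the fraction of near-zero activations; the threshold and the toy activation values are illustrative:

```python
import numpy as np

def activation_sparsity(activations, eps=1e-6):
    """Fraction of near-zero activations in a layer's output, used as the
    layer-importance proxy described above (threshold eps is an assumption)."""
    a = np.asarray(activations, dtype=float)
    return float(np.mean(np.abs(a) <= eps))

# Toy activations for two layer names taken from the text; the values are
# made up purely to illustrate the ranking step.
layer_acts = {
    "block12_sepconv3_act": np.array([0.0, 0.0, 0.0, 0.2]),  # 75% zeros
    "block14_sepconv1_act": np.array([0.1, 0.0, 0.3, 0.4]),  # 25% zeros
}
ranked = sorted(layer_acts, key=lambda n: activation_sparsity(layer_acts[n]),
                reverse=True)
# The sparsest layer is the first candidate for pruning.
assert ranked[0] == "block12_sepconv3_act"
```

In the actual pipeline, each candidate in this ranking would be pruned in turn and the truncated model retrained from scratch before comparing accuracies.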

Table 4. Performance scores for different layers with sparsity for the original dataset (mean ± standard deviation over 5 runs).

https://doi.org/10.1371/journal.pone.0341227.t004

The pruning strategy revealed a significant relationship between the sparsity rates of individual layers and the model’s overall performance. Specifically, layers exhibiting higher sparsity rates demonstrated a reduced contribution to the network’s accuracy, thereby enabling their removal with minimal impact on performance. For instance, pruning the layer with the highest sparsity rate, block12_sepconv3_act (sparsity rate: 91.3%), resulted in an accuracy increase to 85.33% and a substantial reduction in the number of parameters (to 13.5 million). This finding indicates that higher sparsity rates are associated with lower layer importance, allowing for more aggressive pruning strategies without compromising performance. Conversely, pruning layers with lower sparsity rates, such as block14_sepconv1_act (sparsity rate: 79.6%), led to a smaller performance gain. These results confirm that focusing on layers with higher sparsity rates can produce more efficient models while preserving competitive accuracy levels. In conclusion, this method successfully reduced the Xception model’s computational complexity by carefully analyzing layer-wise sparsity while maintaining competitive performance. The proposed pruning method led to a more compact model, reducing the number of parameters by up to 35%, making the model significantly more efficient for deployment in environments with limited computational resources. While the pruned model remains larger than compact architectures such as MobileNetV2 (2.3M parameters), it offers a substantial reduction compared to the original Xception (20.9M parameters). Therefore, the contribution lies in achieving parameter efficiency within the Xception family rather than competing with ultra-lightweight models.

After applying the pruning process and selecting the pruned model at block12_sepconv3_act (sparsity rate: 91.3%), the next step addressed the issue of data imbalance in the dataset. Imbalanced data can negatively impact model performance, especially in terms of recall and F1-Score, as the model may become biased towards the majority class. To mitigate this issue, SMOTE was applied only in the deep feature space obtained from the pruned model, rather than in the raw pixel domain. This ensures that no synthetic images are generated; instead, oversampling is performed by interpolating between high-level feature vectors after the train–test split. This approach balances the class distributions while preventing the introduction of unrealistic artifacts and fully avoiding any data leakage.

Table 5 presents the results obtained from applying various sampling strategies to address the class imbalance issue in the dataset. The baseline, referred to as “No Sampling,” shows the results without any class balancing, serving as a reference for comparison. All considered sampling strategies—including “not majority,” “minority,” “not minority,” and the controlled oversampling cases (2x, 3x, 4x)—were applied exclusively to the feature representations of the training set. This guarantees that the test set remained untouched and that balancing was performed without generating synthetic images. These strategies were evaluated to determine which feature-space balancing method yielded the most reliable improvements under class imbalance.

Table 5. Performance scores of different sampling strategies (mean ± standard deviation over 5 runs).

https://doi.org/10.1371/journal.pone.0341227.t005

The “No Sampling” strategy, which represents the pruned model before any resampling, achieved an accuracy of 85.33% and an F1-Score of 84.88%. This serves as the baseline for comparison. Among the oversampling strategies, the “Case 1 – 2x” strategy, which increases the samples in minority classes by 2 times, yielded the best results, with an accuracy of 86.25%, precision of 86.70%, and an F1-Score of 85.95%. This shows that a moderate increase in the number of minority class samples can significantly improve the model’s performance by enhancing its ability to identify minority class instances correctly. In contrast, “Case 3 – 4x” resulted in a decline in accuracy (71.72%) and F1-Score (72.20%), indicating that excessive oversampling can lead to overfitting, where the model becomes too specific to the synthetic samples generated, reducing its generalization capability. The “minority” strategy, which oversamples only the minority class, also performed well, achieving an F1-Score of 85.55%. However, strategies such as “not majority” and “not minority” resulted in suboptimal performance, with the “not minority” strategy performing the worst (accuracy of 55.80%), as it significantly altered the balance of the dataset, leading to poor generalization. In conclusion, the results indicate that moderate oversampling of the minority classes, particularly in “Case 1 – 2x,” provided the most balanced and improved performance. Careful tuning of the oversampling ratio is essential to avoid overfitting while addressing the imbalance problem effectively.
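The per-class target counts implied by these strategies can be sketched in pure Python; the semantics of the named strategies are assumed to follow the common imbalanced-learn conventions, and the class counts below are illustrative, not the HAM10000 figures:

```python
def smote_targets(counts, strategy):
    """Per-class target sample counts under the sampling strategies above.
    The integer case models the controlled 2x/3x/4x oversampling scenarios."""
    majority = max(counts.values())
    minority = min(counts.values())
    if strategy == "minority":       # oversample only the smallest class
        return {c: majority if n == minority else n for c, n in counts.items()}
    if strategy == "not majority":   # oversample every class but the largest
        return {c: n if n == majority else majority for c, n in counts.items()}
    if strategy == "not minority":   # oversample every class but the smallest
        return {c: n if n == minority else majority for c, n in counts.items()}
    if isinstance(strategy, int):    # e.g. 2 doubles each minority class
        return {c: n if n == majority else n * strategy
                for c, n in counts.items()}
    raise ValueError(f"unknown strategy: {strategy!r}")

counts = {"nv": 5000, "mel": 900, "df": 90}          # illustrative counts
assert smote_targets(counts, "minority") == {"nv": 5000, "mel": 900, "df": 5000}
assert smote_targets(counts, 2) == {"nv": 5000, "mel": 1800, "df": 180}
```

The sketch makes the failure mode of aggressive strategies visible: "not minority" leaves the smallest class untouched while inflating everything else, which is consistent with its poor accuracy in Table 5.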

Following the previous application of the SMOTE technique in the Case 1 – 2x strategy, where the number of samples in the minority classes was increased by a factor of two, the next step involved applying data augmentation. In this phase, data augmentation techniques were employed to improve model performance by ensuring that the number of examples in each class in the training dataset was approximately equal. The primary goal was to introduce more variability into the training set and reduce the risk of overfitting. Data augmentation was applied to the examples generated in the Case 1 – 2x scenario, introducing transformations such as rotations, scaling, and flips to create a more diverse set of training instances. This approach ensured that each class had almost the same number of examples, providing the model with balanced and varied data across all classes. The application of data augmentation resulted in significant performance improvements, as shown by the updated scores in Table 6: an accuracy of 90.58%, precision of 90.42%, recall of 90.70%, and F1-Score of 90.55%. Compared to the performance achieved in Case 1 – 2x without augmentation (86.25%), augmentation led to a noticeable enhancement in model accuracy and overall classification ability. These results highlight the effectiveness of data augmentation in conjunction with SMOTE. By generating additional, varied examples for each class, the model became better equipped to generalize across both minority and majority classes. The equal distribution of class examples and the diversity introduced by augmentation helped the model maintain high recall and precision, improving performance metrics across the board. This approach proved valuable in balancing the dataset while maximizing the model’s classification capabilities.

Table 6. Performance scores after applying data augmentation and AvgTopK strategies (mean ± standard deviation over 5 runs).

https://doi.org/10.1371/journal.pone.0341227.t006

The next stage involved applying the Avg-TopK pooling method to enhance the performance of the pruned Xception model. As discussed earlier, Avg-TopK pooling improves on max and average pooling by averaging the top K values in a pooling window, preserving multiple important features. All max pooling layers in the Xception model were replaced with Avg-TopK pooling for this experiment. The results in Table 6 demonstrate a performance improvement, with an accuracy of 91.52±0.16%, precision of 91.33±0.19%, recall of 91.70±0.15%, and an F1-Score of 91.50±0.17%. These results show that Avg-TopK pooling enhanced the model’s ability to classify data more effectively, leading to better overall performance compared to the data augmentation strategy alone. This confirms that replacing max pooling with Avg-TopK pooling contributed to improved classification results in the Xception model.

To evaluate the effectiveness of the proposed strategy in this study, its performance was compared with that of recent studies in the literature using the HAM10000 dataset for skin lesion classification. As illustrated in Table 7, the approach, which combines the PrunedModel with SMOTE, data augmentation, and Avg-TopK pooling, achieved remarkable results. Specifically, this method attained an accuracy of 91.52%, a precision of 91.33%, a recall of 91.70%, and an F1-Score of 91.50% when applied to the standard 10,015-image dataset.

Table 7. Comparison of prior studies on the HAM10000 dataset for skin lesion classification. All results shown are based on the original 10,015-image dataset; in our case, oversampling and augmentation are applied only to the training set after the train–test split.

https://doi.org/10.1371/journal.pone.0341227.t007

Compared to recent studies using the same dataset, the proposed PrunedModel, combined with SMOTE, data augmentation, and Avg-TopK pooling, achieved competitive results. When considering models like Dilated InceptionV3 by [62], which reported an accuracy of 89.81%, and DenseNet201 with an accuracy of 87.7%, the proposed method surpasses these in terms of overall accuracy and balance across precision and recall metrics. MobileNet V2-LSTM, which achieved an accuracy of 85.34%, also falls behind the proposed model’s performance. On the other hand, [63]’s combination of InceptionResNetV2 and ResNeXt101 achieved a slightly higher accuracy of 92.83%, surpassing the proposed model’s 91.52%. However, combining multiple models increases model capacity and complexity, which can lead to higher computational costs and increased training time. In contrast, the proposed method, which focuses on pruning and optimizing a single model, maintains a strong balance between accuracy, precision, recall, and F1-Score, offering a more efficient and streamlined solution. This balance ensures consistency across all metrics while keeping the model lightweight and suitable for real-time applications, making it a more practical choice for clinical settings. Compared to the KELM proposed by [64], which reached an accuracy of 90.67% and a recall of 90.20%, the proposed model shows better overall performance, with superior recall and higher accuracy. The ensemble model by [65], combining ResNet18, MobileNetV2, and VGG11, achieved an accuracy of 86.78%, which was outperformed by the proposed method both in terms of accuracy and metric balance. In summary, the proposed method outperforms many state-of-the-art models in terms of overall accuracy, precision, and recall, while remaining slightly behind a few models like InceptionResNetV2 and ResNeXt101 regarding peak accuracy.
These comparisons highlight the effectiveness of our approach in utilizing techniques like model pruning, SMOTE, data augmentation, and Avg-TopK to optimize the performance of the classification model. Our method’s consistent superiority in both accuracy and balance across precision, recall, and F1-Score underscores its potential for real-world diagnostic applications, where maintaining high precision and recall is crucial for reducing false negatives and false positives in medical diagnoses.

The confusion matrix shown in Fig 5 presents the classification performance of the proposed model on the original HAM10000 dataset. All SMOTE and data augmentation procedures were applied exclusively to the training set after the train–test split to prevent data leakage. The results demonstrate that the proposed framework effectively mitigates the impact of class imbalance and achieves consistent performance across both majority and minority classes, leading to a more reliable and balanced classification outcome.

Fig 5. Confusion matrix of the proposed model evaluated on the HAM10000 dataset.

All data augmentation and SMOTE procedures were applied exclusively to the training set after the train-test split, ensuring that the test set remained free of synthetic samples.

https://doi.org/10.1371/journal.pone.0341227.g005

To highlight the regions where the model concentrated during the classification process, the Grad-CAM results are provided in Fig 6. The original images are presented alongside their corresponding Grad-CAM heatmaps, which reveal the spatial areas that contribute most strongly to the model’s predictions. These heatmaps, where red tones indicate higher activation, demonstrate that the model generally attends to clinically meaningful lesion regions across both correctly and incorrectly classified samples. Including examples of both accurate and erroneous predictions provides a clearer understanding of the model’s decision patterns, showing not only where the model succeeds but also where it misinterprets features that lead to incorrect outcomes. This interpretability analysis ensures that the decision-making process is transparent and allows for validation of whether the model relies on medically relevant visual cues. Such visualization-based insights are essential for assessing the reliability of deep learning systems in dermatological image analysis, as they help confirm that the model’s learned representations align with expert diagnostic reasoning.
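The heatmap computation follows the generic Grad-CAM formulation: weight each convolutional feature map by the spatial mean of the class-score gradient, sum, apply a ReLU, and normalize. A NumPy sketch with mock tensors (not the authors' implementation):

```python
import numpy as np

def grad_cam(feature_maps, grads):
    """Generic Grad-CAM heatmap from a conv layer's feature maps A_k and the
    gradients of the class score with respect to them; shapes are (H, W, K)."""
    alphas = grads.mean(axis=(0, 1))                      # channel weights a_k
    cam = np.tensordot(feature_maps, alphas, axes=([2], [0]))
    cam = np.maximum(cam, 0.0)                            # keep positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                             # scale into [0, 1]
    return cam

rng = np.random.default_rng(1)
A = rng.uniform(0.0, 1.0, size=(7, 7, 4))    # mock conv feature maps
dS = rng.uniform(-1.0, 1.0, size=(7, 7, 4))  # mock class-score gradients
heat = grad_cam(A, dS)
assert heat.shape == (7, 7)
assert 0.0 <= heat.min() and heat.max() <= 1.0
```

In practice the normalized map is upsampled to the input resolution and overlaid on the dermoscopic image, which is what produces the red high-relevance regions described above.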

Fig 6. Grad-CAM visualizations for representative skin lesion classes.

For each example, the original dermoscopic image is shown alongside its corresponding Grad-CAM heatmap. The highlighted regions indicate areas that contribute most strongly to the model’s classification decision, with warmer colors representing higher relevance.

https://doi.org/10.1371/journal.pone.0341227.g006

Fig 7 presents a visual comparison of the model’s predictions against the true labels for a subset of images from the test dataset. Each image displays a pair of labels in the format true|predicted, where the true label is positioned on the left, and the predicted label from the model is on the right. This visualization helps illustrate the model’s classification performance, highlighting instances of both accurate predictions and misclassifications. The majority of the images exhibit consistent predictions with their true labels, such as the class “nv,” indicating the model’s strong capability to identify common patterns accurately. However, there are some cases where the true label differs from the predicted label, for example, mel|akiec, suggesting that while the model generally performs well, it occasionally struggles with the more subtle differences between certain skin lesion classes.

Fig 7. Representative test images with ground truth and predicted labels.

Each example displays a dermoscopic image from the test set along with a pair of labels, where the left label denotes the ground truth and the right label indicates the model prediction (e.g., mel|akiec).

https://doi.org/10.1371/journal.pone.0341227.g007

5. Conclusions

This study developed a robust framework for skin cancer classification by leveraging advanced techniques such as model pruning, SMOTE, data augmentation, and the Avg-TopK pooling method. Through the use of transfer learning with Xception and the application of pruning, a more parameter-efficient variant of the Xception model was obtained that effectively reduced complexity while maintaining high accuracy. SMOTE and data augmentation were applied to address the class imbalance, further enhancing the model’s generalization capability across various skin lesion types. The integration of the Avg-TopK pooling technique allowed for better feature retention, resulting in superior performance compared to traditional pooling methods. The framework achieves high accuracy while offering improved parameter efficiency relative to the original Xception architecture. Although the model is not ultra-lightweight, the parameter reduction makes it more practical for deployment in moderately constrained environments. A limitation of the proposed pruning strategy is that it relies on layer-wise activation sparsity computed from a small validation batch rather than the entire dataset, and the decision to prune all subsequent layers remains heuristic. Although experimental results strongly support the validity of this pruning boundary for the Xception architecture, the method does not guarantee theoretical optimality. Future work will incorporate benchmarking against established pruning techniques such as magnitude pruning, filter pruning, and structured channel pruning to further validate the effectiveness and generalizability of the proposed approach. Finally, although the proposed framework achieves strong quantitative performance, it is not intended to be interpreted as clinically deployable. Clinical applicability requires validation through dermatologist-supervised assessment, multi-center studies, and prospective clinical trials, which fall outside the scope of this study.
Future research could extend this approach to other medical imaging tasks, broadening its applicability and impact in healthcare.


References

  1. 1. Alam TM, Shaukat K, Khan WA, Hameed IA, Almuqren LA, Raza MA, et al. An Efficient Deep Learning-Based Skin Cancer Classifier for an Imbalanced Dataset. Diagnostics (Basel). 2022;12(9):2115. pmid:36140516
  2. 2. Rajasekar V, K ND, K S, A S, S S, K S. An Intelligent System For Skin Cancer Detection Using Deep Learning Techniques. In: 2024 International Conference on Advances in Data Engineering and Intelligent Computing Systems (ADICS), 2024. 1–5. https://doi.org/10.1109/adics58448.2024.10533643
  3. 3. Salamaa WM, Aly MH. Deep learning design for benign and malignant classification of skin lesions: a new approach. Multimed Tools Appl. 2021;80(17):26795–811.
  4. 4. Bozkurt F. Skin lesion classification on dermatoscopic images using effective data augmentation and pre-trained deep learning approach. Multimed Tools Appl. 2022;82(12):18985–9003.
  5. 5. Salma W, Eltrass AS. Automated Deep Learning Approach for Classification of Malignant Melanoma and Benign Skin Lesions. Multimedia Tools and Applications. 2022;81(22):32643–60.
  6. 6. Lakshmi DS, Divya A, Sai MB, Rama KD, Rani KJ, Vakkalagadda M. Skin Cancer Diagnosis Using Deep Convolutional Generative Adversarial Network for Image Classification. In: 2024 2nd International Conference on Intelligent Data Communication Technologies and Internet of Things (IDCIoT), 2024. 1219–25. https://doi.org/10.1109/idciot59759.2024.10467243
  7. 7. Lu X, Abolhasani Zadeh FAY. Deep learning-based classification for melanoma detection using XceptionNet. Journal of Healthcare Engineering. 2022;2022(1):2196096.
  8. 8. Dogan Y. AutoEffFusionNet: A New Approach for Cervical Cancer Diagnosis Using ResNet-Based Autoencoder With Attention Mechanism and Genetic Feature Selection. IEEE Access. 2025;13:44107–22.
  9. 9. OZDEMIR C. Adapting transfer learning models to dataset through pruning and Avg-TopK pooling. Neural Comput & Applic. 2024;36(11):6257–70.
  10. Dogan Y. A novel pruning-enhanced hybrid approach for efficient and accurate brain tumor diagnosis. Biomedical Signal Processing and Control. 2026;112:108466.
  11. Cheng H, Zhang M, Shi JQ. A Survey on Deep Neural Network Pruning: Taxonomy, Comparison, Analysis, and Recommendations. IEEE Trans Pattern Anal Mach Intell. 2024;46(12):10558–78. pmid:39167504
  12. Öter E, Doğan Y. A Comparative Study on Data Balancing Methods for Alzheimer’s Disease Classification. Çukurova Üniversitesi Mühendislik Fakültesi Dergisi. 2024;39(2):489–501.
  13. Alhudhaif A. A novel multi-class imbalanced EEG signals classification based on the adaptive synthetic sampling (ADASYN) approach. PeerJ Comput Sci. 2021;7:e523. pmid:34084928
  14. Joloudari JH, Marefat A, Nematollahi MA, Oyelere SS, Hussain S. Effective class-imbalance learning based on SMOTE and convolutional neural networks. Applied Sciences. 2023;13(6):4006.
  15. Aruk I, Pacal I, Toprak AN. A novel hybrid ConvNeXt-based approach for enhanced skin lesion classification. Expert Systems with Applications. 2025;283:127721.
  16. Aruk I, Pacal I, Toprak AN. A comprehensive comparison of convolutional neural network and visual transformer models on skin cancer classification. Comput Biol Chem. 2026;120(Pt 2):108713. pmid:41092791
  17. Alswilem L, Pacal N. Computational efficiency and accuracy of deep learning models for automated breast cancer detection in ultrasound imaging. Artificial Intelligence in Applied Sciences. 2025;1(1):1–6.
  18. Çakmak Y, Pacal N. Deep Learning for Automated Breast Cancer Detection in Ultrasound: A Comparative Study of Four CNN Architectures. AIAPP. 2025;1(1):13–9.
  19. Özdemir C. Avg-TopK: A new pooling method for convolutional neural networks. Expert Systems with Applications. 2023;223:119892.
  20. Srinivasu PN, SivaSai JG, Ijaz MF, Bhoi AK, Kim W, Kang JJ. Classification of Skin Disease Using Deep Learning Neural Networks with MobileNet V2 and LSTM. Sensors (Basel). 2021;21(8):2852. pmid:33919583
  21. Kılıç Ş. Deep feature engineering for accurate sperm morphology classification using CBAM-enhanced ResNet50. PLoS One. 2025;20(9):e0330914. pmid:40929291
  22. Zghal NS, Derbel N. Melanoma Skin Cancer Detection Based on Image Processing. Current Medical Imaging. 2020;16(1):50–8.
  23. Viknesh CK, Kumar PN, Seetharaman R, Anitha D. Detection and Classification of Melanoma Skin Cancer Using Image Processing Technique. Diagnostics (Basel). 2023;13(21):3313. pmid:37958209
  24. Suiçmez Ç, Tolga Kahraman H, Suiçmez A, Yılmaz C, Balcı F. Detection of melanoma with hybrid learning method by removing hair from dermoscopic images using image processing techniques and wavelet transform. Biomedical Signal Processing and Control. 2023;84:104729.
  25. Kılıç Ş. Densenet201+ with multi-scale attention and deep feature engineering for automated Kellgren–Lawrence grading of knee osteoarthritis. PeerJ Computer Science. 2025;11:e3329.
  26. Thangakani SA, Sornam M, Kavitha M. Deep convolutional neural network for skin lesion detection with HOG and GLCM features. Trends in Biomaterials & Artificial Organs. 2022;36.
  27. Pandu J, Kudtala U, Prabhakar B. Skin cancer detection and classification using DWT-GLCM with probabilistic neural networks. In: Biomedical signal and image processing with artificial intelligence. Springer. 2023. p. 183–94.
  28. Verma S, Kumar M. A hybrid machine learning model for skin disease classification using discrete wavelet transform and gray level co-occurrence matrix (GLCM). Multimed Tools Appl. 2024;84(14):12835–53.
  29. Balaji VR, Suganthi ST, Rajadevi R, Krishna Kumar V, Saravana Balaji B, Pandiyan S. Skin disease detection and segmentation using dynamic graph cut algorithm and classification through Naive Bayes classifier. Measurement. 2020;163:107922.
  30. Hatem MQ. Skin lesion classification system using a K-nearest neighbor algorithm. Vis Comput Ind Biomed Art. 2022;5(1):7. pmid:35229199
  31. Zeng W, Liao Y, Chen Y, Diao Q-y, Fu Z-y, Yao F. Research on classification and recognition of the skin tumors by laser ultrasound using support vector machine based on particle swarm optimization. Optics & Laser Technology. 2023;158:108810.
  32. Ain QU, Al-Sahaf H, Xue B, Zhang M. Genetic programming for automatic skin cancer image classification. Expert Systems with Applications. 2022;197:116680.
  33. Jha S, Mehta AK. Retraction Note: A Hybrid Approach Using the Fuzzy Logic System and the Modified Genetic Algorithm for Prediction of Skin Cancer. Neural Process Lett. 2024;56(4).
  34. Salih O, Duffy KJ. Optimization Convolutional Neural Network for Automatic Skin Lesion Diagnosis Using a Genetic Algorithm. Applied Sciences. 2023;13(5):3248.
  35. Kılıç Ş. Attention-Based Dual-Path Deep Learning for Blood Cell Image Classification Using ConvNeXt and Swin Transformer. J Imaging Inform Med. 2026;39(1):564–82. pmid:40301289
  36. Bhuvaneshwari KS, Rama Parvathy L, Chatrapathy K, Krishna Reddy ChV. An internet of health things-driven skin cancer classification using progressive cyclical convolutional neural network with ResNexT50 optimized by exponential particle swarm optimization. Biomedical Signal Processing and Control. 2024;91:105878.
  37. Faghihi A, Fathollahi M, Rajabi R. Diagnosis of skin cancer using VGG16 and VGG19 based transfer learning models. Multimed Tools Appl. 2023;83(19):57495–510.
  38. Tuncer T, Barua PD, Tuncer I, Dogan S, Acharya UR. A lightweight deep convolutional neural network model for skin cancer image classification. Applied Soft Computing. 2024:111794.
  39. Natha P, RajaRajeswari P. Advancing skin cancer prediction using ensemble models. Computers. 2024;13(7):157.
  40. Baykal Kablan E, Ayas S. Skin lesion classification from dermoscopy images using ensemble learning of ConvNeXt models. Signal, Image and Video Processing. 2024:1–9.
  41. Chanda D, Onim MdSH, Nyeem H, Ovi TB, Naba SS. DCENSnet: A new deep convolutional ensemble network for skin cancer classification. Biomedical Signal Processing and Control. 2024;89:105757.
  42. Lian J, Han L, Wang X, Ji Z, Cheng L. Shifted windowing vision transformer-based skin cancer classification via transfer learning. Clinics (Sao Paulo). 2025;80:100724. pmid:40915182
  43. Agarwal S, Mahto AK. Skin cancer classification: hybrid CNN-transformer models with KAN-based fusion. arXiv preprint. 2025. https://arxiv.org/abs/2508.12484
  44. Kılıç Ş. FocusGate-Net: A dual-attention guided MLP-convolution hybrid network for accurate and efficient medical image segmentation. PLoS One. 2025;20(9):e0331896. pmid:40997119
  45. Yang G, Luo S, Greer P. Boosting Skin Cancer Classification: A Multi-Scale Attention and Ensemble Approach with Vision Transformers. Sensors (Basel). 2025;25(8):2479. pmid:40285168
  46. Tschandl P, Rosendahl C, Kittler H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci Data. 2018;5:180161. pmid:30106392
  47. Chen T, Anderson N, Kim Y. Latent Weight-Based Pruning for Small Binary Neural Networks. In: Proceedings of the 28th Asia and South Pacific Design Automation Conference, 2023. 751–6. https://doi.org/10.1145/3566097.3567873
  48. Zheng Y, Sun P, Ren Q, Xu W, Zhu D. A novel and efficient model pruning method for deep convolutional neural networks by evaluating the direct and indirect effects of filters. Neurocomputing. 2024;569:127124.
  49. He Y, Xiao L. Structured Pruning for Deep Convolutional Neural Networks: A Survey. IEEE Trans Pattern Anal Mach Intell. 2024;46(5):2900–19. pmid:38015707
  50. Hedegaard L, Alok A, Jose J, Iosifidis A. Structured Pruning Adapters. Pattern Recognition. 2024;156:110724.
  51. Tang H, Lu Y, Xuan Q. SR-Init: An Interpretable Layer Pruning Method. In: ICASSP 2023 – IEEE International Conference on Acoustics, Speech and Signal Processing, 2023. 1–5.
  52. Wei Q, Zeng B, Liu J, He L, Zeng G. LiteTrack: Layer Pruning with Asynchronous Feature Extraction for Lightweight and Efficient Visual Tracking. In: 2024 IEEE International Conference on Robotics and Automation (ICRA), 2024. 4968–75. https://doi.org/10.1109/icra57147.2024.10610022
  53. Fernández A, García S, Galar M, Prati RC, Krawczyk B, Herrera F. Learning from imbalanced data sets. Springer. 2018.
  54. Ghorbani R, Ghousi R. Comparing Different Resampling Methods in Predicting Students’ Performance Using Machine Learning Techniques. IEEE Access. 2020;8:67899–911.
  55. Naseriparsa M, Al-Shammari A, Sheng M, Zhang Y, Zhou R. RSMOTE: improving classification performance over imbalanced medical datasets. Health Inf Sci Syst. 2020;8(1):22. pmid:32549976
  56. He H, Garcia EA. Learning from Imbalanced Data. IEEE Trans Knowl Data Eng. 2009;21(9):1263–84.
  57. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research. 2002;16:321–57.
  58. He H, Bai Y, Garcia EA, Li S. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In: 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), 2008. 1322–8. https://doi.org/10.1109/ijcnn.2008.4633969
  59. Torres FR, Carrasco-Ochoa JA, Martínez-Trinidad JF. SMOTE-D: A Deterministic Version of SMOTE. In: Pattern Recognition: 8th Mexican Conference, MCPR 2016, Guanajuato, Mexico, June 22–25, 2016, Proceedings, 2016. 177–88.
  60. Douzas G, Bacao F. Self-Organizing Map Oversampling (SOMO) for imbalanced data set learning. Expert Systems with Applications. 2017;82:40–52.
  61. Sağlam F, Cengiz MA. A novel SMOTE-based resampling technique through noise detection and the boosting procedure. Expert Systems with Applications. 2022;200:117023.
  62. Ratul MAR, Mozaffari MH, Lee WS, Parimbelli E. Skin lesions classification using deep learning based on dilated convolution. bioRxiv preprint. 2019.
  63. Chaturvedi SS, Tembhurne JV, Diwan T. A multi-class skin Cancer classification using deep convolutional neural networks. Multimed Tools Appl. 2020;79(39–40):28477–98.
  64. Khan MA, Sharif M, Akram T, Damaševičius R, Maskeliūnas R. Skin Lesion Segmentation and Multiclass Classification Using Deep Learning Features and Improved Moth Flame Optimization. Diagnostics (Basel). 2021;11(5):811. pmid:33947117
  65. Liu X, Yu Z, Tan L, Yan Y, Shi G. Enhancing skin lesion diagnosis with ensemble learning. arXiv preprint. 2024. https://arxiv.org/abs/2409.04381
  66. Purnama IKE, Hernanda HAK, Ratna AAP, Nurtanio I, Hidayati AN, Purnomo MH. Disease classification based on dermoscopic skin images using convolutional neural network in teledermatology system. In: 2019 International Conference on Computer Engineering, Network, and Intelligent Multimedia (CENIM), 2019. 1–5.
  67. Thurnhofer-Hemsi K, Domínguez E. A Convolutional Neural Network Framework for Accurate Skin Cancer Detection. Neural Process Lett. 2020;53(5):3073–93.
  68. Khan MA, Zhang Y-D, Sharif M, Akram T. Pixels to Classes: Intelligent Learning Framework for Multiclass Skin Lesion Localization and Classification. Computers & Electrical Engineering. 2021;90:106956.
  69. Popescu D, El-Khatib M, Ichim L. Skin Lesion Classification Using Collective Intelligence of Multiple Neural Networks. Sensors (Basel). 2022;22(12):4399. pmid:35746180
  70. Hoang L, Lee S-H, Lee E-J, Kwon K-R. Multiclass Skin Lesion Classification Using a Novel Lightweight Deep Learning Framework for Smart Healthcare. Applied Sciences. 2022;12(5):2677.
  71. Gairola AK, Kumar V, Sahoo AK, Diwakar M, Singh P, Garg D. Multi-feature Fusion Deep Network for Skin Disease Diagnosis. Multimed Tools Appl. 2024;84(1):419–44.