Table 1.
Summary of related works on skin lesion classification, including classical methods, hybrid approaches, and deep learning models.
Fig 1.
Distribution of skin lesion classes in the HAM10000 dataset.
The dataset exhibits a significant class imbalance, with most images belonging to the melanocytic nevi class, while minority classes such as dermatofibroma and vascular lesions are underrepresented.
Fig 2.
Example images for each class from the HAM10000 dataset.
The images illustrate the visual diversity of skin lesion categories and highlight inter-class variability across different lesion types.
Table 2.
Number of training samples for each class after applying different sampling strategies.
Table 3.
Performance scores of different models for the original dataset (mean ± standard deviation over 5 runs).
Fig 3.
Training loss curves of multiple deep learning models.
The X-axis denotes training epochs, while the Y-axis represents loss values. All models were trained with early stopping to mitigate overfitting. Most architectures exhibit a stable reduction in loss, whereas VGG16 shows slower convergence, highlighting differences in learning dynamics across models.
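The early stopping mentioned above can be sketched as a patience counter over the validation loss; the `patience` value and the toy loss sequence below are illustrative assumptions, not settings reported for these models.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return the epoch at which training would stop.

    Training halts once the validation loss has failed to improve
    on its best value for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:          # improvement: reset the counter
            best = loss
            wait = 0
        else:                    # no improvement this epoch
            wait += 1
            if wait >= patience:
                return epoch     # stop here
    return len(val_losses) - 1   # ran out of epochs

# Toy validation-loss curve: improves, then plateaus.
losses = [0.9, 0.7, 0.6, 0.55, 0.56, 0.57, 0.58, 0.59, 0.60]
print(early_stopping_epoch(losses, patience=3))  # → 6
```

With the plateau starting at epoch 4, three non-improving epochs exhaust the patience budget and training stops at epoch 6 rather than running to the end of the schedule.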
Fig 4.
Training accuracy curves of different deep learning models.
The X-axis represents training epochs, while the Y-axis indicates accuracy. Most architectures demonstrate a rapid increase in accuracy during the early epochs, with DenseNet201, Xception, and EfficientNetB3 approaching near-perfect performance. In contrast, VGG16 converges more slowly and stabilizes at a lower accuracy level, reflecting differences in training dynamics among models.
Table 4.
Performance scores for different layers with sparsity for the original dataset (mean ± standard deviation over 5 runs).
Table 5.
Performance scores of different sampling strategies (mean ± standard deviation over 5 runs).
Table 6.
Performance scores after applying data augmentation and AvgTopK strategies (mean ± standard deviation over 5 runs).
Table 7.
Comparison of prior studies on the HAM10000 dataset for skin lesion classification. All results shown are based on the original 10,015-image dataset; in our case, oversampling and augmentation are applied only to the training set after the train–test split.
Fig 5.
Confusion matrix of the proposed model evaluated on the HAM10000 dataset.
All data augmentation and SMOTE procedures were applied exclusively to the training set after the train–test split, ensuring that the test set remained free of synthetic samples.
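The leakage-free protocol described above (split first, then resample only the training portion) can be sketched as follows. Plain random duplication stands in here for SMOTE, which would synthesize new feature vectors instead of duplicating existing ones, and the tiny labeled list is illustrative.

```python
import random

def train_test_split(samples, test_frac=0.2, seed=0):
    """Shuffle, then split BEFORE any resampling is applied."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]  # train, test

def oversample(train, seed=0):
    """Duplicate minority-class samples until every class matches
    the majority-class count. SMOTE differs in *how* new samples
    are made, but its placement in the pipeline is the same:
    training set only, never the test set."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in train:
        by_class.setdefault(y, []).append((x, y))
    target = max(len(v) for v in by_class.values())
    balanced = []
    for cls, items in by_class.items():
        balanced.extend(items)
        balanced.extend(rng.choices(items, k=target - len(items)))
    return balanced

# Toy data: sample id + class label; 'nv' dominates, 'df' is rare.
data = [(i, "nv") for i in range(8)] + [(100 + i, "df") for i in range(2)]
train, test = train_test_split(data, test_frac=0.2, seed=42)
train = oversample(train, seed=42)    # duplicates enter the training set...
assert all(s in data for s in test)   # ...while the test set stays original
```

Because the split happens first, every synthetic or duplicated sample is derived exclusively from training images, so test-set performance is not inflated by leakage.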
Fig 6.
Grad-CAM visualizations for representative skin lesion classes.
For each example, the original dermoscopic image is shown alongside its corresponding Grad-CAM heatmap. The highlighted regions indicate areas that contribute most strongly to the model's classification decision, with warmer colors representing higher relevance.
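The heatmaps above follow the standard Grad-CAM recipe: each feature map of the last convolutional layer is weighted by the global average of the class score's gradients over that map, the weighted maps are summed, and a ReLU keeps only regions that increase the class score. The core arithmetic can be sketched in plain Python; the 2×2 maps below are illustrative, and a real implementation would pull activations and gradients from the trained network.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heatmap from one conv layer.

    activations[k][i][j]: feature map k of the last conv layer
    gradients[k][i][j]:   d(class score)/d(activation), same shape
    """
    n_maps = len(activations)
    h, w = len(activations[0]), len(activations[0][0])
    # Channel weights: global-average-pooled gradients.
    weights = [
        sum(gradients[k][i][j] for i in range(h) for j in range(w)) / (h * w)
        for k in range(n_maps)
    ]
    # Weighted sum of maps, then ReLU.
    return [
        [max(0.0, sum(weights[k] * activations[k][i][j] for k in range(n_maps)))
         for j in range(w)]
        for i in range(h)
    ]

# Two 2x2 feature maps: the first supports the class (positive
# gradients), the second opposes it (negative gradients).
acts  = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [1.0, 0.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
print(grad_cam(acts, grads))  # → [[1.0, 0.0], [0.0, 2.0]]
```

The ReLU step is what gives the "warmer colors = higher relevance" reading: locations whose activations push the score toward the predicted class survive, while opposing evidence is zeroed out.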
Fig 7.
Representative test images with ground truth and predicted labels.
Each example displays a dermoscopic image from the test set along with a pair of labels, where the left label denotes the ground truth and the right label indicates the model prediction (e.g., mel|akiec).