Table 1.
Overview of the characteristics of the metadata CSV file, including the count of missing values.
Fig 1.
Six distinct disease categories associated with lesions.
Fig 2.
Frequency of images for each type of skin lesion.
Fig 3.
Architectural workflow of the proposed approach.
Fig 4.
Black-hat transformation applied to a skin lesion image.
Fig 5.
Application of adaptive Gaussian thresholding to a skin lesion image.
Fig 6.
Embedding patches.
Fig 7.
The overall framework of the transformer encoder.
Fig 8.
Vision transformer architecture.
Table 2.
The ViT model's parameter configuration and layer architecture.
Fig 9.
Feature extraction from original, black-hat, and adaptive-thresholding images of skin lesions.
Fig 10.
Element-wise multiplication of features from black-hat, original, and adaptive-thresholding images.
Fig 11.
Stacking model architecture for final prediction.
Table 3.
Splitting data distribution.
Table 4.
The overall performance of the proposed approach.
Table 5.
The performance of the proposed approach across different lesion classes.
Fig 12.
Confusion matrix of the proposed model.
Fig 13.
Grad-CAM Visualizations for Skin Lesion Classification.
Grad-CAM visualizations show the regions of interest that the ViT model focuses on for each skin lesion type. Red areas indicate high influence on the model’s predictions and align with clinically relevant features.
Fig 14.
SHAP Summary Plot for Clinical Metadata Feature Importance.
SHAP values derived from the XGBoost model highlight the relative contribution of each clinical feature to the classification of six lesion types: ACK, BCC, MEL, NEV, SCC, and SEK. Features such as skin cancer history and Fitzpatrick skin type show the highest impact across classes.
Table 6.
Comparison of the proposed approach with state-of-the-art models.