Table 1.
Novelty clarification of the proposed framework (high-level).
Fig 1.
The overall framework consists of an encoder driven by Boundary-augmented Hybrid Attention (BAHA) and a decoder guided by multi-scale shape priors, achieving joint modeling of lesion boundary features, cross-scale structural information, and global semantic context.
The decoder further integrates a feature pyramid structure, a gradient extraction module, and a multi-layer perceptron to produce high-precision skin lesion segmentation results with enhanced boundary refinement and improved shape consistency.
Fig 2.
Structure of the Boundary-augmented Hybrid Attention (BAHA) module. The feature branch and the gradient branch are first encoded using 1D CNNs and then processed by local self-attention and convolutional gating units to extract semantically relevant local contextual information.
The gradient features generate a boundary mask through Laplacian and EdgeConv operations, which guides channel and spatial reweighting, and the result is concatenated with the global context produced by the state-space fusion branch to form boundary-enhanced features for subsequent segmentation.
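The boundary-mask step described in the Fig 2 caption can be illustrated with a minimal NumPy sketch. It covers only the Laplacian response and a spatial reweighting; the EdgeConv operation, the channel reweighting, and the state-space fusion branch are omitted. The function names (`laplacian`, `boundary_mask`, `reweight`), the 4-neighbour stencil, and the `1 + mask` weighting are assumptions made for illustration, not the paper's implementation.

```python
import numpy as np

def laplacian(x):
    """4-neighbour discrete Laplacian of a 2D array, with edge-replicated padding."""
    p = np.pad(x, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * x

def boundary_mask(x, eps=1e-8):
    """Scale |Laplacian| into [0, 1] so it can serve as a soft boundary mask."""
    mag = np.abs(laplacian(x))
    return mag / (mag.max() + eps)

def reweight(features, mask):
    """Spatial reweighting (assumed form): amplify responses near boundaries.

    features has shape (C, H, W); mask has shape (H, W).
    """
    return features * (1.0 + mask[None, :, :])
```

On a binary square, the mask is zero in flat regions and peaks at the corners of the square, so features there are roughly doubled while interior features are left unchanged.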
Fig 3.
Structure of the Multi-scale Lesion Shape Prior (MLSP) module. The input features first pass through two stages of IDCNN to model long-range local context and are then linearly transformed to form the main decoding features. The lower branch extracts shape-related responses through an edge-enhancement operator and convolutional networks to generate a global gating vector θ for channel importance modulation.
In the bottom pathway, the decoder features are processed by three scale-projection operators to obtain multi-scale shape representations, which are concatenated and fused through gating to produce shape-enhanced features, thereby injecting explicit contour and scale priors into the decoding stage.
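The multi-scale fusion in the Fig 3 caption can be sketched roughly as below. The caption does not define the three scale-projection operators, so this sketch substitutes average pooling at assumed factors 1, 2, and 4 followed by nearest-neighbour upsampling. The gate `theta` is computed here from the concatenated features themselves, whereas the caption derives θ from a separate edge-enhancement branch. All names are hypothetical.

```python
import numpy as np

def avg_pool(x, k):
    """Non-overlapping k x k average pooling on a (C, H, W) array.

    H and W must be divisible by k.
    """
    c, h, w = x.shape
    return x.reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))

def upsample(x, k):
    """Nearest-neighbour upsampling of a (C, H, W) array by factor k."""
    return x.repeat(k, axis=1).repeat(k, axis=2)

def multiscale_gated(x, scales=(1, 2, 4)):
    """Concatenate multi-scale views of x and apply a per-channel sigmoid gate."""
    feats = [upsample(avg_pool(x, k), k) for k in scales]
    stacked = np.concatenate(feats, axis=0)            # (C * len(scales), H, W)
    pooled = stacked.mean(axis=(1, 2))                 # global average per channel
    theta = 1.0 / (1.0 + np.exp(-pooled))              # gate values in (0, 1]
    return stacked * theta[:, None, None], theta
```

The coarser scales keep only low-frequency shape information, which is one plausible way to inject scale priors. The actual projection operators may differ.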
Fig 4.
Example image from the ISIC dataset.
Fig 5.
Example image from the HAM10000 dataset.
Fig 6.
Example image from the PH2 dataset.
Table 2.
Dataset statistics and train/test splits used in this study.
Table 3.
Hardware environment and training hyperparameters used in this study.
Table 4.
Comparison with state-of-the-art segmentation models on the ISIC 2018 dataset.
Table 5.
Comparison with state-of-the-art segmentation models on the HAM10000 dataset.
Table 6.
Comparison with state-of-the-art segmentation models on the PH2 dataset.
Fig 7.
Loss value versus training epoch.
Table 7.
Systematic ablation study on ISIC 2018, HAM10000, and PH2 datasets.
BP denotes the boundary prior (gradient-driven boundary enhancement), HA denotes the hybrid attention in the encoder, and SP denotes the shape prior introduced by MLSP.
Fig 8.
Qualitative results of ablation experiments.
Fig 9.
Pairwise significance test results for the base model and the different module combinations, each compared with the final model, on the three datasets.
By visualizing the p-value matrix, the statistical contribution of each module to the performance differences can be intuitively assessed, and the significance of the performance improvement can be verified.
Table 8.
Hyperparameter sensitivity analysis of the three loss weights.
Fig 10.
Qualitative results on the ISIC 2018 dataset.
Fig 11.
Qualitative results on the HAM10000 dataset.
Fig 12.
Qualitative results on the PH2 dataset.
Fig 13.
Grad-CAM visualizations on the three datasets.
Fig 14.
IoU, DSC, HD95, and ASD of the proposed model on the ISIC 2018, HAM10000, and PH2 datasets under increasing Gaussian noise intensity, illustrating the robustness of its lesion segmentation performance to random perturbations.
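A noise-robustness sweep of the kind summarized in the Fig 14 caption could be set up as in the sketch below. It computes only the overlap metrics IoU and DSC; HD95 and ASD need surface-distance computations and are left out. The `segment` callable stands in for any model, and the clipping of noisy images to [0, 1] and the noise levels are assumptions, since the source does not state the protocol.

```python
import numpy as np

def iou(pred, gt):
    """Intersection over union of two binary masks (1.0 if both are empty)."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)

def dsc(pred, gt):
    """Dice similarity coefficient of two binary masks (1.0 if both are empty)."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    total = pred.sum() + gt.sum()
    if total == 0:
        return 1.0
    return float(2.0 * np.logical_and(pred, gt).sum() / total)

def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise and clip to [0, 1] (assumed intensity range)."""
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)

def robustness_curve(image, gt, segment, sigmas, seed=0):
    """Evaluate a hypothetical `segment(image) -> mask` callable at each noise level."""
    rng = np.random.default_rng(seed)
    results = []
    for sigma in sigmas:
        pred = segment(add_gaussian_noise(image, sigma, rng))
        results.append((sigma, iou(pred, gt), dsc(pred, gt)))
    return results
```

Plotting the returned IoU and DSC values against `sigma` gives curves comparable in form to those described in the caption.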
Table 9.
Cross-domain generalization results on skin lesion segmentation.
Fig 15.
Precision-recall (PR) curves.
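For reference, a PR curve for a segmentation model is usually obtained by thresholding the predicted probability map at several levels and computing pixel-wise precision and recall at each. The sketch below shows that computation. The function name, the threshold set, and the convention of returning precision 1.0 when nothing is predicted positive are assumptions, not details taken from the paper.

```python
import numpy as np

def pr_curve(probs, gt, thresholds):
    """Return a list of (precision, recall) pairs, one per threshold.

    probs: predicted foreground probabilities; gt: binary ground-truth mask.
    """
    probs = np.asarray(probs, float).ravel()
    gt = np.asarray(gt, bool).ravel()
    points = []
    for t in thresholds:
        pred = probs >= t
        tp = np.logical_and(pred, gt).sum()
        # Conventions for empty denominators are a choice made for this sketch.
        precision = tp / pred.sum() if pred.sum() > 0 else 1.0
        recall = tp / gt.sum() if gt.sum() > 0 else 1.0
        points.append((float(precision), float(recall)))
    return points
```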