Table 1.
Ablation study results on ISIC2018.
Fig 1.
FocusGate-Net architecture overview.
The model uses a U-shaped design: the encoder performs progressive downsampling at scales 1/2, 1/4, 1/8, and 1/16 with convolutional blocks, followed by shifted MLP blocks for global context modeling; the decoder upsamples via transposed convolutions; attention gates on the skip connections filter relevant features before fusion. A dermoscopic input image is processed to produce the final segmentation mask.
Fig 2.
Shifted Token MLP (ST-MLP) module architecture.
The module consists of two parallel pathways: (top) a spatial shift operation followed by split attention for capturing spatial dependencies, and (bottom) a channel shift operation with global max pooling and convolution for modeling channel relationships. Both pathways incorporate residual connections to improve gradient flow during training.
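The caption does not specify the spatial shift in the top pathway any further. A minimal sketch of one common form is given below, assuming the channels are split into groups and each group is displaced by a different offset along one spatial axis. The function name `spatial_shift`, the offsets, and the cyclic wrap-around of `np.roll` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def spatial_shift(feat, axis, shifts=(-2, -1, 0, 1, 2)):
    """Shift channel groups of a (C, H, W) feature map along one spatial axis.

    Assumption: channels are split into len(shifts) groups and each group is
    rolled by its own offset. np.roll wraps around cyclically; a zero-padded
    shift would be an equally plausible variant.
    """
    groups = np.array_split(feat, len(shifts), axis=0)
    shifted = [np.roll(grp, s, axis=axis) for grp, s in zip(groups, shifts)]
    return np.concatenate(shifted, axis=0)

# Toy feature map: 5 channels, height 1, width 6.
feat = np.arange(5 * 1 * 6, dtype=float).reshape(5, 1, 6)
out = spatial_shift(feat, axis=2)  # shift along the width axis
```

After the shift, each spatial position holds channels that originated at neighbouring positions, so a following per-position MLP can mix information across space.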
Fig 3.
Attention Gate (AG) architecture for skip connections.
The module takes encoder features x and gating signal g from the decoder as inputs. These are processed through convolutions, combined via addition, and then passed through a ReLU activation, a convolution with batch normalization, and a sigmoid activation. The resulting attention map is multiplied element-wise with the original encoder features to produce the refined features that are passed to the decoder.
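The gating steps described above can be sketched in NumPy. This is a minimal illustration under stated assumptions: x and g share the same spatial size, every convolution is 1×1 (a per-pixel linear map over channels), and batch normalization is omitted for brevity. The names `conv1x1`, `attention_gate`, `w_x`, `w_g`, and `w_psi` are hypothetical and do not come from the paper.

```python
import numpy as np

def conv1x1(feat, weight):
    """Apply a 1x1 convolution: feat is (C_in, H, W), weight is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, feat)

def attention_gate(x, g, w_x, w_g, w_psi):
    """Gate encoder features x with decoder signal g.

    Assumptions: x and g have equal spatial size, all convolutions are 1x1,
    and batch normalization is left out of this sketch.
    """
    # Project both inputs to a shared channel width, add, then apply ReLU.
    q = np.maximum(conv1x1(x, w_x) + conv1x1(g, w_g), 0.0)
    # Reduce to a single channel and squash to (0, 1) with a sigmoid.
    logits = conv1x1(q, w_psi)
    alpha = 1.0 / (1.0 + np.exp(-logits))
    # Broadcast the (1, H, W) attention map over the channels of x.
    return alpha * x, alpha

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))    # encoder features
g = rng.standard_normal((16, 4, 4))   # gating signal from the decoder
w_x = rng.standard_normal((4, 8))
w_g = rng.standard_normal((4, 16))
w_psi = rng.standard_normal((1, 4))
refined, alpha = attention_gate(x, g, w_x, w_g, w_psi)
```

Because the attention map lies between 0 and 1, the gate can only attenuate encoder features, never amplify them.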
Fig 4.
Dice and IoU comparison across models on ISIC2018.
The proposed FocusGate-Net achieves the highest Dice and IoU, supporting the benefit of its hybrid architecture. The ablated variants (No AG, No CBAM) score lower, highlighting the importance of dual attention.
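For reference, the two metrics compared here can be computed from binary masks as sketched below. The standard definitions are assumed (Dice = 2|P∩T| / (|P| + |T|), IoU = |P∩T| / |P∪T|). The smoothing term `eps` and the function name `dice_and_iou` are illustrative choices, not taken from the paper.

```python
def dice_and_iou(pred, target, eps=1e-7):
    """Compute Dice and IoU for two binary masks given as nested lists.

    eps is an assumed smoothing term that avoids division by zero when
    both masks are empty.
    """
    p = [bool(v) for row in pred for v in row]
    t = [bool(v) for row in target for v in row]
    inter = sum(a and b for a, b in zip(p, t))
    p_sum, t_sum = sum(p), sum(t)
    union = p_sum + t_sum - inter
    dice = (2 * inter + eps) / (p_sum + t_sum + eps)
    iou = (inter + eps) / (union + eps)
    return dice, iou

# Toy 2x3 masks: 3 predicted pixels, 3 ground-truth pixels, 2 overlapping.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
dice, iou = dice_and_iou(pred, target)
```

For these masks the intersection is 2 and the union is 4, so Dice is about 0.667 and IoU is 0.5. The two are related by Dice = 2·IoU / (1 + IoU), which is why they rank models similarly.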
Table 2.
Generalization results of FocusGate-Net on PH2 and Kvasir-SEG.
Table 3.
Benchmark comparison of segmentation models on ISIC2018.
Fig 5.
Qualitative comparison of segmentation outputs on ISIC2018.
The first row shows the original dermoscopic images. The second and third rows display segmentation results produced by UNet++ and ResUNet, respectively. The fourth row shows predictions by the proposed FocusGate-Net. Compared to the baselines, FocusGate-Net provides more precise boundary localization and better structure preservation across diverse lesion types.
Table 4.
Comparison with state-of-the-art Transformer and MLP-based models on ISIC2018.
Table 5.
Cross-modal generalization performance of FocusGate-Net.
Fig 6.
Heatmap of evaluation metrics across models on the ISIC2018 dataset.
FocusGate-Net achieves the best overall performance across all metrics, particularly in Dice, Precision, and Accuracy, demonstrating its effectiveness in segmenting challenging lesion boundaries.