FocusGate-Net: A dual-attention guided MLP-convolution hybrid network for accurate and efficient medical image segmentation

doi:10.1371/journal.pone.0331896

Table 1.

Ablation study results on ISIC2018.

Fig 1.

FocusGate-Net architecture overview.

The model uses a U-shaped design: the encoder performs progressive downsampling at scales 1/2, 1/4, 1/8, and 1/16 with convolutional blocks, followed by shifted MLP blocks for global context modeling; the decoder upsamples via transposed convolutions; attention gates on the skip connections filter relevant features before fusion. A dermoscopic input image is processed to produce the final segmentation mask.
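
As a quick check of the downsampling schedule described above, the following minimal sketch computes the spatial size of the feature map at each encoder scale. The 256 x 256 input size and the function name `encoder_feature_sizes` are assumptions made for illustration; the caption does not state the input resolution.

```python
def encoder_feature_sizes(height, width, scales=(2, 4, 8, 16)):
    """Return the (H, W) of the feature map at each encoder scale.

    `scales` holds the downsampling factors named in the caption
    (1/2, 1/4, 1/8, 1/16). The input size is a free parameter.
    """
    sizes = []
    for s in scales:
        if height % s or width % s:
            raise ValueError(f"input {height}x{width} is not divisible by {s}")
        sizes.append((height // s, width // s))
    return sizes


# Hypothetical 256 x 256 dermoscopic input (an assumption, not from the paper).
sizes = encoder_feature_sizes(256, 256)
print(sizes)  # [(128, 128), (64, 64), (32, 32), (16, 16)]
```

In a U-shaped network the decoder would retrace these sizes in reverse, so each skip connection joins encoder and decoder features of matching resolution.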

Fig 2.

Shifted Token MLP (ST-MLP) module architecture.

The module consists of two parallel pathways: (top) a spatial shift operation followed by split attention for capturing spatial dependencies, and (bottom) a channel shift operation with global max pooling and convolution for modeling channel relationships. Both pathways incorporate residual connections to improve gradient flow during training.
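
The caption does not spell out how the spatial shift is implemented. One common way to realize such a shift in shifted-MLP designs is to split the channels into groups and displace each group by a different offset along one spatial axis, so that a later per-pixel MLP mixes information from neighbouring positions. The sketch below illustrates that idea in plain Python; the circular wrap-around, the offsets `(-1, 0, 1)`, and the function names are all assumptions, not the authors' implementation.

```python
def shift_columns(channel, offset):
    """Circularly shift every row of an H x W channel right by `offset` columns."""
    width = len(channel[0])
    k = offset % width
    return [row[-k:] + row[:-k] if k else row[:] for row in channel]


def spatial_shift(feature_map, offsets=(-1, 0, 1)):
    """Shift contiguous channel groups by different offsets along the width axis.

    feature_map: list of C channels, each an H x W nested list.
    Channel i goes to group (i * len(offsets)) // C, so groups are contiguous.
    """
    num_channels = len(feature_map)
    shifted = []
    for i, channel in enumerate(feature_map):
        group = (i * len(offsets)) // num_channels
        shifted.append(shift_columns(channel, offsets[group]))
    return shifted


# Three channels, each a single row [1, 2, 3].
fmap = [[[1, 2, 3]], [[1, 2, 3]], [[1, 2, 3]]]
print(spatial_shift(fmap))
# [[[2, 3, 1]], [[1, 2, 3]], [[3, 1, 2]]]
```

After the shift, the values stacked at one pixel location come from three different columns of the input, which is what lets a position-wise layer see spatial context.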

Fig 3.

Attention Gate (AG) architecture for skip connections.

The module takes encoder features x and gating signal g from the decoder as inputs. These are processed through convolutions, combined via addition, and then passed through ReLU activation, a convolution with batch normalization, and a sigmoid activation. The resulting attention map is multiplied element-wise with the original encoder features to produce the refined features that are passed to the decoder.
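
The data flow in this caption can be traced for a single pixel. The sketch below assumes the convolutions are 1 x 1 (so each reduces to a matrix-vector product per pixel), that g has already been resampled to the resolution of x, and it omits batch normalization for brevity. The weight names `w_x`, `w_g`, and `psi` are hypothetical, and the values are arbitrary.

```python
import math


def matvec(weights, vec):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def attention_gate_pixel(x, g, w_x, w_g, psi):
    """Gate one pixel's encoder feature vector x with the decoder signal g.

    Returns (refined_features, alpha), where alpha in (0, 1) is the
    attention coefficient for this pixel.
    """
    theta = matvec(w_x, x)                                    # 1x1 conv on x
    phi = matvec(w_g, g)                                      # 1x1 conv on g
    hidden = [max(0.0, t + p) for t, p in zip(theta, phi)]    # add, then ReLU
    score = sum(p * h for p, h in zip(psi, hidden))           # 1x1 conv to one channel
    alpha = 1.0 / (1.0 + math.exp(-score))                    # sigmoid
    return [alpha * xi for xi in x], alpha                    # element-wise product


x = [1.0, 2.0]                      # encoder features at one pixel
g = [0.5]                           # gating signal at the same pixel
w_x = [[1.0, 0.0], [0.0, 1.0]]
w_g = [[1.0], [-3.0]]
psi = [1.0, 1.0]

refined, alpha = attention_gate_pixel(x, g, w_x, w_g, psi)
print(round(alpha, 4), [round(v, 4) for v in refined])
```

Because alpha is a single scalar per pixel, the gate rescales all encoder channels at that location together: values near 0 suppress the pixel, values near 1 pass it through largely unchanged.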

Fig 4.

Dice and IoU comparison across models on ISIC2018.

FocusGate-Net achieves the highest Dice and IoU among the compared models, supporting the benefit of its hybrid architecture. The ablated variants (No AG, No CBAM) show reduced performance, indicating that both attention mechanisms contribute.
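
For reference, the two metrics in this figure are computed from the overlap between a predicted and a ground-truth binary mask: Dice = 2|P ∩ T| / (|P| + |T|) and IoU = |P ∩ T| / |P ∪ T|. The sketch below uses the standard definitions on flattened 0/1 masks. Returning 1.0 when both masks are empty is a convention chosen here, not something stated in the paper.

```python
def dice_and_iou(pred, target):
    """Compute (Dice, IoU) for two flat binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    pred_sum = sum(pred)
    target_sum = sum(target)
    union = pred_sum + target_sum - intersection
    if union == 0:
        # Both masks are empty: treat as a perfect match (assumed convention).
        return 1.0, 1.0
    dice = 2.0 * intersection / (pred_sum + target_sum)
    iou = intersection / union
    return dice, iou


pred = [1, 1, 1, 0, 0, 0]
target = [0, 1, 1, 1, 0, 0]
dice, iou = dice_and_iou(pred, target)
print(round(dice, 4), iou)  # 0.6667 0.5
```

The two scores are monotonically related by Dice = 2 IoU / (1 + IoU), so they always rank a pair of predictions on the same image in the same order, although averages over a dataset can differ.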

Table 2.

Generalization results of FocusGate-Net on PH2 and Kvasir-SEG.

Table 3.

Benchmark comparison of segmentation models on ISIC2018.

Fig 5.

Qualitative comparison of segmentation outputs on ISIC2018.

The first row shows the original dermoscopic images. The second and third rows display segmentation results produced by UNet++ and ResUNet, respectively. The fourth row shows predictions by the proposed FocusGate-Net. Compared to the baselines, FocusGate-Net provides more precise boundary localization and better structure preservation across diverse lesion types.

Table 4.

Comparison with state-of-the-art Transformer and MLP-based models on ISIC2018.

Table 5.

Cross-modal generalization performance of FocusGate-Net.

Fig 6.

Heatmap of evaluation metrics across models on the ISIC2018 dataset.

FocusGate-Net achieves the best overall performance across all metrics, particularly in Dice, Precision, and Accuracy, demonstrating its effectiveness in segmenting challenging lesion boundaries.
