Fig 1.
Proposed method architecture: The method combines three key components: 1) deformable convolutions integrated into the UNet framework to enhance fine-grained details and object-boundary distinction; 2) an attention ASPP module for context modeling, which uses attention mechanisms to capture contextual information at multiple scales; and 3) the Large Kernel Attention (LKA) module in the decoding path of the UNet to refine features and improve discrimination between object classes. (The original image was taken from the Cityscapes dataset, which is freely available at [69].)
Fig 2.
Deformable convolution [16].
Fig 3.
SE block (a) and ECA block (b).
Fig 4.
Large kernel attention module.
Fig 5.
Efficient channel attention ASPP.
Table 1.
Details of large kernel attention modules.
Table 2.
Structure of improved ResNet50.
Table 3.
Different modules used in the proposed UNet-based architectures.
Table 4.
Effect of additional modules on segmentation performance: Ablation study results on the Stanford dataset.
Table 5.
Effect of additional modules on segmentation performance: Ablation study results on the Cityscapes dataset.
Fig 6.
Ablation study results: Comparative analysis of additional modules on segmentation performance on the Cityscapes dataset.
(The original image was taken from the Cityscapes dataset, which is freely available at [69].)
Fig 7.
Ablation study results: Comparative analysis of additional modules on segmentation performance on the WildPASS and DensePASS datasets.
(The original images were taken from the WildPASS and DensePASS datasets, which are freely available at [54, 70].)
Table 6.
Performance comparison of semantic segmentation methods on the Cityscapes and DensePASS datasets.
Table 7.
Performance comparison of semantic segmentation methods on Cityscapes.
Table 8.
Class-wise IOU comparison of segmentation models on the Stanford dataset.
Fig 8.
Visual comparison of the original image [69] and the segmented images produced by state-of-the-art methods (FCN [27], UNet-R(34) [29], FPN [60], Fast-SCNN [58], UNet++ [30], DeepLabV3#layer 1 [16], S-FPN [60], Segformer-B1 [66], DeepLabV3+ [13], Trans4PASS-S [35], DANet [18]) and by the proposed method on the Cityscapes dataset.
(The original image was taken from the Cityscapes dataset, which is freely available at [69].)