Fig 1.
Overview of our proposed DECTNet approach with dual-encoder-single-decoder structure.
DECTNet consists of four components: Convolution-based encoder, Swin Transformer-based encoder, Feature Fusion decoder, and Deep Supervision module. The detailed composition of each component is described in the following sections.
Fig 2.
The detailed structure of the convolution encoder stage.
This stage consists of a DenseConnection Block and a CBAM Block, which are applied to fully extract detailed information from the images.
Fig 3.
Overview of the Swin Transformer-based encoder of the proposed DECTNet.
(a) Components of the Swin Transformer-based encoder, including the feature scaling in the ST-Encoder. (b) Composition of the STP block. (c) Two successive Swin Transformer Blocks. W-MSA and SW-MSA are multi-head self-attention modules with regular and shifted windowing configurations, respectively.
Fig 4.
The detailed structure of the feature fusion decoder stage.
This stage has two components: the feature aggregation block and the feature selection block, which are applied to integrate and select features, respectively.
Fig 5.
The detailed structure of the deep supervision module.
The DS-Module converts the different-scale features of the F-decoder into confidence maps of the same scale.
Table 1.
Results of comparison with other methods on the skin lesion segmentation task.
Fig 6.
Visual comparison examples with other approaches on the skin lesion segmentation task.
The red contour denotes the ground truth; the different segmentation masks are produced by the different methods.
Fig 7.
Comparison of Dice scores among different methods on the validation set during training for the skin lesion segmentation task.
Fig 8.
Visual comparison examples with other approaches on the COVID-19 lesion segmentation task.
The red contour denotes the ground truth; the different segmentation masks are produced by the different methods.
Table 2.
Results of comparison with other methods on the COVID-19 lesion segmentation task.
Fig 9.
Comparison of Dice scores among different methods on the validation set during training for the COVID-19 lesion segmentation task.
Table 3.
Results of comparison with other methods on the polyp segmentation task.
Fig 10.
Visual comparison examples with other approaches on the polyp segmentation task.
The red contour denotes the ground truth; the different segmentation masks are produced by the different methods.
Fig 11.
Comparison of Dice scores among different methods on the validation set during training for the polyp segmentation task.
Table 4.
Results of comparison with other methods on the cardiac segmentation task.
Fig 12.
Visual comparison examples with other approaches on the cardiac segmentation task.
The red, green, and blue regions denote the right ventricle, the myocardium, and the left ventricle, respectively. The segmentation masks are produced by the different methods; the masks in “Original Image” denote the ground truth.
Fig 13.
Comparison of Dice scores among different methods on the validation set during training for the cardiac segmentation task.
Fig 14.
Visual comparison samples in the ablation study.
(a) refers to the original image and the corresponding ground truth; (b) and (c) refer to the heat map and segmentation mask produced by “C-encoder + Sim decoder”; (d) and (e) refer to the heat map and segmentation mask produced by DECTNet.
Table 5.
Quantitative results of the ablation study on the skin lesion segmentation task.
Table 6.
Quantitative results of the ablation study on the COVID-19 lesion segmentation task.