Fig 1.
Overview of our proposed DECTNet approach with dual-encoder-single-decoder structure.
DECTNet consists of four components: Convolution-based encoder, Swin Transformer-based encoder, Feature Fusion decoder, and Deep Supervision module. The detailed composition of each component is described in the following sections.
Fig 2.
The detailed structure of the convolution encoder stage.
This stage consists of a DenseConnection Block and a CBAM Block, which are applied to fully extract detailed information from the images.
Fig 3.
Overview of the Swin Transformer-based encoder of the proposed DECTNet.
(a) Components of the Swin Transformer-based encoder, including the feature scaling in the ST-Encoder. (b) Composition of the STP block. (c) Two successive Swin Transformer Blocks. W-MSA and SW-MSA are multi-head self-attention modules with regular and shifted windowing configurations, respectively.
Fig 4.
The detailed structure of the feature fusion decoder stage.
This stage has two components: the feature aggregation block and the feature selection block, which are applied to integrate and select features, respectively.
Fig 5.
The detailed structure of the deep supervision module.
The DS-Module converts the different-scale features of the F-decoder into confidence maps of the same scale.
Table 1.
Results of comparison with other methods on the skin lesion segmentation task.
Fig 6.
Visual comparison examples with other approaches on the skin lesion segmentation task.
The red contour denotes the ground truth; the different segmentation masks are produced by the different methods.
Fig 7.
Comparison of Dice scores among different methods on the validation set during training for the skin lesion segmentation task.
Fig 8.
Visual comparison examples with other approaches on the COVID-19 lesion segmentation task.
The red contour denotes the ground truth; the different segmentation masks are produced by the different methods.
Table 2.
Results of comparison with other methods on the COVID-19 lesion segmentation task.
Fig 9.
Comparison of Dice scores among different methods on the validation set during training for the COVID-19 lesion segmentation task.
Table 3.
Results of comparison with other methods on the polyp segmentation task.
Fig 10.
Visual comparison examples with other approaches on the polyp segmentation task.
The red contour denotes the ground truth; the different segmentation masks are produced by the different methods.
Fig 11.
Comparison of Dice scores among different methods on the validation set during training for the polyp segmentation task.
Table 4.
Results of comparison with other methods on the cardiac segmentation task.
Fig 12.
Visual comparison examples with other approaches on the cardiac segmentation task.
The red, green, and blue regions denote the right ventricle, the myocardium, and the left ventricle, respectively. The segmentation masks are produced by the different methods; the masks in “Original Image” denote the ground truth.
Fig 13.
Comparison of Dice scores among different methods on the validation set during training for the cardiac segmentation task.
Fig 14.
Visual comparison samples in the ablation study.
(a) refers to the original image and the corresponding ground truth; (b) and (c) refer to the heat map and segmentation mask produced by “C-encoder + Sim decoder”; (d) and (e) refer to the heat map and segmentation mask produced by DECTNet.
Table 5.
Quantitative results of the ablation study on the skin lesion segmentation task.
Table 6.
Quantitative results of the ablation study on the COVID-19 lesion segmentation task.