Activation extending based on long-range dependencies for weakly supervised semantic segmentation | PLOS One

Fig 1.

(a) Causal intervention in the sample caused by long-range dependencies. Regions of the same class are shown in the same color, where A ↔ D and A1 ↔ D1 represent false causal relations between regions of different classes. (b) Modeling of long-range dependencies between regional samples. Long-range dependencies between the same classes in different samples (A ↔ A1, D ↔ D1) are established, and the false causal relations between different classes are cut off. Each class captures long-range dependencies through pixel dependencies in a smaller area.

Fig 2.

We propose a two-step network structure.
The CAMs in the figure are visualizations over all classes. The output generated by RSA is restored to the size of C by the Merge module after being corrected by the cross-image comparison module; C, generated by the two-step network, provides equivariant constraint supervision for the merged version.
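
As a rough illustration of what a merge step with a consistency constraint can look like, the sketch below splits a map into four tiles, stitches the tiles back to full size, and measures the mean absolute gap between the merged map and a full-size reference. The four-tile split, the L1 distance, and the names `split_tiles`, `merge_tiles`, and `l1_gap` are assumptions borrowed from tile-and-merge schemes of the PuzzleCam kind; the caption only states that a Merge module restores the RSA output to the size of C and that C supervises the merged result.

```python
def split_tiles(image):
    """Split a 2-D map (list of rows) into four tiles: TL, TR, BL, BR."""
    height, width = len(image), len(image[0])
    half_h, half_w = height // 2, width // 2
    return [
        [row[c0:c1] for row in image[r0:r1]]
        for r0, r1 in ((0, half_h), (half_h, height))
        for c0, c1 in ((0, half_w), (half_w, width))
    ]


def merge_tiles(tiles):
    """Stitch four tiles (TL, TR, BL, BR) back into one full-size map."""
    top_left, top_right, bottom_left, bottom_right = tiles
    top = [a + b for a, b in zip(top_left, top_right)]
    bottom = [a + b for a, b in zip(bottom_left, bottom_right)]
    return top + bottom


def l1_gap(cam_a, cam_b):
    """Mean absolute difference between two maps of identical size."""
    total = sum(
        abs(a - b)
        for row_a, row_b in zip(cam_a, cam_b)
        for a, b in zip(row_a, row_b)
    )
    return total / (len(cam_a) * len(cam_a[0]))


# Toy 4 x 4 map standing in for a full-size activation map (hypothetical data).
full = [[float(4 * r + c) for c in range(4)] for r in range(4)]
tiles = split_tiles(full)
merged = merge_tiles(tiles)

# Perturb one tile to mimic a tile-level prediction that disagrees with the full map.
tiles[0][0][0] += 2.0
perturbed_gap = l1_gap(full, merge_tiles(tiles))
```

Merging unmodified tiles reproduces the full map exactly, so the gap is zero; any disagreement between the tile-level and full-image activations shows up as a positive gap that a training loss could penalize.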

Fig 3.

The two modules applied in step 1 of Fig 2 are described in detail.
Reshape denotes a tensor size transformation, and ⊗ stands for the matrix dot product operation. RSA module: the similarity matrix, of size MN × MN, is normalized, and the refined CAM is obtained by weighting the original CAM with it. CFC module: the foreground vector is formed by matrix multiplication.
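
The RSA description above can be illustrated with a small sketch: flatten an M × N map into MN positions, build an MN × MN similarity matrix, normalize each row, and use it to reweight the original CAM. The dot-product similarity, the row-wise softmax, and the names `softmax` and `refine_cam` are assumptions for illustration; the caption does not specify how the similarity is computed or normalized.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of similarity scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def refine_cam(features, cam):
    """Reweight a flattened CAM with a row-normalized similarity matrix.

    features: P feature vectors (P = M * N positions), each of length D.
    cam:      P activation vectors, each of length K (one score per class).
    Dot-product similarity and softmax normalization are assumptions.
    """
    positions = len(features)
    # P x P similarity matrix: dot product between every pair of positions.
    similarity = [
        [sum(a * b for a, b in zip(features[i], features[j]))
         for j in range(positions)]
        for i in range(positions)
    ]
    attention = [softmax(row) for row in similarity]
    num_classes = len(cam[0])
    # Each refined activation is an attention-weighted sum of the original ones.
    return [
        [sum(attention[i][j] * cam[j][k] for j in range(positions))
         for k in range(num_classes)]
        for i in range(positions)
    ]


# Toy 2 x 2 map (M = N = 2, so P = 4) with 2-D features and a single class.
features = [[4.0, 0.0], [4.0, 0.0], [0.0, 4.0], [0.0, 4.0]]
cam = [[1.0], [0.0], [0.0], [0.0]]
refined = refine_cam(features, cam)
```

In this toy input, position 1 shares its feature vector with the activated position 0, so roughly half of the activation spreads to it, while positions 2 and 3, whose features are dissimilar, stay near zero.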

Table 1.

Comparative experiments with different values of λ and β. The quality of the CAMs is assessed on the PASCAL VOC 2012 training set in mIoU (%).

Table 2.

Comparison experiment with PuzzleCam: two different backbones are evaluated on the PASCAL VOC 2012 training set in mIoU (%).

Fig 4.

Pseudo masks on the PASCAL VOC 2012 train set.
From top to bottom: original images, ground truth, the prediction results of PuzzleCam, and the prediction results of our method.

Fig 5.

Pseudo masks on the PASCAL VOC 2012 val set.
From top to bottom: original images, ground truth, the prediction results of PuzzleCam, and the prediction results of our method.

Table 3.

Performance on the PASCAL VOC 2012 validation set, compared to weakly supervised approaches based only on image-level labels.

Table 4.

Performance on the PASCAL VOC 2012 test set, compared to weakly supervised approaches based only on image-level labels.

Table 5.

Comparison of our proposed method and existing state-of-the-art methods on the PASCAL VOC 2012 val and test sets.
I, image-level labels; S, saliency labels.

Table 6.

Comparison of our proposed method and existing methods on the MS COCO 2014 val set.

Table 7.

Ablation experiment on mIoU.
The quality of the pseudo masks is evaluated on the PASCAL VOC 2012 training set. ResNet50 was used as the backbone.

Fig 6.

Ablation experiment on CAMs.
ResNet50 was used as the backbone network for analysis. The CAMs in the figure are generated on the PASCAL VOC 2012 training set.

Fig 7.

Ablation on the number of iterations on the PASCAL VOC 2012 training set, showing the mIoU, mFDR, and mFNR at different iterations with a ResNeSt-101 backbone: (a) baseline + CFC; (b) baseline + RSA; (c) baseline + CFC + RSA; (d) the loss of the final version (c) at different numbers of iterations.
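
For readers unfamiliar with the three curves, the sketch below computes them from predicted and ground-truth label maps. It assumes that mFDR and mFNR are the class-averaged false discovery rate, FP / (TP + FP), and false negative rate, FN / (TP + FN), alongside the usual IoU, TP / (TP + FP + FN); the caption does not define them, and the function name `segmentation_metrics` is hypothetical.

```python
def segmentation_metrics(pred, truth, num_classes):
    """Return (mIoU, mFDR, mFNR) for flat lists of predicted and true labels."""
    ious, fdrs, fnrs = [], [], []
    for c in range(num_classes):
        pairs = list(zip(pred, truth))
        tp = sum(1 for p, t in pairs if p == c and t == c)
        fp = sum(1 for p, t in pairs if p == c and t != c)
        fn = sum(1 for p, t in pairs if p != c and t == c)
        if tp + fp + fn == 0:
            continue  # class absent from both maps: leave it out of the means
        ious.append(tp / (tp + fp + fn))
        # Rates with an empty denominator are counted as 0.0 (a sketch choice).
        fdrs.append(fp / (tp + fp) if tp + fp else 0.0)
        fnrs.append(fn / (tp + fn) if tp + fn else 0.0)
    if not ious:
        raise ValueError("no class is present in either label map")
    count = len(ious)
    return sum(ious) / count, sum(fdrs) / count, sum(fnrs) / count


# Six pixels, two classes (toy data).
pred = [0, 0, 1, 1, 1, 0]
truth = [0, 0, 1, 1, 0, 1]
miou, mfdr, mfnr = segmentation_metrics(pred, truth, num_classes=2)
```

Under these definitions, a high mFDR points to over-activation (pixels wrongly assigned to a class) and a high mFNR to under-activation (pixels of a class that were missed), which is why the two are useful to track next to mIoU.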

Fig 8.

Column 1 is the original image, and columns 2, 3, 4, and 5 show the CAMs generated at different training epochs.
