Fig 1.
The proposed model has three main components: the backbone, the triple attention-guided multi-resolution fusion (TAMF) module, and the feature refinement (FR) module. TAMF coarsely localizes salient objects and improves the fusion and complementarity of cross-scale features, while FR recovers object details. (The input images and ground truth (GT) annotations are sourced from the DUTS dataset (official download link: http://saliencydetection.net/duts/; source code repository: https://github.com/scott89/WSS, [30]). The dataset is licensed under the 3-Clause BSD Open Source License (compatible with the CC BY 4.0 license). The feature maps are generated by code developed independently for this paper, and the model architecture diagram was created in Microsoft PowerPoint. All of the above elements are original content by the authors and are licensed under the CC BY 4.0 license.)
Fig 2.
Structure of the two key sub-modules, TA and MF.
GAP, SA, and CA denote global average pooling, the spatial attention mechanism, and the channel attention mechanism, respectively; each includes convolution, BatchNorm, and ReLU layers.
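To make the roles of GAP, SA, and CA concrete, the following is a minimal NumPy sketch of the two attention gates as the caption describes them at a high level. It is an assumption-laden simplification: the actual TA module additionally contains learnable convolution, BatchNorm, and ReLU layers, which are omitted here, and all function names are illustrative rather than taken from the paper's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def global_avg_pool(x):
    # GAP: (C, H, W) feature map -> (C,) channel descriptor.
    return x.mean(axis=(1, 2))

def channel_attention(x):
    # CA (simplified): GAP -> per-channel sigmoid gate -> rescale channels.
    w = sigmoid(global_avg_pool(x))          # (C,)
    return x * w[:, None, None]

def spatial_attention(x):
    # SA (simplified): channel-wise mean -> per-pixel sigmoid gate -> rescale locations.
    g = sigmoid(x.mean(axis=0))              # (H, W)
    return x * g[None, :, :]
```

Both gates preserve the feature-map shape and merely reweight it, which is why they can be inserted between fusion stages without changing tensor dimensions.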
Fig 3.
Structure of the feature refinement (FR) module.
GAP, SA, and CA are defined as in Fig 2.
Table 1.
Ablation study of the proposed model on ECSSD, PASCAL-S, and DUTS.
Fig 4.
Visualization of feature maps with and without TA.
(a) Input image. (b) Ground truth. (c) Without TA. (d) With TA. TA suppresses background noise more effectively while highlighting salient objects. (All feature maps are generated by the code in this paper. The input images and ground truth annotations are derived from the DUTS dataset (official download link: http://saliencydetection.net/duts/; source code repository: https://github.com/scott89/WSS, [30]). The dataset is licensed under the 3-Clause BSD Open Source License (compatible with the CC BY 4.0 license), and its licensing terms have been strictly followed in this work.)
Fig 5.
Visual comparison of our method with other methods at six different scales.
(a) Input image; (b) Ground truth; (c) F3Net; (d) ITSD; (e) MINet; (f) GCPANet; (g) Ours. Our approach suppresses background noise more effectively and better handles large scale variation, to which SOD is particularly sensitive.
Table 2.
Ablation study on TA vs. CBAM.
Table 3.
Ablation study on the FR module with and without inter-branch skip connections.
Table 4.
Ablation and sensitivity analysis on kernel/dilation (K/D) sizes of the FR module (on ECSSD).
Table 5.
Experimental results of different models on five datasets.
Fig 6.
Precision-recall curves, F-measure curves, and FNR↓ results.
Fig 7.
Visual comparison of our method with advanced methods.
(1) Input image; (2) Ground truth; (3) BASNet; (4) EGNet; (5) F3Net; (6) ITSD; (7) MINet; (8) GCPANet; (9) DFI; (10) ICON; (11) MGuidNet; (12) DIPONet; (13) DSLRDNet; (14) Ours. (To ensure the fairness of comparison, all visualization results are regenerated by running the official code of each method under a unified experimental setup. The use of input images complies with the licensing terms of the DUTS dataset (which is licensed under the 3-Clause BSD Open Source License, compatible with the CC BY 4.0 license; official download link: http://saliencydetection.net/duts/; source code repository: https://github.com/scott89/WSS, [30]).)
Fig 8.
Illustration of failure cases.
(a) Input image; (b) Ground truth; (c) ITSD; (d) MINet; (e) GCPANet; (f) ICON; (g) Ours.