Fig 1.
Overall framework of the method.
Fig 2.
Backbone network and the LoRA fine-tuning method applied to SAM2.
Fig 3.
Multi-modal fusion methods: (a) addition; (b) concatenation; (c) cross-attention fusion.
Table 1.
Test metrics of different methods on different datasets.
Fig 4.
Performance of different methods on the SM metric.
Fig 5.
Visual comparison of prediction results from different methods.
Table 2.
Ablation study of different modules in the model.
Fig 6.
Depth estimation information for different images.