Fig 1.
Illustration of the Mamba Hybrid Self-Attention Vision Transformers (MHS-VIT) architecture.
Fig 2.
Illustration of the Focus Block architecture.
Fig 3.
(a) Detailed structure of the SVT Block. (b) Illustration of the VISSBlock architecture. (c) Illustration of the DLS Block. (d) Illustration of the LS Block.
Fig 4.
Illustration of the LSDetect architecture.
Table 1.
Ablation study on MHS-VIT.
Fig 5.
DLS Block integration designs explored in the ablation study.
Table 2.
Ablation study on the DLS Block.
Fig 6.
Comparison of detection results from different models on the TROD dataset.
Fig 7.
Performance comparison of different models on the TROD dataset.
Table 3.
Comparison of MHS-VIT with other image detection networks on the TROD dataset (bold indicates our framework).
Fig 8.
Comparison of detection results from different models on the TSLD dataset.
Fig 9.
Performance comparison of different models on the TSLD dataset.
Table 4.
Comparison of MHS-VIT with other image detection networks on the TSLD dataset (bold indicates our framework).