Fig 1.
Dual-modal object detection based on deep learning.
Fig 2.
Overall framework of the proposed DEF-Net.
Fig 3.
The backbone network architecture of Darknet53.
Fig 4.
Dual-branch backbone structure.
Fig 5.
Feature interaction and enhancement structure.
Fig 6.
Cross-attention fusion network structure.
Fig 7.
Illustration of cross-attention weights.
Table 1.
Training Hyperparameter Configuration.
Fig 8.
SYUGV dataset.
Table 2.
Comparative experimental results of different models.
Fig 9.
Comparison of model training on the SYUGV dataset.
Fig 10.
Comparison of detection results of different models on the SYUGV dataset.
Fig 11.
Comparison of model training on the LLVIP dataset.
Table 3.
Comparative experimental results of different models.
Fig 12.
Comparison of detection results of different models on the LLVIP dataset.
Table 4.
Detection performance of the model with different modal inputs.
Fig 13.
P-R curves for different modal inputs.
Fig 14.
Grad-CAM heatmaps of the dual-branch model.
Table 5.
Detection performance of the model with different backbone networks.
Table 6.
Ablation study of different module combinations on the SYUGV dataset.
Table 7.
Ablation study of different module combinations on the LLVIP dataset.
Fig 15.
mAP@0.5 curves of ablation studies on the (a) SYUGV and (b) LLVIP datasets.
Fig 16.
Visualization of feature activation (heatmaps) for different model variants.