Fig 1.
Comparison of DFMNet with state-of-the-art (SOTA) stereo methods on KITTI 2012 [17] and KITTI 2015 [18].
3-all denotes the percentage of pixels, over all regions, whose predicted disparity error exceeds 3 pixels. D1-all denotes the percentage of stereo disparity outliers in the first frame over all regions. Lower values are better for both metrics.
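The two KITTI metrics above can be illustrated with a minimal sketch. It assumes the usual KITTI benchmark conventions, which the caption does not spell out: pixels without ground truth are stored as a disparity of 0, and D1 counts a pixel as an outlier only when its error exceeds both 3 pixels and 5% of the true disparity. The function names are hypothetical, and flat Python sequences stand in for disparity maps.

```python
def outlier_rate(pred, gt, abs_thresh=3.0, rel_thresh=None):
    """Percentage of valid pixels whose disparity error counts as an outlier.

    pred, gt: flat sequences of disparities in pixels.
    Assumption: gt <= 0 marks a pixel without ground truth, which is skipped.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no pixels with valid ground truth")
    bad = 0
    for p, g in pairs:
        err = abs(p - g)
        is_bad = err > abs_thresh
        if rel_thresh is not None:
            # D1-style rule (assumed): the error must also exceed a
            # fraction of the true disparity.
            is_bad = is_bad and err > rel_thresh * g
        bad += int(is_bad)
    return 100.0 * bad / len(pairs)


def three_all(pred, gt):
    """3-all: error larger than 3 pixels, over all valid pixels."""
    return outlier_rate(pred, gt, abs_thresh=3.0)


def d1_all(pred, gt):
    """D1-all: error larger than 3 pixels and larger than 5% of the true disparity."""
    return outlier_rate(pred, gt, abs_thresh=3.0, rel_thresh=0.05)
```

With this rule, a 4-pixel error at a true disparity of 104 counts toward 3-all but not toward D1-all, because it stays below 5% of the true disparity.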
Fig 2.
The prediction of Fast-GFM on Scene Flow [19].
Fig 3.
The architecture of the dual-dimension feature modulation network (DFMNet).
C denotes the concatenation operation, and CFV denotes the cost filter volume. DAM denotes dual-attention modulation, which incorporates dual-attention mechanisms into the network. DFM Block denotes the dual-dimension feature modulation block. The output of DFMNet is the prediction map.
Fig 4.
The structure of dual-attention modulation.
Fig 5.
The structure of Fast-GFM.
Fig 6.
The structure of SE block.
Fig 7.
Evaluation results of DFMNet on Scene Flow [19] and ETH3D [33].
Bad1.0 and Bad2.0 denote the proportion of pixels whose predicted disparity differs from the ground truth by more than 1.0 and 2.0 pixels, respectively. Lower values indicate better performance.
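As a rough illustration of the Bad1.0 and Bad2.0 definitions, the sketch below counts the pixels whose absolute disparity error exceeds a threshold. The function name `bad_rate`, the optional `valid` mask, and the choice to report a percentage are assumptions made for this example, not details taken from the paper.

```python
def bad_rate(pred, gt, threshold, valid=None):
    """Percentage of valid pixels with |pred - gt| greater than threshold.

    pred, gt: flat sequences of disparities in pixels.
    valid: optional sequence of booleans (hypothetical mask); all pixels
    are used when it is omitted.
    """
    if valid is None:
        valid = [True] * len(gt)
    errors = [abs(p - g) for p, g, v in zip(pred, gt, valid) if v]
    if not errors:
        raise ValueError("no valid pixels to evaluate")
    return 100.0 * sum(e > threshold for e in errors) / len(errors)
```

Calling `bad_rate(pred, gt, 1.0)` and `bad_rate(pred, gt, 2.0)` on the same prediction gives the two values; the second can never exceed the first, since every pixel that fails the 2.0 threshold also fails the 1.0 threshold.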
Fig 8.
Evaluation results of Fast-GFM on Scene Flow [19].
EPE (end-point error) is the mean absolute difference between the predicted and ground-truth disparities, and lower values indicate better performance.
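For reference, EPE is commonly computed as the mean absolute disparity error over the evaluated pixels. The sketch below assumes that definition; the function name and the optional `valid` mask are hypothetical, and any masking rule the authors apply (for example a maximum disparity) is not modeled here.

```python
def end_point_error(pred, gt, valid=None):
    """Mean absolute disparity error, in pixels, over the valid pixels.

    pred, gt: flat sequences of disparities in pixels.
    valid: optional sequence of booleans (hypothetical mask); all pixels
    are used when it is omitted.
    """
    if valid is None:
        valid = [True] * len(gt)
    errors = [abs(p - g) for p, g, v in zip(pred, gt, valid) if v]
    if not errors:
        raise ValueError("no valid pixels to evaluate")
    return sum(errors) / len(errors)
```

Unlike a thresholded outlier rate, this average is sensitive to the magnitude of each error, so a few very large mismatches can dominate the score.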
Table 1.
Fig 9.
Results on KITTI 2012 [17].
The matching results in the interference area are marked with a red box.
Fig 10.
Results on KITTI 2015 [18].
The matching results in the interference area are marked with a red box.
Fig 11.
Results of DFMNet on ETH3D [33].
Table 2.
Quantitative evaluation of Fast-GFM on ETH3D [33] and Middlebury [34].
Fig 12.
Results of Fast-GFM on Middlebury [34].
Fig 13.
Universality study of CFV on Scene Flow [19].
CFV denotes the cost filter volume used in our method. PSMNet-CFV replaces the concatenated volume of PSMNet with CFV, and GwcNet-CFV replaces the combined volume of GwcNet with CFV.
Fig 14.
Comparisons between CFV and CAS [41] on Scene Flow [19].
The comparison evaluates two variants: GwcNet-CAS, which uses the Cascaded Cost Volume (CAS) [41] in place of the combined volume, and GwcNet-CFV, which replaces the combined volume with our CFV.
Table 3.
Performance of TOP-K when using different K on Scene Flow [19].
Table 4.
Analysis of performance with different numbers of hourglasses on Scene Flow [19].
Table 5.
Analysis of accuracy and runtime with different numbers of hourglass structures on Scene Flow [19].
Table 6.
Ablation study of DFMNet on Scene Flow [19].
Table 7.
Ablation study of DFMNet on Scene Flow [19].