Fig 1. 3D convolution.
Fig 2. Self-attention module.
Fig 3. Transformer model network structure.
Fig 4. VST Block network structure.
Fig 5. OM-VST model network structure.
Fig 6. Optimized Downsampling network structure.
Fig 7. Patch merging network structure.
Fig 8. Multi-scale feature information fusion module network structure.
Table 1. Experiment parameter configuration.
Table 2. Performance comparison of various categories.
Fig 9. Confusion matrix.
Fig 10. ROC curves.
Fig 11. Comparison of model accuracy.
Fig 12. P-R curve.
Table 3. Performance comparison of different models.
Table 4. Comparison of model parameters.
Table 5. Comparison of ablation experiment results.