Fig 1.
Example of visual question answering tasks.
Fig 2.
Overall flowchart of the proposed SRRN model.
Fig 3.
Sparse attention mechanism encoder.
Fig 4.
The sparse attention mechanism.
Fig 5.
Co-attention structure diagram.
Table 1.
Performance of different SRRN variant models on the VQA 2.0 dataset.
Fig 6.
Changes in accuracy and loss during model training.
Fig 7.
(a) Accuracy of “Yes/No” based on different parameters. (b) Accuracy of “Number” based on different parameters. (c) Accuracy of “Other” based on different parameters. (d) Accuracy of “All” based on different parameters.
Table 2.
Performance with different numbers of encoder and decoder layers.
Table 3.
Comparison of pre-trained model parameters and SRRN model on the VQA 2.0 dataset.
Table 4.
Performance comparison results on the VQA 2.0 dataset.
Table 5.
Performance comparison results on the GQA dataset.
Fig 8.
Visualization examples on the VQA 2.0 and GQA datasets.