An effective spatial relational reasoning networks for visual question answering

doi:10.1371/journal.pone.0277693

Fig 1.

Example of visual question answering tasks.

Fig 2.

Overall flowchart of the proposed SRRN model.

Fig 3.

Sparse attention mechanism encoder.

Fig 4.

The sparse attention mechanism.

Fig 5.

Co-attention structure diagram.

Table 1.

Performance of different SRRN variant models on VQA 2.0 dataset.

Fig 6.

The change of accuracy and loss function in the process of model training.

Fig 7.

(a) Accuracy of “Yes/No” based on different parameters. (b) Accuracy of “Number” based on different parameters. (c) Accuracy of “Other” based on different parameters. (d) Accuracy of “All” based on different parameters.

Table 2.

The performance of different layers of encoder and decoder.

Table 3.

Comparison of pre-trained model parameters and SRRN model on the VQA 2.0 dataset.

Table 4.

Performance comparison results on VQA 2.0 dataset.

Table 5.

Performance comparison results on GQA dataset.

Fig 8.

Example of visualization on VQA 2.0 and GQA datasets.
