Image captioning in Bengali language using visual attention

doi:10.1371/journal.pone.0309364

Fig 1.

Block diagrammatic representation of image caption generation.

Fig 2.

Image captioning using encoder-decoder LSTM.

Table 1.

Combination of key hyperparameters.

Table 2.

Performance of various CNN and attention-based RNN combinations.

Table 3.

Greedy search performance analysis for different CNN and RNN configurations.

Table 4.

Beam search performance analysis for different CNN and RNN configurations.

Fig 3.

SHAP value heatmap for a sample image and its corresponding Bangla caption.

Positive SHAP values (in red) highlight regions contributing positively towards the caption prediction, while negative values (in blue) indicate inhibitory regions.

Fig 4.

Developed application for Bangla image captioning.

Table 5.

Performance comparison of suggested and existing models.
