Fig 1.
Block diagram of image caption generation.
Fig 2.
Image captioning using encoder-decoder LSTM.
Table 1.
Combination of key hyperparameters.
Table 2.
Performance of various CNN and attention-based RNN combinations.
Table 3.
Greedy search performance analysis for different CNN and RNN configurations.
Table 4.
Beam search performance analysis for different CNN and RNN configurations.
Fig 3.
SHAP value heatmap for a sample image and its corresponding Bangla caption.
Positive SHAP values (red) highlight regions that contribute to the caption prediction, while negative values (blue) indicate regions that inhibit it.
Fig 4.
Developed application for Bangla image captioning.
Table 5.
Performance comparison of proposed and existing models.