Fig 1.

Encoder-decoder model for image captioning.

Fig 2.

Overview of the CNN-LSTM-based image captioning architecture.

Fig 3.

Taxonomy of deep learning-based image captioning approaches.

Table 1.

Comparison of vanilla transformer-based image captioning models.

Table 2.

A comparison of image captioning Vision-Language Pre-training (VLP) models.

Fig 4.

Architectural comparison between BLIP-2, CoCa, Flamingo, and the proposed gated cross-attention fusion model.

Fig 5.

Comparison of common datasets for image captioning.

Table 3.

Common datasets for image captioning.

Fig 6.

Dataset size and captions per image.

Fig 7.

Comparison of caption density per image in benchmark image captioning datasets.

Fig 8.

Distribution of specialized datasets in image captioning research.

Fig 9.

Comparison of image quality challenge levels in captioning benchmarks.

Table 4.

Baseline model CIDEr performance and SCST support status.

Table 5.

Summary of model configuration and training setup.

Fig 10.

Normalized model performance evaluation metrics.

Table 6.

Common evaluation metrics for image captioning.

Fig 11.

Component intensity scores of image captioning models.

Fig 12.

Performance comparison of image captioning models across multiple metrics.

Table 7.

Comparison of methods with image captioning models across multiple datasets.

Table 8.

Proposed methodology results.

Fig 13.

Training and validation loss trends for image captioning models.
