Table 1.

Subject characteristics of the dataset used in this study.

Fig 1.

Illustration of a Vision Transformer (ViT): A neural network architecture designed for computer vision tasks.

The diagram depicts the structure of the vision transformer, highlighting its attention mechanism and positional encoding, which allow image data to be processed within a transformer-based framework.

Fig 2.

The workflow of the proposed Hybrid-RViT model.

Fig 3.

Overview of the Hybrid-RViT architecture.

The process begins with input images of dimensions 224×224, which are fed to ResNet-50 for feature extraction. The extracted features are then passed to the ViT as a sequence of patch embeddings of shape (N+1)×D, where N is the number of patches, D is the embedding dimension, and the additional position holds the class token, as in a standard ViT. A transformer encoder is then applied to perform self-attention over this sequence. Finally, the learned features are passed through an MLP classifier for classification. The validation set is used to tune the model during training, while the test set is used to evaluate the model’s performance on unseen data.
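
The shape of the token sequence described above can be checked with a little arithmetic. The sketch below is illustrative only and is not the authors' implementation. The caption gives only the 224×224 input size, so every other number here is an assumption: that features are taken from the final ResNet-50 stage (overall stride 32, giving a 7×7 feature map), that each feature-map position becomes one patch, and that the embedding dimension D is 768. The function name `hybrid_token_shape` and its parameters are hypothetical.

```python
def hybrid_token_shape(image_size=224, backbone_stride=32, patch_size=1, embed_dim=768):
    """Return (N + 1, D), the shape of the token sequence fed to the encoder.

    Assumptions (not stated in the caption): backbone_stride=32 corresponds to
    the last ResNet-50 stage, patch_size=1 means one token per feature-map
    position, and embed_dim=768 is a placeholder for D.
    """
    if image_size % backbone_stride != 0:
        raise ValueError("image_size must be divisible by backbone_stride")
    feature_side = image_size // backbone_stride  # 224 // 32 = 7

    if feature_side % patch_size != 0:
        raise ValueError("feature map side must be divisible by patch_size")
    patches_per_side = feature_side // patch_size
    num_patches = patches_per_side ** 2  # N

    # One extra position is reserved for the class token.
    return (num_patches + 1, embed_dim)


print(hybrid_token_shape())  # (50, 768)
```

Under these assumptions N = 49, so the encoder sees 50 tokens. A different backbone stage or patch size changes N but not the (N+1)×D form.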

Table 2.

Hyper-parameters used during training of the proposed Hybrid-RViT.

Fig 4.

Plot of training accuracy and loss.

Fig 4A displays training accuracy and validation accuracy, while Fig 4B shows training loss and validation loss.

Table 3.

Evaluation metrics of the Hybrid-RViT.

Fig 5.

Confusion matrix for the Hybrid-RViT model on the test dataset.
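
A confusion matrix such as the one in this figure is typically the starting point for metrics such as accuracy, precision, recall, and F1. The following is a minimal sketch of that calculation, not the authors' evaluation code. It assumes rows are true classes and columns are predicted classes, which the caption does not specify. The function name `per_class_metrics` is hypothetical, and the 2×2 matrix is a made-up toy example, not data from the study.

```python
def per_class_metrics(cm):
    """Compute overall accuracy and per-class precision, recall, and F1.

    Assumed convention: cm[i][j] is the number of samples whose true class
    is i and whose predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total

    metrics = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # true k, predicted otherwise
        fp = sum(cm[i][k] for i in range(n)) - tp   # predicted k, true otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append({"precision": precision, "recall": recall, "f1": f1})
    return accuracy, metrics


# Toy matrix for illustration only; these are NOT the study's results.
toy_cm = [[8, 2],
          [1, 9]]
accuracy, metrics = per_class_metrics(toy_cm)
print(accuracy)  # 0.85
```

The same function works for any number of classes, since each class is scored one-vs-rest from its row and column sums.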

Fig 6.

Comparison of the accuracy of the proposed Hybrid-RViT model against VGG-TSwift, SMIL-DeiT, Efficient+ViT, and ViT.

Fig 7.

Results of the ablation study comparing ResNet-50 versus ResNet-101.

Fig 7A shows the training accuracy and validation accuracy of Hybrid-RViT after replacing ResNet-50 with ResNet-101, while Fig 7B shows the training loss and validation loss after the same replacement.
