A comparative analysis of video vision transformers on word-level sign language datasets

doi:10.1371/journal.pone.0341909

Fig 6

Confusion matrix for BdSLW401 test set (visualizing first 50 classes).

This confusion matrix shows the model’s classification results for the first 50 of the 401 classes in the BdSLW401 test set. Each row corresponds to the true class label and each column to the predicted label. The darker diagonal cells mark correct predictions and indicate that the model recognizes many of the signs in this subset accurately. The few lighter off-diagonal cells mark misclassifications, where the model confuses one sign with another. Because only 50 classes are shown, the figure reflects performance on a portion of the full class set rather than on all 401 classes.
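
As an illustration of how such a view can be produced, the following is a minimal sketch that builds a full 401-class confusion matrix and then keeps only the block for the first 50 classes. It is not the authors' code: the labels are synthetic, and the names `confusion_matrix`, `first_k_block`, `y_true`, and `y_pred` are hypothetical. Only the class count (401), the subset size (50), and the row/column convention (rows are true labels, columns are predicted labels) come from the caption.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Count matrix with rows = true labels and columns = predicted labels."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    # np.add.at accumulates correctly when the same (row, col) pair repeats.
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def first_k_block(cm, k=50):
    """Keep only the rows and columns of the first k classes."""
    return cm[:k, :k]

# Synthetic labels (assumption): about 10% of predictions are randomized.
rng = np.random.default_rng(0)
num_classes = 401
num_samples = 5000
y_true = rng.integers(0, num_classes, size=num_samples)
y_pred = y_true.copy()
flip = rng.random(num_samples) < 0.1
y_pred[flip] = rng.integers(0, num_classes, size=flip.sum())

cm = confusion_matrix(y_true, y_pred, num_classes)
sub = first_k_block(cm, 50)

# Samples of the first 50 true classes that were predicted as a class
# outside the first 50; these do not appear anywhere in the 50x50 block.
hidden_errors = cm[:50, 50:].sum()
print(sub.shape, int(np.trace(sub)), int(hidden_errors))
```

One consequence of slicing the matrix this way is that a sample from one of the first 50 classes that is predicted as class 50 or higher falls outside the displayed block, so a 50×50 view can understate the errors for those classes. Whether the published figure was derived this way is not stated in the caption.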

doi: https://doi.org/10.1371/journal.pone.0341909.g006