Fig 1.
Hybrid deep-learning architecture for cyberbullying detection.
The figure shows the multimodal architecture, which combines CNN, BiLSTM, and Transformer encoders. The semantic consistency layer enforces alignment among modalities using the loss defined in Equation 2, improving contextual reliability before classification.
Table 1.
Feature extraction components.
Table 2.
Experimental configuration and training parameters.
Fig 2.
Sequential workflow of the proposed framework.
Fig 3.
Training and validation curves.
Fig 4.
Confusion matrices for CAVD and SocialVidMix.
Fig 5.
ROC curves (with AUC) for the proposed vs. baseline models.
Fig 6.
Precision–recall (PR) curves.
Table 3.
Dataset characteristics and annotation reliability.
Table 4.
Performance comparison on CAVD and SocialVidMix datasets.
Table 5.
Cross-platform generalization results.
Table 6.
Ablation study results on CAVD dataset.