Fig 1.
Advantages of conversational agents.
The figure summarizes the key benefits of conversational agents in enhancing user interaction, personalization, and accessibility.
Table 1.
Comparative analysis of existing methods.
Fig 2.
Proposed architecture of the EAC-Agent framework.
The figure presents the overall workflow of the proposed system, including multimodal feature extraction, self- and cross-attention-based fusion, and emotion-aware response generation.
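To make the Fig 2 workflow concrete, the sketch below wires the stages together as plain Python functions. Every function body is a hypothetical placeholder for a component of EAC-Agent (the encoders and fusion module are detailed in Figs 3-5); only the data flow mirrors the figure.

```python
import numpy as np

rng = np.random.default_rng(0)

def audio_branch(waveform: np.ndarray) -> np.ndarray:
    # Placeholder acoustic encoder; the actual steps are shown in Fig 3.
    return rng.standard_normal(128)

def video_branch(frames: np.ndarray) -> np.ndarray:
    # Placeholder ViT-based visual encoder; the actual steps are shown in Fig 4.
    return rng.standard_normal(128)

def fuse(a: np.ndarray, v: np.ndarray) -> np.ndarray:
    # Stand-in for the self-/cross-attention fusion of Fig 5;
    # plain concatenation replaces the learned fusion here.
    return np.concatenate([a, v])

def generate_response(fused: np.ndarray) -> str:
    # Placeholder emotion-aware generator conditioned on the fused feature.
    return "<emotion-aware reply>"

waveform = rng.standard_normal(16000)           # 1 s of 16 kHz audio
frames = rng.standard_normal((8, 224, 224, 3))  # 8 video frames
reply = generate_response(fuse(audio_branch(waveform), video_branch(frames)))
```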
Fig 3.
Audio feature extraction process.
The figure illustrates the steps involved in noise reduction, MFCC computation, and statistical modeling for generating acoustic representations.
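A minimal sketch of the Fig 3 pipeline, assuming librosa for MFCC computation. The pre-emphasis filter and mean/std pooling below are common stand-ins for the noise-reduction and statistical-modeling steps; the exact variants used by EAC-Agent are not specified in the caption.

```python
import numpy as np
import librosa

def acoustic_features(wav_path: str, n_mfcc: int = 13) -> np.ndarray:
    y, sr = librosa.load(wav_path, sr=16000)
    # Pre-emphasis: a simple high-pass step that suppresses low-frequency noise.
    y = np.append(y[0], y[1:] - 0.97 * y[:-1])
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    # Statistical modeling: summarize frame-level MFCCs by mean and std,
    # yielding a fixed-length acoustic representation of size 2 * n_mfcc.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
```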
Fig 4.
Video feature extraction using a Vision Transformer (ViT).
The figure illustrates the process of patch extraction, embedding, and self-attention-based feature learning for visual representation.
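A minimal PyTorch sketch of the Fig 4 steps: patch extraction via a strided convolution, patch embedding with learned positional encodings, and self-attention-based encoding. All dimensions, depths, and the mean pooling are illustrative assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class TinyViT(nn.Module):
    def __init__(self, img_size=224, patch=16, dim=192, heads=3, layers=2):
        super().__init__()
        # Patch extraction + linear embedding in one strided convolution.
        self.patchify = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        n_patches = (img_size // patch) ** 2
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, layers)

    def forward(self, x):                                # x: (B, 3, 224, 224)
        p = self.patchify(x).flatten(2).transpose(1, 2)  # (B, N, dim) patch tokens
        z = self.encoder(p + self.pos)                   # self-attention over patches
        return z.mean(dim=1)                             # pooled visual feature

features = TinyViT()(torch.randn(2, 3, 224, 224))        # (2, 192)
```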
Fig 5.
Self- and cross-attention mechanism.
The figure illustrates the interaction between self-attention and cross-attention for multimodal feature fusion.
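The interaction in Fig 5 can be sketched with standard attention modules: self-attention first refines each modality, then cross-attention lets each modality query the other before the two views are fused. The module below uses torch.nn.MultiheadAttention as a hedged illustration; the paper's exact layer arrangement and dimensions may differ.

```python
import torch
import torch.nn as nn

class SelfCrossFusion(nn.Module):
    def __init__(self, dim=192, heads=4):
        super().__init__()
        self.self_a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_av = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_va = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, a, v):              # a: (B, Ta, dim), v: (B, Tv, dim)
        a, _ = self.self_a(a, a, a)       # intra-modal self-attention (audio)
        v, _ = self.self_v(v, v, v)       # intra-modal self-attention (video)
        a2v, _ = self.cross_av(a, v, v)   # audio queries video context
        v2a, _ = self.cross_va(v, a, a)   # video queries audio context
        # Pool over time and concatenate the two cross-attended views.
        return torch.cat([a2v.mean(dim=1), v2a.mean(dim=1)], dim=-1)

fused = SelfCrossFusion()(torch.randn(2, 50, 192), torch.randn(2, 30, 192))  # (2, 384)
```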
Table 2.
Multimodal datasets and their descriptions.
Table 3.
Statistics of the datasets.
Table 4.
Results on the IEMOCAP dataset.
Table 5.
Results on the MELD dataset.
Fig 6.
Performance comparison of different fusion methods on the two datasets.
The figure compares the classification performance of various fusion strategies on the IEMOCAP and MELD datasets.
Fig 7.
Confusion matrices for emotion classification.
The figure illustrates the classification performance of the proposed model on the IEMOCAP and MELD datasets.
Table 6.
Model performance across different test sets.
Table 7.
Perplexity, BLEU, and ROUGE scores of EAC-Agent across modalities.
Table 8.
Ablation study on the IEMOCAP and MELD datasets.