Enhanced graph attention network by integrating Long Short-Term Memory for artificial emotion representation in multi-modality datasets

doi:10.1371/journal.pone.0339946

Fig 1.

Framework of the proposed E-GAT + Bi-LSTM pipeline.

1) Feature extraction from text/audio/vision/biomedical signals; 2) E-GAT module: Constructs a semantic graph where nodes represent emotional modalities and edges represent modality interactions; 3) Bi-LSTM module: Captures bidirectional temporal dynamics; 4) Fully-connected layer + softmax: Outputs emotion class probabilities.

Fig 2.

Overarching structure of the proposed graph model (E-GAT).

Interpretability details: 1) Nodes: Represent emotional states (text, audio), with attributes encoding feature vectors (red dashed lines link nodes to feature vectors); 2) Edges: Blue solid lines represent dynamic relationships between emotional states; 3) Weights: Edge weights indicate relationship strength—higher weights mean stronger correlation; 4) Temporal Adaptability: Black dashed lines denote feedback loops, illustrating that emotional states evolve over time.
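
The edge weights described in this caption can be illustrated with the standard graph attention formulation, in which a score is computed for each edge and normalised with a softmax over a node's neighbours. The sketch below is a minimal illustration under stated assumptions, not the authors' E-GAT: the node names, feature vectors, and attention vector `a` are hypothetical, and the learned projection matrix used in a full graph attention layer is omitted for brevity.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def attention_weights(features, edges, a):
    """Return {(i, j): alpha_ij}, normalised with a softmax over each node's neighbours.

    features: dict mapping node -> feature vector (list of floats)
    edges:    dict mapping node -> list of neighbour nodes
    a:        attention vector of length 2 * feature_dim (hypothetical; normally learned)
    """
    weights = {}
    for i, neighbours in edges.items():
        if not neighbours:
            continue
        scores = []
        for j in neighbours:
            concat = features[i] + features[j]  # concatenate the two node feature vectors
            scores.append(leaky_relu(sum(w * x for w, x in zip(a, concat))))
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        for j, e in zip(neighbours, exps):
            weights[(i, j)] = e / total
    return weights

# Hypothetical toy graph: three modality nodes with 2-dimensional features.
features = {"text": [1.0, 0.0], "audio": [0.8, 0.2], "vision": [0.0, 1.0]}
edges = {
    "text": ["audio", "vision"],
    "audio": ["text", "vision"],
    "vision": ["text", "audio"],
}
a = [0.5, -0.5, 1.0, -1.0]  # assumed values, for illustration only
alpha = attention_weights(features, edges, a)
```

Here `alpha[("text", "audio")]` is larger than `alpha[("text", "vision")]`, which corresponds to the caption's reading that a higher weight means a stronger relationship between two nodes.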

Fig 3.

The architecture of the introduced Bi-LSTM model.

This component consists of LSTM models, a fully-connected layer, and a softmax layer.
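
The three parts named in this caption can be sketched as a forward pass in plain Python. This is a minimal illustration, not the authors' implementation: the dimensions, random weights, and omission of bias terms are assumptions, and using the final hidden state of each direction as the sequence summary is one common choice that the caption does not confirm.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def make_lstm_params(input_dim, hidden, rng):
    # One weight matrix per gate (i, f, o) plus the candidate state (g); biases omitted.
    return {k: rand_matrix(hidden, input_dim + hidden, rng) for k in "ifog"}

def lstm_pass(seq, params, hidden):
    """Run one LSTM direction over seq and return the final hidden state."""
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in seq:
        z = x + h  # concatenate the input with the previous hidden state
        i = [sigmoid(v) for v in matvec(params["i"], z)]    # input gate
        f = [sigmoid(v) for v in matvec(params["f"], z)]    # forget gate
        o = [sigmoid(v) for v in matvec(params["o"], z)]    # output gate
        g = [math.tanh(v) for v in matvec(params["g"], z)]  # candidate cell state
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def bilstm_classify(seq, fwd, bwd, fc, hidden):
    h_fwd = lstm_pass(seq, fwd, hidden)        # left-to-right pass
    h_bwd = lstm_pass(seq[::-1], bwd, hidden)  # right-to-left pass
    return softmax(matvec(fc, h_fwd + h_bwd))  # fully-connected layer, then softmax

rng = random.Random(0)
input_dim, hidden, n_classes = 3, 4, 5  # assumed sizes
fwd = make_lstm_params(input_dim, hidden, rng)
bwd = make_lstm_params(input_dim, hidden, rng)
fc = rand_matrix(n_classes, 2 * hidden, rng)
seq = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4], [0.5, 0.5, -0.2]]
probs = bilstm_classify(seq, fwd, bwd, fc, hidden)
```

`probs` is a list of `n_classes` values that sum to 1, one probability per emotion class.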

Table 1.

Results of the proposed model on the SemEval-2018 dataset (%).

Table 2.

Performance on SemEval-2018 with statistical validation. The 95% confidence interval (CI) is calculated via 10-fold cross-validation to reflect result variability.
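
The caption does not state how the interval is derived from the folds. One common approach, assumed here, is a Student's t interval on the ten per-fold scores: mean ± t × s / √10, with t ≈ 2.262 for 9 degrees of freedom. The fold scores below are made-up placeholders, not results from the paper.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t with 9 degrees of freedom (10 folds).
T_CRIT_DF9 = 2.262

def ci95_from_folds(scores):
    """Return (lower, upper) bounds of a 95% t-interval for the mean of 10 fold scores."""
    if len(scores) != 10:
        raise ValueError("this sketch hard-codes the critical value for exactly 10 folds")
    mean = statistics.mean(scores)
    sem = statistics.stdev(scores) / math.sqrt(len(scores))  # standard error of the mean
    half_width = T_CRIT_DF9 * sem
    return mean - half_width, mean + half_width

# Placeholder per-fold accuracies in percent (illustrative only).
fold_scores = [90.0, 91.0, 89.0, 90.5, 90.5, 89.5, 91.5, 88.5, 90.0, 89.5]
lower, upper = ci95_from_folds(fold_scores)
```

A narrow interval indicates that the score is stable across folds; a wide one indicates that it depends strongly on the particular split.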

Fig 4.

Confusion matrix of the proposed model on SemEval-2018.

Interpretability and decision process insights: 1) Color intensity corresponds to the number of samples (darker shades = more samples); 2) Diagonal elements: Correct classifications; 3) Off-diagonal elements: Misclassifications; 4) Overall balance: Balanced results across all categories confirm the model’s ability to distinguish nuanced emotions, with confusion patterns aligning with human emotional perception.
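
How such a matrix is built, and how its diagonal and off-diagonal entries are read, can be shown with a small sketch. The emotion labels and predictions below are hypothetical toy data, not taken from SemEval-2018 or from the paper's results.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: k for k, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

# Hypothetical labels and predictions, for illustration only.
labels = ["joy", "anger", "sadness"]
y_true = ["joy", "joy", "anger", "sadness", "sadness", "anger"]
y_pred = ["joy", "anger", "anger", "sadness", "joy", "anger"]

matrix = confusion_matrix(y_true, y_pred, labels)

# Diagonal entries count correct classifications; everything else is a misclassification.
correct = sum(matrix[k][k] for k in range(len(labels)))
misclassified = len(y_true) - correct
```

In this toy example the entry in row "joy", column "anger" counts joy samples that were predicted as anger, which is the kind of off-diagonal confusion pattern the caption refers to.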
