Table 1.
Summary of related work comparing strengths, limitations, and how FedEmoNet addresses identified gaps.
Fig 1.
Complete methodology pipeline of the proposed FedEmoNet framework.
Phase 1: Data processing and feature engineering including spectrogram generation, MFCC extraction, chroma features, and handcrafted features, followed by ensemble PSO optimization. Phase 2: Multi-scale TCN-Transformer fusion architecture. Phase 3: Training and evaluation. Phase 4: Explainability analysis through LIME and SHAP.
Table 2.
Summary of datasets used in experiments. EmoDB and RAVDESS serve as federated training sources; CREMA-D is used exclusively for cross-corpus evaluation.
Fig 2.
FedProx-based federated learning protocol.
The global server maintains the global model and performs weighted aggregation. Each client receives the broadcast global model, performs local training, and sends its updated parameters back to the server. The privacy boundary ensures that raw data never leaves the client. The global test set is held out before data is distributed to clients.
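The round described in this caption can be sketched as below. This is a minimal illustration, not the paper's implementation: it assumes aggregation weights proportional to each client's sample count and the standard FedProx proximal term (mu/2)·||w − w_global||² added to the local loss. The function names, `mu`, and `lr` are hypothetical.

```python
from typing import List, Sequence


def aggregate(client_weights: Sequence[Sequence[float]],
              client_sizes: Sequence[int]) -> List[float]:
    """Server side: average client parameter vectors, weighted by sample count.

    Weighting by sample count is an assumption; the caption only says "weighted".
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def proximal_gradient(loss_grad: Sequence[float],
                      w_local: Sequence[float],
                      w_global: Sequence[float],
                      mu: float) -> List[float]:
    """Gradient of loss + (mu/2) * ||w_local - w_global||^2 with respect to w_local."""
    return [g + mu * (wl - wg) for g, wl, wg in zip(loss_grad, w_local, w_global)]


def local_step(w_local: Sequence[float],
               w_global: Sequence[float],
               loss_grad: Sequence[float],
               mu: float,
               lr: float) -> List[float]:
    """Client side: one gradient-descent step on the proximal objective."""
    grad = proximal_gradient(loss_grad, w_local, w_global, mu)
    return [w - lr * g for w, g in zip(w_local, grad)]
```

Only parameter vectors cross the privacy boundary in this sketch; the raw samples behind `loss_grad` stay on the client. Setting `mu = 0` removes the proximal pull and reduces the local step to plain gradient descent, as in FedAvg.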
Fig 3.
FedEmoNet local model architecture.
Three parallel branches: (1) CNN Branch processing spectral features through Conv2D layers with ReLU, batch normalization, MaxPool, and AdaptiveAvgPool; (2) TCN Branch processing PSR features at three scales through dilated causal convolutions with dilation rates d = 1, 2, 4; (3) Dense Branch processing handcrafted features. All branches produce embeddings that are fused via Multi-Head Attention and processed through N = 6 Transformer encoder blocks. The Classification Head combines Max Pool, Mean Pool, and Last State representations.
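To make the dilated causal convolution in the TCN Branch concrete, the sketch below applies a single 1-D filter with zero padding on the left, so each output depends only on current and past inputs. The kernel values and the kernel size of 3 are assumptions for illustration; the caption specifies only the dilation rates.

```python
from typing import List, Sequence


def causal_dilated_conv1d(x: Sequence[float],
                          kernel: Sequence[float],
                          dilation: int) -> List[float]:
    """Compute y[t] = sum_j kernel[j] * x[t - j * dilation].

    Inputs before the start of the sequence are treated as zero,
    so y[t] never looks at x[t + 1] or later (causality).
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = t - j * dilation
            if idx >= 0:
                acc += k * x[idx]
        out.append(acc)
    return out


def layer_span(kernel_size: int, dilation: int) -> int:
    """Number of time steps covered by one dilated layer."""
    return 1 + (kernel_size - 1) * dilation


# With an assumed kernel size of 3, the three dilation rates cover
# 3, 5, and 9 time steps respectively.
spans = [layer_span(3, d) for d in (1, 2, 4)]
```

Larger dilation widens the temporal context without adding parameters, which is how one filter shape can serve several time scales.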
Fig 4.
(a) Fitness convergence showing global best and swarm mean stabilizing by iteration 35; (b) Feature count reduction from 150 to 103 selected features; (c) Computational cost breakdown.
Fig 5.
PSO-optimized feature selection pipeline.
Starting with 150 features, ensemble methods generate rankings aggregated via Borda count. PSO with 20 particles optimizes the feature subset iteratively using a sigmoid transfer function for binary selection.
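The two mechanisms named in this caption can be sketched as follows. This is a hedged illustration under stated assumptions: the Borda variant awards n − 1 points to the top-ranked feature down to 0 for the last, ties are broken by feature index, and the binary PSO step selects a feature when a uniform random draw falls below the sigmoid of its velocity. The paper's exact scoring and update rules may differ, and all function names are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence


def borda_aggregate(rankings: Sequence[Sequence[int]]) -> List[int]:
    """Merge several rankings (best feature first) into one by total Borda points."""
    n = len(rankings[0])
    scores: Dict[int, int] = {}
    for ranking in rankings:
        for position, feature in enumerate(ranking):
            scores[feature] = scores.get(feature, 0) + (n - 1 - position)
    # Highest score first; ties broken by feature index (an assumption).
    return sorted(scores, key=lambda f: (-scores[f], f))


def sigmoid(v: float) -> float:
    """Transfer function mapping a real-valued velocity to a probability."""
    return 1.0 / (1.0 + math.exp(-v))


def binarize(velocity: Sequence[float], rng: random.Random) -> List[int]:
    """Turn a particle's velocity into a 0/1 feature mask."""
    return [1 if rng.random() < sigmoid(v) else 0 for v in velocity]


# Three hypothetical ranking methods over three features.
merged = borda_aggregate([[0, 1, 2], [1, 0, 2], [1, 2, 0]])
mask = binarize([100.0, -100.0, 100.0], random.Random(0))
```

In a full optimizer, each of the 20 particles would hold one such mask over the 150 candidate features, and the fitness of a mask would be evaluated on the features it keeps.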
Table 3.
PSO-optimized feature selection results.
Table 4.
Dataset partitioning for experimental evaluation.
Table 5.
Comparison of XAI methods used in FedEmoNet.
Fig 6.
(a) SHAP feature attribution for an anger sample; (b) LIME feature contribution for the same sample; (c) Strong agreement between SHAP and LIME importance values (r = 0.997).
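The agreement value in panel (c) is a correlation between two importance vectors. Assuming it is a Pearson coefficient (the caption does not name the statistic), it can be computed as in the sketch below; the input vectors here are hypothetical placeholders, not values from the paper.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-feature importance scores from two explainers.
shap_importance = [0.10, 0.20, 0.30, 0.40]
lime_importance = [0.20, 0.40, 0.60, 0.80]
agreement = pearson_r(shap_importance, lime_importance)
```

A value near 1 means the two explainers rank and scale feature importance almost identically for the sample.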
Fig 7.
Comprehensive explainability analysis.
(a) Emotion-specific feature importance via LIME; (b) Multi-head attention weights; (c) SHAP feature impact distribution; (d) Cross-corpus feature consistency (r = 0.94); (e) Learned emotion embedding space via t-SNE; (f) Temporal attention pattern analysis.
Fig 8.
LIME explanation examples for individual samples.
Red bars indicate negative contributions and green bars indicate positive contributions. Feature indices correspond to PSO-selected features.
Table 6.
Classification performance on EmoDB (107 test samples).
Fig 9.
(a) EmoDB (99.07%, 107 samples): single misclassification Sadness→Neutral; (b) RAVDESS (98.96%, 288 samples): three errors between acoustically similar pairs; (c) CREMA-D cross-corpus (68.15%, 1,488 samples): high-arousal emotions show stronger transfer.
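The accuracies in panels (a) and (b) follow directly from the sample and error counts stated in the caption, as this short check shows. The helper name is hypothetical.

```python
def accuracy_percent(total: int, errors: int) -> float:
    """Accuracy as a percentage, rounded to two decimals."""
    return round(100.0 * (total - errors) / total, 2)


emodb = accuracy_percent(107, 1)    # 106 of 107 correct
ravdess = accuracy_percent(288, 3)  # 285 of 288 correct
```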
Table 7.
Classification performance on RAVDESS (288 test samples).
Fig 10.
Per-emotion cross-corpus performance on CREMA-D.
(a) Detailed metrics per emotion; (b) Arousal-based analysis: high-arousal emotions (71.9%) transfer significantly better than low-arousal emotions (62.1%).
Table 8.
Per-emotion performance on CREMA-D (cross-corpus, 1,488 samples).
Fig 11.
t-SNE visualization of feature distributions across datasets.
(a) Dataset-colored view showing domain shift between EmoDB, RAVDESS, and CREMA-D; (b) Emotion-colored view revealing cross-dataset clustering for high-arousal emotions; (c) Domain shift visualization highlighting CREMA-D relative to training data.
Fig 12.
Reduced training data ablation.
(a) Performance vs. training data fraction showing monotonic improvement, ruling out memorization; (b) Performance degradation quantification.
Table 9.
Statistical validation (10-fold CV).
Fig 13.
(a) 10-fold CV comparison; (b) 95% confidence intervals; (c) Paired t-test significance; (d) Distribution across folds; (e) Effect size analysis; (f) ANOVA results (F = 78.45, p < 0.001).
Fig 14.
Federated learning training dynamics.
(a) Global accuracy convergence; (b) Loss on logarithmic scale; (c) Per-client heterogeneous convergence; (d) FedProx vs FedAvg comparison.
Fig 15.
Detailed FedProx protocol analysis.
(a) FedProx vs FedAvg convergence; (b) Proximal coefficient sensitivity ( optimal); (c) Non-IID distribution across 5 clients; (d) Client model drift; (e) DP accuracy-privacy trade-off; (f) Algorithm specification.
Table 10.
Ablation study results.
Fig 16.
Ablation study visualization for (a) EmoDB and (b) RAVDESS.
PSO feature selection, Transformer blocks, and FedProx provide the largest contributions.
Table 11.
Comparison with state-of-the-art methods.
Fig 17.
(a) DP accuracy-privacy trade-off; (b) Gradient clipping impact; (c) Noise distribution; (d) Membership inference resistance (AUC → 0.52); (e) Communication efficiency (67% reduction).
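Panels (b) and (c) refer to the two steps of a typical differentially private update: clipping the gradient to a bounded L2 norm, then adding calibrated noise. The sketch below assumes Gaussian noise with standard deviation `noise_multiplier * max_norm`, a common choice that the caption does not confirm; all names and parameters are hypothetical.

```python
import math
import random
from typing import List, Sequence


def clip_by_l2_norm(grad: Sequence[float], max_norm: float) -> List[float]:
    """Scale the gradient down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def add_gaussian_noise(grad: Sequence[float],
                       max_norm: float,
                       noise_multiplier: float,
                       rng: random.Random) -> List[float]:
    """Add zero-mean Gaussian noise scaled to the clipping bound (assumed mechanism)."""
    sigma = noise_multiplier * max_norm
    return [g + rng.gauss(0.0, sigma) for g in grad]


def privatize(grad: Sequence[float],
              max_norm: float,
              noise_multiplier: float,
              rng: random.Random) -> List[float]:
    """Clip first, then add noise, so the noise scale matches the sensitivity bound."""
    clipped = clip_by_l2_norm(grad, max_norm)
    return add_gaussian_noise(clipped, max_norm, noise_multiplier, rng)
```

A larger `noise_multiplier` strengthens the privacy guarantee at the cost of accuracy, which is the trade-off plotted in panel (a).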
Table 12.
Computational efficiency metrics.