FedEmoNet: Privacy-preserving federated learning with TCN-Transformer fusion for cross-corpus speech emotion recognition

doi:10.1371/journal.pone.0342953

Fig 3

FedEmoNet local model architecture.

The model has three parallel branches: (1) a CNN Branch that processes spectral features through Conv2D layers with ReLU, batch normalization, MaxPool, and AdaptiveAvgPool; (2) a TCN Branch that processes PSR features at three scales through dilated causal convolutions with dilation rates d = 1, 2, 4; and (3) a Dense Branch that processes handcrafted features. The embeddings from all three branches are fused via Multi-Head Attention and then passed through N = 6 Transformer encoder blocks. The Classification Head combines Max Pool, Mean Pool, and Last State representations.

doi: https://doi.org/10.1371/journal.pone.0342953.g003
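
Two operations named in the caption, the dilated causal convolution of the TCN Branch and the pooling in the Classification Head, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The kernel size of 3, the function names, and the toy inputs are assumptions made for illustration. The caption also does not say whether the three dilation rates are stacked in sequence or applied in parallel, one per scale, so the receptive-field helper covers the stacked case only; for a single parallel layer the receptive field would be 1 + (k - 1) * d.

```python
from typing import List, Sequence


def causal_dilated_conv1d(
    x: Sequence[float], kernel: Sequence[float], dilation: int
) -> List[float]:
    """Compute y[t] = sum_k kernel[k] * x[t - k * dilation].

    Samples before the start of the sequence are treated as zero, so
    y[t] never depends on inputs later than t (the causal property).
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def stacked_receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field of causal layers applied one after another (assumed layout)."""
    return 1 + (kernel_size - 1) * sum(dilations)


def pooled_representation(states: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate max pool, mean pool (both over time), and the last state."""
    dims = range(len(states[0]))
    max_pool = [max(s[d] for s in states) for d in dims]
    mean_pool = [sum(s[d] for s in states) / len(states) for d in dims]
    last_state = list(states[-1])
    return max_pool + mean_pool + last_state


if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 7
    # Hypothetical kernel of size 3 with unit weights, dilation d = 2.
    print(causal_dilated_conv1d(impulse, [1.0, 1.0, 1.0], dilation=2))
    # Dilation rates from the caption, assumed stacked, assumed kernel size 3.
    print(stacked_receptive_field(3, [1, 2, 4]))
    # Two time steps of a 2-dimensional hidden state (toy values).
    print(pooled_representation([[1.0, 4.0], [3.0, 2.0]]))
```

With a unit impulse as input, the dilation-2 layer responds at time steps 0, 2, and 4, which shows how dilation spaces out the taps without looking ahead. Under the stacking assumption, the three layers together cover 15 past time steps. The pooled vector is three times the hidden dimension because the three summaries are concatenated.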