Fig 1.
Architecture of the federated learning system.
The system consists of edge nodes, IoT devices, and a cloud server. The arrows indicate the flow of model parameters between edge nodes and the server during federated training, illustrating the decentralized and privacy-preserving learning process.
Fig 2.
General framework of the proposed algorithm.
The diagram illustrates the three core modules: feature extraction, feature fusion, and feature decision. It also depicts the process flow from local model training on edge nodes to global model aggregation on the central server.
Fig 3.
Architecture of the hybrid LSTM-CNN text feature extraction sub-network.
The input text passes through an embedding layer, followed by convolutional filters for local feature extraction, max-pooling for dimensionality reduction, and an LSTM layer to capture sequential dependencies. The output is a context-aware text representation used for subsequent multimodal fusion.
Fig 4.
Multimodal data fusion based on Tucker decomposition.
The three factor matrices hold the audio, visual, and text features, while M represents the learnable core tensor. The symbol “×” indicates mode-n tensor–matrix multiplication, and M × x illustrates the intermediate fusion state during successive mode-wise multiplications. The final fused multimodal representation is denoted as K.
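The successive mode-wise multiplications described in the Fig 4 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a three-way core tensor and one factor matrix per modality, and the names `mode_n_product`, `tucker_fuse`, `audio`, `visual`, and `text`, as well as all dimensions, are hypothetical. Only `core` (M in the caption) and the fused result `K` correspond to symbols given in the text.

```python
import numpy as np

def mode_n_product(tensor, matrix, mode):
    """Mode-n product: multiply `tensor` along axis `mode` by `matrix`.

    `matrix` has shape (new_dim, old_dim), where old_dim == tensor.shape[mode].
    """
    # Contract the tensor's `mode` axis with the matrix's second axis;
    # tensordot appends the new axis last, so move it back to position `mode`.
    contracted = np.tensordot(tensor, matrix, axes=(mode, 1))
    return np.moveaxis(contracted, -1, mode)

def tucker_fuse(core, audio, visual, text):
    """Fuse three modalities by multiplying the core tensor mode by mode."""
    fused = core
    for mode, factor in enumerate((audio, visual, text)):
        # After each step `fused` is an intermediate fusion state.
        fused = mode_n_product(fused, factor, mode)
    return fused

# Hypothetical sizes, chosen only for illustration.
rng = np.random.default_rng(0)
core = rng.standard_normal((2, 3, 4))   # learnable core tensor M
audio = rng.standard_normal((5, 2))     # audio feature matrix (assumed shape)
visual = rng.standard_normal((6, 3))    # visual feature matrix (assumed shape)
text = rng.standard_normal((7, 4))      # text feature matrix (assumed shape)

K = tucker_fuse(core, audio, visual, text)
print(K.shape)  # (5, 6, 7)
```

Each mode-n product replaces one axis of the core tensor with the corresponding modality dimension, so the order of the three multiplications does not change the final result.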
Fig 5.
Federated learning process for multimodal data fusion.
Each node performs local feature extraction and fusion, followed by uploading model updates to the central server. The server aggregates these updates to refine the global model, which is then redistributed to the nodes for the next training round, ensuring privacy preservation and collaborative learning.
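The server-side aggregation step in the Fig 5 caption can be sketched as below. The caption does not state the aggregation rule, so this sketch assumes a FedAvg-style average weighted by each node's local sample count. The function name `federated_average` and the flat-list representation of model weights are hypothetical choices for illustration.

```python
def federated_average(client_updates):
    """Aggregate node updates into a global model.

    client_updates: list of (weights, num_samples) pairs, where `weights`
    is a flat list of floats. Assumes sample-count weighting (FedAvg-style),
    which the caption does not confirm.
    """
    total_samples = sum(num for _, num in client_updates)
    dim = len(client_updates[0][0])
    global_weights = [0.0] * dim
    for weights, num in client_updates:
        share = num / total_samples  # this node's contribution to the average
        for i, w in enumerate(weights):
            global_weights[i] += share * w
    return global_weights

# Two hypothetical nodes with different amounts of local data.
updates = [([1.0, 2.0], 10), ([3.0, 4.0], 30)]
global_model = federated_average(updates)
print(global_model)
```

In a full training round, the returned weights would be redistributed to the nodes as the starting point for the next round of local training.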
Table 1.
MAP values of different algorithms.
Fig 6.
Comparison results of different algorithms on CombSUM.
Fig 7.
Comparison results of different algorithms on CombMNZ.
Fig 8.
Comparison results of different algorithms on MR.
Table 2.
Multimodal performance comparison on CMU-MOSI.
Fig 9.
Performance comparison of the proposed method and all-member system fusion.
Table 3.
Ablation study results.