Federated high order tensor fusion for privacy preserving multimodal social media analysis

doi:10.1371/journal.pone.0344980

Fig 1.

Architecture of the federated learning system.

The system consists of edge nodes, IoT devices, and a cloud server. The arrows indicate the flow of model parameters between edge nodes and the server during federated training, illustrating the decentralized and privacy-preserving learning process.

Fig 2.

General framework of the proposed algorithm.

The diagram illustrates the three core modules: feature extraction, feature fusion, and feature decision. The process flow from local model training on edge nodes to global model aggregation on the central server is depicted.

Fig 3.

Architecture of the hybrid LSTM-CNN text feature extraction sub-network.

The input text passes through an embedding layer, followed by convolutional filters for local feature extraction, max-pooling for dimensionality reduction, and an LSTM layer to capture sequential dependencies. The output is a context-aware text representation used for subsequent multimodal fusion.
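The layer order in this caption can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' implementation: the class name `CnnLstmTextEncoder`, every hyperparameter (vocabulary size, embedding width, channel count, kernel and pool sizes, hidden size), the ReLU activation, and the choice of the last LSTM hidden state as the text representation are all assumptions made for the example.

```python
import torch
from torch import nn


class CnnLstmTextEncoder(nn.Module):
    """Hypothetical embedding -> Conv1d -> max-pool -> LSTM text encoder."""

    def __init__(self, vocab_size=1000, embed_dim=64, conv_channels=32,
                 kernel_size=3, pool_size=2, hidden_dim=48):
        super().__init__()
        # All sizes here are illustrative assumptions, not values from the paper.
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.conv = nn.Conv1d(embed_dim, conv_channels, kernel_size,
                              padding=kernel_size // 2)
        self.pool = nn.MaxPool1d(pool_size)
        self.lstm = nn.LSTM(conv_channels, hidden_dim, batch_first=True)

    def forward(self, token_ids):
        x = self.embedding(token_ids)      # (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)              # Conv1d expects (batch, channels, seq_len)
        x = torch.relu(self.conv(x))       # local n-gram features (ReLU is assumed)
        x = self.pool(x)                   # shortens the sequence by pool_size
        x = x.transpose(1, 2)              # back to (batch, seq_len', channels)
        _, (h_n, _) = self.lstm(x)         # h_n: (num_layers, batch, hidden_dim)
        return h_n[-1]                     # last hidden state as text representation


encoder = CnnLstmTextEncoder()
token_ids = torch.randint(1, 1000, (4, 20))   # 4 dummy sentences of 20 token ids
text_features = encoder(token_ids)
```

Placing the pooling step before the LSTM, as the caption describes, halves the number of time steps the recurrent layer has to process in this configuration.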

Fig 4.

Multimodal data fusion based on Tucker decomposition.

The three inputs are the audio, visual, and text feature matrices, and M represents the learnable core tensor. The symbol “×” indicates mode-n tensor–matrix multiplication, and M × x illustrates the intermediate fusion state during successive mode-wise multiplications. The final fused multimodal representation is denoted as K.
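The successive mode-n multiplications can be illustrated with a short NumPy sketch. The shapes, the random inputs, and the names `mode_n_product`, `tucker_fuse`, `core`, `audio`, `visual`, and `text` are assumptions chosen for the example; only the roles of the core tensor M and the fused result K come from the caption.

```python
import numpy as np


def mode_n_product(tensor, matrix, mode):
    """Multiply `tensor` along axis `mode` by `matrix` of shape (new_dim, old_dim)."""
    # tensordot contracts the chosen tensor axis with the matrix columns and
    # appends the new axis last, so it is moved back to position `mode`.
    out = np.tensordot(tensor, matrix, axes=(mode, 1))
    return np.moveaxis(out, -1, mode)


def tucker_fuse(core, audio, visual, text):
    """Apply mode-0, mode-1, and mode-2 products to the core tensor in turn."""
    fused = core
    for mode, factor in enumerate((audio, visual, text)):
        fused = mode_n_product(fused, factor, mode)  # intermediate fusion state
    return fused


rng = np.random.default_rng(0)
core = rng.standard_normal((4, 5, 6))    # stands in for the learnable core M
audio = rng.standard_normal((8, 4))      # illustrative audio feature matrix
visual = rng.standard_normal((9, 5))     # illustrative visual feature matrix
text = rng.standard_normal((10, 6))      # illustrative text feature matrix

K = tucker_fuse(core, audio, visual, text)
# Cross-check against a single einsum contraction over all three modes.
reference = np.einsum("abc,ia,jb,kc->ijk", core, audio, visual, text)
```

Because mode-n products along different modes commute, the order in which the three modalities are multiplied into the core does not change K.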

Fig 5.

Federated learning process for multimodal data fusion.

Each node performs local feature extraction and fusion, followed by uploading model updates to the central server. The server aggregates these updates to refine the global model, which is then redistributed to the nodes for the next training round, ensuring privacy preservation and collaborative learning.
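The server-side aggregation step can be sketched as a weighted parameter average. The caption does not state the aggregation rule, so this example assumes FedAvg-style weighting by local sample count; the function name `aggregate`, the parameter name `"w"`, and the toy values are hypothetical.

```python
import numpy as np


def aggregate(client_params, client_sizes):
    """Average per-client parameter dicts, weighted by local sample counts.

    Assumes every client reports the same parameter names and shapes.
    """
    total = float(sum(client_sizes))
    return {
        name: sum(params[name] * (n / total)
                  for params, n in zip(client_params, client_sizes))
        for name in client_params[0]
    }


# Two hypothetical nodes upload locally trained parameters; no raw data is sent.
node_a = {"w": np.array([1.0, 3.0])}   # trained on 1 local sample
node_b = {"w": np.array([3.0, 7.0])}   # trained on 3 local samples

global_params = aggregate([node_a, node_b], [1, 3])
# The server would then send `global_params` back to every node for the next round.
```

Only model parameters cross the network in this scheme, which is what keeps the raw multimodal data on the nodes.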
