Fig 1.
Schematic diagram of the overall MDKG-RL architecture, showing multimodal archival data processing, knowledge graph construction, and retrieval optimization.
Multimodal archival data (text, image, and audio) pass in turn through the multimodal Transformer feature extraction module, the graph neural network and knowledge graph fusion module, and the deep reinforcement learning retrieval optimization module; the figure highlights the flow of data and the collaboration between these modules.
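To make the Fig 1 data flow concrete, the following minimal Python sketch chains the three modules; the class and method names (MDKGRLPipeline, build_graph, retrieve_subgraphs, rank) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the Fig 1 pipeline (illustrative only; module internals,
# names, and signatures are assumptions, not the paper's implementation).
from dataclasses import dataclass

@dataclass
class ArchivalRecord:
    text: str
    image_path: str
    audio_path: str

class MDKGRLPipeline:
    """Chains the three MDKG-RL stages: feature extraction -> KG fusion -> DRL retrieval."""

    def __init__(self, feature_extractor, kg_fusion, drl_retriever):
        self.feature_extractor = feature_extractor   # multimodal Transformer module (Fig 2)
        self.kg_fusion = kg_fusion                   # GNN + knowledge graph module (Fig 3)
        self.drl_retriever = drl_retriever           # PPO-based retrieval module (Fig 4)

    def index(self, records):
        # Extract fused multimodal features and build / update the knowledge graph.
        features = [self.feature_extractor(r) for r in records]
        self.kg_fusion.build_graph(features)

    def search(self, query, history):
        # The DRL module ranks candidate subgraphs returned by the KG module.
        candidates = self.kg_fusion.retrieve_subgraphs(query)
        return self.drl_retriever.rank(query, history, candidates)
```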
Fig 2.
Working principle of the multimodal Transformer feature extraction module, showing the feature extraction and fusion process for text, image, and audio data. Text features are extracted with the BERT model, image features with a Vision Transformer (ViT), and audio features with an Audio Transformer; the modality-specific features are then fused through a joint attention mechanism.
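A hedged sketch of the fusion step described in Fig 2: the modality-specific encoders (BERT, ViT, Audio Transformer) are stubbed with linear projections, and the joint attention is approximated with standard multi-head attention over the concatenated modality tokens. Dimensions and layer choices are assumptions for illustration.

```python
# Illustrative joint-attention fusion over pre-extracted modality features.
import torch
import torch.nn as nn

class JointAttentionFusion(nn.Module):
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        # Project each modality's encoder output into a shared d_model space.
        self.text_proj = nn.Linear(768, d_model)    # e.g. BERT hidden size
        self.image_proj = nn.Linear(768, d_model)   # e.g. ViT hidden size
        self.audio_proj = nn.Linear(512, d_model)   # assumed audio encoder size
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, text_feat, image_feat, audio_feat):
        # Each input: (batch, seq_len, encoder_dim) from the respective Transformer.
        tokens = torch.cat(
            [self.text_proj(text_feat), self.image_proj(image_feat), self.audio_proj(audio_feat)],
            dim=1,
        )
        # Attention across all modality tokens, then mean-pool to a single fused vector.
        fused, _ = self.attn(tokens, tokens, tokens)
        return self.norm(fused).mean(dim=1)

# Example with random stand-ins for the encoder outputs.
fusion = JointAttentionFusion()
out = fusion(torch.randn(2, 16, 768), torch.randn(2, 49, 768), torch.randn(2, 32, 512))
print(out.shape)  # torch.Size([2, 256])
```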
Fig 3.
Schematic diagram of construction and reasoning in the graph neural network and knowledge graph fusion module: entity alignment during graph construction (multimodal entity representations are mapped into a unified space with the TransE model), relational reasoning (neighbor node features are aggregated with the GraphSAGE model), and logical rule injection with external knowledge fusion (via the SWRL rule engine and access to a general knowledge base).
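The two components of Fig 3 with standard formulations can be sketched as follows: the TransE triple score used for entity alignment and a GraphSAGE-style mean aggregator for relational reasoning. Layer sizes and the dense adjacency matrix are illustrative assumptions; SWRL rule injection and external knowledge-base access are not shown.

```python
# TransE scoring and a GraphSAGE-style mean-aggregation layer (illustrative).
import torch
import torch.nn as nn

def transe_score(head, relation, tail, p=2):
    # TransE models a triple (h, r, t) as h + r ≈ t; a lower distance means more plausible.
    return torch.norm(head + relation - tail, p=p, dim=-1)

class SAGEMeanLayer(nn.Module):
    """One GraphSAGE layer with mean aggregation: h_v' = ReLU(W [h_v ; mean(h_N(v))])."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(2 * in_dim, out_dim)

    def forward(self, node_feats, adjacency):
        # node_feats: (num_nodes, in_dim); adjacency: (num_nodes, num_nodes) 0/1 matrix.
        degree = adjacency.sum(dim=1, keepdim=True).clamp(min=1)
        neighbor_mean = adjacency @ node_feats / degree
        return torch.relu(self.linear(torch.cat([node_feats, neighbor_mean], dim=1)))

# Toy usage: 4 entities in a 16-dim space on a small ring graph.
h = torch.randn(4, 16)
adj = torch.tensor([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], dtype=torch.float)
layer = SAGEMeanLayer(16, 16)
print(layer(h, adj).shape)                        # torch.Size([4, 16])
print(transe_score(h[0], torch.randn(16), h[1]))  # scalar plausibility distance
```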
Fig 4.
Operational mechanism of the deep reinforcement learning retrieval optimization module. The user query vector, historical interaction records, and knowledge graph subgraph representation are encoded into the state space, a reward function is designed, and the policy network is trained with the proximal policy optimization (PPO) algorithm to optimize the retrieval strategy based on user interactions.
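A minimal sketch of the Fig 4 state encoding together with the standard PPO clipped surrogate loss; the state layout, network sizes, action space, and reward shaping are assumptions for illustration, and only the clipped-loss formula follows the conventional PPO definition.

```python
# State encoding (query + history + KG subgraph) and PPO clipped surrogate loss (illustrative).
import torch
import torch.nn as nn

class RetrievalPolicy(nn.Module):
    def __init__(self, query_dim=256, hist_dim=128, graph_dim=256, n_actions=50):
        super().__init__()
        state_dim = query_dim + hist_dim + graph_dim
        self.net = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(), nn.Linear(256, n_actions))

    def forward(self, query_vec, history_vec, subgraph_vec):
        # State = concatenation of query, interaction-history, and KG-subgraph encodings.
        state = torch.cat([query_vec, history_vec, subgraph_vec], dim=-1)
        return torch.distributions.Categorical(logits=self.net(state))

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, eps=0.2):
    # L = -E[min(r * A, clip(r, 1-eps, 1+eps) * A)], with r the probability ratio.
    ratio = torch.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# Toy step: sample a ranking action and compute the loss against stored rollout data.
policy = RetrievalPolicy()
dist = policy(torch.randn(8, 256), torch.randn(8, 128), torch.randn(8, 256))
actions = dist.sample()
loss = ppo_clip_loss(dist.log_prob(actions), dist.log_prob(actions).detach(), torch.randn(8))
print(loss.item())
```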
Table 1.
Details of public datasets used in the experiment (ICDAR 2023 & AIDA Corpus).
Table 2.
Experimental environment configuration details.
Table 3.
Details of MDKG-RL model performance evaluation metrics.
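For reference, the two ranking metrics that recur in Tables 4–5 and Fig 5 have the standard definitions below; the cut-off k and the exact gain formulation used in the paper are not specified here, so this is only the conventional form.

```latex
\mathrm{MRR} = \frac{1}{|Q|}\sum_{i=1}^{|Q|}\frac{1}{\mathrm{rank}_i},
\qquad
\mathrm{NDCG@}k = \frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k},
\quad \text{with} \quad
\mathrm{DCG@}k = \sum_{j=1}^{k}\frac{2^{\mathrm{rel}_j}-1}{\log_2(j+1)}.
```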
Table 4.
Comparative performance of different models on the ICDAR 2023 and AIDA Corpus datasets based on MRR, NDCG, response time, and ELA.
Table 5.
Statistical significance of performance improvements between MDKG-RL and BERT-GNN models across multiple metrics.
Fig 5.
Performance comparison of different models on ICDAR 2023 and AIDA Corpus datasets (MRR, NDCG, response time, entity linking accuracy).
Table 6.
Ablation study results of the MDKG-RL model on the ICDAR 2023 and AIDA Corpus datasets: comparison of performance metrics as components are removed.
Fig 6.
Performance comparison of the MDKG-RL component ablation experiments (ICDAR 2023 vs. AIDA Corpus).
Fig 7.
Heatmap of the probability distribution of retrieval result ranking positions for MDKG-RL and Baseline models on the ICDAR 2023 dataset.
Fig 8.
Boxplot comparing the response time distribution of MDKG-RL, BERT-GNN, and Baseline models on ICDAR 2023 and AIDA Corpus datasets.
Fig 9.
Dynamic changes in reward values and retrieval accuracy over training iterations during deep reinforcement learning (DRL) training.
Fig 10.
Confusion matrix of the MDKG-RL model's entity linking error distribution on the AIDA Corpus dataset, covering core entities, long-tail entities, cross-modal ambiguity, events, locations, and organizations.