Abstract
Knowledge graph completion (KGC) is a fundamental task for improving downstream applications such as semantic search and question answering. Effective KGC requires integrating structural and description information so that the two sources compensate for each other's weaknesses (e.g., long-tail issues or overlooked structural knowledge). Existing work typically integrates the two at the embedding level by feeding structure embeddings into pre-trained language models (PLMs) and coupling them via attention mechanisms, which ensures complementarity. However, many KG entities are multi-semantic: they exhibit semantics beyond their descriptions in certain triplets, which PLMs struggle to learn, and current embedding-level coupling approaches fail to transfer the multi-semantic knowledge learned by the structure model to the PLM, so the integration effect can be further improved. To alleviate this issue, we propose AKD-KGC, which realizes this knowledge transfer and enhances the integration effect by adding a teaching-learning procedure based on adaptive knowledge distillation during feature integration for the KGC task. The AKD-KGC framework integrates the two features at the embedding level while using the structural model to guide the prediction behavior of the integration model, adjusting the weight of the PLM through additional supervision and enhancing its learning of entity semantics beyond descriptions. AKD-KGC can be applied in both transductive and inductive settings and achieves state-of-the-art results on a large number of datasets in both settings, demonstrating the effectiveness of our method. Our code and datasets are available at https://github.com/liqingsong1227/AKD-KGC.
Citation: Li Q, Lv Y, Wei X, Li C, Wei L, Zhang J, et al. (2026) Adaptive knowledge distillation based structure-text embedding integrating for knowledge graph completion. PLoS One 21(3): e0344363. https://doi.org/10.1371/journal.pone.0344363
Editor: Issa Atoum, Philadelphia University, JORDAN
Received: September 11, 2025; Accepted: February 19, 2026; Published: March 19, 2026
Copyright: © 2026 Li et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Our code and datasets are available at https://github.com/liqingsong1227/AKD-KGC.
Funding: This work is supported by the Zhongguancun Laboratory, National Natural Science Foundation of China (Grant Nos. 62276013, 62141605, 62050132), the Beijing Natural Science Foundation (Grant No. 1192012), the Hebei Provincial Natural Science Foundation, China (Grant No. F2020111001), and the Fundamental Research Funds for the Central Universities. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
A knowledge graph (KG) is a type of structured semantic knowledge base used to symbolically represent concepts in the physical world and their interconnections, with extensive applications in artificial intelligence, such as semantic search [1], question answering [2], and recommendation systems [3,4]. Despite including millions of entities and triplets, many KGs remain incomplete due to the continuous emergence of new knowledge. To address this challenge, researchers have focused on knowledge graph completion (KGC). Specifically, KGC tasks can be categorized into transductive and inductive settings, depending on whether new entities appear in the test data.
KGC tasks can be formulated as predicting a target value v given a query q, where the predicted value is selected by the plausibility of the (query, value) pair. Existing methods for computing plausibility can be roughly divided into two categories: structure-based and description-based. Structure-based methods concentrate on learning embeddings that capture the global structure features of massive positive and negative query-value pairs with different score functions [5–7], while description-based methods utilize pre-trained language models (PLMs) to learn contextual semantics from entity descriptions [8,9]. Structural methods often suffer from the long-tail issue, and description-based methods are limited by overlooking global structural information. Recently, many researchers have explored the integration of both approaches, enabling the two types of information to play different roles in different query scenarios and compensate for each other's shortcomings [10–12]. They focus on integrating the two types of information at the embedding level, feeding structure embeddings into a PLM and dynamically coupling the two through attention mechanisms, which achieves complementarity between the two types of information.
However, these integrating methods have limitations caused by the inherent shortcomings of PLMs when dealing with multi-semantic entities. Taking Fig 1 as an example, the correct value entities for the query (Master of Arts, /education /educational degree /people with this degree. /education /education /major field of study, ?) should be major fields of study that correspond to the Master of Arts degree. When using a PLM to embed the query and values, only value entities whose descriptions contain text about a major field can satisfy this query. Unfortunately, many entities satisfy this query even though their descriptions say nothing about a major field. For example, the description of English Language only characterizes it as a language, yet it can still carry the semantics of a major field of study in similar triplets such as (Master of Science, /education /educational degree /people with this degree. /education /education /major field of study, English Language). In other words, many entities exhibit semantics beyond their descriptions in certain triplets, and these semantics are difficult for PLMs to learn. Although structure-based methods can acquire such additional semantics via entity correlations, existing embedding-level approaches fail to fully leverage this knowledge: they restrict the multi-semantic knowledge to a supplementary role, preventing its transfer to improve the PLM's intrinsic prediction behavior. Addressing this limitation suggests that the performance of current methods can be substantially enhanced.
(a) The property of English Language from the description does not match the given query. (b) The major-field property from the structure matches the given query. When answering the query (Master of Arts, /education /educational degree /people with this degree. /education /education /major field of study, ?), the entity English Language cannot be matched through its description, because the description only contains the semantics of a language, as in sub-figure (a). However, in sub-figure (b), the entity English Language is connected to the entity Master of Science by the same relation as other major fields such as Linguistics, which implies that English Language can carry the semantics of a major field.
The major purpose of this work is to realize this multi-semantic knowledge transfer and enhance the integration effect. First, we compress the knowledge learned by the structural model into its scores for (query, value) pairs, where different scores represent the semantic relevance of different pairs; for an entity, its matching scores with different queries encode its multiple semantics. Next, we let the integration model learn the prediction behavior of the structure model, i.e., learn to predict the scores of (query, value) pairs as the structure model does. Although the description model cannot directly compute correlations beyond the description semantics from the description content alone, the scores of the structural model can adjust the weight of the PLM, so that the PLM module of the fusion model also learns this multi-semantic knowledge. Specifically, we build a novel structure and description integration framework named AKD-KGC, which not only integrates the two features at the embedding level but also adds a teaching-learning procedure based on adaptive knowledge distillation during feature integration, to achieve multi-semantic knowledge transfer and guide the prediction behavior of the integration model for KGC tasks. AKD-KGC employs a path-based GNN to capture KG structural features, encodes queries and values separately using dual BERTs over descriptions, and finally fuses both embeddings. AKD-KGC employs a two-stage training process: first pre-training a structure-only GNN as the teacher model, then using adaptive knowledge distillation to train the integration model, enabling it to autonomously decide whether to learn from the teacher or from the data. AKD-KGC can be applied in both transductive and inductive settings, and has achieved state-of-the-art results on three transductive benchmark datasets (WN18RR, FB15k-237, and CoDEx-M) and two inductive benchmark datasets with four standard splits each (FB15k-237-V1 to V4 and WN18RR-V1 to V4), demonstrating the effectiveness of our method. The contributions of this work can be summarized as follows:
- A novel structure and description integration framework named AKD-KGC is built, which not only integrates the two features at the embedding level but also achieves multi-semantic knowledge transfer and guides the prediction behavior of the integration model for KGC tasks.
- A teaching-learning procedure based on adaptive knowledge distillation is added during feature integration to achieve the above knowledge transfer, enabling the integration model to autonomously decide whether to learn from the teacher or from the data.
- The AKD-KGC framework can be applied in both transductive and inductive settings, and achieves state-of-the-art results on three transductive KGC datasets and two inductive KGC datasets with four standard splits each.
Related work
Knowledge graph completion
Classic knowledge graph completion techniques mostly fall into two categories: structure-based techniques and recently developed description-based techniques.
Structure-based methods
Structure-based methods either use structural statistical characteristics to predict directly, or generate embeddings for entities by defining an embedding matrix or by using graph neural networks to aggregate messages from neighbors, and then use appropriate scoring functions to compute the probability of a triplet.
Structural statistical characteristics mainly include path features, such as Path Ranking [13,14], which directly uses relational paths as symbolic features for prediction, and NeuralLP [15] and DRUM [16], which learn probabilistic logical rules to weight different paths.
Structure embedding methods learn the near-neighbor semantic relationships between entities and are limited by the long-tail problem. These methods can be divided into three categories based on their score functions: translation distance methods, semantic matching methods and neural network methods. The fundamental concept of the translation distance model is to conceptualize the relation between a subject entity and an object entity as a form of translation, whereby the plausibility is determined by the translation distance between the two entities. Representative translation-based methods include TransE [5], TransH [17], TransR [18], TransD [19] and RotatE [20]. The semantic matching model measures the plausibility of a triplet by matching latent semantics in the embedding space. Typical methods include RESCAL [6], DistMult [21], ComplEx [22], QuatE [23], DURA [24] and BLMSearch [25]. The neural network model builds a neural network to compute the plausibility of a triplet, and includes ERMLP [26], NTN [27], ConvE [28] and DiffusionE [29]. To better learn the structural features of a KG, some researchers use graph neural networks to generate more expressive embeddings, such as R-GCN [30], HRAN [31], DisenKGAT [32], Path-RNN [33], NBFNet [7] and StructKGC [34].
Description-based methods
The first description-based KGC method, DKRL [35], uses a convolutional neural network to embed entity descriptions. With the development of natural language processing, large pre-trained language models such as GPT-1/2/3 [36–38], BERT [39] and T5 [40] have become mainstream encoders of natural sentences. Description-based methods also benefit from these language models: KG-BERT [8], LMKE [9], SimKGC [41], MoCoKGC [42] and Csprom-KD-DC [43] use language models as entity and description encoders and obtain prior knowledge from pre-trained weights. Due to the large cost of fine-tuning BERT, these methods are limited in the number of negative samples.
Recently, LASS [10], CSProm-KG [11] and MoCoSA [12] have explored the fusion of structure and description approaches, adopting paradigms where the output of structural models (e.g., TransE [5]) is fed into PLMs like BERT as part of the input. However, these paradigms cannot be used in the inductive setting and risk diluting structure information.
Knowledge distillation
The main idea of knowledge distillation is that a student model mimics a teacher model in order to obtain competitive or even superior performance, and it has received increasing attention from the research community in recent years. It has achieved remarkable success in many fields such as model compression and acceleration [44], multi-modal learning [45] and knowledge graph completion [46]. A more thorough introduction to knowledge distillation can be found in the survey [47].
Materials and methods
In this section, the basic concepts, notation and the details of our proposed framework will be introduced.
Concepts and Notations
- Knowledge Graph and Knowledge Graph Embedding. A knowledge graph (KG) can be described as a tuple G = (E, R, T), where E denotes the entity set, R denotes the relation set, and T ⊆ E × R × E denotes the triple set. A triple (u, r, v) ∈ T denotes a fact or piece of knowledge in the KG, where u and v denote the head and tail entity respectively, and r denotes the relation between them. Knowledge graph embedding aims at learning or calculating a vector representation for each entity and relation, then using these representations for downstream tasks such as knowledge graph completion and triplet classification. In this paper, we use boldface symbols to denote vector representations, such as **u** for the vector representation of entity u, and **E** for the vector representations of all entities in E.
- Knowledge Graph Completion. Knowledge graph completion aims at answering two kinds of queries, (u, r, ?) and (?, r, v), where the second is rewritten as the query (v, r⁻¹, ?), in which r⁻¹ denotes the reverse relation of relation r. We use q = (u, r) to represent a specific query (u, r, ?). Researchers use a score function f(q, v) to calculate the plausibility of all candidate values of a given query and make predictions based on them.
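As a minimal illustration of this notation (entity and relation names are hypothetical), the reverse-relation convention can be sketched as:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    head: str      # u in E
    relation: str  # r in R
    tail: str      # v in E

def build_queries(triples):
    """Turn each fact (u, r, v) into two queries: (u, r, ?) with answer v,
    and (v, r^-1, ?) with answer u, following the reverse-relation convention."""
    queries = []
    for t in triples:
        queries.append(((t.head, t.relation), t.tail))
        queries.append(((t.tail, t.relation + "^-1"), t.head))
    return queries

triples = [Triple("Master of Arts", "major_field_of_study", "Linguistics")]
qs = build_queries(triples)
```

A score function f(q, v) is then evaluated for every candidate value of each query.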
AKD-KGC overview
Specifically, we use NBFNet [7] to learn structural representations that capture the different paths between a query and a value, and learn the description embeddings of the query and value separately using two BERTs. We then integrate the two embeddings and make predictions based on the integrated embedding. To achieve multi-semantic knowledge transfer and guide the prediction behavior of the integration model, we adopt a two-stage training process. First, we train an NBFNet whose output logits serve as soft labels. Subsequently, when training the integrating model, we let it learn not only directly from the data but also from the pre-trained NBFNet through knowledge distillation. Additionally, to prevent the integrating model from being limited by the performance of the NBFNet, we use an adaptive mechanism to balance learning from the teacher and from the ground truth, ensuring it avoids suboptimal convergence due to noisy supervision while maintaining its own performance capacity. The overall framework is shown in Fig 2.
We aim to train a model to answer the query (Master of Arts, /education /educational degree /people with this degree. /education /education /major field of study, ?). The left side shows the input features, including paths in the KG and descriptions of the query and values, where for each positive value we sample several negative samples for the learning of the integrating model. The upper half of the middle is the pre-trained NBFNet, while the lower half is our integrating model, including an NBFNet loaded from the pre-trained one and two BERTs that encode the query and values with descriptions. The right side shows the loss function of our integrating model, including the prediction loss and the distillation loss, which are summed with dynamic weights.
Components of integrating model
Description module
In order to utilize the description semantics of a triplet, we use two pre-trained BERT models to learn the query and value embeddings separately. The description of an entity u (or a relation r), which can be queried from the original database, is a sequence of tokens d_u = (w_1, ..., w_{|d_u|}) that gives a detailed introduction of the entity or relation, where |d_u| denotes the length of the token sequence and d_u denotes the description tokens of entity u. For the embedding of query q = (u, r), the input tokens x_q of the pre-trained BERT model are joined from the entity token and the descriptions of entity u and relation r following a special token [CLS], that is x_q = ([CLS], u, d_u, r, d_r), where u denotes the token of entity u itself. Benefiting from the powerful contextual understanding ability of the BERT model, every query can gain a vector representation containing rich description semantics. Although some entities or relations may lack a description, they can still learn representations from related entities' or relations' descriptions. We use the output embedding of the token [CLS] as the final embedding of query q, recorded as **e**_q, which aggregates the information of all tokens through the attention mechanism. The embedding of value v is calculated similarly: the input tokens of value v are x_v = ([CLS], v, d_v), and we use the output embedding of the token [CLS] as the final embedding of value v, recorded as **e**_v.
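The input assembly and [CLS] pooling described above can be sketched as follows; the exact separator layout is an assumption, and a toy array stands in for a real BERT output, so treat the details as illustrative:

```python
import numpy as np

def build_query_text(entity, ent_desc, relation, rel_desc):
    """Assemble the input for a query q = (u, r): the entity token and its
    description, followed by the relation and its description, after [CLS].
    Separator placement is illustrative."""
    return f"[CLS] {entity} {ent_desc} [SEP] {relation} {rel_desc} [SEP]"

def build_value_text(entity, ent_desc):
    """Assemble the input for a candidate value v."""
    return f"[CLS] {entity} {ent_desc} [SEP]"

def cls_embedding(last_hidden_state):
    """Take the output embedding of the [CLS] token (position 0) as the
    final query/value embedding; attention has aggregated all tokens."""
    return last_hidden_state[:, 0, :]

# toy encoder output: batch of 2 sequences, 5 tokens, hidden size 4
hidden = np.arange(2 * 5 * 4, dtype=float).reshape(2, 5, 4)
e = cls_embedding(hidden)
```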
Path module
We obtain path representations with NBFNet [7], which uses the generalized Bellman-Ford algorithm to overcome the exponential-quantity issue. NBFNet defines the path semantics between the query entity u and a value v as a generalized sum of the path representations between u and v with a commutative summation operator ⊕, where each path representation is a generalized product of the edge representations in the path with the multiplication operator ⊗. It can be formulated as follows:

**h**_q(u, v) = ⊕_{P ∈ P_{uv}} ⊗_{e ∈ P} **w**_q(e)

where P_{uv} represents all paths between u and v and **w**_q(e) represents the edge representation of edge e. This path formulation can be solved by the generalized Bellman-Ford algorithm with three parameterized operators. Specifically, given a query q = (u, r), the conditional representation of each target node v with q is initialized through an INDICATOR operator, then the conditional representation of each node is updated with Bellman-Ford iterations:

**h**_q^(0)(u, v) = INDICATOR(u, v, q)
**h**_q^(t)(u, v) = AGGREGATE({MESSAGE(**h**_q^(t−1)(u, x), **w**_q(x, r′, v)) | (x, r′, v) ∈ E(v)} ∪ {**h**_q^(0)(u, v)})

where 1 ≤ t ≤ L and L is the total number of layers of NBFNet. The INDICATOR operator is defined as **h**_q^(0)(u, v) = **q** if u = v, otherwise **0**. The MESSAGE operator is a translation or scaling operator such as the relational operators used in TransE [5] and DistMult [21], and **w**_q(x, r′, v) represents the conditional edge representation. The AGGREGATE operator can be instantiated as summation, max or min in traditional methods, and we use the principal neighborhood aggregation (PNA) proposed in [48]. It should be noted that all the node representations **h**_q^(t)(u, v) in the iteration above are conditioned on query q; they are pair representations of u and v rather than single node representations. The path semantics between u and v with query q are embedded in the final layer **h**_q^(L)(u, v), and we record it as **h**_{q,v}.
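A minimal sketch of the generalized Bellman-Ford iteration, assuming a DistMult-style elementwise-product MESSAGE and plain summation for AGGREGATE (the actual model parameterizes these operators and uses PNA):

```python
import numpy as np

def bellman_ford(edges, relation_emb, num_nodes, u, q_emb, num_layers):
    """Conditional pair representations h_q(u, v) for all v, computed by
    iterating MESSAGE (elementwise product, as in DistMult) and AGGREGATE (sum).
    edges: list of (head, relation_index, tail)."""
    dim = len(q_emb)
    # INDICATOR: boundary condition h^(0)(v) = q if v == u else 0
    h0 = np.zeros((num_nodes, dim))
    h0[u] = q_emb
    h = h0.copy()
    for _ in range(num_layers):
        new_h = h0.copy()  # keep the boundary condition in every layer
        for x, r, v in edges:
            new_h[v] += h[x] * relation_emb[r]  # MESSAGE, then summed AGGREGATE
        h = new_h
    return h  # h[v] is the pair representation of (u, v) conditioned on q

# tiny 3-node chain: 0 -r0-> 1 -r1-> 2
edges = [(0, 0, 1), (1, 1, 2)]
rel = np.array([[2.0, 2.0], [3.0, 3.0]])
h = bellman_ford(edges, rel, num_nodes=3, u=0, q_emb=np.ones(2), num_layers=2)
```

After two layers, node 2 accumulates the product of the edge representations along the length-2 path from node 0.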
Integrating module
We first fuse the two kinds of semantic features by directly concatenating the semantic embeddings, and we introduce additional node degree information for the final prediction. We feed the concatenated embedding into a two-layer MLP, and the probability that q matches v is

p(v | q) = σ(MLP([g(**e**_q, **e**_v); **h**_{q,v}; deg(v)]))

where σ represents the sigmoid activation function, g represents a merging function which can be instantiated as difference, multiplication, concatenation or their composition, and [·; ·] represents the concatenation operator.
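A toy version of this fusion scorer, with the merging function g instantiated as concatenation and randomly initialized MLP weights (all dimensions are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def match_probability(e_q, e_v, h_qv, deg_v, W1, b1, W2, b2):
    """p(v | q) = sigmoid(MLP([g(e_q, e_v); h_qv; deg])), with the merging
    function g instantiated as plain concatenation."""
    x = np.concatenate([e_q, e_v, h_qv, [deg_v]])   # fuse all features
    hidden = np.maximum(0.0, W1 @ x + b1)           # two-layer MLP, ReLU
    return sigmoid(W2 @ hidden + b2)

rng = np.random.default_rng(0)
d = 4
x_dim = 3 * d + 1                                   # e_q, e_v, h_qv, degree
W1, b1 = rng.normal(size=(8, x_dim)), np.zeros(8)
W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)
p = match_probability(rng.normal(size=d), rng.normal(size=d),
                      rng.normal(size=d), 3.0, W1, b1, W2, b2)
```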
Training process
We use a weighted binary cross entropy (BCE) as the loss for learning from data:

L_pred = −Σ_i w_i (y_i log p_i + (1 − y_i) log(1 − p_i))

where p_i = p(v_i | q), y_i ∈ {0, 1} is the ground-truth label, w_i is the sample weight, and i is selected from all positive and negative value samples for a given query.
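The weighted BCE above can be computed directly; the weights here are illustrative values balancing one positive against two negatives:

```python
import numpy as np

def weighted_bce(p, y, w):
    """Weighted binary cross entropy over all positive and negative value
    samples i of a given query; w_i is a per-sample weight."""
    p = np.clip(p, 1e-12, 1 - 1e-12)  # numerical safety
    return -np.sum(w * (y * np.log(p) + (1 - y) * np.log(1 - p)))

p = np.array([0.9, 0.2, 0.1])   # predicted probabilities
y = np.array([1.0, 0.0, 0.0])   # one positive, two negatives
w = np.array([1.0, 0.5, 0.5])   # illustrative sample weights
loss = weighted_bce(p, y, w)
```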
In the training stage, we first pre-train an NBFNet, then regard our integrating model as the student model and use the pre-trained NBFNet as the teacher model. We use adaptive knowledge distillation to help train the student model with the soft labels from the teacher, where the weights of the path module in the student model are directly loaded from the pre-trained NBFNet to reduce back-propagation cost.
Let the normalized output logits of the student model and the teacher model be **p**_s and **p**_t; the distillation loss can then be described as the KL-divergence between **p**_s and **p**_t. We use the average of the two directions as our final distillation loss:

L_KD = ½ (KL(**p**_t ∥ **p**_s) + KL(**p**_s ∥ **p**_t))

where the KL-divergence is defined as KL(P ∥ Q) = Σ_x P(x) log(P(x) / Q(x)) and P, Q are two arbitrary distributions.
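The symmetric distillation loss can be sketched as:

```python
import numpy as np

def kl(p, q):
    """KL(P || Q) = sum_x P(x) log(P(x) / Q(x)) for discrete distributions."""
    p, q = np.clip(p, 1e-12, None), np.clip(q, 1e-12, None)
    return float(np.sum(p * np.log(p / q)))

def distill_loss(p_student, p_teacher):
    """Average of the two KL directions between the normalized output logits."""
    return 0.5 * (kl(p_teacher, p_student) + kl(p_student, p_teacher))

p_s = np.array([0.7, 0.2, 0.1])
p_t = np.array([0.6, 0.3, 0.1])
l = distill_loss(p_s, p_t)
```

Averaging the two directions makes the loss symmetric, so neither distribution is privileged as the reference.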
Finally, the integrating model has greater potential than the teacher model because it incorporates more description information, so distilling directly from the teacher with a fixed coefficient is suboptimal. Accordingly, we use the performance of the student and teacher models to adaptively adjust the weights with which the student learns from the ground truth or from the soft labels. Specifically, we use the cross entropy (CE) between the prediction results and the ground truth of the student model or the teacher model as a performance indicator, then normalize them with a softmax function:

CE_s = CE(**p**_s, y),  CE_t = CE(**p**_t, y).

The final loss is

L = λ_pred · L_pred + λ_KD · L_KD

where (λ_pred, λ_KD) = softmax(CE_t / τ, CE_s / τ) and τ is a hyperparameter controlling the influence of the CE values on the selected weights.
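One plausible instantiation of the adaptive weighting, consistent with the behavior described above (a softmax over the cross-entropy values scaled by τ); the exact form used in the paper may differ:

```python
import numpy as np

def adaptive_weights(ce_student, ce_teacher, tau):
    """Softmax over CE values scaled by tau: a lower student CE (better
    student) shifts weight toward learning from the data, while a large tau
    flattens the weights toward a fixed 0.5/0.5 coefficient.
    Illustrative instantiation; not the paper's exact formula."""
    logits = np.array([ce_teacher, ce_student]) / tau
    logits -= logits.max()                      # numerical stability
    w = np.exp(logits) / np.exp(logits).sum()
    return w[0], w[1]                           # (weight_pred, weight_distill)

# student better than teacher: more weight on the ground-truth loss
w_pred, w_kd = adaptive_weights(ce_student=0.2, ce_teacher=0.8, tau=0.5)
```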
Experimental setting
Datasets
We evaluated our method in both transductive and inductive knowledge graph completion. For transductive KGC, we used three competitive benchmark datasets: FB15k-237 [49], WN18RR [28], and CoDEx Medium [50], and we used standard transductive splits [28,49,51]. For inductive KGC, we followed splits in [52] for FB15k-237 and WN18RR. Statistics of these datasets are shown in Table 1 and Table 2.
Baselines
We compare our methods against path-based methods, structure-based methods, GNN-based methods, description-based methods and combination-based methods for transductive setting, which includes 16 baselines. For inductive setting, we compare against path-based methods and GNN-based methods, which includes 5 baselines.
Implementation details
Our implementation generally follows the open-source codebases of knowledge graph completion. We augmented each triplet (u, r, v) with a flipped triplet (v, r⁻¹, u). We used the original setting in [7] for pre-training the teacher NBFNet, and used BERT-base as our description module. The learning rates for fine-tuning the PLM and for the other components were set separately. The batch size was set to 24 for WN18RR and 48 for FB15k-237 and CoDEx-M. Our student model was trained on one Tesla A100 GPU for 12 epochs, and we selected the models based on their performance on the validation set.
Evaluation metrics
For the transductive setting, we used the same setting as in much previous work such as [30–32]. Specifically, given a triplet in the test set, we ranked it by the value of the score function f against all candidate triplets, filtering out all correct triplets appearing in the training, validation, and test sets, the same as other work for fair comparison. We used three kinds of metrics: mean rank (MR), mean reciprocal rank (MRR), and Hits@k. For inductive relation prediction, we followed [52] to draw 50 negative triplets for each positive triplet and used the above filtered ranking. We report Hits@10 for comparison.
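The filtered ranking protocol and the derived metrics can be sketched as (value names are hypothetical):

```python
def filtered_rank(scores, target, known_true):
    """Rank of the target value under the filtered setting: all other known
    correct values are removed from the candidate list before ranking.
    scores: dict mapping value -> score (higher is better)."""
    s_t = scores[target]
    rank = 1
    for v, s in scores.items():
        if v != target and v not in known_true and s > s_t:
            rank += 1
    return rank

def mrr_and_hits(ranks, k):
    """Mean reciprocal rank and Hits@k over a list of filtered ranks."""
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits = sum(r <= k for r in ranks) / len(ranks)
    return mrr, hits

scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}
# "a" is another correct answer, so it is filtered out when ranking "c"
r = filtered_rank(scores, target="c", known_true={"a", "c"})
```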
Experimental results
Main results
Table 3 summarizes the results on transductive knowledge graph completion. AKD-KGC performs better than the baseline methods on all three datasets and nearly all metrics. Compared with other integrating-based methods, AKD-KGC achieves significant improvements.
In general, integrating-based methods and GNN-based methods outperform other methods on the FB15k-237 and CoDEx Medium datasets, while integrating-based methods and description-based methods are better than other methods on the WN18RR dataset. We assume this is because FB15k-237 and CoDEx Medium have high average degrees, which means the structure model can learn more structural features, while WN18RR has a lower average degree and its entities and relations are mostly common nouns whose facts are largely language knowledge, which is easy for language models to learn. By integrating description semantics, AKD-KGC gains additional improvements over the previous SOTA method NBFNet [7] on FB15k-237 and CoDEx Medium; we attribute the small margin to the difficulty of fine-tuning BERT on these two datasets. The description module of AKD-KGC is similar to that of LMKE [9], and AKD-KGC improves MRR on WN18RR compared to LMKE, which proves the validity of our structure features and integrating module.
Unlike other integration-based methods, AKD-KGC can be used in the inductive setting. Table 4 shows all results in the inductive setting. On the WN18RR splits, our method outperforms all other methods on all splits, and on FB15k-237 it outperforms all other methods on three splits. The experimental results show that adding description features brings larger improvements for graphs with lower average degree.
Analysis
We conduct an ablation study to show the effectiveness of our proposed components on three transductive KGC datasets. To prove the necessity of each semantic feature, we trained a model without the description module (AKD-KGC w/o description) and a model without the path semantic module (AKD-KGC w/o path). We also trained a model without the distillation loss (AKD-KGC w/o distillation) to prove the necessity of adaptive knowledge distillation, and a model without the path module but with the distillation loss (AKD-KGC-description w distillation) to prove the necessity of feature integration. We report the MRR and Hits@1 metrics for comparison, and all models are trained with the same setting.
Table 5 summarizes all ablation study results, with the complete model as the baseline. On the WN18RR dataset, dropping any module decreases performance significantly, and the performance of AKD-KGC-description w distillation is much worse than the baseline. This indicates that the description module and the path module each learn one aspect of knowledge graph semantics, and that distillation is important. Additionally, the performance decrease from dropping the distillation module compared to the baseline, and the performance increase from adding distillation compared to AKD-KGC w/o path, prove that distilling from the path module benefits the learning of the language module, which further confirms the importance of multi-semantic knowledge transfer. In contrast, on the FB15k-237 and CoDEx Medium datasets, dropping the description module only decreases performance slightly, while dropping the path module decreases performance dramatically. This indicates that our description module does not fully learn the natural semantics of entities and relations; we think this is because fine-tuning BERT for such events and proper nouns is still challenging and worth further study, and the fact that AKD-KGC w/o distillation is even worse than AKD-KGC w/o description further illustrates this issue.
We then analyse the effect of the weights of learning from the teacher model versus the ground truth on the performance on the WN18RR dataset, by setting different values of the hyperparameter τ. If τ is set large enough, the student model learns from the teacher with a nearly fixed coefficient; if τ is set small, the student model learns from the teacher with a dynamic coefficient. Specifically, when the performance of the student model is better than that of the teacher model, the student model tends to learn directly from the data and reduces the weight learned from the teacher model. We set five different values of τ, including τ = 1. The experimental results are summarized in Table 6. The results show that our adaptive learning strategy does improve the performance, and with smaller τ the student model performs better.
Detailed results
In this section, to understand how our AKD-KGC framework performs better than a single-feature model, the detailed prediction results of three different ablation models are visualized, where the three models are AKD-KGC (Both), AKD-KGC w/o description (Path) and AKD-KGC-description w distillation (Description). The complex relations in a KG can be classified into 1-to-1, 1-to-N, N-to-1, and N-to-N [31], and the MR of each category on WN18RR is summarized in a table. Additionally, the rank of each test sample in each relation is shown in Fig 3.
(a) _verb_group. (b)_similar_to. (c) _member_of_domain_usage. (d) _member_of_domain_region. (e) _member_meronym. (f) _has_part. (g) _hypernym. (h) _instance_hypernym. (i) _synset_domain_topic_of. (j) _derivationally_related_form. (k) _also_see.
The green line represents the overall integrating model values, the red scatter point represents the description module values and the light blue bar represents path module values.
The Fig 3 results show that the whole integrating model performs better than the single path-based model on all relations, which demonstrates that integrating description features can compensate for the losses caused by using only structural features. The exceptions are some relations like _member_of_domain_region, _hypernym and _also_see, where the whole integrating model performs worse than the single description-based model; the figure further illustrates this phenomenon. We think this is caused by the lack of path features, and the distillation process misleads the learning of the description module to some extent.
Discussion
The experimental results demonstrate the effectiveness of our proposed AKD-KGC framework. By integrating structure and description features, and using adaptive knowledge distillation, our framework achieves state-of-the-art performance on both transductive and inductive KGC tasks. The ablation studies confirm the importance of each component in our framework, highlighting the benefits of multi-semantic knowledge transfer. Compared with existing methods, AKD-KGC shows significant improvements, especially on datasets with lower average degrees, where description features play a crucial role. Overall, our framework provides a promising direction for future research in knowledge graph completion by effectively combining multiple semantic sources.
Limitations
This work has some limitations. First, fine-tuning BERT for entities and relations that are proper nouns or events remains challenging, which may limit the effectiveness of the description module on certain datasets. Second, integrating knowledge graph structure with large language models is an area that requires further research, especially in light of recent advancements in foundation models across various domains. Finally, since foundation models have been widely investigated in other fields such as computer vision and natural language processing, building a foundation model for the KGC task that integrates the two kinds of semantics is also worth future study. In future work, we aim to establish a stronger theoretical foundation for the adaptive distillation mechanism. Specifically, we plan to analyze the correlation between the adaptive weights and information transformation from an information-theoretic perspective, providing mathematical guarantees for the model's convergence and generalization ability.
Threats to validity
This study may have several threats to validity. The choice of datasets, while standard in the field, may not fully represent the diversity of real-world knowledge graphs, potentially limiting the generalizability of our findings. Additionally, the hyperparameter settings and model architectures were selected based on prior work and may not be optimal for all scenarios. The reliance on pre-trained models like BERT also introduces dependencies on their training data and biases, which could affect the performance of our framework. Finally, while we conducted extensive experiments, there may still be unexamined factors that could influence the results, such as different training regimes or alternative integration strategies.
Conclusion
In this paper, to achieve multi-semantic knowledge transfer from the structure model to the description model and to guide the prediction behavior of the integration model, we built a novel structure and description integration framework named AKD-KGC, which not only integrates the two features at the embedding level but also adds a teaching-learning procedure based on adaptive knowledge distillation during feature integration. We conducted extensive experiments on both transductive and inductive KGC benchmarks and achieved state-of-the-art results in both settings, demonstrating the effectiveness of our framework. Furthermore, it is worth noting that the benefits of text-structure integration are correlated with the semantic richness of the textual information. As observed, the performance gains vary across datasets, with more significant improvements in scenarios with rich descriptions than on datasets dominated by proper nouns, where textual semantics are limited.