Abstract
Graph neural networks (GNNs) have shown great promise for representation learning on complex graph-structured data, but existing models often fall short when applied to directed heterogeneous graphs. In this study, we propose a novel embedding method for directed heterogeneous graphs: a bidirectional heterogeneous graph neural network with random teleport (BHGNN-RT) that leverages the bidirectional message-passing process and network heterogeneity. Our method captures both incoming and outgoing message flows, integrates heterogeneous edge types through relation-specific transformations, and introduces a teleportation mechanism to mitigate the over-smoothing effect in deep GNNs. Extensive experiments were conducted on various datasets to verify the efficacy and efficiency of BHGNN-RT. BHGNN-RT consistently outperforms state-of-the-art baselines, achieving up to an 11.5% improvement in classification accuracy and 19.3% in entity clustering. Additional analyses confirm that optimizing the message components, model depth, and teleportation proportion further enhances performance. These results demonstrate the effectiveness and robustness of BHGNN-RT in capturing structural and directional information in directed heterogeneous graphs.
Citation: Sun X, Komaki F (2025) BHGNN-RT: Capturing bidirectionality and network heterogeneity in graphs. PLoS One 20(7): e0326756. https://doi.org/10.1371/journal.pone.0326756
Editor: Guangyin Jin, National University of Defense Technology, CHINA
Received: April 15, 2025; Accepted: June 4, 2025; Published: July 1, 2025
Copyright: © 2025 Sun, Komaki. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting information files.
Funding: This study was supported by JSPS KAKENHI Grant Number 22H00510, and AMED Grant Numbers JP23dm0207001 and JP23dm0307009. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Graphs are a natural and powerful abstraction for representing complex systems in the real world, including citation networks, social networks, and the World Wide Web [1–7]. They offer a flexible framework for modeling entities and their relationships, making them well-suited for capturing intricate structural dependencies [8]. However, graph-structured data are often high-dimensional, sparse, and non-Euclidean, which makes their analysis particularly challenging [9]. To address this, Graph Neural Networks (GNNs) have emerged as a powerful graph representation learning method designed for such graph data and have attracted considerable research attention [10,11]. Traditional GNNs focus on individual nodes to generate a vector representation or an embedding for each node, such that two nodes “close” in the graph have similar vector representations in a low-dimensional space [12,13]. Recently, many variants of GNNs have achieved superior performances in network analysis, including node classification [14], graph classification, link prediction [15], and recommendations [16,17]. Examples include spectral graph convolutional neural networks [18–20], message-passing algorithms [21] and recurrent graph neural networks [22]. Among them, message-passing frameworks have received particular attention because of their flexibility and empirical effectiveness [19,23].
Graph structures and topological characteristics play a critical role in influencing the effectiveness of inference and learning in graph-based models [9,24,25]. Preserving these structural properties is essential for accurate graph representation learning. However, existing GNNs, particularly spectral-based GNNs, are primarily designed for undirected graphs and often overlook directionality [8,15,26]. In contrast, most real-world graphs are inherently directed. For example, in a citation network, newer papers may cite older ones, but not vice versa. This asymmetry between incoming and outgoing connections carries distinct semantic meanings and relational dynamics. Recent models, such as the Directed graph convolutional network (DGCN) [8], Node Embedding Respecting Directionality (NERD) [27], Message Passing Attention network for Document understanding (MPAD) [28], and Directed Graph Neural Network (Dir-GNN) [29], have attempted to extend GNNs to directed graphs. While these methods introduce direction-aware components such as incoming message aggregation or second-order proximity, they often fail to fully capture the asymmetry between edge directions. Effectively integrating both incoming and outgoing information can yield more expressive node representations and is particularly crucial for modeling the functional roles and semantic context within directed graphs.
Another well-known limitation of graph neural networks is over-smoothing, where node embeddings become indistinguishable as the number of layers increases [25]. Theoretically, a message-passing process of k iterations exploits a subtree structure of height k rooted at each node. Such schemes generalize the Weisfeiler-Lehman graph isomorphism test to learn the distribution and topology of node features in the neighborhood simultaneously [14,30]. However, increasing the depth often degrades performance due to over-smoothing. For example, previous work indicated that the best performance of a SOTA model, the graph convolutional network (GCN), is achieved with a 2-layer structure [19]; its embeddings converge to the limit distribution of a random walk as the number of layers increases [19,23]. This phenomenon has also been reported in other GNN variants [30–32], limiting their ability to exploit long-range dependencies: deeper GCNs perform worse even though they have access to more information, and this restriction on depth strongly restrains expressivity over node neighborhoods with long path lengths. To mitigate this, we incorporate a random teleportation mechanism into our model, which injects stochasticity during embedding updates and preserves node-level distinctions [23,33]. By optimizing the teleportation proportion, our approach balances local and global information flow, helping to maintain expressive node representations even in deeper networks.
Existing GNN models often neglect the asymmetry of edge directions and the semantic diversity of node and edge types, leading to suboptimal performance in complex network settings. Moreover, many GNNs suffer from over-smoothing in deeper architectures, which limits their ability to capture long-range dependencies. To address these gaps, we propose a novel model: the Bidirectional Heterogeneous Graph Neural Network with Random Teleportation (BHGNN-RT). BHGNN-RT captures key network characteristics, including bidirectional message-passing pathways and network heterogeneity. With random teleportation, it also mitigates over-smoothing and prevents information from stagnating in poorly connected nodes. To validate its effectiveness, we conducted extensive experiments on benchmark datasets against strong baseline algorithms. Our results demonstrate that BHGNN-RT consistently outperforms existing methods, achieving state-of-the-art performance. We further analyze the impact of key components, including directional message integration, teleportation proportion, and network depth. Overall, this study contributes a unified framework for effective learning on directed heterogeneous graphs and provides practical insights into the design of robust GNNs for complex networked systems.
Related work
GNNs for directed graphs. Graph neural networks have traditionally been designed for undirected graphs, often overlooking the inherent directionality of many real-world networks [10,34]. Recognizing this limitation, recent research has focused on developing GNN architectures that effectively incorporate edge directionality to enhance learning on directed graphs [27,35]. Several models have been proposed to generalize spectral convolutions for directed graphs [8,36]. For instance, DGCN employs a generalized Laplacian via a personalized PageRank matrix and incorporates a k-hop diffusion process [8]. MagNet utilizes the magnetic Laplacian, a complex Hermitian matrix that encodes the magnitude and phase of graph connections, effectively capturing the directionality and structure of directed networks [37]. Additionally, adaptations of different message-passing frameworks have been proposed to enhance their applicability to directed networks. The Gated Graph Sequence Neural Network (GGS-NN) employs modified gated recurrent units while aggregating messages only from output sequences [38]. The Heterogeneous Directed Acyclic Graph (HetDAG) model utilizes an attention-based directed graph learning module that fuses attributes and structures to search for an optimal graph representation between nodes [39]. The Directed Graph Neural Network (Dir-GNN) [29] incorporates edge directionality into the message-passing process, employing separate aggregations for incoming and outgoing edges by introducing asymmetry into the adjacency matrix. Despite these advances, such models typically focus on simple graph structures and do not empirically address general functional forms for aggregating incoming and outgoing edges.
GNNs for heterogeneous graphs. Mathematically, a directed graph cannot simply be converted to an equivalent undirected relational network [29]. The Relational Graph Convolutional Network (R-GCN) is designed to handle multi-relational graphs, focusing on link prediction and entity classification tasks [15]. Several papers deal with directed heterogeneous graphs by introducing inverse relations [40,41]; Dir-GNN can be treated as an R-GCN applied to directed relational graphs with inverse edges [29]. While effective in modeling heterogeneity, most current approaches assume undirected edges, leaving the challenges posed by directed graphs largely unexplored.
Network embedding strategy
Problem formation
We formalized the problem on a directed heterogeneous graph $G = (V, E, T, R, A)$, where $V$ represents the set of nodes ($|V| = N$), $E$ denotes the set of edges, $T$ is the set of node types, $R$ is the set of edge relations, and $A$ is the adjacency matrix. In a heterogeneous graph, either the node types ($|T| > 1$) or the edge types ($|R| > 1$) are more than one ($|T| + |R| > 2$). For a directed edge from node $i$ to node $j$, the matrix element $A_{ij}$ specifies its edge weight, where $A_{ij} = 0$ if no edge exists. Node attributes are initialized as $X \in \mathbb{R}^{N \times f}$, where $x_i$ represents the feature vector of node $i$ and $f$ is the feature dimension. Our goal is to develop an encoder to generate node embeddings $Z$ that effectively capture graph structure, heterogeneity, and directionality for downstream tasks such as node classification and clustering.
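To make the notation concrete, the inputs above can be sketched as plain arrays. The node counts, relation labels, and feature values below are hypothetical illustrations, not drawn from the paper's datasets:

```python
import numpy as np

# Hypothetical toy instance of a directed heterogeneous graph G = (V, E, T, R, A).
N, f = 4, 3                                  # number of nodes N, feature dimension f
node_types = np.array([0, 0, 1, 1])          # T: two node types
edges = [(0, 1, 0), (1, 2, 1), (3, 2, 0)]    # E: (source, target, relation) triples
num_relations = 2                            # |R|

# Adjacency matrix A: A[i, j] is the weight of edge i -> j, 0 if no edge exists.
A = np.zeros((N, N))
for i, j, r in edges:
    A[i, j] = 1.0

# Initial node attributes X: one f-dimensional feature vector per node.
X = np.random.default_rng(0).normal(size=(N, f))

print(A)
```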
Encoding bidirectionality and heterogeneity
To capture the structural and relational properties of directed heterogeneous graphs, we proposed a novel embedding strategy, named Bidirectional Heterogeneous Graph Neural Network with Random Teleport (BHGNN-RT), that integrates bidirectional message-passing and heterogeneity-aware mechanisms.
For each node $i$, we distinguished the incoming and outgoing edges according to its corresponding source node set $\mathcal{N}^{\mathrm{in}}(i)$ and target node set $\mathcal{N}^{\mathrm{out}}(i)$ (Fig 1A). This distinction is particularly important in the context of directed networks, such as social networks and biological neural circuits, where afferent and efferent pathways normally deliver different kinds of messages [4,17,34,42]. Considering the edge relation $r \in R$, an edge-dependent attention mechanism with learnable weight matrices $W_r$ was introduced to handle edge-level heterogeneity. Meanwhile, to reduce sensitivity to edge weight scaling, the weight of an edge from node $j$ to node $i$ was normalized by the coefficient $c_{ij}$. For an unweighted graph, the normalization coefficient is $c_{ij} = 1/\sqrt{d_i^{\mathrm{in}}\, d_j^{\mathrm{out}}}$, where $d_i^{\mathrm{in}}$ and $d_j^{\mathrm{out}}$ are the nodal in- and out-degrees, respectively.
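The degree-based normalization follows directly from the stated formula $c_{ij} = 1/\sqrt{d_i^{\mathrm{in}} d_j^{\mathrm{out}}}$; the toy graph and function name below are our own illustration:

```python
import numpy as np

def norm_coefficient(A, i, j):
    """Normalization for an unweighted edge j -> i: 1 / sqrt(d_i^in * d_j^out)."""
    d_in_i = A[:, i].sum()    # in-degree of target node i (column sum)
    d_out_j = A[j, :].sum()   # out-degree of source node j (row sum)
    return 1.0 / np.sqrt(d_in_i * d_out_j)

# Toy unweighted directed graph: 0 -> 2, 1 -> 2, 2 -> 0
A = np.zeros((3, 3))
A[0, 2] = A[1, 2] = A[2, 0] = 1.0

# Node 2 has in-degree 2; node 0 has out-degree 1, so c = 1/sqrt(2).
print(norm_coefficient(A, i=2, j=0))
```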
Panel A depicts an example of a directed heterogeneous graph, where an example node possesses distinct incoming and outgoing messages. Panel B illustrates the update function for node $i$ during the message-passing process. Panel C describes how BHGNN-RT works for the clustering task.
In each iterable layer, the incoming messages with different edge relations to node $i$ were aggregated as follows:
$$m_i^{\mathrm{in},(l)} = \sum_{r \in R} \sum_{j \in \mathcal{N}_r^{\mathrm{in}}(i)} c_{ij}\, W_r^{(l)} h_j^{(l)},$$
where $W_r^{(l)}$ represents a weight matrix associated with edge type $r$ and $h_j^{(l)}$ denotes the representation of node $j$ at the $l$-th layer. The matrices $W_r^{(l)}$ shall be regularized by adopting basis decomposition [15].
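One plausible NumPy sketch of this relation-wise aggregation, summing relation-transformed neighbor states over incoming edges; the variable names and toy graph are our assumptions, and the normalization and attention terms are omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 4, 5                                   # number of nodes, hidden dimension
H = rng.normal(size=(N, d))                   # h_j: layer-l node representations
W = {0: rng.normal(size=(d, d)),              # W_r: one weight matrix per relation r
     1: rng.normal(size=(d, d))}
edges = [(0, 1, 0), (2, 1, 1), (3, 1, 0)]     # (source j, target i, relation r)

def incoming_messages(i, H, W, edges, c=1.0):
    """Sum relation-transformed messages c * W_r @ h_j over incoming edges j -> i."""
    m = np.zeros(H.shape[1])
    for j, tgt, r in edges:
        if tgt == i:
            m += c * W[r] @ H[j]
    return m

print(incoming_messages(1, H, W, edges))
```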
Similarly, we consider the outgoing messages from node $i$ as the weighted summation of the hidden state $h_i^{(l)}$ of node $i$ itself instead of the hidden states of its neighborhood $\mathcal{N}^{\mathrm{out}}(i)$. The outgoing messages are calculated as:
$$m_i^{\mathrm{out},(l)} = \sum_{r \in R} \sum_{j \in \mathcal{N}_r^{\mathrm{out}}(i)} c_{ji}\, W_r^{(l)} h_i^{(l)}.$$
Considering the distinct roles of incoming and outgoing messages, different coefficients should be assigned to each of them. Afterward, the nodal representation was updated based on the linear combination of its incoming and outgoing messages transformed on the $l$-th layer, as shown in Fig 1B:
$$h_i^{(l+1)} = \sigma\!\left(\alpha\, m_i^{\mathrm{in},(l)} + \beta\, m_i^{\mathrm{out},(l)}\right),$$
where the activation function $\sigma$ is a parametric rectified linear unit (PReLU), $\sigma(x) = \max(0, x) + a \min(0, x)$, with a learnable parameter $a$. The hyperparameters $\alpha$ and $\beta$ control the contribution of the different message components and are optimized during the training process.
Besides, we do not expect the message-passing process to be trapped in specific nodes of the directed heterogeneous graph. Typically, nodes with strong self-loops or without outgoing edges easily absorb incoming messages and interact little with other nodes, so the message-passing process does not converge to the ideal embedding results. To overcome this problem, we draw inspiration from personalized PageRank [23,43] and introduce a teleport vector into the aggregator function to denote random pairwise connections in the graph. The teleport proportion is assigned a probability $\varepsilon$. The aggregator function is then finalized as:
$$h_i^{(l+1)} = (1-\varepsilon)\, \sigma\!\left(\alpha\, m_i^{\mathrm{in},(l)} + \beta\, m_i^{\mathrm{out},(l)}\right) + \frac{\varepsilon}{N} \sum_{j \in V} h_j^{(l)}.$$
Afterward, the updated node embedding is normalized as $h_i^{(l+1)} \leftarrow h_i^{(l+1)} / \lVert h_i^{(l+1)} \rVert_2$, where $\lVert \cdot \rVert_2$ is the standard Euclidean norm. After $L$ layers of iteration, the final node embedding is produced as $z_i = h_i^{(L)}$.
Objective functions
For node classification.
After stacking BHGNN-RT layers, we fed the output embedding through a softmax activation function to calculate the category scores $P$, whose element $p_{ic} = \exp(z_{ic}) / \sum_{c'} \exp(z_{ic'})$ gives the predicted probability that node $i$ belongs to class $c$.
The objective function was configured as a log-likelihood function that measures the gap between the ground truth and the predictions; a smaller gap indicates stronger consistency between them. We minimize the objective function on all labeled data:
$$\mathcal{L} = -\sum_{t \in T} \sum_{i \in \mathcal{Y}_t} y_{it} \ln p_{it},$$
where $y_{it}$ is the ground-truth label of node $i$ with node type $t$ and $\mathcal{Y}_t$ denotes the set of labeled nodes of type $t$.
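A minimal sketch of this classification objective, assuming a standard softmax cross-entropy over the labeled nodes; function names and the toy data are ours:

```python
import numpy as np

def softmax(z):
    """Numerically stable row-wise softmax."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def node_classification_loss(Z, labels, labeled_idx):
    """Negative log-likelihood, averaged over the labeled nodes only."""
    P = softmax(Z)
    return -np.mean(np.log(P[labeled_idx, labels[labeled_idx]]))

rng = np.random.default_rng(0)
Z = rng.normal(size=(6, 3))                # output embeddings / logits, 3 classes
labels = np.array([0, 1, 2, 0, 1, 2])
print(node_classification_loss(Z, labels, labeled_idx=np.arange(4)))
```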
For node clustering.
To realize unsupervised clustering, we adopted an objective function inspired by Deep Graph Infomax (DGI) to maximize the mutual information (MI) between node embeddings and a network summary [44]. The network representation is summarized as $s = \frac{1}{N} \sum_{i \in V} z_i$ by averaging all node embeddings in the graph. As a proxy for maximizing the mutual information between node-graph pairwise representations, a discriminator function $\mathcal{D}$ is leveraged to measure its probabilistic score, defined as $\mathcal{D}(z_i, s) = \sigma(z_i^{\top} W s)$, where $W$ is a learnable scoring matrix, $\sigma$ is a sigmoid function, and $z_i^{\top}$ is the transpose of the node embedding $z_i$.
In parallel with the real graph, we generated a fake graph by introducing row-wise shuffling of the adjacency matrix $A$ and the initial node features $X$ (Fig 1C). The row-wise shuffling followed a random permutation $\pi$ of the row sequence. The fake graph was defined as $\widetilde{G} = (V, \widetilde{E}, T, R, \widetilde{A})$, where the edge set $\widetilde{E}$ consists of edges $(i, \pi(j))$ for $(i, j) \in E$. The initial feature matrix was shuffled in the same manner to obtain $\widetilde{X}$.
Concerning the set of original and fake graphs, the objective function is configured as
$$\mathcal{L} = \frac{1}{2N} \left( \sum_{i=1}^{N} \ln \mathcal{D}(z_i, s) + \sum_{i=1}^{N} \ln\!\left(1 - \mathcal{D}(\widetilde{z}_i, s)\right) \right),$$
where $\widetilde{z}_i$ denotes the embedding of node $i$ in the fake graph. This log-likelihood value is derived from mutual information and assigns higher scores to positive embeddings and lower scores to negative ones. It encourages the embedding method to capture meaningful information shared across all nodes.
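A sketch of this objective with a bilinear discriminator, following the DGI formulation cited above; the clipping for numerical safety and the variable names are our additions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dgi_objective(Z, Z_fake, W):
    """Mutual-information proxy: score real node-summary pairs high via
    D(z, s) = sigmoid(z^T W s) and corrupted pairs low."""
    s = Z.mean(axis=0)                                 # graph summary vector
    pos = np.clip(sigmoid(Z @ W @ s), 1e-7, 1 - 1e-7)  # real-pair scores
    neg = np.clip(sigmoid(Z_fake @ W @ s), 1e-7, 1 - 1e-7)
    return np.mean(np.log(pos)) + np.mean(np.log(1.0 - neg))

rng = np.random.default_rng(0)
Z, Z_fake = rng.normal(size=(6, 4)), rng.normal(size=(6, 4))
W = rng.normal(size=(4, 4))
print(dgi_objective(Z, Z_fake, W))
```

Training maximizes this value (equivalently, minimizes its negative), pushing the discriminator to separate real node-summary pairs from corrupted ones.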
Experimental setup
To evaluate the performance of our proposed BHGNN-RT model, we conducted extensive experiments across multiple benchmark datasets while comparing it with the SOTA algorithms. The experiments were designed to assess the effectiveness of the model in both node classification and unsupervised clustering tasks on directed heterogeneous graphs.
Datasets
The experiments were conducted on six publicly available datasets, including Cora [45], Cora_ml [45], CiteSeer [46], CiteSeer_full [46], Amazon_CS [47], Amazon_photo [47]. These datasets are representative of directed heterogeneous graphs, where edges encode distinct relationships between nodes. Cora, Cora_ml, CiteSeer, and CiteSeer_full are classical citation graphs, where nodes represent articles and directed edges indicate citation relationships. Amazon_CS and Amazon_photo capture co-purchase relationships in an e-commerce context, where nodes represent products and edges denote products purchased together. Detailed statistics of the datasets are listed in S1 Table.
Entity classification
To evaluate the model performance on node classification, we compared it with seven SOTA methods. These methods are divided into two categories: 1) spectral-based GNNs, such as ChebNet [18], GCN [19], simplifying GCN (SGC) [20], and relational GCN (R-GCN) [15]; 2) spatial-based GNNs, including GraphSAGE [14], graph attention network (GAT) [21], and directed GNN (Dir-GNN) [29]. The mechanisms of these baselines are described in the Appendix.
The experiments followed a consistent setup across all datasets to ensure fair evaluation. Nodes were randomly split into three subsets: 70% for training, 20% for validation, and 10% for testing. For training, we configured all models with hidden-layer dimensions of 64 and tested each model with layer depths ranging from 2 to 8. The configuration yielding the highest validation performance was used for evaluation. The weight matrices were initialized via the Glorot method, and the Adam optimizer was adopted with a learning rate of 0.01. To ensure robustness, each experiment was repeated 10 times with different random seeds. The classification performance of each model was measured using accuracy and macro-F1 score. These metrics provide a comprehensive view of the model's classification capabilities, accounting for both balanced and imbalanced datasets.
Unsupervised clustering
The clustering task evaluates the ability of the proposed model to group similar nodes into clusters based on their embeddings. We utilized the embeddings generated by BHGNN-RT and compared its clustering performance with five baseline methods, including DGI [44], deep attentional embedded graph clustering (DAEGC) [48], Graph InfoClust (GIC) [49], Just Balance GNN (JBGNN) [50], and a variant of R-GCN [15]. Detailed descriptions of these baselines are provided in the Appendix.
To maintain consistency, all models were configured with hidden layer dimensions of 64 and output dimensions of 512. The appropriate number of model layers was chosen for each model based on its performance, shown as the red starred points in Fig 3. A maximum of 300 epochs was used for training, and the Adam optimizer was employed with a learning rate of 0.001. The embeddings from the GNN models then served as input to the K-Means algorithm, which grouped nodes into T clusters. The experiments were repeated 10 times, and their performance was evaluated by comparing the predicted clusters with the ground-truth labels. The clustering quality was evaluated using accuracy, normalized mutual information (NMI), and adjusted Rand index (ARI) [48,49]. NMI is a metric based on information theory, and ARI is treated as an accuracy metric that penalizes incorrect predictions.
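The clustering pipeline, feeding embeddings into K-Means and scoring against ground truth with NMI and ARI, can be sketched with scikit-learn; the synthetic embeddings below are illustrative stand-ins for the trained model's output:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score

# Stand-in embeddings: in the experiments these come from the trained GNN.
rng = np.random.default_rng(0)
Z = np.vstack([rng.normal(loc=c, scale=0.1, size=(20, 8)) for c in (0.0, 3.0, 6.0)])
y_true = np.repeat([0, 1, 2], 20)

T = 3                                           # number of ground-truth classes
y_pred = KMeans(n_clusters=T, n_init=10, random_state=0).fit_predict(Z)

print("NMI:", normalized_mutual_info_score(y_true, y_pred))
print("ARI:", adjusted_rand_score(y_true, y_pred))
```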
Panels A, B depict the classification results, while Panels C, D display their clustering performance on Cora and CiteSeer. Each bar exhibits results with distinct aggregation functions, including 1) aggregation without nodal messages, 2) aggregation without outgoing messages, 3) aggregation with unweighted incoming and outgoing messages, 4) aggregation function as Eq 3, and 5) aggregation function as Eq 4.
We implemented all models and experiments using PyTorch 1.12.0 and CUDA toolkit 11.6. The experiments were conducted on a computer with a 20-core Intel i9-10900K CPU (@3.7 GHz), an NVIDIA RTX A4000 GPU (16 GB memory), and 80 GB RAM. The code will be made publicly available upon acceptance of the paper.
Results
Node classification
The classification experiments demonstrated that the proposed BHGNN-RT consistently outperforms SOTA baselines across all benchmark datasets. As summarized in Table 1, the classification accuracy of BHGNN-RT exceeds that of other models by margins ranging from 1.8% to 11.5%. The largest improvement was observed on CiteSeer_full, where BHGNN-RT achieved an 11.5% higher accuracy compared to GAT (87.9 ± 0.3%). A similar trend is evident for the macro-F1 metric, further highlighting the robustness of our approach. Meanwhile, BHGNN-RT consistently achieved better performance than the proposed model without random teleport (BHGNN), with improvements of up to 4.3% on the Cora dataset. This demonstrates that random teleportation enhances the classification capability of our model.
Node clustering
For clustering tasks, BHGNN-RT maintains superior performance compared to other baselines across multiple datasets. Table 2 lists the highest clustering results over 10 runs for the proposed and baseline models, and the mean and standard deviation of the results are reported in S2 Table. Ten repetitions with different random seeds are widely adopted in graph neural network research and provide a reasonable estimate of performance variability while maintaining computational feasibility. Notably, on the CiteSeer dataset, BHGNN-RT outperformed the best baseline, GIC, by a substantial margin of 19.3% in accuracy. Improvements in NMI and ARI were also significant, ranging from 2.1% to 18.1% and from 7.6% to 29.2%, respectively. It is promising that our proposed method allows each node stronger access to the structural properties of global connectivity.
Effects of message components
To analyze the contribution of individual message components, we performed ablation studies by modifying the aggregation functions in BHGNN-RT. In the traditional message-passing process, prior work has mainly attended to incoming messages [14,19,21,29]. Results in Fig 2 indicate that both nodal and outgoing messages play a critical role in graph representation learning, as evidenced by the comparison between configurations excluding nodal or outgoing messages. Interestingly, including unweighted incoming and outgoing messages yielded good results in classification but underperformed in clustering tasks. We attribute this to the optimization of mutual information between node-graph representations, which, in the absence of ground truth, disturbs the balance between incoming and outgoing messages. This discrepancy underscores the importance of optimizing the coefficients of the message components. Specifically, while full integration improves predictive performance by capturing richer structural information, it can also increase model complexity due to the additional parameters introduced by relation-specific transformation matrices and the need to compute multiple aggregation pathways. Integrating all message components led to the best classification and clustering performance, demonstrating the efficacy of the proposed message-passing framework.
Effects of model layers
The impact of varying the number of network layers was also evaluated. Generally, each node interacts with information from its l-hop neighborhood when stacking l GNN layers [51], which can lead to over-smoothing and overfitting as depth grows [30,31]. As shown in Fig 3, the test accuracy of BHGNN and BHGNN-RT increases as the number of layers grows from 2 to 4. Unlike other baselines, which exhibit performance degradation due to over-smoothing in deeper configurations, BHGNN-RT maintains stable performance beyond four layers across different datasets. This resilience highlights the model's ability to effectively suppress over-smoothing.
For simplicity and computational efficiency, we configured BHGNN-RT with four layers for all experiments, as higher layer counts did not yield significant improvements. The chosen layer configurations of all models are marked by red stars in Fig 3. This configuration strikes a balance between performance and computational complexity.
Visualization of embedding results
To provide a qualitative assessment of the learned embeddings, we applied the t-SNE method to visualize their clustering results across different datasets. The nodes are colored based on their labels in S1 Table. The clustering results were evaluated using silhouette scores (SIL), a metric to quantify the quality of the clusters generated.
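The visualization step can be sketched with scikit-learn's t-SNE and silhouette score; the synthetic embeddings and parameter choices below are illustrative stand-ins, not the paper's configuration:

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score

# Stand-in embeddings and labels; the paper uses the learned BHGNN-RT embeddings.
rng = np.random.default_rng(0)
Z = np.vstack([rng.normal(loc=c, scale=0.2, size=(30, 16)) for c in (0.0, 4.0)])
labels = np.repeat([0, 1], 30)

# Project to 2-D for visualization, then score cluster quality on the projection.
Z_2d = TSNE(n_components=2, perplexity=15, random_state=0).fit_transform(Z)
sil = silhouette_score(Z_2d, labels)
print(f"silhouette score: {sil:.3f}")
```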
As shown in Fig 4 and S1 Fig, BHGNN-RT achieves clearer clustering boundaries, grouping nodes with the same labels together and separating different classes into distinct clusters. Among all methods, BHGNN-RT achieved the highest SIL scores of 0.477 for Cora and 0.506 for the Amazon dataset, indicating superior clustering quality. In addition, the clustering performance of BHGNN-RT was evaluated across diverse datasets (Fig 5), with consistent results observed. Particularly on the Amazon datasets, the method produced more distinct clusters, likely due to the higher average degrees in these graphs, which yield denser network connectivity. In sum, our method improves unsupervised clustering quality by capturing more comprehensive nodal connectivity profiles and graph-level structural properties.
A and B exhibit classification results, while C and D show clustering results on Cora and CiteSeer. Legends in panels B and D indicate methods used for classification and clustering tasks, respectively. The configuration of model layers is marked as the red stars.
Each panel depicts the results from a different method, including K-Means (A), DGI (B), DAEGC (C), GIC (D), JBGNN (E), R-GCN-v (F), BHGNN (G), and BHGNN-RT (H).
The datasets contain Cora (A), Cora_ml (D), CiteSeer (B), CiteSeer_full (E), Amazon_cs (C), and Amazon_photo (F).
Conclusion
In this study, we proposed BHGNN-RT, a novel graph neural network designed specifically for directed heterogeneous graphs. The model effectively incorporates bidirectional message-passing and accounts for network heterogeneity, ensuring high-quality graph representation learning. By optimizing the teleportation proportion, BHGNN-RT balances information from neighboring nodes and random connections, which significantly mitigates the over-smoothing issue prevalent in deep GNNs. Furthermore, the model is compatible with both unweighted and weighted graphs, making it versatile for a range of complex graph scenarios.
Extensive experiments were conducted to evaluate the effectiveness of the proposed BHGNN-RT model. The model achieved state-of-the-art performance across node classification and unsupervised clustering tasks, consistently outperforming existing baselines. Our method demonstrated notable improvements, particularly in capturing bidirectional edge semantics and preserving feature heterogeneity. This is further supported by the model's superior performance in clustering tasks, where BHGNN-RT produced more distinct and meaningful node groupings. Beyond the quantitative results, we investigated the impact of model components, including the effects of message-passing configurations, the number of layers, and teleportation. Our analysis revealed that the inclusion of both nodal and outgoing messages contributes substantially to improved performance, while careful optimization of message coefficients and teleportation proportions further enhances the results. These findings underscore the model's ability to generalize effectively while addressing critical challenges in directed heterogeneous graphs.
Looking ahead, future research can explore extending BHGNN-RT to dynamic and temporal graphs, which introduce additional complexities such as time-evolving structures and edge dynamics. We also plan to investigate advanced combinations of node- and layer-wise aggregation functions to further enhance the model’s flexibility and adaptability to diverse graph types. These directions aim to solidify the applicability of BHGNN-RT in real-world scenarios and expand its scope to emerging challenges in graph representation learning.
Supporting information
Programs. The Python code for our GNN model, experiments, and performance evaluation is provided in the BHGNN.ipynb file.
S1 Table. Statistics of the datasets for directed heterogeneous graphs.
The table lists the number of nodes, edges, node classes, edge relations, and the dimension of node features. The average degree measures the average number of edges per node in the graph.
https://doi.org/10.1371/journal.pone.0326756.s001
(PNG)
S2 Table. Clustering performance on benchmark datasets.
This table records the average results and standard deviations of clustering performance over 10 runs. We configured the random_state in K-Means as 0, in which case its results are identical across runs. The best results are depicted in bold.
https://doi.org/10.1371/journal.pone.0326756.s002
(PNG)
S1 Fig. t-SNE visualization for clustering Amazon_photo datasets and the corresponding SIL scores.
Each panel depicts the results from a different method, including K-Means (A), DGI (B), DAEGC (C), GIC (D), JBGNN (E), R-GCN-v (F), BHGNN (G), and BHGNN-RT (H).
https://doi.org/10.1371/journal.pone.0326756.s003
(JPG)
References
- 1. Wang X, Ji H, Shi C, Wang B, Ye Y, Cui P, et al. Heterogeneous graph attention network. World Wide Web Conf. 2019.
- 2. Zhao J, Wang X, Shi C, Hu B, Song G, Ye Y. Heterogeneous graph structure learning for graph neural networks. AAAI. 2021;35(5):4697–705.
- 3. Reiser P, Neubert M, Eberhard A, Torresi L, Zhou C, Shao C, et al. Graph neural networks for materials science and chemistry. Commun Mater. 2022;3(1):93. pmid:36468086
- 4. Veličković P. Everything is connected: graph neural networks. Curr Opin Struct Biol. 2023;79:102538. pmid:36764042
- 5. Pavethra M, Uma Devi MA. A cross layer graphical neural network based convolutional neural network framework for image dehazing. Automatika: Časopis za automatiku, mjerenje, elektroniku, računarstvo i komunikacije. 2024;65:1139–53.
- 6. Wu Z, Pan S, Chen F, Long G, Zhang C, Yu PS. A comprehensive survey on graph neural networks. IEEE Trans Neural Netw Learn Syst. 2021;32(1):4–24. pmid:32217482
- 7. Zhou J, Cui G, Hu S, Zhang Z, Yang C, Liu Z, et al. Graph neural networks: a review of methods and applications. AI Open. 2020;1:57–81.
- 8. Tong Z, Liang Y, Sun C, Rosenblum D, Lim A. Directed graph convolutional network. arXiv preprint. 2020.
- 9. Cui P, Wang X, Pei J, Zhu W. A survey on network embedding. IEEE Trans Knowl Data Eng. 2018;31:833–52.
- 10. Zhang S, Tong H, Xu J, Maciejewski R. Graph convolutional networks: a comprehensive review. Comput Soc Netw. 2019;6(1):11. pmid:37915858
- 11. Mavromatis C, Karypis G. Global and nodal mutual information maximization in heterogeneous graphs. In: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2023. p. 1–5.
- 12. Zheng V, Cavallari S, Cai H, Chang K, Cambria E. From node embedding to community embedding. arXiv preprint. 2016.
- 13. Gao C, Zheng Y, Li N, Li Y, Qin Y, Piao J, et al. A survey of graph neural networks for recommender systems: challenges, methods, and directions. ACM Trans Recommend Syst. 2023;1:1–51.
- 14. Hamilton W, Ying Z, Leskovec J. Inductive representation learning on large graphs. Adv Neural Inf Process Syst. 2017;30.
- 15. Schlichtkrull M, Kipf T, Bloem P, Van Den Berg R, Titov I, Welling M. Modeling relational data with graph convolutional networks. In: The Semantic Web: 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, June 3–7, 2018, Proceedings. 2018. p. 593–607.
- 16. Yang X, Yan M, Pan S, Ye X, Fan D. Simple and efficient heterogeneous graph neural network. In: Proceedings of the AAAI Conference on Artificial Intelligence. 2023. p. 10816–24.
- 17. Sharma K, Lee Y, Nambi S, Salian A, Shah S, Kim S, et al. A survey of graph neural networks for social recommender systems. ACM Comput Surv. 2024;56:1–34.
- 18. Defferrard M, Bresson X, Vandergheynst P. Convolutional neural networks on graphs with fast localized spectral filtering. Adv Neural Inf Process Syst. 2016;29.
- 19. Kipf T, Welling M. Semi-supervised classification with graph convolutional networks. In: International Conference on Learning Representations. 2016.
- 20. Wu F, Souza A, Zhang T, Fifty C, Yu T, Weinberger K. Simplifying graph convolutional networks. In: International Conference on Machine Learning. 2019. p. 6861–71.
- 21. Velickovic P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y. Graph attention networks. In: International Conference on Learning Representations. 2018.
- 22. Ioannidis VN, Marques AG, Giannakis GB. A recurrent graph neural network for multi-relational data. In: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2019. p. 8157–61. https://doi.org/10.1109/icassp.2019.8682836
- 23. Gasteiger J, Bojchevski A, Günnemann S. Predict then propagate: graph neural networks meet personalized PageRank. arXiv preprint 2018.
- 24. Asif N, Sarker Y, Chakrabortty R, Ryan M, Ahamed M, Saha D, et al. Graph neural network: a comprehensive review on non-euclidean space. IEEE Access. 2021;9:60588–606.
- 25. Shchur O, Mumme M, Bojchevski A, Günnemann S. Pitfalls of graph neural network evaluation. arXiv preprint 2018.
- 26. Busbridge D, Sherburn D, Cavallo P, Hammerla N. Relational graph attention networks. arXiv preprint 2019.
- 27. Khosla M, Leonhardt J, Nejdl W, Anand A. Node representation learning for directed graphs. In: Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2019, Würzburg, Germany, September 16–20, 2019, Proceedings, Part I. 2020. p. 395–411.
- 28. Nikolentzos G, Tixier A, Vazirgiannis M. Message passing attention networks for document understanding. In: Proceedings of the AAAI Conference on Artificial Intelligence. 2020. p. 8544–51.
- 29. Rossi E, Charpentier B, Di Giovanni F, Frasca F, Günnemann S, Bronstein M. Edge directionality improves learning on heterophilic graphs. arXiv preprint 2023.
- 30. Xu K, Li C, Tian Y, Sonobe T, Kawarabayashi K, Jegelka S. Representation learning on graphs with jumping knowledge networks. In: International Conference on Machine Learning. 2018. p. 5453–62.
- 31. Klicpera J, Bojchevski A, Günnemann S. Combining neural networks with personalized pagerank for classification on graphs. In: International Conference on Learning Representations. 2019.
- 32. Rong Y, Huang W, Xu T, Huang J. DropEdge: towards deep graph convolutional networks on node classification. In: International Conference on Learning Representations. 2020. https://openreview.net/forum?id=Hkx1qkrKPr
- 33. Roth A, Liebig T. Transforming pagerank into an infinite-depth graph neural network. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. 2022. p. 469–84.
- 34. Motie S, Raahemi B. Financial fraud detection using graph neural networks: a systematic review. Exp Syst Appl. 2024;240:122156.
- 35. Kollias G, Kalantzis V, Idé T, Lozano A, Abe N. Directed graph auto-encoders. In: Proceedings of the AAAI Conference on Artificial Intelligence. 2022. p. 7211–9.
- 36. Monti F, Otness K, Bronstein M. MotifNet: a motif-based graph convolutional network for directed graphs. In: 2018 IEEE Data Science Workshop (DSW). 2018. p. 225–8.
- 37. Zhang X, He Y, Brugnone N, Perlmutter M, Hirn M. MagNet: a neural network for directed graphs. Adv Neural Inf Process Syst. 2021;34:27003–15. pmid:36046111
- 38. Li Y, Zemel R, Brockschmidt M, Tarlow D. Gated graph sequence neural networks. In: Proceedings of ICLR. 2016.
- 39. Liang J, Wang J, Yu G, Guo W, Domeniconi C, Guo M. Directed acyclic graph learning on attributed heterogeneous network. IEEE Trans Knowl Data Eng. 2023;35:10845–56.
- 40. Jaume G, Nguyen A, Martinez M, Thiran J, Gabrani M. edGNN: a simple and powerful GNN for directed labeled graphs. In: International Conference on Learning Representations. 2019.
- 41. Vashishth S, Sanyal S, Nitin V, Talukdar P. Composition-based multi-relational graph convolutional networks. In: International Conference on Learning Representations. 2020.
- 42. Li X, Sun L, Ling M, Peng Y. A survey of graph neural network based recommendation in social networks. Neurocomputing. 2023;549:126441.
- 43. Page L, Brin S, Motwani R, Winograd T. The PageRank citation ranking: bringing order to the web. Stanford Digital Library Technologies Project. 1998.
- 44. Velickovic P, Fedus W, Hamilton W, Liò P, Bengio Y, Hjelm R. Deep graph infomax. In: ICLR. 2019.
- 45. McCallum A, Nigam K, Rennie J, Seymore K. Automating the construction of internet portals with machine learning. Inf Retrieval. 2000;3:127–63.
- 46. Giles C, Bollacker K, Lawrence S. CiteSeer: an automatic citation indexing system. In: Proceedings of the Third ACM Conference on Digital Libraries. 1998. p. 89–98.
- 47. McAuley J, Targett C, Shi Q, Van Den Hengel A. Image-based recommendations on styles and substitutes. In: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2015. p. 43–52.
- 48. Wang C, Pan S, Hu R, Long G, Jiang J, Zhang C. Attributed graph clustering: a deep attentional embedding approach. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence. 2019. p. 3670–6.
- 49. Mavromatis C, Karypis G. Graph InfoClust: maximizing coarse-grain mutual information in graphs. In: PAKDD. 2021.
- 50. Bianchi F. Simplifying clustering with graph neural networks. In: Proceedings of the Northern Lights Deep Learning Workshop. 2023.
- 51. Barceló P, Kostylev E, Monet M, Pérez J, Reutter J, Silva J. The logical expressiveness of graph neural networks. In: 8th International Conference on Learning Representations (ICLR 2020). 2020.