Abstract
Graph neural networks (GNNs), with their ability to incorporate node features into graph learning, have achieved impressive performance in many graph analysis tasks. However, current GNNs, including the popular graph convolutional network (GCN), cannot obtain competitive results on graphs without node features. In this work, we first introduce path-driven neighborhoods, and then define an extensional adjacency matrix as a convolutional operator. Second, we propose an approach named exopGCN, which integrates this simple and effective convolutional operator into GCN to classify the nodes of graphs without features. Experiments on six real-world graphs without node features indicate that exopGCN achieves better performance than other GNNs on node classification. Furthermore, by adding the simple convolutional operator to 13 GNNs, the accuracy of these methods is improved remarkably, which means that our research offers a general technique for improving the accuracy of GNNs. More importantly, we study the relationship between node classification by GCN without node features and community detection. Extensive experiments on six real-world graphs and nine synthetic graphs demonstrate that the positive relationship between them can provide a new direction for exploring the theories of GCNs.
Citation: Jiao Q, Zhang H, Wu J, Wang N, Liu G, Liu Y (2024) A simple and effective convolutional operator for node classification without features by graph convolutional networks. PLoS ONE 19(4): e0301476. https://doi.org/10.1371/journal.pone.0301476
Editor: Xiao Luo, University of California Los Angeles, UNITED STATES
Received: December 30, 2023; Accepted: March 17, 2024; Published: April 30, 2024
Copyright: © 2024 Jiao et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting Information files.
Funding: This work was supported by Science and Technology Development Plan Project of Henan under grant number 232102210021, Science and Technology Development Plan Project of Henan under grant number 222102320036, Henan Provincial Colleges and Universities Youth Key Teacher Training Plan under grant number 2021GGJS129 and The National Natural Science Foundation of China under grant number 61806007. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
1. Introduction
Graph neural networks (GNNs) employ deep learning strategies to deal with graph-structured data and are applied in various fields [1], such as graph classification [2], recommender systems [3, 4] and natural language processing [5]. As a successful GNN model, the graph convolutional network (GCN) [6] has become a promising and important algorithm. In the past few years, many derivative GCN and GNN algorithms have been proposed to resolve problems such as overfitting, over-smoothing, high time complexity and poor performance. By randomly removing a certain number of edges from a graph at each training epoch, Rong et al propose an algorithm named DropEdge to resolve overfitting and over-smoothing [7]. Eliasof et al propose a pathGCN model, which learns a spatial operator from random paths on the graph, to resolve the over-smoothing problem [8]. In order to overcome the vanishing gradient problem caused by deep layers, Li et al bring residual/dense connections and dilated convolutions from convolutional neural networks (CNNs) into GCN architectures [9], and propose a deep GCN model that reaches 56 layers. Likewise, based on initial residual connections and identity mapping, Chen et al propose an extended GCN called GCNII to build a deep GCN model [10]; GCNII can also relieve the over-smoothing problem. Because GCN suffers from time and memory challenges when training on large graphs, Chen et al propose the FastGCN algorithm to resolve this problem [11]. FastGCN first interprets graph convolutions as integral transforms, and then evaluates the integrals through Monte Carlo approximation. Chiang et al propose the ClusterGCN algorithm to train very deep GCNs on large-scale graphs [12]. The key strategy of ClusterGCN is to sample a block of nodes from a dense subgraph of the input graph.
Furthermore, many algorithms have been proposed to improve the performance of GNNs. Fusing attention with multi-hop graph convolution enables effective long-range message passing and improves the accuracy of GNNs [13, 14]. Wang et al propose a multi-hop attention graph neural network (MAGNA) to improve the performance of node classification [13]; MAGNA computes attention by aggregating the attention scores over all possible multi-hop neighborhoods. Likewise, Xue et al present multi-hop hierarchical graph neural networks (MHGNNs) to obtain richer node information and a broader receptive field [14], employing attention to extract significant hop-level features. By collecting information from multi-hop neighboring nodes within one step of graph convolution, Li et al propose a modified GNN to improve the accuracy of node classification [15].
In addition, some characteristics of GCN have been studied. For example, Jin et al study how the performance of GCN changes with different propagation mechanisms, including 1-hop, 2-hop and k-nearest-neighbor (kNN) network neighbors, and propose a U-GCN algorithm to improve the accuracy of GCN [16]. Jin et al also show that GCN can destroy the original node feature similarity, which plays an important role in node classification; therefore, they propose a framework named SimP-GCN to preserve node similarity while exploiting graph structure [17]. SimP-GCN can balance the information from graph structure and node features and achieves better performance on both assortative and disassortative graphs. Duong et al find that a strong correlation between node features and node labels may lead to better GNN performance, and they propose new feature initialization methods to deal with non-attributed graphs [18]. Chen et al design a distribution-matching model named the structure-attribute transformer (SAT) to deal with attribute-incomplete graphs; SAT, which achieves joint distribution modeling of structures and attributes, can be used for link prediction and node attribute completion tasks [19]. Taguchi et al propose a GCN variant to handle graphs with incomplete (or missing) features, which are imputed with a Gaussian mixture model [20]; the proposed method combines the processing of missing features with graph learning in a single neural network architecture. Likewise, for graphs with weak information (incomplete structure, incomplete features and insufficient labels), Liu et al design a dual-channel diffused propagation then transformation (D2PT) model to improve GNN performance [21]. D2PT enables GNNs to propagate information to long-range nodes and to nodes isolated from the largest connected component.
Many graphs in the real world do not contain any feature information, due to privacy concerns or the difficulty of collecting node features [18]. Examples include social networks such as the REDDIT [22, 23] and Karate [24] datasets; this phenomenon also exists in the chemical field [25, 26]. However, existing GNNs cannot achieve satisfactory performance on graphs with incomplete features [19], and their performance deteriorates on graphs without any node features [27]. In this work, we propose a simple and effective convolutional operator that enables GCN to achieve better performance on graphs without node features. First, the proposed approach introduces an extensional adjacency matrix, defined by the 2-path neighboring nodes, as a convolutional operator, and the modified GCN, named exopGCN, is tested on six widely used graphs. Second, the proposed convolutional operator is applied to 13 GNN models, and the performance of most of these methods improves significantly. Finally, the relationship between node classification by GCN and community detection is studied. The experimental results show that exopGCN offers superior performance over other GNNs on graphs without node features, and also offers a general technique to improve the accuracy of GNNs. More importantly, the results reveal a strong correlation between node classification by GCN and community detection. We expect that these results will open up a new avenue for exploring the theories of GCNs.
2. Methods
2.1 Graph convolutional networks (GCNs)
Given an undirected and unweighted graph with n nodes and m edges, it can be described as G = (V, E), where V = {vi | i = 1, 2, ⋯, n} is the set of nodes and E = {eij | i ∈ V and j ∈ V} is the set of edges. The graph can also be described by an adjacency matrix A: if there is an edge between node vi and node vj, then Aij = 1; otherwise Aij = 0. If each node vi has d-dimensional features, all features of the nodes in the graph can be represented as a feature matrix X = [x1, x2, ⋯, xi, ⋯, xn]T ∈ Rn×d.
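This adjacency-matrix construction can be sketched in a few lines (a minimal NumPy illustration; the function name and zero-based node numbering are ours, not from the paper):

```python
import numpy as np

def adjacency_matrix(n, edges):
    """Build the adjacency matrix A of an undirected, unweighted graph.

    n     -- number of nodes (labeled 0..n-1)
    edges -- iterable of (i, j) pairs
    """
    A = np.zeros((n, n), dtype=float)
    for i, j in edges:
        A[i, j] = 1.0
        A[j, i] = 1.0  # undirected graph: A is symmetric
    return A

# Toy example: a path graph 0-1-2
A = adjacency_matrix(3, [(0, 1), (1, 2)])
```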
Graph convolutional network (GCN) [6] is a typical and successful GNN model and is applied in many fields. The main reason for the success of GCN is its ability to effectively aggregate the feature information of neighboring nodes through the adjacency matrix A (see Eq (1)). To balance the features of neighboring nodes and the node itself, and to prevent the values of high-degree nodes from becoming too large across multiple convolutional layers, GCN uses a modified convolutional matrix to aggregate feature information (see Eq (2)).
(1)  Y = AX

(2)  Ŷ = D̃^(−1/2) Ã D̃^(−1/2) X

where Y and Ŷ are feature matrices, Ã = A + I and I is an identity matrix, and D̃ is the degree matrix of Ã with D̃_ii = Σ_j Ã_ij.
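The renormalized convolutional matrix of Eq (2) can be computed directly (a sketch assuming a dense NumPy adjacency matrix; the helper name is ours):

```python
import numpy as np

def renormalized_operator(A):
    """Compute the convolutional matrix D̃^(-1/2) (A + I) D̃^(-1/2)."""
    A_tilde = A + np.eye(A.shape[0])       # Ã = A + I: add self-loops
    d = A_tilde.sum(axis=1)                # degrees of Ã
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

# Path graph 0-1-2: node 1 has the highest degree and is damped most.
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
A_hat = renormalized_operator(A)
```

Note that the normalization keeps the operator symmetric, which is why repeated application does not blow up the values of high-degree nodes.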
Using the convolutional matrix Â = D̃^(−1/2) Ã D̃^(−1/2), the layer-wise propagation rule for GCN is described as Eq (3).

(3)  H^(l+1) = σ(Â H^(l) W^(l))

where H^(l) is the matrix of activations in the lth layer, with H^(0) = X; σ(⋅) denotes an activation function, such as ReLU(⋅) = max(0, ⋅); and W^(l) ∈ Rd×f, with d-dimensional feature vectors and f filters, is the trainable weight matrix in layer l.
GCN uses a two-layer model for semi-supervised node classification on a graph based on the layer-wise propagation rule. The forward model of GCN is represented by Eq (4).

(4)  Z = softmax(Â ReLU(Â X W^(0)) W^(1))

where Â = D̃^(−1/2) Ã D̃^(−1/2), and the weights W^(0) and W^(1) are trained using gradient descent. The loss function is defined as the cross-entropy error over all labeled nodes (Eq (5)):

(5)  L = − Σ_{l∈yL} Σ_{f=1}^{F} Y_lf ln Z_lf

where yL is the set of node indices with labels, F is the dimension of the output features and is equal to the number of classes, and Y ∈ R^(|yL|×F) is a label indicator matrix.
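The two-layer forward model of Eq (4) and the loss of Eq (5) can be sketched as follows (NumPy only, with our own function names; a real implementation would also need the gradient-descent training loop, which is omitted here):

```python
import numpy as np

def gcn_forward(A_hat, X, W0, W1):
    """Two-layer GCN forward pass of Eq (4): softmax(Â ReLU(Â X W0) W1)."""
    H = np.maximum(A_hat @ X @ W0, 0.0)              # layer 1 + ReLU
    logits = A_hat @ H @ W1                          # layer 2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)          # row-wise softmax

def cross_entropy_loss(Z, Y, labeled):
    """Eq (5): cross-entropy over labeled nodes; Y is one-hot, |yL| x F."""
    return -np.sum(Y * np.log(Z[labeled] + 1e-12))
```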
2.2 The proposed method exopGCN
In this section, we analyze in detail how GCN aggregates information from neighboring nodes. We can rewrite Eq (1) in elementwise matrix form (Eq (6)). In Eq (6), Y represents the feature matrix generated by graph convolution, and y_ig^(k) represents the gth feature of node vi in convolutional layer k. We take a small graph (see Fig 1) as an example, assuming node v1 has four neighboring nodes (v2, v3, v4, v7), that is, N(v1) = {v2, v3, v4, v7}, with N(v2) = {v1}, N(v3) = {v1}, N(v4) = {v1, v5, v6} and N(v7) = {v1, v8}.
In the first convolutional layer, the first feature y_11^(1) of node v1 is aggregated over N(v1) in Eq (7). Likewise, the remaining values are calculated by Eqs (8)–(11), respectively. From Eqs (7)–(11), it can be observed that GCN only captures the features of directly neighboring nodes (1-path neighboring nodes, see Eq (16)). Next, we calculate the features in the second convolutional layer of GCN by Eq (12). The feature y_11^(2) in the second layer can capture 2-path neighboring features (nodes v5, v6 and v8).
Here, we mainly consider nodes without features. For simplicity, we do not employ Eq (2) to process the adjacency matrix A. If the nodes in the graph do not have any features, GCN employs the identity matrix I in place of the feature matrix (Eq (13)). For example, the value of y_11^(1) is 0 in the first convolutional layer (see Eq (14)), and (Y1)_without = A. In the second convolutional layer, the value of y_11^(2) is calculated from the 1-path neighboring nodes (see Eq (15)). Note that the calculation of y_11^(2) is determined by the values of the 1-path neighboring nodes because some elements in (Y1)_without are equal to 0. As in the case with node features, y_11^(2) also captures the information of 2-path neighboring nodes. Since long-range propagation can effectively improve the performance of GNNs [21], a natural question arises: is there a method for GCN to propagate information from more distant neighboring nodes with only two convolutional layers?
To solve this problem, this work proposes a modified GCN named exopGCN for node classification without features. exopGCN first introduces an extensional adjacency matrix built from path-driven neighboring nodes [27], and this convolutional operator is then applied in GCN for node classification without features. The path-driven neighboring nodes (also called t-path neighboring nodes) of node vi are defined by the shortest path between two nodes: Nt(vi) is the set of nodes whose shortest-path distance (dsp) to node vi is less than or equal to t (see Eq (16)). Based on the definition of path-driven neighboring nodes, we can construct the extensional adjacency matrix Mt, whose element (Mt)_ij is defined by Eq (17).
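Under this definition, the extensional adjacency matrix Mt can be built with a breadth-first search truncated at depth t (a sketch of our reading of Eqs (16)–(17); the function name and dense-matrix representation are our choices):

```python
from collections import deque
import numpy as np

def extensional_adjacency(A, t):
    """Build M_t: (M_t)_ij = 1 if 0 < d_sp(v_i, v_j) <= t, else 0.

    A is a dense 0/1 adjacency matrix. Shortest paths come from a BFS
    truncated at depth t, so the worst-case cost is O(n * (n + m)).
    """
    n = A.shape[0]
    M = np.zeros_like(A, dtype=float)
    neighbors = [np.flatnonzero(A[i]) for i in range(n)]
    for s in range(n):
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            if dist[u] == t:       # do not expand beyond depth t
                continue
            for v in neighbors[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        for v, d in dist.items():
            if 0 < d <= t:         # exclude the node itself (d = 0)
                M[s, v] = 1.0
    return M
```

With t = 1 this reduces to the ordinary adjacency matrix A, which matches the intuition that exopGCN generalizes the standard GCN operator.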
Using the extensional adjacency matrix Mt, GCN can fuse information from farther neighboring nodes in fewer layers. Take node v1 as an example (see Fig 2). Without node features, after one layer GCN contains only its own information, and after two layers GCN acquires information from nodes v2, v3, v4 and v7, which are its 1-path neighboring nodes. In contrast, exopGCN obtains information from nodes v2, v3, v4 and v7 after one layer, and from all the nodes after two layers. Therefore, with the same number of layers, exopGCN can obtain more information from farther nodes than GCN, and as t increases, exopGCN acquires information from more distant nodes even more quickly.
3. Results
To evaluate the effectiveness of the proposed method exopGCN, we conducted empirical experiments on six publicly available datasets (see S1 Datasets), comparing its performance against 13 state-of-the-art GNN methods. These six datasets are Cora, Citeseer, Pubmed [28], Karate [24], Dolphins [29] and Polbook (http://www-personal.umich.edu/~mejn/netdata/, Books about US politics). Cora, Citeseer and Pubmed have 2708, 3312 and 19717 nodes, and 5429, 4732 and 44338 edges respectively. The nodes in the three graphs are divided into 7, 6 and 3 classes respectively. Note that, we only select the nodes with labels and features in Citeseer.
Karate, Dolphins and Polbook are small graphs with community structure, and their nodes have neither labels nor features. They have 34, 62 and 105 nodes, and 78, 159 and 441 edges, respectively. In order to evaluate the performance of different GNN methods, the nodes in the three graphs are divided by community labels; that is, we treat nodes in the same community as belonging to the same class. As a result, Karate, Dolphins and Polbook are divided into 2, 2 and 3 classes, respectively.
The performance of exopGCN is compared with 13 other GNN methods (the hyper-parameter settings for exopGCN and the other GNNs are shown in S1 File). These 13 methods are GCN [6], FastGCN [11], GAT [30], SGC [31], ClusterGCN [12], DAGNN [32], APPNP [33], SSGC [34], GraphMLP [35], RobustGCN [36], LATGCN [37], MedianGCN [38] and ONF (ONFdw and ONFde) [18]. Some previous methods have been proposed to deal with attribute-incomplete graphs, but these methods, including SAT [19], GCNMF [20] and D2PT [21], require some node features as input, which differs from exopGCN, which does not require any node features. Similar to exopGCN, the method proposed in [18] (which we abbreviate as ONF) does not need any node features and classifies nodes with SGC [31]. The node features for ONF are generated by learning-based and centrality-based approaches. Here, we select the two feature-generation algorithms with the best reported performance [18, 23]: deepwalk [39] (ONFdw) from the learning-based approaches and degree (ONFde) from the centrality-based approaches. The dimension of the output node features for deepwalk is set to 64, and the node features generated by degree are represented by a one-hot vector [23]. Then, the node classification performance of ONFdw and ONFde is compared with that of exopGCN.
In exopGCN, the convolutional operator Mt is generated by our method (see S2 Datasets), while the convolutional operators of the other 13 GNNs are the adjacency matrices A. For the Cora, Citeseer and Pubmed graphs (the indices of training and testing nodes for these three graphs are recorded in S1 File), we evaluate exopGCN and the 13 GNN methods with 5% of the nodes for training and 10% for testing. For the other three small graphs, in order to improve performance, the training nodes are selected evenly from the different communities (or labels) (the indices of training and testing nodes for the three small graphs are recorded in S1 File). For example, in Karate with two communities (or labels), half of the training nodes come from the first community and the remaining half from the other community. Since these three graphs are small, we evaluate exopGCN and the 13 GNN methods with 20% of the nodes for training and 20% for testing. The accuracies of exopGCN and the 13 GNN methods are shown in Table 1.
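The even selection of training nodes across communities described above can be sketched as follows (a hypothetical helper; the paper's exact node indices are recorded in S1 File, so this is only an illustration of the sampling idea):

```python
import random

def stratified_train_nodes(labels, train_size):
    """Pick training nodes evenly across communities (labels).

    labels     -- list of community labels, one per node
    train_size -- total number of training nodes to select
    """
    by_label = {}
    for node, lab in enumerate(labels):
        by_label.setdefault(lab, []).append(node)
    per_class = train_size // len(by_label)   # equal share per community
    train = []
    for nodes in by_label.values():
        train.extend(random.sample(nodes, per_class))
    return train
```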
The performance of exopGCN improves significantly over the other 13 GNNs on Cora, Citeseer, Pubmed, Dolphins and Polbook. The largest gain appears on Polbook, where exopGCN outperforms the worst-performing method (LATGCN) by 68.18% and the best-performing method (SGC) by 36.36%. Compared with the worst-performing methods on Cora, Citeseer, Pubmed and Dolphins, the relative improvements of exopGCN are 23.71%, 34.74%, 33.01% and 53.85%, respectively; compared with the best-performing methods, they are 3.71%, 22.36%, 0.3% and 5%. exopGCN shows poor performance on Karate.
4. The performance of graph neural networks with the proposed convolutional operator
In this section, we analyze the performance of current GNNs with the simple convolutional operator proposed in this work. First, the 13 GNNs mentioned above are employed for testing, with their convolutional operators (adjacency matrices) replaced by our proposed convolutional operator (Mt) (see S2 Datasets). Second, the 13 modified GNNs are used to classify nodes on the six graphs (Cora, Citeseer, Pubmed, Karate, Dolphins and Polbook). Finally, the accuracies of the 13 modified and original GNNs are computed. Note that the parameter settings in this section are the same as those in the Results section. Fig 3 shows the change in accuracy between the modified and original GNNs. From Fig 3, it can be observed that, overall, the accuracy of most GNNs is improved by using the proposed convolutional operator (Mt). Next, we investigate the results in detail. The accuracies of eight GNNs (GCN, FastGCN, GAT, SGC, ClusterGCN, GraphMLP, LATGCN and ONFde) are improved significantly on Cora, Citeseer, Pubmed and Polbook. One method (MedianGCN) obtains worse performance on Cora after adding the proposed convolutional operator; the corresponding numbers of methods with decreased accuracy are 0, 6, 2, 2 and 2 on Citeseer, Pubmed, Karate, Dolphins and Polbook, respectively. The best improvement in accuracy is achieved by LATGCN on Polbook, at 68.18%. The worst result is FastGCN, with a decrease of 38.46% on Dolphins after adding the proposed convolutional operator. Generally speaking, these GNNs show poor performance on Pubmed, probably because long-range propagation may bring redundant information for node classification. Nevertheless, these results provide a general technique for improving the accuracy of node classification without features.
Co, Ci, Pu, Ka, Do and Po represent Cora, Citeseer, Pubmed, Karate, Dolphins and Polbook.
5. Selection of the parameter t
Selection of the parameter t plays a crucial role in improving the performance and preventing overfitting of exopGCN. Here, we discuss the relationship between the parameter t and the accuracy on the six graphs: Cora, Citeseer, Pubmed, Karate, Dolphins and Polbook. The parameter t is set from 1 to 6; note that if the graph diameter is less than 6, the upper limit of t is set to the graph diameter. The results are shown in Fig 4. From Fig 4, it can be observed that as the parameter t increases, the accuracy generally decreases. On Pubmed, Karate, Dolphins and Polbook, the best accuracies appear when t is set to 2. For Citeseer, although the best accuracy of 50.45% is obtained when t is set to 3, exopGCN with t = 2 achieves a close accuracy of 45.92%. For Cora, the best accuracy of 42.96% is obtained when t is set to 4, while the accuracy is 37.41% when t is set to 2. The behavior on Cora may arise because it has different properties from the other five graphs. In general, it is reasonable to set the parameter t to 2 for exopGCN, and t = 2 also helps prevent overfitting across diverse graphs.
6. The relationship between node classification and community detection
Furthermore, we discuss the relationship between node classification using GCN and community detection. From Eq (13), it can be observed that when nodes do not have features, GCN aggregates information through the convolutional matrix, so nodes that share the same neighboring nodes obtain similar information. For example, nodes v5 and v6 in Fig 1 have similar features. Therefore, GCN clusters nodes with similar neighboring nodes into a class. This concept is close to community structure, in which the connections between nodes within a community are tight while the connections to other nodes in the network are loose [40].
In order to analyze the relationship between node classification using GCN and community detection, we first introduce the edge clustering coefficient [41] of an edge eij, defined by Eq (20):

(20)  C_ij = (z_ij + 1) / min(k_i − 1, k_j − 1)

where z_ij is the number of triangles containing the edge eij, and ki and kj are the degrees of nodes vi and vj respectively. The edge clustering coefficient can also be expressed in terms of neighboring nodes (see Eq (21)):

(21)  C_ij = (|N_ij| + 1) / min(k_i − 1, k_j − 1)

where Nij is the set of common neighboring nodes of nodes vi and vj.
From the definition of the edge clustering coefficient in Eq (21), it can be observed that if nodes vi and vj have more common neighboring nodes, the edge clustering coefficient of the edge connecting them is larger. Likewise, we find that GCN tends to cluster two nodes with common neighboring nodes into one class because the two nodes have similar features. Therefore, if the edge clustering coefficient of the edge connecting node vi and node vj is large, node vi and node vj are clustered into the same class by GCN. From the literature [41], we know that an edge eij connecting two nodes in the same community tends to have a large edge clustering coefficient. Therefore, if two nodes are grouped into one class by GCN, the two nodes are likely to be in the same community.
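The common-neighbor form of the edge clustering coefficient (Eq (21)) is straightforward to compute from a dense adjacency matrix (a sketch; the "+1" term and the convention for degree-1 endpoints follow our reading of the coefficient in [41]):

```python
import numpy as np

def edge_clustering_coefficient(A, i, j):
    """Edge clustering coefficient of edge e_ij, Eq (21):
    C_ij = (|N_ij| + 1) / min(k_i - 1, k_j - 1),
    where N_ij is the set of common neighbors of v_i and v_j."""
    k_i, k_j = A[i].sum(), A[j].sum()            # node degrees
    n_common = np.sum((A[i] > 0) & (A[j] > 0))   # |N_ij|
    denom = min(k_i - 1, k_j - 1)
    if denom <= 0:
        return float("inf")  # convention when an endpoint has degree 1
    return (n_common + 1) / denom
```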
Modularity (see Eq (22)) [42] is widely used to measure the community structure of a graph:

(22)  Q = (1 / 2m) Σ_ij (A_ij − P_ij) δ(C_i, C_j)

where m and A represent the number of edges and the adjacency matrix respectively, Pij is the expected number of edges between nodes vi and vj in the null model, and δ(Ci, Cj) = 1 if nodes vi and vj are in the same community (Ci = Cj) and zero otherwise.
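Modularity Q as in Eq (22) can be evaluated directly; the sketch below assumes the standard configuration null model P_ij = k_i k_j / 2m and dense matrices:

```python
import numpy as np

def modularity(A, communities):
    """Newman modularity Q (Eq (22)) with the configuration null model
    P_ij = k_i * k_j / (2m)."""
    m = A.sum() / 2.0                      # number of edges
    k = A.sum(axis=1)                      # node degrees
    c = np.asarray(communities)
    same = (c[:, None] == c[None, :])      # delta(C_i, C_j)
    P = np.outer(k, k) / (2.0 * m)         # expected edges in null model
    return ((A - P) * same).sum() / (2.0 * m)
```

For example, a graph of two disconnected edges split into two communities attains Q = 0.5, reflecting perfect community separation for that degree sequence.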
In order to study the relationship between node classification using GCN and community detection, the accuracy of node classification using GCN and community detection are compared on both six real-world graphs (Cora, Citeseer, Pubmed, Karate, Dolphins and Polbook) and nine synthetic graphs with different values of modularity. The nine synthetic graphs, called LFR benchmarks, were proposed by Lancichinetti et al [43]. To generate these synthetic graphs, several parameters must be set: (1) the number of nodes n, the average degree ⟨k⟩ and the maximum degree ⟨max k⟩; (2) the minimum community size ⟨min c⟩ and the maximum community size ⟨max c⟩; (3) the minus exponent for the degree sequence ⟨t1⟩ and the minus exponent for the community size distribution ⟨t2⟩; (4) the mixing parameter ⟨μ⟩. The parameter μ is an index of community structure: a low μ indicates that the generated graphs have strong community structure. In this work, we set these parameters as follows: n = 1000, k = 8, max k = 40, t1 = 2, t2 = 1, min c = 5 and max c = 35. By tuning the mixing parameter μ ∈ [0.1, 0.9] with a step of 0.1, we obtain 9 synthetic graphs (see S1 Datasets) with community labels.
First, the nodes in the six real-world graphs (see S3 Datasets) and the nine synthetic graphs (see S3 Datasets) are classified by GCN (the indices of training and testing nodes for these 15 graphs are recorded in S1 File). Note that the training and testing sizes of the six real-world graphs are set as in the Results section, while the training and testing sizes of the nine synthetic graphs are set to 20%. Second, the values of modularity Q are calculated using the real labels (or community labels). Finally, the accuracy of node classification using GCN and the values of Q are compared; the results are shown in Fig 5. From Fig 5A, it can be observed that overall there is no strong regularity between the accuracy and the modularity Q: high values of modularity Q correspond to high accuracy only on Pubmed and Karate. On the contrary, the relationship between accuracy and modularity Q shows strong regularity on the nine synthetic graphs (see Fig 5B): the accuracy of GCN decreases as the value of Q decreases. This suggests that the principles of GCN and modularity Q may be similar, in that both tend to cluster nodes with similar neighboring nodes into a class or a community.
Ecc denotes the edge clustering coefficient; In and Out denote Cm_in and Cm_out, respectively; and s1, s2, s3, s4, s5, s6, s7, s8 and s9 represent the nine synthetic graphs.
We study the weak regularity in the six real-world graphs and the strong regularity in the nine synthetic graphs in detail, and use a modified edge clustering coefficient (Cm, see Eq (23)) to explain the phenomena in Fig 5A and 5B. Here, we calculate two types of Cm for an edge eij: Cm_in, for edges whose two endpoints (vi and vj) have the same label (or lie in the same community), and Cm_out, for edges whose two endpoints have different labels (or lie in different communities). For an edge eij, a high value of Cm_in means that the two nodes connected by eij share many common neighboring nodes, so the two nodes can be classified by GCN with high accuracy. Conversely, a low value of Cm_out means that the two nodes connected by eij share few common neighboring nodes, so the edge eij can easily be broken by community detection methods, yielding a high modularity Q; a high value of Cm_out correspondingly leads to a low modularity Q.

Therefore, we calculate the average values of Cm_in and Cm_out for each graph; the results are shown in Fig 5C and 5D respectively. From Fig 5, we observe that higher values of Cm_in correspond to higher accuracy on Cora, Citeseer, Karate, Dolphins and Polbook (see Fig 5A and 5C, solid red line), with Pubmed as the exception. As analyzed above, lower values of Cm_out correspond to higher modularity Q on Citeseer, Pubmed and Karate (see Fig 5A and 5C, dotted blue line), and higher values of Cm_out correspond to lower modularity Q on Cora, Dolphins and Polbook (see Fig 5A and 5C, dotted blue line). Likewise, the same phenomenon appears on eight of the synthetic graphs, the second synthetic graph being the exception. From these results, the regularity between node classification using GCN and community detection (modularity Q) can be revealed by the modified edge clustering coefficients Cm_in and Cm_out.
7. Conclusion and discussion
Graph convolutional network (GCN), which represents node features through a convolutional matrix and propagation mechanisms, has become a powerful tool for dealing with graph-structured data. However, the performance of GCN deteriorates when it encounters graphs with missing node features. To resolve this problem, we first introduce a simple and effective convolutional operator based on path-driven neighboring nodes, and then propose a modified GCN named exopGCN for node classification. Experimental results demonstrate that exopGCN shows better performance than other GNNs for node classification on graphs without node features. Furthermore, the performance of 13 GNNs is improved significantly by adding the proposed convolutional operator, which means that our research provides a general technique to improve the performance of GNNs for node classification on graphs without features. More importantly, using the edge clustering coefficient as a bridge, the relationship between node classification using GCN without features and traditional community detection is studied. The resulting positive relationship can help reveal the theory of GCN from the view of traditional, unsupervised methods.
Here, we discuss two issues with exopGCN and a direction for further research. The first issue is the application of exopGCN to node classification with features. To examine this, exopGCN is employed to classify the nodes with features on Cora, Citeseer and Pubmed, and the results obtained by the other 13 GNNs are provided for comparison (see S1 Table). As shown in S1 Table, exopGCN does not obtain the best performance on the three graphs. This suggests that aggregating features from long-range neighboring nodes does not improve the accuracy of exopGCN but instead leads to feature redundancy, and thus it cannot classify nodes effectively. The second issue is the complexity of exopGCN. Compared with GCN, the additional overhead is the computational cost of the convolutional operator M2. In fact, the computation of the convolutional operator can be converted into K-hop reachability queries [44] with K = 2 for each node in the graph, and many fast algorithms [45, 46] have been proposed to resolve this problem and have been applied to large real-world graphs. Finally, although exopGCN achieves good performance on node classification without node features, exopGCN and some GNNs with the proposed convolutional operator show worse performance in certain cases (see Fig 3). The main reason for this phenomenon may be that the proposed convolutional operator carries redundant information. As in [13, 14], an attention mechanism could be used to select vital 2-path neighboring nodes and improve the performance of GNNs.
Supporting information
S1 Datasets. Six original graph data and nine synthetic graph data.
https://doi.org/10.1371/journal.pone.0301476.s001
(ZIP)
S2 Datasets. Input data for exopGCN and different GNNs.
https://doi.org/10.1371/journal.pone.0301476.s002
(ZIP)
S1 Table. The accuracies of exopGCN and other GNNs on node classification with features.
https://doi.org/10.1371/journal.pone.0301476.s005
(PDF)
References
- 1. Zhou J, Cui G, Hu S, Zhang Z, Yang C, Liu Z, et al. Graph neural networks: A review of methods and applications. AI Open. 2020; 57–81.
- 2.
Ju W, Yang J, Qu M, Song W, Shen J, Zhang M. KGNN: Harnessing Kernel-based Networks for Semi-supervised Graph Classification. WSDM’22: The ACM International Conference on Web Search and Data Mining; 2022 Feb 21–25; Arizona, America. New York: Association for Computing Machinery (ACM); 2022.
- 3.
Qin Y, Wang Y, Sun F, Ju W, Hou X, Wang Z, et al. DisenPOI: Disentangling Sequential and Geographical Influence for Point-of-Interest Recommendation. WSDM’23: The ACM International Conference on Web Search and Data Mining; 2023 Feb 27-Mar 3; Singapore, Singapore. New York: Association for Computing Machinery (ACM); 2023.
- 4. Wu S, Sun F, Zhang W, Xie X, Cui B. Graph Neural Networks in Recommender Systems: A Survey. ACM Comput Surv. 2022; 37(4): 111.
- 5. Wu L, Chen Y, Shen K, Guo X, Gao H, Li S, et al. Graph Neural Networks for Natural Language Processing: A Survey. Found Trends Mach Learn. 2023; 16(2): 119–328.
- 6. Kipf TN, Welling M. Semi-supervised classification with graph convolutional networks. ICLR 2017: International Conference on Learning Representations; 2017 Apr 24–26; Toulon, France. OpenReview.net; 2017. p. 1–14.
- 7. Rong Y, Huang W, Xu T, Huang J. Dropedge: towards deep graph convolutional networks on node classification. ICLR 2020: International Conference on Learning Representations; 2020 Apr 26–30; Addis Ababa, Ethiopia. OpenReview.net; 2020. p. 1–17.
- 8. Eliasof M, Haber E, Treister E. pathGCN: Learning General Graph Spatial Operators from Paths. ICML'22: Proceedings of the 39th International Conference on Machine Learning; 2022 Jul 17–23; Maryland, America. New York: PMLR; 2022. p. 5878–91.
- 9. Li G, Muller M, Thabet A, Ghanem B. DeepGCNs: Can GCNs Go as Deep as CNNs? ICCV 2019: 2019 IEEE/CVF International Conference on Computer Vision; 2019 Oct 27-Nov 2; Seoul, Korea (South). New Jersey: Institute of Electrical and Electronics Engineers (IEEE); 2019. p. 9267–76.
- 10. Chen M, Wei Z, Huang Z, Ding B, Li Y. Simple and Deep Graph Convolutional Networks. ICML'20: Proceedings of the 37th International Conference on Machine Learning; 2020 Jul 13–18; Online. New York: PMLR; 2020. p. 1725–35.
- 11. Chen J, Ma T, Xiao C. Fastgcn: fast learning with graph convolutional networks via importance sampling. ICLR 2018: International Conference on Learning Representations; 2018 Apr 30-May 3; British Columbia, Canada. OpenReview.net; 2018. p. 1–15.
- 12. Chiang W, Liu X, Si S. Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks. KDD'19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2019 Aug 4–8; Alaska, America. New York: Association for Computing Machinery (ACM); 2019. p. 257–66.
- 13. Wang G, Ying R, Huang J, Leskovec J. Multi-hop Attention Graph Neural Networks. IJCAI-21: The 30th International Joint Conference on Artificial Intelligence; 2021 Aug 19–26; Montreal, Canada. Massachusetts: Morgan Kaufmann Publishers; 2021. p. 3089–96.
- 14. Xue H, Sun X, Sun W. Multi-hop Hierarchical Graph Neural Networks. BigComp 2020: 2020 IEEE International Conference on Big Data and Smart Computing; 2020 Feb 19–22; Busan, Korea. New Jersey: Institute of Electrical and Electronics Engineers (IEEE); 2020. p. 82–9.
- 15. Li Y, Tanaka Y. Structure-Aware Multi-Hop Graph Convolution for Graph Neural Networks. IEEE Access. 2022; 10: 16624–33.
- 16. Jin D, Yu Z, Huo C, Wang R, Wang X, He D, et al. Universal Graph Convolutional Networks. NeurIPS 2021: Proceedings of the 35th International Conference on Neural Information Processing Systems; 2021 Dec 6–14; Online. New York: Curran Associates Inc.; 2021. p. 1–11.
- 17. Jin W, Derr T, Wang Y, Ma Y, Liu Z, Tang J. Node Similarity Preserving Graph Convolutional Networks. WSDM'21: The ACM International Conference on Web Search and Data Mining; 2021 Mar 8–12; Jerusalem, Israel. New York: Association for Computing Machinery (ACM); 2021. p. 148–156.
- 18. Duong CT, Hoang TD, Dang HTH, Nguyen QVH, Aberer K. On Node Features for Graph Neural Networks. NeurIPS 2019: Proceedings of the 33rd International Conference on Neural Information Processing Systems; 2019 Dec 8–14; Vancouver, Canada. New York: Curran Associates Inc.; 2019. p. 1–6.
- 19. Chen X, Chen S, Yao J, Zheng H, Zhang Y, Tsang IW. Learning on Attribute-Missing Graphs. IEEE Trans Pattern Anal Mach Intell. 2020; 44(2): 740–57.
- 20. Taguchi H, Liu X, Murata T. Graph convolutional networks for graphs containing missing features. Future Gener Comp Sys. 2021; 117: 155–68.
- 21. Liu Y, Ding K, Wang J, Lee V, Liu H, Pan S. Learning Strong Graph Neural Networks with Weak Information. KDD'23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining; 2023 Aug 6–10; California, America. New York: Association for Computing Machinery (ACM); 2023.
- 22. Morris C, Kriege NM, Bause F, Kersting K, Mutzel P, Neumann M. TUDataset: A collection of benchmark datasets for learning with graphs. ICML Workshop on Graph Representation Learning and Beyond. 2020.
- 23. Cui H, Lu Z, Li P, Yang C. On Positional and Structural Node Features for Graph Neural Networks on Non-attributed Graphs. CIKM'22: Proceedings of the 31st ACM International Conference on Information and Knowledge Management; 2022 Oct 17–21; Atlanta, America. New York: Association for Computing Machinery (ACM); 2022. p. 3898–902.
- 24. Zachary WW. An information flow model for conflict and fission in small groups. J Anthropol Res. 1977; 33(4): 452–73.
- 25. Ramakrishnan R, Dral PO, Rupp M, Anatole von Lilienfeld O. Quantum chemistry structures and properties of 134 kilo molecules. Sci Data. 2014; 1: 140022. pmid:25977779
- 26. Ruddigkeit L, van Deursen R, Blum LC, Reymond J-L. Enumeration of 166 Billion Organic Small Molecules in the Chemical Universe Database GDB-17. J Chem Inf Model. 2012; 52 (11): 2864–75. pmid:23088335
- 27. Jiao Q, Zhao P, Zhang H, Han Y, Liu G. Path-enhanced graph convolutional networks for node classification without features. PLoS ONE. 2023; 18(6): e0287001. pmid:37294827
- 28. Sen P, Namata G, Bilgic M, Getoor L, Galligher B, Eliassi-Rad T. Collective classification in network data. AI magazine, 2008; 29(3): 93.
- 29. Lusseau D, Schneider K, Boisseau OJ, Haase P, Slooten E, Dawson SM. The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Behav Ecol Sociobiol. 2003; 54: 396–405.
- 30. Veličković P, Cucurull G, Casanova A, Romero A, Liò P, Bengio Y. Graph Attention Networks. ICLR 2018: International Conference on Learning Representations; 2018 Apr 30-May 3; British Columbia, Canada. OpenReview.net; 2018. p. 1–12.
- 31. Wu F, Souza A, Zhang T, Fifty C, Yu T, Weinberger K. Simplifying Graph Convolutional Networks. ICML'19: Proceedings of the 36th International Conference on Machine Learning; 2019 Jun 9–15; California, America. New York: PMLR; 2019. p. 6861–71.
- 32. Liu M, Gao H, Ji S. Towards Deeper Graph Neural Networks. KDD'20: Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining; 2020 Aug 23–27; Virtual Event, America. New York: Association for Computing Machinery (ACM); 2020. p. 338–48.
- 33. Gasteiger J, Bojchevski A, Günnemann S. Predict then Propagate: Graph Neural Networks meet Personalized PageRank. ICLR 2019: International Conference on Learning Representations; 2019 May 6–9; New Orleans, America. OpenReview.net; 2019. p. 1–15.
- 34. Zhu H, Koniusz P. Simple Spectral Graph Convolution. ICLR 2021: International Conference on Learning Representations; 2021 May 3–7; Online. OpenReview.net; 2021. p. 1–15.
- 35. Hu Y, You H, Wang Z, Wang Z, Zhou E, Gao Y. Graph-MLP: Node Classification without Message Passing in Graph. arXiv:2106.04051, [Preprint]. 2021. Available from: https://arxiv.53yu.com/pdf/2106.04051.pdf.
- 36. Zhu D, Zhang Z, Cui P, Zhu W. Robust Graph Convolutional Networks Against Adversarial Attacks. KDD'19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2019 Aug 4–8; Alaska, America. New York: Association for Computing Machinery (ACM); 2019. p. 1399–407.
- 37. Jin H, Zhang X. Latent Adversarial Training of Graph Convolution Networks. ICML Workshop on Learning and Reasoning with Graph Structured Representations; 2019. Available from: https://www.cs.uic.edu/~hjin/files/icml_ws_latgcn.pdf.
- 38. Chen L, Li J, Peng Q, Liu Y, Zheng Z, Yang C. Understanding Structural Vulnerability in Graph Convolutional Networks. IJCAI-21: The 30th International Joint Conference on Artificial Intelligence; 2021 Aug 19–26; Montreal, Canada. Massachusetts: Morgan Kaufmann Publishers; 2021. p. 2249–55.
- 39. Perozzi B, Al-Rfou R, Skiena S. Deepwalk: Online learning of social representations. KDD'14: Proceedings of the 20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining; 2014 Aug 24–27; New York, America. New York: Association for Computing Machinery (ACM); 2014. p. 701–10.
- 40. Girvan M, Newman MEJ. Community structure in social and biological networks. Proc Natl Acad Sci USA. 2002; 99(12): 7821–26. pmid:12060727
- 41. Radicchi F, Castellano C, Cecconi F, Loreto V, Parisi D. Defining and identifying communities in networks. Proc Natl Acad Sci USA. 2004; 101(9): 2658–63. pmid:14981240
- 42. Newman MEJ, Girvan M. Finding and evaluating community structure in networks. Phys Rev E. 2004; 69(2): 026113. pmid:14995526
- 43. Lancichinetti A, Fortunato S. Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Phys Rev E. 2009; 80(1): 016118. pmid:19658785
- 44. Cheng J, Shang Z, Cheng H, Wang H, Yu JX. Efficient processing of k-hop reachability queries. VLDB J. 2014; 23: 227–52.
- 45. Peng Y, Lin X, Zhang Y, Zhang W, Qin L. Answering reachability and K-reach queries on large graphs with label constraints. VLDB J. 2022; 31: 101–27.
- 46. Yildirim H, Chaoji V, Zaki MJ. GRAIL: a scalable index for reachability queries in very large graphs. VLDB J. 2012; 21: 509–34.