Abstract
Graph neural networks (GNNs), with their ability to incorporate node features into graph learning, have achieved impressive performance in many graph analysis tasks. However, current GNNs, including the popular graph convolutional network (GCN), cannot obtain competitive results on graphs without node features. In this work, we first introduce path-driven neighborhoods, and then define an extensional adjacency matrix as a convolutional operator. Second, we propose an approach named exopGCN, which integrates this simple and effective convolutional operator into GCN to classify the nodes of graphs without features. Experiments on six real-world graphs without node features indicate that exopGCN achieves better performance than other GNNs on node classification. Furthermore, by adding the simple convolutional operator to 13 GNNs, the accuracy of these methods is improved remarkably, which means that our research offers a general technique for improving the accuracy of GNNs. More importantly, we study the relationship between node classification by GCN without node features and community detection. Extensive experiments on six real-world graphs and nine synthetic graphs demonstrate that the positive relationship between them can provide a new direction for exploring the theories of GCNs.
Citation: Jiao Q, Zhang H, Wu J, Wang N, Liu G, Liu Y (2024) A simple and effective convolutional operator for node classification without features by graph convolutional networks. PLoS ONE 19(4): e0301476. https://doi.org/10.1371/journal.pone.0301476
Editor: Xiao Luo, University of California Los Angeles, UNITED STATES
Received: December 30, 2023; Accepted: March 17, 2024; Published: April 30, 2024
Copyright: © 2024 Jiao et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting Information files.
Funding: This work was supported by Science and Technology Development Plan Project of Henan under grant number 232102210021, Science and Technology Development Plan Project of Henan under grant number 222102320036, Henan Provincial Colleges and Universities Youth Key Teacher Training Plan under grant number 2021GGJS129 and The National Natural Science Foundation of China under grant number 61806007. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
1. Introduction
Graph neural networks (GNNs) employ deep learning strategies to deal with graph-structured data and are applied in various fields [1], such as graph classification [2], recommender systems [3, 4] and natural language processing [5]. As a successful GNN model, the graph convolutional network (GCN) [6] has become a promising and important algorithm. In the past few years, many derivative GCN and GNN algorithms have been proposed to resolve problems such as overfitting, over-smoothing, high time complexity and poor performance. By randomly removing a certain number of edges from a graph at each training epoch, Rong et al propose an algorithm named DropEdge to resolve overfitting and over-smoothing [7]. Eliasof et al propose a pathGCN model, which learns a spatial operator from random paths on the graph, to resolve the over-smoothing problem [8]. In order to overcome the vanishing gradient problem caused by deep layers, Li et al bring residual/dense connections and dilated convolutions from convolutional neural networks (CNNs) into GCN architectures [9], and propose a deep GCN model that reaches 56 layers. Likewise, based on initial residual connections and identity mapping, Chen et al propose an extended GCN called GCNII to build a deep GCN model [10]; GCNII can also relieve the over-smoothing problem. Because GCN suffers from time and memory challenges when training on large graphs, Chen et al propose the FastGCN algorithm to resolve this problem [11]. FastGCN first interprets graph convolutions as integral transforms, and then evaluates the integrals through Monte Carlo approximation. Chiang et al propose the ClusterGCN algorithm to train very deep GCNs on large-scale graphs [12]. The key strategy of ClusterGCN is to sample a block of nodes from a dense subgraph of the input graph.
Furthermore, many algorithms have been proposed to improve the performance of GNNs. Fusing attention with multi-hop graph convolution enables effective long-range message passing and improves the accuracy of GNNs [13, 14]. Wang et al propose a multi-hop attention graph neural network (MAGNA) to improve the performance of node classification [13]; MAGNA computes attention by aggregating the attention scores over all possible multi-hop neighborhoods. Likewise, Xue et al present multi-hop hierarchical graph neural networks (MHGNNs) to obtain richer node information and a broader receptive field [14], employing attention to extract significant hop-level features. By collecting information from multi-hop neighboring nodes within one step of graph convolution, Li et al propose a modified GNN to improve the accuracy of node classification [15].
In addition, some characteristics of GCN have been studied. For example, Jin et al study how the performance of GCN changes with different propagation mechanisms, including 1-hop, 2-hop and k-nearest-neighbor (kNN) network neighbors, and propose a U-GCN algorithm to improve the accuracy of GCN [16]. Jin et al also show that GCN can destroy the original node feature similarity, which plays an important role in node classification; therefore, they propose a framework named SimP-GCN to preserve node similarity while exploiting graph structure [17]. SimP-GCN can balance the information from graph structure and node features and achieves better performance on both assortative and disassortative graphs. Duong et al find that a strong correlation between node features and node labels may lead to better GNN performance, and they propose new feature initialization methods to deal with non-attributed graphs [18]. Chen et al design a distribution-matching model named the structure-attribute transformer (SAT) to deal with attribute-incomplete graphs; SAT, which achieves joint distribution modeling of structures and attributes, can be used for link prediction and node attribute completion tasks [19]. Taguchi et al propose a GCN variant to handle graphs with incomplete (or missing) features, which are imputed with a Gaussian mixture model [20]; the proposed method combines the processing of missing features with graph learning in a single neural network architecture. Likewise, for graphs with weak information (incomplete structure, incomplete features and insufficient labels), Liu et al design a dual-channel diffused propagation then transformation (D2PT) model to improve GNN performance [21]. D2PT enables GNNs to propagate information to long-range nodes and to nodes isolated from the largest connected component.
Many graphs in the real world do not contain any feature information, due to privacy concerns or the difficulty of collecting node features [18]. Examples include social networks such as the REDDIT [22, 23] and Karate [24] datasets; this phenomenon also exists in the chemical field [25, 26]. However, existing GNNs cannot achieve satisfactory performance on graphs with incomplete features [19], and their performance deteriorates on graphs without any node features [27]. In this work, we propose a simple and effective convolutional operator that enables GCN to achieve better performance on graphs without node features. First, the proposed approach introduces an extensional adjacency matrix, defined by the 2-path neighboring nodes, as a convolutional operator, and the modified GCN, named exopGCN, is tested on six widely used graphs. Second, the proposed convolutional operator is applied to 13 GNN models, and the performance of most of these methods improves significantly. Finally, the relationship between node classification by GCN and community detection is studied. The experimental results show that exopGCN offers superior performance over other GNNs on graphs without node features, and also offers a general technique to improve the accuracy of GNNs. More importantly, the results reveal a strong correlation between node classification by GCN and community detection. We expect that these results will open up a new avenue for exploring the theories of GCNs.
2. Methods
2.1 Graph convolutional networks (GCNs)
Given an undirected and unweighted graph with n nodes and m edges, it can be described as G = (V, E), where V = {vi | i = 1, 2, ⋯, n} is the set of nodes and E = {eij | i ∈ V and j ∈ V} is the set of edges. The graph can also be described by an adjacency matrix A: if there is an edge between node vi and node vj, then Aij = 1; otherwise Aij = 0. If each node vi has d-dimensional features, all features of the nodes in the graph can be represented as a feature matrix X = [x1, x2, ⋯, xi, ⋯, xn]T ∈ Rn×d.
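This adjacency-matrix construction can be sketched in a few lines (a minimal NumPy illustration; the function name and zero-based node numbering are ours, not from the paper):

```python
import numpy as np

def adjacency_matrix(n, edges):
    """Build the adjacency matrix A of an undirected, unweighted graph.

    n     -- number of nodes (labeled 0..n-1)
    edges -- iterable of (i, j) pairs
    """
    A = np.zeros((n, n), dtype=float)
    for i, j in edges:
        A[i, j] = 1.0
        A[j, i] = 1.0  # undirected graph: A is symmetric
    return A

# Toy example: a path graph 0-1-2
A = adjacency_matrix(3, [(0, 1), (1, 2)])
```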
Graph convolutional network (GCN) [6] is a typical and successful GNN model and is applied in many fields. The main reason for the success of GCN is its ability to effectively aggregate the feature information of neighboring nodes through the adjacency matrix A (see Eq (1)). To balance the features of neighboring nodes and the node itself, and to prevent the values of high-degree nodes from becoming too large across multiple convolutional layers, GCN uses a modified convolutional matrix to aggregate feature information (see Eq (2)).
(1)  Y = AX

(2)  Ŷ = D̃^(−1/2) Ã D̃^(−1/2) X

where Y and Ŷ are feature matrices, Ã = A + I and I is an identity matrix, and D̃ is the degree matrix of Ã with D̃_ii = Σ_j Ã_ij.
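The renormalized convolutional matrix of Eq (2) can be computed directly (a sketch assuming a dense NumPy adjacency matrix; the helper name is ours):

```python
import numpy as np

def renormalized_operator(A):
    """Compute the convolutional matrix D̃^(-1/2) (A + I) D̃^(-1/2)."""
    A_tilde = A + np.eye(A.shape[0])       # Ã = A + I: add self-loops
    d = A_tilde.sum(axis=1)                # degrees of Ã
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

# Path graph 0-1-2: node 1 has the highest degree and is damped most.
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
A_hat = renormalized_operator(A)
```

Note that the normalization keeps the operator symmetric, which is why repeated application does not blow up the values of high-degree nodes.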
Using the convolutional matrix Â = D̃^(−1/2) Ã D̃^(−1/2), the layer-wise propagation rule for GCN is described as Eq (3).

(3)  H^(l+1) = σ(Â H^(l) W^(l))

where H^(l) is the matrix of activations in the lth layer, with H^(0) = X; σ(⋅) denotes an activation function, such as ReLU(⋅) = max(0, ⋅); and W^(l) ∈ Rd×f, with d-dimensional feature vectors and f filters, is the trainable weight matrix in layer l.
GCN uses a two-layer model for semi-supervised node classification on a graph based on the layer-wise propagation rule. The forward model of GCN is represented by Eq (4).

(4)  Z = softmax(Â ReLU(Â X W^(0)) W^(1))

where Â = D̃^(−1/2) Ã D̃^(−1/2), and the weights W^(0) and W^(1) are trained using gradient descent. The loss function is defined as the cross-entropy error over all labeled nodes (Eq (5)):

(5)  L = − Σ_{l∈yL} Σ_{f=1}^{F} Y_lf ln Z_lf

where yL is the set of node indices with labels, F is the dimension of the output features and is equal to the number of classes, and Y ∈ R^(|yL|×F) is a label indicator matrix.
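The two-layer forward model of Eq (4) and the loss of Eq (5) can be sketched as follows (NumPy only, with our own function names; a real implementation would also need the gradient-descent training loop, which is omitted here):

```python
import numpy as np

def gcn_forward(A_hat, X, W0, W1):
    """Two-layer GCN forward pass of Eq (4): softmax(Â ReLU(Â X W0) W1)."""
    H = np.maximum(A_hat @ X @ W0, 0.0)              # layer 1 + ReLU
    logits = A_hat @ H @ W1                          # layer 2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)          # row-wise softmax

def cross_entropy_loss(Z, Y, labeled):
    """Eq (5): cross-entropy over labeled nodes; Y is one-hot, |yL| x F."""
    return -np.sum(Y * np.log(Z[labeled] + 1e-12))
```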
2.2 The proposed method exopGCN
In this section, we analyze in detail how GCN aggregates information from neighboring nodes. We can rewrite Eq (1) in elementwise matrix form (Eq (6)). In Eq (6), Y represents the feature matrix generated by graph convolution, and y_ig^(k) represents the gth feature of node vi in convolutional layer k. We take a small graph (see Fig 1) as an example, assuming node v1 has four neighboring nodes (v2, v3, v4, v7), that is, N(v1) = {v2, v3, v4, v7}, with N(v2) = {v1}, N(v3) = {v1}, N(v4) = {v1, v5, v6} and N(v7) = {v1, v8}.
In the first convolutional layer, the first feature y_11^(1) of node v1 is aggregated over N(v1) in Eq (7). Likewise, the remaining values are calculated by Eqs (8)–(11), respectively. From Eqs (7)–(11), it can be observed that GCN only captures the features of directly neighboring nodes (1-path neighboring nodes, see Eq (16)). Next, we calculate the features in the second convolutional layer of GCN by Eq (12). The feature y_11^(2) in the second layer can capture 2-path neighboring features (nodes v5, v6 and v8).
Here, we mainly consider nodes without features. For simplicity, we do not employ Eq (2) to process the adjacency matrix A. If the nodes in the graph do not have any features, GCN employs the identity matrix I in place of the feature matrix (Eq (13)). For example, the value of y_11^(1) is 0 in the first convolutional layer (see Eq (14)), and (Y1)_without = A. In the second convolutional layer, the value of y_11^(2) is calculated from the 1-path neighboring nodes (see Eq (15)). Note that the calculation of y_11^(2) is determined by the values of the 1-path neighboring nodes because some elements in (Y1)_without are equal to 0. As in the case with node features, y_11^(2) also captures the information of 2-path neighboring nodes. Since long-range propagation can effectively improve the performance of GNNs [21], a natural question arises: is there a method for GCN to propagate information from more distant neighboring nodes with only two convolutional layers?
To solve this problem, this work proposes a modified GCN named exopGCN for node classification without features. exopGCN first introduces an extensional adjacency matrix built from path-driven neighboring nodes [27], and this convolutional operator is then applied in GCN for node classification without features. The path-driven neighboring nodes (also called t-path neighboring nodes) of node vi are defined by the shortest path between two nodes: Nt(vi) is the set of nodes whose shortest-path distance (dsp) to node vi is less than or equal to t (see Eq (16)). Based on the definition of path-driven neighboring nodes, we can construct the extensional adjacency matrix Mt, whose element (Mt)_ij is defined by Eq (17).
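Under this definition, the extensional adjacency matrix Mt can be built with a breadth-first search truncated at depth t (a sketch of our reading of Eqs (16)–(17); the function name and dense-matrix representation are our choices):

```python
from collections import deque
import numpy as np

def extensional_adjacency(A, t):
    """Build M_t: (M_t)_ij = 1 if 0 < d_sp(v_i, v_j) <= t, else 0.

    A is a dense 0/1 adjacency matrix. Shortest paths come from a BFS
    truncated at depth t, so the worst-case cost is O(n * (n + m)).
    """
    n = A.shape[0]
    M = np.zeros_like(A, dtype=float)
    neighbors = [np.flatnonzero(A[i]) for i in range(n)]
    for s in range(n):
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            if dist[u] == t:       # do not expand beyond depth t
                continue
            for v in neighbors[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        for v, d in dist.items():
            if 0 < d <= t:         # exclude the node itself (d = 0)
                M[s, v] = 1.0
    return M
```

With t = 1 this reduces to the ordinary adjacency matrix A, which matches the intuition that exopGCN generalizes the standard GCN operator.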
Using the extensional adjacency matrix Mt, GCN can fuse information from farther neighboring nodes in fewer layers. Take node v1 as an example (see Fig 2). Without node features, after one layer GCN contains only its own information, and after two layers GCN acquires information from nodes v2, v3, v4 and v7, which are its 1-path neighboring nodes. In contrast, exopGCN obtains information from nodes v2, v3, v4 and v7 after one layer, and from all the nodes after two layers. Therefore, with the same number of layers, exopGCN can obtain more information from farther nodes than GCN, and as t increases, exopGCN acquires information from more distant nodes even more quickly.
3. Results
To evaluate the effectiveness of the proposed method exopGCN, we conducted empirical experiments on six publicly available datasets (see S1 Datasets), comparing its performance against 13 state-of-the-art GNN methods. These six datasets are Cora, Citeseer, Pubmed [28], Karate [24], Dolphins [29] and Polbook (http://www-personal.umich.edu/~mejn/netdata/, Books about US politics). Cora, Citeseer and Pubmed have 2708, 3312 and 19717 nodes, and 5429, 4732 and 44338 edges respectively. The nodes in the three graphs are divided into 7, 6 and 3 classes respectively. Note that, we only select the nodes with labels and features in Citeseer.
Karate, Dolphins and Polbook are small graphs with community structure, and their nodes have neither labels nor features. They have 34, 62 and 105 nodes, and 78, 159 and 441 edges, respectively. In order to evaluate the performance of different GNN methods, the nodes in the three graphs are divided by community labels; that is, we treat nodes in the same community as belonging to the same class. As a result, Karate, Dolphins and Polbook are divided into 2, 2 and 3 classes, respectively.
The performance of exopGCN is compared with 13 other GNN methods (the hyper-parameter settings for exopGCN and the other GNNs are shown in S1 File). These 13 methods are GCN [6], FastGCN [11], GAT [30], SGC [31], ClusterGCN [12], DAGNN [32], APPNP [33], SSGC [34], GraphMLP [35], RobustGCN [36], LATGCN [37], MedianGCN [38] and ONF (ONFdw and ONFde) [18]. Some previous methods have been proposed to deal with attribute-incomplete graphs, but these methods, including SAT [19], GCNMF [20] and D2PT [21], require some node features as input, which differs from exopGCN, which does not require any node features. Similar to exopGCN, the method proposed in [18] (which we abbreviate as ONF) does not need any node features and classifies nodes with SGC [31]. The node features for ONF are generated by learning-based and centrality-based approaches. Here, we select the two feature-generation algorithms with the best reported performance [18, 23]: deepwalk [39] (ONFdw) from the learning-based approaches and degree (ONFde) from the centrality-based approaches. The dimension of the output node features for deepwalk is set to 64, and the node features generated by degree are represented by a one-hot vector [23]. Then, the node classification performance of ONFdw and ONFde is compared with that of exopGCN.
In exopGCN, the convolutional operator Mt is generated by our method (see S2 Datasets), while the convolutional operators of the other 13 GNNs are the adjacency matrices A. For the Cora, Citeseer and Pubmed graphs (the indices of training and testing nodes for these three graphs are recorded in S1 File), we evaluate exopGCN and the 13 GNN methods with 5% of the nodes for training and 10% for testing. For the other three small graphs, in order to improve performance, the training nodes are selected evenly from the different communities (or labels) (the indices of training and testing nodes for the three small graphs are recorded in S1 File). For example, in Karate with two communities (or labels), half of the training nodes come from the first community and the remaining half from the other community. Since these three graphs are small, we evaluate exopGCN and the 13 GNN methods with 20% of the nodes for training and 20% for testing. The accuracies of exopGCN and the 13 GNN methods are shown in Table 1.
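The even selection of training nodes across communities described above can be sketched as follows (a hypothetical helper; the paper's exact node indices are recorded in S1 File, so this is only an illustration of the sampling idea):

```python
import random

def stratified_train_nodes(labels, train_size):
    """Pick training nodes evenly across communities (labels).

    labels     -- list of community labels, one per node
    train_size -- total number of training nodes to select
    """
    by_label = {}
    for node, lab in enumerate(labels):
        by_label.setdefault(lab, []).append(node)
    per_class = train_size // len(by_label)   # equal share per community
    train = []
    for nodes in by_label.values():
        train.extend(random.sample(nodes, per_class))
    return train
```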
The performance of exopGCN improves significantly over the other 13 GNNs on Cora, Citeseer, Pubmed, Dolphins and Polbook. The largest gain appears on Polbook, where exopGCN outperforms the worst-performing method (LATGCN) by 68.18% and the best-performing method (SGC) by 36.36%. Compared with the worst-performing methods on Cora, Citeseer, Pubmed and Dolphins, the relative improvements of exopGCN are 23.71%, 34.74%, 33.01% and 53.85%, respectively; compared with the best-performing methods, they are 3.71%, 22.36%, 0.3% and 5%. exopGCN shows poor performance on Karate.
4. The performance of graph neural networks with the proposed convolutional operator
In this section, we analyze the performance of current GNNs with the simple convolutional operator proposed in this work. First, the 13 GNNs mentioned above are employed for testing, with their convolutional operators (adjacency matrices) replaced by our proposed convolutional operator (Mt) (see S2 Datasets). Second, the 13 modified GNNs are used to classify nodes on the six graphs (Cora, Citeseer, Pubmed, Karate, Dolphins and Polbook). Finally, the accuracies of the 13 modified and original GNNs are computed. Note that the parameter settings in this section are the same as those in the Results section. Fig 3 shows the change in accuracy between the modified and original GNNs. From Fig 3, it can be observed that, overall, the accuracy of most GNNs is improved by using the proposed convolutional operator (Mt). Next, we investigate the results in detail. The accuracies of eight GNNs (GCN, FastGCN, GAT, SGC, ClusterGCN, GraphMLP, LATGCN and ONFde) are improved significantly on Cora, Citeseer, Pubmed and Polbook. One method (MedianGCN) obtains worse performance on Cora after adding the proposed convolutional operator; the corresponding numbers of methods with decreased accuracy are 0, 6, 2, 2 and 2 on Citeseer, Pubmed, Karate, Dolphins and Polbook, respectively. The best improvement in accuracy is achieved by LATGCN on Polbook, at 68.18%. The worst result is FastGCN, with a decrease of 38.46% on Dolphins after adding the proposed convolutional operator. Generally speaking, these GNNs show poor performance on Pubmed, probably because long-range propagation may bring redundant information for node classification. Nevertheless, these results provide a general technique for improving the accuracy of node classification without features.
Co, Ci, Pu, Ka, Do and Po represent Cora, Citeseer, Pubmed, Karate, Dolphins and Polbook.
5. Selection of the parameter t
Selection of the parameter t plays a crucial role in improving the performance and preventing overfitting of exopGCN. Here, we discuss the relationship between the parameter t and the accuracy on the six graphs: Cora, Citeseer, Pubmed, Karate, Dolphins and Polbook. The parameter t is set from 1 to 6; note that if the graph diameter is less than 6, the upper limit of t is set to the graph diameter. The results are shown in Fig 4. From Fig 4, it can be observed that as the parameter t increases, the accuracy generally decreases. On Pubmed, Karate, Dolphins and Polbook, the best accuracies appear when t is set to 2. For Citeseer, although the best accuracy of 50.45% is obtained when t is set to 3, exopGCN with t = 2 achieves a close accuracy of 45.92%. For Cora, the best accuracy of 42.96% is obtained when t is set to 4, while the accuracy is 37.41% when t is set to 2. The behavior on Cora may arise because it has different properties from the other five graphs. In general, it is reasonable to set the parameter t to 2 for exopGCN, and t = 2 also helps prevent overfitting across diverse graphs.
6. The relationship between node classification and community detection
Furthermore, we discuss the relationship between node classification using GCN and community detection. From Eq (13), it can be observed that when nodes do not have features, GCN aggregates information through the convolutional matrix, so nodes that share the same neighboring nodes obtain similar information. For example, nodes v5 and v6 in Fig 1 have similar features. Therefore, GCN clusters nodes with similar neighboring nodes into a class. This concept is close to community structure, in which the connections between nodes within a community are tight while the connections to other nodes in the network are loose [40].
In order to analyze the relationship between node classification using GCN and community detection, we first introduce the edge clustering coefficient [41] of an edge eij, defined by Eq (20):

(20)  C_ij = (z_ij + 1) / min(k_i − 1, k_j − 1)

where z_ij is the number of triangles containing the edge eij, and ki and kj are the degrees of nodes vi and vj respectively. The edge clustering coefficient can also be expressed in terms of neighboring nodes (see Eq (21)):

(21)  C_ij = (|N_ij| + 1) / min(k_i − 1, k_j − 1)

where Nij is the set of common neighboring nodes of nodes vi and vj.
From the definition of the edge clustering coefficient in Eq (21), it can be observed that if nodes vi and vj have more common neighboring nodes, the edge clustering coefficient of the edge connecting them is larger. Likewise, we find that GCN tends to cluster two nodes with common neighboring nodes into one class because the two nodes have similar features. Therefore, if the edge clustering coefficient of the edge connecting node vi and node vj is large, node vi and node vj are clustered into the same class by GCN. From the literature [41], we know that an edge eij connecting two nodes in the same community tends to have a large edge clustering coefficient. Therefore, if two nodes are grouped into one class by GCN, the two nodes are likely to be in the same community.
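The common-neighbor form of the edge clustering coefficient (Eq (21)) is straightforward to compute from a dense adjacency matrix (a sketch; the "+1" term and the convention for degree-1 endpoints follow our reading of the coefficient in [41]):

```python
import numpy as np

def edge_clustering_coefficient(A, i, j):
    """Edge clustering coefficient of edge e_ij, Eq (21):
    C_ij = (|N_ij| + 1) / min(k_i - 1, k_j - 1),
    where N_ij is the set of common neighbors of v_i and v_j."""
    k_i, k_j = A[i].sum(), A[j].sum()            # node degrees
    n_common = np.sum((A[i] > 0) & (A[j] > 0))   # |N_ij|
    denom = min(k_i - 1, k_j - 1)
    if denom <= 0:
        return float("inf")  # convention when an endpoint has degree 1
    return (n_common + 1) / denom
```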
Modularity (see Eq (22)) [42] is widely used to measure the community structure of a graph:

(22)  Q = (1 / 2m) Σ_ij (A_ij − P_ij) δ(C_i, C_j)

where m and A represent the number of edges and the adjacency matrix respectively, Pij is the expected number of edges between nodes vi and vj in the null model, and δ(Ci, Cj) = 1 if nodes vi and vj are in the same community (Ci = Cj) and zero otherwise.
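Modularity Q as in Eq (22) can be evaluated directly; the sketch below assumes the standard configuration null model P_ij = k_i k_j / 2m and dense matrices:

```python
import numpy as np

def modularity(A, communities):
    """Newman modularity Q (Eq (22)) with the configuration null model
    P_ij = k_i * k_j / (2m)."""
    m = A.sum() / 2.0                      # number of edges
    k = A.sum(axis=1)                      # node degrees
    c = np.asarray(communities)
    same = (c[:, None] == c[None, :])      # delta(C_i, C_j)
    P = np.outer(k, k) / (2.0 * m)         # expected edges in null model
    return ((A - P) * same).sum() / (2.0 * m)
```

For example, a graph of two disconnected edges split into two communities attains Q = 0.5, reflecting perfect community separation for that degree sequence.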
In order to study the relationship between node classification using GCN and community detection, the accuracy of node classification using GCN and community detection are compared on both six real-world graphs (Cora, Citeseer, Pubmed, Karate, Dolphins and Polbook) and nine synthetic graphs with different values of modularity. The nine synthetic graphs, called LFR benchmarks, were proposed by Lancichinetti et al [43]. To generate these synthetic graphs, several parameters must be set: (1) the number of nodes n, the average degree ⟨k⟩ and the maximum degree ⟨max k⟩; (2) the minimum community size ⟨min c⟩ and the maximum community size ⟨max c⟩; (3) the minus exponent for the degree sequence ⟨t1⟩ and the minus exponent for the community size distribution ⟨t2⟩; (4) the mixing parameter ⟨μ⟩. The parameter μ is an index of community structure: a low μ indicates that the generated graphs have strong community structure. In this work, we set these parameters as follows: n = 1000, k = 8, max k = 40, t1 = 2, t2 = 1, min c = 5 and max c = 35. By tuning the mixing parameter μ ∈ [0.1, 0.9] with a step of 0.1, we obtain 9 synthetic graphs (see S1 Datasets) with community labels.
First, the nodes in the six real-world graphs (see S3 Datasets) and the nine synthetic graphs (see S3 Datasets) are classified by GCN (the indices of training and testing nodes for these 15 graphs are recorded in S1 File). Note that the training and testing sizes of the six real-world graphs are set as in the Results section, while the training and testing sizes of the nine synthetic graphs are set to 20%. Second, the values of modularity Q are calculated using the real labels (or community labels). Finally, the accuracy of node classification using GCN and the values of Q are compared; the results are shown in Fig 5. From Fig 5A, it can be observed that overall there is no strong regularity between the accuracy and the modularity Q: high values of modularity Q correspond to high accuracy only on Pubmed and Karate. On the contrary, the relationship between accuracy and modularity Q shows strong regularity on the nine synthetic graphs (see Fig 5B): the accuracy of GCN decreases as the value of Q decreases. This suggests that the principles of GCN and modularity Q may be similar, in that both tend to cluster nodes with similar neighboring nodes into a class or a community.
Ecc denotes the edge clustering coefficient; In and Out denote Cm_in and Cm_out, respectively; and s1, s2, s3, s4, s5, s6, s7, s8 and s9 represent the nine synthetic graphs.
We study the weak regularity in the six real-world graphs and the strong regularity in the nine synthetic graphs in detail, and use a modified edge clustering coefficient (Cm, see Eq (23)) to explain the phenomena in Fig 5A and 5B. Here, we calculate two types of Cm for an edge eij: Cm_in, for edges whose two endpoints (vi and vj) have the same label (or lie in the same community), and Cm_out, for edges whose two endpoints have different labels (or lie in different communities). For an edge eij, a high value of Cm_in means that the two nodes connected by eij share many common neighboring nodes, so the two nodes can be classified by GCN with high accuracy. Conversely, a low value of Cm_out means that the two nodes connected by eij share few common neighboring nodes, so the edge eij can easily be broken by community detection methods, yielding a high modularity Q; a high value of Cm_out correspondingly leads to a low modularity Q.

Therefore, we calculate the average values of Cm_in and Cm_out for each graph; the results are shown in Fig 5C and 5D respectively. From Fig 5, we observe that higher values of Cm_in correspond to higher accuracy on Cora, Citeseer, Karate, Dolphins and Polbook (see Fig 5A and 5C, solid red line), with Pubmed as the exception. As analyzed above, lower values of Cm_out correspond to higher modularity Q on Citeseer, Pubmed and Karate (see Fig 5A and 5C, dotted blue line), and higher values of Cm_out correspond to lower modularity Q on Cora, Dolphins and Polbook (see Fig 5A and 5C, dotted blue line). Likewise, the same phenomenon appears on eight of the synthetic graphs, the second synthetic graph being the exception. From these results, the regularity between node classification using GCN and community detection (modularity Q) can be revealed by the modified edge clustering coefficients Cm_in and Cm_out.
7. Conclusion and discussion
Graph convolutional network (GCN), which represents node features through a convolutional matrix and propagation mechanisms, has become a powerful tool for dealing with graph-structured data. However, the performance of GCN deteriorates when it encounters graphs with missing node features. To resolve this problem, we first introduce a simple and effective convolutional operator based on path-driven neighboring nodes, and then propose a modified GCN named exopGCN for node classification. Experimental results demonstrate that exopGCN shows better performance than other GNNs for node classification on graphs without node features. Furthermore, the performance of 13 GNNs is improved significantly by adding the proposed convolutional operator, which means that our research provides a general technique to improve the performance of GNNs for node classification on graphs without features. More importantly, using the edge clustering coefficient as a bridge, the relationship between node classification using GCN without features and traditional community detection is studied. The resulting positive relationship can help reveal the theory of GCN from the view of traditional, unsupervised methods.
Here, we discuss two issues with exopGCN and a direction for further research. The first issue is the application of exopGCN to node classification with features. To examine this, exopGCN is employed to classify the nodes with features on Cora, Citeseer and Pubmed, and the results obtained by the other 13 GNNs are provided for comparison (see S1 Table). As shown in S1 Table, exopGCN does not obtain the best performance on the three graphs. This suggests that aggregating features from long-range neighboring nodes does not improve the accuracy of exopGCN but instead leads to feature redundancy, and thus it cannot classify nodes effectively. The second issue is the complexity of exopGCN. Compared with GCN, the additional overhead is the computational cost of the convolutional operator M2. In fact, the computation of the convolutional operator can be converted into K-hop reachability queries [44] with K = 2 for each node in the graph, and many fast algorithms [45, 46] have been proposed to resolve this problem and have been applied to large real-world graphs. Finally, although exopGCN achieves good performance on node classification without node features, exopGCN and some GNNs with the proposed convolutional operator show worse performance in certain cases (see Fig 3). The main reason for this phenomenon may be that the proposed convolutional operator carries redundant information. As in [13, 14], an attention mechanism could be used to select vital 2-path neighboring nodes and improve the performance of GNNs.
Supporting information
S1 Datasets. Six original graph data and nine synthetic graph data.
https://doi.org/10.1371/journal.pone.0301476.s001
(ZIP)
S2 Datasets. Input data for exopGCN and different GNNs.
https://doi.org/10.1371/journal.pone.0301476.s002
(ZIP)
S1 Table. The accuracies of exopGCN and other GNNs on node classification with features.
https://doi.org/10.1371/journal.pone.0301476.s005
(PDF)
References
- 1. Zhou J, Cui G, Hu S, Zhang Z, Yang C, Liu Z, et al. Graph neural networks: A review of methods and applications. AI Open. 2020; 57–81.
- 2.
Ju W, Yang J, Qu M, Song W, Shen J, Zhang M. KGNN: Harnessing Kernel-based Networks for Semi-supervised Graph Classification. WSDM’22: The ACM International Conference on Web Search and Data Mining; 2022 Feb 21–25; Arizona, America. New York: Association for Computing Machinery (ACM); 2022.
- 3.
Qin Y, Wang Y, Sun F, Ju W, Hou X, Wang Z, et al. DisenPOI: Disentangling Sequential and Geographical Influence for Point-of-Interest Recommendation. WSDM’23: The ACM International Conference on Web Search and Data Mining; 2023 Feb 27-Mar 3; Singapore, Singapore. New York: Association for Computing Machinery (ACM); 2023.
- 4. Wu S, Sun F, Zhang W, Xie X, Cui B. Graph Neural Networks in Recommender Systems: A Survey. ACM Comput Surv. 2022; 37(4): 111.
- 5. Wu L, Chen Y, Shen K, Guo X, Gao H, Li S, et al. Graph Neural Networks for Natural Language Processing: A Survey. Found Trends Mach Learn. 2023; 16(2): 119–328.
- 6. Kipf TN, Welling M. Semi-supervised classification with graph convolutional networks. ICLR 2017: International Conference on Learning Representations; 2017 Apr 24–26; Toulon, France. OpenReview.net; 2017. p. 1–14.
- 7. Rong Y, Huang W, Xu T, Huang J. Dropedge: towards deep graph convolutional networks on node classification. ICLR 2020: International Conference on Learning Representations; 2020 Apr 26–30; Addis Ababa, Ethiopia. OpenReview.net; 2020. p. 1–17.
- 8. Eliasof M, Haber E, Treister E. pathGCN: Learning General Graph Spatial Operators from Paths. ICML'22: Proceedings of the 39th International Conference on Machine Learning; 2022 Jul 17–23; Maryland, America. New York: PMLR; 2022. p. 5878–91.
- 9. Li G, Muller M, Thabet A, Ghanem B. DeepGCNs: Can GCNs Go as Deep as CNNs? ICCV 2019: 2019 IEEE/CVF International Conference on Computer Vision; 2019 Oct 27-Nov 2; Seoul, Korea (South). New Jersey: Institute of Electrical and Electronics Engineers (IEEE); 2019. p. 9267–76.
- 10. Chen M, Wei Z, Huang Z, Ding B, Li Y. Simple and Deep Graph Convolutional Networks. ICML'20: Proceedings of the 37th International Conference on Machine Learning; 2020 Jul 13–18; Online. New York: PMLR; 2020. p. 1725–35.
- 11. Chen J, Ma T, Xiao C. Fastgcn: fast learning with graph convolutional networks via importance sampling. ICLR 2018: International Conference on Learning Representations; 2018 Apr 30-May 3; British Columbia, Canada. OpenReview.net; 2018. p. 1–15.
- 12. Chiang W, Liu X, Si S. Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks. KDD'19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2019 Aug 4–8; Alaska, America. New York: Association for Computing Machinery (ACM); 2019. p. 257–66.
- 13. Wang G, Ying R, Huang J, Leskovec J. Multi-hop Attention Graph Neural Networks. IJCAI-21: The 30th International Joint Conference on Artificial Intelligence; 2021 Aug 19–26; Montreal, Canada. Massachusetts: Morgan Kaufmann Publishers; 2021. p. 3089–96.
- 14. Xue H, Sun X, Sun W. Multi-hop Hierarchical Graph Neural Networks. BigComp 2020: 2020 IEEE International Conference on Big Data and Smart Computing; 2020 Feb 19–22; Busan, Korea. New Jersey: Institute of Electrical and Electronics Engineers (IEEE); 2020. p. 82–9.
- 15. Li Y, Tanaka Y. Structure-Aware Multi-Hop Graph Convolution for Graph Neural Networks. IEEE Access. 2022; 10: 16624–33.
- 16. Jin D, Yu Z, Huo C, Wang R, Wang X, He D, et al. Universal Graph Convolutional Networks. NeurIPS 2021: Proceedings of the 35th International Conference on Neural Information Processing Systems; 2021 Dec 6–14; Online. New York: Curran Associates Inc.; 2021. p. 1–11.
- 17. Jin W, Derr T, Wang Y, Ma Y, Liu Z, Tang J. Node Similarity Preserving Graph Convolutional Networks. WSDM'21: The ACM International Conference on Web Search and Data Mining; 2021 Mar 8–12; Jerusalem, Israel. New York: Association for Computing Machinery (ACM); 2021. p. 148–156.
- 18. Duong CT, Hoang TD, Dang HTH, Nguyen QVH, Aberer K. On Node Features for Graph Neural Networks. NeurIPS 2019: Proceedings of the 33rd International Conference on Neural Information Processing Systems; 2019 Dec 8–14; Vancouver, Canada. New York: Curran Associates Inc.; 2019. p. 1–6.
- 19. Chen X, Chen S, Yao J, Zheng H, Zhang Y, Tsang IW. Learning on Attribute-Missing Graphs. IEEE Trans Pattern Anal Mach Intell. 2020; 44(2): 740–57.
- 20. Taguchi H, Liu X, Murata T. Graph convolutional networks for graphs containing missing features. Future Gener Comp Sys. 2021; 117: 155–68.
- 21. Liu Y, Ding K, Wang J, Lee V, Liu H, Pan S. Learning Strong Graph Neural Networks with Weak Information. KDD'23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining; 2023 Aug 6–10; California, America. New York: Association for Computing Machinery (ACM); 2023.
- 22. Morris C, Kriege NM, Bause F, Kersting K, Mutzel P, Neumann M. TUDataset: A collection of benchmark datasets for learning with graphs. ICML Workshop on Graph Representation Learning and Beyond. 2020.
- 23. Cui H, Lu Z, Li P, Yang C. On Positional and Structural Node Features for Graph Neural Networks on Non-attributed Graphs. CIKM'22: Proceedings of the 31st ACM International Conference on Information and Knowledge Management; 2022 Oct 17–21; Atlanta, America. New York: Association for Computing Machinery (ACM); 2022. p. 3898–902.
- 24. Zachary WW. An information flow model for conflict and fission in small groups. J Anthropol Res. 1977; 33(4): 452–73.
- 25. Ramakrishnan R, Dral PO, Rupp M, Anatole von Lilienfeld O. Quantum chemistry structures and properties of 134 kilo molecules. Sci Data. 2014; 1: 140022. pmid:25977779
- 26. Ruddigkeit L, van Deursen R, Blum LC, Reymond J-L. Enumeration of 166 Billion Organic Small Molecules in the Chemical Universe Database GDB-17. J Chem Inf Model. 2012; 52 (11): 2864–75. pmid:23088335
- 27. Jiao Q, Zhao P, Zhang H, Han Y, Liu G. Path-enhanced graph convolutional networks for node classification without features. PLoS ONE. 2023; 18(6): e0287001. pmid:37294827
- 28. Sen P, Namata G, Bilgic M, Getoor L, Galligher B, Eliassi-Rad T. Collective classification in network data. AI magazine, 2008; 29(3): 93.
- 29. Lusseau D, Schneider K, Boisseau OJ, Haase P, Slooten E, Dawson SM. The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Behav Ecol Sociobiol. 2003; 54: 396–405.
- 30. Veličković P, Cucurull G, Casanova A, Romero A, Liò P, Bengio Y. Graph Attention Networks. ICLR 2018: International Conference on Learning Representations; 2018 Apr 30-May 3; British Columbia, Canada. OpenReview.net; 2018. p. 1–12.
- 31. Wu F, Souza A, Zhang T, Fifty C, Yu T, Weinberger K. Simplifying Graph Convolutional Networks. ICML'19: Proceedings of the 36th International Conference on Machine Learning; 2019 Jun 9–15; California, America. New York: PMLR; 2019. p. 6861–71.
- 32. Liu M, Gao H, Ji S. Towards Deeper Graph Neural Networks. KDD'20: Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining; 2020 Aug 23–27; Virtual Event, America. New York: Association for Computing Machinery (ACM); 2020. p. 338–48.
- 33. Gasteiger J, Bojchevski A, Günnemann S. Predict then Propagate: Graph Neural Networks meet Personalized PageRank. ICLR 2019: International Conference on Learning Representations; 2019 May 6–9; New Orleans, America. OpenReview.net; 2019. p. 1–15.
- 34. Zhu H, Koniusz P. Simple Spectral Graph Convolution. ICLR 2021: International Conference on Learning Representations; 2021 May 3–7; Online. OpenReview.net; 2021. p. 1–15.
- 35. Hu Y, You H, Wang Z, Wang Z, Zhou E, Gao Y. Graph-MLP: Node Classification without Message Passing in Graph. arXiv:2106.04051, [Preprint]. 2021. Available from: https://arxiv.53yu.com/pdf/2106.04051.pdf.
- 36. Zhu D, Zhang Z, Cui P, Zhu W. Robust Graph Convolutional Networks Against Adversarial Attacks. KDD'19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2019 Aug 4–8; Alaska, America. New York: Association for Computing Machinery (ACM); 2019. p. 1399–407.
- 37. Jin H, Zhang X. Latent Adversarial Training of Graph Convolution Networks. ICML Workshop on Learning and Reasoning with Graph Structured Representations; 2019. Available from: https://www.cs.uic.edu/~hjin/files/icml_ws_latgcn.pdf.
- 38. Chen L, Li J, Peng Q, Liu Y, Zheng Z, Yang C. Understanding Structural Vulnerability in Graph Convolutional Networks. IJCAI-21: The 30th International Joint Conference on Artificial Intelligence; 2021 Aug 19–26; Montreal, Canada. Massachusetts: Morgan Kaufmann Publishers; 2021. p. 2249–55.
- 39. Perozzi B, Al-Rfou R, Skiena S. Deepwalk: Online learning of social representations. KDD'14: Proceedings of the 20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining; 2014 Aug 24–27; New York, America. New York: Association for Computing Machinery (ACM); 2014. p. 701–10.
- 40. Girvan M, Newman MEJ. Community structure in social and biological networks. Proc Natl Acad Sci USA. 2002; 99(12): 7821–26. pmid:12060727
- 41. Radicchi F, Castellano C, Cecconi F, Loreto V, Parisi D. Defining and identifying communities in networks. Proc Natl Acad Sci USA. 2004; 101(9): 2658–63. pmid:14981240
- 42. Newman MEJ, Girvan M. Finding and evaluating community structure in networks. Phys Rev E. 2004; 69(2): 026113. pmid:14995526
- 43. Lancichinetti A, Fortunato S. Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Phys Rev E. 2009; 80(1): 016118. pmid:19658785
- 44. Cheng J, Shang Z, Cheng H, Wang H, Yu JX. Efficient processing of k-hop reachability queries. VLDB J. 2014; 23: 227–52.
- 45. Peng Y, Lin X, Zhang Y, Zhang W, Qin L. Answering reachability and K-reach queries on large graphs with label constraints. VLDB J. 2022; 31: 101–27.
- 46. Yildirim H, Chaoji V, Zaki MJ. GRAIL: a scalable index for reachability queries in very large graphs. VLDB J. 2012; 21: 509–34.