Signed random walk diffusion for effective representation learning in signed graphs

How can we model node representations to accurately infer the signs of missing edges in a signed social graph? Signed social graphs have attracted considerable attention to model trust relationships between people. Various representation learning methods such as network embedding and graph convolutional network (GCN) have been proposed to analyze signed graphs. However, existing network embedding models are not end-to-end for a specific task, and GCN-based models exhibit a performance degradation issue when their depth increases. In this paper, we propose Signed Diffusion Network (SidNet), a novel graph neural network that achieves end-to-end node representation learning for link sign prediction in signed social graphs. We propose a new random walk based feature aggregation, which is specially designed for signed graphs, so that SidNet effectively diffuses hidden node features and uses more information from neighboring nodes. Through extensive experiments, we show that SidNet significantly outperforms state-of-the-art models in terms of link sign prediction accuracy.


Introduction
Given a signed social graph, how can we learn appropriate node representations to infer the signs of missing edges? Signed social graphs model trust relationships between people with positive (trust) and negative (distrust) edges. Many online social services such as Epinions [1] and Slashdot [2] that allow users to express their opinions are naturally represented as signed social graphs. Such graphs have attracted considerable attention [3] for diverse applications including sign prediction [4,5], link prediction [6][7][8], node ranking [9][10][11][12], community analysis [13][14][15][16], graph generation [17,18], and anomaly detection [19][20][21].
Node representation learning is a fundamental building block for analyzing graph data, and many researchers have put tremendous effort into developing effective models for unsigned graphs. Graph convolutional networks (GCN) and their variants [22,23] have attracted great attention in the data mining and machine learning communities, and recent works [24,25] have demonstrated stunning progress by handling the performance degradation caused by over-smoothing [26,27] (i.e., node representations become indistinguishable as the number of propagation steps increases) or the vanishing gradient problem [25] in the first generation of GCN models. However, all of these models have limited performance on node representation learning in signed graphs since they consider only unsigned edges under the homophily assumption [22].
Many studies have recently been conducted to handle signed edges, and they are categorized into network embedding and GCN-based models. Network embedding [28,29] learns the representations of nodes by optimizing an unsupervised loss that primarily aims to locate two nodes' embeddings closely (or far apart) if they are positively (or negatively) connected. However, these methods are not trained jointly with a specific task in an end-to-end manner, i.e., latent features and the task are trained separately; thus, their performance is limited unless each of them is tuned delicately. GCN-based models [30,31] have extended graph convolutions to signed graphs using balance theory [32] in order to properly propagate node features on signed edges. However, these models are directly extended from existing GCNs without consideration of the over-smoothing problem that degrades their performance (see Fig 4). This problem hinders them from exploiting more information from multi-hop neighbors when learning node features in signed graphs.
In this paper, we propose SIGNED DIFFUSION NETWORK (SIDNET), a novel graph neural network for node representation learning in signed graphs. Our main contributions are summarized as follows:
• Method. We propose SIDNET, an end-to-end representation learning method for signed graphs with multiple signed diffusion layers (Fig 1). Our signed diffusion layer exploits signed random walks to propagate node embeddings on signed edges, and injects local features at each aggregation (Fig 1). This enables SIDNET to learn distinguishable node embeddings that effectively consider multi-hop neighbors while preserving local information.
• Theory. We theoretically analyze the convergence property (Theorem 1) of our signed diffusion layer, showing how SIDNET prevents the over-smoothing issue. We also provide a time complexity analysis (Theorem 2) of SIDNET, showing that SIDNET scales linearly with the number of edges.
• Experiments. Extensive experiments show that SIDNET effectively learns node representations of signed social graphs for link sign prediction, giving at least 3.3% higher accuracy than the state-of-the-art models on real datasets (Table 3).
The symbols used in this paper are summarized in Table 1. The code of SIDNET and datasets are available at https://github.com/snudatalab/SidNet.

Graph convolutional networks on unsigned graphs
Graph convolutional network (GCN) [22] models the latent representation of a node by applying a convolutional operation to the features of its neighbors. Various GCN-based approaches [22,23,33] have attracted considerable attention since they enable diverse supervised graph tasks [22,34,35] to be performed concisely within an end-to-end framework. However, the first generation of GCN models exhibits performance degradation due to the over-smoothing and vanishing gradient problems. Several works [26,27] have theoretically analyzed the over-smoothing problem. Also, Li et al. [25] have empirically shown that stacking more GCN layers leads to the vanishing gradient problem, as in convolutional neural networks [36]. Consequently, most GCN-based models [22,23,33] are shallow; i.e., they do not use the feature information of faraway nodes when modeling node embeddings.
A recent research direction aims at resolving these limitations. Klicpera et al. [24] proposed APPNP, which exploits Personalized PageRank [37,38] to propagate hidden node embeddings far while preserving local features, thereby preventing aggregated features from being over-smoothed. Li et al. [25] suggested ResGCN, which adds skip connections between GCN layers as in ResNet [36]. However, none of these models provides a way to use signed edges since they are based on the homophily assumption [22], i.e., connected users are likely to be similar, which does not hold for negative edges. In contrast to homophily, negative edges carry the semantics of heterophily [39], i.e., connected users are dissimilar. Although these methods can still be applied to signed graphs by ignoring edge signs, their trained features have limited capacity.
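As a concrete illustration of APPNP's idea (a sketch under our own naming, not the authors' code), its propagation rule Z^(k) = (1 − α) Â Z^(k−1) + α H repeatedly diffuses features over a normalized unsigned adjacency matrix Â while re-injecting the initial features H with restart probability α:

```python
import numpy as np

def appnp_propagate(A_hat, H, alpha=0.1, K=10):
    """APPNP-style propagation: Z^(k) = (1 - alpha) * A_hat @ Z^(k-1) + alpha * H.

    A_hat : normalized (unsigned) adjacency matrix, shape (n, n)
    H     : initial hidden features, shape (n, d)
    alpha : restart probability that re-injects local features at every step,
            which is what keeps the result from over-smoothing
    """
    Z = H.copy()
    for _ in range(K):
        Z = (1 - alpha) * A_hat @ Z + alpha * H
    return Z

# Toy 3-node path graph with row-normalized adjacency.
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
A_hat = A / A.sum(axis=1, keepdims=True)
H = np.eye(3)
Z = appnp_propagate(A_hat, H)
```

Because Â is row-stochastic and each row of H sums to 1 here, every row of Z stays a probability distribution, mirroring Personalized PageRank scores per seed node.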

Network embedding and graph convolutional networks on signed graphs
Traditional network embedding methods extract latent node features specialized for signed graphs in an unsupervised manner. Kim et al. [28] proposed SIDE, which optimizes a likelihood over direct and indirect signed connections on truncated random walks sampled from a signed graph. Xu et al. [29] developed SLF, which considers positive, negative, and non-linked relationships between nodes to learn non-negative node embeddings. However, such approaches are not end-to-end, i.e., they are not directly optimized for solving a supervised task such as link prediction.
There has been recent progress on end-to-end learning on signed networks under the GCN framework. Derr et al. [30] proposed SGCN, which extends the GCN mechanism to signed graphs by considering balanced and unbalanced relationships supported by structural balance theory [32]. There are also several techniques based on attention. Junjie et al. [40] proposed a graph attention network model that incorporates the importance of graph motifs into node features. Yu et al. [31] reported that their SNEA model outperforms the motif-based attention model by combining the graph attention technique and the balanced relationships. However, these state-of-the-art models do not consider the over-smoothing problem since they are directly extended from GCN.

Proposed method
We propose SIDNET (SIGNED DIFFUSION NETWORK), a novel end-to-end model for node representation learning in signed graphs. SIDNET aims to properly aggregate node features on signed edges, and to effectively use the features of multi-hop neighbors so that the generated features are not over-smoothed. Our main ideas are to diffuse node features along random walks considering the signs of edges, and to inject local node features at each aggregation.
Fig 1 depicts the overall architecture of SIDNET. Given a signed graph G and initial node features X ∈ R^{n×d_0}, SIDNET extracts the final node embeddings H^{(L)} ∈ R^{n×d_L} through multiple layers, where n is the number of nodes, L is the number of layers, and d_l is the embedding dimension of the l-th layer. Then, H^{(L)} is fed into the loss function of a specific task so that the embeddings and the task are jointly trained in an end-to-end framework. Given H^{(l−1)}, the l-th layer learns H^{(l)} based on feature transformations and the signed random walk diffusion F_d(·), as shown in Fig 1. The layer also uses a skip connection to prevent the vanishing gradient problem when the depth of SIDNET increases.
Fig 1 also illustrates the intuition behind the signed random walk diffusion. Each node has two features corresponding to positive and negative surfers, respectively. The surfer flips its sign when moving along negative edges, while its sign is kept along positive edges. For example, the positive (or negative) surfer becomes positive at node v if it moves from a positively connected node u (or a negatively connected node t). As a result, the aggregated features at node v become similar to those of nodes connected by positive edges (e.g., node u), and different from those of nodes connected by negative edges (e.g., node t). In other words, our diffusion satisfies homophily and heterophily at the same time, while unsigned GCNs cannot handle the heterophily of negative edges.
Furthermore, we inject the local feature (i.e., the input feature of the module) of node v at each aggregation so that the resulting features remain distinguishable during the diffusion.

Signed diffusion network
Given a signed graph G and the node embeddings H^{(l−1)} from the previous layer, the l-th layer learns new embeddings H^{(l)} as shown in Fig 1. It first transforms H^{(l−1)} into hidden features H̃^{(l)} = H^{(l−1)} W_t^{(l)} with a learnable parameter W_t^{(l)} ∈ R^{d_{l−1}×d_l}. Then, it applies the signed random walk diffusion, represented as the function F_d(G, H̃^{(l)}), which returns P^{(l)} ∈ R^{n×d_l} and M^{(l)} ∈ R^{n×d_l} as the positive and negative embeddings, respectively. The embeddings are concatenated and transformed as follows:

H^{(l)} = φ([P^{(l)} ‖ M^{(l)}] W_n^{(l)}) + H^{(l−1)},   (1)

where φ(·) is a non-linear activator such as tanh, ‖ denotes horizontal concatenation of two matrices, and W_n^{(l)} ∈ R^{2d_l×d_l} is a trainable weight matrix that learns a relationship between P^{(l)} and M^{(l)}. We use the skip connection [25,36] with H^{(l−1)} in Eq (1) to avoid the vanishing gradient issue, which frequently occurs when multiple layers are stacked.
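The layer update of Eq (1) can be sketched in numpy as below. This is a minimal sketch under our reading that the skip connection adds H^{(l−1)} after the non-linear transformation and that d_{l−1} = d_l; the function name and argument names are ours:

```python
import numpy as np

def sidnet_layer_update(P, M, H_prev, W_n):
    """Eq (1): H^(l) = tanh([P || M] W_n) + H^(l-1) (skip connection).

    P, M   : positive / negative diffused embeddings, shape (n, d)
    H_prev : previous layer's embeddings, shape (n, d) (same d, for the skip)
    W_n    : trainable weight matrix, shape (2d, d)
    """
    PM = np.concatenate([P, M], axis=1)   # horizontal concatenation [P || M]
    return np.tanh(PM @ W_n) + H_prev     # skip connection mitigates vanishing gradients
```

With W_n set to all zeros, tanh yields zeros and the skip connection passes H_prev through unchanged, which is exactly why gradients can flow through deep stacks of these layers.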

Signed random walk diffusion
We design the signed random walk diffusion operator F_d(·) used in the l-th layer. Given the signed graph G and the hidden node embeddings H̃^{(l)}, the diffusion operator F_d(·) diffuses the node features via random walks that consider edge signs, so that it properly aggregates node features on signed edges and prevents the aggregated features from being over-smoothed.
Signed random walks are performed by a signed random surfer [11] who has the + or − sign when moving around the graph. Fig 2 shows signed random walks on four cases according to edge signs: 1) a friend's friend, 2) a friend's enemy, 3) an enemy's friend, and 4) an enemy's enemy. The surfer starts from node s with the + sign. If it encounters a negative edge, the surfer flips its sign from + to −, or vice versa. Otherwise, the sign is kept. The surfer determines whether a target node t is a friend of node s or not according to its sign.
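The sign-flipping rule can be stated compactly: the surfer's final sign is the product of the edge signs along its walk. A minimal sketch (the function name is ours):

```python
def surfer_sign(path_signs):
    """Sign of a signed random surfer after walking a path.

    The surfer starts with +1 and flips its sign on every negative edge,
    so its final sign is the product of the edge signs along the path.
    path_signs: list of +1 / -1 edge signs.
    """
    sign = 1
    for s in path_signs:
        sign *= s
    return sign

# The four cases of Fig 2:
# friend's friend (+,+) -> friend; friend's enemy (+,-) -> enemy;
# enemy's friend (-,+) -> enemy; enemy's enemy (-,-) -> friend.
```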
The diffusion operator F_d(·) exploits the signed random walk to diffuse node features on signed edges. Each node is represented by two feature vectors corresponding to the positive and negative signs, respectively. Let k denote the number of diffusion (signed random walk) steps. Then p_v^{(k)} ∈ R^{d_l×1} and m_v^{(k)} ∈ R^{d_l×1} are aggregated at node v, where p_v^{(k)} (or m_v^{(k)}) is the feature vector carried by the positive (or negative) surfer at step k. These are recursively obtained by the following equations:

p_v^{(k)} = (1 − c) ( Σ_{u∈Ñ_v^+} p_u^{(k−1)}/|Ñ_u| + Σ_{u∈Ñ_v^−} m_u^{(k−1)}/|Ñ_u| ) + c h̃_v^{(l)},
m_v^{(k)} = (1 − c) ( Σ_{u∈Ñ_v^+} m_u^{(k−1)}/|Ñ_u| + Σ_{u∈Ñ_v^−} p_u^{(k−1)}/|Ñ_u| ),   (2)

where Ñ_v^s is the set of incoming neighbors of node v connected by edges of sign s, Ñ_u is the set of outgoing neighbors of node u regardless of edge signs, h̃_v^{(l)} is the local feature of node v (i.e., the v-th row vector of H̃^{(l)}), and 0 < c < 1 is the local feature injection ratio. That is, the features are computed by the signed random walk feature diffusion with weight 1 − c and the local feature injection with weight c, with the following details. Note that the convergence of Eq (2) is guaranteed as shown in Theorem 1.
Local feature injection. Although the feature diffusion above properly considers edge signs, the generated features could be over-smoothed after many steps if we depended solely on the diffusion. In other words, the diffusion alone considers only the graph information explored by the signed random surfer, while the local information in the hidden feature h̃_v^{(l)} is disregarded. Hence, as shown in Fig 2, we explicitly inject the local feature h̃_v^{(l)} into p_v^{(k)} with weight c at each aggregation in Eq (2) so that the diffused features are not over-smoothed. The reason why local features are injected only into the + embeddings is that a node should trust (+) its own information (i.e., its local feature).
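For a single node, one step of Eq (2) can be sketched as follows. This is a per-node illustration with our own names; in practice the whole iteration is vectorized over all nodes, as in the next subsection:

```python
import numpy as np

def diffuse_node(pos_in, neg_in, out_deg, p_prev, m_prev, h_local, c=0.15):
    """One step of Eq (2) for a single node v (a sketch; names are ours).

    pos_in / neg_in : lists of in-neighbors of v over + / - edges
    out_deg         : dict mapping node u to its out-degree |N~_u|
    p_prev, m_prev  : dicts of step-(k-1) positive / negative feature vectors
    h_local         : local feature h~_v^(l), injected only into the + embedding
    """
    # A + surfer arrives at v from a + in-neighbor's p, or from a - in-neighbor's m.
    p_v = (1 - c) * (sum(p_prev[u] / out_deg[u] for u in pos_in)
                     + sum(m_prev[u] / out_deg[u] for u in neg_in)) + c * h_local
    # A - surfer arrives at v from a + in-neighbor's m, or from a - in-neighbor's p.
    m_v = (1 - c) * (sum(m_prev[u] / out_deg[u] for u in pos_in)
                     + sum(p_prev[u] / out_deg[u] for u in neg_in))
    return p_v, m_v
```

Note how the local feature enters only the positive embedding, with weight c, exactly as in Eq (2).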
Discussion. Our approach is motivated by SGCN [30], APPNP [24], and SRWR [11,41]. We describe how we utilize and combine their ideas to develop our method, and how our fusion resolves their limitations when it comes to learning node representations in signed graphs.
• Motivation from SGCN. The main idea of SGCN is to make GCN consider balanced and unbalanced paths based on structural balance theory, so that the information of balanced and unbalanced paths is reflected in positive and negative embeddings, respectively. Inspired by this idea, we also maintain positive and negative embeddings for each node, and make our aggregation phase follow the balance theory. However, simply extending GCN with the balance theory as SGCN does cannot resolve the over-smoothing issue, as shown in Fig 4. Thus, we combine the following ideas of APPNP and SRWR in this framework to overcome the limitation.
• Motivation from APPNP. To resolve the over-smoothing issue of unsigned GCNs, APPNP utilizes Random Walk with Restart (or personalized pagerank) [42] in a GCN. As a result, APPNP demonstrates that the restart of RWR prevents the over-smoothing problem by inserting input features stochastically during its diffusion (or aggregation) phase. This motivates us to introduce the local feature injection for the same purpose to avoid the issue when learning node embeddings in signed graphs. However, APPNP does not provide a way to deal with signed edges for aggregating node embeddings. To address this challenge, we adopt the signed random walks of SRWR.
• Motivation from SRWR. The signed random walks of SRWR were originally proposed for propagating probabilities, not embedding vectors, in order to measure node-to-node similarity scores used as personalized rankings in a signed graph. Thus, this technique had not been studied for learning node representations in signed graphs. Hinted by SGCN and APPNP, we utilize the signed random walks with the local feature injection as shown in Eq (2), and demonstrate that our method effectively considers signed edges while resolving the aforementioned over-smoothing issue.

Convergence of signed random walk diffusion
Suppose that P^{(k)} = [p_1^{(k)⊤}; ⋯; p_n^{(k)⊤}] and M^{(k)} = [m_1^{(k)⊤}; ⋯; m_n^{(k)⊤}] represent the positive and negative embeddings of all nodes, respectively, where ; denotes vertical concatenation. Let A_s be the adjacency matrix for sign s such that A_{s,uv} is 1 for a signed edge (u → v, s), and 0 otherwise. Then, Eq (2) is vectorized as follows:

P^{(k)} = (1 − c)(Ã_+^⊤ P^{(k−1)} + Ã_−^⊤ M^{(k−1)}) + c H̃^{(l)},
M^{(k)} = (1 − c)(Ã_−^⊤ P^{(k−1)} + Ã_+^⊤ M^{(k−1)}),   (3)

where Ã_s = D^{−1} A_s is the normalized matrix for sign s, and D is the diagonal out-degree matrix (i.e., D_{ii} = |Ñ_i|). The signed random walk diffusion operator F_d(·) iterates Eq (3) K times for 1 ≤ k ≤ K, where K is the number of diffusion steps, and it returns P^{(l)} ← P^{(K)} and M^{(l)} ← M^{(K)} as the outputs of the diffusion module at the l-th layer. Furthermore, Eq (3) is compactly represented as

T^{(k)} = B̃ T^{(k−1)} + Q̃,   (4)

where

T^{(k)} = [P^{(k)}; M^{(k)}],   B̃ = (1 − c) [Ã_+^⊤ Ã_−^⊤; Ã_−^⊤ Ã_+^⊤],   Q̃ = c Q,   Q = [H̃^{(l)}; 0].

Then, T^{(k)} is guaranteed to converge as k increases (see Theorem 1).
Discussion. According to Eq (5) of Theorem 1, B̃^K Q̃ is the node features diffused by K-step signed random walks, where B̃^K is interpreted as the (scaled) transition matrix of K-step signed random walks, and Q̃ ≔ cQ is the scaled input feature of the diffusion layer. Thus, the approximation is the sum of the diffused features from 1 to K steps with a decaying factor 1 − c, i.e., the effect of distant nodes gradually decreases while that of neighboring nodes remains high. This is why SIDNET prevents diffused features from being over-smoothed. Also, the approximation error ‖T* − T^{(K)}‖₁ decreases exponentially as K increases due to the term (1 − c)^K. Another point is that the iteration of Eq (3) converges to the same solution no matter which P^{(0)} and M^{(0)} are given. In this work, we initialize P^{(0)} with H̃^{(l)}, and randomly initialize M^{(0)} in [−1, 1].
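The iteration of Eq (3) can be sketched directly in numpy. Because the spectral radius of the iteration matrix is at most 1 − c (Theorem 1), running the loop for more steps leaves the result essentially unchanged; function and variable names below are ours:

```python
import numpy as np

def signed_diffusion(A_pos, A_neg, H, c=0.15, K=100):
    """Vectorized signed random walk diffusion (Eq 3), iterated K times.

    P^(k) = (1-c) (A~_+^T P^(k-1) + A~_-^T M^(k-1)) + c H
    M^(k) = (1-c) (A~_-^T P^(k-1) + A~_+^T M^(k-1))
    with A~_s = D^-1 A_s, D the diagonal out-degree matrix over both signs.
    """
    deg = (A_pos + A_neg).sum(axis=1)
    inv = np.divide(1.0, deg, out=np.zeros_like(deg, dtype=float), where=deg > 0)
    At_pos = (inv[:, None] * A_pos).T   # A~_+^T
    At_neg = (inv[:, None] * A_neg).T   # A~_-^T
    # Initialization does not affect the limit; we use P = H, M = 0 here.
    P, M = H.copy(), np.zeros_like(H)
    for _ in range(K):
        P, M = ((1 - c) * (At_pos @ P + At_neg @ M) + c * H,
                (1 - c) * (At_neg @ P + At_pos @ M))
    return P, M
```

On a toy two-node graph, running 100 versus 120 steps gives (numerically) the same fixed point, illustrating the convergence guarantee.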
As shown in Fig 1, we use multiple layers in SIDNET with the non-linear activator tanh(·) to increase its learning capacity and model latent non-linear patterns inherent in data. As a result, SIDNET performs K × L-hop feature propagation, where K and L are the numbers of diffusion steps and layers, respectively. One advantage of this approach is that users can flexibly control feature propagation and model capacity to suit their own purposes.

Algorithm of SIDNET
Algorithm 1 summarizes the overall procedure of SIDNET, which is depicted in Fig 1. Given a signed adjacency matrix A and the related hyperparameters (e.g., the numbers L and K of layers and diffusion steps, respectively), SIDNET produces the final hidden node features H^{(L)}, which are fed to a loss function as described in the following section. It first computes the normalized matrices Ã_+ and Ã_− (line 1). Then, it performs the forward function (lines 3–12). The forward function repeats the signed random walk diffusion K times (lines 6–9), and then performs the non-linear feature transformation skip-connected with H^{(l−1)} (line 11).
Algorithm 1: SIDNET
Input: signed adjacency matrix A, initial node feature matrix X, number K of diffusion steps, number L of layers, and local feature injection ratio c
Output: hidden node feature matrix H^{(L)}
1: compute normalized adjacency matrices for each sign, i.e., Ã_+ = D^{−1}A_+ and Ã_− = D^{−1}A_−
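Putting the pieces together, the forward pass of Algorithm 1 can be sketched in numpy as below. This is an untrained sketch under our own naming: the caller supplies the weights (W_t, W_n) that the real model learns by backpropagation, and we fix the embedding dimension d for all layers for simplicity:

```python
import numpy as np

def sidnet_forward(A_pos, A_neg, X, weights, c=0.15, K=10):
    """A numpy sketch of SIDNET's forward pass (Algorithm 1).

    weights: list of (W_t, W_n) pairs, one per layer,
             with W_t of shape (d, d) and W_n of shape (2d, d).
    """
    deg = (A_pos + A_neg).sum(axis=1)
    inv = np.divide(1.0, deg, out=np.zeros_like(deg, dtype=float), where=deg > 0)
    At_pos = (inv[:, None] * A_pos).T              # line 1: A~_+ = D^-1 A_+
    At_neg = (inv[:, None] * A_neg).T              # line 1: A~_- = D^-1 A_-
    H = X
    for W_t, W_n in weights:                       # one signed diffusion layer each
        H_tilde = H @ W_t                          # feature transformation
        P, M = H_tilde.copy(), np.zeros_like(H_tilde)
        for _ in range(K):                         # signed random walk diffusion
            P, M = ((1 - c) * (At_pos @ P + At_neg @ M) + c * H_tilde,
                    (1 - c) * (At_neg @ P + At_pos @ M))
        H = np.tanh(np.concatenate([P, M], axis=1) @ W_n) + H  # Eq (1), skip-connected
    return H
```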

Loss function for link sign prediction
The link sign prediction task is to predict the missing sign of a given edge. The final embeddings of the two endpoint nodes are concatenated and used to compute the following cross-entropy loss:

L = − Σ_{(u,v)∈E} Σ_{t∈{+,−}} I(s_{uv} = t) log softmax_t(W [h_u^{(L)} ‖ h_v^{(L)}]^⊤),

where E is the set of signed edges, s_{uv} is the sign of edge (u, v), h_u^{(L)} is the u-th row of H^{(L)}, W ∈ R^{2×2d_L} is a learnable weight matrix, softmax_t(·) is the probability for sign t after the softmax operation, and I(·) returns 1 if a given predicate is true, and 0 otherwise.
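With the symbols above, the loss can be sketched in plain numpy (names are ours, and averaging over |E| is our normalization choice):

```python
import numpy as np

def sign_loss(H, edges, signs, W):
    """Cross-entropy over edge signs, as described in the text.

    H     : final node embeddings, shape (n, d_L)
    edges : list of (u, v) index pairs
    signs : list of labels in {0, 1} (e.g., 0 = negative, 1 = positive)
    W     : weight matrix of shape (2, 2 * d_L) scoring concatenated pair embeddings
    """
    loss = 0.0
    for (u, v), t in zip(edges, signs):
        z = W @ np.concatenate([H[u], H[v]])  # two logits, one per sign
        z = z - z.max()                        # numerically stable softmax
        log_prob = z - np.log(np.exp(z).sum())
        loss -= log_prob[t]                    # -log softmax_t(...)
    return loss / len(edges)
```

With W = 0, both signs get equal probability 1/2, so the loss is log 2 per edge, a useful sanity check before training.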

Analysis
We first show the convergence guarantee of T^{(k)}, the positive and negative embeddings of all nodes, in Theorem 1 and Lemma 1. Our analysis is inspired by the convergence analysis of [41], which describes the power iteration of a single probability vector on a transition matrix constructed by signed random walks. In this work, we extend the analysis to the power iteration of multidimensional embedding vectors, and show why our method prevents the over-smoothing issue in Eq (5) (see its interpretation below Eq (4)).
Proof. Let B ≔ [Ã_+^⊤ Ã_−^⊤; Ã_−^⊤ Ã_+^⊤] so that B̃ = (1 − c)B. According to the spectral radius theorem [43], ρ(B) ≤ ‖B‖₁, where ‖·‖₁ denotes the L1 norm of a matrix, i.e., its maximum absolute column sum. Note that the entries of B are non-negative probabilities; thus, the absolute column sums of B are equal to its column sums, each of which equals the corresponding entry of

b^⊤ = 1_n^⊤ Ã^⊤,

where Ã^⊤ = Ã_+^⊤ + Ã_−^⊤, and 1_n is an n-dimensional all-ones vector. Note that Ã_s^⊤ = A_s^⊤ D^{−1} for sign s, where D is the diagonal out-degree matrix (i.e., D_{uu} = |Ñ_u|). Then, 1_n^⊤ Ã^⊤ is represented as

1_n^⊤ Ã^⊤ = 1_n^⊤ (A_+^⊤ + A_−^⊤) D^{−1} = (|A| 1_n)^⊤ D^{−1},

where |A| = A_+ + A_− is the absolute adjacency matrix. The u-th entry of |A| 1_n is the out-degree of node u, denoted by |Ñ_u|. Note that D_{uu}^{−1} is 1/|Ñ_u| if u is a non-deadend; otherwise, D_{uu}^{−1} = 0 (i.e., a deadend node has no outgoing edges). Hence, the u-th entry of b^⊤ is 1 if node u is not a deadend, and 0 otherwise; its maximum value is at most 1. Therefore, ρ(B) ≤ ‖B‖₁ ≤ 1, and consequently ρ(B̃) ≤ (1 − c)‖B‖₁ ≤ 1 − c < 1.
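The bound can be checked numerically on a random signed graph (a sanity check of the argument, not part of the proof):

```python
import numpy as np

# For B~ = (1 - c) * [[A~_+^T, A~_-^T], [A~_-^T, A~_+^T]], the L1 norm
# (maximum absolute column sum) is at most 1 - c, hence rho(B~) <= 1 - c < 1.
rng = np.random.default_rng(1)
n, c = 6, 0.15
A_pos = (rng.random((n, n)) < 0.4).astype(float); np.fill_diagonal(A_pos, 0)
A_neg = ((rng.random((n, n)) < 0.3).astype(float)) * (1 - A_pos); np.fill_diagonal(A_neg, 0)
deg = (A_pos + A_neg).sum(axis=1)
inv = np.divide(1.0, deg, out=np.zeros_like(deg), where=deg > 0)
At_pos = (inv[:, None] * A_pos).T
At_neg = (inv[:, None] * A_neg).T
B_tilde = (1 - c) * np.block([[At_pos, At_neg], [At_neg, At_pos]])
l1_norm = np.abs(B_tilde).sum(axis=0).max()          # max absolute column sum
rho = np.abs(np.linalg.eigvals(B_tilde)).max()        # spectral radius
```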
Complexity analysis. We analyze the time complexity of SIDNET in Theorem 2, which shows that SIDNET scales linearly with the number of edges.

Experiments
We evaluate the effectiveness of SIDNET through the link sign prediction task on real-world signed graphs. Specifically, we aim to answer the following questions:
• Q1. Link sign prediction. How effective is our proposed SIDNET at predicting the signs of missing edges compared to state-of-the-art methods?
• Q2. Ablation study. How does each component of SIDNET affect node representation learning in connection with the link sign prediction?
• Q3. Effect of local injection ratio. How does the local feature injection ratio c of SIDNET affect the performance of link sign prediction?
• Q4. Effect of propagation hops. How do the propagation hops of SIDNET affect the performance of link sign prediction?
• Q5. Effect of embedding dimension. How does the dimension of embeddings produced by SIDNET affect the accuracy of link sign prediction compared to other methods?

Experimental setting
Datasets. We perform experiments on five signed graphs summarized in Table 2. The Bitcoin-Alpha and Bitcoin-OTC datasets [5] are extracted from directed online trust networks served by Bitcoin Alpha and Bitcoin OTC, respectively. The Wikipedia dataset [44] is a signed graph representing the administrator election procedure in Wikipedia where a user can vote for (+) or against (−) a candidate. The Slashdot dataset [2] is collected from Slashdot, a technology news site which allows a user to create positive or negative links to others. The Epinions dataset [1] is a directed signed graph scraped from Epinions, a product review site in which users mark their trust or distrust to others.
The publicly available signed graphs do not contain initial node features even though they have been used as representative datasets in signed graph analysis. For this reason, many previous works [30,31] on GCNs for signed graphs have exploited singular value decomposition (SVD) to extract initial node features. Thus, we follow their setup: X = UΣ_{d_i} is the initial feature matrix for all GCN-based models, where A ≈ UΣ_{d_i}V^⊤ is obtained by a truncated SVD method called Randomized SVD [45] with target rank d_i = 128. Note that the method is very efficient (its time complexity is O(n d_i²), where n is the number of nodes) and is performed only once as a preprocessing step; thus, it has little effect on the computational performance of training and inference.
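This feature extraction can be sketched as follows, substituting numpy's exact SVD for Randomized SVD [45] (which computes the same rank-d_i factors approximately); the function name is ours:

```python
import numpy as np

def svd_features(A_pos, A_neg, d_i):
    """Initial node features X = U Sigma_{d_i} from a truncated SVD of A.

    This sketch uses numpy's exact SVD; the paper uses Randomized SVD, which
    approximates the same rank-d_i factors in O(n * d_i^2) time.
    """
    A = A_pos - A_neg                        # signed adjacency: +1 / -1 / 0 entries
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :d_i] * S[:d_i]              # scale left singular vectors by Sigma
```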
Competitors. We compare our proposed SIDNET with the following competitors:
• SRWR [11,41] is a personalized ranking method for measuring trustworthiness scores between nodes based on signed random walks. In [41], the authors used the Wikipedia, Slashdot, and Epinions datasets as directed graphs without preprocessing. They randomly selected 2,000 seed nodes and chose 20% of the positive and negative links of each node as validation and test sets; the remaining edges were used as a training set. They measured accuracy (i.e., the ratio of correct predictions) and macro F1 score for the task.
• APPNP [24] is an unsigned GCN model based on Personalized PageRank.
• ResGCN [25] is another unsigned GCN model exploiting skip connections to stack multiple layers.
• SIDE [28] is a network embedding model optimizing the likelihood over signed edges using random walk sequences to encode structural information into node embeddings. In [28], they used the Wikipedia, Slashdot, and Epinions datasets as directed graphs without preprocessing, and performed 5-fold cross validation. They measured AUC and F1 score for the task.
• SLF [29] is another network embedding model considering positive, negative, and nonlinked relationships to learn non-negative node embeddings. In [29], they used the Wikipedia, Slashdot, and Epinions datasets as directed graphs without preprocessing. They randomly split each dataset into training and test sets by the 8:2 ratio. They used AUC and F1 score for the task.
• SGCN [30] is a state-of-the-art signed GCN model that propagates embeddings over balanced and unbalanced paths, motivated by balance theory. In [30], they used the Bitcoin-Alpha, Bitcoin-OTC, Slashdot, and Epinions datasets. They modified each dataset so that the resulting graph becomes undirected, and randomly filtered out nodes with few links from the two larger networks (Slashdot and Epinions). For each graph, they randomly split edges into training and test sets by an 8:2 ratio. They used AUC and F1 score for the task.
• SNEA [31] is another signed GCN model extending SGCN by learning attentions on the balanced and unbalanced paths for modeling embeddings. According to [31], the experimental setup of SNEA is the same as that of SGCN.
Note that each dataset originally represents a directed graph, not an undirected one. Thus, we test all methods, including SGCN and SNEA, on directed graphs formed from the unfiltered original datasets. Also, APPNP and ResGCN are originally designed for unsigned graphs (i.e., they were not tested on the sign prediction task in [24,25]); in this work, we use the absolute adjacency matrix for APPNP and ResGCN.
Implementation and machines. All methods are implemented with PyTorch and NumPy in Python. We use a machine with an Intel E5-2630 v4 2.2GHz CPU and a GeForce GTX 1080 Ti GPU.
Data split and evaluation metrics. We randomly split the edges of a signed graph into training and test sets by the 8:2 ratio. As shown in Table 2, the sign ratio is highly skewed to the positive sign, i.e., the sampled datasets are naturally imbalanced. Considering the class imbalance, we measure the area under the curve (AUC) to evaluate predictive performance. We also report macro F1 measuring the average of the ratios of correct predictions for each sign since negative edges need to be treated as important as positive edges (i.e., it gives equal importance to each class). A higher value of AUC or macro F1 indicates better performance. We repeat each experiment 10 times with different random seeds and report the average and standard deviation of test values.
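The macro F1 metric described above (the unweighted mean of per-class F1, so the rare negative class counts as much as the dominant positive class) can be sketched in plain numpy; the function name is ours:

```python
import numpy as np

def macro_f1(y_true, y_pred):
    """Macro F1 for binary sign labels (0 = negative, 1 = positive).

    Computes F1 for each class separately and averages them with equal weight,
    which is what makes the metric robust to the skewed sign ratio.
    """
    f1_scores = []
    for cls in (0, 1):
        tp = np.sum((y_pred == cls) & (y_true == cls))
        fp = np.sum((y_pred == cls) & (y_true != cls))
        fn = np.sum((y_pred != cls) & (y_true == cls))
        prec = tp / (tp + fp) if tp + fp > 0 else 0.0
        rec = tp / (tp + fn) if tp + fn > 0 else 0.0
        f1_scores.append(2 * prec * rec / (prec + rec) if prec + rec > 0 else 0.0)
    return float(np.mean(f1_scores))
```

For example, predicting all-positive on a 3:1 positive-skewed set yields F1 of 6/7 on the positive class but 0 on the negative class, so the macro score drops to 3/7, penalizing the trivial predictor.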
Hyperparameter settings. We set the dimension of final node embeddings to 32 for all methods so that their embeddings have the same learning capacity for the target task. We perform 5-fold cross-validation for each method to find the best hyperparameters and measure the test accuracy with the selected ones. In the cross-validation for SIDNET, the local injection ratio c is selected from 0.05 to 0.95 by step size 0.1. We set the number L of layers to 2, the number K of diffusion steps to 10, and the feature dimension d l of each layer to 32. We follow the range of each hyperparameter recommended in its corresponding paper for the cross-validation of other models. Our model is trained by the Adam optimizer [46], where the learning rate is 0.01, the weight decay λ is 0.001, and the number of epochs is 100.

Link sign prediction
We evaluate the performance of each method on link sign prediction. Tables 3 and 4 summarize the experimental results in terms of AUC and macro F1, respectively; in each table, the best model is in bold, the second best model is underlined, and the % increase measures the best accuracy against the second best. Note that our SIDNET shows the best performance in terms of both AUC and macro F1, presenting 3.3–6.6% and 1.6–7.4% improvements over the second best models, respectively. We have the following observations.
• SIDNET outperforms the unsupervised method SRWR on link sign prediction over all datasets; this implies that learning node embeddings with signed random walks and local feature injection is more effective for the task.

• The unsigned GCN models APPNP and ResGCN show worse performance than SIDNET, which shows the importance of using sign information.
• The performance of network embedding techniques such as SIDE and SLF is worse than that of GCN-based models; this shows the importance of jointly learning feature extraction and link sign prediction.
• The performance of SGCN and SNEA, which use limited features from nodes within 2–3 hops, is worse than that of SIDNET, which exploits up to K × L-hop neighbors' features (K is set to 10 and L to 2 in these experiments). This indicates that carefully exploiting features from distant nodes as well as neighboring ones is crucial for the performance.

Ablation study
We examine the effectiveness of each component used in SIDNET through an ablation study.
As a baseline, we consider the signed random walk diffusion (SRWDiff) of a single layer with no other components, which is achieved by setting c = 0, K = 10, and L = 1. Then, we combine SRWDiff with the local feature injection (LFI) by setting c > 0 where the value of c varies with datasets. As seen in the second row of Table 5, this combination significantly improves AUC of the link sign prediction, especially in Wikipedia and Slashdot datasets. This emphasizes the importance of injecting local features into the signed random walk diffusion process. Further, the performance slightly increases by using multiple layers (ML) with the skip connection (SC) over all datasets as shown in the fourth row of Table 5.

Effect of local injection ratio
We examine the effect of the local injection ratio c in the diffusion module of SIDNET. We use one layer, and set the number K of diffusion steps to 10; we vary c from 0.05 to 0.95 by 0.1, and measure the performance of the link sign prediction task in terms of macro F1. Fig 3 shows the effect of c for the predictive performance of SIDNET. For small datasets such as Bitcoin-Alpha and Bitcoin-OTC, c between 0.15 and 0.35 provides the best performance. On the other hand, c around 0.5 shows the best accuracy for Wikipedia, Slashdot, and Epinions datasets. For all datasets, a too low or too high value of c (e.g., 0.05 or 0.95) results in a poor performance. For each dataset, we select the value of c producing the best accuracy in Fig 3, and record it in Table 2 for the following experiments.

Effect of propagation hops
We investigate the effect of the propagation hop count with which features are propagated in SIDNET when learning from signed graphs. As described in Theorem 1, the hop count of SIDNET is determined by K × L, where K and L are the numbers of diffusion steps and layers, respectively. Thus, we examine the effects of either or both of K and L. In these experiments, we use the local injection ratio c in Table 2 for each dataset.
Effect of the number K of diffusion steps. To see its pure effect, we use one layer (L = 1) so that the hop count is decided only by the number K of diffusion steps. We vary K from 1 to 10 and evaluate the performance of SIDNET in terms of AUC for each diffusion step. Fig 3 shows that the performance of SIDNET gradually improves over all datasets as the hop count increases. Note that the performance of SIDNET converges in general after a sufficient number of diffusion steps, which follows from Theorem 1.
Effect of the number L of layers. In this experiment, we set K to 1 so that the hop count is decided only by the number L of layers. We increase L from 1 to 10, and compare SIDNET to SGCN, the state-of-the-art model for learning from signed graphs; the hop count of SGCN is also determined by its number of layers. Fig 4 shows that the performance of SIDNET gradually improves as L increases, while that of SGCN dramatically decreases over all datasets. This indicates that SGCN suffers from the performance degradation problem when its network becomes deep, i.e., it is difficult to use information beyond 3-hop neighbors in SGCN. In contrast, SIDNET utilizes features of farther nodes, and generates more expressive and stable embeddings than SGCN does.
Effect of both K and L. We further vary both K and L to investigate the effect of hop counts determined by K × L, where 1 ≤ K, L ≤ 10. Fig 5 demonstrates the AUC trend in link sign prediction, with the following observations:
• SIDNET produces better accuracy when the hop count is between 20 and 30 in general, while a small hop count results in inferior performance over all tested datasets.
• Overall, the upper left triangle of each plot is redder than the lower right triangle, implying that K of our diffusion module (i.e., diffusing features via signed random walks) is more influential on the performance of SIDNET than L (i.e., simply stacking layers).

Effect of embedding dimension
We investigate the effect of the node embedding dimension of each model on the link sign prediction task. For this experiment, we vary the dimension of hidden and final node embeddings from 8 to 128, where the other hyperparameters of each model are set to those producing the best accuracy in Table 3. Then, we observe the trend of AUC in the link sign prediction task. As shown in Fig 6, SIDNET significantly outperforms its competitors over all tested dimensions, and it is relatively less sensitive to the embedding dimension than other models in all datasets except Bitcoin-Alpha.

Conclusion
In this paper, we propose SIGNED DIFFUSION NETWORK (SIDNET), a novel graph neural network that performs end-to-end node representation learning for link sign prediction in signed graphs. We propose a signed random walk diffusion method to properly diffuse node features on signed edges, and suggest a local feature injection method to make diffused features distinguishable. Our diffusion method empowers SIDNET to effectively train node embeddings considering multi-hop neighbors while preserving local information. Our extensive experiments show that SIDNET provides the best accuracy outperforming the state-of-the-art models in link sign prediction. Future research directions include analyzing our model for graph reconstruction and clustering in signed graphs, and extending it for multi-view networks.