Supervised and extended restart in random walks for ranking and link prediction in networks

Given a real-world graph, how can we measure relevance scores for ranking and link prediction? Random walk with restart (RWR) provides an excellent measure for this and has been applied to various applications such as friend recommendation, community detection, anomaly detection, etc. However, RWR suffers from two problems: 1) using the same restart probability for all the nodes limits the expressiveness of random walk, and 2) the restart probability needs to be manually chosen for each application without theoretical justification. We have two main contributions in this paper. First, we propose Random Walk with Extended Restart (RWER), a random walk based measure which improves the expressiveness of random walks by using a distinct restart probability for each node. The improved expressiveness leads to superior accuracy for ranking and link prediction. Second, we propose SuRe (Supervised Restart for RWER), an algorithm for learning the restart probabilities of RWER from a given graph. SuRe eliminates the need to heuristically and manually select the restart parameter for RWER. Extensive experiments show that our proposed method provides the best performance for ranking and link prediction tasks.

However, RWR has two challenges for providing more effective relevance scores. First, RWR assumes a fixed restart probability on all nodes, i.e., a random surfer jumps back to the query node with the same probability regardless of where the surfer is located. This assumption prevents the surfer from considering the query node's preferences for other nodes, thereby limiting the expressiveness of random walk for measuring good relevance scores. Second, RWR requires users to heuristically select the restart probability parameter without any theoretical guide or justification.
In this paper, we propose a novel relevance measure, Random Walk with Extended Restart (RWER), an extended version of RWR that reflects a query node's preferences on relevance scores by allowing a distinct restart probability for each node. We also propose a supervised learning method, SuRe (Supervised Restart for RWER), that automatically finds optimal restart probabilities in RWER from a given graph. Extensive experiments show that our method provides the best link prediction accuracy: e.g., SuRe boosts MAP (Mean Average Precision) by up to 15.8% over the best competitor, as shown in Figure 1. Our main contributions are summarized as follows:
• Model. We propose Random Walk with Extended Restart (RWER), a new random walk model to improve the expressiveness of RWR. RWER allows each node to have a distinct restart probability so that the random surfer has finer control over preferences for each node.
• Learning. We propose SuRe, an algorithm for learning the restart probabilities in RWER from data. SuRe automatically determines the best restart probabilities.
• Experiment. We empirically demonstrate that our proposed method improves accuracy on all datasets. Specifically, our proposed method improves MAP by up to 15.8% and Precision@20 by up to 10.1% over the best competitor.
The code of our method and the datasets used in this paper are available at http://datalab.snu.ac.kr/sure. The rest of this paper is organized as follows. Section 2 presents a preliminary on RWR and defines the problem. Our proposed methods are described in Section 3. After presenting experimental results in Section 4, we provide a review of related works in Section 5. Lastly, we conclude in Section 6.

Preliminaries
In this section, we describe the preliminaries on Random Walk with Restart. Then, we formally define the problem handled in this paper. We use $A_{ij}$ or $A(i, j)$ to denote the entry at the intersection of the i-th row and j-th column of matrix A, $A(i, :)$ to denote the i-th row of A, and $A(:, j)$ to denote the j-th column of A. The i-th element of the vector x is denoted by $x_i$.

2.1 Random Walk with Restart.
Random walk with restart (also known as Personalized PageRank, PPR) [28] measures each node's proximity (relevance) w.r.t. a given query node s in a graph. RWR assumes a random surfer who starts at node s. The surfer moves to one of its neighboring nodes with probability $1 - c$ or restarts at node s with probability c. When the surfer moves from u to one of its neighbors, each neighbor v is selected with a probability proportional to the weight of the edge (u, v). The relevance score between seed node s and node u is the stationary probability that the surfer is at node u. If the score is large, we consider that nodes s and u are highly related.
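The surfer model above can be sketched as a short power iteration. This is only an illustrative sketch: the function name `rwr`, the dense list-of-lists adjacency format, the default values of `c`, `tol`, and `max_iter`, and the use of an L1 stopping criterion are all assumptions made here, not details taken from the paper.

```python
def rwr(adj, s, c=0.15, tol=1e-10, max_iter=1000):
    """Random walk with restart scores for query node s (illustrative sketch).

    adj[u][v] is the weight of edge (u, v). Every node is assumed to have
    at least one outgoing edge. c is the single restart probability.
    """
    n = len(adj)
    out_weight = [sum(row) for row in adj]
    r = [0.0] * n
    r[s] = 1.0  # the surfer starts at the query node
    for _ in range(max_iter):
        nxt = [0.0] * n
        for u in range(n):
            for v in range(n):
                if adj[u][v]:
                    # move from u to v with probability proportional to the edge weight
                    nxt[v] += (1.0 - c) * r[u] * adj[u][v] / out_weight[u]
        # restart: a fraction c of the total mass jumps back to s
        nxt[s] += c * sum(r)
        delta = sum(abs(a - b) for a, b in zip(nxt, r))
        r = nxt
        if delta < tol:
            break
    return r
```

The returned list approximates the stationary probabilities, so a larger entry `r[u]` means node u is considered more relevant to s.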
Limitations. RWR cannot consider a query node's preferences when estimating relevance scores between the query node and other nodes. For example, suppose we compute relevance scores from the query node A to other nodes in the political blog network in Figure 2, where blue nodes are liberal blogs, red nodes are conservative, black nodes are not labeled, and an edge between nodes indicates a hyperlink between the corresponding blogs. Based on the topology of the graph, we consider that nodes E and F tend to be moderate, node D is likely to be liberal, and node I is likely to be conservative. Since the query node A is a liberal blog, node A will prefer other liberal nodes to conservative nodes. However, a conservative node G is ranked higher than nodes related to liberal blogs, such as nodes C and D, in the ranking result of RWR. The reason is that preferences are not considered in RWR, and the random surfer jumps back to the query node A with a fixed restart probability c wherever the surfer is. On the other hand, RWER reflects the query node's preferences on relevance scores by allowing a distinct restart probability for each node.
Another practical problem is that it is non-trivial to set an appropriate value of the restart probability c for different applications, since c must be chosen manually so that it provides optimal relevance scores for each application. RWR scores are highly affected by the restart probability; the ranking results for c = 0.1 and c = 0.5 are quite different, as seen in Figure 2. In contrast, our learning method SuRe automatically determines the optimal restart probabilities for all nodes based on the query node's preferences as well as relationships between nodes. The detailed descriptions of our proposed approaches RWER and SuRe are presented in Section 3.

2.2 Problem Definition.
We are given a graph G with n nodes and m edges, a query node s, and side information from the query node. The side information contains a set of positive nodes $P = \{x_1, ..., x_k\}$ that s prefers, and a set of negative nodes $N = \{y_1, ..., y_l\}$ that s dislikes. Our task is to learn restart probabilities for all nodes such that relevance scores of the positive nodes are greater than those of the negative ones.

Proposed Method
In this section, we describe Random Walk with Extended Restart (RWER), our proposed model for extended restart probabilities. We also propose SuRe, an efficient algorithm for learning the restart probabilities.

Overview of Random Walk with Extended Restart
RWER is a novel relevance model reflecting a query node's preferences on relevance scores. The main idea of RWER is to introduce a restart probability vector, each entry of which is the restart probability at one node, so that the restart probabilities are related to the preferences for the nodes.
In RWER, the restart probability of each node is interpreted as the degree of boredom of the node w.r.t. the query node. That is, if the restart probability on a node is large, then the surfer runs away from the current node to the query node (i.e., the surfer becomes bored at the node). On the other hand, if the restart probability of the node is small, then the surfer desires to move around the node's neighbors (i.e., the surfer has an interest in the node and its neighbors).
As depicted in Figure 2, each node has its own restart probability in our model RWER. The restart probabilities are determined by our supervised learning method SuRe (Section 3.5) from the query (liberal) node A, the positive (liberal) nodes B and C, and the negative (conservative) nodes G and H. Note that a ranking list where many liberal nodes are ranked high is desirable for the query node A since A is liberal. As shown in Figure 2, using distinct restart probabilities for each node in RWER provides more satisfactory rankings for the query node than using a single restart probability for all nodes in RWR. The restart probabilities of liberal nodes are smaller than those of conservative nodes, which implies that the random surfer prefers searching around the liberal nodes such as B and C, while the surfer is likely to run away from the conservative nodes such as G and H.
One might think that it is enough to simply assign small restart probabilities to positive nodes and large restart probabilities to negative nodes for a desirable ranking. However, the restart probabilities should also be determined for unlabeled nodes, and the probabilities should reflect intricate relationships between nodes as well as the query node's preferences. For example, the restart probability of node F in Figure 2 is relatively moderate because node F is located between a liberal node A and a conservative node G. Also, the restart probability of node D is small since node D is close to other liberal nodes B and C. Similarly, the restart probability of node I is large since node I is closely related to other conservative nodes G and H.

3.2 Formulation of Random Walk with Extended Restart.
We formulate RWER in this section. We first explain the formulation using the example shown in Figure 3, and then present general equations. In the example, the surfer goes to one of its neighbors or jumps back to the query node. To obtain the RWER probability $r_u$ at time $t + 1$, we should take into account the scores of the three nodes i, j, and k at time t. Suppose the surfer is at node i at time t. The surfer can go to an out-neighbor through one of the two outgoing edges with probability $1 - c_i$. Note that every node has a distinct restart probability, and node i has restart probability $c_i$ in this case. Without the restart action, $r_u$ in Figure 3 is defined as follows:

$$r_u^{(t+1)} = (1 - c_i)\frac{r_i^{(t)}}{|\mathrm{OUT}_i|} + (1 - c_j)\frac{r_j^{(t)}}{|\mathrm{OUT}_j|} + (1 - c_k)\frac{r_k^{(t)}}{|\mathrm{OUT}_k|}$$

where $|\mathrm{OUT}_i| = 2$ in this example. Also, the surfer on any node v jumps back to the query node with probability $c_v$. The above equation is rewritten as follows considering the restart action of the random surfer:

$$r_u^{(t+1)} = (1 - c_i)\frac{r_i^{(t)}}{|\mathrm{OUT}_i|} + (1 - c_j)\frac{r_j^{(t)}}{|\mathrm{OUT}_j|} + (1 - c_k)\frac{r_k^{(t)}}{|\mathrm{OUT}_k|} + \mathbf{1}(u = s)\sum_{v} c_v r_v^{(t)}$$

where $\mathbf{1}(u = s)$ is 1 if u is the query node s; otherwise, it is 0. Note that the restart term is different from that of the traditional random walk with restart. Based on the aforementioned example, the recursive equation of our model is defined as follows:

$$r_u^{(t+1)} = \sum_{v \in \mathrm{IN}_u} (1 - c_v)\frac{r_v^{(t)}}{|\mathrm{OUT}_v|} + \mathbf{1}(u = s)\sum_{v} c_v r_v^{(t)} \qquad (3.1)$$

where $\mathrm{IN}_i$ is the set of in-neighbors of node i, and $\mathrm{OUT}_i$ is the set of out-neighbors of node i.
Equation (3.1) is expressed in the form of a matrix equation as follows:

$$r = \tilde{A}^{\top}(I - \mathrm{diag}(c))\, r + (c^{\top} r)\, q \qquad (3.2)$$

where $\tilde{A}$ is the row-normalized matrix of the adjacency matrix A, c is a restart vector whose i-th entry is $c_i$, $\mathrm{diag}(c)$ is a matrix with $\mathrm{diag}(c)_{ii} = c_i$ and all other entries 0, and q is a vector whose s-th element is 1 and all other elements are 0. Notice that if c is a vector all of whose elements are the same, then RWER is equal to RWR (or PPR).
The following lemma shows that equation (3.2) can be represented as a closed form equation.
Note that if c is given, the RWER vector r can be calculated using the closed form in Lemma 3.1. However, the computation using the closed form requires $O(n^3)$ time and $O(n^2)$ memory space due to the matrix inversion, where n is the number of nodes; thus, this approach is impractical when we need to compute RWER scores in large-scale graphs. In order to avoid the heavy computational cost, we exploit an efficient iterative algorithm described in Section 3.3.
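One way a closed form of this kind can be obtained is sketched below. The derivation starts from the matrix form of the RWER recursion (the update in line 3 of Algorithm 2) and assumes that r is a probability vector, i.e., $\mathbf{1}^{\top} r = 1$. The symbol B is the one used in Section 3.5, but its explicit expression here is a reconstruction under that assumption, not a quotation of Lemma 3.1.

```latex
\begin{align*}
r &= \tilde{A}^{\top}(I - \mathrm{diag}(c))\, r + (c^{\top} r)\, q \\
  &= \tilde{A}^{\top}(I - \mathrm{diag}(c))\, r
     + \bigl(1 - (\mathbf{1} - c)^{\top} r\bigr)\, q
     && \text{since } c^{\top} r = \mathbf{1}^{\top} r - (\mathbf{1} - c)^{\top} r \\
  &= \underbrace{\bigl(\tilde{A}^{\top}(I - \mathrm{diag}(c))
     - q\,(\mathbf{1} - c)^{\top}\bigr)}_{B}\, r + q \\
\Longrightarrow \quad r &= (I - B)^{-1} q .
\end{align*}
```

Under this form, the s-th column of $(I - B)^{-1}$ is exactly the RWER score vector for seed node s, which is why a dense inversion would cost $O(n^3)$ time.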

Algorithm for Random Walk with Extended Restart
We present an iterative algorithm for computing RWER scores efficiently. Our algorithm is based on power iteration and comprises two phases: a normalization phase (Algorithm 1) and an iteration phase (Algorithm 2).
Normalization phase (Algorithm 1). Our proposed algorithm first computes the out-degree diagonal matrix D of A (line 1). Then, the algorithm computes the row-normalized matrix $\tilde{A}$ using D (line 2).
Iteration phase (Algorithm 2). Our algorithm computes the RWER score vector r for the seed node s in the iteration phase. As described in Section 3.2, the vector q denotes a length-n starting vector whose entry at the index of the seed node is 1 and otherwise 0 (line 1). Our algorithm iteratively computes equation (3.2) to obtain the new vector $r'$ (line 3). We then compute the error δ between r, the result of the previous iteration, and $r'$ (line 4). Next, we update r to $r'$ for the next iteration (line 5). The iteration stops when the error δ is smaller than a threshold ε (line 6).
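The two phases can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the dense list-of-lists matrix format (which costs $O(n^2)$ per iteration instead of the sparse $O(m)$ assumed in the analysis), the choice of q as the initial vector, and the L1 norm for the error δ are all assumptions made for this sketch.

```python
def row_normalize(adj):
    """Normalization phase: divide each row of A by its out-degree (row sum)."""
    normalized = []
    for row in adj:
        degree = sum(row)
        if degree == 0:
            raise ValueError("every node needs at least one outgoing edge")
        normalized.append([w / degree for w in row])
    return normalized


def rwer(a_norm, s, c, tol=1e-10, max_iter=10000):
    """Iteration phase: RWER scores for seed s with per-node restart vector c."""
    n = len(a_norm)
    q = [0.0] * n
    q[s] = 1.0
    r = q[:]  # assumed initial guess; the source does not state one
    for _ in range(max_iter):
        restart_mass = sum(c[v] * r[v] for v in range(n))  # c^T r
        new_r = [restart_mass * q[u] for u in range(n)]    # (c^T r) q
        for v in range(n):
            walk_mass = (1.0 - c[v]) * r[v]                # mass that keeps walking from v
            for u in range(n):
                if a_norm[v][u]:
                    new_r[u] += walk_mass * a_norm[v][u]   # A~^T (I - diag(c)) r
        delta = sum(abs(a - b) for a, b in zip(new_r, r))  # assumed L1 error
        r = new_r
        if delta < tol:
            break
    return r
```

Passing a constant vector such as `c = [0.15] * n` reduces this sketch to ordinary RWR, which matches the remark that RWER with identical restart probabilities equals RWR.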
Theoretical analysis. We analyze the convergence of the iterative algorithm in Theorem 3.1 and the time complexity in Theorem 3.2. We assume that all the matrices considered are saved in a sparse format, such as the compressed column storage [20], which stores only non-zero entries, and that all the matrix operations exploit such sparsity by only considering non-zero entries.

3.4 Cost Function.
Although our relevance measure RWER improves the expressiveness of RWR by introducing a distinct restart probability for each node, it is difficult to manually investigate the optimal restart probabilities for all nodes in large graphs. In this section, we define the cost function for finding the optimal restart probabilities.
As mentioned in Section 2.2, our goal is to set the optimal restart probabilities so that the relevance scores of positive nodes outweigh those of negative nodes. We define the following cost function:

$$F(c) = \lambda \lVert c - o \rVert^2 + \sum_{x \in P,\, y \in N} h(r_y - r_x) \qquad (3.4)$$

where λ is a regularization parameter that controls the importance of the regularization term, o is a given origin vector, h is a loss function, and $r_x$ and $r_y$ are the RWER scores of nodes x and y, respectively. The cost function is obtained from the pairwise differences between the RWER scores of positive and negative nodes. Given an increasing loss function h, F(c) is minimized as the scores of positive nodes are maximized and those of negative nodes are minimized. The origin vector o prevents the c vector from becoming too small, and serves as a model regularizer which helps avoid overfitting and thus improves accuracy, as we will see in Section 4.4.
We set o to a constant vector, i.e., all of its elements are set to the same constant. We use the loss function $h(x) = (1 + \exp(-x/b))^{-1}$ since this loss function maximizes AUC [17,5].
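To make the structure of the cost concrete, the sketch below evaluates the logistic loss h and a cost made of a regularization term plus the pairwise loss over positive and negative nodes. The function names, the default values of `lam` and `b`, and in particular the squared Euclidean form of the regularizer are assumptions of this sketch rather than details confirmed by the text.

```python
import math


def loss(x, b=1.0):
    """h(x) = 1 / (1 + exp(-x / b)); increasing in x, with width parameter b."""
    return 1.0 / (1.0 + math.exp(-x / b))


def cost(c, r, positives, negatives, origin, lam=1.0, b=1.0):
    """Regularizer plus pairwise loss over all (positive, negative) pairs.

    c         : restart probability vector
    r         : RWER scores computed with c
    origin    : origin vector o
    The squared Euclidean regularizer lam * ||c - o||^2 is an assumed form.
    """
    regularizer = lam * sum((ci - oi) ** 2 for ci, oi in zip(c, origin))
    pairwise = sum(loss(r[y] - r[x], b) for x in positives for y in negatives)
    return regularizer + pairwise
```

Because h is increasing, each pair contributes less than 0.5 when the positive node outscores the negative one and more than 0.5 otherwise, so lowering the cost pushes positive nodes above negative ones. Since RWER scores lie in [0, 1], the exponent stays bounded by 1/b in magnitude, so very small values of `b` would be the only overflow concern.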
3.5 SuRe - Optimizing the Cost Function.
Our goal is to minimize equation (3.4) with respect to c. Note that the objective function F(c) is not convex. Thus, we use the gradient descent method to find a local minimum of F(c). For this purpose, we first need to obtain the derivative of F(c) w.r.t. c:

$$\frac{\partial F(c)}{\partial c} = 2\lambda (c - o) + \sum_{x \in P,\, y \in N} \frac{\partial h(\delta_{yx})}{\partial \delta_{yx}} \left( \frac{\partial r_y}{\partial c} - \frac{\partial r_x}{\partial c} \right)$$

where $\delta_{yx}$ is $r_y - r_x$. The derivative $\frac{\partial h(\delta_{yx})}{\partial \delta_{yx}}$ of the loss function is $\frac{1}{b} h(\delta_{yx})(1 - h(\delta_{yx}))$. In order to obtain the derivative $\frac{\partial r_x}{\partial c}$, we have to calculate the derivative of the relevance score $r_x$ w.r.t. $c_i$, which is the i-th element of c. Let M be $(I - B)^{-1}$; then, $r = (I - B)^{-1} q = Mq$, $M(:, s) = r$, and $M(x, s) = r_x$, from Lemma 3.1.
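Since M is the inverse of $I - B$, according to [19], $\frac{\partial M}{\partial c_i}$ becomes:

$$\frac{\partial M}{\partial c_i} = M \left( -\tilde{A}^{\top} J_{ii} + J_{si} \right) M$$

where $J_{ij}$ is a single-entry matrix whose (i, j)-th entry is 1 and all other elements are 0. Based on the above equation, $\frac{\partial M(x,s)}{\partial c_i}$ is represented as follows:

$$\frac{\partial M(x, s)}{\partial c_i} = \left( -M(x, :)\, \tilde{A}^{\top} e_i + M(x, s) \right) e_i^{\top} M e_s$$

where $e_s$ is a length-n unit vector whose s-th entry is 1. Note that $\frac{\partial M(x,s)}{\partial c_i}$ is calculated for $1 \le i \le n$; then, $\frac{\partial r_x}{\partial c}$ is written in the following equation:

$$\frac{\partial r_x}{\partial c} = \left( -\tilde{A}\, M(x, :)^{\top} + M(x, s)\, \mathbf{1} \right) \circ r \qquad (3.8)$$

where $\circ$ denotes the Hadamard product, and $\mathbf{1}$ is an all-ones vector of length n. Similarly, $\frac{\partial r_y}{\partial c} = \left( -\tilde{A}\, M(y, :)^{\top} + M(y, s)\, \mathbf{1} \right) \circ r$.

Notice that we do not obtain M explicitly to compute M(:, s) in equation (3.8), since M is the inverse of $I - B$ and inverting a large matrix is infeasible, as mentioned in Section 3.2. Instead, we use the iterative method described in Algorithm 2 to get r = M(:, s). However, the problem is that we also require rows of M (i.e., M(x, :) in $\tilde{r}$), and Algorithm 2 only computes a column of M for a given seed node. How can we calculate $\tilde{r}$ without inverting $I - B$? $\tilde{r}$ is computed iteratively by the following lemma:

Lemma 3.2. Let $\tilde{r} = \sum_{x \in P,\, y \in N} \frac{\partial h(\delta_{yx})}{\partial \delta_{yx}} (M(y, :) - M(x, :))^{\top} = M^{\top} p$, where $p = \sum_{x \in P,\, y \in N} \frac{\partial h(\delta_{yx})}{\partial \delta_{yx}} (e_y - e_x)$, $e_x$ is an n × 1 vector whose x-th element is 1 and the others are 0, and $\delta_{yx} = r_y - r_x$. Then, $\tilde{r}$ is the solution of the linear system $(I - B^{\top})\tilde{r} = p$, which is solved by an iterative method for linear systems.

Note that $M^{-\top} = I - B^{\top}$ is non-symmetric and invertible (Lemma 1.1 of [8]); thus, any iterative method for a non-symmetric matrix can be used to solve for $\tilde{r}$. We use GMRES [21], an iterative method for solving linear systems, since it is the state-of-the-art method in terms of efficiency and accuracy.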
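The gradient of a single score $r_x$ with respect to c can be checked on a small graph with the sketch below. Several things here are assumptions made for illustration: B is taken to be $\tilde{A}^{\top}(I - \mathrm{diag}(c)) - q(\mathbf{1} - c)^{\top}$, the gradient is taken to be $(-\tilde{A} M(x,:)^{\top} + M(x,s)\mathbf{1}) \circ r$, the function names are hypothetical, and a dense Gaussian elimination stands in for GMRES, so this is only suitable for tiny examples.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def i_minus_b(a_norm, s, c):
    """Build I - B for the assumed B = A~^T (I - diag(c)) - q (1 - c)^T."""
    n = len(c)
    mat = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            b_uv = a_norm[v][u] * (1.0 - c[v])
            if u == s:
                b_uv -= 1.0 - c[v]
            mat[u][v] = (1.0 if u == v else 0.0) - b_uv
    return mat


def scores_and_gradient(a_norm, s, c, x):
    """Return (r, d r_x / d c) using one column solve and one row solve."""
    n = len(c)
    mat = i_minus_b(a_norm, s, c)
    q = [1.0 if u == s else 0.0 for u in range(n)]
    r = solve(mat, q)                                   # r = M(:, s)
    mat_t = [[mat[j][i] for j in range(n)] for i in range(n)]
    e_x = [1.0 if u == x else 0.0 for u in range(n)]
    m_row = solve(mat_t, e_x)                           # M(x, :)^T from (I - B)^T z = e_x
    grad = []
    for i in range(n):
        a_m = sum(a_norm[i][j] * m_row[j] for j in range(n))  # (A~ M(x,:)^T)_i
        grad.append((-a_m + r[x]) * r[i])               # Hadamard product with r
    return r, grad
```

The row of M is obtained by solving a system with the transposed matrix, which mirrors the role of the linear system in the lemma: one never forms M itself, only the vectors that the gradient needs.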
Optimization phase (Algorithm 3). The SuRe algorithm for solving the optimization problem is summarized in Algorithm 3 and Figure 4. In the algorithm, we use a gradient-based method to update the restart probability vector c based on equation (3.8).

Experiment
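We evaluate our proposed method SuRe against various baseline approaches. Since there is no ground truth for node-to-node relevance scores in real-world graphs, we instead evaluate the performance of two representative applications based on relevance scores: ranking and link prediction. Based on these settings, we aim to answer the following questions from the experiments:
• Q1. Ranking performance (Section 4.2). Does our proposed method SuRe provide the best relevance scores for ranking compared to other methods?
• Q2. Link prediction performance (Section 4.3). How effective is SuRe for link prediction tasks?
• Q3. Parameter sensitivity (Section 4.4). How does the value of the origin vector used in SuRe affect the accuracy of link prediction?
• Q4. Scalability (Section 4.5). How well does SuRe scale up with the number of edges?

4.1 Experimental Settings.
Datasets. We experiment on various real-world network datasets. Datasets used in our experiments are summarized in Table 1. We use Polblogs and signed networks (Epinions and Slashdot) for the ranking task (Section 4.2), HepPh and HepTh for the link prediction task (Section 4.3), and Wikipedia for the scalability experiment (Section 4.5). Since only HepPh and HepTh have time information, we use them in the link prediction task. All experiments are performed on a Linux machine with Intel(R) Xeon E5-2630 v4 CPU @ 2.2GHz and 256GB memory.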
Evaluation Metrics. To compare the methods, we use Mean Average Precision (MAP), Area under the ROC curve (AUC), and Precision@20. MAP is the mean of average precisions for multiple queries. AUC is the expectation that a uniformly drawn random positive is ranked higher than a uniformly drawn random negative. Precision@20 is the precision at the top-20 position in a ranking result. The higher the values of the metrics are, the better the performance is.

4.2 Ranking Performance.
We evaluate the ranking performance of our method SuRe compared to that of other methods.
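The three metrics from the Evaluation Metrics paragraph can be sketched as follows. The function names and input formats are assumptions for illustration: a ranking is a list of node identifiers ordered from most to least relevant, average precision is normalized by the total number of positive nodes, and ties in AUC are counted as half a win.

```python
def average_precision(ranked, positives):
    """Average of precision values at the rank of each positive node."""
    hits = 0
    total = 0.0
    for rank, node in enumerate(ranked, start=1):
        if node in positives:
            hits += 1
            total += hits / rank
    return total / len(positives) if positives else 0.0


def mean_average_precision(rankings, positive_sets):
    """MAP: mean of average precisions over multiple queries."""
    values = [average_precision(r, p) for r, p in zip(rankings, positive_sets)]
    return sum(values) / len(values)


def precision_at_k(ranked, positives, k=20):
    """Fraction of the top-k ranked nodes that are positive."""
    return sum(1 for node in ranked[:k] if node in positives) / k


def auc(scores, positives, negatives):
    """Probability that a random positive scores higher than a random negative."""
    wins = 0.0
    for x in positives:
        for y in negatives:
            if scores[x] > scores[y]:
                wins += 1.0
            elif scores[x] == scores[y]:
                wins += 0.5  # assumed tie handling
    return wins / (len(positives) * len(negatives))
```

For example, with the ranking `["a", "b", "c", "d"]` and positives `{"a", "c"}`, the positive nodes sit at ranks 1 and 3, so the average precision is (1/1 + 2/3) / 2.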
Experimental setup. We perform this experiment on the Polblogs dataset and signed networks (Epinions and Slashdot). See Section 2.1 of [8] for the detailed experimental setup.
Case study. We analyze the ranking quality produced by each method on the Polblogs dataset. Table 2 shows the top-10 ranking list for a query node obsidianwings, a liberal blog. Red colored nodes are conservative, and the black colored ones are liberal. As shown in the table, the ranking result from SuRe is of higher quality than those from RWR, SRW, and QUINT: given that the query node is liberal, the top-10 result from SuRe contains only liberal nodes, while the other ranking results include several conservative nodes.
Result. To evaluate ranking performance, we measure MAP, Precision@20, and AUC for the ranking results produced by our method SuRe and by other random walk based methods. For brevity, we report MAP and Precision@20 in Polblogs, and MAP and AUC in Epinions and Slashdot.
In Polblogs, if the query node is liberal (conservative), then the positive class is liberal (conservative), and the negative one is conservative (liberal). As shown in Figure 5, our method SuRe shows the best ranking performance compared to other methods in terms of MAP and Precision@20.
In signed networks, we evaluate the performance of SuRe compared to other baselines including MRWR [23], an RWR-based method for signed networks. As in Figure 6, our method SuRe shows the best performance: up to 17.9% higher MAP and 25.6% higher Precision@20 compared to the best competitor.

4.3 Link Prediction Performance.
We examine the link prediction performance of our proposed method SuRe compared to other link prediction methods as well as the RWR-based methods SRW and QUINT.
Experimental Setup. We perform this experiment on the HepPh and HepTh datasets, which are time-stamped networks. See Section 2.2 of [8] for the detailed experimental setup.
Result. Figures 1 and 7 show the link prediction performance in terms of MAP and Precision@20. As shown in the results, our method SuRe outperforms other competitors including SRW and QUINT, which are the state-of-the-art methods for link prediction. In the HepTh dataset, compared to the best competitor SRW, SuRe achieves 15.8% improvement in terms of MAP, and 10.1% improvement in terms of Precision@20 (Figure 1). For the AUC results, see Section 2.3 of [8]. Note that SuRe provides the best prediction over all datasets. The results indicate that assigning a distinct restart probability to each node and learning the restart probabilities (RWER and SuRe) have a significant effect on link prediction compared to using a fixed restart probability for all nodes (RWR). Furthermore, the results indicate that learning restart probabilities (SuRe) provides better link prediction accuracy than existing supervised learning methods that focus on learning edge weights (SRW) or network topology (QUINT).

Related Works
[...]ing RWR in large graphs have been proposed to boost the performance of those applications in terms of time.
Ranking and link prediction. Jung et al. [9] extended the concept of RWR to design a personalized ranking model in signed networks. Wang et al. [29] proposed an image annotation technique that generates candidate annotations and re-ranks them using RWR. Liben-Nowell et al. [16] extensively studied the link prediction problem in social networks based on relevance measures such as PageRank, RWR, and Adamic-Adar [1]. Many researchers have proposed supervised learning methods for link prediction. Backstrom et al. [5] proposed Supervised Random Walk (SRW), a supervised learning method for link prediction based on RWR. SRW learns parameters for adjusting edge weights. Li et al. [15] developed QUINT, a learning method for finding a query-specific optimal network. QUINT modifies the network topology including edge weights. In many real-world scenarios, however, modifying the graph structure would not be allowed. In contrast, our SuRe method controls the behavior of the random surfer without modifying the graph structure, and provides better prediction accuracy than other competitors, as shown in Section 4.

Conclusion
We propose Random Walk with Extended Restart (RWER), a novel relevance measure using distinct restart probabilities for each node. We also propose SuRe, a data-driven algorithm for learning the restart probabilities of RWER. Experiments show that our method achieves the best performance for ranking and link prediction tasks, outperforming the traditional RWR and recent supervised learning methods. Specifically, SuRe improves MAP by up to 15.8% over the best competitor. Future work includes designing distributed algorithms for computing and learning RWER.

Figure 1: Link prediction performance on the HepTh dataset. SuRe shows the highest accuracies: 15.8% higher MAP, and 10.1% higher Precision@20 compared to the best existing method.

Figure 3: Example of a network. Each node has its own restart probability.

Algorithm 1 Normalization phase of RWER
Input: adjacency matrix A
Output: row-normalized matrix $\tilde{A}$
1: compute a degree diagonal matrix D of A (i.e., $D_{ii} = \sum_j A_{ij}$)
2: compute a normalized matrix, $\tilde{A} = D^{-1} A$
3: return $\tilde{A}$

Algorithm 2 Iteration phase of RWER
Input: row-normalized matrix $\tilde{A}$, query node s, restart probability vector c, and error tolerance ε
Output: RWER score vector r
1: set the starting vector q from the seed node s
2: repeat
3:   $r' \leftarrow \tilde{A}^{\top}(I - \mathrm{diag}(c))\, r + (c^{\top} r)\, q$
4:   compute error, $\delta = \lVert r' - r \rVert$
5:   update $r \leftarrow r'$
6: until δ < ε
7: return r

Theorem 3.1. (Convergence) Suppose the graph represented by $\tilde{A}$ is irreducible and aperiodic. Then, the power iteration algorithm (Algorithm 2) for RWER converges.
Proof. See Section 1.2 of [8].

Theorem 3.2. (Time Complexity) The time complexity of Algorithm 2 is O(Tm), where T is the number of iterations, and m is the number of edges.
Proof. See Section 1.3 of [8].

Theorem 3.2 indicates that our method in Algorithm 2 scales linearly w.r.t. the number of edges.

Figure 4: Flowchart of RWER (Algorithms 1 and 2) and SuRe (Algorithm 3). SuRe learns the restart probability vector c, and RWER computes the node relevance score vector r for a given seed node s.

3.6 Theoretical analysis.
We analyze the time complexity of SuRe (Algorithm 3).
Lemma 3.3. Let |P| and |N| denote the number of positive and negative nodes, respectively. The computation of $\tilde{r} = \sum_{x,y}$ [...]

Figure 5: Ranking performance on Polblogs. Our method SuRe provides the best ranking performance compared to other methods in terms of MAP and Precision@20.

Figure 6: Ranking performance on the signed networks. Our method SuRe obtains up to 17.9% higher MAP and 25.6% higher Precision@20 (in Slashdot) compared to the best competitor.

Figure 7: Link prediction performance on the HepPh dataset. SuRe shows the highest accuracies: 12.5% higher MAP, and 6.8% higher Precision@20 compared to the best existing method.

Table 1: Dataset statistics. The query nodes are used for the ranking and the link prediction tasks.

Table 2: Ranking results of our proposed method SuRe and other methods w.r.t. a query node obsidianwings, a liberal blog. Red colored nodes are conservative blogs, and the black colored ones are liberal blogs. Our ranking result from SuRe contains only liberal nodes, indicating the best result, while other ranking results wrongly contain conservative nodes.

Table 3: Number of parameters for each method.