Link Prediction in Weighted Networks: A Weighted Mutual Information Model

The link-prediction problem is an open issue in data mining and knowledge discovery, which attracts researchers from disparate scientific communities. A wealth of methods have been proposed to deal with this problem. Among these approaches, most are applied in unweighted networks, with only a few taking the weights of links into consideration. In this paper, we present a weighted model for undirected and weighted networks based on the mutual information of local network structures, where link weights are applied to further enhance the distinguishable extent of candidate links. Empirical experiments are conducted on four weighted networks, and results show that the proposed method can provide more accurate predictions than not only traditional unweighted indices but also typical weighted indices. Furthermore, some in-depth discussions on the effects of weak ties in link prediction as well as the potential to predict link weights are also given. This work may shed light on the design of algorithms for link prediction in weighted networks.


Introduction
Link prediction aims to uncover missing links and to predict the emergence of future links in complex networks based on the available information, such as observed links and node attributes [1][2][3]. Because of its broad applications in various domains, link prediction has become a research hotspot. In some biological networks, such as protein-protein interaction networks and metabolic networks [4,5], discovering interaction links is usually expensive. Accurate predictors can therefore be used to identify the most promising latent links, which costs less than blindly checking all possible interactions [6,7]. With today's information overload, people increasingly depend on information filtering systems such as recommender systems [8,9]. In this sense, link prediction can serve as a significant technique in recommender systems, such as e-commerce recommendation [10] and friendship recommendation [11,12]. Moreover, link prediction has been successfully applied to evaluate network evolving models [13,14] and to identify spurious links [6]. Recently, the link-predictability problem was proposed to characterize the extent to which the links in a network can be predicted [15], which can help us understand the organization of real networks.
Plenty of link prediction methods based on network structures have been proposed in the past years [16][17][18][19][20]. Among various approaches, Common Neighbors (CN) is the simplest one, which assumes that two nodes are more likely to form a link if they have more common neighbors. However, CN simply counts the number of common neighbors but ignores their different contributions on the connection likelihood. Hence, many variants of CN have been put forward to further boost the prediction accuracy by improving the discriminative extent of candidate links, such as Adamic-Adar (AA) [16] and Resource Allocation (RA) [17], where a common neighbor with low degree is advocated via assigning more weight on it. Based on the Bayesian theory, a local naïve Bayes model [18] was presented to differentiate the role of different common neighbors. In addition, node centrality (degree, closeness and betweenness) was also applied to make common neighbors more distinguishable [19]. Recently, Tan et al. [20] reexamined the role of common neighbors from the perspective of information theory, and the contributions of common neighbors are differentiated by the mutual information of local structures.
Most previous studies on link prediction focused on unweighted networks and ignored the link weights that exist naturally. Up to now, little literature is available on link prediction in weighted networks. Murata and Moriyasu [21] proposed variants of CN, AA and RA as weighted indices for predicting the emergence of communications between users in social networks. They showed that proximities between nodes can be estimated better by using both graph proximity measures and the weights of existing links. In some networks, especially social networks, weak ties may play a more important role than strong ties [22,23]. Lü and Zhou [24] investigated the role of weak ties in link prediction and suggested that emphasizing the contributions of weak ties can remarkably enhance the prediction accuracy. Sá and Prudêncio [25] studied the relevance of link weights to supervised link prediction, and their results showed that the prediction accuracy could be improved by using the weights on the links.
In this paper, a weighted mutual information model is developed by gaining the benefits from both structural properties and link weights. In our model, the mutual information is adopted to estimate the effect of network structures on the connection likelihood. Different from the estimation of mutual information in Ref [20], we employ a more rigorous theoretic way here. Besides, the weights of links are applied to further emphasize the discriminative resolution of candidate links. Empirical experiments on four real-world weighted networks reveal that the proposed method improves the prediction accuracy substantially compared with not only traditional unweighted indices but also typical weighted indices. In addition, we also give some in-depth discussions on the role of weak ties in link prediction as well as the potential to predict link weights. We hope this work will provide some inspirations about how to incorporate the weights for link prediction in weighted networks.

Data and Problem Description
Four weighted networks from disparate fields are considered in our experiments. 1) Celegans: the neural network of the nematode worm C. elegans, where a node stands for a neuron, a link joins two neurons if they have synaptic contacts, and the weight represents the number of synapses between two neurons [26]. This network has 297 neurons and 2148 synaptic contacts. 2) USAir: the network of US air transportation, where the weight of a link is the frequency of flights between two airports [27]. This network contains 332 airports and 2126 airlines. 3) Baywet: the network which contains the carbon exchanges in the cypress wetlands of south Florida during the wet season [28], where a node represents a taxon, and an edge denotes that a taxon uses another taxon as food with a given trophic factor (feeding level). This network has 123 nodes and 2106 edges. 4) Bible: the lexical network with the nouns in King James Bible and information about their occurrences [28], where a node stands for a noun and a link indicates that two nouns appear together in the same verse. The weight on a link represents how often two nouns occurred together. This network contains 1773 nodes and 9131 edges.
In this paper, only undirected weighted networks G(V, E, W) are studied, where V, E and W denote the sets of nodes, links and link weights, respectively. Note that $W_{xy} = W_{yx}$, where $W_{xy}$ stands for the weight on link (x, y). Multiple links and self-loops are not allowed here. The task of link prediction is to discover missing links or predict future links. To do this, for each non-existent link $(x, y) \in U - E$, where U stands for the universal set of all possible links, we assign a score $s_{xy}$ to quantify the connection likelihood of nodes x and y. A higher score means a higher probability that nodes x and y will form a link. All the non-existent links are sorted by their scores in descending order, and the links with the highest ranks are the most likely to appear.
To validate the prediction performance of a predictor, the observed links, E, are randomly divided into two parts: the training set $E^T$, which is regarded as given information, and the probe set $E^P$, which is only used for testing. Clearly, we have $E^T \cup E^P = E$ and $E^T \cap E^P = \emptyset$. In this paper, the training set always contains 90% of the observed links, and the rest constitutes the probe set. We apply a standard metric called Precision to quantify the accuracy of prediction, which is defined as the ratio of true missing links in the predicted link set, i.e., if the top L links are treated as predicted links and $L_r$ of them are in the probe set, then the value of Precision equals $L_r / L$.
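The evaluation protocol described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names `split_links` and `precision_at_L`, the representation of links as tuples, and the fixed random seed are all assumptions. Ties in the ranking are broken by input order here, whereas the text does not state how ties are handled.

```python
import random


def split_links(links, probe_fraction=0.1, seed=0):
    """Randomly divide the observed links E into a training set and a probe set.

    `links` is an iterable of hashable node pairs. The fraction and the seed
    are illustrative defaults (the text uses a 90%/10% split).
    """
    rng = random.Random(seed)
    shuffled = list(links)
    rng.shuffle(shuffled)
    n_probe = int(round(probe_fraction * len(shuffled)))
    probe = set(shuffled[:n_probe])
    train = set(shuffled[n_probe:])
    return train, probe


def precision_at_L(scores, probe, L=100):
    """Return L_r / L for the top-L ranked candidate links.

    `scores` maps each non-existent node pair to its score s_xy; `probe`
    is the set of held-out links, using the same pair representation.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)[:L]
    hits = sum(1 for pair in ranked if pair in probe)  # this is L_r
    return hits / L
```

A typical use is to compute scores on the training network only, then call `precision_at_L(scores, probe, L=100)` and average the result over many random splits.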

Weighted Similarity Indices Based on Local Information
In most real-world networks, links are naturally weighted. The weight of a link may represent different meanings in different networks, such as the number of synapses and gap junctions in neural networks, the carbon flow between species in food webs or the amount of traffic load along connections in transportation networks. Murata and Moriyasu [21] studied how to extend similarity indices from unweighted networks to weighted networks. Based on this method, the weighted versions of CN, AA and RA (named WCN, WAA and WRA, respectively) are defined as [21,24]

$$s_{xy}^{\mathrm{WCN}} = \sum_{z \in O_{xy}} \left( W_{xz} + W_{zy} \right), \tag{1}$$

$$s_{xy}^{\mathrm{WAA}} = \sum_{z \in O_{xy}} \frac{W_{xz} + W_{zy}}{\log (1 + S_z)}, \tag{2}$$

$$s_{xy}^{\mathrm{WRA}} = \sum_{z \in O_{xy}} \frac{W_{xz} + W_{zy}}{S_z}, \tag{3}$$

where $O_{xy}$ represents the common neighbor set of node pair (x, y), which can be written as $O_{xy} = \{z : z \in \Gamma(x) \cap \Gamma(y)\}$. $\Gamma(x)$ stands for the set of neighbors of node x. $W_{xz}$ is the weight of link (x, z). $S_z$ denotes the strength of node z, i.e., the sum of the weights of the links directly connected with node z, which is defined as $S_z = \sum_{z' \in \Gamma(z)} W_{zz'}$.

For some networks, weak ties may play a more important role than strong ties in link prediction [24]. In order to investigate the role of weak ties in predicting missing links, Lü and Zhou [24] introduced a free parameter, α, to control the relative contributions of weak ties to the similarity measures. The indices WCN, WAA and WRA with this parameter (denoted as $\mathrm{WCN}^{\alpha}$, $\mathrm{WAA}^{\alpha}$ and $\mathrm{WRA}^{\alpha}$, respectively) are

$$s_{xy}^{\mathrm{WCN}^{\alpha}} = \sum_{z \in O_{xy}} \left( W_{xz}^{\alpha} + W_{zy}^{\alpha} \right), \tag{4}$$

$$s_{xy}^{\mathrm{WAA}^{\alpha}} = \sum_{z \in O_{xy}} \frac{W_{xz}^{\alpha} + W_{zy}^{\alpha}}{\log (1 + S_z)}, \tag{5}$$

$$s_{xy}^{\mathrm{WRA}^{\alpha}} = \sum_{z \in O_{xy}} \frac{W_{xz}^{\alpha} + W_{zy}^{\alpha}}{S_z}, \tag{6}$$

where $S_z = \sum_{z' \in \Gamma(z)} W_{zz'}^{\alpha}$. Note that, when α = 0, $S_z$ is the degree of node z, and the indices degenerate to the unweighted forms, namely CN, AA and RA. On the other hand, when α = 1, the indices are the simply weighted cases, as shown in Eqs (1)-(3).
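The weighted indices discussed above can be sketched in Python as follows. The sketch assumes the forms commonly given in Refs [21,24]: each common neighbor z contributes $W_{xz}^{\alpha} + W_{zy}^{\alpha}$, divided by $\log(1 + S_z)$ for WAA and by $S_z$ for WRA, with $S_z$ computed from the α-powered weights. The dict-of-dicts adjacency layout, the function names and the use of the natural logarithm are assumptions of this illustration.

```python
import math


def strength(adj, z, alpha=1.0):
    """Strength S_z of node z: sum of alpha-powered weights of its links."""
    return sum(w ** alpha for w in adj[z].values())


def weighted_index(adj, x, y, kind="WCN", alpha=1.0):
    """Score a node pair with WCN, WAA or WRA (alpha = 1 gives Eqs (1)-(3)).

    `adj[u][v]` holds the weight W_uv of link (u, v) and must be symmetric.
    """
    common = set(adj[x]) & set(adj[y])  # common neighbor set O_xy
    score = 0.0
    for z in common:
        pair_weight = adj[x][z] ** alpha + adj[z][y] ** alpha
        if kind == "WCN":
            score += pair_weight
        elif kind == "WAA":
            score += pair_weight / math.log(1.0 + strength(adj, z, alpha))
        elif kind == "WRA":
            score += pair_weight / strength(adj, z, alpha)
        else:
            raise ValueError(f"unknown index: {kind}")
    return score
```

With `alpha=0` every link counts equally, so the scores depend only on the topology, which matches the degeneration to the unweighted indices noted above (WCN then equals twice the number of common neighbors, which gives the same ranking as CN).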

Weighted Mutual Information Model
Considering a pair of disconnected nodes (x, y), our task is to determine a prediction measure that uses not only the structural properties of the common neighbors of this node pair but also the weights on the corresponding links. As reported in the literature [18,19], different common neighbors may contribute differently to the connection likelihood. Here we investigate the role of common neighbors from the perspective of mutual information [20,29-32]. We first give the definitions of self-information and mutual information.
For two events (or random variables) X and Y, the conditional probability mass function is $p(x|y)$ ($x \in X$, $y \in Y$), and the marginal probability mass functions are $p(x)$ and $p(y)$, respectively. The mutual information of two outcomes $x_i$ and $y_j$ ($x_i \in X$, $y_j \in Y$) can be derived as

$$I(x_i; y_j) = \log \frac{p(x_i | y_j)}{p(x_i)} = -\log p(x_i) - \left( -\log p(x_i | y_j) \right) = I(x_i) - I(x_i | y_j), \tag{7}$$

where $I(x_i | y_j)$ is the conditional self-information, which indicates the uncertainty of the occurrence of outcome $x_i$ given that outcome $y_j$ happens, and $I(x_i)$ is the self-information that quantifies the uncertainty of outcome $x_i$. The mutual information measures how much the uncertainty about one event can be reduced by giving the outcome of the other event. Therefore, if two events are independent of each other, the mutual information equals zero.

Now consider the link-prediction problem. From the perspective of information theory, the estimation of the connection likelihood between a pair of nodes can be treated as calculating the information of the event that the two nodes are connected. More specifically, for a non-connected node pair (x, y), we use $L^1_{xy}$ to denote the event that nodes x and y are connected. If the common neighbor set $O_{xy}$ is available, then the link likelihood can be estimated by $-I(L^1_{xy} | O_{xy})$ [20,32]. According to the definitions of information, $I(L^1_{xy} | O_{xy})$ can be written as

$$I(L^1_{xy} | O_{xy}) = I(L^1_{xy}) - I(L^1_{xy}; O_{xy}), \tag{8}$$

where $I(L^1_{xy}; O_{xy})$ is the mutual information between the event that node pair (x, y) has one link and the event that the node pair's common neighbors are given. $I(L^1_{xy})$ can be calculated through the prior probability

$$I(L^1_{xy}) = -\log p(L^1_{xy}) = -\log \frac{M_T}{M}, \tag{9}$$

where $M_T = |E^T|$ and $M = \frac{|V|(|V| - 1)}{2}$. $|\cdot|$ denotes the cardinality of the set. Since the prior probabilities $p(L^1_{xy})$ are the same for every pair of nodes, here we define the connection likelihood as

$$s_{xy} = I(L^1_{xy}; O_{xy}). \tag{10}$$

If the elements of $O_{xy}$ are assumed to be independent of each other, then

$$I(L^1_{xy}; O_{xy}) = \sum_{z \in O_{xy}} I(L^1_{xy}; z). \tag{11}$$

Instead of estimating $I(L^1_{xy}; z)$ by averaging the mutual information over all node pairs connected to node z as presented in Ref [20], according to the definition Eq (7), $I(L^1_{xy}; z)$ can be calculated more accurately through

$$I(L^1_{xy}; z) = I(L^1_{xy}) - I(L^1_{xy} | z), \tag{12}$$

where $I(L^1_{xy} | z)$ is the conditional self-information of the event that node pair (x, y) has one link given that their common neighbor z is available. To calculate $I(L^1_{xy} | z)$, we need to obtain $p(L^1_{xy} | z)$. Generally speaking, $p(L^1_{xy} | z)$ can be estimated by the clustering coefficient of node z, $C_z$, which is defined as

$$C_z = \frac{N_{\triangle z}}{N_{\triangle z} + N_{\wedge z}}, \tag{13}$$

where $N_{\triangle z}$ and $N_{\wedge z}$ are the numbers of connected and disconnected node pairs that share the common neighbor z, respectively. Altogether, we can obtain

$$s_{xy} = \sum_{z \in O_{xy}} \left[ I(L^1_{xy}) - I(L^1_{xy} | z) \right] = \sum_{z \in O_{xy}} \log \frac{M \, C_z}{M_T}. \tag{14}$$

Note that, if nodes x and y do not have any common neighbor, $I(L^1_{xy}; O_{xy})$ equals zero. Clearly, if $C_z = 1$ for all nodes, then $s_{xy}$ degenerates to CN up to a constant factor. Therefore, according to the clustering coefficient $C_z$, different common neighbors contribute differently to the connection likelihood.
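The unweighted mutual-information score derived in this passage can be sketched in Python as follows. For each common neighbor z it adds the prior self-information, computed from $p(L^1_{xy}) = M_T / M$, minus the conditional self-information, computed from the clustering coefficient $C_z$. The function names, the dict-of-dicts adjacency layout and the floor `eps` used when $C_z = 0$ are assumptions of this sketch; the text does not say how a zero clustering coefficient is handled.

```python
import math
from itertools import combinations


def clustering_coefficient(adj, z):
    """C_z: fraction of pairs of neighbors of z that are themselves linked."""
    pairs = list(combinations(list(adj[z]), 2))
    if not pairs:
        return 0.0
    connected = sum(1 for u, v in pairs if v in adj[u])
    return connected / len(pairs)


def mi_score(adj, x, y, eps=1e-12):
    """Sum over common neighbors z of I(L) - I(L|z).

    `adj` is the training network as a symmetric dict of dicts.
    `eps` floors C_z to avoid log(0); this is an assumption of the sketch.
    """
    n = len(adj)
    m_total = n * (n - 1) / 2                          # M = |V|(|V|-1)/2
    m_train = sum(len(nb) for nb in adj.values()) / 2  # M_T = |E^T|
    prior_info = -math.log(m_train / m_total)          # I(L^1_xy)
    score = 0.0
    for z in set(adj[x]) & set(adj[y]):
        c_z = max(clustering_coefficient(adj, z), eps)
        conditional_info = -math.log(c_z)              # I(L^1_xy | z)
        score += prior_info - conditional_info
    return score
```

A score can be negative: a common neighbor whose clustering coefficient is below the overall link density $M_T / M$ makes a link less likely than the prior suggests.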
Next, we introduce how to enhance the accuracy of link prediction with link weights. In particular, CN-based unweighted indices perform poorly in networks with low clustering [18]. In this case, additional information is needed to break the bottleneck. In WCN, WAA and WRA, the weights of the links connecting common neighbors to the corresponding node pair are used to facilitate link prediction. Under this motivation, we add a weight function $f(W_{xz}, W_{zy})$ in Eq (14) to combine the benefits from both structural properties and link weights, and obtain

$$s_{xy}^{\mathrm{WMI}} = \sum_{z \in O_{xy}} f(W_{xz}, W_{zy}) \left[ I(L^1_{xy}) - I(L^1_{xy} | z) \right]. \tag{15}$$

The proposed model is called Weighted Mutual Information (WMI). Although the expression of the WMI model is similar to that of the local naïve Bayes model [18], the two are inspired by different motivations. The former is motivated by combining the benefits of both structure information and link weights, while the latter focuses only on network structures and tries to drill down into the structure information. Here we apply the weighting terms of Eqs (1)-(3) and of Eqs (4)-(6) as the weight function, which yields the indices WMI-WCN, WMI-WAA and WMI-WRA, Eqs (16)-(18), and their parameter-dependent versions, Eqs (19)-(21), respectively. In order to distinguish the parameter-dependent versions Eqs (19)-(21) from the non-parameter ones Eqs (16)-(18), we call the latter pure WMI-based indices in the following discussions.

Table 1 presents the comparison of our WMI model with several typical unweighted methods under the measure of Precision. Following the literature [2,18-20,24], L is set to 100 in our experiments. According to the simulation results, without considering the role of weak ties, the pure WMI-based indices achieve much higher prediction accuracy than the corresponding basic unweighted forms, namely CN, AA and RA, for Celegans and Baywet. In Baywet, the Precision value of WMI-WRA is even improved by nearly 10% compared with RA. In addition, we also compare our WMI model with the Local Naïve Bayes model (LNB) proposed in Ref [18]. LNB-CN, LNB-AA and LNB-RA are the LNB forms of CN, AA and RA, respectively.
Compared with the LNB model, the pure WMI-based indices provide competitive prediction accuracy in Celegans and Baywet. Especially in Celegans, where the clustering coefficient is low (0.292, the lowest among the four networks), the LNB model cannot improve the discriminative resolution of candidate links [18], whereas the WMI model makes them more distinguishable with the help of the weights on the corresponding links. Moreover, a comparison with the node centrality based method [19] is also given in Table 1. Since the DC-CN index has the best overall performance among the node centrality based approaches, we only compare its optimal version with our model. The results show that, except for USAir, our model achieves prediction accuracy competitive with the DC-CN index. Furthermore, if we consider the parameter-dependent versions Eqs (19)-(21), which take the role of weak ties into consideration, the prediction accuracy is enhanced substantially, and our WMI-based indices achieve the best performance in Celegans, Baywet and Bible.
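The WMI indices introduced in this section can be sketched in Python as follows. The sketch assumes that the weight function f multiplies the mutual-information term of each common neighbor and takes the forms used by WCN, WAA and WRA, that is $W_{xz}^{\alpha} + W_{zy}^{\alpha}$, optionally divided by $\log(1 + S_z)$ or $S_z$. That reading, the function names, the data layout and the floor `eps` for $C_z = 0$ are all assumptions rather than details confirmed by the text.

```python
import math
from itertools import combinations


def clustering_coefficient(adj, z):
    """C_z: fraction of pairs of neighbors of z that are themselves linked."""
    pairs = list(combinations(list(adj[z]), 2))
    if not pairs:
        return 0.0
    return sum(1 for u, v in pairs if v in adj[u]) / len(pairs)


def wmi_score(adj, x, y, kind="WCN", alpha=1.0, eps=1e-12):
    """Weighted mutual information score for the node pair (x, y).

    `adj[u][v]` is the weight W_uv in the training network (symmetric).
    alpha = 1 corresponds to the pure indices; other values re-scale weights.
    """
    n = len(adj)
    m_total = n * (n - 1) / 2                          # M
    m_train = sum(len(nb) for nb in adj.values()) / 2  # M_T (link count)
    prior_info = math.log(m_total / m_train)           # I(L^1_xy)
    score = 0.0
    for z in set(adj[x]) & set(adj[y]):
        f = adj[x][z] ** alpha + adj[z][y] ** alpha    # assumed weight function
        s_z = sum(w ** alpha for w in adj[z].values())
        if kind == "WAA":
            f /= math.log(1.0 + s_z)
        elif kind == "WRA":
            f /= s_z
        elif kind != "WCN":
            raise ValueError(f"unknown index: {kind}")
        c_z = max(clustering_coefficient(adj, z), eps)  # eps floor is assumed
        score += f * (prior_info + math.log(c_z))       # f * [I(L) - I(L|z)]
    return score
```

Setting `alpha=0` makes every weight equal to one, so the score then depends only on the structure of the training network.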

Experimental Results
The above results demonstrate that link weights can be applied to facilitate link prediction. In addition, the role of weak ties needs to be emphasized in some networks, because weak ties may play a more significant role than strong ties in the prediction [24].
In order to further explore the role of weak ties in link prediction, the performance of the parameter-dependent WMI-based indices with different α on the four real-world networks is presented in Fig 1, and the optimal values of α are given in Table 2. The results show that the WMI-based indices obtain the best Precision values when α is smaller than 1 in USAir, Baywet and Bible, except for WMI-WRA^α in Baywet. This means that the link weights may not show the real strength of ties: sometimes, the weak ties have a higher strength than their weights suggest. On the other hand, in Celegans, the optimal values of α are all greater than 1 for the WMI model, which on the contrary indicates that in some networks the role of weak ties can be as weak as their weights indicate. These results agree with the findings in Ref [24], which used different link prediction indices. This reveals that the role of weak ties is an essential characteristic of the networks themselves, rather than of the particular link prediction method.
Finally, the performance of our WMI-based indices is compared with that of the other weighted indices given by Eqs (1)-(6) and of the reliable-route based methods [33]. As shown in Table 3, except for Celegans, all the pure WMI indices achieve better prediction accuracy than the corresponding indices (i.e., WCN, WAA and WRA). Compared with the reliable-route based methods, namely rWCN, rWAA and rWRA, the WMI model performs better in all example networks except Bible. Since the weighted indices given by Eqs (4)-(6) are parameter-dependent and consider the role of weak ties as well, their results with different values of α are also shown in Fig 1. From the results, we can conclude that the parameter-dependent WMI-based indices show trends consistent with those of their basic weighted forms (i.e., WCN^α, WAA^α and WRA^α) in the four real-world networks.
In USAir, Baywet and Bible, the WMI model outperforms its corresponding basic weighted forms at almost all values of α, especially in USAir. Note that when α = 0, the example networks are all turned into unweighted forms, i.e., every link has the same weight. That is, according to Fig 1, the parameter-dependent WMI-based indices also perform better than their basic weighted forms in unweighted networks. If we consider only the optimal results given in Table 3, we find that, except in Celegans, the WMI-based indices achieve better prediction accuracy than their counterparts. In Celegans, the WMI-based indices have nearly the same performance as their counterparts, and WMI-WCN* achieves the best performance among the sixteen indices. Altogether, the WMI-based indices outperform the compared weighted indices.
Our experiments are conducted on a desktop computer with 8 GB of RAM and an Intel(R) Core(TM) i5-3470 quad-core CPU @ 3.20 GHz. To illustrate the computing efficiency of each predictor, we summarize their computation times on the four real-world networks in Table 4. The results indicate that the WMI-based methods are more efficient than the DC-CN index; their computing time is relatively high but remains on a time scale similar to that of the other unweighted and weighted methods.
In conclusion, the WMI model performs better than the other methods on weighted networks and has a reasonable computing time.

Table 3. Comparison of WMI-based methods with other typical weighted indices measured by Precision (top-100) on four networks. Each value is obtained by averaging over 100 independent runs of random division of training set and probe set. The abbreviations WCN*, WAA*, WRA*, WMI-WCN*, WMI-WAA* and WMI-WRA* represent the highest Precision values shown in Fig 1 (please refer to detailed α values in Table 2). The best performance in each network is marked by bold font.

In practice, the choice of α in Eqs (4)-(6) and (19)-(21) still remains a problem. However, as discussed above, if the strong ties play a more significant role than the weak ties, it is a good choice to set the value of α to 1 directly. For instance, in Celegans, all those methods perform well when incorporating weights with α = 1. Conversely, if the weak ties need to be emphasized, the selection of α is usually not easy. A widely applied approach is to divide the training set into two parts and to use one part as a validation set to search for an appropriate α. For Table 5, we randomly divide the original network into three parts, namely a training set, a validation set and a test set, containing 80%, 10% and 10% of the links of the original network, respectively, and obtain the estimated optimal α values. We then calculate the root-mean-square deviation (RMSD) between the Precision values obtained with the estimated and the optimal α. The results show that the differences caused by applying the estimated values of α are small and acceptable, compared with using the optimal α in Table 2. Therefore, it is practical to employ this method to obtain a suitable α value.
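The validation-based search for α described above can be sketched in Python as follows. For brevity the sketch scores candidate links with the simple α-weighted common-neighbor index rather than a WMI index, and it searches a user-supplied grid of candidate values. The function names, the grid, the dict-of-dicts layout and the convention that node pairs are stored as sorted tuples are all assumptions of this illustration.

```python
from itertools import combinations


def wcn_alpha(adj, x, y, alpha):
    """Alpha-weighted common-neighbor score (stand-in for any alpha-dependent index)."""
    return sum(adj[x][z] ** alpha + adj[z][y] ** alpha
               for z in set(adj[x]) & set(adj[y]))


def precision(adj, held_out, alpha, L):
    """Precision of the top-L non-existent pairs of `adj` against `held_out`."""
    candidates = [(x, y) for x, y in combinations(sorted(adj), 2)
                  if y not in adj[x]]
    ranked = sorted(candidates,
                    key=lambda p: wcn_alpha(adj, p[0], p[1], alpha),
                    reverse=True)
    return sum(1 for p in ranked[:L] if p in held_out) / L


def select_alpha(adj, validation, alphas, L=100):
    """Return the candidate alpha with the highest Precision on the validation set.

    `adj` must be built from the training links only; `validation` is a set
    of held-out links given as sorted tuples (x, y) with x < y.
    """
    return max(alphas, key=lambda a: precision(adj, validation, a, L))
```

The selected value would then be used once on the test set, which keeps the test links out of the parameter search.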

Discussion
The empirical experiments demonstrate that weak ties play different roles in different networks. For instance, weak ties are more important than strong ties in USAir, while the opposite holds in Celegans. In Ref [24], a motif analysis of example networks is applied to elaborate the role of weak ties in link prediction. Here we try to gain an in-depth understanding of the effects of weak ties from a different point of view. Among the similarity-based methods that incorporate link weights, one latent assumption is that the weights quantify the similarities or affinities between nodes. In other words, larger weights indicate a closer relationship between nodes. For example, in Celegans, the weight of a link stands for the number of synapses between a neuron pair. If two neurons have many synaptic contacts, we believe that they have a close relationship with each other. Therefore, the weights describe the similarities between nodes positively. Under this condition, from Eqs (1)-(3) and (16)-(18), it can be concluded that the larger the weights on the links connecting the common neighbors to a candidate node pair, the higher the estimated probability that the link exists. In this way, such weights are positively correlated with the connection likelihoods of links. Therefore, the role of weak ties should be suppressed, while the role of strong ties, on the contrary, needs to be emphasized. As a result, the role of weak ties in Celegans is as weak as indicated by the results in Table 2.
However, not all network weights express similarities between nodes; this depends on the network background. Specifically, the weights may represent dissimilarities between nodes, such as differences or distances. For instance, the weights in a power system network may stand for the distances between power stations. If two stations are far away from each other, the probability of the existence of a link between them is small. In this situation, the weights are negatively correlated with the similarities of node pairs and, in turn, negatively correlated with the connection likelihoods of links. In this case, if we directly apply such weights in Eqs (1)-(3) or (16)-(18) for link prediction, the results are worse than in the unweighted cases, and this phenomenon is explained in Ref [24] as the effect of weak ties on link prediction. Hence, in order to provide more accurate predictions, we should emphasize the role of weak ties as in Eqs (4)-(6) and (19)-(21). Consequently, the "revised weights" contribute positively to the connection likelihood. In USAir, the weight of a link represents the traffic flow between two airports, and the results indicate that weak ties are more significant than strong ties in this network. Given that most airports are local ones and only a few are hubs connecting different local airports, if two local airports have more frequent flights to the same hub airport, then the probability of a direct flight between these two local airports is lower. In this way, the weights of USAir are negatively correlated with the connection likelihoods of links. Consequently, the role of weak ties in such a network is emphasized, as indicated in Table 2.
Altogether, if the weights are positively correlated with the similarities of node pairs, the role of weak ties should be suppressed; otherwise, it should be emphasized. Moreover, our model can also be used to predict the weights of missing links, which is also a significant task of link prediction in weighted networks. If the weights have a positive correlation with the connection likelihoods of links, we can use our model to obtain a score for each link, and then use the positive correlation between weights and scores to predict the missing weights (for example with the method proposed in Ref [33]). On the contrary, if the weights denote dissimilarities between nodes, the parameter α employed in Eqs (4)-(6) and (19)-(21) serves to "modify" the weights so that they become positively correlated with the similarities between nodes. After this modification, the method proposed in Ref [33] can be applied to predict the "revised" weight, from which the original weight can finally be recovered.

Conclusion
In this paper, we propose a weighted mutual information model for link prediction in weighted networks, which combines the benefits of both structural properties and link weights. To test our method, empirical experiments are carried out on four real-world networks. The comparisons are made from two aspects. On the one hand, compared with unweighted indices and without considering the role of weak ties, the pure WMI-based indices outperform their basic unweighted forms and achieve performance competitive with the LNB model in Celegans and Baywet. In addition, when weak ties are taken into consideration, the WMI model performs best in most networks. On the other hand, compared with other weighted indices, the WMI model also outperforms them in most networks. Furthermore, experiments on the four real-world networks demonstrate that the WMI model has a reasonable computing time. Altogether, we conclude that the WMI model is effective for link prediction in weighted networks.
The presented unweighted indices extract information from CN-based structures, and they perform well in networks with high clustering, such as Bible. However, when the network has low clustering, these unweighted indices, which are based only on structure information, perform poorly. In this case, our model can handle the situation well by using the additional weight information of links. Although our model has some advantages over previous methods, it may cost more time to search for a reasonable parameter value when the role of weak ties needs to be addressed. Further investigation and improvements include, but are not limited to, the following aspects. The proposed model combines the weight information and the structure information in a simple way, so more effective ways of combining them remain to be explored. In addition, since the weights of links may not show the real strength of ties, we may try to reconstruct a weighted network in which the original link weights are replaced by values that estimate the tie strength more accurately, which would help the weighted indices capture similarities between nodes.