A Noise-Filtering Method for Link Prediction in Complex Networks

Bo Ouyang; Lurong Jiang; Zhaosheng Teng

doi:10.1371/journal.pone.0146925

Abstract

Link prediction plays an important role both in finding missing links in networked systems and in complementing our understanding of the evolution of networks. The network science community has paid much attention to how to efficiently predict missing or future links based on the observed topology. Real-world information always contains noise, and an observed network is no exception. This problem is rarely considered in existing methods. In this paper, we treat the existence of observed links as known information. By filtering out the noise in this information, the underlying regularity of the connections is retrieved and then used to predict missing or future links. Experiments on various empirical networks show that our method performs noticeably better than baseline algorithms.

Citation: Ouyang B, Jiang L, Teng Z (2016) A Noise-Filtering Method for Link Prediction in Complex Networks. PLoS ONE 11(1): e0146925. https://doi.org/10.1371/journal.pone.0146925

Editor: Wen-Bo Du, Beihang University, CHINA

Received: November 13, 2015; Accepted: December 24, 2015; Published: January 20, 2016

Copyright: © 2016 Ouyang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper.

Funding: This work was supported by Fundamental Research Funds for the Central Universities of China, Grant No. 531107040852. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

About one and a half decades ago, Barabási and Albert pointed out that the scale-invariance of many real networked systems originates from a specific growth process, named preferential attachment [1]. Since then, the study of complex networks has led to dramatic changes in many different fields [2–7], and many facets of node attractiveness in growing networks other than preferential attachment have been revealed, e.g. similarity [8]. Since different growth processes often result in networks with strikingly different macroscopic properties, how real-world networks evolve is a fundamental question in understanding our complex world. Link prediction, one of whose capabilities is to rank the best candidates for future links, plays an important role in revealing the evolution processes of networks [9, 10].

On the other hand, many applications have to predict missing links in networked systems [11–13]. Determining whether a link exists in such networks is usually very costly, yet the answer is crucial. For example, knowing the map of protein-protein interactions would reveal many aspects of cellular function [14], but so far little of it has been studied. Link prediction is also widely used in these applications [15, 16].

The problem of link prediction has received much attention from the network science community in the past few years [9, 12, 17, 18]. In general, both topological features and node attributes can be used in the prediction. However, the latter are usually unavailable or unreliable. For example, in online social networks, the personal information of users is often inaccessible due to privacy policies. Thus, many algorithms consider only topological features.

Basically, there are two classes of topological methods: similarity-based and likelihood-based algorithms. Similarity-based algorithms assume that two nodes are likely to be connected if they are similar. They assign a score s_xy to each pair of nodes x and y, which is defined as the similarity between them. All non-observed links are ranked according to their scores, and the links connecting more similar nodes are supposed to have higher existence likelihoods. A wealth of methods of this type has been proposed. For example, CN (Common Neighbours) [19] uses the number of common neighbours to rank the similarity of nodes and the likelihood that they are or will be linked. Many variations of CN have also been proposed: AA (Adamic-Adar) [20] and RA (Resource Allocation) [19] give more importance to common neighbours with lower degree, and Jaccard’s index is a normalised CN. Only local structural information is used in these methods. There are also methods utilizing quasi-global or global information. For example, the Local Path method defines the similarity through the number of paths between two nodes, including paths of length larger than 2.

Recently, the organization patterns existing in many real-world networks have been utilized in predicting missing links. Likelihood-based methods make assumptions about the structure, with specific parameters obtained by maximising the likelihood of the known structure. Predictions of the non-observed links are made based on the presumed pattern and the parameters. For example, Ref. [21] utilizes the hierarchical structure existing in many networks to predict missing links, and Cannistraci et al. propose the local-community-paradigm to improve the performance of classical predictors [13].

Real-world information always contains noise, and an observed network is no exception. However, this problem is rarely considered in existing methods. In Ref. [18], the authors suppress the noise by averaging over the eigen-decompositions of perturbed adjacency matrices (obtained by removing some links). However, the underlying physical meaning is not clear: why should the eigenvectors of the adjacency matrix reflect the regularity of a network if they are in fact sensitive to perturbation [22]? Besides, that approach has a high computational complexity. In this paper, by treating the existence of observed links as known “information” (as in [23, 24]) and filtering out the noise in it, we obtain similarity scores for all non-observed links. We give a more theoretical analysis of the link prediction problem and a more meaningful demonstration of the noise-filtering (NF) method. In our experiments, the method outperforms typical predictors on most of the networks tested.

Materials and Methods

Metrics

In this paper, two metrics are used to compare the performance of the base-line algorithms and the proposed noise-filtering method.

Consider a simple network G(V, E), where V and E are the sets of nodes and links, respectively. By “simple”, we mean that there are no self-loops or multi-links in the network. In a similarity-based algorithm, a similarity score is assigned to each pair of nodes x, y ∈ V without a link. Then all unlinked pairs are ranked in descending order according to their scores, and the pairs at the top are considered the most likely to be connected.

To test the accuracy of a predictor, we randomly divide the observed links in the network into a training set E^T and a probe set E^P. Here, E^T is treated as known information while E^P is only used to test the accuracy. Clearly, we have E^T∪E^P = E and E^T∩E^P = ∅.
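The random division into E^T and E^P can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_links`, the toy edge list, the fixed seed, and the training fraction of 0.9 are all assumptions chosen for the example.

```python
import random

def split_links(edges, train_fraction=0.9, seed=0):
    """Randomly divide observed links into a training set E^T and a probe set E^P."""
    # Store each undirected link once, as a sorted pair of node labels.
    links = sorted({tuple(sorted(e)) for e in edges})
    rng = random.Random(seed)
    rng.shuffle(links)
    n_train = int(round(train_fraction * len(links)))
    return set(links[:n_train]), set(links[n_train:])

# Hypothetical toy network with 10 links.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3),
         (3, 4), (4, 5), (3, 5), (2, 4), (0, 5)]
E_T, E_P = split_links(edges, train_fraction=0.9)
```

By construction the two sets are disjoint and their union is the full link set, which matches the two conditions stated above.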

In this study, we use two metrics, AUC (Area Under the Receiver Operating Characteristic curve) and precision, to evaluate the performance of a predictor. They are defined as follows.

AUC: AUC is a metric in receiver operating characteristic (ROC) analysis [25]. Taking the top L links as predicted links, a ROC curve is obtained by plotting true positive rates versus false positive rates for varying L values. AUC can thus be interpreted as the probability that a randomly chosen missing link (i.e., a link in E^P) has a higher score than a randomly chosen non-existent link (i.e., a link in U − E, where U denotes the set of all possible links among the nodes in V) in the ranking of all non-observed links. In the algorithmic implementation, if among n independent comparisons there are n′ in which the score of the missing link is higher than that of the non-existent link and n′′ in which the two have the same score, then AUC can be expressed as
$$\mathrm{AUC} = \frac{n' + 0.5\,n''}{n}. \tag{1}$$
If all the scores are generated from an independent and identical distribution, AUC will be approximately 0.5. Therefore, the extent to which AUC exceeds 0.5 indicates how much better the algorithm performs than pure chance.
Precision: Given the ranking of the non-observed links, the precision is defined as the ratio of relevant items selected to the number of items selected. Thus if we choose the top-L links in the ranking, and L_r of them are correctly predicted, then
$$\mathrm{Precision} = \frac{L_r}{L}. \tag{2}$$
Clearly, higher precision means higher accuracy. In this paper, L is always set to the size of the probe set.
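The two metrics can be sketched in a few lines. The code below is an illustrative assumption, not the authors' implementation: the function names, the dictionary layout (scores keyed by non-observed link), the number of sampled comparisons, and the toy scores are all hypothetical. The AUC estimate follows the sampled form (n′ + 0.5 n′′)/n, and precision uses L equal to the probe-set size by default.

```python
import random

def auc(scores, probe, nonexistent, n=10000, seed=0):
    """Sampled AUC: (n' + 0.5 * n'') / n over n independent comparisons."""
    rng = random.Random(seed)
    probe, nonexistent = list(probe), list(nonexistent)
    higher = ties = 0
    for _ in range(n):
        s_missing = scores[rng.choice(probe)]        # a link in E^P
        s_absent = scores[rng.choice(nonexistent)]   # a link in U - E
        if s_missing > s_absent:
            higher += 1
        elif s_missing == s_absent:
            ties += 1
    return (higher + 0.5 * ties) / n

def precision(scores, probe, L=None):
    """Fraction of the top-L ranked non-observed links that lie in the probe set."""
    probe = set(probe)
    L = len(probe) if L is None else L
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits = sum(1 for link in ranked[:L] if link in probe)
    return hits / L

# Hypothetical scores for four non-observed links.
scores = {(0, 3): 0.9, (1, 4): 0.8, (0, 4): 0.2, (2, 5): 0.1}
probe = [(0, 3), (0, 4)]
nonexistent = [(1, 4), (2, 5)]
auc_value = auc(scores, probe, nonexistent)
precision_value = precision(scores, probe)
```

In this toy case three of the four possible (missing, non-existent) pairs are ordered correctly, so the sampled AUC should be close to 0.75, and one of the top two ranked links is in the probe set.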

Data Description

Networks from different fields are considered in the experiment, including biological, social, and technological networks. The original networks are turned into undirected, simple networks (with multiple links and loops removed). These networks are described in the following.
i) Karate [26]: A social network of a university karate club.
ii) FoodWeb [27]: A food web in Florida Bay during the rainy season.
iii) Jazz [28]: A collaboration network of jazz musicians.
iv) Neural [29]: The neural network of C. elegans.
v) USAir [30]: The US air transportation network.
vi) Metabolic: The metabolic network of C. elegans.
vii) Email [31]: A network of Alex Arenas’s email.
viii) PB [32]: A network of US political blogs.
ix) Yeast [33]: A protein-protein interaction network.
x) EPA [34]: A network of web pages linking to the website www.epa.gov.
xi) Router [35]: The router-level topology of the Internet.
xii) WikiVote [36, 37]: A network containing all the Wikipedia voting data from its inception until January 2008.
Their basic topological parameters are summarized in Table 1.


Table 1. Topological parameters of the real-world networks.

https://doi.org/10.1371/journal.pone.0146925.t001

Baseline Algorithms for Comparison

In this paper, six representative similarity indices are considered for performance comparison: Common Neighbours (CN), Adamic-Adar (AA) [20], Resource Allocation (RA) [19], Preferential Attachment (PA) [38], Local Path (LP) [39], and Katz [40]. The first four are local indices, the fifth is a quasi-local index, and the last is a global index. Some of them were briefly introduced earlier. Here we present the details of these algorithms.

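CN index. The CN index follows the intuition that two nodes x and y are more likely to be connected if their nearest neighbours overlap substantially. The similarity score is obtained by
$$s_{xy}^{\mathrm{CN}} = |\Gamma(x) \cap \Gamma(y)|, \tag{3}$$
where Γ(x) is the set of neighbours of x and | ⋅ | denotes the cardinality of a set.
AA index. AA is a variation of CN that gives less importance to common neighbours with high degree:
$$s_{xy}^{\mathrm{AA}} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{\log d_z}, \tag{4}$$
where d_z is the degree of node z.
RA index. Similar to AA, the only difference being that RA punishes high-degree common neighbours to a greater extent:
$$s_{xy}^{\mathrm{RA}} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{d_z}. \tag{5}$$
PA index. The PA index supposes that popular nodes are more likely to be connected to. This index is defined as
$$s_{xy}^{\mathrm{PA}} = d_x d_y. \tag{6}$$
LP index. Unlike the previous indices, LP uses second-order information (information about neighbours of the neighbours) to improve performance. It is defined by
$$S^{\mathrm{LP}} = A^2 + \epsilon A^3, \tag{7}$$
where A is the adjacency matrix and ε is a free parameter.
Katz index. This index sums over the number of paths (including loops) between two nodes, with each number exponentially damped by the path length:
$$S^{\mathrm{Katz}} = \sum_{l=1}^{\infty} \beta^{l} A^{l} = (I - \beta A)^{-1} - I, \tag{8}$$
where the damping parameter β must be smaller than the reciprocal of the largest eigenvalue of A for the sum to converge.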

Note that the LP index and Katz are both parameter-dependent.
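For reference, the six baseline indices can be computed for all node pairs at once from the adjacency matrix. The sketch below uses the standard textbook forms of these indices (CN as entries of A², AA and RA as degree-weighted common-neighbour sums, PA as a degree product, LP as A² + εA³, Katz as (I − βA)⁻¹ − I). The function name, the toy graph, and the default values ε = 0.01 and β = 0.01 are assumptions for illustration, not the parameter values used in the experiments.

```python
import numpy as np

def baseline_scores(A, eps=0.01, beta=0.01):
    """Return a dict of all-pairs score matrices for CN, AA, RA, PA, LP and Katz."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    d = A.sum(axis=1)                      # node degrees
    # Weight of each potential common neighbour z. A node of degree < 2 can
    # never be a common neighbour of two distinct nodes, so its AA weight is
    # set to 0, which also avoids dividing by log(1) = 0.
    w_aa = np.zeros(n)
    w_aa[d > 1] = 1.0 / np.log(d[d > 1])
    w_ra = np.zeros(n)
    w_ra[d > 0] = 1.0 / d[d > 0]
    A2 = A @ A
    return {
        "CN": A2,                          # (A^2)_xy = number of common neighbours
        "AA": (A * w_aa) @ A,              # sum_z A_xz A_zy / log d_z
        "RA": (A * w_ra) @ A,              # sum_z A_xz A_zy / d_z
        "PA": np.outer(d, d),              # d_x * d_y
        "LP": A2 + eps * (A2 @ A),         # A^2 + eps * A^3
        # Closed form of the Katz series; valid only if beta < 1 / lambda_max(A).
        "Katz": np.linalg.inv(np.eye(n) - beta * A) - np.eye(n),
    }

# Hypothetical toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
S = baseline_scores(A)
```

Only the entries for unlinked pairs (here, for example, the pair 0 and 3, whose single common neighbour is node 2) are used when ranking candidate links.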

Results

Link Prediction via Noise Filtering

In many networks, the formation of links usually embodies both regularities and irregularities. Only the former shows a uniform pattern, which is called the intrinsic pattern. For a specific link, if its existence does not correspond with the connection pattern of the whole network, then its existence is treated as noise. A large body of link prediction methods (e.g. the common neighbour method) assumes that nodes are linked if they are similar. Following this assumption, we treat links connecting dissimilar nodes as noise. By filtering out the noise, we can obtain the intrinsic connection pattern, which can be further used to predict missing or future links.

To this end, one has to define a measure to quantify the degree to which a link connects dissimilar nodes.

For every node in the network, assume that its topological features are captured by a vector in $\mathbb{R}^m$. Define the feature matrix X to be an n-by-m matrix whose rows are the feature vectors of nodes. Thus, X_ik is the k-th feature of node i, and X_•k, the k-th column vector of X, is the k-th feature of all nodes. In real-world cases, features usually contain noise.

In some typical link prediction methods (e.g. the common neighbour method), nodes are assumed to be linked because they are similar. Focusing on the k-th feature, we may measure to what degree dissimilar nodes are linked in the whole network by
$$\tilde{D}_k = \sum_{i \sim j} (X_{ik} - X_{jk})^2 = X_{\bullet k}^{T} L X_{\bullet k},$$
where i ∼ j indicates that i and j are neighbours (each linked pair being counted once), and L is the Laplacian matrix [41]. However, this measure is biased. In the summation, the feature X_ik of node i appears in d_i different terms, where d_i is the degree of node i. So features of high-degree nodes dominate the value of $\tilde{D}_k$, while in many real-world networks most nodes are of low degree [1]. Thus the value of $\tilde{D}_k$ does not properly count the similarity of the features of the majority.

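The rightmost term in the above equation is the quadratic form of the Laplacian. To treat features from different nodes equally, a natural alternative is to use the quadratic form of the normalised Laplacian matrix $\mathcal{L} = T^{-1/2} L T^{-1/2}$ [41], where $T = \mathrm{diag}(d_1, \dots, d_n)$ is the diagonal matrix of node degrees:
$$D_k = X_{\bullet k}^{T} \mathcal{L} X_{\bullet k} = \sum_{i \sim j} \left( \frac{X_{ik}}{\sqrt{d_i}} - \frac{X_{jk}}{\sqrt{d_j}} \right)^2. \tag{9}$$
The quadratic form of $\mathcal{L}$ has an interpretation similar to that of L, so a larger D_k indicates that dissimilar nodes are linked together to a larger extent. Thus D_k can be used as a non-biased dissimilarity measure of the k-th feature.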
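The two quadratic-form identities used here, namely that the sum of (x_i − x_j)² over links equals xᵀLx and that the sum of (x_i/√d_i − x_j/√d_j)² over links equals the quadratic form of the normalised Laplacian, can be checked numerically. The toy graph and the feature column below are arbitrary assumptions for illustration.

```python
import numpy as np

# Hypothetical toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)                                     # degrees (all positive here)
L = np.diag(d) - A                                    # combinatorial Laplacian
L_norm = np.diag(d ** -0.5) @ L @ np.diag(d ** -0.5)  # normalised Laplacian

x = np.array([1.0, 2.0, 4.0, 8.0])                    # an arbitrary feature column
edges = [(i, j) for i in range(4) for j in range(i + 1, 4) if A[i, j]]

# Sums over links, each linked pair counted once.
sum_plain = sum((x[i] - x[j]) ** 2 for i, j in edges)
sum_norm = sum((x[i] / np.sqrt(d[i]) - x[j] / np.sqrt(d[j])) ** 2 for i, j in edges)

quad_plain = x @ L @ x
quad_norm = x @ L_norm @ x
```

In the unnormalised sum, the value on node 2 (degree 3) enters three terms while the value on node 3 (degree 1) enters only one; dividing each value by the square root of its degree is what removes this imbalance.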

In signal processing, to filter out noise, the signal is decomposed into a set of sine waves with different frequencies. The higher the frequency, the more rapidly the sine wave oscillates. The waves whose frequencies are considered to lie within the band of noise are then filtered out. In our case, the eigenvectors of the normalised Laplacian provide a similar notion of frequency. To understand this, denote by λ₁ ≤ λ₂ ≤ ⋯ ≤ λ_n the eigenvalues of the normalised Laplacian matrix $\mathcal{L}$, and by v₁, v₂, ⋯, v_n the corresponding orthonormal eigenvectors. The Courant-Fischer Theorem [42] tells us that
$$\lambda_1 = \min_{x \neq 0} \frac{x^{T} \mathcal{L} x}{x^{T} x}, \tag{10}$$
with the minimum attained at x = v₁, and
$$\lambda_i = \min_{\substack{x \neq 0 \\ x \perp v_1, \dots, v_{i-1}}} \frac{x^{T} \mathcal{L} x}{x^{T} x}, \tag{11}$$
with the minimum attained at x = v_i.

So, if X_•k = v₁, then D_k achieves its smallest value among all unit-norm feature vectors, which indicates that v₁ oscillates slowly among connected nodes (since D_k is a dissimilarity measure). The eigenvectors associated with larger eigenvalues oscillate more rapidly.
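The "frequency" interpretation can be illustrated on a small example. The sketch below assumes a path graph on 6 nodes, which is not one of the data sets used in the paper. It checks that the dissimilarity of each unit eigenvector equals its eigenvalue, and counts sign changes of the degree-rescaled eigenvector along the path as a crude, hypothetical proxy for how rapidly it oscillates.

```python
import numpy as np

# Path graph on 6 nodes (an illustrative assumption).
n = 6
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
d = A.sum(axis=1)
L_norm = np.eye(n) - A / np.sqrt(np.outer(d, d))   # normalised Laplacian

# eigh returns eigenvalues in ascending order, with orthonormal eigenvectors as columns.
lam, V = np.linalg.eigh(L_norm)

# Dissimilarity of each unit eigenvector: v_i^T L_norm v_i, which equals lambda_i.
dissimilarity = np.array([V[:, i] @ L_norm @ V[:, i] for i in range(n)])

# Number of sign changes of v_i / sqrt(d) between consecutive nodes of the path.
sign_changes = [int(np.sum(np.diff(np.sign(V[:, i] / np.sqrt(d))) != 0))
                for i in range(n)]
```

On this graph the number of sign changes grows with the eigenvalue index, in the same way that a higher-frequency sine wave crosses zero more often.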

Similar to filtering noise in signal processing, we can project X_•k onto {v_i} and filter out the components with high “frequency”, i.e., the components on v_i with large subscript i, since we treat the existence of links connecting dissimilar nodes as noise. Denoting the cut-off threshold by t, the noise-filtered X_•k reads
$$\hat{X}_{\bullet k} = V_t V_t^{T} X_{\bullet k}, \tag{12}$$
in which V_t = [v₁, v₂, ⋯, v_t] is a matrix whose columns are the t eigenvectors of the normalised Laplacian with the smallest eigenvalues.

Since nothing in the above derivation depends on the choice of k, it applies equally to every other feature. We thus obtain the noise-filtered features for the whole network
$$\hat{X} = V_t V_t^{T} X. \tag{13}$$

For any node i, its connections with all other nodes in the same network are fully characterized by the corresponding row of the adjacency matrix A. So one may use these rows as the feature vectors of nodes, as in [43, 44], and interpret the k-th feature of node i as whether it is a neighbour of k. But there is a minor issue with this choice. Recall that the above derivation is based on minimising the dissimilarity measure over all linked nodes (see Eq (9)). Consider two linked nodes i and j that have exactly the same neighbourhoods, so that we expect their dissimilarity to be 0. Their i-th features will nevertheless differ, since the i-th feature of i is 0 while the i-th feature of j is 1. The same holds for the j-th feature. This analysis suggests using the rows of A + I rather than A as the feature vectors of nodes. The k-th feature of node i can then be interpreted as whether its distance to node k is no more than 1. This is further demonstrated in Fig 1.


Fig 1. Demonstration of using rows of A + I as the feature vectors for nodes.

In the network, nodes 4 and 5 are topologically equivalent. However, the 4th row of A reads [0, 1, 1, 0, 1] and the 5th reads [0, 1, 1, 1, 0], which are different. After adding I, the 4th and 5th rows of A + I are both [0, 1, 1, 1, 1], which is exactly what we want. This is also the case for nodes 2 and 3. The k-th feature of a node can be interpreted as whether the distance between it and node k is no more than 1. For example, the distance between nodes 1 and 4 is greater than 1, while the distance between every other node and node 4 is within 1, so the 4th feature is [0, 1, 1, 1, 1]^T.

https://doi.org/10.1371/journal.pone.0146925.g001
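The rows quoted in the caption of Fig 1 can be reproduced with a small script. The links among nodes 2 to 5 follow from the quoted rows of A; the links 1-2 and 1-3 are an assumption, since the caption only states that node 1 is not within distance 1 of node 4.

```python
import numpy as np

# Edge list with 1-based node labels. The links 1-2 and 1-3 are assumed;
# the remaining links are implied by the rows of A quoted in the caption.
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)]
n = 5
A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i - 1, j - 1] = A[j - 1, i - 1] = 1

F = A + np.eye(n, dtype=int)           # feature matrix: rows of A + I

row4_A, row5_A = A[3].tolist(), A[4].tolist()   # differ in positions 4 and 5
row4_F, row5_F = F[3].tolist(), F[4].tolist()   # identical after adding I
col4_F = F[:, 3].tolist()                       # the 4th feature of all nodes
```

With the rows of A alone, the equivalent nodes 4 and 5 would contribute a non-zero term to the dissimilarity measure in their own two coordinates; with A + I their feature vectors coincide.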

Applying the above methodology with X = A + I, we have
$$\hat{X} = V_t V_t^{T} (A + I). \tag{14}$$

Entries of $\hat{X}$ reflect the intrinsic connection pattern, so they can be used to predict missing links. However, since we are focusing on undirected networks, there is still one problem with $\hat{X}$: according to Eq (14), it might not be symmetric. So we make predictions based on the entries of $\hat{X} + \hat{X}^{T}$ instead of $\hat{X}$.
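Putting the pieces together, the scoring step can be sketched as below. This is a minimal reading of the description above rather than the authors' implementation: the function name `nf_scores`, the treatment of isolated nodes, the use of a dense eigen-decomposition, and the symmetrisation by adding the transpose are assumptions, and the toy graph is hypothetical.

```python
import numpy as np

def nf_scores(A, t):
    """Noise-filtered similarity scores from a symmetric 0/1 adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    d = A.sum(axis=1)
    # Assumption: an isolated node (degree 0) gets an inverse-sqrt degree of 0.
    inv_sqrt = np.zeros(n)
    inv_sqrt[d > 0] = d[d > 0] ** -0.5
    L_norm = np.eye(n) - inv_sqrt[:, None] * A * inv_sqrt[None, :]
    _, V = np.linalg.eigh(L_norm)             # eigenvalues in ascending order
    V_t = V[:, :t]                            # the t lowest-"frequency" eigenvectors
    X_hat = V_t @ (V_t.T @ (A + np.eye(n)))   # project the columns of A + I
    return X_hat + X_hat.T                    # symmetrise for undirected links

# Hypothetical toy graph: two triangles {0,1,2} and {3,4,5} joined by the link 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1.0

S_filtered = nf_scores(A, t=2)   # keep only the two smoothest components
S_full = nf_scores(A, t=6)       # t = n keeps everything
```

When t equals the number of nodes, the projector V_t V_tᵀ is the identity, so nothing is filtered and the scores reduce to 2(A + I); smaller t discards progressively more of the high-"frequency" content.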

Experimental Results

To compare the performance of the Noise-Filtering (NF) method and some well-known algorithms, 12 real-world networks, including biological, social, and technological networks, are considered in the experiments. They are transformed into undirected, and simple (with multiple links or loops removed) networks. The resulting networks are summarized in Table 1.

Table 2 shows the prediction accuracy measured by AUC. Results measured by another widely used metric, precision, are presented in Table 3. These metrics are introduced in the Methods section. The highest AUC/precision for each network (in each column) is shown in boldface. Under the AUC metric, NF performs best in 7 out of 12 networks, while under the precision metric, NF performs best in 9 of them. Figs 2 and 3 compare the prediction accuracy of different algorithms under varied partitioning ratios. It can be seen that the proposed method is either the best or very close to the best, except for one network, PB. Figs 2 and 3 also support the robustness of the proposed method, since in most networks its accuracy remains either the best or very close to the best as the size of the training set varies.


Fig 2. Comparison of prediction accuracy under the AUC metric.

The fraction f of links in the training set is varied from 0.5 to 0.9.

https://doi.org/10.1371/journal.pone.0146925.g002


Fig 3. Comparison of prediction accuracy under the precision metric.

The fraction f of links in the training set is varied from 0.5 to 0.9.

https://doi.org/10.1371/journal.pone.0146925.g003

Intuitively, the more known information there is, the higher the prediction accuracy should be. But in Fig 3, we see that most of the time the precision does not increase with the size of the training set. This is due to the different sizes of the probe sets (following the conventional practice, we always set L in Eq (2) to the size of the probe set). Thus precisions obtained with different training-set sizes cannot be compared directly [45].


Table 2. Comparison of the prediction accuracy under the AUC metric in real-world networks.

https://doi.org/10.1371/journal.pone.0146925.t002


Table 3. Comparison of the prediction accuracy under the precision metric in real-world networks.

https://doi.org/10.1371/journal.pone.0146925.t003

For all the parameter-dependent methods considered in the experiment, i.e., LP, Katz, and NF, the reported results correspond to the optimal parameter, that is, the value giving the highest prediction accuracy. The optimal parameter can be found through a process similar to K-fold cross-validation. For example, in the proposed method NF, the training set is first partitioned into K units; a single unit is retained as the validation data for testing the method with a specific t, and the remaining K − 1 units are used as known information. The cross-validation is then repeated K times (the folds), with each of the K units used exactly once as the validation data. The K results from the folds are then averaged. This whole process is repeated several times to find the optimal value of t (the search is manually restricted to the range [1, 125], so the computational cost is relatively small). In Fig 4, we see that for the two metrics considered here the optimal t is robust, since the value of t at which the prediction accuracy peaks does not change with the size of the training set. So there is no need to search for an optimal t in every single run of the simulation. Once the optimal t is found, it is kept at this value in all subsequent simulations, even when the size of the training set is varied.
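The selection procedure described above can be sketched as follows. Everything here is an illustrative assumption rather than the authors' code: the helper names, the scoring function (filtering A + I with the t smoothest eigenvectors of the normalised Laplacian and symmetrising), the use of precision as the validation metric, K = 5, the candidate values of t, and the toy network. The sketch also runs a single pass, whereas the text repeats the whole process several times.

```python
import random
import numpy as np

def nf_scores(A, t):
    # Assumed NF scoring: project A + I onto the t smoothest eigenvectors, then symmetrise.
    n = A.shape[0]
    d = A.sum(axis=1)
    inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(np.maximum(d, 1.0)), 0.0)
    L_norm = np.eye(n) - inv_sqrt[:, None] * A * inv_sqrt[None, :]
    V_t = np.linalg.eigh(L_norm)[1][:, :t]
    X_hat = V_t @ (V_t.T @ (A + np.eye(n)))
    return X_hat + X_hat.T

def adjacency(n, links):
    A = np.zeros((n, n))
    for i, j in links:
        A[i, j] = A[j, i] = 1.0
    return A

def precision(S, known, held_out):
    # Rank all pairs (i < j) that are not known links; take the top L = |held_out|.
    n = S.shape[0]
    candidates = [(i, j) for i in range(n) for j in range(i + 1, n)
                  if (i, j) not in known]
    candidates.sort(key=lambda p: S[p], reverse=True)
    return sum(p in held_out for p in candidates[:len(held_out)]) / len(held_out)

def choose_t(n, train_links, t_values, K=5, seed=0):
    links = sorted(train_links)                 # links given as pairs (i, j) with i < j
    random.Random(seed).shuffle(links)
    folds = [links[k::K] for k in range(K)]     # K roughly equal units
    mean_precision = {}
    for t in t_values:
        results = []
        for k in range(K):
            held_out = set(folds[k])            # validation unit
            known = set(links) - held_out       # remaining K - 1 units
            S = nf_scores(adjacency(n, known), t)
            results.append(precision(S, known, held_out))
        mean_precision[t] = sum(results) / K    # average over the K folds
    best_t = max(mean_precision, key=mean_precision.get)
    return best_t, mean_precision

# Hypothetical training network: two 5-cliques joined by the link 4-5.
clique1 = [(i, j) for i in range(5) for j in range(i + 1, 5)]
clique2 = [(i, j) for i in range(5, 10) for j in range(i + 1, 10)]
train_links = clique1 + clique2 + [(4, 5)]
best_t, mean_precision = choose_t(10, train_links, t_values=[1, 2, 3, 5])
```

Note that only links from the training set are used here, so the probe set plays no role in choosing t.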


Fig 4. Prediction accuracy with different cutoff threshold t in the proposed noise-filtering method.

The symbol f denotes the fraction of links in the training sets.

https://doi.org/10.1371/journal.pone.0146925.g004

The experiments were conducted on a workstation with 64 GB RAM and an Intel (R) Xeon (R) E5-2687W @ 3.10 GHz 8-core processor. The comparison of computational time is summarized in Table 4. We see that the proposed method NF has a run time similar to that of the global index Katz, especially on large networks, while achieving better performance.


Table 4. Comparison of the computational efficiency in real-world networks.

https://doi.org/10.1371/journal.pone.0146925.t004

Discussion

Real-world information always contains noise, and observations of a network structure are no exception. This problem is rarely considered in existing link prediction methods. To address this issue, we treat the connections of a given network as known information and filter out the noise in it, based on the assumption that connected nodes should have similar neighbourhoods. The underlying regularity of the connection information is then retrieved and used to predict missing or future links. Experimental results show that the method performs better than typical algorithms on most of the networks considered. Future work includes improving the performance of existing methods based on the same idea of noise filtering.

Acknowledgments

We thank B. Y. Zhu for his helpful suggestions. This work is supported by the Fundamental Research Funds for the Central Universities of China, Grant No. 531107040852.

Author Contributions

Conceived and designed the experiments: BO LJ. Performed the experiments: BO LJ. Analyzed the data: BO ZT. Contributed reagents/materials/analysis tools: BO ZT. Wrote the paper: BO LJ ZT. Drew figures: LJ ZT.

References

1. Barabási AL, Albert R. Emergence of scaling in random networks. Science. 1999;286(5439):509–512. pmid:10521342
2. Sachtjen ML, Carreras BA, Lynch VE. Disturbances in a power transmission system. Physical Review E. 2000;61(5):4877.
3. Kinney R, Crucitti P, Albert R, Latora V. Modeling cascading failures in the North American power grid. The European Physical Journal B. 2005;46(1):101–107.
4. Huang X, Vodenska I, Havlin S, Stanley HE. Cascading failures in bi-partite graphs: model for systemic risk propagation. Scientific Reports. 2013;3:1219. pmid:23386974
5. Borrvall C, Ebenman B, Jonsson T. Biodiversity lessens the risk of cascading extinction in model food webs. Ecology Letters. 2000;3(2):131–136.
6. Gao Y, Du W-B, Yan G. Selectively-informed particle swarm optimization. Scientific Reports. 2015;5:9295. pmid:25787315
7. Du W-B, Gao Y, Liu C, Zheng Z, Wang Z. Adequate is better: particle swarm optimization with limited-information. Applied Mathematics and Computation. 2015;268:832–838.
8. Papadopoulos F, Kitsak M, Serrano MÁ, Boguñá M, Krioukov D. Popularity versus similarity in growing networks. Nature. 2012;489(7417):537–540. pmid:22972194
9. Zhang QM, Lü L, Wang WQ, Zhu YX, Zhou T, et al. Potential theory for directed networks. PLoS ONE. 2013;8(2):e55437. pmid:23408979
10. Zhang QM, Xu XK, Zhu YX, Zhou T. Measuring multiple evolution mechanisms of complex networks. arXiv preprint arXiv:14103519. 2014.
11. Guimerà R, Sales-Pardo M. Missing and spurious interactions and the reconstruction of complex networks. Proceedings of the National Academy of Sciences. 2009;106(52):22073–22078.
12. Lü L, Medo M, Yeung CH, Zhang YC, Zhang ZK, Zhou T. Recommender systems. Physics Reports. 2012;519(1):1–49.
13. Cannistraci CV, Alanis-Lobato G, Ravasi T. From link-prediction in brain connectomes and protein interactomes to the local-community-paradigm in complex networks. Scientific Reports. 2013;3. pmid:23563395
14. Westermarck J, Ivaska J, Corthals GL. Identification of protein interactions involved in cellular signaling. Molecular & Cellular Proteomics. 2013;12(7):1752–1763.
15. Lü L, Zhou T. Link prediction in complex networks: A survey. Physica A: Statistical Mechanics and its Applications. 2011;390(6):1150–1170.
16. Wang P, Xu B, Wu Y, Zhou X. Link prediction in social networks: the state-of-the-art. Science China Information Sciences. 2015;58(1):1–38.
17. Wang WQ, Zhang QM, Zhou T. Evaluating network models: A likelihood analysis. EPL (Europhysics Letters). 2012;98(2):28004.
18. Lü L, Pan L, Zhou T, Zhang YC, Stanley HE. Toward link predictability of complex networks. Proceedings of the National Academy of Sciences. 2015;112(8):2325–2330.
19. Zhou T, Lü L, Zhang YC. Predicting missing links via local information. The European Physical Journal B. 2009;71(4):623–630.
20. Adamic LA, Adar E. Friends and neighbors on the web. Social Networks. 2003;25(3):211–230.
21. Clauset A, Moore C, Newman ME. Hierarchical structure and the prediction of missing links in networks. Nature. 2008;453(7191):98–101. pmid:18451861
22. Restrepo JG, Ott E, Hunt BR. Characterizing the dynamical importance of network nodes and links. Physical Review Letters. 2006;97(9):094102. pmid:17026366
23. Tan F, Xia Y, Zhu B. Link Prediction in Complex Networks: A Mutual Information Perspective. PLoS ONE. 2014;9:e107056. pmid:25207920
24. Zhu B, Xia Y. An information-theoretic model for link prediction in complex networks. Scientific Reports. 2015;5:13707. pmid:26335758
25. Marbach D, Costello JC, Küffner R, Vega NM, Prill RJ, Camacho DM, et al. Wisdom of crowds for robust gene network inference. Nature Methods. 2012;9(8):796–804. pmid:22796662
26. Zachary WW. An information flow model for conflict and fission in small groups. Journal of Anthropological Research. 1977;p. 452–473.
27. Ulanowicz RE, DeAngelis DL. Network analysis of trophic dynamics in south florida ecosystems. US Geological Survey Program on the South Florida Ecosystem. 2005;114.
28. Gleiser PM, Danon L. Community structure in jazz. Advances in Complex Systems. 2003;6(04):565–573.
29. Watts DJ, Strogatz SH. Collective dynamics of ‘small-world’ networks. Nature. 1998;393(6684):440–442. pmid:9623998
30. Vladimir B, Andrej M. Pajek datasets; 2006. http://vlado.fmf.uni-lj.si/pub/networks/data/.
31. Duch J, Arenas A. Community detection in complex networks using extremal optimization. Physical Review E. 2005;72(2):027104.
32. Adamic LA, Glance N. The political blogosphere and the 2004 US election: divided they blog. In: Proceedings of the 3rd international workshop on Link discovery. ACM; 2005. p. 36–43.
33. Von Mering C, Krause R, Snel B, Cornell M, Oliver SG, Fields S, et al. Comparative assessment of large-scale data sets of protein–protein interactions. Nature. 2002;417(6887):399–403. pmid:12000970
34. Vladimir B. Pajek datasets; Accessed 2015 Aug 19. http://vlado.fmf.uni-lj.si/pub/networks/data/mix/mixed.htm.
35. Spring N, Mahajan R, Wetherall D. Measuring ISP topologies with Rocketfuel. In: ACM SIGCOMM Computer Communication Review. vol. 32. ACM; 2002. p. 133–145.
36. Leskovec J, Huttenlocher D, Kleinberg J. Predicting positive and negative links in online social networks. In: Proceedings of the 19th international conference on World wide web. ACM; 2010. p. 641–650.
37. Leskovec J, Huttenlocher D, Kleinberg J. Signed networks in social media. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM; 2010. p. 1361–1370.
38. Newman ME. Clustering and preferential attachment in growing networks. Physical Review E. 2001;64(2):025102.
39. Lü L, Jin CH, Zhou T. Similarity index based on local paths for link prediction of complex networks. Physical Review E. 2009;80(4):046122.
40. Katz L. A new status index derived from sociometric analysis. Psychometrika. 1953;18(1):39–43.
41. Chung FR. Spectral graph theory. vol. 92. American Mathematical Soc.; 1997.
42. Horn RA, Johnson CR. Matrix analysis. Cambridge University Press; 2012.
43. Burt RS. Positions in networks. Social Forces. 1976;55(1):93–122.
44. Scott J. Social network analysis. Sage; 2012.
45. Zhao J, Miao L, Yang J, Fang H, Zhang Q-M, Nie M, et al. Prediction of Links and Weights in Networks by Reliable Routes. Scientific Reports. 2015;5:12261.

