A Noise-Filtering Method for Link Prediction in Complex Networks

  • Bo Ouyang

    Affiliation: College of Electrical and Information Engineering, Hunan University, Changsha, Hunan Province, China

  • Lurong Jiang

    Contributed equally to this work with: Lurong Jiang, Zhaosheng Teng

    Affiliation: School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou, Zhejiang Province, China

  • Zhaosheng Teng

    Contributed equally to this work with: Lurong Jiang, Zhaosheng Teng

    Affiliation: College of Electrical and Information Engineering, Hunan University, Changsha, Hunan Province, China



Abstract

Link prediction plays an important role both in finding missing links in networked systems and in complementing our understanding of the evolution of networks. Much attention from the network science community is paid to figuring out how to efficiently predict missing or future links based on the observed topology. Real-world information always contains noise, and an observed network is no exception. This problem is rarely considered in existing methods. In this paper, we treat the existence of observed links as known information. By filtering out the noise in this information, the underlying regularity of the connection information is retrieved and then used to predict missing or future links. Experiments on various empirical networks show that our method performs noticeably better than baseline algorithms.


Introduction

About one and a half decades ago, Barabási and Albert pointed out that the scale-invariance of many real networked systems originates from a specific growth process, named preferential attachment [1]. Since then, the study of complex networks has led to dramatic changes in many different fields [2–7], and many facets of node attractiveness in growing networks other than preferential attachment have been revealed, e.g. similarity [8]. Since different growth processes often result in networks with strikingly different macroscopic properties, how real-world networks evolve is a fundamental question in understanding our complex world. Link prediction, one of whose capabilities is to rank the best candidates for future links, plays an important role in revealing the evolution processes of networks [9, 10].

On the other hand, many applications have to predict missing links in networked systems [11–13]. Determining whether a link exists in such networks is usually very costly, yet the answer is crucial. For example, knowing the map of protein-protein interactions will reveal many aspects of cellular function [14], but so far only a small part of this map has been studied. Link prediction is also widely used in these applications [15, 16].

The problem of link prediction has received much attention from the network science community in the past few years [9, 12, 17, 18]. In general, both topological features and node attributes can be used in the prediction. However, the latter are usually unavailable or unreliable. For example, in online social networks, the personal information of users is inaccessible due to privacy policies. Thus, many algorithms consider only topological features.

Basically, there are two classes of topological methods: similarity-based and likelihood-based algorithms. Similarity-based algorithms assume that two nodes are likely to be connected if they are similar. They assign a score $s_{xy}$ to each pair of nodes x and y, defined as the similarity between them. All non-observed links are ranked according to their scores, and links connecting more similar nodes are supposed to have higher existence likelihoods. A wealth of methods of this type have been proposed. For example, Common Neighbours (CN) [19] uses the number of common neighbours to rank the similarity of nodes and the likelihood that they are or will be linked. Many variations of CN have also been proposed: Adamic-Adar (AA) [20] and Resource Allocation (RA) [19] give more importance to common neighbours with lower degree, and Jaccard’s index is a normalised CN. Only local structural information is used in these methods. There are also methods utilising quasi-global or global information. For example, the Local Path method defines the similarity through the number of paths connecting two nodes, including paths of length greater than 2.

Recently, the organisation patterns existing in many real-world networks have been utilised in predicting missing links. Likelihood-based methods make assumptions about the structure, with specific parameters obtained by maximising the likelihood of the known structure. Predictions of the non-observed links are then made based on the presumed pattern and the fitted parameters. For example, Ref. [21] utilises the hierarchical structure existing in many networks to predict missing links, and Cannistraci et al. propose the local-community-paradigm to improve the performance of classical predictors [13].

Real-world information always contains noise, and an observed network is no exception. However, this problem is rarely considered in existing methods. In Ref. [18], the authors use the average of the eigen-decompositions of perturbed adjacency matrices (obtained by removing some links) to suppress the noise. However, the underlying physical meaning is not clear: why should the eigenvectors of the adjacency matrix reflect the regularity of a network if they are actually sensitive to perturbation [22]? Besides, that approach has a high computational complexity. In this paper, by treating the existence of observed links as known “information” (as in [23, 24]) and filtering out the noise in it, we obtain similarity scores for all non-observed links. We give a more theoretical analysis of the link prediction problem and a more meaningful demonstration of the noise-filtering (NF) method. In our experiments, the method outperforms typical predictors on most of the networks considered.

Materials and Methods


In this paper, two metrics are used to compare the performance of the baseline algorithms and the proposed noise-filtering method.

Consider that we are given a simple network G(V, E), where V and E are the sets of nodes and links, respectively. By “simple”, we mean there are no self-loops or multi-links in the network. In a similarity-based algorithm, a similarity score $s_{xy}$ is assigned to each pair of nodes x, y ∈ V without a link. Then all unlinked pairs are ranked in descending order of their scores, and the pairs at the top are considered the most likely to be connected.

To test the accuracy of a predictor, we randomly divide the observed links in the network into a training set $E^T$ and a probe set $E^P$. Here, $E^T$ is treated as known information while $E^P$ is only used to test the accuracy. Clearly, we have $E^T \cup E^P = E$ and $E^T \cap E^P = \emptyset$.
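The random division into a training set and a probe set can be sketched as follows. This is a minimal illustration under our own assumptions: the function name `split_links`, the `train_fraction` parameter, the fixed seed and the toy link list are all hypothetical and not taken from the paper.

```python
import random

def split_links(links, train_fraction=0.9, seed=0):
    """Randomly split observed links into a training set and a probe set."""
    # Store each undirected link once, as a sorted tuple, in a reproducible order.
    unique = sorted({tuple(sorted(e)) for e in links})
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_train = round(train_fraction * len(unique))
    return set(unique[:n_train]), set(unique[n_train:])

# Toy network (hypothetical data).
E = [(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5), (5, 6), (6, 7)]
E_T, E_P = split_links(E, train_fraction=0.8)

# The two sets partition E: their union is E and their intersection is empty.
print(len(E_T), len(E_P))
```

Only `E_T` would be passed to a predictor; `E_P` is held back for evaluation.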

In this study, we use two metrics, AUC (Area Under the Receiver operating characteristic curve) and precision to evaluate the performance of a predictor. They are defined as follows.

  • AUC: AUC is a metric in receiver operating characteristic (ROC) analysis [25]. Taking the top L links as predicted links, a ROC curve is obtained by plotting true positive rates versus false positive rates for varying values of L. AUC can be interpreted as the probability that a randomly chosen missing link (i.e., a link in $E^P$) has a higher score than a randomly chosen non-existent link (i.e., a link in $U - E$, where U denotes the set of all possible links), in the ranking of all non-observed links. In the algorithmic implementation, if among n independent comparisons there are n′ in which the score of the missing link is higher than that of the non-existent link and n′′ in which the two have the same score, then AUC can be expressed as

    $$\mathrm{AUC} = \frac{n' + 0.5\,n''}{n}. \tag{1}$$

    If all the scores are generated from an independent and identical distribution, AUC will be approximately 0.5. Therefore, the extent to which AUC exceeds 0.5 indicates how much better the algorithm performs than pure chance.
  • Precision: Given the ranking of the non-observed links, the precision is defined as the ratio of the number of relevant items selected to the number of items selected. Thus if we choose the top-L links in the ranking, and $L_r$ of them are correctly predicted, then

    $$\mathrm{Precision} = \frac{L_r}{L}. \tag{2}$$

    Clearly, higher precision means higher accuracy. In this paper, L is always set to the size of the probe set.
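The two metrics can be illustrated with a short sketch. It assumes that scores are stored in a dictionary keyed by node pair, and it compares every missing link with every non-existent link instead of sampling n comparisons, which is feasible only for small networks. The function names and the toy scores are hypothetical.

```python
from itertools import product

def auc(scores, probe, nonexistent):
    """AUC from all pairwise comparisons of missing vs non-existent links."""
    n = n_higher = n_equal = 0
    for p, q in product(probe, nonexistent):
        n += 1
        if scores[p] > scores[q]:
            n_higher += 1
        elif scores[p] == scores[q]:
            n_equal += 1
    return (n_higher + 0.5 * n_equal) / n

def precision(scores, probe, L=None):
    """Fraction of the top-L ranked non-observed links that lie in the probe set."""
    L = len(probe) if L is None else L
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits = sum(1 for link in ranked[:L] if link in probe)
    return hits / L

# Toy scores for five non-observed links (hypothetical data).
probe = {("a", "b"), ("c", "d")}                   # missing links
nonexistent = {("a", "c"), ("a", "d"), ("b", "c")} # truly absent links
scores = {("a", "b"): 3, ("c", "d"): 1, ("a", "c"): 2, ("a", "d"): 1, ("b", "c"): 0}

print(auc(scores, probe, nonexistent))  # n = 6, n' = 4, n'' = 1, so 0.75
print(precision(scores, probe))         # one of the top two links is a hit, so 0.5
```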

Data Description

Networks from different fields are considered in the experiment, including biological, social, and technological networks. The original networks are converted into undirected simple networks (with multiple links and self-loops removed). These networks are described in the following.

  i) Karate [26]: A social network of a university karate club.
  ii) FoodWeb [27]: A food web in Florida Bay during the rainy season.
  iii) Jazz [28]: A collaboration network of jazz musicians.
  iv) Neural [29]: The neural network of C. elegans.
  v) USAir [30]: The US air transportation network.
  vi) Metabolic: The metabolic network of C. elegans.
  vii) Email [31]: A network of Alex Arenas’s email.
  viii) PB [32]: A network of US political blogs.
  ix) Yeast [33]: A protein-protein interaction network.
  x) EPA [34]: A network of web pages linking to the EPA website.
  xi) Router [35]: The router-level topology of the Internet.
  xii) WikiVote [36, 37]: A network containing all the Wikipedia voting data from its inception until January 2008.

Their basic topological parameters are summarized in Table 1.

Baseline Algorithms for Comparison

In this paper, six representative similarity indices are considered for performance comparison: Common Neighbours (CN), Adamic-Adar (AA) [20], Resource Allocation (RA) [19], Preferential Attachment (PA) [38], Local Path (LP) [39], and Katz [40]. The first four are local indices, the fifth is a quasi-local index, and the last is a global index. Some of them were briefly introduced earlier. Here we present the details of these algorithms.

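  1. CN index. The CN index follows the intuition that two nodes x and y are more likely to be connected if their nearest neighbours overlap substantially. The similarity score is obtained by

     $$s_{xy}^{\mathrm{CN}} = |\Gamma(x) \cap \Gamma(y)|, \tag{3}$$

     where Γ(x) is the set of neighbours of x and | ⋅ | denotes the cardinality of a set.
  2. AA index. AA is a variation of CN that gives less importance to common neighbours with high degree:

     $$s_{xy}^{\mathrm{AA}} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{\log d_z}, \tag{4}$$

     where $d_z$ is the degree of node z.
  3. RA index. RA is similar to AA; the only difference is that RA penalises high-degree common neighbours more heavily:

     $$s_{xy}^{\mathrm{RA}} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{d_z}. \tag{5}$$

  4. PA index. The PA index supposes that popular nodes are more likely to acquire links. This index is defined as

     $$s_{xy}^{\mathrm{PA}} = d_x\, d_y. \tag{6}$$

  5. LP index. Unlike the previous indices, LP also uses information about neighbours of neighbours to improve performance. It is defined by

     $$s_{xy}^{\mathrm{LP}} = (A^2)_{xy} + \epsilon\, (A^3)_{xy}, \tag{7}$$

     where A is the adjacency matrix and ε is a free parameter.
  6. Katz index. This index sums over the number of paths (including loops) between two nodes, with each number exponentially damped by the path length:

     $$s_{xy}^{\mathrm{Katz}} = \sum_{l=1}^{\infty} \beta^{l} (A^{l})_{xy} = \left[(I - \beta A)^{-1} - I\right]_{xy}, \tag{8}$$

     where the damping parameter β must be smaller than the reciprocal of the largest eigenvalue of A for the sum to converge.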

Note that the LP index and Katz are both parameter-dependent.
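The six indices can be sketched in plain Python on neighbour sets. This is an illustrative sketch, not the implementation used in the experiments: the helper names, the parameter values `eps` and `beta`, and the toy network are assumptions, and the Katz series is truncated at a finite length rather than evaluated in closed form.

```python
import math

def neighbours(links):
    """Build the neighbour sets Γ(x) from a list of undirected links."""
    nbrs = {}
    for x, y in links:
        nbrs.setdefault(x, set()).add(y)
        nbrs.setdefault(y, set()).add(x)
    return nbrs

def local_scores(nbrs, x, y):
    """CN, AA, RA and PA scores for the node pair (x, y)."""
    common = nbrs[x] & nbrs[y]
    return {
        "CN": len(common),
        # A common neighbour has degree >= 2, so the logarithm is positive.
        "AA": sum(1.0 / math.log(len(nbrs[z])) for z in common),
        "RA": sum(1.0 / len(nbrs[z]) for z in common),
        "PA": len(nbrs[x]) * len(nbrs[y]),
    }

def walk_counts(nbrs, x, max_len):
    """counts[l][v] is the number of walks of length l from x to v, i.e. (A^l)_xv."""
    counts = [{x: 1}]
    for _ in range(max_len):
        nxt = {}
        for u, c in counts[-1].items():
            for v in nbrs[u]:
                nxt[v] = nxt.get(v, 0) + c
        counts.append(nxt)
    return counts

def lp_score(nbrs, x, y, eps=0.01):
    """Local Path score: walks of length 2 plus eps times walks of length 3."""
    c = walk_counts(nbrs, x, 3)
    return c[2].get(y, 0) + eps * c[3].get(y, 0)

def katz_score(nbrs, x, y, beta=0.01, max_len=20):
    """Katz score with the infinite series truncated at max_len (an approximation)."""
    c = walk_counts(nbrs, x, max_len)
    return sum(beta ** l * c[l].get(y, 0) for l in range(1, max_len + 1))

# Toy network (hypothetical data); nodes 1 and 4 are not linked.
nbrs = neighbours([(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)])
s = local_scores(nbrs, 1, 4)
print(s["CN"], s["RA"], s["PA"])                       # 2 0.5 6
print(lp_score(nbrs, 1, 4), katz_score(nbrs, 1, 4))
```

For nodes 1 and 4 the common neighbours are nodes 2 and 3, both of degree 4, which gives CN = 2, RA = 0.5 and AA = 2/log 4.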


Link Prediction via Noise Filtering

In many networks, the formation of links usually embodies both regularities and irregularities. Only the former show a uniform pattern, which is called the intrinsic pattern. For a specific link, if its existence does not correspond with the connection pattern of the whole network, then its existence is treated as noise. A large body of link prediction methods (e.g., the common neighbours method) assume that nodes are linked if they are similar. Following this assumption, we treat links connecting dissimilar nodes as noise. By filtering out the noise, we can obtain the intrinsic connection pattern, which can be further used to predict missing or future links.

To this end, one has to define a measure to quantify the degree to which a link connects dissimilar nodes.

For every node in the network, assume that its topological features are captured by a vector in $\mathbb{R}^{m}$. Define the feature matrix X to be an n-by-m matrix whose rows are the feature vectors of the nodes. Thus, $X_{ik}$ is the k-th feature of node i, and $X_k$, the k-th column vector of X, is the k-th feature of all nodes. In real-world cases, features usually contain noise.

In some typical link prediction methods (e.g., the common neighbours method), nodes are assumed to be linked because they are similar. Now focusing on the k-th feature, we may measure to what degree dissimilar nodes are linked in the whole network by

$$\sum_{i \sim j} (X_{ik} - X_{jk})^2 = X_k^{T} L X_k,$$

where i ∼ j indicates that i and j are neighbours, and L is the Laplacian matrix [41]. However, this measure is biased. In the summation, the feature $X_{ik}$ of node i appears in $d_i$ different terms, where $d_i$ is the degree of node i. So features of high-degree nodes dominate the value of $X_k^{T} L X_k$, while in many real-world networks most nodes are of low degree [1]. Thus the value of $X_k^{T} L X_k$ does not properly reflect the similarity of the features of the majority of nodes.

The rightmost term in the above equation is the quadratic form of the Laplacian. To treat features from different nodes equally, a natural alternative is to use the quadratic form of the normalised Laplacian matrix $\mathcal{L}$ [41],

$$D_k = X_k^{T} \mathcal{L} X_k. \tag{9}$$

The quadratic form of $\mathcal{L}$ has an interpretation similar to that of L, so a larger $D_k$ indicates that dissimilar nodes are linked together to a larger extent. Thus $D_k$ can be used as an unbiased dissimilarity measure of the k-th feature.
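The bias argument can be made explicit with a short expansion. The following is a worked sketch rather than part of the original derivation. It writes Δ = diag(d_1, …, d_n) for the degree matrix (a symbol introduced here only for illustration), uses the standard definitions L = Δ − A and 𝓛 = Δ^{−1/2} L Δ^{−1/2} from spectral graph theory [41], and assumes that no node is isolated.

```latex
\begin{align*}
x^{T} L x
  &= \sum_{i} d_i x_i^{2} - 2 \sum_{i \sim j} x_i x_j
   = \sum_{i \sim j} (x_i - x_j)^{2}, \\
x^{T} \mathcal{L} x
  &= \sum_{i} x_i^{2} - 2 \sum_{i \sim j} \frac{x_i x_j}{\sqrt{d_i d_j}}
   = \sum_{i \sim j} \left( \frac{x_i}{\sqrt{d_i}} - \frac{x_j}{\sqrt{d_j}} \right)^{2}.
\end{align*}
```

In the first line the squared feature of node i carries a total weight of d_i, so hubs dominate. In the second line every node carries a total weight of 1, which is the sense in which the normalised form treats features from different nodes equally.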

In signal processing, to filter out noise, the signal is decomposed into a set of sine waves with different frequencies. The higher the frequency, the more rapidly the sine wave oscillates. The waves whose frequencies are considered to lie within the band of noise are then filtered out. In our case, the eigenvectors of the normalised Laplacian provide a similar notion of frequency. To understand this, denote by λ1 ≤ λ2 ≤ ⋯ ≤ λn the eigenvalues of the normalised Laplacian matrix $\mathcal{L}$, and by v1, v2, ⋯, vn the corresponding eigenvectors. The Courant-Fischer Theorem [42] tells us that

$$v_1 = \arg\min_{\|x\| = 1} x^{T} \mathcal{L} x \tag{10}$$

and

$$v_i = \arg\min_{\|x\| = 1,\; x \perp v_1, \ldots, v_{i-1}} x^{T} \mathcal{L} x, \quad i = 2, \ldots, n. \tag{11}$$

So, if $X_k = v_1$, then $D_k$ attains its minimum over all unit-norm feature vectors, which indicates that v1 oscillates slowly among connected nodes (since $D_k$ is a dissimilarity measure). The eigenvectors associated with larger eigenvalues oscillate more rapidly.
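The notion of frequency can be checked numerically on a small example. The sketch below assumes NumPy is available and uses a path graph on six nodes as a hypothetical toy network; the function names are our own. For each eigenvector of the normalised Laplacian it computes the dissimilarity measure (which equals the eigenvalue) and counts how often the eigenvector changes sign across a link.

```python
import numpy as np

def normalised_laplacian(A):
    """I - D^{-1/2} A D^{-1/2}; assumes every node has at least one link."""
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    return np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def sign_changes(v):
    """Number of sign changes between consecutive entries of v."""
    s = np.sign(v)
    return int(np.sum(s[:-1] != s[1:]))

# Path graph 0-1-2-3-4-5 (hypothetical toy example): consecutive indices are linked,
# so sign changes between consecutive entries are sign changes across links.
n = 6
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

L_norm = normalised_laplacian(A)
lam, V = np.linalg.eigh(L_norm)  # eigenvalues in ascending order

# Dissimilarity D_k when the feature X_k is the i-th eigenvector: equals lambda_i.
D = np.array([V[:, i] @ L_norm @ V[:, i] for i in range(n)])
changes = [sign_changes(V[:, i]) for i in range(n)]

print(np.round(D, 3))
print(changes)
```

On this path graph the i-th eigenvector changes sign i − 1 times, much as sine waves of increasing frequency do.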

Similar to filtering noise in signal processing, we can project $X_k$ onto {vi} and filter out the components with high “frequency”, i.e., the components on vi with large subscript i, since we treat the existence of links connecting dissimilar nodes as noise. Denoting the cut-off threshold by t, the noise-filtered $X_k$ reads

$$\tilde{X}_k = V_t V_t^{T} X_k, \tag{12}$$

in which Vt = [v1, v2, ⋯, vt] is a matrix whose columns are the t eigenvectors of the normalised Laplacian $\mathcal{L}$ with the smallest eigenvalues.

Since nothing specific was assumed about k, the above derivation for the k-th feature applies to every other feature as well. We thus obtain the noise-filtered features for the whole network,

$$\tilde{X} = V_t V_t^{T} X. \tag{13}$$

For any node i, its connections with all other nodes in the same network are fully characterised by the corresponding row of the adjacency matrix A. So one may use these rows as the feature vectors of the nodes, as in [43, 44], and interpret the k-th feature of node i as whether it is a neighbour of k. But there is a minor issue with this choice. Recall that the above derivation is based on the minimisation of the dissimilarity measure of all linked nodes (see Eq (9)). Consider two linked nodes i and j that have exactly the same neighbourhoods, so that we expect their dissimilarity to be 0. However, their i-th features will not be the same, since the i-th feature of i is 0 while the i-th feature of j is 1. The same holds for the j-th feature. This analysis shows that one should use the rows of A + I rather than those of A as the feature vectors of the nodes. The k-th feature of node i can then be interpreted as whether its distance to node k is no more than 1. This is further demonstrated in Fig 1.

Fig 1. Demonstration of using rows of A + I as the feature vectors for nodes.

In the network, nodes 4 and 5 are topologically equivalent. However, the 4th row of A reads [0, 1, 1, 0, 1], and the 5th reads [0, 1, 1, 1, 0], which are different. By adding I, the 4th and 5th rows of A + I now are both [0, 1, 1, 1, 1], which is exactly what we want. This is also the case for nodes 2 and 3. The k-th feature of a node can be interpreted as whether the distance between it and node k is no more than 1. For example, the distance between nodes 1 and 4 is greater than 1, while the distances between all the other nodes and node 4 are within 1, so the 4th feature is [0, 1, 1, 1, 1]^T.
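The rows quoted in the caption can be reproduced with a few lines of Python. The link list below is chosen to be consistent with those rows; the two links of node 1 (1-2 and 1-3) are an assumption, since they cannot be read off the caption alone.

```python
# Links consistent with the rows quoted in the caption. The links 1-2 and 1-3
# are an assumption made for this illustration.
links = [(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)]
n = 5

# Adjacency matrix A (nodes are numbered from 1, list indices from 0).
A = [[0] * n for _ in range(n)]
for i, j in links:
    A[i - 1][j - 1] = A[j - 1][i - 1] = 1

# Rows of A + I: entry k tells whether node k is within distance 1.
A_plus_I = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]

print(A[3], A[4])                # rows of nodes 4 and 5 in A differ
print(A_plus_I[3], A_plus_I[4])  # rows of nodes 4 and 5 in A + I coincide

# The 4th feature of all nodes is the 4th column of A + I.
fourth_feature = [row[3] for row in A_plus_I]
print(fourth_feature)
```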

Applying the above methodology, we have

$$\tilde{A} = V_t V_t^{T} (A + I). \tag{14}$$

Entries of $\tilde{A}$ reflect the intrinsic connection pattern, so they can be used to predict missing links. However, since we are focusing on undirected networks, there is still one problem with $\tilde{A}$: according to Eq (14), it might not be symmetric. So we will make predictions based on entries of the symmetrised matrix $\tilde{A} + \tilde{A}^{T}$ instead of $\tilde{A}$.
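Putting the steps together, the scoring procedure might be sketched as below. This is an illustration under stated assumptions, not the authors' code: it assumes NumPy, symmetrises the filtered matrix by adding its transpose (an assumed form), follows the common convention of giving isolated nodes zero rows in the normalised Laplacian, and uses hypothetical function names and a hypothetical toy network.

```python
import numpy as np

def nf_scores(A, t):
    """Project the rows of A + I onto the t lowest-frequency eigenvectors
    of the normalised Laplacian, then symmetrise the result."""
    n = A.shape[0]
    d = A.sum(axis=1)
    d_inv_sqrt = np.zeros(n)
    d_inv_sqrt[d > 0] = 1.0 / np.sqrt(d[d > 0])  # isolated nodes get weight 0
    L_norm = np.diag((d > 0).astype(float)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, V = np.linalg.eigh(L_norm)                # eigenvalues in ascending order
    V_t = V[:, :t]                               # t smallest eigenvalues
    A_filtered = V_t @ V_t.T @ (A + np.eye(n))   # filtered features, cf. Eq (14)
    return A_filtered + A_filtered.T             # symmetrisation (assumed form)

def rank_non_observed(A, scores):
    """Non-observed node pairs sorted by descending score."""
    n = A.shape[0]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n) if A[i, j] == 0]
    return sorted(pairs, key=lambda p: scores[p], reverse=True)

# Toy network (hypothetical data): two triangles joined by the link 2-3.
links = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in links:
    A[i, j] = A[j, i] = 1.0

S = nf_scores(A, t=2)
ranking = rank_non_observed(A, S)
print(ranking[:3])
```

With t = n nothing is filtered out and the scores reduce to 2(A + I), which gives a quick sanity check of an implementation.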

Experimental Results

To compare the performance of the Noise-Filtering (NF) method and some well-known algorithms, 12 real-world networks, including biological, social, and technological networks, are considered in the experiments. They are transformed into undirected, and simple (with multiple links or loops removed) networks. The resulting networks are summarized in Table 1.

Table 2 shows the prediction accuracy measured by AUC. Results measured by another widely used metric, precision, are presented in Table 3. These metrics are introduced in the Methods section. The highest AUC/precision for each network (in each column) is shown in boldface. Under the AUC metric, NF performs best in 7 out of 12 networks, while under the precision metric, NF performs best in 9 of them. Figs 2 and 3 compare the prediction accuracy of the different algorithms under varied partitioning ratios. It can be seen that the proposed method is either the best or very close to the best, except for one network, PB. Figs 2 and 3 also confirm the robustness of the proposed method: in most networks, its accuracy is either the best or very close to the best even when the size of the training set is varied.

Fig 2. Comparison of prediction accuracy under the AUC metric.

The fraction of training sets f is varied from 0.5 to 0.9.

Fig 3. Comparison of prediction accuracy under the precision metric.

The fraction of training sets f is varied from 0.5 to 0.9.

Intuitively, the more known information there is, the higher the prediction accuracy should be. But in Fig 3 we see that, most of the time, the precision does not increase with the size of the training set. This is due to the different sizes of the probe sets (following convention, we always set L in Eq (2) to the size of the probe set). Thus precisions obtained with different training-set sizes cannot be compared directly [45].
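A rough way to see why is to look at the precision expected from a purely random ranking, which already depends on the size of the probe set. The numbers below are a hypothetical illustration, not taken from the experiments: a network with n = 100 nodes and |E| = 1000 links, with U denoting the set of all possible links and E^T, E^P the training and probe sets.

```latex
% Hypothetical network: n = 100, |E| = 1000, |U| = n(n-1)/2 = 4950.
% A random ranking of the |U| - |E^T| non-observed links, with L = |E^P|,
% has expected precision |E^P| / (|U| - |E^T|).
\begin{align*}
f = 0.9:&\quad |E^T| = 900,\ |E^P| = 100,\quad
  \frac{|E^P|}{|U| - |E^T|} = \frac{100}{4050} \approx 0.025, \\
f = 0.5:&\quad |E^T| = 500,\ |E^P| = 500,\quad
  \frac{|E^P|}{|U| - |E^T|} = \frac{500}{4450} \approx 0.112.
\end{align*}
```

In this example the chance level at f = 0.5 is more than four times that at f = 0.9, so a lower precision at a larger training fraction does not by itself indicate a worse predictor.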

Table 2. Comparison of the prediction accuracy under the AUC metric in real-world networks.

Table 3. Comparison of the prediction accuracy under the precision metric in real-world networks.

For all the parameter-dependent methods considered in the experiment, i.e., LP, Katz, and NF, the results correspond to the optimal parameter, i.e., the one giving the highest prediction accuracy. The optimal parameter can be found through a process similar to K-fold cross-validation. For example, in the proposed method NF, the training set is first partitioned into K units; a single unit is retained as the validation data for testing the method with a specific t, and the remaining K − 1 units are used as known information. The cross-validation is then repeated K times (the folds), with each of the K units used exactly once as the validation data. The K results from the folds are then averaged. This whole process is repeated for different values of t to find the optimal one (the optimal t is manually bounded in the range [1, 125], so the computational complexity is relatively small). In Fig 4, we see that for the two metrics considered here, the optimal t is robust, since the value of t at which the prediction accuracy peaks does not change with the size of the training set. So there is no need to search for an optimal t in every single run of the simulation. Once the optimal t is found, it is set to this same value in all subsequent simulations, even when the size of the training set is varied.
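The parameter search described above can be sketched generically. In this illustration the accuracy computation is delegated to a user-supplied callback `evaluate(known_links, validation_links, t)`; that callback, the function name `select_threshold` and the defaults for `K` and `seed` are all hypothetical, and the dummy callback in the usage lines merely stands in for, e.g., the AUC of NF with cut-off t.

```python
import random

def select_threshold(train_links, candidates, evaluate, K=10, seed=0):
    """Return the candidate t with the best accuracy averaged over K folds."""
    links = sorted(train_links)
    random.Random(seed).shuffle(links)
    folds = [links[i::K] for i in range(K)]  # K units of roughly equal size
    mean_accuracy = {}
    for t in candidates:
        total = 0.0
        for i in range(K):
            validation = folds[i]            # each unit is used once for validation
            known = [e for j, fold in enumerate(folds) if j != i for e in fold]
            total += evaluate(known, validation, t)
        mean_accuracy[t] = total / K
    best_t = max(mean_accuracy, key=mean_accuracy.get)
    return best_t, mean_accuracy

# Usage with a dummy accuracy function that peaks at t = 7 (hypothetical).
toy_links = [(i, i + 1) for i in range(50)]
best_t, accuracy = select_threshold(
    toy_links, range(1, 126), lambda known, validation, t: -abs(t - 7), K=5
)
print(best_t)  # 7
```

Passing the metric as a callback keeps the search independent of whether AUC or precision is being optimised.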

Fig 4. Prediction accuracy with different cutoff threshold t in the proposed noise-filtering method.

The symbol f denotes the fraction of links in the training sets.

The experiments are conducted on a workstation with 64 GB RAM and an Intel (R) Xeon (R) E5-2687W @ 3.10 GHz 8-core processor. The comparison of computational time is summarized in Table 4. We see that the proposed method NF has a run time similar to that of the global index Katz, especially on large networks, while achieving better performance.

Table 4. Comparison of the computational efficiency in real-world networks.


Conclusion

Real-world information always contains noise. This is also the case when observing a network structure. This problem is rarely considered in existing link prediction methods. To address this issue, we treat the connections of a given network as known information and filter out the noise in it, based on the assumption that connected nodes should have similar neighbourhoods. The underlying regularity of the connection information is then retrieved and used to predict missing or future links. Experimental results show that the proposed method performs better than typical algorithms on most of the networks tested. Future work includes improving the performance of existing methods based on the same idea of noise filtering.


Acknowledgments

We acknowledge B. Y. Zhu for his helpful suggestions. This work is supported by the Fundamental Research Funds for the Central Universities of China, Grant No. 531107040852.

Author Contributions

Conceived and designed the experiments: BO LJ. Performed the experiments: BO LJ. Analyzed the data: BO ZT. Contributed reagents/materials/analysis tools: BO ZT. Wrote the paper: BO LJ ZT. Drew figures: LJ ZT.


References

  1. Barabási AL, Albert R. Emergence of scaling in random networks. Science. 1999;286(5439):509–512. pmid:10521342
  2. Sachtjen ML, Carreras BA, Lynch VE. Disturbances in a power transmission system. Physical Review E. 2000;61(5):4877.
  3. Kinney R, Crucitti P, Albert R, Latora V. Modeling cascading failures in the North American power grid. The European Physical Journal B. 2005;46(1):101–107.
  4. Huang X, Vodenska I, Havlin S, Stanley HE. Cascading failures in bi-partite graphs: model for systemic risk propagation. Scientific Reports. 2013;3:1219. pmid:23386974
  5. Borrvall C, Ebenman B, Jonsson T. Biodiversity lessens the risk of cascading extinction in model food webs. Ecology Letters. 2000;3(2):131–136.
  6. Gao Y, Du W-B, Yan G. Selectively-informed particle swarm optimization. Scientific Reports. 2015;5:9295. pmid:25787315
  7. Du W-B, Gao Y, Liu C, Zheng Z, Wang Z. Adequate is better: particle swarm optimization with limited-information. Applied Mathematics and Computation. 2015;268:832–838.
  8. Papadopoulos F, Kitsak M, Serrano MÁ, Boguñá M, Krioukov D. Popularity versus similarity in growing networks. Nature. 2012;489(7417):537–540. pmid:22972194
  9. Zhang QM, Lü L, Wang WQ, Zhu YX, Zhou T, et al. Potential theory for directed networks. PLoS ONE. 2013;8(2):e55437. pmid:23408979
  10. Zhang QM, Xu XK, Zhu YX, Zhou T. Measuring multiple evolution mechanisms of complex networks. arXiv preprint arXiv:1410.3519. 2014.
  11. Guimerà R, Sales-Pardo M. Missing and spurious interactions and the reconstruction of complex networks. Proceedings of the National Academy of Sciences. 2009;106(52):22073–22078.
  12. Lü L, Medo M, Yeung CH, Zhang YC, Zhang ZK, Zhou T. Recommender systems. Physics Reports. 2012;519(1):1–49.
  13. Cannistraci CV, Alanis-Lobato G, Ravasi T. From link-prediction in brain connectomes and protein interactomes to the local-community-paradigm in complex networks. Scientific Reports. 2013;3. pmid:23563395
  14. Westermarck J, Ivaska J, Corthals GL. Identification of protein interactions involved in cellular signaling. Molecular & Cellular Proteomics. 2013;12(7):1752–1763.
  15. Lü L, Zhou T. Link prediction in complex networks: A survey. Physica A: Statistical Mechanics and its Applications. 2011;390(6):1150–1170.
  16. Wang P, Xu B, Wu Y, Zhou X. Link prediction in social networks: the state-of-the-art. Science China Information Sciences. 2015;58(1):1–38.
  17. Wang WQ, Zhang QM, Zhou T. Evaluating network models: A likelihood analysis. EPL (Europhysics Letters). 2012;98(2):28004.
  18. Lü L, Pan L, Zhou T, Zhang YC, Stanley HE. Toward link predictability of complex networks. Proceedings of the National Academy of Sciences. 2015;112(8):2325–2330.
  19. Zhou T, Lü L, Zhang YC. Predicting missing links via local information. Eur Phys J B. 2009;71(4):623–630.
  20. Adamic LA, Adar E. Friends and neighbors on the web. Social Networks. 2003;25(3):211–230.
  21. Clauset A, Moore C, Newman ME. Hierarchical structure and the prediction of missing links in networks. Nature. 2008;453(7191):98–101. pmid:18451861
  22. Restrepo JG, Ott E, Hunt BR. Characterizing the dynamical importance of network nodes and links. Physical Review Letters. 2006;97(9):094102. pmid:17026366
  23. Tan F, Xia Y, Zhu B. Link Prediction in Complex Networks: A Mutual Information Perspective. PLoS ONE. 2014;9:e107056. pmid:25207920
  24. Zhu B, Xia Y. An information-theoretic model for link prediction in complex networks. Scientific Reports. 2015;5:13707. pmid:26335758
  25. Marbach D, Costello JC, Küffner R, Vega NM, Prill RJ, Camacho DM, et al. Wisdom of crowds for robust gene network inference. Nature Methods. 2012;9(8):796–804. pmid:22796662
  26. Zachary WW. An information flow model for conflict and fission in small groups. Journal of Anthropological Research. 1977;p. 452–473.
  27. Ulanowicz RE, DeAngelis DL. Network analysis of trophic dynamics in south florida ecosystems. US Geological Survey Program on the South Florida Ecosystem. 2005;114.
  28. Gleiser PM, Danon L. Community structure in jazz. Advances in Complex Systems. 2003;6(04):565–573.
  29. Watts DJ, Strogatz SH. Collective dynamics of ‘small-world’ networks. Nature. 1998;393(6684):440–442. pmid:9623998
  30. Vladimir B, Andrej M. Pajek datasets; 2006.
  31. Duch J, Arenas A. Community detection in complex networks using extremal optimization. Physical Review E. 2005;72(2):027104.
  32. Adamic LA, Glance N. The political blogosphere and the 2004 US election: divided they blog. In: Proceedings of the 3rd international workshop on Link discovery. ACM; 2005. p. 36–43.
  33. Von Mering C, Krause R, Snel B, Cornell M, Oliver SG, Fields S, et al. Comparative assessment of large-scale data sets of protein–protein interactions. Nature. 2002;417(6887):399–403. pmid:12000970
  34. Vladimir B. Pajek datasets; Accessed 2015 Aug 19.
  35. Spring N, Mahajan R, Wetherall D. Measuring ISP topologies with Rocketfuel. In: ACM SIGCOMM Computer Communication Review. vol. 32. ACM; 2002. p. 133–145.
  36. Leskovec J, Huttenlocher D, Kleinberg J. Predicting positive and negative links in online social networks. In: Proceedings of the 19th international conference on World wide web. ACM; 2010. p. 641–650.
  37. Leskovec J, Huttenlocher D, Kleinberg J. Signed networks in social media. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM; 2010. p. 1361–1370.
  38. Newman ME. Clustering and preferential attachment in growing networks. Phys Rev E. 2001;64(2):025102.
  39. Lü L, Jin CH, Zhou T. Similarity index based on local paths for link prediction of complex networks. Physical Review E. 2009;80(4):046122.
  40. Katz L. A new status index derived from sociometric analysis. Psychometrika. 1953;18(1):39–43.
  41. Chung FR. Spectral graph theory. vol. 92. American Mathematical Soc.; 1997.
  42. Horn RA, Johnson CR. Matrix analysis. Cambridge University Press; 2012.
  43. Burt RS. Positions in networks. Social Forces. 1976;55(1):93–122.
  44. Scott J. Social network analysis. Sage; 2012.
  45. Zhao J, Miao L, Yang J, Fang H, Zhang Q-M, Nie M, et al. Prediction of Links and Weights in Networks by Reliable Routes. Scientific Reports. 2015;5:12261.