Gravity Effects on Information Filtering and Network Evolving

  • Jin-Hu Liu,

    Affiliations Web Sciences Center, University of Electronic Science and Technology of China, Chengdu, People's Republic of China, Institute of Information Economy, Hangzhou Normal University, Hangzhou, People's Republic of China, Alibaba Research Center for Complexity Sciences, Hangzhou Normal University, Hangzhou, People's Republic of China

  • Zi-Ke Zhang,

    zhangzike@gmail.com (ZKZ); xueqiniu.wang@gmail.com (XQW)

    Affiliations Institute of Information Economy, Hangzhou Normal University, Hangzhou, People's Republic of China, Alibaba Research Center for Complexity Sciences, Hangzhou Normal University, Hangzhou, People's Republic of China

  • Lingjiao Chen,

    Affiliations Web Sciences Center, University of Electronic Science and Technology of China, Chengdu, People's Republic of China, Institute of Information Economy, Hangzhou Normal University, Hangzhou, People's Republic of China, Alibaba Research Center for Complexity Sciences, Hangzhou Normal University, Hangzhou, People's Republic of China

  • Chuang Liu,

    Affiliations Institute of Information Economy, Hangzhou Normal University, Hangzhou, People's Republic of China, Alibaba Research Center for Complexity Sciences, Hangzhou Normal University, Hangzhou, People's Republic of China

  • Chengcheng Yang,

    Affiliation Web Sciences Center, University of Electronic Science and Technology of China, Chengdu, People's Republic of China

  • Xueqi Wang

    zhangzike@gmail.com (ZKZ); xueqiniu.wang@gmail.com (XQW)

    Affiliation Division of Translational Medicine, Shanghai Changzheng Hospital, Second Military Medical University, Shanghai, People's Republic of China


Abstract

In this paper, based on the gravity principle of classical physics, we propose a tunable gravity-based model, which uses tag usage patterns to weigh both the mass and the distance of network nodes. We then apply this model to the problems of information filtering and network evolution. Experimental results on two real-world data sets, Del.icio.us and MovieLens, show that it not only enhances algorithmic performance but also better characterizes the properties of real networks. This work may shed some light on the effect of the gravity model.

Introduction

As one of the four fundamental interactions of nature, the law of gravity is traditionally traced to Galileo's well-known falling-ball experiment at the Leaning Tower of Pisa [1]. Over the following four centuries, it has proved a great success in explaining the basic mechanisms governing the revolution of the heavenly bodies in macroscopic space [2]. In addition to explaining natural laws, the gravity model has also been used in a wide range of domains to uncover a universal mechanism driving the dynamics of various phenomena, such as population migration [3], [4], transportation flows [5]–[8], trade [9]–[12] and trip [13]–[15] distributions. However, gravity-based interaction in online social systems has received little attention. Recently, many pioneering works in this field have focused on complex-network-based evolution and prediction, where nodes represent individuals and edges denote the relations between them [16]–[20]. In network evolution studies, the main objective is to uncover the strategy by which two nodes become connected [16]. A vast class of studies tries to understand both the static features and the dynamic properties of evolving networks [21], [22]. Some classical models, such as the ER network [23], the WS network [24] and the BA network [25], opened new horizons for the theoretical study of random graphs. Since then, many variants considering different factors (e.g. the aging effect [26], [27] and social impact [18], [28], [29]) have been presented to extend this popular field. Analogously, the information filtering technique aims at mining missing links by estimating the indirect similarities of two observed nodes [30]. Recommender Systems (RS) [31] are one of the most promising information filtering techniques for solving the problem of information overload. An RS aims at finding objects (e.g. books, movies etc.) that are most likely to be collected by online users, based on their historical behaviors and the attributes of nodes. Unlike the classical information retrieval strategy, which can be viewed as recommending documents with given words [32], RS can be classified into two categories: (i) estimation of similarity based on the historical records of user activities, such as user-based and object-based similarity [33]–[40]; (ii) incorporation of auxiliary information, such as object attributes and descriptions, to assist the corresponding prediction algorithms [41], [42].

Therefore, the essential problem of both network evolution and recommender systems is to evaluate the similarity of each unconnected node pair, which is the core function that the gravity model can provide. However, to the best of our knowledge, examples of adopting the gravity model in online systems are relatively rare. In this paper, we apply the gravity model in a particular scenario, Social Tagging Networks [41]–[46], take the tag usage pattern to weigh the nodes' mass and distance, and then verify this definition in a tunable classical gravity model. Experimental results on two representative datasets, Del.icio.us and MovieLens, show that the proposed gravity model can significantly enhance the recommendation performance. Further numerical observation on an evolutionary network model demonstrates that the gravity-based mechanism can better characterize the properties of real networks than the other two baseline models.

Gravity Based Recommender Systems

We begin our study by introducing the gravity-law-based recommendation algorithm, as well as two baseline algorithms against which to evaluate its performance on tag-based information filtering. Conventionally, a tag-aware recommender system can be represented in a triple form [42]: , where , , and are respectively the sets of users, objects and tags. A complete tagging action, , normally consists of an arbitrary number of tags, e.g. , which indicates that user has assigned object with a tag set . Therefore, can be regarded as attributes for both and . Consequently, we use two matrices, and , to describe the user-object and object-tag relations, respectively. For , if user has selected object , , otherwise . Analogously, , if has been assigned with tag , and , otherwise. In addition, we also use two weighted matrices, and , to represent the tagging preferences of users and objects, respectively. We denote as the number of times tag has been assigned by user , and as the number of times tag has been assigned to object .

In this section, we introduce two baseline tag-aware algorithms based on the concept of mass diffusion [33], as well as the proposed gravity-law-based algorithm. Given a target user , the final resource of an object , , is calculated by one of the following methods. Finally, objects that has not selected will be recommended according to their respective final resources.

  1. Suppose the initial resource is located evenly on the objects has selected (the resource of every selected object is initially set to 1). Each object equally distributes its resource to all neighbouring tags, and then each tag redistributes the received resource equally to all its neighbouring objects. Therefore, after diffusion, the resource located on finally is [41] (1) where is the number of neighbouring objects of tag , and is the number of neighbouring tags of object .
  2. Taking the weighted matrix into account, the initial resources are located on tags and the resource received is proportional to . Then each tag equally distributes the initial resource to all its neighbouring objects. Therefore, after diffusion, the resource located on finally is [42] (2) where is the number of neighbouring objects of tag .
  3. Different from (I) and (II), this algorithm does not only consider the network structure, but also takes into account the common features of both users and objects. In this paper, we adopt the gravity model to estimate the likelihood of each user-object pair. Based on the classical gravity model, the resource located on finally is (3) where is the mass of user , is the mass of object , , where if and otherwise, indicating the number of common properties that user and object both hold, and is a tunable parameter. only counts how many common tag attributes user and object simultaneously have, neglecting how many times each tag has been used by either or .
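
As an illustration of the verbal description of algorithm (I), the following minimal sketch spreads one unit of resource from each selected object to its tags and back to objects, and also counts the tags shared by a user and an object as described for algorithm (III). The function names and the dictionary-based data layout are assumptions made for this sketch and are not taken from the paper; the gravity score of Eq. (3) itself is not reproduced here.

```python
from collections import defaultdict


def diffuse_objects_tags_objects(selected, object_tags):
    """Two-step mass diffusion: objects -> tags -> objects.

    selected: set of object ids the target user has collected.
    object_tags: dict mapping each object id to the set of its tags.
    Returns a dict with the final resource of every reached object.
    """
    # Invert object -> tags so that each tag knows its neighbouring objects.
    tag_objects = defaultdict(set)
    for obj, tags in object_tags.items():
        for tag in tags:
            tag_objects[tag].add(obj)

    # Step 1: every selected object holds one unit and splits it
    # equally among its neighbouring tags.
    tag_resource = defaultdict(float)
    for obj in selected:
        tags = object_tags.get(obj, set())
        for tag in tags:
            tag_resource[tag] += 1.0 / len(tags)

    # Step 2: every tag splits what it received equally among
    # its neighbouring objects.
    final = defaultdict(float)
    for tag, amount in tag_resource.items():
        share = amount / len(tag_objects[tag])
        for obj in tag_objects[tag]:
            final[obj] += share
    return dict(final)


def common_tag_count(user_tags, obj_tags):
    """Number of distinct tags used by the user and also attached to the object."""
    return len(set(user_tags) & set(obj_tags))


def recommend(selected, object_tags, top_n):
    """Rank uncollected objects by final resource (ties broken by object id)."""
    scores = diffuse_objects_tags_objects(selected, object_tags)
    candidates = [(s, o) for o, s in scores.items() if o not in selected]
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [o for _, o in candidates[:top_n]]
```

In this sketch the total resource is conserved across both steps, so the final values of all objects sum to the number of selected objects that carry at least one tag.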

A typical personalized recommendation process aims at optimizing the utility of each individual. That is to say, once the target user is fixed, the sole purpose of a recommendation algorithm is to estimate the score of every object, which is defined by one of the three objective functions Eq. (1)–Eq. (3). Therefore, the vector element of does not contain , because the initial influence of is the same for every object. Once the object score function is defined, the recommendation process is performed for each target individual , and all the objects that s/he has not collected are ranked in descending order according to (generated by any one of the three algorithms (I)–(III)). Eventually, the top objects are recommended to this user.

Generally, algorithm (I) only considers the effect of tags on objects, neglecting the users' interest in tags; algorithm (II) only considers the user-tag weights, neglecting the weights of object-tag relations; algorithm (III) takes both of them into account. The advantages of algorithm (III) are twofold. On one hand, algorithm (III) not only considers the popularity (reflected by mass) of users and objects, but also directly takes into account the common features between them. Hence it might be a promising way to mine the potential preferences of users. On the other hand, algorithm (III) compares the tag attribute vectors statistically, while both algorithms (I) and (II) rely on a diffusion process on tripartite networks. Therefore, it can save computational cost by avoiding multi-step iterations.

Experimental Results

The empirical data we use in this paper include (datasets are free to download as Supporting Information): (a) MovieLens: a representative website, provided by the GroupLens project (http://www.grouplens.org/), where users can rate movies on a discrete scale from 1 to 5; (b) Del.icio.us (http://www.delicious.com/): obtained by downloading publicly available data from the social bookmarking website, which allows users to store, organize and retrieve personal bookmarks via social tags. To eliminate the data sparsity effect, in both datasets we purify the data to guarantee that [47] (a) each user has collected at least one object; (b) each object has been collected by at least two users and assigned at least two tags; (c) each tag is used by at least two users. Table 1 shows the basic statistics of the observed data sets.
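
The purification constraints can be sketched as a filter over tagging records. The sketch below assumes the data are available as (user, object, tag) triples and applies the constraints repeatedly until nothing more is removed; both the triple layout and the iterative strategy are assumptions of this example, since the text does not state how the filtering was carried out.

```python
def purify(triples, min_users_per_object=2, min_tags_per_object=2,
           min_users_per_tag=2):
    """Keep only (user, object, tag) triples that satisfy the constraints.

    Removing one triple can push another object or tag below its threshold,
    so the filter is repeated until the set of triples stops changing.
    A user who still owns a triple has, by construction, at least one object.
    """
    triples = set(triples)
    while True:
        object_users, object_tags, tag_users = {}, {}, {}
        for user, obj, tag in triples:
            object_users.setdefault(obj, set()).add(user)
            object_tags.setdefault(obj, set()).add(tag)
            tag_users.setdefault(tag, set()).add(user)

        kept = {
            (user, obj, tag)
            for user, obj, tag in triples
            if len(object_users[obj]) >= min_users_per_object
            and len(object_tags[obj]) >= min_tags_per_object
            and len(tag_users[tag]) >= min_users_per_tag
        }
        if kept == triples:
            return kept
        triples = kept
```

The thresholds are exposed as keyword arguments only so that the sketch can be tried with other cut-offs; the defaults mirror the values stated in the text.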

To test the algorithm performance, we randomly remove of the data as the testing set and apply the algorithms to the remaining data to produce recommendations. In addition, to give a solid and comprehensive evaluation of the proposed algorithm, we employ three representative metrics to characterize the recommendation performance: (a) AUC [48], defined as the probability that the score of an examined link in the testing set is larger than those in the training set; (b) Precision [49], defined as the ratio of the number of successfully recommended links among the top recommendations to the recommendation length ; (c) Inter Similarity [50], defined as, , where runs over all users, and is the similarity of objects and appearing in the recommendation list for .
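
The first two metrics can be illustrated for a single user as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, Precision is taken as hits in the top of the list divided by the list length, and AUC follows the convention common in this literature of comparing each held-out object against each object the user never collected, with ties counted as one half.

```python
def precision_at_length(ranked, test_objects, length):
    """Fraction of the top `length` recommended objects that are in the test set."""
    top = ranked[:length]
    hits = sum(1 for obj in top if obj in test_objects)
    return hits / length


def auc(scores, test_objects, uncollected):
    """Probability that a held-out object outscores a never-collected object.

    scores: dict mapping object id to its score for the target user
    (objects missing from the dict are treated as having score 0).
    Ties contribute 0.5, so a random scorer yields about 0.5.
    """
    wins = 0.0
    pairs = 0
    for held_out in test_objects:
        for other in uncollected:
            pairs += 1
            a = scores.get(held_out, 0.0)
            b = scores.get(other, 0.0)
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / pairs
```

Averaging these per-user values over all users with at least one held-out object gives dataset-level figures.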

Fig. 1–Fig. 4 show the experimental results for these three metrics. In Fig. 1, it can be seen that AUC decreases as increases for both MovieLens and Del.icio.us. In addition, there are two stationary states of AUC for large or small , which respectively correspond to the best and worst AUC. In fact, Eq. (3) can be transformed to, (4) Note that, for a real dataset, , and are finite. That is to say, for the extreme cases of Eq. (4): (a) (but a finite value), is purely determined by ; (b) , . will be the same for all objects, hence resulting in a random recommendation process. In the experiments, for simplicity of calculation, we set as a constant; (c) , is jointly determined by and . In addition, for a given recommendation process, once the target user is fixed, the value of will be the same for all examined objects. Consequently, the competition between and finally determines whether object will be highly ranked and eventually recommended. For large , algorithm (III) degenerates to the object-popularity-first algorithm (the so-called GRM in [51]). For , algorithm (III) degenerates to random recommendation. For small , the final result is determined by the resultant force of and , which we subsequently investigate in Fig. 2. It shows a clearly positive relationship between them. This means that the heavier (corresponding to large ) an object is, the more likely it is to be attracted by users with more common interests (corresponding to large ), and hence the more likely it is to be recommended by the proposed algorithm. In addition, Table 2 shows the pure AUC values of mass (), common interest () and algorithm (III). Indeed, it shows that can significantly enhance the recommendation accuracy compared with that based on pure object popularity. Furthermore, by incorporating the object popularity, the gravity-law-based method can achieve even better performance.

Figure 1. AUC vs. for algorithm (III) on the two observed datasets.

The result is obtained by averaging over 50 independent realizations of random data division, and yellow lines represent the error intervals. It can be clearly seen that, for both datasets, AUC decreases monotonically with , and reaches saturation for both large and small .

https://doi.org/10.1371/journal.pone.0091070.g001

Figure 2. as the function of for the two observed datasets, showing that the common feature, , is positively correlated with the object mass.

https://doi.org/10.1371/journal.pone.0091070.g002

Figure 3. Precision vs. recommendation length of the three algorithms for Del.icio.us and MovieLens.

The result is obtained by averaging over 50 independent realizations of random data division, and yellow lines represent the error intervals. The parameter for algorithm (III) is set to 0.001. Results on both datasets show that the gravity-model-based algorithm (black) outperforms the other two baselines.

https://doi.org/10.1371/journal.pone.0091070.g003

Figure 4. InnerS vs. recommendation length of the three algorithms for Del.icio.us and MovieLens.

The result is obtained by averaging over 50 independent realizations of random data division, and yellow lines represent the error intervals. The parameter for algorithm (III) is set to 0.001. Results on both datasets show that the gravity-model-based algorithm (black) outperforms the other two baselines.

https://doi.org/10.1371/journal.pone.0091070.g004

Table 2. Comparison of AUC results when respectively considering the effects of mass (), common interest (), as well as the three algorithms (algorithms I, II and III).

https://doi.org/10.1371/journal.pone.0091070.t002

Fig. 3 and Fig. 4 show Precision and InnerS, respectively, as functions of the length of the recommendation list. In both figures, algorithm (III) performs better than the other two baselines, especially for small . Because the number of recommended objects pushed to users in real applications is usually very small (normally ) owing to page limitations, the proposed algorithm might be promising and useful in online applications.

Gravity Based Evolving Model

In this section, we propose an evolving model to better understand the gravity effect on networks. Among the various mechanisms driving the corresponding emergent properties, preferential attachment (PA) [25], which embodies the rich-get-richer principle, is one of the most attractive models. However, the PA model only takes into account the mass of target nodes, while neglecting the underlying relationship between the two nodes under consideration. Consequently, we present the gravity model to unify both node mass and common features, and compare it with two baseline models, the ER model [23] and the PA model.

Model Description

Using the divided training set extracted in the previous recommendation experiment, we build a static network (ST) as the baseline for comparison. In the ST network, nodes represent users, and a link is created if the corresponding two users have collected at least one common object. In addition, each user has a weighted tag attribute vector. The final initialized network contains 648 vertices, 20,956 edges, and 1,382 tags. By comparison, the three evolving mechanisms proceed at each step as follows:

  1. ER model. Select two nodes randomly and connect them if there is no link between them;
  2. PA model. Select two nodes and connect them if there is no link between them. Each node is chosen according to its own degree . Initially, the degree of each node is set to one;
  3. GR model. Select one node randomly, and link it to another node that is chosen based on the probability defined by Eq. (3).
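
The three mechanisms can be sketched as single link-adding steps. In this minimal sketch, `attraction` is a hypothetical user-supplied function standing in for the score of Eq. (3), which is not reproduced here; the function names, the edge representation and the handling of a draw that picks the same node twice (the step simply adds no link) are likewise assumptions of this example.

```python
import random


def _try_add(edges, degree, u, v):
    """Add the undirected link u-v if it is new; return True on success."""
    edge = frozenset((u, v))
    if u == v or edge in edges:
        return False
    edges.add(edge)
    degree[u] += 1
    degree[v] += 1
    return True


def er_step(nodes, edges, degree, rng):
    """ER: pick two distinct nodes uniformly at random."""
    u, v = rng.sample(nodes, 2)
    return _try_add(edges, degree, u, v)


def pa_step(nodes, edges, degree, rng):
    """PA: pick both endpoints with probability proportional to degree."""
    weights = [degree[n] for n in nodes]
    u, v = rng.choices(nodes, weights=weights, k=2)
    return _try_add(edges, degree, u, v)


def gr_step(nodes, edges, degree, attraction, rng):
    """GR: pick one node uniformly, then a partner in proportion to attraction."""
    u = rng.choice(nodes)
    others = [n for n in nodes if n != u]
    weights = [attraction(u, n) for n in others]
    if sum(weights) <= 0:
        return False  # no candidate attracts u, so no link is added
    v = rng.choices(others, weights=weights, k=1)[0]
    return _try_add(edges, degree, u, v)
```

A run would start from `degree = {n: 1 for n in nodes}`, matching the statement that every node's degree is initially set to one, and call one of the step functions repeatedly until the desired number of links has been added.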

Results & Analysis

To give a solid and comprehensive evaluation of the GR model, we employ five different metrics to characterize the properties of the resulting networks. The observed properties include (a) size of giant component [52]; (b) assortativity [53]; (c) clustering coefficient [24]; (d) average distance [54]; (e) degree heterogeneity [16].
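
Three of these properties can be computed as in the sketch below. The definitions used are common textbook ones and are assumptions of this example rather than the exact formulas of the cited references: the giant component is measured as the fraction of nodes in the largest connected component, the clustering coefficient as the average of local clustering over all nodes (nodes of degree below two contribute zero), and the degree heterogeneity as the mean squared degree divided by the squared mean degree.

```python
def adjacency(nodes, edges):
    """Build an undirected adjacency map from an iterable of node pairs."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def giant_component_fraction(adj):
    """Fraction of nodes in the largest connected component (depth-first search)."""
    seen = set()
    best = 0
    for start in adj:
        if start in seen:
            continue
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best / len(adj)


def average_clustering(adj):
    """Mean local clustering coefficient; node ids must be mutually comparable."""
    total = 0.0
    for nbs in adj.values():
        k = len(nbs)
        if k < 2:
            continue
        links = sum(1 for a in nbs for b in nbs if a < b and b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def degree_heterogeneity(adj):
    """Mean squared degree over squared mean degree (needs at least one link)."""
    degrees = [len(nbs) for nbs in adj.values()]
    mean = sum(degrees) / len(degrees)
    mean_sq = sum(d * d for d in degrees) / len(degrees)
    return mean_sq / (mean * mean)
```

Comparing such values for a generated network with those of the ST network is one way to quantify how closely a model reproduces the reference structure.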

Table 3 shows the evolutionary results of the corresponding models. In general, as the ST network is extracted from the real user-object bipartite network, it naturally keeps the original relationships of users' common interests. Therefore, it can be used as the baseline against which to compare the proposed models. That is to say, a model better characterizes the real-world evolutionary dynamics if its resulting properties are closer to those of ST. Table 3 shows that GR performs much better than both ER and PA for all observed properties. Since GR also considers users' interests by taking into account their preferences in assigning common tags, it has a high probability of generating a more connected (corresponding to large ) and more closely knit (corresponding to large ) network. By contrast, diverse topics (e.g. tags about different subjects) make closer connections within the same community of similar topics but simultaneously increase the distance between nodes affiliated with communities of different subjects, hence resulting in a larger network distance. In addition, the high degree heterogeneity () indicates that the network tends to be more disassortative (corresponding to negative ) [55]. Furthermore, the dynamic link-adding process (see Fig. 5) also shows the advantages of GR in the five corresponding properties. In short, the GR evolving mechanism can indeed result in a more realistic network with a large clustering coefficient and network distance, high degree heterogeneity and a strongly disassortative linking pattern.

Figure 5. , , E, r, D and H as the function of ratio of added links.

The result is obtained by averaging over 50 independent network realizations. The dashed line highlights the corresponding result for the ST network. Results from five representative metrics show that the GR model (blue triangle) comes closest to the original ST network.

https://doi.org/10.1371/journal.pone.0091070.g005

Conclusions and Discussion

In this paper, we applied the classical gravity model to design a new recommendation algorithm, considering the effects of both the masses and the common interests of two observed nodes. Experimental results on two real-world networks, MovieLens and Del.icio.us, demonstrated that the proposed algorithm outperformed the two baseline algorithms. Furthermore, we adopted the gravity principle to build an evolving network in order to understand its advantage in information recommendation. Numerical analyses of five corresponding network properties indicated that the gravity mechanism can characterize the structure of real networks better than two baseline stochastic methods, the ER and PA models. Therefore, the gravity-based algorithm can naturally provide more suitable results that can be recommended to appropriate users.

In brief, this work applies the gravity principle to information filtering and network evolution in online systems. The results provide preliminary evidence of the gravity effect in directly mining the hidden interests of users and picking up relevant information. However, the underlying mechanism of the gravity effect on network-based algorithms and models still needs further exploration.

Supporting Information

Data S1.

The data sets are available as attachment.

https://doi.org/10.1371/journal.pone.0091070.s001

(ZIP)

Acknowledgments

We thank Kun Chin Hu, Tao Zhou and Chengzhi Zhang for their valuable discussions.

Author Contributions

Conceived and designed the experiments: JHL ZKZ. Performed the experiments: JHL ZKZ LJC CCY. Analyzed the data: JHL ZKZ LJC CCY CL. Contributed reagents/materials/analysis tools: JHL CL ZKZ XQW. Wrote the paper: ZKZ CL XQW.

References

  1. Galileo G (1914) Dialogues Concerning Two New Sciences. Macmillan.
  2. Newton I (1999) The Mathematical Principles of Natural Philosophy. University of California Press.
  3. Karemera D, Oguledo VI, Davis B (2000) A gravity model analysis of international migration to North America. Appl Econ 32: 1745–1755.
  4. Rodrigue JP, Comtois C, Slack B (2013) The Geography of Transport Systems. Routledge.
  5. Casey HJ (1955) The law of retail gravitation applied to traffic engineering. Traffic Quarterly 9: 313–321.
  6. Rietveld P (1989) Infrastructure and regional development. Ann Reg Sci 23: 255–274.
  7. de Dios Ortúzar J, Willumsen LG (2001) Modelling Transport. Wiley.
  8. Jung WS, Wang FZ, Havlin S, Kaizoji T, Moon HT, et al. (2008) Volatility return intervals analysis of the Japanese market. Eur Phys J B 62: 113–119.
  9. Tinbergen J (1962) Shaping the World Economy. Twentieth Century Fund.
  10. Bergstrand JH (1985) The gravity equation in international trade: Some microeconomic foundations and empirical evidence. Rev Econ Stat 67: 474–481.
  11. Rose AK (2004) Do we really know that the WTO increases trade? Am Econ Rev 94: 98–114.
  12. Westerlund J, Wilhelmsson F (2011) Estimating the gravity model without gravity using panel data. Appl Econ 43: 641–649.
  13. Reilly WJ (1929) Methods for the study of retail relationships. University of Texas Press.
  14. Reilly WJ (1931) The laws of retail gravitation. Knickerbocker.
  15. Stewart JQ (1950) The development of social physics. Am J Phys 18: 239–243.
  16. Albert R, Barabási AL (2002) Statistical mechanics of complex networks. Rev Mod Phys 74: 47–97.
  17. Dorogovtsev SN, Mendes JFF (2002) Evolution of networks. Adv Phys 51: 1079–1187.
  18. Newman MEJ, Park J (2003) Why social networks are different from other types of networks. Phys Rev E 68: 036122.
  19. Boccaletti S, Latora V, Moreno Y, Chavez M, Hwang DU (2006) Complex networks: Structure and dynamics. Phys Rep 424: 175–308.
  20. Costa LDF, Rodrigues FA, Travieso G, Boas PRU (2007) Characterization of complex networks: A survey of measurements. Adv Phys 56: 167–242.
  21. Newman MEJ, Barabási AL, Watts DJ (2006) The structure and dynamics of networks. Princeton University Press.
  22. Cui AX, Zhang ZK, Tang M, Hui PM, Fu Y (2012) Emergence of scale-free close-knit friendship structure in online social networks. PLoS ONE 7: e50702.
  23. Erdős P, Rényi A (1959) On random graphs. Publ Math Debrecen 6: 290–297.
  24. Watts DJ, Strogatz S (1998) Collective dynamics of 'small-world' networks. Nature 393: 440–442.
  25. Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286: 509–512.
  26. Dorogovtsev SN, Mendes JFF (2000) Evolution of networks with aging of sites. Phys Rev E 62: 1842–1845.
  27. Dorogovtsev SN, Mendes JFF (2000) Scaling behaviour of developing and decaying networks. EPL 52: 33–39.
  28. Jin EM, Girvan M, Newman MEJ (2001) Structure of growing social networks. Phys Rev E 64: 046132.
  29. Castellano C, Fortunato S, Loreto V (2009) Statistical physics of social dynamics. Rev Mod Phys 81: 591–646.
  30. Clauset A, Moore C, Newman MEJ (2008) Hierarchical structure and the prediction of missing links in networks. Nature 453: 98–101.
  31. Lü L, Medo M, Yeung CH, Zhang YC, Zhang ZK, et al. (2012) Recommender systems. Phys Rep 519: 1–49.
  32. Salton G, McGill MJ (1983) Introduction to Modern Information Retrieval. McGraw-Hill.
  33. Zhou T, Kuscsik Z, Liu JG, Medo M, Wakeling JR, et al. (2010) Solving the apparent diversity-accuracy dilemma of recommender systems. Proc Natl Acad Sci USA 107: 4511–4515.
  34. Liu W, Lü L (2010) Link prediction based on local random walk. EPL 89: 58007.
  35. Liu JG, Zhou T, Guo Q (2011) Information filtering via biased heat conduction. Phys Rev E 84: 037101.
  36. Lü L, Liu W (2011) Information filtering via preferential diffusion. Phys Rev E 83: 066119.
  37. Qiu T, Chen G, Zhang ZK, Zhou T (2011) An item-oriented recommendation algorithm on cold-start problem. EPL 95: 58003.
  38. Qiu T, Zhang ZK, Chen G (2013) Information filtering via a scaling-based function. PLoS ONE 8: e63531.
  39. Qiu T, Wang TT, Zhang ZK, Zhong LX, Chen G (2013) Alleviating bias leads to accurate and personalized recommendation. EPL 104: 48007.
  40. Qiu T, Han TY, Zhong LX, Zhang ZK, Chen G (2014) Redundant correlation effect on personalized recommendation. Comput Phys Commun 185: 489–494.
  41. Zhang ZK, Zhou T, Zhang YC (2010) Personalized recommendation via integrated diffusion on user-item-tag tripartite graphs. Physica A 389: 179–186.
  42. Zhang ZK, Liu C, Zhang YC, Zhou T (2010) Solving the cold-start problem in recommender systems with social tags. EPL 92: 28001.
  43. Zhang ZK, Liu C (2010) A hypergraph model of social tagging networks. J Stat Mech 2010: P10005.
  44. Zhang ZK, Liu C (2012) Hybrid recommendation algorithm based on two roles of social tags. Int J Bifur Chaos 22: 1250166.
  45. Zhang ZK, Zhou T, Zhang YC (2011) Tag-aware recommender systems: A state-of-the-art survey. J Comput Sci Technol 26: 767–777.
  46. Hu F, Zhao HX, He JB, Li FX, Li SL, et al. (2013) An evolving model for hypergraph-structure-based scientific collaboration networks. Acta Phys Sin 62: 198901.
  47. Zhang CX, Zhang ZK, Liu C (2013) An evolving model of online bipartite networks. Physica A 392: 6100–6106.
  48. Hanley JA, McNeil BJ (1982) The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143: 29–36.
  49. Sarwar B, Karypis G, Konstan J, Riedl J (2001) Item-Based Collaborative Filtering Recommendation Algorithms. ACM Press.
  50. Zhou T, Jiang LL, Su RQ, Zhang YC (2008) Effect of initial configuration on network-based recommendation. EPL 81: 58004.
  51. Zhou T, Medo M, Ren J, Zhang YC (2007) Recommendation model based on opinion diffusion. EPL 80: 68003.
  52. Zhou T, Lü L, Zhang YC (2009) Predicting missing links via local information. Eur Phys J B 71: 623–630.
  53. Newman MEJ (2002) Assortative mixing in networks. Phys Rev Lett 89: 208701.
  54. Bouttier J, Francesco PD, Guitter E (2002) Census of planar maps: from the one-matrix model solution to a combinatorial proof. Nucl Phys B 645: 477–499.
  55. Zhou S, Mondragón RJ (2007) Structural constraints in complex networks. New J Phys 9: 173–183.