The Power of Ground User in Recommender Systems

Accuracy and diversity are two important aspects in evaluating the performance of recommender systems. Two diffusion-based methods were proposed, inspired respectively by the mass diffusion (MD) and heat conduction (HC) processes on networks. It has been pointed out that MD has high recommendation accuracy yet low diversity, while HC succeeds in seeking out novel or niche items but with relatively low accuracy. The accuracy-diversity dilemma is a long-term challenge in recommender systems. To solve this problem, we introduced a background temperature by adding a ground user who connects to all the items in the user-item bipartite network. Performing the HC algorithm on the network with the ground user (GHC), we showed that the accuracy can be largely improved while keeping the diversity. Furthermore, we proposed a weighted form of the ground user (WGHC) by assigning weights to the newly added links between the ground user and the items. By treating the weight as a tunable parameter, an optimal value corresponding to the highest accuracy is obtained. Experimental results on three benchmark data sets showed that WGHC outperforms the state-of-the-art method MD in both accuracy and diversity.


Introduction
The explosive growth of the Internet and the WWW raises a serious information overload problem: we face too much data and too many resources to find the relevant ones effectively with our limited processing abilities. How to measure the value of all the alternatives and then identify the useful information is a crucial problem, which calls for the development of advanced automatic techniques for information filtering. Search engines are useful tools, by which users can find relevant information with properly chosen queries. However, they do not take personalization into account and thus return the same results to people no matter what their preferences are. Besides, since search engines require keywords supplied by the users themselves, they are of no avail when users don't know what they want or cannot express their preferences by keywords. To address these problems, recommender systems have emerged. They do not require specified keywords; instead, they use the users' historical activities and possibly their personal profiles to uncover their preferences and recommend relevant items according to their potential interests [1]. In fact, recommendation can be considered as a link prediction problem on web-based user-item bipartite networks [2].
Many recommendation algorithms have been developed, including collaborative filtering [3,4], content-based analysis [5], spectral analysis [6,7] and iterative self-consistent refinement [8,9]. What most of them have in common is that they are based on similarity, either of users or items or both. Such an approach runs a high risk of providing poor coverage of the space of relevant items. As a result, with recommendations based on similarity rather than difference, more and more users will be exposed to a narrow band of popular items, and niche items will be hard to discover. Although it seems more accurate to recommend popular items than niche ones, being accurate is not enough [10]. Diversity and novelty are also important criteria of algorithmic performance. The diversity-accuracy dilemma has become one of the main challenges in recommender systems.
Recently, some physical dynamics, including mass diffusion [11] and the heat conduction process [12], have been applied to design recommender systems. It was shown that MD has high accuracy yet low diversity, while HC has high diversity yet low accuracy. To solve the accuracy-diversity dilemma, a hybrid method that combines HC and MD was proposed [13]. Other methods include the biased HC, which improves the accuracy of HC while keeping its diversity [14], and the biased MD methods, which improve the diversity of the MD algorithm while keeping its accuracy [15,16]. Different from the previous studies, which mainly focused on modifying the algorithms, in this paper we show that a ground user who is supposed to select all the items in the system can improve the recommendation accuracy of HC while keeping its diversity. The ground user can also benefit other systems, and its weighted form can further improve the performance.

Diffusion-based Methods
A recommender system can be represented by a user-item bipartite network $G(U,O,E)$, which consists of a set of users $U = \{u_1, u_2, \cdots, u_m\}$, a set of items $O = \{o_1, o_2, \cdots, o_n\}$, and a set of links $E$ between them. Let $A$ denote the adjacency matrix, whose element $a_{i\alpha} = 1$ if user $i$ has collected item $\alpha$, and $a_{i\alpha} = 0$ otherwise. We use Latin letters for users and Greek letters for items. The degree of item $\alpha$ (i.e., the number of users who have collected item $\alpha$) is denoted as $k_{o_\alpha}$, and the degree of user $i$ (i.e., the number of items that connect to user $i$) is denoted as $k_{u_i}$. The essential task of a recommender system is to generate a ranking list of the target user's uncollected items, based on the observed information.
The original diffusion-based method is called mass diffusion (MD); it is based on a resource allocation process on the user-item bipartite network [11]. For a target user $i$, a certain amount of resource is assigned to each item that user $i$ has collected. Since the network is unweighted (the biased allocation process was discussed in Ref. [15]), each item splits its initial resource equally among all its neighboring users, and each user then splits what he has received equally among all the items he has collected. Denoting the initial resource vector by $f$, with $f_\beta = a_{i\beta}$ for the target user $i$, the final resource vector is $f' = W^{MD} f$, where
$$W^{MD}_{\alpha\beta} = \frac{1}{k_{o_\beta}} \sum_{j=1}^{m} \frac{a_{j\alpha} a_{j\beta}}{k_{u_j}} \tag{1}$$
is the resource transfer matrix. Physically, the diffusion is equivalent to a three-step random walk starting with $k_{u_i}$ units of resource on the target user $i$. The recommendation score of an item is taken to be the amount of resource it has gathered after the diffusion. The resulting recommendation list of uncollected items is then sorted according to $f'_\alpha$ in descending order.
Different from MD, HC (we abbreviate this algorithm as HC, since it follows a conductive process analogous to heat diffusion across the user-item bipartite network) recommends items to an individual user by a process motivated by heat diffusion: items liked and disliked by this user are represented as hot and cold spots respectively, and recommendations are made according to the equilibrium temperature of the nodes in the network [12]. The transition matrix of HC is
$$W^{HC}_{\alpha\beta} = \frac{1}{k_{o_\alpha}} \sum_{j=1}^{m} \frac{a_{j\alpha} a_{j\beta}}{k_{u_j}}. \tag{2}$$
Similar to MD, HC also redistributes resources in a manner akin to a random-walk process. However, the two diffusion processes differ significantly: the HC algorithm redistributes a resource via a nearest-neighbor averaging process, while the MD algorithm works by equally distributing the resource to the nearest neighbors. An illustration of the MD and HC processes is shown in fig. 1.
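The difference between the two normalizations can be made concrete with a minimal sketch on a toy network. The function names (`item_overlap`, `md_matrix`, `hc_matrix`, `diffuse`) and the 3-user, 3-item matrix `A` are illustrative assumptions, not part of the original experiments, and the code assumes every item has at least one link. MD divides by the degree of the source item, HC by the degree of the receiving item.

```python
def item_overlap(A):
    """S[a][b] = sum over users j of A[j][a] * A[j][b] / k_user[j]."""
    m, n = len(A), len(A[0])
    k_user = [sum(row) for row in A]
    return [[sum(A[j][a] * A[j][b] / k_user[j] for j in range(m) if k_user[j] > 0)
             for b in range(n)] for a in range(n)]


def md_matrix(A):
    """Mass diffusion: normalize by the degree of the source item b."""
    n = len(A[0])
    k_item = [sum(row[a] for row in A) for a in range(n)]
    S = item_overlap(A)
    return [[S[a][b] / k_item[b] for b in range(n)] for a in range(n)]


def hc_matrix(A):
    """Heat conduction: normalize by the degree of the receiving item a."""
    n = len(A[0])
    k_item = [sum(row[a] for row in A) for a in range(n)]
    S = item_overlap(A)
    return [[S[a][b] / k_item[a] for b in range(n)] for a in range(n)]


def diffuse(W, f):
    """Final resource f' = W f."""
    return [sum(W[a][b] * f[b] for b in range(len(f))) for a in range(len(W))]


# Hypothetical toy data: rows are users, columns are items (degrees 3, 2, 1).
A = [[1, 1, 0],
     [1, 0, 0],
     [1, 1, 1]]
f = A[1]  # target user 1 has collected only item 0
md_scores = diffuse(md_matrix(A), f)
hc_scores = diffuse(hc_matrix(A), f)
```

In this toy case MD conserves the total resource (the scores sum to the initial amount), whereas each row of the HC matrix sums to one, so HC computes an average. The least popular item (item 2) receives a larger score under HC than under MD.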
It has been pointed out that MD has high recommendation accuracy yet low diversity, while HC succeeds in seeking out novel or niche items, and thus enhances the personalization of individual user recommendations, but with relatively low accuracy. An effective way to solve the accuracy-diversity dilemma is to combine HC and MD by incorporating a hybridization parameter $\lambda$ into the normalization of the transition matrix [13]:
$$W^{HHM}_{\alpha\beta} = \frac{1}{k_{o_\alpha}^{1-\lambda} k_{o_\beta}^{\lambda}} \sum_{j=1}^{m} \frac{a_{j\alpha} a_{j\beta}}{k_{u_j}}, \tag{3}$$
where $\lambda = 0$ gives the pure HC algorithm and $\lambda = 1$ gives the MD algorithm. Such a hybrid approach was shown to achieve both accurate and diverse recommendations at the optimal parameter $\lambda_{opt}$.
Notice that low-degree nodes are favored more in the HC process than in the MD process. For example, in fig. 1, with MD the second user and the third user obtain the same resource after one-step diffusion from the item side to the user side, while with HC the second user, who has the lower degree $k = 2$, obtains more than the third user with $k = 3$. As a result, for the target user, the third item, which is unpopular, obtains more by HC. This is the reason why HC provides highly diverse recommendations.
A natural question is whether we can improve the recommendation accuracy while keeping the diversity of HC. A potential way is the weighted HC, where a tunable parameter is introduced [14]. Different from this route, we here propose a totally novel perspective, where the key point is adding a ground user who collects all the items in the network. Figure 2 gives an example. The HC process will run on fig. 2(b), which consists of $m+1$ users and $|E|+n$ links. With $k_{o_\alpha}$ and $k_{u_j}$ still denoting the degrees in the original network, and the ground user labeled $m+1$ (so that $a_{m+1,\alpha} = 1$ for every item $\alpha$ and its degree is $n$), the transition matrix of the GHC algorithm (the abbreviation of HC with ground user) is written as
$$W^{GHC}_{\alpha\beta} = \frac{1}{k_{o_\alpha}+1} \left( \sum_{j=1}^{m} \frac{a_{j\alpha} a_{j\beta}}{k_{u_j}} + \frac{a_{m+1,\alpha} a_{m+1,\beta}}{n} \right). \tag{4}$$
It can be rewritten as
$$W^{GHC}_{\alpha\beta} = \frac{1}{k_{o_\alpha}+1} \sum_{j=1}^{m} \frac{a_{j\alpha} a_{j\beta}}{k_{u_j}} + \frac{1}{n (k_{o_\alpha}+1)}. \tag{5}$$
The first term is the contribution by the common users of items $\alpha$ and $\beta$, which is similar to the HC algorithm (see Eq. 2). The essential difference between HC and GHC lies in the second term of Eq. 5, which leads to an additional relation between two items even when they don't have common users.
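As a hedged illustration of the hybrid normalization and of the ground user, the sketch below builds the transition matrix for a given hybridization parameter and then runs it on a network with one extra all-ones user row. The names `hybrid_matrix`, `add_ground_user` and `ghc_closed_form`, as well as the toy matrix, are assumptions made for this example; items are assumed to have at least one link.

```python
def hybrid_matrix(A, lam):
    """W[a][b] = S[a][b] / (k_a**(1 - lam) * k_b**lam); lam=0 is HC, lam=1 is MD."""
    m, n = len(A), len(A[0])
    k_user = [sum(row) for row in A]
    k_item = [sum(row[a] for row in A) for a in range(n)]
    W = []
    for a in range(n):
        row = []
        for b in range(n):
            s = sum(A[j][a] * A[j][b] / k_user[j] for j in range(m) if k_user[j] > 0)
            row.append(s / (k_item[a] ** (1 - lam) * k_item[b] ** lam))
        W.append(row)
    return W


def add_ground_user(A):
    """Append one extra user row that is linked to every item."""
    return [row[:] for row in A] + [[1] * len(A[0])]


def ghc_closed_form(A):
    """HC with a ground user, written using only the original degrees."""
    m, n = len(A), len(A[0])
    k_user = [sum(row) for row in A]
    k_item = [sum(row[a] for row in A) for a in range(n)]
    return [[(sum(A[j][a] * A[j][b] / k_user[j] for j in range(m) if k_user[j] > 0)
              + 1.0 / n) / (k_item[a] + 1)
             for b in range(n)] for a in range(n)]


# Hypothetical toy data: items 0 and 2 share no user.
A = [[1, 1, 0],
     [0, 0, 1]]
W_hc = hybrid_matrix(A, 0)
W_ghc = hybrid_matrix(add_ground_user(A), 0)
W_closed = ghc_closed_form(A)
```

Here `W_hc[0][2]` is zero because items 0 and 2 have no common user, while `W_ghc[0][2]` is positive: the ground user contributes the extra term $1/(n(k_{o_\alpha}+1))$. Running HC on the augmented matrix and evaluating the closed form give the same entries.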
It has been shown that the ground user can improve the performance on identifying influential leaders in social networks [17]. Here we will show that it also benefits the recommender systems. Experimental results show that by adding the ground user the recommendation accuracy will be largely increased. Clearly, for HC, the ground user only takes effect at the final step of the conductive process.
By assigning a weight to each newly added link between the ground user and an item, we obtain a weighted form of the HC algorithm with ground user (we abbreviate it as WGHC). In WGHC, the link between user $i$ and item $\alpha$ in the original data has the weight $q_{i\alpha} = 1$, and the link between the ground user $g$ and item $\alpha$ has the weight $q_{g\alpha} = c$. Thus, the transition matrix of WGHC is
$$W^{WGHC}_{\alpha\beta} = \frac{1}{s_{o_\alpha}} \sum_{j=1}^{m+1} \frac{q_{j\alpha} q_{j\beta}}{s_{u_j}}, \tag{6}$$
where $s_{u_i}$ and $s_{o_\alpha}$ denote the weighted degree of user $i$ and item $\alpha$, respectively, and the sum includes the ground user as user $m+1$. Equation 6 gives a weighted heat conduction process on a bipartite network. Clearly, when $c = 0$, WGHC degenerates to HC, and when $c = 1$, WGHC is equal to GHC, where the original links and the newly added links have the same weight. By tuning the parameter $c$, an optimal value $c_{opt}$ corresponding to the highest accuracy is obtained.
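The weighted variant can be sketched in the same way: original links carry weight 1, the ground-user links carry weight `c`, and degrees become weighted degrees. The function name `wghc_matrix` and the toy matrix are assumptions for illustration only. For `c = 0` the ground user has zero weighted degree and is skipped, which recovers plain HC.

```python
def wghc_matrix(A, c):
    """Weighted heat conduction with a ground user whose links all have weight c."""
    m, n = len(A), len(A[0])
    # Weighted adjacency: original links keep weight 1, ground-user links get c.
    Q = [[float(x) for x in row] for row in A] + [[float(c)] * n]
    s_user = [sum(row) for row in Q]                       # weighted user degrees
    s_item = [sum(row[a] for row in Q) for a in range(n)]  # weighted item degrees
    return [[sum(Q[j][a] * Q[j][b] / s_user[j] for j in range(m + 1) if s_user[j] > 0)
             / s_item[a]
             for b in range(n)] for a in range(n)]


# Hypothetical toy data: items 0 and 2 share no user.
A = [[1, 1, 0],
     [0, 0, 1]]
W_c0 = wghc_matrix(A, 0)  # same as HC
W_c1 = wghc_matrix(A, 1)  # same as GHC
W_c2 = wghc_matrix(A, 2)  # ground-user links weighted more heavily
```

In this toy case the entry linking the two unrelated items grows from 0 (`c = 0`) to 1/6 (`c = 1`) to 2/9 (`c = 2`), consistent with the ground-user term $c/(n(k_{o_\alpha}+c))$. A search for $c_{opt}$ would evaluate an accuracy metric over a grid of `c` values; that loop is omitted here.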

Methods for Comparison
For comparison, we present the results of three classical recommendation algorithms: user-based K-Nearest-Neighbor (uKNN), item-based K-Nearest-Neighbor (iKNN), and weighted regularized matrix factorization (WRMF). KNN methods are very popular techniques in collaborative filtering. They rely on a similarity measure between either items (item-based) or users (user-based). In the uKNN method, for any user-item pair $(i,\alpha)$, if user $i$ has not yet collected item $\alpha$, the predicted score $v_{i\alpha}$ is given as
$$v_{i\alpha} = \sum_{l \in N_i^k} s_{il} a_{l\alpha},$$
where $N_i^k$ is the set of user $i$'s top-$k$ nearest neighbors, and $s_{il}$ is the similarity between user $i$ and user $l$. The main idea embedded in uKNN is that the target user will be recommended the items collected by the users who share similar tastes with him. Different from uKNN, iKNN recommends items similar to the ones that the target user preferred in the past. In the iKNN method, the predicted score $v_{i\alpha}$ of user $i$ for item $\alpha$ is defined as
$$v_{i\alpha} = \sum_{\beta \in N_\alpha^k} s_{\alpha\beta} a_{i\beta},$$
where $N_\alpha^k$ is the set of item $\alpha$'s top-$k$ nearest neighbors, and $s_{\alpha\beta}$ is the similarity between item $\alpha$ and item $\beta$. Here, we use cosine similarity to measure the similarity between users or items. Notice that if we use all the neighbors to calculate the predicted scores, that is, $k = m-1$ for uKNN and $k = n-1$ for iKNN, then uKNN and iKNN become respectively the standard user-based and item-based collaborative filtering algorithms, which are also investigated in our experiments.
Weighted regularized matrix factorization [18,19] is a matrix factorization method for item prediction. This method is an adaptation of SVD. It associates each user $i$ with a user-factors vector $x_{u_i}$, and each item $\alpha$ with an item-factors vector $y_{o_\alpha}$. The prediction is done by taking the inner product of these two vectors, namely $v_{i\alpha} = x_{u_i}^T y_{o_\alpha}$. The factors are computed by minimizing the cost function
$$\sum_{i,\alpha} c_{i\alpha} \left( a_{i\alpha} - x_{u_i}^T y_{o_\alpha} \right)^2 + \lambda \left( \sum_{i} \| x_{u_i} \|^2 + \sum_{\alpha} \| y_{o_\alpha} \|^2 \right),$$
where $c_{i\alpha}$ measures the confidence in observing $a_{i\alpha}$.
A zero value of $a_{i\alpha}$ should be associated with low confidence, since not taking any positive action does not mean that the user dislikes the item. The $\lambda$ term regularizes the model so that it does not overfit the training data. Here we set $c_{i\alpha} = 1 + a_{i\alpha}$ and $\lambda = 10$.
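The two KNN baselines described in this section can be sketched as follows, assuming the unnormalized scoring form (a similarity-weighted sum over the top-$k$ neighbors) and cosine similarity. The function names and the toy matrix are hypothetical, and ties between equally similar neighbors are broken by index, which is an arbitrary choice of this sketch.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u) * sum(y * y for y in v))
    return dot / norm if norm > 0 else 0.0


def uknn_scores(A, i, k):
    """Score each uncollected item of user i from the k most similar users."""
    sims = sorted(((cosine(A[i], A[l]), l) for l in range(len(A)) if l != i),
                  reverse=True)[:k]
    return {a: sum(s * A[l][a] for s, l in sims)
            for a in range(len(A[0])) if A[i][a] == 0}


def iknn_scores(A, i, k):
    """Score each uncollected item of user i from that item's k most similar items."""
    cols = [[row[a] for row in A] for a in range(len(A[0]))]
    scores = {}
    for a in range(len(cols)):
        if A[i][a] == 1:
            continue  # only uncollected items are ranked
        sims = sorted(((cosine(cols[a], cols[b]), b)
                       for b in range(len(cols)) if b != a), reverse=True)[:k]
        scores[a] = sum(s * A[i][b] for s, b in sims)
    return scores


# Hypothetical toy data: rows are users, columns are items.
A = [[1, 1, 0],
     [1, 1, 1],
     [0, 0, 1]]
user_based = uknn_scores(A, 0, 1)
item_based = iknn_scores(A, 0, 2)
```

For user 0, the only uncollected item is item 2. With `k = 1`, uKNN uses user 1 alone (similarity $2/\sqrt{6}$); with `k = 2`, iKNN uses items 0 and 1, each with similarity 0.5 to item 2.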

Data Description
We use three benchmark data sets, MovieLens, Netflix and RYM, to test the algorithmic performance. The MovieLens data set is provided by the GroupLens project at the University of Minnesota (www.grouplens.org). Here, we use the data with 1 million ratings by 6040 users on 3952 items. The ratings are given on the integer scale from 1 to 5 (i.e., worst to best). We only consider the ratings higher than 2. That is, if user $i$ rates item $\alpha$ higher than 2, we take it to mean that the user likes the movie, and there is a link between user $i$ and item $\alpha$ in the user-item bipartite network. After coarse graining, the data contains 836478 links (i.e., user-item pairs). The Netflix data set is a huge data set released by the DVD rental company Netflix for its Netflix Prize (www.netflixprize.com). The ratings in Netflix are also given on the integer scale from 1 to 5. Similar to the MovieLens data, only the links with ratings no less than 3 are considered. We extract a smaller data set by randomly sampling from the whole records of user activities. It finally consists of 10000 users, 6000 movies, and 701947 links. The RYM data set is publicly available on the music ratings website RateYourMusic.com. The ratings in RYM are given on the integer scale from 1 to 10. We only consider the ratings higher than 5. The final data consists of 33221 users, 5234 albums, and 610398 links. Compared with the MovieLens and Netflix data sets, RYM is much sparser. Table 1 shows the basic statistical features of these three data sets.

Evaluation Metrics
To test the algorithmic performance, the data is randomly divided into two parts: the training set $E^T$ contains 90% of the data, and the remaining 10% of the data constitutes the probe set $E^P$. The recommendation list for each user is generated from the training set, and the probe set is used for testing. We apply six metrics to give quantitative measurements of the methods: ranking score, precision, normalized discounted cumulative gain (NDCG), intra-similarity, Hamming distance and novelty.
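A random 90/10 division of the links can be sketched as below. The helper name `split_links`, the fixed seed, and the rounding of the probe size are assumptions of this example; the original experiments may have drawn the split differently.

```python
import random


def split_links(links, probe_fraction=0.1, seed=0):
    """Randomly split user-item links into a training set and a probe set."""
    rng = random.Random(seed)   # seeded only to make the sketch reproducible
    shuffled = list(links)
    rng.shuffle(shuffled)
    n_probe = int(round(len(shuffled) * probe_fraction))
    return shuffled[n_probe:], shuffled[:n_probe]


# Hypothetical data: 100 links between 10 users and 10 items.
links = [(i, a) for i in range(10) for a in range(10)]
train, probe = split_links(links)
```

Repeating the call with different seeds gives the independent data divisions over which results can be averaged.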
Ranking score [11] is a metric for accuracy. It measures the ability of a recommendation algorithm to rank the items a user prefers higher than the ones he dislikes. For a target user $i$, the recommender system returns a ranking list of all his uncollected items. Ranking score measures the relative rank of each hidden item (i.e., each item in the probe set for user $i$) in the recommendation list of this user. For example, a hidden item $\alpha$ (with $(i,\alpha) \in E^P$) at rank $r$ has the ranking score $R_{i\alpha} = r/(|O| - k_{u_i})$, where $k_{u_i}$ is the degree of user $i$ in $G(U,O,E^T)$. Averaging over all the hidden user-item relations, we obtain the mean value of ranking score, namely
$$R = \frac{1}{|E^P|} \sum_{(i,\alpha) \in E^P} R_{i\alpha},$$
where $(i,\alpha)$ denotes the probe link connecting user $i$ and item $\alpha$. Clearly, the smaller the ranking score, the higher the algorithm's accuracy.
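A minimal sketch of the ranking score computation follows. The data layout (`scores[i]` as a list of item scores, `train[i]` as the set of items user `i` collected in the training set, `probe` as a list of hidden links) is an assumption of this example, as is the handling of tied scores, which simply keeps the original item order.

```python
def ranking_score(scores, train, probe, n_items):
    """Mean relative rank of the hidden items among each user's uncollected items."""
    total = 0.0
    for i, a in probe:
        # Candidates are all items the user has not collected in the training set.
        candidates = [b for b in range(n_items) if b not in train[i]]
        ranked = sorted(candidates, key=lambda b: scores[i][b], reverse=True)
        r = ranked.index(a) + 1          # 1-based rank of the hidden item
        total += r / len(candidates)     # len(candidates) = |O| - k_ui
    return total / len(probe)


# Hypothetical example: one user, five items, item 0 already collected.
scores = {0: [9.0, 0.4, 0.3, 0.2, 0.1]}
train = {0: {0}}
best_case = ranking_score(scores, train, [(0, 1)], 5)
worst_case = ranking_score(scores, train, [(0, 4)], 5)
both = ranking_score(scores, train, [(0, 1), (0, 4)], 5)
```

A hidden item ranked first among four candidates gives 0.25, one ranked last gives 1.0, and averaging over both probe links gives 0.625.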
Since in many real online systems only the top part of the recommendation list is presented to users, a more practical approach is to consider the number of a user's relevant items ranked in the top-$L$ places. Precision is one of the popular measurements based on this. For a target user $i$, the precision of the recommendation is defined as
$$P_i(L) = \frac{d_i(L)}{L},$$
where $d_i(L)$ indicates the number of user $i$'s hidden items in the top-$L$ places of his recommendation list. The precision of the whole system, $P(L)$, is obtained by averaging the individual precisions over all users who have at least one hidden link. In this paper, we set $L = 20$.
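The precision computation can be sketched as follows, assuming each user's recommendation list is already sorted and that hidden items are stored as sets; the names `precision_at_L`, `ranked_lists` and `hidden` are illustrative. Users without hidden links are skipped, matching the averaging rule stated above.

```python
def precision_at_L(ranked_lists, hidden, L):
    """Average fraction of hidden items among the top-L recommendations."""
    per_user = []
    for i, items in hidden.items():
        if not items:
            continue  # users without hidden links are excluded from the average
        top = ranked_lists[i][:L]
        per_user.append(sum(1 for a in top if a in items) / L)
    return sum(per_user) / len(per_user)


# Hypothetical example with L = 2; user 2 has no hidden link.
ranked_lists = {0: [3, 1, 2, 4], 1: [2, 4, 1, 3]}
hidden = {0: {1, 4}, 1: {3}, 2: set()}
p2 = precision_at_L(ranked_lists, hidden, 2)
```

User 0 has one hit in the top 2 (precision 0.5) and user 1 has none, so the system precision is 0.25.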
Another measure of ranking quality in recommender systems is the normalized discounted cumulative gain (NDCG), which discounts the gain of an item according to its position in the list [20]. For the ranking list of all uncollected items of a target user $i$, the discounted cumulative gain (DCG) is defined as
$$DCG_i = \sum_{p} \frac{rel_p}{\log_2(p+1)},$$
where $rel_p$ is the graded relevance of the result at position $p$, and the sum runs over all positions of the list. In our experiments, $rel_p = 1$ if the item at position $p$ is the user's hidden item, and $rel_p = 0$ otherwise. Under this definition, DCG gives the hidden item at position $p$ a score of $1/\log_2(p+1)$. This is very similar to the ranking score, which assigns the hidden item at position $p$ a score of $p/(|O| - k_{u_i})$. Therefore, if we divide DCG by the number of hidden items of a user, the obtained value will be negatively correlated with this user's ranking score. Since ranking lists vary in length for different users, the DCG is normalized as
$$NDCG_i = \frac{DCG_i}{IDCG_i},$$
where $IDCG_i$ is the ideal DCG of the ranking list, which is the maximum possible DCG. Clearly, the higher the NDCG is, the better the ranking result is. The NDCG of the whole system is obtained by averaging the individual NDCG over all users who have at least one hidden link.
Diversity is considered another significant aspect in the evaluation of recommender systems. Hamming distance [21] is applied to measure the uniqueness of different users' recommendation lists. Denoting by $C_{ij}(L)$ the number of common items in the top-$L$ places of the recommendation lists of users $i$ and $j$, their Hamming distance is calculated as
$$H_{ij}(L) = 1 - \frac{C_{ij}(L)}{L}.$$
Clearly, $H_{ij} = 0$ corresponds to the case where the recommendation lists of user $i$ and user $j$ are exactly the same, while $H_{ij} = 1$ corresponds to the case where their lists are completely different. Averaging $H_{ij}$ over all pairs of users, we obtain the mean distance $H(L)$. The greater the value is, the more diverse (or personalized) the recommendations given to the users are.
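Both measures can be sketched in a few lines, assuming binary relevance (a hidden item at position $p$ contributes $1/\log_2(p+1)$) and an ideal ordering that places all hidden items at the top. The function names are illustrative, and `hidden` is assumed to be a non-empty set.

```python
import math


def ndcg(ranked, hidden):
    """NDCG of one ranked list with binary relevance."""
    dcg = sum(1.0 / math.log2(p + 1)
              for p, a in enumerate(ranked, start=1) if a in hidden)
    # Ideal case: all hidden items occupy the first positions.
    idcg = sum(1.0 / math.log2(p + 1) for p in range(1, len(hidden) + 1))
    return dcg / idcg


def hamming_distance(list_i, list_j, L):
    """1 minus the fraction of common items in the two top-L lists."""
    common = len(set(list_i[:L]) & set(list_j[:L]))
    return 1.0 - common / L


# Hypothetical ranked lists of item identifiers.
top_hit = ndcg([5, 7, 9], {5})      # hidden item at position 1
third_hit = ndcg([5, 7, 9], {9})    # hidden item at position 3
h_partial = hamming_distance([1, 2, 3], [3, 4, 5], 3)
```

A hidden item at the first position gives an NDCG of 1, and one at the third position gives $1/\log_2 4 = 0.5$. Two top-3 lists sharing one item have Hamming distance $2/3$.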
Hamming distance only takes into account the diversity between users. However, a good algorithm is also expected to give diverse recommendations to a single user. Users may get tired of receiving many recommended items on the same topic [22]. Intra-similarity is proposed to measure the diversity of a user's recommendation list [23]. For an arbitrary target user $i$ with a recommendation list $O_i$, the intra-similarity is defined as
$$I_i(L) = \frac{1}{L(L-1)} \sum_{\alpha \neq \beta;\ \alpha,\beta \in O_i} s_{\alpha\beta},$$
where $L$ is the length of the recommendation list and $s_{\alpha\beta}$ is the similarity of item $\alpha$ and item $\beta$. In this paper, we adopt the cosine similarity, which is one of the most widely used similarity measures. For two items $\alpha$ and $\beta$, their cosine similarity is defined as
$$s_{\alpha\beta} = \frac{1}{\sqrt{k_{o_\alpha} k_{o_\beta}}} \sum_{j=1}^{m} a_{j\alpha} a_{j\beta}.$$
The lower $I_i(L)$ is, the more diverse the items recommended to the user are. Averaging $I_i$ over all users, we obtain the mean intra-similarity $I(L)$ of the whole system.
Different from diversity, which refers to how different the recommended items are with respect to each other, novelty measures the ability of an algorithm to generate unexpected and surprising recommendations. A good recommender system is expected to find the niche or unpopular items that cannot be easily discovered in other ways yet match users' preferences. The simplest way to calculate novelty is to use the average popularity of the recommended items. Given a recommendation list $O_i$ for user $i$, where $|O_i| = L$, the novelty is defined as
$$N_i(L) = \frac{1}{L} \sum_{\alpha \in O_i} k_{o_\alpha}.$$
A lower $N_i(L)$ indicates higher novelty and surprisal. Averaging $N_i(L)$ over all users, we obtain the mean novelty $N(L)$ of the system.
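Intra-similarity and novelty for a single recommendation list can be sketched as below, assuming cosine similarity between item columns and a sum over ordered pairs of distinct items. The function names and the toy matrix are hypothetical.

```python
import math


def item_cosine(A, a, b):
    """Cosine similarity between the columns of items a and b."""
    common = sum(row[a] * row[b] for row in A)
    k_a = sum(row[a] for row in A)
    k_b = sum(row[b] for row in A)
    return common / math.sqrt(k_a * k_b) if k_a and k_b else 0.0


def intra_similarity(A, rec):
    """Mean pairwise similarity of the items in one recommendation list."""
    L = len(rec)
    total = sum(item_cosine(A, a, b) for a in rec for b in rec if a != b)
    return total / (L * (L - 1))


def novelty(A, rec):
    """Average degree (popularity) of the recommended items; lower is more novel."""
    return sum(sum(row[a] for row in A) for a in rec) / len(rec)


# Hypothetical toy data with item degrees 3, 2 and 1.
A = [[1, 1, 0],
     [1, 0, 0],
     [1, 1, 1]]
sim_popular = intra_similarity(A, [0, 1])
nov_popular = novelty(A, [0, 1])
nov_niche = novelty(A, [1, 2])
```

Recommending the two most popular items gives an average degree of 2.5, while a list containing the niche item 2 gives 1.5, i.e., higher novelty under this measure.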

Results
The recommendation performances of different methods on the MovieLens, Netflix and RYM data sets are shown in table 2, table 3 and table 4, respectively. All the data points are averaged over ten independent runs with different data divisions. GHC is the abbreviation of the HC method with a ground user. HHM refers to the hybrid method that combines the HC and MD algorithms, namely Eq. 3. GHHM is the abbreviation of the hybrid method with a ground user. WGHC is the weighted version of GHC. uKNN and iKNN are respectively the user-based k-nearest-neighbor and item-based k-nearest-neighbor algorithms. uKNN(all) and iKNN(all) are the cases that consider all the neighbors, namely the standard user-based and item-based collaborative filtering algorithms, respectively. WRMF is the abbreviation of weighted regularized matrix factorization. For the parameter-dependent algorithms, the optimal parameter of each algorithm is the one corresponding to the lowest ranking score.
Comparing the results of HC and GHC, we can see that the recommendation accuracy is improved by adding a ground user for all three data sets. The improvement is especially significant in the precision of the top-20 recommended items for Netflix and MovieLens: $P(20)$ increases from 0.0119 to 0.0773 for the MovieLens data set, and from 0.0002 to 0.0472 for the Netflix data set. The improvement mainly comes from the accurate recommendations of popular items. Figure 3 shows the dependence of ranking score on the item degree for the HC and GHC algorithms. Previous studies have shown that the original HC algorithm favors small-degree items (i.e., unpopular items), which is supported by the very small average degree of the recommended items: $N(20) = 43.1$ for MovieLens, $N(20) = 1.5$ for Netflix, and $N(20) = 275.8$ for RYM. By adding a ground user, this bias can be relieved. The main contribution of the ground user is to add an additional transition probability from one item to another. In fact, the amount of heat held by the ground user can be considered as the temperature of the whole system. Each item receives the same heat from the ground user and then averages it with the heat from other sources. As a result, the temperature of the popular items is enhanced. Besides accuracy, the ground user also improves the inter-diversity of the recommendation results (see the improvement of Hamming distance by GHC) while keeping a relatively high novelty.
It is known that the original hybrid method HHM is a good trade-off between the diversity and the accuracy of recommendation [13]. From our experiments, we find that the ground user also improves the accuracy of HHM while keeping high diversity and novelty. Figures 4, 5 and 6 show the performance of the hybrid algorithm under different $\lambda$ on the MovieLens, Netflix and RYM data sets, respectively. All the data points are averaged over ten independent runs with different data divisions. As we can see, for large $\lambda$ (meaning that the MD algorithm has a larger weight in the hybrid method) HHM and GHHM perform almost the same, while for small $\lambda$ (meaning that the HC algorithm has a larger weight in the hybrid method) GHHM obtains a lower ranking score than HHM. The optimal $\lambda$ for GHHM is smaller than that of HHM, meaning that GHHM reaches its optimal case with less weight on MD and more weight on HC. That is also the reason why GHHM can slightly increase the recommendation diversity.
Now we consider the case where the weights of the newly added links between the ground user and the items are different from the original ones, namely the weighted GHC method (WGHC), see Eq. 6. Figure 7 shows the dependence of the ranking score of the WGHC algorithm on the parameter $c$. As $c$ increases from 0 (the pure HC case), the ranking score decreases sharply at the beginning and then reaches its lowest point. As discussed above, the ground user can reduce the preference of the HC method for low-degree items by adding an additional relation to every pair of items. Assigning a higher weight to the newly added links between the ground user and the items enhances the influence of these additional relations.
The WGHC method outperforms the MD algorithm for both accuracy and diversity. The improvements are significant. The exact scores of the six metrics on three data sets are given in tables 2, 3 and 4, respectively.
From the results of the last five classical methods, it can be seen that, compared with the standard user-based and item-based collaborative filtering algorithms, the corresponding KNN methods give better results in all six metrics. In general, iKNN performs better than uKNN. For the MovieLens and RYM data sets, iKNN provides more accurate recommendations than WRMF in all three accuracy metrics, while for the Netflix data set iKNN wins in precision and NDCG but WRMF has a lower ranking score. Among all the eleven algorithms, GHHM yields the lowest ranking score on all three data sets. Besides, the diffusion-based methods are more efficient. In the worst case, the complexity of one recommendation is approximately $O(|E|)$, where $|E|$ is the number of links [15].
Essentially, the diffusion-based methods (e.g., MD, HC and HHM) and the similarity-based methods (e.g., uKNN and iKNN) can be unified in a common framework, since they all work via a transformation $f' = Wf$. The main difference between these two groups of methods is how the matrix $W$ is defined. For uKNN and iKNN, $W$ is the similarity matrix, which is symmetric, while for diffusion-based methods the transformation matrix $W$ is asymmetric. The role of the ground user is to add an additional relation between two items. Whether the ground user can improve the performance depends on how it acts on the matrix $W$. We have tested that the ground user does not affect the result of uKNN. The reason is twofold. On the one hand, the cosine similarity between any two users does not change after adding the ground user. On the other hand, the ground user is unlikely to be included in the set of top-$k$ nearest neighbors, due to its very small similarity with the target user. For iKNN, the result becomes even worse when a ground user is considered. It is absurd that the similarity between two low-degree items which have not been selected by any common user changes from 0 to a high value after adding a ground user. This leads to the ridiculous result that the most dissimilar items are selected as top-$k$ nearest neighbors.
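The effect of the ground user on cosine similarity can be checked on a small hypothetical network. In the sketch below (all names and data are illustrative assumptions), two degree-one items with no common user have similarity 0 before the ground user is added and 0.5 afterwards, while the rows of the original users, and hence the user-user similarities, are untouched.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u) * sum(y * y for y in v))
    return dot / norm if norm > 0 else 0.0


def item_vector(A, a):
    """Column of item a, i.e., which users have collected it."""
    return [row[a] for row in A]


# Hypothetical toy data: items 0 and 1 are niche and share no user.
A = [[1, 0, 1],
     [0, 1, 1]]
A_ground = A + [[1, 1, 1]]  # the ground user collects every item

before = cosine(item_vector(A, 0), item_vector(A, 1))
after = cosine(item_vector(A_ground, 0), item_vector(A_ground, 1))
users_unchanged = cosine(A[0], A[1]) == cosine(A_ground[0], A_ground[1])
user_vs_ground = cosine(A_ground[0], A_ground[2])
```

For a user with $k$ collected items out of $n$, the similarity with the ground user is $\sqrt{k/n}$. In this three-item toy network that value is large ($\sqrt{2/3}$), but in sparse data sets, where $k$ is much smaller than $n$, it is small.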

Discussion
To summarize, we proposed a novel way to address the accuracy-diversity dilemma in recommender systems by adding a ground user who is supposed to select all the items and thus can be considered as the global environment or background temperature of the system. The main contribution of the ground user is to add an additional relation between every two items, even when they don't have any common users. Each item receives the same heat from the ground user and then averages it with the heat from other sources. Compared with the original heat conduction algorithm, the temperature of the popular items is enhanced. That is to say, the ground user can relieve the bias of the original heat conduction algorithm towards unpopular items. Experiments on three benchmark data sets showed that the ground user can improve the accuracy while keeping high diversity, and the improvement is especially significant with its weighted form.
In the big data era, we are able to quantitatively characterize the evolution of the Internet and human online activities, which may result in large improvements in the technologies of information services and thus in significant social and economic value. How to effectively find the relevant information within a huge data space is a crucial problem, with three key scientific issues: (i) Understanding the structure and evolution of information systems, as well as the origin and spreading dynamics of information. (ii) Understanding the spatio-temporal statistics of human online behaviors, as well as the correlation between users' short-term and long-term interests embodied by their activities. (iii) Understanding the generation and organization of information, and providing better information services for prediction, navigation and recommendation. These studies promote the development of a new branch of research named "Infophysics". In our future studies, we will keep working in this direction and apply the perspectives, theories and methods of statistical physics to develop efficient algorithms, uncover the statistical features hidden in huge amounts of data, summarize the universal laws of the evolution of information systems and of human behavior, and eventually provide advanced information services. We believe these studies will ultimately contribute to science and engineering in the big data era.