Information Filtering on Coupled Social Networks

In this paper, based on coupled social networks (CSN), we propose a hybrid algorithm that nonlinearly integrates both the social and the behavioral information of online users. The filtering algorithm considers the effects of both social similarity and personalized preference. Experimental results on two real datasets, Epinions and Friendfeed, show that the hybrid pattern not only provides more accurate recommendations, but also enlarges the recommendation coverage when a global metric is adopted. Further empirical analyses demonstrate that mutual reinforcement and the rich-club phenomenon can also be found in coupled social networks, where the same individuals occupy the core position of the online system. This work may shed some light on an in-depth understanding of the structure and function of coupled social networks.


Introduction
In the past two decades, the rapid development of the Internet has provided an almost unlimited source for us to search for and find what we need [1]. For instance, we can now enjoy plenty of TV channels and countless programs, whereas only a few choices were available twenty years ago. Moreover, the Internet not only offers various games, but has also become a versatile tool that changes the way of life we had kept for centuries. For example, online shopping has become more and more popular due to the exponential growth of e-commerce services (e.g., Amazon.com, Ebay.com, Taobao.com), which allow us to choose, compare and purchase goods with a single click. In addition, a vast class of novel job positions has arisen with the emergence of web-related applications, such as SOHO workers (who work at home but communicate via the Internet). However, everything has two sides. Although the Internet has changed the world a great deal and greatly improved our ability to contact others effectively and efficiently, it also brings many side effects, some of which are becoming critically important and even disruptive to our day-to-day routines. One of the most significant dilemmas is the well-known Information Overload problem. Take the aforementioned TV programs as an example. Although we indeed have more items to choose from than ever before, it is surprising that it has become even more difficult to find a program that satisfies us. That is to say, we face too many choices to compare them all and make an appropriate decision.
Recently, researchers from various disciplines, including computer science, social science and physics, have devoted much effort to keeping users from drowning in the information ocean [2]. Among numerous applications, the most successful milestone is the Search Engine (SE) [3], which helps users locate targets by filtering out irrelevant objects with specified keywords, and was therefore soon widely applied across the Internet. Despite its great success in information filtering, SE technology also has some apparent drawbacks that hinder its further application in modern society. On one hand, SE does not consider the personalization of each user, and returns exactly the same results for […] the value of similarity between user $U_4$ and user $U_5$ is zero since they do not collect the same object in the information network, which traditional complex network theory would interpret as no relation between them [47]. However, $U_4$ and $U_5$ are friends and may have frequent contacts in the social network; they are thus likely to share many common interests, such as making the acquaintance of like-minded friends or performing other social activities. Therefore, a reasonable account of the similarity between these two nodes should improve the resulting recommendation performance. Massa and Avesani [48] proposed a social propagation method based on users' distance from a fixed propagation horizon, which increased the recommendation coverage while preserving the quality of closeness. Many works have also introduced social trust and distrust relations into recommender systems [49,50]. In [51], a propagation approach was used to combine trust and distrust. In [52], the authors discussed the definition of trust, and their results demonstrated a positive relationship between trust and interest similarity in online social networks. Ref. [53] proposed a feedback effect between similarity and social influence in online communities. Esslimani et al. [54] proposed a new information-network-based collaborative filtering method that exploits navigational patterns and transitive links to model users, analyzes behavioral similarities, and eventually explores missing links. As we can see, many relationships, such as trust, friendship, community and organizational structure, can constitute a social network. Some relations are directed, like trust and follower-followee, while others, such as friendship, are undirected. By utilizing these social relations, we can obtain the strength of the social relationship between users, and use this weighted relationship to generate more accurate, explainable and acceptable recommendations when user behavioral information or profiles are lacking.
With the same motivation, we propose an algorithm based on CSN that considers the similarities from both the social and the information networks, and provides recommendations within the classical collaborative filtering (CF) framework. Numerical experiments on two benchmark data sets, Epinions and Friendfeed, demonstrate that our method gives more accurate recommendations than previous methods. In addition, extensive analyses show that the RWR-based social similarity not only strengthens the connection between small-degree and large-degree user pairs, but also reveals distant user pairs that cannot be revealed by other, direct metrics. As a consequence, a wider range of similar users, who cannot be discovered from the information network alone, can be used to generate more reliable and more precise recommendations.

Methods
In this section, we start by introducing the approaches used to evaluate, respectively, the social influence and the personalized preference between two users. Then, we integrate them to measure the final similarity of each pair of users, and apply this similarity within the classical CF framework. Generally, a recommender system consists of two sets: users $U = \{U_1, U_2, \ldots, U_m\}$ and items $I = \{I_1, I_2, \ldots, I_n\}$. Denote by $R_{m\times n}$ the adjacency matrix of the user-item bipartite network, where $R_{ij} = 1$ if user $U_i$ has collected item $I_j$, and $R_{ij} = 0$ otherwise. Analogously, $T_{m\times m}$ denotes the directed social network; it is generally asymmetric, with $T_{ij} = 1$ if user $U_i$ links to user $U_j$, and $T_{ij} = 0$ otherwise.
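To make the notation concrete, the sketch below builds $R$ and $T$ from edge lists and ranks items for one user with a user-based CF rule. The helper names (`adjacency_matrices`, `cf_scores`, `top_l`) are hypothetical, and the scoring rule (summing the similarities of the users who collected an item) is an assumed, commonly used CF form rather than a formula stated in this paper; `S` may be any user-user similarity matrix.

```python
def adjacency_matrices(m, n, collections, links):
    """Build the user-item matrix R (m x n) and the directed social matrix T (m x m).

    collections: (user, item) pairs, giving R[i][j] = 1.
    links: (u, v) pairs meaning user u links to user v, giving T[u][v] = 1.
    """
    R = [[0] * n for _ in range(m)]
    T = [[0] * m for _ in range(m)]
    for i, j in collections:
        R[i][j] = 1
    for u, v in links:
        T[u][v] = 1  # directed link, so T is generally asymmetric
    return R, T


def cf_scores(S, R, user):
    """Assumed user-based CF rule: score every item the user has not collected
    by summing the similarities S[user][j] of the other users j who collected it."""
    m, n = len(R), len(R[0])
    scores = {}
    for item in range(n):
        if R[user][item]:
            continue  # skip items already collected by the target user
        scores[item] = sum(S[user][j] * R[j][item] for j in range(m) if j != user)
    return scores


def top_l(scores, L):
    """Return the L highest-scoring items (ties broken by item index)."""
    return sorted(scores, key=lambda it: (-scores[it], it))[:L]
```

For example, with three users, similarity row `S[0] = [1.0, 0.5, 0.2]`, and user 1 being the only one who collected item 1, item 1 is ranked first for user 0 because it comes from the more similar neighbor.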

Social Influence
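We first use the Random Walk with Restart (RWR) method [55,56,57] to evaluate social influence on directed networks. Consider a random walker starting at node i. At each step, it moves from its current node to one of that node's out-neighbors via a directed link with probability c ∈ [0, 1], or returns to node i with probability 1 − c. The probability of each node at the stationary state is then taken as its peer-to-peer influence with node i. Denote by $\tilde{T}$ the transition matrix of the directed network, obtained by row-normalizing T: $\tilde{T}_{ij} = 1/k_i$ if node i links to node j, where $k_i$ is the out-degree of node i, and $\tilde{T}_{ij} = 0$ otherwise. The stationary probabilities of i's influence on others can then be written in vector form, $\vec{s}^{\,\mathrm{RWR}}_i$, as

$$\vec{s}^{\,\mathrm{RWR}}_i = c\,\tilde{T}^{\top}\vec{s}^{\,\mathrm{RWR}}_i + (1-c)\,\vec{e}_i \quad\Longrightarrow\quad \vec{s}^{\,\mathrm{RWR}}_i = (1-c)\left(\mathbb{I} - c\,\tilde{T}^{\top}\right)^{-1}\vec{e}_i, \tag{1}$$

where $\vec{e}_i$ is the m × 1 unit vector whose i-th entry equals 1, $\mathbb{I}$ is the m × m identity matrix, and m is the number of users; the j-th component of $\vec{s}^{\,\mathrm{RWR}}_i$ is the influence $s^{\mathrm{RWR}}_{ij}$. Besides the RWR metric, we also employ two typical local methods, LIN and LOUT, which evaluate the social influence between two users with an adjusted Jaccard index, namely the Tanimoto coefficient [58,59]. Denoting by $\Gamma_{\mathrm{in}}(i)$ and $\Gamma_{\mathrm{out}}(i)$ the sets of users that link to user i and that user i links to, respectively, they are defined as

$$\text{LIN:}\quad s^{\mathrm{LIN}}_{ij} = \frac{|\Gamma_{\mathrm{in}}(i)\cap\Gamma_{\mathrm{in}}(j)|}{|\Gamma_{\mathrm{in}}(i)\cup\Gamma_{\mathrm{in}}(j)|}, \tag{2}$$

$$\text{LOUT:}\quad s^{\mathrm{LOUT}}_{ij} = \frac{|\Gamma_{\mathrm{out}}(i)\cap\Gamma_{\mathrm{out}}(j)|}{|\Gamma_{\mathrm{out}}(i)\cup\Gamma_{\mathrm{out}}(j)|}. \tag{3}$$

These metrics (Eqs. (1)-(3)) are then used to quantify how one user influences others. Both $s^{\mathrm{LIN}}_{ij}$ and $s^{\mathrm{LOUT}}_{ij}$ consider only local information; that is, only the common neighbors of users i and j are taken into account. By contrast, $s^{\mathrm{RWR}}_{ij}$, viewed as a dynamic influence flow, considers both the local and the global structure of the directed network. It is therefore expected to be a promising index for characterizing social influence, and hence to provide better recommendation performance.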
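A minimal pure-Python sketch of the three social-influence metrics, assuming a dense 0/1 adjacency matrix `T` as in the notation above. `rwr` approximates the stationary RWR vector by power iteration of $s \leftarrow c\,P^{\top}s + (1-c)\,e_i$ instead of inverting a matrix, where `P` is the row-normalized transition matrix; the restart parameter `c = 0.8` and the tolerance are illustrative choices, not values from the paper. All function names are hypothetical.

```python
def transition_matrix(T):
    """Row-normalize T: entry (i, j) becomes T[i][j] / k_i, with k_i the out-degree of i."""
    P = []
    for row in T:
        k = sum(row)
        P.append([x / k if k else 0.0 for x in row])
    return P


def rwr(T, i, c=0.8, tol=1e-12, max_iter=10_000):
    """Stationary RWR vector for a walker that restarts at node i.

    Iterates s <- c * P^T s + (1 - c) * e_i; the fixed point equals
    (1 - c) (I - c P^T)^{-1} e_i. Component j is the influence s_ij.
    """
    P = transition_matrix(T)
    m = len(P)
    s = [0.0] * m
    s[i] = 1.0
    for _ in range(max_iter):
        new = [c * sum(P[l][j] * s[l] for l in range(m)) for j in range(m)]
        new[i] += 1 - c  # restart mass returns to the source node
        if max(abs(a - b) for a, b in zip(new, s)) < tol:
            return new
        s = new
    return s


def in_neighbors(T, v):
    """Users that link to v (column v of T)."""
    return {u for u in range(len(T)) if T[u][v]}


def out_neighbors(T, u):
    """Users that u links to (row u of T)."""
    return {v for v in range(len(T)) if T[u][v]}


def tanimoto(a, b):
    """|A & B| / |A | B| for two sets; 0 when both sets are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def s_lin(T, i, j):
    """LIN: Tanimoto coefficient of the in-neighbor sets of i and j."""
    return tanimoto(in_neighbors(T, i), in_neighbors(T, j))


def s_lout(T, i, j):
    """LOUT: Tanimoto coefficient of the out-neighbor sets of i and j."""
    return tanimoto(out_neighbors(T, i), out_neighbors(T, j))
```

Power iteration converges geometrically with rate c, and when every user has at least one out-link (as the data purification guarantees) each iterate remains a probability vector. Setting `c = 0` returns the unit vector $e_i$, as the restart equation implies.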

Personalized Preference
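There are many methods to compute the common preference between users or items in recommender systems, and the cosine metric [60] is one of the most frequently used [61,62]. It reads

$$p_{ij} = \frac{\sum_{l=1}^{n} R_{il}R_{jl}}{\sqrt{\sum_{l=1}^{n} R_{il}^{2}}\,\sqrt{\sum_{l=1}^{n} R_{jl}^{2}}}, \tag{4}$$

where $p_{ij}$ is the common preference between users i and j, computed from their rows of the user-item matrix R.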
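The cosine preference can be sketched directly on two rows of the 0/1 user-item matrix. This is a minimal illustration with a hypothetical function name; for binary rows the denominator reduces to the square root of the product of the two users' numbers of collected items.

```python
from math import sqrt


def cosine_preference(R, i, j):
    """Cosine similarity between rows i and j of the user-item matrix R."""
    common = sum(a * b for a, b in zip(R[i], R[j]))  # items collected by both users
    norm = sqrt(sum(a * a for a in R[i]) * sum(b * b for b in R[j]))
    return common / norm if norm else 0.0  # users with no items get 0
```

For instance, two users who each collected two items and share one of them have preference 1/√(2·2) = 0.5.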

Hybrid Algorithm
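To make full use of both the influence and the preference of users, we adopt a nonlinear hybrid method to integrate them. The final similarity between users i and j, $S_{ij}$, is denoted as

[…] (5)

where α and β are two tunable parameters that control the contributions of the personalized preference $p_{ij}$ and the social influence $s_{ij}$ (computed with the RWR, LIN or LOUT metric): (α, β) = (1, 0) retains only personalized preference, whereas (α, β) = (0, 1) retains only social influence.

Data & Metrics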

Data set
In this paper, we use two data sets (free to download as Supporting Information), Epinions.com [63] and Friendfeed.com [64], to evaluate the algorithm. Epinions not only allows users to rate items, but also permits them to make social connections with others. Friendfeed is a microblogging service launched in 2007 and acquired by Facebook in 2009. To alleviate the sparsity problem [65], we purify the two data sets by making sure that each user has at least one out-link and 26 in-links (2 for Friendfeed) in the social network, that each user has collected at least 7 items (8 for Friendfeed), and that each item has been collected at least 7 times (8 times for Friendfeed). Finally, we obtain a purified data set with 4,066 users, 7,649 items, 217,071 social links and 154,122 bipartite links for Epinions, and 4,188 users, 5,700 items, 386,804 social links and 96,942 bipartite links for Friendfeed.
Table 1 shows the basic statistics of the two representative data sets.
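The bipartite part of the purification can be sketched as a filter on (user, item) links. This is an illustration under stated assumptions: the function name `purify` is hypothetical, the social-link thresholds are omitted (they could be handled the same way on the social network), and repeating the filter until nothing changes is our assumption, since the text does not say whether a single pass or an iterative pass was used.

```python
from collections import Counter


def purify(collections, min_user=7, min_item=7):
    """Drop (user, item) links until every remaining user has >= min_user items
    and every remaining item has been collected >= min_item times.

    Iterates to a fixed point because removing a user can push an item below
    its threshold, and vice versa (an assumption, see the lead-in).
    """
    edges = set(collections)
    while True:
        user_deg = Counter(u for u, _ in edges)
        item_deg = Counter(it for _, it in edges)
        kept = {(u, it) for u, it in edges
                if user_deg[u] >= min_user and item_deg[it] >= min_item}
        if kept == edges:
            return kept
        edges = kept
```

With `min_user = min_item = 7` this mirrors the Epinions thresholds; `8, 8` would mirror Friendfeed.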

Metrics
Every data set is randomly divided into two parts: a training set containing 90% of the entries, and a testing set containing the remaining 10%. In a general recommendation process, the training set is treated as known information for running the algorithms and generating the corresponding recommendations, while no information in the testing set may be used when making recommendations. To give a comprehensive picture of the methods' performance, we employ four different metrics that characterize recommendation performance:

1. Precision [8]. Precision represents the probability that a selected item in a given recommendation list is relevant, defined as

$$\mathrm{Pre}_i = \frac{N^i_{rs}}{L},$$

where L is the length of the recommendation list and $N^i_{rs}$ is the number of truly recovered items for user i (recommended items that appear in user i's testing set). The precision of the whole recommender system is obtained by averaging over all individual precisions,

$$\mathrm{Pre} = \frac{1}{m}\sum_{i=1}^{m}\mathrm{Pre}_i,$$

where m is the number of users. A higher precision means a more accurate algorithm.

2. Recall [8]. Recall represents the probability that a relevant item in the testing set will be picked, defined as

$$\mathrm{Rec}_i = \frac{N^i_{r}}{N^i_{p}},$$

where $N^i_p$ is the number of items collected by user i in the testing set, and $N^i_r$ is the number of these items recovered in user i's recommendation list. The overall recall of the whole recommender system is then obtained by averaging over all individuals,

$$\mathrm{Rec} = \frac{1}{m}\sum_{i=1}^{m}\mathrm{Rec}_i.$$

A higher recall likewise means a more accurate algorithm.

3. F-measure [8]. The F-measure is widely used to alleviate the sensitivity of using precision or recall alone, defined as

$$F_i = \frac{2\,\mathrm{Pre}_i\,\mathrm{Rec}_i}{\mathrm{Pre}_i + \mathrm{Rec}_i}.$$

Analogously, the F-measure of the whole system is obtained by averaging over all individuals,

$$F = \frac{1}{m}\sum_{i=1}^{m}F_i.$$

4. AUC [66]. Unlike the above three metrics, AUC evaluates the ranking of all items rather than only the top-L recommendations. It can be approximated by a sampling method,

$$\mathrm{AUC} = \frac{n' + 0.5\,n''}{n},$$

where n is the number of independent samplings, $n'$ is the number of times the predicted score of the target item is higher than the score of a randomly selected item, and $n''$ is the number of times the target and the random item have the same score. If all scores are generated from an independent and identical distribution, AUC should be about 0.5. Therefore, the degree to which AUC exceeds 0.5 indicates how much better the algorithm performs than random prediction.
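The evaluation protocol above can be sketched in a few functions. This is a minimal illustration with hypothetical names; for AUC we assume the randomly selected comparison item is drawn from `other_items`, typically items the user has collected in neither set, and system-level values are plain means of the per-user values over all m users.

```python
import random


def train_test_split(edges, train_fraction=0.9, seed=0):
    """Randomly split user-item links into a training set (90%) and a testing set."""
    edges = list(edges)
    random.Random(seed).shuffle(edges)
    cut = int(train_fraction * len(edges))
    return edges[:cut], edges[cut:]


def precision_recall_f(recommended, test_items, L):
    """Per-user precision (hits / L), recall (hits / N_p) and their harmonic mean."""
    hits = len(set(recommended[:L]) & set(test_items))  # truly recovered items
    pre = hits / L
    rec = hits / len(test_items) if test_items else 0.0
    f = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
    return pre, rec, f


def sampled_auc(scores, test_items, other_items, n=10_000, seed=0):
    """AUC ~ (n' + 0.5 n'') / n from n random (test item, other item) comparisons."""
    rng = random.Random(seed)
    tests, others = list(test_items), list(other_items)
    higher = ties = 0
    for _ in range(n):
        a = scores[rng.choice(tests)]
        b = scores[rng.choice(others)]
        if a > b:
            higher += 1
        elif a == b:
            ties += 1
    return (higher + 0.5 * ties) / n
```

For example, recommending items [1, 2, 3] (L = 3) to a user whose testing set is {2, 5} yields precision 1/3, recall 1/2 and F-measure 0.4.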
Note that, for all the aforementioned results, two crossing lines are clearly visible for the LIN- and LOUT-based methods at α = 0 and β = 0, while only a horizontal line is observed for the RWR-based method at α = 0. As shown in Table 1, the information network is much sparser than the corresponding social network, so more items can potentially be discovered via social connections. In addition, the hot areas (corresponding to high performance) of the RWR-based method are much larger than those of the other two methods, as RWR considers not only the nearest neighbors but also the effect of remote nodes that are not directly connected. By contrast, the local (LIN- and LOUT-based) methods take into account only common direct neighbors, neglecting the global role of each individual. Furthermore, the hybrid case reaches the best performance on both data sets with optimal parameters α* > β*, which also suggests that social reinforcement is more significant than individual behaviors in information filtering.

Empirical Analysis
To better understand how the different layers of coupled networks interact with each other, in this section we empirically investigate the relationship between social influence and personal preference from both micro and macro perspectives. Fig. 6 shows the relationship between social influence and personal preference for each pair of users. In general, the two are positively correlated [52] for both local and global measures, indicating that the mutual reinforcement principle [60] also applies to online social activities.
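The averaging behind Fig. 6 can be sketched as grouping user pairs by their social influence and averaging their preference within each group. Because RWR influence values are continuous, this sketch groups them into bins of a chosen width; the binning and the function name are assumptions, not details given in the text.

```python
import math
from collections import defaultdict


def mean_preference_by_influence(pairs, bin_width=0.01):
    """Average p_ij over user pairs grouped by social influence s_ij.

    pairs: iterable of (s_ij, p_ij). Returns {bin lower edge: mean preference},
    sorted by bin.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for s, p in pairs:
        idx = math.floor(s / bin_width)  # integer bin index avoids float-key clashes
        sums[idx] += p
        counts[idx] += 1
    return {idx * bin_width: sums[idx] / counts[idx] for idx in sorted(sums)}
```

A positive correlation then shows up as mean preference increasing with the bin edge.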
In Fig. 7, we also show a typical example of an ego network [67] for the node with the largest social influence value (the biggest node). This node connects to a node of relatively large social influence yet small similarity (the yellow one), suggesting a rich-club phenomenon [68] in social activities. That is to say, users with high social impact tend to interact with other users of high social influence, even if they lack common activities. Furthermore, we show the degree distribution of successfully recommended items in Fig. 8 and Fig. 9 for Epinions and Friendfeed, respectively. In Fig. 8(a-c) and Fig. 9(a-c), the parameters of Eq. (5) are set as α = 0 and β = 1, so that only social influence takes effect in the recommendation process. The local measures (LIN and LOUT) tend to find more small-degree items (degree smaller than 6) than the RWR metric (around 57%). Similarly, for the other extreme case of Eq. (5), (α, β) is set as (1, 0), implying that only personal preference works for information filtering; hence all results are identical in Fig. 8(d-f) and Fig. 9(d-f), respectively. In addition, the number of recommended small-degree items is smaller than that of the social-based methods. By contrast, in Fig. 8(g-i) and Fig. 9(g-i), (α, β) is set to the optimal values given in Table 2. Since both social influence and personal preference are integrated, the hybrid algorithm can not only find cold items [33,25] (where social influence primarily works), but also recommend some popular items (largely owing to personal preference). It therefore achieves better performance in information filtering.
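The small-degree statistic of Figs. 8 and 9 can be sketched as the share of successfully recommended items (recommended items that appear in the testing set) whose degree is at most 5, matching the dashed line in those figures. Computing item degree from the training links is our assumption, and the function names are hypothetical.

```python
from collections import Counter


def item_degrees(collections):
    """Degree of each item: the number of users who collected it."""
    return Counter(item for _, item in collections)


def small_degree_share(hit_items, degree, max_degree=5):
    """Fraction of successfully recommended items whose degree is <= max_degree."""
    hit_items = list(hit_items)
    if not hit_items:
        return 0.0
    small = sum(1 for it in hit_items if degree[it] <= max_degree)
    return small / len(hit_items)
```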

Conclusions & Discussion
In this paper, we have proposed a hybrid information filtering algorithm based on coupled social networks, which considers the effects of both social influence and personalized preference. We apply three metrics, LIN, LOUT and RWR, to evaluate social influence on the directed (asymmetric) social network, and use the cosine similarity to measure the symmetric personalized preference. We then integrate the two with two tunable parameters to obtain better recommendation results. Experimental results show that the hybrid pattern not only provides more accurate recommendations, but also enlarges the recommendation coverage when the global metric (RWR) is adopted. Further empirical analyses demonstrate that mutual reinforcement can also be extended to coupled networks, where the same individuals occupy the core position of the entire online society. However, this article only provides a simple start for making use of both behavioral and social information, and several issues remain open for future study. In particular, the underlying mechanism driving the interaction between social and information networks is of particular importance for a deep understanding of how coupled social networks work, as well as of their potential applications.

Acknowledgments
This work was partially supported by the National Natural Science Foundation of China (Grant Nos. 11105024, 61103109, 1147015, 11301490 and 11305043), the Zhejiang Talents Project (No. QJC1302001), the EU FP7 Grant 611272 (project GROWTHCOM), and the start-up foundation and Pandeng project of Hangzhou Normal University.

Figure 1. (Color online) Illustration of a coupled social network with five users and five items, where circles denote users and squares represent objects. Upper layer: the social network, consisting of five users. Lower layer: the information network, consisting of five objects and the same five users as in the social network.

Figure 2. (Color online) Precision results on the Epinions and Friendfeed data sets. The length of the recommendation list is set to L = 10.

Figure 3. (Color online) Recall results on the Epinions and Friendfeed data sets. The length of the recommendation list is set to L = 10.

Figure 4. (Color online) F-measure results on the Epinions and Friendfeed data sets. The length of the recommendation list is set to L = 10.

Figure 5. (Color online) AUC results on the Epinions and Friendfeed data sets.

Figure 6. Mean personal preference versus social influence for Epinions and Friendfeed, respectively. From left to right, the metrics are the RWR-, LIN- and LOUT-based social influence, respectively. The personal preference is averaged over all user pairs with the same social influence value.

Figure 7. (Color online) Illustration of a typical example of an ego network for the node with the largest social influence value (the biggest node).

Figure 8. Number of recommended items versus degree on Epinions for L = 10. From left to right, the parameters (α, β) of Eq. (5) are set as (1,0), (0,1), and (α*, β*) given in Table 2, respectively. The dashed line indicates degree 5, and the corresponding number shows its percentage of all the recommended items.

Figure 9. Number of recommended items versus degree on Friendfeed for L = 10. From left to right, the parameters (α, β) of Eq. (5) are set as (1,0), (0,1), and (α*, β*) given in Table 2, respectively. The dashed line indicates degree 5, and the corresponding number shows its percentage of all the recommended items.

Table 1. Basic properties of the two datasets. |U|, |I|, $N_R$ and $N_S$ represent the numbers of users, items, ratings and social activities, respectively. $S_r = N_R/(|U|\times|I|)$ and $S_p =$