Integrating Triangle and Jaccard similarities for recommendation

This paper proposes a new similarity measure for recommendation by integrating the Triangle and Jaccard similarities. The Triangle similarity considers both the lengths of the rating vectors and the angle between them, while the Jaccard similarity accounts for non co-rating users. We compare the new similarity measure with eight state-of-the-art ones on four popular datasets under the leave-one-out scenario. Results show that the new measure outperforms all the counterparts in terms of the mean absolute error and the root mean square error.


Introduction
The distance measure is essential in machine learning tasks such as clustering [1,2], classification [3,4], image processing [5], and collaborative filtering [6][7][8][9]. Collaborative filtering (CF) through k-nearest neighbors (kNN) is a popular memory-based recommendation [10][11][12] scheme. The key issue of the CF scheme is how to calculate the similarity between users [6,13] or items [14,15]. Various types of similarity measures [16,17] have been adopted or designed for this issue. State-of-the-art ones include Cosine [18], Pearson Correlation Coefficient (PCC) [6,19], Jaccard [20], Proximity Impact Popularity (PIP) [21], and New Heuristic Similarity Model (NHSM) [22]. Naturally, new similarity measures providing better prediction ability are always desired.
This paper proposes the Triangle multiplying Jaccard (TMJ) similarity. Only item-based CF [14,15,23] is considered, since it performs better than user-based CF [13,24]. As illustrated in Fig 1, the rating vectors of two items form a triangle in the space. The Triangle similarity is one minus the length of the third edge divided by the sum of the lengths of the two edges corresponding to the vectors. Since it considers only the co-rating users, it is not good enough when used alone. Fortunately, the Jaccard similarity complements it by taking non co-rating users into account. Therefore, TMJ can take advantage of both the Triangle and Jaccard similarities.
We compare TMJ with eight existing measures on four popular datasets under the leave-one-out scenario. The datasets are MovieLens 100K, MovieLens 1M, FilmTrust, and EachMovie. The leave-one-out scenario is chosen because the result is not influenced by the division of the training/testing sets. Results show that the recommender system using TMJ outperforms all the counterparts in terms of the mean absolute error (MAE) and the root mean square error (RMSE).
In subsequent sections, we first review the basic concepts of memory-based recommender systems and eight popular similarity measures. Second, we present the Triangle and TMJ similarities with a running example, together with a complexity analysis. Subsequently, we analyze the experimental results. Finally, we make our concluding remarks and indicate further work. All code files and data sets are available from GitHub (https://github.com/FanSmale/TMJSimilarity.git).

Rating system
The user-item relationship is often expressed by a rating system. Let $U = \{u_1, u_2, \ldots, u_m\}$ be the set of users of a recommender system and $I = \{i_1, i_2, \ldots, i_n\}$ be the set of all possible items that can be recommended to users. Then the rating function is often defined as [29]
$$r : U \times I \rightarrow R, \tag{1}$$
where $R$ is the rating domain used by the users to evaluate items. For convenience, we let $r_{u,i}$ be the rating of item $i \in I$ evaluated by user $u \in U$, $r_i = (r_{u_1,i}, r_{u_2,i}, \cdots, r_{u_m,i})$ be the rating vector of item $i$, and, for all $1 \le j \ne q \le n$, $C_{i_j,i_q}$ be the set of co-rating users who have rated both $i_j$ and $i_q$. Here we have the following example.
Example 1 Table 1 lists an example of a rating system. $R = \{1, 2, 3, 4, 5\}$, where the numbers 1 through 5 represent the five rating levels; 0 indicates that the user has not rated the item. Given $u_4$ and $i_2$, $r_{u_4,i_2} = 1$ means that the rating of $u_4$ for $i_2$ is 1. $r_{i_1} = (4, 5, 4, 2, 4)$ is the rating vector of item $i_1$; $C_{i_1,i_3} = \{u_1, u_3, u_5\}$ is the set of co-rating users who have rated both $i_1$ and $i_3$.
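The notation of Example 1 can be sketched in a few lines of Python. The matrix below is a hypothetical stand-in, not a copy of Table 1: it is only constructed to agree with the three facts stated in the example ($r_{i_1}$, $r_{u_4,i_2}$ and $C_{i_1,i_3}$), and every other entry is invented. The helper names `item_vector` and `co_rating_users` are likewise illustrative.

```python
# Hypothetical 5-user x 3-item rating matrix (NOT the paper's Table 1).
# Rows are users u1..u5, columns are items i1..i3; 0 means "not rated".
ratings = [
    [4, 3, 5],
    [5, 0, 0],
    [4, 2, 1],
    [2, 1, 0],
    [4, 0, 2],
]


def item_vector(ratings, j):
    """Rating vector of the item in column j."""
    return [row[j] for row in ratings]


def co_rating_users(ratings, j, q):
    """Indices of the users who rated both item j and item q."""
    return [u for u, row in enumerate(ratings) if row[j] > 0 and row[q] > 0]


print(item_vector(ratings, 0))         # [4, 5, 4, 2, 4]
print(co_rating_users(ratings, 0, 2))  # [0, 2, 4], i.e. u1, u3, u5
```

Indices are zero-based here, so column 0 corresponds to $i_1$ and row 3 to $u_4$.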

The leave-one-out scenario
Leave-one-out cross validation is a general training/testing scenario for evaluating the performance of a recommender system as well as a classifier. Each time only one rating is used as the test set, and the remaining ratings are used as the training set. Different from split-in-two or 10-fold cross validation, the result is not influenced by the division of the training/testing sets.
An example of the leave-one-out scenario is as follows.
Example 2 Based on Table 1, we first leave $r_{u_1,i_1}$ out and replace it with "?". The purpose is to predict the value of "?". After we obtain the predicted value, called $p_{u_1,i_1}$, the prediction error is computed as $|r_{u_1,i_1} - p_{u_1,i_1}|$. Then we restore the value of $r_{u_1,i_1}$ and leave the next rating out. This process continues until all ratings have been left out and predicted.
The prediction quality is summarized by the mean absolute error (MAE) and the root mean square error (RMSE). With $T$ denoting the set of user-item pairs whose ratings are predicted, the MAE is computed by
$$MAE = \frac{1}{|T|} \sum_{(u,i) \in T} |r_{u,i} - p_{u,i}|, \tag{2}$$
where $p_{u,i}$ is the predicted rating of user $u$ for item $i$, and the RMSE [30] is computed by
$$RMSE = \sqrt{\frac{1}{|T|} \sum_{(u,i) \in T} (r_{u,i} - p_{u,i})^2}. \tag{3}$$
They are widely used to evaluate the performance of recommender systems. Naturally, the lower the MAE and the RMSE, the better the performance of the recommender system.
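The leave-one-out procedure and the two error measures can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: ratings are held in a list of rows with 0 marking an unrated cell, and `item_mean_predictor` is a deliberately simple stand-in for a real predictor (its fallback value 3.0 is an arbitrary choice for an item with no remaining ratings).

```python
import math


def leave_one_out(ratings, predict):
    """Hide each observed rating in turn, predict it, and return (MAE, RMSE)."""
    abs_errors, sq_errors = [], []
    for u, row in enumerate(ratings):
        for i, actual in enumerate(row):
            if actual == 0:      # 0 marks an unrated cell; nothing to test
                continue
            row[i] = 0           # leave the rating out
            p = predict(ratings, u, i)
            row[i] = actual      # restore it before the next round
            abs_errors.append(abs(actual - p))
            sq_errors.append((actual - p) ** 2)
    n = len(abs_errors)
    return sum(abs_errors) / n, math.sqrt(sum(sq_errors) / n)


def item_mean_predictor(ratings, u, i):
    """Stand-in predictor: mean of the remaining ratings of item i."""
    observed = [row[i] for row in ratings if row[i] > 0]
    return sum(observed) / len(observed) if observed else 3.0  # assumed fallback


toy = [[4, 3], [5, 0], [4, 2]]   # hypothetical 3-user x 2-item matrix
mae, rmse = leave_one_out(toy, item_mean_predictor)
print(round(mae, 3), round(rmse, 3))  # 0.8 0.837
```

Any predictor with the same `(ratings, u, i)` signature can be plugged in, which is how the different similarity measures would be compared under identical conditions.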

Popular similarities
Various popular similarities are employed in recommender systems.
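PIP. PIP, consisting of three factors (i.e., Proximity, Impact, and Popularity), is defined as
$$PIP(i_j, i_q) = \sum_{u \in C_{i_j,i_q}} Proximity(r_{u,j}, r_{u,q}) \times Impact(r_{u,j}, r_{u,q}) \times Popularity(r_{u,j}, r_{u,q}), \tag{4}$$
where the detailed calculation can be found in [21].
NHSM. NHSM, consisting of two factors (i.e., JPSS and URP), is defined as [22]
$$NHSM(i_j, i_q) = JPSS(i_j, i_q) \times URP(i_j, i_q), \tag{5}$$
where the detailed calculation can be found in [22].
Cosine. Cosine, which focuses on the angle between the rating vectors of two items, is defined as [18]
$$Cosine(i_j, i_q) = \frac{\vec{r}_j \cdot \vec{r}_q}{|\vec{r}_j| \times |\vec{r}_q|}, \tag{6}$$
where $\vec{r}_j = (r_{u_1,j}, r_{u_2,j}, \cdots, r_{u_m,j})^T$ is the rating vector of item $i_j$.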
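PCC. PCC, which considers the linear correlation between two rating vectors, is defined as [6,19]
$$PCC(i_j, i_q) = \frac{\sum_{u \in C_{i_j,i_q}} (r_{u,j} - \bar{r}_j)(r_{u,q} - \bar{r}_q)}{\sqrt{\sum_{u \in C_{i_j,i_q}} (r_{u,j} - \bar{r}_j)^2} \sqrt{\sum_{u \in C_{i_j,i_q}} (r_{u,q} - \bar{r}_q)^2}}, \tag{7}$$
where $\bar{r}_j$ is the average rating of item $i_j$.
CPCC. CPCC, which is based on PCC and considers the impact of positive and negative ratings, is defined as [13]
$$CPCC(i_j, i_q) = \frac{\sum_{u \in C_{i_j,i_q}} (r_{u,j} - r_{med})(r_{u,q} - r_{med})}{\sqrt{\sum_{u \in C_{i_j,i_q}} (r_{u,j} - r_{med})^2} \sqrt{\sum_{u \in C_{i_j,i_q}} (r_{u,q} - r_{med})^2}}, \tag{8}$$
where $r_{med}$ is the median of $R$. If $R = \{1, 2, 3, 4, 5\}$, we have $r_{med} = 3$.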
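Jaccard. Jaccard is defined as the size of the intersection divided by the size of the union of the rating users [20]
$$Jaccard(i_j, i_q) = \frac{|I_j \cap I_q|}{|I_j \cup I_q|}, \tag{9}$$
where $I_j = \{u \in U \mid r_{u,j} > 0\}$ and $I_q = \{u \in U \mid r_{u,q} > 0\}$.
BC. BC, which measures similarity by means of two probability distributions, is defined as [25,26]
$$BC(i_j, i_q) = \sum_{x \in R} \sqrt{P_{j,x} \times P_{q,x}}, \tag{10}$$
where $P_{j,x}$ is the probability of the rating $x$ for item $i_j$.
ES. The Euclidean distance (ED), which is the real distance between two points in Euclidean space, is defined as [27,28]
$$ED(i_j, i_q) = \sqrt{\sum_{u \in C_{i_j,i_q}} (r_{u,j} - r_{u,q})^2}. \tag{11}$$
In Fig 1, $|AB|$ is $ED(A, B)$. Therefore, ES can be computed by
$$ES(i_j, i_q) = 1 - \frac{ED(i_j, i_q)}{ED_{max}}, \tag{12}$$
where $ED_{max}$ is defined as
$$ED_{max} = \sqrt{\sum_{u \in C_{i_j,i_q}} (R_{max} - R_{min})^2}, \tag{13}$$
where $R_{max}$ is the maximum value (e.g., 5) of the rating set $R$, and $R_{min}$ is the minimum one (e.g., 1).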
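Two of the measures above can be sketched directly from their verbal definitions: Jaccard from the sets of rating users, and CPCC from the median-centred ratings of the co-rating users. The sketch assumes item rating vectors stored as lists with 0 for "not rated"; returning 0.0 when a denominator vanishes is an assumption made here for convenience, and the vector `r3` is hypothetical.

```python
import math


def jaccard(rj, rq):
    """|I_j intersect I_q| / |I_j union I_q| over the users who rated each item."""
    raters_j = {u for u, r in enumerate(rj) if r > 0}
    raters_q = {u for u, r in enumerate(rq) if r > 0}
    union = raters_j | raters_q
    return len(raters_j & raters_q) / len(union) if union else 0.0


def cpcc(rj, rq, r_med=3):
    """Constrained PCC: correlation of co-rated ratings centred on the median."""
    co = [(a - r_med, b - r_med) for a, b in zip(rj, rq) if a > 0 and b > 0]
    num = sum(a * b for a, b in co)
    den = math.sqrt(sum(a * a for a, _ in co)) * math.sqrt(sum(b * b for _, b in co))
    return num / den if den else 0.0  # assumption: 0 when the ratio is undefined


r1 = [4, 5, 4, 2, 4]   # rating vector of i1 from Example 1
r3 = [5, 0, 1, 0, 2]   # hypothetical vector with the same co-rating users as i3
print(jaccard(r1, r3))           # 0.6
print(round(cpcc(r1, r3), 3))    # -0.192
```

Note that the `r > 0` test relies on 0 meaning "not rated", which does not hold for a dataset where 0 is itself a rating level.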

kNN-based CF approach
CF schemes include memory-based and model-based [31,32] methods. The kNN [33,34] algorithm is one of the most fundamental CF recommendation techniques. Here we adopt the kNN-based CF approach to predict the ratings. One key to kNN algorithms is the definition of the similarity measure. Popular measures have been presented above. The prediction value of $r_{u,j}$ is computed as follows.
$$p_{u,j} = \frac{\sum_{q \in h} Sim(i_j, i_q) \times r_{u,q}}{\sum_{q \in h} Sim(i_j, i_q)}, \tag{14}$$
where $h$ is the set of neighbors, and $Sim(i_j, i_q)$ is the similarity between items $i_j$ and $i_q$.
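A kNN-based prediction step can be sketched as follows. The sketch assumes the common similarity-weighted average over the neighbor set $h$, taken as the k most similar items that the user has already rated; the function name `predict`, the `sim` callback signature, and the `None` return for an unusable neighborhood are all illustrative choices rather than details taken from the paper.

```python
def predict(ratings, u, j, sim, k):
    """Predict user u's rating of item j from the k most similar items u has rated."""
    n_items = len(ratings[0])
    candidates = [(sim(ratings, j, q), q) for q in range(n_items)
                  if q != j and ratings[u][q] > 0]
    neighbors = sorted(candidates, reverse=True)[:k]   # the neighbor set h
    weight = sum(s for s, _ in neighbors)
    if weight <= 0:
        return None   # no usable neighbor; the caller must choose a fallback
    return sum(s * ratings[u][q] for s, q in neighbors) / weight


# Toy data: one user who has rated items 1..3 but not item 0.
toy = [[0, 4, 2, 5]]
table = {(0, 1): 0.9, (0, 2): 0.5, (0, 3): 0.1}   # made-up similarities


def lookup(ratings, j, q):
    """Similarity callback backed by the made-up table above."""
    return table[(j, q)]


p = predict(toy, u=0, j=0, sim=lookup, k=2)
print(round(p, 3))  # 3.286
```

Passing the similarity as a callback keeps the prediction model independent of the measure, so Jaccard, Triangle, TMJ or any other measure can be swapped in without touching `predict`.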

Integrating Triangle and Jaccard similarities
In this section, we first propose the definition of the Triangle similarity. Then we define TMJ and present a complexity analysis. Finally, we present a running example of TMJ.

Triangle
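The Triangle similarity is defined by
$$Triangle(i_j, i_q) = 1 - \frac{\sqrt{\sum_{u \in C_{i_j,i_q}} (r_{u,j} - r_{u,q})^2}}{\sqrt{\sum_{u \in C_{i_j,i_q}} r_{u,j}^2} + \sqrt{\sum_{u \in C_{i_j,i_q}} r_{u,q}^2}}, \tag{15}$$
whose value range is [0, 1], where 0 indicates $C_{i_j,i_q} = \emptyset$. The bigger the value of Triangle, the more similar the two items are.
From the perspective of geometry, Eq (15) can also be written as follows.
$$Triangle(i_j, i_q) = 1 - \frac{|\overrightarrow{AB}|}{|\overrightarrow{OA}| + |\overrightarrow{OB}|}, \tag{16}$$
where $\overrightarrow{OA}$ is the rating vector of $i_j$ and $\overrightarrow{OB}$ is the rating vector of $i_q$. Triangle considers both the lengths of the vectors and the angle between them, so it is more reasonable than the angle-based Cosine measure. For example, given the two vectors $A = (5, 5, 5)$ and $B = (1, 1, 1)$, the Cosine similarity is 1, which is contrary to common sense because the two items are rated very differently. In contrast, the Triangle similarity between them is $1 - \sqrt{48} / (\sqrt{75} + \sqrt{3}) \approx 0.33$, more in line with expectations.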
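The comparison between Cosine and Triangle for $A = (5, 5, 5)$ and $B = (1, 1, 1)$ can be checked numerically. The sketch below implements the geometric description (one minus the third edge divided by the sum of the two edges from the origin) for two fully co-rated vectors; the function names are illustrative, and returning 0.0 for degenerate input is an assumption.

```python
import math


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def cosine(a, b):
    """Cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = norm(a) * norm(b)
    return dot / denom if denom else 0.0


def triangle(a, b):
    """1 - |AB| / (|OA| + |OB|) for two fully co-rated vectors."""
    third_edge = norm([x - y for x, y in zip(a, b)])
    edges = norm(a) + norm(b)
    return 1 - third_edge / edges if edges else 0.0


A, B = (5, 5, 5), (1, 1, 1)
print(round(cosine(A, B), 2))    # 1.0
print(round(triangle(A, B), 2))  # 0.33
```

Identical vectors give a Triangle value of 1, since the third edge then has length 0.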

TMJ
However, Triangle only considers the co-rating users. To provide more information about non co-rating users, we further combine it with the Jaccard measure and hence obtain a new hybrid measure as follows.
$$TMJ(i_j, i_q) = Triangle(i_j, i_q) \times Jaccard(i_j, i_q), \tag{17}$$
which is the product of the Triangle and Jaccard similarities.
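Putting the two factors together, TMJ for a pair of item rating vectors can be sketched as below. The sketch assumes 0 marks an unrated cell, restricts the Triangle factor to the co-rating users, and returns 0.0 when there are none; the second vector is hypothetical and only shares its co-rating users with $i_3$ of Example 1.

```python
import math


def tmj(rj, rq):
    """Triangle (over co-rating users) multiplied by Jaccard (over all raters)."""
    co = [(a, b) for a, b in zip(rj, rq) if a > 0 and b > 0]
    if not co:
        return 0.0   # no co-rating users: both factors are treated as 0
    third_edge = math.sqrt(sum((a - b) ** 2 for a, b in co))
    edges = math.sqrt(sum(a * a for a, _ in co)) + math.sqrt(sum(b * b for _, b in co))
    triangle = 1 - third_edge / edges
    union = sum(1 for a, b in zip(rj, rq) if a > 0 or b > 0)
    jaccard = len(co) / union
    return triangle * jaccard


r1 = [4, 5, 4, 2, 4]   # rating vector of i1 from Example 1
r3 = [5, 0, 1, 0, 2]   # hypothetical vector, not taken from Table 1
print(round(tmj(r1, r3), 3))  # 0.419
```

Here the Triangle factor is about 0.698 and the Jaccard factor is 3/5, so two items that agree on their co-rated users but share few raters are still penalized.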

Complexity analysis
Let the number of users and items be $m$ and $n$, respectively. According to Eqs (9), (15) and (17), the time complexity of computing the Jaccard, Triangle, or TMJ similarity between two items is $O(m)$. kNN is employed to find the $k$ nearest neighbors of each item. For one item, this requires computing its similarity with the other $n - 1$ items, so the time complexity of finding all neighbors is $O(mn)$.
In the leave-one-out cross validation scenario, all ratings should be predicted and validated. Since the maximal number of ratings is $mn$, the time complexity of testing the whole dataset is $O(m^2 n^2)$.
A running example
Consider the rating system given in Table 1. First, the set of co-rating users is obtained as $C_{i_1,i_3} = I_1 \cap I_3 = \{u_1, u_3, u_5\}$. Second, the Triangle similarity between $i_1$ and $i_3$ is computed by Eq (15). Third, the Jaccard similarity between $i_1$ and $i_3$ is computed by Eq (9). Finally, the TMJ similarity between $i_1$ and $i_3$ is computed by Eq (17).

Experiments
In this section, the MAE and the RMSE are applied to evaluate the above 10 similarity measures. Experiments are undertaken on four real-world datasets: MovieLens 100K, MovieLens 1M, FilmTrust, and EachMovie.

Datasets
In the experiments we used four real-world datasets: MovieLens 100K, MovieLens 1M, FilmTrust, and EachMovie. The dataset schema is as follows.
• User (userID, age, gender, occupation)
• Movie (movieID, release-year, genre)
• Rating (userID, movieID)
We used MovieLens 100K (943 users × 1,682 movies), MovieLens 1M (6,040 users × 3,952 movies), FilmTrust (1,508 users × 2,071 movies), and EachMovie (72,916 users × 1,628 movies). The details of these datasets are shown in Table 2. Note that 0 is a rating level in the EachMovie dataset.
Table 3 compares the MAE obtained by recommender systems using the 10 similarity measures. The symbol "-" indicates that the algorithm cannot be completed within an acceptable period of time when the measure is used. The recommender system using the TMJ measure achieves the best (minimal) MAE. On the four datasets, it is lower by 0.4%-5.7%, 0.3%-13.7%, 0.3%-23.8%, and 0.1%-5.5%, respectively, than the values obtained by the other methods. The MAE of Triangle is also acceptable: it ranks fourth on the first dataset and third on the other three.
Figs 2, 3, 4 and 5 compare the MAE obtained by the recommender system using different similarity measures and different k values (i.e., numbers of nearest neighbors). As the figures show, the recommender system always obtains the best MAE when using TMJ, regardless of the k value. For TMJ, the best MAE is obtained when k is 15, 15, 10, and 15 on the four datasets, respectively.
Table 4 compares the RMSE obtained by recommender systems using the 10 similarity measures. The symbol "-" again indicates that the algorithm cannot be completed within an acceptable period of time when the measure is used. The recommender system using the TMJ measure achieves the best (minimal) RMSE. On the four datasets, it is lower by 0.5%-6.6%, 0.3%-18%, 0.1%-22.7%, and 0.1%-6.1%, respectively, than the values obtained by the other methods. The RMSE of Triangle is also acceptable: it ranks fourth on the first dataset and third on the other three.
Figs 6, 7, 8 and 9 compare the RMSE obtained by the recommender system using different similarity measures and different k values (i.e., numbers of nearest neighbors). As the figures show, the recommender system always obtains the best RMSE when using TMJ.

Discussion
From the viewpoint of multiple kernel learning, similarity measures such as Jaccard and Triangle meet the requirements of a kernel function. TMJ is the product of Jaccard and Triangle. According to the property proved in [35] (pages 75-76), TMJ is also a kernel function.
There are various types of recommendation algorithms, such as kNN, NMF, and LMF. NMF algorithms address the recommendation task as a matrix completion problem with high sparsity. They intrinsically work in batch mode to predict all missing values. Since they do not need any similarity measure, we cannot incorporate our new measure into them. In fact, our new measure only serves as the basis of similarity-based prediction models such as kNN. It can replace existing measures, such as Manhattan and Cosine, wherever they are used. In this sense it is general enough. However, support for batch mode is provided by the prediction model rather than by the similarity measure. Hence we do not discuss this issue in more detail. To the best of our knowledge, kNN-based approaches usually predict ratings one by one, even in the split-in-two scenario.

Conclusions
This paper defined the TMJ measure by integrating the Triangle and Jaccard similarities. The new measure outperforms all the counterparts in terms of the MAE and the RMSE. In the future, we will apply the new measure to other tasks, such as three-way recommendation [7,36-42], clustering [2,43], and image processing [5,44,45]. We will also develop other similarity measures in the light of multi-kernel learning [44,46].