Abstract
This paper proposes a new similarity measure for recommendation by integrating the Triangle and Jaccard similarities. The Triangle similarity considers both the lengths of two rating vectors and the angle between them, while the Jaccard similarity considers non co-rating users. We compare the new similarity measure with eight state-of-the-art ones on four popular datasets under the leave-one-out scenario. Results show that the new measure outperforms all the counterparts in terms of the mean absolute error and the root mean square error.
Citation: Sun S-B, Zhang Z-H, Dong X-L, Zhang H-R, Li T-J, Zhang L, et al. (2017) Integrating Triangle and Jaccard similarities for recommendation. PLoS ONE 12(8): e0183570. https://doi.org/10.1371/journal.pone.0183570
Editor: Quan Zou, Tianjin University, CHINA
Received: June 5, 2017; Accepted: August 7, 2017; Published: August 17, 2017
Copyright: © 2017 Sun et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All code files and datasets are available from the Github database (https://github.com/FanSmale/TMJSimilarity.git).
Funding: This work was supported by National Natural Science Foundation of China (Grant 61379089 and 41604114), http://www.nsfc.gov.cn/, decision to publish and preparation of the manuscript; Natural Science Foundation of the Department of Education of Sichuan Province (Grant 16ZA0060), http://www.scsjyt.gov.cn/, study design; Key Laboratory of Oceanographic Big Data Mining & Application of Zhejiang Province (grant No. OBDMA201601), http://obdm.zjou.edu.cn/, data collection and analysis; and Innovation and Entrepreneurship Foundation of Southwest Petroleum University (Grant SWPUSC16-003), http://www.swpu.edu.cn/, data collection and analysis.
Competing interests: The authors have declared that no competing interests exist.
Introduction
The distance measure is essential in machine learning tasks such as clustering [1, 2], classification [3, 4], image processing [5], and collaborative filtering [6–9]. Collaborative filtering (CF) through k-nearest neighbors (kNN) is a popular memory-based recommendation scheme [10–12]. The key issue of the CF scheme is how to calculate the similarity between users [6, 13] or items [14, 15]. Various similarity measures [16, 17] have been adopted or designed for this purpose. State-of-the-art ones include Cosine [18], Pearson Correlation Coefficient (PCC) [6, 19], Jaccard [20], Proximity Impact Popularity (PIP) [21], New Heuristic Similarity Model (NHSM) [22], and so on. Naturally, new similarity measures providing better prediction ability are always desired.
This paper proposes the Triangle multiplying Jaccard (TMJ) similarity. Only item-based CF [14, 15, 23] is considered, since it performs better than the user-based one [13, 24]. As illustrated in Fig 1, the rating vectors of two items form a triangle in the rating space. The Triangle similarity is one minus the ratio of the length of the third edge to the sum of the lengths of the two edges corresponding to the vectors. Since it only considers the co-rating users, it is not good enough when used alone. Fortunately, the Jaccard similarity complements it in that non co-rating users are considered. Therefore, TMJ can take advantage of both the Triangle and Jaccard similarities.
We compare TMJ with eight existing measures on four popular datasets under the leave-one-out scenario. These datasets are MovieLens 100K, MovieLens 1M, FilmTrust and EachMovie. The leave-one-out scenario is chosen because its result is not influenced by the division of the training/testing sets. Results show that the recommender system using TMJ outperforms all the counterparts in terms of the mean absolute error (MAE) and the root mean square error (RMSE). Specifically, the MAE values obtained on the four datasets are 0.707, 0.671, 0.614 and 0.179, respectively.
In subsequent sections, we first review the basic concepts of memory-based recommender systems and eight popular similarity measures. Second, we present the Triangle and TMJ similarities with a running example, together with a complexity analysis. Subsequently, we analyze the experimental results. Finally, we make our concluding remarks and indicate further work. All code files and datasets are available from the GitHub repository (https://github.com/FanSmale/TMJSimilarity.git).
Related work
In this section, we review eight similarity measures including the Cosine [18], PCC [6, 19], Constrained Pearson Correlation Coefficient (CPCC) [13], Jaccard [20], Bhattacharyya Coefficient (BC) [25, 26], Euclidean similarity (ES) [27, 28], PIP [21] and NHSM [22].
Rating system
The user-item relationship is often expressed by a rating system. Let $U = \{u_1, u_2, \ldots, u_m\}$ be the set of users of a recommender system and $I = \{i_1, i_2, \ldots, i_n\}$ be the set of all possible items that can be recommended to users. Then the rating function is often defined as [29]

$$f: U \times I \to R, \qquad (1)$$

where R is the rating domain used by the users to evaluate items.
For convenience, we let $r_{u,i}$ be the rating of item $i \in I$ given by user $u \in U$, $\mathbf{r}_{i} = (r_{u_1,i}, r_{u_2,i}, \ldots, r_{u_m,i})$ be the rating vector of item $i$, and $U_{j,q} = \{u \in U \mid r_{u,i_j} > 0 \text{ and } r_{u,i_q} > 0\}$ be the set of co-rating users who have rated both $i_j$ and $i_q$. Here we have the following example.
Example 1 Table 1 lists an example of a rating system with R = {1, 2, 3, 4, 5}, where the numbers 1 through 5 represent the five rating levels; 0 indicates that the user has not rated the item. Given $u_4$ and $i_2$, $r_{u_4,i_2} = 1$ means that the rating of $u_4$ to $i_2$ is 1. $\mathbf{r}_{i_1}$ is the rating vector of item $i_1$, and $U_{1,3}$ is the set of co-rating users who have rated both $i_1$ and $i_3$.
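To make the notation concrete, the following sketch builds a small rating matrix and extracts the co-rating users of two items. The matrix values are illustrative assumptions, not the actual contents of Table 1.

```python
import numpy as np

# A small rating matrix in the spirit of Table 1; the concrete values are
# illustrative assumptions, not the paper's actual table. Rows are users
# u1..u5, columns are items i1..i4, and 0 means "not rated".
R = np.array([
    [5, 3, 4, 0],
    [3, 0, 2, 4],
    [0, 1, 0, 5],
    [4, 1, 5, 0],
    [2, 0, 3, 3],
])

def co_rating_users(R, j, q):
    """U_{j,q}: indices of users who rated both item j and item q (0-based)."""
    return np.where((R[:, j] > 0) & (R[:, q] > 0))[0]

print(co_rating_users(R, 0, 2))  # users who rated both i1 and i3 -> [0 1 3 4]
```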
The leave-one-out scenario
Leave-one-out cross validation is a general training/testing scenario for evaluating the performance of a recommender system as well as a classifier. Each time, only one rating is used as the test set, and the remaining ratings are used as the training set. Unlike the split-in-two or 10-fold cross validation scenarios, its result is not influenced by the division of the training/testing sets.
An example of the leave-one-out scenario is listed as follows.
Example 2 Based on Table 1, we first leave one rating $r_{u,i}$ out and replace it with “?”. The purpose is to predict the value of “?”. After we obtain the prediction value $p_{u,i}$, the prediction error is computed as $|r_{u,i} - p_{u,i}|$. Then we restore the value of $r_{u,i}$ and leave the next rating out. This process continues until all ratings have been left out and predicted.
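A minimal sketch of this protocol, assuming ratings are stored in a NumPy matrix with 0 for missing entries and that `predict` is a placeholder for any rating predictor:

```python
import numpy as np

def leave_one_out_errors(R, predict):
    """Hide each known rating in turn, predict it, and collect |error|."""
    errors = []
    for u, i in zip(*np.nonzero(R)):
        held_out = R[u, i]
        R[u, i] = 0                      # leave this single rating out
        p = predict(R, u, i)             # predict from the remaining ratings
        errors.append(abs(held_out - p))
        R[u, i] = held_out               # restore before the next round
    return errors
```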
MAE and RMSE
Given a rating system, the MAE [30] of the predictions is computed by

$$\mathrm{MAE} = \frac{1}{|T|} \sum_{(u,i) \in T} |r_{u,i} - p_{u,i}|, \qquad (2)$$

where T is the set of tested ratings and $p_{u,i}$ is the predicted rating of user $u$ for item $i$, and the RMSE [30] is computed by

$$\mathrm{RMSE} = \sqrt{\frac{1}{|T|} \sum_{(u,i) \in T} (r_{u,i} - p_{u,i})^2}. \qquad (3)$$
They are widely used to evaluate the performance of recommender systems. Naturally, the lower the values of MAE and RMSE, the better the performance of the recommender system.
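Both metrics follow directly from Eqs (2) and (3); a straightforward sketch:

```python
import numpy as np

def mae(true_ratings, predictions):
    """Mean absolute error, Eq (2)."""
    t, p = np.asarray(true_ratings, float), np.asarray(predictions, float)
    return float(np.mean(np.abs(t - p)))

def rmse(true_ratings, predictions):
    """Root mean square error, Eq (3)."""
    t, p = np.asarray(true_ratings, float), np.asarray(predictions, float)
    return float(np.sqrt(np.mean((t - p) ** 2)))

print(mae([4, 3, 5], [3.5, 3, 4]))   # 0.5
print(rmse([4, 3, 5], [3.5, 3, 4]))  # ~0.645
```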
Popular similarities
Various popular similarities are employed in recommender systems.
PIP.
PIP, consisting of three factors (i.e., Proximity, Impact, and Popularity), is defined as [21]

$$\mathrm{PIP}(r_1, r_2) = \mathrm{Proximity}(r_1, r_2) \times \mathrm{Impact}(r_1, r_2) \times \mathrm{Popularity}(r_1, r_2), \qquad (4)$$

where the detailed calculation of each factor can be found in [21].
NHSM.
NHSM, consisting of two factors (i.e., JPSS and URP), is defined as [22]

$$\mathrm{NHSM}(i_j, i_q) = \mathrm{JPSS}(i_j, i_q) \times \mathrm{URP}(i_j, i_q), \qquad (5)$$

where the detailed calculation of each factor can be found in [22].
Cosine.
Cosine, which focuses on the angle between two item vectors, is defined as [18]

$$\mathrm{Cos}(i_j, i_q) = \frac{\mathbf{r}_{i_j} \cdot \mathbf{r}_{i_q}}{\|\mathbf{r}_{i_j}\| \, \|\mathbf{r}_{i_q}\|}, \qquad (6)$$

where $\mathbf{r}_{i_j}$ is the rating vector of item $i_j$.
PCC.
PCC, which considers the linear correlation between two rating vectors, is defined as [6, 19]

$$\mathrm{PCC}(i_j, i_q) = \frac{\sum_{u \in U_{j,q}} (r_{u,i_j} - \bar{r}_{i_j})(r_{u,i_q} - \bar{r}_{i_q})}{\sqrt{\sum_{u \in U_{j,q}} (r_{u,i_j} - \bar{r}_{i_j})^2} \, \sqrt{\sum_{u \in U_{j,q}} (r_{u,i_q} - \bar{r}_{i_q})^2}}, \qquad (7)$$

where $\bar{r}_{i_j}$ is the average rating of item $i_j$.

CPCC.
CPCC, which is based on PCC and considers the impact of positive and negative ratings, is defined as [13]

$$\mathrm{CPCC}(i_j, i_q) = \frac{\sum_{u \in U_{j,q}} (r_{u,i_j} - r_{med})(r_{u,i_q} - r_{med})}{\sqrt{\sum_{u \in U_{j,q}} (r_{u,i_j} - r_{med})^2} \, \sqrt{\sum_{u \in U_{j,q}} (r_{u,i_q} - r_{med})^2}}, \qquad (8)$$

where $r_{med}$ is the median of R. If R = {1, 2, 3, 4, 5}, we have $r_{med} = 3$.
Jaccard.
Jaccard is defined as the size of the intersection divided by the size of the union of the rating users [20]

$$\mathrm{Jaccard}(i_j, i_q) = \frac{|I_j \cap I_q|}{|I_j \cup I_q|}, \qquad (9)$$

where $I_j = \{u \in U \mid r_{u,j} > 0\}$ and $I_q = \{u \in U \mid r_{u,q} > 0\}$.
BC.
BC, which measures similarity by means of two probability distributions, is defined as [25, 26]

$$\mathrm{BC}(i_j, i_q) = \sum_{x \in R} \sqrt{P_{j,x} \, P_{q,x}}, \qquad (10)$$

where $P_{j,x}$ is the probability of rating value $x$ for item $j$.
ES.
Euclidean distance (ED), which is the real distance between two points in Euclidean space, is defined as [27, 28]

$$\mathrm{ED}(i_j, i_q) = \sqrt{\sum_{u \in U_{j,q}} (r_{u,i_j} - r_{u,i_q})^2}. \qquad (11)$$

In Fig 1, |AB| is ED(A, B).
Therefore, ES can be computed by

$$\mathrm{ES}(i_j, i_q) = 1 - \frac{\mathrm{ED}(i_j, i_q)}{\mathrm{ED}_{max}}, \qquad (12)$$

where $\mathrm{ED}_{max}$ is defined as

$$\mathrm{ED}_{max} = \sqrt{|U_{j,q}|} \, (R_{max} - R_{min}), \qquad (13)$$

where $R_{max}$ is the maximum value (e.g., 5) of the rating set R, and $R_{min}$ is the minimum one (e.g., 1).
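For reference, minimal NumPy sketches of four of the reviewed measures, operating on full rating vectors with 0 marking missing ratings. The zero-denominator conventions (returning 0 when no co-rating users exist) are our own assumptions:

```python
import numpy as np

def cosine(rj, rq):
    """Eq (6), computed over the co-rating users."""
    m = (rj > 0) & (rq > 0)
    a, b = rj[m].astype(float), rq[m].astype(float)
    if a.size == 0:
        return 0.0
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def pcc(rj, rq):
    """Eq (7): Pearson correlation over the co-rating users."""
    m = (rj > 0) & (rq > 0)
    a, b = rj[m].astype(float), rq[m].astype(float)
    if a.size < 2:
        return 0.0
    a -= a.mean()
    b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return a @ b / denom if denom > 0 else 0.0

def jaccard(rj, rq):
    """Eq (9): |intersection| / |union| of the rating users."""
    Ij, Iq = rj > 0, rq > 0
    union = np.sum(Ij | Iq)
    return np.sum(Ij & Iq) / union if union > 0 else 0.0

def es(rj, rq, r_max=5, r_min=1):
    """Eqs (11)-(13): Euclidean similarity normalized by ED_max."""
    m = (rj > 0) & (rq > 0)
    a, b = rj[m].astype(float), rq[m].astype(float)
    if a.size == 0:
        return 0.0
    ed = np.linalg.norm(a - b)
    ed_max = np.sqrt(a.size) * (r_max - r_min)
    return 1.0 - ed / ed_max
```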
kNN-based CF approach
CF schemes include memory-based and model-based [31, 32] methods. The kNN [33, 34] algorithm is one of the most fundamental CF recommendation techniques, and we adopt the kNN-based CF approach to predict the ratings. One key to kNN algorithms is the definition of the similarity measure; popular measures have been presented above. The prediction value of $r_{u,j}$ is computed as follows:

$$p_{u,j} = \frac{\sum_{i_q \in h} \mathrm{Sim}(i_j, i_q) \, r_{u,i_q}}{\sum_{i_q \in h} |\mathrm{Sim}(i_j, i_q)|}, \qquad (14)$$

where $h$ is the set of the $k$ nearest neighbor items of $i_j$ that user $u$ has rated, and $\mathrm{Sim}(i_j, i_q)$ is the similarity between items $i_j$ and $i_q$.
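A sketch of Eq (14), where the choice of neighbors as the k most similar items already rated by the target user is an implementation assumption:

```python
import numpy as np

def predict_knn(R, u, j, sim, k=15):
    """Predict r_{u,j} by Eq (14): a similarity-weighted average over the
    k nearest neighbor items of i_j that user u has rated."""
    candidates = [q for q in range(R.shape[1]) if q != j and R[u, q] > 0]
    scored = sorted(((sim(R[:, j], R[:, q]), q) for q in candidates),
                    reverse=True)[:k]
    den = sum(abs(s) for s, _ in scored)
    if den == 0:
        return 0.0                      # no usable neighbors
    return sum(s * R[u, q] for s, q in scored) / den

# Usage: predict_knn(R, u=0, j=3, sim=jaccard, k=15)
```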
Integrating Triangle and Jaccard similarities
In this section, we first propose the definition of the Triangle similarity. Then we define TMJ and present its complexity analysis. Finally, we present a running example of TMJ.
Triangle
The Triangle similarity is defined by

$$\mathrm{Triangle}(i_j, i_q) = 1 - \frac{\sqrt{\sum_{u \in U_{j,q}} (r_{u,i_j} - r_{u,i_q})^2}}{\sqrt{\sum_{u \in U_{j,q}} r_{u,i_j}^2} + \sqrt{\sum_{u \in U_{j,q}} r_{u,i_q}^2}}, \qquad (15)$$

whose value range is [0, 1], where 0 indicates $U_{j,q} = \emptyset$. The bigger the value of Triangle, the more similar the two items are.
From the perspective of geometry, Eq (15) can also be written as

$$\mathrm{Triangle}(i_j, i_q) = 1 - \frac{\|\mathbf{r}_{i_j} - \mathbf{r}_{i_q}\|}{\|\mathbf{r}_{i_j}\| + \|\mathbf{r}_{i_q}\|}, \qquad (16)$$

where $\mathbf{r}_{i_j}$ is the rating vector of $i_j$ and $\mathbf{r}_{i_q}$ is the rating vector of $i_q$, both restricted to the co-rating users.
Triangle considers both the lengths of the vectors and the angle between them, so it is more reasonable than the purely angle-based Cosine measure. For example, given the two vectors A = (5, 5, 5) and B = (1, 1, 1), the Cosine similarity is 1, which is contrary to common sense. In contrast, the Triangle similarity between them is 0.33, more in line with expectations.
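This example is easy to verify numerically; a quick check using Eq (16):

```python
import numpy as np

def triangle(a, b):
    """Eq (16): 1 - |a - b| / (|a| + |b|)."""
    return 1.0 - np.linalg.norm(a - b) / (np.linalg.norm(a) + np.linalg.norm(b))

A = np.array([5.0, 5.0, 5.0])
B = np.array([1.0, 1.0, 1.0])
cos = A @ B / (np.linalg.norm(A) * np.linalg.norm(B))
print(round(cos, 2))             # 1.0 : Cosine ignores vector length
print(round(triangle(A, B), 2))  # 0.33: Triangle penalizes the length gap
```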
TMJ
However, Triangle only considers the co-rating users. To provide more information about non co-rating users, we further combine it with the Jaccard measure, hence obtaining a new hybrid measure:

$$\mathrm{TMJ}(i_j, i_q) = \mathrm{Triangle}(i_j, i_q) \times \mathrm{Jaccard}(i_j, i_q), \qquad (17)$$

which is the product of the Triangle and Jaccard similarities.
Complexity analysis
Let the number of users and items be m and n, respectively. According to Eqs (9), (15) and (17), the time complexity of computing the similarity between two items is O(m) for Jaccard, Triangle, and TMJ.
kNN is employed to find the nearest k neighbors for each item. Therefore, for one item, the time complexity of finding all neighbors is O(mn).
In the leave-one-out cross validation scenario, all ratings should be predicted and validated. Since the maximal number of ratings is mn, the time complexity of testing the whole dataset is $O(m^2 n^2)$.
A running example
Given the rating system of Table 1, the TMJ similarity between $i_1$ and $i_3$ is obtained in three steps. First, the set of co-rating users $U_{1,3}$ is read off the table. Second, the Triangle similarity between $i_1$ and $i_3$ is computed with Eq (15), and the Jaccard similarity with Eq (9). Finally, the TMJ similarity between $i_1$ and $i_3$ is the product of the two, following Eq (17).
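The whole pipeline fits in a few lines; a sketch on two illustrative rating vectors (assumed values, since Table 1 is not reproduced here):

```python
import numpy as np

def triangle(rj, rq):
    """Eq (15): Triangle similarity over the co-rating users."""
    m = (rj > 0) & (rq > 0)
    a, b = rj[m].astype(float), rq[m].astype(float)
    if a.size == 0:
        return 0.0
    return 1.0 - np.linalg.norm(a - b) / (np.linalg.norm(a) + np.linalg.norm(b))

def jaccard(rj, rq):
    """Eq (9): |intersection| / |union| of the rating users."""
    Ij, Iq = rj > 0, rq > 0
    return np.sum(Ij & Iq) / np.sum(Ij | Iq)

def tmj(rj, rq):
    """Eq (17): Triangle multiplied by Jaccard."""
    return triangle(rj, rq) * jaccard(rj, rq)

# Illustrative rating vectors for i1 and i3 over five users.
r1 = np.array([5, 3, 0, 4, 2])
r3 = np.array([4, 2, 0, 5, 3])
print(triangle(r1, r3), jaccard(r1, r3), tmj(r1, r3))
```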
Experiments
In this section, the MAE and the RMSE are applied to evaluate the above 10 similarity measures (the eight reviewed ones plus Triangle and TMJ). Experiments are undertaken on four real-world datasets: MovieLens 100K, MovieLens 1M, FilmTrust, and EachMovie.
Datasets
In the experiments we used four real-world datasets: MovieLens 100K, MovieLens 1M, FilmTrust, and EachMovie. The dataset schema is as follows.
- User (userID, age, gender, occupation)
- Movie (movieID, release-year, genre)
- Rating (userID, movieID)
We used MovieLens 100K (943 users × 1,682 movies), MovieLens 1M (6,040 users × 3,952 movies), FilmTrust (1,508 users × 2,071 movies), and EachMovie (72,916 users × 1,628 movies). The details of these datasets are shown in Table 2. Note that, unlike the other datasets, EachMovie includes 0 as a valid rating level.
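As an example of preparing such data, the following sketch loads the MovieLens 100K ratings file `u.data` (tab-separated userID, movieID, rating, timestamp) into a users × movies matrix; the file path is an assumption:

```python
import numpy as np

def load_movielens_100k(path="u.data", n_users=943, n_items=1682):
    """Build a (users x items) rating matrix with 0 for missing ratings."""
    R = np.zeros((n_users, n_items), dtype=np.int8)
    with open(path) as f:
        for line in f:
            user, item, rating, _timestamp = line.split("\t")
            R[int(user) - 1, int(item) - 1] = int(rating)  # IDs are 1-based
    return R
```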
Comparison of the MAE
Table 3 compares the MAE obtained by recommender systems using the 10 similarity measures. The symbol “–” indicates that the algorithm cannot be completed within an acceptable period of time when the measure is used. The recommender system using the TMJ measure achieves the best (minimal) MAE. On the four datasets, it is lower by 0.4%–5.7%, 0.3%–13.7%, 0.3%–23.8%, and 0.1%–5.5%, respectively, than the values obtained by the other methods. The MAE of Triangle is also acceptable: it ranks fourth on the first dataset and third on the other three.
Figs 2, 3, 4 and 5 compare the MAE obtained by the recommender system using different similarity measures and different values of k (the number of nearest neighbors). As we can see from these figures, the recommender system always obtains the best MAE when using TMJ, regardless of the k value. The best MAE is reached when k is 15, 15, 10, and 15 on the four datasets, respectively.
Comparison of the RMSE
Table 4 compares the RMSE obtained by recommender systems using the 10 similarity measures. The symbol “–” indicates that the algorithm cannot be completed within an acceptable period of time when the measure is used. The recommender system using the TMJ measure achieves the best (minimal) RMSE. On the four datasets, it is lower by 0.5%–6.6%, 0.3%–18%, 0.1%–22.7%, and 0.1%–6.1%, respectively, than the values obtained by the other methods. The RMSE of Triangle is also acceptable: it ranks fourth on the first dataset and third on the other three.
Figs 6, 7, 8 and 9 compare the RMSE obtained by the recommender system using different similarity measures and different values of k (the number of nearest neighbors). As we can see from these figures, the recommender system always obtains the best RMSE when using TMJ, regardless of the k value. The best RMSE is reached when k is 15, 15, 10, and 15 on the four datasets, respectively.
Discussion
From the viewpoint of multiple kernel learning, similarity measures such as Jaccard and Triangle satisfy the requirements of a kernel function. TMJ is the product of Jaccard and Triangle. According to the closure property proved in [35] (pages 75–76), the product of two kernels is also a kernel; hence TMJ is also a kernel function.
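In symbols, the closure argument reads as follows (a restatement of the property in [35], not a new result):

```latex
% If K_1 and K_2 are kernels over X x X, their product
% K(x, y) = K_1(x, y) * K_2(x, y) is also a kernel.
\[
  \mathrm{TMJ}(i_j, i_q)
    = \underbrace{\mathrm{Triangle}(i_j, i_q)}_{K_1}
      \times
      \underbrace{\mathrm{Jaccard}(i_j, i_q)}_{K_2}
\]
% hence TMJ inherits the kernel property from its two factors.
```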
There are various types of recommendation algorithms, such as kNN, NMF, LMF, etc. NMF algorithms address the recommendation task as a matrix completion problem with high sparsity. They intrinsically work in batch mode to predict all missing values. Since they do not need any similarity measure, we cannot incorporate our new measure into them. In fact, our new measure only serves as the basis of similarity-based prediction models such as kNN, where it can replace existing measures such as Manhattan and Cosine. In this sense it is general enough. However, support for batch mode is provided by the prediction model rather than by the similarity measure, hence we do not discuss this issue in more detail. To the best of our knowledge, kNN-based approaches usually predict ratings one by one even under the split-in-two scenario.
Conclusions
This paper defined the TMJ measure by integrating Triangle and Jaccard similarities. The new measure outperforms all the counterparts in terms of the MAE and the RMSE. In the future, we will apply the new measure to other tasks, such as the three-way recommendation [7, 36–42], clustering [2, 43], and image processing [5, 44, 45]. We will also develop other similarity measures in the light of multi-kernel learning [44, 46].
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grants 61379089 and 41604114, the Natural Science Foundation of the Department of Education of Sichuan Province under Grant 16ZA0060, the Key Laboratory of Oceanographic Big Data Mining & Application of Zhejiang Province under Grant OBDMA201601, and the Innovation and Entrepreneurship Foundation of Southwest Petroleum University under Grant SWPUSC16-003.
References
- 1. Xing EP, Ng AY, Jordan MI, Russell S. Distance Metric Learning, With Application To Clustering With Side-Information. Advances in Neural Information Processing Systems. 2002;15:505–512.
- 2. Wang M, Min F, Zhang ZH, Wu YX. Active learning through density clustering. Expert Systems with Applications. 2017;85.
- 3. Ling H, Jacobs DW. Shape classification using the inner-distance. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2007;29(2):286–299. pmid:17170481
- 4. Weinberger KQ, Saul LK. Distance Metric Learning for Large Margin Nearest Neighbor Classification. Journal of Machine Learning Research. 2009;10(1):207–244.
- 5. Li C, Xu C, Gui C, Fox MD. Distance Regularized Level Set Evolution and Its Application to Image Segmentation. IEEE Transactions on Image Processing A Publication of the IEEE Signal Processing Society. 2010;19(12):3243–3254. pmid:20801742
- 6. Resnick P, Iacovou N, Suchak M, Bergstrom P, Riedl J. GroupLens: an open architecture for collaborative filtering of netnews. In: Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work. ACM; 1994. p. 175–186.
- 7. Zhang HR, Min F, Shi B. Regression-based three-way recommendation. Information Sciences. 2017;378:444–461.
- 8. Yu H, Li JH. Algorithm to solve the cold-start problem in new item recommendations. Journal of Software. 2015;26(6):1395–1408.
- 9. Katarya R, Verma OP. Effectual recommendations using artificial algae algorithm and fuzzy c-mean. Swarm and Evolutionary Computation. 2017.
- 10. Jeong B, Lee J, Cho H. Improving memory-based collaborative filtering via similarity updating and prediction modulation. Information Sciences. 2010;180(5):602–612.
- 11. Ghazarian S, Nematbakhsh MA. Enhancing memory-based collaborative filtering for group recommender systems. Expert Systems with Applications. 2015;42(7):3801–3812.
- 12. Yu H, Zhou B, Deng MY, Hu F. Tag recommendation method in folksonomy based on user tagging status. Journal of Intelligent Information Systems.
- 13. Shardanand U. Social information filtering: algorithms for automating word of mouth. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems; 1995. p. 210–217.
- 14. Sarwar B, Karypis G, Konstan J, Riedl J. Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web. ACM; 2001. p. 285–295.
- 15. Barragáns-Martínez AB, Costa-Montenegro E, Burguillo JC, Rey-López M, Mikic-Fonte FA, Peleteiro A. A hybrid content-based and item-based collaborative filtering approach to recommend TV programs enhanced with singular value decomposition. Information Sciences. 2010;180(22):4290–4311.
- 16. Su XY, Khoshgoftaar TM. A survey of collaborative filtering techniques. Advances in Artificial Intelligence. 2009;2009(12):4.
- 17. Zou Q, Li JJ, Song L, Zeng XX, Wang GH. Similarity computation strategies in the microRNA-disease network: a survey. Knowledge-Based Systems. 2016;105:190–205.
- 18. Salton G, McGill MJ. Introduction to Modern Information Retrieval. McGraw-Hill; 1983.
- 19. Pearson K, Stouffer SA, David FN. On the distribution of the correlation coefficient in small samples. Biometrika. 1932;24(3-4):382–403.
- 20. Jaccard P. Nouvelles recherches sur la distribution florale. Bull Soc Vaud Sci Nat. 1908;44:223–270.
- 21. Ahn HJ. A new similarity measure for collaborative filtering to alleviate the new user cold-starting problem. Information Sciences. 2008;178(1):37–51.
- 22. Liu HF, Hu Z, Mian A, Tian H, Zhu XZ. A new user similarity model to improve the accuracy of collaborative filtering. Knowledge-Based Systems. 2014;56:156–166.
- 23. Deshpande M, Karypis G. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems. 2004;22(1):143–177.
- 24. Schafer JB, Konstan J, Riedl J. Recommender systems in e-commerce. In: Proceedings of the 1st ACM Conference on Electronic Commerce; 1999. p. 158–166.
- 25. Bhattacharyya A. On a measure of divergence between two statistical populations defined by their probability distributions. Bull Calcutta Math Soc. 1942;35:99–109.
- 26. Patra BK, Launonen R, Ollikainen V, Nandi S. A new similarity measure using Bhattacharyya coefficient for collaborative filtering in sparse data. Knowledge-Based Systems. 2015;82:163–177.
- 27. Elmore KL, Richman MB. Euclidean Distance as a Similarity Metric for Principal Component Analysis. Monthly Weather Review. 2001;129(3):540–549.
- 28. Qian G, Sural S, Gu YL, Pramanik S. Similarity between Euclidean and cosine angle distance for nearest neighbor queries. In: Proceedings of the ACM Symposium on Applied Computing. ACM; 2004. p. 1232–1237.
- 29. Zhang HR, Min F. Three-way recommender systems based on random forests. Knowledge-Based Systems. 2016;91:275–286.
- 30. Willmott CJ, Matsuura K. Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Climate Research. 2005;30:79–82.
- 31. Billsus D, Pazzani MJ. Learning collaborative information filters. In: Proceedings of the Fifteenth International Conference on Machine Learning; 1998.
- 32. Hofmann T. Collaborative filtering via Gaussian probabilistic latent semantic analysis. In: Proceedings of the 26th International ACM SIGIR Conference on Research and Development in Information Retrieval; 2003. p. 259–266.
- 33. Zhang ML, Zhou ZH. ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognition. 2007;40(7):2038–2048.
- 34. Song Y, Liang J, Lu J, Zhao X. An efficient instance selection algorithm for k nearest neighbor regression. Neurocomputing. 2017;251:26–34.
- 35. Shawe-Taylor J, Cristianini N. Kernel Methods for Pattern Analysis. Cambridge University Press; 2004.
- 36. Huang J, Wang J, Yao YY, Zhong N. Cost-sensitive three-way recommendations by learning pair-wise preferences. International Journal of Approximate Reasoning. 2017;86:28–40.
- 37. Li XN, Yi HJ, She YH, Sun BZ. Generalized three-way decision models based on subset evaluation. International Journal of Approximate Reasoning. 2017;83:142–159.
- 38. Li HX, Zhang LB, Huang B, Zhou XZ. Sequential three-way decision and granulation for cost-sensitive face recognition. Knowledge-Based Systems. 2016;91:241–251.
- 39. Huang CC, Li JH, Mei CL, Wu WZ. Three-way concept learning based on cognitive operators: An information fusion viewpoint. International Journal of Approximate Reasoning. 2017; p. 1–20.
- 40. Xu W, Guo Y. Generalized multigranulation double-quantitative decision-theoretic rough set. Knowledge-Based Systems. 2016;105:190–205.
- 41. Li Y, Zhang ZH, Chen WB, Min F. TDUP: an approach to incremental mining of frequent itemsets with three-way-decision pattern updating. International Journal of Machine Learning and Cybernetics. 2015; p. 1–13.
- 42. Gao C, Yao YY. Actionable strategies in three-way decisions. Knowledge-Based Systems. 2017.
- 43. Xue ZA, Cen F, Wei LP. A weighting fuzzy clustering algorithm based on Euclidean distance. In: Proceedings of the 5th International Conference on Fuzzy Systems and Knowledge Discovery. vol. 1; 2008. p. 172–175.
- 44. Molina-Giraldo S, Álvarez-Meza AM, Peluffo-Ordóñez DH, Castellanos-Domínguez G. Image segmentation based on multi-kernel learning and feature relevance analysis; 2012.
- 45. Zhao H, Zhu PF, Wang P, Hu QH. Hierarchical feature selection with recursive regularization. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence; 2017.
- 46. Zhang JF. Chaotic time series prediction based on multi-kernel learning support vector regression. Acta Physica Sinica. 2008;57(5):2708–2713.