ESLI: Enhancing slope one recommendation through local information embedding

Slope one is a popular recommendation algorithm due to its simplicity and high efficiency for sparse data. However, it often suffers from under-fitting since the global information of all relevant users/items is considered. In this paper, we propose a new scheme called enhanced slope one recommendation through local information embedding (ESLI). First, we employ clustering algorithms to obtain user clusters and item clusters that represent local information. Second, we predict ratings using the local information of users and items in the same cluster. The local information can detect strong localized associations shared within clusters. Third, we design different fusion approaches based on the local information embedding. In this way, both under-fitting and over-fitting problems are alleviated. Experimental results on real datasets show that our approaches outperform slope one in terms of both mean absolute error and root mean square error.


Introduction
Collaborative filtering (CF) [1][2][3] is one of the most widely used techniques in recommender systems [4,5]. CF does not rely on the content descriptions of items, but depends purely on preferences expressed by a set of users. Memory-based and model-based CF are the two main approaches [3,6]. The former uses the entire user-item database to make a prediction [7], such as slope one [8], k-nearest neighbor [9], and matrix factorization [10]. The latter first learns a descriptive model of user preferences and then uses it for predicting ratings [11], such as neural network classifiers [12], Bayesian networks [13], and linear classifiers [14].
Data sparsity [15] is one of the main factors affecting the prediction accuracy of CF. Slope one uses a linear regression model to handle data sparsity. By determining the quantitative relationship between two or more items, efficient recommendations can be generated in real time. However, slope one often faces under-fitting since the global information of all users/items is considered.
In this paper, we propose a new approach called enhanced slope one recommendation through local information embedding (ESLI). On the one hand, we try to alleviate the under-fitting caused by slope one's use of global information. This is achieved by using local information of users/items to measure the similarities between users' preferences more accurately. On the other hand, we try to alleviate the over-fitting caused by local information. This is achieved through appropriate granular selection [16] and approach fusion.
First, we employ clustering algorithms to extract local information. Users with similar rating habits will be clustered into one category. The user clusters represent local user information (LU). Correspondingly, items of similar popularity will be clustered into one category. The item clusters represent local item information (LI).
Second, we predict ratings using the local information of users and items in the same cluster. We design three enhanced slope one approaches that embed local information. The local-user-global-item approach (LUGI, also called $A_1$) only embeds user local information. The global-user-local-item approach (GULI, also called $A_2$) only embeds item local information. The local-user-local-item approach (LULI, also called $A_3$) embeds both the user and the item local information.
Third, we design four fusion approaches ($A_4$, $A_5$, $A_6$, and $A_7$) based on the above three basic approaches to make the best prediction. Each fusion approach averages the predictions of two or all three of LUGI, GULI, and LULI. In this way, both under-fitting and over-fitting are alleviated.
To examine the performance of the proposed method, we conducted experiments on the well-known MovieLens and DouBan datasets with a Java implementation. Experimental results show that (1) ESLI decreases both the mean absolute error (MAE) and the root mean square error (RMSE); and (2) the advantage of ESLI over slope one is more prominent on large datasets.
The rest of this paper is organized as follows. First, we present the related work, including the rating system, the slope one algorithm, and clustering algorithms. Second, we discuss how to extract local information and embed it into the slope one algorithm. Subsequently, we present our experimental results for four datasets. Finally, we present the conclusion and further work. All code files and datasets are available from the GitHub repository (https://github.com/FanSmale/ESLI.git) or Supporting Information (see S1 and S2 Files). Table 1 defines notations used throughout the paper.

Related work
The ESLI scheme uses the rating system and the local user/item information as input. The clustering algorithm is employed to obtain the local user/item information using the rating system.

Rating system
Let $U = \{u_0, u_1, \ldots, u_{m-1}\}$ be the set of users and $T = \{t_0, t_1, \ldots, t_{n-1}\}$ be the set of items. The users' ratings of the items form a rating matrix. The rating function is given by [17]

$$R : U \times T \rightarrow V,$$

where $V$ is the rating scale. For convenience, we denote the rating system as an $m \times n$ rating matrix $R = (r_{i,j})_{m \times n}$, where $r_{i,j} = R(u_i, t_j)$, $0 \le i \le m - 1$, and $0 \le j \le n - 1$. Table 2 depicts an example of a rating system, where $m = 5$, $n = 5$ and $V = \{1, 2, \ldots, 10\}$. A "-" indicates that the user has not rated the item.

Slope one
The underlying principle of the slope one algorithm [8] is based on linear regression to determine the extent by which users prefer one item to another. It uses a simple formula f(x) = x + b, where the parameter b represents the average deviation of the ratings of two users or items [8]. Then, given a user's ratings of certain items, we can predict the user's ratings of other items based on the average deviation.
Slope one [8] adapts well to data sparsity and is easy to implement and extend. Because it can generate effective recommendations in real time, it is used in many online recommender systems for movies, music, and books. However, because the average deviation is calculated from global information, it can suffer from under-fitting.
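The prediction rule f(x) = x + b can be illustrated with a minimal Python sketch. It assumes the basic unweighted variant of slope one and a dictionary-based rating store; the function name, the data layout, and the toy ratings are illustrative choices, not taken from the ESLI implementation.

```python
def slope_one_predict(ratings, user, target):
    """Predict `user`'s rating of `target` with unweighted slope one.

    ratings: dict mapping user -> dict mapping item -> rating.
    Returns None when no co-rated item pair supports a prediction.
    """
    estimates = []
    for item, r_ui in ratings[user].items():
        if item == target:
            continue
        # b = average deviation (target - item) over users who rated both items.
        diffs = [row[target] - row[item] for row in ratings.values()
                 if target in row and item in row]
        if diffs:
            estimates.append(r_ui + sum(diffs) / len(diffs))  # f(x) = x + b
    return sum(estimates) / len(estimates) if estimates else None


# Toy ratings invented for illustration only.
toy = {
    "a": {"x": 5, "y": 3, "z": 2},
    "b": {"x": 3, "y": 4},
    "c": {"y": 2, "z": 5},
}
prediction = slope_one_predict(toy, "c", "x")
```

For user c, the deviation of x over y is ((5 - 3) + (3 - 4)) / 2 = 0.5 and the deviation of x over z is 3, so the two estimates are 2.5 and 8 and their mean is 5.25. Every user who rated both items contributes to b, which is the global information the paper refers to.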

Global and local rating information fusion
CF uses rating information to predict users' preferences for items [18][19][20][21]. Rating information can be collected by implicit means, explicit means or both. Implicit ratings are inferred from a user's behaviors. In the explicit collection of ratings, the user is asked to provide an opinion about the item on a rating scale. Explicit ratings provide a more accurate description of a user's preference for an item than implicit ratings. We only take explicit ratings as input in this paper.
CF algorithms typically use global or local information about user preferences to help people make choices. Some studies take global information as input, such as slope one [8] and matrix factorization [22], which leads to under-fitting problems. Some studies take local information as input, such as MG-LCR [23] and UPUC-CF [24], which leads to over-fitting problems. To avoid both problems, some studies combine local and global information to learn models, such as MPMA [25] and GLOMA [26]. However, to the best of our knowledge, the fusion of global and local information has not been applied to the slope one algorithm.

Clustering algorithms
Clustering is used to reveal the intrinsic properties and laws of data [27,28]. It attempts to divide the data samples into several subsets that are usually disjoint [28]. In collaborative filtering, users and items can be grouped into different clusters. User-based clustering [29] groups users with similar rating habits into a cluster. Item-based clustering [3] divides items into different clusters based on the similarity of attributes such as item popularity.
Many clustering algorithms exist, such as k-means [30] and M-distance [31]. k-means [30] randomly selects k samples as the center points and obtains clusters through multiple iterations. It is easy to implement, but its convergence is slow and its clustering results are non-deterministic. M-distance [31] defines the relationship between users or items using the average rating. Compared with k-means clustering, it converges quickly and its clustering results are deterministic.
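To make the idea of deterministic clustering by average rating concrete, the following Python sketch bins users (or items) into k equal-width intervals of their mean rating. This is a simplified stand-in and should not be read as the M-distance algorithm of [31]; the function name, the equal-width binning, and the toy ratings are all assumptions made for illustration.

```python
def cluster_by_average(ratings, k):
    """Assign each key to one of k clusters by its mean rating.

    ratings: dict mapping a user (or item) id -> non-empty list of ratings.
    Equal-width binning is an illustrative assumption, not the method of [31].
    """
    means = {key: sum(values) / len(values) for key, values in ratings.items()}
    lo, hi = min(means.values()), max(means.values())
    width = (hi - lo) / k or 1.0  # avoid a zero width when all means are equal
    # The largest mean falls on the upper boundary, so clamp it into bin k - 1.
    return {key: min(int((mean - lo) / width), k - 1)
            for key, mean in means.items()}


# Toy ratings invented for illustration: two strict users and one tolerant user.
user_ratings = {
    "u0": [1, 3, 4, 5, 2],
    "u4": [2, 4, 3, 5, 1],
    "u1": [6, 8, 9, 8, 7],
}
clusters = cluster_by_average(user_ratings, 2)
```

Because the assignment depends only on the mean ratings, repeated runs give identical clusters, in contrast to k-means with random initial centers.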

ESLI scheme
In this section, we describe our proposed scheme. Firstly, we describe the extraction of local information. Then, we describe the ESLI scheme, which includes three basic approaches and four fusion approaches.

Local information extraction
Local information is intended to capture the rating habits of similar users or the popularity of similar items. Naturally, a clustering algorithm is employed to obtain it. LU and LI represent the local user and local item information, respectively. S1 Fig depicts the schematic diagram of local information extraction.
S1A Fig depicts an example of LU. Users are grouped into clusters based on their rating habits. The first user cluster is composed of $u_0$ and $u_4$. Their ratings are no more than 5 points for all items. They are stricter users who tend to give lower ratings. The second user cluster is composed of $u_1$, $u_2$ and $u_3$. Their ratings are no less than 6 points for all items. They are more tolerant users who tend to give higher ratings.
S1B Fig depicts an example of LI. Items are grouped into clusters based on their popularity. The first item cluster is composed of $t_0$ and $t_4$. They receive many low ratings of 1-2 points, which indicates that they are less popular. The second item cluster is composed of $t_1$, $t_2$ and $t_3$. They receive many high ratings of 8-9 points, which indicates that they are popular items.
S1C Fig depicts an example of LULI. Each cluster contains a subset of users and a subset of items. The first cluster is composed of the user group $\{u_0, u_4\}$ and the item group $\{t_0, t_4\}$. Its ratings are the lowest, 1-2 points. The second cluster is composed of the user group $\{u_0, u_4\}$ and the item group $\{t_1, t_2, t_3\}$. Its ratings are lower, 3-5 points. The third cluster is composed of the user group $\{u_1, u_2, u_3\}$ and the item group $\{t_0, t_4\}$. Its ratings are higher, 6-7 points. The fourth cluster is composed of the user group $\{u_1, u_2, u_3\}$ and the item group $\{t_1, t_2, t_3\}$. Its ratings are the highest, 8-9 points. Within each LULI cluster, the rating distribution is more balanced and the rating similarity is higher than in LUGI and GULI.
Let $g$ index the user cluster and $q$ index the item cluster. Approach $A_1$ uses the sub-matrix $R_{g,\cdot}$ as input and computes the predicted rating $p^{g,\cdot}_{i,j}$ of $u_i$ for $t_j$. Approach $A_2$ uses the sub-matrix $R_{\cdot,q}$ as input and computes the predicted rating $p^{\cdot,q}_{i,j}$ of $u_i$ for $t_j$. Approach $A_3$ uses the sub-matrix $R_{g,q}$ as input and computes the predicted rating $p^{g,q}_{i,j}$ of $u_i$ for $t_j$.
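The three basic approaches differ only in which sub-matrix is passed to slope one, and the fusion approaches average their predictions. The Python sketch below illustrates this under several assumptions: it uses the unweighted slope one variant, the helper names (`slope_one`, `local_predict`, `fuse`) are hypothetical, skipping missing predictions during fusion is a guess at a reasonable policy, and the toy ratings and cluster labels are invented.

```python
def slope_one(ratings, user, target):
    """Unweighted slope one on a dict: user -> {item: rating}."""
    estimates = []
    for item, r_ui in ratings.get(user, {}).items():
        if item == target:
            continue
        diffs = [row[target] - row[item] for row in ratings.values()
                 if target in row and item in row]
        if diffs:
            estimates.append(r_ui + sum(diffs) / len(diffs))
    return sum(estimates) / len(estimates) if estimates else None


def local_predict(ratings, user, target, user_cluster=None, item_cluster=None):
    """Run slope one on the sub-matrix selected by the given clusterings.

    user_cluster / item_cluster map ids to cluster labels; None means global.
    """
    sub = {}
    for u, row in ratings.items():
        if user_cluster is not None and user_cluster[u] != user_cluster[user]:
            continue  # keep only users in the target user's cluster
        # Keep only items in the target item's cluster when one is given.
        sub[u] = {t: r for t, r in row.items()
                  if item_cluster is None
                  or item_cluster[t] == item_cluster[target]}
    return slope_one(sub, user, target)


def fuse(predictions):
    """Average the predictions that exist (skipping None is an assumption)."""
    valid = [p for p in predictions if p is not None]
    return sum(valid) / len(valid) if valid else None


# Toy ratings and cluster labels invented for illustration only.
ratings = {
    "u0": {"t0": 1, "t1": 3},
    "u4": {"t0": 2, "t1": 4, "t2": 5},
    "u1": {"t0": 6, "t1": 8, "t2": 7},
}
users = {"u0": 0, "u4": 0, "u1": 1}
items = {"t0": 0, "t1": 1, "t2": 1}

gugi = local_predict(ratings, "u0", "t2")                      # global baseline
lugi = local_predict(ratings, "u0", "t2", user_cluster=users)  # local users
guli = local_predict(ratings, "u0", "t2", item_cluster=items)  # local items
luli = local_predict(ratings, "u0", "t2", users, items)        # both local
fused = fuse([lugi, guli, luli])                               # all three
```

On this toy data the global prediction is 3, the local-user prediction is 4 because the deviation is computed only from the like-minded user u4, and averaging the three local predictions gives 11/3.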

Enhanced slope one algorithms
Based on S2D Fig, we

Time complexity analysis
Let the number of users and items be m and n, respectively. The complexity analysis includes off-line and on-line phases. Local information can be extracted in the off-line phase by clustering algorithm. For M-distance clustering algorithm [31], the time complexity is O(mn).
In the on-line prediction stage, we discuss the time complexity of predicting a rating.

Experiments
In this section, we report extensive computational tests designed to address the following questions:
1. Does the ESLI model perform better than the existing slope one [8] in terms of MAE and RMSE?
2. Does the ESLI model have a more prominent advantage over the existing slope one [8] when there are more users or items?
Question 1 compares the MAE and RMSE of our proposed scheme with those of the existing slope one. It is the core issue of this paper. Question 2 compares the MAE and RMSE of our proposed scheme with those of the existing slope one at different scales of users or items.
Table 3 lists the basic information of the Movielens 100K (ML100K), Movielens 1M (ML1M), Movielens 10M (ML10M) and DouBan [32] (DB, https://www.cse.cuhk.edu.hk/irwin.king.new/pub/data/douban) datasets. The number of users ranges from 943 to 71,567. The number of items ranges from 1,682 to 39,695. The number of ratings ranges from 100,000 to 10,000,054, while the rating density ranges from 0.78% to 6.30%. The average rating ranges from 3.51 to 3.75.

Datasets
The rating distributions of the four datasets have similar normal distribution characteristics. The rating scale ranges from 0.5 to 5 with a step length of 0.5. The frequency is the highest when the rating is 4, and the second highest when the rating is 3 or 5. For the ML100K dataset, the maximum numbers of ratings per user and per movie are 737 and 583, respectively, and the minimum numbers are 168 and 1. For the ML1M dataset, the maximum numbers of ratings per user and per movie are 2,314 and 3,428, respectively, and the minimum numbers are 341 and 388. For the ML10M dataset, the maximum numbers of ratings per user and per movie are 7,359 and 34,864, respectively, and the minimum numbers are 20 and 1. For the DB dataset, the maximum numbers of ratings per user and per movie are 10,157 and 1,274, respectively, and the minimum numbers are 166 and 1.

Evaluation metrics
We employ MAE [33,34] and RMSE [34,35] as evaluation metrics. The lower the values of MAE and RMSE, the better the performance of the recommender system [36]. Given a rating system, the MAE is calculated by

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left| p_{i,j} - r_{i,j} \right|,$$

and the RMSE is computed by

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left( p_{i,j} - r_{i,j} \right)^2},$$

where $p_{i,j}$ is the predicted rating of $u_i$ for $t_j$, the sums run over the rated entries of the testing set, and $N$ denotes the number of such entries.
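Both metrics can be computed in a few lines. The Python sketch below assumes the standard definitions, averaged over the rated test entries only; the function name and the sample values are illustrative.

```python
import math


def mae_rmse(pairs):
    """Return (MAE, RMSE) for an iterable of (predicted, actual) ratings."""
    errors = [predicted - actual for predicted, actual in pairs]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return mae, rmse


# Three made-up (predicted, actual) pairs with errors -1, 0 and -2.
mae, rmse = mae_rmse([(4, 5), (3, 3), (2, 4)])
```

Here the MAE is (1 + 0 + 2) / 3 = 1 and the RMSE is the square root of 5/3. Squaring penalizes the single large error more heavily, so the RMSE is never smaller than the MAE.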

Experimental design
We design two sets of experiments to answer the questions raised at the beginning of this section.
Exp1. We first determine the parameters $C$ and $E$ (the numbers of user clusters and item clusters, respectively), and then obtain the optimal MAE and RMSE. We employ the k-means and M-distance clustering algorithms to extract user and item local information. To determine the parameters, we vary $C \in [2, 10]$ and $E \in [2, 10]$ and take the values that yield the minimum MAE and RMSE.
Exp2. We analyze how the scale of users and items affects the ESLI scheme. First, we gradually increase the number of users while all items are involved. Second, we gradually increase the number of items while all users are involved.
We randomly divide the entire dataset into a training set and a testing set. 80% of the data are usually specified as a training set and the remaining 20% as a testing set.
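A random 80/20 split of this kind can be sketched in Python as follows. The function name, the (user, item, rating) triple layout, and the fixed seed are assumptions added for reproducibility of the illustration; the source does not state how its split is seeded.

```python
import random


def split_ratings(triples, train_fraction=0.8, seed=0):
    """Shuffle (user, item, rating) triples and split them into train/test."""
    rng = random.Random(seed)  # local generator, so global state is untouched
    shuffled = list(triples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Ten made-up triples: 8 go to the training set and 2 to the testing set.
data = [(u, t, 3) for u in range(2) for t in range(5)]
train, test = split_ratings(data)
```

Splitting individual ratings rather than whole users keeps most users and items present in the training set, which slope one needs in order to compute deviations for them.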

Sensitivity to parameters
Granular selection is one of the important factors affecting the performance of the ESLI scheme [37][38][39].
Because approach $A_7$ fuses all basic approaches, we find the optimal $C$ and $E$ by computing the MAE of approach $A_7$. S3 and S4 Figs show the MAE of approach $A_7$ when $C \in [2, 10]$ and $E \in [2, 10]$ for the ML1M dataset.
In S3 Fig, the number of item clusters is set to 3. As the number of user clusters $C$ increases from 2 to 4, the MAE decreases. As $C$ increases from 4 to 10, the MAE increases. We get the minimum MAE when $C = 4$. In S4 Fig, the number of user clusters is set to 4. As the number of item clusters $E$ varies, the MAE decreases for $E \in [2, 3]$, $E \in [4, 5]$ and $E \in [8, 10]$, and increases for $E \in [3, 4]$ and $E \in [5, 8]$. We get the minimum MAE when $E = 3$.
We analyze the performance of ESLI by changing the number of users/items. S5 and S6 Figs show the MAE comparison between $A_7$ and GUGI (global-user-global-item, i.e., the traditional slope one). In S5 Fig, we fix the number of items, then gradually increase the number of users. As the number of users increases, the advantages of the ESLI scheme become more apparent. In S6 Fig, we fix the number of users, then gradually increase the number of items. As the number of items increases, the advantages of the ESLI scheme become more apparent.
The runtime of all algorithms under M-distance clustering is compared in Table 4. Note that the runtime is the total execution time, which includes the file input and output overhead. The computations were performed on a Windows 10 64-bit operating system with 8 GB RAM and an Intel Core i5 CPU at 3.4 GHz, using Java. For
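The parameter selection can be viewed as a search over the pair (C, E). The Python sketch below performs an exhaustive grid search, which is an assumption: the figures vary one parameter while the other is fixed. The `evaluate` callback is a hypothetical stand-in for "cluster with C user clusters and E item clusters, predict with the fused approach, and return the MAE", and the toy error surface is synthetic.

```python
from itertools import product


def select_granularity(evaluate, c_values=range(2, 11), e_values=range(2, 11)):
    """Return the (C, E) pair with the smallest error reported by `evaluate`."""
    return min(product(c_values, e_values), key=lambda ce: evaluate(*ce))


def toy_error(c, e):
    """Synthetic error surface with its minimum at C = 4, E = 3."""
    return (c - 4) ** 2 + (e - 3) ** 2


best_c, best_e = select_granularity(toy_error)
```

In practice the error should be measured on data held out from the final testing set, otherwise the selected granularity is tuned to the test ratings.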

Comparison of MAE and RMSE
We compare the performance of the ESLI scheme and the traditional slope one (GUGI) in terms of MAE and RMSE.
Table 5 shows MAE comparison under M-distance clustering. For dataset ML100K, approach $A_2$ obtains the lowest MAE, which is 0.21% lower than the traditional GUGI approach. For dataset ML1M, approach $A_4$ obtains the lowest MAE, which is 3.11% lower than the traditional GUGI approach. For dataset ML10M, approach $A_1$ obtains the lowest MAE, which is 4.60% lower than the traditional GUGI approach. For dataset DB, approach $A_3$ obtains the lowest MAE, which is 1.66% lower than the traditional GUGI approach.
Table 6 shows RMSE comparison under M-distance clustering. For dataset ML100K, no ESLI approach obtains a lower RMSE than the traditional GUGI approach. For dataset ML1M, approach $A_1$ obtains the lowest RMSE, which is 2.55% lower than the traditional GUGI approach. For dataset ML10M, approach $A_1$ obtains the lowest RMSE, which is 4.23% lower than the traditional GUGI approach. For dataset DB, approach $A_6$ obtains the lowest RMSE, which is 1.00% lower than the traditional GUGI approach.
Table 7 shows MAE comparison under k-means clustering. For datasets ML100K and DB, no ESLI approach obtains a lower MAE than the traditional GUGI approach. For dataset ML1M, approach $A_5$ obtains the lowest MAE, which is 0.21% lower than the traditional GUGI approach. For dataset ML10M, approach $A_2$ obtains the lowest MAE, which is 0.66% lower than the traditional GUGI approach.
Table 8 shows RMSE comparison under k-means clustering. For datasets ML100K and DB, no ESLI approach obtains a lower RMSE than the traditional GUGI approach. For dataset ML1M, approach $A_5$ obtains the lowest RMSE, which is 0.03% lower than the traditional GUGI approach. For dataset ML10M, approach $A_1$ obtains the lowest RMSE, which is 0.37% lower than the traditional GUGI approach.
In general, the M-distance-based ESLI is superior to the k-means-based ESLI. The k-means clustering is non-deterministic and depends on the initial centers and the distance function. The M-distance clustering is deterministic and depends only on the average rating of the user/item. A user's average rating indicates that user's rating preference, and an item's average rating indicates its popularity. Compared with the k-means clustering method, the M-distance clustering method can better reflect the difference in ratings between different clusters.

Conclusion and further work
In this paper, we propose the ESLI scheme, which extracts local information through clustering and embeds it into slope one. In the ESLI scheme, we design seven different local information embedding approaches. The experimental results show that, with M-distance clustering, our scheme outperforms slope one in terms of both MAE and RMSE on the ML1M, ML10M and DB datasets, with the largest gains on ML10M.
In the future, we will apply the concept of local information embedding to other collaborative filtering algorithms. For model-based recommendation algorithms, the local demographic and occupation information will be considered.