Modeling user rating preference behavior to improve the performance of the collaborative filtering based recommender systems

One of the main concerns for online shopping websites is to provide efficient and customized recommendations to a very large number of users based on their preferences. Collaborative filtering (CF) is the best-known type of recommender system method for providing personalized recommendations to users. CF generates recommendations by identifying clusters of similar users or items from the user-item rating matrix, generally by means of a similarity measure. Among the many similarity measures proposed by researchers, the Pearson correlation coefficient (PCC) is commonly used in CF-based recommender systems. The standard PCC suffers from some inherent limitations and ignores user rating preference behavior (RPB). Typically, users differ in their RPB: some users give the same rating to various items without liking them, and some tend to give an average rating despite liking the items. Traditional similarity measures (including PCC) do not consider this rating pattern. In this article, we present a novel similarity measure that considers user RPB while calculating similarity among users. The proposed similarity measure states user RPB as a function of the user's average rating value and variance or standard deviation. The user RPB is then combined with an improved model of the standard PCC to form an improved similarity measure for CF-based recommender systems, named improved PCC weighted with RPB (IPWR). The qualitative and quantitative analysis of the IPWR similarity measure is performed using five state-of-the-art datasets (i.e. Epinions, MovieLens-100K, MovieLens-1M, CiaoDVD, and MovieTweetings). The IPWR similarity measure performs better than state-of-the-art similarity measures in terms of mean absolute error (MAE), root mean square error (RMSE), precision, recall, and F-measure.

Traditional CF-based methods compute similarities between users over all co-rated items, including items that differ from the target items. Therefore, the neighbors of the active user remain the same even for different items. The most important part of a CF-based method is to find similarities between different users. Commonly used recommender systems are based on traditional similarity measures such as PCC or cosine vector similarity [13,20,21], which consider only local context information. Despite their enormous success, traditional similarity measures still have some issues: they calculate user similarity without considering user RPB. Generally, many users do not rate items according to their quality. These users can be categorized into two types: those who rate every item with almost the same rating, leading to zero variance in their ratings, and those who give an average rating to all items regardless of item quality. This creates a serious problem when calculating similarities among users and often leads to poor recommendations [22]. Little research has been carried out to handle this type of user behavior despite its high impact on recommendations [23][24][25][26]. Several design objectives must be met to make a recommender system successful; they are discussed below:
a) Accurate: Accuracy is one of the most important design objectives of a recommender system. It helps to build the trust of users when they interact with the recommender system. When a user buys a recommended product and, after using it for some time, realizes that the system gave a wrong recommendation, the user stops trusting that recommender system. Therefore, the main objective of a recommender system is to give an accurate prediction for items to any user. The proposed IPWR similarity measure method gives more accurate recommendations than the state-of-the-art similarity measure methods used for recommender systems.
b) Scalable: A good recommender system should be able to handle large datasets and generate predictions in real time. When the number of users and items increases, the search space grows as well, and it may become difficult to produce results in real time if the recommender system is not scalable. There is always a conflict between the accuracy and the scalability of a recommender system.
c) Overspecialization: In CF-based methods, the items recommended to a user are those most similar to the user profile. Typical recommender system methods cannot give any recommendations about non-co-rated items. The IPWR similarity measure method overcomes this scenario by considering the ratings of non-co-rated items.
In this article, we present a novel method for recommender system known as IPWR similarity measure. It takes into account the user RPB towards an item rating to improve standard PCC similarity measure method. To record the user RPB, two methods are proposed: the first method uses mean and variance for each user and it is known as IPWR with variance. The second method uses mean and standard deviation (SD) for each user and it is known as IPWR with SD. The results from either method are then linearly combined with improved PCC similarity measure method. The performance of the IPWR similarity measure method is evaluated on the four state-of-the-art datasets for recommender systems using state-of-the-art similarity measure methods. The IPWR similarity measure outperforms state-of-the-art similarity measures in terms of MAE, RMSE, precision, recall, and F-measure.
The main contributions of this article are as follows: 1. A simple yet highly effective similarity measure method is proposed to model the rating preference behavior (RPB) of users.

Related work
Collaborative filtering (CF) is now commonly used in many fields for personalized recommendation [2,12,13,27-32]. However, CF also has some issues, such as accuracy, scalability, and cold start. In this paper, the main focus is to improve the prediction accuracy. In CF, items are recommended to users according to their preferences; therefore, it is very important that the history of users' preferences is available. Different researchers have worked on prediction accuracy to improve the performance of recommender systems. For instance, Ahn et al. [20] propose a solution for CF known as the proximity impact popularity (PIP) measure to address the shortcomings of standard PCC and cosine similarity. The PIP measure combines three aspects of user ratings: proximity, impact, and popularity. The PIP similarity considers only the local information of user ratings, while the global preference of user ratings is ignored. Moreover, the results of the PIP similarity measure are not normalized, which makes it difficult to combine with other similarity measures. To resolve this issue, the weighted Pearson correlation coefficient (WPCC) method is proposed in [33]. WPCC captures the confidence that can be placed on the neighbors: when the number of rated items increases, the confidence also increases, and vice versa. Jamali et al. [34] propose a similarity measure based on the sigmoid function, which weakens the similarity of users who have only a small number of commonly rated items. Bobadilla [35] proposes adjusted cosine similarity to overcome the deficiencies of traditional cosine similarity; however, it does not consider the users' preferences. Bobadilla et al. [36] propose a novel similarity measure that combines two similarity measures, the mean squared difference and the Jaccard similarity.
Another metric called mean Jaccard difference (MJD) is proposed to address the cold start problem. This metric involves three steps. First, the similarity metric is selected. The second step is an evaluation, in which weights are evaluated using neural networks. The last step is a prediction, which is obtained according to the selected similarity metric. A singularity-based similarity measure is proposed in [35]. In this similarity measure, it is assumed that the obtained results can be improved by taking contextual information into account. The user ratings are grouped as positive and negative, and the singularity value of user and item is computed. The experimental results show the effectiveness of the proposed similarity. A significance-based similarity measure is proposed by Bobadilla et al. [35]. In this method, the significance of an item, the significance of a user, and the significance of an item for a user are computed. Then, according to significance, similarity among users is computed using the standard Pearson correlation coefficient or cosine similarity. It also uses a data smoothing technique for the similarity measure, which is the most widely used technique of recommender systems. Different sparsity measures are also used to improve the accuracy of the recommender system. Ma et al. [37] propose a similarity measure in which information about users and items is taken into account and a threshold is set for each. Gong et al. [38] propose another method that fills the missing ratings by merging SVD with an item-based recommender. It uses the item-based method to recommend items to the user.
Szwabe et al. [39] propose a hybrid recommender system method that operates in two stages. It processes the data with content features that describe the items and users' preferences. It improves the accuracy of a system without raising the computational complexity. Moreover, probabilistic matrix factorization is also merged into the recommender system to address issues such as data sparsity and cold start. Polatidis [40] also proposes a novel similarity measure, which applies four different thresholds on the number of co-rated items using PCC to improve the accuracy of the recommender system. Liu et al. [33] propose a novel similarity measure known as the new heuristic similarity method (NHSM). It computes three parameters for each co-rated item: proximity, significance, and singularity. Each computed parameter is then multiplied by the modified Jaccard similarity. The obtained similarity is multiplied once more by a function named URP to obtain the resultant NHSM similarity [20]. The computation of NHSM-based similarity is complex and lengthy, which makes it difficult to produce a result in real time. All factors in NHSM are multiplied repeatedly, which ultimately weakens the performance and makes it difficult to combine these results with other similarity measures. A novel similarity measure based on the Bhattacharyya coefficient is proposed by Patra et al. [41]. This method considers both co-rated and non-co-rated items. The resultant similarity measure is a linear combination of the Bhattacharyya coefficient, PCC similarity, and Jaccard similarity. Sun et al. [42] propose a novel similarity measure that combines triangle and Jaccard similarities to improve the performance of the recommender system. Sadasivam et al. [43] propose a novel similarity measure for recommender systems, which modifies the Bhattacharyya coefficient using an exponential function and then combines it with Jaccard, followed by proximity, significance, and singularity (PSS) measures, using a weighted scheme.

Shortcomings of standard PCC
The standard PCC suffers from some shortcomings, which are discussed below in conjunction with Cosine and CPCC similarity measures.
Shortcoming 1: Flat value of ratings. If the rating vector of user1 is flat, such as (1,1,1), (3,3,3), or (5,5,5), and the rating vector of user2 is (1,5,1), PCC is not a number (NaN). The Cosine value is 0.777, and the CPCC value depends on whether the flat rating vector lies below, at, or above the median rating value of the rating scale (i.e. median value = 3). The CPCC value is +0.333 if the rating vector consists of values less than the median, -0.333 if the values are greater than the median, and NaN if all rating values are 3.
Shortcoming 2: Only a single co-rated item. If two users share a single co-rated item, PCC is NaN and Cosine is 1.0. There are two cases for CPCC. In the first case, when both users give the same rating, CPCC is 1.0 for any value above or below the median and NaN if the rating equals the median. In the second case, when the users' common ratings differ, CPCC is also NaN.

Shortcoming 3: Ignorance of user rating preference behavior (RPB). The rating preferences may vary from user to user. Some users may rate every item high and some may rate every item low. This aspect of user RPB is not considered in standard PCC.
Shortcoming 4: Ignorance of the corresponding item average rating in user-based CF. In user-based CF, standard PCC considers only the average rating of users and ignores the average rating of the corresponding item. Similarly, in item-based CF, user averages are ignored.
Keeping in view the aforementioned shortcomings of the standard PCC, we propose a similarity measure named improved PCC weighted with RPB (IPWR). In the IPWR similarity measure method, user RPB is modeled as a Cosine function of user averages and variance or SD. Almost all the methods discussed in the related work, as well as the state-of-the-art similarity measure methods for recommender systems, ignore this rating behavior. The calculated user RPB is then linearly combined with the improved PCC to enhance the performance of the IPWR similarity measure method.
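Shortcomings 1 and 2 can be checked numerically. The following is a minimal Python sketch, not the authors' implementation: the helper names `pcc`, `cosine`, and `cpcc` are illustrative, both vectors are assumed to hold ratings on the same co-rated items, and returning NaN for a zero denominator is an assumption about how the undefined 0/0 case is reported.

```python
import math


def _normalized_dot(dx, dy):
    """Dot product of two deviation vectors divided by their norms (NaN if a norm is 0)."""
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    return num / den if den else float("nan")


def pcc(x, y):
    """Standard PCC: deviations are taken from each user's own mean rating."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return _normalized_dot([a - mx for a in x], [b - my for b in y])


def cosine(x, y):
    """Cosine similarity on the raw rating vectors."""
    return _normalized_dot(x, y)


def cpcc(x, y, median=3.0):
    """CPCC: deviations are taken from the median of the rating scale (assumed 3)."""
    return _normalized_dot([a - median for a in x], [b - median for b in y])


if __name__ == "__main__":
    user2 = [1, 5, 1]
    for flat in ([1, 1, 1], [3, 3, 3], [5, 5, 5]):
        print(flat, pcc(flat, user2), cosine(flat, user2), cpcc(flat, user2))
    # Single co-rated item with the same rating for both users.
    print(pcc([4], [4]), cosine([4], [4]), cpcc([4], [4]))
```

For the flat vector (1,1,1) against (1,5,1), `pcc` is NaN because the flat user's deviations from the mean are all zero, `cosine` is 7/9 (reported above as 0.777), and `cpcc` is 1/3; the sign of `cpcc` flips for (5,5,5) and the value is NaN for (3,3,3).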

The IPWR similarity measure method
In this section, we explain the methodology of the IPWR similarity measure that is used in memory-based CF to improve the performance of the recommender system. We denote the set of users by $U = \{a, b, c, \ldots, z\}$ and the set of items by $I = \{i_0, i_1, i_2, \ldots, i_m\}$. Each user (e.g. denoted by $a$) rates a set of items denoted by $I_a$. The rating of a user $a$ for item $i$ is denoted by $R_{a,i}$ and it can be any real number (normally ratings are represented by real numbers in some range [min, max]). The mathematical representation of the different similarity measures that are compared with the proposed IPWR similarity measure is presented in Table 1.
As discussed earlier, almost all similarity measure methods for recommender systems use the co-ratings provided by the users. However, the rating preference behavior of many users differs from that of normal users: they tend to rate items according to their own habits. Some users rate every item low whether the item is good or bad, while a second category of users rate every item high whether the item is good or bad. These types of user behavior are termed rating preference behavior (RPB). In this article, to handle such behaviors, the IPWR similarity measure method feeds the variance or standard deviation (SD) of each user into a Cosine function. The variance $var_a$ of the user $a$ is calculated using Eq (7) from $R_{a,j}$, the rating of user $a$ for item $j$, $\bar{R}_a$, the mean of the ratings of user $a$, and $I_a$, the set of items rated by user $a$. The SD of the user $a$ is calculated as follows:

$$SD_a = \sqrt{var_a} \tag{8}$$

where $var_a$ in Eq (8) denotes the variance of user $a$. The calculated values of variance and SD can each be used to calculate the RPB of two users. The RPB (a,b) function that uses variance is denoted by RPB (a,b) using var, and the one that uses SD is denoted by RPB (a,b) using SD; they are mathematically represented by Eq (9) and Eq (10), respectively:

$$\mathrm{RPB}_{(a,b)}\ \text{using var} = \cos\left(|\bar{R}_a - \bar{R}_b| \cdot |var_a - var_b|\right) \tag{9}$$

$$\mathrm{RPB}_{(a,b)}\ \text{using SD} = \cos\left(|\bar{R}_a - \bar{R}_b| \cdot |SD_a - SD_b|\right) \tag{10}$$

In Eq (9) and Eq (10), $\bar{R}_a$ and $\bar{R}_b$ represent the mean of the ratings of user $a$ and the mean of the ratings of user $b$, respectively. The cosine function is used to model the RPB of two users because it is the function most commonly used by similarity measure methods in both CF-based and CB-based recommender systems in the literature. In addition, if two users have the same average rating and the same variance or SD value, then their RPB is equal to 1. The use of the Cosine function also results in normalized values in the range from -1 to +1, which is another reason to use it.

In the proposed IPWR similarity measure method, we intend to improve the standard PCC. The standard PCC works only on co-rated items and suffers from shortcoming 4, as discussed in the subsection entitled "Shortcomings of standard PCC". To tackle shortcoming 4, both user and item average ratings are used, as shown in Eq (11). The resultant similarity is named the improved PCC similarity measure and is denoted by Sim_IPCC. Furthermore, the standard PCC ignores the users' rating pattern, which the IPWR similarity measure method estimates using full rating information in the form of user averages and variance or SD.
$$\mathrm{Sim}_{IPCC}(a,b) = \frac{\sum_{j \in I_a \cap I_b} \left[(R_{a,j} \times \bar{R}_a) - (R_{a,j} \times \bar{R}_j)\right] \times \left[(R_{b,j} \times \bar{R}_b) - (R_{b,j} \times \bar{R}_j)\right]}{\sqrt{\sum_{j \in I_a} \left[(R_{a,j} \times \bar{R}_a) - (R_{a,j} \times \bar{R}_j)\right]^2}\ \sqrt{\sum_{j \in I_b} \left[(R_{b,j} \times \bar{R}_b) - (R_{b,j} \times \bar{R}_j)\right]^2}} \tag{11}$$

Table 1. Similarity measures used for performance comparison with the proposed IPWR similarity measure.

PCC [40]:
$$\mathrm{Sim}_{PCC}(a,b) = \frac{\sum_{j \in I_a \cap I_b} (R_{a,j} - \bar{R}_a)(R_{b,j} - \bar{R}_b)}{\sqrt{\sum_{j \in I_a} (R_{a,j} - \bar{R}_a)^2}\ \sqrt{\sum_{j \in I_b} (R_{b,j} - \bar{R}_b)^2}} \tag{1}$$
where $j$ ranges over the set of items commonly rated by users $a$ and $b$, $R_{a,j}$ is the rating of user $a$ for an item $j$, and $\bar{R}_a$ is the average rating of user $a$.

CPCC [43]:
$$\mathrm{Sim}_{CPCC}(a,b) = \frac{\sum_{j \in I_a \cap I_b} (R_{a,j} - R_{med})(R_{b,j} - R_{med})}{\sqrt{\sum_{j \in I_a} (R_{a,j} - R_{med})^2}\ \sqrt{\sum_{j \in I_b} (R_{b,j} - R_{med})^2}} \tag{2}$$
where $j$ ranges over the set of items commonly rated by users $a$ and $b$, $R_{a,j}$ is the rating of user $a$ for an item $j$, and $R_{med}$ is the median rating on the rating scale.

WPCC [33]: Sim_WPCC (a,b), where H is an experimental value and is set to 50 in [33].

NHSM [33]: Sim_PSS (a,b), where PSS consists of three factors: proximity, significance, and singularity. These three factors are then combined with modified Jaccard similarity and URP.
The variables involved in Eq (11) are the same as those used in Table 1 for the standard PCC. The final similarity is named improved PCC weighted with RPB (a,b), denoted by IPWR (a,b), and is mathematically represented by Eq (12) and Eq (13). The IPWR similarity measure considers both RPB (a,b) and Sim_IPCC (a,b) by combining them using an adaptive weighting scheme. Two weights are chosen: α is applied to RPB (a,b) and β is applied to Sim_IPCC (a,b). This ensures that the IPWR similarity measure method considers the user rating behavior and normalizes both the high rating effect and the low rating effect of each user.

$$\mathrm{IPWR}_{(a,b)}\ \text{with variance} = \alpha \cdot \mathrm{RPB}_{(a,b)}\ \text{using var} + \beta \cdot \mathrm{Sim}_{IPCC}(a,b) \tag{12}$$

$$\mathrm{IPWR}_{(a,b)}\ \text{with SD} = \alpha \cdot \mathrm{RPB}_{(a,b)}\ \text{using SD} + \beta \cdot \mathrm{Sim}_{IPCC}(a,b) \tag{13}$$

The weights α and β are determined in a subsequent section entitled "Determining best weights for α and β". The values of IPWR (a,b) range from -1 to +1, and thus a similarity threshold $\theta_s$ must also be imposed on the similarity value generated by Eq (12) or Eq (13). RPB (a,b) and Sim_IPCC (a,b) are added because both range from -1 to +1. If two users have a slightly negative Sim_IPCC (a,b) similarity but a high positive value for RPB (a,b), then the overall similarity value of Eq (12) becomes greater than zero, implying a positive similarity between these two users. Without RPB (a,b), both users would be treated as dissimilar by Sim_IPCC (a,b) alone. Similarly, if two users have a slightly positive Sim_IPCC (a,b) similarity but a high negative value for RPB (a,b), then the overall similarity IPWR (a,b) considers these two users dissimilar.
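The RPB term of Eq (9) and Eq (10) and the weighted combination of Eq (12) and Eq (13) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the authors' code: the function names `rpb` and `ipwr` are hypothetical, the population variance (`statistics.pvariance`) is assumed for the per-user variance, the cosine argument is taken in radians, and Sim_IPCC is passed in as a precomputed number.

```python
import math
from statistics import mean, pvariance


def rpb(ratings_a, ratings_b, use_sd=False):
    """cos(|mean_a - mean_b| * |spread_a - spread_b|), spread = variance or SD."""
    spread_a = pvariance(ratings_a)  # assumption: population variance
    spread_b = pvariance(ratings_b)
    if use_sd:
        spread_a, spread_b = math.sqrt(spread_a), math.sqrt(spread_b)
    mean_gap = abs(mean(ratings_a) - mean(ratings_b))
    return math.cos(mean_gap * abs(spread_a - spread_b))


def ipwr(rpb_value, sim_ipcc, alpha, beta):
    """Linear combination of the RPB term and the improved PCC term."""
    return alpha * rpb_value + beta * sim_ipcc


if __name__ == "__main__":
    a = [4, 5, 4, 5]
    b = [4, 5, 5, 4]
    # Same mean and same spread, so the RPB term is cos(0) = 1.
    print(rpb(a, b), rpb(a, b, use_sd=True))
    # Hypothetical weights and a slightly negative Sim_IPCC value.
    print(ipwr(rpb(a, b), -0.1, alpha=0.5, beta=0.5))
```

With the hypothetical weights α = β = 0.5, a slightly negative Sim_IPCC of -0.1 and an RPB of 1 give an overall similarity of 0.45, which illustrates how the RPB term can move a pair of users above a zero threshold.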
The final recommendations are generated using Eq (14), which is known as Resnick's formula [44]; either Eq (12) or Eq (13) can be used as the IPWR (a,b) similarity measure:

$$\hat{R}_{a,i} = \bar{R}_a + \frac{\sum_{p \in NN_a} \mathrm{IPWR}_{(a,p)} \, (R_{p,i} - \bar{R}_p)}{\sum_{p \in NN_a} |\mathrm{IPWR}_{(a,p)}|} \tag{14}$$

where $\hat{R}_{a,i}$ is the predicted rating of user $a$ for item $i$ and $p$ denotes a user belonging to the nearest neighbor (NN) network $NN_a$ of the user $a$. The top similar users of $a$ are identified as the nearest neighbors of the user $a$. The pseudo code for the IPWR similarity measure method is outlined below as "Algorithm 1". The experimental results of the IPWR similarity measure method are reported using five-fold cross validation [45] on each of the publicly available datasets. The Pearson similarity is computed using Eq (11) and RPB is computed using either Eq (9) or Eq (10). The results of step 4 and step 5 are combined using Eq (12). The final prediction is generated using Eq (14).

___________________________________________________________________________
Algorithm 1
4. Find improved Pearson similarity Sim_IPCC (a,b) between a and b using Eq (11).
5. Find rating preference behavior RPB (a,b) of a and b using either Eq (9) or Eq (10).
6. Combine the results of step 4 and step 5 using Eq (12).
7. Make prediction $\hat{R}$ on target item $i_0$ of target user a using Eq (14).
___________________________________________________________________________
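The prediction step can be sketched as follows, using the usual form of Resnick's formula: the target user's mean plus a similarity-weighted average of the neighbors' mean-centered ratings. This is a hedged illustration, not the authors' code; the function name `predict_rating`, the parameters `k` and `threshold`, and the fallback to the target user's mean when no neighbor qualifies are all assumptions.

```python
def predict_rating(target_mean, neighbors, k=5, threshold=0.0):
    """Resnick-style prediction for one (user, item) pair.

    neighbors: iterable of (similarity, neighbor_rating_on_item, neighbor_mean)
               for users who have rated the target item.
    k:         number of nearest neighbors to keep (hypothetical parameter).
    threshold: only neighbors with similarity above this value are used.
    """
    # Keep neighbors above the similarity threshold, then the k most similar.
    kept = sorted(
        (n for n in neighbors if n[0] > threshold),
        key=lambda n: n[0],
        reverse=True,
    )[:k]
    norm = sum(abs(sim) for sim, _, _ in kept)
    if norm == 0:
        return target_mean  # assumption: fall back to the user's own mean
    weighted = sum(sim * (rating - mean) for sim, rating, mean in kept)
    return target_mean + weighted / norm


if __name__ == "__main__":
    # Two hypothetical neighbors: (similarity, rating on the item, mean rating).
    neighbors = [(1.0, 5.0, 4.0), (0.5, 2.0, 3.0)]
    print(predict_rating(3.0, neighbors))
```

The similarity values passed in would come from Eq (12) or Eq (13); the sketch deliberately treats them as given numbers so that it does not depend on any particular similarity implementation.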
Consider an example that demonstrates the working of the IPWR similarity measure method. In this example, Table 2 shows an instance of a user-item rating matrix consisting of five users and five items. The five users are denoted by a to e, while the five items are denoted by $i_1$ to $i_5$. We want to predict the rating of item $i_5$ for the user a. The distinct feature of this user-item rating matrix is that the rating vector of user e is flat, which corresponds to shortcoming 1 of standard PCC as mentioned in the subsection entitled "Shortcomings of standard PCC". For this reason, although user a and user e have exactly the same values on their co-rated items, PCC is NaN. User a and user c have a single co-rated item (i.e. $i_3$ only), which corresponds to shortcoming 2 of the standard PCC. Table 3 contains various parameters computed from Table 2. The variance of each user is computed using Eq (7), the RPB of users is computed using Eq (9), the Sim_PCC value is computed using Eq (1), and the Sim_IPCC value is computed using Eq (11).

[48], CiaoDVD [49], and MovieTweetings [50]) are used for the performance evaluation of the IPWR similarity measure method. The details about these datasets are as follows:
a) Epinions dataset: Epinions is an online community website that allows users to review different products and services. Users can also rate other users' reviews on a numerical scale. This dataset contains 664823 ratings on the scale of 1.0 (worst rating) to 5.0 (best rating) with a step size of 1.0. This dataset contains 139738 items that are rated by 40163 users with 99.90% sparsity. The value of mean rating per user is 10.39 with a maximum of 1023 ratings per user. The value of mean rating per item is 4.75 with a maximum of 2026 ratings per item. The sparsity is calculated as follows:

$$\text{Sparsity} = \left(1 - \frac{\text{number of ratings}}{\text{number of users} \times \text{number of items}}\right) \times 100\% \tag{15}$$

b) MovieLens-100K (ML-100K) dataset: The ML-100K dataset contains 943 users who rated different movies on a scale of 1.0 (worst rating) to 5.0 (best rating). The most frequent rating value in this dataset is 4.0. This dataset includes 100000 user ratings over 1682 movies, and each user rated at least 20 movies. This dataset is used by different state-of-the-art similarity measure methods for recommender systems and its sparsity is 93.70%.
c) MovieLens-1M (ML-1M) dataset: The GroupLens research group collected this dataset from the MovieLens website and made it publicly available. On this website, users can rate and review different movies. This dataset contains 6040 users, 3952 movies, and 1000209 user ratings. The ratings take values from 1.0 (worst rating) to 5.0 (best rating) with a step size of 1.0. The sparsity of this dataset is 95.80%. The value of mean rating per user is 15.63 with a maximum of 2314 ratings per user, and the value of mean rating per item is 269.80 with a maximum of 3428 ratings per item.
d) CiaoDVD dataset: The CiaoDVD dataset contains 72665 user ratings on the scale of 1.0 (worst rating) to 5.0 (best rating) with a step size of 1.0. This dataset contains 16121 items rated by 17615 users with 99.90% sparsity. The value of mean rating per user is 1.13 with a maximum of 1106 ratings per user. The value of mean rating per item is 4.48 with a maximum of 424 ratings per item.
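As a quick arithmetic check of the reported sparsity figures, the sketch below assumes the usual definition of rating-matrix sparsity, namely the percentage of empty cells in the user-item matrix. The function name `sparsity_percent` is hypothetical, and the definition itself is an assumption about what the article uses.

```python
def sparsity_percent(n_ratings, n_users, n_items):
    """Percentage of user-item pairs that have no rating."""
    return 100.0 * (1.0 - n_ratings / (n_users * n_items))


if __name__ == "__main__":
    # Counts taken from the dataset descriptions above.
    print("ML-100K:", sparsity_percent(100000, 943, 1682))
    print("ML-1M:  ", sparsity_percent(1000209, 6040, 3952))
```

Under this definition the two MovieLens datasets come out within about 0.01 percentage points of the values quoted above.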
The rating distribution for all four datasets is shown in Table 4. In the IPWR similarity measure method, the RPB function comprises the user average rating and the variance or SD, so the statistics of user average ratings, variance, and SD are considered in intervals of 0.5. These statistics are shown in Fig 1A-1E. It can be observed that the maximum value of the variance/SD ratings occurs in the interval of 0.5-1.0 and the maximum value of the average ratings of the users occurs in the interval of 3. In Fig 1(E) for the MovieTweetings dataset, it can be noted that the variance/SD of more than 50% of users occurs in the interval of 0-0.5. Furthermore, the maximum intervals for user average values are found to be 8.0-8.5 and 9.5-10.0. Keeping these facts in view, the RPB function of Eq (9) and Eq (10) gives better performance for the Epinions, CiaoDVD, and MovieTweetings datasets than for the ML-1M and ML-100K datasets. The reason is that the variance/SD ratings of the largest number of users occur in the interval of 0-0.5. It is also evident from Fig 1A-1E that user averages and variance/SD ratings are not well distributed. The variance/SD values occur in the interval from the initial scale to the median scale, while most average ratings of the users occur on the opposite side (i.e. median scale to maximum scale). This is the main reason for modeling the RPB function in terms of the user average rating and the variance/SD value.

Performance evaluation metrics
The performance of the IPWR similarity measure method is evaluated using five-fold cross-validation, owing to its extensive usage in state-of-the-art similarity measure methods for recommender systems, and average results are reported. An alternative to five-fold cross-validation is the leave-one-out method, which has a higher computational complexity than five-fold cross-validation [45]. The performance of the IPWR similarity measure method is measured in terms of MAE, RMSE, precision, recall, and F-measure. The MAE calculates the average absolute deviation between the ratings predicted by the recommender system and the true ratings given by the user. The RMSE takes an average of the squared errors, giving more weight to larger errors and less weight to smaller errors. The mathematical representation of the MAE and RMSE is as follows:

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |p_i - r_i| \tag{16}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (p_i - r_i)^2} \tag{17}$$

where $p_i$ and $r_i$ denote the predicted and true ratings, respectively, and N denotes the total number of items for which the prediction process is performed. The main goal of any recommender system is to decrease the MAE and RMSE. Precision and recall assess the specificity and sensitivity of a recommender system, respectively, by measuring the frequency of relevant items. The most suitable way to measure precision and recall is to predict the top N items for known ratings. All the experimental results of the IPWR similarity measure method are reported by setting N = 5. The fundamental supposition is the division between relevant and irrelevant items in every user's dataset. Precision and recall are empirically defined in Table 5 and are mathematically expressed in Eq (18) and Eq (19) as follows:

$$\mathrm{Precision} = \frac{\text{number of relevant items recommended}}{\text{number of items recommended}} \tag{18}$$

$$\mathrm{Recall} = \frac{\text{number of relevant items recommended}}{\text{number of relevant items}} \tag{19}$$

However, a tradeoff exists between precision and recall, in the sense that if one value increases then the other decreases, and vice versa. To overcome this tradeoff, the state-of-the-art similarity measure methods for recommender systems also use the F-measure as a performance evaluation metric, which is mathematically defined using Eq (20) as follows:

$$\text{F-measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{20}$$
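The five evaluation metrics can be computed with a short Python sketch. It uses the standard textbook definitions, which are assumed here to match the article's; the function names and the representation of recommended and relevant items as collections of item ids are illustrative choices.

```python
import math


def mae(predicted, actual):
    """Mean absolute error between predicted and true ratings."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean square error between predicted and true ratings."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))


def precision_recall_f(recommended, relevant):
    """Precision, recall, and F-measure for one top-N recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    print(mae([4, 3], [5, 1]), rmse([4, 3], [5, 1]))
    # Top-5 list with two of the three relevant items retrieved.
    print(precision_recall_f([1, 2, 3, 4, 5], [2, 5, 9]))
```

In the second call, two of the five recommended items are relevant and two of the three relevant items are retrieved, so precision is 2/5, recall is 2/3, and the F-measure is 0.5.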

Experimental results and discussions
Three different cases are used to measure the performance of the IPWR similarity measure method. In the first case, the impact of varying the similarity threshold (denoted by $\theta_s$) on the performance of the IPWR similarity measure method is analyzed. In the second case, the best weights of α and β are determined for each dataset using an adaptive weighting scheme. In the third case, the impact of varying the neighborhood size on the performance of the IPWR similarity measure method is analyzed, and its performance is compared with state-of-the-art similarity measure methods. The experimental details of these three cases are given in the following subsections.

Methods used for comparison
The performance of the IPWR similarity measure method is compared with the following state-of-the-art similarity measure methods: PCC, CPCC, WPCC, SPCC, Cosine, PIP, Singularity measure, and NHSM. For all similarity measure methods, a similarity threshold (denoted by θ s) is imposed on the produced similarity values. The details of these methods are as follows:
a) PCC similarity measure. The value of the PCC similarity measure method is calculated using Eq (1). It ranges from -1 to +1, where -1 corresponds to the worst similarity value and +1 corresponds to the best similarity value. The similarity threshold (θ s) for the PCC similarity measure is set to greater than zero, which implies that users having negative similarity are ignored.
b) CPCC similarity measure. This similarity measure method is calculated using Eq (2). It categorizes all rating values as positive or negative: a rating value is positive if it is above the median rating of the rating scale and negative if it is below it. For all reported datasets, the value of the median rating is set to 3. Like PCC, CPCC ranges from -1 to +1 and its similarity threshold (θ s) is also set to greater than zero.
c) WPCC similarity measure. This similarity measure method is calculated using Eq (3). It gives more weight to users whose number of co-rated items exceeds a threshold, which is set to 50 [20]. The WPCC similarity measure method ranges from -1 to +1.
d) SPCC similarity measure. This is an exponential version of the standard PCC similarity measure method and it is calculated using Eq (4). Its values range from -1 to +1 and the similarity threshold (θ s) is set to greater than zero.
e) COSINE similarity measure. This similarity measure method was introduced in [12]. Its values range from 0 to 1, so all users with similarity greater than zero are selected for the prediction process.
f) PIP similarity measure. This similarity measure method first evaluates a Boolean agreement function between two user ratings and then calculates the PIP factors depending on whether the agreement is true or false. Its value is calculated using Eq (5) and can be any real number greater than zero.
g) Singularity measure. In this similarity measure, the user ratings are grouped as positive and negative, and the singularity value of each user and item is computed. The prediction is generated from the computed singularity values.
h) NHSM similarity measure. This similarity measure method is calculated using Eq (6). It considers both the local and the global preference of a user rating. Its values range from 0 to 1.
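The correlation-based baselines described above can be illustrated with a minimal sketch. It is not a transcription of Eqs (1)-(3): the function names, the toy users, the use of co-rated means in PCC, and the min(n, 50) / 50 form of the WPCC weight are assumptions made for illustration only.

```python
import math

def co_rated(u, v):
    """Items rated by both users (each user is a dict: item id -> rating)."""
    return sorted(set(u) & set(v))

def _centered_cosine(u, v, items, center_u, center_v):
    """Cosine of the two rating vectors after subtracting the given centres."""
    num = sum((u[i] - center_u) * (v[i] - center_v) for i in items)
    den_u = math.sqrt(sum((u[i] - center_u) ** 2 for i in items))
    den_v = math.sqrt(sum((v[i] - center_v) ** 2 for i in items))
    return num / (den_u * den_v) if den_u and den_v else 0.0

def pcc(u, v):
    """Pearson correlation over co-rated items, in [-1, +1]."""
    items = co_rated(u, v)
    if len(items) < 2:
        return 0.0
    mean_u = sum(u[i] for i in items) / len(items)
    mean_v = sum(v[i] for i in items) / len(items)
    return _centered_cosine(u, v, items, mean_u, mean_v)

def cpcc(u, v, median=3.0):
    """Constrained PCC: centre on the scale median instead of the user means."""
    return _centered_cosine(u, v, co_rated(u, v), median, median)

def wpcc(u, v, threshold=50):
    """Assumed weighting: scale PCC down when fewer than `threshold` co-rated items."""
    return pcc(u, v) * min(len(co_rated(u, v)), threshold) / threshold

def cosine(u, v):
    """Plain cosine over co-rated items; non-negative for positive ratings."""
    return _centered_cosine(u, v, co_rated(u, v), 0.0, 0.0)

# Hypothetical users on a 5-star scale.
alice = {"i1": 5, "i2": 3, "i3": 4}
bob = {"i1": 4, "i2": 2, "i3": 3, "i4": 5}
carol = {"i1": 1, "i2": 5, "i3": 2}

sims = {"bob": pcc(alice, bob), "carol": pcc(alice, carol)}
# A threshold of "greater than zero" drops negatively correlated users.
neighbors = {name: s for name, s in sims.items() if s > 0}
```

In this toy case only bob survives the threshold, because carol's ratings are negatively correlated with alice's.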

Effect of the similarity threshold
To estimate the impact of the similarity threshold (θ s), its value is varied from 0 to 1.0 with a step size of 0.1 at a fixed nearest neighbor size of 5. It is obvious from Fig 2(A) In Fig 2(C), for the ML-1M dataset, behavior similar to that of the ML-100K dataset is observed: after θ s = 0.9, the MAE and RMSE values increase, while the precision, recall, and F-measure values decrease. In Fig 2(D), for the Epinions dataset, the MAE values remain almost constant until θ s = 0.7 and start increasing after that. The RMSE also remains almost constant until θ s = 0.7 and increases at θ s = 0.8 and θ s = 0.9, while at θ s = 1.0, performance decreases to 1.199. This means that the precision, recall, and F-measure values also decrease as the threshold value increases. From these experimental details of the reported datasets, it can be concluded that the MAE and RMSE values increase with an increase in the similarity threshold (θ s), while the precision, recall, and F-measure values decrease.
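The threshold sweep described above can be sketched as follows. The helper name select_neighbors, the strict "greater than" comparison, and the similarity values are hypothetical; the sketch only shows how the pool of eligible neighbors shrinks as θ s grows at a fixed neighbor size of 5, which is one plausible reason for the loss of accuracy at high thresholds.

```python
def select_neighbors(similarities, theta, k=5):
    """Keep users whose similarity exceeds theta, then take the k most similar."""
    kept = [(user, s) for user, s in similarities.items() if s > theta]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]

# Hypothetical similarities of one active user to seven candidate users.
toy_sims = {
    "u1": 0.95, "u2": 0.82, "u3": 0.64, "u4": 0.41,
    "u5": 0.33, "u6": 0.12, "u7": -0.20,
}

# Thresholds 0.0, 0.1, ..., 1.0 as in the experiment described above.
thetas = [round(0.1 * step, 1) for step in range(11)]
neighbor_counts = {
    theta: len(select_neighbors(toy_sims, theta)) for theta in thetas
}
```

With these toy values the full five neighbors are available up to θ s = 0.3, while only one neighbor remains at θ s = 0.9 and none at θ s = 1.0.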

Determining best weights for α and β
To determine the weights of α and β that give the best performance of the IPWR similarity measure method relative to state-of-the-art similarity measure methods, its performance is evaluated while varying the weights of α and β from 0 to 1. The experimental results for the different weights of α and β are given in Table 6 for all the reported datasets. In Table 6, the bold values indicate the weights of α and β that give the best performance of the IPWR similarity measure method in terms of the performance evaluation metrics. For the ML-100K dataset, the best results are found when α = 0.5 and β = 0.5. Similarly, setting α = 0.1 and β = 0.9 gives better results than setting α = 0 and β = 1.0. This implies that RPB has an important effect on the recommendation performance of the IPWR similarity measure method. For the Epinions, CiaoDVD, and ML-1M datasets, the best performance of the IPWR similarity measure method is obtained with α = 0.4 and β = 0.6. In these datasets, performance also improves when the weights of α and β are changed from 0.0 and 1.0 to 0.1 and 0.9. Furthermore, the worst performance is obtained when α = 1.0 and β = 0.0, which indicates that RPB alone is not able to yield good results. In the case of the MovieTweetings (10-star rating) dataset, the best performance of the IPWR similarity measure method is achieved with α = 0.4 and β = 0.6.
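The weight search described above can be sketched as a simple grid search. The sketch assumes that α and β sum to 1 (as in every pair reported above) and that the two components are blended linearly, with α weighting the RPB term and β the improved PCC term. The function names and the toy_mae error function are hypothetical stand-ins and are not experimental results.

```python
def ipwr_combined(rpb_sim, ipcc_sim, alpha, beta):
    """Assumed linear blend of the RPB term and the improved PCC term."""
    return alpha * rpb_sim + beta * ipcc_sim

def weight_grid(step=0.1):
    """(alpha, beta) pairs with alpha + beta = 1, from (0.0, 1.0) to (1.0, 0.0)."""
    n = round(1 / step)
    return [(round(i * step, 1), round(1 - i * step, 1)) for i in range(n + 1)]

def best_weights(evaluate, step=0.1):
    """Pair with the lowest error under a caller-supplied evaluate(alpha, beta)."""
    return min(weight_grid(step), key=lambda pair: evaluate(*pair))

def toy_mae(alpha, beta):
    """Made-up error curve standing in for a full evaluation run on a dataset."""
    return 0.80 + (alpha - 0.4) ** 2

chosen = best_weights(toy_mae)
```

In practice evaluate would run the recommender on held-out ratings and return a metric such as MAE for the given weights; the grid itself has only 11 points, so an exhaustive search is cheap.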

Effect of the number of neighbors and comparison of state-of-the-art similarity measure methods with IPWR similarity measure method
In this section, the performance of the IPWR similarity measure method is compared with state-of-the-art similarity measure methods (i.e. standard PCC, CPCC, WPCC, SPCC, COSINE, PIP, Singularity measure, and NHSM). The performance of the IPWR similarity measure method and its competitor methods is analyzed in terms of the performance evaluation metrics (i.e. MAE, RMSE, Precision, Recall, and F-measure) for different numbers of neighbors, as shown in Fig 3A-3E to Fig 7A-7E.
Performance analysis on the Epinions (5-star rating) dataset. Fig 3A-3E present the performance analysis of the IPWR similarity measure method and its competitor methods on the Epinions dataset in terms of the performance evaluation metrics (i.e. MAE, RMSE, Precision, Recall, and F-measure) versus the number of neighbors. The analysis indicates that the IPWR similarity measure method outperforms its competitor similarity measure methods on these metrics. To verify the results of the IPWR similarity measure method against its competitor methods, statistical analysis is performed on the experimental results of the reported datasets using the non-parametric Wilcoxon matched-pairs signed-rank test and the paired t-test, whose details are presented in Tables 7-10. The statistical analysis is performed at a significance level of 0.05 (95% confidence level), and the results are analyzed in terms of the z-score, p-value, and t-score. For all the reported datasets, the p-value is less than the significance level (i.e. α = 0.05), which supports the robust performance of the IPWR similarity measure method as compared to its competitor similarity measure methods. The p-values also indicate that the results are strongly significant, that is, a significant difference exists between the IPWR similarity measure method and its competitor similarity measure methods. The negative sign of the z-score also indicates the robustness of the IPWR similarity measure method as compared to its competitor similarity measure methods. Moreover, we have applied the paired t-test to reassess the statistical significance of the experimental results of the IPWR similarity measure method and its competitor similarity measure methods. The value of the degrees of freedom (df) for the paired t-test is set to 11. The negative sign of the t-score also indicates that the IPWR similarity measure method gives the best performance as compared to its competitor similarity measure methods. Furthermore, all the t-score results are highly significant, which shows that a significant difference exists between the IPWR similarity measure method and its competitor similarity measure methods.
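The two tests can be sketched with the Python standard library alone. The sketch assumes that each method contributes 12 paired error values, which is what df = 11 corresponds to in a paired t-test, and that differences are taken as IPWR minus competitor, so negative scores mean lower IPWR error. The toy values are made up, and the Wilcoxon z-score uses the plain normal approximation without tie or continuity correction.

```python
import math
import statistics

def paired_t(sample_a, sample_b):
    """Paired t statistic and degrees of freedom (n - 1) for matched samples.

    Assumes the paired differences are not all identical.
    """
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1

def wilcoxon_z(sample_a, sample_b):
    """Signed-rank z-score (normal approximation) and two-sided p-value."""
    diffs = [a - b for a, b in zip(sample_a, sample_b) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        end = pos
        # Extend the run over tied absolute differences and share the mean rank.
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]]):
            end += 1
        for j in range(pos, end + 1):
            ranks[order[j]] = (pos + end) / 2 + 1
        pos = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    return z, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical MAE values for 12 neighbor sizes (not taken from the paper).
toy_ipwr = [0.81, 0.80, 0.79, 0.79, 0.78, 0.78, 0.77, 0.77, 0.77, 0.76, 0.76, 0.76]
toy_base = [0.86, 0.84, 0.84, 0.83, 0.83, 0.82, 0.82, 0.81, 0.81, 0.81, 0.80, 0.80]

t_score, df = paired_t(toy_ipwr, toy_base)
z_score, p_value = wilcoxon_z(toy_ipwr, toy_base)
```

Because the toy IPWR errors are lower in every pair, both scores come out negative and the Wilcoxon p-value falls below 0.05. The sign depends only on the order of subtraction, so it should be read together with that convention.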
Performance analysis on the MovieLens-100K (ML-100K) (5-star rating) dataset. The performance comparison of the IPWR similarity measure method with its competitor similarity measure methods in terms of the performance evaluation metrics (i.e. MAE, RMSE, Precision, Recall, and F-measure) is presented in Fig 4A-4E for the ML-100K dataset. From the experimental details of Fig 4A-4E, it can be concluded that the IPWR similarity measure method outperforms its competitor similarity measure methods. Furthermore, the IPWR similarity measure method is also more accurate than its competitor similarity measure methods because it considers the average rating of an item and the average rating of a user simultaneously. Similarly, the RPB of a user is ignored by its competitor similarity measure methods, while the IPWR similarity measure also considers user RPB, which results in improved performance.
Performance analysis on the MovieLens-1M (ML-1M) (5-star rating) dataset. Fig 5A-5E present the performance analysis of the IPWR similarity measure method and its competitor similarity measure methods on the ML-1M dataset for a varying number of neighbors in terms of MAE, RMSE, Precision, Recall, and F-measure. It is a large dataset with a sparsity of 95.80%. For this dataset, the IPWR similarity measure method also performs better than its competitor similarity measure methods, except for the NHSM similarity measure method. The reason for the better performance of the NHSM similarity measure method is its proximity, significance, and singularity (PSS) factors, which are calculated for each common rating individually. However, the results of the NHSM similarity measure method are very close to those of the IPWR similarity measure method, and in the case of RMSE, the IPWR similarity measure method performs better than the NHSM similarity measure method.
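The five evaluation metrics used in these comparisons can be sketched as follows. The relevance cut-off of 4 on a 5-star scale and the function names are assumptions made for illustration, and the ratings are toy values, not experimental results.

```python
import math

def mae(actual, predicted):
    """Mean absolute error between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error; penalizes large errors more than MAE."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

def precision_recall_f(actual, predicted, relevant_at=4.0):
    """Treat ratings >= relevant_at as relevant (an assumed cut-off)."""
    pairs = list(zip(actual, predicted))
    hits = sum(1 for a, p in pairs if a >= relevant_at and p >= relevant_at)
    recommended = sum(1 for _, p in pairs if p >= relevant_at)
    relevant = sum(1 for a, _ in pairs if a >= relevant_at)
    precision = hits / recommended if recommended else 0.0
    recall = hits / relevant if relevant else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

# Toy held-out ratings and predictions for five user-item pairs.
toy_actual = [5, 4, 3, 2, 4]
toy_predicted = [4.5, 3.5, 3.0, 2.5, 4.0]

toy_mae_value = mae(toy_actual, toy_predicted)
toy_rmse_value = rmse(toy_actual, toy_predicted)
toy_prf = precision_recall_f(toy_actual, toy_predicted)
```

Lower MAE and RMSE indicate better rating prediction, while higher precision, recall, and F-measure indicate better selection of relevant items, which is why the two groups of metrics move in opposite directions in the figures.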
Performance analysis on the CiaoDVD (5-star rating) dataset. Fig 6A-6E present the performance analysis for different numbers of neighbors in terms of the performance evaluation metrics for the CiaoDVD dataset. In this dataset, mean ratings per user are 1.13 and mean ratings per item are 4.48. The IPWR similarity measure method also performs better on this dataset because it considers the user RPB and uses the improved PCC. It is also observed that increasing the number of neighbors does not affect the performance of the IPWR similarity measure method. This implies that a small number of neighbors gives the same results as a large number of neighbors, which also reduces the computational cost of the IPWR similarity measure method as compared to its competitor similarity measure methods.
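The role of the number of neighbors can be illustrated with a minimal prediction sketch. The prediction rule is not stated in this section, so the sketch assumes the widely used mean-centered weighted average over the k most similar neighbors. The function name and the toy values are hypothetical.

```python
def predict_rating(active_mean, neighbors, k):
    """Mean-centered weighted average over the k most similar neighbors.

    neighbors: list of (similarity, neighbor_rating, neighbor_mean) tuples
    for users who rated the target item.
    """
    top = sorted(neighbors, key=lambda n: n[0], reverse=True)[:k]
    norm = sum(abs(sim) for sim, _, _ in top)
    if norm == 0:
        # No usable neighbor: fall back to the active user's mean rating.
        return active_mean
    offset = sum(sim * (rating - mean) for sim, rating, mean in top)
    return active_mean + offset / norm

# Hypothetical neighbors: (similarity, rating for the item, neighbor's mean).
toy_neighbors = [(0.9, 5, 4.0), (0.8, 4, 3.0), (0.3, 2, 3.5), (0.2, 3, 3.0)]

prediction_k2 = predict_rating(3.5, toy_neighbors, k=2)
prediction_k4 = predict_rating(3.5, toy_neighbors, k=4)
```

Under this rule, each additional neighbor enters with a smaller similarity weight, so predictions tend to stabilize once the most similar neighbors are included, and a small k keeps the cost of each prediction low.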
Performance analysis on the MovieTweetings (10-star rating) dataset. The MovieTweetings dataset is also a publicly available dataset, which is crawled from Twitter. This dataset consists of movie ratings in the range of 1 to 10. In this rating scale, 1 indicates the worst rating and 10 indicates the best rating of a movie given by a user. This dataset contains a total of 759746 user ratings given by 56304 users for a total of 32810 movies. The details of the user rating scale and user rating distribution for the MovieTweetings dataset are presented in Table 11. The sparsity of the MovieTweetings dataset is 99.90%. The best performance of the IPWR similarity measure method is achieved with α = 0.4 and β = 0.6, whose experimental details are presented in Table 6 for the MovieTweetings dataset. The performance comparison of the IPWR similarity measure method with its competitor similarity measure methods in terms of the performance evaluation metrics (i.e. MAE, RMSE, Precision, Recall, and F-measure) is presented in Fig 7A-7E for the MovieTweetings dataset. From the experimental details of Fig 7A-7E, it can be concluded that the IPWR similarity measure method outperforms its competitor similarity measure methods in terms of MAE, RMSE, and Precision because it considers the average rating of an item and the average rating of a user simultaneously. Similarly, the RPB of a user is ignored by its competitor similarity measure methods, while the IPWR similarity measure method also considers user RPB, which results in improved performance. The statistical details of the IPWR similarity measure and its competitor similarity measure methods for the MovieTweetings dataset are presented in Table 12. The statistical analysis is performed using the non-parametric Wilcoxon matched-pairs signed-rank test and the paired t-test to provide statistical evidence of the robust performance of the IPWR similarity measure method as compared to its competitor similarity measure methods. The value of the degrees of freedom (df) for the paired t-test is set to 11 for the MovieTweetings dataset. According to the statistical results presented in Table 12, the negative signs of the z-score and t-score show that the IPWR similarity measure method gives the best performance as compared to its competitor methods. Furthermore, the z-score results of the IPWR similarity measure are highly significant in all cases, as the p-values are less than the significance level of 0.05 (95% confidence level). In the case of the t-score, all the results are significant except for the NHSM and IPWR with SD similarity measure methods, which give insignificant results because the difference between these two methods is very small.

Conclusion and future work
In this article, we identify and analyze some limitations of the state-of-the-art similarity measure methods, especially the PCC similarity measure method. These similarity measures are used by collaborative filtering based methods to find similar user and item profiles. User RPB is one of the most important aspects ignored by traditional similarity measure methods. Typically, different users have different RPB, and based upon this behavior, they tend to rate items with values that show little variation. In this article, we have proposed an improved similarity measure method that uses the user's RPB pattern to find similar users. The RPB pattern is modeled as a function of the user's average rating and the variance or standard deviation of the user's ratings. The proposed IPWR similarity measure method overcomes some inherent shortcomings of the standard PCC similarity measure and also considers the RPB pattern of users to achieve better performance. Extensive experiments are performed to check the effectiveness of the IPWR similarity measure method. Its performance is compared against state-of-the-art similarity measure methods using five publicly available datasets. The results show that the IPWR similarity measure performs better than conventional and state-of-the-art similarity measure methods such as NHSM and PIP. It is also observed from the experimental results that the IPWR similarity measure method performs better on sparse datasets (i.e. Epinions, CiaoDVD, and MovieTweetings) than on dense datasets (i.e. ML-100K and ML-1M). In future work, we intend to learn the weights of α and β using machine learning methods such as support vector machine (SVM), particle swarm optimization (PSO), and artificial neural networks (ANN). Although the IPWR similarity measure method considers both local and global information of user ratings in terms of user RPB, one important piece of information, the actual rating values of non-co-rated items, is ignored. In the future, we will also try to incorporate this information into the IPWR similarity measure method. Furthermore, the friendship network of a user can also be used as an additional information source under extreme cold-start or sparse conditions.