Abstract
In order to provide high-quality recommendations for users, it is desirable to share and integrate multiple datasets held by different parties. However, when sharing such distributed datasets, we need to protect personal and confidential information contained in the datasets. To this end, we establish a framework for privacy-preserving recommender systems using the data collaboration analysis of distributed datasets. Numerical experiments with two public rating datasets demonstrate that our privacy-preserving method for rating prediction can improve the prediction accuracy for distributed datasets. More precisely, compared to the individual analysis in which each party analyzes only its own dataset, our method reduced prediction errors by an average of 4.5% and up to 7.0%. This study opens up new possibilities for privacy-preserving techniques in recommender systems.
Citation: Yanagi T, Ikeda S, Sukegawa N, Takano Y (2025) Privacy-preserving recommender system using the data collaboration analysis for distributed datasets. PLoS ONE 20(4): e0319954. https://doi.org/10.1371/journal.pone.0319954
Editor: Ayesha Maqbool, National University of Sciences and Technology, PAKISTAN
Received: November 21, 2024; Accepted: February 10, 2025; Published: April 21, 2025
Copyright: © 2025 Yanagi et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data underlying the results presented in the study are available from the GroupLens website (https://grouplens.org/datasets/movielens/100k/) and Toshihiro Kamishima’s website (https://www.kamishima.net/sushi/).
Funding: This work was partially supported by JSPS KAKENHI Grant Number JP21K04526, awarded to Y.T., as well as a joint research project between the University of Tsukuba and Toyota Motor Corporation, also awarded to Y.T. The websites of funders are available at https://www.jsps.go.jp/english/ and https://global.toyota/en/, respectively. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. There was no additional external funding received for this study.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Background
In recent years, advances in information and communication technology have made it possible for individuals and organizations to access vast amounts of information on a daily basis. Against this background, recommender systems have become one of the most successful technologies based on data analytics [1,2]. These systems involve suggesting a personalized list of appealing items based on the users’ past preferences. Various algorithms, including collaborative filtering [3] and matrix factorization [4], have been developed to provide high-quality recommendations for users. Additionally, these algorithms are actively implemented in a variety of web services [5,6].
A number of prior studies have focused on improving the prediction accuracy of recommender algorithms [7]. In particular, deep learning techniques have recently received attention and achieved remarkable success in various real-world services [8,9]. The recommendation accuracy can be enhanced not only through sophisticated algorithms but also through data fusion, which involves merging multiple datasets into a single, consistent, and clean representation [10]. Therefore, to develop highly accurate recommender algorithms, it is desirable to share and integrate multiple datasets possessed by different parties.
When sharing such distributed datasets, however, we need to protect personal and confidential information contained in the datasets [11–13]. Privacy is an important issue for recommender systems, which contain information on a large number of registered users [7]. In fact, recommender systems may access users’ sensitive information, such as gender, age, and location, to improve the prediction accuracy [14]. Therefore, integrating datasets for better recommendations requires an algorithmic framework to protect user privacy.
Related work
Various privacy-preserving techniques have been used to protect personal and confidential information in recommender systems [11–13]; these include anonymization, randomization, cryptography techniques, differential privacy, and federated learning.
Anonymization involves removing personally identifiable information from data, whereas randomization aims at modifying data by adding some random noise. Although these techniques are readily available in recommender systems [15–18], they can lead to the loss of key information that is useful in generating accurate recommendations [13].
Cryptography techniques allow us to conduct data analysis while keeping data secret. Canny [19] was probably the first to apply cryptography techniques to collaborative filtering. Nikolaenko et al. [20] designed a recommender system based on matrix factorization using a cryptography technique known as garbled circuits. Although various cryptography techniques have been used for privacy protection in recommender systems, these techniques incur substantial computational and communication costs, making them impractical [13].
Differential privacy is a rigorous mathematical definition of privacy to guarantee that no individual-level information in a dataset is leaked. This is typically achieved by adding noise to individual data, where the amount of noise depends on the required level of privacy protection. McSherry and Mironov [21] proposed a movie recommender algorithm that mitigates the impact of noise added for differential privacy through post-processing. Various algorithms have been studied to apply differential privacy to collaborative filtering [22,23], matrix factorization [24–26], and variational autoencoders [27]. However, when a high degree of privacy protection is required, the differential privacy significantly reduces recommendation accuracy [28].
Federated learning aims at training a machine learning model from multiple local datasets while keeping them decentralized. The basic strategy consists of training local models from each local dataset and updating the global model by centralizing only the trained parameters. Ammad-ud-din et al. [29] proposed a federated collaborative filtering algorithm based on matrix factorization. Since then, various recommender systems based on federated learning have been proposed for privacy protection [30]; these systems use matrix factorization [31,32], deep neural networks [33,34], and variational autoencoders [35]. However, one of the challenges facing federated learning is the substantial communication costs required to train a machine learning model, which can result in long execution times [36].
To overcome these challenges associated with the privacy-preserving techniques mentioned above, Imakura and Sakurai [37] proposed the data collaboration analysis for distributed datasets. This method enables collaborative data analysis by sharing intermediate representations, each being individually constructed for privacy protection by each party from original datasets. Bogdanov et al. [38] demonstrated that when the number of involved parties is small, the data collaboration analysis consistently outperforms federated learning with lower computational and communication costs. Imakura et al. [39] proved that original datasets can be protected by the data collaboration analysis against insider and external attacks. Imakura et al. [40] proposed an improved version of the data collaboration analysis that shares intermediate representations from which the original datasets cannot be readily identified.
It is, however, impossible to directly apply the data collaboration analysis to recommender systems. A main reason for this is that the data collaboration analysis is specifically designed for regression and classification tasks. Therefore, this analysis cannot be performed to impute missing values in the user–item rating matrix used for rating prediction. To our knowledge, no prior studies have applied the data collaboration analysis to missing value imputation.
Our contribution
The motivation behind our research is to establish an effective data-sharing framework that overcomes the challenges associated with the existing privacy-preserving techniques in recommender systems. For this purpose, we propose a framework for privacy-preserving recommender systems using the data collaboration analysis, which has demonstrated promising results in regression and classification tasks. While the data collaboration analysis cannot be applied directly to recommender systems, our main technical contribution is to take advantage of the flattened data format used in the factorization machines [41]. Specifically, we convert a user–item rating matrix into the flattened format with the aim of treating missing value imputation as regression analysis. This conversion makes it possible to apply the data collaboration analysis to rating prediction. Additionally, our algorithm can handle both horizontal and vertical integration of rating matrices.
To verify the effectiveness of our privacy-preserving method for rating prediction, we performed numerical experiments with two public rating datasets. For comparison, we implemented two alternative methods: the individual analysis, which uses distributed datasets separately for rating prediction; and the centralized analysis, which merges distributed datasets without preserving privacy. Numerical results demonstrate that our method using the data collaboration analysis reduced the prediction error relative to the individual analysis by an average of 4.5% and up to 7.0%. Moreover, our method improved its prediction accuracy as the number of involved parties increased.
Methods
In this section, we first provide an overview of the data collaboration analysis based on the literature [37,42]. We next describe a rating prediction problem for recommendations and its data format conversion. We then formulate our rating prediction algorithm using the data collaboration analysis for privacy protection. Throughout this paper, we denote the set of consecutive integers ranging from 1 to n as [n] := {1, 2, …, n}.
Distributed datasets
We suppose that there are m parties, and each party k ∈ [m] holds the following dataset containing n(k) instances:

$$\left(X^{(k)}, \mathbf{y}^{(k)}\right), \quad X^{(k)} = \begin{bmatrix} \mathbf{x}^{(k)}_1 \\ \vdots \\ \mathbf{x}^{(k)}_{n(k)} \end{bmatrix} \in \mathbb{R}^{n(k) \times p}, \quad \mathbf{y}^{(k)} = \begin{bmatrix} y^{(k)}_1 \\ \vdots \\ y^{(k)}_{n(k)} \end{bmatrix} \in \mathbb{R}^{n(k)}, \qquad (1)$$

where for each instance i ∈ [n(k)], $\mathbf{x}^{(k)}_i \in \mathbb{R}^{1 \times p}$ is a row vector composed of p predictor variables, and $y^{(k)}_i \in \mathbb{R}$ is a response variable to be predicted.
In the individual data analysis, each party k ∈ [m] uses the dataset Eq (1) separately to train its own machine learning model. In this case, the prediction accuracy of trained models tends to be lower when the size of each dataset is small. In the centralized data analysis, the datasets held by parties k ∈ [m] are merged as

$$X = \begin{bmatrix} X^{(1)} \\ \vdots \\ X^{(m)} \end{bmatrix} \in \mathbb{R}^{n \times p}, \quad \mathbf{y} = \begin{bmatrix} \mathbf{y}^{(1)} \\ \vdots \\ \mathbf{y}^{(m)} \end{bmatrix} \in \mathbb{R}^{n}, \qquad (2)$$

where $n := \sum_{k=1}^{m} n(k)$. This merged dataset is then used to train a machine learning model; however, sharing datasets is often impossible due to privacy issues.
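For concreteness, the centralized merge of Eq (2) amounts to vertically stacking the parties' predictor matrices and concatenating their response vectors. A minimal NumPy sketch, where `party_datasets` is a hypothetical list of (X^(k), y^(k)) array pairs:

```python
import numpy as np

# Centralized analysis: stack the parties' datasets (X^(k), y^(k)) into (X, y) as in Eq (2).
# `party_datasets` is a hypothetical list of (X_k, y_k) NumPy array pairs.
X = np.vstack([X_k for X_k, _ in party_datasets])        # shape (n, p) with n = sum_k n(k)
y = np.concatenate([y_k for _, y_k in party_datasets])   # shape (n,)
```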
Intermediate representations
We aim to analyze multiple datasets collaboratively while keeping them decentralized for privacy protection. For this purpose, we use the anchor dataset, which is an artificially prepared dataset containing r instances:

$$S = \begin{bmatrix} \mathbf{s}_1 \\ \vdots \\ \mathbf{s}_r \end{bmatrix} \in \mathbb{R}^{r \times p}, \qquad (3)$$

where $\mathbf{s}_i \in \mathbb{R}^{1 \times p}$ is a row vector corresponding to p predictor variables for each instance i ∈ [r]. This dataset can be generated using random numbers or through more sophisticated methods as proposed by Takahashi et al. [43] and Imakura et al. [44]. The anchor dataset is shared by all parties.
To preserve the privacy of the original datasets Eq (1), each party k ∈ [m] applies an individual encoding function

$$f^{(k)}: \mathbb{R}^{1 \times p} \to \mathbb{R}^{1 \times \tilde{p}_k} \qquad (4)$$

to all instances (i.e., row vectors) of the original and anchor datasets, $X^{(k)}$ and S, thereby transforming them into the following intermediate representations:

$$\tilde{X}^{(k)} = f^{(k)}\left(X^{(k)}\right) \in \mathbb{R}^{n(k) \times \tilde{p}_k}, \quad \tilde{S}^{(k)} = f^{(k)}(S) \in \mathbb{R}^{r \times \tilde{p}_k}. \qquad (5)$$

For example, each party can employ dimensionality reduction techniques, such as the principal component analysis and the singular value decomposition, as an encoding function Eq (4). Each party k ∈ [m] sends the intermediate representations Eq (5) and the response vector $\mathbf{y}^{(k)}$ to the analyzer for data collaboration. The privacy of the original datasets is preserved by not sharing the encoding function with other parties or the analyzer, and not sharing the anchor dataset with the analyzer.
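As an illustration of Eqs (4) and (5), the following sketch builds an SVD-based encoding function with scikit-learn and applies it row-wise to the party's own predictor matrix and the shared anchor dataset. The function name, dimensions, and random data are ours, not from the original paper.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

def make_intermediate_representations(X_k, S, n_dims):
    """Fit an SVD-based encoding function f^(k) on the party's own data and
    apply it to the original dataset X^(k) and the anchor dataset S."""
    encoder = TruncatedSVD(n_components=n_dims, random_state=0)
    X_tilde = encoder.fit_transform(X_k)   # intermediate representation of X^(k), Eq (5)
    S_tilde = encoder.transform(S)         # intermediate representation of the anchor dataset
    return encoder, X_tilde, S_tilde

# Toy example: 500 instances with p = 1,000 predictors and r = 1,000 anchor instances.
rng = np.random.default_rng(0)
X_k = rng.random((500, 1000))
S = rng.random((1000, 1000))
encoder_k, X_tilde_k, S_tilde_k = make_intermediate_representations(X_k, S, n_dims=100)
```

The fitted `encoder` is kept by the party, since the encoding function itself is never shared.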
Collaboration representations
It is ineffective to merge and analyze the intermediate representations collected from parties k ∈ [m], because they are created by each party using different encoding functions to preserve privacy. To remedy this situation, the analyzer applies the integration function

$$g^{(k)}: \mathbb{R}^{1 \times \tilde{p}_k} \to \mathbb{R}^{1 \times \hat{p}} \qquad (6)$$

to all instances (i.e., row vectors) of the intermediate representations Eq (5), resulting in the following $\hat{p}$-dimensional collaboration representations:

$$\hat{X}^{(k)} = g^{(k)}\left(\tilde{X}^{(k)}\right) \in \mathbb{R}^{n(k) \times \hat{p}}, \quad \hat{S}^{(k)} = g^{(k)}\left(\tilde{S}^{(k)}\right) \in \mathbb{R}^{r \times \hat{p}}. \qquad (7)$$

Imakura and Sakurai [37] considered the following linear integration function:

$$g^{(k)}(\tilde{\mathbf{x}}) = \tilde{\mathbf{x}} G^{(k)}, \qquad (8)$$

where $G^{(k)} \in \mathbb{R}^{\tilde{p}_k \times \hat{p}}$ is the coefficient matrix for each party k ∈ [m]. In this case, the collaboration representations for party k ∈ [m] are expressed as follows:

$$\hat{X}^{(k)} = \tilde{X}^{(k)} G^{(k)}, \quad \hat{S}^{(k)} = \tilde{S}^{(k)} G^{(k)}. \qquad (9)$$

Recall here that the anchor dataset is common among all parties. Therefore, the corresponding collaboration representations should be close to each other to ensure data consistency between different parties $k, \ell \in [m]$:

$$\hat{S}^{(k)} \approx \hat{S}^{(\ell)} \quad (k, \ell \in [m]). \qquad (10)$$

To estimate linear integration functions Eq (8), Imakura and Sakurai [37] proposed solving a minimum perturbation problem. This method first computes the singular value decomposition of the following matrix, which consists of the intermediate representations Eq (5) of the anchor dataset:

$$\left[\tilde{S}^{(1)}, \tilde{S}^{(2)}, \ldots, \tilde{S}^{(m)}\right] = U \Sigma V^{\top}, \qquad (11)$$

where $[\tilde{S}^{(1)}, \tilde{S}^{(2)}, \ldots, \tilde{S}^{(m)}] \in \mathbb{R}^{r \times (\tilde{p}_1 + \cdots + \tilde{p}_m)}$. Let

$$Z = \left[\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_{\hat{p}}\right] \in \mathbb{R}^{r \times \hat{p}}$$

be a target matrix consisting of the left-singular vectors corresponding to the largest $\hat{p}$ singular values of the matrix in Eq (11). We next calculate the coefficient matrix $G^{(k)}$ such that the squared distance between the target matrix Z and the collaboration representation Eq (9) of the anchor dataset will be minimized for each party k ∈ [m] as follows:

$$\min_{G^{(k)}} \left\| Z - \tilde{S}^{(k)} G^{(k)} \right\|_{\mathrm{F}}^{2}. \qquad (12)$$

We obtain $G^{(k)} = \left(\tilde{S}^{(k)}\right)^{\dagger} Z$ as the analytical solution to problem Eq (12), where $\left(\tilde{S}^{(k)}\right)^{\dagger}$ is the Moore–Penrose pseudoinverse of the matrix $\tilde{S}^{(k)}$. The generalized eigenvalue problem [45] and the matrix manifold optimal computation [46] were also proposed to estimate linear integration functions Eq (8).
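The analyzer-side computation can be sketched as follows, assuming the linear integration functions of Eq (8): the concatenated anchor representations are decomposed by SVD as in Eq (11), the target matrix Z is taken from the leading left-singular vectors, and each G^(k) is the pseudoinverse solution of Eq (12). The function names are ours.

```python
import numpy as np

def compute_integration_matrices(S_tilde_list, p_hat):
    """Return the target matrix Z and the coefficient matrices G^(k) solving Eq (12)."""
    # Singular value decomposition of the concatenated anchor representations, Eq (11).
    S_concat = np.hstack(S_tilde_list)                 # shape (r, p_tilde_1 + ... + p_tilde_m)
    U, _, _ = np.linalg.svd(S_concat, full_matrices=False)
    Z = U[:, :p_hat]                                   # left-singular vectors of the p_hat largest singular values
    # Analytical minimum-perturbation solution: G^(k) = pinv(S_tilde^(k)) @ Z.
    G_list = [np.linalg.pinv(S_tilde) @ Z for S_tilde in S_tilde_list]
    return Z, G_list

def to_collaboration_representation(X_tilde_k, G_k):
    """Map a party's intermediate representation into the collaboration space, Eq (9)."""
    return X_tilde_k @ G_k
```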
Collaborative machine learning
The collaboration representations Eq (9) obtained from the integration functions Eq (8) enable collaborative machine learning. Specifically, after combining the collaboration representations and response vectors from all parties k ∈ [m] into a single dataset $(\hat{X}, \mathbf{y})$, we train a machine learning model

$$h: \mathbb{R}^{1 \times \hat{p}} \to \mathbb{R} \quad \text{such that} \quad y^{(k)}_i \approx h\left(\hat{\mathbf{x}}^{(k)}_i\right) \quad \text{for } i \in [n(k)],\ k \in [m]. \qquad (13)$$

The obtained machine learning model Eq (13) and the integration function Eq (8) are returned to each party. Each party k ∈ [m] can then acquire a highly accurate machine learning model for the original dataset by adding its encoding function Eq (4) as follows:

$$y \approx h\left(g^{(k)}\left(f^{(k)}(\mathbf{x})\right)\right) \quad \text{for } \mathbf{x} \in \mathbb{R}^{1 \times p}. \qquad (14)$$
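A sketch of the collaborative training step Eq (13) and the party-side prediction pipeline Eq (14). LightGBM is used here only because it appears in the experiments below; any regressor with fit/predict would work, and the function names are ours.

```python
import numpy as np
from lightgbm import LGBMRegressor

def train_collaborative_model(X_hat_list, y_list):
    """Merge the collaboration representations and response vectors from all
    parties and train a single regressor h on the merged dataset, Eq (13)."""
    X_hat = np.vstack(X_hat_list)
    y = np.concatenate(y_list)
    return LGBMRegressor(random_state=0).fit(X_hat, y)

def party_side_predict(model, encoder_k, G_k, X_new):
    """Prediction pipeline y ≈ h(g^(k)(f^(k)(x))) for new instances of party k, Eq (14)."""
    X_tilde = encoder_k.transform(X_new)   # f^(k): encoding function kept by party k
    X_hat = X_tilde @ G_k                  # g^(k): integration function returned by the analyzer
    return model.predict(X_hat)
```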
Data format conversion
The process of predicting ratings in recommender systems deals with a user–item rating matrix as shown in Table 1, where each entry indicates the user’s rating for a particular item.
For example, these ratings can be expressed as a five-point scale of the degree of preference, or as a binary scale indicating “likes/dislikes.” We now focus on the problem of predicting ratings for items that have not yet been rated by a target user in the rating matrix.
As described in the preceding section on related work, the data collaboration analysis is aimed at regression and classification tasks, where a single response variable is predicted from multiple predictor variables. However, these tasks are fundamentally different from the rating prediction problem, which aims at imputing missing values in a rating matrix. Therefore, the data collaboration analysis cannot be directly applied to the rating prediction problem for recommender systems.
To overcome the aforementioned challenge, we focus on the factorization machines [41] as a rating prediction algorithm. This method treats the rating prediction problem as a regression task by converting the rating matrix into a flattened data format as shown in Table 2. Here, dummy variables for users and items are created as predictor variables, and the corresponding ratings are employed as a response variable. Additionally, other attributes such as user gender and item category can readily be added as predictor variables in factorization machines. This data format conversion makes it possible to apply the data collaboration analysis to recommender systems.
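To make the conversion concrete, the following pandas sketch turns a small rating matrix in the style of Table 1 into the flattened format of Table 2, with one-hot dummy variables for users and items. The toy values and column names are illustrative, not taken from the paper.

```python
import pandas as pd

# A small user-item rating matrix in the style of Table 1 (NaN = not yet rated).
rating_matrix = pd.DataFrame(
    {"item_1": [5, 3, None], "item_2": [None, 4, 2], "item_3": [1, None, 4]},
    index=["user_1", "user_2", "user_3"],
)

# Flatten to one row per observed rating (user, item, rating), as in Table 2.
flat = (
    rating_matrix.stack()
    .dropna()
    .rename("rating")
    .reset_index()
    .rename(columns={"level_0": "user", "level_1": "item"})
)

# Dummy variables for users and items form the predictor matrix X (Eq (1)),
# and the observed ratings form the response vector y.
X = pd.get_dummies(flat[["user", "item"]])
y = flat["rating"]
```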
Our algorithm
Algorithm 1 describes our rating prediction algorithm using the data collaboration analysis. We suppose that each party k ∈ [m] possesses its user–item rating matrix $R^{(k)}$, where U(k) is the user index set in party k ∈ [m], and I is the common item index set.
Algorithm 1: Rating prediction using the data collaboration analysis
Input: rating matrices $R^{(k)}$, encoding functions $f^{(k)}$, and anchor dataset S for k ∈ [m].
Output: prediction models $h(g^{(k)}(f^{(k)}(\cdot)))$ for k ∈ [m].
{Phase 0 (party-side). Data preparation}
1: Convert $R^{(k)}$ into $(X^{(k)}, \mathbf{y}^{(k)})$ for each k ∈ [m].
2: Share S among all parties.
{Phase 1 (party-side). Construction of intermediate representations}
3: Apply $f^{(k)}$ to $X^{(k)}$ and S to obtain $\tilde{X}^{(k)}$ and $\tilde{S}^{(k)}$ for each k ∈ [m].
4: Send $\tilde{X}^{(k)}$, $\tilde{S}^{(k)}$, and $\mathbf{y}^{(k)}$ for all k ∈ [m] to the analyzer.
{Phase 2 (analyzer-side). Construction of collaboration representations}
5: Compute the singular value decomposition Eq (11) of $[\tilde{S}^{(1)}, \ldots, \tilde{S}^{(m)}]$ to obtain Z.
6: Calculate $G^{(k)}$ as the solution to problem Eq (12) for k ∈ [m].
7: Calculate $\hat{X}^{(k)} = \tilde{X}^{(k)} G^{(k)}$ for k ∈ [m].
8: Merge $\hat{X}^{(k)}$ and $\mathbf{y}^{(k)}$ for all k ∈ [m] to obtain $(\hat{X}, \mathbf{y})$.
{Phase 3 (analyzer-side). Collaborative rating prediction}
9: Train h from $(\hat{X}, \mathbf{y})$.
10: Return h and $G^{(k)}$ (i.e., $g^{(k)}$) to each party k ∈ [m].
Each party k ∈ [m] first converts the data format of its rating matrix from $R^{(k)}$ to $(X^{(k)}, \mathbf{y}^{(k)})$ (e.g., from Table 1 to Table 2), as defined by Eq (1). Here, n(k) is the number of ratings held by party k ∈ [m], and p is the number of dummy variables corresponding to users and items. All parties also share an anchor dataset $S \in \mathbb{R}^{r \times p}$ for data collaboration.
In the first phase, each party k ∈ [m] applies its encoding function Eq (4) to the predictor matrix $X^{(k)}$ and the anchor dataset S, thereby yielding the privacy-preserving intermediate representations Eq (5). Then, all parties k ∈ [m] send $\tilde{X}^{(k)}$, $\tilde{S}^{(k)}$, and $\mathbf{y}^{(k)}$ to the analyzer.
In the second phase, the analyzer calculates the collaboration representations Eq (9) based on the integration functions Eq (8) according to the procedure described in the preceding section on collaboration representations. After that, the analyzer merges the resultant datasets as

$$\hat{X} = \begin{bmatrix} \hat{X}^{(1)} \\ \vdots \\ \hat{X}^{(m)} \end{bmatrix} \in \mathbb{R}^{n \times \hat{p}}, \quad \mathbf{y} = \begin{bmatrix} \mathbf{y}^{(1)} \\ \vdots \\ \mathbf{y}^{(m)} \end{bmatrix} \in \mathbb{R}^{n}.$$

In the third phase, the analyzer uses the merged dataset $(\hat{X}, \mathbf{y})$ to train a machine learning model Eq (13) for rating prediction. The analyzer then returns the trained model h and integration function $g^{(k)}$ to each party k ∈ [m]. As a result, each party k ∈ [m] obtains a highly accurate model Eq (14) for rating prediction.
Note that Algorithm 1 is aimed at the horizontal partitioning [30], where items are shared but users are different between parties; however, this algorithm is readily applicable to the vertical partitioning [30], where users are shared but items are different among parties. Specifically, the same algorithm can be used after transposing rating matrices.
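Putting the phases together, a minimal end-to-end sketch of Algorithm 1 under our assumptions: it reuses the helper sketches above, `flatten_rating_matrix` is a hypothetical wrapper around the pandas conversion shown earlier, and all parties are assumed to use a common predictor layout so that the shared anchor dataset S has matching columns. For the vertical partitioning, each rating matrix would simply be transposed first.

```python
from sklearn.decomposition import TruncatedSVD

def run_algorithm_1(rating_matrices, S, dim_intermediate=100, dim_collab=100):
    """Sketch of Algorithm 1 for horizontally partitioned rating matrices."""
    X_tilde_list, S_tilde_list, y_list, encoders = [], [], [], []
    for R_k in rating_matrices:                    # pass R_k.T here for vertical partitioning
        X_k, y_k = flatten_rating_matrix(R_k)      # Phase 0: Table 1 -> Table 2 conversion
        enc = TruncatedSVD(n_components=dim_intermediate, random_state=0)
        X_tilde_list.append(enc.fit_transform(X_k))     # Phase 1: intermediate representations
        S_tilde_list.append(enc.transform(S))
        y_list.append(y_k)
        encoders.append(enc)
    # Phase 2: collaboration representations computed on the analyzer side.
    Z, G_list = compute_integration_matrices(S_tilde_list, p_hat=dim_collab)
    X_hat_list = [X_t @ G for X_t, G in zip(X_tilde_list, G_list)]
    # Phase 3: collaborative rating prediction.
    model = train_collaborative_model(X_hat_list, y_list)
    return model, encoders, G_list
```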
Experimental results and discussion
In this section, we evaluate the effectiveness of our privacy-preserving method for rating prediction through numerical experiments.
Experimental setup
We used two public rating datasets, namely the MovieLens 100K (https://grouplens.org/datasets/movielens/100k/) and SUSHI preference (https://www.kamishima.net/sushi/) datasets. The MovieLens 100K dataset contains 100,000 ratings for 1,682 movies from 943 users on a scale of 1 to 5. The SUSHI preference dataset contains ratings for 100 sushi items from 5,000 users, where each user rated 10 sushi items on a scale of 0 to 4. For reference, Fig 1 shows the frequency distributions of ratings in these datasets.
Ratings of each user were randomly split into training (80%) and testing (20%) datasets. These datasets were randomly distributed to 9 and 50 parties for the MovieLens 100K and SUSHI preference datasets, respectively, with each party holding ratings from 100 users. Machine learning models were trained on the training dataset, and prediction accuracy was measured by the root mean squared error (RMSE) in user ratings on the testing dataset. The process of dataset generation and accuracy evaluation was repeated 10 times, and the average RMSE values with standard errors are given as numerical results.
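A sketch of this evaluation protocol, assuming the ratings are already in the flattened (user, item, rating) format: each user's ratings are split 80/20 at random, and accuracy is measured by the RMSE. Variable and function names are ours.

```python
import numpy as np
import pandas as pd

def split_ratings_per_user(flat_ratings, test_ratio=0.2, seed=0):
    """Randomly split each user's ratings into training and testing sets."""
    rng = np.random.default_rng(seed)
    train_parts, test_parts = [], []
    for _, user_df in flat_ratings.groupby("user"):
        shuffled = user_df.sample(frac=1.0, random_state=int(rng.integers(2**31 - 1)))
        n_test = int(round(len(shuffled) * test_ratio))
        test_parts.append(shuffled.iloc[:n_test])
        train_parts.append(shuffled.iloc[n_test:])
    return pd.concat(train_parts), pd.concat(test_parts)

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted ratings."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))
```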
We compare the prediction accuracy of the following methods for analyzing distributed datasets (Fig 2):
- Individual analysis: individual machine learning models are trained by each party using only its own dataset Eq (1);
- Centralized analysis: a common machine learning model is trained by all parties using the merged dataset Eq (2), while privacy concerns are disregarded;
- Data collaboration analysis: a common machine learning model is trained using the data collaboration analysis for privacy protection (Algorithm 1).
For the data collaboration analysis, an anchor dataset Eq (3) containing 1,000 instances (i.e., r = 1,000) was generated from a uniform distribution over the interval [0, 1]. The intermediate representations Eq (5) were created using the singular value decomposition for dimensionality reduction. The dimensionalities of the intermediate and collaboration representations, $\tilde{p}_k$ and $\hat{p}$ for k ∈ [m], were varied over several settings, as reported in the results below. As in the data collaboration analysis, the flattened data format $(X^{(k)}, \mathbf{y}^{(k)})$ (e.g., Table 2) was used to train machine learning models in both the individual and centralized analyses.
For rating prediction, we used the following machine learning models:
- pyFM: a Python implementation of factorization machines (https://github.com/coreylynch/pyFM);
- LightGBM: a framework of gradient boosting decision trees [47].
The length of latent vectors in the pyFM was set to 100. To mitigate the sparsity of datasets, the number of dataset dimensions for the LightGBM was reduced to 200 through the singular value decomposition. The hyperparameters of the LightGBM were tuned through 5-fold cross-validation using Optuna [48], a library for hyperparameter optimization.
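A sketch of the hyperparameter tuning described above: LightGBM tuned with Optuna through 5-fold cross-validation. The search space and trial budget are illustrative choices of ours, not those used in the paper, and `X_train`/`y_train` denote the (dimensionality-reduced) training data.

```python
import optuna
from lightgbm import LGBMRegressor
from sklearn.model_selection import cross_val_score

def objective(trial):
    params = {
        "num_leaves": trial.suggest_int("num_leaves", 15, 255),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "min_child_samples": trial.suggest_int("min_child_samples", 5, 100),
    }
    model = LGBMRegressor(**params, random_state=0)
    # 5-fold cross-validation; scores are negative RMSEs, so negate the mean.
    scores = cross_val_score(model, X_train, y_train, cv=5,
                             scoring="neg_root_mean_squared_error")
    return -scores.mean()

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
best_model = LGBMRegressor(**study.best_params, random_state=0).fit(X_train, y_train)
```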
Results
Tables 3 and 4 give the testing RMSEs provided by the rating prediction methods for the MovieLens 100K and SUSHI preference datasets, respectively. Recall that these tables give average values over 10 repetitions, with standard errors in parentheses.
First, we compare the performance of the three analysis methods (i.e., individual, centralized, and data collaboration analyses). For both datasets, the RMSEs of the data collaboration analysis were larger than those of the centralized analysis and smaller than those of the individual analysis. Recall that the individual analysis does not merge distributed datasets, whereas the centralized analysis merges distributed datasets without protecting privacy. For these reasons, the obtained results are considered reasonable.
Next, we discuss the dimensionalities of the intermediate and collaboration representations (i.e., $\tilde{p}_k$ and $\hat{p}$) employed in the data collaboration analysis. As for the dimensionality of intermediate representations, the lowest RMSEs were obtained with the largest values of $\tilde{p}_k$. This is because low-dimensional intermediate representations lose information from the original datasets. As for the dimensionality of collaboration representations, the lowest RMSEs were often obtained with the largest values of $\hat{p}$. This is probably because high-dimensional collaboration representations increase the flexibility of the integration functions.
Then, we focus on the machine learning models (i.e., pyFM and LightGBM) implemented for rating prediction. The factorization machines (pyFM) consistently outperformed the gradient boosting decision trees (LightGBM) in the individual and centralized analyses, whereas the opposite was observed in the data collaboration analysis. Although factorization machines generally perform well for rating prediction, the data collaboration analysis applies dimensionality reduction to create intermediate representations, which works in favor of the prediction performance of the gradient boosting decision trees.
Fig 3 shows the testing RMSEs provided by the rating prediction methods as a function of the number of involved parties for the MovieLens 100K and SUSHI preference datasets. Recall that this figure shows average values over 10 repetitions, with standard errors shown as error bars. Note here that the factorization machines were used for the individual and centralized analyses, and that the gradient boosting decision trees were used for the data collaboration analysis; this is due to the effectiveness of these combinations, as previously discussed. The dimensionalities of the intermediate and collaboration representations, $\tilde{p}_k$ and $\hat{p}$ for k ∈ [m], were fixed to the best-performing settings identified above.
Naturally, increasing the number of parties did not improve prediction accuracy in the individual analysis. In contrast, the RMSEs decreased with the increasing number of parties in the centralized and data collaboration analyses. Additionally, although the individual analysis always showed relatively large standard errors of RMSEs, the data collaboration analysis decreased the standard errors as the number of parties increased. These results demonstrate that increasing the number of involved parties yields more accurate and stable prediction models in the data collaboration analysis.
Conclusion
We focused on privacy-preserving recommender systems on distributed datasets. For this purpose, we designed a rating prediction algorithm using the data collaboration analysis [37] for privacy protection. In this algorithm, the user–item rating matrix is converted into the flattened format with the aim of treating missing value imputation as regression. This conversion makes it possible to apply the data collaboration analysis to rating prediction for recommendations. Note also that our algorithm is readily applicable to both horizontal and vertical integration of rating matrices.
To verify the effectiveness of our method, we performed numerical experiments using two public rating datasets. Our method for collaborative rating prediction improved the prediction accuracy while protecting privacy of the original datasets. Even when each party owns a small dataset, our method can build a reliable recommender system through data collaboration. Numerical results also confirmed that the prediction accuracy of our method surpassed that of the individual analysis for both datasets and was comparable to that of the centralized analysis, particularly in the SUSHI preference dataset.
A future research direction will be to use more sophisticated methods for creating anchor datasets [43,44] and integration functions [45,46] in the data collaboration analysis. Other directions include the application of the data collaboration analysis to generating a high-quality list of recommendations [49–51] and promoting sales through price optimization [52,53].
References
- 1. Resnick P, Varian HR. Recommender systems. Commun ACM. 1997;40(3):56–8.
- 2. Aggarwal CC. Recommender systems. New York: Springer; 2016.
- 3. Resnick P, Iacovou N, Suchak M, Bergstrom P, Riedl J. GroupLens: An open architecture for collaborative filtering of netnews. In: Proceedings of the 1994 ACM conference on computer supported cooperative work. 1994. p. 175–186.
- 4. Koren Y, Bell R, Volinsky C. Matrix factorization techniques for recommender systems. Computer. 2009;42(8):30–7.
- 5. Lu J, Wu D, Mao M, Wang W, Zhang G. Recommender system application developments: A survey. Decis Support Syst. 2015;74:12–32.
- 6. Rodríguez-Hernández M, Ilarri S. AI based mobile context-aware recommender systems from an information management perspective: Progress and directions. Knowl Based Syst. 2021;215:106740.
- 7. Bobadilla J, Ortega F, Hernando A, Gutiérrez A. Recommender systems survey. Knowl Based Syst. 2013;46:109–32.
- 8. Zhang S, Yao L, Sun A, Tay Y. Deep learning based recommender system. ACM Comput Surv 2019;52(1):1–38.
- 9. Wu S, Sun F, Zhang W, Xie X, Cui B. Graph neural networks in recommender systems: A survey. ACM Comput Surv. 2022;55(5):1–37.
- 10. Bleiholder J, Naumann F. Data fusion. ACM Comput Surv. 2009;41(1):1–41.
- 11. Jeckmans A, Beye M, Erkin Z, Hartel P, Lagendijk R, Tang Q. Privacy in recommender systems. Social media retrieval. 2013. p. 263–281.
- 12. Himeur Y, Sohail SS, Bensaali F, Amira A, Alazab M. Latest trends of security and privacy in recommender systems: A comprehensive review and future perspectives. Comput Secur. 2022;118:102746.
- 13. Ogunseyi TB, Avoussoukpo CB, Jiang Y. A systematic review of privacy techniques in recommendation systems. Int J Inf Secur 2023;22(6):1651–64.
- 14. Sun Y, Zhang Y. Conversational recommender system. In: Proceedings of the 41st international ACM SIGIR conference on research & development in information retrieval. 2018. p. 235–244.
- 15. Weinsberg U, Bhagat S, Ioannidis S, Taft N. BlurMe: Inferring and obfuscating user gender based on ratings. In: Proceedings of the 6th ACM conference on recommender systems. 2012. p. 195–202.
- 16. Polatidis N, Georgiadis C, Pimenidis E, Mouratidis H. Privacy-preserving collaborative recommendations based on random perturbations. Expert Syst Appl. 2017;71:18–25.
- 17. Wei R, Tian H, Shen H. Improving k-anonymity based privacy preservation for collaborative filtering. Comput Electr Eng. 2018;67:509–19.
- 18. Saleem Y, Rehmani M, Crespi N, Minerva R. Parking recommender system privacy preservation through anonymization and differential privacy. Eng Rep. 2021;3(2):e12297.
- 19. Canny J. Collaborative filtering with privacy. In: Proceedings of the 2002 IEEE symposium on security and privacy. 2002. p. 45–57.
- 20. Nikolaenko V, Ioannidis S, Weinsberg U, Joye M, Taft N, Boneh D. Privacy-preserving matrix factorization. In: Proceedings of the 2013 ACM SIGSAC conference on computer & communications security. 2013. p. 801–812.
- 21. McSherry F, Mironov I. Differentially private recommender systems: Building privacy into the Netflix prize contenders. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining. 2009. p. 627–636.
- 22. Aïmeur E, Brassard G, Fernandez JM, Mani Onana FS. ALAMBIC: A privacy-preserving recommender system for electronic commerce. Int J Inf Secur 2008;7(5):307–34.
- 23. Yin C, Shi L, Sun R, Wang J. Improved collaborative filtering recommendation algorithm based on differential privacy protection. J Supercomput 2019;76(7):5161–74.
- 24. Berlioz A, Friedman A, Kaafar MA, Boreli R, Berkovsky S. Applying differential privacy to matrix factorization. In: Proceedings of the 9th ACM conference on recommender systems. 2015. p. 107–114. https://doi.org/10.1145/2792838.2800173
- 25. Liu Z, Wang Y, Smola A. Fast differentially private matrix factorization. In: Proceedings of the 9th ACM conference on recommender systems. 2015. p. 171–178.
- 26. Ran X, Wang Y, Zhang L, Ma J. A differentially private nonnegative matrix factorization for recommender system. Inf Sci. 2022;592:21–35.
- 27. Fang L, Du B, Wu C. Differentially private recommender system with variational autoencoders. Knowl Syst. 2022;250:109044.
- 28. Bagdasaryan E, Poursaeed O, Shmatikov V. Differential privacy has disparate impact on model accuracy. Adv Neural Inf Process Syst. 2019;32:15479–88.
- 29. Ammad-ud-din M, Ivannikova E, Khan S, Oyomno W, Fu Q, Tan K. Federated collaborative filtering for privacy-preserving personalized recommendation system. arXiv Preprint. 2019.
- 30. Yang L, Tan B, Zheng VW, Chen K, Yang Q. Federated recommendation systems. Federated learning: privacy and incentive. 2020. p. 225–239.
- 31. Liang F, Pan W, Ming Z. FedRec: Lossless federated recommendation with explicit feedback. In: Proceedings of the AAAI conference on artificial intelligence. Vol. 35. 2021. p. 4224–4231.
- 32. Zhang H, Luo F, Wu J, He X, Li Y. LightFR: Lightweight federated recommendation with privacy-preserving matrix factorization. ACM Trans Inf Syst. 2023;41(4):1–28.
- 33. Liu Z, Yang L, Fan Z, Peng H, Yu PS. Federated social recommendation with graph neural network. ACM Trans Intell Syst Technol 2022;13(4):1–24.
- 34. Wang Q, Yin H, Chen T, Yu J, Zhou A, Zhang X. Fast-adapting and privacy-preserving federated recommender system. VLDB J 2021;31(5):877–96.
- 35. Imran M, Yin H, Chen T, Nguyen QVH, Zhou A, Zheng K. ReFRS: Resource-efficient federated recommender system for dynamic and diversified user preferences. ACM Trans Inf Syst. 2023;41(3):1–30.
- 36. Zhang C, Xie Y, Bai H, Yu B, Li W, Gao Y. A survey on federated learning. Knowl Based Syst. 2021;216:106775.
- 37. Imakura A, Sakurai T. Data collaboration analysis framework using centralization of individual intermediate representations for distributed data sets. ASCE-ASME J Risk Uncertain Eng Syst A Civ Eng. 2020;6(2):04020018.
- 38. Bogdanova A, Nakai A, Okada Y, Imakura A, Sakurai T. Federated learning system without model sharing through integration of dimensional reduced data representations. arXiv Preprint. 2020.
- 39. Imakura A, Bogdanova A, Yamazoe T, Omote K, Sakurai T. Accuracy and privacy evaluations of collaborative data analysis. arXiv Preprint. 2021.
- 40. Imakura A, Sakurai T, Okada Y, Fujii T, Sakamoto T, Abe H. Non-readily identifiable data collaboration analysis for multiple datasets including personal information. Inf Fusion. 2023;98:101826.
- 41. Rendle S. Factorization machines. In: 2010 IEEE international conference on data mining. 2010. p. 995–1000.
- 42. Imakura A, Ye X, Sakurai T. Collaborative data analysis: Non-model sharing-type machine learning for distributed data. Knowledge management and acquisition for intelligent systems. New York: Springer; 2021.
- 43. Takahashi Y, Chang H, Nakai A, Kagawa R, Ando H, Imakura A, et al. Decentralized learning with virtual patients for medical diagnosis of diabetes. SN Comput Sci 2021;2(4):1–10.
- 44. Imakura A, Kihira M, Okada Y, Sakurai T. Another use of SMOTE for interpretable data collaboration analysis. Expert Syst Appl. 2023;228:120385.
- 45. Kawakami Y, Takano Y, Imakura A. New solutions based on the generalized eigenvalue problem for the data collaboration analysis. arXiv Preprint. 2024.
- 46. Nosaka K, Yoshise A. Creating collaborative data representations using matrix manifold optimal computation and automated hyperparameter tuning. In: 2023 IEEE 3rd international conference on electronic communications, internet of things and big data. 2023. p. 180–185.
- 47. Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W. LightGBM: A highly efficient gradient boosting decision tree. Adv Neural Inf Process Syst. 2017;30:3146–54.
- 48. Akiba T, Sano S, Yanase T, Ohta T, Koyama M. Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining. 2019. p. 2623–2631.
- 49. Wang J. Mean-variance analysis: A new document ranking theory in information retrieval. European conference on information retrieval. New York: Springer; 2009.
- 50. Hurley N, Zhang M. Novelty and diversity in top-n recommendation – analysis and evaluation. ACM Trans Internet Technol 2011;10(4):1–30.
- 51. Yasumoto Y, Takano Y. Mean–variance portfolio optimization with shrinkage estimation for recommender systems. Optimization online. 2023.
- 52. Klein R, Koch S, Steinhardt C, Strauss AK. A review of revenue management: Recent generalizations and advances in industry applications. Eur J Oper Res 2020;284(2):397–412.
- 53. Ikeda S, Nishimura N, Sukegawa N, Takano Y. Prescriptive price optimization using optimal regression trees. Oper Res Perspect. 2023;11:100290.