
Research on differential privacy protection method based on user tendency

  • Zhaowei Hu

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    huzw19@126.com

    Affiliations School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou, Jiangsu, China, College of Computer, Jilin Normal University, Siping, Jilin, China

Abstract

Mining a user's activity patterns from massive user data constitutes a new attack model. To address the privacy leakage caused by user tendency under current privacy-preserving methods, this paper proposes an extended differential privacy preservation method based on user tendency. By constructing a Markov chain and applying the Markov decision process, the user's tendency is expressed equivalently as a measurable state transition probability, which transforms a qualitative description of the tendency into a quantitative representation and enables its accurate measurement. An extended (P, ε)-differential privacy protection method is proposed: by introducing a privacy model parameter R, it combines the quantified tendency probability with the differential privacy budget parameter and dynamically adds different amounts of noise according to the user's tendency, thereby protecting the user's tendency information while improving data availability. Finally, the feasibility and effectiveness of the proposed method are verified by experiments.

1. Introduction

With the continuous development of the Internet of Things and data mining technology, mining a user's activity patterns from large amounts of user data has become a new attack model [1-4]. Based on the user's historical trajectory data, the attacker first models the activity patterns of the target user and gives a specific quantitative description of them. Then, according to the constructed model, the attacker recovers and reconstructs the user's trajectory to infer the sensitive information the user has hidden, to estimate the likelihood that the user visits a certain geographical location, and to predict the user's trajectory and path.

A user's movement behavior is often closely related to time, and the user's activity patterns reflect his tendency, which is very important for an attacker analyzing the user's behavioral attributes [5]. The more balanced the probabilities with which a user visits different locations, the less obvious the behavioral tendency and the smaller the risk that private information is disclosed. Therefore, by protecting the user's tendency in the data set, the user's private information can be effectively protected and sensitive information prevented from being leaked.

This paper aims to protect the tendency of mobile users. Based on state transitions in Markov models, a Markov chain is constructed to quantify the probability of transitions between adjacent location points. This probability characteristic is associated with the privacy budget parameter to dynamically add noise and protect the user's tendency, preventing the user's private information from being leaked. The main work of the paper is as follows.

First, an extended differential privacy protection method based on the Markov decision process is proposed to protect a user's tendency. By analyzing the user's historical trajectory data and calculating the probability of the user visiting specific location points, the user's tendency probability is converted into the state transition probability of a Markov chain. The state transition probability of a location point is used to quantify the user's tendency, and trajectories whose tendency exceeds a threshold are protected to prevent the leakage of private information.

Second, on the basis of the traditional ε-differential privacy protection method, the user's personal tendency is quantified and measured by constructing a Markov chain and introducing a privacy model parameter. Combining the user's tendency probability with the differential privacy budget makes it possible to dynamically add an appropriate amount of noise according to the user's visit tendency, protecting the user's sensitive information while improving data availability.

Third, the feasibility and effectiveness of the proposed method are demonstrated by comparison with two other privacy protection methods. The results show that the proposed method provides efficient privacy protection while preserving data availability.

The paper is organized as follows: Section 2 discusses related work; Section 3 presents the research content and states the research question; Section 4 systematically introduces the research methods and the measures used to evaluate the effectiveness of privacy protection. An analysis and discussion of the experiments are provided in Section 5, and Section 6 concludes the paper.

2. Related work

A user's location and trajectory data imply the user's activity patterns and personal tendency. From these tendencies, an attacker can deduce the user's sensitive information from anonymized data and predict the likelihood of the user visiting certain geographical locations. The activity patterns of the user himself and of similar groups affect, with different probabilities, the next location the user visits. Attacks based on user tendency depend on a model's characterization of specific attack targets; they use a privacy model of the user's location trajectory as a user profile to re-identify and de-anonymize the user [6-10]. Ashbrook was the first to apply the Markov model to the analysis of geographical location information, building a Markov model of users' temporal location transitions and predicting a user's next location from this movement model [11].

Alvarez combined the user's trajectory information with local road network information to accurately predict the user's current travel destination [12]. Gambs obtained the POIs of user trajectories after density-based clustering and calculated the transition probabilities between POIs with a Mobility Markov Chain [13]. Pan extended the discrete-time Markov chain model to a continuous-time one [14], which better simulates the process of user stays and transfers. However, when the user's behavior trajectory changes, it takes a long time to update the model.

Wang believed that users' mobile behaviors are often closely related to time, and that the time attribute they contain is very significant for analyzing mobile behavior [15]. In practical applications, researchers found that user mobile behavior exhibits centrality, that is, users move around several geographic centers. Gonzalez studied users' movement patterns through mobile phone data and found that people regularly return to a small number of previously visited locations; the movement pattern can be modeled as a random process centered on a fixed point [16]. Song demonstrated that 93% of human movements are highly regular and that a user spends about 70% of his time at the location he visits most frequently [17].

Sadilek used dynamic Bayesian networks to predict location from friends' historical trajectories and situational information [18]. Considering the problem of data sparseness, Xue divided a trajectory into several sub-trajectories and used them to generate an R-order reachable transition matrix that expands the prediction space; all positions were predicted with a Bayesian approach and the top-N extracted positions were returned, achieving more accurate user position prediction [19]. Huo used a Bayesian model as a hidden-location inference attack model to obtain behavior patterns and POI preferences from a user's historical data, and used the behavior patterns of most users to infer the likelihood that a user will visit a POI [20].

To analyze the influence of a user's activity patterns on his mobile data and protect his private information, Qiu proposed a semi-supervised learning attack model [21]. By analyzing a user's mobility characteristics and life patterns, it can identify important locations such as residence and workplace from his encrypted mobile data with an accuracy above 98%. To address the privacy leakage caused by user activity patterns in mobile data sets, Tu proposed a method to recover user trajectories from aggregated mobility data [22]. Based on a user's personal characteristics or activity patterns, 73-91% of personal trajectories can be recovered from anonymized mobility data without any prior knowledge. To hide the sensitive labels and activity patterns contained in user trajectory data, Yao proposed a comprehensive trajectory-protecting publishing algorithm [23]. It determines hot spots and outliers by density clustering and obfuscates precise positions by generalization; it captures the relationship between sensitive labels and trajectory points in all records and adds Laplacian noise for differential privacy protection. To protect sensitive information such as activity patterns contained in trajectory data, Buchholz proposed a protected trajectory reconstruction model based on deep learning [24], which uses noise differences to reconstruct the original track, reducing the Euclidean and Hausdorff distances between the published trajectory and the original one to protect private information. Su proposed a POI recommendation algorithm integrating social relationships and geographical influence to address the sparsity of users' point-of-interest visit data [25]. It considers the privacy protection of activity patterns in user trajectory data, and adopts a graph convolutional neural network to explicitly learn the cooperative relationships between users, between POIs, and between users and POIs, alleviating data sparsity while protecting the user's private information.

However, current research on the personal characteristics and activity patterns in users' mobile data focuses mainly on analyzing, modeling and predicting the regularity of user activities, which identifies the activity patterns and behavioral attributes implied in location trajectory data. It does not systematically analyze and mine the user's activity patterns, summarize the user's tendencies, and then protect those tendencies. Analysis of user activity data sets shows that users' daily location and trajectory data exhibit obvious tendencies: 53% of the check-in locations in Brightkite and 31% of the check-in locations in Gowalla had previously been visited by the user. The reflected tendency is very important for analyzing the user's activity patterns in mobile data. It is therefore necessary to protect not only the user's location, trajectory and identity privacy, but also the user's tendency, so as to effectively prevent the disclosure of private information. Research on differential privacy protection methods has made great progress; how to use differential privacy to protect the user's tendency is the focus of this paper.

To reduce data distortion and improve data utilization, Comas proposed an improved differential privacy solution that provides the same privacy guarantee as standard differential privacy [26]. They argued that the standard formalization of differential privacy is stricter than the intuitive privacy guarantee; in practical applications, indistinguishability between a data set and its neighboring data sets is sufficient. Zhao proposed a privacy protection method based on differential privacy clustering; it adds Laplacian noise to the trajectory position counts within clusters to resist continuous query attacks [27]. The noisy cluster center is obtained from the noisy location data and the noisy location count; it prevents excessive noise from degrading the clustering effect and ensures the availability of the data for cluster analysis. Cao proposed a differential privacy protection method based on time correlation for continuous data release [28]. They designed an efficient algorithm that calculates temporal privacy leakage and converts traditional differential privacy mechanisms into temporal privacy leakage mechanisms. These methods effectively enhance the protection of user trajectory data and improve data availability, but they do not analyze or protect the personal tendency contained in the trajectory data.

Zhang proposed a differential privacy protection method based on probability mechanisms, which uses a probabilistic counting structure to count the number of users in various areas and adds noise drawn from the Laplace distribution to achieve perturbation; a user can control the privacy level by adjusting the parameters of the Laplace distribution [29]. To solve the problem of privacy protection in the release and use of mobile crowd-sensing data, Kim proposed a new cooperative-game privacy protection model; according to personal preference, it provides effective payment schemes for each participating device to protect personal privacy [30]. Gao solved the feedback Nash equilibrium through dynamic programming based on the MCS system, which allows users and platforms to maximize privacy requirements and data utility, respectively, resolving the trade-off between privacy protection and data utility [31]. Cho believed that user movement patterns are consistent with the characteristics of a Gaussian distribution, so a mixed Gaussian model can be used to model user movement [32]. Analysis of the Brightkite dataset shows that if a user visits a location for the first time, there is a 53% probability that he will visit it again. It follows that the propensity of a user's activities largely reflects his behavioral characteristics and life patterns, and it is very important to adopt differential privacy to protect the user's propensity from disclosure.

To sum up, current research analyzes, models and predicts the activity patterns contained in mobile user trajectories and builds de-anonymization attack models based on those patterns, but it does not protect the user's tendency in light of such attack models. With the continuous accumulation of data and the development of data mining, attacks based on user activity patterns and tendencies will also increase. If privacy protection is not tailored to the user's tendency, an attacker can predict the likelihood of the user visiting certain sensitive locations from the tendency revealed by his historical data; the attacker can then predict the start and end points of a journey and an accurate path, and mine the hidden sensitive information. In this paper, the traditional differential privacy protection model is extended to protect the user's tendency: by constructing the transition states of a Markov chain, the user's tendency is quantified as the chain's state transition probability, and this tendency probability is dynamically correlated with the privacy budget parameter of the differential privacy model. Perturbing noise can thus be added dynamically to control the user's access tendency, protect his sensitive information, and improve data availability.

The differences between this work and previous work are as follows. First, a trajectory privacy protection method based on user tendency is proposed. By analyzing the user's historical trajectory data and calculating the probability of the user visiting each location point, the user's tendency probability is converted into the state transition probability of a Markov chain, and this state transition probability is used to quantify the tendency. If a user's tendency is higher than the privacy protection threshold, it must be protected to prevent privacy leakage. Second, a (P, ε)-differential privacy protection method is proposed to improve data availability: by constructing a Markov chain and introducing the privacy model parameter, the user's tendency probability and the differential privacy budget parameter are dynamically correlated, so that differential noise is added dynamically according to the tendency probability, controlling the user's access tendency and protecting his sensitive information.

In this paper, the Markov model and the differential privacy protection model are combined into an extended differential privacy protection model based on user tendency, for three main reasons. First, a user's movement trajectory is a collection of location points in chronological order, and the transitions between location points have the Markov state transition property, so the trajectory can be regarded as a Markov decision process. Second, the user's tendency is reflected in the transition probability from one location point to another, which conforms to the Markov property and is therefore suitable for treatment as a state transition probability of a Markov model. Third, the greater the probability of a user visiting a specific location, the more obvious his tendency, the higher its sensitivity, and the stronger the privacy protection required; more noise must then be added, which the differential privacy protection method can achieve.
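The third point can be illustrated with a small sketch: Laplace noise whose scale grows as the tendency probability approaches 1. The scaling rule effective_epsilon = ε·(1 − tendency), the function names, and the parameters are assumptions for illustration only, not the paper's actual (P, ε) mechanism.

```python
import random

def laplace_noise(scale: float) -> float:
    """Draw one sample from Laplace(0, scale): the difference of two
    independent Exp(1) variables, scaled, is Laplace-distributed."""
    return scale * (random.expovariate(1.0) - random.expovariate(1.0))

def perturb_count(value: float, sensitivity: float,
                  epsilon: float, tendency: float) -> float:
    """Add Laplace noise whose magnitude grows with the user's tendency.

    A tendency probability closer to 1 shrinks the effective privacy
    budget, enlarging the noise scale and hiding the tendency more
    strongly.  The linear scaling rule below is an illustrative
    assumption, not the coupling defined by the proposed method.
    """
    eff_eps = max(epsilon * (1.0 - tendency), 1e-6)  # avoid a zero budget
    return value + laplace_noise(sensitivity / eff_eps)
```

With tendency = 0 the call reduces to the standard ε-differential privacy Laplace mechanism; as tendency approaches 1 the added noise dominates the true value.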

3. Problem statement

3.1 User’s tendency attack model

With the in-depth development of data mining, new attacks based on user activity patterns and personal tendencies are becoming increasingly active. An attacker uses a specific model to quantitatively describe the activity of the target user and analyzes the regularity of the user's visits to certain physical locations, thereby obtaining the user's tendency and private information about his movements. In this way, the attacker can obtain the likelihood of the user visiting a certain physical location, and can even accurately predict his itinerary.

By analyzing a mobile user's historical trajectory data, the user's movement regularity or tendency can be obtained. For example, people generally move from a residential area to a work area in the morning, from the work area to a commercial area in the afternoon, and finally from the commercial area back to the residential area. If the service functions and geographical attributes of the places represented by the location points differ, people's dwell times and usage times also differ. As shown in Fig 1, people generally eat in restaurants at 07:30, 12:00 and 18:00, and work at a company from 7:30 to 18:00. The total time spent at fast food outlets per day is no more than 2 hours, while dining in restaurants exceeds 4 hours and time at the company exceeds 8 hours. The lengths of time people spend at different places, and the transitions between places in different time periods, are related to users' living habits, activity patterns and personal preferences, and reflect the user's personal tendency.

Fig 1. Schematic diagram of user tendency.

(a) Usage Time. (b) Duration Time.

https://doi.org/10.1371/journal.pone.0288823.g001

The attacker collects the user's historical trajectory data, from which the real probability of the user visiting a location can be analyzed: (1)

Where u represents the attacked user (attacked user A is denoted uA), p represents the set of all information about the user at a certain location, and t represents the time at which user u is at this location.

An attack based on the user's tendency proceeds as follows: the attacker analyzes the user's activity pattern from his historical trajectory data, obtains the user's tendency, and predicts his possible future activities to obtain private information. Assume that at time t, the probability that the attacker can infer the user's location is: (2)

Where B represents the user tendency information obtained by the attacker. This probability is called the posterior probability; it is the user's state transition probability and can be calculated from the user's previous state. The model as a whole represents the probability that the attacker can infer the user's location once he has obtained the above information.

Given the amount of prior information that the user is at a certain position at time t, the amount of position information exposed to the attacker is [33, 34]: (3)

When M exceeds the set threshold, that is, when it reaches the propensity attack threshold, the attacker can analyze the user's regular activity pattern and personal preferences from the prior probability of the user's visits and the statistically calculated posterior probability, and can mount tendency-based attacks to obtain the user's private information. Therefore, protecting a mobile user's tendency is an important part of user privacy protection. When establishing a privacy protection model, the tendency probability p must be reduced below a certain threshold to ensure the validity of the model, so that an attacker cannot predict the user's next state from his inclination and current state to conduct tendency attacks and obtain private information.

3.2 User tendency analysis based on Markov decision process

A trajectory is a sequence of location points that a mobile user visits in chronological order. The possibility of a user visiting a location is related only to the location visited at the previous moment, not to earlier locations. This feature is consistent with the properties of state changes in Markov models. Therefore, the position points on the user's trajectory can be regarded as state points in a Markov model, and the user's tendency can be analyzed through the regularity of state transitions in the Markov chain. Assuming that the states (position points) at the first n moments of the user's trajectory are X0, X1, X2, …, Xn, the probability distribution of the state at moment n+1 can be obtained from the Markov chain.

Definition 1: Consider a random sequence {X(t), t∈T} with time parameter set T = {tn | tn ≥ 0, n∈N*} and state space S = {0, 1, 2, …}. For any tn∈T, if the sequence is in state in at time tn, then the probability that it is in state in+1 at time tn+1 is:

P{X(tn+1) = in+1 | X(tn) = in}  (4)

The sequence {X(t), t∈T} is called a Markov chain: given the past states X(0), X(1), X(2), …, X(n-1) and the current state X(n), the future state X(n+1) is independent of the past states and related only to the current state, so the conditional probability can be expressed as:

P{X(n+1) | X(n), X(n-1), …, X(0)} = P{X(n+1) | X(n)}  (5)

A Markov chain is a stochastic process with the Markov property: the next state is related only to the current state, not to past states. In a Markov chain, the process can move from one state to another after a certain period of time; the probability of such a move is called the state transition probability. Its value reflects the regularity and preferences of user behavior, that is, the user's tendency.

Definition 2: Given a Markov chain {X(t), t∈T}, the one-step transition probability at time tn is:

pij = P{X(tn+1) = j | X(tn) = i}, i, j∈S  (6)

Where pij represents the probability that the sequence transitions from state i at time tn to state j at time tn+1. The one-step transition probability is referred to simply as the transition probability. It satisfies the following properties:

pij ≥ 0 and Σj∈S pij = 1, for every i∈S  (7)

Assume a sequence with n states, and denote the one-step transition probability from state i to state j by pij. The matrix composed of the one-step transition probabilities between all states is called the probability transition matrix:

P = [pij]n×n =
| p11 p12 … p1n |
| p21 p22 … p2n |
| …   …   …   … |
| pn1 pn2 … pnn |  (8)

Example. A Markov chain with three states is constructed, with state transition probability matrix P as follows:

P =
| 0.1 0.4 0.5 |
| 0.6 0.2 0.2 |
| 0.3 0.1 0.6 |

That is:

  1. When X(n) = 0, then P(X(n+1) = 0) = 0.1, P(X(n+1) = 1) = 0.4, P(X(n+1) = 2) = 0.5.
  2. When X(n) = 1, then P(X(n+1) = 0) = 0.6, P(X(n+1) = 1) = 0.2, P(X(n+1) = 2) = 0.2.
  3. When X(n) = 2, then P(X(n+1) = 0) = 0.3, P(X(n+1) = 1) = 0.1, P(X(n+1) = 2) = 0.6.
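The example above can be checked in a few lines of code: each row of the matrix is a probability distribution, as property (7) requires, and propagating a state distribution through the matrix gives exactly the next-step probabilities listed in items 1-3.

```python
# Transition matrix from the three-state example: row i gives the
# distribution of X(n+1) when X(n) = i.
P = [
    [0.1, 0.4, 0.5],
    [0.6, 0.2, 0.2],
    [0.3, 0.1, 0.6],
]

# Property (7): entries are non-negative and each row sums to 1.
for row in P:
    assert all(p >= 0 for p in row)
    assert abs(sum(row) - 1.0) < 1e-9

def step(dist, P):
    """One step of the chain: propagate a state distribution through P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

# Starting surely in state 0, the one-step distribution is row 0 of P.
print(step([1.0, 0.0, 0.0], P))  # [0.1, 0.4, 0.5]
```

Repeated calls to step give the k-step distribution, which is how a tendency attacker would forecast the user's position several moves ahead.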

The transition probability is the tendency of a process in state i at time n to be in state j after a certain time interval. According to the trajectory division method proposed in the literature [35, 36], when modeling the Markov chain and performing tendency analysis, the user's trajectory data can be serialized and divided into sub-trajectories that each contain only two adjacent points; these sub-trajectories are then spliced back into a complete trajectory. Tendency analysis evaluates the possibility of a direct state transition via the state transition matrix.

Definition 3: The prior probability of a user visiting a location point. The trajectory T is a state sequence composed of n location points L, where the ith location point is denoted Li and its state value li. The probability of the user visiting it satisfies the Markov property:

P(Li = li | Li-1 = li-1, …, L1 = l1) = P(Li = li | Li-1 = li-1)  (9)

Therefore, the occurrence probability of any state is related only to the previous state and has nothing to do with other states. From the occurrence probabilities of the position points, the probability of trajectory T can be determined.

For any state Li, its prior probability is:

P(li) = num(li) / Σlj∈S num(lj)  (10)

Where num(li) represents the number of occurrences of state li, and the denominator Σlj∈S num(lj) is the total number of occurrences over all states.

The posterior probability is the transition probability from one position point to another, so it can be obtained from the state transition matrix. For any trajectory sequence T, the occurrence probability of each position point L is calculated separately, and the maximum over all these probabilities is taken as the propensity probability of the trajectory sequence T.

Definition 4: User's tendency probability. The tendency probability is the spatial attribute contained in the user's trajectory; it is the probabilistic expression of a user's regularity and willingness to visit a specific location, and of his personal preference. It can be quantitatively expressed through the state transition probabilities in the user's trajectory, defined as:

P(T) = P(l1) · Πi=2..n P(li | li-1)  (11)

Where P(l1) represents the probability that the user visits the first position point of trajectory T, and P(li) the probability that the user visits the i-th position point; this is the real probability that state li occurs, i.e., the prior probability. P(li|li-1) represents the probability that the user visits the i-th point after visiting the (i-1)-th point of trajectory T, called the posterior probability. It is the probability that the attacker can obtain by analyzing, counting and calculating the user's historical trajectory, and it is an important manifestation of the user's tendency. Thus, the propensity is quantified as the probability of state transitions in the mobile user's trajectory, expressed as P(T)∈[0,1], where 0 means no tendency and 1 means a strong tendency.
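Definition 4 can be evaluated directly from a prior table and a transition table. The sketch below uses hypothetical place names and probabilities to compute P(T) as the prior of the first point times the chain of posteriors.

```python
def trajectory_probability(traj, prior, trans):
    """Propensity probability of a trajectory per Definition 4:
    P(T) = P(l1) * product of P(l_i | l_{i-1}) for i = 2..n.

    `prior` maps a location to its prior visit probability P(l);
    `trans` maps (l_prev, l_next) to the posterior transition
    probability.  Both would be estimated from historical data.
    """
    p = prior[traj[0]]
    for prev, nxt in zip(traj, traj[1:]):
        p *= trans[(prev, nxt)]
    return p

# Hypothetical tables for illustration.
prior = {"home": 0.5, "work": 0.3, "mall": 0.2}
trans = {("home", "work"): 0.8, ("work", "mall"): 0.6}
print(trajectory_probability(["home", "work", "mall"], prior, trans))
# ≈ 0.5 * 0.8 * 0.6 = 0.24
```

A trajectory whose value exceeds the privacy protection threshold would, under the proposed method, receive additional noise.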

3.3 Problem characterization

If a user visits a location for the first time, there is a high probability that he will visit it again. The propensity of a user's activities thus largely reflects his behavioral characteristics and life patterns. In this paper, a user's propensity is measured by the probability of visiting a specific location point, or by the corresponding probability in the Markov chain. In privacy protection, the propensity probability is used to measure the degree of privacy leakage; since it can be expressed formally as a specific value, the propensity is measurable.

It is necessary to protect not only the user's location, trajectory and identity privacy, but also the user's tendency, so as to effectively prevent the disclosure of private information. However, current research on tendency protection has not been carried out in depth; how to use the Markov model and differential privacy to protect the user's tendency is the focus of this paper. The workflow of the proposed method is shown in Fig 2.

In this process, the probability of visiting each location point is first calculated, the state transition probabilities between position points are computed, and a Markov chain is constructed. Second, privacy protection parameters are selected according to user needs, privacy budget parameters are calculated for different user tendencies, and privacy protection noise is added. Finally, the anonymized trajectory data are published.

4. Proposed methodology

4.1 Markov model construction

From a statistical point of view, the Markov model is a widely used statistical model. Historical states closer to the present have a greater impact on the next state, while earlier historical states can be ignored. If the user's trajectory were related to the previous k states, a new trajectory would be generated from which the attacker could not analyze the user's original tendency. Therefore, a first-order Markov model is sufficient to handle the user's trajectory tendency.

According to probability theory and mathematical statistics, when the number of trials is large, a probability can be approximated by the corresponding frequency. Therefore, in this paper, the frequency of state transitions is used to approximate the state transition probabilities in the Markov model. The steps are as follows:

  1. Perform preprocessing such as screening, filtering and segmentation based on historical trajectory data.
  2. Calculate the initial frequency of state transition with the processed data to obtain the initial probability.
  3. Calculate the frequency of state transition to obtain the state transition probability.
  4. Construct a state transition matrix and build a Markov model.

As shown in Table 1, the state transition record is obtained using the trajectory frequency calculation method, where nij is the number of transitions from the current state si to the future state sj. From the numbers of state transitions, the frequencies of state transitions can be obtained.

For subsequences {x0, x1, x2, …, xi, xj, …, xn}, let fij denote the frequency of transitions from state i to state j, i, j∈E. The transition frequency matrix for time interval Δt is:

F(Δt) = [fij], i, j∈E  (12)

Substituting frequency for probability, the transition probability for time interval Δt is estimated as:

pij(Δt) ≈ fij / Σj∈E fij  (13)

Then the state transition probability matrix with interval Δt is:

P(Δt) = [pij(Δt)]  (14)
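The frequency-based estimate of Eqs (12)-(14) amounts to counting adjacent state pairs in an observed sequence and normalizing each count by the number of transitions leaving the source state. A minimal sketch (the state sequence is illustrative):

```python
from collections import Counter

def estimate_transitions(sequence):
    """Estimate one-step transition probabilities by frequency:
    p_ij ≈ f_ij / sum over j of f_ij, as in Eqs (12)-(14)."""
    pair_counts = Counter(zip(sequence, sequence[1:]))
    # Every element except the last is the source of one transition.
    out_counts = Counter(sequence[:-1])
    return {(i, j): c / out_counts[i] for (i, j), c in pair_counts.items()}

# Illustrative observed state sequence.
seq = ["A", "B", "A", "B", "B", "A"]
probs = estimate_transitions(seq)
print(probs[("A", "B")])  # 1.0 : both transitions leaving A go to B
```

By the law of large numbers, these frequency estimates converge to the true transition probabilities as the amount of trajectory data grows, which is the justification given in the text.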

The core of establishing a user-propensity Markov model is to calculate the prior probability of a location point and its transition probability. Let T be the set of user trajectory data; any trajectory can be represented as ti (ti∈T). Then the prior probability of any point li (li∈ti) on trajectory ti is:

P(li) = num(li, ti) / num(ti)  (15)

Where num(li,ti) is the number of occurrences of location point li in trajectory ti, and num(ti) is the total number of location points in trajectory ti.

All user trajectories are divided into sub-trajectories, each containing only two adjacent nodes; after processing, the sub-trajectories are connected to form synthetic trajectories. Each node in a sub-trajectory is mapped to a state in the Markov model, and the transition between any two adjacent states corresponds to the transition between position points in the trajectory sequence. The propensity probability is a conditional probability, expressed as the quotient of the number of specific trajectories and the total number of trajectories; the transition probability from a node li to its adjacent node lj can thus be regarded as the propensity probability. The state transition matrix is composed of the state transition probabilities over all trajectories. The specific method is shown in Algorithm 1.

The main idea of the algorithm is to calculate the access probability and the state transition probability from the state transition frequencies, which yield the transition probabilities of the Markov model. The trajectory sequence T is divided into a set of sub-trajectory sequences ti that contain only adjacent nodes, and each node of a sub-trajectory is mapped to a state in the Markov model. First, the frequency of occurrence of each subsequence is calculated; then the frequency of occurrence of each node; then the access probability and the transition probability; and finally the Markov model is constructed. The loop that counts the trajectories and the position points on each trajectory is executed n times, so the time complexity of the algorithm is O(n²). In the Markov model, the transition probabilities of the states reflect the user's tendency to visit certain locations. Therefore, the user's propensity is strongly correlated with the state transition probabilities of the Markov chain, and trajectories with high propensity, which carry a high risk of privacy leakage, should be protected.
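The frequency counting described above can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: the function name and the representation of trajectories as plain lists of discretized location identifiers are assumptions.

```python
from collections import Counter, defaultdict

def build_markov_model(trajectories):
    """Estimate access (prior) probabilities and state transition
    probabilities from a set of trajectories, substituting frequencies
    for probabilities as in Eqs 13 and 15."""
    visit_counts = Counter()             # num(li): occurrences of each location
    trans_counts = defaultdict(Counter)  # nij: transitions from state si to sj
    total_points = 0

    for traj in trajectories:
        visit_counts.update(traj)
        total_points += len(traj)
        # split the trajectory into sub-trajectories of two adjacent nodes
        for li, lj in zip(traj, traj[1:]):
            trans_counts[li][lj] += 1

    # access (prior) probability of each location point
    prior = {li: c / total_points for li, c in visit_counts.items()}

    # transition probability: frequency of li -> lj over all exits from li
    transition = {
        li: {lj: n / sum(row.values()) for lj, n in row.items()}
        for li, row in trans_counts.items()
    }
    return prior, transition

prior, trans = build_markov_model([["a", "b", "a", "c"], ["a", "b"]])
# "a" accounts for 3 of the 6 points; 2 of the 3 exits from "a" go to "b"
```

Each row of `transition` sums to one, so it can be used directly as a row of the state transition matrix.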

4.2 Extended differential privacy protection model

A new (Pi,εi)-differential privacy protection model is proposed to protect the user's tendency privacy based on the constructed Markov chain. It combines the user's state transition probability pi with the differential privacy budget parameter εi, and dynamically adjusts εi according to pi so as to add differentiated Laplacian noise, dynamically protect the user's tendency privacy, and improve data availability. The specific description is as follows:

Definition 5: (Pi, εi)-differential privacy protection model. Let pi be the propensity probability that user u transitions from some state to state i. If the corresponding differential privacy budget parameter εi satisfies the following condition, the model is called a (Pi, εi)-differential privacy protection model: εi = R / pi (16)

Where R is the privacy model parameter, a constant with R∈(0,1]. It is divided into 10 privacy levels, A through J, with corresponding values 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 and 1.0; level A is the most sensitive and level J the least sensitive. The value of R is selected by the user or the system according to the privacy protection requirements. The term pi is the tendency probability of the user visiting state li, which corresponds to a state transition probability of the Markov chain, pi∈(0,1]. εi is the privacy budget parameter corresponding to the user in state li, εi∈(0,+∞).

It can be seen from Eq 16 that, for a constant R, the larger the state transition probability pi is, the smaller the privacy budget parameter εi is; thus more noise is added and the privacy protection effect is better. When pi = 1, the differential privacy budget parameter is a fixed constant, i.e., it does not change with the user's tendency, and the (Pi,εi)-differential privacy protection model reduces to the traditional ε-differential privacy protection model. By introducing the privacy model parameter R, the differential privacy budget parameters change with the user's tendency and differentiated disturbance noise is added, so the strength of differential privacy protection can be adjusted dynamically. The model can effectively solve the problem of user tendency leakage in trajectory data; the specific execution process is shown in Algorithm 2.

The premise of the algorithm is to combine the user's tendency with the privacy model parameter, calculate and set the differential privacy budget parameters, and add differentiated Laplacian noise, which can effectively prevent tendency attacks. The larger the user's state transition probability Pi is, the smaller the allocated privacy budget εi is, the more noise is added, and the higher the degree of privacy protection is. In this algorithm, after the undirected graph is constructed from the state transition matrix, the worst case occurs when each vertex is connected to every other vertex, so the time complexity is O(n(n-1)).

A dynamic balance between privacy protection and data utility is achieved by correlating the tendency p and the privacy budget parameter ε through the privacy protection parameter R. If a stronger privacy protection effect is required, a smaller R value is selected; more noise is then added, which improves the privacy protection effect but reduces the data utilization rate. Conversely, if a higher data utilization rate is required, a larger R value is selected, which adds less noise and reduces the privacy protection strength. The specific process is divided into five steps:

  1. Step 1: Calculate the probabilities of location points. According to the user's movement trajectory, calculate the probability of visiting each location point and, from the statistics, the probability of the user transitioning between these state points.
  2. Step 2: Build the Markov chain. Use the weight of each state point to represent its strength in the cluster set, count the number of transitions between states, normalize the counts into the transition probabilities of the different states, and construct the Markov chain.
  3. Step 3: Calculate the privacy budget parameters. From the privacy protection parameter R and the tendency probability Pi, combined with the (Pi,εi)-differential privacy model condition, calculate the privacy budget εi corresponding to each position point on the trajectory.
  4. Step 4: Anonymize the trajectory data. According to the εi of the different states, dynamically add Laplacian noise to generate the released trajectory sequence.
  5. Step 5: Release the trajectory data. Publish the final trajectory sequence for use by third parties.

4.3 Measurement of privacy protection effectiveness

4.3.1 The measurement of information leakage.

Information leakage (IL) measures the degree to which the original data is exposed in the published data. The smaller the degree of information leakage is, the stronger the privacy protection capability is. The degree of information leakage can be expressed as the ratio of correct matches between data records in the published data set and the original data set [37], and is calculated as: IL = (1/n) Σt′ Pr(t′) (17)

Where n represents the number of data records in the original data set, and Pr(t′) represents the probability of correctly matching the published data record t′ [38], which can be calculated as: Pr(t′) = 1/|G| (18)

Where G is the cluster set that contains the original data record.
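Under the interpretation above, where each published record matches its original with probability 1/|G| for the cluster G it was generalized into, the measure can be sketched as follows (a hedged sketch; the function name and the cluster-size input are illustrative):

```python
def information_leakage(cluster_sizes):
    """Average correct-match probability over n published records,
    given the size |G| of the cluster each record belongs to
    (a sketch of Eqs 17-18: IL = (1/n) * sum of 1/|G|)."""
    n = len(cluster_sizes)
    return sum(1.0 / g for g in cluster_sizes) / n
```

For example, three records generalized into clusters of sizes 1, 2 and 4 give IL = (1 + 0.5 + 0.25)/3 ≈ 0.583; larger clusters drive the leakage toward zero.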

4.3.2 Measurement of data availability.

Data availability (DA) mainly refers to how useful the published trajectory data is to a third party. It also reflects the rate of trajectory information loss during anonymization, which is generally measured by the amount of information loss in the published data. In this paper, the standard deviation of the distance error between a position on the real trajectory and the corresponding position on the published trajectory is used as the measure. The greater the information loss in the trajectory is, the lower the availability of the trajectory data is. The specific calculation is: (19)

Where di represents the Euclidean distance between a location point on the real trajectory and the corresponding position point on the published trajectory, d̄ represents the average of these distances, n represents the number of position points on the trajectory, and |L| represents the length of the trajectory.
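The distance-error component of this measure can be sketched as follows. This is a minimal sketch: it computes only the standard deviation of the per-point errors di, omitting the normalization by the trajectory length |L| in Eq 19, whose exact placement is not reproduced here.

```python
import math

def distance_error_std(real, published):
    """Standard deviation of the per-point Euclidean distance error di
    between a real trajectory and its published counterpart."""
    d = [math.dist(p, q) for p, q in zip(real, published)]
    mean = sum(d) / len(d)                       # average distance error
    return math.sqrt(sum((di - mean) ** 2 for di in d) / len(d))
```

A published trajectory identical to the real one yields zero, and the value grows as the added noise disperses the per-point errors, i.e., as availability drops.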

5. Experimental evaluation and results analysis

The proposed method was evaluated experimentally in terms of privacy protection effect, data availability and execution efficiency to verify its feasibility and effectiveness. The proposed method (referred to from here on as (P,ε)-DP) was compared against the standard differential privacy protection method (referred to as DP) and the method proposed in paper [39] (referred to as α-DP).

5.1 Experimental design and parameter setting

The experiments were conducted on a computer with an Intel Core (TM) i5-3470 CPU @ 3.2 GHz and 8 GB RAM running Microsoft Windows 7 SP1 64-bit. The algorithm was implemented in Python. The experimental data come from the Geolife data set [40–42], which collects the GPS trajectory data of 182 users over 5 years. The data set is represented as time-stamped point sequences; each point contains latitude, longitude, elevation, date and time information. It contains 17,621 trajectories with a total distance of 1,292,951 kilometers and a total duration of 11,129 days. Among them, 91.5% of the trajectories have a dense representation, and the data cover various outdoor activities in the users' daily lives, such as commuting between work and home, as well as recreational and sports activities such as shopping, sightseeing, dining, hiking and cycling. In this paper, 1,000 trajectories in a certain area were sampled from this data set, forming approximately 1,346,480 location points. In the experiments, the original data were first processed in batches to separate each user's longitude, latitude, altitude, date and time, and the GIS online conversion tool (https://www.mygeodata.cloud/) was used to resolve latitude and longitude into actual addresses. Table 2 shows the processed data set.

5.2 Experimental results and analysis

5.2.1 Analysis of privacy protection effect.

The privacy protection effect mainly refers to the degree to which the user's tendency in the trajectory data is protected or hidden; it also refers to the degree of disclosure of the user's tendency privacy information. The smaller the degree of disclosure of sensitive information is, the stronger the privacy protection ability is. In practice, the strength of privacy protection is indirectly reflected by the differential privacy budget parameter: the smaller the value of ε is, the more noise is added, and the higher the achieved privacy protection level is.

Fig 3 shows the privacy protection effect of the proposed method when the privacy budget parameter ε is set to {0.01, 0.1, 1, 10}. For a given user tendency probability P, the smaller the privacy budget parameter is, the higher the degree of privacy protection is and the smaller the degree of information leakage is. As the value of ε increases, the differential privacy method disturbs the original trajectory data less and the privacy protection level decreases: the degree of information leakage continues to increase while the privacy protection effect continues to decrease. When the value of ε is small, more protection noise is added; the smaller the degree of information leakage is, the better the privacy protection effect is. As can be seen from Fig 3, when ε = 10, the average degree of information leakage is significantly higher than in the other three cases. As the probability P increases, the probability of the user visiting a specific location rises, the user's tendency becomes stronger, and more sensitive information is contained; if the amount of added noise remains constant, the information leakage increases.

Fig 3. Comparison of information leakage under different parameters.

https://doi.org/10.1371/journal.pone.0288823.g003

R is the privacy model parameter, selected by the user or system according to the privacy protection requirements. When R is increased, the user's privacy protection requirement is reduced and less noise is added to the data set, so the real data set is disturbed less and the degree of privacy protection is reduced; as a result, the level of privacy information leakage increases. According to Eq 16, when the privacy budget parameter ε is a constant, an increase in the parameter P implies an increase in the parameter R; however, the noise added to the data set does not increase, so the actual risk of privacy information leakage increases further.

As the privacy budget value increases, the differential privacy protection method disturbs the original trajectory data less and the privacy protection level decreases. Fig 4 shows the comparison of the degree of information leakage between the proposed method and the other two methods when the propensity probability is P = 0.6. Privacy protection is inversely proportional to the privacy budget ε: the smaller ε is, the larger the data distortion is and the higher the degree of privacy protection is. When ε tends to zero, the privacy protection reaches its theoretical maximum. As can be seen from Fig 4, the degree of information leakage of all three methods increases as the value of ε increases; when ε is larger, relatively less protection noise is added, so the possibility of sensitive information leakage increases.

Fig 4. Comparison of information leakage under different methods.

https://doi.org/10.1371/journal.pone.0288823.g004

The degree of privacy information leakage of the DP method is the highest because it uses the privacy budget to add noise randomly; the added noise is independently and uniformly distributed and has low correlation with the original trajectory data. An attacker can reduce the interference caused by the noise through filtering, so the privacy protection effect is relatively low. The degree of privacy information leakage of the α-DP method is also high: although the noise added by α-DP is temporally correlated with the user's trajectory data, there is no specific protection for the user's tendency privacy. The degree of privacy information leakage of the proposed (P,ε)-DP method is the lowest: it quantifies the user's tendency probability and dynamically adds differentiated noise, so the protection of the user's tendency is more targeted and the lowest degree of information leakage is achieved. When the propensity probability is constant, the privacy-preserving strength can be adjusted through the privacy model parameter. Compared with the other methods, the information leakage of the proposed method is reduced by 13.03% and 25.65% on average.

Fig 5 shows the influence of the privacy model parameter on the degree of information leakage when ε = 0.01. When the privacy model parameter R is increased, the degree of leakage of the user's sensitive information also expands. An increase in R means that the user's privacy protection requirement is reduced, less noise is added to the data set, and the real data is protected less, so the disclosure of private information increases. Meanwhile, according to Eq 16, when the privacy budget parameter ε is a fixed constant, an increase in R implies an increase in P, i.e., the user's tendency to visit a certain location becomes more obvious; however, the noise added to the data set does not increase, so the actual risk of privacy information leakage increases further.

Fig 5. Effect of privacy model parameter on information leakage.

https://doi.org/10.1371/journal.pone.0288823.g005

5.2.2 Analysis of data availability.

Data availability refers to how efficiently the published trajectory data can be used by third parties, and it indirectly reflects the rate of trajectory information loss during anonymization. Fig 6 shows the data availability of the proposed method when the privacy budget parameter ε is set to {0.01, 0.1, 1, 10}. Data availability differs for different privacy protection parameters ε. When the value of ε is small, relatively much protection noise is added and the data availability is relatively low; when ε = 0.01, the average data availability is significantly lower than in the other three cases. As the user visiting probability P increases, the probability of the user visiting a specific location increases and the sensitivity of the data rises. To protect the user's tendency privacy, the amount of added noise keeps increasing, which decreases data availability.

Fig 6. Comparison of data availability under different parameters.

https://doi.org/10.1371/journal.pone.0288823.g006

According to the constraint in the (P,ε)-differential privacy protection model, when the privacy model parameter R is constant, the user's propensity probability P is inversely proportional to the privacy budget: the larger the value of P is, the smaller the value of ε is. In this case, more disturbing noise is added and the availability of the data is reduced. The proposed (P,ε)-DP method can flexibly assign the privacy-preserving model parameter according to the user's propensity probability: when the privacy parameter is fixed, the larger the propensity probability is, the smaller the corresponding privacy budget is, the more noise is added, and the lower the availability of the data is.

Fig 7 shows the comparison of data availability between the proposed method and the other two methods when the propensity probability is P = 0.6. ε is a key parameter in differential privacy; it determines the strength of privacy protection and the amount of added noise. In the experiment, the availability of the data was tested for different values of ε in the range [0.1, 1] with a step size of 0.1. The comparison shows that the data availability of all three methods increases as the value of ε increases: when ε is larger, relatively less noise is added and data availability rises.

Fig 7. Comparison of data availability under different methods.

https://doi.org/10.1371/journal.pone.0288823.g007

Comparing the different methods, the DP method randomly adds the same noise everywhere, so its data availability is the lowest. Both the α-DP and (P,ε)-DP methods add differentiated, targeted noise, so the availability of the data is higher. However, the α-DP method cannot quantify the user's tendency and adds relatively more noise, so its data usability is lower than that of the (P,ε)-DP method. The (P,ε)-DP method dynamically adjusts the added disturbing noise through the privacy-preserving parameter R and the propensity probability P, and the added noise is more precise, so its data availability is the highest. Compared with the other methods, the data availability of the proposed method is improved by 0.81% and 5.29% on average.

Fig 8 shows the influence of the privacy model parameter on data availability when ε = 0.01: as the privacy model parameter R increases, the availability of the data shows an upward trend. An increase in R means that the user's privacy protection requirement is reduced, less noise is added to the data set, and the information loss of the data set is reduced; therefore, the availability of the data increases. According to the constraint in the (P,ε)-differential privacy protection model, when the privacy budget parameter ε is a fixed constant, an increase in R implies an increase in P, and the user's tendency to visit a certain location becomes more obvious; however, the noise added to the data set does not increase, so the data availability improves. The experiment shows that when R = 0.1 the data availability is 72.13%, and when R = 1 it is 86.39%.

Fig 8. Effect of privacy model parameter on data availability.

https://doi.org/10.1371/journal.pone.0288823.g008

5.2.3 Analysis of execution efficiency.

Execution efficiency corresponds to the time complexity of execution and is mainly measured by the execution time. Fig 9 shows the execution efficiency of the proposed method when the privacy budget parameter ε is set to {0.01, 0.1, 1, 10}. The execution time differs for the four parameter values and is proportional to the amount of added noise: the more noise is added, the longer the execution time is. When the probability of the user visiting a specific location is less than 0.5, the user's sensitive information is relatively limited; the amounts of noise added under the four privacy protection parameters differ little, and the impact of the different privacy parameters on execution time is not obvious. As the user visiting probability increases, the sensitive information gradually increases; the amount of added noise keeps growing and the execution time becomes larger. Additionally, when ε = 0.01, the average execution time is significantly higher than in the other three cases. As the user visiting probability P increases, the user's probability of visiting a specific location rises and his tendency strengthens; to protect this tendency privacy, the amount of added noise grows and the execution time continues to increase.

Fig 9. Comparison of execution efficiency under different parameters.

https://doi.org/10.1371/journal.pone.0288823.g009

Fig 10 shows the comparison of execution efficiency between the proposed method and the two other methods when the propensity probability is P = 0.6. The execution time is closely related to the time and computational complexity of the algorithm. The comparison of the various ε values shows that the execution time of all three methods decreases as ε increases: when ε is larger, both the added noise and the execution time are relatively reduced.

Fig 10. Comparison of execution efficiency under different methods.

https://doi.org/10.1371/journal.pone.0288823.g010

When ε is small, relatively much disturbance noise is added and the privacy preservation effect is better; the efficiency of the proposed method is slightly higher than that of the other two methods. When the user's tendency is strong, more noise must be added, and the proposed method can add exactly the appropriate amount of noise according to the quantified tendency probability. The other two methods cannot quantify the user's tendency and must add more noise to achieve the same privacy protection effect, so their time complexity is relatively high. As ε increases and less noise is added, there is little difference in the execution efficiency of the three methods: when the user's propensity protection requirement is not strong, the added disturbing noise is almost the same, so the time complexities are basically the same. The experiments show that the execution time of the proposed method is reduced by 1.71% and 4.48% on average.

5.2.4 Experimental results.

Altogether, the experiments discussed above indicate that the proposed method can provide efficient protection for the user's tendency privacy, ensure high availability of the published trajectory data, and improve execution efficiency. The user's tendency probability is quantified by the Markov model: by constructing the Markov chain, the tendency probability is converted into the state points of the chain, so the Markov model can handle the user's tendency to visit a specific location and privacy is improved effectively. The user's tendency is protected dynamically by the differential privacy model: the tendency and the differential privacy budget parameters are correlated, which makes it possible to add noise dynamically according to the tendency probability and thus provide personalized privacy protection, significantly improving data availability.

6. Conclusion

If a user visits a location for the first time, there is a high probability that he will visit it again. To solve the problem of privacy leakage caused by users' activity rules and tendencies in current privacy protection methods, a differential privacy protection method is proposed to control the user's personal tendency and prevent sensitive information from leaking. The user's tendency is expressed as a state transition probability via the Markov decision process. The access and transition probabilities are calculated by analyzing the user's trajectory, and the access probability is converted into weights of the Markov chain; the Markov decision process is then used to analyze and quantify the user's tendency. Thus, a qualitative description of the user's tendency is transformed into a quantitative representation by extending the differential privacy model. The privacy model parameter combines the user's propensity probability and the differential privacy budget, so differentiated and appropriate amounts of noise can be added dynamically to control the user's access tendency, protect the user's tendency privacy information and improve data availability. Finally, the feasibility and effectiveness of the proposed method were verified on real data.

However, while differential privacy is effective for protecting offline data, it has certain limitations in protecting online and streaming data, and the privacy protection effect is also affected by the allocation of the privacy budget parameters. Therefore, how to scientifically set acceptable privacy parameters to ensure the availability and consistency of published data will be the focus of future research.

References

  1. 1. Huanhuan Wang, Xiao Zhang, Youbing Xia, Xiang Wu. A differential privacy preserving deep learning caching framework for heterogeneous communication network systems. International Journal of Intelligent Systems. 2022;37:11142–11166.
  2. 2. Xiang Wu, Yongting Zhang, Minyu Shi, Pei Li, Ruirui Li, Neal N. Xiong. An adaptive federated learning scheme with differential privacy preserving[J]. Future Generation Computer Systems, 2022, 127:362–372.
  3. 3. Ren Ozeki,Haruki Yonekura,Hamada Rizk.Sharing without caring: privacy protection of users’ spatio-temporal data without compromise on utility. Proceedings of the 30th International Conference on Advances in Geographic Information Systems. November 2022:1–2.
  4. 4. Xingxing Xiong, Shubo Liu, Dan Li. A Comprehensive Survey on Local Differential Privacy. Security and Communication Networks. 08 Oct 2020.
  5. 5. Zhirun Zheng,Zhetao Li,Jie Li. Utility-aware and Privacy-preserving Trajectory Synthesis Model that Resists Social Relationship Privacy Attacks. ACM Transactions on Intelligent Systems and Technology. 2022:13(44):1–28.
  6. 6. Simin Zhu,Xin Lv,Lin Yu. Location Privacy Protection Method based on Variable-Order Markov Prediction Model. 2021 4th International Conference on Computer Science and Software Engineering (CSSE 2021)October 2021:25–30.
  7. 7. Yi Yang,Yurong Cheng,Ye Yuan. Privacy-preserving cooperative online matching over spatial crowdsourcing platforms. Proceedings of the VLDB Endowment (PVLDB),September 2022, 16(1):51–63.
  8. 8. Zhang Tao, Tianqing Zhu, Renping Liu. Correlated data in differential privacy: Definition and analysis. Concurrency and Computation: Practice and Experience. 19 September 2020. https://doi.org/10.1002/cpe.6015
  9. 9. Miao He ,Fenhua Bai,Chi Zhang. A Blockchain-Enabled Location Privacy-preserving under Local Differential Privacy for Internet of Vehicles. Proceedings of the 2022 4th Blockchain and Internet of Things Conference. July 2022:84–91.
  10. 10. Zhang Xiaojian, Fu Nan, Meng Xiaofeng. Locally Differentially Private Key-Value data collection. Chinese Journal of Computers. 2020,8(43):1479–1492.
  11. 11. Daniel Ashbrook, Thad Starner. Learning Significant Locations and Predicting User Movement with GPS. //Proceeding of the International Symposium on Wearable Computer. Piscataway, NJ:IEEE, 2002:101–108.
  12. 12. Alvarez-Garcia J.A., Ortega J.A., Gonzalez-Abril L.. Trip Destination Prediction based on Past GPS Log Using A Hidden Markov Model. Expert Systems with Applications. 2010,37(12):8166–8171.
  13. 13. Sébastien Gambs, Marc-Olivier Killijian. De-Anonymization Attack on GeolocatedData. Journal of Computer and System Sciences. 2014,80(8):1597–1614.
  14. 14. Jiangwei Pan, Vinayak Rao. Markov-Modulated Marked Poisson Processes for Check-In Data. //Proceedings of the 33 rd International Conference on MachinenLearning, New York, NY, USA, 2016:2244–2253.
  15. 15. Rong Wang, Min Zhang, Dengguo Feng. A De-Anonymization Attack on Geo-Located Data Considering Spatio-Temporal Influences. //Proceeding of the International Conference on Information and Communications Security. Berlin Herdeberg:Springer,2015:478–484.
  16. 16. Samiul Hasan, Xianyuan Zhan. Understanding Urban Human Activity and Mobility Patterns Using Large-scale Location-based Data from Online Social Media. // Proceedings of the 2nd ACM SIGKDD International Workshop on Urban Computing. Chicago, Illinois, USA. August 2013. 1–8.
  17. 17. Vaibhav Kulkarni, Abhijit Mahalunkar,Benoit Garbinato. Limits of Predictability in Human Mobility. MDPI Journal. 2019, 21(4), 1–27.
  18. 18. Adam Sadilek, Henry Kautz. Finding Your Friends and Following Them to Where You Are. //Proceedings of the ACM International conference on Web search and data mining. Now York: ACM, 2012:723–732.
  19. 19. Andy Yuan Xue Rui Zhang, Zheng Yu. Destination Prediction by Sub-Trajectory Synthesis and Privacy Protection Against such Prediction. //Proceeding of the International Conference on Data Engineering. Piscataway, NJ: IEEE, 2013:254–265.
  20. 20. Zheng Huo, Xiaofeng Meng, Rui Zhang. Feel Free to Check-In: Privacy Alert Against Hidden Location Inference Attacks in GeoSNs. Database System for Advanced Application. Berlin Heidelberg Springer, 2013:377–391.
  21. 21. Yuchen Qiu, Yuanyuan Qiao, Aimin Zhang . Residence and Workplace Recovery: User Privacy Risk in Mobility Data. //Proceedings of the 2020 on Intelligent Cross-Data Analysis and Retrieval Workshop, June 2020:15–20.
  22. 22. Zhen Tu, Fengli Xu, Yong Li. A New Privacy Breach: User Trajectory Recovery From Aggregated Mobility Data. IEEE/ACM TRANSACTIONS ON NETWORKING. 2018,26(3):1446–1459.
  23. 23. LIN YAO, ZHENYU CHEN, HAIBO HU. Privacy Preservation for Trajectory Publication Based on Differential Privacy. ACM Transactions on Intelligent Systems and Technology. 2022,13(3):1–21.
  24. 24. Erik Buchholz, Alsharif Abuadbba, Shuo Wang. Reconstruction Attack on Differential Private Trajectory Protection Mechanisms. //Proceedings of the 38th Annual Computer Security Applications Conference. December, 2022:279–292.
  25. 25. Chang Su, Bin Gong, Xianzhong Xie. Personalized Point-of-Interest Recommendation Based on Social and Geographical Influence. //2021 4th Artificial Intelligence and Cloud Computing Conference. December 2021:130–137.
26. Jordi Soria-Comas, Josep Domingo-Ferrer, David Sanchez. Individual Differential Privacy: A Utility-Preserving Formulation of Differential Privacy Guarantees. IEEE Transactions on Information Forensics and Security. 2017(16):1–12.
27. Xiaodong Zhao, Dechang Pi, Junfu Chen. Novel trajectory privacy-preserving method based on clustering using differential privacy. Expert Systems With Applications. 2020, 149:113241.
28. Yang Cao, Masatoshi Yoshikawa, Yonghui Xiao. Quantifying Differential Privacy in Continuous Data Release Under Temporal Correlations. IEEE Transactions on Knowledge and Data Engineering. 2019, 31(7):1281–1295.
29. Jianpei Zhang, Yang Qing, Yiran Shen. A differential privacy based probabilistic mechanism for mobility datasets releasing. Journal of Ambient Intelligence and Humanized Computing. February 2020:1–12.
30. Sungwook Kim. A new differential privacy preserving crowdsensing scheme based on the Owen value. EURASIP Journal on Wireless Communications and Networking. 2019, 158:1–10.
31. Hongjie Gao, Haitao Xu, Long Zhang. A Differential Game Model for Data Utility and Privacy-Preserving in Mobile Crowdsensing. IEEE Access. 2019, 7:128526–128533.
32. Eunjoon Cho, Seth A. Myers. Friendship and Mobility: User Movement in Location-based Social Networks. // Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2011:1082–1090.
33. Gunter CA, May MJ, Stubblebine SG. A formal privacy system and its application to location based services. // Proceedings of the 4th Int'l Workshop on Privacy Enhancing Technologies. Toronto, 2004:256–282.
34. Wang L, Meng XF. Location privacy preservation in big data era: A survey. Ruan Jian Xue Bao/Journal of Software. 2014, 25(4):693–712 (in Chinese).
35. Xue A Y, Zhang R, Zheng Y. Destination prediction by sub-trajectory synthesis and privacy protection against such prediction. // 2013 IEEE 29th International Conference on Data Engineering (ICDE). IEEE, 2013:254–265.
36. Xue A Y, Qi J, Xie X. Solving the data sparsity problem in destination prediction. The VLDB Journal. 2015, 24(2):219–243.
37. Jingjing Qu, Ying Cai, Yanfang Fan, Hongke Xie. Differentially Private Mixed Data Release Algorithm Based on k-prototype Clustering. Journal of Frontiers of Computer Science and Technology. 2021, 15(1):109–118.
38. Yang Cao, Masatoshi Yoshikawa. Quantifying Differential Privacy under Temporal Correlations. // 2017 IEEE 33rd International Conference on Data Engineering. 2017:821–832.
39. Jing Yang. Differential Privacy Protection Method Based on Published Trajectory Cross-correlation Constraint. PLOS ONE. August 2020:1–25.
40. Yu Zheng, Lizhu Zhang, Xing Xie. Mining interesting locations and travel sequences from GPS trajectories. // Proceedings of the International Conference on World Wide Web (WWW 2009). Madrid, Spain. ACM Press, 2009:791–800.
41. Yu Zheng, Quannan Li, Yukun Chen, Xing Xie. Understanding Mobility Based on GPS Data. // Proceedings of the ACM Conference on Ubiquitous Computing (UbiComp 2008). Seoul, Korea. ACM Press, 2008:312–321.
42. Yu Zheng, Xing Xie, Wei-Ying Ma. GeoLife: A Collaborative Social Networking Service among User, Location and Trajectory. Invited paper, IEEE Data Engineering Bulletin. 2010, 33(2):32–40.