Effectively computing transition patterns with privacy-preserved trajectory datasets

Jong Wook Kim; Beakcheol Jang

doi:10.1371/journal.pone.0278744

Abstract

Recent advances in positioning techniques, along with the widespread use of mobile devices, make it easier to monitor and collect user trajectory information during their daily activities. An ever-growing abundance of data about trajectories of individual users paves the way for various applications that utilize user mobility information. One of the most common analysis tasks in these new applications is to extract the sequential transition patterns between two consecutive timestamps from a collection of trajectories. Such patterns have been widely exploited in diverse applications to predict and recommend next user locations based on the current position. Thus, in this paper, we explore the computation of the transition patterns, especially with a trajectory dataset collected using differential privacy, which is a de facto standard for privacy-preserving data collection and processing. Specifically, the proposed scheme relies on geo-indistinguishability, which is a variant of the well-known differential privacy, to collect trajectory data from users in a privacy-preserving manner, and exploits the functionality of the expectation-maximization algorithm to precisely estimate hidden transition patterns based on perturbed trajectory datasets collected under geo-indistinguishability. Experimental results using real trajectory datasets confirm that a good estimation of transition pattern can be achieved with the proposed method.

Citation: Kim JW, Jang B (2022) Effectively computing transition patterns with privacy-preserved trajectory datasets. PLoS ONE 17(12): e0278744. https://doi.org/10.1371/journal.pone.0278744

Editor: Mahdi Abbasi, Bu-Ali Sina University, Islamic Republic of Iran

Received: June 30, 2022; Accepted: November 22, 2022; Published: December 9, 2022

Copyright: © 2022 Kim, Jang. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper.

Funding: This work was supported by Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. 2017-0-00515, Development of integraphy content generation technique for N-dimensional barcode application). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Recent advances in indoor and outdoor positioning techniques make it easier to monitor and collect user trajectory information during daily activities. An ever-growing abundance of data about trajectories of individual users paves the way for various applications [1–4]. However, like other novel applications, those that leverage large-scale user trajectory data also face new problems. This is because the personal mobility information of individuals usually contains sensitive information, such as home and workplace addresses, hospital visit records, and political affiliation, which they do not want to disclose [5, 6]. Thus, most users are not comfortable providing their trajectory information to those applications.

In the past decade, extensive studies have been conducted to protect user privacy when collecting and publishing trajectories, including spatial cloaking [7], mix-zone approach [8], and encryption-based scheme [9]. Recently, as differential privacy has emerged as a de facto standard for privacy-preserving data processing, there have been several efforts to apply differential privacy to the collection and publishing of trajectory data [10–13]. These privacy-preserving methods can alleviate, to some extent, users’ concerns about privacy exposure when providing their trajectory data to external service providers. However, from the service provider viewpoint, such privacy-preserving schemes lead to severe degradation in the utility of the collected dataset, eventually resulting in a loss in the quality of service.

One of the most common analysis tasks in applications that utilize user mobility information is to extract the sequential transition patterns between two consecutive timestamps, which correspond to the probability that a user located at one location in the current timestamp moves to another location in the next timestamp. Such transition patterns have been widely exploited by diverse applications when predicting and recommending the next user location based on the current position. For example, the notion of a first-order Markov chain, which models the sequential transition pattern between two consecutive points of interest (POIs), is used to recommend the next POI candidates to users in [2]. In [1], the sequential transition influences are integrated into the matrix factorization algorithm to enhance the accuracy of next location recommendations. In [14], transition probability matrices constructed from sequential rules are used to make location predictions for location-based-service users.

Computing transition probabilities from a collection of true trajectories is not difficult. However, the same cannot be said for a dataset consisting of privacy-preserved trajectory data. Let us consider a motivational example in Fig 1. As shown in the figure, users agree to contribute their trajectories to service providers for analysis purposes. However, owing to privacy concerns, they provide perturbed trajectories, which are obtained using geo-indistinguishability (GeoInd) [10], a variant of differential privacy, instead of true trajectories. During this process, it is guaranteed that individual users’ true location trajectories are not disclosed outside the users’ devices, thus protecting the location privacy of users. However, such a privacy-preserving mechanism leads to a loss in the utility of the collected dataset. Consequently, the sequential transition patterns extracted from such a dataset may have a lower utility.

Fig 1. High level architecture of the proposed framework.

https://doi.org/10.1371/journal.pone.0278744.g001

To address this problem, in this paper, we explore the computation of the sequential transition patterns (i.e., transition probability between two locations) with a perturbed trajectory dataset collected using GeoInd. There have been several efforts to compute aggregate statistics from users’ trajectory datasets in a privacy-preserving manner using GeoInd. Most of these works are dedicated to the estimation of population density distribution, which corresponds to static information of users, at a specific timestamp. For example, Ren and Tang [15] presented a vehicle location privacy protection framework based on GeoInd to compute traffic density distribution at a specific timestamp. Wang et al. [16] and Qui et al. [17] presented mobile crowdsensing frameworks in which workers’ density distribution is computed in a privacy-preserving manner using GeoInd. On the contrary, our work aims to estimate users’ transition probabilities, which correspond to moving information of users, from perturbed trajectory datasets collected using GeoInd. To the best of our knowledge, this is the first attempt to estimate transition probability between two locations under GeoInd setting.

The contributions of this paper can be briefly summarized as follows:

We develop a privacy-preserving framework based on GeoInd for the computation of transition probability between two locations. In particular, the proposed scheme exploits the functionality of the expectation-maximization (EM) algorithm to precisely estimate hidden transition patterns based on perturbed trajectory dataset.
Through experiments with real data, it is demonstrated that a good estimation of transition pattern can be achieved with the proposed framework.

Background and problem definition

Geo-indistinguishability

GeoInd is an extended version of differential privacy with a distance metric to provide a privacy-preserving mechanism for location data [10]. Formally, ϵ-GeoInd is defined as follows:

Definition 1 (ϵ-GeoInd) Let $\mathcal{X}$ be a set of possible user locations and $\mathcal{Y}$ be a set of reported locations. It is commonly assumed that $\mathcal{Y}$ is equal to $\mathcal{X}$. Let us assume that a randomized mechanism, K, probabilistically generates a perturbed location $y \in \mathcal{Y}$ from a true location $x \in \mathcal{X}$ of a user. Then, K satisfies ϵ-GeoInd if and only if, for (1) all $x, x' \in \mathcal{X}$ and (2) any output location $y \in \mathcal{Y}$, the following inequality is satisfied:

$$\Pr[K(x) = y] \le e^{\epsilon \cdot d(x, x')} \cdot \Pr[K(x') = y] \tag{1}$$

where d(x, x′) is the distance between x and x′.

The parameter ϵ, which corresponds to a privacy budget, determines a trade-off between the level of privacy and data utility. That is, smaller values of ϵ ensure a stronger privacy guarantee, introducing larger perturbation to the true location; whereas larger values of ϵ guarantee a weaker privacy but introduce smaller noise to the true location. This definition denotes that, given a reported location y that is an output of the randomized mechanism K, the ability of an adversary to identify whether the actual location of a user is x or x′ is limited by the privacy budget (i.e., ϵ) and distance between x and x′. This implies that the closer two locations are, the more indistinguishable they are.
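The inequality in Definition 1 can be checked mechanically for any finite mechanism. The sketch below is illustrative only: it builds a simple stand-in mechanism whose probabilities decay as exp(-ϵ·d/2) (a well-known construction that satisfies ϵ-GeoInd, but not the optimization mechanism used later in this paper) and verifies the inequality for every triple of cells. The function names, the 3 × 3 grid, and the tolerance are assumptions made for this example.

```python
import math
from itertools import product

def euclidean(a, b):
    """Euclidean distance between two (row, col) grid cells."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def exponential_mechanism(cells, eps):
    """Stand-in mechanism (an assumption, not the paper's LP-based one):
    K[x][y] is proportional to exp(-eps * d(x, y) / 2)."""
    K = []
    for x in cells:
        weights = [math.exp(-eps * euclidean(x, y) / 2.0) for y in cells]
        total = sum(weights)
        K.append([w / total for w in weights])
    return K

def satisfies_geoind(K, cells, eps, rel_tol=1e-9):
    """Check K[x][y] <= exp(eps * d(x, x')) * K[x'][y] for all x, x', y."""
    for i, x in enumerate(cells):
        for j, x_alt in enumerate(cells):
            bound = math.exp(eps * euclidean(x, x_alt))
            for y in range(len(cells)):
                if K[i][y] > bound * K[j][y] * (1.0 + rel_tol):
                    return False
    return True

cells = list(product(range(3), range(3)))          # hypothetical 3 x 3 grid
K = exponential_mechanism(cells, eps=1.0)
identity = [[1.0 if i == j else 0.0 for j in range(9)] for i in range(9)]

print(satisfies_geoind(K, cells, eps=1.0))         # True
print(satisfies_geoind(identity, cells, eps=1.0))  # False: no noise, no privacy
```

The identity matrix fails because reporting the true cell with probability 1 makes any two distinct locations perfectly distinguishable, regardless of how close they are.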

Problem definition

In this subsection, we introduce key notations and formally define the problem. The main notations used in the rest of this paper are summarized in Table 1.

Table 1. Summary of major notations.

https://doi.org/10.1371/journal.pone.0278744.t001

Let us assume that the entire area is partitioned into an m × n grid G. Then, if the current location of a specific user belongs to a grid g ∈ G at time t, the location of this user is represented as (g, t). The trajectory of the k-th user over r timestamps is represented as the sequence of r locations, such as TR_k = {(g_k1, t_k1), (g_k2, t_k2), …, (g_kr, t_kr)}. Given the k-th user’s trajectory TR_k, let $TR_k^p = \{(g'_{k1}, t_{k1}), (g'_{k2}, t_{k2}), \ldots, (g'_{kr}, t_{kr})\}$ be the corresponding perturbed trajectory obtained by the perturbation mechanism of GeoInd. Here, $g'_{ki}$ corresponds to the perturbed location generated from the i-th true location $g_{ki}$ using GeoInd. To differentiate between the perturbed and true locations, we use g to represent a true location and g′ to represent a perturbed location. We will also omit the subscript, k, if it is clear from context.

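Let $\mathcal{T} = \{TR_1^p, TR_2^p, \ldots, TR_f^p\}$ be the collection of perturbed trajectories received from all users that is maintained by the data aggregation server. Here, f is the number of users. For each $TR_k^p$ in $\mathcal{T}$, we can extract a set of perturbed location pairs in two adjacent timestamps, such as $\{(g'_{k1}, g'_{k2}), (g'_{k2}, g'_{k3}), \ldots, (g'_{k(r-1)}, g'_{kr})\}$. Then, let DB be a collection of all such pairs of locations that can be obtained from all perturbed trajectories in $\mathcal{T}$. That is, DB is formally defined as follows:

$$DB = \bigcup_{k=1}^{f} \bigcup_{i=1}^{r-1} \left\{ (g'_{ki}, g'_{k(i+1)}) \right\} \tag{2}$$

where ⋃ represents the union-all operator, which allows duplicates.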

Let p(g_x → g_y) be the transition probability from g_x to g_y, which denotes the probability that a user currently located at g_x moves to g_y in the next timestamp. Given DB, a straightforward solution for obtaining transition probabilities is to directly compute p(g_x → g_y) based on the perturbed trajectories, such as

$$p(g_x \to g_y) = \frac{c(g'_x, g'_y)}{\sum_{g'_z \in G} c(g'_x, g'_z)} \tag{3}$$

where $c(g'_x, g'_y)$ denotes the number of times that the location pair $(g'_x, g'_y)$ appears in DB. However, this straightforward approach cannot accurately compute the transition probabilities because it does not consider the effect of the location perturbation mechanism of GeoInd. Thus, in this study, we aim to propose a novel scheme to accurately estimate the transition probabilities for all pairs of locations in G with the collection of perturbed trajectories.
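As a point of reference, the straightforward approach amounts to collecting consecutive location pairs with duplicates and taking relative frequencies. The following is a minimal sketch under the assumption that each perturbed trajectory is given as a list of integer cell indices with timestamps already dropped; the function names and the toy data are hypothetical.

```python
from collections import Counter

def extract_pairs(trajectories):
    """Collect consecutive (current, next) cell pairs, keeping duplicates."""
    pairs = []
    for traj in trajectories:
        pairs.extend(zip(traj, traj[1:]))
    return pairs

def naive_transition_probabilities(pairs):
    """Relative frequency of each pair among all pairs sharing its first cell."""
    pair_counts = Counter(pairs)
    start_counts = Counter(cur for cur, _ in pairs)
    return {
        (cur, nxt): count / start_counts[cur]
        for (cur, nxt), count in pair_counts.items()
    }

# Toy perturbed trajectories over cells {0, 1, 2} (hypothetical data).
perturbed = [[0, 1, 1, 2], [0, 1, 2, 2], [1, 1, 0, 2]]
db = extract_pairs(perturbed)
p_naive = naive_transition_probabilities(db)
print(len(db))            # 9 pairs in total
print(p_naive[(1, 2)])    # 2 of the 5 pairs starting at cell 1
```

Because the counts are taken over perturbed cells, this estimate inherits the noise injected by GeoInd, which is the shortcoming the EM-based scheme is designed to address.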

Privacy-preserving computation of transition probability between locations

In this section, we describe our privacy-preserving framework for effectively computing transition probabilities based on the collection of perturbed trajectories collected using GeoInd.

Privacy-preserving trajectory collection

There are two mechanisms to achieve ϵ-GeoInd: the Laplace and optimization mechanisms. The Laplace mechanism is simple, but it is known to introduce large noise to the collected data, resulting in a lower utility. On the contrary, the optimization mechanism can provide the maximum utility for the collected data while satisfying ϵ-GeoInd. In the optimization mechanism of GeoInd, the server first computes the obfuscation matrix, OM, by solving a linear programming problem. Let d(⋅, ⋅) be the distance metric between two grids in G, such as the Euclidean distance or the Manhattan distance. Let us further assume that κ_G is the prior probability distribution on users’ possible locations. κ_G can be either defined by a uniform distribution or inferred from the distribution of available historical data. Then, the obfuscation matrix OM can be obtained by solving the following linear programming problem [18, 19]:

$$\begin{aligned} \min_{OM} \quad & \sum_{u} \sum_{v} \kappa_G(g_u) \cdot OM[u, v] \cdot d(g_u, g_v) \\ \text{s.t.} \quad & OM[u, v] \le e^{\epsilon \cdot d(g_u, g_w)} \cdot OM[w, v] && \forall u, w, v \\ & \sum_{v} OM[u, v] = 1 && \forall u \\ & OM[u, v] \ge 0 && \forall u, v \end{aligned} \tag{4}$$

Here, OM[u, v] represents the probability that a perturbed location $g'_v$ is randomly generated from a true location g_u. The first constraint corresponds to the definition of GeoInd. The second constraint denotes that the sum of probabilities that each location is perturbed to other locations should be 1. The third constraint OM[u, v] ≥ 0 denotes that the probability is no less than 0. It is well known that the number of constraints of the abovementioned linear optimization problem is proportional to $|G|^3$; therefore, the computational overhead of the optimal mechanism is significant. To alleviate the computational overhead of the optimal mechanism, several studies based on an approximation technique have been conducted in the literature [18, 19].
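To make the cubic growth concrete, the short sketch below counts the variables and constraints of the linear program for the three grid sizes used later in the experiments. It assumes one GeoInd constraint per ordered triple of cells with distinct true locations, one row-sum equality per cell, and one non-negativity constraint per matrix entry; the function name is hypothetical.

```python
def lp_size(m, n):
    """Return (variables, constraints) of the obfuscation-matrix LP on an m x n grid."""
    cells = m * n
    variables = cells * cells             # one OM[u, v] per ordered pair of cells
    geoind = cells * (cells - 1) * cells  # triples (u, w, v) with u != w
    row_sums = cells                      # each row of OM sums to 1
    nonneg = cells * cells                # OM[u, v] >= 0
    return variables, geoind + row_sums + nonneg

for side in (10, 15, 20):
    variables, constraints = lp_size(side, side)
    print(f"{side} x {side}: {variables} variables, {constraints} constraints")
```

Already for the 10 × 10 grid the program has about one million constraints, which is why approximation techniques are attractive in practice.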

After computing the obfuscation matrix OM, the server distributes it to each user. Upon receiving OM, each user perturbs his/her true trajectory according to the probabilities encoded in OM. Let us focus on the k-th user’s trajectory TR_k = {(g₁, t₁), (g₂, t₂), …, (g_r, t_r)}. Given the k-th user’s trajectory TR_k, the corresponding perturbed trajectory $TR_k^p = \{(g'_1, t_1), (g'_2, t_2), \ldots, (g'_r, t_r)\}$ is computed in such a way that the perturbed location $g'_i$ is randomly generated from the true location g_i based on the probabilities encoded in OM. Finally, the perturbed trajectory, $TR_k^p$, is sent to the data aggregation server. Note that during this process, true locations along users’ trajectories are not exposed outside their devices, because the trajectory perturbation is performed on the devices themselves.
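The client-side step can be sketched as an independent draw per timestamp from the row of OM indexed by the true cell. In the example below, the 3 × 3 matrix is a toy row-stochastic matrix chosen for illustration (it is not the output of the linear program), cells are identified by integer indices, and the function name is hypothetical.

```python
import random

def perturb_trajectory(trajectory, OM, rng):
    """Replace each true cell index g by a cell sampled from row OM[g];
    timestamps are passed through unchanged."""
    cells = range(len(OM))
    return [(rng.choices(cells, weights=OM[g])[0], t) for g, t in trajectory]

# Toy obfuscation matrix over three cells (hypothetical values).
OM = [
    [0.70, 0.20, 0.10],
    [0.15, 0.70, 0.15],
    [0.10, 0.20, 0.70],
]
rng = random.Random(7)                 # seeded only to make the demo repeatable
true_traj = [(0, 1), (1, 2), (2, 3)]   # (cell index, timestamp)
noisy_traj = perturb_trajectory(true_traj, OM, rng)
print(noisy_traj)
```

Only `noisy_traj` would leave the device; the true trajectory and the random draws stay local.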

Computation of transition probability with a collection of perturbed trajectories

In this subsection, we present the proposed method that effectively computes the transition probability between pairs of locations in G based on a collection of perturbed trajectories, which are collected under ϵ-GeoInd. Given DB, we first compute the joint probability P(g_x, g_y), which denotes the probability that a user’s current location is g_x and next location is g_y, by leveraging the functionality of the EM algorithm. We next compute the transition probability, p(g_x → g_y), using the joint probability P(g_x, g_y).

EM is an iterative algorithm that sequentially runs two steps, namely, E-step (expectation) and M-step (maximization). In the E-step, the expected value of the likelihood is computed based on the current parameters and observed variables, whereas in the M-step, an estimation on the parameters is performed to maximize the likelihood function. To leverage the EM algorithm, we first need to define a likelihood function. The collection of location pairs, DB, can be viewed as the set of observed variables, O = {o₁, o₂, …, o_|DB|}, where the i-th observed variable o_i corresponds to the i-th perturbed location pair $(g'_{cur_i}, g'_{next_i})$. We also introduce the set of latent variables Z = {z₁, z₂, …, z_|DB|}, where z_i, which is associated with o_i, is an mn × mn matrix. Here, z_i[u, v] is 1 if o_i (i.e., the perturbed location pair $(g'_{cur_i}, g'_{next_i})$) is randomly generated from the true location pair (g_u, g_v) by GeoInd. Otherwise, z_i[u, v] is set to 0. For simplicity, we use s_j,k to represent the true location pair (g_j, g_k).

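Given the set of observed variables O and the set of latent variables Z, the complete data likelihood function is defined as follows:

$$P(O, Z \mid \theta) = \prod_{i=1}^{|DB|} \prod_{j=1}^{mn} \prod_{k=1}^{mn} \left[ \pi_{j,k} \cdot P(o_i \mid s_{j,k}) \right]^{z_i[j,k]} \tag{5}$$

Here, θ = {π_j,k} is the set of parameters, and π_j,k represents the probability that the current and next locations of a user are g_j and g_k, respectively, which we aim to estimate by using EM. That is, π_j,k corresponds to P(g_j, g_k). Obviously, $\sum_{j=1}^{mn} \sum_{k=1}^{mn} \pi_{j,k}$ equals 1.

Additionally, P(o_i|s_j,k) corresponds to $P(g'_{cur_i}, g'_{next_i} \mid g_j, g_k)$. That is, P(o_i|s_j,k) denotes the probability that the perturbed location pair $(g'_{cur_i}, g'_{next_i})$ is randomly generated from the true location pair (g_j, g_k) by GeoInd. We note that the perturbation process of GeoInd is independently applied to each true location in a trajectory. Hence, P(o_i|s_j,k) can be computed as follows:

$$P(o_i \mid s_{j,k}) = P(g'_{cur_i} \mid g_j) \cdot P(g'_{next_i} \mid g_k) = OM[j, cur_i] \cdot OM[k, next_i] \tag{6}$$

Note that by the definition of the obfuscation matrix, OM[j, cur_i] equals $P(g'_{cur_i} \mid g_j)$.

Given Eq (5), the log-likelihood function is defined as follows:

$$\begin{aligned} \ln P(O, Z \mid \theta) &= \sum_{i=1}^{|DB|} \sum_{j=1}^{mn} \sum_{k=1}^{mn} z_i[j,k] \left( \ln \pi_{j,k} + \ln P(o_i \mid s_{j,k}) \right) \\ &= \sum_{i=1}^{|DB|} \sum_{j=1}^{mn} \sum_{k=1}^{mn} z_i[j,k] \left( \ln \pi_{j,k} + \ln OM[j, cur_i] + \ln OM[k, next_i] \right) \end{aligned} \tag{7}$$

In the last line of Eq (7), P(o_i|s_j,k) is substituted by Eq (6). We now define the method that computes the transition probabilities using the EM algorithm.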

Initialization.

In the initialization phase, the initial parameter θ⁽⁰⁾ is defined. Determining good initial values in EM is vital to reducing the number of iterative steps until convergence and enhancing the accuracy of parameter estimation. In this paper, we use two different schemes to determine initial values:

Uniform initialization: In the first scheme, π_j,k is initialized with a uniform value, such as $\pi_{j,k} = \frac{1}{(mn)^2}$, which satisfies $\sum_{j=1}^{mn} \sum_{k=1}^{mn} \pi_{j,k} = 1$.
Distance-based initialization: Intuitively, as the distance between two locations decreases, the transition probability between them increases. Accordingly, in the second scheme, π_j,k is initialized with a value that is inversely proportional to the distance between g_j and g_k (i.e., $\pi_{j,k} \propto \frac{1}{d(g_j, g_k)}$), while satisfying $\sum_{j=1}^{mn} \sum_{k=1}^{mn} \pi_{j,k} = 1$.
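Both initialization schemes can be sketched in a few lines. The text does not say how the distance-based scheme treats a pair with g_j = g_k, where the distance is zero; the `offset` parameter below is an assumption introduced here purely to keep the weights finite, and the function names are hypothetical.

```python
import math

def uniform_init(num_cells):
    """Every ordered pair of cells receives the same joint probability."""
    value = 1.0 / (num_cells * num_cells)
    return [[value] * num_cells for _ in range(num_cells)]

def distance_init(cells, offset=1.0):
    """Joint probability proportional to 1 / (offset + d(g_j, g_k)).
    The offset is an assumption that avoids division by zero when j == k."""
    raw = [
        [1.0 / (offset + math.hypot(a[0] - b[0], a[1] - b[1])) for b in cells]
        for a in cells
    ]
    total = sum(map(sum, raw))
    return [[value / total for value in row] for row in raw]

cells = [(r, c) for r in range(2) for c in range(2)]   # hypothetical 2 x 2 grid
pi_uni = uniform_init(len(cells))
pi_dis = distance_init(cells)
print(sum(map(sum, pi_uni)), sum(map(sum, pi_dis)))    # both close to 1.0
```

With the distance-based scheme, staying in the same cell or moving to a neighbor starts with more probability mass than jumping across the grid.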

E-Step.

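In this step, the conditional expectation of the latent variables Z is estimated based on the current parameter θ^(h). The E-step is stated as follows:

$$Q(\theta \mid \theta^{(h)}) = E_{Z \mid O, \theta^{(h)}}\left[ \ln P(O, Z \mid \theta) \right] = \sum_{i=1}^{|DB|} \sum_{j=1}^{mn} \sum_{k=1}^{mn} E[z_i[j,k]] \left( \ln \pi_{j,k} + \ln P(o_i \mid s_{j,k}) \right) \tag{8}$$

Here, $E[z_i[j,k]]$ can be viewed as a posterior probability that, given the current parameter θ^(h), the perturbed location pair $(g'_{cur_i}, g'_{next_i})$ is generated from the true location pair (g_j, g_k). Thus, it can be computed by Bayes’ theorem as follows:

$$\begin{aligned} E[z_i[j,k]] &= \frac{\pi_{j,k}^{(h)} \cdot P(o_i \mid s_{j,k})}{\sum_{j'=1}^{mn} \sum_{k'=1}^{mn} \pi_{j',k'}^{(h)} \cdot P(o_i \mid s_{j',k'})} \\ &= \frac{\pi_{j,k}^{(h)} \cdot OM[j, cur_i] \cdot OM[k, next_i]}{\sum_{j'=1}^{mn} \sum_{k'=1}^{mn} \pi_{j',k'}^{(h)} \cdot OM[j', cur_i] \cdot OM[k', next_i]} \end{aligned} \tag{9}$$

Note that the last line of Eq (9) is rewritten using Eq (6).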

M-Step.

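This step finds the parameters θ that maximize the expectation function, Q(θ|θ^(h)), computed in the E-step. This task can be viewed as finding the extremum of Eq (8) subject to the constraint $\sum_{j=1}^{mn} \sum_{k=1}^{mn} \pi_{j,k} = 1$. Therefore, we exploit the Lagrange multiplier method by defining the Lagrange function as follows:

$$\mathcal{L}(\theta, \lambda) = Q(\theta \mid \theta^{(h)}) + \lambda \left( \sum_{j=1}^{mn} \sum_{k=1}^{mn} \pi_{j,k} - 1 \right) \tag{10}$$

Then, the first-order partial derivative of $\mathcal{L}$ with respect to π_j,k is obtained and set to zero as follows:

$$\frac{\partial \mathcal{L}}{\partial \pi_{j,k}} = \frac{1}{\pi_{j,k}} \sum_{i=1}^{|DB|} E[z_i[j,k]] + \lambda = 0 \tag{11}$$

By utilizing the constraint $\sum_{j=1}^{mn} \sum_{k=1}^{mn} \pi_{j,k} = 1$ to get the optimal value, and noting that the posteriors $E[z_i[j,k]]$ sum to 1 over all (j, k) for each i, we have the following:

$$\lambda = -\sum_{i=1}^{|DB|} \sum_{j=1}^{mn} \sum_{k=1}^{mn} E[z_i[j,k]] = -|DB| \tag{12}$$

According to Eqs (11) and (12), the parameter that maximizes the expectation function, Q(θ|θ^(h)), is defined as follows:

$$\pi_{j,k}^{(h+1)} = \frac{1}{|DB|} \sum_{i=1}^{|DB|} E[z_i[j,k]] \tag{13}$$

Thus, we can obtain the updated parameter $\pi_{j,k}^{(h+1)}$ using Eq (13), in which $E[z_i[j,k]]$ is already computed in the previous E-step.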
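One EM iteration can be sketched as follows: for every observed perturbed pair, compute the posterior over all true pairs (E-step), then average the posteriors over the observations (M-step). This is a minimal, unoptimized illustration rather than the authors' implementation. It assumes cells are integer indices, OM is a row-stochastic list of lists with OM[j][v] the probability of reporting cell v from true cell j, and identical observations are grouped so each posterior is computed once; the names and the toy data are hypothetical.

```python
from collections import Counter

def em_step(pairs, OM, pi):
    """Run one E-step and M-step; return the updated joint distribution."""
    size = len(OM)
    acc = [[0.0] * size for _ in range(size)]
    used = 0
    for (cur, nxt), count in Counter(pairs).items():
        # E-step: posterior that true pair (j, k) produced the observation (cur, nxt).
        post = [
            [pi[j][k] * OM[j][cur] * OM[k][nxt] for k in range(size)]
            for j in range(size)
        ]
        norm = sum(map(sum, post))
        if norm == 0.0:
            continue  # observation has zero probability under the current parameters
        for j in range(size):
            for k in range(size):
                acc[j][k] += count * post[j][k] / norm
        used += count
    if used == 0:
        raise ValueError("no observation is explainable by the current parameters")
    # M-step: average the posteriors over the observations that contributed.
    return [[value / used for value in row] for row in acc]

# Toy setting with two cells (hypothetical values).
OM = [[0.9, 0.1], [0.1, 0.9]]
pairs = [(0, 1)] * 8 + [(1, 1)] * 2
pi = [[0.25, 0.25], [0.25, 0.25]]       # uniform initialization
pi = em_step(pairs, OM, pi)
print(round(pi[0][1], 3))               # 0.666 after the first iteration
```

Each iteration costs on the order of (number of distinct observed pairs) × |G|² multiplications, so grouping duplicates matters when DB is large.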

Computation of transition probability.

The abovementioned E-step and M-step are repeated until convergence. After computing the parameters using EM, the transition probability from g_x to g_y is computed as follows:

$$p(g_x \to g_y) = \frac{\pi_{x,y}}{\sum_{z=1}^{mn} \pi_{x,z}} \tag{14}$$

We note that the parameter π_x,y is already estimated in the previous E-Step and M-Step.
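Converting the estimated joint distribution into transition probabilities is a row normalization: each joint probability is divided by the total mass of its starting cell. A minimal sketch, assuming the joint distribution is stored as a list of rows indexed by the current cell; leaving zero-mass rows as all zeros is a choice made here, not something the text specifies.

```python
def transition_matrix(pi):
    """Divide each row of the joint distribution by its row sum."""
    result = []
    for row in pi:
        mass = sum(row)
        result.append([value / mass if mass > 0 else 0.0 for value in row])
    return result

joint = [[0.1, 0.3], [0.2, 0.4]]       # hypothetical estimated joint distribution
p = transition_matrix(joint)
print(p[0])                            # approximately [0.25, 0.75]
```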

Experiments

In this section, we present the experiments we carried out to evaluate the proposed approach. First we describe the experimental setup and then we will discuss the experimental results.

Experimental setup

We evaluate the proposed method using the T-Drive dataset [20], which contains one-week trajectories of Beijing taxis. We first extract 85707 trajectories, each with a length of 10, from the T-Drive dataset. Then, the entire geographic region, represented by longitude and latitude information, is segmented into three different grids: 10 × 10, 15 × 15, and 20 × 20. Each location in the trajectories is assigned to the segmented region (i.e., grid) to which it geographically belongs. In the experiments, results are reported for the following alternatives: the straightforward approach (SA), the proposed EM-based approach with the uniform initialization (EM_uni), and the proposed EM-based approach with the distance-based initialization (EM_dis). Furthermore, we compare our proposed method with the particle filter-based approach (PF), which has been extensively used in trajectory detection [21, 22]. For our evaluations, we adapted the underlying particle filter technique for trajectory detection to infer users’ true locations from perturbed locations. In particular, at each iteration of the particle filter, the weights of particles are updated by using the probabilities embedded in the obfuscation matrix OM of GeoInd.
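The assignment of raw coordinates to grid cells can be sketched as an equal-width binning of longitude and latitude. The bounding box below is a hypothetical rectangle chosen for illustration and is not the region used in the experiments; the row-major indexing and the clamping of out-of-range points to the border cells are likewise assumptions of this sketch.

```python
def to_grid_index(lon, lat, bbox, m, n):
    """Map a (lon, lat) point to a row-major cell index of an m x n grid.
    Points outside the bounding box are clamped to the nearest border cell."""
    min_lon, min_lat, max_lon, max_lat = bbox
    col = int((lon - min_lon) / (max_lon - min_lon) * n)
    row = int((lat - min_lat) / (max_lat - min_lat) * m)
    col = max(0, min(col, n - 1))
    row = max(0, min(row, m - 1))
    return row * n + col

bbox = (116.0, 39.6, 116.8, 40.2)   # hypothetical (min_lon, min_lat, max_lon, max_lat)
for side in (10, 15, 20):
    print(side, to_grid_index(116.31, 39.98, bbox, side, side))
```

A finer grid gives more precise locations but squares the number of joint probabilities that EM has to estimate.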

To compare these four schemes, we use the mean absolute error (MAE), which is defined as follows:

$$MAE = \frac{1}{|G|^2} \sum_{g_x \in G} \sum_{g_y \in G} \left| p^{est}(g_x \to g_y) - p(g_x \to g_y) \right| \tag{15}$$

Here, p^est(g_x → g_y) is the transition probability estimated based on the perturbed trajectory dataset, whereas p(g_x → g_y) is the true transition probability computed based on the actual trajectory dataset.
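The error metric can be sketched as the average absolute difference over all ordered pairs of cells. Averaging over all |G|² pairs is an assumption of this sketch, and the function name and toy matrices are hypothetical.

```python
def mean_absolute_error(p_est, p_true):
    """Average |p_est - p_true| over all ordered pairs of cells."""
    size = len(p_true)
    total = sum(
        abs(p_est[x][y] - p_true[x][y])
        for x in range(size)
        for y in range(size)
    )
    return total / (size * size)

p_true = [[0.5, 0.5], [0.2, 0.8]]   # hypothetical true transition matrix
p_est = [[0.6, 0.4], [0.2, 0.8]]    # hypothetical estimate
print(mean_absolute_error(p_est, p_true))   # close to 0.05
```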

Results and discussion

Fig 2 illustrates MAE values versus a varying privacy budget ϵ. In the experiments, ϵ varies from 0.5 to 2.0, while the size of grid is fixed to 10 × 10. Key observations based on Fig 2 can be summarized as follows: As expected, the error rate increases as the privacy budget decreases. This is because a smaller ϵ value provides stronger privacy. However, in terms of utility, a smaller ϵ results in a lower utility of the collected trajectory dataset. This in turn leads to a decreased estimation accuracy when computing the transition probability. On the contrary, a larger value of ϵ provides a weaker privacy guarantee, while introducing less perturbation to true locations. This in turn leads to an increased estimation accuracy of the transition probability.

Fig 2. MAE on varying privacy budgets.

https://doi.org/10.1371/journal.pone.0278744.g002

Among the four alternatives, the proposed EM-based schemes, EM_uni and EM_dis, perform better than the straightforward approach SA and the particle filter-based approach PF. In particular, the performance gain of the proposed EM-based schemes over SA and PF is more pronounced at smaller ϵ values, which provide stronger privacy. Considering that many real-world applications require stronger privacy protection guarantees against attackers with arbitrary background knowledge, these results indicate that the proposed method is more practical for real-world applications. Between the two EM-based schemes, EM_dis slightly outperforms EM_uni at all privacy budgets, which indicates that the distance-based initialization scheme is more promising than the uniform initialization scheme.

To further investigate the validity of the experimental results, in Fig 3, we plot a heatmap for the estimated transition probability distribution of each method with ϵ set to 1.0. The size of the grid is set to 10 × 10. For comparison purposes, we also plot the true transition probability distribution that is obtained with the actual trajectory dataset. As observed in the figure, the estimated probability distributions obtained by SA and PF are quite dissimilar to the true distribution. On the contrary, with the proposed EM-based schemes, we can obtain estimated probability distributions that are highly similar to the true one. Between the two EM-based schemes, the estimated probability distributions obtained by EM_dis are more similar to the true distributions than those computed by EM_uni. The experimental results presented in Figs 2 and 3 indicate that, with a trajectory dataset collected in a privacy-preserving manner using GeoInd, the proposed EM-based approach computes the transition probability with higher precision than the straightforward and particle filter-based approaches.

Fig 3. Heatmaps for the transition probability distribution.

https://doi.org/10.1371/journal.pone.0278744.g003

Fig 4 shows MAE values with respect to various grid sizes. In the experiment, three different grid sizes are used with ϵ fixed to 0.5. Similar experimental results can be observed with various grid sizes. As shown in Fig 4, for all grid sizes, the proposed EM_uni and EM_dis significantly outperform SA and PF regarding MAE, verifying the robustness of the proposed method against the grid size. Furthermore, the EM scheme using the distance-based initialization achieves slightly better performance than that using the uniform-based initialization.

Fig 4. MAE on varying grid sizes.

https://doi.org/10.1371/journal.pone.0278744.g004

Finally, we validate the convergence of the proposed EM-based schemes. In Fig 5, we plot the MAE values versus the number of iterations in the EM process. In the experiments, ϵ is fixed to 1.0, while three different grid sizes are used. As shown in the figure, we can observe a significant drop in the MAE values during the first few iterations. Beyond this point, the MAE values stabilize. These experimental results verify that, for the proposed EM-based scheme that computes the transition probability, the number of iterations needed for convergence is small. This in turn indicates that the additional computational overhead incurred by the EM algorithm in the proposed scheme is not significant.

Fig 5. MAE vs. the number of iterations of EM.

https://doi.org/10.1371/journal.pone.0278744.g005

Related work

GeoInd has been used in diverse application domains. Here, we present some application areas of GeoInd. Vehicle networks have recently emerged as a promising solution to improve driving experiences and road safety. In [23], Zhou et al. proposed edge-assisted vehicle networks for improving the service quality in vehicle networks, where GeoInd is deployed at the edge nodes to protect the true location of the vehicle. In the proposed system, a vehicle first submits a service request along with its location to an edge node at which the GeoInd-based mechanism is executed to protect the actual location of the vehicle, and the service request with the perturbed vehicle’s location is then forwarded to the service provider, which returns the required service to the requesting vehicle. To protect location privacy in location-aware social networks, GeoInd combined with homomorphic encryption is used for privacy-preserving nearby friend discovery [24]. Spatial crowdsourcing is a platform where individual users are engaged to collect, analyze, and disseminate their surrounding information. Wang et al. [16] proposed a differential geo-obfuscation mechanism to protect the workers’ true location during task assignment by a spatial crowdsourcing platform. Yan et al. [25] developed a spatial crowdsourcing framework which can protect the privacy of the workers’ trajectories. In the developed framework, the GeoInd mechanism is used to protect a worker’s shortest path from the source to the destination. Qui et al. [17] investigated location-privacy protection in a vehicle-based spatial crowdsourcing framework using GeoInd. To address the location privacy issues raised in ride-sharing applications such as Uber, Waze, and Lyft, Tong et al. [26] proposed a scheduling scheme which exploits GeoInd to protect the location information of ride-sharing users.
We note that the existing works focus on either protecting users’ location privacy using GeoInd or extracting static information, such as population density distribution, from perturbed location datasets collected under the GeoInd setting. On the contrary, the objective of this work is to estimate users’ moving information, such as transition patterns, with perturbed trajectory datasets collected using GeoInd.

There have been extensive studies to leverage the concept of differential privacy for publishing trajectory data in a privacy-preserving manner. Hua et al. [27] proposed a differential-privacy-based scheme for publishing time-serial trajectory data. The proposed scheme leverages an exponential mechanism to probabilistically cluster locations based on their distance, and then relies on the Laplace mechanism to add random noise to the count of trajectories in a cluster. DP-Star [28] is a differential-privacy-based framework for publishing trajectory data with strong utility. DP-Star generates synthetic trajectories that satisfy ϵ-differential privacy, while maintaining high utility. SafePath [29], which is a privacy-preserving algorithm for publishing trajectories, structures trajectories as a noisy prefix tree and publishes differentially private trajectories, while retaining data utility. Ou et al. [30] proposed two Lagrange multiplier-based differentially private approaches, UD-LMDP and UC-LMDP, to address privacy issues arising when publishing mutually correlated trajectories. The authors introduced an n-body Laplace framework which aims to prevent adversaries from inferring a social relation from the mutual correlation between two users’ trajectories. Chen et al. [11] proposed RNN-DP, which is a differential privacy scheme based on a recurrent neural network, to protect the privacy of real-time trajectory data. RNN-DP exploits a recurrent neural network to efficiently predict trajectories, while protecting the location privacy of users using differential privacy. OPTDP [31] is an optimal personalized trajectory differential privacy mechanism for trajectory data publishing. Zhao et al. [32] introduced a method that builds the SR-tree, which is based on the R-tree, using the trajectory sequence, and then adds random noise to the nodes in the SR-tree. Then, the noisy SR-tree indexes are used to answer users’ spatial queries. Liu et al. [33] developed a trajectory data publication scheme that exploits the staircase mechanism of differential privacy.

Differential privacy has also been used to process and analyze trajectory data in diverse application areas. PLDP-TD [34] supports personalized-location differentially private data analysis on trajectory databases. To answer queries in a differentially private way, PLDP-TD builds a personalized noisy trajectory tree which stores sub-trajectories of a trajectory database along with their privacy protection levels. That is, a personalized privacy level is assigned to each node of the tree depending on the privacy protection requirements of locations. Kim et al. [35] leverage local differential privacy, which is a localized version of differential privacy, to collect indoor positioning data from users, while protecting their location privacy. ConCrowd-DP [36] is a differentially private framework for mobile crowdsourcing applications in which the mobile users upload location-related task results to the server to obtain rewards. In ConCrowd-DP, to protect participating users’ location privacy, perturbed locations are used instead of users’ true locations when reporting the location-related task results to the server. Deldar and Abadi [37] proposed DPLG to efficiently and accurately answer spatial queries on moving objects databases. DPLG is based on the combination of differential privacy and location generalization, which makes it possible to process spatial queries efficiently and accurately by reducing the number of locations and minimizing query errors.

Conclusion

The most common analysis task in applications that utilize mobility information of users is to compute the transition probability between two consecutive timestamps. Computing transition probabilities from a collection of privacy-preserved trajectories is challenging, because of a loss in the utility of collected dataset caused by a privacy-preserving mechanism. To address this challenge, in this paper, we proposed a novel scheme to compute the transition probability with the collection of perturbed trajectories collected using GeoInd. The proposed method leveraged the EM algorithm to precisely estimate hidden transition patterns based on a perturbed trajectory dataset. Experimental results with real datasets verified that a good estimation of transition pattern can be achieved with the proposed method.

References

1. S. Feng, X. Li, Y. Zeng, G. Cong, and Y. M. Chee. Personalized ranking metric embedding for next new poi recommendation. in Proceedings of the International Joint Conference on Artificial Intelligence, pp. 2069–2075, 2015.
2. Kim J. S., Kim J. W., and Chung Y. D. Successive point-of-interest recommendation with local differential privacy. IEEE Access, vol. 9, pp. 66371–66386, 2021.
3. Tong Y., Zhou Z., Zeng Y., Chen L., and Shahabi C. Spatial crowdsourcing: A survey. VLDB Journal, vol. 29, pp. 217–250, 2020.
4. Shi D., Ding J., S. M. Errapotu, Yue H., Xu W., Zhou X., et al. Deep Q-network-based route scheduling for TNC vehicles with passengers’ location differential privacy. IEEE Internet of Things Journal, vol. 6, 2019.
5. Xu C., Luo L., Ding Y., Zhao G., and Yu S. Personalized location privacy protection for location-based services in vehicular networks. IEEE Wireless Communications Letters, vol. 9, pp. 1633–1637, Oct. 2020.
6. Takbiri N., Shejwalkar V., Houmansadr A., Goeckel D. L., and Pishro-Nik H. Leveraging prior knowledge asymmetries in the design of location privacy-preserving mechanisms. IEEE Wireless Communications Letters, vol. 9, pp. 2005–2009, July 2020.
7. M. O. Gruteser and D. Grunwald. Anonymous usage of location-based services through spatial and temporal cloaking. in Proceedings of the International Conference on Mobile Systems, Applications and Services, pp. 31–42, May 2003.
8. Beresford A. R. and Stajano F. Location privacy in pervasive computing. IEEE Pervasive Computing, vol. 2, no. 1, pp. 46–55, 2003.
9. B. Liu, L. Chen, X. Zhu, Y. Zhang, C. Zhang and W. Qiu. Protecting location privacy in spatial crowdsourcing using encrypted data. in Proceedings of the International Conference on Extending Database Technology, pp. 478–481, April 2017.
10. M. E. Andres, N. E. Bordenabe, K. Chatzikokolakis, and C. Palamidessi. Geo-indistinguishability: Differential privacy for location-based systems. in Proc. CCS, pp. 901–914, November 2013.
11. Chen S., Fu A., Shen J., Yu S., Wang H., and Sun H. RNN-DP: A new differential privacy scheme base on recurrent neural network for dynamic trajectory privacy protection. Journal of Network and Computer Applications, vol. 168, pp. 189–196, 2020.
12. Li J. and Chen G. A personalized trajectory privacy protection method. Computers & Security, vol. 108, September 2021.
13. Kim J. W., Edemacu K., Kim J. S., Chung Y. D. and Jang B. A survey of differential privacy-based techniques and their applicability to location-based services. Computers & Security, vol. 111, December 2021.
14. Zhang H., Chen Z., Liu Z., Zhu Y., and Wu C. Location prediction based on transition probability matrices constructing from sequential rules for spatial-temporal k-Anonymity dataset. PLOS ONE, vol. 11, August 2019.
15. Ren W. and Tang S. EGeoIndis: An effective and efficient location privacy protection framework in traffic density detection. Vehicular Communications, vol. 21, 2020.
16. L. Wang, D. Yang, X. Han, T. Wang, D. Zhang, and X. Ma. Location privacy-preserving task allocation for mobile crowdsensing with differential geo-obfuscation. in Proceedings of the International Conference on World Wide Web, pp. 627–636, 2017.
17. C. Qiu and A. C. Squicciarini. Location privacy protection in vehicle-based spatial crowdsourcing via geo-indistinguishability. in Proceedings of the IEEE International Conference on Distributed Computing Systems, pp. 1061–1071, Dallas, TX, USA, 2019.
18. K. Chatzikokolakis, E. ElSalamouny, and C. Palamidessi. Efficient utility improvement for location privacy. in Proceedings of the Privacy Enhancing Technologies Symposium, pp. 210–231, July 2017.
19. R. Ahuja, G. Ghinita, and C. Shahabi. A utility-preserving and scalable technique for protecting location data with geo-indistinguishability. in Proceedings of the International Conference on Extending Database Technology, pp. 210–231, April 2019.
20. T-Drive trajectory data sample. https://www.microsoft.com/en-us/research/publication/t-drive-trajectory-data-sample, 2018.
21. Wang X., Li T., Sun S. and Corchado J. M. A survey of recent advances in particle filters and remaining challenges for multitarget tracking. Sensors, vol. 2017, no. 12, 2017.
22. Fang Y., Wang C., Yao W., Zhao X., Zhao H., and Zha H. On-road vehicle tracking using part-based particle filter. IEEE Transactions on Intelligent Transportation Systems, vol. 20, pp. 4538–4552, 2019.
23. Zhou L., Yu L., Du S., Zhu H., and Chen C. Achieving differentially private location privacy in edge-assistant connected vehicles. IEEE Internet of Things Journal, vol. 6, no. 3, Jun. 2019.
24. Ma C. and Chen C. W. Nearby friend discovery with geo-indistinguishability to stalkers. Procedia Computer Science, vol. 34, pp. 352–359, 2014.
25. Yan K., Luo G., Zheng X., Tian L., and Sai A. M. V. V. A comprehensive location-privacy-awareness task selection mechanism in mobile crowd-sensing. IEEE Access, vol. 7, pp. 77541–77554, 2019.
26. W. Tong, J. Hua, and S. Zhong. A jointly differentially private scheduling protocol for ridesharing services. IEEE Transactions on Information Forensics and Security, vol. 12, no. 10, pp. 2444–2456, 2017.
27. J. Hua, Y. Gao, and S. Zhong. Differentially private publication of general time-serial trajectory data. Proceedings of the IEEE Conference on Computer Communications, pp. 549–557, Hong Kong, China, 2015.
28. M. E. Gursoy, L. Liu, S. Truex, and L. Yu. Differentially private and utility preserving publication of trajectory data. IEEE Transactions on Mobile Computing, vol. 18, pp. 2315–2329, October 2018.
29. Al-Hussaeni A. M. V. V. K., Fung B. C. M., Iqbal F., Dagher G. G., and Park E. G. SafePath: Differentially-private publishing of passenger trajectories in transportation systems. Computer Networks, vol. 143, pp. 126–139, October 2019.
30. Ou L., Qin Z., Liao S., Hong Y., and Jia X. Releasing correlated trajectories: Towards high utility and optimal differential privacy. IEEE Transactions on Dependable and Secure Computing, vol. 17, pp. 1109–1123, October 2020.
31. Cheng W., Wen R., Huang H., Miao W. and Wang C. OPTDP: Towards optimal personalized trajectory differential privacy for trajectory data publishing. Neurocomputing, vol. 472, pp. 201–211, February 2022.
32. Zhao X., Dong Y., and Pi D. Novel trajectory data publishing method under differential privacy. Expert Systems With Applications, vol. 138, December 2019.
33. Liu Q., Yu J., Han J., and Yao X. Differentially private and utility-aware publication of trajectory data. Expert Systems With Applications, vol. 180, October 2021.
34. Deldar F. and Abadi M. PLDP-TD: Personalized-location differentially private data analysis on trajectory databases. Pervasive and Mobile Computing, vol. 49, pp. 1–22, September 2018.
35. Kim J. W. and Jang B. Workload-aware indoor positioning data collection via local differential privacy. IEEE Communications Letters, vol. 23, pp. 1352–1359, August 2019.
36. Qiu G. and Shen Y. Mobility-aware differentially private trajectory for privacy-preserving continual crowdsourcing. IEEE Access, vol. 9, pp. 26362–26376, February 2021.
37. Deldar F. and Abadi M. A differentially private location generalization approach to guarantee non-uniform privacy in moving objects databases. Knowledge-Based Systems, vol. 225, August 2021.

