
## Abstract

Spatial-temporal k-anonymity has become a mainstream approach for protecting users' privacy in location-based services (LBS) applications, and it has been applied to several query types such as LBS snapshot queries and continuous queries. Analyzing large-scale spatial-temporal anonymity sets may benefit several LBS applications. In this paper, we propose two location prediction methods based on transition probability matrices constructed from sequential rules for spatial-temporal k-anonymity datasets. First, we define single-step sequential rules mined from sequential spatial-temporal k-anonymity datasets generated by continuous LBS queries from multiple users. We then construct transition probability matrices from the mined single-step sequential rules and normalize the transition probabilities in these matrices. Next, we regard the mobility model of an LBS requester as a stationary stochastic process and compute the n-step transition probability matrices by raising the normalized transition probability matrices to the power n. On this basis, we propose two location prediction methods: rough prediction and accurate prediction. The former obtains the probabilities of arriving at target locations along simple paths, which include only the current location, the target location and the number of transition steps. The latter iteratively combines the probabilities of simple paths with n steps and the probabilities of detailed paths with n-1 steps to calculate the transition probabilities of detailed paths with n steps from the current location to the target location. Finally, extensive experiments verify the correctness and flexibility of the proposed methods.

**Citation:** Zhang H, Chen Z, Liu Z, Zhu Y, Wu C (2016) Location Prediction Based on Transition Probability Matrices Constructing from Sequential Rules for Spatial-Temporal K-Anonymity Dataset. PLoS ONE 11(8): e0160629. https://doi.org/10.1371/journal.pone.0160629

**Editor:** Wen-Bo Du, Beihang University, CHINA

**Received:** March 31, 2016; **Accepted:** July 23, 2016; **Published:** August 10, 2016

**Copyright:** © 2016 Zhang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability:** All relevant data are within the paper and its Supporting Information files.

**Funding:** This research is supported by the Jiangsu Government Scholarship for Overseas Studies, grants from the National Natural Science Foundation of China (grant number 41201465), and grants from the Natural Science Foundation of Jiangsu Province (grant number BK2012439). The authors thank the Institute of Cartography and Geoinformatics, Leibniz University Hannover for providing us with a good work environment. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Competing interests:** The authors have declared that no competing interests exist.

## Introduction

With the rapid development of mobile communication and the popularity of positioning devices (e.g., the Global Positioning System, GPS), LBS have become widely used owing to their computational simplicity [1]. However, if used illegally, LBS can cause privacy problems (e.g., employers snooping on the whereabouts of their staff, or stalkers analyzing user trajectories to infer a user's religion, sexual orientation, etc.), which has drawn great attention from both academia and business [2][3].

Early research on privacy protection for LBS users emphasized the establishment of laws and treaties. Because such measures lack flexibility and have lagged behind attack technologies, a number of technical approaches have been put forward, for instance, hierarchical clustering [4], dummies [5][6], spatial transformation based on the Hilbert curve [7], private information retrieval (PIR) protocols [8] and spatial-temporal k-anonymity [9]. Spatial-temporal k-anonymity has become a mainstream privacy protection method for LBS users owing to its simplicity and wide range of applications.

Furthermore, its basic principle of cloaking a requestor's identity together with precise time and position information has inspired several variants of the original method [10].

As spatial and temporal properties are the most important elements of spatial-temporal k-anonymity datasets (hereafter referred to as anonymity datasets), anonymity datasets can be formatted into a number of sequences of generalized regions. Analyzing the large-scale anonymity datasets recorded and stored by LBS providers (such as Google Maps, Foursquare, Baidu Maps, etc.) can yield a set of sequential rules that reflect the movement behaviors of LBS issuers. These sequential rules can in turn be used to predict users' future locations and to provide decision support for LBS applications such as intelligent navigation systems and personalized service systems [11][12]. Unfortunately, location prediction based simply on sequential rules does not perform well, because the prediction can only be single-step; that is, each prediction involves only one source and one destination. A more practical location prediction method (e.g., multistep prediction) is urgently needed in applications. To our knowledge, little literature has focused on this subject so far.

In this paper, based on sequential rules mined from large-scale anonymity datasets, we propose two location prediction methods. Simultaneously, privacy attack problems that may result from our proposed location prediction methods are also analyzed.

The rest of this paper is organized as follows. Preliminaries are described in Section 2. Two location prediction methods based on sequential rules mined from anonymity datasets are presented in Section 3. Comprehensive experiments are conducted and their results analyzed in Section 4. Section 5 concludes the paper and discusses future work.

## Preliminaries

In this section, we introduce the basic concepts of LBS queries and the primitives of LBS privacy, and present examples of anonymity datasets generated by a typical spatial-temporal k-anonymity method.

### LBS query

A location-based service can be defined as a service that integrates the location of an LBS user with other information to provide added value to the user. Such applications are designed in one of two modes: push or pull [13]. There are two types of pull services: snapshot queries, such as "recommend 10 nearby restaurants based on my profile", and continuous queries, such as "continually tell me the shopping mall nearest my location". For a snapshot query, an LBS user only needs to report their current location to the service provider once to obtain the desired information. For a continuous query, by contrast, an LBS user has to report their location to the service provider periodically or on demand to obtain the desired results [14]. Additionally, in a continuous query, a consistent user identity (or pseudo-identifier) is used until the query expires; that is, LBS providers can link the requests issued by the same (anonymous) user at different times in chronological order to obtain a sequence of requests.

### Primitives of LBS privacy

Privacy is an essential requirement for providing LBS, and the private information involved can be grouped into two categories: identity and sensitive information [2]. The identity of each individual is unique and distinguishes that individual from a group of individuals (e.g., a security identifier, SID). Sensitive information consists of location and request content. Location privacy concerns the traces of an individual or a group of people, including coordinates, landmarks, etc. Semantic location privacy concerns sensitive semantic information about locations, for example, hospitals, religious buildings, and so on. Request content privacy involves sensitive attribute information, such as disease, salary, religion, and so on. It is worth noting that linking identity to sensitive information can cause more severe privacy invasion.

### Spatial-temporal k-anonymity

Spatial-temporal k-anonymity is a branch of k-anonymity, which is an obfuscation technique. Under spatial-temporal k-anonymity, a query request submitted to an LBS provider does not carry the identity and exact location of the LBS user; instead, it carries at least k user pseudonyms, covering the requestor and others nearby, and a cloaking region enclosing the locations of these k (or more) LBS users. Thus, each query request generates an anonymity dataset consisting of at least k pseudonyms and a cloaking region. Consequently, identity privacy is protected by replacing the identities of requestors with pseudonyms, and location privacy is protected by replacing the accurate locations of requestors with cloaking regions. Furthermore, because an anonymity dataset includes at least k pseudonyms and a cloaking region, linking an individual pseudonym to the cloaking region is prevented to a certain degree. Likewise, linking a pseudonym to the content of the request is also avoided, as any pseudonym within the anonymity dataset may have issued the query request.

Spatial-temporal k-anonymity and its optimized versions are widely used in LBS snapshot queries and continuous queries [2]. To make the subsequent analysis of anonymity datasets easier to follow, we present an example workflow for generating an anonymity dataset with the modified adaptive-interval cloaking algorithm [9].

First, we present the basic definition of an anonymity dataset for an LBS snapshot query, $SnAS = \langle UP, CR, TC \rangle$, where $UP = \langle U_1, U_2, \ldots, U_k \rangle$ represents a set of k user pseudonyms, $CR = \langle Cell_1, Cell_2, \ldots, Cell_m \rangle$ represents a cloaking region consisting of m grid cells that enclose the locations of the k users, and $TC = \langle TI_1, TI_2, \ldots, TI_n \rangle$ represents temporal cloaking with n time intervals of equal duration. The time intervals $TI_1, TI_2, \ldots, TI_n$ provide very little temporal information; that is, $SnAS$ preserves temporal order without a specified time.

Fig 1 presents an example of an anonymity dataset for an LBS snapshot query, where $SnAS = \langle \langle U_{11}, U_{12}, U_{13}, U_{14}, U_{15}, U_{16}, U_{17}, U_{18}, U_{19}, U_{26}, U_{27}, U_{28} \rangle, \langle Cell_{22}, Cell_{23}, Cell_{33} \rangle, \langle 1 \rangle \rangle$. We set k = 10 and, for simplicity, set the number of temporal cloaking intervals to 1.

Based on the definition of anonymity datasets for a snapshot query, we define an anonymity dataset for an LBS continuous query as $CoAS = \langle SnAS_1, SnAS_2, \ldots, SnAS_s \rangle$, where each $SnAS_i$ ($1 \le i \le s$) is an anonymity dataset for a snapshot query. In this paper, we focus on anonymity datasets for LBS continuous queries. Fig 2 presents an example of an anonymity dataset for an LBS continuous query, where $CoAS = \langle SnAS_1, SnAS_2, SnAS_3, SnAS_4 \rangle$.
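As a minimal sketch, the two definitions can be mirrored by simple Python data classes. The class and field names (`SnAS`, `CoAS`, `pseudonyms`, `cells`, `intervals`) and the helper `satisfies_k` are illustrative choices, not part of the original method; the instance reproduces the Fig 1 example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SnAS:
    """Anonymity dataset for one snapshot query: <UP, CR, TC>."""
    pseudonyms: Tuple[str, ...]  # UP: at least k user pseudonyms
    cells: Tuple[str, ...]       # CR: grid cells forming the cloaking region
    intervals: Tuple[int, ...]   # TC: indices of equal-duration time intervals

    def satisfies_k(self, k: int) -> bool:
        """True if the request is hidden among at least k distinct pseudonyms."""
        return len(set(self.pseudonyms)) >= k


@dataclass(frozen=True)
class CoAS:
    """Anonymity dataset for a continuous query: <SnAS_1, ..., SnAS_s>."""
    snapshots: List[SnAS]


# The snapshot example of Fig 1 (k = 10, a single temporal cloaking interval).
fig1 = SnAS(
    pseudonyms=tuple(f"U{n}" for n in [11, 12, 13, 14, 15, 16, 17, 18, 19, 26, 27, 28]),
    cells=("Cell22", "Cell23", "Cell33"),
    intervals=(1,),
)
print(fig1.satisfies_k(10))
```

Freezing the classes reflects that an anonymity dataset, once recorded by the provider, is read but not modified during analysis.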

Finally, as we deal only with spatial-temporal properties of anonymity datasets, an anonymity dataset for an LBS continuous query can be denoted more briefly as

In the case of the anonymity dataset in Fig 2, the simplified notation is as follows:

## Location prediction method

In this section, we propose two location prediction methods. Both follow five phases:

- Mining sequential rules from anonymity datasets for LBS continuous queries;
- Constructing transition probability matrices from the mined sequential rules;
- Normalizing the transition probabilities in the transition probability matrices;
- Computing n-step transition probability matrices by raising the normalized transition probability matrices to the power n;
- Designing a rough location prediction method and an accurate location prediction method based on the n-step transition probability matrices.

### Mining sequential rules from anonymity datasets for LBS continuous queries

Prediction is an important data mining task, and discovering temporal relationships in sequences of discrete events stored in large databases can help predict events [15]. Sequential patterns in sequences of events can reflect temporal relationships even without a specified time between events, and mining sequential patterns has become a popular technique for prediction. However, because a sequential pattern only indicates that a sequence of events appears frequently in a database, it is not sufficient for predicting events. Thus, the concept of a sequential rule, also called a prediction rule, was proposed [16].

A sequential rule has the form $X \rightarrow Y$, where $X$ and $Y$ are two sets of events. $X \rightarrow Y$ is interpreted to mean "if events $X$ appear, the events $Y$ are likely to occur afterward with a given confidence value or probability"; that is, events $X$ and events $Y$ frequently occur in succession within a single sequence. A sequential rule typically has two measures of significance: support and confidence. The support of a sequential rule is defined here as the number of sequences in which the left part occurs before the right part, divided by the total number of sequences; the confidence of a rule is the number of sequences in which the left part occurs before the right part, divided by the number of sequences in which the left part occurs. Formally, for a sequential rule $X \rightarrow Y$,

$$seqsup(X \rightarrow Y) = \frac{seqsup(X \cup Y)}{|D|}, \qquad seqconf(X \rightarrow Y) = \frac{seqsup(X \cup Y)}{seqsup(X)},$$

where $|D|$ is the number of sequences in a sequence database $D$, $seqsup(X)$ is the number of sequences in $D$ in which $X$ occurs, and $seqsup(X \cup Y)$ is the number of sequences in $D$ in which $Y$ occurs after $X$. A rule is reported only if $seqsup(X \rightarrow Y) \ge seqsup_{min}$ and $seqconf(X \rightarrow Y) \ge seqconf_{min}$, where $seqsup_{min}$ and $seqconf_{min}$ are user-defined thresholds.
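The two measures can be checked with a small brute-force sketch. It assumes that each sequence is a list of grid-cell labels and considers only rules between single cells ($X = \{x\}$, $Y = \{y\}$, $x \ne y$), counting a rule once per sequence in which some occurrence of $y$ follows some occurrence of $x$. It is not the RuleGrowth algorithm used in the experiments; the function name and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import product


def mine_single_step_rules(sequences, minsup, minconf):
    """Return {(x, y): (support, confidence)} for rules x -> y meeting both thresholds."""
    n_seq = len(sequences)
    sup_x = Counter()   # seqsup(X): number of sequences in which cell x occurs
    sup_xy = Counter()  # seqsup(X u Y): number of sequences in which y occurs after x
    for seq in sequences:
        first, last = {}, {}
        for pos, cell in enumerate(seq):
            first.setdefault(cell, pos)
            last[cell] = pos
        sup_x.update(set(seq))  # count each distinct cell once per sequence
        # y occurs after x iff the first occurrence of x precedes the last of y.
        sup_xy.update({(x, y) for x, y in product(first, last)
                       if x != y and first[x] < last[y]})
    rules = {}
    for (x, y), count in sup_xy.items():
        support, confidence = count / n_seq, count / sup_x[x]
        if support >= minsup and confidence >= minconf:
            rules[(x, y)] = (support, confidence)
    return rules


toy = [["A", "C", "F"], ["A", "D", "F"], ["B", "C", "F"], ["A", "C"]]
rules = mine_single_step_rules(toy, minsup=0.25, minconf=0.3)
print(rules[("A", "C")])  # support 2/4, confidence 2/3
```

Note that, under this definition, a rule such as A → F is also counted even though F never follows A directly; the rule only requires that F occurs later in the same sequence.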

In this paper, we focus on the spatial-temporal properties of anonymity datasets. Sequential rules mined from large-scale historical anonymity datasets generated by LBS continuous queries can be used to predict the locations of LBS users. In particular, a sequential rule of the form *A* → *B* with confidence *seqconf*(*A* → *B*) may indicate that, if an LBS user issued a continuous query and presented an anonymous request in grid cell *A*, then with confidence *seqconf*(*A* → *B*) (s)he will subsequently present an anonymous request in grid cell *B*. That is, sequential rules mined from anonymity datasets can reflect the movement regularities of LBS users among a series of grid cells. Table 1 presents a sample of sequential rules mined from anonymity datasets generated by LBS continuous queries.

### Constructing n-step transition matrices by normalizing the confidence values of sequential rules

From a statistical standpoint, a mobility model for an LBS requester can be viewed as a stationary stochastic process [17]. The movement of an LBS user among a series of grid cells can therefore be regarded as a discrete Markov process $\{X_n, n \in T\}$, where $T$ is a discrete time set (e.g., $T = \{1, 2, \ldots\}$), the random variable $X$ represents the location of an LBS user who issues an anonymous continuous query, and $X_n$ is the value of $X$ at time $n$ (here, the grid cell where the LBS user is located at time $n$). We refer to $X_n$ as a state and call $I = \{i_1, i_2, \ldots, i_m\}$, the set of all possible states of $X_n$, the state space of $X_n$. $I$ is obtained by collecting the distinct grid cells that appear in the left and right parts of the sequential rules mined from large-scale historical anonymity datasets generated by LBS continuous queries. For the sequential rules in Table 1, $I = \{A, B, C, D, E, F\}$.

For any $n \in T$ and any states $i_1, i_2, \ldots, i_{n+1} \in I$, the discrete Markov process $\{X_n, n \in T\}$ is called a first-order Markov chain if

$$P\{X_{n+1} = i_{n+1} \mid X_1 = i_1, X_2 = i_2, \ldots, X_n = i_n\} = P\{X_{n+1} = i_{n+1} \mid X_n = i_n\},$$

where the left-hand side is the conditional probability of $X_{n+1} = i_{n+1}$ given the whole history $X_1 = i_1, X_2 = i_2, \ldots, X_n = i_n$, and the right-hand side is the conditional probability of $X_{n+1} = i_{n+1}$ given only $X_n = i_n$. That is, a first-order Markov chain is memoryless: the next state $X_{n+1} = i_{n+1}$ depends only on the current state $X_n = i_n$, not on the sequence of states that preceded it. For sequential rules mined from anonymity datasets, memorylessness means that the next grid cell an LBS user arrives at depends only on the most recent grid cell.

The conditional probability $P\{X_{n+1} = j \mid X_n = i\}$, $i, j \in I$, is the one-step transition probability from state $i$ to state $j$, denoted by $p_{ij}$. Statistically, $p_{ij}$ corresponds to the confidence value of the sequential rule $i \rightarrow j$, and $p_{ij} = 0$ if no such rule was mined. Based on the one-step transition probabilities corresponding to all mined sequential rules, a transition matrix $P^{0} = (p_{ij})_{m \times m}$ can be constructed, where the dimension $m$ equals the number of states in the state space $I$. Continuing with the sequential rules in Table 1, the transition matrix generated in this way is denoted $P^{0}$.

However, the transition matrix must be normalized so that $\sum_{j \in I} p_{ij} = 1$ holds for every state $i$ that has at least one outgoing rule. For example, for $i = A$ the row sum must equal 1, whereas $\sum_{j \in I} p_{Aj} = p_{AA} + p_{AB} + p_{AC} + p_{AD} + p_{AE} + p_{AF} = 0 + 0.2 + 0.5 + 0.2 + 0 + 0 = 0.9$.

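The normalization formula for $p_{ij}$ is $\hat{p}_{ij} = p_{ij} / \sum_{k \in I} p_{ik}$, $i, j \in I$. For row $A$, for example, this gives $\hat{p}_{AB} = 0.2/0.9 \approx 0.2222$, $\hat{p}_{AC} = 0.5/0.9 \approx 0.5556$ and $\hat{p}_{AD} = 0.2/0.9 \approx 0.2222$. We refer to a transition matrix with normalized transition probabilities as a one-step transition matrix, and denote by $P^{(1)}$ the matrix obtained by normalizing $P^{0}$.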
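A minimal NumPy sketch of building $P^{0}$ from rule confidences and normalizing it row by row. It assumes that states are labeled by strings and that a state with no outgoing rules keeps an all-zero row; the function name is hypothetical. Row A uses the confidences quoted above (0.2, 0.5 and 0.2), and the other rows are left empty for brevity.

```python
import numpy as np


def build_transition_matrices(confidences, states):
    """Build P0 from rule confidences and its row-normalized version P1.

    confidences: {(i, j): seqconf(i -> j)} for the mined single-step rules.
    states: ordered list of grid cells (the state space I).
    """
    index = {s: k for k, s in enumerate(states)}
    P0 = np.zeros((len(states), len(states)))
    for (i, j), conf in confidences.items():
        P0[index[i], index[j]] = conf
    row_sums = P0.sum(axis=1, keepdims=True)
    # Divide each row by its sum; rows with no outgoing rule stay all zero.
    P1 = np.divide(P0, row_sums, out=np.zeros_like(P0), where=row_sums > 0)
    return P0, P1


P0, P1 = build_transition_matrices(
    {("A", "B"): 0.2, ("A", "C"): 0.5, ("A", "D"): 0.2},
    ["A", "B", "C", "D", "E", "F"],
)
print(np.round(P1[0], 4))
```

Row A of `P1` becomes approximately 0.2222, 0.5556 and 0.2222 in columns B, C and D. Using `where=` avoids a division by zero for empty rows instead of producing `nan`.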

In addition, the normalized probabilities are time-invariant. One reason is that the confidence values of sequential rules mined from large-scale historical anonymity datasets generated by LBS continuous queries essentially reflect the routine behaviors of a large number of LBS users. Another is that sequential rules only reflect the temporal order of routine behaviors without specifying times; that is, LBS users are taken to follow a common route regardless of when they move. Hence, the Markov chain $\{X_n, n \in T\}$ corresponding to the movement of LBS users among grid cells can also be regarded as time-invariant (time-homogeneous), so $P^{(1)}$ can be considered independent of $n$. We can then obtain the n-step transition matrix $P^{(n)}$ from $P^{(1)}$ as $P^{(n)} = (P^{(1)})^{n}$. The maximum value of $n$ is determined by two conditions: $P^{(n)}$ is not a zero matrix, and $n$ is less than the length of the longest sequence in the LBS anonymity datasets. For the running example, raising $P^{(1)}$ to the second, third and fourth powers yields $P^{(2)}$, $P^{(3)}$ and $P^{(4)}$.
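The following NumPy sketch computes $P^{(n)} = (P^{(1)})^{n}$ and stops at the two conditions stated above. Because the matrices of the running example are not reproduced here, the one-step matrix below is an assumption: it was inferred from the path probabilities quoted later in this section (e.g., 0.5556 for A → C and 0.8571 for C → F), with F treated as having no outgoing rules. The function name and `max_len = 10` are hypothetical.

```python
import numpy as np

# Assumed one-step matrix P^(1) over states A..F, inferred from the path
# probabilities quoted in the text (not the paper's own matrix).
P1 = np.array([
    [0.0, 2 / 9, 5 / 9, 2 / 9, 0.0, 0.0],  # A
    [0.0, 0.0, 0.7, 0.0, 0.3, 0.0],        # B
    [0.0, 0.0, 0.0, 1 / 7, 0.0, 6 / 7],    # C
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],        # D
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],        # E
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],        # F: no outgoing rules
])


def n_step_matrices(P1, max_len):
    """Return [P^(1), P^(2), ...] with P^(n) = (P^(1))^n.

    Powers are collected while P^(n) is not a zero matrix and n < max_len,
    where max_len is the length of the longest anonymity sequence.
    """
    matrices = []
    Pn, n = P1.copy(), 1
    while n < max_len and np.any(Pn):
        matrices.append(Pn)
        Pn = Pn @ P1  # next power of the one-step matrix
        n += 1
    return matrices


mats = n_step_matrices(P1, max_len=10)
print(len(mats), round(mats[1][0, 5], 4))  # number of powers, P^(2)[A, F]
```

With this matrix the loop stops after $P^{(4)}$ because $P^{(5)}$ is a zero matrix, and $P^{(2)}[A, F]$ rounds to 0.6984, the 2-step value quoted for grid cell A in the rough prediction example. Repeated multiplication is used instead of `np.linalg.matrix_power` so that every intermediate power is kept.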

### Prediction for arriving at a target location based on n-step transition matrices

#### Rough prediction.

This prediction consists of three main phases:

First, we specify a grid cell as the target location. Continuing the examples of sequential rules in Table 1, we assume that the grid cell represented by F is the target location.

Second, from the n-step transition matrices, we directly derive the paths along which LBS users can arrive at the target location, together with their probabilities. Because these paths only include the target location and the grid cell where the LBS users are when they begin to move ("begin location" for short), we call them simple paths. Based on the transition matrices *P*^{(1)} to *P*^{(4)}, we obtain all simple paths in ascending order of the number of steps, as shown in Table 2.

Finally, by matching the grid cell where an LBS user currently is ("current location" for short) against all simple paths, we can predict the probability that the LBS user arrives at the target location. For the simple paths in Table 2, the location prediction results are shown in Table 3. For example, when an LBS user appears in grid cell *A*, three predictions (indicated by the shaded entries) can be made: after leaving grid cell *A*, the LBS user arrives at the target location *F* through 2-, 3- and 4-step transitions with probabilities 0.6984, 0.2794 and 0.0222, respectively. Likewise, location prediction can be performed when LBS users occupy grid cells *B*, *C*, *D* and *E*.
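A sketch of the rough prediction for one current location: it walks through $P^{(1)}, P^{(2)}, \ldots$ and reports every step count with a nonzero probability of reaching the target. The one-step matrix is an assumption inferred from the path probabilities quoted in this section, not the paper's own matrix, and the function name is hypothetical.

```python
import numpy as np

STATES = ["A", "B", "C", "D", "E", "F"]
# Assumed one-step matrix, inferred from the probabilities quoted in the text.
P1 = np.array([
    [0.0, 2 / 9, 5 / 9, 2 / 9, 0.0, 0.0],  # A
    [0.0, 0.0, 0.7, 0.0, 0.3, 0.0],        # B
    [0.0, 0.0, 0.0, 1 / 7, 0.0, 6 / 7],    # C
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],        # D
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],        # E
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],        # F
])


def rough_prediction(P1, states, current, target, max_steps):
    """Return [(n, probability)] for each simple path current -> target,
    i.e. every n <= max_steps with P^(n)[current, target] > 0."""
    c, t = states.index(current), states.index(target)
    predictions = []
    Pn = np.eye(len(states))
    for n in range(1, max_steps + 1):
        Pn = Pn @ P1  # Pn now holds P^(n)
        if Pn[c, t] > 0:
            predictions.append((n, float(Pn[c, t])))
    return predictions


print(rough_prediction(P1, STATES, "C", "F", 4))
```

For current location C this yields a 1-step probability of about 0.8571 and a 2-step probability of about 0.1429, the two values discussed for grid cell C; for A it yields entries for 2, 3 and 4 steps.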

As can be seen in Table 3, the transitions from the current location to the target location are either single-step or multistep. For single-step transitions, the paths that LBS users follow are explicit: after leaving the current location, an LBS user arrives directly at the target location. For example, after leaving grid cell *C*, an LBS user arrives directly at the target location *F* with probability 0.8571.

However, a multistep simple path includes one or more intermediate locations that are unknown, so the detailed path between the current location and the target location cannot be determined. For example, LBS users currently in grid cell *C* arrive at the target location *F* with probability 0.1429 through a 2-step transition. This simple path certainly includes one intermediate location, but we cannot tell which one. If there are several candidates for the intermediate location, the simple path actually comprises several detailed paths, and the probability 0.1429 is the sum of the probabilities of those detailed paths. In many practical applications, knowing these detailed paths is important for predicting the future movements of LBS users [18].

#### Accurate prediction.

We propose a method of calculating the probabilities of detailed paths to make accurate location predictions. The principle is to iteratively calculate the probabilities of detailed paths with (*S*+1) steps by combining the probabilities of detailed paths with *S* steps and the probabilities of simple paths with (*S*+1) steps.

The pseudo code for calculating transition probabilities for detailed paths is given below.

**Algorithm** 1: *R*: *CalcuDetailPath*(*L*,*TL*)

**Input:** *L*, a linked list of 1- to n-step transition matrices; *TL*, a target location.

**Output:** *R*, a linked list of detailed paths with one to n transition steps.

1. **{** *P*^{(1)} = *L*.*Get*(1);

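2. *R*_{Detail_in} ← the simple paths with one transition step from *P*^{(1)} that end at *TL*;

3. *R* ← an empty linked list; *R*.*add*(*R*_{Detail_in});

4. *S* = 1;

5. *CalcuDetailPathI*(*L*,*TL*,*R*_{Detail_in},*S*,*R*);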

6. Return *R*;

7. }

**Algorithm** 2: *CalcuDetailPathI*(*L*,*TL*,*R*_{Detail_in},*S*,ref *R*)

**Input:** *L*, a linked list of 1-step to n-step transition matrices; *TL*, a target location; *R*_{Detail_in}, a linked list of detailed paths with *S* transition steps; *S*, the number of transition steps in the current iteration; *R*, a parameter passed by reference, which represents a linked list of detailed paths with one to *S* transition steps.

**Output:** null.

1. { *P*^{(S+1)} = *L*.*Get*(*S* + 1);

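2. Obtain from *P*^{(S+1)} the simple paths with (*S*+1) transition steps that end at *TL*; create an empty list for the detailed paths with (*S*+1) steps;

3. For each simple path obtained in line 2

4. { *E*_{S+1} = the start location of the simple path;

5. For (*j* = 1; *j* ≤ *R*_{Detail_in}.*count*; *j*++)

6. { *prob*_{1} = *R*_{Detail_in}.*Get*(*j*).*probValue*;

7. *E*_{S} = *R*_{Detail_in}.*Get*(*j*).*FirstState*;

8. If (*P*^{(1)}.*Exist*(*E*_{S+1},*E*_{S}))

9. { *prob*_{2} = *P*^{(1)}.*ProbValue*(*E*_{S+1},*E*_{S});

10. Calculate the probability of the new detailed path as *prob*_{1} × *prob*_{2};

11. Prepend *E*_{S+1} to the *j*-th detailed path in *R*_{Detail_in} to form a detailed path with (*S*+1) steps;

12. Add the new detailed path and its probability to the list of detailed paths with (*S*+1) steps;

13. }//end if

14. }//end for

15. }//end for

16. Assign the list of detailed paths with (*S*+1) steps to *R*_{Detail_in};

17. *R*.*add*(*R*_{Detail_in});

18. *S*++;

19. If (*S* + 1 ≤ *L*.*count*)

20. *CalcuDetailPathI*(*L*,*TL*,*R*_{Detail_in},*S*,*R*);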

21. }

**Algorithm 1** is the main procedure. Lines 1~4 are the initialization, where the simple paths with one transition step are obtained from *P*^{(1)}; line 5 calls the sub-procedure **Algorithm 2** to obtain a linked list of detailed paths with one to n transition steps; line 6 returns the final result *R*.

**Algorithm 2** performs the recursive operations. Line 1 takes the (*S*+1)-step transition matrix *P*^{(S+1)} from the linked list of transition matrices *L*; line 2 obtains the simple paths with (*S*+1) transition steps from *P*^{(S+1)}; lines 3~15 combine them with the passed parameter *R*_{Detail_in} to obtain the detailed paths with (*S*+1) steps, where line 8 checks whether the state pair (*E*_{S+1},*E*_{S}) exists in *P*^{(1)} and line 10 calculates the probability of each new detailed path; line 16 assigns the new detailed paths to *R*_{Detail_in} and line 17 appends them to *R*; lines 18~19 increment *S* and check whether (*S*+2), in terms of the value of *S* on entry, exceeds the number of matrices in *L*; line 20 passes *R*_{Detail_in} to the next recursive call of *CalcuDetailPathI*.
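The following Python sketch is an iterative (rather than recursive) reading of Algorithms 1 and 2, written under stated assumptions: transition matrices are NumPy arrays indexed by the order of a `states` list, a detailed path is a tuple of grid-cell labels, and the one-step matrix is inferred from the probabilities quoted in the flowchart steps rather than taken from the paper. Function and variable names are hypothetical.

```python
import numpy as np

STATES = ["A", "B", "C", "D", "E", "F"]
# Assumed one-step matrix, inferred from the probabilities quoted in the text.
P1 = np.array([
    [0.0, 2 / 9, 5 / 9, 2 / 9, 0.0, 0.0],  # A
    [0.0, 0.0, 0.7, 0.0, 0.3, 0.0],        # B
    [0.0, 0.0, 0.0, 1 / 7, 0.0, 6 / 7],    # C
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],        # D
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],        # E
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],        # F
])


def calc_detail_paths(P1, states, target, max_steps):
    """Return a list whose (S-1)-th entry maps every detailed path with S steps
    (a tuple of cells ending at target) to its probability."""
    idx = {s: k for k, s in enumerate(states)}
    t = idx[target]
    # Initialization (Algorithm 1): one-step detailed paths = simple paths in P^(1).
    detail = {(s, target): float(P1[k, t]) for s, k in idx.items() if P1[k, t] > 0}
    result = [detail]
    P_next = P1
    for _ in range(1, max_steps):  # one pass per call of Algorithm 2
        P_next = P_next @ P1       # P^(S+1)
        starts = [s for s, k in idx.items() if P_next[k, t] > 0]  # simple paths
        new_detail = {}
        for e_next in starts:      # E_{S+1}: start location of a simple path
            for path, prob1 in detail.items():
                prob2 = P1[idx[e_next], idx[path[0]]]  # location pair (E_{S+1}, E_S)
                if prob2 > 0:
                    new_detail[(e_next,) + path] = prob1 * float(prob2)
        if not new_detail:
            break
        result.append(new_detail)
        detail = new_detail
    return result


for steps, paths in enumerate(calc_detail_paths(P1, STATES, "F", 4), start=1):
    print(steps, {" -> ".join(p): round(v, 4) for p, v in paths.items()})
```

With this matrix, the 2-step entries include [A → C → F] ≈ 0.4762 and [C → D → F] ≈ 0.1429, and the only 4-step entry is [A → B → C → D → F]. An explicit loop replaces the recursion because Python does not optimize tail calls.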

Next, we present the flowchart of the two algorithms and of accurate location prediction based on the obtained detailed paths. The flowchart is depicted in Fig 3, and its processes are described below.

(1) As in the rough prediction, we first specify a grid cell as the target location. We again assume that grid cell *F* is the target location.

(2) From the (*S*+1)-step transition matrix, we directly obtain all simple paths and the probabilities of arriving at *F* after (*S*+1) steps. Here, we obtain all simple paths with two steps from *P*^{(2)}: [*A* → *F*], [*B* → *F*] and [*C* → *F*].

(3) The detailed paths with *S* steps are taken from the parameter passed by the previous iteration or, in the first iteration, from the initialization. Here, the detailed paths with one step are initialized as the simple paths with one step obtained from *P*^{(1)}: [*C* → *F*], [*D* → *F*] and [*E* → *F*].

(4) Pair the start locations of the simple paths with the start locations of the detailed paths to obtain location pairs: each combination of a start location of a simple path with (*S*+1) steps and a start location of a detailed path with *S* steps yields one location pair. For example, the detailed path [*C* → *F*] and the simple path [*A* → *F*] yield the location pair [*A* → *C*]. Likewise, the detailed path [*C* → *F*] yields the location pairs [*B* → *C*] and [*C* → *C*]. Furthermore, the detailed paths [*D* → *F*] and [*E* → *F*] yield the location pairs [*A* → *D*], [*B* → *D*], [*C* → *D*], [*A* → *E*], [*B* → *E*] and [*C* → *E*].

(5) Check which location pairs exist in *P*^{(1)}, that is, from which start locations of the simple paths the head of a detailed path can be reached in one step; prepending such a start location to the detailed path yields a new detailed path with (*S*+1) transition steps. Here, we find the location pairs [*A* → *C*], [*A* → *D*], [*B* → *C*], [*B* → *E*] and [*C* → *D*], and obtain the new detailed paths [*A* → *C* → *F*], [*A* → *D* → *F*], [*B* → *C* → *F*], [*B* → *E* → *F*] and [*C* → *D* → *F*].

(6) Multiply the probabilities of the location pairs by the probabilities of the detailed paths with *S* steps to obtain the probabilities of the detailed paths with (*S*+1) steps. Here, multiplying the probability 0.8571 of the detailed path [*C* → *F*] by the probability 0.5556 of the location pair [*A* → *C*] gives the probability 0.4762 for the detailed path [*A* → *C* → *F*]. Similarly, we obtain the probabilities 0.2222, 0.59997, 0.3 and 0.1429 for the detailed paths [*A* → *D* → *F*], [*B* → *C* → *F*], [*B* → *E* → *F*] and [*C* → *D* → *F*], respectively.

(7) Iterate to find the detailed paths and their probabilities for (*S* + 2) transition steps, and so on, until the maximum number of transition steps is reached. Here, we first obtain all simple paths with three steps, [*A* → *F*] and [*B* → *F*], from *P*^{(3)}, then combine them with the detailed paths with two steps, [*A* → *C* → *F*], [*A* → *D* → *F*], [*B* → *C* → *F*], [*B* → *E* → *F*] and [*C* → *D* → *F*], in steps (2)~(6), and obtain the detailed paths with three steps [*A* → *B* → *C* → *F*], [*A* → *B* → *E* → *F*], [*A* → *C* → *D* → *F*] and [*B* → *C* → *D* → *F*].

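Furthermore, combining the simple paths with four steps obtained from *P*^{(4)} with the detailed paths with three steps yields the detailed path [*A* → *B* → *C* → *D* → *F*]. Finally, we obtain all detailed paths and probabilities of arriving at *F*, as shown in Table 4.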

(8) From the detailed paths and their probabilities, we can make accurate location predictions for arriving at the target location. The accurate location prediction results are shown in Table 5. After leaving grid cell *B*, an LBS user will arrive at the target location *F* with probability 0.59997 along the detailed path [*B* → *C* → *F*], with probability 0.3 along the detailed path [*B* → *E* → *F*], and with probability 0.10003 along the detailed path [*B* → *C* → *D* → *F*] (indicated by the shaded entries). Likewise, accurate location prediction can be performed when LBS users start from grid cells *A*, *C*, *D* and *E*.
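Step (8) amounts to filtering the detailed paths by their start location and ranking them by probability. A minimal sketch, assuming the detailed paths are stored as a dictionary from tuples of grid cells to probabilities; the four entries are values quoted in the steps above, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

Path = Tuple[str, ...]


def accurate_prediction(detailed: Dict[Path, float], current: str) -> List[Tuple[Path, float]]:
    """Return the detailed paths that start at `current`, most probable first."""
    matches = [(path, p) for path, p in detailed.items() if path[0] == current]
    return sorted(matches, key=lambda item: item[1], reverse=True)


detailed_paths = {
    ("B", "C", "F"): 0.59997,
    ("B", "E", "F"): 0.3,
    ("B", "C", "D", "F"): 0.10003,
    ("C", "D", "F"): 0.1429,
}

for path, p in accurate_prediction(detailed_paths, "B"):
    print(" -> ".join(path), p)
```

For grid cell B the three detailed paths are listed in the order given in the text, and their probabilities sum to 1, which is a quick sanity check that every route from B to F has been recovered.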

## Experiments and Discussion

### Data preparation

#### Simulated anonymity datasets for LBS continuous queries.

Because spatial-temporal k-anonymity and its variants have not been widely applied in commercial LBS systems, we adopt a software system developed in the literature to simulate large-scale anonymity datasets for LBS continuous queries from GPS trajectories. Table 6 summarizes the basic characteristics of the simulated datasets.

#### Sequential rules & n-step transition matrix.

We adopt the RuleGrowth algorithm in SPMF [19] to mine sequential rules from the simulated anonymity datasets. The parameters *seq*sup_{min} and *seqconf*_{min} are set to 0.02 and 0.24, respectively. The 18 mined sequential rules are given in Table 7.

By normalizing the confidence values of the 18 mined sequential rules, we obtain the one-step transition matrix *P*_{simu}^{(1)}, and further calculate the 2-step transition matrix *P*_{simu}^{(2)}.

### Results and Discussion

#### Experiment 1.

We specify the grid cell *I* as the target location, and derive simple paths for arriving at *I* from *P*_{simu}^{(1)} and *P*_{simu}^{(2)} directly. Based on the simple paths, we make the rough location predictions shown in Table 8. Furthermore, we obtain detailed paths using the algorithm *CalcuDetailPath*, and make accurate location predictions based on the detailed paths. The results are shown in Table 9 and are mapped onto geographic background datasets in Fig 4.

#### Experiment 2.

This experiment aims to verify the correctness of the proposed location prediction methods. As discussed above, the accurate prediction method is essentially a refined version of the rough prediction method. Thus, we only evaluate the correctness of the accurate prediction method, using the *realprecision* measure. This measure is calculated directly as the number of correct predictions divided by the total number of predictions. For example, for the detailed path [*A* → *E* → *I*], the *realprecision* value equals the conditional probability *P*(*A* → *E* → *I*|*A*). The results of this experiment are shown in Fig 5. For the detailed paths [*H* → *E* → *I*], [*F* → *E* → *I*] and [*D* → *E* → *I*], the *realprecision* values and the prediction probabilities are similar, whereas for the detailed paths [*A* → *E* → *I*], [*B* → *E* → *I*] and [*C* → *E* → *I*], the *realprecision* values are much lower than the prediction probabilities.
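The text does not specify how correct predictions are counted, so the sketch below makes an explicit assumption: every visit to the path's start cell counts as one prediction, and the prediction is correct when that visit is immediately followed by the remaining cells of the path in the same sequence. The function name and the toy sequences are hypothetical.

```python
def real_precision(sequences, path):
    """Share of visits to path[0] that are immediately followed by the rest of path.

    Assumption (not specified in the text): each visit to the start cell is one
    prediction, and it is correct when the next cells match the path exactly.
    """
    path = tuple(path)
    predictions = correct = 0
    for seq in sequences:
        for pos, cell in enumerate(seq):
            if cell == path[0]:
                predictions += 1
                if tuple(seq[pos:pos + len(path)]) == path:
                    correct += 1
    return correct / predictions if predictions else 0.0


toy_sequences = [["A", "E", "I"], ["A", "B", "I"], ["C", "A", "E", "I"], ["A", "E"]]
print(real_precision(toy_sequences, ["A", "E", "I"]))  # prints 0.5
```

Under this counting rule the measure is an empirical estimate of *P*(*A* → *E* → *I*|*A*); a looser rule that accepts gaps between cells would generally give higher values.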

Next, we analyze the causes of this problem. According to the algorithm *CalcuDetailPath*, the prediction probabilities for the detailed paths [*A* → *E* → *I*], [*B* → *E* → *I*] and [*C* → *E* → *I*] are the products of the probability of the detailed path [*E* → *I*] and the probabilities of the location pairs [*A* → *E*], [*B* → *E*] and [*C* → *E*] in *P*_{simu}^{(1)}.

Furthermore, Table 7 and *P*_{simu}^{(1)} show significant differences between the confidence values and the normalized values of the sequential rules [*A* → *E*], [*B* → *E*], [*C* → *E*] and [*E* → *I*]: the confidence values are 0.2619, 0.2703, 0.2500 and 0.3182, respectively, but the normalized values are all 1. This happens because each of these rules is the only mined rule with its start location, so normalization divides its confidence by itself. We conjecture that the confidence threshold *seqconf*_{min} used for sequential rule mining is too large to allow the discovery of enough sequential rules. Hence, we hypothesize that an overly large confidence threshold may cause the significant differences between the *realprecision* values and the prediction probabilities for the detailed paths [*A* → *E* → *I*], [*B* → *E* → *I*] and [*C* → *E* → *I*]. We test this hypothesis in **Experiment 3**.
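The mechanism behind this conjecture can be seen in a few lines. In the sketch below, the confidence 0.2619 of [A → E] is taken from the text, while the second rule [A → Y] and its confidence 0.21 are made up for illustration only; the function name is hypothetical.

```python
def normalized_row(confidences, min_conf):
    """Keep the rules whose confidence reaches min_conf, then rescale them
    so that the outgoing probabilities of this start cell sum to 1."""
    kept = {dst: c for dst, c in confidences.items() if c >= min_conf}
    total = sum(kept.values())
    return {dst: c / total for dst, c in kept.items()} if total > 0 else {}


rules_from_A = {"E": 0.2619, "Y": 0.21}  # "Y" and 0.21 are hypothetical

print(normalized_row(rules_from_A, 0.24))  # only A -> E survives; it becomes 1.0
print(normalized_row(rules_from_A, 0.20))  # both survive; A -> E drops to about 0.555
```

With the higher threshold, the surviving rule is inflated from 0.2619 to 1.0; lowering the threshold lets a competing rule share the probability mass, which is the effect examined in Experiment 3.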

#### Experiment 3.

First, we mine sequential rules with the two lower confidence thresholds 0.2 and 0.22, obtaining 104 and 56 sequential rules, respectively; the rules whose start locations are *A*, *B*, *C* and *E* are shown in Table 10.

Next, we normalize the confidence values (in Table 10) of the sequential rules [*A* → *E*], [*B* → *E*], [*C* → *E*] and [*E* → *I*]. The normalized values are shown in Table 11, and the differences between the confidence values and the normalized values are compared in Fig 6. The differences decrease as the confidence threshold decreases, except for the sequential rule [*B* → *E*].
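The trend in Fig 6 can be illustrated qualitatively with a small threshold sweep. Under the same hypothetical row-normalization assumption (confidence divided by the summed confidence of all kept rules with the same start location), lowering *seqconf*_{min} admits more rules per start location, so the normalized value of [*A* → *E*] falls toward its confidence. Only the confidence 0.2619 is from the text; the other rules from *A* and their confidences are invented, so the printed numbers do not reproduce Table 11.

```python
from typing import List, Tuple

Rule = Tuple[str, str, float]  # (start location, end location, confidence)


def normalized_value(rules: List[Rule], start: str, end: str, seqconf_min: float) -> float:
    """Row-normalized value of rule [start -> end] at a given threshold.

    Assumption: confidence divided by the summed confidence of all rules with
    the same start location whose confidence is at least seqconf_min.
    Returns 0.0 if the rule itself does not pass the threshold.
    """
    kept = {e: c for s, e, c in rules if s == start and c >= seqconf_min}
    if end not in kept:
        return 0.0
    return kept[end] / sum(kept.values())


rules_from_a = [
    ("A", "E", 0.2619),  # confidence from the text
    ("A", "B", 0.2300),  # hypothetical
    ("A", "D", 0.2050),  # hypothetical
]

for threshold in (0.24, 0.22, 0.20):
    norm = normalized_value(rules_from_a, "A", "E", threshold)
    print(f"seqconf_min={threshold:.2f}  normalized={norm:.4f}  difference={norm - 0.2619:.4f}")
```

In this toy setting the difference shrinks at each lower threshold because every newly admitted rule enlarges the denominator.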

Finally, by constructing the n-step transition matrices and applying the algorithm *CalcuDetailPath*, we obtain the location prediction probabilities for the detailed paths [*A* → *E* → *I*], [*B* → *E* → *I*] and [*C* → *E* → *I*]. The proximity between the location prediction probabilities and the *realprecision* values under the different thresholds is compared in Fig 7.
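As summarized in the Abstract, the n-step transition matrices are obtained by raising the normalized one-step transition matrix to the power n. A minimal pure-Python sketch follows; the helper names `mat_mul` and `n_step_matrix` and the three-location matrix are hypothetical and chosen only for illustration.

```python
from typing import List

Matrix = List[List[float]]


def mat_mul(x: Matrix, y: Matrix) -> Matrix:
    """Multiply two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def n_step_matrix(p: Matrix, n: int) -> Matrix:
    """Return p**n for a row-stochastic matrix p and n >= 1 (repeated multiplication)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    result = p
    for _ in range(n - 1):
        result = mat_mul(result, p)
    return result


# Hypothetical normalized one-step matrix over locations A, E, I (rows sum to 1).
locations = ["A", "E", "I"]
p1 = [
    [0.0, 1.0, 0.0],  # A -> E
    [0.2, 0.3, 0.5],  # E -> A, E, I
    [0.0, 0.0, 1.0],  # I is absorbing in this toy example
]

p2 = n_step_matrix(p1, 2)
a, i = locations.index("A"), locations.index("I")
print(p2[a][i])  # probability of being at I two steps after A: 0.5
```

Repeated multiplication keeps the sketch short; for large n, exponentiation by squaring would need only O(log n) matrix products.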

We see that, as the confidence threshold decreases, the location prediction probabilities of all six detailed paths move closer to their corresponding *realprecision* values.

This experiment confirms the hypothesis from **Experiment 2**. Thus, we conclude that the proximity between the location prediction probabilities and the *realprecision* values for detailed paths can be adjusted flexibly by setting different confidence thresholds for mining sequential rules. That is, when users find that the accuracy of the accurate prediction method does not meet their requirements, they can obtain higher prediction accuracy by decreasing the confidence threshold for mining the sequential rules used to construct the transition probability matrices.

## Conclusion and future work

Because of its ease of implementation, spatial-temporal k-anonymity has become a mainstream approach for protecting the privacy of LBS users, and analyzing large-scale anonymity datasets can benefit some LBS applications. In this paper, we proposed two location prediction methods that estimate the probabilities of arriving at specified locations, based on transition probability matrices constructed from sequential rules mined from spatial-temporal k-anonymity datasets. Extensive experiments verified the correctness and flexibility of the proposed methods.

However, technologies are intent-neutral: they harbor neither benevolent nor malevolent intent toward the individuals who use them. Thus, our proposed location prediction methods may also pose substantial privacy threats. For example, target locations regarded as privacy-sensitive regions, such as military zones and red-light districts, may be exposed to more serious attacks, because existing spatial-temporal k-anonymity methods and their variants mainly protect the current and historical private information of LBS users rather than their future information [20]. Hence, in future work, we will study the capabilities and limitations of such attack methods to lay the foundations for optimizing the performance of spatial-temporal k-anonymity methods and their variants, thereby helping data miners and domain experts ensure that privacy-sensitive knowledge is released or accessible only to trusted parties.

## Supporting Information

### S1 Text. Seqsup0.02_seqconf0.2.

Sequential rules mined with the parameters *seqsup*_{min} and *seqconf*_{min} set to 0.02 and 0.20, respectively, which are used in Experiment 3.

https://doi.org/10.1371/journal.pone.0160629.s001

(TXT)

### S2 Text. Seqsup0.02_seqconf0.2_1step.

Accurate one-step location predictions for arriving at the target location *I*, which are used in Experiment 3.

https://doi.org/10.1371/journal.pone.0160629.s002

(TXT)

### S3 Text. Seqsup0.02_seqconf0.2_2step.

Accurate two-step location predictions for arriving at the target location *I*, which are used in Experiment 3.

https://doi.org/10.1371/journal.pone.0160629.s003

(TXT)

### S4 Text. Seqsup0.02_seqconf0.2_3step.

Accurate three-step location predictions for arriving at the target location *I*, which are used in Experiment 3.

https://doi.org/10.1371/journal.pone.0160629.s004

(TXT)

### S5 Text. Seqsup0.02_seqconf0.22.

Sequential rules mined with the parameters *seqsup*_{min} and *seqconf*_{min} set to 0.02 and 0.22, respectively, which are used in Experiment 3.

https://doi.org/10.1371/journal.pone.0160629.s005

(TXT)

### S6 Text. Seqsup0.02_seqconf0.22_1step.

Accurate one-step location predictions for arriving at the target location *I*, which are used in Experiment 3.

https://doi.org/10.1371/journal.pone.0160629.s006

(TXT)

### S7 Text. Seqsup0.02_seqconf0.22_2step.

Accurate two-step location predictions for arriving at the target location *I*, which are used in Experiment 3.

https://doi.org/10.1371/journal.pone.0160629.s007

(TXT)

### S8 Text. Seqsup0.02_seqconf0.22_3step.

Accurate three-step location predictions for arriving at the target location *I*, which are used in Experiment 3.

https://doi.org/10.1371/journal.pone.0160629.s008

(TXT)

### S9 Text. Seqsup0.02_seqconf0.24.

Sequential rules mined with the parameters *seqsup*_{min} and *seqconf*_{min} set to 0.02 and 0.24, respectively, which are used in Experiment 1.

https://doi.org/10.1371/journal.pone.0160629.s009

(TXT)

### S10 Text. Seqsup0.02_seqconf0.24_1step.

Accurate one-step location predictions for arriving at the target location *I*, which are used in Experiments 1 and 2.

https://doi.org/10.1371/journal.pone.0160629.s010

(TXT)

### S11 Text. Seqsup0.02_seqconf0.24_2step.

Accurate two-step location predictions for arriving at the target location *I*, which are used in Experiments 1 and 2.

https://doi.org/10.1371/journal.pone.0160629.s011

(TXT)

### S12 Text. Test datasets.

Test datasets used to evaluate the correctness of the accurate prediction method with the *realprecision* measure in Experiments 2 and 3.

https://doi.org/10.1371/journal.pone.0160629.s012

(TXT)

## Acknowledgments

This research is supported by grants from the National Natural Science Foundation of China (grant number 41201465) and the Natural Science Foundation of Jiangsu Province (grant number BK2012439). The authors wish to thank the anonymous reviewers for their valuable comments.

## Author Contributions

**Conceived and designed the experiments:** HTZ ZWC. **Performed the experiments:** ZWC. **Analyzed the data:** HTZ ZWC. **Contributed reagents/materials/analysis tools:** ZL. **Wrote the paper:** ZWC YHZ. **Reviewer before submission:** CXW.

## References

- 1. Schiller J, Voisard A. Location-Based Services. Elsevier Inc. 2004; 245–251.
- 2. Bettini C, Jajodia S, Samarati P, Wang XS. Privacy in Location-Based Applications: Research Issues and Emerging Trends. Springer-Verlag Berlin Heidelberg. 2009.
- 3. Pedreschi D, Bonchi F, Turini F, Verykios VS, Atzori M, Malin B, et al. Privacy protection and technologies, opportunities and threats. Mobility, Data Mining and Privacy. 2008; 101–119.
- 4. Jaeheung L, Seokhyun K, Yookun C, Yoojin C, Yongsu P. A Hierarchical Clustering-Based Spatial Cloaking Algorithm for Location-Based Services. Journal of Internet Technology. 2012; 13(4): 645–653.
- 5. Kido H, Yanagisawa Y, Satoh T. Protection of Location Privacy using Dummies for Location-based Services. Proceedings of the 21st International Conference on Data Engineering Workshops. 2005.
- 6. Hara T, Suzuki A, Iwata M, Arase Y, Xie X. Dummy-based User Location Anonymization under Real-World Constraints. IEEE Access. 2016; 1–1.
- 7. Um JH, Kim HD, Chang JW. An Advanced Cloaking Algorithm Using Hilbert Curves for Anonymous Location Based Service. Proceedings of the 2nd SocialCom. 2010; 1093–1098.
- 8. Shang N, Ghinita G, Zhou YB, Bertino E. Controlling data disclosure in computational PIR protocols. Proceedings of the 5th ASIACCS. 2010; 310–313.
- 9. Gruteser M, Grunwald D. Anonymous Usage of Location-Based Services through Spatial and Temporal Cloaking. Proceedings of MobiSys. 2003.
- 10. Ni WW, Gu MZ, Chen X. Location privacy-preserving k nearest neighbor query under user's preference. Knowledge-Based Systems. 2016; 103: 19–27.
- 11. Giannotti F, Pedreschi D. Mobility, Data Mining and Privacy: Geographic Knowledge Discovery. Springer-Verlag Berlin Heidelberg. 2008; 243–292.
- 12. Zhang HT, Xu L, Huang HH, Gao SS. Mining spatial association rules from LBS anonymity dataset for improving utilization. Proceedings of Geoinformatics. 2013; 1–6.
- 13. Chow CY, Mokbel MF. Enabling Private Continuous Queries for Revealed User Locations. Proceedings of SSTD. 2007; 258–275.
- 14. Pan X, Meng XF, Xu JL. Distortion-based anonymity for continuous queries in location-based mobile services. Proceedings of GIS. 2009; 256–265.
- 15. Davor D, Karolj S, Danijel B, Maja TP. Grid implementation of the weather research and forecasting model. Earth Science Informatics. 2010; 3(4): 199–208.
- 16. Fournier-Viger P, Faghihi U, Nkambou R, Mephu Nguifo E. CMRULES: An Efficient Algorithm for Mining Sequential Rules Common to Several Sequences. Knowledge-Based Systems. 2012; 25(1): 63–76.
- 17. Baum LE, Petrie T. Statistical Inference for Probabilistic Functions of Finite State Markov Chains. Annals of Mathematical Statistics. 1966; 37(6): 1554–1563.
- 18. Wanalertlak W, Lee B, Yu CS, Kim MC, Park SM, Kim WT. Behavior-based mobility prediction for seamless handoffs in mobile wireless networks. Wireless Networks. 2011; 17(3): 645–658.
- 19. Fournier-Viger P, Gomariz A, Gueniche T, Soltani A, Wu CW, Tseng VS. SPMF: a Java Open-Source Pattern Mining Library. Journal of Machine Learning Research. 2014; 15: 3569–3573.
- 20. Renso C, Spaccapietra S, Zimányi E. Mobility Data: Modeling, Management, and Understanding. Cambridge University Press. 2013; 174–193.