Location Prediction Based on Transition Probability Matrices Constructing from Sequential Rules for Spatial-Temporal K-Anonymity Dataset

Spatial-temporal k-anonymity has become a mainstream approach for protecting users' privacy in location-based services (LBS) applications, and has been applied to several variants such as LBS snapshot queries and continuous queries. Analyzing large-scale spatial-temporal anonymity sets may benefit several LBS applications. In this paper, we propose two location prediction methods based on transition probability matrices constructed from sequential rules for spatial-temporal k-anonymity datasets. First, we define single-step sequential rules mined from sequential spatial-temporal k-anonymity datasets generated from continuous LBS queries for multiple users. We then construct transition probability matrices from the mined single-step sequential rules, and normalize the transition probabilities in the transition matrices. Next, we regard a mobility model for an LBS requester as a stationary stochastic process and compute the n-step transition probability matrices by raising the normalized transition probability matrices to the power n. Furthermore, we propose two location prediction methods: rough prediction and accurate prediction. The former obtains the probabilities of arriving at target locations along simple paths that include only current locations, target locations and transition steps. By iteratively combining the probabilities for simple paths with n steps and the probabilities for detailed paths with n-1 steps, the latter method calculates transition probabilities for detailed paths with n steps from current locations to target locations. Finally, we conduct extensive experiments that verify the correctness and flexibility of the proposed algorithms.


Introduction
With the rapid development of mobile communication and the popularity of positioning devices (e.g., the Global Positioning System, GPS), LBS are widely used because of simplification in computing [1]. However, if used illegally, the deployment of LBS brings privacy problems (e.g., employers snooping on the whereabouts of staff, or stalkers attacking user trajectories to find out their religion, sexual orientation, etc.), which have attracted great attention from both academia and business circles [2] [3].
Early research on privacy protection for LBS users emphasized the establishment of laws and treaties. Because this approach lacks flexibility and has lagged behind attack technologies, several new technologies have been put forward, for instance hierarchical clustering [4], dummies [5] [6], spatial transformation based on the Hilbert curve [7], private information retrieval (PIR) protocols [8] and spatial-temporal k-anonymity [9]. Spatial-temporal k-anonymity has become a mainstream privacy protection method for LBS users due to its simplicity and its variety of applications.
Furthermore, the basic principle of cloaking a requestor's identification as well as accurate time and position information has inspired several variants on the original method [10].
As spatial and temporal properties are the most important elements of spatial-temporal k-anonymity datasets (hereafter referred to as anonymity datasets), anonymity datasets can be formatted into a number of sequences of generalized regions. Analyzing large-scale anonymity datasets recorded and stored by LBS providers (such as Google Maps, Foursquare, Baidu Maps, etc.) can yield a set of sequential rules reflecting LBS issuers' movement behaviors. Furthermore, the sequential rules can be utilized to predict the locations of future users, and to provide decision support functions for LBS applications, such as intelligent navigation systems, personalized service systems, and so on [11] [12]. Unfortunately, location prediction based simply on sequential rules does not perform well, as the prediction can only be single step, that is, the prediction only includes one source and one destination. A more practical location prediction method (such as a multistep method) is urgently needed in applications. To our knowledge, little literature has focused on this subject so far.
In this paper, based on sequential rules mined from large-scale anonymity datasets, we propose two location prediction methods. Simultaneously, privacy attack problems that may result from our proposed location prediction methods are also analyzed.
The rest of this paper is organized as follows. Preliminary work is described in Section 2. Two location prediction methods based on preprocessing sequential rules from anonymity datasets are presented in Section 3. Comprehensive experiments are conducted in Section 4, and the results are analyzed. Section 5 concludes the paper and discusses further work.

Preliminaries
In this section, the basic concepts of LBS queries and the primitives of LBS privacy are introduced. Examples of anonymity datasets adopted by a typical method of spatial-temporal k-anonymity are also presented.

LBS query
A location service can be defined as a service that integrates the location of an LBS user with other information to provide added value to the user. Applications are designed by adopting two modes: push and pull [13]. Furthermore, there are two types of pull services, namely snapshot queries such as "recommend 10 nearby restaurants based on my profile", and continuous queries such as "continually tell me the shopping mall nearest my location". For a snapshot query, an LBS user only needs to report their current location to the service provider once to obtain the desired information. On the other hand, for a continuous query, an LBS user has to continually report their location to the service provider in a periodic or on-demand manner to obtain the desired results [14]. Additionally, in a continuous query, a consistent user identity (or pseudo-identifier) is used until the query expires, that is, LBS providers can link requests issued by the same (anonymous) user at different times in chronological order to obtain a sequence of requests.

Primitives of LBS privacy
Privacy is an essential requirement for providing LBS, and can be grouped into two categories: identity and sensitive information [2]. The identity of each individual is unique and distinguishes that individual from a group of individuals (e.g., a security identifier or SID). Sensitive information consists of location and request content. Location privacy concerns the tracks of an individual or a group of people, which include coordinates, landmarks, etc. Semantic location privacy is an instance of privacy regarding sensitive semantic information, for example, hospitals, religious buildings, and so on. Request content privacy involves sensitive attribute information, such as disease, salary, religion, and so on. It is worth noting that identity privacy can be associated with sensitive information privacy to cause more severe privacy invasion.

Spatial-temporal k-anonymity
Spatial-temporal k-anonymity is a branch of the k-anonymity method, which is an obfuscation technique. Based on spatial-temporal k-anonymity, a query request submitted to LBS providers is constituted not only by the identity and location of the LBS user, but also by at least k pseudonyms of users, including the requestor and others nearby, and by a cloaking region enclosing the locations of the k (or more) LBS users. Thus, given a query request, an anonymity dataset is generated, consisting of at least k pseudonyms and a cloaking region. Consequently, identity privacy is protected by replacing the identities of requestors with pseudonyms, and location privacy is protected by replacing the accurate locations of query requestors with cloaking regions. Furthermore, as an anonymity dataset includes at least k pseudonyms and a cloaking region, the association between pseudonyms and the cloaking region can be prevented to a certain degree. Likewise, the association between pseudonyms and the content of the request can also be avoided, as any pseudonym within the anonymity dataset may have issued the query request.
Spatial-temporal k-anonymity and its optimized versions are widely used in LBS snapshot queries and continuous queries [2]. To better understand the follow-up analysis of anonymity datasets, we present an example workflow of generating an anonymity dataset adopted by the modified adaptive-interval cloaking algorithm [9].
First, we present the basic definition of an anonymity dataset for an LBS snapshot query, $SnAS = \langle UP, CR, TC \rangle$, where $UP = \langle U_1, U_2, \ldots, U_k \rangle$ represents a set of k user pseudonyms, $CR = \langle Cell_1, Cell_2, \ldots, Cell_m \rangle$ represents a cloaking region that includes m grid cells enclosing the locations of the k users, and $TC = \langle TI_1, TI_2, \ldots, TI_n \rangle$ represents temporal cloaking with n time intervals of equal duration. Moreover, the time intervals $\langle TI_1, TI_2, \ldots, TI_n \rangle$ provide very little temporal information, that is, SnAS is a temporally-ordered sequence without a specified time. Fig 1 presents an example of an anonymity dataset for an LBS snapshot query, where $SnAS = \langle \langle U_{11}, U_{12}, U_{13}, U_{14}, U_{15}, U_{16}, U_{17}, U_{18}, U_{19}, U_{26}, U_{27}, U_{28} \rangle, \langle Cell_{22}, Cell_{23}, Cell_{33} \rangle, \langle 1 \rangle \rangle$. We set k = 10, and for the sake of simplicity, we set the number of time intervals in the temporal cloaking to 1.
Based on the definition of anonymity datasets for a snapshot query, we define an anonymity dataset for an LBS continuous query as $CoAS = \langle SnAS_1, SnAS_2, \ldots, SnAS_s \rangle$, where $SnAS_i$ ($1 \le i \le s$) represents an anonymity dataset for a snapshot query. In this paper, we focus on anonymity datasets for LBS continuous queries. Fig 2 presents an example of an anonymity dataset for an LBS continuous query, where $CoAS = \langle SnAS_1, SnAS_2, SnAS_3, SnAS_4 \rangle$. Finally, as we deal only with the spatial-temporal properties of anonymity datasets, an anonymity dataset for an LBS continuous query can be denoted more briefly as
$$CoAS = \langle \langle \langle Cell_1, \ldots, Cell_{m_1} \rangle, \langle TI_1, TI_2, \ldots, TI_{n_1} \rangle \rangle, \langle \langle Cell_1, \ldots, Cell_{m_2} \rangle, \langle TI_1, TI_2, \ldots, TI_{n_2} \rangle \rangle, \ldots, \langle \langle Cell_1, \ldots, Cell_{m_s} \rangle, \langle TI_1, TI_2, \ldots, TI_{n_s} \rangle \rangle \rangle.$$
The anonymity dataset in Fig 2 can be written in this simplified notation in the same way.
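The definitions of SnAS and CoAS above can be sketched as a small data structure. This is only an illustration: the class name `SnAS`, its field names, and the helper `spatial_temporal_view` are hypothetical choices made here, not part of the cloaking algorithm in [9]. The example values are those given for Fig 1.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SnAS:
    """Anonymity dataset for one snapshot query: <UP, CR, TC>."""
    pseudonyms: Tuple[str, ...]  # UP: pseudonyms of at least k users
    cells: Tuple[str, ...]       # CR: grid cells forming the cloaking region
    intervals: Tuple[int, ...]   # TC: time intervals of equal duration

    def satisfies_k(self, k: int) -> bool:
        """True if the set contains at least k distinct pseudonyms."""
        return len(set(self.pseudonyms)) >= k


def spatial_temporal_view(coas: List[SnAS]):
    """Drop the pseudonyms and keep only (cells, intervals) per snapshot."""
    return [(s.cells, s.intervals) for s in coas]


# The snapshot example of Fig 1 (k = 10, one time interval).
fig1 = SnAS(
    pseudonyms=("U11", "U12", "U13", "U14", "U15", "U16",
                "U17", "U18", "U19", "U26", "U27", "U28"),
    cells=("Cell22", "Cell23", "Cell33"),
    intervals=(1,),
)

# A continuous query is a temporally ordered list of snapshots;
# only one snapshot is shown here.
coas = [fig1]
```

Keeping only the cells and time intervals mirrors the simplified notation, since the prediction methods use the spatial-temporal properties alone.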

Location prediction method
In this section, two location prediction methods are proposed. Both follow five phases: (1) mining sequential rules from anonymity datasets for LBS continuous queries; (2) constructing transition probability matrices from the mined sequential rules; (3) normalizing the transition probabilities in the transition probability matrices; (4) computing n-step transition probability matrices by raising the normalized transition probability matrices to the power n; (5) designing a rough location prediction method and an accurate location prediction method based on the n-step transition probability matrices.

Mining sequential rules from anonymity datasets for LBS continuous queries
Prediction is an important type of data mining technology, and discovering temporal relationships in sequences of discrete events stored in large databases can help with the prediction of events [15]. Sequential patterns in sequences of events can reflect temporal relationships even without a specified time between events, and mining sequential patterns has become a popular technique for prediction. Meanwhile, as a sequential pattern only indicates that a sequence of events appears frequently in a database, it is not sufficient for the prediction of events. Thus, the concept of a sequential rule, also called a prediction rule, was proposed in [16].
A sequential rule has the form $X \rightarrow Y$, where X and Y are two sets of events. $X \rightarrow Y$ is interpreted to mean "if events X appear, the events Y are likely to occur afterward with a given confidence value or probability". Events X and events Y occur in succession frequently within a single sequence. A sequential rule typically has two measures of significance: support and confidence. The support of a sequential rule is here defined as the number of sequences where the left part occurs before the right part, divided by the number of sequences; the confidence of a rule is the number of sequences where the left part occurs before the right part, divided by the number of sequences where the left part occurs. For example, for a sequential rule $X \rightarrow Y$, the support value and the confidence value of the sequential rule are respectively formulated as follows:
$$seqsup(X \rightarrow Y) = \frac{seqsup(X \cup Y)}{|D|}, \qquad seqconf(X \rightarrow Y) = \frac{seqsup(X \cup Y)}{seqsup(X)},$$
where |D| is the number of sequences in a sequence database D, $seqsup(X)$ is the number of sequences in D where X occurs, and $seqsup(X \cup Y)$ is the number of sequences in D where Y occurs after X. A rule is retained only if neither $seqsup(X \rightarrow Y)$ nor $seqconf(X \rightarrow Y)$ is less than the corresponding user-defined threshold, $seqsup_{min}$ or $seqconf_{min}$.
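As an illustration of the two measures, the following minimal sketch counts support and confidence over a toy sequence database. The database contents and the function names `rule_occurs` and `rule_measures` are assumptions made for this example. The occurrence test (all of X appears before all of Y within one sequence) is one common reading of "the left part occurs before the right part", not necessarily the exact test used by a particular mining tool.

```python
def rule_occurs(seq, x, y):
    """True if every event of x appears strictly before every event of y.

    seq is a list of event sets ordered in time; we look for a split point
    such that x is covered by the prefix and y by the suffix.
    """
    x, y = set(x), set(y)
    for k in range(1, len(seq)):
        before = set().union(*seq[:k])
        after = set().union(*seq[k:])
        if x <= before and y <= after:
            return True
    return False


def rule_measures(db, x, y):
    """Return (support, confidence) of the rule x -> y over database db."""
    n_rule = sum(rule_occurs(s, x, y) for s in db)      # seqsup(X u Y)
    n_left = sum(set(x) <= set().union(*s) for s in db)  # seqsup(X)
    support = n_rule / len(db)
    confidence = n_rule / n_left if n_left else 0.0
    return support, confidence


# Hypothetical database: each sequence lists the grid cells visited in order.
toy_db = [
    [{"A"}, {"C"}, {"F"}],
    [{"A"}, {"B"}],
    [{"C"}, {"A"}],
    [{"B"}, {"F"}],
]

support, confidence = rule_measures(toy_db, {"A"}, {"C"})
print(support, confidence)
```

In this toy database the rule A → C holds in one of four sequences, while A occurs in three of them, so the support is 1/4 and the confidence is 1/3. The third sequence does not count because C precedes A there.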
In this paper, we focus on the spatial-temporal properties of anonymity datasets. Sequential rules mined from large-scale historical anonymity datasets generated by LBS continuous queries can be used to make location predictions for LBS users. In particular, a sequential rule of the form $A \rightarrow B$ with the confidence $seqconf(A \rightarrow B)$ may indicate that, if an LBS user issued a continuous query and presented an anonymous request in grid cell A, then with the confidence $seqconf(A \rightarrow B)$ (s)he will continue to present an anonymous request in grid cell B. That is, sequential rules mined from anonymity datasets can reflect the movement regularity of LBS users among a series of grid cells. Table 1 presents a sample of sequential rules mined from anonymity datasets generated by LBS continuous queries.

Constructing n-step transition matrices by normalizing the confidence values of sequential rules
From a statistical standpoint, a mobility model for an LBS requester can be viewed as a stationary stochastic process [17]. Each movement of LBS users among a series of grid cells can be regarded as a discrete Markov process $\{X_n, n \in T\}$, where T is a discrete time set (e.g., T = {1, 2, ...}), the random variable X represents the location of an LBS user who requests an anonymous continuous query, and $X_n$ represents the value of the random variable X at time n (here, $X_n$ represents the grid cell where the LBS user is located at time n). We refer to $X_n$ as a state and call $I = \{i_1, i_2, i_3, \ldots, i_m\}$, the set of all possible states of $X_n$, the state space of $X_n$. I can be obtained by collecting the distinct grid cells appearing in the left and right parts of the sequential rules mined from large-scale historical anonymity datasets generated by LBS continuous queries. In the case of the collection of sequential rules in Table 1, $I = \{A, B, C, D, E, F\}$.
For any given $n \in T$ and $i_0, i_1, \ldots, i_{n+1} \in I$, the discrete Markov process $\{X_n, n \in T\}$ is called a 1-order Markov chain if the following formula holds:
$$P\{X_{n+1} = i_{n+1} \mid X_0 = i_0, X_1 = i_1, \ldots, X_n = i_n\} = P\{X_{n+1} = i_{n+1} \mid X_n = i_n\},$$
where the left-hand side is the conditional probability of $X_{n+1} = i_{n+1}$ given $X_0 = i_0, X_1 = i_1, \ldots, X_n = i_n$, and $P\{X_{n+1} = i_{n+1} \mid X_n = i_n\}$ is the conditional probability of $X_{n+1} = i_{n+1}$ given $X_n = i_n$. That is, a 1-order Markov chain $\{X_n, n \in T\}$ can be characterized as memoryless: the next state $X_{n+1} = i_{n+1}$ depends only on the current state $X_n = i_n$ but not on the sequence of events that preceded it. In the case of sequential rules mined from anonymity datasets, memorylessness means that the future grid cell at which an LBS user arrives is independent of all but the most recent grid cell.
The conditional probability $P\{X_{n+1} = j \mid X_n = i\}$, $i, j \in I$, can also be taken as the one-step transition probability from state i to state j, which is denoted by $p_{ij}$. Statistically, $p_{ij}$ is consistent with the confidence value of the sequential rule of the form $i \rightarrow j$. Based on all one-step transition probabilities corresponding to the sequential rules, an $m \times m$ transition matrix $P = (p_{ij})$ can be constructed, where the dimension m is equal to the number of states in the state space I. Continuing with the sequential rules in Table 1, the generated transition matrix is denoted by P.
Table 1. Sample of sequential rules mined from anonymity datasets generated by LBS continuous queries. (Columns: No., Rules, seqconf.)
However, the transition matrix must be normalized so that the condition
$$\sum_{j \in I} p_{ij} = 1 \quad \text{for all } i \in I$$
holds. For example, for i = A and $j \in \{A, B, C, D, E, F\}$, $\sum_{j} p_{Aj}$ must be equal to 1, whereas $\sum_{j} p_{Aj} = p_{AA} + p_{AB} + p_{AC} + p_{AD} + p_{AE} + p_{AF} = 0 + 0.2 + 0.5 + 0.2 + 0 + 0 = 0.9$.
The normalization formula of $p_{ij}$ is
$$p'_{ij} = \frac{p_{ij}}{\sum_{k \in I} p_{ik}}.$$
We refer to a transition matrix with normalized transition probabilities as a one-step transition matrix, and denote the normalized matrix $P' = (p'_{ij})$ by $P^{(1)}$.
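The row normalization can be illustrated with the confidence values of row A quoted above. This is a minimal sketch: the function name `normalize_rows` is hypothetical, and leaving an all-zero row unchanged (a state with no outgoing rule) is an assumption made here rather than something the text specifies.

```python
def normalize_rows(matrix):
    """Divide every row by its sum so that each non-empty row sums to 1."""
    normalized = []
    for row in matrix:
        total = sum(row)
        if total == 0:
            # Assumption: a state with no outgoing rule keeps an all-zero row.
            normalized.append(list(row))
        else:
            normalized.append([p / total for p in row])
    return normalized


# Row A of the example: p_AA .. p_AF, which sums to 0.9 before normalization.
row_a = [0.0, 0.2, 0.5, 0.2, 0.0, 0.0]
normalized_a = normalize_rows([row_a])[0]
print([round(p, 4) for p in normalized_a])
```

After normalization the three non-zero entries of row A become 0.2/0.9, 0.5/0.9 and 0.2/0.9, which sum to 1.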
In addition, the normalized probabilities are time-invariant. One reason for this is that the confidence values of the sequential rules mined from large-scale historical anonymity datasets generated by LBS continuous queries essentially reflect routine behaviors of a large number of LBS users. On the other hand, sequential rules only reflect temporally-ordered relationships between routine behaviors without specifying times. That is, LBS users follow a common route regardless of when they move. Hence, the Markov chain $\{X_n, n \in T\}$, which corresponds to the movement of LBS users among a series of grid cells, can also be characterized as time-invariant, and further $P^{(1)}$ can be considered to be independent of n. Furthermore, we can obtain the n-step transition matrix $P^{(n)}$ from $P^{(1)}$ using the formula $P^{(n)} = (P^{(1)})^n$. The maximum value of n is determined by the conditions that $P^{(n)}$ is not a zero matrix and that n is less than the length of the longest sequence in the LBS anonymity datasets. Here, by raising $P^{(1)}$ to the appropriate powers, we obtain $P^{(2)}$, $P^{(3)}$, and $P^{(4)}$.

Prediction for arriving at a target location based on n-step transition matrices
Rough prediction. This prediction consists of three main phases. First, we specify a grid cell as the target location. Continuing the examples of sequential rules in Table 1, we assume that the grid cell represented by F is the target location.
Second, from the n-step transition matrix, we directly derive paths along which LBS users can arrive at the target location with specified probabilities. As the paths only include the target location and the grid cell ("begin location" for short) where the LBS users are when they begin to move, we denote these paths as simple paths. Based on the transition matrices $P^{(1)}$ to $P^{(4)}$, we obtain all simple paths by ascending number of steps as shown in Table 2.
Finally, by matching the grid cell ("current location" for short) where an LBS user is currently with all simple paths, we can make a location prediction for LBS users' arriving at the target location. In the case of the simple paths in Table 2, the location prediction results are shown in Table 3. For example, when an LBS user appears in grid cell A, three predictions (indicated by the shaded entries) can be performed. In particular, after leaving the grid cell A, the LBS user has the three probability values 0.6984, 0.2749 and 0.0222 for arriving at the target location F through 2-, 3-, and 4-step transitions respectively. Likewise, location prediction can be performed when LBS users occupy grid cells B, C, D and E.
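The three phases of the rough prediction can be sketched as follows. The four-state matrix `P1` below is a hypothetical toy chain chosen for illustration, not the matrix derived from Table 1, and all function names are assumptions. The parameter `max_steps` stands in for the bound given by the length of the longest sequence.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def n_step_matrices(p1, max_steps):
    """Return [P^(1), P^(2), ...], stopping before the first zero matrix."""
    mats = [p1]
    for _ in range(max_steps - 1):
        nxt = mat_mul(mats[-1], p1)
        if all(v == 0 for row in nxt for v in row):
            break
        mats.append(nxt)
    return mats


def simple_paths(mats, states, target):
    """List (begin location, steps, target, probability) by ascending steps."""
    t = states.index(target)
    return [(s, n, target, m[i][t])
            for n, m in enumerate(mats, start=1)
            for i, s in enumerate(states) if m[i][t] > 0]


def rough_prediction(paths, current):
    """Match the current location against the begin locations."""
    return [(n, prob) for (s, n, _, prob) in paths if s == current]


# Hypothetical normalized one-step matrix over states A, B, C, F.
STATES = ["A", "B", "C", "F"]
P1 = [
    [0.0, 0.4, 0.6, 0.0],  # A -> B (0.4), A -> C (0.6)
    [0.0, 0.0, 0.0, 1.0],  # B -> F (1.0)
    [0.0, 0.5, 0.0, 0.5],  # C -> B (0.5), C -> F (0.5)
    [0.0, 0.0, 0.0, 0.0],  # F has no outgoing rule in this toy example
]

mats = n_step_matrices(P1, max_steps=4)
paths = simple_paths(mats, STATES, "F")
predictions = rough_prediction(paths, "A")
print(predictions)
```

For this toy chain a user in A reaches F with probability 0.7 in two steps and 0.3 in three steps. The fourth power of `P1` is a zero matrix, so only three matrices are kept.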
As can be seen in Table 3, the transitions from the current location to the target location can be classified as either single step or multistep. For single step transitions, the paths that LBS users follow are shown clearly. In particular, after leaving their current location, an LBS user arrives directly at the target location. For example, after leaving grid cell C, an LBS user arrives directly at the target location F with probability 0.8571. However, for multistep transitions, we find that the simple paths that LBS users follow include one or more intermediate locations, but these intermediate locations are unknown, so the detailed path between the current location and the target location cannot be investigated. For example, LBS users currently in grid cell C arrive at the target location F with probability 0.1429 through a 2-step transition. This simple path certainly includes one intermediate location, but we cannot know the intermediate location. If there are several options for the intermediate location, then the simple path actually contains several detailed paths, and the probability 0.1429 is the sum of the probabilities for those detailed paths. In many practical applications, it is significant to know these detailed paths to predict future movements of the LBS users [18].
Accurate prediction. We propose a method of calculating probabilities for detailed paths to make accurate location predictions. The principle of calculating transition probabilities for detailed paths is to iteratively calculate the probabilities for detailed paths with n steps by combining the probabilities for detailed paths with n−1 steps and the probabilities of simple paths with n steps.
The pseudo code for calculating transition probabilities for detailed paths is given below.
Algorithm 1: CalcuDetailPath
   ...
   R.add(R^(1)_Detai_in);
   ...
6. Return R;
7. }
Algorithm 2: CalcuDetailPathI(L, TL, R_Detai_in, S, ref R)
Input: L, a linked list of 1-step to n-step transition matrices; TL, a target location; R_Detai_in, a linked list of detailed paths with S transition steps; S, the transition steps in the current iteration; R, a parameter passed by reference, which represents a linked list of detailed paths with one to S transition steps.
Output: null.
1. { P^(S+1) = L.Get(S + 1);
2. R^(S+1)_simp_in = P^(S+1).GetArrive(TL);
3. ...
6. { prob_1 = R_Detai_in.Get(j).probValue;
7. ...
20. CalcuDetailPathI(L, TL, R_Detai_in, S, R);
21. }
Algorithm 1 is the main procedure. Lines 1~4 are the initialization, where the simple paths with one transition step are obtained from $P^{(1)}$; line 5 calls the sub-procedure Algorithm 2 to obtain a linked list of detailed paths with one to n transition steps; line 6 returns the final result R.
Algorithm 2 performs recursive operations. Line 1 takes the (S+1)-step transition matrix $P^{(S+1)}$ from the linked list of transition matrices L; line 2 obtains the simple paths $R^{(S+1)}_{simp\_in}$ with (S+1) transition steps from $P^{(S+1)}$; lines 3~15 combine $R^{(S+1)}_{simp\_in}$ with the passed parameter $R_{Detai\_in}$ to obtain the detailed paths with (S+1) steps; line 8 checks for the state pair $(E_{S+1}, E_S)$, and line 10 calculates the probabilities for all detailed paths in $R^{(S+1)}_{Detai\_in}$; lines 16~17 assign $R^{(S+1)}_{Detai\_in}$ to $R_{Detai\_in}$; lines 18~19 check whether (S+2) is greater than the number of steps in the linked list L; line 20 passes $R_{Detai\_in}$ to the next recursive call of procedure CalcuDetailPathI.
Next, we present the flowchart for the two algorithms and accurate location prediction based on the detailed paths obtained. The flowchart is depicted in Fig 3. The processes of the workflow are described below.
(1) As in the rough prediction, we first specify a grid cell as the target location. We again assume that grid cell F is the target location.
(2) From the (S+1)-step transition matrix, we directly obtain all simple paths and the probabilities of arriving at F after (S+1) steps. Here, we obtain all simple paths with two steps from $P^{(2)}$ and record them in $R^{(2)}_{simp\_in}$.
(3) Detailed paths with S steps are determined from the parameter passed by the previous iteration or by the initialization. Here, we initialize to obtain the detailed paths with one step, $R^{(1)}_{Detai\_in}$, from the simple paths with one step, $R^{(1)}_{simp\_in}$, obtained from $P^{(1)}$.
(4) Assemble the start locations of the simple paths and the start locations of the detailed paths to obtain location pairs. For each start location of the detailed paths in $R^{(1)}_{Detai\_in}$ and each start location of the simple paths in $R^{(2)}_{simp\_in}$, we can obtain a location pair.
Furthermore, we can combine $R^{(3)}_{Detai\_in}$ with $R^{(4)}_{simp\_in}$ in the same way. Finally, we obtain all detailed paths and probabilities of arriving at F, as shown in Table 4.
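The iteration described above can be sketched in a few lines. This is not a transcription of Algorithms 1 and 2: it is an iterative illustration written under the assumption that the probability of a detailed path is the product of the one-step transition probabilities along it, which follows from the Markov property. The matrix `P1` is a hypothetical toy chain rather than the matrix derived from Table 1, and the function names are assumptions.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def detailed_paths(p1, states, target, max_steps):
    """Return {steps: [(path, probability), ...]} for paths ending at target."""
    idx = {s: i for i, s in enumerate(states)}
    t = idx[target]
    # Initialization: one-step detailed paths equal one-step simple paths.
    current = [((s, target), p1[idx[s]][t])
               for s in states if p1[idx[s]][t] > 0]
    result = {1: current}
    pn = p1
    for n in range(2, max_steps + 1):
        pn = mat_mul(pn, p1)  # P^(n)
        # Start locations of the simple paths with n steps.
        starts = [s for s in states if pn[idx[s]][t] > 0]
        extended = []
        for s in starts:
            for path, prob in current:
                # Location pair (s, path[0]): keep it if a one-step rule exists.
                step = p1[idx[s]][idx[path[0]]]
                if step > 0:
                    extended.append(((s,) + path, step * prob))
        if not extended:
            break
        result[n] = extended
        current = extended
    return result


# Hypothetical normalized one-step matrix over states A, B, C, F.
STATES = ["A", "B", "C", "F"]
P1 = [
    [0.0, 0.4, 0.6, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0],
]

paths_by_steps = detailed_paths(P1, STATES, "F", max_steps=4)
for steps, found in paths_by_steps.items():
    print(steps, found)
```

In this toy example the two-step detailed paths starting from A are A → B → F (0.4) and A → C → F (0.3). Their sum, 0.7, equals the two-step simple-path probability from A to F, in line with the earlier observation that a simple path aggregates its detailed paths.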

Data preparation
Simulated anonymity datasets for LBS continuous queries. Because spatial-temporal k-anonymity and its variants have not been widely applied in business LBS systems, we adopt a software system developed in the literature to simulate large-scale anonymity datasets for LBS continuous queries from GPS trajectories. Table 6 summarizes the basic characteristics of the simulated datasets.
Sequential rules & n-step transition matrix. We adopt the RuleGrowth algorithm in SPMF [19] to mine sequential rules from the simulated anonymity datasets. The parameters $seqsup_{min}$ and $seqconf_{min}$ are set to 0.02 and 0.24, respectively. The 18 mined sequential rules are given in Table 7. By normalizing the confidence values of the 18 mined sequential rules, we obtain the one-step transition matrix $P_{simu}^{(1)}$, and further calculate the 2-step transition matrix $P_{simu}^{(2)}$.
Table 4. Detailed paths for arriving at the target location F. (Columns: Steps, Current location, Detailed path, Probability, Target location.)
Table 5. Accurate prediction for arriving at the target location F. (Columns: Current location, Steps, Detailed path, Probability, Target location.)

Results and Discussion
Experiment 1. We specify the grid cell I as the target location, and derive simple paths for arriving at I from $P_{simu}^{(1)}$ and $P_{simu}^{(2)}$ directly. Based on the simple paths, we make the rough location predictions shown in Table 8. Furthermore, we obtain detailed paths using the algorithm CalcuDetailPath, and make accurate location predictions based on the detailed paths.
The results are shown in Table 9 and are mapped onto geographic background datasets.
Experiment 2. This experiment aims to verify the correctness of the proposed location prediction methods. As mentioned above, the accurate prediction method is essentially an optimized version of the rough location prediction method. Thus, we only evaluate the correctness of the accurate prediction method, using the realprecision measure. This measure is a direct measurement calculated as the number of correct predictions divided by the total number of predictions. Furthermore, with reference to the sequential rules in Table 7, we argue that the confidence threshold $seqconf_{min}$ for sequential rule mining is too large to allow the discovery of enough sequential rules. Hence, we hypothesize that too large a confidence threshold for sequential rules may result in significant differences between the realprecision values and the prediction probabilities for the detailed paths. Next, we test this hypothesis in Experiment 3.
Experiment 3. First, we use the two lower confidence thresholds 0.2 and 0.22 to mine sequential rules and obtain 104 sequential rules and 56 sequential rules respectively, among which the sequential rules with start locations A, B, C, and E are shown in Table 10. Next, we obtain normalized values for the confidence values of the sequential rules (in Table 10), as shown in Table 11, and the comparison of the differences between the confidence values and the normalized values is shown in Fig 6. We see that the differences decrease as the confidence threshold decreases, except in the case of the sequential rule $B \rightarrow E$.
Finally, by constructing n-step transition matrices and applying the algorithm CalcuDetailPath, we obtain location prediction probabilities for the detailed paths. The comparison of varying proximities between location prediction probabilities and realprecision values is shown in Fig 7. We see that as the confidence threshold decreases, the location prediction probabilities for six of the detailed paths all move closer to their corresponding realprecision values.
This experiment confirms our previous hypothesis from Experiment 2. Thus, we can conclude that the proximity between location prediction probabilities and realprecision values for detailed paths can be adjusted flexibly by setting different confidence thresholds for mining sequential rules. That is, when users believe that the accuracy of the accurate prediction cannot meet their requirements, they can obtain higher prediction accuracy by decreasing the confidence threshold used to mine the sequential rules from which the transition probability matrices are constructed.
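The realprecision measure used in Experiments 2 and 3 is the number of correct predictions divided by the total number of predictions. The sketch below shows one way to compute it for a single detailed path. It rests on assumptions made only for this illustration: each occurrence of the path's start location in a test sequence is counted as one prediction, and a prediction is counted as correct when the following cells match the detailed path exactly. The test sequences are hypothetical, and the function name simply reuses the name of the measure.

```python
def realprecision(path, test_sequences):
    """Correct predictions divided by total predictions for one detailed path."""
    path = tuple(path)
    total = 0
    correct = 0
    for seq in test_sequences:
        for i, cell in enumerate(seq):
            if cell == path[0]:
                total += 1  # the user is at the current location
                if tuple(seq[i:i + len(path)]) == path:
                    correct += 1  # the user then followed the detailed path
    return correct / total if total else 0.0


# Hypothetical test sequences of visited grid cells.
test_sequences = [
    ["A", "C", "F"],
    ["A", "B", "F"],
    ["A", "C", "B", "F"],
    ["B", "F"],
]

precision_acf = realprecision(("A", "C", "F"), test_sequences)
precision_bf = realprecision(("B", "F"), test_sequences)
print(precision_acf, precision_bf)
```

Comparing such a value with the predicted probability of the same detailed path gives the kind of proximity examined in Experiment 3.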

Conclusion and future work
Because of its ease of implementation, spatial-temporal k-anonymity has become a mainstream approach for protecting the privacy of LBS users. Analyzing large-scale anonymity datasets can benefit some LBS applications. In this paper, we propose two location prediction methods for the probabilities of arriving at specified locations based on transition probability matrices constructed from sequential rules for spatial-temporal k-anonymity datasets. By conducting extensive experiments, we have verified the correctness and flexibility of our proposed methods.
However, because technologies are intent neutral, they harbor neither benevolent nor malevolent intent with respect to the individuals using them. Thus, our proposed location prediction methods can also lead to substantial privacy threats. For example, target locations that are regarded as privacy-sensitive regions, such as military zones, red-light districts, and so on, may be susceptible to more menacing attacks, because the existing spatial-temporal k-anonymity methods and their variants mainly concern the current and historical private information of LBS users but not their future information [20]. Hence, in the future, we will study the capabilities and limitations of these attack methods to lay foundations for research into performance optimization for spatial-temporal k-anonymity methods and their variants, thus helping data miners and domain experts ensure that privacy-sensitive knowledge is released or accessible only to trusted parties.

Supporting Information
S1 Text. Seqsup0.02_seqconf0.2. Sequential rules mined with parameters $seqsup_{min}$ and $seqconf_{min}$ set to 0.02 and 0.20, which are used in Experiment 3.
S9 Text. Seqsup0.02_seqconf0.24. Sequential rules mined with parameters $seqsup_{min}$ and $seqconf_{min}$ set to 0.02 and 0.24, which are used in Experiment 1. (TXT)
S10 Text. Seqsup0.02_seqconf0.24_1step. Accurate one-step location predictions for arriving at the target location I, which are used in Experiments 1 and 2. (TXT)
S11 Text. Seqsup0.02_seqconf0.24_2step. Accurate two-step location predictions for arriving at the target location I, which are used in Experiments 1 and 2. (TXT)
S12 Text. Test datasets. Test datasets for evaluating the correctness of the accurate prediction method with the realprecision measure, which are used in Experiments 2 and 3. (TXT)