A refined maximum predictability for next location prediction with fusion knowledge

Liuhong Huang; Zhaocheng He; Xiying Li; Zhi Yu

doi:10.1371/journal.pone.0342450

Abstract

Research on maximum predictability for next location prediction aims to derive the theoretical maximum accuracy that an ideal prediction model could achieve, which is crucial for analyzing travel regularity and evaluating prediction models. However, three problems remain: 1) the spatiotemporal information used in existing predictability measures is incomplete; 2) quantifying predictability across diverse spatiotemporal information is challenging due to the limitations of entropic measures; and 3) applications of predictability lack further analysis of individual regularity. In this work, we first summarized spatiotemporal information and categorized it into four types of spatiotemporal knowledge. Next, to better quantify predictability, we proposed a refined maximum predictability based on fusion knowledge and Shannon entropy. Finally, we leveraged individual spatiotemporal knowledge preferences based on the refined maximum predictability to analyze travel regularity and evaluate prediction models. Our experimental results showed that the proposed predictability achieved the best results on both the simulation dataset and the real-world datasets, with a mean absolute error (MAE) of 0.06 on the simulation dataset. Furthermore, the evaluation results of prediction models indicated that personalized selection and full utilization of spatiotemporal knowledge are crucial for effective location prediction. This work provides insights into the design and improvement of location prediction models. Code is available at https://github.com/hlh7/A-refined-maximum-predictability.

Citation: Huang L, He Z, Li X, Yu Z (2026) A refined maximum predictability for next location prediction with fusion knowledge. PLoS One 21(2): e0342450. https://doi.org/10.1371/journal.pone.0342450

Editor: Ziqiang Zeng, Sichuan University, CHINA

Received: September 2, 2024; Accepted: January 21, 2026; Published: February 13, 2026

Copyright: © 2026 Huang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the manuscript.

Funding: This work was supported by the National Natural Science Foundation of China (No. U1611461). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Traffic prediction is crucial for decision-making in traffic management [1,2]. As individual next-location prediction is a core component of traffic forecasting, its maximum prediction accuracy dictates the controllable boundaries and potential effectiveness of management strategies. Consequently, understanding this maximum prediction accuracy is essential. The maximum prediction accuracy is also referred to as maximum predictability, representing the theoretical upper bound of prediction accuracy that an ideal prediction model could achieve, independent of particular prediction methods, and reflecting the inherent regularity of individual trips [3]. Maximum predictability is also applied in the analysis of network social activity [4], indoor mobility [5,6], sequential recommendation [7,8], port traffic [9], passenger flow [10], travel time [11], and travel speed [12]. This work focuses on the estimation and application of maximum predictability within next location prediction scenarios.

However, existing methods for estimating maximum predictability in next location prediction, which primarily rely on entropic measures and Fano’s inequality, face several limitations: 1) spatiotemporal information is considered incompletely, as some studies focused solely on spatial information and neglected temporal information [13–16], while others considered only a subset of spatiotemporal information [17–19], leading to an underestimation of predictability; 2) limited by the entropic measures, these methods either struggled to incorporate additional spatiotemporal information [17,18] or incorrectly treated non-repeating trips as regular trips [19], resulting in inaccurate predictability estimations; and 3) the inability to effectively leverage spatiotemporal information prevents accurate predictability estimations across diverse spatiotemporal information, hindering analysis of individual spatiotemporal preferences and insights into travel patterns and prediction models.

To overcome these limitations, this work proposed a refined maximum predictability. First, we examined the compositional relationships within spatiotemporal information and refined the categorization of spatiotemporal knowledge. Second, we enhanced the calculation of the maximum predictability for varying types of knowledge by employing Shannon entropy and random entropy. Third, we integrated predictability across the various knowledge types to derive the refined maximum predictability and individual spatiotemporal knowledge preferences. Finally, we employed spatiotemporal preferences to refine group classification and to evaluate and enhance prediction models. The experimental results showed that the refined maximum predictability achieved the best results on both the simulation dataset and the real-world datasets and reduced the mean absolute error (MAE) on the simulation dataset by 68% relative to the classical method. Moreover, the location prediction model enhanced with spatiotemporal preference achieved accuracy comparable to that of deep learning models, demonstrating the significant potential of the refined maximum predictability. The contributions of this work are as follows.

We proposed an improved method for estimating maximum predictability. This method fully considers and accurately estimates the predictability across diverse types of mobility knowledge, leading to a more reliable maximum predictability.
We extended the applications of maximum predictability and applied the refined maximum predictability to mine individual spatiotemporal knowledge preferences, which provides new insights into travel regularity analysis and prediction model evaluations.
To the best of our knowledge, we are the first to apply individual maximum predictability to improve location prediction models, and the experimental results show the great potential of predictability-based location prediction models.

Related work

Spatiotemporal information

Spatiotemporal information is the basic input for predictability. In previous studies, the maximum predictability was primarily measured by using the length of the shortest non-repeating subsequence based on location sequences, which utilized destination and origin-destination distributions and ignored temporal information [3,6,13–16]. In addition to spatial information, temporal information is important. Therefore, researchers used the location sequences of time periods to measure predictability by mainly employing time-destination and time-origin-destination distributions [17,18]. In a recent study, predictability was estimated based on the time-origin-destination distribution [19]. However, the spatiotemporal information used in previous studies differs, and no study has provided complete spatiotemporal information.

Studies in other fields can also provide some insights. In addition to predictability, destination [20–23], time-destination [24,25], origin-destination [26], and time-origin-destination [20,26] distributions were used in some studies to determine travel regularity based on entropy. Moreover, the distributions of time [22,27–29], destination [30], time-destination [31], origin-destination [22,32], and time-origin-destination [21,33,34] are usually adopted to mine travel patterns.

Overall, in the study of predictability, four main types of spatiotemporal information related to the destination are present: the distributions of destination, time-destination, origin-destination, and time-origin-destination, which are consistent with those in other related fields.

Methods of maximum predictability

There are two main tasks in existing studies: the next time-bin prediction task and the next location prediction task. This work focuses on the next location prediction task. Song et al. [3] first adopted real entropy and Fano’s inequality to quantify maximum predictability in the next time-bin prediction task. Wan et al. [13] subsequently extended Song’s method [3] to estimate the maximum predictability in the next location prediction task by applying the Lempel-Ziv algorithm to estimate real entropy based on location sequences. To better estimate maximum predictability, researchers focused on the entropic measures. Some researchers used the Burrows-Wheeler Transform algorithm to estimate real entropy [6,35], which has a higher precision than the Lempel-Ziv algorithm. Xu et al. [36] proposed two suggestions for calculating real entropy through numerical experiments. The real entropy primarily measures maximum predictability based on location sequences, and it neglects temporal information. Thus, the real entropy can quantify predictability well for sequences without any temporal regularity. However, temporal information is crucial in the next location prediction task. For instance, given a sequence similar to postal delivery, like ((t1,d1), (t2,d2), (t1,d1), (t2,d3), (t1,d1), (t2,d4), (t1,d1), (t2,d5)), the location d1 correlates highly with time and is completely predictable; thus, the theoretical maximum predictability can be higher than 0.5. However, according to the real entropy and Fano’s inequality, the estimated maximum predictability is 0.37, which is substantially lower than the theoretical value. Therefore, the real entropy cannot estimate the predictability for temporal information.

To address these problems, Teixeira et al. [17,18] divided the trajectory into multiple location subsequences for time periods and calculated the weighted real entropy based on the subsequences; however, the predictability was mostly underestimated owing to the limitation of the convergence speed. Yu et al. [37] utilized real entropy and mutual information to measure maximum predictability when considering external factors, and they considered exploration trips to be unpredictable; however, those exploration trips have a predictability of 1/m when the number of locations is m; thus, this method might underestimate the maximum predictability. Consequently, the predictability based on real entropy still cannot quantify the predictability for temporal information well.

To better quantify the predictability for temporal information, Zhang et al. [19] utilized conditional entropy to measure the maximum predictability based on context information. However, this method treats non-repeating trips with a conditional probability of 1 as regular trips, leading to a significant overestimation of predictability. In addition, the conditional entropy usually overestimates the number of locations when utilizing Fano’s inequality, which also results in an overestimation of predictability. These studies indicate that the entropic measures used in existing predictability still have some limitations.

Applications of predictability

Since Song et al. [3] proposed maximum predictability, many researchers have aggregated individual predictability to analyze group travel regularity, such as average predictability and predictability distributions [3,13–16,19,35,37–43]. Some researchers analyzed predictability distributions for different sampling frequencies and scales and found that predictability correlates highly with sampling frequency and scales in the next time-bin prediction task [38–41]. Wan et al. [13] analyzed the correlations between predictability and the number of visited locations, travel distance, and travel radius, and found negative correlations between them. Ikanovic et al. [14] also investigated the effect of travel features and found that the number of visited locations has the highest correlation with predictability. Yu et al. [37] analyzed the influence of external factors on predictability, such as holidays, temperature, and weather, and found that external factors could increase predictability. Despite the impressive results of group regularity analysis, further individual regularity analysis is needed to enhance the analysis of travel regularity, such as individual knowledge preferences.

In addition, some researchers applied maximum predictability for evaluating prediction models. Some studies analyzed the correlation between predictability and prediction accuracy and found a strong positive correlation between them [19,20,40], indicating that maximum predictability can be used to evaluate prediction models. Furthermore, some researchers evaluated the performance of prediction models for different predictability levels [20,21,44]. However, these evaluations only considered predictability, and other travel features were seldom included, resulting in a single evaluation dimension.

Problem definition

In this section, we introduce the basic definitions, notations, and problem formulations in this work. Table 1 lists the main notations.

Table 1. Summary of notations.

https://doi.org/10.1371/journal.pone.0342450.t001

Definition 1 (Check-in): Given user u, a check-in, also known as a trip, can be defined as . A check-in contains stay location and check-in time .

Definition 2 (Trajectory): For user u, a trajectory is a time-ordered sequence of check-ins and can be denoted by , where n is the length of the trajectory.

Definition 3 (Knowledge set): The knowledge set can be defined as , and each type of knowledge can be denoted by , where k is the number of knowledge types, z is the number of categories, and is the category j of knowledge i.

Problem 1 (Next location prediction): Given a historical trajectory with length n of user u, , predict the next location, .

Problem 2 (Maximum predictability on next location prediction): Given historical trajectory , calculate the maximum accuracy, , that the ideal prediction model can achieve in predicting the next location, .

Problem 3 (Individual knowledge preference): For user u, given knowledge set and trajectory , mine individual travel knowledge that can achieve the highest predictability.

Proposed method

Because of the incomplete spatiotemporal information and limitations in the entropic measures, the maximum predictability has not been fully considered or accurately quantified for various spatiotemporal information, resulting in a lack of detailed analysis on individual travel regularity. In this work, we first summarized the existing spatiotemporal information to obtain complete spatiotemporal knowledge. Then, we proposed a refined maximum predictability based on Shannon entropy to better quantify the predictability for various spatiotemporal knowledge. Finally, to extend the applications of predictability, we applied the refined maximum predictability to mine knowledge preference, which can be further applied to analyze travel regularity and improve prediction models.

Spatiotemporal knowledge

Travel time and location are fundamental travel information for predictability calculations. Therefore, this work derives four spatiotemporal knowledge types based on the combined relationships between these two elements. Specifically, in next location prediction, the travel location includes the origin and destination. Thus, there are three key spatiotemporal elements: travel time, origin, and destination. In single-element combinations, the destination-related knowledge contains the destination distribution knowledge. In two-element combinations, destination-related knowledge includes origin-destination distribution knowledge and time-destination distribution knowledge. In three-element combinations, the destination-related knowledge contains the time-origin-destination distribution knowledge. Except for destination distribution, spatiotemporal information has multi-order conditions. For instance, the second-order origin-destination distribution is the destination distribution when considering two previous locations.

In this work, we primarily considered first-order spatiotemporal information. Therefore, the spatiotemporal knowledge set can be defined as , which denotes the destination, origin-destination, last check-in time-destination, and last check-in time-origin-destination distribution knowledge, respectively. Specifically, we considered two temporal scales: weekly (weekday/weekend) and hourly (48-hour periods).

Refined maximum predictability

Existing maximum predictability is mainly calculated using entropic measures and Fano’s inequality [45,46]. First, an entropic measure is used to estimate entropy based on individual travel information. Then, the entropy is converted into the maximum predictability using Fano’s inequality. Specifically, when the entropy value is H and the number of visited locations is m, the maximum predictability, $\Pi^{\max}$, can be obtained by solving the following entropy-predictability-conversion equation:

$$H = -\Pi^{\max}\log_2 \Pi^{\max} - \left(1-\Pi^{\max}\right)\log_2\left(1-\Pi^{\max}\right) + \left(1-\Pi^{\max}\right)\log_2(m-1) \tag{1}$$

There are mainly two types of entropic measures: real entropy and conditional entropy. According to Zhang et al. [19], conditional entropy has better expandability than real entropy. However, when non-repeating trips are present, predictability based on conditional entropy is significantly overestimated. For instance, given a location sequence (a, b, c, d, e), the conditional probability of each trip is 1; therefore, the conditional entropy is 0, and the estimated maximum predictability is 1. This implies that the location sequence is perfectly regular, which is contrary to intuition because no trip is ever repeated. In addition, conditional entropy usually overestimates the number of visited locations when utilizing Fano’s inequality. For instance, consider the origin-destination co-occurrence matrices of two individuals shown in Fig 1, which involve three locations: D1, D2, and D3. Individual A takes one trip each from D1 to D3 and from D2 to D3. For Individual B, the frequencies for trips from D1 to D3 and from D2 to D3 are both zero. According to the conditional entropy and Fano’s inequality, Individual A’s maximum predictability is 0.709, while Individual B’s is 0.849. Although the differences between the two matrices are negligible, Individual B has a much higher predictability than Individual A. The key factor is the number of visited locations in Fano’s inequality. In Individual B’s matrix, the total number of visited locations is three, but most trips occurred between only two locations. Thus, the number of visited locations is slightly overestimated. Since the maximum predictability increases with the number of visited locations when the entropy value is fixed, the maximum predictability based on conditional entropy is usually overestimated due to the overestimated number of visited locations.
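The two values quoted above can be checked numerically. The following is a minimal sketch rather than the authors' implementation: it assumes the standard form of Fano's inequality, H = H_b(Π) + (1 − Π) log2(m − 1), solved by bisection, and it transcribes the two co-occurrence matrices from the caption of Fig 1 (rows as origins, columns as destinations, and "six round trips" read as six trips in each direction). The function names are hypothetical.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (bits) of a frequency vector; zero counts are skipped."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def fano_predictability(entropy, m, tol=1e-12):
    """Solve H = H_b(P) + (1 - P) * log2(m - 1) for P in [1/m, 1] by bisection."""
    if m < 2 or entropy <= 0.0:
        return 1.0
    if entropy >= math.log2(m):
        return 1.0 / m

    def rhs(p):
        # Binary entropy of p plus the Fano correction term.
        h_b = 0.0 if p >= 1.0 else -p * math.log2(p) - (1 - p) * math.log2(1 - p)
        return h_b + (1 - p) * math.log2(m - 1)

    lo, hi = 1.0 / m, 1.0  # rhs decreases from log2(m) to 0 on this interval
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if rhs(mid) > entropy:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def conditional_entropy(matrix):
    """Entropy of the destination given the origin (one row per origin)."""
    n = sum(sum(row) for row in matrix)
    return sum(sum(row) / n * shannon_entropy(row) for row in matrix if sum(row) > 0)


# Assumed transcription of Fig 1: rows are origins D1..D3, columns destinations D1..D3.
matrix_a = [[2, 6, 1], [6, 2, 1], [0, 0, 1]]
matrix_b = [[2, 6, 0], [6, 2, 0], [0, 0, 1]]

pred_a = fano_predictability(conditional_entropy(matrix_a), m=3)
pred_b = fano_predictability(conditional_entropy(matrix_b), m=3)
print(round(pred_a, 3), round(pred_b, 3))  # 0.709 0.849
```

Both matrices are evaluated with m = 3, which is what makes Individual B's estimate jump: the entropy drops once the two single trips to D3 disappear, while the location count in Fano's inequality stays at three.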

Fig 1. Examples of maximum predictability calculations with three locations: D1, D2, and D3.

(a): Co-occurrence matrix of Individual A. Individual A made single trips from D1 to D3, D2 to D3, and D3 to D3, two trips each from D1 to D1 and from D2 to D2, and six round trips between D1 and D2. (b): Co-occurrence matrix of Individual B. Individual B’s visits are identical to those of Individual A, except that the frequencies for trips from D1 to D3 and from D2 to D3 are both zero.

https://doi.org/10.1371/journal.pone.0342450.g001

To address these problems, we define non-repeating trips as those with low repeatability. For each type of spatiotemporal knowledge, we first categorized the trips into repeated and non-repeating groups. Given a knowledge type, if the maximum destination frequency under sub-conditional knowledge is 1, such as in Fig 2(a) where the maximum destination frequency of sub-conditional knowledge C3 is 1, all trips under this sub-condition can be considered as non-repeating trips. As the number of trips increases, the randomness also increases. For instance, in Fig 2(b), the maximum destination frequency under sub-conditional knowledge C3 is 2, but it is significantly lower than the maximum destination frequencies of other sub-conditional knowledge; trips under C3 still can be considered as non-repeating trips. Therefore, the threshold for classifying non-repeating trips is positively correlated with the number of trips. Additionally, the more visited locations there are, the smaller the threshold should be. Thus, we utilized the individual’s average location frequency and a reduction factor to determine the threshold for non-repeating trips, as shown in Eq 2. The threshold should be greater than zero, and all trips under sub-conditional knowledge with a maximum destination frequency below this threshold are classified as non-repeating trips.

(2)

where max() denotes the maximum operation, n denotes the length of the trajectory, m denotes the number of visited locations, and β is the reduction factor, which is set to 8 in this work; this choice is discussed in the next section.

Fig 2. Examples of non-repeating trips.

(a): Example 1. Under sub-conditional knowledge C3, the numbers of trips to D2 and D3 are both 1, so these trips are non-repeating; under sub-conditional knowledge C1 and C2, the numbers of trips to D1 are 7 and 5, respectively. (b): Example 2. Under sub-conditional knowledge C3, the numbers of trips to D2 and D3 are 1 and 2, respectively. These values are considerably lower than those recorded under sub-conditional knowledge C1 and C2.

https://doi.org/10.1371/journal.pone.0342450.g002

Subsequently, for non-repeating trips, the predictability was set to 1/m when the number of visited locations was m. For repeated trips, to accurately estimate the number of visited locations, we applied Shannon entropy and Fano’s inequality to measure the predictability for each category set. Finally, the predictability was obtained by weighting the predictability of different category sets according to their category probabilities. Given user u, the predictability for knowledge Cⁱ can be defined as:

(3)

(4)

(5)

where f(c,l) denotes the co-occurrence of location l and category c, denotes the category set that contains repeated trips, denotes the category set that contains non-repeating trips, P(c) is the probability of category c, is the probability of location l when category c occurs, m denotes the number of visited locations, and Thrf is the frequency threshold of non-repeating trips, denotes the maximum predictability of category c, and denotes the Shannon entropy of category c.

To fully consider all spatiotemporal knowledge, the maximum predictability of different knowledge was selected as refined maximum predictability. Given the spatiotemporal knowledge set , the refined maximum predictability can be defined as

(6)

We used a case to describe how to calculate the refined maximum predictability. As shown in Fig 1, given individual A’s origin-destination co-occurrence matrix, we considered only the destination distribution and the first-order origin-destination distribution knowledge; therefore, according to Eq 2, the threshold of non-repeating trips is 1. For the origin-destination distribution knowledge, the trips that belong to categories D1 and D2 are repeated trips, while the trips that belong to category D3 are non-repeating trips. The Shannon entropy of the destination distribution knowledge can be calculated as . According to Eq 1, the predictability, , is 0.527. The Shannon entropy of origin-destination distribution knowledge can be calculated as , the predictability is 0.680, and the final predictability of origin-destination distribution knowledge can be calculated as . Therefore, the refined maximum predictability is .
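The worked case can be traced in code. This is a sketch under stated assumptions, not the authors' implementation: a category is treated as non-repeating when its maximum destination frequency does not exceed the threshold (here 1, as stated in the case), a non-repeating category contributes 1/m, a repeated category contributes the Fano-converted Shannon entropy of its destination distribution with m = 3, and categories are weighted by their trip share. The matrix is transcribed from the caption of Fig 1, and all function names are hypothetical.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (bits) of a frequency vector; zero counts are skipped."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def fano_predictability(entropy, m, tol=1e-12):
    """Solve H = H_b(P) + (1 - P) * log2(m - 1) for P in [1/m, 1] by bisection."""
    if m < 2 or entropy <= 0.0:
        return 1.0
    if entropy >= math.log2(m):
        return 1.0 / m
    lo, hi = 1.0 / m, 1.0
    while hi - lo > tol:
        p = (lo + hi) / 2
        rhs = -p * math.log2(p) - (1 - p) * math.log2(1 - p) + (1 - p) * math.log2(m - 1)
        if rhs > entropy:
            lo = p
        else:
            hi = p
    return (lo + hi) / 2


def knowledge_predictability(rows, m, thr):
    """Weighted predictability of one knowledge type.

    rows holds one destination-frequency vector per category of the knowledge.
    """
    n = sum(sum(row) for row in rows)
    total = 0.0
    for row in rows:
        if sum(row) == 0:
            continue
        weight = sum(row) / n  # category probability P(c)
        if max(row) <= thr:  # ASSUMPTION: "at or below" the threshold
            total += weight / m  # non-repeating trips: predictability 1/m
        else:
            total += weight * fano_predictability(shannon_entropy(row), m)
    return total


# Individual A, rows = origins D1..D3, columns = destinations D1..D3 (Fig 1 caption).
od_rows = [[2, 6, 1], [6, 2, 1], [0, 0, 1]]
dest_rows = [[sum(col) for col in zip(*od_rows)]]  # single category: [8, 8, 3]

pred_d = knowledge_predictability(dest_rows, m=3, thr=1)
pred_od = knowledge_predictability(od_rows, m=3, thr=1)
refined = max(pred_d, pred_od)
print(round(pred_d, 3), round(pred_od, 3), round(refined, 3))  # 0.527 0.662 0.662
```

Under these assumptions the sketch reproduces the two intermediate values given in the text, 0.527 for the destination knowledge and 0.680 for the repeated origin categories D1 and D2. The combined origin-destination value and the final maximum are outputs of the sketch, not figures quoted from the paper.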

Knowledge preference

To extend the applications of predictability, we mined the individual knowledge preference based on the refined maximum predictability. Users were then divided into several groups based on their individual knowledge preferences, and these groups can be used to analyze travel regularity and evaluate prediction models.

We mined the individual knowledge preference feature by comparing the predictabilities for different knowledge types and selecting the one with the highest predictability as the individual’s preferred knowledge. Therefore, given the knowledge set , the predictabilities for different knowledge , , and the refined maximum predictability , the knowledge preference feature can be defined as , and each element can be measured using the following equation:

(7)

There are 15 types of knowledge preference features when considering four types of knowledge, which indicates that users can be divided into 15 groups based on the knowledge preference feature. To better evaluate prediction models, we reduced the number of groups. We set a knowledge priority based on the principle of least knowledge, that is D > OD > TD > TOD. We then categorized the knowledge preference features into four types. Consequently, we obtained four groups: Gd, God, Gtd, and Gtod. Given user u and the knowledge preference feature , the process can be expressed as:

(8)
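The preference feature and the grouping rule can be sketched as follows. The sketch assumes that Eq 7 marks a knowledge type as preferred when its predictability equals the refined maximum predictability (up to a small tolerance, which is an added parameter), and that Eq 8 assigns the user to the first preferred type in the priority order D > OD > TD > TOD. The function names and the example values for TD and TOD are hypothetical.

```python
from itertools import product

PRIORITY = ("D", "OD", "TD", "TOD")  # least knowledge first


def preference_feature(predictabilities, tol=1e-9):
    """Binary vector: 1 where a knowledge type attains the refined maximum."""
    best = max(predictabilities[k] for k in PRIORITY)
    return tuple(int(best - predictabilities[k] <= tol) for k in PRIORITY)


def assign_group(feature):
    """Map a preference feature to one of Gd, God, Gtd, Gtod by priority."""
    for name, flag in zip(PRIORITY, feature):
        if flag:
            return "G" + name.lower()
    raise ValueError("feature must mark at least one preferred knowledge type")


# Four knowledge types give 2**4 - 1 = 15 non-empty preference features.
nonzero = [f for f in product((0, 1), repeat=4) if any(f)]
print(len(nonzero))  # 15

example = {"D": 0.527, "OD": 0.662, "TD": 0.410, "TOD": 0.662}  # illustrative values
feature = preference_feature(example)
print(feature, assign_group(feature))  # (0, 1, 0, 1) God
```

When several knowledge types tie, the priority order resolves the tie toward the type that needs the least information, which is how the 15 possible features collapse into four groups.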

Experiments

In this section, we first introduce the datasets and experimental settings. We then present a numerical experiment and a case study to evaluate the refined maximum predictability. Finally, we apply the refined maximum predictability to mine individual spatiotemporal knowledge preferences and to evaluate knowledge selection and utilization in location prediction models.

Datasets

We conducted experiments using four real-world datasets: Foursquare NYC (NYC), Foursquare TKY (TKY), Foursquare POA (POA), and Geolife (BJ) [47–50]. The Foursquare datasets were collected from April 2012 to September 2013 in New York, Tokyo, and Porto Alegre using the Foursquare platform. These datasets represent individual real-world POI visit sequences, reflecting diverse visit patterns and featuring a relatively sparse sampling granularity, as they contain only partial individual trips. Conversely, the Geolife dataset was primarily collected from April 2007 to August 2012 in Beijing. It represents a fine sampling granularity, comprising high-frequency GPS trajectories with an average sampling interval of 1-5 seconds. The completeness and accuracy of these datasets are well-established, as they have been widely used in various fields, including travel pattern mining [51–55], next location prediction [56–63], and maximum predictability calculation [19]. For preprocessing, we split the check-ins into multiple sub-trajectories based on an interval of 72 hours, ensuring each sub-trajectory contained no more than 10 records. We then filtered out users with fewer than two sub-trajectories or 10 check-ins. In the location prediction, 80% of each user’s trajectories were used for training, and the rest were used for testing. The preprocessing results are listed in Table 2. The four datasets exhibit significant variability in scale and temporal scope, making them essential for testing method generalizability. The TKY dataset is the largest, featuring the highest number of users, 7341, and records, 741563, over a 10-month period. Conversely, BJ is the smallest in user count, 98, and records, 14566, yet it offers the longest temporal span at 63 months.
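The preprocessing steps can be sketched as below. The text does not state whether the 72-hour interval is measured from the first check-in of a sub-trajectory or between consecutive check-ins; this sketch assumes the former. It also assumes timestamps expressed in hours and a chronological 80/20 split over sub-trajectories. All function names are hypothetical.

```python
def split_sub_trajectories(checkins, window_hours=72.0, max_len=10):
    """Split time-ordered (time_in_hours, location) check-ins into sub-trajectories."""
    subs, current = [], []
    for t, loc in checkins:
        too_late = bool(current) and t - current[0][0] > window_hours
        too_long = len(current) >= max_len
        if too_late or too_long:
            subs.append(current)
            current = []
        current.append((t, loc))
    if current:
        subs.append(current)
    return subs


def keep_user(subs, min_subs=2, min_checkins=10):
    """Drop users with fewer than two sub-trajectories or fewer than 10 check-ins."""
    return len(subs) >= min_subs and sum(len(s) for s in subs) >= min_checkins


def train_test_split(subs, train_ratio=0.8):
    """Chronological split: the first 80% of sub-trajectories are used for training."""
    cut = int(len(subs) * train_ratio)
    return subs[:cut], subs[cut:]


# Twelve check-ins, one every 10 hours, cycling over three locations.
checkins = [(float(h), h % 3) for h in range(0, 120, 10)]
subs = split_sub_trajectories(checkins)
train, test = train_test_split(subs)
print([len(s) for s in subs], keep_user(subs), len(train), len(test))  # [8, 4] True 1 1
```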

Table 2. Statistics of the real-world datasets.

https://doi.org/10.1371/journal.pone.0342450.t002

Furthermore, each approach has its specific scope. In real-world datasets, imbalanced data and unknown theoretical maximum predictability are common problems. Therefore, a balanced dataset with known theoretical predictability is crucial. To address these problems, we generated a simulation dataset with known theoretical predictability and balanced data. To ensure the simulation dataset reflects real-world mobility behaviors, we primarily followed the approach of Xu et al. [36] and adopted the following strategies. 1) Pattern diversity. The simulation dataset is constructed based on common individual travel patterns: regular and irregular. For regular trips, time-destination and origin-destination patterns are considered, both frequently observed in reality. Irregular trips are modeled as random trips, where individuals randomly select a location from their historical destinations, representing a common form of irregular behavior. 2) Varied regularity. Since individuals exhibit varying degrees of regularity, ranging from predominantly regular to predominantly irregular travelers, we generated a dataset with diverse proportions of regular trips to ensure balanced samples of different regularity individuals. 3) Reduced randomness. To mitigate the impact of randomness, we generated 50 travel sequence samples for each combination of pattern type and regularity level. This reduces the influence of randomness, making the simulation dataset more realistic.

Specifically, we considered two sequence generators with controllable predictability to generate a simulation dataset that contains different pattern types and regularity levels. Given the number of distinct locations m, the number of time periods h, and the sequence length n, the two types of sequences are defined as follows.

Markovian location sequences. At each step, the location is determined by a fixed location transition order (e.g., a, b, c, d, a, b, c, d, ...) with probability p (representing a regular origin-destination trip), and it is randomly selected from the m candidate locations with probability 1 − p (representing an irregular trip); therefore, the theoretical maximum predictability is p + (1 − p)/m.

Markovian time-location sequences. At each step, the location is determined based on fixed time-location transition rules (e.g., (t1, a), (t2, b), (t1, a), (t3, c), (t3, c), (t2, b), ...) with probability q (representing a regular time-destination trip), and it is randomly selected from the m candidate locations with probability 1 − q (representing an irregular trip); therefore, the theoretical maximum predictability is q + (1 − q)/m.

We set the number of locations m to 10 and 20, respectively, and set length n to 200 and number of time periods h to 48 based on the work conducted by Xu et al. [36]. Furthermore, to diversify the regularity degrees in the simulation dataset, we set the interval to 0.1 and adjusted p, q in the range of [0, 1]. Finally, to reduce randomness, we generated 50 sequences for each p and q to form a simulation dataset containing 2200 trajectories.
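A generator for the Markovian location sequences might look like the following sketch. It assumes that the "fixed transition order" means moving from the current location to its successor in a cycle over the m locations, and that a random step draws uniformly from all m locations, so an ideal predictor that always guesses the successor is correct with probability p + (1 − p)/m. The function names and the random seed are hypothetical.

```python
import random


def markov_location_sequence(m, n, p, rng):
    """Generate n locations: cyclic successor with probability p, uniform otherwise."""
    seq = [rng.randrange(m)]
    while len(seq) < n:
        if rng.random() < p:
            seq.append((seq[-1] + 1) % m)  # regular origin-destination trip
        else:
            seq.append(rng.randrange(m))   # irregular trip
    return seq


def theoretical_predictability(p, m):
    """Accuracy of always predicting the cyclic successor."""
    return p + (1.0 - p) / m


rng = random.Random(0)
grid = [i / 10 for i in range(11)]  # p = 0.0, 0.1, ..., 1.0
dataset = [
    markov_location_sequence(m, 200, p, rng)
    for m in (10, 20)
    for p in grid
    for _ in range(50)
]
print(len(dataset))  # 1100
```

The grid yields 2 × 11 × 50 = 1100 location sequences; generating the same number of time-location sequences over q would give the 2200 trajectories mentioned above.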

Experimental settings

Evaluation metric. To evaluate the performance of location prediction, we used the standard top-K accuracy metric [64], which measures the proportion of test cases in which the ground-truth location appears among the top K predictions. The MAE was used to measure the mean absolute error between the theoretical predictability and the estimated predictability.
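Both metrics are simple to state in code. The sketch below assumes that each prediction is a list of candidate locations ranked from most to least likely; the function names and the toy inputs are hypothetical.

```python
def top_k_accuracy(ranked_predictions, truths, k):
    """Share of cases whose ground-truth location is among the first k candidates."""
    hits = sum(1 for ranked, truth in zip(ranked_predictions, truths) if truth in ranked[:k])
    return hits / len(truths)


def mean_absolute_error(estimated, theoretical):
    """Mean absolute difference between estimated and theoretical predictability."""
    return sum(abs(e - t) for e, t in zip(estimated, theoretical)) / len(estimated)


ranked = [["a", "b", "c"], ["c", "a", "b"], ["b", "c", "a"]]
truths = ["a", "a", "a"]
print(top_k_accuracy(ranked, truths, 1))  # 0.3333333333333333
print(top_k_accuracy(ranked, truths, 3))  # 1.0
```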

Next location prediction models. We selected Markov, HSTLSTM [64], DeepMove [65], LSTPM [66], and MHSA [67] as location prediction models. Specifically, we used the first-order Markov model; the other models are mainstream deep learning methods. Furthermore, for each user, we took the highest accuracy achieved by any of the prediction models as that user’s maximum prediction accuracy. The test results on the real-world datasets are presented in Table 3. The LSTPM model achieved the highest prediction accuracy on two datasets, recording 0.248 on the NYC dataset and 0.266 on the POA dataset. For the TKY dataset, DeepMove delivered the top result at 0.208, while MHSA proved most effective on the BJ dataset, reaching an accuracy of 0.557.

Table 3. Performance of location prediction models on the real-world datasets.

https://doi.org/10.1371/journal.pone.0342450.t003

Comparison of maximum predictability. We compared our method with five state-of-the-art baselines: 1) real entropy (RE) [13–16], which mainly measures the maximum predictability based on the location sequences; 2) refined real entropy (RRE) [36], which improves the calculation process of real entropy; 3) conditional entropy (CE) [19], which measures the maximum predictability based on the distribution of time (48 h)-origin-destination; 4) fusion conditional entropy (FCE) [68], which utilizes conditional entropy to calculate the entropy for different knowledge and selects the minimum value among them as the minimum entropy, and which we applied together with Fano’s inequality to the complete spatiotemporal knowledge; and 5) fusion multivariate sample entropy (FMSE), for which we applied the multivariate sample entropy [69–71] to the complete spatiotemporal knowledge and selected the maximum predictability derived from different knowledge types as the final result.

Evaluation of maximum predictability

We primarily conducted two experiments to validate the efficacy of our refined maximum predictability. We first evaluated various predictability methods by calculating the mean absolute error (MAE) between their estimated results and the theoretical ground truth in the simulation dataset. Then, using real-world datasets, we assessed their performance by analyzing the correlation between the estimated predictability and the corresponding maximum prediction accuracy.

1) Numerical experiment. Since the real-world datasets exhibit sample imbalance and lack ground truth for maximum predictability, we used the simulation dataset, which has balanced samples and a known theoretical maximum predictability, to verify the performance of the refined maximum predictability. Specifically, the performance of each predictability calculation method was evaluated by comparing its results to the theoretical maximum predictability value (TV), with smaller errors indicating better performance. As shown in Table 4, the refined maximum predictability yielded the lowest MAE across all experimental configurations, ranging from 0.058 to 0.066, whereas the real entropy and refined real entropy achieved the lowest MAE among all baselines, ranging from 0.095 to 0.286. Overall, the refined maximum predictability reduced the total MAE by 68% relative to the state-of-the-art methods. Specifically, as shown in Fig 3, on Markovian location sequences, the refined maximum predictability achieved much lower MAEs than real entropy for users with low predictability. On Markovian time-location sequences, because the real entropy and refined real entropy cannot capture the predictability of temporal knowledge, they maintained the same predictability value for individuals regardless of their temporal regularity. In contrast, our proposed method effectively captured the predictability of individuals with differing levels of temporal regularity. In addition, because of the non-repeating trips, the fusion conditional entropy obtained its highest predictability when considering the time (48 h)-origin-destination distribution knowledge. Thus, the predictabilities based on conditional entropy and fusion conditional entropy were equal and considerably higher than the theoretical value. These findings suggest that the proposed refined maximum predictability provides a significantly more accurate estimation of individual predictability in next location prediction than traditional methods.
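To illustrate why a Markovian location sequence has a known theoretical value, the sketch below computes the best achievable next-location accuracy for a first-order Markov chain with a known transition matrix: the ideal predictor always picks the most likely successor of the current location, so its expected accuracy is the stationary-weighted sum of the row maxima. The generating process of the simulation dataset is not restated in this section, so the chain, the function names, and the power-iteration settings here are assumptions for illustration only.

```python
from typing import List, Sequence


def stationary_distribution(transition: Sequence[Sequence[float]],
                            iterations: int = 1000) -> List[float]:
    """Approximate the stationary distribution by repeated multiplication.

    Assumes each row sums to 1; starts from the uniform distribution.
    """
    n = len(transition)
    for row in transition:
        if len(row) != n or abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("transition must be a square row-stochastic matrix")
    dist = [1.0 / n] * n
    for _ in range(iterations):
        dist = [sum(dist[i] * transition[i][j] for i in range(n))
                for j in range(n)]
    return dist


def markov_theoretical_predictability(transition: Sequence[Sequence[float]]) -> float:
    """Expected accuracy of always predicting the most likely next location."""
    pi = stationary_distribution(transition)
    return sum(p * max(row) for p, row in zip(pi, transition))
```

For example, a two-location chain with rows (0.9, 0.1) and (0.5, 0.5) spends 5/6 of the time in the first location, giving a theoretical value of 5/6 × 0.9 + 1/6 × 0.5 = 5/6.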

Table 4. Performance of different maximum predictabilities on simulation dataset.

https://doi.org/10.1371/journal.pone.0342450.t004

Fig 3. Predictability based on simulation dataset.

(a): Predictability based on Markovian location sequences. (b): Predictability based on Markovian time-location sequences.

https://doi.org/10.1371/journal.pone.0342450.g003

2) Case study. To further validate the performance of the refined maximum predictability on real-world datasets, we compared, for each method, the correlation between individual maximum prediction accuracy and the estimated predictability on the four real-world datasets. A higher correlation indicates that the predictability calculation method better estimates individual regularity. We selected only users with more than 10 test samples, since their prediction accuracies are more reliable. As shown in Table 5, the overall correlation between the maximum accuracy and the refined maximum predictability was the highest, with correlation improvements ranging from 4.9% to 16.42% compared to the baselines. Specifically, the correlation values of the refined maximum predictability ranged from 0.628 to 0.728, whereas those of the refined real entropy ranged from 0.549 to 0.694. Moreover, Fig 4 shows that the prediction accuracy increased with the refined maximum predictability, and that the predictability values were larger than the prediction accuracies for most users. These findings indicate that the refined maximum predictability reflects individual regularity well, regardless of whether the input data consists of sparsely sampled check-ins or densely sampled GPS trajectories.
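The case-study procedure can be sketched as follows: take each user’s best accuracy across models, keep users with more than 10 test samples, and correlate accuracy with predictability. The section does not state which correlation coefficient was used, so the Pearson coefficient here is an assumption, as are the record layout and function names.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical per-user record: (n_test_samples, accuracy_by_model, predictability)
UserRecord = Tuple[int, Dict[str, float], float]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


def accuracy_predictability_correlation(users: Dict[str, UserRecord],
                                        min_test_samples: int = 10) -> float:
    """Correlate per-user maximum accuracy with predictability.

    Only users with strictly more than min_test_samples test samples are kept.
    """
    kept = [(max(acc_by_model.values()), predictability)
            for n_test, acc_by_model, predictability in users.values()
            if n_test > min_test_samples]
    accuracies = [a for a, _ in kept]
    predictabilities = [p for _, p in kept]
    return pearson(accuracies, predictabilities)
```

Filtering before correlating matters because a user with only a handful of test samples can have an accuracy of 0 or 1 by chance, which would distort the coefficient.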

Table 5. Correlation between the maximum prediction accuracy and predictability on real-world datasets.

https://doi.org/10.1371/journal.pone.0342450.t005

Fig 4. Correlation between the maximum accuracy and refined maximum predictability on real-world datasets.

https://doi.org/10.1371/journal.pone.0342450.g004

We also compared the efficiency of the different maximum predictability calculation methods on the NYC dataset. The experiments were conducted on a computer equipped with 16 GB of memory and an Intel Core i7-8700 CPU. As shown in Table 6, the maximum predictability based on real entropy was the fastest to compute, requiring only 0.67 ms per user. In contrast, the method based on fusion multivariate sample entropy was the slowest. Whereas the real entropy, refined real entropy, and conditional entropy consider only one or two types of knowledge, the proposed refined maximum predictability calculates predictability for four different types of knowledge, which cumulatively increases the computational cost. It therefore took 5.40 ms to calculate the predictability for one user, which corresponds to about 185 users per second in single-threaded computation and is sufficient for practical application. In summary, the proposed method sacrifices a small amount of computational speed to deliver a more comprehensive and accurate predictability assessment that can be used to evaluate and improve location prediction methods in ways the traditional methods cannot.

Table 6. The prediction speeds of different methods on NYC dataset.

https://doi.org/10.1371/journal.pone.0342450.t006

Applications of knowledge preference

Based on the predictability for different knowledge, we further analyzed individual spatiotemporal knowledge preference and evaluated prediction models on real-world datasets.

1) Dataset analysis. The analysis of the maximum predictability is useful in many fields, such as traffic planning and management. The distribution of the proposed refined maximum predictability is shown in Fig 5. The datasets exhibited distinct predictability distributions. The BJ dataset was characterized by a high proportion of highly predictable individuals, with an average predictability of 0.671. Conversely, TKY had more low-predictability individuals, averaging 0.278. NYC and POA mainly consisted of moderately regular individuals, with averages of 0.413 and 0.434, respectively. Furthermore, the dominant predictability types were Gd and God for NYC and BJ, while Gd and Gtd were most prevalent in TKY and POA. These results reflect the underlying diversity of the datasets.
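One way to picture how users fall into the Gd, God, Gtd, and Gtod groups is sketched below. It assumes that each user has one predictability value per knowledge type, that the user’s preference group is the type with the highest value, and that ties go to the simpler knowledge type. These rules, the dictionary keys, and the function names are assumptions for illustration; the method’s exact definition is given earlier in the paper and may differ.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Order encodes the assumed tie-breaking preference: simpler knowledge first.
KNOWLEDGE_TYPES = ("d", "od", "td", "tod")


def preference_group(predictability_by_knowledge: Dict[str, float]) -> Tuple[str, float]:
    """Return (group label, predictability) for the best knowledge type."""
    best = max(KNOWLEDGE_TYPES, key=lambda k: predictability_by_knowledge[k])
    return "G" + best, predictability_by_knowledge[best]


def group_shares(users: Iterable[Dict[str, float]]) -> Dict[str, float]:
    """Share of users falling into each preference group."""
    labels = [preference_group(user)[0] for user in users]
    if not labels:
        raise ValueError("at least one user is required")
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}
```

Applied per dataset, `group_shares` would produce the kind of group proportions that the discussion of dominant predictability types refers to.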

Fig 5. Distribution of the refined maximum predictability on real-world datasets.

https://doi.org/10.1371/journal.pone.0342450.g005

2) Evaluations on prediction models. We compared the refined maximum predictability with prediction accuracy to evaluate the knowledge utilization and selection for location prediction models, which can guide the design and improvement of prediction models. Finally, we applied the evaluation results to improve the Markov model.

Knowledge utilization. The distributions of refined maximum predictability and prediction accuracy for the different knowledge preference groups are shown in Fig 6. Overall, the prediction accuracy of all methods consistently fell below the refined maximum predictability, confirming that the proposed refined maximum predictability represents the maximum potential predictability well. Regarding knowledge utilization, users in the God group with a predictability exceeding 0.8 exhibited larger gaps than other groups, highlighting significant room for improvement in utilizing origin-destination distribution knowledge. Regarding prediction methods, different approaches excelled in different areas: LSTPM and MHSA achieved the best prediction performance for the Gd and Gtd groups, while the Markov model performed best on the God and Gtod groups, which is consistent with its primary focus on origin-destination distribution knowledge. These findings demonstrate that our proposed method offers a more comprehensive evaluation of prediction methods, thereby guiding their future improvement and selection.

Fig 6. Distributions of the refined maximum predictability and prediction accuracy for different knowledge preference groups.

https://doi.org/10.1371/journal.pone.0342450.g006

Knowledge selection. The importance of new knowledge can be evaluated based on the refined maximum predictability with new knowledge. To illustrate this, we used the next check-in time as an example. We extended the previous spatiotemporal knowledge to incorporate the distributions of the next check-in time-destination and next check-in time-origin-destination. The average maximum predictability then reached 0.426, 0.286, 0.469, and 0.694 in NYC, TKY, POA, and BJ, respectively. These are slightly higher than the predictability without considering the next check-in time. Therefore, the next check-in time is useful for location prediction.

Improvement on prediction model. The prediction models can be improved using the evaluation results. Here, we enhanced the Markov model accordingly. Specifically, we selected the knowledge corresponding to each user’s preference group as prior information for location prediction, defaulting to origin-destination distribution knowledge when necessary. As shown in Table 7, the improved Markov model raised Top 1 accuracy by 0.5–4.4 percentage points compared to the Markov model. Furthermore, the improved Markov model outperformed all baselines on the NYC dataset, while it slightly underperformed DeepMove, LSTPM, and MHSA on the other datasets. This discrepancy may be attributed to variations in regularity distributions across these datasets. The Top 5 and Top 10 accuracies of the improved Markov model were close to those of LSTPM, MHSA, and DeepMove, indicating that the personalized selection and full utilization of spatiotemporal knowledge are important for location prediction. In addition, the improved Markov model does not require training, which saves training resources, and thus has substantial application potential.
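The idea of selecting knowledge per preference group can be sketched as a frequency-count predictor whose conditioning context depends on the user’s group: nothing for d, the origin for od, the time slot for td, and both for tod. When the selected context was never observed, the sketch falls back to the origin-destination counts, as described above. The final fallback to overall destination frequency, the trip tuple layout, and all names are assumptions added for this example and do not reproduce the authors’ implementation.

```python
from collections import Counter, defaultdict
from typing import Iterable, List, Tuple

Trip = Tuple[int, str, str]  # hypothetical layout: (time_slot, origin, destination)


def context_key(group: str, time_slot: int, origin: str) -> tuple:
    """Conditioning context used by each knowledge type."""
    if group == "d":
        return ()
    if group == "od":
        return (origin,)
    if group == "td":
        return (time_slot,)
    if group == "tod":
        return (time_slot, origin)
    raise ValueError(f"unknown knowledge group: {group}")


class KnowledgeFrequencyPredictor:
    """Counts destinations per context; no gradient-based training involved."""

    def __init__(self, group: str) -> None:
        context_key(group, 0, "")  # validate the group name early
        self.group = group
        # Preferred knowledge first, then the od default, then d (assumed extra).
        self.fallback_order = (group, "od", "d")
        self.counts = {g: defaultdict(Counter) for g in set(self.fallback_order)}

    def fit(self, trips: Iterable[Trip]) -> "KnowledgeFrequencyPredictor":
        for time_slot, origin, destination in trips:
            for g, table in self.counts.items():
                table[context_key(g, time_slot, origin)][destination] += 1
        return self

    def predict(self, time_slot: int, origin: str, k: int = 1) -> List[str]:
        """Top-k destinations from the first context that has observations."""
        for g in self.fallback_order:
            counter = self.counts[g].get(context_key(g, time_slot, origin))
            if counter:
                return [loc for loc, _ in counter.most_common(k)]
        return []
```

Because fitting only accumulates counts, one such predictor can be built per user from that user’s history, with the group chosen from the user’s knowledge preference.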

Table 7. Performance of the location prediction models.

https://doi.org/10.1371/journal.pone.0342450.t007

In summary, the proposed refined maximum predictability estimates individual maximum predictability and spatiotemporal preferences well, supporting predictability analysis and the improvement of location prediction models. It enables: 1) Enhanced travel services, such as accurate trip time estimations and tailored recommendations for routes, parking, and Points of Interest. 2) Traffic management support. The maximum predictability determines the controllable boundaries and potential effectiveness of management strategies, and the prediction results for high-predictability groups enable better assessment of future traffic conditions. On the one hand, this information guides macroscopic traffic management, such as traffic signal control, ramp metering, and the optimization of public transit routes and schedules. On the other hand, it facilitates personalized travel guidance: customized incentives can be used to induce highly predictable individuals to adjust their travel time or route, thereby alleviating congestion. Moreover, this methodology can be extended to other scenarios, such as sequence recommendation.

Discussion

1) Threshold for non-repeating trips. The non-repeating trip threshold, an essential component in calculating the refined maximum predictability, is determined by the average location frequency and a reduction factor (β). An excessively high reduction factor results in a low non-repeating trip threshold, which may overestimate the predictability of high-frequency but irregular trips. Conversely, a reduction factor that is too low results in a high threshold, which may underestimate the predictability of low-frequency but regular trips. To determine the optimal setting, we evaluated the impact of different β values on the correlation between maximum prediction accuracy and refined maximum predictability, using data from four diverse cities with varying sampling granularities. The experimental results are shown in Fig 7. Although the optimal range varied slightly by city, all datasets performed comparatively best when β was set to 8 or above, indicating a degree of universality for this range and leading us to adopt 8 as the β value for these datasets. Furthermore, because the average location frequency of most individuals was less than 8 in these datasets, their non-repeating trip threshold stabilized at 1, making the overall predictability insensitive to further increases in β within this range. In summary, because the underlying distribution of individual travel patterns differs by city, the precise optimal range of β may vary slightly. Consequently, we recommend selecting a value of 8 or above for practical applications based on our empirical range, preferably a larger one. Alternatively, the optimal value can be determined by reevaluating the correlation between location prediction accuracy and predictability on the new dataset.
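The qualitative behaviour described above can be reproduced with a simple rule, sketched here under explicit assumptions: the threshold is taken as the average location frequency divided by β, rounded down, and never below 1. This form is chosen only because it matches the stated properties (a larger β lowers the threshold, and the threshold settles at 1 once β exceeds the average frequency). The exact formula and rounding are defined earlier in the paper and may differ, and the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def non_repeating_trip_threshold(locations: Sequence[str], beta: float) -> int:
    """Assumed rule: max(1, floor(average location frequency / beta))."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    if not locations:
        raise ValueError("at least one visited location is required")
    distinct = Counter(locations)
    # Average number of visits per distinct location.
    average_frequency = len(locations) / len(distinct)
    return max(1, math.floor(average_frequency / beta))
```

For a user with 20 visits spread over 4 distinct locations, the average frequency is 5, so this rule gives a threshold of 5 at β = 1, 2 at β = 2, and 1 for any β of 8 or above.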

Fig 7. Correlations between the maximum accuracy and predictability for different β values.

https://doi.org/10.1371/journal.pone.0342450.g007

2) Spatiotemporal knowledge. Here, we discuss the significance of considering the four types of spatiotemporal knowledge. As presented in Table 8, the correlation between maximum accuracy and predictability was the highest when considering all four types of knowledge. Specifically, when considering only destination distribution knowledge (d), the correlations ranged from 0.4923 to 0.7180. Adding temporal and origin-related knowledge to the refined maximum predictability provided a consistent improvement in correlation across all datasets. The BJ dataset showed the largest overall increase in correlation when incorporating this knowledge, rising from 0.4923 to 0.6281. Fig 5 also shows that each knowledge preference group accounted for a specific proportion of users in the four datasets. These findings indicate the crucial role of additional knowledge in enhancing the refined maximum predictability and that all four types of spatiotemporal knowledge should be considered.

Table 8. Correlations between the maximum accuracy and predictability when considering different types of knowledge on real-world datasets.

https://doi.org/10.1371/journal.pone.0342450.t008

Conclusion

In this work, we focused on the estimation and applications of maximum predictability. Three problems persisted in existing research: incomplete consideration of spatiotemporal information, inadequate entropic measures, and the lack of use of predictability for detailed analysis of individual regularity. To address these problems, we summarized spatiotemporal information and categorized it into four types. Then, we proposed a refined maximum predictability utilizing fusion knowledge and Shannon entropy. This method estimates predictability more accurately and identifies individual knowledge preferences, thereby improving group classification and prediction models. Our experiments demonstrated that the refined maximum predictability achieved a superior balance between spatial and temporal information, reducing the MAE on the simulation dataset by 68% relative to classical methods. Moreover, the prediction model evaluations demonstrated the substantial potential of improving location prediction models with knowledge preference. Furthermore, the refined maximum predictability supports the inclusion or exclusion of various information types, allowing it to be generalized to mobility datasets with diverse spatiotemporal scales and information types, as well as extended to other prediction scenarios, such as sequence recommendation.

Acknowledgments

We are grateful to Associate Professor Yang’s team for the Foursquare datasets and Yu Zheng’s team for the Geolife dataset.

References

1. Li J, Fu D, Yuan Q, Zhang H, Chen K, Yang S, et al. A traffic prediction enabled double rewarded value iteration network for route planning. IEEE Trans Veh Technol. 2019;68(5):4170–81.
2. Tang L, Duan Z, Zhu Y, Ma J, Liu Z. Recommendation for ridesharing groups through destination prediction on trajectory data. IEEE Trans Intell Transport Syst. 2021;22(2):1320–33.
3. Song C, Qu Z, Blumm N, Barabási A-L. Limits of predictability in human mobility. Science. 2010;327(5968):1018–21.
4. Bagrow JP, Liu X, Mitchell L. Information flow reveals prediction limits in online social activity. Nat Hum Behav. 2019;3(2):122–8. pmid:30944448
5. Wang T, Cook DJ, Fischer TR. The indoor predictability of human mobility: estimating mobility with smart home sensors. IEEE Trans Emerg Top Comput. 2023;11(1):182–93. pmid:37457914
6. Wang Y, Yalcin A, VandeWeerd C. An entropy-based approach to the study of human mobility and behavior in private homes. PLoS One. 2020;15(12):e0243503. pmid:33301515
7. Xu E, Yu Z, Li N, Cui H, Yao L, Guo B. Quantifying predictability of sequential recommendation via logical constraints. Front Comput Sci. 2022;17(5).
8. Xu E, Zhao K, Yu Z, Zhang Y, Guo B, Yao L. Limits of predictability in top-N recommendation. Information Processing & Management. 2024;61(4):103731.
9. Li C, Lin Q, Huang D, Grifoll M, Yang D, Feng H. Is entropy an indicator of port traffic predictability? The evidence from Chinese ports. Physica A: Statistical Mechanics and its Applications. 2023;612:128483.
10. Zhao K, Khryashchev D, Freire J, Silva C, Vo H. Predicting taxi demand at high spatial resolution: Approaching the limit of predictability. In: 2016 IEEE International Conference on Big Data (Big Data). 2016. p. 833–42. https://doi.org/10.1109/bigdata.2016.7840676
11. Li H, He F, Lin X, Wang Y, Li M. Travel time reliability measure based on predictability using the Lempel–Ziv algorithm. Transportation Research Part C: Emerging Technologies. 2019;101:161–80.
12. Wang J, Mao Y, Li J, Xiong Z, Wang W-X. Predictability of road traffic and congestion in urban areas. PLoS One. 2015;10(4):e0121825. pmid:25849534
13. Wan S, Meng J, Fang S, Xing X, Xie K, Bian K. Predictability analysis on expressway vehicle mobility using electronic toll collection data. In: 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC); 2016. p. 2589–94. https://doi.org/10.1109/itsc.2016.7795972
14. Ikanovic EL, Mollgaard A. An alternative approach to the limits of predictability in human mobility. EPJ Data Sci. 2017;6(1):12.
15. Cuttone A, Lehmann S, González MC. Understanding predictability and exploration in human mobility. EPJ Data Sci. 2018;7(1):2.
16. Lima E, Aguiar A, Carvalho P, Viana AC. Human mobility support for personalized data offloading. IEEE Trans Netw Serv Manage. 2022;19(2):1505–20.
17. Teixeira DDC, Viana AC, Almeida JM, Alvim MS. The impact of stationarity, regularity, and context on the predictability of individual human mobility. ACM Trans Spatial Algorithms Syst. 2021;7(4):1–24.
18. Teixeira D do C, Viana AC, Alvim MS, Almeida JM. Deciphering predictability limits in human mobility. In: Proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. 2019. p. 52–61. https://doi.org/10.1145/3347146.3359093
19. Zhang C, Zhao K, Chen M. Beyond the limits of predictability in human mobility prediction: context-transition predictability. IEEE Trans Knowl Data Eng. 2022;:1–1.
20. Jiang H, Zhang Y, Xiao Z, Zhao P, Iyengar A. An empirical study of travel behavior using private car trajectory data. IEEE Trans Netw Sci Eng. 2021;8(1):53–64.
21. Cheng Z, Trépanier M, Sun L. Probabilistic model for destination inference and travel pattern mining from smart card data. Transportation. 2020;48(4):2035–53.
22. Hu S, Liang Q, Qian H, Weng J, Zhou W, Lin P. Frequent-pattern growth algorithm based association rule mining method of public transport travel stability. International Journal of Sustainable Transportation. 2020;15(11):879–92.
23. Zhu L, Gonder J, Lin L. Prediction of individual social-demographic role based on travel behavior variability using long-term GPS data. Journal of Advanced Transportation. 2017;2017:1–13.
24. Leng Y, Zhao J, Koutsopoulos H. Leveraging individual and collective regularity to profile and segment user locations from mobile phone data. ACM Trans Manage Inf Syst. 2021;12(3):1–22.
25. Yang C, Yan F, Ukkusuri SV. Unraveling traveler mobility patterns and predicting user behavior in the Shenzhen metro system. Transportmetrica A: Transport Science. 2017;14(7):576–97.
26. Huang Y, Xiao Z, Wang D, Jiang H, Wu D. Exploring individual travel patterns across private car trajectory data. IEEE Trans Intell Transport Syst. 2020;21(12):5036–50.
27. Lu H, Chen M. Travel pattern analysis based on bus GPS and card record. In: 19th COTA International Conference of Transportation Professionals; 2019.
28. Chen Y, Zhao Y, Tsui KL. Clustering-based Travel Pattern Recognition in Rail Transportation System Using Automated Fare Collection Data. In: 2019 Prognostics and System Health Management Conference (PHM-Qingdao); 2019. p. 1–7.
29. Lei D, Chen X, Cheng L, Zhang L, Ukkusuri SV, Witlox F. Inferring temporal motifs for travel pattern analysis using large scale smart card data. Transportation Research Part C: Emerging Technologies. 2020;120:102810.
30. Kieu L-M, Bhaskar A, Chung E. A modified density-based scanning algorithm with noise for spatial travel pattern analysis from smart card AFC data. Transportation Research Part C: Emerging Technologies. 2015;58:193–207.
31. Sun L, Chen X, He Z, Miranda-Moreno LF. Routine pattern discovery and anomaly detection in individual travel behavior. Netw Spat Econ. 2021;23(2):407–28.
32. Zhao J, Tian C, Zhang F, Xu C, Feng S. Understanding temporal and spatial travel patterns of individual passengers by mining smart card data. In: 17th International IEEE Conference on Intelligent Transportation Systems (ITSC); 2014. p. 2991–7. https://doi.org/10.1109/itsc.2014.6958170
33. Yao W, Zhang M, Jin S, Ma D. Understanding vehicles commuting pattern based on license plate recognition data. Transportation Research Part C: Emerging Technologies. 2021;128:103142.
34. Zhao J, Qu Q, Zhang F, Xu C, Liu S. Spatio-temporal analysis of passenger travel patterns in massive smart card data. IEEE Trans Intell Transport Syst. 2017;18(11):3135–46.
35. Goulet-Langlois G, Koutsopoulos HN, Zhao Z, Zhao J. Measuring regularity of individual travel patterns. IEEE Trans Intell Transport Syst. 2018;19(5):1583–92.
36. Xu P, Yin L, Yue Z, Zhou T. On predictability of time series. Physica A: Statistical Mechanics and its Applications. 2019;523:345–51.
37. Yu Z, Dang M, Wu Q, Chen L, Xie Y, Wang Y, et al. An information theory based method for quantifying the predictability of human mobility. ACM Trans Knowl Discov Data. 2023;17(9):1–19.
38. Smith G, Wieser R, Goulding J, Barrack D. A refined limit on the predictability of human mobility. In: 2014 IEEE International Conference on Pervasive Computing and Communications (PerCom). 2014. p. 88–94. https://doi.org/10.1109/percom.2014.6813948
39. Osgood ND, Paul T, Stanley KG, Qian W. A theoretical basis for entropy-scaling effects in human mobility patterns. PLoS One. 2016;11(8):e0161630. pmid:27571423
40. Zeng S, Wang H, Li Y, Jin D. Predictability and prediction of human mobility based on application-collected location data. In: 2017 IEEE 14th International Conference on Mobile Ad Hoc and Sensor Systems (MASS); 2017. p. 28–36.
41. Liao Y, Yeh S. Predictability in human mobility based on geographical-boundary-free and long-time social media data. In: 2018 21st International Conference on Intelligent Transportation Systems (ITSC). 2018. p. 2068–73. https://doi.org/10.1109/itsc.2018.8569770
42. Kulkarni V, Mahalunkar A, Garbinato B, Kelleher JD. Examining the limits of predictability of human mobility. Entropy (Basel). 2019;21(4):432. pmid:33267146
43. Lu X, Wetter E, Bharti N, Tatem AJ, Bengtsson L. Approaching the limit of predictability in human mobility. Sci Rep. 2013;3:2923. pmid:24113276
44. Zhang J, Hasan S, Roy KC, Yan X. Predicting individual mobility behavior of ride-hailing service users considering heterogeneity of trip purposes. In: 2021 IEEE International Intelligent Transportation Systems Conference (ITSC). 2021. p. 3685–90. https://doi.org/10.1109/itsc48978.2021.9565125
45. Fano RM, Wintringham WT. Transmission of information. Physics Today. 1961;14(12):56–8.
46. Brabazon A, O’Neill M. Natural computing in computational finance: an introduction. Natural computing in computational finance. Berlin, Heidelberg: Springer; 2008. p. 1–4.
47. Yang D, Zhang D, Zheng VW, Yu Z. Modeling user activity preference by leveraging user spatial temporal characteristics in LBSNs. IEEE Transactions on Systems, Man, and Cybernetics: Systems. 2015;45(1):129–42.
48. Yang D, Zhang D, Qu B. Participatory cultural mapping based on collective behavior data in location-based social networks. ACM Trans Intell Syst Technol. 2016;7(3):1–23.
49. Yang D, Zhang D, Chen L, Qu B. NationTelescope: monitoring and visualizing large-scale collective behavior in LBSNs. Journal of Network and Computer Applications. 2015;55:170–80.
50. Zheng Y, Chen Y, Xie X, Ma WY. GeoLife2.0: a location-based social networking service. In: 2009 Tenth International Conference on Mobile Data Management: Systems, Services and Middleware; 2009. p. 357–8. https://ieeexplore.ieee.org/abstract/document/5088957
51. Ghane’i-Ostad M, Vahdat-Nejad H, Abdolrazzagh-Nezhad M. Detecting overlapping communities in LBSNs by fuzzy subtractive clustering. Soc Netw Anal Min. 2018;8(1):23.
52. Betancourt F, Riascos AP, Mateos JL. Temporal visitation patterns of points of interest in cities on a planetary scale: a network science and machine learning approach. Sci Rep. 2023;13(1):4890. pmid:36966183
53. Wang D, Wang P, Fu Y, Liu K, Xiong H, Hughes CE. Reinforced imitative graph learning for mobile user profiling. IEEE Trans Knowl Data Eng. 2023;35(12):12944–57.
54. Yang D, Qu B, Yang J, Cudre-Mauroux P. Revisiting User Mobility and Social Relationships in LBSNs: a hypergraph embedding approach. In: The World Wide Web Conference. 2019. p. 2147–57. https://doi.org/10.1145/3308558.3313635
55. Zhang L, Long C, Cong G. Region embedding with intra and inter-view contrastive learning. IEEE Trans Knowl Data Eng. 2023;35(9):9031–6.
56. He Y, Zhou W, Luo F, Gao M, Wen J. Feature-based POI grouping with transformer for next point of interest recommendation. Applied Soft Computing. 2023;147:110754.
57. Chen W, Wan H, Guo S, Huang H, Zheng S, Li J, et al. Building and exploiting spatial–temporal knowledge graph for next POI recommendation. Knowledge-Based Systems. 2022;258:109951.
58. Jiang S, He W, Cui L, Xu Y, Liu L. Modeling long- and short-term user preferences via self-supervised learning for next POI recommendation. ACM Trans Knowl Discov Data. 2023;17(9):1–20.
59. Li S, Chen W, Wang B, Huang C, Yu Y, Dong J. MCN4Rec: multi-level collaborative neural network for next location recommendation. ACM Trans Inf Syst. 2024;42(4):1–26.
60. Liu Y, Wu H, Rezaee K, Khosravi MR, Khalaf OI, Khan AA, et al. Interaction-enhanced and time-aware graph convolutional network for successive point-of-interest recommendation in traveling enterprises. IEEE Trans Ind Inf. 2023;19(1):635–43.
61. Zheng Y, Zhou X. Modeling multi-factor user preferences based on transformer for next point of interest recommendation. Expert Systems with Applications. 2024;255:124894.
62. Huang T, Pan X, Cai X, Zhang Y, Yuan X. Learning time slot preferences via mobility tree for next POI recommendation. AAAI. 2024;38(8):8535–43.
63. Kumar A, Jain DK, Mallik A, Kumar S. Modified node2vec and attention based fusion framework for next POI recommendation. Information Fusion. 2024;101:101998.
64. Kong D, Wu F. HST-LSTM: a hierarchical spatial-temporal long-short term memory network for location prediction. In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence. 2018. p. 2341–7. https://doi.org/10.24963/ijcai.2018/324
65. Feng J, Li Y, Zhang C, Sun F, Meng F, Guo A, et al. DeepMove: predicting human mobility with attentional recurrent networks. In: Proceedings of the 2018 World Wide Web Conference. WWW ’18. Republic and Canton of Geneva, CHE: International World Wide Web Conferences Steering Committee; 2018. p. 1459–68.
66. Sun K, Qian T, Chen T, Liang Y, Nguyen QVH, Yin H. Where to go next: modeling long- and short-term user preferences for point-of-interest recommendation. AAAI. 2020;34(01):214–21.
67. Hong Y, Zhang Y, Schindler K, Raubal M. Context-aware multi-head self-attentional neural network model for next location prediction. Transportation Research Part C: Emerging Technologies. 2023;156:104315.
68. Fassinut-Mombot B, Choquel J-B. A new probabilistic and entropy fusion approach for management of information sources. Information Fusion. 2004;5(1):35–47.
69. Ahmed MU, Mandic DP. Multivariate multiscale entropy analysis. IEEE Signal Process Lett. 2012;19(2):91–4.
70. Zhao Q, Yang G, Zhao K, Yin J, Rao W, Chen L. Multivariate time-series forecasting model: predictability analysis and empirical study. IEEE Trans Big Data. 2023;9(6):1536–48.
71. Looney D, Adjei T, Mandic DP. A novel multivariate sample entropy algorithm for modeling time series synchronization. Entropy (Basel). 2018;20(2):82. pmid:33265173

