## Abstract

Mitigating traffic congestion on urban roads, which is of paramount importance for urban development and for reducing energy consumption and air pollution, depends on our ability to foresee road usage and traffic conditions arising from the collective behavior of drivers. This raises a significant question: to what degree is road traffic predictable in urban areas? Here we rely on precise records of daily vehicle mobility from GPS positioning devices installed in taxis to uncover the potential daily predictability of urban traffic patterns. By mapping the degree of congestion on roads into a time series of symbols and measuring its entropy, we find a relatively high daily predictability of traffic conditions, despite the absence of any a priori knowledge of drivers' origins and destinations and despite quite different travel patterns between weekdays and weekends. Moreover, we find a counterintuitive dependence of the predictability on travel speed: road segments with intermediate average travel speed are the most difficult to predict. We also explore the possibility of recovering the traffic condition of an inaccessible segment from its adjacent segments under limited observability. The highly predictable traffic patterns, in spite of the heterogeneity of drivers' behaviors and the variability of their origins and destinations, enable the development of accurate predictive models and, eventually, practical strategies to mitigate urban road congestion.

**Citation:** Wang J, Mao Y, Li J, Xiong Z, Wang W-X (2015) Predictability of Road Traffic and Congestion in Urban Areas. PLoS ONE 10(4): e0121825. https://doi.org/10.1371/journal.pone.0121825

**Academic Editor:** Matjaz Perc, University of Maribor, SLOVENIA

**Received:** November 23, 2014; **Accepted:** February 6, 2015; **Published:** April 7, 2015

**Copyright:** © 2015 Wang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability:** Data are available from Figshare, under the DOI: http://dx.doi.org/10.6084/m9.figshare.1294744.

**Funding:** This work is supported by National High Technology Research and Development Program of China (No. 2013AA01A601) http://www.863.gov.cn/, National Natural Science Foundation of China (No. 61272350) http://www.nsfc.gov.cn/. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Competing interests:** The authors have declared that no competing interests exist.

## Introduction

The past decades have witnessed a rapid development of modern society accompanied by an increasing demand for mobility in metropolises [1–4]. The resulting conflict between limited road capacity and growing traffic demand is reflected in severe traffic congestion [5–7]. As a consequence, citizens suffer from reduced travel efficiency and from increases in both fuel consumption [8] and air pollution [9] related to vehicle emissions. For instance, in recent years, a number of major cities in China have frequently experienced persistent haze, raising the need for better traffic management to mitigate congestion, which is likely one of the main factors behind the pollution [10, 11]. Despite much effort dedicated to addressing the problems of traffic jams [12], urban planning [13, 14], traffic prediction [15–17], as well as subway and bus system design [18, 19], we still lack a comprehensive understanding of the dynamical behaviors of urban traffic. The difficulty stems from two factors: the lack of systematic and accurate data in conventional research based on travel surveys, and the diversity of drivers’ complex self-adaptive behaviors in making routing decisions [20, 21]. Fortunately, “big data”, an inevitable outcome of the information era, opens new routes to reinvent urban traffic systems and offers solutions for increasingly serious traffic jams [22]. In this light, mobile phone and location-based service (LBS) data have been employed to explore road usage patterns in urban areas [23–26]. However, predicting traffic conditions is a prerequisite for eventually implementing control on road traffic. This prompts us to ask to what degree traffic flow on complex road networks is predictable, given the high self-adaptivity of drivers and without any a priori knowledge of their origins and destinations.

In this paper, we explore for the first time the predictability of urban traffic and congestion by using comprehensive records of Global Positioning System (GPS) devices installed in vehicles. The data provide the velocity and locations of a large number of taxis in real time, enabling investigation and quantification of the predictability of segments of main roads in an urban road network. In particular, we establish a mapping from the degree of congestion on a road segment into a time series of symbols, which allows us to exploit tools from information theory, such as entropy [27] and Fano’s inequality [28], to measure the predictability of the traffic condition on a road segment. Our methodology is inspired by the seminal work of Song et al., who incorporate information theory into time series analysis to measure the limited predictability of individual mobility [29]. Our main contribution is that we extend the tools of time series analysis from the individual level to the collective dynamics of road traffic, by mapping the vehicle records from GPS into road usage so as to offer the predictability of traffic conditions at different locations. In contrast to the traditional approach based on origin-destination analysis [30], our approach relies only on short-time historical records of traffic conditions, without the need for a priori knowledge of drivers’ origins and destinations and their associated navigation strategies. Our access to such individual-level information is inherently limited by the diversity of the population, job switching, relocation and urbanization.
Our research yields a number of interesting findings, including relatively high daily predictability of traffic conditions on the three Ring Roads in Beijing [31] despite quite different travel patterns on weekends compared to working days, the non-monotonic dependence of the predictability on vehicle velocity, and the recoverability of the traffic condition of an inaccessible segment from the information of its adjacent observable segments. Thus we present a general and practical approach for understanding the predictability of real-time urban road traffic and for devising effective control strategies to improve the roads’ level of service.

## Results

We explore the predictability of traffic conditions by using the GPS records of more than 12000 taxis in Beijing, China (see Methods for data description and processing). We focus on the three Ring Roads, the 2nd, 3rd and 4th Rings in Beijing, by mapping the states of vehicles into the traffic conditions on the roads. The three rings bear the heaviest traffic burden in Beijing, and the high-frequency data records pertaining to them are sufficient for quantifying their traffic conditions. In particular, we divide each ring into a number of segments with a given *segment* *length* Δ*L*, and measure the traffic condition of each segment by the average velocity of vehicles. To simplify our study, we discretize the average velocity of the segments in the range from 0km/h to the speed limit 100km/h with a certain *speed* *level* *interval* Δ*V*, e.g., 10km/h. Thus, the mapping gives rise to a time series of discrete speed states for each road segment, which allows us to apply discrete time series analysis to reveal intrinsic traffic patterns. The dynamical behavior of a whole ring can then be quantified by that of all its segments. Fig 1 shows the transition probabilities between different ranges of speed, i.e., speed states. We find that, on average, a speed state is more likely to remain unchanged or shift to a nearby state than to change to a distant state. These observations imply the existence of a potentially stable transition pattern that may facilitate the prediction of traffic conditions and congestion from historical records.

Fig 1. (a)-(c) Transition probability between different speed states in the 2nd (a), 3rd (b) and 4th (c) Ring Roads of Beijing. The speed *V* between 10km/h and 70km/h is divided into 6 states with equal speed interval Δ*V* = 10km/h. Because observations with *V* < 10km/h and *V* > 70km/h are rare, each of these two ranges is treated as a single state without further partition. For each Ring Road, the result is obtained by averaging over all road segments with equal length Δ*L* = 1km. We see that for each state, remaining unchanged and shifting to adjacent states account for a very large proportion of transitions, implying a potentially stable regularity in the traffic patterns.
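The mapping from average speeds to speed states, and the estimation of transition probabilities of the kind summarized in Fig 1, can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `speed_state` and `transition_matrix`, their default arguments and the toy speed series are assumptions introduced here. Only the binning (one state below 10km/h, six states of width 10km/h up to 70km/h, one state above) follows the caption.

```python
def speed_state(v_kmh, dv=10.0, v_lo=10.0, v_hi=70.0):
    """Map an average speed (km/h) to a discrete state index.

    State 0 collects all speeds below v_lo, the last state collects all
    speeds at or above v_hi, and [v_lo, v_hi) is cut into bins of width dv.
    """
    n_inner = int(round((v_hi - v_lo) / dv))
    if v_kmh < v_lo:
        return 0
    if v_kmh >= v_hi:
        return n_inner + 1
    return 1 + int((v_kmh - v_lo) // dv)


def transition_matrix(states, n_states):
    """Estimate P[a][b], the probability of moving from state a to state b
    between consecutive time steps, by counting observed transitions."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows for states that were never left stay all-zero.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Toy series of average speeds for one segment (hypothetical values).
speeds = [5, 12, 18, 25, 33, 65, 72, 68]
states = [speed_state(v) for v in speeds]  # [0, 1, 1, 2, 3, 6, 7, 6]
P = transition_matrix(states, n_states=8)
```

Averaging such matrices over all segments of a ring would give one panel of the figure, assuming every segment uses the same state definition.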

We exploit information entropy [27] to quantify the uncertainty of speed transition and the degree of predictability characterizing the time series of the speed at each segment. By following Ref. [29], we assign three entropy measures to each road segment’s traffic pattern: (i) *Random Entropy* ${S}_{i}^{\mathrm{\text{rand}}}$. Random entropy is defined as ${S}_{i}^{\mathrm{\text{rand}}}={\mathrm{log}}_{2}{N}_{i}$ where *N*_{i} is the number of distinct states, or speed levels, reached by road segment *i*. (ii) *Temporal-uncorrelated Entropy* ${S}_{i}^{\mathrm{\text{unc}}}$. Temporal-uncorrelated entropy is defined as ${S}_{i}^{\mathrm{\text{unc}}}=-{\sum}_{j=1}^{{N}_{i}}{p}_{i}(j){\mathrm{log}}_{2}{p}_{i}(j)$, where *p*_{i}(*j*) is the probability that the state *j* is reached by the road segment *i*. (iii) *Actual Entropy* *S*_{i}. Actual entropy is defined as ${S}_{i}=-{\sum}_{{T}_{i}^{\prime}\subset {T}_{i}}P({T}_{i}^{\prime}){\mathrm{log}}_{2}[P({T}_{i}^{\prime})]$, where *T*_{i} = {*X*_{1}, *X*_{2}, …, *X*_{L}} denotes the sequence of states that road segment *i* reaches in observation. $P({T}_{i}^{\prime})$ is the probability of finding the time-ordered subsequence ${T}_{i}^{\prime}$ in the state transition sequence of segment *i*. It is noteworthy that the random entropy ${S}_{i}^{\mathrm{\text{rand}}}$ reflects the degree of predictability of a road segment’s state transition based on the assumption that each state is visited with equal probability. The temporal-uncorrelated entropy ${S}_{i}^{\mathrm{\text{unc}}}$ takes the heterogeneity in the probability into account, but omits the order of the transitions. In contrast, the actual entropy *S*_{i}, by considering both heterogeneous probability and temporal correlation, offers a more realistic characterization of the traffic patterns.
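The three entropy measures can be illustrated with a short sketch. The random and temporal-uncorrelated entropies follow directly from their definitions. For the actual entropy, summing over all subsequences is impractical, so the sketch below substitutes a Lempel-Ziv type estimator of the kind used in Ref. [29]. Whether the present study uses exactly this estimator is an assumption, and the function names are introduced here for illustration only. The estimator converges slowly, so values for short series are biased.

```python
import math
from collections import Counter


def random_entropy(states):
    """S_rand = log2(N), with N the number of distinct states visited."""
    return math.log2(len(set(states)))


def uncorrelated_entropy(states):
    """S_unc = -sum_j p(j) log2 p(j), ignoring the temporal order."""
    n = len(states)
    return -sum((c / n) * math.log2(c / n) for c in Counter(states).values())


def _contains(seq, sub):
    """True if the list `sub` occurs as a contiguous block inside `seq`."""
    m = len(sub)
    return any(seq[k:k + m] == sub for k in range(len(seq) - m + 1))


def lz_entropy(states):
    """Lempel-Ziv style estimate of the actual entropy, in bits per step.

    For each position i, find the length of the shortest block starting
    at i that never appeared in states[:i]; the estimate is
    n * log2(n) / (sum of these lengths).
    """
    seq = list(states)
    n = len(seq)
    if n < 2:
        raise ValueError("need at least two observations")
    total = 0
    for i in range(n):
        length = 1
        while i + length <= n and _contains(seq[:i], seq[i:i + length]):
            length += 1
        total += length
    return n * math.log2(n) / total


# A perfectly regular toy series: two states visited equally often,
# but in a fixed alternating order.
series = [0, 1] * 32
s_rand = random_entropy(series)
s_unc = uncorrelated_entropy(series)
s_est = lz_entropy(series)
```

For this alternating series the first two measures both equal 1 bit, while the order-aware estimate is markedly lower, which mirrors the gap between *S*^{rand}, *S*^{unc} and *S* discussed in the text.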

The sufficient data with high record frequency on the three ring roads allow us to calculate the actual entropy *S*_{i}, which in principle requires a continuous record of a road segment’s momentary state. As shown in Fig 2(a), there is a remarkable difference between *P*(*S*) and *P*(*S*^{rand}). To be concrete, *P*(*S*^{rand}) peaks at about 2.6, indicating that on average each update of the speed state represents 2.6 bits of new information per hour. In other words, the new speed level could be found, on average, among 2^{2.6} ≈ 6 states. In contrast, the fact that *P*(*S*) of the actual entropy peaks at *S* = 0.9 demonstrates that the real uncertainty in a segment’s speed state corresponds to 2^{0.9} ≈ 1.87 states rather than 6.

Fig 2. (a) The distribution of the random entropy *S*^{rand}, the uncorrelated entropy *S*^{unc} and the actual entropy *S*_{i} of road segments in the 2nd Ring Road in Beijing. (b) The distribution of Π^{rand}, Π^{unc} and Π^{max} across all road segments. The road segments are of identical length Δ*L* = 1km and the interval of speed state is Δ*V* = 10km/h. The 3rd and 4th Ring Roads show results for *P*(*S*) and *P*(Π) similar to those of the 2nd Ring Road.

The entropy of a segment’s speed allows us to measure the predictability Π, the probability that a suitable predictive algorithm can correctly predict the segment’s future speed state. In analogy with Ref. [29], the predictability measure is subject to Fano’s inequality. Specifically, if the speed level of a single road segment is updated among *N* states over time, then its predictability Π ≤ Π^{max}(*S*, *N*), where Π^{max} is obtained by solving

$$S=H({\Pi}^{\mathrm{\text{max}}})+(1-{\Pi}^{\mathrm{\text{max}}}){\mathrm{log}}_{2}(N-1),$$

where *H*(Π^{max}) represents the binary entropy function, namely

$$H({\Pi}^{\mathrm{\text{max}}})=-{\Pi}^{\mathrm{\text{max}}}{\mathrm{log}}_{2}({\Pi}^{\mathrm{\text{max}}})-(1-{\Pi}^{\mathrm{\text{max}}}){\mathrm{log}}_{2}(1-{\Pi}^{\mathrm{\text{max}}}).$$

For a road segment with Π^{max} = 0.1, we could predict its state transition accurately in only 10% of the cases. Equivalently, 10% is the upper bound on the success probability of any algorithm attempting to predict the segment’s speed state transition. We calculate the predictability bound based on each of *S*^{rand}, *S*^{unc} and *S*, and the result is encouraging. We found that under the condition Δ*L* = 1km and Δ*V* = 20km/h, the predictability of the 2nd Ring Road segments is narrowly peaked at approximately 0.83, indicating that it is theoretically possible to predict the transition of the speed state in 83% of the cases. This high predictability with a bounded distribution indicates that, despite the diversity of drivers’ origins, destinations, routing decisions and adaptive behaviors, the traffic patterns, as a collective behavior of a large number of drivers, have a strikingly high degree of potential predictability based exclusively on the historical records of daily traffic patterns, in the absence of any individual-level information. We have also explored the maximum predictability Π^{unc} and Π^{rand} based on *S*^{unc} and *S*^{rand}, as shown in Fig 2(b). We see that the maxima of both *P*(Π^{unc}) and *P*(Π^{rand}) lie at much lower values than that of *P*(Π^{max}), showing that Π^{max} is a much better predictive tool than the other two and that the temporal order of traffic patterns contains significant information for precisely predicting future patterns.
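The upper bound Π^{max} has no closed form, but it can be found numerically. The sketch below assumes that Fano's relation takes the form *S* = *H*(Π) + (1 − Π) log₂(*N* − 1), the same form used for the inference probability later in this section, and solves it by bisection. The function names and the tolerance are assumptions made for illustration. Bisection works because the right-hand side decreases from log₂ *N* at Π = 1/*N* to 0 at Π = 1.

```python
import math


def binary_entropy(p):
    """H(p) = -p log2 p - (1 - p) log2(1 - p), with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def max_predictability(entropy, n_states, tol=1e-10):
    """Solve entropy = H(p) + (1 - p) * log2(n_states - 1) for p.

    The solution is searched on [1/n_states, 1], where the right-hand
    side is monotonically decreasing.
    """
    if n_states < 2 or entropy <= 0.0:
        return 1.0  # a single state or zero entropy is fully predictable
    if entropy >= math.log2(n_states):
        return 1.0 / n_states  # no better than uniform guessing

    def fano_rhs(p):
        return binary_entropy(p) + (1 - p) * math.log2(n_states - 1)

    lo, hi = 1.0 / n_states, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fano_rhs(mid) > entropy:
            lo = mid  # entropy at mid is still too high: move right
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Values of the order quoted in the text: S = 0.9 bits, N = 6 states.
pi_max = max_predictability(0.9, 6)
```

Applying the same function to *S*^{rand} and *S*^{unc} in place of *S* would give Π^{rand} and Π^{unc}.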

We further explore how the settings of the road segment length Δ*L* and speed level interval Δ*V* affect the predictability. As shown in Fig 3, except for very small Δ*V* and very short Δ*L*, quite high average predictability is observed. This provides strong evidence for the generally high predictability of traffic conditions on the three ring roads. The relatively low predictability in the extreme cases is ascribed to the relatively large fluctuations in the average speed resulting from insufficient records. For example, for a very short road segment, the probability of finding a taxi in it within a certain time interval is low. In this scenario, the taxi records are insufficient to capture the actual average speed in the segment, which accounts for the large fluctuations of speed and the inaccurate reflection of the traffic pattern in the segment. Similarly, for small Δ*V*, the data falling into each speed state are too sparse to characterize the real situation, leading to spuriously low predictability. Nevertheless, based on our findings, insofar as the records are adequate to measure traffic conditions, the traffic patterns are highly predictable, regardless of the settings of the road segment length and speed interval.

Fig 3. (a)-(c) The dependence of the maximum value Π^{max} on Δ*L* and Δ*V* for the 2nd (a), the 3rd (b) and the 4th (c) Ring Road. The color bars represent the values of Π^{max}. The results for each Ring Road are the average over all road segments in the Ring Road.

Although the traffic patterns of the three ring roads are on average highly predictable, there are certain variations between different segments. Fig 4(a) shows the local predictability of each segment on the map. We find that the local predictability is correlated with the average local speed (Fig 4(b)), prompting us to investigate the correlation between them. Interestingly, we observe a non-monotonic correlation between the local predictability and the average speed, with the lowest predictability arising at intermediate speed, as shown in Fig 5(a) and 5(b). As a result, we also find that it is most difficult to predict the traffic conditions of the 3rd ring road, due to its intermediate average speed compared to the 2nd and 4th ring roads. A heuristic explanation for this phenomenon can be given in terms of the direction in which speed can change. Suppose that all the vehicles in a segment are fully stopped because of heavy congestion. One minute later, remaining stopped or starting to pull away are the only two possible situations. Now consider the other extreme case, in which all vehicles are moving at the speed limit of a road without any congestion. One minute later, there are again only two possible scenarios: their speeds remain unchanged or decrease because of some suddenly emerging congestion. In contrast to these extreme cases, a car with intermediate speed may accelerate, decelerate or keep its current speed some time later, depending on what happens in the near future. As demonstrated in Fig 1, the direction of speed change with the biggest volume for the 4th ring is from 50–60km/h to 60–70km/h. Its volume is clearly bigger than that of the direction with the second biggest volume, which is from 40–50km/h to 50–60km/h. The direction with the biggest volume for the 3rd ring is from 40–50km/h to 50–60km/h. 
Its volume is very close to that of the direction with the second biggest volume, which is from 30–40km/h to 40–50km/h. The speed transition probability of the 4th ring is thus more concentrated than that of the 3rd ring road, so it is easier to predict the speed transition of the 4th ring. Therefore, because intermediate speeds admit more possible changes than low and high speeds, the traffic condition of a segment with intermediate average speed is the most difficult to predict.

Fig 4. (a) The local predictability of road segments in the three Ring Roads. (b) The local average speed of road segments in the three Ring Roads. In (a), the color bar represents the maximum value Π^{max} of road segments and in (b), the color bar represents the average speed of road segments.

Fig 5. (a) Predictability as measured by Π^{max} as a function of the average speed for the three Ring Roads. (b) Box plots of the predictability in different ranges of the average speed. (c) The predictability and the average speed of each entire Ring Road. The results are obtained for Δ*L* = 1km and Δ*V* = 10km/h. The highest and lowest values outside the boxes represent 9% and 91% of the rank of predicted values, respectively; the upper and lower bounds of the boxes represent 25% and 75% of the rank of predicted values, respectively; and the bars inside the boxes mark the median value.

To gain a deeper understanding of the predictability of traffic patterns, we explore the effect of commuter demand on daily traffic predictability by comparing weekdays and weekends. It is intuitive that commuter demand during weekdays may induce traffic patterns and congestion distributions quite different from those on weekends. However, to our surprise, despite the obvious difference, we find that the daily traffic patterns in a week are of very similar predictability, nearly regardless of the commuter demand, as shown in Fig 6. These striking results suggest that both weekdays and weekends have their specific inherent patterns encoded in the historical records, accounting for the relatively high and similar predictability. In other words, although weekdays and weekends have different inherent patterns, they are equally predictable.

Fig 6. (a)-(c) The daily predictability during a week of the 2nd (a), 3rd (b) and 4th (c) Ring Road. (d) The daily predictability averaged over all three Ring Roads during a week. The parameter values and the box plots are the same as in Fig 5.

Next, we explore the probability of inferring the state of a segment from the state series of its adjacent segments. This problem is related to observability, which in control theory describes whether a system’s state can be fully recovered from a set of observable quantities [32]. For urban road traffic, inferring traffic conditions at some locations from observations of other segments has important applications in monitoring and controlling traffic in real time with a limited number of speed detectors. In analogy with the predictability, we calculate the inference probability $\tilde{\Pi}$ of a segment based on the information entropy and Fano’s inequality. However, in contrast to the predictability, here the information entropy is calculated by ${S}_{i}^{\prime}=-{\sum}_{{R}_{i}^{\prime}\subset {R}_{i}}P({R}_{i}^{\prime}){\mathrm{log}}_{2}[P({R}_{i}^{\prime})]$, where *R*_{i} = {*X*_{1}, *X*_{2}, …, *X*_{L}} denotes the states observed within a single time interval of *L* road segments connected in a sequence, and $P({R}_{i}^{\prime})$ is the probability of finding the subsequence ${R}_{i}^{\prime}$ in this sequence. Similarly, by solving ${S}^{\prime}=H({\tilde{\Pi}}^{\mathrm{\text{max}}})+(1-{\tilde{\Pi}}^{\mathrm{\text{max}}}){\mathrm{log}}_{2}(N-1)$, we obtain an upper bound ${\tilde{\Pi}}^{\mathrm{\text{max}}}$ which captures the inference probability of the traffic pattern of a road segment from its observable adjacent segments.

As shown in Fig 7, the inference probability increases with the number of segments for all three ring roads. This phenomenon can be heuristically explained as follows. For sufficiently short segments (a sufficiently large number of segments), the average vehicle speed in a segment will be close to that in its adjacent segments, enabling an accurate inference of the segment’s state by simply using the state of its neighbors. Increasing the segment length induces larger differences between adjacent segments, rendering the inference more difficult. As a result, the inference probability is an increasing function of the number of segments. More importantly, our results provide a quantitative understanding of the inference probability in terms of the number of segments, which is valuable for determining the density of speed detectors needed to infer the traffic conditions of the entire road in real time with a certain accuracy. In addition, we find that the inference probability of the 3rd ring road is the lowest compared to the 2nd and 4th ring roads, which matches the predictability ranking of the three ring roads, i.e., the 3rd ring road has the lowest predictability. This suggests that the average vehicle speed plays a similar role in both predictability and inference probability, which deserves deeper exploration.

Fig 7. The inference probability (IP) as a function of the number of segments for all three Ring Roads. Here IP is measured by the upper bound ${\tilde{\Pi}}^{\mathrm{max}}$.

## Discussion

In summary, by using the GPS records of vehicles to capture the traffic patterns on urban roads, in combination with entropy and Fano’s inequality, we demonstrate that daily traffic patterns on the three major ring roads in Beijing are highly predictable from short-time historical records alone, without any a priori knowledge of drivers’ origins and destinations, driving habits, navigation strategies, and adaptive behaviors. We have also found that, although traffic patterns on weekdays, which are highly affected by commuter demand, differ apparently from those on weekends, they exhibit similarly high predictability. This result indicates that each day has its specific inherent regularity and traffic pattern encoded in the historical records. Another striking finding is that the local predictability is non-monotonically correlated with the average velocity, and the lowest predictability arises at intermediate velocity. Consequently, the traffic conditions of the 3rd ring road, due to its intermediate average velocity compared to the 2nd and 4th ring roads, are the most difficult to predict. We have provided a heuristic explanation for this counterintuitive phenomenon. Furthermore, the probability of inferring the traffic pattern of an inaccessible road segment from the state series of its adjacent segments is explored by using entropy and Fano’s inequality, which is important for monitoring the traffic condition of the entire road network given the limits of our ability to observe every location in real time.

All of these findings are valuable for the development of predictive models and algorithms for achieving actual predictions of traffic conditions in real time based solely on short-time historical records, without the need for individual-level information that in principle cannot be fully accessed. Relying on the successful prediction of traffic patterns, it is feasible to implement effective control to relieve and prevent congestion by exploiting traditional approaches in traffic engineering [33] and the recently developed controllability theory for complex networks [34, 35]. The urban road network, as a typical complex networked system, exhibits a variety of dynamical behaviors, such as the phantom jam and the diffusion of congestion [12]. Thus, it is imperative to control the road network as a whole by virtue of the controllability framework rather than controlling a single road or area individually. Our approach offers new insight into mitigating increasingly severe congestion in urban areas by combining “big data” with tools from information theory and time series analysis. We hope that further effort will be inspired toward predicting traffic patterns and devising effective strategies to alleviate traffic congestion in urban areas.

## Materials and Methods

We use OpenStreetMap [36] to extract all roads within the spatial range of Beijing from the available database. We then retrieve the trajectories of vehicles. The data set that we used contains the trajectories of 20000 taxis recorded every minute over a month in Beijing. Each record includes the location (latitude and longitude), the direction, the state (whether there are any passengers in the taxi), the time stamp and the velocity, updated every minute. Because of the inevitable error in the GPS locating process, all the records are preprocessed to match the GPS trajectories to the roads by exploiting the ST-Matching algorithm [37]. After that, each GPS record is mapped to a road segment of OpenStreetMap.

The GPS data we used in this paper are Floating Car Data (FCD). FCD is a method to determine the traffic speed on the road network. It is based on the collection of localization data, speed, direction of travel and time information from mobile phones or GPS devices in vehicles being driven [38]. FCD are an essential source of traffic information for most intelligent transportation systems in many cities. The FCD collection in Beijing is based on the city’s mature taxi system. There are more than 70 thousand taxis running in Beijing, accounting for 25% of the total road traffic in Beijing.

To be concrete, the ST-Matching algorithm of Ref. [37] is implemented in four steps: (i) *Candidate Preparation.* First, for each GPS record point, the ST-Matching algorithm retrieves a set of candidate road segments within a fixed radius *r*, which is set to 20 meters. Points without any candidates within *r* are discarded as invalid records. (ii) *Spatial Analysis.* The algorithm next evaluates the given candidate segments by using the “observation probability” and the “transmission probability” to express the geometric and topological information of each candidate segment and the spatial relationship between them. This step gives rise to the spatial analysis function ${F}_{s}({c}_{i-1}^{t}\to {c}_{i}^{s})$, which is simply the product of the observation probability and the transmission probability. In this function, ${c}_{i}^{s}$ represents the *s*th candidate segment of the *i*th GPS sampling record. This function measures the probability that the *i*th record is on ${c}_{i}^{s}$, given an assumed real segment mapping of the (*i*−1)th record, that is ${c}_{i-1}^{t}$. (iii) *Temporal Analysis.* The ST-Matching algorithm exploits the temporal analysis function ${F}_{t}({c}_{i-1}^{t}\to {c}_{i}^{s})$ to further incorporate temporal features into the map-matching process. This step handles situations that spatial analysis alone cannot resolve. Specifically, if the trajectory of a vehicle lies between a freeway and a service road, and the vehicle moves at a relatively high speed, then it is more likely that the vehicle is on the freeway. (iv) *Result Matching.* Finally, after ${F}_{s}({c}_{i-1}^{t}\to {c}_{i}^{s})$ and ${F}_{t}({c}_{i-1}^{t}\to {c}_{i}^{s})$ are computed, the algorithm uses the ST-function to evaluate each candidate segment, that is $F({c}_{i-1}^{t}\to {c}_{i}^{s})={F}_{s}({c}_{i-1}^{t}\to {c}_{i}^{s})\times {F}_{t}({c}_{i-1}^{t}\to {c}_{i}^{s}),2\le i\le n$. 
Thus, the problem is converted to finding a path with the highest ST-function value, given the candidates for all sampling points.
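The final path search can be sketched with dynamic programming over the candidate sets. This is only an illustration of the idea, not the implementation of Ref. [37]: it assumes that the value of a path is the sum of the ST-function values along it, and the callable `score`, the function name `best_candidate_path` and the toy candidate labels are hypothetical. Computing the real *F*_s and *F*_t requires road geometry and speed information, which the sketch leaves to the caller.

```python
def best_candidate_path(candidates, score):
    """Pick one candidate per GPS point so that the summed score is maximal.

    candidates: list of non-empty lists; candidates[i] holds the candidate
                segments of the i-th GPS point.
    score:      callable score(i, prev, cur) returning the ST-function
                value F(prev -> cur) for the step into point i (i >= 1).
    """
    if not candidates or any(not c for c in candidates):
        raise ValueError("every GPS point needs at least one candidate")

    best = [0.0] * len(candidates[0])  # best path value ending at each candidate
    back = []                          # back-pointers, one list per step
    for i in range(1, len(candidates)):
        new_best, pointers = [], []
        for cur in candidates[i]:
            options = [best[t] + score(i, prev, cur)
                       for t, prev in enumerate(candidates[i - 1])]
            t_star = max(range(len(options)), key=options.__getitem__)
            new_best.append(options[t_star])
            pointers.append(t_star)
        best = new_best
        back.append(pointers)

    # Trace the best path backwards from the best final candidate.
    s = max(range(len(best)), key=best.__getitem__)
    indices = [s]
    for pointers in reversed(back):
        s = pointers[s]
        indices.append(s)
    indices.reverse()
    return [candidates[i][k] for i, k in enumerate(indices)]


# Toy example with made-up transition scores between candidate segments.
toy_scores = {("a1", "b1"): 0.9, ("a1", "b2"): 0.2,
              ("a2", "b1"): 0.3, ("a2", "b2"): 0.8,
              ("b1", "c1"): 0.1, ("b2", "c1"): 0.9}
path = best_candidate_path(
    [["a1", "a2"], ["b1", "b2"], ["c1"]],
    lambda i, prev, cur: toy_scores.get((prev, cur), 0.0),
)
```

In the toy example the locally best first step (a1 to b1) is not part of the best overall path, which is why a global search is used instead of a greedy choice per point.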

A GPS record is a triple of the form <longitude, latitude, time>. For a given segment length and time level setup, the travel speed of the road segments is calculated in the following steps: 1) Divide the ring roads into several segments using the given segment length. For Δ*L* = 1km, the 2nd ring road is divided into 33 segments, the 3rd ring road into 49 and the 4th ring road into 66. 2) Use the ST-Matching algorithm to match all GPS records to the road segments. In this step, a segment ID field is appended to each GPS record. 3) For the given time level, calculate the average speed of the GPS records with the same segment ID in each time level, which gives the travel speed of every road segment for each time level.
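Step 3 amounts to a grouped average. The sketch below assumes that map matching has already produced records of the form (segment ID, time stamp in seconds, speed in km/h) and that a "time level" is a fixed-length window. Both the record layout and the one-hour default window are assumptions made for illustration, as is the function name `segment_speeds`.

```python
from collections import defaultdict


def segment_speeds(records, window_s=3600):
    """Average the speed of all records per (segment ID, time window).

    records:  iterable of (segment_id, timestamp_s, speed_kmh) tuples.
    window_s: length of one time level in seconds (assumed value).
    Returns a dict mapping (segment_id, window_index) to the mean speed.
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [speed total, record count]
    for segment_id, timestamp_s, speed_kmh in records:
        key = (segment_id, int(timestamp_s // window_s))
        sums[key][0] += speed_kmh
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


# Hypothetical matched records for two segments.
matched = [(1, 0, 30.0), (1, 1800, 50.0), (1, 3600, 20.0), (2, 10, 60.0)]
speeds_by_window = segment_speeds(matched)
```

Windows without any record are simply absent from the result, which is the situation the text describes for very short segments, where too few taxis are observed to estimate the average speed reliably.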

After the map-matching process, each point is assigned an attribute that represents the road segment the point is on. Based on the preceding steps, we can generate the time series of each road segment’s speed states.

## Acknowledgments

The authors thank Prof. X. Yan and Dr. S. Shen for valuable discussions and suggestions.

## Author Contributions

Conceived and designed the experiments: JW WXW. Performed the experiments: YM JL. Analyzed the data: JW WXW. Contributed reagents/materials/analysis tools: YM JL ZX. Wrote the paper: JW YM WXW.

## References

- 1. Brockmann D, Hufnagel L, Geisel T. The scaling laws of human travel. Nature. 2006;439(7075):462–465. pmid:16437114
- 2. Belik V, Geisel T, Brockmann D. Natural human mobility patterns and spatial spread of infectious diseases. Phys Rev X. 2011;1(1):011001.
- 3. Kölbl R, Helbing D. Energy laws in human travel behaviour. New J Phys. 2003;5(1):48.
- 4. Gonzalez MC, Hidalgo CA, Barabasi AL. Understanding individual human mobility patterns. Nature. 2008;453(7196):779–782. pmid:18528393
- 5. Schrank D. Urban Mobility Report (2004). DIANE Publishing; 2008.
- 6. Helbing D. A section-based queueing-theoretical traffic model for congestion and travel time analysis in networks. J Phys A: Math Gen. 2003;36(46):L593.
- 7. Al-Kadi O, Al-Kadi O, Al-Sayyed R, Alqatawna J. Road scene analysis for determination of road traffic density. Frontiers of Computer Science. 2014;8(4):619–628.
- 8. Chin AT. Containing air pollution and traffic congestion: transport policy and the environment in Singapore. Atmos Environ. 1996;30(5):787–801.
- 9. Rosenlund M, Forastiere F, Stafoggia M, Porta D, Perucci M, Ranzi A, et al. Comparison of regression models with land-use and emissions data to predict the spatial distribution of traffic-related air pollution in Rome. J Expo Sci Env Epid. 2008;18(2):192–199.
- 10. Zhang X, Wang Y, Niu T, Zhang X, Gong S, Zhang Y, et al. Atmospheric aerosol compositions in China: spatial/temporal variability, chemical signature, regional haze distribution and comparisons with global aerosols. Atmos Chem Phys Discuss. 2012;12(2):779–799.
- 11. Zheng Y, Liu F, Hsieh HP. U-Air: when urban air quality inference meets big data. In: Proceedings of ACM SIGKDD’13. ACM; 2013. p. 1436–1444.
- 12. Helbing D. Traffic and related self-driven many-particle systems. Rev Mod Phys. 2001;73(4):1067.
- 13. Batty M. The size, scale, and shape of cities. Science. 2008;319(5864):769–771. pmid:18258906
- 14. Barthélemy M. Spatial networks. Phys Rep. 2011;499(1):1–101.
- 15. Herrera JC, Amin S, Bayen A, Madanat S, Zhang M, Nie Y, et al. Dynamic estimation of OD matrices for freeways and arterials. Institute of Transportation Studies, UC Berkeley; 2007.
- 16. Herrera JC, Work DB, Herring R, Ban XJ, Jacobson Q, Bayen AM. Evaluation of traffic data obtained via GPS-enabled mobile phones: The Mobile Century field experiment. Transport Res C-Emer. 2010;18(4):568–583.
- 17. Wynter L, Shen W. Real-Time Traffic Prediction Using GPS Data with Low Sampling Rates: A Hybrid Approach. In: Proceedings of Transportation Research Board 91st Annual Meeting. 12-1692; 2012.
- 18. Leng B, Zeng J, Xiong Z, Lv W, Wan Y. Probability tree based passenger flow prediction and its application to the Beijing subway system. Frontiers of Computer Science. 2013;7(2):195–203.
- 19. Yu K, Zhu H, Cao H, Zhang B, Chen E, Tian J, et al. Learning to detect subway arrivals for passengers on a train. Frontiers of Computer Science. 2014;8(2):316–329.
- 20. Thomas JM, Darnton J. Social diversity and economic development in the metropolis. J Plan Liter. 2006;21(2):153–168.
- 21. Yousaf J, Li J, Chen L, Tang J, Dai X. Generalized multipath planning model for ride-sharing systems. Frontiers of Computer Science. 2014;8(1):100–118.
- 22. Mayer-Schönberger V, Cukier K. Big data: A revolution that will transform how we live, work, and think. Houghton Mifflin Harcourt; 2013.
- 23. Hornsey R. ‘He who Thinks, in Modern Traffic, is Lost’: Automation and the Pedestrian Rhythms of Interwar London. Ashgate; 2010.
- 24. Wang P, Hunter T, Bayen AM, Schechtner K, González MC. Understanding road usage patterns in urban areas. Sci Rep. 2012;2.
- 25. Wang J, Wei D, He K, Gong H, Wang P. Encapsulating urban traffic rhythms into road networks. Sci Rep. 2014;4.
- 26. Li RH, Liu J, Yu JX, Chen H, Kitagawa H. Co-occurrence prediction in a large location-based social network. Frontiers of Computer Science. 2013;7(2):185–194.
- 27. Brabazon A, O’Neill M, Maringer D. Natural computing in computational finance. Springer; 2008.
- 28. Cover TM, Thomas JA. Elements of information theory. John Wiley & Sons; 2012.
- 29. Song C, Qu Z, Blumm N, Barabási AL. Limits of predictability in human mobility. Science. 2010;327(5968):1018–1021. pmid:20167789
- 30. Jia T, Jiang B. Exploring human activity patterns using taxicab static points. ISPRS International Journal of Geo-Information. 2012;1(1):89–107.
- 31. Ring roads of Beijing. Wikipedia. Available: http://en.wikipedia.org/wiki/Ring_Roads_of_Beijing. Accessed 2015 Jan 14.
- 32. Hautus M. Controllability and observability conditions of linear autonomous systems. Ned Akad Wetenschappen, Proc Ser A. 1969;72(5):443.
- 33. Ortuzar Jd, Willumsen LG. Modelling transport. John Wiley & Sons; 2011.
- 34. Liu YY, Slotine JJ, Barabási AL. Controllability of complex networks. Nature. 2011;473(7346):167–173. pmid:21562557
- 35. Yuan Z, Zhao C, Di Z, Wang WX, Lai YC. Exact controllability of complex networks. Nat Commun. 2013;4.
- 36. Beijing map in OSM. OpenStreetMap. Available: http://www.openstreetmap.org. Accessed 2014 Feb 20.
- 37. Lou Y, Zhang C, Zheng Y, Xie X, Wang W, Huang Y. Map-matching for low-sampling-rate GPS trajectories. In: Proceedings of ACM SIGSPATIAL’17. ACM; 2009. p. 352–361.
- 38. Floating car data. Wikipedia. Available: http://en.wikipedia.org/wiki/Floating_car_data. Accessed 2015 Jan 19.