Local Optimization Strategies in Urban Vehicular Mobility

The comprehension of vehicular traffic in urban environments is crucial for a good management of the complex processes arising from the collective motion of people. Even allowing for the great complexity of human beings, human behavior turns out to be subject to strong constraints (physical, environmental, social, economic) that induce the emergence of common patterns. The observation and understanding of these patterns is key to setting up effective strategies to optimize the quality of life in cities without frustrating the natural need for mobility. In this paper we focus on vehicular mobility, with the aim of revealing the underlying patterns and uncovering the human strategies determining them. To this end we analyze a large dataset of GPS vehicle tracks collected over one month in the district of Rome (Italy). We demonstrate that vehicle drivers perform a local optimization of travel times when choosing their journeys. This finding is mirrored by two additional important facts: the observation that the average vehicle velocity increases with travel length, and the emergence of a universal scaling law for the distribution of travel times at fixed traveled length. A simple modeling scheme confirms this scenario, opening the way to further predictions.

A GPS signal quality and data preprocessing
The Global Positioning System (GPS) is based on the reception of radio signals broadcast by satellites orbiting the Earth. The position of the device is inferred by estimating the distance between the device and a variable number of satellites (the more satellites are connected, the better the signal quality). Therefore, the precision of GPS position measurements depends on the precision in determining the distances between the receiver and the satellites. Raw-data signal quality (SQ) is expressed by an integer ranging from 1 to 3. If SQ = 1 the position of the device is determined with less than two satellites and there is a large uncertainty on the position of the vehicle. If SQ = 2 two or three satellites are connected, while more than three satellites are visible whenever SQ = 3. In this last, optimal case the uncertainty on the car position is around 10 m. A large number of private vehicles (2.5% of the population in Italy as a whole, 4% in the Rome district; data from 2011) is equipped with an On Board Unit (OBU) that possesses three fundamental elements: a GPS locator, an accelerometer for crash detection and a GSM-GPRS module to communicate the recorded data to the data service center. The OBU records selected data about car usage. Specifically, a single measurement delivers the vehicle position and the time of recording. The OBU itself can estimate, from time and space measurements, the absolute speed, its direction and the distance traveled on the road network since the previous record. Furthermore, at each position measurement, the engine status (switching on, running, switching off) and the GPS signal quality are recorded. The data preprocessing mostly addressed the definition of trip starts and stops. In fact, a signal loss can produce a wrong identification of the beginning or the end of a trip.
Each measurement includes information on the engine state (ES): ES = 0 means that the engine is being switched on, while for ES = 2 the engine is being switched off. When ES = 1 the vehicle is traveling. Therefore, we used the value ES = 0 to detect the origin of a trip and ES = 2 to spot its destination. In case the values of ES are not available (e.g., when parking locations lie underground), we use the first temporal measurement of a vehicle as its origin, while the last measurement is assumed to be its destination. Furthermore, if between two successive points with ES = 1 there is a time lapse larger than 30 min, we consider the first point as the end of a trip and the second one as the beginning of a new trip. Finally, we filtered out tracks with meaningless sequences of engine states (for instance the sequence ES: 2 → 1 → 0) and deleted trips shorter than 10 m.
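These segmentation rules can be sketched as a short routine. This is a minimal sketch: the record layout (a timestamp, the ES code, and the distance traveled since the previous record) and all names are hypothetical, while the thresholds (30 min gap, 10 m minimum trip length) and the use of the ES codes follow the rules described above.

```python
from dataclasses import dataclass

@dataclass
class Record:
    t: float     # timestamp, seconds
    es: int      # engine state: 0 = switching on, 1 = traveling, 2 = switching off
    dist: float  # distance traveled since previous record, meters

MAX_GAP = 30 * 60   # split a trip when successive ES = 1 records are > 30 min apart
MIN_LENGTH = 10.0   # discard trips shorter than 10 m

def segment_trips(records):
    """Split a time-ordered stream of OBU records into trips."""
    trips, current = [], []
    for rec in records:
        if rec.es == 0:                                # engine on: a new trip starts
            current = [rec]
        elif rec.es == 1:
            if current and rec.t - current[-1].t > MAX_GAP:
                trips.append(current)                  # long silence: close previous trip
                current = [rec]
            else:                                      # no ES = 0 seen: first record is origin
                current.append(rec)
        elif rec.es == 2:                              # engine off: trip ends
            current.append(rec)
            trips.append(current)
            current = []
    if current:                                        # stream ended without ES = 2
        trips.append(current)
    # keep only trips longer than MIN_LENGTH
    return [tr for tr in trips if sum(r.dist for r in tr) >= MIN_LENGTH]
```

A stream containing a silence longer than 30 min between two ES = 1 records thus yields two trips, and very short spurious trips are dropped by the final filter.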

B Hourly dependence
Besides the topology of the urban road network, traffic conditions also affect travel times. In order to have a more homogeneous sampling of routes, we divide the day into four time slots [3], i.e., from 6AM to 10AM (morning), from 10AM to 4PM (early afternoon), from 4PM to 8PM (evening) and from 8PM to 6AM of the next day (night). Figure ... shows the distributions of the lengths of the sampled routes in each of these time frames, together with a comparison with the overall distribution of lengths during the whole day. No significant differences were found among these distributions. The morning and evening slots correspond to rush hours. We find that, irrespective of the time slot h analyzed, each measured frequency distribution p_h(t|l) of the travel time t at fixed travel length l is compatible with a log-normal distribution, as in the fully aggregated case discussed in the main text. During rush hours travel times are on average larger than those measured under quieter traffic conditions, so that the curves corresponding to the morning and evening slots shift toward the right, as shown in Figure B. We can write

p_h(t|l) = \frac{1}{t \sigma_{h,l} \sqrt{2\pi}} \exp\!\left[-\frac{(\log t - \mu_{h,l})^2}{2\sigma_{h,l}^2}\right],

where \mu_{h,l} and \sigma^2_{h,l} are respectively the average and the variance of the variable \log t at fixed l in the time slot h. From Fig. C we deduce that \mu_{h,l} and \sigma_{h,l} can be expressed, similarly to the aggregated dataset shown in the main text, as

\mu_{h,l} \simeq \alpha_h \log l + \mathrm{const}, \qquad \sigma_{h,l} \simeq s_h,

where the second equation holds for l > 1 km. The values of the parameters α_h and s_h depend on the time slot considered (see Table A). While α_h depends substantially on the time slot, being larger in congested traffic (meaning that the average travel time ⟨t⟩_{h,l} ≈ l^{α_h} is higher and the average driving velocity ⟨v⟩_{h,l} ≈ l^{1−α_h} is lower), the value of s_h stays practically constant at a value s. This allows a scaling collapse of the p_h(t|l), irrespective of l and h, obtained by considering the rescaled variable τ = t/⟨t⟩_{h,l} and setting s = s_h (Fig. D).
The universal curve implies the existence of a universal navigation mechanism on the road network not depending on the hour of the day and thus not depending on the traffic situation.
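The collapse described above can be illustrated numerically. The following sketch assumes, as in the text, that log t at fixed l is Gaussian with mean α_h log l + const and a constant width s; all numerical values (the α_h, s, the trip lengths, the sample sizes) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical slot exponents alpha_h and constant width s
s = 0.3
slots = {"morning": 0.85, "night": 0.70}

collapsed = []
for h, alpha in slots.items():
    for l in (2.0, 5.0, 10.0):                      # trip lengths (km)
        # log t at fixed l: Gaussian with mean alpha * log(l), width s
        t = np.exp(rng.normal(alpha * np.log(l), s, 20_000))
        tau = t / t.mean()                          # rescale by <t>_{h,l}
        collapsed.append(np.log(tau))

# after rescaling, every distribution of log(tau) has the same width s,
# irrespective of l and of the time slot h
widths = [x.std() for x in collapsed]
print(min(widths), max(widths))
```

Rescaling by the average travel time removes the l- and h-dependence of the location of the distribution, while the common width s is what makes all curves collapse onto a single function.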

C Time-Space Relation
In this section we show the dependence of the travel length l on the travel time t. We estimate the conditional probability p(l|t) with the same procedure used for p(t|l). The right panel of Fig. E shows this distribution for some values of t. Also p(l|t) can be described by a log-normal distribution, though its scaling properties are different. In fact, the average travel length ⟨l⟩_t depends linearly on t (Fig. E, left panel). This asymmetry suggests that car drivers optimize the travel time rather than the traveled space. Otherwise, we would have seen a superlinear relation between the average travel length and t, mirroring the sublinear relation between the average travel time and l presented in the main text.

D Model Details and Sampling
The simplest strategy to build the travel path on the network from a node A to a node B is to choose the shortest path, weighting each link with its average travel time. As explained in the main text, this leads to the observed power-law growth of the average travel time with the distance but, on the other hand, produces decaying fluctuations around the average, in contrast with our experimental results. To heal this unrealistic behavior we have to introduce some sort of noise in the system, so as to better mimic the actual behavior of drivers. Large fluctuations in short trips and small fluctuations in long trips suggest that short trips might contribute to the construction of long trips, i.e., that drivers optimize their paths by subdividing them into a few intermediate steps. The time optimization is then performed from one step to the next, until the destination is reached (Fig. F).
Assuming that a driver has to travel from node A to node B on the network, we choose an optimization distance l_optim from a uniform distribution in [3, l(A, B)], where l(A, B) is the Euclidean distance between A and B. Starting from A, an intermediate step n_1 is chosen within a radius l_optim from A, so that the angle between the segments connecting A to n_1 and A to B is smaller than an assigned value α (α = π/6 rad in the following). The path between A and n_1 is chosen by minimizing the traveling time. The process is repeated from n_1 to the next intermediate step n_2 and so on, until the distance between a generic step n_i and B is less than l_optim. After that, n_{i+1} = B and the process ends. In our model, the average travel time ⟨t⟩ grows as l^α, in agreement with the observed behavior. In Fig. G we show how the exponent α changes with the fast-link density (horizontal axis) and with the velocity at which fast links can be traveled. When no fast links are present, the exponent α is equal to one, since all links are traveled with unit velocity. The exponent α is also unity when all nodes in the grid are connected by fast links with the same average velocity. This means that our model is characterized by an optimal density of fast links, in correspondence of which α is minimum and the average travel velocity is maximum. A further interesting property of our model is the power-law behavior connecting the number of nodes that can be covered in a fixed travel time t, i.e., N_t ∝ t^δ. In case no fast links are present, one trivially gets δ = 2. When fast links are added, we measure the exponent δ as the slope (in double logarithmic scale) at the inflection point of the cumulative number of isochronous paths N_t (as in Ref. [1]), as depicted in Fig. H.b. The behavior of δ as a function of the number of fast links is shown in Fig. H.a, where different colors stand for different fast-link velocities.
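A minimal sketch of this navigation scheme on a square grid with random fast links follows. The grid size, the number and speed of the fast links, and the rule used to pick the intermediate node inside the cone (the text does not specify it; here we arbitrarily take the candidate closest to the destination) are all our assumptions.

```python
import heapq
import math
import random

random.seed(1)
L = 20  # grid side; all numerical choices here are illustrative

def node(x, y):
    return x * L + y

def coords(n):
    return divmod(n, L)

def euclid(a, b):
    return math.dist(coords(a), coords(b))

# Manhattan grid with unit-velocity links (travel time = link length = 1)
graph = {n: {} for n in range(L * L)}
for x in range(L):
    for y in range(L):
        if x + 1 < L:
            a, b = node(x, y), node(x + 1, y)
            graph[a][b] = graph[b][a] = 1.0
        if y + 1 < L:
            a, b = node(x, y), node(x, y + 1)
            graph[a][b] = graph[b][a] = 1.0

# random "fast links" traveled at velocity v_fast > 1
v_fast = 5.0
for _ in range(60):
    a, b = random.sample(range(L * L), 2)
    graph[a][b] = graph[b][a] = euclid(a, b) / v_fast

def fastest_times(src):
    """Dijkstra: minimal travel time from src to every node."""
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, math.inf):
            continue
        for w, c in graph[u].items():
            if d + c < dist.get(w, math.inf):
                dist[w] = d + c
                heapq.heappush(pq, (d + c, w))
    return dist

def in_cone(cur, cand, dest, alpha):
    """Is cand within angle alpha of the direction from cur to dest?"""
    (cx, cy), (nx, ny), (bx, by) = coords(cur), coords(cand), coords(dest)
    ang = abs(math.atan2(ny - cy, nx - cx) - math.atan2(by - cy, bx - cx))
    return min(ang, 2 * math.pi - ang) < alpha

def local_trip_time(A, B, alpha=math.pi / 6):
    """Concatenate time-optimal sub-paths toward intermediate steps."""
    l_optim = random.uniform(3.0, max(3.0, euclid(A, B)))
    cur, total = A, 0.0
    while euclid(cur, B) > l_optim:
        times = fastest_times(cur)
        cands = [n for n in graph
                 if n != cur and euclid(cur, n) <= l_optim
                 and in_cone(cur, n, B, alpha)]
        if not cands:
            break
        # assumption: pick the candidate closest to the destination
        nxt = min(cands, key=lambda n: euclid(n, B))
        if euclid(nxt, B) >= euclid(cur, B):
            break  # no progress possible: route directly to B
        total += times[nxt]
        cur = nxt
    total += fastest_times(cur)[B]
    return total
```

By construction, concatenating locally time-optimal sub-paths can only yield a travel time larger than or equal to that of the globally fastest path, which is the source of the persistent fluctuations discussed above.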
The value of δ stays larger than 2.0 for small values of N_shortcuts and grows until it reaches a maximum around 2.8, depending on the fast-link velocity v, similarly to what has been found for real urban networks [1,5]. When the number of fast links reaches its maximum value, δ decreases back toward its trivial value δ = 2, since all links are again traveled at the same average velocity.

E Grid model with realistic speed distribution
In our basic model the average velocities of links are trivially chosen to be constant, i.e., unit velocity on the normal grid links and v > 1 on fast links. This is sufficient to reproduce most of the observed behavior, apart from the specific form of the function p(t|l), which in the simple model is not a log-normal (see Fig. 5 in the main text). To refine the model, we define the average velocities on links by extracting them from the distribution of actual velocities on urban roads, still allowing model drivers to perform the optimal local search depicted in Fig. F. Moreover, in this way we fix the otherwise arbitrary ratio between normal- and fast-link velocities. The set of considered velocities stems from an integration of both the OpenStreetMap and Google Maps databases (for the sake of simplicity, in the following we will refer to the network built with these data as the GoogleMaps Network) [4]. The GoogleMaps Network dataset consists of a list of roads whose ends (crossroads) are identified by their coordinates, with each road characterized by an average traveling time. We infer the coordinates of the crossroads and their connections from OpenStreetMap [9], and feed the Google Directions API [10] with them in order to extract the average traveling time of each road. We considered the GoogleMaps Network for part of the center of the cities of Rome (23504 links) and London (112377 links). In Fig. I.a we show the frequency distribution p(v) of the average velocities in the GoogleMaps Network of Rome, where the velocities have been rescaled so that their average is unity. This distribution is peaked around v = 1 and is right-skewed. We argue that velocities much larger than the average characterize arterial roads, which are, at least in principle, traveled at a higher speed than regular streets. To assign average velocities to the links of our L × L grid model with shortcuts, we proceed as follows. We fix the number of fast links, N_shortcuts, and calculate the fraction of regular links f = (N_links − N_shortcuts)/N_links. Denoting by v_p(f) the f-th quantile of p(v), we assign to normal links an average velocity extracted from p(v) with v < v_p(f), and to fast links a velocity extracted again from p(v) but with v ≥ v_p(f). For this variation of the model we still assume that drivers minimize the travel time by fixing intermediate steps.
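The quantile split described above can be sketched as follows; the log-normal stand-in for the empirical p(v) of Fig. I.a and all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# stand-in for the empirical p(v): right-skewed, normalized to unit mean
v_sample = rng.lognormal(mean=0.0, sigma=0.5, size=50_000)
v_sample /= v_sample.mean()

n_links, n_shortcuts = 1000, 100
f = (n_links - n_shortcuts) / n_links      # fraction of regular links
v_thr = np.quantile(v_sample, f)           # v_p(f): f-th quantile of p(v)

# regular links draw from below the threshold, fast links from the tail
normal_speeds = rng.choice(v_sample[v_sample < v_thr], size=n_links - n_shortcuts)
fast_speeds = rng.choice(v_sample[v_sample >= v_thr], size=n_shortcuts)

print(bool(normal_speeds.max() < v_thr), bool(v_thr <= fast_speeds.min()))
```

The quantile threshold ties the fast/slow split of the velocity distribution to the chosen density of shortcuts, so the fraction of links in each class matches the fraction of probability mass assigned to it.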
In Fig. J we summarize the results of this choice of link velocities. We still recover a power-law growth of the average travel time with l (Fig. J.a), a constant behavior of the variance σ²_l of log t at large trip lengths (Fig. J.b), and an estimated collapsed scaling distribution p(τ), which is however not well represented by a log-normal distribution (Fig. J.c). On the other hand, the behavior of the variance of log t is more similar to the real case for short trips. In fact, in the basic simple model with two velocities, the standard deviation σ_l increases from zero to the expected constant value, as shown by the green curve in Fig. J.b, while in the real measured case the same quantity decreases (Fig. 1.d of the main text). This is because, for short trip lengths l, in our basic model it is unlikely to travel through a fast link, so that the regular Manhattan paths are the most used, and these are traveled at the same constant unit velocity. In the actual case, instead, the noise induced by the different average velocities of streets, street regulations and traffic conditions affects the variability of the travel time of short trips more than that of long ones. The introduction of stochastic velocities extracted from a realistic distribution partially reproduces this behavior (red curve in Fig. J.b). However, with respect to the actually measured σ_l, which decreases monotonically toward a constant value, here a pronounced minimum appears. It is not surprising that the behavior of our model does not faithfully mirror the real situation. In fact, our model has an oversimplified topology of the urban road network, and the velocities on its links are extracted independently from a realistic distribution, thus lacking any sort of correlations. In the following section we shall address these two shortcomings.

F Navigation algorithm in the Rome and London urban networks
In this section we further refine our model by considering the actual road networks of Rome and London. The average velocities of roads are those extracted from the combined information of OpenStreetMap and Google Directions introduced in the previous section. In this way we consider both a realistic topology of road connections and the actual average travel times, with the correlations between adjacent roads included. We apply the navigation algorithm with time optimization and intermediate steps after sampling random origin and destination points over the network. We obtain, as before, that the average of the logarithm of the travel time ⟨log t⟩_l grows with the length of the path l as a power law (Fig. K), with an exponent higher than the one found in real data. The fluctuations of log t decrease with l for short trips and tend to a constant as l grows. The value of this constant is lower than the one found in real data (Fig. K, top panels). The distributions p(t|l) have the same scaling property as those found in real data (Fig. K, bottom panels), so that a universal collapsed function can be obtained. This function cannot be a Gaussian, since it shows a significant skewness. Interestingly, contrary to what is found in the case of our grid model with simple topology, the Kolmogorov-Smirnov test does not exclude that these distributions are log-normal. This suggests that velocity correlations between adjacent roads are an important ingredient of the model. All previous results are confirmed in both networks: the scaling behavior of p(t|l), the exponent of the power-law relation between ⟨log t⟩_l and l, and the constant value of its variance σ²_l are similar in the two cases. The scaling property and the qualitative behavior of the distribution p(t|l) are captured by our model, although the optimization process is less efficient than the one occurring in real life and the fluctuations are somewhat damped.
Notice that the measures obtained with GPS data are aggregated at the end of each driver's trip, meaning that during the travel the trajectory of the car was affected by all the dynamical fluctuations of speed due to the traffic situation at the time the trip occurred. On the other hand, when sampling paths on a network like the GoogleMaps one, these fluctuations are removed by the averaging process performed by Google over the links of the network: fluctuations due to traffic are absorbed into the average travel speed assigned to every road. A more predictive and accurate model might consider the interaction between different paths converging on the same roads at the same time and their effect on the traveling speed.

G Log-normal distribution and multiplicative process
The fact that the distributions p(t|l) in Fig. K and Fig. L can be described by log-normal distributions suggests that some kind of multiplicative process might be at work. First of all, we notice that since the p(t|l) are log-normal, the distributions p(v|l), where v = l/t, must also obey a log-normal distribution. In fact, since log v = log l − log t, it is easy to see that

p(v|l) = \frac{1}{v \sigma_l \sqrt{2\pi}} \exp\!\left[-\frac{\left(\log v - (\log l - \mu_l)\right)^2}{2\sigma_l^2}\right],

where μ_l and σ_l are the parameters of p(t|l). Thus it is possible to suppose that the multiplicative process acts on the average velocities rather than on the total travel time.
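A quick numerical check of this change of variable, with hypothetical values of l, μ_l and σ_l:

```python
import numpy as np

rng = np.random.default_rng(3)

l = 5.0                    # fixed trip length (arbitrary units), hypothetical
mu_l, sigma_l = 1.2, 0.3   # hypothetical parameters of the log-normal p(t|l)

t = np.exp(rng.normal(mu_l, sigma_l, 100_000))  # travel times at fixed l
v = l / t                                       # average trip velocities

# log v = log l - log t, so v is log-normal with location log(l) - mu_l
# and the same width sigma_l
print(np.log(v).mean(), np.log(v).std())
```

The sample mean and width of log v reproduce log l − μ_l and σ_l, confirming that v inherits a log-normal distribution from t.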
A path Π of length l can be identified by the series of crossing times t_k and traveled distances l_k at each node, i.e., Π = {(l_1, t_1), (l_2, t_2), ..., (l_{m−1}, t_{m−1}), (l, t)}, where t is the total travel time. Similarly, it is possible to define the average velocity at the k-th step as v_k = l_k / t_k. Following the results of Refs. [3,7] for the spreading of novelties, a multiplicative process that leads to a log-normal distribution can be defined as

v_{k+1} = v_k \left[ 1 + f(k) + g(k) X_k \right] \simeq v_k \, e^{f(k) + g(k) X_k},

where f(k) and g(k) are such that the series \sum_{k=0}^{\infty} f(k) and \sum_{k=0}^{\infty} g(k) exist, and the X_k are i.i.d. with zero mean and unit variance. The last approximation holds whenever the g(k) are small (i.e., close to 0). In this case, defining γ_k = log(v_k), we have

\gamma_k \simeq \gamma_0 + \sum_{i=0}^{k-1} f(i) + \sum_{i=0}^{k-1} g(i) X_i,

and the distribution of the sum \sum_i g(i) X_i converges to a normal distribution [8]. Considering a set of paths of length l, sampled on the GoogleMaps network with the intermediate-step navigation algorithm, we can estimate the increments Δγ_k = γ_{k+1} − γ_k. For completeness, we report that these increments turn out to be weakly correlated (correlation coefficient ≈ 0.1, with vanishing p-value) and not identically distributed. At every step of a path we compute f(k) = ⟨Δγ_k⟩ and g(k)² = Var[Δγ_k]. From the analysis of the behavior of these quantities, shown in Fig. M, we find that both f(k) and g(k) decay with k; the corresponding fits yield A = 0.015, k_0 = 13.9, B = 0.41, b = 0.83 and k_1 = 390. It is easy to check that with these parameters both series converge and g(k) is small for k > 0. The behavior of these two functions can be understood by realizing that, as the step index of the path increases, the process converges to the average one. In fact, f(k) decreases rapidly to zero, so that the ratio of the average velocities on subsequent links oscillates around one. This indicates that v_k converges after a few steps to a value close to its average and then oscillates around this value. The behavior of g(k) indicates that these oscillations decrease until they approach zero for large k, when v_k is very close to its average value.
In order for the sum \sum_i g(i) X_i to converge to a normal distribution, the variables X_k must be i.i.d. We compute them by defining

X_k = \frac{\gamma_{k+1} - \gamma_k - f(k)}{g(k)}.

Fig. N shows the distributions of the X_k for some values of k. All the variables are identically distributed and uncorrelated (almost vanishing correlation coefficients). The hypotheses of Refs. [3,7] are thus satisfied, and the resulting distribution of the average velocities tends to a log-normal.
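The multiplicative scheme above is easy to simulate. In the sketch below the functional forms of f(k) and g(k) are our stand-ins (decaying functions with convergent series, reusing the fitted parameters quoted in the text), and the X_k are taken Gaussian, so that the sum of the increments is exactly normal and log v has vanishing skewness.

```python
import numpy as np

rng = np.random.default_rng(4)

# hypothetical decaying coefficients; the exponential cutoff at k_1 = 390
# keeps the series sum_k g(k) convergent even though b < 1
def f(k):
    return 0.015 * np.exp(-k / 13.9)                        # A = 0.015, k_0 = 13.9

def g(k):
    return 0.41 * (1.0 + k) ** -0.83 * np.exp(-k / 390.0)   # B = 0.41, b = 0.83, k_1 = 390

n_paths, n_steps = 20_000, 200
gamma = np.zeros(n_paths)                 # gamma_0 = log v_0 = 0
for k in range(n_steps):
    X = rng.normal(0.0, 1.0, n_paths)     # i.i.d., zero mean, unit variance
    gamma += f(k) + g(k) * X              # gamma_{k+1} = gamma_k + f(k) + g(k) X_k

# gamma = log v is a sum of independent Gaussian terms, hence Gaussian:
# its skewness must vanish, i.e. v is log-normally distributed
z = (gamma - gamma.mean()) / gamma.std()
print(abs((z ** 3).mean()))
```

With non-Gaussian zero-mean, unit-variance X_k, the same conclusion holds in the limit of many steps by the central limit theorem, as stated in Ref. [8].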
To highlight the differences, the previous reasoning can be repeated on the grid model with shortcuts, where the final distributions p(t|l) are not log-normal, in order to understand which of the hypotheses of Refs. [3,7] do not hold. Since the convergence of f(k) and g(k) is assured by the convergence of v_k to an average value, the only possibility is that the variables X_k are not identically distributed or are correlated. The latter hypothesis is easily ruled out by looking at the correlation coefficients between X_k and X_{k+1}, which are compatible with zero for every k, so that the X_k are uncorrelated. On the other hand, it can be easily checked that they are not identically distributed by computing the cumulants of degree larger than 2 (Fig. O). In fact, these cumulants show a clear dependence on the step k in the grid model, thus signaling that the X_k obey different statistics, while on the Rome urban network they oscillate around a constant value. Although we have shown that a sort of multiplicative effect is at work in the system, we could not find a satisfactory explanation for its emergence.