Understanding Human Mobility from Twitter

Understanding human mobility is crucial for a broad range of applications from disease prediction to communication networks. Most efforts on studying human mobility have so far used private and low-resolution data, such as call data records. Here, we propose Twitter as a proxy for human mobility, as it relies on publicly available data and provides high-resolution positioning when users opt to geotag their tweets with their current location. We analyse a Twitter dataset with more than six million geotagged tweets posted in Australia, and we demonstrate that Twitter can be a reliable source for studying human mobility patterns. Our analysis shows that geotagged tweets can capture rich features of human mobility, such as the diversity of movement orbits among individuals and of movements within and between cities. We also find that short- and long-distance movers both spend most of their time in large metropolitan areas, in contrast with intermediate-distance movers, reflecting the impact of different modes of travel. Our study provides solid evidence that Twitter can indeed be a useful proxy for tracking and predicting human movement.


S1 Statistics of tweeting
In this section, we report statistics that capture usage patterns of Twitter. We find that the number of tweets per user in our dataset follows a fat-tailed distribution, as shown in Figure S1(a). The result indicates that the frequency of tweeting is not homogeneous across the population; it exhibits an 80/20 effect, where a majority of registered Twitter users contribute only a small number of tweets and most tweets are posted by a small number of frequent users. As with other technologies, geotagged tweets inherently provide fine-grained data for heavy users and coarser data for lighter users. This should be considered for individual-based modelling of mobility, but on a population level, the observed dynamics still hold as in previous studies.
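The 80/20 effect described above can be checked with a short calculation. The following is a minimal sketch, not the code used in this study; the function name `top_share` and the example counts are hypothetical and only illustrate how the share of tweets posted by the most active users could be measured.

```python
import math

def top_share(tweet_counts, fraction=0.2):
    """Share of all tweets posted by the most active `fraction` of users."""
    ordered = sorted(tweet_counts, reverse=True)
    # Always include at least one user, even for very small populations.
    k = max(1, math.ceil(fraction * len(ordered)))
    return sum(ordered[:k]) / sum(ordered)

# Hypothetical per-user tweet counts, for illustration only (not study data).
example_counts = [80, 5, 5, 5, 5]
share = top_share(example_counts)  # the top 20% of users is one user here
```

A value of `share` close to 0.8 for `fraction=0.2` would correspond to the 80/20 pattern; a homogeneous population would give a value close to `fraction` itself.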
Next, we explore the sensitivity of our observations on mobility patterns to the number of tweets per user. Figure S1(b) shows the displacement distribution ∆r separately for user groups based on the number of available tweets N in the dataset. We observe the same movement modes as for the entire population. The main trend is that Twitter users with a higher N tend to have shorter steps. This is expected, as their higher number of tweets provides more fine-grained sampling of their actual movement, leading to shorter observed steps between position samples. Figure S1(c) shows the distribution of r_g split across the same user groups. Again, we observe the same three modes of movement regardless of N, and the distributions are broadly similar.
We then study the inter-event time of tweeting, i.e. the time interval between a user's two consecutive tweets. As shown in Figure S2(a), the inter-event time distribution also follows a fat-tailed distribution, which indicates that, unlike a homogeneous process with a Poissonian distribution [10,11], heterogeneous mechanisms or bursty dynamics such as prioritising task execution [8] or reinforcement decision-making [9] may exist in tweeting behaviour. We also observe a discontinuity in the plot around 86,400 s (one day), corresponding to the day/night cycle. To check whether our results depend on individual tweeting frequency, we group users into five categories based on their number of tweets and recalculate the inter-event time distribution in each group (Figure S2). Finally, we explore the sensitivity of the displacement distribution to inter-event times. We plot the displacement distribution separately for groups of tweets based on their inter-event time in Figure S2(c).
The distributions for all tweet groups show no structural difference, though the plot for ∆T < 3600 s clearly involves shorter displacements. This is expected, since users can only travel a bounded distance within one hour of their last tweet, which explains the faster decay of this plot at larger distances.
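To make the quantities in this section concrete, the sketch below computes inter-event times and displacements between a user's consecutive geotagged tweets and splits the displacements at a one-hour gap. It is an illustration under stated assumptions rather than the pipeline used in the study: the tuple layout `(unix_time, lat, lon)`, the haversine distance, the Earth radius constant, and all function names are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def steps(tweets):
    """Return (inter_event_seconds, displacement_km) for consecutive tweets.

    `tweets` is a list of (unix_time, lat, lon) tuples for a single user.
    """
    ordered = sorted(tweets)  # sort by timestamp
    return [
        (t1 - t0, haversine_km(la0, lo0, la1, lo1))
        for (t0, la0, lo0), (t1, la1, lo1) in zip(ordered, ordered[1:])
    ]

def split_by_gap(step_list, threshold_s=3600):
    """Split displacements by whether the inter-event time is below the threshold."""
    within = [d for dt, d in step_list if dt < threshold_s]
    beyond = [d for dt, d in step_list if dt >= threshold_s]
    return within, beyond

# Hypothetical tweets of one user, for illustration only.
user_tweets = [(7200, 0.0, 1.0), (0, 0.0, 0.0), (7260, 0.0, 1.0)]
within_hour, beyond_hour = split_by_gap(steps(user_tweets))
```

Pooling `within_hour` and `beyond_hour` over all users would give the two groups of displacements whose distributions can then be compared.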

S2 Technology Dependencies
To explore whether the observed irregularity in the distribution of d is merely due to GPS resolution, knowledge of the error associated with each reported location is important. Zandbergen in [5] reports
[Figure legend: radius of gyration (km): 1 < r_g < 10, 10 < r_g < 100, 100 < r_g < 500, 500 < r_g < ∞]

S3 Time evolution of r_g
We find that the radius of gyration as a function of time, r_g(t), averaged over the whole population, increases ultra-slowly (Figure S3), which confirms that strong recurrent patterns exist in human mobility. This information is of value for modelling disease risk, for instance, as it indicates that observing the first few hours of tweets can strongly indicate the longer-term r_g for a particular person. Thus, limited empirical data can seed mobility models with initial r_g values for people, and these values remain relatively stable over time.
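As an illustration of how r_g(t) can be tracked for one user, the sketch below uses the common definition of the radius of gyration as the root-mean-square distance of the visited positions from their centre of mass. It assumes positions have already been projected to planar coordinates in km; the projection, the tuple layout `(t, x, y)`, and the function names are assumptions, not details taken from the study.

```python
import math

def radius_of_gyration(points):
    """RMS distance of (x, y) points from their centre of mass."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)

def rg_over_time(trajectory, checkpoints):
    """r_g computed from all positions seen up to each elapsed time.

    `trajectory` is a list of (t, x, y); `checkpoints` are non-negative
    elapsed times measured from the user's first observation.
    """
    ordered = sorted(trajectory)
    t0 = ordered[0][0]
    return [
        radius_of_gyration([(x, y) for t, x, y in ordered if t - t0 <= c])
        for c in checkpoints
    ]

# Hypothetical trajectory (time in s, positions in km), for illustration only.
trajectory = [(0, 0.0, 0.0), (10, 2.0, 0.0), (20, 2.0, 0.0)]
rg_series = rg_over_time(trajectory, checkpoints=[0, 10, 20])
```

Averaging such series over users at common checkpoints would give a population-level curve of the kind discussed above.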

S4 Visitation frequency for different r_g
We now explore how the visitation frequency changes for users with different r_g, using the same approach as Figure 2. The results are shown in Figure S4 for a cluster size of 250 m.
Clearly, all r_g groups follow Zipf's law of preferential return, yet the likelihood of being at the most popular location decreases with increasing r_g (see insets). Similarly, the steepness of the plots drops with increasing r_g, indicating that people who move further have a lower preference to return to previously visited locations. This effect is likely to result from the higher cost [2] people incur for long-distance movement, which firstly increases the return cost, and secondly reduces the perceived value of returning.
[...] show similar patterns as in Figure 5, confirming that short- and long-distance movers remain mainly around the key cities, while intermediate-distance movers are more likely to be found further away from key population centres.

S6 Statistical Validation and Goodness of Fit
We use the traditional least squares estimation (LSE) method to obtain the fitting functions of the displacement distribution P(d) and the radius of gyration distribution P(r_g). The estimated parameters of the fitting functions for the two fitting schemes in the main text are shown in Tables 1 and 2. Here the probability density function (PDF) of the empirical data is obtained by logarithmic binning [12].
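The combination of logarithmic binning and least squares can be sketched as follows. This is a simplified illustration and not the fitting code used for Tables 1 and 2: it fits a single power law P(x) = C x^(-β) by ordinary least squares in log-log space as a stand-in for the fitting functions of the main text, and the bin layout (`bins_per_decade`) and function names are assumptions.

```python
import math

def log_binned_pdf(values, bins_per_decade=5):
    """Empirical PDF on logarithmically spaced bins.

    Returns (bin_centre, density) pairs for non-empty bins, where the centre
    is the geometric mean of the bin edges.
    """
    vals = [v for v in values if v > 0]
    counts = {}
    for v in vals:
        k = math.floor(math.log10(v) * bins_per_decade)
        counts[k] = counts.get(k, 0) + 1
    pdf = []
    for k in sorted(counts):
        left = 10 ** (k / bins_per_decade)
        right = 10 ** ((k + 1) / bins_per_decade)
        # Normalise by sample size and bin width so the density integrates to 1.
        pdf.append((math.sqrt(left * right), counts[k] / (len(vals) * (right - left))))
    return pdf

def fit_power_law(pdf):
    """Least-squares fit of log P = log C - beta * log x; returns (beta, C)."""
    xs = [math.log(x) for x, p in pdf if p > 0]
    ys = [math.log(p) for x, p in pdf if p > 0]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope, math.exp(my - slope * mx)

# Synthetic points lying exactly on 3 * x^-2, for illustration only.
synthetic_pdf = [(x, 3.0 * x ** -2.0) for x in (1.0, 10.0, 100.0)]
beta, prefactor = fit_power_law(synthetic_pdf)
```

Dividing each count by its bin width is what makes logarithmic bins yield a density rather than a histogram; without it, wider bins at large x would inflate the tail.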

Figure S5: Differences in tweet spatial distributions as the radius of gyration varies for all of Australia. Panels: (a) 1 < r_g < 10, (b) 10 < r_g < 100, (c) 100 < r_g < 500, (d) r_g > 500. Tweet activity for 1 < r_g < 10 and 500 < r_g < 1000 is mainly concentrated in large cities, while tweets for intermediate r_g extend further along main highways and other regions between cities.
It can be argued that the maximum likelihood estimation (MLE) method is usually more powerful for estimating fitting parameters of broad distributions such as a power law or an exponential [3], especially when the sample size is small. However, using MLE to fit a mixture of broad distributions is not easy to implement, and its performance is not well understood. Indeed, recent studies suggest that, when the sample size is large (e.g. in our study millions of displacements are used for fitting), traditional methods like LSE are comparable to state-of-the-art methods like MLE [1]. LSE combined with logarithmic binning can even perform better than MLE in some cases [13].
To demonstrate that P(d) with d ∈ [100 m, 50 km], corresponding to the regime of urban movements, is better approximated by a stretched exponential than by other candidate models with a single statistical function, such as a truncated power law or a log-normal, we use Akaike's information criterion (AIC) [14] to measure the relative goodness of fit for this part.
In particular, the AIC for each candidate model i is given by
$$\mathrm{AIC}_i = -2 \ln L_i + 2 K_i,$$
where L_i is the maximum likelihood of the fitting function whose parameters are estimated by MLE, and K_i is the number of parameters. The Akaike weight, which represents the relative likelihood of each candidate model i, is then given by
$$w_i = \frac{\exp(-\Delta_i / 2)}{\sum_j \exp(-\Delta_j / 2)},$$
where ∆_i = AIC_i − AIC_min and AIC_min = min{AIC_i}. Here we consider five commonly used statistical functions for heavy-tailed probability densities, namely exponential (E), power-law (PL), truncated power-law (TPL), log-normal (LN) and stretched-exponential (SE). It is clear that the stretched exponential has a dominating Akaike weight over the other candidate functions, as shown in Table 3.
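The model comparison can be sketched in a few lines, using the standard definitions AIC_i = −2 ln L_i + 2K_i and Akaike weights proportional to exp(−∆_i/2). The log-likelihood values below are hypothetical placeholders chosen only to exercise the functions; they are not results from this study, and the function names are assumptions.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike's information criterion from a maximised log-likelihood."""
    return -2.0 * log_likelihood + 2.0 * n_params

def akaike_weights(aic_values):
    """Map {model: AIC} to {model: Akaike weight}; the weights sum to 1."""
    best = min(aic_values.values())
    # Relative likelihood of each model with respect to the best one.
    rel = {name: math.exp(-(value - best) / 2.0) for name, value in aic_values.items()}
    total = sum(rel.values())
    return {name: r / total for name, r in rel.items()}

# Hypothetical (log-likelihood, parameter count) pairs, NOT study results.
candidates = {"E": (-1050.0, 1), "PL": (-1040.0, 1), "SE": (-1000.0, 2)}
aics = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
weights = akaike_weights(aics)
```

Subtracting the minimum AIC before exponentiating keeps the computation numerically stable when the absolute AIC values are large, as they are for samples with millions of displacements.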

S7 Clustering Effects
For evaluating return probabilities, the trajectories of the users were adjusted in the following way: each point (x_i, y_i) of a trajectory was mapped to the point (x_c, y_c), where (x_c, y_c) is the centroid of the cluster containing (x_i, y_i). The results for Figures 2, 4, and S4 use a cluster size of 250 m. Here, we investigate the effect of cluster size on the trends that we observe, in order to establish that these trends are independent of our cluster size selection.
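The mapping of points to cluster centroids can be illustrated as follows. The clustering procedure itself is not specified in this section, so the sketch assumes a simple square grid whose cell side equals the cluster size, with coordinates already projected to metres; both the grid scheme and the function name are assumptions made for illustration.

```python
from collections import defaultdict

def snap_to_cluster_centroids(points, cluster_size_m=250.0):
    """Replace each (x, y) point by the centroid of the grid cell containing it.

    Assumption: clusters are square grid cells of side `cluster_size_m`;
    the study's actual clustering method may differ.
    """
    def cell(x, y):
        return (int(x // cluster_size_m), int(y // cluster_size_m))

    members = defaultdict(list)
    for x, y in points:
        members[cell(x, y)].append((x, y))
    centroids = {
        key: (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
        for key, pts in members.items()
    }
    return [centroids[cell(x, y)] for x, y in points]

# Hypothetical projected positions in metres, for illustration only.
trajectory = [(10.0, 10.0), (30.0, 10.0), (600.0, 600.0)]
snapped = snap_to_cluster_centroids(trajectory)
# The first two points share a cell and map to (20.0, 10.0).
```

Re-running the same trajectories with different values of `cluster_size_m` is one way to test how sensitive downstream statistics are to the cluster size.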
We note that most studies that use cellular phone traces for mobility analysis [1,4] do not define explicit location clusters, as the spatial resolution of this data is based on tower locations and is typically in the order of 1 km. In other words, most mobile-phone-based studies have implicit cluster sizes of 1 km. Because Twitter data provides a resolution of up to 10 m (the realistic resolution of GPS [15]), Twitter-based mobility analysis requires explicit clustering of positions to account for multiple tweets from the same location. To provide a comparison point with cellular phone data, we consider explicit clustering at 1 km, in addition to clusters of 50 m and 500 m. Figure S6 plots the variation of the probability of return to the most popular location P(L = 1) and the preferential return exponent α for the three cluster sizes (50 m, 500 m, 1000 m). Compared with Figure 4(b), the cluster size does not affect the dominant trends in these plots. P(L = 1) consistently decreases and α increases with increasing r_g, pointing to weaker preferential return. P(L = 1) increases by about 0.08 as we increase the cluster size from 50 m to 1 km, and α decreases slightly, indicating a mild strengthening of preferential return for larger clusters. Despite these scale differences, it is clear that the cluster size selection does not affect the observed trend of weaker preferential return for larger r_g.
Figure S6: The effect of cluster size on observed trends in P(L = 1) and α. Clearly, the cluster size affects the scale but not the pattern of decreasing P(L = 1) and increasing α for larger r_g.
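For reference, the two quantities tracked in this section can be computed from a user's sequence of clustered locations as sketched below. The code ranks locations by visit count, reports the share of visits to the top-ranked location as P(L = 1), and fits a straight line to the rank-frequency curve in log-log space. The exact definition of α used for Figure S6 is not restated here, so treating the magnitude of the fitted slope as a Zipf-type exponent is an assumption, as are the function names and the example data.

```python
import math
from collections import Counter

def visitation_frequencies(locations):
    """Visit frequencies ordered by rank (most visited location first)."""
    counts = Counter(locations)
    total = sum(counts.values())
    return [c / total for _, c in counts.most_common()]

def loglog_slope_magnitude(freqs):
    """Magnitude of the least-squares slope of log(frequency) vs log(rank).

    Requires at least two ranked locations.
    """
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope

# Hypothetical sequence of cluster labels for one user, for illustration only.
visits = ["home"] * 4 + ["work"] * 2 + ["gym"]
freqs = visitation_frequencies(visits)
p_top = freqs[0]  # share of visits at the most popular location, P(L = 1)
```

Computing `p_top` and the slope per user, then averaging within r_g groups and repeating for each cluster size, would reproduce the kind of comparison summarised in Figure S6.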