Fig 1.
We show the activity of a single user during one week of his life. From the top to the bottom panel we show the activity of calls (red), texts (green), movement (purple), and social proximity (blue). A daily pattern and a tendency for bursts can be noticed in the series, but further structure is not immediately visible. In the following we will discover the precise interactions between the different activities and use the emerging patterns to put upper bounds on the predictability of the future whereabouts of an individual.
Fig 2.
(A) We visualize the task of predicting future activity based on a four-dimensional time series of length Δth. Each time bin has a width of 15 minutes and 0/1 represents activity/inactivity. The probability of activity at t = tf may be determined from the statistics of similar predictive patterns in the data set. For example, the particular predictive pattern shown here has a 45% probability of activity of the mobility type at tf = 30 min. (B) The data set needs a certain size in order to make accurate probability estimates. Here we show the informedness of the predictions at tf = 15 min based on a predictive pattern of length Δth = 45 min and varying data set sizes. Full convergence is obtained around 10⁷ predictive patterns, which means that the data set is sufficiently large for this task. (C) Using the full data set, we then make predictions for tf = 15 min and vary the length of the predictive pattern. Increasing the memory also increases the informedness of our predictions, as the connected markers show. The disconnected markers at Δth = 60 min and Δth = 75 min are limited by the amount of data, so they do not reach the true upper bound on informedness. (D) We then fix the length of the predictive patterns to Δth = 45 min and vary tf. The lines represent the population average of the informedness for predictions that are based on individual activity patterns. The ‘x’ and ‘+’ markers represent the population average of the informedness for predictions that are based on common activity patterns. The common patterns capture almost all of the information in the individual patterns, which tells us that the activity patterns are general within our population. (E) We expand the population average in terms of individual data sets for the case of movement and tf = 30 min (second marker from the left on the purple line in panel D). The horizontal axis shows the informedness of predictions based on individual predictive patterns, while the vertical axis shows the informedness of predictions based on general predictive patterns. Both measures vary across individuals, but they do so in almost perfect agreement.
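The pattern-counting idea in panel (A) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the data layout (one tuple of 0/1 flags per 15-minute bin), and the use of the standard binary definition of informedness (true positive rate plus true negative rate minus one) are all assumptions made for illustration.

```python
from collections import defaultdict


def pattern_probabilities(series, target, n_hist, n_ahead):
    """Estimate P(target channel active | preceding pattern) by counting.

    series  : list of tuples, one 0/1 flag per activity channel per time bin
    target  : index of the channel to predict
    n_hist  : history length in bins (hypothetical: 3 bins = 45 min)
    n_ahead : prediction horizon in bins (hypothetical: 1 bin = 15 min)
    """
    # pattern -> [times seen, times followed by activity on the target channel]
    counts = defaultdict(lambda: [0, 0])
    for end in range(n_hist, len(series) - n_ahead + 1):
        pattern = tuple(series[end - n_hist:end])      # the last n_hist bins
        outcome = series[end - 1 + n_ahead][target]    # state n_ahead bins later
        counts[pattern][0] += 1
        counts[pattern][1] += outcome
    return {p: active / seen for p, (seen, active) in counts.items()}


def informedness(y_true, y_pred):
    """Binary informedness (assumed definition): TPR + TNR - 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(y_true)
    neg = len(y_true) - pos
    tpr = tp / pos if pos else 0.0
    tnr = tn / neg if neg else 0.0
    return tpr + tnr - 1.0


# Toy single-channel series that alternates between active and inactive.
toy = [(1,), (0,), (1,), (0,), (1,), (0,)]
probs = pattern_probabilities(toy, target=0, n_hist=1, n_ahead=1)
```

In this toy series an active bin is always followed by an inactive one and vice versa, so the estimated probabilities are 0 and 1. With real data, rarely seen patterns give noisy estimates, which is the convergence issue that panels (B) and (C) address.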
Fig 3.
We show the informedness of our predictions regarding future activity of the respective activity types: call (A), text (B), movement (C), and proximity (D). The horizontal axis shows the reach of the predictions into the future in minutes. The informedness of predictions from three different models is presented. “Nonparametric” (squares) is the label for the nonparametric model that was also presented in Fig 2, here using a history length of Δth = 45 min. “Inertia” (discs) is a simple model which assumes that the future continues in the same state as the current one. We see that the near-future predictions of movement and proximity are dominated by this information. “Linear” (triangles) labels the predictions of a generalized linear model. Note that the informedness of the linear model almost matches the upper bound represented by the nonparametric model.
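As a rough illustration of the “Inertia” baseline, the sketch below uses the current state of a single activity channel as the prediction for a later bin and scores it with informedness written as true positive rate minus false positive rate. The function names and the toy series are hypothetical, and the scoring convention is an assumption rather than the paper's exact procedure.

```python
def inertia_pairs(channel, n_ahead):
    """Pair each bin's state (the prediction) with the state n_ahead bins later (the truth)."""
    return [(channel[i], channel[i + n_ahead]) for i in range(len(channel) - n_ahead)]


def informedness_from_pairs(pairs):
    """Informedness as TPR - FPR, which equals TPR + TNR - 1 for binary labels."""
    preds_when_active = [p for p, t in pairs if t == 1]
    preds_when_inactive = [p for p, t in pairs if t == 0]
    if not preds_when_active or not preds_when_inactive:
        return 0.0  # undefined without both classes; treated as uninformative here
    tpr = sum(preds_when_active) / len(preds_when_active)
    fpr = sum(preds_when_inactive) / len(preds_when_inactive)
    return tpr - fpr


# Toy channel with activity in blocks of three bins.
channel = [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1]
short_horizon = informedness_from_pairs(inertia_pairs(channel, n_ahead=1))
long_horizon = informedness_from_pairs(inertia_pairs(channel, n_ahead=3))
```

On this toy channel the baseline is informative one bin ahead and systematically wrong three bins ahead, because the state has always flipped by then. This mirrors the general point that inertia mainly helps for near-future predictions.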
Fig 4.
In (A)-(C) we show the increase in activity triggered by calling, movement, and social proximity, respectively. The horizontal axis spans 24 hours backward and forward in time and the vertical axis gives the factor of increased activity on a logarithmic scale. For example, the purple line in (A) tells us that movement is enhanced by a factor of 2 at the time of a call and remains significantly increased for several hours afterward. More generally, all activity types are subject to self-enhancement effects, and significant cross-correlations are observed for all combinations of activities.
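One simple way to read the “factor of increased activity” is as a ratio between the probability of activity at a given lag after a trigger and a reference probability. The sketch below uses the overall base rate as the reference, which is a simplifying assumption: it ignores any correction for circadian correlations. The function name and the toy series are hypothetical.

```python
def enhancement_factor(trigger, response, lag):
    """Ratio P(response active at t + lag | trigger active at t) / P(response active).

    trigger, response : equal-length lists of 0/1 flags, one per time bin
    lag               : offset in bins; negative values look backward in time
    Returns None when the ratio cannot be estimated.
    """
    n = len(trigger)
    # Response states at the lagged position for every trigger event in range.
    hits = [response[t + lag] for t in range(n) if trigger[t] and 0 <= t + lag < n]
    base_rate = sum(response) / n  # assumed reference: plain base rate
    if not hits or base_rate == 0:
        return None
    return (sum(hits) / len(hits)) / base_rate


# Toy data: the response always follows the trigger by exactly one bin.
trigger = [1, 0, 0, 0, 1, 0, 0, 0]
response = [0, 1, 0, 0, 0, 1, 0, 0]
factor_after = enhancement_factor(trigger, response, lag=1)
factor_same_bin = enhancement_factor(trigger, response, lag=0)
```

Here the response is four times more likely than its base rate one bin after a trigger and absent in the trigger bin itself. Evaluating the function over a range of positive and negative lags would trace out a curve of the kind plotted against the horizontal axis.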
Fig 5.
Convergence of informedness with the size of the data set.
We show the cases of 60 minute histories (A) and 75 minute histories (B). Note that full convergence is not reached within the full size of the data set.
Fig 6.
In (A)-(C) we show the probability of activity conditioned on a call, movement, and social proximity, respectively, at Δt = 0. Also shown are reference activities (dashed), which take into account the circadian correlations. The horizontal axis spans 24 hours backward and forward in time.