Microscopic Modelling Circadian and Bursty Pattern of Human Activities

Recent studies of a wide range of human activities, such as email communication, Web browsing, and library visits, have revealed their bursty nature. The distribution of inter-event times (IETs) between two consecutive human activities exhibits a heavy-tailed decay and an oscillating pattern with a one-day period, reflecting the circadian pattern of human life. Although the priority-based queueing model has been successful as a basic model for understanding the heavy-tailed behavior, it ignores important ingredients such as the diversity of individual activities and the circadian pattern of human life. Here, we collect a large-scale dataset containing the time stamps at which individuals posted blog articles, and on this basis we construct a theoretical model that takes both ignored ingredients into account. We first identify the active and inactive time intervals of each individual and remove the inactive intervals, thereby constructing an ad hoc continuous time domain. In this domain, the priority-based queueing model is applied, with the arrival and execution rates of tasks adjusted to match the activity data of individuals. The results are then transferred back to the real-time domain, which produces an oscillating, heavy-tailed IET distribution. This microscopic model enables us to develop a theoretical understanding of further empirical results.


Introduction
In the information age, large-scale databases containing information on human activities on the Web are easily accessible. Understanding the patterns that emerge from those datasets is a new interdisciplinary research subject [1,2]. Since individuals behave through complex and sometimes random decision-making processes, one may wonder whether it is indeed possible to predict human behavior quantitatively. However, it was recently revealed that the digital records left behind by one's activities make human activities predictable up to 93% [3]. Accordingly, investigating the patterns that emerge from such large-scale databases has become an attractive subject. Power-law or heavy-tailed behavior in the distribution of inter-event times (IETs) between two consecutive human activities is one example of such emerging patterns. It can be seen in various systems such as email [4–9] and surface mail communications [10], Web browsing [7,11], library loans [7], financial trades [7,12], online movie watching [13], file downloads [14–16], printing requests [17], and various actions on the Web [18]. This power-law behavior indicates that human activities proceed in bursts within short time intervals, which are separated from one another by long periods of inactivity [19,20].
Several theoretical models have been proposed to explain such heavy-tailed behavior in the IET distribution. One interesting model is the priority-based queueing model [4,7,21,22], in which the human activity of uploading articles is regarded as the execution of tasks in a queue, where tasks are performed in the order of the priorities assigned to them. This priority-based model leads to power-law or heavy-tailed behavior in the waiting-time distribution of tasks in the queue [9,13,18,23–25], and the waiting-time distribution is interpreted as the IET distribution of human activities. However, the priority-based queueing model ignores important ingredients such as the circadian pattern of human life [26] and the diversity of individual activities. Indeed, recently collected empirical data exhibit an oscillating pattern with a one-day period in the IET distribution [6,13,18,27], which the queueing model cannot produce. Moreover, the decay of the IET distribution in the long-time regime depends on the activities of individuals, where the activity of an individual is defined as the average number of articles posted per unit time. In this paper, we analyze a large-scale, high-resolution dataset and find a new pattern in the IET distribution: when the IET is shorter than one day, the distribution exhibits power-law behavior with an exponent that is insensitive to the activities of individuals, whereas when the IET is longer than one day, it exhibits heavy-tailed behavior whose tail depends on the activities of individuals. We reproduce these empirical results with the theoretical model developed below.

Methods
We analyze a large-scale dataset spanning more than five years from NAVER (http://naver.com), the largest portal site in Korea. The dataset consists of the time stamps, recorded with a resolution of one second, at which individuals posted blog articles. It contains 520,771,167 postings contributed by 9,878,904 distinct bloggers. Among them, we select only the articles written by bloggers who had authored more than 100 articles and had been active for more than one month. This selection aims to exclude bloggers who posted suspicious spam content. After this filtering, 379,627,193 articles contributed by 908,409 users remain.
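The selection rule can be sketched in a few lines of Python. This is a hedged illustration, not the authors' pipeline: we assume time stamps are Unix seconds grouped per user in a dict, and we read "active for more than one month" as more than 30 days between a user's first and last post. The function name `filter_bloggers` is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def filter_bloggers(timestamps_by_user, min_posts=100, min_days=30):
    """Keep users with more than `min_posts` articles whose first and last
    posts are more than `min_days` days apart (assumed reading of 'one month').

    timestamps_by_user: dict mapping a user id to a list of time stamps in seconds.
    Returns a new dict holding the surviving users' time stamps, sorted.
    """
    kept = {}
    for user, stamps in timestamps_by_user.items():
        if len(stamps) <= min_posts:
            continue  # too few articles
        span = max(stamps) - min(stamps)
        if span <= min_days * SECONDS_PER_DAY:
            continue  # active for one month or less
        kept[user] = sorted(stamps)
    return kept


if __name__ == "__main__":
    # Toy data, not taken from the NAVER dataset.
    data = {
        "a": [i * 34_560 for i in range(101)],  # 101 posts spread over 40 days
        "b": [i * 8_640 for i in range(101)],   # 101 posts spread over 10 days
        "c": [i * 86_400 for i in range(50)],   # only 50 posts
    }
    print(sorted(filter_bloggers(data)))  # ['a']
```

Both thresholds are parameters, so other readings of the criterion (for example, calendar months) can be tried without changing the logic.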
From this dataset, we obtain the following empirical results: (i) The IET distribution decays following a power law with exponent $\alpha \approx 1.5$ in the time regime shorter than one day. (ii) The IET distribution exhibits a heavy-tailed decay in the long-time regime, which is nonuniversal and depends on individual activities. (iii) An oscillating pattern with a period of one day appears and persists over the entire long-time regime, although its amplitude decreases with time. Details of these results are presented below.
We measure the IETs, defined as the intervals between consecutive time stamps of each user. The distribution $P_i(t)$ of the IETs of user $i$ is then obtained as

$$P_i(t) = \frac{n_i(t)}{\sum_{s} n_i(s)},$$

where $n_i(t)$ is the number of events having an IET of $t$. The total number of articles $N_i$ written by user $i$ is given as

$$N_i = 1 + \sum_{s} n_i(s).$$

To determine the collective behavior of all users, we calculate

$$P(t) = \frac{\sum_i n_i(t)}{\sum_i (N_i - 1)}. \tag{1}$$

$P(t)$ is shown in Fig. 1(a). For $t < 1$ day, $P(t) \sim t^{-1.5}$; for $t > 1$ day, $P(t)$ follows a skewed distribution. Interestingly, $P(t)$ contains an oscillating pattern, which can be seen more clearly at the finer scale shown in Fig. 1(b). Moreover, the peak heights change periodically with a period of one week [6,13,18,27]. To check the periodicity of the oscillating pattern, we compute the Fourier transform of $P(t)$. Figure 1(c) shows that there indeed exist two distinct peaks at the frequencies corresponding to one day and one week, respectively; the other peaks correspond to multiples of the one-day frequency. We show later how such an oscillating pattern can be reproduced within the framework of the priority-based queueing model.
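A minimal sketch of both steps, under stated assumptions: `pooled_iet_distribution` pools the IET counts $n_i(t)$ of all users and normalizes by the total number of IETs, which is our reading of Eq. (1); `dominant_periods` reports the periods of the strongest Fourier components of a sampled $P(t)$. Dividing out the $t^{-1.5}$ trend before the FFT is an assumption of this sketch so that the power-law decay does not swamp the periodic part; the paper does not say how the trend was handled. Both function names are hypothetical.

```python
from collections import Counter

import numpy as np


def pooled_iet_distribution(timestamps_by_user):
    """Pool the IET counts n_i(t) of all users and normalize by the total
    number of IETs, sum_i (N_i - 1)."""
    counts = Counter()
    for stamps in timestamps_by_user.values():
        stamps = sorted(stamps)
        counts.update(b - a for a, b in zip(stamps, stamps[1:]))
    total = sum(counts.values())
    return {t: c / total for t, c in sorted(counts.items())}


def dominant_periods(p, dt=1.0, alpha=1.5, n_peaks=2):
    """Periods (in units of dt) of the strongest Fourier components of p(t).

    p[k] is P(t) sampled at t = (k + 1) * dt. The t**(-alpha) trend is divided
    out first (an assumption of this sketch, not a documented step).
    """
    t = (np.arange(len(p)) + 1) * dt
    signal = np.asarray(p) * t**alpha
    signal = signal - signal.mean()             # remove the zero-frequency part
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(p), d=dt)       # cycles per unit of dt
    order = np.argsort(spectrum[1:])[::-1] + 1  # strongest first, skipping f = 0
    return [1.0 / freqs[k] for k in order[:n_peaks]]


if __name__ == "__main__":
    # Synthetic P(t), sampled hourly over 8 weeks, with daily and weekly modulation.
    hours = np.arange(1, 24 * 7 * 8 + 1)
    p = (hours**-1.5
         * (1 + 0.6 * np.cos(2 * np.pi * hours / 24))
         * (1 + 0.3 * np.cos(2 * np.pi * hours / 168)))
    print(dominant_periods(p))  # close to [24.0, 168.0]
```

The synthetic input is made up for illustration; with real data one would first bin the pooled IETs on a regular time grid.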
Next, we examine how the IET distribution depends on the activity of individuals. The activity $A_i$ of user $i$ is the number of articles written per unit time. Thus, when user $i$ writes $N_i$ articles during the time interval $T^{(i)}_{\rm tot}$ [13,18], where $T^{(i)}_{\rm tot}$ is the time between the first and the last time stamps of user $i$, the activity of user $i$ is $A_i = N_i / T^{(i)}_{\rm tot}$. To quantify the heterogeneity of individual activities, we measure the distribution of activities, shown in the inset of Fig. 2. This distribution decays following a power law with an exponent of approximately 2.6, indicating that individual activities are considerably heterogeneous. It is therefore worth investigating how this heterogeneity affects the IET distribution [28,29]. Figure 2 shows that the higher a user's activity, the faster the IET distribution decays in the long-time regime. This behavior is natural in the sense that a user with higher activity has a shorter mean IET. This activity-dependent behavior motivates the model presented later.
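The activity definition $A_i = N_i / T^{(i)}_{\rm tot}$ and a grouping of users by activity can be sketched as follows. This is an illustration only: time stamps are assumed to be in seconds (so $A_i$ is in posts per second), and the one-bin-per-decade logarithmic grouping is a hypothetical choice, not the binning used for Fig. 2.

```python
import math


def activities(timestamps_by_user):
    """A_i = N_i / T_tot^(i), with T_tot the time between first and last stamp.

    Users whose stamps are all identical (T_tot = 0) are skipped.
    """
    result = {}
    for user, stamps in timestamps_by_user.items():
        span = max(stamps) - min(stamps)
        if span > 0:
            result[user] = len(stamps) / span
    return result


def group_by_activity(acts, bins_per_decade=1):
    """Group users into logarithmically spaced activity classes.

    The class index is floor(log10(A) * bins_per_decade); this binning is a
    hypothetical choice for illustration.
    """
    groups = {}
    for user, a in acts.items():
        k = math.floor(math.log10(a) * bins_per_decade)
        groups.setdefault(k, []).append(user)
    return groups


if __name__ == "__main__":
    acts = activities({"a": [0, 10, 20], "b": [0, 1000], "c": [5]})
    print(acts)                     # {'a': 0.15, 'b': 0.002}
    print(group_by_activity(acts))  # {-1: ['a'], -3: ['b']}
```

Within each group, the IET distribution can then be pooled exactly as for the whole population.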

Modelling Oscillating Pattern
In previous studies, the heavy-tailed behavior of the IET distribution was investigated by using the priority-based queueing model, in which time is treated as continuous without any intermission. However, humans do not work continuously, and hence intermissions, such as periods of sleep, must be considered. Moreover, the pattern of daily life is almost regular during weekdays, but it differs from that during weekends. Thus, it is natural to assume that each person has a regular daily time interval during which the person is away from the online world. This interval is called the inactive time interval, and the remaining time of the day is called the active time interval. The duration and starting time of the active time interval depend on the individual (see Fig. 3).
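The text does not spell out how the inactive interval is identified, so the following is only one plausible sketch: take a user's hour-of-day posting histogram and find the longest circular run of "quiet" hours (at most `threshold` posts), treating the run as the inactive interval $T_I$ and the rest of the day as $T_A = 24 - T_I$ hours. The function name and the `threshold` parameter are hypothetical.

```python
def inactive_interval(hour_counts, threshold=0):
    """Longest circular run of hours with at most `threshold` posts.

    hour_counts[h] is the number of posts made in hour-of-day h (h = 0..23).
    Returns (start_hour, length_in_hours); the active interval is then
    T_A = len(hour_counts) - length.
    """
    n = len(hour_counts)
    quiet = [c <= threshold for c in hour_counts]
    if all(quiet):
        return 0, n
    # Start scanning just after a busy hour so that no quiet run is split
    # by the wrap-around at midnight.
    first_busy = quiet.index(False)
    best_start, best_len, run_start, run_len = 0, 0, 0, 0
    for k in range(1, n + 1):
        h = (first_busy + k) % n
        if quiet[h]:
            if run_len == 0:
                run_start = h
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return best_start, best_len


if __name__ == "__main__":
    # Quiet from 22:00 to 06:00, busy otherwise (toy counts).
    counts = [0] * 6 + [3] * 16 + [0] * 2
    print(inactive_interval(counts))  # (22, 8), so T_A = 16 h
```

A small positive `threshold` tolerates occasional late-night posts; the appropriate value is a modelling choice.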
Suppose that two events occur within the active interval of a single day (see Fig. 4a) at times $h_1$ and $h_2$, where $h_1 < h_2$. Then the inter-event time is $t = h_2 - h_1$. More generally, when the two events are executed in different active intervals separated by $nT + T_I$, where $n \ge 0$ is an integer and $T = T_A + T_I$ is one day (see Fig. 4b), we obtain the relation

$$t = (T_A - h_1) + (nT + T_I) + h_2 = (n+1)T + h_2 - h_1, \tag{2}$$

where $h_1$ and $h_2$ are measured from the start of their respective active intervals, and $t - (n+1)T_I$ is the IET after removing the inactive time intervals. This quantity is defined as the IET in the ad hoc time domain and is denoted as $t'$; the ad hoc time domain is then continuous. We find that any inter-event time $t$ belongs to one of the two sets of intervals $\mathcal{T}_1$ and $\mathcal{T}_2$, defined as

$$\mathcal{T}_1 = \{\, t \mid 0 < t - nT < T_A \,\} \quad \text{for } h_1 < h_2, \tag{3}$$

$$\mathcal{T}_2 = \{\, t \mid T_I < t - nT < T \,\} \quad \text{for } h_1 > h_2. \tag{4}$$

The fractions of the two categories are given as

$$q_1(t; T_A, n) = \frac{T_A - (t - nT)}{T_A}, \tag{5}$$

$$q_2(t; T_A, n) = \frac{t - nT - T_I}{T_A}, \tag{6}$$

respectively. Let $P^{(i)}_{\rm ad}(t')$ be the IET distribution of user $i$ in the ad hoc time domain, and let $P_{\rm ad}(t')$ be the collective distribution over individuals, defined as

$$P_{\rm ad}(t') = \frac{\sum_i (N_i - 1)\, P^{(i)}_{\rm ad}(t')}{\sum_i (N_i - 1)}. \tag{7}$$

Here, $t'$ is related to $t$ in the original time domain as $t' = t - m T^{(i)}_I$, where $m$ is the largest non-negative integer satisfying $t' > 0$, which implies that there exist $m$ inactive time intervals during $t$. $P^{(i)}_{\rm ad}(t')$ is obtained from the queueing model [30], as discussed later. Collecting all individuals' $P^{(i)}_{\rm ad}(t')$ by using Eq. (7), we obtain $P_{\rm ad}(t')$, which exhibits the heavy-tailed distribution shown in Fig. 1.
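The construction of the ad hoc time domain can be sketched by mapping each real time stamp to a time from which every inactive interval has been deleted; differences of mapped times are then the ad hoc IETs $t'$. This sketch assumes, hypothetically, a single fixed active window per day of length `t_active` hours starting at hour `active_start`; times that fall inside an inactive interval are clamped to the end of the preceding active interval.

```python
def to_ad_hoc(tau, active_start, t_active, period=24.0):
    """Map a real time tau (hours) to ad hoc time by deleting inactive intervals.

    Each day of length `period` is assumed to contain one active interval
    [active_start, active_start + t_active); a time inside an inactive
    interval is clamped to the end of the preceding active interval.
    """
    day, phase = divmod(tau - active_start, period)
    return day * t_active + min(phase, t_active)


def ad_hoc_iet(tau1, tau2, active_start, t_active, period=24.0):
    """IET t' between two events (tau1 < tau2) measured in the ad hoc domain."""
    return (to_ad_hoc(tau2, active_start, t_active, period)
            - to_ad_hoc(tau1, active_start, t_active, period))


if __name__ == "__main__":
    # Active 0-16 h (T_I = 8 h). Events at h1 = 10 h on day 0 and h2 = 3 h on
    # day 2, so n = 1 and t = (n + 1) * 24 + 3 - 10 = 41 h.
    print(ad_hoc_iet(10, 51, 0, 16))  # 25.0 = 41 - (n + 1) * 8
```

For these two events, the mapping reproduces $t' = t - (n+1)T_I$ from Eq. (2).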
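We now consider how to reproduce the oscillating behavior. For this purpose, we assume that an IET distribution in the ad hoc time domain is given, for example, $P_{\rm ad}(t')$ obtained from the empirical data or $P_{\rm model}(t') \sim t'^{-1.5}$ obtained from the queueing model [30]. Then the IET distribution of user $i$ with active time interval $T^{(i)}_A$ and inactive time interval $T^{(i)}_I = T - T^{(i)}_A$ is obtained as

$$P_i(t; T^{(i)}_A) \propto \sum_{n \ge 0} \Big[ q_1(t; T^{(i)}_A, n)\, \Pi(t - nT; 0, T^{(i)}_A)\, P_x(t - nT^{(i)}_I) + q_2(t; T^{(i)}_A, n)\, \Pi(t - nT; T^{(i)}_I, T)\, P_x\big(t - (n+1)T^{(i)}_I\big) \Big], \tag{8}$$

where $P_x$ represents either $P_{\rm ad}$ or $P_{\rm model}$, and $\Pi$ is the rectangle function

$$\Pi(x; a, b) = \begin{cases} 1 & \text{if } a < x < b, \\ 0 & \text{otherwise}, \end{cases}$$

which selects the intervals defined in $\mathcal{T}_1$ and $\mathcal{T}_2$. Next, averaging $P_i(t; T^{(i)}_A)$ over all users, we obtain

$$P_{\rm theory}(t) = \sum_{T_A} r(T_A)\, P_i(t; T_A), \tag{9}$$

where $r(T_A)$ is the fraction of users whose active time interval is $T_A$. The distribution $r(T_A)$ peaks at $T_A = 16$ h, as shown in Fig. 3(b). By plugging the empirical distribution $P_{\rm ad}$ into $P_x$ in Eq. (8), we successfully reproduce the oscillating pattern of the IET distribution $P_{\rm theory}(t)$, shown in the inset of Fig. 1(a) and in Fig. 5.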
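The mapping from the ad hoc domain back to real time can be evaluated numerically. This is a hedged sketch of Eqs. (8) and (9), assuming $q_1 = (T_A - \tau)/T_A$ and $q_2 = (\tau - T_I)/T_A$ with $\tau = t - nT$ (which follows if events are spread uniformly over the active interval), and ad hoc arguments $t - nT_I$ (category $\mathcal{T}_1$) and $t - (n+1)T_I$ (category $\mathcal{T}_2$). The result is left unnormalized, and the function names are hypothetical.

```python
T_DAY = 24.0  # one day, in hours


def rect(x, a, b):
    """Rectangle function Pi(x; a, b): 1 on the open interval (a, b), else 0."""
    return 1.0 if a < x < b else 0.0


def p_real_time(t, t_active, p_x, period=T_DAY):
    """Unnormalized real-time IET density for one active-interval length T_A.

    p_x(t') is the IET distribution in the ad hoc domain (t' > 0).
    """
    t_inactive = period - t_active
    total = 0.0
    for n in range(int(t // period) + 1):
        tau = t - n * period
        if rect(tau, 0.0, t_active):  # category T_1 (h1 < h2): n inactive intervals
            q1 = (t_active - tau) / t_active
            total += q1 * p_x(t - n * t_inactive)
        if rect(tau, t_inactive, period):  # category T_2 (h1 > h2): n + 1 of them
            q2 = (tau - t_inactive) / t_active
            total += q2 * p_x(t - (n + 1) * t_inactive)
    return total


def p_theory(t, r, p_x, period=T_DAY):
    """Average over active-interval lengths; r maps T_A -> fraction of users."""
    return sum(w * p_real_time(t, t_a, p_x, period) for t_a, w in r.items())


if __name__ == "__main__":
    p_model = lambda tp: tp ** -1.5  # P_model(t') ~ t'^(-1.5), unnormalized
    # Although p_model decays monotonically, the mapped density does not:
    print(p_real_time(36.0, 16.0, p_model) < p_real_time(48.5, 16.0, p_model))  # True
```

With $T_A = 16$ h, the value at $t = 36$ h (mid-day) falls below the value at $t = 48.5$ h (just past two days), which is the daily oscillation produced by the mapping.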
When $P_x$ is replaced by the theoretical form $P_{\rm model}(t') \sim t'^{-1.5}$ [30], the resulting $P_{\rm theory}(t)$ is consistent with the simulated one, as shown in Fig. 6. Notably, the functional form of $r(T_A)$ does not play an important role in determining the oscillating behavior of the IET distribution; for example, the oscillating pattern of $P_{\rm theory}(t)$ is obtained even for a flat distribution $r(T_A)$.

Modelling Activity Dependence
As shown in the previous section, the activities of individuals are heterogeneous, and their distribution follows a power law, $P_a(k) \sim k^{-\kappa}$ with $\kappa \approx 2.6 \pm 0.1$ (inset of Fig. 2). That is, a few people post many articles while many others post only a few in a given interval. Moreover, individuals have their own active time intervals. It is therefore interesting to study how such heterogeneities affect the IET distribution. We categorize users into groups according to their activities and measure the IET distribution of each group, as shown in Fig. 2. Interestingly, the IET distribution appears to be independent of activity in the short-time regime within one day, but it depends on activity in the long-time regime.
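The text does not say how the exponent $\kappa$ was estimated. One standard option, shown here as a swapped-in technique rather than the authors' method, is the continuous maximum-likelihood estimator $\hat{\kappa} = 1 + n / \sum_j \ln(k_j / k_{\min})$ of Clauset, Shalizi and Newman, applied to activities above a chosen lower cutoff $k_{\min}$. The sampler below uses inverse-transform sampling to generate synthetic test data only.

```python
import math
import random


def powerlaw_mle(samples, x_min):
    """Continuous MLE of kappa for P(k) ~ k**(-kappa), using only k >= x_min."""
    tail = [x for x in samples if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)


def sample_powerlaw(kappa, x_min, n, seed=0):
    """Draw n samples from a continuous power law by inverse-transform sampling."""
    rng = random.Random(seed)
    # 1 - U lies in (0, 1], so the base is never zero.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (kappa - 1.0)) for _ in range(n)]


if __name__ == "__main__":
    synthetic = sample_powerlaw(2.6, 1.0, 100_000, seed=1)
    print(round(powerlaw_mle(synthetic, 1.0), 2))  # close to 2.6
```

The choice of $k_{\min}$ matters in practice; Clauset et al. select it by minimizing the KS distance between the data and the fitted tail.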
In the priority-based queueing model introduced in Ref. [30], tasks arrive at a queue with rate $\lambda$ and are executed with rate $\mu$, where $\lambda$ and $\mu$ are constants, independent of time and of individuals. Here, however, since the activities and the active time intervals differ among individuals, we assign a user index $i$ to the rates, $\lambda_i$ and $\mu_i$, and allow them to depend on time. We take $\mu_i(t)$ to be proportional to the frequency of blog postings by user $i$ at time $t$. Next, we relate the execution rate $\mu_i(t)$ to the activity $A_i$ through a user-specific proportionality constant $c_i$. For the arrival rate $\lambda_i(t)$, since we have no information on when new tasks arrive, we assume $\lambda_i(t)$ to be the same as $\mu_i(t)$.
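One hypothetical way to build a time-dependent $\mu_i(t)$ from a user's time stamps is sketched below. The exact relation between $\mu_i(t)$, $A_i$ and $c_i$ is not reproduced in this section, so we simply assume a rate that is periodic with a one-day period, proportional to the fraction of the user's posts in each hour-of-day bin, and scaled so that its daily average equals $c_i A_i$. That scaling, the hourly binning, and the function name are all assumptions made for illustration.

```python
def hourly_rate_profile(timestamps, c, activity, period=86_400, n_bins=24):
    """Hypothetical construction of mu_i(t) from time stamps in seconds.

    The rate is periodic with `period`, proportional to the fraction of posts
    in each time-of-day bin, and scaled so that its average over the bins is
    c * activity (an assumed relation). Returns (rate, rate_max), where
    rate_max bounds rate(t) and can be used for Poisson thinning.
    """
    width = period / n_bins
    counts = [0] * n_bins
    for stamp in timestamps:
        counts[int((stamp % period) // width)] += 1
    total = sum(counts)
    # n_bins * (fraction of posts in the bin) has mean 1 over all bins.
    profile = [n_bins * k / total for k in counts]
    scale = c * activity

    def rate(t):
        return scale * profile[int((t % period) // width)]

    return rate, scale * max(profile)


if __name__ == "__main__":
    # Toy user who always posts at 10:00.
    stamps = [36_000 + k * 86_400 for k in range(5)]
    rate, rate_max = hourly_rate_profile(stamps, c=1.0, activity=0.5)
    print(rate(36_005), rate(0), rate_max)  # 12.0 0.0 12.0
```

Following the text, the same function would also serve as $\lambda_i(t)$.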
Based on this idea, we perform numerical simulations for each user $i$ as follows: i) We numerically generate the arrival and execution time sequences $\{t_k\} \subset (0, T^{(i)}_{\rm tot}]$ through Poisson processes with rates $\lambda_i(t)$ and $\mu_i(t)$, respectively [31]. ii) Following these time sequences, we add an arriving task to the queue if the queue holds fewer than $L_i$ tasks, where the queue size $L_i$ is determined at a later stage. Upon arrival, the task is assigned a priority $x \in [0, 1]$. At each execution time, the task with the highest priority is executed and removed from the queue, and its waiting time is recorded. iii) We repeat this procedure until $N_i$ waiting times are obtained, where $N_i$ is regarded as the number of blog posts uploaded by user $i$.

In this model, the activity is fixed as $A_i = N_i / T^{(i)}_{\rm tot}$, whereas the queue size $L_i$ and the proportionality constant $c_i$ remain to be determined.
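Steps i) to iii) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: non-homogeneous Poisson times are generated by thinning (which requires a known upper bound `rate_max` on both rates), priorities are drawn uniformly from $[0, 1)$, tasks arriving at a full queue are dropped, and execution events that find the queue empty do nothing. All function names are hypothetical.

```python
import heapq
import math
import random


def poisson_times(rate, rate_max, t_end, rng):
    """Event times of a non-homogeneous Poisson process on (0, t_end], generated
    by thinning; requires 0 <= rate(t) <= rate_max on that interval."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate_max)
        if t > t_end:
            return times
        if rng.random() < rate(t) / rate_max:
            times.append(t)


def simulate_waiting_times(lam, mu, rate_max, t_end, queue_size, n_target, seed=0):
    """Priority-queue sketch: tasks arrive at rate lam(t) and are executed at
    rate mu(t). Returns at most n_target waiting times."""
    rng = random.Random(seed)
    arrivals = poisson_times(lam, rate_max, t_end, rng)
    executions = poisson_times(mu, rate_max, t_end, rng)
    # Merge both sequences in time order; kind 0 = arrival, kind 1 = execution.
    events = sorted([(t, 0) for t in arrivals] + [(t, 1) for t in executions])
    queue = []  # min-heap of (-priority, arrival time): highest priority on top
    waits = []
    for t, kind in events:
        if kind == 0:
            if len(queue) < queue_size:  # the task is dropped if the queue is full
                heapq.heappush(queue, (-rng.random(), t))
        elif queue:  # an execution event that finds a non-empty queue
            _, t_arrival = heapq.heappop(queue)
            waits.append(t - t_arrival)
            if len(waits) == n_target:
                break
    return waits


if __name__ == "__main__":
    daily = lambda t: 1.0 + math.cos(2 * math.pi * t / 24.0)  # hypothetical rate per hour
    waits = simulate_waiting_times(daily, daily, 2.0, 24 * 30, queue_size=5,
                                   n_target=200, seed=1)
    print(len(waits), max(waits))
```

Passing the same function as `lam` and `mu` mirrors the assumption $\lambda_i(t) = \mu_i(t)$; constant functions give the time-averaged variant.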
To determine $L_i$ and $c_i$, i.e., to generate a synthetic distribution that fits the empirical data, we use the Kolmogorov-Smirnov (KS) statistical test [32]. For each user $i$, we obtain the estimates $\hat{L}_i$ and $\hat{c}_i$ by minimizing the KS statistic between the empirical and simulated data; their distributions are shown in Fig. 7. The closeness between the empirical and simulated data is then tested (see Fig. 8), and the obtained p-value is shown in the legend. If the p-value exceeds a preassigned significance level (here 0.05), the null hypothesis that the two samples are drawn from the same distribution cannot be rejected. As the p-value histogram in Fig. 7(b) shows, most cases show good agreement between synthetic and empirical data, with high p-values: 23.2% of users have $p > 0.9$, and 86.3% have $p > 0.1$. Thus, our theoretical model reasonably reproduces the empirical pattern.
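A minimal sketch of the fitting step, assuming a plain grid search over candidate values of $L_i$ and $c_i$; the search strategy is our assumption, since the text only states that the KS statistic is minimized. `simulate(L, c)` is a hypothetical placeholder for a user-specific queue simulation returning waiting times. The two-sample statistic $D = \sup_x |F_a(x) - F_b(x)|$ is computed directly; if a p-value is also needed, `scipy.stats.ks_2samp` returns both the statistic and a p-value.

```python
import bisect


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = sup_x |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:  # the supremum is attained at one of the sample points
        f_a = bisect.bisect_right(a, x) / len(a)
        f_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(f_a - f_b))
    return d


def fit_queue_parameters(empirical, simulate, queue_sizes, c_values):
    """Grid search for (L, c) minimizing the KS statistic.

    simulate(L, c) must return a list of simulated waiting times.
    Returns (D_min, L_hat, c_hat); ties keep the first pair found.
    """
    best = None
    for size in queue_sizes:
        for c in c_values:
            d = ks_statistic(empirical, simulate(size, c))
            if best is None or d < best[0]:
                best = (d, size, c)
    return best


if __name__ == "__main__":
    # Toy stand-in for a simulator: waiting times scale with L * c.
    empirical = [6 * k for k in range(1, 11)]
    toy = lambda size, c: [size * c * k for k in range(1, 11)]
    print(fit_queue_parameters(empirical, toy, [2, 3, 4], [1.0, 2.0, 3.0]))  # (0.0, 2, 3.0)
```

Because each candidate requires a full simulation, a coarse grid followed by local refinement is a natural way to limit the cost.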
Moreover, we simulate the queueing process for each user using the average rates $\langle\mu\rangle$ and $\langle\lambda\rangle$ instead of the time-dependent $\mu(t)$ and $\lambda(t)$. In most cases, the two simulated results differ only slightly, as shown in Fig. 8. However, the two results differ noticeably when periodic time intervals appear in a user's blog-posting activity; in such cases, the time-dependent forms $\mu(t)$ and $\lambda(t)$ fit the empirical data better.

Conclusions
In this work, we have studied the inter-event time statistics of human dynamics based on large-scale online records of blog writing at a Korean portal site. We observed that the IET distributions of individual users exhibit a universal pattern in the short-time regime but different decay patterns in the long-time regime, which depend on the activities of individual users. Moreover, we observed a clear periodic pattern with a period of one day, which reflects the circadian pattern of human behavior. We explained these patterns within the framework of the queueing model. First, we identified the active and inactive time intervals of individual behavior, removed the inactive time intervals, and constructed an ad hoc time domain. Next, we applied the priority-based queueing model in the ad hoc time domain, adjusting the arrival and execution rates of tasks to the empirical data. Finally, we transformed the results back to the real time domain and found our theoretical results to agree with the empirical results, including the positions of the circadian peaks [6,13,18,27]. The microscopic study performed in this paper enables us to understand these empirical results from a theoretical perspective.