Recent studies of a wide range of human activities, such as email communication, Web browsing, and library visits, have revealed the bursty nature of human activity. The distribution of inter-event times (IETs) between two consecutive human activities exhibits a heavy-tailed decay and an oscillating pattern with a one-day period, reflecting the circadian pattern of human life. Even though a priority-based queueing model was successful as a basic model for understanding the heavy-tailed behavior, it ignored important ingredients, such as the diversity of individual activities and the circadian pattern of human life. Here, we collect a large-scale dataset containing the time stamps at which individuals posted articles on blogs, and on this basis we construct a theoretical model that takes both ignored ingredients into account. We first identify the active and inactive time intervals of individuals and remove the inactive time intervals, thereby constructing an ad hoc continuous time domain. In this domain, the priority-based queueing model is applied, with the arrival and execution rates of tasks adjusted by comparison with the activity data of individuals. The obtained results are then transferred back to the real-time domain, which produces the oscillating and heavy-tailed IET distribution. This microscopic model enables us to develop a theoretical understanding of a broader set of empirical results.
Citation: Kim J, Lee D, Kahng B (2013) Microscopic Modelling Circadian and Bursty Pattern of Human Activities. PLoS ONE 8(3): e58292. https://doi.org/10.1371/journal.pone.0058292
Editor: Petter Holme, Umeå University, Sweden
Received: December 7, 2012; Accepted: February 1, 2013; Published: March 11, 2013
Copyright: © 2013 Kim et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: National Research Foundation grant awarded through the Acceleration Research Program (Grant No. 2010-0015066). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: JK is an employee of NHN Corp. This does not alter the authors’ adherence to all the PLOS ONE policies on sharing data and materials.
In the information age, large-scale databases containing information on human activities on the Web are easily accessible. Understanding the patterns that emerge from these datasets is a new interdisciplinary research subject. Since individuals behave through complex and sometimes random decision-making processes, one may wonder whether it is indeed possible to predict human behavior quantitatively. However, it was recently revealed that the digital records left behind by one’s activities make it possible to predict human activities up to 93%. Accordingly, it has become an attractive subject to investigate patterns emerging from such large-scale databases. Power-law or heavy-tailed behavior in the distribution of inter-event times (IETs) between two consecutive human activities is one example of such emerging patterns. It can be seen in various systems, such as email or surface mail communications, Web browsing, library loans, financial trades, on-line movie watching, file downloads, printing requests, and various actions on the Web. This power-law behavior indicates that human activities proceed in a bursty manner: activities are concentrated in short time intervals, which are separated from one another by long periods of inactivity.
Several theoretical models have been proposed to explain such heavy-tailed behavior in the IET distribution. One interesting model is the priority-based queueing model, in which the human activity of uploading articles is regarded as the execution of tasks in a queue, where tasks are performed in the order of the priorities assigned to them. This priority-based model leads to a power-law or heavy-tailed behavior in the waiting time distribution of tasks in the queue. The waiting time distribution was interpreted as the IET distribution of human activities. However, the priority-based queueing model ignores important ingredients, such as the circadian pattern of human life and the diversity of individual activities. Indeed, recently collected empirical data exhibit an oscillating pattern with a one-day period in the IET distribution, which cannot be produced by the queueing model. Moreover, the decay behavior of the IET distribution in the long-time regime depends on the activities of individuals. Here, the activity of an individual is defined as the average number of articles posted per unit time. In this paper, we obtain a large-scale dataset containing high-resolution data and find a new pattern in the IET distribution: when the IET is shorter than one day, the distribution exhibits a power-law behavior whose exponent is insensitive to the activities of individuals, whereas when the IET is longer than one day, it exhibits a heavy-tailed behavior whose tail depends on the activities of individuals. We reproduce these empirical results with the theoretical model developed below.
We analyze a large-scale dataset from the largest portal site in Korea, NAVER (http://naver.com), covering more than five years. The dataset consists of the time stamps, recorded to the second, at which individuals posted articles on blogs. There are 520,771,167 postings contributed by 9,878,904 distinct bloggers. Among them, we select only the data written by bloggers who had authored more than 100 articles and had been active for more than one month. This selection aims to exclude bloggers who had posted suspicious spam content. After this filtering, the number of remaining articles is 379,627,193, contributed by 908,409 users.
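The filtering step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record layout `(user_id, unix_timestamp)`, the function name `select_users`, and the reading of "one month" as 30 days are all assumptions.

```python
from collections import defaultdict

MIN_POSTS = 100                    # keep users with more than 100 articles
MIN_SPAN_SECONDS = 30 * 24 * 3600  # "one month" taken as 30 days (assumption)


def select_users(records):
    """Group (user_id, unix_timestamp) records by user and keep active users.

    Returns a dict mapping each selected user to a sorted list of time stamps.
    """
    stamps = defaultdict(list)
    for user, t in records:
        stamps[user].append(t)

    selected = {}
    for user, ts in stamps.items():
        ts.sort()
        # Both criteria are strict inequalities, following the text.
        if len(ts) > MIN_POSTS and ts[-1] - ts[0] > MIN_SPAN_SECONDS:
            selected[user] = ts
    return selected
```

Users who post many articles within a short window, or only a few articles over a long one, are both discarded by this rule.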
From this dataset, we obtained the following empirical results: (i) In the time regime shorter than one day, the IET distribution decays as a power law. (ii) In the long-time regime, the IET distribution exhibits a heavy-tailed decay that is nonuniversal and depends on individual activities. (iii) An oscillating pattern appears with a period of one day; it persists over the entire long-time regime, although its amplitude decreases with time. Details regarding these results are presented below.
We measured the IETs, defined as the intervals between two consecutive time stamps, for each user. The distribution of the IETs of user $i$ is then obtained as $P_i(\tau) = n_i(\tau)/N_i$, where $n_i(\tau)$ is the number of events having an IET of $\tau$. The total number of articles, $N_i$, written by user $i$ is given as $N_i = \sum_{\tau} n_i(\tau)$. To determine the collective behavior of all the users, we calculate the aggregate distribution $P(\tau)$ of Eq. (1), which is shown in Fig. 1(a). When $\tau < 1$ day, $P(\tau)$ decays as a power law. When $\tau > 1$ day, $P(\tau)$ follows a skew distribution. Interestingly, there exists an oscillating pattern in $P(\tau)$, which can be seen more clearly on the finer scale shown in Fig. 1(b). Moreover, the peak heights change periodically with a period of one week. To check the periodicity of the oscillating pattern, we perform a Fourier transformation of $P(\tau)$. Figure 1(c) shows that there indeed exist two distinct meaningful peaks in the Fourier spectrum, at the frequencies corresponding to one day and one week, respectively. Other peaks correspond to multiples of the one-day frequency. We study later how such an oscillating pattern can be reproduced within the framework of the priority-based model.
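As an illustration of this measurement, the sketch below computes per-user IETs, pools them into one normalized histogram, and evaluates the Fourier power at a chosen period. Pooling all IETs with equal weight, the one-hour bin width, and the function names are assumptions made for the example, not details taken from the paper.

```python
import cmath
from collections import Counter


def inter_event_times(timestamps):
    """Intervals between consecutive time stamps of one user."""
    ts = sorted(timestamps)
    return [b - a for a, b in zip(ts, ts[1:])]


def pooled_iet_distribution(users, bin_seconds=3600):
    """Normalized histogram of IETs pooled over all users.

    `users` maps a user id to a list of time stamps in seconds.
    Keys of the result are bin indices (IET // bin_seconds).
    """
    counts = Counter()
    for ts in users.values():
        for tau in inter_event_times(ts):
            counts[tau // bin_seconds] += 1
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


def spectral_power(dist, period_bins):
    """Squared modulus of the Fourier sum of `dist` at one period (in bins)."""
    s = sum(p * cmath.exp(-2j * cmath.pi * k / period_bins)
            for k, p in dist.items())
    return abs(s) ** 2
```

With one-hour bins, `spectral_power(dist, 24)` probes the one-day period and `spectral_power(dist, 24 * 7)` the one-week period; a pronounced value relative to neighboring periods indicates a periodic modulation.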
Figure 1. (a) Plot of the IET distribution $P(\tau)$ based on the empirical data (symbols). The IET distribution after the removal of the inactive time intervals is also shown (solid curve). Inset: Comparison of the IET distribution obtained from the empirical data (symbols) with that from the theory (solid curve). (b) Enlarged representation of the IET distribution, in which clear periodic peaks are observed. (c) The Fourier transform of the IET distribution. Peaks are located at the frequencies corresponding to one day and one week. Other peaks, at multiples of the one-day frequency, are redundant.
Next, we examine the dependence of the IET distribution on the activity of individuals. The activity of user $i$ is the number of articles written per unit time interval. Thus, when user $i$ writes $N_i$ articles during the time interval $T_i$, where $T_i$ is the time interval between the first and the last time stamp of user $i$, the activity of user $i$ is given as $a_i = N_i/T_i$. To determine the heterogeneity of individual activities, we measured the distribution of individual activities, shown in the inset of Fig. 2. The distribution indeed decays as a power law, indicating that individual activities are considerably heterogeneous. Thus, it is worth investigating how the heterogeneity of activities affects the IET distribution. In Fig. 2, we can see that as the activity level becomes higher, the IET distribution decays faster in the long-time regime. This behavior is natural in the sense that a user with higher activity has a shorter mean IET. Accordingly, it would be interesting to introduce a new model to capture this activity-dependent behavior; such a model is presented later.
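The definition of activity, and its inverse relation to the mean IET, can be made concrete with a short sketch. The choice of one day as the unit time and the function names are assumptions for illustration.

```python
DAY = 86400  # seconds; one day is used as the unit time here (assumption)


def activity(timestamps, unit=DAY):
    """Articles per unit time between a user's first and last time stamp."""
    ts = sorted(timestamps)
    span = ts[-1] - ts[0]
    if span <= 0:
        raise ValueError("need at least two distinct time stamps")
    return len(ts) * unit / span


def mean_iet(timestamps):
    """Mean inter-event time: total span divided by the number of intervals."""
    ts = sorted(timestamps)
    if len(ts) < 2:
        raise ValueError("need at least two time stamps")
    return (ts[-1] - ts[0]) / (len(ts) - 1)
```

For a fixed observation span, the mean IET equals the span divided by the number of intervals, so it shrinks as the number of articles, and hence the activity, grows.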
Results and Discussion
Modelling Oscillating Pattern
In previous studies, the heavy-tailed behavior of the IET distribution was investigated by using the priority-based queueing model. In this approach, time was considered to be continuous, without any intermission. However, humans do not work continuously, and hence intermissions, for example those due to sleeping, must be considered. Moreover, the pattern of daily life during weekdays is almost regular, but it differs from that during weekends. Thus, it is natural to assume that each person has a regular time interval during which the person is away from the on-line world. This time interval is called the inactive time interval, and the remaining time of the day is called the active time interval. The duration and starting time of the active time interval depend on the individual (see Fig. 3).
Figure 3. (a) Distribution of the starting time of the active time interval. A peak is located between 9 and 10 am. (b) Distribution of active time intervals. The mode is located at 16 h.
Suppose that two events occur within the active period of one day (see Fig. 4a) at times $t_1$ and $t_2$, where $t_1 < t_2$ and both belong to the same active time interval. Then the inter-event time is defined as $\tau = t_2 - t_1$. More generally, when two events are executed in different active intervals separated by $m$ inactive time intervals, where $m$ is an integer (see Fig. 4b), we obtain the relation of Eq. (2), which expresses $\tau$ in terms of the IET that remains after removing the inactive time intervals. This quantity is defined as the IET in the ad hoc time domain and is denoted as $\tau'$. The ad hoc time domain is then continuous. We find that any inter-event time belongs to one of two sets of intervals, defined in Eq. (3) and the expression that follows it.
Figure 4. It is assumed that an individual essentially lives a well-regulated daily life consisting of active and inactive time intervals. To reproduce the oscillating behavior of the IET distribution within the framework of the queueing model, we construct an ad hoc time domain in which separated active time intervals are connected by removing the inactive time intervals between them. See text for details.
Here, $\tau'$ is the IET defined in the ad hoc time domain; it is related to $\tau$ in the original time domain through the number $m$ of inactive time intervals that fall within $\tau$. The distribution of $\tau'$ is obtained from the queueing model, which is discussed later. Collecting the distributions of all individuals, i.e., using formula (7), we obtain the aggregate distribution, which exhibits the heavy-tailed behavior shown in Fig. 1.
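The construction of the ad hoc time domain can be illustrated with a small sketch. It assumes, for simplicity, that a user's active interval starts at the same clock time every day and has a fixed duration; the names `to_adhoc` and `adhoc_iet` and the clamping of events that fall in an inactive interval are choices made for this example only.

```python
DAY = 24 * 3600  # seconds


def to_adhoc(t, start, active):
    """Map real time t to ad hoc time by deleting daily inactive intervals.

    `start` is the clock offset (s) at which the active interval begins and
    `active` is its duration (s). A time stamp inside an inactive interval is
    moved to the end of the preceding active interval.
    """
    day, offset = divmod(t - start, DAY)
    return day * active + min(offset, active)


def adhoc_iet(t1, t2, start, active):
    """Inter-event time measured in the ad hoc (gap-free) time domain."""
    return to_adhoc(t2, start, active) - to_adhoc(t1, start, active)
```

For a user active from 09:00 for 16 h, events at 10:00 on one day and 11:00 on the next are 25 h apart in real time but 17 h apart in the ad hoc domain, because one 8 h inactive interval has been removed.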
We now consider how to reproduce the oscillating behavior. For this purpose, we assume that an IET distribution in the ad hoc time domain is given, for example the one previously obtained from the empirical data after removing the inactive time intervals, or the one obtained from the queueing model. Then we can obtain the IET distribution of a user with a given active time interval through Eq. (8), in which the input distribution is either the empirical or the theoretical one, and a rectangle function, Eq. (9), represents the sets of intervals defined above. Next, we average over all users and obtain Eq. (10), in which each active time interval is weighted by the fraction of users who have it. The distribution of active time intervals exhibits a peak at 16 h, as shown in Fig. 3(b). By plugging the empirical distribution into Eq. (8), we successfully reproduce the oscillating pattern of the IET distribution shown in the inset of Fig. 1(a) and in Fig. 5. When the empirical distribution is replaced by the theoretical formula, the obtained result is consistent with the simulated one, as shown in Fig. 6. It is noteworthy that the functional form of the distribution of active time intervals does not play an important role in determining the oscillating behavior of the IET distribution. For example, even for a flat distribution of active time intervals, the oscillating pattern can be obtained.
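The transfer from the ad hoc domain back to real time can be sketched by reinserting one inactive interval for every active-interval boundary that an IET crosses. This is an illustration of the idea under the assumption of a fixed daily active duration, not a transcription of Eqs. (8)–(10); the function name `real_iet` and its arguments are hypothetical.

```python
DAY = 24 * 3600  # seconds


def real_iet(offset, tau_adhoc, active):
    """Convert an ad hoc IET back to a real-time IET.

    `offset` is the position (s) of the first event inside its active
    interval, `tau_adhoc` the IET in the ad hoc domain, and `active` the
    daily active duration. Each crossed boundary adds one inactive interval.
    """
    if not 0 <= offset < active:
        raise ValueError("offset must lie inside the active interval")
    crossed = (offset + tau_adhoc) // active
    return tau_adhoc + crossed * (DAY - active)
```

With an 8 h active interval, no real-time IET can fall between 8 h and 16 h: an IET either stays within one active interval or includes at least one 16 h inactive interval. When the active interval is longer than 12 h these bands overlap, so the gaps turn into a daily modulation of the distribution rather than empty regions.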
Figure 5. Empirical distributions are obtained by aggregating the top 100 users who show a clear periodicity for each active time interval; the distributions clearly show the change in peak height and width. The weighted average of these distributions is also displayed in (f), where the characteristic peaks can be observed.
Figure 6. To calculate the distribution in Eq. (8), we assume a theoretical form for the input IET distribution. Panels (a) and (b) show two example cases. (c) The distribution in Eq. (10) obtained with a flat distribution of active time intervals. The resulting theoretical IET distribution is consistent with the one obtained from the simulated data.
Modelling Activity Dependence
As shown in the previous section, the activities of individuals are heterogeneous, and their distribution follows a power law, as shown in the inset of Fig. 2. That is, a few people post many articles, and many others post only a few articles in a given interval. Moreover, individuals have their own active time intervals. Thus, it is interesting to study how such heterogeneities affect the IET distribution. We categorize users into groups according to their activities and measure the IET distribution of each group, as shown in Fig. 2. Notably, the IET distribution appears to be independent of activity in the short-time regime within one day, but it depends on activity in the long-time regime.
In the previously introduced priority-based queueing model, packets arrive at a queue with the rate $\lambda$ and are executed with the rate $\mu$, where the rates $\lambda$ and $\mu$ are regarded as constants, independent of time and of the individual. Here, however, since the activity and the active time interval differ from user to user, we assign the user index $i$ to the rates, writing them as $\lambda_i(t)$ and $\mu_i(t)$, and we assume that they depend on time. We take $\mu_i(t)$ to be proportional to the frequency of blog postings at time $t$ by user $i$. Next, we relate the execution rate to the activity $a_i$ through Eq. (11), which contains a proportionality constant. For the arrival rate $\lambda_i(t)$, since we have no information about when a new task arrives, we assume it to be the same as $\mu_i(t)$.
Based on this idea, for each user , we perform numerical simulations as follows:
- We numerically generate both arrival and execution time sequences through the Poisson process with the rates $\lambda_i(t)$ and $\mu_i(t)$.
- Following these time sequences, we input a task into the queue when the queue is not full, where the queue size $L_i$ is determined at a later stage. Upon arrival, the task is given a priority. At the same time, the task with the highest priority is executed and removed from the queue. The waiting time of the task is also recorded.
- We repeat this procedure until $N_i$ waiting times are obtained. $N_i$ is regarded as the number of blog posts uploaded by user $i$.
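The steps above can be sketched as a small event-driven simulation. This is a simplified illustration, not the authors' code: it uses constant rates instead of time-dependent ones, draws priorities uniformly at random, and executes the highest-priority task at each execution event; all of these, along with the function and parameter names, are assumptions. Time-dependent rates would additionally require a nonhomogeneous Poisson sampler, for example the thinning method of Lewis and Shedler listed in the references.

```python
import heapq
import random


def simulate_queue(arrival_rate, execution_rate, queue_size, n_waits, seed=0):
    """Simulate a finite priority queue and return `n_waits` waiting times.

    Arrivals and executions are independent Poisson processes with constant
    rates (assumption). Tasks arriving at a full queue are discarded.
    """
    rng = random.Random(seed)
    next_arrival = rng.expovariate(arrival_rate)
    next_execution = rng.expovariate(execution_rate)
    queue = []  # min-heap of (-priority, arrival_time): top = highest priority
    waits = []

    while len(waits) < n_waits:
        if next_arrival <= next_execution:
            # Arrival event: enqueue with a random priority if there is room.
            if len(queue) < queue_size:
                heapq.heappush(queue, (-rng.random(), next_arrival))
            next_arrival += rng.expovariate(arrival_rate)
        else:
            # Execution event: serve the highest-priority task, if any.
            if queue:
                _, arrived = heapq.heappop(queue)
                waits.append(next_execution - arrived)
            next_execution += rng.expovariate(execution_rate)
    return waits
```

A call such as `simulate_queue(1.0, 1.0, 5, 10000)` yields waiting times whose distribution can then be compared with a user's empirical IETs in the ad hoc time domain.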
In this model, the activity $a_i$ is determined from the empirical data, whereas the queue size $L_i$ and the proportionality constant remain to be determined.
To determine the queue size $L_i$ and the proportionality constant, i.e., to generate a synthetic probability distribution function that fits the empirical data, we use the Kolmogorov-Smirnov (KS) statistical test. We obtain a pair of parameter values for each user by minimizing the KS statistic between the empirical data and the simulated data. The resulting values are distributed as shown in Fig. 7. The closeness between the empirical data and the simulated data is tested (see Fig. 8); the obtained $p$-value is shown in the legend. If the $p$-value is higher than a preassigned significance level, one cannot reject the null hypothesis that the two probability distribution functions are identical. As we can see in the $p$-value histogram of Fig. 7(b), most cases show good agreement between synthetic and empirical data, with high $p$-values: the fraction of users with $p > 0.1$ is 86.3%, and 23.2% of users pass a stricter threshold. Thus, our theoretical result reasonably reproduces the empirical pattern.
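The fitting procedure can be illustrated with a minimal two-sample KS statistic and a search over candidate parameters. This sketch only computes the statistic (the maximum gap between two empirical cumulative distributions); the helper `best_parameters`, the candidate grid, and the `simulate` callable are hypothetical. For $p$-values, a library routine such as `scipy.stats.ks_2samp` can be used instead.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest distance between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:
        # Empirical CDFs evaluated at every observed value.
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def best_parameters(empirical, candidates, simulate):
    """Return the candidate parameter tuple minimizing the KS statistic.

    `simulate(*params)` must return a synthetic sample for those parameters.
    """
    return min(candidates,
               key=lambda params: ks_statistic(empirical, simulate(*params)))
```

In practice `simulate` would wrap a queue simulation for one user, and `candidates` would enumerate pairs of queue size and proportionality constant.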
Figure 7. (a) Distribution of the best-estimate parameters of individuals. Contour lines are obtained by interpolation between nearest points. The densest point marks the most common parameter pair, and a large portion of cases settle around this peak. (b) Histogram of the $p$-values in the KS test between synthetic and empirical probability distribution functions. Over 86% of cases have $p$-values larger than 0.1, and hence the null hypothesis cannot be rejected for those cases.
Figure 8. Model predictions are calculated by two methods: by using the time-dependent rates $\lambda_i(t)$ and $\mu_i(t)$ (red solid lines), and by using the time-averaged rates (blue dotted lines). The histograms in the upper panel of each plot represent the relative ratio of blog posts written during each hour of the day. In cases with clear periodicity, (a) and (b), the red solid and blue dotted lines show apparent differences. In the other cases, (c) and (d), they exhibit very similar patterns, and the periodicity assumption seems to be irrelevant. Note that we only consider data points on scales larger than 30 min, because the resolution of $\lambda_i(t)$ and $\mu_i(t)$ is 1 h.
Moreover, we simulate the queueing process for each user by using the time-averaged rates instead of the time-dependent forms $\lambda_i(t)$ and $\mu_i(t)$. In most cases, there is only a slight difference between the two simulated results, as shown in Fig. 8. However, the two results differ noticeably in some cases; these occur when periodic time intervals appear in the activity of writing blog posts. In such cases, the time-dependent forms $\lambda_i(t)$ and $\mu_i(t)$ fit the empirical data better.
In this work, we have studied the inter-event time statistics of human dynamics based on large-scale on-line records of blog writing at a Korean portal site. We observed that the IET distributions of individual users exhibit a universal pattern in the short-time regime, but they exhibit different decay patterns in the long-time regime, depending on the activities of individual users. Moreover, we observed a clear periodic pattern with a period of one day, which reflects the circadian pattern of human behavior. We explained these patterns within the framework of the queueing model. First, we identified the active and inactive time intervals of individuals, removed the inactive time intervals, and constructed an ad hoc time domain. Next, we applied the priority-based queueing model in the ad hoc time domain by adjusting the arrival and execution rates of tasks to the empirical data. We then returned to the real-time domain and found our theoretical results to be in agreement with the empirical results, including the positions of the circadian peaks. The microscopic studies performed in this paper enable us to understand these empirical results from a theoretical perspective.
We would like to thank Mr. Youn Sik Lee, Director of Data Information Center, for permission to use the data after deleting user names, and Mr. Sukwon Kang for helpful discussions.
Conceived and designed the experiments: BK JK. Analyzed the data: JK DL. Contributed reagents/materials/analysis tools: JK. Wrote the paper: BK JK.
- 1. Lazer D, Pentland A, Adamic L, Aral S, Barabási AL, et al. (2009) SOCIAL SCIENCE: computational social science. Science 323: 721–723.
- 2. Castellano C, Fortunato S, Loreto V (2009) Statistical physics of social dynamics. Rev Mod Phys 81: 591–646.
- 3. Song C, Qu Z, Blumm N, Barabási AL (2010) Limits of predictability in human mobility. Science 327: 1018–1021.
- 4. Barabási AL (2005) The origin of bursts and heavy tails in human dynamics. Nature 435: 207–211.
- 5. Johansen A (2004) Probing human response times. Physica A 338: 286–291.
- 6. Eckmann JP (2004) Entropy of dialogues creates coherent structures in e-mail traffic. Proc Natl Acad Sci U S A 101: 14333–14337.
- 7. Vázquez A, Oliveira JG, Dezsö Z, Goh KI, Kondor I, et al. (2006) Modeling bursts and heavy tails in human dynamics. Phys Rev E 73: 036127.
- 8. Vázquez A (2007) Impact of memory on human dynamics. Physica A 373: 747–752.
- 9. Malmgren RD, Stouffer DB, Motter AE, Amaral LAN (2008) A Poissonian explanation for heavy tails in e-mail communication. Proc Natl Acad Sci U S A 105: 18153–18158.
- 10. Oliveira JG, Barabási AL (2005) Human dynamics: Darwin and Einstein correspondence patterns. Nature 437: 1251.
- 11. Dezsö Z, Almaas E, Lukács A, Rácz B, Szakadát I, et al. (2006) Dynamics of information access on the web. Phys Rev E 73: 066132.
- 12. Scalas E, Kaizoji T, Kirchler M, Huber J, Tedeschi A (2006) Waiting times between orders and trades in double-auction markets. Physica A 366: 463–471.
- 13. Zhou T, Kiet HAT, Kim BJ, Wang BH, Holme P (2008) Role of activity in human dynamics. Europhys Lett 82: 28002.
- 14. Johansen A, Sornette D (2000) Download relaxation dynamics on the WWW following newspaper publication of URL. Physica A 276: 338–345.
- 15. Johansen A (2001) Response time of internauts. Physica A 296: 539–546.
- 16. Chessa AG, Murre JM (2004) A memory model for internet hits after media exposure. Physica A 333: 541–552.
- 17. Harder U, Paczuski M (2006) Correlated dynamics in human printing behavior. Physica A 361: 329–336.
- 18. Radicchi F (2009) Human activity in the web. Phys Rev E 80: 026118.
- 19. Barabási AL (2011) Bursts: the hidden patterns behind everything we do, from your e-mail to bloody crusades. New York: Plume.
- 20. Karsai M, Kaski K, Barabási AL, Kertész J (2012) Universal features of correlated bursty behaviour. Sci Rep 2: 397.
- 21. Wu Y, Zhou C, Xiao J, Kurths J, Schellnhuber HJ (2010) Evidence for a bimodal distribution in human communication. Proc Natl Acad Sci U S A 107: 18803–18808.
- 22. Jo HH, Pan RK, Kaski K (2012) Time-varying priority queuing models for human dynamics. Phys Rev E 85: 066102.
- 23. Hidalgo RCA (2006) Conditions for the emergence of scaling in the inter-event time of uncorrelated and seasonal systems. Physica A 369: 877–883.
- 24. Malmgren D, Stouffer D, Campanharo A, Nunes Amaral L (2009) On universality in human correspondence activity. Science 325: 1696–1700.
- 25. Vajna S, Tóth B, Kertész J (2012) Modelling power-law distributed interevent times: arXiv: 1211.1175.
- 26. Jo HH, Karsai M, Kertész J, Kaski K (2012) Circadian pattern and burstiness in mobile phone communication. New J Phys 14: 013055.
- 27. Holme P (2003) Network dynamics of ongoing social relationships. Europhys Lett 64: 427–433.
- 28. Goh KI, Barabási AL (2008) Burstiness and memory in complex systems. Europhys Lett 81: 48002.
- 29. Kivelä M, Pan RK, Kaski K, Kertész J, Saramäki J, et al. (2011) Multiscale analysis of spreading in a large communication network. J Stat Mech: P03005.
- 30. Grinstein G, Linsker R (2006) Biased diffusion and universality in model queues. Phys Rev Lett 97: 130201.
- 31. Lewis PAW, Shedler GS (1979) Simulation of nonhomogeneous Poisson processes by thinning. Naval Research Logistics Quarterly 26: 403–413.
- 32. Conover WJ (1999) Practical nonparametric statistics. New York: Wiley.