Research on the temporal characteristics of human dynamics has attracted much attention for its contribution to various areas such as communication, medical treatment, and finance. Existing studies show that the time intervals between two consecutive events present different non-Poisson characteristics, such as power-law, Pareto, bimodal power-law, exponential, and piecewise power-law distributions. With the emergence of new services, new types of distributions may arise. In this paper, we study the distributions of the time intervals between two consecutive visits to the QQ and WeChat services, the two most popular instant messaging (IM) services in China, and present a new finding: when the statistical unit T is set to 0.001s, the inter-event time distribution follows a piecewise distribution of exponential and power-law, indicating the heterogeneous character of IM service users’ online behavior in different time scales. We infer that the heterogeneous character is related to the communication mechanism of IM and the habits of users. We then develop a combination model of an exponential model and an interest model to characterize the heterogeneity. Furthermore, we find that the exponent of the inter-event time distribution of the same service is different in two cities, which is correlated with the popularity of the services. Our research is useful for applications in information diffusion, prediction of the economic development of cities, and so on.
Citation: Cui H, Li R, Fang Y, Horn B, Welsch RE (2018) Heterogeneous characters modeling of instant message services users’ online behavior. PLoS ONE 13(5): e0195518. https://doi.org/10.1371/journal.pone.0195518
Editor: Xiao-Pu Han, Hangzhou Normal University, CHINA
Received: July 29, 2017; Accepted: March 23, 2018; Published: May 7, 2018
Copyright: © 2018 Cui et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting Information files.
Funding: This work has been supported by the National Natural Science Foundation of China (61201153), the National 973 Program of China under Grant (2012CB315805), CCF Venus Research Project (CCF-VenustechRP2016004), Prospective Research Project on Future Networks in Jiangsu Future Networks Innovation Institute (BY2013095-2-16).
Competing interests: The authors have declared that no competing interests exist.
The study of the distribution characteristics of human behavior has a long history. For a long time, the Poisson distribution was used to model human activities. A different view appeared in 2005, when Albert-László Barabási published a paper titled ‘The origin of bursts and heavy tails in human dynamics’ in Nature, which proposed that the distribution of time intervals between two consecutive events, called the inter-event time, follows a heavy-tailed distribution rather than the exponential distribution produced by a Poisson process. This view differed from the traditional understanding of human behavior and led to a surge of studies on human dynamics. Most researchers held that the distribution of inter-event times fits a power-law distribution. For example, researchers using freely available Wikipedia editing records found that the inter-event times of the resulting time series follow a fat-tailed probability distribution. Alexei Vázquez, et al. found that the distribution of inter-event times (IETs) between two consecutive human activities exhibits a heavy-tailed decay and an oscillating pattern with a one-day period, reflecting the circadian pattern of human life. Yadong Zhou, et al. found that the dynamic sizes of incidental topic groups follow a heavy-tailed distribution and, based on this finding, developed an adaptive parametric method for predicting the dynamics of incidental topic groups. However, some researchers hold different opinions. Malmgren R D, et al. reported that, for the correspondence of sixteen famous writers, actors, politicians and scientists from the middle of the sixteenth century to the middle of the twentieth century, the inter-event time distribution is better described by a cascading Poisson process than by other models. Stouffer D B, et al. argued that the lognormal distribution better describes e-mail communications.
László Gyarmati and Tuan Anh Trinh showed that users’ time spent online fits Weibull distributions, whereas the duration of users’ online sessions fits a power-law distribution. Chenxu Wang, et al. found that the distribution of inter-event times of microblog posting and wiki revising follows a piecewise distribution, and proposed that human dynamics are heterogeneous in different time scales. Human behavior shows clear characteristics of circadian rhythm, burstiness, and memory, which may explain the heavy-tailed distribution. Modeling is one of the best ways to reveal the patterns of human behavior. Based on existing results, the models used to quantify human behavior can be roughly divided into three classes: 1) models based on queuing theory; 2) models based on memory, interest, rhythm or other ingredients; 3) models based on social interaction. Besides direct analysis of the temporal statistical characteristics of human behavior, some work addresses the statistical characteristics over time of systems driven by humans.
There may be a relationship between mobile Internet service usage and other human behavior. For example, Yuanyuan Qiao, Xiaoxing Zhao, Jie Yang, and Jiajia Liu proposed that app usage has a strong relationship with human mobility. Fengli Xu, Yong Li, Min Chen, and Sheng Chen found a link between cyberspace and the physical world with social ecology. Furthermore, other researchers proposed that the behavior pattern of IM service users is closely correlated with the development of the economy, transportation, and communication in the same area. Research on human dynamics focuses on the inter-event time distribution of different activities, and existing studies have proposed several different types of distributions to describe it. With the emergence of new services, human behavior may show different temporal characteristics and thus need to be described with new types of distributions. In China, QQ and WeChat are two of the most popular mobile Internet services. Both are IM (Instant Messaging) services. With QQ and WeChat, people can send messages, have voice or video chats with friends, write logs, and so on. On average, billions of records can be produced by QQ and WeChat users in major cities every day, so analyzing the temporal characteristics of QQ and WeChat users’ online behavior is useful for research on human dynamics. In this paper, we focus on analyzing the temporal characteristics of QQ and WeChat users’ online behavior in two developed cities in China, Chongqing (City-A) and Tianjin (City-B). The research can promote the study of human behavior and the prediction of a city’s economic condition.
The paper’s structure is as follows: Sect. II gives a brief introduction of our real data set and data processing. Sect. III analyzes the temporal characteristics of QQ users’ online behavior. Sect. IV analyzes the temporal characteristics of WeChat users’ online behavior. Sect. V analyzes the relationship between the inter-event time distribution and the popularity of IM services. In Sect. VI we propose a combination model to describe the heterogeneous characteristics of IM services users’ online behavior and verify the accuracy of the combination model using the Ali-talk data. In Sect. VII we summarize our conclusions from the paper.
Our raw datasets (A, B, C) were obtained from China Unicom. Dataset A consists of 171,734,242 records produced by all QQ users and 30,070,724 records produced by all WeChat users in Chongqing, China from Nov 17th, 2012 to Nov 21st, 2012, six days in total. Dataset B covers 55,942,780 records produced by all QQ users and 366,288,051 records produced by all WeChat users in Tianjin, China from May 4th, 2014 to May 10th, 2014, seven days in total. Dataset C covers 118,255,953 records produced by all QQ users and 123,539,778 records produced by all WeChat users in Chongqing, China from Jul 17th, 2014 to Jul 23rd, 2014, seven days in total. Each record consists of the user id, the traffic type, the start time at which the user accesses the server, the end time, and the duration. In this paper, we focus on analyzing the temporal characteristics. The format of the dataset is shown in Table 1. After data preprocessing, we obtain the experimental datasets, including File A, File B and File C in S1 Supporting Information.
For analyzing temporal characteristics, we count the inter-event time distribution as follows:
- We set the time window size to one day; that is, we use one day’s records to analyze temporal characteristics. The consecutive access times to the same server are denoted as t1, t2, …, tn, where n is the number of records and t1 < t2 < … < tn. The inter-event time τ is the time interval between two consecutive access times, calculated as: $$\tau_i = t_{i+1} - t_i, \quad i = 1, 2, \ldots, n-1 \tag{1}$$
- In this paper, we analyze the distribution P(τ) of the inter-event times τ. For statistical convenience, we use a statistical unit T to calculate P(τ). P(τ) is equal to the ratio of ni, the number of τ that fall in ((i − 1)T, iT), to the total number of records N: $$P(\tau) = \frac{n_i}{N}, \quad \tau \in ((i-1)T,\, iT) \tag{2}$$
where i is the index of the interval of width T.
- We calculate the P(τ) when T is set to 0.001s, 0.1s and 1s respectively and find that only when T is 0.001s, do the inter-event time distributions of IM services follow a piecewise distribution. So it’s worth looking in more detail at the distributions when T is set to 0.001s.
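The counting procedure in the steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, the access times are assumed to be in seconds, and N is taken here as the number of inter-event times (rather than the number of records) so that the probabilities sum to one.

```python
import numpy as np

def inter_event_times(access_times):
    """Return tau_i = t_{i+1} - t_i for one day's access times (seconds)."""
    t = np.sort(np.asarray(access_times, dtype=float))
    return np.diff(t)

def binned_distribution(taus, T):
    """Return (upper bin edges i*T, P) where P[i-1] = n_i / N.

    n_i counts the inter-event times falling in the i-th interval of width T.
    Assumption of this sketch: N is the number of inter-event times, and
    `taus` is non-empty.
    """
    taus = np.asarray(taus, dtype=float)
    idx = np.floor(taus / T).astype(int)      # zero-based bin index, i = idx + 1
    counts = np.bincount(idx)                 # n_i for every bin up to the largest tau
    probs = counts / counts.sum()
    edges = T * np.arange(1, len(counts) + 1)
    return edges, probs
```

Calling `binned_distribution(inter_event_times(times), 0.001)` gives the distribution at the finest statistical unit discussed in the text; passing `0.1` or `1.0` instead reproduces the coarser binning.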
Temporal characteristics of QQ users’ online behavior
The X axis represents the logarithm of the inter-event time τ. The Y axis represents the logarithm of the distribution P(τ). Points with a different sign show P(τ) on different days. Figs 1, 2 and 3 are the distributions P(τ) in dataset A when T is 1s, 0.1s and 0.001s, respectively. Figs 4, 5 and 6 are the distributions P(τ) in dataset B when T is 1s, 0.1s and 0.001s, respectively. Figs 7, 8 and 9 are the distributions P(τ) in dataset C when T is 1s, 0.1s and 0.001s, respectively.
From Figs 3, 6 and 9 we can see that when T is 0.001s, P(τ) in both cities can be described by a piecewise distribution, indicating that QQ users’ online behavior is heterogeneous in different time scales. The vertical dotted line marks the transition point of the piecewise distribution. This heterogeneity does not emerge in the other figures. In Figs 1, 2, 7 and 8, most τ are smaller than 0.1s, so P(τ) converges to a point; the inset shows the distribution for τ > 0.1s. In Figs 4 and 5, P(τ) follows a power-law distribution when T is 0.1s and 1s. In this paper, we focus on the case where T is 0.001s. We denote by τ0 the transition point of the piecewise distribution, chosen as the point at which the fitted exponential function and the fitted power-law function are closest.
From Figs 3, 6 and 9 we can see that the trend of P(τ) on different days is quite similar, and all of them fit a piecewise distribution. When τ < τ0, the distribution function is $P(\tau) = a e^{b\tau}$, where the exponent parameter b is 287.8, 115.1 and 205.5 respectively. When τ > τ0, the distribution function is $P(\tau) = a \tau^{b}$, where the power-law exponent parameter b is 6.3, 2.782 and 6.201 respectively. We use the correlation coefficient to evaluate the goodness of fit. The value of R² shown in Table 2 is defined as below. (3)
In this paper, R² is calculated from the average value of the real data over seven days and the fitted data in Fig 1.
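As an illustration of how such a piecewise fit and its goodness of fit might be computed, the sketch below fits an exponential head and a power-law tail by least squares on log-transformed data and then evaluates R². Two points are assumptions of this sketch rather than statements from the paper: the fitting method (log-linear least squares via `numpy.polyfit`) and the R² formula, for which the standard definition R² = 1 − SS_res/SS_tot is used. All function names are hypothetical. In this sketch's sign convention, a decaying head or tail yields a negative fitted b.

```python
import numpy as np

def r_squared(y, y_fit):
    """Standard coefficient of determination (assumed definition)."""
    y = np.asarray(y, dtype=float)
    y_fit = np.asarray(y_fit, dtype=float)
    ss_res = np.sum((y - y_fit) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_exponential(tau, p):
    """Fit p = a * exp(b * tau) by least squares on log(p); returns (a, b)."""
    b, log_a = np.polyfit(tau, np.log(p), 1)
    return np.exp(log_a), b

def fit_power_law(tau, p):
    """Fit p = a * tau**b by least squares on log-log data; returns (a, b)."""
    b, log_a = np.polyfit(np.log(tau), np.log(p), 1)
    return np.exp(log_a), b

def fit_piecewise(tau, p, tau0):
    """Fit an exponential for tau < tau0 and a power law for tau >= tau0.

    Requires strictly positive tau and p. Returns ((a1, b1), (a2, b2), R^2).
    """
    tau = np.asarray(tau, dtype=float)
    p = np.asarray(p, dtype=float)
    head = tau < tau0
    a1, b1 = fit_exponential(tau[head], p[head])
    a2, b2 = fit_power_law(tau[~head], p[~head])
    fitted = np.where(head, a1 * np.exp(b1 * tau), a2 * tau ** b2)
    return (a1, b1), (a2, b2), r_squared(p, fitted)
```

Fitting in log space weights small probabilities more heavily than a direct nonlinear fit would, so the resulting parameters can differ from those of other fitting procedures.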
It is worth noting that the tail of P(τ) is fat in Fig 1(d), 1(e) and 1(f), because the tail of P(τ) fluctuates strongly. To find out whether other functions fit the fat tail better, we carry out the following experiment. In Fig 1, the tail of P(τ) is heavy-tailed. We fit the tail of P(τ) with the Gaussian and Weibull distributions, two commonly used candidate distributions, and analyze their goodness of fit. The values of R² are shown in Table 3. We find that R² is less than 0.5 for both, so neither fitting function fits P(τ) better than a power-law distribution.
From Table 2, we find that the parameter b differs between the two cities for QQ. The more popular the service, the larger b is. The detailed analysis is given in Sect. V.
Temporal characteristics of WeChat users’ online behavior
The X axis represents the logarithm of the inter-event time τ. The Y axis represents the logarithm of the distribution P(τ). Dots with a different color show P(τ) on different days. Figs 10, 11 and 12 are the distributions P(τ) for dataset A when T is 1s, 0.1s and 0.001s, respectively. Figs 13, 14 and 15 are the distributions P(τ) for dataset B when T is 1s, 0.1s and 0.001s, respectively. Figs 16, 17 and 18 are the distributions P(τ) for dataset C when T is 1s, 0.1s and 0.001s, respectively.
WeChat has the same communication mechanism as QQ, which can also be seen from Figs 10–18. The only difference is that when T = 0.1s, P(τ) of WeChat in dataset A and dataset C fits a power-law distribution. P(τ) of WeChat in both cities also follows a piecewise distribution when T is 0.001s. In Figs 12, 15 and 18, the exponent parameter of P(τ) is 74.34, 847.2 and 228 respectively when τ < τ0. The power-law exponent parameter of P(τ) for City-A and City-B is 2.68, 2.228 and 4.19 respectively when τ > τ0. The specific fitting results can be seen in Table 2. The tail of P(τ) is fat in Figs 13, 14 and 15, similar to Figs 1–9. We also use Gaussian and Weibull distributions to fit them. The values of R² are shown in Table 3.
Relationship between inter-event time distribution and popularity of services
To explore why the parameter b differs between cities, we count the number of records produced by IM service users, as shown in Table 4. For QQ, whether τ < τ0 or τ > τ0, the larger the number of records, the larger the exponent parameter of the fitting function. For WeChat, the same rule holds when τ < τ0, but when τ > τ0 the rule holds only within the same city. The number of records reflects the popularity of a service in the city. So there is some relationship between the inter-event time distribution and the popularity of services, which may be related to the urban economic level.
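The ordering claim can be checked directly against the numbers quoted in the text. The short sketch below assumes that the record counts in Table 4 match the per-dataset counts given in the data description (datasets A, B, C), and uses the fitted exponents quoted in the QQ and WeChat sections; the helper name `same_ordering` is hypothetical.

```python
import numpy as np

def same_ordering(records, exponents):
    """True if ranking the datasets by record count matches ranking by exponent."""
    return bool(np.array_equal(np.argsort(records), np.argsort(exponents)))

# Record counts for datasets A, B, C (assumed to correspond to Table 4).
qq_records = [171_734_242, 55_942_780, 118_255_953]
wechat_records = [30_070_724, 366_288_051, 123_539_778]

# Exponent parameters b quoted in the text for datasets A, B, C.
qq_head = same_ordering(qq_records, [287.8, 115.1, 205.5])          # tau < tau0
qq_tail = same_ordering(qq_records, [6.3, 2.782, 6.201])            # tau > tau0
wechat_head = same_ordering(wechat_records, [74.34, 847.2, 228.0])  # tau < tau0
wechat_tail = same_ordering(wechat_records, [2.68, 2.228, 4.19])    # tau > tau0

# Within Chongqing only (datasets A and C), the WeChat tail exponents
# follow the record counts again.
wechat_tail_same_city = same_ordering(
    [wechat_records[0], wechat_records[2]], [2.68, 4.19]
)
```

With three datasets per service this is only a consistency check on the stated rule, not a statistical test.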
A variety of models have been proposed to explain the non-Poisson characteristics of human behavior. Because IM service users’ behavior is heterogeneous in different time scales, no single previously proposed model can be applied directly.
The results obtained from real data show that when τ > τ0 the inter-event time distribution of IM services fits a power-law distribution and the exponents of the distributions are larger than 1. The model proposed by Chenxu Wang et al. used the interest model to explain the power-law distribution in the large time scale because its exponent is larger than 1. Shang M S et al. hold that the interest model is also suitable for IM services, because users visit the services again based on their preferences. So we first consider the interest model to explain the power-law part of our results. The communication mechanism of IM services is the same as that of Short Message Services: users start communications randomly, and the arrival of communications follows a Poisson process, so that the inter-event time distribution follows an exponential distribution. After a communication is triggered, there are frequent exchanges of information between the user pair. During this stage, the time intervals are not uniform, long waiting times exist, and the inter-event time distribution follows a power-law distribution.
Based on the inter-event time distribution obtained from real data and the communication mechanism of IM services, we consider a combined model of an exponential function and the interest model. For τ < τ0, IM service users’ behavior is driven by a Poisson process. For τ > τ0, the interest mechanism drives the behavior of IM users.
When τ < τ0, the arrival of events is a Poisson process. For a Poisson process, the inter-event time distribution follows an exponential distribution: $$P(\tau) = \lambda e^{-\lambda \tau} \tag{4}$$ where λ is the arrival rate of events and τ is the inter-event time.
From Table 2 we can see that the two parameters of the fitted exponential function are not equal when τ < τ0, so we use a derived form of the exponential distribution, called the exponential model in this paper, to fit the real data: $$P_p(\tau) = \beta_1 \lambda e^{-\lambda \tau} \tag{5}$$ where β1λ is equal to a in Table 2, and λ is equal to b in Table 2.
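For readers who want the intermediate step, the link between a Poisson process and an exponential inter-event time distribution can be written out as follows. The last line assumes the exponential model has the form β1λe^(−λτ), which is inferred from the parameter description above rather than quoted from a displayed equation.

```latex
% Poisson process with rate \lambda: the number of events in an interval
% of length \tau is Poisson distributed with mean \lambda\tau.
\begin{align*}
\Pr(\text{no event within } \tau) &= e^{-\lambda \tau}
  && \text{(survival function of the inter-event time)} \\
P(\tau) &= -\frac{d}{d\tau}\, e^{-\lambda \tau} = \lambda e^{-\lambda \tau}
  && \text{(inter-event time density)} \\
\ln\!\left[\beta_1 \lambda e^{-\lambda \tau}\right] &= \ln(\beta_1 \lambda) - \lambda \tau
  && \text{(assumed exponential model: a straight line on a semi-log plot)}
\end{align*}
```

Under that assumption, the slope of the head of the distribution on a semi-log plot gives λ and the intercept gives β1λ; setting β1 = 1 recovers the plain Poisson case.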
When τ > τ0, we use the interest model introduced in  to describe the inter-event time distribution. The interest model assumes that (i) the interest xi(t) at time step t of a user is quantified by the probability that an action will occur in this time; (ii) at each time step t, if an action occurs then the interest xi(t) is reset to 1; (iii) when the last action occurred at step t0, then the interest at step t is set as: (6) where α is a free parameter. If an action occurs at a certain time step t, the probability that next action occurs at time step t + Δt is (7)
In the interest model, (8) where γ is the power exponent of the interest model. We can get γ from real data and evaluate α by Eq (8).
To obtain the combination model used in this paper, the exponential model and the interest model must be combined. However, Eqs (5) and (7), which represent the exponential model and the interest model respectively, cannot be combined directly. We rescale the interest model as: (9) where β2 is a tuning parameter which sets the initial probability of the interest model. We adjust the value of β2 to fit the real data. Since Pp(τ0) = Pi(τ0), then (10)
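The continuity condition Pp(τ0) = Pi(τ0) can be illustrated with a small sketch. It does not reproduce the paper's rescaled interest model: as a stand-in, the tail is assumed to be a pure power law β2·τ^(−γ), and the head is assumed to be β1·λ·e^(−λτ). Under those assumptions, matching the two pieces at τ0 fixes β1 once λ, γ, τ0 and β2 are chosen. The function names and the parameter values are hypothetical and not taken from the paper.

```python
import math

def beta1_from_continuity(lam, gamma, tau0, beta2):
    """Solve beta1 * lam * exp(-lam * tau0) == beta2 * tau0**(-gamma) for beta1."""
    return beta2 * tau0 ** (-gamma) * math.exp(lam * tau0) / lam

def combined_density(tau, lam, gamma, tau0, beta2):
    """Exponential head for tau < tau0, power-law tail (stand-in) otherwise."""
    if tau < tau0:
        beta1 = beta1_from_continuity(lam, gamma, tau0, beta2)
        return beta1 * lam * math.exp(-lam * tau)
    return beta2 * tau ** (-gamma)

# Illustrative parameter values only (hypothetical).
LAM, GAMMA, TAU0, BETA2 = 200.0, 2.5, 0.02, 1e-4
BETA1 = beta1_from_continuity(LAM, GAMMA, TAU0, BETA2)
```

Evaluating `combined_density` just below and at `TAU0` returns nearly identical values, which is the property the continuity condition is meant to enforce.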
To verify the accuracy of this combination model, we simulate the actual data. We use the Ali-talk data in dataset A and calculate P(τ). Ali-talk is an app widely used by users of Taobao. From the P(τ) obtained from real data, we get the parameters α, τ0, β2 and λ. Then, using Eq (11) and the β1 calculated by Eq (10), we obtain simulation results similar to the Ali-talk data, as shown in Fig 19.
T is set to 0.001s. The result is obtained with parameter values .
Fig 20 represents the difference between real data and simulation data shown in Fig 3. We can see that the difference between the Ali-talk data and the simulation data is less than 1.6 × 10⁻³. This verifies that the combination model can capture the heterogeneity of IM service users properly.
This paper investigates the inter-event time distribution of QQ and WeChat in two cities, and reveals that the inter-event time distributions of QQ and WeChat in both cities follow a piecewise distribution of exponential and power-law when T is set to 0.001s, indicating that the online behavior of IM service users is heterogeneous in different time scales. The phenomenon may be caused by the communication mechanism of IM services and the habits of users. The simulation results verify that the combination model proposed in the paper can describe the heterogeneity of IM service users properly. The new finding is useful for applications in information diffusion, disease infection, prediction of the economic development of a city, research on the mechanism of IM services, and so on.
Although the inter-event time distribution of QQ in the two cities follows a piecewise distribution, the parameters of the fitted distribution in the two cities are distinct. As noted in previous work, the behavior pattern of IM users is closely correlated with the development of the economy, transportation and communication in the same area. It is therefore promising to explore the relationship between the inter-event time distribution of QQ in a city and the city index, so that the city index can be obtained easily.
- 1. Barabási AL. The origin of bursts and heavy tails in human dynamics. Nature. 2005;435(7039):207–211. pmid:15889093
- 2. Gandica Y, Carvalho J, Dos Aidos FS, Lambiotte R, Carletti T. Stationarity of the inter-event power-law distributions. PLoS ONE. 2017;12(3):e0174509. pmid:28346480
- 3. Kim J, Lee D, Kahng B. Microscopic Modelling Circadian and Bursty Pattern of Human Activities. PLoS ONE. 2013;8(3):e58292. pmid:23505479
- 4. Zhou Y, Guan X, Zheng Q, Sun Q, Zhao J. Group dynamics in discussing incidental topics over online social networks. IEEE Network. 2010;24(6):42–47.
- 5. Malmgren RD, Stouffer DB, Campanharo ASLO, Amaral LAN. On universality in human correspondence activity. Science. 2009;325(5948):1696–1700. pmid:19779200
- 6. Stouffer DB, Malmgren RD, Amaral LAN. Log-normal statistics in e-mail communication patterns. Exprint Arxiv Physics. 2006;53(6):187–225. arXiv:physics/0605027.
- 7. Gyarmati L, Trinh TA. Measuring User Behavior in Online Social Networks. IEEE Network. 2010;24(5):26–31.
- 8. Wang C, Guan X, Qin T, Yang T. Modeling heterogeneous and correlated human dynamics of online activities with double Pareto distributions. Information Sciences. 2016;330:186–198.
- 9. Wang C, Guan X, Qin T, Yang T. Modeling the heterogeneity of human dynamics based on the measurements of influential users in Sina Microblog. Physica A: Statistical Mechanics and its Applications. 2015;428:239–249.
- 10. Zhou T, Zhao ZD, Yang Z, Zhou C. Relative clock verifies endogenous bursts of human dynamics. EPL (Europhysics Letters). 2012;97(1):18006.
- 11. Goh KI, Barabási AL. Burstiness and memory in complex systems. EPL (Europhysics Letters). 2008;81(4):48002.
- 12. Hou L, Pan X, Guo Q, Liu J. Memory effect of the online user preference. Scientific Reports. 2014;4:6560. pmid:25308573
- 13. Vazquez A. Impact of memory on human dynamics. Physica A: Statistical Mechanics and its Applications. 2006;373(36):747–752.
- 14. Shang MS, Chen GX, Dai SX, Wang BH, Zhou T. Interest-driven model for human dynamics. Chinese Physics Letters. 2010;27(4):48701–48703.
- 15. Malmgren RD, Stouffer DB, Motter AE, Amaral LAN. A Poissonian explanation for heavy-tails in e-mail communication. Proceedings of the National Academy of Sciences. 2008;105(47):18153–18158. pmid:19017788
- 16. Wu Y, Zhou C, Xiao J, Kurths J, Schellnhuber HJ. Evidence for a bimodal distribution in human communication. Proceedings of the National Academy of Sciences. 2010;107(44):18803–18808. pmid:20959414
- 17. Zhou T, Han X, Yan X, Wang B. Statistical Mechanics on Temporal and Spatial Activities of Human. Journal of University of Electronic Science and Technology of China. 2013;42(4):481–540.
- 18. Zhao F, Liu JH, Zha YL, Zhou T. Human dynamics analysis in online collaborative writing. Acta Physica Sinica. 2011;60(11):118902.
- 19. Wu Y, Zhou C, Chen M, Xiao J, Kurths J. Human comment dynamics in on-line social systems. Physica A: Statistical Mechanics and its Applications. 2010;389(24):5832–5837.
- 20. Qiao Y, Zhao X, Yang J, Liu J. Mobile Big-Data-Driven Rating Framework: Measuring the Relationship between Human Mobility and App Usage Behavior. IEEE Network. 2016;30(3):14–21.
- 21. Xu F, Li Y, Chen M, Chen S. Mobile Cellular Big Data: Linking Cyberspace and the Physical World with Social Ecology. IEEE Network. 2016;30(3):6–12.
- 22. Ren X, Zhu Y, Wang S, Liao H, Han X, Lü L. Online Social Network Analysis and the Relation with Regional Economic Development. Journal of University of Electronic Science and Technology of China. 2015;44(5):438–444.
- 23. Kivelä M, Porter MA. Estimating inter-event time distributions from finite observation periods in communication networks. Physical Review E. 2015;92(5):052813. pmid:26651750