Emergent User Behavior on Twitter Modelled by a Stochastic Differential Equation

Data from the social-media site, Twitter, is used to study the fluctuations in tweet rates of brand names. The tweet rates are the result of a strongly correlated user behavior, which leads to bursty collective dynamics with a characteristic 1/f noise. Here we use the aggregated "user interest" in a brand name to model collective human dynamics by a stochastic differential equation with multiplicative noise. The model is supported by a detailed analysis of the tweet rate fluctuations and it reproduces both the exact bursty dynamics found in the data and the 1/f noise.


I. INTRODUCTION
In the online era, humans are connected in real time on global scales. Local or seemingly local information is instantaneously shared across geographical boundaries. In particular, social online media have become an important platform for the sharing of information and have allowed for detailed studies of the coherent behavior of humans on a global scale [1][2][3][4][5][6]. The popular microblogging platform Twitter is a good source for such studies for two reasons. First, Twitter is more about providing news updates than developing social networks [7,8]. User behavior is therefore to a large extent influenced by information available via other information channels in society. Secondly, users respond to available information by submitting short public messages, "tweets", of up to 140 characters that may be seen as proxies for the public interest. Recent research on Twitter has used the activity levels in forecasting real-world events including fluctuations of stock market prices [9], real-time detection of the location and spread of earthquakes hitting populated areas [10], and for sentiment analysis and opinion mining [11].
In a recent paper [4], fluctuations in the tweet rates of 92 brand names are shown to be distributed with a power law tail with an exponent of −2.9 ± 0.4(SD). The broad tail of the distribution is characteristic for bursty activity levels. It is moreover found that the power spectral density of the tweet rate signals are described by a power law with an exponent of −1.0 ± 0.4(SD). This so called, "1/f noise", is found in a range of complex systems including heartbeats ( [12]), DNA base sequences ( [13]) and condensed matter systems ( [14]), and it is interpreted as a sign of a pronounced memory in the systems ( [15]). We attribute the power spectral density and the broad distribution in the tweet rate fluctuations to a strong correlation on a global scale in the collective human dynamics.
In this paper we consider the global user interest in a brand, which in our definition is the likelihood for a tweet to mention the brand name. The global interest in a topic is expected to change in a continuous and random fashion as the result of many independent events in society. We shall therefore describe the global user interest by a stochastic differential equation (SDE), which we derive by analyzing the fluctuations in the tweet rate. The SDE predicts simultaneously the power law exponents of the tweet rate distribution and the strong memory in the temporal variation of the tweet rates.
The paper is organized as follows, first we briefly describe the data acquired from Twitter and explain how the data is turned into a tweet rate. Then we introduce a method for analyzing the fluctuations in the the tweet rate and demonstrate how it works on a generic signal. Our method is applied to data and supports an SDE with multiplicative noise.
Finally, we show that the noise term in the SDE reproduces the power law distribution of the tweet rate as well as the power spectral density of the temporal signal. 10min, for a few brands. Note the regular daily variation and the irregular bursty behavior.

A. Data collection and time signals
We used the public REST API by Twitter to collect tweets containing one of seven brand names during a time period in the fall 2012 and the spring 2014 (see Supporting Information sections S1 and S2 for data and a description of time periods). The brand names considered are "Samsung", "Pepsi", "Heineken", "Gucci", "Starbucks", "BMW", and "Google". In the analysis, we chose to use international brand names for a number of reasons. First, the brand names are used globally and the users posting tweets about the brands in general transcend local communities. Secondly, the brands are sufficiently popular that a continuous and robust stream of tweets exists.
From the tweets collected, we save the time t i where a tweet is posted. The index i refers to the identification number of a given tweet. From the individual tweets, we form a time signal by summing over all tweets mentioning a given brand, where δ(t − t i ) is the Dirac delta function. The time signal is turned into a tweet rate, x(t), by dividing the time axis into windows of length, ∆T , and summing the events in each window, (1) A plot of tweet rate signals is shown for a few brands in Fig. 1, where both a regular daily variation and an irregular bursty behavior on top are distinctly visible. Burstiness is known to be inherent to individual human dynamics [16] and to have an impact on information spreading [17]. Here we see that bursts also appear in the aggregated interest level of many users in a large-scale social organization.
Interpretation of the update formula in Eq. (2). If the signal at some point takes the value γ 0 , then a small time step, dt, later, its value will be realized from a Gaussian with a mean determined by f (γ 0 ) and a spread determined by g(γ 0 ). By performing statistics over many such realizations we may therefore obtain the drift and diffusion.

B. The algorithm
We now introduce a method to uncover the underlying stochastic properties of a time signal [18]. The method is based on the assumption that a signal, γ(t), is generated by a stochastic differential equation (SDE) on the form Here dW is a random Gaussian variable with mean, dW = 0, and variance, dW 2 = dt.
The first term in Eq. (2) gives the deterministic drift, while the second term gives the random diffusion. The differential equation will here be handled using Ito calculus. Note that the above equation is assumed to describe the dynamics of the global user interest and not the observed tweet rates given in Eq.
(1). Below we shall relate the two quantities.
It may be shown in the Fokker-Planck formalism [19] that if the variable takes the value, γ 0 , at time, t, then at some small time step, dt, later, it will be a random variable from a Gaussian distribution with mean, γ 0 + f (γ 0 )dt, and spread, g(γ 0 )dt 1/2 (see Fig. 2). It is therefore possible to get an estimate of f (γ 0 ) and g(γ 0 ) by binning all the signal values close to γ 0 and then construct the corresponding distribution of signal values one time step later.
From this distribution one may read off the mean and the spread to get the estimates of f (γ 0 ) and g (γ 0 ). The procedure is then repeated over the whole range of realized signal values in order to estimate the functional forms of f (γ) and g(γ), respectively.
In Fig. 3 we show the result of applying the analysis to a signal generated by Comparing the analytical functions with the estimates, we get R 2 -values of 0.98 and 0.97 for the drift and diffusion respectively. The functional form of the drift and diffusion used here are equivalent to the ones fitted for "Samsung" below, and to make the comparison complete, we have also used the same signal length, N = 74, 646, and time step, dt = 1.
Note that bins with less than 20 data points have not been included, due to the otherwise poor statistics, and therefore γ only assumes values between 0.26 and 2.35.

A. Model
The stochastic differential equation, Eq.
(2) is formulated in the probability, γ(t), for a random tweet to mention a specific brand and not in the tweet rate. We call γ(t) for the "global user interest". In fact, the expected number of tweets on a given topic, x(t) P , in a time window, ∆t, is given by the full number of tweets posted on Twitter within this time window, A(t), times the probability for any such tweet to mention the given topic γ(t), with the functional forms estimated by the algorithm.
Here the expectation value, · P , refers to the Poisson weighted average of all the possible realizations of the tweet rate. The actual tweet rate signal is one such realization drawn from a Poisson distribution The above equation summarizes the basic structure of our model: the observed signal, x(t), is realized from a Poissonian with a mean given by the product between the user interest, γ(t), and the activity, A(t).
Within the activity, A(t), we also include any factors depending on regional differences, since the global composition of active users is changing during a daily cycle. We will assume that A(t) may be approximated as a deterministic and periodic function of time and that γ(t) is reasonably described by the SDE in Eq. (2). The goal of the following data analysis is to find the functional form of the drift, f (γ), and diffusion, g(γ), of the SDE.

B. Data analysis
It is problematic to apply the algorithm introduced in the methods section to the tweet rate signals, x(t), since we only expect it to apply to the underlying tweet probability, γ(t).
We The second problem that we face by applying the algorithm is the presence of the activity, A(t), relating the observed signal, x(t), to the signal of interest, γ(t). In the following analysis we will assume that the activity is a deterministic function of time with a daily period. One would naturally expect it to also have a weekly variation along with a variation on slower time scales, but here we will be interested in time scales below the resolution of a day, why it makes sense to approximate the activity by a daily period.
We may estimate the daily variation by averaging x(t) over many days to obtain a variable that is proportional to the activity Here we introduced two more expectation values: · D is the average of repeated measurements at the same time of day and · T is the general time average. We have used that Poissonians sum to a Poissonian with an expectation value that is the sum of the individual expectation values. We have also used that A(t) and γ(t) are uncorrelated in our model, that γ(t) is independent of absolute time and that A(t) is a periodic function. In practice, the average is performed over 20 to 62 days of measurements and by smoothening data to a time resolution of 15 minutes. Using the obtained information, we may construct the which is proportional to γ(t) if one averages out the Poisson noise The variableγ(t) is the closest approximation we get of γ(t) by our analysis. To see the effect of the Poisson statistics on our analysis, we have also generated the signal and applied the algorithm to bothγ(t) andx(t). The variablex(t) is equivalent toγ(t), but with the dynamics of γ(t) replaced by the mean value γ T (compare Eqs. (7) and (9)). By applying the algorithm to bothγ(t) andx(t), we hope to be able to separate the effect of the daily variation and the Poisson statistics from the actual dynamics of γ(t).
In Fig. 4A we showγ(t) and the corresponding instance ofx(t) for "Samsung". Note that the Poisson noise ofx(t) is not enough to explain the bursty behavior observed for the tweet rates. The resulting drift (Fig. 4B) and diffusion (Fig. 4C) terms are estimated using our algorithm on the two signals shown below. The drift ofγ(t) has been fitted with a function on the form A best fit yields the coefficients a f = 0.47 and b f = 0.43. Similarly, the diffusion has been fitted using a function on the form with a g = 0.31 and b g = 0.23. The analysis has been performed in dimensionless time, t → t/∆t, such that dt = 1. We have been unable to estimate the error bars in the presence of the Poisson statistics.
Our algorithm estimates a linear drift term for both the data,γ(t), and for the synthetic signal,x(t). Forx(t), we find a coefficient b f = 1, which is expected from a Poisson process.
The fact that we find b f = 0.43 forγ(t) shows that the data is more rich than a simple homogeneous or a weakly inhomogeneous Poisson process. In other words, the fluctuations of γ(t) are comparable or stronger than the fluctuations generated by the superimposed Poisson process. While we cannot quantify the influence of the Poisson process on the linear drift, we are confident thatγ=1 is the only stationary point ofγ, corresponding to a potential, V (γ) = γ f (γ )dγ , with just a single minimum. Furthermore, we do not expect the drift to depart significantly from a linear form around and above the fix point. A drift term of this form limits the signal and allows bursts to be generated by the multiplicative diffusion term.
In the plot showing the diffusion terms, we find that the effect of the Poisson statistics is very distinctly visible as a constant background noise. It indeed matches the size of the coefficient a g pretty well. We therefore propose that the this first term is due Poisson noise and therefore that the underlying variable γ(t) is described solely by the second term, We apply the same analysis to the other brand names and provide in Table I the length of the fitted data series, N data , the mean tweet rate, x , the ratio between the maximum and minimum of the daily variation, , and the goodness-of-fit values for the diffusion term, R 2 . We find that the fit captures the observed diffusion well for 4 of the 7 brand names, but it performs poorly for the last 3. We believe that this is the result of applying the algorithm to a limited time series under the effects of daily variation and Poisson noise. As an example of this, we show in Fig. 5 the result of applying the algorithm to the tweet rate signal of "Starbucks". We see that a Poisson process captures most of the fluctuations found in the dynamical signal, i.e. the diffusion terms ofγ andx are approximately equal. We therefore conclude that the average interest, γ , the big daily variation, A(t), and the Poisson noise is enough to explain most of the signal for "Starbucks".
This leaves very little room in the analysis to capture the dynamics of γ(t) (compare with Fig. 4) and may explain the poor performance of the fit.
In general, however, we find that the analysis ofγ(t) provides support to the hypothesized noise exponent of 3/2. We therefore propose that the global user interest is described by the following model where f (γ) is a slowly decreasing drift term derived from a single well potential. We emphasize that in order to derive this result, we assume that γ(t) is described by the stochastic differential equation, Eq. (2), and that A(t) can be approximated by a periodic function with a daily period. Finally, our method works best when the Poisson fluctuations are not too strong.
In the next section we show that if the single well potential defining f (γ) is approximated by an infinite well, we obtain a probability distribution with a power law exponent of -3 and a power spectrum with a power law exponent of -1. This is in agreement with the characteristic behavior of the brand name signals analyzed in [4].

DISTRIBUTION AND POWER SPECTRUM FROM MODEL
To derive the probability distribution and power spectrum for the model proposed in Eq.
(12) we switch from the Langevin equation to the corresponding Fokker-Planck formulation Here we have approximated the drift potential by an infinite well. This yields a vanishing drift in the region γ ∈ [γ min , γ max ] and reflective boundaries at the effective potential walls γ min and γ max . One finds the stationary distribution P s (γ) = N γ 3 (14) where N is the normalization constant. The same asymptotic power law is found in the case of a linear drift, which is promising since it matches the behavior of the data.
Eq. (13) may be solved using the method of eigenfunctions as explained in [20]. One finds that for an intermediate range of frequencies the power spectrum scales as which is also the case for the data.
The model proposed for the dynamics of interest, Eq. (12), is therefore successful at simultaneously explaining the scaling exponents of the signal distribution and the corresponding power spectrum. To show the validity of the infinite well approximation, we conclude the paper with a simulation of the model with a linear drift An efficient and accurate numerical integration may be performed by considering the inverse variable, τ = 1/γ, which may be integrated using the splitting up method [21]. In Fig. 6 we show the distribution and power spectrum for the simulated signal, and we observe that the linear drift is consistent with the power laws observed in the data. The exponent of the distribution is fitted to α 1 = −2.961 ± 0.002 using the the maximum likelihood routine introduced in [5]. The exponent of the power spectrum is found to be α 2 = −0.98±0.03 using a logarithmic binning and a least squares fit. The corresponding errorbars are estimated by bootstrapping.

CONCLUSION
In this paper we have studied the dynamics of interest in global brands by analyzing tweet rates on the online social media site Twitter. As a result of the correlations in the user behavior, the rates are found to be bursty and distributed as a power law with an exponent of -3 and have a power spectrum inversely proportional to the frequency. Since the global interest in a brand name is the result of many random events, we have proposed to model it by a stochastic differential equation with a simple drift and a diffusion like term. By analyzing the fluctuations in the tweet rate signals, we find that the diffusion term scales like a power with an exponent of 3/2. The derived diffusion term may explain the pronounced burstiness and the 1/f noise observed for the tweet rate signals.
It remains an open question whether the dynamics observed for the brand names on Twitter can also be observed for the occurrence of other keywords or even in other large social organizations? Another interesting question, which we have not addressed with our model is, what is the detailed behavior of individual humans that leads to correlated behavior given by our model? In general, the growing information available on human behavior in global-scale social organizations has helped answer parts of these questions and further analysis along the lines of this paper might provide a more complete picture. The two signals are shown in A and below we see the drift terms, B, and diffusion terms, C, estimated by the algorithm. Also shown are the fits of the functions in Eq. (10) and (11) to the estimated drift and diffusion ofγ(t).
FIG. 5. Estimated diffusion for the signalsγ(t) andx(t) for "Starbucks". Note that the diffusion estimated for the two signals is almost equal for this brand name, therefore making it hard to filter out the effect of the dynamical interest γ(t) present in the signal ofγ(t).