## Abstract

Social networks are getting closer to our real physical world. People share the exact location and time of their check-ins and are influenced by their friends. Modeling the spatio-temporal behavior of users in social networks is of great importance for predicting the future behavior of users, controlling the users’ movements, and finding the latent influence network. It is observed that users have periodic patterns in their movements. Also, they are influenced by the locations that their close friends recently visited. Leveraging these two observations, we propose a probabilistic model based on a doubly stochastic point process with a periodic-decaying kernel for the time of check-ins, and a time-varying multinomial distribution for the location of check-ins, of users in location-based social networks. We learn the model parameters using an efficient EM algorithm, which is distributed over users and has linear time complexity. Experiments on synthetic and real data gathered from Foursquare show that the proposed inference algorithm learns the parameters efficiently, and that our method models the real data better than the alternatives.

**Citation: **Zarezade A, Jafarzadeh S, Rabiee HR (2018) Recurrent spatio-temporal modeling of check-ins in location-based social networks. PLoS ONE 13(5):
e0197683.
https://doi.org/10.1371/journal.pone.0197683

**Editor: **Dejing Dou,
University of Oregon, UNITED STATES

**Received: **November 23, 2017; **Accepted: **May 7, 2018; **Published: ** May 23, 2018

**Copyright: ** © 2018 Zarezade et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability: **All datasets are available through our git repository at https://github.com/azarezade/STP/tree/master/code/Data. Our data collection method complies with the terms of service of both Twitter and Foursquare. Moreover, the datasets are anonymized and do not reveal the identity of actual users.

**Funding: **The author(s) received no specific funding for this work.

**Competing interests: ** The authors have declared that no competing interests exist.

## Introduction

The advances in location-acquisition techniques and the proliferation of mobile devices have generated an enormous amount of spatial and temporal data on users’ activities [1]. People can upload a geotagged video, photo, or text to social networks like Facebook and Twitter, share their present location on Foursquare, or share their travel routes as GPS trajectories on GeoLife [2]. A considerable amount of spatio-temporal data is generated by the activity of users in location-based social networks (LBSNs). In a typical LBSN, like Foursquare, users share the time and geolocation of their check-ins, comment on venues, or unlock badges by exploring new venues. These data have motivated researchers to study human spatio-temporal behavior in social networks [3, 4].

Many techniques have been proposed for processing, managing, and mining trajectory data in the past decade [5]. Several other studies try to leverage spatial data in recommender systems [6]. However, only a few works have attempted to model the recurrent spatio-temporal behavior of users in LBSNs [7, 8]. Given the history of users’ check-ins, the goal is to predict the time and location of users’ future check-ins using a model. Such a model can also be used to find the influence network between users that underlies their check-ins, detect influential users and popular locations, predict the peak hours of a restaurant, recommend a location, and even control the movements of users.

In this paper, we propose a probabilistic generative model for the check-ins of users in location-based social networks, which can be used to predict the future check-ins of users and to discover the latent influence network. People usually have periodic patterns in their movements [8–10]. For example, a typical user may check into her office in the morning and into a nearby restaurant at noon, then return home and repeat this behavior in the following days. We model the time of check-ins of each user with a novel periodic-decaying doubly stochastic point process, which leverages the periodicity in the movements of users and can also capture any drift in their patterns. To model the location of check-ins, we use the fact that users in social media are influenced by the activities of their friends [11–13]. If many of your close friends have checked into a specific restaurant recently, then there is a high probability that you will select that restaurant next time. We model the location of check-ins using a time-varying multinomial distribution. In summary, we propose:

- Doubly stochastic point process for modeling the time of users’ check-ins, which captures the periodic behavior in the movement of users.
- Time-varying multinomial distribution for modeling the location of users’ check-ins, which incorporates the mutually-exciting effect of the friends’ history.
- Scalable inference algorithm based on the EM algorithm to find the model parameters, which is distributed over users, and has a linear time complexity.
- Compelling dataset of Foursquare users’ check-ins, curated from 12000 active users during three months in the year 2015.

Our work relates to previous work on temporal point processes and on the analysis of location-based social networks.

Modeling information diffusion in social networks has attracted a lot of attention in recent years. Given the times at which users have adopted a contagion (information, behavior, or meme), the problem is to model the time and user of the next adoption, *i.e.*, to predict the next event. Early methods [14, 15] studied information diffusion using a pair-wise probability distribution for each link from node *j* to *i*, which is the probability that node *i* generates an event at time *t*_{i} due to the event of node *j* at time *t*_{j}. These methods overlook external effects on the generation of events. In addition, they assume that each node adopts a contagion at most once, *i.e.*, events are not recurrent. These issues were later addressed in [7, 16–20], which use point processes for the modeling of events. In [15–17, 19], cascades are assumed to be independent and are modeled by a special point process, called the Hawkes process [21]. The independence assumption is removed in [11, 22], which model the correlation between multiple competing or cooperating cascades. In [7], a spatio-temporal model is proposed for the interactions between a pair of users, rather than for an individual user as in our model. Other studies [17, 23–26] use additional information about the diffusion network, such as the topic of tweets or the community structure, to better model the influence network. Moreover, in [27, 28], a stochastic optimal control framework is proposed to control the diffusion process in complex networks. Recently, recurrent neural networks (RNNs) have been utilized to learn the intensity function of a temporal point process as a nonlinear function of the history, with the resulting nonlinear optimization solved by a stochastic gradient algorithm [29, 30]. Most of the previous works studied the temporal diffusion of information on microblogging networks like Twitter, whereas we model the time and location of users’ check-ins in location-based networks like Foursquare. Moreover, we propose a periodic point process, which is of independent importance, whereas previous studies used self-exciting point processes for the modeling of events.

The prior works in location-based social networks can be categorized into three groups [6]: location recommendation, trajectory mining, and location prediction. The main approaches in location recommendation systems are: content-based, which uses data from a user’s profile and the features of locations [31, 32]; link-based, which applies link analysis models like PageRank to identify experienced users and interesting locations [33, 34]; and collaborative filtering, which infers users’ preferences from their historical behavior, such as their location history [35, 36]. In trajectory data mining, the source data is usually generated by GPS. These works include: trajectory pattern mining, to find the next location of an individual [8, 37–39]; anomaly detection, to detect unexpected movement patterns [40, 41]; and trajectory classification, to differentiate between trajectories of different states, such as motions, transportation modes, and human activities [42]. A comprehensive review of these methods can be found in the recent survey [5]. We distinguish our work from location recommendation and trajectory mining methods, because our goal is to model the check-ins of users, not to recommend a location or to find trajectory patterns from the position data of users’ routes. In location prediction, the goal is to predict the next location, given the user’s profile data and the history of check-ins. However, these methods do not consider the relations between friends (which we capture using the influence matrix), the aging effect in the check-in history (using a decaying kernel), exogenous effects on users’ decisions, or the periodicity in users’ movement patterns.

## Materials and methods

### Preliminaries

To model the times of occurrence of a phenomenon, which are called events, we can use point processes on the real line. The phenomenon can be an earthquake [43], a viral disease [44], or the spread of information over a network [15]. The sequence of events, as defined below, is the realization of a point process.

**Definition 1** (Point Process). *Let* {*t*_{i}}_{i≥1} *be a sequence of non-negative random variables such that* *t*_{i} < *t*_{i+1} *for all i*, *then we call* {*t*_{i}} *a point process on* ℝ_{+}, *and* ℋ_{t} = {*t*_{i} | *t*_{i} < *t*} *its history or filtration.*

There are different equivalent descriptions for point processes: the sequence of points {*t*_{i}}, the sequence of intervals (duration process) *δt*_{i}, the counting process *N*(*t*), or the intensity process λ(*t*) [45]. In the following, we briefly explain each definition.

The counting process *N*(*t*) associated with the point process {*t*_{i}} counts the number of events that occurred before time *t*, *i.e.*, *N*(*t*) = ∑_{i} 𝟙_{[0,t]}(*t*_{i}), where the indicator function 𝟙_{A}(*x*) is 1 if *x* ∈ *A*, and is 0 otherwise. The duration process *δt*_{i} associated with the point process is defined as *δt*_{i} = *t*_{i} − *t*_{i−1}. Finally, the intensity process λ(*t*) is defined as the expected number of events per unit of time, which generally depends on the history:

λ*(*t*) = lim_{*dt*→0} 𝔼[*N*(*t*, *t* + *dt*] | ℋ_{t}]/*dt*

where *N*(*t*, *s*] ≔ *N*(*s*) − *N*(*t*). To evaluate the likelihood of a sequence of events, *f*(*t*_{1}, *t*_{2}, …, *t*_{n}), we can use the chain rule of probability, *f*(*t*_{1}, *t*_{2}, ⋯, *t*_{n}) = ∏_{i} *f*(*t*_{i}|*t*_{1:i−1}). Therefore, it suffices to describe only the conditionals, which are abbreviated as *f**(*t*). According to the definition of point processes, we can write the probability of the (*n* + 1)’th event occurring at time *t* as:

ℙ(*t*_{n+1} ∈ [*t*, *t* + *dt*) | ℋ_{t}) = *f**(*t*)*dt*

If we divide both sides of the above equation by 1 − *F**(*t*), where *F**(⋅) is the cdf of *f**(⋅), then in the limit as *dt* → 0, we have:

ℙ(*t*_{n+1} ∈ [*t*, *t* + *dt*) | ℋ_{t}, *t*_{n+1} ≥ *t*) = *f**(*t*)*dt*/(1 − *F**(*t*))

Therefore, according to the definition of intensity, we find the relation between the conditional distribution of the time of events and the intensity function as:

(1) λ*(*t*) = *f**(*t*)/(1 − *F**(*t*))

where we use the * superscript to show that a function depends on the history. We can also express the relation of λ*(*t*) and *f**(*t*) in the reverse direction [46]:

(2) *f**(*t*) = λ*(*t*) exp(−∫_{*t*_{n}}^{*t*} λ*(*s*)*ds*)

Now, the cdf can be easily evaluated:

(3) *F**(*t*) = 1 − exp(−∫_{*t*_{n}}^{*t*} λ*(*s*)*ds*)
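As a quick sanity check of the relations above, for a constant intensity λ*(*t*) = λ, Eq (2) reduces to the exponential density λe^{−λ(t−t_n)}. The sketch below (an illustration, not part of the paper's method) evaluates Eq (2) for an arbitrary intensity by numerical integration:

```python
import math

def density_from_intensity(lam, t, t_n=0.0, steps=10000):
    """Eq (2): f*(t) = lam(t) * exp(-integral_{t_n}^{t} lam(s) ds).
    The integral is approximated with the midpoint rule."""
    h = (t - t_n) / steps
    integral = sum(lam(t_n + (i + 0.5) * h) for i in range(steps)) * h
    return lam(t) * math.exp(-integral)
```

For λ(*s*) ≡ 2 this recovers the exponential density 2e^{−2t}, as expected.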
A point process is usually defined by specifying its conditional distribution *f**(*t*) or, equivalently, its intensity λ*(*t*). In the simplest case, the intervals *δt*_{i} are assumed to be *i*.*i*.*d*.; the process is then memoryless, and hence λ*(*t*) = λ(*t*). The Cox process [47] is a doubly stochastic point process that, conditioned on its intensity, is a Poisson process [48]. The Hawkes process [21] is a special type of Cox process, where the intensity is expressed in terms of the history as:

(4) λ*(*t*) = *μ* + ∑_{*t*_{i} < *t*} *ϕ*(*t* − *t*_{i})
where *ϕ*(*t*) is the kernel of the Hawkes process that defines the effect of past events on the current intensity, and *μ* is the base intensity. For example, the exponential kernel *ϕ*(*t*) = exp(−*t*) is used for modeling self-exciting events like earthquakes [43]. In general, we have a multivariate process with a counting process vector **N**(*t*) = [*N*_{1}(*t*), ⋯, *N*_{n}(*t*)]^{T} and an associated intensity vector defined as:

(5) **λ***(*t*) = ***μ*** + ∫_{0}^{t} Φ(*t* − *s*)d**N**(*s*)

where Φ(*t*) is the matrix of mutual kernels, *i.e.*, Φ_{ij}(*t*) models the effect of events of counting process *N*_{j}(*t*) on *N*_{i}(*t*), ***μ*** = [*μ*_{1}, ⋯, *μ*_{n}]^{T} is the base intensity, and **A** = [*α*_{ij}] is the matrix of mutual-excitation coefficients. Often, the point process carries other information besides the time of events, which is called a mark. For example, the strength of an earthquake can be considered as a mark. The mark *m*, often a subset of ℝ or ℕ, is associated with each event through the conditional mark probability function *f**(*m*|*t*):

(6) λ*(*t*, *m*) = λ*(*t*)*f**(*m*|*t*)

The mutually-exciting property of the Hawkes process makes it a common modeling tool in a variety of applications such as seismology, neurophysiology, epidemiology, reliability, and social network analysis [14, 15, 22].

### Problem definition

Given the history of users’ activity in a location-based social network, ℋ_{T}, with *N* users and *L* locations in *C* different categories, we propose a generative model for the check-ins of users. In other words, for each user, we can predict the location and time of her next check-in.

We define a check-in as a 4-tuple (*t*, *u*, *c*, *l*), which shows the time *t* at which user *u* checks in to location *l* with category *c*. We observe the sequence of all check-ins in the network, ℋ_{T} = {(*t*_{i}, *u*_{i}, *c*_{i}, *l*_{i})}, in the time interval [0, *T*], where *t*_{i} ∈ [0, *T*], *u*_{i} ∈ {1, …, *N*}, *c*_{i} ∈ {1, 2, …, *C*} and *l*_{i} ∈ {*ϕ*_{1}, *ϕ*_{2}, …, *ϕ*_{L}}, where *ϕ*_{i} is the unique id of the *i*’th location. Since we use location ids instead of geo-coordinates, it is fair to assume the observed data is noiseless; in practice, there may be uncertainty in the locations of check-ins, which we leave as future work. We use the following notation for the history of check-ins of user *u* in location *l* with category *c* up to time *t*:

ℋ_{t}^{ucl} ≔ {(*t*_{i}, *u*_{i}, *c*_{i}, *l*_{i}) ∈ ℋ_{t} : *u*_{i} = *u*, *c*_{i} = *c*, *l*_{i} = *ϕ*_{l}}

Moreover, we use the dot notation to represent the union over the dotted variable, *e.g.*, ℋ_{t}^{u··} represents the events of user *u*, before time *t*, in any location with any category, and ℋ_{t}^{(−u)·c} represents the events of all users except *u*, before time *t*, in any location with category *c*.

By observing the periodic pattern in the times of users’ check-ins (see the Results section), we model the time of check-ins using a doubly stochastic point process that incorporates both the periodic patterns and exogenous effects in users’ movements. The exogenous effects are any other external effects on the users’ check-in times that are not necessarily periodic. To model the location of check-ins, we propose a time-dependent multinomial distribution that incorporates the mutually-exciting effect of friends, an effect that is also empirically observed in the real data.

### Proposed method

#### Modeling the time of check-ins.

On a typical working day, a user may check in to her office in the morning and then go to a restaurant at noon, and she may also have a weekly soccer practice. Observing the history of a user’s check-in times, if she has repeated some pattern recently (within several days), for example taking a walk every afternoon, then she is likely to repeat this pattern in the upcoming days at approximately the same time. That is, there is periodicity in users’ behavior. Moreover, there may also be a drift or the addition of a new activity in the user’s behavior; for example, her office working hours may change, or a new weekly social gathering may appear. Therefore, we need a periodic point process to model the times of a user’s check-ins, which can also adapt to the user’s new check-ins. This is in contrast to the self-exciting nature of the Hawkes process, which is used to model the diffusion of information over a network [14, 15, 17].

We propose a doubly stochastic point process which is periodic, and which also has a diminishing property that enables the process to change its periodic pattern and adapt to new behaviors. The proposed process is composed of a Poisson process with base intensity *μ*, where each event *t*_{i} of this process triggers a Poisson process with the following intensity:

(7) λ_{*t*_{i}}(*t*) = ∑_{k=1}^{∞} *g*(*k*)*h*(*t* − *t*_{i} − *kτ*)
where *h*(*t*) is the kernel of the process, *g*(*k*) is a decreasing function that diminishes the intensity in future periods, and the hyper-parameter *τ* is the period. This intensity is illustrated in Fig 1. The self-exciting property of the Hawkes process can be observed from its exponentially decaying kernel in Fig 1: when an event occurs in a Hawkes process, there is a high probability of events occurring just after it, and this probability decreases exponentially afterward. In the proposed process, by contrast, there is a high probability of events in the upcoming periods, and this probability also decreases exponentially.

An event at time *t* = 0 triggers a Poisson process. The solid curve shows the intensity of the proposed periodic point process with a Gaussian kernel and period *τ*, and the dashed curve shows a Hawkes process with an exponentially decaying kernel.

According to the superposition theorem [48], the intensity of the proposed process can be written as follows:

(8) λ(*t*) = *μ* + ∑_{*t*_{i} < *t*} ∑_{k=1}^{∞} *g*(*k*)*h*(*t* − *t*_{i} − *kτ*)
To preserve locality in time, the kernel *h*(*t*) should have a peak at *t* = 0 and decay to zero on both sides as *t* → ±∞. For example, the Gaussian kernel *h*(*t*) = exp(−*t*^{2}/2*σ*^{2}) meets these requirements. This model has four main features:

- *Periodic Nature*. When an event occurs at time *s*, the intensity of events around this time in the upcoming periods, *s* + *kτ*, increases.
- *Temporal Locality*. The intensity is high around the peak of the kernel and drops rapidly on both sides.
- *Adaptability*. The peak of the kernel decreases as *k* increases, so the process can adapt its intensity to new periodic patterns.
- *Exogenous Effect*. Other external effects can be modeled by the base intensity *μ*.

If we use a truncated Gaussian kernel like *h*(*t*) = exp(−*t*^{2}/2*σ*^{2})𝟙(|*t*| < *τ*/2), then we can substantially reduce the complexity of the intensity function. With this kernel we can show that:

(9) λ(*t*) = *μ* + ∑_{*t*_{i} < *t*} *g*(*k*_{i})*h*(*t* − *t*_{i} − *k*_{i}*τ*)

where *k*_{i} = ⌊(*t* − *t*_{i})/*τ* + 1/2⌋ is the number of the period in which the event at *t*_{i} affects the current intensity. So, we propose the following point process for the time of check-ins of user *u* in any location with category *c*:

(10) λ_{u}(*t*, *c*) = *μ*_{uc} + *β*_{u} ∑_{*t*_{i} ∈ ℋ_{t}^{uc·}} *g*(*k*_{i})*h*(*t* − *t*_{i} − *k*_{i}*τ*)

The first term, *μ*_{uc}, is the base intensity that models the external effect on user *u* to generate check-ins with category *c*; the second term is the periodic effect of the history; *β*_{u} is the kernel parameter; and *τ*, *σ* are hyper-parameters. All parameters of the model are listed in Table 1. The intuition of this model is that if a user checks in frequently, for example in the “restaurant” category at noon, then with high probability she will check in to a restaurant at noon on the following day.
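As a sketch of how this periodic-decaying intensity can be evaluated, the snippet below uses the truncated Gaussian kernel and assumes *g*(*k*) = exp(−*k*) as the decreasing function; the specific choice of *g* is an assumption of this example, not fixed by the model:

```python
import math

def periodic_intensity(t, history, mu=0.1, beta=1.0, tau=24.0, sigma=1.0):
    """Eq (10)-style intensity for one (user, category) pair.
    history: past check-in times of that user in that category."""
    lam = mu
    for ti in history:
        if ti >= t:
            continue
        k = round((t - ti) / tau)   # period number k_i nearest to (t - t_i)
        if k < 1:
            continue                # an event only excites future periods
        s = t - ti - k * tau        # offset from the k-th echo of t_i
        if abs(s) < tau / 2:        # truncated Gaussian kernel support
            lam += beta * math.exp(-k) * math.exp(-s * s / (2 * sigma ** 2))
    return lam
```

A single check-in at time 0 then raises the intensity around *t* = *τ*, 2*τ*, … with exponentially shrinking peaks, matching Fig 1.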

#### Modeling the location of check-ins.

In this section, we propose a model for the location of users’ check-ins, given the history of check-ins. We use the fact that users in social networks are influenced by the behavior of their neighbors. Let *w*_{ucl}(*t*) denote the weight of location *l* with category *c* for user *u*:

(11) *w*_{ucl}(*t*) = ∑_{(*t*_{i}, *u*_{i}) ∈ ℋ_{t}^{·cl}} *α*_{u_{i}u} exp(−(*t* − *t*_{i}))
which incorporates *α*_{u_{i}u}, the influence of user *u*_{i} on *u*, and the times of check-ins through an exponentially decaying kernel. This kernel diminishes the effect of check-ins far in the past, so the model can adapt to new behaviors in users’ check-ins. Therefore, a location recently checked in by many friends, or even by a few influential ones, will have a high weight. We also define a weight for the popularity of a location *l* with category *c* from the perspective of all users:
(12) *m*_{cl}(*t*) = ∑_{*t*_{i} ∈ ℋ_{t}^{·cl}} exp(−(*t* − *t*_{i}))

where the location that has been checked in most frequently and most recently has the highest weight.

When a user decides to check in, for example at a restaurant, she selects a location that she or her friends have checked into frequently and recently (exploitation effect), and sometimes she checks in to a new popular restaurant (exploration effect). Therefore, we use the following multinomial conditional distribution to define the probability that user *u* checks in to location *ℓ*, given the time *t* and category *c*:
(13) *f*_{u}(*ℓ*|*c*, *t*) = ∑_{l=1}^{L} (*w*_{ucl}/(*η*_{uc} + *w*_{uc·})) *δ*_{ϕ_{l}}(*ℓ*) + (*η*_{uc}/(*η*_{uc} + *w*_{uc·})) *f*_{0}(*ℓ*|*c*, *t*)
The Dirac delta function *δ*_{ϕ_{l}}(*ℓ*) is 1 if *ϕ*_{l} = *ℓ* and 0 otherwise, and the parameter *η*_{uc} models the inclination of the user to explore new locations. This distribution means that, with probability *w*_{ucl}/(*η*_{uc} + *w*_{uc⋅}), the current location would be a location *ϕ*_{l} previously checked in by user *u* or any of her friends (since the weight *w*_{ucl} is zero for unvisited locations), and with probability *η*_{uc}/(*η*_{uc} + *w*_{uc⋅}) it would be selected from all locations in the network, with a probability modeled by the following distribution:
(14) *f*_{0}(*ℓ*|*c*, *t*) = ∑_{l=1}^{L} (*m*_{cl}/*m*_{c·}) *δ*_{ϕ_{l}}(*ℓ*)
where, according to the definition of the coefficient *m*_{cl}, it assigns higher probability to popular or recently and frequently visited locations. The main features of the proposed location model are:

- *Exploitation*. The future check-ins of a user are influenced by the check-in history of the user and her friends.
- *Exploration*. There is a probability that users explore and check into new, unseen locations.
- *Adaptability*. Using an exponentially decaying kernel for the weights, the model can adapt to new patterns in users’ behavior.
- *Influence Network*. The parameters {*α*_{vu}} model the latent influence network, which is learned from the check-in history.
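The location model of Eqs (11)–(14) can be sketched as follows. The unit-rate exponential decay and the uniform fallback distribution when no location has been visited yet are assumptions of this example:

```python
import math

def location_probs(user, cat, t, history, alpha, eta, locations):
    """Check-in location distribution, Eq (13).
    history: list of (t_i, u_i, c_i, l_i); alpha[(v, u)]: influence of v on u."""
    w = {l: 0.0 for l in locations}  # Eq (11): influence-weighted recency
    m = {l: 0.0 for l in locations}  # Eq (12): global popularity weight
    for (ti, ui, ci, li) in history:
        if ti >= t or ci != cat:
            continue
        decay = math.exp(-(t - ti))               # exponentially decaying kernel
        m[li] += decay
        w[li] += alpha.get((ui, user), 0.0) * decay
    m_tot = sum(m.values())
    f0 = ({l: m[l] / m_tot for l in locations} if m_tot > 0
          else {l: 1.0 / len(locations) for l in locations})  # assumed fallback
    z = eta + sum(w.values())
    # Eq (13): exploit visited locations via w, explore via f0 with weight eta
    return {l: w[l] / z + (eta / z) * f0[l] for l in locations}
```

The returned probabilities sum to one; a location recently visited by influential friends dominates unless *η* is large, matching the exploitation/exploration trade-off above.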

#### Summary of the generative model.

The proposed generative model is summarized in Alg. 1. Using the superposition theorem, first the time *t* of a check-in is sampled from the proposed periodic point process λ(*t*) = ∑_{u,c}λ_{u}(*t*, *c*); then the user *u* who generated this event is selected in proportion to her intensity λ_{u}(*t*). The category *c* of the check-in is also selected in proportion to λ_{u}(*t*, *c*). Finally, the location *l* is sampled from the proposed location model.

**Algorithm 1**: Generative model of the check-ins.

**Input**: *N*, *C*, *L*, all parameters {*μ*_{uc}, *η*_{uc}, *α*_{uv}, *β*_{u}}, history of check-ins.

**Output**: Next check-in (*t*_{i}, *u*_{i}, *c*_{i}, *l*_{i}).

**for** *u* = 1 : *N* **do**

λ_{u}(*t*) = ∑_{c}λ_{u}(*t*, *c*)

**end**

λ(*t*) = ∑_{u}λ_{u}(*t*)

*t*_{i} ∼ λ(*t*)

*u*_{i} ∼ Multinomial([λ_{1}(*t*_{i})/λ(*t*_{i}), ⋯, λ_{N}(*t*_{i})/λ(*t*_{i})])

*c*_{i} ∼ Multinomial([λ_{ui}(*t*_{i}, 1)/λ_{ui}(*t*_{i}), ⋯, λ_{ui}(*t*_{i}, *C*)/λ_{ui}(*t*_{i})])

*l*_{i} ∼ *f*_{ui}(*ℓ*|*c*_{i}, *t*_{i})

**return** (*t*_{i}, *u*_{i}, *c*_{i}, *l*_{i})
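The sampling loop of Alg. 1 can be sketched with a thinning step; the constant upper bound `lam_bar` on the total intensity is an assumption of this sketch (a valid bound exists because the kernels are bounded):

```python
import random

def sample_next_checkin(lam, t0, T, lam_bar, seed=0):
    """One step of Alg 1: sample (t_i, u_i, c_i) from the joint intensity.
    lam(t) -> dict {(u, c): lambda_u(t, c)}; lam_bar bounds the total rate."""
    rng = random.Random(seed)
    t = t0
    while t < T:
        t += rng.expovariate(lam_bar)      # candidate from the dominating rate
        if t >= T:
            break
        rates = lam(t)
        total = sum(rates.values())        # lambda(t) = sum_{u,c} lambda_u(t, c)
        if rng.random() * lam_bar <= total:
            # select (u, c) in proportion to lambda_u(t, c)
            r, acc = rng.random() * total, 0.0
            for (u, c), v in rates.items():
                acc += v
                if r <= acc:
                    return t, u, c
    return None  # no event before the horizon T
```

The location *l*_{i} is then drawn from *f*_{ui}(*ℓ*|*c*_{i}, *t*_{i}) as in the spatial model.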

#### Inference.

We propose a Bayesian inference algorithm based on the EM algorithm to find the model parameters. To find the maximum likelihood solution, for each check-in (*t*_{i}, *u*_{i}, *c*_{i}, *l*_{i}), we define a latent variable *z*_{i} as the user that caused *u*_{i} to check into location *l*_{i}, given the time *t*_{i} and category *c*_{i}. We use 1-of-*N* coding to represent the *z*_{i}’s. For notational convenience, let us define:
(15) *w*_{ucl}^{v}(*t*) = ∑_{*t*_{i} ∈ ℋ_{t}^{vcl}} *α*_{vu} exp(−(*t* − *t*_{i})), *v* = 1, …, *N* (16) *w*_{ucl}^{0}(*t*) = *η*_{uc}*m*_{cl}(*t*)/*m*_{c·}(*t*)

where *w*_{ucl}^{v} is the contribution or influence of user *v* in the check-in of user *u* at location *l* with category *c*. Now, we define:
(17) *f*_{ui}(*l*_{i}, *z*_{i}|*t*_{i}, *c*_{i}) = ∏_{v=0}^{N} (*w*_{u_{i}c_{i}l_{i}}^{v}(*t*_{i})/(*η*_{u_{i}c_{i}} + *w*_{u_{i}c_{i}·}(*t*_{i})))^{z_{iv}}
where *z*_{iv} is the *v*’th element of *z*_{i}, indicating the user that caused the *i*’th check-in; *v* = 0 is not the index of a user, but represents the exploration effect. It can be verified that marginalizing out *z*_{i}, ∑_{z_{i}} *f*_{ui}(*l*_{i}, *z*_{i}|*t*_{i}, *c*_{i}), results in the probability distribution (13). Now, to evaluate the complete likelihood of the data and hidden variables (ℋ_{T}, *Z*), given the parameters *θ* = {*μ*_{uc}, *η*_{uc}, *α*_{uv}, *β*_{u}} for *u* = 1…*N*, *v* ∈ 𝒩(*u*), and *c* = 1…*C*, where 𝒩(*u*) is the set of neighbors of *u*, we use the following proposition.

**Proposition 1 ([11])** *Let* *N*_{u}, *u* = 1, 2, ⋯, *N* *be a multivariate marked point process with the associated intensity* λ_{u}(*t*), *and the mark probability* *f*_{u}(*m*|*t*). *Let* ℋ_{T} = {(*t*_{i}, *u*_{i}, *m*_{i})} *be a realization of the process over* [0, *T*]. *Then the likelihood of* ℋ_{T} *on model* *N*_{u} *with parameters* *θ* *can be expressed as follows*:

*f*(ℋ_{T}; *θ*) = ∏_{i} λ_{ui}(*t*_{i})*f*_{ui}(*m*_{i}|*t*_{i}) × exp(−∑_{u=1}^{N} ∫_{0}^{T} λ_{u}(*s*)*ds*)
If we consider (*c*_{i}, *l*_{i}, *z*_{i}) as the mark *m*_{i} of the process, according to this proposition the complete likelihood of our model is,
(18) *f*(ℋ_{T}, *Z*; *θ*) = ∏_{i} λ_{ui}(*t*_{i})*f*_{ui}(*c*_{i}, *l*_{i}, *z*_{i}|*t*_{i}) × exp(−∑_{u=1}^{N} ∫_{0}^{T} λ_{u}(*s*)*ds*)

where, using Bayes’ rule and Eq (17), the mark probability can be evaluated as follows:

*f*_{ui}(*c*_{i}, *l*_{i}, *z*_{i}|*t*_{i}) = *f*_{ui}(*c*_{i}|*t*_{i}) *f*_{ui}(*l*_{i}, *z*_{i}|*t*_{i}, *c*_{i}) = (λ_{ui}(*t*_{i}, *c*_{i})/λ_{ui}(*t*_{i})) *f*_{ui}(*l*_{i}, *z*_{i}|*t*_{i}, *c*_{i})

To derive the second equality, we used the superposition theorem and the fact that, according to our generative model, the probability of a category is *f*_{ui}(*c*_{i}|*t*_{i}) = λ_{ui}(*t*_{i}, *c*_{i})/λ_{ui}(*t*_{i}). Given the joint distribution of the observed and latent variables, *f*(ℋ_{T}, *Z*; *θ*), we use the EM algorithm to maximize the likelihood function with respect to *θ*. In the E-step we evaluate the expectation of the complete log-likelihood under the posterior of *Z*. Using Bayes’ rule, we can write the posterior distribution of the latent variables as,
(19) *f*(*Z*|ℋ_{T}; *θ*^{old}) ∝ ∏_{i} *f*_{ui}(*l*_{i}, *z*_{i}|*t*_{i}, *c*_{i})
which factorizes over *i*, so the *z*_{i}’s are independent with multinomial distributions, and we can write the expectation of *z*_{iv} under this distribution as follows.
(20) *γ*_{iv} ≔ 𝔼[*z*_{iv}] = *w*_{u_{i}c_{i}l_{i}}^{v}(*t*_{i}) / ∑_{v′=0}^{N} *w*_{u_{i}c_{i}l_{i}}^{v′}(*t*_{i})
In the M-step we maximize the expected complete log-likelihood, which can be decomposed into the sum of the expected log-likelihoods of the users.
(21) 𝔼_{Z}[log *f*(ℋ_{T}, *Z*; *θ*)] = ∑_{u=1}^{N} 𝔼_{Z_{u}}[log *f*_{u}(ℋ_{T}, *Z*_{u}; *θ*_{u})]
where *Z*_{u} = {*z*_{i} ∈ *Z*|*u*_{i} = *u*} and *θ*_{u} = {*μ*_{uc}, *η*_{uc}, *α*_{uv}, *β*_{u}}, *v* ∈ 𝒩(*u*), *c* = 1⋯*C*. Accordingly, the M-step can be decomposed into multiple maximizations over users, which can be done in parallel. Therefore, for each user *u*, the two steps of the EM algorithm can be summarized as follows.
(22) E-step: *γ*_{iv} = 𝔼[*z*_{iv}|ℋ_{T}; *θ*_{u}^{old}], for all *i* with *u*_{i} = *u* (23) M-step: *θ*_{u}^{new} = argmax_{*θ*_{u}} 𝔼_{Z_{u}}[log *f*_{u}(ℋ_{T}, *Z*_{u}; *θ*_{u})]
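Since the posterior factorizes over events, the E-step reduces to a per-event normalization of the contributions in Eqs (15) and (16); a minimal sketch:

```python
def e_step_responsibilities(contrib):
    """Eq (20): expected value of z_iv for one check-in.
    contrib: {v: w_v}, contributions of each candidate influencer v
    (v = 0 stands for the exploration term)."""
    total = sum(contrib.values())
    return {v: w / total for v, w in contrib.items()}
```

These soft assignments act as weights in the per-user M-step maximizations, which run in parallel across users.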
In the following proposition, we prove that the maximization in the M-step is concave, so it has a unique optimal solution. Moreover, the performance of the overall inference algorithm is not affected by the network size, as long as the average degree of the network and the average number of events per user remain fixed, since these two quantities determine the number of parameters and the amount of observed data in each per-user EM problem, and consequently the performance of the overall inference algorithm.

**Proposition 2**. *The expected log-likelihood of a user*, 𝔼_{Z_{u}}[log *f*_{u}], *as a function of* {*μ*_{uc}, *β*_{u}, *η̃*_{uc}, *α̃*_{uv}} *is concave, where* *η*_{uc} = exp(*η̃*_{uc}) *and* *α*_{uv} = exp(*α̃*_{uv}).

*Proof.* According to Eq (21), the log-likelihood of user *u* consists of three kinds of terms. The first term is a linear function of {*μ*_{uc}, *β*_{u}}, so it is both convex and concave. The second term is the log of a linear function, which is concave according to the composition rules [49]. The third term contains contributions for *v* > 0 and for *v* = 0; in both cases the expression is concave according to Lemma 1 of [11], which states that the logarithm of a sum of linear exponentials is convex. So, the overall expression is concave. In the implementation, we actually use *η̃*_{uc} = log *η*_{uc} and *α̃*_{uv} = log *α*_{uv} instead of *η*_{uc}, *α*_{uv}, and solve the resulting concave optimization.

To find the time complexity of the inference algorithm, by carefully investigating all terms in the M-step, it can be verified that each gradient descent iteration in maximization (23) takes *O*(*k*_{u}*h*_{u}) operations, where *k*_{u} and *h*_{u} are the number of neighbors and the size of the history of user *u*, respectively. Therefore, the approximate order of the overall inference algorithm is *O*(*mkhN*), where *k*, *h* are the average network degree and the average number of events per user, and *m* is the number of EM iterations times the number of gradient descent iterations. In practice, since *m* (which depends on the desired tolerance of the EM algorithm) and *k* are constant (the average degree of most real-world networks is less than 10 [44]), the overall complexity can be simply expressed as *O*(*hN*), which is linear with respect to the number of users *N* and the average number of events per user *h*.

### Datasets

To evaluate the proposed method we use synthetic data and real data gathered from users’ check-ins in Foursquare. All datasets are available through our git repository, github.com/azarezade/STP.

## Results and discussion

In this section, using both synthetic and real data, we evaluate the performance of the proposed method. First, in the synthetic data experiments, we show that the proposed inference algorithm can learn the model parameters with high accuracy. Then, in the real data experiments, we show that the proposed method outperforms the other competing methods.

### Experiments on synthetic data

Following the literature, we use synthetic data generated from our model to evaluate the performance of the proposed learning algorithm. Moreover, we analyze the effect of the model parameters on users’ behavior.

We experiment with five random Kronecker networks [50] with *N* = 64 nodes, namely Core-periphery, Heterophily, Hierarchical, Homophily, and Erdos-Renyi, where the seed matrix parameters are [0.85, 0.45; 0.45, 0.3], [0.3, 0.89; 0.89, 0.3], [0.9, 0.1; 0.1, 0.9], [0.89, 0.3; 0.3, 0.89], and [0.60, 0.60; 0.60, 0.60], respectively. We set the number of categories to *C* = 4 and consider eight locations in each category. The temporal and spatial model parameters are randomly drawn from the uniform distributions *μ*_{uc}, *η*_{uc} ∼ *U*(0, 0.05), *α*_{uv} ∼ *U*(0, 0.5) and *β*_{u} ∼ *U*(0, 0.1). The period and standard deviation in the temporal model are set to *τ* = 12 and *σ* = 0.5, respectively. We generate 16000 check-ins from our model using the Ogata method [51], and use the first 80% of them for training and the remaining 20% for testing. We then learn the model with different percentages of the training data, and evaluate the average predicted log-likelihood on the test data (*AvgPredLogLik*) and the mean squared error between the estimated and true parameters (*MSE*). The inference algorithm is implemented in parallel for all users. All source code and datasets are available in our git repository.
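For reference, the stochastic Kronecker networks used here can be sampled as follows; the edge probability of a node pair is the product of seed entries indexed by the base-2 digits of the node ids. This direct *O*(*N*^{2}) sampler is a sketch, not the faster method of [50]:

```python
import random

def kronecker_edge_prob(seed, i, j, k):
    """Edge probability P[i][j] in the k-th Kronecker power of a 2x2 seed."""
    p = 1.0
    for _ in range(k):
        p *= seed[i % 2][j % 2]
        i, j = i // 2, j // 2
    return p

def sample_kronecker(seed, k, rng=None):
    """Sample a directed stochastic Kronecker graph with 2**k nodes."""
    rng = rng or random.Random(0)
    n = 2 ** k
    return [(i, j) for i in range(n) for j in range(n)
            if i != j and rng.random() < kronecker_edge_prob(seed, i, j, k)]

# Core-periphery seed from the experiments; k = 6 gives N = 64 nodes
edges = sample_kronecker([[0.85, 0.45], [0.45, 0.30]], 6)
```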

In Fig 2, the *AvgPredLogLik* and *MSE* of the temporal model are plotted versus the size of the training data, where the average estimation error decreases to about 7 × 10^{−4}. These measures are also plotted for the spatial model with different random network structures in Fig 3, given the time of check-ins. We can see that the parameter estimation error decreases and the average log-likelihood increases as the size of the training data grows, which shows that the proposed inference algorithm can consistently learn the model parameters with a very small estimation error. Furthermore, on the left of Fig 4, we show that for a fixed number of events per user, increasing the number of EM iterations decreases the *MSE* to about 0.1. To investigate the network structure prediction of our model, for each size of the training data, we apply a threshold to convert the predicted weighted network (*i.e.*, the *α*_{ij}’s) into a (0, 1)-adjacency matrix and evaluate the percentage of recovered edges to form the ROC curve. We then compute the AUC, which is illustrated in the middle of Fig 4. Our method finds 64% of the edges using only 150 events per user in the training data.

Average predicted log-likelihood on the test data (*left*), and MSE of the learned parameters (*right*), in the temporal model for the different percentages of the train data.

Average predicted log-likelihood on the test data (*left*), and MSE of the learned parameters (*right*), in the spatial model for the different percentages of the train data and various random graph structures.

Average predicted log-likelihood in logarithmic scale vs the iterations of EM (*left*), the network structure recovery for different percentages of the train data (*middle*), and the effect of spatial parameters on the users’ Sociality (*right*).

To study the effect of the model parameters on users’ behavior, we design two experiments. First, we define a measure called *Sociality*. For each user, the *Sociality* is the percentage of check-ins whose location has previously been visited by the user or her friends. According to our spatial model, Eq (13), the exploration of users increases as we increase *η* or decrease *α*. To empirically validate this property of our model, the box plot of the users’ *Sociality* is illustrated for different parameters on the right of Fig 4. The average sociality reaches up to 80% when the average ratio of the spatial parameters, *α*/*η*, is equal to 100. This means that users with high *α*/*η* are more affected by their friends. Moreover, to see the effect of the temporal model parameters on the check-in times of users, we plot the distribution of users’ interevent times (the time difference between two successive events in a specific category for each user). According to Eq (10), the parameters *β* and *μ* regulate the periodicity in the times of events: a higher *β* results in more periodic events. We fix *μ* and set *β* = 0 and 1 in the left and right graphs of Fig 5, respectively. As we can see, there is a peak around 12 in the right graph, which is the period of the simulated events, whereas in the left graph the frequency of events decays exponentially and there is no peak except the initial one.

The distribution of interevent times in the temporal model with *β* = 0 (*left*) and *β* = 1 (*right*). Increasing *β* causes a peak around 12, which is the period of the simulated events.
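As an illustration, the *Sociality* measure defined above can be computed with a short sketch. The data layout (check-in triples and a friendship map) and the function name `sociality` are our own assumptions for illustration, not the paper’s actual data format:

```python
from collections import defaultdict

def sociality(checkins, friends):
    """Fraction of each user's check-ins at a location previously
    visited by the user herself or by one of her friends.

    checkins: list of (user, time, location) tuples, sorted by time.
    friends:  dict mapping each user to a set of friend users.
    """
    visited = defaultdict(set)                    # user -> locations seen so far
    social, total = defaultdict(int), defaultdict(int)
    for user, _, loc in checkins:
        watchers = {user} | friends.get(user, set())
        if any(loc in visited[w] for w in watchers):
            social[user] += 1                     # location already known to the user's circle
        total[user] += 1
        visited[user].add(loc)                    # record after checking, so self-visits count only later
    return {u: social[u] / total[u] for u in total}
```

For example, with two mutual friends alternately visiting the same places, the second visitor’s check-ins all count as social while the first visitor’s do not.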

### Experiments on real data

In this section we use real data gathered from users’ check-ins in Foursquare, a popular LBSN, to evaluate the proposed method against alternative continuous-time check-in models.

We used both the Twitter and Foursquare APIs to crawl the check-in data of Foursquare users, because Foursquare does not provide the full check-in data. Specifically, we crawled the tweets of users who have installed the Swarm application and publicly tweet their check-ins. This app is connected to the Twitter and Foursquare accounts of the user. When a user checks in using this app, she can tweet the URL of that location on the Foursquare website. Therefore, we have access to the location details (via the Foursquare API) and the time of check-ins (via the Twitter API). Using the Twitter search API we found active users with high check-in rates in Foursquare. By querying the API with “I am at”, the default template of the Swarm app for check-ins, we selected the top 12000 users and crawled their tweets over ten weeks during 2015. We pruned the data by selecting 1000 active users located in the same country (Brazil). The average degree of the network is 6.4, the total number of check-ins is about 60000, and the number of unique locations is about 10000 in 10 categories. Our data collection method complies with the terms of service of both Twitter and Foursquare. Moreover, the dataset is anonymized and does not reveal the identity of actual users.

We use the first eight weeks of check-ins for training, and the remaining two weeks for testing. The hyper-parameters of the temporal model are set to *τ* = 24 and *σ* = 1 by cross validation. We learn the model parameters on the training data and use different temporal and spatial measures for evaluation. We compare our proposed model with MH [17], where the intensity of a user’s check-ins is modeled by a multivariate Hawkes process (the intensity depends on the user’s and her friends’ history); RNN [30], which uses a recurrent neural network to learn a nonlinear intensity function based on the user’s history of events; and the baseline HP, where the intensity is modeled by a Hawkes process that depends only on the user’s own history. The spatial model is also compared with two baselines, MP and PL. In the MP method, the most checked-in locations, disregarding the time of check-ins, are the most probable to be selected as the next check-in location. The PL model assumes periodicity in the location of check-ins: locations that were checked in more often in previous periods are more probable to be visited at the current time.
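For concreteness, the MP baseline amounts to ranking locations by their raw check-in counts, ignoring timing entirely. A minimal sketch (the `mp_predict` helper and its input layout are our own naming, not the original implementation):

```python
from collections import Counter

def mp_predict(history):
    """MP baseline: rank candidate locations by overall check-in
    count in the user's history, disregarding when they occurred.

    history: list of past location ids (one entry per check-in).
    Returns locations ordered from most to least frequently visited.
    """
    return [loc for loc, _ in Counter(history).most_common()]
```

The head of the returned list is the predicted next check-in location; the full ordering supplies the probability ranking used by the *Accuracy*@*k* and *NDCG*@*k* measures below.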

To reveal the motivation of the proposed method, we perform two empirical experiments on the real data. In summary, Fig 6 shows that: (i) most of the events are repeated after one or more days (there are peaks in the left graph at 1, 2, 3, …), which verifies the use of a periodic point process for modeling the time of users’ check-ins; (ii) about 80% of users are affected by their friends’ check-in locations (the blue box), which justifies the use of the proposed mutually-exciting spatial model; (iii) only 10% of users explore new locations (the red box); these users are modeled by the parameter *η* in Eq (13); (iv) as the size of the history time window increases, the growth of *Sociality* diminishes, which validates the use of the exponentially decaying kernel in Eq (11) to reduce the effect of the far past history.

The frequency of interevent times in the Food category of the Foursquare dataset (*left*), and the Sociality box plot of users for different history window sizes (*right*).
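The interevent-time statistics behind the left graph of Fig 6 can be computed as follows. The event-tuple layout and the function name are assumptions for illustration:

```python
from collections import defaultdict

def interevent_times(events):
    """Time gaps between successive check-ins of each (user, category)
    pair; a histogram of these gaps reveals the periodic peaks.

    events: list of (user, category, time) tuples, time in days.
    """
    by_key = defaultdict(list)
    for user, cat, t in events:
        by_key[(user, cat)].append(t)
    gaps = []
    for times in by_key.values():
        times.sort()                               # gaps are only meaningful in time order
        gaps.extend(t2 - t1 for t1, t2 in zip(times, times[1:]))
    return gaps
```

Plotting a histogram of the returned gaps would reproduce the daily peaks at 1, 2, 3, … noted above.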

To evaluate the prediction accuracy of the time of check-ins, we design two experiments. For each test event we estimate the time of the next event by the different methods. The percentage of check-ins whose predicted times are within a threshold of the real times is plotted in the left graph of Fig 7. Our method achieves up to 35% improvement for a one-hour threshold, compared to the other methods. In the right graph, we plot the number of users whose average distance between predicted and real event times is less than a threshold. The proposed method performs up to 20% better than the competing methods. We did not plot this graph for thresholds less than 6 hr, where all methods perform poorly. The poor performance of the RNN method is probably due to underfitting: its objective function is nonconvex (in contrast to the other methods, which are all convex), and the SGD method used for inference needs much more training check-in data, which is rare in most real-world applications.

The percentage of check-ins whose predicted times are within a threshold of the real times (*left*). The number of users whose average distance between predicted and real check-in times is less than a threshold (*right*).
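The left-graph measure reduces to counting predictions that fall within a tolerance of the true times. A sketch with hypothetical argument names:

```python
def within_threshold(pred_times, true_times, threshold):
    """Percentage of check-ins whose predicted time falls within
    `threshold` (same unit as the times, e.g. hours) of the true time."""
    hits = sum(abs(p - t) <= threshold
               for p, t in zip(pred_times, true_times))
    return 100.0 * hits / len(true_times)
```

Sweeping `threshold` over a grid and plotting the result for each method yields a curve like the left graph of Fig 7.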

Now, given the time of check-ins, we evaluate the prediction accuracy of the location of check-ins. For each test event, each method assigns a probability to each location, forming a probability vector, and selects the most probable location. *Accuracy*@*k* is the percentage of events for which the true location is among the *k* most probable locations, and *NDCG*@*k* averages 1/log₂(1 + *r*(*e*_{i})) over the events with *r*(*e*_{i}) ≤ *k*, where *r*(*e*_{i}) is the (one-based) rank of the real location of the *i*’th check-in in the location probability vector. These measures are plotted in Fig 8. For *k* = 1 our method improves the accuracy by about 43% over the other methods. It should be noted that there are about 10,000 locations, so a random guess has extremely low accuracy. For larger values of *k* the measure is less reliable, since all methods would converge to the same accuracy; our method reaches 24% accuracy, about an 8% improvement, at *k* = 40. But under *NDCG*, which does not have this undesirable effect (since low-rank, i.e. top-ranked, events weigh more), our method consistently outperforms the others, by about 30 to 50% for the different values of *k*.

The accuracy (*left*) and NDCG (*right*) of location prediction, given the times of check-ins, at different values of *k*.
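Both measures can be computed directly from the one-based ranks of the true locations in each event’s probability vector. The sketch below assumes the standard binary-relevance form of NDCG, with an ideal DCG of 1 per event, which matches the description above but is our reconstruction:

```python
import math

def accuracy_at_k(ranks, k):
    """Percentage of events whose true location is ranked within the top k.
    ranks: one-based rank of the true location for each test event."""
    return 100.0 * sum(r <= k for r in ranks) / len(ranks)

def ndcg_at_k(ranks, k):
    """Binary-relevance NDCG@k: an event contributes 1/log2(1 + r)
    when its true location ranks within the top k, zero otherwise."""
    return sum(1.0 / math.log2(1 + r) for r in ranks if r <= k) / len(ranks)
```

Note the discounting: a rank-1 hit contributes 1.0 while a rank-3 hit contributes only 0.5, which is why NDCG keeps separating the methods at large *k* even when Accuracy@*k* saturates.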

Finally, we performed a scalability analysis of the different methods, as depicted in Fig 9. In the right graph we compare the inference time for different sizes of the event history on the real dataset. Our method achieves the second best performance. For a fair comparison, the time complexity of all models, except RNN, is measured on a single-core machine, although our method and HP can be executed in parallel, in which case the CPU time would be divided by the number of cores. The RNN method is multiple orders of magnitude slower than the others, although we executed it on a 10-core machine, since SGD needs many more iterations to converge. Moreover, if we fit a line to these log-log curves, the slopes of the Our, HP, MH, RNN, and Spatial curves are 1.1, 1.3, 1.4, 0.01 and 1.2, respectively. This validates the linear time complexity of our model with respect to the size of the history *h*. In the left graph we compare the inference time on synthetic data with different network sizes. Again, our method is the best performer after HP. Here, the slopes are 0.96, 0.98, 0.91, 0.99 and 1.2 for the Our, HP, MH, RNN, and Spatial methods, respectively. These results validate the linear time complexity of our model with respect to the size of the network *N*.

The time complexity of the different temporal models and our spatial model (the other baseline spatial models have approximately the same time complexity, so only one is depicted), for different network sizes (*left*) and different sizes of the event history (*right*).
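Fitting the slope of a log-log runtime curve, as done above, is a least-squares fit of log *y* against log *x*; a slope near 1 indicates linear growth. A dependency-free sketch:

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) vs log(x).
    xs: problem sizes (e.g. history length or network size).
    ys: measured runtimes at those sizes."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    var = sum((a - mx) ** 2 for a in lx)
    return cov / var
```

For runtimes growing as *y* = *c*·*x*, the fitted slope is 1 regardless of the constant *c*, which is what the slopes near 1 in Fig 9 indicate.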

## Conclusion

To model the check-ins of users in location-based social networks, we proposed a doubly stochastic point process for the time of check-ins, which leverages the periodicity in users’ behavior, and a multinomial distribution for the location of check-ins, which leverages the mutually-exciting effect of friends on decision of users.

The synthetic experiments show that the proposed inference algorithm can learn the model parameters with high accuracy and that its performance increases consistently with the size of the training data. Moreover, we studied the effect of the model parameters on the users’ check-ins, from which one can interpret the users’ behavior in LBSNs from their inferred parameters. The experiments on the curated Foursquare check-ins dataset show that the proposed method outperforms the other competing methods in predicting the time and location of users’ check-ins. Specifically, we achieved up to 35% improvement in time prediction and 43% improvement in location prediction accuracy. Furthermore, the empirical studies show that the real data meets the assumptions of the proposed model; that is, users are periodic in the time and mutually-exciting in the location of their check-ins.

Our work also opens many interesting avenues for future work. For example, one can consider the home location of users when defining the probability of their check-in locations, by modifying the weight of locations in Eq (11). In addition, one can investigate a non-parametric spatial model instead of the multinomial distribution. Finally, the proposed model can be used to control the check-in behavior of users through incentivization, or for point-of-interest recommendations.

## References

- 1.
Zheng Y. Location-based social networks: Users. Computing with Spatial Trajectories, Zheng Yu and Zhou Xiaofang, Eds. Springer; 2011.
- 2. Zheng Y, Xie X, Ma WY. GeoLife: A Collaborative Social Networking Service among User, Location and Trajectory. IEEE Data(base) Engineering Bulletin. 2010;33(2):32–39.
- 3. Andris C. Integrating social network data into GISystems. International Journal of Geographical Information Science. 2016;30(10):2009–2031.
- 4. Li Q, Wei W, Xiong N, Feng D, Ye X, Jiang Y. Social media research, human behavior, and sustainable society. Sustainability. 2017;9(3):384.
- 5.
Zheng Y. Trajectory Data Mining: An Overview. ACM Transactions on Intelligent Systems and Technology. 2015;.
- 6. Bao J, Zheng Y, Wilkie D, Mokbel MF. Recommendations in Location-based Social Networks: A Survey. GeoInformatica. 2014;19:525–565.
- 7. Cho YS, Galstyan A, Brantingham PJ, Tita G. Latent self-exciting point process model for spatial-temporal networks. Discrete and Continuous Dynamical Systems—Series B. 2014;19(5):1335–1354.
- 8.
Cho E, Myers SA, Leskovec J. Friendship and mobility: user movement in location-based social networks. In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM; 2011. p. 1082–1090.
- 9.
Williams MJ. Periodic patterns in human mobility. Cardiff University; 2013.
- 10. Li Z, Ding B, Han J, Kays R, Nye P. Mining periodic behaviors for moving objects. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2010). ACM; 2010. p. 1099–1108.
- 11.
Zarezade A, Khodadadi A, Farajtabar M, Rabiee HR, Zha H. Correlated Cascades: Compete or Cooperate. In: Thirty-First AAAI Conference on Artificial Intelligence; 2016.
- 12.
Farajtabar M, Du N, Gomez-Rodriguez M, Valera I, Zha H, Song L. Shaping social activity by incentivizing users. In: Advances in neural information processing systems; 2014. p. 2474–2482.
- 13. Gomez-Rodriguez M, Song L, Daneshmand H, Schoelkopf B. Estimating diffusion networks: Recovery conditions, sample complexity & soft-thresholding algorithm. Journal of Machine Learning Research. 2015;.
- 14.
Gomez-Rodriguez M, Balduzzi D, Schölkopf B. Uncovering the Temporal Dynamics of Diffusion Networks. In: Proceedings of the 28th International Conference on Machine Learning (ICML’11); 2011. p. 561–568.
- 15.
Gomez-Rodriguez M, Leskovec J, Schölkopf B. Modeling Information Propagation with Survival Theory. In: Proceedings of The 30th International Conference on Machine Learning (ICML’13); 2013. p. 666–674.
- 16.
Iwata T, Shah A, Ghahramani Z. Discovering latent influence in online social activities via shared cascade poisson processes. In: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM; 2013. p. 266–274.
- 17.
Yang SH, Zha H. Mixture of mutually exciting processes for viral diffusion. In: Proceedings of the 30th International Conference on Machine Learning (ICML’13); 2013. p. 1–9.
- 18.
Yuan Q, Cong G, Ma Z, Sun A, Thalmann NM. Who, where, when and what: discover spatio-temporal topics for twitter users. In: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM; 2013. p. 605–613.
- 19.
Linderman SW, Adams RP. Discovering Latent Network Structure in Point Process Data. In: ICML; 2014. p. 1413–1421.
- 20. Gomez-Rodriguez M, Song L, Du N, Zha H, Schölkopf B. Influence Estimation and Maximization in Continuous-Time Diffusion Networks. ACM Transactions on Information Systems (TOIS). 2016;34(2):9.
- 21. Hawkes AG. Spectra of some self-exciting and mutually exciting point processes. Biometrika. 1971;58(1):83–90.
- 22.
Valera I, Gomez-Rodriguez M. Modeling adoption and usage of competing products. In: Data Mining (ICDM), 2015 IEEE International Conference on. IEEE; 2015. p. 409–418.
- 23.
He X, Rekatsinas T, Foulds J, Getoor L, Liu Y. Hawkestopic: A joint model for network inference and topic modeling from text-based cascades. In: Proc. 32nd Intl. Conf. on Machine Learning; 2015.
- 24.
Tran L, Farajtabar M, Song L, Zha H. NetCodec: Community Detection from Individual Activities. In: SIAM International Conference on Data Mining (SDM). SIAM; 2015.
- 25.
Gui H, Sun Y, Han J, Brova G. Modeling topic diffusion in multi-relational bibliographic information networks. In: Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management. ACM; 2014. p. 649–658.
- 26.
Hosseini SA, Khodadadi A, Arabzade S, Rabiee HR. HNP3: A Hierarchical Nonparametric Point Process for Modeling Content Diffusion over Social Media. In: Data Mining (ICDM), 2016 IEEE International Conference on. IEEE; 2016.
- 27.
Zarezade A, Upadhyay U, Rabiee HR, Gomez-Rodriguez M. Redqueen: An online algorithm for smart broadcasting in social networks. In: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. ACM; 2017. p. 51–60.
- 28.
Zarezade A, De A, Rabiee H, Rodriguez MG. Cheshire: An Online Algorithm for Activity Maximization in Social Networks. In: 55th Annual Allerton Conference on Communication, Control, and Computing; 2017.
- 29.
Du N, Dai H, Trivedi R, Upadhyay U, Gomez-Rodriguez M, Song L. Recurrent marked temporal point processes: Embedding event history to vector. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM; 2016. p. 1555–1564.
- 30.
Xiao S, Yan J, Yang X, Zha H, Chu SM. Modeling the Intensity Function of Point Process Via Recurrent Neural Networks. In: AAAI; 2017. p. 1597–1603.
- 31. Xiao X, Zheng Y, Luo Q, Xie X. Inferring social ties between users with human location history. Journal of Ambient Intelligence and Humanized Computing. 2014;5(1):3–19.
- 32.
Liu B, Xiong H. Point-of-Interest Recommendation in Location Based Social Networks with Topic and Location Awareness. In: Proceedings of the 2013 SIAM International Conference on Data Mining. SIAM; 2013. p. 396–404.
- 33.
Scellato S, Noulas A, Mascolo C. Exploiting place features in link prediction on location-based social networks. In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM; 2011. p. 1046–1054.
- 34.
Liu B, Fu Y, Yao Z, Xiong H. Learning geographical preferences for point-of-interest recommendation. In: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM; 2013. p. 1043–1051.
- 35.
Yuan Q, Cong G, Ma Z, Sun A, Thalmann NM. Time-aware point-of-interest recommendation. In: Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval. ACM; 2013. p. 363–372.
- 36. Li H, Ge Y, Hong R, Zhu H. Point-of-interest recommendations: Learning potential check-ins from friends. In: Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. ACM; 2016.
- 37.
Tang LA, Zheng Y, Yuan J, Han J, Leung A, Hung CC, et al. On discovery of traveling companions from streaming trajectories. In: 2012 IEEE 28th International Conference on Data Engineering. IEEE; 2012. p. 186–197.
- 38. Zheng K, Zheng Y, Yuan NJ, Shang S, Zhou X. Online discovery of gathering patterns over trajectories. IEEE Transactions on Knowledge and Data Engineering. 2014;26(8):1974–1988.
- 39.
Lichman M, Smyth P. Modeling human location data with mixtures of kernel densities. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM; 2014. p. 35–44.
- 40.
Lee JG, Han J, Li X. Trajectory outlier detection: A partition-and-detect framework. In: 2008 IEEE 24th International Conference on Data Engineering. IEEE; 2008. p. 140–149.
- 41. Liu S, Ni LM, Krishnan R. Fraud detection from taxis’ driving behaviors. IEEE Transactions on Vehicular Technology. 2014;63(1):464–472.
- 42. Zheng Y, Liu L, Wang L, Xie X. Learning Transportation Mode from Raw GPS Data for Geographic Applications on the Web. In: WWW; 2008.
- 43. Ogata Y. Statistical models for earthquake occurrences and residual analysis for point processes. Journal of the American Statistical association. 1988;83(401):9–27.
- 44.
Barabasi AL. Network Science. Cambridge university press; 2015.
- 45.
Daley DJ, Vere-Jones D. An Introduction to the Theory of Point Processes—Vol. I. Springer Ser. Statist., Springer, New York; 2002.
- 46.
Aalen O, Borgan O, Gjessing H. Survival and event history analysis: a process point of view. Springer Science & Business Media; 2008.
- 47.
Cox DR, Isham V. Point processes. vol. 12. CRC Press; 1980.
- 48.
Kingman JFC. Poisson processes. Oxford university press; 1992.
- 49.
Boyd S, Vandenberghe L. Convex optimization. Cambridge university press; 2004.
- 50. Leskovec J, Chakrabarti D, Kleinberg J, Faloutsos C, Ghahramani Z. Kronecker graphs: An approach to modeling networks. The Journal of Machine Learning Research. 2010;11:985–1042.
- 51. Ogata Y. On Lewis’ simulation method for point processes. Information Theory, IEEE Transactions on. 1981;27(1):23–31.