Recurrent spatio-temporal modeling of check-ins in location-based social networks

Social networks are getting closer to our real physical world. People share the exact location and time of their check-ins and are influenced by their friends. Modeling the spatio-temporal behavior of users in social networks is of great importance for predicting the future behavior of users, controlling users' movements, and finding the latent influence network. Users are observed to have periodic patterns in their movements. They are also influenced by the locations that their close friends recently visited. Leveraging these two observations, we propose a probabilistic model based on a doubly stochastic point process with a periodic-decaying kernel for the time of check-ins and a time-varying multinomial distribution for the location of check-ins of users in location-based social networks. We learn the model parameters using an efficient EM algorithm, which is distributed over users and has linear time complexity. Experiments on synthetic data and on real data gathered from Foursquare show that the proposed inference algorithm learns the parameters efficiently and that our method models the real data better than other alternatives.


Introduction
The advances in location-acquisition techniques and the proliferation of mobile devices have generated an enormous amount of spatial and temporal data on users' activities [1]. People can upload a geotagged video, photo, or text to social networks like Facebook and Twitter, share their present location on Foursquare, or share their travel routes as GPS trajectories on Geo-Life [2]. A considerable amount of spatio-temporal data is generated by the activity of users in location-based social networks (LBSNs). In a typical LBSN, like Foursquare, users share the time and geolocation of their check-ins, comment about a venue, or unlock badges by exploring new venues. These data have motivated researchers to study human spatio-temporal behavior in social networks [3,4].
Many techniques have been proposed for processing, managing, and mining trajectory data in the past decade [5]. Several other studies try to leverage spatial data in recommender systems [6]. However, only a few works have attempted to model the recurrent spatio-temporal behavior of users in LBSNs [7,8]. Given the history of users' check-ins, the goal is to predict the time and location of their future check-ins using a model. Such a model can also be used to find the influence network between users from their check-ins, detect influential users and popular locations, predict the peak hours of a restaurant, recommend a location, and even control the movement of users.
In this paper, we propose a probabilistic generative model for the check-ins of users in location-based social networks, which can be used to predict the future check-ins of users and to discover the latent influence network. People usually have periodic patterns in their movements [8–10]. For example, a typical user may check into her office in the morning and into a nearby restaurant at noon, then return home, and repeat this behavior on the following days. We model the time of check-ins of each user with a novel periodic-decaying doubly stochastic point process, which leverages the periodicity in the movements of users and can also capture any drift in their patterns. To model the location of check-ins, we use the fact that users in social media are influenced by the activities of their friends [11–13]. If many of your close friends have checked into a specific restaurant recently, then there is a high probability that you will select that restaurant next time. We model the location of check-ins using a time-varying multinomial distribution. In summary, we propose:
• A doubly stochastic point process for modeling the time of users' check-ins, which captures the periodic behavior in the movement of users.
• A time-varying multinomial distribution for modeling the location of users' check-ins, which incorporates the mutually-exciting effect of the friends' history.
• A scalable inference algorithm based on EM to find the model parameters, which is distributed over users and has linear time complexity.
• A compelling dataset of Foursquare users' check-ins, curated from 12000 active users during three months in the year 2015.
Our work relates to previous work on temporal point processes and on location-based social network analysis.
Modeling information diffusion in social networks has attracted a lot of attention in recent years. Given the times at which users have adopted a contagion (information, behavior, or meme), the problem is to model the time and user of the next adoption, i.e., to predict the next event. Early methods [14,15] studied information diffusion using a pairwise probability distribution for each link from node $j$ to node $i$, which gives the probability that node $i$ generates an event at time $t_i$ due to the event of node $j$ at time $t_j$. These methods overlook the external effects on the generation of events. In addition, they assume that each node adopts a contagion at most once, i.e., events are not recurrent. These issues were later addressed in [7,16–20], which use point processes to model events. In [15–17,19], cascades are assumed to be independent and are modeled by a special point process called the Hawkes process [21]. The independence assumption is removed in [11,22], which model the correlation between multiple competing or cooperating cascades. In [7], a spatio-temporal model is proposed for the interactions between a pair of users, rather than for an individual user as in our model. Other studies [17,23–26] use additional information about the diffusion network, such as the topic of tweets or the community structure, to better model the influence network. Moreover, in [27,28], a stochastic optimal control framework is proposed to control the diffusion process in complex networks. Recently, recurrent neural networks (RNNs) have been used to learn the intensity function of a temporal point process as a nonlinear function of the history, and the resulting nonlinear optimization is solved by a stochastic gradient algorithm [29,30]. Most previous works studied the temporal diffusion of information on microblogging networks like Twitter, whereas we model the time and location of users' check-ins in location-based networks like Foursquare.
Moreover, we propose a periodic point process, which is of independent interest, whereas previous studies use self-exciting point processes to model events.
Prior work on location-based social networks can be categorized into three groups [6]: location recommendation, trajectory mining, and location prediction. The main approaches in location recommendation systems are: content-based, which uses data from a user's profile and the features of locations [31,32]; link-based, which applies link-analysis models like PageRank to identify experienced users and interesting locations [33,34]; and collaborative filtering, which infers users' preferences from their historical behavior, such as the location history [35,36]. In trajectory data mining, the data are usually generated by GPS. These works include trajectory pattern mining to find the next location of an individual [8,37–39], anomaly detection to detect unexpected movement patterns [40,41], and trajectory classification to differentiate between trajectories of different states, such as motions, transportation modes, and human activities [42]. A comprehensive review of these methods can be found in the recent survey [5]. We distinguish our work from location recommendation and trajectory mining methods, because our goal is to model the check-ins of users, not to recommend a location or to find the trajectory patterns of users from the position data of their routes. In location prediction, the goal is to predict the next location, given the user's profile data and history of check-ins. However, these methods do not consider the relation between friends (captured by the influence matrix), the aging effect in the history of check-ins (captured by the decaying kernel), exogenous effects on users' decisions, or periodicity in users' movement patterns.

Preliminaries
To model the times of occurrence of a phenomenon, which are called events, we can use point processes on the real line. The phenomenon can be, for example, an earthquake [43], a viral disease [44], or the spread of information over a network [15]. The sequence of events, as defined below, is a realization of a point process.
Definition 1 (Point Process). Let $\{t_i\}_{i\in\mathbb{N}}$ be a sequence of non-negative random variables such that $t_i < t_{i+1}$ for all $i \in \mathbb{N}$. We call $\{t_i\}_{i\in\mathbb{N}}$ a point process on $\mathbb{R}$, and $\mathcal{F}_t = \{t_i \mid i \in \mathbb{N},\ t_i < t\}$ its history or filtration.
There are several equivalent descriptions of a point process, such as the sequence of points $\{t_i\}$, the sequence of intervals (duration process) $\delta t_i$, the counting process $N(t)$, or the intensity process $\lambda(t)$ [45]. In the following, we briefly explain each description.
The counting process $N(t)$ associated with the point process $\{t_i\}_{i\in\mathbb{N}}$ counts the number of events that occurred before time $t$, i.e., $N(t) = \sum_{i\in\mathbb{N}} \mathbb{I}(t_i < t)$, where the indicator function $\mathbb{I}(x \in A)$ is 1 if $x \in A$ and 0 otherwise. The duration process $\delta t_i$ associated with the point process is defined as $\delta t_i = t_i - t_{i-1}$ for all $i \in \mathbb{N}$. Finally, the intensity process $\lambda(t)$ is the expected number of events per unit of time, which generally depends on the history:

$$\lambda^*(t)\,dt = \mathbb{E}\left[dN(t) \mid \mathcal{F}_t\right].$$

To evaluate the likelihood of a sequence of events, $f(t_1, t_2, \ldots, t_n)$, we can use the chain rule of probability, $f(t_1, t_2, \ldots, t_n) = \prod_i f(t_i \mid t_{1:i-1})$. Therefore, it suffices to describe only the conditionals, which are abbreviated to $f^*(t)$. According to the definition of point processes, we can write the probability that the $(n+1)$'th event occurs at time $t$ as:

$$f^*(t)\,dt = P\big(t_{n+1} \in [t, t+dt) \mid t_1, \ldots, t_n\big).$$

If we divide both sides of the above equation by $1 - F^*(t) = P(t_{n+1} \ge t \mid t_1, \ldots, t_n)$, where $F^*(\cdot)$ is the cdf of $f^*(\cdot)$, then in the limit as $dt \to 0$ we have:

$$\frac{f^*(t)\,dt}{1 - F^*(t)} = P\big(t_{n+1} \in [t, t+dt) \mid t_1, \ldots, t_n,\ t_{n+1} \ge t\big) = \mathbb{E}\left[dN(t) \mid \mathcal{F}_t\right].$$

Therefore, according to the definition of intensity, the relation between the conditional distribution of the time of events and the intensity function is:

$$\lambda^*(t) = \frac{f^*(t)}{1 - F^*(t)},$$

where we use the $*$ superscript to show that a function depends on the history. We can also express the relation between $\lambda^*(t)$ and $f^*(t)$ in the reverse direction [46]:

$$f^*(t) = \lambda^*(t) \exp\left(-\int_{t_n}^{t} \lambda^*(s)\,ds\right).$$

Now, the cdf can be easily evaluated:

$$F^*(t) = 1 - \exp\left(-\int_{t_n}^{t} \lambda^*(s)\,ds\right).$$

A point process is usually defined by specifying its conditional density $f^*(t)$ or, equivalently, its intensity $\lambda^*(t)$. In the simplest case, the Poisson process, the intensity does not depend on the history, i.e., $\lambda^*(t) = \lambda(t)$; when $\lambda$ is also constant, the intervals $\delta t_i$ are i.i.d. exponential and the process is memoryless. The Cox process [47] is a doubly stochastic point process which, conditioned on its intensity, is a Poisson process [48]. The Hawkes process [21] is a special type of Cox process, where the intensity is expressed through the history as:

$$\lambda^*(t) = \mu + \int_{-\infty}^{t} \phi(t-s)\,dN(s) = \mu + \sum_{t_i < t} \phi(t - t_i),$$

where $\phi(t)$ is the kernel of the Hawkes process, which defines the effect of past events on the current intensity, and $\mu$ is the base intensity.
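As a numerical sanity check of these identities, the following Python sketch assumes a univariate Hawkes process with exponential kernel $\phi(t) = \alpha e^{-t}$, made-up values $\mu = 0.5$ and $\alpha = 0.8$, and a three-event history (all illustrative, not taken from the experiments). It uses the closed-form integral of $\lambda^*$ after the last event $t_n$ and checks that $\lambda^* = f^*/(1 - F^*)$ and that $f^*$ integrates to one.

```python
import math

# Illustrative univariate Hawkes process with kernel phi(t) = ALPHA * exp(-t).
MU, ALPHA = 0.5, 0.8
history = [0.0, 0.7, 1.1]   # made-up event times t_1 < t_2 < t_3
t_n = history[-1]           # time of the last observed event

def intensity(t):
    """lambda*(t) = MU + sum over t_i < t of ALPHA * exp(-(t - t_i))."""
    return MU + sum(ALPHA * math.exp(-(t - ti)) for ti in history if ti < t)

def compensator(t):
    """Closed-form integral of lambda*(s) over [t_n, t], valid for t >= t_n."""
    return MU * (t - t_n) + sum(
        ALPHA * (math.exp(-(t_n - ti)) - math.exp(-(t - ti))) for ti in history)

def cdf(t):
    """F*(t) = 1 - exp(-integral of lambda* from t_n to t)."""
    return 1.0 - math.exp(-compensator(t))

def density(t):
    """f*(t) = lambda*(t) * exp(-integral of lambda* from t_n to t)."""
    return intensity(t) * math.exp(-compensator(t))

def midpoint_integral(func, a, b, n=20000):
    """Midpoint rule; never evaluates func exactly at a or b."""
    step = (b - a) / n
    return step * sum(func(a + (k + 0.5) * step) for k in range(n))

print(intensity(2.0), density(2.0) / (1.0 - cdf(2.0)))   # equal up to rounding
print(midpoint_integral(density, t_n, t_n + 50.0))        # close to 1
```

Because $\mu > 0$, the integrated intensity grows without bound, so the density of the next event time is proper; with $\mu = 0$ the same check would show missing probability mass.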
For example, the exponential kernel $\phi(t) = \exp(-t)$ is used to model self-exciting events like earthquakes [43]. In general, we have a multivariate process with a counting process vector $\mathbf{N}(t) = [N_1(t), \ldots, N_n(t)]^T$ and an associated intensity vector $\boldsymbol{\lambda}^*(t) = [\lambda^*_1(t), \ldots, \lambda^*_n(t)]^T$ defined as:

$$\boldsymbol{\lambda}^*(t) = \boldsymbol{\mu} + \int_{-\infty}^{t} \Phi(t-s)\,d\mathbf{N}(s), \qquad (5)$$

where $\Phi(t)$ is the matrix of mutual kernels, i.e., $\Phi_{ij}(t)$ models the effect of the events of counting process $N_j(t)$ on $N_i(t)$, $\boldsymbol{\mu} = [\mu_1, \ldots, \mu_n]^T$ is the base intensity, and $A = [\alpha_{ij}]$ is the matrix of mutual-excitation weights of these kernels. Often, a point process carries information other than the time of events, which is called a mark. For example, the magnitude of an earthquake can be considered a mark. The mark $m$, which often takes values in a subset of $\mathbb{N}$ or $\mathbb{R}$, is associated with each event through the conditional mark probability function $f^*(m \mid t)$:

$$\lambda^*(t, m) = \lambda^*(t)\, f^*(m \mid t).$$

The mutually-exciting property of the Hawkes process makes it a common modeling tool in a variety of applications such as seismology, neurophysiology, epidemiology, reliability, and social network analysis [14,15,22].
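A minimal sketch of Eq (5) and of the resulting point-process log-likelihood $\sum_j \log \lambda^*_{d_j}(t_j) - \sum_i \int_0^T \lambda^*_i(s)\,ds$ follows. For illustration only, it assumes exponential mutual kernels $\Phi_{ij}(t) = \alpha_{ij} e^{-\omega t}$ with a single decay rate $\omega$; the function names are hypothetical.

```python
import math

def intensity_vector(t, events, mu, A, omega):
    """Eq (5) with the assumed kernel Phi_ij(t) = A[i][j] * exp(-omega * t).

    events: list of (t_j, d_j) pairs, where d_j is the index of the process that fired.
    Returns [lambda*_1(t), ..., lambda*_n(t)].
    """
    lam = list(mu)
    for tj, dj in events:
        if tj < t:
            decay = math.exp(-omega * (t - tj))
            for i in range(len(mu)):
                lam[i] += A[i][dj] * decay
    return lam

def compensator(i, events, mu, A, omega, T):
    """Closed form of the integral of lambda*_i over [0, T] (events inside [0, T])."""
    return mu[i] * T + sum(
        A[i][dj] * (1.0 - math.exp(-omega * (T - tj))) / omega for tj, dj in events)

def log_likelihood(events, mu, A, omega, T):
    """sum_j log lambda*_{d_j}(t_j) minus the integrated intensity of every process."""
    ll = sum(math.log(intensity_vector(tj, events, mu, A, omega)[dj]) for tj, dj in events)
    return ll - sum(compensator(i, events, mu, A, omega, T) for i in range(len(mu)))
```

Setting $A = 0$ recovers independent Poisson processes, whose log-likelihood is $\sum_j \log \mu_{d_j} - T \sum_i \mu_i$; this is a convenient check on any implementation.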

Problem definition
Given the history of users' activity in a location-based social network $G = (V, E)$, with $|V| = N$ users and $L$ locations in $C$ different categories, we propose a generative model for the check-ins of users. In other words, for each user we can predict the location and time of her next check-in.
We define a check-in as a 4-tuple $(t, u, c, l)$, which denotes that user $u$ checked in at time $t$ to location $l$ with category $c$. We observe the sequence of all check-ins in the network $G$, in the […] where $\phi_i$ is the unique id of the $i$'th location. Since we use location ids instead of geo-coordinates, it is fair to assume that the observed data are noiseless; in practice, however, there may be uncertainty in the locations of check-ins, which we leave as future work. We use the following notation for the history of check-ins of user $u$ in location $l$ with category $c$ up to time $t$:

$$\mathcal{D}_{ucl}(t) = \{(t_i, u_i, c_i, l_i) \in \mathcal{D} \mid u_i = u,\ c_i = c,\ l_i = l,\ t_i < t\}.$$

Moreover, we use the dot notation to represent the union over the dotted variable, e.g., $\mathcal{D}_{u\cdot\cdot}(t)$ represents the events of user $u$ before time $t$ in any location with any category, and $\mathcal{D}_{\bar{u}c\cdot}(t)$ represents the events of all users except $u$ before time $t$ in any location with category $c$.
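The history notation maps directly onto filtering a list of check-ins. In this sketch, the `CheckIn` tuple and the `history` function are hypothetical names; an index left as `None` plays the role of the dot (union over that variable), and `exclude_user` gives the bar notation.

```python
from typing import List, NamedTuple, Optional

class CheckIn(NamedTuple):
    t: float  # check-in time
    u: int    # user id
    c: int    # category id
    l: int    # location id

def history(D: List[CheckIn], t: float, u: Optional[int] = None, c: Optional[int] = None,
            l: Optional[int] = None, exclude_user: Optional[int] = None) -> List[CheckIn]:
    """Return D_{ucl}(t): check-ins strictly before t that match the given indices.

    None means "any value" (the dot notation); exclude_user=u drops user u's own
    check-ins (the bar notation), e.g. D_{bar-u c .}(t) = history(D, t, c=c, exclude_user=u).
    """
    return [e for e in D
            if e.t < t
            and (u is None or e.u == u)
            and (c is None or e.c == c)
            and (l is None or e.l == l)
            and (exclude_user is None or e.u != exclude_user)]
```

For example, `history(D, t, u=u)` is $\mathcal{D}_{u\cdot\cdot}(t)$, and `history(D, t, u=u, c=c, l=l)` is $\mathcal{D}_{ucl}(t)$.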
Motivated by the periodic patterns observed in the time of users' check-ins (see the Results section), we model the time of check-ins using a doubly stochastic point process that incorporates both the periodic patterns and the exogenous effects in users' movements. The exogenous effects are any other external effects on the time of users' check-ins, which are not necessarily periodic. To model the location of check-ins, we propose a time-dependent multinomial distribution that incorporates the mutually-exciting effect of friends, an effect that we also observe empirically in the real data.
Proposed method

Modeling the time of check-ins. On every working day, a user may check in to her office in the morning and then go to a restaurant at noon; she may also have a weekly soccer practice. If the history of a user's check-in times shows that she has recently (within several days) repeated some pattern, for example taking a walk every afternoon, then she is likely to repeat this pattern in the upcoming days at approximately the same time. That is, users' behaviors are periodic. Moreover, there may also be a drift or the addition of a new activity in the user's behavior; for example, the working hours of her office may change, or there may be a new weekly social gathering. Therefore, we need a periodic point process to model the time of a user's check-ins that can also adapt to new patterns in her check-ins. This is in contrast to the self-exciting nature of the Hawkes process, which is used to model the diffusion of information over a network [14,15,17].
We propose a doubly stochastic point process that is periodic and also has a diminishing property, which enables the process to change its periodic pattern and adapt to new behaviors. The proposed process is composed of a Poisson process with base intensity $\mu$, where each event $t_i$ of this process triggers a Poisson process with the following intensity: where $h(t)$ is the kernel of the process, $g(k)$ is a decreasing function that diminishes the intensity in future periods, and the hyper-parameter $\tau$ is the period. This intensity is illustrated in Fig 1. The self-exciting property of the Hawkes process can be seen from its exponentially decaying kernel in Fig 1: when an event occurs, there is a high probability of events just after it, and this probability decreases exponentially afterward. In the proposed process, by contrast, there is a high probability of events in the upcoming periods, and this probability also decreases exponentially.
According to the superposition theorem [48], the intensity of the proposed process can be written as follows:

To preserve locality in time, the kernel $h(t)$ should have a peak at $t = 0$ and decay to zero on both sides as $t \to \pm\infty$. For example, the Gaussian kernel $h(t) = \exp(-t^2/2\sigma^2)$ meets this requirement. This model has three main features:
1. Periodic Nature. When an event occurs at time $s$, the intensity of events around this time in the upcoming periods, $s + k\tau$, increases.
2. Temporal Locality. The intensity is high around the peak of the kernel and drops rapidly on both sides.
3. Adaptability. The peak of the kernel decreases as $k$ increases, so the process can adapt its intensity to new periodic patterns.

Exogenous Effect.
Other external effects can be modeled by the base intensity μ.
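These features can be seen in a direct, unoptimized evaluation of the superposed intensity. Since the exact forms are not restated here, this sketch assumes a Gaussian $h$, a decreasing weight $g(k) = e^{-k}$, and excitation only in the upcoming periods $k \ge 1$, with the sum truncated at `K` periods; all of these are illustrative choices, and the base intensity `mu` stands for the exogenous effect.

```python
import math

def g(k):
    """Assumed decreasing period weight g(k) = exp(-k) (illustrative choice)."""
    return math.exp(-k)

def h(x, sigma):
    """Gaussian kernel h(x) = exp(-x^2 / (2 sigma^2)), peaked at x = 0."""
    return math.exp(-x * x / (2.0 * sigma * sigma))

def periodic_intensity(t, history, mu, tau, sigma, K=30):
    """Assumed superposed intensity: mu + sum_{t_i < t} sum_{k=1..K} g(k) h(t - t_i - k tau)."""
    total = mu                     # exogenous (base) part
    for ti in history:
        if ti < t:
            # each past event raises the intensity around t_i + k*tau, less so for larger k
            total += sum(g(k) * h(t - ti - k * tau, sigma) for k in range(1, K + 1))
    return total
```

With a single event at $t = 0$, $\tau = 24$, and $\sigma = 1$, the intensity is raised by about $e^{-1}$ at $t = 24$ and by about $e^{-2}$ at $t = 48$, while at $t = 12$ it stays at the base level $\mu$. This shows periodicity, locality, and adaptability in one picture.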
If we use a truncated Gaussian kernel such as $h(t) = \exp(-t^2/2\sigma^2)\,\mathbb{I}(-\tau/2 \le t \le \tau/2)$, then we can substantially reduce the complexity of evaluating the intensity function. With this kernel we can show that: where $k_i = \lfloor (t - t_i)/\tau \rfloor$ is the index of the period through which the event at $t_i$ affects the current intensity. So, we propose the following point process for the time of check-ins of user $u$ in any location with category $c$: The first term, $\mu_{uc}$, is the base intensity that models the external effect on user $u$ to generate check-ins with category $c$; the second term is the periodic effect of the history; $\beta_u$ is the kernel parameter; and $\tau$, $\sigma$ are hyper-parameters. All parameters of the model are listed in Table 1.
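The saving from the truncated kernel can be checked by comparing a brute-force sum over all periods with a one-term-per-event evaluation. This sketch assumes the per-user form $\lambda_u(t,c) = \mu_{uc} + \beta_u \sum_{t_i} g(k_i)\, h(t - t_i - k_i\tau)$ with $g(k) = e^{-k}$ and $k_i \ge 1$, which is our reading rather than a quotation of Eq (10). Because $h$ is centred at zero and truncated to $[-\tau/2, \tau/2]$, the sketch takes $k_i$ as the nearest integer to $(t - t_i)/\tau$, the only period whose window contains $t - t_i - k_i\tau$.

```python
import math

def g(k):
    return math.exp(-k)  # assumed decreasing period weight

def h_trunc(x, sigma, tau):
    """Truncated Gaussian: exp(-x^2 / (2 sigma^2)) on [-tau/2, tau/2], zero elsewhere."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) if -tau / 2 <= x <= tau / 2 else 0.0

def intensity_full(t, hist, mu, beta, tau, sigma, K=50):
    """Reference evaluation: every period k = 1..K of every past event."""
    return mu + beta * sum(g(k) * h_trunc(t - ti - k * tau, sigma, tau)
                           for ti in hist if ti < t
                           for k in range(1, K + 1))

def intensity_fast(t, hist, mu, beta, tau, sigma):
    """One kernel evaluation per past event, at the nearest period k_i >= 1."""
    total = mu
    for ti in hist:
        if ti < t:
            k = math.floor((t - ti) / tau + 0.5)   # nearest integer to (t - t_i) / tau
            if k >= 1:
                total += beta * g(k) * h_trunc(t - ti - k * tau, sigma, tau)
    return total
```

Each past check-in now costs one kernel evaluation instead of one per period, which is what keeps the per-user cost proportional to the size of the history.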
The intuition behind this model is that if a user frequently checks in, for example, in the "restaurant" category at noon, then with high probability she will check in at a restaurant at noon on the next day.

Modeling the location of check-ins. In this section, we propose a model for the location of users' check-ins, given the history of check-ins. We use the fact that users in social networks are influenced by the behavior of their neighbors. Let us denote the weight of location $l$ with category $c$ for user $u$ as: which incorporates $\alpha_{u_i u}$, the influence of user $u_i$ on $u$, and the times of check-ins through an exponentially decaying kernel. This kernel diminishes the effect of check-ins in the far past, so the model can adapt to new behaviors in users' check-ins. Therefore, a location that was recently checked into by many friends, or even by a few influential ones, has a high weight. We also define a weight for the popularity of a location $l$ with category $c$ from the perspective of all users: where the location that has been checked into most often recently has the highest weight. When a user decides to check in, for example, at a restaurant, she either selects a location that she or her friends have checked into frequently and recently (exploitation effect), or sometimes checks into a new popular restaurant (exploration effect). Therefore, we use the following multinomial conditional distribution to define the probability that user $u$ checks into location $\ell$, given the time $t$ and category $c$: The Kronecker delta $\delta_{\phi_l}(\ell)$ is 1 if $\phi_l = \ell$ and 0 otherwise, and the parameter $\eta_{uc}$ models the inclination of the user to explore new locations.
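The two weights can be sketched as exponentially decayed counts. The exact expressions are not reproduced in this section, so the code below assumes $w_{ucl}(t) = \sum_{v} \alpha_{vu} \sum_{t_j} e^{-\omega (t - t_j)}$ over the check-ins of $u$ and her friends $v$ at $(c, l)$ before $t$, and $m_{cl}(t) = \sum_{t_j} e^{-\omega (t - t_j)}$ over all users' check-ins at $(c, l)$ before $t$; the decay rate $\omega$ and the function names are hypothetical.

```python
import math
from collections import defaultdict

def location_weights(D, t, u, c, alpha, omega):
    """Assumed form of w_{ucl}(t): influence-weighted, exponentially decayed counts.

    D: iterable of (t_j, v, c_j, l_j) check-ins; alpha[(v, u)] is the influence of v on u,
    with entries only for u itself and u's friends; omega is an assumed decay rate.
    """
    w = defaultdict(float)
    for tj, v, cj, lj in D:
        if tj < t and cj == c and (v, u) in alpha:
            w[lj] += alpha[(v, u)] * math.exp(-omega * (t - tj))
    return dict(w)

def popularity_weights(D, t, c, omega):
    """Assumed form of m_{cl}(t): decayed check-in counts over all users."""
    m = defaultdict(float)
    for tj, v, cj, lj in D:
        if tj < t and cj == c:
            m[lj] += math.exp(-omega * (t - tj))
    return dict(m)
```

Locations never visited by the user or her friends get no entry in `w`, which matches the remark that $w_{ucl}$ is zero for non-visited locations.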
This distribution means that with probability $w_{ucl}/(\eta_{uc} + w_{uc\cdot})$ the current location is a location $\phi_l$ previously checked into by user $u$ or any of her friends (since for non-visited locations the weight $w_{ucl}$ is zero), and with probability $\eta_{uc}/(\eta_{uc} + w_{uc\cdot})$ it is selected from all locations in the network according to the following distribution: where, by the definition of the coefficient $m_{cl}$, more probability is assigned to popular or recently frequently visited locations. The main features of the proposed location model are:

Summary of the generative model. The proposed generative model is summarized in Alg. 1. Using the superposition theorem, first the time $t$ of a check-in is sampled from the proposed periodic point process $\lambda(t) = \sum_{u,c} \lambda_u(t, c)$; then the user $u$ who generated this event is selected in proportion to her intensity $\lambda_u(t)$. The category $c$ of the check-in is then selected in proportion to $\lambda_u(t, c)$. Finally, the location $l$ is sampled from the proposed location model.
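Assuming the weights $w_{ucl}$ and the popularity weights $m_{cl}$ are already computed, the conditional location distribution is a two-part mixture: exploit a previously visited location with weight $w_{ucl}$, or explore with weight $\eta_{uc}$ according to the popularity shares $m_{cl}/m_{c\cdot}$. The sketch below (hypothetical names, assuming $\eta_{uc} > 0$) also shows the superposition step of the generative model: drawing the pair $(u, c)$ with probability $\lambda_u(t,c)/\lambda(t)$ is equivalent to drawing $u$ in proportion to $\lambda_u(t)$ and then $c$ in proportion to $\lambda_u(t,c)$.

```python
import random

def location_distribution(w, m, eta, locations):
    """Assumed mixture form: P(l) = (w_l + eta * m_l / sum(m)) / (eta + sum(w)).

    w: friend/self weights w_{ucl}; m: popularity weights m_{cl}. Every key of w and m
    is assumed to appear in `locations`; with no popularity data, exploration is uniform.
    """
    w_total = sum(w.values())
    m_total = sum(m.values())
    probs = {}
    for l in locations:
        explore = m.get(l, 0.0) / m_total if m_total > 0 else 1.0 / len(locations)
        probs[l] = (w.get(l, 0.0) + eta * explore) / (eta + w_total)
    return probs

def sample_user_category(intensities, rng):
    """Pick (u, c) with probability lambda_u(t, c) / lambda(t).

    Equivalent to choosing u in proportion to lambda_u(t) = sum_c lambda_u(t, c)
    and then c in proportion to lambda_u(t, c).
    """
    pairs = list(intensities)
    return rng.choices(pairs, weights=[intensities[p] for p in pairs], k=1)[0]

def sample_location(probs, rng):
    """Draw one location from the mixture distribution."""
    locs = list(probs)
    return rng.choices(locs, weights=[probs[l] for l in locs], k=1)[0]
```

The probabilities sum to one because the exploit weights add up to $w_{uc\cdot}$ and the exploration shares add up to one, giving $(w_{uc\cdot} + \eta_{uc})/(\eta_{uc} + w_{uc\cdot})$.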
Inference. We propose a Bayesian inference algorithm based on the EM algorithm to find the model parameters. To find the maximum likelihood solution, for each check-in $(t_i, u_i, c_i, l_i)$ we define a latent variable $z_i$ as the user who caused $u_i$ to check into location $l_i$, given the time $t_i$ and category $c_i$. We use 1-of-N coding to represent the $z_i$'s. For notational convenience, let us define: where $\gamma^v_{uc\ell}$ is the contribution, or influence, of user $v$ in the check-in of user $u$ at location $\ell$ with category $c$. Now, we define: where $z_{iv}$ is the $v$'th element of $z_i$, i.e., it indicates whether user $v$ caused the $i$'th check-in. Note that $v = 0$ is not the index of a user; it represents the exploration effect. It can be verified that marginalizing out $z_i$, $\sum_{z_i} f_{u_i}(l_i, z_i \mid t_i, c_i)$, results in the probability distribution (13). Now, to evaluate the complete likelihood $p(\mathcal{D}, Z \mid \theta)$ of the data $\mathcal{D}$ and hidden variables $Z = \{z_i\}$, we use the likelihood of a marked point process. If we consider $(c_i, l_i, z_i)$ as the mark $m_i$ of the process, then according to this result the complete likelihood of our model is: where, using Bayes' rule and Eq (17), it can be evaluated as follows.
To derive the second line, we used the superposition theorem and the fact that, according to our generative model, the probability of a category is $f_{u_i}(c_i \mid t_i) = \lambda_{u_i}(t_i, c_i)/\lambda_{u_i}(t_i)$. Given the joint distribution of the observed and latent variables $p(\mathcal{D}, Z \mid \theta)$, we use the EM algorithm to maximize the likelihood function $p(\mathcal{D} \mid \theta)$ with respect to $\theta$. In the E-step we evaluate $p(Z \mid \mathcal{D}, \theta)$. Using Bayes' rule, we can write the posterior distribution of the latent variables as: which factorizes over $i$, so the $z_i$'s are independent with multinomial distributions, and we can write the expectation of $z_{iv}$ under this distribution as follows.
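With one latent indicator per check-in, the E-step reduces to normalizing the contributions $\gamma^v_{uc\ell}$. This sketch assumes $\gamma^0 = \eta_{uc}\, m_{c\ell}/m_{c\cdot}$ for exploration and $\gamma^v = \alpha_{vu} \sum_{t_j} e^{-\omega(t - t_j)}$ over user $v$'s earlier check-ins at $(c, \ell)$, consistent with the location model described above but not quoted from Eq (17); the expectation $\mathbb{E}[z_{iv}] = \gamma^v / \sum_{v'} \gamma^{v'}$ then follows from Bayes' rule. User ids start at 1 so that key 0 can denote exploration, and all names are hypothetical.

```python
import math

def contributions(D, t, u, c, l, alpha, eta, m_share, omega):
    """Assumed gamma^v_{ucl} for one check-in of user u at (t, c, l).

    D: (t_j, v, c_j, l_j) check-ins with user ids v >= 1; alpha[(v, u)]: influence of v on u;
    m_share: popularity share m_{cl} / m_{c.}; key 0 is the exploration component.
    """
    gamma = {0: eta * m_share}
    for tj, v, cj, lj in D:
        if tj < t and cj == c and lj == l and (v, u) in alpha:
            gamma[v] = gamma.get(v, 0.0) + alpha[(v, u)] * math.exp(-omega * (t - tj))
    return gamma

def responsibilities(gamma):
    """E-step: E[z_iv] = gamma_v / sum over v' of gamma_v'."""
    total = sum(gamma.values())
    return {v: g / total for v, g in gamma.items()}
```

The normalizer in `responsibilities` is exactly the numerator of the location probability, so the same quantities serve both the likelihood and the E-step.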
In the M-step we maximize $\mathbb{E}_Z[\ln p(\mathcal{D}, Z \mid \theta)]$, the expected complete log-likelihood, which can be decomposed into the sum of the expected log-likelihoods of users, $\mathbb{E}_{Z_u}[\ln p(\mathcal{D}_u, Z_u \mid \theta_u)]$. Accordingly, the M-step can be decomposed into multiple maximizations over users, which can be done in parallel. Therefore, for each user $u$, the two steps of the EM algorithm can be summarized as follows.
In the following proposition, we prove that the objective of the M-step is concave, so any local maximum is a global maximum. Moreover, the cost of the overall inference algorithm per user is not affected by the network size, as long as the average degree of the network and the average number of events per user remain fixed, since these quantities determine the number of parameters and observed data in each user's EM problem and, consequently, the cost of the overall inference algorithm. Proposition 2. The expected log-likelihood of a user, $\mathbb{E}_{Z_u}[\ln p(\mathcal{D}_u, Z_u \mid \theta_u)]$, as a function of $\{\mu_{uc}, \tilde{\eta}_{uc}, \tilde{\alpha}_{uv}, \beta_u\}$ is concave, where $\alpha_{uv} = \exp(\tilde{\alpha}_{uv})$ and $\eta_{uc} = \exp(\tilde{\eta}_{uc})$.
Proof. According to Eq (21), the log-likelihood of user $u$ is: The first term is a linear function of $\{\mu_{uc}, \beta_u\}$, so it is both convex and concave. The second term is the log of a linear function, which is concave according to the composition rules [49]. The third term is composed of $\log \gamma^v_{uc_il_i}$, which for $v > 0$ […]. In both cases, $\log \gamma^v_{uc_il_i}$ is concave according to Lemma 1 of [11], which states that the logarithm of a sum of exponentials of linear functions is convex (so its negative is concave). So, the overall expression is concave. In our implementation, we actually use $\tilde{\eta}_{uc}, \tilde{\alpha}_{uv}$ instead of $\eta_{uc}, \alpha_{uv}$ and solve the resulting concave optimization.
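The concavity argument is easiest to see in the temporal part of the M-step for one pair $(u, c)$. If the periodic term at each check-in is precomputed as a feature $\varphi_i$ and its integral over $[0, T]$ as $\Psi$ (both hypothetical names), the objective is $\sum_i \log(\mu + \beta\varphi_i) - \mu T - \beta\Psi$: a sum of logs of linear functions minus a linear function. As a swapped-in solver, not necessarily the one used in our implementation (which relies on gradient iterations), this sketch applies the classic EM/MM update for intensities that are linear in nonnegative parameters, which increases the objective monotonically.

```python
import math

def temporal_loglik(mu, beta, phi, Psi, T):
    """sum_i log(mu + beta * phi_i) - mu * T - beta * Psi; concave in (mu, beta)."""
    return sum(math.log(mu + beta * p) for p in phi) - mu * T - beta * Psi

def temporal_m_step(phi, Psi, T, mu=1.0, beta=1.0, iters=500):
    """EM/MM updates for an intensity that is linear in nonnegative (mu, beta).

    phi[i]: precomputed periodic feature at the i-th check-in time (assumed given).
    Psi: integral of the same feature over [0, T]. Each update splits every
    check-in between the base and periodic parts and re-estimates both rates.
    """
    for _ in range(iters):
        lam = [mu + beta * p for p in phi]
        base_share = sum(mu / l for l in lam)                          # expected base-rate events
        periodic_share = sum(beta * p / l for p, l in zip(phi, lam))   # expected periodic events
        mu, beta = base_share / T, periodic_share / Psi
    return mu, beta
```

For example, with the made-up features `[0.0, 0.1, 0.9, 0.8, 0.05, 0.95]`, `Psi=2.0`, and `T=10.0`, the iteration settles near $(\mu, \beta) \approx (0.305, 1.475)$, where both stationarity conditions $\sum_i 1/\lambda_i = T$ and $\sum_i \varphi_i/\lambda_i = \Psi$ hold.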
To find the time complexity of the inference algorithm, by carefully investigating all terms in the M-step, it can be verified that each gradient iteration in maximization (23) has $O(k_u h_u)$ operations, where $k_u$ and $h_u$ are the number of neighbors and the size of the history of user $u$, respectively. Therefore, the approximate order of the overall inference algorithm is $O(mkhN)$, where $k$ and $h$ are the average network degree and the average number of events per user, and $m$ is the number of EM iterations times the number of gradient iterations. In practice, since $m$, which depends on the desired tolerance in the EM algorithm, and $k$ are constant (the average degree of most real-world networks is less than 10 [44]), the overall complexity can be simply expressed as $O(hN)$, which is linear in the number of users $N$ and in the average number of events per user $h$.
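To make the linear scaling concrete, here is a rough per-iteration count that plugs in the statistics of the Foursquare dataset described in the real-data experiments (1000 users, about 60000 check-ins, average degree 6.4). It ignores constant factors and the iteration count $m$, so it is only an order-of-magnitude illustration.

```latex
h \approx \frac{60000}{1000} = 60, \qquad
k\,h \approx 6.4 \times 60 = 384 \ \text{(per user)}, \qquad
k\,h\,N \approx 384 \times 1000 \approx 3.8 \times 10^{5} \ \text{(all users)}.
```

Doubling $N$ with $k$ and $h$ fixed doubles this count, which is the linearity stated above; the per-user term $k_u h_u$ is also what each parallel worker handles.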

Datasets
To evaluate the proposed method, we use synthetic data and real data gathered from users' check-ins in Foursquare. All datasets are available through our git repository, github.com/azarezade/STP. Our data collection method complies with the terms of service of both Twitter and Foursquare. Moreover, the dataset is anonymized and does not reveal the identity of actual users.

Results and discussion
In this section, using both synthetic and real data, we evaluate the performance of the proposed method. First, in the synthetic data experiments, we show that the proposed inference algorithm can learn the model parameters with high accuracy. Then, in the real data experiments, we show that the proposed method outperforms the competing methods.

Experiments on synthetic data
Following the literature, we use synthetic data generated from our model to evaluate the performance of the proposed learning algorithm. Moreover, we analyze the effect of the model parameters on users' behavior.
We experiment with five random Kronecker networks [50] with N = 64 nodes, namely Core-periphery, Heterophily, Hierarchical, Homophily, and Erdos-Renyi. The temporal and spatial model parameters are randomly drawn from the uniform distributions $\mu_{uc}, \eta_{uc} \sim U(0, 0.05)$, $\alpha_{uv} \sim U(0, 0.5)$, and $\beta_u \sim U(0, 0.1)$. The period and standard deviation in the temporal model are set to $\tau = 12$ and $\sigma = 0.5$, respectively. We generate 16000 check-ins from our model using the Ogata method [51], and use the first 80% of them as training data and the remaining 20% as test data. Then we learn the model with different percentages of the training data and evaluate the average predicted log-likelihood on the test data (AvgPredLogLik) and the mean squared error between the estimated and true parameters (MSE). The inference algorithm is implemented in parallel for all users. All source code and datasets are available in our git repository.
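Ogata's thinning method can be sketched as follows: propose candidate times from a homogeneous Poisson process whose rate $M$ dominates the intensity, and accept each candidate with probability $\lambda^*(s)/M$. The example intensity is an assumption, namely the truncated-kernel form with $g(k) = e^{-k}$ and nearest-period index, for which $\mu + \beta e^{-1}$ times the number of past events is a valid bound, since each event contributes at most $\beta e^{-1}$. The constants reuse $\tau = 12$ and $\sigma = 0.5$ from the setup above, while $\mu$, $\beta$, and all function names are made up for illustration.

```python
import math
import random

def ogata_thinning(intensity, upper_bound, T, rng):
    """Simulate event times on [0, T) by Ogata's thinning.

    intensity(t, history) returns lambda*(t); upper_bound(history) must dominate
    lambda*(s) for every s up to the next accepted event, given that history.
    """
    history, s = [], 0.0
    while True:
        M = upper_bound(history)
        s += rng.expovariate(M)                # candidate from a rate-M Poisson process
        if s >= T:
            return history
        if rng.random() * M <= intensity(s, history):
            history.append(s)                  # accept with probability lambda*(s) / M

# Illustrative periodic intensity (assumed form) with the synthetic-data period and width.
MU, BETA, TAU, SIGMA = 0.05, 0.8, 12.0, 0.5

def periodic_intensity(t, history):
    total = MU
    for ti in history:
        k = math.floor((t - ti) / TAU + 0.5)   # nearest period index
        x = t - ti - k * TAU                   # |x| <= TAU / 2 by construction
        if k >= 1:
            total += BETA * math.exp(-k) * math.exp(-x * x / (2 * SIGMA ** 2))
    return total

def bound(history):
    # Each past event adds at most BETA * exp(-1), so this dominates periodic_intensity.
    return MU + BETA * math.exp(-1) * len(history)
```

For instance, `ogata_thinning(periodic_intensity, bound, 500.0, random.Random(1))` returns a sorted list of simulated times in $[0, 500)$. The bound grows with the history, so a tighter, time-local bound would reduce rejections in long simulations.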
In Fig 2, the AvgPredLogLik and MSE of the temporal model are plotted versus the size of the training data; the average estimation error decreases to about $7 \times 10^{-4}$. These measures are also plotted for the spatial model with different random network structures in Fig 3, given the time of check-ins. We can see that the parameter estimation error decreases and the average log-likelihood increases as we increase the size of the training data, which shows that the proposed inference algorithm can consistently learn the model parameters with a very small estimation error. To investigate the network structure prediction of our model, for each size of the training data, we use a threshold to convert the predicted weighted network (i.e., the $\alpha_{ij}$'s) to a (0, 1)-adjacency matrix and evaluate the percentage of recovered edges to form the ROC curve. Then we compute the AUC, which is plotted against the training size in the middle of Fig 4. Our method finds 64% of edges using only 150 events per user in the training data.
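The AUC of edge recovery does not require building the ROC curve explicitly: it equals the probability that a randomly chosen true edge receives a higher estimated influence than a randomly chosen non-edge, with ties counted as one half. The sketch below (hypothetical function name; self-loops skipped by assumption) computes it by direct pairwise comparison, which is quadratic in the number of pairs but adequate for N = 64.

```python
def edge_auc(alpha_hat, adjacency):
    """AUC of recovering true edges by thresholding estimated influences alpha_hat[i][j].

    Equals P(score of a random true edge > score of a random non-edge), ties counting 1/2,
    which is the area under the ROC curve traced by sweeping the threshold.
    """
    pos, neg = [], []
    n = len(adjacency)
    for i in range(n):
        for j in range(n):
            if i != j:                                   # self-loops skipped (assumption)
                (pos if adjacency[i][j] else neg).append(alpha_hat[i][j])
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to uninformative scores, and 1.0 to a threshold that separates all true edges from all non-edges.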
To study the effect of the model parameters on users' behavior, we design two experiments. First, we define a measure called Sociality: for each user, Sociality is the percentage of check-ins whose location has previously been visited by the user or her friends. According to our spatial model, Eq (13), the exploration of users increases as we increase $\eta$ or decrease $\alpha$. To empirically validate this property of our model, the right panel of Fig 4 shows the box plot of users' Sociality for different parameters. The average Sociality reaches 80% when the ratio of the average spatial parameters, $\bar{\alpha}/\bar{\eta}$, is equal to 100. This means that users with high $\alpha/\eta$ are more affected by their friends. Second, to see the effect of the temporal model parameters on the check-in times of users, we plot the distribution of users' inter-event times (the time difference between two successive events in a specific category for each user). According to Eq (10), the parameters $\beta$ and $\mu$ regulate the periodicity in the time of events: a higher $\beta$ results in more periodic events. We fix $\mu$ and set $\beta = 0$ and $\beta = 1$ in the left and right graphs of Fig 5, respectively. As we see, there is a peak around 12 in the right graph, which is the period of the simulated events, whereas in the left graph the frequency of events decreases exponentially and there is no peak except the initial one.
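A direct implementation of the Sociality measure is sketched below, assuming check-ins are given as `(t, user, location)` tuples sorted by time and that "previously" means strictly earlier in that order; the friendship sets and names are illustrative.

```python
def sociality(checkins, friends):
    """Per-user percentage of check-ins at a location already visited by the user or a friend.

    checkins: list of (t, user, location) sorted by time; friends[u]: set of u's friends.
    """
    visited = {}                 # user -> set of locations visited so far
    hits, totals = {}, {}
    for t, u, l in checkins:
        circle = {u} | friends.get(u, set())
        seen = any(l in visited.get(v, ()) for v in circle)
        hits[u] = hits.get(u, 0) + seen          # bool counts as 0 or 1
        totals[u] = totals.get(u, 0) + 1
        visited.setdefault(u, set()).add(l)      # record after checking, so "previously" holds
    return {u: 100.0 * hits[u] / totals[u] for u in totals}
```

Restricting the look-back to a sliding time window, instead of the full history, gives the window-dependent variant of Sociality used in the real-data analysis.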

Experiments on real data
In this section, we use the real data gathered from users' check-ins in Foursquare, a popular LBSN, to evaluate the proposed method against alternative continuous-time check-in models.
We used both the Twitter and Foursquare APIs to crawl the check-in data of Foursquare users, because Foursquare does not provide the full check-in data. Specifically, we crawled the tweets of users who have installed the Swarm application and publicly tweet their check-ins. This app is connected to both the Twitter and Foursquare accounts of the user. When a user checks in using this app, she can tweet the URL of that location on the Foursquare website. Therefore, we have access to the location details (via the Foursquare API) and the time of check-ins (via the Twitter API). Using the Twitter search API, we found active users with a high check-in rate in Foursquare. By querying the API with "I am at", the default template of the Swarm app for check-ins, we selected the top 12000 users and crawled their tweets over ten weeks during the year 2015. We pruned the data by selecting 1000 active users who were in the same country (Brazil). The average degree of the network is 6.4. The total number of check-ins is about 60000. The number of unique locations is about 10000, in 10 categories.
We use the first eight weeks of check-ins for training and the remaining two weeks for testing. The hyper-parameters of the temporal model are set to $\tau = 24$ and $\sigma = 1$ by cross-validation. We learn the model parameters from the training data and use different temporal and spatial measures for evaluation. We compare our proposed model with MH [17], where the intensity of a user's check-ins is modeled by a multivariate Hawkes process (the intensity depends on the history of the user and her friends); RNN [30], which uses a recurrent neural network to learn a nonlinear intensity function based on the user's history of events; and a baseline HP, where the intensity is modeled by a Hawkes process that depends on the user's own history. The spatial model is also compared with two baselines, MP and PL. In the MP method, the most checked-in locations, disregarding the time of check-ins, are more likely to be selected as the next check-in location. The PL model assumes periodicity in the location of check-ins: the locations that were checked into more often in previous periods are more likely to be visited at the current time.
To reveal the motivation for the proposed method, we perform two empirical experiments on the real data. In summary, Fig 6 shows that: (i) most events are repeated after one or more days (since there are peaks in the left graph at 1, 2, 3, ...), which supports the use of a periodic point process for modeling the time of users' check-ins; (ii) about 80% of users are affected by their friends' check-in locations (the blue box), which justifies the proposed mutually-exciting spatial model; (iii) only 10% of users explore new locations (the red box), and these users are modeled by the parameter $\eta$ in Eq (13); (iv) as we increase the size of the history time window, Sociality increases less and less, which validates the use of the exponentially decaying kernel in Eq (11) to reduce the effect of the far past history.
To evaluate the prediction accuracy of the time of check-ins, we design two experiments. For each test event, we estimate the time of the next event with each method. The left graph of Fig 7 plots the percentage of check-ins whose estimated time is closer than a threshold to the real time. Our method achieved up to 35% improvement over the other methods for a one-hour threshold. The right graph plots the number of users for whom the average distance between the estimated and real event times is less than a threshold. Here, the proposed method performed up to 20% better than the competing methods. We did not plot this graph for thresholds below 6 hours, where all methods perform poorly. The poor performance of the RNN method is probably due to underfitting: its objective function is nonconvex (in contrast to the other methods, which are all convex), and the SGD method used for its inference needs much more training check-in data, which is scarce in most real-world applications.

Now, given the time of check-ins, we evaluate the prediction accuracy of the location of check-ins. For each test event, each method assigns a probability to each location, forming a probability vector, and selects the most probable location. Accuracy@k is the percentage of events whose true location is among the k most probable locations, and NDCG@k is the normalized discounted cumulative gain computed from r(e_i), the (one-based) rank of the real location of the i'th check-in in the location probability vector. These measures are plotted in Fig 8. For k = 1, the accuracy increases from ≈7% for the other methods to ≈11% for our method, about a 43% improvement. Note that there are about 10,000 locations, so a random guess has extremely low accuracy. For larger values of k, this measure is less informative, since the accuracies of all methods converge.
Our method reaches 24% accuracy at k = 40, an improvement of about 8%. NDCG, in contrast, does not suffer from this effect (since locations ranked near the top carry more weight), and there our method consistently outperforms the others, with about 30% to 50% improvement across the different values of k.
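The time and location measures above can be computed as follows. This is a minimal sketch: the function names are hypothetical, thresholds are in hours, and since the exact NDCG@k formula is not reproduced here, the sketch assumes the standard single-relevant-item form, in which the real location contributes a gain of 1/log2(1 + r(e_i)) when r(e_i) ≤ k and 0 otherwise, averaged over test check-ins.

```python
import math
from collections import defaultdict


def time_hit_rate(pred_times, true_times, threshold):
    """Percentage of test events whose predicted time is closer than
    `threshold` hours to the real time (left graph of Fig 7)."""
    hits = sum(abs(p - t) < threshold for p, t in zip(pred_times, true_times))
    return 100.0 * hits / len(true_times)


def users_within(user_errors, threshold):
    """Number of users whose average absolute time error is below `threshold`
    hours (right graph of Fig 7). `user_errors` holds (user, error) pairs."""
    per_user = defaultdict(list)
    for user, err in user_errors:
        per_user[user].append(abs(err))
    return sum(sum(errs) / len(errs) < threshold for errs in per_user.values())


def rank_of(true_loc, probs):
    """One-based rank r(e_i) of the real location in the probability vector.
    Locations the method never scored are ranked after all scored ones."""
    if true_loc not in probs:
        return len(probs) + 1
    order = sorted(probs, key=probs.get, reverse=True)
    return order.index(true_loc) + 1


def accuracy_at_k(ranks, k):
    """Percentage of events whose real location is among the top k."""
    return 100.0 * sum(r <= k for r in ranks) / len(ranks)


def ndcg_at_k(ranks, k):
    """Assumed single-relevant-item NDCG@k, averaged over events."""
    return sum(1.0 / math.log2(1 + r) for r in ranks if r <= k) / len(ranks)


probs = {"cafe": 0.5, "gym": 0.3, "office": 0.2}
ranks = [rank_of("cafe", probs), rank_of("office", probs)]  # [1, 3]
print(accuracy_at_k(ranks, 1))  # 50.0
print(ndcg_at_k(ranks, 3))      # 0.75 = (1/log2(2) + 1/log2(4)) / 2
```

Because the gain decays only logarithmically with rank, NDCG still separates methods at large k, where Accuracy@k saturates.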
Finally, we performed a scalability analysis of the different methods, as depicted in Fig 9. In the right graph, we compare the inference time for different sizes of event history in the real dataset. Our method achieved the second-best performance. For a fair comparison, the running times of all models except RNN are measured on a single-core machine, although our method and HP can be executed in parallel, in which case the CPU time would be divided by the total number of cores. The RNN method is multiple orders of magnitude slower than the others, even though we executed it on a 10-core machine, since SGD methods need many more iterations to converge. Moreover, fitting a line to these log-log curves gives slopes of 1.1, 1.3, 1.4, 0.01, and 1.2 for the Our, HP, MH, RNN, and Spatial curves, respectively. This validates the linear time complexity of our model with respect to the size of history h. In the left graph, we compare the inference time on synthetic data with different network sizes. Again, our method is the best performer after HP. Here, the slopes are 0.96, 0.98, 0.91, 0.99, and 1.2 for the Our, HP, MH, RNN, and Spatial methods, respectively. These results validate the linear time complexity of our model with respect to the network size N.
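The slopes above come from fitting a straight line to log running time against log input size; a slope near 1 indicates linear scaling and a slope near 2 quadratic scaling. A minimal least-squares sketch follows; the function name is hypothetical, and the timings in the example are synthetic, not the measurements of Fig 9.

```python
import math


def loglog_slope(sizes, times):
    """Least-squares slope of log(time) versus log(size).

    The base of the logarithm cancels out, so natural logs are used."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in times]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Synthetic timings: linear and quadratic growth in the input size.
sizes = [1000, 2000, 4000, 8000]
linear = [0.002 * s for s in sizes]
quadratic = [1e-6 * s * s for s in sizes]
print(round(loglog_slope(sizes, linear), 3))     # 1.0
print(round(loglog_slope(sizes, quadratic), 3))  # 2.0
```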

Conclusion
To model the check-ins of users in location-based social networks, we proposed a doubly stochastic point process for the time of check-ins, which leverages the periodicity in users' behavior, and a multinomial distribution for the location of check-ins, which leverages the mutually-exciting effect of friends on users' decisions.
The synthetic experiments show that the proposed inference algorithm learns the model parameters with high accuracy and that its performance improves consistently with the size of the training data. Moreover, we studied the effect of the model parameters on users' check-ins, which allows one to interpret users' behavior in LBSNs from their inferred parameters. The experiments on the curated Foursquare check-in dataset show that the proposed method outperforms the competing methods in both time and location prediction of users' check-ins. Specifically, we achieved improvements of up to 35% in time prediction accuracy and 43% in location prediction accuracy. Furthermore, the empirical studies show that the real data meets the assumptions of the proposed model, that is, users are periodic in the time and mutually exciting in the location of their check-ins.
Our work also opens many interesting avenues for future work. For example, we can consider the home location of users in defining the probability of the location of their check-ins, by modifying the weight of locations in Eq (11). In addition, we can investigate a non-parametric spatial model instead of the multinomial distribution. Finally, we can use the proposed model to control the check-in behavior of users through incentivization, or use it for point-of-interest recommendation.