Abstract
Space-time self-exciting point process models are used to capture the clustering features in crime datasets. Such models perform particularly well on social network, crime and security, financial, and seismic datasets. However, there has been limited analysis of large crime datasets using space-time self-exciting point process models, owing to a lack of flexibility in the estimation of the conditional intensity function and the computational challenges associated with large datasets. To explore the applicability of these models to crime data, we propose a multi-dimensional Gaussian-type exponent approximation method. This method addresses the computational difficulties associated with large datasets and enables flexible estimation of the conditional intensity function. We evaluate the proposed method through simulations and apply it to study the space-time patterns of burglaries in Chicago, Illinois, United States. The results demonstrate that the proposed method is flexible, overcomes the computational difficulties, and reveals a strong clustering phenomenon in the burglary data.
Citation: Nketiah EA, Li C, Yang W, Jing Y, Guo P (2026) Estimation of space-time self-exciting point process models using multi-dimensional Gaussian-type exponent approximation. PLoS One 21(4): e0345865. https://doi.org/10.1371/journal.pone.0345865
Editor: Shun Cao, University of Houston, UNITED STATES OF AMERICA
Received: October 16, 2025; Accepted: March 11, 2026; Published: April 2, 2026
Copyright: © 2026 Nketiah et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The dataset supporting this study is publicly available on Figshare at: https://figshare.com/articles/dataset/multi-dimension_exponent_approximation/31302694.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
1 Introduction
Space-time point process models have gained significant attention as an extension of temporal point process models by incorporating spatial components. The space-time self-exciting point process model predicts the occurrence of events as a function of spatial components, time, and previous historical events; when an event happens, the likelihood of subsequent events increases. The traditional space-time Poisson process models are limited for analyzing this type of event because they assume conditional independence and do not account for history dependence, even when applied to inhomogeneous settings; hence, the space-time self-exciting point process models are introduced to handle such situations. According to [1], it is intuitively natural to specify a space-time self-exciting point process model by its conditional intensity function at a certain location and time, given the past events of the process up to that time.
In temporal self-exciting point process models, the occurrence of an event triggers other events to happen [2]; similarly, in space-time self-exciting point process models, the occurrence of events across spatial and temporal dimensions triggers other similar events to occur around or at the same location at a certain point in time. These models have been observed to capture clustering and triggering behaviors naturally, and have been applied in areas where spatiotemporal events occur, such as criminology [3,4], epidemiology [1], earthquakes [5], etc. The conditional intensity function of the self-exciting point process model is partitioned into two components: a background rate or intensity and the triggering effect. For instance, in seismology, the occurrence of a major earthquake triggers other minor earthquakes, called aftershocks, to occur. The main earthquake occurs independently as a result of the background intensity, while the aftershocks, triggered by the main earthquake, represent the triggering effect.
Because spatiotemporal events cluster, it is well known that the occurrence of an earthquake increases the likelihood of subsequent earthquakes in nearby regions over a short period of time [5]. To model this behavior, the parametric Epidemic-Type Aftershock Sequence (ETAS) model was proposed for earthquake datasets [6]. In criminology, the spread of crime produces clustering patterns in space and time, as studied by [3]. Criminological research has shown that crime is spatially concentrated or location-specific, and that its continual occurrence follows a “contagion-like” process [7]. For instance, burglars may target a particular locality frequently due to its vulnerability [8]. The application of self-exciting point process models in criminology, specifically using the Hawkes process, was first studied by [3], with subsequent research expanding on the use of self-exciting process models [9,10].
[11] also investigated the spread of crime in a spatiotemporal setting and assessed the predictive performance of the space-time self-exciting point process model for assault and burglary crimes. The use of basis functions to investigate the triggering effect and background intensity has also been studied [12,13]: [12] focuses primarily on estimating the triggering function through an iterative algorithm, while [13] emphasizes combining maximum likelihood estimation (MLE) with a group-lasso regularizer. [14] proposed a nonparametric estimation procedure for marked Hawkes processes and compared the results with other nonparametric methods. That study also establishes the relationship between the jump correlation matrix and the Hawkes kernel matrix using Wiener-Hopf systems of equations. [15] proposed an approach that combines a parametric triggering effect with a nonparametric background intensity based on weighted kernels. [16] employed the “model-independent stochastic declustering algorithm (MISD)” to estimate both the background function and the triggering effect. Recent research by [17] expanded this approach to improve the estimation of the background intensity and enhance the capability of the triggering effect. In addition, [18] proposed a nonparametric approach for learning the triggering effect of the intensity function using an online learning algorithm. This method uses a reproducing kernel Hilbert space for the triggering functions and, with proper regularization of the objective function, reduces the estimation of the triggering function to the estimation of an increasing set of coefficients via the representer theorem. However, space-time self-exciting point process models are associated with high computational costs of estimation; therefore, fast algorithms are needed to enhance efficiency.
Criminology datasets provide rich insights into the timing and location of crime events, revealing complex dependencies between incidents. These dependencies often stem from individuals who report crimes, including direct participants, witnesses, or those indirectly informed about the events. Notably, a single criminal act can trigger a cascade of reports to law enforcement, creating a pronounced clustering of reports within a narrow time frame. By studying these datasets, researchers can uncover patterns in the dissemination of information related to criminal activities and identify hazard-prone areas. [19] introduced an Expectation-Maximization (EM)-type algorithm for maximizing the log-likelihood of a nonparametric model; however, it required substantial computation time to fit large datasets. In contrast, a parametric model with an exponential triggering function requires less computation time than the alternatives when fitted to a large data set. Existing works often restrict the parametric form of the triggering function to exponential functions [20–22]. Under exponential triggering functions the Markov property holds, so the intensity function can be evaluated by considering only ‘recent’ occurrences. Although this parametric approach is computationally far cheaper than nonparametric estimation, its applicability is limited.
In this paper, we focus on designing a fast algorithm for estimating the intensity of self-exciting point process models for crime events, using a data set from the city of Chicago, Illinois, in the United States, with an emphasis on a specific type of crime. To explore the use of space-time self-exciting models for crime datasets, we consider a multi-dimensional Gaussian-type exponent approximation method based on maximum likelihood (ML). The proposed nonparametric form allows for flexible estimation of the conditional intensity process in space-time self-exciting models, so that the dynamics of criminal events can be better captured and characterized. The main problem is that general approaches can be computationally expensive [18]. To solve this computational problem, a truncated conditional intensity process is proposed, which eases the computational difficulty caused by the data size. By implementing the proposed method, we aim to examine the following questions: Is the proposed nonparametric approximation method both flexible and effective in handling space-time self-exciting point process models? Do space-time self-exciting point process models based on the proposed method account significantly for the clustering characteristics observed in crime datasets?
To address the first question, we consider simulation datasets generated from the crime model proposed by [3] to examine the efficiency of the developed nonparametric approach. The simulation-based algorithm introduced by [23] is used to generate the simulation datasets. For the second question, we use residual analysis, a straightforward and effective method for model diagnostics [24], to evaluate the fitting performance. The contribution of this paper is to illustrate how point process analysis delivers valuable insight into the spread patterns of information during crime events and the distinctive factors of areas prone to hazards. The proposed multi-dimensional Gaussian-type exponent approximation method can also be applied to other large datasets, owing to its flexibility in estimating the conditional intensity function and its ability to address the computational challenges associated with large datasets.
In Section 2, we introduce the fundamental ideas of the space-time self-exciting point process models. Section 3 presents the multi-dimensional Gaussian-type exponent approximation method, the ML estimation, model diagnostics for the simulation algorithm, and goodness-of-fit evaluation. In Section 4, we describe the effectiveness of the suggested method for fitting the simulated burglary datasets. The crime data set is analyzed in Section 5. Section 6 provides a summary of the findings and potential avenues for further research.
2 Space-time self-exciting point process models
The space-time Hawkes point process is an extension of the univariate Hawkes process, generalizing it from the one-dimensional time line to a higher-dimensional space. Space-time Hawkes point process models incorporate the spatial component s into the univariate Hawkes process. These models were developed to account simultaneously for both spatial and temporal components when the scientific question of interest involves studying their joint behavior, something that cannot be addressed by separately analyzing the spatial and temporal components of space-time data [25]. For instance, analyzing a series of burglary-related calls from residents over a period of time requires space-time data analysis, enabling the timely detection of localized spatial and temporal peaks in criminal activity. This approach helps identify burglary hotspots, aiding in monitoring and crime prevention. The space-time point process is characterized via its conditional intensity. By extending the conditional intensity function, the space-time point process models forecast event occurrences at given spatial locations and occurrence times. In general, if the conditional intensity exists, it is conditioned not solely on the counting process N(s,t) but also on the filtration that contains the history up to time t, which may include other additional information [6,26].
We denote by N a simple space-time point process in the time interval T and within a two-dimensional spatial region. Let the filtration consist of all the historical events up to time t. The intensity process is interpreted as the average occurrence rate at which points cluster around a certain space-time location (s, t), conditioned on the history of all points observed up to time t. This history includes all spatial locations and occurrence times of all events that happened before time t. Provided the conditional intensity function exists, we can define it as in (1), where the spatial coordinate is denoted as $s = (x, y)$.
Determining the influence of past variables on the conditional intensity is a major problem that needs to be addressed [27]. Modeling such point processes requires specifying unique mathematical structures for defining the conditional intensity function. Self-exciting point process models are a specific class of conditional intensity process models. For a space-time data set consisting of spatial locations and occurrence times ti up to time t for some events, the conditional intensity process of a space-time self-exciting point process is defined as follows.
For a simple space-time point process N, the conditional intensity function can be expressed as in (2), where $s_i$ represents the spatial locations of events and $t_i$ denotes the corresponding occurrence times. The counting measure $N$ takes the value 1 on an infinitesimal space-time element that contains an event $(s_i, t_i)$ for some index $i$; otherwise, it is equal to 0.
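For orientation, a commonly used way of writing such a self-exciting conditional intensity is sketched here. The symbols $\mu$ (background rate), $g$ (triggering function) and $\mathcal{H}_t$ (history up to time $t$) are notation assumed for this illustration and may differ from the symbols used in (2); the background is written as time-invariant, an assumption that matches the stationary background adopted later in the paper.

```latex
\lambda(s, t \mid \mathcal{H}_t)
  \;=\; \mu(s) \;+\; \sum_{i:\, t_i < t} g\bigl(s - s_i,\; t - t_i\bigr),
\qquad s = (x, y).
```

The first term does not depend on past events, while each past event $(s_i, t_i)$ adds a contribution that depends only on its spatial and temporal offset from the evaluation point.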
Equation (2) is partitioned into two components: a background intensity process (immigrants) and a triggering effect (offspring or descendants). This is described by [26] as a subcritical (stationary) branching process with immigration. The background intensity process defines the rate of incoming immigrants, independent of past events. New arrivals from the background process are immigrants, while new arrivals triggered by past events are descendants. Whenever a new event occurs, the total intensity increases due to self-excitation; however, the background intensity remains unchanged. For a detailed introduction to self-exciting point processes and conditional intensity processes, see [28,29]. The log-likelihood of the conditional intensity process for the space-time point process model in (2), defined over the observation time frame, is given as follows. Following [29], let N be a space-time point process and assume that n points have been observed in the observation time interval. Define the spatial coordinates as $s_i = (x_i, y_i)$. The log-likelihood function is given by:
It should be noted that the log-likelihood is contingent upon the selection of the observation period D. Additionally, the history described in (2) is modified to represent the sum of all events recorded within the time span
. It is very difficult to obtain the explicit form of the integral component of the log-likelihood function with the given model.
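As a reminder of the general structure involved, the log-likelihood of a point process with conditional intensity $\lambda$ is usually written in the following form. The notation ($\ell$, $\lambda$, $\mathcal{H}_t$, spatial domain $S$, time interval $D$) is assumed for this sketch rather than taken from (3).

```latex
\ell \;=\; \sum_{i=1}^{n} \log \lambda\bigl(s_i, t_i \mid \mathcal{H}_{t_i}\bigr)
      \;-\; \int_{D} \int_{S} \lambda\bigl(s, t \mid \mathcal{H}_t\bigr)\, ds\, dt .
```

The second term is the integral component whose explicit form is hard to obtain for a general model, which is what motivates a parametrization under which it can be evaluated in closed form.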
3 Methodology
In this section, we introduce a multi-dimensional Gaussian-type exponent approximation method for modeling the background and triggering components of a self-exciting point process. The estimation procedure and diagnostic methods are described in detail.
Multi-dimensional Gaussian-type exponent approximation
The triggering function g and the background function are essential for characterizing a conditional intensity process. They shed light on how the trigger pattern in the observed data behaves in terms of space-time-dependent reliability. The space-time shape of the triggering function g can exhibit various forms, such as decreasing, increasing, U-shaped, or inverted U-shaped patterns. When working with space-time point processes, it is often necessary to obtain accurate estimates of the background and triggering functions. Depending on the application, various parametric models, such as the ETAS model, can be employed to estimate these functions using the maximum likelihood approach. However, parametric models may be too restrictive, lacking the flexibility to capture the full range of triggering patterns. By contrast, nonparametric methods are more flexible and, with adequate data and appropriate regularization, can yield consistent estimates of the triggering function. We propose a nonparametric method for estimating the conditional intensity function of a self-exciting point process.
Triggering function
To model the self-exciting point process, we first introduce the self-excitation (triggering) function. For one-dimensional self-exciting point process models, a generalized parametrization of the triggering function was proposed by [30]. In this section, we extend this one-dimensional generalized parametrization to space-time self-exciting models, explicitly accounting for spatial effects and broadening practical applicability. Because the spatial coordinates are unordered, we use quadratic terms for the space coordinates. A multi-dimensional Gaussian-type exponent approximation of the triggering function g is defined as follows:
where ,
.
and
are the triggering spatial decay rates along the x and y axes, and
is the temporal decay rate.
Extensive research has been conducted on the application of a single isotropic “Gaussian kernel” for locations and an exponential kernel for the temporal coordinate in seismicity, crime, security, and social network datasets [3,5,22]. We propose the multi-dimensional Gaussian-type exponent approximation to expand the “Gaussian kernel” for spatial coordinates from a single component to multiple components, and combine the multi-component “Gaussian kernel” for spatial coordinates with an exponential kernel for the time coordinate. The proposed multi-dimensional Gaussian-type exponent approximation can then be seen as a mixed model.
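The idea of combining several Gaussian spatial components with an exponential temporal decay can be illustrated with a minimal sketch. The functional form used here (a weighted sum of terms `exp(-alpha*x^2 - beta*y^2 - gamma*t)`), the tuple layout of `components`, and the function names are assumptions made for illustration; they are not the paper's exact parametrization in (5).

```python
import math

def trigger_kernel(x, y, t, components):
    """Weighted sum of Gaussian-in-space, exponential-in-time components.

    components: list of (weight, alpha, beta, gamma) tuples (hypothetical layout).
    alpha and beta act as spatial decay rates along x and y, gamma as the
    temporal decay rate.
    """
    if t <= 0:
        return 0.0  # only strictly earlier events can trigger
    return sum(w * math.exp(-a * x * x - b * y * y - g * t)
               for w, a, b, g in components)

def intensity(x, y, t, events, background, components):
    """Background rate plus the triggering contributions of all earlier events."""
    lam = background(x, y)
    for xi, yi, ti in events:
        if ti < t:
            lam += trigger_kernel(x - xi, y - yi, t - ti, components)
    return lam

# Usage: one past event at (1, 1) at time 0, a flat background of 0.2,
# and a single kernel component.
events = [(1.0, 1.0, 0.0)]
comps = [(0.5, 1.0, 1.0, 0.1)]
value = intensity(1.0, 1.0, 1.0, events, lambda x, y: 0.2, comps)
```

Adding further tuples to `comps` gives the multi-component case; because the kernel is centered on the parent event, no location parameters appear in it.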
Stationary background rate
Applications of self-exciting point processes mostly assume that the background intensity is time-invariant, meaning that it remains constant over time [2,6,15,16,18]. In our approach, we adopt a similar assumption and consider a stationary background intensity process for the space-time point process models: the baseline intensity depends on the spatial component (x,y) but not on the temporal component t, so the background intensity exhibits spatial variation but is temporally invariant:
where .
Note that the background function has mean parameters, but the triggering function g does not. Here cj and dj represent the mean parameters in the background intensity. The reason is that the background function may be centered at any location, so its center must be estimated, whereas the triggering function g is centered at the location of the parent event, so no center needs to be estimated.
and
are the background spatial decay rates along the x and y axes, respectively.
Remark 1 The current model can be extended to cases where covariance differentials (space correlation) are incorporated into both the background and the triggering function. That is,
and
Triggering function expressed as
and
where and
are the covariance matrices, and
and
are the means of the background and triggering functions, respectively. However, the above extension makes the design of fast estimation algorithms challenging. We will investigate this extended model in subsequent studies. It is worth noting that although the method proposed in this paper does not account for spatial correlations, the simulation results demonstrate that the proposed model and estimation method still exhibit robust performance for datasets involving spatial correlation.
Estimation methods
With the background and triggering functions defined above, we proceed to estimate the model parameters using MLE. In most cases, explicit solutions for MLE are not available and iterative numerical optimization methods are used instead. Alternatively, [26] showed that the EM algorithm can efficiently optimize the log-likelihood function in parametric models. The EM-type methods were first introduced in [15] for a semi-parametric model and have since been frequently applied to both parametric and nonparametric models in estimating the background intensity function and the triggering effects [3,16,17,22,23,31,32]. In this section, we address the computational difficulties associated with existing research and the challenges encountered when dealing with large datasets. We introduce a fast estimation algorithm for the space-time self-exciting point process model in two dimensions. First, we derive the explicit form of the integral component of the log-likelihood function; second, we introduce a truncated intensity to accelerate the computation.
The proposed multi-dimensional Gaussian-type exponent approximation method, described in the previous section, enhances the flexibility of intensity function estimation and alleviates the computational barriers that make space-time point process models challenging and computationally expensive to handle, especially when dealing with large datasets. Based on (3), for a defined spatial region and a defined time window
, where
and
. The log-likelihood is given as follows:
We obtain the MLE of the conditional intensity function by maximizing the log-likelihood (7).
Direct computation using (7) is computationally expensive; therefore, we segment the equation and compute each part separately to reduce its computational burden. The first part of the log-likelihood in (7), which represents the log of the intensity function, is segmented into the background and the triggering effect, and the two are computed separately. The integral part of (7) is also divided into two segments, both represented via a Gaussian distribution. This segmentation accelerates the computation and reduces the computational burden.
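The reason a Gaussian-type exponent makes the integral term tractable can be seen in a small sketch: the integral of `exp(-alpha*(x-c)^2)` over an interval has a closed form in terms of the error function, and a separable two-dimensional integrand factorizes into two such terms. The function names and the specific integrand are assumptions for illustration and do not reproduce the paper's expressions; the midpoint rule is included only as a numerical sanity check.

```python
import math

def gauss_integral_1d(alpha, c, lo, hi):
    """Closed form of the integral of exp(-alpha*(x-c)**2) over [lo, hi]."""
    r = math.sqrt(alpha)
    return 0.5 * math.sqrt(math.pi / alpha) * (
        math.erf(r * (hi - c)) - math.erf(r * (lo - c)))

def gauss_integral_2d(alpha, beta, c, d, x_max, y_max):
    """Integral of exp(-alpha*(x-c)**2 - beta*(y-d)**2) over [0,x_max] x [0,y_max]."""
    return (gauss_integral_1d(alpha, c, 0.0, x_max)
            * gauss_integral_1d(beta, d, 0.0, y_max))

def midpoint_check(alpha, c, lo, hi, n=20000):
    """Brute-force midpoint rule, used only to sanity-check the closed form."""
    h = (hi - lo) / n
    return h * sum(math.exp(-alpha * (lo + (k + 0.5) * h - c) ** 2)
                   for k in range(n))

closed = gauss_integral_1d(0.3, 10.0, 0.0, 20.0)
numeric = midpoint_check(0.3, 10.0, 0.0, 20.0)
```

The closed form costs two error-function evaluations per dimension, whereas the numerical rule needs thousands of kernel evaluations for comparable accuracy.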
Dealing with the integral (compensator) term of the log-likelihood in (7), we first evaluate the background intensity in (6). Let represent a Gaussian distribution value at x where the Gaussian distribution has mean
and standard deviation
. Since
we obtain
Next, we evaluate the triggering effect (5) of the intensity process. Notice that
and
Then we have
where , G0(1)=0,
and
To reduce the computational burden of Gj(i), we make the model more tractable by assuming that an event occurring at time ti and spatial location si influences the intensity process by an amount
within a window of
, where z is the truncated time bound. Under certain assumptions, such as g(x,y,t) tending to 0 quickly, which is a weak restriction satisfied in most applications, the approximation error due to truncation can be effectively controlled [18]. We have the following concept of “truncated intensity”, which is due to [18]:
However, the summation part of (8) for computing is irregular, so we replace
with fixed number of summation for each i by
. Then we have the modified truncated intensity function:
Then the computational complexity of the summation part in (9) at n points reduces from $O(n^2)$ to an order that is linear in n for a fixed number of summands per point. Now we can approximate Gj(i),
as follows:
Obtaining MLE in space-time self-exciting point process models is particularly challenging to accomplish analytically, as the structure of the log-likelihood in (7) consists of a sum of logarithms of background and triggering functions, which involves summing over previous points, resulting in intractable analytical maximization [33]. Hence we use the nonlinear optimization procedure described in [5] to obtain the MLE.
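The effect of truncation on the cost of the triggering sum can be sketched as follows. This is a simplified illustration, assuming events sorted by time and a kernel that decays quickly in time; the parameter `m` (the fixed number of most recent events kept per point) and the function names are hypothetical and do not correspond to symbols in (8) or (9).

```python
import math

def triggered_part_full(events, kernel):
    """For each event i, sum the kernel over ALL earlier events: O(n^2) work."""
    out = []
    for i, (xi, yi, ti) in enumerate(events):
        out.append(sum(kernel(xi - xj, yi - yj, ti - tj)
                       for xj, yj, tj in events[:i]))
    return out

def triggered_part_truncated(events, kernel, m):
    """Same sum restricted to the m most recent earlier events: O(n*m) work."""
    out = []
    for i, (xi, yi, ti) in enumerate(events):
        start = max(0, i - m)
        out.append(sum(kernel(xi - xj, yi - yj, ti - tj)
                       for xj, yj, tj in events[start:i]))
    return out

# Usage: 50 time-ordered events and a kernel with fast temporal decay.
events = [((0.1 * i) % 1.0, (0.2 * i) % 1.0, float(i)) for i in range(50)]
kern = lambda x, y, t: math.exp(-x * x - y * y - 2.0 * t)
full = triggered_part_full(events, kern)
approx = triggered_part_truncated(events, kern, 10)
max_error = max(abs(a - b) for a, b in zip(full, approx))
```

When the kernel decays quickly, the discarded terms are negligible, which is the sense in which the truncation error can be controlled.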
Selection of p and q
To compare the goodness-of-fit of the proposed models for different p and q, we use the Bayesian Information Criterion (BIC):
A self-exciting model with the smallest BIC value is considered the best fit.
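Order selection by BIC can be sketched with the usual definition, BIC = -2 × log-likelihood + (number of parameters) × log(sample size). The parameter counts and log-likelihood values in the usage lines are made-up placeholders, not results from the paper, and the function names are hypothetical.

```python
import math

def bic(log_likelihood, n_params, n_events):
    """Standard BIC: smaller values indicate a better trade-off of fit and size."""
    return -2.0 * log_likelihood + n_params * math.log(n_events)

def select_order(fits, n_events):
    """fits maps (p, q) -> (maximized log-likelihood, number of parameters).

    Returns the (p, q) pair with the smallest BIC.
    """
    return min(fits, key=lambda pq: bic(fits[pq][0], fits[pq][1], n_events))

# Usage with placeholder numbers: the larger model fits slightly better
# but pays a heavier penalty for its extra parameters.
fits = {(0, 0): (-1000.0, 7), (0, 1): (-999.5, 11)}
best = select_order(fits, n_events=5000)
```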
Diagnostics
To evaluate the performance of point process models, residual analysis is commonly employed. Various diagnostic techniques have been proposed for self-exciting point process models [24,34–36]. In this study, we employ residual analysis using super-thinning introduced by [37] for both simulations and the Chicago crime data analysis. The thinning method introduced by [24] is simple, efficient, and possesses the characteristics that any process governed by its conditional intensity process can be reduced by thinning to yield a homogeneous Poisson process [30]. However, residuals obtained through thinning can face challenges such as variability and the lack of independence among the residual points after thinning, primarily as a result of losing information caused by the elimination of observed points. To address these challenges, [37] proposed the super-thinning which merges the observed points obtained through thinning with simulated points generated by superposition. This approach produces a homogeneous residual point process, provided that the conditional intensity is correctly estimated. The procedure outlined below follows a thinning algorithm introduced by [24]:
Algorithm 1 Thinning algorithm
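(1) Start by defining $b = \inf_{(s,t)} \hat{\lambda}(s,t)$, the infimum of the estimated conditional intensity over the observation region;
(2) For every observed event i, compute the value $p_i = b / \hat{\lambda}(s_i, t_i)$;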
(3) Event i is retained with probability pi.
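The three steps can be sketched in a few lines, assuming the estimated intensity has already been evaluated at each observed event and that `b` is a lower bound of the estimated intensity over the observation region. The function name and argument layout are hypothetical.

```python
import random

def thin(events, lam_values, b, rng):
    """Keep event i with probability b / lam_values[i].

    events: observed points; lam_values: estimated intensity at each point;
    b: assumed lower bound of the estimated intensity; rng: random.Random.
    """
    kept = []
    for ev, lam in zip(events, lam_values):
        if rng.random() < b / lam:
            kept.append(ev)
    return kept

# Usage: where the intensity is twice the bound, about half the points survive.
rng = random.Random(0)
events = list(range(10000))
residuals = thin(events, [6.0] * len(events), 3.0, rng)
```

If the fitted intensity is correct, the retained points should resemble a homogeneous Poisson process with rate `b`, which is why a small `b` leaves few points to test.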
For the application of super-thinning, since b is small, selecting the rate k as $k = b$ will result in few points after thinning, rendering the test uninformative. Therefore, we select the rate k such that
. Using the alternative method suggested by [37], we choose the value of k to optimize the effectiveness of formal tests in assessing the homogeneity of the residuals,
where |S| denotes the area of the spatial domain S and |D| is the length of the time interval D. We first thin the process, retaining each point with probability $\min\{1, k/\hat{\lambda}(s,t)\}$, and then superpose onto the thinned process an inhomogeneous Poisson process with intensity $\max\{0, k - \hat{\lambda}(s,t)\}$. This results in a homogeneous point process with rate k if the conditional intensity is correctly estimated.
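A minimal sketch of super-thinning on a rectangular space-time box is given here. It assumes the estimated intensity is available as a strictly positive callable `lam(x, y, t)`; the superposed points are generated by thinning a homogeneous rate-k candidate process, which is one standard way to simulate an inhomogeneous Poisson process bounded by k. All names are hypothetical.

```python
import random

def poisson_draw(mean, rng):
    """Poisson variate: count unit-rate exponential arrivals before `mean`."""
    count, total = 0, rng.expovariate(1.0)
    while total < mean:
        count += 1
        total += rng.expovariate(1.0)
    return count

def super_thin(events, lam, k, x_max, y_max, t_max, rng):
    """events: list of (x, y, t); lam: estimated intensity, lam(x, y, t) > 0."""
    # Thinning step: keep each observed point with probability min(1, k / lam).
    kept = [e for e in events if rng.random() < min(1.0, k / lam(*e))]
    # Superposition step: a Poisson process with rate max(0, k - lam), obtained
    # by thinning homogeneous rate-k candidates drawn uniformly on the box.
    n_cand = poisson_draw(k * x_max * y_max * t_max, rng)
    added = []
    for _ in range(n_cand):
        c = (rng.uniform(0, x_max), rng.uniform(0, y_max), rng.uniform(0, t_max))
        if rng.random() < max(0.0, k - lam(*c)) / k:
            added.append(c)
    return kept + added

# Usage: where the intensity already equals k everywhere, nothing changes.
rng = random.Random(1)
obs = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]
residuals = super_thin(obs, lambda x, y, t: 2.0, 2.0, 10.0, 10.0, 10.0, rng)
```

Regions where the intensity exceeds k lose points, and regions where it falls below k gain simulated points, so the result has rate k everywhere when the intensity is correct.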
Ripley’s K-function [38] determines the proportion of occurrences per unit area within a specific distance and can be used for model diagnostics in super-thinning. This helps assess whether the model still fails to account for clustering in the thinning process [24]. Below is the most frequently applied K-function incorporating edge-corrected estimators [38]:
where the weight function corresponds to the proportion of the circle’s circumference that lies within the study area, dij denotes the distance between the ith and jth points, and
represents the volume of the observation region. Compared with Ripley’s K-function, the L-function, computed as $L(d) = \sqrt{K(d)/\pi}$, exhibits a more stable variance. Hence, we employ the L-function for model diagnostics. Based on Monte Carlo techniques and the L-function, the following algorithm provides the diagnostic procedure.
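Before turning to that procedure, the K- and L-function computations themselves can be illustrated with a deliberately naive sketch. It assumes two-dimensional points and omits the edge-correction weights described in the text, so it is not the estimator used in the paper; the function names are hypothetical.

```python
import math

def ripley_k(points, d, area):
    """Naive K estimate at distance d, without edge correction.

    Counts ordered pairs (i, j), i != j, whose distance is at most d and
    scales by area / (n * (n - 1)).
    """
    n = len(points)
    pairs = 0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= d:
                pairs += 1
    return area * pairs / (n * (n - 1))

def l_function(points, d, area):
    """L(d) = sqrt(K(d) / pi); under complete spatial randomness L(d) is near d."""
    return math.sqrt(ripley_k(points, d, area) / math.pi)

# Usage: four points on the corners of a unit square inside a region of area 4.
corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
k_at_1 = ripley_k(corners, 1.0, 4.0)
l_at_1 = l_function(corners, 1.0, 4.0)
```

Without edge correction, points near the boundary have fewer observable neighbours, which biases the estimate downward; that is the issue the weights in the edge-corrected estimator address.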
Diagnostics algorithm
Start with a tuning parameter k, the estimated conditional intensity, the number of simulations M, and a discrete distance value d. We follow these steps:
Algorithm 2 Super-thinning diagnostic algorithm
(1) Use super-thinning approach to derive the thinned process;
(2) Compute the L-functions;
(3) Perform Steps 1 and 2 iteratively for M repetitions;
(4) Compute the sample mean of L-functions;
(5) Generate a homogeneous Poisson process with a rate of k across the observed domain S and repeat Step 2 M times;
(6) Based on the M realizations of the homogeneous Poisson process, calculate the 95% confidence limits for the L-functions.
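Steps (5) and (6) can be sketched as a Monte Carlo envelope for the centered L-function under homogeneity. This is a simplified two-dimensional illustration on a square of side `side`, using a naive L estimate without edge correction and pointwise 2.5% and 97.5% empirical quantiles; these choices and all names are assumptions rather than the paper's implementation.

```python
import math
import random

def poisson_draw(mean, rng):
    """Poisson variate: count unit-rate exponential arrivals before `mean`."""
    count, total = 0, rng.expovariate(1.0)
    while total < mean:
        count += 1
        total += rng.expovariate(1.0)
    return count

def centered_l(points, d, area):
    """Naive L(d) - d without edge correction."""
    n = len(points)
    pairs = sum(1 for i in range(n) for j in range(n)
                if i != j and math.dist(points[i], points[j]) <= d)
    return math.sqrt(area * pairs / (n * (n - 1)) / math.pi) - d

def poisson_envelope(k, side, d_values, M, rng):
    """Pointwise (lower, upper) limits of L(d) - d over M homogeneous simulations."""
    area = side * side
    curves = []
    for _ in range(M):
        n = max(2, poisson_draw(k * area, rng))  # the estimator needs >= 2 points
        pts = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]
        curves.append([centered_l(pts, d, area) for d in d_values])
    bounds = []
    for j in range(len(d_values)):
        vals = sorted(c[j] for c in curves)
        lower = vals[int(0.025 * (M - 1))]
        upper = vals[int(math.ceil(0.975 * (M - 1)))]
        bounds.append((lower, upper))
    return bounds

envelope = poisson_envelope(2.0, 5.0, [0.5, 1.0], 40, random.Random(1))
```

A residual process whose mean centered L-function stays inside these limits at every distance gives no evidence against homogeneity.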
4 Numerical experiments
Synthetic datasets
This section focuses on evaluating the efficiency of the proposed method by generating artificial burglary datasets from the conditional intensity process (12) with approximate sizes and
. [3] considered a conditional-intensity process (12) and reported results consistent with patterns observed in real burglary data. We incorporate a correlation coefficient,
, into our simulation model to capture dependence between the x and y coordinates. The background intensity is modeled as a correlated bivariate Gaussian density scaled by
, whereas the triggering intensity is modeled as the product of a correlated bivariate Gaussian spatial kernel and an exponential temporal kernel. By varying
, we evaluate the proposed model’s performance on datasets generated by this simulation framework.
where ,
, c = 10, d = 10,
,
,
and
. In a spatial domain of 20 × 20, the simulation is conducted using the simulation algorithm introduced by [23]. To ensure that the point process reaches a steady state, we discard the first and last 2000 points in each simulation [3]. The above simulation function uses only a single component. To illustrate the multi-component case, we use the following model where p = 0 and q = 1:
where ,
, c = 10, d = 10,
,
,
,
,
,
,
and
.
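To make the simulation setup more concrete, a branching-structure simulator for a self-exciting process on a 20 × 20 domain centered at (10, 10) can be sketched as follows. This is a generic illustration and not the algorithm of [23]: it omits the spatial correlation parameter, and the parameter names (`n_background`, `theta`, `omega`, `sigma`, `sigma_bg`) and the values in the usage line are hypothetical.

```python
import random

def poisson_draw(mean, rng):
    """Poisson variate: count unit-rate exponential arrivals before `mean`."""
    count, total = 0, rng.expovariate(1.0)
    while total < mean:
        count += 1
        total += rng.expovariate(1.0)
    return count

def simulate(n_background, theta, omega, sigma, sigma_bg, t_max, rng,
             center=(10.0, 10.0), side=20.0):
    """Background events first, then successive generations of offspring.

    theta: mean offspring per event (must be < 1 so the cascade dies out);
    omega: temporal decay rate; sigma: offspring spatial spread;
    sigma_bg: background spatial spread around `center`.
    """
    def inside(x, y, t):
        return 0 <= x <= side and 0 <= y <= side and 0 <= t <= t_max

    generation = []
    for _ in range(poisson_draw(n_background, rng)):
        e = (rng.gauss(center[0], sigma_bg), rng.gauss(center[1], sigma_bg),
             rng.uniform(0, t_max))
        if inside(*e):
            generation.append(e)
    events = list(generation)
    while generation:
        children = []
        for x, y, t in generation:
            for _ in range(poisson_draw(theta, rng)):
                c = (x + rng.gauss(0, sigma), y + rng.gauss(0, sigma),
                     t + rng.expovariate(omega))
                if inside(*c):
                    children.append(c)
        events.extend(children)
        generation = children
    return sorted(events, key=lambda e: e[2])

sample = simulate(50, 0.5, 1.0, 0.1, 4.5, 100.0, random.Random(4))
```

Offspring falling outside the space-time window are dropped, which is one reason for discarding points near the start and end of a simulated catalog before fitting.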
Fitting synthetic datasets
Moving forward, after generating the artificial burglary datasets using the simulation algorithm, we fit them to the model in (9). Considering the tuning parameter, which is the length of the truncated intensity function z in (9), we select z from the range 100–1000. This choice is based on the observation that a small value of z causes the background function to be overestimated, whereas a large value of z makes the estimation computationally expensive. Therefore, we found 200 to be the appropriate truncation length.
In Figs 1 and 2, we plot the values of BIC(i) for , corresponding to
for the single and the multi-component cases, respectively. We observe that (p,q) = (0,0) yields the smallest BIC value for both datasets across all values under the single component specification as shown in Fig 1, whereas in the multi-component setting the selected order is (p,q)=(0,1) for both datasets as illustrated in Fig 2.
In Tables 1 and 2, we present the estimated and exact parameter values for datasets of approximately and
events, respectively, under varying values of
. The results suggest that the estimated parameter values are closely aligned with the corresponding exact values, although minor discrepancies exist. [39] demonstrated that such estimations are prone to bias and proposed the use of Laplace approximation to mitigate it, although some bias still remains. Similarly, we report parameter estimates for the multi-component case using (13). The bias becomes more pronounced when
, as illustrated for
in Tables 3 and 4. In Fig 3, we plot the marginal estimates of
and g(x,y,t) against their corresponding true distributions. The estimated marginals are shown to closely follow the true marginals. Additionally, we present a comparison of the true and estimated marginals of
and g(x,y,t) for the multi-component scenario in Fig 4. Although the proposed model excludes the spatial correlation parameter
, it still yields parameter estimates close to the true values on datasets generated with spatial correlation. In Tables 5 and 6, we assess the robustness of our proposed method in the single component (single) and multi-component (multi) settings using 50 simulated datasets per setting.
Fig 5 presents the estimated centered L-function, $L(d) - d$, along with the 95% confidence limits for homogeneous Poisson processes. When the difference between L(d) and d is zero, it implies homogeneity, whereas any deviation from zero indicates heterogeneity. As shown in Fig 5, the thinned residuals appear to be homogeneous, as the estimated centered L-function remains within the 95% confidence bounds.
The lower red curve represents the 5% bounds of the estimated centered L-function, $L(d) - d$, for a homogeneous Poisson process, while the upper red curve corresponds to its 95% bounds. Similarly, the lower blue curve denotes the 5% bounds of the estimated centered L-function for the thinned residuals, and the upper blue curve indicates its 95% bounds. The middle black line represents the empirical mean of the estimated centered L-function for the thinned residuals, while the gray-shaded region illustrates the confidence interval of the estimated centered L-function for the thinned residuals.
For Figs 1 and 2, the order of the proposed model is selected using the BIC, which plays a role analogous to bandwidth selection in kernel density estimation (KDE)-based methods that rely on cross-validation [3,19]. The proposed model selection procedure is computationally cheaper than KDE-based approaches, which require repeated cross-validation. From Fig 3, the results indicate that for a dataset of size , the estimates are reasonable, while the variability decreases as the data size increases to
, with similar behavior observed in Fig 4. These observations are consistent with the results of [3,40]. This indicates that, with an appropriate choice of model order, the method avoids model misspecification, leading the proposed approach to closely approximate the true marginals when
. Furthermore, following the residual analysis approaches introduced in the literature [6,37,40,41], the results in Fig 5 indicate that the model passes the goodness-of-fit test for both datasets. This implies that the test effectively evaluates the proposed model.
In summary, Tables 1–6 show that the proposed estimation framework provides stable and reliable parameter recovery in both single and multi-component settings. The estimated parameters closely align with the true values across different sample sizes, with discrepancies decreasing as the number of events increases. While a slight bias is observed in the presence of spatial correlation, the method reliably captures the marginal structures of and g(x,y,t). In the multi-component case, this bias becomes more evident when
. Overall, these results affirm the robustness of the proposed approach across a range of simulation scenarios.
5 Crime data set analysis
Data
In this section, we analyze burglary data from Chicago, Illinois, United States, for the year 2002. We apply the proposed approach to the burglary data set compiled by the Chicago Police Department. Chicago is the largest city in Illinois and the Midwest, with a population of 2,746,388 as of 2020, making it the third-largest city in the U.S. after New York City and Los Angeles. It is the seat of Cook County and the center of the Chicago metropolitan area, known as “Chicagoland”. The data include the occurrence time stamp, longitude, latitude, crime type, and a description of each crime, among other fields. The catalog is publicly available at [42]. The data set covers 29 types of crime. We focus only on the burglary subset, comprising 25,201 events occurring within a rectangular area defined by longitudes –87.91° to –87.53° and latitudes 41.65° to 42.02°. The longitudes and latitudes are converted to planar coordinates measured in kilometers using a locally scaled equirectangular projection, which is based on a local linearization of the earth’s surface around the study area. After this transformation, the spatial coordinates are shifted to start from zero by subtracting the minimum transformed value along each axis, resulting in a spatial region defined in kilometers.
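The coordinate transformation described above can be sketched as follows. This is an illustration under stated assumptions rather than the exact preprocessing used in the study: the Earth radius of 6371 km, the choice of the mean latitude as the reference latitude, and the function name `to_local_km` are all assumptions.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def to_local_km(lon_deg, lat_deg):
    """Map longitude/latitude in degrees to planar kilometers, shifted to start at zero."""
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lat0 = lat.mean()  # assumed reference latitude for the local scaling
    # Equirectangular projection: east-west distances shrink by cos(lat0).
    x = EARTH_RADIUS_KM * lon * np.cos(lat0)
    y = EARTH_RADIUS_KM * lat
    # Shift each axis so that the minimum coordinate is zero.
    return x - x.min(), y - y.min()

# Opposite corners of the study rectangle given in the text.
x_km, y_km = to_local_km([-87.91, -87.53], [41.65, 42.02])
```

Under these assumptions the study rectangle spans roughly 31 km east to west and 41 km north to south, so a sub-region such as [15,23] × [15,30] lies inside it.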
Fitting the burglary data
Following the specification of the background and triggering components, we proceed with model fitting using the spatiotemporal burglary data. Our goal is to estimate the space-time background function, which represents the occurrence rate of spontaneous burglaries. This background function, which varies across the observation region, is estimated using the multi-dimensional Gaussian-type exponent approximation method introduced in the previous section. The estimation results are presented in Figs 6–9. Using the BIC-based model selection shown in Fig 6, we select the model with the smallest BIC value, which corresponds to (p,q) = (0,1).
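The BIC-based order selection can be illustrated with a short sketch. It uses the standard definition BIC = −2 log L + k log n, where k is the number of free parameters and n the number of events. How the parameter count depends on (p,q) is not reproduced here, so the log-likelihoods and parameter counts below are made-up numbers for illustration, not results from the burglary fit, and the function names are hypothetical.

```python
import math

def bic(log_likelihood, n_params, n_events):
    """Bayesian information criterion: smaller values indicate a better trade-off."""
    return -2.0 * log_likelihood + n_params * math.log(n_events)

def select_order(candidates, n_events):
    """candidates maps an order (p, q) to (maximized log-likelihood, parameter count)."""
    scores = {order: bic(ll, k, n_events) for order, (ll, k) in candidates.items()}
    best = min(scores, key=scores.get)  # order with the smallest BIC
    return best, scores

# Hypothetical fitted values for three candidate orders (illustrative only).
candidates = {
    (0, 0): (-1250.0, 6),
    (0, 1): (-1210.0, 9),
    (1, 1): (-1206.0, 12),
}
best_order, scores = select_order(candidates, n_events=500)
```

In this toy example the richest model has the highest likelihood, but its extra parameters are penalized enough that an intermediate order attains the smallest BIC.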
Using the fitted model, we first examine the triggering effect, which characterizes the spatiotemporal clustering of burglaries. The second row of Fig 8 plots the marginal density estimates of the triggering effect. The clustering is clearly visible: the spatial marginal densities are tightly concentrated, with small variance. Fig 8 indicates that an initial burglary may trigger further burglaries within a short distance and a short time frame, possibly because burglars target nearby houses after a successful first attempt. The triggered burglaries occur within neighborhoods of approximately 0.5 km × 1.8 km and within the first few days (1–3 days). They are concentrated in localized areas over specific time periods, and the effect diminishes as distance and time increase.
Fig 7 and the first row of Fig 8 show the marginal densities of the estimated background process, which represents the rate of occurrence of spontaneous (untriggered) burglaries. The estimated density in Fig 7 is aligned with the axes rather than slanted, which suggests that there is no correlation between the spatial components and that they can be treated as independent.
We evaluate the goodness-of-fit of the proposed multi-dimensional Gaussian-type exponent approximation for the self-exciting point process within a selected region of size [15,23] × [15,30]. The implementation of the super-thinning method is illustrated in Fig 9, where we observe that the super-thinned residuals behave similarly to a Poisson process. Specifically, the first column of Fig 9 shows the selected spatial region used for model evaluation, while the second column illustrates the points that remain after applying the super-thinning procedure. In Fig 10, we present the computed values of the centered L-function, expressed as L(d)–d, along with the 95% confidence bounds for a homogeneous Poisson process within the designated area. The estimated centered L-function values fall within the 95% confidence limits, which indicates that the proposed multi-dimensional Gaussian-type exponent approximation method for the self-exciting point process can effectively model burglary crimes.
The lower red curve represents the 5% bounds of the estimated centered L-function, L(d)–d, for a homogeneous Poisson process, while the upper red curve corresponds to its 95% bounds. Similarly, the lower blue curve denotes the 5% bounds of the estimated centered L-function for the thinned residuals, and the upper blue curve indicates its 95% bounds. The middle black line represents the empirical mean of the estimated centered L-function for the thinned residuals, while the gray-shaded region illustrates the confidence interval of the estimated centered L-function for the thinned residuals.
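The super-thinning procedure [37] behind this diagnostic can be sketched as follows. This is a minimal illustration rather than the authors' code: observed points are kept with probability min(k/λ, 1), and new points are superposed at rate max(k − λ, 0), so that the result is approximately homogeneous Poisson with rate k when the fitted intensity λ is correct. The function name `super_thin`, the target rate `k`, and the toy intensity are assumptions, and the fitted intensity is assumed to be strictly positive at every observed point.

```python
import numpy as np

def super_thin(points, intensity, k, bounds, rng):
    """Super-thin space-time points (x, y, t) toward a homogeneous rate k.

    intensity: callable mapping an (m, 3) array to m fitted intensity values.
    bounds: ((x0, x1), (y0, y1), (t0, t1)) describing the observation window.
    """
    pts = np.asarray(points, dtype=float)
    # Thinning: keep each observed point with probability min(k / lambda, 1).
    keep_prob = np.minimum(k / intensity(pts), 1.0)
    thinned = pts[rng.uniform(size=len(pts)) < keep_prob]
    # Superposition: simulate a rate-k Poisson process on the window and
    # accept each candidate with probability max(k - lambda, 0) / k.
    lo = np.array([b[0] for b in bounds], dtype=float)
    hi = np.array([b[1] for b in bounds], dtype=float)
    m = rng.poisson(k * np.prod(hi - lo))
    cand = rng.uniform(lo, hi, size=(m, 3))
    accept_prob = np.maximum(k - intensity(cand), 0.0) / k
    added = cand[rng.uniform(size=m) < accept_prob]
    return np.vstack([thinned, added])

# Toy check: data simulated with intensity 200 * x on the unit cube.
rng = np.random.default_rng(1)

def toy_intensity(p):
    return 200.0 * p[:, 0]

unit_cube = ((0.0, 1.0), (0.0, 1.0), (0.0, 1.0))
raw = rng.uniform(size=(rng.poisson(200), 3))
observed = raw[rng.uniform(size=len(raw)) < raw[:, 0]]
residuals = super_thin(observed, toy_intensity, k=100.0, bounds=unit_cube, rng=rng)
```

The resulting residual points can then be passed to a homogeneity diagnostic such as the centered L-function. The choice of k trades off how many observed points are removed against how many simulated points are added.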
6 Conclusion
We introduce a fast algorithm that uses a multi-dimensional Gaussian-type exponent approximation to estimate the background intensity and triggering effect of a space-time self-exciting point process model from space-time datasets. Compared with other estimation methods for space-time self-exciting models, this method offers advantages in flexibility and computational efficiency. The method uses the BIC to select the model order, which can be either single component or multi-component. The proposed method demonstrates strong estimation performance in both the simulation studies and the burglary data application. Although some parameter estimates exhibit slight bias, the model still performs well despite not incorporating the spatial correlation parameter of the Gaussian function. As the data size increases, the estimates improve and approximate the true values more closely.
Furthermore, we demonstrate how space-time self-exciting point processes can be applied to model burglary crime. The proposed model captures the clustering behavior of the burglary data. This clustering reflects the underlying mechanisms of event triggering and propagation, in which prior burglary incidents increase the short-term risk of subsequent crimes in nearby locations. This behavior illustrates how information about criminal activity cascades both spatially and temporally, influencing offender decision-making. The proposed method is not limited to burglary; it is also applicable to other areas involving count data characterized by self-excitation or event cascades, such as earthquakes (aftershock sequences), financial transactions (volatility clustering), and information diffusion processes. In these settings, efficiently capturing triggering effects induced by past events is essential for both interpretation and prediction. A current limitation of the model is its assumption of independence among spatial coordinates, as it does not include a spatial covariance structure. Although spatial dependence is illustrated in the simulation study through a spatial correlation parameter, this dependence is not modeled in the empirical estimation.
For future work, we aim to explore separable temporal components for the background and triggering intensity functions and to investigate other properties of the model through second-order residual analysis. We also plan to investigate the model’s covariance structure using the Gaussian mixture model (GMM) and, as a next step, to introduce covariance differentials.
References
- 1. Diggle PJ. Spatio-temporal point processes, partial likelihood, foot and mouth disease. Stat Methods Med Res. 2006;15(4):325–36. pmid:16886734
- 2. Hawkes AG. Spectra of some self-exciting and mutually exciting point processes. Biometrika. 1971;58(1):83–90.
- 3. Mohler GO, Short MB, Brantingham PJ, Schoenberg FP, Tita GE. Self-Exciting Point Process Modeling of Crime. Journal of the American Statistical Association. 2011;106(493):100–8.
- 4. Reinhart A. Point process modeling with spatiotemporal covariates for predicting crime. Carnegie Mellon University. 2016.
- 5. Ogata Y. Space-Time Point-Process Models for Earthquake Occurrences. Annals of the Institute of Statistical Mathematics. 1998;50(2):379–402.
- 6. Ogata Y. Statistical Models for Earthquake Occurrences and Residual Analysis for Point Processes. Journal of the American Statistical Association. 1988;83(401):9–27.
- 7. Johnson SD. Repeat burglary victimisation: a tale of two theories. J Exp Criminol. 2008;4(3):215–40.
- 8. Bennett T. Identifying, explaining, and targeting burglary ‘hot spots’. European Journal on Criminal Policy and Research. 1995;3(3):113–23.
- 9. Mohler G. Marked point process hotspot maps for homicide and gun crime prediction in Chicago. International Journal of Forecasting. 2014;30(3):491–7.
- 10. Mohler GO, Short MB, Malinowski S, Johnson M, Tita GE, Bertozzi AL, et al. Randomized Controlled Field Trials of Predictive Policing. Journal of the American Statistical Association. 2015;110(512):1399–411.
- 11. Rosser G, Cheng T. Improving the Robustness and Accuracy of Crime Prediction with the Self-Exciting Point Process Through Isotropic Triggering. Appl Spatial Analysis. 2016;12(1):5–25.
- 12. Zhou K, Zha HY, Song L. In: 2013. 1301–9.
- 13. Xu HT, Farajtabar M, Zha HY. In: 2016. 1717–26.
- 14. Bacry E, Muzy J-F. First- and Second-Order Statistics Characterization of Hawkes Processes and Non-Parametric Estimation. IEEE Trans Inform Theory. 2016;62(4):2184–202.
- 15. Zhuang J, Ogata Y, Vere-Jones D. Stochastic Declustering of Space-Time Earthquake Occurrences. Journal of the American Statistical Association. 2002;97(458):369–80.
- 16. Marsan D, Lengliné O. Extending earthquakes’ reach through cascading. Science. 2008;319(5866):1076–9. pmid:18292339
- 17. Fox EW, Schoenberg FP, Gordon JS. Spatially inhomogeneous background rate estimators and uncertainty quantification for nonparametric Hawkes point process models of earthquake occurrences. Ann Appl Stat. 2016;10(3).
- 18. Yang YX, Etesami J, He N, Kiyavash N. Nonparametric Hawkes processes: online estimation and generalization bounds. arXiv preprint. 2018.
- 19. Zhuang J. Next-day earthquake forecasts for the Japan region generated by the ETAS model. Earth Planet Sp. 2011;63(3):207–16.
- 20. Bacry E, Dayri K, Muzy JF. Non-parametric kernel estimation for symmetric Hawkes processes. Application to high frequency financial data. The European Physical Journal B. 2012;85(5):157.
- 21. Bacry E, Gaïffas S, Muzy JF. A generalization error bound for sparse and low-rank multivariate Hawkes processes. In: 2015. https://arxiv.org/abs/1501.00725
- 22. Fox EW, Short MB, Schoenberg FP, Coronges KD, Bertozzi AL. Modeling E-mail Networks and Inferring Leadership Using Self-Exciting Point Processes. Journal of the American Statistical Association. 2016;111(514):564–84.
- 23. Zhuang J, Ogata Y, Vere‐Jones D. Analyzing earthquake clustering features by using stochastic reconstruction. J Geophys Res. 2004;109(B5).
- 24. Schoenberg FP. Multidimensional Residual Analysis of Point Process Models for Earthquake Occurrences. Journal of the American Statistical Association. 2003;98(464):789–95.
- 25. Diggle PJ. Statistical analysis of spatial and spatio-temporal point patterns. CRC Press. 2013.
- 26. Veen A, Schoenberg FP. Estimation of Space–Time Branching Process Models in Seismology Using an EM–Type Algorithm. Journal of the American Statistical Association. 2008;103(482):614–24.
- 27. Vere-Jones D. Forecasting earthquakes and earthquake risk. International Journal of Forecasting. 1995;11(4):503–38.
- 28. Liniger TJ. Multivariate Hawkes processes. ETH Zurich. 2009.
- 29. Daley DJ, Vere-Jones D. An introduction to the theory of point processes: volume I: Elementary Theory and Methods. New York: Springer; 2003.
- 30. Ogata Y. On Lewis’ simulation method for point processes. IEEE Trans Inform Theory. 1981;27(1):23–31.
- 31. Lewis E, Mohler G. A nonparametric EM algorithm for multiscale Hawkes processes. Journal of Nonparametric Statistics. 2011;1(1):1–20.
- 32. Lewis EA. Estimation techniques for self-exciting point processes with applications to criminal behavior. University of California, Los Angeles. 2012.
- 33. Reinhart A. A review of self-exciting spatio-temporal point processes and their applications. arXiv preprint. 2017.
- 34. Ogata Y, Katsura K, Tanemura M. Modelling Heterogeneous Space–Time Occurrences of Earthquakes and its Residual Analysis. Journal of the Royal Statistical Society Series C: Applied Statistics. 2003;52(4):499–509.
- 35. Bray A, Wong K, Barr CD, Schoenberg FP. Voronoi residual analysis of spatial point process models with applications to California earthquake forecasts. Ann Appl Stat. 2014;8(4).
- 36. Bray A, Schoenberg FP. Assessment of Point Process Models for Earthquake Forecasting. Statist Sci. 2013;28(4).
- 37. Clements RA, Schoenberg FP, Veen A. Evaluation of space–time point process models using super‐thinning. Environmetrics. 2012;23(7):606–16.
- 38. Ripley BD. Modelling Spatial Patterns. Journal of the Royal Statistical Society Series B: Statistical Methodology. 1977;39(2):172–92.
- 39. Clark NJ, Dixon PM. Extended Laplace approximation for self-exciting spatio-temporal models of count data. Spatial Statistics. 2023;56:100762.
- 40. Li C, Song Z, Wang X. Nonparametric Method for Modeling Clustering Phenomena in Emergency Calls Under Spatial-Temporal Self-Exciting Point Processes. IEEE Access. 2019;7:24865–76.
- 41. Zhuang J. Second-Order Residual Analysis of Spatiotemporal Point Processes and Applications in Model Evaluation. Journal of the Royal Statistical Society Series B: Statistical Methodology. 2006;68(4):635–53.
- 42. Singh U. Chicago Crime Mapping; 2019. cited 2024-12-11. https://www.kaggle.com/code/uds5501/chicago-crime-mapping/input