The nhppp package for simulating non-homogeneous Poisson point processes in R

  • Thomas A. Trikalinos ,

    Roles Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Software, Supervision, Validation, Writing – original draft, Writing – review & editing

    thomas_trikalinos@brown.edu

    Affiliations Center for Evidence Synthesis in Health, Brown University, Providence, RI, United States of America, Department of Health Services, Policy & Practice, Brown University, Providence, RI, United States of America, Department of Biostatistics, Brown University, Providence, RI, United States of America

  • Yuliia Sereda

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Project administration, Software, Writing – original draft, Writing – review & editing

    Affiliation Center for Evidence Synthesis in Health, Brown University, Providence, RI, United States of America

Abstract

We introduce the nhppp package for simulating events from one dimensional non-homogeneous Poisson point processes (NHPPPs) in R fast and with a small memory footprint. We developed it to facilitate the sampling of event times in discrete event and statistical simulations. The package’s functions are based on three algorithms that provably sample from a target NHPPP: the time-transformation of a homogeneous Poisson process (of intensity one) via the inverse of the integrated intensity function; the generation of a Poisson number of order statistics from a fixed density function; and the thinning of a majorizing NHPPP via an acceptance-rejection scheme. We present a study of numerical accuracy and time performance of the algorithms. We illustrate use with simple reproducible examples.

1 Introduction

It is often desirable to simulate series of events (stochastic point processes) so that the intensity of their occurrence varies over time. Examples include events such as the occurrence of death and occurrences of symptoms, infections, or tumors over a person’s lifetime. The non-homogeneous Poisson point process (NHPPP), which generalizes the simpler homogeneous-Poisson, Weibull, and Gompertz point processes, is a widely used model for such series of events. NHPPPs can model complicated event patterns given a suitable intensity function. They are, therefore, useful in statistical and mathematical model simulation.

An NHPPP has the properties that the numbers of events in non-overlapping time intervals are independent random variables and that, within each time interval, the number of events is Poisson distributed. Thus an NHPPP is a memoryless point process. A large number of phenomena may reasonably conform with these properties.

NHPPPs have been used in the simulation analysis of queues in queuing theory and operations research [1, 2]; hospital operations [3]; ambulance services [4, 5]; traffic accidents [6]; product and network reliability [7]; and the modeling of cancer [8–11], heart disease [12], and dementia [13, 14], among other applications [15]. NHPPPs are used so widely in part because their assumptions are often plausible. For example, when modeling traffic accidents along a road, it may be plausible to assume that individual accidents are independent of each other, but they happen in some locations more often because the probability of an accident depends on local aspects of the road, such as turns, slopes, and propensity for slippery conditions. Similarly, when modeling the impact of screening strategies on colorectal cancer outcomes at the population level, it is plausible to assume that, for each person, the emergence of precancerous lesions (adenomas) over a time interval is independent of whether such lesions emerged in other non-overlapping time intervals. In these examples, the intensity of event occurrence over the carrier space (the probability of a traffic accident along a road; and the probability that an adenoma will emerge in a person’s colon at different ages) is captured by the NHPPP’s intensity function. An NHPPP can model complicated event patterns using intensity functions that vary over the carrier space (e.g., length of road, time).

The nhppp package in R contains functions for the simulation of NHPPPs over a one-dimensional carrier space, which we will take to represent time [16]. Table 1 summarizes the functions implemented in nhppp as of version 0.1.4. You can install the development version of nhppp with

R> # install.packages("devtools")
R> devtools::install_github("bladder-ca/nhppp")

or the release version with

R> install.packages("nhppp")

We review NHPPPs in Section 2 and algorithms for sampling from constant rate Poisson point processes in Section 3. We introduce the three sampling algorithms that are implemented in the package in Section 4. We discuss special functional forms for the intensity function (constant, piecewise constant, linear, and log-linear) in Section 5. We compare nhppp with other R packages that can simulate from one dimensional NHPPPs in Section 6 and present a numerical study in Section 7. We summarize in Section 8.

2 The Poisson point process

2.1 Definition

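The Poisson point process is a stochastic series of events on the real line. For some sequence of events, let N(t, t + Δt) be the number of events in the interval (t, t + Δt]. If, for some positive intensity λ and as Δt → 0,

$$
\begin{aligned}
\Pr[N(t, t + \Delta t) = 0] &= 1 - \lambda \Delta t + o(\Delta t), \\
\Pr[N(t, t + \Delta t) = 1] &= \lambda \Delta t + o(\Delta t), \\
\Pr[N(t, t + \Delta t) > 1] &= o(\Delta t), \text{ and} \\
N(t, t + \Delta t) &\perp N(0, t),
\end{aligned}
\tag{1}
$$

then that sequence of events is a Poisson point process. In Eq (1), the third statement demands that events occur one at a time. The fourth statement implies that the process is memoryless: for any time t0, the behavior of the process is independent of what happened before that time.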

2.2 Homogeneous Poisson point process and counting process

Assume that the next event after time t0 happens at time t0 + X. It follows from the above definition (see [17, par. 4.1]) that, for a constant λ, X is exponentially distributed with mean λ⁻¹,

$$
X \sim \mathrm{Exponential}(\lambda^{-1}), \tag{2}
$$

and that the number of events is Poisson distributed over the bounded interval (a, b], i.e.,

$$
N(a, b) \sim \mathrm{Poisson}(\lambda (b - a)). \tag{3}
$$

Eq (2) generates the homogeneous Poisson point process Z1 = t0 + X1, Z2 = Z1 + X2, …, where Zi is the time of arrival of event i and Xi the inter-arrival times. We will use Z(j) to denote the event in position j when events are ordered in increasing time. Eq (3) describes the corresponding (dual) counting process N1 = N(t0, Z1), N2 = N(t0, Z2), …, where Ni is the total number of events from time t0 to time Zi. The point process (the sequence [Zi] of event times) and the counting process (the sequence [Ni] of cumulants) are two sides of the same coin.

Sampling from the constant rate point process in (2) is discussed in Section 3.

2.3 Non homogeneous Poisson point process and counting process

When the intensity function changes over time, the homogeneous Poisson point process generalizes to its non-stationary counterpart, an NHPPP, with intensity function λ(t) > 0. For details see [17, par. 4.2]. Then, the number of events over the interval (a, b] becomes

$$
N(a, b) \sim \mathrm{Poisson}(\Lambda(a, b)), \tag{4}
$$

where $\Lambda(a, b) = \int_a^b \lambda(t)\,\mathrm{d}t$ is the integrated intensity or cumulative intensity of the NHPPP. Eq (4) describes the counting process of the NHPPP, which in turn implies a stochastic point process, that is, a distribution of events over time.

Here the simulation task is to sample event times from the point process that corresponds to intensity function λ(t), or equivalently, to the integrated intensity function (Section 4). (With some abuse of notation, we define Λ(t) ≔ Λ(0, t) when a = 0.)

2.3.1 A note on zero intensity processes.

In (1), λ is strictly positive but in nhppp we allow it to be non-negative. If λ = 0, Pr[N(t, t + Δt) = 0] = 1 and Pr[N(t, t + Δt) ≥ 1] = 0. This means that no events occur and the stochastic point process in the interval (t, t + Δt] is degenerate. Allowing λ(t) ≥ 0 has no bearing on the results of simulations. If the intensity is positive on two intervals and zero on an interval between them, we can always ignore the middle interval, in which no events happen.

2.4 Properties that are important for simulation

2.4.1 Composability and decomposability of NHPPPs.

The definition (1) implies that NHPPPs are composable [17, par. 4.2]: merging two NHPPPs with intensity functions λ1(t), λ2(t) yields a new NHPPP with intensity function λ(t) = λ1(t) + λ2(t). The converse is also true: one can decompose an NHPPP with intensity function λ(t) into two NHPPPs, one with intensity function λ1(t) < λ(t) and one with intensity function λ2(t) = λ(t) − λ1(t). An induction argument extends the above to merging and decomposing three or more processes.

The composability and decomposability properties are important for simulation because they

  • give the flexibility either to simulate several parallel NHPPPs independently or to merge them, simulate from the merged process, and then attribute the realized events to the component processes by assigning the i-th event to the j-th process with probability λj(Zi)/λ(Zi), where λ(t) = ∑ λj(t).
  • motivate a general sampling algorithm (Algorithm 4, “thinning” [18]) that simulates a target NHPPP with intensity λ1(t) by first drawing events from an easy-to-sample NHPPP with intensity λ(t) ≥ λ1(t), and then accepting sample i with probability λ1(Zi)/λ(Zi).
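
As a hedged illustration of the merge-then-attribute idea in the first bullet, the base R sketch below uses two constant intensities so that the merged process is easy to draw. The names rate1, rate2, z1, and z2 are hypothetical and not part of nhppp. With time-varying intensities, the attribution probability would instead be evaluated at each event time.

```r
# Sketch only: merge two constant-rate Poisson processes, then attribute events.
set.seed(1)
a <- 0
b <- 100
rate1 <- 1               # hypothetical intensity of component process 1
rate2 <- 3               # hypothetical intensity of component process 2
rate <- rate1 + rate2    # intensity of the merged process

# Draw the merged process: Poisson count, then sorted uniform event times
n <- rpois(1, rate * (b - a))
z <- sort(runif(n, min = a, max = b))

# Attribute each event to component 1 with probability rate1 / rate
from_1 <- runif(n) < rate1 / rate
z1 <- z[from_1]     # events attributed to the process with intensity rate1
z2 <- z[!from_1]    # events attributed to the process with intensity rate2

c(n1 = length(z1), n2 = length(z2))
```

Roughly one quarter of the merged events should be attributed to the first component here, because rate1 / rate = 0.25.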

2.4.2 Transformations of the time axis.

Strictly monotonic transformations of the carrier space of an NHPPP yield an NHPPP [19]. Consider an NHPPP with intensity function λ(t) and a strictly monotonic transformation of the time axis u: t ↦ τ that is differentiable once almost everywhere. On the transformed time axis the point process is an NHPPP with intensity function

$$
\rho(\tau) = \lambda\big(u^{-1}(\tau)\big) \left| \frac{\mathrm{d}u}{\mathrm{d}t}\big(u^{-1}(\tau)\big) \right|^{-1}. \tag{5}
$$

This property is important for simulation because

  • it motivates the use of another general sampling algorithm (Algorithm 5, “time transformation” or “inversion”, [19]): A smart choice for u yields an easy to sample point process. The event times in the original time scale can be obtained as Zi = u⁻¹(ζi), where ζi is the i-th event in the transformed time axis and u⁻¹ is the inverse function of u.
  • given that at least i events have been realized in the time interval (a, b], it makes it possible to draw events Z(j), j < i given event Z(i). This is useful for simulating earlier events conditional on the occurrence of a subsequent event. Choosing u(t) ≔ Z(i) − t makes the time count backwards from Z(i). In this reversed clock we draw as if in forward time exactly i − 1 events ζ(1), ζ(2), …, ζ(i−1). Back transforming yields all preceding events.

Table 2 summarizes the common simulation tasks, such as simulating single events (at most one, exactly one), a series of events (possibly demanding the occurrence of at least one event), or the occurrence of a prior (event i − 1 given Z(i)). The nhppp package implements functions to simulate these tasks for general λ(t) or Λ(t).

Table 2. Common simulation needs in discrete event simulation.

https://doi.org/10.1371/journal.pone.0311311.t002

3 Sampling the constant rate Poisson process

Sampling the constant rate Poisson process is straightforward. Algorithms 1 and 2 are two ways to sample event times in interval (a, b] with constant intensity λ. Algorithm 3 describes sampling event times conditional on observing at least m events within the interval of interest.

3.1 Sequential sampling

Algorithm 1 samples events sequentially, using the fact that the inter-event times Xi are exponentially distributed with mean λ⁻¹ [17, par. 4.1]. It involves generation only of exponential random variates, which is cheap on modern hardware. To sample at most k events, change the condition for the while loop in line 3 to “while t < b and |𝒵| < k”, where 𝒵 is the ordered set of events sampled so far.

The package’s ppp_sequential() function implements constant-rate sequential sampling that returns a vector with zero or more event times in the interval [a, b). The range_t argument is a two-values vector with the bounds a, b. Setting the optional argument atmost1 to TRUE from its default value of FALSE returns the first event or an empty vector, depending on whether at least one event is drawn in the interval.

Algorithm 1 Sequential sampling of events in interval (a, b] with constant intensity λ.

Require: t ∈ (a, b]
 1: t ← a
 2: 𝒵 ← ∅     ▹ 𝒵 is an ordered set
 3: while t < b do     ▹ Up to k earliest points: while t < b and |𝒵| < k do
 4:  X ← X ∼ Exponential(λ⁻¹)      ▹ Mean-parameterized
 5:  t ← t + X
 6:  if t < b then
 7:   𝒵 ← 𝒵 ∪ {t}
 8:  end if
 9: end while
 10: return 𝒵

R> library("nhppp")
R> ppp_sequential(range_t = c(7, 10), rate = 1, atmost1 = FALSE)
[1] 7.673885 8.650502 9.011229 9.407575

nhppp functions can accept a user provided random number stream object via the rng_stream option.

R> library("rstream")
R> S <- new("rstream.mrg32k3a")
R> ppp_sequential(range_t = c(7, 10), rate = 1, rng_stream = S)
[1] 8.793702
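
For readers who want to see the mechanics of Algorithm 1, here is a minimal base R sketch. It is not the package's implementation. The function name ppp_sequential_sketch() and its argument k (a cap on the number of returned events) are hypothetical and used only for illustration.

```r
# Sketch only: sequential sampling with constant intensity `rate` on (a, b).
ppp_sequential_sketch <- function(a, b, rate, k = Inf) {
  stopifnot(b > a, rate > 0)
  z <- numeric(0)   # ordered set of accepted event times
  t <- a
  while (t < b && length(z) < k) {
    # rexp() is rate-parameterized, so the mean inter-event time is 1 / rate
    t <- t + rexp(1, rate = rate)
    if (t < b) z <- c(z, t)
  }
  z
}

set.seed(42)
z_seq <- ppp_sequential_sketch(a = 7, b = 10, rate = 1)
z_first <- ppp_sequential_sketch(a = 7, b = 10, rate = 1, k = 1)
```

Growing the vector with c() keeps the sketch close to the pseudocode. For long series, preallocating or sampling in batches would likely be faster.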

3.2 Sampling using order statistics

Algorithm 2 Sampling events in interval (a, b] with constant intensity λ using order statistics.

Require: t ∈ (a, b]
 1: N ← N ∼ Poisson(λ(b − a))
 2: t ← a
 3: 𝒵 ← ∅     ▹ 𝒵 is an ordered set
 4: if N > 0 then
 5:  for i ∈ [N] do:
 6:   Ui ← Ui ∼ Uniform(0, 1)     ▹ Generate order statistics
 7:   Zi ← a + (b − a)Ui
 8:  end for
 9:  𝒵 ← sort({Zi | i ∈ [N]})
 10: end if
 11: return 𝒵    ▹ Up to k earliest points: return {Z(i) | i ≤ min(k, N)}

Algorithm 2 first draws the number of events in (a, b] from a Poisson distribution. Conditional on the number of events, the event times Zi are uniformly distributed over (a, b] [17, par. 4.1]. The algorithm returns the order statistics [Z(i)], obtained by sorting the event times [Zi] in ascending order. It is necessary to generate all event times to generate the order statistics. Thus, to sample at most k event times we should return the earliest k event times, and line 11 of the Algorithm would be changed to “return {Z(i) | i ≤ min(k, N)}”.

The ppp_orderstat() function implements constant-rate sampling via the order-statistics algorithm.

R> ppp_orderstat(range_t = c(3.14, 6.28), rate = 1/2)
[1] 3.141663 5.700931
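
The following base R sketch mirrors the steps of Algorithm 2 under the description above (Poisson count, uniform event times, sorting). It is an illustration, not the code behind ppp_orderstat(). The name ppp_orderstat_sketch() and the argument k are hypothetical.

```r
# Sketch only: order-statistics sampling with constant intensity `rate`.
ppp_orderstat_sketch <- function(a, b, rate, k = Inf) {
  stopifnot(b > a, rate >= 0)
  n <- rpois(1, lambda = rate * (b - a))   # number of events in the interval
  z <- sort(a + (b - a) * runif(n))        # sorted uniform event times
  z[seq_len(min(k, n))]                    # keep up to the k earliest events
}

set.seed(2024)
z_os <- ppp_orderstat_sketch(a = 3.14, b = 6.28, rate = 1 / 2)
z_os_first2 <- ppp_orderstat_sketch(a = 3.14, b = 6.28, rate = 5, k = 2)
```

Note that all n event times are generated before truncating to k, as the text explains.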

3.3 Sampling conditional on observing at least m events

Algorithm 3 Sampling with constant intensity λ conditional that at least m events occurred in interval (a, b]. Relies on generating order statistics analogously to Algorithm 2.

Require: t ∈ (a, b]
 1: N ← N ∼ TruncatedPoisson_{N≥m}(λ(b − a))      ▹ (m − 1)-truncated Poisson
 2: t ← a
 3: 𝒵 ← ∅     ▹ 𝒵 is an ordered set
 4: if N > 0 then
 5:  for i ∈ [N] do:
 6:   Ui ← Ui ∼ Uniform(0, 1)     ▹ Generate order statistics
 7:   Zi ← a + (b − a)Ui
 8:  end for
 9:  𝒵 ← sort({Zi | i ∈ [N]})
 10: end if
 11: return 𝒵     ▹ Up to k earliest points: return {Z(i) | i ≤ min(k, N)}

Algorithm 3 is used to generate a point process conditional on observing at least m events. For example, if λ is the intensity of tumor generation, it can be used to simulate times of tumor emergence among patients with at least one (m = 1) tumor. To return the up to k earliest events, we modify line 11 the same way as for Algorithm 2. As an example, in a lifetime simulation we can sample the time of all-cause death by setting in Algorithm 3 m = 1, so that at least one event will happen in (a, b], and k = 1, to sample only the time of the first event Z(1).

To sample exactly m events, change line 1 of Algorithm 3 to N ← m.

Function ztppp() simulates times conditional on drawing at least one event—i.e., setting m = 1 in Algorithm 3 to sample from a zero truncated Poisson distribution in line 1.

R> ztppp(range_t = c(0, 10), rate = 0.001, atmost1 = FALSE)
[1] 4.411277

Function ppp_n() simulates times conditional on drawing exactly m events.

R> ppp_n(size = 4, range_t = c(0, 10))
[1] 1.762014 2.902897 6.751627 9.733794
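
One way to realize line 1 of Algorithm 3 in base R is an inverse-CDF draw restricted to the upper tail of the Poisson distribution, sketched below. This is an assumption about how such a draw could be done, not a description of how ztppp() works internally. rtruncpois_sketch() and ztppp_sketch() are hypothetical names.

```r
# Sketch only: draw N ~ Poisson(mu) conditional on N >= m by inverse CDF.
rtruncpois_sketch <- function(mu, m = 1) {
  stopifnot(mu > 0, m >= 0)
  p_below <- ppois(m - 1, lambda = mu)     # Pr[N <= m - 1]
  u <- runif(1, min = p_below, max = 1)    # uniform on the admissible tail
  # pmax() guards against a boundary draw returning a value below m
  pmax(m, qpois(u, lambda = mu))
}

# Event times conditional on at least m events, constant intensity `rate`
ztppp_sketch <- function(a, b, rate, m = 1) {
  n <- rtruncpois_sketch(rate * (b - a), m = m)
  sort(a + (b - a) * runif(n))
}

set.seed(7)
z_zt <- ztppp_sketch(a = 0, b = 10, rate = 0.001)        # at least one event
z_m3 <- ztppp_sketch(a = 0, b = 10, rate = 0.001, m = 3) # at least three events
```

For extremely small mu the tail probability 1 − p_below can lose precision in floating point, so a production implementation would likely need a more careful sampler.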

4 The general sampling algorithms used in nhppp

The nhppp package uses three well known general sampling algorithms, namely thinning, time transformation or inversion, and order-statistics. These algorithms are efficiently combined to sample from special cases, including cases where the intensity function is a piecewise constant, linear, or log-linear function of time, as described in Section 5.

The thinning algorithm works with the intensity function λ(t), which is commonly available. The inversion and order statistics algorithms have smaller computational cost than the thinning algorithm, but work with the integrated intensity function Λ(t) and its inverse Λ⁻¹(z), which may not be available. The generic function draw() is a wrapper function that dispatches to specialized functions depending on the provided arguments. It is useful for general tasks but the specialized functions are probably faster.

R> l <- function(t) t
R> L <- function(t) 0.5 * t^2
R> Li <- function(z) sqrt(2 * z)
R> draw(
+    lambda = l, lambda_maj = l(10), range_t = c(5, 10),
+    atmost1 = FALSE, atleast1 = FALSE
+  ) |> head(n = 5)
[1] 5.179473 5.374814 5.957391 5.992196 6.101935
R> draw(
+    Lambda = L, Lambda_inv = Li, range_t = c(5, 10),
+    atmost1 = FALSE, atleast1 = FALSE
+  ) |> head(n = 5)
[1] 5.219264 5.230747 5.369646 5.398531 5.618079

4.1 The thinning algorithm

The thinning algorithm relies on the decomposability of NHPPPs (Section 2.4) and is described in [18]. Let the target NHPPP have intensity function λ(t) and λ*(t) ≥ λ(t) for all t ∈ (a, b] be a majorizing intensity function. Think of the majorizing function as an easy-to-sample function which is the sum of the intensity of the target point process λ(t) and the intensity λreject(t) of its complementary point process, λ*(t) = λ(t) + λreject(t).

The acceptance-rejection scheme in Algorithm 4 generates proposal samples with intensity function λ*(t) and stochastically attributes them to the target process (to keep, with probability λ(Z)/λ*(Z)) or its complement.

Algorithm 4 The thinning algorithm for sampling from λ(t).

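Require:
  λ*(t) ≥ λ(t) ∀ t ∈ (a, b]    ▹ majorizing intensity function
  𝒵* sampled from the NHPPP with intensity λ*(t) over (a, b]    ▹ 𝒵* is an ordered set
 1: N ← |𝒵*|
 2: 𝒵 ← ∅     ▹ 𝒵 is an ordered set
 3: if N > 0 then
 4:  for i ∈ [N] do:
 5:   Ui ← Ui ∼ Uniform(0, 1)
 6:   if Ui < λ(Z*(i))/λ*(Z*(i)) then
 7:    𝒵 ← 𝒵 ∪ {Z*(i)}
 8:   end if
 9:  end for
 10: end if
 11: return 𝒵     ▹ Up to k earliest points: return {Z(i) | i ≤ min(k, |𝒵|)}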

To sample the earliest k points, one can exit the for loop in lines 4–9 when k events have been sampled in line 7, or, alternatively, return the first up to k points in line 11.
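
The acceptance-rejection step can be sketched in a few lines of base R. The version below assumes a constant majorizer so that the proposal process is trivial to draw. It is a simplified illustration rather than the package's implementation, and thin_sketch() is a hypothetical name.

```r
# Sketch only: thinning with a constant majorizer `lambda_maj` on (a, b].
thin_sketch <- function(lambda, lambda_maj, a, b) {
  stopifnot(b > a, lambda_maj > 0)
  # Proposals from the majorizing (constant-rate) process
  n <- rpois(1, lambda_maj * (b - a))
  z_star <- sort(runif(n, min = a, max = b))
  # Keep proposal i with probability lambda(z_star[i]) / lambda_maj
  keep <- runif(n) < lambda(z_star) / lambda_maj
  z_star[keep]
}

set.seed(3)
lambda_fun <- function(t) exp(0.02 * t)   # target intensity, increasing in t
z_thin <- thin_sketch(lambda_fun, lambda_maj = lambda_fun(10), a = 0, b = 10)
```

Because lambda_fun() is increasing, its value at the upper bound of the interval is a valid constant majorizer.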

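A measure of the efficiency of Algorithm 4 is the proportion of samples that are accepted, which is

$$
\frac{\Lambda(b) - \Lambda(a)}{\Lambda^*(b) - \Lambda^*(a)} = \frac{\int_a^b \lambda(t)\,\mathrm{d}t}{\int_a^b \lambda^*(t)\,\mathrm{d}t} \tag{6}
$$

on average, where Λ*(t) is the integrated majorizing intensity. Thus, the closer λ*(t) is to λ(t), the more efficient the algorithm.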

In practice, λ*(t) can be chosen as one of the special cases in Section 5, for which we have fast sampling algorithms. For example, it can be a piecewise constant majorizer. Algorithm A in S1 Appendix can automatically generate a piecewise constant majorizer function for intensity functions that are monotonic and possibly non-continuous or Lipschitz continuous and possibly non-monotonic.

The nhppp package has functions that sample from time-varying intensity functions. The first function, draw_intensity(), expects a user-provided linear (λ*(t) = α + βt) or log-linear (λ*(t) = exp(α + βt)) majorizer function.

R> lambda_fun <- function(t) exp(0.02 * t)
R> draw_intensity(
+    lambda = lambda_fun, # linear majorizer
+    lambda_maj = c(intercept = 1.01, slope = 0.03),
+    exp_maj = FALSE, range_t = c(0, 10)
+  ) |> head (n = 5)
[1] 1.310245 2.094217 2.908682 3.268384 8.007606
R> draw_intensity(
+    lambda = lambda_fun, # log-linear majorizer
+    lambda_maj = c(intercept = 0.01, slope = 0.03),
+    exp_maj = TRUE, range_t = c(0, 10)
+  ) |> head (n = 5)
[1] 0.3406743 0.6079479 0.8441584 2.6424551 3.3185387

The second function, draw_intensity_step(), expects a user-provided piecewise constant majorizer, which is specified as a vector of length M + 1 with the subinterval bounds and a vector of length M with the majorizer values in each subinterval of (a, b]. For example, the following code splits the interval (0, 10] into M = 10 subintervals of length one. Because lambda_fun() is strictly increasing, its value at the upper bound of each subinterval is its supremum over that subinterval.

R> draw_intensity_step(
+    lambda = lambda_fun,
+    lambda_maj_vector = lambda_fun(1:10), # 1:10 (10 intensity values)
+    times_vector = 0:10 # 0:10 (11 interval bounds)
+  ) |> head(n = 5)
[1] 0.3825378 7.0822941 7.7839779 8.7766992 8.9554954

4.2 The time transformation or inversion algorithm

Algorithm 5 implements the time transformation or inversion algorithm from [19] and [17, par. 4.2]. As mentioned in Section 2.4, strictly monotonic transformations of the carrier space (here, time) of a Poisson point process yield another Poisson point process. In Eq (5), choosing the transformation τ = u(t) = Λ(t), so that du/dt = λ(t), results in ρ(τ) = 1.

This means (proof sketched in [17, par. 4.2]) that we can sample points from a Poisson point process with intensity one over the interval (τa, τb] = (Λ(a), Λ(b)]. Via a similar argument, we transform event times sampled on the transformed scale back to the original scale using g(τ) = Λ⁻¹(τ). The transformations u(⋅), g(⋅) are not unique, at least up to the group of affine transformations.

Function draw_cumulative_intensity_inversion() works with a cumulative intensity function Λ(t) and its inverse Λ⁻¹(z), if available. If the inverse function is not available (argument Lambda_inv = NULL), the Brent bisection algorithm is used to invert Λ(t) numerically, at a performance cost [20].

R> Lambda_fun <- function(t) 50 * exp(0.02 * t) - 50
R> Lambda_inv_fun <- function(z) 50 * log((z + 50) / 50)
R> draw_cumulative_intensity_inversion(
+    Lambda = Lambda_fun,
+    Lambda_inv = Lambda_inv_fun,
+    range_t = c(5, 10.5),
+    range_L = Lambda_fun(c(5, 10.5))
+  ) |> head(n = 5)
[1]  6.458937  7.608496  9.060817  9.566278 10.076889
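
To give a feel for what numerical inversion of Λ(t) involves, the sketch below uses base R's uniroot(), which is based on Brent's root-finding method. Whether nhppp calls uniroot() internally is not stated here, so treat this purely as an illustration. Lambda_inv_numeric() is a hypothetical name.

```r
# Sketch only: invert a cumulative intensity numerically, one value at a time.
Lambda_fun <- function(t) 50 * exp(0.02 * t) - 50

Lambda_inv_numeric <- function(z, lower, upper) {
  vapply(z, function(zi) {
    # Solve Lambda_fun(t) = zi for t within [lower, upper]
    uniroot(function(t) Lambda_fun(t) - zi,
            lower = lower, upper = upper, tol = 1e-10)$root
  }, numeric(1))
}

t_true <- c(6, 8, 10)
t_back <- Lambda_inv_numeric(Lambda_fun(t_true), lower = 5, upper = 10.5)
max(abs(t_back - t_true))   # should be close to zero
```

Each inversion requires several evaluations of Λ(t), which is why supplying an analytic inverse, when one exists, is cheaper.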

Algorithm 5 The time transformation or inversion algorithm for sampling given Λ(t), Λ⁻¹(z) [17, 19]. The notation PoissonProcess1 indicates sampling event times from a constant rate one Poisson point process.

Require: Λ(t), Λ⁻¹(z), t ∈ (a, b]     ▹ Λ⁻¹(z) possibly numerically
 1: τa ← Λ(a), τb ← Λ(b)
 2: 𝒵τ ← PoissonProcess1(τa, τb)     ▹ From Algorithm 1 (or 3 for conditional sampling)
 3: 𝒵 ← Λ⁻¹(𝒵τ)     ▹ Λ⁻¹(⋅) as set function, meant elementwise
 4: return 𝒵
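
A compact base R rendering of the time-transformation idea is sketched below: sample a rate-one process on the transformed axis and map the event times back with the inverse cumulative intensity. It is a hedged illustration and not the code of draw_cumulative_intensity_inversion(). The function name draw_inversion_sketch() is hypothetical.

```r
# Sketch only: time transformation (inversion) given Lambda and its inverse.
draw_inversion_sketch <- function(Lambda, Lambda_inv, a, b) {
  tau_a <- Lambda(a)
  tau_b <- Lambda(b)
  # Rate-one Poisson process on (tau_a, tau_b) via exponential inter-arrivals
  tau <- numeric(0)
  s <- tau_a
  repeat {
    s <- s + rexp(1, rate = 1)
    if (s >= tau_b) break
    tau <- c(tau, s)
  }
  # Map event times back to the original time scale, elementwise
  Lambda_inv(tau)
}

Lambda_fun <- function(t) 50 * exp(0.02 * t) - 50
Lambda_inv_fun <- function(z) 50 * log((z + 50) / 50)

set.seed(11)
z_inv <- draw_inversion_sketch(Lambda_fun, Lambda_inv_fun, a = 5, b = 10.5)
```

Because Λ⁻¹ is increasing, the back-transformed event times remain in ascending order without an extra sort.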

4.3 The order statistics algorithm

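The general order statistics algorithm (Algorithm 6) is a direct generalization of Algorithm 2. It first draws the number N of realized events. Conditional on N, the unordered event times Zi are independent and identically distributed over (a, b] with cumulative distribution function

$$
\Pr[Z_i \le t \mid N] = \frac{\Lambda(t) - \Lambda(a)}{\Lambda(b) - \Lambda(a)}, \quad t \in (a, b], \tag{7}
$$

as discussed in [18]. Algorithm 6 makes the above explicit.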

Algorithm 6 The order statistics algorithm for sampling from an NHPPP given Λ(t), Λ⁻¹(z).

Require: Λ(t), Λ⁻¹(z), t ∈ (a, b]      ▹ Λ⁻¹(z) possibly numerically
 1: N ← N ∼ Poisson(Λ(b) − Λ(a))
 2: t ← a
 3: 𝒵 ← ∅     ▹ 𝒵 is an ordered set
 4: if N > 0 then
 5:  for i ∈ [N] do:
 6:   Ui ← Ui ∼ Uniform(0, 1)      ▹ Generate order statistics
 7:   Zi ← Λ⁻¹(Λ(a) + Ui(Λ(b) − Λ(a)))
 8:  end for
 9:  𝒵 ← sort({Zi | i ∈ [N]})
 10: end if
 11: return 𝒵      ▹ Up to k earliest points: return {Z(i) | i ≤ min(k, N)}

Sampling up to k earliest points means returning the up to k earliest event times. If Λ(t) is a positive linear function of time, λ is constant and Algorithm 6 becomes Algorithm 2.
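
The order statistics approach can be sketched in base R as follows: draw the count, draw uniforms on the cumulative-intensity scale, and map them through the inverse. This is an illustrative sketch under the description above, not the package's code. draw_orderstats_sketch() and the argument k are hypothetical names.

```r
# Sketch only: general order-statistics sampling given Lambda and its inverse.
draw_orderstats_sketch <- function(Lambda, Lambda_inv, a, b, k = Inf) {
  mass <- Lambda(b) - Lambda(a)        # expected number of events in (a, b]
  n <- rpois(1, mass)
  u <- runif(n)
  # Inverse-CDF draw of each event time, then sort into order statistics
  z <- sort(Lambda_inv(Lambda(a) + u * mass))
  z[seq_len(min(k, n))]
}

Lambda_fun <- function(t) 50 * exp(0.02 * t) - 50
Lambda_inv_fun <- function(z) 50 * log((z + 50) / 50)

set.seed(21)
z_os_gen <- draw_orderstats_sketch(Lambda_fun, Lambda_inv_fun, a = 4.1, b = 7.6)
z_os_k1 <- draw_orderstats_sketch(Lambda_fun, Lambda_inv_fun, a = 4.1, b = 7.6, k = 1)
```

With a linear Λ(t) the inverse-CDF step reduces to a + (b − a) * u, which recovers the constant-rate case.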

To sample conditional on observing at least m events in the interval (a, b] see Algorithm B in S2 Appendix.

Function draw_cumulative_intensity_orderstats() works with a cumulative intensity function Λ(t) and its inverse Λ⁻¹(z), if available. Function ztdraw_cumulative_intensity() conditions that at least one event is sampled in the interval. As above, if the inverse function is not available (argument Lambda_inv = NULL), the Brent bisection algorithm is used to invert Λ(t) numerically, at a performance cost.

R> draw_cumulative_intensity_orderstats(
+    Lambda = Lambda_fun,
+    Lambda_inv = Lambda_inv_fun,
+    range_t = c(4.1, 7.6)
+  )
[1] 5.091581 5.526070 5.601576 5.762498 6.495684
R> ztdraw_cumulative_intensity(
+    Lambda = Lambda_fun,
+    Lambda_inv = Lambda_inv_fun,
+    range_t = c(4.1, 7.6)
+  )
[1] 5.063676 6.682454 6.749162 6.926164 7.298342

5 Special cases

The nhppp package implements several special cases where the intensity function λ(⋅), the integrated intensity function Λ(⋅), and its inverse Λ⁻¹(⋅) have straightforward analytical expressions.

5.1 Sampling a piecewise constant NHPPP

Functions draw_sc_step() and draw_sc_step_regular() sample from NHPPPs with piecewise constant intensity functions based on Algorithm 5. The first can work with unequal-length subintervals (am, bm]. The second results in a small computational time improvement when all subintervals are of equal length.

R> draw_sc_step(
+    lambda_vector = 1:5, times_vector = c(0.5, 1, 2.4, 3.1, 4.9, 5.9),
+    atmost1 = FALSE, atleast1 = FALSE
+  ) |> head(n = 5)
[1] 0.8425117 1.3281115 2.3309443 2.6794560 2.7939130
R> draw_sc_step_regular(
+    lambda_vector = 1:5, range_t = c(0.5, 5.9), atmost1 = FALSE,
+    atleast1 = FALSE
+  ) |> head(n = 5)
[1] 2.058468 2.100620 2.508954 3.125179 3.604882

Function vdraw_sc_step_regular() is a vectorized version of draw_sc_step_regular(). It returns a matrix with one event series per row, and as many columns as the maximum number of events across all draws.

R> vdraw_sc_step_regular(
+    lambda_matrix = matrix(runif(20), ncol = 5), range_t = c(1, 4),
+    atmost1 = FALSE
+  )
         [,1]     [,2]     [,3]     [,4]
[1,] 2.304123 2.802767       NA       NA
[2,] 2.990953       NA       NA       NA
[3,] 1.840374 2.134357 3.784424 3.816034
[4,] 2.136138 2.703826 3.269631       NA

The corresponding functions that return at least one event in the interval are ztdraw_sc_step(), ztdraw_sc_step_regular(), and vztdraw_sc_step_regular().
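
For a piecewise constant intensity the cumulative intensity is piecewise linear, so both Λ and its inverse reduce to linear interpolation between the interval bounds. The base R sketch below illustrates this with the same bounds and intensities as the draw_sc_step() example. It is an assumption-laden illustration, not the package's implementation, and the names Lambda_knots and Lambda_inv_step are hypothetical.

```r
# Sketch only: sampling a piecewise constant NHPPP via a piecewise linear Lambda.
times <- c(0.5, 1, 2.4, 3.1, 4.9, 5.9)   # M + 1 interval bounds
lambdas <- 1:5                            # M intensity values

# Cumulative intensity at the bounds, with Lambda = 0 at the first bound
Lambda_knots <- c(0, cumsum(lambdas * diff(times)))
Lambda_total <- Lambda_knots[length(Lambda_knots)]

# The inverse of a piecewise linear function is linear interpolation with
# the roles of x and y swapped
Lambda_inv_step <- approxfun(x = Lambda_knots, y = times)

set.seed(99)
n <- rpois(1, Lambda_total)                     # number of events
tau <- sort(runif(n, min = 0, max = Lambda_total))  # times on transformed axis
z_step <- Lambda_inv_step(tau)                  # back to the original time axis
```

Here the expected number of events is 1(0.5) + 2(1.4) + 3(0.7) + 4(1.8) + 5(1.0) = 17.6.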

5.2 Sampling NHPPPs with linear and log-linear intensities

Functions draw_sc_linear() and ztdraw_sc_linear() sample zero or more and at least one event, respectively, from NHPPPs with linear intensity functions. An optional argument (atmost1) returns the first event only.

R> draw_sc_linear(alpha = 3, beta = -0.5, range_t = c(0, 10)) |> head(n = 5)
[1] 0.3327657 0.4270154 0.5804320 0.6935027 0.9832093
R> ztdraw_sc_linear(alpha = 0.5, beta = 0.2, range_t = c(9.999, 10))
[1] 9.999757

An analogous set of functions (draw_sc_loglinear() and ztdraw_sc_loglinear()) samples from NHPPPs with log-linear intensity functions, λ(t) = exp(α + βt).

The sampling algorithm is a variation of Algorithm 5, as described in [21]. Example usage follows.

R> draw_sc_loglinear(alpha = 1, beta = -0.02, range_t = c(8, 10))
 [1] 8.028806 8.128887 8.457669 8.483558 8.498647 8.503109 8.522725
 [8] 8.665979 8.671737 8.978065 8.981105 9.493691 9.815000 9.909167
R> ztdraw_sc_loglinear(alpha = 1, beta = -0.02, range_t = c(9, 10))
[1] 9.038160 9.075722 9.238302

6 Comparisons with other R packages

Table 3 lists five R packages that simulate from NHPPPs, including nhppp. We did not consider research code that is not an R package in the Comprehensive R Archive Network or is developed in other languages. For example, we do not run comparisons with the R and Python code for sampling from piecewise constant NHPPPs with regular time intervals in Garibay et al [22]. (Their code corresponds to the vdraw_sc_step_regular() function in nhppp.)

Package reda [23] focuses on recurrent event data analysis and can simulate NHPPPs with the inversion and thinning algorithms using the simEvent() function. It can take function object arguments for λ(t). When using the thinning algorithm, it takes a constant majorizer. For the inversion algorithm, it approximates Λ(t) and its inverse numerically, at a computational cost.

Package simEd [24] includes various functions for simulation education. Function thinning() implements the homonymous algorithm for drawing points from an NHPPP. Users can specify the intensity function and a piecewise constant or linear majorizer function.

Package IndTestPP [25] provides a framework for exploring the dependence between two or more realizations of point processes. It includes the ancillary function simNHPc() for simulating NHPPPs with the inversion or thinning algorithms. The function’s argument is a piecewise constant approximation of the intensity function via a vector of evaluations, each corresponding to unit length subintervals. This resolution may not be adequate to simulate processes that change fast over a unit time interval.

Package NHPoisson [26, 27] fits NHPPP models to data and is not really geared towards mathematical simulation. Its simNHP.fun() function provides the ability for simulation-based inference via an implementation of the inversion algorithm. This function is designed to work with the package’s inference machinery and is not practical to use for simulation, because the user has no direct control over the function’s rescaling of the time axis.

The claimed advantages of nhppp over the existing packages are that

  • It samples from the target NHPPP and not from a numerical approximation thereof, e.g., as IndTestPP does.
  • It can sample conditional on observing at least one event in the interval, which no other package implements.
  • It accepts user-provided random number stream objects, which is useful for implementing simulation variance reduction techniques such as common random numbers [28] and antithetic variates [29].
  • It is fast and memory efficient, both for the non-vectorized functions that are implemented in native R and for the vectorized functions that use C++ plugins via the Rcpp package [30]. nhppp has specialized functions that leverage additional information about the point process, such as Λ(t) and Λ⁻¹(z), when available, which can result in faster simulation.

7 Illustrations

Depending on the application, we may have access to the intensity function or the integrated intensity function. We compared the R packages in Table 3 for sampling from a non-monotonic and highly non-linear intensity function for which the integrated intensity function can be derived analytically.

7.1 The target NHPPP to be simulated

Consider the example

$$
\begin{aligned}
\lambda(t) &= e^{rt}\,(1 + \sin wt), \\
\Lambda(t) &= \frac{e^{rt}\,(r \sin wt - w \cos wt)}{r^2 + w^2} + \frac{e^{rt}}{r} + \frac{w}{r^2 + w^2} - \frac{1}{r},
\end{aligned}
\tag{8}
$$

of a sinusoidal intensity function λ(t) scaled to have an exponential amplitude and one of its antiderivatives Λ(t), with such a constant term that Λ(0) = 0. For the numerical study we set r = 0.2, w = 1, and t ∈ (0, 6π]. There is no analytic inverse function for this example. However, we can precompute Li(), a good numerical approximation to Λ⁻¹(z). We will use it in Section 7.5 to compare the time performance of functions that use the inversion and order statistics algorithms when Λ⁻¹ is available versus not.

R> l <- function(t) (1 + sin(t)) * exp(0.2 * t)
R> L <- function(t) {
+    exp(0.2 * t) * (0.2 * sin(t) - cos(t)) / 1.04 +
+      exp(0.2 * t) / 0.2 - 4.038462
+  }
R> Li <- approxfun(
+    x = L(seq(0, 6 * pi, 10^-3)),
+    y = seq(0, 6 * pi, 10^-3), rule = 2
+  )
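
As a quick sanity check of these definitions, the base R sketch below compares L() against numerical integration of l() and measures the round-trip error of the interpolated inverse Li(). The grid t_test and the tolerance passed to integrate() are arbitrary choices made for this illustration.

```r
# Sketch only: check that L() integrates l() and that Li() approximately inverts L().
l <- function(t) (1 + sin(t)) * exp(0.2 * t)
L <- function(t) {
  exp(0.2 * t) * (0.2 * sin(t) - cos(t)) / 1.04 +
    exp(0.2 * t) / 0.2 - 4.038462
}
Li <- approxfun(
  x = L(seq(0, 6 * pi, 10^-3)),
  y = seq(0, 6 * pi, 10^-3), rule = 2
)

# Numerical integral of the intensity over (0, 6 * pi] versus the closed form
num_integral <- integrate(l, lower = 0, upper = 6 * pi, rel.tol = 1e-8)$value
gap_L <- abs(num_integral - L(6 * pi))

# Round trip on an arbitrary test grid: Li(L(t)) should be close to t
t_test <- seq(0.25, 18.75, by = 0.5)
gap_Li <- max(abs(Li(L(t_test)) - t_test))

c(gap_L = gap_L, gap_Li = gap_Li)
```

Because Li() interpolates linearly on a grid with spacing 10^-3 in t, its round-trip error in t should not exceed roughly the grid spacing.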

Fig 1 graphs the intensity function and three majorizing functions over the interval of interest, which will be needed for the thinning algorithm.

Fig 1.

The λ(t) (left) and Λ(t) (right) used in the illustration. Also shown are three majorizing functions (left panel, marked a, b, c) that are used with the thinning algorithm in the analyses.

https://doi.org/10.1371/journal.pone.0311311.g001

The first, λ*a(t) = 43.38, shown as a dashed blue line, is a constant majorizer equal to the maximum of the intensity function. A constant majorizer may be a practical choice when only an upper bound is known for λ(t). From (6), the analytic efficiency of the thinning algorithm using this majorizer is 0.209.
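
The reported efficiency for the constant majorizer can be reproduced with a short calculation: it is the integrated target intensity over (0, 6π] divided by the integrated majorizer. The sketch below redefines L() from Section 7.1 so that it runs on its own. The variable names lambda_maj_a and efficiency_a are hypothetical.

```r
# Sketch only: analytic thinning efficiency for the constant majorizer 43.38.
L <- function(t) {
  exp(0.2 * t) * (0.2 * sin(t) - cos(t)) / 1.04 +
    exp(0.2 * t) / 0.2 - 4.038462
}

a <- 0
b <- 6 * pi
lambda_maj_a <- 43.38   # constant majorizer, rounded maximum of the intensity

# Ratio of integrated intensities: target over majorizer
efficiency_a <- (L(b) - L(a)) / (lambda_maj_a * (b - a))
round(efficiency_a, 3)   # about 0.209
```

In other words, with this majorizer roughly four out of five proposed events are rejected.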

The second, λ*b(t), shown as a thin black line, is a piecewise constant envelope generated automatically from Algorithm A in S1 Appendix with 20 equal-length subintervals and Lipschitz cone coefficient K = 52.05. We set K equal to the maximum value of |dλ(t)/dt| in the interval, attained at 6π. The analytic efficiency of the thinning algorithm using this majorizer is 0.245.

The third, λ*c(t), shown as a thicker black line, is a tighter piecewise constant majorizer with the same 20 equal-length subintervals that is constructed by finding a least upper bound in each subinterval. The analytic efficiency of the thinning algorithm with the third majorizer is 0.718.

7.2 Simulation functions and algorithms

We sampled series of events from the target NHPPP using the packages and functions listed in Table 3. We repeated the sampling 10⁴ times, recording all simulated points (event times). We also recorded the median computation time for drawing one series of events with single-threaded computation on modern hardware.

From the nhppp package we use

  1. two functions that take as argument the intensity function and are based on Algorithm 4 (thinning): draw_intensity(), which uses linear majorizers, including constant ones such as λ*a, and draw_intensity_step(), which uses piecewise constant majorizers such as λ*b and λ*c in the example;
  2. function draw_cumulative_intensity_inversion(), which takes as argument the cumulative intensity function Λ(t) and is based on Algorithm 5 (time transformation/inversion); and
  3. function draw_cumulative_intensity_orderstats(), which also uses Λ(t) and is based on Algorithm 6 (order statistics).
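
To make the difference between the inversion and order statistics approaches concrete, here is a minimal base R sketch of both. The functions sketch_inversion() and sketch_orderstats() are hypothetical names introduced for illustration; they are not the nhppp implementations and omit the package's options.

```r
set.seed(1)

# Cumulative intensity and tabulated inverse, as defined in Section 7.1.
L <- function(t) {
  exp(0.2 * t) * (0.2 * sin(t) - cos(t)) / 1.04 +
    exp(0.2 * t) / 0.2 - 4.038462
}
Li <- approxfun(
  x = L(seq(0, 6 * pi, 10^-3)),
  y = seq(0, 6 * pi, 10^-3), rule = 2
)
t_max <- 6 * pi

# Inversion: arrival times of a unit-rate homogeneous Poisson process on the
# transformed axis are mapped back to the time axis through the inverse of L.
sketch_inversion <- function(L, Li, t_min, t_max) {
  z_max <- L(t_max) - L(t_min)
  z <- numeric(0)
  z_next <- rexp(1)
  while (z_next <= z_max) {
    z <- c(z, z_next)
    z_next <- z_next + rexp(1)
  }
  Li(L(t_min) + z)
}

# Order statistics: draw a Poisson number of points, place them uniformly on
# the transformed axis, sort them, and map them back through the inverse of L.
sketch_orderstats <- function(L, Li, t_min, t_max) {
  z_min <- L(t_min)
  z_max <- L(t_max)
  n <- rpois(1, z_max - z_min)
  Li(sort(runif(n, min = z_min, max = z_max)))
}

times_inv <- sketch_inversion(L, Li, 0, t_max)
times_os <- sketch_orderstats(L, Li, 0, t_max)
print(c(n_inversion = length(times_inv), n_orderstats = length(times_os)))
```

Both sketches return sorted event times in (0, 6π], and the number of events in each has mean Λ(6π), about 171 in this example.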

Regarding the other R packages in Table 3, we used all except for NHPoisson, whose simulation function is tailored to supporting simulation based inference for data analysis and is not practical to use as a standalone function. (Its implementation does not allow the user to control the scaling of the time axis in a practical way.) However, its source code/algorithm is very similar to that of the IndTestPP simulation function, which is developed by the same authors.

We used the metrics in Table 4 to assess simulation performance with each function. We compared the theoretical versus the empirical (simulated) distributions of number of events and event times over J = 10⁴ simulation runs.

7.3 Simulation performance with respect to number of events

We calculated the absolute and relative bias in the first two moments of the empirical distribution of the event counts; the bounds of equal-tailed confidence intervals at the 95, 90, 75, and 50 percent levels; a χ²-distributed goodness of fit statistic and its p value; and the Wasserstein-1 distance W1 between the empirical and the theoretical count distributions, together with the asymptotic one-sided p value for the null hypothesis that W1 = 0, according to [31]. W1 is the smallest mass that has to be redistributed so that one distribution matches the other. W1 is equal to the unsigned area between the cumulative distribution functions of the compared distributions. For example, W1 = 5.25 means that the mass that must be moved to transform one density to the other is no less than 5.25 counts, and W1 = 0 implies perfect fit.
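
The area interpretation of W1 translates directly into code for count data. The following base R sketch is illustrative: it uses rpois() draws as a stand-in for the counts produced by a simulation function, and it does not compute the p value of [31].

```r
set.seed(2)

# Cumulative intensity, as defined in Section 7.1.
L <- function(t) {
  exp(0.2 * t) * (0.2 * sin(t) - cos(t)) / 1.04 +
    exp(0.2 * t) / 0.2 - 4.038462
}
mean_count <- L(6 * pi)  # theoretical mean number of events, about 171

# Stand-in for simulated counts (assumption: replace with real simulator output).
counts <- rpois(1e4, mean_count)

# Both distributions live on the integers, so their CDFs are step functions
# that are constant on [k, k + 1). The area between them is a plain sum.
k <- 0:max(max(counts), qpois(1 - 1e-12, mean_count))
F_empirical <- ecdf(counts)(k)
F_theoretical <- ppois(k, mean_count)
W1 <- sum(abs(F_empirical - F_theoretical))

print(W1)
```

Here W1 is small because the stand-in counts are drawn from the theoretical distribution itself; a biased sampler would produce a visibly larger value.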

The results for the nhppp functions in Fig 2 and Table 5 suggest excellent simulation performance.

Fig 2. Theoretical (red) and empirical (black) cumulative distribution functions for event counts in the illustration example with nhppp functions.

The unsigned area between the theoretical and empirical curves equals the Wasserstein-1 distance in Table 5.

https://doi.org/10.1371/journal.pone.0311311.g002

Table 5. Simulated total number of events with nhppp functions for the illustration example.

https://doi.org/10.1371/journal.pone.0311311.t005

The respective results for the other R packages in Table 3 are in Fig 3 and Table 6. The simulation performance with the reda functions is excellent. Performance with simEd and IndTestPP functions depends on the adequacy with which they approximate the target density. In this example, the approximation accuracy is not ideal for either package, but is somewhat worse for IndTestPP.

Fig 3. Theoretical (red) and empirical (black) cumulative distribution functions for event counts in the illustration example with the R packages in Table 3.

The unsigned area between the theoretical and empirical curves equals the Wasserstein-1 distance in Table 6.

https://doi.org/10.1371/journal.pone.0311311.g003

Table 6. Simulated total number of events with the R packages of Table 3 for the illustration example.

https://doi.org/10.1371/journal.pone.0311311.t006

7.4 Event times

We compared the theoretical and empirical distributions of event times for all J = 10⁴ event time draws. We calculated a χ² goodness of fit statistic by binning realized times in 70 bins, and obtained its p value by comparing the statistic against the χ² distribution. We also calculated the W1 distance between these distributions and its associated p value.
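
A binned goodness of fit check of this kind can be sketched in base R as follows. This is an illustration under stated assumptions, not the paper's code: the pooled event times are generated with the order statistics idea and Li() as a stand-in for simulator output, the 70 bins have equal width, and chisq.test() uses 69 degrees of freedom, which may differ from the convention used for Tables 7 and 8.

```r
set.seed(3)

# Cumulative intensity and tabulated inverse, as defined in Section 7.1.
L <- function(t) {
  exp(0.2 * t) * (0.2 * sin(t) - cos(t)) / 1.04 +
    exp(0.2 * t) / 0.2 - 4.038462
}
Li <- approxfun(
  x = L(seq(0, 6 * pi, 10^-3)),
  y = seq(0, 6 * pi, 10^-3), rule = 2
)
t_max <- 6 * pi

# Stand-in for pooled event times from many simulated series (assumption).
n_series <- 5000
n_points <- rpois(1, n_series * L(t_max))
times <- Li(runif(n_points, min = 0, max = L(t_max)))

# Observed counts in 70 equal-width bins.
breaks <- seq(0, t_max, length.out = 71)
bin_index <- findInterval(times, breaks, rightmost.closed = TRUE)
observed <- tabulate(bin_index, nbins = 70)

# Theoretical bin probabilities: increments of L(), normalized to sum to one.
probs <- diff(L(breaks))
probs <- probs / sum(probs)

fit <- chisq.test(x = observed, p = probs)
print(fit$statistic)
print(fit$p.value)
```

Bins that contain a zero of λ(t) have very small probability, so the pooled sample has to be large for the χ² approximation to be reasonable in every bin.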

Fig 4 and Table 7 indicate excellent simulation performance with the nhppp functions.

Fig 4. Simulated event times with nhppp.

Left column: histogram (gray) and theoretical distribution (red) of event times; right column: empirical (black) and theoretical (red) cumulative distribution function. The unsigned area between the empirical and theoretical cumulative distribution functions is the W1 distance in Table 7.

https://doi.org/10.1371/journal.pone.0311311.g004

Table 7. Goodness of fit of simulated event times with nhppp functions for the example.

https://doi.org/10.1371/journal.pone.0311311.t007

Fig 5 and Table 8 indicate excellent simulation performance with the reda functions. The simulation performance with the simEd and IndTestPP functions, which rely on approximations, is not as good.

Fig 5. Simulated event times with the R packages in Table 3.

Left column: histogram (gray) and theoretical distribution (red) of event times; right column: empirical (black) and theoretical (red) cumulative distribution function. The unsigned area between the empirical and theoretical cumulative distribution functions is the W1 distance in Table 8.

https://doi.org/10.1371/journal.pone.0311311.g005

Table 8. Goodness of fit of simulated event times with R functions in Table 3.

https://doi.org/10.1371/journal.pone.0311311.t008

7.5 Time performance

7.5.1 Time performance of non-vectorized functions.

To indicate time performance, we benchmarked functions by recording execution times when drawing a series of points (Fig 6). We also benchmarked functions for drawing only the first-occurring event, because nhppp functions can sample the first event time more efficiently when the inversion algorithm is used (Fig 7).

Fig 6. Computation times when drawing all events in interval.

https://doi.org/10.1371/journal.pone.0311311.g006

Fig 7. Computation times when drawing the first event in interval.

https://doi.org/10.1371/journal.pone.0311311.g007

We provided functions with the arguments they need to run fastest. For example, functions that use the inversion or order statistics algorithm execute faster when the inverse function Λ⁻¹(z) is provided, rather than numerically calculated, as shown in both Figures for the nhppp package. (Functions in other packages do not take Λ(t) and Λ⁻¹(z) arguments.) The fastest functions are nhppp functions that rely on the inversion or order statistics algorithms given Λ⁻¹(z).

According to (6), the thinning algorithm has higher efficiency, and is expected to execute faster, for majorizer functions that envelop the intensity function more closely. Observe that λ*a ≻ λ*c and λ*b ≻ λ*c in Fig 1. As expected, the execution times are indeed shorter for majorizer ‘c’ compared to ‘b’ in Figs 6 and 7. However, the execution times are longer with majorizer ‘c’ compared to ‘a’ because draw_intensity(), the function that uses constant majorizers, and draw_intensity_step(), the function that uses piecewise constant majorizers, are implemented differently. draw_intensity() happens to be faster in this example, but this is not always true.
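
The link between majorizer tightness and wasted work can be seen in a minimal base R sketch of thinning with the constant majorizer 'a'. The function sketch_thinning_constant() is a hypothetical name for illustration and is not how draw_intensity() is implemented; the fraction of candidate points it accepts should be close to the analytic efficiency of 0.209.

```r
set.seed(4)

# Intensity, as defined in Section 7.1, and the constant majorizer 'a'.
l <- function(t) (1 + sin(t)) * exp(0.2 * t)
t_max <- 6 * pi
lambda_a <- 43.38

# Thinning with a constant majorizer: draw candidates from a homogeneous
# Poisson process with rate lambda_maj, then keep each candidate at time t
# with probability l(t) / lambda_maj.
sketch_thinning_constant <- function(l, lambda_maj, t_min, t_max) {
  n_candidates <- rpois(1, lambda_maj * (t_max - t_min))
  candidates <- sort(runif(n_candidates, min = t_min, max = t_max))
  keep <- runif(n_candidates) < l(candidates) / lambda_maj
  list(times = candidates[keep], n_candidates = n_candidates)
}

# Empirical acceptance fraction over repeated draws.
n_accepted <- 0
n_candidates <- 0
for (i in seq_len(200)) {
  draw <- sketch_thinning_constant(l, lambda_a, 0, t_max)
  n_accepted <- n_accepted + length(draw$times)
  n_candidates <- n_candidates + draw$n_candidates
}
acceptance_fraction <- n_accepted / n_candidates

print(round(acceptance_fraction, 3))
```

About four of every five candidate points are discarded with this majorizer, which is the cost that tighter majorizers reduce.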

In nhppp, functions that use the inversion or order statistics algorithms can exit earlier when only the first event is requested. This is not possible, however, for the thinning algorithm. This efficiency does not appear to be implemented in the other packages.
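
The early exit is easy to see for the inversion approach: the first event on the transformed axis is a single unit-rate exponential draw, so no further points need to be generated. A minimal base R sketch follows, with the hypothetical helper sketch_first_inversion(), which is not the package's code.

```r
set.seed(5)

# Cumulative intensity and tabulated inverse, as defined in Section 7.1.
L <- function(t) {
  exp(0.2 * t) * (0.2 * sin(t) - cos(t)) / 1.04 +
    exp(0.2 * t) / 0.2 - 4.038462
}
Li <- approxfun(
  x = L(seq(0, 6 * pi, 10^-3)),
  y = seq(0, 6 * pi, 10^-3), rule = 2
)
t_max <- 6 * pi

# First event only: one exponential draw on the transformed axis.
# Returns NA when no event falls in (t_min, t_max].
sketch_first_inversion <- function(L, Li, t_min, t_max) {
  z <- L(t_min) + rexp(1)
  if (z > L(t_max)) {
    return(NA_real_)
  }
  Li(z)
}

first_times <- replicate(1000, sketch_first_inversion(L, Li, 0, t_max))
print(summary(first_times))
```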

7.5.2 Time performance of vectorized functions.

In R, ‘vectorized’ computation, where operations are done in columns, is faster than using for loops or apply() functions. As shown in Table 1, nhppp includes vectorized functions for sampling from (i) piecewise constant intensity functions, using [vdraw|vztdraw]_sc_step_regular(); and (ii) general intensity functions, using [vdraw|vztdraw]_intensity_step_regular().

We compared the execution speed of non-vectorized and vectorized functions for sampling 10⁵ times from the piecewise constant ‘b’ majorizer (λ*b) in Fig 1. The expected number of events with λ*b in (0, 6π] is 741.97. When drawing only the earliest event, the vectorized function is approximately 113 times faster than the non-vectorized function (median 59ms versus 6717ms over 10⁵ simulations). When drawing all events, the vectorized function is approximately 1.4 times faster than the non-vectorized function (median 36.55s versus 50.97s over 10⁵ simulations). The reason that the difference in speed attenuates is that the current implementation of the vectorized functions does not use sparse matrices to store samples, which introduces inefficiencies as the expected number of samples becomes larger.
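
To illustrate what vectorization over simulations can look like, here is a base R sketch that draws the first event time for many piecewise constant intensities at once. It is an assumption-laden illustration, not the code behind the nhppp vectorized functions: the hypothetical function sketch_vfirst_step() takes a matrix with one row per simulation and one column per regular interval, assumes that time starts at zero, and loops over columns only.

```r
set.seed(6)

# First event time per row for piecewise constant rates on regular intervals.
# rate_matrix: one row per simulation, one column per interval.
# Returns NA for rows with no event before the end of the last interval.
sketch_vfirst_step <- function(rate_matrix, interval_length) {
  n <- nrow(rate_matrix)
  target <- rexp(n)              # unit-rate exponential draw for each row
  cum_prev <- numeric(n)         # cumulative intensity at the left edge
  first_time <- rep(NA_real_, n)
  for (j in seq_len(ncol(rate_matrix))) {
    cum_next <- cum_prev + rate_matrix[, j] * interval_length
    # Rows whose target falls in interval j and that are not yet resolved.
    hit <- is.na(first_time) & target <= cum_next
    first_time[hit] <- (j - 1) * interval_length +
      (target[hit] - cum_prev[hit]) / rate_matrix[hit, j]
    cum_prev <- cum_next
  }
  first_time
}

# Usage: 10^4 simulations, 10 unit-length intervals, constant rate 2.
rates <- matrix(2, nrow = 1e4, ncol = 10)
first_times <- sketch_vfirst_step(rates, interval_length = 1)
print(mean(first_times, na.rm = TRUE))
```

With a constant rate of 2, the first event time is approximately exponential with mean 0.5, which gives a simple check on the sketch.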

8 Summary and next developments

The nhppp package facilitates the simulation of NHPPPs from time-varying intensity or cumulative intensity functions. Its claim is that it (i) simulates correctly from a target density, not just from an approximation; (ii) samples conditional on observing at least one event in an interval; (iii) accommodates user-provided random number stream objects; and (iv) is fast. The current version includes one vectorized function for sampling from regular-spaced piecewise constant intensity functions. In future releases we will further optimize execution speed and memory usage.

9 Computational details and credits

R 4.3.1 [32] was used for all analyses. Packages xtable 1.8.4 [33] and knitr 1.45 [34] were used for automatic report generation. Packages ggplot2 3.4.4 [35], ggridges 0.5.5 [36], and latex2exp 0.9.6 [37] were used for plot generation and LaTeX formatting. Packages nhppp 0.1.4 [16], bench 1.1.3 [38], rstream 1.3.7 [39], otinference 0.1.0 [40], and parallel 4.3.1 were used in the examples and the analyses.

All computations were done on an Apple M1 Max machine with 64 gigabytes of random access memory. A preprint of the current paper is in [16]. R itself and all aforementioned packages are available from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/.

Supporting information

S1 Appendix. Piecewise constant majorizer functions.

Algorithm for the automatic generation of piecewise constant majorizer functions.

https://doi.org/10.1371/journal.pone.0311311.s001

(PDF)

S2 Appendix. Conditional sampling from NHPPPs.

Algorithm to sample conditionally on observing at least m events in (a, b].

https://doi.org/10.1371/journal.pone.0311311.s002

(PDF)

S1 Code. Code to reproduce the exhibits.

R code to reproduce the exhibits. Timing results are machine and platform specific.

https://doi.org/10.1371/journal.pone.0311311.s003

(TXT)

Acknowledgments

We thank the investigators of the Cancer Incidence and Surveillance Modeling Network (CISNET) Bladder Cancer Site Stavroula Chrysanthopoulou, Jonah Popp, Fernando Alarid-Escudero, Hawre Jalal, and David Garibay for useful discussions.

References

  1. Law AM, Kelton WD. Simulation modeling and analysis. vol. 3. McGraw-Hill, New York; 2007.
  2. Luchak G. The solution of the single-channel queuing equations characterized by a time-dependent Poisson-distributed arrival rate and a general class of holding times. Operations Research. 1956;4(6):711–732.
  3. Kim SH, Whitt W. Choosing arrival process models for service systems: Tests of a nonhomogeneous Poisson process. Naval Research Logistics (NRL). 2014;61(1):66–90.
  4. Yang W, Su Q, Huang SH, Wang Q, Zhu Y, Zhou M. Simulation modeling and optimization for ambulance allocation considering spatiotemporal stochastic demand. Journal of Management Science and Engineering. 2019;4(4):252–265.
  5. Zhou Z, Matteson DS, Woodard DB, Henderson SG, Micheas AC. A spatio-temporal point process model for ambulance demand. Journal of the American Statistical Association. 2015;110(509):6–15.
  6. Abdel-Aty MA, Radwan AE. Modeling traffic accident occurrence and involvement. Accident Analysis & Prevention. 2000;32(5):633–642. pmid:10908135
  7. Thompson W. On the foundations of reliability. Technometrics. 1981;23(1):1–13.
  8. England T, Harper P, Crosby T, Gartner D, Arruda EF, Foley K, et al. Modelling lung cancer diagnostic pathways using discrete event simulation. Journal of Simulation. 2023;17(1):94–104. pmid:36760877
  9. Rutter CM, Savarino JE. An evidence-based microsimulation model for colorectal cancer: validation and application. Cancer Epidemiol Biomarkers Prev. 2010;19(8):1992–2002. pmid:20647403
  10. Jeon J, Meza R, Moolgavkar SH, Luebeck EG. Evaluation of screening strategies for pre-malignant lesions using a biomathematical approach. Math Biosci. 2008;213(1):56–70. pmid:18374369
  11. Tsokos CP, Xu Y. Non-homogenous Poisson Process for Evaluating Stage I & II Ductal Breast Cancer Treatment. Journal of Modern Applied Statistical Methods. 2011;10(2):646–655.
  12. Andreev VP, Head T, Johnson N, Deo SK, Daunert S, Goldschmidt-Clermont PJ. Discrete event simulation model of sudden cardiac death predicts high impact of preventive interventions. Scientific Reports. 2013;3(1):1771. pmid:23648451
  13. Getsios D, Blume S, Ishak KJ, Maclaine GD. Cost effectiveness of donepezil in the treatment of mild to moderate Alzheimer’s disease: a UK evaluation using discrete-event simulation. Pharmacoeconomics. 2010;28:411–427. pmid:20402542
  14. Mar J, Soto-Gordoa M, Arrospide A, Moreno-Izco F, Martínez-Lage P. Fitting the epidemiology and neuropathology of the early stages of Alzheimer’s disease to prevent dementia. Alzheimer’s Research & Therapy. 2015;7:1–8. pmid:25713598
  15. Zhang X. Application of discrete event simulation in health care: a systematic review. BMC Health Services Research. 2018;18:1–11. pmid:30180848
  16. Trikalinos TA, Sereda Y. nhppp: Simulating Nonhomogeneous Poisson Point Processes in R; 2024. Available from: https://CRAN.R-project.org/package=nhppp.
  17. Cox D, Miller H. The Poisson Process. In: The Theory of Stochastic Processes. Chapman and Hall; 1965. p. 147.
  18. Lewis PW, Shedler GS. Simulation of nonhomogeneous Poisson processes by thinning. Naval Research Logistics Quarterly. 1979;26(3):403–413.
  19. Çinlar E. Introduction to stochastic processes. Englewood Cliffs, New Jersey: Prentice-Hall; 1975.
  20. Press W, Teukolsky S, Vetterling W, Flannery B. Section 9.3. Van Wijngaarden-Dekker-Brent Method. Numerical Recipes: The Art of Scientific Computing. Cambridge University Press, New York; 2007.
  21. Lewis P, Shedler G. Simulation of nonhomogeneous Poisson processes with log linear rate function. Biometrika. 1976;63(3):501–505.
  22. Garibay D, Jalal H, Alarid-Escudero F. A computationally efficient nonparametric sampling (NPS) method of time to event for individual-level models. medRxiv. 2024.
  23. Wang W, Fu H, Yan J. reda: Recurrent Event Data Analysis; 2022. Available from: https://github.com/wenjie2wang/reda.
  24. Lawson B, Leemis L, Kudlay V. simEd: Simulation Education; 2023. Available from: https://CRAN.R-project.org/package=simEd.
  25. Cebrián AC. IndTestPP: Tests of Independence and Analysis of Dependence Between Point Processes in Time; 2020. Available from: https://CRAN.R-project.org/package=IndTestPP.
  26. Cebrián AC, Abaurrea J, Asín J. NHPoisson: An R Package for Fitting and Validating Nonhomogeneous Poisson Processes. Journal of Statistical Software. 2015;64(6):1–25.
  27. Cebrián AC. NHPoisson: Modelling and Validation of Non-Homogeneous Poisson Processes; 2020. Available from: https://CRAN.R-project.org/package=NHPoisson.
  28. Wright R, Ramsay Jr T. On the effectiveness of common random numbers. Management Science. 1979;25(7):649–656.
  29. Hammersley JM, Mauldon JG. General principles of antithetic variates. In: Mathematical Proceedings of the Cambridge Philosophical Society. vol. 52. Cambridge University Press; 1956. p. 476–481.
  30. Eddelbuettel D, Francois R, Allaire J, Ushey K, Kou Q, Russell N, et al. Rcpp: Seamless R and C++ Integration; 2024. Available from: https://CRAN.R-project.org/package=Rcpp.
  31. Sommerfeld M, Munk A. Inference for empirical Wasserstein distances on finite spaces. Journal of the Royal Statistical Society Series B: Statistical Methodology. 2018;80(1):219–238.
  32. R Core Team. R: A Language and Environment for Statistical Computing; 2023. Available from: https://www.R-project.org/.
  33. Dahl DB, Scott D, Roosen C, Magnusson A, Swinton J. xtable: Export Tables to LaTeX or HTML; 2019. Available from: https://CRAN.R-project.org/package=xtable.
  34. Xie Y. knitr: A Comprehensive Tool for Reproducible Research in R. In: Stodden V, Leisch F, Peng RD, editors. Implementing Reproducible Computational Research. Chapman and Hall/CRC; 2014.
  35. Wickham H. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York; 2016. Available from: https://ggplot2.tidyverse.org.
  36. Wilke CO. ggridges: Ridgeline Plots in ggplot2; 2023. Available from: https://CRAN.R-project.org/package=ggridges.
  37. Meschiari S. latex2exp: Use LaTeX Expressions in Plots; 2022. Available from: https://CRAN.R-project.org/package=latex2exp.
  38. Hester J, Vaughan D. bench: High Precision Timing of R Expressions; 2023. Available from: https://CRAN.R-project.org/package=bench.
  39. Leydold J. rstream: Streams of Random Numbers; 2022. Available from: https://CRAN.R-project.org/package=rstream.
  40. Sommerfeld M. otinference: Inference for Optimal Transport; 2017. Available from: https://CRAN.R-project.org/package=otinference.