Abstract
In infectious disease epidemiology, the instantaneous reproduction number $R_t$ is a time-varying parameter defined as the average number of secondary infections generated by an infected individual at time t. It is therefore a crucial epidemiological statistic that assists public health decision makers in the management of an epidemic. We present a new Bayesian tool (EpiLPS) for robust estimation of the time-varying reproduction number. The proposed methodology smooths the epidemic curve and yields (approximate) point estimates and credible intervals of $R_t$ by employing the renewal equation, using Bayesian P-splines coupled with Laplace approximations of the conditional posterior of the spline vector. Two alternative approaches for inference are presented: (1) an approach based on a maximum a posteriori argument for the model hyperparameters, delivering estimates of $R_t$ in only a few seconds; and (2) an approach based on a Markov chain Monte Carlo (MCMC) scheme with underlying Langevin dynamics for efficient sampling of the posterior target distribution. Case counts per unit of time are assumed to follow a negative binomial distribution to account for potential overdispersion in the data that would not be captured by a classic Poisson model. Furthermore, after smoothing the epidemic curve, a “plug-in” estimate of the reproduction number can be obtained from the renewal equation, yielding a closed form expression of $R_t$ as a function of the spline parameters. The approach is extremely fast and free of arbitrary smoothing assumptions. EpiLPS is applied to data on SARS-CoV-1 in Hong Kong (2003), influenza A H1N1 (2009) in the USA and the SARS-CoV-2 pandemic (2020-2021) in Belgium, Portugal, Denmark and France.
Author summary
The instantaneous reproduction number is a key statistic that provides important insights into an epidemic outbreak, as it measures the average number of secondary infections generated by an infectious individual. We present a flexible Bayesian approach called EpiLPS (Epidemiological modeling with Laplacian-P-Splines) for efficient estimation of the epidemic curve and $R_t$ based on daily case count data and the serial interval distribution. Computational speed and the absence of arbitrary assumptions on smoothing make EpiLPS an interesting tool for estimation of the reproduction number. Our methodology is validated through different simulation scenarios by using the associated R software package (https://cran.r-project.org/package=EpiLPS). We also demonstrate the use of EpiLPS on real data from two historical outbreaks and on the SARS-CoV-2 pandemic.
Citation: Gressani O, Wallinga J, Althaus CL, Hens N, Faes C (2022) EpiLPS: A fast and flexible Bayesian tool for estimation of the time-varying reproduction number. PLoS Comput Biol 18(10): e1010618. https://doi.org/10.1371/journal.pcbi.1010618
Editor: Claudio José Struchiner, Fundação Getúlio Vargas: Fundacao Getulio Vargas, BRAZIL
Received: January 20, 2022; Accepted: September 30, 2022; Published: October 10, 2022
Copyright: © 2022 Gressani et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Simulation results and real data applications in this paper can be fully reproduced with the code available on the GitHub repository https://github.com/oswaldogressani/EpiLPS-ArticleCode based on the EpiLPS package version 1.0.6 available on CRAN (https://cran.r-project.org/package=EpiLPS).
Funding: This project is funded by the European Union’s Research and Innovation Action (https://cordis.europa.eu/project/id/101003688) under the H2020 work programme, EpiPose grant number 101003688. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
This is a PLOS Computational Biology Methods paper.
Introduction
The instantaneous reproduction number $R_t$ is a time-varying parameter defined as the average number of secondary cases generated by an infectious individual at time t. During epidemic outbreaks, $R_t$ provides a snapshot (often on a daily basis) that quantifies the extent to which a given infectious disease transmits in a population and is therefore an important tool that assists governmental organizations in the management of a public health crisis. The reproduction number is also a good proxy for measuring the real-time growth phase of an epidemic and, as such, constitutes a key signal about the transmission potential of the outbreak and the required control effort. For this reason, having a robust, accurate and timely estimator of $R_t$ is a crucial matter that has attracted considerable interest in developing new statistical approaches during the last two decades, as summarized in [1]. The paper of [2] compares several methods for estimating $R_t$ and gives clear insights about the main challenges and obstacles that have to be faced. They recommend the method of [3] and its associated EpiEstim package [4] as an appropriate and accurate tool for near real-time estimation of the instantaneous reproduction number. Another recent approach is proposed in [5], where a recursive Bayesian smoother based on Kalman filtering is used to derive a robust estimate of $R_t$ in periods of low incidence. The EpiNow2 package [6] also provides interesting extensions and implementations of current best practices for precise estimation and forecast of the reproduction number using a Bayesian latent variable framework. Spline-based approaches have been shown to be a useful tool for flexible modeling of the reproduction number. [7] use penalized radial splines for estimating $R_t$ under a Bayesian setting with misreported data and [8] accelerated the computational implementation by replacing the Markov chain Monte Carlo (MCMC) scheme with Laplace approximations. From a frequentist perspective, [9] uses truncated polynomials and radial basis splines to model the series of new infections and a derivative thereof as a candidate estimator for the reproduction number.
In this article, we propose a new Bayesian approach termed “EpiLPS” for estimating $R_t$ based on case incidence data and the serial interval (SI) distribution (the time elapsed between the onset of symptoms in an infector and the onset of symptoms in the secondary cases generated by that infector). Our estimator of $R_t$ is based on epidemic renewal equations [10, 11] and Laplacian-P-splines smoothing of the mean number of incidence cases. Time series of new cases by day of reporting (or day of symptom onset) are assumed to follow a negative binomial distribution to account for potential excess variability as frequently encountered in epidemiological count data. Algorithms related to Laplace approximations and evaluations of B-spline bases are coded in C++ and embedded in the R language through the Rcpp package [12], making computational speed another key strength of EpiLPS, as $R_t$ can be estimated in seconds. In addition, EpiLPS can also be used to obtain a smoothed estimate of the epidemic curve that can be of potential interest to further visualize an epidemic outbreak.
The proposed Bayesian methodology is based on a latent Gaussian model for the B-spline amplitudes and opens up two possible paths for inference. The first is called LPSMAP, a fully sampling-free approach based on Laplace approximations to the conditional posterior of B-spline coefficients. The hyperparameter vector is fixed at its maximum a posteriori and credible intervals of $R_t$ are computed via the “delta” method. The second path is called LPSMALA and is an MCMC approach based on the Langevin diffusion for efficient exploration of the posterior distribution of latent variables. The latter approach is computationally heavier than LPSMAP but has the merit of taking into account the uncertainty surrounding the hyperparameters. The underlying Metropolis-within-Gibbs structure keeps the practical implementation fairly simple and the computational cost is reasonable even for long chains.
Compared to existing methods, EpiLPS resembles EpiEstim from a methodological point of view in the sense that $R_t$ is estimated from incidence time series and a serial interval distribution, yet the two approaches differ fundamentally in several respects. First, the methodology of [3] assumes that incidence at time t is Poisson distributed, while EpiLPS assumes a negative binomial model. Second, as our approach uses penalized spline based approximations, prior specifications are imposed on the roughness penalty parameter and not directly on $R_t$ as in EpiEstim. Third and most importantly, EpiLPS is free of any sliding window specification, while EpiEstim relies on a user-defined time window. This subjective time window choice is the key driving force that determines how smooth the estimated $R_t$ trajectory will be. In EpiLPS, the optimal amount of smoothing is data-driven and objectively estimated (through the penalty parameter) within the Bayesian model. An R package for EpiLPS has been developed and is available at https://cran.r-project.org/package=EpiLPS. The software also computes the Cori et al. (2013) [3] estimate of $R_t$ for the sake of comparison.
The manuscript is organized as follows. We first present the Laplacian-P-splines model for smoothing count data, show how the Laplace approximation applies to the conditional posterior of the B-spline amplitudes and derive the (approximate) posterior of the hyperparameter vector to be optimized. This yields the maximum a posteriori (MAP) estimate of the spline vector via Laplacian-P-splines (LPSMAP). We then use LPSMAP to propose a “plug-in” estimate of $R_t$ based on renewal equations and proceed to the computation of credible intervals. An alternative path for estimation of $R_t$ based on MCMC is also presented. The latter approach uses Langevin dynamics for efficient sampling of the target posterior distribution and is termed LPSMALA for “Laplacian-P-splines with a Metropolis-adjusted Langevin algorithm”. Next, we assess the performance of EpiLPS in various simulation scenarios and make comparisons with EpiEstim. Finally, we apply EpiLPS to real world epidemic outbreaks before concluding with a discussion.
Methods
Negative binomial model for case incidence data
Let $\mathcal{D} = \{y_1, \dots, y_T\}$ be a time series of counts during an epidemic of $T$ days with $y_t \in \mathbb{N}_0$ (set of non-negative integers) denoting the number of cases by reporting date or by date of symptom onset. We assume that the number of cases on day $t$ follows a negative binomial distribution $y_t \sim \text{NegBin}(\mu(t), \rho)$, with $\mu(t) > 0$, $\rho > 0$ and probability mass function (see e.g. [13, 14]):

$$p(y_t \mid \mu(t), \rho) = \frac{\Gamma(y_t + \rho)}{\Gamma(y_t + 1)\,\Gamma(\rho)} \left(\frac{\mu(t)}{\mu(t) + \rho}\right)^{y_t} \left(\frac{\rho}{\rho + \mu(t)}\right)^{\rho}, \tag{1}$$

where $\Gamma(\cdot)$ is the gamma function. The above parameterization is frequently encountered in epidemiology [15] and yields a mean $E(y_t) = \mu(t)$ and variance $V(y_t) = \mu(t) + \mu(t)^2/\rho$, so that $\rho$ is the parameter responsible for overdispersion (variance larger than the mean) that is absent in a Poisson setting. In the limiting case $\rho \to +\infty$, $V(y_t) \to \mu(t)$ and we recover the mean-variance equality of the Poisson model. The key argument in favor of a negative binomial distribution is thus its ability to capture the often encountered feature of overdispersion present in infectious disease count data [16]. We assume that $\mu(t)$ evolves smoothly over the time course of the epidemic and model it with cubic B-splines [17]:

$$\log \mu(t) = \theta^\top b(t) = \sum_{k=1}^{K} \theta_k b_k(t), \tag{2}$$

where $\theta = (\theta_1, \dots, \theta_K)^\top$ is the vector of B-spline amplitudes to be estimated and $b(\cdot) = (b_1(\cdot), \dots, b_K(\cdot))^\top$ is a cubic B-spline basis defined on the domain $\mathcal{T} = [r_l, T]$, where $r_l$ is a lower bound on the time axis, typically the first day of the epidemic (i.e. $r_l = 1$). The philosophy behind P-splines consists in specifying a “large” number $K$ of basis functions together with a discrete roughness penalty $\lambda \theta^\top P \theta$ as a counterforce to the induced flexibility of the fit. The parameter $\lambda > 0$ acts as a tuning parameter calibrating the “degree” of smoothness and $P = D_r^\top D_r + \varepsilon I_K$ is a penalty matrix built from $r$th order difference matrices $D_r$ of dimension $(K - r) \times K$ perturbed by an $\varepsilon$-multiple (here $\varepsilon = 10^{-6}$) of the $K$-dimensional identity matrix $I_K$ to ensure full rank.

There are several attractive reasons to use P-splines for smoothing the epidemic curve and $R_t$. First, as the P-splines setting specifies an abundant number of B-spline basis functions coupled with a penalty on the spline coefficients to control for overfitting, the resulting $\mu(t)$ fit is smooth and estimates can be obtained for any $t$ on the continuous time domain. Second, even if the number $K$ of B-splines is free to choose, the shape of the fitted curve is actually regulated by the smoothing parameter $\lambda$ and hence only negligibly affected by the arbitrary choice of $K$, provided it is large enough [18]. Third, the intrinsic sparseness of $P$ and of the B-spline basis matrix is computationally appealing as it simplifies the algorithmic implementation and yields numerically stable routines [19, 20]. Another key advantage of P-spline smoothers is their natural formulation in a Bayesian framework by translating difference penalties on contiguous B-spline coefficients into Gaussian random walk smoothness priors [21]. Following the latter reference, we impose a Gaussian prior on the vector of spline coefficients $(\theta \mid \lambda) \sim \mathcal{N}_K(0, Q_\lambda^{-1})$
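, with precision matrix $Q_\lambda = \lambda P$. For full Bayesian inference, the following priors are imposed on the model hyperparameters. Following [22], a robust Gamma prior is specified for the roughness penalty parameter $(\lambda \mid \delta) \sim \mathcal{G}(\phi, \delta\phi)$, where $\mathcal{G}(a, b)$ is a Gamma distribution with mean $a/b$ and variance $a/b^2$, $\phi = 2$ and $\delta$ is an additional dispersion parameter with hyperprior $\delta \sim \mathcal{G}(a_\delta, b_\delta)$. This prior specification favors “small” $\lambda$ values and translates the belief that a wiggly fit is more inclined to arise during the epidemic period as opposed to an oversmoothed fit. Finally, the following uninformative prior is imposed on the overdispersion parameter: $\rho \sim \mathcal{G}(a_\rho, b_\rho)$. Let $\eta := (\lambda, \rho)^\top$ denote the vector of hyperparameters. The full Bayesian model is thus:

$$\begin{aligned}
(y_t \mid \theta, \rho) &\sim \text{NegBin}\left(\exp(\theta^\top b(t)), \rho\right), \quad t = 1, \dots, T,\\
(\theta \mid \lambda) &\sim \mathcal{N}_K(0, Q_\lambda^{-1}),\\
(\lambda \mid \delta) &\sim \mathcal{G}(\phi, \delta\phi),\\
\delta &\sim \mathcal{G}(a_\delta, b_\delta),\\
\rho &\sim \mathcal{G}(a_\rho, b_\rho).
\end{aligned}$$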
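As an illustration of the two building blocks of this model, the following minimal Python sketch evaluates the negative binomial log-probability mass function in the mean/overdispersion parameterization and builds a difference penalty matrix with a small ridge term. It is not the EpiLPS implementation (which is written in R and C++); the function names `negbin_logpmf` and `difference_penalty` and the numerical values of `mu` and `rho` are hypothetical choices made for this example only.

```python
import math

def negbin_logpmf(y, mu, rho):
    """Log-pmf of NegBin(mu, rho): mean mu, variance mu + mu**2 / rho."""
    return (math.lgamma(y + rho) - math.lgamma(y + 1) - math.lgamma(rho)
            + y * math.log(mu / (mu + rho))
            + rho * math.log(rho / (rho + mu)))

def difference_penalty(K, r=2, eps=1e-6):
    """Return P = D_r^T D_r + eps * I_K as a list of lists."""
    # Start from the K x K identity and difference its rows r times to get D_r,
    # a (K - r) x K matrix of r-th order differences.
    D = [[1.0 if i == j else 0.0 for j in range(K)] for i in range(K)]
    for _ in range(r):
        D = [[D[i + 1][j] - D[i][j] for j in range(K)] for i in range(len(D) - 1)]
    # Form D^T D and add the small ridge eps on the diagonal (full rank).
    P = [[sum(D[m][i] * D[m][j] for m in range(len(D))) for j in range(K)]
         for i in range(K)]
    for i in range(K):
        P[i][i] += eps
    return P

# Numerical check of the stated mean and variance for one illustrative
# (mu, rho) pair; the support is truncated where the tail mass is negligible.
mu, rho = 12.0, 5.0
support = range(0, 2000)
probs = [math.exp(negbin_logpmf(y, mu, rho)) for y in support]
mean = sum(y * p for y, p in zip(support, probs))
var = sum((y - mean) ** 2 * p for y, p in zip(support, probs))

P = difference_penalty(K=5, r=2)
```

Summing the probabilities over the truncated support should return a value very close to one, and the computed `mean` and `var` can be compared with `mu` and `mu + mu**2 / rho`.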
Laplace approximation to the conditional posterior of θ
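The Laplace approximation has two key roles in the proposed EpiLPS methodology. First, it determines the approximating distribution to the (conditional) posterior of the spline vector $\theta$ that will be used to estimate the average incidence of cases at time $t$, i.e. $\mu(t)$, and hence $R_t$ via the renewal equation. Second, the variance-covariance matrix of the Laplace approximation is used to quantify the uncertainty of the instantaneous reproduction number through a “delta” method in LPSMAP and is also introduced in the proposal distribution of the LPSMALA algorithm to form the skeleton of the correlation structure for the spline components. The synergy between Laplace approximations and P-splines has already been shown to be very effective for modeling count data (see for instance [23], in the context of generalized additive models). The log-likelihood for the negative binomial model is given by:

$$\ell(\theta, \rho; \mathcal{D}) \doteq \sum_{t=1}^{T} \left\{ g(y_t, \rho) + y_t\, \theta^\top b(t) + \rho \log \rho - (y_t + \rho) \log\left(\exp(\theta^\top b(t)) + \rho\right) \right\}, \tag{3}$$

with $g(y_t, \rho) = \log \Gamma(y_t + \rho) - \log \Gamma(\rho)$ and $\doteq$ denoting equality up to an additive constant. The gradient of the log-likelihood with respect to the spline coefficients is:

$$\nabla_\theta \ell(\theta, \rho; \mathcal{D}) = \left(\frac{\partial \ell(\theta, \rho; \mathcal{D})}{\partial \theta_1}, \dots, \frac{\partial \ell(\theta, \rho; \mathcal{D})}{\partial \theta_K}\right)^\top,$$

where:

$$\frac{\partial \ell(\theta, \rho; \mathcal{D})}{\partial \theta_k} = \sum_{t=1}^{T} \left\{ y_t - (y_t + \rho) \frac{\exp(\theta^\top b(t))}{\exp(\theta^\top b(t)) + \rho} \right\} b_k(t), \quad k = 1, \dots, K.$$

The Hessian of the log-likelihood with respect to the B-spline amplitudes is:

$$\nabla^2_\theta \ell(\theta, \rho; \mathcal{D}) = \left(\frac{\partial^2 \ell(\theta, \rho; \mathcal{D})}{\partial \theta_k\, \partial \theta_l}\right)_{k, l = 1, \dots, K},$$

with entries:

$$\frac{\partial^2 \ell(\theta, \rho; \mathcal{D})}{\partial \theta_k\, \partial \theta_l} = -\sum_{t=1}^{T} (y_t + \rho) \frac{\rho \exp(\theta^\top b(t))}{\left(\exp(\theta^\top b(t)) + \rho\right)^2}\, b_k(t)\, b_l(t).$$

Using Bayes’ rule, the conditional posterior of $\theta$ for a given $\eta$ is:

$$p(\theta \mid \eta, \mathcal{D}) \propto \mathcal{L}(\theta, \rho; \mathcal{D})\, p(\theta \mid \lambda) \propto \exp\left(\ell(\theta, \rho; \mathcal{D}) - \frac{\lambda}{2} \theta^\top P \theta\right), \tag{4}$$

where $\mathcal{L}(\theta, \rho; \mathcal{D})$ denotes the likelihood function. The gradient and Hessian of the log-likelihood (3) can be used to compute the gradient and Hessian of the (log-)conditional posterior (4), namely:

$$\nabla_\theta \log p(\theta \mid \eta, \mathcal{D}) = \nabla_\theta \ell(\theta, \rho; \mathcal{D}) - \lambda P \theta, \qquad \nabla^2_\theta \log p(\theta \mid \eta, \mathcal{D}) = \nabla^2_\theta \ell(\theta, \rho; \mathcal{D}) - \lambda P.$$

The above two equations are used iteratively in a Newton-Raphson algorithm to obtain the Laplace approximation to the conditional posterior of $\theta$:

$$\tilde{p}_G(\theta \mid \eta, \mathcal{D}) = \mathcal{N}_K\left(\theta^*(\eta), \Sigma^*(\eta)\right), \tag{5}$$

where $\theta^*(\eta)$ and $\Sigma^*(\eta)$ are the mode and variance-covariance matrix, respectively, after convergence of the Newton-Raphson algorithm. The latter two quantities are functions of the hyperparameter vector $\eta$. An intuitive choice for $\eta$ is to fix it at its maximum a posteriori. This is the option retained here, although it is also possible to work with a grid-based approach [23, 24].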
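To make the Newton-Raphson step concrete, here is a deliberately simplified Python sketch for the scalar case: a single coefficient $\theta$ with $\mu(t) = \exp(\theta)$ for all $t$ (one constant basis function) and a Gaussian prior with precision `q`, which plays the role of $\lambda P$. The function name `laplace_intercept`, the toy counts and the values of `rho` and `q` are assumptions made for illustration; the actual method works with the full $K$-dimensional spline vector.

```python
import math

def laplace_intercept(y, rho, q, tol=1e-10, max_iter=100):
    """Laplace approximation for a scalar theta with mu = exp(theta).

    Returns the posterior mode and the approximate posterior variance
    (negative inverse Hessian of the log conditional posterior at the mode).
    """
    def grad_hess(theta):
        mu = math.exp(theta)
        # Gradient: sum_t {y_t - (y_t + rho) mu / (mu + rho)} - q * theta
        grad = sum(yt - (yt + rho) * mu / (mu + rho) for yt in y) - q * theta
        # Hessian: -sum_t (y_t + rho) rho mu / (mu + rho)^2 - q  (always < 0)
        hess = -sum((yt + rho) * rho * mu / (mu + rho) ** 2 for yt in y) - q
        return grad, hess

    theta = math.log(max(sum(y) / len(y), 1e-8))  # start at the log sample mean
    for _ in range(max_iter):
        grad, hess = grad_hess(theta)
        step = grad / hess
        theta -= step  # Newton-Raphson update
        if abs(step) < tol:
            break
    _, hess = grad_hess(theta)
    return theta, -1.0 / hess

counts = [3, 7, 4, 9, 12, 8]  # toy data, not from the paper
mode_weak, var_weak = laplace_intercept(counts, rho=5.0, q=1e-6)
mode_strong, var_strong = laplace_intercept(counts, rho=5.0, q=50.0)
```

With a nearly flat prior the mode should sit very close to the log of the sample mean, whereas a large prior precision shrinks the mode toward zero and reduces the approximate variance.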
Hyperparameter optimization
The hyperparameter vector $\eta = (\lambda, \rho)^\top$ will be calibrated by posterior optimization. Following [25] and [24], the posterior of the hyperparameter vector can be approximated as follows:
(6)
Approximation (6) can be written extensively as:
where the $K/2$ power of $\lambda$ comes from the determinant $|Q_\lambda|^{1/2} = \lambda^{K/2} |P|^{1/2} \propto \lambda^{K/2}$. As
is the kernel of a Gamma distribution for the dispersion parameter δ, the following integral can be analytically solved:
Using the transformation of variables (ensuring numerical stability during optimization) w = log(ρ), v = log(λ), one can show that
can be written as follows after using the multivariate transformation method:
where
. The approximated log-posterior becomes:
(7)
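Eq (7) is numerically optimized and yields the MAP estimate $\hat{\eta} = (\hat{\lambda}, \hat{\rho})^\top$. Plugging the latter vector into the Laplace approximation (5), we obtain the estimate $\hat{\theta} = \theta^*(\hat{\eta})$ of the spline vector. The latter can be seen as a MAP estimate of $\theta$. Thus, the approximated (conditional) posterior of the spline vector is:

$$\tilde{p}_G(\theta \mid \hat{\eta}, \mathcal{D}) = \mathcal{N}_K\left(\theta^*(\hat{\eta}), \Sigma^*(\hat{\eta})\right), \tag{8}$$

and can be used to construct credible intervals for functions that depend on $\theta$, such as $R_t$, as shown in the following section.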
Estimation of $R_t$ with LPSMAP
The renewal equation “plug-in” estimate.
In this section, we show how the negative binomial model for smoothing incidence counts can be used to estimate $R_t$ through the renewal equation. Let $\varphi = \{\varphi_1, \dots, \varphi_k\}$ be a known $k$-dimensional vector representing the serial interval (SI) distribution, where $\varphi_s$ is the probability that the SI is equal to $s$ day(s), i.e. $\varphi_s = P(\text{SI} = s)$
. We also assume
and
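. The renewal model [10, 11] gives a mathematical statement of equality between the mean incidence of cases at time step $t$ and a product between the reproduction number $R_t$ and a convolution involving antecedent cases and the serial interval distribution:

$$\mu(t) = R_t \sum_{s=1}^{t-1} \varphi_s\, y_{t-s}, \tag{9}$$

where $\sum_{s=1}^{t-1} \varphi_s\, y_{t-s}$ denotes the number of circulating cases that contribute to active transmission, also known as total infectiousness at time $t$ [5]. Rearranging Eq (9) and taking the length $k$ of the serial interval into account, we obtain an equation with the instantaneous reproduction number on the left-hand side:

$$R_t = \begin{cases} \mu(t) \left(\sum_{s=1}^{t-1} \varphi_s\, y_{t-s}\right)^{-1} & \text{for } 2 \le t \le k,\\[4pt] \mu(t) \left(\sum_{s=1}^{k} \varphi_s\, y_{t-s}\right)^{-1} & \text{for } k < t \le T. \end{cases} \tag{10}$$

Our Bayesian “plug-in” estimator of $R_t$ at time step $t$ is obtained by replacing the average number of cases $\mu(t)$ by the estimated average $\hat{\mu}(t) = \exp(\hat{\theta}^\top b(t))$ and by replacing $y_{t-s}$ by $\hat{\mu}(t-s) = \exp(\hat{\theta}^\top b(t-s))$:

$$\hat{R}_t = \begin{cases} \hat{\mu}(t) \left(\sum_{s=1}^{t-1} \varphi_s\, \hat{\mu}(t-s)\right)^{-1} & \text{for } 2 \le t \le k,\\[4pt] \hat{\mu}(t) \left(\sum_{s=1}^{k} \varphi_s\, \hat{\mu}(t-s)\right)^{-1} & \text{for } k < t \le T. \end{cases} \tag{11}$$

Note that the MAP estimate of the overdispersion parameter affects the estimate $\hat{R}_t$ via $\hat{\theta} = \theta^*(\hat{\eta})$. Using the indicator function $\mathbb{I}(\cdot)$, i.e. $\mathbb{I}(A) = 1$ if condition $A$ is true and $\mathbb{I}(A) = 0$ otherwise, the above estimator can be written in a single line:

$$\hat{R}_t = \hat{\mu}(t) \left\{ \mathbb{I}(2 \le t \le k) \left(\sum_{s=1}^{t-1} \varphi_s\, \hat{\mu}(t-s)\right)^{-1} + \mathbb{I}(k < t \le T) \left(\sum_{s=1}^{k} \varphi_s\, \hat{\mu}(t-s)\right)^{-1} \right\}. \tag{12}$$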
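The plug-in computation can be sketched in a few lines of Python. The sketch assumes that a smoothed mean incidence curve is already available as a list `mu_hat` (day 1 first) and that the serial interval probabilities are given as a list `si` with `si[s - 1]` equal to $\varphi_s$; the function name `plug_in_rt`, the geometric toy curve and the three-day serial interval are hypothetical. Day 1 is left undefined because no past incidence is available.

```python
def plug_in_rt(mu_hat, si):
    """Plug-in reproduction number from a smoothed incidence curve.

    mu_hat[t - 1] is the estimated mean incidence on day t (t = 1, ..., T)
    and si[s - 1] is the serial interval probability for a lag of s days.
    """
    k = len(si)
    rt = [None]  # no estimate on day 1: the infectiousness sum is empty
    for t in range(2, len(mu_hat) + 1):
        smax = min(t - 1, k)  # sum up to t - 1 while t <= k, then up to k
        infectiousness = sum(si[s - 1] * mu_hat[t - s - 1]
                             for s in range(1, smax + 1))
        rt.append(mu_hat[t - 1] / infectiousness)
    return rt

# Toy example: incidence growing geometrically by 10% per day.
growth = 1.1
mu_curve = [10.0 * growth ** t for t in range(1, 31)]
si_toy = [0.2, 0.5, 0.3]  # illustrative serial interval, sums to one
rt_toy = plug_in_rt(mu_curve, si_toy)
```

For a geometrically growing curve the estimate becomes constant once $t > k$, which gives a quick sanity check of the indexing.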
Credible intervals for $R_t$.
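Using the functional relationship between $R_t$ and $\theta$ as in Eq (12), the log of the instantaneous reproduction number can be written as:

$$\log R_t = h(\theta \mid t),$$

with

$$h(\theta \mid t) = \theta^\top b(t) - \log\left\{ \mathbb{I}(2 \le t \le k) \sum_{s=1}^{t-1} \varphi_s \exp\left(\theta^\top b(t-s)\right) + \mathbb{I}(k < t \le T) \sum_{s=1}^{k} \varphi_s \exp\left(\theta^\top b(t-s)\right) \right\}.$$

Note that $h(\theta \mid t)$ is seen here as a function of the spline vector $\theta$ for a given time point $t$. A $(1 - \alpha) \times 100\%$ approximate credible interval for $R_t$ is obtained via a “delta” method. Consider a first-order Taylor expansion of $h(\theta \mid t)$ around $\hat{\theta}$ (henceforth $\theta^*$ for the sake of a light notation), the mean of the Laplace approximated posterior of the spline vector in (8):

$$h(\theta \mid t) \approx h(\theta^* \mid t) + (\theta - \theta^*)^\top\, \nabla h(\theta \mid t)\big|_{\theta = \theta^*}, \tag{13}$$

where the $j$th entry of the gradient vector $\nabla h(\theta \mid t) = (\partial h(\theta \mid t)/\partial \theta_1, \dots, \partial h(\theta \mid t)/\partial \theta_K)^\top$ is (the index $j$ is used for the spline coefficients to avoid confusion with the length $k$ of the serial interval):

$$\frac{\partial h(\theta \mid t)}{\partial \theta_j} = b_j(t) - \frac{\mathbb{I}(2 \le t \le k) \sum_{s=1}^{t-1} \varphi_s \exp(\theta^\top b(t-s))\, b_j(t-s) + \mathbb{I}(k < t \le T) \sum_{s=1}^{k} \varphi_s \exp(\theta^\top b(t-s))\, b_j(t-s)}{\mathbb{I}(2 \le t \le k) \sum_{s=1}^{t-1} \varphi_s \exp(\theta^\top b(t-s)) + \mathbb{I}(k < t \le T) \sum_{s=1}^{k} \varphi_s \exp(\theta^\top b(t-s))}.$$

It follows that for $j = 1, \dots, K$, with $\hat{\mu}(t) = \exp(\theta^{*\top} b(t))$ and $s_{\max} = t - 1$ if $2 \le t \le k$ and $s_{\max} = k$ if $k < t \le T$, we have:

$$\frac{\partial h(\theta \mid t)}{\partial \theta_j}\bigg|_{\theta = \theta^*} = b_j(t) - \frac{\sum_{s=1}^{s_{\max}} \varphi_s\, \hat{\mu}(t-s)\, b_j(t-s)}{\sum_{s=1}^{s_{\max}} \varphi_s\, \hat{\mu}(t-s)}.$$

The Taylor expansion in (13) is a linear combination of the vector $\theta$ that is a posteriori (approximately) Gaussian due to the Laplace approximation. As the family of Gaussian distributions is closed under linear combinations, it follows that $h(\theta \mid t)$ (and hence $\log R_t$) is a posteriori also (approximately) Gaussian with mean $h(\theta^* \mid t)$ and variance $\nabla h(\theta \mid t)^\top\big|_{\theta = \theta^*}\, \Sigma^*\, \nabla h(\theta \mid t)\big|_{\theta = \theta^*}$, where $\Sigma^*$ is the covariance matrix of the Laplace approximation (8). This suggests writing:

$$\left(h(\theta \mid t) \mid \mathcal{D}\right) \overset{\text{approx.}}{\sim} \mathcal{N}_1\left(h(\theta^* \mid t),\ \nabla h(\theta \mid t)^\top\big|_{\theta = \theta^*}\, \Sigma^*\, \nabla h(\theta \mid t)\big|_{\theta = \theta^*}\right). \tag{14}$$

The accuracy of the variance approximation in (14) can be improved through a scaling of the covariance matrix $\Sigma^*$ by multiplying it with the scaling factor $(1 + \hat{\mu}(t)/\hat{\rho})^{-1}$, corresponding to the estimated mean-to-variance ratio $\hat{E}(y_t)/\hat{V}(y_t)$ at time step $t$ (see S2 Appendix). The (approximate) posterior distribution for $R_t$ is thus a lognormal distribution with parameters $h(\theta^* \mid t)$ and $(1 + \hat{\mu}(t)/\hat{\rho})^{-1}\, \nabla h(\theta \mid t)^\top\big|_{\theta = \theta^*}\, \Sigma^*\, \nabla h(\theta \mid t)\big|_{\theta = \theta^*}$. A quantile-based $(1 - \alpha) \times 100\%$ approximate credible interval for $R_t$ is thus

$$\exp\left( h(\theta^* \mid t) \pm z_{\alpha/2} \sqrt{ (1 + \hat{\mu}(t)/\hat{\rho})^{-1}\, \nabla h(\theta \mid t)^\top\big|_{\theta = \theta^*}\, \Sigma^*\, \nabla h(\theta \mid t)\big|_{\theta = \theta^*} } \right),$$

where $z_{\alpha/2}$ is the $\alpha/2$-upper quantile of a standard normal variate.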
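Once the Gaussian approximation for the log reproduction number is available, turning it into an interval is a short computation. The Python sketch below assumes that the posterior mean of $\log R_t$ (`h_star`) and its unscaled delta-method variance (`var_h`) have already been computed, and it applies a scaling by the estimated mean-to-variance ratio of the negative binomial model, $1/(1 + \hat{\mu}(t)/\hat{\rho})$, which is an assumption of this example. The function name `lognormal_interval` and all numerical inputs are hypothetical.

```python
import math
from statistics import NormalDist

def lognormal_interval(h_star, var_h, mu_hat_t, rho_hat, alpha=0.05):
    """Quantile-based (1 - alpha) interval for R_t = exp(h).

    h_star   : approximate posterior mean of log R_t
    var_h    : unscaled delta-method variance of log R_t
    mu_hat_t : estimated mean incidence at time t
    rho_hat  : estimated overdispersion parameter
    """
    # Mean-to-variance ratio of NegBin(mu, rho): mu / (mu + mu^2 / rho).
    scale = 1.0 / (1.0 + mu_hat_t / rho_hat)
    sd = math.sqrt(scale * var_h)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # upper alpha/2 quantile
    return math.exp(h_star - z * sd), math.exp(h_star + z * sd)

# Illustrative inputs only.
lower, upper = lognormal_interval(h_star=math.log(1.2), var_h=0.04,
                                  mu_hat_t=20.0, rho_hat=5.0)
```

Because the interval is symmetric on the log scale, the geometric mean of its end points equals `exp(h_star)`, the median of the lognormal approximation.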
Estimation of $R_t$ with LPSMALA
In Bayesian statistics, posterior distributions obtained with Bayes’ theorem often entail a high degree of complexity and are typically not analytically tractable. To circumvent this problem, MCMC methods have been developed for generating samples from (possibly unnormalized) target distributions [26]. One of the most popular MCMC methods together with the Gibbs sampler [27] is the Metropolis-Hastings (MH) algorithm originally proposed by [28] and later generalized by [29]. In this section, we propose to implement a modified version of the Metropolis-adjusted Langevin algorithm (MALA) [30] within the EpiLPS framework. The major advantage of MALA as compared to MH algorithms is that the proposal distribution is based upon a discretized approximation of the Langevin diffusion that uses the gradient of the target posterior distribution. These “smarter” proposals make use of additional information about the target density so that algorithms based on Langevin dynamics can converge at sub-geometric rates and tend to be more efficient than naive random-walk Metropolis algorithms [31, 32].
This motivates our choice for embedding a MALA algorithm in EpiLPS as an efficient way of obtaining MCMC samples for inference on the instantaneous reproduction number via the renewal equation. The end-user thus has a flexible choice regarding the underlying approach for estimating $R_t$: either via Laplacian-P-splines, where the uncertainty surrounding the parameter λ responsible for smoothing is ignored and λ is fixed at its maximum a posteriori (LPSMAP); or via a modified MALA algorithm, where the uncertainty surrounding the penalty (and overdispersion) parameter is fully taken into account (LPSMALA). The latter approach yields samples from the joint posterior of the spline vector and the penalty and overdispersion parameters. These samples can then be injected in functionals of the spline vector to obtain smooth estimates of the epidemic curve as well as the instantaneous reproduction number. Another advantage is that highest posterior density intervals can be easily calculated with LPSMALA.
Conditional posteriors for a “Metropolis-within-Gibbs”.
Joint posterior of (ζ, λ)
Let ζ = (θ⊤, ρ)⊤ be the (K + 1)-dimensional vector gathering the B-spline coefficients θ and the overdispersion parameter ρ. Using Bayes’ theorem, the joint posterior distribution for ζ, λ and δ is:
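$$p(\zeta, \lambda, \delta \mid \mathcal{D}) \propto \mathcal{L}(\zeta; \mathcal{D})\, p(\theta \mid \lambda)\, p(\lambda \mid \delta)\, p(\delta)\, p(\rho). \tag{15}$$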
The analytical formulas of the chosen priors are:
Injecting the above priors into (15) yields:
(16)
Conditional posteriors of ζ, λ and δ
The following conditional posterior distributions can be directly obtained from (16):
(17)
(18)
(19)
Sampling from the joint posterior
As the full conditionals $p(\zeta \mid \lambda, \delta, \mathcal{D})$, $p(\lambda \mid \zeta, \delta, \mathcal{D})$ and $p(\delta \mid \zeta, \lambda, \mathcal{D})$ are available, we follow a “Metropolis-within-Gibbs” strategy to sample the joint posterior $p(\zeta, \lambda, \delta \mid \mathcal{D})$. In particular, the hyperparameters λ and δ will be sampled in a Gibbs step, while ζ will be sampled using a modified Langevin-Hastings algorithm. This approach is presented in [33] in the context of Bayesian density estimation (see also [34] for the use of MALA in a proportional hazards model and [35] for a recent implementation in mixture cure models). We adapt the algorithm of the latter reference to our EpiLPS methodology. In particular, the variance-covariance matrix in the Langevin diffusion will be replaced by the variance-covariance matrix obtained with LPSMAP. The correlation structure borrowed from LPSMAP improves convergence and chain mixing.
The modified Metropolis-adjusted Langevin algorithm.
In what follows, we prefer to work under the log(⋅) parameterization for ρ, i.e. $w = \log(\rho)$, and denote by $\tilde{\zeta} = (\theta^\top, w)^\top$ the (K + 1)-dimensional vector of B-spline amplitudes and (log) overdispersion $w$. Under this parameterization, the conditional posterior of $\tilde{\zeta}$ given λ and δ can be obtained from (17) by using the transformation method of random variables:
(20)
with the following log-likelihood under the reparameterization:
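$$\ell(\tilde{\zeta}; \mathcal{D}) \doteq \sum_{t=1}^{T} \left\{ \log \Gamma(y_t + e^{w}) - \log \Gamma(e^{w}) + y_t\, \theta^\top b(t) + w\, e^{w} - (y_t + e^{w}) \log\left(\exp(\theta^\top b(t)) + e^{w}\right) \right\}. \tag{21}$$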
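Let us denote by $\tilde{\zeta}^{(m-1)}$ the state of the chain at iteration (m − 1). In the Langevin-Hastings algorithm, the proposal for the vector $\tilde{\zeta}$ at iteration m is a draw from the following multivariate Gaussian distribution:

$$\left(\tilde{\zeta}^{(\text{prop})} \mid \tilde{\zeta}^{(m-1)}\right) \sim \mathcal{N}_{K+1}\left( \tilde{\zeta}^{(m-1)} + \frac{\varrho}{2}\, \Sigma_{LH}\, \nabla_{\tilde{\zeta}} \log p(\tilde{\zeta}^{(m-1)} \mid \lambda, \delta, \mathcal{D}),\ \varrho\, \Sigma_{LH} \right), \tag{22}$$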
where ϱ > 0 is a tuning parameter that has to be carefully chosen in order to reach a desired acceptance rate and ΣLH is the following block-diagonal variance-covariance matrix:
(23)
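where Σ* is the K-dimensional covariance matrix obtained with LPSMAP. The gradient of $\log p(\tilde{\zeta} \mid \lambda, \delta, \mathcal{D})$ can be decomposed as follows:

$$\nabla_{\tilde{\zeta}} \log p(\tilde{\zeta} \mid \lambda, \delta, \mathcal{D}) = \left( \nabla_\theta \log p(\tilde{\zeta} \mid \lambda, \delta, \mathcal{D})^\top,\ \frac{\partial \log p(\tilde{\zeta} \mid \lambda, \delta, \mathcal{D})}{\partial w} \right)^\top, \tag{24}$$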
and is analytically available (see S1 Appendix for more details). All the quantities related to the Langevin-Hastings proposal have been analytically derived, so that the draw in (22) can be obtained (for a given value of λ and δ). As in a classic MH algorithm, the next step consists in computing the acceptance probability:
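$$\pi\left(\tilde{\zeta}^{(m-1)}, \tilde{\zeta}^{(\text{prop})}\right) = \min\left\{ 1,\ \frac{p(\tilde{\zeta}^{(\text{prop})} \mid \lambda, \delta, \mathcal{D})\ q(\tilde{\zeta}^{(\text{prop})}, \tilde{\zeta}^{(m-1)})}{p(\tilde{\zeta}^{(m-1)} \mid \lambda, \delta, \mathcal{D})\ q(\tilde{\zeta}^{(m-1)}, \tilde{\zeta}^{(\text{prop})})} \right\}, \tag{25}$$

where $q(a, b)$ denotes the (Gaussian) proposal density of a move from $a$ to $b$ and $p(\cdot \mid \lambda, \delta, \mathcal{D})$ the target (conditional) posterior distribution. Finally, we generate a uniform random variable $u \sim \mathcal{U}(0, 1)$ and accept the proposed vector $\tilde{\zeta}^{(\text{prop})}$ if $u \le \pi$ and reject it otherwise. While iterating through the Metropolis-within-Gibbs algorithm, the tuning parameter $\varrho$ is automatically adapted to reach the optimal acceptance rate of 0.57 [31, 36, 37]. The pseudo-code below summarizes the LPSMALA algorithm.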
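LPSMALA algorithm to sample the posterior $p(\tilde{\zeta}, \lambda, \delta \mid \mathcal{D})$.
1: Fix initial values $m = 0$, $\lambda^{(0)}$, $\delta^{(0)}$, $\varrho^{(0)}$ and $\tilde{\zeta}^{(0)}$.
2: for $m = 1, \dots, M$ do
3: (Langevin-Hastings)
4: Compute Langevin diffusion: the mean of the proposal (22) at $\tilde{\zeta}^{(m-1)}$, given $\lambda^{(m-1)}$, $\delta^{(m-1)}$ and $\varrho^{(m-1)}$.
5: Generate a proposal: $\tilde{\zeta}^{(\text{prop})}$ drawn from (22).
6: Compute acceptance probability: $\pi$ as in (25).
7: Draw $u \sim \mathcal{U}(0, 1)$.
8: if $u \le \pi$ set $\tilde{\zeta}^{(m)} = \tilde{\zeta}^{(\text{prop})}$ (accept), else $\tilde{\zeta}^{(m)} = \tilde{\zeta}^{(m-1)}$ (reject).
9: (Gibbs sampler)
10: Draw $\lambda^{(m)}$ from its conditional posterior (18),
11: Draw $\delta^{(m)}$ from its conditional posterior (19).
12: (Adaptive tuning)
13: Update $\varrho^{(m)}$.
14: end for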
The adaptive tuning part (line 13) involves the step function , with ϵ = 10−4 and
, see [33] for details. Finally, the ratio
entering the computation of the acceptance probability (line 6) is derived in S1 Appendix.
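The Langevin-Hastings step can be illustrated with a stripped-down Python sketch for a scalar target. It keeps the gradient-informed Gaussian proposal and the Metropolis-Hastings correction, but it is not the LPSMALA algorithm itself: the preconditioning matrix is replaced by 1, the tuning parameter `step` is held fixed (no adaptive tuning) and there is no Gibbs step for hyperparameters. The function name `mala` and the standard normal target are assumptions chosen for the example.

```python
import math
import random

def mala(log_target, grad_log_target, x0, step, n_iter, seed=1):
    """Metropolis-adjusted Langevin sampler for a scalar target density."""
    rng = random.Random(seed)

    def drift(x):
        # Mean of the Langevin proposal: x + (step / 2) * gradient of log target.
        return x + 0.5 * step * grad_log_target(x)

    def log_q(frm, to):
        # Log proposal density (up to a constant) of moving from frm to to.
        return -((to - drift(frm)) ** 2) / (2.0 * step)

    x, draws, accepted = x0, [], 0
    for _ in range(n_iter):
        prop = rng.gauss(drift(x), math.sqrt(step))
        log_ratio = (log_target(prop) + log_q(prop, x)
                     - log_target(x) - log_q(x, prop))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = prop  # accept the proposal
            accepted += 1
        draws.append(x)  # on rejection the current state is repeated
    return draws, accepted / n_iter

# Standard normal target: log density -x^2 / 2 (up to a constant), gradient -x.
draws, acc_rate = mala(lambda x: -0.5 * x * x, lambda x: -x,
                       x0=3.0, step=1.0, n_iter=20000)
kept = draws[2000:]  # discard a burn-in phase
post_mean = sum(kept) / len(kept)
post_var = sum((d - post_mean) ** 2 for d in kept) / len(kept)
```

In an adaptive variant, `step` would be updated at each iteration to steer `acc_rate` toward a target value such as the 0.57 mentioned above.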
Posterior inference with LPSMALA.
Provided the LPSMALA algorithm is iterated long enough, say after iterations, MCMC theory certifies that
can be viewed as random draws from the target posterior distribution
. Note that a convenient starting point for the initial values of the parameters might be to fix them at their LPSMAP estimate. Given the sample
, inference on quantities that are functions of θ becomes straightforward in the sense that point estimates and credible intervals can be easily obtained. A point estimate for the mean number of incidence counts at time t is taken to be the posterior mean (after discarding the burn-in phase):
(26)
Note also that
can be used to compute highest posterior density intervals of μ(t) at any point t. Using the renewal equation and the MCMC sample, one can apply the “plug-in” method and recover the following estimate of the instantaneous reproduction number at time point t:
(27)
Also, using
, one can compute a highest posterior density interval of $R_t$ at time step t.
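A highest posterior density (HPD) interval can be approximated from an MCMC sample by searching for the shortest interval that contains the required fraction of the sorted draws. The Python sketch below shows this generic empirical construction, which is valid for unimodal posteriors; it is not taken from the EpiLPS code, and the function name `hpd_interval` and the toy samples are hypothetical.

```python
import math

def hpd_interval(samples, cred=0.95):
    """Shortest interval containing a fraction `cred` of the sorted draws."""
    xs = sorted(samples)
    n = len(xs)
    m = max(1, math.ceil(cred * n))  # number of draws inside the interval
    # Slide a window of m consecutive order statistics and keep the narrowest.
    best = min(range(n - m + 1), key=lambda i: xs[i + m - 1] - xs[i])
    return xs[best], xs[best + m - 1]

# Right-skewed toy sample: the HPD interval hugs the dense left tail.
skewed = [i ** 2 for i in range(1, 21)]
hpd_skewed = hpd_interval(skewed, cred=0.5)

# Evenly spaced draws plus one outlier: the outlier is left out.
with_outlier = [0.1 * i for i in range(100)] + [50.0]
hpd_outlier = hpd_interval(with_outlier, cred=0.9)
```

Applied to the draws of $R_t$ at a fixed time point (after burn-in), the same function would return an empirical HPD interval for the reproduction number.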
Results
Setting of the simulation study
In this section, a numerical study is implemented with nine epidemic scenarios to assess the accuracy with which EpiLPS is able to track the target reproduction number over time. EpiLPS results are compared with the instantaneous reproduction number estimate from the EpiEstim package [3] using three sliding-window options (the default weekly windows, three-day windows and daily windows). In addition, we distinguish between comparisons of EpiLPS against EpiEstim with estimates reported on the last day of a window, following the convention of [3], and $R_t$ estimates reported at the midpoint of a smoothing window, following the best practice recommendation of [2]. For EpiLPS, K = 40 (cubic) B-splines are specified with a second-order penalty and a chain length of 3 000 for LPSMALA (including a burn-in of size 1 000). In each scenario, S = 100 incidence time series of T days are simulated (initiated with 10 index cases). The epidemic data generating process computes the mean incidence at a given day t, i.e. μ(t), according to the renewal equation, and the incidence of cases at time point t is sampled from the negative binomial distribution yt ∼ NegBin(μ(t), ρ). The simulation study also accounts for varying degrees of overdispersion by using different values of ρ in the considered scenarios. Furthermore, the incidence data are generated according to three different serial interval distributions, namely φFLU, φSARS and φMERS, corresponding to an influenza, a SARS-CoV-1 and a MERS-CoV like serial interval, respectively. The discretized versions of the SI distributions are computed by using the Cori et al. (2013) [3] discretization formula assuming a (shifted) Gamma distribution. In Scenario 1, a constant instantaneous reproduction number
is considered. Scenario 2 imitates an intervention strategy, so that
until a sudden drop to
occurs at day t = 20. The latter scenario makes it possible to check whether EpiLPS is able to quickly react to a sudden change in $R_t$. Scenario 3 is characterized by a more wiggly structure for $R_t$ and Scenario 4 considers the case of a vanishing epidemic with a monotonically decreasing reproduction number. Scenarios 5–8 assume the same functional form for $R_t$ as in Scenarios 1–4 but with a different serial interval distribution. In Scenario 9, the $R_t$ function is chosen in such a way that there is a single large wave in the early phase of the epidemic and a more stable pattern (with smaller waves) in the late phase. Table 1 summarizes the time domain of the epidemic curve, the target $R_t$ function, the serial interval distribution and its associated source(s) in the literature.
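The data generating process described above (renewal equation for the mean, negative binomial sampling for the counts) can be sketched with the Python standard library alone. The exact simulation code is in S2 Appendix and the GitHub repository, so the following is only an illustration: the function names `rpois`, `rnegbin` and `simulate_incidence`, the three-day serial interval and the constant reproduction number of 1.3 are assumptions of this example. Negative binomial draws are obtained as a Gamma-Poisson mixture.

```python
import math
import random

def rpois(lam, rng):
    """Poisson draw (Knuth's method), applied in chunks to avoid underflow."""
    total = 0
    while lam > 0:
        chunk = min(lam, 500.0)
        limit, k, prod = math.exp(-chunk), 0, rng.random()
        while prod > limit:
            k += 1
            prod *= rng.random()
        total += k
        lam -= chunk
    return total

def rnegbin(mu, rho, rng):
    """NegBin(mu, rho) draw: Poisson with a Gamma(shape=rho, scale=mu/rho) mean."""
    if mu <= 0:
        return 0
    return rpois(rng.gammavariate(rho, mu / rho), rng)

def simulate_incidence(rt, si, rho, n_index=10, seed=42):
    """Simulate daily counts: mu(t) = R_t * sum_s phi_s * y_{t-s}, y_t ~ NegBin."""
    rng = random.Random(seed)
    y = [n_index]  # day 1: index cases
    for t in range(2, len(rt) + 1):
        smax = min(t - 1, len(si))
        infectiousness = sum(si[s - 1] * y[t - s - 1] for s in range(1, smax + 1))
        y.append(rnegbin(rt[t - 1] * infectiousness, rho, rng))
    return y

# Illustrative run: 40 days, constant reproduction number, toy serial interval.
epidemic = simulate_incidence(rt=[1.3] * 40, si=[0.2, 0.5, 0.3], rho=5.0)

# Monte Carlo check of the mean and variance of the negative binomial sampler.
check_rng = random.Random(7)
nb_draws = [rnegbin(20.0, 5.0, check_rng) for _ in range(20000)]
nb_mean = sum(nb_draws) / len(nb_draws)
nb_var = sum((d - nb_mean) ** 2 for d in nb_draws) / len(nb_draws)
```

Smaller values of `rho` produce noisier epidemic curves for the same reproduction number, which is how varying degrees of overdispersion can be explored.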
The simulation study is organized as follows. First, we compare EpiLPS with EpiEstim using the convention of Cori and colleagues, namely reporting the estimate at the end of the smoothing window, which is well suited for real-time estimation. The latter approach reports the $R_t$ estimate computed in the window [t − ω; t], where ω denotes the window width. Next, the Gostic et al. (2020) [2] recommendation is used, where the $R_t$ estimate is reported at the center of the window, i.e. [t − ω/2; t + ω/2]. Concentrating on the window midpoint avoids lagged $R_t$ estimates at the cost of ruling out estimates at the last ω/2 time points as, in that case, the upper bound of the window reaches future calendar days for which data are not yet available. Fig 1 summarizes the two window structures used in the simulation study.
Illustration of smoothing windows of width ω to estimate $R_t$ with EpiEstim. (a) Cori et al. (2013) [3] convention with sliding windows [t − ω; t], where $R_t$ is reported at the end of the window. (b) Gostic et al. (2020) [2] recommendation with centered sliding windows [t − ω/2; t + ω/2], where $R_t$ is reported at the midpoint of the window. Under the midpoint rule, $R_t$ estimates for the last ω/2 time units are unavailable (∅).
Comparing EpiLPS with EpiEstim at window boundary
The performance indicators computed for each scenario include the average bias, mean square error (MSE), coverage probability (CP) and width (CIΔ) of 90% and 95% credible intervals for the estimator (see detailed formulas in S2 Appendix) with EpiLPS and EpiEstim, respectively. Estimates obtained during the first week of the epidemic are ignored as they may be subject to serious bias due to the poor information carried by the (few) incident cases in such an early phase. Therefore, the performance measures are computed as an average over days t = 8, …, T, where T is the upper bound of $\mathcal{T}$. For a chosen time window, the performance measures for EpiEstim are computed by comparing the true value of the reproduction number at time step t with the estimated reproduction number (and credible interval) obtained at the end of the chosen time window (cf. Fig 1). A detailed description of the data generating process and the figures of the estimated $R_t$ trajectories for all scenarios are provided in S2 Appendix.
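The exact formulas for the performance indicators are given in S2 Appendix and are not reproduced here. As a rough guide, the Python sketch below computes the usual textbook versions of these quantities (bias, MSE, coverage probability and interval width), averaged over simulated datasets and over time points; these definitions, the function name `performance` and the toy inputs are assumptions of this example and may differ in detail from the ones used in the paper.

```python
def performance(true_rt, est, lower, upper):
    """Average bias, MSE, coverage and interval width.

    true_rt[i]  : target reproduction number at the i-th retained time point
    est[s][i]   : point estimate for simulated dataset s at time point i
    lower[s][i], upper[s][i] : credible interval bounds for dataset s
    """
    n = 0
    bias = mse = cover = width = 0.0
    for s in range(len(est)):
        for i, target in enumerate(true_rt):
            err = est[s][i] - target
            bias += err
            mse += err ** 2
            cover += 1.0 if lower[s][i] <= target <= upper[s][i] else 0.0
            width += upper[s][i] - lower[s][i]
            n += 1
    return {"bias": bias / n, "mse": mse / n, "cp": cover / n, "width": width / n}

# Toy inputs: two simulated datasets, two time points, target R_t = 1.
perf = performance(
    true_rt=[1.0, 1.0],
    est=[[1.1, 0.9], [1.2, 1.0]],
    lower=[[0.9, 0.95], [1.05, 0.8]],
    upper=[[1.3, 1.1], [1.4, 1.2]],
)
```

In the setting of the simulation study, `true_rt` would hold the target values for days t = 8, …, T and the outer lists would have length S = 100.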
The performance measures given in Tables 2 and 3 provide interesting insights into the behavior of EpiLPS and EpiEstim across the considered scenarios. In terms of bias, EpiLPS is competitive with EpiEstim, as both LPSMAP and LPSMALA outperform EpiEstim (whatever the time window size) in Scenarios 4–8. For the remaining scenarios, the bias of the two competing methods is broadly similar. Regarding the MSE, EpiLPS exhibits smaller values than EpiEstim with three-day and daily windows across all scenarios. Moreover, specifying smaller time windows in EpiEstim (generally) leads to an increase in both MSE and bias. A close inspection of the coverage probability of credible intervals reveals that EpiLPS has close to nominal coverage in almost all scenarios. This is however not the case for EpiEstim, especially for weekly and three-day windows, where severe to mild undercoverage is observed. Also, EpiEstim tends to show more severe undercoverage in scenarios where data are more overdispersed (see e.g. Scenario 4). More importantly, even when EpiEstim approaches the nominal coverage probability (with daily windows), its credible intervals are much wider (and hence less precise) than those of EpiLPS in almost all scenarios.
The Bias, MSE, coverage probability (CP) and width (CIΔ) of 90% and 95% credible intervals for Rt are averaged over days t = 8, …, 40. For EpiEstim, Rt is reported at the end of the window.
The Bias, MSE, coverage probability (CP) and width (CIΔ) of 90% and 95% credible intervals for Rt are averaged over days t = 8, …, 60. For EpiEstim, Rt is reported at the end of the window.
Figs 2 to 4 summarize the epidemic curves and the trajectories of the estimated Rt with LPSMAP (blue curves) and EpiEstim under weekly sliding windows (green curves) for selected scenarios. These figures highlight the flexibility and the precision with which Laplacian-P-splines capture the reproduction number over the time course of the epidemic. The dashed (dotted) curves represent the pointwise median (computed over the S = 100 estimates) of Rt with LPSMAP (EpiEstim). For LPSMAP, the median closely follows the true pattern of Rt even under strong nonlinearities, as in Fig 4. The EpiEstim trajectories appear shifted to the right of the target Rt curve. This lag arises because, for weekly sliding windows, the Rt estimate provided by EpiEstim at the end of the window is entirely based on data from past days and is therefore lagged compared to the target (instantaneous) Rt. This shift effect can be corrected by decreasing the time window (e.g., using daily windows) at the cost of more “noisy” trajectories. Even then, the median Rt estimates of EpiEstim appear to capture the target Rt function less precisely than LPSMAP/LPSMALA (see S2 Appendix) across most of the considered scenarios.
(Left) Simulated incidence data for Scenario 2. (Center) Estimated trajectories of Rt for each simulated dataset with LPSMAP. (Right) Estimated trajectories of Rt with EpiEstim using weekly sliding windows and Rt reported at the end of the window. The pointwise median estimate of Rt for EpiLPS (dashed) and EpiEstim (dotted) is also shown.
(Left) Simulated incidence data for Scenario 3. (Center) Estimated trajectories of Rt for each simulated dataset with LPSMAP. (Right) Estimated trajectories of Rt with EpiEstim using weekly sliding windows and Rt reported at the end of the window. The pointwise median estimate of Rt for EpiLPS (dashed) and EpiEstim (dotted) is also shown.
(Left) Simulated incidence data for Scenario 9. (Center) Estimated trajectories of Rt for each simulated dataset with LPSMAP. (Right) Estimated trajectories of Rt with EpiEstim using weekly sliding windows and Rt reported at the end of the window. The pointwise median estimate of Rt for EpiLPS (dashed) and EpiEstim (dotted) is also shown.
To summarize, this simulation study sheds light on the trade-off faced by the Cori method when estimating the instantaneous reproduction number. Choosing a weekly sliding window as a default option in EpiEstim can lead to a forward shifted (and so inaccurate) estimate of Rt. Smaller time windows in EpiEstim alleviate the lag effect, but the price to pay is that the fitted Rt trajectory is wiggly (undersmoothing) as it captures more variation than necessary [2].
EpiLPS does not suffer from this trade-off, as it is naturally resolved by P-splines. In fact, the time window size in EpiEstim is analogous to the smoothing parameter λ in EpiLPS, as both quantities determine the smoothness of the fit. The major advantage of EpiLPS is that λ is estimated within the Bayesian model (either via maximum a posteriori estimation or MCMC), while the time window in EpiEstim is chosen freely outside the model.
Comparing EpiLPS with EpiEstim at window midpoint
To correct for the lag effect in EpiEstim resulting from reporting the reproduction number estimate at the end of the window, Gostic and colleagues [2] recommend reporting it at the center of the window to obtain an estimate that is more accurately oriented in time. It is therefore important to compare the performance of EpiLPS against this “corrected” EpiEstim output, as it is considered best practice for retrospective usage and, as such, is a legitimate candidate against EpiLPS, which is by nature only partially real-time (see next section). We therefore rerun the simulations for all scenarios with the corrected EpiEstim output under weekly windows (ω = 6) and three-day windows (ω = 2), where the estimated Rt is reported at the window midpoint. Results for a daily window (ω = 0) are identical to those reported in Tables 2 and 3, as sliding windows become degenerate intervals [t, t] at each time step. The performance measures are reported in Table 4, and Figs 5 to 7 summarize the estimated trajectories for the same scenarios as in the previous section for the sake of comparison. As expected, the resulting Rt trajectories for EpiEstim are now closer to the target and the lag effect has disappeared. Despite this improvement, the performance indicators clearly highlight that EpiLPS outperforms EpiEstim in all scenarios except Scenario 1, where the numbers are of a similar order of magnitude. In general, the EpiLPS approach is less biased and provides credible intervals with close to nominal coverage. Even when Rt is reported at the middle of the window, EpiEstim results are less accurate, especially regarding credible intervals with weekly windows, which can strongly undercover. This has important implications for the recommendation of using EpiLPS in practice, and detailed guidelines are provided below.
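The effect of the reporting convention can be sketched with a Cori-type sliding-window estimator. The code below is a simplified illustration and not the EpiEstim implementation: it computes the posterior mean of Rt over windows [t − ω, t] under a Gamma prior (shape a = 1 and scale b = 5 are assumed here) and attaches each estimate either to the end of the window or to its midpoint t − ω/2. The function name `windowed_rt` and its arguments are hypothetical.

```python
def windowed_rt(incidence, si, omega=6, a=1.0, b=5.0, report="end"):
    """Posterior mean of Rt on sliding windows [t - omega, t].

    incidence: daily case counts (day 1 at index 0).
    si:        discretized serial interval, si[0] being the weight of lag 1.
    report:    "end" attaches the estimate to day t, "midpoint" to t - omega // 2.
    Returns a dict mapping the (1-based) reporting day to the posterior mean.
    """
    n_days = len(incidence)
    # Total infectiousness: Lambda_t = sum_s si[s - 1] * I_{t - s}.
    lam = [
        sum(si[s - 1] * incidence[t - s] for s in range(1, min(t, len(si)) + 1))
        for t in range(n_days)
    ]
    out = {}
    # Start at index omega + 1 so that no window contains day 1,
    # where the total infectiousness is zero.
    for t in range(omega + 1, n_days):
        window = range(t - omega, t + 1)
        cases = sum(incidence[u] for u in window)
        infectiousness = sum(lam[u] for u in window)
        # Gamma posterior mean: shape * scale.
        post_mean = (a + cases) / (1.0 / b + infectiousness)
        day = t + 1 if report == "end" else t + 1 - omega // 2
        out[day] = post_mean
    return out
```

Both conventions yield the same numbers; only the time index they are attached to differs. With ω = 6 the midpoint convention shifts every estimate three days earlier, which is why the averaging range for weekly windows ends three days before the last day of the epidemic.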
(Left) Simulated incidence data for Scenario 2. (Center) Estimated trajectories of Rt for each simulated dataset with LPSMAP. (Right) Estimated trajectories of Rt with EpiEstim using weekly sliding windows and Rt reported at the window midpoint. The pointwise median estimate of Rt for EpiLPS (dashed) and EpiEstim (dotted) is also shown.
(Left) Simulated incidence data for Scenario 3. (Center) Estimated trajectories of Rt for each simulated dataset with LPSMAP. (Right) Estimated trajectories of Rt with EpiEstim using weekly sliding windows and Rt reported at the window midpoint. The pointwise median estimate of Rt for EpiLPS (dashed) and EpiEstim (dotted) is also shown.
(Left) Simulated incidence data for Scenario 9. (Center) Estimated trajectories of Rt for each simulated dataset with LPSMAP. (Right) Estimated trajectories of Rt with EpiEstim using weekly sliding windows and Rt reported at the window midpoint. The pointwise median estimate of Rt for EpiLPS (dashed) and EpiEstim (dotted) is also shown.
The performance indicators in Scenarios 1–8 for Rt are averaged over days t = 8, …, 37 for LPSMAP, LPSMALA and weekly windows (EpiEstim) and over days t = 8, …, 39 for three-day windows under EpiEstim. In Scenario 9, the performance indicators for Rt are averaged over days t = 8, …, 57 for LPSMAP, LPSMALA and weekly windows (EpiEstim) and over days t = 8, …, 59 for three-day windows under EpiEstim. For EpiEstim, Rt is reported at the window midpoint.
Real-time considerations
EpiEstim is a powerful tool to estimate Rt in real-time and is probably the best tool currently available to deliver timely estimates of the reproduction number [41]. EpiLPS can be considered a real-time approach only to a certain degree: the real-time concept is partially present but fundamentally different from the one proposed in EpiEstim. By real-time method, we mean a method for which an estimate of the reproduction number at time t uses data up to (and including) time t. Let us assume that EpiLPS is applied on epidemic data over a specific period, say, [1, T]. For time points t = 1, …, T − 1, EpiLPS is clearly non real-time, as the global smoothing of Rt on the “bandwidth” [1, T] is computed based on past, current and future data values. However, at the domain boundary (time point T), the EpiLPS estimate of Rt exclusively makes use of data up to time T and is therefore real-time (in the same sense as EpiEstim).
The real-time characteristic of EpiLPS at this last time point is however only retained temporarily: if the method is applied (the next day) over the period [1, T + 1], the estimate of the reproduction number at time T is no longer real-time, since it is computed from data up to time T and the “future” data value available at time point T + 1. For EpiEstim, the real-time characteristic of the Rt estimate is retained for any time point t, and the method is therefore more suitable for timely estimation. The real-time properties of EpiLPS and EpiEstim are compared and illustrated in Fig 8.
EpiLPS provides real-time estimates of Rt only at the boundary of the considered domain and estimates at preceding time points are retrospective. On the contrary, estimates of Rt with EpiEstim are always real-time and therefore preferred for a timely usage.
The extensive simulation results provided here suggest that EpiLPS is a robust retrospective estimation method. In particular, it addresses a challenge faced by many existing methods, namely that estimates typically lead or lag the true value [2]. EpiLPS is therefore a powerful retrospective tool to estimate the reproduction number during and/or after epidemic outbreaks. It is however less preferable than EpiEstim for real-time estimation and should therefore be used with care for timely purposes.
Computing time and sensitivity analyses
The computational time of the EpiLPS algorithm is mainly affected by the number K of B-splines specified in the basis and the total number of days T of the epidemic. Table 5 gives an overview of the real elapsed time in seconds required to run the EpiLPS routines for different (T, K) couples. LPSMAP requires far fewer computational resources, as it is a completely sampling-free approach relying on the MAP estimate of the hyperparameter vector. Even with an epidemic of roughly two months and K = 60, LPSMAP delivers results in a fraction of a second. LPSMALA needs a larger computational budget, as the algorithm relies on an iterative sampling scheme (MCMC). However, even for (T = 60, K = 60), LPSMALA requires less than 10 seconds, which is reasonable given the number of parameters involved in the model.
EpiLPS algorithm running on an Intel Xeon E-2186M CPU @2.90GHz with 16 GB RAM.
We assessed the sensitivity of the EpiLPS estimated reproduction number with respect to model inputs that are free to choose, in order to check whether EpiLPS is robust to different parameter choices. In particular, we focus on the sensitivity of the fit (with LPSMAP) to the number K of B-splines and to the parameters aδ and bδ of the Gamma hyperprior on δ. The sensitivity analyses are implemented in S2 Appendix and reveal a negligible sensitivity of the estimated Rt curve with respect to the above-mentioned parameters. We also discuss the sensitivity of the reproduction number estimates when computed over time domains of increasing width, for instance on [1, T1] and [1, T2] with T2 > T1. This gives an idea of the magnitude of variation in the estimated Rt in the domain [1, T1] when EpiLPS is actually fitted on the wider domain [1, T2]. Results show that although past values of Rt vary when the method is applied to larger time domains, due to the global smoothing approach inherent in EpiLPS, the estimated values of the reproduction number remain reasonably close to the target. S3 Appendix provides ancillary results on the estimation performance of the overdispersion parameter ρ and sensitivity analyses of the computed credible intervals for Rt with respect to different couples (aδ, bδ).
Application to observed case counts in infectious disease epidemics
Epidemics of SARS-CoV-1 and influenza A H1N1.
In this section, the LPSMALA algorithm is applied to two historical outbreak datasets presented in [3]: the 2003 SARS outbreak in Hong Kong and the 2009 pandemic influenza in a school in Pennsylvania (USA). We use K = 40 B-splines with a second-order penalty and the serial interval distributions provided in the EpiEstim package [4]. The LPSMALA algorithm is implemented with a chain of length 25 000. Acceptance rates for the generated chains are close to the optimal value of 57% and the posterior samples have converged according to the Geweke diagnostic test [42] (at the 1% level of significance). Fig 9 shows the smoothed epidemic curves and the estimated Rt for the two outbreaks. Results for the SARS data show that the reproduction number reaches a first peak during the third week (95% CI: 5.19–16.47) and a second, more moderate peak around week 6 (95% CI: 1.82–3.82). After day t = 43, the epidemic is under control and Rt smoothly decays below 1. For the pandemic influenza in Pennsylvania, at the end of the second week Rt is around 2.05 (95% CI: 1.21–3.06). During the middle of the third week, the situation is less severe and Rt drops below 1. As noted in [3], a few cases appeared in the last days of the epidemic, generating an upward trend in the Rt estimates.
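For readers unfamiliar with Langevin-based samplers, the following toy sketch shows a generic Metropolis-adjusted Langevin step on a one-dimensional target and how the acceptance rate is tracked. It is not the LPSMALA algorithm (see S1 Appendix for its gradient and proposal ratio); the target, the step size and the function name `mala` are assumptions made for illustration.

```python
import math
import random


def mala(log_target, grad_log_target, x0, step, n_iter, seed=0):
    """Generic one-dimensional Metropolis-adjusted Langevin sampler (toy version).

    Returns the chain and the observed acceptance rate.
    """
    rng = random.Random(seed)
    x, accepted, chain = x0, 0, []
    for _ in range(n_iter):
        # Langevin proposal: drift along the gradient plus Gaussian noise.
        mean_fwd = x + 0.5 * step * grad_log_target(x)
        y = rng.gauss(mean_fwd, math.sqrt(step))
        # Mean of the reverse move, needed because the proposal is asymmetric.
        mean_bwd = y + 0.5 * step * grad_log_target(y)
        log_q_fwd = -((y - mean_fwd) ** 2) / (2.0 * step)
        log_q_bwd = -((x - mean_bwd) ** 2) / (2.0 * step)
        log_alpha = log_target(y) - log_target(x) + log_q_bwd - log_q_fwd
        # Metropolis-Hastings accept/reject step.
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x, accepted = y, accepted + 1
        chain.append(x)
    return chain, accepted / n_iter


# Toy usage on a standard normal target (log-density up to a constant).
chain, rate = mala(lambda x: -0.5 * x * x, lambda x: -x, 0.0, 1.0, 5000, seed=1)
```

In practice the step size is tuned so that the acceptance rate approaches a chosen target. The 57% figure quoted above is an asymptotic result for high-dimensional targets, so a one-dimensional toy chain like this one need not reproduce it.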
(Left column) EpiLPS fit for the epidemic curve (top) and the instantaneous reproduction number Rt (bottom) of the SARS outbreak in Hong Kong, 2003. (Right column) EpiLPS fit for the epidemic curve (top) and the instantaneous reproduction number Rt (bottom) of the pandemic influenza in Pennsylvania, 2009. The shaded area corresponds to the 95% credible interval at each day.
Application on the SARS-CoV-2 pandemic.
The EpiLPS methodology is illustrated on the SARS-CoV-2 pandemic using publicly available data from the Covid-19 Data Hub [43] and its associated COVID19 package on CRAN (https://cran.r-project.org/package=COVID19). Country-level data on hospitalizations for Belgium, Denmark, Portugal and France from April 5th, 2020 to October 31st, 2021 are used, and a serial interval distribution with a mean of 3 days (and standard deviation of 2.48 days) is assumed [44], discretized as φ = {0.344, 0.316, 0.168, 0.104, 0.068}. Fig 10 shows the estimated reproduction number obtained with EpiLPS and EpiEstim for the four countries. Results are obtained with the LPSMAP algorithm using K = 30 B-splines and a second-order penalty. The gray shaded surface corresponds to 95% (pointwise) credible intervals for Rt with LPSMAP and the dashed curves are for EpiEstim. From a computational perspective, it takes less than 3 seconds to fit the EpiLPS model for the four countries. The fitted reproduction numbers reflect the different waves of the COVID-19 pandemic and the rise in infections at the beginning of September 2021. EpiLPS tends to follow the same trend as the estimates provided by EpiEstim; the only difference is that LPSMAP estimates appear globally smoother, with narrower credible intervals for Belgium, Denmark and Portugal.
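To make the role of the discretized serial interval concrete, the sketch below evaluates a renewal-equation ratio, Rt = μ(t) / Σ φ_s μ(t − s), using the φ quoted above. The smoothed incidence `mu` is a made-up geometric series rather than an EpiLPS fit, and the handling of the first days (truncating the sum, returning NaN when the denominator is zero) is an assumption of this sketch.

```python
# Discretized serial interval quoted in the text (weights for lags 1 to 5).
PHI = [0.344, 0.316, 0.168, 0.104, 0.068]


def renewal_rt(mu, phi=PHI):
    """Plug-in reproduction number from a (smoothed) incidence curve.

    mu:  mean incidence per day (day 1 at index 0).
    phi: serial interval weights, phi[0] being the weight of lag 1.
    """
    rt = []
    for t in range(len(mu)):
        # Infection pressure from previous days, truncated at the start.
        denom = sum(phi[s - 1] * mu[t - s] for s in range(1, min(t, len(phi)) + 1))
        rt.append(mu[t] / denom if denom > 0 else float("nan"))
    return rt


# Hypothetical incidence curves: flat, and growing by 10% per day.
flat = renewal_rt([100.0] * 30)
growing = renewal_rt([100.0 * 1.1 ** t for t in range(30)])
```

Once at least five past days are available, a flat curve gives a ratio of 1 and a steadily growing curve gives a constant ratio above 1, which matches the usual reading of the reproduction number as a threshold quantity.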
The shaded area corresponds to the 95% credible interval at each day. Dashed curves are results obtained with EpiEstim (with weekly sliding windows and estimated Rt reported at the end of the window).
Discussion
EpiLPS (an acronym for Epidemiological modeling with Laplacian-P-Splines) is a fast and flexible tool for Bayesian estimation of the instantaneous reproduction number during epidemic outbreaks. The tool is flexible in the sense that (penalized) spline based approximations provide smoothed estimates of Rt with little computational effort and without imposing any sliding window assumption that could affect the timing and accuracy of the estimator. Moreover, the end user can choose between a fully sampling-free approach (LPSMAP) and an efficient gradient-based MCMC approach with Langevin diffusions (LPSMALA) for inference. The available EpiLPS package (https://cran.r-project.org/package=EpiLPS) allows public health policy makers to analyze incoming data faster than existing methods relying on classic MCMC samplers, so that they are better informed when deciding on control measures for infectious disease outbreaks. The simulation studies in this paper provide encouraging results and support EpiLPS as a robust tool capable of precisely tracking Rt over time. The EpiLPS software package and the early website version (https://epilps.com) provide additional guiding material about the proposed methodology.
EpiLPS cannot be termed a real-time method in the same sense as the Cori method and is therefore less preferred than EpiEstim for real-time analysis. Conceptually, EpiLPS and EpiEstim both use data from the past (EpiLPS also uses data from the future) to estimate the instantaneous reproduction number, but the mechanisms underlying the use of past observations differ. The method of Cori looks back in time only as far as the width of the chosen time window in terms of infected individuals. EpiLPS, on the contrary, has a longer reach, as the P-splines smoother approximates the reproduction number globally (or blockwise) over the entire domain of the epidemic curve, i.e. retrospectively and also including future values (except for the estimate of Rt at the last day of the domain, which makes use of the current-day value and past values only). This difference has important consequences and implies advantages as well as disadvantages. The advantage of working with a time window option as in EpiEstim is that one can control how far back in time to look in order to compute the desired Rt estimate. This is not an option in EpiLPS, as the penalty parameter, the key driver of the degree of smoothness of the fitted Rt curve, is estimated within the model and is not fixed by the user. There is however no free lunch: the downside of having a time window choice in EpiEstim is a trade-off between potential oversmoothing (with a wide time window) and undersmoothing (with a narrow time window). This trade-off is virtually absent in the EpiLPS setting, as P-splines deal with the smoothing problem internally.
When EpiLPS is applied sequentially over time on epidemic curves with wider and wider domains such as [1, T1], [1, T2], [1, T3] with 1 < T1 < T2 < T3, the Rt estimate over past days (for instance t ∈ [1, T1]) will inevitably change, as EpiLPS is by nature a global smoother. This past variability should not be seen as a drawback, as it is essentially an “update” reflecting the fact that the method works with an epidemic curve with a longer domain. The real question is whether the Rt estimate, as it varies over past days, remains in a close neighborhood of the “true” value of the reproduction number. In that respect, the complete simulation study is rather convincing, as it shows that EpiLPS is an accurate method that is successful in capturing the evolution of Rt over time.
EpiEstim and EpiLPS also differ in other respects. For instance, EpiEstim assumes a Gamma distributed prior on the reproduction number, which is conjugate to the Poisson likelihood (EpiEstim assumes that incidence at time step t is Poisson distributed), so that the posterior of Rt also has a Gamma distribution. In EpiLPS, the prior(s) are not directly imposed on the reproduction number, but on the spline parameters (and hyperparameters), and the resulting posterior distribution of Rt with LPSMAP is approximated by a lognormal distribution. Regarding computational complexity, EpiEstim and LPSMAP deliver estimates almost instantly, while LPSMALA requires a larger computing budget as it is an MCMC algorithm. We therefore recommend using LPSMALA over shorter epidemic durations and LPSMAP on longer outbreaks lasting several months. Our analysis suggests that EpiLPS might be more accurate than EpiEstim in the presence of overdispersed epidemiological data, especially when it comes to quantifying the uncertainty of Rt, as EpiLPS is shown to have narrower credible intervals with good coverage performance. A main limitation is that EpiLPS is more prone to numerical instability (e.g. during hyperparameter optimization or in the Newton-Raphson algorithm for the Laplace approximation) than EpiEstim, although such problems were not encountered here. Finally, it is also worth mentioning that Rt estimates delivered by EpiLPS (and EpiEstim) are prone to potential biasing effects [2, 45], since the serial interval is used as a surrogate for the generation interval (time elapsed between infection events of an infector and an infectee), as the latter is less easily observed.
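The Gamma-Poisson conjugacy mentioned above for EpiEstim can be written explicitly. The display below follows the formulation in [3] with a shape-scale parametrization; the symbols a, b and Λ_s are introduced here for illustration, and the window [t − ω, t] uses the ω convention of the simulation study.

```latex
\[
\begin{aligned}
R_t &\sim \mathrm{Gamma}(a, b), \qquad
I_s \mid R_t \sim \mathrm{Poisson}(R_t \Lambda_s), \quad s = t-\omega, \dots, t,
\qquad \Lambda_s = \sum_{u=1}^{s-1} \varphi_u I_{s-u}, \\
R_t \mid I_{t-\omega}, \dots, I_t &\sim
\mathrm{Gamma}\!\left( a + \sum_{s=t-\omega}^{t} I_s,\;
\Big( \frac{1}{b} + \sum_{s=t-\omega}^{t} \Lambda_s \Big)^{-1} \right).
\end{aligned}
\]
```

The posterior mean is the product of the updated shape and scale, so wider windows pool more cases into both sums, which smooths the estimate at the cost of the lag discussed earlier.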
The EpiLPS project opens up several future research directions. A possible extension would be to formulate the EpiLPS model within a zero-inflated (Poisson) framework to cope with incidence time series characterized by an excess of zero counts. Another interesting extension would be to adapt the model to allow for regional variation and imported cases. Moreover, akin to EpiEstim, the EpiLPS methodology could be further developed to explicitly account for uncertainty in the serial interval distribution. Finally, in the face of long-lasting epidemic scenarios involving several variants characterized by different levels of virulence, it would be useful to extend the EpiLPS methodology to allow for smooth transitions of the estimated reproduction number accompanying the evolution of variants.
Supporting information
S1 Appendix. Details for the LPSMALA algorithm.
Analytical gradient for the Langevin-Hastings proposal and analytical version of the ratio of proposal distributions in the LPSMALA algorithm.
https://doi.org/10.1371/journal.pcbi.1010618.s001
(PDF)
S2 Appendix. Simulation results and computational time.
Complete simulation results (for EpiLPS and EpiEstim) when EpiEstim reports Rt at the window boundary, sensitivity analyses and computational time of EpiLPS.
https://doi.org/10.1371/journal.pcbi.1010618.s002
(PDF)
S3 Appendix. Further simulation and sensitivity results.
Complete simulation results (for EpiLPS and EpiEstim) when EpiEstim reports Rt at the window midpoint and additional sensitivity analyses.
https://doi.org/10.1371/journal.pcbi.1010618.s003
(PDF)
References
- 1. White LF, Moser CB, Thompson RN, Pagano M. Statistical estimation of the reproductive number from case notification data. American Journal of Epidemiology. 2021;190(4):611–620. pmid:33034345
- 2. Gostic KM, McGough L, Baskerville EB, Abbott S, Joshi K, Tedijanto C, et al. Practical considerations for measuring the effective reproductive number, Rt. PloS Computational Biology. 2020;16(12):1–21. pmid:33301457
- 3. Cori A, Ferguson NM, Fraser C, Cauchemez S. A new framework and software to estimate time-varying reproduction numbers during epidemics. American Journal of Epidemiology. 2013;178(9):1505–1512. pmid:24043437
- 4. Cori A. EpiEstim: estimate time varying reproduction numbers from epidemic curves (CRAN); 2021. Available from: https://cran.r-project.org/web/packages/EpiEstim/index.html.
- 5. Parag KV. Improved estimation of time-varying reproduction numbers at low case incidence and between epidemic waves. PloS Computational Biology. 2021;17(9):1–23. pmid:34492011
- 6. Abbott S, Hellewell J, Sherratt K, Gostic K, Hickson J, Badr HS, et al. EpiNow2: Estimate Real-Time Case Counts and Time-Varying Epidemiological Parameters; 2020. Available from: https://zenodo.org/record/3957490#.YzmFeUxBxf8.
- 7. Azmon A, Faes C, Hens N. On the estimation of the reproduction number based on misreported epidemic data. Statistics in Medicine. 2014;33(7):1176–1192. pmid:24122943
- 8. Gressani O, Faes C, Hens N. An approximate Bayesian approach for estimation of the reproduction number under misreported epidemic data, MedRxiv [Preprint]; 2021. Available from: https://doi.org/10.1101/2021.05.19.21257438.
- 9. Pircalabelu E. A spline-based time-varying reproduction number for modelling epidemiological outbreaks. LIDAM Discussion Paper ISBA; 2021. Available from: http://hdl.handle.net/2078.1/244926.
- 10. Fraser C. Estimating individual and household reproduction numbers in an emerging epidemic. PloS one. 2007;2(8):e758. pmid:17712406
- 11. Wallinga J, Lipsitch M. How generation intervals shape the relationship between growth rates and reproductive numbers. Proceedings of the Royal Society B: Biological Sciences. 2007;274(1609):599–604. pmid:17476782
- 12. Eddelbuettel D, François R, Allaire J, Ushey K, Kou Q, Russel N, et al. Rcpp: Seamless R and C++ integration. Journal of Statistical Software. 2011;40(8):1–18.
- 13. Anscombe FJ. Sampling theory of the Negative Binomial and logarithmic series distributions. Biometrika. 1950;37(3/4):358–382. pmid:14801062
- 14. Piegorsch WW. Maximum likelihood estimation for the Negative Binomial dispersion parameter. Biometrics. 1990;46(3):863–867. pmid:2242417
- 15. Lloyd-Smith JO. Maximum likelihood estimation of the negative binomial dispersion parameter for highly overdispersed data, with applications to infectious diseases. PloS one. 2007;2(2):e180. pmid:17299582
- 16. Imai C, Armstrong B, Chalabi Z, Mangtani P, Hashizume M. Time series regression model for infectious disease and weather. Environmental Research. 2015;142:319–327. pmid:26188633
- 17. Eilers PHC, Marx BD. Flexible smoothing with B-splines and penalties. Statistical Science. 1996;11(2):89–121.
- 18. Frasso G, Lambert P. Bayesian inference in an extended SEIR model with nonparametric disease transmission rate: an application to the Ebola epidemic in Sierra Leone. Biostatistics. 2016;17(4):779–792. pmid:27324411
- 19. Perperoglou A, Sauerbrei W, Abrahamowicz M, Schmid M. A review of spline function procedures in R. BMC Medical Research Methodology. 2019;19(46):1–16. pmid:30841848
- 20. Eilers PHC, Marx BD. Practical Smoothing: The Joys of P-splines. Cambridge University Press; 2021.
- 21. Lang S, Brezger A. Bayesian P-splines. Journal of Computational and Graphical Statistics. 2004;13(1):183–212.
- 22. Jullion A, Lambert P. Robust specification of the roughness penalty prior distribution in spatially adaptive Bayesian P-splines models. Computational Statistics & Data Analysis. 2007;51(5):2542–2558.
- 23. Gressani O, Lambert P. Laplace approximations for fast Bayesian inference in generalized additive models based on P-splines. Computational Statistics & Data Analysis. 2021;154:107088.
- 24. Rue H, Martino S, Chopin N. Approximate Bayesian inference for latent Gaussian models by using Integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2009;71(2):319–392.
- 25. Tierney L, Kadane JB. Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association. 1986;81(393):82–86.
- 26. Chib S, Greenberg E. Markov Chain Monte Carlo Simulation Methods in Econometrics. Econometric Theory. 1996;12(3):409–431.
- 27. Geman S, Geman D. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on pattern analysis and machine intelligence. 1984;PAMI-6(6):721–741. pmid:22499653
- 28. Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E. Equation of state calculations by fast computing machines. The Journal of Chemical Physics. 1953;21(6):1087–1092.
- 29. Hastings WK. Monte Carlo sampling methods using Markov chains and their applications. Biometrika. 1970;57(1):97–109.
- 30. Roberts GO, Tweedie RL. Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli. 1996;2(4):341–363.
- 31. Roberts GO, Rosenthal JS. Optimal scaling of discrete approximations to Langevin diffusions. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 1998;60(1):255–268.
- 32. Roberts GO, Rosenthal JS. Optimal Scaling for Various Metropolis-Hastings Algorithms. Statistical Science. 2001;16(4):351–367.
- 33. Lambert P, Eilers PHC. Bayesian density estimation from grouped continuous data. Computational Statistics & Data Analysis. 2009;53(4):1388–1399.
- 34. Lambert P, Eilers PHC. Bayesian proportional hazards model with time-varying regression coefficients: A penalized Poisson regression approach. Statistics in Medicine. 2005;24(24):3977–3989. pmid:16320263
- 35. Gressani O, Faes C, Hens N. Laplacian-P-splines for Bayesian inference in the mixture cure model. Statistics in Medicine. 2022;41(14):2602–2626. pmid:35699121
- 36. Haario H, Saksman E, Tamminen J. An Adaptive Metropolis Algorithm. Bernoulli. 2001;7(2):223–242.
- 37. Atchadé YF, Rosenthal JS. On adaptive Markov chain Monte Carlo algorithms. Bernoulli. 2005;11(5):815–828.
- 38. Ferguson NM, Cummings DA, Cauchemez S, Fraser C, Riley S, Meeyai A, et al. Strategies for containing an emerging influenza pandemic in Southeast Asia. Nature. 2005;437(7056):209–214. pmid:16079797
- 39. Lipsitch M, Cohen T, Cooper B, Robins JM, Ma S, James L, et al. Transmission dynamics and control of severe acute respiratory syndrome. Science. 2003;300(5627):1966–1970. pmid:12766207
- 40. Cauchemez S, Nouvellet P, Cori A, Jombart T, Garske T, Clapham H, et al. Unraveling the drivers of MERS-CoV transmission. Proceedings of the national academy of sciences. 2016;113(32):9081–9086. pmid:27457935
- 41. Nash RK, Nouvellet P, Cori A. Real-time estimation of the epidemic reproduction number: Scoping review of the applications and challenges. PloS Digital Health. 2022;1(6):e0000052.
- 42. Geweke J. Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments. Bayesian Statistics. 1992;4:169–193.
- 43. Guidotti E, Ardia D. COVID-19 Data Hub. Journal of Open Source Software. 2020;5(51):2376.
- 44. Kremer C, Braeye T, Proesmans K, André E, Torneri A, Hens N. Observed serial intervals of SARS-CoV-2 for the Omicron and Delta variants in Belgium based on contact tracing data, 19 November to 31 December 2021, MedRxiv [Preprint]; 2022.
- 45. Britton T, Scalia Tomba G. Estimation in emerging epidemics: biases and remedies. Journal of the Royal Society Interface. 2019;16(150). pmid:30958162