Inference on dengue epidemics with Bayesian regime switching models.

Dengue, a mosquito-borne infectious disease caused by the dengue viruses, is present in many parts of the tropical and subtropical regions of the world. All four serotypes of dengue viruses are endemic in Singapore, an equatorial city-state. Frequent outbreaks occur, sometimes leading to national epidemics. However, few studies have attempted to characterize breakpoints which precede large rises in dengue case counts. In this paper, Bayesian regime switching (BRS) models were employed to infer epidemic and endemic regimes of dengue transmissions, each containing regime specific autoregressive processes which drive the growth and decline of dengue cases, estimated using a custom built multi-move Gibbs sampling algorithm. Posterior predictive checks indicate that BRS replicates temporal trends in Dengue transmissions well and nowcast accuracy assessed using a post-hoc classification scheme showed that BRS classification accuracy is robust even under limited data with the AUC-ROC at 0.935. LASSO-based regression and bootstrapping was used to account for plausibly high dimensions of climatic factors affecting Dengue transmissions, which was then estimated using cross-validation to conduct statistical inference on long-run climatic effects on the estimated regimes. BRS estimates epidemic and endemic regimes of dengue in Singapore which are characterized by persistence across time, lasting an average of 20 weeks and 66 weeks respectively, with a low probability of transitioning away from their regimes. Climate analysis with LASSO indicates that long-run climatic effects up to 20 weeks ago do not differentiate epidemic and endemic regimes. Lastly, by fitting BRS to simulated disease data generated from a stochastic Susceptible-Infected-Recovered model, mechanistic links between infectivity and regimes classified using BRS were provided. The model proposed could be applied to other localities and diseases under minimal data requirements where transmission counts over time are collected.


Model Specification (Bayesian Autoregression)
We consider the Bayesian autoregressive (AR) model of order p as follows, y t = β 1 + y t−1 β 2 + · · · + y t−p β p + t Notation is suppresed to matrix form as follows for ease of exposition of posterior distributions, with n denoting the number of observations, some common noise term parameterized by σ 2 : We estimate sequentially θ = {β, σ 2 } by placing the following priors on parameters: β ∼ N β 0 , σ 2 P 0 with β 0 = 0 p×1 and σ 2 P 0 = diag(100) p×p for our parameters to be centered around 0 and having a wide variance to impose noninformativeness.

Model Estimation (Bayesian Autoregression)
The Gibbs sampler proceeds by sequentially sampling the conditional posteriors for each of our parameters specified above.
For β, we block sample sequentially across each dimension, conjugacy between the normally distributed prior and normally distributed likelihood yields a normally distributed posterior with mean β * p×1 and variance σ 2 β P 1,p×p , For σ 2 , conjugacy between the inverse gamma prior and normally distributed likelihood yields an inverse gamma posterior with scalar shape and rate parameters v 1 /2 and v 1 /σ 2 1 respectively, Steps (1) and (2) are iterated until convergence.

Model Specification (Bayesian Regime Switching)
We consider a 2 regime Bayesian regime switching (BRS) model of order p with regime specific autoregressive parameters. β p,st=k denoting the p th autoregressive term for the k th regime at time t and similarly σ 2 st = k the variance parameter for the noise term of the k th regime at time t as follows, y t = β 1,st + y t−1 β 2,st + · · · + y t−p β p,st + t Notation here is also suppressed to matrix form for ease of deriving posterior distributions, with n s denoting the number of observations, s a regime specific noise term parameterized by σ 2 s : The transition matrix ξ t characterizes probabilities of switching between two states at each time point t: We estimate sequentially θ = {β s,p×1 , σ 2 s , s 1:t , ξ t } by placing the following priors on parameters, states and transition matrix: s P 0 = diag(100) p×p for our parameters to be centered around 0 and having a wide variance to impose noninformativeness.
with j, k ∈ {1, 2} denoting the regime index and e 11 = 5, e 12 = 15 for the two regime BRS to impose the belief that dengue transmissions are more likely to stay within their own regime rather than move to another regime.

Model Estimation (Bayesian Regime Switching)
The Gibbs sampler proceeds by sequentially sampling the conditional posteriors for each of our parameters specified above.
First, we conduct multi-move sampling on S 1:t with the conditional posteriors given by (5), with p(S t | Y t ) obtained using the Hamilton filter and T −1 k=1 p(S k | S k+1 , Y t ) through the Carter-Kohn recursion as described by Kim and Nelson 1999 [2].
Next, to sample the transition probability matrix ξ t , we note that with K denoting the number of regimes estimated for the BRS, S ∈ {s 1 . . . s t } has the complete likelihood: where N jk,t denotes counts for the transition from j th to k th states at time t, we derive: p(ξ t | S) ∝ Dir(e j1 + N j1,t (s), . . . , e jk + N jk,t (s)) For σ 2 s , conditional on the state sampled from (5), conjugacy between the inverse gamma prior and normally distributed likelihood yields an inverse gamma posterior with scalar shape and rate parameters v 1 /2 and v 1 /σ 2 1 respectively, Similarly for β s , conditional on the state sampled from (5), conjugacy between the normally distributed prior and normally distributed likelihood yields a normally distributed posterior with mean β * s and variance σ 2 β,s P 1 Lastly, to prevent label switching, we place an identification restriction on the sigma parameter such that σ 1,S=2 > σ 1,S=1 Steps (5) -(8) are iterated until convergence

Bayes Factor Computation
We compute the Bayes Factor by naive Monte Carlo simulation following Gelman 2013 [1]. The posterior model likelihood for the Bayesian Autoregression and Bayesian Regime switching models are averaged over posterior samples (7) after burnin is discarded. Namely, D denotes the observed elements D = {X, Y }. For the Bayesian Autoregression θ = {β, σ} and for the Bayesian Autoregression, it is sufficient to consider θ = {β, σ, S} for computing the posterior model likelihood.
Thereafter, the log Bayes factor is computed as (10).

Deviance Information Criterion Computation
The deviance information criterion (DIC) follows [3], by defining the deviance as D(θ) = −2log(p(y|θ)) + C, with C some constant which cancels in model comparison and log(p(y|θ)) the log likelihood of the model. To calculate the effective number of parameters, we take p D = D(θ) − D(θ) where θ denotes the posterior expectation of parameters, and D(θ) denotes expectation of the log likelihood computed over posterior draws of θ. The DIC is then given by: We obtain the relative DICs by computing: LASSO coefficients were estimated using 5-fold cross validation with the dependent variable being regimes estimated from the 2-regime 3 lag BRS specification and independent variables being 20 lags of observed climatic factors. Bootstrapping was conducted by sampling the dataset with replacement and conducting the LASSO estimation procedure on the subsetted data over 1000 repetitions. The bootstrap repetitions were then used to reconstruct the LASSO estimated coefficient distributions and summary statistics.