An approximate diffusion process for environmental stochasticity in infectious disease transmission modelling

Modelling the transmission dynamics of an infectious disease is a complex task. Not only it is difficult to accurately model the inherent non-stationarity and heterogeneity of transmission, but it is nearly impossible to describe, mechanistically, changes in extrinsic environmental factors including public behaviour and seasonal fluctuations. An elegant approach to capturing environmental stochasticity is to model the force of infection as a stochastic process. However, inference in this context requires solving a computationally expensive “missing data” problem, using data-augmentation techniques. We propose to model the time-varying transmission-potential as an approximate diffusion process using a path-wise series expansion of Brownian motion. This approximation replaces the “missing data” imputation step with the inference of the expansion coefficients: a simpler and computationally cheaper task. We illustrate the merit of this approach through three examples: modelling influenza using a canonical SIR model, capturing seasonality using a SIRS model, and the modelling of COVID-19 pandemic using a multi-type SEIR model.


Introduction
Mathematical modelling of the complex dynamics of infectious diseases remains an essential tool to inform public health policies during epidemic outbreaks.The major focus of such modelling work is describing the intrinsic transmission dynamics and the flow of individuals between compartments that segregate the population as per their disease state.However, an epidemic is also driven by a number of extrinsic factors, including population mobility, social cycles (e.g.holidays), non-pharmaceutical interventions, and climatic variations (Bretó et al., 2009).In a compartmental model, such factors are often introduced explicitly through the description of the hazard (force) of infection when information about these external drivers is available (Knock et al., 2021;Keeling et al., 2020;Davies et al., 2020).However, while it is impossible to fully account for all extrinsic factors influencing transmission, yet ignoring this epistemic uncertainty often known as "environmental stochasticity" leads to a structural miss-specification of the model, a "model discrepancy".Model discrepancy can lead to miss-calibrated models that underestimate uncertainty and produce biased predictions (Brynjarsdóttir and O'Hagan, 2014).An elegant approach to account for the un-modelled model discrepancy is to represent the force of infection as a stochastic process.For example, Dureau et al. (2013); Cazelles et al. (2018) use a diffusion process for this purpose, while Birrell et al. (2021) use a discrete time stochastic process.Parameter estimation for such stochastic models is, however, challenging.Inference, particularly in a Bayesian context, requires estimation of the joint posterior distribution of both the latent path of the stochastic process and the model parameters.Estimation using a Markov chain Monte Carlo (MCMC) algorithm, involves sampling the 1 arXiv:2208.14363v1[stat.CO] 30 Aug 2022 realisation of the stochastic process, a high dimensional object, often through data-augmentation techniques, which incur a hefty computational cost (De Angelis et al., 2015).As a result efficient calibration of a compartmental model, which embeds a stochastic process, has received significant attention in the literature (e.g.Fuchs, 2013;Sottinen and Särkkä, 2008) with the goal of alleviating the computational bottleneck associated with the inference of the stochastic process.
In this paper we propose a new approach to the calibration problem through the use of a pathwise approximation of a diffusion process.Specifically, we apply a truncated Fourier expansion of a Brownian motion to obtain the approximation.Application of this series expansion turns the task of inferring a high dimensional latent diffusion sample path into the task of inferring a smaller dimensional object, the expansion coefficients, which can be carried out without data-augmentation.This method is also applicable in the context of discrete time processes that converge to a diffusion in the continuous time limit.Such processes can be approximated by first carrying out the series expansion of the limiting diffusion and then applying a suitable time discretisation.We validate the proposed method against a data augmentation technique carried out using a particle MCMC sampler proposed in Dureau et al. (2013), using a dataset from an influenza outbreak in a boarding school.We then apply this method to fit a model of COVID-19 spread in England during the first wave.
2 Background: Epidemic models with a time-varying transmissionpotential We consider the canonical SIR (Susceptible-Infected-Removed) model (Anderson et al., 1992) to introduce the stochastic modelling framework, although the methodology can be applied to other more complex compartmental models.In the SIR model the compartments denote the number of susceptible (S), infected (I), and recovered (R) people in a population subjected to an epidemic at time t.For a population of size N , the SIR model is defined by the following ODE system: where λ = β It N is the force of infection, describing the generation of infections with a transmissionpotential β, between susceptible individuals and the fraction, I t /N , of infectious individuals.The expected period spent in the compartment is given by γ −1 .The individual compartment sizes sum to N = S t + I t + R t .
To include environmental stochasticity we introduce a time-varying β t (e.g.Ellner et al., 1998;Martinez-Bakker et al., 2015;Cauchemez and Ferguson, 2008;Cauchemez et al., 2008;Cazelles and Chau, 1997) to mitigate model discrepancy, leading to a reformulation of the model in Eq (1): where x t follows a diffusion process described by an Itô stochastic differential equation (SDE) (Oksendal, 2013) with drift a(•), and diffusion b(•) functions parameterised by the vector ξ; W t is a standard Brownian motion; and g(•) is a nonlinear transformation that enforces β t > 0, such as exponential or inverse-logit transformation.Here we make some mild assumptions about a(•) and b(•) such as, for example, being locally Lipschitz with a linear growth bound (Oksendal, 2013) to ensure a non-explosive solution.
Inference for the stochastic model in Eq (2) within a Bayesian framework, requires inference of the latent sample path x of the diffusion x t , which is indirectly observed through the time evolution of the disease states: S t , I t , R t .This is a missing data problem that can be addressed through data-augmentation based MCMC methods (e.g.Fuchs, 2013;Dureau et al., 2013) in which a high resolution (in time) Euler-Maruyama discretisation of x t is sampled along with the model parameters.Such MCMC methods incur high computational costs and have reduced efficiency in terms of mixing and speed of convergence.In what follows we will investigate a scalable approximation of x t that is faster to sample.

Methods
Following Lyons et al. (2012); Luo (2006); Ghosh et al. (2022), we carry out a Fourier expansion of a Brownian motion W t and obtain a smooth path-wise series approximation.Using this approximation of a Brownian motion, we can in turn approximate the SDE for x t with a random ODE.Inference of x t can then be carried out by inferring coefficients of this ODE, without requiring data-augmentation.

Fourier expansion of Brownian motion
Within a time interval [0, T ], where T is the length of the time horizon within which an epidemic is analysed, the Fourier expansion of a Brownian motion W t is given by (Luo, 2006): where {φ i } ∞ i=1 is a complete orthonormal basis of L 2 [0, T ] (see Appendix A for derivation).For example this can be the generalised Fourier cosine basis (Lyons et al., 2014) given by φ i (t) = (2/T ) 1/2 cos{(2i − 1)πt/2T }. (4) We will use the shorthand Z i = T 0 φ i (s)dW s .Since the basis functions {φ i } are deterministic and orthonormal, it follows from standard results of Itô calculus that Z i ∼ N (0, 1) (Luo, 2006).By truncating the infinite series in Eq (27) to n-terms we obtain a path-wise approximation of the Brownian motion W t given by Ŵt (5)

Approximating a SDE with a random ODE
Taking derivative of Ŵt with respect to time we obtain the following approximation to white noise, the derivative of Brownian motion, given by Now, let us replace the Itô SDE in Eq (2) with the following Stratonovich SDE (Oksendal, 2013) where (•) denotes a Stratonovich integral (Oksendal, 2013) with respect to W t .The Itô SDE in Eq (2) and the Stratonovich SDE given above are equivalent (Oksendal, 2013) if By substituting the term dW t in Eq (7) with the approximation d Ŵt in Eq (6), we obtain the following (random) ODE: The work of Wong and Zakai (1965) shows that as n → ∞ the solution xt of the above ODE will converge to the solution x t of the Stratonovich SDE Eq (7) which, given the choice of a (•) in Eq (8), is an equivalent representation of the Itô SDE in Eq (2).Thus, the series approximation xt of the solution x t of an Itô SDE converges to the solution of an equivalent Stratonovich SDE.Next, we discuss the implications of the above approximation with regards to inference.

Inference using the series approximation
Using the path-wise series approximation of a diffusion process x t , presented in the previous sections, we can re-write the canonical SIR model in Eq (2) as a system of coupled ODEs given by where a (•) is given by Eq (8).Note that the randomness in the above model is now encapsulated in the expansion coefficients Z = (Z 1 , . . ., Z n ).Inference in this model is then relegated to the inference of all the parameters: Z, ξ, γ, and the initial values: x 0 , S 0 , I 0 , R 0 .We denote the vector of the parameters governing the dynamics as θ = (ξ, γ).We denote the state vector evolving in continuous time by X t = (x t , S t , I t , R t ), and by X 0 = (x 0 , S 0 , I 0 , R 0 ) the vector of initial values.
In order to explain the inferential framework based on the series approximation, in Eq (10), we assume that the available data y t 1:m = (y t 1 , . . ., y tm ) are the noisy observations of the state I t at m time-points.Here we are simply considering prevalence data for the ease of exposition, however the same idea can be extended to more complex observational models where the observed data only provide partial (and often indirect) information of the states X t (Birrell et al., 2021).
The inferential goal is to learn the posterior distribution of all the unknown quantities, given the data y t 1:m .We place priors p(θ), p(Z), p(X 0 ) on the parameters, expansion coefficients and the initial values.Note that, by construction, the Z = (Z 1 , . . ., Z n ) have an independent standard Normal prior, see Section 3.1.We then numerically solve Eq (10) to obtain a likelihood p(y t 1:m |I t 1:m , ), based on the noise assumption, where I t 1:m is the numerical solution of the state I t evaluated at the m time-points, and are the parameters of the chosen data distribution.The posterior distribution, up to a normalisation constant, follows from the Bayes rule: Samples from the posterior distribution can be obtained using MCMC.The samples of the latent approximate diffusion path x are simply the numerical solution of the ODE for xt evaluated using samples of θ, Z, X 0 from the posterior distribution.
Note that if we had described β t using a SDE, then to sample the latent diffusion x we would have had to use data-augmentation.This involves imputing the sample path of the latent diffusion at the time-points of observations t 1:m as well as at time-points in-between the observations using, say, the Euler-Maruyama scheme (Kloeden and Eckhard, 1992).If one chooses l time-points between t m and t m−1 then the MCMC sampler would target m(l+1)−l random variables (including x 0 ) related to the diffusion.Using the proposed approximation we have replaced the inference of m(l + 1) − l variables with n, which is a simpler inference problem if n < m(l + 1) − l.Below we show that choosing a value of n substantially smaller than m(l +1)−l still renders an estimate of the posterior distribution that is a reliable approximation to the true posterior.

Evaluation
To evaluate the proposed approximation method we fit the model in Eq (10) to the data of an outbreak of influenza at a boarding school (Jackson et al., 2013) (see Fig 3 (a)), on the number of infections for a period of T = 14 days among a population of size N = 763.This dataset is publicly available in the R package outbreaks (Jombart et al., 2020).This dataset was previously used in Del Moral and Murray (2015); Ryder et al. (2018) to fit a SIR model under assumption that the time varying transmission-potential can be modelled using an Ornstein-Uhlenbeck (OU) process.The model in Del Moral and Murray ( 2015) is similar to the stochastic model introduced in Eq (2).Using the OU SDE for x t we can write the model in Eq (2) as: where ξ = (ξ 1 , . . ., ξ 3 ) denotes the parameter vector of the OU SDE.
Here we specifically want to compare the outcome of inference using the true OU diffusion used above (SDE) with its series approximation (SA), leading to a model such as in Eq (10), given by where we have chosen the generalised Fourier basis Eq (4) as the function φ i (t).
For the SDE model the latent sample path x, the diffusion parameters ξ, initial value x 0 and the parameter γ were also estimated together with the initial susceptibility, s 0 = S(t = 0)/N , assuming the initial recovered fraction r 0 = 0 and thus i 0 = 1 − s 0 .As this is count data we have specified a Poisson likelihood: where in this case X 0 = (x 0 , s 0 ).For the SA model we used the inferential framework introduced in the previous section and used the Poisson likelihood as above: We chose a weakly-informative prior for the parameters governing the dynamics ξ 1 , . . ., ξ 3 , γ ∼ Γ(2, 2).For s 0 a Beta(2, 1), since we expect the true value to be near or greater than 2/3, and for the initial value of the diffusion we used a prior 2 , which is the stationary distribution of the OU diffusion.
For the SDE model, data-augmentation using a particle filter was employed to sample the 'true' diffusion's path, following Dureau et al. (2013), and produce an unbiased estimate of the likelihood.Parameters γ, ξ, X 0 were estimated using the Metropolis-Hastings (MH) algorithm, with an adaptive random-walk proposal based on algorithm 4 of Andrieu and Thoms (2008).See B in S1 text for further details on this proposal mechanism.The likelihood estimate produced by the particle filter was used in the acceptance step of the MH algorithm.This particle-marginal Metropolis-Hastings (PMMH) MCMC scheme for jointly updating the latent diffusion path along with the parameters has been shown to have superior performance (Dureau et al., 2013) when compared to other data-augmentation approaches.For the PMMH, we used a Bootstrap particle filter (Gordon et al., 1995), where the particles are propagated using Euler-Maruyama discretisation, and set the number of particles to 1000.Following Del Moral and Murray (2015), we carried out the Euler-Maruyama iterations with a stepsize δt = 0.1, leading to l = 9 time-points between two observations.
For the SA model we used the Metropolis-Hastings algorithm with the same adaptive randomwalk proposal (RWMH) used with the PMMH scheme and the Euler method to numerically solve the ODE adopting the same step-size that is used with the Euler-Maruyama scheme for the SDE.
Note that inference for the SDE model using PMMH will be substantially more computationally heavy compared to the inference for the ODE based SA model, irrespective of the value of n.This is due to the particle filter requiring multiple evaluation of the Euler-Maruyama scheme at each MCMC iteration.Even when parallelised, the particle filter will be bottlenecked by a weightupdating step (see Gordon et al. (1995) for details) requiring message-passing across processes.The Euler scheme for solving the ODE in Eq (13), in comparison, is evaluated once every iteration of a Metropolis-Hastings algorithm targeting the posterior distribution in Eq (11).
A crucial parameter for the proposed method is the number of basis functions n.If a value of n produces a close match between the marginal densities of the true and approximate diffusion at the end of the analysis period T then the approximation will be valid throughout the course of the epidemic.In this case T = 14.In Fig 1 we compare the time T marginal densities p(x t )| t=T obtained by solving the ODE in Eq (13) associated with the SA, and p(x t )| t=T obtained from the original OU diffusion, both based on some trial parameters sampled from the prior.The value n = 15 produces a close match between the marginal densities.We defer further discussion of the effect of n on estimation to section 4.2.

Results: comparison between true and approximate diffusion
We fitted the two models, SDE and SA respectively, using the associated algorithms as described above to the influenza dataset.We ran two chains of both PMMH, for the SDE model, and RWMH, for the SA one, for 10 6 iterations where the first 5 × 10 5 iterations were discarded as burnin and the remaining samples thinned to obtain 1000 samples from the posterior distribution.The running times were 15907 and 2397 seconds for the PMMH and RWMH with n = 15, respectively.We implemented a vectorised particle filter and the Euler solver for the ODE using Jax (Bradbury et al., 2018).The adaptive MCMC algorithm was implemented using Python.
We notice a good agreement between the parameter estimates obtained using the SDE and SA counterparts (see   We observe a good agreement between the epidemic curves obtained using the SDE and the SA, but for the posterior distribution of the latent diffusion paths the credible intervals are narrower for the SA.The SA, due to the truncation of the infinite series expansion, produces smoother paths, slightly underestimating the volatility of the latent diffusion path.On a closer introspection of the posterior means (Fig 3 (b)), it is noticeable that the latent diffusion paths drop and increase again in the period between the 4-th and 9-th day, around the peak, indicating sudden changes in the transmission-potential.These changes are reflected in the estimates of both SDE and SA.After the 9-th day, the variability in the latent paths increase for both SDE and SA and the posterior means match closely.This is expected since after the peak, when the epidemic is receding, a large change in β t will have negligible effect on the case counts.

Sensitivity to the choice of n
In Fig 1 we noticed that the marginal distribution of the latent diffusion path and its series approximation starts agreeing beyond n ≥ 10 terms.It is worth investigating whether such a threshold exist for the posterior distributions obtained using the SDE and the SA.We did this by further comparing the joint posterior distribution p(θ, X 0 |y), from SDE and SA while varying n.Note that θ and X 0 are quantities which were estimated using both the SDE and SA counterparts, and thus the joint posterior of these were chosen for comparison.For this comparison we estimated the posterior distribution by fitting the SA repeatedly with number of basis set to n = 3, 5, 10, 15, 20, 25, 30.To compare the posterior distributions, we used the maximum mean discrepancy (MMD) metric (Gretton et al., 2012), a divergence metric that can be calculated using samples from the distributions.See Appendix H for further details on this metric.
In Fig 4 we plot the MMD between the posteriors from SDE and SA for increasing n.For  n ≥ 10 we found good agreement between the two posteriors, consistent with the results from comparing the marginal densities (Fig 1).This reinforces our approach of choosing the number of basis by comparing marginals of the latent process, while using the SA.We summarise the runtimes of MCMC with the RWMH proposal for each choice of n in Table 1, noting that the increase in the runtimes as we varied n was negligible, especially when compared to the PMMH with SDE. 5 Application: modelling COVID-19 outbreak in England Our proposed method of modelling the time-varying transmission-potential as an approximate diffusion can also be applied to a discrete time stochastic process that converges to a diffusion in the continuous time limit.For example, an AR(1) process converges to a OU diffusion.Thus, if one is already using an AR(1) process to model the transmission-potential, then a discretised version of the series approximation of OU diffusion, the ODE in Eq (13), can be chosen as its replacement.
To exemplify the application of the series expansion method in replacing a discrete time stochastic process, we have chosen to fit a compartmental model whose dynamics are described as a set of first order difference equations, to data from the first wave of the COVID-19 outbreak in England, between February and August 2020 (Birrell et al., 2021).This model captures the effect of unknown extrinsic factors on the force of infection through a time-varying transmission-potential modelled as a Gaussian random-walk.We introduce the model of Birrell et al. (2021) in what follows and introduce an alternative formulation using the series approximation of Brownian motion.

Transmission model for COVID-19
This is an age and spatially structured transmission model, stratifying the population into n A = 7 age groups and n r = 7 regions.Within each region, the transmission dynamics are governed by a system of first order difference equations: where: S r,t k ,a , E d r,t k ,a , I d r,t k ,a , d = 1, 2 represent the time t k , k = 1, . . ., K, partitioning of the population of individuals in a region r, r = 1, . . ., n r , in age-group a, a = 1, . . ., n A , into S (susceptible), E (exposed) and I (infectious) disease states.The average period spent in the exposed and infectious states are given by the parameters d L and d I respectively; and λ r,t k ,i is the time-and age-varying rate with which susceptible individuals become infected, the force of infection.Time steps of δt = 0.5 days are chosen to be sufficiently small relative to the latent and infectious periods.Following Birrell et al. (2011) the initial conditions of the system states S, E 1 , E 2 , I 1 , I 2 at t 0 are given by region-specific parameters ψ r and I 0,r , describing the initial exponential growth and the initial number of infectious individuals, respectively.New infections are generated as where λ r,t k ,a δt is driven over time by a region-specific time-varying potential, β t k ,r , which moderates the rate at which effective contact take place.This region-specific transmission-potential captures the discrepancy between how actual contact take place between the age groups, and that encoded by a set of time-varying contact matrices.We refer the reader to Birrell et al. (2021) for further details on the model dynamics and parameterisation.Over time β t k ,r is not allowed to vary unconstrained and a smoothing is imposed by assuming, a priori that its evolution follows a Gaussian random-walk process with volatility σ βt : where t lock indicates the time-point corresponding to the lockdown introduced in England on 23 rd March 2020.This random-walk formulation requires the inference of the high-dimensional (due to the choice of δt) sample path of this process, an extremely challenging task using MCMC.We will discuss this inferential difficulty later in Section 5.2.2.To restrict the dimensionality of the process, in Birrell et al. (2021) this transmission-potential is assumed to be piecewise constant with weekly changepoints, and its values at these changepoints modelled as a random-walk.Denote w k ≡ w(t k ) the week in which time t k falls.Then the time evolution of the transmission-potential is modelled at a slower weekly time-scale: as a Gaussian random-walk, with volatility σ βw , following the first week of the lockdown.Realisation of the process, for each region, can then be obtained by sampling the vector ∆β r of all the weekly increments ∆β w k ,r = log (β w k ,r ) − log β w k−1 ,r .It was assumed in Birrell et al. (2021) that the contact matrices sufficiently described how actual contacts took place between different age groups prior to the lockdown and thus β w k ,r = 1 over that period.

Inference
To fit the model, using a Bayesian framework, surveillance data of age-and region-specific counts of deaths in people with a lab-confirmed COVID-19 diagnosis between 17 st February and 1 st August was used.Furthermore, serological data from NHS Blood and Transplant (NHSBT), informing the fraction of the population carrying COVID-19 antibodies, were also used.
Following Birrell et al. (2021) the number of observed deaths y d r,t k ,a on day t k , in age group a, and in region r follows a negative binomial distribution: where the mean µ r,t k ,a = p a k l=0 f k−l ∆ infec r,t l ,a is derived using Eq (17), an assumed-known distribution of the time from infection to death from COVID-19, f , and an age-specific infection-fatality ratio p a .Here η is a dispersion parameter such that Ey d r,t k ,a = µ r,t k ,a and Var y d r,t k ,a = µ r,t k ,a (1 + η).If, on day t k , n r,t k ,a blood samples are taken from individuals in region r and age-group a, and the observed number of positive tests is y s r,t k ,a , then where k sens and k spec parametrises the sensitivity and the specificity of the serological testing process, and S r,t k ,a is obtained by solving the difference equations in Eq (16).N r,a is the total population in age-group a and region r.
The unknown quantities that need to be inferred can be divided into two groups: (i) Global parameters θ g = (η, d I , p 1 , . . ., p n A , k sens , k spec , σ βw ) shared between regions, and (ii) regional parameters specific to each region: θ r = (ψ r , I 0r , ∆β r ).After placing the same priors as was used in Birrell et al. (2021) (and listed in Appendix E), the posterior distribution of the unknown quantities is as follows: where we denote by y d , y s ∈ R K×n A ×nr the data for all time-points, ages and regions corresponding to deaths and sero-positive tests, respectively.

Sampling from the posterior
Sampling from the posterior distribution Eq ( 22) is challenging due to the large number of randomwalk increments corresponding to all regions and weeks since lockdown.MCMC with a vanilla RWMH proposal, as applied in Birrell et al. (2021), due to the linear scaling of convergence time with increasing dimensions mixes poorly and requires a large number of iterations (≈ 10 7 ) of the Markov chain before convergence is reached.To improve convergence we instead used a randomscan Metropolis-within-Gibbs (MwG) algorithm that circumvent the updating of a large parameter vector at each iteration.This MwG algorithm exploits the independence between the regional parameters.Our proposed sampling strategy consists of sampling alternatively, at each MCMC iteration, from the posterior of the global parameters conditioned on all the regional ones: (i) p(θ g |θ 1 , . . ., θ nr , y d , y s ), and regional parameters for one randomly chosen region conditioned on the global ones (since the regional parameters are conditionally independent of any other region's parameters): (ii) p(θ r * |θ g , y d , y s ), where r * ∼ Uniform(1, n r ).Samples from each of these conditional distributions are obtained using an adaptive RWMH move with the same adaptation mechanism used in Section 4.1.The pseudocode for this MwG algorithm is furnished in Appendix F.

An alternative formulation
The number of region-specific random-walk increments ∆β w k ,r that needs to be sampled increases with time.The performance of the MwG algorithm starts deteriorating and exhibiting poor mixing and slow convergence, as this number becomes large.This limits dramatically the usefulness of this model in the context of a real-time application.
For the model in Eq (19), this problem can be tackled by increasing the time between two successive changepoints thus reducinge the number of increments to be sampled for a period of analysis.This is however driven by computational convenience, and it would be more meaningful to learn these changes from data.We could model the time evolution of the transmission-potential at a faster time-scale, for example as in Eq (18).However, in this case the number of random-walk increments, to be sampled per region, equals the number of time-points between lockdown and the end of analysis date.Any MCMC sampler, that uses a RWMH proposal, would struggle severely to move efficiently in such a high-dimensional parameter space.
To alleviate these problems we propose to model the transmission-potential as a Brownian motion W t,r with volatility σ βt evolving in continuous time t and apply the series approximation as follows: where the second equality follows from choosing φ i as given in Eq (4) and carrying out the integration.We can then discretise this approximation using the same time-step of δt that is used for the compartmental dynamics to obtain the following path-wise (discrete time) approximation: where T is the number of days between lockdown and analysis date.Note that in this formulation the problem of sampling a large vector of increments ∆β r is reduced to that of sampling a ndimensional vector of the coefficients Z r = (Z 1,r , . . ., Z n,r ).From the comparison of the time T marginal distributions of the true and approximate Brownian motion, as for the OU process (see Fig 1), we found n = 10 to produce a good path-wise approximation.Thus, we used n = 10 for the subsequent comparative evaluations.The regional parameter vector, θ r = (ψ r , I 0r , Z r ), now contains the expansion coefficients instead of the random-walk increments ∆β r .

Results: comparative evaluations
We ran the MwG algorithm to target the posterior distribution in Eq (22) while using the randomwalk based piecewise constant transmission-potential in Eq (19) and the Brownian motion approximation (BMA) in Eq (24).In both cases we ran 3 × 10 6 iterations, discarded the first half of the iterations as burn-in and subsequently thinned the remaining samples to obtain 1000 samples.We implemented the epidemic model in C++.The MwG algorithm was implemented using Python.Following Birrell et al. (2021), we also obtain estimates of the effective region-specific reproduction number R t k ,r , their weighted average R t,E representing the reproduction number for all of England, (formulae for these are given in Appendix D in S1 text).In Fig 6 we show the posterior distributions for p(R t,E |y d , y s ), using the two alternative models.It is evident that the estimate obtained from the BMA appears to be smoother than what is obtained using the piecewise constant model, more realistically reflecting the actual transmission process that happens in continuous time.
In Table 2 we present infection-fatality ratio estimates from the two models, again showing close agreement across models.
Table 2: Posterior mean and 95% credible intervals for the age-specific infection-fatality ratio from the random-walk and BMA models of transmission-potential.
Age group (yrs) Random-walk BMA < 5 0.0009% 0.0009% (0.0002%-0.0022%)(0.00007%-0.0019%)5-14 0.0014% 0.0014% (0.0008%-0.0022%)(0.0006%-0.0022%)15-24 0.0046% 0.0044% (0.0032%-0.0062%)(0.0029%-0.0060%)Computational gains: The MwG algorithm took around 78 hours to finish for both models of the transmission-potential.However, the BMA allows inference at a faster time-scale producing a smoother estimate of R t,E avoiding artificial model assumptions.Such an inference would be computationally infeasible if using a random-walk model at the more granular time-scale as in Eq (18), given the poor scaling of the RWMH proposal in high dimensions.Thus, using the series approximation we were able to extract more information about the transmission-potential and reproduction-ratio in comparison to the piecewise constant model, while incurring the same computational expense.
Had we used the random-walk model in Eq (18), we would have had to further partition each of the regional parameter block in separate chunks to accommodate a large vector of increments ∆β t k ,r = log (β t k ,r ) − log β t k−1 ,r .Consequently, multiple Gibbs moves would have been necessary to update all the increments for a randomly chosen region.This, in turn, would have increased the number of likelihood computations, involving the computationally expensive updates of the transmission model, exponentially at each MCMC iteration.

Discussion
By modelling the force of infection as the function of a time-varying transmission-potential we can incorporate extrinsic, un-modelled effects in the description of the transmission process within a compartmental model.Describing this transmission-potential, in turn, as a stochastic process, a diffusion in particular, we can inject environmental stochasticity in an otherwise deterministic model.In this paper we proposed a path-wise approximation of a diffusion process as an alternative to modelling the dynamics of the transmission-potential as a SDE.Through the path-wise approximation we arrive at a random ODE approximating the SDE.As a function of its parameters, the path (solution) of an ODE is completely deterministic.As a result inference of the transmissionpotential is simplified, with no need to solve a missing data problem using a computationally expensive data-augmentation procedure.
We demonstrate the efficacy of the proposed path-wise approximation using two epidemic models.In the first one, an influenza model, we replaced an OU SDE with an equivalent pathwise approximation.We noticed similar inference outcomes in terms of parameter estimates and goodness-of-fit using the SDE and its ODE approximation.However, for the latter we observed orders-of-magnitude improvement in computational efficiency.
We then applied the path-wise approximation to replace a Gaussian random-walk with a discretised path-wise approximation of Brownian motion to model the transmission-potential within a compartmental model of COVID-19 pandemic spread in England.Again we noticed consistent estimates of crucial unknown quantities such as infection-fatality rate, latent infections and a timevarying estimate of the reproduction number.In addition, the path-wise approximation allows the transmission-potential to be modelled at a more granular time-scale providing a smooth estimate of the effective reproduction number.This would be impossible to achieve using the random-walk model due to an exorbitant computational burden.
As an alternative to using our path-wise approximation of Brownian motion to model the transmission-potential, at a faster time-scale, we could have used a different MCMC algorithm, such as the No-U-Turn sampler (Hoffman and Gelman, 2014), that is known to perform well for high dimensional problems.This algorithm proposes a move based on the gradient of the target density.Evaluating gradients, however, for the COVID-19 model is challenging as this requires, in addition to extra computations, a complete re-implementation of the model using an automatic differentiation package.However, for modelling studies where such re-implementation is straightforward, we like to point out that by applying a path-wise approximation of a diffusion process we are left with the task of sampling from a posterior distribution with a standard Gaussian prior (over the coefficients).The No-U-Turn sampler generally excels at this task.
In this paper we have used simple diffusion models whose transition densities are known analytically.However, if additional prior information about the force of infection is available, then such information can be incorporated in more complex nonlinear SDEs as models of the time-varying transmission-potential.Our methodology can be seamlessly applied in such cases to arrive at a path-wise approximation of such complex diffusion processes.

Software
Code and data supporting the experiment with the SIR model in Section 4, and the code for running the COVID-19 model in Section 5 is available at https://github.com/sg5g10/envstoch. Requests to access the non-publicly available data used for the COVID-19 model in section 5, are handled by the UKHSA Office for Data Release (ODR) https://www.gov.uk/government/publications/accessing-ukhsa-protected-data ( Over time the value of the reproduction number will change as contact patterns shift and the supply of susceptible individuals deplete.The time-t reproduction number is then estimated using the following formula: where t lock indicates the time-point corresponding to the lockdown.R * t k ,r is the dominant eigenvalue of the time t k next-generation matrix, Λ k,r , with elements: where C t k r,ij is a region-specific time-varying contact matrix, see Birrell et al. (2021) for further details on these matrices.
Fig 2).Furthermore, in Fig 3 we compare the goodness-of-fit and display the posterior distribution of the latent diffusion paths p(x|y t 1:m ) and p(x|y t 1:m ), corresponding to the SDE and SA.Additionally, for aid of visualisation, we have also plotted draws from the (posterior) sample paths for both models in Fig 3 (c) and (d).

Figure 1 :
Figure 1: Comparison between the marginal density of the OU SDE at time T = 14, with that obtained through the series approximation upon varying the number of basis n = 3, 5, 10, 15.

Figure 2 :
Figure 2: Comparison of the posterior marginal densities of the parameters obtained using the SDE and the SA (with n = 15 basis function).These densities are summarised using a kernel density estimate.

Figure 3 :
Figure 3: Influenza dataset: Goodness-of-fit (a); posterior distribution of the latent diffusion paths corresponding to the SDE and SA counterparts (b), with densities summarised by the mean (solid lines) and 95% credible intervals (broken lines); and samples from the posterior distribution of the latent diffusion paths, SDE (c) and SA (d) Runtimes in seconds SA with n = 3 n = 5 n = 10 n = 15 n = 20 n = 25 n = 30 SDE 2280

Figure 4 :
Figure 4: MMD between the joint posterior distributions of the parameters θ and initial values X 0 from SDE and SA (for different n).
Fig 5 (a) compares, for the two alternative choices of modelling the transmission-potential, the posterior predictive distributions of the death data aggregated across all ages and regions with the observed data (see Appendix G in for region-specific plots).Clearly the goodness-offit is indistinguishable between the two models.In Fig 5 (b) we show summaries of the posterior distributions of the latent infections p(∆ infec |y d , y s ), aggregated across all ages and regions (regionwise infections are shown in Appendix G) again showing close consistency across models, with the exception of a few days immediately following the lockdown where the number of infections estimated by the BMA is slightly higher.

Figure 5 :
Figure 5: Goodness-of-fit of daily death data (a) and the inferred latent infections (b), produced using the random-walk (magenta lines) and BMA (orange lines).These densities are summarised by the mean (solid lines) and 95% credible intervals (broken lines).The black line indicates the day of lockdown in England 23 rd March, 2020.

Figure 6 :
Figure 6: Posterior mean (solid lines) and 95% credible intervals (broken lines) for the all England reproduction number R t,E .

Figure 13 :
Figure 13: Goodness-of-fit of daily death data (a) and the inferred latent infections (b), produced using the random-walk (magenta lines) and BMA (orange lines) for the region East of England.These densities are summarised by the mean (solid lines) and 95% credible intervals (broken lines).The black line indicates the day of lockdown in England 23 rd March, 2020.

Figure 14 :
Figure 14: Goodness-of-fit of daily death data (a) and the inferred latent infections (b) for the region North West.

Figure 15 :
Figure 15: Goodness-of-fit of daily death data (a) and the inferred latent infections (b) for the region Midlands.

Figure 16 :
Figure 16: Goodness-of-fit of daily death data (a) and the inferred latent infections (b) for the region London.

Figure 17 :
Figure 17: Goodness-of-fit of daily death data (a) and the inferred latent infections (b) for the region North East and Yorkshire.

Figure 18 :
Figure 18: Goodness-of-fit of daily death data (a) and the inferred latent infections (b) for the region South East.

Table 1 :
Runtimes (rounded to nearest integer), in seconds, of MCMC for SA, as a function of the number of basis n, in comparison with the runtime of SDE.These were run on a 3.6 GHz machine with 16 GB memory.