
The authors have declared that no competing interests exist.

We examine the performance of a strategy for Markov chain Monte Carlo (MCMC) developed by simulating a discrete approximation to a stochastic differential equation (SDE). We refer to the approach as diffusion MCMC (DMCMC).

The advent of Markov chain Monte Carlo (MCMC) has led to major advances in the application of Bayesian analysis in complex problems. The idea is simply put: faced with a posterior distribution too complicated to compute or simulate from directly (i.e., we cannot readily obtain the normalizer or denominator appearing in Bayes’ Theorem), one develops a Markov chain whose stationary distribution is known to coincide with the target posterior distribution. One then runs that chain, knowing that eventually realizations from the chain form an approximate dependent sample from the posterior. Those realizations are then used to estimate features of the posterior (i.e., posterior expectations of interesting quantities, predictive densities, etc.).

For example, in some settings, nonlinearity and/or nonconjugacy of certain components of a large model render the standard Gibbs Sampler unusable. Metropolis-Hastings algorithms and Gibbs-Metropolis hybrids can be suggested, though these approaches can be taxing and may require substantial tuning.

In response to such difficulties, we explore diffusion based strategies for MCMC analysis. That is, one develops a diffusion (a solution, in the sense of Itô, to a stochastic differential equation) whose stationary distribution is the target posterior; see, e.g., Chapter 5 of a standard text on stochastic differential equations.

In part, our motivation for suggesting diffusion MCMC is its simplicity in terms of set-up. There are no probability calculations to perform, as in Gibbs’ Sampling, nor any need for choosing and updating distributions for generating candidate states. Indeed, the approach is recommended as an “off-the-shelf” strategy that can be readily implemented. However, as indicated below, it is not a panacea. Further, issues such as burn-in, mixing, convergence rates, and output analysis remain challenging.

Consider a Bayesian analysis for an unknown quantity θ with posterior density π(θ). The strategy is to construct a diffusion {X_{t}}_{t ≥ 0}, an Itô solution of a stochastic differential equation dX_{t} = μ(X_{t}) dt + σ(X_{t}) dW_{t}, whose stationary distribution is π. Here X_{0} is some fixed initial value (or, more generally, X_{0} is a random variable with specified density) and {W_{t}} is a standard Brownian motion.

Our interest is in stationary solutions, i.e., solutions whose marginal distribution is functionally independent of time, so the partial derivative with respect to time in the associated Fokker–Planck equation vanishes.

We let π denote the target posterior density and choose a drift μ(⋅) and diffusion coefficient σ^{2}(⋅) satisfying the stationarity condition μ(x) = [σ^{2}(x)π(x)]′ / (2π(x)). With σ^{2}(⋅) ≡ 1 this reduces to the familiar Langevin drift μ(x) = (1/2)(log π(x))′.

Having completed this step, we can simulate the diffusion and proceed as in MCMC. This is typically accomplished by forming a discrete-time approximation to the stochastic differential equation. Note that there are many pairs (μ(⋅), σ^{2}(⋅)) that work for a fixed Bayesian model. It is important to note that the core of a diffusion MCMC algorithm requires only the unnormalized posterior, since the normalizing constant drops out of the drift.

In this article we discretize the diffusion via the Euler scheme, producing a discrete-time Markov chain {X_{m}}_{m ≥ 0},

X_{0} = x_{0},  X_{m+1} = X_{m} + μ(X_{m})Δ + σ(X_{m})√Δ Z_{m+1},

where Δ > 0 is the time step and Z_{m+1} is a realization from a standard Gaussian distribution. Abusing notation, we also use {X_{m}}_{m ≥ 0} for the extension of the chain to a continuous time process via interpolation; however this step is not necessary for this paper. From a practical perspective, we are interested in the discrete process {X_{m}}_{m ≥ 0}. Two critical questions arise:

Does the discrete stochastic process converge to a stationary, ergodic distribution?

If so, is that stationary distribution “close” enough to the target posterior distribution to justify the use of conventional output analysis to enable approximate Bayesian inference?
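Before addressing these questions, the basic simulation scheme can be sketched in a few lines. This is a minimal sketch, not the authors' code: the function names, the choice σ^{2}(⋅) ≡ 1 (so the drift is half the derivative of the log posterior), and the standard-normal toy target are our own illustrative assumptions.

```python
import numpy as np

def dmcmc(grad_log_post, x0, delta, n_steps, rng):
    """Euler discretization of the Langevin diffusion with unit diffusion
    coefficient: X_{m+1} = X_m + 0.5*delta*grad_log_post(X_m) + sqrt(delta)*Z."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    for m in range(n_steps):
        drift = 0.5 * delta * grad_log_post(x[m])
        x[m + 1] = x[m] + drift + np.sqrt(delta) * rng.standard_normal()
    return x

# Toy target: standard normal posterior, so grad log pi(x) = -x.
rng = np.random.default_rng(42)
chain = dmcmc(lambda x: -x, x0=5.0, delta=0.1, n_steps=20000, rng=rng)
```

Note that no accept/reject step appears: the chain is simply the Euler recursion, which is exactly why the method is easy to set up and also why its stationary distribution only approximates the target.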

Unfortunately, there are situations where, for any choice of the time step Δ, there is no guarantee that the long-run properties of {X_{m}}_{m ≥ 0} are similar to those of {X_{t}}_{t ≥ 0}. In particular, it is known that the Euler chain can fail to be ergodic, and can even be transient, although the continuous-time diffusion is ergodic.

To illustrate these issues, we consider the examples below.

We emphasize again that our goal is to present the benefits of a procedure that has been present in the literature for over a decade. Due to its wild behavior even in some simple cases, it has received little attention, especially from practitioners and applied scientists. One proposed remedy is to correct the Euler discretization with a Metropolis accept–reject step (see, for example, the MALA algorithm).

The multivariate form of the diffusion takes drift (1/2)∇ log π and unit diffusion coefficient: dX_{t} = (1/2)∇ log π(X_{t}) dt + dW_{t}.

One class of problems in which diffusion MCMC may be useful involves nonlinearity. Before turning to such cases, it is instructive to work through a tractable Gaussian example.

With σ^{2} known, suppose Y | θ ∼ N(θ, σ^{2}) and θ ∼ N(m, τ^{2}). Of course, we know that the posterior is available in closed form, so the example is purely illustrative.

Let μ = (σ^{−2}Y + τ^{−2}m)/(σ^{−2} + τ^{−2}) and v = (σ^{−2} + τ^{−2})^{−1}, so the posterior is N(μ, v). The solution to the corresponding stochastic differential equation, with drift −(x − μ)/(2v) and unit diffusion coefficient, is an Ornstein–Uhlenbeck process with stationary distribution N(μ, v).

It follows that the Euler discretization is a Gaussian first-order autoregression: X_{m+1} = (1 − Δ/(2v))X_{m} + (Δ/(2v))μ + √Δ Z_{m+1}.

It can be shown that, for Δ < 4v, this chain has a Gaussian stationary distribution with mean μ and variance v/(1 − Δ/(4v)).

Returning to the original parameterization, we conclude that as Δ → 0 the stationary distribution of the discrete chain converges to the exact posterior N(μ, v).
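The limiting-variance statement can be checked numerically by iterating the variance recursion of a Gaussian AR(1) chain, Var_{m+1} = a^{2} Var_{m} + Δ with a = 1 − Δ/(2v). The values of Δ and v below are illustrative choices of our own.

```python
# Variance recursion of the Euler chain X_{m+1} = a X_m + c + sqrt(delta) Z_{m+1},
# with a = 1 - delta/(2 v).  Its fixed point should equal v / (1 - delta/(4 v)).
delta, v = 0.2, 1.5            # illustrative time step and posterior variance
a = 1.0 - delta / (2.0 * v)    # AR(1) coefficient; |a| < 1 since delta < 4 v
var_m = 0.0                    # chain started at a point mass
for _ in range(5000):
    var_m = a * a * var_m + delta
limit = v / (1.0 - delta / (4.0 * v))
```

After a few thousand iterations `var_m` agrees with `limit` to machine precision, confirming that the discrete chain equilibrates to an inflated variance v/(1 − Δ/(4v)) rather than to v itself.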

Consider next a hierarchical model with parameters θ_{1}, …, θ_{K} and observations Y_{ij} | θ_{i} ∼ f_{i}(⋅ | θ_{i}), where the θ_{i} and all Y_{ij} are conditionally independent. For example, let f_{i} be the Gaussian pdf with mean θ_{i} and variance σ^{2}, and the prior for θ_{i} be a Cauchy distribution with median m. The i^{th} component of the drift coefficient ∇ log π is then available in closed form, combining the Gaussian likelihood score with the Cauchy prior score.

Note that conjugacy plays no direct role in this approach, though the presence of the Cauchy distribution makes a Gibbs sampler infeasible. This example is further analyzed in the next Section.

Next consider mixture likelihoods. Suppose Y_{1}, …, Y_{n} are conditionally independent and identically distributed given θ, each with mixture density Σ_{i} p_{i} f_{i}(⋅ | θ), where p_{i} > 0 and Σ_{i} p_{i} = 1. Diffusion MCMC is easily formulated if the derivatives of the f_{i} with respect to θ are available.

Only the component densities f_{i} are involved in the calculation. This parallels the familiar step in computing full conditionals in setting up a Gibbs Sampler. Namely, for each parameter one retains only the factors of the joint density in which it appears.

Suppose that the Bayesian model takes the form of a joint posterior density π(θ_{1}, …, θ_{q} | y) for parameters θ_{1}, …, θ_{q}.

We adapt the notation to the multivariate setting: for a function h(θ_{1}, …, θ_{q}) define ∇h to be the vector of partial derivatives ∂h/∂θ_{j}, j = 1, …, q.

Hence, the j^{th} component of the drift is (1/2) ∂ log π / ∂θ_{j}, and the Euler recursion updates all coordinates simultaneously.

We note that Gibbs sampling is useful when the full conditionals are standard distributions that can be sampled directly. Diffusion MCMC requires no such structure; only the partial derivatives of the log posterior are needed.
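Since only the unnormalized log posterior enters the drift, the partial derivatives can even be approximated numerically when analytic differentiation is tedious. A minimal central-difference sketch (the function name and step size are our own choices):

```python
import numpy as np

def num_grad(log_post, theta, h=1e-5):
    """Central-difference approximation to grad log pi, computed from the
    unnormalized log posterior alone (the normalizer cancels)."""
    theta = np.asarray(theta, dtype=float)
    g = np.empty_like(theta)
    for j in range(theta.size):
        e = np.zeros_like(theta)
        e[j] = h
        g[j] = (log_post(theta + e) - log_post(theta - e)) / (2.0 * h)
    return g
```

For a standard bivariate normal log density, `num_grad` recovers the exact gradient −θ up to rounding error.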

To provide insight into diffusion MCMC (DMCMC), we present a standard test case and a real-data example. Our goal is to assess the performance of DMCMC, especially in comparison with the current state-of-the-art adaptive Metropolis (AM) sampler. In AM, candidate states are generated from a Gaussian proposal with covariance proportional to Σ_{m}, where Σ_{m} is the current estimate of the covariance matrix of the target distribution; the scale factor s^{2} can also be “adapted”.
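For concreteness, a Haario-style adaptive Metropolis sampler can be sketched as follows. This is a simplified sketch under our own assumptions (a crude recursive mean/covariance update, the conventional 2.38^{2}/d scaling, and a toy Gaussian target), not the exact AM implementation used in the experiments.

```python
import numpy as np

def adaptive_metropolis(log_post, x0, n_steps, rng, eps=1e-6):
    """Random-walk Metropolis whose Gaussian proposal covariance tracks a
    running estimate of the target covariance, scaled by 2.38^2 / d."""
    d = len(x0)
    x = np.array(x0, dtype=float)
    lp = log_post(x)
    mean = x.copy()
    cov = np.eye(d)                       # initial covariance estimate
    out = np.empty((n_steps, d))
    for m in range(n_steps):
        prop = rng.multivariate_normal(x, (2.38**2 / d) * cov + eps * np.eye(d))
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:   # Metropolis accept/reject
            x, lp = prop, lp_prop
        out[m] = x
        w = 1.0 / (m + 2)                 # simple recursive mean/cov update
        diff = x - mean
        mean = mean + w * diff
        cov = (1.0 - w) * cov + w * np.outer(diff, diff)
    return out

rng = np.random.default_rng(7)
samples = adaptive_metropolis(lambda t: -0.5 * np.sum(t**2), [3.0, -3.0], 5000, rng)
```

Unlike DMCMC, each AM step requires an accept/reject decision, and the adaptation must be tuned and monitored.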

For comparisons, we inspect trace-plots to assess convergence and compare algorithms via their average squared jump distance (ASJD).

This quantity is estimated by the empirical average of the squared Euclidean distances between successive states of the chain.
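The estimator is a one-liner; the sketch below (our own naming) accepts either a univariate chain or an (n, d) array of states.

```python
import numpy as np

def asjd(chain):
    """Average squared jumping distance: the empirical mean of
    ||x_{m+1} - x_m||^2 over successive states of the chain."""
    chain = np.atleast_2d(np.asarray(chain, dtype=float))
    if chain.shape[0] == 1:          # treat 1-D input as a univariate chain
        chain = chain.T
    jumps = np.diff(chain, axis=0)   # successive differences x_{m+1} - x_m
    return float(np.mean(np.sum(jumps**2, axis=1)))
```

Larger ASJD indicates that the chain traverses the support in bigger steps, which is why we use it as a simple mixing diagnostic.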

We also add a computational constraint for our examples. We limit ourselves to relatively short runs of the Markov chains (AM and DMCMC). This can be very dangerous for classical MCMC since one will have difficulty assessing whether the chains have reached stationarity. Our examples will show that the diffusion approach quickly finds regions with high posterior probability and explores them thoroughly.

Assume that Y_{i1}, …, Y_{ir_{i}} | θ_{i} are conditionally independent Gaussian observations with mean θ_{i}. We specify the variance σ^{2} to be known.

We let the sample sizes r_{i} vary between 5 and 500. For θ_{1}, θ_{2}, …, θ_{500} we specify independent prior distributions with densities proportional to [1 + (θ_{i} − m)^{2}]^{−1}, i.e., Cauchy priors with median m.

In our experiments we observed that a small, fixed time step Δ performed well. We report results for two representative parameters, θ_{1} and θ_{201}.

Parameter | Target value | DMCMC estimate | AM estimate |
---|---|---|---|
θ_{1} | 0.2508 | 0.2517 | 0.2451 |
θ_{201} | -0.9333 | -0.8403 | -0.8637 |

Parameter | ASJD_{DMCMC} | ASJD_{AM} |
---|---|---|
θ_{1} | 0.0042 | 0.17 × 10^{−4} |
θ_{201} | 0.0044 | 0.16 × 10^{−4} |

The left panels show the results from the adaptive MCMC sampler. The right panels show the results from the DMCMC sampler.

The sliding relation used here (with velocity in m a^{−1}) follows the glaciological literature: u_{bx} is the sliding velocity, ρ (in kg m^{−3}) is the density of ice, and g (in m s^{−2}) is the gravity constant.

We assume a quadratic model for the surface in which β_{0} and β_{1} are unknown parameters, following the authors of the original analysis of these data.

The dataset consists of vectors of surface, velocity, and basal observations.

The horizontal domain is partitioned into 2^{10} = 1024 bins of equal length (189.5 m). All basal observations within each bin are averaged, leading to a data vector of bin means; n_{i} is the number of observations averaged in bin i (the n_{i} are either one or two). We selected Daubechies wavelets to represent the basal topography.
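The binning-and-averaging step can be sketched as follows. The function name, argument layout, and the small toy inputs are our own illustrative assumptions; only the 2^{10}-bin, equal-length structure comes from the text.

```python
import numpy as np

def bin_average(positions, values, x_min, x_max, n_bins=2**10):
    """Average all observations falling in each of n_bins equal-length bins;
    also return n_i, the number of observations averaged per bin."""
    edges = np.linspace(x_min, x_max, n_bins + 1)
    idx = np.clip(np.digitize(positions, edges) - 1, 0, n_bins - 1)
    sums = np.bincount(idx, weights=values, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    means = np.divide(sums, counts, out=np.full(n_bins, np.nan), where=counts > 0)
    return means, counts

# Toy check with two bins: observations at 0.5, 0.5, and 1.5 on [0, 2].
means, counts = bin_average([0.5, 0.5, 1.5], [1.0, 3.0, 5.0], 0.0, 2.0, n_bins=2)
```

Empty bins are returned as NaN so downstream code can distinguish “no observation” from a zero average.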

The priors for β_{0} and β_{1} were specified to be independent normal distributions with large variances: 10,000 for β_{0} and 10 for β_{1}. The means of these normal distributions were set equal to the least squares estimates of β_{0} and β_{1} derived from a traditional analysis fitting the quadratic surface model.


To develop reasonable priors for the Fourier coefficients, the prior means were centered at values on the order of 10^{−16} (which is consistent with the theoretical value of the parameter) and 10^{−5}. These parameters were all assumed to be independent, normal random variables with prior variances equal to 10.

Left panel shows the surface data, middle panel shows the velocity data, right panel shows the basal data.

Simulation of a diffusion formulated to have stationary distribution coinciding with a target posterior distribution is a viable MCMC method. The approach is comparatively simple to implement since it requires no probability computations such as those needed in Gibbs’ sampling nor any accept-reject steps as in Metropolis algorithms. These advantages can be significant in a variety of settings including mixture likelihoods and/or priors, hierarchical models, nonconjugate priors, and nonlinear models.

The key problem that arises in diffusion MCMC is the approximation of the desired continuous time diffusion by a discrete time Markov chain. Our implementations use Euler discretizations. As reviewed in the Introduction, there are results in the literature providing sufficient conditions under which the discrete approximation has a stationary distribution that approximates that of the target, continuous-time diffusion. Though beyond our scope here, selection of the time-step Δ is an important practical issue that deserves further study.

We implemented diffusion MCMC for a familiar test problem and compared it to an adaptive MCMC procedure. We found that diffusion MCMC outperformed the “state-of-the-art” adaptive MCMC. Next, we implemented the diffusion MCMC approach in a complicated, nonlinear model involving glacial dynamics. Again, we found that our suggested approach performs well and mixes quickly.

In summary, we believe that diffusion MCMC is a valuable addition to the MCMC toolbox. By construction, the DMCMC algorithm has the ability to quickly find important regions of the target distribution, while a classical, or even adaptive, MCMC may require longer exploration times (as seen in the glaciological example). It can be applied in great generality and with ease in some complicated contexts for which other MCMC methods are difficult or very time-consuming to implement. DMCMC does carry the baggage of temporal discretization and concern for the quality of the resulting approximation. Nevertheless, the potential power of diffusion MCMC justifies its application and further development.

We provide the surface, basal and velocity data used in this manuscript.

(ZIP)