Annealed Importance Sampling for Neural Mass Models

Neural Mass Models provide a compact description of the dynamical activity of cell populations in neocortical regions. Moreover, models of regional activity can be connected into networks, and the strength of connections inferred from M/EEG data using Bayesian methods. To date, however, Bayesian approaches have been largely restricted to the Variational Laplace (VL) algorithm, which assumes that the posterior distribution is Gaussian and finds model parameters that are only locally optimal. This paper explores the use of Annealed Importance Sampling (AIS) to address these restrictions. We implement AIS using proposals derived from Langevin Monte Carlo (LMC), which uses local gradient and curvature information for efficient exploration of parameter space. In terms of the estimation of Bayes factors, VL and AIS agree about which model is best but report different degrees of belief. Additionally, AIS finds better model parameters and we find evidence of non-Gaussianity in their posterior distribution.

Multiplying the integrand top and bottom by a 'proposal' density $g(w)/Z_g$ and re-arranging gives

$$E_f[a(w)] = \frac{Z_g}{Z_f} \int a(w)\, v(w)\, \frac{g(w)}{Z_g}\, dw$$

where the importance weight is $v(w) = f(w)/g(w)$. A Monte Carlo estimate is given by

$$E_f[a(w)] \approx \frac{Z_g}{Z_f} \frac{1}{N} \sum_{i=1}^{N} a(w_i)\, v(w_i)$$

where the samples $w_i$ are drawn from the proposal. We can see that

$$\frac{Z_f}{Z_g} = \int v(w)\, \frac{g(w)}{Z_g}\, dw \approx \frac{1}{N} \sum_{i=1}^{N} v(w_i) \qquad (4)$$

We can therefore write

$$E_f[a(w)] \approx \frac{\sum_{i=1}^{N} a(w_i)\, v(w_i)}{\sum_{i=1}^{N} v(w_i)}$$
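The estimators above can be made concrete with a minimal NumPy sketch. The target f, proposal g, and all numerical values below are illustrative choices of ours (not from the paper): f is an unnormalised Gaussian with known $Z_f = \sqrt{2\pi}$, so both the mean-weight estimate of $Z_f/Z_g$ and the self-normalised estimate of $E_f[a(w)]$ can be checked.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative unnormalised target f(w) = exp(-(w-1)^2/2), i.e. N(1,1)
# without its constant, so Z_f = sqrt(2*pi). The proposal g is a
# normalised N(0, 2^2) density (Z_g = 1).
def f(w):
    return np.exp(-0.5 * (w - 1.0) ** 2)

def g(w):
    return np.exp(-0.5 * (w / 2.0) ** 2) / (2.0 * np.sqrt(2.0 * np.pi))

N = 200_000
w = rng.normal(0.0, 2.0, size=N)   # samples from the proposal
v = f(w) / g(w)                    # importance weights v(w) = f(w)/g(w)

# Mean importance weight estimates Z_f / Z_g (equation 4)
Zf_over_Zg = v.mean()              # close to sqrt(2*pi) ~ 2.507

# Self-normalised estimate of E_f[a(w)] for a(w) = w
Ef_a = np.sum(v * w) / np.sum(v)   # close to the target mean, 1.0
```

Note that the broad proposal (heavier-tailed than the target) keeps the weight variance finite; a proposal narrower than the target would make both estimates unreliable.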

Model Evidence
By letting $a(w) = p(y|w, m)$ and $f(w) = p(w|m)$ we have

$$p(y|m) = \int p(y|w, m)\, p(w|m)\, dw$$

An importance sampling estimate of the model evidence is therefore given by

$$\hat{p}(y|m) = \frac{\sum_{i=1}^{N} v^{(i)}\, p(y|w_i, m)}{\sum_{i=1}^{N} v^{(i)}}, \qquad v^{(i)} = \frac{p(w_i|m)}{q(w_i|m)} \qquad (7)$$

where q is known as an importance or approximating density (previously $g/Z_g$), $w_i$ are samples from q, and $v^{(i)}$ are referred to as importance weights. Different choices for q give rise to different IS approximations to the model evidence.
The simplest choice is the prior density, $q(w|m) = p(w|m)$, which gives rise to the Prior Arithmetic Mean (PAM)

$$\hat{p}_{PAM}(y|m) = \frac{1}{N} \sum_{i=1}^{N} p(y|w_i, m), \qquad w_i \sim p(w|m)$$

This approximation can of course be motivated from a simple Monte Carlo approximation to the evidence integral. A problem with this estimate, however, is that most samples from the prior will have low likelihood. A large number of samples is therefore required to ensure that high-likelihood regions of parameter space are included in the average. If this does not occur, the model evidence will be under-estimated.

A second choice is the posterior density, $q(w|m) = p(w|y, m)$. Application of Bayes' rule to the numerator and denominator of equation 7 then leads to the expression for the Posterior Harmonic Mean (PHM)

$$\hat{p}_{PHM}(y|m) = \left[ \frac{1}{N} \sum_{i=1}^{N} \frac{1}{p(y|w_i, m)} \right]^{-1}, \qquad w_i \sim p(w|y, m)$$

A problem with the PHM is that the largest contributions come from low-likelihood samples, which results in a high-variance estimator. In applications to phylogenetic networks, the PHM has been shown to overestimate the model evidence [2].
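Both estimators can be checked on a toy conjugate model of our own choosing (not from the paper), where the evidence is available in closed form:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative conjugate model: prior w ~ N(0,1), likelihood y|w ~ N(w,4).
# For a single observation y the evidence is analytic: p(y|m) = N(y; 0, 5),
# so the PAM and PHM estimates can be compared against the exact value.
y = 1.5
N = 100_000

def lik(w):
    return np.exp(-((y - w) ** 2) / 8.0) / np.sqrt(8.0 * np.pi)

true_evidence = np.exp(-(y ** 2) / 10.0) / np.sqrt(10.0 * np.pi)

# Prior Arithmetic Mean: average the likelihood over prior samples
w_prior = rng.normal(0.0, 1.0, size=N)
pam = lik(w_prior).mean()

# Posterior Harmonic Mean: the exact posterior here is N(0.3, 0.8)
w_post = rng.normal(0.3, np.sqrt(0.8), size=N)
phm = 1.0 / np.mean(1.0 / lik(w_post))
```

In this one-dimensional model both estimates land close to the truth; the pathologies described in the text (under-estimation by PAM, heavy-tailed weights for PHM) bite as the prior and posterior become more dissimilar, e.g. in higher dimensions or with more informative data.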
A third possibility, which we explore in this paper, is $\hat{p}_{ISVL}$, which uses equation 7 with a proposal density given by the posterior from VL optimisation. Being Gaussian, this is straightforward to sample from, and the importance weights are given by the ratio of the probability of each sample under the prior versus under the VL posterior.
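The idea can be sketched as follows. The model and proposal values are illustrative choices of ours, with a fixed Gaussian q standing in for a VL posterior; q is deliberately made slightly wrong (the exact posterior of this toy model is N(0.3, 0.8)) to show that the self-normalised estimate remains consistent:

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative model: prior w ~ N(0,1), likelihood y|w ~ N(w,4),
# so p(y|m) = N(y; 0, 5). mu_q, var_q define a stand-in "VL posterior".
y = 1.5
N = 100_000
mu_q, var_q = 0.25, 1.0

def prior(w):
    return np.exp(-0.5 * w ** 2) / np.sqrt(2.0 * np.pi)

def lik(w):
    return np.exp(-((y - w) ** 2) / 8.0) / np.sqrt(8.0 * np.pi)

def q(w):
    return np.exp(-((w - mu_q) ** 2) / (2.0 * var_q)) / np.sqrt(2.0 * np.pi * var_q)

w = rng.normal(mu_q, np.sqrt(var_q), size=N)
v = prior(w) / q(w)                      # weights: prior over (VL) posterior
p_isvl = np.sum(v * lik(w)) / np.sum(v)  # equation 7 with q = VL posterior
```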

Reverse Annealing
By inverting the equation for the model evidence we have

$$\frac{1}{p(y|m)} = \int \frac{1}{p(y|w, m)}\, p(w|y, m)\, dw$$

Averaging over multiple trajectories gives

$$\frac{1}{\hat{p}(y|m)} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{p(y|w_i, m)}, \qquad w_i \sim p(w|y, m)$$

which shows that the PHM approximation to the model evidence is a special case of AIS with a reverse annealing schedule and only J = 2 temperatures. Importance weights for reverse annealing are given by

$$v = \prod_{j=1}^{J} \frac{f_{j-1}(w_j)}{f_j(w_j)}$$

and a series of samples $w_J, w_{J-1}, \ldots, w_2, w_1$ is created by starting with $w_J$ from forward annealing, and generating the others sequentially using LMC.
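A reverse-annealing trajectory can be sketched on a toy model of our own (not the paper's neural mass model): exact posterior draws stand in for the $w_J$ produced by forward annealing, a random-walk Metropolis kernel stands in for LMC, and the intermediate densities follow a geometric schedule $f_j(w) = p(w)\, p(y|w)^{\beta_j}$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative model: prior w ~ N(0,1), likelihood y|w ~ N(w,4),
# evidence p(y|m) = N(y; 0, 5); the exact posterior is N(0.3, 0.8).
y = 1.5
J = 30
betas = np.linspace(0.0, 1.0, J + 1)   # beta_0 = 0 (prior) ... beta_J = 1
N = 2000                                # number of reverse trajectories

def log_prior(w):
    return -0.5 * w ** 2 - 0.5 * np.log(2.0 * np.pi)

def log_lik(w):
    return -((y - w) ** 2) / 8.0 - 0.5 * np.log(8.0 * np.pi)

w = rng.normal(0.3, np.sqrt(0.8), size=N)  # w_J: stand-in posterior samples
log_v = np.zeros(N)
for j in range(J, 0, -1):
    # weight increment log f_{j-1}(w_j) - log f_j(w_j)
    log_v += (betas[j - 1] - betas[j]) * log_lik(w)
    # one Metropolis step leaving f_{j-1} invariant: w_j -> w_{j-1}
    prop = w + rng.normal(0.0, 0.5, size=N)
    log_alpha = (log_prior(prop) + betas[j - 1] * log_lik(prop)) \
              - (log_prior(w) + betas[j - 1] * log_lik(w))
    w = np.where(np.log(rng.uniform(size=N)) < log_alpha, prop, w)

inv_evidence = np.exp(log_v).mean()    # mean reverse weight ~ 1/p(y|m)
```

The mean reverse weight estimates $1/p(y|m)$ exactly in expectation, regardless of how well the kernels mix, provided $w_J$ really is a posterior sample.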

Importance Weights
To derive the importance weights for AIS we consider the forward and backward joint densities over the whole trajectory. The backward density constitutes our target f and the forward density constitutes our proposal g. If the importance weights are chosen to correct for discrepancies between them, then expectations based on $w_1, \ldots, w_J$ will be correct. If expectations over the joint density are correct, then so are expectations over any marginal of the joint. Thus $w_J$ will be a sample from the posterior density. The density of the backward sequence is

$$f(w_1, \ldots, w_J) = f_J(w_J) \prod_{j=1}^{J-1} \tilde{T}_j(w_{j+1}, w_j)$$

where the backward transition kernel is related to the forward kernel as

$$\tilde{T}_j(w', w) = T_j(w, w')\, \frac{f_j(w)}{f_j(w')}$$

We can therefore write

$$f(w_1, \ldots, w_J) = f_J(w_J) \prod_{j=1}^{J-1} T_j(w_j, w_{j+1})\, \frac{f_j(w_j)}{f_j(w_{j+1})}$$
The density of the forward sequence is

$$g(w_1, \ldots, w_J) = f_0(w_1) \prod_{j=1}^{J-1} T_j(w_j, w_{j+1})$$

Hence the importance weights are given by

$$v = \frac{f(w_1, \ldots, w_J)}{g(w_1, \ldots, w_J)} = \prod_{j=1}^{J} \frac{f_j(w_j)}{f_{j-1}(w_j)}$$

Because the mean importance weight is equal to $Z_f/Z_g$ (from equation 4), then if $f_0$ is the prior and $f_J$ is the unnormalised posterior, the mean importance weight is also equal to the model evidence [3].
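Forward AIS can be sketched end-to-end on an illustrative conjugate model of our own (prior w ~ N(0,1), likelihood y|w ~ N(w,4), so the evidence N(y; 0, 5) is known), with a random-walk Metropolis kernel standing in for LMC and a geometric schedule between prior and posterior:

```python
import numpy as np

rng = np.random.default_rng(3)

y = 1.5
J = 30
betas = np.linspace(0.0, 1.0, J + 1)   # beta_0 = 0 (prior) ... beta_J = 1
N = 2000                                # number of independent trajectories

def log_prior(w):
    return -0.5 * w ** 2 - 0.5 * np.log(2.0 * np.pi)

def log_lik(w):
    return -((y - w) ** 2) / 8.0 - 0.5 * np.log(8.0 * np.pi)

w = rng.normal(0.0, 1.0, size=N)       # w_1 ~ prior (f_0)
log_v = np.zeros(N)
for j in range(1, J + 1):
    # accumulate log f_j(w_j)/f_{j-1}(w_j) = (beta_j - beta_{j-1}) log lik
    log_v += (betas[j] - betas[j - 1]) * log_lik(w)
    # one Metropolis step leaving f_j invariant (LMC in the paper)
    prop = w + rng.normal(0.0, 0.5, size=N)
    log_alpha = (log_prior(prop) + betas[j] * log_lik(prop)) \
              - (log_prior(w) + betas[j] * log_lik(w))
    w = np.where(np.log(rng.uniform(size=N)) < log_alpha, prop, w)

evidence = np.exp(log_v).mean()        # mean importance weight ~ p(y|m)
```

Working in log-weights, as above, avoids numerical underflow when the per-temperature increments are summed over many temperatures; the final states w are (approximate) posterior samples, which is what the reverse-annealing scheme starts from.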