A Variational Bayes Approach to the Analysis of Occupancy Models

Detection-nondetection data are often used to investigate species range dynamics using Bayesian occupancy models, which rely on Markov chain Monte Carlo (MCMC) methods to sample from the posterior distribution of the model parameters. In this article we develop two Variational Bayes (VB) approximations to the posterior distribution of the parameters of a single-season site occupancy model which uses logistic link functions to model the probability of species occurrence at sites and the probability of species detection. This is accomplished through the development of iterative algorithms that do not use MCMC methods. Simulations and small practical examples demonstrate the effectiveness of the proposed technique. We specifically show that (under certain circumstances) the variational distributions can provide accurate approximations to the true posterior distributions of the parameters of the model when the number of visits per site (K) is as low as three, and that the accuracy of the approximations improves as K increases. We also show that the methodology can be used to obtain the posterior distribution of the predictive distribution of the proportion of sites occupied (PAO).


Named Arguments
The arguments of the function are as follows:
• formula - A double right-hand-side formula describing the covariates of detection and occupancy, in that order. For example, assume that the presence-absence data are named y, that the detection covariates are contained in a named list W (see below) and that the occupancy covariates are stored in X. Further suppose that the elements of W are named W1, W2 and W3, and that the columns of X are named X1 and X2. Then y ~ W1 + W2 + W3 ~ X1 + X2 would be one example of a suitable formula. At present the function cannot fit a model that contains only intercepts; this option will be included in future.
• design_mats - A named list generated by the call vb_Designs(W, X, y).
W is a named list of data frames of covariates that vary within sites, i.e. the data frames are of dimension n × J, where each row is associated with a site and each column represents a site visit. For example, suppose W contains three data frames W1, W2 and W3; then W$W1[1, ] holds the values of covariate W1 at site 1 for all of the visits. Note that some of the entries might be 'NA', meaning that no visit took place on those occasions.
X is a named data frame of covariates that vary at site level.
y is an n × J matrix of the detection/non-detection data, where n is the number of sites and J is the maximum number of sampling periods per site.
• alpha_0 - Prior mean of the detection covariate coefficients. It is assumed that the detection covariate coefficients have the prior distribution α ∼ N(alpha_0, Sigma_alpha_0). Here α is viewed as a vector.
• beta_0 - Prior mean of the occurrence covariate coefficients. It is assumed that the occupancy covariate coefficients have the prior distribution β ∼ N(beta_0, Sigma_beta_0). Here β is viewed as a vector.
• Sigma_alpha_0 - Prior covariance matrix of the detection covariate coefficients.
• Sigma_beta_0 - Prior covariance matrix of the occurrence covariate coefficients.
• LargeSample - If LargeSample == TRUE, the number of sites is taken to be 'large' and an approximation to B(µ, σ²) is used; otherwise numerical integration is performed.
• epsilon - Tolerance relative to which convergence is measured.
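The shapes of the inputs described above can be sketched in R as follows. This is only an illustration under stated assumptions: the dimensions, the random values, the helper make_visit_cov and the diffuse prior variance of 1000 are all hypothetical, and the coefficient vectors are assumed to include an intercept. The sketch builds the objects only; it does not call the VB function itself.

```r
# Hypothetical dimensions: 5 sites, at most 3 visits per site
n <- 5
J <- 3
set.seed(1)

# y: n x J detection/non-detection matrix; NA marks a visit that did not take place
y <- matrix(rbinom(n * J, 1, 0.4), nrow = n, ncol = J)
y[2, 3] <- NA

# W: named list of n x J data frames, one per detection covariate
# (make_visit_cov is a hypothetical helper used only in this sketch)
make_visit_cov <- function() {
  w <- as.data.frame(matrix(rnorm(n * J), nrow = n, ncol = J))
  w[2, 3] <- NA  # same missing visit as in y
  w
}
W <- list(W1 = make_visit_cov(), W2 = make_visit_cov(), W3 = make_visit_cov())

# X: data frame of site-level occupancy covariates
X <- data.frame(X1 = rnorm(n), X2 = rnorm(n))

# Double right-hand-side formula: detection covariates first, then occupancy
form <- y ~ W1 + W2 + W3 ~ X1 + X2
det_terms <- all.vars(form[[2]][[3]])  # "W1" "W2" "W3"
occ_terms <- all.vars(form[[3]])       # "X1" "X2"

# Priors (assumption: one coefficient per covariate plus an intercept)
s <- length(det_terms) + 1
r <- length(occ_terms) + 1
alpha_0 <- rep(0, s)
beta_0 <- rep(0, r)
Sigma_alpha_0 <- diag(1000, s)  # diffuse prior, hypothetical choice
Sigma_beta_0 <- diag(1000, r)
```

R parses the double formula as (y ~ W1 + W2 + W3) ~ X1 + X2, which is why the detection terms sit inside the left-hand side of the outer formula.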

The values returned by the function
• alpha - The VB estimate of the posterior mean vector of α. (s × 1 vector)
• beta - The VB estimate of the posterior mean vector of β. (r × 1 vector)
• Sigma_alpha - The VB estimate of the posterior covariance matrix of the α vector. (s × s matrix)
• Sigma_beta - The VB estimate of the posterior covariance matrix of the β vector. (r × r matrix)
• occup_p - The VB estimate of the posterior occupancy probabilities at the sites considered. (n × 1 vector)
• Log_mla - The lower bound on the log marginal likelihood.
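As a rough illustration of how site-level occupancy probabilities relate to the proportion of sites occupied (PAO), the sketch below simulates occupancy indicators from a vector of probabilities and summarises the resulting proportions. It is a simplification and not necessarily the procedure used in the article: the vector occup_p is made up for the example, and the simulation conditions on these point estimates, so it ignores the remaining uncertainty in α and β.

```r
set.seed(1)

# Hypothetical occupancy probabilities for 8 sites; sites with at least one
# detection are occupied with probability 1
occup_p <- c(1, 1, 0.35, 0.10, 0.62, 1, 0.05, 0.48)
n <- length(occup_p)
n_draws <- 5000

# Each column holds one simulated set of occupancy indicators for all n sites
z_draws <- matrix(rbinom(n * n_draws, 1, rep(occup_p, times = n_draws)),
                  nrow = n)

# Proportion of sites occupied in each simulated draw
pao_draws <- colMeans(z_draws)

# Expected PAO and a central 95% interval of the simulated proportions
mean(occup_p)
quantile(pao_draws, c(0.025, 0.5, 0.975))
```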

A small simulated data set
The following R code could be used to produce a small simulated data set on which to carry out the VB Laplace approximations.
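A minimal sketch of such a simulation is given here. It is not the original code: the sample sizes, the true coefficient values, the use of a single covariate for each process and the 10% rate of missing visits are all hypothetical choices. It draws data from a single-season occupancy model with logistic links for both occupancy and detection, and stores the results in the W, X, y layout described under the named arguments.

```r
set.seed(2024)

n <- 100  # number of sites (hypothetical)
J <- 3    # maximum number of visits per site (hypothetical)

# Hypothetical true coefficients: intercept and slope for each process
alpha_true <- c(0.5, 1.0)   # detection
beta_true  <- c(0.2, -1.0)  # occupancy

# Site-level covariate and true occupancy states
X <- data.frame(X1 = rnorm(n))
psi <- plogis(beta_true[1] + beta_true[2] * X$X1)
z <- rbinom(n, 1, psi)

# Visit-level covariate and detection probabilities (n x J matrices)
W1 <- matrix(rnorm(n * J), nrow = n, ncol = J)
p <- plogis(alpha_true[1] + alpha_true[2] * W1)

# Detections are only possible at occupied sites (z recycles down the rows)
y <- matrix(rbinom(n * J, 1, z * p), nrow = n, ncol = J)

# Mark roughly 10% of visits as not having taken place
miss <- matrix(runif(n * J) < 0.1, nrow = n, ncol = J)
y[miss] <- NA
W1[miss] <- NA

# Named list of visit-level covariate data frames
W <- list(W1 = as.data.frame(W1))
```

With these objects, a formula of the form y ~ W1 ~ X1 matches the covariate names used in the simulation.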

A small example
The following R code illustrates how the VB code could be used to undertake a small analysis.