## Abstract

Detection-nondetection data are often used to investigate species range dynamics using Bayesian occupancy models, which rely on Markov chain Monte Carlo (MCMC) methods to sample from the posterior distribution of the parameters of the model. In this article we develop two Variational Bayes (VB) approximations to the posterior distribution of the parameters of a single-season site occupancy model that uses logistic link functions to model the probabilities of species occurrence at sites and of species detection. This task is accomplished through the development of iterative algorithms that do not use MCMC methods. Simulations and small practical examples demonstrate the effectiveness of the proposed technique. We specifically show that (under certain circumstances) the variational distributions can provide accurate approximations to the true posterior distributions of the parameters of the model when the number of visits per site (*K*) is as low as three and that the accuracy of the approximations improves as *K* increases. We also show that the methodology can be used to obtain the posterior predictive distribution of the proportion of sites occupied (PAO).

**Citation:** Clark AE, Altwegg R, Ormerod JT (2016) A Variational Bayes Approach to the Analysis of Occupancy Models. PLoS ONE 11(2): e0148966. https://doi.org/10.1371/journal.pone.0148966

**Editor:** Corrie S. Moreau, Field Museum of Natural History, UNITED STATES

**Received:** June 16, 2015; **Accepted:** January 26, 2016; **Published:** February 29, 2016

**Copyright:** © 2016 Clark et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability:** The data has been appended as a RData file.

**Funding:** J.T. Ormerod received funding from the Australian Research Council Early Career Award DE130101670. R. Altwegg received funding from a National Research Foundation grant "Dynamic macroecology in conservation" (Grant No: 81685 and 85802). A.E. Clark received a South African National Research Foundation scholarship to complete this work.

**Competing interests:** The authors have declared that no competing interests exist.

## Introduction

Bayesian analysis is a coherent statistical paradigm whereby prior information regarding the research area is blended with that of information obtained from the observed data [1]. Subjective prior information is *elicited* either from expertise in the field or based on prior research (meta analyses). Informative priors are increasingly being used in ecology ([2, 3]) and even in the absence of prior information many ecologists are using Bayesian methods [4].

One class of model that is often analysed in a Bayesian way is the occupancy model [5]. The single season occupancy model was formulated by using ideas borrowed from closed population mark-recapture models. In this model *n* sites are visited a number of times (*K*) in order to estimate the occupancy (**ψ**) and detection probability (denoted throughout as **d**) of a species associated with each site. (The term *detection probability* should be read as *conditional detection probability* throughout the text.) These methods are particularly useful when studying the range dynamics of various animal species and have been applied extensively in the ecological literature (see [6, 7] and [8] for some examples). The model has been formulated as a hierarchical Bayesian model, which has led to numerous extensions of the single season occupancy model ([9, 10] and [11]).

Many papers have investigated the statistical properties of the estimators of the single season occupancy model. The first of these developed a maximum likelihood formulation of the model and investigated the properties of the estimators for the occupancy and detection probabilities using simulations [5]. The authors assume that the parameters of the model are constant for all sites, although they also consider incorporating covariates in the model. They found that when *d* ≥ 0.3, the parameter estimates of the occupancy probability were reasonably unbiased when *K* ≥ 5, while when *K* = 2 a detection probability of at least 0.5 is required to provide a reasonable estimate of *ψ*. They also found that when the true detection probability is low, the estimate of *ψ* tends to 1 [5]. Numerous authors have found similar results regarding boundary problems ([12, 13] and [14]), although it has been argued that boundary parameter estimates are rare but could be observed in small data sets [13].

Moreno and Lele investigated the small sample properties of the maximum likelihood estimators [15]. They note that ‘When detection or occupancy probability is small or when the number of sites and number of visits per site is small, maximum likelihood estimators (MLE) of site occupancy parameters have large biases, are numerically unstable, and the corresponding confidence intervals have smaller than nominal coverage.’ They proposed a penalized maximum likelihood method which performed adequately for small sample sizes. Recently, their study has been extended by considering three different penalized likelihood type models [14]. They found that the penalized methods performed well and suggested that ‘fully Bayesian methods would be competitive’.

Here, we develop Variational Bayes (VB) approximations to the posterior distribution of the parameters of a single-season site occupancy model. One big advantage of the methods developed here is the fact that they could be applied to cases where the researcher has informative priors and might not want to rely on the use of the MLE method. In that situation, Markov Chain Monte Carlo (MCMC) methods were so far the only methods available for fitting occupancy models in a Bayesian analysis. However, for big data sets, MCMC methods can be too slow to be useful. Admittedly the potential computational efficiencies accrued from using a VB algorithm compared to the MLE method would possibly only apply when fitting more complicated occupancy type models. We view our contribution as a first step towards developing similar methods for more complicated occupancy models (e.g., the inclusion of site-specific random effects, spatial occupancy models and dynamic occupancy models).

The proportion of occupied sample locations ($\mathrm{PAO} = \frac{1}{n}\sum_{i=1}^{n} z_i$, where $z_i$ is the occupancy state for site $i$) is a derived parameter of interest in many studies ([9, 16]). Although frequentist methods can be used to estimate the PAO, the calculation of a valid confidence interval for it is problematic for the frequentist. The same holds true for prediction of occupancy status in species distribution models [17]. We show (via simulations as well as using practical examples) that the VB approximations can be used to accurately obtain prediction intervals for latent state variables (e.g., occupancy states **z**) or for functions of these state variables by simulating from the VB posterior distributions.

This paper commences with a brief discussion of Variational Bayes (VB). Thereafter, a VB implementation of a particular occupancy model is developed in Section 1.2. In Section 2.1 the results of a short simulation study are presented while in Section 2.2 we analyse site occupancy data of five bird species to illustrate the usefulness of the VB technique developed. A list of some of the notations and distribution theory used in the text can be found in S1 Text.

## 1 Material and Methods

### 1.1 A brief introduction to Variational Bayes (VB)

Variational Bayes is used to approximate posterior distributions obtained when undertaking Bayesian analysis and could be useful in many ecological applications.

In what follows let $\boldsymbol{\theta}$ be a vector of parameters of a statistical model, $\pi(\boldsymbol{\theta})$ be a prior distribution for these parameters and $\mathbf{y}$ be a random variable. In the context of this article, $\boldsymbol{\theta}$ are the parameters of a single-season occupancy model while $\mathbf{y}$ represents detection-nondetection data used to fit an occupancy model. Further, suppose that a posterior distribution $\pi(\boldsymbol{\theta} \mid \mathbf{y})$ is not analytically tractable and that analytical expressions for its posterior moments do not exist. In probability theory the Kullback-Leibler (KL) divergence provides a measure of a difference between two probability distributions [18]. When the two distributions being compared are exactly the same the divergence measure is equal to zero, while when they are different the divergence measure is positive.

The VB method approximates a posterior distribution by using a distribution $q(\boldsymbol{\theta})$ which is obtained by minimizing the Kullback-Leibler (KL) divergence between $q(\boldsymbol{\theta})$ and $\pi(\boldsymbol{\theta} \mid \mathbf{y})$ [18]. The KL divergence is

$$KL\big(q(\boldsymbol{\theta}) \,\|\, p(\boldsymbol{\theta} \mid \mathbf{y})\big) = \int q(\boldsymbol{\theta}) \ln\left\{\frac{q(\boldsymbol{\theta})}{p(\boldsymbol{\theta} \mid \mathbf{y})}\right\} d\boldsymbol{\theta} = \ln p(\mathbf{y}) - L(q(\boldsymbol{\theta})) \tag{1}$$

where $p(\mathbf{y})$ is the marginal likelihood, $p(\mathbf{y}, \boldsymbol{\theta})$ is the joint likelihood of the data and the parameter vector $\boldsymbol{\theta}$ with

$$L(q(\boldsymbol{\theta})) = \int q(\boldsymbol{\theta}) \ln\left\{\frac{p(\mathbf{y}, \boldsymbol{\theta})}{q(\boldsymbol{\theta})}\right\} d\boldsymbol{\theta}. \tag{2}$$

Since $KL(q(\boldsymbol{\theta}) \,\|\, p(\boldsymbol{\theta} \mid \mathbf{y})) \geq 0$, $\ln p(\mathbf{y}) \geq L(q(\boldsymbol{\theta}))$ for every $q(\boldsymbol{\theta})$, and minimising $KL(q(\boldsymbol{\theta}) \,\|\, p(\boldsymbol{\theta} \mid \mathbf{y}))$ is equivalent to maximising $L(q(\boldsymbol{\theta}))$. Often it is assumed that $q(\boldsymbol{\theta})$ can be factorized as a product of simple probability distributions as $q(\boldsymbol{\theta}) = \prod_i q(\theta_i)$, where each of the $q(\theta_i)$ is iteratively estimated as $q(\theta_i) \propto \exp\{E_{-\theta_i}[\ln p(\mathbf{y}, \boldsymbol{\theta})]\}$. Here $E_{-\theta_i}$ denotes an expectation with respect to the density $\prod_{j \neq i} q(\theta_j)$. An alternate method of obtaining $q(\boldsymbol{\theta})$ involves making an assumption regarding its parametric form. The parameters of this distribution are obtained by maximising $L(q(\boldsymbol{\theta}))$ [19].

VB is often used as an alternative to Markov chain Monte Carlo (MCMC) methods because it can be much faster to implement: in most applications $q(\theta_i)$ will be of a known simple form ([20–22]). Variational approximations to posterior distributions can accurately estimate the posterior mean of the parameters, although the posterior variances of some of the parameters might be underestimated ([23, 24]). Although this problem is context specific, the estimate of the posterior variance is asymptotically valid for linear models [25]. As a solution the variational covariance matrix is often replaced by the inverse of the Fisher information matrix [23]. Alternately the non-parametric bootstrap could be used to provide interval estimates of the parameters [26].
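The identity in Eq (1), $\ln p(\mathbf{y}) = L(q) + KL(q \,\|\, p(\boldsymbol{\theta} \mid \mathbf{y}))$, can be checked numerically on a toy problem. The sketch below is only an illustration and is not part of the occupancy model: it assumes a parameter restricted to a three-point grid, a uniform prior and a binomial likelihood, all of which are hypothetical choices made so that every integral becomes a finite sum.

```python
import math

def elbo_and_kl(prior, lik, q):
    """Return (ln p(y), lower bound L(q), KL(q || posterior)) for a discrete parameter."""
    joint = [pr * li for pr, li in zip(prior, lik)]   # p(y, theta_k)
    evidence = sum(joint)                              # p(y)
    post = [j / evidence for j in joint]               # p(theta_k | y)
    lower = sum(qk * math.log(j / qk) for qk, j in zip(q, joint) if qk > 0)
    kl = sum(qk * math.log(qk / pk) for qk, pk in zip(q, post) if qk > 0)
    return math.log(evidence), lower, kl

# Hypothetical toy setting: theta is a detection probability on a grid,
# and the data are 2 detections in 5 visits.
grid = [0.2, 0.5, 0.8]
prior = [1 / 3] * 3
lik = [math.comb(5, 2) * t ** 2 * (1 - t) ** 3 for t in grid]

log_py, lower, kl = elbo_and_kl(prior, lik, q=[0.6, 0.3, 0.1])
print(log_py, lower + kl)  # the two numbers agree up to rounding
```

Because the KL term is non-negative, any choice of `q` gives a lower bound on `log_py`, and the bound is tight only when `q` equals the exact posterior.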

### 1.2 VB applied to single season occupancy models

In a single season occupancy model $n$ sites are visited $\mathbf{K}$ times in order to estimate the occupancy ($\boldsymbol{\psi}$) and detection ($\mathbf{d}$) probability of a species associated with each site. Each site could be surveyed a different number of times such that $\mathbf{K} = (K_1, K_2, \ldots, K_n)^T$, where $K_i$ represents the number of surveys undertaken at site $i$ and $\mathbf{d}$ is a ragged matrix with dimensions determined by $\mathbf{K}$. The total number of site visits undertaken is defined as $N = \sum_i K_i$.

The data collected at each site are represented as an $N$ dimensional vector $\mathbf{y} = (\mathbf{y}_1, \ldots, \mathbf{y}_n)^T$, where each of the $\mathbf{y}_i$ denotes the vector of detections and nondetections for site $i$. A 0 in the vector $\mathbf{y}_i$ indicates that the species was not observed at the $i$th site during a particular visit, while a 1 indicates that the species was observed at the particular site during a particular visit. Let the vector $\mathbf{z}$ represent the true species occupancy at the sites considered. Since we are using a single season model, $\mathbf{z}$ is assumed to be constant across the season. $\mathbf{z}$ is partially observed, i.e. $z_i = 1$ if the species occupies site $i$ and $z_i = 0$ if it does not occupy site $i$. We know $z_i = 1$ if the species is observed at site $i$ during any of the visits, since we assume that there are no false identifications of individuals. If the species is however not observed at site $i$, $z_i$ could equal 0 or 1 since we are uncertain about whether the species actually occurs at that site. We treat $\mathbf{y}_i$ as a row vector of dimension $1 \times K_i$ while $\mathbf{z}$ is of dimension $n \times 1$.

The single season occupancy model can be represented using the following hierarchical model [9]

$$z_i \sim \text{Bernoulli}(\psi_i), \qquad y_{i,j} \mid z_i \sim \text{Bernoulli}(z_i \, p_{i,j}),$$

for all sites $i = 1, \ldots, n$ and for all visits $j = 1, \ldots, K_i$. Here $\psi_i = Pr(z_i = 1)$ denotes the probability that the species occurs at site $i$, while $p_{i,j} = Pr(y_{i,j} = 1 \mid z_i = 1)$ denotes the conditional probability of detecting the species during the $j$th visit to site $i$ given that the species is present at site $i$. The occupancy probabilities and the detection probabilities can be estimated using either maximum likelihood [5], penalized maximum likelihood [15] or Bayesian methods [9]. In what follows we develop a VB approach to estimating these quantities.
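To make the data-generating process concrete, the following minimal sketch simulates detection histories from the hierarchical model with constant probabilities. The function name `simulate_occupancy` and its arguments are hypothetical and chosen for illustration only; covariates are deliberately omitted.

```python
import random

def simulate_occupancy(psi, p, visits, seed=0):
    """Simulate occupancy states z and detection histories y.

    psi    : occupancy probability, Pr(z_i = 1), assumed constant across sites
    p      : conditional detection probability, Pr(y_ij = 1 | z_i = 1)
    visits : list with the number of visits K_i to each site
    """
    rng = random.Random(seed)
    z, y = [], []
    for k in visits:
        zi = 1 if rng.random() < psi else 0
        z.append(zi)
        # A detection is only possible at an occupied site (no false positives).
        y.append([1 if (zi == 1 and rng.random() < p) else 0 for _ in range(k)])
    return z, y

z, y = simulate_occupancy(psi=0.6, p=0.4, visits=[3, 4, 5] * 20, seed=42)
naive = sum(1 for row in y if sum(row) > 0) / len(y)
print(sum(z) / len(z), naive)  # true proportion occupied vs. proportion with a detection
```

The proportion of sites with at least one detection can never exceed the true proportion occupied, which is why imperfect detection has to be modelled explicitly.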

Additional covariate data collected at each of the sites are used to estimate the site occupancy and detection probabilities. Specifically we assume that we have $r$ occupancy and $s$ detection covariates. We further assume that we have no missing values in these covariates. Formally we let $\mathbf{W}$ and $\mathbf{X}$ be the design matrices for the detection and occupancy effects respectively, with dimensions $N \times s$ and $n \times r$. Correspondingly, let $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ be the detection and occupancy effects with dimensions $s \times 1$ and $r \times 1$ respectively. The matrix $\mathbf{W}$ is constructed by row-binding the detection covariates at the different locations and for different visits one below each other such that $\mathbf{W} = [\mathbf{W}_1^T, \ldots, \mathbf{W}_n^T]^T$, where each of the matrices $\mathbf{W}_i$ is of dimension $K_i \times s$ with one row per visit to site $i$.

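The occupancy and detection probabilities at the various sites for all visits are modelled using the following logistic link functions

$$\text{logit}(\psi_i) = \mathbf{x}_i^T \boldsymbol{\beta}, \qquad \text{logit}(p_{i,j}) = \mathbf{w}_{i,j}^T \boldsymbol{\alpha},$$

where $\mathbf{x}_i^T$ is the $i$th row of $\mathbf{X}$ and $\mathbf{w}_{i,j}^T$ is the row of $\mathbf{W}$ corresponding to visit $j$ to site $i$.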

It can be shown that the conditional likelihood of the data and the true occupancy variables is
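The expression itself is not reproduced in this text. As a sketch only, the Bernoulli structure of the hierarchical model implies a complete-data likelihood of the following standard form; the notation is an assumption made here, with $I(\cdot)$ an indicator function and the convention $0^0 = 1$:

$$p(\mathbf{y}, \mathbf{z} \mid \boldsymbol{\alpha}, \boldsymbol{\beta}) = \prod_{i=1}^{n} \left[ \psi_i \prod_{j=1}^{K_i} p_{i,j}^{y_{i,j}} (1 - p_{i,j})^{1 - y_{i,j}} \right]^{z_i} \Big[ (1 - \psi_i) \, I(\mathbf{y}_i = \mathbf{0}) \Big]^{1 - z_i}.$$

The indicator encodes the assumption of no false identifications: a site with at least one detection contributes zero likelihood unless $z_i = 1$.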

We now assume that the prior distribution for $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ are multivariate Gaussian distributions (denoted as $\pi(\boldsymbol{\alpha}, \boldsymbol{\beta})$) with parameters , and , respectively. We further assume that the **variational approximate distribution** of $\pi(\boldsymbol{\alpha}, \boldsymbol{\beta}, \mathbf{z})$ is of the form $q(\boldsymbol{\alpha}, \boldsymbol{\beta}, \mathbf{z}) = q(\boldsymbol{\alpha}, \boldsymbol{\beta}) \prod_i q(z_i)$, where each of the $q(z_i)$ is Bernoulli distributed with success probability $(sp)_i$. Under this restriction $q(\boldsymbol{\alpha}, \boldsymbol{\beta})$ can be factorized into two separate factors $q(\boldsymbol{\alpha})$ and $q(\boldsymbol{\beta})$ with (3) (4) where , , and $b(x) = \ln(1 + \exp(x))$. Here . Refer to S1 Appendix for a derivation of the above results.
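Eqs (3) and (4) are not reproduced in this text. As a hedged sketch of what the mean-field argument yields, assume independent Gaussian priors $\pi(\boldsymbol{\alpha})$ and $\pi(\boldsymbol{\beta})$, and write $\mathbf{x}_i^T$ and $\mathbf{w}_{i,j}^T$ for the rows of $\mathbf{X}$ and $\mathbf{W}$ (notation introduced here for illustration). Taking the expectation of the log joint density over $q(\mathbf{z})$, so that each $z_i$ is replaced by $(sp)_i$, gives

$$\ln q(\boldsymbol{\alpha}) = \sum_{i=1}^{n} (sp)_i \sum_{j=1}^{K_i} \left\{ y_{i,j} \mathbf{w}_{i,j}^T \boldsymbol{\alpha} - b(\mathbf{w}_{i,j}^T \boldsymbol{\alpha}) \right\} + \ln \pi(\boldsymbol{\alpha}) + \text{const},$$

$$\ln q(\boldsymbol{\beta}) = \sum_{i=1}^{n} \left\{ (sp)_i \mathbf{x}_i^T \boldsymbol{\beta} - b(\mathbf{x}_i^T \boldsymbol{\beta}) \right\} + \ln \pi(\boldsymbol{\beta}) + \text{const}.$$

No term couples $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$, which is why the joint factor splits into two. The exact form and notation of the published equations may differ; S1 Appendix remains the authoritative derivation.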

The normalization constant of $q(\boldsymbol{\alpha}, \boldsymbol{\beta})$ is not known analytically and thus $q(\boldsymbol{\alpha})$ and $q(\boldsymbol{\beta})$ are not of a known type. We attempt to approximate the posterior distribution of $(\boldsymbol{\alpha}, \boldsymbol{\beta})$ using two different methods. In the first method we approximate the variational distribution by using a Laplace approximation to Eqs (3) and (4) and thus assume that the variational distributions are multivariate Gaussian with parameters $\boldsymbol{\mu}_{\alpha}$, $\boldsymbol{\Sigma}_{\alpha}$ and $\boldsymbol{\mu}_{\beta}$, $\boldsymbol{\Sigma}_{\beta}$ respectively; while in the second method we employ a tangent based approximation to $b(\mathbf{W}\boldsymbol{\alpha})$ and $b(\mathbf{X}\boldsymbol{\beta})$ to obtain approximations to $q(\boldsymbol{\alpha})$ and $q(\boldsymbol{\beta})$ respectively.

Once we have obtained approximations to $q(\boldsymbol{\alpha}, \boldsymbol{\beta})$ it then follows that the $q$-densities, $q(z_i \mid \mathbf{y}_i = \mathbf{0})\ \forall i$, are Bernoulli distributed with success probability $(1 + \exp(-c_i))^{-1}$. The approximate conditional occupancy probabilities for all sites can then be calculated for the two methods (denoted as ‘$L$’ and ‘$T$’ respectively) using (5) (6)

Here is a vector of length *K*_{i} such that

The parameters of the variational distributions are all dependent on one another and can be computed using an iterative scheme such as that given in Algorithm 1 and Algorithm 2. A detailed description of aspects of the above derivations can be found in the supplemental information to this paper. In particular, the quantities used to calculate can be found in S2 Appendix while an explanation regarding the stopping rule for both algorithms is described in S3 Appendix.
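The logistic form $(1 + \exp(-c_i))^{-1}$ has a simple interpretation when the occupancy and detection probabilities are treated as known. The sketch below computes this plug-in version of the conditional occupancy probability for a site with no detections. It is an illustration only: Eqs (5) and (6) use quantities derived from the variational distributions rather than known probabilities, and the function name is hypothetical.

```python
import math

def conditional_occupancy(psi, p_visits):
    """Pr(z_i = 1 | no detections at site i) for known psi_i and p_ij.

    Bayes' rule gives psi * prod(1 - p) / (psi * prod(1 - p) + 1 - psi),
    which equals 1 / (1 + exp(-c)) with c = logit(psi) + sum(ln(1 - p)).
    """
    c = math.log(psi / (1 - psi)) + sum(math.log(1 - p) for p in p_visits)
    return 1 / (1 + math.exp(-c))

# The probability that an unobserved species is present shrinks with more visits.
for k in (2, 3, 5):
    print(k, conditional_occupancy(0.5, [0.5] * k))
```

Each additional visit without a detection adds a negative term $\ln(1 - p_{i,j})$ to $c_i$, so the conditional occupancy probability decreases monotonically with the number of unsuccessful visits.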

### 1.3 Simulation study

In the following simulation study we investigate some of the properties of the VB method and investigate whether it could be used to produce *statistically valid inference*. We specifically focus on the frequentist properties of the posterior mean parameters of the VB distribution of **α** and **β**. This task is undertaken by *empirically* comparing the coverage probability and credibility/confidence intervals of **α** and **β** associated with the two VB methods developed and comparing these to the same statistics obtained using MCMC and maximum likelihood. We calculate credibility intervals for the Bayesian methods and confidence intervals for the MLE method and focus particularly on the 95% credibility or confidence intervals.

**Algorithm 1** Iterative scheme for obtaining the parameters of the optimal density of $q(\boldsymbol{\alpha}, \boldsymbol{\beta})$ using the Laplace approximation.

1. Initialize $\boldsymbol{\mu}_{\alpha}$, $\boldsymbol{\Sigma}_{\alpha}$, $\boldsymbol{\mu}_{\beta}$, $\boldsymbol{\Sigma}_{\beta}$

2. Cycle:

2.1 Cycle:

$\boldsymbol{\mu}_{\alpha} \leftarrow \boldsymbol{\mu}_{\alpha} + \boldsymbol{\Sigma}_{11} \mathbf{g}_1$ and $\boldsymbol{\mu}_{\beta} \leftarrow \boldsymbol{\mu}_{\beta} + \boldsymbol{\Sigma}_{22} \mathbf{g}_2$

until the Newton-Raphson algorithm converges.

2.2 Calculate conditional occupancy probabilities for all sites where $\mathbf{y}_i = \mathbf{0}$ using Eq (5). Note that $(sp)_i = 1$ for all sites where $\mathbf{y}_i \neq \mathbf{0}$.

until the change in becomes negligible (≤ 10^{−6}).
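The structure of Algorithm 1 can be illustrated with a heavily simplified sketch. The code below is not the authors' implementation: it assumes an intercept-only model (scalar $\alpha$ and $\beta$), zero-mean Gaussian priors with a hypothetical variance `prior_var`, a plug-in evaluation of $c_i$ at the current means instead of Eq (5), and a stopping rule based on the change in the means. All function and variable names are made up for illustration.

```python
import math
import random

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def b(x):
    # b(x) = ln(1 + exp(x)), written to avoid overflow for large |x|
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def laplace_vb_intercept_only(detections, visits, prior_var=100.0, tol=1e-6, max_iter=500):
    """Simplified Laplace-type scheme; detections[i] = number of detections in visits[i] visits."""
    n = len(visits)
    sp = [1.0 if d > 0 else 0.5 for d in detections]   # (sp)_i = 1 where the species was seen
    mu_a = mu_b = 0.0
    var_a = var_b = prior_var
    for _ in range(max_iter):
        old_a, old_b = mu_a, mu_b
        # Step 2.1: Newton-Raphson on the log densities of q(alpha) and q(beta)
        for _ in range(100):
            pa, pb = expit(mu_a), expit(mu_b)
            g1 = sum(s * (d - k * pa) for s, d, k in zip(sp, detections, visits)) - mu_a / prior_var
            g2 = sum(sp) - n * pb - mu_b / prior_var
            # Variances are inverse negative Hessians, evaluated at the current means
            var_a = 1.0 / (sum(s * k for s, k in zip(sp, visits)) * pa * (1 - pa) + 1 / prior_var)
            var_b = 1.0 / (n * pb * (1 - pb) + 1 / prior_var)
            step_a, step_b = var_a * g1, var_b * g2
            mu_a += step_a
            mu_b += step_b
            if abs(step_a) + abs(step_b) < 1e-10:
                break
        # Step 2.2: plug-in conditional occupancy probabilities for undetected sites
        sp = [1.0 if d > 0 else expit(mu_b - k * b(mu_a)) for d, k in zip(detections, visits)]
        if abs(mu_a - old_a) + abs(mu_b - old_b) <= tol:
            break
    return mu_a, var_a, mu_b, var_b, sp

# Hypothetical data: 400 sites, 4 visits each, occupancy 0.6, detection 0.5
rng = random.Random(1)
visits = [4] * 400
detections = []
for k in visits:
    occupied = rng.random() < 0.6
    detections.append(sum(1 for _ in range(k) if occupied and rng.random() < 0.5))

mu_a, var_a, mu_b, var_b, sp = laplace_vb_intercept_only(detections, visits)
print(expit(mu_a), expit(mu_b))  # approximate detection and occupancy probabilities
```

With the plug-in simplification the scheme behaves much like an EM algorithm for the posterior mode, so it should be read as a guide to the control flow of the algorithm rather than as a reproduction of the published method.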

**Algorithm 2** Iterative scheme for obtaining the parameters of the optimal density of $q(\boldsymbol{\alpha}, \boldsymbol{\beta})$ using the tangent based method.

1. Initialize $\boldsymbol{\mu}_{\alpha}$, $\boldsymbol{\Sigma}_{\alpha}$, $\boldsymbol{\mu}_{\beta}$, $\boldsymbol{\Sigma}_{\beta}$, $a_N > 0$ and $b_N > 0$.

2. Cycle:

2.1 Calculate the conditional occupancy probabilities for all sites where $\mathbf{y}_i = \mathbf{0}$ using Eq (6). Note that $(sp)_i = 1$ for all sites where $\mathbf{y}_i \neq \mathbf{0}$.

2.2 Set , , and with

2.3 Calculate the ‘variational parameters’. Refer to S2 Appendix.

until the change in becomes negligible (≤ 10^{−6}).

The accuracy of the VB approximations to the posterior distribution obtained through MCMC is also assessed. This is undertaken by calculating . The *acc*(*x*) measure lies between 0 and 1 with a value of 1 indicating a perfect approximation and a value close to 0 indicating a poor approximation by the variational distribution to the true posterior distribution.
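The expression for *acc*(*x*) is not reproduced in this text. A measure with the stated properties that is commonly used to assess variational approximations is one minus half the integrated absolute difference between the two densities. The sketch below implements that measure as an assumption, not as a confirmed transcription of the authors' definition. The densities and integration limits are hypothetical; in practice the reference density would be estimated from MCMC samples.

```python
import math

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def accuracy(q_pdf, p_pdf, lo, hi, steps=20000):
    """Assumed accuracy score: 1 - 0.5 * integral |q(x) - p(x)| dx (midpoint rule on [lo, hi])."""
    h = (hi - lo) / steps
    l1 = sum(abs(q_pdf(lo + (i + 0.5) * h) - p_pdf(lo + (i + 0.5) * h)) for i in range(steps)) * h
    return 1 - 0.5 * l1

target = lambda x: normal_pdf(x, 0.0, 1.0)
too_narrow = lambda x: normal_pdf(x, 0.0, 0.8)   # an approximation that understates the spread
far_away = lambda x: normal_pdf(x, 20.0, 1.0)    # an approximation with no overlap

print(accuracy(target, target, -10, 30))
print(accuracy(too_narrow, target, -10, 30))
print(accuracy(far_away, target, -10, 30))
```

Identical densities give a score of 1 and densities with disjoint support give a score near 0, which matches the interpretation given in the text.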

Occupancy models are often used to assess the predictive distribution of the proportion of occupied sites, defined as $\mathrm{PAO} = \frac{1}{n}\sum_{i=1}^{n} z_i$. We thus investigate the posterior approximation of the PAO using the Laplace VB posterior approximation method. This approximation can easily be obtained by sampling from the VB posterior distribution for each $z_i$ in turn to construct the PAO statistic. To assess the VB approximations the *acc*(*x*) statistic was used.
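The sampling step described above can be sketched as follows, assuming the variational success probabilities for each site are already available. The function name `sample_pao` and the example probabilities are hypothetical.

```python
import random

def sample_pao(sp, n_draws=5000, seed=0):
    """Draw from the approximate predictive distribution of the proportion of sites occupied.

    sp : list of Bernoulli success probabilities q(z_i = 1), equal to 1 for
         sites where the species was detected at least once.
    """
    rng = random.Random(seed)
    n = len(sp)
    # Each draw samples every z_i independently and averages the occupancy states.
    return [sum(1 for s in sp if rng.random() < s) / n for _ in range(n_draws)]

# Hypothetical example: 30 sites with detections and 20 sites without
sp_example = [1.0] * 30 + [0.2] * 20
draws = sorted(sample_pao(sp_example))
mean_pao = sum(draws) / len(draws)
interval = (draws[int(0.025 * len(draws))], draws[int(0.975 * len(draws)) - 1])
print(mean_pao, interval)
```

Sites with at least one detection contribute a fixed lower limit to every draw, so all of the uncertainty in the PAO comes from the sites where the species was never observed.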

We consider 32 simulation settings. The number of sites (*n*) is set to 50 and 100 while the number of visits to each site (*K*) is set to 2, 3, 4 and 5. The following combinations of the regression coefficients were used:

1. **α** = [0, 1.75]^{T}, **β** = [−1.85, 2.5]^{T};
2. **α** = [1.35, 1.75]^{T}, **β** = [−1.85, 2.5]^{T};
3. **α** = [0, 1.75]^{T}, **β** = [−0.1, 2.5]^{T}; and
4. **α** = [1.35, 1.75]^{T}, **β** = [−0.1, 2.5]^{T}.

These parameter values ensure an approximate average detection and occupancy probability among the sites of (0.5, 0.3), (0.7, 0.3), (0.5, 0.5) and (0.7, 0.5) respectively. We have not considered any cases where the detection and occupancy probabilities are lower than 0.3, since in these cases data sets are expected to be very sparse, which requires many site visits in order to undertake useful statistical inference [5].
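The average probabilities implied by each coefficient pair can be checked directly. The sketch below assumes a single covariate that is exactly uniform with zero mean and unit variance, that is, uniform on $[-\sqrt{3}, \sqrt{3}]$, and uses the fact that the antiderivative of the logistic function is $b(x) = \ln(1 + \exp(x))$. The function name is hypothetical, and simulated covariates will only match this idealised distribution approximately.

```python
import math

def b(x):
    # b(x) = ln(1 + exp(x)), written to avoid overflow
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def mean_probability(intercept, slope):
    """Average of expit(intercept + slope * x) for x uniform on [-sqrt(3), sqrt(3)]."""
    half = math.sqrt(3.0)
    return (b(intercept + slope * half) - b(intercept - slope * half)) / (2 * half * slope)

settings = [
    ((0.0, 1.75), (-1.85, 2.5)),
    ((1.35, 1.75), (-1.85, 2.5)),
    ((0.0, 1.75), (-0.1, 2.5)),
    ((1.35, 1.75), (-0.1, 2.5)),
]
averages = [(mean_probability(*alpha), mean_probability(*beta)) for alpha, beta in settings]
for detection, occupancy in averages:
    print(round(detection, 2), round(occupancy, 2))
```

Each printed pair is ordered as (average detection probability, average occupancy probability), following the convention that **α** holds the detection effects and **β** the occupancy effects.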

The occupancy regression covariate was obtained by standardizing a Uniform(−2, 2) random variable while the detection covariate was obtained by standardizing a Uniform(−5, 5) random variable. Each of these variables was transformed to have a zero mean and a standard deviation of one. The following parameter vectors were used to specify the prior distribution of the parameters: , for **i** = **α**, **β**.

Each simulation setting was replicated 350 times. All calculations were undertaken using R 3.3.1 [27]. Numerical optimizations were performed using the BFGS method of the R function *optim*; MCMC sampling was undertaken using the R package R2jags [28] in combination with JAGS 3.4.0 [29], while all variational approximations were performed using the authors’ code. 100000 posterior samples were obtained for each MCMC simulation. The first 25000 samples were discarded as burn-in samples while the remaining 75000 samples were retained. Prior experimentation using the MCMC algorithm indicated that 25000 iterations are enough to ensure that the Markov chains would converge to the stationary distributions. The posterior samples were not thinned [30].

## 2 Results

### 2.1 Simulation results

Table 1 contains a summary of some of the results of the simulation study. For each value of *K* we tabulate the median coverage probability and credibility/confidence interval width of **α** and **β** associated with the four estimation procedures considered here. The medians are calculated across the different occupancy and detection probability combinations for fixed values of *n* and *K*.

We found that as the number of sites increased, the credibility (and confidence) interval widths of the true regression parameters decreased (for all methods). For a fixed number of sites, the credibility (and confidence) interval widths of the true **α** parameter values decreased as *K* increased, while the associated widths for the **β** parameters did not appear to decrease noticeably with an increase in *K*. The simulation results suggest that the coverage probabilities associated with the Laplace method and the MLE method (for all regression parameters) are very close to the nominal coverage value of 0.95 for *K* ≥ 3. It is evident from these results that the Tangent based method does not perform well under any of the scenarios considered and consistently produced the smallest credibility interval widths.

Based on the accuracy calculations across the replicate data sets, the Tangent based method generally appears to be the worst at approximating the marginal posterior distributions of both **α** and **β** when comparisons are made based on the median accuracy measure for these parameters. As an example of these simulation results, consider the scenario where the estimated mean detection and occupancy probability across all sites and revisits are both 0.5 (see Fig 1). In general, the posterior approximations for the detection regression parameters were quite good, with median accuracy statistics greater than 0.8 even when the number of revisits is small. The accuracy statistics for the detection regression parameters increase dramatically as *n* and *K* increase, with median accuracy statistics in excess of 0.95 when *K* = 5. Similar comments can be made regarding the accuracy of the posterior approximations for the occupancy regression parameters. In general, the accuracies increase with an increase in *n* and *K*; however, the rate of increase in the accuracy statistics for the occupancy regression parameters appears slower than that observed for the detection regression parameters. The box plots of the accuracy statistics associated with the different methods for the remaining cases can be found in the Supporting Information (see S1, S2 and S3 Figs).

The detection and occupancy probabilities are approximately 0.5. The accuracy of the VB approximations is measured by the *acc*(*x*) statistic, which lies between 0 and 1, with a value of 1 indicating a perfect approximation and a value close to 0 indicating a poor approximation by the variational distribution to the true posterior distribution.

We found that the accuracy of the approximate predictive distributions for the proportion of occupied sites improves as *K* increases (see Fig 2). This observation is consistent across all of the scenarios considered. From an examination of the posterior predictive distributions (not shown here) it is evident that the VB predictive distributions are lighter tailed than the MCMC predictive distributions; however, this effect is reduced for *K* ≥ 3. This observation can clearly be seen when examining the results displayed in Tables 2 and 3. It is noticeable that the summary statistics of the predictive distributions using the two methods are very similar, although the VB predictive distributions display a slightly reduced posterior variance under certain conditions.

Box plots of the accuracy measurements for the predictive distribution of the proportions of occupied sites associated with the Laplace method for number of sites *n* = 50, 100 and number of visits to each site *K* = 2, 3, 4 and 5. The accuracy of the VB approximations is measured by the *acc*(*x*) statistic, which lies between 0 and 1, with a value of 1 indicating a perfect approximation and a value close to 0 indicating a poor approximation by the variational distribution to the true posterior distribution.

### 2.2 Application to real data sets

As examples of the proposed technique, we use detection-nondetection data extracted from the second Southern African Bird Atlas Project [31] database (see http://sabap2.adu.org.za/) for 2012 to compare the performance of different methods for fitting a single season occupancy model. The data were collected by citizen scientists using 5-minute latitude × 5-minute longitude rectangular grids across South Africa [31]. Each site is approximately 8 km × 7.6 km [8]. The citizen scientists were asked to make a list of all the species that they encountered during at least two hours of intense birding. They were allowed to add additional species to the list for up to five days. By providing information on the species that they encountered, the citizen scientists implicitly also provided information about the species they did not encounter. Hence, we extracted detection-nondetection data for five bird species (1. Black-headed heron (*Ardea melanocephala*), 2. Egyptian goose (*Alopochen aegyptiaca*), 3. orange-throated longclaw (*Macronyx capensis*), 4. white-browed sparrow-weaver (*Plocepasser mahali*) and 5. Long-tailed widowbird (*Euplectes progne*)) from this database, treating each check-list as an independent observation. We included all grid cells in and around Gauteng, South Africa, that contained a minimum of three site visits. Many of the sites were visited a large number of times but we limited the maximum number of site visits to five (since the focus of the analysis was to assess whether the VB techniques could be used to analyse studies which have relatively small sample sizes and low number of revisits per site). This restriction reduced the data sets to 123 sites; 50 of which had three surveys; 52 had four surveys and the remaining 21 sites had 5 surveys.

In our analysis we specifically compare the MLE, MCMC and the VB methods where uninformative priors (as in the simulation study) were used for all parameters. We fitted a model with one detection covariate and one occupancy covariate. The detection covariate used was the number of species observed by the birder (denoted as *nspp*) while the occupancy probability was modelled as a function of the ratio of potential to realized evapotranspiration (*AETdivPETs*). *AETdivPETs* is a measure of vegetation cover and hydric stress and is an important predictor for bird species occurrence in South Africa [32]. Both covariates were standardized to have zero mean and unit variance.

Maximum likelihood estimation was undertaken using the R package *unmarked* [33]; MCMC sampling was undertaken using the R package jagsUI [34], while all variational approximations were performed using the authors’ code. The R code used to perform the analysis (S1 Code), the data (S1 Data) as well as documentation regarding the VB code (S2 Code, S2 Data) can be found in the Supporting Information. The MCMC estimation was undertaken as per the simulation study discussed previously.

The approximate posterior means and standard deviations of the VB distributions were all close to the posterior means and standard deviations obtained using MCMC (see Table 4 and Fig 3). The regression coefficients are all positive and statistically significantly different from zero. From an examination of the predictive distributions of the PAO for the different species it is evident that the VB distributions can be used to obtain accurate approximations to the true predictive distributions of the PAO (see Fig 4 and Table 5). Notice that the accuracy statistics for all of the species considered were above 0.9.

A comparison between the VB distributions (solid line) and the posterior distributions obtained using MCMC (the histogram) for the regression parameters of the detection and occupancy process for the different bird species (denoted as (1) = Black-headed heron, (2) = Egyptian goose, (3) = Orange-throated longclaw, (4) = White-browed sparrow-weaver and (5) = Long-tailed widowbird).

The accuracy statistics (*acc*(*x*)) are displayed in brackets. The *acc*(*x*) measure lies between 0 and 1 with a value of 1 indicating a perfect approximation and a value close to 0 indicating a poor approximation by the variational distribution to the true posterior distribution.

## 3 Discussion

We developed two new methods of approximating the posterior distribution of the parameters of a Bayesian single season occupancy model that uses logistic link functions. The first method uses a Laplace approximation of the VB optimal distributions while the second method utilizes the tangent based method of [35]. Based on the simulation studies it was found that the Laplace approximation method performed well under most conditions considered. We believe that the approximation results obtained using the probit link function would be similar to those obtained using the tangent based method and thus did not explicitly consider this link function here. The methods have laid the groundwork that would enable VB methods to be applied to more complicated occupancy models, and such extensions are currently the focus of ongoing research.

One big advantage of the methods developed here is the fact that they could be applied to cases where the researcher has informative prior information and might not want to rely on the use of the MLE method. In that situation, Markov Chain Monte Carlo (MCMC) methods were so far the only methods available for fitting occupancy models in a Bayesian analysis. However, for big data sets, MCMC methods can be too slow to be useful. The code used to implement the methods is available in R and was at least 100 times faster than running MCMC using jagsUI **in our example**.

Simulations showed that when uninformative prior distributions were used, in general, the Laplace method attains very similar frequentist coverage probabilities to those obtained by the MLE method when the number of sampling occasions is at least three. We advise that the approximate methods could be used when the detection probability is at least 0.5 and there are at least three sampling occasions.

A further advantage of the methods developed here is the ease with which one can approximate the predictive distribution of the proportion of area occupied. Our simulation results showed that the Laplace approximate method can be used to obtain approximate distributions of the PAO. For scenarios where the detection probabilities are relatively low and the number of site visits is small (*K* = 2) we found that the approximate methods slightly underestimate the upper bound of the PAO. The difference between the true predictive distribution and the approximate one is however very small for *K* ≥ 3.

For both methods considered, the derived approximate distributions are multivariate Gaussian. When the sample size is particularly small, the number of sampling occasions is low (possibly one or two), or the detection probability is low (less than 0.3), we have found that the posterior distributions of the model parameters are often skewed (particularly those of the occupancy covariate parameters). In these cases the approximate methods do not work well. Future work could entail the use of skew distributions similar to those proposed by [36].

## Supporting Information

### S1 Text. Some notation and distribution theory used in the main part of the text.

https://doi.org/10.1371/journal.pone.0148966.s001

(PDF)

### S1 Appendix. Derivation of the lower bound to the joint likelihood and the VB distributions.

https://doi.org/10.1371/journal.pone.0148966.s002

(PDF)

### S2 Appendix. Derivation of the tangent based method.

https://doi.org/10.1371/journal.pone.0148966.s003

(PDF)

### S3 Appendix. Explanation regarding the convergence calculations.

https://doi.org/10.1371/journal.pone.0148966.s004

(PDF)

### S1 Fig. Box plots of the accuracy measurements for the model parameters associated with the Laplace (L-dark boxes) and Tangent (T-light boxes) based method for number of sites *n* = 50, 100 and number of visits to each site *K* = 2, 5.

The detection probability is approximately 0.5 while the occupancy probability is approximately 0.3.

https://doi.org/10.1371/journal.pone.0148966.s005

(TIF)

### S2 Fig. Box plots of the accuracy measurements for the model parameters associated with the Laplace (L-dark boxes) and Tangent (T-light boxes) based method for number of sites *n* = 50, 100 and number of visits to each site *K* = 2, 5.

The detection probability is approximately 0.7 while the occupancy probability is approximately 0.3.

https://doi.org/10.1371/journal.pone.0148966.s006

(TIF)

### S3 Fig. Box plots of the accuracy measurements for the model parameters associated with the Laplace (L-dark boxes) and Tangent (T-light boxes) based method for number of sites *n* = 50, 100 and number of visits to each site *K* = 2, 5.

The detection probability is approximately 0.7 while the occupancy probability is approximately 0.5.

https://doi.org/10.1371/journal.pone.0148966.s007

(TIF)

### S1 Code. The R code used to undertake the analysis.

https://doi.org/10.1371/journal.pone.0148966.s008

(R)

### S2 Code. How to use the VB Laplace approximation code.

https://doi.org/10.1371/journal.pone.0148966.s009

(PDF)

### S1 Data. The R data file that contains the data used to undertake the analysis.

https://doi.org/10.1371/journal.pone.0148966.s010

(RDATA)

### S2 Data. The R data file that contains the data used to explain how to use the VB Laplace approximation code.

https://doi.org/10.1371/journal.pone.0148966.s011

(RDA)

## Acknowledgments

This research was partially supported by an Australian Research Council Early Career Award DE130101670 (Ormerod), two South African National Research Foundation (NRF) grants, namely 81685 and 85802 (Altwegg), and a scholarship from the NRF (Clark). The financial assistance of the NRF towards this research is hereby acknowledged. Opinions expressed and conclusions arrived at are those of the authors and are not necessarily to be attributed to the NRF.

We acknowledge that some of the computations were performed using facilities provided by the University of Cape Town’s ICTS High Performance Computing team: http://hpc.uct.ac.za. We thank the members of the Centre for Statistics in Ecology, Environment and Conservation for helpful comments, in particular Greg Duckworth and Sanet Hugo. We also thank the three anonymous reviewers and the academic editor for their helpful comments, which greatly improved this manuscript.

We have no conflict of interest to declare.

## Author Contributions

Conceived and designed the experiments: AEC RA JTO. Performed the experiments: AEC. Analyzed the data: AEC. Contributed reagents/materials/analysis tools: AEC. Wrote the paper: AEC RA JTO. Designed the software used in analysis: AEC. Wrote computer code used to perform all analysis: AEC.

## References

- 1. Robert C, Casella G. Monte Carlo statistical methods. Springer Science & Business Media; 2013.
- 2. Kuhnert PM, Martin TG, Griffiths SP. A guide to eliciting and using expert knowledge in Bayesian ecological models. Ecology Letters. 2010;13(7):900–914. pmid:20497209
- 3. Choy SL, O’Leary R, Mengersen K. Elicitation by design in ecology: using expert opinion to inform priors for Bayesian statistical models. Ecology. 2009;90(1):265–277. pmid:19294931
- 4. Clark JS. Why environmental scientists are becoming Bayesians. Ecology Letters. 2005;8(1):2–14.
- 5. MacKenzie DI, Nichols JD, Lachman GB, Droege S, Royle JA, Langtimm CA. Estimating site occupancy rates when detection probabilities are less than one. Ecology. 2002;83(8):2248–2255.
- 6. Bailey LL, MacKenzie DI, Nichols JD. Advances and applications of occupancy models. Methods in Ecology and Evolution. 2014;5(12):1269–1279.
- 7. Bled F, Nichols JD, Altwegg R. Dynamic occupancy models for analyzing species’ range dynamics across large geographic scales. Ecology and Evolution. 2013;3(15):4896–4909. pmid:24455124
- 8. Broms KM, Johnson DS, Altwegg R, Conquest LL. Spatial occupancy models applied to atlas data show Southern Ground Hornbills strongly depend on protected areas. Ecological Applications. 2014;24(2):363–374. pmid:24689147
- 9. Royle JA, Dorazio RM. Hierarchical modeling and inference in ecology: the analysis of data from populations, metapopulations and communities. Academic Press, San Diego, CA; 2008.
- 10. Johnson DS, Conn PB, Hooten MB, Ray JC, Pond BA. Spatial occupancy models for large data sets. Ecology. 2013;94(4):801–808.
- 11. Royle JA, Kery M. A Bayesian state-space formulation of dynamic occupancy models. Ecology. 2007;88(7):1813–1823. pmid:17645027
- 12. Welsh AH, Lindenmayer DB, Donnelly CF. Fitting and interpreting occupancy models. PLoS One. 2013;8(1):e52015. pmid:23326323
- 13. Guillera-Arroita G, Lahoz-Monfort JJ, MacKenzie DI, Wintle BA, McCarthy MA. Ignoring Imperfect Detection in Biological Surveys Is Dangerous: A Response to ‘Fitting and Interpreting Occupancy Models’. PLoS One. 2014;9(7):e99571. pmid:25075615
- 14. Hutchinson RA, Valente JJ, Emerson SC, Betts MG, Dietterich TG. Penalized likelihood methods improve parameter estimates in occupancy models. Methods in Ecology and Evolution. 2015;6(8):949–959.
- 15. Moreno M, Lele SR. Improved estimation of site occupancy using penalized likelihood. Ecology. 2010;91(2):341–346. pmid:20391998
- 16. MacKenzie DI. Occupancy estimation and modeling: inferring patterns and dynamics of species occurrence. Academic Press; 2006.
- 17. Kéry M, Gardner B, Monnerat C. Predicting species distributions from checklist data using site-occupancy models. Journal of Biogeography. 2010;37(10):1851–1862.
- 18. Kullback S, Leibler RA. On information and sufficiency. The Annals of Mathematical Statistics. 1951;22(1):79–86.
- 19. Ormerod JT, Wand MP. Explaining variational approximations. The American Statistician. 2010;62(2):140–153.
- 20. McGrory CA, Titterington D. Variational approximations in Bayesian model selection for finite mixture distributions. Computational Statistics & Data Analysis. 2007;51(11):5352–5367.
- 21. Bishop CM. A new framework for machine learning. In: Computational Intelligence: Research Frontiers. Springer; 2008. p. 1–24.
- 22. Hensman J, Rattray M, Lawrence ND. Fast variational inference in the conjugate exponential family. In: Advances in Neural Information Processing Systems; 2012. p. 2888–2896.
- 23. Wang B, Titterington DM. Inadequacy of interval estimates corresponding to variational Bayesian approximations. Proceedings of the 10th International Workshop on Artificial Intelligence; 2005.
- 24. Grimmer J. An introduction to Bayesian inference via variational approximations. Political Analysis. 2010;19(1):32–47.
- 25. You C, Ormerod JT, Müller S. On variational Bayes estimation and variational information criteria for linear regression models. Australian and New Zealand Journal of Statistics. 2014;56(1):73–87.
- 26. Nathoo FS, Babul A, Moiseev A, Virji-Babul N, Beg MF. A variational Bayes spatiotemporal model for electromagnetic brain mapping. Biometrics. 2014;70(1):132–143. pmid:24354514
- 27. R Core Team. R: A Language and Environment for Statistical Computing; 2014. Available from: http://www.R-project.org/.
- 28. Su YS, Yajima M. R2jags: A Package for Running jags from R. R package version 0.03-08; 2012. Available from: http://CRAN.R-project.org/package=R2jags.
- 29. Plummer M. JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In: Proceedings of the 3rd international workshop on distributed statistical computing. vol. 124. Technische Universität Wien, Wien, Austria; 2003. p. 125.
- 30. Link WA, Eaton MJ. On thinning of chains in MCMC. Methods in Ecology and Evolution. 2012;3(1):112–115.
- 31. Harebottle DM, Smith N, Underhill LG, Brooks M. The Southern African Bird Atlas Project 2; 2007.
- 32. Péron G, Altwegg R. The abundant centre syndrome and species distributions: insights from closely related species pairs in southern Africa. Global Ecology and Biogeography. 2015;24(2):215–225.
- 33. Fiske I, Chandler R. unmarked: An R package for fitting hierarchical models of wildlife occurrence and abundance. Journal of Statistical Software. 2011;43(10):1–23.
- 34. Kellner K. jagsUI: Run JAGS (specifically, libjags) from R; an alternative user interface for rjags. R package version. 2014;1.
- 35. Jaakkola TS, Jordan MI. Bayesian logistic regression: a variational approach. Statistics and Computing. 2000;10(2):25–37.
- 36. Ormerod JT. Skew-Normal Variational Approximations for Bayesian Inference. School of Mathematics and Statistics, University of Sydney, Technical Report CRG-TR-93-1; 2011.