Correction: Evaluating Multi-Level Models to Test Occupancy State Responses of Plethodontid Salamanders

open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


Introduction
Salamanders are a diverse, widespread, and often abundant group of organisms occurring in a broad range of habitat types and serve critical roles in ecosystem dynamics such as nutrient cycling and food webs [1,2]. Evolutionary studies often use salamanders as subjects because of their novel morphological adaptations, broad geographic distributions, and ancient phylogenies [3][4][5]. However, life history traits such as limited dispersal and low reproductive rates, and preferences for cool, moist micro-habitats, can render some species vulnerable to habitat modification, exotic species, and disease, and, as a result, many taxa are threatened or imperiled [6][7][8][9].
Ecological roles of most salamander species are poorly documented [10,11]. Due to their multi-stage life histories and occupancy of structurally complex aquatic and terrestrial habitats, studying salamanders poses a unique set of challenges to investigators [12][13][14]. Long-term studies, preferably with experimental manipulations, are required to understand mechanistic population responses to disturbances events and gradients [11,15,16]. However, in most cases only a portion of the population (e.g., in fossorial or in-stream species) is available for sampling [17][18][19][20] as many individuals may be sub-surface during sampling. Also, only a portion of these available individuals will be detected due to variation in environmental conditions (e.g., habitat complexity) and animal behavior [21]. Failure to account for this imperfect detection causes bias in naïve estimators of occupancy and abundance [22,23]. In addition, regression coefficients for associations and/or effects will likely be biased, leading to ineffective or counter-productive conservation and management prescriptions [22]. Finally, disturbances induced by the sampling process can modify salamander habitat and behavior, and these additional sources of variation can bias inferences and future sampling efforts [17]. As a result, sampling methods and estimators of quantities of interest (e.g., occupancy or abundance) require careful thought and evaluation before deployment [24,25].
We evaluated sampling designs and models for estimating occupancy and detection probabilities for plethodontid salamanders within the context of a planned before-after controlimpact manipulative study, and applied them to empirical data on Oregon slender (Batrachoseps wrighti) and Ensatina (Ensatina eschscholtzii) salamanders as examples. The Oregon slender salamander (BAWR) is endemic to the Oregon Cascades, USA, and is distributed widely in Douglas-fir forests from~200-2500 feet in elevation, where the species demonstrates a strong reliance on decayed coarse woody debris [26]. Ensatina salamanders (ENES) are distributed broadly in coniferous and deciduous forests from British Columbia, Canada, to southern California, USA [27].
Occupancy is a state variable of interest in conservation and management studies (e.g., determining species distribution as a function of environmental covariates) and may be defined at different scales. An appropriate choice of scale will depend on factors such as species' home range and dynamics, scientific or management questions of interest, and sampling constraints [28]. Further, scientific or management interest related to occupancy processes may exist at multiple spatial scales, for example at both local and regional scales [29]. Multi-scale occupancy models are a valuable extension to standard single-scale occupancy models, as they can be used with nested samples to evaluate patterns of habitat selection and use at different spatial scales while accounting for dependencies in occupancy status across scales (i.e., local scale occupancy depends on regional scale occupancy) [30]. Spatio-temporal models are yet another extension of occupancy models that account for variation across time. Several approaches exist to account for temporal variation, including implicit or explicit process dynamics, as well as empirical statistical models used for repeated measures [31][32][33].
We used simulations to evaluate estimator properties for single-scale and multi-scale occupancy models [29,34], with empirical correlation structures to account for temporal dependencies, for the analysis of a planned before-after control-impact study. We generated empirical results for the hierarchical single-scale occupancy model using field data for both salamander species and discuss the relative importance of empirical and simulation results in the context of both observational and experimental studies. Researchers and managers can use results from our evaluations to design efficient studies on basic and applied aspects of salamander ecology, conservation, and management. Our results are likely applicable to a broad range of other taxa for which occupancy is evaluated.

Site Selection & Sampling
We sampled BAWR and ENES at 66 forested harvest units in the Cascade Range, OR, USA. Weyerhaeuser and Port Blakely Tree Farms LP owned the harvest units and provided permission for sampling to occur after reviewing all sampling procedures and experimental manipulations. Field sampling did not involved threatened or endangered species. Harvest unit age ranged from 35-90 ( x = 60; SD = 8) years and size from 20-183 ha ( x = 79; SD = 33). Clearcut harvests were used to regenerate all units. Harvest units occurred in one of two study blocks: Clackamas (Clackamas County, OR) or Snow Peak (Linn County, OR)(S1 Fig). We selected harvest units randomly for inclusion within a long-term experimental study of salamander responses to contemporary management prescriptions.
Within each harvest unit, we sampled seven 81 m 2 (9×9 m) plots in 2013-2014. We chose plot sizes to incorporate multiple salamander home ranges [28] and to allow systematic searching of habitat features. We selected each 81 m 2 plot randomly and sampled over three consecutive 10 minute intervals in a single day (sampling occurred between 0800-1600 and from April-June in 2013 and 2014). During each 10 minute interval, one observer surveyed the plot. Although the same harvest units were sampled in both years, we selected a new set of plots to sample in each year (i.e., each plot received a maximum of three 10 minutes surveys). Spatial and temporal replication was necessary to estimate and incorporate detection into estimates of occupancy [35]. Observers employed a light touch methodology [36,37] in which all surface objects, including leaf litter and moss blankets on logs, were turned over to observe salamanders. All surface objects were then returned to as close to their original position as possible so as to not negatively impact the habitat quality within a plot for salamanders. Observers did not survey the same objects during each interval, but aimed to sample the entire plot over the course of the three intervals. Salamanders were not handled with this survey method. We followed a removal sampling protocol in which sampling stopped once both species were observed in a plot [31]. During sampling, observers quantified number of pieces of coarse woody debris (CWD; all logs > 25 cm DBH (small end) and > 1 m in length).

Statistical Models
The sampling design described above incorporates replication at two spatial scales (unit-level and plots within units) and two temporal scales (year and visit). Our objective in formulating a statistical model was to estimate patterns of occupancy at relevant scales of interest while accounting for the sampling design and adjusting for imperfect detection. To this end, we considered two different statistical models each representing a distinct occupancy process.
The first approach, which we refer to as the "hierarchical single-scale model", is a modified version of the basic MacKenzie occupancy model [35], that incorporates random effects to account for both the nested spatial sampling design and repeated measurements from the same unit across time. Specifically, we let u ijk denote the occupancy of plot k in year j from harvest unit i, which takes the value of 1 when the plot is occupied and 0 otherwise. We assume that plot-level occupancy follows a Bernoulli distribution u ijk * Bern(ψ ijk ), where ψ ijk is the probability that plot k of harvest unit i during year j is occupied. Within-plot occupancy is assumed to be closed across all visits within a year. We further assume that occupancy probability is defined in terms of a mean model, μ ijk , and a unit-specific random intercept, b 0i : logit(ψ ijk ) = μ ijk + b 0i . The mean model may be parameterized in terms of harvest unit level, year level and plot level covariates, while the random effect imparts a correlation among plots within a harvest unit and among measurements on the same harvest unit over time. We assume independence among plots in different harvest units, and conditional independence among plots within the same harvest unit, given the unit-level random effects. For the detection process, we let y ijkl denote the detection status of plot k from harvest unit i during year j and visit l, taking the value of 1 when the species is detected and 0 otherwise. We considered species detection to also follow a Bernoulli distribution y ijkl = Bern(u ijk Á p ijkl ), with detection probability p ijkl .
The second approach, which we refer to as the "multi-scale" model, is a modified version of methods for multi-scale inference introduced in [29,34]. In this approach, we allow for two nested but distinct occupancy processes, one acting at the harvest unit level, and the other acting at the plot level, conditional on harvest unit occupancy. We let z ij denote the occupancy of harvest unit i in year j, which takes the value of 1 when the harvest unit is occupied and 0 otherwise. We assume that z ij follows a Bernoulli distribution z ij * Bern(θ ij ) with harvest unit occupancy probability θ ij . Similar to plot-level occupancy for the single-scale model, we parameterize unit-level occupancy probability in terms of a mean model μ ij and a unit-specific random intercept, b 0i : logit(θ ij ) = μ ij + b 0i . The random intercept in this model accounts for repeated measurements on the same harvest unit across years. We represent the plot-level occupancy as a Bernoulli random variable u ijk * Bern(z ij Á ψ ijk ). Under this parameterization, plot-level occupancy probability is equal to ψ ijk when z ij = 1, and equal to 0 when z ij = 0. The detection process is again considered to follow a conditional Bernoulli distribution y ijkl = Bern(u ijk Á p ijkl ), with detection probability p ijkl , and where y ijkl = 1 if the species is detected and 0 otherwise. Covariates may be included in any of these models, but are restricted to harvest unit and year variables for θ ij , to plot, year or harvest unit variables for ψ ijk , and visit, plot, year, or harvest unit variables for p ijkl . Under this model, we assume independence among harvest units and conditional independence on measurements of the same harvest unit over time, given the unit-specific random effects. Plots within a harvest unit are also considered conditionally independent given the partially latent occupancy status z ij .
Both the hierarchical single-scale and multi-scale models may contain effects acting at different scales. The primary distinction between the two models is the scale at which an assumed occupancy process manifests: strictly at the plot level for the hierarchical single-scale model or separately at both the plot and harvest unit levels for the multi-scale model. For example, in the hierarchical single-scale model, we assume that habitat selection is taking place at the plot scale only, but that factors at a larger scale (the harvest unit) may influence this selection. For the multi-scale model, we assume that two different selection decisions are taking place: at the harvest unit first, and conditional on a harvest unit being selected, a second selection decision is made at the plot scale. Given this distinction, we cannot directly contrast the performance of these models. Instead, we present the results here for each model as two distinct options to be considered for analysis.

Simulation Study
We conducted a simulation study to evaluate the effectiveness of each occupancy model in the context of a Before-After Control-Impact (BACI) [38] experimental design with multiple levels of sampling. Our goal was to assess the impact of sample size on the consistency and efficiency of the treatment effect estimator of a BACI analysis, and to aid in the design of a planned manipulative experiment. We simulated data for both the hierarchical single-scale process and the multi-scale process discussed in the previous section. Models were then fit to data simulated under the assumed model (e.g., the multi-scale model was only fit to data simulated from the multi-scale process). The impact of model misspecification and checks for model misspecification were not considered in this study.
To simulate a BACI design, we assumed that N harvest units would be split evenly among two treatments ("control" and "impact"), and would be followed for two years: one year "before" and one year "after". Furthermore, n plots were assumed to be placed within each of the N harvest units, with each plot visited m times. In this setting, harvest units represent our primary sampling units and are the experimental units with respect to the treatment. Plots within harvest units are a secondary sampling unit and represent sub-samples with respect to the applied treatment. As with our empirical study, we assumed that different plots within a harvest unit were sampled across years and could therefore be considered conditionally independent sampling units. Unlike our empirical study, the simulation study did not stop visits after the first detection.
The mean model used to simulate occupancy data from the hierarchical single-scale model was of the form: where ψ ij is the expected plot-level occupancy of year j in harvest unit i, Year ij is an indicator variable for Year = "after", and Treatment i is an indicator variable for "impact" stands. The random intercept b 0i is assumed to follow a normal distribution with mean = 0 and variance = s 2 b ; and is included to account for correlation among plots within a harvest unit. Under this occupancy parameterization, exp(β 3 ) is interpreted as the relative effect of the impact treatment on plot-level occupancy-odds, after controlling for year and stand-level variation. Thus, the parameter β 3 can be considered an estimator for the impact treatment effect. For simplicity, no covariates, including treatment effects, were included in the detection probability model.
The mean model used to simulate data from the multi-scale model was of the form: where θ ij is the expected harvest unit-level occupancy of harvest unit i in year j, Year ij is an indicator variable for Year = "after", and Treatment i is an indicator variable for "impact" harvest units. We used constant values for both the conditional plot-level occupancy, ψ ijk , and detection probability, p ijkl , for plot k and visit l. As with the hierarchical single-scale model, the random intercept a 0,i is assumed to follow a normal distribution with mean = 0 and variance = s 2 b : Under this parameterization, exp(α 3 ) is interpreted as the relative effect of the impact treatment on harvest unit-level occupancy-odds, after controlling for year and harvest unit-level variation.
We draw attention to the different interpretations of the "treatment effect" in each of these models. Under the hierarchical single-scale model, the treatment effect is estimated for plot level occupancy, whereas under the multi-scale model, the effect is estimated for harvest unitlevel occupancy. The two occupancy models correspond to distinct occupancy processes with different interpretations. Consequently, we used different parameter values in our simulation study for the two different models. In each case, we chose parameters to fall within an expected range for our application to BAWR and ENES (Table 1).
We evaluated all parameter combinations in Table 1, giving a total of 90 unique conditions for each model. Each unique condition was simulated 500 times. All datasets were simulated in R [39] and we fit all models using JAGS [40] called from R using package 'R2jags' [41]. For both models, we used 3 chains of length 10000 each, with a burnin of 5000 and 1/10 thinning. We checked convergence for occupancy and detection model parameters using the Gelman-Rubin statistic [42]. The results summary included only those simulations where all mean model convergence statistics were less than 1.1. We included the R code used to conduct the simulation study in S1 and S2 Files.

Empirical Analysis
We chose to fit the hierarchical single-scale model to the empirical data collected across two years on 66 harvest units in the Cascade Range, Oregon, USA. These data represent pre-treatment conditions in harvest units that will be included in a long-term BACI design to investigate the impact of timber harvest on BAWR and ENES occupancy. Our goals in this analysis were to obtain preliminary estimates of salamander plot-level occupancy in control conditions; to understand the association between occupancy and coarse woody debris; and to estimate detection probabilities. We could have chosen to fit the multi-scale model to our dataset. However, we selected randomly harvest units to be included in the study from a larger pool of harvest units, all of which were known to be occupied. Consequently, no variation existed in occupancy status at the harvest unit level, and we could not model the upper-level occupancy process. For both BAWR and ENES, we modeled plot-level occupancy as a function of CWD, Year, and Block: where CWD ij is the coarse woody debris count for plot j in unit i, Year ij is an indicator for whether plot j was sampled in 2014, and Block i is an indicator for harvest units in the Clackamas tree farm. We sampled coarse woody debris on plots each year during salamander sampling. Detection probability was modeled as a quadratic function of Julian date and an indicator variable for year: where Julian Date ijk is the Julian date for visit k of plot j in harvest unit i, and Year ij is an indicator for whether plot j was sampled in 2014. We centered and scaled Julian date prior to analysis. We fit all models in a Bayesian framework using N(μ = 0, σ 2 = 3) priors for the plot-level occupancy and detection probability intercepts, and N(μ = 0, σ 2 = 4) priors for occupancy and detection covariates. A Gamma(2,1) prior was used for the random effects standard deviation. We fit the models using JAGS, called from R, with three chains of length 55,000, a burnin of length 5,000, and 1/10 thinning. We assessed convergence using the Gelman-Rubin statistic and visual inspection of the chains [42]. The JAGS model used for this analysis is included in S3 File. The detection data and covariates for both BAWR and ENES are included in S1 and S2 Tables.

Ethics Statement
We conducted this research in compliance with all Oregon and USA laws and regulations. The Oregon State University Institutional Animal Care and Use Committee (IACUC) approved all activities involving the sampling and handling of live vertebrate animals.

Results Simulations
We used the results of the simulation study to focus on three quantities of interest related to the treatment effect estimator: bias, precision, and coverage probability. We calculated bias as the difference between the true value of the parameter and the expected value of the estimator, and (inverse) precision as the mean posterior standard deviation of the treatment estimator.
Coverage probability was calculated as the proportion of times a 95% equal-tail credible interval contained the true value of the treatment effect parameter. Bayesian posterior credible intervals often show close to nominal coverage probabilities in both small and large sample problems [42,43], and we examine their performance in our results.
Although we fit our models in a Bayesian framework, we would like to understand these frequentist properties of the model estimators, as they provide useful pre-experimental guidance on the use of the models and design considerations. Any Bayesian estimate will, to some extent, depend on the priors used in the analysis. Nevertheless, we consider it instructive to examine these pre-experimental properties of the model estimators, even under a limited set of conditions.
The hierarchical single-scale model showed consistent results for the average posterior mean of the treatment effect estimator for all sample sizes when detection probability was 0.5, but some evidence of bias when detection probability was 0.15 (Fig 1). Bias was generally small (<10%) for all detection probabilities when the number of harvest units was at least 40. Estimator bias for the multi-scale model was strongly dependent on both the number of simulated harvest units as well as the number of plots per harvest unit, particularly at lower detection probabilities (Fig 2). Although bias was generally low when detection probability was 0.5, substantial bias existed with low detection probabilities, particularly in cases with less than 50 harvest units (Fig 2). However, these trends suggest that with sufficient sample size at both the primary and secondary sampling units, approximately unbiased estimates can be obtained for the treatment effect estimator under either model in a BACI design, at least for the prior distributions and range of parameter values considered in this study.
We observed improved posterior estimator precision with increasing numbers of harvest units, as expected (Figs 3 and 4). Number of plots per harvest unit, which represent subsamples, also played a role in estimator precision by affecting the amount of information available to estimate occupancy of the primary sampling unit. These trends also suggest improved precision with increasing detection probability, but very little impact due to the size of the treatment effect.
Coverage probabilities for the single-scale model treatment effect estimator ( Table 2, S3  Table) were close to nominal, but somewhat conservative with an average value of 0.97.
Coverage probabilities for the multi-scale model treatment effect estimator were highly conservative, with 100% of all intervals containing the true value. This result indicates that the posterior standard deviation of the treatment effect parameter is not a good surrogate for the standard deviation of the estimator sampling distribution. The hierarchical single-scale model (Fig 5, S3 Table) showed generally good calibration between sampling variability and posterior estimates of variability when detection probability was 0.5 for all sample sizes considered in this study. As detection probability decreased, posterior variability tended to be greater than the sampling variability of the treatment estimator. Results from the multi-scale model show, across all conditions included in this study, that the average posterior variability is substantially greater than the sampling variability of the treatment estimator (Fig 6). The trends for the multi-scale model (S4 Table) suggest that posterior credible intervals for the treatment effect estimate will tend to be highly conservative, exceeding nominal coverage rates and limiting the efficiency of estimating the sign and magnitude of putative treatment effects.
All of the above results are based on a single set of prior distributions, chosen beforehand to reflect subject matter knowledge regarding plethodontid salamander behavior. Although we think that this choice of priors is appropriate for our current application, these priors are potentially informative and have the potential to affect estimator properties such as bias and precision. At the suggestion of a reviewer, we repeated our simulation study with relaxed priors -N(μ = 0, σ 2 = 10) for covariate parameters-for a reduced set of the conditions in Table 1. Results for this simulation study are summarized in S5 and S6 Tables. The hierarchical singlescale model with relaxed priors showed a negative bias in the treatment estimator expected value across most conditions (Fig 7). The magnitude of bias was affected by both sample size and detection probability, showing relatively small bias with moderate detection probabilities and larger sample sizes, and substantial bias with low detection probabilities. Results for the multi-scale model with relaxed priors showed a mix of positive and negative bias depending on the simulation conditions (Fig 8). Under both the single-scale and multi-scale models, estimator precision and coverage probabilities using the relaxed priors were substantially the same as with our original choice of priors (S5 and S6 Tables).

Empirical Evaluations
Estimated plot-level mean occupancies from the hierarchical single-scale model were higher in the Clackamas tree farm than the Snow Peak tree farm for BAWR (0.66 vs. 0.43), but lower for ENES (0.20 vs. 0.63)( Table 3). The results also showed a strong association between coarse woody debris count and BAWR plot-level occupancy. We estimated approximately 35% higher odds of occupancy for each additional piece of CWD found in a plot. In contrast, no clear association existed between ENES occupancy and CWD.
Detection probabilities were estimated to be lower in 2014 than in 2013 for both BAWR (0.21 vs. 0.33) and ENES (0.09 vs. 0.14)( Table 3). Detection probabilities tended to increase later in the season for BAWR, but decrease later in the season for ENES.

Discussion
Diverse life history strategies and behavioral variation make Plethodontid salamanders challenging organisms to study [17,19,44]. Evaluation of sampling designs and estimators for Evaluating Multi-Level Models responses of interest are critical to provide strong inference about Plethodontid ecology and responses to conservation and management activities. We used simulations to evaluate performance of two candidate occupancy models in an analysis of a BACI experiment, and to set  Table 2. Estimated coverage probabilities for nominal 95% equal-tail credible intervals of the treatment effect estimator from hierarchical singlescale and multi-scale occupancy models. Results in this table are limited to combinations with detection probability of 0.3 and seven plots. Results for other conditions are consistent with these trends (S3 and S4 Tables). Evaluating Multi-Level Models appropriate sample sizes for the experiment. Our results indicated expected improvements in precision with increasing numbers of primary sampling units, but also highlighted the potential gains accrued when adding secondary sampling units. Importantly, our results identified conditions that could suffer from excessive estimator bias under particular choices of prior distributions. Both models showed evidence of estimator bias at low detection probabilities and low sample sizes, a problem that was also dependent on the choice of prior distribution. However, the results also suggested that sufficient sample size at both the primary and secondary sampling levels could ameliorate this issue. Another insight from this study relates to the calibration between posterior uncertainty and sampling variability. Under some conditions, good calibration existed between these two quantities for the hierarchical single-scale model, suggesting that posterior credible intervals could have a similar pre-experimental interpretation as confidence intervals. However, in all cases studied for the multi-scale model and in many cases for the hierarchical single-scale model, the posterior variability was substantially greater than the sampling variability. Under these conditions, interpreting credible intervals as confidence intervals appears unjustified and Bayesian interpretation of results is preferred.

Number of plots
Based on the simulation results, we expanded the size of our multi-year BACI experiment from 45 to 66 harvest units and maintained seven plots per harvest unit. This change should help to reduce estimator bias and improve precision.
The results of the simulation study are limited in several ways. Our study did not fully address the sensitivity of results to the choice of priors. Our limited comparison of two different prior distributions highlights sensitivity of estimator properties to these choices, even with priors that may be considered 'reasonable' based on subject matter considerations. We therefore stress the importance of evaluating model performance within applied contexts prior to fitting the model to data, as well as the need for posterior model checking.
We did not consider issues of lack-of-fit, variable selection, or the impact of model misspecification on estimator performance. These issues are important, but we considered them to be outside the scope of the study. In our particular application, some of these issues are less of a concern, as certain aspects of the model will be determined by the sampling design. For Evaluating Multi-Level Models example, all design-based factors will be included in the occupancy model(s), using a pre-determined functional form, regardless of statistical 'significance'.
We did not address the question of how one might select between the two models that we explored in the simulation study. As the two models represent two distinct processes for occupancy, scientific judgment can and should be considered in the choice of model. In cases such as our empirical example, the choice of models may be guided by species distribution considerations apparent in the raw data. When limited background information is available to support one model over the other, or when a goal of the analysis is to infer which model is a better descriptor of the data, a range of options are available for Bayesian model selection and comparison [45,46]. In all cases, it is incumbent on the analyst to perform appropriate diagnostic checks.
In the empirical analysis, we wanted to obtain preliminary estimates of salamander occupancy and detection under control conditions; examine the association between salamander occupancy and coarse woody debris; and determine suitability of models for evaluation of data  Table 1 Table 1    collected within a BACI experimental design. Our results suggest higher plot-level occupancy of Oregon slender salamander than ENES at Clackamas; the reverse is true for Snow Peak. Our results also helped to quantify the positive association between BAWR occupancy and CWD count, a trend which had been observed during stand pre-selection. Oregon slender salamanders are thought to rely on CWD for foraging and nesting substrates, as opposed to the more generalist affinities of ENES [26]. However, we note that both BAWR and ENES appear to be relatively abundant on our study sites. For organisms occurring as relatively abundant and well-distributed populations, estimating effects of habitat change may be difficult if the response variable is occupancy as sample units may remain occupied even if population size changes substantially. Finally, the detection probability estimates for both BAWR and ENES were similar to conditions examined in the simulation study, and suggest low bias for the BACI treatment-effect estimator under either model (and our choice of priors), given that we have included 66 harvest units in our study. However, due to relatively low detection probabilities, we can anticipate that the multi-scale model treatment-effect credible intervals for ENES will be substantially larger than the estimator sampling variability. Consequently, the intervals can be expected to exceed nominal coverage, and will be conservative if used to assess evidence of a putative treatment effect.

Supporting Information
Additional supporting information may be found in the online version of this article.   Table. 'Multi-scale' BACI occupancy model simulation summary with alternative priors. (CSV)