
The authors have declared that no competing interests exist.

Conceived and designed the experiments: JAG LNR KMP MSM. Performed the experiments: DAWM JDN JAG LNR KMP JEH MSM. Analyzed the data: DAWM JDN LNR JEH. Contributed reagents/materials/analysis tools: DAWM JEH. Wrote the paper: DAWM JDN JAG LNR KMP MSM.

Large-scale presence-absence monitoring programs have great promise for many conservation applications. Their value can be limited by potential incorrect inferences owing to observational errors, especially when data are collected by the public. To combat this, previous analytical methods have focused on addressing non-detection from public survey data. Misclassification errors have received less attention but are also likely to be a common component of public surveys, as well as many other data types. We derive estimators for dynamic occupancy parameters (extinction and colonization), focusing on the case where certainty can be assumed for a subset of detections. We demonstrate how to simultaneously account for non-detection (false negatives) and misclassification (false positives) when estimating occurrence parameters for gray wolves in northern Montana from 2007–2010. Our primary data source for the analysis was observations by deer and elk hunters, reported as part of the state’s annual hunter survey. These data were supplemented with known locations of radio-collared wolves. We found that occupancy was relatively stable during the years of the study and wolves were largely restricted to the highest quality habitats in the study area. Transitions in the occupancy status of sites were rare, as occupied sites almost always remained occupied and unoccupied sites remained unoccupied. Failing to account for false positives led to overestimation of both the area inhabited by wolves and the frequency of turnover. The ability to properly account for both false negatives and false positives is an important step toward improving inferences for conservation from large-scale public surveys. The approach we propose will improve our understanding of the status of wolf populations and is relevant to many other data types where false positives are a component of observations.

Presence-absence surveys have become increasingly prominent in large-scale ecological and conservation research

Ecologists have long recognized the need to account for imperfect detection when estimating parameters for wildlife populations and have developed an extensive set of methods to deal with non-detection

Similarly, most attention in studies of species occurrence has focused on non-detection

Two approaches have been suggested for estimating occupancy when false positives occur in single season occupancy analyses. The first is a simple modification of the standard occupancy estimator

We extend previous methods which allow for false positive errors in single season data to models used to estimate occupancy dynamics across multiple seasons. To illustrate the approach, we examine occupancy dynamics for gray wolves (

Miller et al.

There are two possible occurrence sampling designs in which both certain and uncertain detections could be recorded. In the first, either certain or uncertain detections can occur during a single sampling occasion. For example, during an avian point count survey, observers may consider visual observations of morphologically cryptic species uncertain because of the potential for misidentification, but auditory observations certain if the call is distinct. In the second design, only one observation type can occur during any given sampling occasion, so that detections during a sampling occasion are either all uncertain or all certain. As an example, consider a mammal species that is surveyed by both scat surveys and direct trapping. For many species, the probability of false positives in scat surveys will be non-trivial because of potential species misidentification, so we would want to treat detections by this method as uncertain. This sampling design could be used if a second survey type that could be considered certain, such as trapping and direct handling, occurred in at least a subset of the sites. We refer to the two sampling designs as the multiple detection state model and the multiple detection method model, respectively. In both cases a site must be occupied for certain detections to be recorded, but when an uncertain detection is recorded there is some possibility that the site is actually unoccupied (i.e., a false positive detection).

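We estimate occupancy dynamics among seasons following the general framework for multiseason occupancy models described by MacKenzie et al. The probability that a site is occupied in the first season is denoted by $\psi_0$ and the probability of being unoccupied by $1-\psi_0$. The probability that an unoccupied site in season $t$ becomes occupied in season $t+1$ (colonization) is $\gamma_t$, and the probability it will remain unoccupied is $(1-\gamma_t)$. Similarly, the probability that an occupied site in season $t$ becomes unoccupied in season $t+1$ (extinction) is $\varepsilon_t$, and the probability it will remain occupied is $(1-\varepsilon_t)$.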
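The transition parameters determine how occupancy changes from one season to the next. Because an occupied site stays occupied with probability $1-\varepsilon_t$ and an unoccupied site becomes occupied with probability $\gamma_t$, the expected proportion of occupied sites follows $\psi_{t+1} = \psi_t(1-\varepsilon_t) + (1-\psi_t)\gamma_t$. The minimal Python sketch below iterates this recursion; the function name and parameter values are hypothetical and are not part of the estimator itself.

```python
def occupancy_trajectory(psi0, gammas, epsilons):
    """Return the probability a site is occupied in each season.

    psi0: probability a site is occupied in the first season
    gammas[t]: colonization probability between season t and t + 1
    epsilons[t]: extinction probability between season t and t + 1
    """
    if len(gammas) != len(epsilons):
        raise ValueError("gammas and epsilons must have the same length")
    psi = [psi0]
    for gamma, eps in zip(gammas, epsilons):
        prev = psi[-1]
        # Occupied sites that persist plus unoccupied sites that are colonized.
        psi.append(prev * (1 - eps) + (1 - prev) * gamma)
    return psi


# Hypothetical parameter values, not estimates from the wolf analysis.
print(occupancy_trajectory(0.4, [0.05, 0.05, 0.05], [0.1, 0.1, 0.1]))
```

With constant $\gamma$ and $\varepsilon$, the trajectory approaches $\gamma/(\gamma+\varepsilon)$, which is 1/3 for these illustrative values.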

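The difference between our approach and the approach described by MacKenzie et al. lies in the detection process. First consider the multiple detection state design, where certain and uncertain detections can both occur during a single sampling occasion. At unoccupied sites only uncertain (false positive) detections can occur; the probability of a false positive detection is $p_{10}$ and the probability of no detection is $1-p_{10}$. For occupied sites, no detections, certain detections, and uncertain detections can occur. We use $1-p_{11}$ to denote the probability of not detecting the species. If $b$ denotes the probability that a detection at an occupied site is classified as certain, the probability of a certain detection is $p_{11}b$ and the probability of an uncertain detection is $p_{11}(1-b)$.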

Now consider the multiple detection method design, where individual sampling occasions include either all certain detections or all uncertain detections. When the uncertain method is used, the species will be detected at unoccupied sites with false positive detection probability $p_{10}$, while no detection will occur with probability $(1-p_{10})$. For occupied sites, the true positive detection probability is $p_{11}$ and the probability of a false negative error is $(1-p_{11})$. When the certain method is used, the probability that no detection occurs at an unoccupied site is 1. If the site is occupied, the probability of a true positive detection is $r_{11}$ and the probability of not detecting the species is $(1-r_{11})$.
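The observation probabilities for the two designs can be written as small tables with one row per true state, and each row must sum to 1. In the hedged sketch below, `b` is an assumed name for the probability that a detection at an occupied site is classified as certain, and the function names are illustrative only.

```python
def state_design_matrix(p10, p11, b):
    """Multiple detection state design.

    Rows are true states (0 = unoccupied, 1 = occupied); columns are
    observations (0 = none, 1 = uncertain, 2 = certain).
    """
    return [
        [1 - p10, p10, 0.0],                # unoccupied: only false positives
        [1 - p11, p11 * (1 - b), p11 * b],  # occupied: b splits detections
    ]


def method_design_matrices(p10, p11, r11):
    """Multiple detection method design: one 2x2 table per method.

    Columns are 0 = not detected, 1 = detected.
    """
    uncertain = [[1 - p10, p10], [1 - p11, p11]]
    certain = [[1.0, 0.0], [1 - r11, r11]]  # no detections at unoccupied sites
    return uncertain, certain


# Hypothetical parameter values for illustration.
print(state_design_matrix(0.05, 0.6, 0.3))
print(method_design_matrices(0.05, 0.6, 0.8))
```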

The parameters above can be used to calculate the probability of the encounter history $h_i$ observed at the $i$th site. The product of these probabilities across all sampled sites is the likelihood used to obtain maximum likelihood estimates of the parameters. The following examples show how to calculate the probabilities of different potential encounter histories. Consider a site sampled on 3 occasions during each of two consecutive seasons. When both types of observations can be recorded in the same sampling occasion, we denote non-detections as 0, uncertain detections as 1, and certain detections as 2. The interval between seasons is shown using a space so that the encounter history

Because no certain detections occurred, it is possible that the site could have either been occupied or unoccupied in each of the two time periods. Thus, the probability is the sum of the probabilities for each of the 4 possible state combinations: unoccupied in both seasons, unoccupied then occupied, occupied then unoccupied, or occupied in both seasons. In each case the probability calculation is the product of 1) the probability of being in the initial state, 2) the probability of observing a set of detections conditional on the starting state, 3) the probability of being in a state in the second season conditional on the initial state, and 4) the probability of observing a set of detections conditional on the state of the site in the second season. Calculating probabilities of encounter histories that include additional seasons involves iterating the 3rd and 4th steps for each additional season.
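The four-step calculation described above can be written as a brute-force sum over every possible sequence of true states. This is a minimal sketch for the multiple detection state design, assuming constant colonization and extinction across seasons and using an assumed parameter `b` for the probability that a detection at an occupied site is classified as certain; the function names and the example history are hypothetical.

```python
from itertools import product


def obs_prob(obs, state, p10, p11, b):
    """Probability of one observation (0 none, 1 uncertain, 2 certain) given state."""
    if state == 0:
        return (1 - p10, p10, 0.0)[obs]
    return (1 - p11, p11 * (1 - b), p11 * b)[obs]


def history_prob(history, psi0, gamma, eps, p10, p11, b):
    """Sum over every sequence of true states (0 = unoccupied, 1 = occupied).

    history: one list of observations per season, e.g. [[1, 0, 1], [0, 0, 0]].
    """
    total = 0.0
    for states in product((0, 1), repeat=len(history)):
        # Step 1: probability of the initial state.
        prob = psi0 if states[0] == 1 else 1 - psi0
        for t, (state, season) in enumerate(zip(states, history)):
            if t > 0:
                # Step 3: transition from the previous season's state.
                prev = states[t - 1]
                if prev == 0:
                    prob *= gamma if state == 1 else 1 - gamma
                else:
                    prob *= 1 - eps if state == 1 else eps
            # Steps 2 and 4: detections conditional on this season's state.
            for obs in season:
                prob *= obs_prob(obs, state, p10, p11, b)
        total += prob
    return total


# Hypothetical history and parameter values.
print(history_prob([[1, 0, 1], [0, 0, 0]], psi0=0.4, gamma=0.1, eps=0.2,
                   p10=0.05, p11=0.6, b=0.3))
```

Summing `history_prob` over every possible 3-occasion, 2-season history returns 1, which is a useful check. Enumeration grows as $2^T$ in the number of seasons, so it is only practical for illustration; the matrix formulation of the general estimator avoids this cost.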

Consider another site where the observed encounter history is

Because a certain detection occurred during the first season we have only two possibilities for the true state of the site over the two seasons. The first possibility is that the site was occupied in the first period but transitioned into being unoccupied in the second. In this case detections during the second season would be false positives. Alternatively, the site could have been occupied in both seasons.

Next consider a survey involving two detection methods employed on separate occasions, where the site is sampled twice using an uncertain method and once using a certain detection method in both seasons. The encounter history

Note that the detection portion of the probabilities only has 2 terms for unoccupied sites. This is because the probability of getting a non-detection for the second survey conditional on the site being unoccupied is 1.
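For the multiple detection method design, each occasion carries a method label, and a non-detection by the certain method at an unoccupied site contributes a factor of exactly 1, which is why such terms drop out of the unoccupied-site probabilities. The sketch below assumes constant transition probabilities and uses hypothetical function names and values; `r11` stands for the certain method's true positive detection probability.

```python
from itertools import product


def occasion_prob(method, detected, state, p10, p11, r11):
    """Probability of one occasion's outcome under the multiple detection method design."""
    if method == "uncertain":
        p = p11 if state == 1 else p10
    else:
        # Certain method: detections are impossible at unoccupied sites.
        p = r11 if state == 1 else 0.0
    return p if detected else 1 - p


def method_history_prob(history, psi0, gamma, eps, p10, p11, r11):
    """history: one list per season of (method, detected) pairs."""
    total = 0.0
    for states in product((0, 1), repeat=len(history)):
        prob = psi0 if states[0] == 1 else 1 - psi0
        for t, state in enumerate(states):
            if t > 0:
                prev = states[t - 1]
                if prev == 0:
                    prob *= gamma if state == 1 else 1 - gamma
                else:
                    prob *= 1 - eps if state == 1 else eps
            for method, detected in history[t]:
                prob *= occasion_prob(method, detected, state, p10, p11, r11)
        total += prob
    return total


# Hypothetical history: two uncertain occasions, then one certain occasion, per season.
season1 = [("uncertain", True), ("uncertain", False), ("certain", True)]
season2 = [("uncertain", True), ("uncertain", False), ("certain", False)]
print(method_history_prob([season1, season2], 0.4, 0.1, 0.2, 0.05, 0.6, 0.8))
```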

Next consider the encounter history

Because of the certain detection during the first season, only two possibilities exist for the true states during the two seasons, occupied in the first season and not in the second or occupied in both.

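Further variation can be accounted for by allowing any of the parameters to vary among seasons, among sampling occasions, or among sites. This is easily done by specifying model parameters as linear functions of covariates on the logit scale (e.g., $\operatorname{logit}(\varepsilon_{it}) = \beta_0 + \beta_1 x_{it}$, where $x_{it}$ is a covariate value for site $i$ in season $t$).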
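A logit link keeps each modeled probability between 0 and 1 regardless of the covariate value. The sketch below assumes a single covariate and hypothetical coefficients $\beta_0$ and $\beta_1$; in an actual analysis the coefficients would be estimated by maximum likelihood along with the other parameters.

```python
import math


def expit(x):
    """Inverse logit: maps a linear predictor onto the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def extinction_prob(beta0, beta1, covariate):
    """Hypothetical model: logit(eps_it) = beta0 + beta1 * x_it."""
    return expit(beta0 + beta1 * covariate)


# Hypothetical coefficients and covariate values (e.g., a habitat score).
for x in (0.0, 1.0, 2.0):
    print(x, round(extinction_prob(-1.0, -0.8, x), 3))
```

With a negative $\beta_1$, extinction probability declines as the covariate increases, but it never leaves the (0, 1) interval.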

We can formulate the estimator for a general sampling design allowing for additional occupancy and observation states and for varying degrees of certainty. Sampling designs in which observations can be divided into certain and uncertain detections are a special case of the more general estimator described here. Multistate occupancy models that allow for >2 occupancy states are useful for many sampling situations

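We consider a standard multiseason occupancy survey where the $i$th site is visited $K_{it}$ times in the $t$th season. The true occupancy state of the $i$th site in the $t$th season is denoted $z_{it}$. The observation at the $i$th site on the $r$th visit in the $t$th season, $y_{irt}$, takes observation state $l$ with probability $p_{lk} = \Pr(y_{irt} = l \mid z_{it} = k)$. For designs with a second survey method, the observation from that method at the $i$th site on the $s$th visit in the $t$th season, $w_{ist}$, takes observation state $m$ with probability $r_{mk} = \Pr(w_{ist} = m \mid z_{it} = k)$.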

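The likelihood of the full set of parameters θ given the full set of encounter histories $h_1, \ldots, h_N$ for $N$ sampled sites and $T$ seasons is

$$L(\theta \mid h_1,\ldots,h_N) = \prod_{i=1}^{N} \boldsymbol{\phi}_0 \left[\prod_{t=1}^{T-1} D(\mathbf{p}_{h_i,t})\,\boldsymbol{\Phi}_t\right] \mathbf{p}_{h_i,T}.$$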

The initial state distribution $\boldsymbol{\phi}_0$ is a row vector whose $k$th element is the proportion of sites in the $k$th occupancy state at the start of the study. Transition probabilities among occupancy states between years are given by the matrix $\boldsymbol{\Phi}_t$, where the element in the $k$th row and $j$th column gives the probability that a site in occupancy state $k$ in season $t$ will be in occupancy state $j$ in season $t+1$.

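The final component is $\mathbf{p}_{h,t}$, a column vector whose elements give the probability of the observations in encounter history $h$ during season $t$ conditional on each occupancy state; $D(\mathbf{p}_{h,t})$ denotes the diagonal matrix with $\mathbf{p}_{h,t}$ on its diagonal.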

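The $k$th element of $\mathbf{p}_{h,t}$ is the product of the observation probabilities across visits, conditional on the site being in state $k$. When a single survey method is used, the $k$th element is given by

$$\mathbf{p}_{h,t}[k] = \prod_{r} p_{y_{irt}k},$$

and when a second survey method is also used, the $k$th element is given by

$$\mathbf{p}_{h,t}[k] = \prod_{r} p_{y_{irt}k} \prod_{s} r_{w_{ist}k},$$

where each product runs over the visits made with the corresponding method.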
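The matrix product $\boldsymbol{\phi}_0 \left[\prod_{t=1}^{T-1} D(\mathbf{p}_{h,t})\boldsymbol{\Phi}_t\right]\mathbf{p}_{h,T}$ can be evaluated one season at a time, in the manner of the forward algorithm for hidden Markov models. This pure-Python sketch assumes that standard multiseason matrix form, builds each $\mathbf{p}_{h,t}$ as a product of per-visit observation probabilities, and uses hypothetical function names and values. Its observation table is indexed `[state][observation]`, the transpose of the $p_{lk}$ subscript order.

```python
def season_vector(observations, obs_matrix):
    """Element k: product over visits of Pr(observation | state k)."""
    vec = []
    for k in range(len(obs_matrix)):
        prob = 1.0
        for y in observations:
            prob *= obs_matrix[k][y]
        vec.append(prob)
    return vec


def matrix_history_prob(phi0, transitions, season_probs):
    """Evaluate phi0 [prod_t D(p_t) Phi_t] p_T one season at a time.

    phi0: initial state distribution (one entry per state)
    transitions[t]: matrix Phi_t, rows = state in season t, columns = state in t + 1
    season_probs[t]: vector p_{h,t} for season t
    """
    if len(transitions) != len(season_probs) - 1:
        raise ValueError("need one transition matrix between each pair of seasons")
    # Row vector phi0 D(p_{h,1}).
    alpha = [a * p for a, p in zip(phi0, season_probs[0])]
    for phi_t, p_next in zip(transitions, season_probs[1:]):
        # Multiply by Phi_t, then weight by the next season's observation probabilities.
        alpha = [sum(alpha[k] * phi_t[k][j] for k in range(len(alpha))) * p_next[j]
                 for j in range(len(p_next))]
    return sum(alpha)


# Hypothetical two-state example: states 0 = unoccupied, 1 = occupied;
# observations 0 = none, 1 = uncertain, 2 = certain.
phi0 = [0.6, 0.4]
Phi = [[0.9, 0.1], [0.2, 0.8]]
P = [[0.95, 0.05, 0.0], [0.4, 0.42, 0.18]]
vecs = [season_vector([1, 0, 1], P), season_vector([0, 0, 0], P)]
print(matrix_history_prob(phi0, [Phi], vecs))
```

Because each season multiplies a vector by a matrix, the cost grows linearly with the number of seasons rather than exponentially as in full enumeration of state sequences.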

The same likelihood can be used to bring the detection structure of other single-season occupancy estimators that account for false positives into dynamic analyses. In addition, the approach is flexible enough that other multistate occupancy models, incorporating multiple species, abundance classes, reproductive state, or habitat, can be modified to allow for false positive detections.

We use the estimator to examine multiseason occupancy patterns of gray wolves in northern Montana from 2007–2010. The area includes Montana’s portion of the federally designated Northwest Montana Recovery Area

We make a few key assumptions in using our approach. First, certain detections occur only for established packs, so our estimates of occupancy are for the probability a pack occurs rather than wolves in general. We suspect that in some cases hunters report transient wolves, which would be the functional equivalent of false positive observations in our model. This definition of occupancy is consistent with our desired metric for monitoring, making it a feature of the approach in our case. We also assume that the probability a pack is detected using the certain survey method (i.e., the probability that >1 wolf from a pack is captured and radio-collared) is not correlated with the probability that the pack is detected by hunters. We believe our sample of known packs is representative and that this was not an issue, but this condition should be considered when using our estimator. Finally, locations for our certain method of observation are typically gathered before and immediately after the hunting season rather than during it. We therefore assume that the resident wolf packs monitored with our certain detection method use the same territories during the hunting season.

Our goal was to estimate the proportion of 600-km² grid cells (i.e., mean territory size of wolf packs in Montana;

We recognized there was wide variation in the density of wolves related to habitat quality across this region and classified cells based on a composite measure of habitat quality. Based on prior research we identified 4 measures we believed to be good predictors of wolf distribution

Cells were divided based on perceived habitat quality into low, medium, and high quality categories (A). Most certain observations of known packs, based on collaring and relocation by radio telemetry, were concentrated in the western end of the study area where the higher quality habitat was located (B). Hunters reported wolves most frequently in high-quality areas, but they also reported wolves at a low frequency in the eastern portion of the study area, which we suspected was due to misidentification (C).

Unaccounted-for heterogeneity in observation effort can bias estimates of occupancy making it important to account for variation in effort ^{2} in each of the 162 hunting units within Montana using a stratified network estimator

All models were fit using PRESENCE v 3.1. We first compared alternative models for the detection parameters ($p_{11}$, $p_{10}$, and $r_{11}$), using the most general parameterization for the other parameters (model 5 in the next paragraph). We considered four alternatives that differed according to whether detection for the hunter survey ($p_{11}$ and $p_{10}$) varied with hunter effort and whether detection for the known-pack survey ($r_{11}$) varied by habitat quality. The detection models were: 1) neither effect, 2) only the effort effect, 3) only the habitat effect, and 4) both effects. In all cases we specified that all detection parameters varied among years. Hunter effort was used to account for the wide variation among grid cells in the opportunity for hunters to detect wolves, which was likely to affect detection probabilities. We included an effect of habitat quality to account for the possibility that a perceived lack of wolves in low-quality cells may have led to lower detection rates of established packs.

Using the parameterization for the best detection model (lowest AIC), we considered 5 alternatives for the occupancy parameters related to whether the transition probabilities ($\varepsilon$ and $\gamma$) varied annually and whether all the occupancy parameters ($\psi_0$, $\varepsilon$, and $\gamma$) varied by habitat quality. The alternative occupancy models included: 1) neither effect, 2) only the year effect, 3) only the habitat effect, 4) an additive combination of habitat and year, and 5) an interaction between habitat and year.

We also compared the estimates from the model with the lowest AIC among our alternatives to equivalent parameterizations where false positives were assumed not to occur and where both false negatives and false positives were assumed not to occur. First, we estimated parameters when false negatives were allowed but false positives were not. We did this by fixing the false positive probability to equal 0 in the models described above. This is equivalent to using the standard dynamic occupancy estimator proposed by MacKenzie et al.

In general, medium and high habitat quality cells were in the western end of the study area and are associated with mountainous and forested terrain (

For cells with known packs, this distribution increased across the possible values, peaking at the maximum of 5 times. For cells without a known pack, the proportion of 0 observations was much higher than for cells with a known pack, as expected if many of these cells were unoccupied. If no false positives occurred, we would expect the relative frequencies of 1 to 5 hunter observations to be the same for cells with and without known packs. Instead we see a greater relative frequency of 1 or 2 observations in cells without known packs, which is consistent with a low probability of false positive errors occurring in unoccupied cells.
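The reasoning above can be illustrated by comparing the expected distribution of detection counts, conditional on at least one detection, for occupied and unoccupied cells. This sketch assumes, hypothetically, that each cell has 5 independent opportunities for a hunter report, so counts are binomial; the detection probabilities are illustrative and are not estimates from this study.

```python
from math import comb


def conditional_count_dist(p, n_max):
    """Pr(n detections | at least one) for n = 1..n_max under a binomial model."""
    pmf = [comb(n_max, n) * p**n * (1 - p) ** (n_max - n) for n in range(n_max + 1)]
    at_least_one = 1 - pmf[0]
    return [prob / at_least_one for prob in pmf[1:]]


# Hypothetical per-opportunity detection probabilities.
occupied = conditional_count_dist(0.85, 5)    # true positive detections
unoccupied = conditional_count_dist(0.05, 5)  # false positive detections only
for n, (occ, unocc) in enumerate(zip(occupied, unoccupied), start=1):
    print(n, round(occ, 3), round(unocc, 3))
```

With these values nearly all nonzero counts from unoccupied cells are 1, while counts from occupied cells peak at 5, mirroring the qualitative pattern in the hunter data.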

The parameterization of our model with the lowest AIC value was the one where detection for the hunter survey increased with hunter effort and detection of packs was somewhat lower in low quality habitat (detection model 4). The best model for initial occupancy and transitions included an additive function of year and habitat quality (occupancy model 4). The next best model, which did not include an annual effect on transition probabilities, had ΔAIC = 6.3. Because of the strong support for the best model, and for ease of comparison, we focused on results for the best-fitting model.
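A ΔAIC of 6.3 corresponds to an evidence ratio of $\exp(6.3/2) \approx 23$, meaning the best model is roughly 23 times more likely than the next best given the data and the candidate set. The sketch below computes ΔAIC values and Akaike weights; only the 6.3 difference comes from the text, and the weights are computed over these two models alone, so they would change if the full candidate set were included.

```python
import math


def delta_aic_weights(aics):
    """Return (delta AIC, Akaike weight) pairs for a candidate set of models."""
    best = min(aics)
    deltas = [a - best for a in aics]
    rel_lik = [math.exp(-d / 2) for d in deltas]
    total = sum(rel_lik)
    return [(d, r / total) for d, r in zip(deltas, rel_lik)]


# Only the 6.3 difference comes from the text; the absolute AIC values are arbitrary.
for delta, weight in delta_aic_weights([0.0, 6.3]):
    print(round(delta, 1), round(weight, 3))

# Evidence ratio of the best model relative to the next best.
print(round(math.exp(6.3 / 2), 1))
```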

Both the true positive ($p_{11}$) and false positive ($p_{10}$) detection probabilities for the hunter survey data increased as hunter effort increased. Detection probabilities for the known-pack survey ($r_{11}$) were lower in low-quality habitats than in medium- and high-quality habitats, and all detection probabilities increased across years.

Plotted lines are the estimated relationships for 2010, and dashed lines are 95% confidence intervals. Our best model included an additive effect of year so that in other years detection had the same basic relationship to hunter effort on a logit scale but was lower overall.

As expected, occupancy dynamics varied among low, medium, and high quality cells (

Cells were divided into low, medium, and high habitat quality. Parameters were estimated using a naïve approach where false positives and false negatives were assumed not to occur, using a standard multi-season occupancy estimator where false positives were assumed not to occur, and using our multi-season occupancy estimator that allows for false positives.

Estimates based on our approach differed significantly from estimates using a naïve estimator and occupancy estimates when false positives were assumed not to occur (

We demonstrate that detection errors in general, and false positives in particular, can have large effects on estimates of range dynamics and other presence-absence processes. False negatives led to underestimation of occupancy, while false positives led to overestimation of occupancy, extinction, and colonization. Whereas the importance of accounting for false negative errors is frequently recognized, much less attention has been given to the potential effects of false positive errors. However, even small probabilities of false positive errors can lead to significant overestimation of occupancy once false negatives are accounted for.
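The direction of these biases can be seen in the expected proportion of sites with at least one detection after $K$ surveys, $\psi[1-(1-p_{11})^K] + (1-\psi)[1-(1-p_{10})^K]$. The first term falls below $\psi$ whenever $p_{11} < 1$ (false negatives), while the second term grows with $K$ whenever $p_{10} > 0$ (false positives). This sketch of a naive estimator uses hypothetical values and is not the analysis reported here.

```python
def naive_occupancy(psi, p11, p10, k):
    """Expected proportion of sites with at least one detection in k surveys.

    psi: true occupancy probability
    p11: per-survey detection probability at occupied sites
    p10: per-survey false positive probability at unoccupied sites
    """
    detected_if_occupied = 1 - (1 - p11) ** k
    detected_if_unoccupied = 1 - (1 - p10) ** k
    return psi * detected_if_occupied + (1 - psi) * detected_if_unoccupied


# Hypothetical values: true occupancy is 0.3 in every case.
for k in (2, 5, 10):
    print(k, round(naive_occupancy(0.3, 0.5, 0.05, k), 3))
```

With $\psi = 0.3$, $p_{11} = 0.5$, and $p_{10} = 0.05$, the naive estimate rises from about 0.29 with 2 surveys to about 0.58 with 10 surveys, even though true occupancy is unchanged; with $p_{10} = 0$ it can only fall below 0.3.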

The effects of false positives are less predictable when estimating transition probabilities among time periods

Our results in general suggest that hunter observations are a viable survey method to monitor range dynamics of wolves, particularly when detection errors are accounted for. We found that range size was generally stable across the years of our study, with most occurrences restricted to higher quality habitat in the western end of our study area, consistent with existing models of wolf habitat suitability

In recent years initiatives to collect and analyze large and extensive data sets collected by the public have increased

False positive errors are not limited to data collected by the public, and growing evidence suggests they may be common to many ecological data sets. Multiple studies have demonstrated that false positives frequently occur in auditory call surveys for birds and amphibians

The emergence of large collaborative monitoring efforts is an exciting development that will provide many unique opportunities to inform conservation and improve ecological understanding. The success of these efforts will depend on whether analysis methods properly account for the observational uncertainty that is inherent in these data sets. The methods we present here are an important step in that direction. We believe that explicitly accounting for observational uncertainty can address the limitations of many data sets and will open the door to many exciting applications.

We thank K. Wash and J. Dykstra who helped design and implement the hunter surveys and more than 50 Montana Fish, Wildlife and Parks telephone survey interviewers per year who conducted the annual surveys of deer and elk hunters. We thank C. Sime, K. Laudon, L. Bradley, N. Lance, M. Ross, A. Nelson, and dozens of technicians and volunteers for conducting the wolf population monitoring that constituted our “certain” surveys.