Browse Subject Areas

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here.

  • Loading metrics

Statistical Efficiency in Distance Sampling

Statistical Efficiency in Distance Sampling

  • Robert Graham Clark


Distance sampling is a technique for estimating the abundance of animals or other objects in a region, allowing for imperfect detection. This paper evaluates the statistical efficiency of the method when its assumptions are met, both theoretically and by simulation. The theoretical component of the paper is a derivation of the asymptotic variance penalty for the distance sampling estimator arising from uncertainty about the unknown detection parameters. This asymptotic penalty factor is tabulated for several detection functions. It is typically at least 2 but can be much higher, particularly for steeply declining detection rates. The asymptotic result relies on a model which makes the strong assumption that objects are uniformly distributed across the region. The simulation study relaxes this assumption by incorporating over-dispersion when generating object locations. Distance sampling and strip transect estimators are calculated for simulated data, for a variety of overdispersion factors, detection functions, sample sizes and strip widths. The simulation results confirm the theoretical asymptotic penalty in the non-overdispersed case. For a more realistic overdispersion factor of 2, distance sampling estimation outperforms strip transect estimation when a half-normal distance function is correctly assumed, confirming previous literature. When the hazard rate model is correctly assumed, strip transect estimators have lower mean squared error than the usual distance sampling estimator when the strip width is close enough to its optimal value (± 75% when there are 100 detections; ± 50% when there are 200 detections). Whether the ecologist can set the strip width sufficiently accurately will depend on the circumstances of each particular study.

1 Introduction

The number of animals in a region is often of ecological importance. This paper considers the distance sampling approach to estimating abundance, in its usual conjunction with line transect sampling. Transect lines are laid down across a region, often parallel and equally spaced but not necessarily so. Observers move along the transects, and record observations of animals (or plants, or groups of animals, or other objects) and their perpendicular distances from the transect. Empirically, more objects are detected near to transect lines than far from them in many studies, suggesting that detectability is a decreasing function of distance. The distance sampling methodology exploits this phenomenon, by modelling the detection rate as a function of distance. The number of detected objects can then be scaled to estimate the abundance N allowing for imperfect detection. Provided the assumptions of the method are met, the maximum range can be made fairly large, thereby increasing the sample size, while avoiding or reducing bias due to declining detection rate. Distance sampling is widely used in ecology: a Web of Science search found 276 articles on distance sampling in ecology journals in 2014 alone. The wide range of applications includes wild horses in the Australian Alps [1], large herbivores in South Africa’s Kruger National Park [2] and odonata (dragonflies) in a rainforest locality in Papua New Guinea [3]. For a detailed description of the approach, see [4].

The major assumptions of the method are

  1. Detection is perfect at zero distance.
  2. The detection function is of known form, with some unknown parameters requiring estimation. Alternatively, model-averaging may be used provided the detection function is assumed to be one of a known set of alternatives.
  3. Animals’ distances to the nearest transect line are (at least approximately) uniformly distributed.

It is also assumed that there is no measurement error (for example, false positive detections, or mis-measurement of distances), that there is no movement of objects in response to the observer which could lead to multiple chances of detection, that detection events are independent, that there are sufficient transects for reliable estimation, and that transect lines are a good representation of the study area. This paper will consider the conventional distance sampling (CDS) scenario where there is only one observer. The same methodology can be used with multiple observers by pooling their detections. Mark recapture and mark recapture distance sampling (MRDS) are methods which more fully use data from multiple observers by matching their detections; MRDS, in particular, can be used to relax assumption (i). More recent approaches combine a spatial model with the detection model (see for example [57] and [8]). Spatial models allow abundances to be estimated for subregions, and can exploit spatial trends in estimation, however inference may be sensitive to the assumed spatial model which must therefore be carefully constructed. This paper focuses on CDS, as most applications of line transect sampling remain single observer. The use of spatial models in distance sampling is an important advance, but some researchers may decide that the extra effort and complication required to develop an adequate spatial model are not warranted for some studies, particularly when the number of detections is relatively small.

Robustness to the 3 assumptions above is explored in the literature. For example, MRDS can be used to achieve some level of robustness to (i). Assumption (ii) is dealt with by the use of flexible families of detection functions with two or more parameters, and the use of model averaging. [9] argue that (iii) is approximately satisfied provided transect lines are placed randomly or systematically from a random starting point. However, [10] question the uniformity assumption and find that CDS estimators are biased in a design-based framework, that is, under repeated random placement of transect lines. The matter remains in contention [1113]. [11] suggest an alternative approach where the detection function is estimated from a separate calibration study, and [14] propose estimating detection probabilities using multiple observer data and possibly but not necessarily distance data.

A natural alternative to distance sampling is the simple scaling up of observations in a strip about the transect. When strips are too wide, this strip transect estimator is severely negatively biased due to non-detections, particularly of the more distant animals in the strip. When strips are sufficiently narrow, this bias becomes negligible, but the variance of the estimated abundance becomes large. Distance sampling aims to achieve lower variance by including observations at greater distances, while reducing bias by adjusting for non-detection. But there is a hidden cost—the effect of unknown detection parameters on the precision of the estimated abundance—which reverses at least some of the benefit due to wider strips. This paper illuminates this cost both asymptotically and in small samples. This existence of a penalty due to unknown detection parameters is known, but no asymptotic expression has been derived in the literature, and the penalty has not been quantified except in a simulation of the half-normal and negative exponential detection functions [15]. A number of authors have compared CDS and strip transect estimators in particular applications (e.g. [1, 16]) but typically with only one or two options for strip width. [17] also suggested strip transects as an alternative to CDS but did not make a numerical comparison.

An example of the variability of CDS estimators may be found in [1], which compares strip transect estimates with two different strip widths to CDS estimates with various detection functions, and to mark-recapture and MRDS estimates of abundance. Detections of groups of wild horses are attempted up to 200m from a helicopter traversing 91 parallel transect lines in south-eastern Australia, resulting in 52 group detections (pooled from two observers). Table 1 here reproduces results from page 1145 of [1]. The strip transect estimates of abundance are the number of observed groups within 50m or 200m multiplied by the total area divided by the area lying within either 50m or 200m of a transect line, with no allowance for undercount. The values of the Akaike Information Criterion (AIC) are also shown for each detection model. Model averaged estimates are calculated using weights calculated from the AICs as described in [18]. Mark recapture and MRDS results are not reproduced here as they are outside the scope of the current paper. The CDS estimator shown in Table 1 corresponds to formula (6) in the next section of the current paper.

Table 1. Estimates of Density (abundance/area) of Horse Groups obtained from [1].

CV is the estimated coefficient of variation of the estimate, given by the square root of the estimated variance divided by the estimate.

Table 1 shows that the CDS abundance estimators have high coefficients of variation (CV), with the two-parameter hazard rate model leading to a much higher CV (about 57%) than the other detection models which are one-parameter (CVs between 19% and 25%). The model-averaged estimator has a CV of 30%. The strip transect estimator using detections up to 200m has much lower CV, but is much lower than the CDS estimators, suggesting that it is negatively biased due to undetected groups. The strip transect estimator based on detections up to 50m is more plausible. It is close to the model-averaged CDS estimator, suggesting that its bias is small, presumably because the detection rate would decline relatively little over this shorter range. Surprisingly, the 0–50m strip transect estimator gives similar CVs to the 0–200m model-averaged CDS method, even though it only uses one quarter of the distance range and 44% of the detected groups. This motivated the research in the current paper on the efficiency of CDS estimators.

Section 2 proves for the first time that the usual CDS estimator of N is also a maximum likelihood estimator (MLE) of N under a particular model provided the likelihood is approximated using Stirling’s Rule. Its asymptotic variance under the model is derived using this result. The asymptotic variance is identical to that of [19], but the use of the Stirling approximation allows a simpler derivation. The limiting variance of is expressed as the variance when the detection function is known multiplied by a penalty for unknown θ. This asymptotic penalty is tabulated for various detection models. It can be substantial, and for many situations arising in practice is between 2 and 6. Section 3 summarizes a simulation study to evaluate the small sample performance of CDS estimators for various detection functions and strip transect estimators with differing strip widths. Unlike the theoretical result in Section 2, the simulation allows for overdispersion in the counts of animals falling in any given range. Section 4 contains conclusions about the magnitude of the penalty due to unknown detection parameters in CDS, and the relative performance of CDS and strip transect estimators.

2 Theoretical results on maximum likelihood estimation of N

2.1 Notation and background

The aim is to estimate the number of objects, which may be animals, groups of animals or other objects, in a region. Let N be the number of objects in a defined study area and n be the number of detections made by an observer moving along predefined line transects. Observation may either be one-sided (only objects to the right, or only objects to the left, are observed) or two-sided. Only objects with perpendicular distance up to a pre-chosen limit w from a transect line have a chance of being observed (the covered region); let Nc be the number of such objects. Two-sided observation is the more common case, but one-sided observation is sometimes necessary, for example if the observer can only see out one side of a vehicle. It is assumed that the probability of observing an object at perpendicular distance di from a transect line is g(di, θ) when 0 ≤ diw, where θ is a vector of p parameters specifying the function within a family. It is assumed that g(0, θ) = 1 and that g is a non-increasing function of distance. The Distance software [20], which implements both CDS and MRDS, allows four possible functional forms for g(), including the half-normal model, , and the hazard rate function, . Both of these functions satisfy the shoulder condition that g′(0) = 0, with the hazard rate model giving greater flexibility in modelling the shoulder width (i.e. the range near 0 over which g is relatively flat). The use of “robust models”, which have enough flexibility to model a range of typical shapes, is recommended on pages 46–49 of [9], with the hazard rate model given as a particularly useful example.

It is also possible to include other covariates affecting the detectability of objects in the distance function, such as characteristics of the animal or plant (see [21] and [22]).

Let di, i = 1, …, N, be the perpendicular distances from the objects to the nearest transect line, and let δi = 1 for observed objects and δi = 0 for the rest. Under model assumption (iii) stated in the Introduction, di are independent and identically distributed uniform U(0, M) for i = 1, …, N, where M is the maximum possible distance from a transect line. The distribution of di given δi = 1 is easily derived as (1) Let be the unconditional probability of detection. (This is the probability that an animal in the covered region is detected, unconditional on its distance, denoted by Pa in equation (2) of [23]. The notation is used here to emphasise that it is the mean value of g(di; θ) over diw.) The conditional likelihood of ds = (d1, …, dn)T given n is (2) and the corresponding conditional log-likelihood given n is (3) The parameters θ can be obtained by setting the derivative of ld with respect to θ to 0. Prior to this, it is convenient to define (4) and . Notice that h(u; θ) is a p-vector where p is the number of parameters in θ. The partial derivative of with respect to θ is , subject to regularity conditions allowing the derivative operator to be taken within the integral. Setting the derivative of ld to 0 gives the following estimating equation for θ: (5)

The most commonly used estimator of N in CDS is: (6) where P is the proportion of the area falling within perpendicular distance of w of a transect line, and is the solution to Eq (5) (CML stands for conditional maximum likelihood, as the likelihood Ld in Eq (2) is conditional on n). In CDS, the estimation of θ is model-based and maximises the likelihood conditional on n, while the estimation of N in Eq (6) is design-based and motivated by the fact that . See equation 1.4 of [9]. See also [23] for a recent discussion of CDS and extensions. The value of P is assumed to be known, and is approximated in practice by the total length of all transects multiplied by w (and multiplied by 2 for two-sided observation) divided by the region’s area.

Another alternative that has been proposed is maximum likelihood estimation of N, which is the minimum variance unbiased estimator for large samples subject to regularity conditions. Section 7.2 of [24] derives the likelihood for N and θ. The conditional density of ds given n is Ld in Eq (2). Under the assumed model, δi are independent Bernoulli random variables with expected value . Hence (7) The likelihood is the product of the probability function of n and Ld: (8) (9) (10) (See equation 2.33 of [4]). L can be maximised with respect to N and θ to obtain a maximum likelihood estimator of N. In this approach, both N and θ are treated as unknown parameters. On pages 16–17, [4] recommend against this approach because in practice n is likely to be overdispersed and so to have higher variance than implied by Eq (7). For example, this would occur if there were positive correlations between the values of di. It is worth noting that any such overdispersion is likely to also invalidate Eq (2) and so and to some extent, as Eq (2) assumes that di are independent conditional on n.

When θ is known, the MLE of N is the integer part of (see pages 17–19 and 138 of [24]). The same reference also shows that if the factorial terms in Eq (10) are approximated using Stirling’s rule and N is treated as a continuous parameter, the MLE is then (11) It is straightforward to derive the variance of using the fact that : (12) When θ is unknown, [19] note that the MLE of N is the integer part of where is the MLE of these parameters. The next subsection of this paper extends this result by showing that and maximise a Stirling approximation to the full likelihood L. This enables a theoretical result on the large sample variance of , albeit under the strong assumption Eq (7). The simulation study in Section 3 relaxes this assumption by including overdispersion.

2.2 Derivations of the MLE and its variance based on Stirling’s approximation

The maximum likelihood estimator of N under Stirling’s approximation for factorials is derived in this subsection, where the likelihood L is given by Eq (10). (Note that in section 3.3.1, [9] consider maximization of Ld, not L, to estimate θ, where Ld, defined in Eq (2), is the conditional likelihood given n). Stirling’s rule log(x!) ≈ xlog(x) − x implies that (13) (14) (15) There is a positive probability that n is equal to 0 or N, in which case log(n) or log(Nn) are not defined. The limit of the right hand side of Eq (15) can easily be shown to equal 0 in these cases, so the following refinement of Eq (15) is used: (16)

L will be maximised with respect to N and θ treating N as a continuous parameter. Stirling’s approximation for log(x!) is very accurate even for small x as long as x is at least 2 or 3, and both Nn and n would be well above this in practice. Theorem 1 states the MLEs and the approximate Fisher information.

Theorem 1 The model defined by Eqs (1) and (7) is assumed, and it is assumed that the log of the likelihood Eq (10) can be approximated using Eq (16), leading to (17) Let Ω be an open set defining the set of feasible values of θ. If there is a unique satisfying Eq (5) then it is the maximum likelihood estimator, and the MLE of N on (0, ∞) is in Eq (6).

Let D be a random variable with density for 0 ≤ dw, which is also the distribution of di conditional on detection. The Fisher Information of (N,θT)T is approximately equal to (18) for large N, where (19) (20) (21) (22)

Surprisingly, and maximise the Stirling approximation to the full likelihood L, even though was defined as the maximizer of the conditional likelihood Ld given n (not L), and was motivated by a design-based argument.

Let V be the inverse of the Fisher Information matrix in Eq (18). Subject to regularity conditions, maximum likelihood estimators are asymptotically normal with expectation equal to the true parameter values and variance-covariance matrix equal to the inverse of the Fisher Information matrix. Unfortunately these regularity conditions do not hold here, for example condition (M3) of the Central Limit Theorem on pages 499–500 of [25] is not met, because n and ds are dependent. Moreover, Eq (18) is only the approximate Fisher information based on a Taylor Series expansion, whereas the usual Central Limit Theorem requires the exact Fisher information. [26] uses an alternative method of proof to derive a Central Limit Theorem for the MLE. It is shown in this working paper that V is indeed the limiting variance of . The proof in [26] is essentially a simplified version of the proof of the result in [19] which does not use the Stirling approximation. Theorem 1 is an advance on the result in [19], because it shows that is the full maximum likelihood estimator (subject to the Stirling approximation), and also provides a simpler derivation of V. Theorem 1 helps explain the finding of [19] that the difference between and the exact maximum likelihood estimator (i.e. the MLE when the Stirling approximation is not made) is small asymptotically. The theorem also suggests that there is no need for the calculation of the exact MLE in practice, even when the model assumptions are justified, since maximises an excellent approximation to the full likelihood.

It is convenient to express V in block form. We will henceforth mostly write g(u), h(u), and for readability, rather than g(u; θ) etc. Using a standard result on the inverse of a matrix in block form (e.g. 5.16a of [27]), V is equal to (23) where (24) and Δ is the p by p matrix defined by (25)

The limiting variance of , V11 from Eq (24), is of primary interest. It can be expanded by elementary operations as (26) where is the variance of when θ is known, as defined by Eq (12), and (27) is a penalty term attributable to θ requiring estimation. The penalty F is always 1 or greater, because Δ is a variance-covariance matrix, and so is positive semi-definite.

The coefficient of variance (CV) of follows directly, noting that : (28) (29)

2.3 Numerical values of the asymptotic variance for selected models

Values of Δ are obtained by rewriting Eq (25) as (30) and calculating by quadrature using the integrate function in the R Statistical Environment [28]. Table 2 shows values of F numerically calculated for various hazard rate models. The parameter θ2 determines the shape of the detection curve, with 1.1 giving a very narrow shoulder (i.e. steeply declining for small distances) and 3 giving a very wide shoulder. Hazard rate detection models for a number of values of θ2 are illustrated in the next section. The parameter θ1 is calculated numerically to give the specified in each row.

Table 2. Asymptotic penalty (F) for the hazard rate model for selected values of the coverage rate P, the shape parameter (θ2) and the mean detection rate .

The largest distance for which detection is attempted is w = 1 in all cases.

Table 2 shows that F increases as θ2 decreases, i.e. as the shoulder becomes narrower. For a given , F decreases as the coverage rate P increases. This is because increasing P improves the precision of , but it improves the precision of even faster. For a given P, F decreases as increases when θ2 ≤ 2 (narrow shoulder), but increases as increases when θ2 > 2. A possible reason is that when θ2 is small and is large, the detection function is near 1 for much of its range but then decreases precipitously. The fact that it remains near 1 for much of the range may mean that is relatively insensitive to . In contrast, when θ2 > 2 and is large, the detection function declines more smoothly, so that is more sensitive to .

Values of of 0.1, 0.3 or 0.6, P = 0.3 and θ2 ≥ 1.25 are probably the most representative of studies in practice. F varies from 1.9 to 5.6 in this subset of Table 2.

Table 3 shows similar results for half-normal detection models. The values of F are generally much closer to 1 than in Table 2, varying from 1.5 to 1.9 in the subset of the table where and P = 0.3. F increases with P, as in Table 2. F increases with for fixed P, similar to the wide-shouldered results in Table 2 where θ2 > 2.

Table 3. Asymptotic penalty (F) for the half-normal model for selected values of the coverage rate P and the mean detection rate .

The largest distance for which detection is attempted is w = 1 in all cases.

3 Simulation study

3.1 Design of the simulation study

Generation of distances di for i = 1, …, N.

Distance data are simulated for abundances N such that the expected numbers of detections are E[n] = 50, 100, 200, …, 1000, and the fraction of the area covered is P = 0.1. 10,000 simulations are used in every case.

Distances di are generated both with and without overdispersion. In the latter case, di are independent U(0, 10) random variables. The maximum range of observation is set to w = 1, so that objects are only eligible for detection when di ≤ 1, and so the probability of any given object falling within the covered area is P = 0.1. One implication of this model is that the number of objects n(v) with distance falling into any given interval of length v within [0, 10] is distributed as n(v)∼bin(N, vP), and hence E[n(v)] = NvP and var[n(v)] = NvP(1 − vP). For example, Nc is a special case of n(v) with v = 1. The assumption of independent uniform distances has been criticised because in practice n(v) is often observed to be overdispersed, with variance greater than NvP(1 − vP).

Overdispersed di are generated by firstly replacing the U(0, 10) distribution by the discrete approximation with probability 0.001 at each of 1000 evenly spaced values between 0 and 10. The probability that di falls in any given interval would then be 0.001 in the non-overdispersed case. Overdispersed di are assumed to be discrete random variables with the same set of possible values, with probability ϕk for value k = 1, …, 1000. The vector ϕ is simulated as coming from a Dirichlet distribution with vector parameter equal to 0.001α 11000 where 11000 is a vector containing 1000 values all equal to 1, and α is a parameter which controls the variance of ϕ. When α → ∞, ϕ is equal to 11000/1000 with probability 1, resulting in the non-overdispersed case. For 0 < α < ∞, ϕ are random variables each lying between 0 and 1, with . The parameters ϕ are generated anew for each simulation, and {di:i = 1, …, N} are then generated as independent discrete random variables with probabilities ϕ at each of the 1000 evenly spaced values between 0 and 10.

The overdispersed di have the property that n(v) is beta-binomial distributed with parameters N, αPv, and α(1 − Pv). (This follows from the properties of the Dirichlet-multinomial and beta-binomial distributions—see for example [29].) The expected value of n(v) is then NvP as before, but the variance is inflated to var[n(v)] = cNvP(1 − vP) where c = (α + N)/(α + 1) is an overdispersion factor. Values of α corresponding to c = 1, 2 and 3 are used.

The above process is a discrete approximation of a Dirichlet process with base distribution given by U(0, 10). Dirichlet processes are widely used as prior distributions for distribution functions (e.g. chapter 23 of [30]). This also makes them suitable to simulate overdispersed data, particularly as the resulting di have the property that the number of distances falling in any interval is overdispersed to the same degree.

Simulation of detection process.

Objects are detected with probability g(di; θ) when diw = 1 for each i, with detection independent across objects. Two families of detection functions are used: the hazard rate and the half-normal. Both are among those proposed in [9]. The two-parameter hazard rate function meets the requirement specified on page 41 of [9] of being a flexible model, giving some robustness to mis-specified detection function; in particular, it allows the shoulder to be narrow or wide. The half-normal detection function is less flexible, but is also often used, and would generally be easier to fit from data as it has only one parameter. Fig 1 shows the 4 particular distance functions used: hazard rate (hr) functions with θ = (0.405, 1.25), θ = (0.448, 2) and θ = (0.484, 3) corresponding to a very narrow, narrow, and wide shoulder respectively; and a half-normal (hn) function with θ = 0.502. This gives a variety of shapes of the detection function, with all 4 having the same average detection rate of . This means that the number of detections is approximately . The values of g(w) for all three functions are at least 0.11, and all but the wide hazard rate function are at least 0.14, roughly in line with the rule of thumb for choice of w on page 16 of [9]. The figure also shows the asymptotic penalty F due to unknown θ for each detection function. The penalty ranges from 2 to 5.4.

Fig 1. Detection Functions used to Generate Simulated Data.

The variance penalty factors F from Eq (26) due to unknown θ are also shown.

Estimators of abundance.

The CDS estimator defined in Eq (6) is calculated using the hazard rate model and the half-normal model. The CDS estimator assuming known θ, defined in Eq (11), is also calculated. This last option is of course unrealistic in practice, but is included to show the impact of uncertainty about θ.

An estimate of N is also calculated using the strip transect estimator . This is unbiased under the binomial model nbin(N, P), which incorrectly assumes perfect detection up to distance w. The strip transect estimator is also applied to restricted datasets using only those distances up to a range of w′ = 0.01, 0.02, …, 1, with .

Each simulation corresponds to a single detection function, and a single value of E[n] and of c. 10,000 populations of distances and detections are generated in each simulation. All computations are carried out in the R statistical environment version 3.0.1 [28]. The Distance package [31] is not used because of occasional non-convergence (this would generally not be an issue in practice, but is a problem in a large simulation). The estimator is calculated by maximising ld from Eq (3) using the optim function, using the Nelder-Mead method for the two-parameter hazard rate function and the Brent method for the one-parameter half-normal function. The complete simulation requires approximately 14 hours on a Macbook Pro with a 2.7GHz Intel Core i7 processor and 16GB of RAM. The code to conduct the simulation and produce the figures and tables is in S1 Code.

The aim of the simulation is to estimate the penalty factor due to unknown detection parameters for finite sample sizes, and to compare the mean squared errors (MSEs) of line transect and strip transect estimators in different scenarios. The focus of the paper is on variances and mean squared errors of abundance estimators, so variance estimators and confidence interval coverage are not reported on.

3.2 Simulation results for the asymptotic penalty

Fig 2 shows how the simulation estimates of F converge to the asymptotic values as n and N increase, for c = 1. For the hazard rate with very narrow shoulder and the halfnormal model, the asymptotic approximation is good even for E[n] = 100. For the other two models, the small sample penalties are higher than the asymptotic value for E[n] = 100, but converge to the asymptote as E[n] increases. The results provide a computational confirmation of the derivation of F in Section 3. Fig 3 shows the simulation estimates of F when the overdispersion parameter c is 2. No asymptotic result for F is available in this case. The values of F are 0.1–0.7 higher than when c = 1.

Fig 2. Variance of MLE of N when θ is unknown relative to when it is known, for various sample sizes based on simulation, and asymptotically from Eq (26), where P = 0.1, c = 1 and .

Fig 3. Variance of MLE of N when θ is unknown relative to when it is known, for various sample sizes based on simulation, and asymptotically from Eq (26), where P = 0.1, c = 2 and .

3.3 Simulation MSEs of CDS and strip transect estimators

Fig 4 compares the simulation MSEs of , and with varying strip width, when E[n] = 50. The 3 horizontal red lines in each plot are the MSEs of , which make use of all detections up to distance w = 1. The 3 lines are for overdispersion values of c = 1, 2 and 3. The 3 blue curves show the MSEs of for c = 1, 2 and 3, with strip widths of 0.01, 0.02, …, 1. Fig 4(a) shows MSEs for the half-normal detection function. Fig 4(b), 4(c) and 4(d) are results for the hazard rate models with very narrow, narrow and wide shoulders, respectively.

Fig 4. Relative root mean squared errors (RRMSE) (%) of CDS and strip transect estimators estimators (strip widths ranging from 0 to 1) of N, for 4 detection functions, with E[n] = 50, P = 0.1 and .

The format of Fig 4 is loosely based on Figure 1 in [15]. [15] compared CDS and strip transect estimators for the half-normal and negative exponential detection functions (both one parameter families) for n = 40, 60 and 100, and c = 1, 2 and 3. Here we extend these results to the two-parameter hazard rate function, and we simulate over-dispersed distances corresponding to c = 1, 2 and 3, rather than using the approximate formula in [15] to convert results when c = 1 to other values of c.

Figs 5, 6 and 7 are of the same format as Fig 4 and show results when E[n] is 100, 200 and 400 respectively.

Fig 5. Relative root mean squared errors (RRMSE) (%) of CDS and strip transect estimators estimators (strip widths ranging from 0 to 1) of N, for 4 detection functions, with E[n] = 100, P = 0.1 and .

Fig 6. Relative root mean squared errors (RRMSE) (%) of CDS and strip transect estimators estimators (strip widths ranging from 0 to 1) of N, for 4 detection functions, with E[n] = 200, P = 0.1 and .

Fig 7. Relative root mean squared errors (RRMSE) (%) of CDS and strip transect estimators estimators (strip widths ranging from 0 to 1) of N, for 4 detection functions, with E[n] = 400, P = 0.1 and .

For the half-normal function, Figs 47 replicate the finding of [15] that CDS dominates strip transect estimators, with the former having lower MSE for almost all values of c, E[n] and strip width.

For the hazard rate function, the picture is quite different, presumably because of the larger value of the penalty (F) due to unknown detection parameters for this model. [15] argue that c = 2 is the most practically relevant scenario out of c = 1, 2 and 3, so we concentrate on this case. For this value of c, and for the narrow shoulder detection function (panel (c) of each figure), has lower MSE than when the strip width is 0.1 and above (when E[n] = 50), between 0.15 and 0.71 (when E[n] = 100), 0.19 and 0.54 (when E[n] = 200) and 0.21 and 0.45 (when E[n] = 400). The optimal strip widths are 0.47, 0.42, 0.38 and 0.35 for E[n] equal to 50, 100, 200 and 400, respectively.

performs better relative to than the above when the hazard rate function has a wide shoulder, and worse than the above when the shoulder is very narrow.

The MSEs of both and increase as c increases, particularly . So when c increases, decreases slightly.

Simulations were also carried out with values of P other than 0.1. Results for P equal to 0.2, 0.25, 0.3 and 0.5, with E[n] = 100, are in S1, S2, S3 and S4 Figs, respectively. A comparison of these figures to Fig 4 shows that mean squared errors decrease slightly as P increases. The relative performance of the different methods for the various c and w is insensitive to P.

4 Conclusions

To achieve a given coefficient of variation CV for the line transect estimator, the required expected sample size is (31) (follows from rearranging Eq (29)). When n is much smaller than N, this simplifies to (32) The factor F is the inflation due to unknown detection function parameters. Asymptotic values of F are made available here for the first time (see Tables 2 and 3), albeit under a strong simplifying assumption that counts are binomially distributed. The asymptotic values of F for typical hazard rate models are between 2 and 6. Simulation confirms the accuracy of the asymptotic result when there is no overdispersion, and shows that F is larger in the presence of overdispersion. The R package distance.sample.size[32] calculates F and required sample size using the methods described in the current paper.

Note that the inflation factor F does not apply if g is modelled using mark-recapture data as suggested by [14].

The results on F help to explain the relative mean squared errors of strip and CDS estimators in the simulation. When the number of detections n is sufficiently large (e.g. 400), CDS outperforms strip transect estimation unless the strip width is moderately close to its optimal value (within about ± 25%). This is because variances of estimated abundances are then relatively small, so that bias considerations are paramount. For smaller n, the variance of the CDS estimator becomes more dominant, partly due to the penalty F. As a result, when the overdispersion factor is 2 and the hazard rate model applies, we find that strip transects give lower mean squared error (MSE) than CDS when:

  • n = 50 and the strip width is 0.1 or higher (with an optimal width of about 0.4), where a width of 1 corresponds to a typical detection range for distance sampling;
  • n = 100 and the strip width is between 0.2 and 0.7, i.e. within about ± 75% of its optimal value of 0.42; or
  • n = 200 and the strip width is within about 50% of its optimal value.

A Dirichlet process approach was able to generate non-uniform detections resulting in overdispersion factors of 1, 2 and 3. This approach has not been used in simulations in statistical ecology to the author’s knowledge. It would be applicable to any simulation where overdispersion or robustness relative to an assumed spatial model is of interest.

The simulation settings are more favourable to CDS than to strip transect estimators. In particular, it is assumed that the functional form used in CDS estimation matches the true detection function. This assumption will never be perfectly justified in reality, and its failure will impact on the performance of CDS estimators to some extent. Model selection or model averaging are often used to try to identify the correct model, but uncertainty about the model form will inflate variances and potentially lead to some bias as well. In contrast, strip transect estimators do not require a specification of the detection function. Of course, the properties of strip transect estimators depend on the mean detection rate in the strip, but beyond that there is no particular impact of one functional form rather than another. The simulations with c = 1 are also favourable to both CDS and strip transects as they mean that distances are independent and uniformly distributed. This is relaxed to some extent by the overdispersed simulations with c equal to 2 and 3, however the Dirichlet process used is still centred on the uniform distribution. When there is a more severe departure from uniformity, CDS estimators will be biased, and strip transect estimators will also be affected to some degree.

The choice of methodology for assessing abundance, as well as the determination of required sample size, should be informed by consideration of all relevant biases and by the likely achievable precision. The results in this paper will help in this process, by providing a sample size formula reflecting the penalty due to unknown detection parameters, theoretical and simulation results on the size of this penalty, and a comparison of the mean squared errors of strip and line transect estimators in a wide-ranging simulation.

Appendix: Proof of Theorem 1

Applying Eq (15), we approximate l = log(L) by (33) The next step is to differentiate l to obtain the score function: (34) (35) (36)

The MLE is obtained by setting Eqs (35) and (36) to 0. Firstly, set Eq (35) to 0 and exponentiate both sides: (37) which leads directly to (38) Setting Eq (36) to 0, and then substituting for from Eq (38) gives an estimating equation for θ: (39) (40) (41) (42) Results Eqs (38) and (42) give the Theorem result on the maximum likelihood estimators of N and θ.

The Fisher Information is given by the variance of the score vector, the elements of which are given by Eqs (35) and (36). It can be written as (43) The block elements of are easily derived. Some preliminary notes:

  1. Let D be a random variable with density for 0 ≤ dw.
  2. The distances di follow the same distribution as D, conditional on n.

For the remainder of the proof, I will write g(u), h(u), and for readability, rather than g(u; θ) etc. Using (a), (b) and (c), we obtain: (44) (45) (46) (47) (48) (49) (50) (51) (52) (53) (54) As N → ∞ , so the right hand side of Eq (54) can be approximated using a first order Taylor Series of n about : (55) (56) (57) (58) The top-left element of , , can also be approximated by a first order Taylor Series about : (59) (60) (61) (62) (63)

Supporting Information

S1 Fig. Simulation results for P = 0.2 with E[n] = 100.


S2 Fig. Simulation results for P = 0.25 with E[n] = 100.


S3 Fig. Simulation results for P = 0.3 with E[n] = 100.


S4 Fig. Simulation results for P = 0.5 with E[n] = 100.


S1 Code. R program used to produce simulation results.



I am grateful to Geoff Robertson and Michelle Dawson who provided background on sampling of brumbies and to Alan Welsh for his insights and encouragement. Anonymous referees made comments which improved the paper. All views are the author’s own.

Author Contributions

Conceived and designed the experiments: RC. Performed the experiments: RC. Analyzed the data: RC. Contributed reagents/materials/analysis tools: RC. Wrote the paper: RC.


  1. 1. Walter MJ, Hone J. A comparison of 3 aerial survey techniques to estimate wild horse abundance in the Australian Alps. Wildlife Society Bulletin. 2003;p. 1138–1149.
  2. 2. Kruger J, Reilly B, Whyte I. Application of distance sampling to estimate population densities of large herbivores in Kruger National Park. Wildlife Research. 2008;35(4):371–376.
  3. 3. Oppel S. Using distance sampling to quantify Odonata density in tropical rainforests. International Journal of Odonatology. 2006;9(1):81–88.
  4. 4. Buckland ST, Anderson DR, Burnham KP, Laake JL, Borchers DL, Thomas L. Advanced Distance Sampling. Oxford: Oxford University Press; 2004.
  5. 5. Hedley SL, Buckland ST. Spatial models for line transect sampling. Journal of Agricultural, Biological, and Environmental Statistics. 2004;9(2):181–199.
  6. 6. Johnson DS, Laake JL, Ver Hoef JM. A model-based approach for making ecological inference from distance sampling data. Biometrics. 2010;66(1):310–318. pmid:19459840
  7. 7. Miller DL, Burt ML, Rexstad EA, Thomas L. Spatial models for distance sampling data: recent developments and future directions. Methods in Ecology and Evolution. 2013;4(11):1001–1010.
  8. 8. Oedekoven C, Buckland S, Mackenzie M, King R, Evans K, Burger L Jr. Bayesian methods for hierarchical distance sampling models. Journal of Agricultural, Biological, and Environmental Statistics. 2014;p. 1–21.
  9. 9. Buckland S, Anderson D, Burnham K, Laake J, Borchers D. Introduction to Distance Sampling: Estimating Abundance of Biological Populations. Oxford: Oxford University Press; 2001.
  10. 10. Barry SC, Welsh A. Distance sampling methodology. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2001;63(1):23–31.
  11. 11. Melville G, Welsh A. Line transect sampling in small regions. Biometrics. 2001;57(4):1130–1137. pmid:11764253
  12. 12. Fewster RM, Laake JL, Buckland ST. Line transect sampling in small and large regions. Biometrics. 2005;61(3):856–859. pmid:16135038
  13. 13. Melville GJ, Tracey JP, Fleming PJ, Lukins BS. Aerial surveys of multiple species: critical assumptions and sources of bias in distance and mark–recapture estimators. Wildlife Research. 2008;35(4):310–348.
  14. 14. Melville GJ, Welsh AH. Model-based prediction In ecological surveys including those with incomplete detection. Australian & New Zealand Journal of Statistics. 2014;56(3):257–281.
  15. 15. Burnham KP, Anderson DR, Laake JL. Efficiency and bias in strip and line transect sampling. The Journal of Wildlife Management. 1985;p. 1012–1018.
  16. 16. Azhar B, Zakaria M, Yusof E, Leong PC. Efficiency of fixed-width transect and line-transect-based distance sampling to survey red junglefowl (Gallus gallus spadiceus) in Peninsular Malaysia. Journal of Sustainable Development. 2009;1(2):63–73.
  17. 17. Welsh A. Incomplete detection in enumeration surveys: whither distance sampling? Australian & New Zealand Journal of Statistics. 2002;44(1):13–22.
  18. 18. Burnham KP, Anderson DR. Model Selection and Multi-Model Inference: A Practical Information-Theoretic Approach. New York: Springer-Verlag; 2002.
  19. 19. Fewster R, Jupp P. Inference on population size in binomial detectability models. Biometrika. 2009;96(4):805–820.
  20. 20. Thomas L, Buckland ST, Rexstad EA, Laake JL, Strindberg S, Hedley SL, et al. Distance software: design and analysis of distance sampling surveys for estimating population size. Journal of Applied Ecology. 2010;47(1):5–14. pmid:20383262
  21. 21. Marques FF, Buckland ST. Incorporating covariates into standard line transect analyses. Biometrics. 2003;59(4):924–935. pmid:14969471
  22. 22. Marques TA, Thomas L, Fancy SG, Buckland ST, Handel C. Improving estimates of bird density using multiple-covariate distance sampling. The Auk. 2007;124(4):1229–1243.
  23. 23. Buckland ST, Oedekoven CS, Borchers DL. Model-based distance sampling. Journal of Agricultural, Biological, and Environmental Statistics. 2015;p. 1–18.
  24. 24. Borchers DL, Buckland ST. Estimating Animal Abundance: Closed Populations. London: Springer-Verlag; 2002.
  25. 25. Lehmann EL. Elements of Large-Sample Theory. New York: Springer; 1999.
  26. 26. Clark RG. Efficiency and Robustness in Distance Sampling; 2015. National Institute for Applied Statistics Research Australia (NIASRA) Working Paper 15–15. Available from:
  27. 27. Harville DA. Matrix Algebra from a Statistician’s Perspective. New York: Springer; 1997.
  28. 28. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria; 2013.
  29. 29. Johnson NL, Kotz S, Balakrishnan N. Discrete Multivariate Distributions. Wiley New York; 1997.
  30. 30. Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian data analysis. 3rd ed. Taylor & Francis; 2014.
  31. 31. Miller DL. Distance: a simple way to fit detection functions to distance sampling data and calculate abundance/density for biological populations; 2014. R package version 0.9. Available from:
  32. 32. Clark RG. distance.sample.size: Calculates study size required for distance sampling; 2016. R package version 0.0. Available from: