Statistical Efficiency in Distance Sampling

Distance sampling is a technique for estimating the abundance of animals or other objects in a region, allowing for imperfect detection. This paper evaluates the statistical efficiency of the method when its assumptions are met, both theoretically and by simulation. The theoretical component of the paper is a derivation of the asymptotic variance penalty for the distance sampling estimator arising from uncertainty about the unknown detection parameters. This asymptotic penalty factor is tabulated for several detection functions. It is typically at least 2 but can be much higher, particularly for steeply declining detection rates. The asymptotic result relies on a model which makes the strong assumption that objects are uniformly distributed across the region. The simulation study relaxes this assumption by incorporating over-dispersion when generating object locations. Distance sampling and strip transect estimators are calculated for simulated data, for a variety of overdispersion factors, detection functions, sample sizes and strip widths. The simulation results confirm the theoretical asymptotic penalty in the non-overdispersed case. For a more realistic overdispersion factor of 2, distance sampling estimation outperforms strip transect estimation when a half-normal distance function is correctly assumed, confirming previous literature. When the hazard rate model is correctly assumed, strip transect estimators have lower mean squared error than the usual distance sampling estimator when the strip width is close enough to its optimal value (± 75% when there are 100 detections; ± 50% when there are 200 detections). Whether the ecologist can set the strip width sufficiently accurately will depend on the circumstances of each particular study.


Introduction
The number of animals in a region is often of ecological importance. This paper considers the distance sampling approach to estimating abundance, in its usual conjunction with line transect sampling. Transect lines are laid down across a region, often parallel and equally spaced but not necessarily so. Observers move along the transects, and record observations of animals (or plants, or groups of animals, or other objects) and their perpendicular distances from the transect. Empirically, more objects are detected near to transect lines than far from them in many studies, suggesting that detectability is a decreasing function of distance. The distance sampling methodology exploits this phenomenon, by modelling the detection rate as a function of distance. The number of detected objects can then be scaled to estimate the abundance N allowing for imperfect detection. Provided the assumptions of the method are met, the maximum range can be made fairly large, thereby increasing the sample size, while avoiding or reducing bias due to declining detection rate. Distance sampling is widely used in ecology: a Web of Science search found 276 articles on distance sampling in ecology journals in 2014 alone. The wide range of applications includes wild horses in the Australian Alps [1], large herbivores in South Africa's Kruger National Park [2] and odonata (dragonflies) in a rainforest locality in Papua New Guinea [3]. For a detailed description of the approach, see [4].
The major assumptions of the method are 1. Detection is perfect at zero distance.
2. The detection function is of known form, with some unknown parameters requiring estimation. Alternatively, model-averaging may be used provided the detection function is assumed to be one of a known set of alternatives.
3. Animals' distances to the nearest transect line are (at least approximately) uniformly distributed.
It is also assumed that there is no measurement error (for example, false positive detections, or mis-measurement of distances), that there is no movement of objects in response to the observer which could lead to multiple chances of detection, that detection events are independent, that there are sufficient transects for reliable estimation, and that transect lines are a good representation of the study area. This paper will consider the conventional distance sampling (CDS) scenario where there is only one observer. The same methodology can be used with multiple observers by pooling their detections. Mark recapture and mark recapture distance sampling (MRDS) are methods which more fully use data from multiple observers by matching their detections; MRDS, in particular, can be used to relax assumption (i). More recent approaches combine a spatial model with the detection model (see for example [5][6][7] and [8]). Spatial models allow abundances to be estimated for subregions, and can exploit spatial trends in estimation, however inference may be sensitive to the assumed spatial model which must therefore be carefully constructed. This paper focuses on CDS, as most applications of line transect sampling remain single observer. The use of spatial models in distance sampling is an important advance, but some researchers may decide that the extra effort and complication required to develop an adequate spatial model are not warranted for some studies, particularly when the number of detections is relatively small. Robustness to the 3 assumptions above is explored in the literature. For example, MRDS can be used to achieve some level of robustness to (i). Assumption (ii) is dealt with by the use of flexible families of detection functions with two or more parameters, and the use of model averaging. [9] argue that (iii) is approximately satisfied provided transect lines are placed randomly or systematically from a random starting point. However, [10] question the uniformity assumption and find that CDS estimators are biased in a design-based framework, that is, under repeated random placement of transect lines. The matter remains in contention [11][12][13]. [11] suggest an alternative approach where the detection function is estimated from a separate calibration study, and [14] propose estimating detection probabilities using multiple observer data and possibly but not necessarily distance data.
A natural alternative to distance sampling is the simple scaling up of observations in a strip about the transect. When strips are too wide, this strip transect estimator is severely negatively biased due to non-detections, particularly of the more distant animals in the strip. When strips are sufficiently narrow, this bias becomes negligible, but the variance of the estimated abundance becomes large. Distance sampling aims to achieve lower variance by including observations at greater distances, while reducing bias by adjusting for non-detection. But there is a hidden cost-the effect of unknown detection parameters on the precision of the estimated abundance-which reverses at least some of the benefit due to wider strips. This paper illuminates this cost both asymptotically and in small samples. This existence of a penalty due to unknown detection parameters is known, but no asymptotic expression has been derived in the literature, and the penalty has not been quantified except in a simulation of the half-normal and negative exponential detection functions [15]. A number of authors have compared CDS and strip transect estimators in particular applications (e.g. [1,16]) but typically with only one or two options for strip width. [17] also suggested strip transects as an alternative to CDS but did not make a numerical comparison.
An example of the variability of CDS estimators may be found in [1], which compares strip transect estimates with two different strip widths to CDS estimates with various detection functions, and to mark-recapture and MRDS estimates of abundance. Detections of groups of wild horses are attempted up to 200m from a helicopter traversing 91 parallel transect lines in south-eastern Australia, resulting in 52 group detections (pooled from two observers). Table 1 here reproduces results from page 1145 of [1]. The strip transect estimates of abundance are the number of observed groups within 50m or 200m multiplied by the total area divided by the area lying within either 50m or 200m of a transect line, with no allowance for undercount. The values of the Akaike Information Criterion (AIC) are also shown for each detection model. Model averaged estimates are calculated using weights calculated from the AICs as described in [18]. Mark recapture and MRDS results are not reproduced here as they are outside the scope of the current paper. The CDS estimator shown in Table 1 corresponds to formula (6) in the next section of the current paper. Table 1 shows that the CDS abundance estimators have high coefficients of variation (CV), with the two-parameter hazard rate model leading to a much higher CV (about 57%) than the other detection models which are one-parameter (CVs between 19% and 25%). The modelaveraged estimator has a CV of 30%. The strip transect estimator using detections up to 200m has much lower CV, but is much lower than the CDS estimators, suggesting that it is negatively biased due to undetected groups. The strip transect estimator based on detections up to 50m is more plausible. It is close to the model-averaged CDS estimator, suggesting that its bias is small, presumably because the detection rate would decline relatively little over this shorter range. Surprisingly, the 0-50m strip transect estimator gives similar CVs to the 0-200m model-averaged CDS method, even though it only uses one quarter of the distance range and 44% of the detected groups. This motivated the research in the current paper on the efficiency of CDS estimators. Section 2 proves for the first time that the usual CDS estimator of N is also a maximum likelihood estimator (MLE) of N under a particular model provided the likelihood is approximated using Stirling's Rule. Its asymptotic variance under the model is derived using this result. The asymptotic variance is identical to that of [19], but the use of the Stirling approximation allows a simpler derivation. The limiting variance ofN is expressed as the variance when the detection function is known multiplied by a penalty for unknown θ. This asymptotic penalty is tabulated for various detection models. It can be substantial, and for many situations arising in practice is between 2 and 6. Section 3 summarizes a simulation study to evaluate the small sample performance of CDS estimators for various detection functions and strip transect estimators with differing strip widths. Unlike the theoretical result in Section 2, the simulation allows for overdispersion in the counts of animals falling in any given range. Section 4 contains conclusions about the magnitude of the penalty due to unknown detection parameters in CDS, and the relative performance of CDS and strip transect estimators.
2 Theoretical results on maximum likelihood estimation of N

Notation and background
The aim is to estimate the number of objects, which may be animals, groups of animals or other objects, in a region. Let N be the number of objects in a defined study area and n be the number of detections made by an observer moving along predefined line transects. Observation may either be one-sided (only objects to the right, or only objects to the left, are observed) or two-sided. Only objects with perpendicular distance up to a pre-chosen limit w from a transect line have a chance of being observed (the covered region); let N c be the number of such objects. Two-sided observation is the more common case, but one-sided observation is sometimes necessary, for example if the observer can only see out one side of a vehicle. It is assumed that the probability of observing an object at perpendicular distance d i from a transect line is g (d i , θ) when 0 d i w, where θ is a vector of p parameters specifying the function within a family. It is assumed that g(0, θ) = 1 and that g is a non-increasing function of distance. The Distance software [20], which implements both CDS and MRDS, allows four possible functional forms for g(), including the half-normal model, gðd i Þ ¼ exp À , and the hazard rate function, gðd i Þ ¼ 1 À exp ðÀðd i =y 1 Þ Ày 2 Þ. Both of these functions satisfy the shoulder condition that g 0 (0) = 0, with the hazard rate model giving greater flexibility in modelling the shoulder width (i.e. the range near 0 over which g is relatively flat). The use of "robust models", which have enough flexibility to model a range of typical shapes, is recommended on pages 46-49 of [9], with the hazard rate model given as a particularly useful example. It is also possible to include other covariates affecting the detectability of objects in the distance function, such as characteristics of the animal or plant (see [21] and [22]).
Let d i , i = 1, . . ., N, be the perpendicular distances from the objects to the nearest transect line, and let δ i = 1 for observed objects and δ i = 0 for the rest. Under model assumption (iii) stated in the Introduction, d i are independent and identically distributed uniform U(0, M) for i = 1, . . ., N, where M is the maximum possible distance from a transect line. The distribution of d i given δ i = 1 is easily derived as Let g ðθÞ ¼ w À1 R w 0 gðu; θÞdu be the unconditional probability of detection. (This is the probability that an animal in the covered region is detected, unconditional on its distance, denoted by P a in equation (2) of [23]. The notation g ðθÞ is used here to emphasise that it is the mean and the corresponding conditional log-likelihood given n is The parameters θ can be obtained by setting the derivative of l d with respect to θ to 0. Prior to this, it is convenient to define and is a p-vector where p is the number of parameters in θ. The partial derivative of g ðθÞ with respect to θ is hðθÞ, subject to regularity conditions allowing the derivative operator to be taken within the integral. Setting the derivative of l d to 0 gives the following estimating equation for θ: The most commonly used estimator of N in CDS is: where P is the proportion of the area falling within perpendicular distance of w of a transect line, andθ CML is the solution to Eq (5) (CML stands for conditional maximum likelihood, as the likelihood L d in Eq (2) is conditional on n). In CDS, the estimation of θ is model-based and maximises the likelihood conditional on n, while the estimation of N in Eq (6) is design-based and motivated by the fact that E½n ¼ NP g ðθÞ. See equation 1.4 of [9]. See also [23] for a recent discussion of CDS and extensions. The value of P is assumed to be known, and is approximated in practice by the total length of all transects multiplied by w (and multiplied by 2 for twosided observation) divided by the region's area.
Another alternative that has been proposed is maximum likelihood estimation of N, which is the minimum variance unbiased estimator for large samples subject to regularity conditions. Section 7.2 of [24] derives the likelihood for N and θ. The conditional density of d s given n is L d in Eq (2). Under the assumed model, δ i are independent Bernoulli random variables with expected value g ðθÞ. Hence The likelihood is the product of the probability function of n and L d : (See equation 2.33 of [4]). L can be maximised with respect to N and θ to obtain a maximum likelihood estimator of N. In this approach, both N and θ are treated as unknown parameters. On pages 16-17, [4] recommend against this approach because in practice n is likely to be overdispersed and so to have higher variance than implied by Eq (7). For example, this would occur if there were positive correlations between the values of d i . It is worth noting that any such overdispersion is likely to also invalidate Eq (2) and soθ CML andN CDS to some extent, as Eq (2) assumes that d i are independent conditional on n.
When θ is known, the MLE of N is the integer part of n=ðP g ðθÞÞ (see pages 17-19 and 138 of [24]). The same reference also shows that if the factorial terms in Eq (10) are approximated using Stirling's rule and N is treated as a continuous parameter, the MLE is then It is straightforward to derive the variance ofN knowny using the fact that n $ binðN; P g ðθÞÞ: When θ is unknown, [19] note that the MLE of N is the integer part of n=ðP gθ Þ whereθ is the MLE of these parameters. The next subsection of this paper extends this result by showing thatθ CML andN CDS maximise a Stirling approximation to the full likelihood L. This enables a theoretical result on the large sample variance ofN CDS , albeit under the strong assumption Eq (7). The simulation study in Section 3 relaxes this assumption by including overdispersion.

Derivations of the MLE and its variance based on Stirling's approximation
The maximum likelihood estimator of N under Stirling's approximation for factorials is derived in this subsection, where the likelihood L is given by Eq (10). (Note that in section 3.3.1, [9] consider maximization of L d , not L, to estimate θ, where L d , defined in Eq (2), is the conditional likelihood given n).
There is a positive probability that n is equal to 0 or N, in which case log(n) or log(N − n) are not defined. The limit of the right hand side of Eq (15) can easily be shown to equal 0 in these cases, so the following refinement of Eq (15) is used: L will be maximised with respect to N and θ treating N as a continuous parameter. Stirling's approximation for log(x!) is very accurate even for small x as long as x is at least 2 or 3, and both N − n and n would be well above this in practice. Theorem 1 states the MLEs and the approximate Fisher information.
Theorem 1 The model defined by Eqs (1) and (7) is assumed, and it is assumed that the log of the likelihood Eq (10) can be approximated using Eq (16), leading to ÞÀn log ðnÞ þ n log ðPÞ À n log ðwÞ Let O be an open set defining the set of feasible values of θ. If there is a uniqueθ CML satisfying Eq (5) then it is the maximum likelihood estimator, and the MLE of N on (0, 1) isN CDS in Eq (6). Let D be a random variable with density gðdÞ= R w 0 gðuÞdu for 0 d w, which is also the distribution of d i conditional on detection. The Fisher Information of (N,θ T ) T is approximately equal to for large N, where I Ny ¼ ð1 À P g ðyÞÞ À1 P hðyÞ ð 22Þ Surprisingly,θ CML andN CDS maximise the Stirling approximation to the full likelihood L, even thoughθ CML was defined as the maximizer of the conditional likelihood L d given n (not L), andN CDS was motivated by a design-based argument. Let V be the inverse of the Fisher Information matrix in Eq (18). Subject to regularity conditions, maximum likelihood estimators are asymptotically normal with expectation equal to the true parameter values and variance-covariance matrix equal to the inverse of the Fisher Information matrix. Unfortunately these regularity conditions do not hold here, for example condition (M3) of the Central Limit Theorem on pages 499-500 of [25] is not met, because n and d s are dependent. Moreover, Eq (18) is only the approximate Fisher information based on a Taylor Series expansion, whereas the usual Central Limit Theorem requires the exact Fisher information. [26] uses an alternative method of proof to derive a Central Limit Theorem for the MLE. It is shown in this working paper that V is indeed the limiting variance of ðN ;θ T Þ T .
The proof in [26] is essentially a simplified version of the proof of the result in [19] which does not use the Stirling approximation. Theorem 1 is an advance on the result in [19], because it shows thatN CDS is the full maximum likelihood estimator (subject to the Stirling approximation), and also provides a simpler derivation of V. Theorem 1 helps explain the finding of [19] that the difference betweenN CDS and the exact maximum likelihood estimator (i.e. the MLE when the Stirling approximation is not made) is small asymptotically. The theorem also suggests that there is no need for the calculation of the exact MLE in practice, even when the model assumptions are justified, sinceN CDS maximises an excellent approximation to the full likelihood.
It is convenient to express V in block form. We will henceforth mostly write g(u), h(u), g and h for readability, rather than g(u; θ) etc. Using a standard result on the inverse of a matrix in block form (e.g. 5.16a of [27]), V is equal to where and Δ is the p by p matrix defined by The limiting variance ofN CDS , V 11 from Eq (24), is of primary interest. It can be expanded by elementary operations as where varN knowny À Á is the variance ofN when θ is known, as defined by Eq (12), and is a penalty term attributable to θ requiring estimation. The penalty F is always 1 or greater, because Δ is a variance-covariance matrix, and so is positive semi-definite. The coefficient of variance (CV) ofN CDS follows directly, noting that E½n ¼ NP g :

Numerical values of the asymptotic variance for selected models
Values of Δ are obtained by rewriting Eq (25) as and calculating by quadrature using the integrate function in the R Statistical Environment [28]. Table 2 shows values of F numerically calculated for various hazard rate models. The parameter θ 2 determines the shape of the detection curve, with 1.1 giving a very narrow shoulder (i.e. steeply declining for small distances) and 3 giving a very wide shoulder. Hazard rate detection models for a number of values of θ 2 are illustrated in the next section. The parameter θ 1 is calculated numerically to give the specified g in each row. Table 2 shows that F increases as θ 2 decreases, i.e. as the shoulder becomes narrower. For a given g , F decreases as the coverage rate P increases. This is because increasing P improves the precision ofN CDS , but it improves the precision ofN knowny even faster. For a given P, F decreases as g increases when θ 2 2 (narrow shoulder), but increases as g increases when θ 2 > 2. A possible reason is that when θ 2 is small and g is large, the detection function is near 1 for much of its range but then decreases precipitously. The fact that it remains near 1 for much of the range may mean thatN is relatively insensitive toθ. In contrast, when θ 2 > 2 and g is large, the detection function declines more smoothly, so thatN CDS is more sensitive toθ. Values of g of 0.1, 0.3 or 0.6, P = 0.3 and θ 2 ! 1.25 are probably the most representative of studies in practice. F varies from 1.9 to 5.6 in this subset of Table 2. Table 3 shows similar results for half-normal detection models. The values of F are generally much closer to 1 than in Table 2, varying from 1.5 to 1.9 in the subset of the table where g 2 f0:1; 0:3; 0:6g and P = 0.3. F increases with P, as in Table 2. F increases with g for fixed P, similar to the wide-shouldered results in Table 2 where θ 2 > 2.

Design of the simulation study
Generation of distances d i for i = 1, . . ., N. Distance data are simulated for abundances N such that the expected numbers of detections are E[n] = 50, 100, 200, . . ., 1000, and the fraction of the area covered is P = 0.1. 10,000 simulations are used in every case.
Distances d i are generated both with and without overdispersion. In the latter case, d i are independent U(0, 10) random variables. The maximum range of observation is set to w = 1, so that objects are only eligible for detection when d i 1, and so the probability of any given object falling within the covered area is P = 0.1. One implication of this model is that the number of objects n(v) with distance falling into any given interval of length v within [0, 10] is distributed as n(v)*bin (N, vP), and hence E[n(v)] = NvP and var[n(v)] = NvP(1 − vP). For example, N c is a special case of n(v) with v = 1. The assumption of independent uniform distances has been criticised because in practice n(v) is often observed to be overdispersed, with variance greater than NvP(1 − vP).
Overdispersed d i are generated by firstly replacing the U(0, 10) distribution by the discrete approximation with probability 0.001 at each of 1000 evenly spaced values between 0 and 10. The probability that d i falls in any given interval would then be 0.001 in the non-overdispersed case. Overdispersed d i are assumed to be discrete random variables with the same set of possible values, with probability ϕ k for value k = 1, . . ., 1000. The vector ϕ is simulated as coming from a Dirichlet distribution with vector parameter equal to 0.001α 1 1000 where 1 1000 is a vector containing 1000 values all equal to 1, and α is a parameter which controls the variance of ϕ.
When α ! 1, ϕ is equal to 1 1000 /1000 with probability 1, resulting in the non-overdispersed case. For 0 < α < 1, ϕ are random variables each lying between 0 and 1, with P 1000 k¼1 k¼1 . The parameters ϕ are generated anew for each simulation, and {d i :i = 1, . . ., N} are then generated as independent discrete random variables with probabilities ϕ at each of the 1000 evenly spaced values between 0 and 10.
The overdispersed d i have the property that n(v) is beta-binomial distributed with parameters N, αPv, and α(1 − Pv). (This follows from the properties of the Dirichlet-multinomial and beta-binomial distributions-see for example [29].) The expected value of n(v) is then NvP as before, but the variance is inflated to var[n(v)] = cNvP(1 − vP) where c = (α + N)/(α + 1) is an overdispersion factor. Values of α corresponding to c = 1, 2 and 3 are used.
The above process is a discrete approximation of a Dirichlet process with base distribution given by U(0, 10). Dirichlet processes are widely used as prior distributions for distribution functions (e.g. chapter 23 of [30]). This also makes them suitable to simulate overdispersed Table 2. Asymptotic penalty (F) for the hazard rate model for selected values of the coverage rate P, the shape parameter (θ 2 ) and the mean detection rate g. The largest distance for which detection is attempted is w = 1 in all cases. data, particularly as the resulting d i have the property that the number of distances falling in any interval is overdispersed to the same degree.
Simulation of detection process. Objects are detected with probability g(d i ; θ) when d i w = 1 for each i, with detection independent across objects. Two families of detection functions are used: the hazard rate and the half-normal. Both are among those proposed in [9]. The two-parameter hazard rate function meets the requirement specified on page 41 of [9] of being a flexible model, giving some robustness to mis-specified detection function; in particular, it allows the shoulder to be narrow or wide. The half-normal detection function is less flexible, but is also often used, and would generally be easier to fit from data as it has only one parameter. Fig 1 shows the 4 particular distance functions used: hazard rate (hr) functions gðd i ; θÞ ¼ 1 À e Àðd i =y 1 Þ Ày 2 with θ = (0.405, 1.25), θ = (0.448, 2) and θ = (0.484, 3) corresponding to a very narrow, narrow, and wide shoulder respectively; and a half-normal (hn) function gðd i ; θÞ ¼ 1 À e Àðd i =yÞ 2 =2 with θ = 0.502. This gives a variety of shapes of the detection function, with all 4 having the same average detection rate of g ðθÞ ¼ 0:6. This means that the number of detections is approximately n % P g ðθÞN ¼ 0:06N. The values of g(w) for all three functions are at least 0.11, and all but the wide hazard rate function are at least 0.14, roughly in line with the rule of thumb for choice of w on page 16 of [9]. The figure also shows the asymptotic penalty F due to unknown θ for each detection function. The penalty ranges from 2 to 5.4.
Estimators of abundance. The CDS estimatorN CDS defined in Eq (6) is calculated using the hazard rate model and the half-normal model. The CDS estimator assuming known θ, defined in Eq (11), is also calculated. This last option is of course unrealistic in practice, but is included to show the impact of uncertainty about θ.
An estimate of N is also calculated using the strip transect estimatorN ST ¼ n=P. This is unbiased under the binomial model n * bin(N, P), which incorrectly assumes perfect detection up to distance w. The strip transect estimator is also applied to restricted datasets using only those distances up to a range of w 0 = 0.01, 0.02, . . ., 1, withN ST ðw 0 Þ ¼ n½d w 0 =ðPw 0 =wÞ.
Each simulation corresponds to a single detection function, and a single value of E[n] and of c. 10,000 populations of distances and detections are generated in each simulation. All computations are carried out in the R statistical environment version 3.0.1 [28]. The Distance package [31] is not used because of occasional non-convergence (this would generally not be an issue in practice, but is a problem in a large simulation). The estimatorθ CML is calculated by maximising l d from Eq (3) using the optim function, using the Nelder-Mead method for the two-parameter hazard rate function and the Brent method for the one-parameter half-normal function. The complete simulation requires approximately 14 hours on a Macbook Pro with a 2.7GHz Intel Core i7 processor and 16GB of RAM. The code to conduct the simulation and produce the figures and tables is in S1 Code.
The aim of the simulation is to estimate the penalty factor due to unknown detection parameters for finite sample sizes, and to compare the mean squared errors (MSEs) of line transect and strip transect estimators in different scenarios. The focus of the paper is on variances and mean squared errors of abundance estimators, so variance estimators and confidence interval coverage are not reported on.   Fig 4(a) shows MSEs for the half-normal detection function. Fig 4(b), 4(c) and 4(d) are results for the hazard rate models with very narrow, narrow and wide shoulders, respectively.

Simulation MSEs of CDS and strip transect estimators
The format of Fig 4 is loosely based on Figure 1 in [15]. [15] compared CDS and strip transect estimators for the half-normal and negative exponential detection functions (both one parameter families) for n = 40, 60 and 100, and c = 1, 2 and 3. Here we extend these results to the two-parameter hazard rate function, and we simulate over-dispersed distances corresponding to c = 1, 2 and 3, rather than using the approximate formula in [15] to convert results when c = 1 to other values of c.  For the hazard rate function, the picture is quite different, presumably because of the larger value of the penalty (F) due to unknown detection parameters for this model. [15] argue that c = 2 is the most practically relevant scenario out of c = 1, 2 and 3, so we concentrate on this case. N CDS performs better relative toN ST than the above when the hazard rate function has a wide shoulder, and worse than the above when the shoulder is very narrow.
The MSEs of bothN CDS andN ST increase as c increases, particularlyN ST . So when c increases, MSE½N CDS =MSE½N ST decreases slightly.
Simulations were also carried out with values of P other than 0.1. Results for P equal to 0.2, 0.25, 0.3 and 0.5, with E[n] = 100, are in S1, S2, S3 and S4 Figs, respectively. A comparison of

Conclusions
To achieve a given coefficient of variation CV for the line transect estimator, the required expected sample size is (follows from rearranging Eq (29)). When n is much smaller than N, this simplifies to The factor F is the inflation due to unknown detection function parameters. Asymptotic values of F are made available here for the first time (see Tables 2 and 3), albeit under a strong simplifying assumption that counts are binomially distributed. The asymptotic values of F for typical hazard rate models are between 2 and 6. Simulation confirms the accuracy of the asymptotic result when there is no overdispersion, and shows that F is larger in the presence of overdispersion. The R package distance.sample.size [32] calculates F and required sample size using the methods described in the current paper. Note that the inflation factor F does not apply if g is modelled using mark-recapture data as suggested by [14].
The results on F help to explain the relative mean squared errors of strip and CDS estimators in the simulation. When the number of detections n is sufficiently large (e.g. 400), CDS outperforms strip transect estimation unless the strip width is moderately close to its optimal value (within about ± 25%). This is because variances of estimated abundances are then relatively small, so that bias considerations are paramount. For smaller n, the variance of the CDS estimator becomes more dominant, partly due to the penalty F. As a result, when the overdispersion factor is 2 and the hazard rate model applies, we find that strip transects give lower mean squared error (MSE) than CDS when: • n = 50 and the strip width is 0.1 or higher (with an optimal width of about 0.4), where a width of 1 corresponds to a typical detection range for distance sampling; • n = 100 and the strip width is between 0.2 and 0.7, i.e. within about ± 75% of its optimal value of 0.42; or • n = 200 and the strip width is within about 50% of its optimal value. A Dirichlet process approach was able to generate non-uniform detections resulting in overdispersion factors of 1, 2 and 3. This approach has not been used in simulations in statistical ecology to the author's knowledge. It would be applicable to any simulation where overdispersion or robustness relative to an assumed spatial model is of interest.
The simulation settings are more favourable to CDS than to strip transect estimators. In particular, it is assumed that the functional form used in CDS estimation matches the true detection function. This assumption will never be perfectly justified in reality, and its failure will impact on the performance of CDS estimators to some extent. Model selection or model averaging are often used to try to identify the correct model, but uncertainty about the model form will inflate variances and potentially lead to some bias as well. In contrast, strip transect estimators do not require a specification of the detection function. Of course, the properties of strip transect estimators depend on the mean detection rate in the strip, but beyond that there is no particular impact of one functional form rather than another. The simulations with c = 1 are also favourable to both CDS and strip transects as they mean that distances are independent and uniformly distributed. This is relaxed to some extent by the overdispersed simulations with c equal to 2 and 3, however the Dirichlet process used is still centred on the uniform distribution. When there is a more severe departure from uniformity, CDS estimators will be biased, and strip transect estimators will also be affected to some degree.
The choice of methodology for assessing abundance, as well as the determination of required sample size, should be informed by consideration of all relevant biases and by the likely achievable precision. The results in this paper will help in this process, by providing a sample size formula reflecting the penalty due to unknown detection parameters, theoretical and simulation results on the size of this penalty, and a comparison of the mean squared errors of strip and line transect estimators in a wide-ranging simulation.