Efficient and Unbiased Estimation of Population Size

Marcos Cruz; Domingo Gómez; Luis M. Cruz-Orive

doi:10.1371/journal.pone.0141868

Abstract

Population sizing from still aerial pictures is of wide applicability in ecological and social sciences. The problem is long standing because current automatic detection and counting algorithms are known to fail in most cases, and exhaustive manual counting is tedious, slow, difficult to verify and unfeasible for large populations. An alternative is to multiply population density by some reference area but, unfortunately, sampling details, handling of edge effects, etc., are seldom described. For the first time we address the problem using principles of geometric sampling. These principles are old and solid, but largely unknown outside the areas of three dimensional microscopy and stereology. Here we adapt them to estimate the size of any population of individuals lying on an essentially planar area, e.g. people, animals, trees on a savanna, etc. The proposed design is unbiased irrespective of population size, pattern, perspective artifacts, etc. The implementation is very simple: it is based on the random superimposition of coarse quadrat grids. Also, an objective error assessment is often lacking. For the latter purpose the quadrat counts are often assumed to be independent. We demonstrate that this approach can perform very poorly, and we propose (and check via Monte Carlo resampling) a new theoretical error prediction formula. As for efficiency, counting about 50 (100) individuals in 20 quadrats can yield relative standard errors of about 8% (5%) in typical cases. This fact effectively breaks the barrier hitherto imposed by the current lack of automatic face detection algorithms, because semiautomatic sampling and manual counting become an attractive option.

Citation: Cruz M, Gómez D, Cruz-Orive LM (2015) Efficient and Unbiased Estimation of Population Size. PLoS ONE 10(11): e0141868. https://doi.org/10.1371/journal.pone.0141868

Editor: Tobias Preis, University of Warwick, UNITED KINGDOM

Received: June 4, 2015; Accepted: October 14, 2015; Published: November 4, 2015

Copyright: © 2015 Cruz et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited

Data Availability: All relevant data are within the paper.

Funding: The authors have no support or funding to report.

Competing interests: The authors have declared that no competing interests exist.

Introduction

The size of a population is the total number of individual feature elements or units (e.g. organisms) constituting the population. If the latter sits in an open area, then its elements can in principle be identified and counted from aerial photographs (see for instance [1]). Often the natural reaction is to count all the elements, a task that is usually carried out ultimately by hand. In practice, however, this may be bearable for population sizes of the order of 1000 elements, although it is tedious, slow, highly dependent on the skill of the operator and difficult to verify. It is easy to realize that proper sampling will be imperative in general. Unfortunately, proper sampling strategies are usually lacking in this context. As far as human crowds are concerned, size estimates differ widely among convention organizers, media and police [2]. Similar remarks tend to apply in ecology and other sciences. Population density is usually estimated with quadrats, but no clear criteria are given on how to place the quadrats, on how to correct the edge effects arising in quadrat counting, etc. The lack of a well defined sampling mechanism precludes not only the unbiased estimation of population size, but also a reliable prediction of the corresponding error variance.

Automatic image analysis is not yet a reliable alternative. For instance, the automatic detection of human faces in still pictures has been studied carefully [3, 4]. As explained in [5], however, state of the art human face detectors are known to perform poorly in general due to a host of artifacts such as large pose and illumination variations, occlusions, expression variations, out-of-focus blur, and low image resolution. In the particular case of human crowds, alternative methods are arising from the recent increase in the usage of technology [6].

Here we propose design based systematic sampling to estimate population size with low and predictable errors. In mathematical terms, the problem is to estimate the finite size N of a bounded population Y of particles on an observation plane. In general, a particle is defined as a compact and connected subset separated from other particles. In the human crowd context a particle is the planar projection of a human head, or a clearly distinguishable fragment of it, as observed on a photograph of a crowd, see Fig 1A. The purpose of this paper is twofold. First, to propose a design unbiased estimator N̂ of N, which means that, up to practical artifacts, the mean of the error over all possible samples is zero. This is a mathematical property warranted by the sampling design, and it does not depend on population pattern and size. The only practical requirement is that the particles are unambiguously distinguishable for counting. The second purpose is to predict the error variance Var(N̂). The latter task is non trivial because the observations will be systematic, hence dependent in general. Here no variance estimator exists which is always unbiased.

Fig 1. Systematic quadrat sampling and unbiased counting rule to estimate feature number in the plane.

(A) Spectators in a football match (Bilbao, 1966) (original picture taken from [14], with permission of the author). A square grid of quadrats was superimposed uniformly at random to estimate the total head number. The quadrat marked with a yellow arrowhead is magnified in (E). Size of the entire picture: 1796 × 1200 pixels. (B) Corresponding associated points, as used in our Monte Carlo automatic resampling, with the same grid superimposed on them (quadrat side length t = 50, fundamental box side length T = 250 pixels). (C) Forbidden line rule to remove edge effects in manual counting [11]. The green particles are counted in the quadrat, the red ones are not because they hit the extended forbidden edge (in red). (D) Application of the forbidden line rule to each of the four quadrats shown yields the total correct count of one particle. (E) With the latter rule, only arrowed heads are counted.

https://doi.org/10.1371/journal.pone.0141868.g001

The method proposed to estimate N is adapted from well known principles of geometric sampling for stereology, and they are widely used in many disciplines—for references see for instance [7] and [8]. Early papers on particle counting methods are [9–11]; for a review see [12].

The design unbiased estimator of N is described in section Design based model: Unbiased estimation of N. In section Variance Estimators we describe two alternative variance estimators, namely a traditional, naive one relying on independence assumptions, and a new one based on relatively recent results. Their relative performance is checked via Monte Carlo simulations on two real pictures (Fig 1A and Fig 7b of [13]) described in section Material. The simulations are facilitated by replacing each sampling unit (human head) with an associated point. This process, plus the concrete details of the simulations, the results, and their interpretation, are described in section Results and Discussion. Practical estimation procedures for arbitrary particles (as opposed to point particles) are illustrated in section Sizing a crowd in practice: numerical step-by-step procedures. Finally, section Concluding Remarks is devoted to final comments and conclusions.

Materials and Methods

Design based model: Unbiased estimation of N

The population of particles is assumed to be fixed and deterministic, and it is represented by a discrete, finite set Y = {y₁, y₂, …, y_N}, where y_i denotes the ith particle. It is also assumed that Y is bounded, that is, Y can be contained in a disk of finite radius. To estimate the population size N we use systematic sampling with a uniform random (UR) test system of quadrats (also called a ‘grid of quadrats’) of a given, arbitrary orientation in the plane. Here we adopt a square grid of square quadrats. Initially, the lower left corners of the quadrats sit at the vertices of a fixed square lattice of points whose fundamental tile or box J₀ is a square of side length T (also called the ‘lattice size’) and area a = T². The quadrats have side length t, (0 < t ≤ T < ∞) and area a′ = t². The UR condition is essential to the method. Strictly, a UR systematic grid Λ_z of quadrats is generated by shifting the lower left corner of a quadrat to a point z ∈ J₀ which is UR within J₀, thus dragging the whole quadrat grid together (Fig 1B). In practice, one simply throws the grid ‘at random’ over the target population Y. The intersection Y ∩ Λ_z is a systematic quadrat sample from Y. Let Q = Q(Y ∩ Λ_z) denote the sample size, namely the total number of particles captured by the quadrats. Then,
$$\hat{N} = \frac{a}{a'}\,Q = \left(\frac{T}{t}\right)^{2} Q \tag{1}$$
is an unbiased estimator of N, that is, the average error of N̂ over all the potential UR superimpositions of the grid is zero. For completeness, a proof of the unbiasedness of N̂ is given in S1 Appendix. The ratio a′/a ≤ 1 is the sampling fraction. Note that the estimation is direct, that is, no use is made of any quantity (such as a reference area, etc.) other than the relevant number count Q. By the same token, the estimator is scale independent. Another advantage of the proposed counting method is that only those particles sampled by the quadrats need to be examined; the rest may be ignored.
Note also that the grid is theoretically unbounded, hence grazing quadrats hitting particles must not be ignored.
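
As an illustration only, the following Python sketch applies the estimator to a list of associated point coordinates. It assumes that Eq (1) has the form N̂ = (T/t)² Q, consistent with the factor (T/t)² used in the penguin example of section Planning a population sizing design from the outset. The function names `count_in_grid` and `estimate_population`, and the half-open quadrat convention, are choices made for this sketch and are not taken from the paper.

```python
import random

def count_in_grid(points, t, T, z):
    """Count points captured by an unbounded grid of t-by-t quadrats.

    Quadrat lower left corners sit at z + (i*T, j*T) for all integers i, j.
    Quadrats are treated as half-open, [x0, x0 + t) x [y0, y0 + t), so that
    no point can be counted in two quadrats (assumption of this sketch).
    """
    zx, zy = z
    count = 0
    for x, y in points:
        # Offset of the point from the lower left corner of its lattice tile.
        dx = (x - zx) % T
        dy = (y - zy) % T
        if dx < t and dy < t:
            count += 1
    return count

def estimate_population(points, t, T, rng=random):
    """Return (N_hat, Q) for one uniform random superimposition of the grid."""
    z = (rng.uniform(0.0, T), rng.uniform(0.0, T))  # UR shift within J0
    Q = count_in_grid(points, t, T, z)
    return (T / t) ** 2 * Q, Q

# Hypothetical coordinates, with the grid parameters of Fig 1B.
points = [(10, 10), (60, 10), (260, 10)]
print(count_in_grid(points, t=50, T=250, z=(0.0, 0.0)))
```

Because the modulo operation treats the grid as unbounded, quadrats that only graze the population are included automatically.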

It only remains to define an unbiased counting rule to obtain Q, namely a rule that copes with edge effects and ensures that all the particles have identical probabilities of being sampled with a UR grid of quadrats. A convenient rule for manual counting is the forbidden line rule, see [7, 11, 12], and Fig 1C–1E. A particle is counted in a quadrat only if it has points in common with the quadrat but does not hit the extended forbidden line of the quadrat. Alternatively, the associated point rule [7, 9, 10] is better suited for automatic particle detection, and for computer aided simulations. The rule establishes that a particle is counted in a quadrat if its associated point (namely a point attached to the particle according to a rule fixed a priori for all particles) is contained in the quadrat. Fig 1B shows all the associated points corresponding to the spectators in Fig 1A. Particle counting is now straightforward but, unfortunately, obtaining Fig 1B from Fig 1A is generally an arduous task, as described in the Material subsection below.

Note that two different unbiased rules such as the preceding ones do not need to yield identical counts in a given quadrat—unbiasedness implies coincidence in the mean.

Variance Estimators

We explore the performance of two alternative error variance estimators—namely the naive one based on independence, and a more elaborate one—by Monte Carlo resampling on digitized versions of Fig 1A and Fig 7b of [13]. True variances are denoted by Var(⋅), whereas variance estimators are denoted by var(⋅).

Estimation of the error variance of N̂ assuming independence between quadrats.

The first estimator, var_ind(N̂), is often used [2], and it assumes independence between quadrats. Suppose that the sample Y ∩ Λ_z consists of n ≥ 2 non empty systematic quadrats capturing {q₁, q₂, …, q_n} particles, respectively. Then q₁ + q₂ + … + q_n = Q is the total number of sampled particles. Further let var(q₁) denote the sample variance of the {q_i}. Then,
$$\mathrm{var}_{\mathrm{ind}}(\hat{N}) = \left(\frac{T}{t}\right)^{4} n \,\mathrm{var}(q_1). \tag{2}$$
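
A minimal Python sketch of the independence-based calculation follows. It assumes that Eq (2) takes the standard form (T/t)⁴ · n · var(q₁), which is what Eq (1) implies when the n quadrat counts are treated as independent and identically distributed. The function name `var_ind` mirrors the notation used in section Results and Discussion, and the example counts are hypothetical.

```python
from statistics import variance

def var_ind(counts, t, T):
    """Naive variance estimate of N_hat, treating quadrat counts as independent.

    Assumed form: (T/t)^4 * n * var(q), with var(q) the sample variance of the
    non empty quadrat counts and n the number of non empty quadrats.
    """
    nonempty = [q for q in counts if q > 0]  # the text uses non empty quadrats
    n = len(nonempty)
    if n < 2:
        raise ValueError("at least two non empty quadrats are required")
    # statistics.variance returns the sample variance (divisor n - 1).
    return (T / t) ** 4 * n * variance(nonempty)

counts = [1, 2, 3, 0]  # hypothetical quadrat counts
v = var_ind(counts, t=50, T=250)
print(v)
```

A coefficient of error can then be obtained as the square root of this value divided by the estimate N̂.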

Estimation of the error variance of N̂ using the Cavalieri slices design.

The second error variance predictor accounts for quadrat dependence, and it is based on G. Matheron’s transitive theory [15]. In [16] the target parameter was volume, and the variance estimator was derived for a volume estimator obtained from Cavalieri slices produced by parallel systematic slabs normal to an arbitrary sampling axis. In our context the slabs are planar stripes of thickness t > 0 a constant distance T > t apart (namely the distance between the left hand side edges of consecutive stripes, say), with a UR position of the left hand side edge in the interval [0, T). The estimator was extended to the case in which the target parameter is particle number in [17–19]. Here we adopt a suitable combination of these methods. The idea is to regard the quadrat sample as a two stage sample. The first stage involves Cavalieri stripes, and in the second, each stripe is subsampled in turn by a perpendicular series of Cavalieri stripes with the same parameters t, T. The result is clearly equivalent to a grid of systematic quadrats with the latter parameters. Here, the required notation is different from the other sections. Define

τ = t/T ∈ (0, 1], stripe sampling fraction.
n: number of stripes encompassing the particle population, (n > 2).
n_i: number of quadrats subsampled within the ith stripe, i = 1, 2, …, n.
q_ij: number of particles captured by the jth quadrat within the ith stripe, j = 1, 2, …, n_i.
Q_oi, Q_ei: total numbers of particles captured by the odd numbered, and by the even numbered quadrats, respectively, within the ith stripe.
Q_i = q_i1 + q_i2 + … + q_in_i, total number of particles sampled in the ith stripe. Note that Q_i = Q_oi + Q_ei.
Q = Q₁ + Q₂ + … + Q_n, total number of sampled particles.
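
The bookkeeping implied by these definitions can be sketched in Python as follows. The nested list layout `q[i][j]` and the function name `stripe_totals` are assumptions made for illustration, and the counts are invented.

```python
def stripe_totals(q):
    """Return a list of (Q_i, Q_oi, Q_ei), one tuple per stripe.

    q[i][j] is the count in the (j + 1)th quadrat of the (i + 1)th stripe,
    so Python index 0 corresponds to quadrat number 1 (an odd numbered one).
    """
    totals = []
    for stripe in q:
        Q_o = sum(stripe[0::2])  # quadrats numbered 1, 3, 5, ...
        Q_e = sum(stripe[1::2])  # quadrats numbered 2, 4, 6, ...
        totals.append((Q_o + Q_e, Q_o, Q_e))
    return totals

# Hypothetical counts for three stripes with 3, 1 and 2 quadrats.
q = [[1, 2, 3], [4], [0, 5]]
totals = stripe_totals(q)
Q = sum(Q_i for Q_i, _, _ in totals)  # total number of sampled particles
print(totals, Q)
```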

Now, the estimator given by Eq 1 may be written N̂ = τ⁻² Q. The following estimator of Var(N̂) is obtained from Eq (3.3) of [18] with q = 0, namely, (3) (4) The first term in the right hand side of Eq (3) estimates the between stripes variance contribution, whereas τ⁻⁴ ν_n estimates the within stripes contribution (namely the contribution of the variation between quadrats within stripes). The latter contribution may be estimated using the splitting estimator given in [17]. The relevant within stripes variance term is obtained from Eq (4.1) of the latter paper (with ), namely: (5) Remarks: Let N_i denote the total, true number of particles captured by the ith entire stripe (this notation is different from that used in other sections). Then the Cavalieri stripes estimator of N, namely τ⁻¹(N₁ + N₂ + … + N_n), is unbiased, and its variance is estimated by the first term in the right hand side of Eq (3). Further, an unbiased estimator of N_i is τ⁻¹ Q_i. The random error e_i of this estimator is assumed to have zero mean and variance . In the derivation of Eq (5), the within stripe errors {e₁, e₂, …, e_n} are assumed to be independent between stripes, whereby the right hand side of Eq (5) estimates the quantity .

Material

To check our model and estimators we chose the crowd pictures shown in Fig 1A and Fig 7b of [13]. To facilitate programming each particle was replaced with its associated point which was approximately the centre of the smallest rectangle enclosing a sampling unit (i.e. a head or head fragment). For Fig 1A, this task was performed with the aid of the OpenCV software (Open Source Computer Vision), with tedious additional manual editing. We chose Fig 1A since most of the visible faces are frontal, and well resolved, hence OpenCV was able to detect about 80% of the faces, including some false detections. However, this is generally not the case, hence the failure rate for OpenCV may be expected to be higher in most real crowd images. Units hitting the left hand side border of the picture were discarded, the ones hitting the right hand side border were retained; this rule would eliminate double counting in potentially adjacent pictures.
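
As a rough illustration of the associated point construction, the sketch below converts enclosing rectangles into associated points and applies the border rule just described. The `(x, y, w, h)` rectangle format (top left corner, width and height, in pixels), the function name and the sample values are assumptions of this sketch. It does not reproduce the authors' OpenCV workflow or their manual editing.

```python
def associated_points(boxes):
    """Map enclosing rectangles to associated points (rectangle centres).

    boxes: iterable of (x, y, w, h) with (x, y) the top left corner in pixels.
    Rectangles hitting the left hand side border (x <= 0) are discarded;
    those hitting the right hand side border are retained.
    """
    points = []
    for x, y, w, h in boxes:
        if x <= 0:
            continue  # unit hits the left border of the picture
        points.append((x + w / 2, y + h / 2))
    return points

# Hypothetical rectangles in a picture 100 pixels wide.
boxes = [(0, 5, 10, 10), (20, 30, 10, 20), (95, 0, 10, 10)]
print(associated_points(boxes))
```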

For Fig 7b of [13] the associated point coordinates were borrowed from the dataset mentioned in [13].

Fig 1A and Fig 7b of [13] contain N = 1120 (Fig 1B) and N = 4633 associated points, respectively.

In practice a picture such as Fig 1A or Fig 7b of [13], does not need to constitute a population of interest in itself, but rather a sample from a high-resolution panoramic image. For our present purposes, however, each of the two pictures is regarded as a target population in itself.

Results and Discussion

Empirical assessment of the variance estimators by Monte Carlo resampling with systematic quadrats on real pictures

The empirical distribution of N̂, and the performances of var_ind(N̂) (Eq 2) and of the Cavalieri based predictor (Eq 3), were checked by Monte Carlo resampling on Fig 1A and Fig 7b of [13] for the values of t and T considered in Fig 2 and Fig 3. To increase precision [20], each picture was tilted 60° prior to sampling in order to avoid parallelism between the sampled stripes and the edges of the picture, see Fig 1B. For each pair (t, T) a total of K² = 32² = 1024 replicated superimpositions of the grid Λ_z onto Y were generated, corresponding to K² replications of the point z within J₀. Instead of generating independent replications, we adopted a systematic design, which should be expected to be more efficient in most cases. Thus, a UR square subgrid of K × K points of coordinates {(x_i, y_j), i, j = 1, 2, …, K} was generated within J₀ with a gap Δ = T/K between points, namely,
$$x_i = (U_1 + i - 1)\,\Delta, \qquad y_j = (U_2 + j - 1)\,\Delta, \tag{6}$$
where U₁, U₂ are independent UR numbers in the interval [0, 1). Therefore, the whole set of K² replications required only a single pair of random numbers. For each pair (U₁, U₂), relabel the K² subgrid points as {z_k, k = 1, 2, …, K²}. For each k, the corresponding sample total,
$$Q_k = Q(Y \cap \Lambda_{z_k}), \tag{7}$$
was computed automatically with the aid of a simple point-in-polygon algorithm (http://www.ariel.com.au/a/python-point-int-poly.html). A particular superimposition Y ∩ Λ_{z_k} is illustrated in Fig 1B. The corresponding design unbiased estimator of the population size N was,
$$\hat{N}_k = \left(\frac{T}{t}\right)^{2} Q_k. \tag{8}$$
The empirical mean and variance of the N̂_k were computed respectively as follows, (9) (10) These values are supposed to be very close to their respective true values. Empirical distributions of the N̂_k are displayed in Fig 2, confirming unbiasedness and a moderate dispersion.
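
The resampling scheme described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the K × K shifts are taken as ((U₁ + i)Δ, (U₂ + j)Δ) for i, j = 0, …, K − 1, quadrats are treated as half-open squares, the estimator is taken as (T/t)² Q_k, and the empirical variance is computed about the empirical mean with divisor K². All function names are hypothetical, and a modulo test stands in for the point-in-polygon routine.

```python
import math
import random

def rotate(points, degrees):
    """Rotate all points about the origin (mimics tilting the picture)."""
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def count_in_grid(points, t, T, z):
    """Number of points inside the half-open t-by-t quadrats of the shifted grid."""
    zx, zy = z
    return sum(1 for x, y in points
               if (x - zx) % T < t and (y - zy) % T < t)

def systematic_offsets(T, K, rng):
    """K*K grid shifts with gap T/K, generated from one pair (U1, U2)."""
    delta = T / K
    u1, u2 = rng.random(), rng.random()
    return [((u1 + i) * delta, (u2 + j) * delta)
            for i in range(K) for j in range(K)]

def monte_carlo(points, t, T, K=32, tilt=60.0, seed=0):
    """Return (empirical mean, empirical variance, list of K*K estimates)."""
    rng = random.Random(seed)
    tilted = rotate(points, tilt)
    estimates = [(T / t) ** 2 * count_in_grid(tilted, t, T, z)
                 for z in systematic_offsets(T, K, rng)]
    mean = sum(estimates) / len(estimates)
    # Assumption: dispersion about the empirical mean, divisor K*K.
    var = sum((e - mean) ** 2 for e in estimates) / len(estimates)
    return mean, var, estimates

# Hypothetical population of 200 uniformly scattered points.
gen = random.Random(1)
population = [(gen.uniform(0, 1000), gen.uniform(0, 600)) for _ in range(200)]
mean, var, estimates = monte_carlo(population, t=64, T=256)
print(round(mean, 3), round(math.sqrt(var) / mean, 3))
```

In this sketch, when t is an integer multiple of Δ every point is captured by the same number of shifts, so the empirical mean reproduces the true size up to floating point effects. For other values of t the mean is only approximately equal to N.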

Fig 2. Empirical, Monte Carlo histograms of crowd number estimates.

(A) Empirical distributions of the number estimator given by Eq (1), obtained for Fig 1A from 1024 Monte Carlo superimpositions of quadrat grids of different sizes. The side length of the fundamental box was T = 250 pixels. As expected, the histogram means always coincide with the true crowd size because the estimator is unbiased. (B) Analogous data for Fig 7b of [13]. Here T = 150 pixels.

https://doi.org/10.1371/journal.pone.0141868.g002

Fig 3. Monte Carlo empirical square coefficient of error and corresponding mean predictors.

(A, B, C) Monte Carlo results corresponding to Fig 1A, for fundamental box side lengths T = 200, 250, 300 pixels, respectively, and for different quadrat side lengths in each case. The equivalent mean sample sizes are also shown. The empirical square coefficient of error (= error variance divided by N²) is represented in blue, whereas the corresponding mean predictors obtained with Eqs (2) and (3) are represented in red and black colour, respectively. Grey dots represent all the replicated values of . The dark grey dots lie between the 2.5% and 97.5% quantiles. The broken horizontal lines correspond to 5% and 10% coefficients of error, respectively. (D, E, F) Analogous data for Fig 7b of [13]; here T = 120, 150, 180 pixels, respectively.

https://doi.org/10.1371/journal.pone.0141868.g003

We also computed the corresponding K² replicates of each of the two variance estimators. The resulting empirical square coefficients of error, (11) (12) are compared with the corresponding empirical (‘true’) value in Fig 3.

The estimator var_ind(N̂) showed a poor performance. This is not unexpected since the independence assumption clearly fails. In addition it looks a bit paradoxical that the corresponding predictor does not decrease as t and Q increase. This can be explained from the following two facts: first, the coefficient of variation among quadrat contents increases with quadrat size t, and second, the number of non empty quadrats increases relatively slowly as t increases for each value of T. The corresponding graphs are not shown; instead, the empirical distribution of the non empty quadrat contents is displayed in Fig 4 for different quadrat sizes. The contents of large quadrats tend to exhibit a bimodal distribution approaching an unfavourable ‘U’ shape which causes the coefficient of variation to increase abnormally. The bimodality is probably due to perspective effects and to grazing quadrats contributing low quadrat counts. In addition, Fig 1A roughly consists of two different subpopulations: spectators sitting in the front ranks are more sparse than the remaining, standing spectators.

Fig 4. Empirical probabilities of particle number per non-empty quadrat.

Empirical probabilities of the number of particles in non-empty quadrats in Fig 1A (top row) and Fig 7b of [13] (bottom row). Quadrat sizes t = 33.3, 50, 100 pixels and t = 15, 20, 36 pixels are considered for Fig 1A and Fig 7b of [13], respectively.

https://doi.org/10.1371/journal.pone.0141868.g004

The results for the Cavalieri based predictor given by Eq (3) look more encouraging, but they are still subject to improvement.

Sizing a crowd in practice: numerical step-by-step procedures

Example of Fig 1B

The parameters of the grid are t = 50, T = 250, hence τ = t/T = 0.2. The non empty quadrats in Fig 1B are contained in the 6 vertical stripes numbered {1, 2, …, 6}. The corresponding particle counts in the individual quadrats (from bottom to top in the figure) are displayed in Table 1.

Table 1. Individual quadrat counts and preliminary calculations corresponding to Fig 1B.

https://doi.org/10.1371/journal.pone.0141868.t001

The estimate of N is
$$\hat{N} = \tau^{-2} Q = 0.2^{-2} \times 50 = 1250. \tag{13}$$
Recall that the true value was N = 1120. Now, Eq (5) yields, (14) Further, Eq (3) yields, (15) Thus, the estimate of the percent coefficient of error (or relative standard error) of the number estimator is, (16) which is a reasonable precision taking into account that only 50 particles were counted. The between and within slices contributions are and , respectively.

On the other hand, the naive variance estimate assuming independence between quadrats yields, according to Eq (2), (17) which corresponds to a coefficient of error of .

The 1024 Monte Carlo samples yielded an empirical, nearly true variance . Thus, the estimate obtained above with Eq (3) was fairly satisfactory. On the other hand, the Monte Carlo mean of var_ind(N̂) was 19129.27, namely a gross overestimate, as illustrated in Fig 3B.

Planning a population sizing design from the outset

Practical criteria to design a grid which is efficient and convenient to use are:

Aim at a total count Q of between 50 and 150 particles, according to whether the pattern of the particles is judged to be fairly homogeneous (i.e. the population density is seen to vary little in different regions of the picture), or relatively heterogeneous.
Aim at counting no more than 4 or 5 particles per quadrat.

The preceding criteria imply that the planned number of nonempty quadrats may lie between 20 and 50.

It is worth emphasizing that no ‘guess’ or pilot estimate of the target size is required to plan the sampling design. Furthermore, the method will work for any population size.

As an example consider Fig 5. The size of the relevant part of the picture, namely of the region containing the penguins, was approximately 2359 × 826 pixels. The inhomogeneity of the population pattern suggests aiming at a count of Q = 150 in about 50 quadrats. To obtain a total of 50 quadrats, we require a fundamental box side length of T = √(2359 × 826/50) ≈ 197 pixels. We adopted T = 200 pixels. Further, since we aim at counting Q = 150 penguins, the mean number of counts per quadrat will be about 3, which sounds reasonable. To capture 1 to 6 penguins in a quadrat, inspection of the picture suggests a quadrat side length of t = 30 pixels. Random superimposition of the grid at 45° tilting yielded the following 76 quadrat counts: (18) corresponding to the quadrats shown in Fig 6. The 12 subsets within curly brackets correspond to the 12 tilted stripes or ‘slices’ seen in Fig 5. The total count was Q = 123 and (T/t)² = 400/9, whereby N̂ = (400/9) × 123 ≈ 5467 penguins. Application of Eq (3) yielded . Counting was fairly cursory (without too much attention) and took only a few minutes. Note that the error variance is reasonably low, despite the inhomogeneous distribution of the penguin population. A factor contributing to the low error variance in this case was the smoothness of the total slice counts, {1, 4, 10, 21, 15, 19, 15, 16, 12, 5, 4, 1}, owing in part to the fact that the test grid was conveniently tilted 45°. Theory shows that an important component of the variance trend is the sum of squares of the jumps of the measurement function (of which the preceding sequence is a systematic sample), see [21, 22]. Thus, aiming at a smooth dome shape of such a sequence is important [23].
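
The arithmetic of this planning example can be retraced with a few lines of Python. The sketch only uses figures quoted in the text (region size, planned number of quadrats, adopted T and t, and the slice totals). The helper name `plan_box_side` is hypothetical, and the estimator is taken as (T/t)² Q.

```python
import math

def plan_box_side(width, height, n_quadrats):
    """Box side length T giving about n_quadrats tiles over a width x height region."""
    return math.sqrt(width * height / n_quadrats)

# Planning step: about 50 quadrats over the 2359 x 826 pixel penguin region.
T_suggested = plan_box_side(2359, 826, 50)

# Estimation step with the adopted grid (T = 200, t = 30 pixels).
T, t = 200, 30
slice_counts = [1, 4, 10, 21, 15, 19, 15, 16, 12, 5, 4, 1]  # totals per slice
Q = sum(slice_counts)
N_hat = (T / t) ** 2 * Q

print(round(T_suggested, 1), Q, round(N_hat))
```

The suggested side length of roughly 197 pixels is rounded to the more convenient T = 200, and the twelve slice totals add up to the reported Q = 123.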

Fig 5. A practical protocol for systematic quadrat sampling.

Aerial view of an emperor penguin colony on November 2nd, 2012. Photograph by Robin Cristofari, from an altitude of ca. 1,000 feet, Fig 3 of [1]. Picture size: 2359 × 2808 pixels. The adopted quadrat grid with T = 200, t = 30 pixels was superimposed uniformly at random with a tilting of 45° to reduce error variance, see text.

https://doi.org/10.1371/journal.pone.0141868.g005

Fig 6. Quadrats used to estimate the total number of penguins in Fig 5.

Magnified version of the 50 quadrats marked in Fig 5.

https://doi.org/10.1371/journal.pone.0141868.g006

Concluding Remarks

As stated in the Introduction, apart from the basic requirement that all the particles in the population should be unambiguously identifiable for counting, there are no other practical limitations to the method. If the sampling protocol is respected, then the resulting estimate is necessarily design unbiased in all cases, irrespective of population pattern and size. Unbiasedness means that the mean of all potential estimates obtained by repeated sampling will coincide with the true population size N. This is a truth founded on the mathematics of the sampling design (see S1 Appendix), and it cannot be verified by experiment unless the true value of N is known, (see e.g. Fig 2).

Our conclusions are the following. (a) For the first time we have implemented a direct, unbiased and efficient design to estimate population size in still pictures, provided that every sampling unit is distinguishable for counting. (b) After sampling, manual counting is a realistic option because sample sizes from 50 to 120 will usually yield unbiased number estimates with a moderate relative standard error, irrespective of population pattern and size. This fact effectively breaks the barrier hitherto imposed by the unavailability of automatic face detection in general. (c) The quadrat grid parameters can be easily designed in each case combining the preceding values with the convenience criterion that the number of individuals counted per quadrat should vary from 1 to 5. (d) A new error variance prediction formula has also been developed and Monte Carlo checked to have a reasonable performance. This probably owes to the fact that the formula takes the correlation structure of the data into account.

Naturally the present method can in principle be used to count people, animals, or any kind of distinguishable objects.

Supporting Information

S1 Appendix. Proof of unbiasedness of N̂.

https://doi.org/10.1371/journal.pone.0141868.s001

(PDF)

Author Contributions

Conceived and designed the experiments: LMCO MC. Performed the experiments: MC DG. Analyzed the data: MC. Contributed reagents/materials/analysis tools: MC DG. Wrote the paper: LMCO MC.

References

1. Ancel A, Cristofari R, Fretwell PT, Trathan PN, Wienecke B, Boureau M, et al. Emperors in Hiding: When Ice-Breakers and Satellites Complement Each Other in Antarctic Exploration. PLoS ONE. 2014;9(6):e100404. pmid:24963661
2. Watson R, Yip P. How many were there when it mattered? Significance. 2011;8(3):104–107.
3. Viola P, Jones M. Rapid object detection using a boosted cascade of simple features. In: Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on. vol. 1. IEEE; 2001. p. I–511.
4. Friedman J, Hastie T, Tibshirani R. Additive logistic regression: a statistical view of boosting. Ann Stat. 2000;28(2):337–374.
5. Liao S, Jain AK, Li SZ. Unconstrained face detection. MSU-CSE-12-15, Department of Computer Science, Michigan State University; 2012.
6. Botta F, Moat HS, Preis T. Quantifying crowd size with mobile phone and Twitter data. Royal Society Open Science. 2015;2(5):150162. pmid:26064667
7. Baddeley AJ, Jensen EBV. Stereology for Statisticians. Chapman & Hall/ CRC, London; 2005.
8. Howard CV, Reed MG. Unbiased Stereology. Three-dimensional Measurement in Microscopy. 2nd ed. Oxford: Bios/ Taylor & Francis; 2005.
9. Miles RE. On the elimination of edge effects in planar sampling. In: Harding E F, Kendall D G, editors. Stochastic Geometry: A Tribute to the Memory of Rollo Davidson. Wiley, London; 1974. p. 228–247.
10. Miles RE. The sampling, by quadrats, of planar aggregates. J Microsc. 1978;113(3):257–267.
11. Gundersen HJG. Notes on the estimation of the numerical density of arbitrary profiles: the edge effect. J Microsc. 1977;111(2):219–223.
12. Baddeley AJ. Spatial sampling and censoring. In: Barndorff-Nielsen O E, Kendall W S, van Lieshout M N M, editors. Stochastic Geometry: Likelihood and Computation. Chapman & Hall/ CRC, London; 1999. p. 37–78.
13. Idrees H, Saleemi I, Seibert C, Shah M. Multi-Source Multi-Scale Counting in Extremely Dense Crowd Images. In: Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on. IEEE; 2013. p. 2547–2554.
14. Cancio R. Simplemente … Periodismo. Ediciones APM, Madrid; 2010.
15. Matheron G. The Theory of Regionalized Variables and its Applications. vol. 5. Les Cahiers du Centre de Morphologie Mathématique de Fontainebleau, No. 5. Fontainebleau: École Nationale Supérieure des Mines de Paris; 1971.
16. Gual Arnau X, Cruz-Orive LM. Variance prediction under systematic sampling with geometric probes. Adv Appl Probab. 1998;30(4):889–903.
17. Cruz-Orive LM. Precision of the fractionator from Cavalieri designs. J Microsc. 2004;213(2):205–211. pmid:14731303
18. Cruz-Orive LM. A general variance predictor for Cavalieri slices. J Microsc. 2006;222(3):158–165. pmid:16872414
19. Cruz-Orive LM, Geiser M. Estimation of Particle Number by Stereology: An Update. J Aerosol Med. 2004;17(3):197–212. pmid:15625812
20. Gundersen HJG, Jensen EBV, Kiêu K, Nielsen J. The efficiency of systematic sampling in stereology—reconsidered. J Microsc. 1999;193(3):199–211. pmid:10348656
21. Kiêu K, Souchet S, Istas J. Precision of systematic sampling and transitive methods. J Statist Plan Inf. 1999;77(2):263–279.
22. García-Fiñana M, Cruz-Orive LM. Improved variance prediction for systematic sampling on ℝ. Statistics. 2004;38(3):243–272.
23. Gundersen HJG. The smooth fractionator. J Microsc. 2002;207(3):191–210. pmid:12230489

