Table 1.
Bo municipal survey data tabulated by section.
Fig 1.
Flow chart for stratified sampling protocols.
This figure summarizes all of the optimization and control protocols for stratified sampling developed in this study. See text for a summary of each major protocol and its corresponding steps through the flow chart. The light brown parallelogram is the starting point for all protocols, the yellow diamonds are decision boxes, and the light green squares denote the process end states.
Table 2.
Neyman-optimized allocation as a function of sample size and stratum.
Fig 2.
Relative uncertainty of optimized Horvitz-Thompson population estimates.
Quantile boxplots (0.25, 0.75) showing the distribution of the stratified Horvitz-Thompson population estimates as a function of sample size and stratification protocol. The bar in each box is the median value of the estimate, while outliers deviating by one or more quantiles from the median are denoted as discrete points. (A) Control: all 20 sections are placed in a single stratum. (B) 4 strata, with proportional allocation for sample selection. (C) 4 strata, with Neyman allocation for sample selection. Persons per residence was used as the stratification variable, and there were 1,000 simulations for each boxplot.
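The difference between the proportional and Neyman allocations compared in panels (B) and (C) can be illustrated with a minimal sketch. The function names, stratum sizes, and standard deviations below are hypothetical and are not taken from the Bo survey; the sketch assumes simple random sampling within each stratum and leaves the allocations unrounded.

```python
def proportional_allocation(stratum_sizes, n):
    """Split n samples across strata in proportion to stratum size N_h."""
    total = sum(stratum_sizes)
    return [n * N_h / total for N_h in stratum_sizes]


def neyman_allocation(stratum_sizes, stratum_sds, n):
    """Split n samples in proportion to N_h * S_h (size times standard deviation)."""
    weights = [N_h * S_h for N_h, S_h in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    return [n * w / total for w in weights]


def stratified_ht_total(stratum_sizes, samples):
    """Horvitz-Thompson total, assuming simple random sampling within each stratum:
    each sampled value in stratum h is weighted by N_h / n_h."""
    return sum(N_h * sum(ys) / len(ys) for N_h, ys in zip(stratum_sizes, samples))


# Hypothetical strata (illustrative values only, not the survey data).
sizes = [500, 300, 150, 50]   # N_h: records per stratum
sds = [1.0, 2.0, 4.0, 8.0]    # S_h: spread of persons per residence in each stratum
prop = proportional_allocation(sizes, 100)
ney = neyman_allocation(sizes, sds, 100)
print("proportional:", prop)
print("Neyman:      ", ney)
```

In this sketch the Neyman rule shifts samples toward the small but highly variable stratum, which is the mechanism by which it can reduce the spread of the estimates relative to proportional allocation.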
Table 3.
Optimal cluster allocation as a function of sample size.
Table 4.
Neyman stratification of Bo sections by “residential structures per section” and “persons per section.”
Table 5.
A comparison of uncertainty for unstratified, proportional-, and Neyman-allocated population estimates.
Fig 3.
Single-stage cluster sampling.
Quantile boxplots for 1,000 stratified 4-level simulated single-stage cluster sampling trials using Horvitz-Thompson (H-T) estimation. The bar in each box is the median value of the estimate, while outliers deviating by one or more quantiles from the median are denoted as discrete points. Four selected sections are completely sampled in each simulation trial. (1) “Survey” is the measured value of the population of the 20 sections (25,954 persons). (2) 4L/4C (pers.): 4-cluster sample, sections stratified by “persons per section.” (3) 4L/4C (strs.): 4-cluster sample, sections stratified by “residential structures per section.” (4) 1L/4C: 4 clusters selected at random from the 20 available sections.
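A single-stage cluster simulation of this kind can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the section totals are randomly generated placeholders, the helper name `ht_cluster_total` is hypothetical, the 4 levels are formed by simply sorting sections into equal groups, and one cluster is drawn per stratum in the stratified case.

```python
import random
import statistics


def ht_cluster_total(strata, clusters_per_stratum, rng):
    """H-T population estimate from completely sampled clusters.

    strata: list of lists of cluster totals (persons per section).
    clusters_per_stratum: number of clusters drawn from each stratum.
    Each drawn cluster total is weighted by M_h / m_h.
    """
    estimate = 0.0
    for cluster_totals, m in zip(strata, clusters_per_stratum):
        chosen = rng.sample(cluster_totals, m)  # draw without replacement
        estimate += len(cluster_totals) / m * sum(chosen)
    return estimate


rng = random.Random(0)
# Hypothetical totals for 20 sections (placeholders, not the survey values).
section_totals = [rng.randint(100, 3000) for _ in range(20)]

# Assumed stratification: sort by total persons and cut into 4 levels of 5 sections.
ordered = sorted(section_totals)
four_levels = [ordered[i:i + 5] for i in range(0, 20, 5)]

stratified = [ht_cluster_total(four_levels, [1, 1, 1, 1], rng) for _ in range(1000)]
unstratified = [ht_cluster_total([section_totals], [4], rng) for _ in range(1000)]

print("true total:        ", sum(section_totals))
print("4L/4C spread (sd): ", statistics.pstdev(stratified))
print("1L/4C spread (sd): ", statistics.pstdev(unstratified))
```

Both designs sample 4 sections per trial; the sketch only contrasts how grouping similar sections into strata changes the spread of the resulting estimates.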
Table 6.
A comparison of simulation results for single-stage cluster sampling.
Fig 4.
Quantile boxplots for each of the 20 sections.
For each section, a quantile boxplot (0.25, 0.75) shows the distribution of the number of persons per residence, arranged in descending order of total section population. The bar in each box is the median value, while outliers deviating by one or more quantiles from the median are denoted as discrete points. The width of each box is proportional to the square root of the number of residential structures (i.e., records) in the section. Roma is an anomaly, with 4 residential structures and 139 total persons.
Fig 5.
Quantile boxplots for optimal 4-level stratification by “persons per residence.”
The 4-level stratification variable is “persons per residence” (Table 2-d). The quantile boxplots (0.25, 0.75) show the partitioning of the records by stratum for all 1,979 records. The bar in each box is the median value of persons per residence, while outliers deviating by one or more quantiles from the median are denoted as discrete points. The samples in a given stratum may be assigned from any of the 20 eligible sections. The optimized Neyman allocation has completely separated the 4 strata, so that no values of the stratification variable overlap between strata.
Fig 6.
Quantile boxplots for single-stage cluster stratification by (A) “residential structures per section” and (B) “total persons per section.”
(A) For the single-stage cluster sampling, the 20 sections were partitioned into 4 proportionally allocated stratification levels. Within each stratum, the sections are arranged in descending order of total persons. The stratification variable is the total number of residential buildings per section (see Table 4). The quantile boxplots show the partitioning by stratum of the 1,979 records in the database, although only a subset of 4 sections is drawn in any single simulation trial. The bar in each box is the median value of “persons per residence,” while outliers deviating by one or more quantiles from the median are denoted as discrete points. (B) Quantile boxplots showing stratification by total persons per section. This stratification approach requires that the population of each section be known, in contrast to stratification by residential structures per section.