On bootstrap based variance estimation under fine stratification

Alexis Habineza; Romanus Odhiambo Otieno; George Otieno Orwa; Nicholas Makumi

doi:10.1371/journal.pone.0292256

Abstract

The primary focus of all sample surveys is on providing point estimates for the parameters of primary interest, and on estimating the variance associated with those point estimates to quantify the uncertainty. Larger samples and better measurement tools can help to reduce the uncertainty of the point estimates. Numerous effective stratification criteria may be used in surveys to reduce the variance within strata. In a fine stratification design, the population is divided into numerous small strata, each containing a very small number of sampling units, such as one or two. This is done to ensure that certain characteristics or subgroups of the population are well represented in the sample. But with many strata, the sample size within each stratum can become small, potentially resulting in larger errors and less stable estimates. Variance estimation becomes difficult when there is only one unit per stratum. In that case, the collapsed stratum technique is the classical method for estimating the variance. This method, however, is biased and overestimates the variance. This paper proposes a bootstrap-based variance estimator for the population total under fine stratification, which overcomes the drawbacks of the previously explored estimation approaches. The properties of the estimator are also investigated. A simulation study and a practical application to Survey of Mental Health Organizations data were carried out to investigate the properties of the proposed estimator. The results show that the proposed estimator performs well.

Citation: Habineza A, Otieno RO, Orwa GO, Makumi N (2024) On bootstrap based variance estimation under fine stratification. PLoS ONE 19(6): e0292256. https://doi.org/10.1371/journal.pone.0292256

Editor: Saeed Mian Qaisar, CESi Engineering School: Ecole d’Ingenieurs CESi, FRANCE

Received: January 14, 2023; Accepted: September 6, 2023; Published: June 13, 2024

Copyright: © 2024 Habineza et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All data files are available in the PracTools package in R (https://rdrr.io/cran/PracTools/man/smho98.html).

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

1 Introduction

Instead of enumerating the entire population, a sample survey observes only the individuals in the sample for the purpose of estimating population characteristics. The sample characteristics are used to approximate those of the population. The inaccuracy in such an approximation is known as sampling error, and it is inherent and inescapable in all sampling designs. Nonetheless, when time and cost are considered, sampling results in considerable savings, both in observing the characteristics and in the subsequent handling of the data.

A variety of sample selection designs are available, and careful selection will provide accurate and dependable estimates. Rough estimates of the sample size n can be derived for each sampling strategy for the required degree of precision. The need for reliable estimates, often from very small samples with limited survey resources, along with the various framing and sampling procedures, leads to a complex survey design that uses different sampling techniques. For units drawn under a complex survey sampling design, the observed values of the variable of interest are neither independent nor identically distributed. In addition, survey processing strives to improve the quality and usability of survey data by eliminating estimation bias and meeting confidentiality rules, which raises the survey’s complexity [1]; examples can be found in [2, 3].

The fundamental purpose of all sample surveys is to obtain point estimates for the parameters of primary interest and to estimate the variance associated with those point estimates to quantify the uncertainty. Variance estimators and the related standard errors matter because the estimated variance of an estimator is the most critical indicator of its quality.

Estimating the sampling variance can be extremely difficult because of the complex sample design, non-linear estimators, and survey processing effects [4]. Simple and exact analytic formulae for variance estimation under various sample designs are given in [5]. No closed-form formulae for estimating variances exist when sample designs are more complicated or deployed in multiple phases. Furthermore, sophisticated weighting mechanisms make the variance estimation formulae for simple statistics such as totals challenging, even with simple sample designs. When there is no exact technique for computing unbiased estimates of the standard errors of point estimates, the only choice is to approximate the required quantities. One approach uses analytic techniques under simplifying assumptions about the sample design or the statistic whose variance is to be estimated; a different approach is based on replication techniques [6].

In a stratified sample design, the target population is divided into a finite number of subpopulations (strata) of homogeneous units that share at least one common trait, such as age, sex, educational or income level, geographical area, or economic status. Because the strata are defined to be homogeneous, stratification reduces the total variance of the estimator of the parameter of interest. Furthermore, because sampling is carried out independently across strata, stratification allows a flexible choice of sampling technique per stratum, for example simple random sampling (SRS) with or without replacement, or systematic sampling. Since the samples in the various strata are independent, each estimate and its related variance estimator are just the sums of the corresponding estimators within each stratum. As a result, the difficulty of finding a proper variance estimator for a stratified single-stage sample is reduced to the problem of determining a suitable variance estimator for the sampling design of each stratum [7–9]. This study focuses on the specific issue of fine stratification designs, in which the sample size per stratum is as small as n_i = 1 or n_i = 2 primary sampling units (PSUs) selected using SRS without replacement.

Variance estimation becomes difficult when there is only one unit per stratum. This scenario may arise under very fine stratification, either because each stratum has a sample size greater than one but only one unit responds, or because the sampling design itself imposes a single unit per stratum. For example, according to [10], the Canadian Health Measures Survey (CHMS) samples just a single PSU in one of its five strata, although CHMS estimates are required at the national level. Having a stratum with a single PSU is a fairly common problem. When there is only one PSU within a stratum, there is insufficient information with which to compute an estimate of that stratum’s variance. Some of the suggested solutions and their corresponding drawbacks are discussed in [9, 11–14].

The variance for two or three PSUs per stratum, on the other hand, is significant. A key technique for estimating the variance of an estimator under fine stratification is to collapse neighboring strata to form pseudo-strata with a higher number of PSUs and then estimate the variance from these pseudo-strata.

The method was first introduced in [15], although it frequently overestimates the variance of the estimator. Collapsing strata for variance estimation with one unit per stratum is covered in [3, 5, 16, 17]. In these circumstances, it is challenging to calculate variances directly from one sampled unit per stratum. According to [15, 18], auxiliary variables that are well correlated with the expected values of the stratum means are recommended for forming the collapsed strata in order to minimize the bias of the variance estimator. Unfortunately, this type of useful auxiliary information may not be easily accessible for all strata. Mantel and Giroux [10] presented a new technique for the CHMS based on variance factors from distinct sampling stages. Mosaferi [19] developed constrained empirical Bayes estimators of the variance for a one-unit-per-stratum sampling design. The author also compared the one-PSU-per-stratum design with the two-PSUs-per-stratum design, highlighting some inconsistencies of the proposed estimators that are due to the method-of-moments approach to parameter estimation.

The collapsed stratum technique is the strategy most commonly used in the literature for dealing with this problem, and it results in a positive bias, as discussed in Section 2. By replacing the collapsed stratum estimator with a kernel-weighted stratum neighborhood and using deviations from a fitted mean function, a nonparametric kernel-based technique for estimating the variance was developed in [9]. The authors demonstrated the superiority of their nonparametric method over the collapsed stratum variance estimator using the United States Consumer Expenditure Survey. A major weakness of nonparametric kernel-based regression over a finite range is the bias at boundary points: the bias of the estimators near the boundary decreases only at the expense of increasing variance. The trade-off between bias and variance has thus remained an issue. Most of the proposed alternatives to the collapsed stratum variance estimator are based on concomitant or auxiliary information; however, this type of desirable auxiliary information may not be readily available for all strata.

Fine stratification is a popular design as it allows the population to be divided into numerous small strata, each containing a relatively small number of sampling units. This is done to ensure that certain characteristics or subgroups of the population are well-represented in the sample and lead to more precise estimates for specific subgroups. Some examples include the Current Population Survey and National Crime Victimization Survey both conducted by the U.S. Census Bureau, and the National Survey of Family Growth conducted by the University of Michigan’s Institute for Social Research. Clearly, the fine stratification survey has proved useful in many applications as its point estimator is unbiased and efficient. In such situations, traditional variance estimation techniques may not perform well due to the limited number of observations in some strata.

This work proposes a bootstrap-based variance estimator for the population total under fine stratification as an additional method that overcomes the inadequacies of previously explored estimation approaches. The method involves repeatedly resampling from the original sample with replacement to create multiple pseudo-samples. The point estimator of interest (specifically, the total) is computed for each pseudo-sample. The variance of the point estimator is then calculated from the variability among these pseudo-estimates. The new method is detailed in Section 4.

The paper is structured as follows: Section 2 presents the collapsing technique for variance estimation for the population total, and Section 3 presents nonparametric kernel-based variance estimation for the population total. Section 4 develops the bootstrap-based variance estimator and its properties. Section 5 provides an empirical assessment of the findings, and Section 6 provides the conclusion.

2 Collapsing strata technique for variance estimation

When just one PSU is chosen per stratum, or when only one PSU in a stratum participates, strata or PSUs are combined to generate pseudo-strata (analytic strata) for variance estimation. The number of PSUs in some sample designs can be extremely large, especially in education and establishment surveys, where there can be thousands of first-stage units. In such cases, PSUs, strata, or both may be collapsed together.

Let the population total be estimated by t̂ = Σ_{i=1}^{H} t̂_i, where t̂_i is an unbiased estimator of the stratum total t_i. Assume that a single element k is selected from each stratum with inclusion probability π_k, so that the π_k sum to unity within the stratum. In particular, π_k = 1/N_i for all k in a stratum of N_i elements if simple random selection is used. After pairing the strata, let i and j refer to the two strata in i^th and j^th pair such that i = 1, …, H and j = 1, …, H. We suppose that the value of the study variable y_k is observed without error for each unit k in the sample s. Our goal is to estimate the population total: (1)

Let us define the indicator variable I_k = 1 if k ∈ s and I_k = 0 otherwise. If the second-order inclusion probability π_kl > 0 for all pairs of units k and l, the design is said to be measurable, and the design variance admits an unbiased estimator, as discussed in [20–22]. The estimator of the total is given by: (2) which is an unbiased estimator of t, and its variance is given by (3) where V_i is defined by (4)
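For reference, the standard Horvitz-Thompson quantities for a one-per-stratum design can be written as follows. This is a sketch under the assumption that Eqs (2) to (4), which are not reproduced in this text, take the usual form; the symbol U_i for the set of population elements in stratum i is introduced here for illustration only.

```latex
\begin{aligned}
\hat{t}_i &= \frac{y_k}{\pi_k} \quad (k \text{ the element sampled from stratum } i),
&\qquad \hat{t} &= \sum_{i=1}^{H} \hat{t}_i, \\
\operatorname{Var}(\hat{t}) &= \sum_{i=1}^{H} V_i,
&\qquad V_i &= \sum_{k \in U_i} \pi_k \left( \frac{y_k}{\pi_k} - t_i \right)^{2}.
\end{aligned}
```

Because exactly one element is drawn and the π_k sum to one within the stratum, the expectation of the stratum estimator is Σ_{k ∈ U_i} π_k (y_k / π_k) = t_i, and V_i is the corresponding expected squared deviation. The sum over strata follows from the independence of the stratum samples.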

In [3, 9, 16] the collapsed stratum variance estimator is given by (5) where

The variance estimator given in (5) is design-biased, and its bias concerning the design is (6)

As shown in (6), the estimator in (5) has a positive bias, and the bias is small if the strata are successfully matched, in the sense that t_i ≈ t_j whenever c_j(i) = 1. To retain these statistical properties, the pairing must be carried out without reference to the sample data. The populations to which fine stratification is applied often have a temporal, geographical, or other structure that may be exploited in the pairing to minimize the differences t_i − t_j [6].
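The positive bias of collapsing can be checked by brute force on a toy example. The sketch below assumes the classical paired form of the collapsed estimator, the sum over pairs of squared differences between the two estimated stratum totals, which may differ in notation from Eq (5); the four small strata and their pairing are hypothetical. It enumerates every one-per-stratum SRS sample to obtain exact design expectations.

```python
from itertools import product


def stratum_variance(stratum):
    """V_i for one unit drawn by SRS: the mean of (N_i * y_k - t_i)^2 over the stratum."""
    size = len(stratum)
    total = sum(stratum)
    return sum((size * y - total) ** 2 for y in stratum) / size


def expected_pair_term(stratum_a, stratum_b):
    """Exact design expectation of (t_hat_a - t_hat_b)^2 over all one-per-stratum samples."""
    size_a, size_b = len(stratum_a), len(stratum_b)
    squared = [
        (size_a * y_a - size_b * y_b) ** 2
        for y_a, y_b in product(stratum_a, stratum_b)
    ]
    return sum(squared) / len(squared)


# Hypothetical toy population: four strata, collapsed into two pairs.
strata = [[1.0, 2.0, 3.0], [2.0, 2.0, 5.0], [4.0, 6.0], [5.0, 5.0, 8.0]]
pairs = [(0, 1), (2, 3)]

true_variance = sum(stratum_variance(s) for s in strata)
expected_estimate = sum(expected_pair_term(strata[i], strata[j]) for i, j in pairs)
bias = expected_estimate - true_variance
pair_gap = sum((sum(strata[i]) - sum(strata[j])) ** 2 for i, j in pairs)

print(true_variance, expected_estimate, bias, pair_gap)
```

In this toy case the bias equals the sum of squared differences between the paired stratum totals, so it vanishes only when the paired strata have equal totals.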

3 Nonparametric kernel based variance estimation

To reduce the bias in (5), an alternative method was introduced in [9], in which the binary function c_j(i) in Eq (5) is replaced by the kernel weights defined in Section 1.3 of [9]. The resulting nonparametric variance estimator, an alternative to the collapsed variance estimator under fine stratification, is given by: (7)

The expectation of the variance estimator (7) is given by (8) where the nonrandom normalizing constant c_d depends on the kernel weights but not on the survey variables and is defined by: (9)

The estimator (7) also has a positive bias, given by (10)

The constant c_d is therefore chosen to reduce the part of the bias that is due to the V_i when the V_i are constant across strata.
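The role of the normalizing constant can be illustrated with a short derivation. It assumes, hypothetically, that the kernel-based estimator has the form in the first line below, with weights d_j(i) and independent stratum estimators; Eqs (7) to (10) are not reproduced in this text, so the exact expressions in [9] may differ.

```latex
\begin{aligned}
\hat{V}_{d} &= \frac{1}{c_d} \sum_{i=1}^{H} \Big( \hat{t}_i - \sum_{j=1}^{H} d_j(i)\, \hat{t}_j \Big)^{2}, \\
\mathrm{E}\Big( \hat{t}_i - \sum_{j} d_j(i)\, \hat{t}_j \Big)^{2}
  &= \big( 1 - 2 d_i(i) \big) V_i + \sum_{j} d_j^{2}(i)\, V_j
     + \Big( t_i - \sum_{j} d_j(i)\, t_j \Big)^{2}, \\
c_d &= \frac{1}{H} \sum_{i=1}^{H} \Big( 1 - 2 d_i(i) + \sum_{j} d_j^{2}(i) \Big).
\end{aligned}
```

Under this assumed form, if V_i = V for every stratum, the variance terms sum to H c_d V, so dividing by c_d recovers Σ V_i exactly. What remains is the non-negative term involving the stratum totals, which is the positive bias.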

4 Bootstrap-based variance estimator

The collapsed variance estimator defined in Eq (5) has a non-negative bias. Its alternative, the nonparametric kernel-based estimator defined by Eq (7), also suffers from boundary bias: most kernel smoothers have boundary problems and require modifications at the boundary points, where the bias of the estimator decreases only at the cost of an increasing variance. It is assumed that, for any collapsing, the contribution to the bias of the variance estimator from each pair of strata is known and non-negative. We therefore develop a bootstrap-based variance estimator for a given set of H strata, which are paired so as to reduce the bias of the variance estimator at no cost to the variance. When bootstrap procedures are applied, a single unit per stratum can lead to a variety of issues.

The guiding principle in applying the bootstrap method to stratified sampling designs is that each bootstrap replicate should itself be a stratified sample selected from the parent sample. In this paper, however, the parent sample has only one or two elements per stratum, which makes resampling within a stratum meaningless when there is a single unit. We therefore combine each single-unit stratum with the next smallest stratum to create pseudo-strata with at least two units before applying the bootstrap. Because the collapsed strata only have approximately the same characteristics, the collapsing step is the source of the bias. Bootstrap sampling is applied both to the collapsed strata and to the strata that were not collapsed, by selecting a sample of size n* = 2 in each stratum. The bootstrap bias corrector defined by Eq (13) is then used to reduce the bias for the collapsed strata.
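The resampling step described above can be sketched as follows. This is a minimal illustration of a plain with-replacement stratified bootstrap with n* = 2 draws per pseudo-stratum, not the paper's estimator (14): the function name, the rescaling, and the input values are hypothetical, and the bias corrector of Eq (13) is not applied.

```python
import random
import statistics


def bootstrap_totals(pseudo_strata, n_boot=1000, n_star=2, seed=0):
    """Return n_boot replicate totals from a with-replacement stratified bootstrap."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_boot):
        total = 0.0
        for units in pseudo_strata:
            draws = [rng.choice(units) for _ in range(n_star)]
            # Rescale so the replicate estimates the sum over the pseudo-stratum.
            total += len(units) * sum(draws) / n_star
        replicates.append(total)
    return replicates


# Hypothetical expanded stratum estimates, already grouped into pseudo-strata.
pseudo_strata = [[12.0, 15.0], [9.0, 14.0], [20.0, 18.0]]
replicates = bootstrap_totals(pseudo_strata)
bootstrap_variance = statistics.variance(replicates)
print(bootstrap_variance)
```

Under this naive scheme, the replicate variance contributed by a pair {a, b} approaches (a − b)²/2 as the number of replicates grows, so some rescaling or correction is generally needed on top of plain resampling.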

From (2), we define the bootstrap total population by using the replication variable in stratum population . For a given H, over all B resamples across the stratum, the bootstrap estimator of total population t_b is calculated. We define the bias of an estimator (11)

A bootstrap-based approximation to this bias is given by (12) where are copies of bootstrap of t_bj.
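The usual bootstrap approximation to a bias averages the B replicate estimates and subtracts the original estimate. The sketch below shows that generic calculation; whether Eq (12) takes exactly this form is an assumption, and the function names and numbers are illustrative only.

```python
def bootstrap_bias(replicates, original_estimate):
    """Average of the bootstrap replicates minus the original estimate."""
    return sum(replicates) / len(replicates) - original_estimate


def bias_corrected(replicates, original_estimate):
    """Original estimate with the bootstrap bias approximation subtracted."""
    return original_estimate - bootstrap_bias(replicates, original_estimate)


replicates = [9.0, 11.0, 13.0]  # hypothetical bootstrap copies of an estimate
print(bootstrap_bias(replicates, 10.0))   # 1.0
print(bias_corrected(replicates, 10.0))   # 9.0
```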

This construction is also based on standard bootstrap thinking to replace the population with the sample’s empirical population. The following defines the bootstrap bias corrector: (13)

Then from (7) we replace the weights d_j(i) by the bootstrap bias corrector defined in (13), therefore, the bootstrap variance estimator under fine stratification is given by: (14) where is the bootstrap bias corrector defined in (13) and c_b is the nonrandom normalizing constant depending on bootstrap bias corrector and is defined by:

4.1 Properties of the proposed estimator

The bootstrap variance estimator is judged on the basis of its design expectation, design variance, and mean squared error under a specific sampling design for the fixed finite population. We are therefore interested in the statistical properties of the above estimators with respect to the sampling design. The design expectation of is given by: (15) (16)

For more details, see Appendix A in S1 File.

4.1.1 The variance of the estimator.

The design variance of is given by:

Hence the Variance of is given by: (17)

The proof is given in Appendix B in S1 File.

4.1.2 Mean squared error of the estimator.

The design mean squared error of our estimator is expressed in terms of bias and variance. However, for the sake of simplicity, it is easily seen that and the bias of is

Accordingly, the design mean squared error of our estimator is given as: (18)

The MSE combines the variance and the bias of an estimator and can therefore be used directly for efficiency comparisons. Comparing Eqs (17) and (18) shows that the two expressions are approximately equal, so the bias of our estimator is expected to be small.
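The link between the two expressions rests on the standard decomposition of the design mean squared error. In the sketch below, the symbol for the bootstrap variance estimator and V for the true variance are placeholders chosen for illustration, since the paper's own symbols are not reproduced in this text.

```latex
\operatorname{MSE}\big(\hat{V}_{b}\big)
  = \mathrm{E}\big(\hat{V}_{b} - V\big)^{2}
  = \operatorname{Var}\big(\hat{V}_{b}\big) + \big[\operatorname{Bias}\big(\hat{V}_{b}\big)\big]^{2}.
```

If the MSE and the variance are close, the squared bias, which is their difference, must be small.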

5 Simulation study

5.1 Unconditional simulation study

In the simulation, we investigate the behavior of the bootstrap-based variance estimator compared with the collapsed variance estimator and the nonparametric kernel-based variance estimator. These are evaluated under fine stratification for different numbers of strata, bandwidths, and error standard deviations. For the nonparametric kernel-based variance estimator in (7), the Epanechnikov kernel function is used, and bandwidths are chosen such that 1/H < h < 2/H to yield the smallest possible nonempty kernel window; accordingly, h = {0.025, 0.015, 0.045, 0.0055, 0.0075} has been considered (see [9] for a detailed discussion). The population x_k is generated as a set of independent and identically distributed uniform (0, 1) random variables. A stratified finite population was created with eight survey variables of interest and H equally sized strata of size N_i = N/H, with x_i = i/H for stratum i. During this simulation, 1000 bootstrap samples were used to assess the quality of the estimators. The simulated data were stratified so that each stratum has two primary sampling units, and the variance was then evaluated using the three variance estimators specified by Eqs (5), (7) and (14). Three values of the error standard deviation were considered: σ = 0, σ = 0.25, and σ = 0.5. For each of the seven variables, the population size is N = 3000. Simple random sampling without replacement is used to draw samples with H = 50, H = 100, and H = 200 strata; because fine stratification allows deep stratification, a large number of strata has been created, and in all cases 30 strata are collapsed. Increasing the sample size has the same effect as lowering the standard deviation. Because the population is kept fixed across these 1000 bootstrap samples, the design-averaged performance of the estimators can be evaluated. The design bias, variance, and mean squared error were calculated, and the mean functions for the eight variables of interest were assumed to be (19)

This indicates that for each of the first seven mean functions, the minimum is zero and the maximum is two. In all cases, the population values are generated from the mean functions by adding i.i.d. N(0, σ²) errors, that is, (20) so that the total is given by (21) where the mean functions are defined as: (22) with x ∈ [0, 1]. These mean functions represent a range of correct and incorrect model specifications for the estimators under evaluation [22]. For μ₁, the assumed linear model is correctly specified, so an estimator based on that model is expected to be preferable. It is therefore of interest to see how much efficiency is lost by assuming only a smooth rather than a linear underlying model. The remaining mean functions deviate from the linear model in various ways. The trend for μ₂ is quadratic, so an assumed linear model would be incorrect over the whole range of x_k but adequate locally. The function μ₃ is linear over most of its range, except for a bump covering a small portion of the range of x_k. The mean function μ₄ is not smooth. The function μ₆ is a sinusoid that completes one full cycle on [0, 1], whereas μ₇ completes four full cycles, and μ₅ is exponential, as discussed in [8, 9, 23].
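The population-generation step can be sketched as below. The two mean functions are hypothetical stand-ins, chosen only so that they range from zero to two on [0, 1]; the paper's actual definitions in Eq (22) are not reproduced here. The function name, the seed, and the use of one common x_i = i/H per stratum are likewise assumptions made for illustration.

```python
import math
import random


def linear_mean(x):
    """Hypothetical linear mean function with range [0, 2] on [0, 1]."""
    return 2.0 * x


def cycle1_mean(x):
    """Hypothetical sinusoid completing one full cycle on [0, 1], range [0, 2]."""
    return 1.0 + math.sin(2.0 * math.pi * x)


def generate_population(mean_fn, n_strata, stratum_size, sigma, seed=0):
    """Build H strata of equal size with y_k = mean_fn(x_i) + N(0, sigma^2) error."""
    rng = random.Random(seed)
    population = []
    for i in range(1, n_strata + 1):
        x_i = i / n_strata
        stratum = [mean_fn(x_i) + rng.gauss(0.0, sigma) for _ in range(stratum_size)]
        population.append(stratum)
    return population


# N = 3000 units in H = 50 strata of size N_i = 60, with sigma = 0.25.
population = generate_population(cycle1_mean, n_strata=50, stratum_size=60, sigma=0.25)
population_total = sum(sum(stratum) for stratum in population)
print(len(population), population_total)
```

Setting sigma to zero makes every value in a stratum equal to the mean function at x_i, which corresponds to the case in which the within-stratum spread of the y values vanishes.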

When V_i = 0 (the σ = 0 case), Table 1 reports the exact biases of the variance estimators. The conclusions described here apply to any design, because the t_i values, the kernel, and H are the primary determinants of the expectation and bias of the variance estimators in this case. Compared with the collapsed stratum variance estimator and the nonparametric kernel-based variance estimator, the proposed bootstrap-based variance estimator has a substantially smaller bias for each response variable.

Table 1. The exact bias of the collapsed stratum, nonparametric kernel-based, and bootstrap-based variance estimators for σ = 0.

https://doi.org/10.1371/journal.pone.0292256.t001

Table 2 compares the bias of the three estimators for standard deviation error values other than zero.

Table 2. The bias comparison of the collapsed stratum, nonparametric kernel-based, and bootstrap-based variance estimators.

https://doi.org/10.1371/journal.pone.0292256.t002

Table 3 compares the RMSE of the nonparametric kernel-based variance estimator and the bootstrap-based variance estimator.

Table 3. The performance of the estimators based on RMSE.

https://doi.org/10.1371/journal.pone.0292256.t003

The proposed bootstrap variance estimator has a smaller RMSE than the collapsed stratum variance estimator, often substantially smaller, in every scenario studied. At each value of H, the bootstrap-based estimator outperforms its competitor because it has a smaller bias; for larger numbers of strata, the variability of the two estimators is essentially comparable.

Table 4 compares the bias of the nonparametric kernel-based variance estimator and the bootstrap-based variance estimator. The simulation results illustrate the difference in bias between the two variance estimators. The findings reinforce the preference for the bootstrap-based estimator, especially for larger numbers of strata H.

Table 4. The performance of the estimators based on biases.

https://doi.org/10.1371/journal.pone.0292256.t004

5.2 Conditional simulation study

To examine how the performance of the variance estimates depends on , we sorted the 1000 bootstrap samples from each population by increasing values of . We then grouped the samples into 50 sets of 20, so that the first set contains the 20 samples with the smallest values of , the next set contains the samples with the next 20 smallest values of , and so on. For each of these 50 sets, we calculated the average value of , the conditional root mean squared error (CRMSE), and the averages of the variance estimates for all the variance estimators. The CRMSE values and conditional biases were then plotted against the average values of . Figs 1–14 show that the new estimator has a small bias and a small RMSE in almost every scenario considered.
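The sort-and-group procedure can be sketched as follows. The quantity used for ordering is not recoverable from this text, so the sketch takes a generic list of ordering values; treating the error as the variance estimate minus a reference variance is also an assumption, as are all names in the code.

```python
import math


def conditional_summary(ordering_values, variance_estimates, reference_variance,
                        group_size=20):
    """Sort samples by ordering_values, then summarise consecutive groups.

    Returns one (mean ordering value, conditional bias, conditional RMSE) tuple
    per group of group_size samples.
    """
    order = sorted(range(len(ordering_values)), key=lambda r: ordering_values[r])
    rows = []
    for start in range(0, len(order), group_size):
        group = order[start:start + group_size]
        mean_value = sum(ordering_values[r] for r in group) / len(group)
        errors = [variance_estimates[r] - reference_variance for r in group]
        cond_bias = sum(errors) / len(errors)
        cond_rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
        rows.append((mean_value, cond_bias, cond_rmse))
    return rows


# Hypothetical inputs: 40 samples, constant variance estimate of 3 against a reference of 1.
ordering_values = [float(v) for v in range(39, -1, -1)]
variance_estimates = [3.0] * 40
rows = conditional_summary(ordering_values, variance_estimates, reference_variance=1.0)
print(rows)
```

With 1000 samples and groups of 20, the function returns 50 rows, one per plotted point.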

Fig 1. Comparison based on the biases of the 3 estimators for linear mean function.

https://doi.org/10.1371/journal.pone.0292256.g001

Fig 2. Comparison based on RMSE of the 3 estimators for linear mean function.

https://doi.org/10.1371/journal.pone.0292256.g002

Fig 3. Comparison based on biases of the 3 estimators for quadratic mean function.

https://doi.org/10.1371/journal.pone.0292256.g003

Fig 4. Comparison based on RMSE of the 3 estimators for quadratic mean function.

https://doi.org/10.1371/journal.pone.0292256.g004

Fig 5. Comparison based on biases of the 3 estimators for bump mean function.

https://doi.org/10.1371/journal.pone.0292256.g005

Fig 6. Comparison based on RMSE of the 3 estimators for bump mean function.

https://doi.org/10.1371/journal.pone.0292256.g006

Fig 7. Comparison based on biases of the 3 estimators for jump mean function.

https://doi.org/10.1371/journal.pone.0292256.g007

Fig 8. Comparison based on RMSE of the 3 estimators for jump mean function.

https://doi.org/10.1371/journal.pone.0292256.g008

Fig 9. Comparison based on biases of the 3 estimators for cycle1 mean function.

https://doi.org/10.1371/journal.pone.0292256.g009

Fig 10. Comparison based on RMSE of the 3 estimators for cycle1 mean function.

https://doi.org/10.1371/journal.pone.0292256.g010

Fig 11. Comparison based on biases of the 3 estimators for cycle4 mean function.

https://doi.org/10.1371/journal.pone.0292256.g011

Fig 12. Comparison based on RMSE of the 3 estimators for cycle4 mean function.

https://doi.org/10.1371/journal.pone.0292256.g012

Fig 13. Comparison based on biases of the 3 estimators for exponential mean function.

https://doi.org/10.1371/journal.pone.0292256.g013

Fig 14. Comparison based on RMSE of the 3 estimators for exponential mean function.

https://doi.org/10.1371/journal.pone.0292256.g014

Different numbers of strata were considered for all mean functions in deriving the biases. The new estimator clearly has a smaller bias under the same conditions than the estimators favoured in current practice. By comparing the and estimators, it was found that, except for the quadratic, jump and exponential mean functions, for which smoothing over the discontinuity is the incorrect strategy, the bias of is substantially smaller than the bias of for every other response variable. In practice, this could readily be solved for both the collapsed and kernel variance estimators by splitting the sample at the discontinuity, calculating two variance estimators, and then combining them. The RMSE of is small compared to the RMSE of except for the jump mean function.

5.3 Data application

The performance of the developed estimator is evaluated using the 1998 SMHO data available in the PracTools package in R (https://rdrr.io/cran/PracTools/man/smho98.html). The 1998 Survey of Mental Health Organizations (SMHO) was conducted by the Substance Abuse and Mental Health Services Administration in the United States. It gathered information on mental health care organizations and general hospitals that provide mental health care services, with the aim of developing national and state-level estimates of total expenditure, full-time equivalent staff, bed count, and total caseload by organization type. The original data set has N = 875 records and 12 variables, divided into 16 strata. After eliminating outpatient facilities, only organizations with a number of beds greater than zero remained. The strata pairs {12, 13}, {10, 11}, {6, 8}, and {4, 5} were then collapsed because of the small number of PSUs left after the exclusion phase. As a result, we built eight new strata to estimate the variance. The number of beds (total inpatient beds) and EXPTOTAL (total expenditures) were taken as the variables of interest, and strata were ordered by x_i = log of total beds in stratum i. After collapsing the aforementioned strata, simple random sampling without replacement was used to select two PSUs per stratum (n_i = 2), and the variances were estimated using the three variance estimators specified in the manuscript. We then examine the coefficient of variation (CV) and the root mean squared error (RMSE); the findings for each estimator are shown in Table 5 and in Figs 15 and 16.

Table 5. SMHO RMSE, bias and coefficient of variation results.

https://doi.org/10.1371/journal.pone.0292256.t005

Fig 15. Comparison of the bootstrap-based estimator against the collapsed stratum and nonparametric kernel-based estimators.

https://doi.org/10.1371/journal.pone.0292256.g015

Fig 16. Comparison of the bootstrap-based estimator against the collapsed stratum and nonparametric kernel-based estimators.

https://doi.org/10.1371/journal.pone.0292256.g016

The bootstrap-based variance estimator has the lowest CV and RMSE values, as well as the smallest bias, of the three estimators.

6 Concluding remarks

A bootstrap-based variance estimator has been developed as an alternative to the collapsed variance estimator and the nonparametric kernel-based variance estimator. These approaches are currently applied under fine stratification and are frequently used in survey statistics research. Fine stratification has proved useful in many applications, as its point estimator is unbiased and efficient. Common practices for estimating the variance in this context are collapsing adjacent strata to create pseudo-strata and then estimating the variance, and using a nonparametric kernel-based variance estimator. Neither of the resulting variance estimators is design-unbiased, the bias increases as the population means of the pseudo-strata become more variable, and both estimators may suffer from a large root mean squared error (RMSE). A number of alternative variance estimators have been proposed in the literature, but they often rely on strong auxiliary variables that are well correlated with the response variable, or they have a complex form, which makes them hard to apply in practice. This paper proposes a viable solution to this long-standing problem: a bootstrap-based variance estimator that replaces the pseudo-strata and the kernel weights by the bootstrap bias corrector. Its properties have been determined, and the simulation study and the real data application show that the new estimator performs well in each case considered. It has a small root mean squared error compared with the current estimators under the same conditions. The proposed approach provides variance estimates that appropriately account for the complexities of the sampling design and the specific characteristics of interest within the population. It leads to more accurate and precise statistical inference for complex survey data than the existing approaches.

Supporting information

S1 File.

https://doi.org/10.1371/journal.pone.0292256.s001

(PDF)

S1 Abbreviation.

https://doi.org/10.1371/journal.pone.0292256.s002

(PDF)

Acknowledgments

We want to take this opportunity to express our gratitude to the Pan-African University Institute of Basic Sciences, Technology, and Innovation (PAUSTI) for the support.

References

1. Franklin S and Walker C. Survey methods and practices. Statistics Canada Social Survey Methods Division, Ottawa, 41(3):1–52, 2003.
2. Lohr S. Sampling: design and analysis. Duxbury Press, Pacific Grove, CA, volume 221, pages 249, 1999.
3. Särndal Carl-Erik and Swensson Bengt and Wretman Jan. Model assisted survey sampling. Springer Science & Business Media, 2003.
4. Binder David A and Roberts, Georgia R. Design-based and model-based methods for estimating model parameters. Analysis of survey data, Wiley Chichester, pages 29–48, 2003.
5. Cochran William G. Sampling techniques. John Wiley & Sons, 2007.
6. Berger Yves G and Priam Rodolphe. A simple variance estimator of change for rotating repeated surveys: an application to the European Union Statistics on Income and Living Conditions household surveys. Journal of the Royal Statistical Society: Series A (Statistics in Society), Wiley Online Library, 179(1):251–272, 2016.
7. Hoshaw-Woodard, Stacy. Description and comparison of the methods of cluster sampling and lot quality assurance sampling to assess immunization coverage. Citeseer, 2001.
8. Dahlke, Mark and Breidt, F Jay and Opsomer, Jean D and Van Keilegom, Ingrid. Nonparametric endogenous post-stratification estimation. Statistica Sinica, JSTOR, 189–211, 2013.
9. Breidt F Jay and Opsomer Jean D and Sanchez-Borrego Ismael. Nonparametric variance estimation under fine stratification: an alternative to collapsed strata. Journal of the American Statistical Association, Taylor & Francis, 111(514):822–833, 2016.
10. Mantel Harold and Giroux Suzelle. Methodologies for data quality assessment and improvement. Variance Estimation in Complex Surveys with One PSU per Stratum, Atlantic, 1(1), 2009.
11. Lu, Lu and Larsen, Michael D. Variance Estimation in a High School Student Survey with One-Per-Stratum Strata. Proceedings of the Third International Conference on Establishment Surveys (ICES-III), 2007.
12. DeBell, Matthew. How to analyze ANES survey data. American National Elections Studies. https://electionstudies.org/wp-content/uploads/2018/05/HowToAnalyzeANESData.pdf (December 7, 2019), Citeseer, 2010.
13. Rust, K and Krawchuk, S. Survey weighting and the calculation of sampling variance. PISA, 89–98, 2000.
14. Hartley HO and Rao JNK and Kiefer Grace. Variance estimation with one unit per stratum. Journal of the American Statistical Association, Taylor & Francis, 64(327):841–851, 1969.
15. Hansen, Morris H and Hurwitz, William N and Madow, William G. Sample survey methods and theory. Vol. I. Methods and applications. John Wiley, 1953.
16. Wolter, Kirk. Introduction to variance estimation. Springer Science & Business Media, 2007.
17. Mosaferi, Sepideh. A Bayesian Approach for the Variance of Fine Stratification. arXiv preprint arXiv:2110.10296, 2021.
18. Isaki Cary T. Variance estimation using auxiliary information. Journal of the American Statistical Association, Taylor & Francis, 78(381):117–123, 1983.
19. Mosaferi, Sepideh. Empirical and Constrained Empirical Bayes Variance Estimation Under A One Unit Per Stratum Sample Design. arXiv preprint arXiv:1910.05840, 2019.
20. Breidt F Jay and Opsomer Jean D. Model-assisted survey estimation with modern prediction techniques. Statistical Science, Institute of Mathematical Statistics, 32(2):190–205, 2017.
21. Horvitz Daniel G and Thompson Donovan J. A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, Taylor & Francis, 47(260):663–685, 1952.
22. Breidt F Jay and Opsomer Jean D. Local polynomial regression estimators in survey sampling. Annals of statistics, JSTOR, 1026–1053, 2000.
23. Breidt FJ and Claeskens Gerda and Opsomer JD. Model-assisted estimation for complex surveys using penalised splines. Biometrika, Oxford University Press, 92(4):831–846, 2005.
