Abstract
In this study, we address the problem of estimating the finite population mean when non-response occurs on the characteristic under study. We propose a class of Rao-regression type estimators for the case where the ranked set sampling (RSS) procedure is used to collect data either from the non-response group only or from both the response and non-response groups. The information provided by the auxiliary variable is used at both the design stage and the estimation stage. Expressions for the bias and mean square error of the estimators are obtained up to the first order of approximation. A comprehensive simulation study is carried out to observe the performance of the estimators under non-response.
Citation: Rehman SA, Shabbir J (2022) An efficient class of estimators for finite population mean in the presence of non-response under ranked set sampling (RSS). PLoS ONE 17(12): e0277232. https://doi.org/10.1371/journal.pone.0277232
Editor: Eugene Demidenko, Dartmouth College Geisel School of Medicine, UNITED STATES
Received: May 31, 2022; Accepted: October 22, 2022; Published: December 16, 2022
Copyright: © 2022 Rehman, Shabbir. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper.
Funding: The authors received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
A survey can be conducted using a variety of sampling techniques, depending on the situation and the structure of the data. Simple random sampling (SRS) is an easy and unbiased data collection method, but when the population is scattered and diverse, it requires a larger sample for a given margin of accuracy. Estimation of population parameters becomes even more problematic if non-response occurs in a scattered population. Non-response is the inability to collect data from one or more population units selected in a sample. Most of the time, survey results have to be generalized from a small sample, which demands an appropriate sampling method. In such cases, researchers tend to employ robust methods for survey design and parameter estimation. For example, some authors have proposed estimators of the population mean based on L-moments theory, which is robust to outliers and less vulnerable to the effects of sample volatility. L-moments theory makes use of order statistics and leads to new procedures for estimating population parameters. Computing the first few sample L-moments and their ratios provides a useful summary of the location, shape, and dispersion of the population from which the sample was drawn; see [1–4]. Another efficient approach to estimating population parameters is the ranked set sampling (RSS) procedure, in which a larger number of sampling units are ranked by their relative sizes and a smaller number of units are then selected from each ranked group for measurement. As a result, RSS increases the precision of estimates by reducing sampling error. The availability of appropriate auxiliary variables that are correlated with the study variable is a further factor in improving the estimation.
The RSS procedure uses the auxiliary information at the design stage by ranking units according to the order of the auxiliary variable.
The theory of RSS was first introduced by [5]. In this method, units of the study variable are ordered either by visual judgment or by some cheap quantitative measure. [6] developed an unbiased estimator of the population mean under the RSS technique. [7] showed that the order of an auxiliary variable can be used to rank units of the study variable; see also [8–18] for the use of this feature. The auxiliary variable can also be used at the estimation stage to reduce sampling variation and increase the efficiency of estimates. Several authors have claimed that use of the auxiliary variable at the design stage can result in more efficient estimates. For applications of RSS in various disciplines, see [19–23].
[24] gives a comprehensive treatment of non-response under RSS, including the problem of non-response when the sample is collected through RSS at the second attempt from the non-response group. Inspired by [24–27], we investigate the estimation of the finite population mean under RSS in the presence of non-response.
Importance of the problem
Non-response is regarded as one of the most important problems in the theory of survey sampling. It leads to biased estimates and a less representative sample of the population. The discrepancy between respondents and non-respondents on a given measure, combined with the non-response rate in the population, produces non-response bias. A lower response rate raises the risk of larger non-response bias, but when data are missing at random (MAR), a lower response rate has no effect on the non-response error. In practice, using information from the auxiliary variable is often costly, but classifying units based on it is quite straightforward. We assume that the RSS technique may improve the precision in estimating the population mean by using information from the auxiliary variable at the estimation stage.
Methodology
We consider the naive model of [28], according to which we draw a sample S of size ns from a finite population Ω of size N using simple random sampling without replacement (SRSWOR). Let n1 units respond to the survey at the first attempt while n2 (= n − n1) units do not respond. Special efforts are made to approach the non-responding units, and a part of them is included in the sample. Thus, we get a final sample of size
for estimation purposes. This allows the entire population to be divided into two complementary groups, called the response and non-response groups.
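The two-attempt scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the estimator is the textbook Hansen and Hurwitz weighting (respondent mean weighted by n1/n, follow-up mean weighted by n2/n), and the population, the response mechanism, the inverse sub-sampling fraction `k`, and all function names are hypothetical choices, not values from this paper.

```python
import random
import statistics

def hansen_hurwitz_mean(y_resp, y_followup, n2):
    """Textbook Hansen-Hurwitz estimate (assumed form).

    y_resp     : values from the n1 units that responded at the first attempt
    y_followup : values from the subsample of non-respondents followed up
    n2         : number of non-respondents in the original sample
    """
    n1 = len(y_resp)
    total = 0.0
    if n1:
        total += n1 * statistics.fmean(y_resp)
    if n2:
        total += n2 * statistics.fmean(y_followup)
    return total / (n1 + n2)

rng = random.Random(1)
# Hypothetical population of (y, responds_at_first_attempt) pairs.
population = [(rng.gauss(50, 10), rng.random() < 0.7) for _ in range(2000)]

n, k = 200, 2                      # k > 1: assumed inverse sub-sampling fraction
sample = rng.sample(population, n)  # SRSWOR
y_resp = [y for y, responds in sample if responds]
y_nonresp = [y for y, responds in sample if not responds]
n2 = len(y_nonresp)
r2 = max(1, round(n2 / k)) if n2 else 0   # units followed up at second attempt
y_followup = rng.sample(y_nonresp, r2)

estimate = hansen_hurwitz_mean(y_resp, y_followup, n2)
true_mean = statistics.fmean(y for y, _ in population)
print(f"estimate = {estimate:.2f}, population mean = {true_mean:.2f}")
```

The follow-up subsample stands in for all n2 non-respondents, which is why its mean receives the full weight n2/n.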
Let ; j = 1, 2 be population units of the study variable (Y) and the auxiliary variable (X) in the two groups. Our goal is to estimate the finite population mean
of the study variable. [28] suggested the following unbiased estimator when non-response occurs on Y:
(1)
The variance of
is given by
(2)
where
and
Similar results can be obtained for an auxiliary variable X when it is involved in the estimation of population mean for the study variable Y with the following covariance:
When the population mean of the auxiliary variable is known and incomplete information exists only on the study variable, [29, 30] suggested the following ratio and regression estimators for estimating
as
(3)
and
(4)
[31] defined the following ratio and regression estimators when non-response occurs on both variables:
(5)
and
(6)
where
and
are estimates of population regression coefficient
such that
where
(7)
[27] proposed the following Rao-regression type estimators along the lines of [32]:
(8)
where d1 and d2 are scalars whose values are either pre-determined or chosen to minimize the MSE of the estimator. The minimum MSE of ts1 with the optimum values of d1 and d2 is given by
and
(9)
[27] also proposed the following generalized class of Rao-regression type estimators by using the idea of [26]:
(10)
where d3 and d4 are constants and h is a generic function of
satisfying some mild conditions. The optimum values of d3 and d4 along with the minimum MSE, are given by
and
(11)
where
and
[27] showed that this class of estimators is more precise than the usual [28] estimator and the ratio and regression estimators discussed above when certain conditions are satisfied. It should be noted that the values of a, b and c are {1, −1, 1} and for
and
respectively.
RSS procedure
The RSS procedure is described in the following steps:
- Step 1: Collect ν independent sets each of ν units.
- Step 2: Rank the units within each set in ascending order by means of the study variable or any closely related auxiliary variable. The ranking is done either by visual inspection or by some quantitative measurement.
- Step 3: Select the lowest order unit from the first set.
- Step 4: Select the second lowest order unit from the second set and continue selecting units in this way until νth order statistic is selected from νth set.
This completes one cycle of an RSS procedure which can be repeated r times to obtain a sample of size n = rν. The selected units can be represented as Y1(1)j, Y2(2)j, …, Yi(i)j, …, Yν(ν)j; i = 1, 2, …, ν, j = 1, 2, …, r.
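The steps above can be sketched as a short Python function. It is a minimal illustration, not the authors' implementation: the function name `ranked_set_sample`, the `rank_key` argument, and the example population are hypothetical. Within each cycle the ν sets are drawn jointly without replacement so that they do not overlap.

```python
import random

def ranked_set_sample(population, set_size, cycles, rng, rank_key=None):
    """Draw a ranked set sample of size cycles * set_size.

    population : list of units (must hold at least set_size**2 units)
    rank_key   : function giving the value used for ranking, e.g. the
                 auxiliary variable; by default units are ranked directly
    """
    sample = []
    for _ in range(cycles):
        # Step 1: draw set_size non-overlapping sets of set_size units each.
        units = rng.sample(population, set_size * set_size)
        for i in range(set_size):
            one_set = units[i * set_size:(i + 1) * set_size]
            # Step 2: rank the units inside the set.
            ranked = sorted(one_set, key=rank_key)
            # Steps 3-4: keep the (i+1)-th smallest unit of the (i+1)-th set.
            sample.append(ranked[i])
    return sample

rng = random.Random(42)
# Hypothetical population of (x, y) pairs with y related to x.
population = []
for _ in range(500):
    x = rng.gauss(10, 2)
    population.append((x, 2.0 * x + rng.gauss(0, 1)))

# Rank on the auxiliary variable x only; y is observed for selected units.
sample = ranked_set_sample(population, set_size=3, cycles=4, rng=rng,
                           rank_key=lambda unit: unit[0])
y_mean = sum(y for _, y in sample) / len(sample)
print(f"n = {len(sample)}, RSS mean of y = {y_mean:.2f}")
```

Ranking through `rank_key` mirrors the use of an auxiliary variable at the design stage: only the cheap ranking value is needed for all ν² units, while the study variable is measured on the ν selected units per cycle.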
RSS under non-response
[24] presents comprehensive work on handling missing observations under RSS. Here we discuss the following two major situations for dealing with non-response under RSS.
Situation-I: When RSS is used at second attempt only.
Suppose we collect data at the second attempt by the method of RSS, in such a way that ν independent sets, each of size ν, are selected from the non-response group. The subsequent procedure follows, step by step, the RSS procedure described above. The sample mean of the non-responding units, based on the units sampled under RSS, is given by
(12)
If the population is symmetrically distributed then
is an unbiased estimator of the population mean with the following expected variance
(13)
where
Thus, Eq (1) becomes
(14)
The estimator
is unbiased, and its variance is given by
(15)
A similar result can be obtained for the auxiliary variable X as
(16)
Situation-II: When RSS is used at both attempts.
It is an extension of Situation-I in which we collect a sample from both groups by using RSS.
The estimator of sample mean can be written as
(17)
where
is the sample mean based on n1 units collected at first attempt, while
is the sample mean obtained from
units collected at second attempt.
is also an unbiased estimator, the variance of
given by
(18)
Similar expressions can be obtained for the auxiliary variable as
(19)
Note that the set size ν is kept constant while other notations are used as
.
Generalized class of Rao-regression type estimators under RSS in the presence of non-response
We extend the work of [26, 32] for the estimation of finite population mean when non-response occurs in surveys and RSS is used for the collection of data instead of SRS.
Our first suggested estimator is;
(20)
where q1 and q2 are constants, whose optimum values are used to minimize the error of estimate.
Eq (20) can be written as
(21)
The error terms are defined as
and
, then it is easy to calculate that;
and
The bias and MSE of tr1, are given by
or
The optimum values of q1 and q2 with minimum MSE, are given by
and
(22)
It should be noted that the value of q1opt is always positive, while the sign of q2opt depends on the correlation coefficient between Y and X.
Similarly, our second suggested general class of estimators, is given by
(23)
where q3 and q4 are pre-determined constants and h is a generic function of
which satisfies the following mild conditions;
- Function h is continuous and bounded in the vicinity of zero.
- Function h is independent of n, N and (X1, X2, …, XN).
- Function h is thrice differentiable with bounded and continuous derivatives.
Eq (23) can be written as:
Expanding
using Taylor series up to including terms i.e.,
, the resulting expression can be written as
where h(0) is a constant term, h′(0) is the first-order partial derivative at zero and h′′(0) is the second-order partial derivative at zero. For simplicity we write h(0) = a, h′(0) = b and h′′(0) = c. Thus
or
(24)
The bias and MSE of tri, are given by
(25)
The optimum values of q3 and q4 are given by
and
The minimum MSE of tri, is given as,
(26)
where
and
Choice of function h
The proposed class of estimators is determined by the function h, which can in theory and in practice take a variety of forms. In this study, we discuss the following two well-known choices.
The ratio function.
We consider [33, 34] estimators as a choice for ratio function h.
(27)
where τ is a constant, and
- h(0) = a = 1,
- h′(0) = b = −τ,
- c = τ(τ + 1)/2.
Remarks.
- If we consider τ = 0, then {a, b, c} = {1, 0, 0} and the proposed estimator tri reduces to the proposed estimator tr1:
(28)
- If we consider τ = 1, then {a, b, c} = {1, −1, 1} and the function h reduces to the Ray and Singh (1981) estimator. Thus the proposed estimator becomes
(29)
- If we consider τ = −1, then {a, b, c} = {1, 1, 0}. Hence the proposed estimator is converted to the product type estimator which is used when the correlation coefficient between the study variable and the auxiliary variable is negative i.e.,
(30)
The exponential function.
Now we consider the exponential function proposed by [35] as a possible choice of function h i.e.,
(31)
Then the proposed class of estimators takes the following form:
(32)
By expanding
and using Taylor’s series, we get;
- h(0) = a = 1,
-
-
By using the values of constants a, b and c, we can easily calculate the corresponding optimum values of q3 and q4 with minimum MSE of estimators by using Eq (26), for the two choices of function h discussed above.
Efficiency comparison
The gain in precision of the proposed class of estimators depends entirely on the precision gained by using RSS instead of SRS. It is therefore reasonable to compare the variances of the sample mean under the two competing sampling strategies. Consider the variance of
(33)
In the above expression, the term is the same as the variance of
, which indicates that the RSS procedure produces more precise estimates than SRS whenever the inequality μy(i) ≠ μy holds. It follows that if we use the RSS procedure for collecting data at both attempts, the proposed class of estimators will be more precise than the estimators discussed under Situation-I, where we collect data through RSS only at the second attempt.
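For reference, the argument above parallels the textbook variance decomposition for a balanced RSS mean with set size ν and r cycles (n = rν). The display below is that standard complete-response result, ignoring the finite population correction; the symbols \bar{y}_{RSS} and \sigma_y^2 are notation assumed here, and the paper's own expression additionally carries the non-response terms.

```latex
\operatorname{Var}\left(\bar{y}_{RSS}\right)
  = \frac{\sigma_y^{2}}{r\nu}
  - \frac{1}{r\nu^{2}} \sum_{i=1}^{\nu} \left(\mu_{y(i)} - \mu_y\right)^{2}
  \le \frac{\sigma_y^{2}}{r\nu}
  = \operatorname{Var}\left(\bar{y}_{SRS}\right)
```

The subtracted sum is non-negative and vanishes only when every order-statistic mean μy(i) equals μy, which is why any informative ranking yields a strict gain over SRS.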
Simulation study
The simulation study is carried out with the same setup used by [28]. We generated the auxiliary variable for two groups as Xj ∼ Normal(Nj, μyj, σyj); j = 1, 2. The corresponding study variable is produced by using the relationship , where ej ∼ Normal(Nj, 0, 1) and ρyx is the coefficient of correlation between Y and X. A sample of size
is selected from a population by using the procedures of SRS and RSS, and sample means are calculated for the study and the auxiliary variables under Situation-I and Situation-II. The competing and proposed estimators are then computed. This procedure is repeated 20,000 times to calculate the MSE and RE of the estimators using the following formulas:
(34)
(35)
where tri represents an estimator under consideration. The results are given in Tables 1–6.
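The Monte Carlo evaluation described above can be illustrated with a small Python sketch. Several parts are assumptions rather than the paper's definitions: the MSE is taken as the average squared deviation of the replicated estimates from the population mean, the RE as the ratio of a reference MSE to a candidate MSE, and the data follow an assumed linear model with correlation `rho`. The sketch compares only the plain SRS and RSS sample means under complete response, with far fewer replications than the 20,000 used in the study.

```python
import math
import random
import statistics

def mc_mse(estimates, target):
    """Average squared deviation of replicated estimates from the target."""
    return sum((t - target) ** 2 for t in estimates) / len(estimates)

def relative_efficiency(mse_reference, mse_candidate):
    """Assumed definition: values above 1 favour the candidate."""
    return mse_reference / mse_candidate

def rss_on_auxiliary(pop, nu, r, rng):
    """One RSS of size nu * r from (x, y) pairs, ranked on x only."""
    picks = []
    for _ in range(r):
        units = rng.sample(pop, nu * nu)
        for i in range(nu):
            ranked = sorted(units[i * nu:(i + 1) * nu], key=lambda u: u[0])
            picks.append(ranked[i])
    return picks

rng = random.Random(7)
rho = 0.8  # assumed correlation between Y and X
pop = []
for _ in range(3000):
    x = rng.gauss(0, 1)
    # Assumed model giving corr(X, Y) = rho; not taken from the paper.
    pop.append((x, rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)))
y_bar = statistics.fmean(y for _, y in pop)

nu, r, reps = 3, 4, 2000
srs = [statistics.fmean(y for _, y in rng.sample(pop, nu * r))
       for _ in range(reps)]
rss = [statistics.fmean(y for _, y in rss_on_auxiliary(pop, nu, r, rng))
       for _ in range(reps)]

re = relative_efficiency(mc_mse(srs, y_bar), mc_mse(rss, y_bar))
print(f"RE of the RSS mean relative to the SRS mean: {re:.2f}")
```

The same two helper functions could be applied to replicated values of any competing or proposed estimator once it has been computed on each simulated sample.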
Conclusions
In this paper, we proposed a generalized class of Rao-regression type estimators for the finite population mean under non-response when RSS is used to collect the data at the second attempt and at both attempts. Expressions for the bias and MSE of the proposed class of estimators were derived up to the first order of approximation. A brief simulation study was conducted to determine the RE of the proposed class of estimators for different values of the non-response rate k, set size ν, and overall sample size n.
Based on simulation results, we suggest using the proposed class of estimators under RSS for more precise estimation of the finite population mean when non-response occurs in the data. Additionally, we encourage using the RSS technique for data collection in both attempts since it provides more precise estimates.
Acknowledgments
I would like to express gratitude to my PhD supervisor, Prof. Dr. Javid Shabbir, who helped me write this article.
References
- 1. Shahzad U, Ahmad I, Almanjahie IM, Al Noor NH, Hanif M. A new class of L-moments based calibration variance estimators. Computers, Materials & Continua. 2021;66(3):3013–3028.
- 2. Anas MM, Huang Z, Alilah DA, Shafqat A, Hussain S. Mean estimators using robust quantile regression and L-moments’ characteristics for complete and partial auxiliary information. Mathematical Problems in Engineering. 2021;2021.
- 3. Anas MM, Huang Z, Shahzad U, Zaman T, Shahzadi S. Compromised imputation based mean estimators using robust quantile regression. Communications in Statistics-Theory and Methods. 2022; p. 1–16.
- 4. Lone SA, Subzar M, Sharma A. Enhanced estimators of population variance with the use of supplementary information in survey sampling. Mathematical Problems in Engineering. 2021;2021.
- 5. McIntyre G. A method for unbiased selective sampling, using ranked sets. Australian Journal of Agricultural Research. 1952;3(4):385–390.
- 6. Takahasi K, Wakimoto K. On unbiased estimates of the population mean based on the sample stratified by means of ordering. Annals of the Institute of Statistical Mathematics. 1968;20(1):1–31.
- 7. Lynne Stokes S. Ranked set sampling with concomitant variables. Communications in Statistics-Theory and Methods. 1977;6(12):1207–1211.
- 8. Singh HP, Tailor R, Singh S. General procedure for estimating the population mean using ranked set sampling. Journal of Statistical Computation and Simulation. 2014;84(5):931–945.
- 9. Abu-Dayyeh WA, Ahmed MS, Ahmed R, Muttlak HA. Some estimators of a finite population mean using auxiliary information. Applied Mathematics and Computation. 2003;139:287–298.
- 10. Kadilar C, Cingi H. A new estimator using two auxiliary variables. Applied Mathematics and Computation. 2005;162(2):901–908.
- 11. Malik S, Singh R. An improved estimator using two auxiliary attributes. Applied Mathematics and Computation. 2013;219(23):10983–10986.
- 12. Singh R, Sharma P. A class of exponential ratio estimators of finite population mean using two auxiliary variables. Pakistan Journal of Statistics and Operation Research. 2015; p. 221–229.
- 13. Fatima M, Shahbaz SH, Hanif M, Shahbaz MQ. A modified regression-cum-ratio estimator for finite population mean in presence of nonresponse using ranked set sampling. AIMS Mathematics. 2022;7(4):6478–6488.
- 14. Muneer S, Shabbir J, Khalil A. Estimation of finite population mean in simple random sampling and stratified random sampling using two auxiliary variables. Communications in Statistics-Theory and Methods. 2017;46(5):2181–2192.
- 15. Rehman SA, Shabbir J. On the improvement of paired ranked set sampling to estimate population mean. Statistics in Transition New Series. 2021;22(3).
- 16. Khalid M, Singh GN. Some imputation methods to deal with the issue of missing data problems due to random non-response in two-occasion successive sampling. Communications in Statistics-Simulation and Computation. 2020; p. 1–21.
- 17. Singh GN, Khalid M. Exponential chain dual to ratio and regression type estimators of population mean in two-phase sampling. Statistica. 2015;75(4):379–389.
- 18. Singh G, Khalid M. Effective estimation strategy of population variance in two-phase successive sampling under random non-response. Journal of Statistical Theory and Practice. 2019;13(1):1–28.
- 19. Chen Z, Bai Z, Sinha BK. Ranked set sampling: theory and applications. vol. 176. Springer; 2004.
- 20. Cobby J, Ridout M, Bassett P, Large R. An investigation into the use of ranked set sampling on grass and grass-clover swards. Grass and Forage Science. 1985;40(3):257–263.
- 21. Murray R, Ridout M, Cross J, et al. The use of ranked set sampling in spray deposit assessment. Aspects of Applied Biology. 2000;57:141–146.
- 22. Samawi HM, Al-Saleh MF, Al-Saidy O. On the matched pairs sign test using bivariate ranked set sampling: an application to environmental issues. African Journal of Environmental Science and Technology. 2008;2(1):001–009.
- 23. Husby CE, Stasny EA, Wolfe DA. An application of ranked set sampling for mean and median estimation using USDA crop production data. Journal of Agricultural, Biological, and Environmental Statistics. 2005;10(3):354–373.
- 24. Bouza-Herrera CN. Handling missing data in ranked set sampling. Springer; 2013.
- 25. Al-Omari A, Bouza C. Ratio estimators of the population mean with missing values using ranked set sampling. Environmetrics. 2015;26(2):67–76.
- 26. Diana G, Giordan M, Perri PF. An improved class of estimators for the population mean. Statistical Methods & Applications. 2011;20(2):123–140.
- 27. Riaz S, Diana G, Shabbir J. Improved classes of estimators for population mean in presence of non-response. Pakistan Journal of Statistics. 2014;30(1).
- 28. Hansen MH, Hurwitz WN. The problem of non-response in sample surveys. Journal of the American Statistical Association. 1946;41(236):517–529. pmid:20279350
- 29. Rao P. Ratio estimation with sub sampling the non-respondents. Survey Methodology. 1986;12(2):217–230.
- 30. Rao P. Ratio and regression estimates with sub sampling the non-respondents. International Statistical Association Meeting. 1987;(2):2–16.
- 31. Cochran WG. Sampling Techniques. John Wiley & Sons; 1977.
- 32. Rao T. On certain methods of improving ratio and regression estimators. Communications in Statistics-Theory and Methods. 1991;20(10):3325–3340.
- 33. Singh RK. On estimating ratio and product of population parameters. Calcutta Statistical Association Bulletin. 1982;31(1-2):69–76.
- 34. Kadilar C, Cingi H. Ratio estimators in simple random sampling. Applied Mathematics and Computation. 2004;151(3):893–902.
- 35. Bahl S, Tuteja R. Ratio and product type exponential estimators. Journal of Information and Optimization Sciences. 1991;12(1):159–164.