Abstract
In the present paper we propose an improved class of estimators in the presence of measurement error and non-response under stratified random sampling for estimating the finite population mean. The theoretical and numerical studies reveal that the proposed class of estimators performs better than other existing estimators.
Citation: Zahid E, Shabbir J (2018) Estimation of population mean in the presence of measurement error and non response under stratified random sampling. PLoS ONE 13(2): e0191572. https://doi.org/10.1371/journal.pone.0191572
Editor: Christof Markus Aegerter, Universitat Zurich, SWITZERLAND
Received: February 24, 2017; Accepted: January 8, 2018; Published: February 5, 2018
Copyright: © 2018 Zahid, Shabbir. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting Information files. The data used in the simulation study and in the real-data study are available (see Appendix).
Funding: Erum Zahid received a fellowship from the Higher Education Commission of Pakistan, grant 213-63287-2SS2-187. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
In survey sampling, it is usually assumed that all observations on the characteristics under study are measured correctly. In practice this assumption is often not met, for a variety of reasons. In addition, non-response may occur because respondents refuse to give information, are not at home, lack interest, or have ethical objections. Measurement error and non-response are usually studied separately, using known auxiliary information, although in reality both can occur simultaneously in a survey. Information is rarely obtained from all units during a survey, so non-response is a common problem in sample surveys. In sampling theory, the estimation of the population mean of a variable of interest in the presence of non-response, when auxiliary information is available, has been widely discussed; [1–5] and [6] treated the problem of non-response in detail. Other researchers have dealt with the problem of measurement error when estimating the population mean; for more details, see [7–11]. Recently, a few researchers have studied measurement error and non-response together, for example [12–14] and [15]. [16] and [17] studied improved estimation of the population mean in simple and stratified random sampling.
In practice, most researchers who have studied measurement error have ignored the presence of non-response, and the few who studied both did so under simple random sampling. In this paper, we propose a class of estimators for estimating the population mean in the presence of measurement error and non-response simultaneously under stratified random sampling. The efficiency of the suggested class of estimators over the existing estimators is shown through a simulation study and real data sets.
Consider a finite population of N identifiable units partitioned into L homogeneous subgroups called strata, such that the hth stratum consists of Nh units, where h = 1, 2, …, L and $\sum_{h=1}^{L} N_h = N$. It is assumed that the population consists of two mutually exclusive groups, called the response and non-response groups. Let N1h and N2h be the numbers of responding and non-responding units in the hth stratum, respectively. We select a sample of size nh from Nh by simple random sampling without replacement (SRSWOR) and assume that n1h units respond and n2h units do not respond. We then select a sub-sample of size kh from the n2h non-responding units in the hth stratum.
Let $(x_{hi}, y_{hi})$ be the observed values and $(X_{hi}, Y_{hi})$ be the actual values of the variables (X, Y) for the ith (i = 1, 2, …, n) sample unit in the hth stratum. The measurement errors are then $U_{hi} = y_{hi} - Y_{hi}$ and $V_{hi} = x_{hi} - X_{hi}$. Let $S_{hY}^2$ and $S_{hX}^2$ be the population variances for the responding units, and $S_{hY(2)}^2$ and $S_{hX(2)}^2$ be the population variances for the non-responding units. Let $S_{hU}^2$ and $S_{hV}^2$ be the population variances associated with the measurement error for the responding units, and let $S_{hU(2)}^2$ and $S_{hV(2)}^2$ be the population variances associated with the measurement error for the non-responding part of the population. Further, let ChY and ChX be the coefficients of variation for the responding units and ChY(2) and ChX(2) be the coefficients of variation for the non-responding units. Let ρhYX and ρhYX(2) be the population correlation coefficients between Y and X for the responding and non-responding units, respectively.
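The sampling scheme described above can be sketched for a single stratum as follows. This is a minimal Python illustration, not the authors' code: the function name, the fixed seed, and the choice of which population units belong to the non-response group are all hypothetical assumptions made only for the example.

```python
import random

def draw_stratum_sample(stratum_size, n_h, nonresponse_rate, subsample_size, seed=0):
    """Draw an SRSWOR sample in one stratum and sub-sample its non-respondents.

    Returns (respondents, nonrespondents, followed_up) as lists of unit ids.
    """
    rng = random.Random(seed)
    units = list(range(stratum_size))
    # Assumption for illustration only: the last N2h units never respond.
    n_nonresp_pop = round(nonresponse_rate * stratum_size)
    nonresponders = set(units[stratum_size - n_nonresp_pop:])

    sample = rng.sample(units, n_h)  # simple random sampling without replacement
    respondents = [u for u in sample if u not in nonresponders]    # n1h units
    nonrespondents = [u for u in sample if u in nonresponders]     # n2h units

    # Second phase: follow up a sub-sample of the non-respondents.
    k_h = min(subsample_size, len(nonrespondents))
    followed_up = rng.sample(nonrespondents, k_h)
    return respondents, nonrespondents, followed_up

resp, nonresp, sub = draw_stratum_sample(1000, 200, 0.20, 20)
print(len(resp), len(nonresp), len(sub))
```

The counts printed are n1h, n2h and the sub-sample size for one draw; n1h + n2h always equals nh.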
In this paper an improved class of estimators for estimating the population mean in the presence of measurement error and non-response is proposed under stratified random sampling. Expressions for the bias and mean square error (MSE) of the class of estimators are obtained up to the first order of approximation, when both the study and the auxiliary variables are subject to non-response and measurement error.
The paper is organized as follows. Section 2 gives some existing estimators of the finite population mean. In Section 3, an improved class of estimators is suggested for estimating the finite population mean by incorporating both measurement error and non-response information simultaneously. Efficiency comparisons are presented in Section 4. Numerical results and a simulation study are presented in Section 5. The conclusion is given in Section 6.
Existing estimators in literature
In this section we consider the following existing estimators.
Hansen and Hurwitz (1946) estimator
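In stratified random sampling, the Hansen and Hurwitz (1946) estimator of the population mean $\bar{Y}$ is given by
$$\bar{y}^{*}_{st} = \sum_{h=1}^{L} W_h \, \bar{y}^{*}_{h} \tag{1}$$
where $\bar{y}^{*}_{h} = (n_{1h}\bar{y}_{1h} + n_{2h}\bar{y}_{2h})/n_h$ and $W_h = N_h/N$.
Here $\bar{y}_{1h}$ and $\bar{y}_{2h}$ are the sample means based on the n1h responding units and on the gh units sub-sampled from the n2h non-responding units, respectively.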
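As a small worked illustration, the Hansen and Hurwitz estimator can be computed as below. The sketch assumes the standard form of the estimator, in which the respondent mean and the follow-up mean are weighted by n1h/nh and n2h/nh within each stratum and the strata are combined with weights Nh/N. The function names and the toy data are hypothetical.

```python
from statistics import fmean

def hansen_hurwitz_stratum_mean(resp_values, followup_values, n2h):
    """Hansen-Hurwitz mean for one stratum.

    resp_values:     observed y for the n1h respondents
    followup_values: observed y for the sub-sampled non-respondents
    n2h:             number of non-respondents in the first-phase sample
    """
    n1h = len(resp_values)
    nh = n1h + n2h
    resp_mean = fmean(resp_values) if n1h else 0.0
    followup_mean = fmean(followup_values) if n2h else 0.0
    # Weight the two means by the shares of respondents and non-respondents.
    return (n1h * resp_mean + n2h * followup_mean) / nh

def hansen_hurwitz_stratified_mean(strata):
    """strata: list of (N_h, resp_values, followup_values, n2h) tuples."""
    total = sum(s[0] for s in strata)
    return sum(
        (N_h / total) * hansen_hurwitz_stratum_mean(resp, follow, n2h)
        for N_h, resp, follow, n2h in strata
    )

# Toy data (made up): two strata of sizes 100 and 300.
toy_strata = [
    (100, [10, 12, 14], [20], 2),
    (300, [30, 34], [40, 44], 4),
]
estimate = hansen_hurwitz_stratified_mean(toy_strata)
print(estimate)
```

For the toy data the stratum means are 15.2 and 232/6, so the stratified estimate is 0.25 × 15.2 + 0.75 × 232/6 = 32.8, up to floating-point rounding.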
Ratio estimator
The usual ratio estimator under stratified random sampling, is given by
(3)
where
is known and
is given in Eq (13). The bias and mean square error of
, are given by
(4)
and
(5)
where
,
Ch = λ2h ρhYX ShY ShX + θh ρhYX(2) ShY(2) ShX(2).
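A ratio-type adjustment can be sketched as follows. The example assumes the combined form of the stratified ratio estimator, in which the stratified estimate of the study-variable mean is multiplied by the ratio of the known auxiliary population mean to its stratified estimate; whether the paper uses the combined or the separate form is not recoverable from the text, and the function name and numbers are hypothetical.

```python
def combined_ratio_estimate(y_st, x_st, x_pop_mean):
    """Combined ratio estimate of the population mean of Y.

    y_st:       stratified estimate of the mean of the study variable
    x_st:       stratified estimate of the mean of the auxiliary variable
    x_pop_mean: known population mean of the auxiliary variable
    """
    if x_st == 0:
        raise ValueError("the auxiliary estimate must be non-zero")
    # Scale y_st by how far the auxiliary estimate is from its known mean.
    return y_st * x_pop_mean / x_st

# Toy numbers (made up): the sample overestimates the auxiliary mean (41 vs 40),
# so the ratio adjustment pulls the estimate of the study mean downwards.
ratio_estimate = combined_ratio_estimate(32.8, 41.0, 40.0)
print(ratio_estimate)
```

The adjustment helps most when the study and auxiliary variables are strongly positively correlated, as in the populations listed in the Appendix.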
The suggested estimator
We propose an improved general class of estimators for estimating the population mean that deals with measurement error and non-response simultaneously in stratified random sampling. Measurement error and non-response are present in both the study and the auxiliary variables. The suggested estimator is given by
(11)
where m1h and m2h are constants whose values are to be determined and αh is an arbitrarily chosen scalar. To obtain the bias and mean square error, we assume that
,
.
,
.
Adding and
, we get
.
Dividing both sides by nh, and then simplifying, we get
(12)
Similarly, we can get
(13)
Further
On simplifying, we get
(14)
where
and
.
Simplifying further and ignoring error terms of order greater than two, we have
(15)
where,
and
Using Eq (15), the bias of the proposed estimator, to the first order of approximation, is given by
(16)
Squaring both sides of Eq (15), keeping terms up to the second power of the errors, and then taking expectations, the mean square error of the proposed estimator is given by
The above equation can be written as
(17)
where,
,
,
,
,
.
For finding the optimal values of m1h and m2h, we differentiate Eq (17) with respect to m1h and m2h respectively. The optimal values are given by and
.
Substituting these optimum values in Eq (17), we get the minimum mean square error of the proposed estimator as
(18)
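The optimisation step can be illustrated on a generic quadratic. The block below assumes a function of the two constants with placeholder coefficients A, B, C, D and E; these are not the constants of Eq (17), whose exact form is not reproduced here, so the result shows only the pattern of the calculation. A minimum requires A > 0 and AB − C² > 0.

```latex
% Generic illustration: A, B, C, D, E are placeholder coefficients,
% not the constants defined for Eq (17).
\begin{align*}
f(m_1, m_2) &= 1 + A m_1^2 + B m_2^2 + 2 C m_1 m_2 - 2 D m_1 - 2 E m_2, \\
\frac{\partial f}{\partial m_1} = 0 &\;\Rightarrow\; A m_1 + C m_2 = D, \\
\frac{\partial f}{\partial m_2} = 0 &\;\Rightarrow\; C m_1 + B m_2 = E, \\
m_1^{\mathrm{opt}} &= \frac{BD - CE}{AB - C^2}, \qquad
m_2^{\mathrm{opt}} = \frac{AE - CD}{AB - C^2}, \\
f_{\min} &= 1 - D\, m_1^{\mathrm{opt}} - E\, m_2^{\mathrm{opt}}
          = 1 - \frac{B D^2 + A E^2 - 2 C D E}{AB - C^2}.
\end{align*}
```

The last line follows because, at the optimum, the quadratic terms equal D m1 + E m2, so substituting back leaves 1 minus that quantity.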
Efficiency comparison
The efficiency comparisons of the existing estimators with respect to the proposed class of estimators are given by
- From Eqs (2) and (18),
if
- From Eqs (5) and (18),
if
- From Eqs (7) and (18),
if
- From Eqs (10) and (18),
if
The proposed class of estimators is more efficient than the other existing estimators when the above conditions 1 to 4 are satisfied.
Numerical results
In this section, three populations are generated for a simulation study and four populations are based on real data sets. The results are given in Tables 1–3 (simulation) and Tables 4–7 (real data).
Simulation study
We generated three populations from the normal distribution using R. The first population has strata of equal size, the second has strata of unequal sizes, and the third has equal strata of small size (see Appendix).
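The generation of Population I can be mirrored in Python as a rough analogue of the R calls listed in the Appendix. The means and standard deviations are taken from the Appendix; the function name, the dictionary layout and the seed are hypothetical, and the values produced will not match the authors' R output.

```python
import random

def generate_stratum(size, mean_x, sd_x, rng):
    """Generate true and observed values for one stratum (Population I design)."""
    X = [rng.gauss(mean_x, sd_x) for _ in range(size)]  # true auxiliary values
    Y = [v + rng.gauss(0, 1) for v in X]                # true study values
    y = [v + rng.gauss(1, 3) for v in Y]                # observed study values (with error)
    x = [v + rng.gauss(1, 3) for v in X]                # observed auxiliary values (with error)
    return {"X": X, "Y": Y, "x": x, "y": y}

rng = random.Random(2018)  # arbitrary seed, chosen only for reproducibility
# (mean, standard deviation) of X in strata 1 to 4, as listed for Population I.
stratum_params = [(5, 10), (4, 8), (4, 9), (3, 7)]
population_1 = [generate_stratum(1000, m, s, rng) for m, s in stratum_params]

# Measurement errors on the study variable in stratum 1: observed minus true.
errors_u = [obs - true for obs, true in zip(population_1[0]["y"], population_1[0]["Y"])]
print(len(population_1), len(errors_u))
```

Here `rng.gauss(mu, sigma)` plays the role of `rnorm(n, mean, sd)`, drawing one value per call.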
Tables 1–3 show that the proposed general class of estimators outperforms all the other existing estimators, and that it performs best for α = 0.
Application to real data
In this section we have considered four data sets (see Appendix) for numerical comparisons and results are given in Tables 4–7.
Tables 4–7 show that the proposed general class of estimators is more efficient than all the other estimators considered, and that it performs best for α = 0.
Conclusion
In the present paper, we have suggested an improved class of estimators of the finite population mean in the presence of measurement error and non-response under stratified random sampling. The simulation study and the real data sets show that the proposed class of estimators performs better than the existing estimators. The mean square error values are generally smaller under 10% non-response than under 20% non-response, as expected: in general, the mean square error increases with the non-response rate. The numerical findings show that the proposed class of estimators is more efficient than the other existing estimators for α = 0, α = 1 and α = −1. Among these choices, the proposed class of estimators performs best for α = 0 in Tables 1–7.
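The expected growth of the mean square error with the non-response rate can be illustrated with the classical Hansen and Hurwitz variance for a single stratum. This is a simpler setting than the one studied here: it ignores measurement error and the auxiliary variable, and all numbers below are made up for illustration. The parameter `inverse_rate` is the inverse of the sub-sampling fraction among non-respondents.

```python
def hansen_hurwitz_variance(N, n, s2, s2_nonresp, w2, inverse_rate):
    """Variance of the Hansen-Hurwitz mean in one stratum, without measurement error.

    N, n:          population and sample sizes
    s2:            population variance of the study variable
    s2_nonresp:    population variance within the non-response group
    w2:            proportion of non-respondents in the population
    inverse_rate:  n2 / (sub-sample size), greater than 1
    """
    sampling_part = (1.0 / n - 1.0 / N) * s2
    nonresponse_part = w2 * (inverse_rate - 1.0) / n * s2_nonresp
    return sampling_part + nonresponse_part

# Same design, two non-response rates (illustrative values only).
var_10 = hansen_hurwitz_variance(1000, 200, 100.0, 100.0, 0.10, 2.0)
var_20 = hansen_hurwitz_variance(1000, 200, 100.0, 100.0, 0.20, 2.0)
print(var_10, var_20)
```

Only the second term depends on the non-response proportion, and it is linear in it, so doubling the rate from 10% to 20% doubles the non-response contribution to the variance.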
Appendix
Population I.
X1 = rnorm(1000, 5, 10), Y1 = X1 + rnorm(1000, 0, 1), y1 = Y1 + rnorm(1000, 1, 3), x1 = X1 + rnorm(1000, 1, 3)
X2 = rnorm(1000, 4, 8), Y2 = X2 + rnorm(1000, 0, 1), y2 = Y2 + rnorm(1000, 1, 3), x2 = X2 + rnorm(1000, 1, 3)
X3 = rnorm(1000, 4, 9), Y3 = X3 + rnorm(1000, 0, 1), y3 = Y3 + rnorm(1000, 1, 3), x3 = X3 + rnorm(1000, 1, 3)
X4 = rnorm(1000, 3, 7), Y4 = X4 + rnorm(1000, 0, 1), y4 = Y4 + rnorm(1000, 1, 3), x4 = X4 + rnorm(1000, 1, 3)
Number of strata = 4
N1 = 1000, N2 = 1000, N3 = 1000, N4 = 1000, n1 = 200, n2 = 200, n3 = 200, n4 = 200, ρ1YX = 0.9950779, ρ2YX = 0.9926346, ρ3YX = 0.9939164, ρ4YX = 0.9896319.
Population II.
X1 = rnorm(1000, 5, 10), Y1 = X1 + rnorm(1000, 0, 1), y1 = Y1 + rnorm(1000, 1, 3), x1 = X1 + rnorm(1000, 0, 1)
X2 = rnorm(1200, 4, 8), Y2 = X2 + rnorm(1200, 0, 1), y2 = Y2 + rnorm(1200, 1, 3), x2 = X2 + rnorm(1200, 0, 1)
X3 = rnorm(1300, 4, 9), Y3 = X3 + rnorm(1300, 0, 1), y3 = Y3 + rnorm(1300, 1, 3), x3 = X3 + rnorm(1300, 0, 1)
X4 = rnorm(1500, 3, 7), Y4 = X4 + rnorm(1500, 0, 1), y4 = Y4 + rnorm(1500, 1, 3), x4 = X4 + rnorm(1500, 1, 3)
Number of strata = 4
N1 = 1000, N2 = 1200, N3 = 1300, N4 = 1500, n1 = 200, n2 = 210, n3 = 220, n4 = 215, ρ1YX = 0.9950779, ρ2YX = 0.9924618, ρ3YX = 0.9939475, ρ4YX = 0.9903624.
Population III.
X1 = rnorm(200, 5, 10), Y1 = X1 + rnorm(200, 0, 1), y1 = Y1 + rnorm(200, 1, 3), x1 = X1 + rnorm(200, 0, 1)
X2 = rnorm(200, 4, 8), Y2 = X2 + rnorm(200, 0, 1), y2 = Y2 + rnorm(200, 1, 3), x2 = X2 + rnorm(200, 0, 1)
X3 = rnorm(200, 4, 9), Y3 = X3 + rnorm(200, 0, 1), y3 = Y3 + rnorm(200, 1, 3), x3 = X3 + rnorm(200, 0, 1)
X4 = rnorm(200, 3, 7), Y4 = X4 + rnorm(200, 0, 1), y4 = Y4 + rnorm(200, 1, 3), x4 = X4 + rnorm(200, 1, 3)
Number of strata = 4
N1 = 200, N2 = 200, N3 = 200, N4 = 200, n1 = 22, n2 = 28, n3 = 22, n4 = 25, ρ1YX = 0.9447295, ρ2YX = 0.9214058, ρ3YX = 0.9529337, ρ4YX = 0.912232.
Population IV. (Source: [18])
Y: Number of teachers, X: Number of students.
Number of strata = 6
N1 = 106, N2 = 106, N3 = 94, N4 = 171, N5 = 204, N6 = 173, n1 = 15, n2 = 15, n3 = 12, n4 = 20, n5 = 23, n6 = 15, ρ1YX = 0.8156414, ρ2YX = 0.8156414, ρ3YX = 0.9011201, ρ4YX = 0.9858761, ρ5YX = 0.7130988, ρ6YX = 0.893595.
Population V. (Source: [19])
Y: 1983 population (in millions), X: 1982 gross national product
Number of strata = 5
N1 = 38, N2 = 14, N3 = 11, N4 = 33, N5 = 24, n1 = 17, n2 = 6, n3 = 4, n4 = 12, n5 = 11, ρ1YX = 0.7439544, ρ2YX = 0.969956, ρ3YX = 0.9768227, ρ4YX = 0.2948897, ρ5YX = 0.9011072.
Population VI. (Source: [19])
Y: 1983 population (in millions), X: 1980 population (in millions)
Number of strata = 5
N1 = 38, N2 = 14, N3 = 11, N4 = 33, N5 = 24, n1 = 17, n2 = 6, n3 = 4, n4 = 12, n5 = 11, ρ1YX = 0.9996193, ρ2YX = 0.9998693, ρ3YX = 0.9998858, ρ4YX = 0.9993071, ρ5YX = 0.9998059.
Population VII. (Source: [20])
Y: Production of wheat (in tons), X: Area of wheat (in hectares)
Number of strata = 4
N1 = 47, N2 = 30, N3 = 29, N4 = 13, n1 = 15, n2 = 10, n3 = 10, n4 = 5, ρ1YX = 0.9583838, ρ2YX = 0.779071, ρ3YX = 0.8719665, ρ4YX = 0.9922591.
Supporting information
S1 File. Data used in the manuscript “S1 File.zip”.
https://doi.org/10.1371/journal.pone.0191572.s001
(ZIP)
Acknowledgments
The authors are grateful to the two anonymous referees for their valuable comments and feedback.
References
- 1. Hansen MH, Hurwitz WN. The problem of non-response in sample surveys. Journal of the American Statistical Association. 1946;41(236):517–529. pmid:20279350
- 2. Cochran WG. Sampling techniques. New York: John Wiley & Sons; 1977.
- 3. Rao P. Ratio estimation with subsampling the nonrespondents. Survey Methodology. 1986;12(2):217–230.
- 4. Khare B, Srivastava S. Generalized two phase sampling estimators for the population mean in the presence of nonresponse. Aligarh Journal of Statistics. 2010;30:39–54.
- 5. Singh HP, Kumar S, et al. Combination of regression and ratio estimate in presence of nonresponse. Brazilian Journal of Probability and Statistics. 2011;25(2):205–217.
- 6. Shabbir J, Khan NS. Some Modified Exponential-Ratio Type Estimators in the presence of Non-response under Two-Phase Sampling Scheme. Electronic Journal of Applied Statistical Analysis. 2013;6(1):1–17.
- 7. Cochran WG. Errors of measurement in statistics. Technometrics. 1968;10(4):637–666.
- 8. Fuller WA. Estimation in the presence of measurement error. International Statistical Review/Revue Internationale de Statistique. 1995; p. 121–141.
- 9. Shalabh S. Ratio method of estimation in the presence of measurement errors. Jour Ind Soc Agri Statist. 1997;52:150–155.
- 10. Biemer PP, Groves RM, Lyberg LE, Mathiowetz NA, Sudman S. Measurement Errors in Surveys. John Wiley & Sons; 2011.
- 11. Shukla D, Pathak S, Thakur N. An estimator for mean estimation in presence of measurement error. Research and Reviews: A Journal of Statistics. 2012;1(1):1–8.
- 12. Kumar S, Bhougal S, Nataraja N, Viswanathaiah M. Estimation of Population Mean in the Presence of Non-Response and Measurement Error. Revista Colombiana de Estadística. 2015;38(1):145–161.
- 13. Singh RS, Sharma P. Method of Estimation in the Presence of Non-response and Measurement Errors Simultaneously. Journal of Modern Applied Statistical Methods. 2015;14(1):12.
- 14. Azeem M, Hanif M. Joint influence of measurement error and non response on estimation of population mean. Communications in Statistics-Theory and Methods. 2017;46(4):1679–1693.
- 15. Kumar S. Improved estimation of population mean in presence of nonresponse and measurement error. Journal of Statistical Theory and Practice. 2016;10(4):707–720.
- 16. Shabbir J, Gupta S. On estimating finite population mean in simple and stratified random sampling. Communications in Statistics-Theory and Methods. 2010;40(2):199–212.
- 17. Haq A, Shabbir J. Improved family of ratio estimators in simple and stratified random sampling. Communications in Statistics-Theory and Methods. 2013;42(5):782–799.
- 18. Koyuncu N, Kadilar C. Ratio and product estimators in stratified random sampling. Journal of Statistical Planning and Inference. 2009;139(8):2552–2558.
- 19. Särndal C, Swensson B, Wretman J. Model Assisted Survey Sampling. New York: Springer; 1992.
- 20. FBS. Crops area production by districts, Islamabad; 2011.