Estimation of population mean in the presence of measurement error and non response under stratified random sampling

  • Erum Zahid,

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    erumzahid22@gmail.com

    Affiliation Department of Statistics, Quaid-i-Azam University, Islamabad, Pakistan

  • Javid Shabbir

    Roles Investigation, Project administration, Resources, Supervision, Validation, Visualization

    Affiliation Department of Statistics, Quaid-i-Azam University, Islamabad, Pakistan

Abstract

In the present paper we propose an improved class of estimators in the presence of measurement error and non-response under stratified random sampling for estimating the finite population mean. The theoretical and numerical studies reveal that the proposed class of estimators performs better than other existing estimators.

Introduction

In survey sampling, it is usually assumed that all observations on the characteristics under study are measured correctly. In practice this assumption is often not met for a variety of reasons; for example, non-response may occur because respondents refuse to give information, are not at home, lack interest, or have ethical concerns. Measurement error and non-response are usually studied separately, using known auxiliary information. In reality, both occur simultaneously in survey sampling. Information is rarely obtained from all units during a survey, so non-response is a common problem that may creep into a sample survey. In sampling theory, the estimation of the population mean of a variable of interest in the presence of non-response, when auxiliary information is available, is widely debated. [1–5] and [6] discussed the problems of non-response in detail. Researchers have also dealt with the problem of measurement error when estimating the population mean; for more details, see [7–11]. Recently, a few researchers have studied measurement error and non-response together, such as [12–14] and [15]. [16] and [17] studied improved estimation of the population mean in simple and stratified random sampling.

In practice, most researchers who have studied measurement error have ignored the presence of non-response, and the few who studied both did so under simple random sampling. In this paper, we propose a class of estimators for estimating the population mean in the presence of measurement error and non-response simultaneously under stratified random sampling. The efficiency of the suggested class of estimators over the existing estimators is shown through a simulation study and real data sets.

Consider a finite population of N identifiable units partitioned into L homogeneous subgroups called strata, such that the hth stratum consists of Nh units, where h = 1, 2, …, L and $\sum_{h=1}^{L} N_h = N$. It is assumed that the population consists of two mutually exclusive groups, called the response and non-response groups. Let N1h and N2h be the numbers of responding and non-responding units in the hth stratum, respectively. We select a sample of size nh from Nh by simple random sampling without replacement (SRSWOR) and assume that n1h units respond and n2h units do not respond. We then select a sub-sample of size kh from the n2h non-responding units in the hth stratum.
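The sampling scheme just described can be sketched in code. The following Python fragment is a minimal illustration, not the authors' program: the function name `draw_stratum_sample`, the `responds` lookup and the `sub_size` argument are hypothetical, and response status is assumed to be a fixed attribute of each unit.

```python
import random

def draw_stratum_sample(stratum, responds, n_h, sub_size, rng):
    """Sample one stratum and sub-sample its non-respondents.

    stratum  : list of unit labels in the hth stratum (length N_h)
    responds : dict, unit label -> True if the unit responds (assumed fixed)
    n_h      : SRSWOR sample size for this stratum
    sub_size : requested sub-sample size among the non-respondents
    rng      : random.Random instance, so runs are reproducible
    """
    sample = rng.sample(stratum, n_h)                        # SRSWOR of n_h units
    respondents = [u for u in sample if responds[u]]         # the n_1h units
    nonrespondents = [u for u in sample if not responds[u]]  # the n_2h units
    # The sub-sample cannot be larger than the number of non-respondents.
    k = min(sub_size, len(nonrespondents))
    followed_up = rng.sample(nonrespondents, k)
    return respondents, nonrespondents, followed_up

# Illustrative stratum: 200 units, the last 40 (20%) treated as non-respondents.
stratum = list(range(200))
responds = {u: u < 160 for u in stratum}
rng = random.Random(1)
resp, nonresp, followed = draw_stratum_sample(stratum, responds, 22, 3, rng)
print(len(resp), len(nonresp), len(followed))
```

Repeating the call for each stratum with its own nh gives the full stratified sample; the 20% non-response share used here is only an example value.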

Let $(x_{hi}, y_{hi})$ be the observed values and $(X_{hi}, Y_{hi})$ be the actual values of the variables (X, Y) for the ith (i = 1, 2, …, n) sample unit in the hth stratum. The measurement errors are then $U_{hi} = y_{hi} - Y_{hi}$ and $V_{hi} = x_{hi} - X_{hi}$. Let $S^2_{hY}$ and $S^2_{hX}$ be the population variances for the responding units, and $S^2_{hY(2)}$ and $S^2_{hX(2)}$ be the population variances for the non-responding units. Let $S^2_{hU}$ and $S^2_{hV}$ be the population variances associated with the measurement error for the responding units, and let $S^2_{hU(2)}$ and $S^2_{hV(2)}$ be the population variances associated with the measurement error for the non-responding part of the population. Further, let ChY and ChX be the coefficients of variation for the respondents and ChY(2) and ChX(2) be the coefficients of variation for the non-responding units. Let ρhYX and ρhYX(2) be the population correlation coefficients between Y and X for the responding and non-responding units, respectively.

In this paper an improved class of estimators for estimating the population mean in the presence of measurement error and non-response is proposed under stratified random sampling. Expressions for the bias and mean square error (MSE) of the class of estimators are obtained up to the first order of approximation, when both the study and the auxiliary variables suffer from non-response and measurement error.

The paper is organized as follows. Section 2 presents some existing estimators of the finite population mean. In Section 3, an improved class of estimators is suggested for estimating the finite population mean by incorporating both measurement error and non-response information simultaneously. Efficiency comparisons are presented in Section 4. Numerical results and a simulation study are presented in Section 5. The conclusion is given in Section 6.

Existing estimators in literature

In this section we consider the following existing estimators.

Hansen and Hurwitz (1946) estimator

In stratified random sampling, the Hansen and Hurwitz (1946) estimator for population mean , is given by (1) where and .

Here and are the sample means based on n1h of responding and gh units of sub-samples from n2h non-responding groups, respectively.
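As a rough illustration of how such an estimate is assembled, the sketch below uses the textbook Hansen and Hurwitz form: within each stratum the respondent mean and the sub-sample mean are weighted by n1h/nh and n2h/nh, and the stratum estimates are then combined with weights Nh/N. Whether this matches Eq (1) term by term is an assumption, and the function names are hypothetical.

```python
def hansen_hurwitz_stratum_mean(y_respondents, n2h, y_subsample):
    """Textbook Hansen-Hurwitz mean for one stratum (assumed form).

    y_respondents : observed y values of the n_1h responding units
    n2h           : number of non-responding units in the stratum sample
    y_subsample   : observed y values of the followed-up non-respondents
    """
    n1h = len(y_respondents)
    nh = n1h + n2h
    mean_resp = sum(y_respondents) / n1h if n1h else 0.0
    mean_sub = sum(y_subsample) / len(y_subsample) if y_subsample else 0.0
    # The sub-sample mean stands in for all n_2h non-respondents.
    return (n1h * mean_resp + n2h * mean_sub) / nh

def stratified_mean(stratum_sizes, stratum_estimates):
    """Combine stratum estimates with weights N_h / N."""
    total = sum(stratum_sizes)
    return sum(Nh / total * est
               for Nh, est in zip(stratum_sizes, stratum_estimates))

# Toy numbers: two respondents, two non-respondents, one followed up.
est_1 = hansen_hurwitz_stratum_mean([2.0, 4.0], 2, [10.0])
print(est_1)
print(stratified_mean([100, 300], [est_1, 2.5]))
```

The key design point is that the sub-sample mean receives the full non-respondent weight n2h/nh, not the weight of the units actually followed up.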

The variance of , is given by (2) where , , .

Ratio estimator

The usual ratio estimator under stratified random sampling, is given by (3) where is known and is given in Eq (13). The bias and mean square error of , are given by (4) and (5) where , Ch = λ2h ρhYX ShY ShX + θh ρhYX(2) ShY(2) ShX(2).
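To make the idea concrete, here is a minimal sketch that assumes the combined-ratio form, in which the stratified estimate of the mean of Y is scaled by the ratio of the known population mean of X to its stratified estimate. The function and argument names are hypothetical, and the exact form of Eq (3) is not confirmed by the text above.

```python
def combined_ratio_estimate(ybar_st, xbar_st, X_bar):
    """Combined ratio estimate of the population mean of Y (assumed form).

    ybar_st : stratified estimate of the mean of Y
    xbar_st : stratified estimate of the mean of X
    X_bar   : known population mean of X
    """
    if xbar_st == 0:
        raise ValueError("xbar_st must be non-zero for a ratio estimate")
    return ybar_st * X_bar / xbar_st

# If the sample under-estimates the mean of X (4 vs. the known 5),
# the estimate of the mean of Y is scaled up by the same factor.
print(combined_ratio_estimate(10.0, 4.0, 5.0))
```

Such an adjustment helps most when Y and X are strongly and positively correlated, as they are in the populations listed in the Appendix.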

Difference estimator

The usual difference estimator under stratified random sampling, is given by (6) where and dh is the constant.

The minimum variance of is given by (7) The optimum value of dh is , where .
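A difference estimator of this kind can be sketched as follows, assuming a separate (stratum-by-stratum) form in which each stratum estimate of Y is shifted by dh times the gap between the known stratum mean of X and its estimate. The names and the exact form are assumptions for illustration only; the optimum dh of Eq (7) is not reproduced here.

```python
def difference_estimate(weights, ybar, xbar, X_bar, d):
    """Separate difference estimate of the population mean (assumed form).

    weights : stratum weights N_h / N
    ybar    : stratum estimates of the mean of Y
    xbar    : stratum estimates of the mean of X
    X_bar   : known stratum means of X
    d       : stratum constants d_h
    """
    return sum(w * (y + dh * (Xh - x))
               for w, y, x, Xh, dh in zip(weights, ybar, xbar, X_bar, d))

# Two equally weighted strata with illustrative values.
print(difference_estimate([0.5, 0.5], [10.0, 20.0], [4.0, 6.0],
                          [5.0, 5.0], [2.0, 1.0]))
```

Setting every dh to zero recovers the unadjusted stratified mean, which is a quick sanity check on any implementation.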

Azeem and Hanif (2017) estimator

The Azeem and Hanif (2017) estimator under stratified random sampling, is given by (8) The bias and MSE of , are given by (9) and (10) where .

The suggested estimator

We propose an improved general class of estimators for estimating the population mean that deals with measurement error and non-response simultaneously in stratified random sampling. Measurement error and non-response are present in both the study and the auxiliary variables. The suggested estimator is given by (11) where m1h and m2h are constants whose values are to be determined and αh is an arbitrarily chosen scalar. For obtaining the bias and mean square error, we assume that , .

, .

Adding and , we get .

Dividing both sides by nh, and then simplifying, we get (12)

Similarly, we can get (13) Further

On simplifying, we get (14) where and .

Further simplifying, and ignoring error terms of order greater than two, we have (15) where, and

Using Eq (15), the bias of to first order of approximation, is given by (16) Squaring both sides of Eq (15), and keeping the terms up to power two in errors, and then taking expectations, the mean square error of is given by The above equation can be written as (17) where, , , , , .

For finding the optimal values of m1h and m2h, we differentiate Eq (17) with respect to m1h and m2h respectively. The optimal values are given by and .
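The optimization step can be illustrated with a generic example. The sketch below assumes that the MSE is a quadratic in the two weights of the form const + A·m1² + B·m2² + 2C·m1·m2 − 2D·m1 − 2E·m2; the coefficient names A to E are hypothetical placeholders and are not taken from Eq (17). Setting both partial derivatives to zero gives a 2×2 linear system, which is solved in closed form.

```python
def optimal_weights(A, B, C, D, E):
    """Minimize A*m1^2 + B*m2^2 + 2*C*m1*m2 - 2*D*m1 - 2*E*m2.

    The first-order conditions are
        A*m1 + C*m2 = D
        C*m1 + B*m2 = E
    and the stationary point is a minimum when A > 0 and A*B > C^2.
    """
    det = A * B - C * C
    if A <= 0 or det <= 0:
        raise ValueError("need A > 0 and A*B > C^2 for a unique minimum")
    m1 = (B * D - C * E) / det
    m2 = (A * E - C * D) / det
    return m1, m2

def quadratic_mse(m1, m2, const, A, B, C, D, E):
    """Evaluate the assumed quadratic form at (m1, m2)."""
    return (const + A * m1 ** 2 + B * m2 ** 2 + 2 * C * m1 * m2
            - 2 * D * m1 - 2 * E * m2)

# Illustrative coefficients only.
m1, m2 = optimal_weights(2.0, 3.0, 1.0, 1.0, 2.0)
print(m1, m2, quadratic_mse(m1, m2, 5.0, 2.0, 3.0, 1.0, 1.0, 2.0))
```

At the optimum the quadratic reduces to const − (D·m1 + E·m2), which is the general shape a minimum-MSE expression takes under this assumed form.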

Substituting these optimum values in Eq (17), we get the minimum mean square error of , as (18)

Efficiency comparison

The efficiency comparison of and with respect to are given by,

  1. From Eqs (2) and (18), if
  2. From Eqs (5) and (18), if
  3. From Eqs (7) and (18), if
  4. From Eqs (10) and (18), if

The proposed class of estimators is more efficient than the other existing estimators when conditions 1 to 4 above are satisfied.

Numerical results

In this section, three populations are generated for the simulation study and four are based on real data sets. The results are given in Tables 1–3 (simulation) and Tables 4–7 (real data).

Table 1. Mean squared error (MSE) of different estimators for Pop I (N = 4000).

https://doi.org/10.1371/journal.pone.0191572.t001

Table 2. Mean squared error (MSE) of different estimators for Pop II (N = 5000).

https://doi.org/10.1371/journal.pone.0191572.t002

Table 3. Mean squared error (MSE) of different estimators for Pop III (N = 800).

https://doi.org/10.1371/journal.pone.0191572.t003

Table 4. Mean squared error (MSE) of different estimators for Pop IV (N = 854).

https://doi.org/10.1371/journal.pone.0191572.t004

Table 5. Mean squared error (MSE) of different estimators for Pop V (N = 120).

https://doi.org/10.1371/journal.pone.0191572.t005

Table 6. Mean squared error (MSE) of different estimators for Pop VI (N = 120).

https://doi.org/10.1371/journal.pone.0191572.t006

Table 7. Mean squared error (MSE) of different estimators for Pop VII (N = 119).

https://doi.org/10.1371/journal.pone.0191572.t007

Simulation study

We generated three populations from the normal distribution using the R language. The first population has equal strata sizes, the second has unequal strata sizes, and the third has equal strata of small size (see Appendix).

The above tables show that the proposed general class of estimators outperforms all the other existing estimators. For shows the better performance.

Application to real data

In this section we consider four data sets (see Appendix) for numerical comparison; the results are given in Tables 4–7.

In these tables, we observe that the proposed general class of estimators is more efficient than all the other considered estimators. For shows the better performance.

Conclusion

In the present paper, we have suggested an improved class of estimators of the finite population mean in the presence of measurement error and non-response under stratified random sampling. The simulation study and the real data sets show that the proposed class of estimators performs better than the existing estimators. The mean square error values are generally smaller under 10% non-response than under 20% non-response, as expected: in general, as the non-response rate increases, the mean square error also increases. The numerical findings show that the proposed class of estimators is more efficient than the other existing estimators for α = 0, α = 1 and α = −1. Among these choices, the proposed class performs best for α = 0 in Tables 1–7.

Appendix

Population I.

X1 = rnorm(1000, 5, 10), Y1 = X1 + rnorm(1000, 0, 1), y1 = Y1 + rnorm(1000, 1, 3), x1 = X1 + rnorm(1000, 1, 3)

X2 = rnorm(1000, 4, 8), Y2 = X2 + rnorm(1000, 0, 1), y2 = Y2 + rnorm(1000, 1, 3), x2 = X2 + rnorm(1000, 1, 3)

X3 = rnorm(1000, 4, 9), Y3 = X3 + rnorm(1000, 0, 1), y3 = Y3 + rnorm(1000, 1, 3), x3 = X3 + rnorm(1000, 1, 3)

X4 = rnorm(1000, 3, 7), Y4 = X4 + rnorm(1000, 0, 1), y4 = Y4 + rnorm(1000, 1, 3), x4 = X4 + rnorm(1000, 1, 3)

Number of strata = 4

N1 = 1000, N2 = 1000, N3 = 1000, N4 = 1000, n1 = 200, n2 = 200, n3 = 200, n4 = 200, ρ1YX = 0.9950779, ρ2YX = 0.9926346, ρ3YX = 0.9939164, ρ4YX = 0.9896319.
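For readers without R, the recipe for the first stratum of Population I can be mirrored in Python. This is a sketch under stated assumptions, not the authors' script: `generate_stratum` and `pearson` are hypothetical helper names, the seed is arbitrary, and `random.gauss(mu, sigma)` is used as the counterpart of R's `rnorm(n, mean, sd)`, so the generated numbers will differ from those behind the published tables.

```python
import math
import random

def generate_stratum(size, mean_x, sd_x, err_mean, err_sd, rng):
    """Generate true values (X, Y) and error-contaminated values (x, y)."""
    X = [rng.gauss(mean_x, sd_x) for _ in range(size)]   # X = rnorm(size, mean_x, sd_x)
    Y = [Xi + rng.gauss(0, 1) for Xi in X]               # Y = X + rnorm(size, 0, 1)
    y = [Yi + rng.gauss(err_mean, err_sd) for Yi in Y]   # y = Y + measurement error
    x = [Xi + rng.gauss(err_mean, err_sd) for Xi in X]   # x = X + measurement error
    return X, Y, x, y

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    s_aa = sum((ai - mean_a) ** 2 for ai in a)
    s_bb = sum((bi - mean_b) ** 2 for bi in b)
    return s_ab / math.sqrt(s_aa * s_bb)

rng = random.Random(2018)  # arbitrary seed, for reproducibility only
X1, Y1, x1, y1 = generate_stratum(1000, 5, 10, 1, 3, rng)
print("correlation of true values:    ", round(pearson(Y1, X1), 4))
print("correlation of observed values:", round(pearson(y1, x1), 4))
```

The correlation between the true values should land close to the ρ1YX reported above, while the correlation between the observed values is visibly lower, which shows how measurement error weakens the auxiliary information.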

Population II.

X1 = rnorm(1000, 5, 10), Y1 = X1 + rnorm(1000, 0, 1), y1 = Y1 + rnorm(1000, 1, 3), x1 = X1 + rnorm(1000, 0, 1)

X2 = rnorm(1200, 4, 8), Y2 = X2 + rnorm(1200, 0, 1), y2 = Y2 + rnorm(1200, 1, 3), x2 = X2 + rnorm(1200, 0, 1)

X3 = rnorm(1300, 4, 9), Y3 = X3 + rnorm(1300, 0, 1), y3 = Y3 + rnorm(1300, 1, 3), x3 = X3 + rnorm(1300, 0, 1)

X4 = rnorm(1500, 3, 7), Y4 = X4 + rnorm(1500, 0, 1), y4 = Y4 + rnorm(1500, 1, 3), x4 = X4 + rnorm(1500, 1, 3)

Number of strata = 4

N1 = 1000, N2 = 1200, N3 = 1300, N4 = 1500, n1 = 200, n2 = 210, n3 = 220, n4 = 215, ρ1YX = 0.9950779, ρ2YX = 0.9924618, ρ3YX = 0.9939475, ρ4YX = 0.9903624.

Population III.

X1 = rnorm(200, 5, 10), Y1 = X1 + rnorm(200, 0, 1), y1 = Y1 + rnorm(200, 1, 3), x1 = X1 + rnorm(200, 0, 1)

X2 = rnorm(200, 4, 8), Y2 = X2 + rnorm(200, 0, 1), y2 = Y2 + rnorm(200, 1, 3), x2 = X2 + rnorm(200, 0, 1)

X3 = rnorm(200, 4, 9), Y3 = X3 + rnorm(200, 0, 1), y3 = Y3 + rnorm(200, 1, 3), x3 = X3 + rnorm(200, 0, 1)

X4 = rnorm(200, 3, 7), Y4 = X4 + rnorm(200, 0, 1), y4 = Y4 + rnorm(200, 1, 3), x4 = X4 + rnorm(200, 1, 3)

Number of strata = 4

N1 = 200, N2 = 200, N3 = 200, N4 = 200, n1 = 22, n2 = 28, n3 = 22, n4 = 25, ρ1YX = 0.9447295, ρ2YX = 0.9214058, ρ3YX = 0.9529337, ρ4YX = 0.912232.

Population IV. (Source: [18])

Y: Number of teachers, X: Number of students.

Number of strata = 6

N1 = 106, N2 = 106, N3 = 94, N4 = 171, N5 = 204, N6 = 173, n1 = 15, n2 = 15, n3 = 12, n4 = 20, n5 = 23, n6 = 15, ρ1YX = 0.8156414, ρ2YX = 0.8156414, ρ3YX = 0.9011201, ρ4YX = 0.9858761, ρ5YX = 0.7130988, ρ6YX = 0.893595.

Population V. (Source: [19])

Y: 1983 population (in millions), X: 1982 gross national product

Number of strata = 5

N1 = 38, N2 = 14, N3 = 11, N4 = 33, N5 = 24, n1 = 17, n2 = 6, n3 = 4, n4 = 12, n5 = 11, ρ1YX = 0.7439544, ρ2YX = 0.969956, ρ3YX = 0.9768227, ρ4YX = 0.2948897, ρ5YX = 0.9011072.

Population VI. (Source: [19])

Y: 1983 population (in millions), X: 1980 population (in millions)

Number of strata = 5

N1 = 38, N2 = 14, N3 = 11, N4 = 33, N5 = 24, n1 = 17, n2 = 6, n3 = 4, n4 = 12, n5 = 11, ρ1YX = 0.9996193, ρ2YX = 0.9998693, ρ3YX = 0.9998858, ρ4YX = 0.9993071, ρ5YX = 0.9998059.

Population VII. (Source: [20])

Y: Production of wheat (in tons), X: Area of wheat (in hectares)

Number of strata = 4

N1 = 47, N2 = 30, N3 = 29, N4 = 13, n1 = 15, n2 = 10, n3 = 10, n4 = 5, ρ1YX = 0.9583838, ρ2YX = 0.779071, ρ3YX = 0.8719665, ρ4YX = 0.9922591.

Supporting information

S1 File. Data used in the manuscript “S1 File.zip”.

https://doi.org/10.1371/journal.pone.0191572.s001

(ZIP)

Acknowledgments

The authors are grateful to the two anonymous referees for their valuable comments and feedback.

References

  1. Hansen MH, Hurwitz WN. The problem of non-response in sample surveys. Journal of the American Statistical Association. 1946;41(236):517–529. pmid:20279350
  2. Cochran WG. Sampling techniques. New York: John Wiley & Sons; 1977.
  3. Rao P. Ratio estimation with subsampling the nonrespondents. Survey Methodology. 1986;12(2):217–230.
  4. Khare B, Srivastava S. Generalized two phase sampling estimators for the population mean in the presence of nonresponse. Aligarh Journal of Statistics. 2010;30:39–54.
  5. Singh HP, Kumar S, et al. Combination of regression and ratio estimate in presence of nonresponse. Brazilian Journal of Probability and Statistics. 2011;25(2):205–217.
  6. Shabbir J, Khan NS. Some Modified Exponential-Ratio Type Estimators in the presence of Non-response under Two-Phase Sampling Scheme. Electronic Journal of Applied Statistical Analysis. 2013;6(1):1–17.
  7. Cochran WG. Errors of measurement in statistics. Technometrics. 1968;10(4):637–666.
  8. Fuller WA. Estimation in the presence of measurement error. International Statistical Review/Revue Internationale de Statistique. 1995; p. 121–141.
  9. Shalabh S. Ratio method of estimation in the presence of measurement errors. Jour Ind Soc Agri Statist. 1997;52:150–155.
  10. Biemer PP, Groves RM, Lyberg LE, Mathiowetz NA, Sudman S. Measurement Errors in Surveys. John Wiley & Sons; 2011.
  11. Shukla D, Pathak S, Thakur N. An estimator for mean estimation in presence of measurement error. Research and Reviews: A Journal of Statistics. 2012;1(1):1–8.
  12. Kumar S, Bhougal S, Nataraja N, Viswanathaiah M. Estimation of Population Mean in the Presence of Non-Response and Measurement Error. Revista Colombiana de Estadística. 2015;38(1):145–161.
  13. Singh RS, Sharma P. Method of Estimation in the Presence of Non-response and Measurement Errors Simultaneously. Journal of Modern Applied Statistical Methods. 2015;14(1):12.
  14. Azeem M, Hanif M. Joint influence of measurement error and non response on estimation of population mean. Communications in Statistics-Theory and Methods. 2017;46(4):1679–1693.
  15. Kumar S. Improved estimation of population mean in presence of nonresponse and measurement error. Journal of Statistical Theory and Practice. 2016;10(4):707–720.
  16. Shabbir J, Gupta S. On estimating finite population mean in simple and stratified random sampling. Communications in Statistics-Theory and Methods. 2010;40(2):199–212.
  17. Haq A, Shabbir J. Improved family of ratio estimators in simple and stratified random sampling. Communications in Statistics-Theory and Methods. 2013;42(5):782–799.
  18. Koyuncu N, Kadilar C. Ratio and product estimators in stratified random sampling. Journal of Statistical Planning and Inference. 2009;139(8):2552–2558.
  19. Särndal C, Swensson B, Wretman J. Model Assisted Survey Sampling. New York: Springer; 1992.
  20. FBS. Crops area production by districts, Islamabad; 2011.