Estimation of population mean in the presence of measurement error and non response under stratified random sampling

Erum Zahid; Javid Shabbir

doi:10.1371/journal.pone.0191572

Abstract

In the present paper we propose an improved class of estimators in the presence of measurement error and non-response under stratified random sampling for estimating the finite population mean. The theoretical and numerical studies reveal that the proposed class of estimators performs better than other existing estimators.

Citation: Zahid E, Shabbir J (2018) Estimation of population mean in the presence of measurement error and non response under stratified random sampling. PLoS ONE 13(2): e0191572. https://doi.org/10.1371/journal.pone.0191572

Editor: Christof Markus Aegerter, Universitat Zurich, SWITZERLAND

Received: February 24, 2017; Accepted: January 8, 2018; Published: February 5, 2018

Copyright: © 2018 Zahid, Shabbir. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper and its Supporting Information files. The data used in the simulation study and the real-data study are available (see Appendix).

Funding: Erum Zahid received a fellowship from the Higher Education Commission of Pakistan, grant 213-63287-2SS2-187. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

In survey sampling, it is usually assumed that all observations on the characteristics under study are measured correctly. In practice this assumption is often not met, for a variety of reasons. Non-response, for example, may occur because respondents refuse to give information, are not at home, lack interest, or decline for ethical reasons. Measurement error and non-response are usually studied separately, using known auxiliary information. In reality, both can occur simultaneously in survey sampling. Information is rarely obtained from all the units in a survey, so non-response is a common problem. In sampling theory, the estimation of the population mean of a variable of interest in the presence of non-response, when auxiliary information is available, has been widely discussed. [1–5] and [6] discussed the problems of non-response in detail. Researchers have also dealt with the problem of measurement error when estimating the population mean; for more details, see [7–11]. Recently, a few researchers have studied measurement error and non-response together, such as [12–14] and [15]. [16] and [17] studied the improved estimation of the population mean in simple and stratified random sampling.

In practice, most researchers who have studied measurement error have ignored the presence of non-response, and only a few have studied both, under simple random sampling. In this paper, we propose a class of estimators for estimating the population mean in the presence of measurement error and non-response simultaneously under stratified random sampling. The efficiency of the suggested class of estimators over the existing estimators is shown through a simulation study and real data sets.

Consider a finite population of N identifiable units partitioned into L homogeneous subgroups called strata, such that the h^th stratum consists of N_h units, where h = 1, 2, …, L and N_1 + N_2 + … + N_L = N. It is assumed that the population consists of two mutually exclusive groups, called the response and non-response groups. Let N_1h and N_2h be the numbers of responding and non-responding units in the h^th stratum, respectively. We select a sample of size n_h from the N_h units by simple random sampling without replacement (SRSWOR) and assume that n_1h units respond and n_2h units do not. We then select a sub-sample of size k_h from the n_2h non-responding units in the h^th stratum.
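The sampling scheme just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used to produce the paper's results: the function name, the way non-responding units are labelled, and the `sub_fraction` parameter (the share of sampled non-respondents that is followed up) are all hypothetical.

```python
import random

def draw_stratum_sample(N_h, n_h, nonresp_rate, sub_fraction, rng):
    """Two-phase draw in one stratum: SRSWOR, then sub-sample the non-respondents."""
    # Assumption: the last N_2h unit labels form the non-response group.
    N_2h = round(nonresp_rate * N_h)
    nonrespondents = set(range(N_h - N_2h, N_h))

    sample = rng.sample(range(N_h), n_h)  # SRSWOR of size n_h
    responding = [i for i in sample if i not in nonrespondents]  # n_1h units
    silent = [i for i in sample if i in nonrespondents]          # n_2h units

    # Follow up only a sub-sample of the n_2h non-respondents
    # (assumes 0 < sub_fraction <= 1).
    sub_size = max(1, round(sub_fraction * len(silent))) if silent else 0
    followed_up = rng.sample(silent, sub_size)
    return responding, silent, followed_up

rng = random.Random(1)  # arbitrary seed
responding, silent, followed_up = draw_stratum_sample(1000, 200, 0.2, 0.5, rng)
print(len(responding), len(silent), len(followed_up))
```

The three returned lists correspond to the n_1h responding units, the n_2h non-responding units, and the sub-sample drawn from the latter.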

Let (x_hi, y_hi) be the observed values and (X_hi, Y_hi) be the actual values of the variables (X, Y) for the i^th (i = 1, 2, …, n_h) sample unit in the h^th stratum. The measurement errors are then the differences y_hi − Y_hi and x_hi − X_hi. Let S²_hY and S²_hX be the population variances for the responding units, and S²_hY(2) and S²_hX(2) be the population variances for the non-responding units. The population variances associated with the measurement errors are defined separately for the responding units and for the non-responding part of the population. Further, let C_hY and C_hX be the coefficients of variation for the responding units, and C_hY(2) and C_hX(2) those for the non-responding units. Let ρ_hYX and ρ_hYX(2) be the population correlation coefficients between Y and X for the responding and non-responding units, respectively.
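To make the measurement-error model concrete, the following minimal sketch simulates observed values as true values plus an independent error and compares the variances. The normal distributions, the parameter values, and the names `true_y`, `error_u` and `observed_y` are illustrative assumptions. Under independence, the variance of the observed values should be close to the sum of the variance of the true values and the variance of the errors, which is why measurement error inflates the variance of an estimator.

```python
import random
import statistics

rng = random.Random(42)  # arbitrary seed, for reproducibility only
n = 20000

# Hypothetical true values and independent measurement errors.
true_y = [rng.gauss(5, 10) for _ in range(n)]
error_u = [rng.gauss(0, 3) for _ in range(n)]
observed_y = [y + u for y, u in zip(true_y, error_u)]  # observed = true + error

var_true = statistics.variance(true_y)
var_error = statistics.variance(error_u)
var_observed = statistics.variance(observed_y)

# With independent errors, var_observed is close to var_true + var_error.
print(round(var_true, 1), round(var_error, 1), round(var_observed, 1))
```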

In this paper an improved class of estimators for estimating the population mean in the presence of measurement error and non-response is proposed under stratified random sampling. Expressions for the bias and mean square error (MSE) of the class of estimators are obtained up to the first order of approximation, when both the study and the auxiliary variables suffer from non-response and measurement errors.

The present paper is organized as follows. Section 2 gives some existing estimators of the finite population mean. In Section 3, an improved class of estimators is suggested for estimating the finite population mean by incorporating both measurement error and non-response information simultaneously. Efficiency comparisons are presented in Section 4. Numerical results and a simulation study are presented in Section 5. The conclusion is given in Section 6.

Existing estimators in literature

In this section we consider the following existing estimators.

Hansen and Hurwitz (1946) estimator

In stratified random sampling, the Hansen and Hurwitz (1946) estimator for population mean , is given by (1) where and .

Here and are the sample means based on n_1h of responding and g_h units of sub-samples from n_2h non-responding groups, respectively.

The variance of , is given by (2) where , , .
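As a rough illustration of Eq (1), the sketch below computes a Hansen and Hurwitz type estimate. It assumes the textbook form, in which the stratum mean is (n_1h ȳ_1h + n_2h ȳ_2h) / n_h, with ȳ_2h the mean of the followed-up sub-sample of non-respondents, and in which the strata are weighted by N_h / N. The function names and dictionary keys are hypothetical.

```python
def hansen_hurwitz_stratum(y_resp, y_sub, n_2h):
    """Stratum mean combining respondents with a sub-sample of non-respondents.

    Assumes at least one respondent, and a non-empty sub-sample when n_2h > 0.
    """
    n_1h = len(y_resp)
    n_h = n_1h + n_2h
    mean_resp = sum(y_resp) / n_1h
    mean_sub = sum(y_sub) / len(y_sub) if y_sub else 0.0
    # The sub-sample mean stands in for all n_2h non-respondents.
    return (n_1h * mean_resp + n_2h * mean_sub) / n_h

def hansen_hurwitz_stratified(strata):
    """Weight each stratum mean by its population share N_h / N."""
    N = sum(s["N_h"] for s in strata)
    return sum(
        s["N_h"] / N * hansen_hurwitz_stratum(s["y_resp"], s["y_sub"], s["n_2h"])
        for s in strata
    )
```

Each stratum is passed as a dictionary holding its size `N_h`, the respondents' values `y_resp`, the sub-sampled non-respondents' values `y_sub`, and the number of sampled non-respondents `n_2h`.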

Ratio estimator

The usual ratio estimator under stratified random sampling, is given by (3) where is known and is given in Eq (13). The bias and mean square error of , are given by (4) and (5) where , C_h = λ_2h ρ_hYX S_hY S_hX + θ_h ρ_hYX(2) S_hY(2) S_hX(2).

Difference estimator

The usual difference estimator under stratified random sampling, is given by (6) where and d_h is the constant.

The minimum variance of is given by (7) The optimum value of d_h is , where .
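The optimisation behind Eq (7) is a standard quadratic minimisation. As a sketch only, suppose the stratum-h contribution to the variance of the difference estimator has the form below, where A_h and B_h are placeholder symbols (assumed here, not necessarily the paper's notation) for the variance terms of the study and auxiliary variables, and C_h is the covariance term defined after Eq (5):

```latex
\begin{align*}
V(d_h) &= A_h + d_h^{2} B_h - 2 d_h C_h, \\
\frac{\partial V}{\partial d_h} &= 2 d_h B_h - 2 C_h = 0
  \;\Longrightarrow\; d_{h(\mathrm{opt})} = \frac{C_h}{B_h}, \\
V\bigl(d_{h(\mathrm{opt})}\bigr) &= A_h - \frac{C_h^{2}}{B_h}.
\end{align*}
```

Under this assumed form and with B_h > 0, the minimum is never larger than A_h, the value obtained with d_h = 0.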

Azeem and Hanif (2017) estimator

The Azeem and Hanif (2017) estimator under stratified random sampling, is given by (8) The bias and MSE of , are given by (9) and (10) where .

The suggested estimator

We propose an improved general class of estimators for estimating the population mean that deals with measurement error and non-response simultaneously in stratified random sampling. Measurement error and non-response are present in both the study and the auxiliary variables. The suggested estimator is given by (11) where m_1h and m_2h are constants whose values are to be determined and α_h is an arbitrarily chosen scalar. For obtaining the bias and mean square error, we assume that , .

, .

Adding and , we get .

Dividing both sides by n_h, and then simplifying, we get (12)

Similarly, we can get (13) Further

On simplifying, we get (14) where and .

Further simplifying, and ignoring error terms greater than two, we have (15) where, and

Using Eq (15), the bias of to first order of approximation, is given by (16) Squaring both sides of Eq (15), and keeping the terms up to power two in errors, and then taking expectations, the mean square error of is given by The above equation can be written as (17) where, , , , , .

For finding the optimal values of m_1h and m_2h, we differentiate Eq (17) with respect to m_1h and m_2h respectively. The optimal values are given by and .

Substituting these optimum values in Eq (17), we get the minimum mean square error of , as (18)

Efficiency comparison

The efficiency comparison of and with respect to are given by,

From Eqs (2) and (18), if
From Eqs (5) and (18), if
From Eqs (7) and (18), if
From Eqs (10) and (18), if

The proposed class of estimators is more efficient than the other existing estimators when conditions 1 to 4 above are satisfied.

Numerical results

In this section three populations are generated for simulation study and four are based on real data sets. The results are given in Tables 1–3 (simulation) and 4–7 (real data).

Table 1. Mean squared error (MSE) of different estimators for Pop I(N = 4000).

https://doi.org/10.1371/journal.pone.0191572.t001

Table 2. Mean squared error (MSE) of different estimators for Pop II(N = 5000).

https://doi.org/10.1371/journal.pone.0191572.t002

Table 3. Mean squared error (MSE) of different estimators for Pop III(N = 800).

https://doi.org/10.1371/journal.pone.0191572.t003

Table 4. Mean squared error (MSE) of different estimators for Pop IV(N = 854).

https://doi.org/10.1371/journal.pone.0191572.t004

Table 5. Mean squared error (MSE) of different estimators for Pop V(N = 120).

https://doi.org/10.1371/journal.pone.0191572.t005

Table 6. Mean squared error (MSE) of different estimators for Pop VI(N = 120).

https://doi.org/10.1371/journal.pone.0191572.t006

Table 7. Mean squared error (MSE) of different estimators for Pop VII(N = 119).

https://doi.org/10.1371/journal.pone.0191572.t007

Simulation study

We generated three populations from the normal distribution using the R language. The first population has strata of equal size, the second has strata of unequal size, and the third has equal strata of small size (see Appendix).

Tables 1–3 show that the proposed general class of estimators outperforms all the other existing estimators, with the best performance for α = 0.

Application to real data

In this section we consider four data sets (see Appendix) for numerical comparisons; the results are given in Tables 4–7.

In these tables, we observe that the proposed general class of estimators is more efficient than all the other estimators considered, with the best performance for α = 0.

Conclusion

In the present paper, we have suggested an improved class of estimators of the finite population mean in the presence of measurement error and non-response under stratified random sampling. The simulation study and the real data sets show that the proposed class of estimators performs better than the existing estimators. The mean square error values are generally smaller under 10% non-response than under 20% non-response, as expected: in general, the mean square error increases as the non-response rate increases. Based on the numerical findings, the proposed class of estimators is more efficient than the other existing estimators for α = 0, α = 1 and α = −1. Among these choices, the proposed class performs best for α = 0 in Tables 1–7.

Appendix

Population I.

X₁ = rnorm(1000, 5, 10), Y₁ = X₁ + rnorm(1000, 0, 1), y₁ = Y₁ + rnorm(1000, 1, 3), x₁ = X₁ + rnorm(1000, 1, 3)

X₂ = rnorm(1000, 4, 8), Y₂ = X₂ + rnorm(1000, 0, 1), y₂ = Y₂ + rnorm(1000, 1, 3), x₂ = X₂ + rnorm(1000, 1, 3)

X₃ = rnorm(1000, 4, 9), Y₃ = X₃ + rnorm(1000, 0, 1), y₃ = Y₃ + rnorm(1000, 1, 3), x₃ = X₃ + rnorm(1000, 1, 3)

X₄ = rnorm(1000, 3, 7), Y₄ = X₄ + rnorm(1000, 0, 1), y₄ = Y₄ + rnorm(1000, 1, 3), x₄ = X₄ + rnorm(1000, 1, 3)

Number of strata = 4

N₁ = 1000, N₂ = 1000, N₃ = 1000, N₄ = 1000, n₁ = 200, n₂ = 200, n₃ = 200, n₄ = 200, ρ_1YX = 0.9950779, ρ_2YX = 0.9926346, ρ_3YX = 0.9939164, ρ_4YX = 0.9896319.
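For readers who do not use R, the generating scheme of Population I can be sketched in Python as follows. R's `rnorm(n, mean, sd)` is mimicked here with `random.gauss`; the helper names, the dictionary layout and the seed are assumptions, and because the random number generators differ the draws will not reproduce the correlations reported above.

```python
import random

def rnorm(n, mean, sd, rng):
    """Draw n normal variates, mirroring the argument order of R's rnorm."""
    return [rng.gauss(mean, sd) for _ in range(n)]

def make_stratum(size, x_mean, x_sd, rng):
    """One stratum: true values (X, Y) and observed values (x, y)."""
    X = rnorm(size, x_mean, x_sd, rng)                      # true auxiliary values
    Y = [a + e for a, e in zip(X, rnorm(size, 0, 1, rng))]  # true study values
    y = [a + u for a, u in zip(Y, rnorm(size, 1, 3, rng))]  # observed study values
    x = [a + v for a, v in zip(X, rnorm(size, 1, 3, rng))]  # observed auxiliary values
    return {"X": X, "Y": Y, "x": x, "y": y}

rng = random.Random(2018)  # arbitrary seed
# (mean, sd) of X in strata 1 to 4, as listed for Population I.
x_params = [(5, 10), (4, 8), (4, 9), (3, 7)]
population_1 = [make_stratum(1000, m, s, rng) for m, s in x_params]
```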

Population II.

X₁ = rnorm(1000, 5, 10), Y₁ = X₁ + rnorm(1000, 0, 1), y₁ = Y₁ + rnorm(1000, 1, 3), x₁ = X₁ + rnorm(1000, 0, 1)

X₂ = rnorm(1200, 4, 8), Y₂ = X₂ + rnorm(1200, 0, 1), y₂ = Y₂ + rnorm(1200, 1, 3), x₂ = X₂ + rnorm(1200, 0, 1)

X₃ = rnorm(1300, 4, 9), Y₃ = X₃ + rnorm(1300, 0, 1), y₃ = Y₃ + rnorm(1300, 1, 3), x₃ = X₃ + rnorm(1300, 0, 1)

X₄ = rnorm(1500, 3, 7), Y₄ = X₄ + rnorm(1500, 0, 1), y₄ = Y₄ + rnorm(1500, 1, 3), x₄ = X₄ + rnorm(1500, 1, 3)

Number of strata = 4

N₁ = 1000, N₂ = 1200, N₃ = 1300, N₄ = 1500, n₁ = 200, n₂ = 210, n₃ = 220, n₄ = 215, ρ_1YX = 0.9950779, ρ_2YX = 0.9924618, ρ_3YX = 0.9939475, ρ_4YX = 0.9903624.

Population III.

X₁ = rnorm(200, 5, 10), Y₁ = X₁ + rnorm(200, 0, 1), y₁ = Y₁ + rnorm(200, 1, 3), x₁ = X₁ + rnorm(200, 0, 1)

X₂ = rnorm(200, 4, 8), Y₂ = X₂ + rnorm(200, 0, 1), y₂ = Y₂ + rnorm(200, 1, 3), x₂ = X₂ + rnorm(200, 0, 1)

X₃ = rnorm(200, 4, 9), Y₃ = X₃ + rnorm(200, 0, 1), y₃ = Y₃ + rnorm(200, 1, 3), x₃ = X₃ + rnorm(200, 0, 1)

X₄ = rnorm(200, 3, 7), Y₄ = X₄ + rnorm(200, 0, 1), y₄ = Y₄ + rnorm(200, 1, 3), x₄ = X₄ + rnorm(200, 1, 3)

Number of strata = 4

N₁ = 200, N₂ = 200, N₃ = 200, N₄ = 200, n₁ = 22, n₂ = 28, n₃ = 22, n₄ = 25, ρ_1YX = 0.9447295, ρ_2YX = 0.9214058, ρ_3YX = 0.9529337, ρ_4YX = 0.912232.

Population IV. (Source: [18])

Y: Number of teachers, X: Number of students.

Number of strata = 6

N₁ = 106, N₂ = 106, N₃ = 94, N₄ = 171, N₅ = 204, N₆ = 173, n₁ = 15, n₂ = 15, n₃ = 12, n₄ = 20, n₅ = 23, n₆ = 15, ρ_1YX = 0.8156414, ρ_2YX = 0.8156414, ρ_3YX = 0.9011201, ρ_4YX = 0.9858761, ρ_5YX = 0.7130988, ρ_6YX = 0.893595.

Population V. (Source: [19])

Y: 1983 population (in millions), X: 1982 gross national product

Number of strata = 5

N₁ = 38, N₂ = 14, N₃ = 11, N₄ = 33, N₅ = 24, n₁ = 17, n₂ = 6, n₃ = 4, n₄ = 12, n₅ = 11, ρ_1YX = 0.7439544, ρ_2YX = 0.969956, ρ_3YX = 0.9768227, ρ_4YX = 0.2948897, ρ_5YX = 0.9011072.

Population VI. (Source: [19])

Y: 1983 population (in millions), X: 1980 population (in millions)

Number of strata = 5

N₁ = 38, N₂ = 14, N₃ = 11, N₄ = 33, N₅ = 24, n₁ = 17, n₂ = 6, n₃ = 4, n₄ = 12, n₅ = 11, ρ_1YX = 0.9996193, ρ_2YX = 0.9998693, ρ_3YX = 0.9998858, ρ_4YX = 0.9993071, ρ_5YX = 0.9998059.

Population VII. (Source: [20])

Y: Production of wheat (in tons), X: Area of wheat (in hectares)

Number of strata = 4

N₁ = 47, N₂ = 30, N₃ = 29, N₄ = 13, n₁ = 15, n₂ = 10, n₃ = 10, n₄ = 5, ρ_1YX = 0.9583838, ρ_2YX = 0.779071, ρ_3YX = 0.8719665, ρ_4YX = 0.9922591.

Supporting information

S1 File. Data used in the manuscript “S1 File.zip”.

https://doi.org/10.1371/journal.pone.0191572.s001

(ZIP)

Acknowledgments

The authors are grateful to the two anonymous referees for their valuable comments and feedback.

References

1. Hansen MH, Hurwitz WN. The problem of non-response in sample surveys. Journal of the American Statistical Association. 1946;41(236):517–529. pmid:20279350
2. Cochran WG. Sampling techniques. New York: John Wiley & Sons; 1977.
3. Rao P. Ratio estimation with subsampling the nonrespondents. Survey Methodology. 1986;12(2):217–230.
4. Khare B, Srivastava S. Generalized two phase sampling estimators for the population mean in the presence of nonresponse. Aligarh Journal of Statistics. 2010;30:39–54.
5. Singh HP, Kumar S, et al. Combination of regression and ratio estimate in presence of nonresponse. Brazilian Journal of Probability and Statistics. 2011;25(2):205–217.
6. Shabbir J, Khan NS. Some Modified Exponential-Ratio Type Estimators in the presence of Non-response under Two-Phase Sampling Scheme. Electronic Journal of Applied Statistical Analysis. 2013;6(1):1–17.
7. Cochran WG. Errors of measurement in statistics. Technometrics. 1968;10(4):637–666.
8. Fuller WA. Estimation in the presence of measurement error. International Statistical Review/Revue Internationale de Statistique. 1995; p. 121–141.
9. Shalabh S. Ratio method of estimation in the presence of measurement errors. Jour Ind Soc Agri Statist. 1997;52:150–155.
10. Biemer PP, Groves RM, Lyberg LE, Mathiowetz NA, Sudman S. Measurement Errors in Surveys. John Wiley & Sons; 2011.
11. Shukla D, Pathak S, Thakur N. An estimator for mean estimation in presence of measurement error. Research and Reviews: A Journal of Statistics. 2012;1(1):1–8.
12. Kumar S, Bhougal S, Nataraja N, Viswanathaiah M. Estimation of Population Mean in the Presence of Non-Response and Measurement Error. Revista Colombiana de Estadística. 2015;38(1):145–161.
13. Singh RS, Sharma P. Method of Estimation in the Presence of Non-response and Measurement Errors Simultaneously. Journal of Modern Applied Statistical Methods. 2015;14(1):12.
14. Azeem M, Hanif M. Joint influence of measurement error and non response on estimation of population mean. Communications in Statistics-Theory and Methods. 2017;46(4):1679–1693.
15. Kumar S. Improved estimation of population mean in presence of nonresponse and measurement error. Journal of Statistical Theory and Practice. 2016;10(4):707–720.
16. Shabbir J, Gupta S. On estimating finite population mean in simple and stratified random sampling. Communications in Statistics-Theory and Methods. 2010;40(2):199–212.
17. Haq A, Shabbir J. Improved family of ratio estimators in simple and stratified random sampling. Communications in Statistics-Theory and Methods. 2013;42(5):782–799.
18. Koyuncu N, Kadilar C. Ratio and product estimators in stratified random sampling. Journal of Statistical Planning and Inference. 2009;139(8):2552–2558.
19. Särndal C, Swensson B, Wretman J. Model Assisted Survey Sampling. New York: Springer; 1992.
20. FBS. Crops area production by districts, Islamabad; 2011.
