Estimation of population mean in the presence of measurement error and non response under stratified random sampling

In the present paper we propose an improved class of estimators in the presence of measurement error and non-response under stratified random sampling for estimating the finite population mean. The theoretical and numerical studies reveal that the proposed class of estimators performs better than other existing estimators.


Introduction
In survey sampling, it is usually assumed that all observations on the characteristics under study are measured correctly. In practice this assumption is not met, for a variety of reasons. In addition, non-response may occur because respondents refuse to give information, are not at home, lack interest, or decline for ethical reasons. Measurement error and non-response are usually studied separately, using known auxiliary information. In reality, both can occur simultaneously in a survey. Information is often not obtained from all units, so non-response is a common problem in sample surveys. In sampling theory, estimation of the population mean of a variable of interest in the presence of non-response, when auxiliary information is available, has been widely discussed. [1][2][3][4][5] and [6] discussed the problems of non-response in detail. Researchers have also dealt with the problem of measurement error when estimating the population mean; for more details, see [7][8][9][10][11], etc. Recently, a few researchers have studied measurement error and non-response together, for example [12][13][14] and [15]. [16] and [17] studied improved estimation of the population mean in simple and stratified random sampling.
Most researchers who have studied measurement error have ignored the presence of non-response, and the few who studied both did so under simple random sampling. In this paper, we propose a class of estimators for estimating the population mean in the presence of measurement error and non-response simultaneously under stratified random sampling. The efficiency of the suggested class of estimators over the existing estimators is shown through a simulation study and real data sets.
Consider a finite population of $N$ units divided into $L$ strata, where the $h$th stratum contains $N_h$ units ($h = 1, 2, \ldots, L$) and $\sum_{h=1}^{L} N_h = N$. It is assumed that the population consists of two mutually exclusive groups, called the response and non-response groups. Let $N_{1h}$ and $N_{2h}$ be the numbers of responding and non-responding units in the $h$th stratum, respectively. We select a sample of size $n_h$ from the $N_h$ units by simple random sampling without replacement (SRSWOR) and assume that $n_{1h}$ units respond and $n_{2h}$ units do not respond. We then select a sub-sample of size $k_h = n_{2h}/g_h$, $g_h > 1$, from the $n_{2h}$ non-responding units in the $h$th stratum. Let $(x^*_{hi}, y^*_{hi})$ be the observed values and $(X^*_{hi}, Y^*_{hi})$ be the actual values of the variables $(X, Y)$ for the $i$th ($i = 1, 2, \ldots, n_h$) sample unit in the $h$th stratum. Then the measurement errors are $U^*_{hi} = x^*_{hi} - X^*_{hi}$ and $V^*_{hi} = y^*_{hi} - Y^*_{hi}$.
Let $S^2_{hY}$ and $S^2_{hX}$ be the population variances for the responding units, and $S^2_{hY(2)}$ and $S^2_{hX(2)}$ be the population variances for the non-responding units. Let $S^2_{hV}$ and $S^2_{hU}$ be the population variances associated with the measurement error for the responding units, and $S^2_{hV(2)}$ and $S^2_{hU(2)}$ be the population variances associated with the measurement error for the non-responding part of the population. Further, let $C_{hY}$ and $C_{hX}$ be the coefficients of variation for the respondents, and $C_{hY(2)}$ and $C_{hX(2)}$ be the coefficients of variation for the non-responding units. Let $\rho_{hYX}$ and $\rho_{hYX(2)}$ be the population correlation coefficients between $Y$ and $X$ for the responding and non-responding units, respectively.
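The two-phase design described above can be sketched for a single stratum as follows. This is a minimal illustration, not the authors' simulation code: the function name `draw_stratum_sample`, the additive normal measurement error, and the rounding of $k_h = n_{2h}/g_h$ to an integer are all assumptions made here.

```python
import random


def draw_stratum_sample(true_y, responds, n_h, g_h, error_sd, rng):
    """Draw an SRSWOR sample from one stratum and follow up non-respondents.

    true_y   : true values Y_hi for every unit in the stratum
    responds : booleans, True if the unit belongs to the response group
    n_h      : first-phase sample size
    g_h      : inverse sub-sampling rate for non-respondents (g_h > 1)
    error_sd : standard deviation of the measurement error (assumed normal)
    rng      : a random.Random instance
    """
    units = rng.sample(range(len(true_y)), n_h)        # SRSWOR of n_h units
    resp = [i for i in units if responds[i]]           # the n_1h respondents
    nonresp = [i for i in units if not responds[i]]    # the n_2h non-respondents

    # Sub-sample size k_h = n_2h / g_h, rounded and kept at least 1 (assumption).
    k_h = max(1, round(len(nonresp) / g_h)) if nonresp else 0
    followed_up = rng.sample(nonresp, k_h)

    def observe(i):
        # Observed value = true value + measurement error (assumed additive).
        return true_y[i] + rng.gauss(0.0, error_sd)

    y_resp = [observe(i) for i in resp]
    y_followup = [observe(i) for i in followed_up]
    return y_resp, y_followup, len(nonresp)
```

Calling the function once per stratum gives the observed values for the respondents, the observed values for the followed-up non-respondents, and $n_{2h}$, which is what a Hansen and Hurwitz type estimator needs.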
In this paper an improved class of estimators for estimating the population mean in the presence of measurement error and non-response is proposed under stratified random sampling. Expressions for the bias and mean square error (MSE) of the class of estimators are obtained up to the first order of approximation, when both the study and the auxiliary variables are subject to non-response and measurement error.
The paper is organized as follows. Section 2 gives some existing estimators of the finite population mean. In Section 3, an improved class of estimators is suggested for estimating the finite population mean by incorporating both measurement error and non-response information simultaneously. An efficiency comparison is presented in Section 4. Numerical results and a simulation study are presented in Section 5. The conclusion is given in Section 6.

Existing estimators in literature
In this section we consider the following existing estimators.

Hansen and Hurwitz (1946) estimator
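In stratified random sampling, the Hansen and Hurwitz (1946) estimator $\bar{y}^*_{S(HH)}$ of the population mean $\bar{Y}$ is built from $\bar{y}_{n_{1h}}$ and $\bar{y}'_{2h}$, the sample means based on the $n_{1h}$ responding units and on the $k_h$ units sub-sampled from the $n_{2h}$ non-responding units, respectively.
The variance of $\bar{y}^*_{S(HH)}$ combines a component from the first-phase sample of $n_h$ units and a component from the sub-sampling of non-respondents.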
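As an illustration, the textbook form of the Hansen and Hurwitz estimator weights the respondent mean and the follow-up mean within each stratum by $n_{1h}/n_h$ and $n_{2h}/n_h$, then combines strata with weights $N_h/N$. The sketch below assumes that standard form; the function names and the tuple layout are hypothetical and not taken from the paper.

```python
def hansen_hurwitz_stratum_mean(y_resp, y_followup, n_2h):
    """Within-stratum mean: (n_1h * ybar_1 + n_2h * ybar'_2) / n_h."""
    n_1h = len(y_resp)
    n_h = n_1h + n_2h
    mean_resp = sum(y_resp) / n_1h if n_1h else 0.0
    mean_followup = sum(y_followup) / len(y_followup) if y_followup else 0.0
    return (n_1h * mean_resp + n_2h * mean_followup) / n_h


def stratified_hh_mean(strata):
    """Combine stratum means with weights N_h / N.

    strata: list of tuples (N_h, y_resp, y_followup, n_2h), where y_resp and
    y_followup hold observed values (possibly with measurement error).
    """
    total = sum(N_h for N_h, _, _, _ in strata)
    return sum(
        (N_h / total) * hansen_hurwitz_stratum_mean(y_resp, y_followup, n_2h)
        for N_h, y_resp, y_followup, n_2h in strata
    )
```

For example, `stratified_hh_mean([(60, [10, 12], [20], 2), (40, [5, 7, 9], [], 0)])` weights stratum means of 15.5 and 7 by 0.6 and 0.4.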

Ratio estimator
The usual ratio estimator under stratified random sampling is denoted $\bar{y}^*_{S(R)}$, where $\bar{X}_h$ is known and $\bar{x}^*_h$ is given in Eq (13). The bias and mean square error of $\bar{y}^*_{S(R)}$ are given to the first order of approximation.
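A ratio estimator of this kind can be sketched as below. The sketch assumes the separate (stratum-by-stratum) form, which the reference to a known stratum mean $\bar{X}_h$ suggests but does not confirm; the function name and the input layout are hypothetical.

```python
def separate_ratio_estimate(strata):
    """Separate ratio estimate: sum over strata of W_h * ybar_h * (Xbar_h / xbar_h).

    strata: list of tuples (N_h, ybar_h, xbar_h, Xbar_h), where ybar_h and
    xbar_h are the Hansen and Hurwitz type sample means of the study and
    auxiliary variables, and Xbar_h is the known stratum mean of the
    auxiliary variable.
    """
    total = sum(N_h for N_h, _, _, _ in strata)
    return sum(
        (N_h / total) * ybar_h * (Xbar_h / xbar_h)
        for N_h, ybar_h, xbar_h, Xbar_h in strata
    )
```

When the sample mean of the auxiliary variable falls below its known stratum mean, the stratum estimate is scaled up, and vice versa.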

Difference estimator
The usual difference estimator under stratified random sampling is denoted $\bar{y}^*_{S(D)}$, where $d_h$ is a constant.
The minimum variance of $\bar{y}^*_{S(D)}$ is attained at the optimum value $d_{h(\mathrm{opt})}$ of $d_h$.
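The role of the optimum constant can be illustrated with the standard difference form $\bar{y}^*_h + d_h(\bar{X}_h - \bar{x}^*_h)$, whose variance is a quadratic in $d_h$. In the sketch below, `var_y`, `var_x` and `cov_xy` are hypothetical names for the per-stratum variance and covariance terms of $\bar{y}^*_h$ and $\bar{x}^*_h$; they are inputs here, not the paper's expressions.

```python
def difference_estimate(strata):
    """Difference estimate: sum over strata of W_h * (ybar_h + d_h * (Xbar_h - xbar_h)).

    strata: list of tuples (N_h, ybar_h, xbar_h, Xbar_h, d_h).
    """
    total = sum(N_h for N_h, _, _, _, _ in strata)
    return sum(
        (N_h / total) * (ybar_h + d_h * (Xbar_h - xbar_h))
        for N_h, ybar_h, xbar_h, Xbar_h, d_h in strata
    )


def variance_at(d_h, var_y, var_x, cov_xy):
    """Variance of ybar_h + d_h * (Xbar_h - xbar_h) for a given d_h."""
    return var_y + d_h ** 2 * var_x - 2.0 * d_h * cov_xy


def optimal_d(var_x, cov_xy):
    """Minimiser of the quadratic above: d_h = cov_xy / var_x."""
    return cov_xy / var_x
```

Substituting the optimum back gives `var_y - cov_xy ** 2 / var_x`, so the gain over the unadjusted mean depends on how strongly the two sample means covary.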

Azeem and Hanif (2017) estimator
The Azeem and Hanif (2017) estimator under stratified random sampling is denoted $\bar{y}^*_{S(AH)}$. The bias and MSE of $\bar{y}^*_{S(AH)}$ are given to the first order of approximation.

The suggested estimator
We propose an improved general class of estimators for estimating the population mean that deals with measurement error and non-response simultaneously in stratified random sampling. Measurement error and non-response are present in both the study and the auxiliary variables. The suggested estimator is denoted $\bar{y}^*_{S(P)}$, where $m_{1h}$ and $m_{2h}$ are constants whose values are to be determined and $\alpha_h$ is a scalar chosen arbitrarily.
To obtain the bias and mean square error, error terms are first introduced. Dividing both sides by $n_h$ and then simplifying gives the first result, and the second is obtained similarly. Simplifying further and ignoring error terms of degree greater than two gives Eq (15). Using Eq (15), the bias of $\bar{y}^*_{S(P)}$ is obtained to the first order of approximation. Squaring both sides of Eq (15), keeping the terms up to power two in the errors, and then taking expectations gives the mean square error of $\bar{y}^*_{S(P)}$, which can be rewritten as Eq (17).
To find the optimal values of $m_{1h}$ and $m_{2h}$, we differentiate Eq (17) with respect to $m_{1h}$ and $m_{2h}$, respectively. The optimal values are denoted $m_{1h(\mathrm{opt})}$ and $m_{2h(\mathrm{opt})}$. Substituting these optimum values in Eq (17) gives the minimum mean square error of $\bar{y}^*_{S(P)}$, Eq (18).
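The minimisation step can be illustrated generically. The sketch below assumes, purely for illustration, that the per-stratum MSE is a quadratic in the two constants of the form $c_0 + a m_1^2 + b m_2^2 + 2 c m_1 m_2 - 2 d m_1 - 2 e m_2$. The coefficients `a`, `b`, `c`, `d`, `e`, `c0` are placeholders and this form is an assumption, not the paper's Eq (17).

```python
def quadratic_mse(m1, m2, a, b, c, d, e, c0):
    """Assumed quadratic form of the MSE in the two constants m1 and m2."""
    return c0 + a * m1 ** 2 + b * m2 ** 2 + 2.0 * c * m1 * m2 - 2.0 * d * m1 - 2.0 * e * m2


def optimal_constants(a, b, c, d, e):
    """Solve the two first-order conditions.

    Setting both partial derivatives to zero gives the linear system
        a * m1 + c * m2 = d
        c * m1 + b * m2 = e
    which is solved here by Cramer's rule (requires a * b - c**2 != 0).
    """
    det = a * b - c ** 2
    m1 = (b * d - c * e) / det
    m2 = (a * e - c * d) / det
    return m1, m2


def minimum_mse(a, b, c, d, e, c0):
    """Value of the assumed quadratic at the optimal constants."""
    m1, m2 = optimal_constants(a, b, c, d, e)
    return quadratic_mse(m1, m2, a, b, c, d, e, c0)
```

With `a > 0` and `a * b - c ** 2 > 0` the stationary point is a minimum, which is the situation the differentiation argument relies on.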

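Efficiency comparison
The proposed class of estimators is compared with the existing estimators by comparing Eq (18) with their variance or MSE expressions, for example Eqs (7) and (18), and Eqs (10) and (18). The proposed class of estimators is more efficient than the other existing estimators when the resulting conditions 1 to 4 are satisfied.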

Numerical results
In this section, three populations are generated for the simulation study and four real data sets are used. The results are given in Tables 1-3 (simulation) and Tables 4-7 (real data).

Simulation study
We generated three populations from the normal distribution using the R language. The first population is generated for equal strata, the second for unequal strata, and the third for equal strata with a small sample size (see Appendix).
Tables 1-3 show that the proposed general class of estimators outperforms all the other existing estimators. For $\alpha = 0$, $\bar{y}^*_{S(P)}$ shows the best performance.
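The general shape of such a simulation can be sketched as follows. The paper's own program is in R and is not reproduced here; this Python sketch uses hypothetical function names and parameter values, and the plain stratified sample mean stands in for whichever estimator is being evaluated (it includes neither non-response nor measurement error).

```python
import random


def make_population(strata_sizes, means, sds, rng):
    """Generate one normally distributed population per stratum."""
    return [
        [rng.gauss(mu, sd) for _ in range(N_h)]
        for N_h, mu, sd in zip(strata_sizes, means, sds)
    ]


def stratified_mean(population, sample_sizes, rng):
    """Stratified sample mean under SRSWOR within each stratum."""
    total = sum(len(stratum) for stratum in population)
    estimate = 0.0
    for stratum, n_h in zip(population, sample_sizes):
        sample = rng.sample(stratum, n_h)
        estimate += (len(stratum) / total) * (sum(sample) / n_h)
    return estimate


def empirical_mse(population, sample_sizes, estimator, reps, rng):
    """Average squared error of an estimator over repeated samples."""
    total = sum(len(stratum) for stratum in population)
    true_mean = sum(sum(stratum) for stratum in population) / total
    squared_errors = [
        (estimator(population, sample_sizes, rng) - true_mean) ** 2
        for _ in range(reps)
    ]
    return sum(squared_errors) / reps
```

Competing estimators can be compared by passing each one as `estimator` with the same population, sample sizes and number of replications.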

Application to real data
In this section we consider four data sets (see Appendix) for numerical comparisons; the results are given in Tables 4-7. These tables show that the proposed general class of estimators is more efficient than all the other estimators considered. For $\alpha = 0$, $\bar{y}^*_{S(P)}$ shows the best performance.

Conclusion
In the present paper, we have suggested an improved class of estimators of the finite population mean in the presence of measurement error and non-response under stratified random sampling. The simulation study and the real data sets show that the proposed class of estimators performs better than the existing estimators. The mean square error values are generally smaller under 10% non-response than under 20% non-response, as expected: in general, as the non-response rate increases, the mean square error also increases. Based on the numerical findings, the proposed class of estimators is more efficient than the other existing estimators for α = 0, α = 1 and α = −1. Among these choices, the proposed class of estimators performs best for α = 0 in Tables 1-7.