A generalized class of estimators for sensitive variable in the presence of measurement error and non-response

In this paper, a general class of estimators is proposed for estimating the finite population mean of a sensitive variable in the presence of measurement error and non-response under simple random sampling. Expressions for the bias and mean square error (MSE) are derived up to the first order of approximation. The impact of measurement error is examined using real data sets, including a survey conducted at Quaid-i-Azam University, Islamabad. Simulated data sets are also used to compare the performance of the proposed estimators with that of some existing estimators. We obtain the empirical bias and MSE values for the proposed and competing estimators.


Introduction
In survey sampling, if the variable of interest is sensitive in nature, the chance of obtaining incorrect information increases. The problem of measurement error is usually ignored in sensitive surveys, and the information obtained is assumed to be free of error. Another important factor in surveys is non-response, which may arise because respondents refuse to give the information or are not at home. Measurement error and non-response are usually studied separately. In reality, when the variable of interest is sensitive, respondents hesitate to provide personal information, which gives rise to non-response. Many researchers have studied the problem of non-response, including [1][2][3][4][5][6][7][8][9]. When the variable under study carries a social stigma, respondents are not comfortable providing their personal information, and asking sensitive questions directly increases the response bias. [10] introduced the randomized response technique (RRT), which reduces the possible response bias by ensuring the privacy of the respondents. For estimating the mean of a sensitive quantitative variable, the randomized response model (RRM) was used by [10] and [11]. Further work in this area has been done by [12][13][14][15][16][17][18][19][20][21], among others.
Many researchers have dealt with the problem of measurement error in estimating the population mean; for more details, see [22][23][24][25][26][27]. Recently, a few researchers have studied measurement error and non-response together; for example, [28][29][30][31][32][33] have discussed the problem of measurement error and non-response under stratified random sampling.
In many cases, researchers who have studied measurement error have ignored the presence of non-response, particularly when using randomized response. In this study, we propose a class of estimators for the population mean of a sensitive variable in the presence of both measurement error and non-response under simple random sampling. The efficiency of the suggested class of estimators relative to existing estimators is shown using simulated and real data sets.
Let $O = \{O_1, O_2, \ldots, O_N\}$ be a finite population of size $N$. Suppose that a sample of size $n$ is drawn from $O$ by simple random sampling without replacement. We assume that the population consists of two mutually exclusive groups: $N_1$ respondents and $N_2$ non-respondents. After the sample is selected, we assume that $n_1$ units respond and $n_2$ units do not respond. We then select a sub-sample of size $k = n_2/g$, $g > 1$, from the $n_2$ non-responding units.
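The two-phase design above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the helper name `two_phase_nonresponse_sample` is hypothetical, the $N_2$ non-respondents are assumed to be the last $N_2$ unit labels, and $k = n_2/g$ is rounded to an integer (at least one unit when $n_2 > 0$).

```python
import random

def two_phase_nonresponse_sample(N, N2, n, g, seed=0):
    """Hypothetical helper: SRSWOR of n units from {0, ..., N-1}, where units
    N - N2, ..., N - 1 form the non-response group (an assumption); then an
    SRSWOR sub-sample of k = n2 / g (rounded, at least 1) non-respondents."""
    if g <= 1:
        raise ValueError("g must exceed 1")
    rng = random.Random(seed)
    sample = rng.sample(range(N), n)                       # first-phase SRSWOR
    respondents = [i for i in sample if i < N - N2]        # the n1 responding units
    nonrespondents = [i for i in sample if i >= N - N2]    # the n2 non-responding units
    n2 = len(nonrespondents)
    k = max(1, round(n2 / g)) if n2 else 0                 # sub-sample size k = n2 / g
    followed_up = rng.sample(nonrespondents, k)            # second-phase SRSWOR
    return respondents, nonrespondents, followed_up

resp, nonresp, sub = two_phase_nonresponse_sample(N=1000, N2=200, n=100, g=2, seed=3)
print(len(resp), len(nonresp), len(sub))
```

Rounding $k$ is our choice; in practice $g$ is usually picked so that $n_2/g$ is close to an integer.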
Let $Y$ be the sensitive study variable, which is not observed directly, and let $X$ be a non-sensitive auxiliary variable that is positively correlated with $Y$. Let $R_x$ be the ranks of the auxiliary variable $X$. Let $S$ be a scrambling variable that is independent of $Y$ and $X$; we assume that $S$ has zero mean and variance $S_s^2$. The respondent is asked to give a scrambled response $Z = Y + S$ for the study variable $Y$ and a true response for $X$.
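A small sketch of the additive scrambling $Z = Y + S$: because $S$ has zero mean and is independent of $Y$, the mean of $Z$ estimates the mean of $Y$, while the variance of $Z$ is inflated by roughly $S_s^2$. The normal distributions and parameter values below are assumptions made for illustration only.

```python
import random
import statistics

def scrambled_responses(y_values, scramble_sd, seed=0):
    """Additive scrambling Z = Y + S, with S ~ N(0, scramble_sd^2) drawn
    independently of Y (the normal choice is an assumption)."""
    rng = random.Random(seed)
    return [y + rng.gauss(0.0, scramble_sd) for y in y_values]

rng = random.Random(42)
y = [rng.gauss(50.0, 10.0) for _ in range(20000)]   # made-up sensitive values
z = scrambled_responses(y, scramble_sd=5.0, seed=7)

# E(S) = 0, so the mean of Z tracks the mean of Y ...
print(round(statistics.fmean(y), 2), round(statistics.fmean(z), 2))
# ... while the variance grows by roughly S_s^2 = 25.
print(round(statistics.variance(z) - statistics.variance(y), 2))
```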
Let $(z_i^{*}, y_i^{*}, x_i^{*}, r_{x_i}^{*})$ be the observed values and $(Z_i^{*}, Y_i^{*}, X_i^{*}, R_{x_i}^{*})$ the actual values corresponding to the $i$th ($i = 1, 2, \ldots, n$) sampled unit, where $R_{x_i}^{*}$ is the rank of $X_i^{*}$. Then the measurement errors are $Q_i = z_i^{*} - Z_i^{*}$, $V_i = x_i^{*} - X_i^{*}$ and $T_i = r_{x_i}^{*} - R_{x_i}^{*}$. Note that $Y$ is not observed directly, so we consider measurement error only on its scrambled version $Z$. Let $S_Z^2$, $S_X^2$ and $S_{R_x}^2$ be the population variances of the variables $Z$, $X$ and $R_x$, respectively, and let $S_{Z(2)}^2$, $S_{X(2)}^2$ and $S_{R_x(2)}^2$ be the corresponding population variances for the non-responding units. Let $S_Q^2$, $S_V^2$ and $S_T^2$ be the population variances of the measurement errors in $Z$, $X$ and $R_x$, respectively, and let $S_{Q(2)}^2$, $S_{V(2)}^2$ and $S_{T(2)}^2$ be the corresponding variances for the non-responding units. Let $\rho_{ZX}$, $\rho_{ZR_x}$ and $\rho_{XR_x}$ be the population correlation coefficients for the respondents, and $\rho_{ZX(2)}$, $\rho_{ZR_x(2)}$ and $\rho_{XR_x(2)}$ the population correlation coefficients for the non-responding units.
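The measurement errors are simple differences between observed and actual values. The sketch below, with a hypothetical helper and made-up numbers, computes $Q_i$ and its sample variance, which estimates $S_Q^2$; the same recipe would give $V_i$ and $T_i$.

```python
import statistics

def measurement_errors(observed, actual):
    """Hypothetical helper: e_i = observed_i - actual_i (e.g. Q_i = z_i* - Z_i*)."""
    if len(observed) != len(actual):
        raise ValueError("observed and actual must have the same length")
    return [o - a for o, a in zip(observed, actual)]

# Made-up scrambled responses for four units: actual Z_i* and recorded z_i*.
actual_z = [10.0, 12.0, 15.0, 11.0]
observed_z = [11.0, 12.0, 13.0, 12.0]

q = measurement_errors(observed_z, actual_z)   # [1.0, 0.0, -2.0, 1.0]
s2_q = statistics.variance(q)                  # sample variance of the errors, estimates S_Q^2
print(q, s2_q)
```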
The layout of the paper is as follows. In Section 2, some existing estimators of the finite population mean are given. In Section 3, a generalized class of estimators is suggested for the finite population mean by incorporating measurement error and non-response information simultaneously, and an efficiency comparison is presented. Numerical results and a simulation study are presented in Section 4. Some concluding remarks are given in Section 5.

Some existing estimators in the literature
In this section, we consider the following existing estimators.

Hansen and Hurwitz (1946) estimator
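Under simple random sampling, the Hansen and Hurwitz (1946) estimator of the population mean $\bar{Y}$ is given by $\bar{y}^{*\prime}_{HH} = \bar{z}^{*}$, where $\bar{z}^{*} = \frac{n_1}{n}\bar{z}_{n_1} + \frac{n_2}{n}\bar{z}_k$. Here $\bar{z}_{n_1} = \frac{1}{n_1}\sum_{i=1}^{n_1} z_i$ and $\bar{z}_k = \frac{1}{k}\sum_{i=1}^{k} z_i$ are the sample means based on the $n_1$ responding units and on the $k$ sub-sampled units from the $n_2$ non-responding units, respectively.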
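The estimator $\bar{z}^{*}$ is a weighted combination of the two sample means, with weights $n_1/n$ and $n_2/n$. A minimal Python sketch (the function name is ours, not from the paper):

```python
import statistics

def hansen_hurwitz_mean(z_respondents, z_subsample, n):
    """z-bar* = (n1/n) * z-bar_{n1} + (n2/n) * z-bar_k, with n2 = n - n1."""
    n1 = len(z_respondents)
    n2 = n - n1
    if n2 == 0:
        return statistics.fmean(z_respondents)   # everyone responded
    if not z_subsample:
        raise ValueError("need at least one sub-sampled non-respondent")
    if n1 == 0:
        return statistics.fmean(z_subsample)     # no first-call respondents
    return (n1 / n) * statistics.fmean(z_respondents) + (n2 / n) * statistics.fmean(z_subsample)

# n = 10 units: 8 respond (mean 5); n2 = 2 and g = 2, so k = 1 unit is followed up.
print(hansen_hurwitz_mean([4, 6, 5, 5, 4, 6, 5, 5], [10], n=10))
```

Here the weights are 0.8 and 0.2, so the estimate is 0.8 × 5 + 0.2 × 10.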
The variance of $\bar{y}^{*\prime}_{HH}$ is given by
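As a reference point, the classical Hansen and Hurwitz (1946) variance in the absence of measurement error is $\frac{1-f}{n}S_Z^2 + \frac{W_2(g-1)}{n}S_{Z(2)}^2$, with $f = n/N$ and $W_2 = N_2/N$. The sketch below evaluates this classical form only (the measurement-error terms of the expression above are not included) and shows that it grows with $g$; the parameter values are made up.

```python
def hh_variance(N, n, N2, s2_z, s2_z2, g):
    """Classical Hansen-Hurwitz variance without measurement error:
    (1 - f)/n * S_Z^2 + W2 * (g - 1)/n * S_Z(2)^2, with f = n/N, W2 = N2/N."""
    f = n / N
    w2 = N2 / N
    return (1 - f) / n * s2_z + w2 * (g - 1) / n * s2_z2

for g in (2, 3, 4):
    print(g, round(hh_variance(N=1000, n=100, N2=200, s2_z=25.0, s2_z2=30.0, g=g), 4))
```

Only the second term depends on $g$, and it grows linearly with $g - 1$: following up fewer non-respondents costs precision.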

Ratio estimator
The usual ratio estimator under simple random sampling is given by
The bias and mean square error of $\bar{y}^{*\prime}_{R}$ are given by
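For orientation, the classical ratio estimator in the non-response setting multiplies $\bar{z}^{*}$ by $\bar{X}/\bar{x}^{*}$, where $\bar{x}^{*}$ is the Hansen-Hurwitz type mean of $X$ computed from the same units. We assume this form for illustration; the helper names are hypothetical. The sketch also checks the textbook property that the estimator is exact when $Z$ is proportional to $X$.

```python
def ratio_estimate(z_bar_star, x_bar_star, X_bar):
    """Classical ratio form (assumed): z-bar* * X-bar / x-bar*."""
    return z_bar_star * X_bar / x_bar_star

def hh_weighted(resp, sub, n):
    """Hansen-Hurwitz type mean: (n1/n) mean(resp) + (n2/n) mean(sub); needs n1, k > 0."""
    n1 = len(resp)
    return (n1 / n) * sum(resp) / n1 + ((n - n1) / n) * sum(sub) / len(sub)

# Toy population in which Z = 3 X for every unit, so Z-bar = 3 X-bar.
X_pop = [2.0, 4.0, 6.0, 8.0, 10.0]
X_bar = sum(X_pop) / len(X_pop)                  # 6.0
x_resp, x_sub = [2.0, 8.0], [10.0]               # n = 4: n1 = 2, and k = 1 of n2 = 2 followed up
z_resp, z_sub = [3 * x for x in x_resp], [3 * x for x in x_sub]

est = ratio_estimate(hh_weighted(z_resp, z_sub, 4), hh_weighted(x_resp, x_sub, 4), X_bar)
print(est)   # equals 3 * X_bar, the true mean of Z
```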

Product estimator
The usual product estimator under simple random sampling is given by
The bias and mean square error of $\bar{y}^{*\prime}_{Pr}$ are given by
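The classical product estimator, intended for an auxiliary variable that is negatively correlated with the study variable, scales $\bar{z}^{*}$ by $\bar{x}^{*}/\bar{X}$ instead. The sketch below assumes this textbook form; the function name and numbers are hypothetical.

```python
def product_estimate(z_bar_star, x_bar_star, X_bar):
    """Classical product form (assumed): z-bar* * x-bar* / X-bar."""
    return z_bar_star * x_bar_star / X_bar

# Made-up values: z-bar* = 6.0, x-bar* = 4.0, X-bar = 5.0
print(product_estimate(6.0, 4.0, 5.0))
# When the sample mean of X equals X-bar, no adjustment is made.
print(product_estimate(6.0, 5.0, 5.0))
```

With $\bar{x}^{*} < \bar{X}$, the product form pulls the estimate down, whereas a ratio form with the same inputs would push it up.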

PLOS ONE
The bias and mean square error of $\bar{y}^{*\prime}_{BT}$ are given by

Singh and Kumar (2010) estimator
The Singh and Kumar (2010) estimator under simple random sampling is given by
The bias and mean square error of $\bar{y}^{*\prime}_{SK}$ are given by

Difference estimator
The difference estimator under simple random sampling is given by
where $\bar{x}^{\prime *} = \frac{N\bar{X} - n\bar{x}^{*}}{N - n}$ and $d$ is a constant. The minimum variance of $\bar{y}^{*\prime}_{D}$ is given by
The optimum value of $d$ is
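The quantity $\bar{x}^{\prime *}$ can be read as an estimate of the mean of the $N - n$ non-sampled units. A one-line derivation (our own observation, independent of the exact form of the estimator) shows that its deviation from $\bar{X}$ is a rescaled, sign-reversed copy of the deviation of $\bar{x}^{*}$:

```latex
\bar{x}^{\prime *} - \bar{X}
  = \frac{N\bar{X} - n\bar{x}^{*}}{N-n} - \frac{(N-n)\bar{X}}{N-n}
  = \frac{n\bar{X} - n\bar{x}^{*}}{N-n}
  = \frac{n}{N-n}\left(\bar{X} - \bar{x}^{*}\right)
```

Consequently, an adjustment term proportional to $\bar{x}^{\prime *} - \bar{X}$ carries the same information as one proportional to $\bar{X} - \bar{x}^{*}$, with the factor $n/(N-n)$ absorbed into the constant $d$.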

Azeem and Hanif (2017) estimator
The Azeem and Hanif (2017) estimator under simple random sampling is given by
The bias and MSE of $\bar{y}^{*\prime}_{AH}$ are given by
where $q = \frac{N+n}{N-n}$.

Proposed generalized class of estimators
We propose a generalized class of estimators for the population mean of a sensitive variable that considers the problems of measurement error and non-response simultaneously. Measurement error and non-response are present in both the study variable and the auxiliary variable. The proposed estimator is given by
where $m_1$, $m_2$ and $m_3$ are constants whose values are to be determined, and $\alpha_r$ ($r = 0, 1, 2, 3$) are arbitrarily chosen scalars. To obtain the bias and mean square error, we assume that
Dividing both sides by $n$ and simplifying, we get

On simplifying, we get
Simplifying further and ignoring error terms of power greater than two, we have
Using Eq (25), the bias of $\bar{y}^{*\prime}_{GP}$ to the first order of approximation is given by
Squaring both sides of Eq (25), keeping terms up to power two in the errors and then taking expectations, the mean square error of $\bar{y}^{*\prime}_{GP}$ is given by

The above equation can be written as
where
To find the optimal values of $m_1$, $m_2$ and $m_3$, we differentiate Eq (27) with respect to $m_1$, $m_2$ and $m_3$ and set each derivative equal to zero. The optimal values are given by
Substituting these optimum values in Eq (27), we get the minimum mean square error of $\bar{y}^{*\prime}_{GP}$ as
where
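The optimization step can be illustrated generically. Assume that Eq (27) can be written as a quadratic form $\mathrm{MSE}(\mathbf{m}) = A - 2\mathbf{c}^{\top}\mathbf{m} + \mathbf{m}^{\top}B\mathbf{m}$ in $\mathbf{m} = (m_1, m_2, m_3)^{\top}$, with a symmetric positive definite matrix $B$ whose entries, like $A$ and $\mathbf{c}$, would come from the paper's expressions. Setting the gradient $2B\mathbf{m} - 2\mathbf{c}$ to zero gives $\mathbf{m}_{\text{opt}} = B^{-1}\mathbf{c}$ and $\mathrm{MSE}_{\min} = A - \mathbf{c}^{\top}B^{-1}\mathbf{c}$. The sketch below uses made-up numbers and hypothetical helper names.

```python
def solve_linear(B, c):
    """Solve B m = c by Gaussian elimination with partial pivoting (small systems)."""
    n = len(c)
    M = [[float(v) for v in row] + [float(ci)] for row, ci in zip(B, c)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("B is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= factor * M[col][j]
    m = [0.0] * n
    for r in range(n - 1, -1, -1):                      # back substitution
        m[r] = (M[r][n] - sum(M[r][j] * m[j] for j in range(r + 1, n))) / M[r][r]
    return m

def quadratic_mse(A, B, c, m):
    """MSE(m) = A - 2 c'm + m'Bm (assumed form of the first-order MSE)."""
    Bm = [sum(b * mj for b, mj in zip(row, m)) for row in B]
    return A - 2 * sum(ci * mi for ci, mi in zip(c, m)) + sum(mi * bmi for mi, bmi in zip(m, Bm))

def optimal_constants(A, B, c):
    """m_opt = B^{-1} c and MSE_min = A - c' B^{-1} c."""
    m_opt = solve_linear(B, c)
    return m_opt, A - sum(ci * mi for ci, mi in zip(c, m_opt))

# Made-up, positive definite example (not the paper's values).
A, B, c = 70.0, [[2, 1, 0], [1, 3, 1], [0, 1, 4]], [4, 10, 14]
m_opt, mse_min = optimal_constants(A, B, c)
print(m_opt, mse_min)
```

Because $B$ is positive definite, any departure $\boldsymbol{\delta}$ from $\mathbf{m}_{\text{opt}}$ increases the MSE by $\boldsymbol{\delta}^{\top}B\boldsymbol{\delta} > 0$.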

Specific members of generalized proposed class of estimators
We consider the following members of the class of estimators $\bar{y}^{*\prime}_{GP}$, obtained by choosing different values of $\alpha_0$, $\alpha_1$, $\alpha_2$, $\alpha_3$, $m_1$, $m_2$ and $m_3$.

Condition (7)
From Eqs (19) and (28), the proposed class of estimators $\bar{y}^{*\prime}_{GP}$ is more efficient than the competing estimators when Conditions (1) to (7) are satisfied. Table 1 shows that all the conditions are satisfied.

Numerical results
In this section, three populations are generated for a simulation study and three real data sets are used. The results are given in Tables 2-7 (simulated data) and Tables 9-14 (real data).

Simulation study
We generated three populations from a normal distribution using the R language. Tables 2-7 show that the generalized proposed estimator has the minimum MSE, both with and without measurement error. The biases of the estimators are also given in brackets in Tables 2-7.
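The authors used R; the Python sketch below only illustrates the kind of Monte Carlo loop involved, and only for the Hansen-Hurwitz estimator. Everything in it is an assumption for illustration: the normal population, the response mechanism (a fixed block of $N_2$ units never responds at the first call), normally distributed scrambling and measurement errors, the function name `simulate_hh` and all parameter values. Empirical bias and MSE are computed against the true population mean of $Y$.

```python
import random
import statistics

def simulate_hh(N=1000, n=100, nonresponse=0.1, g=2, scramble_sd=1.0,
                me_sd=0.0, reps=2000, seed=1):
    """Hypothetical Monte Carlo: empirical bias and MSE of the Hansen-Hurwitz
    mean of scrambled responses recorded with measurement error."""
    rng = random.Random(seed)
    Y = [rng.gauss(10.0, 2.0) for _ in range(N)]   # sensitive variable, assumed normal
    N2 = round(nonresponse * N)                     # units N - N2, ..., N - 1 are non-respondents
    Y_bar = statistics.fmean(Y)

    def observe(i):
        # recorded value = Y_i + scrambling S + measurement error Q (both zero mean)
        return Y[i] + rng.gauss(0.0, scramble_sd) + rng.gauss(0.0, me_sd)

    estimates = []
    for _ in range(reps):
        sample = rng.sample(range(N), n)                 # SRSWOR
        resp = [i for i in sample if i < N - N2]
        nonresp = [i for i in sample if i >= N - N2]
        k = max(1, round(len(nonresp) / g)) if nonresp else 0
        followed_up = rng.sample(nonresp, k)             # sub-sample of non-respondents
        est = len(resp) / n * statistics.fmean([observe(i) for i in resp])
        if followed_up:
            est += len(nonresp) / n * statistics.fmean([observe(i) for i in followed_up])
        estimates.append(est)

    bias = statistics.fmean(estimates) - Y_bar
    mse = statistics.fmean([(e - Y_bar) ** 2 for e in estimates])
    return bias, mse

print(simulate_hh(me_sd=0.0, reps=500))
print(simulate_hh(me_sd=3.0, reps=500))
```

Using the same seed in both calls reuses the same population, samples and underlying standard normal draws, so the difference between the two MSE values reflects the added measurement error.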
Population I. Tables 2-7 show that the generalized class of proposed estimators $\bar{y}^{*\prime}_{GP}$ performs better than all other estimators, both with and without measurement error. The absolute biases are given in brackets. Table 2 shows that the generalized proposed estimator performs better than the other estimators. With $\alpha_r = 1$ ($r = 0, 1, 2, 3$), its MSE is 0.021233 at a 10% non-response rate and increases to 0.023755 when the non-response rate increases to 20%. It is also observed that $\bar{y}^{*\prime}_{R}$ is the least biased and $\bar{y}^{*\prime}_{BT}$ the most biased of all the estimators. Table 3 shows the same pattern of results. Table 4 shows that the generalized proposed estimator again performs better than the other estimators. With $\alpha_r = 1$ ($r = 0, 1, 2, 3$), its MSE is 0.021796 at a 10% non-response rate and increases to 0.024110 at a 20% non-response rate. Again, $\bar{y}^{*\prime}_{R}$ is the least biased and $\bar{y}^{*\prime}_{BT}$ the most biased of all the estimators. Table 5 shows the same pattern of results. Table 6 also shows that the generalized proposed estimator performs better than the other estimators. With $\alpha_r = 1$ ($r = 0, 1, 2, 3$), its MSE is 0.014048 at a 10% non-response rate and increases to 0.015831 at a 20% non-response rate. Here $\bar{y}^{*\prime}_{R}$ is the least biased and $\bar{y}^{*\prime}_{SK}$ the most biased of all the estimators. Table 7 shows the same pattern of results. The simulation study thus indicates that the generalized proposed class of estimators performs better than the existing estimators considered, and that its MSE is smallest at the 10% non-response rate.

Application to real data set
In this section, we consider three data sets for numerical comparison; the results are given below. Population IV consists of 654 observations, and a data summary is given in Table 8.
Population IV [Source: [34]]. In Population IV, forced expiratory volume is taken as the study variable, age as the auxiliary variable and smoking status (No = 0, Yes = 1) as the scrambling response. The correlation coefficients are $\rho_{ZX} = 0.7564$, $\rho_{XR_x} = 0.7831$ and $\rho_{ZR_x} = 0.6161$.
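The rank variable $R_x$ and correlations such as $\rho_{XR_x}$ and $\rho_{ZR_x}$ can be computed as ordinary Pearson correlations between a variable and the ranks of $X$. The sketch assumes that tied values (common for age) receive average ranks, which is one standard convention; the handling of ties is our assumption. The helper names are hypothetical and the ages are made up, not taken from fevdata.csv.

```python
def average_ranks(values):
    """Ranks 1..n; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                                   # extend the run of tied values
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

age = [9, 11, 11, 13, 15, 15, 15, 18]       # made-up ages with ties
r_x = average_ranks(age)                     # R_x
print(r_x, round(pearson(age, r_x), 4))      # sample analogue of rho_{X R_x}
```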

Data collection.
To examine the practical implications of measurement error, we conducted a survey at Quaid-i-Azam University, Islamabad, in 2018. We distributed 55 questionnaires to students of BS Statistics (5th semester, Fall 2018) and M.Phil. Statistics (1st and 2nd semesters, Fall 2018). Our population consists of the students who gave false responses: 23 students (N = 23), of whom 8 are male and 15 are female. Their true responses were already available from their academic records in the Department of Statistics. In question (i), we asked for Y = Age and X = Marks (in percentage) in Intermediate or Matric. In question (ii), we asked whether social media affects academic results, which serves as S. Thus Y is the study variable, X is the auxiliary variable and S is the scrambling response variable.

Tables 9-14 show that the generalized class of proposed estimators $\bar{y}^{*\prime}_{GP}$ performs better than all other existing estimators, both with and without measurement error. The absolute biases are given in brackets in Tables 9-14. Table 9 shows that the generalized proposed estimator performs better than the other estimators. With $\alpha_r = 1$ ($r = 0, 1, 2, 3$), its MSE is 0.011258 at a 10% non-response rate and increases to 0.012462 when the non-response rate increases to 20%. It is also observed that $\bar{y}^{*\prime}_{R}$ is the least biased and $\bar{y}^{*\prime}_{BT}$ the most biased of all the estimators considered. Table 10 shows the same pattern of results. Table 11 shows that the generalized proposed estimator performs better than the other estimators. With $\alpha_r = 1$ ($r = 0, 1, 2, 3$), its MSE is 0.442375 at a 10% non-response rate and increases to 0.499340 at a 20% non-response rate. Here $\bar{y}^{*\prime}_{AH}$ is the least biased and $\bar{y}^{*\prime}_{BT}$ the most biased of all the estimators considered. Table 12 shows the same pattern of results. Table 13 shows that the generalized proposed estimator performs better than the other estimators. With $\alpha_r = 1$ ($r = 0, 1, 2, 3$), its MSE is 0.449115 at a 10% non-response rate and increases to 0.478928 at a 20% non-response rate. Here $\bar{y}^{*\prime}_{Pr}$ is the least biased and $\bar{y}^{*\prime}_{BT}$ the most biased of all the estimators considered. Table 14 shows the same pattern of results.
The numerical study indicates that the generalized proposed estimator performs better than all the other existing estimators considered. The MSE is smallest at the 10% non-response rate, and the MSE values increase as the value of the constant $g$ increases.

Conclusion
In this study, we proposed a generalized class of estimators for the finite population mean when the variable of interest is stigmatizing in nature, considering both measurement error and non-response under simple random sampling. The simulation study (see Tables 2-7) and the real data sets (see Tables 9-14) show that the proposed class of estimators $\bar{y}^{*\prime}_{GP}$ performs better than all the existing estimators considered here.
Supporting information
S1 File. Data used in the manuscript: "fevdata.csv". (CSV)