Abstract
In this paper, a general class of estimators is proposed for estimating the finite population mean of a sensitive variable in the presence of measurement error and non-response under simple random sampling. Expressions for the bias and mean square error, up to the first order of approximation, are derived. The impact of measurement errors is examined using real data sets, including a survey conducted at Quaid-i-Azam University, Islamabad. Simulated data sets are also used to compare the performance of the proposed estimators with that of some other estimators. We obtain the empirical bias and MSE values for the proposed and the competing estimators.
Citation: Zahid E, Shabbir J, Gupta S, Onyango R, Saeed S (2022) A generalized class of estimators for sensitive variable in the presence of measurement error and non-response. PLoS ONE 17(1): e0261561. https://doi.org/10.1371/journal.pone.0261561
Editor: Maria Alessandra Ragusa, Universita degli Studi di Catania, ITALY
Received: September 7, 2021; Accepted: December 4, 2021; Published: January 19, 2022
Copyright: © 2022 Zahid et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting information files.
Funding: The author(s) received no specific funding for this work.
Competing interests: No authors have competing interests.
1 Introduction
In survey sampling, if the variable of interest is sensitive in nature, the chance of obtaining incorrect information increases. The problem of measurement error is usually ignored in sensitive surveys, and the assumption is made that the information obtained is free from error. Another important factor in surveys is non-response, which may arise from refusal by respondents to give the information, or from their not being at home. Usually measurement error and non-response are studied separately. In reality, when the variable of interest is sensitive, respondents hesitate to provide personal information, which gives rise to non-response. Many researchers have studied the problem of non-response, including [1–9]. In survey sampling, when the variable under study carries social stigma, respondents are not comfortable providing their personal information. A direct survey on sensitive questions increases the response bias. [10] introduced the randomized response technique (RRT), which reduces the possible response bias by ensuring the privacy of the respondents. For estimation of the mean of a sensitive quantitative variable, the Randomized Response Model (RRM) was used by [10] and [11]. Further work in this area was done by [12–21], among others.
Many researchers have dealt with the problem of measurement error in estimating the population mean; for details, see [22–27]. Recently, a few researchers have studied the problems of measurement error and non-response together; for example, [28–33] have discussed measurement error and non-response under stratified random sampling.
In many cases, researchers who have studied measurement error have ignored the presence of non-response, particularly when using randomized response. In this study, we propose a class of estimators for the population mean of a sensitive variable in the presence of measurement error and non-response simultaneously, under simple random sampling. The efficiency of the suggested class of estimators relative to the existing estimators is shown using simulated and real data sets.
Let Ω = {Ω1, Ω2, …, ΩN} be a finite population of size N. Suppose that a sample of size n is drawn from Ω by simple random sampling without replacement. We assume that the population of size N consists of two mutually exclusive groups: N1 (respondents) and N2 (non-respondents). After selecting the sample, we assume that n1 units respond and n2 units do not respond. We then select a sub-sample of size k from the n2 non-responding units.
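The sampling scheme just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the `responds` lookup and the toy population are hypothetical and do not come from the paper, which reports using R for its simulations.

```python
import random

def draw_sample_with_subsample(unit_ids, responds, n, k, seed=0):
    """Draw an SRSWOR sample of size n, split it by response status,
    and sub-sample k of the non-responding units."""
    rng = random.Random(seed)
    sample = rng.sample(unit_ids, n)                          # SRSWOR from the population
    respondents = [u for u in sample if responds[u]]          # the n1 responding units
    nonrespondents = [u for u in sample if not responds[u]]   # the n2 non-responding units
    k = min(k, len(nonrespondents))                           # cannot follow up more than n2 units
    followed_up = rng.sample(nonrespondents, k)               # sub-sample of size k
    return respondents, nonrespondents, followed_up

# Hypothetical toy population: N = 100 units, every fifth unit is a non-respondent.
unit_ids = list(range(100))
responds = {u: (u % 5 != 0) for u in unit_ids}
resp, nonresp, sub = draw_sample_with_subsample(unit_ids, responds, n=20, k=2)
```

Here `len(resp)` plays the role of n1, `len(nonresp)` the role of n2, and `sub` holds the k followed-up units.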
Let Y be the sensitive study variable, which is not observed directly and X be a non-sensitive auxiliary variable which has positive correlation with Y. Let Rx be ranks of the auxiliary variable X. Let S be a scrambling variable which is independent of Y and X. We assume that S has zero mean and variance . The respondent is asked to give a scrambled response for the study variable Y, given by Z = Y + S, and is asked to provide a true response for X.
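A short worked step may help show why the scrambled response still carries the mean of Y. It relies only on the assumptions stated above (S has zero mean and is independent of Y); the operators E(·) and Var(·) are notation assumed for this sketch.

```latex
\begin{align*}
E(Z) &= E(Y + S) = E(Y) + E(S) = E(Y), \\
\mathrm{Var}(Z) &= \mathrm{Var}(Y) + \mathrm{Var}(S).
\end{align*}
```

So the mean of Z coincides with the mean of Y, while the scrambling variable adds its own variance to that of the response.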
Let $(z_i, x_i, r_{x,i})$ be the observed values and $(Z_i, X_i, R_{x,i})$ be the actual values corresponding to the ith (i = 1, 2, …, n) sampled unit, with $(r_{x,i}, R_{x,i})$ being the corresponding ranks of $(x_i, X_i)$. Then the measurement errors are: $U_i = z_i - Z_i$, $V_i = x_i - X_i$ and $W_i = r_{x,i} - R_{x,i}$. Note that Y is not observed directly, so we consider measurement error only on its scrambled version Z. Let $S_Z^2$, $S_X^2$ and $S_{R_x}^2$ be the population variances of the variables Z, X and Rx respectively. Let $S_{Z(2)}^2$, $S_{X(2)}^2$ and $S_{R_x(2)}^2$ be the population variances of the variables Z, X and Rx respectively for the non-responding units. Let $S_U^2$, $S_V^2$ and $S_W^2$ be the population variances associated with measurement error in the variables Z, X and Rx respectively. Let $S_{U(2)}^2$, $S_{V(2)}^2$ and $S_{W(2)}^2$ be the population variances associated with measurement error in the variables Z, X and Rx respectively for the non-responding units. Let ρZX, ρZRx, ρXRx be the population coefficients of correlation for respondents and ρZX(2), ρZRx(2), ρXRx(2) be the population coefficients of correlation for non-responding units.
The layout of the paper is as follows. In Section 2, some existing estimators of the finite population mean are given. In Section 3, a generalized class of estimators is suggested for the finite population mean by incorporating both measurement error and non-response information simultaneously. An efficiency comparison is also presented. Numerical results and a simulation study are presented in Section 4. Some concluding remarks are given in Section 5.
2 Some existing estimators in literature
In this section, we consider the following existing estimators.
2.1 Hansen and Hurwitz (1946) estimator
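In simple random sampling, the Hansen and Hurwitz (1946) estimator for the population mean is given by
$\bar{z}^{*} = w_1 \bar{z}_1 + w_2 \bar{z}_{2k}$ (1)
where $w_1 = n_1/n$ and $w_2 = n_2/n$.
Here $\bar{z}_1$ and $\bar{z}_{2k}$ are the sample means based on the n1 responding units and on the k units sub-sampled from the n2 non-responding units, respectively.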
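As an illustration, the Hansen and Hurwitz weighting can be written as a small Python function. This is a minimal sketch of the standard form of that estimator (the respondent mean and the follow-up mean weighted by n1/n and n2/n); the function and argument names are assumptions made for this example, not the authors' code.

```python
def hansen_hurwitz_mean(resp_values, followup_values, n2):
    """Weight the respondent mean by n1/n and the follow-up mean by n2/n."""
    n1 = len(resp_values)
    n = n1 + n2
    if n == 0:
        raise ValueError("the sample is empty")
    if n2 > 0 and not followup_values:
        raise ValueError("non-respondents exist but none were followed up")
    mean_resp = sum(resp_values) / n1 if n1 else 0.0                   # mean of the n1 respondents
    mean_followup = sum(followup_values) / len(followup_values) if n2 else 0.0  # mean of the k follow-ups
    return (n1 / n) * mean_resp + (n2 / n) * mean_followup

# Three respondents, four non-respondents of whom k = 2 were followed up.
estimate = hansen_hurwitz_mean([10.0, 12.0, 14.0], [20.0, 22.0], n2=4)
```

The follow-up mean stands in for all n2 non-respondents, which is why it receives the full weight n2/n even though only k of them are observed.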
2.2 Ratio estimator
The usual ratio estimator under simple random sampling is given by
(3)
where
is the known population mean and
is the sample mean (see Eq 22).
The bias and mean square error of , are given by
(4)
and
(5)
where
,
2.3 Product estimator
The usual product estimator under simple random sampling is given by
(6)
The bias and mean square error of
, are given by
(7)
and
(8)
2.4 Bahl and Tuteja (1991) estimator
The Bahl and Tuteja (1991) estimator under simple random sampling is given by
(9)
The bias and mean square error of
, are given by
(10)
and
(11)
2.5 Singh and Kumar (2010) estimator
The Singh and Kumar (2010) estimator under simple random sampling is given by
(12)
The bias and mean square error of
, are given by
(13)
and
(14)
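For orientation, the ratio, product and exponential ratio-type estimators named in this section can be sketched in Python using their usual textbook forms. The exact expressions of Eqs 3, 6 and 9 are not reproduced in this text, so the forms below are an assumption, as are the function names; `z_star` and `x_star` stand for non-response-adjusted sample means of the scrambled study variable and the auxiliary variable, and `x_pop_mean` for the known population mean of X.

```python
import math

def ratio_estimator(z_star, x_star, x_pop_mean):
    """Textbook ratio form: scale the study-variable mean by X-bar / x-bar."""
    return z_star * x_pop_mean / x_star

def product_estimator(z_star, x_star, x_pop_mean):
    """Textbook product form: scale the study-variable mean by x-bar / X-bar."""
    return z_star * x_star / x_pop_mean

def exponential_ratio_estimator(z_star, x_star, x_pop_mean):
    """Textbook exponential ratio-type form in the style of Bahl and Tuteja."""
    return z_star * math.exp((x_pop_mean - x_star) / (x_pop_mean + x_star))

# Hypothetical numbers: the sample mean of X (4.0) falls below its known population mean (5.0).
r = ratio_estimator(10.0, 4.0, 5.0)
p = product_estimator(10.0, 4.0, 5.0)
e = exponential_ratio_estimator(10.0, 4.0, 5.0)
```

With a positively correlated auxiliary variable, the ratio and exponential forms adjust the estimate upward when the sample mean of X falls below its population mean, and the exponential form makes the milder adjustment of the two.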
3 Proposed generalized class of estimators
We propose a generalized class of estimators for the population mean for a sensitive variable considering the problem of measurement error and non-response simultaneously. Measurement error and non-response are present on both the study variable and the auxiliary variable. The proposed estimator is given by
(20)
where m1, m2 and m3 are constants whose values are to be determined, and αr (r = 0, 1, 2, 3) are arbitrarily chosen scalars. For obtaining the bias and mean square error, we assume that
,
,
,
,
,
.
Adding and
, we get
.
Dividing both sides by n, and then simplifying, we get
(21)
(22)
and
(23)
Further
,
,
,
,
,
.
On simplifying, we get
(24)
where
b* = α3,
,
,
and
.
,
and
.
Simplifying further, and ignoring error terms of power greater than two, we have
(25)
Using Eq (25), the bias of , to first order of approximation, is given by
(26)
Squaring both sides of Eq (25), and keeping the terms up to power two in errors, and then taking expectations, the mean square error of
is given by
where
.
The above equation can be written as
(27)
where,
,
,
,
,
,
,
,
,
.
For finding the optimal values of m1, m2 and m3, we differentiate Eq (27) with respect to m1, m2 and m3 respectively. The optimal values are given by
and
Substituting these optimum values in Eq (27), we get the minimum mean square error of
, as
(28)
where
and
.
3.1 Specific members of the generalized proposed class of estimators for different choices of (α0, α1, α2, α3, m1, m2, m3)
We consider the following members of the class of estimators by choosing different values of α0, α1, α2, α3, m1, m2 and m3.
1. For α1 = m2 = m3 = 0 and α0 = m1 = 1 in Eq 20, the generalized proposed class of estimators reduces to usual mean estimator:
2. For α0 = α1 = m1 = 1 and m2 = m3 = 0 in Eq 20, the generalized proposed class of estimators reduces to usual ratio estimator:
3. For α0 = m1 = 1, α1 = −1 and m2 = m3 = 0 in Eq 20, the generalized proposed class of estimators reduces to usual product estimator:
4. For α0 = α1 = m2 = m3 = 0 and m1 = 1 in Eq 20, the generalized proposed class of estimators reduces to the estimator in Eq 9:
5. For α0 = α1 = α2 = 1, m3 = 0, m1 = m4 and m2 = m5 in Eq 20, the generalized proposed class of estimators reduces to the following estimator.
6. For α0 = m2 = m3 = 0, α1 = 2 and m1 = 1 in Eq 20, the generalized proposed class of estimators reduces to the estimator in Eq 12:
7. For α0 = α1 = α2 = 0, m3 = 0, m1 = m6 and m2 = m7 in Eq 20, the generalized proposed class of estimators reduces to the following estimator:
8. For α1 = α2 = m3 = 0, α0 = m1 = 1 and m2 = d in Eq 20, the generalized proposed class of estimators reduces to difference estimator:
9. For α0 = α1 = α2 = α, m1 = m8, m2 = m9 and m3 = 0 in Eq 20, the generalized proposed estimator reduces to another form of proposed estimator.
(29)
3.2 Efficiency comparison
The efficiency comparison of and
with respect to
are given by:
Condition (1)
if
Condition (2)
if
Condition (3)
if
Condition (4)
if
Condition (5)
if
Condition (6)
if
Condition (7)
if
The proposed class of estimators is more efficient than the competing estimators when Conditions (1) to (7) are satisfied. Table 1 shows that all the conditions are satisfied.
4 Numerical results
In this section three populations are generated for simulation study and three real data sets are used. The results are given in Tables 2–7 (simulated data) and Tables 9–14 (real data).
4.1 Simulation study
We generated three populations from a normal distribution using the R language. Tables 2–7 show that the MSE of the generalized proposed estimator is the smallest, both with and without measurement error. The bias values of the estimators (in brackets) are also given in Tables 2–7.
Population I.
X = N(5, 10), Y = X + N(0, 1), y = Y + N(1, 3), x = X + N(1, 3), N = 5000, ,
,
,
,
,
,
,
,
, ρYX = 0.995059,
,
.
Population II.
X = N(5, 10), Y = X + N(0, 1), y = Y + N(2, 3), x = X + N(2, 3), N = 5000, ,
,
,
,
,
,
,
,
, ρYX = 0.995187,
,
.
Population III.
X = N(5, 10), Y = X + N(0, 1), y = Y + N(1, 2.5), x = X + N(1, 2.5), N = 5000, ,
,
,
,
,
,
,
,
, ρZX = 0.9749021,
,
.
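The recipe for the simulated populations can be sketched as follows, using Population I as the default. This is a hedged illustration in Python rather than the authors' R program. It reads N(a, b) as a normal distribution with mean a and standard deviation b, which is an assumption; that reading is consistent with the reported ρYX ≈ 0.995, since 10/√101 ≈ 0.995. The function names and the seed are hypothetical.

```python
import random

def generate_population(N=5000, me_mean=1.0, me_sd=3.0, seed=1):
    """Generate true values (X, Y) and error-contaminated values (x, y)."""
    rng = random.Random(seed)
    X = [rng.gauss(5.0, 10.0) for _ in range(N)]           # X = N(5, 10), second argument read as sd
    Y = [Xi + rng.gauss(0.0, 1.0) for Xi in X]             # Y = X + N(0, 1)
    y = [Yi + rng.gauss(me_mean, me_sd) for Yi in Y]       # y = Y + measurement error
    x = [Xi + rng.gauss(me_mean, me_sd) for Xi in X]       # x = X + measurement error
    return X, Y, x, y

def correlation(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    s_aa = sum((ai - mean_a) ** 2 for ai in a)
    s_bb = sum((bi - mean_b) ** 2 for bi in b)
    return s_ab / (s_aa * s_bb) ** 0.5

X, Y, x, y = generate_population()
rho_true = correlation(X, Y)      # correlation between the error-free variables
rho_observed = correlation(x, y)  # correlation after adding measurement error
```

Under these assumptions, Population II would correspond to `me_mean=2.0, me_sd=3.0` and Population III to `me_mean=1.0, me_sd=2.5`. Comparing `rho_true` with `rho_observed` shows how measurement error weakens the observed correlation.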
Tables 2–7 show that the generalized class of proposed estimators performs better than all other estimators, both with and without measurement errors. The values of the absolute biases are given in brackets. Table 2 shows that the generalized proposed estimator performs better than the other estimators. The MSE for the generalized proposed estimator, when αr = 1, r = 0, 1, 2, 3, is 0.021233 for a 10% non-response rate. When the non-response rate increases to 20%, the MSE for the generalized proposed estimator increases to 0.023755. It is also observed that
is less biased and
is most biased among all estimators. Table 3 shows the same pattern of results.
Table 4 shows that the generalized proposed estimator performs better than the other estimators. The MSE for the generalized proposed estimator, when αr = 1, r = 0, 1, 2, 3, is 0.021796 for a 10% non-response rate. When the non-response rate becomes 20%, the MSE for the generalized proposed estimator increases to 0.024110. It is also observed that is less biased and
is most biased among all estimators. Table 5 shows the same pattern of results.
Table 6 also shows that the generalized proposed estimator performs better than other estimators. The MSE for the generalized proposed estimator, when αr = 1, r = 0, 1, 2, 3 is 0.014048 for 10% non-response rate. When the non-response rate becomes 20%, the MSE for generalized proposed estimator increases to 0.015831. It is also observed that is less biased and
is most biased among all estimators. Table 7 shows the same pattern of results.
The simulation study shows that the generalized proposed class of estimators performs better than the other existing estimators. Of the two non-response rates considered, the MSE is smaller at 10%.
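The empirical bias and MSE reported in such a study are typically obtained by repeating the sampling many times and comparing the estimates with the known population mean. The Python sketch below shows that calculation for the plain sample mean on a toy population; the function names, the toy population, the replication count and the seed are all assumptions for illustration and are not taken from the paper.

```python
import random

def empirical_bias_mse(estimates, true_value):
    """Empirical bias and mean square error over replicated estimates."""
    reps = len(estimates)
    bias = sum(estimates) / reps - true_value
    mse = sum((e - true_value) ** 2 for e in estimates) / reps
    return bias, mse

def replicate_sample_means(population, n, reps, seed=0):
    """Repeat SRSWOR draws of size n and record each sample mean."""
    rng = random.Random(seed)
    return [sum(rng.sample(population, n)) / n for _ in range(reps)]

# Hypothetical toy population: the values 1, 2, ..., 100.
population = [float(v) for v in range(1, 101)]
true_mean = sum(population) / len(population)
estimates = replicate_sample_means(population, n=20, reps=2000)
bias, mse = empirical_bias_mse(estimates, true_mean)
```

Replacing the sample mean inside `replicate_sample_means` with any competing estimator gives its empirical bias and MSE on the same footing.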
4.2 Application to real data set
In this section we have considered three data sets for numerical comparisons and results are given below. Population IV consists of 654 observations. The data summary is given below (see Table 8).
Population IV [Source: [34]]
In Population IV, forced expiratory volume is taken as the study variable, Age as the auxiliary variable and Smoke (No = 0, Yes = 1) as the scrambling response. The correlation coefficients are: ρZX = 0.7564, and
.
4.2.1 Data collection.
To see the practical implication of measurement error, we conducted a study based on a real data set at Quaid-i-Azam University, Islamabad, during 2018. We distributed 55 questionnaires to the students of BS Statistics (5th Semester, Fall 2018) and M.Phil Statistics (1st and 2nd Semesters, Fall 2018). The true responses were already available from the students' academic records held in the Department of Statistics, so false responses could be identified. Our population consists of the students who gave a false response, which comes out to be 23 (N = 23), including 8 male students and 15 female students. In question (i) we asked about Y = Age and X = Marks (in percentage) of Intermediate or Matric. In question (ii) we asked about S = Social media effects on the academic result. Here Y is the study variable, X is the auxiliary variable and S is the scrambling response variable.
Population V. [Source: Section 4.2]
An explanation of the data set is given in Section 4.2.1.
Y: Age of BS 5th Semester and M.Phil students of the Statistics department, X: Marks in O level or Matric, S: Social media effects on the academic result
N = 23, ,
,
,
,
,
,
,
, ρZX = 0.046204,
,
.
Population VI. [Source: Section 4.2]
An explanation of the data set is given in Section 4.2.1.
Y: Age of BS 5th Semester and M.Phil students of the Statistics department, X: Marks in A level or Intermediate, S: Social media effects on the academic result, Z = Y + S
N = 23, ,
,
,
,
,
,
,
, ρZX = 0.117576,
,
.
Tables 9–14 show that the generalized class of proposed estimators performs better than all other existing estimators both with and without measurement errors. The values of the absolute biases are given in brackets in Tables 9–14.
Table 9 shows that the generalized proposed estimator performs better than other estimators. The MSE for the generalized proposed estimator, when αr = 1, r = 0, 1, 2, 3 is 0.011258 for 10% non-response rate. When the non-response rate increases to 20%, the MSE for generalized proposed estimator increases to 0.012462. It is also observed that is less biased and
is most biased among all considered estimators. Table 10 shows the same pattern of results.
Table 11 shows that the generalized proposed estimator performs better than other estimators. The MSE for the generalized proposed estimator, when αr = 1, r = 0, 1, 2, 3 is 0.442375 for 10% non-response rate. When the non-response rate becomes 20%, the MSE for generalized proposed estimator increases to 0.499340. It is also observed that is less biased and
is most biased among all considered estimators. Table 12 shows the same pattern of results.
Table 13 shows that the generalized proposed estimator performs better than other estimators. The MSE for the generalized proposed estimator, when αr = 1, r = 0, 1, 2, 3 is 0.449115 for 10% non-response rate. When the non-response rate becomes 20%, the MSE for generalized proposed estimator increases to 0.478928. It is also observed that is less biased and
is most biased among all considered estimators. Table 14 shows the same pattern of results.
The numerical study shows that the generalized proposed estimator performs better than all other existing estimators. Of the two non-response rates considered, the MSE is smaller at 10%. The MSE values also increase as the value of the constant g increases.
5 Conclusion
In this study, we proposed a generalized class of estimators for the finite population mean when the variable of interest is stigmatizing in nature, considering both measurement error and non-response under simple random sampling. Through simulation study (see Tables 2–7) and real data sets (see Tables 9–14) it is observed that the proposed class of estimators performs better than all existing estimators considered here.
Supporting information
S1 File. Data used in the manuscript “fevdata.csv”.
https://doi.org/10.1371/journal.pone.0261561.s001
(CSV)
Acknowledgments
The authors are grateful to the anonymous referees for their valuable comments and feedback.
References
- 1. Hansen MH, Hurwitz WN. The problem of non-response in sample surveys. Journal of the American Statistical Association. 1946;41(236):517–529. pmid:20279350
- 2. Cochran WG. Sampling Techniques. John Wiley & Sons. New York. 1977.
- 3. Rao P. Ratio estimation with subsampling the nonrespondents. Survey Methodology. 1986;12(2):217–230.
- 4. Khare B, Srivastava S. Generalized two phase sampling estimators for the population mean in the presence of nonresponse. Aligarh Journal of Statistics. 2010;30:39–54.
- 5. Andridge RR, Little RJ. A review of hot deck imputation for survey non-response. International Statistical Review. 2010;78(1):40–64. pmid:21743766
- 6. Singh HP, Kumar S, et al. Combination of regression and ratio estimate in presence of nonresponse. Brazilian Journal of Probability and Statistics. 2011;25(2):205–217.
- 7. Khare B, Pandey S, Kumar A. Estimation of Population Mean in Sample Surveys Using Auxiliary Character, Method of Call Backs and Subsampling from Non-respondents. Proceedings of the National Academy of Sciences, India Section A: Physical Sciences. 2013;83(1):49–54.
- 8. Shabbir J, Khan NS. Some Modified Exponential-Ratio Type Estimators in the presence of Non-response under Two-Phase Sampling Scheme. Electronic Journal of Applied Statistical Analysis. 2013;6(1):1–17.
- 9. Shabbir J, Gupta S, Ahmed S. A generalized class of estimators under two-phase stratified sampling for non response. Communications in Statistics-Theory and Methods. 2018; p. 1–17.
- 10. Warner SL. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association. 1965;60(309):63–69. pmid:12261830
- 11. Greenberg BG, Kuebler RR Jr, Abernathy JR, Horvitz DG. Application of the randomized response technique in obtaining quantitative data. Journal of the American Statistical Association. 1971;66(334):243–250.
- 12. Eichhorn BH, Hayre LS. Scrambled randomized response methods for obtaining sensitive quantitative data. Journal of Statistical Planning and inference. 1983;7(4):307–316.
- 13. Gupta S, Shabbir J. Sensitivity estimation for personal interview survey questions. Statistica. 2004;64(4):643–653.
- 14. Kim JM, Warde WD. A stratified Warner’s randomized response model. Journal of Statistical Planning and Inference. 2004;120(1-2):155–165.
- 15. Singh HP, Mathur N. Estimation of population mean when coefficient of variation is known using scrambled response technique. Journal of Statistical Planning and Inference. 2005;131(1):135–144.
- 16. Gjestvang CR, Singh S. A new randomized response model. Journal of the Royal Statistical Society. 2006;68(3):523–530.
- 17. Diana G, Perri PF. New scrambled response models for estimating the mean of a sensitive quantitative character. Journal of Applied Statistics. 2010;37(11):1875–1890.
- 18. Gupta S, Shabbir J, Sehra S. Mean and sensitivity estimation in optional randomized response models. Journal of Statistical Planning and Inference. 2010;140(10):2870–2874.
- 19. Chaudhuri A, Pal S. On efficacy of empirical Bayes estimation of a finite population mean of a sensitive variable through randomized responses. Model Assisted Statistics and Applications. 2015;10(4):283–288.
- 20. Gupta S, Shabbir J, Sousa R, Corte-Real P. Improved Exponential Type Estimators of the Mean of a Sensitive Variable in the Presence of Nonsensitive Auxiliary Information. Communications in Statistics-Simulation and Computation. 2016;45(9):3317–3328.
- 21. Bouza CN, Singh P, Singh R. Ranked Set Sampling and Optional Scrambling Randomized Response Modeling. Investigación Operacional. 2018;39(1):100–107.
- 22. Cochran WG. Errors of measurement in statistics. Technometrics. 1968;10(4):637–666.
- 23. Fuller WA. Estimation in the presence of measurement error. International Statistical Review. 1995;63(2):121–141.
- 24. Shalabh S. Ratio method of estimation in the presence of measurement errors. Journal of the Indian Society of Agricultural Statistics. 1997;52:150–155.
- 25. Biemer PP, Groves RM, Lyberg LE, Mathiowetz NA, Sudman S. Measurement Errors in Surveys. John Wiley & Sons; 2011.
- 26. Shukla D, Pathak S, Thakur N. An estimator for mean estimation in presence of measurement error. Research and Reviews: A Journal of Statistics. 2012;1(1):1–8.
- 27. Zahid E, Shabbir J. Estimation of finite population mean for a sensitive variable using dual auxiliary information in the presence of measurement errors. PloS one. 2019;14(2):e0212111. pmid:30742674
- 28. Kumar S, Bhougal S, Nataraja N, Viswanathaiah M. Estimation of Population Mean in the Presence of Non-Response and Measurement Error. Revista Colombiana de Estadística. 2015;38(1):145–161.
- 29. Singh RS, Sharma P. Method of Estimation in the Presence of Non-response and Measurement Errors Simultaneously. Journal of Modern Applied Statistical Methods. 2015;14(1):12.
- 30. Azeem M, Hanif M. Joint influence of measurement error and non response on estimation of population mean. Communications in Statistics-Theory and Methods. 2017;46(4):1679–1693.
- 31. Kumar S. Improved estimation of population mean in presence of non-response and measurement error. Journal of Statistical Theory and Practice. 2016;(just-accepted).
- 32. Zahid E, Shabbir J. Estimation of population mean in the presence of measurement error and non response under stratified random sampling. PloS one. 2018;13(2):e0191572. pmid:29401519
- 33. Khalil S, Gupta S, Hanif M. Estimation of finite population mean in stratified sampling using scrambled responses in the presence of measurement errors. Communications in Statistics-Theory and Methods. 2018; p. 1–9.
- 34. Rosner B. Fundamentals of Biostatistics. Duxbury Press. 2015.