Abstract
Gupta et al. suggested an improved estimator of the finite population variance based on the Diana and Perri model and a single auxiliary variable. Along the same lines, Saleem et al. proposed a new scrambled randomized response technique (RRT) model based on two auxiliary variables for estimating the finite population variance. Recently, Azeem et al. presented a new randomized response model for the same purpose. We observe that the bias and MSE expressions of these estimators, up to the first order of approximation, appear to be incorrect or incomplete. In this study, we rectify the bias and MSE expressions of the estimators proposed by Gupta et al., Saleem et al. and Azeem et al. Additionally, we suggest a new generalized class of estimators that is more efficient than the previously considered estimators. A simulation study is conducted to assess the behavior of the estimators. The suggested estimator performs better than the estimators considered by the earlier authors.
Citation: Shabbir J, Movaheedi Z (2024) Use of generalized randomized response model for enhancement of finite population variance: A simulation approach. PLoS ONE 19(12): e0315658. https://doi.org/10.1371/journal.pone.0315658
Editor: Rab Nawaz, COMSATS University Islamabad, PAKISTAN
Received: September 13, 2024; Accepted: November 28, 2024; Published: December 20, 2024
Copyright: © 2024 Shabbir, Movaheedi. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All the data are available within the manuscript.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
1. Introduction
Obtaining reliable data, particularly direct and honest responses from respondents, is a serious problem that takes considerable effort and time. Among non-sampling errors, response bias and nonresponse bias are common. Consequently, results may be misleading, which has a negative impact on decision-making. To get around these issues, Warner [1] developed an ingenious model known as the randomized response model (RRM). It is challenging to gather information on sensitive topics such as forced abortion, sexual assault, tax evasion, bribery, corruption, drug addiction, human trafficking and malpractice, because respondents may refuse to give honest answers on such delicate matters. The results may then be erroneous or misleading and seriously biased. The RRM offers a way to preserve confidentiality and protect respondents' privacy.
In our daily lives, we come across a variety of scenarios that involve unpredictability, such as regular variations in the prices of consumable goods and the significant impact of political instability on stock market crashes. A thorough treatment of variability is essential in such contexts. Many authors, including Yadav et al. [2], Asghar et al. [3], Sanaullah et al. [4], Niaz et al. [5] and Azeem et al. [6], have investigated the estimation of the finite population variance using scrambled models. Singh et al. [7] were the first to estimate the finite population variance of a sensitive variable under a multiplicative scrambled response model. A class of estimators for quantitative sensitive data was proposed by Diana and Perri [8]. Ahmad et al. [9] and Gupta et al. [10] suggested some variance estimators for sensitive variables using auxiliary information. Aloraini et al. [11] introduced several separate and combined variance estimators under stratified sampling. Azeem et al. [12] suggested a new RRM for mean estimation. Zaman et al. [13] discussed the use of focus groups under RRT to obtain efficient estimators. Other important references include [14–19].
In this paper, we correct the bias and mean squared error (MSE) expressions previously reported by Gupta et al. [10], Saleem et al. [20] and Azeem et al. [6] under RRT with auxiliary variables. Furthermore, we propose an alternative generalized class of estimators for the population variance using a generalized RRM.
Diana and Perri [8] suggested the model Z = TY + S, where T and S are two scrambling variables such that E(S) = 0 and E(T) = 1, and
Consider a finite population
where yi, (x1i, x2i) and zi are the values of the study variable Y, the auxiliary variables (X1, X2) and Z, respectively. We draw a sample of size n from the population using simple random sampling with replacement (SRSWR). Let
and
be the sample variances corresponding to population variances
,
,
and
respectively, where
and
We define the following error terms to obtain the bias and MSE of the estimators. Let
such that
and
2. Existing estimators
We now review the estimators suggested by various authors under different models.
(i) Usual variance estimator under the Diana and Perri [8] model
The finite population variance of Z is given by
(1)
where
and
are the variances of the scrambled variables S and T respectively. From (1), we can write
as
.
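The variance decomposition behind (1) can be sketched as follows. The sketch assumes, more strongly than stated above, that T, S and Y are mutually independent; here $\mu_Y$ and $\sigma_Y^2$ denote the mean and variance of Y, and $\sigma_S^2$, $\sigma_T^2$ the variances of S and T. The terms may be arranged differently from the authors' expression.

```latex
\begin{align*}
E(Z) &= E(T)\,E(Y) + E(S) = \mu_Y, \\
E(Z^2) &= E(T^2)\,E(Y^2) + 2\,E(T)\,E(Y)\,E(S) + E(S^2)
        = (1+\sigma_T^2)(\sigma_Y^2+\mu_Y^2) + \sigma_S^2, \\
\sigma_Z^2 &= E(Z^2) - \mu_Y^2
        = (1+\sigma_T^2)\,\sigma_Y^2 + \sigma_T^2\,\mu_Y^2 + \sigma_S^2, \\
\sigma_Y^2 &= \frac{\sigma_Z^2 - \sigma_S^2 - \sigma_T^2\,\mu_Y^2}{1+\sigma_T^2}.
\end{align*}
```

Because E(Z) = μY under this model, the unknown mean of Y can be estimated from the scrambled responses alone.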
Replacing and
by their consistent estimates
and
respectively, Gupta et al. [10] proposed the following estimator of the population variance:
(2)
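A minimal Python sketch of a plug-in estimator of this kind is given below. It assumes that the variance of Z and the mean of Y are replaced by the sample variance and sample mean of the scrambled responses, that the variances of S and T are known, and that T, S and Y are independent. The function name `dp_variance_estimate` and all numerical values are hypothetical and are not taken from the paper.

```python
import numpy as np


def dp_variance_estimate(z, var_s, var_t):
    """Plug-in estimate of Var(Y) from scrambled responses z = t*y + s.

    Assumes E(T) = 1, E(S) = 0, known var_s and var_t, and that
    T, S and Y are mutually independent (an assumption of this sketch).
    """
    z = np.asarray(z, dtype=float)
    s2_z = z.var(ddof=1)   # sample variance of the scrambled responses
    z_bar = z.mean()       # estimates mu_Y, because E(Z) = mu_Y
    return (s2_z - var_s - var_t * z_bar**2) / (1.0 + var_t)


# Hypothetical illustration: Y ~ N(5, 2^2), T ~ N(1, 0.5^2), S ~ N(0, 1).
rng = np.random.default_rng(0)
n = 200_000
y = rng.normal(5.0, 2.0, size=n)
t = rng.normal(1.0, 0.5, size=n)
s = rng.normal(0.0, 1.0, size=n)
z = t * y + s              # only z is observed by the interviewer

estimate = dp_variance_estimate(z, var_s=1.0, var_t=0.25)
print(f"estimated Var(Y): {estimate:.3f} (simulated Var(Y) is 4)")
```

The respondent's true value y is never used by the estimator; only the scrambled response z and the known scrambling variances enter the calculation.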
The bias and MSE of the estimator in (2), to first order of approximation, are given by
(3)
where
as
and
(4)
(ii) Ratio estimator under the Diana and Perri [8] model
Gupta et al. [10] defined the following ratio estimator under RRT:
(5)
The bias and MSE of the estimator in (5), to first order of approximation, as reported by Gupta et al. [10], are given by
(6)
and
(7)
Note: The bias and MSE expressions of the estimator in (5), to first order of approximation, given in (6) and (7) are incorrect.
We now reconsider the estimator in (5), renaming it
and in terms of errors, we have:
(8)
Using (8), the corrected bias and MSE of this estimator are given by:
(9)
and
(10)
(iii) Generalized ratio type estimator under Diana and Perri [8] model
Gupta et al. [10] suggested another ratio type estimator for population variance as given by:
(11)
where g, α, β and w are suitably chosen constants.
The bias and minimum MSE of the estimator in (11), to first order of approximation, as reported by Gupta et al. [10], are given by:
(12)
where φ = gwψi and
The optimum value of φ, is given by
Note: The bias and MSE expressions of the estimator in (11), given in (12) and (13) to first order of approximation as reported by Gupta et al. [10], are also incorrect. The corrected bias and MSE expressions of this estimator, to first order of approximation, are given below.
Solving (11) in terms of errors, we have
(14)
To first order of approximation, we have
(15)
From (15), the corrected bias of the estimator, to first order of approximation, is given by
(16)
Squaring (15) and taking expectation, the corrected MSE of the estimator, to first order of approximation, is given by
(17)
where
Differentiating (17) with respect to (wψig) and equating the result to zero, we get the optimum value
Substituting (wψig)opt in (17), we get the minimum MSE of the estimator as
(18)
(iv) Difference type estimator using single auxiliary variable
A difference type estimator using a single auxiliary variable is given by:
(19)
where K is a constant.
Solving (19) in terms of errors, we get
(20)
From (20), the bias of the estimator is given by
(21)
Squaring (20) and taking expectation, we get the MSE of the estimator as
(22)
where
and
.
From (22), the optimum value of K is .
Substituting the optimum value of K in (22), we get the minimum MSE of the estimator as
(23)
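The role of the optimum constant can be illustrated with a generic first-order argument. The sketch below assumes the usual difference form, in which a base estimator $t_0$ of the variance of Y is adjusted by the gap between the known population variance $S_x^2$ and the sample variance $s_x^2$ of the auxiliary variable; the symbols $t_D$, $t_0$ and $\rho$ are notation for this sketch only and may not match the authors' expressions. The squared bias of $t_0$ is of order $n^{-2}$ and is neglected.

```latex
\begin{align*}
t_D &= t_0 + K\,(S_x^2 - s_x^2), \\
\mathrm{MSE}(t_D) &\approx \mathrm{Var}(t_0) + K^2\,\mathrm{Var}(s_x^2)
                    - 2K\,\mathrm{Cov}(t_0, s_x^2), \\
K_{\mathrm{opt}} &= \frac{\mathrm{Cov}(t_0, s_x^2)}{\mathrm{Var}(s_x^2)}, \\
\mathrm{MSE}_{\min}(t_D) &\approx \mathrm{Var}(t_0)\,\bigl(1 - \rho^2_{t_0,\,s_x^2}\bigr).
\end{align*}
```

Under this form, the gain over the base estimator depends only on the squared correlation between $t_0$ and $s_x^2$, so an auxiliary variable that is weakly related to the scrambled response yields little improvement.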
(v) Difference type estimator using two auxiliary variables
A difference type estimator using two auxiliary variables is given by:
(24)
where Ji(i = 1,2,3) are constants.
Solving (24) in terms of errors, we get
(25)
where
.
From (25), the bias of the estimator is given by
(26)
Squaring (25) and taking expectation, we get the MSE of the estimator as
(27)
where
From (27), the optimum values of Ji(i = 1,2,3) are:
Substituting the optimum values of Ji(i = 1,2,3) in (27), we get the minimum MSE of the estimator as
(28)
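When an estimator is linear in its constants, the optimum constants follow from a standard quadratic minimization. The sketch below is generic and is not the authors' derivation: it assumes the estimator can be written as $t = J^{\top}u$ for a vector $u$ of sample quantities, with target $\theta = \sigma_Y^2$, $m = E(u)$ and $M = E(uu^{\top})$ (all notation introduced here for illustration).

```latex
\begin{align*}
\mathrm{MSE}(t) &= E\bigl(J^{\top}u - \theta\bigr)^2
                = J^{\top} M J - 2\,\theta\,J^{\top} m + \theta^2, \\
\frac{\partial\,\mathrm{MSE}(t)}{\partial J} &= 2\,M J - 2\,\theta\,m = 0
  \;\Longrightarrow\; J_{\mathrm{opt}} = \theta\,M^{-1} m, \\
\mathrm{MSE}_{\min}(t) &= \theta^2\,\bigl(1 - m^{\top} M^{-1} m\bigr).
\end{align*}
```

In practice $m$ and $M$ are replaced by their first-order approximations in terms of the error moments, which is where the constants of the particular scrambling model enter.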
3. Saleem et al. estimator
Saleem et al. [20] suggested an estimator under the Diana and Perri [8] model and also proposed a new model.
(i) Saleem et al. [20] estimator under the Diana and Perri [8] model
Recently, Saleem et al. [20] suggested the following improved estimator of the finite population variance:
(29)
where Ki(i = 1,2,3) are constants and λi(i = 1,2) are scalar quantities.
The bias and MSE expressions reported by Saleem et al. [20], are given by:
(30)
where
and
(31)
where
The optimum values are:
Note: The bias and MSE expressions reported by Saleem et al. [20] are incorrect.
(ii) Saleem et al. [20] estimator under proposed RRT model
Saleem et al. [20] proposed the following RRT model and used it for obtaining bias and MSE.
(32)
where Y is the sensitive variable; g and α are constants; T and S are two scrambling variables that are mutually uncorrelated with Y, such that E(S) = 0 and E(T) = 1,
and
(33)
where
and using
, we can write
as
:
(34)
Estimating and
by an unbiased estimator
and
, we have
(35)
Note: The expressions (33)–(35) reported by Saleem et al. [20] are not correct. The correct version is given below.
Rewriting (32), we have:
(36)
Solving (36), we have
(37)
where G is defined above.
From (37), we have
as
Estimating and
by their unbiased estimators, we have
(38)
Note: Saleem et al. [20] used the model given in (35) in their proposed estimator, but the resulting bias and MSE expressions are not correct.
4. Generalized RRM and suggested estimator
We apply the generalized RRM to the estimator suggested by Saleem et al. [20].
(i) Generalized RRM
Along the lines of Saleem et al. [20], a generalized RRM is given by:
(39)
where Y is the sensitive variable; h and α are constants; T and S are two scrambling variables that are mutually uncorrelated with Y, such that E(S) = 0 and E(T) = 1,
and
By changing the values of h and α, we can obtain additive and subtractive type models.
Solving (39), we get
(40)
where
and
Using
, we can write
as
:
(41)
(ii) Use of RRM in Saleem et al. [20] estimator
Along the lines of Saleem et al. [20], we propose the following estimator using the generalized RRT model given in (42):
(43)
where Ki(i = 1c,2c,3c) are constants and λi(i = 1c,2c) are scalar quantities.
(44)
where
and
Solving (44), we get
(45)
where
and
From (45), the bias of the estimator, to first order of approximation, is given by
(46)
Squaring (45) and taking expectation, the MSE of the estimator is given by:
(47)
where
The minimum MSE of the estimator is given by
(48)
5. Azeem et al. [6] RRT model
Azeem et al. [6] suggested the following RRT model to obtain the bias and MSE of the usual variance estimator and the ratio estimator.
(49)
where Y is the sensitive variable; 0≤l≤1; S is a scrambling variable that is uncorrelated with Y, such that E(S) = 0,
, E(Y) = μY, and
.
(50)
where
and using
, we can write
as
:
(51)
Estimating and
by an unbiased estimator
and
, we have
(52)
Note: The expressions (50)–(52) given by Azeem et al. [6] are not correct, so the bias and MSE expressions based on the resulting estimator, and consequently the numerical results, may be misleading.
The correct expression of the Azeem et al. [6] model is given by
(53)
Using , we can write
as
:
(54)
Estimating and
by an unbiased estimator
and
, we have
(55)
The bias and MSE of the estimator in (55), to first order of approximation, are given by
(56)
Azeem et al. [6] suggested the following ratio estimator
(58)
where
.
The bias and MSE of this ratio estimator are also not correct (for details, see Azeem et al. [6]).
The ratio estimator based on the correct model is given by
(59)
The bias and MSE of the estimator in (59), to first order of approximation, are given by
(60)
where
and
(61)
6. Proposed estimator
The aim of the suggested estimator is to provide better estimation than all the previously considered estimators. Along the lines of Gupta et al. [10], Saleem et al. [20] and Azeem et al. [6], we propose the following generalized class of difference-in-exponential ratio-product type estimators for the finite population variance:
(62)
where Li(i = 1,2,3) are constants, whose values are to be determined and πi(i = 1,2,3) are scalar quantities.
In terms of errors, we have
or
(63)
where
.
From (63), the bias of the proposed estimator, to first order of approximation, is given by
(64)
From (63), the MSE of the proposed estimator, to first order of approximation, is given by
(65)
where
The minimum MSE of the proposed estimator is given by
(66)
The optimum values of Li(i = 1,2,3) are given by
where
Note 1: Many more estimators can be generated from the generalized class of estimators described in (43). Some of them are given below:
1. Put λ1c = 0 and λ2c = 0 in (43), we get,
2. Put λ1c = 1 and λ2c = 1 in (43), we get,
3. Put λ1c = 1 and λ2c = 0 in (43), we get,
4. Put λ1c = 0 and λ2c = 1 in (43), we get,
5. Put λ1c = −1 and λ2c = −1 in (43), we get,
6. Put λ1c = −1 and λ2c = 0 in (43), we get,
7. Put λ1c = 0 and λ2c = −1 in (43), we get,
8. Put λ1c = −1 and λ2c = 1 in (43), we get,
9. Put λ1c = 1 and λ2c = −1 in (43), we get,
Note 2: Many more estimators can be generated from the generalized class of estimators given in (62). Some of them are described below:
1. Put π1 = π2 = π3 = 1 in (62), we get,
2. Put π1 = 0 and π2 = π3 = 1 in (62), we get,
3. Put π1 = −1 and π2 = π3 = 1 in (62), we get,
4. Put π1 = 0.5 and π2 = π3 = 1 in (62), we get,
5. Put π1 = 0 and π2 = π3 = 1 in (62), we get,
6. Put π1 = 0, π2 = 1 and π3 = −1 in (62), we get,
7. Put π1 = 0, π2 = −1 and π3 = 1 in (62), we get,
7. Simulation study
In this section, a simulation study is conducted to assess the efficiency of the estimators. We generate three populations, each of size N = 500, from three bivariate normal distributions whose means and covariance matrices are given below:
μy = 5.10938,
μy = 5.084729,
μy = 5.103771,
Covariance matrices show the distribution of the sensitive variable Y and the auxiliary variables X1 and X2. The response variable is Z = TY+S, where the scrambling variables S and T are distributed normally with mean 0 and variance 1 respectively. MSE values of different estimators given in Tables 1–4 are based on the various models as discussed earlier.
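A simulation of this kind can be sketched in Python as follows. The sketch is illustrative only: the mean vector, covariance matrix, scrambling variances, sample size and number of replications are hypothetical placeholders, not the values used in this study, and only three simple estimators are compared (the basic plug-in estimator, a ratio estimator and a difference estimator in their usual textbook forms, assumed here). The constant of the difference estimator is estimated from the replications themselves.

```python
import numpy as np

rng = np.random.default_rng(2024)

# Hypothetical settings (placeholders, not the paper's values).
N, n, reps = 500, 100, 5000
mean = np.array([5.0, 5.0, 5.0])            # means of (Y, X1, X2)
cov = np.array([[4.0, 3.0, 2.8],
                [3.0, 4.0, 2.5],
                [2.8, 2.5, 4.0]])
var_s, var_t = 1.0, 0.04                    # Var(S), Var(T); E(S)=0, E(T)=1

# Generate one finite population and fix it for all replications.
pop = rng.multivariate_normal(mean, cov, size=N)
y, x1 = pop[:, 0], pop[:, 1]
# N-divisor variances: SRSWR draws are i.i.d. from the finite population.
var_y, var_x1 = y.var(), x1.var()

# SRSWR: sample unit indices with replacement, one row per replication.
idx = rng.integers(0, N, size=(reps, n))
t = rng.normal(1.0, np.sqrt(var_t), size=(reps, n))
s = rng.normal(0.0, np.sqrt(var_s), size=(reps, n))
z = t * y[idx] + s                          # scrambled responses Z = TY + S

s2_z = z.var(axis=1, ddof=1)
s2_x = x1[idx].var(axis=1, ddof=1)

# Basic plug-in estimator of Var(Y) from the scrambled responses.
t0 = (s2_z - var_s - var_t * z.mean(axis=1) ** 2) / (1.0 + var_t)
# Ratio estimator using the known population variance of X1.
t_ratio = t0 * var_x1 / s2_x
# Difference estimator with the constant estimated across replications.
k_hat = np.cov(t0, s2_x)[0, 1] / s2_x.var(ddof=1)
t_diff = t0 + k_hat * (var_x1 - s2_x)


def mse(estimates):
    """Empirical MSE around the finite population variance of Y."""
    return float(np.mean((estimates - var_y) ** 2))


results = {"basic": mse(t0), "ratio": mse(t_ratio), "difference": mse(t_diff)}
for name, value in results.items():
    print(f"{name:>10}: empirical MSE = {value:.4f}")
```

Holding the population fixed and resampling from it mirrors the finite population setting; regenerating the population in every replication would instead mix sampling error with superpopulation variability.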
In Table 1, the difference estimator has the smallest MSE among the considered estimators under Populations I, II and III. Table 2 gives the MSE values under the Saleem et al. [20] model for different values of λi(i = 1,2), which indicate that the Saleem et al. [20] estimators generally perform better than the estimators given in Table 1. Table 3 provides the MSE values under the corrected version of the Azeem et al. [6] model. Table 4 shows that the proposed estimator, under different values of πi(i = 1,2,3), attains the smallest MSE values among all the estimators considered in Tables 1–3. The ratio estimator for l = 0 performs poorly in Table 3 because of the model suggested by Azeem et al. [6]. The proposed estimator is based on the corrected model earlier used by Saleem et al. [20], which plays a major role in reducing the MSEs under different situations.
8. Conclusion
Using the generalized RRM, we proposed an enhanced general class of difference-in-exponential ratio-product type estimators for the finite population variance utilizing two auxiliary variables. We corrected the bias and MSE expressions of the Gupta et al. [10], Saleem et al. [20] and Azeem et al. [6] estimators. The usefulness of the proposed estimator is demonstrated by a simulation study that uses the RRM to estimate the finite population variance in three data sets. The proposed estimator (see Table 4) outperforms all other considered estimators (see Tables 1–3) under the different scenarios. The reduction in MSE is due to the right choice of RRM, so the proposed estimator may be usable with a wide range of real data sets because it accommodates both positive and negative correlation coefficients. The key findings are: (i) Many new estimators can be generated from the proposed generalized class of estimators. (ii) The proposed class of estimators is restricted to two auxiliary variables, but with other suitable designs it may be extended to multiple auxiliary variables. (iii) With appropriate RRMs, the efficiency of the suggested class of estimators can be improved.
Acknowledgments
Both authors are thankful to the three learned referees, whose comments helped improve the manuscript.
References
- 1. Warner S.L., Randomized Response: A Survey Technique for Eliminating Evasive Answer Bias. Journal of the American Statistical Association, 1965. 60(309): p. 63–69. pmid:12261830
- 2. Yadav S.K., Kadilar C., Shabbir J., and Gupta S., Improved Family of Estimators of Population Variance in Simple Random Sampling. Journal of Statistical Theory and Practice, 2015. 9(2): p. 219–226.
- 3. Asghar A., Sanaullah A., and Hanif M., Generalized Exponential Type Estimator for Population Variance in Survey Sampling. Revista Colombiana de Estadística, 2014. 37(1): p. 213–224.
- 4. Sanaullah A., Saleem I., and Shabbir J., Use of scrambled response for estimating mean of the sensitivity variable. Communications in Statistics—Theory and Methods, 2020. 49(11): p. 2634–2647.
- 5. Niaz I., Sanaullah A., Saleem I., and Shabbir J., An improved efficient class of estimators for the population variance. Concurrency and Computation: Practice and Experience, 2022. 34(4): p. e6620.
- 6. Azeem M., et al., An efficient estimator of population variance of a sensitive variable with a new randomized response technique. Heliyon, 2024. 10(5): p. e27488. pmid:38495208
- 7. Singh S., Sedory S.A., and Arnab R., Estimation of Finite Population Variance Using Scrambled Responses in the Presence of Auxiliary Information. Communications in Statistics—Simulation and Computation, 2015. 44(4): p. 1050–1065.
- 8. Diana G. and Perri P.F., A class of estimators for quantitative sensitive data. Statistical Papers, 2011. 52(3): p. 633–650.
- 9. Ahmad S., et al., An enhanced estimator of finite population variance using two auxiliary variables under simple random sampling. Scientific Reports, 2023. 13(1): p. 21444. pmid:38052847
- 10. Gupta S., Nouman Qureshi M., and Khalil S., Variance Estimation Using Randomized Response Technique. REVSTAT-Statistical Journal, 2020. 18(2): p. 165–176.
- 11. Aloraini B., Khalil S., Nouman Qureshi M., and Gupta S., Estimation of Population Variance for a Sensitive Variable in Stratified Sampling Using Randomized Response Technique. REVSTAT-Statistical Journal, 2022.
- 12. Azeem M., et al., A novel randomized scrambling technique for mean estimation of a finite population. Heliyon, 2024. 10(11): p. e31690. pmid:38832257
- 13. Zaman Q., Ijaz M., and Zaman T., A randomization tool for obtaining efficient estimators through focus group discussion in sensitive surveys. Communications in Statistics—Theory and Methods, 2023. 52(10): p. 3414–3428.
- 14. Lin S., Zhang J., and Qiu C., Asymptotic Analysis for One-Stage Stochastic Linear Complementarity Problems and Applications. Mathematics, 2023. 11(2): p. 482.
- 15. Xiang X., Zhou J., Deng Y., and Yang X., Identifying the generator matrix of a stationary Markov chain using partially observable data. Chaos: An Interdisciplinary Journal of Nonlinear Science, 2024. 34(2). pmid:38386908
- 16. Chen R., et al., Component uncertainty importance measure in complex multi-state system considering epistemic uncertainties. Chinese Journal of Aeronautics, 2024.
- 17. Batool A., et al., Assessing the generalization of forecasting ability of machine learning and probabilistic models for complex climate characteristics. Stochastic Environmental Research and Risk Assessment, 2024. 38(8): p. 2927–2947.
- 18. Mukhtar M.A., N and Shahzad U., An improved regression type mean estimator using redescending M-estimator. UW Journal of Science and Technology, 2023. 7(1): p. 11–18.
- 19. Gupta S. and Shabbir J., Sensitivity estimation for personal interview survey questions. Statistica, 2007. 64(4): p. 643–653.
- 20. Saleem I., et al., Efficient estimation of population variance of a sensitive variable using a new scrambling response model. Scientific Reports, 2023. 13(1): p. 19913. pmid:37963915