Use of generalized randomized response model for enhancement of finite population variance: A simulation approach

Abstract

Gupta et al. suggested an improved estimator based on the Diana and Perri model for estimating the finite population variance using a single auxiliary variable. Along the same lines, Saleem et al. proposed a new scrambled randomized response technique (RRT) model based on two auxiliary variables for estimating the finite population variance. Recently, Azeem et al. presented a new randomized response model for estimating the finite population variance. We observe that the bias and MSE expressions of these estimators, to the first order of approximation, appear to be incomplete. In this study, we rectify the bias and MSE expressions of the estimators proposed by Gupta et al., Saleem et al. and Azeem et al. Additionally, we suggest a new generalized class of estimators that is more efficient than the previously considered estimators. A simulation study is conducted to assess the behavior of the estimators. The suggested estimator performs better than the estimators considered earlier.

1. Introduction

Obtaining reliable data, particularly direct and honest responses from respondents, is a serious issue that takes considerable effort and time. Among non-sampling errors, both response bias and nonresponse bias may arise. Consequently, results may be misleading, which has a negative impact on decision-making. To get around these issues, Warner [1] developed an ingenious model known as the randomized response model (RRM). In practice, it is challenging to gather information on sensitive topics such as forced abortion, sexual assault, tax evasion, bribery, corruption, drug addiction, human trafficking and malpractice, because respondents may refuse to give honest answers on such delicate matters. The results may then be erroneous or misleading and lead to serious bias. The RRM offers a way to preserve confidentiality and protect respondents' privacy.

In our daily lives, we come across a variety of scenarios that involve unpredictability, such as regular variations in the prices of consumable goods and the significant impact of political instability on stock market crashes. A thorough treatment of variability is essential in such contexts. Many authors, including Yadav et al. [2], Asghar et al. [3], Sanaullah et al. [4], Niaz et al. [5] and Azeem et al. [6], have investigated the estimation of finite population variance using scrambled models. Singh et al. [7] were the first to estimate the finite population variance of a sensitive variable under a multiplicative scrambled response model. A class of estimators for quantitative sensitive data under a quantitative RRM was proposed by Diana and Perri [8]. Ahmad et al. [9] and Gupta et al. [10] suggested variance estimators that use auxiliary information for sensitive variables. Aloraini et al. [11] introduced several separate and combined variance estimators under stratified sampling. Azeem et al. [12] suggested a new RRM for mean estimation. Zaman et al. [13] discussed the use of focus groups under RRT to obtain efficient estimators. Other important references include [14–19].

In this paper, we correct the bias and mean squared error (MSE) expressions previously proposed by Gupta et al. [10], Saleem et al. [20] and Azeem et al. [6] under RRT using auxiliary variables. Furthermore, we propose an alternative generalized class of estimators for the population variance using a generalized RRM.

Diana and Perri [8] suggested the model Z = TY+S, where T and S are two scrambling variables such that E(S) = 0 and E(T) = 1. Consider a finite population in which yi, (x1i, x2i) and zi are the values of the study variable Y, the auxiliary variables (X1, X2) and the scrambled response Z, respectively. We draw a sample of size n from the population by simple random sampling with replacement (SRSWR). The sample variances of these variables estimate the corresponding population variances. To obtain the bias and MSE of the estimators, we define error terms for the sample quantities.

2. Existing estimators

We discuss the different estimators suggested by different authors under different models.

(i) Usual variance estimator under Diana and Perri [8] model

The finite population variance of Z, is given by (1) where and are the variances of the scrambled variables S and T respectively. From (1), we can write as .

Replacing and by their consistent estimates and respectively, Gupta et al. [10] proposed an estimator for population variance, is given by: (2)
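The moment relation behind this plug-in step can be made concrete with a short sketch. It assumes that T, S and Y are mutually independent, so that Var(Z) = Var(T)(Var(Y) + μy²) + Var(Y) + Var(S), and that the plug-in replaces Var(Z) by the sample variance of Z and μy by the sample mean of Z (using E(Z) = E(Y)). The function names are hypothetical, and the exact form of (2) in Gupta et al. [10] may differ.

```python
import statistics


def scrambled_variance(mu_y, var_y, var_t, var_s):
    """Var(Z) for Z = T*Y + S with E(T) = 1, E(S) = 0 and T, S, Y independent."""
    return var_t * (var_y + mu_y ** 2) + var_y + var_s


def recover_var_y(var_z, mu_y, var_t, var_s):
    """Invert the relation above to express Var(Y) through Var(Z)."""
    return (var_z - var_s - var_t * mu_y ** 2) / (1.0 + var_t)


def plug_in_estimate(z_sample, var_t, var_s):
    """Plug-in estimate of Var(Y) from scrambled responses only.

    Assumption: Var(Z) is replaced by the sample variance of Z and
    mu_y by the sample mean of Z (reasonable because E(Z) = E(Y)).
    """
    z_bar = statistics.fmean(z_sample)
    s2_z = statistics.variance(z_sample)  # divisor n - 1
    return recover_var_y(s2_z, z_bar, var_t, var_s)
```

For a list `z` of scrambled responses, `plug_in_estimate(z, var_t=0.25, var_s=1.0)` returns the estimate. Because it is a difference of estimated quantities, it can be negative in small samples.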

The bias and MSE of to first order of approximation, is given by (3) where as and (4)

(ii) Ratio estimator under Diana and Perri [8] model

Gupta et al. [10] defined the following ratio estimator under RRT: (5)

The bias and MSE of to first order of approximation as reported by Gupta et al. [10] are given by (6) and (7)

Note: It is observed that the bias and MSE expressions of to first order of approximation given in (6) and (7) are incorrect.

Now, we reconsider the estimator as observed in (5) by giving a name and in terms of errors, we have: (8)

Using (8), the corrected bias and corrected MSE of , are given by: (9) and (10)
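The mechanics of a first-order bias and MSE derivation for a ratio-type estimator can be sketched generically. The notation here is an assumption made for illustration and is not the paper's (8)–(10): θ is the target variance, θ̂ = θ(1 + e0) is any estimator of it, and s_x² = S_x²(1 + e1) is the auxiliary sample variance, with E(e0) = E(e1) = 0. The paper's own expansion further expresses the error in θ̂ through the scrambled-response quantities, which is not reproduced here.

```latex
\begin{align*}
t_R &= \hat\theta\,\frac{S_x^2}{s_x^2}
     = \theta\,(1+e_0)(1+e_1)^{-1} \\
    &\approx \theta\,\bigl(1 + e_0 - e_1 + e_1^2 - e_0 e_1\bigr)
      \quad\text{(terms up to second order in } e_0, e_1\text{)} \\
\operatorname{Bias}(t_R) &\approx \theta\,\bigl[E(e_1^2) - E(e_0 e_1)\bigr] \\
\operatorname{MSE}(t_R) &\approx \theta^2\,E\bigl[(e_0 - e_1)^2\bigr]
      = \theta^2\,\bigl[E(e_0^2) + E(e_1^2) - 2E(e_0 e_1)\bigr]
\end{align*}
```

The bias comes from the second-order terms of the expansion, while the first-order MSE only needs the linear part e0 − e1.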

(iii) Generalized ratio type estimator under Diana and Perri [8] model

Gupta et al. [10] suggested another ratio-type estimator for the population variance, given by (11), where g, α, β and w are suitable constants.

The bias and minimum MSE of to first order of approximation as reported by Gupta et al. [10], are given by: (12) where φ = gwψi and

and (13)

The optimum value of φ, is given by

Note: It is also observed that the bias and MSE expressions of given in (12) and (13) to first order of approximation as reported by Gupta et al. [10] are not correct. The corrected bias and MSE expressions of called it to first order of approximation, are given below.

Solving (11) in terms of errors, we have (14)

To first order of approximation, we have (15)

From (15), the corrected bias expression of to first order of approximation, is given by (16)

Squaring (15) and then taking expectation, we get the corrected MSE of to first order of approximation, is given by (17) where

Differentiating (17) with respect to (ig), we get the optimum value, given by

Substituting (ig)opt in (17), we get the minimum MSE of as given by (18)

(iv) Difference type estimator using single auxiliary variable

A difference type estimator using a single auxiliary variable, is given by: (19) where K is constant.

Solving (19) in terms of errors, we get (20)

From (20), the bias of , is given by (21)

Squaring (20) and then taking expectation, we get the MSE of as given by (22) where and .

From (22), the optimum value of K is .

Substituting the optimum value of K in (22), we get minimum MSE of as given by (23)
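The optimization step for a difference-type estimator follows a standard pattern, sketched here under assumptions: (19) is taken to have the usual form t_D = θ̂ + K(S_x² − s_x²), with θ̂ treated as unbiased for the target θ to first order and s_x² unbiased for S_x². The symbols V0, V1, C01 and ρ are introduced only for this illustration and need not match the paper's notation in (22) and (23).

```latex
\begin{align*}
t_D - \theta &= (\hat\theta - \theta) - K\,(s_x^2 - S_x^2) \\
\operatorname{MSE}(t_D) &= V_0 - 2K\,C_{01} + K^2 V_1,
  \qquad V_0 = \operatorname{Var}(\hat\theta),\;
         V_1 = \operatorname{Var}(s_x^2),\;
         C_{01} = \operatorname{Cov}(\hat\theta, s_x^2) \\
\frac{\partial\,\operatorname{MSE}(t_D)}{\partial K} &= -2C_{01} + 2K V_1 = 0
  \;\Longrightarrow\; K_{\text{opt}} = \frac{C_{01}}{V_1} \\
\operatorname{MSE}_{\min}(t_D) &= V_0 - \frac{C_{01}^2}{V_1}
  = V_0\,(1 - \rho^2),
  \qquad \rho = \frac{C_{01}}{\sqrt{V_0 V_1}}
\end{align*}
```

Under these assumptions the minimum MSE never exceeds V0, so the difference estimator is at least as efficient as θ̂ alone, with the gain growing as |ρ| increases.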

(v) Difference type estimator using two auxiliary variables

A difference type estimator using two auxiliary variables, is given by: (24) where Ji(i = 1,2,3) are constants.

Solving (24) in terms of errors, we get (25) where .

From (25), the bias of , is given by (26)

Squaring (25) and then taking expectation, we get the MSE of as given by (27) where

From (27), the optimum values of Ji(i = 1,2,3) are:

Substituting the optimum values of Ji(i = 1,2,3) in (27), we get minimum MSE of as given by (28)

3. Saleem et al. estimator

Saleem et al. [20] suggested an estimator under Diana and Perri [8] model and proposed a new model also.

(i) Saleem et al. [20] proposed estimator under Diana and Perri [8] Model

Recently, Saleem et al. [20] suggested the following improved estimator for the finite population variance: (29) where Ki (i = 1,2,3) are constants and λi (i = 1,2) are scalar quantities.

The bias and MSE expressions reported by Saleem et al. [20], are given by: (30) where and (31) where

The optimum values are:

Note: It is observed that the bias and MSE expressions suggested by Saleem et al. [20] are incorrect.

(ii) Saleem et al. [20] estimator under proposed RRT model

Saleem et al. [20] proposed the following RRT model and used it for obtaining bias and MSE. (32) where Y is the sensitive variable; g and α are the constants; T and S are two scrambled variables and are mutually uncorrelated with Y such that E(S) = 0 and E(T) = 1, and (33) where and using , we can write as: (34)

Estimating and by an unbiased estimator and , we have (35)

Note: The expressions (33)–(35) reported by Saleem et al. [20] are not correct. The correct version is given below.

Rewriting (32) as given by: (36)

Solving the above expression given in (36), we have (37) where G is defined above.

From (37), we have

as

Estimating and by their unbiased estimators, we have (38)

Note: Saleem et al. [20] used the proposed model given in (35) in their proposed estimator, but their resultant bias and MSE expressions are not correct.

4. Generalized RRM and suggested estimator

We apply the generalized RRM to the estimator suggested by Saleem et al. [20].

(i) Generalized RRM

On the lines of Saleem et al. [20], a generalized RRM, is given by: (39) where Y is the sensitive variable; h and α are the constants; T and S are two scrambled variables and are mutually uncorrelated with Y such that E(S) = 0 and E(T) = 1, and By changing the values of h and α, we can obtain the additive and subtractive type models.

Solving (39), we get (40) where and Using , we can write as : (41)

Estimating and by an unbiased estimator and , we have (42)

(ii) Use of RRM in Saleem et al. [20] estimator

On the lines of Saleem et al. [20], we propose the following estimator by using the generalized RRT model given in (42), as given by: (43) where Ki(i = 1c,2c,3c) are constants and λi(i = 1c,2c) are scalar quantities.

(44) where and Solving (44), we get (45) where and From (45), the bias of to first degree of approximation, is given by (46)

Squaring (45) and then taking expectation, we get the MSE of and is given by: (47) where

The minimum MSE of , is given by (48)

The optimum values are: where

5. Azeem et al. [6] RRT model

Azeem et al. [6] suggested the following RRT model to obtain the bias and MSE of the usual variance estimator and the ratio estimator. (49) where Y is the sensitive variable; 0≤l≤1; S is scrambled variables and are mutually uncorrelated with Y such that E(S) = 0, , E(Y) = μY, and . (50) where and using , we can write as : (51)

Estimating and by an unbiased estimator and , we have (52)

Note: It is observed that the expressions given in (50)–(52) by Azeem et al. [6] are not correct. Consequently, the bias and MSE expressions based on the estimator, and the resulting numerical results, may be misleading.

The correct expression of the Azeem et al. [6] model, is given by (53)

Using , we can write as : (54)

Estimating and by an unbiased estimator and , we have (55)

The bias and MSE of to first order of approximation, are given by (56)

and (57)

Azeem et al. [6] suggested the following ratio estimator (58) where .

The bias and MSE of are also not correct (For detail, see Azeem et al. [6])

The ratio estimator based on correct model, is given by (59)

The bias and MSE of to first order of approximation, are given by (60) where and (61)

6. Proposed estimator

The aim is to construct an estimator that performs better than all previously considered estimators. Along the lines of Gupta et al. [10], Saleem et al. [20] and Azeem et al. [6], we propose the following generalized class of difference-in-exponential ratio-product type estimators for the finite population variance: (62) where Li (i = 1,2,3) are constants whose values are to be determined and πi (i = 1,2,3) are scalar quantities.

In terms of errors, we have or (63) where .

From (63), the bias of to first order of approximation, is given by (64)

From (63), MSE of to first order of approximation, is given by (65) where

The minimum MSE of , is given by (66)

The optimum values Li(i = 1,2,3) are given by where

Note 1: Many more estimators can be generated from the generalized class of estimators described in (43), as given below:

1. Put λ1c = 0 and λ2c = 0 in (43), we get,

2. Put λ1c = 1 and λ2c = 1 in (43), we get,

3. Put λ1c = 1 and λ2c = 0 in (43), we get,

4. Put λ1c = 0 and λ2c = 1 in (43), we get,

5. Put λ1c = −1 and λ2c = −1 in (43), we get,

6. Put λ1c = −1 and λ2c = 0 in (43), we get,

7. Put λ1c = 0 and λ2c = −1 in (43), we get,

8. Put λ1c = −1 and λ2c = 1 in (43), we get,

9. Put λ1c = 1 and λ2c = −1 in (43), we get,

Note 2: Many more estimators can be generated from the generalized class of estimators given in (62), as described below:

1. Put π1 = π2 = π3 = 1 in (62), we get,

2. Put π1 = 0 and π2 = π3 = 1 in (62), we get,

3. Put π1 = −1 and π2 = π3 = 1 in (62), we get,

4. Put π1 = 0.5 and π2 = π3 = 1 in (62), we get,

5. Put π1 = 0 and π2 = π3 = 1 in (62), we get,

6. Put π1 = 0, π2 = 1 and π3 = −1 in (62), we get,

7. Put π1 = 0, π2 = −1 and π3 = 1 in (62), we get,

7. Simulation study

In this section, a simulation study is conducted to validate the efficiency of the estimators. We consider three populations, each of size N = 500, generated from three bivariate normal populations whose means and covariance matrices are given below:

Population-I

μy = 5.10938,

Population-II

μy = 5.084729,

Population-III

μy = 5.103771,

The covariance matrices describe the joint distribution of the sensitive variable Y and the auxiliary variables X1 and X2. The response variable is Z = TY+S, where the scrambling variables S and T are distributed normally with mean 0 and variance 1 respectively. The MSE values of the different estimators given in Tables 1–4 are based on the various models discussed earlier.
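A simulation of this kind can be sketched as a Monte Carlo loop. The sketch below is illustrative only and does not reproduce the paper's tables: it covers just the basic plug-in estimator without auxiliary variables, assumes T ~ N(1, var_t) and S ~ N(0, var_s) so that E(T) = 1 and E(S) = 0, and uses placeholder values (population mean 5.0, standard deviation 2.0, var_t = 0.25, var_s = 1.0) that are hypothetical. All function names are likewise hypothetical.

```python
import random
import statistics


def plug_in_estimate(z, var_t, var_s):
    """Basic plug-in estimate of Var(Y) from scrambled responses Z = T*Y + S."""
    z_bar = statistics.fmean(z)
    return (statistics.variance(z) - var_s - var_t * z_bar ** 2) / (1.0 + var_t)


def empirical_mse(population_y, n, reps, var_t, var_s, seed=0):
    """Monte Carlo MSE of the plug-in estimator under SRSWR."""
    rng = random.Random(seed)
    target = statistics.pvariance(population_y)  # population variance, divisor N
    total = 0.0
    for _ in range(reps):
        sample = rng.choices(population_y, k=n)  # draws with replacement
        # Scramble each sampled value with fresh draws of T and S.
        z = [rng.gauss(1.0, var_t ** 0.5) * y + rng.gauss(0.0, var_s ** 0.5)
             for y in sample]
        total += (plug_in_estimate(z, var_t, var_s) - target) ** 2
    return total / reps


# Hypothetical population: N = 500 normal values; mean and spread are placeholders.
pop_rng = random.Random(1)
population = [pop_rng.gauss(5.0, 2.0) for _ in range(500)]

mse_small = empirical_mse(population, n=50, reps=300, var_t=0.25, var_s=1.0)
mse_large = empirical_mse(population, n=400, reps=300, var_t=0.25, var_s=1.0)
print(mse_small, mse_large)
```

Comparing other estimators would amount to swapping `plug_in_estimate` for the estimator of interest and keeping the same sampling and scrambling loop, so that all estimators are evaluated on identical draws.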

Table 1. MSE values of Gupta et al. [10] estimator under Diana and Perri [8] model.

https://doi.org/10.1371/journal.pone.0315658.t001

Table 2. MSE values of Saleem et al. [20] estimator under generalized model using two auxiliary variables.

https://doi.org/10.1371/journal.pone.0315658.t002

Table 3. MSE values of Azeem et al. [6] estimator under a given model.

https://doi.org/10.1371/journal.pone.0315658.t003

Table 4. MSE values of proposed estimator under generalized model using two auxiliary variables.

https://doi.org/10.1371/journal.pone.0315658.t004

In Table 1, we observe that the difference estimator has the least variance among the considered estimators under Populations I, II and III. Table 2 gives the MSE values of the Saleem et al. [20] model for different values of λi (i = 1,2), which indicate that the Saleem et al. [20] estimators generally perform better than the estimators given in Table 1. Table 3 provides the MSE values of the correct version of the Azeem et al. [6] model. Table 4 shows that the proposed estimator, under different values of πi (i = 1,2,3), has the least MSE values compared with all other considered estimators given in Tables 1–3. The ratio estimator for l = 0 performs poorly in Table 3 because of the model suggested by Azeem et al. [6]. The proposed estimator is based on the corrected model that was earlier used by Saleem et al. [20], which plays a major role in reducing the MSEs under different situations.

8. Conclusion

Using the generalized RRM, we proposed an enhanced general class of difference-in-exponential ratio-product type estimators for estimating the finite population variance using two auxiliary variables. We corrected the bias and MSE expressions of the Gupta et al. [10], Saleem et al. [20] and Azeem et al. [6] estimators. The usefulness of the proposed estimator is demonstrated by a simulation study that uses the RRM to estimate the finite population variance for three data sets. The proposed estimator (see Table 4) performs better under different scenarios than all other considered estimators (see Tables 1–3). The reduction in MSE is due to the right choice of RRM, so the proposed estimator may be usable with a wide range of real data sets because of its flexibility in accommodating both positive and negative correlation coefficients. The key findings are: (i) Many new estimators can be generated from the proposed generalized class of estimators. (ii) The proposed class of estimators is restricted to two auxiliary variables, but with other suitable designs it may be extended to multiple auxiliary variables. (iii) With appropriate RRMs, the efficiency of the suggested class of estimators can be improved.

Acknowledgments

Both authors are thankful to the three learned referees, whose comments helped in improving the manuscript.

References

  1. Warner S.L., Randomized Response: A Survey Technique for Eliminating Evasive Answer Bias. Journal of the American Statistical Association, 1965. 60(309): p. 63–69. pmid:12261830
  2. Yadav S.K., Kadilar C., Shabbir J., and Gupta S., Improved Family of Estimators of Population Variance in Simple Random Sampling. Journal of Statistical Theory and Practice, 2015. 9(2): p. 219–226.
  3. Asghar A., Sanaullah A., and Hanif M., Generalized Exponential Type Estimator for Population Variance in Survey Sampling. Revista Colombiana de Estadística, 2014. 37(1): p. 213–224.
  4. Sanaullah A., Saleem I., and Shabbir J., Use of scrambled response for estimating mean of the sensitivity variable. Communications in Statistics—Theory and Methods, 2020. 49(11): p. 2634–2647.
  5. Niaz I., Sanaullah A., Saleem I., and Shabbir J., An improved efficient class of estimators for the population variance. Concurrency and Computation: Practice and Experience, 2022. 34(4): p. e6620.
  6. Azeem M., et al., An efficient estimator of population variance of a sensitive variable with a new randomized response technique. Heliyon, 2024. 10(5): p. e27488. pmid:38495208
  7. Singh S., Sedory S.A., and Arnab R., Estimation of Finite Population Variance Using Scrambled Responses in the Presence of Auxiliary Information. Communications in Statistics—Simulation and Computation, 2015. 44(4): p. 1050–1065.
  8. Diana G. and Perri P.F., A class of estimators for quantitative sensitive data. Statistical Papers, 2011. 52(3): p. 633–650.
  9. Ahmad S., et al., An enhanced estimator of finite population variance using two auxiliary variables under simple random sampling. Scientific Reports, 2023. 13(1): p. 21444. pmid:38052847
  10. Gupta S., Nouman Qureshi M., and Khalil S., Variance Estimation Using Randomized Response Technique. REVSTAT-Statistical Journal, 2020. 18(2): p. 165–176.
  11. Aloraini B., Khalil S., Nouman Qureshi M., and Gupta S., Estimation of Population Variance for a Sensitive Variable in Stratified Sampling Using Randomized Response Technique: Accepted: June 2022. REVSTAT-Statistical Journal, 2022.
  12. Azeem M., et al., A novel randomized scrambling technique for mean estimation of a finite population. Heliyon, 2024. 10(11): p. e31690. pmid:38832257
  13. Zaman Q., Ijaz M., and Zaman T., A randomization tool for obtaining efficient estimators through focus group discussion in sensitive surveys. Communications in Statistics—Theory and Methods, 2023. 52(10): p. 3414–3428.
  14. Lin S., Zhang J., and Qiu C., Asymptotic Analysis for One-Stage Stochastic Linear Complementarity Problems and Applications. Mathematics, 2023. 11(2): p. 482.
  15. Xiang X., Zhou J., Deng Y., and Yang X., Identifying the generator matrix of a stationary Markov chain using partially observable data. Chaos: An Interdisciplinary Journal of Nonlinear Science, 2024. 34(2). pmid:38386908
  16. Chen R., et al., Component uncertainty importance measure in complex multi-state system considering epistemic uncertainties. Chinese Journal of Aeronautics, 2024.
  17. Batool A., et al., Assessing the generalization of forecasting ability of machine learning and probabilistic models for complex climate characteristics. Stochastic Environmental Research and Risk Assessment, 2024. 38(8): p. 2927–2947.
  18. Mukhtar M.A., N and Shahzad U., An improved regession type mean estimator using redescending M-estimator. UW Journal of Science and Technology, 2023. 7(1): p. 11–18.
  19. Gupta S. and Shabbir J., Sensitivity estimation for personal interview survey questions. Statistica, 2007. 64(4): p. 643–653.
  20. Saleem I., et al., Efficient estimation of population variance of a sensitive variable using a new scrambling response model. Scientific Reports, 2023. 13(1): p. 19913. pmid:37963915