A neutral comparative analysis of additive, multiplicative, and mixed quantitative randomized response models

In survey sampling, the randomized response technique is a useful tool for collecting reliable data in many fields, including sociology, education, economics, and psychology. Over the past few decades, researchers have developed many variants of quantitative randomized response models. The existing literature lacks a neutral comparative study of these models that would help practitioners choose the appropriate model for a given practical problem. In most existing studies, the authors tend to report only favorable results and omit the cases where their suggested models are inferior to existing ones. This approach often leads to biased comparisons, which may seriously misguide practitioners choosing a randomized response model for the problem at hand. This paper attempts a neutral comparison of six existing quantitative randomized response models using separate as well as joint measures of respondent-privacy and model-efficiency. The findings suggest that a model may outperform another in terms of efficiency and yet perform worse when other metrics of model quality are taken into account. The current study guides practitioners in choosing the right model for a given problem under a particular situation.


Introduction
Survey researchers often face refusals and false responses when collecting data on sensitive variables. Examples of sensitive variables include cheating in examinations, illegal income, expenditure on luxury items, the amount of tax payable, and the number of cigarettes smoked per day. To deal with non-response on questions about sensitive characteristics, Warner [1] suggested a procedure now widely known as the randomized response technique. The original technique was designed for practical situations where the researcher collects data on binary qualitative variables. Warner [2] extended the original qualitative technique to quantitative variables by introducing additive-type scrambling. Motivated by Warner [2], Eichhorn and Hayre [3] suggested a new variant of the quantitative scrambling technique that uses a multiplicative-type scrambling variable. Gupta et al. [4] devised a randomized strategy in which respondents are offered the choice of reporting either the true or a scrambled response. A respondent who opts for the scrambled response uses an additive-type scrambling procedure to report it. Later, Bar-Lev et al. [5] suggested a multiplicative version of the Gupta et al. [4] procedure. Gjestvang and Singh [6] introduced an additive-type scrambling procedure for data collection on quantitative sensitive variables. Diana and Perri [7] suggested a quantitative randomized technique that uses both additive and multiplicative scrambling noise. Al-Sobhi et al. [8] introduced a new quantitative technique using additive-subtractive scrambling noise. Gupta et al. [9] presented a measure for evaluating randomized response models that summarizes respondents' privacy and model efficiency as a single number.
Narjis and Shabbir [10] suggested an efficient variant of the Gjestvang and Singh [6] technique for data collection on quantitative sensitive variables. Khalil et al. [11] analyzed the influence of measurement errors on the mean estimator of the sensitive quantitative variable. Recently, Gupta et al. [12] introduced a new quantitative randomized technique and showed that it improves on the Diana and Perri [7] technique with regard to both the respondents' privacy and the efficiency of the model. In addition to the above studies, various aspects of randomized response techniques have been analyzed by Yan et al. [13], Kalucha et al. [14], Young et al. [15], Zhang et al. [16], Murtaza et al. [17], Zapata et al. [18], and Saleem and Sanaullah [19].
Chen et al. [20] presented the direct probability integral method for stochastic response and global dynamic analyses of structures with chaotic motion. Torkayesh et al. [21] presented a new method for minimizing air pollutants and enhancing environmental sustainability. Mondal et al. [22] analyzed the robustness of the multilayer perceptron to additive and multiplicative input noise. Silva et al. [23] recently studied a Bayesian analysis of the additive main effects and multiplicative interaction model using three prior distributions. Akgun et al. [24] studied multi-objective optimization for rating carbon-based additives in phase change materials under different evaluation criteria.
Recently, Singh et al. [25] developed two new quantitative randomized response models which were shown to be better than the existing models in terms of both efficiency and privacy protection level. In another study, Singh et al. [26] used the Poisson distribution to develop a three-stage randomized response model for estimating the mean number of persons having a sensitive attribute. The Singh et al. [26] model improved on the efficiency of the existing models.
The research studies mentioned above have presented many variants of randomized response models. Some of these models use additive scrambling variables, others use multiplicative scrambling, and still others use mixed scrambling, in which additive and multiplicative variables are combined. However, to our knowledge, no attempt has been made to conduct a detailed comparative analysis of the different versions of the existing randomized response models. The present study compares six existing randomized response models using three measures of model-quality: (i) model-efficiency, (ii) respondent-privacy, and (iii) a joint measure of model-efficiency and respondent-privacy.

Selected existing models for comparative analysis
In the current study, we have chosen six existing quantitative randomized response models for comparative analysis. Two of the six models are based on scrambled responses with no option for a true response. The next two are optional models, one additive and one multiplicative. The last two are mixed models, that is, they use both additive and multiplicative scrambling noise. Before proceeding to the comparative analysis, we introduce the notation used for the variables and their parameters, along with the distributional assumptions under which the comparative study is carried out.
Let the population under study contain N units, and let a simple random sample of n units be drawn with replacement. Let the quantitative sensitive variable under study be denoted by Y, and let the additive scrambling variable be denoted by S. We further assume that $\sigma_Y^2$ and $\sigma_S^2$ are the population variances of Y and S, respectively. Further, let $\mu_Y$ and $\theta$ be the population means of Y and S, respectively. Likewise, let T be a multiplicative-type scrambling variable such that $E(T) = 1$ and $\mathrm{Var}(T) = \sigma_T^2$. It is also assumed that the three variables Y, S, and T are mutually independent. In this section, some existing quantitative scrambling techniques are presented.

Warner's [2] model
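The additive model suggested by Warner [2] is as follows:

$$Z = Y + S, \tag{1}$$

where Z is the reported response. An unbiased estimator of the mean of Y based on Warner's [2] model is given as:

$$\hat{\mu}_W = \frac{1}{n}\sum_{i=1}^{n} Z_i - \theta. \tag{2}$$

The variance of $\hat{\mu}_W$ is given as:

$$\mathrm{Var}(\hat{\mu}_W) = \frac{\sigma_Y^2 + \sigma_S^2}{n}. \tag{3}$$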
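The additive scrambling idea can be checked numerically. The following is a minimal simulation sketch, not the authors' code: it assumes the reported response is Z = Y + S, that the estimator is the sample mean of Z minus θ, and that its variance is (σ_Y² + σ_S²)/n. The normal distributions, the parameter values, and the function names are illustrative assumptions only.

```python
import random
import statistics


def warner_estimate(y_values, theta, sigma_s, rng):
    """Scramble each true value as Z = Y + S and return mean(Z) - theta.

    S is drawn from a normal distribution with mean theta and standard
    deviation sigma_s (an illustrative assumption, not part of the model).
    """
    z = [y + rng.gauss(theta, sigma_s) for y in y_values]
    return statistics.fmean(z) - theta


def simulate_warner(n=100, reps=2000, mu_y=50.0, sigma_y=10.0,
                    theta=5.0, sigma_s=4.0, seed=1):
    """Return (mean of estimates, variance of estimates, assumed theoretical variance)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        # Hypothetical population model: Y ~ Normal(mu_y, sigma_y).
        sample = [rng.gauss(mu_y, sigma_y) for _ in range(n)]
        estimates.append(warner_estimate(sample, theta, sigma_s, rng))
    theoretical_var = (sigma_y ** 2 + sigma_s ** 2) / n
    return (statistics.fmean(estimates),
            statistics.variance(estimates),
            theoretical_var)


if __name__ == "__main__":
    mean_est, emp_var, theo_var = simulate_warner()
    print(f"mean of estimates: {mean_est:.3f}")
    print(f"empirical variance: {emp_var:.3f}, assumed formula: {theo_var:.3f}")
```

Under these assumptions, the average of the estimates should sit close to the chosen μ_Y, and the empirical variance should be close to the value given by the formula.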

Eichhorn and Hayre [3] model
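The responses reported by the respondents under the Eichhorn and Hayre [3] model are as follows:

$$Z = TY. \tag{4}$$

An unbiased estimator of $\mu_Y$ under the Eichhorn and Hayre [3] technique is:

$$\hat{\mu}_{EH} = \frac{1}{n}\sum_{i=1}^{n} Z_i. \tag{5}$$

The variance of $\hat{\mu}_{EH}$ is given as:

$$\mathrm{Var}(\hat{\mu}_{EH}) = \frac{1}{n}\left[\sigma_Y^2 + \sigma_T^2\left(\sigma_Y^2 + \mu_Y^2\right)\right]. \tag{6}$$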
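The multiplicative case can be sketched in the same way. This is an illustration under stated assumptions rather than the authors' implementation: the reported response is taken as Z = TY with E(T) = 1, the estimator is the sample mean of Z, and the variance is [σ_Y² + σ_T²(σ_Y² + μ_Y²)]/n. The normal distributions for Y and T, the parameter values, and the function names are hypothetical choices made only for the simulation.

```python
import random
import statistics


def eichhorn_hayre_estimate(y_values, sigma_t, rng):
    """Scramble each true value as Z = T * Y with E(T) = 1 and return mean(Z)."""
    # T ~ Normal(1, sigma_t) is an illustrative assumption; any distribution
    # with mean 1 and variance sigma_t**2 would serve for this moment check.
    z = [y * rng.gauss(1.0, sigma_t) for y in y_values]
    return statistics.fmean(z)


def simulate_eichhorn_hayre(n=100, reps=2000, mu_y=50.0, sigma_y=10.0,
                            sigma_t=0.2, seed=7):
    """Return (mean of estimates, variance of estimates, assumed theoretical variance)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        # Hypothetical population model: Y ~ Normal(mu_y, sigma_y).
        sample = [rng.gauss(mu_y, sigma_y) for _ in range(n)]
        estimates.append(eichhorn_hayre_estimate(sample, sigma_t, rng))
    theoretical_var = (sigma_y ** 2
                       + sigma_t ** 2 * (sigma_y ** 2 + mu_y ** 2)) / n
    return (statistics.fmean(estimates),
            statistics.variance(estimates),
            theoretical_var)


if __name__ == "__main__":
    mean_est, emp_var, theo_var = simulate_eichhorn_hayre()
    print(f"mean of estimates: {mean_est:.3f}")
    print(f"empirical variance: {emp_var:.3f}, assumed formula: {theo_var:.3f}")
```

Because the multiplicative noise scales with Y, the variance depends on μ_Y as well as σ_Y, which is why the same noise level can be cheap for small-valued variables and costly for large-valued ones.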

Gupta et al. [4] model
The reported responses under the Gupta et al. [4] model are given as:

PLOS ONE
The mean estimator under the Gupta et al. [4] technique is given by: where Z is defined in Eq (7). The variance of the mean estimator is as follows:

Bar-Lev et al. [5] model
The responses reported by the respondents under the Bar-Lev et al. [5] technique are as follows: The mean estimator under the Bar-Lev et al. [5] technique is given by: where Z is defined in Eq (10). The variance of the mean estimator is as follows:

Murtaza et al. [17] model
The reported responses under the Murtaza et al. [17] model are given as: where α is a constant. An unbiased mean estimator of the sensitive variable based on the Murtaza et al. [17] model is given as: The Murtaza et al. [17] model is based on correlated scrambling variables. To make the comparison feasible, the assumption of uncorrelated variables is used, as the other models selected for comparison also use uncorrelated scrambling variables. The variance of $\hat{\mu}_M$ is given by:

Gupta et al. [12] model
Gupta et al. [12] introduced the following optional scrambling model: where A is a constant, 0 < A < 1. An unbiased mean estimator on the basis of the Gupta et al. [12] model is given by: The sampling variance of $\hat{\mu}_G$ is as follows:

Privacy and efficiency metrics
The Yan et al. [13] measure for quantifying the respondent-privacy is as follows: A higher value of r translates into a better level of respondents' privacy provided by a given quantitative randomized response model.
The Gupta et al. [9] joint measure of efficiency and privacy-protection is as follows: From Eq (20), one may clearly observe that lower values of δ are desirable.
For Warner's [2] model, the measure of privacy can be expressed as: The joint measure of privacy and efficiency for Warner's [2] model is given as: For the Eichhorn and Hayre [3] quantitative technique, the measure of privacy can be obtained as: or, The joint measure of model-efficiency and respondent-privacy for the Eichhorn and Hayre [3] quantitative technique is given as: The privacy level offered by the Gupta et al. [4] model is: The joint measure of model-efficiency and respondent-privacy for the Gupta et al. [4] quantitative technique is given as: The measure of privacy for the Bar-Lev et al. [5] technique is: The joint measure of model-efficiency and respondent-privacy for the Bar-Lev et al. [5] quantitative technique is given as: The measure of privacy for the Murtaza et al. [17] model is given as: The joint measure of privacy and efficiency for the Murtaza et al. [17] model is given as: The privacy level offered by the Gupta et al. [12] model is given by: The joint measure of privacy and efficiency for the Gupta et al. [12] model is given as:
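As a worked illustration for the two non-optional models, the following sketch assumes that the privacy measure is the mean squared difference between the reported and the true response, r = E(Z − Y)², and that the joint measure is the ratio of the estimator's variance to r. Both definitions, as well as the forms Z = Y + S and Z = TY, are assumptions made for this sketch and should be checked against the cited sources.

```latex
% Assumed definitions (not taken verbatim from the paper):
%   r = E(Z - Y)^2,   \delta = Var(\hat{\mu}) / r.

% Additive model, Z = Y + S, with E(S) = \theta and Var(S) = \sigma_S^2:
r_W = E(S^2) = \sigma_S^2 + \theta^2,
\qquad
\delta_W = \frac{\sigma_Y^2 + \sigma_S^2}{n\left(\sigma_S^2 + \theta^2\right)}.

% Multiplicative model, Z = TY, with E(T) = 1, Var(T) = \sigma_T^2,
% and T independent of Y:
r_{EH} = E\!\left[(T - 1)^2\right] E\!\left(Y^2\right)
       = \sigma_T^2\left(\sigma_Y^2 + \mu_Y^2\right),
\qquad
\delta_{EH} = \frac{\sigma_Y^2 + \sigma_T^2\left(\sigma_Y^2 + \mu_Y^2\right)}
                   {n\,\sigma_T^2\left(\sigma_Y^2 + \mu_Y^2\right)}.
```

Under these assumptions, increasing the scrambling variance raises both the privacy level and the estimator's variance, which is why a ratio is needed to weigh one against the other.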

Efficiency conditions
In this section, we derive the mathematical conditions under which one model is more efficient than another.

Warner's [2] model is more precise than the Eichhorn and Hayre [3] model if:
Condition (33) may not always hold.
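A short derivation shows how such a condition arises. It is a sketch that assumes the standard variance expressions for the additive estimator (Z = Y + S) and the multiplicative estimator (Z = TY with E(T) = 1); the numerical values at the end are hypothetical and serve only as an example.

```latex
% Assumed variances of the two estimators:
\mathrm{Var}(\hat{\mu}_W) = \frac{\sigma_Y^2 + \sigma_S^2}{n},
\qquad
\mathrm{Var}(\hat{\mu}_{EH}) = \frac{\sigma_Y^2 + \sigma_T^2\left(\sigma_Y^2 + \mu_Y^2\right)}{n}.

% The additive model is more precise when its variance is smaller:
\mathrm{Var}(\hat{\mu}_W) < \mathrm{Var}(\hat{\mu}_{EH})
\iff
\sigma_S^2 < \sigma_T^2\left(\sigma_Y^2 + \mu_Y^2\right).

% Hypothetical example: \mu_Y = 50, \sigma_Y = 10, \sigma_T = 0.2 gives
% \sigma_T^2(\sigma_Y^2 + \mu_Y^2) = 0.04 \times 2600 = 104,
% so the additive model is more precise only if \sigma_S^2 < 104.
```

The inequality depends on μ_Y, so the ranking of the two models can reverse from one population to another even when the scrambling distributions are held fixed.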

Gupta et al. [4] model vs. Bar-Lev et al. [5] model
The Gupta et al. [4] model will be more efficient than the Bar-Lev et al. [5] model if: Condition (34) is the same as Condition (33). This is because the Gupta et al. [4] procedure is the optional variant of Warner's [2] model, and the Bar-Lev et al. [5] technique is the optional variant of the Eichhorn and Hayre [3] model.

Gupta et al. [12] model vs. Murtaza et al. [17] model
The Gupta et al. [12] model will be more efficient than the Murtaza et al. [17] model if: Condition (35) may not always hold. Tables 2 and 3 show the values of r and δ, respectively, for different models.

Discussion and conclusion
The current study is based on a neutral comparison of six existing quantitative randomized response models: (i) the Warner [2] model, (ii) the Eichhorn and Hayre [3] model, (iii) the Gupta et al. [4] model, (iv) the Bar-Lev et al. [5] model, (v) the Murtaza et al. [17] model, and (vi) the Gupta et al. [12] model. We presented the comparative analysis in a neutral manner; that is, our analysis does not favor one model over another but evaluates the strengths and weaknesses of the models chosen for comparison. Table 1 shows that the Gupta et al. [4] model is the most efficient model, whereas the oldest model, Warner's [2], is the second-best in terms of efficiency. Moreover, the recently developed model of Murtaza et al. [17] is less precise than the much older model of Bar-Lev et al. [5]. It is also interesting to observe that the newest of the six selected models, the Gupta et al. [12] model, is less efficient than the oldest, Warner's [2] model, for the selected values of the parameters of the scrambling variables. Further, the Murtaza et al. [17] model is also less efficient than the 50-year-old model of Warner [2]. Model-efficiency is not the only criterion for assessing the quality of a quantitative randomized response model. The privacy protection offered to respondents is equally important in judging the quality of a randomized response model. The respondent-privacy level can be measured by the value of r, with a higher value indicating better privacy protection. Table 2 displays the values of r for different values of the parameters of the scrambling variables. Table 2 indicates that the old Eichhorn and Hayre [3] model has the highest values of r, indicating the best level of privacy protection. It is also observed from Table 2 that the Gupta et al. [4] optional model has the smallest values of r, making it the worst of the six models in terms of privacy. Finally, as far as the overall quality of the six selected models is concerned, the δ values are displayed in Table 3. One may clearly observe that the Gupta et al. [12] model has the smallest δ values, making it the best among all six models on the joint measure.
We can conclude that a model which performs better on one measure of model-quality may perform worse on another. In practice, the researcher may prefer model-efficiency over respondent-privacy, or vice versa, depending on the requirements of the survey. If model-efficiency alone matters, the most efficient model is the natural choice; for the parameter values considered here, that is the Gupta et al. [4] model. Likewise, if respondent-privacy takes priority over efficiency, a model with a higher privacy level, such as the Eichhorn and Hayre [3] model, may be more appropriate. Since the joint measure, δ, assigns equal weights to efficiency and privacy, it alone may not guide the researcher in choosing a particular randomized response model, as efficiency and privacy may not be equally important in practice. Thus, researchers are advised to keep the particular situation at hand in mind when choosing a randomized response model for data collection on sensitive variables.

Future research
This paper compares six existing quantitative randomized response models in a neutral manner. There is also a need for a comparative study of qualitative randomized response models. Such a study would help researchers choose the appropriate model in situations where the variable of interest is qualitative, such as gender, marital status, or socio-economic class.