Additive and Subtractive Scrambling in Optional Randomized Response Modeling

This article considers unbiased estimation of the mean, variance and sensitivity level of a sensitive variable via scrambled response modeling. In particular, we focus on estimation of the mean. The idea of using additive and subtractive scrambling is applied to a recent scrambled response model. Whether for estimation of the mean, variance or sensitivity level, the proposed estimation scheme is shown to be relatively more efficient than that recent model. As far as estimation of the mean is concerned, the proposed estimators perform relatively better than the estimators based on recent additive scrambling models. Relative efficiency comparisons are also made in order to highlight the performance of the proposed estimators under the suggested scrambling technique.


Introduction
To procure reliable data on stigmatizing characteristics, Warner [1] introduced the randomized response technique, in which the respondent randomly selects one of two complementary questions on a probability basis. Greenberg et al. [2] extended Warner's [1] work to collect data on quantitative stigmatizing variables. Since then, several authors have worked on quantitative randomized response models, including Eichhorn and Hayre [3], Gupta and Shabbir [4], Gupta and Shabbir [5], Bar-Lev et al. [6], Gupta et al. [7], Hussain and Shabbir [8], Saha [9], Chaudhuri [10], Hussain and Shabbir [11] and references therein. Quantitative randomized response models are classified into fully (Eichhorn and Hayre [3]), partial (Gupta and Shabbir [5], Bar-Lev et al. [6]) and optional randomized response models, or ORRMs (Gupta et al. [4], Gupta et al. [7], Huang [12]). In a fully randomized response model, all responses are obtained as scrambled responses. In a partial randomized response model, a known proportion of respondents is asked to report their actual responses while the others report scrambled responses.
Our focus in this article is on ORRMs only. The notion of ORRM started with Gupta et al. [4]. The concept is based on the respondent's perception of the sensitivity of the variable of interest. Using an ORRM, a respondent can report the truth (or scramble his/her response) if he/she perceives the study variable as non-sensitive (sensitive). The proportion of respondents reporting the scrambled response is unknown and is termed the sensitivity level of the study variable. Gupta et al. [4] used a multiplicative ORRM and provided an unbiased estimator of the mean but a biased estimator of the sensitivity level. Moreover, the Gupta et al. [4] ORRM requires an approximation in order to derive the variances of the estimators, and it does not allow simultaneous estimation of the mean and the sensitivity level. To avoid approximation, Gupta et al. [7], Huang [12], Gupta et al. [13] and Mehta et al. [14] proposed ORRMs that provide unbiased estimators of the mean and the sensitivity level. Gupta et al. [7] and Huang [12] are one-stage ORRMs, Gupta et al. [13] is a two-stage ORRM, whereas Mehta et al. [14] is a three-stage ORRM. Gupta et al. [7], Gupta et al. [13] and Mehta et al. [14] used additive scrambling, whereas Huang [12] used a linear combination of additive and multiplicative scrambling. Further, Gupta et al. [15] observed that additive scrambling yields more precise estimators than the linear combination of additive and multiplicative scrambling used by Huang [12]. Also, Gupta et al. [16] observed that the Gupta et al. [13] two-stage ORRM requires a large value of the truth parameter (T) when the study variable is highly sensitive. Motivated by the advocacy of additive scrambling and the requirement of a larger value of the truth parameter (T), Mehta et al. [14] proposed a three-stage ORRM by introducing a forced scrambling parameter (F). Mehta et al. [14] established the better performance of their estimator of the mean but did not discuss the performance of the sensitivity estimator.
As far as estimation of the mean is concerned, the Mehta et al. [14] ORRM can be further improved by using multi-stage randomization, but this results in poor estimation of the sensitivity level.
All of the ORRMs mentioned above share the common feature of splitting the total sample into two subsamples. We base our proposals on two strategies: (i) taking two subsamples and using additive scrambling in one subsample and subtractive scrambling in the other, and (ii) drawing a single sample and collecting two responses from each respondent, one through additive and one through subtractive scrambling. Through these strategies, we aim to improve the Mehta et al. [14] ORRM for estimating the mean. As far as estimation of the mean is concerned, we show that the proposed ORRM is better than the Mehta et al. [14], Huang [12] and Gupta et al. [13] ORRMs. We show that there is no need for a large value of the parameter (T or F), whether the study variable is of low, moderate or high sensitivity. In addition, we propose an estimator of the variance of the study variable.
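Mehta et al. [14] ORRM
Assume that the interest lies in unbiased estimation of the mean $\mu_X$ and the sensitivity level $W$ of the study variable $X$. Let $D_i$ $(i = 1, 2)$ be the unrelated scrambling variable. Two independent subsamples of size $n_i$ $(i = 1, 2)$ are drawn from the population through simple random sampling with replacement such that $n_1 + n_2 = n$, the total sample size required. In the $i$th subsample, a fixed predetermined proportion $T$ of respondents is instructed to tell the truth and a fixed predetermined proportion $F$ of respondents is instructed to scramble their response additively as $X + D_i$. The remaining proportion $(1 - T - F)$ of respondents have the option to scramble their response additively if they consider the study variable sensitive; otherwise, they can report the true response $X$. Let $\mu_{D_i} = \theta_i$ be the known mean and $\sigma^2_{D_i} = \delta_i^2$ the known variance of the positive-valued random variable $D_i$ $(i = 1, 2)$, with $\theta_1 \neq \theta_2$. The optional randomized response from the $j$th respondent in the $i$th subsample is given by:

$$Z_{ij} = \begin{cases} X_j & \text{with probability } T + (1 - T - F)(1 - W), \\ X_j + D_i & \text{with probability } F + (1 - T - F)W, \end{cases} \tag{1}$$

where $i = 1, 2$ and $j = 1, 2, \ldots, n_i$. The expectation of the sample response $Z_{ij}$ from the $i$th sample is given by:

$$E(Z_{ij}) = \mu_X + \theta_i \left[F + (1 - T - F)W\right].$$

Taking $\bar{Z}_1$ and $\bar{Z}_2$ as the observed means from the two subsamples, Mehta et al. [14] proposed the following estimators of $\mu_X$ and $W$:

$$\hat{\mu}_{XM} = \frac{\theta_2 \bar{Z}_1 - \theta_1 \bar{Z}_2}{\theta_2 - \theta_1}, \tag{2}$$

$$\hat{W}_M = \frac{1}{1 - T - F}\left[\frac{\bar{Z}_1 - \bar{Z}_2}{\theta_1 - \theta_2} - F\right]. \tag{3}$$

The variances of the estimators in (2) and (3) are given by:

$$\mathrm{Var}(\hat{\mu}_{XM}) = \frac{1}{(\theta_2 - \theta_1)^2}\left[\theta_2^2 \frac{\sigma^2_{Z_1}}{n_1} + \theta_1^2 \frac{\sigma^2_{Z_2}}{n_2}\right], \tag{4}$$

$$\mathrm{Var}(\hat{W}_M) = \frac{1}{(1 - T - F)^2 (\theta_2 - \theta_1)^2}\left[\frac{\sigma^2_{Z_1}}{n_1} + \frac{\sigma^2_{Z_2}}{n_2}\right], \tag{5}$$

where

$$\sigma^2_{Z_i} = \sigma^2_X + (\delta_i^2 + \theta_i^2)\left[F + (1 - T - F)W\right] - \theta_i^2 \left[F + (1 - T - F)W\right]^2. \tag{6}$$

Gupta et al. [13] ORRM
It is interesting to note that for $F = 0$, the Mehta et al. [14] ORRM reduces to the Gupta et al. [13] ORRM. Let $Z'_{ij}$ be the optional scrambled response from the $j$th respondent in the $i$th subsample. Then, taking $F = 0$ in (1)-(5), the unbiased estimators and their variances are given by:

$$\hat{\mu}_{XG} = \frac{\theta_2 \bar{Z}'_1 - \theta_1 \bar{Z}'_2}{\theta_2 - \theta_1}, \tag{7}$$

$$\hat{W}_G = \frac{\bar{Z}'_1 - \bar{Z}'_2}{(1 - T)(\theta_1 - \theta_2)}, \tag{8}$$

$$\mathrm{Var}(\hat{\mu}_{XG}) = \frac{1}{(\theta_2 - \theta_1)^2}\left[\theta_2^2 \frac{\sigma^2_{Z'_1}}{n_1} + \theta_1^2 \frac{\sigma^2_{Z'_2}}{n_2}\right], \tag{9}$$

$$\mathrm{Var}(\hat{W}_G) = \frac{1}{(1 - T)^2 (\theta_2 - \theta_1)^2}\left[\frac{\sigma^2_{Z'_1}}{n_1} + \frac{\sigma^2_{Z'_2}}{n_2}\right], \tag{10}$$

where

$$\sigma^2_{Z'_i} = \sigma^2_X + (\delta_i^2 + \theta_i^2)(1 - T)W - \theta_i^2 (1 - T)^2 W^2.$$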
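The three-stage design described above can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the normal distribution for $X$, the gamma distribution for the positive-valued $D_i$, and all function names and parameter values are hypothetical choices made for illustration, not taken from Mehta et al. [14]. The estimators are the moment estimators obtained by solving the two expected-response equations for $\mu_X$ and $W$; setting F = 0 gives the two-stage case.

```python
import random
from statistics import fmean


def gamma_draw(rng, mean, sd):
    """Positive-valued draw with the given mean and standard deviation."""
    return rng.gammavariate((mean / sd) ** 2, sd ** 2 / mean)


def simulate_subsample(rng, n, theta, delta, T, F, W, mu_x, sigma_x):
    """Simulate n optional randomized responses from one subsample."""
    p_scramble = F + (1 - T - F) * W  # forced plus optional scramblers
    responses = []
    for _ in range(n):
        x = rng.gauss(mu_x, sigma_x)  # assumed distribution of X
        if rng.random() < p_scramble:
            x += gamma_draw(rng, theta, delta)  # additive scrambling X + D_i
        responses.append(x)
    return responses


def mehta_estimates(z1, z2, theta1, theta2, T, F):
    """Moment estimators of (mu_X, W) from the two subsample means."""
    zbar1, zbar2 = fmean(z1), fmean(z2)
    mu_hat = (theta2 * zbar1 - theta1 * zbar2) / (theta2 - theta1)
    p_hat = (zbar1 - zbar2) / (theta1 - theta2)  # estimates F + (1-T-F)W
    w_hat = (p_hat - F) / (1 - T - F)
    return mu_hat, w_hat


if __name__ == "__main__":
    rng = random.Random(1)
    z1 = simulate_subsample(rng, 2000, 2.0, 1.0, 0.2, 0.1, 0.6, 10.0, 2.0)
    z2 = simulate_subsample(rng, 2000, 6.0, 1.0, 0.2, 0.1, 0.6, 10.0, 2.0)
    print(mehta_estimates(z1, z2, 2.0, 6.0, 0.2, 0.1))
```

Because the denominator is $\theta_2 - \theta_1$, the two scrambling means should be chosen well apart; estimates become unstable as they approach each other.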

Huang [12] ORRM
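Each respondent in the $i$th subsample is provided with two randomization devices which generate two independent random variables, say $S_i$ and $D_i$, from some pre-assigned distributions. The respondent personally chooses one of the following two options: (a) report the true response $X$ (if he/she does not feel the study variable is sensitive), or (b) report the scrambled response $S_i X + D_i$ (if he/she feels the study variable is sensitive). Let $\mu_{S_i} = 1$ be the known mean and $\sigma^2_{S_i} = \gamma_i^2$ the known variance of the positive-valued random variable $S_i$. The optional randomized response $Z''_{ij}$ from the $j$th respondent in the $i$th subsample is given by:

$$Z''_{ij} = \begin{cases} X_j & \text{with probability } 1 - W, \\ S_i X_j + D_i & \text{with probability } W. \end{cases}$$

The expectation of the sample response $Z''_{ij}$ from the $i$th sample is given by:

$$E(Z''_{ij}) = (1 - W)\mu_X + W(\mu_{S_i}\mu_X + \theta_i) = \mu_X + W\theta_i,$$

since $\mu_{S_i} = 1$. Huang [12] proposed the following estimators of $\mu_X$ and $W$:

$$\hat{\mu}_{XH} = \frac{\theta_2 \bar{Z}''_1 - \theta_1 \bar{Z}''_2}{\theta_2 - \theta_1}, \tag{12}$$

$$\hat{W}_H = \frac{\bar{Z}''_1 - \bar{Z}''_2}{\theta_1 - \theta_2}, \tag{13}$$

where $\bar{Z}''_1$ and $\bar{Z}''_2$ are the observed means from the two subsamples. The variances of the estimators in (12) and (13) are given by:

$$\mathrm{Var}(\hat{\mu}_{XH}) = \frac{1}{(\theta_2 - \theta_1)^2}\left[\theta_2^2 \frac{\sigma^2_{Z''_1}}{n_1} + \theta_1^2 \frac{\sigma^2_{Z''_2}}{n_2}\right],$$

$$\mathrm{Var}(\hat{W}_H) = \frac{1}{(\theta_2 - \theta_1)^2}\left[\frac{\sigma^2_{Z''_1}}{n_1} + \frac{\sigma^2_{Z''_2}}{n_2}\right],$$

where

$$\sigma^2_{Z''_i} = \sigma^2_X + W\gamma_i^2(\sigma^2_X + \mu_X^2) + W(\delta_i^2 + \theta_i^2) - W^2\theta_i^2.$$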

Proposed Procedures
In this section, we propose split sample and double response approaches using Mehta et al. [14] ORRM.

Split sample approach
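Unlike Mehta et al. [14], in the proposed procedure we use additive scrambling in one subsample and subtractive scrambling in the other. The rest of the procedure is the same as that of Mehta et al. [14]. Let $R_{1j}$ and $R_{2j}$ be the responses from the $j$th respondent $(j = 1, 2, \ldots, n_i)$ selected in the first and second subsample, respectively. Then $R_{1j}$ and $R_{2j}$ can be written as:

$$R_{1j} = \begin{cases} X_j & \text{with probability } T + (1 - T - F)(1 - W), \\ X_j + D_1 & \text{with probability } F + (1 - T - F)W, \end{cases}$$

$$R_{2j} = \begin{cases} X_j & \text{with probability } T + (1 - T - F)(1 - W), \\ X_j - D_2 & \text{with probability } F + (1 - T - F)W. \end{cases}$$

The expected responses from the two subsamples are given by:

$$E(R_{1j}) = \mu_X + \theta_1\left[F + (1 - T - F)W\right], \tag{18}$$

$$E(R_{2j}) = \mu_X - \theta_2\left[F + (1 - T - F)W\right]. \tag{19}$$

Solving (18) and (19), we get:

$$\mu_X = \frac{\theta_2 E(R_{1j}) + \theta_1 E(R_{2j})}{\theta_1 + \theta_2}, \qquad W = \frac{1}{1 - T - F}\left[\frac{E(R_{1j}) - E(R_{2j})}{\theta_1 + \theta_2} - F\right].$$

Estimating $E(R_{1j})$ and $E(R_{2j})$ by the respective sample means $\bar{R}_1$ and $\bar{R}_2$, unbiased estimators of $\mu_X$ and $W$ are proposed as:

$$\hat{\mu}_{XZ} = \frac{\theta_2 \bar{R}_1 + \theta_1 \bar{R}_2}{\theta_1 + \theta_2}, \tag{20}$$

$$\hat{W}_Z = \frac{1}{1 - T - F}\left[\frac{\bar{R}_1 - \bar{R}_2}{\theta_1 + \theta_2} - F\right]. \tag{21}$$

Unbiasedness of $\hat{\mu}_{XZ}$ and $\hat{W}_Z$ can be easily established through (18) and (19). The variances of $\hat{\mu}_{XZ}$ and $\hat{W}_Z$ are given by:

$$\mathrm{Var}(\hat{\mu}_{XZ}) = \frac{1}{(\theta_1 + \theta_2)^2}\left[\theta_2^2 \frac{\sigma^2_{R_1}}{n_1} + \theta_1^2 \frac{\sigma^2_{R_2}}{n_2}\right],$$

$$\mathrm{Var}(\hat{W}_Z) = \frac{1}{(1 - T - F)^2 (\theta_1 + \theta_2)^2}\left[\frac{\sigma^2_{R_1}}{n_1} + \frac{\sigma^2_{R_2}}{n_2}\right],$$

where $\sigma^2_{R_i} = \sigma^2_{Z_i}$.

It is important to note that subtractive scrambling in the second subsample is the same as additive scrambling if $-D_2$ is viewed as the new scrambling variable. We anticipate two advantages from calling it subtractive scrambling. Firstly, it is easy simply to subtract a constant (randomly chosen by the respondent) from the actual response on the sensitive variable. The second advantage is psychological in nature. Perhaps due to social desirability, a typical respondent would like to report a response that is smaller in magnitude. In other words, respondents are, in general, comfortable with underreporting. Thus, subtracting a positive constant from the actual response helps satisfy the social desirability of underreporting. Of course, these two advantages are gained in the second subsample only, since $D_1$ and $D_2$ are positive-valued random variables. On average, the effect of additive scrambling in one subsample is offset by subtractive scrambling in the other. As a result, the parameters are estimated with increased precision.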
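To make the gain from subtractive scrambling concrete, the split sample estimators and the variance of the mean estimator can be sketched in code. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the scrambling decision is assumed independent of $X$ and $D_i$, and the estimator and variance expressions are the ones that follow from solving the two expected-response equations, with denominator $\theta_1 + \theta_2$ for the split sample design and $\theta_2 - \theta_1$ for the purely additive design.

```python
from statistics import fmean


def scramble_probability(T, F, W):
    """Probability that a response is scrambled: forced plus optional."""
    return F + (1 - T - F) * W


def response_variance(sigma2_x, theta, delta2, p):
    """Variance of X + I*D (or X - I*D), I ~ Bernoulli(p), all independent."""
    return sigma2_x + p * (delta2 + theta ** 2) - (p * theta) ** 2


def split_estimates(r1, r2, theta1, theta2, T, F):
    """Estimators of (mu_X, W): additive sample r1, subtractive sample r2."""
    rbar1, rbar2 = fmean(r1), fmean(r2)
    mu_hat = (theta2 * rbar1 + theta1 * rbar2) / (theta1 + theta2)
    p_hat = (rbar1 - rbar2) / (theta1 + theta2)
    w_hat = (p_hat - F) / (1 - T - F)
    return mu_hat, w_hat


def var_mean(s2_1, s2_2, n1, n2, theta1, theta2, subtractive):
    """Variance of the mean estimator for the split or all-additive design."""
    denom = (theta1 + theta2) ** 2 if subtractive else (theta2 - theta1) ** 2
    return (theta2 ** 2 * s2_1 / n1 + theta1 ** 2 * s2_2 / n2) / denom


if __name__ == "__main__":
    p = scramble_probability(T=0.2, F=0.1, W=0.6)
    s1 = response_variance(4.0, 2.0, 1.0, p)
    s2 = response_variance(4.0, 6.0, 1.0, p)
    additive = var_mean(s1, s2, 50, 50, 2.0, 6.0, subtractive=False)
    split = var_mean(s1, s2, 50, 50, 2.0, 6.0, subtractive=True)
    print(additive / split)
```

With the same scrambling distributions and subsample sizes, the two variances differ only through the denominator, so the ratio equals $((\theta_1 + \theta_2)/(\theta_2 - \theta_1))^2$, which always exceeds one for positive $\theta_1 \neq \theta_2$.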
Proof: Applying the expectation operator to $\widehat{\mathrm{Var}}(Y)$, we get: … Then, applying Theorem 2.3, we get: …

Now, we consider the estimation of the variance $\sigma^2_X$ of the sensitive variable $X$. Provided that $\delta_2^2 - \delta_1^2 \neq 0$, from (6) we can, after simple algebra, write that … We define unbiased estimators of $\sigma^2_X$ in the following theorems.

Theorem 2.5: In the case when $\delta_2^2 - \delta_1^2 \neq 0$, an unbiased estimator of $\sigma^2_X$ is given by: …

Theorem 2.6: In the case when $\delta_2^2 - \delta_1^2 = 0$, an unbiased estimator of $\sigma^2_X$ is given by: $\hat{\sigma}^2_X$ …, where $\beta$ is a known constant belonging to the interval $[0, 1]$.

Double response approach
Without incurring any additional sampling cost, the Mehta et al. [14] ORRM may also be improved by taking two responses from each respondent. We take the scrambling variables to be the same as defined in the Mehta et al. [14] ORRM. To report the first (second) response, respondents are requested to use additive (subtractive) scrambling with the variable $D_1$ ($D_2$). Let $R'_{1j}$ and $R'_{2j}$ be the two responses of the $j$th respondent. Then the two responses can be written as:

$$R'_{1j} = \begin{cases} X_j & \text{with probability } T + (1 - T - F)(1 - W), \\ X_j + D_1 & \text{with probability } F + (1 - T - F)W, \end{cases} \tag{26}$$

$$R'_{2j} = \begin{cases} X_j & \text{with probability } T + (1 - T - F)(1 - W), \\ X_j - D_2 & \text{with probability } F + (1 - T - F)W. \end{cases} \tag{27}$$

It is obvious from (26) and (27) that the true value of the sensitive variable $X_j$ cannot be worked out for respondents who feel the study variable is sensitive. The two reported responses of a particular respondent are the same if he/she feels the study variable is not sensitive; in this case, he/she reports the true value of the study variable both times. This is not a concern, since respondents who feel the study variable is not sensitive are willing to disclose their true value. Thus, it may be concluded that the privacy of respondents who feel the study variable is sensitive remains intact.

As correctly pointed out by one of the referees, there is an extra burden on the respondent if he/she has to report twice. This issue may be tackled by explaining the whole procedure to the respondent before actually obtaining the data. He/she must be assured that his/her actual value of the sensitive variable cannot be traced back from the reported responses. Further, it must be made clear to him/her that the interest of the study lies in the estimation of parameters only. Moreover, no additional sampling cost is needed to obtain two responses. Thus, obtaining two responses from a respondent should not be an issue in a particular study.
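The double response procedure can be sketched as a simulation in which every respondent supplies one additive and one subtractive report. This is a hedged illustration only: it assumes that a single scrambling decision per respondent governs both reports, that $X$ is normal and $D_1$, $D_2$ are independent gamma variables, and that the estimators take the same form as in the split sample case with both sample means based on all $n$ respondents. None of these names or distributional choices come from the source.

```python
import random
from statistics import fmean


def gamma_draw(rng, mean, sd):
    """Positive-valued draw with the given mean and standard deviation."""
    return rng.gammavariate((mean / sd) ** 2, sd ** 2 / mean)


def double_responses(rng, n, theta1, delta1, theta2, delta2,
                     T, F, W, mu_x, sigma_x):
    """Simulate the pair (additive report, subtractive report) per respondent."""
    p_scramble = F + (1 - T - F) * W
    r1, r2 = [], []
    for _ in range(n):
        x = rng.gauss(mu_x, sigma_x)  # assumed distribution of X
        if rng.random() < p_scramble:  # assumption: one decision, both reports
            r1.append(x + gamma_draw(rng, theta1, delta1))
            r2.append(x - gamma_draw(rng, theta2, delta2))
        else:  # truthful respondents report X twice
            r1.append(x)
            r2.append(x)
    return r1, r2


def double_response_estimates(r1, r2, theta1, theta2, T, F):
    """Moment estimators of (mu_X, W) from the two report means."""
    rbar1, rbar2 = fmean(r1), fmean(r2)
    mu_hat = (theta2 * rbar1 + theta1 * rbar2) / (theta1 + theta2)
    p_hat = (rbar1 - rbar2) / (theta1 + theta2)
    w_hat = (p_hat - F) / (1 - T - F)
    return mu_hat, w_hat


if __name__ == "__main__":
    rng = random.Random(2)
    r1, r2 = double_responses(rng, 2000, 2.0, 1.0, 6.0, 1.0,
                              0.2, 0.1, 0.6, 10.0, 2.0)
    print(double_response_estimates(r1, r2, 2.0, 6.0, 0.2, 0.1))
```

For a scrambling respondent the difference of the two reports is $D_1 + D_2$, which carries no information about $X_j$; this is the sense in which the true value cannot be worked out from the pair.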
The expected responses from the $j$th respondent are the same as those given by (18) and (19). … The variances of $\hat{\mu}'_{XZ}$ and $\hat{W}'_Z$ are given by: …

Optional Scrambled Response Technique
In some studies, the interest of researchers lies in estimating $\mu_X$ rather than the sensitivity level $W$ of the variable $X$, while $W$ is of major interest in other studies. Following Huang [12], we define a linear combination of $\mathrm{Var}(\hat{\mu}_{XZ})$ and $\mathrm{Var}(\hat{W}_Z)$ in order to find the optimum allocation of sample size. Thus, depending upon the interest of researchers, optimum subsample sizes can be obtained. Consider … Using the Lagrange approach to minimize $\mathrm{Var}(\hat{\mu}_{XZ}, \hat{W}_Z)$ under the restriction that $\sum_{i=1}^{2} n_i = n$, we get: … With these optimum sample sizes, the minimum value of $\mathrm{Var}(\hat{\mu}_{XZ}, \hat{W}_Z)$ is given by: … In practice, $\sigma^2_{R_i}$ is unknown and the optimum allocation of sample sizes cannot be made. Following Murthy [17], the unknown values of $\sigma^2_{R_i}$ can be estimated from pilot surveys or past experience, or simply an intelligent guess can be made about $\sigma^2_{R_i}$.
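The Lagrange step can be illustrated for a generic objective of the form $c_1 \sigma^2_{R_1}/n_1 + c_2 \sigma^2_{R_2}/n_2$ with $n_1 + n_2 = n$. The sketch below is written under that assumption: the coefficients c1 and c2 stand in for whatever weights the chosen linear combination of the two variances produces (the specific weights used by the authors are not reproduced here), and the function names are hypothetical. For such an objective the standard solution allocates $n_i$ in proportion to $\sqrt{c_i}\,\sigma_{R_i}$.

```python
from math import sqrt


def combined_variance(n1, n2, c1, c2, sigma2_r1, sigma2_r2):
    """Objective: c1 * sigma2_r1 / n1 + c2 * sigma2_r2 / n2."""
    return c1 * sigma2_r1 / n1 + c2 * sigma2_r2 / n2


def optimum_allocation(n, c1, c2, sigma2_r1, sigma2_r2):
    """Minimize the objective subject to n1 + n2 = n (real-valued sizes)."""
    w1 = sqrt(c1 * sigma2_r1)
    w2 = sqrt(c2 * sigma2_r2)
    n1 = n * w1 / (w1 + w2)  # n_i proportional to sqrt(c_i) * sigma_i
    n2 = n - n1
    minimum = (w1 + w2) ** 2 / n  # value of the objective at the optimum
    return n1, n2, minimum


if __name__ == "__main__":
    # Hypothetical inputs: weights of the mean-estimator variance alone,
    # theta1 = 2 and theta2 = 6, so c1 = 36/64 and c2 = 4/64.
    print(optimum_allocation(100, 36 / 64, 4 / 64, 5.52, 13.4784))
```

In practice the returned sizes would be rounded to integers, and the variances passed in would come from a pilot survey or prior guess, as noted in the text.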

Privacy Protection Discussion
There are many privacy measures suggested by different authors. We take $E(Z_i - X_i)^2$ as the measure of privacy; this measure was proposed by Zaizai et al. [18]. A given model is taken as more protective of privacy if $E(Z_i - X_i)^2$ is higher. For a model providing privacy protection to some extent … in the second sample. This shows that, in both subsamples, the Gupta et al. [13] ORRM is more protective than the Mehta et al. [14] ORRM. The measures of privacy for the Huang [12] ORRM are given by $\mu_X^2 + \sigma^2 \ldots$ in the first and second subsamples, respectively. The measures of privacy for the proposed estimator in the split sample approach are the same as those of the Mehta et al. [14] ORRM. In the double response approach the measure of privacy is given by $\frac{W(1 - T - F)}{4}\left(\theta_1^2 + \theta_2^2 + \delta_1^2 + \delta_2^2 - 2\theta_1\theta_2\right)$, which is equal to the measure of privacy provided by the Mehta et al. [14] ORRM if and only if $3\theta_1^2 + \delta^2 \ldots$. This shows that the proposed double response approach may be made more protective than the Mehta et al. [14] ORRM at the cost of increased variance. In fact, there is a trade-off between efficiency and privacy protection: we can have a highly efficient estimator by compromising on privacy, or we can build a more protective model by compromising on efficiency.
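For additive scrambling, the measure $E(Z - X)^2$ has a simple closed form: if a response is scrambled with probability $q$, then $E(Z - X)^2 = q\,E(D^2) = q(\theta^2 + \delta^2)$. The sketch below checks this identity by simulation. The function names, the gamma distribution for $D$, and the parameter values are assumptions for illustration; which proportion each model substitutes for $q$ is left to the caller, and the values $W(1 - T)$ and $W(1 - T - F)$ used in the example are inferred from the ordering stated in the text rather than confirmed by it.

```python
import random


def privacy_measure(q, theta, delta2):
    """E(Z - X)^2 when Z = X + D with probability q and Z = X otherwise."""
    return q * (theta ** 2 + delta2)


def privacy_measure_mc(rng, q, theta, delta, n):
    """Monte Carlo estimate of E(Z - X)^2 with gamma-distributed D."""
    total = 0.0
    for _ in range(n):
        if rng.random() < q:
            d = rng.gammavariate((theta / delta) ** 2, delta ** 2 / theta)
            total += d * d  # (Z - X)^2 = D^2 for a scrambled response
    return total / n


if __name__ == "__main__":
    T, F, W = 0.2, 0.1, 0.6
    rng = random.Random(3)
    print(privacy_measure(W * (1 - T), 2.0, 1.0))      # F = 0 case
    print(privacy_measure(W * (1 - T - F), 2.0, 1.0))  # with forced scrambling
    print(privacy_measure_mc(rng, W * (1 - T - F), 2.0, 1.0, 100000))
```

Since the measure is linear in $q$, any design change that lowers the counted scrambling proportion lowers the measure by the same factor.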

Efficiency Comparison
We compare the proposed split sample and double response approaches with the Mehta et al. [14], Huang [12] and Gupta et al. [13] ORRMs in terms of relative efficiency. … (7), (12) and (20). The variances of these estimators are obtained using 5000 iterations. The relative efficiency results (for the different scenarios given below) are given in Figures 1-4.
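A simulation-based efficiency comparison of this kind can be sketched as follows. This is not the authors' simulation code: the distributions (normal $X$, gamma $D_i$), parameter values, seed and function names are all hypothetical, and only the additive-only design and the split sample design are compared. Relative efficiency is taken here as the ratio of the empirical variance of the additive-only mean estimator to that of the split sample mean estimator, so values above one favor the split sample design.

```python
import random
from statistics import pvariance


def gamma_draw(rng, mean, sd):
    """Positive-valued draw with the given mean and standard deviation."""
    return rng.gammavariate((mean / sd) ** 2, sd ** 2 / mean)


def subsample_mean(rng, n, sign, theta, delta, p, mu_x, sigma_x):
    """Mean of n responses; sign=+1 additive, sign=-1 subtractive scrambling."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(mu_x, sigma_x)
        if rng.random() < p:
            x += sign * gamma_draw(rng, theta, delta)
        total += x
    return total / n


def relative_efficiency(iterations, n1, n2, theta1, theta2, delta1, delta2,
                        T, F, W, mu_x, sigma_x, seed=0):
    """Empirical Var(additive-only mean) / Var(split sample mean)."""
    rng = random.Random(seed)
    p = F + (1 - T - F) * W
    additive_only, split = [], []
    for _ in range(iterations):
        z1 = subsample_mean(rng, n1, +1, theta1, delta1, p, mu_x, sigma_x)
        z2 = subsample_mean(rng, n2, +1, theta2, delta2, p, mu_x, sigma_x)
        additive_only.append((theta2 * z1 - theta1 * z2) / (theta2 - theta1))
        r1 = subsample_mean(rng, n1, +1, theta1, delta1, p, mu_x, sigma_x)
        r2 = subsample_mean(rng, n2, -1, theta2, delta2, p, mu_x, sigma_x)
        split.append((theta2 * r1 + theta1 * r2) / (theta1 + theta2))
    return pvariance(additive_only) / pvariance(split)


if __name__ == "__main__":
    print(relative_efficiency(500, 50, 50, 2.0, 6.0, 1.0, 1.0,
                              0.2, 0.1, 0.6, 10.0, 2.0))
```

Raising iterations to 5000 matches the number of replications mentioned in the text, at the cost of a longer run; the empirical ratio should then settle near $((\theta_1 + \theta_2)/(\theta_2 - \theta_1))^2$ when the two designs share scrambling distributions and subsample sizes.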

Conclusion
The optional randomized response model of Mehta et al. [14] is improved for estimating the mean, variance and sensitivity level of a sensitive variable. Utilizing the idea of additive scrambling in one subsample and subtractive scrambling in the other, we have proposed unbiased estimators of the mean, variance and sensitivity level. We compared the proposed procedure with the Mehta et al. [14], Huang [12] and Gupta et al. [13] procedures. The proposed idea resulted in improved estimation of the mean of the study variable. It has been shown by Huang [12] that his procedure works better than the Gupta et al. [4] procedure. Therefore, the proposed split sample procedure is also better than the Gupta et al. [4] procedure, both in terms of relative efficiency and in providing unbiased estimators of the mean $\mu_X$, sensitivity level $W$ and variance $\sigma^2_X$ of the study variable. Like Huang [12], the proposed procedure has the advantage of estimating the variance of $Y$ with no bias. Unlike Gupta et al. [4], the proposed procedures do not require a larger value of the truth parameter $T$ when the study variable is highly sensitive. This may be considered the major advantage of the proposed procedures. It has been established that the proposed procedure for estimating the mean is more efficient than all the procedures considered in this study. Moreover, as far as estimation of the sensitivity level is concerned, we observed that the proposed estimators are less efficient (not shown in the figures) than all the estimators considered here except that of Mehta et al. [14].
As a final comment, we recommend using the proposed procedures in field surveys, without increasing the sampling cost, when estimation of the mean of the study variable is of prime interest.