Physical and cognitive doping in university students using the unrelated question model (UQM): Assessing the influence of the probability of receiving the sensitive question on prevalence estimation

Study objectives In order to increase the value of randomized response techniques (RRTs) as tools for studying sensitive issues, the present study investigated whether the prevalence estimate for a sensitive item π̂_s assessed with the unrelated question method (UQM) is influenced by changing the probability of receiving the sensitive question p.
Material and methods A short paper-and-pencil questionnaire was distributed to 1,243 university students assessing the 12-month prevalence of physical and cognitive doping using two versions of the UQM with different probabilities of receiving the sensitive question (p ≈ 1/3 and p ≈ 2/3). Likelihood ratio tests were used to assess whether the prevalence estimates for physical and cognitive doping differed significantly between p ≈ 1/3 and p ≈ 2/3. The order of questions (physical doping and cognitive doping) as well as the probability of receiving the sensitive question (p ≈ 1/3 or p ≈ 2/3) were counterbalanced across participants. Statistical power analyses were performed to determine sample size.
Results The prevalence estimate for physical doping with p ≈ 1/3 was 22.5% (95% CI: 10.8–34.1), and 12.8% (95% CI: 7.6–18.0) with p ≈ 2/3. For cognitive doping with p ≈ 1/3, the estimated prevalence was 22.5% (95% CI: 11.0–34.1), whereas it was 18.0% (95% CI: 12.5–23.5) with p ≈ 2/3. Likelihood ratio tests revealed that the prevalence estimates for physical and cognitive doping did not differ significantly between p ≈ 1/3 and p ≈ 2/3 (physical doping: χ² = 2.25, df = 1, p = 0.13; cognitive doping: χ² = 0.49, df = 1, p = 0.48). Bayes factors computed with the Savage-Dickey method favored the null hypothesis (“the prevalence estimates are identical under p ≈ 1/3 and p ≈ 2/3”) over the alternative hypothesis (“the prevalence estimates differ under p ≈ 1/3 and p ≈ 2/3”) for both physical doping (BF = 2.3) and cognitive doping (BF = 5.3).
Conclusion The present results suggest that prevalence estimates for physical and cognitive doping assessed by the UQM are largely unaffected by the probability of receiving the sensitive question p.


Socially sensitive research
Whenever studies aim to assess the prevalence of socially sensitive issues, it is a challenge for researchers to measure these rates validly [1,2]. This challenge starts with giving a precise definition of the term "sensitive" research. According to Sieber & Stanley [3], socially sensitive research is defined as "studies in which there are potential consequences or implications, either directly for the participants in the research or for the class of individuals represented by the research". Lee & Renzetti [1] and Lee [4] described it as research which potentially poses a substantial threat for those who are or have been involved in it. As a consequence, when sensitive topics are studied, participants often react in a way that negatively affects the validity of study results (underreporting and non-responding) because they hesitate to provide compromising information about themselves [5,6]. Examples of socially sensitive topics are domestic violence [7], political activism [8], homicide and rape [9], mental health [10], death, murder and abortion [11,12], traumatic childbirth [13], and sexual health [14], according to Fahie [15]. Other examples that demonstrate how common sensitive issues are even in everyday life are medical adherence [16], attitudes towards foreigners [17], or cooperation in social interactions [18]. Another topic which has been considered to be socially sensitive is the use of prohibited substances for enhancing physical performance in athletes (doping) [19][20][21] and the use of illicit and prescription drugs for enhancing cognitive performance (pharmacological neuroenhancement) in students, academics, and workers [22][23][24].

Randomized response designs
Warner [25,26] stated that participants may be more willing to reveal sensitive information if their anonymity is guaranteed and introduced the first randomized response design, the so-called Warner's original method. Randomized response designs (also called randomized response techniques; RRTs) were developed specifically to obtain more valid estimates when sensitive topics are studied by guaranteeing a maximum amount of anonymity [27,28]. Besides Warner's original method, several further RRTs have been developed such as the unrelated question method (UQM) [29], the forced response method [30,31], the item count technique [32], the crosswise method [33], the cheater detection model (CDM) [34], and the stochastic lie detector [35].
In their meta-analysis of 38 randomized response validation studies, Lensvelt-Mulders et al. [28] concluded that RRTs yield more valid results when assessing socially sensitive items compared to more conventional survey techniques such as face-to-face interviewing, self-administered questionnaires, and telephone interviewing (see also Moshagen et al. [36] for a recent discussion on the validation of questioning techniques assessing sensitive issues). Furthermore, they stated that although RRT results are more valid compared to results of conventional survey techniques, there is still room for improvement, and the more efficient RRTs become, the greater their value will be as a tool to study sensitive topics. For example, a recent study by Hoffmann et al. [37] indicated that although the mean perceived privacy protection of four chosen RRT variants was higher compared to direct questioning, the mean perceived privacy protection varied between these four variants.

The unrelated question method (UQM)
Within the present article, we focus on the UQM [22][23][24][29]. Using the UQM, each participant is guided with the aid of a randomizer (dice, coin, or deck of cards) to one of two questions, which should be answered honestly: a neutral (non-sensitive) question (A) or a sensitive question (assessing the sensitive item; B). The probability of receiving the sensitive question (based on the randomization) is denoted as p and the probability of receiving the neutral question as 1 - p (Fig 1). For example, in a former study [20,21] participants were asked to draw a card of their choice from a deck of 20 cards. 15 cards contained the sensitive question and 5 cards the neutral question. Thus, the probability of receiving the sensitive question p was 3/4 and the probability of receiving the neutral question 1 - p was 1/4. The neutral question has to be chosen such that the probability of it being answered with "yes" in the sample is known. This probability of answering the neutral question with "yes" is denoted as π_n and is known by the interviewers. Only the participant knows the outcome of the randomization process. Thus, the specific question the participant answers with "yes" or "no" is hidden from the experimenter, thereby providing both objective and subjective anonymity. However, based on the proportion of total "yes" responses of a sample (denoted as a) and given that the probabilities π_n and p are known by the researchers, a prevalence estimate for the sensitive question, $\hat{\pi}_s$, can be calculated using the formula $\hat{\pi}_s = \frac{a - (1 - p)\,\pi_n}{p}$, and a 95% confidence interval (CI) for the unknown prevalence can be computed on the basis of the sampling variance, where n denotes the sample size: $\mathrm{Var}(\hat{\pi}_s) = \frac{a\,(1 - a)}{n\,p^2}$. According to Lensvelt-Mulders et al. [28], the UQM has been rated to be one of the most efficient designs for measuring socially sensitive issues compared to other RRTs.
In order to follow their recommendation to increase the efficacy of RRTs, Dietz et al. [22] developed a paper-and-pencil version of the UQM that enabled the researchers to randomly guide the participants to either answer the neutral or the sensitive question without using a randomization device such as dice, coin or deck of cards. This improvement enabled researchers to assess higher case numbers in less time. In addition, Schröter et al. [38] used the UQM and a CDM to assess the prevalences for physical and cognitive doping in triathletes and observed no meaningful differences for the prevalences estimated with CDM and UQM.
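The estimator and its sampling variance given earlier in this section can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function name uqm_estimate, the normal-approximation interval with z = 1.96, and the example counts are assumptions made here for demonstration only.

```python
import math

def uqm_estimate(yes_count, n, p, pi_n, z=1.96):
    """UQM prevalence estimate with a normal-approximation confidence interval.

    yes_count: number of "yes" responses in the sample (hypothetical input)
    n:         sample size
    p:         probability of receiving the sensitive question
    pi_n:      known probability of a "yes" to the neutral question
    """
    a = yes_count / n                            # observed proportion of "yes"
    pi_s = (a - (1 - p) * pi_n) / p              # prevalence estimate
    se = math.sqrt(a * (1 - a) / (n * p ** 2))   # square root of Var(pi_s)
    return pi_s, se, (pi_s - z * se, pi_s + z * se)

# Hypothetical data: 240 "yes" answers from 600 respondents, p = 2/3, pi_n = 0.5
estimate, se, ci = uqm_estimate(240, 600, 2 / 3, 0.5)
print(f"estimate = {estimate:.3f}, SE = {se:.3f}, CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Because the variance is divided by p squared, the same number of respondents yields a wider interval when p is small.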

Aim of the present study
In order to increase the value of RRTs as a tool for studying sensitive topics, as suggested by Lensvelt-Mulders et al. [28], the present study investigated whether the prevalence estimate π̂_s of the UQM is influenced by changing the probability of receiving the sensitive question p. In other words, does the UQM generate different prevalence estimates for a sensitive issue when different values of p are used?
For example, a major concern is that some participants may hesitate to respond truthfully with "yes" to the sensitive question because an affirmative response leaves open the possibility that the respondent has the stigmatizing attribute; in order to avoid this impression, a respondent may provide a dishonest "no" response instead. Specifically, according to Bayes's rule, participants should become increasingly reluctant to provide a "yes" response to the sensitive question when p increases toward one [39] and will therefore cheat more. This is because the conditional probability P(A|"Yes") of having the stigmatizing attribute A given a "Yes" response increases with probability p (see S1 Appendix). Accordingly, we expect smaller prevalence estimates for p ≈ 2/3 than for p ≈ 1/3. Note that small values of p produce relatively large standard errors of the prevalence estimate unless especially large sample sizes are used. Therefore, p ≈ 2/3 is more feasible than p ≈ 1/3. For this reason, we assessed the prevalence estimates of physical and cognitive doping in a sample of university students using two UQM versions with different probabilities of receiving the sensitive question (p ≈ 1/3 and p ≈ 2/3). According to Dietz et al. [19], the term physical doping describes "the intake of illicit or banned substances to improve physical performance" (e.g. anabolic androgenic steroids, human growth hormones, erythropoietin) and the term cognitive doping includes "illicit drugs (e.g. cocaine) and prescription drugs (...) such as stimulants (e.g. methylphenidate and amphetamines), antidepressants, beta-blockers, or modafinil" with the aim to improve cognitive functions such as memory, attention, learning performance, or mood. Both types of substances have been reported by previous studies to be frequently used by students [40][41][42][43][44][45], and therefore the population of university students seems to be highly suitable for testing our hypothesis.
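The dependence of P(A|"Yes") on p can be illustrated numerically. The sketch below applies Bayes's rule under two assumptions that are made here and are not taken from S1 Appendix: all respondents answer honestly, and the answer to the neutral question is independent of the attribute A. The function name and the values π_s = 0.15 and π_n = 0.5 are illustrative.

```python
def prob_attribute_given_yes(p, pi_s, pi_n):
    """P(A | "Yes") under honest responding (assumption of this sketch).

    A carrier of the attribute says "yes" if routed to the sensitive
    question (probability p) or if routed to the neutral question and
    the neutral answer is "yes" (probability (1 - p) * pi_n).
    """
    p_yes_given_a = p + (1 - p) * pi_n
    p_yes = p * pi_s + (1 - p) * pi_n   # overall probability of "yes"
    return pi_s * p_yes_given_a / p_yes

# Illustrative values, not study results
for p in (1 / 3, 2 / 3, 1.0):
    print(f"p = {p:.2f}: P(A|Yes) = {prob_attribute_given_yes(p, 0.15, 0.5):.3f}")
```

With these values the conditional probability rises as p grows and reaches one when every respondent receives the sensitive question, which is the intuition behind the expected reluctance at high p.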

Survey procedure
On the basis of a previously performed survey concerning performance-enhancing substances in university students using the RRT [22], a short paper-and-pencil questionnaire was distributed among students of the University of Mainz, Germany, at the beginning of classes. In order to recruit an approximately representative sample of students based on age, sex, and field of study, all major classes of the different disciplines were identified using the online study administration platform of the university. Two weeks before the survey was performed (the survey was performed in the first third of the semester in order to reach a high number of students), all teachers/lecturers were informed about the survey by email. Once the questionnaire had been distributed by our research team at the beginning of the classes, a briefed assistant introduced the students to the survey procedure and stressed the anonymity of the RRT. Students were told to fill in the questionnaire immediately at the beginning of the class and to drop it into black boxes by the classroom doors at the end of the class.

Questionnaire
At the beginning of the single-page paper-and-pencil questionnaire, a short introduction was given explaining the aim of the study and guaranteeing the anonymity of the survey. Afterwards, the terms "physical doping" and "cognitive doping" (in German, the term "Doping" is commonly used for physical doping, whereas the term "Hirndoping", which literally translates to brain doping, is commonly used for cognitive doping) were described to the participants: "Substances for physical and cognitive enhancement (doping and brain doping) are pharmaceuticals or illicit drugs, which you cannot buy in a drug store and that were not prescribed to you to treat a disease. The only reason why you use this substance is to reach a certain goal.
Cognitive doping: The goal is to improve cognitive performance such as attention, alertness, and mood. Examples: stimulants (e.g. amphetamines), caffeine tablets, cocaine, Ritalin® (methylphenidate), mephedrone (coffee and tea do not count among these substances)."
After the introduction and the description of the terminology, two RRT questions followed using the UQM: one question to assess the 12-month prevalence of physical doping, and one to assess the 12-month prevalence of cognitive doping. For this study, we adapted the questions used in previous studies using the UQM [19,22,23,38,46] but used two different probabilities of receiving the sensitive question. One half of the questions used a probability of receiving the sensitive question of p ≈ 1/3 and the other half p ≈ 2/3. For example, to assess the prevalence of cognitive doping using p ≈ 2/3, the following text appeared in the questionnaire:
__________________________________________________________________________
Please consider a certain birthday (yours, your mother's, etc.). Is this birthday in the first third of a month (1st to 10th day)? If "Yes", please proceed to question A; if "No", please proceed to question B.
Question A: Is this birthday in the first half of the year (prior to the 1st of July)?
Question B: Did you use brain-doping substances during the last 12 months?
Your answer to question A or question B is (note that only you know which of the two questions you will answer): YES NO
__________________________________________________________________________
Thus, 32.9% (120 of 365.25) of the students received the non-sensitive question A, whereas 67.1% ≈ 2/3 (245.25 of 365.25) received the sensitive question B.
To assess, for example, the prevalence of physical doping using p ≈ 1/3, the following text appeared in the questionnaire:
__________________________________________________________________________
Please consider a certain birthday (yours, your mother's, etc.). Is this birthday in the first two thirds of a month (1st to 20th day)? If "Yes", please proceed to question A; if "No", please proceed to question B.
For calculating the prevalence estimates and confidence intervals, the exact (non-rounded) probabilities for p and π_n were used.
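The exact values of p implied by the two birthday randomizers can be reproduced as follows. The sketch assumes, as the day counts quoted in the questionnaire description do, that birthdays are spread evenly over a year of 365.25 days; the function name p_sensitive is hypothetical, and the exact value of π_n is not computed because it is not stated in the text.

```python
DAYS_PER_YEAR = 365.25  # average year length, as used in the text

def p_sensitive(neutral_days_per_month):
    """Probability of being routed to the sensitive question B.

    neutral_days_per_month: days of each month that route the respondent
    to the neutral question A (10 for the p = 2/3 version, 20 for the
    p = 1/3 version).
    """
    neutral_days = 12 * neutral_days_per_month
    return (DAYS_PER_YEAR - neutral_days) / DAYS_PER_YEAR

print(f"1st to 10th day -> A: p = {p_sensitive(10):.4f}")
print(f"1st to 20th day -> A: p = {p_sensitive(20):.4f}")
```

The first value corresponds to the 245.25 of 365.25 days reported for the p ≈ 2/3 version; the second follows from the same reasoning with 240 days routed to question A.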
In order to avoid a possible influence of the question order on the results, the order of questions (physical doping and cognitive doping) as well as the probabilities of receiving the sensitive question (p ≈ 1/3 or p ≈ 2/3) were counterbalanced across participants. As a consequence, eight different versions of the questionnaire were created (Table 1). At the end of the questionnaire, participants were asked about four characteristics: gender (female/male), age (metric), number of semesters (metric), and field of study (nominal).

Statistics
Statistical power analyses [47] were performed to determine the necessary sample size for detecting an overall prevalence of 15% for physical and cognitive doping, respectively, which, according to the literature [19,22,40], appears to be a conservative lower value of the prevalences. The null hypothesis of this power analysis assumes that π_s = 0. Under this assumption, for the questions using p ≈ 1/3, a sample size of 550 provides a power of approximately 0.8 for rejecting the null hypothesis, and for the questions using p ≈ 2/3, a sample size of 150 provides a power of approximately 0.9 for rejecting the null hypothesis. This difference arises because the higher p is, the more people of a given sample receive the sensitive question. Since approximately every second question contained p ≈ 1/3 (Table 1), a total sample size of 1,100 valid questionnaires was needed.
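The order of magnitude of these sample sizes can be checked with a simple normal approximation. The sketch below is not the procedure of [47]; it assumes a one-sided Wald test of π_s = 0 at α = 0.05, evaluates the standard error at the alternative π_s = 0.15, and uses π_n = 0.5 together with the rounded values p = 1/3 and p = 2/3. All of these choices, and the function name, are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def uqm_power_vs_zero(pi_s, n, p, pi_n, alpha=0.05):
    """Approximate power of a one-sided Wald test of H0: pi_s = 0.

    The standard error is evaluated at the assumed true prevalence pi_s
    (a simplification chosen for this sketch).
    """
    a = p * pi_s + (1 - p) * pi_n                # expected proportion of "yes"
    se = math.sqrt(a * (1 - a) / (n * p ** 2))   # SE of the UQM estimate
    z_crit = NormalDist().inv_cdf(1 - alpha)     # one-sided critical value
    return NormalDist().cdf(pi_s / se - z_crit)

print(f"p = 1/3, n = 550: power = {uqm_power_vs_zero(0.15, 550, 1 / 3, 0.5):.2f}")
print(f"p = 2/3, n = 150: power = {uqm_power_vs_zero(0.15, 150, 2 / 3, 0.5):.2f}")
```

Under these assumptions the approximation lands close to the reported values of roughly 0.8 and 0.9, and it makes visible why a smaller p requires a much larger sample.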
A similar approach to that introduced in [47] was used to conduct a power analysis for detecting a significant difference between the prevalence estimates under the two probability conditions. Specifically, based on former studies with p ≈ 2/3, we assumed a prevalence estimate of 0.15 for this probability condition. For p ≈ 1/3, however, one might assume that participants feel more protected than with p ≈ 2/3. Thus, one may assume a prevalence estimate of 0.20, 0.25, or 0.30 under p ≈ 1/3. If n = 600 participants are tested under each probability condition, one computes a statistical power of 0.33, 0.78, or 0.98, respectively, for detecting a significant difference between the prevalence estimates in the two probability conditions.
Descriptive data are presented as mean ± SD values and were calculated using SPSS software, version 22. Prevalence estimates π̂_s for physical doping (separately for p ≈ 1/3 and p ≈ 2/3) and cognitive doping (separately for p ≈ 1/3 and p ≈ 2/3) are presented as percentages with 95% confidence intervals (CI) and standard errors (SE). MATLAB version R2015a was used to calculate a combined prevalence estimate (p ≈ 1/3 and p ≈ 2/3) for physical doping and cognitive doping. Moreover, a likelihood ratio test was used to assess whether the prevalence estimates for the two values of p differ significantly. Finally, the Savage-Dickey method [48] was used to compute the Bayes factor B01 for H0 (i.e. the prevalence estimates are identical under p ≈ 1/3 and p ≈ 2/3) versus H1 (i.e. the prevalence estimates differ under p ≈ 1/3 and p ≈ 2/3). This Bayesian analysis treated the UQM as a multinomial processing tree. Markov chain Monte Carlo sampling with the software RJAGS [49] was then employed to compute the posterior distribution of the difference between the prevalence estimate for p ≈ 1/3 and the prevalence estimate for p ≈ 2/3 under H0 as well as under H1. The Savage-Dickey density ratio was calculated at zero difference by using a kernel density estimator. This Bayesian analysis was performed in R (see Thielmann et al. [50] for a similar approach using a different RRT model).
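A likelihood ratio test of equal prevalence under the two probability conditions can be sketched as follows. This is an illustrative reconstruction, not the MATLAB code used in the study: it assumes a binomial likelihood for the number of "yes" responses, finds the maximum likelihood estimates by a grid search over π_s, and refers the test statistic to a chi-square distribution with one degree of freedom. The function names and the example counts are hypothetical.

```python
import math

def log_lik(yes, n, p, pi_s, pi_n):
    """Binomial log-likelihood of `yes` affirmative answers out of n.

    Assumes 0 < pi_n < 1 and p < 1, so the "yes" probability stays in (0, 1).
    """
    lam = p * pi_s + (1 - p) * pi_n   # probability of a "yes" response
    return yes * math.log(lam) + (n - yes) * math.log(1 - lam)

def max_log_lik(groups, pi_n, steps=10001):
    """Maximum over a grid of pi_s in [0, 1] of the summed log-likelihood."""
    grid = (i / (steps - 1) for i in range(steps))
    return max(sum(log_lik(y, n, p, s, pi_n) for y, n, p in groups) for s in grid)

def chi2_sf_df1(x):
    """Upper tail probability of a chi-square variable with df = 1."""
    return math.erfc(math.sqrt(x / 2))

def lrt_equal_prevalence(groups, pi_n):
    """Test H0: one common pi_s against H1: a separate pi_s per condition."""
    ll_h1 = sum(max_log_lik([g], pi_n) for g in groups)
    ll_h0 = max_log_lik(groups, pi_n)
    stat = max(0.0, 2 * (ll_h1 - ll_h0))
    return stat, chi2_sf_df1(stat)

# Hypothetical counts (yes, n, p) for the two conditions, with pi_n = 0.5
stat, p_value = lrt_equal_prevalence([(250, 600, 1 / 3), (240, 600, 2 / 3)], 0.5)
print(f"chi-square = {stat:.2f}, p = {p_value:.3f}")
```

As a consistency check on the tail function, chi2_sf_df1(2.25) is about 0.13 and chi2_sf_df1(0.49) is about 0.48, matching the p values reported in the abstract for the corresponding test statistics.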
Ethical approval to conduct this study was obtained from the Ethics Committee of the Medical Faculty and the University Medical Center of the University of Tuebingen, Germany (project number 095/2011BO2).

Results
Of the 1,243 questionnaires distributed at the beginning of the classes, 1,206 questionnaires were returned, resulting in a response rate of 97%. The distribution of the eight questionnaire versions (Table 1) is presented in Table 2. Of the 1,206 participants who returned a questionnaire, 33 participants (2.7%) did not answer either of the two RRT questions and nine participants (0.7%) answered the first RRT question only. In summary, 1,169 participants answered the RRT question concerning physical doping (p ≈ 1/3 or p ≈ 2/3) and 1,168 participants answered the question concerning cognitive doping (p ≈ 1/3 or p ≈ 2/3). The characteristics of the respondents are presented in Table 3.

Discussion
The aim of this study was to investigate whether the prevalence estimate for a sensitive item assessed with the unrelated question method (UQM) is influenced by changing the probability of receiving the sensitive question p. This was done by assessing the 12-month prevalence estimates for physical doping and cognitive doping in a sample of university students using two different probabilities of receiving the sensitive question (p ≈ 1/3 and p ≈ 2/3). To this end, the study design from a previously performed study in university students was adapted [22]. Similar to the previous study, the present study showed a high response rate of more than 90%. In order to evaluate the representativeness of a surveyed sample, Baruch [53] stated that not only the rate of returned questionnaires but also the rate of usable (valid) questionnaires is important. Since only 33 participants who returned a questionnaire did not provide answers to any of the RRT questions, the rate of usable questionnaires was 94.4%, which is an excellent percentage compared to other surveys addressing the use of substances in student populations [42][54][55][56].
Likelihood ratio tests revealed that the prevalence estimates for physical doping (22.5% and 12.8%) as well as for cognitive doping (22.5% and 18.0%) estimated with p ≈ 1/3 and p ≈ 2/3 were not significantly different. In addition, a Bayesian analysis revealed somewhat more support for the null hypothesis ("prevalence estimates are identical for p ≈ 1/3 and p ≈ 2/3") than for the alternative hypothesis ("prevalence estimates differ for p ≈ 1/3 and p ≈ 2/3"). Consequently, these results provide no support for our initial assumption that, according to Bayes's rule, participants should become increasingly reluctant to provide a "yes" response to the sensitive question when p increases toward one (see S1 Appendix). On the contrary, the prevalence estimation by the UQM seems to be rather robust against a manipulation of the probability p of receiving the sensitive question. A similar result was recently reported in a study by Hilbig and Zettler (2015, Experiment 5), in which the prevalence of cheating behavior in a coin-toss task was largely unaffected by the randomization probability (i.e., the probability of winning an incentive) [57].
The present findings might suggest that one should implement a high value of p when using the UQM in sensitive surveys, because for a fixed n, power increases with p [47]. However, such a conclusion might be premature on the sole basis of the present study. First, the objective amount of privacy protection provided by the UQM decreases with increasing p. Therefore, it is most likely that the subjective amount of privacy protection also decreases with increasing p. The exact relation between felt privacy protection and p may be influenced by several factors such as the subjective sensitivity of the sensitive question, the implemented randomizer, or the overall setup of the survey. It could be argued that in the present study, the subjective sensitivity of doping behavior was comparably low for the surveyed students, because, unlike for athletes, doping behavior would have no immediate consequences even if revealed. Consistent with this notion, the estimated prevalence rates for doping behavior in the present study were comparable with a previous study surveying university students with the UQM [22] but considerably higher than in a previous study surveying recreational competitive athletes with the UQM [38].
Therefore, it seems necessary to perform further studies assessing individuals' attitudes and beliefs towards sensitive items and possible interaction effects of subjective sensitivity and privacy protection on prevalence estimation by the UQM. Furthermore, perceived privacy protection of the UQM should also be compared with other RRTs, because Hoffmann et al. [37] showed that perceived privacy protection varies between different indirect techniques. Second, it could be argued that, given the higher absolute estimated prevalence rates for p ≈ 1/3 than for p ≈ 2/3 for both physical and cognitive doping, participants indeed tended to change their response behavior (thus, more dishonest answers for p ≈ 2/3 than for p ≈ 1/3) but that the power of the present study was not sufficient to detect this difference. Therefore, future studies should increase the sample size to provide a more robust test of a possible decrease of perceived privacy protection with increasing p and its resulting effect on prevalence estimation. A more sensitive test might also be achieved by further increasing the difference between the two probabilities p of receiving the sensitive question.
Supporting information
S1 Appendix. This appendix shows how the conditional probability P(A|"Yes") depends on the parameters p, π_s, and π_n. (DOCX)
S1 Dataset. (SAV)