Relation to recent paper
Posted by nman1000 on 16 Dec 2009 at 08:34 GMT
I wondered if the authors could comment on how their work relates to this recently published paper:
Eisenegger, C., Naef, M., Snozzi, R., Heinrichs, M., & Fehr, E. (2009). Prejudice and truth about the effect of testosterone on human bargaining behaviour. Nature. doi: 10.1038/nature08711.
I'm the lead author of the PLoS ONE paper that reports the opposite result to the paper published last week in Nature. You should note that we studied men, while the Nature paper studied women. More than that, though, there are many reasons to worry about the findings reported in the Nature paper.
1. The Zak paper compares individuals on testosterone (T) to THEMSELVES on placebo, while the Eisenegger paper compares women on T to other women given a placebo. Our "within-subjects" design is much stronger statistically and the results are therefore more compelling.
2. The Eisenegger paper does not document that they have increased T (they report that their treatment did not increase cortisol, so they did take before and after hormone measurements). This is quite worrisome. The Zak paper documents that we roughly doubled total T, free T and DHT.
3. The results in Zak for a reduction in generosity and an increase in punishment for men on T both scale with T levels. That is, the higher the T, the more stingy men were and the more they punished others for being stingy to them. Such correlations are strong evidence that our findings are real, not noise. No such correlations are reported in the Eisenegger paper, so we don't know if their effects vary with T levels at all.
4. The supplementary information in the Nature paper reports that their analyses controlled for the beliefs of participants regarding what drug they took by adding to their data 90 additional participants who took neither drug nor placebo. This suggests that if one does not include these additional people, the main effect of drug vs. placebo disappears. Yet the conditions between experiments are not identical, so combining these data is suspect (e.g. the placebo group took a pill, but the additional 90 did not). Since this is a randomized trial, if the finding is real, there should be an effect of drug vs. placebo without additional control variables ("beliefs") or additional participants. This "belief effect" indicates that the reported results are either just noise or statistically fragile. The Zak paper shows main effects cleanly and with substantial effect sizes.
5. The Nature paper does not specify when beliefs were measured, though presumably it was after drug loading. This suggests that participants had an incentive, if they were stingy, to report they thought they were on testosterone. This was noted by a blogger here http://neuroskeptic.blogs...
My view is that the "beliefs" aspect of the Nature study is a red herring. Either T causes a change in behavior or not. Beliefs are irrelevant.
6. Unlike the paper in Nature, the Zak paper reports a substantial number of statistical robustness tests to demonstrate that our finding is robust, using only the moderate-sized sample on which all our analyses are based.
7. We identify the mechanism through which T reduces generosity as its action as an oxytocin antagonist, comparing the T findings to our earlier paper showing that a 40 IU oxytocin infusion increased generosity by 80% (Zak et al., 2007, found here http://neuroskeptic.blogs...).
No such mechanism for T increasing generosity or "beliefs" reducing it is offered in the Nature paper. Again, this is worrisome because their finding may simply be due to noise.
Basically the studies are very similar except for the gender difference, but I believe that the experiment reported in the PLoS ONE paper is better designed, has robust findings, clearly documents the rise in T, and shows that both main effects vary in proportion to T levels.
In our study “Prejudice and truth about the effects of testosterone on bargaining behaviour” we have shown that individuals with exogenously increased testosterone levels make significantly more fair bargaining offers in the ultimatum game compared to subjects who received a placebo. However, subjects who believed that they received testosterone, regardless of whether they actually received it or not, behaved much more unfairly than those who believed that they were treated with placebo. Finally, we found no effect of testosterone administration on the responder’s rejection behaviour. This finding appears to be challenged by findings of Zak et al., who report a decrease in proposer offers and a decrease in generosity after testosterone administration. Before we respond to the particulars of the comment by Paul Zak, we would first like to point out an important statistical issue in the Zak et al. paper.
Zak et al. used a sample of 25 subjects in a “within-subjects” design, i.e. each subject played the ultimatum game 8 times. They played the game 4 times in the testosterone condition and 4 times in the placebo condition. Thus Zak et al. recorded a total of 200 decisions.
In the statistical analysis one has to take into account that these 200 decisions are not independent of each other and that there are only 25 independent observations. In their statistical analysis, however, Zak et al. assume that the 4 decisions of each individual in the two treatments are independent of each other. This assumption is clearly not justified and tends to inflate the statistical significance of the results.
Zak et al. sent us their raw data for a re-analysis. Based on the dataset we received, we conducted the t-test in a way that only assumes 25 independent observations by comparing each subject’s average generosity between the testosterone and placebo treatments. We found that their main result, that subjects under Androgel are less generous than under placebo, is no longer significant (paired t-test, p = 0.33, two-tailed, n = 25). Thus, based on a proper statistical test, the null hypothesis of no treatment group differences between the placebo and the Androgel condition cannot be rejected.
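The re-analysis described above can be sketched as follows. This is a minimal illustration, not the code used by either group: the toy numbers and the function names `subject_means` and `paired_t` are hypothetical. It shows the general approach of first averaging each subject's repeated decisions, then running a paired t-test on one value per subject, so that the degrees of freedom reflect the number of subjects rather than the number of decisions.

```python
import math
import statistics


def subject_means(decisions):
    """Collapse each subject's repeated decisions into one mean per subject."""
    return {sid: statistics.mean(vals) for sid, vals in decisions.items()}


def paired_t(treatment, placebo):
    """Paired t statistic on per-subject means.

    Returns (t, degrees_of_freedom). The unit of analysis is the subject,
    so df = number of subjects - 1, not number of decisions - 1.
    """
    t_means = subject_means(treatment)
    p_means = subject_means(placebo)
    # One difference per subject: treatment mean minus placebo mean.
    diffs = [t_means[sid] - p_means[sid] for sid in sorted(t_means)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Hypothetical toy data (NOT the study data): 3 subjects, 4 offers per condition.
treatment = {1: [4, 5, 4, 5], 2: [6, 6, 5, 7], 3: [3, 4, 4, 3]}
placebo = {1: [5, 5, 5, 5], 2: [6, 7, 6, 7], 3: [5, 4, 5, 4]}

t_stat, df = paired_t(treatment, placebo)
print(f"t = {t_stat:.2f}, df = {df}")
```

With 25 subjects the same procedure would give 24 degrees of freedom, whereas treating all decisions as independent would imply far more and would tend to understate the standard error.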
In addition, we have the following response to the comment by Paul Zak.
The decision to choose a “between” or a “within” subjects design is largely dependent on the research question and there is not simply a good and a bad choice. Virtually all social decision making studies employ “between” subjects designs, because “within-subjects” designs may carry the risk of spill-over effects across the treatment conditions. Such spill-over effects were noted by Zak et al., who found a significant increase in proposer offers across the two treatment sessions. This might explain the unusually high average offer of above 50% in the placebo group, which has hitherto never been observed in Western societies (Henrich et al., 2006). On the other hand, the cost of choosing a “between” subjects design is the relatively lower statistical power, which we counteracted with a relatively larger sample size.
Administration procedure and blood levels.
Blood measures verify whether one is successful in exogenously increasing testosterone levels. We did not take blood measures during our experiment because the use of syringes may have stressed the subjects. This might be accompanied by a cortisol response which obviously might interact with the testosterone administration. But even without such a neuroendocrine response, subjects nevertheless could have changed their behaviour. However, we are confident that we increased levels of testosterone, because our testosterone administration procedure is well-established and robust pharmacokinetic data have been published previously (Tuiten et al., 2000). We used the same dosage, route of administration and a subject pool that has the same age range, gender and other important commonalities with the sample used by Tuiten et al.
Our procedure of testosterone administration contrasts with the one used by Zak et al. because in the latter case no data support the reliability of the method. In particular, as testosterone levels tend to be higher in the morning and lower in the evening, blood measures should be taken at the same time of day. Despite this fact, Zak et al. took the first (baseline) blood-draw in the evening and the second one in the early morning. Therefore, one cannot rule out that their reported increase in testosterone levels after gel administration was entirely due to the natural circadian fluctuations in serum testosterone.
Beliefs about the drug.
The community is becoming increasingly aware that preconceptions (“beliefs”) about drugs are potent determinants of behaviour. These may take the form of the well-known placebo or nocebo effect. Because social decision making is sensitive to framing effects and expectations, any preformed belief about potential effects of a drug could interfere with the outcome measure. This fact can have far-reaching consequences: if subjects’ beliefs about a substance cause them to act in favour of the true effect of the substance itself, the substance effect might be overestimated. On the other hand, if subjects’ beliefs about a substance cause them to act against the true effect, as is the case in our study, we are likely to underestimate the true effect of the substance. Although our measure of the beliefs is not perfect, which we extensively discuss in the paper (on page 358, left column), our observation that beliefs are highly negatively correlated with bargaining offers teaches us an important methodological lesson, which will likely influence future study designs.
The identification of the neurobiological mechanisms by which testosterone may influence social behaviour is obviously one of the major challenges in the field. Neither Eisenegger et al. nor Zak et al. addresses these research questions. Future research, in particular animal research, will have to provide models of how hormonal drug challenges affect neurobiological processes and how these translate back into behaviour. So far, animal research has suggested many different potential mechanisms, including for example the aromatisation of testosterone to estradiol, which acts on estrogen receptors in the amygdala, a down-regulation of oxytocin receptors and/or an up-regulation of vasopressin receptors, or changes in catecholamine and/or serotonin neurotransmission. Thus, we are reluctant to attribute testosterone administration effects to one single neurobiological mechanism in humans.
It is valuable to have a discussion of the differences in our papers with my friends from Zurich. On the statistical analysis, the Zurich group is wrong. In most papers in which individuals are put on a drug, participants will make multiple decisions so that the fewest people are subjected to the drug; this is appropriate ethical design. Indeed, in my paper with Prof. Fehr on oxytocin infusion published in Nature in 2005 we did exactly this: had participants make four decisions in a between subjects design and used every observation as a separate datum. The statistical analyses in that paper and in Zak et al. 2009 (PLoS ONE) are entirely appropriate. Further, the Zak et al. testosterone paper shows that our results hold up to a variety of statistical robustness tests.
The Zurich group has shared their data with me and when I tested for an effect of the drug on ultimatum game (UG) behavior, there is no statistical difference between drug and placebo using a t-test or Mann-Whitney test. Period.
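For readers unfamiliar with the between-subjects comparison mentioned above, here is a minimal sketch of the Mann-Whitney U statistic, which compares two independent groups by counting how often a value in one group exceeds a value in the other. The function name `mann_whitney_u` and the toy offers are hypothetical and are not taken from either dataset; a full test would also convert U into a p-value.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two independent samples.

    U_x counts the pairs (a, b), a from x and b from y, where a > b.
    Tied pairs contribute 0.5. U_x + U_y always equals len(x) * len(y).
    """
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1.0
            elif a == b:
                u_x += 0.5
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Hypothetical toy data (NOT the study data): one offer per participant.
drug_offers = [5, 4, 6, 5]
placebo_offers = [4, 5, 3, 4]

u_drug, u_placebo = mann_whitney_u(drug_offers, placebo_offers)
print(u_drug, u_placebo)
```

A U value near half of `len(x) * len(y)` for both groups indicates little separation between them; values near 0 or near the maximum indicate strong separation.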
The strongest evidence that the Zak paper has the correct findings is the powerful parametric effect of T and Free T and DHT levels on both lower UG offers and greater UG punishment. It is highly unlikely that such parametric effects are due to noise and would hold for ALL 3 measures of testosterone. I emphasize the additional trouble, cost and pain we (and participants) went through to get these data: we took blood before and after every testosterone or placebo treatment to document the rise in T. The Zurich group simply assumed that they raised T levels in their participants but cannot tell how much.
As stated above, there is also no known physiologic mechanism that would make women on T more generous in the UG than those on placebo. Since testosterone inhibits the effect of oxytocin, and my group had previously shown that oxytocin increases generosity in the UG, the Zak et al. paper has a clear and consistent rationale for the findings.
My overall view is that the Zurich group's finding is due to noise. Additional evidence for this is in their Figure 2. Without controlling for beliefs (which are irrelevant in a drug study), testosterone appears to increase offers in the UG (panel a), while when beliefs are accounted for the direction of the effect of testosterone on UG offers reverses (panel b). At best this suggests a very fragile result.
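A "parametric effect" of this kind is usually checked by correlating measured hormone levels with behaviour. The sketch below computes a Pearson correlation coefficient from first principles. The function name `pearson_r` and all numbers are hypothetical illustrations, not values from the paper, and it assumes one observation per participant.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data (NOT the study data): T level and UG offer per participant.
t_levels = [1, 2, 3, 4, 5]
offers = [5, 4, 4, 3, 2]

r = pearson_r(t_levels, offers)
print(f"r = {r:.3f}")
```

A negative r here would correspond to lower offers at higher T levels. With repeated decisions per participant, the same independence caveat raised earlier in this thread would apply to the significance test of r.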
I invite other researchers to resolve this debate: replicate our studies and report what you find. I suggest, though, that the best design involves documenting the rise in testosterone and comparing participants' behaviors on testosterone to themselves on placebo.