Reader Comments

Response to Dijksterhuis

Posted by drshanks on 26 Apr 2013 at 06:45 GMT

Dijksterhuis (“Replication crisis or crisis in replication? A reinterpretation of Shanks et al.”) has argued that the data we reported fail to provide any serious challenge to the intelligence priming effect originally reported by Dijksterhuis and van Knippenberg (1998) and followed up in several studies cited in our article. Here we respond to Dijksterhuis’ criticisms of our studies.

In our view almost none of his criticisms have any genuine force. Indeed, on examination they turn out to be quite superficial. Moreover, he has made little attempt to represent accurately the experimental hypotheses we tested, and his meta-analysis of results from a subset of our experiments is flawed because he has miscoded the data.

We turn to the objections levelled at each experiment in our article.

Experiment 1

The main criticism seems to be that our participant group was too heterogeneous, varying in age, for instance, from 19 to 79 (participants were not, as he suggests, ‘recruited on the streets’). Dijksterhuis seems to have imputed to us an experimental hypothesis that we never adopted, namely that Experiments 1 and 2 were direct replications of the original method. As we explicitly noted, they were not: we changed the priming method (a video) as well as the test (IQ test items). In these initial experiments we asked whether the priming effect would generalize to a different set of circumstances (at the time of running those experiments, we expected to be able to obtain the basic priming effect). Thus the null result suggests that the priming effect does not generalize to these circumstances. We did not claim that this experiment fails to replicate the original demonstrations. As such, it is not clear to us what the point of Dijksterhuis’ criticism is. A heterogeneous sample would normally be a good thing, so long as the comparison groups are equally heterogeneous. Incidentally, Dijksterhuis and van Knippenberg (1998) did not give age data for their participants.

Experiment 2

Again, we acknowledge that the groups were more heterogeneous than those used by Dijksterhuis and van Knippenberg (1998). We also acknowledge that the sample size is small. We dealt with the difference between the groups in their pre-test scores by taking a pre-/post-prime difference score.

Experiments 3 and 4

Dijksterhuis has no objections to these experiments, apart from a concern about whether the participants in Experiment 4 were tested in cubicles (they were, as noted in the Supporting Information).

It is important to reflect on the ‘big picture’ for a moment. Experiments 3 and 4 (and Experiment 8, see below) were near-exact replications of the original Dijksterhuis and van Knippenberg (1998) method (and unlike Experiments 1 and 2 employed a student sample). The failure of these experiments to find any evidence of a priming effect is a clear-cut demonstration that the original finding is not robust. Dijksterhuis makes no comment on this crucial conclusion.

Experiment 5

Dijksterhuis’ criticism of this experiment is that we only ran two conditions, in which participants made professor-similarity or Einstein-dissimilarity ratings in the priming stage. He suggests that we should have run all four conditions obtained by crossing prime type (the category ‘professor’ versus the exemplar ‘Einstein’) with rating focus (similarity versus dissimilarity). This criticism is very hard to understand. The two conditions we chose were the ones that LeBoeuf and Estes (2004) reported as yielding the largest difference, a result we failed to replicate under conditions very close to theirs.

Experiment 6

Dijksterhuis’ dismissal of this experiment again highlights that for him this is not an exercise in careful analysis. This experiment explored the question of whether a priming effect would be obtained if an appropriate experimental demand were implanted in participants’ minds. No such effect was observed. We never suggested that this experiment was a test of the hypothesis that under the standard Dijksterhuis and van Knippenberg conditions, priming is observed. Dijksterhuis offers no justification for why he believes it fails to adequately test the hypothesis it sought to test (namely, that emphasizing the experimental expectations would lead to a stronger priming effect). He simply misrepresents our hypothesis.

Experiment 7

Dijksterhuis’ major objection to this study is that it necessitated asking participants to form an impression of a UCL student who was also a soccer hooligan. Worse still, we did not measure the believability of this scenario via a manipulation check. Our participants expressed no difficulty imagining this individual, probably because UCL students are as varied as they are anywhere else.

We accept Dijksterhuis’ other criticism that the group sizes in this experiment were small.

Experiment 8

One objection to this experiment (which was run in the UK) is that it used the same questionnaire as that in Experiment 4 (run in Australia). It is puzzling that Dijksterhuis finds the use of the same questionnaire ‘mystifying’. If we had devised a tailor-made questionnaire, it would have been identical to the one used previously because UK and Australian language and culture are – in all aspects relevant to this experiment – identical.

The second objection is that in the two groups which once again formed a replication of the original Dijksterhuis and van Knippenberg (1998) procedure, we obtained a numerical effect in the right direction (though far from significant). Dijksterhuis chides us for not doing a further replication of this experiment. It seems that 3 experiments (Experiments 3, 4, and 8), all of which were closely modelled on the original method yet found no trace of a priming effect, are not sufficient evidence. Dijksterhuis’ criticisms of this experiment (a) are not really criticisms of the experiment at all but of its connection with other studies, and (b) are once again entirely superficial.

Experiment 9

Dijksterhuis notes that in this experiment we only analysed the answers to the 5 most difficult questions out of 20 in total, and that the standard deviations (SDs) were large. Our group sizes were small. Clearly these factors reduce the power of the experiment and we acknowledge that further attempts to replicate the Bry, Follenfant, and Meyer (2008) study are needed. However, the reason our analysis was restricted to just 5 questions, obviously, was that we were following Bry et al.’s analysis. Quite why this is a criticism of our experiment is unclear.

Dijksterhuis does note that we additionally reported the data across all 20 questions. This yields a pattern which is the opposite of what Bry et al. obtained, as well as being nonsignificant. Nevertheless, for Dijksterhuis, this experiment shows that Shanks et al. “had finally developed the tacit – or perhaps explicit – knowledge to get their paradigm to show some sensitivity to their manipulations”. It shows, of course, no such thing. What it does show is that the priming pattern reported by Bry et al. is not readily replicated.


Dijksterhuis reports a meta-analysis of the data from Experiments 3-8 and concludes that the priming effect is marginally significant. But this meta-analysis is wholly flawed because it is based on miscoding of the data. The predictions tested by our experiments (following the previously published studies) were Einstein > Professor in Experiments 5 and 6, and out-group Professor < out-group Hooligan in Experiment 7. Dijksterhuis has coded them in reverse, which is completely inappropriate.

We re-computed the meta-analysis with correct coding. Taking the pooled data from these 6 experiments, the group expected to show boosted scores has a mean of 41.647 correct (N=156) and the group expected to perform worse has a mean of 41.217 (N=157), t(311)=0.308, p=.379 (1-tailed). Power to detect an effect of the size reported in the previously published experiments (median d≈1.0, lowest = 0.72) is 1.000; power to detect a medium effect (d=0.5) is 0.997, and power to detect a small effect (d=0.2) is 0.547. The 95% confidence limits on the obtained effect size (d=0.034) are [-0.19, 0.26]. This analysis resoundingly confirms that our replication attempts have adequate power to detect an effect considerably smaller than the original reported magnitude.
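
For readers who wish to check the arithmetic, the figures above can be approximately recovered from the reported t value and group sizes alone. The following is only a rough sketch, not the analysis actually run: it assumes a one-tailed test at alpha = .05 (the alpha level is an assumption, not stated above), replaces the t and noncentral t distributions with a standard normal (reasonable with 311 degrees of freedom), and uses a common large-sample formula for the standard error of d. All function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Standard normal, used here as a large-sample stand-in for the t distributions.
STD_NORMAL = NormalDist()


def scale(n1, n2):
    """Standard error of a standardized mean difference when the true d is 0."""
    return sqrt(1 / n1 + 1 / n2)


def d_from_t(t, n1, n2):
    """Convert an independent-groups t statistic to Cohen's d."""
    return t * scale(n1, n2)


def one_tailed_p(t):
    """Approximate one-tailed p value for t (normal approximation)."""
    return 1 - STD_NORMAL.cdf(t)


def approx_power(d, n1, n2, alpha=0.05):
    """Approximate one-tailed power to detect a true effect of size d.

    alpha = 0.05 is an assumption of this sketch.
    """
    critical = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(critical - d / scale(n1, n2))


def approx_ci(d, n1, n2, level=0.95):
    """Approximate two-sided confidence limits on d (large-sample formula)."""
    se = sqrt(1 / n1 + 1 / n2 + d * d / (2 * (n1 + n2)))
    half_width = STD_NORMAL.inv_cdf(0.5 + level / 2) * se
    return d - half_width, d + half_width


if __name__ == "__main__":
    n1, n2, t = 156, 157, 0.308  # values reported in the comment above
    d = d_from_t(t, n1, n2)
    print("d:", round(d, 3))
    print("p (1-tailed):", round(one_tailed_p(t), 3))
    for effect in (1.0, 0.5, 0.2):
        print("power at d =", effect, ":", round(approx_power(effect, n1, n2), 3))
    print("95% limits on d:", [round(x, 2) for x in approx_ci(d, n1, n2)])
```

Under these assumptions the sketch gives values that differ from those reported above only in the third decimal place, which is the size of discrepancy one would expect from the normal approximation and from rounding of the reported t value.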

Performance level

Dijksterhuis speculates that one reason we failed to obtain priming effects is that our test items were too difficult. As noted in our article, performance levels were generally around 35-50% correct on the multiple-choice questions, which had 4 options including the correct one (so chance performance is 25%). This criticism has little plausibility as the level of performance in our Experiment 4 (the largest and closest replication of the original Dijksterhuis and van Knippenberg study), at 40%, is not appreciably lower than that reported in Dijksterhuis and van Knippenberg’s experiments, where group means ranged from 37.9 to 59.5%.

The lack of force to this criticism is confirmed by a histogram we have plotted of scores in percent correct, in 5-point bins from 0 to 100%, for the pooled data from Experiments 3, 4, and 8 (priming groups). The data are, as expected, roughly normally distributed and there is no significant difference between the distributions.


Dijksterhuis’ “criticisms” of our experiments are, in large part, entirely superficial. Of course we are not claiming that all our experiments were perfect: for instance, we acknowledge that larger sample sizes in some experiments would have reduced the size of the confidence limits on the estimate of the between-group effect. More importantly, however, we have highlighted that Dijksterhuis’ scattergun approach masks the fact that he seems to have absolutely no concrete response to our major result, the fact that across 3 experiments with a combined sample of 92 participants per group we found no evidence of priming, despite replicating the Dijksterhuis and van Knippenberg (1998) method very closely. In addition Dijksterhuis dismisses several of our other experiments but his only grounds are that he disagrees with the hypotheses we were testing – hypotheses that all had a priori bases in the published literature. Theoretical disagreement is not grounds to dismiss empirical data as flawed.

In his comment, Dijksterhuis makes a number of references to the way in which our experiments were run, speculates that participants were recruited on the street (they weren’t), and implies that our null results are attributable to the data being collected by under-trained students. In fact our research team comprised both staff researchers and students, and all were carefully supervised. We do not believe it is appropriate in a scholarly exchange to suggest, without concrete evidence, that another group’s research practices are unprofessional. Dijksterhuis also chastises us for not contacting him for his expert advice during our research project. It must have slipped his mind that we did so twice but on neither occasion was he able to provide the information we requested.

David Shanks & Ben Newell

No competing interests declared.