Reader Comments

Questionable classification of studies and choice of outcome measures

Posted by livmfield on 19 Sep 2016 at 07:29 GMT

The paper by Cristea and colleagues makes an important contribution to the literature on cognitive bias modification (CBM) interventions for addiction. Their meta-analyses demonstrate that the overall effect size for CBM interventions on addiction outcomes may be close to zero (immediately after training) or trivially small (at follow-up), and effect sizes are diminished still further when methodological problems such as inadequate blinding are taken into consideration. We are not particularly surprised by their conclusions; in recent narrative reviews of attentional bias modification (e.g., Christiansen et al., 2015) we have noted that positive trials are highly cited and their methodological weaknesses are overlooked, whereas numerous null findings are rarely cited at all. This problem is widespread throughout psychology and psychiatry (Ferguson and Heene, 2012).

However, we believe that the authors’ overall conclusions should not be uncritically accepted because of problems in the way that studies were classified and outcome measures were selected for inclusion.

Laboratory research versus randomized controlled trials (RCTs)

Cristea et al. label all studies in this area as ‘randomized controlled trials’ (RCTs). This is misleading because they include several laboratory studies (11 of the 25 included studies) that are not RCTs. These lab studies had the aim of testing theoretical predictions by experimentally manipulating cognitive biases in order to investigate their causal influence on substance use and/or craving. These lab studies were never intended to be RCTs, and they are not described as such in the original reports. Of course, one implication of findings from some of the lab studies is that CBM might be a useful clinical intervention and therefore should be investigated in subsequent RCTs with clinical populations. However, there are a number of important differences between lab studies and RCTs which mean that they should not be treated as if they were one and the same. Lab studies typically involve student volunteers (who are not motivated to change their behaviour) and investigate the effects of (typically) a single session of CBM on a proxy outcome measure (such as a taste test; see below), immediately after the training. On the other hand, RCTs are typically conducted with patients with a substance use disorder who are seeking treatment, who receive multiple CBM sessions over an extended period of time, and the typical outcome measure is self-reported (and ideally independently verified) substance use. In their defense, Cristea and colleagues reported no difference in addiction outcomes between single versus multiple sessions of CBM, and in the discussion (page 17) they acknowledge that lab studies ‘are less relevant in determining the efficiency (sic) of these interventions’. Despite this caveat, we are concerned that they appear to view the difference between these two types of studies as a trivial methodological detail, when it is much more than that.

Unjustified choice of outcome measures

Secondly, regarding the laboratory alcohol CBM studies, Cristea and colleagues made the decision to disregard outcome measures derived from bogus ‘taste tests’, and they offered the following rationale for doing so (page 3):

‘We did not include outcomes that measured consumption or preferences during a non-standardized behavioral test (i.e., participants were given an impromptu taste test which measured whether they chose non-alcoholic over alcoholic drinks or how much they consumed from each drink). We opted for this because this is a non-standardized and variable task that does not take into account participants’ general preferences, their habitual alcohol consumption, and that is usually carried out without their awareness’.

We were surprised by this decision, not least because participants’ alcohol consumption during the bogus taste test was unambiguously framed as a primary outcome measure in many of these studies (e.g., Jones and Field, 2013; Wiers et al., 2010). Instead, Cristea et al. opted to use subjective craving as the outcome measure in their analysis of the lab studies. We were not convinced by their justification for doing so. We agree that the taste test is not a perfect measure, and indeed it is correct that there is no standardized version of the task, so different research groups tend to use different versions of it. But this is also true of subjective craving measures (which differed across studies, as is evident from Table 1). Furthermore, the psychometric properties of subjective craving questionnaires are not consistent across different populations (Kavanagh et al., 2013). Therefore, subjective craving is not a perfect outcome measure either.

Cristea et al.’s dismissal of the taste test as a valid outcome measure overlooks the fact that similar measures have been widely used in animal research on motivation (see Ward, 2016) and in human appetite research (e.g., Conger et al., 1980) for decades. Most importantly, it has been widely used as a behavioural measure of the motivation to drink alcohol since the publication of a seminal paper more than 40 years ago (Marlatt et al., 1973). In a recent paper, we demonstrated that the amount of alcohol that volunteers drink in the lab in the context of a bogus taste test is proportional to their typical intake outside of the lab, and that the task remains valid even in participants who are aware that their intake is being measured (Jones et al., 2016).

When conducting a meta-analysis, difficult decisions have to be made, including the choice of outcome measures to focus on. Cristea and colleagues took the decision to exclude all data from an outcome measure that is widely used in alcohol laboratory studies and has been used in other domains for decades, yet their reasons for this decision appear arbitrary, ignorant of the history and methodological research that underpins the measure, and unsupported by any references whatsoever. We find their decision difficult to understand and suggest that their overall conclusions should be considered alongside findings from a recent meta-analysis that identified robust effects of a single session of one form of CBM (Inhibitory Control Training) on alcohol consumption in the laboratory when assessed with a bogus taste test (Jones et al., 2016).

Matt Field, Andrew Jones, and Paul Christiansen
University of Liverpool


References
Christiansen, P., Schoenmakers, T. M. & Field, M. (2015). Less than meets the eye: Reappraising the clinical relevance of attentional bias in addiction. Addictive Behaviors 44, 43-50.
Conger, J. C., Conger, A. J., Costanzo, P. R., Wright, K. L. & Matter, J. A. (1980). The effect of social cues on the eating behavior of obese and normal subjects. Journal of Personality 48, 258-271.
Ferguson, C. J. & Heene, M. (2012). A vast graveyard of undead theories: publication bias and psychological science's aversion to the null. Perspectives on Psychological Science 7, 555-561.
Jones, A., Di Lemma, L. C. G., Robinson, E., Christiansen, P., Nolan, S., Tudur-Smith, C. & Field, M. (2016). Inhibitory control training for appetitive behaviour change: A meta-analytic investigation of mechanisms of action and moderators of effectiveness. Appetite 97, 16-28.
Jones, A. & Field, M. (2013). The effects of cue-specific inhibition training on alcohol consumption in heavy social drinkers. Experimental and Clinical Psychopharmacology 21, 8-16.
Kavanagh, D. J., Statham, D. J., Feeney, G. F., Young, R. M., May, J., Andrade, J. & Connor, J. P. (2013). Measurement of alcohol craving. Addictive Behaviors 38, 1572-1584.
Marlatt, G. A., Demming, B. & Reid, J. B. (1973). Loss of control drinking in alcoholics: an experimental analogue. Journal of Abnormal Psychology 81, 233-241.
Ward, R. D. (2016). Methods for dissecting motivation and related psychological processes in rodents. In Current Topics in Behavioral Neurosciences, pp. 451-470.
Wiers, R. W., Rinck, M., Kordts, R., Houben, K. & Strack, F. (2010). Retraining automatic action-tendencies to approach alcohol in hazardous drinkers. Addiction 105, 279-287.


No competing interests declared.

Author Reply

icristea replied to livmfield on 19 Oct 2016 at 01:20 GMT

We thank Field, Jones and Christiansen for these stimulating comments on our meta-analysis of cognitive bias modification (CBM) for substance addictions, and we take this opportunity to reply. Indeed, the results of our meta-analysis might not appear surprising in light of narrative reviews. However, narrative reviews are an unreliable source of evidence, as they do not attempt to provide a systematic snapshot of a field. They also leave considerable room for their authors’ a priori biases to influence the selection and interpretation of the included studies. In what follows, we attempt to reply succinctly to the main points raised by Field and colleagues in their comment.
The first critique concerns our combination of what the authors of the comment call “laboratory research” with randomized controlled trials (RCTs). This apparent distinction rests on a common misconception about what defines an RCT. RCTs are identified as such by the random allocation of participants to two or more groups, with one group receiving an intervention and the other a dummy treatment or no treatment. This is clearly stated in all the relevant guidelines, such as the Cochrane Handbook[1] (box 6.3a), which we in fact closely followed for our inclusion criteria, or the NICE guidelines (http://www.nice.org.uk/gl...). During review of an earlier version of our paper at a preeminent journal in the field, we were once again confronted with, and surprised by, how common this misconception is. An RCT is not defined by the presence of trial registration, by the existence of blinding, by the use of a clinical sample, and definitely not by the authors’ own description of the trial. An RCT is certainly not defined by whether or not it was “intended” to be an RCT. This last criterion regarding “intention” is the most misleading, as it opens the door to explicit and unaccountable data selection. In fact, by suggesting that we exclude studies that by definition are RCTs merely because their authors did not “intend” them as such, Field and colleagues effectively reprimand us for not having engaged in data selection. In scientific research this process is known as “cherry-picking”: the selective exclusion of studies from a meta-analysis based on non-transparent, non-reproducible criteria, a practice that has been shown to distort results[2].
The second main critique regards our choice of outcome measures. Field and colleagues criticize us for not including alcohol consumption as measured by the “taste test”. As clearly stated in our paper, we did include alcohol consumption and other measures of substance consumption (such as cigarette consumption) when they were measured with a standardized instrument (such as the Alcohol Approach Avoidance Questionnaire/AAAQ, in fact developed by one of the CBM developers, or the Timeline Follow-Back diary/TLFB). We did not, however, consider behavioral measures such as the taste test. This task measures how much participants drink in the laboratory from an alcoholic drink (usually beer) and, respectively, a non-alcoholic one (usually orange juice). We were unable to find any independent validation of this measure, particularly regarding its criterion validity.
Field and co-authors claim to have demonstrated a correlation between how much participants drink in the lab and how much they drink outside of it. The paper they cite in the comment is their own meta-analysis showing that a form of CBM has a positive impact on food or alcohol consumption measured in the laboratory; we were not able to find the result they refer to in that paper. However, in another paper, unfortunately not referenced in the comment[3], Jones and colleagues did show, as claimed in the commentary, that participants’ typical alcohol consumption was a significant predictor of their performance on the taste test. In the comment, though, Field et al. omitted to mention other relevant results from this paper. For instance, typical alcohol consumption was a significant predictor together (i.e., in the same model) with subjective craving and the perceived pleasantness of the alcoholic drink; all three of these variables were related to consumption as measured by the taste test. Also, hazardous drinking as measured by the Alcohol Use Disorders Identification Test (AUDIT) was not related to consumption as measured by the taste test. Hence, this is hardly evidence of validity. We still do not know which of these variables is most important in accounting for performance on the taste test. Maybe it is subjective preference, which would not really be related to addiction (unless, of course, liking beer more than orange juice is a problem). In fact, what appears evident from Jones et al.[3] is that it is not clear what behavior on the taste test means, and that this behavior is most likely connected to a number of factors. Importantly, social desirability, which could have been very relevant given the nature of the problem and the study population (mostly young females), was not considered.
Nonetheless, there is an even more problematic aspect in considering this paper as evidence of criterion validity. The analysis presented was not carried out on an independent sample, but on data aggregated from separate intervention studies. Jones et al.[3] claim to have demonstrated the construct validity of the taste test because they showed it is sensitive to experimental manipulations (mainly several forms of CBM). But this argument is circular; the authors are doing what is known as a “double-dip” in their evidence. On the one hand, they use these studies as separate studies proving the efficacy of CBM interventions because of a demonstrated effect on participants’ performance in the taste test (in the single papers and in their meta-analysis[4]). On the other hand, they collate a very similar pool of studies (some are identical) to show that these prove the validity of the taste test as a relevant outcome measure because it was successfully impacted by CBM interventions. A proper validation analysis would require a separate, new sample. Moreover, we would add that properly establishing criterion validity for the taste test should not simply mean showing that it correlates with consumption; that much is evident (if you do not drink beer, you will not drink it in the laboratory). Rather, validation ought to show that this behavioral measure is sensitive enough to detect relevant variations in drinking behavior, i.e., that those who drink more outside the laboratory also tend to drink more in the taste test.
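To make concrete the kind of validation we have in mind, the sketch below (Python, with invented variable names and simulated numbers; it is not the analysis of Jones et al., only a hypothetical illustration) shows what a criterion-validity check on a fresh, independent sample might look like:

```python
import numpy as np
from scipy import stats

def criterion_validity(lab_intake_ml, weekly_units):
    """Correlate taste-test intake with habitual consumption.

    Both arrays must come from an independent validation sample, i.e.,
    participants not drawn from the intervention studies used to
    demonstrate CBM effects; otherwise the "double-dip" reappears.
    """
    return stats.pearsonr(lab_intake_ml, weekly_units)

# Simulated data for illustration only (invented numbers):
rng = np.random.default_rng(0)
weekly_units = rng.gamma(shape=4.0, scale=5.0, size=100)          # habitual drinking
lab_intake_ml = 20 * weekly_units + rng.normal(0.0, 80.0, size=100)  # taste-test intake
r, p = criterion_validity(lab_intake_ml, weekly_units)
print(f"r = {r:.2f}, p = {p:.3g}")
```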
In sum, we are perplexed as to why Field and colleagues find our decision to exclude ad hoc, non-validated behavioral measures confusing or arbitrary. They argue that this decision was not supported by any references, but this is because there were no references validating the taste test as a substitute measure of addiction-relevant outcomes. As we explained above, citing the authors’ own research, it is not even clear what the taste test measures. Does it measure consumption? Does it measure preference? Does it measure the fact that one simply does not like orange juice very much? As we are talking about a sensitive behavior like alcohol consumption, and particularly given that a significant proportion of participants were aware of the purpose of the taste test, it would also be reasonable to question how this measure fares with regard to social desirability. Nonetheless, a more serious problem for the taste test with regard to our meta-analysis was that studies used different versions and very different scoring procedures. This is not a trivial problem to be simply noted as a limitation; it cannot just be filed under “the measure is not perfect”, as Field and colleagues seem to suggest in their comment. It is a major issue, particularly germane for a meta-analysis, as it would have led to very high heterogeneity around the estimate of the effect, meaning that the pooled effect size could no longer have been considered a reliable estimate of the “true” effect of the intervention. It also did not help that outcomes on the taste test are given in ml, a unit that cannot simply be aggregated with measures such as psychological scales. We do not have any standardization for this measure, so as to be able to tell how many ml of beer (or orange juice) drunk in the lab are a little or a lot. As importantly, any standardization should probably take into account participants’ baseline values.
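For readers unfamiliar with the mechanics, the following sketch (Python; the study summaries are invented for illustration and are not taken from our meta-analysis) shows a standard random-effects pooling computation together with the I² heterogeneity statistic. When task versions and scoring procedures diverge, the between-study variance grows, I² rises, and the pooled estimate becomes unreliable:

```python
import numpy as np

def hedges_g(m_t, m_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference (Hedges' g) and its variance."""
    sd_p = np.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2))
    d = (m_t - m_c) / sd_p
    j = 1 - 3 / (4 * (n_t + n_c) - 9)  # small-sample correction factor
    return j * d, j**2 * ((n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c)))

def pool(g, v):
    """DerSimonian-Laird random-effects pooled effect and I^2 (%)."""
    w = 1 / v
    fixed = np.sum(w * g) / np.sum(w)
    q = np.sum(w * (g - fixed) ** 2)          # Cochran's Q
    df = len(g) - 1
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)             # between-study variance
    w_re = 1 / (v + tau2)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return np.sum(w_re * g) / np.sum(w_re), i2

# Invented study summaries: two taste-test studies reporting ml with different
# scoring procedures, one craving study on a Likert-type scale.
g, v = map(np.array, zip(hedges_g(180, 220, 90, 95, 30, 30),
                         hedges_g(150, 260, 130, 140, 25, 25),
                         hedges_g(4.1, 4.9, 2.0, 2.1, 40, 40)))
pooled, i2 = pool(g, v)
print(f"pooled g = {pooled:.2f}, I^2 = {i2:.0f}%")
```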
Field and colleagues also criticize us for considering craving a more important outcome and seem to imply that we did not take into account the known limitations of craving measures. This is inaccurate. Craving was simply a construct that was assessed in more trials, and as such one for which we could aggregate results. Some of these studies used validated scales; some used visual analogue or Likert scales. As evident from Table 2, we conducted sensitivity analyses examining exclusively the validated measures of craving, and the results were similar. We believe this is evidence that we did acknowledge the limitations of the craving measures, since we thought it necessary to verify them in sensitivity analyses. Moreover, we did not attempt to speculate on the meaning of these results for addiction research, and simply reported them because craving was a relevant outcome measured in a sufficient number of studies. Surely Field et al. can see that there are considerable differences between interpreting a construct like the urge to drink, even when measured on a Likert scale, and interpreting the quantity of beer consumed in the laboratory, or, in many cases, an obscure proportion between the quantity of beer consumed and that of orange juice, as computed by variations of the taste test.
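In pooling terms, such a sensitivity analysis simply repeats the computation on the subset of qualifying studies. Continuing the hypothetical sketch above (the flags are invented for illustration):

```python
# Continuing the hypothetical sketch above: re-pool using only the studies
# flagged as having used validated craving scales (flags are invented).
validated = np.array([True, False, True])
pooled_v, i2_v = pool(g[validated], v[validated])
print(f"validated-only pooled g = {pooled_v:.2f}, I^2 = {i2_v:.0f}%")
```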
To conclude, we thank Field and co-authors for these constructive criticisms. We admit that there are limitations to our meta-analysis and acknowledge the merit of their assessment. In fact, we are more than willing to recognize that, synthesizing their main critiques, their claim amounts to this: while CBM might not work for addictions, it could be effective in making an individual drink orange juice or soda rather than beer in a laboratory setting. That might indeed be true. But we are less sure it is a healthy, desirable or practically relevant result.

Ioana Cristea, Babes-Bolyai University, Cluj-Napoca, Romania

References
1. Higgins JPT, Green S, editors. Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0 [updated March 2011] [Internet]. The Cochrane Collaboration; 2011. Available: www.cochrane-handbook.org.
2. Page MJ, McKenzie JE, Kirkham J, Dwan K, Kramer S, Green S, et al. Bias due to selective inclusion and reporting of outcomes and analyses in systematic reviews of randomised trials of healthcare interventions. Cochrane Database Syst Rev. 2014; MR000035. doi:10.1002/14651858.MR000035.pub2
3. Jones A, Button E, Rose AK, Robinson E, Christiansen P, Di Lemma L, et al. The ad-libitum alcohol “taste test”: secondary analyses of potential confounds and construct validity. Psychopharmacology (Berl). 2016;233: 917–924. doi:10.1007/s00213-015-4171-z
4. Jones A, Di Lemma LCG, Robinson E, Christiansen P, Nolan S, Tudur-Smith C, et al. Inhibitory control training for appetitive behaviour change: A meta-analytic investigation of mechanisms of action and moderators of effectiveness. Appetite. 2016;97: 16–28. doi:10.1016/j.appet.2015.11.013

Competing interests declared: I am the main author of the paper that is being commented on.