Development and Psychometric Validation of the EDE-QS, a 12 Item Short Form of the Eating Disorder Examination Questionnaire (EDE-Q)

Nicole Gideon; Nick Hawkes; Jonathan Mond; Rob Saunders; Kate Tchanturia; Lucy Serpell

doi:10.1371/journal.pone.0152744

Abstract

Objective

The aim of this study was to develop and validate a short form of the Eating Disorder Examination Questionnaire (EDE-Q) for routine, including session by session, outcome assessment.

Method

The current, 28-item version (6.0) of the EDE-Q was completed by 489 individuals aged 18–72 with various eating disorders recruited from three UK specialist eating disorder services. Rasch analysis was carried out on factors identified by means of principal component analysis, which in combination with expert ratings informed the development of an EDE-Q short form. The shortened questionnaire’s reliability, validity and sensitivity were assessed using online data collected from students of a UK university and from volunteers with a history of eating disorders recruited through a national eating disorders charity (N = 559; age range 18–74).

Results

A 12-item short form, the Eating Disorder Examination Questionnaire Short (EDE-QS), was derived. The new measure showed high internal consistency (Cronbach’s α = .913) and temporal stability (ICC = .93; p < .001). It was highly correlated with the original EDE-Q (r = .91 for people without ED; r = .82 for people with ED) and other measures of eating disorder and comorbid psychopathology. It was sufficiently sensitive to distinguish between people with and without eating disorders.

Discussion

The EDE-QS is a brief, reliable and valid measure of eating disorder symptom severity that performs similarly to the EDE-Q and lends itself to sessional outcome monitoring in treatment and research.

Citation: Gideon N, Hawkes N, Mond J, Saunders R, Tchanturia K, Serpell L (2016) Development and Psychometric Validation of the EDE-QS, a 12 Item Short Form of the Eating Disorder Examination Questionnaire (EDE-Q). PLoS ONE 11(5): e0152744. https://doi.org/10.1371/journal.pone.0152744

Editor: Nori Takei, United (Osaka U, Kanazawa U, Hamamatsu U Sch Med, Chiba U and Fukui U) Graduate School of Child Development, JAPAN

Received: July 20, 2015; Accepted: March 18, 2016; Published: May 3, 2016

Copyright: © 2016 Gideon et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All anonymized, relevant data are within the paper and its Supporting Information files.

Funding: The authors have no support or funding to report.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Eating disorders pose a serious challenge to mental health services due to their often chronic trajectory [1] and far-reaching psycho-social and medical consequences [2, 3]. It is therefore crucial to carry out appropriate psychological assessments and monitor progress throughout therapy so that care and treatment can be optimised. Evidence suggests that continuous collection and feedback of routine outcome measures leads to more positive treatment outcomes for patients [4–7]. Hence, public health authorities responsible for the delivery and regulation of mental health services increasingly demand the collection and reporting of patient outcome data [8].

The Eating Disorder Examination Questionnaire (EDE-Q) [9] is a 28-item, self-report measure derived from the Eating Disorder Examination (EDE) [10, 11], the latter being widely viewed as the “gold standard” in the assessment of eating disorder pathology [12]. The EDE-Q was developed to provide a self-report questionnaire that can approach the “gold standard” whilst being less onerous for patients [9]. It is widely used and is the only outcome tool for the assessment and monitoring of eating disorders recommended by the National Institute for Mental Health in England [13].

The EDE-Q’s psychometric properties have been extensively investigated in various study populations, including individuals with eating disorders receiving specialist treatment. The measure has been found to have strong psychometric properties in terms of internal consistency and test-retest reliability for both total scores and scores on subscales assessing four domains of eating disorder psychopathology (concerns about dietary restraint; concerns about eating; concerns about weight; concerns about shape) [14–17]. Strong convergent validity between the EDE-Q and EDE has also been demonstrated in both clinical and general population samples [9, 12, 18].

There are, however, a number of problems with the EDE-Q. First, studies investigating the measure’s factor structure in various study populations have not supported the four-factor model entailed in the current subscales, while also failing to agree on an alternative factor structure [14, 19–21]. This raises the question of the appropriateness and utility of the existing subscales.

Second, despite general convergence in scores, people consistently score higher on the EDE-Q than on the EDE. This raises concerns about using these methods interchangeably [22]. Inconsistencies between the EDE and the EDE-Q have also been observed in the self-report assessment of certain eating disorder features, such as objective binge eating behaviours [9, 17, 18, 22–25], laxative use [9] and self-induced vomiting [26]. Discrepancies of this kind are typically taken to imply the superiority of interview assessment, although this is not necessarily the case [12, 25].

Third, although administration time for the EDE-Q is markedly reduced when compared with that for the EDE, the EDE-Q is still longer than ideal for use as a session-by-session outcome measure. Finally, the EDE-Q assesses the occurrence and frequency of symptoms over the past 28 days. This time frame makes it difficult to identify change from one week to the next.

With these considerations in mind, the aim of the current study was to develop a short form of the EDE-Q, the EDE-QS, which can be used for sessional outcome measurement. We aimed for a measure with high reliability and construct validity, including strong positive correlations between the EDE-QS and the original EDE-Q, other measures of eating disorder pathology, and measures of comorbid psychopathology, and strong negative correlations with measures of quality of life. It was also hypothesised that the frequency of eating disorder behaviours would be comparable for the EDE-QS and the EDE-Q. A secondary aim of the study was to compare EDE-QS scores between people with and without a current eating disorder and thereby determine the measure’s sensitivity for differentiating between these subgroups.

Study 1

The purpose of this study was to develop a short version of the EDE-Q, which could be used for sessional outcome monitoring.

Methods

Participants and procedures

This study obtained ethical approval from a National Health Service (NHS) ethics committee. EDE-Q data for 489 patients attending three UK Eating Disorders Services between April 2008 and January 2013 were included. Informed consent was not sought for this archival sample and patient data were anonymised and de-identified prior to analysis. One service had administered and collected EDE-Q version 6.0 (28 items, N = 297), whereas the other two services had used an older, 36-item version (N = 192). The main difference was the removal of frequency questions about subjective binge eating and diuretic misuse from version 6.0. Therefore, responses to the 36-item EDE-Q were mapped onto the latest version and the samples were combined.

The final sample included in- and outpatients. The majority was female (90.2%) and age ranged from 18 to 72 years (M = 31.5, SD = 11.5). The Global EDE-Q scale scores ranged from 1.4 to 6 (M = 4.2, SD = 1.2). Probable DSM-5 diagnoses [27] were derived from EDE-Q responses (see S1 Appendix for diagnostic methods employed). Sixteen percent of respondents were identified as probable anorexia nervosa (AN)–restrictive subtype, 15% as probable AN–binge/purge subtype, 21% as probable bulimia nervosa (BN), 18% as probable binge eating disorder (BED) and 30% as probable other specified feeding and eating disorders (OSFED). Mean Body Mass Indices (BMI) were 14.23 (SD = 1.7) for AN–restrictive subtype, 14.79 (SD = 1.5) for AN–binge/purge subtype, 24.83 (SD = 7.8) for BN, 37.23 (SD = 13.8) for BED and 27.27 (SD = 13.6) for OSFED.

Measures

EDE-Questionnaire.

The current version of the EDE-Q (6.0) comprises 28 items. The 22 scaled items are categorised into four subscales: Restraint (5 items), Eating Concern (5 items), Shape Concern (8 items) and Weight Concern (5 items). Scores on each item range from “0” to “6”, with higher scores indicating higher symptom levels. As the subscales have varying numbers of items, subscale scores are calculated as average scores per item, also ranging from “0” to “6”. Based on Fairburn and Beglin’s (2008) instructions, the EDE-Q’s global score is commonly obtained by taking the mean of the four subscale means. Because the subscales differ in length, this approach weights some items more heavily than others. However, in view of concerns regarding the validity of the EDE-Q subscales [12, 14, 19], the global score for the current study was derived from the mean score of the 22 scaled questionnaire items, which also meant that all items received equal weighting. Items 13–18 of the EDE-Q elicit open responses to the frequency (number of times or days) of specific eating behaviours, such as objective binge eating (OBE), self-induced vomiting (SIV), laxative use (LAX) or excessive exercise (EX), over the last 28 days. These are not included in the subscale scores.
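The difference between the two global scoring rules can be illustrated with a minimal sketch. The item names and the item-to-subscale mapping below are hypothetical and chosen only to make the weighting effect visible; they are not the actual EDE-Q item assignment.

```python
from statistics import mean

# Hypothetical mapping for illustration only (NOT the real EDE-Q assignment):
# a short subscale of two items and a longer subscale of four items.
SUBSCALES = {
    "short_subscale": ["q1", "q2"],
    "long_subscale": ["q3", "q4", "q5", "q6"],
}


def global_mean_of_subscales(responses):
    """Conventional rule: mean of the subscale means."""
    return mean(
        mean(responses[item] for item in items) for items in SUBSCALES.values()
    )


def global_mean_of_items(responses):
    """Rule used in this study: unweighted mean of all scaled items."""
    items = {item for group in SUBSCALES.values() for item in group}
    return mean(responses[item] for item in items)


# Toy respondent: maximum scores on the short subscale, zero elsewhere.
responses = {"q1": 6, "q2": 6, "q3": 0, "q4": 0, "q5": 0, "q6": 0}
print(global_mean_of_subscales(responses))  # items in the short subscale count more
print(global_mean_of_items(responses))      # every item counts equally
```

In this toy case the subscale-based rule gives a higher global score than the item-based rule, because each item in the two-item subscale carries twice the weight of an item in the four-item subscale.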

In this archival sample, Cronbach alpha for the global score was 0.90 and ranged from 0.70 for weight concern to 0.80 for shape concern.
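For readers unfamiliar with the statistic, Cronbach's alpha can be computed from an item-by-respondent score table with the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The sketch below uses made-up toy data, not the study data, and the function name is an illustrative choice.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])  # number of items
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data: three respondents, two items that agree perfectly.
toy_scores = [[0, 0], [1, 1], [2, 2]]
print(cronbach_alpha(toy_scores))
```

Perfectly consistent items, as in the toy data, yield the maximum value of 1; values fall as items become less correlated with one another.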

Statistical Analyses

The psychometric properties of the EDE-Q were explored using the Rasch model [28, 29]. Rasch analysis was chosen as it can be used to assess the appropriateness of a questionnaire’s rating scale, identify redundant or “misfitting” items for deletion and identify those items for retention that are sensitive to variance across the range of eating disorder severity. Due to the questionable validity of the EDE-Q’s subscales and inconsistent results with regards to number of factors and associated items in the literature (e.g., [19, 20]), an exploratory principal component analysis (PCA) was conducted first to derive the dimensions of the EDE-Q in our sample. Rasch analysis was carried out separately on each dimension to satisfy the model’s assumption of unidimensionality [30, 31]. A survey of mental health professionals obtained ratings on the importance of each EDE-Q item, i.e. how clinically meaningful scores or changes in score on each item are perceived to be. Information from the exploratory PCA, Rasch Modelling and expert survey was considered in conjunction to make decisions on the inclusion and exclusion of items.

Exploratory PCA.

An exploratory PCA was carried out in SPSS 21 with oblimin (oblique) rotation to allow factors to correlate. Only the scaled EDE-Q items were included in the analyses. Less than 5% of data were missing for each scaled item, and these data were missing completely at random (Little’s MCAR test χ²(741) = 754.79; p = .35). Missing values were imputed using the Expectation Maximisation method. Factors with eigenvalues above 1 were retained.

Rasch Analysis.

The use of Rasch analysis was considered to be of particular importance as it is a good method for examining the appropriateness of a questionnaire’s rating scale and for identifying those items that are less useful as well as those that are more valuable for measuring a scale’s construct [32, 33].

Winsteps software was used (version Bond&FoxSteps [32]). The polytomous Rasch rating scale model was applied because the EDE-Q’s response scale is ordinal with seven response options.

1. Response Scale Properties

As a first step, the characteristics of the questionnaire’s rating scale were examined for each factor to assess whether their response categories were meaningful and informative. Rating scale criteria, as set out by Linacre [34], were examined:

At least 10 responses should be present in each response category.
There should be a regular distribution of responses across response categories.
There should be a consistent increase of average measures with each category.
Category thresholds should increase monotonically, ideally by at least 1.4 logits but no more than 5 logits [32]. This was also inspected visually by examining the probability curves for each factor. The individual curves should show distinct peaks for each category, indicating that each is the most probable response for some part of the eating disorder pathology [32].
Category outfit mean square values should be less than 2.

Violation of these criteria prompted collapsing of categories. The rating scale diagnostics and probability curves of the collapsed models were then compared to the original to identify the optimal number of response categories [32]. Person and item separation indices were inspected to assess whether collapsing of response categories improved the reliability of persons and items. According to Bond and Fox, these indices should have values of at least 2 [32].
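The numeric criteria listed above can be screened mechanically once the category statistics have been exported from the Rasch software. The sketch below is one possible way to do so; the function name, argument layout and toy values are assumptions for illustration and do not reflect Winsteps output formats. The "regular distribution" criterion is left to visual judgement and is not coded.

```python
def rating_scale_problems(counts, avg_measures, thresholds, outfit_mnsq):
    """Return a list of violated rating-scale criteria (empty if none)."""
    problems = []
    if any(c < 10 for c in counts):
        problems.append("fewer than 10 responses in a category")
    if any(b <= a for a, b in zip(avg_measures, avg_measures[1:])):
        problems.append("average measures do not increase")
    # Distances between adjacent category thresholds, in logits.
    gaps = [b - a for a, b in zip(thresholds, thresholds[1:])]
    if any(g <= 0 for g in gaps):
        problems.append("disordered thresholds")
    elif any(g < 1.4 or g > 5 for g in gaps):
        problems.append("threshold spacing outside 1.4 to 5 logits")
    if any(m >= 2 for m in outfit_mnsq):
        problems.append("category outfit mean square of 2 or more")
    return problems


# Toy example: one sparse category and one reversed threshold.
print(rating_scale_problems(
    counts=[50, 8, 40, 60],
    avg_measures=[-1.0, -0.2, 0.5, 1.3],
    thresholds=[-1.0, -1.5, 2.0],
    outfit_mnsq=[1.0, 1.1, 0.9, 1.2],
))
```

A non-empty result would correspond to the situation described in the text in which collapsing of categories is considered.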

2. Model Fit

Items that have little predictive value and obtain unexpected ratings are said to misfit the model and introduce random variability into the data. Mean square infit and outfit values were used to assess this with acceptable values between 0.7 and 1.40 [32]. The item-measure correlation was also investigated. Values greater than 0.3 demonstrate that the item is sufficiently correlated to the overall concept or model [35]. Poorly fitting items were considered for deletion.
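As a rough illustration of what the fit statistics measure, infit and outfit mean squares can be computed from observed responses, model-expected responses and model variances using their standard definitions: outfit is the mean of squared standardised residuals, and infit is the information-weighted version (sum of squared residuals divided by the sum of model variances). The function names and toy values below are illustrative assumptions; in practice these quantities are produced by the Rasch software.

```python
def fit_mean_squares(observed, expected, variances):
    """Return (infit, outfit) mean squares for one item."""
    sq_resid = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Outfit: unweighted mean of squared standardised residuals.
    outfit = sum(r / w for r, w in zip(sq_resid, variances)) / len(sq_resid)
    # Infit: squared residuals weighted by the model variance.
    infit = sum(sq_resid) / sum(variances)
    return infit, outfit


def acceptable_fit(infit, outfit, low=0.7, high=1.4):
    """Apply the 0.7 to 1.40 acceptance range to both statistics."""
    return low <= infit <= high and low <= outfit <= high


# Toy data: residuals exactly as large as the model predicts.
infit, outfit = fit_mean_squares(
    observed=[1, 0, 1, 0],
    expected=[0.5, 0.5, 0.5, 0.5],
    variances=[0.25, 0.25, 0.25, 0.25],
)
print(infit, outfit, acceptable_fit(infit, outfit))
```

Values near 1 indicate that responses vary about as much as the model expects; values well above the range indicate unpredictable responding, and values well below it indicate overly predictable responding.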

3. Redundant Items

Residual correlations between items within a scale were examined for local dependency, i.e. whether responses to one item depend on or can be predicted by responses to another item, which implies item redundancy [33]. Residual item correlations more than 0.3 above the overall average of all residual correlations suggest local dependence [36]. Where this applied, deletion of one of the dependent items was considered.

4. Eating Disorder Severity

Each item’s difficulty estimate was calculated to select those items from each subscale that capture both mild and more severe eating disorder symptoms. Items that showed a strong overlap of difficulty (i.e., differences <0.20) were considered for deletion [37].
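The overlap rule can be expressed as a simple pairwise comparison of item difficulty estimates. The sketch below assumes difficulties are available as a mapping from item label to logit value; the item labels and values are invented for illustration.

```python
from itertools import combinations


def overlapping_items(difficulties, min_gap=0.20):
    """Return item pairs whose difficulty estimates differ by less than min_gap logits."""
    pairs = combinations(sorted(difficulties.items()), 2)
    return [(a, b) for (a, da), (b, db) in pairs if abs(da - db) < min_gap]


# Hypothetical difficulty estimates (logits) for three items.
toy_difficulties = {"item_a": -0.50, "item_b": -0.45, "item_c": 0.30}
print(overlapping_items(toy_difficulties))
```

Each flagged pair targets nearly the same severity level, so one member of the pair would be a candidate for deletion, subject to the other criteria described in this section.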

Expert Survey.

A link to an online survey was emailed to mental health clinicians in the eating disorder field known to the authors. Experts were asked to categorise each EDE-Q item as “least important”, “very important–might be good to include” or “most important–needs to be included”. These categories were given values from 0 to 2 and summed across clinicians for each EDE-Q item. The resulting total scores, which could range from 0 to 20, were used as an overall rating of each item’s importance for assessing clinical change. Ten clinicians with at least six years’ experience of working with people with eating disorders completed the survey. Professionals from different countries (including the UK, USA, Canada and Australia) were invited to participate, but country of residence was not recorded in the survey and is therefore not available.
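The expert importance score for an item is a plain sum of coded ratings. A minimal sketch, using shortened category labels and invented ratings rather than the survey data:

```python
# Shortened labels for the three survey categories, coded 0 to 2.
RATING_VALUES = {"least": 0, "very": 1, "most": 2}


def importance_score(ratings):
    """Sum the coded ratings given to one item by all clinicians."""
    return sum(RATING_VALUES[r] for r in ratings)


# Invented ratings from ten clinicians for a single item.
toy_ratings = ["most"] * 6 + ["very"] * 3 + ["least"]
print(importance_score(toy_ratings))
```

With ten raters and a maximum code of 2 per rater, the score is bounded by 0 and 20, which matches the range stated in the text.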

Results

Exploratory PCA

The exploratory PCA suggested a five-factor model with distinct and reliable factors (KMO = 0.874). This was further supported by the scree plot. Bartlett’s test of sphericity (χ² = 5,289.84; p < .001) indicated that the data were appropriate for PCA.

Factor 1 explained 33.01% of the total variance and Factor 2 added 13.04% of variance. Both consisted of six items. Factor 3 (4 items) explained 6.53% of additional variance. Factor 4 added 5.34% of explained variance and consisted of only two items, and Factor 5 (4 items) explained 4.98% of variance. None of the factors replicated the original EDE-Q subscales. Factors 2 and 3 resembled the original Shape Concern and Dietary Restraint subscales, respectively, but also contained items from other subscales. Factors 1, 4 and 5 differed substantially from the original subscales (see Table 1 for individual factor loadings). Factor 4, consisting of two items regarding secret and guilty eating, which are two criteria for a binge eating episode, resembled an indirect index of binge eating episodes.
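The reported variance percentages can be related back to the eigenvalue-above-1 retention rule with some simple arithmetic. The sketch below assumes the PCA was run on the correlation matrix of the 22 scaled items (an assumption, not stated in the text), in which case total variance equals the number of items and each eigenvalue is the explained proportion multiplied by 22.

```python
N_ITEMS = 22  # scaled EDE-Q items entered into the PCA

# Percentage of total variance explained, as reported for the five factors.
reported_pct = [33.01, 13.04, 6.53, 5.34, 4.98]

# Assumption: standardised items, so total variance equals the item count.
implied_eigenvalues = [pct / 100 * N_ITEMS for pct in reported_pct]

# An eigenvalue of exactly 1 corresponds to this share of total variance.
kaiser_cutoff_pct = 100 / N_ITEMS

cumulative_pct = sum(reported_pct)

print([round(e, 2) for e in implied_eigenvalues])
print(round(kaiser_cutoff_pct, 2), round(cumulative_pct, 2))
```

Under this assumption, all five implied eigenvalues exceed 1 (the smallest factor's 4.98% lies just above the cut-off of about 4.55%), and the five factors together account for about 62.9% of the total variance.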

Table 1. Summary of PCA, Rasch analysis, expert survey and diagnostic relevance.

https://doi.org/10.1371/journal.pone.0152744.t001

Rasch analysis

As the fourth factor comprised only two items, it was not included in a separate Rasch analysis [35, 38].

Rating scale diagnostics showed that responses across categories were not evenly distributed (e.g., rating scale category 2 held consistently less than 10% of responses) as respondents most commonly selected the most extreme response options (“no days” and “every day”). Further, all categories had disordered category thresholds, which was also clearly visible from the probability curves (see Fig 1 for an example). This implies that the given categories are not selected in a way consistent with the respondents’ severity of eating disorder. For example, more severely impaired persons may have selected response option three (13–15 days), whereas only mildly impaired people chose option four (16–22 days). The distinctions between the individual response categories might not be meaningful to participants and it suggests that there may be too many categories to choose from.

Fig 1. Response probability curve with original 7-point response options (Factor 5).

https://doi.org/10.1371/journal.pone.0152744.g001

To shorten the scale, response options 1 and 2, options 3 and 4, and options 5 and 6 respectively were combined. This produced a four-point response scale, which included values ranging from zero to three (0112233). This was applied across all factors as it was considered essential that all items of the short form use the same response scale. The revised four-point response scale demonstrated improved category thresholds, distribution of response frequencies and probability curves across all factors (see Table 2). All but one factor now showed ordered category thresholds and probability curves showed more distinct peaks (see Fig 2). Probability curves improved markedly but still showed respondents’ tendency to endorse the extreme points of the questionnaire, i.e. “no days” and “every day”.
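The collapsing scheme (0112233) amounts to a lookup from each original response option to its new category. A minimal sketch, with illustrative names:

```python
# Original 7-point option (0-6) -> collapsed 4-point category (0-3),
# following the 0112233 scheme: 0 stays 0; 1-2 -> 1; 3-4 -> 2; 5-6 -> 3.
RECODE = {0: 0, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3}


def collapse(responses):
    """Recode a sequence of 7-point responses onto the 4-point scale."""
    return [RECODE[r] for r in responses]


print(collapse([0, 1, 2, 3, 4, 5, 6]))
```

Applying the same mapping to every item keeps a single response scale across the short form, as the text requires.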

Fig 2. Response probability curve with collapsed 4-point response options (Factor 5).

https://doi.org/10.1371/journal.pone.0152744.g002

Table 2. Rating scale diagnostics, reliability indices and visual inspection of probability curves for original and collapsed 4-point rating scale.

https://doi.org/10.1371/journal.pone.0152744.t002

A high person separation index indicates that there is a good spread of responses, or in this case eating disorder pathology amongst the sample, which is likely to lead to consistent responding over time. Collapsing of response categories led to a slight reduction in the person separation indices for each factor. It was however decided to prioritise ordered thresholds over an already low person separation index. The item separation indices, which determine if items are responded to in the same way if given to a different sample, also reduced but remained above the recommended threshold of 2 [32].

Expert survey

Ratings of items’ ability to indicate clinically significant change ranged from 0 to 15. Please refer to Table 1 for experts’ individual item ratings.

Combination of methods

Table 1 summarises the results of the principal component analysis, Rasch model and expert survey for the scaled questionnaire items, alongside decisions and rationales for deleting individual items.

Frequency items.

The frequency items were inspected in a similar fashion, investigating expert ratings and diagnostic relevance. Item 15 had high overlap in content with items 13 and 14. As the latter were rated higher by experts, item 15 was removed. Item 14 refers to a loss of control over eating. This has been shown to be a better predictor of eating disorder pathologies than objective binge eating [39]. In order to have an independent item on perceived loss of control over eating [40] as well as a measure of objective binge eating, the order of items 13 and 14 was reversed. Respondents are therefore asked about perceived loss of control first, which is followed by a question on objective binge eating episodes. Items 16 and 17 refer to compensatory behaviours (i.e. taking laxatives and vomiting) and were combined into a single item.

Since the study aimed to develop a measure suitable for sessional outcome measurement, which is likely to be weekly, the reference period was changed from 28 days to seven days. To reduce missing responses and simplify coding of the frequency items, a Likert-scale response format was adopted. This resulted in the 12-item EDE-QS (see S2 Appendix), which, unlike the original EDE-Q, consists of a single scale.

Discussion

The aim of study 1 was to develop a short form of the EDE-Q from questionnaire responses of people presenting to mental health services with a wide range of eating disorders. By combining statistical and expert based methods, we sought to develop a shortened version of the EDE-Q that is psychometrically and conceptually sound [41]. The exploratory PCA produced a five-factor model that did not replicate the original EDE-Q subscales, consistent with previous studies [14, 19, 20, 42].

Altering the response categories to a four-point rating scale, based on Rasch analysis, improved the functioning of the scale, although a tendency of respondents to endorse the extreme points of the scale remained. Changing the reference time period of the scale to seven days was intended to improve accuracy of recall and permit evaluation of change over a shorter time frame. Combining statistical analyses and expert ratings resulted in the 12-item EDE-QS. The second study was conducted to evaluate the psychometric properties of the new measure, including its convergent and divergent validity, internal consistency, test-retest reliability and sensitivity in discriminating between people with and without eating disorders.

Study 2

The aim of this study was to validate the EDE-QS, the newly developed short version of the EDE-Q.