Problems in Cross-Cultural Use of the Hospital Anxiety and Depression Scale: “No Butterflies in the Desert”

  • Gemma A. Maters,

    Affiliation Health Psychology Section, Department of Health Sciences, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands

  • Robbert Sanderman,

    Affiliation Health Psychology Section, Department of Health Sciences, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands

  • Aimee Y. Kim,

    Affiliation Interdisciplinary Studies in Human Development, Graduate School of Education, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America

  • James C. Coyne

    Affiliations Health Psychology Section, Department of Health Sciences, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands, Institute for Health, Health Care Policy and Aging Research, Rutgers, the State University of New Jersey, New Brunswick, New Jersey, United States of America

The Hospital Anxiety and Depression Scale (HADS) is widely used to screen for anxiety and depression. A large literature is citable in support of its validity, but difficulties are increasingly being identified, such as inexplicably discrepant optimal cutpoints and inconsistent factor-structures. This article examines whether these problems could be due to the construction of the HADS that poses difficulties for translation and cross-cultural use.


Authors’ awareness of difficulties in translating the HADS was assessed by examining 20% of studies using the HADS, obtained by a systematic literature search in PubMed and PsycINFO in May 2012. Reports of use of translations and validation studies were recorded for papers from non-English speaking countries. Narrative and systematic reviews were examined for how authors dealt with different translations.


Of 417 papers from non-English speaking countries, only 45% indicated whether a translation was used. Studies validating translations were cited in 54%. Seventeen reviews, incorporating data from diverse translated versions, were examined. Only seven mentioned issues of language and culture, and none indicated insurmountable problems in integrating results from different translations.


Initial decisions concerning item content and response options likely leave the HADS difficult to translate, but we failed to find an acknowledgment of problems in articles involving its translation and cross-cultural use. Investigators’ lack of awareness of these issues can lead to anomalous results and difficulties in interpretation and integration of these results. Reviews tend to overlook these issues and most reviews indiscriminately integrate results from studies performed in different countries. Cross-culturally valid, but literally translated versions of the HADS may not be attainable, and specific cutpoints may not be valid across cultures and languages. Claims about rates of anxiety and depression based on integrating cross-cultural data or using the same cutpoint across languages and cultures should be subject to critical scrutiny.


The Hospital Anxiety and Depression Scale [1] is one of the most widely used questionnaires in clinical and health psychology worldwide, outside of the United States where it has not won as much favor. It has been translated into 78 languages [2] for use in both western and non-western countries. The HADS is the most frequently used measure of mood disturbance in cancer care, where it has been applied in two-stage screening, assessment of severity of mood disturbance, and for validation of other measures [3]. It was originally designed for clinicians with the aim of providing a short screening instrument assessing psychopathology in non-psychiatric medical patients. Based on the assumption that scores on existing mood scales were confounded with somatic complaints in medically ill patients, the developers of the HADS excluded items seen as overlapping with symptoms of a somatic disorder [1]. Explicit reference to psychiatric symptoms was avoided and colloquial British English was chosen for some items, notably “I get a sort of frightened feeling like ‘butterflies’ in the stomach”, and response options varied across items in terms of both wording and keying. The Depression subscale (7 items) was based mainly on symptoms of anhedonia, rather than depressed mood, because the authors assumed that the former symptoms would respond better to antidepressants. The Present State Examination [4], together with research into clinical manifestations of anxiety neurosis [5], provided the basis for the 7-item Anxiety subscale. Reviews of the psychometric properties of the HADS have generally concluded that it has adequate sensitivity, case finding ability, concurrent validity and internal consistency [6], [7].

The HADS continues to enjoy international use and wide endorsement as one of the best available measures of depression and anxiety both for screening purposes and assessment of symptom severity, but several difficulties are now being identified in the same literature. It is our purpose to explore the implications of these difficulties for translation and cross-cultural use and to evaluate whether investigators have handled the HADS with appropriate sensitivity to these issues. The goal is to evaluate whether non-equivalence of the HADS across languages and cultures might explain the reported problems in the generalizability of cutpoints and consistency of factor-structures.

Issues Raised in the Recent Literature Concerning the HADS

Vodermaier et al’s [8] review noted a troublingly broad, inconsistent range of optimal cutoffs obtained across studies, ranging from 8–22 for total score and 5–11 for depression and anxiety subscales. Singer et al [9] also noted varying cutpoints between studies for the depression subscale, and suggested re-calculation of different cutpoints for distinct groups of patients. Carey et al [10] reported a wide range of recommended thresholds in their recent review of validation studies performed in cancer patients. A Danish study [11] unexpectedly found lower mean HADS scores in a sample of breast cancer patients relative to women of the general population, a result that challenges either the presumed greater levels of depression among cancer patients than in the general population, or the validity of the HADS as a means of establishing relative levels of depression.

Cosco et al’s recent review [12] of 50 studies concluded that factor-structures of the HADS varied across studies and within populations, with the particular factor solutions ranging from one to four factors, with findings dependent upon the specific analytic strategy employed. Inconsistencies were greatest with cancer patients, the medical population in which the HADS is the most widely used measure of anxiety and depression. Cosco et al concluded that the original intention of the HADS having a two factor-structure distinguishing between anxiety and depression had not been achieved, and that the HADS should be interpreted as an assessment of emotional distress that does not distinguish between anxiety and depression. Cosco et al recommended that “the absence of psychometric robustness suggests that researchers should interpret subscale scores with caution or use the total score.”

In a pair of commentaries Coyne and Van Sonderen [13], [14] accepted Cosco et al’s conclusions concerning the basic factor-structure of the HADS, but disputed a recommendation for continued use of the HADS as a screening instrument, noting the inconsistencies in the cutpoints that were obtained within and across populations. They proposed that some problems might stem from decisions made in construction of the HADS, and particularly its deliberately varying response keys. They noted the consistently anomalous factor loading of item 7 (‘I can sit at ease and feel relaxed’), pointing out that it is a positively valenced item, but with a reversed response key and different anchors than the item that just preceded it. Coyne and Van Sonderen [14] expressed doubt that even an exceedingly alert patient would notice and be responsive to these changes in what was being asked. To answer consistently with the intention of the design of the HADS, patients would have to be attentive to sudden changes back and forth between positive and reverse worded items and in the available response options:

six items alternate between positive and reverse worded items indicating negative affect, but the seventh item breaks with this pattern. Furthermore, going from item to item, the first available response option shifts from “most the time,” to “definitely as much,” to “very definitely and quite badly” to “as much as I always could,” to “a great deal of the time,” to “not at all,” to “definitely.” The “not at all” is for the item “I feel cheerful” and the “definitely” is for the item “I can sit at ease and feel relaxed.” A number of items are ambiguous as to whether they refer to actual level of negative affect or to a comparison with ‘usual’.

We would add that when it comes to translating the HADS, it might prove difficult to preserve the comparability of positive versus reverse worded items, as well as the equivalence of the varying response key options across languages. Paralleling the problems of patients completing the HADS, translators might simply overlook these transitions, fail to capture them adequately in a second language, or they might improvise in an effort to compensate for problems that were recognized.

Four Different Dutch Versions of the HADS

Our concerns about translation and cross-cultural use of the HADS were prompted when we discovered four different Dutch versions of the HADS [15]–[18]. The four Dutch versions have different content for five (items 5, 7, 9, 11 and 13) of the 14 items, different response options in nine items (items 1, 2, 3, 4, 5, 7, 10, 11 and 14), different ranges of scores (0–4 or 1–3) and different timeframes (one week versus four weeks). Yet, we could find no indication in the published studies depending on a Dutch translation of the HADS that these multiple versions existed or which version was used, either among primary research studies using one of the versions, or in secondary discussions or integrations of results of the primary studies. The finding of four different Dutch versions was worrisome because the distinctions between these versions could conceivably prove substantial and there is little reason to presume that results could be generalized from one version to another. For generalizability across these four translations to hold, it would have to be assumed that results were not substantially influenced by differences in content, response options, or time frames. It would be extraordinary if this were the case. Thus, recommendations for cutpoints for Dutch translations are highly unlikely to generalize across versions, and integration of data from Dutch versions with the original English version or translations into other languages is likely problematic, particularly if the goal is identification of a cross-culturally valid cutpoint. We sought to determine how the translation of the HADS is being handled in other languages, whether potential problems were noted, and how they were being addressed.

Challenges in Translating the HADS

A review of translation methods by MAPI Research Trust [19] concluded that recommendations for cross-cultural translations of questionnaires need further development and that a multistep approach was needed to obtain high quality translations. A checklist was provided in the review to assess the methods used in a translation process and to list actions taken. Producing a dependable, high quality translation is costly and labor intensive [19], [20]. Several other papers have already addressed the complexity of producing a high quality translation, ways to reach equivalence across different languages and cultures, and problems that might arise in the translation process [21]–[24]. Adequate cultural adaptations of instruments are not easily achieved and, because questionnaires are usually not designed in anticipation of the issues posed by translation, it is difficult to ensure that items in a translated instrument are conceptually equivalent to the original version [25]. If not addressed carefully, the influence of language or culture might manifest in each of several ways. One possibility is a shift in mean scores. Another possibility is diminished validity, because the translated item measures something other than what was intended in the original version [26], as reflected in different validity correlates. Subtle differences between questionnaires caused by translation of items or response options could lead to incomparable cutpoints.

MAPI Research Trust in France is responsible for the distribution of HADS translations. They state “the author has selected MAPI Institute as exclusive linguistic validation company to ensure the production of harmonized and consistent language versions” [2]. Yet, MAPI did not carry out all translations and validations. The original developers of the HADS intended to make the items easy to translate into other languages [27]. But the question is whether they succeeded, and whether the apparent benefits of the reliance on colloquial British English for construction of items remain when the instrument is translated into other languages. An earlier guideline by Brislin et al [28] cautioned against use of colloquialisms in a questionnaire because of the risk of subsequent difficulties in achieving an equivalent translation. So, reliance of the developers of the HADS on colloquial British language complicates the translation process, in addition to the existing complexity of achieving an adequate translation in itself.

In preliminary work, we had sent an email inquiring about translation procedures to a sample of investigators. As anticipated, several of the colloquial items turned out to be difficult to translate into some languages. For instance, considerable effort was put into translating the item “I get a sort of frightened feeling like ‘butterflies’ in the stomach” into the Omani Arabic dialect. The investigators recognized that they had to capture the intended feeling, and chose to do this in the audiotaped delivery of the item. In addition, the author of an Arabic version explained to us: “A lot of difficulties because this question of the butterflies appeared not only strange but rather funny to many Arabic-speaking individuals”. Translation of the response options turned out to be difficult in some languages as well. This is what the author of a Punjabi version replied: “The response options were difficult to translate, to get appropriate gradations between ‘all of the time’ and ‘most of the time’. The same word was commonly used in Punjabi for both of these responses”. Although not systematic, this preliminary work encouraged us to look further into the awareness of these issues on the part of investigators who were using translated versions and in reviews that integrated results from translated versions with results from the original English version. In sum, we had obtained preliminary indications that the HADS is not as easy to translate as intended by the developers and that unacknowledged problems might exist in translated versions.

Cultural Awareness of Investigators in their Usage of the HADS

We next looked for remarks in the HADS literature concerning problems that might have been caused by using translated versions. Surprisingly few concerns were expressed in the literature about the use of translated versions of the HADS. Herrmann [6] noted that scores on translated versions of the HADS might be influenced by cultural factors. As one of the possible explanations for the diverging thresholds, Carey et al [10] referred to the translated versions of the HADS used cross-culturally in the studies they reviewed. They noted that only one of the ten studies validating the HADS for use with cancer patients had used the original English-language version and that different translations might yield different factor-structures and optimal cutpoints. A study by Martin et al [29] in patients with coronary heart disease in three different countries suggested a three-factor structure. But the factor-structure turned out to be different among the three countries. Wang et al [30] identified issues in factor-structure in the Chinese version of the HADS as possibly caused by difficulties in the translation of the HADS into Chinese. Similarly, a study by Chan et al [31] indicated a two-factor structure, but also the loading of item 7 (‘I can sit at ease and feel relaxed’) on depression. Citations indicate the same Chinese translation was used in both studies. El-Rufaie and Absood [32] concluded that differences in cutpoints of the Arabic HADS relative to the English version might have been caused by linguistic or cultural factors. Research conducted in Oman [33] compared HADS scores with the results of the Composite International Diagnostic Interview in patients with traumatic brain injury and found a sensitivity of 53.8% and specificity of 75.9%, but with an optimal cutpoint of only four. It was concluded that the poor performance of the HADS might have arisen in the process of translating the questionnaire into the Omani dialect. Chaturvedi [34] pointed out how the results of studies with translated HADS versions in Asian participants could have been affected by cultural differences, commenting on a paper by Nayani [35]. In their recent review, Cosco and colleagues [12] acknowledge the possibility of translation issues causing heterogeneity in factor-structures. But the tables and source papers in their article suggest different HADS language versions were nonetheless integrated.

Overall, few concerns were expressed about the use of translated HADS versions and the consequences of their use, which made us question whether investigators were aware of these problems. We wanted to know whether investigators who used a translation of the HADS identified the source of the translated version and the measures taken to ensure proper validation. Van Widenfelt et al [21] observed that quite often articles fail to report the origin of translated questionnaires. In addition, we examined whether authors of reviews integrate data from diverse cultures and translations and acknowledge difficulties in doing so. While the HADS is our specific focus, other instruments, particularly those constructed in colloquial language, might pose the same issues when translated and used cross-culturally.

Methods and Results

Reports of HADS Translations and Validity Studies in Papers Originating from Non-English Speaking Countries

We were encouraged to examine how explicitly and accurately investigators reported in their articles on the translated version of the HADS, its provenance or, if it was translated by the investigators themselves, how validity was assured. A comprehensive search was performed in the PubMed and PsycINFO databases in May 2012. Keywords were (“HADS”) OR (“HAD scale”) OR (“hospital anxiety” AND “depression” AND (“scale” OR “scales” OR “score” OR “scores” OR “subscale” OR “subscales” OR “sub-scale” OR “sub-scales”)). After removal of duplicates, and citations for book chapters and comments and letters to the editor, 4555 references were left. To reduce the scope of the task, every fifth abstract (20%) of those remaining was examined by one of the authors (GAM) and a research assistant. They examined 913 abstracts and removed references to papers that were written in a language other than English (79) or in which the HADS had not actually been used, but only cited (4). For the remaining 830 abstracts, the country in which the research was conducted was recorded. For 15 papers the country could not be determined, because it was not mentioned in the abstract and the full text was not available either on the web or through interlibrary loan. A total of 345 articles originated from an English speaking country; of these, 69% (237 papers) originated from the UK. Other identified English speaking countries were the USA, Australia, Canada and New Zealand.
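
The screening counts in this paragraph can be checked with a few lines of arithmetic. The sketch below is only a consistency check on the figures reported in the text; the variable names are ours and are not part of the original study materials.

```python
# Counts as reported in the text; variable names are illustrative only.
sampled_abstracts = 913
not_in_english = 79          # papers written in a language other than English
hads_only_cited = 4          # HADS cited but not actually used
country_unknown = 15
english_speaking = 345
uk_papers = 237
non_english_speaking = 470   # reported in the following paragraph


def pct(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


# Abstracts for which the country was recorded.
eligible = sampled_abstracts - not_in_english - hads_only_cited
# Papers for which the country could be determined.
with_country = eligible - country_unknown

print(eligible)                                  # 830
print(with_country)                              # 815
print(pct(uk_papers, english_speaking))          # 69
print(pct(non_english_speaking, with_country))   # 58
```

The papers with a known country (815) split exactly into the 345 from English speaking countries and the 470 from non-English speaking countries, so the reported percentages are internally consistent.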

A total of 470 papers originated from non-English speaking countries (58%). The 417 available full texts of these papers were examined. Country, indications of the use of translated HADS versions and documentation on the source of a translated version and its validity were recorded. Table 1 shows the results of our examination. The first column indicates the country in which the study was executed. Per country, the total number of papers examined is reported (second column), as are the reports of authors on the language version used (third column). The number of citations of studies validating a particular language version and the number of citations of the 1983 study by Zigmond and Snaith [1] are indicated in the last two columns. As a specific illustration using results presented in Table 1, 34 papers originating from Norway were examined in total. Yet, only 9 of the 34 papers indicated that a Norwegian version of the HADS was used. The other 25 papers reported nothing about the version used. In the Norwegian case, 30 of 34 papers cited Zigmond and Snaith, but only 14 of 34 papers cited a validation study of a Norwegian version of the HADS.

Table 1. Reports of translated HADS versions used, citations of the Zigmond and Snaith 1983 study and citations of validation studies with non-English HADS versions, by investigators in non-English speaking countries.

On the whole, explicit reports of the use of a translated version of the HADS were outnumbered by articles making no statements at all about the version used; in only 45% did investigators state that they used a translated version of the HADS in their study and indicate the language. Of all papers from non-English speaking countries, 46% did not cite a validation study in the language of their country and 13% did not even cite Zigmond and Snaith [1]. In conclusion, although the HADS was frequently used in non-English speaking countries, fewer than half of the papers originating from these countries reported which particular HADS version was used, and only slightly more than half cited a validation study in the language into which the HADS was translated.

Integration of Data from Different Language Versions of the HADS in Reviews

Seventeen reviews, including two meta-analyses [3], [6]–[8], [12], [36]–[47], that integrated studies with at least two different language versions of the HADS, were next extracted from our database. These papers were examined by GAM and AYK for the strategies that the authors reported to deal with different language versions, the way different versions were compared and reports of possible problems and corrective actions concerning language or culture. Table 2 summarizes the results of our examination. The Table shows which language versions were compared to each other, and in what way (columns 4 and 5). In the last column of the Table all comments by the authors of the reviews, if any, about language or culture are listed.

Table 2. Reports of translated HADS versions used, and of corrective actions and qualifications concerning language and culture in reviews that integrated studies with different language versions of the HADS.

Seven papers did not mention that they included studies with several different translated versions [3], [12], [36]–[40], although we could determine that they did so by examining citations and source articles. Few concerns or problems were reported about reliance on translations. Bjelland et al [7] raised concerns about the reliability of the HADS across translations, but they argued against this being a problem, citing Cronbach’s coefficient alphas of ≥.60 in all studies. However, such reliability does not establish comparability. Bjelland et al also calculated a mean cutpoint >8 on the anxiety and depression subscales for cancer patients. Yet, examining the original source papers we discovered that the mean cutpoint was calculated from one study with an Italian version, two studies with a French version, one study with a Japanese version and five studies with the original British version (including one study executed in South Africa). In the original source paper of the Italian study no specific information is provided on the origin or quality of the translation used [48]. One of the French studies reported that the HADS was translated into French by Zigmond and Snaith [1] and validated by Lepine [49] and Razavi [50], but it is difficult to evaluate the quality of the translations from the information that was provided. The Japanese study mentioned a back translated Japanese version by Kitamura [51]. Thus, examining the original source papers did not yield a clear picture of the quality of the different translated versions, and so the calculated mean cutpoint across countries could be of dubious value. Herrmann [6] warned that scores could vary across cultures, and that validity studies have not been performed for all translated versions. So, the review by Herrmann [6] stands out as an exception in which it is stated that culture or language has to be taken into account. Vodermaier et al [8] concluded there is considerable evidence for HADS validity in different languages, relative to other measures used for research in cancer care, although cutpoints differed between studies. A recent meta-analysis by Brennan et al [41] inspected the possible contribution of translation to heterogeneity, with a fixed cutpoint. Based on a diagnostic odds ratio of .72 they decided against it. On the other hand, the paper by Carey et al [42] explicitly mentioned how culture might influence HADS thresholds. And Meades and Ayers [45] referred to cultural or psychometric factors contributing to problems with the internal consistency, factor-structures and cutpoints.
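
The point that acceptable internal consistency does not establish comparability across versions can be illustrated with a small sketch. All data below are invented for illustration and do not come from any HADS study; the function name, the 7-item layout, the shifted items and the threshold of 8 are our assumptions. The second "version" simply shifts two items one response category upward, as might happen if translated response labels read as milder than the originals.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    item_variances = [pvariance([row[i] for row in rows]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses of eight people to a 7-item subscale scored 0-3.
version_a = [
    [0, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 0, 1, 1, 1],
    [1, 1, 2, 1, 1, 1, 0],
    [2, 1, 2, 1, 2, 2, 1],
    [2, 2, 2, 2, 2, 3, 2],
    [0, 1, 0, 0, 1, 0, 0],
    [1, 2, 1, 1, 1, 2, 1],
    [1, 0, 1, 1, 0, 1, 1],
]

# Hypothetical second version: the same people endorse one category higher
# on the first two items; all other responses are unchanged.
version_b = [[row[0] + 1, row[1] + 1] + row[2:] for row in version_a]

CUTPOINT = 8  # hypothetical threshold: a subscale score of 8 or more counts as a case

alpha_a = cronbach_alpha(version_a)
alpha_b = cronbach_alpha(version_b)
cases_a = sum(sum(row) >= CUTPOINT for row in version_a)
cases_b = sum(sum(row) >= CUTPOINT for row in version_b)

print(alpha_a == alpha_b)   # alpha is unaffected by a constant shift in items
print(cases_a, cases_b)     # number of "cases" at the same cutpoint
```

Because alpha depends only on variances, a constant shift in item scores leaves it unchanged, while the number of respondents at or above the fixed threshold rises from 3 to 5 of 8 in this invented example. Equal reliability coefficients therefore say nothing about whether a shared cutpoint identifies the same people.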

In sum, attention paid to translation and cross-cultural issues is limited in the reviews that we examined. The authors of most review papers indiscriminately compared results obtained with different language versions of the HADS without acknowledgment.


Our discovery of four different Dutch versions of the HADS triggered concerns over whether cross-cultural and translation issues cause problems in the wide usage and interpretation of this instrument worldwide. Our concerns were consistent with problems increasingly raised in the HADS literature concerning varying cutpoints and factor-structures. The aim of this paper was to investigate the possibility that cross-cultural and translation issues underlie the problems reported in the HADS literature.

Examination of a sample of abstracts from papers on studies using the HADS showed this questionnaire was used more often in non-English speaking countries than in English speaking countries. Thus, integrative reviews and meta-analyses of cutpoints and correlates of the HADS that do not distinguish between studies conducted in different languages are relying more on translated versions than the original English version. Yet, most papers originating from non-English speaking countries did not report the version of the HADS used, and only slightly more than half of all papers report whether it was validated in the language of the participants. In the reviews and meta-analyses we examined, cross-cultural issues were addressed in only seven of the seventeen papers [6]–[8], [41], [42], [44], [45]. Others uncritically combined studies conducted in different cultures and languages [3], [12], [36]–[40], [43], [46], [47]. Thus, cultural awareness of investigators concerning the HADS turned out to be unsatisfactory in our sample.

We believe that the inattention to problems in translating the HADS can explain at least some of the problems in varying cutpoints across studies as well as inconsistencies in factor-structure. These problems can be compounded when data from translated versions are integrated across studies in narrative and systematic reviews. However, varying cutpoints and factor-structures have been documented even when studies are limited to English-speaking populations with the unaltered original instrument, and so use of the translated HADS alone cannot explain the more pervasive problems.

This paper indicates considerable room for improvement in terms of transparency and accuracy on the part of investigators regarding the origin of the version of the HADS used. This is likely a more general issue in the reporting of studies using translated questionnaires [21]. We strongly recommend that journals publicize requirements for explicit reporting of the information concerning translation and revalidation in any cross-cultural use of the HADS or other translated questionnaires. According to the Scientific Advisory Committee of the Medical Outcomes Trust [52], for others to be able to review the quality of the translation and cultural adaptation of a questionnaire, the following information should be made available by the developers: how linguistic and conceptual equivalence were reached, whether any differences exist between the original and the new version, and how inconsistencies were dealt with. Acquadro et al [19] further provide a checklist to assess the information reported in articles concerning the process of translation and revalidation. To be able to use this checklist, detailed information is needed on the method of translation used, the translators involved and their qualifications, any communications with the developer(s) of the original version, pilot testing and “International Harmonization”. Analogously, we suggest that investigators dependent on an already translated tool report in their papers at the minimum: the language or dialect into which the HADS was translated, how the translated version was obtained, whether the quality of the translation process and the result of this process were reviewed and whether a validation study was conducted with the translated version. Lacking information on the quality of a translation and validation of a questionnaire, readers cannot be certain that problems in the language or the cross-cultural usage of the HADS did not bias or even invalidate the results of the study. Yet, published studies reviewing or using the HADS have consistently assumed that different versions are comparable enough so that any differences can be ignored.

We caution that our review was not exhaustive, but was based on a sampling of 20% of papers with results dependent on the HADS. However, our efforts meet the Black Swan criterion: we think that we have found sufficient documentation of problems in the translation and interpretation of the HADS to reject the null White Swan hypothesis of no problems in the translation or cross-cultural interpretation of the HADS. We need to start ensuring that our measures, such as the HADS, measure exactly the same thing across languages and cultures, so that we can trust comparisons of data collected in different languages.

The problems that we have identified with the cross-cultural use of the HADS may not be specific to this instrument, but endemic to translated versions of other instruments, and these problems may be even more pervasive in instruments deliberately constructed in colloquial language. The Edinburgh Postnatal Depression Scale [53] embraces British colloquial language with the item ‘Things are getting on top of me’, which must strike many Americans as odd and confusing. Similarly, the item on the Beck Depression Inventory [54], ‘I feel sad and blue’ will perplex respondents confronting the item in languages like Italian where “blue” does not have the affective connotation it has in English. Translators would seem to do best to avoid attempting literal translations of colloquialisms, but then run the risk of not being able to establish exact equivalency at the item level, and possibly the scale level. Based on the limited number of reports we obtained from investigators using the HADS cross-culturally, we suspect that considerable improvisation, and therefore inconsistency in results, also occurs in the translation of other scales.

In conclusion, we think the issues currently being raised in HADS literature concerning inexplicably varying factor-structures and cutpoints might very well be created in part or amplified by translation and cross-cultural problems. Results obtained with translated versions of the HADS should be treated with caution. Because most investigators in this study were not explicit on the way the translated version was acquired and how validation was ensured, there is no guarantee that authors handled the HADS in a proper culturally sensitive way. Our results strongly suggest that readers of published cross-cultural studies should have some skepticism about the validity of findings and that future publications should better document exactly what was done to ensure the cross-cultural validity of translated versions and generalizations from results obtained in other cultures and languages. If other questionnaires are being handled in the same way by investigators, this warning applies to these measures too.

Author Contributions

Conceived and designed the experiments: GAM RS AYK JCC. Performed the experiments: GAM AYK. Analyzed the data: GAM RS AYK JCC. Wrote the paper: GAM RS AYK JCC.


