The Global Burden of Disease (GBD) project systematically assesses mortality, healthy life expectancy, and disability across 195 countries and territories, using the disability-adjusted life year (DALY). Disability weights in the DALY are based upon surveys that ask users to rate health states based on lay descriptions. We conducted an experimental study to examine whether the inclusion or removal of psychological, social, or familial implications from a health state description might affect individual judgments about disease severity, and thus relative disability weights.
We designed a survey consisting of 36 paired descriptions in which information about plausible psychological, social, or familial implications of a health condition was either present or absent. Using a Web-based platform, we recruited 1,592 participants, who were assigned to one of two experimental groups, each of which was asked to assign a value to the health state description from 0 to 100 using a slider, with 0 as the “worst possible health” and 100 as the “best possible health.” We tested five hypotheses: (1) the inclusion of psychological, social, or familial consequences in health state descriptions will reduce the average rating of a health state; (2) the effect will be stronger for diseases with lower disability weights (i.e., less severe diseases); (3) the effect will vary across the type of additional information added to the health state description; (4) the impact of adding information on familial consequences will be stronger for females than males; (5) the effect of additional consequences on ratings of health state descriptions will not differ by levels of completed education and age.
On average, adding social, psychological, or familial consequences to the health state description lowered individual ratings of that description by 0.78 points. Adding information had a stronger impact on ratings of the least severe conditions, reducing average ratings in this category by 1.67 points. Addition of information about child-rearing had the strongest impact, reducing average ratings by 2.09 points. We found little evidence that the effect of adding information on ratings of health descriptions varied by gender, education, or age.
Including information about health states not directly related to major functional consequences or symptoms, particularly with respect to child-rearing and specifically for descriptions of less severe conditions, can lead to lower ratings of health. However, this impact was not consistent across all conditions or types of information, and was most pronounced for inclusion of information about child-rearing, and among the least severe conditions.
Citation: King NB, Harper S, Young M, Berry SC, Voigt K (2018) The impact of social and psychological consequences of disease on judgments of disease severity: An experimental study. PLoS ONE 13(4): e0195338. https://doi.org/10.1371/journal.pone.0195338
Editor: Brecht Devleesschauwer, Scientific Institute of Public Health (WIV-ISP), BELGIUM
Received: November 17, 2017; Accepted: March 20, 2018; Published: April 17, 2018
Copyright: © 2018 King et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data underlying this study have been uploaded to the Harvard Dataverse and are accessible using the following link: https://dataverse.harvard.edu/dataverse/samharper.
Funding: This work was supported by the Canadian Institutes for Health Research (CIHR), Operating Grant EPP-122908, “Measuring Global Health.” The funder had no role in the design of the study, in the collection, analysis, or interpretation of data, or in writing the manuscript.
Competing interests: The authors have declared that no competing interests exist.
The Global Burden of Disease (GBD) project systematically assesses mortality, healthy life expectancy, and disability across 195 countries and territories, using the disability-adjusted life year (DALY), which incorporates mortality and morbidity into a single metric.
After the DALY was first introduced in 1990, the GBD’s methods were criticized on the grounds that ‘health’ cannot—or should not—be separated from general welfare, which is shaped not only by an individual’s symptoms, but also by the interaction between those symptoms and the social environment.[2, 3] Individuals in different social or cultural milieus may experience the same symptom (e.g. reduced mobility or poor eyesight) differently, and symptoms or health conditions that in one context might pose few problems for social interactions might in another context be strongly stigmatized and thus profoundly disabling.[4, 5] Because these contexts vary between and within countries, and may also vary depending on an individual’s relative social position, the use of universal disability weights was critiqued as misguided.
Responding to these concerns, the 2010 GBD update sought to isolate the impact of health loss from the broader concept of welfare loss, thus reducing the possibility that differences in contextual factors might lead to systematic variations in health assessments across settings.[6, 7] In the 2010 GBD, respondents were asked to rate which of two hypothetical individuals is ‘healthier,’ on the basis of ‘brief lay descriptions that emphasised the major functional consequences and symptoms associated with each health state with simple, non-clinical vocabulary… that aimed to capture the most salient details for each health state, while ensuring consistency in wording across states and avoiding ambiguous terms.’
GBD researchers argued that the use of parsimonious descriptions effectively isolated ‘health’ from welfare considerations, resulting in exceptionally consistent disability weights across countries and social groups. They thus concluded that
‘we did not observe evidence to support the hypothesis that comparative assessments of health at a global level are undermined by extensive cultural variation. On the contrary, we have reported strong evidence that many aspects of individuals’ assessments of health outcomes seem to reflect common values, affirming universal aspirations for averting negative health outcomes such as pain or depression and for enjoying high levels of functioning in domains of health such as mobility.’
While the GBD 2010 methodology was generally consistent, there were potentially important variations in the language used to describe health states. In particular, the descriptions for some health states included not only information about symptoms and functional consequences, but also consequences unrelated to health per se. This additional information included psychological consequences (e.g. anxiety about a diagnosis or recurring symptoms), implications for social interaction (e.g. a condition causes others to “stare and comment”), and/or implications for child-rearing (e.g. a condition makes it difficult to care for children). Moreover, the inclusion of psychological, social, or child-rearing (hereafter “familial”) implications was applied inconsistently across disease states: only some health state descriptions in which these implications are likely actually mentioned them. For example, psychological implications (anxiety) were included in the description of epilepsy, but not in the description of asthma:
Epilepsy: had sudden seizures in the past, but they have stopped now with medicines. The person has some drowsiness, difficulty concentrating and some anxiety about future episodes. [emphasis added]
Asthma: has wheezing, cough and shortness of breath more than twice a week, which causes difficulty with daily activities and sometimes wakes the person at night.
The inconsistent inclusion of non-health information raises two concerns. First, it calls into question the claim that ‘health’ was consistently isolated from welfare considerations in all health state descriptions. Second, if extra information affected evaluators’ ratings of the health states, then the inconsistent inclusion of this information may have influenced the relative disability weights of the health states. There is evidence that evaluative judgments may be subject to an “unpacking effect,” in which more detailed descriptions of a category or event facilitate the generation of evaluative evidence, which in turn produces more extreme evaluations of those categories or events, including health and suffering.
We conducted an experimental study to examine whether the inclusion or removal of psychological, social, or familial implications from a health state description might affect individual judgments about disease severity, and thus relative disability weights.
We selected our study population using Crowdflower (http://www.crowdflower.com), a Web-based platform for recruiting and paying subjects to perform tasks. Crowdflower offers a wide selection of compensated tasks to a pool of over 1 million participants worldwide, and provides demographic information about its source population, which allowed us to examine the overall characteristics and representativeness of our participant pool. The use of an online platform for study recruitment and execution allowed for a larger and more representative sample than in-person convenience sampling.[9, 10] The Crowdflower system also allowed us to introduce safeguards into the administration of our survey, including specification of primary language; a minimum time allotment for the survey; rules to guard against participants completing the same survey multiple times (a “maximum judgments” option); selecting from a category of experienced and validated participants; and the collection of identifying information about participants (e.g. Internet Protocol (IP) addresses and worker identification (ID) numbers), which allowed us to exclude participants who may have attempted to take the surveys multiple times. Crowdflower offers three primary options for recruiting participants: 1) by level of trustworthiness/accuracy; 2) by country; 3) by primary language spoken. We entered our specifications into Crowdflower for these options (highest trustworthiness level, no specified country, English), and Crowdflower selected participants from its existing participant base.
Based on the findings of two pilot surveys of 50 participants each, we adopted the following strategy to exclude low-quality responses: upon completion of the surveys, we deleted (1) any surveys that were incomplete, on the grounds that they likely indicated that participants were not taking the survey seriously, and thus were unlikely to produce useful individual answers, with the exception of participants who did not complete the demographic information, since this content was optional; (2) any surveys in which participants answered any one of 5 dummy questions incorrectly; and (3) any survey that was completed in less than 4 minutes, a minimum time allotment determined by having a research assistant complete the survey in the pilot phase. In cases where we detected duplicate IP addresses or worker IDs, we retained the earliest results and deleted the others. Our surveys were taken by 3012 respondents; surveys were run sequentially within one day of each other. After excluding surveys that were incomplete (n = 205), surveys with at least one incorrect dummy question (n = 553), and surveys completed in less than 4 minutes (n = 662), we accepted surveys with n = 803 for version 1 and n = 789 for version 2. Results from the pilot surveys were not included in the final analysis.
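The exclusion rules above can be sketched in code. This is a minimal illustration, not the authors' procedure: the `Response` fields and the `keep_valid` function are hypothetical names, and the choice to resolve duplicates before applying the quality checks is an assumption, since the text does not state the order.

```python
from dataclasses import dataclass

MIN_SECONDS = 4 * 60  # minimum completion time stated in the text (4 minutes)


@dataclass
class Response:
    worker_id: str          # platform worker ID (hypothetical field name)
    ip: str                 # IP address recorded for the submission
    started: float          # submission timestamp, used to find the earliest duplicate
    seconds: float          # time taken to complete the survey
    ratings_complete: bool  # all non-optional items answered (demographics are optional)
    dummies_correct: bool   # every dummy question answered correctly


def keep_valid(responses):
    """Return responses that survive the duplicate and quality rules."""
    seen_workers, seen_ips, kept = set(), set(), []
    for r in sorted(responses, key=lambda r: r.started):
        # Assumption: duplicates are resolved first, keeping the earliest submission.
        if r.worker_id in seen_workers or r.ip in seen_ips:
            continue
        seen_workers.add(r.worker_id)
        seen_ips.add(r.ip)
        if not r.ratings_complete:   # rule (1): incomplete survey
            continue
        if not r.dummies_correct:    # rule (2): any dummy question wrong
            continue
        if r.seconds < MIN_SECONDS:  # rule (3): completed in under 4 minutes
            continue
        kept.append(r)
    return kept


# The reported counts are internally consistent with the final sample size.
assert 3012 - 205 - 553 - 662 == 803 + 789 == 1592
```

The final assertion only checks the arithmetic of the reported exclusions; it says nothing about how many duplicates were removed, which the text does not report.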
We collected optional demographic information on the respondent’s age (continuous), gender (male, female), highest level of education (less than high school, high school diploma, university degree, graduate or professional degree), and race-ethnicity (using United States Census categories). Our sample was drawn from literate individuals with computer and internet access.
Ethics approval and consent to participate
This study was approved by the McGill University Faculty of Medicine’s Research Ethics Office (IRB Study Number A09-B44-10B). All participants provided written consent for participation.
We selected 36 different health state descriptions across the spectrum of disease severity from the GBD 2010. GBD health state descriptions were ranked by disability weight, then divided into low (disability weight <0.25), moderate (disability weight 0.25–0.5), and high (disability weight >0.5) severity categories. We selected 12 descriptions from each category, choosing health state descriptions that were amenable to addition of social, psychological, or familial implications.
For each health state description, we designed a ‘paired’ description by either adding or removing information about plausible psychological, social, or familial implications of a health condition, based on the original health state descriptions published for the GBD 2010. We constructed two survey versions containing all 36 health state descriptions (S1 Text). The order of the descriptions was identical, but whether or not each description contained additional information varied: for any given description, one survey contained the version with additional information, and the other survey contained the version without. Both surveys included descriptions with and without additional information in order to mimic the original GBD survey content. Health state descriptions contained only information about the symptoms of a health state, and did not name the health state.
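The severity grouping can be expressed as a small function. This is a sketch with cut points at 0.25 and 0.50; the function name is hypothetical, and how a weight falling exactly on a boundary is classified is an assumption, since the text does not settle it.

```python
def severity_category(weight: float) -> str:
    """Map a GBD disability weight (0 = least, 1 = most disabling) to a category."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("disability weight must lie between 0 and 1")
    if weight < 0.25:
        return "low"
    if weight < 0.50:  # assumption: 0.25 itself counts as moderate
        return "moderate"
    return "high"      # assumption: 0.50 itself counts as high


print(severity_category(0.10))  # low
print(severity_category(0.30))  # moderate
print(severity_category(0.70))  # high
```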
In order to maximize variance across health state descriptions, we used a modified visual analog scale (VAS) which allowed respondents to assign a precise numeric value to the health state description from 0 to 100 using a slider, with 0 as the “worst possible health” and 100 as the “best possible health” (Fig 1). As a check on the quality of the responses, we included 5 “dummy” questions throughout the survey that required the respondent to set the slider to a specific value (e.g., “Set the rating for this person at 78”) or included an unambiguous description of a 0 valuation (e.g. “The person is not breathing and has no pulse. The person is dead”).
We tested five hypotheses: (1) the inclusion of psychological, social, or familial consequences in health state descriptions will reduce the average rating of a health state; (2) the effect will be stronger for diseases with lower disability weights (i.e., less severe diseases); (3) the effect will vary across the type of additional information added to the health state description; (4) the impact of adding information on familial consequences will be stronger for females than males; (5) we also tested whether the effect of additional consequences on ratings of health state descriptions differed by levels of completed education and age.
We used linear regression models to estimate the impact of adding psychological, social, or familial information to each disease condition:

\[ y_i = \alpha + \beta x_i + \varepsilon_i \tag{1} \]

where $y_i$ is the health rating given by subject $i$ and $x_i$ is an indicator variable for whether the survey version of that description contained additional information. We used Eq (1) across all 36 conditions to estimate the average effect, and in subsequent models we also added demographic covariates and indicator variables for each question to estimate a conditional effect. For hypothesis (2) we extended Eq (1) to allow the effect to differ across levels of disability weight:

\[ y_i = \alpha + \beta x_i + \gamma z + \delta (x_i \times z) + \varepsilon_i \tag{2} \]

where $y_i$ and $x_i$ are defined as in Eq (1), $z$ is a continuous variable representing the disability weight as published in the original 2010 GBD survey, and $\gamma$ is the coefficient corresponding to the independent effect of a one-unit change in disability weight, which varies from 0 (least disabling) to 1 (most disabling). A test of the coefficient $\delta = 0$ provides evidence of a departure from additive effects of additional information and disability weight. We used a model similar to (2) to test hypotheses 3–5, all of which allowed the effect of adding information to vary by other characteristics. We used cluster-robust standard errors to account for non-independence of responses among individuals; that is, standard errors were clustered at the individual level.
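To make the estimation concrete, the following is a minimal pure-Python sketch of the crude model, assuming Eq (1) is a simple regression of the rating on the added-information indicator. It is not the authors' Stata code. The function name and toy data are hypothetical, and the variance is the basic sandwich estimator without the small-sample correction that statistical packages typically apply, so the standard error would not match packaged output exactly.

```python
from collections import defaultdict
from math import sqrt


def ols_with_cluster_se(y, x, cluster):
    """Fit y = alpha + beta*x and return (alpha, beta, cluster-robust SE of beta)."""
    n = len(y)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx              # determinant of X'X
    beta = (n * sxy - sx * sy) / det
    alpha = (sy - beta * sx) / n

    # Second row of (X'X)^-1, the only part needed for Var(beta).
    b01, b11 = -sx / det, n / det

    # Sum the score contributions (residual, residual * x) within each cluster.
    scores = defaultdict(lambda: [0.0, 0.0])
    for yi, xi, g in zip(y, x, cluster):
        u = yi - alpha - beta * xi
        scores[g][0] += u
        scores[g][1] += u * xi

    m00 = sum(s[0] * s[0] for s in scores.values())
    m01 = sum(s[0] * s[1] for s in scores.values())
    m11 = sum(s[1] * s[1] for s in scores.values())

    # Sandwich: Var(beta) = b' M b with b = (b01, b11); no small-sample correction.
    var_beta = b01 * b01 * m00 + 2 * b01 * b11 * m01 + b11 * b11 * m11
    return alpha, beta, sqrt(var_beta)


# Toy data: two respondents, each rating one description with and one without
# added information (values are made up for illustration).
ratings = [70, 66, 60, 58]
added = [0, 1, 0, 1]
person = ["A", "A", "B", "B"]
alpha, beta, se = ols_with_cluster_se(ratings, added, person)
print(alpha, beta)  # 65.0 -3.0
```

With a single binary regressor, the slope equals the difference in mean ratings between descriptions with and without added information, which is why the toy example returns -3.0.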
All of our analyses were pre-specified and conducted using Stata software, version 14 (Statacorp, College Station, TX). A copy of our pre-analysis plan is registered on the Experiments in Government and Political Science web portal (http://egap.org/registration/740).
Table 1 shows basic descriptive statistics and Pearson chi-square tests for independence for our two survey samples. Overall, the two surveys are balanced with respect to age, gender, race, education, and average survey completion time. There are minor differences for some categories. As a sensitivity analysis, we control for these demographic characteristics in regression models.
Table 2 shows the health state name (the health state name was omitted from the actual surveys), average rating for each version of the survey, which version of the survey contained additional information, and the crude effect of adding information. Questions 15, 21, and 33 were dummy questions and were thus omitted from our analysis. Average ratings were lowest (~15) for conditions such as paralysis or severe cognitive and motor impairment, and highest (~70) for impotence. Adding information increased ratings for some questions and decreased ratings for others.
Table 3 shows estimates of the impact of adding information (i.e., the coefficient β) for three different models. The crude model without any adjustments shows that, on average, adding social, psychological, or familial consequences to the health state description lowered individual ratings of that description by 0.78 points (95% confidence interval [CI]: -1.01 to -0.56). Adjusting for demographic covariates and indicator variables for each question had little impact on the crude estimate; for parsimony and transparency, we therefore did not adjust for covariates in the other analyses.
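As a reading aid, the standard error implied by the reported interval can be backed out. This sketch assumes the interval is a symmetric normal-approximation interval (estimate ± 1.96 × SE), which is an assumption about how it was constructed rather than something the text states; the function name is illustrative.

```python
def implied_standard_error(lower: float, upper: float, z: float = 1.96) -> float:
    """Back out the SE from a symmetric normal-approximation CI (an assumption)."""
    if upper <= lower:
        raise ValueError("upper bound must exceed lower bound")
    return (upper - lower) / (2 * z)


se = implied_standard_error(-1.01, -0.56)
print(round(se, 3))  # 0.115
```

An implied standard error of roughly 0.115 points on a 100-point scale indicates that the average effect, although small, is estimated precisely.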
To test hypothesis (2), we allowed the effect of added information to vary according to GBD disability weight. Fig 2 shows the marginal effects from models that included product terms between the main treatment variable and GBD score. When disability weight is treated as a continuous variable, the impact of adding information is strongest for health states with the lowest GBD disability weight (least severe conditions), and declines as GBD disability weight increases. However, this assumes that GBD disability weight has a linear relationship with ratings of health state descriptions. We also categorized the GBD disability weight into broad categories of mild (<0.25), moderate (0.25–0.49), and severe (0.50+). When disability weight is treated categorically, adding information reduced disease ratings by 1.66 points (95% CI 1.2 to 2.1) for the least severe conditions (GBD weights <0.25), but showed weaker effects at higher GBD weight categories.
Fig 3 shows results from analyses allowing the impact of adding information to vary by the type of information added (psychological, social, or familial). Overall, adding information about familial consequences showed the greatest impact, reducing ratings by an average of 2.09 points (95% CI 1.7 to 2.5), compared to reductions of 0.10 for psychological and 0.16 for social implications. The different types of information also showed some evidence of heterogeneity by GBD category, with only familial information showing a consistently negative impact for all disability weight categories. Finally, we found limited evidence that the effect of adding information varied by gender, education, or age.
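The marginal effect in the continuous analysis follows directly from the product-term model. As a sketch, assuming Eq (2) has the standard interaction form $y = \alpha + \beta x + \gamma z + \delta(x \times z) + \varepsilon$ with the symbols described in the methods, the effect of adding information at disability weight $z$ is:

```latex
\begin{align*}
E[y \mid x = 1, z] - E[y \mid x = 0, z]
  &= (\alpha + \beta + \gamma z + \delta z) - (\alpha + \gamma z) \\
  &= \beta + \delta z .
\end{align*}
```

Under this form, an effect that is most negative for the least severe states and shrinks toward zero as $z$ rises corresponds to $\beta < 0$ and $\delta > 0$. With $\delta = 0$, the effect would be the same at every disability weight.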
We hypothesized that the inclusion of psychological, social, or familial consequences related to health state descriptions would reduce the average rating of health, possibly due to an “unpacking effect” or other psychological phenomenon. We found that inclusion of information that is not directly related to major functional consequences or symptoms reduced average health ratings by 0.78 on a scale of 100. There was considerable heterogeneity within our sample: adding non-health information varied from reducing health rating by 9.1 points (traumatic brain injury) to increasing it by 5.19 points (end-stage renal disease). Although these differences did not consistently support our hypothesis that additional information would reduce average ratings, it is possible that, as found in other research on unpacking effects, added detail can produce more extreme evaluative judgments in either direction.
We also hypothesized that the effect of including additional information would vary by disability weight (i.e. stronger for health conditions with lower disability weights), type of information (i.e. psychological, social, or familial consequences), and gender (i.e. the impact of adding information on familial consequences will be stronger for women than men), but would not vary by education level or age.
We found some confirmatory evidence that adding information had a stronger impact on ratings of the least severe conditions, reducing average ratings in this category by 1.67 points, and that addition of information about child-rearing had a stronger impact than psychological or social consequences, reducing average ratings by 2.09 points. We found little evidence that the effect of adding information on ratings of health descriptions varied by gender, education, or age.
Overall, we found that inclusion of non-health information in health state descriptions could impact individuals’ evaluations of the severity of those health states. However, this impact was not consistent across all conditions or types of information, and was most pronounced for inclusion of information about child-rearing, and among the least severe conditions.
Our study has limitations. Use of a web-based survey rather than in-person testing methods may have affected our results and thus limited the generalizability of our findings. Our sample was drawn exclusively from literate individuals with computer and internet access, which further limits generalizability. It is unknown whether the effects that we discovered are large or consistent enough to impact actual relative rankings of conditions in a study such as the Global Burden of Disease, or whether this effect would be present when using paired comparisons (as the GBD does) rather than a visual analogue scale.
We found some evidence that inclusion of information about health states not directly related to major functional consequences or symptoms, particularly with respect to child-rearing and specifically for descriptions of less severe conditions, can lead to lower ratings of health, although the effect size was small and this finding was not consistent across health states. Future studies that attempt to isolate evaluations of ‘health’ should be consistent in their inclusion or exclusion of this type of information in order to facilitate the interpretation of subsequent findings, and ensure appropriate comparability of health states.
S1 Text. Survey versions 1 and 2.
S1 Table. Location of ISP addresses of respondents.
- 1. GBD 2015 Disease and Injury Incidence and Prevalence Collaborators: Global, regional, and national disability-adjusted life-years (DALYs) for 315 diseases and injuries and healthy life expectancy (HALE), 1990–2015: a systematic analysis for the Global Burden of Disease Study 2015. Lancet 2016, 388(10053):1603–1658. pmid:27733283
- 2. Broome J: Measuring the Burden of Disease by Aggregating Well-Being. In: Summary measures of population health: concepts, ethics, measurement and applications. Edited by Broome J, Murray CJ, Salomon J, Mathers C, Lopez A. Geneva: World Health Organization; 2002.
- 3. Anand S, Hanson K: Disability-adjusted life years: a critical review. J Health Econ 1997, 16(6):685–702. pmid:10176779
- 4. Reidpath D, Allotey P, Kouame A, Cummins R: Measuring health in a vacuum: examining the disability weight of the DALY. Health Policy Plan 2003, 18(4):351–356. pmid:14654511
- 5. Allotey P, Reidpath D, Kouame A, Cummins R: The DALY, context and the determinants of the severity of disease: an exploratory comparison of paraplegia in Australia and Cameroon. Soc Sci Med 2003, 57(5):949–958. pmid:12850119
- 6. Salomon JA, Vos T, Hogan DR, Gagnon M, Naghavi M, Mokdad A, Begum N, Shah R, Karyana M, Kosen S et al: Common values in assessing health outcomes from disease and injury: disability weights measurement study for the Global Burden of Disease Study 2010. Lancet 2012, 380(9859):2129–2143. pmid:23245605
- 7. Salomon JA, Vos T, Hogan DR, Gagnon M, Naghavi M, Mokdad A, Begum N, Shah R, Karyana M, Kosen S et al: Supplement to: Common values in assessing health outcomes from disease and injury: disability weights measurement study for the Global Burden of Disease Study 2010. Lancet 2012, 380(9859):1–25.
- 8. Van Boven L, Epley N: The unpacking effect in evaluative judgments: When the whole is less than the sum of its parts. Journal of Experimental Social Psychology 2003, 39(3):263–269.
- 9. Buhrmester M, Kwang T, Gosling SD: Amazon’s Mechanical Turk: A New Source of Inexpensive, Yet High-Quality, Data? Perspect Psychol Sci 2011, 6(1):3–5. pmid:26162106
- 10. Berinsky AJ, Huber GA, Lenz GS: Evaluating Online Labor Markets for Experimental Research: Amazon.com’s Mechanical Turk. Political Analysis 2012, 20(3):351–368.
- 11. Salomon JA, Murray CJ: A multi-method approach to measuring health-state valuations. Health Econ 2004, 13(3):281–290. pmid:14981652
- 12. Cameron CA, Miller DL: A Practitioner’s Guide to Cluster-Robust Inference. J Human Resources 2015, 50(2):317–372.