The SF-6D and EQ-5D are widely used generic index measures of health-related quality of life. We assessed within-subject agreement between SF-6D and EQ-5D utilities under different preference weights, and their validity among Chinese rural residents, before and after score standardization.
Rural residents over 18 years old were interviewed using EQ-5D and SF-6D in Jiangsu Province, China. EQ-5D utilities were scored with three conversion tables, from the United Kingdom, Japan, and the United States. Validity, sensitivity, and agreement between instruments were computed and compared. Factors affecting the utility difference were explored with multiple linear regression models. The same analyses were then repeated on scores standardized to the 0–1 interval in both instruments. Among 929 respondents, the relative efficiency statistic and receiver operating characteristic curve analysis showed SF-6D to be the more efficient instrument, followed by EQ-5D with Japanese weights. Bland–Altman plot analysis showed that the SF-6D/EQ-5D pair with UK weights had better agreement. Although some risk factors were found, multiple linear regression showed that most coefficients were weaker than 0.2, and all R2 values were less than 0.06. Standardization did not significantly influence these results, apart from the score values themselves.
Citation: Jin H, Wang B, Gao Q, Chao J, Wang S, Tian L, et al. (2012) Comparison between EQ-5D and SF-6D Utility in Rural Residents of Jiangsu Province, China. PLoS ONE 7(7): e41550. https://doi.org/10.1371/journal.pone.0041550
Editor: Richard Fielding, The University of Hong Kong, Hong Kong
Received: March 15, 2012; Accepted: June 22, 2012; Published: July 27, 2012
Copyright: © Jin et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was supported in part by the National Science and Technology Major Project of China (2009ZX10004-904). No additional external funding was received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Because the evaluation of health-related quality of life (HRQoL) currently has no gold standard, it is important to understand the real state of health by comparing different instruments. Some studies have compared the European Quality of Life instrument (EQ-5D) with the Short Form of the Medical Outcomes Study Questionnaire (SF-6D), examining discrepancies between them in the general population and in patients.
For EQ-5D, the best-known preference weights were derived from a UK population sample and may be applied to other populations, such as China's, when country-specific weights are not available. At present, different EQ-5D weights are used arbitrarily in China and other countries, which yields different scores for the same population. It is therefore necessary to identify which of the current weights suits the Chinese population. Moreover, some evidence suggests that, even with the same preference weights, valuations of health states, including self-reported scores, can differ between people in different countries owing to differences in demographic background. The comparison is further complicated by the different utility score intervals of EQ-5D (−0.59 to 1.00 or −0.11 to 1.00) and SF-6D (0.32 to 1.00). Nevertheless, many studies have used both instruments to evaluate HRQoL and compared them with each other despite the different scales. It is hard to interpret a negative quality of life value, or markedly different utility scores for the same patient. Standardizing these intervals to 0–1 is therefore valuable for interpretation and comparison. Does standardization give results similar to those without standardization in these instruments? Furthermore, more studies have been carried out on patients than on the general population, and on urban rather than rural residents, in China and other countries, so the applicability of these instruments to the general and rural populations has not been examined.
Therefore, this study examines Chinese rural residents' HRQoL as measured by SF-6D and by EQ-5D scored with three countries' preference weights, in order to test the validity and sensitivity of these instruments and to assess within-subject agreement between them before and after standardizing scores to the 0–1 range.
Materials and Methods
The target population for the study was Jiangsu's rural residents, aged 18 and older, with rural hukou. A multistage, stratified, random sampling procedure was employed, with the aim of generating a sample representing the age, sex, and socioeconomic status distribution of the target population. Due to limited resources, the target sample size was restricted to 1,000 individuals. The subjects were sampled from three counties (Taixing, Danyang, and Zhangjiagang) in Jiangsu Province, China, in 2010. The subjects were assigned to 13 regions according to population size, and 25 to 30 households were randomly selected for interview in these regions. Temporary residents were excluded. Following informed consent, each subject was interviewed by a trained interviewer using a standardized questionnaire covering sociodemographic information, medical conditions, the EQ-5D/visual analog scale (VAS), and SF-6D. This study was approved by the Ethics Committee of the Jiangsu Provincial Center for Disease Control and Prevention. We obtained written informed consent from all participants involved in our study. The data were analyzed anonymously.
The SF-6D algorithm is described in detail elsewhere. The SF-6D utility-scoring algorithm was derived from a representative sample of the UK general population using the standard gamble (SG) method, with scores ranging from 0.32 to 1.00. The Hong Kong Chinese version and HK scoring algorithm of SF-6D were adopted. To assess differences in the EQ-5D algorithm, scores were compared across three conversion tables, from the United Kingdom (EQ-5D-UK), Japan (EQ-5D-JP), and the United States (EQ-5D-USA), all using time tradeoff (TTO)-based preference scores. The scores ranged from −0.59 to 1.00 with the United Kingdom weights and from −0.11 to 1.00 with the Japan and United States weights. The EQ-5D Visual Analogue Scale (VAS) records the respondent's self-rated health status on a VAS. The simplified Chinese version of EQ-5D/VAS used in this study is an official version authorized by the EuroQol Group.
Continuous variables are presented as mean ± standard error (SE), while categorical variables are shown as a proportion of the sample. EQ-VAS scores were divided by 100 to generate values between 0 and 1.
Convergent validity of the EQ-5D and SF-6D was assessed by examining their association with EQ-VAS classified by different cutoff values. The validity coefficient was computed as Spearman's rank correlation coefficient. The efficiency of EQ-5D and SF-6D in detecting relevant differences was compared using the relative efficiency (RE) statistic and receiver operating characteristic (ROC) curves. The area under the ROC curve (AUC) was computed to compare the discriminative properties of these instruments (AUC≥0.5).
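The text does not define the RE statistic or say how the ROC groups were formed. As a rough sketch only, the Python below assumes one common definition of RE, the squared ratio of t statistics between an instrument and a reference instrument across two VAS groups, and computes AUC as the probability that a respondent above the VAS cutoff scores higher than one below it. The function names and the choice of Welch's t are illustrative assumptions, not the authors' SAS procedure.

```python
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for two independent groups (assumed test choice)."""
    se_squared = variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    return (mean(group_a) - mean(group_b)) / se_squared ** 0.5


def relative_efficiency(inst_high, inst_low, ref_high, ref_low):
    """Squared ratio of t statistics: instrument versus reference.

    Values above 1 suggest the instrument separates the two VAS groups
    more efficiently than the reference does (assumed definition).
    """
    return (welch_t(inst_high, inst_low) / welch_t(ref_high, ref_low)) ** 2


def auc(scores_high, scores_low):
    """Rank-based AUC: chance that a 'high VAS' respondent has the higher
    utility score than a 'low VAS' respondent, counting ties as one half."""
    wins = sum((h > l) + 0.5 * (h == l) for h in scores_high for l in scores_low)
    return wins / (len(scores_high) * len(scores_low))
```

Under these assumptions, an AUC of 0.5 means the instrument does no better than chance at separating the two VAS groups, which is why 0.5 is the natural floor for a useful instrument.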
Agreement among these instruments was assessed by means of Bland-Altman plots, with the limit of agreement (LOA) required to be greater than 0.95. To determine whether the subjects' socioeconomic status was related to the utility difference between EQ-5D and SF-6D, multiple linear regression (MLR) models with all variables entered were used. Scores were standardized as follows: the SF-6D value minus 0.32, divided by 0.68; the EQ-5D-UK value plus 0.59, divided by 1.59; and the EQ-5D-JP or EQ-5D-USA value plus 0.11, divided by 1.11. After this adjustment gave SF-6D and EQ-5D an identical 0–1 interval, the results of the above analyses on standardized scores were compared with the earlier results on nonstandardized scores.
All statistical analyses based on complex sampling data were conducted using SAS version 9.1 with procedures such as SURVEYFREQ, SURVEYMEANS, and SURVEYREG (SAS Institute Inc., Cary, NC, USA).
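The three adjustments are all the same min-max rescaling, (score − minimum) / (maximum − minimum), applied with each instrument's stated range. A minimal Python sketch follows; the dictionary and function names are hypothetical, and only the ranges come from the text.

```python
# Utility score ranges (minimum, maximum) as stated in the Methods.
SCORE_RANGES = {
    "SF-6D": (0.32, 1.00),
    "EQ-5D-UK": (-0.59, 1.00),
    "EQ-5D-JP": (-0.11, 1.00),
    "EQ-5D-USA": (-0.11, 1.00),
}


def standardize(score, instrument):
    """Rescale a raw utility score to the 0-1 interval for its instrument."""
    low, high = SCORE_RANGES[instrument]
    return (score - low) / (high - low)
```

For example, a raw score of 0.80 maps to about 0.71 on SF-6D but to about 0.87 on EQ-5D-UK, so the same rescaling lowers typical SF-6D values and raises typical EQ-5D values.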
A total of 929 completed SF-6D and EQ-5D forms with no missing items were eligible for analysis (response rate 92.9%); 71 subjects were excluded because they refused to answer questions or were urban residents. The sociodemographic characteristics of the sample are shown in Tables 1 and 2. After standardization of the interval, score values increased for the EQ-5Ds and decreased for SF-6D.
A strong ceiling effect was observed (Table 2): the highest percentages of ceiling effect appeared for mobility, self-care, and usual activities in EQ-5D, and for role limitation in SF-6D (Tables 3 and 4). For rural residents, the mental and vitality dimensions were associated with more serious problems in SF-6D, as were pain/discomfort and anxiety/depression in EQ-5D.
Validity and Sensitivity of EQ-5D and SF-6D
Convergent validity was demonstrated by moderate correlation coefficients (r≥0.349) between EQ-5D/SF-6D and VAS, strong coefficients (r≥0.574) between SF-6D and EQ-5D, and very strong coefficients (r≥0.999) between the different EQ-5Ds (Table 5). A significant difference in utility scores was observed among different levels of VAS for these instruments (P<0.0001). The RE statistic showed that EQ-5D-JP was more efficient than EQ-5D-UK and EQ-5D-USA at detecting a difference in VAS scores across the different cutoff values; however, SF-6D's RE was higher than EQ-5D-JP's except for the VAS cutoff between 0.80 and 0.90 (Table S1). The AUC scores were ordered as follows: SF-6D>EQ-5D-JP>EQ-5D-UK or EQ-5D-USA. Standardized scores gave similar results for sensitivity; only the mean scores differed.
Evaluation of Agreement
Without standardization, SF-6D showed better agreement with the EQ-5Ds than with VAS. The EQ-5D-UK and EQ-5D-JP/EQ-5D-USA pairs had the highest LOA (97.8%), whereas the EQ-5D-JP and EQ-5D-USA pair had a lower LOA (95.9%). The different EQ-5Ds all had good agreement with VAS (LOA>0.95) (Figure 1). Similar results were found with standardized scores.
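The percentages reported here are not defined in the text. One plausible reading, assumed in the sketch below, is that they give the share of paired differences falling inside the Bland-Altman limits of agreement (mean difference ± 1.96 standard deviations). The function name and the returned tuple are illustrative choices, not the authors' procedure.

```python
from statistics import mean, stdev


def bland_altman(scores_a, scores_b):
    """Bland-Altman summary for paired scores from two instruments.

    Returns (bias, lower_limit, upper_limit, share_inside), where
    share_inside is the fraction of paired differences within the
    limits of agreement (assumed meaning of the reported percentages).
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    lower, upper = bias - spread, bias + spread
    share_inside = sum(lower <= d <= upper for d in diffs) / len(diffs)
    return bias, lower, upper, share_inside
```

Under this reading, a share above 0.95 is what one would expect if the paired differences are roughly normally distributed, which matches the 0.95 threshold stated in the Methods.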
Factors Affecting Utility Difference between EQ-5D and SF-6D
Notably, when SF-6D or VAS was compared with the different EQ-5Ds, respondents with middle education showed a smaller score difference between SF-6D and EQ-5D than those with higher education, whether or not scores were rescaled to the 0–1 range (Table S2). Other factors, such as age, marital status, and acute medical conditions, also influenced the difference. Full or partial coverage was associated with smaller changes in the utility score difference than self-expense among the EQ-5Ds and VAS. After standardization, most of these variables showed similar associations for the EQ-5Ds. However, the coefficients were weak (less than 0.2), and all R2 values were less than 0.05.
In this study, we provide evidence of the validity and sensitivity of EQ-5D with different preference weights and of SF-6D in the general population of Chinese rural residents. However, some qualifications have to be made.
First, in distinguishing self-reported health status, RE and ROC analyses showed SF-6D to be the more efficient instrument, followed by EQ-5D with Japanese weights. SF-6D covers broader aspects of HRQoL, such as role and social functioning, and offers more response levels for each domain. This can make the description of health status more comprehensive, so respondents are more likely to find the best description of their status. In fact, a five-level version of EQ-5D is under development. This is also one of the reasons why EQ-5D utility scores tend to be higher than SF-6D scores in healthier populations. The Japanese scheme provided better convergent and known-groups validity than the UK and US schemes did in this sample. These results may reflect the fact that China is an Asian country whose culture is closer to that of Japan than to those of the United Kingdom and the United States. Notably, SF-6D's RE was higher than EQ-5D-JP's except for the VAS cutoff between 0.80 and 0.90. This exception was related to the choice of VAS as the criterion: VAS scores are self-reported and were underestimated by Chinese rural residents, and the 0.80–0.90 interval included healthy people with conservative self-evaluations. Moreover, in contrast to other studies, the MLR analysis implied that respondents' ability to understand the instruments, influenced by education level, could introduce systematic bias resulting from possible differences in rural residents' experience. Further follow-up of more rural residents is needed to give a more reasonable evaluation, especially for healthy people.
Second, EQ-5D had a stronger ceiling effect than SF-6D, which may limit its ability to discriminate within a general population with mild to moderate symptoms. The relatively small number of chronic patients with mild symptoms in the sample might have aggravated the high ceiling effect observed. Similar phenomena have been found in chronic prostatitis patients in China. Several statistical methods have been proposed to address ceiling effects, such as Tobit models, the censored least absolute deviation approach, two-part models (TPM), and latent class models (LCM), which were compared by Huang et al. Huang et al. suggested that the LCM and the log-transformed TPM were superior to the other approaches.
Third, standardization of scores could be introduced into direct comparisons between the two instruments. The idea rests on the assumption that scores from different instruments can be conveniently compared and easily understood by readers when placed on the same interval, ignoring the different preference elicitation methods and models. In this study, standardized scores behaved similarly to nonstandardized scores across the different measures, apart from the score values themselves. Values decreased for SF-6D and increased slightly for the EQ-5Ds, potentially owing to the different dimensions and the high proportion of healthy people. This effect would be weaker when interval standardization is applied to patients' quality-of-life evaluations with these instruments. However, standardized scores are not applicable to instruments with non-linear scales, and they may conceal people's true health. Further research with larger samples, especially of clearly defined patient groups, is needed to establish the feasibility of score standardization.
Efficiency of EQ-5D and SF-6D to detect relevant difference.
We would like to thank two anonymous referees for their helpful comments on earlier drafts of this paper. We are grateful to the heads and staff at the various facilities used for data collection. Our sincere thanks go to the ethics committees of Center of Disease Control and Prevention of Jiangsu Province.
Conceived and designed the experiments: HJ BW PL. Performed the experiments: HJ SYW QG JQC LT. Analyzed the data: HJ SYW. Wrote the paper: HJ BW QG JQC SYW LT PL.
- 1. Kontodimopoulos N, Pappa E, Papadopoulos AA, Tountas Y, Niakas D (2009) Comparing SF-6D and EQ-5D utilities across groups differing in health status. Qual Life Res 18: 87–97.
- 2. Cunillera O, Tresserras R, Rajmil L, Vilagut G, Brugulat P, et al. (2010) Discriminative capacity of the EQ-5D, SF-6D, and SF-12 as measures of health status in population health survey. Qual Life Res 19: 853–864.
- 3. Bharmal M, Thomas JR (2006) Comparing the EQ-5D and the SF-6D descriptive systems to assess their ceiling effects in the US general population. Value Health 9: 262–271.
- 4. Barton GR, Sach TH, Avery AJ, Jenkinson C, Doherty M, et al. (2008) A comparison of the performance of the EQ-5D and SF-6D for individuals aged ≥45 years. Health Econ 17: 815–832.
- 5. Brazier J, Roberts J, Tsuchiya A, Busschbach J (2004) A comparison of the EQ-5D and SF-6D across seven patient groups. Health Econ 13: 873–884.
- 6. McCrone P, Patel A, Knapp M, Schene A, Koeter M, et al. (2009) A comparison of SF-6D and EQ-5D utility scores in a study of patients with schizophrenia. J Ment Health Policy Econ 12: 27–31.
- 7. Adams R, Walsh C, Veale D, Bresnihan B, FitzGerald O, et al. (2010) Understanding the relationship between the EQ-5D, SF-6D, HAQ and disease activity in inflammatory arthritis. Pharmacoeconomics 28: 477–487.
- 8. Dolan P (1997) Modeling valuations for EuroQol health states. Med Care 35: 1095–1108.
- 9. Badia X, Roset M, Herdman M, Kind P (2001) A comparison of United Kingdom and Spanish general population time trade-off values for EQ-5D health states. Med Decis Making 21: 7–16.
- 10. Sakthong P, Charoenvisuthiwongs R, Shabunthom R (2008) A comparison of EQ-5D index scores using the UK, US, and Japan preference weights in a Thai sample with type 2 diabetes. Health Qual Life Outcomes 6: 71.
- 11. Zhang XH, Li SC, Fong KY, Thumboo J (2009) The impact of health literacy on health-related quality of life (HRQoL) and utility assessment among patients with rheumatic diseases. Value Health 12 Suppl 3: S106–S109.
- 12. Zhao FL, Yue M, Yang H, Wang T, Wu JH, et al. (2010) Validation and comparison of EuroQol and short form 6D in chronic prostatitis patients. Value Health 13: 649–656.
- 13. Brazier J, Roberts J, Deverill M (2002) The estimation of a preference-based measure of health from the SF-36. J Health Econ 21: 271–292.
- 14. McGhee SM, Brazier J, Lam CL, Wong LC, Chau J, et al. (2011) Quality-adjusted life years: population-specific measurement of the quality component. Hong Kong Med J 17 suppl6: 17–21.
- 15. Tsuchiya A, Ikeda S, Ikegami N, Nishimura S, Sakai I, et al. (2002) Estimating an EQ-5D population value set: the case of Japan. Health Econ 11: 341–353.
- 16. Shaw JW, Johnson JA, Coons SJ (2005) US valuation of the EQ-5D health states: development and testing of the D1 valuation model. Med Care 43: 203–220.
- 17. Barton GR, Sach TH, Avery AJ, Jenkinson C, Doherty M, et al. (2008) A comparison of the performance of the EQ-5D and SF-6D for individuals aged ≥45 years. Health Econ 17: 815–832.
- 18. Bland JM, Altman DG (1999) Measuring agreement in method comparison studies. Stat Methods Med Res 8: 135–160.
- 19. Grieve R, Grishchenko M, Cairns J (2009) SF-6D versus EQ-5D: reasons for differences in utility scores and impact on reported cost-utility. Eur J Health Econ 10: 15–23.
- 20. Janssen MF, Birnie E, Haagsma JA, Bonsel GJ (2008) Comparing the standard EQ-5D three-level system with a five-level version. Value Health 11: 275–284.
- 21. Huang IC, Frangakis C, Atkinson MJ, Willke RJ, Leite WL, et al. (2008) Addressing ceiling effects in health status measures: a comparison of techniques applied to measures for people with HIV disease. Health Serv Res 43: 327–339.