Chinese Version of the EQ-5D Preference Weights: Applicability in a Chinese General Population

Objectives This study aimed to test the reliability, validity and sensitivity of the Chinese version of the EQ-5D preference weights in the Chinese general population, examine the differences between the China value set and the UK, Japan and Korea value sets, and provide methods for evaluating and comparing the EQ-5D value sets of different countries. Methods A random sample of 2984 community residents (15 years or older) was interviewed using a questionnaire that included the EQ-5D scale. Level of agreement, convergent validity, known-groups validity and sensitivity of the EQ-5D China, United Kingdom (UK), Japan and Korea value sets were determined. Results The mean EQ-5D index scores differed significantly (P < 0.05) among the UK (0.964), Japan (0.981), Korea (0.987) and China (0.985) weights. A high level of agreement (intraclass correlation coefficients > 0.75) and convergent validity (Pearson's correlation coefficients > 0.95) were found between each pair of schemes. The EQ-5D index scores discriminated equally well for the four versions between levels of 10 known groups (P < 0.05). The effect size and relative efficiency statistics showed that the China weights had better sensitivity. Conclusions The China EQ-5D preference weights show psychometric properties equivalent to those of the UK, Japan and Korea weights, while being slightly more sensitive to known-group differences than the Japan and Korea weights. Considering both psychometric and sociocultural issues, the China scheme should be given priority as an EQ-5D-based measure of health-related quality of life in the Chinese general population.


Introduction
differences from three value sets: the UK [8], Japan [9] and South Korea schemes [13] using a general population from China. The UK and Japan schemes had commonly been used in China, with little evidence about their applicability in the Chinese general population. The Korea scheme was selected for comparison with the China scheme because the social, economic and cultural backgrounds of Korea and China are very similar, which makes a comparison of the two countries' schemes valuable. This study provides an alternative way for further studies to evaluate and characterize differences among EQ-5D country-specific value sets.

Subjects and procedures
A survey was conducted in Xixiang Street, Bao'an District of Shenzhen in southeast China from October to November 2013. More than 600,000 people lived in Xixiang Street, and about 80% were from other areas of China. The survey was part of a larger study that aimed to provide information for community diagnosis to the local health sector. The protocol and the questionnaire were adapted from those of the NHSS 2013 and had been reviewed by relevant experts. Xixiang Street had 33 communities. In each community, 40 families were to be randomly selected, giving 1320 families in total. Trained local interviewers conducted face-to-face household interviews with all family members using a questionnaire. One of the parents or another adult family member living in the household answered the questionnaire on behalf of minor participants aged five years or below and of those who could hardly answer the questionnaire themselves. The main contents of the questionnaire included socio-economic characteristics, health status (including the Simplified Chinese version of the EQ-5D-3L scale), health risk factors, and health service needs and utilization. As in the NHSS protocol, the EQ-5D scale was administered only to participants aged 15 years and older, because the Chinese child-friendly version of the EQ-5D (EQ-5D-Y) for younger respondents was not available. Physical examinations were also conducted among participants aged 15 years and older in local community health service centers to measure blood pressure, height, weight, and waist and hip circumference. So that the final results would reflect the Chinese population, the study also reported results with corrective weights applied to match the Chinese national age/sex distribution in 2013. Participant information was anonymised prior to analysis. All participants provided written consent to participate in this study. Written consent for minor participants was provided by their parents or other adult next of kin, caretakers or guardians who lived with them. The study protocol, including the consent procedure, was approved by the Institutional Review Board of Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.

EQ-5D: China, UK, Japan and Korea preference weights
The EQ-5D China, UK, Japan and Korea preference weights were compared in the study. Before the China weights were established, the UK and Japan weights were commonly used in China. The Korea weights were also included because the similar social, economic and cultural backgrounds of Korea and China make the comparison valuable.
The UK N3 model for EQ-5D preference weights was derived from the MVH study by Dolan, based on a representative sample of the non-institutionalized adult population in England, Scotland and Wales in 1993 [8]. As the first well-validated EQ-5D value set in the world, the UK scheme is widely used when no local value set is available, or for comparison with other schemes in sensitivity analyses. The Japan valuation study was conducted using a quasi-MVH protocol with a random sample of the adult population from three prefectures in 1998 [9]. A plain main-effects model for the Japan social value set was established with high goodness of fit. The Japan value set was the first EQ-5D tariff in Asia. Given the similar cultural contexts, it has often been used in Asian countries where no local tariffs were available. There are two EQ-5D value sets for South Korea [12][13]. Lee et al. carried out a study in 2006 using instruments and a protocol similar to the MVH protocol, based on a nationally representative sample, and established a model with better goodness of fit than the earlier one [13]. Liu et al. developed an EQ-5D value set for the Chinese general population in 2011 based on the Paris protocol [7], which was revised from the MVH protocol by Kind [14].

Data analysis
Respondents aged 15 years and above were included in this study. The EQ-5D index scores for each respondent were calculated using the UK, Japan, South Korea and China algorithms. The differences of scores generated from the four value sets were compared using ANOVA followed by post-hoc Bonferroni tests.
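How an index score is obtained from a five-digit EQ-5D-3L health state can be sketched as follows for an N3-type model such as the UK one. This is only an illustration: the decrements below are the values commonly reported for the UK MVH tariff and are stated here as an assumption to be checked against reference [8]; the function and variable names are hypothetical. The Japan, Korea and China value sets use their own coefficients and, in some cases, different model terms, so they would need their own tables.

```python
# Decrements commonly reported for the UK N3 (MVH) model; treat them as an
# assumption and verify against the original tariff before any real use.
UK_N3 = {
    "constant": 0.081,  # subtracted once if any dimension is above level 1
    "n3": 0.269,        # subtracted once if any dimension is at level 3
    "levels": {         # dimension -> (level 2 decrement, level 3 decrement)
        "mobility": (0.069, 0.314),
        "self_care": (0.104, 0.214),
        "usual_activities": (0.036, 0.094),
        "pain_discomfort": (0.123, 0.386),
        "anxiety_depression": (0.071, 0.236),
    },
}

# Standard EQ-5D dimension order, taken from the table above.
DIMENSIONS = tuple(UK_N3["levels"])


def eq5d_index(profile, tariff=UK_N3):
    """Return the index score for a five-digit EQ-5D-3L profile such as '11121'."""
    levels = [int(ch) for ch in profile]
    if len(levels) != 5 or any(level not in (1, 2, 3) for level in levels):
        raise ValueError("profile must be five digits, each 1, 2 or 3")

    score = 1.0
    if any(level > 1 for level in levels):
        score -= tariff["constant"]
    if any(level == 3 for level in levels):
        score -= tariff["n3"]
    for dimension, level in zip(DIMENSIONS, levels):
        if level > 1:
            score -= tariff["levels"][dimension][level - 2]
    return round(score, 3)
```

With these decrements, `eq5d_index("11111")` returns 1.0 and `eq5d_index("33333")` returns -0.594, the lower bound of the UK tariff.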
The four preference weights were compared on the following psychometric properties: level of agreement (i.e. intra-observational reliability), convergent validity, known-groups validity and sensitivity. To examine whether the newly established China value set is applicable to the Chinese national general population, the results reported in this study applied corrective weights to reflect the Chinese national age/sex distribution in 2013.
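The text does not describe how the corrective weights were built. One common approach is post-stratification, where each age/sex cell receives a weight equal to its share in the reference population divided by its share in the sample. The sketch below assumes that approach; the cell labels, shares and scores are hypothetical illustrations, not study data.

```python
from collections import Counter


def poststratification_weights(sample_cells, population_share):
    """Weight per cell = population share / sample share (assumed method)."""
    counts = Counter(sample_cells)
    n = len(sample_cells)
    return {cell: population_share[cell] / (counts[cell] / n) for cell in counts}


def weighted_mean(values, cells, weights):
    """Mean of `values`, each weighted by the weight of its cell."""
    total_weight = sum(weights[cell] for cell in cells)
    weighted_sum = sum(v * weights[cell] for v, cell in zip(values, cells))
    return weighted_sum / total_weight


# Hypothetical example: the sample over-represents the younger cell.
cells = ["15-44"] * 8 + ["45+"] * 2
scores = [1.0] * 8 + [0.8] * 2
reference_share = {"15-44": 0.5, "45+": 0.5}  # made-up population shares

weights = poststratification_weights(cells, reference_share)
adjusted_mean = weighted_mean(scores, cells, weights)
```

In this toy example the unweighted mean is 0.96, while the weighted mean is pulled toward the under-sampled older cell, which is the same direction of change the weighted analyses in this study show.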
Level of agreement. Agreement among the EQ-5D scores generated by the four preference weights (China, UK, Japan and Korea) was assessed using intraclass correlation coefficients (ICCs) from a two-way random-effects model with absolute agreement, together with Bland-Altman agreement plots. According to Rosner's criteria, an ICC below 0.40 is regarded as poor agreement, 0.40-0.75 as fair to good agreement, and 0.75 and above as excellent agreement [16].
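The agreement statistics can be sketched in a few lines. The code below assumes the single-measure form of the two-way random-effects, absolute-agreement ICC, often written ICC(2,1); the text does not say whether single or average measures were reported. It also computes Bland-Altman limits as the mean difference plus or minus 1.96 standard deviations of the differences. Function names are hypothetical.

```python
from statistics import mean, stdev


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `ratings` holds one row per respondent and one column per value set.
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean([v for row in ratings for v in row])
    row_means = [mean(row) for row in ratings]
    col_means = [mean([row[j] for row in ratings]) for j in range(k)]

    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def bland_altman(a, b):
    """Return (mean difference, lower limit, upper limit) for paired scores."""
    diffs = [x - y for x, y in zip(a, b)]
    d = mean(diffs)
    sd = stdev(diffs)
    return d, d - 1.96 * sd, d + 1.96 * sd


def rosner_label(icc):
    """Classify an ICC using the cut-offs quoted in the text."""
    if icc < 0.40:
        return "poor"
    if icc < 0.75:
        return "fair to good"
    return "excellent"
```

A constant offset between two value sets lowers the absolute-agreement ICC even when the scores are perfectly correlated, which is why this form is stricter than a correlation coefficient.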
Convergent validity. Convergent validity of the four weighting schemes was evaluated on the assumption that subjects with high scores from the China EQ-5D scheme would also have high scores from the other EQ-5D schemes, high EQ-VAS scores and high global ratings of health status. The global rating comes from the first item of the SF-36 scale, which has five response options: excellent, very good, good, fair and poor [5]. Pearson's correlation coefficient was used to describe correlations between pairs of EQ-5D index scores as well as between the EQ-5D index scores and the VAS scores. Spearman's rho was used to describe correlations between the EQ-5D index scores and the global ratings. Correlations above 0.50 were defined as strong, those between 0.35 and 0.49 as moderate, and those between 0.20 and 0.34 as weak [17].
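The two correlation coefficients and the strength bands can be illustrated with a small stdlib-only sketch. Two assumptions are made here: the bands are applied to the absolute value of the coefficient, and the gaps between the quoted ranges are closed by treating each lower bound as inclusive. Values below 0.20 are labelled "negligible", a label that does not appear in the text.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for index in order[i : j + 1]:
            result[index] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def strength(r):
    """Label a coefficient with the bands quoted in the text (assumed inclusive)."""
    r = abs(r)
    if r >= 0.50:
        return "strong"
    if r >= 0.35:
        return "moderate"
    if r >= 0.20:
        return "weak"
    return "negligible"
```

Applied to coefficients of the size reported later in this paper, `strength(0.352)` gives "moderate", `strength(0.343)` gives "weak" and `strength(-0.194)` falls just below the weak band.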
Known-groups validity. Known-groups validity of the EQ-5D preference weights was analyzed on the assumption that the EQ-5D weights can discriminate between subjects from different socio-economic, risk factor-related and health status known groups. Previous studies had shown different distributions of EQ-5D index scores according to respondents' demographic characteristics, socio-economic status (SES) and other health-related indicators [12,18,19]. This study examined whether the EQ-5D index scores differed significantly among subgroups defined by age, gender, education, income, employment status, health status and health service utilization. Education, income and employment status were the most commonly used SES indicators. The indicators of health status comprised four variables: VAS score, global rating, chronic condition diagnosed and/or treated during the past 6 months, and onset of disease or injury during the last two weeks. Outpatient visit during the last two weeks and hospitalization during the past 12 months were the indicators of health service utilization.
Educational level was classified as below primary school, primary school, junior middle school, senior middle school, and college and above. Income was defined as annual household income divided by the number of persons living in the household within the last half-year. Respondents were divided into five income groups of equal size: the lowest income group had an income below 15,000 RMB; the second group from 15,000 to 23,333 RMB; the third group from 23,334 to 29,999 RMB; the fourth group from 30,000 to 49,999 RMB; and the fifth and highest income group 50,000 RMB and above. Employment status was classified as employed, unemployed, student or retired. Global rating was defined as described in the section on convergent validity. Respondents were also dichotomized by presence or absence of a chronic disease, disease/injury during the last two weeks, outpatient visit during the last two weeks and hospitalization during the past 12 months, and by VAS score below versus at or above the median.
To identify differences in EQ-5D index scores among known groups, independent t-tests and ANOVA were used. When respondents were divided by a dichotomous variable, an independent t-test was used to identify statistically significant differences in utility scores between groups, while ANOVA was used for a polytomous variable. The differences in EQ-5D index scores among known groups were compared with the minimal important difference (MID) to determine whether they were meaningful. There was no authoritative recommendation for the MID of EQ-5D index scores. Compared with the MIDs of other preference-based health-related quality of life instruments, the MID of the EQ-5D would be larger [20][21]. A change of 0.05 was considered to be the MID, as this is equivalent to the mean change in the SF-36 (0.03) and the HUI (0.04 for HUI2 and 0.07 for HUI3) [21].
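The known-groups comparison can be sketched as below. The t statistic assumes equal variances (the pooled form); the text does not state which variant was used. The scores in the usage note are hypothetical, and the function names are illustrative only.

```python
from math import sqrt
from statistics import mean, variance

MID = 0.05  # minimal important difference adopted in the text


def pooled_t(a, b):
    """Independent-samples t statistic with pooled variance (assumed variant)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups."""
    values = [v for group in groups for v in group]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def exceeds_mid(a, b, mid=MID):
    """True when the absolute difference in group means is larger than the MID."""
    return abs(mean(a) - mean(b)) > mid
```

For example, with hypothetical scores `[1.0, 0.9, 0.8, 1.0]` and `[0.7, 0.8, 0.6, 0.9]`, the group means differ by 0.175, so `exceeds_mid` returns `True`. For two groups the F statistic equals the squared t statistic, which is why either can be used in the relative efficiency ratio.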
Sensitivity. The sensitivity of the EQ-5D weighting schemes was estimated using Cohen's d effect size (ES) statistic and the relative efficiency (RE) statistic. ES is defined as the difference between known groups divided by the standard deviation (SD) of EQ-5D index scores. The equation is as follows:

$$ES = \frac{\bar{X}_1 - \bar{X}_2}{SD}$$

where $\bar{X}_1$ and $\bar{X}_2$ are the mean EQ-5D index scores of the two known groups. Cohen suggests that an ES ≥ 0.2 would be a meaningful difference, that 0.2 ≤ ES < 0.5 indicates a "small" difference, 0.5 ≤ ES < 0.8 a "moderate" difference, and ES ≥ 0.8 a "large" difference [22]. RE is the ratio of the F or squared t statistic for one algorithm to that for another algorithm. RE ≥ 1 means that the efficiency of the numerator is better than or equal to that of the denominator, and vice versa [23]. All statistical analyses were performed using SPSS 17.0 (IBM Corp., Armonk, NY, USA).
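The two sensitivity statistics are simple to compute. In the sketch below the pooled SD of the two groups is used as the denominator of the ES; this is an assumption, because the text only says "the SD of EQ-5D index scores". Function names and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Effect size: difference in group means divided by the pooled SD (assumed)."""
    na, nb = len(a), len(b)
    pooled_sd = sqrt(
        ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    )
    return (mean(a) - mean(b)) / pooled_sd


def es_label(es):
    """Label an effect size using Cohen's bands as quoted in the text."""
    es = abs(es)
    if es >= 0.8:
        return "large"
    if es >= 0.5:
        return "moderate"
    if es >= 0.2:
        return "small"
    return "below threshold"


def relative_efficiency(stat_numerator, stat_denominator, is_t=True):
    """Ratio of squared t statistics (or of F statistics when is_t is False)."""
    if is_t:
        return stat_numerator ** 2 / stat_denominator ** 2
    return stat_numerator / stat_denominator
```

As a usage example, if one value set gave t = 3.0 and another t = 2.0 for the same known-groups comparison, `relative_efficiency(3.0, 2.0)` would be 2.25, meaning the first discriminates more efficiently for that comparison.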

Characteristics of respondents
A total of 1320 families were selected, of which 63 declined to take part, giving a response rate of 95.2%. In total, 4148 respondents were interviewed. Twenty-five respondents were excluded because of missing data on age (17), gender (6) or both (2), leaving 4123. Among them, 3028 respondents were aged 15 years or above. Forty-four were excluded because of missing data on the EQ-5D five-dimensional system. Finally, 2984 respondents were included in our analysis, with an average age of 36.7 (SD = 12.4) years. Detailed results are shown in Table 1.
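The respondent flow above can be checked with a few lines of arithmetic. All figures are taken from the text; only the variable names are introduced here.

```python
# Family-level response rate: 33 communities x 40 families, 63 refusals.
families_selected = 33 * 40
families_refused = 63
response_rate = (families_selected - families_refused) / families_selected

# Respondent-level exclusions.
interviewed = 4148
missing_age, missing_gender, missing_both = 17, 6, 2
with_age_and_gender = interviewed - (missing_age + missing_gender + missing_both)

aged_15_and_above = 3028
missing_eq5d = 44
analysed = aged_15_and_above - missing_eq5d

print(f"{response_rate:.1%}", with_age_and_gender, analysed)
```

The counts are internally consistent: the script prints `95.2% 4123 2984`.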

Quality profile of the EQ-5D
The results showed that 7.0% of the respondents reported some or extreme problems on one or more EQ-5D dimensions, with problems most often reported on the pain/discomfort dimension and least often on the self-care dimension. The proportion rose to 11.2% in the weighted analysis (S1 Table). S2 Table shows the weighted percentages of respondents with some or extreme problems in each dimension by age group and gender. In general, the percentages of problems increased in older age groups and were higher in women than in men in each EQ-5D dimension in most age groups. Table 2 shows the distribution of the EQ-5D index scores and the ceiling effect among respondents. In total, 93.0% of the respondents reported no problems in any dimension. The ceiling effect was higher in men (94.6%) than in women (91.4%). The China weights (mean ± SD: 0.985 ± 0.059) generated slightly higher mean utility scores than the UK (0.964 ± 0.133) and Japan (0.981 ± 0.073) weights, and slightly lower mean utilities than the Korea weights (0.987 ± 0.053). The mean scores differed significantly between every pair of weights (P < 0.05), but the differences were smaller than the MID, ranging from 0.002 (Korea/China) to 0.022 (Korea/UK). Distributions were skewed to the left for all weights.

Table 3 shows the agreement among the EQ-5D scores valued by the four preference weights, using ICCs from a two-way random-effects model with absolute agreement. Based on Rosner's criteria, all the ICCs were very high and excellent agreement was found between any two of the four weights. The highest ICC, 0.987, was between the China and Korea weights, and the lowest, 0.780, was between the UK and Korea weights. After truncation of EQ-5D scores below 0 for sensitivity analysis, the ICCs remained the same between the pairs of Asian weights and decreased by less than 0.01 between the UK and any of the Asian weights.
Fig 1a-1f shows the Bland-Altman plots for each pair of the four weighted EQ-5D scores, comparing the degree of agreement. The mean of the differences (d) and the limits of agreement (95% CI of d) are indicated by lines. Good agreement was found between each pair of weights, as over 90% of the difference scores fell within the limits of agreement.

Convergent validity
The Pearson correlation coefficients of EQ-5D index scores for the pairs of weights were all above 0.95 (P < 0.001) (Table 3). The correlations between the EQ-5D utilities and the EQ-VAS scores were significant (P < 0.001): 0.352 for the China and Japan weights, 0.353 for the UK weights and 0.343 for the Korea weights. Spearman's rho was -0.194 between each of the EQ-5D-based scores and the global ratings of health status (P < 0.001).

Known-groups validity
Overall, the known-groups validity and sensitivity measures were similar for the EQ-5D scores generated by the four preference weights (Table 4). The EQ-5D scores derived from each value set differed significantly across all 11 categories of known groups (P < 0.01). When respondents were grouped by age, global rating or outpatient visit, the score differences under each of the four preference weights were larger than the MID. Score differences larger than the MID were also observed between respondents with an education level below primary school and those with college and above using the China, UK and Japan weights; between respondents with and without chronic diseases, and with and without disease/injury in the last two weeks, using the UK weights; and between respondents with and without disease/injury in the last two weeks using the Japan weights.

Sensitivity
Among nine of the 11 known groups, the ES estimates for all schemes were larger than 0.2, indicating a certain degree of sensitivity. Large differences with ES ≥ 0.8 were observed for four known groups, i.e. age, education level, outpatient visit and global rating (Table 5). There were moderate differences, with ES between 0.5 and 0.8, for groups divided by chronic condition and two-week morbidity for all schemes. Small differences existed for two known groups divided by history of hospitalization and VAS level.
Comparing the four preference weights, the China scheme generated the largest RE estimates for three variables: age, education and household income. The UK-weighted scores were the most efficient at discriminating between groups for the remaining eight known groups, with the largest REs. The UK weights also provided slightly larger ES values than the other three weights for groups defined by three variables. The ES estimates from the China weights were the largest for groups defined by three variables and the second largest for groups defined by four variables.

Discussion
The study examined the psychometric properties of the China preference weights for EQ-5D utility scores in a general community population. Overall, the China, Japan, Korea and UK EQ-5D preference weights showed small differences and similar validity as measures of HRQoL in the Chinese general population. The scores generated by the four weights were close, with mean differences that were statistically significant but not meaningful. Excellent agreement and strong correlations were also found between any two of the four weights. In addition, the four weights discriminated similarly well among demographic, SES and health status known groups. The China value set showed slightly better discriminative ability for age, education and household income subgroups. To our knowledge, this was the first time the China preference weights had been tested. The study also provided evidence about the validity and performance of the UK, Japan and Korea EQ-5D preference weights in the Chinese general population, which had rarely been reported. Few studies had compared preference values and their algorithms for EQ-5D health states in different countries. However, strong positive correlations [9,13,24-26], a high level of agreement [24,25] and similar validity [26,27], with mixed results for REs [24][25][26][28][29][30], among the UK, Japan and Korea value sets had been reported in previous studies of general and patient samples from and outside these countries, including China [9,13,24-27]. In particular, Chang et al. [27] found that in a representative Taiwan sample the UK and Japan utilities discriminated equally well among the known groups of five-level global health status, two-week outpatient visits and one-year hospitalization, as in the present study.
Differences that were not meaningful in absolute magnitude among the EQ-5D utilities from the three value sets were also reported in previous studies, both in the general population [24,26,27] and in patients [25,28]. On the other hand, significantly different scores were found in this study and in some previous studies comparing the UK, Japan and other national tariffs [24,27]. The cross-country discrepancies may be ascribed to actual differences in people's preferences for health once the effects of technical aspects of value set development are ruled out: noise introduced during the translation of the EQ-5D scale, the valuation procedure (such as TTO), and differences in study design and method [9][10]. All four studies used TTO as the valuation procedure. Considering the simplicity of the EQ-5D scale and of TTO, the effects of the first two factors were unlikely to be great. Regarding the third factor, three protocols were used in the four studies: the MVH protocol, a modified version of the MVH protocol and the Paris protocol. The latter two protocols were both based on the MVH protocol and have been used in developing EQ-5D value sets around the world. All four studies developed acceptable models using these protocols. People's preferences for health would therefore be the main source of the differences.
Inter-country comparisons of the EQ-5D country-specific models revealed a significant tendency for Asian raters to give greater weight to the functional dimensions of health, whereas Western raters gave greater weight to the dimensions of pain/discomfort and anxiety/depression [7][8][9][10][11][12][13]. Valuations of EQ-5D health status by the general public were broadly similar across Western countries [18,31,32], while valuations across Asian countries such as Japan and China were consistent in that the percentages of respondents reporting problems in each dimension were very low except for the pain/discomfort dimension [15,19,33]. Similar findings were also obtained in international comparisons of the SF-6D [34][35][36][37][38][39]. Moreover, our results suggested that the three Asian weights performed closer to each other than to the UK weights in many respects. These observations illustrate similarities in health preferences among people in China, Korea and Japan.
The choice of weighting scheme in health valuation was a subject of debate [40]. It was clear that the China scheme established a model with high goodness of fit, with estimates superior to those of the UK and Korea models and close to those of the Japan model [7-9,13]. The general criterion underlying the priority of weighting schemes was that, partly because of cultural influences on subject ratings and differences in study settings, health valuations perform differently when applied to different populations [41]. Culture is a complex issue, but previous studies and this study implied that Asian value systems perform closer to each other, and better than the UK and US systems, in Asian populations [9,13,29,32,35], as noted above. In addition, the protocols of the model estimation studies were similar, but as many as 97 health states were directly valued in the China study, which minimized the need for interpolation in model estimation. Most importantly, the China version of the EQ-5D preference weights was found in our study to discriminate known groups efficiently, and even slightly better than the Japan and Korea weights considering both ES and RE estimates. Based on these factors, the China EQ-5D preference weights should be used preferentially for the Chinese population.
Compared with other studies in which EQ-5D value sets were evaluated and compared [24-27,41], the present study had a relatively large sample. However, owing to limited resources, it was carried out in only one city, and the findings should be generalized to the national population with caution because the sample was not nationally representative. To improve generalizability, Shenzhen was selected because a high proportion of its residents came from other areas of China, so the study population could represent the national population to a certain degree. In addition, the findings of the present study were similar to those of recent studies comparing the psychometric properties of the EQ-5D value sets of other countries in Chinese populations, for instance one study in an urban community population in northern China [24] and another in a rural community population in southern China [26]. This suggested that age, sex and location would not affect validity, as previous studies had shown [9,10,41,42]. In this regard, the results of this study were reliable and could be generalized to the national population to a certain degree. Another limitation was that the EQ-5D is usually self-administered but was administered face-to-face in this study, as in the NHSS, to make it possible to collect information from those with reading difficulties. To reduce inter-cluster correlation and interview bias, the interviewers were trained to avoid interference between family members during the interviews. Finally, the study was based on a cross-sectional investigation, and other measurement properties such as test-retest reliability and responsiveness were not available. These should be addressed in further research.

Conclusions
In conclusion, the validity and sensitivity of the China EQ-5D value set were verified in the Chinese general population by comparison with those of the UK, Japan and Korea value sets. The China TTO value set for the EQ-5D should be given preference for the general adult Chinese population, and wider use of the EQ-5D scale should be encouraged in China in the future.

S1 File. Chinese Version of the EQ-5D Preference Weights-Data. (XLSX)
S1 Table. Percentages of respondents reporting problems on each EQ-5D dimension. (DOCX)
S2 Table. Weighted percentages of respondents reporting moderate and severe problems on each EQ-5D dimension by age group and sex (%). (DOCX)