Estimation of Disability Weights in the General Population of South Korea Using a Paired Comparison

We estimated disability weights for the South Korean population using a paired comparison-only model in which ‘full health’ and ‘being dead’ were included as anchor points, without resorting to a cardinal method such as person trade-off. The study comprised 2 surveys: a household survey using computer-assisted face-to-face interviews and a web-based survey (similar to that of the GBD 2010 disability weight study). Paired comparison, visual analogue scale (VAS), and standard gamble (SG) were used as valuation methods in the household survey, whereas paired comparison and population health equivalence (PHE) were used in the web-based survey. A total of 258 health states were described, with ‘full health’ and ‘being dead’ designated as anchor points. Four models were considered in the analysis: a paired comparison-only model (Model 1); a hybrid model combining paired comparison and PHE (Model 2); a VAS model (Model 3); and an SG model (Model 4). A total of 2,728 and 3,188 individuals participated in the household and web-based surveys, respectively. The Pearson correlation coefficients between the disability weights from the GBD 2010 study and those from the current models were 0.802 for Model 2, 0.796 for Model 1, 0.681 for Model 3, and 0.574 for Model 4 (all P < 0.001). Discrimination of values according to health state severity was best in Model 1. Based on these results and for simplicity of analysis, the paired comparison-only model was selected as the best model for estimating disability weights in South Korea. Disability weights can thus be estimated more easily using paired comparison alone, with ‘full health’ and ‘being dead’ included among the compared health states. We believe that such a simplified methodology can help generate additional evidence on the universality of disability weights.


Introduction
Measuring disease burden is essential for setting priorities for health services and research [1]. However, quantifying disease burden is challenging. Although epidemiological indicators such as mortality and morbidity have been used as measures of disease burden, a common single measure reflecting the various aspects of a disease is needed [2]. In the Global Burden of Disease (GBD) 1990 study, the disability-adjusted life year (DALY) was used as a summary measure reflecting both the mortality and morbidity aspects of diseases [3]. Similar to the quality-adjusted life year (QALY), the DALY combines the impact of mortality with the occurrence and severity of diseases in a single index, thereby enabling disease burden to be compared across diseases [4].
DALYs are the sum of 2 components: years of life lost (YLLs) and years lived with disability (YLDs). YLLs reflect premature mortality, whereas YLDs represent time lived with a disability, i.e., short- or long-term loss of health [5]. YLDs can be regarded as the difference in disability between fully healthy people and people with a disease; under a prevalence-based approach, they are calculated by multiplying the number of people with a disease or sequela by the relevant disability weight [6]. The disability weight for a health state quantifies the severity of a disease or sequela as a proportional reduction from full health, on a scale from 0 (full health) to 1 (being dead). In DALYs, disability weights thus act as a bridge between mortality and morbidity.
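The prevalence-based arithmetic above can be illustrated with a minimal Python sketch. The inputs are hypothetical, and YLLs are computed simply as deaths multiplied by the remaining standard life expectancy at the age of death, without discounting or age weighting; this is a simplifying assumption rather than the exact GBD implementation.

```python
def years_of_life_lost(deaths, remaining_life_expectancy):
    """YLL: deaths multiplied by the standard life expectancy remaining at the age of death."""
    return deaths * remaining_life_expectancy


def years_lived_with_disability(prevalent_cases, disability_weight):
    """Prevalence-based YLD: prevalent cases multiplied by the disability weight.

    The weight runs from 0 (full health) to 1 (being dead).
    """
    if not 0.0 <= disability_weight <= 1.0:
        raise ValueError("disability weight must lie between 0 and 1")
    return prevalent_cases * disability_weight


def dalys(deaths, remaining_life_expectancy, prevalent_cases, disability_weight):
    """DALY = YLL + YLD."""
    return (years_of_life_lost(deaths, remaining_life_expectancy)
            + years_lived_with_disability(prevalent_cases, disability_weight))


# Hypothetical condition: 100 deaths, each with 30 years of remaining life expectancy,
# plus 10,000 prevalent cases living with a disability weight of 0.25.
print(dalys(100, 30, 10_000, 0.25))  # 3000 YLLs + 2500 YLDs = 5500.0
```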
Several studies have attempted to estimate disability weights for the GBD or national burden of disease studies by modifying and adapting existing methodologies [6][7][8][9][10][11]. However, the appropriate method for estimating disability weights, as well as the validity and universality of the estimated weights, remain controversial [12][13][14][15]. To address these controversies, the recent GBD 2010 disability weight study estimated disability weights for 220 health states using an adapted methodology [6]. In that study, household surveys in 5 countries and a web-based survey using paired comparison and population health equivalence (PHE, a modified form of person trade-off) yielded highly consistent disability weights. More recently, in the GBD 2013 study, the disability weights were updated by incorporating the results of a European disability weight study [16].
Nevertheless, several aspects of the assessment of disability weights, such as the methodological design and the validity of the values, were criticized following the publication of the GBD 2010 disability weight study [17,18]. In particular, the universality of the disability weights was questioned, indicating a need for more empirical evidence on universal disability weights and on the selection of health states [19][20][21]. Furthermore, given the disadvantages of person trade-off, such as its lack of theoretical basis and its cognitive burden [22], a simpler way to estimate disability weights is needed. We believe that adapting the current methodology for estimating disability weights can yield empirical evidence on their universality.
In the present study, we estimated disability weights using a paired comparison-only model in which 'full health' and 'being dead' were included as anchor points, without resorting to a cardinal method such as person trade-off. Specifically, we calculated and compared the disability weights from 4 models: a paired comparison-only model; a hybrid model combining paired comparison and PHE; a visual analogue scale (VAS) model; and a standard gamble (SG) model.

Study design and participants
The study was conducted through a household survey and a web-based survey in South Korea, following the design of the GBD 2010 disability weight study [6]. The household survey was performed from August 2014 to November 2014, and the web-based survey from September 2014 to November 2014. The household survey used computer-assisted face-to-face interviews, and the web-based survey was available only in Korean. This study was approved by the institutional review board of Asan Medical Center (S2014-1396-0002), and written informed consent was obtained from participants before the household survey.
For the household survey, the target population was adults (≥19 years of age) living in South Korea. To obtain a sample representative of the Korean population, a total of 2,728 participants were drawn from the target population using a multistage stratified quota method. Sample quotas were predefined by region, gender, age, and educational level based on the June 2013 resident registration data available through the Ministry of Administration and Security, South Korea. Household survey participants were approached on the street according to the quotas and invited to participate in the survey. Each household survey participant received approximately US$ 9 for completing the survey. For the web-based survey, participants were recruited through advertisements in medical colleges and hospitals, announcements at medical meetings and conferences, and word of mouth from other web-based survey participants.
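As an illustration of how predefined quotas can be derived, the sketch below allocates the 2,728 interviews across strata in proportion to population size using a largest-remainder rule. The strata and population counts are invented, only 3 of the 4 quota variables are used, and the rounding rule is an assumption; the text does not state how population shares were converted into quotas.

```python
def allocate_quotas(population_by_stratum, total_sample):
    """Allocate interviews to strata in proportion to population size.

    Each stratum first receives the floor of its exact proportional share; the
    interviews left over are then given, one each, to the strata with the largest
    fractional remainders, so the quotas sum exactly to total_sample.
    """
    population_total = sum(population_by_stratum.values())
    exact = {s: total_sample * n / population_total for s, n in population_by_stratum.items()}
    quotas = {s: int(share) for s, share in exact.items()}  # floor of a non-negative share
    shortfall = total_sample - sum(quotas.values())
    by_remainder = sorted(exact, key=lambda s: exact[s] - quotas[s], reverse=True)
    for s in by_remainder[:shortfall]:
        quotas[s] += 1
    return quotas


# Hypothetical strata (region x gender x age group) with invented population counts.
population = {
    ("capital area", "female", "19-39"): 4_100_000,
    ("capital area", "male", "19-39"): 4_300_000,
    ("capital area", "female", "40+"): 6_200_000,
    ("capital area", "male", "40+"): 5_800_000,
    ("other regions", "female", "19-39"): 3_600_000,
    ("other regions", "male", "19-39"): 3_900_000,
    ("other regions", "female", "40+"): 6_900_000,
    ("other regions", "male", "40+"): 6_400_000,
}
quotas = allocate_quotas(population, 2728)
print(sum(quotas.values()))  # 2728
```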

Health states
We tested a total of 258 health states reflecting a diversity of health outcomes resulting from disease. Each health state was described by a brief lay description explaining that health state in terms of several aspects of health [6]. Of the 258 health states, 220 were taken from the GBD 2010 disability weight study, and 11 were related to environmental diseases described in the Korea national burden of disease 2012 study. The health states related to environmental diseases were developed by the authors (M Ock and MW Jo) based on the existing lay descriptions from the GBD 2010 disability weight study to enhance comparability between health states. English lay descriptions of the added health states are provided in a supplemental file (S1 File).
Of the remaining health states, 25 were EQ-5D-5L health states selected using an orthogonal design [23]. Lastly, two health states ('full health' and 'being dead') were included as anchor points. M Ock first translated the 220 health states from the GBD 2010 disability weight study into Korean, and MW Jo revised the translations. Back translation was performed by a bilingual person and rechecked by M Ock and MW Jo.

Survey procedure and interviewer training
In both surveys, participants were first asked about their gender, age, and educational level. They then evaluated randomly selected health states using valuation methods that differed between the surveys: paired comparison, VAS, and SG in the household survey, and paired comparison and PHE in the web-based survey. Visual aids were used in the SG to help participants understand changes in probability. After evaluating the health states, participants were asked about other socio-demographic factors, such as current job and income, and about clinical information, such as ambulatory care visits in the past 2 weeks, hospitalization in the past 12 months, and morbidities. For morbidities, participants were asked whether they currently had any diseases.
Household survey interviewers were briefed on the survey procedure and the health states and were trained in each valuation method. All interviewers conducted 2 pilot tests before the field survey. The total training time for interviewers was approximately 2.5 hours.

Valuation method
In the household survey, participants were asked to state their preferences for health states using 3 valuation methods (paired comparison, VAS, and SG). First, in the paired comparison, participants were asked to select the healthier of 2 health states randomly drawn from the 258 health states (including 'full health' and 'being dead'). Each participant completed a total of 15 paired comparisons.
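The random draw of pairs can be sketched as follows. The state labels are hypothetical placeholders for the 258 health states, and drawing each participant's 15 pairs independently (so that a pair could in principle recur) is an assumption, since the text does not say how duplicate pairs were handled.

```python
import random


def draw_paired_comparisons(health_states, n_comparisons=15, rng=None):
    """Draw pairs of distinct health states for one participant.

    The two states within a pair are always distinct; pairs are drawn
    independently of one another (an assumption, not stated in the text).
    """
    if rng is None:
        rng = random.Random()
    return [tuple(rng.sample(health_states, 2)) for _ in range(n_comparisons)]


# Hypothetical labels standing in for the 258 health states
# (256 described states plus the two anchors).
states = ["full health", "being dead"] + [f"state_{i:03d}" for i in range(1, 257)]
pairs = draw_paired_comparisons(states, rng=random.Random(2014))
for first, second in pairs[:3]:
    print(f"Which is healthier: {first!r} or {second!r}?")
```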
Second, in the VAS, the participants were asked to rate the proposed health state on a scale from 0 to 100, with 0 representing the worst imaginable health state and 100 indicating the best imaginable health state. Each participant performed a total of 3 VAS tests. For the first and second VAS tests, 2 health states were randomly selected from among the 256 health states (excluding 'full health' and 'being dead'), while 'being dead' was assessed in the third VAS.
Third, in the SG, participants first compared a health state randomly selected from the 256 health states (excluding 'full health' and 'being dead') with 'being dead'. Each participant completed the SG 3 times. If the health state was judged worse than 'being dead', the participant moved on to the next SG question. If the health state was judged better than 'being dead', the participant was asked to choose between remaining in that health state for the rest of their life and undergoing an alternative treatment that could result either in restoration to full health or in immediate death. The questions were repeated until the participant had no preference between the 2 options. The probability of restoration to full health started at 50% and was changed in 5% steps according to the participant's responses, so the minimum probability interval was 5%. The same health state could appear in more than one of the paired comparison, VAS, and SG tasks.
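The SG titration described above can be sketched with a simulated respondent. The step size and starting point follow the text (5% steps from 50%); the respondent model, the rule of taking the midpoint when a preference reverses without an explicit indifference answer, and the handling of the 0% and 100% bounds are assumptions made for illustration.

```python
def make_simulated_respondent(true_utility_pct):
    """Hypothetical respondent whose utility for the health state is known (in %).

    The gamble offers full health with probability p and immediate death otherwise,
    so its expected utility is p; the respondent compares p with their utility.
    """
    def answer(p_pct):
        if p_pct > true_utility_pct:
            return "gamble"
        if p_pct < true_utility_pct:
            return "certain"
        return "indifferent"
    return answer


def standard_gamble(answer, start_pct=50, step_pct=5):
    """Titrate the probability of full health in fixed steps until indifference.

    Returns the indifference probability in percent. If the preference reverses
    between two adjacent steps without an explicit 'indifferent' answer, the
    midpoint of those two probabilities is returned (an assumption).
    """
    p, last_choice = start_pct, None
    while True:
        choice = answer(p)
        if choice == "indifferent":
            return float(p)
        if last_choice is not None and choice != last_choice:
            # Gamble now preferred after raising p: indifference lies just below p.
            # Certain state now preferred after lowering p: it lies just above p.
            return p - step_pct / 2 if choice == "gamble" else p + step_pct / 2
        if choice == "gamble":
            if p == 0:
                return 0.0  # gamble preferred even at certain death (screened out beforehand)
            p -= step_pct  # gamble too attractive: lower its chance of full health
        else:
            if p == 100:
                return 100.0  # health state preferred even to certain full health
            p += step_pct  # gamble not attractive enough: raise its chance of full health
        last_choice = choice


print(standard_gamble(make_simulated_respondent(70)))  # 70.0
print(standard_gamble(make_simulated_respondent(72)))  # 72.5
```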
The participants in the web-based survey were asked to evaluate health states by using 2 valuation methods: paired comparison and PHE. As in the household survey, in the paired comparison, the participants were asked to choose the healthier option between 2 health states, which were randomly extracted from among the 258 health states (including 'full health' and 'being dead'). Each participant performed a total of 15 paired comparisons.
In the PHE, participants were asked to choose the better of 2 programs [6]. The first ('program A') was a life-saving program that prevented 1,000 people from getting a fatal illness causing rapid death. The second ('program B') was a disease-prevention program that prevented a certain number of people (randomly selected from 1,500, 2,000, 3,000, 5,000, and 10,000) from getting a less fatal illness resulting in the proposed health state. The health state for 'program B' was randomly selected from the 256 health states (excluding 'full health' and 'being dead'). If the participant thought 'program A' produced a greater overall population health benefit, the number of people for 'program B' was increased to the next higher value among 1,500, 2,000, 3,000, 5,000, and 10,000. Conversely, if the participant thought 'program B' produced a greater overall population health benefit, the number was decreased to the next lower value. The questions continued until the choice switched from 'program A' to 'program B' or vice versa, or until the number for 'program B' could no longer be increased or decreased.
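The PHE iteration can be sketched in the same way. The patient counts and the 1,000 lives of 'program A' come from the text; interpreting a choice of 'program B' for n people as implying n × DW > 1,000 (and a choice of 'program A' as n × DW < 1,000) is an assumption based on the equivalence logic of the PHE question. The resulting bounds are the kind of interval-censored outcome that the interval regression in Model 2 works with.

```python
PATIENT_STEPS = [1500, 2000, 3000, 5000, 10000]  # numbers offered for program B
LIVES_SAVED_BY_A = 1000                          # program A prevents 1,000 rapid deaths


def run_phe(prefers_b, start_index):
    """Iterate one PHE question; returns a list of (patients in B, chose B?) answers."""
    answers, i = [], start_index
    while True:
        n = PATIENT_STEPS[i]
        chose_b = prefers_b(n)
        answers.append((n, chose_b))
        if len(answers) > 1 and chose_b != answers[-2][1]:
            break  # choice switched between program A and program B
        i = i - 1 if chose_b else i + 1  # B preferred: fewer patients; A preferred: more
        if not 0 <= i < len(PATIENT_STEPS):
            break  # number of patients for B cannot be changed further
    return answers


def implied_bounds(answers):
    """Bounds on the disability weight (DW) implied by the answers.

    Assumes that choosing B for n people means n * DW > 1000 and choosing A means
    n * DW < 1000. An upper bound of None means DW is not bounded from above.
    """
    lower = max((LIVES_SAVED_BY_A / n for n, b in answers if b), default=0.0)
    upper = min((LIVES_SAVED_BY_A / n for n, b in answers if not b), default=None)
    return lower, upper


def respondent(n_patients):
    """Hypothetical respondent with an implicit disability weight of 0.4 for the state."""
    return n_patients * 0.4 > LIVES_SAVED_BY_A


print(implied_bounds(run_phe(respondent, start_index=0)))  # DW between 1/3 and 0.5
```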

Analysis
Descriptive analyses of the socio-demographic factors were conducted first. Disability weights for the health states were then estimated from participants' responses using 4 models: a paired comparison-only model (Model 1); a hybrid model combining paired comparison and PHE (Model 2); a VAS model (Model 3); and an SG model (Model 4).
In Model 1, we randomly selected 80% of the pooled paired comparison data from the household and web-based surveys; the remaining 20% were used to assess the fit of Model 1. Probit regression, which is commonly used to analyze paired comparison data [24], was applied with the stated paired comparison choice as the dependent variable. The 258 health states were entered as independent dummy variables, with 'being dead' as the reference. Predicted probabilities were calculated from the coefficient estimates for each health state. To anchor the transformed predicted probabilities on the disability weight scale from 0 to 1, the values for 'being dead' (1) and 'full health' (0) were used as anchor points. Model fit was assessed as the mean absolute difference between the observed probabilities of being selected in the 20% hold-out data and the probabilities predicted from the model fitted to the 80% data.
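To make the Model 1 procedure concrete, the sketch below fits a probit model to simulated paired comparisons for a toy set of five health states, using dummy coding (+1 for the first state, -1 for the second) with 'being dead' as the reference. Several choices are assumptions rather than details taken from the study: the data are simulated from invented latent values, the model is fitted by Fisher scoring in NumPy instead of Stata, the 'predicted probability' of each state is taken as the probability of it being preferred to 'being dead' and anchored linearly so that 'full health' maps to 0 and 'being dead' to 1, and fit is summarized over ordered pairs in the 20% hold-out set.

```python
import math

import numpy as np

# Toy setting (hypothetical): five health states with invented latent values.
STATES = ["full health", "mild", "moderate", "severe", "being dead"]
TRUE_VALUE = np.array([2.0, 1.2, 0.6, 0.2, 0.0])
REFERENCE = STATES.index("being dead")
FULL = STATES.index("full health")
_erf = np.vectorize(math.erf)


def norm_cdf(z):
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))


def norm_pdf(z):
    return np.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def simulate(n, rng):
    """Simulate n paired comparisons; y = 1 if the first state is chosen as healthier."""
    a = rng.integers(0, len(STATES), n)
    b = (a + rng.integers(1, len(STATES), n)) % len(STATES)  # guarantees b != a
    y = (TRUE_VALUE[a] - TRUE_VALUE[b] + rng.standard_normal(n) > 0).astype(float)
    return a, b, y


def design(a, b):
    """Dummy coding: +1 for the first state, -1 for the second; reference column dropped."""
    rows = np.arange(len(a))
    X = np.zeros((len(a), len(STATES)))
    X[rows, a] += 1.0
    X[rows, b] -= 1.0
    return np.delete(X, REFERENCE, axis=1)


def fit_probit(X, y, iterations=25):
    """Probit maximum likelihood (no intercept) by Fisher scoring."""
    beta = np.zeros(X.shape[1])
    for _ in range(iterations):
        eta = X @ beta
        p = np.clip(norm_cdf(eta), 1e-10, 1 - 1e-10)
        d = norm_pdf(eta)
        score = X.T @ ((y - p) * d / (p * (1 - p)))
        info = X.T @ (X * (d * d / (p * (1 - p)))[:, None])
        beta = beta + np.linalg.solve(info, score)
    return np.insert(beta, REFERENCE, 0.0)  # 'being dead' coefficient fixed at 0


def mean_abs_difference(coef, a, b, y):
    """Mean |observed - predicted| choice probability over the ordered pairs present."""
    diffs = []
    for i in range(len(STATES)):
        for j in range(len(STATES)):
            mask = (a == i) & (b == j)
            if mask.any():
                predicted = float(norm_cdf(coef[i] - coef[j]))
                diffs.append(abs(y[mask].mean() - predicted))
    return float(np.mean(diffs))


rng = np.random.default_rng(0)
a, b, y = simulate(20_000, rng)
order = rng.permutation(len(y))
cut = int(0.8 * len(y))
train, hold_out = order[:cut], order[cut:]

coef = fit_probit(design(a[train], b[train]), y[train])
prob_vs_dead = norm_cdf(coef)  # predicted probability each state is preferred to 'being dead'
# Linear anchoring (assumed form): 'full health' -> 0, 'being dead' -> 1.
dw = (prob_vs_dead[FULL] - prob_vs_dead) / (prob_vs_dead[FULL] - prob_vs_dead[REFERENCE])
mad = mean_abs_difference(coef, a[hold_out], b[hold_out], y[hold_out])

for state, weight in zip(STATES, dw):
    print(f"{state:12s} DW = {weight:.3f}")
print(f"hold-out mean absolute difference = {mad:.3f}")
```

Anchoring on the probability scale keeps 'full health' at exactly 0 and 'being dead' at exactly 1 by construction; a state rated worse than 'being dead' would receive a weight above 1.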
In Model 2, we used the pooled paired comparison data from the household and web-based surveys together with the PHE data from the web-based survey. We first obtained predicted probabilities from the paired comparison data as in Model 1, and then performed interval regression on the PHE data to obtain disability weight estimates, adapting the methodology of the GBD 2010 disability weight study [6]. To link the paired comparison results with the PHE-derived disability weight estimates, linear regression was applied with the PHE-derived estimates as the dependent variable and the predicted probabilities from the paired comparison as the independent variable. The predicted values from this regression for each health state were regarded as the disability weights for Model 2.
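The linking step of Model 2 amounts to an ordinary least-squares line fitted at the health-state level. In the sketch below, the inputs are hypothetical and the interval-regression step that produces the PHE-based weights is not shown; the fitted line is then used to map paired comparison probabilities onto the disability weight scale.

```python
import numpy as np


def link_pc_to_phe(pc_prob, phe_dw):
    """OLS of PHE-based disability weights on paired-comparison predicted probabilities.

    Returns (intercept, slope); np.polyfit lists the highest-degree coefficient first.
    """
    slope, intercept = np.polyfit(np.asarray(pc_prob, float), np.asarray(phe_dw, float), deg=1)
    return intercept, slope


def apply_link(intercept, slope, pc_prob):
    """Map paired-comparison probabilities for any health state onto the DW scale."""
    return intercept + slope * np.asarray(pc_prob, float)


# Hypothetical state-level inputs: paired-comparison probabilities (Model 1 step)
# and PHE-based disability weights (interval-regression step, not shown).
pc_prob = [0.95, 0.85, 0.70, 0.55]
phe_dw = [0.06, 0.21, 0.44, 0.69]
intercept, slope = link_pc_to_phe(pc_prob, phe_dw)
print(apply_link(intercept, slope, [0.90, 0.60]))
```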
In Models 3 and 4, the concept of disutility was applied: disutility is defined as 1 minus utility and was assumed to be equal to the disability weight. For Model 3, we used VAS data from the household survey. If 'being dead' was rated 0 on the VAS, the utility weight of a health state was calculated as (VAS value of the health state)/100. Otherwise, the utility weight was calculated as (VAS value of the health state - VAS value of 'being dead')/(100 - VAS value of 'being dead'). As in the other models, we obtained predicted values using linear regression and regarded them as the disability weights for Model 3.
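The two VAS formulas are a single rescaling in which 'being dead' maps to a utility of 0 and the top of the scale to 1; the first formula is the special case in which 'being dead' is rated 0. A minimal sketch of the per-response calculation follows; the subsequent linear regression is not shown, and the example ratings are hypothetical.

```python
def vas_disability_weight(vas_state, vas_dead):
    """Disability weight (disutility) from VAS ratings on a 0-100 scale.

    Rescales so that 'being dead' maps to utility 0 and the top of the scale (100)
    to utility 1; when vas_dead == 0 this reduces to vas_state / 100.
    """
    if vas_dead >= 100:
        raise ValueError("'being dead' must be rated below 100")
    utility = (vas_state - vas_dead) / (100 - vas_dead)
    return 1 - utility


print(vas_disability_weight(60, 20))  # utility (60 - 20) / 80 = 0.5, so DW = 0.5
print(vas_disability_weight(60, 0))   # utility 0.6, so DW is about 0.4
```

A state rated below 'being dead' gets a negative utility and therefore a disability weight above 1.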
For Model 4, we used SG data from the household survey. Utility weights were calculated according to the response in the comparison between the proposed health state and 'being dead'. If the health state was judged better than 'being dead', its utility weight was the probability of restoration to full health at the point of indifference. Utility weights of health states judged worse than 'being dead' were censored at 0. As in Model 3, disutility was calculated as 1 - utility. We then estimated predicted values using linear regression and regarded them as the disability weights for Model 4.
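The per-response SG conversion, including the censoring of states judged worse than 'being dead', can be written in a few lines. This is a sketch with hypothetical inputs; the subsequent linear regression is not shown.

```python
def sg_disability_weight(indifference_probability, worse_than_dead=False):
    """Disability weight from one standard gamble response.

    indifference_probability: probability (0-1) of restoration to full health at which
    the participant was indifferent; its value is taken as the utility weight.
    States judged worse than 'being dead' are censored at a utility of 0.
    """
    utility = 0.0 if worse_than_dead else indifference_probability
    return 1.0 - utility


print(sg_disability_weight(0.75))                       # DW = 0.25
print(sg_disability_weight(0.0, worse_than_dead=True))  # censored: DW = 1.0
```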
We calculated 95% confidence intervals for the disability weights from the 95% confidence intervals of the predicted values in each model. The frequency distributions of the disability weights from each model were determined, and Pearson correlation coefficients were calculated to compare them with those from the GBD 2010 disability weight study. Furthermore, to evaluate the validity of the disability weights from the best model, 1 minus the disability weight of each EQ-5D-5L health state was compared with the corresponding utility weight from the Korean EQ-5D-5L valuation study [25]. All statistical analyses were conducted using Stata 13.1 (StataCorp, College Station, TX). The Stata code is available from the author upon request. P-values below 0.05 were considered statistically significant.
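The two comparisons with the GBD 2010 weights, the Pearson correlation coefficient and the share of health states below a disability weight of 0.4, can be sketched in plain Python. The weights below are invented for illustration and are not results of this study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def share_below(weights, threshold=0.4):
    """Proportion of health states with a disability weight below the threshold."""
    return sum(w < threshold for w in weights) / len(weights)


# Hypothetical weights for the same six health states from two sources.
model_dw = [0.10, 0.25, 0.38, 0.52, 0.70, 0.91]
reference_dw = [0.01, 0.05, 0.12, 0.20, 0.45, 0.70]
print(round(pearson(model_dw, reference_dw), 3))
print(share_below(model_dw), share_below(reference_dw))
```

As the example suggests, two sets of weights can be highly correlated while their distributions on the 0 to 1 scale differ markedly, which is why both comparisons are reported.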

Results
A total of 2,728 individuals participated in the household survey and 3,188 in the web-based survey. The socio-demographic and clinical characteristics of the participants in each survey are summarized in Table 1. Compared with household survey participants, web-based survey participants tended to be younger, female, and engaged in non-manual labor, and to have higher educational levels and monthly household incomes. Household survey participants also tended to report fewer clinically relevant medical problems than web-based survey participants.
The estimated disability weights for the 256 health states (excluding 'full health' and 'being dead') from the models and for the 220 health states from the GBD 2010 disability weight study are shown in Table 2. The 95% confidence intervals for each model are given in S1 Table. The frequency distributions of the disability weights of the 220 overlapping health states are presented by model in Table 3. In the GBD 2010 disability weight study, 85.5% of the health states had a disability weight below 0.4. In this study, however, the frequency distribution of the disability weights differed by model. The proportion of health states with a disability weight below 0.4 was 30% in Model 1, 98.6% in Model 2, …
Based on the distribution of the disability weights of the health states for each model and the Pearson correlation coefficients, the paired comparison-only model based on probit regression was selected as the best model for estimating disability weights in South Korea while keeping the analysis simple. The estimated disability weights and 95% confidence intervals for the 256 health states (excluding 'full health' and 'being dead') from Model 1 are shown in Table 4. The health state with the highest disability weight (0.912) was 'Spinal cord lesion at neck level: untreated' (N191 in Table 4), followed by 'Spinal cord lesion below neck:

Discussion
In the present study, disability weights for 256 health states were estimated based on the responses of 2,728 household survey participants and 3,188 web-based survey participants from the general South Korean population. Of the 4 models used to analyze the responses, Model 1, the paired comparison-only model, was selected as the best model for estimating disability weights in South Korea, based on the distribution of the disability weights, the Pearson correlation coefficient, and the simplicity of the analysis. Although the Pearson correlation coefficient was highest in Model 2, it differed from that of Model 1 by only 0.006, and Model 1 discriminated better than Model 2 between health states of differing severity. We showed that disability weights can be estimated from paired comparison data alone by including 'full health' and 'being dead' as anchor points in the list of compared health states.
The results of the present study, in particular those from Model 1, show that PHE data are not needed to calculate disability weights. PHE is a revised version of the person trade-off (PTO) proposed by Nord [26]. Although there are several variants of PTO [6,8,10,27,28], its lack of theoretical support, ethical concerns about distributional preferences, and the questionable validity of forced consistency have been reported [13,22,29]. Nevertheless, in the GBD 2010 disability weight study, PHE responses were still used to anchor the results from the paired comparison data on the disability weight scale between 0 and 1 [6]. In the GBD 2013 study, the disability weights were revised by including the results of the European disability weight study; however, PHE data were again used in addition to paired comparison data to estimate the disability weights [16,30]. In those studies, therefore, using a trade-off method to link the PHE results with the paired comparison data was unavoidable. Adding 'full health' and 'being dead' as anchor points in the estimation process, however, can help overcome the concerns about PTO and PHE. In other valuation methods, such as SG and time trade-off, 'being dead' serves as a reference for eliciting participants' preferences [2]; in the VAS, the 'best imaginable health state' and the 'worst imaginable health state' serve as references [31]. Similarly, by adding 'full health' and 'being dead' to the list of health states, disability weights can be estimated from paired comparisons without relying on PTO or PHE. Using paired comparison as the only valuation method can also simplify data analysis.
Table 3. Distribution of disability weights for the 220 health states from the GBD 2010 study according to each model.
Moreover, the analytical methods for paired comparison data have a sound theoretical basis. For example, Thurstone's model has been widely used to analyze paired comparison data since the 1920s [32], and the Bradley-Terry model has also been used extensively [33,34]. When survey questions have binary choices, as in the present study, probit regression models such as Model 1 are appropriate for data analysis [22]. Because paired comparisons are easier to understand and more convenient to administer than PTO or PHE, they yield more consistent responses, particularly from participants with lower educational levels [35]. Taken together, these points suggest that a paired comparison-only model is an appropriate method for estimating disability weights in future studies.
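Thurstone's Case V model corresponds to a probit link on differences in latent values, which is the specification used in Model 1, whereas the Bradley-Terry model uses a logistic link, with P(i preferred to j) = s_i / (s_i + s_j). For comparison, the sketch below fits Bradley-Terry strengths with the minorization-maximization (MM) update; it was not used in this study, and the win counts for the three states are hypothetical. The update assumes every state is compared with others and wins at least once.

```python
def bradley_terry(wins, iterations=200):
    """Fit Bradley-Terry strengths with the MM update.

    wins[i][j] is the number of times item i was preferred to item j. Each step sets
    s_i = W_i / sum_j n_ij / (s_i + s_j), where W_i is the total wins of item i and
    n_ij the number of comparisons between i and j; strengths are renormalized to
    sum to 1 after every step.
    """
    k = len(wins)
    strengths = [1.0 / k] * k
    for _ in range(iterations):
        updated = []
        for i in range(k):
            total_wins = sum(wins[i])
            denominator = sum(
                (wins[i][j] + wins[j][i]) / (strengths[i] + strengths[j])
                for j in range(k) if j != i
            )
            updated.append(total_wins / denominator)
        norm = sum(updated)
        strengths = [v / norm for v in updated]
    return strengths


# Hypothetical win counts for three health states (row preferred to column).
wins = [
    [0, 7, 9],  # state A preferred to B 7 times and to C 9 times
    [3, 0, 7],  # state B
    [1, 3, 0],  # state C
]
strengths = bradley_terry(wins)
p_a_over_b = strengths[0] / (strengths[0] + strengths[1])
print([round(s, 3) for s in strengths], round(p_a_over_b, 3))
```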
Because disability weights were estimated for numerous health states on a single scale from 0 to 1, some weights may appear counterintuitive in magnitude or rank compared with others. However, assessing the validity, particularly the concurrent validity, of disability weights is difficult because there is no gold standard [17]. In the present study, we used the EQ-5D-5L health states to evaluate the validity of the disability weights and to support the robustness of the analytic method. For the 25 EQ-5D-5L health states in Model 1, 1 minus the disability weight and the corresponding utility weight showed a fairly high Pearson correlation coefficient. For EQ-5D-5L 11111, which indicates no problems in any of the 5 dimensions, the disability weight was estimated at 0.116. However, EQ-5D-5L valuation models generally include a constant term, and the constant in the tariff formula of the Korean EQ-5D-5L valuation study (0.096) was similar to the disability weight of EQ-5D-5L 11111 in Model 1 [25]. Because we cannot determine whether the disability weights or the utility weights should be considered the gold standard, we only compared the utility weights and disability weights for the EQ-5D-5L health states and did not use them to adjust the results. Another way to check the validity of disability weights is to look for rank reversals among health states of different severity levels (e.g., mild, moderate, and severe) [17]. In Model 1, there was no inversion of disability weights across severity levels. For example, for 'Hearing loss', the disability weights were 0.138 for mild, 0.231 for moderate, 0.406 for severe, 0.491 for profound, and 0.669 for complete hearing loss.
In contrast, the disability weights for 'Hearing loss' in the GBD 2010 disability weight study were 0.005 for mild, 0.023 for moderate, 0.032 for severe, 0.031 for profound, and 0.033 for complete hearing loss [6]. In the GBD 2013 disability weight study, they were 0.010 for mild, 0.027 for moderate, 0.158 for severe, 0.204 for profound, and 0.215 for complete hearing loss [16]. Thus, the disability weights for 'Hearing loss' in the present study were larger than those in both the GBD 2010 and GBD 2013 disability weight studies. Although the validity of disability weights is difficult to establish, further discussion on refining the methodology, including the analytical approach, is needed to obtain valid disability weights.
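The rank-reversal check described above can be expressed as a scan over adjacent severity levels: a series is free of inversions exactly when each level's weight is at least that of the milder level before it. The sketch applies it to the 'Hearing loss' weights quoted in this section; for the GBD 2010 values it flags the severe-to-profound step (0.032 to 0.031).

```python
def severity_inversions(weights_by_level):
    """Return adjacent severity levels whose disability weights decrease.

    weights_by_level: list of (level, weight) pairs ordered from least to most severe.
    """
    return [
        (milder, more_severe)
        for (milder, w_milder), (more_severe, w_more_severe)
        in zip(weights_by_level, weights_by_level[1:])
        if w_more_severe < w_milder
    ]


levels = ["mild", "moderate", "severe", "profound", "complete"]
hearing_model1 = list(zip(levels, [0.138, 0.231, 0.406, 0.491, 0.669]))
hearing_gbd2010 = list(zip(levels, [0.005, 0.023, 0.032, 0.031, 0.033]))
print(severity_inversions(hearing_model1))   # []
print(severity_inversions(hearing_gbd2010))  # [('severe', 'profound')]
```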
Contextual differences in the perception of health problems are also important. Although the universality of disability weights has been questioned, previous studies have shown conflicting results. A study among western European countries reported a reasonably high level of agreement in the ranking of disability weights [27], and the GBD 2010 disability weight study showed strong evidence of highly consistent disability weights across countries [6]. However, another study found differences in the ranking of most health states among 14 countries, indicating a lack of universality in disability weight assessment [36]. In the present study, the disability weights from Model 1 showed a pattern similar to those from the GBD 2010 disability weight study (Pearson correlation coefficient 0.796, P < 0.001). However, only a few studies have investigated contextual differences in disability weight assessment, and further studies are needed to determine the universality of disability weights. As noted above, a paired comparison-only model may simplify the conduct of disability weight studies at the national level, and pooling data from such studies may help address concerns about the universality of disability weights.
Health state descriptions play an important role in the resulting disability weights [6,17]. We mainly used the lay descriptions of health states from the GBD 2010 disability weight study, translated from English into Korean. For this reason, the patterns of disability weights were similar between our study and the GBD 2010 disability weight study. For example, the disability weights for drug addiction health states were high relative to other health states in both studies. We suspect that the social stigma associated with drug addiction may have influenced participants' perceptions, because the lay descriptions of these health states included the names of the drugs. This phenomenon became more apparent when compared with the disability weights for health states related to HIV or AIDS, whose descriptions did not mention "HIV" or "AIDS". We also suspect that this phenomenon may be more prominent in surveys of the general public than in surveys of health professionals. Hence, comparing the results of surveys of the general public and of health professionals would be meaningful, and the disability weights may need to be re-estimated after the lay descriptions are revised, to reduce the controversy over these weights.
One limitation of the present study is that the response rates of the household and web-based surveys were not obtained. Therefore, we could not determine how many people refused to participate in either survey or dropped out during the surveys. Moreover, we could not exclude the possibility of non-response bias, which may limit the representativeness of this study for Korea. Another limitation is that the responses in the web-based survey were not verified. For example, although web-based survey participants tended to be younger, they reported more clinically relevant medical problems than household survey participants. Hence, quality control, including the verification of responses, will be required in future web-based surveys.

Conclusions
Based on the distribution of the disability weights of the health states, the Pearson correlation coefficient, and the simplicity of the analysis, the paired comparison-only model is the best model for estimating disability weights in South Korea. Hence, disability weights can be estimated using paired comparisons alone by including 'full health' and 'being dead' as anchor points in the list of health states. Furthermore, we used the EQ-5D-5L health states to evaluate the validity of the disability weights, which supported the robustness of the paired comparison-only model. We believe that adapting and simplifying the methodology for estimating disability weights, as in the present study, can provide additional empirical evidence on the universality of disability weights.