Psychometrics of the Short Form 36 Health Survey Version 2 (SF-36v2) and the Quality of Life Scale for Drug Addicts (QOL-DAv2.0) in Chinese Mainland Patients with Methadone Maintenance Treatment

Objective To test the psychometric properties of the Short Form 36 Health Survey version 2 (SF-36v2) and the Quality of Life Scale for Drug Addicts (QOL-DAv2.0) in Chinese mainland patients with methadone maintenance treatment (MMT). Methods A total of 1,212 patients were recruited from two MMT clinics in Xi'an, China. Reliability was estimated with Cronbach's α and the intra-class correlation coefficient (ICC). Convergent and discriminant validity were assessed using a multitrait-multimethod correlation matrix. Sensitivity was measured with ANOVA and relative efficiency. Responsiveness was evaluated with pre-post paired-samples t-tests and the standardized response mean, based on the patients' health status changes over a 6-month period. Results Cronbach's α of the SF-36v2 physical and mental summary components were 0.80 and 0.86 (eight scales range: 0.73–0.92), and that of the QOL-DAv2.0 was 0.96 (four scales range: 0.80–0.93). ICCs of the two SF-36v2 summary components were 0.86 and 0.85 (eight scales range: 0.72–0.87), and that of the QOL-DAv2.0 was 0.94 (four scales range: 0.88–0.92). Convergent validity between the two instruments was low (γ < 0.70), while discriminant validity within each instrument was acceptable. Sensitivity was satisfactory for self-evaluated health status (both instruments) and for average daily methadone dose (SF-36v2 physical functioning and vitality scales; QOL-DAv2.0 except the psychology scale). Responsiveness was acceptable for improved health status (SF-36v2 except the vitality scale; QOL-DAv2.0 except the psychology and symptoms scales) and for deteriorated health status (SF-36v2 except the vitality, social functioning and mental health scales; QOL-DAv2.0 except the society scale). Conclusions The SF-36v2 and the QOL-DAv2.0 are valid tools and can be used independently or complementarily, according to the emphasis of the health-related quality of life evaluation in patients with MMT.


Introduction
Drug abuse has been a common problem in China over the past three decades [1]. Official statistics show that the number of registered drug users increased from 70,000 in 1990 [2] to 2.22 million at the end of May 2013 [3]. Among drug users, about 75% to 85% are dependent on opioids [4]. Opioid dependence is a chronic maladaptive pattern of heroin or other opioid use, which is often associated with co-morbid psychiatric disorders and an elevated risk of infection and transmission of blood-borne diseases (e.g., HIV/AIDS, hepatitis B or C), premature death and drug-related crime [5]. It has adverse effects on individuals and society and has become a major public health and social problem.
Methadone maintenance treatment (MMT) is a long-term opioid replacement therapy involving daily methadone administration [6]. In China, MMT was initiated in 2004 as a pilot program in 8 clinics serving 1,029 drug users [7] and subsequently expanded to 748 clinics serving 360,000 drug users in 2012 [8]. MMT is based on a harm reduction philosophy, represents one component of a continuum of treatment approaches for opioid-dependent individuals, and allows a return to normal physiological, psychological and social functioning [9]. International evidence-based practice has shown that MMT is effective in preventing transmission of blood-borne diseases, reducing illegal drug use and high-risk behaviors, avoiding criminal involvement and enhancing social productivity [10]. However, limited consideration has been given to person-centered outcomes such as health-related quality of life (HRQoL) [11].
HRQoL is a multifactorial construct that describes the individual's perceptions of physical, psychological and social functioning [12]. Unlike clinical parameters, HRQoL provides a more holistic assessment of health status in terms of the individual's functional health and well-being, especially in chronic diseases [13]. Because opioid dependence is a chronic disorder requiring continuing care and support, and MMT is a long-term substitution therapy for opioid dependence [14], it is particularly important to use HRQoL as the primary endpoint for evaluating health and treatment effects in patients with MMT.
HRQoL is measured using generic or disease-specific instruments. Generic instruments provide a global assessment and allow comparisons across health conditions, while disease-specific instruments evaluate the difficulties of a specific group of patients or those associated with a specific disorder [15]. In opioid dependence research, commonly used generic HRQoL instruments include the Brief Version of the World Health Organization Quality of Life Instrument (WHOQOL-BREF) [16][17][18], the EuroQol-5D (EQ-5D) [19], the Short Form 36 Health Survey (SF-36) [20][21][22] and the Lancashire Quality of Life Profile (LQoLP) [11,23], and disease-specific instruments include the Injection Drug User Quality of Life Scale (IDUQoL) [24], the Health-related Quality of Life for Drug Abusers (HRQoLDA) [25] and the Quality of Life Scale for Drug Addicts (QOL-DAv2.0) [26,27]. However, few studies have evaluated the Short Form 36 Health Survey version 2 (SF-36v2) in MMT patients, and the complementary use of the SF-36v2 and the QOL-DAv2.0 has not been assessed in this population in mainland China.
The objective of this study was to evaluate the reliability, validity, sensitivity and responsiveness of the SF-36v2 and the QOL-DAv2.0 in Chinese mainland MMT patients. To our knowledge, this is the first study to test the psychometric properties of both instruments in the same population. The findings will help identify appropriate tools for health management and provide evidence for the need-oriented use of the SF-36v2 and the QOL-DAv2.0 in HRQoL evaluation in patients with MMT.

Ethics Statement
The study protocol was reviewed and approved by the Human Research Ethics Committee of Xi'an Jiaotong University. Written informed consent was obtained from each recruited patient before the questionnaire survey.

Subjects and Data Collection
The subjects were patients admitted to the Minle MMT clinic (privately funded) and the Xinan MMT clinic (publicly funded), the two clinics with the largest numbers of patients among the twelve MMT clinics in Xi'an, China. Inclusion criteria were being aged 18 years or over and being Chinese-speaking. Patients with cognitive disorders or who refused to give written informed consent were excluded.
Data were collected from March to September 2012. Each recruited patient was given an individual face-to-face interview by trained interviewers in a quiet, well-lit room. At baseline (pretest), the patients answered questions on sociodemographic and clinical characteristics, the SF-36v2 and the QOL-DAv2.0. The two instruments were readministered one week later (retest) and after a 6-month follow-up (post-test).

Data Measurement
The SF-36 Health Survey version 2 (SF-36v2). The Simplified Chinese SF-36v2, provided by QualityMetric Incorporated [28], was used in this study. It is the standard (4-week recall) form and consists of thirty-six items. Except for the single item on self-evaluated transition (SET) (item 2), the other thirty-five items are summated into eight multi-item scales: physical functioning (PF), limitations due to physical health problems [role-physical (RP)], bodily pain (BP), general health (GH), vitality (VT), social functioning (SF), limitations due to emotional health problems [role-emotional (RE)] and mental health (MH). The eight scales are aggregated into a physical component summary (PCS) and a mental component summary (MCS). The two summary components and eight scales were scored with QualityMetric Health Outcomes™ Scoring Software 4.0, using norm-based scoring with a mean of 50 and a standard deviation of 10 [29]. For all scales and summary components, higher scores indicate better HRQoL.
Quality of Life Scale for Drug Addicts (QOL-DAv2.0). The QOL-DAv2.0 was developed by Wan et al. in 1995 [30]. It consists of forty items measuring four scales, physiology (PH) (9 items), psychology (PS) (9 items), society (SO) (11 items) and symptoms (ST) (11 items), plus one independent single item on self-evaluated health status (item 41). Each item is rated on a 5-point Likert scale from 1 (none/very difficult/very poor) to 5 (severe/very easy/very good). Each scale score is the sum of its item scores, ranging from 9 to 45 (PH and PS) and from 11 to 55 (SO and ST). The total score is the sum of the four scale scores, ranging from 40 to 200. For all scales and the total score, higher scores indicate better HRQoL. The single item on self-evaluated health status ("How do you evaluate your overall health status?") has five levels (1 = very poor; 2 = poor; 3 = neither poor nor good; 4 = good; 5 = very good). To reflect health change after follow-up, the patients were stratified into three groups: improvement (e.g., health status changed from very poor to poor, or from very poor to good), status quo (e.g., from very poor to very poor, or from good to good), and deterioration (e.g., from very good to good, or from very good to poor).
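Norm-based scoring rescales each scale so that the reference population has a mean of 50 and a standard deviation of 10. The sketch below assumes the commonly described two-step approach (raw sum rescaled to 0–100, then a linear T-score transformation). The norm mean and SD in the example are hypothetical placeholders, not the proprietary QualityMetric norms, and the summary components (PCS/MCS), which additionally apply factor-score weights, are not covered.

```python
def to_0_100(raw_sum: float, lowest: float, highest: float) -> float:
    """Rescale a summed raw scale score to the 0-100 metric."""
    if not lowest <= raw_sum <= highest:
        raise ValueError("raw score outside the possible range")
    return 100.0 * (raw_sum - lowest) / (highest - lowest)


def norm_based_score(score_0_100: float, norm_mean: float, norm_sd: float) -> float:
    """Linear T-score transformation: norm population has mean 50, SD 10.

    norm_mean and norm_sd must come from the scoring software's reference
    norms; the values used below are assumptions for illustration only.
    """
    z = (score_0_100 - norm_mean) / norm_sd
    return 50.0 + 10.0 * z


# Hypothetical example: a 10-item scale scored 1-3 per item (raw range 10-30)
# with placeholder norms (mean 80, SD 20) on the 0-100 metric.
pf_0_100 = to_0_100(25, lowest=10, highest=30)                   # 75.0
pf_t = norm_based_score(pf_0_100, norm_mean=80.0, norm_sd=20.0)  # 47.5
```

A patient exactly at the assumed norm mean receives a T-score of 50, which is what makes scores comparable across scales.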
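The summation rules and the three-way stratification of self-evaluated health change described above can be sketched as follows. This is an illustration, not the official scoring procedure: the text does not state how items anchored at 5 = severe are oriented, so the sketch assumes responses are already coded so that higher values mean better HRQoL. The function and key names are hypothetical.

```python
# Number of items per QOL-DAv2.0 scale, as stated in the text.
SCALE_ITEM_COUNTS = {"PH": 9, "PS": 9, "SO": 11, "ST": 11}


def score_qol_da(items_by_scale: dict[str, list[int]]) -> dict[str, int]:
    """Sum item scores (each 1-5) into scale scores and a total score.

    Assumes responses are already coded so that higher means better HRQoL.
    """
    scores: dict[str, int] = {}
    for scale, n_items in SCALE_ITEM_COUNTS.items():
        items = items_by_scale[scale]
        if len(items) != n_items:
            raise ValueError(f"{scale} needs {n_items} items, got {len(items)}")
        if any(not 1 <= x <= 5 for x in items):
            raise ValueError(f"{scale} has an item outside 1-5")
        scores[scale] = sum(items)  # PH/PS: 9-45, SO/ST: 11-55
    scores["TOTAL"] = sum(scores[s] for s in SCALE_ITEM_COUNTS)  # 40-200
    return scores


def classify_health_change(baseline: int, follow_up: int) -> str:
    """Stratify change in self-evaluated health (1 = very poor ... 5 = very good)."""
    for level in (baseline, follow_up):
        if not 1 <= level <= 5:
            raise ValueError("health status must be coded 1-5")
    if follow_up > baseline:
        return "improvement"
    if follow_up < baseline:
        return "deterioration"
    return "status quo"
```

For example, a patient moving from "very poor" (1) to "good" (4) is classified as improvement, and a patient staying at "good" (4) as status quo.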

Data Analyses
A database was built using EpiData 3.1, and the data were double-entered by two different persons to capture data entry errors. All analyses were carried out using SPSS 13.0 (SPSS Inc., Chicago, IL, USA). P < 0.05 was considered statistically significant.
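Double data entry works by comparing two independently keyed copies of the same records and resolving every disagreement against the paper questionnaire. EpiData offers its own double-entry verification; the sketch below only illustrates the underlying comparison, assuming each entry pass is held as a hypothetical mapping from patient ID to a dict of field values.

```python
from typing import Any

Record = dict[str, Any]


def find_entry_discrepancies(
    entry_a: dict[str, Record], entry_b: dict[str, Record]
) -> list[tuple[str, str, Any, Any]]:
    """List (patient_id, field, value_a, value_b) for every disagreement.

    A record present in only one entry pass is reported with field "<record>".
    """
    discrepancies = []
    for pid in sorted(set(entry_a) | set(entry_b)):
        rec_a, rec_b = entry_a.get(pid), entry_b.get(pid)
        if rec_a is None or rec_b is None:
            discrepancies.append((pid, "<record>", rec_a, rec_b))
            continue
        for field in sorted(set(rec_a) | set(rec_b)):
            value_a, value_b = rec_a.get(field), rec_b.get(field)
            if value_a != value_b:
                discrepancies.append((pid, field, value_a, value_b))
    return discrepancies
```

A transposed entry such as age 41 keyed as 14 then appears as a single discrepancy to be checked against the source form.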
Internal consistency reliability was measured by Cronbach's α, with values greater than 0.70 representing acceptable reliability [31]. Test-retest reliability was measured by the intra-class correlation coefficient (ICC) between the one-week test-retest results. An ICC ≤ 0.40 is considered poor to fair agreement, 0.41 to 0.60 moderate agreement, 0.61 to 0.80 good agreement, and > 0.80 excellent agreement [32]. Floor and ceiling effects were calculated as the number and percentage of patients with the lowest and highest possible scores, respectively. Each should be less than 15% to ensure that the scales (SF-36v2 and QOL-DAv2.0), summary components (SF-36v2) and total score (QOL-DAv2.0) capture the full range of potential responses in MMT patients and that changes over time can be detected [33].
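The three reliability-related quantities defined above can be computed with the standard library as in the following sketch. The text does not state which ICC model was selected in SPSS, so the sketch assumes the two-way random-effects, absolute-agreement, single-measure form, ICC(2,1); other ICC forms would give different values.

```python
from statistics import mean, variance


def cronbach_alpha(item_columns: list[list[float]]) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / total-score variance).

    item_columns[j][i] is respondent i's score on item j.
    """
    k = len(item_columns)
    totals = [sum(resp) for resp in zip(*item_columns)]
    item_var_sum = sum(variance(col) for col in item_columns)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


def icc_2_1(rows: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    rows[i] holds subject i's scores on the k occasions, e.g. [test, retest].
    """
    n, k = len(rows), len(rows[0])
    grand = mean(x for row in rows for x in row)
    row_means = [mean(row) for row in rows]
    col_means = [mean(col) for col in zip(*rows)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in rows for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def floor_ceiling_pct(scores: list[float], lowest: float, highest: float) -> tuple[float, float]:
    """Percentage of respondents at the lowest and highest possible scores."""
    n = len(scores)
    floor = 100 * sum(s == lowest for s in scores) / n
    ceiling = 100 * sum(s == highest for s in scores) / n
    return floor, ceiling
```

Note that ICC(2,1) penalizes systematic shifts: test scores 1, 2, 3 with retest scores 2, 3, 4 give an ICC of 2/3 even though the ranking is identical, whereas a consistency-type ICC would give 1.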
Convergent and discriminant validity were tested using a multitrait-multimethod (MTMM) correlation matrix [15]. Convergent validity is determined by scale-to-scale correlations between different instruments, with higher values (γ ≥ 0.70) indicating that scales of different instruments measure the same trait. Discriminant validity is determined by scale-to-scale correlations within each instrument, with lower values (γ < 0.70) indicating that the scales measure different traits. In this study, scale-to-scale correlations (Spearman γ) were computed among the eight SF-36v2 scales and the four QOL-DAv2.0 scales.
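An MTMM matrix is the full set of pairwise rank correlations among scales, read against the 0.70 criterion. The sketch below computes Spearman coefficients (Pearson correlation of average ranks, so ties are handled) for every pair of scales. The scale names and the flag-only output are illustrative assumptions; whether a flagged pair supports convergent validity or undermines discriminant validity depends on whether the two scales come from different instruments or the same one.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def spearman(x: list[float], y: list[float]) -> float:
    return pearson(average_ranks(x), average_ranks(y))


def mtmm_correlations(scales: dict[str, list[float]], threshold: float = 0.70):
    """{(scale_a, scale_b): (gamma, |gamma| >= threshold)} for every pair."""
    result = {}
    for a, b in combinations(scales, 2):
        gamma = spearman(scales[a], scales[b])
        result[(a, b)] = (gamma, abs(gamma) >= threshold)
    return result
```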
Sensitivity was evaluated by ANOVA and relative efficiency (RE). RE was calculated as the ratio of the F-statistics of the scales (SF-36v2 and QOL-DAv2.0), summary components (SF-36v2) and total score (QOL-DAv2.0), with the smallest F-statistic as the denominator. The smallest RE (RE = 1) represents the least sensitivity; the higher the RE value, the better the sensitivity [15]. In this study, a significant F-statistic with a correspondingly higher RE value represents acceptable sensitivity. Additionally, multiple linear regression analysis was used to further test the sensitivity of the SF-36v2 and the QOL-DAv2.0. The dependent variables were, in separate models, the scores of the two SF-36v2 summary components and eight scales and of the four QOL-DAv2.0 scales and total score; the independent variables were age, gender, educational attainment, marital status, employment status, average monthly income over the past year (Chinese yuan), chronic disease, days of methadone intake and average daily dose of methadone intake (mg).
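The relative-efficiency calculation described above reduces to a one-way ANOVA F-statistic per measure, divided by the smallest F among the measures compared. A standard-library sketch follows; the group data and measure names are hypothetical. In practice, `scipy.stats.f_oneway` returns the same F-statistic together with its P value.

```python
from statistics import mean


def one_way_anova_f(groups: list[list[float]]) -> float:
    """F = between-group mean square / within-group mean square."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def relative_efficiency(f_stats: dict[str, float]) -> dict[str, float]:
    """RE of each measure relative to the least sensitive one (RE = 1)."""
    f_min = min(f_stats.values())
    return {name: f / f_min for name, f in f_stats.items()}
```

With hypothetical F-statistics of 4.0, 10.0 and 6.0, the RE values are 1.0, 2.5 and 1.5, so the second measure would be read as the most sensitive.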
Based on the stratification of the patients' health status change (i.e., improvement, status quo, deterioration), responsiveness was assessed by paired-samples t-tests between the pre- and post-test results of the 6-month follow-up, with statistically significant mean differences representing observed change [34]. Responsiveness was also evaluated with the standardized response mean (SRM), the ratio of the mean observed change to the standard deviation of the change scores. SRM values of 0.20, 0.50, and 0.80 or greater have been proposed to represent small, moderate, and large responsiveness, respectively [34]. In this study, a statistically significant mean difference with a corresponding SRM of at least 0.20 in absolute value was considered acceptable responsiveness.
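Both responsiveness statistics are built from the mean and SD of the individual change scores: t = mean/(SD/√n) and SRM = mean/SD, so t = SRM × √n. A minimal sketch follows; it applies the SRM bands to the absolute value (an assumption that extends the thresholds to negative changes), and P values would additionally require the t distribution, for example via `scipy.stats.ttest_rel`.

```python
from math import sqrt
from statistics import mean, stdev


def paired_change_stats(pre: list[float], post: list[float]) -> dict[str, float]:
    """Paired-samples t statistic and standardized response mean (SRM)."""
    if len(pre) != len(post):
        raise ValueError("pre and post must be paired")
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    d_mean, d_sd = mean(diffs), stdev(diffs)
    return {
        "n": n,
        "mean_change": d_mean,
        "t": d_mean / (d_sd / sqrt(n)),  # compare with t distribution, df = n - 1
        "srm": d_mean / d_sd,
    }


def srm_magnitude(srm: float) -> str:
    """Classify |SRM| with the 0.20 / 0.50 / 0.80 thresholds."""
    size = abs(srm)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "moderate"
    if size >= 0.20:
        return "small"
    return "below threshold"
```

Under this rule, SRMs such as 0.06 or −0.17 fall below the 0.20 threshold, while −0.85 counts as a large (deteriorating) change.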

Results
A total of 1,212 patients were recruited at baseline: 851 (70.2%) in the Minle MMT clinic and 361 (29.8%) in the Xinan MMT clinic. For the retest survey, one hundred patients were randomly selected using a computer-generated randomization code [35], and 1,010 patients completed the 6-month follow-up. Two hundred and two patients (16.7%) were lost to follow-up because of transfer to other MMT clinics (n = 153, 75.7%), hospital admission (n = 40, 19.8%), or loss of contact (n = 9, 4.5%). In the face-to-face interviews, the patients understood the questions of the SF-36v2 and the QOL-DAv2.0 well and completed both questionnaires in full.
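The sample-flow percentages above can be cross-checked directly from the reported counts. The denominators are the 1,212 recruited patients for the clinic split and the loss to follow-up, and the 202 lost patients for the reasons for loss.

```python
recruited = 1212
by_clinic = {"Minle": 851, "Xinan": 361}
lost_reasons = {"transferred": 153, "hospital admission": 40, "lost contact": 9}

lost = sum(lost_reasons.values())
completed = recruited - lost


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * part / whole, 1)


clinic_pct = {name: pct(n, recruited) for name, n in by_clinic.items()}
lost_pct = pct(lost, recruited)
reason_pct = {name: pct(n, lost) for name, n in lost_reasons.items()}

print(lost, completed, lost_pct)  # 202 1010 16.7
print(clinic_pct)                 # {'Minle': 70.2, 'Xinan': 29.8}
print(reason_pct)                 # {'transferred': 75.7, 'hospital admission': 19.8, 'lost contact': 4.5}
```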

Reliability and Floor/ceiling Effect
Internal consistency reliability, test-retest reliability and floor/ceiling effects are detailed in Table 2. Cronbach's α of the SF-36v2 physical and mental summary components were 0.80 and 0.86 (eight scales ranged from 0.73 to 0.92), and that of the QOL-DAv2.0 total score was 0.96 (four scales ranged from 0.80 to 0.93). ICCs of the two SF-36v2 summary components were 0.86 and 0.85 (eight scales ranged from 0.72 to 0.87), and that of the QOL-DAv2.0 total score was 0.94 (four scales ranged from 0.88 to 0.92). Floor effects of the SF-36v2 and the QOL-DAv2.0 were all less than 15%. Ceiling effects were less than 15% for all SF-36v2 scales and summary components and for all QOL-DAv2.0 scales and the total score, except the SF-36v2 physical functioning (18.2%), role-physical (23.9%), bodily pain (30.2%), social functioning (23.2%) and role-emotional (22.4%) scales.

Sensitivity
Sensitivity was assessed with respect to self-evaluated health status and average daily dose of methadone intake (mg). Across the five levels of health status, the SF-36v2 and QOL-DAv2.0 scores increased level by level from 'very poor' to 'very good', with significant F-statistics (P < 0.001). Compared with the SF-36v2 bodily pain scale, which had the least sensitivity (RE = 1), better sensitivity was found for the QOL-DAv2.0 physiology scale (RE = 3.21), the SF-36v2 general health scale (RE = 2.78) and the QOL-DAv2.0 total score (RE = 2.53). RE of the remaining scales and summary components ranged from 1 to 2, representing similar sensitivity (Table 4).
Across the three levels of average daily methadone dose (i.e., ≤ 20 mg, 21–60 mg, > 60 mg) [36], the SF-36v2 and QOL-DAv2.0 scores decreased as the methadone dose increased. In the comparison among these three dose groups, significant F-statistics with higher RE values were found for the SF-36v2 physical functioning (RE = 3.14) and vitality (RE = 3.91) scales, and for the QOL-DAv2.0 physiology (RE = 2.87), society (RE = 3.71) and symptoms (RE = 4.53) scales and total score (RE = 3.72) (Table 5).
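The three methadone dose bands can be encoded as a small helper. The text gives the bands as ≤ 20 mg, 21–60 mg and > 60 mg; because an average daily dose need not be an integer, the sketch assumes the middle band means more than 20 mg and at most 60 mg. The function names and labels are hypothetical.

```python
from collections import Counter


def dose_band(avg_daily_mg: float) -> str:
    """Assign an average daily methadone dose (mg) to one of three bands."""
    if avg_daily_mg < 0:
        raise ValueError("dose cannot be negative")
    if avg_daily_mg <= 20:
        return "<=20 mg"
    if avg_daily_mg <= 60:
        return "21-60 mg"
    return ">60 mg"


def band_counts(doses: list[float]) -> Counter:
    """Number of patients per dose band, i.e. the ANOVA group sizes."""
    return Counter(dose_band(d) for d in doses)
```

The band sizes reported in the Discussion (23, 886 and 303 patients) sum to the 1,212 recruited patients, consistent with every patient falling into exactly one band.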

Responsiveness
According to health status changes, the patients were stratified into improvement, status quo and deterioration groups. In the improvement group (n = 33), significant mean differences were found in the SF-36v2 physical component summary and role-physical, general health, role-emotional and mental health scales and in the QOL-DAv2.0 physiology scale; eligible SRMs were found in the SF-36v2 except the vitality scale and in the QOL-DAv2.0 except the psychology and symptoms scales. In the status quo group (n = 216), significant mean differences were found in the SF-36v2 mental component summary and general health and vitality scales and in the QOL-DAv2.0 physiology and society scales; an eligible SRM was found in the SF-36v2 general health scale. In the deterioration group (n = 761), significant mean differences were found in the SF-36v2 and the QOL-DAv2.0; eligible SRMs were found in the SF-36v2 except the vitality, social functioning and mental health scales and in the QOL-DAv2.0 except the society scale (Table 6).

Table 2. Internal consistency reliability, test-retest reliability, and floor/ceiling effect of the SF-36v2 and the QOL-DAv2.0 (N = 1212).

Table 4. Sensitivity of the SF-36v2 and the QOL-DAv2.0 in self-evaluated health status: scores (mean ± SD) and relative efficiency (RE) (N = 1212).

Discussion
This study demonstrated that the SF-36v2 and the QOL-DAv2.0 have satisfactory psychometric properties in patients with MMT. Internal consistency reliability (Cronbach's α) of the SF-36v2 and the QOL-DAv2.0 was above the critical threshold (0.70) for acceptable reliability in all cases, indicating that the items of each questionnaire consistently measure the same underlying concept [37]. Test-retest reliability was good for the SF-36v2 (ICC > 0.60) and excellent for the QOL-DAv2.0 (ICC > 0.80), showing the stability of each questionnaire over time [38]. Consistent SF-36v2 reliability results were also reported by Lam et al. [39], with Cronbach's α of the eight scales ranging from 0.81 to 0.91 and ICCs ranging from 0.54 to 0.93. For the QOL-DAv2.0, Zhang et al. [40] found that Cronbach's α of the four scales ranged from 0.88 to 0.93 and ICCs ranged from 0.73 to 0.92.
These consistent findings confirm that the SF-36v2 and the QOL-DAv2.0 have acceptable reliability.
Both the SF-36v2 and the QOL-DAv2.0 had acceptable floor effects, with fewer than 15% of patients at the lowest possible scores. However, the SF-36v2 showed substantial ceiling effects in the physical functioning (18.2%), role-physical (23.9%), bodily pain (30.2%), social functioning (23.2%) and role-emotional (22.4%) scales, revealing that these five scales did not capture the full range of the corresponding responses and may not detect changes in these domains during treatment in patients with MMT [33]. As a disease-specific instrument, the QOL-DAv2.0 had no substantial ceiling effects in its four scales or total score. This indicates that the QOL-DAv2.0 is superior to the SF-36v2 in capturing the full range of changes in certain health domains in MMT patients.
Between the SF-36v2 and the QOL-DAv2.0, scale-to-scale correlation coefficients were all less than 0.70, indicating that the scales of the two instruments measure different traits of HRQoL. HRQoL consists of physical, psychological and social aspects [15]. The SF-36v2 and the QOL-DAv2.0 measure these three aspects from generic and disease-specific viewpoints, respectively; in addition, the QOL-DAv2.0 is more specific in measuring the impact of symptoms on HRQoL. Therefore, using the SF-36v2 and the QOL-DAv2.0 complementarily helps to evaluate HRQoL comprehensively in patients with MMT.
For the SF-36v2 and the QOL-DAv2.0, scale-to-scale correlation coefficients within each instrument were generally less than 0.70, demonstrating that the scales within each instrument measure different traits of HRQoL [15]. However, weaker discriminant validity (γ ≥ 0.70) was found in the SF-36v2 role-emotional-to-role-physical correlation (γ = 0.75) and mental health-to-vitality correlation (γ = 0.73), and in the QOL-DAv2.0 psychology-to-physiology correlation (γ = 0.75) and society-to-psychology correlation (γ = 0.74). A probable explanation is that the items within these four pairs of scales relate to feelings or perceptions about activities, functioning or psychological status under certain circumstances, which increases the correlation between the corresponding scales. Therefore, the discriminant validity of these four pairs of scales needs further examination.

Table 5. Sensitivity of the SF-36v2 and the QOL-DAv2.0 in average daily dose of methadone intake (mg): scores (mean ± SD) and relative efficiency (RE) (N = 1212).

Table 6. Responsiveness of the SF-36v2 and the QOL-DAv2.0: scores (mean ± SD) and standardized response mean (SRM) stratified by improvement, status quo, or deterioration in health status (N = 1010).

Sensitivity is the ability to detect differences between or among groups [15]. In patients with different self-evaluated health status, the SF-36v2 and the QOL-DAv2.0 were sensitive in detecting differences in all scales, summary components and the total score, which was also supported by the results of the multiple linear regression analysis. This finding indicates that both instruments were sensitive in detecting HRQoL score differences from the viewpoint of self-perceived health status.
In patients with different average daily methadone doses, both SF-36v2 and QOL-DAv2.0 scores decreased as the methadone dose increased, probably because of methadone adverse effects, including severe constipation, lethal cardiac complications, major depression, anxiety, and sexual dysfunction [41]. However, the two instruments were sensitive only in specific scales. The SF-36v2 was mainly sensitive in the physical functioning and vitality scales; after controlling for sociodemographic and clinical characteristics, the multiple linear regression analysis further confirmed the difference in the vitality scale. No significant F-statistics with correspondingly eligible RE values were found in the two summary components and the remaining six scales. This is probably for the following reasons: first, although SF-36v2 scores decreased as the average daily methadone dose increased, the dose had no significant impact on the patients' overall physical and mental health, limitations due to physical or emotional health problems, bodily pain, general health, or social functioning; second, the generic SF-36v2 measures health status in the general population as well as in specific disease populations, so it is less able to detect differences related to disease-specific characteristics; third, the sample sizes of the dose groups were unbalanced (≤ 20 mg: n = 23; 21–60 mg: n = 886; > 60 mg: n = 303), which may have influenced the statistical inference. Therefore, future work should further examine the sensitivity of the SF-36v2 in MMT patients receiving different methadone doses.
Unlike the SF-36v2, the QOL-DAv2.0 was sensitive in the physiology, society and symptoms scales and total score, especially in the symptoms scale after controlling for sociodemographic and clinical characteristics. This finding indicates that the QOL-DAv2.0 is superior to the SF-36v2 in detecting differences in physical function, social function, symptoms and total health status among MMT patients with different average daily methadone doses. However, the psychology scale had no significant F-statistic or eligible RE, probably because the patients' psychological function was not significantly influenced by average daily methadone dose. Therefore, the sensitivity of the QOL-DAv2.0 psychology scale needs further exploration.
Responsiveness is the ability of a scale to detect changes [15]. After the 6-month follow-up, both the SF-36v2 and the QOL-DAv2.0 were responsive in detecting improved and deteriorated health status. In the improvement group, eligible SRMs were found in most SF-36v2 and QOL-DAv2.0 measures, demonstrating that both instruments had good responsiveness. The scales without eligible SRMs (the SF-36v2 vitality scale (0.06) and the QOL-DAv2.0 psychology (0.11) and symptoms (0.11) scales) indicate that the increases in the corresponding scores were inconsistent between patients [34]; that is, some patients improved substantially in vitality, psychological function and symptoms, while others did not. In addition, the mean differences were not statistically significant in the SF-36v2 mental component summary and physical functioning, bodily pain, vitality and social functioning scales, or in the QOL-DAv2.0 except the physiology scale, probably because of the small sample size (n = 33). Therefore, a larger sample is needed to confirm the responsiveness of these scales in future work.
In the deterioration group, significant mean differences with eligible SRMs were found in the SF-36v2 and the QOL-DAv2.0, suggesting that both instruments were responsive in detecting score decreases in MMT patients after the 6-month follow-up. In MMT clinical practice, this helps clinicians identify such patients and implement targeted interventions as early as possible. However, the SRMs of the SF-36v2 vitality (−0.17), social functioning (−0.13) and mental health (−0.19) scales and the QOL-DAv2.0 society scale (−0.17) were below the critical value of 0.20 in absolute value, revealing inconsistent decreases in the corresponding scores [34]. Therefore, the SRMs of these scales in detecting deteriorated health status in patients with MMT need further examination.
Notably, a significant mean difference with an eligible SRM was found in the SF-36v2 general health scale in the status quo group, revealing a discrepancy between the self-rated health status and the corresponding score difference. This is probably because the score change in general health was larger than that in other health domains after the 6-month follow-up, which further supports the good responsiveness of the SF-36v2 general health scale.
This study has some limitations. First, the SF-36v2 and the QOL-DAv2.0 were administered in face-to-face interviews; the performance of the instruments under self-completion needs to be confirmed in future work. Second, this study was conducted in Xi'an, which limits the generalization of the results to all Chinese mainland MMT patients.
The SF-36v2 and the QOL-DAv2.0 were shown to be valid tools for assessing HRQoL in patients with MMT. The reliability of both instruments was highly satisfactory. Convergent and discriminant validity showed that the two instruments measure different traits of HRQoL. Both instruments were sensitive to self-evaluated health status; with respect to average daily methadone dose, the SF-36v2 was sensitive in the physical functioning and vitality scales, and the QOL-DAv2.0 in the physiology, society and symptoms scales and total score. Both instruments were responsive in detecting improved and deteriorated health status. The SF-36v2 and the QOL-DAv2.0 can be used independently or complementarily, according to the emphasis of HRQoL evaluation in patients with MMT.