Validation of the Disease-Specific Components of the Kidney Disease Quality of Life-36 (KDQOL-36) in Chinese Patients Undergoing Maintenance Dialysis

Aim. The aim of this study was to evaluate the validity, reliability and sensitivity of the disease-specific items of the Kidney Disease Quality of Life-36 (KDQOL-36) in Chinese patients undergoing maintenance dialysis. Methods. Content validity was assessed with the content validity index (CVI) in ten subjects. A total of 356 subjects were recruited for pilot psychometric testing. Internal construct validity was assessed by corrected item-subscale total correlations. Confirmatory factor analysis (CFA) was used to confirm the factor structure. Convergent validity was assessed by Pearson's correlations between the disease-specific subscale scores and the SF-12 version 2 Health Survey (SF-12 v2) scores. Reliability was assessed by internal consistency (Cronbach's alpha coefficient) and 2-week test-retest reliability (intraclass correlation coefficient, ICC). Sensitivity was determined by known-group comparisons using independent t-tests. Results. The CVI for clarity and relevance was ≥ 0.9 for all items. Corrected item-total correlations were ≥ 0.4 for all items except one related to problems with the access site. CFA confirmed the 3-factor structure of the disease-specific component of the KDQOL-36. The correlation coefficients between the disease-specific domain scores and the SF-12 v2 physical and mental component summary scores ranged from 0.328 to 0.492. Reliability was good (Cronbach's alpha coefficients ranged from 0.810 to 0.931; ICCs ranged from 0.792 to 0.924). Only the effects subscale was sensitive in detecting differences in HRQOL between haemodialysis and peritoneal dialysis patients (effect size = 0.68). Conclusion. The disease-specific items of the KDQOL-36 are a valid, reliable and sensitive measure for assessing the health-related quality of life of Chinese patients on maintenance dialysis.


Introduction
The number of people needing maintenance dialysis for end-stage renal disease (ESRD) worldwide is increasing at an alarming rate, placing a significant burden on health and wellbeing at both the global and individual levels [1]. It has been suggested that the incidence of ESRD will increase disproportionately in developing countries such as mainland China because of their expanding elderly populations [2]. In 2012, Hong Kong ranked 11th among Asian countries in the prevalence of ESRD among patients aged 65 or older [3]. These patients require renal replacement therapy (RRT) in the form of kidney transplantation or maintenance dialysis. The vast majority of RRT patients in Hong Kong are managed in the public sector. The 2013 Renal Registry database in Hong Kong, which reflects this patient population, showed that about 60% of patients on RRT receive dialysis; of these, 76% are on peritoneal dialysis (PD) and the remainder on haemodialysis (HD) [4].
Maintenance dialysis is a rigorous and time-consuming process that may have a significant effect on patients and their family members or carers [5]. Impacts include physical limitation, impaired social functioning and psychological distress [6]. These result in poorer health-related quality of life (HRQOL), which has been linked to poorer clinical and service outcomes [7]. The goal of dialysis care is to prolong life while maintaining the patient's quality of life. Therefore, a valid and reliable tool for measuring quality of life specific to patients with kidney disease is needed, both as an outcome measure to monitor treatment effectiveness and to help assess the value of other interventions tailored to improve patient care.
The Kidney Disease Quality of Life-36 (KDQOL-36) is a short-form version of the Kidney Disease Quality of Life Questionnaire with generic and disease-specific components [8]. The Cantonese Chinese version of the KDQOL-36 was translated by MAPI and is available on RAND Health (http://www.rand.org/health/surveys_tools/kdqol.html). This version was developed through iterative forward and backward translation only, without a cognitive debriefing step to confirm the content validity of the measure [personal communication]. This step is important to ensure that the translated measure is conceptually equivalent to the original version and is relevant and culturally acceptable to the target population. Pilot testing, usually in the form of cognitive debriefing interviews with ten patients from the target population, is recommended to evaluate the content validity of a measure [9].
Previous studies have evaluated the psychometric properties of the English [8] and Cantonese Chinese [10] versions of the KDQOL-36. Although the validity (criterion and convergent validity), reliability and sensitivity of the KDQOL-36 have been evaluated in Hong Kong before [10], renal transplant recipients made up a large proportion of the sample in that study (52.6%), which may have influenced the psychometric assessment of the instrument. Because the KDQOL-36 was originally developed for dialysis patients [11], it may have limitations when applied to transplant patients [12]. Some items refer to signs and symptoms (such as fatigue, nausea and problems with the access site) and effects (fluid restriction, dietary restriction and ability to travel) of end-stage renal failure that may not be relevant to transplant patients. A qualitative study of transplant patients found that they were free of the stress and anxiety related to undergoing dialysis and repeated blood tests to assess renal function, and of the social isolation imposed by ESRD symptoms such as fatigue [13]. Their HRQOL improved because they felt they could "get back to normal" [13]. Conversely, the KDQOL-36 might miss concerns specific to transplant patients, such as anxiety and worry about the side effects of immunosuppressants and about graft rejection [12,13].
To strengthen and build on the findings of the previous validation study in Hong Kong [10], we sought to add new evidence on psychometric properties of the disease-specific subscales of the KDQOL-36 that had not previously been evaluated. Our specific objective was to assess the content validity, construct validity, reliability, sensitivity and factor structure of the disease-specific subscales of the KDQOL-36 in Chinese patients undergoing maintenance dialysis.

Study Design and Subjects
Content Validation. One of the authors (EPHC) conducted cognitive debriefing interviews on the kidney disease-specific scales of the KDQOL-36 with ten Chinese (Cantonese-speaking) haemodialysis patients to assess the clarity, relevance and interpretation of each item. Subjects were recruited by convenience sampling, balanced for age, from one community-based haemodialysis centre; patients who could understand Cantonese were eligible. The subjects were asked (i) whether they could understand each item, (ii) to interpret the items in their own words, and (iii) whether the items were relevant to their kidney disease [14]. The interviews were transcribed verbatim, and an expert panel reviewed the results. The content validity index (CVI) for clarity and relevance was calculated as the proportion of "yes" responses to the dichotomous ("yes"/"no") questions. Items with a CVI ≥ 0.8 were considered to have good content validity [15].
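The CVI described above reduces to a proportion per item. The sketch below is a minimal Python illustration, assuming each item's answers are stored as booleans (True for "yes"); the item labels and ratings are hypothetical and are not the study data.

```python
"""Content validity index (CVI): proportion of "yes" ratings per item."""


def content_validity_index(responses):
    """Return the share of True ("yes") answers for one item."""
    if not responses:
        raise ValueError("an item needs at least one rating")
    return sum(1 for r in responses if r) / len(responses)


def summarise_cvi(item_responses, threshold=0.8):
    """Map each item to (CVI, whether CVI >= threshold)."""
    summary = {}
    for item, responses in item_responses.items():
        cvi = content_validity_index(responses)
        summary[item] = (cvi, cvi >= threshold)
    return summary


# Hypothetical clarity ratings from ten interviewees.
clarity = {
    "item_A": [True] * 10,
    "item_B": [True] * 9 + [False],  # one interviewee could not interpret it
}

for item, (cvi, ok) in summarise_cvi(clarity).items():
    print(f"{item}: CVI = {cvi:.1f}, meets standard: {ok}")
```

A single "no" among ten interviewees still leaves an item at 0.9, above the 0.8 standard.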
Pilot Psychometric Testing. Subjects for pilot psychometric testing of the KDQOL-36 were patients undergoing peritoneal dialysis or haemodialysis, recruited from two different settings (hospital-based renal units and community HD centres) to include patients with a range of disease severity and socio-demographic characteristics. Patients in both settings were excluded if they were aged under 18, could not understand Cantonese or refused to participate. Eligible patients were invited by a trained research assistant to take part in the study, and the aims, procedures and nature of the study were explained before consent was obtained. Consenting subjects were then interviewed by trained research assistants to complete the study instruments; all interviewers were required to read the questionnaire verbatim using a standardized approach. A total of 135 HD patients in hospital-based renal units, 118 HD patients in community HD centres and 103 PD patients were recruited and completed the survey between October 2014 and September 2015.
A total of 102 HD patients were randomly selected to repeat the questionnaire two weeks after their baseline interview to assess the test-retest reliability of the KDQOL-36. All interviewers administering the retest interviews were blinded to the results of the baseline interviews.
A sample size of 200 subjects (100 from hospital-based renal units and 100 from community HD centres) was planned, based on recommendations for pilot psychometric testing [16]. A sample of 100 subjects in each group provides 80% power to detect a moderate effect size (Cohen's d = 0.4) between groups with an independent t-test (two-tailed α = 0.05).
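The stated power can be checked approximately. The sketch below uses the normal approximation to the two-sample t-test with equal group sizes; this approximation is our assumption (the text does not say how power was computed), and an exact noncentral-t calculation typically requires one or two more subjects per group.

```python
"""Approximate power of a two-sided, two-sample t-test (normal approximation)."""
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def approx_power(d, n_per_group, alpha=0.05):
    """Power to detect a standardized difference d with equal group sizes."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    # Mean difference in units of its standard error: d / sqrt(2 / n).
    shift = d * math.sqrt(n_per_group / 2)
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)


def n_per_group(d, power=0.80, alpha=0.05):
    """Smallest equal group size reaching the target power (normal approx.)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_beta = Z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


print(f"power with d = 0.4, n = 100 per group: {approx_power(0.4, 100):.3f}")
print(f"n per group for 80% power: {n_per_group(0.4)}")
```

With d = 0.4 and 100 subjects per group, the approximate power comes out just above 0.80, in line with the planned sample.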
(a) Disease-specific components. The KDQOL-36 comprises a generic core and a disease-specific core. The disease-specific core has three subscales: burden of kidney disease (4 items), symptom bother and problems (12 items) and effects of kidney disease (8 items). Domain scores are calculated by summing the relevant item scores and transforming the sum to a range from 0 to 100, with higher scores indicating better HRQOL [17,18].
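The summation-and-transformation step can be illustrated as a linear rescaling of the raw sum so that the lowest possible sum maps to 0 and the highest to 100. This is a sketch under stated assumptions: every item in the subscale is assumed to share one response range (1 to 5 here, chosen only for illustration), items are assumed to be already coded so that higher means better, and the official KDQOL-36 scoring rules (item-specific recoding, handling of missing items) are not reproduced.

```python
"""Linear 0-100 transformation of a summed subscale score (illustrative only)."""


def transformed_score(item_scores, item_min=1, item_max=5):
    """Rescale the sum of item scores onto 0-100.

    Assumes every item uses the same response range [item_min, item_max]
    and has already been coded so that higher values mean better HRQOL.
    """
    k = len(item_scores)
    if k == 0:
        raise ValueError("no item scores")
    for s in item_scores:
        if not item_min <= s <= item_max:
            raise ValueError(f"score {s} outside [{item_min}, {item_max}]")
    raw = sum(item_scores)
    lowest, highest = k * item_min, k * item_max
    return 100 * (raw - lowest) / (highest - lowest)


# Hypothetical responses to a 4-item subscale.
print(transformed_score([5, 4, 3, 2]))
```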
(b) Short Form 12 Health Survey, version 2 (SF-12 v2). The generic core of the KDQOL-36 is the Short Form 12 Health Survey, version 1 (SF-12 v1). In the present study, however, we deliberately replaced the SF-12 v1 with the SF-12 v2, for two reasons. First, the SF-12 v2 improves on the SF-12 v1 in item wording, response scales and scoring algorithms, leading to better score precision [19]. Second, because it has been well validated for use in Hong Kong [25], it is more commonly used locally in different patient populations, such as patients with lower urinary tract symptoms, prostate cancer and depression [20][21][22] and haemodialysis patients [23,24]. Population norms for the SF-12 v2 have also been established for Chinese adults in Hong Kong [25,26]. The SF-12 v2 can be summarized into physical and mental component summary scores (PCS and MCS), with higher scores indicating better HRQOL. In this study, the SF-12 v2 was used to assess convergent validity [14,27].

Statistical Analysis
Because the generic core (SF-12 v2) has been well developed and validated, only the three subscales of the disease-specific component (burden, symptoms and effects) were analyzed in the present study.
Descriptive statistics were used to compare the socio-demographic characteristics of the HD and PD groups. Differences between groups were tested using independent t-tests for continuous variables and chi-square tests for categorical variables. For each domain, the percentages of subjects with the lowest (floor) and highest (ceiling) possible scores were calculated, with 15% used as the threshold for a significant floor or ceiling effect [28].
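The floor and ceiling check amounts to counting subjects at the extremes of each 0 to 100 domain score. A minimal sketch with hypothetical scores; treating an effect as significant when strictly more than 15% of subjects sit at an extreme is our reading of the threshold, and some authors use "15% or more" instead.

```python
"""Floor and ceiling percentages for a domain scored 0 to 100."""


def floor_ceiling(scores, lowest=0.0, highest=100.0, threshold_pct=15.0):
    """Return the % of subjects at each extreme and whether it exceeds the threshold."""
    n = len(scores)
    if n == 0:
        raise ValueError("no scores")
    floor_pct = 100 * sum(1 for s in scores if s == lowest) / n
    ceiling_pct = 100 * sum(1 for s in scores if s == highest) / n
    return {
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        # Assumption: "significant" means strictly above the threshold.
        "floor_effect": floor_pct > threshold_pct,
        "ceiling_effect": ceiling_pct > threshold_pct,
    }


# Hypothetical domain scores for ten subjects.
scores = [0, 25, 50, 50, 75, 100, 100, 60, 40, 30]
print(floor_ceiling(scores))
```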
The internal construct validity of the disease-specific domains was assessed using corrected item-scale correlations, with a cut-off of 0.4 indicating adequate correlation [29].
Factor structure was evaluated using confirmatory factor analysis (CFA). Two models were established for each disease-specific domain. Goodness of fit was assessed with several indicators: the root mean square error of approximation (RMSEA), the comparative fit index (CFI), the Tucker-Lewis index (TLI), the standardized root mean square residual (SRMR) and the coefficient of determination (CD). RMSEA values of <0.05, 0.05 to 0.10 and >0.10 indicated good, moderate and poor fit of the model to the covariance matrix, respectively [30].
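For reference, the RMSEA, CFI and TLI point estimates can be computed from the model and baseline (independence) chi-square statistics. The sketch below uses the standard textbook formulas with hypothetical inputs. Some software uses N rather than N - 1 in the RMSEA denominator; SRMR and CD are omitted because they need the residual and covariance matrices; in Stata these indices are normally obtained with `estat gof, stats(all)` after fitting the model with `sem`.

```python
"""Fit indices from chi-square statistics (point estimates only)."""
import math


def rmsea(chi2, df, n):
    """RMSEA = sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2, df, chi2_base, df_base):
    """Comparative fit index relative to the independence (baseline) model."""
    d_model = max(chi2 - df, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def tli(chi2, df, chi2_base, df_base):
    """Tucker-Lewis index (non-normed fit index)."""
    ratio_base = chi2_base / df_base
    return (ratio_base - chi2 / df) / (ratio_base - 1.0)


def rmsea_label(value):
    """Bands from the text: <0.05 good, 0.05 to 0.10 moderate, >0.10 poor."""
    if value < 0.05:
        return "good"
    if value <= 0.10:
        return "moderate"
    return "poor"


# Hypothetical model: chi2 = 100 on 50 df, N = 356; baseline chi2 = 1000 on 66 df.
r = rmsea(100, 50, 356)
print(f"RMSEA = {r:.3f} ({rmsea_label(r)})")
print(f"CFI = {cfi(100, 50, 1000, 66):.3f}, TLI = {tli(100, 50, 1000, 66):.3f}")
```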
Convergent validity was assessed using Pearson's correlations between the disease-specific domain scores and the SF-12 v2 PCS and MCS. Correlation coefficients were interpreted against benchmark values of 0 (negligible), 0.1 (small), 0.3 (medium), 0.5 (large) and 0.75 (very large).
For reliability, the internal consistency of each disease-specific domain of the KDQOL-36 was evaluated using Cronbach's alpha, with a cut-off of 0.7 [31]. Test-retest reliability was assessed using the intraclass correlation coefficient (ICC) and paired t-tests, with an ICC ≥ 0.7 considered to indicate good reproducibility [28].
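Both reliability statistics can be sketched with the Python standard library. Cronbach's alpha follows the usual formula k/(k - 1) × (1 - sum of item variances / total-score variance). For the ICC the text does not specify the model, so the sketch assumes ICC(2,1) in the Shrout and Fleiss scheme (two-way random effects, absolute agreement, single measurement), computed from a two-way ANOVA of test and retest scores; other ICC forms give different values. All data are hypothetical.

```python
"""Cronbach's alpha and an ICC(2,1) for test-retest data (illustrative)."""
from statistics import mean, variance


def cronbach_alpha(items):
    """items: list of k per-item score lists over the same respondents."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def icc_2_1(test, retest):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    The ICC form used in the study is not stated; this choice is an assumption.
    """
    n, k = len(test), 2
    rows = list(zip(test, retest))
    grand = mean(test + retest)
    ss_rows = k * sum((mean(r) - grand) ** 2 for r in rows)
    ss_cols = n * sum((m - grand) ** 2 for m in (mean(test), mean(retest)))
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical data: three items for five respondents, and a retest that is
# shifted up by one point for every subject.
print(cronbach_alpha([[3, 4, 2, 5, 4], [2, 4, 3, 5, 4], [3, 5, 2, 4, 4]]))
print(icc_2_1([1, 2, 3, 4], [2, 3, 4, 5]))
```

The second example shows why the ICC form matters: a uniform one-point shift between test and retest lowers this absolute-agreement ICC below 1 (to 10/13), whereas a consistency-type ICC would not penalize it.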
Sensitivity of the KDQOL-36 was assessed with known-group comparisons for each disease-specific domain: independent t-tests compared mean domain scores between the HD and PD groups, and Cohen's d effect sizes were calculated.
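The effect size can be sketched as Cohen's d with a pooled standard deviation. The text does not state which standardizer was used, so the pooled-SD form below is an assumption, and the group scores are hypothetical. The equal-variance t statistic is included because it follows from the same quantities.

```python
"""Cohen's d (pooled SD) and Student's t for two independent groups."""
from math import sqrt
from statistics import mean, variance


def pooled_sd(a, b):
    """Pooled standard deviation, assuming equal population variances."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return sqrt(pooled_var)


def cohens_d(a, b):
    """Standardized mean difference, positive when group a scores higher."""
    return (mean(a) - mean(b)) / pooled_sd(a, b)


def student_t(a, b):
    """Equal-variance independent-samples t statistic (df = n_a + n_b - 2)."""
    return cohens_d(a, b) / sqrt(1 / len(a) + 1 / len(b))


# Hypothetical "effects" domain scores for two small groups.
pd_scores = [70, 75, 80, 85]
hd_scores = [60, 65, 70, 75]
print(f"d = {cohens_d(pd_scores, hd_scores):.2f}, t = {student_t(pd_scores, hd_scores):.2f}")
```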
All significance tests were two-tailed, with p < 0.05 considered statistically significant. Statistical analyses were performed in Stata version 13.0.

Ethics
All authors declare that they have no conflicts of interest. The study protocol was approved by the institutional review boards. All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards.
Informed consent: Written informed consent was obtained from all individual participants included in the study.

Content Validity
Ten subjects were recruited for the cognitive debriefing interviews: six males and four females, aged 41 to 64 years (mean 49.3 years, SD 7.8). One subject could not interpret the item "problems with your access site", and another could not interpret "your personal appearance". The CVI for clarity and relevance met the ≥ 0.8 standard for all items. The results are shown in Table 1.

Descriptive Statistics
A total of 356 subjects were included in the analysis. Table 2 compares the socio-demographic characteristics of HD and PD patients. Compared with HD patients, PD patients were older (mean age 63.14 vs. 56.62 years), more likely to be married (71.84% vs. 59.20%) and more likely to have a higher personal income (9.78% vs. 2.59% had a monthly personal income of ≥ $20,000). The floor and ceiling percentages for the disease-specific domains are displayed in Table 3. No floor or ceiling effects were detected for the three subscale scores.

Internal Construct Validity
The corrected item-scale correlations for the disease-specific domains are shown in Table 3. All correlations met the ≥ 0.4 standard except that for the item related to problems with the access site (0.291), supporting item internal consistency. Table 4 displays the goodness-of-fit indices of the CFA models for the disease-specific domains. The factor loadings of all items except the one related to problems with the access site (0.319) were positive and exceeded the 0.4 threshold, indicating a substantially interpretable underlying factor structure. The model satisfied the cut-off for each individual fit statistic (RMSEA ≤ 0.1, CFI ≥ 0.8, SRMR ≤ 0.1 and CD ≥ 0.95), indicating a good fit.

Convergent Validity
Pearson's correlation coefficients between the generic and disease-specific domains are displayed in Table 5. The disease-specific domain scores were moderately correlated with the SF-12 v2 MCS and PCS scores, supporting convergent validity.

Reliability
Table 6 outlines the reliability coefficients of the disease-specific domains. A total of 102 HD subjects completed the 2-week retest questionnaire, and their three subscale scores at baseline and at the 2-week retest were compared. None of the mean test-retest differences was statistically significant except that for the burden score, which was marginally significant. Good agreement was found for all subscales, with intraclass correlation coefficients of 0.792 to 0.924. The Cronbach's alpha coefficients of the three subscales ranged from 0.810 to 0.931, indicating acceptable internal consistency.

Sensitivity
Table 7 compares the disease-specific domains between HD and PD patients. In general, PD patients had higher subscale scores than HD patients. The difference was statistically significant for the effects score, with an effect size of 0.68.

Discussion
The results of the cognitive debriefing interviews on the kidney disease-specific scales of the KDQOL-36 were reassuring. The CVI for clarity and relevance reached the target standard. Most subjects could understand and correctly interpret all the items of the KDQOL-36, and all items were relevant to them. As the content validity of this instrument had not previously been assessed, our findings are the first to demonstrate that it is linguistically and culturally relevant to Cantonese-speaking people in Hong Kong.
No significant floor or ceiling effects were identified in the kidney disease-specific scales of the KDQOL-36. This differs from the findings of a previous study in the US, which observed significant floor effects in the 'burden of kidney disease' and 'effects of kidney disease' subscales among Hispanic and White subjects [32]. The difference reaffirms the need for psychometric testing when adapting patient-reported outcome measures for use in different linguistic settings or among patients of different cultural backgrounds, because ingrained cultural and health beliefs and the disease profiles of study subjects might affect the results. The absence of floor and ceiling effects indicates that the scales can potentially capture deterioration or improvement in disease-specific HRQOL along the disease trajectory.
All correlations in the corrected item-total correlation testing of the adapted KDQOL-36 reached the 0.4 standard except that for the item "problems with access site". There are two possible explanations. First, this item may measure a related but slightly different domain from the other items of the subscale, which relate to physical symptoms. Second, some subjects may not have understood the item, resulting in a suboptimal correlation.
In addition, the corrected item-total correlations observed in the present study were substantially lower than those in a validation study in Singapore, in which all correlation coefficients were >0.7 [33]. This may be because the Singapore study used uncorrected item-total correlations, which inflate the results because the total score includes the contribution of the item itself. We used item-total correlations corrected for overlap to assess internal construct validity [34]; this more stringent method may account for the lower correlations.
Unlike previous studies, which found no significant correlation between the SF-36 PCS score and the effects of kidney disease score in Hong Kong [10] and a weak correlation between the SF-12 PCS and the burden of kidney disease score in Singapore [35], we found that all disease-specific scores of the KDQOL-36 were moderately correlated with the SF-12 v2 PCS and MCS scores, indicating that their constructs are related but not equivalent. Because the SF-12 v2 assesses generic HRQOL, its constructs might not be specific or sensitive enough to capture the impact of ESRD on HRQOL: generic measures may contain constructs of less relevance to ESRD patients and miss their specific concerns. In contrast, the disease-specific components of the KDQOL-36 focus on the impact of ESRD on HRQOL and so should be more relevant to people with ESRD. Our results support the clinical importance of the disease-specific components of the KDQOL-36.
In line with the previous study in Singapore [35], our data fit the original 3-factor model (symptoms, burden and effects). The factor loadings of all items in the disease-specific domains exceeded 0.4 except for the item related to the access site, consistent with our internal construct validity assessment.
The KDQOL-36 was found to be a reliable instrument. The internal consistency observed in the present study (Cronbach's alpha > 0.8) was good and comparable to that observed in other populations, such as in mainland China [36], Thailand [37] and the US [32]. It was much higher than that found in a similar study previously conducted in Hong Kong [10], possibly because of our larger sample size: a sample of at least 200 is needed to estimate reliability coefficients with adequate precision [38]. The 2-week test-retest reliability of the adapted KDQOL-36 was acceptable and comparable to that observed in previous studies [10,36,37].
The 'effects of kidney disease' subscale of the KDQOL-36 was sensitive enough to differentiate between HD and PD patients, with HD patients having poorer HRQOL related to the effects of kidney disease. A previous study in Singapore also found poorer HRQOL in HD patients than in PD patients, as measured by the General Health Questionnaire-28 [39]. In the present study, the difference in HRQOL related to the effects of kidney disease between these groups was large (effect size >0.6). This is expected because HD patients spend a significant proportion of their time attached to a dialysis machine in a health care facility, and previous studies have suggested that the HRQOL of HD patients is particularly affected by "loss of freedom", "dependence on the caregivers", "disrupted marital, family and social life" and "fluid and dietary restriction" [40].

Limitations
First, subjects in the present study were recruited only from government-funded dialysis settings. Patients receiving dialysis in private settings, who make up about 5-10% of this patient population, were not represented; because their treatment experience, and thereby their quality of life, may differ, the study results may be biased. Second, although the instrument is more commonly used as a self-administered questionnaire, we deliberately chose interviewer administration because pilot testing found that our subjects preferred it to self-administration. Further study is needed to assess whether the mode of administration affects responses to the instrument. Third, only HD subjects were included in the test-retest analysis, because HD subjects attend the haemodialysis centre frequently, whereas it was not feasible to follow up PD subjects after 2 weeks.

Conclusion
The disease-specific components of the KDQOL-36 demonstrated satisfactory content validity, construct validity, reliability and sensitivity in Cantonese-speaking patients on peritoneal dialysis or haemodialysis. Our findings support the use of this instrument to evaluate the HRQOL of Chinese patients on dialysis, although future work is recommended to evaluate its responsiveness.