Impact of Response Shift on Time to Deterioration in Quality of Life Scores in Breast Cancer Patients

Background This prospective multicenter study aimed to assess the impact of the recalibration component of response shift (RS) on time to deterioration (TTD) in health-related quality of life (QoL) scores in breast cancer (BC) patients, and the influence of baseline QoL expectations on TTD. Methods The EORTC QLQ-C30 and QLQ-BR23 questionnaires were used to assess QoL in a prospective multicenter study at inclusion (T0), at the end of the first hospitalization (T1), and three (T2) and six months (T3) after the first hospitalization. Recalibration was investigated using the then-test method. QoL expectations were assessed at diagnosis. Deterioration was defined as a 5-point decrease in QoL scores, considered the minimal clinically important difference (MCID). TTD was estimated using the Kaplan-Meier method. Cox regression analyses were used to identify factors influencing TTD. Results From February 2006 to February 2008, 381 women were included. Recalibration of breast cancer patients' internal standards in the assessment of their QoL had an impact on TTD. Median TTDs were significantly shorter when recalibration was not taken into account than when it was, for global health, role-functioning, social-functioning, body-image and side effects of systemic therapy. Cox multivariate analyses showed that for body image, when recalibration was taken into account, radiotherapy was associated with a shorter TTD (HR for no radiotherapy vs. radiotherapy: 0.60 [0.38–0.94]), whereas no significant impact of surgery type on TTD was observed. For global health, cognitive and social functioning dimensions, patients expecting a deterioration in their QoL at baseline had a significantly shorter TTD. Conclusions Our results showed that RS and baseline QoL expectations were associated with time to deterioration in breast cancer patients.


Introduction
The assessment of longitudinal changes in subjective patient-reported outcomes such as health-related quality of life (HRQoL) is a key component of many clinical and research evaluations. Indeed, assessing the impact of disease and treatment on HRQoL is increasingly recognized as crucial for evaluating overall treatment effectiveness in cancer clinical trials. Moreover, cancer patients require information not only on survival estimates but also on HRQoL issues [1].
The challenge of using HRQoL measurements in longitudinal studies or clinical trials stems from their self-reported and subjective nature. Because HRQoL measurements are completed from the patient's perspective, they can be modified by psychological phenomena such as health expectations [2,3]. For instance, the way people assess or quantify their HRQoL may change over time. These changes, which are closely related to the process of accommodating to illness, are referred to as response shift (RS) [4][5][6]. Schwartz and Sprangers defined response shift through three components, "as a change in the meaning of one's self-evaluation of a target construct as a result of a change in the respondent's internal standards of measurement (recalibration), a change in the respondent's values (reprioritization) or a redefinition of the target construct (reconceptualization)" [5]. A major goal of measuring patient-reported HRQoL is to determine to what extent changes in HRQoL scores over time represent true changes in HRQoL due to treatment or cancer, and to what extent they reflect measurement error [7]. The occurrence of RS has been demonstrated in breast cancer (BC) patients [8][9][10]. Response shift is a natural process that could distort the interpretation of changes in HRQoL scores over time in interventional comparative studies. Characterizing response shift may therefore be required to obtain a valid and sensitive assessment of change over time.
Another concern in assessing HRQoL is how to deal with missing data [11], since they can affect HRQoL estimates and lead to biased interpretations. In longitudinal studies, observations can be missing at certain time points because patients miss visits or do not complete some questionnaires, which can seriously hamper the interpretation of HRQoL results. Analysis methods requiring complete cases (e.g., multivariate analysis of variance) are therefore not adequate. Analysis methods should retain at least all of the available data [11] while producing robust results that are meaningful to clinicians and help decision making [12,13]. To this end, the time to deterioration in QoL scores (TTD) approach has been defined as a method of longitudinal analysis for BC patients [14]. TTD can handle missing data by making explicit assumptions about whether missing data reflect a deterioration of the patient's health status. Furthermore, TTD may be more familiar to clinicians because it relies on Kaplan-Meier survival curves and hazard ratios (HR) [15].
The primary aim of this study was to evaluate the impact of the recalibration component of RS on TTD estimates in patients with BC. The secondary objective was to examine the influence of baseline QoL expectations on TTD in patients with BC.

Patients
A prospective multicenter randomized cohort study including all women hospitalized for the diagnosis or treatment of primary BC, or for suspected BC, was conducted in the cancer centers of Dijon and Nancy and in the university hospitals of Strasbourg and Reims. Patients were included between February 2006 and February 2008. Patients with primary cancers at sites other than the breast and patients previously hospitalized or treated for BC were excluded. Only women were included. Patients who declined to participate or who were unable to give written informed consent were excluded.
All of the participants gave their written informed consent, and the protocol of the study was approved by the regional ethics committee (Comité Consultatif de Protection des Personnes dans la Recherche Biomédicale de Bourgogne) in 2005.

Health-related quality of life assessments
HRQoL was assessed using the EORTC QLQ-C30 [16] and the EORTC QLQ-BR23 [17] questionnaires at diagnosis (T0), at the end of the first hospitalization (T1), and three (T2) and six months (T3) after the first hospitalization. The QLQ-C30 is a cancer-specific instrument composed of 30 items that generate 15 scores: five functional scales, a global health status/QoL scale, eight symptom scores and a financial difficulties scale. The breast cancer module (QLQ-BR23) comprises 23 questions assessing disease symptoms and the side effects of treatment. Scores range from 0 (worst) to 100 (best) for the functional scales and global health status, and from 0 (best) to 100 (worst) for the symptom scales.
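For illustration, the linear transformation that maps raw item responses to these 0–100 scores can be sketched in Python. The sketch assumes the standard EORTC scoring procedure (raw score = mean of the scale's item responses, followed by linear rescaling, with a scale scored only when at least half of its items are answered); the function name `eortc_scale_score` and the example answers are hypothetical, and the scoring code actually used in the study is not described here.

```python
from statistics import mean


def eortc_scale_score(responses, item_range, kind):
    """Return a 0-100 EORTC scale score, or None if too many items are missing.

    responses:  raw item answers for one scale (1-4, or 1-7 for global health);
                None marks an unanswered item
    item_range: highest minus lowest possible answer (3 for 1-4 items, 6 for 1-7)
    kind:       "functional", "symptom" or "global"
    """
    answered = [r for r in responses if r is not None]
    # Assumed "half rule": score the scale only if at least half its items were answered
    if not answered or len(answered) < len(responses) / 2:
        return None
    raw = mean(answered)  # raw score = mean of the answered items
    if kind == "functional":
        # Functional scales: 0 = worst, 100 = best
        return (1 - (raw - 1) / item_range) * 100
    if kind in ("symptom", "global"):
        # Symptom scales: 0 = best, 100 = worst; global health: 0 = worst, 100 = best
        return (raw - 1) / item_range * 100
    raise ValueError(f"unknown scale kind: {kind!r}")


# Hypothetical answers, for illustration only
print(eortc_scale_score([1, 2, 1, 1, 2], 3, "functional"))  # physical functioning
print(eortc_scale_score([2, 3, None], 3, "symptom"))        # fatigue, one item missing
print(eortc_scale_score([5, 6], 6, "global"))               # global health status
```

Note that functional scales are reversed (a higher raw answer means more difficulty), which is why their formula subtracts from 1, whereas symptom and global health items are already oriented in the direction of the final score.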
Patients were also asked to report their QoL expectations at baseline with the following question: "Do you expect that your QoL 1) will not change globally, 2) will deteriorate, or 3) will improve?"

Assessment of the recalibration component of RS using the then-test method
Recalibration was assessed using the then-test method, which asks patients to rate their previous health state from their current perspective [5,18]. To assess the impact of questionnaire order on RS estimates, the order in which the post-test and then-test questionnaires were administered was randomized 1:1 with stratification by center. In arm A, patients first completed the questionnaires about their current QoL at time T (post-test) and then retrospectively rated their earlier QoL (then-test); in arm B, the order was then-test/post-test. In this study, patients were not compared according to randomized arm because a previous study showed only a small impact of questionnaire order on QoL scores [10]. Three then-tests were implemented (figure 1): two to retrospectively assess baseline QoL (at the end of the first hospitalization and 3 months after the first hospitalization) and one to retrospectively assess the three-month QoL (at 6 months). In other words, patients were asked to retrospectively assess their baseline QoL at T1 (then-test 1) and at T2 (then-test 2), and their three-month QoL at T3 (then-test 3). Recalibration at the end of the first hospitalization was assessed from the mean differences between the baseline QoL assessed at inclusion (pre-test) and then-test 1. Recalibration at 3 months was assessed from the mean differences between the pre-test and then-test 2. Lastly, recalibration of internal standards at 6 months was assessed from the mean differences between the three-month QoL and its retrospective assessment at 6 months (then-test 3).
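A positive (or negative) mean difference between the then-test and the pre-test indicates that patients retrospectively rated their baseline (or three-month) level higher (or lower) than they originally did: better (or worse) functioning for the functional dimensions, and more (or fewer) symptoms for the symptom dimensions.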
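The mean-difference computation and the Wilcoxon matched-pairs test used to detect recalibration can be sketched as follows, assuming the difference is taken as then-test minus pre-test (so that, for a symptom scale, a negative value means symptoms were rated higher at inclusion than in retrospect). This is a minimal pure-Python version using the normal approximation with a tie correction, which may give slightly different p-values from the statistical software used in the study; `recalibration_test` and the data are hypothetical.

```python
from math import erfc, sqrt
from statistics import mean


def recalibration_test(pretest, thentest):
    """Mean difference (then-test minus pre-test) with a two-sided Wilcoxon
    matched-pairs signed-rank test (normal approximation, tie-corrected).

    Pairs with a missing value (None) on either side are dropped.
    Returns (mean_difference, z, p_value).
    """
    diffs = [t - p for p, t in zip(pretest, thentest)
             if p is not None and t is not None]
    if not diffs:
        raise ValueError("no complete pre-test/then-test pairs")
    md = mean(diffs)
    nonzero = [d for d in diffs if d != 0]  # zero differences are discarded
    n = len(nonzero)
    if n == 0:
        return md, 0.0, 1.0
    # Rank |differences|, giving tied values their average rank
    order = sorted(range(n), key=lambda k: abs(nonzero[k]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank
        tie_count = j - i + 1
        tie_term += tie_count ** 3 - tie_count
        i = j + 1
    # Sum of ranks of positive differences, and its mean and variance under H0
    w_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    expected = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - expected) / sqrt(variance)
    p_value = erfc(abs(z) / sqrt(2))  # two-sided p-value from the normal distribution
    return md, z, p_value


# Hypothetical fatigue scores for 8 patients: pre-test at T0, then-test 1 at T1
pre = [50, 60, 70, 80, 40, 55, 65, 75]
then1 = [49, 58, 67, 76, 35, 49, 58, 67]
md, z, p = recalibration_test(pre, then1)
print(f"MD = {md:.2f}, z = {z:.2f}, p = {p:.4f}")
```

In this hypothetical example every patient recalls less fatigue than reported at inclusion, so the mean difference and z are negative, mirroring the direction of the fatigue result reported for then-test 1.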

Statistical methods
Patients' characteristics were described and compared according to completion of the baseline questionnaire, to determine whether missing scores at inclusion depended on patients' clinical or sociodemographic status.
Wilcoxon matched-pairs tests were used to assess recalibration.

Time to QoL deterioration
All patients who had a baseline and at least one follow-up QoL assessment were included in the TTD analyses.
The time to QoL deterioration (TTD) was defined as the time from inclusion in the study to the first 5-point [19] deterioration in QoL score relative to the baseline score (a decrease for functional scales and global health status, an increase for symptom scales). Patients who had not deteriorated were censored at the time of their last completed QoL assessment [14].
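A per-patient version of this definition can be sketched as follows. The sketch assumes that deterioration means a change of at least 5 points in the unfavourable direction and that a missed questionnaire is simply skipped (neither a deterioration nor a censoring time); other missing-data assumptions are possible. The timing of T1 (taken here as 0.5 months) and the function name `time_to_deterioration` are hypothetical.

```python
def time_to_deterioration(times, scores, higher_is_better, mcid=5.0):
    """Return (time, event) for one patient and one QoL dimension.

    times:  assessment times in months; times[0] is the reference (baseline)
    scores: scores at those times; None marks a missing questionnaire
    higher_is_better: True for functional/global scales, False for symptom scales
    event = 1 at the first follow-up score at least `mcid` points worse than the
    reference; otherwise event = 0 and the patient is censored at the last
    completed assessment.
    """
    reference = scores[0]
    if reference is None:
        raise ValueError("a reference (baseline) score is required")
    last_completed = times[0]
    for t, s in zip(times[1:], scores[1:]):
        if s is None:
            continue  # assumption: a missed visit is skipped
        # Positive change = improvement, negative change = deterioration
        change = (s - reference) if higher_is_better else (reference - s)
        if change <= -mcid:
            return t, 1  # deterioration observed
        last_completed = t
    return last_completed, 0  # censored at the last completed assessment


# Hypothetical assessment times: T0 = 0, T1 = 0.5, T2 = 3, T3 = 6 months
times = [0, 0.5, 3, 6]
print(time_to_deterioration(times, [70, 68, 60, 65], higher_is_better=True))     # (3, 1)
print(time_to_deterioration(times, [20, None, 22, 30], higher_is_better=False))  # (6, 1)
print(time_to_deterioration(times, [70, None, 72, None], higher_is_better=True)) # (3, 0)
```

The resulting (time, event) pairs are exactly the input expected by a Kaplan-Meier estimator, which is what makes TTD robust to intermittently missing questionnaires.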
To take into account the recalibration component of RS, then-test assessments were used as reference scores when significant recalibration effects were observed. Specifically, if significant recalibration of baseline QoL was observed only at T1 (or only at T2), analyses used then-test 1 (or then-test 2) as the reference score. In addition, then-test 3 was used in the TTD analyses instead of the three-month QoL score when significant recalibration of the three-month QoL was observed at T3.
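The choice of reference scores described above can be written as a small helper that builds the score series passed to the TTD computation for one patient and one dimension. The recalibration flags would come from the dimension-level tests; the function name `ttd_score_series` is hypothetical, and because the text only specifies recalibration at T1 only or at T2 only, the sketch raises an error when both are flagged rather than guessing a rule.

```python
def ttd_score_series(pre, t1, t2, t3, then1, then2, then3,
                     recal_t1, recal_t2, recal_t3):
    """Build the [reference, T1, T2, T3] scores used for one patient's TTD.

    pre, t1, t2, t3:     scores at inclusion, end of first hospitalization,
                         3 months and 6 months
    then1, then2:        baseline QoL recalled at T1 and at T2
    then3:               three-month QoL recalled at T3
    recal_t1/t2/t3:      whether significant recalibration was found for this
                         dimension at T1, T2 and T3
    """
    if recal_t1 and recal_t2:
        # The described rule only covers recalibration at T1 only or at T2 only
        raise ValueError("reference choice not specified for recalibration at both T1 and T2")
    if recal_t1:
        reference = then1
    elif recal_t2:
        reference = then2
    else:
        reference = pre
    # The recalled three-month score replaces the T2 score when recalibrated at T3
    month3 = then3 if recal_t3 else t2
    return [reference, t1, month3, t3]


# Hypothetical scores for one patient; recalibration found only at T1
print(ttd_score_series(60, 55, 50, 58, 52, 54, 49, True, False, False))  # [52, 55, 50, 58]
```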
TTD was estimated using the Kaplan-Meier method and described using medians with 95% confidence intervals (CI). The statistical significance of the difference between the median TTD with and without taking the recalibration component of RS into account was assessed using bootstrap Kaplan-Meier estimates of median TTD. Nonparametric 95% CIs were computed for the difference between the bootstrap median TTD estimates, and a difference was considered statistically significant if its 95% CI did not include 0.
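A minimal sketch of the Kaplan-Meier median and of a bootstrap interval for the difference in medians is given below. It assumes that patients are resampled with replacement, that both TTD definitions (with and without recalibration) are recomputed on the same resampled patients, and that a simple percentile interval is used; the study's exact bootstrap settings (number of replicates, interval type) are not stated here, and all names and data are hypothetical.

```python
import random


def km_median(times, events):
    """Kaplan-Meier median: smallest time at which S(t) <= 0.5, or None if not reached."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_at_t = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            n_events += data[i][1]
            n_at_t += 1
            i += 1
        if n_events:
            surv *= 1 - n_events / at_risk
            if surv <= 0.5:
                return t
        at_risk -= n_at_t  # events and censorings at t leave the risk set
    return None


def bootstrap_median_difference(ttd_rs, ttd_raw, n_boot=1000, seed=1):
    """Percentile 95% CI for median TTD with recalibration minus median TTD without.

    ttd_rs, ttd_raw: per-patient (time, event) pairs in the same patient order,
    so each bootstrap sample keeps both TTD definitions paired.
    """
    rng = random.Random(seed)
    n = len(ttd_rs)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        rs = [ttd_rs[i] for i in idx]
        raw = [ttd_raw[i] for i in idx]
        m_rs = km_median([t for t, _ in rs], [e for _, e in rs])
        m_raw = km_median([t for t, _ in raw], [e for _, e in raw])
        if m_rs is not None and m_raw is not None:  # skip samples with an unreached median
            diffs.append(m_rs - m_raw)
    if not diffs:
        raise ValueError("median TTD not reached in any bootstrap sample")
    diffs.sort()
    lower = diffs[int(0.025 * (len(diffs) - 1))]
    upper = diffs[int(0.975 * (len(diffs) - 1))]
    return lower, upper


# Hypothetical data: with recalibration, each patient deteriorates 3 months later
raw = [(t, 1) for t in range(1, 21)]
with_rs = [(t + 3, 1) for t in range(1, 21)]
low, high = bootstrap_median_difference(with_rs, raw, n_boot=200)
print(low, high, "significant" if low > 0 or high < 0 else "not significant")
```

Resampling patients rather than the two TTD curves independently preserves the within-patient correlation between the two definitions, which seems appropriate because both are computed on the same women.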
Cox regressions were applied to identify factors associated with TTD for each QoL dimension. All variables with a univariate Cox p value ≤0.20 were eligible for the multivariate Cox analyses, which were stratified on center of inclusion. The significance level was set at p = 0.05 for the Cox analyses and reduced to p = 0.01 for the analyses performed with the then-test method, to limit false-positive results due to the number of multiple comparisons performed with this method.
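The univariate screening step can be illustrated with a pure-Python Cox model for a single covariate, fitted by Newton-Raphson on the partial likelihood with Breslow handling of ties and optional stratification (for example on center), together with the p ≤ 0.20 eligibility rule. This is only a sketch: the analyses in this study were run in Stata, multivariate models need the matrix form of the same update, and the function names, covariate coding and data are hypothetical.

```python
from math import erfc, exp, sqrt


def cox_univariate(times, events, x, strata=None, max_iter=50, tol=1e-10):
    """Univariate Cox model (Breslow ties), optionally stratified, fitted by
    Newton-Raphson. Returns (hazard_ratio, two-sided Wald p-value)."""
    n = len(times)
    strata = [0] * n if strata is None else strata
    groups = {}
    for i, s in enumerate(strata):
        groups.setdefault(s, []).append(i)
    beta = 0.0
    for _ in range(max_iter):
        score = info = 0.0
        for members in groups.values():  # risk sets are formed within each stratum
            for i in members:
                if not events[i]:
                    continue
                risk = [j for j in members if times[j] >= times[i]]
                w = [exp(beta * x[j]) for j in risk]
                s0 = sum(w)
                s1 = sum(wj * x[j] for wj, j in zip(w, risk)) / s0
                s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk)) / s0
                score += x[i] - s1     # gradient of the log partial likelihood
                info += s2 - s1 ** 2   # information (weighted variance of x)
        if info <= 0:
            raise ValueError("covariate has no variation within risk sets")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    se = 1 / sqrt(info)  # standard error from the information at the last iteration
    p_value = erfc(abs(beta / se) / sqrt(2))
    return exp(beta), p_value


def eligible_for_multivariate(univariate_p, threshold=0.20):
    """Keep covariates whose univariate p-value is <= threshold."""
    return [name for name, p in univariate_p.items() if p <= threshold]


# Hypothetical body-image TTD (months) and coding: 1 = no radiotherapy
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1] * 8
no_rt = [0, 0, 1, 0, 1, 0, 1, 1]
hr, p = cox_univariate(times, events, no_rt)
print(f"HR = {hr:.2f}, p = {p:.3f}")
print(eligible_for_multivariate({"age": 0.35, "surgery": 0.04, "radiotherapy": 0.20}))
```

With this coding an HR below 1 means that patients without radiotherapy deteriorate more slowly, which is how HRs such as 0.60 for radiotherapy are read in the Results.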
Analyses were performed using STATA Statistics/Data Analysis Software, version 11 (StataCorp LP, College Station, Texas, USA).

QoL completion
At baseline, 359 (94.2%) patients completed the questionnaire with at least one available QoL dimension, and 357 (93.7%) had a baseline and at least one follow-up QoL assessment. The clinical and pathological characteristics of these two populations were similar and are presented in Table 1. Only the center of inclusion differed significantly between patients with and without missing baseline scores.

Retrospective assessments of baseline QoL
The occurrence of recalibration effects differed according to the time of the retrospective assessment (T1 or T2) for 7 dimensions. For fatigue, appetite loss and the side effects of systemic therapy, symptoms were significantly higher at inclusion than in the retrospective assessment at T1 (then-test 1), with mean differences (MD) in QoL scores of −1.8 (p = 0.0006), −2.9 (p = 0.0081) and −1.96 (p = 0.0001), respectively (Table 2). These differences were no longer statistically significant when baseline QoL was assessed retrospectively at T2 (then-test 2).

Time to QoL deterioration
Median TTDs for the study population are shown in table 3. Median TTDs were significantly shorter when recalibration was not taken into account than when it was, for global health status (GHS), role-functioning, social-functioning, body-image and side effects of systemic therapy (figure 2a to f). For example, for GHS, the median TTD increased from 3.1 [2.9–3.3] months when recalibration was not taken into account to 3.6 [3.2–6.3] months when it was (figure 2a). For role-functioning (figure 2b), the median TTD increased from 3.2 [3.1–3.3] to 4.7 [3.3–6.2] months when recalibration was taken into account. For the social functioning score (figure 2d), the median TTD increased from 3.6 to 6.3 months, and for the body image score (figure 2e), from 3.3 to 6.2 months (table 3).
However, for the emotional-functioning dimension (figure 2c), the median TTD was significantly longer when recalibration was not taken into account: the bootstrap Kaplan-Meier estimate of the difference in median TTD was −3.13 [−3.8 to −0.5]. No statistically significant difference in median TTD was found for the other dimensions.

Univariate analyses of TTD
Results of the univariate Cox analyses of TTD are reported in table S1 in File S1 for QLQ-C30 scores and table S2 in File S1 for QLQ-BR23 scores. An MCID of 5 points was used for these analyses. For example, for the body-image score, when recalibration was not taken into account, neither sentinel lymph node biopsy (SLNB) nor the absence of radiotherapy had a beneficial effect on TTD. When recalibration was taken into account, women treated with SLNB had a significantly longer TTD than those treated with axillary lymph node dissection (ALND): HR = 0.65 [0.45–0.93]. As for radiotherapy, patients who did not receive radiotherapy had a significantly longer TTD than those who did.

Cox multivariate analyses of TTD
Multivariate Cox analyses were performed for all dimensions of the QLQ-C30 and QLQ-BR23 questionnaires. For concision, however, only the dimensions whose TTD estimates were significantly influenced by at least one factor are shown in table 4. For body image, when RS was not taken into account, Cox multivariate analyses showed that the type of surgery was significantly associated with TTD: patients who underwent mastectomy had a shorter TTD for body image than patients who had conservative surgery (HR …). When recalibration was taken into account for body image, the association between TTD and the type of surgery was no longer statistically significant, whereas radiotherapy became significantly associated with TTD: patients who did not receive radiotherapy had a significantly longer TTD than those who did (HR 0.60 [0.38–0.94]).
Cox multivariate analyses also showed that QoL expectations at baseline were significantly associated with TTD. For example, when the recalibration component of RS was taken into account, baseline QoL expectations were significantly associated with TTD for the GHS, physical-functioning, cognitive-functioning, social-functioning and breast symptom scales. Patients who expected a deterioration in their QoL at baseline had a significantly shorter TTD.

Discussion
In this study, we examined the impact of the recalibration component of response shift on TTD estimates of QoL scores in breast cancer patients. Our results indicate that BC patients' internal standards for assessing their QoL may change during the course of treatment and disease.
Recalibration of BC patients' internal standards had a significant effect on time to QoL score deterioration for six of the 23 dimensions. The median TTD of the study population was underestimated for global health, role-functioning, social-functioning, body-image and side effects of systemic therapy when then-test scores were not used as reference scores to define QoL deterioration. For the emotional-functioning scale, the median TTD was overestimated when recalibration was not taken into account.
Our results showed that, compared with ALND, SLNB was independently associated with a longer TTD for arm symptoms, nausea and vomiting, and systemic therapy side effects [14][20][21][22][23]. Interestingly, for breast symptoms, SLNB followed by complementary ALND resulted in a significantly shorter TTD than ALND alone [14]. The type of surgery was significantly associated with TTD for diarrhea when recalibration was taken into account. In contrast, for body image, we found a significant association between the type of surgery and TTD only when the recalibration effect was not taken into account. To our knowledge, no study reporting the association between the type of surgery and QoL has considered the effect of the recalibration component of RS [24][25][26][27]. In addition, our results suggest that radiotherapy could be independently associated with a shorter time to body-image deterioration when RS is taken into account. These results underline the need to assess the impact of RS through sensitivity analyses.
Moreover, patients who expected a deterioration or no change in their QoL level had a significantly shorter TTD than patients who expected an improvement. Previous studies have also suggested that high patient expectations regarding health and QoL could predict better outcomes [2,28–31]. Despite heterogeneity in clinical outcomes across studies, most investigators have consistently shown strong, statistically and clinically significant associations between patients' expectations and clinical recovery. However, the interpretation of this association remains unclear. Questions about patient expectations regarding health and QoL should be incorporated into future trials to clarify their role in clinical outcomes.
One limitation of our study is that we focused only on the recalibration component of response shift, using the then-test method.
Furthermore, because it relies on retrospective assessment, a major limitation of the then-test method is its susceptibility to recall bias: respondents are assumed to be able to remember their previous health and QoL level at the baseline assessment [18,32], and patients may fail to recall accurately their health and QoL level before the intervention. Additionally, recent evidence in patients undertaking self-management interventions for chronic diseases suggests that the then-test approach may have psychometric flaws resulting from implicit theories of change, social desirability, halo effects and recall bias [33]. Including a comparison group when designing studies could help to make optimal use of the then-test approach. Because RS is defined as a treatment-dependent phenomenon, the pre-test, post-test and then-test scores of control subjects would only reflect effects due to history, maturation or testing. Thus, recalibration RS is only indicated if the difference between the then-test and pre-test scores is significantly larger in the experimental group than in the control group [18].
Response shift has been explored over time in HRQoL through a variety of designs and statistical methods. Each of these methods is specific, with its own advantages, limitations and challenges. However, assessing response shift is of paramount importance in longitudinal HRQoL research.
In conclusion, our study showed that BC patients' internal standards change during QoL follow-up. Because patients may accommodate to treatment toxicities or disease progression over time, this could attenuate or inflate estimates of treatment effects. Cancer clinical trials must therefore investigate the RS effect more deeply, and we encourage investigators to plan longitudinal QoL analyses that take such effects into account to improve the interpretation of results. Our study also showed that baseline QoL expectations were associated with QoL deterioration in several dimensions. For this reason, health care providers should offer adequate counselling and psychological support to patients at the time of diagnosis to prevent early QoL deterioration.

Supporting Information
File S1 Tables S1 and S2. Table S1. Univariate analyses of time to QLQ-C30 score deterioration for factors significantly affecting TTD with or without taking into account the recalibration component of RS. Table S2. Univariate analyses of time to QLQ-BR23 score deterioration for factors significantly affecting TTD with or without taking into account the recalibration component of RS. (DOC)