Interpretation of health-related quality of life outcomes in Parkinson’s disease from the EARLYSTIM Study

The EARLYSTIM Study compared deep brain stimulation (DBS) with best medical treatment (BMT) over 2 years, showing a between-group difference of 8.0 from baseline in favor of DBS in health-related quality of life (HRQoL), measured with the PDQ-39 SI (summary index). This study obtained complementary information about the importance of the change in HRQoL as measured by the PDQ-39, using anchor-based (Patient Global Impression of Change, PGIC) and distribution-based techniques (magnitude of change, effect size, thresholds, distribution of benefit) applied to the EARLYSTIM study data. Anchor-based techniques showed a difference between follow-up and baseline for patients who reported “minimal improvement” of -5.8 [-9.9, -1.6] (mean [95% CI]) in the DBS group vs -2.9 [-9.0, 3.1] in the BMT group. As the vast majority (80.8%) of DBS patients reported “much or very much improvement”, this difference was also explored for that category and amounted to -8.7 for the DBS group and -6.5 for the BMT group. Distribution-based techniques that analyzed the relative change and treatment effect size showed a moderate benefit of DBS on HRQoL, whereas a slight worsening was observed in the BMT group. The change in the DBS group (-7.8) was larger than the estimated MIC (Minimally Important Change) value (-5.8 by the anchor; -6.3 by triangulation of thresholds), but this was not the case in the BMT group (0.2 vs. -3.0 to -5.4, respectively). Almost 90% of the patients in the DBS group declared some improvement (58.3% and 56.7% beyond the estimated MIC), which was significantly different from the BMT group, whose proportions were 32.0% and 30.3%, respectively. The number needed to treat to improve ≥1 MIC by DBS vs BMT was 3.8. Change in depression, disability and pain influenced the improvement of the DBS group. DBS improved HRQoL in a high proportion of patients to a significant and moderate degree at 2 years of follow-up.


Introduction
Parkinson's disease (PD) is a neurodegenerative disorder, second in prevalence after Alzheimer disease in the population older than 60 years, and the global burden of Parkinson's disease has more than doubled with the aging of the population and longer disease duration [1]. The semiology of PD includes characteristic motor manifestations (bradykinesia, rest tremor, rigidity, gait disturbances, and impairment of postural reflexes) and a variety of non-motor symptoms (e.g., sleep, mood, and autonomic disorders). Progression of the disease over time brings progressive disability, physical and mental complications (e.g., dyskinesia, dementia), psychosocial malfunction, and potential personal financial loss. All these factors can severely deteriorate the patients' health-related quality of life (HRQoL) [2][3][4][5][6].
HRQoL is a component of the global QoL and may be defined as "the perception and evaluation, by patients themselves, of the impact that the disease and its consequences have caused in their life" [7]. The main components of the construct HRQoL are: Physical symptoms, Mental symptoms (mood and cognition), Functional ability, and Social functioning [8,9]. HRQoL measures are available that derive from traditions such as the health and social indicators and that have been designed and validated through psychometric theories. These measures can be classified as "generic", usable in any condition or population, and "specific", for populations with specific characteristics, symptoms, conditions, or dysfunctions. In general, a purely biometrical presentation of results, or a view based only on statistical distributions, limits the understanding of which changes matter to the patient. The analysis and interpretation of HRQoL measures provide relevant information about the impact of the disease, its course over time, priority areas to be attended, and the effect of interventions. Therefore, the perspective of the disease from the patients' point of view is invaluable complementary information for clinical practice and research, as recognized by the regulatory agencies [10,11]. Measures used for HRQoL assessment in PD have been reviewed by an ad hoc Movement Disorder Society Task Force [12].
The EARLYSTIM Study was a prospective randomized study comparing subthalamic nucleus deep brain stimulation (DBS) plus best medical treatment (BMT) with BMT alone over a 2-year follow-up. The primary endpoint was HRQoL as measured with the Parkinson's Disease Questionnaire-39 items (PDQ-39) [13][14][15][16][17][18][19][20]. The "positive" results of the EARLYSTIM Study favor the use of DBS in PD patients with early motor complications. Consequently, in November 2015 the Food and Drug Administration (FDA) approved the use of DBS in PD patients with "at least 4 years duration and with recent onset motor complications, or motor complications of longer-standing duration that are not adequately controlled with medication". The proposal of earlier intervention and the FDA approval, however, have not been free of criticism [21][22][23].
In the pivotal paper of the EARLYSTIM study [14] the results of the primary endpoint focused on the difference (comparison) pre-post-intervention and the percentage of change.
We now address the clinical importance of the change, the relationships between change in the HRQoL and change in clinical aspects, and the proportion of patients experiencing a significant improvement by the intervention. Therefore, the objectives of this secondary analysis were (1) to quantify the meaning of the HRQoL outcomes and (2) to assess what modifications in the manifestations of the disease were associated with changes in HRQoL.

Design
Post hoc secondary analysis of the EARLYSTIM data (baseline and 24 months).

Population
Cohort of PD patients included in the EARLYSTIM Study, which has been succinctly described as PD patients under age 61 with mild levodopa-responsive PD (motor response ≥50%), Hoehn & Yahr stage ≤2.5, and preserved psychosocial competence who experienced levodopa-induced motor complications for no more than 3 years [13].

Assessments
The outcome assessments evaluated in this analysis were the following:

Ethics statement
The EARLYSTIM study conformed to the Declaration of Helsinki and all patients provided written informed consent before randomization. Permission was approved by the local ethics committees of all centers (Ethik-Kommission, Medizinische Fakultät der Christian-Albrechts-Universität zu Kiel, Kiel, Germany; Comité de Protection des Personnes, Ile de France IV, Hôpital Saint-Louis, Paris, France). The study was registered at ClinicalTrials.gov (NCT00354133).

Analysis methods
This analysis used the data from the EARLYSTIM Study and focused on HRQoL outcomes interpretability. The variable of interest was the change observed in the PDQ-39 SI at a 24-month follow-up (FU).
Objective 1: Quantifying the meaning of HRQoL outcomes
1. Anchor-based method. The Minimally Important Change (MIC) has been defined in many ways [24]; here we use the definition "the smallest difference in score that patients perceive as important".
In the present study, the patient global impression of change (PGIC) [25,26] was used as the anchor for estimation of the MIC, because this patient-reported outcome is the most appropriate to compare with another patient-reported outcome such as HRQoL. The MIC was determined as the change observed in those patients who declared themselves "minimally improved" at follow-up [24][27][28][29].
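As a minimal sketch of the anchor-based estimate, the snippet below computes a 95% confidence interval for the mean PDQ-39 SI change in one PGIC category from summary statistics. The function name `mean_change_ci` is hypothetical, and the use of a Student t interval with a critical value supplied by the caller is an assumption of this sketch; the procedure actually used in the study is described in S1 Text and is not reproduced here.

```python
import math


def mean_change_ci(mean_change, sd, n, t_crit):
    """Return (lower, upper) bounds of a t-based confidence interval.

    mean_change: mean of (follow-up - baseline) in the anchor category
    sd:          standard deviation of those changes
    n:           number of patients in the category
    t_crit:      two-sided critical t value for n - 1 degrees of freedom
                 (supplied by the caller; an assumption of this sketch)
    """
    if n < 2:
        raise ValueError("need at least two patients to estimate a CI")
    standard_error = sd / math.sqrt(n)
    half_width = t_crit * standard_error
    return mean_change - half_width, mean_change + half_width


# "Minimally improved" DBS patients: mean -5.8, SD 5.8, n = 10.
# 2.262 is the tabulated 97.5th percentile of Student's t with 9 df.
lower, upper = mean_change_ci(-5.8, 5.8, 10, 2.262)
print(f"MIC estimate -5.8, 95% CI [{lower:.2f}, {upper:.2f}]")
```

Because the inputs are the rounded published values, the interval obtained this way can differ from the reported one in the last decimal.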
2. Distribution-based methods. This section refers to statistical techniques based on the distribution of scores, which provide information from different quantitative sources without reference to the patients' point of view. The following parameters, whose formulas are shown in the Supporting information (S1 Text), were calculated:
2.1. Magnitude of the change. Although there are no standard or threshold values for these magnitudes, they furnish an intuitive approach to the importance of change (a larger change will be more important than a smaller one).
1. Intragroup difference follow-up (FU)-baseline. Negative figures reflect improvement and positive differences worsening, according to the PDQ-39 SI. This outcome is available in the primary study publication [14], but is shown here again for completeness.
2. Inter-group comparison of the magnitude of the difference FU-baseline [14].

Relative change or percentage of change (intragroup) [30]
and inter-group comparison of proportions.
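A minimal sketch of two magnitude measures, assuming common definitions: relative change as the difference FU-baseline expressed as a percentage of the baseline mean, and effect size as the mean change divided by the baseline standard deviation. The study's own formulas are given in S1 Text and may differ; the function names and the numbers used below are hypothetical and are not taken from the EARLYSTIM tables.

```python
def relative_change(baseline_mean, followup_mean):
    """Percentage change from baseline; negative means a lower (better) PDQ-39 SI."""
    if baseline_mean == 0:
        raise ValueError("baseline mean must be non-zero")
    return 100.0 * (followup_mean - baseline_mean) / baseline_mean


def effect_size(baseline_mean, followup_mean, baseline_sd):
    """Mean change divided by the baseline SD (one common definition; assumed here)."""
    if baseline_sd <= 0:
        raise ValueError("baseline SD must be positive")
    return (followup_mean - baseline_mean) / baseline_sd


# Hypothetical group summary: baseline mean 40, follow-up mean 30, baseline SD 20.
baseline, followup, sd0 = 40.0, 30.0, 20.0
print(relative_change(baseline, followup))   # percentage change
print(effect_size(baseline, followup, sd0))  # standardized change
```

With the PDQ-39 SI, both quantities are negative when the group improves, which matches the sign convention used for the difference FU-baseline.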

Threshold values and triangulation.
Comparison between the observed change and some thresholds proposed as representative of the MIC value (intra-group):
1. Standard error of the difference (S diff), as an estimate of the measurement error of change in longitudinal studies [33]. From this perspective, the S diff (like the standard error of measurement, SEM) could be considered the threshold for a minimal important change [34,35]. It was used here in the absence of the SEM, which could not be calculated due to the low number of stable patients at FU (n = 6).
Different methods usually offer different results. Therefore, the calculation of an average value ("triangulation") theoretically approaching the true MIC has been proposed [38,39]. For this purpose, the values averaged were the anchor-based difference FU-baseline, the S diff, and ½ standard deviation at baseline.
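The triangulation step can be sketched as a simple average of the three candidate thresholds in absolute value. The function name `triangulate_mic` is hypothetical; the inputs are the rounded values reported in the Results for the 'minimally improved' level (anchor-based change of 5.8 for DBS and 2.9 for BMT, S diff 6.0, ½ SD at baseline 7.1).

```python
def triangulate_mic(anchor_change, s_diff, half_sd_baseline):
    """Average the absolute values of the three MIC candidates."""
    candidates = [abs(anchor_change), abs(s_diff), abs(half_sd_baseline)]
    return sum(candidates) / len(candidates)


dbs_mic = triangulate_mic(-5.8, 6.0, 7.1)
bmt_mic = triangulate_mic(-2.9, 6.0, 7.1)
print(f"DBS: {dbs_mic:.2f}, BMT: {bmt_mic:.2f}")
```

With these rounded inputs the DBS estimate reproduces the reported 6.3, while the BMT estimate comes out slightly below the reported 5.4, presumably because the published inputs are rounded.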

Quantifying the meaning of health-related quality of life outcomes
Using the anchor-based method, the mean improvement in the group of patients who felt "minimally improved" with respect to baseline was -5.8±5.8 (CI: [-9.9, -1.6]) in the DBS group vs -2.9±11.0 (CI: [-9.0, 3.1]) in the BMT group (p = 0.468). Table 1 shows the mean PDQ-39 SI changes and confidence intervals by PGIC level. The PDQ-39 SI changes related to 'much improved' (-7.7 ± 14.1 for DBS and -4.8 ± 10.4 for BMT groups) and 'very much improved' (-10.1 ± 13.2 for DBS and -14.1 ± 9.9 for BMT groups) show a gradation in which the subjective level of improvement or worsening is associated with a stepwise change in HRQoL. Table 2 shows the results of the distribution-based analysis in both treatment groups. There was an evident change of PDQ-39 SI scores towards improvement in the DBS group, whereas the change in the BMT group was towards worsening and negligible. The effect size value was ≥0.60 ("moderate"), both for the DBS group and for the inter-group difference, and negligible (0.02) for BMT. For the 'minimally improved' level in Table 2, the two threshold values (S diff and ½ SD baseline) were 6.0 and 7.1, respectively, for both the DBS group and the BMT group. Averaging these values with that obtained from the anchor in each group produced MIC estimates of 6.3 and 5.4 (in absolute values), respectively, indicating that the observed change (difference follow-up-baseline) was larger than the estimated MIC in the DBS group, but much smaller in the BMT group.
The proportion of patients who improved by at least 1 MIC in the DBS group was 58.3% (70/120), whereas only 32.0% (39/122) did so in the BMT group (p<0.0001) (Table 2). Consequently, the NNT to observe one patient improving by the MIC or more was 1.7 for DBS and 3.1 for BMT. As most (80.8%) of the DBS patients were 'much or very much improved', similar calculations were carried out for these subjects, collapsed into a single level (Table 2). The MIC for this level 'much or very much improved' was 6.9 for DBS and it was reached or surpassed by 56.7% (68/120) of subjects in this group, whereas it was 6.2 for BMT, achieved by 30.3% (37/122) of patients in this group (p<0.0001). The corresponding NNT were similar for 'minimally improved or more' and 'much or very much improved' (1.7 vs 1.8 for the DBS group; 3.1 vs 3.3 for BMT). The NNT for DBS compared to BMT was 3.8.
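The NNT figures above follow from the responder counts. The sketch below shows the arithmetic, assuming the within-arm NNT is the reciprocal of the responder proportion and the comparative NNT is the reciprocal of the absolute difference between proportions. The function names are hypothetical, and the pooled two-proportion z test is an assumed stand-in, since the text does not state which test produced the reported p value.

```python
import math


def nnt_single(responders, total):
    """Patients treated per responder within one arm: 1 / responder proportion."""
    if responders <= 0:
        raise ValueError("no responders: NNT is undefined")
    return total / responders


def nnt_comparative(resp_a, total_a, resp_b, total_b):
    """NNT for A versus B: 1 / absolute difference in responder proportions."""
    diff = resp_a / total_a - resp_b / total_b
    if diff == 0:
        raise ValueError("equal proportions: NNT is undefined")
    return 1.0 / diff


def two_proportion_p_value(resp_a, total_a, resp_b, total_b):
    """Two-sided p value from a pooled two-proportion z test (assumed test)."""
    pooled = (resp_a + resp_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (resp_a / total_a - resp_b / total_b) / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Responders beyond the MIC: 70/120 (DBS) and 39/122 (BMT).
print(round(nnt_single(70, 120), 1))
print(round(nnt_single(39, 122), 1))
print(round(nnt_comparative(70, 120, 39, 122), 1))
print(two_proportion_p_value(70, 120, 39, 122) < 0.0001)
```

Running the same functions on the 'much or very much improved' counts (68/120 and 37/122) gives the second set of NNT values.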

PLOS ONE

Changes in disease characteristics associated with health-related quality of life changes
The change in the PDQ-39 SI was highly correlated with the improvement in psychosocial adjustment (SCOPA-PS) and depression (BDI), and a moderate association was found with the change in the activities of daily living (ADL, UPDRS II) and pain (VAS-Pain). The correlation was moderate/weak with SOFAS. Most of the classical motor parameters, such as UPDRS III in the worst or best condition, OFF-time or fluctuations, and dyskinesia, were only weakly correlated (Table 3). Table 4 shows the results of the regression model built to identify determinants of change in QoL (PDQ-39 SI), which was the dependent variable. For this purpose, the independent variables in the model were selected after discarding collinearity (intercorrelation coefficients <0.75). Change in SCOPA-PS and SF-36 were not included due to their direct interaction with the dependent variable, and "On time without troublesome dyskinesias" was discarded because of its potential interaction with UPDRS-IV.
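The collinearity screen described above can be sketched as a pairwise check of Pearson correlations among candidate predictors, flagging any pair whose absolute coefficient reaches 0.75. This is an illustrative sketch only: the function names, the variable names `delta_a`, `delta_b`, `delta_c`, and the data are hypothetical, and the treatment of the boundary value is an assumption.

```python
import itertools
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant variable has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def collinear_pairs(predictors, threshold=0.75):
    """List pairs of predictor names with |r| >= threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in itertools.combinations(predictors.items(), 2):
        if abs(pearson_r(a, b)) >= threshold:
            flagged.append((name_a, name_b))
    return flagged


# Hypothetical change scores for five patients (illustrative only).
changes = {
    "delta_a": [1, 2, 3, 4, 5],
    "delta_b": [2, 4, 6, 8, 10.5],
    "delta_c": [5, 1, 4, 2, 3],
}
print(collinear_pairs(changes))
```

In such a screen, one variable from each flagged pair would be dropped before fitting the regression model.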

Discussion
The EARLYSTIM Study showed a favorable outcome at FU in the primary endpoint, HRQoL measured with the PDQ-39 (improvement of 26%), in the group treated with DBS, whereas the BMT group slightly worsened (1%) [14]. In this pivotal study, the effect of the intervention on HRQoL domains, UPDRS parts II to IV, psychosocial adjustment, neuropsychiatric disorders, and levodopa-equivalent daily dose was also reported, with a significant beneficial effect on these endpoints for the DBS group. However, neither the importance that the change in HRQoL entailed for the patient nor the relationship between this outcome and the change in the other variables in the study was explored. The present analysis was aimed, therefore, at investigating these gaps.

Objective 1
Two approaches are used for the interpretation of change in patient-reported outcomes in general, and HRQoL in particular: anchor-based and distribution-based methods [11,28,29]. The only anchor-based finding we obtained was the difference baseline-FU for the group of patients who declared themselves "minimally improved" (5.8 points for the DBS and 2.9 points for the BMT group). However, the number of patients in this situation was quite small (10 DBS and 15 BMT patients), representing 8.3% and 12.3% of the respective total samples (Table 1). These figures are insufficient for providing reliable information about the MIC, although they would indicate a significant benefit in the DBS group only (Table 2). Given that most DBS patients declared themselves 'much/very much improved', the same procedure as for the 'minimally improved' category was performed for those patients, and the mean difference FU-baseline was -8.7 [-11.4, -5.9] in the DBS versus -6.5 [-10.8, -2.2] in the BMT group.
A variety of parameters based on the distribution of score changes were subsequently calculated. Anchor-based methods are preferred [11,42] because they connect the concept measured by the patient-reported outcome with the anchor, making the interpretation of the outcome easy and reliable. Distribution-based methods, on the contrary, have no connection with a directly interpretable measure and mainly provide information based on the magnitude of the change. This, however, goes beyond the information given by a merely 'statistically significant difference'. Both approaches, anchor- and distribution-based methods, have advantages and disadvantages [28,39][42][43][44] (S1 Table).
Summarizing the results at FU of the distribution-based methods (Table 2):
1. The relative change and effect size showed a moderate benefit of DBS on HRQoL, whereas a slight impairment was observed in the BMT group.
2. The observed change (-7.8) was higher than the MIC estimated value for a minimal (-6.3) or even a better improvement (-6.9) in the DBS group, but not in the BMT group (0.2 vs. -5.4 and -6.2, respectively).
3. Almost 90% of the patients in the DBS group declared improvement (58.3% and 56.7% of them beyond the estimated MIC for this arm), whereas in the BMT group these proportions were 32.0% and 30.3%, respectively.
4. In both treatment groups there were patients declaring 'much' or 'very much' improvement, but the proportions were significantly different (81% for DBS, 22% for BMT), as was the proportion of patients who improved more than their respective minimal change for much and very much improved (56.7% for DBS, 30.3% for BMT).
5. The comparative NNT to improve ≥1 minimal change for much and very much improved by DBS vs BMT was 3.8.
According to these long-term results, STN-DBS generated a benefit in HRQoL in the vast majority of patients, and such improvement was considerable in almost 60% of them (Fig 1). In contrast, almost two thirds of patients in the BMT group were stable or worse at FU.
Using a transition question as anchor, Peto et al. [45] carried out a postal survey with 728 responses (53.1% of the baseline sample) from members of the UK Parkinson's Disease Society, 6 months after the first evaluation with the PDQ-39. The mean change (worsening) in patients who declared themselves "a little worse" (n = 192) was 1.60 (±8.89), with an effect size of 0.10.
Fitzpatrick et al. [33] determined by distribution-based methods (SEM and S diff) the threshold for a real change and then compared these results with those from the anchor-based approach to explore the relationship between both methods. They found that one-SEM values were relatively close to the MIC values obtained with the anchor-based approach, which in turn were usually smaller than the S diff. If the SEM represents the minimal change beyond the measurement error, it constitutes the lower threshold for considering a change as real, and the S diff, which is higher than the SEM, could be nearer to the MIC than the SEM (the limit of the "noise" by error). In the clinical sample of that study, eight patients declared improvement at follow-up (4 months), with a change in the PDQ-39 SI of -2.15 (±6.62) points, whereas in 40 patients considered to be worse the change was 3.31 (±8.80). The S diff was 5.39, clearly higher than the change observed, and quite close to the S diff values found in the present analysis (6.0 for both arms). In the present study, the SEM based on agreement (intraclass correlation coefficient of stable patients) [46] was not used due to the very small size of this group in both arms. In another series of PD patients followed up for one year by Martinez-Martin et al. [47], the sample tended to worsen and the SEM for the PDQ-39 SI was 4.26 (baseline) and 4.77 (follow-up). Horváth et al. [48] found, through a transition question, a minimal clinically important difference ( Holden et al. [49] also used a transition question, the Clinical Global Impression of Change, and determined the MCID (improvement) in 90 patients with parkinsonism, after a palliative intervention clinical trial with six months of follow-up. The absolute MCID obtained for the PDQ-39 (12.7) was considered valid by the authors for this kind of population.
Similar to our findings, a decrease in the PDQ-39 SI score was unexpectedly observed in patients with minimal worsening, a finding probably related to "the full spectrum of patient experience" [49] and expectations [50].
From the previous considerations, it is concluded that the MIC varies with the population, disease severity, type of study (natural progression of disease vs. intervention), and length of the follow-up. Therefore, comparison with other studies performed in similar circumstances is not possible if these studies do not exist, as is the case with the EARLYSTIM study.
Concerning the effect size, most studies with conventional levodopa or dopamine agonists showed changes of 5-30% and weak to moderate effect sizes, whereas therapies for advanced patients achieved 30-50% and moderate to large effect sizes [51][52][53]. Several reviews about the effect of bilateral STN-DBS on HRQoL, measured with the PDQ-39, found improvements of 19-34%, with effect sizes of 0.60-0.80 [51,52,54]. Therefore, the results of the intervention in the present study are in line with the published literature.

Objective 2
This analysis was performed to identify how the change in other variables captured in the study could be associated with the change in HRQoL. A close correlation was found with improvement in psychosocial adjustment and depression; moderate with improvement in the global HRQoL, activities in daily living, and pain; and a weak association was observed with the other variables (Table 3). According to the results of the multiple regression, the improvement in depression, functional state, and pain were independent and significant determinants of the change in HRQoL.
There are many factors associated with and able to influence the HRQoL of PD patients [2][3][4][55][56][57][58][59]. Depression, disability, and pain are universal determinants of HRQoL, and they are very frequently present and combined in PD patients, causing a severe deterioration of their HRQoL. If these factors remain significantly improved in the long term after surgery, their improvement would be expected to entail a significant improvement in patients' quality of life. Typically, patients have received DBS therapy due to motor complications. The current analysis, as well as the data from the main study, may prompt consideration of non-motor and quality-of-life aspects as indication criteria for DBS. More studies are needed here.

Conclusions
From the previous discussion it may be concluded that DBS improved patients' HRQoL to a significant and moderate degree, as a whole, over a two-year follow-up period. In this group, the beneficial effect was present in the vast majority of patients. These results are even more remarkable when compared to the BMT group, whose HRQoL tended to worsen over the observation time. The MIC of the PDQ-39 SI for both populations in the study was determined by means of anchor- and distribution-based methods.