Clinical improvement after surgery for degenerative cervical myelopathy; A comparison of Patient-Reported Outcome Measures during 12-month follow-up

Object Although many patients report clinical improvement after surgery for degenerative cervical myelopathy, the aim of intervention is to stop progression of spinal cord dysfunction. We wanted to provide estimates and assess achievement rates of Minimal Clinically Important Difference (MCID) at 3- and 12-month follow-up for Neck Disability Index (NDI), Numeric Rating Scale for arm pain (NRS-AP) and neck pain (NRS-NP), Euro-Qol (EQ-5D-3L), and European Myelopathy Score (EMS). Methods 614 degenerative cervical myelopathy patients undergoing surgery responded to Patient-Reported Outcome Measures (PROMs) prior to, and 3 and 12 months after, surgery. The external criterion was the Global Perceived Effect Scale (1–7), defining MCID as “slightly better”, “much better” and “completely recovered”. MCID estimates with the highest sensitivity and specificity were calculated from Receiver Operating Characteristic (ROC) curves for change and percentage change scores in the whole sample and in anterior and posterior procedural groups. Results The NDI and NRS-NP percentage change scores were the most accurate PROMs with a MCID of 16%. The change score for NDI and percentage change scores for NDI, NRS-AP and NRS-NP were slightly higher in the anterior procedure group compared to the posterior procedure group, while the remaining PROM estimates were similar across procedure type. The MCID achievement rates at 12-month follow-up ranged from 51% in EMS to 62% in NRS-NP. Conclusion The NDI and NRS-NP percentage change scores were the most accurate PROMs to measure clinical improvement after surgery for degenerative cervical myelopathy. We recommend using different cut-off estimates for anterior and posterior approach procedures. A MCID achievement rate of 60% or less must be interpreted in the perspective that the main goal of surgery for degenerative cervical myelopathy is to prevent worsening of the condition.


Introduction
Degenerative cervical myelopathy (DCM) describes a range of conditions in the cervical spine causing cord compression and neurological dysfunction [1]. There is currently a lack of evidence that nonoperative management prevents neurological deterioration, although physical rehabilitation and close observation can be considered in mild to asymptomatic cases. For moderate to severe cases, individualized surgical treatment is recommended [2][3][4]. Anterior Cervical Discectomy and Fusion (ACDF) and Anterior Cervical Disc Arthroplasty (ACDA) are frequently used in patients with disc herniation, while posterior approach procedures are well-established treatment options for patients with posterior and/or multi-level spinal cord compression [5]. In cases where symptoms are caused by spinal cord compression due to cervical ossification of the posterior longitudinal ligament, no treatment consensus has been reached and various anterior and posterior approach procedures are currently applied [6,7].
The aim of surgery has traditionally been to stop progression of spinal cord dysfunction symptoms. However, recent studies have shown that many patients report improvement post intervention both regarding functionality and disability, as well as quality-of-life outcomes [2,8]. Depending on PROMs used, severity of preoperative disease and length of follow-up, improvement rates range from around 20 to 80% [9,10].
Patient-Reported Outcome Measures (PROMs) are commonly used to measure clinical improvement or worsening in spine literature. In combination with the concept of Minimal Clinically Important Difference (MCID), defined as the smallest change in an outcome score that is clinically beneficial within a patient group [11], optimal cut-off estimates for an individual PROM can be assessed [12,13]. The traditional method is to assess the MCID change score, or the delta value. However, since the interpretation of a change score is dependent on the baseline score, the percentage change score can provide a more representative result at group level [14]. To date, MCID estimates for PROM percentage change scores have not been reported for DCM patients undergoing surgery. Further, there is currently a lack of evidence as to which PROMs are the most accurate in capturing changes in health status among these patients and whether results differ across surgical approach.
The purpose of this study was to estimate MCID for frequently used PROMs 3 and 12 months after surgery for DCM: Neck Disability Index (NDI), Numeric Rating Scale for arm pain (NRS-AP) and neck pain (NRS-NP), Euro-Qol (EQ-5D-3L), and European Myelopathy Score (EMS). A secondary aim was to report achievement rates of MCID through 12 months of follow-up. The MCID estimates are reported for change scores and percentage change scores for the whole sample, as well as for anterior and posterior approach procedural groups.

Data collection
All data were collected through the Norwegian Registry for Spine Surgery (NORspine), which is a government-funded comprehensive clinical registry. Participation in NORspine is not required for a patient to gain access to health care, or for payment/reimbursement to a provider. All Norwegian health care providers offering cervical spine surgery (six public hospitals and three private clinics) report to NORspine. The proportion of operated patients reported to the registry was 75-78% over the study period [15].
Our research protocol was approved by the Norwegian Committee for Medical and Health Research Ethics Midt (2014/344). Informed consent was obtained from all patients before entering the registry.

Design
This is a multicenter observational study with follow-up at 3 and 12 months. Results are reported consistent with the Strengthening The Reporting of Observational Studies in Epidemiology (STROBE) statement [16], and methods are in accordance with the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) recommendations [12].

Eligibility criteria
A cohort of 614 patients undergoing surgery for DCM between January 2011 and August 2016 was found to be eligible (Fig 1). Exclusion criteria were: 1) prior surgery at the index level; and 2) a combined anterior and posterior approach, since these patients are commonly selected on a case-by-case basis [17]. Of the 614 patients, 371 underwent either ACDF (363, 98%) or ACDA (8, 2%), and 243 patients underwent posterior approach procedures, such as laminectomy with or without fusion, hemilaminectomy or laminoplasty.

Measurements
At admission for surgery (baseline), patients complete the NORspine questionnaire, which covers demographics, location and extent of pain, and PROMs. During the hospital stay, the surgeon records data concerning diagnosis, American Society of Anesthesiologists physical status (ASA), surgical treatment and comorbidity on a separate form. Under 'indication for operation' the surgeon can checkmark if he/she considers the patient to have myelopathy based on clinical assessment and radiological findings. To avoid selective reporting, the 3- and 12-month follow-up is conducted by the NORspine central registry unit without involvement from treating hospitals. After surgery, a questionnaire identical to that used at baseline is distributed by mail to every registered patient. One reminder questionnaire is sent to those who do not respond. The following PROMs are collected: 4. European Myelopathy Score (EMS): a patient-based questionnaire derived for assessing spinal cord function. Scoring is between 5 (severe deficit) and 18 (no symptoms) [21].
The Global Perceived Effect scale (GPE) was used in the present study as an external criterion for defining MCID. The GPE measures patient-reported treatment outcome through one single question with seven answer choices: "completely recovered", "much improved", "slightly improved", "unchanged", "slightly worse", "much worse" and "worse than ever" [22]. Patients reporting to be "completely recovered", "much improved" or "slightly improved" (1-3) were classified as having achieved a MCID. Those who considered themselves to be "unchanged" or worse (4-7) were classified as not improved.
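The anchor classification described above can be sketched in a few lines of Python. This is only an illustration: the function name `improved_by_anchor` and the dictionary `GPE_LABELS` are hypothetical, and the numeric coding 1-7 is taken from the ordering of the answer choices given in the text.

```python
# GPE answer choices in the order listed in the text (1 = best, 7 = worst).
GPE_LABELS = {
    1: "completely recovered",
    2: "much improved",
    3: "slightly improved",
    4: "unchanged",
    5: "slightly worse",
    6: "much worse",
    7: "worse than ever",
}


def improved_by_anchor(gpe_score):
    """Return True when the GPE response counts as improved (1-3), else False (4-7)."""
    if gpe_score not in GPE_LABELS:
        raise ValueError(f"GPE score must be an integer from 1 to 7, got {gpe_score!r}")
    return gpe_score <= 3
```

Raising an error for values outside 1-7 is a design choice of this sketch, so that missing or miscoded responses are not silently counted as "not improved".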

Statistics
All statistical analyses were performed with the Statistical Package for the Social Sciences (SPSS, version 26). Continuous variables were reported as means and standard deviations and categorical variables as numbers and percentages. Differences were evaluated by Chi-square test for categorical variables and by t-tests for continuous variables. PROM change scores were obtained by subtracting the follow-up score from the baseline score. The percentage change score was calculated by dividing the change score by the baseline score and multiplying by 100. To be able to calculate the EQ-5D-3L percentage change score, we converted the value range from -0.6 to 1.0 into a relative score from 0 to 100.
The correlations between the GPE scale and the different PROMs were analyzed using the Spearman correlation coefficient. Receiver Operating Characteristic (ROC) curves were used to assess the discriminative ability of each PROM and to define the optimal cut-off with the highest sensitivity and specificity. ROC curves were constructed by plotting the sensitivity against (1 - specificity) for each possible MCID cut-off estimate. Sensitivity refers to the probability that the PROM score correctly classifies a patient replying "slightly improved" or better (1-3) as improved. Correspondingly, specificity refers to the probability that it correctly classifies a patient reporting to be "unchanged" or worse (4-7) as "not improved" after surgery. The area under the ROC curve (AUC) with 95% confidence interval (CI) describes the test's accuracy in correctly classifying a case according to the anchor. The AUC is classified as "acceptable" from 0.7 to 0.8, "excellent" from 0.8 to 0.9 and "outstanding" from 0.9 to 1.0 [23]. To determine MCID cut-off estimates with the highest sensitivity and specificity, the point closest to the upper left corner of the ROC curve was calculated from the coordinates of the curve. Cut-off estimates were assessed for the whole DCM group and for both procedural groups. Lastly, proportions of patients achieving MCID for the whole group and both procedural groups were calculated using the cut-off estimates for each PROM.
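As an illustration of the score definitions above, the following Python sketch computes a change score, a percentage change score and the rescaled EQ-5D-3L value. The analyses in this study were run in SPSS; the function names are hypothetical, the linear form of the EQ-5D-3L rescaling is an assumption (the text states only the source and target ranges), and returning `None` for a baseline of zero is likewise an assumption.

```python
def change_score(baseline, follow_up):
    """Change score as defined in the text: baseline score minus follow-up score."""
    return baseline - follow_up


def percentage_change(baseline, follow_up):
    """Change score divided by the baseline score, multiplied by 100."""
    if baseline == 0:
        # Undefined when the baseline is zero; returning None is an assumption.
        return None
    return change_score(baseline, follow_up) / baseline * 100


def rescale_eq5d(index_value, low=-0.6, high=1.0):
    """Map an EQ-5D-3L index from [low, high] onto 0-100 (linear mapping assumed)."""
    return (index_value - low) / (high - low) * 100
```

For example, a patient with a baseline NDI of 40 and a follow-up NDI of 30 has a change score of 10 and a percentage change score of 25%. With this sign convention, improvement is positive for scales where lower is better (NDI, NRS) and negative for scales where higher is better (EQ-5D-3L, EMS).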
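The cut-off selection described above can be sketched as follows. This is a minimal Python illustration, not the SPSS procedure used in the study. It assumes that higher scores indicate more improvement and that a patient is classified as improved when the score is at or above the cut-off; all function names are hypothetical. The AUC is computed through its rank-based (Mann-Whitney) equivalence, without the confidence interval.

```python
import math


def roc_points(scores, improved):
    """Return (cutoff, sensitivity, specificity) for every observed score used as cut-off.

    scores   -- change or percentage change scores (higher = more improvement, assumed)
    improved -- anchor classification per patient (True for GPE 1-3, False for GPE 4-7)
    """
    n_pos = sum(1 for y in improved if y)
    n_neg = len(improved) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both improved and not-improved patients are required")
    points = []
    for cutoff in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, improved) if y and s >= cutoff)
        true_neg = sum(1 for s, y in zip(scores, improved) if not y and s < cutoff)
        points.append((cutoff, true_pos / n_pos, true_neg / n_neg))
    return points


def closest_to_top_left(points):
    """Pick the point with the smallest distance to (1 - specificity, sensitivity) = (0, 1)."""
    return min(points, key=lambda p: math.hypot(1 - p[1], 1 - p[2]))


def auc(scores, improved):
    """Area under the ROC curve: share of improved/not-improved pairs ranked correctly."""
    pos = [s for s, y in zip(scores, improved) if y]
    neg = [s for s, y in zip(scores, improved) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Calling `closest_to_top_left(roc_points(scores, improved))` returns the cut-off together with its sensitivity and specificity. When two cut-offs are equally close to the corner, this sketch returns the lower one, which is an arbitrary tie-breaking choice.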

Respondents and baseline characteristics
Of 4229 consecutive patients undergoing surgery for degenerative disorders in the cervical spine between January 2011 and August 2016, 614 patients were included. Of these patients, 371 underwent an anterior approach procedure, while 243 underwent a posterior approach procedure. A total of 67.9% and 70.1% of patients responded to the 3- and 12-month follow-up questionnaire, respectively (Fig 1). The non-responding patients were slightly younger (p<0.001), less likely to be retired (p<0.001), and more likely to smoke (p<0.001) (Table 1). There were no statistically significant differences in PROM scores, except for the EQ-5D-3L mean, which was lower (poorer health-related quality-of-life) among non-responders (p<0.008) (Table 1). Baseline characteristics of the whole myelopathy group and the two procedural groups are presented in Table 2. Compared to the anterior approach procedure group, patients in the posterior approach group were more likely to be male, not working, and to be operated at a higher number of levels. They also had significantly higher mean age, higher mean ASA level, more comorbidity, and more severe myelopathy symptoms according to EMS.

Correlation between the PROMs and the external criterion
For all PROMs, mean change scores and mean percentage change scores at 12 months decreased stepwise from patients who reported themselves to be completely recovered, through much better and slightly better, to those reporting no change or some degree of worsening (S1 Table). A similar pattern was found for results at 3 months (available on request). For the whole group, the Spearman correlation coefficients ranged from 0.30 to 0.59. The NDI showed the strongest correlation with the external anchor.

AUC and MCID
We found only minor differences in AUC and MCID cut-off estimates at 3 and 12 months. Therefore, further analysis of the data is presented only for the PROMs at 12-month follow-up. The 3-month scores are presented in S2 Table. The change scores of NDI, NRS-NP and the EQ-5D-3L showed acceptable AUC values (>0.70), whereas AUC values of the NRS-AP change score and EMS percentage change score were slightly lower than acceptable (0.69 and 0.68, respectively) (Table 3). Most of the AUC change score values (0.64-0.74) were similar to or lower than the corresponding AUC percentage change score value (0.68-0.77). Only for EMS was the change score AUC (0.69) higher than the percentage change score AUC (0.68) (Table 3). The percentage change scores of the NDI and NRS-NP had the highest sensitivity and specificity. Similar results were found for AUCs analyzed for the anterior and posterior approach groups. However, there was a tendency towards lower discriminative ability for all PROMs in the posterior approach group, except for EMS, for which the AUCs were higher in this group (Table 4).

Proportions of patients with clinical improvement at 12-month follow-up
In Fig 2, we present the proportions of patients that achieved a clinical improvement according to MCID estimates for percentage change scores at 12-month follow-up. Overall, NDI (59%), NRS-NP (61%) and EQ-5D-3L (59%) showed similar proportions of patients achieving a MCID, whereas NRS-AP (56%) and, in particular, EMS (51%) showed lower proportions of improvement. The rates were slightly higher for the anterior approach group compared to the posterior approach group for both change score and percentage change score (S3 Table).
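For illustration, an achievement rate of this kind can be computed from individual percentage change scores and a given MCID cut-off, as in the Python sketch below. The function name is hypothetical, and counting a score exactly equal to the cut-off as achieved, as well as dropping patients with a missing score from the denominator, are assumptions of this sketch.

```python
def achievement_rate(percentage_changes, cutoff):
    """Percentage of patients whose percentage change score reaches the MCID cut-off.

    Patients with a missing score (None) are excluded from the denominator (assumed).
    """
    valid = [p for p in percentage_changes if p is not None]
    if not valid:
        raise ValueError("no valid percentage change scores")
    achieved = sum(1 for p in valid if p >= cutoff)
    return 100 * achieved / len(valid)
```

With the 16% cut-off reported for the NDI and NRS-NP percentage change scores, invented scores of 30, 20, 16, 10 and 0 would give three of five patients achieving MCID, that is, a rate of 60%.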

Discussion
This study showed that NDI and NRS-NP were the most accurate PROMs to measure MCID among patients undergoing surgery due to DCM. EQ-5D-3L also showed acceptable accuracy for both change and percentage change score. Further, achievement of clinical improvement according to the optimal MCID estimates of the investigated PROMs ranged from 51% to 62%, depending on type of PROM, type of MCID and surgical approach.
Although there are several studies investigating MCID for DCM patients undergoing surgery [24][25][26][27][28], there are no reports of percentage change scores for this patient group. In our study, the majority of the percentage change scores were more accurate than the change scores. As shown in Table 3, percentage change scores for NDI, NRS-AP and NRS-NP showed higher AUC, including higher sensitivity and specificity, compared to the change scores. For EQ-5D-3L, the AUCs were identical, while the EMS AUC was slightly higher for the change score than for the percentage change score (0.69 vs. 0.68). Since the use of change scores for benchmarking has been criticized for not taking into account the baseline score [29][30][31], we recommend using percentage change scores in future research.
The observed MCID estimate of 4.3 points for the NDI 12-month change score is similar to that of a previous study by Kato et al., who found a cut-off estimate of 4.2 in 101 myelopathy patients undergoing cervical laminoplasty [32]. Chien et al. reported a slightly higher cut-off of 6 for NDI, which might be due to a very small patient sample (n = 45) [26]. Similarly, in a study of 30 DCM patients by Auffinger et al., five statistical methods used for calculation of cut-off estimates showed similar or substantially higher findings for both NDI (4.8-13.4) and NRS-NP (0.36-3.11) [25].
The accuracy of EQ-5D-3L has also been assessed in a previous study. Kato et al. reported a MCID estimate of 0.05 for EQ-5D-3L with an AUC of 0.704 [32], which is in accordance with the results in the present study. Since the accuracy of EQ-5D-3L has been found to be acceptable (>0.70) in both these studies, we recommend further use of this PROM for DCM patients.
Several studies have reported MCID estimates for degenerative neck surgery patients. However, in many of the investigated cohorts there has been a mix of radiculopathy and myelopathy patients [33][34][35]. We argue that it is necessary to distinguish between myelopathy and radiculopathy patient cohorts considering the smaller amount of expected improvement among DCM patients. For example, Carreon et al., who analyzed a mixed sample of 505 patients, reported higher MCID estimates than our study for NDI (7.5 vs. 4.3), NRS-AP (2.5 vs. 0.5) and NRS-NP (2.5 vs. 0.5) [34].
As far as we know, no previous study has presented MCID estimates for EMS and NRS-AP in a DCM cohort.

Surgical approach
We found minor differences in accuracy of NDI and NRS-NP across patients undergoing anterior versus posterior surgical procedures. However, there was a tendency to lower discriminative ability for NDI, the two NRS scores and EQ-5D-3L in the posterior approach group (Table 4). In each group, NDI and NRS-NP showed the best discriminative ability.
The MCID estimates for NDI, NRS-AP and NRS-NP were lower in the posterior approach group compared to the anterior approach group. This may indicate that posterior approach patients, who were older and had multilevel degenerative disease, were satisfied with less improvement compared to the younger and healthier patients in the anterior approach group. These results confirm that it is reasonable to analyze these two surgical groups separately. They also suggest that the interpretation of change and percentage change scores of PROMs should be different across anterior and posterior procedures.

Proportion of patients achieving MCID
The proportion of DCM patients that achieved MCID varied between 51% and 61% for the percentage change score. This is in line with a previous study by Stull et al., which reported that 40 to 61% achieved MCID in a sample of 53 DCM patients [9]. Although Stull et al. found little or no difference in achievement rates between radiculopathy and myelopathy patients, others have shown that the proportion of patients achieving a MCID is substantially higher among radiculopathy patients. Applying a cut-off estimate of 15 for NDI, two recent studies found NDI success rates of 80-92% for patients undergoing ACDF or ACDA [36,37].

Limitations and strengths
GPE is a self-reported scale and not an objective anchor. This represents the main limitation of our study as global scale ratings tend to be influenced by the current health status of the patient [22]. However, no alternative gold standard currently exists, and the GPE is still the most frequently used anchor in scientific literature [38][39][40][41][42].
The main inclusion criterion for all patients was that the operating surgeon had made a checkmark for myelopathy (yes/no) in the post-operative questionnaire under "indication for operation". This response represents a subjective judgement based on patient history, clinical features, and radiological findings. Since we have no objective reference for evaluating the accuracy of the surgeons' judgment, misclassifications could exist.
The non-respondent rate of approximately 30% is usually regarded as acceptable for a spine registry [43]. As some of the baseline characteristics of the non-respondents have been associated with poorer outcomes [44], this might still be considered a selection bias, especially since we are estimating the proportion of patients achieving MCID. However, this should be of less importance when assessing actual cut-off estimates across a wide range of outcomes. Two previous studies found no differences in outcome when comparing respondents and non-respondents at follow-up, though both had slightly lower non-respondent rates [45,46].
A major strength of this study is the large sample size of surgical patients in daily clinical practice and the high coverage rate [15], indicating a high external validity of our results.

Conclusion
NDI and NRS-NP were the most accurate PROMs to measure a clinical improvement according to MCID estimates 12 months after surgery for DCM. Also, EQ-5D-3L showed acceptable discriminative ability.
Percentage change scores were more accurate than change scores; hence, we recommend using percentage change cut-off estimates in future studies. The cut-off estimates and MCID achievement rates were also slightly higher for the anterior approach group compared to the posterior approach group, indicating that separate cut-off estimates should be used for each surgical approach.
A MCID achievement rate of 60% or less among DCM patients must be interpreted in the perspective that the main goal of surgery is to prevent worsening of the condition.