Degenerative cervical myelopathy [DCM] is a disabling and increasingly prevalent group of diseases. Heterogeneous reporting of trial outcomes limits effective inter-study comparison and optimisation of treatment, a problem recognised in many fields of healthcare research. The present study aims to assess the heterogeneity of outcome reporting in DCM as the premise for developing a standardised reporting set.
A systematic review of the MEDLINE and EMBASE databases, registered with PROSPERO (CRD42015025497), was conducted in accordance with the PRISMA guidelines. Full-text articles in English reporting outcomes of DCM were eligible if they included >50 patients (prospective studies) or >200 patients (retrospective studies).
In total, 108 studies assessing 23,876 patients, conducted worldwide, were identified. Reported outcome themes included function (97 studies, 90%), complications (56, 52%), quality of life (31, 29%), pain (29, 27%) and imaging (59, 55%). Only 7 (6%) studies considered all five domains in a single publication. All domains showed variability in reporting.
Citation: Davies BM, McHugh M, Elgheriani A, Kolias AG, Tetreault LA, Hutchinson PJA, et al. (2016) Reported Outcome Measures in Degenerative Cervical Myelopathy: A Systematic Review. PLoS ONE 11(8): e0157263. https://doi.org/10.1371/journal.pone.0157263
Editor: Faiz Ahmad, Emory University School of Medicine, UNITED STATES
Received: February 26, 2016; Accepted: May 26, 2016; Published: August 2, 2016
This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.
Data Availability: All relevant data are within the paper and its Supporting Information files.
Funding: This work was supported by the NIHR [National Institute for Health Research] Clinician Scientist Award held by the corresponding author (MRNK). PJAH holds an NIHR Research Professorship.
Competing interests: The authors have declared that no competing interests exist.
Chronic compression of the cervical spinal cord can arise from a range of disease processes, including spondylosis, stenosis, disc herniation, and ligament hypertrophy or ossification. Collectively, these disorders are referred to by the umbrella term degenerative cervical myelopathy [DCM]. Symptoms often start with mild pain, loss of digital dexterity and subtle gait disturbance, but typically progress, with tetraplegia at the extreme. DCM is estimated to be the most common cause of spinal cord dysfunction worldwide. In one Asian population, 4 in 100,000 people underwent surgery for DCM annually. Given its prevalence in the elderly and an ageing population, its incidence is predicted to rise.
Surgical decompression, to relieve cord compression and prevent progression of deficits, is the mainstay of current treatment. However, many controversies persist, including the type and timing of surgery, leading to wide-ranging variation in practice.
The emergence of an optimum treatment strategy has been complicated by the heterogeneous outcome measures used across the globe, which introduce publication bias and hamper inter-study comparison and guideline creation. [6,7] This is a recognised challenge in many clinical fields and has led to the development of minimum data sets: agreed, standardised sets of data elements that should be measured and reported in a specific field of healthcare.
Various methods have been developed to generate such data sets. A first step is often a systematic review to identify the range of outcomes used in the literature. The list of reported outcomes is then refined through a structured consensus process involving all relevant parties: clinicians, academics, allied health professionals, patients and carers.[9,10] The latter stakeholders are key to ensuring that outcomes are patient-centred. Organisations such as the COMET [Core Outcome Measures in Effectiveness Trials] initiative have been set up to facilitate this process.
The objectives of this study were therefore to describe the range of outcome measures, and the manner in which they are reported, in studies of DCM, to inform a subsequent consensus study. The current work complements and extends a recent narrative review of ancillary outcome measures in DCM. 
The systematic review was conducted in accordance with the PRISMA guidelines (S1 Table) and registered with PROSPERO, the prospective register of systematic reviews (CRD42015025497). The MEDLINE [Ovid] and Embase [Ovid] databases were searched from 1st January 1995 to 12th August 2015 using the search strategy [‘Cervical’] AND [‘Myelopathy’] to identify articles considering myelopathy secondary to subacute compression. Animal studies, case reports and letters/editorials were excluded.
Titles and abstracts were screened for relevance; full-text articles were then sought and screened for eligibility according to the following criteria:
- English, full text
- Prospective study with >50 patients or retrospective study with >200 patients
- Assessment of clinical outcomes in response to a treatment strategy (conservative or interventional)
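The full-text eligibility criteria can be expressed as a simple decision rule. The following Python sketch is illustrative only: the `Article` record and its field names are hypothetical and do not reproduce the piloted proforma used in this review.

```python
from dataclasses import dataclass


@dataclass
class Article:
    # Hypothetical screening record; field names are illustrative only.
    language: str
    full_text: bool
    design: str  # "prospective" or "retrospective"
    n_patients: int
    assesses_treatment_outcome: bool


def is_eligible(article: Article) -> bool:
    """Apply the three full-text eligibility criteria listed above."""
    if article.language != "English" or not article.full_text:
        return False
    # Sample-size thresholds are strict and depend on study design.
    if article.design == "prospective":
        large_enough = article.n_patients > 50
    elif article.design == "retrospective":
        large_enough = article.n_patients > 200
    else:
        return False  # other designs are not covered by the criteria
    return large_enough and article.assesses_treatment_outcome
```

Because the thresholds are strict inequalities, a prospective study of exactly 50 patients, or a retrospective study of exactly 200, would be excluded.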
Articles were screened by two authors [BMD, AE], and data were extracted independently by two authors [BMD, MM] using a piloted proforma (S2 Table, S3 Table). Discrepancies were resolved by discussion and mutual agreement.
Descriptive statistics were used to report the frequency and proportion of outcome measures. When considering the reporting method of a single instrument, proportions are presented as the percentage of studies that had used that instrument.
Of the 6894 articles identified, 4261 were excluded based on our criteria. Following title and abstract review, 170 articles were shortlisted (S2 Table) and their full published manuscripts reviewed. Of these, 108 were included in this study (S3 Table), assessing 23,876 patients [Fig 1]. The majority of studies were conducted in North America (33, 31%), Japan (28, 26%) or other parts of Asia (36, 33%). Seventeen (16%) studies were randomised controlled trials and 91 (85%) were conducted prospectively. The publication rate, including the proportion of prospective studies and randomised controlled trials, increased over time [Fig 2]. In total, 105 (97%) studies considered patients undergoing surgery, including one study assessing CT-guided treatment of disc herniation. The remaining studies considered conservative management.
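The choice of denominator matters when reading the Results. As a worked illustration using the figures reported for the JOA (used in 50 of 108 studies, with an overall mean reported in 42 of those 50), the following Python sketch computes both kinds of proportion. The rounding rule (halves rounded up to a whole percentage) is an assumption; the review does not state which rule was applied.

```python
TOTAL_STUDIES = 108


def percent(count: int, denominator: int) -> int:
    """Return count/denominator as a whole-number percentage, rounding halves up."""
    return int(100 * count / denominator + 0.5)


joa_studies = 50        # studies using the JOA
joa_mean_reported = 42  # JOA studies reporting an overall mean

# Domain-level proportion: the denominator is all included studies.
print(percent(joa_studies, TOTAL_STUDIES))      # 46
# Instrument-level proportion: the denominator is studies using that instrument.
print(percent(joa_mean_reported, joa_studies))  # 84
```

Expressed against all 108 studies, the same 42 studies would instead be 39%, which is why the denominator is stated for each instrument.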
Stacked line graph of year of publication of identified studies. Retrospective studies are in blue, prospective studies in red and the number of RCTs indicated by the black line. [Pro: Prospective, Retro: Retrospective, RCT: Randomised Controlled Trial]. The number of publications has increased over time, including those deemed of higher quality (prospective and randomised).
Identified outcomes could be categorised into five common domains: function (reported by 97, 90% of studies), complications (56, 52%), quality of life [QOL] (31, 29%), pain (29, 27%) and imaging (59, 55%) (Table 1). Seventy-six (70%) studies considered more than one domain. Only 7 (6%) studies considered all five domains in a single publication, 6 of which were RCTs.
Function was clinically assessed in 97 (90%) studies, and 35 (32%) studies used more than one tool. The Japanese Orthopaedic Association score [JOA], which combines functional and neurological assessment, was the most prevalent (50, 46%). Common alternatives included the gait- and mobility-centric Nurick score (25, 23%), the modified JOA [mJOA], an adaptation of the JOA for non-Asian populations (20, 19%), and the patient-reported Oswestry Neck Disability Index [NDI] (20, 19%). The newer JOA Cervical Myelopathy Evaluation Questionnaire [JOACMEQ], which in essence combines the mJOA with the SF-36, was used in 3 (3%) studies. The popularity of these grading systems changed over time (Fig 3), with use of the JOA declining in favour of the mJOA and Nurick score.
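Per-study domain coverage of this kind can be tallied directly from extraction records. The sketch below is a minimal Python illustration; the study identifiers and their domain sets are hypothetical placeholders, not extracted data.

```python
from collections import Counter

DOMAINS = frozenset({"function", "complications", "qol", "pain", "imaging"})

# Hypothetical extraction records: study ID -> set of domains reported.
records = {
    "study_a": {"function", "imaging"},
    "study_b": {"function"},
    "study_c": set(DOMAINS),
}


def domain_counts(recs: dict) -> Counter:
    """Number of studies reporting each domain."""
    return Counter(d for domains in recs.values() for d in domains)


def coverage_summary(recs: dict) -> tuple:
    """(studies with more than one domain, studies covering all five domains)."""
    multi = sum(1 for domains in recs.values() if len(domains) > 1)
    complete = sum(1 for domains in recs.values() if DOMAINS <= domains)
    return multi, complete


print(domain_counts(records)["function"])  # 3
print(coverage_summary(records))           # (2, 1)
```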
Each bar represents 100%. The use of the most prevalent outcome measures (JOA, Nurick, NDI, mJOA, JOACMEQ) is reported as a percentage for each time period.
Geographical variation was noted in the choice of functional assessment (Fig 4). For example, Japanese studies predominantly used the JOA, whereas it was never used in North American studies.
Each bar represents 100%. The use of the most prevalent outcome measures (JOA, Nurick, NDI, mJOA, JOACMEQ) is reported as a percentage for each territory.
Other functional outcome assessments included Odom’s criteria (8, 7%), “neurological success”, dichotomised from examination findings as same/better versus worse (6, 6%), return to work (3, 3%) and the patient-reported Myelopathy Disability Index (2, 2%). Grip and release assessment, the 30 m walking test, the Neurosurgical Cervical Spine Score, the Mean Locomotion Score, grip strength, neck range of movement and the Ranawat classification of disease severity each featured only once.
The JOA was most often reported as an overall mean (42, 84%), accompanied by a standard deviation in 38 (76%) studies. Most studies additionally reported the JOA recovery rate (34, 68%), first described by Hirabayashi et al (1981). In two studies, the recovery rate was the only JOA metric reported. Alternatives included mean difference and categorised reporting. Components of the JOA were also reported and sub-analysed in four studies (8%).
The method of reporting the Nurick grade was more variable. Half of the studies reported a mean (13, 52%), often with a standard deviation or 95% confidence interval. Others reported subcategories, such as the proportion of patients with improvement (9, 36%). A Nurick recovery rate was used twice.
The mJOA was generally reported as a mean (15, 75%), with a standard deviation in 8 (40%) studies. In addition, values were categorised (8, 40%), presented as a mean difference (2, 10%) or converted using a recovery rate formula (1, 5%). On three occasions, the postoperative mJOA was reported only as a categorised value.
The NDI was generally reported as a mean (15, 75%), with a standard deviation in 10 (50%) studies. Alternatives were mean difference (6, 30%) or the proportion of patients achieving a predefined improvement (3, 15%).
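The Hirabayashi recovery rate expresses postoperative gain as a percentage of the maximum possible gain: RR = (postoperative score − preoperative score) / (maximum score − preoperative score) × 100. The Python sketch below assumes the 17-point JOA by default; the `max_score` parameter is an illustrative addition so the same formula can be applied to other scales, such as the 18-point mJOA.

```python
def recovery_rate(pre: float, post: float, max_score: float = 17.0) -> float:
    """Hirabayashi-style recovery rate (%) for a scale where higher scores are better.

    Assumes the 17-point JOA by default; pass max_score=18 for the mJOA.
    """
    if pre >= max_score:
        # No room for improvement, so the ratio is undefined.
        raise ValueError("preoperative score is already at the scale maximum")
    return (post - pre) / (max_score - pre) * 100.0


# A patient improving from 9 to 13 on the JOA recovers half of the possible gain.
print(recovery_rate(9, 13))  # 50.0
```

Because the denominator shrinks as the preoperative score approaches the maximum, small absolute changes in mildly affected patients can yield large recovery rates, so reporting the recovery rate alone may obscure absolute change.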
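Categorised mJOA reporting depends on the chosen cut-offs, which may differ between studies. As an illustration only, the Python sketch below uses one commonly cited banding of the 18-point mJOA (mild 15 or above, moderate 12 to 14, severe 11 or below); these thresholds are an assumption and are not taken from the studies reviewed here.

```python
from collections import Counter


def mjoa_severity(score: int) -> str:
    """Map an 18-point mJOA score to a severity band (assumed cut-offs)."""
    if not 0 <= score <= 18:
        raise ValueError("mJOA scores range from 0 to 18")
    if score >= 15:
        return "mild"
    if score >= 12:
        return "moderate"
    return "severe"


# Hypothetical postoperative scores for five patients.
postop = [17, 14, 12, 9, 15]
print(Counter(mjoa_severity(s) for s in postop))
```

Reporting only such categories discards the underlying scores, which limits pooling with studies that report continuous means.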
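For context, the NDI comprises 10 items each scored 0 to 5, giving a total of 0 to 50 that is often doubled to a 0 to 100% disability score. The Python sketch below computes that percentage and the proportion of patients achieving a predefined improvement; the 10-point threshold and the patient data are hypothetical, chosen only for illustration, and handling of missing items is deliberately omitted.

```python
def ndi_percent(items: list) -> int:
    """NDI as a percentage: 10 items scored 0-5, raw total 0-50, doubled to 0-100."""
    if len(items) != 10 or any(not 0 <= s <= 5 for s in items):
        raise ValueError("expected 10 item scores, each between 0 and 5")
    return sum(items) * 2


def proportion_improved(pre: list, post: list, threshold: float) -> float:
    """Share of patients whose NDI fell by at least `threshold` points (lower = better)."""
    if len(pre) != len(post) or not pre:
        raise ValueError("pre and post must be non-empty and paired")
    improved = sum(1 for before, after in zip(pre, post) if before - after >= threshold)
    return improved / len(pre)


# Hypothetical paired NDI percentages; the 10-point threshold is illustrative only.
print(proportion_improved([40, 30, 20, 10], [20, 25, 20, 0], threshold=10))  # 0.5
```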
In total, 56 (52%) studies reported intervention complications, ranging from a statement of overall frequency to a breakdown by specific complication. Reported specific complications included C5 palsy, dysphagia, dysphonia, dural tear, CSF leak, surgical site infection and haematoma. Mortality, including its explicit absence, was infrequently reported (15, 14%). The requirement for revision surgery, either immediate or delayed, was reported in 27 (25%) studies.
Quality of Life
QOL was reported by 31 (29%) studies. The Medical Outcomes Study 36-item Short Form Health Survey [SF-36] was the predominant QOL measure (25, 23%). The more recently developed Japanese Orthopaedic Association Cervical Myelopathy Evaluation Questionnaire [JOACMEQ] was used in only 3 (3%) studies. Other outcome measures included the EQ-5D (2, 2%) and the 12-item Short Form Health Survey [SF-12] (2, 2%).
The method of reporting the SF-36 varied: 9 (36%) studies calculated and reported each of the 8 scale scores [vitality, physical functioning, bodily pain, general health perceptions, physical role functioning, emotional role functioning, social role functioning and mental health], whereas 9 (36%) reported only the mental [MCS] and physical [PCS] component summary scores. Other options included reporting only the PCS (3, 12%), an overall score (3, 12%) or all of the above (1, 4%).
Pain was assessed in 29 (27%) studies. With the exception of one study, it was reported via an assessment method distinct from a QOL tool. Pain was most commonly measured using the 10 cm Visual Analogue Scale [VAS] (21, 19%). Alternatives included a numeric rating scale (1, 1%), a Likert scale (2, 2%) or consultation questions as simple as “Do you have any pain, yes or no?”
In general, pain assessment was region-specific (24, 22%): 17 (16%) studies considered upper limb pain, 22 (20%) neck pain and 9 (8%) axial pain. The term axial pain is ambiguous and was rarely defined. One study considered it to include neck and shoulder pain, whereas another distinguished between neck pain and ‘axial symptoms’.
Reporting of the VAS also varied. The majority (17, 81%) reported a mean and standard deviation before and after treatment, with statistical comparison by a t-test. This was despite the distribution of the data being assessed on only 4 occasions. Alternative reporting included the percentage of patients who attained an improvement of >20 mm (2, 12%).
A total of 59 (55%) studies reported radiological outcomes. Assessments were predominantly made using X-rays (46, 43%); CT (23, 21%) and MRI (19, 18%) were used less commonly. Radiological outcomes largely concerned fusion (29, 49%), range of movement (25, 42%), cervical alignment (25, 42%) and decompression (15, 25%). The exact definitions of fusion and the methods used to assess range of movement were variable. Cervical alignment was largely assessed using Cobb’s method (14, 56%) or Ishihara’s Cervical Curvature Index (7, 28%). Decompression generally referred to adequate cord decompression assessed by MRI (8, 53%). Alternative metrics concerned changes in canal size.
Additional outcome measures included adjacent segment degeneration (3, 5%) and novel MRI cord metrics concerning cord signal intensity, cross-sectional area and ‘drift back’ (4, 4%).
Some studies reported treatment characteristics, including length of operation (23, 21%), blood loss (22, 20%) and length of hospital stay (8, 7%). Cost of care was reported twice (2%).
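The concern about untested distributions can be made concrete. The Python sketch below, using only the standard library, computes a paired t statistic on pre/post VAS differences alongside a sample skewness check and the proportion of patients improving by more than 20 mm. A paired design is assumed, the data are hypothetical, and skewness is only a crude screen for the normality assumption underlying the t-test, not a formal test.

```python
import math
import statistics


def paired_differences(pre, post):
    """Improvement in mm for each patient (VAS falls as pain improves)."""
    return [before - after for before, after in zip(pre, post)]


def paired_t(diffs):
    """Paired t statistic and degrees of freedom for the mean difference."""
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness; large magnitudes suggest non-normal data."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least three observations")
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in values)


def proportion_over(diffs, threshold_mm=20):
    """Share of patients whose improvement exceeds the threshold."""
    return sum(1 for d in diffs if d > threshold_mm) / len(diffs)


# Hypothetical 100 mm VAS scores before and after treatment.
pre = [70, 60, 80, 50]
post = [40, 50, 30, 45]
diffs = paired_differences(pre, post)
t, df = paired_t(diffs)
print(round(t, 2), df, round(sample_skewness(diffs), 2), proportion_over(diffs))
```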
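To illustrate the geometry behind the two alignment measures, the Python sketch below computes a Cobb-type angle between two endplate lines and an Ishihara-style curvature index from 2D landmark coordinates. Landmark selection (for example, the C2 and C7 inferior endplates for the Cobb angle, and the posterior-inferior vertebral body corners of C2 to C7 for the curvature index) is assumed to have been done already. The sign convention (positive when the C3 to C6 corners lie left of the C2-to-C7 line, with the y axis pointing up) is a hypothetical choice; which sign corresponds to lordosis depends on image orientation.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def line_angle(p: Point, q: Point) -> float:
    """Orientation of the line through p and q, in degrees."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))


def cobb_angle(upper: Tuple[Point, Point], lower: Tuple[Point, Point]) -> float:
    """Acute angle (0-90 degrees) between two endplate lines."""
    diff = abs(line_angle(*upper) - line_angle(*lower)) % 180.0
    return min(diff, 180.0 - diff)


def signed_distance(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to line a-b; positive if p lies left of a->b."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return cross / math.dist(a, b)


def curvature_index(c2: Point, middle: List[Point], c7: Point) -> float:
    """Ishihara-style index: summed C3-C6 distances as a percentage of the C2-C7 length."""
    total = sum(signed_distance(p, c2, c7) for p in middle)
    return 100.0 * total / math.dist(c2, c7)
```

Because both measures depend on landmark placement and on the levels chosen, alignment results reported without specifying these choices may be difficult to compare across studies.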
Randomised Controlled Trials
Overall, there were 17 RCTs, and the mean number of domains they reported was significantly greater than that of other studies (3.9 vs. 2.3) (Table 2).
A functional outcome was reported by all RCTs. Four RCTs reported more than one functional assessment, and the chosen assessments were typically the JOA (8, 47%) or NDI (7, 41%). Complications were reported by 15 (88%) RCTs and QOL by 8 (47%). The favoured QOL measure was the SF-36, used by 7 of these. Pain was assessed by 11 (65%) RCTs, typically using the VAS (9, 82%). Radiographic outcomes were reported by 16 (94%) RCTs.
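As an arithmetic check, the domain counts reported for all studies and for the RCT subset reproduce the mean numbers of domains given for RCTs and other studies. The Python sketch below uses only figures stated in this article; it reproduces the two means but not the significance test summarised in Table 2.

```python
# Studies reporting each domain: all 108 included studies, and the 17 RCTs.
all_studies = {"function": 97, "complications": 56, "qol": 31, "pain": 29, "imaging": 59}
rcts = {"function": 17, "complications": 15, "qol": 8, "pain": 11, "imaging": 16}
N_ALL, N_RCT = 108, 17

rct_mean = sum(rcts.values()) / N_RCT
# Non-RCT studies: subtract RCT counts from the overall totals.
other_mean = (sum(all_studies.values()) - sum(rcts.values())) / (N_ALL - N_RCT)

print(f"{rct_mean:.1f} vs. {other_mean:.1f}")  # 3.9 vs. 2.3
```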
This systematic review has identified that DCM is a topic of worldwide interest, with research participation from across the globe. However, there is great variation in the types of outcomes assessed and how they are reported. Key outcome domains were function (most commonly reported using the JOA), QOL (most commonly reported using the SF-36), treatment complications, pain (most commonly reported using the 10 cm VAS) and imaging. Very few studies considered them all. RCTs reported more domains than other types of study and, when considering a domain, were typically more consistent in their choice of reporting.
This heterogeneity is not surprising: a survey of clinicians by Singh et al (2005) found great variability in the assessment and grading of DCM. Similar heterogeneity has been well demonstrated in other fields of healthcare. [17–19] Heterogeneity of outcome reporting is recognised to challenge inter-study comparison and is likely to bias the dissemination of knowledge. To overcome these challenges, the development of standardised reporting sets has been proposed, and the present findings provide a strong basis for developing such a set in DCM. [7,10,17]
The search results demonstrate that cervical myelopathy is the subject of a large body of published research. The search strategy excluded foreign-language articles and was designed to focus on contemporary, large-sample studies. The global representation of included studies suggests that the exclusion of foreign-language articles is unlikely to have been significant. Indeed, the authors propose that assessment of 20 years of published large-sample studies is representative of current practice.
Beyond demonstrating heterogeneity, this study aimed to collate current reporting practice to help inform stakeholders in a future DELPHI process. When interpreting these results, it is therefore important to recognise that only previously used outcome measures are represented. This poses several potential problems. First, more recently developed assessment tools could be under-represented (e.g. JOACMEQ). Second, the measurement properties of these scales have not been assessed here, nor has the current rationale for the measured outcomes. Third, novel areas of assessment are not represented.
In relation to this latter aspect, it is significant that patient-reported outcome measures [PROMs] and preference-based outcomes, topical areas for current trial development and funding, were poorly represented (33, 31%). The importance of patient involvement in research has become more apparent in recent years. Chalmers et al (2014) estimated that 85% of health research funding was wasted and concluded that one of many contributing factors was the misalignment of research objectives with patient needs. Consequently, PROMs are now an important aspect of trial funding applications.
The significance of preference-based outcomes in trials lies largely in enabling economic analysis through the derivation of metrics such as quality-adjusted life years [QALYs]. In practice, the gold-standard preference-elicitation techniques (such as the standard gamble and time trade-off) are replaced by utility instruments, from which utilities are inferred indirectly. These utility instruments are generally quality of life measures; the most commonly used are the EQ-5D and SF-6D. Whilst the SF-36 can act as a utility instrument, this typically requires derivation of the SF-6D. NICE, the UK advisory body for publicly funded healthcare, recommends the EQ-5D for cost-utility assessment when considering treatments for approval.
As stated in the introduction, this systematic review is the starting point for a larger process to define the core outcomes and common data elements in degenerative cervical myelopathy [CODE-DCM]. CODE-DCM aims to identify the key data elements, to advise on how they should be reported and to identify the umbrella term by which this disease process should be referred to. It is registered with the COMET initiative.
The intended result of this process is to allow efficient and effective inter-study comparison, supporting future research and the development of an optimum treatment strategy.
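A QALY weights time spent in a health state by its utility, so QALYs accrued over follow-up can be estimated as the area under the utility-time curve. The Python sketch below applies the trapezoidal rule, a common but not the only approach; the follow-up times and utility values (for example, EQ-5D index scores) are hypothetical.

```python
def qalys(times_years, utilities):
    """Area under a utility-time curve by the trapezoidal rule, in QALYs."""
    if len(times_years) != len(utilities) or len(times_years) < 2:
        raise ValueError("need paired time and utility values with at least two points")
    total = 0.0
    for i in range(1, len(times_years)):
        interval = times_years[i] - times_years[i - 1]
        # Average utility across the interval, multiplied by its length in years.
        total += interval * (utilities[i - 1] + utilities[i]) / 2
    return total


# Hypothetical utilities at baseline, 6 months and 12 months after surgery.
print(qalys([0.0, 0.5, 1.0], [0.6, 0.8, 0.8]))
```

A cost-utility analysis would then relate the difference in costs between two treatments to the difference in QALYs they produce.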
The findings of this systematic review, alongside further planned work, will be used to inform a DELPHI process, made up of key stakeholders representing patients, carers, professionals and industry.
Significant heterogeneity exists in the outcome reporting of studies assessing the management of DCM. The development of a standardised reporting set would support the field in the future. The findings of this study will be used as part of a larger consensus process to define the core outcomes and data elements in degenerative cervical myelopathy [CODE-DCM].
S1 Table. PRISMA Checklist for Systematic Reviews.
The PRISMA Checklist, including page references to the location of components in this article.
S2 Table. Shortlisted Articles.
Spreadsheet providing the initially shortlisted articles.
- Conceived and designed the experiments: BMD MRNK.
- Performed the experiments: BMD AE MM.
- Analyzed the data: BMD.
- Wrote the paper: BMD.
- Review and appraisal of manuscript and interpretation of findings: MRNK PJAH AK MGF LAT.
- 1. Nouri A, Tetreault L, Singh A, Karadimas SK, Fehlings MG. Degenerative Cervical Myelopathy: Epidemiology, Genetics, and Pathogenesis. Spine. 2015;40: E675–93. pmid:25839387
- 2. Singh A, Tetreault L, Casey A, Laing R, Statham P, Fehlings MG. A summary of assessment tools for patients suffering from cervical spondylotic myelopathy: a systematic review on validity, reliability and responsiveness. Eur Spine J. 2015;24 Suppl 2: 209–228. pmid:24005994
- 3. Wu H-L, Ding W-Y, Shen Y, Zhang Y-Z, Guo J-K, Sun Y-P, et al. Prevalence of vertebral endplate modic changes in degenerative lumbar scoliosis and its associated factors analysis. Spine. 2012;37: 1958–1964. pmid:22565387
- 4. Karadimas SK, Gatzounis G, Fehlings MG. Pathobiology of cervical spondylotic myelopathy. Eur Spine J. 2015;24 Suppl 2: 132–138. pmid:24626958
- 5. Shiban E, Meyer B. Treatment considerations of cervical spondylotic myelopathy. Neurology: Clinical Practice. 2014.
- 6. Kirkham JJ, Dwan KM, Altman DG, Gamble C, Dodd S, Smyth R, et al. The impact of outcome reporting bias in randomised controlled trials on a cohort of systematic reviews. BMJ. 2010;340: c365. pmid:20156912
- 7. Dwan K, Gamble C, Williamson PR, Kirkham JJ, Reporting Bias Group. Systematic review of the empirical evidence of study publication bias and outcome reporting bias—an updated review. PLoS ONE. 2013;8: e66844. pmid:23861749
- 8. Boers M, Kirwan JR, Wells G, Beaton D, Gossec L, d'Agostino M-A, et al. Developing core outcome measurement sets for clinical trials: OMERACT filter 2.0. J Clin Epidemiol. 2014;67: 745–753. pmid:24582946
- 9. Sinha IP, Smyth RL, Williamson PR. Using the Delphi technique to determine which outcomes to measure in clinical trials: recommendations for the future based on a systematic review of existing studies. PLoS Med. 2011;8: e1000393. pmid:21283604
- 10. Abma TA. Patient participation in health research: research with and for people with spinal cord injuries. Qual Health Res. 2005;15: 1310–1328. pmid:16263914
- 11. Gargon E, Gurung B, Medley N, Altman DG, Blazeby JM, Clarke M, et al. Choosing important health outcomes for comparative effectiveness research: a systematic review. PLoS ONE. 2014;9: e99111. pmid:24932522
- 12. Gargon E, Williamson PR, Altman DG, Blazeby JM, Clarke M. The COMET initiative database: progress and activities update (2014). Trials. 2015;16: 515. pmid:26558998
- 13. Kalsi-Ryan S, Singh A, Massicotte EM, Arnold PM, Brodke DS, Norvell DC, et al. Ancillary outcome measures for assessment of individuals with cervical spondylotic myelopathy. Spine. 2013;38: S111–22. pmid:23963009
- 14. PROSPERO Database. Available: http://www.crd.york.ac.uk/PROSPERO/display_record.asp?ID=CRD42015025497
- 15. Hirabayashi K, Miyakawa J, Satomi K, Maruyama T, Wakano K. Operative results and postoperative progression of ossification among patients with ossification of cervical posterior longitudinal ligament. Spine. 1981;6: 354–364. pmid:6792717
- 16. Singh A, Gnanalingham KK, Casey AT, Crockard A. Use of quantitative assessment scales in cervical spondylotic myelopathy: survey of clinician's attitudes. Acta Neurochir (Wien). 2005;147: 1235–1238; discussion 1238.
- 17. Saver JL, Warach S, Janis S, Odenkirchen J, Becker K, Benavente O, et al. Standardizing the Structure of Stroke Clinical and Epidemiologic Research Data. Stroke. 2012;43: 967–973. pmid:22308239
- 18. Chari A, Hocking KC, Broughton E, Turner C, Santarius T, Hutchinson PJ, et al. Core Outcomes and Common Data Elements in Chronic Subdural Hematoma: A Systematic Review of the Literature Focusing on Reported Outcomes. J Neurotrauma. 2015.
- 19. Kirkham JJ, Gargon E, Clarke M, Williamson PR. Can a core outcome set improve the quality of systematic reviews?—a survey of the Co-ordinating Editors of Cochrane Review Groups. Trials. 2013;14: 21. pmid:23339751
- 20. Song F, Parekh S, Hooper L, Loke YK, Ryder J, Sutton AJ, et al. Dissemination and publication of research findings: an updated review of related biases. Health Technol Assess. 2010;14: iii, ix–xi, 1–193.
- 21. Fukui M, Chiba K, Kawakami M, Kikuchi S-I, Konno S-I, Miyamoto M, et al. An outcome measure for patients with cervical myelopathy: Japanese Orthopaedic Association Cervical Myelopathy Evaluation Questionnaire (JOACMEQ): Part 1. J Orthop Sci. 2007;12: 227–240. pmid:17530374
- 22. Norvell DC, Dettori JR, Chapman JR. Success in Spine Care: The Proof Is in the Measurements, Part II. Global Spine J. 2015;5: 455–456. pmid:26682094
- 23. Chalmers I, Bracken MB, Djulbegovic B, Garattini S, Grant J, Gülmezoglu AM, et al. How to increase value and reduce waste when research priorities are set. Lancet. 2014;383: 156–165. pmid:24411644
- 24. Hawthorne G, Densley K, Pallant JF, Mortimer D, Segal L. Deriving utility scores from the SF-36 health instrument using Rasch analysis. Qual Life Res. 2008;17: 1183–1193. pmid:18825509
- 25. Brazier J, Usherwood T, Harper R, Thomas K. Deriving a preference-based single index from the UK SF-36 Health Survey. J Clin Epidemiol. 1998;51: 1115–1128. pmid:9817129
- 26. Longworth L, Rowen D. Mapping to obtain EQ-5D utility values for use in NICE health technology assessments. Value Health. 2013;16: 202–210.
- 27. comet-initiative.org. Available: http://www.comet-initiative.org.