The Effectiveness of Cognitive Behavioural Treatment for Non-Specific Low Back Pain: A Systematic Review and Meta-Analysis

Objectives To assess whether cognitive behavioural (CB) approaches improve disability, pain, quality of life and/or work disability for patients with low back pain (LBP) of any duration and of any age. Methods Nine databases were searched for randomised controlled trials (RCTs) from inception to November 2014. Two independent reviewers rated trial quality and extracted trial data. Standardised mean differences (SMD) and 95% confidence intervals were calculated for individual trials. Pooled effect sizes were calculated using a random-effects model for two contrasts: CB versus no treatment (including wait-list and usual care (WL/UC)), and CB versus other guideline-based active treatment (GAT). Results The review included 23 studies with a total of 3359 participants. Of these, the majority studied patients with persistent LBP (>6 weeks; n=20). At long term follow-up, the pooled SMD for the WL/UC comparison was -0.19 (-0.38, 0.01) for disability, and -0.23 (-0.43, -0.04) for pain, in favour of CB. For the GAT comparison, at long term the pooled SMD was -0.83 (-1.46, -0.19) for disability and -0.48 (-0.93, -0.04) for pain, in favour of CB. While trials varied considerably in methodological quality, and in intervention factors such as provider, mode of delivery, dose, duration, and pragmatism, there were several examples of lower intensity, low cost interventions that were effective. Conclusion CB interventions yield long-term improvements in pain, disability and quality of life in comparison to no treatment and other guideline-based active treatments for patients with LBP of any duration and of any age. Systematic Review Registration PROSPERO protocol registration number: CRD42014010536.


Materials and Methods
The primary objective was to assess the effectiveness of CB interventions, in comparison to no treatment and other conservative guideline-based active treatments, on pain, disability and quality of life in adults with non-specific LBP. While we assessed short-term (ST) effects (closest to 6 weeks and not exceeding 12 weeks), our primary end point of interest was long-term (LT) follow-up (closest to 52 weeks and >26 weeks). This review followed a protocol registered on PROSPERO (reference: CRD42014010536).
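The ST and LT definitions above amount to a simple rule for choosing one follow-up per trial. The sketch below assumes each trial reports a list of follow-up times in weeks; the function names, the exclusive lower bounds (>0 weeks for ST, >26 weeks for LT) and the tie-break towards the earlier time point are illustrative assumptions, not rules stated in the protocol.

```python
from typing import Optional, Sequence


def closest_in_window(weeks: Sequence[float], target: float,
                      lower: float, upper: float) -> Optional[float]:
    """Return the follow-up closest to `target` with lower < week <= upper, else None."""
    candidates = [w for w in weeks if lower < w <= upper]
    if not candidates:
        return None
    # Tie-break towards the earlier time point (an assumption, not stated in the protocol).
    return min(candidates, key=lambda w: (abs(w - target), w))


def short_term(weeks: Sequence[float]) -> Optional[float]:
    # ST: closest to 6 weeks and not exceeding 12 weeks.
    return closest_in_window(weeks, target=6, lower=0, upper=12)


def long_term(weeks: Sequence[float]) -> Optional[float]:
    # LT: closest to 52 weeks and strictly later than 26 weeks.
    return closest_in_window(weeks, target=52, lower=26, upper=float("inf"))


# Hypothetical trial with follow-ups at 3, 13, 39 and 78 weeks.
print(short_term([3, 13, 39, 78]), long_term([3, 13, 39, 78]))  # 3 39
```

A trial whose only late follow-up is at exactly 26 weeks would contribute no LT data under this reading of ">26 weeks".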

Data sources and searches
Using search terms from the Cochrane Back Review Group (CBRG, 2013b) (S1 Fig. Search strategy), a sensitive search of nine electronic databases was performed from inception to November 2014: the Cochrane Central Register of Controlled Trials (CENTRAL), MEDLINE (1966 to date), EMBASE (1988 to date), CINAHL (1982 to date), AMED (1985 to date), the Physiotherapy Evidence Database (PEDro), the Cochrane Back Review Group (CBRG) Trials Register, PsycINFO and OpenGrey (www.opengrey.eu). In addition, the reference lists of all included studies and relevant systematic reviews were searched, and personal communications were used, to identify potentially eligible studies.

Selection of studies and data extraction
Inclusion criteria. Studies were included if they were randomised controlled trials, included patients with non-specific low back pain of any duration, contained a cognitive behavioural intervention arm and a comparison arm of wait-list control/usual care (WL/UC) and/or guideline-based active treatment (GAT), and reported at least one of the following outcomes: pain, disability, quality of life or work disability. The European LBP guidelines for acute [23] and chronic [24] non-specific LBP were used to guide the identification of treatments for the GAT comparison (Fig 1). Full descriptions of the inclusion and exclusion criteria, including our intervention definition, are reported in Table 1.

Table 1. Inclusion and exclusion criteria

Participants
• Trials were excluded if they included participants with a pathological cause of LBP, such as infection, neoplasm, metastasis, osteoporosis, rheumatoid arthritis, fractures, spinal canal stenosis or nerve root compromise.
• Participants with neurodegenerative conditions (such as multiple sclerosis), or women experiencing LBP during pregnancy, were also excluded.

Intervention
• RCTs were included if they investigated a CB intervention for non-specific LBP.
• As there is no consensus on a specific definition of CB interventions (Burton et al, 2005; Hansen et al, 2010), the review team developed a working definition** to allow transparency in the selection of studies. CB interventions were included if they met the following working definition: 'The intervention is explicitly or implicitly based on the CB model (where the use of CB in relation to the intervention is explicitly stated OR where the connection between thoughts, feelings and behaviours in relation to the intervention is implicitly described) AND it uses specific techniques to both change cognitions and change behaviours.'
• Psychological interventions that were not explicitly or implicitly based on the CB model were excluded. Interventions using techniques to change either cognitions (such as cognitive restructuring) or behaviours (such as operant conditioning), but not both, were also excluded.
• CB interventions delivered by any health care professional were included; however, interventions delivered by lay personnel were excluded.
• The delivery method was not restricted (e.g. both face-to-face and online delivery were included).
• In cases where treatments were multimodal, for example, including CB as a component of a comprehensive back school, the intervention was deemed eligible only when the main focus of the intervention was based on CB. For example, if an intervention consisted of six treatment sessions covering a wide range of components, and CB constituted only one of those sessions, it was not deemed eligible for inclusion as CB was not the main focus of the treatment.

Comparator(s)
• Two comparison arms were included:
(1) No treatment (WL/UC): a trial arm in which participants received no active treatment during the study period. This included studies with a wait-list (WL) comparison or a comparison defined as usual care (UC) in which no prescribed treatment was provided within the trial.
(2) Guideline-based active treatment (GAT): a trial arm in which participants were allocated to receive a prescribed/supervised active treatment in line with the European LBP guidelines (2009), the details of which were specified in some way.
• Studies comparing different types of CB intervention (e.g. one-to-one versus group interventions) were included only where a non-CB control arm was also used as a comparison. Studies comparing CB interventions with a surgical comparator or other treatments not listed in the European guidelines were excluded.
• Studies comparing a CB intervention with a drug-based comparator were included only if the drug type and dosage were in line with the current European LBP guidelines (2009).

Outcomes
At least one measure of pain, disability, quality of life, function or work disability. If more than one measure was used to assess an outcome, priority was assigned according to the following rules:
• Pain: if more than one outcome measure is reported, the hierarchy will be VAS, then NRS, then single-item measures, then multi-item measures.
• Condition-specific disability: if more than one outcome measure is reported, the Roland-Morris Questionnaire (RMQ) will be used in the analysis if available; otherwise, the ODI, the QBPDS and the PDI will be prioritised in that order.
• Quality of life: the EQ-5D and the SF-36 or SF-12 are commonly used to assess general quality of life. If an included study reports more than one of these scales, they will be prioritised in the order listed. For the SF-36 and SF-12, the summary scales (mental and physical health) will be prioritised. If the two components are reported only separately, the physical health component will be prioritised over the mental health component. If only the eight subscales are reported, the general health domain will be used in the analysis.
• Work disability: days off work is a commonly reported outcome. If an included study reports days off work, it may be included in a meta-analysis if certain criteria are met across studies (e.g. comparable time periods).

Language
No restrictions; translation where possible.

** The working CB definition was determined by mapping and cross-referencing the best available evidence on the definition of a CB intervention, drawing on four sources: two expert discussion papers [25,26], a clinical competency tool for using CB interventions (CTS-R-Pain [27]) and the DOH's clinical competency criteria for delivering CB treatments for anxiety and depression [28].

Screening, data extraction and risk of bias assessment. Screening and data extraction forms were piloted before study selection to ensure consistency among reviewers. All study titles and abstracts retrieved from the literature searches were combined in EndNote X10 and double screened by review authors (BC 100%, AH 50%, HR 50%); full texts were subsequently also double screened. Data were double extracted onto a standardised form, adapted from the Cochrane Back Review Group form, which captured: patient characteristics (age, symptom duration and treatment allocation); intervention information (duration, dose, mode and provider); number and type of comparison groups; outcome information (measurement tool, assessment time point and response rate); and analysis information (numbers analysed, mean and standard deviation). Two reviewers assessed each study for risk of bias against the updated Cochrane CBRG criteria, which classify risk of bias across six domains (selection bias, performance bias, detection bias, attrition bias, reporting bias and other bias) [29], rating each as "low", "high" or "unclear" (handbook.cochrane.org, section 8.5d). With permission and in collaboration with colleagues who authored the most recent Cochrane Review on CBT for LBP [17], we used quality assessments from that review where these were available. All assessments were undertaken using the same tool by trained and experienced individuals. Where the two assessors could not reach agreement, a further review author (EW) was consulted. If either review author was a (co-)author of an included study, they were not involved in the assessment of that trial in this review.
Data cleaning. When available, multiple published sources were retrieved for each study to capture all study information, and study authors were contacted for clarification or further information. Where the standard deviation (SD) could be neither obtained nor calculated from the available data, imputation using the pooled SD from all the other studies in the same meta-analysis was planned [30]. Studies reporting only the median and range of outcomes were not included in the meta-analysis, since this suggests that the data were skewed [30]. For outcomes where data were not reported in a format suitable for meta-analysis, a narrative summary was produced. Cluster RCTs were eligible for inclusion and, where possible, effect measures and standard errors were extracted from an analysis that took clustering into account. If the reported results did not take clustering into account, we adjusted for this where possible using the number of clusters and an estimate of the intracluster correlation coefficient [30]. Where a study contained more than one eligible intervention and/or control group, the eligible groups were combined to create a single effect size for the study, to avoid double-counting and therefore biasing the meta-analyses [30].
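Three of the data-handling steps above reduce to short formulas. The sketch below uses the Cochrane Handbook formula for combining two groups into one, a design effect of 1 + (m - 1) × ICC for cluster trials (m = mean cluster size), and an (n - 1)-weighted pooled SD for imputation. The `Arm` record, the function names and the choice of weighting for the pooled SD are assumptions for illustration; the review does not publish its code.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Arm:
    """Summary statistics for one trial arm (hypothetical record type)."""
    n: int
    mean: float
    sd: float


def combine_arms(a: Arm, b: Arm) -> Arm:
    """Merge two eligible arms of the same study into a single arm."""
    n = a.n + b.n
    mean = (a.n * a.mean + b.n * b.mean) / n
    # Within-arm variation plus the spread between the two arm means.
    var = ((a.n - 1) * a.sd ** 2 + (b.n - 1) * b.sd ** 2
           + a.n * b.n / n * (a.mean - b.mean) ** 2) / (n - 1)
    return Arm(n, mean, math.sqrt(var))


def pooled_sd(arms: Sequence[Arm]) -> float:
    """SD pooled across arms of the other studies, weighted by degrees of freedom."""
    num = sum((arm.n - 1) * arm.sd ** 2 for arm in arms)
    den = sum(arm.n - 1 for arm in arms)
    return math.sqrt(num / den)


def effective_n(n: int, mean_cluster_size: float, icc: float) -> float:
    """Shrink a cluster-randomised arm's sample size by the design effect."""
    design_effect = 1 + (mean_cluster_size - 1) * icc
    return n / design_effect


# Two hypothetical CB arms (e.g. group and individual) merged before pooling.
merged = combine_arms(Arm(10, 4.0, 1.0), Arm(10, 6.0, 1.0))
print(merged)  # Arm(n=20, mean=5.0, sd=1.4142135623730951)
```

Note that the merged SD (about 1.41) exceeds either arm's SD (1.0) because the two arm means differ; averaging the SDs directly would understate the spread.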

Data Synthesis
Meta-analyses. Heterogeneity was assessed from clinical, methodological and statistical perspectives. Statistical heterogeneity was assessed graphically with forest plots and statistically with the χ² test and the I² statistic [31]. I² values were interpreted as follows: 0% to 40% might not be important; 30% to 60% may represent moderate heterogeneity; 50% to 90% may represent substantial heterogeneity; and 75% to 100% may represent considerable heterogeneity [32]. Data were analysed using Stata IC 13.
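For readers who want to see how the χ² (Cochran's Q) test and I² relate, the sketch below computes both from per-study effect sizes and variances and reports every overlapping band the I² value falls in. The function names are hypothetical, and the input values in the usage line are invented for illustration, not taken from the included trials.

```python
from typing import List, Sequence, Tuple

# Overlapping interpretation bands, as quoted in the text.
I2_BANDS = [(0, 40, "might not be important"), (30, 60, "moderate"),
            (50, 90, "substantial"), (75, 100, "considerable")]


def cochran_q_i2(effects: Sequence[float],
                 variances: Sequence[float]) -> Tuple[float, int, float]:
    """Return Cochran's Q (the chi-squared statistic), its df, and I^2 in percent."""
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2 is the share of total variation beyond chance, floored at zero.
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i2


def interpret_i2(i2: float) -> List[str]:
    """Every band the value falls in; the bands deliberately overlap."""
    return [label for lo, hi, label in I2_BANDS if lo <= i2 <= hi]


q, df, i2 = cochran_q_i2([-0.2, -0.8, -0.5], [0.01, 0.01, 0.01])
print(round(q, 2), df, round(i2, 1), interpret_i2(i2))
# 18.0 2 88.9 ['substantial', 'considerable']
```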
Meta-analyses were performed using a random-effects model [33]. The primary summary effect measure was the standardised mean difference (SMD) for all outcomes where data were measured with different instruments. Where applicable, scales were reversed by subtracting the mean score from the maximum score for that scale. A negative SMD indicated a treatment effect in favour of the CB intervention. Effect sizes were interpreted using the thresholds proposed by Cohen [34]: 0.2 represents a small effect, 0.5 a moderate effect and 0.8 a large effect.
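A minimal sketch of how a per-trial SMD and a random-effects pooled estimate could be computed. The text cites a random-effects model [33] and Stata, but names neither the SMD estimator nor the between-study variance estimator; Hedges' g (with a standard large-sample variance approximation) and the DerSimonian and Laird estimator of τ² are assumptions here, and all function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def reverse_score(mean: float, scale_max: float) -> float:
    """Flip a higher-is-better scale so lower scores mean improvement; SD is unchanged."""
    return scale_max - mean


def hedges_g(m_cb: float, sd_cb: float, n_cb: int,
             m_ctl: float, sd_ctl: float, n_ctl: int) -> Tuple[float, float]:
    """SMD (Hedges' g) and its variance; negative values favour CB on lower-is-better scales."""
    df = n_cb + n_ctl - 2
    sd_pooled = math.sqrt(((n_cb - 1) * sd_cb ** 2 + (n_ctl - 1) * sd_ctl ** 2) / df)
    d = (m_cb - m_ctl) / sd_pooled
    j = 1 - 3 / (4 * df - 1)  # small-sample bias correction
    var_d = (n_cb + n_ctl) / (n_cb * n_ctl) + d ** 2 / (2 * (n_cb + n_ctl))
    return j * d, j ** 2 * var_d


def dersimonian_laird(effects: Sequence[float],
                      variances: Sequence[float]) -> Tuple[float, float, float, float]:
    """Random-effects pooled SMD with 95% CI, plus the between-study variance tau^2."""
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, tau2


# Hypothetical trial on a 0-24 disability scale: CB mean 8, control mean 10, SD 5, n = 50 each.
g, var_g = hedges_g(8, 5, 50, 10, 5, 50)
print(round(g, 3))  # -0.397
```

Because τ² is added to every study's variance, random-effects weights are more equal than fixed-effect weights, which is why a single trial with an extreme SMD can move the pooled estimate substantially.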
Contrasts. Our primary contrast was the effect of CB versus GAT at long-term follow-up (closest to 52 weeks and >26 weeks). We also included a short-term follow-up assessment (closest to 6 weeks but not exceeding 12 weeks) and a comparison with wait-list and usual care (WL/UC).
Reporting bias. Funnel plots were produced to assess reporting bias. Funnel plot asymmetry was assessed visually and, when a meta-analysis included at least 10 studies that were not of similar size [36], with Egger's test [35]. If asymmetry was detected, sensitivity analyses were planned to consider the implications of bias for the meta-analysis.
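Egger's test regresses each study's standard normal deviate (effect divided by its standard error) on its precision (one over the standard error); an intercept far from zero suggests funnel-plot asymmetry. The stdlib-only sketch below returns the intercept, its t statistic and the degrees of freedom (k - 2). The two-sided p-value would come from a t distribution, for example `2 * scipy.stats.t.sf(abs(t), df)`, which is left out to avoid the dependency. The function name and the guard are illustrative, and the "studies not of similar size" condition is not checked here.

```python
import math
from typing import Sequence, Tuple


def egger_test(effects: Sequence[float], ses: Sequence[float]) -> Tuple[float, float, int]:
    """Egger's regression test for funnel-plot asymmetry.

    Regresses the standard normal deviate (effect / SE) on precision (1 / SE)
    by ordinary least squares and returns (intercept, t statistic, df).
    """
    k = len(effects)
    if k < 10:
        raise ValueError("the protocol applies Egger's test only with >= 10 studies")
    x = [1.0 / s for s in ses]                    # precision
    y = [e / s for e, s in zip(effects, ses)]     # standard normal deviate
    x_bar, y_bar = sum(x) / k, sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    s2 = sum(r ** 2 for r in residuals) / (k - 2)  # residual variance
    se_intercept = math.sqrt(s2 * (1.0 / k + x_bar ** 2 / sxx))
    return intercept, intercept / se_intercept, k - 2
```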
Sub-group analyses. Based on previous evidence [37][38][39], we used subgroup analyses to explore the treatment effect in studies that included only patients with acute (<6 weeks) or only patients with persistent (>6 weeks) LBP. To explore baseline severity, studies were categorised according to the mean baseline score of all participants on a pain scale and a back-specific disability measure; studies with a mean score of ≥60% of the scale maximum for both pain and disability were classified as high intensity [40] and analysed as a separate subgroup.
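The two subgroup rules can be written as small classifiers. This sketch assumes each trial's symptom-duration inclusion range is known in weeks and that both scales start at zero, so "60% of the scale maximum" is the mean divided by the maximum; the function names are hypothetical and the example scores are invented.

```python
def duration_subgroup(min_weeks: float, max_weeks: float) -> str:
    """Classify a trial by the symptom-duration range its inclusion criteria allow."""
    if max_weeks < 6:
        return "acute"
    if min_weeks > 6:
        return "persistent"
    return "mixed"  # spans both; belongs to neither duration subgroup


def high_intensity(pain_mean: float, pain_max: float,
                   disability_mean: float, disability_max: float,
                   threshold: float = 0.60) -> bool:
    """True when baseline means reach >= 60% of the scale maximum on BOTH pain and disability."""
    return (pain_mean / pain_max >= threshold
            and disability_mean / disability_max >= threshold)


# Hypothetical trial: NRS pain 7.0/10 and RMDQ 15/24 at baseline.
print(high_intensity(7.0, 10, 15, 24))  # True
```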
We explored potential sources of heterogeneity by examining methodological quality, intervention and control features, and variation in assessment time points. A risk of bias summary score was calculated from five items likely to be associated with internal validity (allocation concealment, blinding of patients, blinding of outcome assessor, intention-to-treat analysis and an acceptable drop-out rate). Using the PRECIS tool [41], pragmatism was assessed on three items for both the intervention and the control: (i) training/expertise, (ii) protocol flexibility and (iii) fidelity assessment. Studies were thus classified as (i) low risk of bias (meeting 3 to 5 items) or high risk of bias (meeting 0 to 2 items), and (ii) high pragmatism (score of 3) or low pragmatism (score of 1 to 2). We planned to explore intervention and control intensity (total number of contact hours); however, we noted that as the intensity of the experimental intervention increased, so did the intensity of the control intervention. We therefore did not pursue this analysis, since it would not have been possible to determine the extent to which intervention and control intensity influenced effect size.
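The two classification rules above can be sketched as follows. The item identifiers are illustrative labels for the five internal-validity items, and the pragmatism function assumes each PRECIS item scores one point when judged pragmatic; because the text defines only scores 1 to 3, a score of 0 is deliberately left unclassified rather than guessed.

```python
from typing import Iterable

# The five internal-validity items named in the text (identifiers are illustrative).
ROB_ITEMS = frozenset({
    "allocation_concealment", "patient_blinding", "assessor_blinding",
    "intention_to_treat", "acceptable_dropout",
})


def risk_of_bias_class(items_met: Iterable[str]) -> str:
    """'low' if 3-5 of the five items are met, 'high' if 0-2."""
    score = len(ROB_ITEMS.intersection(items_met))
    return "low" if score >= 3 else "high"


def pragmatism_class(training: bool, flexibility: bool, fidelity: bool) -> str:
    """PRECIS-based score out of 3 (one point per item, an assumption): 3 -> 'high', 1-2 -> 'low'."""
    score = sum([training, flexibility, fidelity])
    if score == 0:
        # The review defines only scores 1-3, so 0 is not classified here.
        raise ValueError("a score of 0 is not covered by the stated classification")
    return "high" if score == 3 else "low"
```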
Sensitivity analyses. We formally investigated influences on effect size at three levels: (i) methodological quality, (ii) concurrent treatments and (iii) variation in assessment time point. First, meta-analyses were repeated including only studies judged to be at low risk of bias (assessed as above). Second, additional sensitivity analyses examined the impact of concurrent treatments (studies that evaluated the CB intervention in combination with the control intervention, such as CB plus exercise vs exercise alone) on the summary effect size. Third, meta-regression was used to assess the impact of variation in assessment time point on the level of observed heterogeneity [42].
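The text does not report which meta-regression estimator was used. As a simplified sketch under that caveat, the function below fits an inverse-variance weighted regression of effect size on assessment time point, with the residual between-study variance τ² supplied by the caller rather than estimated; the function name and example data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def meta_regression(effects: Sequence[float], variances: Sequence[float],
                    covariate: Sequence[float], tau2: float = 0.0) -> Tuple[float, float, float]:
    """Inverse-variance weighted regression of effect size on one covariate.

    Returns (intercept, slope, SE of slope). `tau2` is the residual between-study
    variance, supplied rather than estimated in this simplified sketch.
    """
    w = [1.0 / (v + tau2) for v in variances]
    sw = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, covariate)) / sw
    y_bar = sum(wi * y for wi, y in zip(w, effects)) / sw
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, covariate))
    sxy = sum(wi * (x - x_bar) * (y - y_bar) for wi, x, y in zip(w, covariate, effects))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # With known variances, Var(slope) = 1 / sum(w * (x - x_bar)^2).
    return intercept, slope, math.sqrt(1.0 / sxx)


# Hypothetical: SMD against follow-up time (weeks) across four trials.
weeks = [30, 39, 52, 78]
smds = [-0.5 + 0.01 * t for t in weeks]
b0, b1, se_b1 = meta_regression(smds, [0.04] * 4, weeks)
```

A slope whose confidence interval comfortably includes zero would be consistent with the review's finding that time point did not explain the heterogeneity.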

Results
We identified 1629 unique titles, of which 23 studies met the inclusion criteria (Fig 2). The 23 studies included 3359 participants with non-specific LBP, with only 3 studies [22,43,44] including participants with pain of less than 6 weeks' duration (n = 373). CB interventions were delivered in three modes: group-based (n = 10), individual (n = 9) or combined (n = 4). Intervention duration ranged from 1 to 52 weeks (average 8.4 weeks), and total contact time from 20 minutes to 91 hours (average 19 hours). Treatment providers included psychologists (n = 8), physiotherapists (n = 6), multiple professions (n = 5) and GPs (n = 1), and 3 interventions were self-directed. Comparators included WL/UC (n = 10), GAT (n = 12), and both WL/UC and GAT (n = 1). Pain and disability were the most frequently reported outcomes (ST: 87% and 61%, respectively; LT: 48% and 35%, respectively). However, the choice of outcome measure and the assessment time points varied considerably between studies. A description of each study can be found in Tables 2 and 3.

Sample size and methodological quality
Overall, sample sizes were moderate but variable, ranging from 12 [22] to 701 [9] participants. Quality of reporting was poor and inconsistent, leading to judgements of 'unclear' risk of bias in at least 25% of the six domains (Figs 3 and 4). While there was wide variation across studies on most items, over 80% of studies were rated as high or unclear risk on blinding of study participants and personnel and on blinding of outcome assessment. Table 4 presents an overview of the results, including the standardised mean differences for both contrasts, CB vs (i) WL/UC and (ii) GAT, at short and long term for three outcomes (pain, disability and quality of life). Sensitivity analyses including only studies at low risk of bias are also reported for each contrast. The main meta-analyses for pain, disability and quality of life at short- and long-term follow-up are shown in Figs 5-7 and S2-S4 Figs.

Meta-analyses
CB versus WL/UC. Pooled estimates at ST were small and statistically significant for pain (p<0.01, n = 9) and disability (p = 0.02, n = 8). Sensitivity analysis excluded many studies because of high risk of bias but did not show statistically different results. At long term, pooled estimates remained small and significant for pain (p = 0.02, n = 3) but did not reach significance for disability (p = 0.06, n = 3). No studies at long term were classified as low risk of bias, so sensitivity analysis was not performed. Only two studies reported quality of life data, which showed a small, non-significant effect in favour of CB at short and long term. Two studies reported work disability, but the evidence remained inconclusive: one study had incomplete data, reporting results for the intervention group only [45], and the other found no significant between-group difference in work disability when assessed using a patient-reported binary (yes/no) measure of days lost from work [44].
CB vs GAT. Pooled effect estimates were moderate to large and statistically significant for pain and disability in both the short term (pain: p = 0.02, n = 10; disability: p<0.01, n = 7) and the long term (pain: p = 0.03, n = 10; disability: p = 0.01, n = 7). While effect sizes were moderate for quality of life, they were not statistically significant (LT p = 0.05, n = 5; ST p = 0.10, n = 3). However, we observed considerable heterogeneity in all comparisons (I² > 80%). The magnitude of effects varied widely: while three studies showed large and significant effect sizes in favour of CB, the majority showed small to moderate effect sizes that were non-significant at long term. Furthermore, a single study [46] with a particularly long intervention duration (52 weeks) may have influenced the pooled effect sizes because of its extremely large effects (SMD -5.36 for disability at LT, compared with the second largest, SMD -0.92).
While removing this study from the analyses considerably reduced the magnitude of the pooled SMDs, the estimates remained statistically significant and heterogeneity for pain and disability outcomes remained substantial. The pooled effect sizes should therefore be viewed with caution, and we have presented a narrative synthesis of the studies to provide a more meaningful interpretation of the results.
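The check described above, removing one study and re-pooling, generalises to a leave-one-out analysis. The sketch below re-runs a DerSimonian and Laird random-effects pooling (assumed here; the text does not name the estimator) once per omitted study and reports the pooled SMD and I² each time. The SMDs in the usage lines are invented to mimic one extreme outlier; they are not the review's data.

```python
from typing import List, Sequence, Tuple


def dl_pooled(effects: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """DerSimonian-Laird pooled estimate and I^2 (%) for one set of studies."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, i2


def leave_one_out(effects: Sequence[float],
                  variances: Sequence[float]) -> List[Tuple[int, float, float]]:
    """Re-pool the meta-analysis once per study, omitting that study each time."""
    results = []
    for i in range(len(effects)):
        rest_e = list(effects[:i]) + list(effects[i + 1:])
        rest_v = list(variances[:i]) + list(variances[i + 1:])
        results.append((i, *dl_pooled(rest_e, rest_v)))
    return results


# Hypothetical SMDs: the last study is an extreme outlier.
smds = [-0.3, -0.4, -0.2, -3.0]
for omitted, pooled, i2 in leave_one_out(smds, [0.02] * 4):
    print(f"without study {omitted}: SMD {pooled:.2f}, I^2 {i2:.0f}%")
```

In this invented example, omitting the outlier pulls the pooled SMD towards the remaining studies and drops I² to zero; in the review, heterogeneity stayed substantial after the removal, which is what motivated the narrative synthesis.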
Narrative synthesis. At long-term follow-up, 7 studies assessed disability, 10 assessed pain and 5 assessed quality of life, with a wide range of reported effect sizes. Most studies reported effects in favour of CB, the majority of which were small to moderate and not statistically significant, with a smaller number reporting large and significant effect sizes. A similarly wide range of effect sizes was observed at short term. Given the considerable statistical heterogeneity, we explored common factors that could explain the diversity in effect sizes, including methodological design (risk of bias) and intervention and control characteristics (such as pragmatism) (S1 Table). Further details of the GAT treatments, such as dose and duration, are also reported in S1 Table (Information for GAT comparisons).
Restricting the analyses by methodological quality reduced heterogeneity to a moderate level at short and long term for all outcomes. In the subgroup of studies at low risk of bias (n = 4), effect sizes remained in favour of CB but were smaller and more precise, and were either approaching significance (n = 4) or significant (n = 1). Subgrouping according to the PRECIS tool classification neither reduced heterogeneity nor influenced the pooled effect estimates.
Pre-planned subgroup and sensitivity analyses. Analysis by pain duration was not performed since only 1 study had a duration of <6 weeks. In terms of severity, 3 studies were classified as having high severity on pain and disability at baseline [10,46,47]. There were no significant differences in the effect sizes between these subgroups. Sensitivity analyses including only concurrent treatments (CB + GAT vs GAT, n = 10) had minimal impact on the pooled effect sizes. Results from the meta-regression indicated that the time point of assessment did not explain the high levels of heterogeneity for pain and disability at both short and long term time points.
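The text does not say which test was used to compare the severity subgroups. One standard option, sketched here under that assumption, is a z-test on the difference between two independent subgroup estimates; for two subgroups, z² equals the χ² test for subgroup differences with one degree of freedom. The function name and the numbers in the usage line are hypothetical.

```python
import math
from typing import Tuple


def subgroup_difference(est_a: float, se_a: float,
                        est_b: float, se_b: float) -> Tuple[float, float]:
    """z statistic and two-sided p-value for the difference between two subgroup estimates."""
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail probability
    return z, p


# Hypothetical pooled SMDs: high-intensity subgroup vs the remaining studies.
z, p = subgroup_difference(-0.5, 0.2, -0.3, 0.15)
print(round(z, 2), round(p, 2))  # -0.8 0.42
```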

Summary
To our knowledge, this is the first systematic review to investigate the effects of CB interventions for patients with non-specific LBP of any duration and of any age, aiming to reflect the clinical population. The review included 23 studies with a total of 3359 participants, and pooled effect estimates suggest small to moderate effect sizes in favour of CB interventions on a range of patient-reported outcomes when compared with no treatment or with guideline-based active treatment. This review provides evidence that CB interventions are clinically effective and worthwhile for non-specific LBP, and this appears robust across a range of presentations and sample characteristics. These effects appear to be maintained over time, with patients followed up for an average of 54 weeks for disability and 49 weeks for pain.

Comparison with other studies
Our results comparing CB interventions with wait-list/usual care are consistent with the findings of previous systematic reviews and meta-analyses of CB for LBP [16,17], which reported moderate effects in favour of CB on pain and disability in the short term. Thus, both this review and previous reviews show that, compared with wait-list or usual care, CB appears to have a beneficial effect. Our review is the first to compare CB treatments with other guideline-based active treatments solely in a LBP population. While we acknowledge the heterogeneity within this comparison, visual inspection indicates that for the majority of studies a CB intervention is more effective than other typical physiotherapy-based treatments. When a single study with large effect sizes was removed from our analyses, the pooled estimates were reduced and were more consistent with previous meta-analyses of CB versus other active treatments for patients with non-malignant pain [48]. Thus, the true effect of CB versus other guideline-based active treatments is likely to range from small to moderate.
Regarding clinical significance, most studies maintained a 30% decrease on the RMDQ at long term, which is considered clinically meaningful [49]. It is also worth noting that the included trials varied in their degree of pragmatism. Since we expect effect sizes in more pragmatic trials to be smaller for a given amount of clinical change, these small effect sizes may still reflect clinically important changes [50].
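The 30% threshold is a relative change from baseline. The sketch below applies it to a lower-is-better score such as the RMDQ; the function names and the example scores are hypothetical, and whether the threshold is applied to group means or to individual patients is left to the caller.

```python
def percent_reduction(baseline: float, follow_up: float) -> float:
    """Percentage reduction from baseline on a lower-is-better scale such as the RMDQ."""
    return 100.0 * (baseline - follow_up) / baseline


def clinically_meaningful(baseline: float, follow_up: float,
                          threshold_pct: float = 30.0) -> bool:
    """Apply the 30% reduction threshold cited in the text [49]."""
    return percent_reduction(baseline, follow_up) >= threshold_pct


# Hypothetical: RMDQ 12 at baseline, 8 at 52 weeks.
print(round(percent_reduction(12, 8), 1), clinically_meaningful(12, 8))  # 33.3 True
```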

Strengths and limitations
Our review used a rigorous approach in line with Cochrane guidance, including a sensitive search strategy across multiple databases (including grey literature), and all study processes (screening, data checking and risk of bias assessment) were completed by two authors. In line with PRISMA guidelines, we assessed the level of reporting bias and the influence of methodological quality. By not restricting our inclusion criteria by pain duration or age, we were able to include more participants, making our results more precise and more applicable to the typical clinical population of LBP patients. Additionally, we selected contrasts that would be meaningful for health care professionals and policy makers, excluding studies with active treatment comparators not recommended in the European LBP guidelines. Moreover, since there is no consensus on the definition of CB treatment for LBP in the literature, we used clear and transparent criteria to assess the eligibility of study interventions. Limitations included the exclusion of a small number of studies from the meta-analysis because of poor reporting of study data. The search term used for interventions was 'cognitive-behavioural', so where investigators tested a CB intervention without identifying it as such, we would not have identified these papers. Lastly, considerable heterogeneity was observed on all outcomes when CB was compared with GAT (I² > 80%). Heterogeneity was partially explained by methodological quality; other reasons could include differences in the interventions and a lack of consistency in the reference (control) treatments. Despite exploring various factors, such as intervention and control characteristics, we found no single factor that explained the heterogeneity among studies.

Clinical Implications
Nearly all included studies found clinically meaningful effects in favour of the CB intervention, with CB outperforming the majority of GAT comparisons. GAT treatments encompassed typical physiotherapy management and included a mixture of education, home and clinical based exercise, and some passive modalities (including manual therapy), indicating that for the most part, the management of patients with LBP can be improved by using a CB intervention.
Our results suggest that CB interventions can be successfully delivered by a range of health professionals. However, interventions were on the whole poorly reported, hindering implementation in practice. We therefore recommend that future studies use the TIDieR guidelines to describe interventions [51].

Future Work
The individual effect sizes varied markedly in magnitude and, on closer inspection, the CB interventions in individual studies also varied considerably on key factors, such as intervention content, dose, method of delivery and provider. While further research on these factors may be of interest, there were several examples of lower intensity, low cost interventions that were effective. Future work should therefore focus on integrating these interventions into clinical practice.

Conclusions
In conclusion, our results suggest that CB interventions have a long-term beneficial effect on pain, disability and quality of life in comparison to no treatment and other guideline-based active treatments for patients with LBP of any duration and of any age.

Author Contributions
Conceived and designed the experiments: HR AH BC ZH EW NT ZC SL. Performed the experiments: HR AH BC ZH NT. Analyzed the data: BC. Wrote the paper: HR AH BC ZH EW NT ZC SL.