Predicting who responds to spinal manipulative therapy using a short-time frame methodology: Results from a 238-participant study

Background Spinal manipulative therapy (SMT) is among the nonpharmacologic interventions that have been recommended in clinical guidelines for patients with low back pain; however, some patients appear to benefit substantially more from SMT than others. Several investigations have examined potential factors that may modify patients’ responses prior to SMT application. The objective of this study was to determine if the baseline prediction of SMT responders can be improved through the use of a restricted, non-pragmatic methodology, established variables of responder status, and newly developed physical measures observed to change with SMT. Materials and methods We conducted a secondary analysis of a prior study that provided two applications of standardized SMT over a period of 1 week. After initial exploratory analysis, principal component analysis and optimal scaling analysis were used to reduce multicollinearity among predictors. A multiple logistic regression model was built using a forward Wald procedure to explore those baseline variables that could predict response status at 1-week reassessment. Results Two hundred and thirty-eight participants completed the 1-week reassessment (age 40.0 ± 11.8 years; 59.7% female). Response to treatment was predicted by a model containing the following 8 variables: height, gender, neck or upper back pain, pain frequency in the past 6 months, the STarT Back Tool, patients’ expectations about medication and strengthening exercises, and extension status. Our model had a sensitivity of 72.2% (95% CI, 58.1–83.1), a specificity of 84.2% (95% CI, 78.0–89.0), a positive likelihood ratio of 4.6 (CI, 3.2–6.7), a negative likelihood ratio of 0.3 (CI, 0.2–0.5), and an area under the ROC curve of 0.79. Conclusion It is possible to predict response to treatment before application of SMT in low back pain patients. Our model may benefit both patients and clinicians by reducing the time needed to re-evaluate an initial trial of care.


Introduction
Spinal manipulative therapy (SMT) is among the nonpharmacologic interventions for low back pain (LBP) recommended as a second-line or adjunctive treatment option after exercise or cognitive behavioral therapy [1]. Spinal manipulative therapy is described as a high velocity, low amplitude force applied to the vertebral column, most often by chiropractors [2]. Although SMT is recommended in clinical guidelines, some patients with LBP appear to benefit substantially more from it than others [3]. This observation has prompted several investigations of factors that may modify patients' responses prior to SMT application (Table 1).
Of these investigations, several have concluded that baseline characteristics can indeed be used to predict SMT response. A prospective study from the Nordic back pain subpopulation program examined 50 potential baseline factors in 875 LBP patients who received chiropractic care [24]. Their model correctly classified 99% of non-responders using 5 baseline variables: 1) sex, 2) social benefit, 3) severity of pain, 4) duration of continuous pain at first consultation, and 5) additional neck pain in the past year [24]. These results suggest that non-recovery from LBP in a chiropractic population is strongly related to demographic/self-report variables and weakly related to clinical variables; all five predictors were collected at baseline without physical examination [24]. Interestingly, though, the prediction rate for responders to chiropractic care was very low (6%). Further studies from this research group demonstrated similar results [12,21]. Importantly, this group subsequently performed a validation study that constructed 5 predictive models on the basis of baseline information. None of the 5 models was sensitive (0-19%), whereas all were reported to be highly specific (96-100%). Three factors were recognized as best at predicting non-responders by the fourth visit: no definite overall improvement by the second treatment session, a total duration of LBP in the past year of at least 30 days, and the presence of leg pain [18]. Similarly, a study using a pragmatic osteopathic approach that employed SMT found two statistically significant baseline variables, depression and pain intensity, as predictors of back-related disability at 4 years [22]. Other groups have achieved similar results when symptom duration was considered [14,17].
Notably, a clinical prediction rule was developed to examine the characteristics of patients with LBP that may define a subgroup likely to benefit from SMT [23]. This work identified five predictive variables associated with 50% improvement in the Oswestry Disability Index (ODI) within 1 week: duration of symptoms < 16 days, a Fear-Avoidance Beliefs Questionnaire work subscale score < 19, at least one hip with > 35° of internal rotation range of motion, hypomobility in the lumbar spine, and no symptoms distal to the knee. In this prospective cohort study, patients were considered likely responders to manipulation when four or more of these variables were met. The probability of success with manipulation increased from 45% to 95% when patients met this threshold. These predictive criteria were also investigated in a subsequent validation study [3]. The results showed that LBP patients who received manipulation and met these criteria experienced greater decreases in pain and disability after 1, 4, and 24 weeks compared to those who received manipulation but did not meet the criteria and those who met the criteria but did not receive manipulation.
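The counting logic of this earlier clinical prediction rule can be illustrated with a minimal Python sketch. The five thresholds and the cut-off of four criteria are taken from the description above; the class name, field names, and function names are hypothetical and are not drawn from the cited work.

```python
from dataclasses import dataclass


@dataclass
class BaselineFindings:
    """Hypothetical container for the five findings used by the rule."""
    symptom_duration_days: float
    fabq_work_score: float
    best_hip_internal_rotation_deg: float  # larger of the two hips
    lumbar_hypomobility: bool
    symptoms_distal_to_knee: bool


def criteria_met(p: BaselineFindings) -> int:
    """Count how many of the five predictive criteria are satisfied."""
    checks = [
        p.symptom_duration_days < 16,
        p.fabq_work_score < 19,
        p.best_hip_internal_rotation_deg > 35,
        p.lumbar_hypomobility,
        not p.symptoms_distal_to_knee,
    ]
    return sum(checks)


def likely_responder(p: BaselineFindings, threshold: int = 4) -> bool:
    """A patient is a likely responder when at least `threshold` criteria are met."""
    return criteria_met(p) >= threshold


example = BaselineFindings(10, 12, 40, True, True)
print(criteria_met(example), likely_responder(example))
```

In this illustrative case the patient meets every criterion except the absence of symptoms distal to the knee, so four criteria are met and the rule's threshold is reached.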
In contrast, a number of studies have had difficulty identifying baseline characteristics of patients who respond to SMT. A secondary analysis of the large British randomized trial (UK BEAM) showed that patient baseline characteristics including age, work status, pain and disability, duration of episode, quality of life, and beliefs did not identify who was more likely to respond to manipulation, exercise, or manipulation followed by exercise (combined treatment) [15]. Another retrospective analysis found that a lower baseline Roland Morris score predicted non-response to back school and individual physiotherapy but not to spinal manipulation, which was provided over 4-6 weeks [9]. In another randomized controlled trial [6], researchers tried to build pre- and post-treatment models to predict responders to SMT and future pain intensity in 400 patients with chronic LBP. They reported that the pre-treatment model for identifying SMT responders from baseline characteristics did not perform better than chance.
In addition, the predictive value of psychological factors in persons with LBP seeking help from chiropractors is uncertain. While an early study on the value of psychosocial variables with early identification of patients with poor prognosis showed initial psychosocial information in the form of the patient's cognitive coping strategies is highly predictive of the level of disability reported at 1 year [27], more recent studies have found little or no correlation with outcomes [5,7,11,12,14,17].
Given the above, predicting SMT responder status at baseline may be confounded by several factors including the timeframe over which SMT applications are given, the use of additional interventions other than SMT, inclusion of treatment response variables and the choice of baseline characteristics. While many of these prior attempts at predicting SMT responder status are from pragmatic trials, application of SMT over longer time frames that reflect clinical practice may result in confounding with the natural history of the condition. Further, use of additional interventions found in clinical practice complicates interpretation and comparison between studies. Similarly, inclusion of treatment response variables voids the ability to make a baseline prediction. Finally, as our understanding of the predictive value of baseline characteristics grows, choices of which characteristics are included or excluded in the final model can cause concern. With these issues in mind, we conducted a secondary analysis of a prior study that provided two applications of standardized SMT over a period of 1 week. The design of this prior study provides a unique opportunity to mitigate many of the potential confounders described above. Specifically, the shortened time frame of this design increases the likelihood of observing responses arising solely from SMT while decreasing the possibility of including responses associated with longer term mechanisms (e.g. natural history, contextual effects) or additional intervention. We further benefit from this design as it employs a previously validated criterion to define SMT responders: improvement in self-reported ODI occurring over 2 treatment sessions [28]. Importantly, this criterion has been tied to improvements in physical measurements in responders including biomechanical, neurological and biological variables [29][30][31] that were also collected in this study and available for use in baseline predictions.
The study design also includes other new variables that have not been used previously but are increasingly thought to influence outcome (e.g. lumbar spine stiffness measures [31][32][33], lumbar multifidus (LM) muscle contraction [30,31,34]).
Therefore, the objective of this study is to determine if the baseline prediction of SMT responders can be improved through the use of a restricted, non-pragmatic methodology, established variables of responder status, and newly developed physical measures observed to change with SMT.

Primary protocol
In this current study, we performed a secondary analysis of data from a randomized controlled clinical trial. The original protocol for the primary study has been published previously [35]. In brief, the primary objective of the original study was to develop an optimized, multicomponent, SMT protocol using a phased, factorial design with three factors (additional SMT, multifidus muscle activation exercises, and spine mobilizing exercises). Sample size calculation was based on previous work in similar patient populations [31]. An initial sample of 280 participants was identified to provide at least 80% power to detect the minimum important differences for the patient-centered outcomes with a conservative 2-sided α = 0.025 to account for co-primary outcomes. A more detailed explanation of sample size assumptions is provided in the protocol publication [35].
Participants for the original study were individuals between 18-60 years of age with a primary complaint of LBP with or without symptoms into one or both legs, and an Oswestry disability score of at least 20%. Potential participants were excluded if they were currently receiving mind-body or exercise treatment for LBP from a healthcare provider, had "red flags" for a serious spinal condition (e.g., spinal tumor, fracture, infectious disorder, osteoporosis, or other bone demineralizing condition, etc.), showed signs consistent with nerve root compression (diminished myotomal strength, muscle stretch reflexes or sensation, positive straight leg raise), were currently pregnant, or had prior surgery to the lumbosacral spine.
After initial screening, those who provided informed consent were enrolled in the study. Each participant completed forms related to personal demographics, clinical history, and patient-reported outcomes. One of the study clinicians then performed a baseline assessment to collect various physical measurements. All participants then received two separate sessions of SMT occurring one day to one week apart. Manipulations were provided by either licensed chiropractors or physical therapists associated with the study. Following SMT, a re-assessment was conducted which collected the same variables as at baseline. Participants were categorized as SMT responders if their ODI score had improved by at least 30% at the 1-week reassessment.
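The responder definition can be expressed as a short calculation. The following Python sketch assumes that improvement is measured as the relative reduction in ODI from baseline and that the 30% threshold is inclusive; both are our reading of the definition rather than details stated in the protocol, and the function names are hypothetical.

```python
def odi_relative_improvement(baseline_odi: float, followup_odi: float) -> float:
    """Relative reduction in ODI from baseline (0.30 means a 30% improvement)."""
    if baseline_odi <= 0:
        raise ValueError("baseline ODI must be positive")
    return (baseline_odi - followup_odi) / baseline_odi


def is_responder(baseline_odi: float, followup_odi: float,
                 threshold: float = 0.30) -> bool:
    """Responder status; an inclusive threshold is an assumption."""
    return odi_relative_improvement(baseline_odi, followup_odi) >= threshold


# Illustrative values only, not study data.
print(is_responder(40, 26))  # 35% improvement
print(is_responder(40, 30))  # 25% improvement
```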
The primary study received ethical approval from the University of Alberta (Pro00067152) and University of Utah (IRB_00092127) Institutional Review Boards. All patient data were fully anonymized. Permission to use anonymized data for the present study was obtained from the responsible authority, Julie M Fritz.

Demographic and history measures
Basic demographic information including age, gender, race, ethnicity, weight, height, marital status, employment status, highest education level, and clinical history (e.g. duration of symptoms, comorbid health conditions, prior history of LBP) were collected.

Patient reported outcome measures
Baseline assessment also included the ODI and Numeric Pain Rating Scale (NPRS) which were used as participant self-report measures of function and pain respectively [36,37]. The Fear-Avoidance Beliefs Questionnaire (FABQ) was also collected to measure patient beliefs about how physical activity and work may affect their LBP and perceived risk for re-injury [38]. In addition, short forms from the University of Washington concerns about pain (UWCAP) and pain-related self-efficacy (UWPRSE) item banks were collected to measure the extent to which people catastrophize in response to pain and their degree of confidence in the ability to function with pain respectively. We also assessed the participant's risk of persistent disabling pain as low, medium, or high risk using the STarT Back Tool (SBT) [39]. Patients were asked about their expectations of LBP outcomes specifically related to medications, surgery, rest, X-ray, MRI, modalities, traction, manipulation, massage, strengthening, aerobic, and range of motion exercises.

Physical examination measures
Physical examination measures included assessment of spinal range of motion (flexion, extension, left and right side-bending) [40] and hip range of motion (left and right internal rotation), lumbar segmental testing for mobility with manually applied posterior-anterior force [41], pain on palpation, straight leg raise (SLR) [42], aberrant movements during lumbar range of motion [43], a multifidus lift test at two levels (L4-L5 and L5-S1), and a prone instability test [42,43].

Instrumented measures
Both LM muscle activation and lumbar spine stiffness were evaluated at baseline. Multifidus activation was measured with brightness-mode ultrasound images using a Sonosite Micro-Maxx (Sonosite Inc. Bothell, WA, USA) and a 60-mm, 2-5 MHz curvilinear array transducer based on a previously validated protocol [44]. Participants were positioned prone with their head neutral and a pillow under their abdomen to flatten the lordosis. Images were obtained at two vertebral levels (L4-L5 and L5-S1) in the parasagittal plane during rest (static) and submaximal contraction (dynamic) in response to the participant lifting a small weight with the contralateral hand. The weight was selected according to the participant's mass (<150 lb: 1.5 lb; 150-200 lb: 2 lb; and >200 lb: 3 lb). Three images were acquired in each state (relaxed and contracted) for each side and at two levels (L5/S1, L4/5), one side at a time. Images were stored and analyzed offline using ImageJ V1.38t software (National Institutes of Health, Bethesda, MD). Offline measures of LM thickness were obtained by determining the distance between the posterior-most aspect of the facet joint inferiorly and the plane between the multifidus and thoracolumbar fascia superiorly for both the resting and contracted states. Multifidus muscle activation was calculated as $(T_{\text{contracted}} - T_{\text{relaxed}}) / T_{\text{relaxed}}$, where $T$ denotes LM thickness [44]. The average of three measures was used for the analysis, for a total of 8 variables.
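As an illustration of the activation calculation and the weight selection rule, a minimal Python sketch follows. It assumes that the three thickness measurements in each state are averaged before the ratio is taken; the text does not specify whether averaging precedes or follows the ratio, so this ordering is an assumption. Function names and the example thickness values are hypothetical.

```python
from statistics import mean


def handheld_weight_lb(body_mass_lb: float) -> float:
    """Weight lifted with the contralateral hand, chosen from body mass."""
    if body_mass_lb < 150:
        return 1.5
    if body_mass_lb <= 200:
        return 2.0
    return 3.0


def lm_activation(thickness_relaxed: float, thickness_contracted: float) -> float:
    """Proportional thickness change: (contracted - relaxed) / relaxed."""
    return (thickness_contracted - thickness_relaxed) / thickness_relaxed


def mean_lm_activation(relaxed: list[float], contracted: list[float]) -> float:
    """Activation from the mean of repeated images (averaging order is assumed)."""
    return lm_activation(mean(relaxed), mean(contracted))


# Illustrative thickness values in cm, not study data.
print(handheld_weight_lb(170))
print(mean_lm_activation([2.0, 2.0, 2.0], [2.5, 2.5, 2.5]))
```

Applying `mean_lm_activation` to each side (left, right) at each level (L4-L5, L5-S1) would give four activation values per participant.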
Lumbar spinal stiffness was assessed with the VerteTrack™ (VibeDx Corporation, Canada), which uses a rolling wheel system to apply vertical loads over the spine of a prone participant. The VerteTrack houses multiple sensors to provide continuous, real-time quantification of spinal deformation in response to a defined load. The resulting force-displacement curves were used to calculate stiffness at each lumbar segment in N/mm. Terminal stiffness was calculated as the ratio of the maximum applied force to the resultant displacement at each lumbar level [31]. Global stiffness was determined from the slope of the force-displacement curve between 5 N and 60 N, representing the stiffness of underlying tissues throughout each trial [31]. One measure per lumbar segment of global stiffness, terminal stiffness, last load, and displacement was retained for analysis, for a total of 20 variables. The within- and between-session reliability and accuracy of spinal stiffness measures taken with this device have been evaluated previously [45,46].
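The two stiffness definitions can be sketched in Python as follows. The sketch assumes that the force-displacement curve is available as paired samples and that the slope between 5 N and 60 N is estimated by an ordinary least-squares fit; the fitting method is an assumption, since only "the slope" is specified. Function names and the sample curve are hypothetical.

```python
def terminal_stiffness(forces_n: list[float], displacements_mm: list[float]) -> float:
    """Maximum applied force divided by the displacement at that force (N/mm)."""
    i = max(range(len(forces_n)), key=forces_n.__getitem__)
    return forces_n[i] / displacements_mm[i]


def global_stiffness(forces_n: list[float], displacements_mm: list[float],
                     f_min: float = 5.0, f_max: float = 60.0) -> float:
    """Least-squares slope of force on displacement within [f_min, f_max] (N/mm)."""
    pts = [(d, f) for f, d in zip(forces_n, displacements_mm) if f_min <= f <= f_max]
    if len(pts) < 2:
        raise ValueError("need at least two samples in the force window")
    mean_d = sum(d for d, _ in pts) / len(pts)
    mean_f = sum(f for _, f in pts) / len(pts)
    sxx = sum((d - mean_d) ** 2 for d, _ in pts)
    if sxx == 0:
        raise ValueError("displacement does not vary in the force window")
    sxy = sum((d - mean_d) * (f - mean_f) for d, f in pts)
    return sxy / sxx


# Illustrative, perfectly linear curve (4 N/mm); not study data.
forces = [0.0, 5.0, 20.0, 40.0, 60.0]
displacements = [0.0, 1.25, 5.0, 10.0, 15.0]
print(terminal_stiffness(forces, displacements), global_stiffness(forces, displacements))
```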

Spinal manipulative therapy
All SMT sessions began with a brief assessment by the clinician to identify possible SMT contraindications. The preferred SMT technique has been described previously [3]. This procedure was performed with the participant supine. The clinician stood opposite the side to be manipulated and side-bent the participant. The side to be manipulated was the side identified as more painful on the basis of the participant's report. If the participant could not identify a more painful side, the clinician selected a side. The participant crossed their arms in front of the chest while the clinician rotated them and delivered a high-velocity, low-amplitude (HVLA) thrust to the anterior superior iliac spine in a posterior/inferior direction. If this technique was not possible due to participant preference or comfort, a side-posture HVLA technique was performed: the participant lay on their uninvolved side with the superior leg bent to 90°, and the clinician placed their pisiform on the participant's posterior superior iliac spine and delivered an HVLA thrust. A previous study found no difference in outcome between the supine SMT procedure and a side-posture HVLA technique, and both techniques were well tolerated [47].
Spinal manipulative therapy was considered complete if a cavitation (i.e. a "pop") occurred following SMT application. If cavitation was not achieved, the participant was repositioned and SMT performed again. If no cavitation occurred on this second attempt, the clinician performed SMT on the opposite side. A maximum of 2 attempts per side was permitted. If no cavitation was noted after the fourth attempt, SMT was complete. The number of SMT attempts and the technique used were recorded by the clinician.

Statistical analysis
All measures collected at baseline were used at the beginning of this analysis. Continuous data were summarized by means, medians and standard deviations. Categorical data were summarized by frequencies and percentages.
We have summarized the statistical methods used for data analysis in Fig 1. An initial exploratory analysis demonstrated that the variables collected at baseline were associated with the relative changes in ODI. However, bivariate correlation analysis showed high correlations among most of the ultrasound values, stiffness measures, and lumbar mobility testing results (|R| ≥ 0.7). A principal component analysis using varimax rotation with Kaiser normalization was therefore conducted to address this multicollinearity and to reduce the number of variables entered into the subsequent multiple regression model [15]. An optimal scaling analysis was also performed to address the problem of too few observations for some of the categorical variables. Optimal scaling is a general approach to treating multivariate data through the optimal transformation of qualitative scales to quantitative values. Using this approach, both nominal and ordinal variables can be optimally transformed into numerical values to reduce multicollinearity among predictors and maximize the homogeneity or internal consistency among variables. As a result, nonlinear relationships between transformed variables can be modeled [48,49]. Finally, a multiple logistic regression model was built using a forward Wald procedure to explore those baseline variables that could predict overall outcome (response status) at 1-week reassessment [6]. Analyses were conducted using IBM SPSS version 26.0 (Armonk, New York, USA). An alpha value of 0.05 was used for all analyses. In addition, sensitivity/specificity, positive/negative predictive values, positive/negative likelihood ratios [50], and the area under the receiver operating characteristic (ROC) curve were estimated for the final model.
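The bivariate screening step that motivated the principal component analysis can be illustrated with a minimal Python sketch. It is not the SPSS procedure used in this study: it computes Pearson correlations for every pair of variables and flags pairs whose absolute correlation reaches a cut-off of 0.7, which we assume to be inclusive. Variable names and values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def highly_correlated_pairs(columns: dict[str, list[float]],
                            threshold: float = 0.7) -> list[tuple[str, str, float]]:
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(columns.items(), 2):
        r = pearson_r(col_a, col_b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical measurements, not study data.
data = {
    "stiffness_L3": [1.0, 2.0, 3.0, 4.0, 5.0],
    "stiffness_L4": [2.0, 4.0, 6.0, 8.0, 10.5],
    "hip_rotation": [3.0, 1.0, 4.0, 1.0, 5.0],
}
for name_a, name_b, r in highly_correlated_pairs(data):
    print(name_a, name_b, round(r, 2))
```

Groups of variables flagged in this way are candidates for replacement by a smaller number of component scores before regression modeling.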

Results
Two hundred and thirty-eight participants completed the 1-week reassessment (age 40.0 ± 11.8 years; 59.7% female). Tables 2-6 present the history and demographic measures, patient-reported outcome measures, patients' expectations, physical examination measures, and instrumented measures at baseline, respectively.
The Numeric Pain Rating Scale reports the average of the worst, best, and current pain scores over the last 24 hours on a self-reported 0-10 scale, where 0 is no pain and 10 is the worst imaginable pain [37]. Function was evaluated using the Oswestry Disability Index on a 0-100 scale, with lower numbers indicating better function [36]. Fear-avoidance beliefs about physical activity and work were assessed using the Fear-Avoidance Beliefs Questionnaire (FABQ) [38]. The short form of the University of Washington concerns about pain (UWCAP) is an 8-item measure of pain catastrophizing, with each item rated on a 5-point scale: 1 (Never) to 5 (Always). The higher the score, the more catastrophizing thoughts are present. The short form of the University of Washington pain-related self-efficacy (UWPRSE) was used to assess one's confidence in performing particular activities while in pain. It is a 9-item scale, with each item rated on a 5-point scale: 0 (Not at all) to 5 (Very much). Higher scores represent higher confidence to function with pain. The short forms of the UWCAP and the UWPRSE were scored by converting the total raw score into an item response theory-based T-score with a mean of 50 and a standard deviation of 10. The mean score of 50 represents the mean of a large sample of people with chronic pain. The STarT Back Tool (SBT) is a 9-item questionnaire including physical and psychosocial statements that are used to categorize patients into low, medium, or high-risk groups for persistent LBP-related disability [39].
Principal component analysis identified a three-factor solution for the stiffness values, a one-factor solution for the ultrasound values, and a four-factor solution for the mobility testing results. Together these factors explained 89.1%, 90.1%, and 78.3% of the variance in the stiffness, ultrasound, and lumbar mobility testing data, respectively. Lumbar spine stiffness values, LM activation values, and mobility testing results were then converted into principal component scores to construct our model.
Logistic regression analysis resulted in a model with eight baseline variables (Table 7). The 8 variables in this model represent a number of different domains including participant demographics (height and gender), history (neck or upper back pain and pain frequency in the past 6 months), participant self-reported measures (SBT, patients' expectations about medication and strengthening exercises) and physical examination (extension status). Two variables were removed: one (depression) because it was not statistically significant (P > 0.05) and another (current pain duration) because it had a regression coefficient of 0 and an odds ratio (OR) equal to 1, showing no difference between responders and non-responders in the duration of their current pain.
As seen in Table 7, the effect of gender was significant but negative, indicating that females were less likely to respond to SMT than males (OR = 0.42). Higher expectations about strengthening (OR = 2.47) were associated with an increased likelihood of responding to SMT, whereas higher expectations about medication (OR = 0.49) were associated with a reduced likelihood of responding. Participants with peripheralized pain during extension and those with more frequent pain in the past six months were 1.48 and 2.25 times more likely to be SMT responders, respectively. The β coefficients for height, neck or upper back pain, and SBT score were also significant and negative, indicating that increases in these variables are associated with decreased odds of responding to treatment. Table 8 presents the degree to which predicted probabilities agree with actual outcomes in a classification table. The overall correct prediction rate of 81.5% is an improvement over the chance level of 50%. Our model had a sensitivity of 72.2% (95% CI, 58.1-83.1), a specificity of 84.2% (95% CI, 78.0-89.0), a positive likelihood ratio of 4.6 (CI, 3.2-6.7), a negative likelihood ratio of 0.3 (CI, 0.2-0.5), and an area under the ROC curve of 0.79.
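The relationship between the reported sensitivity, specificity, and likelihood ratios, and between a logistic regression coefficient and its odds ratio, can be checked with a short Python sketch. It uses the standard definitions (positive likelihood ratio = sensitivity / (1 − specificity); negative likelihood ratio = (1 − sensitivity) / specificity; odds ratio = exp(β)) together with the point estimates reported above; the function names are hypothetical.

```python
from math import exp


def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (positive LR, negative LR) from sensitivity and specificity."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


def odds_ratio(beta: float) -> float:
    """Odds ratio implied by a logistic regression coefficient."""
    return exp(beta)


lr_pos, lr_neg = likelihood_ratios(0.722, 0.842)
print(round(lr_pos, 1), round(lr_neg, 1))  # 4.6 0.3
print(odds_ratio(0.0))                     # 1.0, i.e. no association
```

The computed values agree with the likelihood ratios reported for the model, and a coefficient of 0 corresponds to an odds ratio of 1, as noted for current pain duration.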

Discussion
Identification of SMT responders and non-responders prior to application of SMT has received increasing attention in the conservative treatment of patients with LBP; however, the evidence for the effectiveness of this approach is mixed. To determine if the baseline prediction of SMT responders can be improved through the use of a restricted, non-pragmatic methodology, established definitions of responder status, and newly developed physical measures observed to change with SMT, we investigated the predictive values of 20 history and demographic variables, 6 patient-reported outcome measures, 22 physical measures, and 28 instrumented measures as unique domains and in combination. Our results suggest that it is possible to predict SMT response in a specific group of patients with 91.2% accuracy in non-responders and 57.4% in responders after only two applications of standardized SMT over a one-week period. To our knowledge, this is the first investigation to achieve prediction results of this magnitude for the responder group, although the model has yet to be validated. Prior studies that have generated successful predictions of SMT response have tended to arise from pragmatic designs. In contrast, prior studies that have chosen to provide SMT alone or with minimal additional interventions have not achieved successful predictions. While it is possible that the prior success of pragmatic studies in this regard is because a pragmatic design more closely mimics clinical practice, our results do not support that idea. Specifically, our methodology applied fewer SMTs over a shorter time frame using a pre-defined technique for SMT application. Therefore, one explanation for our non-congruent results is that our hypothesis is tenable; that is, predicting SMT response is best assessed in a short time frame and in isolation from other interventions.
In addition, the magnitude of our SMT responder prediction was substantially greater than in prior studies, which have not exceeded 19% to date. In the clinical prediction rule developed by Flynn et al, SMT response was predicted with 100% accuracy in non-responders and 19% in responders. Although this previous model consisted of fewer variables (5) and is presumably easier to manage, its prediction performance for responders was lower. While at first glance it may appear unwieldy to use an 8-variable model including a 9-item questionnaire in a future clinical situation, 7 of the 8 variables can be collected in advance of the examination. The remaining variable (extension status) can be collected by clinicians with relative ease and expediency. In addition, one quarter of the variables in the model presented here (2 of 8) concern patients' expectations of treatment. Although previous studies showed that illness beliefs and beliefs about rehabilitation make a significant contribution to the prediction of different rehabilitation outcome indicators, the reason for this association remains unexplained [51][52][53][54][55][56]. However, it would be worthwhile to address the power of treatment expectations in comparison to other psychosocial factors in this group of patients. Importantly, none of the clinical measures included in our final model were newly described physical measures requiring special equipment and training (ultrasonic evaluation of muscle contraction, evaluation of spinal stiffness with a mechanical device).
The strengths of our study include a multi-site design, which would tend to mitigate the possibility of our results arising from a specific population. Although most previous studies used other measures as response criteria, we defined response as a 30% improvement on the ODI, which is an accepted threshold of change based on minimal clinically important difference scores for this questionnaire [57,58]. Given this, and considering the high sensitivity and specificity of our prediction results, we propose that a future validation study of this model is warranted. If found to be valid, this 8-variable model could provide clinicians with the opportunity to construct a more focused intervention plan after only 1 week of care. This would benefit both patients and clinicians by shortening the more traditional re-evaluation period of an initial trial of care, which may extend over multiple weeks with many more treatment sessions.
As with all experiments, our study had limitations. First, our sample was heterogeneous in terms of pain duration. Although most participants in this study could be classified as having chronic LBP, our inclusion criteria were not limited to chronicity. Since the original primary study was designed to assess therapeutic effects in a wide range of participants, it did not restrict enrollment to a specific duration of low back pain. Therefore, the proposed model cannot be easily extrapolated to populations that are highly homogeneous in pain duration. Second, we did not have a control group; thus, these results cannot be regarded as a clinical prediction rule. However, they can inform the professions of what might be important in patients' clinical assessment.

Conclusion
The 8-variable model presented here was able to predict SMT response with a sensitivity of 72.2%, a specificity of 84.2%, and an overall classification accuracy of 81.5%. Given these results, and that 7 of the model variables can be collected prior to clinician engagement, future validation of the model is warranted. Should the model be valid, it may benefit both patients and clinicians by reducing the time needed to re-evaluate an initial trial of care.
Supporting information S1 File. Data necessary to replicate the analyses. (XLSX)