Clinical diagnostic model for sciatica developed in primary care patients with low back-related leg pain

Background Identification of sciatica may assist timely management but can be challenging in clinical practice. Diagnostic models to identify sciatica have mainly been developed in secondary care settings with conflicting reference standard selection. This study explores the challenges of reference standard selection and aims to ascertain which combination of clinical assessment items best identifies sciatica in people seeking primary healthcare.
Methods Data on 394 low back-related leg pain consulters were analysed. Potential sciatica indicators were seven clinical assessment items. Two reference standards were used: (i) high confidence sciatica clinical diagnosis; (ii) high confidence sciatica clinical diagnosis with confirmatory magnetic resonance imaging findings. Multivariable logistic regression models were produced for both reference standards. A tool predicting sciatica diagnosis in low back-related leg pain was derived. Latent class modelling explored the validity of the reference standard.
Results Model (i) retained five items; model (ii) retained six items. Four items remained in both models: below knee pain, leg pain worse than back pain, positive neural tension tests and neurological deficit. Model (i) was well calibrated (p = 0.18) and discriminated well, with an area under the receiver operating characteristic curve (AUC) of 0.95 (95% CI 0.93, 0.98). Model (ii) showed good discrimination (AUC 0.82; 0.78, 0.86) but poor calibration (p = 0.004). Bootstrapping revealed minimal overfitting in both models. Agreement between the two latent classes and the clinical diagnosis groups was substantial for model (i) and moderate for model (ii).
Conclusion Four clinical assessment items were common to both reference standard definitions of sciatica. A simple scoring tool for identifying sciatica was developed. These criteria could be used clinically and in research to improve accuracy of identification of this subgroup of back pain patients.


Source of data and participants
Data from primary care consulters with LBLP taking part in the ATLAS (Assessment and Treatment of Leg pain Associated with the Spine) observational cohort study [24] were analysed. Ethical approval for the ATLAS study was granted by the South Birmingham Research Ethics Committee, reference number 10/H1207/82.
As part of the ATLAS study, patients completed questionnaires, underwent a standardised clinical assessment by one of seven musculoskeletal physiotherapists, and had a lumbar spine MRI within two weeks of their assessment (providing there were no clinical contraindications to the procedure). At the end of the clinical assessment, physiotherapists documented (i) a diagnosis of either sciatica or referred leg pain and (ii) their confidence (0-100%) in that diagnosis. Clinicians made their diagnosis based on information from history and physical examination findings only; the MRI findings were not part of the diagnostic process. For the purposes of the study, the term sciatica signifies spinal nerve root involvement. The MRI scans were scored by a senior consultant musculoskeletal radiologist, blind to any clinical information about the patient other than that the patient had LBLP (not specifying which leg). The radiologist provided a clinical report indicating definite, possible or absent nerve root compression.

Outcome
The outcome of interest in this study was a diagnosis of sciatica. Two reference standards were chosen for the diagnostic model.
Model (i): High confidence (≥ 80%) sciatica clinical diagnosis
Diagnosis of sciatica for the model (i) reference standard was made when the clinician documented that the leg pain was due to sciatica and was ≥ 80% confident in the diagnosis. A cut-off point of ≥ 80% diagnostic confidence was used because at this criterion, reliability among clinicians diagnosing LBLP improves considerably [4].
Model (ii): High confidence (≥ 80%) sciatica clinical diagnosis with confirmatory MRI findings
The second reference standard combined the clinician's diagnosis with MRI findings (possible/definite) of nerve root compression. Using clinical diagnosis alone as a reference standard may leave diagnosis open to incorporation bias as the reference standard is not blind to knowledge of the predictors under consideration [25].

Predictors
Nine candidate predictors were a priori chosen for potential inclusion in the diagnostic model from the larger set of available self-report and clinical assessment findings. Predictor selection was guided by (a) expert consensus from a Delphi study on items from clinical assessment considered most important for distinguishing sciatica from non-specific leg pain in LBLP patients [26] and (b) items used in other multivariable diagnostic models shown to have acceptable diagnostic accuracy for identifying sciatica [15][16][17][18][19][20].
Small or zero frequencies within 2x2 table cells made logistic regression unfeasible for one variable (myotomes). The three tests of myotomes, reflexes and sensation were therefore combined in a clinically acceptable way [27] into one variable: 'any deficit on neurological testing'. Seven predictors remained for selection in the multivariable model (see S1 Table for predictor selection and measurement level). The sample size was adequate to satisfy the recommended guide of 10 events per predictor [28], with nine initial predictors and 100 patients in the smallest outcome category (patients diagnosed with referred pain for model (i)).
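The events-per-predictor check described above is simple arithmetic, and a minimal sketch of it follows. The function name `events_per_predictor` is hypothetical; the two inputs (100 patients in the smallest outcome category, nine initial predictors) are the figures quoted in the text.

```python
def events_per_predictor(smallest_outcome_n: int, n_predictors: int) -> float:
    """Ratio of patients in the smallest outcome category to candidate predictors."""
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    return smallest_outcome_n / n_predictors


# Figures quoted in the text: 100 patients in the smallest outcome
# category (referred pain, model (i)) and nine initial predictors.
epv = events_per_predictor(100, 9)
meets_guide_of_ten = epv >= 10

print(f"events per predictor = {epv:.1f}; meets guide of 10: {meets_guide_of_ten}")
```

With these inputs the ratio is about 11.1, which is above the guide of 10.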

Statistical analysis
Univariable logistic regression analysis quantified the relationship between each individual predictor variable and the presence of sciatica based on both reference standards. Multiple logistic regression with backwards stepwise selection (p = 0.05) was performed using all a priori selected predictor variables. Complete case analysis was planned because all but one of the baseline predictors had no missing data. Contribution of each predictor variable within the final model was presented as beta coefficients and odds ratios (ORs) with 95% confidence intervals (CI).
Measures of calibration and discrimination assessed predictive performance of the models. Calibration was assessed graphically by plotting the observed outcome against the predicted probability of the outcome obtained from the fitted logistic regression model, using the Lowess smoothing technique [29]. Perfect calibration follows the 45 degree line, so deviation of the curve from the diagonal indicates lack of calibration. The plot was supplemented with the Hosmer and Lemeshow goodness-of-fit test [27]. P ≥ 0.05 supports goodness of fit.
Discrimination (the ability of the model to distinguish between those who do and do not have the sciatica diagnosis) was summarised using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. An AUC of 0.5 indicates no discrimination whereas AUC of 1.0 indicates perfect discrimination [27].
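To make the AUC definition concrete, the sketch below computes it as the proportion of (sciatica, non-sciatica) patient pairs in which the patient with sciatica has the higher predicted probability, with ties counted as one half. This is the standard rank-based interpretation of the AUC, not code from the study, and the labels and predicted probabilities are hypothetical.

```python
def auc(labels, scores):
    """AUC as the share of case/non-case pairs ranked correctly (ties count 0.5)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    non_cases = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one patient in each outcome group")
    correct = 0.0
    for case_score in cases:
        for non_case_score in non_cases:
            if case_score > non_case_score:
                correct += 1.0
            elif case_score == non_case_score:
                correct += 0.5
    return correct / (len(cases) * len(non_cases))


# Hypothetical data: 1 = sciatica, 0 = referred leg pain.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

example_auc = auc(labels, scores)
print(example_auc)
```

In this toy sample 15 of the 16 pairs are ordered correctly, so the AUC is 0.9375. A model that assigned every patient the same probability would score 0.5.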
Internal validity of the final model was assessed using 1000 bias corrected bootstrap samples. An adjusted AUC was calculated for the bootstrapped model to reflect the discriminative performance of the internally validated model.
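As a rough illustration of resampling with replacement, the sketch below computes a percentile bootstrap interval for the AUC of a fixed set of predicted probabilities. This is a simpler technique than the bias-corrected internal validation described above, which also refits the model in each bootstrap sample, and it should not be read as the authors' procedure. The data, the seed and the function names are hypothetical.

```python
import random


def auc(labels, scores):
    """AUC as the share of case/non-case pairs ranked correctly (ties count 0.5)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    non_cases = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one patient in each outcome group")
    correct = sum(
        1.0 if c > n else 0.5 if c == n else 0.0 for c in cases for n in non_cases
    )
    return correct / (len(cases) * len(non_cases))


def bootstrap_auc(labels, scores, n_boot=1000, seed=0):
    """Point estimate and 95% percentile bootstrap interval for the AUC."""
    point = auc(labels, scores)  # also validates that both groups are present
    rng = random.Random(seed)
    n = len(labels)
    resampled = []
    while len(resampled) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample patients with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one outcome group
            resampled.append(auc(ys, [scores[i] for i in idx]))
    resampled.sort()
    lower = resampled[n_boot * 25 // 1000]
    upper = resampled[n_boot * 975 // 1000 - 1]
    return point, lower, upper


# Hypothetical data: 1 = sciatica, 0 = referred leg pain.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

point, lower, upper = bootstrap_auc(labels, scores, n_boot=1000, seed=0)
print(f"AUC {point:.3f} (95% percentile interval {lower:.3f}, {upper:.3f})")
```

The fixed seed makes the interval reproducible. With only eight hypothetical patients the interval is wide, which is the expected behaviour for such a small sample.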
Characteristics of the population used in diagnostic modelling were compared to those excluded from analyses due to application of high confidence in diagnosis criteria, using descriptive statistics.
To address the issue of an imperfect reference standard, a probabilistic statistical alternative using latent class (LC) modelling was used [30]. This method specifies a model so that response probabilities of clinical assessment items used to model the classes can be derived without knowing the patient's true classification (diagnosis) [31]. The technique identifies latent or underlying groups of patients based on their response to clinical assessment items, and circumvents the need for a reference standard. It is therefore a useful comparator to the other analyses. Each patient was reclassified according to a two-class latent class solution. It is assumed that the two latent classes correspond to one class of patients in which the target condition is present and one class in which the target condition is absent [31]. Concordance between the clinical diagnosis (+/-MRI) groups and the two latent classes was calculated using percentage agreement and a kappa statistic.
MPlus v5 was used for LC modelling. SPSS v21 and Stata v13 were used for the diagnostic model and descriptive analyses.

Scoring tool
A simplified scoring tool for the best performing model was derived, to give a LBLP patient their probability of having sciatica. Regression coefficients for each predictor in the final model were converted to whole numbers by dividing each item coefficient by the lowest value coefficient [13]. Scores were presented alongside their associated outcome probabilities [32].
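The conversion of regression coefficients to whole-number points can be sketched as follows. The coefficient values are hypothetical and are not the estimates from the final model; the function name `coefficients_to_points` is likewise an assumption made for illustration.

```python
def coefficients_to_points(betas):
    """Divide each coefficient by the smallest one and round to a whole number."""
    smallest = min(betas.values())
    if smallest <= 0:
        raise ValueError("this sketch assumes all coefficients are positive")
    return {item: round(beta / smallest) for item, beta in betas.items()}


# Hypothetical beta coefficients (NOT the study's estimates).
hypothetical_betas = {
    "below_knee_pain": 1.3,
    "leg_pain_worse_than_back_pain": 1.7,
    "positive_neural_tension_tests": 2.6,
    "neurological_deficit": 1.1,
    "subjective_sensory_changes": 0.9,
}

points = coefficients_to_points(hypothetical_betas)
max_score = sum(points.values())

for item, value in points.items():
    print(f"{item}: {value}")
print(f"maximum total score: {max_score}")
```

A patient's total score is the sum of the points for the items that are present. Rounding trades a little precision for a tool that can be added up at the point of care.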

Sensitivity analysis
An additional multivariable logistic regression model was fitted using MRI only as the reference standard, to allow comparison with published models that have used MRI findings alone as the reference standard. The log ORs, corresponding CIs and AUC of this additional model were compared to those of the original two models.

Participants
Of the 609 LBLP consulters who participated in the ATLAS study, 395 participants were included in the diagnostic model development analysis and LC modelling. Patients were excluded from the diagnostic model analysis if (i) clinician confidence in diagnosis (for either referred leg pain or sciatica) was < 80% (n = 173) or (ii) they did not have an MRI scan (an additional 41 patients). Table 1 displays characteristics of patients in the diagnostic model development sample (n = 395) and those not included in model building analyses (n = 214). The excluded group had a greater proportion of patients aged over 65 years (18% v 14%), a higher proportion of females (68% v 60%), more patients with leg symptoms for over 3 months (42% v 33%) and more comorbidity (17% v 11% with ≥ 2 comorbidities). Comparing clinical characteristics, a greater proportion of patients in the diagnostic model group had a positive cough/sneeze (25.8% v 12.6%), leg pain worse than back pain (50.1% v 38.3%), neurological deficits (57.7% v 46.3%) and positive neural tension tests (60.8% v 44.4%).
Of the 395 patients included in the analysis, 75% (n = 295) were diagnosed with sciatica using model (i) reference standard. Using model (ii) reference standard, where clinical diagnosis was corroborated by positive MRI findings, 51% (n = 200) were diagnosed with sciatica.
Class one identified by LC modelling had 244 patients, class two had 151 patients. The overall percentage agreement between the clinical diagnosis groups defined by model (i) and the two latent classes was 83%, with a kappa coefficient of 0.62 (95% CI 0.54, 0.70) indicating substantial agreement [35]. This suggests the clinical diagnosis reference standard was adequate. Comparing the two latent classes to groups diagnosed using high confidence clinical diagnosis and confirmatory MRI findings (model ii), showed agreement of 72% and kappa 0.43 (95% CI 0.35, 0.52) indicating moderate agreement.
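Percentage agreement and Cohen's kappa can both be computed from a 2x2 cross-tabulation of clinical diagnosis against latent class. The cross-tabulation itself is not given in the text, so the cell counts below are illustrative only: they were chosen to match the reported margins for model (i) (295 sciatica and 100 referred pain; 244 and 151 patients in the two classes) and the reported 83% agreement, under the assumption that class one corresponds to sciatica.

```python
def agreement_and_kappa(a, b, c, d):
    """Observed agreement and Cohen's kappa for a 2x2 table.

    a: sciatica diagnosis, class one      b: sciatica diagnosis, class two
    c: referred pain diagnosis, class one d: referred pain diagnosis, class two
    """
    n = a + b + c + d
    observed = (a + d) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Illustrative counts consistent with the reported margins (NOT reported data).
observed, kappa = agreement_and_kappa(a=236, b=59, c=8, d=92)
print(f"agreement = {observed:.2f}, kappa = {kappa:.2f}")
```

Under these assumed counts the sketch gives 0.83 agreement and a kappa of 0.62, matching the values reported for model (i).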

Model development
Following univariable analysis, all predictor variables were significantly associated with both reference standard outcomes (p<0.001) (Table 2). The ORs for model (i) were all higher than those for model (ii). The strongest association with model (i) diagnosis was for 'positive neural tension tests', with a very high OR of 31.9. For model (ii), 'leg pain worse than back pain' had the strongest association with the diagnosis (OR 6.1; 95% CI 3.9, 9.4).

Model specification
Multivariable analysis was performed on 394 participants because one variable had missing data for one patient. Results are presented in Table 3. The clinical diagnosis reference standard model (i) produced a final model with five items (p<0.05). Positive cough/sneeze and intensity of leg pain were eliminated. Six items were retained in model (ii); only subjective sensory changes was eliminated. Four items were retained in both models: below knee pain; leg pain worse than back pain; positive neural tension tests; neurological deficit. The ORs for model (i) were all higher than those for model (ii).

Model performance
The shape of the slope on the calibration plots shows that model (i) is well calibrated and model (ii) less well calibrated (Fig 1). The Hosmer and Lemeshow statistical test for the observed data for model (i) supported the goodness-of-fit of the model (χ² = 11.4, p = 0.18) whereas model (ii) showed poor calibration (χ² = 22.4, p = 0.004). Discrimination was almost perfect for model (i) (AUC 0.95, 95% CI 0.93, 0.98) and excellent for model (ii) (AUC 0.82, CI 0.78, 0.86). Adjusted AUCs for both models were unaltered following bootstrapping.
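The Hosmer and Lemeshow p-values can be checked from the reported χ² statistics. The sketch below assumes the conventional ten risk groups, giving 8 degrees of freedom; the text does not state the number of groups, so this is an assumption. For an even number of degrees of freedom the chi-square upper tail has a closed form, which avoids the need for a statistics library.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom > x); valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form used here requires a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total


# Reported statistics; df = 8 assumes ten groups (not stated in the text).
p_model_i = chi2_upper_tail_even_df(11.4, 8)
p_model_ii = chi2_upper_tail_even_df(22.4, 8)

print(f"model (i):  p = {p_model_i:.2f}")
print(f"model (ii): p = {p_model_ii:.3f}")
```

Under that assumption the reported statistics reproduce the reported p-values of 0.18 and 0.004.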
A simple scoring method, for the better performing model (i), was developed by converting the beta coefficient values into whole numbers. A total score of 10 could be achieved (Table 4). The corresponding predicted probability of sciatica for each sum score was calculated. Using this clinical diagnostic model (with high confidence clinical diagnosis as the reference standard), a threshold score of 5 or above suggests high likelihood of being diagnosed with sciatica (at least 83%). Using coordinates from the ROC curve, at this threshold, the model has sensitivity of 0.85 and specificity of 0.88.
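Sensitivity and specificity at a score threshold follow directly from counting patients on either side of the cut-off. The sketch below applies a threshold of 5 to a small set of hypothetical scores and diagnoses; the data are invented for illustration and do not reproduce the reported values of 0.85 and 0.88.

```python
def sensitivity_specificity(scores, has_sciatica, threshold):
    """Sensitivity and specificity when score >= threshold is called positive."""
    tp = fn = tn = fp = 0
    for score, diagnosed in zip(scores, has_sciatica):
        test_positive = score >= threshold
        if diagnosed and test_positive:
            tp += 1
        elif diagnosed:
            fn += 1
        elif test_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical total scores (0-10) and reference standard diagnoses.
scores = [10, 8, 7, 5, 3, 6, 4, 2, 1, 0]
has_sciatica = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

sens, spec = sensitivity_specificity(scores, has_sciatica, threshold=5)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Raising the threshold would be expected to trade sensitivity for specificity, which is why the choice of cut-off depends on the clinical purpose of the tool.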

Sensitivity analyses
When MRI only was the reference standard, the predictors remaining in the model were leg pain worse than back pain (OR 2.4, CI 1.

Discussion
This study ascertained the items from clinical assessment that best identify sciatica in primary care consulters with LBLP. In the absence of a gold standard for diagnosing sciatica, two reference standards were compared. Model (i), using high confidence in clinical diagnosis as a reference standard, retained five items and had almost perfect calibration and discrimination. Model (ii), with the addition of confirmatory MRI in the reference standard, retained six items and showed good discrimination but poor calibration. Bootstrapping revealed minimal overfitting in both models.
The predictors that were retained in both models are unsurprising from a clinical perspective. "Pain below the knee" is commonly considered a proxy for sciatica [36] and other diagnostic models report its association with nerve root involvement defined by either clinical diagnosis [17] or MRI [19]. The "leg pain worse than back pain" item performed strongly in both models and two previous diagnostic models reported its association with sciatica [16,17]. However, it has received less attention in the literature, for example as an eligibility criterion for selecting sciatica patients in intervention studies [37]. In clinical practice, cough/sneeze/strain reproducing leg pain is considered indicative of sciatica. In this analysis "positive cough/sneeze" was significant in model (ii), similar to associations seen in other models using MRI findings as the reference standard [16,18], but not significant in model (i). Self-report symptoms of weakness or numbness have previously shown minimal association with MRI findings of nerve root compression [19], similar to our findings for model (ii). Neurological deficit was associated with sciatica in both models. "Positive neural tension tests" remained in both models but with considerable difference in the magnitude of the ORs. The model including MRI in the reference standard gives much less weight to the association between positive neural tension and sciatica (OR 1.8). When MRI findings only were used as the reference standard, positive neural tension was not predictive of sciatica diagnosis, similar to a previously published model which used MRI as the reference standard [16]. Clinically and in the literature it is recognised as a diagnostic criterion for sciatica [6]. It is suggested that neural tension tests may cause pain due to chemical mediators irritating the nerve root but not generating detectable signal on MRI [19].
Differences in patient population and reference standard (MRI versus clinical diagnosis) limit readers' ability to compare diagnostic models for sciatica. Four models in the literature have used MRI as reference standard [15,16,18,19]; two of these included only self-report items as predictors [15,19]. One model used clinical diagnosis as the reference standard [17] and the oldest published model used myelography [20]. Three of the six models are based on patients in secondary care settings [15,19,20], with potentially more severe presentations than those from primary care settings [16][17][18].
Performance measures are not always reported [18][19][20], which makes it difficult to compare models. In a model that used nerve root compression on MRI as a reference standard, gender and sensory loss remained significant predictors, but performance was poor (AUC 0.65) [15]. History items alone were used to develop the model and the population was a highly selected group with severe sciatica. Items identified by Vroomen et al. [16] as associated with nerve root compression defined by MRI performed well (AUC 0.80) for demographic (age and gender) and history domains (spasmodic pain, pain worse in leg than back, pain in a dermatomal distribution, positive cough/sneeze). Their model performance improved slightly (AUC 0.83) when physical examination items were added (restricted forward bending, myotome weakness). External validation of the history items in a different data set resulted in a much lower AUC of 0.58 [15]. Using a similar reference standard and population setting to the study in this report, Konstantinou et al. [17] also found pain below knee, leg pain worse than back pain and feeling of numbness or pins and needles to be associated with the clinical diagnosis of sciatica. The authors acknowledge that not including clinical examination items may explain their models' performance (AUC 0.72 for only definite cases of sciatica; AUC 0.74 for definite and possible cases of sciatica indicated by clinical diagnosis).

Limitations
As there is no gold standard for diagnosing sciatica, selection of a reference standard is always a challenge. In this study, for Model (i), expert clinical opinion was chosen as a reference standard, which is considered in some circumstances appropriate in the development of diagnostic criteria in the absence of a gold standard [22]. It also reflects current practice in primary care where, in the majority of cases, diagnosis and initial management plans are put into place without access to imaging, at least initially. Patients excluded from the analysis were cases where clinicians indicated low diagnostic confidence, irrespective of either a referred leg pain or sciatica diagnosis. A reliability study, nested in this cohort, showed good reliability on diagnosis of LBLP when clinician confidence is high (at least 80%) [4]. Diagnostic uncertainty is a clinical reality as sometimes a return visit from the patient is needed to further confirm or explore diagnosis. All patients received an MRI scan as part of this research study and patients were not selected for inclusion in the study based on the results of this scan. The clinicians unavoidably used information from the assessment predictor variables to make their diagnosis; this contributes to incorporation bias and potentially inflates accuracy estimates [38,39]. Ideally the reference standard and the predictors should be independent of one another to avoid inflation of accuracy estimates [38,39].
A second reference standard was chosen which combined confirmatory MRI findings with the high confidence clinical diagnosis, in order to address to some extent the issue of incorporation bias.
Alternative approaches to deal with an "imperfect reference standard" include using a combination of reference standards in a sequential manner to diagnose patients [38], for example first interpreting clinical information and then, if needed, combining this information with further diagnostic tests (e.g. MRI). Another recommended means of limiting bias with reference standard selection is the use of consensus so that more than two assessors agree on a diagnosis [38]. However, both these methods can result in selection bias because the "easier to identify" cases are selected, thereby losing the heterogeneity of patients seen in normal clinical life.
Using MRI only as a reference standard, which allows the reference standard and predictors to be independent of each other, produced the lowest performance index (AUC 0.70) and did not retain the predictors "pain below the knee" and "positive neural tension tests". Excluding these variables is at odds with clinical opinion and evidence in the literature, and reflects the mismatch seen in studies between clinical presentation and MRI findings [40].
The latent class analysis was performed to classify patients into two groups without the need for a reference standard. The two class solution showed good concordance with the groups defined as referred pain and sciatica according to clinical diagnosis reference standard, supporting the validity of clinical diagnosis for use as a reference standard.
Stepwise regression is an automated process; recognised limitations of the technique include using too many variables and removing variables that may be important [13]. For this model the number of predictors was not excessive in relation to the sample size [28], and the backwards approach to predictor selection allows the model to be assessed as the variables are removed sequentially.
The choice of predictor selection for this model was primarily based on previous consensus work on items from clinical assessment that contribute most to the diagnosis of LBLP due to sciatica [26]. The primary care setting of this study helps to limit the issue of selection bias seen in other diagnostic studies where patients are selected from secondary care settings and have more severe symptoms. Assessors who participated in this study were all experienced physiotherapists and underwent training to enhance standardisation of the data collection and diagnostic decisions. It could be argued that diagnostic accuracy may be better among other healthcare professionals or medically trained clinicians. However previous work showed that agreement among clinicians is similar between physiotherapists and other healthcare professionals when diagnosing sciatica [4].

Conclusion
This study used information from clinical assessment to estimate the likelihood of sciatica in patients with LBLP presenting in the primary care setting. It is the first study to explore the considerable challenges, implications and sources of bias inherent in reference standard selection when identifying sciatica, and to compare models with different reference standards. A clear cluster of items was found which consistently identified sciatica: pain below the knee, leg pain worse than back pain, positive neural tension and neurological deficit. A simple scoring tool was developed which could prove useful to clinicians and researchers wishing to support their clinical judgement regarding the probability that a patient's leg pain is sciatica. In research settings, the tool could enable better identification of a homogeneous group.
Supporting information S1