Is One Trial Sufficient to Obtain Excellent Pressure Pain Threshold Reliability in the Low Back of Asymptomatic Individuals? A Test-Retest Study

The assessment of pressure pain threshold (PPT) provides a quantitative value related to the mechanical sensitivity to pain of deep structures. Although excellent reliability of PPT has been reported in numerous anatomical locations, its absolute and relative reliability in the lower back region remains to be determined. Because low back pain is highly prevalent in the general population and is one of the leading causes of disability in industrialized countries, assessing pressure pain thresholds over the low back is of particular interest. The purpose of this study was (1) to evaluate the intra- and inter-session absolute and relative reliability of PPT within 14 locations covering the low back region of asymptomatic individuals and (2) to determine the number of trials required to ensure reliable PPT measurements. Fifteen asymptomatic subjects were included in this study. PPTs were assessed at 14 anatomical locations in the low back region over two sessions separated by a one-hour interval. In each session, three PPT assessments were performed at each location. Reliability was assessed by computing intraclass correlation coefficients (ICC), standard error of measurement (SEM) and minimum detectable change (MDC) for all possible combinations between trials and sessions. Bland-Altman plots were also generated to assess potential bias in the dataset. Relative reliability for both intra- and inter-session was almost perfect, with ICCs ranging from 0.85 to 0.99. For the intra-session, no statistical difference was found for ICC and SEM values regardless of the conducted comparisons between trials. Conversely, for the inter-session, ICC and SEM values were significantly larger when two consecutive PPT measurements were used for data analysis. No significant difference was observed for the comparison between two consecutive measurements and three measurements.
Excellent relative and absolute reliabilities were found for both intra- and inter-session. Reliable measurements can be equally achieved when using the mean of two or three consecutive PPT measurements, as usually proposed in the literature, or with only the first one. Although reliability was almost perfect regardless of the conducted comparison between PPT assessments, our results suggest using two consecutive measurements to obtain higher short-term absolute reliability.


Introduction
Pain is defined as an unpleasant sensory and emotional experience associated with actual or potential tissue damage, or described in terms of such damage [1]. According to the American Pain Society [2], pain is the fifth vital sign of medical examination. Pressure algometry (PA) performed with a handheld algometer is a method increasingly used since the 1980s to assess mechanical pain sensitivity in different anatomical regions. When it is applied perpendicularly to the skin, the algometer creates a mechanical painful stimulation by activating group III and group IV muscle nociceptors [3]. Through pressure pain thresholds (PPT), PA provides a quantitative value related to the sensitivity of deep structures, allowing clinicians or researchers to make comparisons over time. In the case of musculoskeletal pain, as recently proposed in a literature review by Arendt-Nielsen and Yarnitsky [4], PA seems particularly relevant to compare pain over time or between various normal, affected or treated anatomical regions.
It has been reported that pressure pain sensitivity differs between individual muscles [5] and is also non-uniformly distributed between the muscle belly and tendons of the same muscle [6][7][8][9]. Thus, according to Anderssen and colleagues [6], the assessment of pain sensitivity at two adjacent sites can lead to two significantly different PPT values. This difference could be explained by a change in muscle thickness and density of nociceptors. However, no difference is observed when PPTs are assessed bilaterally over homologous body locations [5,10]. Among all the different anatomical locations, the low back region is of particular interest for PPT measurements since 70% of the population will experience low back pain (LBP) at least once in their lifetime [11] and because LBP is often reported in relation to work-related musculoskeletal disorders [12], disability and sickness absence from work [13,14]. The assessment of PPT can be used as a method to diagnose and monitor the effectiveness of various treatments or interventions over the lower back region [15][16][17].
According to a literature review by Arendt-Nielsen and Yarnitsky [4], PA seems relevant to compare pain sensitivity over time or between various normal, affected, or treated anatomical locations. In numerous studies, PA has shown good to excellent intra- and inter-reliability to assess pain sensitivity in the low back [16,18,19,20,21]. Mokkink and colleagues [22] have defined relative reliability as the extent to which scores for subjects who have not changed are the same for repeated measurements, in our study assessed by one examiner on two different occasions. Relative reliability is commonly quantified using the intraclass correlation coefficient (ICC) [23]. Absolute reliability, also called "agreement" or "absolute measurement error", is defined as how close the scores on repeated measures are [24], and it is quantified using the standard error of measurement (SEM). Interestingly, PPT reliability studies have (1) generally assessed only two or four locations over the low back region and/or (2) assessed PPT reliability only unilaterally. Further investigations are therefore needed to ensure that PA is a reliable method to assess PPT in numerous locations covering the low back region of asymptomatic individuals.
The purpose of this study was (1) to evaluate the intra- and inter-session absolute and relative reliability of PPT within 14 locations covering the low back region of young asymptomatic individuals and (2) to determine the number of trials required to ensure reliable PPT assessments.

Materials and Methods

Subjects
Fifteen asymptomatic subjects (8 women and 7 men), described in Table 1, volunteered to participate in this study. The subjects were recruited within the Grenoble community and consisted of students (11) and newly hired workers (4). Inclusion criteria were being aged 18 to 55 years, no musculoskeletal pain in the low back during the last week, no previous injury and/or surgery in the low back region and no pregnancy. This study was conducted in accordance with the Declaration of Helsinki and was approved by the national ethics committee (French society for independent-living technologies and gerontechnology). Subjects gave their informed written consent to the experimental procedure.

Experimental protocol
A Somedic algometer (Type 2, Sollentuna, Sweden) with a probe size of 1 cm², calibrated before each session, was used to assess PPT over two sessions separated by one hour and lasting approx. 30 minutes. The pressure was applied (1) by a single examiner, (2) over 14 anatomical locations in the lower back region, with 7 locations on each side of the lumbar spinous processes L1-L5, and (3) at a rate of 30 kPa/s, in line with previous studies. To avoid tissue injury [26][27][28] and temporal sensitization [29], a 1-minute interval was observed between two consecutive PPT assessments over the same location.
Subjects, lying comfortably in a prone position, were asked to press a button that locks the algometer when the pressure became painful. The examiner then noted the pressure indicated on the algometer display, corresponding to the PPT. As in numerous studies [30][31][32], a training PPT measurement was performed prior to the recordings on the tibialis anterior [33], a site remote from the low back.

Procedure to mark the 14 anatomical locations
After palpation, the examiner placed two marks at the level of the spinous processes of the first (L1) and fifth (L5) lumbar vertebrae and measured the distance between these two locations (d1). This distance allowed the examiner to select one paper grid with 14 anatomical locations among 8 grids specially designed according to the average L1-L5 distance reported earlier [32][33]. Once the grid was selected, the examiner aligned it with the L1 and L5 marks on the skin and started the experiment.
To design these grids, we calculated d2, corresponding to one quarter of the L1-L5 distance. A first column of 5 points was placed bilaterally at a distance d2 from an imaginary line joining L1 to L5. Then, a second column of 2 points was set bilaterally at twice the distance d2, at the level of L2 and L3 (Fig 1).
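The grid construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' procedure: the function name `grid_points` and the coordinate convention are hypothetical, the five points of the first column are assumed to be spaced d2 apart from L1 to L5 (which follows from d2 being one quarter of d1 only if the levels are treated as evenly spaced), and the two points of the second column are assumed to lie at the L2 and L3 levels.

```python
def grid_points(d1_cm):
    """Return 14 (x, y) grid locations in cm for a given L1-L5 distance d1.

    Hypothetical convention: origin at the L1 mark, y increases from L1 (0)
    towards L5 (d1), x is the lateral offset from the L1-L5 line
    (negative = left, positive = right).
    """
    d2 = d1_cm / 4.0  # one quarter of the L1-L5 distance
    points = []
    for side in (-1, 1):
        # First column: 5 points at lateral distance d2.
        # ASSUMPTION: evenly spaced by d2 between L1 and L5.
        for level in range(5):
            points.append((side * d2, level * d2))
        # Second column: 2 points at lateral distance 2 * d2.
        # ASSUMPTION: placed at the L2 and L3 levels (indices 1 and 2).
        for level in (1, 2):
            points.append((side * 2 * d2, level * d2))
    return points


if __name__ == "__main__":
    # Illustrative L1-L5 distance, not a value taken from the study
    for x, y in grid_points(16.0):
        print(f"x = {x:+.1f} cm, y = {y:.1f} cm")
```

Precomputing one such grid per L1-L5 distance class would correspond to the set of paper grids mentioned in the text, although the actual grid dimensions are not given in the source.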

Data analysis
PPT measurements were found to be normally distributed (Shapiro-Wilk normality test). The results of the first session were analyzed using a repeated-measures analysis of variance (ANOVA) to investigate the intra-session reliability, followed by a Tukey post-hoc test to highlight differences between trials [34]. The relative and absolute reliability across trials 1-2-3 were computed using ICC, SEM and minimum detectable change (MDC). The relative reliability was evaluated by calculating a 2-way fixed ICC(2,1) (for absolute agreement). Reliability coefficients (i.e. ICC values) were interpreted according to Landis and Koch [35], in which an ICC of 0.00-0.20 is considered poor, 0.21-0.40 fair, 0.41-0.60 moderate, 0.61-0.80 substantial, and 0.81-1.00 almost perfect. The SEM, expressed in the same unit as pain sensitivity (kPa), quantifies the precision of PPT measurements of individual subjects [23,36]. The SEM was calculated as $\mathrm{SEM} = \mathrm{SD} \times \sqrt{1 - \mathrm{ICC}}$, where SD is the standard deviation of the scores from all subjects and ICC the relative reliability [36]. The MDC, calculated as $\mathrm{MDC} = \mathrm{SEM} \times 1.96 \times \sqrt{2}$, provides information on the threshold required to be confident that a difference can be considered as "real" [36]. Mean, standard deviation, ICC(2,1), SEM, MDC and limits of agreement (LOA) values were calculated for the two sessions to investigate the inter-session reliability in relation to the following three comparisons [16,37,38,39]:
1. trial 1 from session 1 versus trial 1 from session 2;
2. the mean of trials 1 and 2 (trials 1-2) from session 1 versus the mean of trials 1 and 2 (trials 1-2) from session 2;
3. the mean of trials 1, 2 and 3 (trials 1-2-3) from session 1 versus the mean of trials 1, 2 and 3 (trials 1-2-3) from session 2.
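To make the reliability indices above concrete, the following Python sketch computes an ICC(2,1), the SEM and the MDC, and maps an ICC onto the Landis and Koch labels. It is a minimal illustration, not the authors' analysis code: the function names are hypothetical, the ICC uses the standard Shrout and Fleiss mean-squares formula for a single-measure, absolute-agreement coefficient, and the demonstration values are invented rather than taken from the study.

```python
import math
from statistics import mean, stdev


def icc_2_1(scores):
    """ICC(2,1), absolute agreement, from a subjects x trials table.

    scores: list of rows, one row per subject, k repeated measurements each.
    Uses the mean squares of a two-way ANOVA without replication.
    """
    n, k = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)               # between-subjects mean square
    msc = ss_cols / (k - 1)               # between-trials mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def sem(sd, icc):
    """Standard error of measurement: SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1.0 - icc)


def mdc(sem_value):
    """Minimum detectable change: SEM * 1.96 * sqrt(2)."""
    return sem_value * 1.96 * math.sqrt(2.0)


def landis_koch(icc):
    """Label an ICC using the Landis and Koch bands quoted in the text."""
    for upper, label in ((0.20, "poor"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if icc <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    # Invented PPT values in kPa (subjects x 2 trials), NOT study data
    ppt = [[520, 540], [610, 600], [700, 730], [450, 445], [800, 790]]
    icc = icc_2_1(ppt)
    sd_all = stdev(x for row in ppt for x in row)  # SD of all scores
    s = sem(sd_all, icc)
    print(f"ICC = {icc:.3f} ({landis_koch(icc)}), "
          f"SEM = {s:.1f} kPa, MDC = {mdc(s):.1f} kPa")
```

As a quick consistency check on the formula, an SEM of 65 kPa gives an MDC of roughly 180 kPa, since 1.96 × √2 ≈ 2.77.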
Furthermore, Bland and Altman plots of the differences between trials against their mean and LOA were used to assess the magnitude of disagreement between trials of the 2 sessions. Of note, a difference between trials outside the LOA can be considered as a real change [36].
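The limits of agreement used in such plots can be sketched as follows. This is a hedged illustration using the conventional Bland-Altman definition (mean difference ± 1.96 × SD of the differences), which the text does not spell out; the function name `bland_altman` and the example values are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def bland_altman(session1, session2):
    """Return (bias, lower LOA, upper LOA) for paired measurements.

    bias is the mean of the paired differences; the limits of agreement
    are bias -/+ 1.96 times the sample SD of those differences.
    """
    diffs = [a - b for a, b in zip(session1, session2)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    # Invented PPT values in kPa for five subjects, NOT study data
    s1 = [520, 610, 700, 450, 800]
    s2 = [535, 600, 690, 470, 805]
    bias, lower, upper = bland_altman(s1, s2)
    print(f"bias = {bias:.1f} kPa, LOA = [{lower:.1f}, {upper:.1f}] kPa")
    # Differences falling outside [lower, upper] would be read as real change
    outside = [a - b for a, b in zip(s1, s2) if not lower <= a - b <= upper]
    print("differences outside LOA:", outside)
```

Plotting each difference against the mean of the corresponding pair, with horizontal lines at the bias and the two limits, gives the usual Bland-Altman figure.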
Finally, a one-way ANOVA (with the number of trials as within-subject factor) followed by a Tukey post-hoc test for pairwise comparisons was performed to compare reliability values (ICC and SEM) of the two sessions.

Intra-session reliability
Relative reliability of PPT in the low back. As shown in Table 2, with values ranging from 0.85 to 0.99, the ICCs of the 14 anatomical locations and of the left, right and overall low back (Pleft, Pright, Pall) were almost perfect regardless of the conducted comparison (trials 1-2-3).
Absolute reliability of PPT in the low back. Table 2 also shows that absolute reliability (i.e. SEM) was not statistically different across all possible combinations. SEM values ranged from 26 to 91 kPa.
Number of trials to ensure reliable measurements in the low back. The mean PPT values at each anatomical location were not significantly different between trials regardless of the three conducted comparisons (trial 1 versus trial 2, trial 1 versus trial 3, trial 2 versus trial 3); p-values ranged from 0.7220 to 1.000 (Table 3). For the left, right and overall low back (Pleft, Pright, Pall), p-values ranged from 0.9960 to 0.9995.
The comparison of means of ICC regardless of the conducted comparisons between trials (Table 2) showed a statistical difference for Trials 2-3 versus Trials 1-3 (p = 0.0338). The same analysis for SEM further showed no significant difference between trials.

Inter-session reliability
Relative reliability of PPT in the low back. The ICCs of the 14 locations were almost perfect regardless of the conducted comparison (Table 4 and Fig 2). ICC values ranged from 0.86 to 0.99.
The visual analysis of Bland and Altman plots suggested no difference in PPT values between sessions because (1) zero was included in the 95% confidence interval and (2) all the subjects were inside the limits of agreement (Fig 3). Furthermore, this visual analysis also suggested narrower LOA for the means of trials 1-2 and trials 1-2-3 than for the first trial alone.
Number of trials to ensure reliable measurements in the low back. The mean PPT values at each pressure pain location were not significantly different between sessions 1 and 2 regardless of the following three comparisons: (1) trial 1 from session 1 versus trial 1 from session 2, (2) the mean of trials 1 and 2 (trials 1-2) from session 1 versus the mean of trials 1 and 2 (trials 1-2) from session 2, (3) the mean of trials 1, 2 and 3 (trials 1-2-3) from session 1 versus the mean of trials 1, 2 and 3 (trials 1-2-3) from session 2; p-values ranged from 0.4137 to 0.9974.
When two consecutive measurements (Trials 1-2 or Trials 2-3) or all trials were used to calculate subjects' relative and absolute reliability, ICC and SEM values were significantly higher than when the first trial was used. Conversely, no statistical difference was observed between the first two consecutive trials and all trials, or between the last two consecutive trials and all trials. Finally, no statistical difference was observed between the first two and the last two consecutive measurements (Tables 5 and 6). Visual analysis of Bland and Altman plots showed that LOA values decreased when the first two trials and all three trials were analyzed (Fig 4).

Discussion
Considering the importance of collecting reliable PPT over the lower back region, the purpose of the present experiment was (1) to evaluate the intra- and inter-session absolute and relative reliability of PPT within 14 locations covering the low back region of asymptomatic individuals, and (2) to determine the number of trials required to ensure reliable PPT assessments. Intra-session results will be discussed before those of the inter-session.
First, the analysis of PPT measurements of the low back showed excellent relative reliability for the intra-session. ICC values were almost perfect regardless of the conducted comparisons (trial 1 versus trial 2, trial 1 versus trial 3, trial 2 versus trial 3), suggesting no difference in PPT measurements between trials and no systematic error in the data. Moderate to excellent relative reliability was also obtained in previous studies assessing PPT in other anatomical locations such as the tibia [40], calf, hand [41] and trapezius [42]. In a recent study assessing PPT in the lower back region of young healthy subjects, Waller and colleagues [43] reported ICCs ranging from 0.94 to 0.99 and concluded that intra-rater reliability was excellent in the low back. However, as Waller and colleagues [43] assessed PPT over only one location in the low back (2 cm laterally from L4/L5), the generalization of such a finding is questionable considering that PPT can differ over the same muscle [6]. Moreover, the study population was small but sufficient to obtain substantial relative reliability values [35]. The inter-session relative reliability has also been shown to be excellent in our study. It is first important to note that no significant difference was observed for PPT measurements between session 1 and session 2.

Table 2. Intraclass correlation coefficients (ICC), standard error of measurement (SEM) and minimum detectable change (MDC) for pressure pain thresholds assessed during session 1 over 14 locations (P1 to P14) over the low back region, for left and right locations (Pleft and Pright) as well as overall low back (Pall) between the mean of the first and second trials (T1-T2), the first and the third trials (T1-T3), the second and the third trials (T2-T3) and the means of the three trials (T1-T2-T3).
Table 3. Mean (SD) pressure pain thresholds (kPa) for session 1 assessed over 14 locations covering the low back region and level of significance (p-values) among trials. See "Procedure to mark the 14 anatomical locations" for explanation concerning the locations of PPT assessments.

Then, the trial-to-trial analysis showed that ICC values were also almost perfect regardless of the number of trials considered, confirming the excellent reliability previously reported by Koo and colleagues [19] in the low back of healthy individuals. In the latter study, six anatomical locations assessed (1) bilaterally in the low back perpendicularly to the spinous processes of L1, L3 and L5 and (2) over two sessions separated by 5 minutes led to ICCs ranging from 0.86 to 0.91. In contrast to the intra-session results, we report higher relative and absolute inter-session reliability (i.e. ICC and SEM values) when two consecutive PPT measurements were used for data analysis. Similar results have also been reported in the low back of healthy individuals by Chesterton and colleagues [29], i.e. higher intra-session reliability for the mean of three consecutive PPT assessments than when only the first assessment was used for analysis. Even more interesting, in our study, contrary to numerous studies [18,31,44], the first PPT assessment did not need to be discarded to obtain excellent reliability for both intra- and inter-session. Lacourt and colleagues [44], in a test-retest study, reported significant differences in PPT values between the first and second PPT measurements and also between the first and third ones. This result led them to use only the second and third PPT measurements for data analysis. Higher inter-session reliability was also found by Nussbaum and Downes [31] when the first PPT measurement was omitted.
This could be explained by the effort made in the current study to familiarize the subject with PPT measurements (tests at a remote location, one practice session).

As ICC is largely influenced by between-subjects variability and does not provide information on typical error [36], it was necessary to complete our analysis by computing SEM and MDC. When the first PPT measurement was combined with the second or third one, SEM values were generally below 65 kPa and MDC ranged from 11% to 27% (71 kPa to 179 kPa). In other words, this result suggests that (1) the true PPT score lay within 65 kPa of the observed score and (2) a clinical change will not be masked by measurement error if the observed score changes by more than 11 to 27%. The limited number of published studies assessing absolute reliability makes comparison with the existing literature difficult. However, after two sessions of PPT measurements over one location on the trapezius muscle and the tibialis anterior separated by three to five days, Walton and colleagues [45] reported an SEM of 49 kPa, close to ours. Similar results were found by Fingleton and colleagues [25] in the lower limbs, with SEM ranging from 16 to 39 kPa, and by Chesterton and colleagues [29] in the back, with an SEM of 60 kPa. In general, regarding the number of trials required to ensure reliable measurement in the low back, our results are rather original. Indeed, we have reported almost perfect intra- and inter-session reliability for the first PPT measurement (ICCs ranging from 0.85 to 0.99), suggesting that one training trial over the tibialis anterior would be sufficient to familiarize the participant with the PPT procedure. Hopkins [46] suggested in 2000 that the reliability of a test could be influenced by several factors such as motivation or boredom. For instance, during a series of trials, the second one is often better than the first because participants want to improve their performance or because they benefit from the experience of the first one.
Conversely, a decreased performance between the first trial and the following ones could be explained by fatigue or loss of motivation. In our study, there seems to be no learning effect between trials because PPT values did not change. It also seems that the cognitive and attentional resources needed to perform three consecutive PPT assessments over the low back do not generate boredom or loss of motivation and do not influence reliability. Furthermore, even though both relative and absolute reliabilities were significantly higher when two or three consecutive measurements were used for data analysis compared with the first measurement, no statistical difference was observed between the first two and the last two PPT measurements. Therefore, this result suggests that using the first two PPT measurements for data analysis will not lead to lower relative and absolute reliabilities than using the last two or all three PPT measurements. Finally, in accordance with existing literature in the low back [16,45], MDC values were regularly between 100 and 200 kPa, corresponding to approx. 10-20% of the PPT scale range, which is considered an acceptable measurement error by Chiarotto and colleagues [47]. For instance, to be confident that a true change was observed in the low back of young football players after an intervention, Madeleine and colleagues [16] reported an MDC value of 140 kPa. Walton and colleagues [45], assessing PPTs in the trapezius muscle among young healthy subjects, reported an MDC of 113 kPa. These results imply a small sensitivity to change and that a change in PPT measurement can be masked by the measurement error regardless of the absolute changes in PPT [16].
Recent studies have reported good to excellent PPT reliability between sessions separated by one day and assessed over 14 locations covering the abdominal region [48], by two days and assessed over 2 locations of the low back [20], and by twenty-one days and assessed over 1 location of the trapezius muscle [49]. Still, the current results need to be confirmed by assessing PPT reliability over longer periods of time. In addition, the absence of significant difference for some important parameters such as ICC and SEM values between two or three consecutive PPT measurements could be explained by the relatively small sample size. Indeed, true significant effects might have been missed in our study because the sample size might not have provided adequate power to detect a true difference of a meaningful magnitude.
Finally, we recruited a mixed population of asymptomatic individuals, classified as such since they did not report pain in the low back within the 7 days prior to the experiment and had no history of low back injury and/or surgery. However, pain usually fluctuates, as reported by recent studies [50,51,52]. Further, the present results should not be generalized to a specific population or gender, as gender differences are reported in pressure pain sensitivity [33,53,54]. Still, Paungmali and colleagues [20] have reported almost perfect relative reliability for individuals with chronic non-specific low back pain, in line with our results. Future studies could address the relative and absolute variability of PPT assessed over 14 locations covering the low back in populations suffering from LBP.

Conclusions
Excellent relative and absolute reliability of PPT measured over 14 locations covering the low back of asymptomatic individuals was found for both intra- and inter-session. Reliable measurements can be equally achieved when using the mean of consecutive PPT measurements or only the first one. Although reliability was almost perfect regardless of the conducted comparison between PPT assessments, our results suggest using at least two consecutive measurements to obtain higher inter-session absolute reliability among asymptomatic participants in the low back region. Further studies are needed to enable a more global generalization of these findings.