
Reliability and minimal detectable change of dynamic temporal summation and conditioned pain modulation using a single experimental paradigm

  • Matthieu Vincenot,

    Roles Formal analysis, Investigation, Methodology, Project administration, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliations Research Center on Aging, CIUSSS de l’Estrie-CHUS, Sherbrooke, Québec, Canada, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, Québec, Canada

  • Louis-David Beaulieu,

    Roles Formal analysis, Investigation, Methodology, Resources, Software, Supervision, Validation, Writing – review & editing

    Affiliation BioNR Research Lab, Université du Québec à Chicoutimi, Chicoutimi, Canada

  • Louis Gendron,

    Roles Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Writing – review & editing

    Affiliations Faculty of Medicine and Health Sciences, Department of Pharmacology-Physiology, Université de Sherbrooke, Sherbrooke, Québec, Canada, Research Center of the Centre Hospitalier Universitaire de Sherbrooke, Sherbrooke, Québec, Canada

  • Serge Marchand,

    Roles Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Writing – review & editing

    Affiliations Research Center of the Centre Hospitalier Universitaire de Sherbrooke, Sherbrooke, Québec, Canada, Faculty of Medicine and Health Sciences, Department of Surgery, Université de Sherbrooke, Sherbrooke, Québec, Canada

  • Guillaume Léonard

    Roles Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Writing – review & editing

    Guillaume.Leonard2@USherbrooke.ca

    Affiliations Research Center on Aging, CIUSSS de l’Estrie-CHUS, Sherbrooke, Québec, Canada, Faculty of Medicine and Health Sciences, School of Rehabilitation, Université de Sherbrooke, Sherbrooke, Québec, Canada

Abstract

Background

Quantitative sensory tests (QST) are frequently used to explore alterations in somatosensory systems. Static QST, such as pain thresholds, and dynamic QST, such as temporal summation (TS) and conditioned pain modulation (CPM), are commonly used to evaluate the excitatory and inhibitory mechanisms involved in pain processing. The aim of the present study was to document the reliability and the minimal detectable change (MDC) of these dynamic QST measurements using a standardized experimental paradigm.

Material and methods

Forty-six (46) pain-free participants took part in 2 identical sessions during which TS and CPM outcomes were collected. Mechanical (pressure pain threshold [PPT]) and thermal (constant 2-minute heat pain stimulation [HPS]) nociceptive stimuli were applied as test stimuli, before and after a cold-water bath (conditioning stimulus). TS was interpreted as the change in pain perception scores during HPS. CPM was determined by calculating the difference in pain perception between the pre- and post-water-bath measurements, for both PPT and HPS. Relative and absolute reliability were analyzed with the intraclass correlation coefficient (ICC2,k), the standard error of measurement (SEMeas) and the MDC.

Results

Results revealed good to excellent relative reliability for static QST (ICC ≥ 0.73). Relative reliability was poor to moderate for TS, depending on the calculation method (0.25 ≤ ICC ≤ 0.59), and poor for CPM (0.16 ≤ ICC ≤ 0.37), whether measured with mechanical stimulation (PPT) or thermal stimulation (HPS). Absolute reliability varied from 0.73 to 7.74 for static QST and from 11 to 22 points for TS, and corresponded to 11.42 points and 1.56 points for thermal- and mechanical-induced CPM, respectively. MDC analyses revealed that a change of 1.58 to 21.46 points for static QST, 31 to 52 points for TS and 4 to 31 points for CPM is necessary to be interpreted as a real change.

Conclusion

Our approach seems well suited to clinical use. Although our method shows relative and absolute reliability equivalent to that of other protocols, we found that the reliability of measures of endogenous pain modulation remains fragile, probably because of the dynamic nature of these mechanisms.

Introduction

Pain is a complex phenomenon resulting from the interplay between excitatory and inhibitory mechanisms. Temporal summation (TS) refers to an excitatory pain mechanism involving a gradual enhancement of pain perception during repeated or constant painful stimulation [1,2]. Conditioned pain modulation (CPM) delineates a phenomenon in which pain itself serves as an inhibitory mechanism for further pain perception: it involves a pain-inhibiting mechanism wherein a noxious stimulation results in widespread hypoalgesia [1,3,4]. Although involving distinct underlying processes, TS and CPM are often compared to sensitization [5] and diffuse noxious inhibitory controls [6], respectively, two neurophysiological phenomena initially studied in animals. In the last decade, TS and CPM have been proposed as potential biomarkers to predict treatment success and guide personalized medicine [7,8]. However, evidence supporting their metrologic properties, particularly their reliability, remains uncertain [9,10].

Reliability is an umbrella term referring to a measurement's ability to give error-free values in stable individuals, while responsiveness refers to its ability to detect changes over time [11]. TS and CPM measures, like any other measurement, require both high reliability and responsiveness to be considered useful for diagnostic, prognostic, and personalized care purposes. Concerning relative reliability (i.e., the degree to which stable individuals in a sample maintain their position relative to each other with repeated measurements [11,12]), studies report indices ranging from poor to excellent for TS- and CPM-related measures [9]. Several factors have been put forward to explain these variations across studies, such as the protocol used and the methods for calculating reliability [9,10,13]. While relative reliability (best measured with the intraclass correlation coefficient [ICC] [14]) is useful to explore whether a measurement can reliably detect differences between individuals, it is not well suited to judge the measurement's ability to track changes over time [12,15]. To this end, absolute reliability (or measurement error), which estimates the degree of error in a person's score in the absence of true change [11] and is ideally quantified by the standard error of measurement (SEMeas) [14,16], is best suited. The SEMeas allows estimation of the minimal detectable change (MDC), which in turn informs on the measurement's responsiveness [17]. Unfortunately, SEMeas and MDC have rarely been reported for TS and CPM [9], and no study has yet documented these values for protocols using constant thermal stimulation.

The aim of this study was to fill this knowledge gap by assessing absolute and relative reliability and responsiveness of TS and CPM using a single experimental procedure. We were particularly interested in the impact of calculation methods on reliability performance.

Materials and methods

Participants

Forty-six (46) healthy participants were enrolled in this study and took part in 2 identical testing sessions. There is no formal sample size calculation for reliability studies; although it is generally recommended to recruit between 20 and 40 participants [18], the confidence interval around the ICC estimate is another important element to consider. Each session lasted approximately 1.5 hours and took place on different days, within a 1-week period. Inclusion and exclusion criteria are presented in Table 1. We asked participants to refrain from consuming tobacco [19,20], alcohol [21,22], and stimulants [23] (caffeine, theine, etc.), as well as pain medication (e.g., acetaminophen, non-steroidal anti-inflammatory drugs, and narcotics), during the 24-hour period preceding each experimental session.

Ethics approval

This cross-sectional study was conducted at the Centre de recherche du Centre hospitalier universitaire de Sherbrooke (Sherbrooke, Quebec, Canada). Recruitment took place from February 4, 2019, to July 14, 2019. The study followed the principles outlined in the Declaration of Helsinki and the International Council for Harmonisation Good Clinical Practice guidelines. Ethics approval was granted by the institutional review board of the Centre intégré universitaire de santé et de services sociaux de l’Estrie ‐ Centre hospitalier universitaire de Sherbrooke (CIUSSS de l’Estrie ‐ CHUS), Sherbrooke, Canada. The trial was registered on ClinicalTrials.gov (NCT03376867). Written informed consent was obtained from each participant.

Experimental setup

All sessions were conducted in the same room. For a given participant, sessions were conducted at the same time of day to avoid circadian effects [24]. The same experimenter performed all experiments. TS and CPM were assessed within a single quantitative sensory testing experimental procedure [4,25,26]. Participants’ eligibility was confirmed, and their written informed consent was obtained, before beginning the session. TS and CPM were assessed based on the protocol described by Tousignant-Laflamme et al. [25]. This protocol allows for the measurement of both TS and CPM within a single dynamic quantitative sensory testing procedure, by administering test stimuli before and after a conditioning stimulus. Testing sessions were carried out in a quiet, temperature-controlled room, with participants seated in a comfortable chair.

Dynamic quantitative sensory testing

Equipment.

A mechanical test stimulus was administered by determining the pressure pain threshold (PPT): mechanical pressure was gradually increased over the upper trapezius muscle until the participant reported the very first sensation of pressure pain. The mechanical test stimulus was delivered with a digital algometer (Jtech Medical, Midvale, Utah, USA). A thermal test stimulus was administered using a 3 cm2 thermode (TSA II, Thermal Sensory Analyzer, Medoc, Ramat Yishai, Israel) applied to each participant’s right (calibration) and left (test stimulus) forearm. For the thermal test stimulus, pain perception was continuously recorded with a computerized visual analog scale (CoVAS), which consists of a slider running along a 100 mm horizontal track, housed in a box. Participants were asked to rate their pain by moving the slider between the left boundary (identified as “no pain” ‐ score = 0) and the right boundary (identified as “maximum you can tolerate” ‐ score = 100). We also included a numerical scale above the slider to give participants reference points. The CoVAS sampling rate was set at 10 Hz (10 pain measurements per second). The conditioning stimulus consisted of a cold pressor test involving a cold-water bath set at 10°C, in which participants immersed their right forearm and hand for 120 seconds.

Fig 1 illustrates the step-by-step timeline of the procedure. This sequence was chosen in accordance with expert recommendations [4]. We favored a sequential protocol over a parallel protocol to limit possible attentional bias. We also applied 2 different types of test stimuli (mechanical and thermal), as recommended [4].

Pretests and test stimuli.

The mechanical and thermal test stimuli were administered before and immediately after the conditioning stimulus. The mechanical test stimulus was applied first, followed by a pause (∼6 minutes) until the pain sensation induced by the pressure pain threshold measurement had disappeared completely. We performed a practice test before administering the thermal test stimulus to familiarize the participants with this type of stimulation. Afterwards, a pretest was performed, with the thermode applied on the right anterior forearm, to identify the stimulation parameters to be used for the formal heat pain test stimulus. The temperature was gradually increased from a baseline of 32°C to a maximum of 51°C, at a rate of 0.3°C/s. The pretest consisted of 3 trials: in the first, participants were asked to verbally identify the point at which the heat became painful (heat pain threshold ‐ HPT) and the point at which the pain was no longer tolerable (heat pain tolerance ‐ HPTol). In the other 2 trials, we asked participants to use the CoVAS to identify the HPT, the HPTol and the target temperature, i.e., the temperature that induced a pain intensity rating of 50/100. We averaged the last 2 trials for the HPT, HPTol and target temperature. The resulting target temperature was used for the formal thermal test stimulus.

The thermal test stimulus consisted of a painful thermal stimulation applied with the thermode on the left forearm, at a predetermined, individually tailored temperature (target temperature). To avoid creating expectations, participants were told that the thermode temperature could either randomly increase, remain stable, or decrease throughout the stimulation. In fact, after a constant rise (0.3°C/s) from baseline (32°C) to the predetermined temperature (pain levels of 50/100 based on pretests data), the temperature remained constant throughout the 120 seconds of the test stimulus, during which participants were asked to continuously record their pain level, using the CoVAS. The mechanical and thermal test stimuli were re-administered under the same conditions after the conditioning stimulus.
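As an illustration of the resulting stimulation timeline (using a hypothetical target temperature of 46°C, chosen here only for the example), the ramp preceding the 120-second plateau would last approximately:

$$t_{\text{ramp}} = \frac{T_{\text{target}} - T_{\text{baseline}}}{\text{rate}} = \frac{46 - 32}{0.3\ ^{\circ}\text{C/s}} \approx 47\ \text{s}$$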

Conditioning stimulus.

The conditioning stimulus consisted of a cold pressor test, wherein the participants immersed their right forearm and hand in a cold-water bath at 10°C for 120 s. During this cold pressor test, participants were also asked to continuously rate the intensity of their pain using the CoVAS. These scores were used to calculate the average pain evoked by the conditioning stimulus and to ensure that a sensation of pain had been induced.

Temporal summation and CPM calculation.

Different techniques for calculating TS and CPM have been reported in the literature. To compare techniques and determine whether one provides better reliability results, we calculated TS amplitude with 2 methods that focus on pain fluctuations prior to the conditioning stimulus over 3 different time intervals (Table 2). A positive score indicates increased pain perception and was interpreted as the presence of TS. Conversely, a negative score or a score equal to 0 was interpreted as an absence of TS. For its part, the amplitude of CPM was calculated using 7 different methods, based on the pain level fluctuation evoked by mechanical and thermal test stimuli prior to and after the conditioning stimulus. For CPM-HPS, a negative score indicates a hypoalgesic effect and a positive score or a score equal to 0 was interpreted as an absence of CPM effect. Interpretation is reversed for CPM-PPT considering the nature and characteristics of pain threshold measurement, with an increase in pain thresholds reflecting a hypoalgesic effect, and vice versa. Calculation methods for TS and CPM are presented in Table 2.
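Because Table 2 is not reproduced here, the sketch below only illustrates, under assumed operationalizations, how such scores could be derived from the 10 Hz CoVAS traces and PPT values described above: TS computed either by subtraction or from the slope of a fitted regression line over a given time window, and CPM computed as the post- minus pre-conditioning difference. The function names, windowing choices, and exact formulas are hypothetical and are not taken from the article.

```python
import numpy as np

FS = 10  # CoVAS sampling rate (Hz), as specified in the protocol

def window(covas: np.ndarray, start_s: float, end_s: float) -> np.ndarray:
    """Slice a CoVAS trace (0-100 pain scores sampled at 10 Hz) between two time points (s)."""
    return covas[int(start_s * FS):int(end_s * FS)]

def ts_subtraction(covas_pre: np.ndarray, start_s: float, end_s: float) -> float:
    """TS over an interval as end-of-interval pain minus start-of-interval pain
    (hypothetical version of the subtraction method; positive = temporal summation)."""
    seg = window(covas_pre, start_s, end_s)
    return float(seg[-FS:].mean() - seg[:FS].mean())  # last vs. first second of the window

def ts_regression(covas_pre: np.ndarray, start_s: float, end_s: float) -> float:
    """TS as the total change predicted by a least-squares line fitted over the window,
    smoothing out micro-variations of the VAS ratings (assumed regression method)."""
    seg = window(covas_pre, start_s, end_s)
    t = np.arange(seg.size) / FS
    slope, _ = np.polyfit(t, seg, 1)          # slope in points per second
    return float(slope * (end_s - start_s))   # predicted change over the whole window

def cpm_hps(covas_pre: np.ndarray, covas_post: np.ndarray) -> float:
    """CPM-HPS as post- minus pre-conditioning mean pain (negative = hypoalgesia)."""
    return float(covas_post.mean() - covas_pre.mean())

def cpm_ppt(ppt_pre: float, ppt_post: float) -> float:
    """CPM-PPT as the change in pressure pain threshold (positive = hypoalgesia)."""
    return ppt_post - ppt_pre
```

With these conventions, a participant's TS 0–120 score would be ts_subtraction(hps_pre_trace, 0, 120) applied to the pre-conditioning HPS trace, and CPM-PPT would be cpm_ppt(ppt_pre, ppt_post).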

Statistical analyses

Validation of assumptions.

Several assumptions must be verified before performing reliability analyses [14,27]. First, the Shapiro-Wilk test and a visual examination of histograms and normal Q-Q plots for each outcome were used to assess the normality of the distributions. Second, homoscedasticity (the absence of correlation between the size of the error and the magnitude of the observed score) was tested by exploring the correlation (R2) between the absolute differences and the mean values of the two sessions [14,28]. If R2 values were lower than 0.1, homoscedasticity was considered present. Systematic error between the 2 sessions was evaluated with a paired t-test, using an α threshold of .05. In cases where at least one of these assumptions (normality, homoscedasticity, absence of systematic error) was not met, the variable(s) in question was (were) not included in the reliability analyses. Indeed, there is no non-parametric equivalent for ICCs, SEMeas, and MDC, and although log transformation can offer a suitable alternative for correcting non-normal or heteroscedastic distributions [29,30], it was not appropriate in the present study since our data often had negative values or values equal to 0.
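As an illustration of this screening step, a minimal sketch (assuming SciPy is available; variable and function names are hypothetical) could look as follows:

```python
import numpy as np
from scipy import stats

def check_assumptions(test1: np.ndarray, test2: np.ndarray, alpha: float = 0.05) -> dict:
    """Screen one outcome measured at two sessions against the criteria described above:
    normality, homoscedasticity (R^2 < 0.1), and absence of systematic error."""
    diff = test2 - test1
    means = (test1 + test2) / 2

    # Normality of each session's distribution (Shapiro-Wilk; Q-Q inspection not shown)
    _, p1 = stats.shapiro(test1)
    _, p2 = stats.shapiro(test2)
    normal = (p1 > alpha) and (p2 > alpha)

    # Homoscedasticity: correlation between absolute differences and session means
    r, _ = stats.pearsonr(np.abs(diff), means)
    homoscedastic = r ** 2 < 0.1

    # Systematic error between sessions: paired t-test
    _, p_sys = stats.ttest_rel(test1, test2)
    no_systematic_error = p_sys > alpha

    return {
        "normal": normal,
        "homoscedastic": homoscedastic,
        "no_systematic_error": no_systematic_error,
        "keep_for_reliability": normal and homoscedastic and no_systematic_error,
    }
```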

Relative reliability.

Relative reliability was determined with the ICC [16], using the ICC2,1 model (named 2-way random-effect “consistency” and “single measure” in SPSS [14]). Although the qualitative interpretation of ICC scores remains arbitrary [31], we applied the following benchmarks based on Koo and colleagues [32]: poor if ICC < 0.50; moderate if 0.51 ≤ ICC ≤ 0.75; good if 0.76 ≤ ICC ≤ 0.90; and excellent if ICC > 0.90. Each ICC estimate is reported with its 95% confidence interval (95%CI). We also computed the coefficient of variation (CV), which represents the ratio of the standard deviation to the mean, to allow for a better interpretation of ICC scores. Being unitless, the CV can be used to compare value distributions whose measurement scales are not comparable; the higher the coefficient of variation, the greater the dispersion around the mean. CVs were estimated as follows: CV = (pooled standard deviation from tests 1 & 2) / (pooled mean from tests 1 & 2) × 100.
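For readers who do not use SPSS, the single-measure consistency ICC and the CV can be reproduced from the two-way ANOVA mean squares. The sketch below is an illustrative reimplementation under that assumption, not the authors' SPSS output:

```python
import numpy as np

def icc_consistency_single(test1: np.ndarray, test2: np.ndarray) -> float:
    """Single-measure, consistency ICC from a two-way ANOVA on n subjects x k = 2 sessions
    (a sketch of the '2-way, consistency, single measure' model named above)."""
    data = np.column_stack([test1, test2])                    # n rows (subjects) x k columns (sessions)
    n, k = data.shape
    grand = data.mean()
    ss_rows = k * np.sum((data.mean(axis=1) - grand) ** 2)    # between-subject sum of squares
    ss_cols = n * np.sum((data.mean(axis=0) - grand) ** 2)    # between-session sum of squares
    ss_err = np.sum((data - grand) ** 2) - ss_rows - ss_cols  # residual sum of squares
    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return float((ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err))

def coefficient_of_variation(test1: np.ndarray, test2: np.ndarray) -> float:
    """CV (%) as the pooled standard deviation divided by the pooled mean, times 100."""
    pooled = np.concatenate([test1, test2])
    return float(pooled.std(ddof=1) / pooled.mean() * 100)
```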

Absolute reliability.

Absolute reliability was evaluated using the SEMeas [14,28]. SEMeas were determined using the following formula: SEMeas = √MSE, where MSE is the mean squared error term (or “residual error”) obtained from the ANOVA applied on test and retest measurements. For its part, the MDC was calculated from SEMeas scores, as follows: MDC = 1.96 × SEMeas × √2, as suggested [14,28]. The factors 1.96 and √2 are used, respectively, to build 95% confidence intervals around SEMeas and to consider that errors are reproduced at both test and retest [14]. The MDC indicates the minimal amount of change to be reached in an individual to exceed the measurement’s error, which can then be interpreted with 95% confidence that a real change occurred. A smaller MDC indicates a more sensitive measure [33].
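For illustration, applying these formulas to the SEMeas of 11.23 points reported below for TS 60–120 yields an MDC of roughly 31 points:

$$\mathrm{MDC} = 1.96 \times \sqrt{2} \times \mathrm{SEM}_{eas} \approx 2.77 \times 11.23 \approx 31\ \text{points}$$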

Results

Participants characteristics

All 46 participants completed the experiments and were included in the analyses. Participants’ characteristics are presented in Table 3. None of them experienced acute or chronic pain during the testing sessions, and all refrained from using pain medication for 24 hours prior to the experiments. The average interval between sessions 1 and 2 was 4 days (± 1.68; min: 2 days and max: 7 days).

Assumption validation

All variables were homoscedastic. The variables TS 30–120, TS Reg 30–120, CPM-HPS 0–30, CPM-HPS 30–120, and CPM-HPS 60–120 did not meet the normality assumption. Moreover, we noted a significant difference between the 2 test sessions for all CPM-HPS variables. As a result, the variables listed above were all excluded from the analyses. Group means and standard deviations for all TS and CPM outcomes, as well as systematic differences between sessions 1 and 2, are presented in Table 4.

Table 4. Group means, standard deviation, systematic differences, homoscedasticity and normality for TS and CPM outcomes.

https://doi.org/10.1371/journal.pone.0307556.t004

Relative reliability

ICCs with 95% CIs and CVs are presented in Table 5. ICCs for static QST ranged from good to excellent, except for HPT, which showed moderate reliability. ICCs for TS ranged from poor to moderate. The TS amplitude calculated over the largest time interval (0–120 sec) showed the greatest ICC value. In contrast, the lowest ICC value was observed for the 60–120 sec time interval. ICCs of the TS measures calculated using the regression line were poor (0.25). ICCs for CPM were all poor, regardless of the calculation method. Still, the CPM-MaxPain outcome showed a higher ICC score than CPM-MPT.

Table 5. Relative, absolute reliability and minimal detectable change for TS and CPM outcomes.

https://doi.org/10.1371/journal.pone.0307556.t005

Absolute reliability

SEMeas and MDCs for TS are reported in Table 5. Concerning static QST, SEMeas ranged from 0.57 to 7.74. For TS, all SEMeas values were directly comparable, as they were expressed in the same unit; all were under 25. The lowest SEMeas value was associated with the shortest time interval (60–120 sec), whether calculated by subtraction or using regression analyses. Conversely, the highest SEMeas was observed for the largest time interval (0–120 sec). As MDCs are mathematically derived from the SEMeas, the smallest MDC likewise corresponds to the shortest interval, and vice versa. Concerning CPM, the 2 metrics are not expressed in the same unit, which prevents direct comparisons. Nevertheless, the CPM MaxPain variable has a relatively low SEMeas, close to those observed for TS. SEMeas and MDCs for CPM are presented in Table 5.

Discussion

There is increasing clinical interest in TS and CPM measures, particularly due to their potential utility in identifying individuals at risk of developing chronic pain and predicting the success of pain treatments. The development of reliable measurements remains an important prerequisite to achieve this goal. The objective of this study was to evaluate the relative and absolute reliability of a series of TS and CPM measurements obtained using a standardized protocol with static and dynamic QST measures.

Relative reliability

The results of our analyses of static QST showed that the relative reliability indices were moderate to excellent, consistent with previous studies examining thermal and mechanical pain thresholds [9,24,34–37] and CPT pain [10]. Regarding the target temperature, our analyses show that this variable remains relatively stable over time and can therefore be considered a good indicator for administering the test stimulus.

It should be remembered that ICC results are group-dependent and are therefore difficult to generalize. ICCs allow us to determine what proportion of the observed variability of the measurement can be attributed to measurement error, relative to the variability between individuals in the group [14,16]. From a clinical point of view, the ICC indicates the extent to which a given test is able to discriminate between different individuals, and whether their relative position in the sample is maintained when the test is repeated [14,16]. However, very large or very small group variability (as expressed, for example, by group CVs) tends to respectively over- or under-estimate the ICC, independently of the actual reliability of the measure [14]. The CVs observed for QST values in the present study were quite low, indicating low inter-subject variability. This, combined with their respective ICCs, confirms the good relative reliability of the static QST variables derived from our protocol. This low inter-subject variability also has the effect of lowering the ICC, possibly explaining why these coefficients were sometimes lower than those observed in other studies. As recommended [14], our ICC results should be compared and generalized to samples of healthy persons presenting similar group variation (i.e., similar group CVs).

Concerning TS, few studies have attempted to document test-retest reliability and, to the best of our knowledge, no previous study has investigated the reliability of TS induced by a 2-minute continuous HPS applied to the forearm. Our results indicate that TS induced by such a pain paradigm has poor to moderate relative reliability. Some studies have used a phasic thermal stimulation paradigm [13,38] and report poor to excellent ICCs. In this context, it has been shown that increasing the number of trials can increase ICC values [13,28]; training most likely allows participants to become better at the task. This could be an interesting avenue to improve our protocol. For studies that induced TS with a mechanical phasic paradigm, the range of ICCs was wider, from poor [28] to excellent [28,39]. Strikingly, our results showed that the larger the calculation interval, the greater the ICC. Averaging over a wider interval and including more data points could potentially reduce intra-individual variability, thus increasing ICC values. Finally, we observed that smoothing the response curve with the regression line (reducing micro-variations in the VAS ratings) did not improve relative reliability. These results are consistent with the observations of Kong et al. [13]. In light of our results, and in the context of measuring TS amplitude induced by continuous thermal stimulation, the 0–120 second interval would be the preferred calculation method to ensure moderate relative reliability. However, the CV value associated with this calculation interval suggests that the sample data are highly dispersed, and high inter-subject variability tends to inflate the ICC.

For the CPM measurements, our results show poor reliability for both thermal and mechanical test stimuli. This is consistent with the conclusion of a recent systematic review and meta-analysis [9], which revealed that reliability indices for CPM vary from poor to moderate. Using a protocol similar to ours, Kovacevic et al. [34] observed a good reliability index for CPM measures using PPT and HPS applied on the paravertebral musculature of the lumbar segment. Conversely, Naugle et al. [38], who performed PPT measurements on the forearm, obtained results much closer to ours (ICC: 0.17), suggesting that the site of stimulation could be an important factor. Complementary observations by Marcuzzi et al. [35] suggest that CPM measures tend to be more reliable when the test stimulus is based on a series of HPS than when it is based on PPT. However, Marcuzzi et al. observed only marginally better ICCs than we did (ICC 0.50 and 0.35 for thermal and mechanical stimulation, respectively). Finally, a notable study by Gehling et al. [40] showed that ICCs were higher (increased reliability) when the test stimulus was applied during the conditioning stimulus (ICC: 0.62) compared to when it was applied 5 minutes after the conditioning stimulus (ICC: 0.17), suggesting that the timing between the second test stimulus and the conditioning stimulus can significantly affect reliability. In our study, the test stimulus was administered immediately after the conditioning stimulus and, interestingly, the ICC value in our study (ICC = 0.37) falls between these 2 values. Nevertheless, applying the second test stimulus during the conditioning stimulus has certain drawbacks, notably regarding its potential impact on attention and the possibility of inducing a distraction effect. Applying the second test stimulus immediately after the conditioning stimulus could represent an ideal compromise for maximizing reliability while avoiding distraction effects [4].

Another notable point is the systematic error observed between sessions for the average pain perceived before and after the water bath: scores were systematically lower at the second session. These observations may suggest a cognitive bias or an effect of anxiety. For instance, after completing the CPT procedure once, participants might experience less anxiety during the second session, potentially leading to reduced pain perception.

Based on our results, we recommend the use of continuous thermal stimulation over mechanical stimulation to test CPM. However, as with TS, the CVs associated with the CPM ICCs are very high (on the order of 150–220%), and we must therefore be cautious in our interpretation. It should be remembered that, as for TS, inter-individual variability is inherent to the CPM mechanism [9]. We must also keep in mind that the measurement reliability of inhibitory mechanisms depends on methodological [9,41] and individual factors [10]. However, relying solely on the methodological argument is not enough to explain the low reliability of CPM. It is plausible that CPM exhibits substantial variability over time, due to various biological and emotional factors. Therefore, it would be more fitting to view endogenous inhibitory mechanisms as processes that are dynamic over time, rather than as stable “traits”.

Absolute reliability

The SEMeas informs us about the extent to which the results of the same measurement can vary from one observation to another, due to random factors such as measurement errors or natural variations. In practical terms, a lower SEMeas value indicates greater reliability and consistency between repeated measurements [14]. SEMeas are rarely reported in QST studies [9]. The MDC, also infrequently reported, represents the amount of change below which there is a high probability (95% chance) that no real change has occurred.

Concerning static QST parameters (thermal pain perception and tolerance thresholds), the SEMeas values observed in our study closely match those reported in the literature [35,36,39], demonstrating high stability over time. Unfortunately, for PPT, comparisons between studies are difficult, since the units of measurement (kg, kPa, kg/cm2) and equipment (algometer, von Frey filaments) differ. Moreover, no previous study has reported SEMeas for pain perception during the cold pressor test or for the target temperature (the temperature used for HPS). Our data could therefore serve as a starting point for further studies, especially as the determination of the target temperature for the thermal stimulation used in both TS and CPM, as well as the measurement of pain perception during the water bath, are relatively common in this type of protocol.

Concerning TS, SEMeas varied from 11.23 to 22.46, depending on the calculation method. In contrast to the relative reliability indices, the shortest measurement interval (60–120 seconds) was associated with the best metrics (i.e., lowest SEMeas and lowest MDC). These results indicate that an individual’s TS response can be expected to vary by 11 (for TS 60–120) to 22 points (for TS 0–120) around its true score between measurements, representing the natural variation of the test. Given the absence of comparative data, it is challenging to determine the significance of this score. Yet, within our protocol, the theoretical TS range spans from -100 (complete decrease in pain perception) to +100 (complete increase in pain perception). In this context, SEMeas of 11 to 22 points represent only 5.5 to 11% of the total potential score variation, which seems plausible to attain in response to an intervention or in the presence of pain or disease.
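Written out against that theoretical 200-point span (from -100 to +100):

$$\frac{11}{200} = 5.5\% \qquad\text{and}\qquad \frac{22}{200} = 11\%$$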

The MDC gives us another interesting piece of information. Our results indicate that a shift of 31 to 62 points on a 0–100 pain scale, depending on the calculation method, would be necessary to discern a genuine change. This is especially pertinent, for example, when implementing medication to alleviate TS in individuals grappling with persistent pain [8,42]. Further studies in this little-studied area would be valuable to identify the parameters that influence the measurement error of TS.

Concerning CPM, absolute reliability is rarely reported in studies [9]. Overall, the SEMeas resulting from our analyses are fairly consistent with previous studies using similar protocols, which reported SEMeas of around 10 to 15 (on a 0–100 pain scale) [35,40,41]. MDCs reported in the above-mentioned studies show the same tendency. We identified only one study reporting lower SEMeas, using the suprathreshold heat pain response as a test stimulus [43]. When CPM is calculated with PPT, our analyses (with pressure applied over the trapezius) reveal larger SEMeas than studies in which the stimulation was applied to the arm or to another part of the body [35,41,44]. It would therefore seem more appropriate to apply the test stimulus to the arm.

Strengths and limitations

This study has some limitations. The first concerns the repeated application of 2 sequential test stimuli (mechanical and thermal), which may have influenced the measurements; however, applying the PPT and HPS at distant sites likely helped minimize uncontrolled CPM effects. The lack of randomization between the mechanical and thermal modalities is a second limitation. We chose not to randomize the sequence of the 2 test stimuli, consistently beginning with the shorter mechanical test to minimize the time between the CPT and the second test stimulus [34]. Moreover, this study, like almost all other reliability studies, focused on short-term reliability; future studies looking at long-term reliability are needed. We also want to emphasize that these results are specific to the protocol presented. Given the clinical implications of these mechanisms, it would be valuable for TS and CPM measurement methods to be standardized, as the DFNS has done for static QST. Finally, as with any other measure of perception, it is important to keep in mind the subjective nature of pain measurement. Despite these limitations, our results align with previous findings. Strengths include alignment with expert group recommendations [4] and prevailing literature practices for the evaluation of TS and CPM [9,10]. The test-retest data were collected as part of a larger protocol involving more than 400 participants, reflecting the experimenters’ expertise, and our sample is relatively large compared with other reliability studies [9], which favors generalization.

Conclusion

Overall, our study shows that the reliability of measures of excitatory and inhibitory endogenous pain modulation is quite labile, probably because of the dynamic nature of these mechanisms. The method we have developed demonstrates relative reliability that is competitive with other protocols. The absolute reliability and minimal detectable change values reported here shed light on an under-documented but clinically important issue for future studies, notably regarding the personalization of pain medication. Further studies are needed to properly compare results and identify the factors involved in these mechanisms.

Acknowledgments

The authors would like to thank Amélie Têtu, Virginie Bolduc, Catherine Raynauld, Marie-Pier Houde, and Marie-Claude Battista of the Unité de recherche clinique et épidémiologique platform of the Centre de recherche du Centre hospitalier universitaire de Sherbrooke (Sherbrooke, Quebec, Canada) for their logistical and technical assistance in data collection and study coordination. The authors would also like to thank Mrs. Nathalie Ouellet and Mrs. Beatrice Debarges, patient partners, for their assistance in developing the research protocol.

References

  1. Marchand S. The Physiology of Pain Mechanisms: From the Periphery to the Brain. Rheumatic Disease Clinics of North America. 2008;34: 285–309. pmid:18638678
  2. Staud R, Craggs JG, Robinson ME, Perlstein WM, Price DD. Brain activity related to temporal summation of C-fiber evoked pain. Pain. 2007;129: 130–142. pmid:17156923
  3. Ramaswamy S, Wodehouse T. Conditioned pain modulation ‐ A comprehensive review. Neurophysiol Clin. 2021;51: 197–208. pmid:33334645
  4. Yarnitsky D, Bouhassira D, Drewes AM, Fillingim RB, Granot M, Hansson P, et al. Recommendations on practice of conditioned pain modulation (CPM) testing: CPM consensus meeting recommendations 2014. European Journal of Pain. 2015;19: 805–806. pmid:25330039
  5. Mendell LM. The Path to Discovery of Windup and Central Sensitization. Frontiers in Pain Research. 2022;3. Available: https://www.frontiersin.org/article/10.3389/fpain.2022.833104. pmid:35295805
  6. Le Bars D, Dickenson AH, Besson J-M. Diffuse noxious inhibitory controls (DNIC). I. Effects on dorsal horn convergent neurones in the rat. PAIN. 1979; 283–304.
  7. Yarnitsky D. Conditioned pain modulation (the diffuse noxious inhibitory control-like effect): its relevance for acute and chronic pain states. Current Opinion in Anaesthesiology. 2010;23: 611–615. pmid:20543676
  8. Bosma RL, Cheng JC, Rogachov A, Kim JA, Hemington KS, Osborne NR, et al. Brain Dynamics and Temporal Summation of Pain Predicts Neuropathic Pain Relief from Ketamine Infusion. Anesthesiology. 2018;129: 1015–1024. pmid:30199420
  9. Nuwailati R, Bobos P, Drangsholt M, Curatolo M. Reliability of conditioned pain modulation in healthy individuals and chronic pain patients: a systematic review and meta-analysis. Scandinavian Journal of Pain. 2022;22: 262–278. pmid:35142147
  10. Kennedy DL, Kemp HI, Ridout D, Yarnitsky D, Rice ASC. Reliability of conditioned pain modulation: a systematic review. PAIN. 2016;157: 2410–2419. pmid:27559835
  11. Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol. 2010;63: 737–745. pmid:20494804
  12. Beaulieu L-D, Flamand VH, Massé-Alarie H, Schneider C. Reliability and minimal detectable change of transcranial magnetic stimulation outcomes in healthy adults: A systematic review. Brain Stimul. 2017;10: 196–213. pmid:28031148
  13. Kong J-T, Johnson KA, Balise RR, Mackey S. Test-Retest Reliability of Thermal Temporal Summation Using an Individualized Protocol. The Journal of Pain. 2013;14: 79–88. pmid:23273835
  14. Weir JP. Quantifying test-retest reliability using the intraclass correlation coefficient and the SEM. J Strength Cond Res. 2005;19: 231–240. pmid:15705040
  15. Schambra HM, Ogden RT, Martínez-Hernández IE, Lin X, Chang YB, Rahman A, et al. The reliability of repeated TMS measures in older adults and in patients with subacute and chronic stroke. Front Cell Neurosci. 2015;9: 335. pmid:26388729
  16. Liljequist D, Elfving B, Skavberg Roaldsen K. Intraclass correlation ‐ A discussion and demonstration of basic features. PLoS One. 2019;14: e0219854. pmid:31329615
  17. Beckerman H, Roebroeck ME, Lankhorst GJ, Becher JG, Bezemer PD, Verbeek AL. Smallest real difference, a link between reproducibility and responsiveness. Qual Life Res. 2001;10: 571–578. pmid:11822790
  18. Borg DN, Bach AJE, O’Brien JL, Sainani KL. Calculating sample size for reliability studies. PM&R. 2022;14: 1018–1025. pmid:35596122
  19. Shi Y, Weingarten TN, Mantilla CB, Hooten WM, Warner DO. Smoking and Pain: Pathophysiology and Clinical Implications. Anesthesiology. 2010;113: 977–992. pmid:20864835
  20. Ditre JW, Zale EL, Heckman BW, Hendricks PS. A Measure of Perceived Pain and Tobacco Smoking Interrelations: Pilot Validation of the Pain and Smoking Inventory. Cogn Behav Ther. 2017;46: 339–351. pmid:27871214
  21. Horn-Hofmann C, Büscher P, Lautenbacher S, Wolstein J. The effect of nonrecurring alcohol administration on pain perception in humans: a systematic review. J Pain Res. 2015;8: 175–187. pmid:25960674
  22. Horn-Hofmann C, Capito ES, Wolstein J, Lautenbacher S. Acute alcohol effects on conditioned pain modulation, but not temporal summation of pain. PAIN. 2019. pmid:31276454
  23. Baratloo A, Rouhipour A, Forouzanfar MM, Safari S, Amiri M, Negida A. The Role of Caffeine in Pain Management: A Brief Literature Review. Anesth Pain Med. 2016;6. pmid:27642573
  24. Geber C, Klein T, Azad S, Birklein F, Gierthmühlen J, Huge V, et al. Test–retest and interobserver reliability of quantitative sensory testing according to the protocol of the German Research Network on Neuropathic Pain (DFNS): A multi-centre study. PAIN. 2011;152: 548. pmid:21237569
  25. Tousignant-Laflamme Y, Pagé S, Goffaux P, Marchand S. An experimental model to measure excitatory and inhibitory pain mechanisms in humans. Brain Research. 2008;1230: 73–79. pmid:18652808
  26. Vincenot M, Coulombe-Lévêque A, Sean M, Camirand Lemyre F, Gendron L, Marchand S, et al. Development and Validation of a Predictive Model of Pain Modulation Profile to Guide Chronic Pain Treatment: A Study Protocol. Frontiers in Pain Research. 2021;2. Available: https://www.frontiersin.org/article/10.3389/fpain.2021.606422. pmid:35295452
  27. Atkinson G, Nevill AM. Statistical methods for assessing measurement error (reliability) in variables relevant to sports medicine. Sports Med. 1998;26: 217–238. pmid:9820922
  28. Mailloux C, Beaulieu L-D, Wideman TH, Massé-Alarie H. Within-session test-retest reliability of pressure pain threshold and mechanical temporal summation in healthy subjects. Rushton A, editor. PLoS ONE. 2021;16: e0245278. pmid:33434233
  29. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1: 307–310. pmid:2868172
  30. Beaulieu L-D, Massé-Alarie H, Ribot-Ciscar E, Schneider C. Reliability of lower limb transcranial magnetic stimulation outcomes in the ipsi- and contralesional hemispheres of adults with chronic stroke. Clin Neurophysiol. 2017;128: 1290–1298. pmid:28549277
  31. Portney LG, Watkins MP. Foundations of Clinical Research: Applications to Practice. 3rd Edition. Pearson Education, Inc. New Jersey; 2009. Available: https://www.scirp.org/(S(lz5mqp453edsnp55rrgjct55))/reference/ReferencesPapers.aspx?ReferenceID=2002526.
  32. Koo TK, Li MY. A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. J Chiropr Med. 2016;15: 155–163. pmid:27330520
  33. Dontje ML, Dall PM, Skelton DA, Gill JMR, Chastin SFM. Reliability, minimal detectable change and responsiveness to change: indicators to select the best method to measure sedentary behaviour in older adults in different study designs. PLoS ONE. 2018;13. pmid:29649234
  34. Kovacevic M, Klicov L, Vuklis D, Neblett R, Knezevic A. Test-retest reliability of pressure pain threshold and heat pain threshold as test stimuli for evaluation of conditioned pain modulation. Neurophysiol Clin. 2021;51: 433–442. pmid:34304974
  35. Marcuzzi A, Wrigley PJ, Dean CM, Adams R, Hush JM. The long-term reliability of static and dynamic quantitative sensory testing in healthy individuals. Pain. 2017;158: 1217–1223. pmid:28328574
  36. Nothnagel H, Puta C, Lehmann T, Baumbach P, Menard M, Gabriel B, et al. How stable are quantitative sensory testing measurements over time? Report on 10-week reliability and agreement of results in healthy volunteers. J Pain Res. 2017;10: 2067–2078. pmid:28919806
  37. Rolke R, Baron R, Maier C, Tölle TR, Treede R-D, Beyer A, et al. Quantitative sensory testing in the German Research Network on Neuropathic Pain (DFNS): Standardized protocol and reference values. Pain. 2006;123: 231–243. pmid:16697110
  38. Naugle KM, Ohlman T, Wind B, Miller L. Test–Retest Instability of Temporal Summation and Conditioned Pain Modulation Measures in Older Adults. Pain Medicine. 2020;21: 2863–2876. pmid:33083842
  39. Middlebrook N, Heneghan NR, Evans DW, Rushton A, Falla D. Reliability of temporal summation, thermal and pressure pain thresholds in a healthy cohort and musculoskeletal trauma population. Calvo-Lobo C, editor. PLOS ONE. 2020;15: e0233521. pmid:32469913
  40. Gehling J, Mainka T, Vollert J, Pogatzki-Zahn EM, Maier C, Enax-Krumova EK. Short-term test-retest-reliability of conditioned pain modulation using the cold-heat-pain method in healthy subjects and its correlation to parameters of standardized quantitative sensory testing. BMC Neurology. 2016;16: 125. pmid:27495743
  41. Nuwailati R, Curatolo M, LeResche L, Ramsay DS, Spiekerman C, Drangsholt M. Reliability of the conditioned pain modulation paradigm across three anatomical sites. Scand J Pain. 2020;20: 283–296. pmid:31812949
  42. Yarnitsky D. Role of endogenous pain modulation in chronic pain mechanisms and treatment. PAIN. 2015;156: S24–S31. pmid:25789433
  43. Valencia C, Kindler LL, Fillingim RB, George SZ. Stability of conditioned pain modulation in two musculoskeletal pain models: investigating the influence of shoulder pain intensity and gender. BMC Musculoskelet Disord. 2013;14: 182. pmid:23758907
  44. Kennedy DL, Kemp HI, Wu C, Ridout DA, Rice ASC. Determining Real Change in Conditioned Pain Modulation: A Repeated Measures Study in Healthy Volunteers. J Pain. 2020;21: 708–721. pmid:31715262