Validity and reliability of handgrip dynamometry in older adults: A comparison of two widely used dynamometers

Background Among older adults, decreased handgrip strength is associated with greater risk of frailty, and loss of physical function, mobility, lean mass, and overall muscular strength and power. Frailty is also associated with sarcopenia, for which handgrip strength measurement has been recommended for diagnostic purposes. Specific cutoff points for diagnosis have been identified, but use of different devices may affect measurement. Therefore, to assess validity and reliability, we compared the two most frequently used devices, the Jamar hydraulic and Smedley spring handgrip dynamometers. Methods Sixty-seven older (76.2 ± 0.9 years) men (n = 34) and women (n = 33) completed two trials of handgrip strength measurement on sequential days (T1, T2) using both devices in random order. Intraclass correlations were used to assess test-retest reliability, and Bland-Altman analysis was used to assess validity as the level of agreement between devices. Results There were significant (p < 0.001) relationships between devices at T1 (r = 0.94) and T2 (r = 0.94), and strong (p < 0.001) intraclass correlations were observed for both devices (Jamar = 0.98; Smedley = 0.96), indicating excellent reliability. However, there were significant differences between devices. Strength measured with the Jamar was greater than with the Smedley at both T1 (27.4 ± 1.4 vs. 23.4 ± 1.1 kg, p < 0.001) and T2 (25.3 ± 1.4 vs. 21.8 ± 1.2 kg, p < 0.001). Bland-Altman analysis confirmed these differences. Subgroup analysis to evaluate the effect of gender and age indicated that in women and old-old (>75 years) participants, differences between devices were closer to zero for both measurements than in men and young-old (65–75 years) participants. Conclusions Our results demonstrate that despite excellent reliability, there is poor agreement between devices, indicating a lack of validity. For use as a diagnostic tool, standardization and device-specific cutoff points for handgrip dynamometry are needed.


Background
In middle and older-aged adults, handgrip strength predicts all-cause and disease-specific mortality, including mortality related to cardiovascular disease, chronic obstructive pulmonary disease, and cancer. Among older adults, decreased handgrip strength is also associated with greater risk of frailty, and loss of physical function, mobility, lean mass, and overall muscular strength and power [8][9][10][11][12][13][14]. Handgrip strength is generally recognized as a surrogate measure of whole-body strength and can be used clinically to assess age-related deterioration in function and health status associated with frailty [3,8,14].
Frailty and loss of function and health are also associated with sarcopenia, a geriatric syndrome characterized by loss of muscle and strength [15]. Globally, the prevalence of sarcopenia among adults aged 60 years and over is estimated to be at least 10% [16]. Sarcopenia not only predicts mortality among community-dwelling and acutely ill older adults [17][18][19], but is also related to functional decline, loss of independence, and hospitalization [20][21][22]. Exercise interventions can successfully prevent and reverse muscle loss and functional decline [23,24], but clinical assessment is needed to identify older adults who are at risk [16].
Muscle strength is a biomarker for sarcopenia [25], and handgrip strength measured with dynamometry has been recommended for diagnostic purposes [26,27]. However, although absolute and precise gender-specific cut points for normal handgrip strength have been identified, these cut points do not consider potential differences between measurement devices. Currently, there is no universally agreed-upon device or procedure for clinical measurement [26,27]. In fact, a systematic review of handgrip measurement protocols found incomplete reporting of both the procedures and the devices used [28]. The Jamar hydraulic dynamometer is widely used, but multiple other devices are available for clinical and research purposes [29]. Recent systematic reviews of handgrip strength named at least 10 different devices used for measurement [4,6,30]. Among these, the Jamar hydraulic dynamometer and Smedley spring dynamometer were the most frequently identified [4,6,30].
There are similarities and differences between the two dynamometers. Both devices weigh approximately 0.66 kg and provide force measurements up to 90 kg. However, the Jamar hydraulic dynamometer displays force on an analog dial marked in 2-kg increments, so finer values between the markers must be interpolated by the operator. By comparison, the Smedley has a digital display that reports force to the nearest 0.1 kg, eliminating operator interpretation. Both devices have adjustable handles to modify grip size, although the Jamar grip is concave while the Smedley grip is straight. Finally, the Jamar is metal, so its surface can feel cooler to the touch than that of the Smedley, which is plastic.
The differences between these two devices may influence the validity and reliability of measurement. To date, we have found only one study comparing the Jamar and Smedley dynamometers in older adults [31]. Although measurements obtained by the two devices were similar, they were statistically different and influenced by gender and age. Specifically, differences were greater in women than in men, and in older than in younger participants [31]. Moreover, only a single trial was used for comparison, so reliability over time could not be evaluated. Therefore, to assess validity and reliability, we compared sequential grip strength measurements in older adults over a two-day period using a Jamar hydraulic (Patterson Medical, USA) versus a Smedley spring (Takei Scientific Instruments, Japan) handgrip dynamometer. Our secondary aim was to evaluate the effect of gender and age on agreement between devices.

Design
The current study was part of a larger study that has previously been described [32]. Briefly, this was an empirical 2 × 2 design. Participants completed two measurement sessions on consecutive days (T1, T2) using two devices (Jamar, Smedley). T1 measurements were scheduled in the middle of the day when participants were normally fed and hydrated. T2 measurements were scheduled in the early morning of the following day when participants were fasting (i.e., without food or fluids for at least eight hours). This design was specifically intended to elicit a loss of muscular strength and function between T1 and T2, which allowed the researchers to assess both reliability and validity when measurements changed over time. The order of testing for each device was randomized across participants and sessions.
Although interrater reliability for handgrip dynamometry is good to excellent [33], all data were obtained by the same investigator to avoid any potential differences. Ethical approval was obtained from the University of Colorado Colorado Springs Institutional Review Board, and all participants provided signed informed consent prior to enrollment.

Participants
Sixty-seven community-dwelling older adults (76.2 ± 0.9 years) volunteered and completed both measurement sessions. Inclusion criteria were age 65 years or older, non-smoking, and able to stand up and ambulate independently or with an assistive device. The only exclusion criterion was the inability to hold the dynamometer and maintain correct positioning during measurement. No participants were excluded from the study.

Measurements
Anthropometrics
Body measurements were obtained using standardized procedures [34]. Before measurement, participants were asked to void and to remove their shoes and all excess clothing. Weight was measured to the nearest 0.2 kg, and height and waist circumference were measured to the nearest 0.5 cm.

Handgrip Dynamometry
For all measurements, the grip width on the Jamar was standardized to the second position (5.0 cm), which has been found to maximize strength production in the majority of adults regardless of age, body mass, or hand dimensions [35,36]. The grip width of the Smedley was also adjusted to 5.0 cm for uniformity between devices. Consistent with recommendations for handgrip dynamometry by the American Society of Hand Therapists and previous research [10,31], participants sat in a chair with the device held in their dominant hand, their arm supported on a table or other stable surface, their wrist in a neutral position, and their elbow bent at a 90° angle. This procedure has been reported to have high test-retest reliability [37]. Participants then squeezed the device once, as hard as possible, for 3 seconds. A single attempt was used for each device to avoid muscle fatigue and the loss of strength attributed to multiple attempts [36,38,39]. After a 2-minute rest, participants repeated the same measurement procedure with the second device. The maximal force exerted with the Smedley was recorded to the nearest 0.1 kg from the digital readout. The maximal force exerted with the Jamar was read to the nearest 2.0 kg from the markers on the analog dial and then estimated by the investigator to the nearest 0.5 kg based on the position of the gauge needle between the 2-kg markers.

Statistical Analysis
Data were analyzed using SPSS version 27.0 (IBM Corporation, USA) and are reported as mean ± SE with 95% CI unless otherwise indicated. Statistical significance was set at p < 0.05. Analysis of variance (ANOVA) was used to assess differences between devices at T1 and T2. Pearson correlations were used to evaluate the association between the two devices at T1 and T2, and intraclass correlations (ICC) were used to assess test-retest reliability. For purposes of this analysis, ICC values between 0.8 and 0.9 were considered good, and values above 0.9 were considered excellent [40]. Bland-Altman analysis was used to assess the level of agreement between devices by plotting the between-device differences against the mean of the two devices, with limits of agreement set at the mean difference ± 2 SD [41]. Plots were visually assessed for characteristics demonstrating good agreement, including a mean difference close to zero, a uniform distribution of differences over the range of measurement, and 95% of differences within ± 2 SD [41]. Finally, to examine the effect of gender and age on agreement, data were stratified by age (young-old 65-75 years, old-old > 75 years) and gender (male, female).
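The agreement, reliability, and stratification steps described above can be sketched in a few lines of Python. This is an illustrative re-implementation, not the SPSS procedure used in the study. Two assumptions should be noted: the text does not state which ICC form was computed, so the sketch uses ICC(3,1) (two-way mixed, consistency, single measures), and the limits of agreement follow the ± 2 SD convention given in the text. The helper names (`bland_altman`, `icc_3_1`, `age_group`, `mean_difference_by_stratum`), the record layout, and the example readings are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(a, b, k=2.0):
    """Bland-Altman summary for paired readings a (e.g., Jamar) and b (e.g., Smedley), in kg.

    Returns the bias (mean of a - b), the SD of the differences, limits of
    agreement at bias +/- k*SD, and the fraction of differences inside them.
    k=2.0 mirrors the +/- 2 SD limits in the text; 1.96 is also common.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two paired readings of equal length")
    diffs = [x - y for x, y in zip(a, b)]
    means = [(x + y) / 2 for x, y in zip(a, b)]  # x-axis values of the plot
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the differences
    lower, upper = bias - k * sd, bias + k * sd
    within = sum(lower <= d <= upper for d in diffs) / len(diffs)
    return {"bias": bias, "sd": sd, "lower": lower, "upper": upper,
            "within": within, "diffs": diffs, "means": means}


def icc_3_1(t1, t2):
    """ICC(3,1): two-way mixed, consistency, single measures (Shrout & Fleiss).

    Rows are participants; the two columns are sessions T1 and T2.
    """
    rows = list(zip(t1, t2))
    n, k = len(rows), 2
    grand = mean(v for row in rows for v in row)
    ss_total = sum((v - grand) ** 2 for row in rows for v in row)
    ss_rows = k * sum((mean(row) - grand) ** 2 for row in rows)
    ss_cols = n * sum((mean(col) - grand) ** 2 for col in zip(*rows))
    ss_error = ss_total - ss_rows - ss_cols  # residual (participant x session)
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


def age_group(record):
    """Strata from the text: young-old 65-75 years, old-old > 75 years."""
    return "young-old" if record["age"] <= 75 else "old-old"


def mean_difference_by_stratum(records, key):
    """Mean Jamar - Smedley difference within each stratum defined by key(record)."""
    groups = {}
    for r in records:
        groups.setdefault(key(r), []).append(r["jamar"] - r["smedley"])
    return {g: mean(d) for g, d in groups.items()}


if __name__ == "__main__":
    # Hypothetical readings (kg) for illustration only; these are not study data.
    jamar = [30.0, 28.0, 25.0, 22.0]
    smedley = [26.0, 24.0, 22.0, 18.0]
    ba = bland_altman(jamar, smedley)
    print("bias", ba["bias"], "limits", ba["lower"], ba["upper"], "within", ba["within"])
    print("ICC(3,1)", icc_3_1([30, 28, 25, 22], [28, 26, 23, 20]))
```

Choosing a consistency ICC means that a uniform drop from T1 to T2, such as the one the fasting design was intended to induce, does not by itself lower the reliability estimate; an absolute-agreement form such as ICC(2,1) would penalize that shift.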

Participant Characteristics
Thirty-four men and 33 women (age range 65-96 years) completed the study. There were no differences between men and women for age or body mass index (BMI). However, men had significantly greater (p < 0.001) height, body weight, waist circumference, and handgrip strength (Table 1).

Validity
There were significant differences between devices at both T1 and T2 (Table 1). At T1, there was an average (± SD) difference of 4.1 ± 4.2 kg (p < 0.001) between the Jamar and Smedley dynamometers, and at T2 the average (± SD) difference was 3.5 ± 4.0 kg (p < 0.001) (Table 2). Bland-Altman analysis indicated poor agreement between the Jamar and Smedley dynamometers at T1 and T2 (Fig. 2). The mean differences were not close to zero, although the mean difference was closer to zero at T2 than at T1, and the distribution of differences was not uniform over the range of either measurement. Also, although 97% of differences fell within ± 2 SD at T1, only 94% of differences fell within ± 2 SD at T2.

Effect of Gender and Age
When data were stratified by gender, mean differences were not statistically different between men and women at either T1 or T2, indicating no effect of gender, although differences in women were closer to zero for both measurements (Table 2). Visual inspection of Bland-Altman plots found women to cluster at the lower end of the range of strength measurement, reflecting lower absolute grip strength, but with generally similar differences between devices to those observed for men (Fig. 3). Interestingly, all differences falling outside of ± 2 SD from mean values were for men.
When data were stratified by age, mean differences were not statistically different at T1, indicating no effect of age on initial measurement. However, at T2 the mean difference between devices in young-old participants was statistically greater than the mean difference in old-old participants (Table 2), indicating a possible age effect. Visual inspection of Bland-Altman plots found differences in old-old participants to cluster somewhat closer to zero along the range, reflecting smaller mean differences (Fig. 4). Clustering was more evident at T2, consistent with the statistically significant difference from young-old participants for that measurement.

Discussion
To our knowledge, this is the first study to evaluate both the validity and reliability of the Jamar and Smedley handgrip dynamometers, the two most widely used dynamometers for research purposes.
Significant differences in handgrip strength were observed between devices at both timepoints, indicating poor validity that was confirmed by visual inspection of Bland-Altman plots. However, intraclass correlations for both devices were excellent, indicating excellent reliability for both devices.
Previously, Guerra and Amaral [31] compared the Jamar and Smedley dynamometers at one time point only in a sample of 55 older adults between 65 and 99 years of age. They reported a correlation coefficient of r = 0.83 with a mean difference of 3.2 kg, which is similar to but slightly smaller than our current findings. In contrast to our findings, they reported that the level of agreement between the two devices was poorer for women than for men, and for old-old than for young-old participants. In our sample, women demonstrated better agreement than men, and old-old participants demonstrated better agreement than young-old participants. Differences in participant characteristics may have contributed to these discrepancies. Our sample was somewhat younger, with an average age of 76.2 years, compared to 79.2 years for their sample. Furthermore, our sample was evenly distributed between men and women, while theirs was predominantly (76%) female.
Guerra and Amaral [31] attributed discrepancies in measurement to an interaction between participant and dynamometer characteristics. As previously described, there are both similarities and differences between the Jamar and Smedley dynamometers. Although they may be considered subtle, these differences could be sufficient to cause the discrepancies in measurement observed in both studies, while differences between our samples may have been sufficient to cause the inconsistent effects of gender and age. For example, the ability to exert handgrip strength is influenced by pain or discomfort [13], so the design of each dynamometer may be important. More research in this area is needed.
It has been suggested that increasing handgrip strength may independently improve physical and functional resilience, resulting in improved health outcomes [2]. A meta-analysis of 25 studies, including almost 200,000 adults with an average age of 65 years or older, demonstrates that increasing handgrip strength by only 1 kg reduces mortality due to heart disease by 30% [6]. This is an intriguing finding, but without clear guidelines for handgrip strength measurement, accurate clinical assessment is difficult and appropriate implementation of interventions is problematic.
Current diagnostic criteria for sarcopenia in older adults differentiate handgrip strength cut points based on gender [26,27]. This is supported by the absolute differences between men and women that we observed. However, the diagnostic cut points ignore differences between devices, which are highlighted by our current findings. Specific cut points of < 27 kg for men and < 16 kg for women are recommended by the European Working Group on Sarcopenia in Older People (EWGSOP) [26], and < 26 kg for men and < 18 kg for women are recommended by the Asian Working Group for Sarcopenia (AWGS) [27], but they are not specific to the device used for measurement. In contrast to its handgrip strength criteria, the AWGS has now differentiated device-specific cut points for muscle mass [27]. It provides separate cutoff values for skeletal muscle mass measured by either dual-energy x-ray absorptiometry (DXA) or bioelectrical impedance analysis (BIA), in recognition that both are widely used technologies with numerous advantages but without absolute agreement. Based on our findings, we believe that device-specific cut points for handgrip strength are also needed. This is consistent with the EWGSOP's key recommendation that simple, specific cutoff points be developed for the measures used to identify sarcopenia [26].
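To make the misclassification concern concrete, the sketch below applies the sex-specific cut points quoted above to hypothetical readings from one man measured on both devices, with a gap of 4 kg, close to the mean Jamar-Smedley difference observed at T1. The readings and the `low_grip` helper are hypothetical; the code only shows how a single device-agnostic threshold behaves, not how device-specific cut points should be derived.

```python
# Handgrip cut points (kg) as quoted in the text; grip below the value counts as low.
CUT_POINTS = {
    "EWGSOP": {"male": 27.0, "female": 16.0},
    "AWGS": {"male": 26.0, "female": 18.0},
}


def low_grip(grip_kg, sex, criteria="EWGSOP"):
    """Return True when grip_kg falls below the sex-specific cut point."""
    return grip_kg < CUT_POINTS[criteria][sex]


if __name__ == "__main__":
    # Hypothetical man, same hand and day, measured on two devices (not study data).
    readings = {"Jamar": 28.0, "Smedley": 24.0}
    for criteria in CUT_POINTS:
        for device, grip in readings.items():
            status = "low" if low_grip(grip, "male", criteria) else "normal"
            print(criteria, device, grip, status)
```

Under both sets of criteria, the Jamar reading is classified as normal and the Smedley reading as low, even though both come from the same person.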
We recognize that there are limitations to our study. Our sample size was relatively small, although it was larger than that of the only previous study we could find that compared the Jamar and Smedley dynamometers in older adults. Furthermore, a small sample is primarily of concern due to the risk of a type II error. The significant differences we observed appear to rule out this concern in relation to our findings. It is also possible that use of another procedure for handgrip measurement may have obviated differences between the two devices. As previously noted, multiple procedures are reported in the literature, including variations in arm positioning and in the number of attempts used for measurement. The influence of these variations should be explored in future research.

Conclusion
The mean differences we observed between the Jamar and Smedley, two of the most widely used handgrip dynamometers, could result in misdiagnosis either for or against sarcopenia (i.e., either a false positive or a false negative). Nevertheless, as a diagnostic tool, handgrip dynamometry has numerous advantages, including low cost, portability, rapid results, and ease of use. However, without device-specific cut points or a universally agreed-upon device for handgrip dynamometry, sarcopenia treatment and research may be impeded. Because individual labs and clinics may not have access to all types of available devices, device-specific cut points for handgrip dynamometry would appear to offer the greater advantage and are recommended.

Fig. 4 Bland-Altman plots for differences against mean values of handgrip strength measured with the Jamar and Smedley dynamometers at T1 (top) and T2 (bottom), stratified by age (young-old = empty circles, old-old = solid circles).