MRI-based anatomical characterisation of lower-limb muscles in older women

The ability of muscles to produce force depends, among other factors, on their anatomical features and is altered by ageing-associated weakening. However, a clear characterisation of these features, highly relevant for older individuals, is still lacking. This study hence aimed at characterising muscle volume, length, and physiological cross-sectional area (PCSA) and their variability, between body sides and between individuals, in a group of post-menopausal women. Lower-limb magnetic resonance images were acquired from eleven participants (69 (7) y. o., 66.9 (7.7) kg, 159 (3) cm). Twenty-three muscles were manually segmented from the images and muscle volume, length and PCSA were calculated from this dataset. Personalised maximal isometric force was then calculated using the latter information. The percentage difference between the muscles of the two lower limbs was up to 89% and 22% for volume and length, respectively, and up to 84% for PCSA, with no recognisable pattern associated with limb dominance. Between-subject coefficients of variation reached 36% and 13% for muscle volume and length, respectively. Generally, muscle parameters were similar to previous literature, but volumes were smaller than those from in-vivo young adults and slightly higher than ex-vivo ones. Maximal isometric force was on average smaller than estimates based on linear scaling of ex-vivo-based literature values. In conclusion, this study quantified for the first time anatomical asymmetry of lower-limb muscles in older women, suggesting that symmetry should not be assumed in this population. Furthermore, we showed that a scaling approach, widely used in musculoskeletal modelling, leads to an overestimation of the maximal isometric force for most muscles. This calls into question the validity of this approach for older populations.
As a solution, the unique dataset of muscle segmentation made available with this paper could support the development of alternative population-based scaling approaches, together with that of automatic tools for muscle segmentation.


Introduction
The characterisation of the ability of individual muscles to produce force is of particular relevance in older individuals, for whom ageing-associated muscle loss (sarcopenia) can significantly affect the ability of a muscle to produce force [1][2][3]. The ability of a muscle to generate force depends on its fibre composition and characteristics, and on its structural and morphological features [4]. The loss of muscle strength at older ages has been explained by a reduction in muscle mass [5], an increase in the proportion of slower muscle fibres [4], and a higher percentage of intramuscular fat, in combination with a smaller physiological cross-sectional area (PCSA) [4,6].
Being able to quantify lower-limb muscle forces during dynamic tasks can help to understand the capability of an individual to control a movement, with relevant application in the prediction of the risk of falls and fractures in older people. Musculoskeletal (MSK) models have been increasingly adopted for this purpose [7][8][9][10]. These models, however, often rely on a number of assumptions about anatomical features of muscles, neglecting possible variability between subjects and within subject (i.e. body asymmetry), which is known to affect the accuracy of their outputs [11][12][13][14][15]. Additionally, muscle properties such as the maximal isometric force (F max) are often derived from cadaver-based dissection studies [16,17], hence neglecting population-specific features like the reduction of muscle strength in the elderly.
Estimated muscle forces obtained from MSK models are known to be sensitive to variations in architectural musculotendon parameters [18]. Moderate and muscle-specific sensitivity to F max has been previously reported in young adults, both for generic-scaled [19] and subject-specific models [13]. In these studies, F max was only made to vary within small ranges [19] or proportionally for all muscles [13], and the effects of muscle- or subject-specificity have not been investigated [20]. Additionally, MSK models do not typically account for the loss of muscle strength associated with ageing [3,21], which can be both subject- and muscle-specific [22].
In an attempt to overcome the above limitations, clinical measurements of muscle strength, such as those from hand-held dynamometer measurements of grip strength, can be integrated in MSK models [23]. These, however, provide an overall indication of strength rather than muscle-specific strength properties. Medical imaging such as Magnetic Resonance Imaging (MRI) has been successfully adopted for deriving individual muscle volume and muscle length through image segmentation [24]. The ratio of muscle volume and length is proportional to the muscle PCSA [25] and hence to the maximal isometric force a muscle can generate [26].
Except for tendon slack length, which cannot yet be quantified with any routine non-invasive technique, a full characterisation of the muscle parameters based on MRI is certainly feasible [27], but not commonly pursued due to the time and repeatability challenges associated with image processing. As a result, very little is known about specific characteristics of these parameters, especially in older individuals. The main aim of this study was to investigate lower-limb muscle anatomical characteristics, including volume, length, and PCSA, in a group of post-menopausal women. When enough knowledge about individual muscle parameters is available, this could be used to either build population-based statistical models [28] or, as recently proposed by Handsfield et al. [24] for young individuals, establish their relationship with the body mass or length, overcoming the need for segmenting individual muscles in future applications. The second aim of this paper was to verify the suitability of this approach and to provide the community with a fully characterised database including 3D muscle and bone geometries as obtained from lower-limb MRI from a group of post-menopausal women, in an attempt to foster the community efforts towards the development of automatic image processing and modelling tools.

Participants and data acquisition
Eleven post-menopausal women (mean (standard deviation, SD): 69 (7) y. o., 66.9 (7.7) kg, 159 (3) cm) with no movement limitations were recruited by the Metabolic Bone Centre, Northern General Hospital in Sheffield, UK as part of larger studies (Multisim and Multisim 2, EP/K03877X/1 and EP/S032940/1, https://epsrc.ukri.org). The inclusion criterion was a bone mineral density T-score at the lumbar spine or total hip (whichever was lower) of less than or equal to -1. Bone mineral density was measured by dual energy x-ray absorptiometry using a Discovery A densitometer (Hologic Inc., Bedford, MA, USA). Exclusion criteria were: body mass index (BMI) <18 or >35, history of or current conditions known to affect bone metabolism and bone mineral density, history of or current neurological disorders, prescription of oral corticosteroids for more than three months within the last year, history of any long-term immobilization (>3 months), conditions that prevent the acquisition of musculoskeletal images, use of medications or treatment known to affect bone metabolism other than calcium/vitamin D supplementation, and alcohol intake greater than 21 units per week. The study was approved by the East of England-Cambridgeshire and Hertfordshire Research Ethics Committee and the Health Research Authority and was conducted in accordance with the Declaration of Helsinki (October 2000). Written informed consent was obtained from all participants.
During a hospital visit, full lower-limb MRI was collected using a Magnetom Avanto 1.5 T scanner (Siemens, Erlangen, Germany). A T1-weighted scanning sequence was used with an echo time of 2.59 ms, a repetition time of 7.64 ms, a flip angle of 10 degrees and voxel sizes of 1.1x1.1x5.0 mm for the long bones and 1.1x1.1x3.0 mm for the joints. On this occasion, participants' lower-limb dominance was determined by asking them "If you kicked a football which foot would you use?" [29].

Data processing
Muscle segmentation. Lower-limb bones were segmented within the MRI scans using Mimics 20.0 (Materialise, Leuven, Belgium). In each limb, 30 muscles were segmented, initially using the automated muscle segmentation toolbox (Mimics Research 20.0, Materialise, Belgium), followed by manual adjustments when necessary. Inter-operator repeatability of the muscle segmentation procedure was assessed by calculating the ratio between SD and mean (referred to as coefficient of variation, CoV) of the muscle volumes (V M) calculated by three different operators on a subset of three participants. Following suggestions from the literature [30,31], CoV values below 10% can be considered acceptable. Using a conservative approach, for those muscles where the inter-operator CoV was higher than 5% we also performed an intra-operator analysis, asking the same operator to repeat the segmentation three times on the same dataset. Following the latter analysis, we discarded all the muscles with non-acceptable repeatability (CoV > 10%). The Psoas major muscle was removed from the repeatability study since it was partially cut off from the MRI field of view in some cases. Similarly, the foot extensors and flexors were not evaluated, since their external boundaries were not identifiable in many of the MRI datasets.
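As an illustration of the repeatability screening described above, the following Python sketch computes the CoV of a muscle volume across repeated segmentations and applies the 5% and 10% thresholds. The function names, the use of the sample SD (n - 1 denominator) and the example volumes are assumptions made for illustration; they are not taken from the study's own processing code, and the way CoV values were pooled across the three participants is not specified in the text.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return SD / mean as a percentage (sample SD, an assumption)."""
    return 100.0 * stdev(values) / mean(values)


def classify_repeatability(inter_op_volumes, intra_op_volumes=None,
                           review_threshold=5.0, accept_threshold=10.0):
    """Apply the two-step screening: inter-operator CoV first, then,
    only if that exceeds 5%, the intra-operator CoV against 10%."""
    inter_cov = coefficient_of_variation(inter_op_volumes)
    if inter_cov <= review_threshold:
        return "accepted"
    if intra_op_volumes is None:
        return "needs intra-operator analysis"
    intra_cov = coefficient_of_variation(intra_op_volumes)
    return "accepted" if intra_cov <= accept_threshold else "discarded"


if __name__ == "__main__":
    # Hypothetical volumes (cm^3) of one muscle from three operators
    # and from three repetitions by a single operator.
    inter = [80.0, 100.0, 120.0]
    intra = [95.0, 100.0, 105.0]
    print(coefficient_of_variation(inter))
    print(classify_repeatability(inter, intra))
```

In this sketch a muscle is retained either because operators agree closely or because a single operator is self-consistent, which mirrors the decision to use segmentations from one expert operator only.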
Calculation of the maximal isometric force. Two different approaches were used to calculate F max. Firstly, F max was linearly scaled based on lower-limb mass [32], as is typically done in MSK models when individual muscle geometries are not available (lower-limb mass-based scaling, LLMS). Secondly, F max was calculated as a function of muscle PCSA, calculated from individual muscle volumes and lengths (volume and length-based scaling, VLS).
In the LLMS approach [32], F max was linearly scaled to the lower-limb mass according to Eq (1):
$$F_{max} = F_{maxGen} \cdot \frac{m_{LL}}{m_{LLGen}} \qquad (1)$$
where m LL is the mass of the lower limbs of the subject, calculated as the product of the volume of the lower limbs (estimated from the MRI) and the density of the tissue [33], m LLGen is the mass of the lower limbs of the generic OpenSim model gait2392 [17] and F maxGen is the default F max of each muscle in the gait2392 model. An equivalent estimate of F max could be theoretically obtained in the absence of MRI by estimating m LL after a scaling procedure (e.g. using the Scaling Tool in OpenSim [34]).
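The LLMS scaling described above can be sketched in a few lines of Python. This is a minimal illustration that assumes a simple proportional scaling of the generic force by the ratio of lower-limb masses; the function name, argument names and all numerical values are hypothetical and do not come from the gait2392 model or from the study data.

```python
def llms_fmax(fmax_generic_n, lower_limb_volume_cm3,
              tissue_density_g_cm3, generic_lower_limb_mass_kg):
    """Scale a generic maximal isometric force (N) by lower-limb mass.

    Lower-limb mass is the product of limb volume and tissue density,
    converted from grams to kilograms.
    """
    lower_limb_mass_kg = lower_limb_volume_cm3 * tissue_density_g_cm3 / 1000.0
    return fmax_generic_n * lower_limb_mass_kg / generic_lower_limb_mass_kg


if __name__ == "__main__":
    # Hypothetical inputs: a 1000 N generic muscle, an 8000 cm^3 limb
    # with density 1.0 g/cm^3 and a 10 kg generic-model limb.
    print(llms_fmax(1000.0, 8000.0, 1.0, 10.0))
```

Because every muscle is multiplied by the same factor, this approach preserves the force distribution of the generic model and cannot represent muscle-specific atrophy.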
In the VLS approach, muscle segmentations were used to calculate the muscle volume (V M) and the anatomical muscle length (l M) was calculated as the length of the centreline from the 3D muscle segmentation. This was generated as the line connecting the points representing the topological skeleton of each muscle cross section in the 2D MRI slices. A smooth curve was fitted to the centreline using a moving average filter, with the span of the filter being selected individually for each muscle. Values for l M were then taken as the arc length of the fitted smoothed curve constituting the centreline of the 3D segmentations. All above computations were performed in MATLAB R2019b (The Mathworks Inc., Natick, MA, USA). V M and l M were then used to calculate the muscle PCSA according to Eq (2):
$$PCSA = \frac{V_M}{k \cdot l_M} \qquad (2)$$
where k is the ratio between a muscle's optimal fibre length ($l_0^f$) and its length, as taken from the literature [25].
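The centreline-length step described above can be illustrated with the following Python sketch. The study performed this step in MATLAB; the code below is not that implementation. It assumes a centred moving average whose window shrinks symmetrically near the ends (so the end points are left unchanged), and the function names and test points are hypothetical.

```python
import math


def moving_average(points, span):
    """Smooth a list of (x, y, z) points with a centred moving average.

    Near the ends the window shrinks symmetrically, so the first and
    last points are returned unchanged (an assumption of this sketch).
    """
    half = span // 2
    last = len(points) - 1
    smoothed = []
    for i in range(len(points)):
        h = min(half, i, last - i)
        window = points[i - h:i + h + 1]
        smoothed.append(tuple(sum(p[axis] for p in window) / len(window)
                              for axis in range(3)))
    return smoothed


def arc_length(points):
    """Sum of Euclidean distances between consecutive points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


if __name__ == "__main__":
    # Hypothetical noisy centreline: one point per slice, zigzag in x.
    raw = [(1.0 if i % 2 else -1.0, 0.0, float(i)) for i in range(7)]
    print(arc_length(raw), arc_length(moving_average(raw, 3)))
```

Smoothing matters here because slice-to-slice jitter in the skeleton points inflates the summed segment lengths; a larger span removes more jitter but can also flatten genuine curvature, which may be why the span was chosen per muscle.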
Values of V M and PCSA were compared to those available in the literature for healthy young adults [24] and cadavers [25,27].
F max was calculated as the product of the PCSA described in Eq (2) and the specific tension (σ = 61 N/cm² [16,35]) [26], according to Eq (3):
$$F_{max} = \sigma \cdot PCSA \qquad (3)$$
For the Glutei and Adductor magnus, 1/3 of the total F max value was attributed to each of the three bundles constituting the muscle and used for comparison to the values obtained with the LLMS method.
Statistical analysis. All variables were tested for normality using the one-sample Kolmogorov-Smirnov test in MATLAB and null hypotheses were then consistently tested using either a Student's t test in the case of normally distributed data or a Wilcoxon signed-rank test in the case of non-normally distributed data. To test the hypothesis of anatomical symmetry, V M, l M and PCSA of the muscles belonging to the right and left limb were compared. The percentage difference between the values in the right and left limb was also quantified for all the muscles and all the subjects. CoV was calculated for each muscle to quantify the inter-subject variability.
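The VLS force estimate described above (PCSA from muscle volume and length, multiplied by the specific tension, with an equal split across bundles for multi-bundle muscles) can be sketched as follows. The PCSA expression assumes that optimal fibre length equals k times the muscle length, which is consistent with the definition of k in the text; the function names and the example muscle values are hypothetical.

```python
SPECIFIC_TENSION_N_PER_CM2 = 61.0  # value adopted in the text


def pcsa_cm2(volume_cm3, muscle_length_cm, fibre_to_muscle_length_ratio):
    """PCSA as volume divided by optimal fibre length (k * muscle length)."""
    return volume_cm3 / (fibre_to_muscle_length_ratio * muscle_length_cm)


def vls_fmax(volume_cm3, muscle_length_cm, fibre_to_muscle_length_ratio,
             n_bundles=1):
    """Maximal isometric force (N) per bundle from volume and length."""
    area = pcsa_cm2(volume_cm3, muscle_length_cm,
                    fibre_to_muscle_length_ratio)
    return SPECIFIC_TENSION_N_PER_CM2 * area / n_bundles


if __name__ == "__main__":
    # Hypothetical muscle: 200 cm^3, 20 cm long, k = 0.5.
    print(vls_fmax(200.0, 20.0, 0.5))               # whole muscle
    print(vls_fmax(200.0, 20.0, 0.5, n_bundles=3))  # one of three bundles
```

Since k is taken from cadaveric literature rather than measured, any subject-specificity in this estimate comes only from the segmented volume and centreline length.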
Linear regressions were computed between total lower-limb muscle volume (V TOT, equal to the sum of the volumes of the muscles whose segmentation proved repeatable) and lower-limb mass, body mass, height, and BMI.
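For readers who wish to reproduce this type of analysis, a minimal ordinary least-squares sketch in Python is given below. It returns the slope, intercept and coefficient of determination for one predictor; the function name and the example data are hypothetical, and the significance testing reported in the study is not reproduced here.

```python
def linear_regression(x, y):
    """Ordinary least-squares fit y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2
                 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical body mass (kg) and total muscle volume (cm^3).
    body_mass = [55.0, 60.0, 65.0, 70.0, 75.0]
    v_tot = [2900.0, 3300.0, 3200.0, 3700.0, 3800.0]
    print(linear_regression(body_mass, v_tot))
```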
The effect of accounting for individual muscle geometry on the calculated F max was quantified by comparing the F max values obtained using the LLMS and VLS approaches. Percentage difference between F max estimated with the two methods was calculated. Significance level α was set to 0.05 for all statistical tests.
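The text does not state which of the two estimates was used as the reference for the percentage difference. The sketch below assumes, purely for illustration, that the difference is expressed relative to the VLS value, a convention under which negative values indicate a smaller F max in the VLS approach and can exceed 100% in magnitude. The function name and the example forces are hypothetical.

```python
def percentage_difference(fmax_vls, fmax_llms):
    """Percentage difference between VLS and LLMS estimates.

    Assumption: the VLS value is the reference (denominator), so the
    result is negative when the VLS estimate is the smaller one.
    """
    return 100.0 * (fmax_vls - fmax_llms) / fmax_vls


if __name__ == "__main__":
    # Hypothetical forces (N) for two muscles.
    print(percentage_difference(100.0, 276.0))  # VLS smaller than LLMS
    print(percentage_difference(100.0, 64.0))   # VLS larger than LLMS
```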

Muscle segmentation
The inter-operator analysis provided higher CoV than the intra-operator analysis (Table 1) for all the muscles tested. The Gastrocnemii and Vastus medialis were easily identifiable and led to very high inter-operator repeatability. The Peronei had the worst inter-operator CoV (close to 50%). Even though better results were found for the intra-operator analysis for the Peroneus brevis (CoV = 7.6%), this was not the case for the Peroneus longus (CoV = 10.9%), which was removed from further analysis together with the Gluteus minimus (CoV = 21.6%).
In light of the high inter-operator differences, only muscle segmentations generated by the same single expert operator were used for the following analyses.

Muscle anatomical parameters
According to the dominance test, all participants were right-limb dominant.
All investigated parameters were not normally distributed; therefore, non-parametric tests were selected for the statistical analysis. An evident intra- and inter-subject variability was observed. The percentage difference of V M between the two limbs was above 85% for the Gracilis in one subject and for the Rectus femoris in another subject. A significant difference between the two limbs was found for the V M of the Sartorius, Gluteus maximus, Adductor magnus, and Vastus lateralis, with lower values in the left limb. Between-subject CoV (see Supplementary material) ranged between 14% (Vastus medialis) and 36% (Sartorius).
Table 1. Inter- and intra-operator coefficient of variation (CoV) for muscle volume calculated by three operators (inter-op) and by one operator over three repetitions (intra-op).
The percentage difference of l M between the two limbs was up to 22% (Adductor brevis). A significant difference between the two limbs was observed for the l M of the Gluteus medius and Vastus lateralis, with lower values in the left limb. Between-subject CoV (see Supplementary material) ranged between 3% (Sartorius) and 13% (Gastrocnemius lateralis).
Mean and SD of the V M are reported in Table 2 for the sake of comparison with literature data. Overall, our values were higher than dissection-based muscle volumes from elderly cadavers [25] but smaller than muscle volumes from mixed-age cadavers [27] and MRI-based muscle volumes from healthy young adults [24] both of mixed sexes and females only.
Among the tested anthropometric parameters, height and lower-limb mass did not significantly correlate with V TOT (Fig 3), whereas BMI and body mass showed significant correlations, with coefficients of determination R² = 0.44 (p = 0.003) and R² = 0.50 (p = 0.004), respectively.
The percentage difference in PCSA (Fig 4) between right and left limb ranged between -84% for the Gracilis and Rectus femoris (with smaller PCSA in the right limb) and 38% for the Gracilis (with bigger PCSA in the right limb). Between-limb variations were below 20% for all the subjects only for the Gastrocnemius medialis, Gluteus maximus and medius, and Soleus. The values found for the participants in this study were similar to those reported by other authors for mixed-age/sex cadavers [25,27], but smaller than those from healthy young adults of both sexes [24] (Fig 5).

Maximal isometric force
The F max calculated from the VLS approach differed from that of the LLMS by up to 400% (Biceps femoris short head) for individual subjects (Fig 6), with overall smaller estimates of F max with the VLS model. On average, the percentage difference between the two approaches was between -176% (for the Iliacus where F max was smaller in the VLS) and 36% (for the Adductor magnus II where F max was bigger in the VLS). Differences were found significant for all muscles except for Gluteus maximus I and III, Adductor magnus III, Biceps femoris long head, Semimembranosus, Rectus femoris, and Peroneus brevis.

Discussion
This study aimed to quantify lower-limb muscle anatomical characteristics from medical images in a group of post-menopausal women. To this purpose, the 3D geometries of 23 lower-limb muscles segmented from MRI from a cohort of eleven post-menopausal women were used to assess inter- and intra-individual differences and compared to existing literature data. The use of image segmentation for the calculation of muscle parameters is complicated by the time and repeatability challenges associated with this technique. However, broadening the knowledge of muscle anatomical characteristics could support the development of tools (e.g. population-based statistical models [28] or regression models [24]) to overcome the need for segmentation.
Table 2. ... [27] and from twenty-one elderly cadavers (83 (9) y.o.) by Ward et al. [25]. * With participants we here refer to the number of limbs considered independently; the gender of the participants is also reported. † Muscle mass values reported by Ward et al. [25] were multiplied by a muscle density of 1.056 g/cm³ as suggested by the authors in [36].
https://doi.org/10.1371/journal.pone.0242973.t002
This is, to our knowledge, the first study providing a quantification of lower-limb muscle volumes and lengths in older women, and a thorough assessment of the differences observable both between body sides and across individuals. An ultrasound-based study quantified up to 24% of muscle thickness asymmetry in abdominal muscles in healthy individuals of different ages [37], suggesting that analogous results could be expected in the lower limbs. When comparing the two limbs of each subject in our cohort, we observed differences of up to 85% for V M and of up to 22% for l M (Figs 1 and 2). Except for very few muscles (Sartorius, Gluteus maximus, Adductor magnus, Vastus lateralis), which were significantly bigger on the right side, no recognisable pattern associated with limb dominance was observed across the cohort. In fact, both muscle volumes and lengths were notably variable in the population. This clearly indicates that care should be taken in assuming limb symmetry when assigning musculotendon parameters, even in healthy populations.
Even though different approaches to the image segmentation may have affected the estimate of the muscle parameters, the comparison to MRI-based values from the literature [24] led to valuable insights. Despite the average height and weight of our participants being smaller than those previously reported for an ex-vivo cohort [25], slightly larger V M were found (Table 2). This could be explained by the loss in muscle mass in cadavers [36]. On the contrary, our V M was smaller than that estimated in-vivo from MRI in healthy young adults (25.5 (11.1) y. o.) [24], which might be explained by both younger age and mixed-sex participants. In fact, when isolating the female component from the young population, smaller average V M and SD were still observed in our cohort for all the muscles. This explains the smaller inter-subject variability (as quantified by CoV) found in our study, i.e. between 14% (Vastus medialis) and 36% (Sartorius), compared to literature values for healthy young mixed-sex adults (quantified between 20% and 40% from the reported mean and SD) [25] and even more when isolating the female component (except for the Tensor fasciae latae muscle). The V M calculated from our cohort remained consistently smaller than those from young females, except for the Vastus intermedius, likely due to ageing-related muscle volume loss [3,21].
Table caption: Median (minimum, maximum) of the physiological cross-sectional area (PCSA) for 23 lower-limb muscles for eleven subjects in our study (n = number of limbs). PCSA values are derived from the segmented V M and l M using the average optimal fibre length to muscle length ratio proposed by Ward et al. [25]; * PCSA of the Tensor fasciae latae was calculated setting the optimal fibre length to muscle length ratio equal to 1 (as proposed by Handsfield et al. [24]) since the actual values were not available from the literature source. Minimum and maximum percentage difference across the subjects is reported for each muscle.
In order to overcome the need for individual muscle segmentations to estimate muscle-specific parameters in MSK models, Handsfield et al. [24] proposed a series of regression equations linearly correlating individual muscle volume to participants' total lower-limb muscle volume, body mass and height. Lower correlations were quantified in this study (Fig 3), likely due to having included only 23 instead of 35 lower-limb muscles. This discrepancy could also be due to preferential weakening or atrophy of certain muscles caused by ageing [38], a hypothesis which seems to be confirmed by the lower volumes found in our cohort when compared to younger females. Surprisingly, V TOT correlated more strongly with total body mass than with lower-limb mass, suggesting that scaling muscle forces based on lower-limb mass (LLMS) [32,39] might not be a suitable approach in an older population, and a simple scaling to body mass should be preferred in the absence of MRI.
The maximal force that a muscle can produce is highly affected by its PCSA [26]. Since optimal fibre length could not be calculated from available MRI data, the PCSA was here calculated by scaling l M according to ex-vivo literature values from an older population [25]. This led to PCSA values in agreement with literature [24,25,27], except for bigger values for the Gluteus maximus and smaller values for the Iliacus (Fig 5). The PCSA of the Sartorius muscle presented a between-subject CoV of 37%, due to high variability in its volume and small variability in its length. This was also the only muscle showing significantly different F max between the body sides at group level, with larger values in the dominant limb. Previous studies highlighted intra-subject variability in the tendon-to-muscle belly length ratio as well as in the location of the widest part of the muscle along its axis [40], therefore confirming our findings.
Fig 5. ... [27]. Blue diamonds represent mean PCSA values for twenty-one cadavers as calculated by Ward et al. [25] and divided by the cosine of the mean pennation angle reported by the same authors. Green squares with deviation error bars represent PCSA values estimated by Handsfield et al. [24] from MRI segmentation of thirty-two healthy young adults. https://doi.org/10.1371/journal.pone.0242973.g005
The specific tension (σ) of a muscle also contributes to the estimate of F max. The choice of setting σ to 61 N/cm² was suggested by previous literature where this value was proposed for elderly populations [16,35]. Sensitivity of models to this parameter was previously tested by Valente et al. [13], finding a moderate effect on the model output. To maximally personalise muscle parameters, individual values for the specific tension should be obtained for different subjects and different muscles; however, such a measure is not currently available in-vivo. The use of a dynamometer could provide further insight into the specific tension of muscle groups and overcome this limitation.
Estimated F max values were overall significantly smaller when based on V M than when linearly scaled to lower-limb mass (Fig 6), except for the Adductor magnus, Vastus intermedius and medialis, Gastrocnemius medialis, and Soleus, which, on the contrary, presented significantly higher values for the VLS approach. Declining muscle strength has been observed from the age of fifty [3] and a reduction by 20% of F max has been quantified in older people aged seventy [41]. This could explain the smaller F max obtained from individual V M (when volume loss associated with ageing was taken into account) compared to a scaling approach. This also confirms previous literature suggesting that a scaling approach might only be appropriate if starting from values from a sex- and age-matched population [28].
The choice of F max highly impacts the output of MSK models [42,43], since a change in an individual muscle's ability to produce force alters the solution of the static optimisation problem [42], affecting both individual muscle force estimates and the resulting joint contact force. A previous study found limited sensitivity of muscle forces and joint contact forces to F max [32], estimating its values based on scaling of literature values or using Handsfield's regression equations. Ackland et al. [19] studied the effect of variation between +10% and −10% of the F max nominal value, reporting no significant changes in the model output. However, in their study, they did not account for actual muscle geometry to estimate F max, which proved to cause variation up to 400% in our study when compared to scaling approaches. This suggests that calculating individual F max from MRI segmentations could affect the estimates of muscle forces and joint contact forces on a larger scale than reported in the literature and lead to more accurate estimates. This supports the conclusions from Arnold et al. [16] that tuning individual muscle parameters might provide estimates of internal forces that compare better to experimental measurements. Further studies are needed to confirm this hypothesis.
This study had some limitations. Out of the 35 muscles commonly included in lower-limb MSK models, only 23 were included in the study, as these were not significantly affected by operator-related error in the segmentation. Muscle segmentation is a time-consuming (10 hours per subject on average for this study) and operator-dependent procedure, therefore further effort should be put into developing automated algorithms based on machine learning [28] for the segmentation of individual muscles or statistical shape modelling-based approaches for the extraction of muscle volume and muscle centreline/length. The dataset associated with this paper is publicly available, which will likely foster advances in this field, i.e. acting as a reference atlas.
The cohort enrolled for this study included eleven participants; a larger sample size would be needed to ensure generalisability of the results observed here. Our results suggest that muscle asymmetry could be higher in older adults due to age-related processes. However, this finding is based on comparison to literature [24], where data were obtained following a slightly different methodology. Therefore, a wider study, including a control group of younger women, should be designed to prove our hypothesis.
In an attempt to preserve a degree of subject-specificity in the muscle parameters, PCSA was calculated from muscle volume and length. Nonetheless, due to the impossibility of estimating the optimal fibre length from the implemented MRI sequence, the required ratio between optimal fibre length and muscle length was taken from cadaveric data. Diffusion tensor imaging recently proved to be a valuable option to enable both muscle segmentation and the estimate of fibre length [27]. Further studies are needed to understand whether this technique might be included within an MSK modelling imaging protocol to overcome this limitation.
In conclusion, this study uniquely proved the existence of significantly large muscle-and subject-specific asymmetry in muscle volume, length, and PCSA. This suggests that individual differences in muscle geometry must not be neglected, and inter-limb symmetry cannot be assumed in older women. Personalised muscle characteristics should be accounted for in MSK models aiming at investigating dynamic tasks such as walking, where strength asymmetry plays an important role in older women. This could be of substantial relevance when internal forces are used in clinical contexts, such as prediction of osteoporotic risk of fracture.
Supporting information S1