Dynamic MRI to quantify musculoskeletal motion: A systematic review of concurrent validity and reliability, and perspectives for evaluation of musculoskeletal disorders

Purpose To report evidence for the concurrent validity and reliability of dynamic MRI techniques to evaluate in vivo joint and muscle mechanics, and to propose recommendations for their use in the assessment of normal and impaired musculoskeletal function. Materials and methods The search was conducted on articles published in Web of Science, PubMed, Scopus, Academic Search Premier, and Cochrane Library between 1990 and August 2017. Studies that reported the concurrent validity and/or reliability of dynamic MRI techniques for in vivo evaluation of joint or muscle mechanics were included after assessment by two independent reviewers. Selected articles were assessed using an adapted quality assessment tool and a data extraction process. Results for concurrent validity and reliability were categorized as poor, moderate, or excellent. Results Twenty articles fulfilled the inclusion criteria with a mean quality assessment score of 66% (±10.4%). Concurrent validity and/or reliability of eight dynamic MRI techniques were reported, with the knee being the most evaluated joint (seven studies). Moderate to excellent concurrent validity and reliability were reported for seven out of eight dynamic MRI techniques. Cine phase contrast and real-time MRI appeared to be the most valid and reliable techniques to evaluate joint motion, and spin tag for muscle motion. Conclusion Dynamic MRI techniques are promising for the in vivo evaluation of musculoskeletal mechanics; however, results should be interpreted with caution since validity and reliability have not been determined for all joints and muscles, nor for many pathological conditions.


Introduction
The term 'musculoskeletal disorder' refers to conditions, diseases, and injuries of bones, joints and muscles. Musculoskeletal disorders can result from neurological diseases (e.g. stroke, cerebral palsy) and orthopaedic disorders (e.g. anterior cruciate ligament injuries, osteoarthritis) that alter the human musculoskeletal system and impair its functions. The world-wide prevalence of musculoskeletal disorders is high, and they cause 21.3% of the total years lived with disability (ranked second after behavioral and mental health problems) [1][2][3]. Currently, standard static MRI sequences are used to provide a clinical diagnosis and an understanding of bone and tissue pathology. However, from a functional perspective, it could be hypothesized that abnormal or altered musculoskeletal mechanics cause musculoskeletal disorders. Furthermore, previous research has shown that images of static joint positions do not provide a comprehensive evaluation of the dynamic musculoskeletal system [4][5][6][7][8][9]. As a result, clinical, or even surgical, treatments may be inappropriate. Understanding normal and impaired musculoskeletal function during motion is a high radiological, biomechanical and clinical priority. Accurate and reliable in vivo measurement of functional mechanics of the musculoskeletal system is thus necessary: 1) to understand normal joint mechanics in asymptomatic individuals, 2) to predict, detect or diagnose musculoskeletal disorders (e.g. scapholunate subluxation), and 3) to determine appropriate treatments for disorders using evidence-based analysis.
Dynamic MRI techniques were originally developed for cardiovascular imaging to quantify blood flow and to study heart valve functions [10]. Dynamic MRI sequences for the quantification of functional joint motion were developed in the early 1990s [11][12][13]. As more dynamic sequences are being developed, they are becoming an integral part of image-based musculoskeletal modeling pipelines that rely heavily on dynamic imaging data to input joint kinematic parameters and predict patient-specific outcomes [14]. However, controversial results have been reported for dynamic MRI-based studies of joint mechanics in comparison with static studies. For example, the Achilles tendon moment arm determined using dynamic MRI by Sheehan [15] differed markedly at larger ankle angles from the values previously reported by Manganais and colleagues [16], who used static image-based calculations. Despite an abundance of existing literature on dynamic MRI [14,17], no systematic reviews of the validity of these techniques have been carried out. Such a review is necessary to guide researchers and clinicians in the selection of the best available and validated techniques.
Concurrent validity and reliability provide valuable information for the interpretation of data. The aim of this systematic review was to report evidence of validity and reliability of dynamic MRI techniques to quantify in vivo joint and muscle mechanics. The global aim of this work was to identify gaps in the literature, to propose recommendations for the assessment of both healthy and impaired musculoskeletal function using current dynamic MRI techniques, and to make suggestions for future research in this field.

Database search strategy
Articles published between 1990 and August 2017 were identified through a systematic search of the following five databases: (1) Web of Science, (2) PubMed, (3) Scopus, (4) Academic Search Premier, and (5) Cochrane Library. In order to ensure the search was systematic, the following combinations of keywords were used: 1) keywords relative to acquisition method: "MRI", "cine", "dynamic", "volumetric", "velocity", "in vivo"; 2) "muscle", "joint", "bone"; 3) "kinematics", "displacement"; and 4) keywords relative to metrological properties: "accuracy", "reliability", "repeatability", "validity". The guidelines by Sampson and McGowan [18] were used to reduce search errors. Search strings were formulated and tailored to the search syntax of each database to ensure a common search strategy (S1 Appendix). All keywords were truncated to check for variants in PubMed, then the search was carried out without truncation. In this paper, validity refers to concurrent validity [19], that is, the measurement error in joint kinematics or skeletal muscle motion properties between a reference method and the dynamic MRI method under evaluation. Reliability refers to intra/inter-rater/session reliability [20] of the dynamic MRI method used in the study.
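The way the four keyword groups might be combined into a single Boolean search string can be sketched as follows. This is a minimal illustration only: the group labels, the choice of OR within a group and AND between groups, and the function name `build_query` are assumptions, and the actual database-specific strings are those given in S1 Appendix.

```python
# Keyword groups as listed in the search strategy. The group labels and the
# Boolean syntax are illustrative assumptions, not the strings of S1 Appendix.
KEYWORD_GROUPS = {
    "acquisition": ["MRI", "cine", "dynamic", "volumetric", "velocity", "in vivo"],
    "anatomy": ["muscle", "joint", "bone"],
    "motion": ["kinematics", "displacement"],
    "metrology": ["accuracy", "reliability", "repeatability", "validity"],
}


def build_query(groups):
    """OR the terms inside each group, then AND the groups together."""
    clauses = []
    for terms in groups.values():
        quoted = " OR ".join(f'"{term}"' for term in terms)
        clauses.append(f"({quoted})")
    return " AND ".join(clauses)


query = build_query(KEYWORD_GROUPS)
print(query)
```

In practice, each database would need its own field tags and truncation symbols, which is why the strings were tailored per database.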

Study selection process
After removing duplicates from the search results, the titles and abstracts of the remaining studies were assessed by two reviewers independently to determine if they fulfilled the inclusion criteria. To be included in the review, studies had to fulfil three criteria: (1) the study was performed using a dynamic MRI technique, (2) the study focused on joints or skeletal muscles and/or a moving phantom that mimicked joint or muscle movement, and (3) the study focused directly on quantifying concurrent validity and/or reliability. Exclusion criteria were: (1) the article was not published in English, (2) the article was categorized as a systematic or narrative review article, an editorial, a letter to the editor, or an abstract from conference proceedings, and (3) the article focused on moving or rotating phantoms that did not mimic skeletal joints or muscles. In the case of disagreement, consensus was reached by discussion.
To complete the review process, the references of the selected articles were also checked and articles found were included in the final selection. Four categories of data were extracted and presented in standardized tables: study population and joint/muscle studied, study description, dynamic tasks performed, dynamic MRI parameters, and results of concurrent validity and/or reliability.

Quality assessment of selected studies
To the authors' knowledge, no standardized tool for the assessment of quality of articles in this field currently exists. Thus, a customized quality assessment tool was developed based on three previously reported quality assessment tools for radiology and biomechanics related studies: 1) QUADAS-a tool for quality assessment of studies of diagnostic accuracy [21], 2) STROBE statement (STrengthening and Reporting of OBservational studies in Epidemiology) [22], and 3) quality assessment tools developed in recent systematic reviews of validity and reliability of joint motion analysis [23] and radiological assessment of hip geometry [24].
Two categories of quality were rated for each selected article (Table 1): 1) intrinsic quality (Questions 1 to 11, Table 1), based on questions related to the study design, quality of reporting the methodology, and quality of reporting the results and findings/conclusion (maximum score 24); and 2) metrological evidence (Questions 12 to 17, Table 1), based on the questions related to quality of reporting the outcome measures and quality of metrological evidence to support the conclusions (maximum score 22). The total score (maximum 46) was converted into a percentage and named QAS (Quality assessment score). All QAS values were rounded to the nearest integer for simplicity.
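The conversion of the two sub-scores into a QAS can be sketched as below. The maxima (24 and 22) come from the text; the function name and the example sub-scores are hypothetical and do not correspond to any reviewed article.

```python
MAX_INTRINSIC = 24      # Questions 1 to 11 (intrinsic quality)
MAX_METROLOGICAL = 22   # Questions 12 to 17 (metrological evidence)


def quality_assessment_score(intrinsic, metrological):
    """Return the QAS as a percentage rounded to the nearest integer."""
    if not 0 <= intrinsic <= MAX_INTRINSIC:
        raise ValueError("intrinsic score out of range")
    if not 0 <= metrological <= MAX_METROLOGICAL:
        raise ValueError("metrological score out of range")
    total = intrinsic + metrological
    return round(100 * total / (MAX_INTRINSIC + MAX_METROLOGICAL))


# Hypothetical sub-scores, for illustration only.
example_qas = quality_assessment_score(18, 12)
print(example_qas)
```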

Data analysis
Two observers independently reviewed the selected articles and rated the QAS. In case of significant disagreements in scores, consensus was reached by discussion. The QAS rated the overall quality of the selected article. To assess concurrent validity of techniques, the values of the results reported in the article were analyzed. Validity was considered excellent if errors were less than one millimeter or degree or cm/second, moderate if errors were in the order of one millimeter or degree or cm/second, and poor if errors were around, or greater than, two millimeters or degrees or cm/second. We acknowledge that this categorization has not been validated; however, we used it to provide clarity when reporting the results. For the assessment of reliability, a Kappa coefficient (K), linear regression coefficient (r) or intraclass correlation coefficient (ICC) between 0 and 0.60 was considered as poor, 0.60-0.80 as moderate, and 0.81-1.0 as excellent [25][26][27][28]. Due to the different statistical methods used in each article, it was impossible to directly compare or group the results. Thus, the results for validity and reliability were directly reported from the articles.

Table 1. Quality assessment score (QAS) questionnaire used to evaluate the quality of each selected article. Columns: Quality, Question, Score, Criteria.
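The validity and reliability categories described in the data analysis can be sketched as two small functions. The text defines the validity bands only approximately ("in the order of one", "around, or greater than, two"), so the exact cut-offs below (1 and 2 units for validity; 0.60 counted as moderate and anything above 0.80 as excellent for reliability) are assumptions made for illustration.

```python
def classify_validity(error):
    """Categorize an absolute error given in mm, degrees, or cm/s.

    Assumed cut-offs: < 1 excellent, 1 to < 2 moderate, >= 2 poor.
    """
    if error < 0:
        raise ValueError("error must be non-negative")
    if error < 1.0:
        return "excellent"
    if error < 2.0:
        return "moderate"
    return "poor"


def classify_reliability(coefficient):
    """Categorize a Kappa, r, or ICC value between 0 and 1.

    Assumed boundaries: < 0.60 poor, 0.60 to 0.80 moderate, > 0.80 excellent.
    """
    if not 0.0 <= coefficient <= 1.0:
        raise ValueError("coefficient must lie between 0 and 1")
    if coefficient < 0.60:
        return "poor"
    if coefficient <= 0.80:
        return "moderate"
    return "excellent"
```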

Results
The literature search identified 15854 articles from electronic databases, 6358 of which remained after removing duplicates. After screening titles and abstracts, 73 articles were found to be potentially eligible. Twenty articles were finally selected after verification of inclusion and exclusion criteria (Fig 1). The data were then summarized in four tables. Table 2 provides a description of study populations and designs, Table 3 provides details of tasks and measurement methods, Table 4 reports concurrent validity measures and Table 5 reports reliability measures. In the 20 studies, 1.5T and/or 3.0T MRI scanners were used, from the three major original equipment manufacturers (Philips, GE and Siemens), and for both open and closed bore types of scanner. This systematic review adheres to the PRISMA guidelines and a PRISMA checklist is available as supplementary material (S2 Appendix).
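The selection flow reported above implies the following numbers of records removed at each stage. This is simple arithmetic on the reported counts, assuming no records were added between stages; the variable names are illustrative.

```python
identified = 15854            # records retrieved from the five databases
after_deduplication = 6358    # records remaining once duplicates were removed
potentially_eligible = 73     # records retained after title/abstract screening
included = 20                 # articles in the final selection

duplicates_removed = identified - after_deduplication
excluded_at_screening = after_deduplication - potentially_eligible
excluded_on_criteria = potentially_eligible - included

print(duplicates_removed, excluded_at_screening, excluded_on_criteria)
```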

Quality assessment
The mean QAS of all the selected articles was 66% (± 10.46%) (Table 2). Two of the selected articles had a QAS of 80% or more and both these studies reported the concurrent validity of a real-time dynamic MRI technique [31,34]. Six studies had a QAS between 70% and 80% [30,42,45-48]. Seven studies had a QAS ranging from 60% to 70% [29,36-39,41,43]. Three studies had QASs between 50% and 60% [35,40,44]. The other two studies had QASs of 48% [32,33]. All the articles selected are presented to provide an all-inclusive review of the available literature on the metrological assessment of dynamic MRI techniques. Details of the scores of each article are provided in the supporting document S3 Appendix.

Concurrent validity and reliability
Four studies [30,34,36,39] (mean QAS 73%) evaluated the concurrent validity of the technique in question using a moving phantom and later determined its reliability on healthy volunteers (Tables 4 and 5). Seven studies [29,32,35,40,44,46] (mean QAS 55%) evaluated only concurrent validity either using a moving phantom or another imaging technique as a gold standard. The names of the sequences are reported as stated in the respective articles. The knee joint was the most frequently studied (seven studies), followed by the ankle and temporo-mandibular joints (two studies each), and the shoulder, wrist and hip joints (one study each). Three articles studied upper limb muscles and three studied lower limb muscles.

Joint evaluations
Measurement of knee joint mechanics. Seven articles studied the knee joint (Tables 3 and 6). Among all the cine PC MRI techniques used, in-plane mean concurrent validity was excellent and out-of-plane mean concurrent validity was moderate to excellent [30,44] (mean QAS 64%) on a 3.0T scanner. Furthermore, Benham et al. [30] reported that between no signal averaging and two signal averages, translational accuracy increased by as much as 3.5 times, whereas rotational accuracy remained unchanged. Reliability of the cine PC MRI technique was reported by comparing knee kinematics (patellofemoral and tibiofemoral) from two acquisitions collected during the same session. Reliability was moderate for rotations and excellent for translations [30,43]. Intra-observer reliability was excellent and inter-observer reliability was poor for bisect offset and patellar tilt, respectively [34] (QAS 85%).

... within each session was recommended to produce adequate ICC values for bisect offset and patellar tilt, whereas an average of four measurements was recommended to yield consistent sulcus angles. For cine MRI, concurrent validity and reliability for tibiofemoral kinematic tracking were both excellent, using a 3.0T scanner [36]. The same study also reported excellent concurrent validity for determining tibiofemoral cartilage contact location.
Sheehan and colleagues [45] (QAS 79%) reported moderate intra-subject reliability for the evaluation of ankle joint kinematics using cine PC MRI on a 3.0T scanner. Clarke et al. [31] (QAS 80%) used ultrafast MRI to study the Achilles tendon moment arm using the 'geometric method' of measuring the distance from the joint axis to the muscle-tendon line-of-action and reported poor concurrent validity on a 3.0T scanner.
Measurement of temporo-mandibular joint (TMJ) mechanics. Since standard static clinical examinations cannot reliably assess TMJ disorders, dynamic MR imaging has become standard in the evaluation of TMJ problems. Two studies carried out metrological evaluation of dynamic MRI sequences based on quantitative parameters of TMJ mechanics (Tables 3 and 7). For the dynamic HASTE sequence (half-Fourier acquired single-shot turbo spin-echo) acquired on a 1.5T scanner, Wang and colleagues [47] (QAS 73%) reported excellent reliability for the evaluation of maximal TMJ opening and closing. Zhang and colleagues [48] (QAS 71%) used real-time MRI with a radial data encoding scheme, and reported excellent reliability for visual assessment of the dynamic positions of the TMJ.
Measurement of shoulder, hip, and wrist joint mechanics. The metrological properties of dynamic MRI sequences at the shoulder, hip and wrist joints were each assessed in one study. For real-time MRI techniques, moderate reliability was reported for shoulder joint kinematics using a 1.

Skeletal muscle mechanics
Six studies evaluated skeletal muscle motion using three different dynamic MRI techniques (Tables 2, 6 and 7). A spin tag or tagged MRI sequence was used in three studies [39,40,46] (mean QAS 62%), a cine PC MRI sequence was used in three studies [32,33,46] (mean QAS 55%), and a real-time PC MRI sequence was used in one study [29] (QAS 65%).
Two studies [32,33] (mean QAS 48%) evaluated the concurrent validity of a velocity-encoded cine PC MRI technique. In the first study [32] (QAS 48%), the authors reported excellent concurrent validity and excellent prediction of the sinusoidal displacements of a moving phantom, and in the second study [33], they reported moderate concurrent validity for 2D trajectory-tracking of skeletal muscles. Asakawa and associates [29] (QAS 65%) compared real-time PC MRI with cine PC MRI to determine the velocities of the biceps brachii, and found moderate concurrent validity for peak velocity values within the volunteers.

Discussion
This systematic review reports current evidence regarding the metrological properties of dynamic MRI techniques for the measurement of joint and muscle mechanics. Eight dynamic MRI techniques identified from 20 selected articles were reported. Image acquisition techniques, output parameters, post-processing requirements, and metrological outcomes varied across studies. Moderate to excellent concurrent validity and reliability were reported for various MRI techniques in different studies for joints, moving phantoms, and muscles. However, only four out of 20 selected studies included subjects with musculoskeletal disorders; thus, evidence for the metrological parameters of these techniques in clinical practice is currently lacking. Based on the current level of metrological evidence, the most valid and reliable techniques appear to be cine-PC and real-time MRI for joint mechanics and spin tag MRI for muscle mechanics.

Joint kinematics
The findings of this systematic review highlight that the concurrent validity of the different dynamic MRI techniques has not been evaluated for all joints (Tables 6 and 7). Concurrent validity was mostly evaluated using moving phantoms (Table 4), whereas reliability studies involved repeated measures in the same subject, or reporting observer reliability with image processing (Table 5). Overall, the largest number of joints were studied using cine PC and real-time MRI (three for cine PC and four for real-time), with good to excellent levels of validity. For knee joint kinematics, concurrent validity (2 studies, [30,44]) and reliability (2 studies [30,43]

Skeletal muscle tracking
Many musculoskeletal and neurological disorders lead to changes in muscle properties and function that are still not well understood. Skeletal muscle tracking can be used to evaluate shear strain, tensile strain, and strain rate, along with regional deformations [32] and thus could play a major role in understanding the pathophysiology of muscle disorders. However, very few studies and research groups use dynamic MRI techniques to study skeletal muscle disorders. For example, dynamic MRI techniques have been employed to determine impaired muscle mechanics in the Achilles tendon [61], gastrocnemius [62,63] and soleus muscles [63]; however, the validity of these techniques has scarcely been reported. Spin tag MRI is the only technique that consistently showed excellent concurrent validity and reliability for both upper and lower limb muscles. Tagged MRI sequences allow the measurement of deformation by tracking a tagged pattern on the muscles [39,46]. No other dynamic MRI techniques were used for muscle tracking/strain/displacement except cine PC MRI [46] and real-time PC MRI [29]. Furthermore, non-invasive measurement of the mechanical properties of muscles requires detailed in vivo measurements of skeletal muscle deformation. Thus, although the results of this study suggest spin-tag MRI is currently the most valid and reliable technique for the evaluation of muscle, further studies are required to confirm this.

Limitations-Systematic review
This systematic review presents some limitations. The review protocol was not registered a priori in an international prospective register of systematic reviews, such as PROSPERO (https://www.crd.york.ac.uk/PROSPERO/). We did not use MeSH terms in the search strategy as MeSH terms were not consistent across the search engines and some search engines do not have a controlled vocabulary (e.g., Web of Science). However, the search strategy was cross-checked for common errors, according to the guidelines by Sampson et al. [18], and was made reproducible by providing the search strings used for each database (S1 Appendix). Nevertheless, it is possible that certain keywords or word variants were missed. Certain databases, such as the Cochrane Library, automatically search for word variants in terms of linguistic variants, spelling (British vs American) variants, or even non-standard plural variants; the other databases do not have this function, which could be a potential limitation of the search. Another limitation of this review was that the questionnaire (Table 1) used to determine QAS was not validated, although it was based on validated questionnaires. Thus, the QAS should be interpreted with caution.

Limitations and improvements-Metrological studies
The main limitation of this review was the heterogeneity of MRI parameters, experimental designs, methods employed, and non-reported parameters due to manufacturer-specific sequences, which made it impossible to use a common scale for comparison. Even if studies used the same sequences, the parameters were heterogeneous since they are scanner dependent. Thus, although we recommend use of certain techniques, we cannot recommend a generalized set of parameters. To understand basic differences in these techniques, a brief methodological overview for each of these techniques with their trade names used by different manufacturers is provided in the S4 Appendix. Furthermore, not only did the metric quantification methods differ, but different statistical methods were also used to report concurrent validity (coefficient of regression (r), standard deviation, absolute differences, root mean square error, mean error values, etc.) and reliability (standard deviation, absolute differences, intraclass correlation coefficients, kappa statistics, root mean square, etc.).
Most in vivo tests were conducted on healthy volunteers. Only four studies (Table 3) included subjects with musculoskeletal disorders [37,42,46,47], and the data acquired were mostly used for feasibility or proof of concept. Despite the challenges relating to magnetism and scanner bore size constraints, it is now possible to mimic standing in an open MRI scanner or weight-bearing in a closed scanner. These conditions are considered to increase understanding of musculoskeletal disorders [17,64]. The literature suggests that researchers have succeeded in determining in vivo healthy joint kinematics for weight-bearing [65][66][67] and non-weight-bearing conditions [15,68-71] that would evoke joint pain in pathological populations. However, there are no studies of concurrent validity and reliability in persons with musculoskeletal disorders and abnormal joint kinematics. Future studies to evaluate dynamic MRI techniques should therefore involve patients with musculoskeletal disorders or mimic pathology.
With regard to the statistical analysis, which is a key point when reporting metrological studies, no exhaustive recommendations are available. However, for future reliability studies, we recommend reporting the standard error of measurement (SEM) or the minimal detectable difference of the measures. Reporting these metrics would allow readers and users to judge whether an observed difference reflects a true change or measurement error [27]. Furthermore, none of the studies carried out an a priori sample size calculation. This is important to ensure the study has adequate power [72,73].
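As an illustration of the recommended metrics, the sketch below uses two commonly used formulations: SEM = SD × √(1 − ICC), and a 95% minimal detectable difference of 1.96 × √2 × SEM. These formulas are standard in the reliability literature but are not taken from the reviewed articles, and the input values are hypothetical.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM from the between-subject standard deviation and the ICC."""
    if sd < 0 or not 0.0 <= icc <= 1.0:
        raise ValueError("sd must be >= 0 and icc must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_difference(sem, z=1.96):
    """Smallest difference exceeding measurement error (95% level by default)."""
    return z * math.sqrt(2.0) * sem


# Hypothetical example: SD of 2.0 degrees and ICC of 0.84.
sem = standard_error_of_measurement(sd=2.0, icc=0.84)
mdd = minimal_detectable_difference(sem)
print(f"SEM = {sem:.2f} deg, MDD95 = {mdd:.2f} deg")
```

With these hypothetical inputs, a change smaller than the computed minimal detectable difference could not be distinguished from measurement error at the 95% level.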
This review highlighted that the best way to evaluate the concurrent validity of dynamic MRI was by using motion phantoms that mimic joints or muscles. The search strategy identified three studies [74][75][76] that reported the concurrent validity of cine PC MRI by using the known movement of specially designed motion phantoms, without mimicking joint or muscle motion. Since these studies did not fit the aim of this systematic review, they were not included in the selected articles. We highly recommend the use of joint or muscle motion mimicking phantoms to evaluate all the dynamic MRI sequences using a single scanner in order to evaluate their concurrent validity.

Future development
Future developments in this field can be classified into two categories: MRI sequence and post-processing techniques. Dynamic MRI sequences are evolving rapidly with advances in imaging technology. The typical fast imaging sequences based on balanced steady state free precession techniques, originally used for cardiac exams, are insufficient to obtain a total volume acquisition within a single breath hold for cardiac MRI [77]. A number of strategies have been developed to further reduce the acquisition time. These include, but are not limited to, 1) k-t BLAST/SENSE (Sensitivity Encoding)/ASSET (Array coil Spatial Sensitivity Encoding) [78,79], 2) k-t FOCUSS [80], 3) parallel imaging techniques like GRAPPA (Generalised autocalibrating partially parallel acquisition)/ARC (Autocalibrating Reconstruction for Cartesian imaging) [81], and 4) echo-planar imaging (EPI) [78,82]. All these imaging techniques and sequences are promising for the investigation of joint and muscle mechanics.
Although the focus of this review was not improving post-processing techniques, post-processing is key with regard to the feasibility and clinical utility of dynamic MRI. One such area that should be targeted is artifacts produced by eddy currents. In all types of imagery, eddy currents produce typical image artifacts that include image shearing, image scaling, and global position shifts. Thus, it is important to minimize the systematic error induced by eddy currents, which is possible using several techniques including 1) slotted coils and shields to interrupt current loops, 2) active shielding of gradients, and 3) image post-processing to correct for frequency/phase shifts. None of the selected articles reported the use of any of these techniques to minimize the eddy current error. However, one of the non-selected phantom studies [74] stated the use of post-processing techniques to reduce eddy current error.

Perspectives for the evaluation of musculoskeletal disorders
Dynamic MRI-based evaluation of musculoskeletal disorders could have a major impact on the understanding of the pathomechanics of the musculoskeletal system and could help to guide surgery [37] and rehabilitation [83]. Individuals with musculoskeletal disorders often experience joint pain and/or weakness during simple daily tasks or motions. Pain-inducing tasks would provide the most relevant dynamic MRI data; however, if such tasks are used, it is essential that the technique is quick and non-repetitive. While cine-PC and real-time MRI techniques stand out for the evaluation of skeletal joint mechanics, their use in the clinical setting is limited. For example, cine-PC MRI needs tasks to be repeated for up to two minutes (Tables 4 and 5) to acquire dynamic data. This is inappropriate in the case of pain. Real-time MRI can acquire dynamic data in a single cycle; however, it requires slower joint motion, making the movement quasi-static. Future studies should focus on eliminating these limitations.
The most difficult challenge is to obtain physiological joint loading conditions inside the constrained space of the scanner, whether a horizontal closed-bore system or an upright open-bore system. Weight-bearing MRI of joints is suggested to identify conditions that are otherwise challenging to diagnose using non-weight-bearing MRI [64]. Weight-bearing joint kinematics are different from non-weight-bearing kinematics [4,5,7,9,84,85]. Furthermore, weight-bearing joint kinematics are load dependent and change significantly with variations of the applied load [86]. Active in vivo joint kinematics are significantly different from passive or static analyses [8,87]. To reproduce physiological joint loading, special loading fixtures are needed, which makes the experimental set-up complex and uncomfortable. Moreover, it is difficult to derive accurate and reliable joint kinematics from the acquired images because the quality of dynamic MR images is always lower than for static images. This is because fast image acquisition sequences with lower TR and TE values are typically used for dynamic MRI. Standardized processes for weight-bearing MRI have not yet been defined and their use for diagnosis, treatment and post-surgical follow-up remains to be specified.
In summary, dynamic MRI techniques may have potential to be used as clinical tools (for diagnosis or follow-up). However, there is a lack of metrological evidence for their use in the evaluation of musculoskeletal disorders. Moreover, due to the high costs involved, lack of standardization, lack of research demonstrating diagnostic value, and post-processing time and complexity, manufacturers are not developing and including standardized dynamic sequences for the study of musculoskeletal disorders. Thus, the role of dynamic MRI for the diagnosis of challenging cases is currently uncertain, and this technique is at an early stage of development. At best, dynamic MRI techniques can be used in the research setting to answer clinically important research questions such as understanding pain mechanisms [88] or evaluating functional anatomy [55,71]. Nevertheless, the results of this study regarding the validity and reliability of dynamic MRI techniques for the assessment of the musculoskeletal system are encouraging.