The introduction of low-cost optical 3D motion tracking sensors provides new options for effective quantification of motor dysfunction.
The present study aimed to evaluate the Kinect V2 sensor against a gold standard motion capture system with respect to accuracy of tracked landmark movements and accuracy and repeatability of derived clinical parameters.
Nineteen healthy subjects were concurrently recorded with a Kinect V2 sensor and an optical motion tracking system (Vicon). Six different movement tasks were recorded with 3D full-body kinematics from both systems. Tasks included walking in different conditions, balance and adaptive postural control. After temporal and spatial alignment, agreement of movement signals was described by Pearson’s correlation coefficient and signal-to-noise ratios per dimension. From these movement signals, 45 clinical parameters were calculated, including ranges of motion, torso sway, movement velocities and cadence. Accuracy of parameters was described as absolute agreement, consistency agreement and limits of agreement. Intra-session reliability of 3 to 5 measurement repetitions was described as repeatability coefficient and standard error of measurement for each system.
Accuracy of Kinect V2 landmark movements was moderate to excellent and depended on movement dimension, landmark location and performed task. Signal-to-noise ratios provided information about Kinect V2 landmark stability and indicated larger noise in feet and ankles. Most of the derived clinical parameters showed good to excellent absolute agreement (30 parameters showed ICC(3,1) > 0.7) and consistency (38 parameters showed r > 0.7) between both systems.
Given that this system is low-cost, portable and does not require any sensors to be attached to the body, it could provide numerous advantages over established marker-based or wearable-sensor-based systems. The Kinect V2 has the potential to be used as a reliable and valid clinical measurement tool.
Citation: Otte K, Kayser B, Mansow-Model S, Verrel J, Paul F, Brandt AU, et al. (2016) Accuracy and Reliability of the Kinect Version 2 for Clinical Measurement of Motor Function. PLoS ONE 11(11): e0166532. https://doi.org/10.1371/journal.pone.0166532
Editor: Natasha M. Maurits, Universitair Medisch Centrum Groningen, NETHERLANDS
Received: June 9, 2016; Accepted: October 31, 2016; Published: November 18, 2016
Copyright: © 2016 Otte et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All files are available from the open science framework database (https://osf.io/5jpyh/).
Funding: The company “Motognosis UG” provided support in the form of salaries for authors KO, BK and SMM, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.
Competing interests: Authors SMM and AUB hold stocks of Motognosis UG. Motognosis UG filed for a patent (DE201410013828) using Kinect technology in postural control. Authors TSH, FP and JV report nothing to disclose. This does not alter our adherence to PLOS ONE policies on sharing data and materials.
Kinematic movement analysis has contributed valuable insights into the physiology of movement coordination. It is also used to describe specific impairments of motor function in detail and thus augments clinical diagnosis. Being an objective, quantitative technique, it has been claimed in some applications to track changes in motor function over time more accurately than clinical ratings. This could make clinical ratings feasible with affordable measurement solutions that do not require trained staff and may thus be applied outside the clinical setting. Information on the accuracy of the methods used by these solutions is a fundamental prerequisite for their clinical application. Kinematic movement analysis is most often based on spatiotemporal data of defined anatomical locations. For clinical applications, these data are usually transformed into clinically meaningful and interpretable parameters, such as gait speed, range of limb movements and amount of body sway during stance. Here, we explore the suitability of a commercially available motion sensor for clinical movement analysis.
Since the release of the first version of the Kinect sensor in November 2010, this markerless tool has been used in different research scenarios as a low-cost alternative to time-of-flight sensors (e.g. SR-4000 CW10 by Mesa technologies) and motion tracking systems (e.g. Vicon, Optotrak). The second generation Kinect (Kinect V2), released in September 2014, is an RGB-Depth (RGB-D) sensor that emits a grid of infrared light. The distance of objects within the camera’s recording range is calculated from time-of-flight analysis of the reflected light, which yields a depth model of the surrounding structures. Based on machine learning techniques, the software development kit (SDK) of the Kinect V2 detects human shapes (up to six people at once). It further provides an artificial skeleton of 25 anatomical landmarks (‘Kinect joints’) projected into these shapes based on depth data. The V2 improves on the V1 in several respects: it provides depth data with higher spatial resolution, an increased measurement range (0.5–4.5m instead of 0.8–4m) and an increased number of tracked landmarks (25 instead of 21). Kinect V1 and V2 have been proposed for the quantification of motor symptoms, for example in posturography [3, 4], gait analysis [5–7] and quantification of hypokinesia in Parkinson’s disease. Gait analysis with multi-camera or single-camera setups [5, 6] in healthy subjects suggested high accuracy for gait speed, stride length and stride time, but lower accuracy for other parameters such as stride width or speed variability. The accuracy of functional movement parameters of standing balance seemed to depend on the observed movement amplitude [3, 4]. However, analysis was confined to trunk landmarks in these studies. Comprehensive studies on Kinect V2 landmark movement accuracy [8, 9] pointed to differences in signal accuracy depending on landmark location and the direction of the performed movements.
Importantly, retest reliability was not generally lower with Kinect than with other motion tracking systems. In addition, investigations of functional movement parameters in a patient cohort found similar accuracy in Parkinson’s disease patients and healthy subjects with Kinect V1.
With the present study, we aim to provide a comprehensive evaluation of Kinect V2 accuracy for further development of Kinect V2 into a clinically applicable tool for movement analysis. We first explored the spatiotemporal accuracy of 21 out of 25 different Kinect V2 anatomical landmarks against multi-camera optical motion capture (Vicon) in a set of six motor tasks concerned with balance and lower limb function. Secondly, based on both capturing methods, we analysed the agreement of 45 clinical parameters derived from these tasks and compared their precision in three to five test repetitions. We further propose pre-processing procedures for Kinect V2 data.
Materials and Methods
Nineteen healthy individuals (age: 29.5 ± 4.4 years, height: 171.7 ± 7.4 cm, 12 female/ 7 male) volunteered to participate in this study. Inclusion criteria were absence of any neurological, motor or cognitive impairment. All participants attended one test session and no inter-day repeated measurements were performed. The study was approved by the Human Research Ethics Committee of Max Planck Institute for Human Development. All subjects provided written informed consent.
Kinect data were captured with Motognosis Labs v1.0 (Motognosis UG, Berlin, Germany) using a Kinect for Windows V2 sensor at a 30Hz sampling rate. Motognosis Labs used the Software Development Kit version 1409 provided by Microsoft. The Kinect sensor was placed on a tripod at 1.4m height with a vertical angle of −8°. The sensor was positioned to approximately match the orientation of the coordinate system of the gold-standard reference, facing the frontal plane of the test subjects in all tasks. The Kinect skeleton model with its 25 anatomical landmark locations is shown in Fig 1 (left). As the gold-standard reference motion tracking system, we used a 16-camera Vicon system (MX13+, Nexus 2.1; Vicon Motion Systems Ltd., Oxford, UK) with 36 attached IR-reflecting markers (Fig 1, middle and right). It was configured to measure marker positions at 100 Hz with 2mm accuracy within an area of 3m by 6m. The Kinect system covered a trapezoidal measurement area of roughly 3m by 4m, with a maximum distance of 4.5m from the sensor. Descriptions of all six tasks are given in Table 1. All tasks were recorded simultaneously with both systems. The systems were connected directly from the Kinect audio output to the Vicon audio input via cable. Audio start and stop signals were given for offline temporal synchronization. Each task was performed three to five times before the next task was measured. For all tasks except walks, participants started at 2.5m distance from the Kinect sensor for best depth resolution. To cover full gait cycles in the gait tasks, the starting position for these tasks was at 5m distance from the Kinect sensor, which was slightly outside the sensor range.
Adapted from  under a Creative Commons license.
Kinematic data acquired with the Vicon system were pre-processed using the standard pipelines of the Vicon software (Nexus 2.1), which include model reconstruction and marker labelling based on individually calibrated subject models. Details on Vicon pre-processing procedures and configuration are provided in the S1 File. Resulting marker labels and marker selection were manually corrected if necessary. Kinect data were used as provided by the Kinect SDK without additional pre-processing steps.
Data processing was performed in MATLAB v2015a (MathWorks Inc., Natick, Massachusetts, United States). To compare the movement signals of both systems, spatial and temporal alignment of Vicon and Kinect data were necessary. The following steps were performed for each pair of recordings:
- The Vicon marker positions were aggregated to represent landmark positions similar to the Kinect skeleton. This was achieved by using the nearest marker of similar representation (e.g. Shoulder, Hand) or the mean of the representing markers (e.g. Wrist, Hip). For more detail see S1 Appendix. Since the Vicon marker model contained no markers for fingertips and thumbs, these landmarks of the Kinect model were excluded from further analysis. The resulting mapped skeleton therefore contains only 21 anatomical landmarks.
- For the alignment of both coordinate systems (origin and axes), we assumed the Vicon data (as gold standard) to contain minimal rotation or tilt bias, as this would be corrected during standard pre-processing. Kinect sensor tilt was corrected by 3D rotation using the floor normal vectors provided by the Kinect SDK. In a final correction step, the Vicon coordinate system was translated to minimize the mean spatial differences between both recordings.
- Missing values in the Vicon data with a gap size smaller than 5 frames were reconstructed by linear interpolation, followed by re-sampling to 30Hz. No gaps larger than 5 frames were found in any Vicon measurement. Since 5 frames at a 100Hz sampling rate (50ms) correspond to about 1.5 frames at 30Hz, this interpolation is considered negligible and does not alter the signal behaviour.
- Because of audio signal losses, synchronisation using the audio signals resulted in unstable signal offsets. Instead, we used cross-correlation shifts of selected landmark movements (see Mentiplay et al.). Landmarks were selected according to their magnitude of movement in the respective motor task. Since cross-correlation requires stationary linear signals, this approach was not suitable for gait tasks, where the signal in anterior-posterior (AP) dimension is non-stationary; for these tasks, cross-correlation alignment still left temporal offsets of more than 10 frames. Synchronisation for gait tasks was therefore achieved with a distance minimization approach based on the AP signals.
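The re-sampling and cross-correlation steps above can be sketched in plain Python. This is a minimal illustration and not the study's MATLAB implementation. It assumes that re-sampling is done by linear interpolation onto the 30Hz grid and that the temporal shift is the lag (in frames) that maximises Pearson's correlation over the overlapping samples. The function names and the `max_lag` search window are hypothetical.

```python
import math


def resample_linear(signal, fs_in, fs_out):
    """Resample a uniformly sampled signal (e.g. Vicon at 100 Hz) onto an
    fs_out grid (e.g. 30 Hz) by linear interpolation."""
    if len(signal) < 2:
        raise ValueError("need at least two samples")
    last = len(signal) - 1
    duration = last / fs_in
    n_out = int(duration * fs_out + 1e-9) + 1  # guard against round-off
    out = []
    for i in range(n_out):
        t = min(i * fs_in / fs_out, last)  # position in input-sample units
        j = min(int(t), last - 1)
        frac = t - j
        out.append(signal[j] * (1.0 - frac) + signal[j + 1] * frac)
    return out


def pearson(x, y):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_lag(reference, other, max_lag):
    """Return the lag L (frames) maximising corr(reference[i], other[i + L])
    over the overlapping samples, searching L in [-max_lag, max_lag]."""
    best, best_r = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        idx = [i for i in range(len(reference)) if 0 <= i + lag < len(other)]
        if len(idx) < 3:
            continue  # too little overlap for a meaningful correlation
        r = pearson([reference[i] for i in idx], [other[i + lag] for i in idx])
        if r > best_r:
            best, best_r = lag, r
    return best
```

A positive result means that `other[i + lag]` pairs with `reference[i]`. The second signal would therefore be advanced by that many frames before the two signals are compared.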
Since one aim of this study was to analyse the accuracy of Kinect landmark movements, these were not smoothed during the data processing steps. The skeleton mapping as performed here (see processing pt. 1) may lead to a spatial bias between corresponding markers and landmarks from both systems. As some analyses require metrical comparison of landmark movements from Kinect and Vicon, we minimized their bias by subtracting the mean of each signal from the signal. This type of signal is further called ‘zero-mean-shifted’.
The processing steps resulted in two 3D skeleton movements with 21 different landmarks sampled at 30Hz. The movements of a single landmark can be described as time series (signals) for each movement dimension. We refer to these signals further on as ‘movement signals’. The movement dimensions are anterior-posterior (AP), medio-lateral (ML) and vertical (V).
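The spatial part of the alignment (tilt correction from the floor normal, followed by a translation of the Vicon data) could look roughly as follows. This sketch rests on several assumptions. The floor normal is treated as a 3D vector in Kinect camera space with y as the vertical axis. The rotation is built with Rodrigues' rotation formula. The translation is the least-squares offset (the mean difference of paired points), which is one possible reading of "minimize the mean spatial differences". All function names are hypothetical.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def tilt_correction(floor_normal, up=(0.0, 1.0, 0.0)):
    """Return a function that rotates camera-space points so that the floor
    normal is mapped onto the unit vector `up` (Rodrigues' rotation formula)."""
    length = math.sqrt(_dot(floor_normal, floor_normal))
    n = tuple(c / length for c in floor_normal)
    axis = _cross(n, up)
    sin_t = math.sqrt(_dot(axis, axis))
    cos_t = _dot(n, up)
    if sin_t < 1e-12:  # already aligned (antiparallel normals not handled)
        return lambda p: tuple(p)
    k = tuple(c / sin_t for c in axis)  # unit rotation axis

    def rotate(p):
        kxp = _cross(k, p)
        kdp = _dot(k, p)
        return tuple(p[i] * cos_t + kxp[i] * sin_t + k[i] * kdp * (1.0 - cos_t)
                     for i in range(3))

    return rotate


def translation_offset(kinect_points, vicon_points):
    """Least-squares translation (mean difference) that, added to the paired
    Vicon points, moves them onto the corresponding Kinect points."""
    n = len(kinect_points)
    return tuple(sum(k[i] - v[i] for k, v in zip(kinect_points, vicon_points)) / n
                 for i in range(3))
```

The returned `rotate` function would be applied to every Kinect landmark position, and the offset would then be added to every Vicon position.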
Data Analysis of Movement Signals
Movement signals are the foundation of all kinematic parameters used for movement description. We therefore first analysed the accuracy of movement signals before proceeding to the analysis of derived clinical parameters. The accuracy of movement signals was expressed as the mean 3D Euclidean distance (diff3D) between the zero-mean-shifted movement signals of the Vicon and Kinect systems and as Pearson’s correlation coefficient (r) for each anatomical landmark in each dimension. Based on the thresholds provided by Portney and Watkins, we distinguish between poor (r < .4), moderate (r = .4–.7), good (r = .7–.9) and excellent (r > .9) accuracy.
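The two signal-level accuracy measures and the correlation bands can be written down directly. This is a minimal sketch. It assumes each landmark trajectory is given as three equally long lists (AP, ML, V) sampled on a common 30Hz time base. The helper names are hypothetical. The text only gives ranges, so assigning the boundary values r = .7 and r = .9 to "good" is also an assumption.

```python
import math


def zero_mean(signal):
    """'Zero-mean-shifted' signal: subtract the signal's own mean."""
    m = sum(signal) / len(signal)
    return [v - m for v in signal]


def pearson_r(x, y):
    """Pearson's correlation coefficient of two equally long signals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def diff3d(kinect_xyz, vicon_xyz):
    """Mean 3D Euclidean distance between zero-mean-shifted trajectories.
    Each argument is a sequence of three equally long signals (AP, ML, V)."""
    k = [zero_mean(s) for s in kinect_xyz]
    v = [zero_mean(s) for s in vicon_xyz]
    n = len(k[0])
    return sum(math.sqrt(sum((k[d][t] - v[d][t]) ** 2 for d in range(3)))
               for t in range(n)) / n


def accuracy_class(r):
    """Correlation bands used in the text (boundary assignment assumed)."""
    if r < 0.4:
        return "poor"
    if r < 0.7:
        return "moderate"
    if r <= 0.9:
        return "good"
    return "excellent"
```

Because each signal is shifted by its own mean, a constant spatial offset between the two skeleton models does not contribute to `diff3d`.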
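To quantify the noise behaviour of Kinect in comparison to the gold standard system, we used signal-to-noise ratios (SNR) based on signal variance (Formula 1). We assumed movement signals from Vicon to represent the true signal (gold standard) and thus referred to the difference between the zero-mean-shifted signals as noise. SNRs were calculated for each landmark and dimension as the ratio between the variance of the Vicon signal and the variance of the noise. Since SNR is typically given in decibels (dB), a transformation of 10·log10 was applied:
$$\mathrm{SNR} = 10\,\log_{10}\!\left(\frac{\operatorname{Var}\left(s_{\mathrm{Vicon}}\right)}{\operatorname{Var}\left(s_{\mathrm{Kinect}} - s_{\mathrm{Vicon}}\right)}\right) \qquad (1)$$
where $s_{\mathrm{Vicon}}$ and $s_{\mathrm{Kinect}}$ denote the zero-mean-shifted movement signals of one landmark in one dimension.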
An SNR below 0dB indicates that the variance of the noise is larger than the variance of the signal, while 10dB indicates that the signal variance is 10 times larger than the noise variance. Since no thresholds for such movement signals are given in the literature, we propose thresholds of -10dB and 10dB based on visual analysis of the movement signals. Signals with SNR above 10dB can be considered accurate enough for further analysis. Signals with SNR below -10dB are altered or dominated by large noise and should be handled with care. Signals with SNR between -10dB and 10dB often seem to be influenced by small noise or small systematic bias (e.g. in signal amplitude) and should be analysed individually for their suitability for further analyses.
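Formula 1 and the proposed thresholds translate into a few lines of code. This is a sketch with hypothetical names. It uses population variances, which leaves the ratio in Formula 1 unchanged as long as both variances are computed the same way.

```python
import math


def _variance(x):
    """Population variance of a sequence."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)


def snr_db(vicon, kinect):
    """SNR per Formula 1: Vicon is the 'true' signal, the difference between
    the zero-mean-shifted Kinect and Vicon signals is the noise."""
    mv, mk = sum(vicon) / len(vicon), sum(kinect) / len(kinect)
    noise = [(k - mk) - (v - mv) for v, k in zip(vicon, kinect)]
    return 10.0 * math.log10(_variance(vicon) / _variance(noise))


def snr_category(snr):
    """Thresholds proposed in the text (-10 dB and 10 dB)."""
    if snr > 10.0:
        return "accurate enough"
    if snr < -10.0:
        return "dominated by noise"
    return "inspect individually"
```

For example, a sinusoid of variance 0.5 with superimposed noise of variance 0.01 gives 10·log10(50), which is about 17dB. That signal falls into the "accurate enough" band.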
With both motion tracking systems, unreliable landmark or marker locations may occasionally occur, for instance when landmarks or markers are covered by other body parts. In this case, a ‘jumping’ behaviour of movement signals in one or all dimensions is observed (see S2 and S3 Figs for examples), further called ‘calibration error’. Such calibration errors generally reduce the accuracy of a movement signal. While small, infrequent calibration errors only introduce noise to the signal, large or frequent calibration errors can alter a movement signal substantially and would lead to measurement error in derived clinical parameters.
To identify large-amplitude calibration errors that could lead to measurement error, we performed outlier detection prior to the calculation of clinical parameters. Since SNR depends on the signal amplitude, generalised thresholds seemed inappropriate for outlier detection. Based on previous test recordings, we chose Spine base and ankle landmarks as indicators for the occurrence of calibration errors. We derived the maximum velocity and the largest difference of the signal amplitudes between both systems for these landmarks. Based on the limitations of natural movement behaviour, we set lower thresholds of 0.006m/frame for maximum velocity and 0.1m for amplitude differences. If a measurement exceeded both thresholds for one of these landmarks, it was defined as erroneous and excluded from the analyses of derived clinical parameters. However, detected outliers remained in the dataset for the analyses of the movement signal accuracy.
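The outlier rule above can be expressed as a small function. The text does not define two of the quantities exactly, so their readings here are assumptions. "Maximum velocity" is taken as the largest frame-to-frame change in any single dimension of either system. "Largest difference of the signal amplitudes" is taken as the largest absolute difference between the zero-mean-shifted Kinect and Vicon signals. Only the two threshold values come from the study, and the function names are hypothetical.

```python
VELOCITY_THRESHOLD = 0.006   # m/frame at 30 Hz, from the text
AMPLITUDE_THRESHOLD = 0.1    # m, from the text


def _zero_mean(signal):
    m = sum(signal) / len(signal)
    return [v - m for v in signal]


def max_frame_step(signal):
    """Largest absolute change between consecutive frames (m/frame)."""
    return max(abs(b - a) for a, b in zip(signal, signal[1:]))


def max_system_difference(kinect, vicon):
    """Largest absolute difference between zero-mean-shifted signals (m)."""
    return max(abs(k - v) for k, v in zip(_zero_mean(kinect), _zero_mean(vicon)))


def is_erroneous(indicator_landmarks):
    """indicator_landmarks maps a landmark name (e.g. Spine base, ankles) to a
    pair (kinect_dims, vicon_dims), each a sequence of AP, ML and V signals.
    A recording is flagged if one landmark exceeds BOTH thresholds."""
    for kinect_dims, vicon_dims in indicator_landmarks.values():
        velocity = max(max_frame_step(s) for s in [*kinect_dims, *vicon_dims])
        amplitude = max(max_system_difference(k, v)
                        for k, v in zip(kinect_dims, vicon_dims))
        if velocity > VELOCITY_THRESHOLD and amplitude > AMPLITUDE_THRESHOLD:
            return True
    return False
```

Requiring both conditions means that a fast but real movement, which both systems track, is not flagged. A sudden jump that appears in only one system is flagged.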
Extraction and Analysis of clinical Parameters
All tasks targeted different movement behaviours, aiming to detect and describe specific motor problems. This necessitated task-specific parameter extraction (Table 1). For the ‘stand up and sit down’ (SAS) task, postural transitions were identified by movement analysis of the Spine shoulder landmark and described as time per movement phase and trunk excursion in AP and ML dimension. Additionally, the hand range of motion in AP dimension was calculated as a possible indicator of compensatory movement strategies. For all three walk tasks (SCSW, SMSW and SLW), we focused on overall walking speed and the quantification of upper body motion during walking as a potential measure of dynamic balance [6, 14]. Due to the short walking distance, we did not include gait cycle detection and associated parameters such as step length and width in our parameter set.
For stance with open and closed eyes (POCO), we analysed body sway at the level of the hip, i.e. close to the body’s centre of gravity, as described previously. For walking on the spot (STEPO), we focused on quantifying lower limb movements by ranges of motion (RoM) in the anterior-posterior-vertical (AP-V) plane and by step count per minute (cadence), as potential measures of, for instance, muscular weakness, hypokinesia or muscle fatigue.
In total, 45 different clinical parameters were extracted.
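Some of the parameter types described above can be illustrated with simple formulas. The study's own parameter definitions may differ, and this sketch relies on several assumptions not stated in the text. RoM is taken as maximum minus minimum of a signal. Sway velocity is the horizontal (AP-ML) path length of a hip trajectory divided by recording time. Steps in STEPO are counted as local maxima of the vertical knee signals above a hypothetical height threshold. All function names are hypothetical.

```python
import math

FS = 30.0  # Hz, Kinect sampling rate used in the study


def range_of_motion(signal):
    """RoM of one movement signal: maximum minus minimum (m)."""
    return max(signal) - min(signal)


def mean_sway_velocity(ap, ml, fs=FS):
    """Mean horizontal sway velocity (m/s): AP-ML path length / duration."""
    path = sum(math.hypot(ap[i + 1] - ap[i], ml[i + 1] - ml[i])
               for i in range(len(ap) - 1))
    return path / ((len(ap) - 1) / fs)


def count_peaks(signal, min_height):
    """Hypothetical step detector: local maxima above `min_height`."""
    return sum(1 for i in range(1, len(signal) - 1)
               if signal[i] > min_height
               and signal[i - 1] < signal[i] >= signal[i + 1])


def cadence(knee_v_left, knee_v_right, min_height, fs=FS):
    """Steps per minute, counting one step per vertical knee peak (assumed)."""
    steps = count_peaks(knee_v_left, min_height) + count_peaks(knee_v_right, min_height)
    duration_min = (len(knee_v_left) - 1) / fs / 60.0
    return steps / duration_min
```

For example, two left and two right knee peaks within 2 s of recording give 4 steps / (2/60) min = 120 steps/min.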
Statistical analysis was performed in MATLAB v2015a (MathWorks Inc., Natick, Massachusetts, United States) and visualised in Python 3.4 using the packages ‘seaborn’ and ‘matplotlib’. ICC(1,1) (one-way random model) and standard error of measurement (SEM) were used to describe repeatability of derived parameters for each system. For better comparison of parameters, the SEM was expressed as proportion of the mean. Absolute agreement between Vicon and Kinect was described by ICC(3,1) (two-way mixed model) and limits of agreement (LOA). Pearson’s correlation coefficient (r) was used to describe consistency by neglecting systematic measurement bias.
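The repeatability and agreement statistics named above can be sketched from ANOVA mean squares. This illustration rests on assumptions and is not the study's MATLAB code. ICC(1,1) and ICC(3,1) follow the Shrout and Fleiss formulas. SEM is taken as the square root of the within-subject mean square; other SEM definitions, such as SD·√(1−ICC), exist. The limits of agreement are the usual bias ± 1.96 SD of the paired differences. Function names are hypothetical.

```python
import math


def _mean(x):
    return sum(x) / len(x)


def _one_way_mean_squares(data):
    """data: one row per subject, k repeated measurements per row."""
    n, k = len(data), len(data[0])
    grand = _mean([x for row in data for x in row])
    ms_between = k * sum((_mean(row) - grand) ** 2 for row in data) / (n - 1)
    ms_within = sum((x - _mean(row)) ** 2 for row in data for x in row) / (n * (k - 1))
    return ms_between, ms_within, grand


def icc_1_1(data):
    """One-way random-effects ICC(1,1) (Shrout and Fleiss)."""
    msb, msw, _ = _one_way_mean_squares(data)
    k = len(data[0])
    return (msb - msw) / (msb + (k - 1) * msw)


def sem_percent(data):
    """SEM as % of the grand mean, SEM = sqrt(within-subject mean square)."""
    _, msw, grand = _one_way_mean_squares(data)
    return 100.0 * math.sqrt(msw) / grand


def icc_3_1(data):
    """Two-way mixed ICC(3,1) (Shrout and Fleiss); columns are the methods."""
    n, k = len(data), len(data[0])
    grand = _mean([x for row in data for x in row])
    col_means = [_mean([row[j] for row in data]) for j in range(k)]
    ss_rows = k * sum((_mean(row) - grand) ** 2 for row in data)
    ss_cols = n * sum((c - grand) ** 2 for c in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    bms = ss_rows / (n - 1)
    ems = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (bms - ems) / (bms + (k - 1) * ems)


def limits_of_agreement(kinect, vicon):
    """Bland-Altman bias and 95% limits of agreement (bias +/- 1.96 SD)."""
    diffs = [a - b for a, b in zip(kinect, vicon)]
    bias = _mean(diffs)
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (len(diffs) - 1))
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

In toy data where the two columns differ by a constant offset of 1, ICC(3,1) in this formulation equals 1 while the LOA bias is -1. The bias term is what exposes such an offset.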
Accuracy of Movement Signals
Spatial accuracy of the Kinect landmark movements is reported as 1) mean Euclidean 3D distances (diff3D) between temporally aligned zero-mean shifted signals to show absolute differences for signal pairs and 2) Pearson’s correlation coefficients (r) against Vicon markers. In addition, signal to noise ratios (SNR) are reported in AP, ML and V dimension each. As an overview, means and standard deviations for all expressions of signal accuracy averaged over all tasks, subjects and measurement repetitions are shown in Table 2. Task specific landmark accuracy is provided in Fig 2 and S2 Appendix.
Data are presented as mean and standard deviation (SD) of all measurements (including all subjects, tasks and measurement repetitions).
Bilateral joints were aggregated by their mean for better visualisation. Abbrev: AP, anterior-posterior; ML, medio-lateral; V, vertical.
The 3D differences between Vicon and Kinect V2 movement signals were typically between 1 and 2cm, with higher values for feet and ankles (diff3D > 5cm). Pearson’s correlation coefficients (r) were highest in AP dimension, with good (4 landmarks) or excellent (15 landmarks) agreement for all landmarks except the feet, which showed only moderate correlations. Signals in ML dimension also provided good results (head excellent, 16 landmarks good, feet and ankles moderate). In vertical dimension, however, correlations were only poor (6 landmarks), moderate (7) or good (8). Observed accuracy also varied with landmark location, with the head having the highest and the feet the lowest values (rAP foot L/R = 0.64/0.66; rAP head = 0.99). Furthermore, spatial accuracy of landmarks depended on the measured task; for example, the Spine base landmark had rML = 0.95 in quiet stance (POCO) but rML = 0.64 in stand up and sit down (SAS) (see S2 Appendix).
Fig 2 presents SNR per task, landmark and dimension as an indicator of overall signal quality. The standard deviations of SNR are smaller than those of the correlation analyses (see S1 Fig), making SNR results easier to interpret. Similar to Pearson’s correlation coefficients (r), the most robust signal quality, i.e. the highest SNRs, was found in AP dimension (most > 10dB). SNRs in ML dimension were smaller and less consistent between tasks (most upper body landmarks > 5dB). The lowest SNRs were seen in V dimension (most between -10dB and 8dB), with the exception of SAS, which showed large SNRs (> 18dB) in V dimension, probably related to the large vertical movement in this task. Feet and ankles generally showed small SNRs in all dimensions (SNR < 0dB), especially in the SAS and POCO tasks, i.e. with feet stable on the ground throughout the task. The best mean SNRs for feet in V dimension, found in the STEPO task, were still near 0dB (feet SNR V = 0.5dB), while the ankle and knee landmarks seemed more stable (ankle SNR V = 2.73dB; knee SNR V = 5.71dB).
In total, 13 out of 532 measurements contained large calibration errors and were excluded from the calculation of clinical parameters. These comprised 12 Kinect and 1 Vicon measurements from the following tasks: SAS (1 Kinect and 1 Vicon), SLW (2 Kinect) and POCO (9 Kinect). Calibration errors were most prominent in V dimension (11 measurements), with 3 measurements showing additional errors in AP dimension. As expected, detected outliers had highly negative mean SNRs of -29.55 dB in V dimension, -40.51 dB in AP and -44.16 dB in ML dimension. An overview of the outliers is given in S1 Table.
Accuracy and Repeatability of clinical Parameters
As shown in Table 3, most of the 45 clinical parameters showed good to excellent absolute agreement (ICC(3,1) > 0.7 for 30 parameters) and consistency (r > 0.7 for 38 parameters). Absolute agreement was especially high for trunk movement and the time needed for postural transitions, gait speed in short walks at different speeds, sway velocity in quiet standing, and cadence while walking in place. Lower accuracy was found for trunk movement in roll direction in all short walks (ICC(3,1) of 0.43–0.65). For knee RoM in the AP-V plane while walking on the spot, low absolute agreement (ICC(3,1) < 0.12) was accompanied by good consistency (rL = 0.72, rR = 0.83). This may be explained by a systematic measurement bias of 0.05m in this parameter. Since such a bias was observed only in this parameter, a general systematic bias between both systems is unlikely. Up-down deviation during the short line walk was the only parameter that showed both poor absolute agreement (ICC(3,1) = 0.03) and poor consistency (r = 0.09). This is likely attributable to the small RoM (ca. 0.4cm), the noise behaviour of the Spine base and hip landmarks in vertical dimension (SNR < 0) and the generally poor accuracy for vertical movement components.
Means and standard deviations (SD) are given along with accuracy (ICC(3,1) and Pearson’s r, LOA in % of methods’ mean).
To address the repeatability of each parameter for each method, ICC(1,1) and the standard error of measurement (SEM) as percentage of the mean were calculated (see Table 4). ICC(1,1) was acceptable for most parameters (ICC(1,1) > 0.6: Kinect 33, Vicon 30). More importantly, repeatability results were of similar magnitude for Kinect V2 and Vicon derived parameters (ICC(1,1) range: Kinect V2 0.20–0.98; Vicon 0.28–0.98). Relative SEM was acceptable (< 20%) for Kinect V2 in 31 of 45 parameters investigated (Vicon: 30). This included all parameters with high between-method agreement as outlined above. In total, 12 parameters showed good SEM (< 10%) in both Vicon and Kinect, including walking speeds in all walk tests, time parameters and AP movement components of SAS, and all parameters from STEPO.
In the present study, we evaluated the suitability of the Kinect V2 sensor for clinical motion analyses against a gold standard reference system, namely Vicon. We analysed landmark movement accuracies as well as the accuracy and reliability of different clinical parameters derived from six motor tasks in young healthy subjects. Caution is warranted, as the presented results can only be generalised to young healthy adults.
Methods, Setup and Data Processing
The automatically labelled anatomical landmarks from Kinect yielded signals of sufficient calibration accuracy in 520 of 532 measurements, compared to 531 of 532 Vicon measurements. This is remarkable, as calibration with Vicon in our experience required far more manual processing effort. The aggregation of Vicon markers, chosen according to Galna et al., seemed appropriate, as only minimal spatial offsets were observed between the aligned signals from Kinect V2 and Vicon. However, given the inherent differences between the 3D skeletons of both methods, i.e. surface markers with Vicon versus landmarks within the body shape with Kinect V2, this approach was not intended to achieve exact location matching. As derived clinical parameters are calculated only within each method’s coordinate system, without reference to absolute external locations, this approach may slightly affect 3D Euclidean distances but is not expected to affect correlation analyses or the agreement of clinical parameters. For the same reason, the spatial alignment used here is considered appropriate for the purpose of our study. If anatomical correctness were to be studied (such as in [15–17]), the spatial alignment should rather use a multi-point minimization approach. The larger Euclidean distances seen for foot landmarks coincide with low between-method correlations and low SNR. We therefore attribute the spatial offset for these landmarks not to differences in skeleton models but to signal noise, for example a higher rate of calibration errors. Concerning temporal synchronisation by audio signals, we expected delays and remaining offsets of < 500ms due to the varying latency of sound card processing. Unexpectedly, synchronisation by audio signals turned out to be unreliable due to signal losses from the Kinect to the Vicon system.
The synchronisation by cross-correlation and distance minimization seemed reliable, since no temporal offsets remained detectable, but it required manual selection of suitable landmark movements. For future work, the Network Time Protocol or the Precision Time Protocol seems more appropriate, especially if one system is expected to show temporal delays during recording.
Movement signal accuracy
Main findings from this part of the analysis were the differences in signal accuracy according to 1) directional component (lowest for vertical), 2) landmark location (lowest for feet) and 3) performed movement task. The latter is possibly attributable to differences in movement amplitudes. As one conclusion of this study, 3D positions of axial landmarks (Spine base, Spine mid, Spine shoulder and Head) and upper extremity landmarks (Hand, Elbow and Shoulder) can validly be used for general movement analyses and the calculation of clinical parameters.
Concerning the differences in accuracy between directional components, our data support previous reports on clinical parameters derived from Kinect V2 trunk landmarks during standing, where the highest accuracy was also observed for AP compared to ML movements, while V components were not reported. In our study, r in V dimension did not exceed 0.8 for any landmark and was < 0.7 for 13 out of 21 landmarks. Interestingly, the highest accuracies in vertical movement components (r > 0.7) were observed for head, shoulder, wrist and hand signals (but not elbow). A similar pattern for the accuracy of limb landmarks was seen in a recent study that used a Kinect V2 multi-camera setup. We interpret this as a consequence of Kinect SDK optimization for the intended use of the Kinect sensor in interactive computer gaming based on gesture recognition. For all other landmarks with low accuracy in vertical dimension, different recording angles may be explored to increase accuracy if the tracking of (minor) vertical displacement is of interest.
Signals of feet (and ankles) had the lowest accuracy according to mean 3D distance and correlation analysis in all dimensions. Their low, negative SNRs point to a general instability of this landmark location that differs with the task (Fig 2), with the worst SNR for stable foot positions throughout the task. This has also been noted by others and is interpreted as a specific difficulty of the Kinect V2 in differentiating feet from the ground in such conditions. Furthermore, differences in signal accuracy were seen between tasks for the same landmarks. One explanation is that the accuracy of movement signals is influenced by the respective landmark’s range of motion and increases with larger movements, as larger signals favourably alter the SNR: the noise is proportionally smaller in signals with larger amplitudes. This is supported by the high accuracy for AP in walks, and for V in SAS for head, trunk, arms and hips. In terms of noise behaviour, landmark instability as indicated by SNR is, first, reflected in generally lower signal agreement for the same landmarks and, second, can be outweighed by larger signal amplitudes such that accuracy improves. Other factors that may contribute to differences in accuracy are differences in body posture or the extent to which landmarks are covered in different tasks (e.g. feet in the SLW task). For clinical applications, we therefore recommend designing movement tasks such that landmarks are not covered during execution. Nevertheless, as outlined above, advanced filtering techniques or alternative skeleton models may also yield more accurate clinical parameters even for small movements, such as tremor, or for temporarily covered landmark locations.
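The amplitude argument can be made explicit with Formula 1. For illustration only, assume a sinusoidal landmark movement $s_{\mathrm{Vicon}}(t) = A\sin(\omega t)$ observed over whole periods, with noise of fixed variance $\sigma_n^2$ that does not depend on $A$. Formula 1 then gives

$$\mathrm{SNR}(A) = 10\,\log_{10}\!\left(\frac{A^2/2}{\sigma_n^2}\right), \qquad \mathrm{SNR}(cA) - \mathrm{SNR}(A) = 10\,\log_{10}\!\left(c^2\right) = 20\,\log_{10} c \ \mathrm{dB}$$

Under these assumptions, doubling the movement amplitude raises the SNR by about 6 dB, and a tenfold larger movement raises it by 20 dB, while the noise itself is unchanged.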
Accuracy and Reliability of clinical Parameters
Based on clinical assessment routines, we extracted 45 different parameters to describe the movement behaviour of each subject. Previous publications showed that Kinect V1 and V2 measurements of landmark angles [16, 20, 21] and of the length of body parts derived from different movements may lack accuracy. Therefore, we focused on parameters based on single ‘stable’ landmarks, with the exception of POCO, where foot landmarks were integrated into an anchor point for the sway vector.
In summary, most clinical parameters showed high absolute agreement and no systematic bias between systems. Parameters with only moderate absolute agreement mostly showed high consistency agreement. This suggests that the Kinect V2 is accurate enough to measure these clinical parameters in healthy subjects. Our data concur with previous reports on gait analysis with Kinect V2 with respect to comfortable and maximum speeds, including high accuracy and repeatability for these parameters. Galna et al. used Kinect V1 to analyse a task similar to SAS. Although they measured the performance time of 5 stand up-sit down repetitions, whereas we assessed both transition phases of the movement separately, the Pearson’s correlation coefficients against the Vicon standard are similarly high (Galna et al. r = .999 vs r = .98 here). The same study also analysed the stepping on the spot task and observed a somewhat higher cadence (Galna et al. 50.85 steps/min vs. 47.3 steps/min here) but similar accuracy for this parameter (Galna et al. r = 0.983 vs. r = 1 here), whereas we found a systematic spatial bias for knee RoM. Further investigation should analyse the cause of this bias and its impact on clinical interpretation.
As discussed above, the dependency of movement signal accuracy on movement amplitude may affect derived clinical parameters. For instance, parameters describing small movements show larger LOAs (see e.g. Deflection Range in ML direction during SAS or walk assessments) and are therefore more difficult to interpret. Since our data were derived from young healthy adults, generalisation to pathological movements is difficult. If decreased movement amplitude is expected in the disease under study, such as hypokinesia in Parkinson’s disease, this may negatively affect signal accuracy with Kinect, especially for ‘noisy’ landmarks. However, an evaluation of Kinect V1 in healthy controls and Parkinson’s disease patients with mild to moderate severity did not reveal major differences in accuracy between groups . In contrast, the RoM of trunk sway during standing may be expected to increase in various diseases, which may in turn improve accuracy compared to our data in healthy subjects . Consequently, as has been suggested for the validation of other sensor-based motion analysis solutions , testing the accuracy of Kinect V2 in the target populations of clinical application should be considered.
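Why small-amplitude parameters are harder to interpret can be seen from the Bland-Altman limits of agreement (bias ± 1.96 SD of the paired differences). In the sketch below, the same hypothetical measurement errors are applied to a small and a ten-times larger parameter; the absolute LOAs are identical, but relative to the parameter's magnitude they are ten times wider for the small one. All values are invented for illustration.

```python
from statistics import mean, stdev
from typing import List, Tuple


def limits_of_agreement(reference: List[float], measured: List[float],
                        z: float = 1.96) -> Tuple[float, float, float]:
    """Bland-Altman bias and 95% limits of agreement for measured - reference."""
    diffs = [m - r for r, m in zip(reference, measured)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the paired differences
    return bias, bias - z * sd, bias + z * sd


def relative_loa_width(reference: List[float], measured: List[float]) -> float:
    """Width of the LOA interval as a fraction of the mean reference value."""
    _, lower, upper = limits_of_agreement(reference, measured)
    return (upper - lower) / mean(reference)


# Hypothetical paired errors, identical for a small and a large parameter.
errors = [0.1, -0.1, 0.2, -0.2, 0.0]
vicon_small = [1.0, 1.2, 0.8, 1.1, 0.9]
vicon_large = [10 * v for v in vicon_small]
kinect_small = [v + e for v, e in zip(vicon_small, errors)]
kinect_large = [v + e for v, e in zip(vicon_large, errors)]

rel_small = relative_loa_width(vicon_small, kinect_small)
rel_large = relative_loa_width(vicon_large, kinect_large)
```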
Repeatability is another measure that has to be considered when interpreting results, as it affects a parameter’s potential to track change. In this respect, all time parameters, the AP trunk movement during postural transition and knee displacement when walking in place showed excellent reliability in immediate retest. Importantly, repeatability analysis yielded rather similar results for both Kinect V2 and Vicon. Deflection range of sway during standing, although measured with high accuracy according to its correlation with Vicon, showed lower repeatability than sway velocity, which therefore appears more favourable as a parameter for following up postural disturbance. In contrast, although the accuracy for knee excursion in STEPO is only moderate, this parameter is among those with the highest repeatability, in agreement with previous findings . Again, the results of the repeatability analysis may also be distorted by the small between-subject variance seen for some parameters in healthy subjects. Thus, as observed in other studies , repeatability measures may prove even better in patient groups with more diverse motor performance. This may also apply to trunk vectors during short walks in conditions where increased trunk motion during gait is to be expected, such as multiple sclerosis .
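Standard error of measurement (SEM) and repeatability coefficient (RC) have several definitions in the literature. The sketch below assumes one common convention: SEM as the pooled within-subject SD across repetitions (pooled by degrees of freedom, so subjects may contribute 3 to 5 repetitions), and RC = 1.96 × √2 × SEM, the difference between two repeated measurements that about 95% of such pairs should not exceed. The trial values are hypothetical, and this is not necessarily the exact computation used in the study.

```python
import math
from statistics import mean
from typing import List


def within_subject_sem(trials_per_subject: List[List[float]]) -> float:
    """Pooled within-subject SD; subjects may have unequal repetition counts."""
    ss = 0.0
    df = 0
    for trials in trials_per_subject:
        m = mean(trials)
        ss += sum((t - m) ** 2 for t in trials)
        df += len(trials) - 1
    return math.sqrt(ss / df)


def repeatability_coefficient(sem: float, z: float = 1.96) -> float:
    """Limit expected to contain ~95% of differences between two repeats."""
    return z * math.sqrt(2) * sem


# Hypothetical repetitions of one parameter for three subjects (3, 4 and 5 trials).
trials = [
    [1.0, 1.2, 1.1],
    [2.0, 2.1, 1.9, 2.0],
    [1.5, 1.5, 1.5, 1.5, 1.5],
]
sem = within_subject_sem(trials)
rc = repeatability_coefficient(sem)
```

Unlike an ICC, these absolute measures do not depend on between-subject variance, which is why they are less affected by the homogeneous performance of healthy subjects discussed above.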
The results presented here help to select clinical parameters with potential for further clinical application, to be validated in patient groups. While some parameters, like walking speed or postural sway velocity, are already in use as clinical measures, the clinical meaning of others, like the leg parameters from stepping in place, still needs to be defined. As both time and range of the step-like movements in STEPO showed high repeatability, it will be interesting to explore these parameters as potential surrogates of locomotor stepping. Our results may further guide the design of new assessment tasks and derived clinical parameters using Kinect V2 technology.
S1 File. Vicon Processing Pipeline.
This pipeline is used by the Vicon system for preprocessing of recorded data and includes the set of parameters that can be adjusted in the system.
S1 Appendix. Description of Mapping from Vicon to Kinect Landmarks.
A description of the processing steps to map the Vicon marker positions of a standard gait model to corresponding Kinect V2 landmark locations.
S2 Appendix. Landmark Accuracies for each Task.
A detailed overview of the signal accuracies per assessment task. Signal accuracy is described as mean 3D distance of zero-shifted signals, Pearson’s correlation coefficient (r) and signal-to-noise ratios (SNR).
S1 Fig. Correlation Coefficients (r) of all Joints per Assessment in each Dimension.
S2 Fig. Example of Calibration Errors in Kinect Data.
S3 Fig. Example of Calibration Errors in Vicon Data.
- Conceptualization: KO BK SMM JV FP AUB TSH.
- Data curation: KO BK SMM JV.
- Formal analysis: KO BK SMM AUB TSH.
- Investigation: KO BK SMM JV AUB TSH.
- Methodology: KO BK SMM JV AUB TSH.
- Project administration: KO JV TSH.
- Resources: KO BK SMM JV.
- Software: KO BK SMM.
- Supervision: JV AUB TSH.
- Validation: KO BK SMM JV AUB TSH.
- Visualization: KO BK SMM.
- Writing – original draft: KO BK SMM TSH.
- Writing – review & editing: KO BK SMM JV FP AUB TSH.
- 1. Heldman DA, Espay AJ, LeWitt PA, Giuffrida JP. Clinician versus machine: Reliability and responsiveness of motor endpoints in Parkinson’s disease. Parkinsonism & Related Disorders. 2014 Jun;20(6):590–595. pmid:24661464
- 2. Galna B, Barry G, Jackson D, Mhiripiri D, Olivier P, Rochester L. Accuracy of the Microsoft Kinect sensor for measuring movement in people with Parkinson’s disease. Gait & Posture. 2014 Apr;39(4):1062–1068. pmid:24560691
- 3. Clark RA, Pua YH, Oliveira CC, Bower KJ, Thilarajah S, McGaw R, et al. Reliability and concurrent validity of the Microsoft Xbox One Kinect for assessment of standing balance and postural control. Gait & Posture. 2015 Jul;42(2):210–213. pmid:26009500
- 4. Behrens JR, Mertens S, Krüger T, Grobelny A, Otte K, Mansow-Model S, et al. Validity of visual perceptive computing for static posturography in patients with multiple sclerosis. Multiple Sclerosis Journal. 2016;p. 1352458515625807. pmid:26814201
- 5. Mentiplay BF, Perraton LG, Bower KJ, Pua YH, McGaw R, Heywood S, et al. Gait assessment using the Microsoft Xbox One Kinect: Concurrent validity and inter-day reliability of spatiotemporal and kinematic variables. Journal of Biomechanics. 2015 Jul;48(10):2166–2170. pmid:26065332
- 6. Behrens J, Pfüller C, Mansow-Model S, Otte K, Paul F, Brandt AU. Using perceptive computing in multiple sclerosis-the Short Maximum Speed Walk test. Journal of NeuroEngineering and Rehabilitation. 2014;11(1):89. pmid:24886525
- 7. Pfister A, West AM, Bronner S, Noah JA. Comparative abilities of Microsoft Kinect and Vicon 3D motion capture for gait analysis. Journal of Medical Engineering & Technology. 2014;38(5):274–280. pmid:24878252
- 8. Geerse DJ, Coolen BH, Roerdink M. Kinematic Validation of a Multi-Kinect v2 Instrumented 10-Meter Walkway for Quantitative Gait Assessments. PLoS ONE. 2015 Oct;10(10):e0139913. pmid:26461498
- 9. Wang Q, Kurillo G, Ofli F, Bajcsy R. Evaluation of pose tracking accuracy in the first and second generations of Microsoft Kinect. In: Healthcare Informatics (ICHI), 2015 International Conference on. IEEE; 2015. p. 380–389.
- 10. Kinect - Windows app development. Available from: https://developer.microsoft.com/en-us/windows/kinect.
- 11. Stein J. Digital Signal Processing: A Computer Science Perspective. 1st ed. Paperback; 2003.
- 12. Portney LG, Watkins MP. Foundations of Clinical Research: Applications to Practice. Pearson/Prentice Hall; 2009.
- 13. Plonus M. Electronics and Communications for Scientists and Engineers. Elsevier Science; 2001.
- 14. Spain RI, St George RJ, Salarian A, Mancini M, Wagner JM, Horak FB, et al. Body-worn motion sensors detect balance and gait deficits in people with multiple sclerosis who have normal walking speed. Gait & Posture. 2012 Apr;35(4):573–578. pmid:22277368
- 15. Yeung LF, Cheng KC, Fong CH, Lee WCC, Tong KY. Evaluation of the Microsoft Kinect as a clinical assessment tool of body sway. Gait & Posture. 2014 Sep;40(4):532–538. pmid:25047828
- 16. Huber ME, Seitz AL, Leeser M, Sternad D. Validity and reliability of Kinect skeleton for measuring shoulder joint angles: a feasibility study. Physiotherapy. 2015 Apr. pmid:26050135
- 17. Xu X, McGorry RW. The validity of the first and second generation Microsoft Kinect™ for identifying joint center locations during static postures. Applied Ergonomics. 2015 Jul;49:47–54. pmid:25766422
- 18. Walker M. Dealing With Computer Audio Latency. Sound on Sound. 1999 Apr.
- 19. IEEE Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems. IEEE Std 1588–2008 (Revision of IEEE Std 1588–2002). 2008 Jul;p. 1–269.
- 20. Schmitz A, Ye M, Shapiro R, Yang R, Noehren B. Accuracy and repeatability of joint angles measured using a single camera markerless motion capture system. Journal of Biomechanics. 2014 Jan;47(2):587–591. pmid:24315287
- 21. Kuster RP, Heinlein B, Bauer CM, Graf ES. Accuracy of KinectOne to quantify kinematics of the upper body. Gait & Posture. 2016 Jun;47:80–85. pmid:27264408
- 22. Bonnechère B, Sholukha V, Jansen B, Omelina L, Rooze M, Van Sint Jan S. Determination of Repeatability of Kinect Sensor. Telemedicine and e-Health. 2014 May;20(5):451–453. pmid:24617290
- 23. Welk GJ. Principles of design and analyses for the calibration of accelerometry-based activity monitors. Medicine and Science in Sports and Exercise. 2005 Nov;37(11 Suppl):S501–511. pmid:16294113
- 24. Mancini M, Salarian A, Carlson-Kuhta P, Zampieri C, King L, Chiari L, et al. ISway: a sensitive, valid and reliable measure of postural control. Journal of Neuroengineering and Rehabilitation. 2012;9:59. pmid:22913719